Skip to content

[checks] Implement Python checks and fix some violations#9146

Open
eisenwave wants to merge 3 commits into
cplusplus:mainfrom
eisenwave:python-checks
Open

[checks] Implement Python checks and fix some violations#9146
eisenwave wants to merge 3 commits into
cplusplus:mainfrom
eisenwave:python-checks

Conversation

@eisenwave

@eisenwave eisenwave commented Jun 17, 2026

Copy link
Copy Markdown
Member

Fixes #8086

I've mentioned the option to improve the checks in Croydon to @tkoeppe, and he seemed generally open to the idea.

grafik

In summary, this PR converts all the checks we have in check-source.sh (plus some additional once adopted from check-output.sh) and turns them into a Python script. The benefits are:

  • Much nicer formatting for errors. You get VSCode-friendly source locations that you can use to navigate directly to the line and column in an editor. You also get a citation of the source with the affected column span highlighted, so you're not left guessing where the error is located.
  • No more hieroglyphic sed and grep piping. The code is relatively simple Python. It still uses a bunch of regular expressions for a lot of checks; the greatest benefits are in the more complex checks.
  • Ability to write test cases for checks using EXPECTCHECKNEXTLINE.
  • Ability to enable and disable individual checks for source spans using NOCHECKBEGIN and other directives. This makes it possible for us to add much more aggressive checks that would have previously been infeasible because they generate too much noise in old TeX sources.
    • A good example of that is in this PR, where I disable the check for \tcode{\exposid{...}} in a part of numerics.tex because there are too many violations.
    • We could also use this to e.g. disallow use of \textit in "newer" TeX ranges for example without having to fix every single violation. We could also use it to catch many "bad words" or "bad phrases" that appear too often to be checked right now, but that we don't want perpetuated.
    • We could also use this to enforce a little bit of auto-enforcement of semantic line breaks (such as requiring a new line after a sentence period) without needing to fix the countless existing violations.
  • With check suppression, it is possible to suppress checks that would have inevitably failed during motions. For example, if there is a dangling \ref to a section that is introduced by another paper, you can still get a green build by suppressing the \ref check (that is, at least for the check-source checks).

There are a few notable new checks that make things a lot easier in the future:

  1. base-env-balancing checks for balancing of \begin and \end commands. It's quite easy to get it wrong, and when you do, it tends to blow up after minutes during the TeX build in bizarre and hard-to-debug ways.
  2. base-unknown-command checks for use of TeX commands that have never appeared before in the code base. It's very unlikely that this is intentional in most PRs, and very likely that someone just wrote \tocde instead of \tcode or something else stupid. If we need a new TeX command, we can just add it to the list.

@eisenwave eisenwave requested a review from jensmaurer June 17, 2026 18:28
@eisenwave

Copy link
Copy Markdown
Member Author

Some more neat examples:

grafik grafik grafik grafik

@eisenwave

Copy link
Copy Markdown
Member Author

Latest innovation:
grafik

@eisenwave eisenwave changed the title Implement Python checks and fix some violations [checks] Implement Python checks and fix some violations Jun 17, 2026
Comment thread tools/test-checks.tex
%EXPECTCHECKNEXTLINE(base-tcode-exposid)
\tcode{\exposid{snake-case}}
%EXPECTCHECKNEXTLINE(base-tcode-exposid)
\tcode{\exposid{kebak-case}}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean "kebab-case"? Also, that snake-case one isn't actually snake_case.

Comment thread .gitignore
*.synctex.gz
*.synctex*
.check.stamp
__pycache__

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it won't matter until we have multiple Python modules, you might want to also ignore *.pyc, *.egg and *.egg-info.

Comment thread pyrightconfig.json
"reportUnknownParameterType": false,
"reportUnusedImport": true,
"reportUnusedVariable": true,
"pythonVersion": "3.11"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 3.11 specifically? That version of Python is EoL in about a year.

For the CI we use Ubuntu 24.04. This ships with Python 3.12 - the current Python release is 3.14

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just blindly copied and pasted this from another project, so no particular reason.

Comment on lines +24 to +30
- name: setup Node
uses: actions/setup-node@v6
with:
node-version: "20"

- name: install pyright
run: npm install -g pyright

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure pyright is a good call. There are other alternatives that do not require pulling in node as dependency.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just use it because it integrates nicely with VSCode out of the box. What would you recommend instead?

Comment thread tools/check-source.py
Comment on lines +19 to +21
# ==================================================================================================
# Lookup tables
# ==================================================================================================

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not a fan of those section comment headers. Would it be viable to chop this >2k LoC module up into several smaller modules?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be some merit to pulling the lookup tables into a separate Python file just because they are a bit of clutter and make the script look a lot bigger than it actually is. The rest is just around 1K lines, and modulerization probably wouldn't improve readability much.

Comment thread tools/check-source.py
message: str,
) -> None:
"""Report a failure at `self.file_path`."""
emit_check_failure(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emit_check_failure is only called from within classes deriving from Check. Can you fold it into fail and consistently use this one?

Comment thread tools/check-source.py
class Check(ABC):
"""Base class for all checks."""

id: str = ""

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id is a Python builtin, please don't use it for identifiers. Also I'm not entirely sure what you're trying to do here - this id is a class attribute not an instance attribute.

Did you mean to say

class Check(ABC):
  uid: str
  def __init__(self, uid: str = ""):
    self.uid = uid

# ...
class Foo(Check):
  def __init__(self, uid: str, **whatever):
    super().__init__(uid)

?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't even know that there is such a thing as class attributes and instance attributes, but okay. I thought the "" is a default member initializer or something, and I think I used that at some point but then it became pointless.

Comment thread tools/check-source.py
file_locations.clear()

for fp in tex_files:
file_locations[os.path.relpath(fp)] = fp

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't mix os.path and pathlib usage. pathlib.Path exposes this functionality as relative_to().

Comment thread tools/check-source.py
Comment on lines +2084 to +2101
if directive == "BEGIN":
if cid == "*":
active.clear()
else:
active = {a for a in active if not fnmatch.fnmatch(a, cid)}
elif directive == "END":
if cid == "*":
active = set(all_ids)
else:
active |= {a for a in all_ids if fnmatch.fnmatch(a, cid)}
elif directive == "NEXTLINE":
if cid == "*":
for c in all_ids:
skip_next[c].add(line_num + 1)
else:
matched = {a for a in all_ids if fnmatch.fnmatch(a, cid)}
for c in matched:
skip_next[c].add(line_num + 1)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have some code duplication here

Comment thread tools/check-source.py
file=sys.stderr,
)

exit_code = 1 if num_failures > 0 else 0

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just do num_failures > 0 here. If you insist on getting an int, do int(num_failures > 0) but it's not necessary.

@eisenwave

Copy link
Copy Markdown
Member Author

@Tsche thanks a lot for the review. It'll take some time to clean it all up.

Comment thread source/numerics.tex
Comment on lines +17728 to 17730
%NOCHECKBEGIN(base-tcode-exposid)
Every type in the parameter pack \tcode{Flags} is one of \tcode{\exposid{convert-flag}},
\tcode{\exposid{aligned-flag}}, or \tcode{\exposid{over\-aligned-\brk{}flag}<N>}.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just remove the \tcode wrapping around \exposid here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could. It would be something like 10 separate fixes.

I think I'll just do a surgical fix in a separate PR for these cleanups so that this PR doesn't have to clean up any existing sources.

The problem with the existing check is that the regex pattern doesn't work on hyphenated identifiers.

Comment thread source/numerics.tex
\pnum
\recommended Implementations should support implicit conversions between
\recommended
Implementations should support implicit conversions between

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move all specific fixes to a separate commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[checks] Create a better check-source.sh in Python

3 participants