[checks] Implement Python checks and fix some violations#9146
[checks] Implement Python checks and fix some violations#9146eisenwave wants to merge 3 commits into
Conversation
| %EXPECTCHECKNEXTLINE(base-tcode-exposid) | ||
| \tcode{\exposid{snake-case}} | ||
| %EXPECTCHECKNEXTLINE(base-tcode-exposid) | ||
| \tcode{\exposid{kebak-case}} |
There was a problem hiding this comment.
did you mean "kebab-case"? Also, that snake-case one isn't actually snake_case.
| *.synctex.gz | ||
| *.synctex* | ||
| .check.stamp | ||
| __pycache__ |
There was a problem hiding this comment.
While it won't matter until we have multiple Python modules, you might want to also ignore *.pyc, *.egg and *.egg-info.
| "reportUnknownParameterType": false, | ||
| "reportUnusedImport": true, | ||
| "reportUnusedVariable": true, | ||
| "pythonVersion": "3.11" |
There was a problem hiding this comment.
why 3.11 specifically? That version of Python is EoL in about a year.
For the CI we use Ubuntu 24.04. This ships with Python 3.12 - the current Python release is 3.14
There was a problem hiding this comment.
I just blindly copied and pasted this from another project, so no particular reason.
| - name: setup Node | ||
| uses: actions/setup-node@v6 | ||
| with: | ||
| node-version: "20" | ||
|
|
||
| - name: install pyright | ||
| run: npm install -g pyright |
There was a problem hiding this comment.
I'm not sure pyright is a good call. There are other alternatives that do not require pulling in node as dependency.
There was a problem hiding this comment.
I just use it because it integrates nicely with VSCode out of the box. What would you recommend instead?
| # ================================================================================================== | ||
| # Lookup tables | ||
| # ================================================================================================== |
There was a problem hiding this comment.
I'm really not a fan of those section comment headers. Would it be viable to chop this >2k LoC module up into several smaller modules?
There was a problem hiding this comment.
There may be some merit to pulling the lookup tables into a separate Python file just because they are a bit of clutter and make the script look a lot bigger than it actually is. The rest is just around 1K lines, and modulerization probably wouldn't improve readability much.
| message: str, | ||
| ) -> None: | ||
| """Report a failure at `self.file_path`.""" | ||
| emit_check_failure( |
There was a problem hiding this comment.
emit_check_failure is only called from within classes deriving from Check. Can you fold it into fail and consistently use this one?
| class Check(ABC): | ||
| """Base class for all checks.""" | ||
|
|
||
| id: str = "" |
There was a problem hiding this comment.
id is a Python builtin, please don't use it for identifiers. Also I'm not entirely sure what you're trying to do here - this id is a class attribute not an instance attribute.
Did you mean to say
class Check(ABC):
uid: str
def __init__(self, uid: str = ""):
self.uid = uid
# ...
class Foo(Check):
def __init__(self, uid: str, **whatever):
super().__init__(uid)?
There was a problem hiding this comment.
I didn't even know that there is such a thing as class attributes and instance attributes, but okay. I thought the "" is a default member initializer or something, and I think I used that at some point but then it became pointless.
| file_locations.clear() | ||
|
|
||
| for fp in tex_files: | ||
| file_locations[os.path.relpath(fp)] = fp |
There was a problem hiding this comment.
please don't mix os.path and pathlib usage. pathlib.Path exposes this functionality as relative_to().
| if directive == "BEGIN": | ||
| if cid == "*": | ||
| active.clear() | ||
| else: | ||
| active = {a for a in active if not fnmatch.fnmatch(a, cid)} | ||
| elif directive == "END": | ||
| if cid == "*": | ||
| active = set(all_ids) | ||
| else: | ||
| active |= {a for a in all_ids if fnmatch.fnmatch(a, cid)} | ||
| elif directive == "NEXTLINE": | ||
| if cid == "*": | ||
| for c in all_ids: | ||
| skip_next[c].add(line_num + 1) | ||
| else: | ||
| matched = {a for a in all_ids if fnmatch.fnmatch(a, cid)} | ||
| for c in matched: | ||
| skip_next[c].add(line_num + 1) |
There was a problem hiding this comment.
you have some code duplication here
| file=sys.stderr, | ||
| ) | ||
|
|
||
| exit_code = 1 if num_failures > 0 else 0 |
There was a problem hiding this comment.
you can just do num_failures > 0 here. If you insist on getting an int, do int(num_failures > 0) but it's not necessary.
|
@Tsche thanks a lot for the review. It'll take some time to clean it all up. |
| %NOCHECKBEGIN(base-tcode-exposid) | ||
| Every type in the parameter pack \tcode{Flags} is one of \tcode{\exposid{convert-flag}}, | ||
| \tcode{\exposid{aligned-flag}}, or \tcode{\exposid{over\-aligned-\brk{}flag}<N>}. |
There was a problem hiding this comment.
I think we can just remove the \tcode wrapping around \exposid here?
There was a problem hiding this comment.
Yeah, we could. It would be something like 10 separate fixes.
I think I'll just do a surgical fix in a separate PR for these cleanups so that this PR doesn't have to clean up any existing sources.
The problem with the existing check is that the regex pattern doesn't work on hyphenated identifiers.
| \pnum | ||
| \recommended Implementations should support implicit conversions between | ||
| \recommended | ||
| Implementations should support implicit conversions between |
There was a problem hiding this comment.
Please move all specific fixes to a separate commit.





Fixes #8086
I've mentioned the option to improve the checks in Croydon to @tkoeppe, and he seemed generally open to the idea.
In summary, this PR converts all the checks we have in
check-source.sh(plus some additional once adopted fromcheck-output.sh) and turns them into a Python script. The benefits are:sedandgreppiping. The code is relatively simple Python. It still uses a bunch of regular expressions for a lot of checks; the greatest benefits are in the more complex checks.EXPECTCHECKNEXTLINE.NOCHECKBEGINand other directives. This makes it possible for us to add much more aggressive checks that would have previously been infeasible because they generate too much noise in old TeX sources.\tcode{\exposid{...}}in a part ofnumerics.texbecause there are too many violations.\textitin "newer" TeX ranges for example without having to fix every single violation. We could also use it to catch many "bad words" or "bad phrases" that appear too often to be checked right now, but that we don't want perpetuated.\refto a section that is introduced by another paper, you can still get a green build by suppressing the\refcheck (that is, at least for thecheck-sourcechecks).There are a few notable new checks that make things a lot easier in the future:
base-env-balancingchecks for balancing of\beginand\endcommands. It's quite easy to get it wrong, and when you do, it tends to blow up after minutes during the TeX build in bizarre and hard-to-debug ways.base-unknown-commandchecks for use of TeX commands that have never appeared before in the code base. It's very unlikely that this is intentional in most PRs, and very likely that someone just wrote\tocdeinstead of\tcodeor something else stupid. If we need a new TeX command, we can just add it to the list.