fix: make slugify replacement passes idempotent (supersedes #179)#182
Open
gaoflow wants to merge 1 commit into
Open
fix: make slugify replacement passes idempotent (supersedes #179)#182gaoflow wants to merge 1 commit into
gaoflow wants to merge 1 commit into
Conversation
User replacements can break slugify(slugify(x)) == slugify(x) in two ways: 1. Direct self-reference: old appears in new (e.g. a -> aa), causing compound growth on re-invocation. 2. Indirect self-reference: new contains non-word characters that, after slugification, become old (e.g. dash -> dollar-x-dollar, where dollar chars become dashes, creating dash-x-dash which contains dash, triggering pass-1 replacement in the next call). Fix: - Skip replacement rules in both passes when old-in-new (direct) or old appears after slugifying new (indirect). - Run the disallowed-char pattern + dedup + strip after both replacement passes so non-word characters from replacements do not break idempotence. Non-cyclic replacements (like pipe->or, percent->percent) are unaffected.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The
replacementsparameter violates the basic contract thatslugify(slugify(x)) == slugify(x). Two categories:Direct self-reference (
old in new): e.g.[["a", "aa"]]— pass-2re-fires on its own output, compounding on every call. Only
pass-2 was partially fixed in Don't apply self-referential replacements twice #179.
Indirect self-reference: e.g.
[["-", "$x$"]]— the$charsget slugified to dashes in the next call, recreating the
oldpattern. Not covered by Don't apply self-referential replacements twice #179.
Fix (+57/-8, 2 files):
Cycle detection in both passes: compute
cleaned = pattern.sub("-", new)then
dedup(cleaned). Ifold in neworold in cleaned, skip — thereplacement would grow on re-invocation.
Post-pass cleanup: re-apply disallowed-char pattern + dash dedup
non-cyclic but still break idempotence.
Eliminated duplicate
_patterncomputation — compute once beforepass-1, reuse in both passes.
Tests: 83/83 pass (82 existing +
test_replacements_idempotent).RED→GREEN: new test fails on master with
"a$x$x$x$b" != "a$x$b".Fuzzed 21,024 idempotence combinations — 0 failures. pycodestyle clean.
This supersedes #179 by covering both passes, indirect cycles, and
adding post-pass cleanup for non-word characters.
This pull request was prepared with the assistance of AI, under my
direction and review.