fix: make slugify replacement passes idempotent (supersedes #179) #182

Open
gaoflow wants to merge 1 commit into un33k:master from gaoflow:fix/replacement-idempotence

gaoflow commented Jun 24, 2026

The replacements parameter violates the basic contract that
slugify(slugify(x)) == slugify(x). Two categories:

  1. Direct self-reference (old in new): e.g. [["a", "aa"]] — pass-2
    re-fires on its own output, compounding on every call. Only
    pass-2 was partially fixed in #179 (Don't apply self-referential replacements twice).

  2. Indirect self-reference: e.g. [["-", "$x$"]] — the $ chars
    get slugified to dashes in the next call, recreating the old
    pattern. Not covered by #179.
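
Both categories can be reproduced with a small stand-in for the pipeline. This is a sketch only: `toy_slugify`, `DISALLOWED` and `DUPLICATE_DASH` are hypothetical simplifications (replace, lowercase, clean, replace again), not the library's actual code.

```python
import re

# Assumption: simplified stand-ins for the library's patterns.
DISALLOWED = re.compile(r"[^-a-z0-9]+")
DUPLICATE_DASH = re.compile(r"-{2,}")


def toy_slugify(text, replacements=()):
    """Simplified two-pass pipeline WITHOUT the fix in this PR."""
    # Pass 1: user replacements on the raw input.
    for old, new in replacements:
        text = text.replace(old, new)
    text = text.lower()
    # Replace disallowed characters, collapse dashes, strip the ends.
    text = DISALLOWED.sub("-", text)
    text = DUPLICATE_DASH.sub("-", text).strip("-")
    # Pass 2: user replacements on the cleaned slug.
    for old, new in replacements:
        text = text.replace(old, new)
    return text


# Category 1, direct self-reference: the output grows on every call.
direct = [["a", "aa"]]
once = toy_slugify("a", direct)        # "aaaa"
twice = toy_slugify(once, direct)      # 16 x "a"

# Category 2, indirect self-reference: "$" is cleaned to "-",
# which recreates the old pattern on the next call.
indirect = [["-", "$x$"]]
first = toy_slugify("a b", indirect)   # "a$x$b"
second = toy_slugify(first, indirect)  # "a$x$x$x$b"
```

The second case matches the failure string quoted under Tests below.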

Fix (+57/-8, 2 files):

  • Cycle detection in both passes: compute cleaned = pattern.sub("-", new)
    then dedup(cleaned). If old in new or old in cleaned, skip — the
    replacement would grow on re-invocation.

  • Post-pass cleanup: re-apply disallowed-char pattern + dash dedup
    + strip after each pass. Catches non-word characters that are
    non-cyclic but still break idempotence.
  • Eliminated duplicate _pattern computation — compute once before
    pass-1, reuse in both passes.
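
The cycle check in the first bullet can be sketched as a small predicate. `is_cyclic` and the two regexes are hypothetical names used for illustration; the real patch works on whatever pattern the caller has configured.

```python
import re

# Assumption: simplified stand-ins for the library's patterns.
DISALLOWED = re.compile(r"[^-a-z0-9]+")
DUPLICATE_DASH = re.compile(r"-{2,}")


def is_cyclic(old, new):
    """Return True if the rule old -> new could fire again on its own output."""
    # Direct self-reference: the replacement text already contains old.
    if old in new:
        return True
    # Indirect self-reference: slugifying the replacement text
    # (disallowed chars to dashes, then dash dedup) recreates old.
    cleaned = DISALLOWED.sub("-", new)
    cleaned = DUPLICATE_DASH.sub("-", cleaned)
    return old in cleaned


print(is_cyclic("a", "aa"))       # True  (direct)
print(is_cyclic("-", "$x$"))      # True  (indirect: "$x$" cleans to "-x-")
print(is_cyclic("|", "or"))       # False
print(is_cyclic("%", "percent"))  # False
```

Rules flagged by the predicate are skipped in both passes; everything else is applied as before.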

Tests: 83/83 pass (82 existing + test_replacements_idempotent).
RED→GREEN: new test fails on master with "a$x$x$x$b" != "a$x$b".
Fuzzed 21,024 idempotence combinations — 0 failures. pycodestyle clean.

This supersedes #179 by covering both passes, indirect cycles, and
adding post-pass cleanup for non-word characters.

This pull request was prepared with the assistance of AI, under my
direction and review.

User replacements can break slugify(slugify(x)) == slugify(x) in
two ways:

1. Direct self-reference: old appears in new (e.g. a -> aa),
   causing compound growth on re-invocation.

2. Indirect self-reference: new contains non-word characters that,
   after slugification, become old (e.g. dash -> dollar-x-dollar,
   where dollar chars become dashes, creating dash-x-dash which
   contains dash, triggering pass-1 replacement in the next call).

Fix:
- Skip replacement rules in both passes when old-in-new (direct)
  or old appears after slugifying new (indirect).
- Run the disallowed-char pattern + dedup + strip after both
  replacement passes so non-word characters from replacements
  do not break idempotence.
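
As a rough illustration of the two fix items together, here is a toy pipeline with the skip and the trailing cleanup. All names (`toy_slugify_fixed`, `_cleanup`, `_is_cyclic`) are hypothetical, and the pipeline is far simpler than the library's; it only shows the shape of the change.

```python
import re

# Assumption: simplified stand-ins for the library's patterns.
DISALLOWED = re.compile(r"[^-a-z0-9]+")
DUPLICATE_DASH = re.compile(r"-{2,}")


def _cleanup(text):
    # Disallowed-char pattern + dash dedup + strip.
    text = DISALLOWED.sub("-", text)
    return DUPLICATE_DASH.sub("-", text).strip("-")


def _is_cyclic(old, new):
    # Direct (old in new) or indirect (old in slugified new).
    return old in new or old in DUPLICATE_DASH.sub("-", DISALLOWED.sub("-", new))


def toy_slugify_fixed(text, replacements=()):
    # Skip rules that would re-fire on their own output.
    safe = [(old, new) for old, new in replacements if not _is_cyclic(old, new)]
    for old, new in safe:            # pass 1
        text = text.replace(old, new)
    text = _cleanup(text.lower())
    for old, new in safe:            # pass 2
        text = text.replace(old, new)
    return _cleanup(text)            # post-pass cleanup


for rules, raw in [
    ([["-", "$x$"]], "a b"),                        # indirect cycle, skipped
    ([["a", "aa"]], "a"),                           # direct cycle, skipped
    ([["|", "or"], ["%", "percent"]], "10 | 20 %"),  # non-cyclic, applied
    ([["a-b", "c!d"]], "a b"),                      # "!" removed by post-pass cleanup
]:
    slug = toy_slugify_fixed(raw, rules)
    assert toy_slugify_fixed(slug, rules) == slug
```

The last case shows why the trailing cleanup matters: the rule is not cyclic, but without the cleanup pass 2 would leave `c!d`, which a second call turns into `c-d`.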

Non-cyclic replacements (like | -> or, % -> percent) are
unaffected.