FEAT: Add word-game option to DecompositionConverter by Raulster24 · Pull Request #2051 · microsoft/PyRIT

Raulster24 · 2026-06-18T22:13:17Z

Description

This adds an optional word-game mode to DecompositionConverter (the DrAttack decompose-and-reconstruct converter from #2003), via use_word_game: bool = False. When enabled, each harmful noun phrase is replaced by an innocuous codeword in the reconstruction questions, and a mapping preamble (for example 'apple' means 'a bomb') is established in the same prompt. This is the second half of DrAttack: it further conceals the harmful nouns by splitting them from the request behind codewords.

Off by default, so the merged converter behaviour is unchanged.

Two design choices worth flagging up front:

Inline, not a separate prepended conversation. We had discussed the word-game as a prepended/simulated conversation; I went with inline (preamble and reconstruction in one prompt) for two reasons. First, coupling: the codewords must match the reconstruction the converter builds, and a separate conversation generates its turns independently, so the two cannot share the mapping without a stateful component (an attack class), which we wanted to avoid. Inline keeps it a pure converter. Second, the numbers: inline is within 2 points of the two-turn version on gpt-4o and 10 points behind it on gpt-4o-mini, and both are far above no word-game:

n=50, GPT-judge ASR: gpt-4o inline 44% / two-turn 46% / core 16%; gpt-4o-mini inline 52% / two-turn 62% / core 22%.
n=15 with the actual converter (not the harness), gpt-4o-mini: word-game off 20%, on 73%.

So inline retains nearly all of the effect on the frontier model (gpt-4o, 44% vs 46%), with no new attack class. Open to the prepended-conversation route if you prefer it.

A toggle on the converter, not a separate converter. The codewords have to stay in sync with the reconstruction this converter produces, so a separate converter cannot do it; it has to be a mode of this converter.

Note on the mechanism: the harmful phrase still appears once, in the mapping line; the concealment is that the question uses the codeword, splitting the harmful term from the request. This is the paper's word-game, and the numbers above show the lift.

(All numbers are GPT-judge refusal-bypass, not operational harm, consistent with the #2003 assessment.)

Tests and Documentation

Added unit tests: codeword substitution, off-mode unchanged, custom codewords, and a clear error when there are more noun phrases than codewords.
Documented the new use_word_game parameter in doc/code/converters/1_text_to_text_converters.py; ran JupyText --sync.
ruff check and format clean; ty reports no errors; full converter and docs test suites pass.

cc @rlundeen2 @romanlutz

adrian-gavrila

Thanks for the contribution! A few small things worth attention, but overall it looks great.

adrian-gavrila · 2026-06-19T21:05:07Z

+            word_game_prompt (SeedPrompt | None): Template for the word-game mapping preamble. Defaults
+                to the bundled ``decomposition/word_game_preamble.yaml``. Only used when
+                ``use_word_game`` is True.
+            codewords (tuple[str, ...]): Innocuous codewords substituted for harmful noun phrases when


Nit: Rationale's already on the _CODEWORDS comment and the Raises: block; arg docstrings here stay terse (cf. _MIN_RECALL). Could trim.

trimmed to one line; the bound is already covered by the overflow ValueError

adrian-gavrila · 2026-06-19T21:05:07Z

+        self._word_game_prompt = word_game_prompt or SeedPrompt.from_yaml_file(
+            _DECOMPOSITION_DIR / "word_game_preamble.yaml"
+        )
+        self._codewords = codewords


codewords isn't validated for uniqueness; duplicates silently yield an ambiguous mapping ('apple' means 'bomb'; 'apple' means 'gun'). Worth a fail-fast len(set(codewords)) != len(codewords) check in __init__?

Added a fail-fast len(set(codewords)) != len(codewords) check in __init__, a Raises doc entry, and test_duplicate_codewords_raise.
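
For readers following the thread, the uniqueness check discussed above can be sketched roughly as follows. This is a minimal standalone illustration, not the code merged in the PR: the function name validate_codewords and the error wording are hypothetical, and only the len(set(codewords)) != len(codewords) comparison comes from the review comment.

```python
from collections import Counter


def validate_codewords(codewords: tuple[str, ...]) -> None:
    """Raise ValueError if any codeword appears more than once.

    Hypothetical helper for illustration; in the PR the check is described
    as living in the converter's __init__.
    """
    # Same comparison the reviewer suggested: a set drops duplicates,
    # so a length mismatch means at least one codeword is repeated.
    if len(set(codewords)) != len(codewords):
        # Collect the repeated entries so the error message is actionable.
        duplicates = sorted(word for word, count in Counter(codewords).items() if count > 1)
        raise ValueError(f"codewords must be unique; duplicated: {duplicates}")
```

Failing at construction time means an ambiguous mapping is never built, rather than surfacing later as a confusing prompt.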

adrian-gavrila · 2026-06-19T21:05:07Z

+                if self._use_word_game:
+                    if noun_index > len(self._codewords):
+                        raise ValueError(
+                            f"word-game supports at most {len(self._codewords)} noun phrases, got {noun_index}"


Nit: noun_index is the first overflowing index (len+1), not the total noun count, so 25 nouns with 20 codewords reports got 21. Maybe reword to a threshold breach.

Reworded to state the threshold breach, no misleading count
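
A rough sketch of what a threshold-style message could look like, assuming noun_index is the 1-based position of the noun phrase currently being processed (which is what the `>` comparison in the diff suggests). The function name check_codeword_capacity and the exact wording are hypothetical; the final message in the PR may differ.

```python
def check_codeword_capacity(noun_index: int, codewords: tuple[str, ...]) -> None:
    """Raise ValueError once the current noun phrase has no codeword left.

    Hypothetical helper for illustration. noun_index is assumed to be 1-based.
    """
    if noun_index > len(codewords):
        # Report only the limit that was exceeded. noun_index is the first
        # overflowing position, not the total number of noun phrases, so it
        # is deliberately left out of the message.
        raise ValueError(
            f"word-game supports at most {len(codewords)} noun phrases; "
            "the prompt contains more than that"
        )
```

With 20 codewords, the check passes for positions 1 through 20 and raises at position 21 without implying that 21 is the total count.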

Raulster24 · 2026-06-22T07:02:50Z

@adrian-gavrila Thanks for the review. Addressed all three: codeword uniqueness is now validated in __init__ with a test, the arg docstring is trimmed, and the overflow message states the threshold breach instead of a count.

FEAT: Add word-game option to DecompositionConverter

64c84fb

adrian-gavrila self-assigned this Jun 19, 2026

adrian-gavrila reviewed Jun 19, 2026


Raulster24 added 2 commits June 22, 2026 10:58

Merge remote-tracking branch 'upstream/main' into raulster24/add-decomposition-word-game

f7d00c3

Address review: validate codeword uniqueness, trim docstring, reword overflow message

0d78c5e

Raulster24 wants to merge 3 commits into microsoft:main from Raulster24:raulster24/add-decomposition-word-game
