FEAT: Standardize system_prompt as a first-class consumed attack argument #2040
adrian-gavrila wants to merge 5 commits into
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Behavior is grep-discoverable, runtime-enforced, and test-covered; the section did not clear the bar this slim instruction file sets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…prompt' into adrian-gavrila/standardize-system-prompt
⚠️ Human review recommended
It changes a core execution chokepoint and parameter contract across many attacks, so a human reviewer should validate compatibility and any downstream behavioral impact beyond the added unit coverage.
Pull request overview
Standardizes system_prompt= as a first-class, consumed attack argument by lifting it into AttackParameters and reliably lowering it into a single leading system message at the shared AttackStrategy.execute_with_context_async entrypoint, ensuring delivery for both direct (execute_async) and executor-driven (AttackExecutor) runs.
Changes:
- Add `system_prompt: str | None` to `AttackParameters` and lower it into `context.prepended_conversation` at `AttackStrategy.execute_with_context_async`, with a conflict `ValueError` when a system-role prepended message is already present.
- Remove the dead `SingleTurnAttackContext.system_prompt` field and update tests to assert behavioral delivery rather than no-op state.
- Explicitly exclude `system_prompt` from the `params_type` of self-seeding / internally-constructed prompt attacks, and add unit coverage for rejection and executor-path regression.
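The lowering step and its conflict rule can be sketched as follows. This is a minimal illustration, not the PR's implementation: `Message`, `Context`, and `lower_system_prompt` are hypothetical stand-ins, and only the behavior (one leading system message, `ValueError` on conflict) comes from the summary above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a conversation message (not PyRIT's Message)."""

    role: str
    content: str


@dataclass
class Context:
    """Hypothetical stand-in for an attack context."""

    system_prompt: str | None = None
    prepended_conversation: list[Message] = field(default_factory=list)


def lower_system_prompt(context: Context) -> None:
    """Convert system_prompt into one leading system message, or raise on conflict."""
    if context.system_prompt is None:
        return  # nothing to lower
    if any(m.role == "system" for m in context.prepended_conversation):
        # Two sources for the system prompt would be ambiguous, so refuse both.
        raise ValueError(
            "Pass either system_prompt= or a system message in "
            "prepended_conversation, not both."
        )
    # The system message goes first, ahead of any seeded user/assistant turns.
    context.prepended_conversation.insert(
        0, Message(role="system", content=context.system_prompt)
    )
```

In this sketch, a context with `system_prompt="Be terse."` and one seeded user turn ends up with the roles `["system", "user"]`.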
File summaries
| File | Description |
|---|---|
| `pyrit/executor/attack/core/attack_parameters.py` | Adds `system_prompt` to the canonical attack parameter contract. |
| `pyrit/executor/attack/core/attack_strategy.py` | Lowers `system_prompt` into a prepended system message at the shared chokepoint and enforces conflict rules. |
| `pyrit/executor/attack/single_turn/single_turn_attack_strategy.py` | Removes unused `SingleTurnAttackContext.system_prompt` field. |
| `pyrit/executor/attack/single_turn/flip_attack.py` | Excludes `system_prompt` from a self-seeding attack's accepted params. |
| `pyrit/executor/attack/single_turn/skeleton_key.py` | Excludes `system_prompt` from a self-seeding attack's accepted params. |
| `pyrit/executor/attack/single_turn/many_shot_jailbreak.py` | Excludes `system_prompt` from a self-seeding attack's accepted params. |
| `pyrit/executor/attack/single_turn/context_compliance.py` | Excludes `system_prompt` from a self-seeding attack's accepted params. |
| `pyrit/executor/attack/single_turn/role_play.py` | Excludes `system_prompt` from a self-seeding attack's accepted params. |
| `pyrit/executor/attack/compound/sequential_attack.py` | Excludes `system_prompt` from compound attack per-call overrides. |
| `tests/unit/executor/attack/core/test_attack_strategy.py` | Adds unit coverage for lowering behavior, ordering, conflict errors, and executor-bypass simulation. |
| `tests/unit/executor/attack/core/test_attack_executor.py` | Regression test ensuring executor-path lowering happens via the shared chokepoint. |
| `tests/unit/executor/attack/single_turn/test_prompt_sending.py` | Updates assertions to validate lowering behavior and adds a delivery-to-conversation-manager test. |
| `tests/unit/executor/attack/single_turn/test_role_play.py` | Verifies `system_prompt` exclusion from `params_type` and explicit rejection at runtime. |
Copilot's findings
- Files reviewed: 13/13 changed files
- Comments generated: 0
| "| `objective` | What you are trying to get the **objective target** (the system under test) to do. Drives scoring and multi-turn adversarial prompts. |\n", | ||
| "| `memory_labels` | A `dict[str, str]` tagged onto every prompt/response, so you can filter this run later in memory. |\n", | ||
| "| `prepended_conversation` | A list of `Message`s to seed the conversation before the attack's own turns (system prompt, prior history). |\n", | ||
| "| `system_prompt` | The objective target's system prompt, as a string. The standard one-line way to set it; PyRIT lowers it to a single `system` message at the front of the conversation. Mutually exclusive with a `system` message in `prepended_conversation`. |\n", |
This is the most common case BY FAR. But it's a bit complicated because there can be multiple system prompts. Different models behave differently in these cases, which matters to PyRIT, since flexibility is one of its goals.
Because of that, I like it being in prepended_conversation. I think it maps more cleanly to SeedPromptAttackGroups. It is more difficult to add when manually creating attacks, but I think SeedPrompts is the more common case.
In other words, my vote is to not take this change. Keep things general. And make it easy at other layers (e.g. even adding methods to the AttackExecutor is not too low)
| # Conversation that is automatically prepended to the target model | ||
| prepended_conversation: list[Message] | None = None | ||
|
|
||
| # System prompt for the objective target; lowered to a prepended system message |
I feel like "lowered to a prepended system message" is not super clear. What does "lowered" mean in this context?
| # turns — for example, to resume a prior conversation or to plant an agreeable assistant reply. | ||
| # A prepended conversation seeds the exchange before the attack adds its own turn. For just a system | ||
| # prompt, prefer `system_prompt=` above. Reach for a prepended conversation when you need to seed a | ||
| # sequence of `system` / `user` / `assistant` turns — for example, to resume a prior conversation or |
nit: I know you mention the error above a few times, but it might be good to specify here that they shouldn't be used together, and maybe explain why.
| conversation_id: str = field(default_factory=lambda: str(uuid.uuid4())) | ||
|
|
||
| # System prompt for chat-based targets | ||
| system_prompt: str | None = None |
This is breaking, right?
Description
Makes `system_prompt=` a first-class, consumed attack argument that always delivers to the objective (system-under-test) target. Previously there were three inconsistent ways to set the objective target's system prompt, including a `SingleTurnAttackContext.system_prompt` field that was declared but never consumed (a silent no-op).
This standardizes on a single mechanism:
- `system_prompt` is lifted to `AttackParameters`, so both single-turn and multi-turn attacks accept it with one source of truth.
- It is lowered (converted at execution time) into a single leading `system`-role message on `prepended_conversation` at the `AttackStrategy.execute_with_context_async` chokepoint, the one path that both single-shot (`execute_async`) and batched (`AttackExecutor`) runs cross, so delivery is structurally guaranteed and runs exactly once per task (outside the retry loop).
- Supplying both `system_prompt=` and a `system`-role message in `prepended_conversation` raises a clear `ValueError` (one source of truth).
- The dead `SingleTurnAttackContext.system_prompt` field is removed.
- Attacks that exclude `prepended_conversation` from their params (`flip_attack`, `skeleton_key`, `many_shot_jailbreak`, `context_compliance`, `role_play`, `sequential_attack`) also exclude `system_prompt`, rejecting it explicitly rather than silently dropping it.
- `prepended_conversation=` remains the advanced path for full multi-message seeds.

Implements ADO #9697 (framework standardization track). The CoPyRIT GUI half of that story is tracked separately and is not part of this PR.
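The "reject explicitly rather than silently drop" rule for self-seeding attacks might look roughly like the sketch below. `Params`, `build_params`, and `SELF_SEEDING_ACCEPTED` are hypothetical names invented for this illustration. They are not PyRIT's `params_type` machinery, which the description does not spell out.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class Params:
    """Hypothetical parameter contract; field names mirror the PR text."""

    objective: str
    system_prompt: str | None = None
    prepended_conversation: list | None = None


def build_params(accepted: set[str], **kwargs) -> Params:
    """Build Params, raising on any argument this attack does not accept."""
    known = {f.name for f in fields(Params)}
    rejected = sorted(k for k in kwargs if k not in accepted or k not in known)
    if rejected:
        # Fail loudly: a dropped system prompt would be a silent no-op.
        raise ValueError(f"Unsupported attack argument(s): {', '.join(rejected)}")
    return Params(**kwargs)


# A self-seeding attack constructs its own conversation, so in this sketch it
# accepts neither prepended_conversation nor system_prompt.
SELF_SEEDING_ACCEPTED = {"objective"}
```

Calling `build_params(SELF_SEEDING_ACCEPTED, objective="...", system_prompt="...")` raises here, so the caller learns immediately that the argument would never reach the target.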
Tests and Documentation
- Tests cover the `AttackExecutor` batch path (regression test for the executor-bypass case), end-to-end delivery to the target, the both-supplied `ValueError`, the self-seeding carve-outs, and single/multi-turn parity. Replaced the previous `context.system_prompt == ...` assertion (which locked in the no-op) with a behavioral assertion. Full unit suite green (9785 passed, 119 skipped).
- Documented in `doc/code/executor/3_attack_configuration` (the attack-inputs page), alongside the existing `prepended_conversation` material, plus a `system_prompt` row in the inputs table. Regenerated the paired notebook with JupyText (executed).
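A state assertion and a behavioral assertion differ as in the sketch below. `RecordingTarget` and `run_attack` are hypothetical helpers written for this illustration and are not part of the PR's test suite. The point is that the test inspects what the target actually received rather than what a context field holds.

```python
from __future__ import annotations


class RecordingTarget:
    """Hypothetical fake target that records every message list it is sent."""

    def __init__(self) -> None:
        self.received: list[list[tuple[str, str]]] = []

    def send(self, messages: list[tuple[str, str]]) -> str:
        self.received.append(list(messages))
        return "ok"


def run_attack(
    target: RecordingTarget, objective: str, system_prompt: str | None = None
) -> str:
    """Hypothetical minimal attack: put the system prompt first, then send."""
    messages: list[tuple[str, str]] = []
    if system_prompt is not None:
        messages.append(("system", system_prompt))
    messages.append(("user", objective))
    return target.send(messages)


def test_system_prompt_reaches_target() -> None:
    target = RecordingTarget()
    run_attack(target, "say hello", system_prompt="Be terse.")
    # Behavioral check: the target saw the system message, and saw it first.
    assert target.received[0][0] == ("system", "Be terse.")
```

A test of this shape fails if the argument is accepted but never delivered, which is the failure mode the old field allowed.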