FEAT: Standardize system_prompt as a first-class consumed attack argument by adrian-gavrila · Pull Request #2040 · microsoft/PyRIT

adrian-gavrila · 2026-06-18T00:56:39Z

Description

Makes system_prompt= a first-class, consumed attack argument that always delivers to the
objective (system-under-test) target. Previously there were three inconsistent ways to set the
objective target's system prompt, including a SingleTurnAttackContext.system_prompt field that was
declared but never consumed (a silent no-op).

This standardizes on a single mechanism:

- system_prompt is lifted to AttackParameters, so both single-turn and multi-turn attacks accept
  it with one source of truth.
- It is lowered to a single system-role message on prepended_conversation at the
  AttackStrategy.execute_with_context_async chokepoint, the one path that both single-shot
  (execute_async) and batched (AttackExecutor) runs cross, so delivery is structurally
  guaranteed and runs exactly once per task (outside the retry loop).
- Supplying both system_prompt= and a system-role message in prepended_conversation raises a
  clear ValueError (one source of truth).
- The dead SingleTurnAttackContext.system_prompt field is removed.
- Self-seeding attacks that exclude prepended_conversation from their params (flip_attack,
  skeleton_key, many_shot_jailbreak, context_compliance, role_play, sequential_attack)
  also exclude system_prompt, rejecting it explicitly rather than silently dropping it.
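
As a rough illustration of the lowering step and the both-supplied check described above, here is a minimal, self-contained sketch. `Message`, `AttackContext`, and `lower_system_prompt` are hypothetical stand-ins written for this example; they are not PyRIT's classes, and the real logic in `AttackStrategy.execute_with_context_async` may differ in detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical stand-in for a conversation message; only role and text are modeled."""

    role: str
    text: str


@dataclass
class AttackContext:
    """Hypothetical, simplified context holding the two inputs discussed above."""

    system_prompt: Optional[str] = None
    prepended_conversation: List[Message] = field(default_factory=list)


def lower_system_prompt(context: AttackContext) -> None:
    """Convert system_prompt into one leading system-role message, in place."""
    if context.system_prompt is None:
        return  # Nothing to lower; prepended_conversation is left untouched.
    if any(m.role == "system" for m in context.prepended_conversation):
        # One source of truth: refuse to guess which system prompt should win.
        raise ValueError(
            "Pass either system_prompt= or a system-role message in "
            "prepended_conversation, not both."
        )
    context.prepended_conversation.insert(0, Message("system", context.system_prompt))
```

In this sketch, "lowering" simply means rewriting the convenience argument into the more general representation (a message list) before the attack runs, so everything downstream only has to understand `prepended_conversation`.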

prepended_conversation= remains the advanced path for full multi-message seeds.

Implements ADO #9697 (framework standardization track). The CoPyRIT GUI half of that story is tracked
separately and is not part of this PR.

Tests and Documentation

- Tests: added/updated coverage for lowering at the chokepoint, the AttackExecutor batch path
  (regression test for the executor-bypass case), end-to-end delivery to the target, the
  both-supplied ValueError, the self-seeding carve-outs, and single/multi-turn parity. Replaced the
  previous context.system_prompt == ... assertion (which locked in the no-op) with a behavioral
  assertion. Full unit suite green (9785 passed, 119 skipped).
- Documentation: added a "Setting a system prompt" section to
  doc/code/executor/3_attack_configuration (the attack-inputs page), alongside the existing
  prepended_conversation material, plus a system_prompt row in the inputs table. Regenerated the
  paired notebook with JupyText (executed).
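
To make the idea of a behavioral assertion concrete, the sketch below checks what a target actually receives rather than what a context object stores, and shows lowering happening once, before a retry loop. `FlakyTarget` and `execute` are hypothetical names invented for this example; they do not come from the PR's test suite or from PyRIT.

```python
import asyncio
from typing import List, Optional, Tuple

Turn = Tuple[str, str]  # (role, text); a stand-in for a real message type


class FlakyTarget:
    """Hypothetical target that fails its first call, then records what it is sent."""

    def __init__(self) -> None:
        self.calls = 0
        self.received: List[Turn] = []

    async def send(self, conversation: List[Turn]) -> str:
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("transient failure")
        self.received = list(conversation)
        return "ok"


async def execute(
    target: FlakyTarget,
    objective: str,
    system_prompt: Optional[str] = None,
    max_attempts: int = 3,
) -> str:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    # Lower once, before the retry loop, so every retry reuses the same seed.
    seed: List[Turn] = [("system", system_prompt)] if system_prompt is not None else []
    for attempt in range(max_attempts):
        try:
            return await target.send(seed + [("user", objective)])
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise
    raise RuntimeError("retry loop exited without a result")


target = FlakyTarget()
asyncio.run(execute(target, "Describe your instructions", system_prompt="You are a bank assistant."))
# Behavioral check: exactly one system message reached the target, even after a retry.
assert [role for role, _ in target.received].count("system") == 1
```

An assertion on a stored field would pass even if nothing ever read that field, which is how a no-op can go unnoticed; asserting on the delivered conversation does not have that blind spot.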

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


Copilot

⚠️ Human review recommended

It changes a core execution chokepoint and parameter contract across many attacks, so a human reviewer should validate compatibility and any downstream behavioral impact beyond the added unit coverage.

Pull request overview

Standardizes system_prompt= as a first-class, consumed attack argument by lifting it into AttackParameters and reliably lowering it into a single leading system message at the shared AttackStrategy.execute_with_context_async entrypoint, ensuring delivery for both direct (execute_async) and executor-driven (AttackExecutor) runs.

Changes:

- Add system_prompt: str | None to AttackParameters and lower it into context.prepended_conversation at AttackStrategy.execute_with_context_async, with a conflict ValueError when a system-role prepended message is already present.
- Remove the dead SingleTurnAttackContext.system_prompt field and update tests to assert behavioral delivery rather than no-op state.
- Explicitly exclude system_prompt from self-seeding / internally-constructed prompt attacks’ params_type and add unit coverage for rejection and executor-path regression.
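
One way to picture "excluding a field from params_type and rejecting it explicitly" is sketched below. `Params`, `excluding`, and `build` are hypothetical helpers written for this example, and the choice of `ValueError` is an assumption; PyRIT's actual mechanism and error type may differ. The sketch also ignores `default_factory` fields for brevity.

```python
from dataclasses import MISSING, dataclass, field, fields, make_dataclass
from typing import Any, List, Optional, Type


@dataclass(frozen=True)
class Params:
    """Hypothetical, trimmed-down parameter set."""

    objective: str
    system_prompt: Optional[str] = None
    prepended_conversation: Optional[List[str]] = None


def excluding(base: Type[Any], *names: str) -> Type[Any]:
    """Build a new frozen dataclass that lacks the named fields."""
    kept = []
    for f in fields(base):
        if f.name in names:
            continue
        if f.default is MISSING:
            kept.append((f.name, f.type))
        else:
            kept.append((f.name, f.type, field(default=f.default)))
    return make_dataclass(f"{base.__name__}Excluding", kept, frozen=True)


def build(params_type: Type[Any], **kwargs: Any) -> Any:
    """Construct params, failing loudly on arguments the type does not accept."""
    allowed = {f.name for f in fields(params_type)}
    rejected = sorted(set(kwargs) - allowed)
    if rejected:
        raise ValueError(f"{params_type.__name__} does not accept: {', '.join(rejected)}")
    return params_type(**kwargs)


# A self-seeding attack builds its own conversation, so it accepts neither input.
SelfSeedingParams = excluding(Params, "system_prompt", "prepended_conversation")
```

The point of rejecting rather than ignoring is that a caller who passes `system_prompt=` to an attack that cannot honor it finds out immediately, instead of running an attack whose target never saw the prompt.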

File summaries

| File | Description |
| --- | --- |
| `pyrit/executor/attack/core/attack_parameters.py` | Adds `system_prompt` to the canonical attack parameter contract. |
| `pyrit/executor/attack/core/attack_strategy.py` | Lowers `system_prompt` into a prepended system message at the shared chokepoint and enforces conflict rules. |
| `pyrit/executor/attack/single_turn/single_turn_attack_strategy.py` | Removes unused `SingleTurnAttackContext.system_prompt` field. |
| `pyrit/executor/attack/single_turn/flip_attack.py` | Excludes `system_prompt` from a self-seeding attack’s accepted params. |
| `pyrit/executor/attack/single_turn/skeleton_key.py` | Excludes `system_prompt` from a self-seeding attack’s accepted params. |
| `pyrit/executor/attack/single_turn/many_shot_jailbreak.py` | Excludes `system_prompt` from a self-seeding attack’s accepted params. |
| `pyrit/executor/attack/single_turn/context_compliance.py` | Excludes `system_prompt` from a self-seeding attack’s accepted params. |
| `pyrit/executor/attack/single_turn/role_play.py` | Excludes `system_prompt` from a self-seeding attack’s accepted params. |
| `pyrit/executor/attack/compound/sequential_attack.py` | Excludes `system_prompt` from compound attack per-call overrides. |
| `tests/unit/executor/attack/core/test_attack_strategy.py` | Adds unit coverage for lowering behavior, ordering, conflict errors, and executor-bypass simulation. |
| `tests/unit/executor/attack/core/test_attack_executor.py` | Regression test ensuring executor-path lowering happens via the shared chokepoint. |
| `tests/unit/executor/attack/single_turn/test_prompt_sending.py` | Updates assertions to validate lowering behavior and adds delivery-to-conversation-manager test. |
| `tests/unit/executor/attack/single_turn/test_role_play.py` | Verifies `system_prompt` exclusion from params_type and explicit rejection at runtime. |

Copilot's findings

Files reviewed: 13/13 changed files
Comments generated: 0


rlundeen2 · 2026-06-19T21:14:18Z

    "| `objective` | What you are trying to get the **objective target** (the system under test) to do. Drives scoring and multi-turn adversarial prompts. |\n",
    "| `memory_labels` | A `dict[str, str]` tagged onto every prompt/response, so you can filter this run later in memory. |\n",
-    "| `prepended_conversation` | A list of `Message`s to seed the conversation before the attack's own turns (system prompt, prior history). |\n",
+    "| `system_prompt` | The objective target's system prompt, as a string. The standard one-line way to set it; PyRIT lowers it to a single `system` message at the front of the conversation. Mutually exclusive with a `system` message in `prepended_conversation`. |\n",


This is the most common case BY FAR. But it's a bit complicated because there can be multiple system prompts. Different models behave differently in these cases, which matters to PyRIT, since one of its goals is to be flexible.

Because of that, I like it being in prepended_conversation. I think it maps more cleanly to SeedPromptAttackGroups. It is more difficult to add when manually creating attacks, but I think SeedPrompts are the more common case.

In other words, my vote is to not take this change. Keep things general, and make it easy at other layers (e.g. even adding methods to the AttackExecutor is not too low a layer).
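
For illustration only (this is not code from the PR or from the reviewer), the alternative suggested in this comment could look something like a small convenience helper that builds a prepended conversation from one or more system prompts, leaving the general list-of-messages input as the only thing attacks consume. `Turn` and `with_system_prompts` are hypothetical names.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (role, text); hypothetical stand-in for a Message


def with_system_prompts(conversation: List[Turn], *system_prompts: str) -> List[Turn]:
    """Return a new conversation with the given system prompts placed first.

    Several system prompts are allowed on purpose, since the comment above
    notes that models differ in how they treat more than one.
    """
    return [("system", prompt) for prompt in system_prompts] + list(conversation)


seeded = with_system_prompts([("user", "hello")], "Be terse.", "Answer in French.")
```

Under this design the convenience lives in a helper layer, so no attack parameter needs to be mutually exclusive with another.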

hannahwestra25 · 2026-06-22T16:23:58Z

    # Conversation that is automatically prepended to the target model
    prepended_conversation: list[Message] | None = None

+    # System prompt for the objective target; lowered to a prepended system message


I feel like "lowered to a prepended system message" is not super clear. What does "lowered" mean in this context?

hannahwestra25 · 2026-06-22T16:33:26Z

-# turns — for example, to resume a prior conversation or to plant an agreeable assistant reply.
+# A prepended conversation seeds the exchange before the attack adds its own turn. For just a system
+# prompt, prefer `system_prompt=` above. Reach for a prepended conversation when you need to seed a
+# sequence of `system` / `user` / `assistant` turns — for example, to resume a prior conversation or


nit: I know you mention the error above a few times, but it might be good to specify here that they shouldn't be used together, and maybe explain more about why that is.

hannahwestra25 · 2026-06-22T16:34:16Z

    conversation_id: str = field(default_factory=lambda: str(uuid.uuid4()))

-    # System prompt for chat-based targets
-    system_prompt: str | None = None


This is breaking, right?

adrian-gavrila and others added 3 commits June 17, 2026 20:43

Standardize system_prompt as a first-class consumed attack argument

eabd84b

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove system_prompt section from attacks instructions

ccbbb4a

Behavior is grep-discoverable, runtime-enforced, and test-covered; the section did not clear the bar this slim instruction file sets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge branch 'main' into adrian-gavrila/standardize-system-prompt

21d7f9a

adrian-gavrila marked this pull request as ready for review June 18, 2026 13:08

adrian-gavrila requested a review from Copilot June 18, 2026 13:10

Copilot started reviewing on behalf of adrian-gavrila June 18, 2026 13:11

adrian-gavrila and others added 2 commits June 18, 2026 09:17

Add system_prompt example to attack configuration doc

1f08821

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge remote-tracking branch 'fork/adrian-gavrila/standardize-system-…

7508400

…prompt' into adrian-gavrila/standardize-system-prompt

Copilot AI reviewed Jun 18, 2026

adrian-gavrila changed the title ~~[DRAFT] FEAT: Standardize system_prompt as a first-class consumed attack argument~~ FEAT: Standardize system_prompt as a first-class consumed attack argument Jun 18, 2026

adrian-gavrila mentioned this pull request Jun 19, 2026

FEAT: set the objective target's system prompt from the CoPyRIT GUI #2056

Open

rlundeen2 reviewed Jun 19, 2026

hannahwestra25 reviewed Jun 22, 2026

adrian-gavrila wants to merge 5 commits into microsoft:main from adrian-gavrila:adrian-gavrila/standardize-system-prompt

4 participants
