FEAT: Standardize system_prompt as a first-class consumed attack argument #2040

Open
adrian-gavrila wants to merge 5 commits into microsoft:main from adrian-gavrila:adrian-gavrila/standardize-system-prompt

@adrian-gavrila adrian-gavrila commented Jun 18, 2026

Contributor

Description

Makes system_prompt= a first-class, consumed attack argument that is always delivered to the
objective (system-under-test) target. Previously there were three inconsistent ways to set the
objective target's system prompt, including a SingleTurnAttackContext.system_prompt field that was
declared but never consumed (a silent no-op).

This standardizes on a single mechanism:

  • system_prompt is lifted to AttackParameters, so both single-turn and multi-turn attacks accept
    it with one source of truth.
  • It is lowered to a single system-role message on prepended_conversation at the
    AttackStrategy.execute_with_context_async chokepoint — the one path that both single-shot
    (execute_async) and batched (AttackExecutor) runs cross — so delivery is structurally
    guaranteed and runs exactly once per task (outside the retry loop).
  • Supplying both system_prompt= and a system-role message in prepended_conversation raises a
    clear ValueError (one source of truth).
  • The dead SingleTurnAttackContext.system_prompt field is removed.
  • Self-seeding attacks that exclude prepended_conversation from their params (flip_attack,
    skeleton_key, many_shot_jailbreak, context_compliance, role_play, sequential_attack)
    also exclude system_prompt, rejecting it explicitly rather than silently dropping it.

prepended_conversation= remains the advanced path for full multi-message seeds.
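
To make "lowered" concrete, here is a minimal, self-contained sketch of the behavior described in the bullets above: a system_prompt string becomes one leading system-role message, and supplying both sources raises a ValueError. The `Message` and `Context` classes and the `lower_system_prompt` function are hypothetical stand-ins for illustration, not PyRIT's actual types or the code in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a conversation message (not PyRIT's Message)."""

    role: str
    content: str


@dataclass
class Context:
    """Hypothetical stand-in for an attack context."""

    system_prompt: str | None = None
    prepended_conversation: list[Message] = field(default_factory=list)


def lower_system_prompt(context: Context) -> None:
    """Insert system_prompt as a single leading system message."""
    if context.system_prompt is None:
        return  # nothing to lower
    # One source of truth: refuse to guess which system prompt should win.
    if any(m.role == "system" for m in context.prepended_conversation):
        raise ValueError(
            "Pass either system_prompt= or a system message in "
            "prepended_conversation, not both."
        )
    context.prepended_conversation.insert(
        0, Message(role="system", content=context.system_prompt)
    )


ctx = Context(
    system_prompt="You are a terse assistant.",
    prepended_conversation=[Message(role="user", content="hello")],
)
lower_system_prompt(ctx)
roles = [m.role for m in ctx.prepended_conversation]
```

In this sketch the system message always lands at index 0, ahead of any seeded user or assistant turns, which is the "single leading system message" shape the description refers to.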

Implements ADO #9697 (framework standardization track). The CoPyRIT GUI half of that story is tracked
separately and is not part of this PR.

Tests and Documentation

  • Tests: added/updated coverage for lowering at the chokepoint, the AttackExecutor batch path
    (regression test for the executor-bypass case), end-to-end delivery to the target, the
    both-supplied ValueError, the self-seeding carve-outs, and single/multi-turn parity. Replaced the
    previous context.system_prompt == ... assertion (which locked in the no-op) with a behavioral
    assertion. Full unit suite green (9785 passed, 119 skipped).
  • Documentation: added a "Setting a system prompt" section to
    doc/code/executor/3_attack_configuration (the attack-inputs page), alongside the existing
    prepended_conversation material, plus a system_prompt row in the inputs table. Regenerated the
    paired notebook with JupyText (executed).
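
As an illustration of what a behavioral assertion for the "exactly once per task, outside the retry loop" guarantee could look like, the sketch below lowers the system prompt before a retry loop and then checks that every attempt saw exactly one system message. The `execute_task` function, the tuple-based conversation, and the use of RuntimeError as the retryable failure are all assumptions made for this example; they do not mirror the PR's test code or PyRIT's retry mechanism.

```python
from __future__ import annotations

from typing import Callable


def execute_task(
    system_prompt: str | None,
    prepended: list[tuple[str, str]],
    attempt: Callable[[list[tuple[str, str]]], str],
    max_attempts: int = 3,
) -> str:
    """Hypothetical task runner: lower once, then retry the attempt."""
    # Lowering happens here, once, before any attempt is made.
    conversation = list(prepended)
    if system_prompt is not None:
        conversation.insert(0, ("system", system_prompt))

    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            return attempt(conversation)
        except RuntimeError as exc:  # treated as retryable in this sketch only
            last_error = exc
    raise RuntimeError("all attempts failed") from last_error


seen: list[list[tuple[str, str]]] = []


def flaky_attempt(conversation: list[tuple[str, str]]) -> str:
    """Fail twice, then succeed, recording what each attempt received."""
    seen.append(list(conversation))
    if len(seen) < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = execute_task("You are terse.", [("user", "hello")], flaky_attempt)
system_counts = [sum(1 for role, _ in conv if role == "system") for conv in seen]
```

If the insertion were placed inside the loop instead, the second and third attempts would see two and three system messages, which is the kind of regression a behavioral assertion catches and a stored-field assertion does not.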

adrian-gavrila and others added 3 commits June 17, 2026 20:43
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Behavior is grep-discoverable, runtime-enforced, and test-covered; the section
did not clear the bar this slim instruction file sets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adrian-gavrila adrian-gavrila marked this pull request as ready for review June 18, 2026 13:08
@adrian-gavrila adrian-gavrila requested a review from Copilot June 18, 2026 13:10
adrian-gavrila and others added 2 commits June 18, 2026 09:17
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…prompt' into adrian-gavrila/standardize-system-prompt

Copilot AI left a comment

Contributor

⚠️ Human review recommended

It changes a core execution chokepoint and parameter contract across many attacks, so a human reviewer should validate compatibility and any downstream behavioral impact beyond the added unit coverage.

Pull request overview

Standardizes system_prompt= as a first-class, consumed attack argument by lifting it into AttackParameters and reliably lowering it into a single leading system message at the shared AttackStrategy.execute_with_context_async entrypoint, ensuring delivery for both direct (execute_async) and executor-driven (AttackExecutor) runs.

Changes:

  • Add system_prompt: str | None to AttackParameters and lower it into context.prepended_conversation at AttackStrategy.execute_with_context_async, with a conflict ValueError when a system-role prepended message is already present.
  • Remove the dead SingleTurnAttackContext.system_prompt field and update tests to assert behavioral delivery rather than no-op state.
  • Explicitly exclude system_prompt from self-seeding / internally-constructed prompt attacks’ params_type and add unit coverage for rejection and executor-path regression.
File summaries

| File | Description |
| --- | --- |
| pyrit/executor/attack/core/attack_parameters.py | Adds system_prompt to the canonical attack parameter contract. |
| pyrit/executor/attack/core/attack_strategy.py | Lowers system_prompt into a prepended system message at the shared chokepoint and enforces conflict rules. |
| pyrit/executor/attack/single_turn/single_turn_attack_strategy.py | Removes unused SingleTurnAttackContext.system_prompt field. |
| pyrit/executor/attack/single_turn/flip_attack.py | Excludes system_prompt from a self-seeding attack’s accepted params. |
| pyrit/executor/attack/single_turn/skeleton_key.py | Excludes system_prompt from a self-seeding attack’s accepted params. |
| pyrit/executor/attack/single_turn/many_shot_jailbreak.py | Excludes system_prompt from a self-seeding attack’s accepted params. |
| pyrit/executor/attack/single_turn/context_compliance.py | Excludes system_prompt from a self-seeding attack’s accepted params. |
| pyrit/executor/attack/single_turn/role_play.py | Excludes system_prompt from a self-seeding attack’s accepted params. |
| pyrit/executor/attack/compound/sequential_attack.py | Excludes system_prompt from compound attack per-call overrides. |
| tests/unit/executor/attack/core/test_attack_strategy.py | Adds unit coverage for lowering behavior, ordering, conflict errors, and executor-bypass simulation. |
| tests/unit/executor/attack/core/test_attack_executor.py | Regression test ensuring executor-path lowering happens via the shared chokepoint. |
| tests/unit/executor/attack/single_turn/test_prompt_sending.py | Updates assertions to validate lowering behavior and adds delivery-to-conversation-manager test. |
| tests/unit/executor/attack/single_turn/test_role_play.py | Verifies system_prompt exclusion from params_type and explicit rejection at runtime. |
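
The "excludes system_prompt" entries for self-seeding attacks describe rejecting the argument explicitly instead of silently dropping it. A rough sketch of that idea follows. The `Params` dataclass, the `SELF_SEEDING_EXCLUDED` set, and the `build_params` helper are hypothetical names invented for this example; the PR expresses the exclusion through each attack's params_type, whose real mechanics are not shown on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical parameter bag; not PyRIT's AttackParameters."""

    objective: str
    system_prompt: str | None = None
    prepended_conversation: list | None = None


# Fields a self-seeding attack builds itself and therefore refuses from callers.
SELF_SEEDING_EXCLUDED = frozenset({"system_prompt", "prepended_conversation"})


def build_params(excluded: frozenset[str], **kwargs) -> Params:
    """Build Params, raising if the caller passed an excluded field."""
    rejected = sorted(excluded.intersection(kwargs))
    if rejected:
        # Fail loudly: a silently ignored system prompt is a hidden no-op.
        raise ValueError(f"This attack does not accept: {', '.join(rejected)}")
    return Params(**kwargs)


accepted = build_params(SELF_SEEDING_EXCLUDED, objective="probe the target")

try:
    build_params(
        SELF_SEEDING_EXCLUDED,
        objective="probe the target",
        system_prompt="You are helpful.",
    )
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

The design point is the same one the description makes about the removed SingleTurnAttackContext.system_prompt field: an argument that is accepted but never consumed is harder to debug than one that is rejected up front.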

Copilot's findings

  • Files reviewed: 13/13 changed files
  • Comments generated: 0


@adrian-gavrila adrian-gavrila changed the title from "[DRAFT] FEAT: Standardize system_prompt as a first-class consumed attack argument" to "FEAT: Standardize system_prompt as a first-class consumed attack argument" Jun 18, 2026
"| `objective` | What you are trying to get the **objective target** (the system under test) to do. Drives scoring and multi-turn adversarial prompts. |\n",
"| `memory_labels` | A `dict[str, str]` tagged onto every prompt/response, so you can filter this run later in memory. |\n",
"| `prepended_conversation` | A list of `Message`s to seed the conversation before the attack's own turns (system prompt, prior history). |\n",
"| `system_prompt` | The objective target's system prompt, as a string. The standard one-line way to set it; PyRIT lowers it to a single `system` message at the front of the conversation. Mutually exclusive with a `system` message in `prepended_conversation`. |\n",

Contributor

This is the most common case BY FAR. But it's a bit complicated because there can be multiple system prompts. Different models behave differently in these cases, which matters to PyRIT, since being flexible is one of its goals.

Because of that, I like it being in prepended_conversation. I think it maps more cleanly to SeedPromptAttackGroups. It is more difficult to add when manually creating attacks, but I think SeedPrompts are the more common case.

@rlundeen2 rlundeen2 Jun 22, 2026

Contributor

In other words, my vote is to not take this change. Keep things general. And make it easy at other layers (e.g. even adding methods to the AttackExecutor is not too low)

# Conversation that is automatically prepended to the target model
prepended_conversation: list[Message] | None = None

# System prompt for the objective target; lowered to a prepended system message

@hannahwestra25 hannahwestra25 Jun 22, 2026

Contributor

I feel like "lowered to a prepended system message" is not super clear: what does "lowered" mean in this context?

# turns — for example, to resume a prior conversation or to plant an agreeable assistant reply.
# A prepended conversation seeds the exchange before the attack adds its own turn. For just a system
# prompt, prefer `system_prompt=` above. Reach for a prepended conversation when you need to seed a
# sequence of `system` / `user` / `assistant` turns — for example, to resume a prior conversation or

Contributor

nit: I know you mention the error above a few times, but it might be good to specify here that they shouldn't be used together, and maybe explain why that is.

conversation_id: str = field(default_factory=lambda: str(uuid.uuid4()))

# System prompt for chat-based targets
system_prompt: str | None = None

Contributor

This is breaking, right?
