[BREAKING] MAINT: Standardize garak.encoding defaults and fix atomic-attack name collisions #2058
Open
varunj-msft wants to merge 1 commit into
rlundeen2
reviewed
Jun 20, 2026
| """ | ||
|
|
||
| # Aggregate member | ||
| # Aggregate members |
Contributor
Should we make this run more attacks? What's the run time, and how does it compare to other scanners?
I think a target of 10-20 minutes is good, and this may finish too quickly.
rlundeen2
reviewed
Jun 20, 2026
| # ``encoding_name`` drives strategy selection and user-facing grouping (display_group); | ||
| # ``variant_slug`` is unique per row so that atomic-attack names stay unique even when one | ||
| # encoding name maps to multiple converter variants (e.g. base64, ascii85). | ||
| # NOTE: some base64 variants are near-duplicates (default == standard_b64encode; b2a only |
Contributor
Should we trim the near-duplicate base64 variants now, along with the version bump?
Description
Part of the Standardizing Scenarios effort. This standardizes the garak.encoding
scenario so its default run is fast and representative instead of exhaustive, and
fixes a latent atomic-attack naming bug.
Two changes:
1. Default strategy is now a curated DEFAULT aggregate (Base16, ROT13, MorseCode: one
base-N, one substitution cipher, one symbolic alphabet) instead of ALL. This drops a
default run from 106 to 16 atomic attacks. ALL is still available for an exhaustive
run. The fast path is --strategies rot13 --max-dataset-size 1.
2. Atomic-attack names were not unique: every converter variant of an encoding shared
the encoding name (e.g. all four base64 variants × five prompt configs were named
"base64"). Since results and the display map are keyed by atomic_attack_name, those
collapsed to a single key, corrupting result tracking and --resume. Names are now
unique per variant (e.g. base64_urlsafe_decode0), and display_group keeps the
per-encoding grouping in reports.
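The collision described above can be illustrated with a small standalone sketch. This is not the PR's code: the variant slugs, the `{encoding}_{slug}_{index}` name format, and the dictionary names are assumptions made up for illustration. Only the counts (four base64 variants, five prompt configs) come from the description.

```python
from collections import defaultdict

ENCODING = "base64"
# Hypothetical slugs; the description only states that base64 has four variants.
VARIANT_SLUGS = ["default", "standard", "urlsafe", "b2a"]
PROMPT_CONFIGS = range(5)  # five prompt configs per variant

# Old behaviour: every atomic attack is keyed by the bare encoding name,
# so each write overwrites the previous one.
old_results = {}
for slug in VARIANT_SLUGS:
    for cfg in PROMPT_CONFIGS:
        old_results[ENCODING] = (slug, cfg)

# New behaviour: the key is unique per row, and a separate map keeps
# the per-encoding grouping used for reporting.
new_results = {}
display_group = {}
for slug in VARIANT_SLUGS:
    for cfg in PROMPT_CONFIGS:
        name = f"{ENCODING}_{slug}_{cfg}"  # illustrative format only
        new_results[name] = (slug, cfg)
        display_group[name] = ENCODING

# Reports can still group the unique names back under one encoding.
grouped = defaultdict(list)
for name, group in display_group.items():
    grouped[group].append(name)

print(len(old_results))         # 1
print(len(new_results))         # 20
print(len(grouped["base64"]))   # 20
```

With a shared key, 20 atomic attacks collapse into one entry. With unique keys, all 20 survive and the grouping map still presents them under a single "base64" heading.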
The encoding axis here is the scheme, not an attack technique, so SINGLE_TURN/MULTI_TURN
aggregates don't apply and are intentionally not added.
Breaking: the same constructor call now produces different default atomic attacks, so the
scenario VERSION is bumped 1 -> 2. Running --resume against an old result now raises a clear
ValueError instead of silently merging incompatible runs. Public API and constructor
signatures are unchanged.
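A version guard of this kind might look roughly like the sketch below. `StoredRun` and `ensure_resumable` are hypothetical names, not the project's API; the sketch only assumes that a stored result records the scenario name and version it was produced with.

```python
from dataclasses import dataclass

CURRENT_VERSION = 2  # bumped from 1, per the description


@dataclass(frozen=True)
class StoredRun:
    """Hypothetical record of a previously saved scenario result."""
    scenario: str
    version: int


def ensure_resumable(stored: StoredRun, scenario: str,
                     version: int = CURRENT_VERSION) -> None:
    """Raise ValueError if the stored run cannot be merged with this one."""
    if stored.scenario != scenario:
        raise ValueError(
            f"Cannot resume {scenario!r}: stored result belongs to "
            f"{stored.scenario!r}."
        )
    if stored.version != version:
        raise ValueError(
            f"Cannot resume {scenario!r}: stored result is version "
            f"{stored.version}, current scenario is version {version}. "
            "Start a new run instead."
        )


# A result saved by the current version resumes without complaint.
ensure_resumable(StoredRun("garak.encoding", 2), "garak.encoding")

# A result saved before the bump fails loudly instead of being merged.
try:
    ensure_resumable(StoredRun("garak.encoding", 1), "garak.encoding")
except ValueError as exc:
    print(exc)
```

Failing at resume time keeps old and new atomic-attack sets from being mixed under one result, at the cost of requiring a fresh run after upgrading.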
Also adds a backend regression test pinning the real EncodingDatasetConfiguration
round-trip, since the backend silently degrades a lost config subclass to a base config.
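To show why a round-trip test is worth pinning, here is a toy model of the failure mode. It is a sketch under assumptions: `BaseConfig`, `EncodingConfig`, the `seed_label` field, and the payload format are all invented stand-ins, not the backend's real classes or serialization.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class BaseConfig:
    max_dataset_size: int = 10


@dataclass
class EncodingConfig(BaseConfig):
    seed_label: str = "default"  # hypothetical subclass-only field


_REGISTRY = {"BaseConfig": BaseConfig, "EncodingConfig": EncodingConfig}


def to_payload(config):
    """Serialize a config together with its class name."""
    return {"type": type(config).__name__, "fields": asdict(config)}


def from_payload(payload):
    """Rebuild a config; unknown or missing types fall back to BaseConfig."""
    cls = _REGISTRY.get(payload.get("type"), BaseConfig)  # silent fallback
    allowed = {f.name for f in fields(cls)}
    kwargs = {k: v for k, v in payload["fields"].items() if k in allowed}
    return cls(**kwargs)


def test_round_trip_preserves_subclass():
    original = EncodingConfig(max_dataset_size=1, seed_label="x")
    restored = from_payload(to_payload(original))
    # Pin the exact type, not just field equality on the base class.
    assert type(restored) is EncodingConfig
    assert restored == original


test_round_trip_preserves_subclass()

# If the type tag is lost, the fallback hides the problem instead of raising.
degraded = from_payload({"fields": {"max_dataset_size": 1, "seed_label": "x"}})
print(type(degraded).__name__)  # BaseConfig
```

Because the fallback never raises, only an assertion on the restored object's exact type catches the regression.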
Tests and Documentation
Tests (tests/unit/scenario/garak/test_encoding.py, tests/unit/backend/test_scenario_run_service.py):
Full scenario unit suite passes (708). ruff, ruff format, and ty are clean.
Documentation (doc/scanner/garak.py + .ipynb):