FEAT: Adding Garak Remote Datasets #2063

Open
rlundeen2 wants to merge 2 commits into microsoft:main from rlundeen2:rlundeen2-garak-dataset-loader

@rlundeen2

Contributor

PyRIT is porting garak's probing techniques, and many of those techniques depend on reference data (package-name registries, system-prompt libraries, audio jailbreak clips) published under the garak-llm HuggingFace org. This PR adds native seed-dataset loaders for that data so garak techniques can be wired into PyRIT scenarios without bespoke download code.

It introduces a shared _GarakRemoteDataset base (reusing the existing _RemoteDatasetLoader primitives) plus ten registered loaders: seven package-hallucination registries (pypi, npm, crates, rubygems, dart, perl, raku), two system-prompt libraries, and the audio_achilles_heel clip set. All dataset names are prefixed garak_, each row maps to one SeedPrompt with source metadata preserved, and the change ships unit tests (mocked HF data), docs/notebook updates, and the citation.
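The row-to-seed mapping described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: `SeedPromptSketch`, `rows_to_seed_prompts`, the `text` column name, and the `garak-llm/example-registry` repo id are all hypothetical stand-ins, and the real loaders build on PyRIT's `_RemoteDatasetLoader` and `SeedPrompt`, whose signatures are not shown in this description.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass(frozen=True)
class SeedPromptSketch:
    """Hypothetical stand-in for PyRIT's SeedPrompt (not the real class)."""

    value: str
    dataset_name: str
    metadata: dict[str, Any] = field(default_factory=dict)


def rows_to_seed_prompts(
    rows: Iterable[dict[str, Any]],
    *,
    short_name: str,
    source: str,
    text_column: str = "text",  # assumed column name, for illustration only
) -> list[SeedPromptSketch]:
    """Map each row to exactly one seed, keeping source metadata."""
    dataset_name = f"garak_{short_name}"  # every dataset name gets the garak_ prefix
    prompts = []
    for index, row in enumerate(rows):
        # Carry over every non-text column so provenance is not lost.
        metadata = {k: v for k, v in row.items() if k != text_column}
        metadata["source"] = source
        metadata["row_index"] = index
        prompts.append(
            SeedPromptSketch(
                value=str(row[text_column]),
                dataset_name=dataset_name,
                metadata=metadata,
            )
        )
    return prompts


# Mocked rows, in the spirit of the PR's unit tests (no network access).
mock_rows = [{"text": "requests", "ecosystem": "pypi"}, {"text": "numpy"}]
seeds = rows_to_seed_prompts(
    mock_rows,
    short_name="pypi",
    source="garak-llm/example-registry",  # placeholder repo id, not a real dataset
)
```

Keeping the mapping in one shared helper is what would let each of the ten loaders differ only in its dataset name, source, and column choice, which is presumably the motivation for the shared base class.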

rlundeen2 and others added 2 commits June 21, 2026 13:22
Adds remote seed-dataset loaders for the datasets hosted under the garak-llm HuggingFace org so garak techniques are easier to use in PyRIT scenarios.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>