[Microsoft.Extensions.AI.Evaluation.Reporting] Evaluation report redesign #7609

Open. grafanaKibana wants to merge 14 commits into dotnet:main from grafanaKibana:ai-eval-report-redesign

grafanaKibana commented Jul 4, 2026

Closes #7593.

Summary

Redesigns the generated AI Evaluation HTML report (the React app under Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript) to match the FluentUI 2 mockup agreed in #7593. This is a visual/UX refresh only: the same Dataset / ScenarioRunResult / EvaluationMetric data is read through ReportContext, with no data or API changes.

The old single-page layout is replaced by four focused, tab-based views driven by a shared app shell:

  • Overview — headline KPIs (pass rate, cases failing, scenarios fully passing, good ratings), biggest movers vs. the previous run, a "needs attention" panel, and pass-rate-by-scenario-group.
  • Cases — searchable/taggable case list with inline drill-down (transcript + per-metric panel + diagnostics), pagination, "show failed" filter, and a scenario sort control.
  • History — per-metric trend chart and dumbbell deltas across runs.
  • Comparison — side-by-side A/B comparison of two runs, per scenario/metric.
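For reviewers, here is a rough sketch of how the Overview pass rate and "biggest movers" figures could be derived from two runs. It is an illustration only, not the code in this PR: the `Run`, `CaseResult`, `MetricResult`, and `Mover` shapes and the function names are hypothetical simplifications, and it assumes a case passes only when none of its metrics failed and that movers are ranked by absolute change in per-metric pass rate.

```typescript
// Hypothetical, simplified shapes; the report's real view models are richer.
type MetricResult = { name: string; failed: boolean };
type CaseResult = { scenario: string; metrics: MetricResult[] };
type Run = { executionName: string; cases: CaseResult[] };
type Mover = { metric: string; delta: number };

// Assumption: a case passes only if none of its metrics failed.
function casePassed(c: CaseResult): boolean {
  return c.metrics.every((m) => !m.failed);
}

// Fraction of passing cases in a run (0 for an empty run).
function passRate(run: Run): number {
  if (run.cases.length === 0) return 0;
  return run.cases.filter(casePassed).length / run.cases.length;
}

// Pass rate per metric name across all cases in a run.
function metricPassRates(run: Run): Map<string, number> {
  const tally = new Map<string, { passed: number; total: number }>();
  for (const c of run.cases) {
    for (const m of c.metrics) {
      const t = tally.get(m.name) ?? { passed: 0, total: 0 };
      t.total += 1;
      if (!m.failed) t.passed += 1;
      tally.set(m.name, t);
    }
  }
  const rates = new Map<string, number>();
  tally.forEach((t, name) => rates.set(name, t.passed / t.total));
  return rates;
}

// Metrics present in both runs, largest absolute change first.
// Ties are broken by metric name so the order is stable.
function biggestMovers(previous: Run, current: Run, limit = 3): Mover[] {
  const before = metricPassRates(previous);
  const movers: Mover[] = [];
  metricPassRates(current).forEach((rate, metric) => {
    const prev = before.get(metric);
    if (prev !== undefined) movers.push({ metric, delta: rate - prev });
  });
  return movers
    .sort(
      (a, b) =>
        Math.abs(b.delta) - Math.abs(a.delta) ||
        a.metric.localeCompare(b.metric)
    )
    .slice(0, limit);
}
```

Metrics that appear only in the current run are skipped here because they have no previous value to compare against; how the real report treats them (for example via the "New" badge) is not modeled in this sketch.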

The same components drive both the standalone report and the Azure DevOps extension tab.

What changed

Restructured components/ into feature folders (previously a flat directory):

| Folder | Contents |
| --- | --- |
| shell/ | app frame: App, AppShell (top bar + pivot), SidebarTree, ViewRouter, theming (theme.ts / theme.css), ADO host-resize hook |
| core/ | data/state: ReportContext, Summary, viewModels, scoring |
| overview/, cases/, history/ | the four views + their pieces (TranscriptBlock, MetricPanel, TrendChart, dumbbellGeometry, …) |
| styles/ | shared Griffel makeStyles slots |
| index.ts | public barrel; internal files don't import through it |

  • Removed 14 superseded components (App, MetricCard, PassFailBar, ScenarioTree, ScoreNodeHistory, TagsDisplay, …) and the old flat ReportContext/Summary/Styles.
  • Styling moved onto FluentUI 2 design tokens (a theme.css token layer + per-component Griffel slots), with light/dark parity and an acrylic material for elevated surfaces.
  • Added html-report/gen-devdata.js — a deterministic dev-data generator so the report can be run and reviewed locally with realistic multi-run data.
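To illustrate what "deterministic" means for dev data, the sketch below generates multi-run data from a seeded pseudo-random generator and a fixed base timestamp, so the same seed always yields the same output. This is an assumption about the general approach, not a copy of gen-devdata.js: the mulberry32-style generator, the `DevRun` / `DevCase` / `DevMetric` shapes, the metric names, and the "score below 3 fails" rule are all hypothetical.

```typescript
// Hypothetical shapes for illustration; not the report's real data model.
type DevMetric = { name: string; score: number; failed: boolean };
type DevCase = { scenario: string; metrics: DevMetric[] };
type DevRun = { executionName: string; createdAt: string; cases: DevCase[] };

// mulberry32-style seeded PRNG: the same seed gives the same sequence
// of numbers in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const METRIC_NAMES = ["Coherence", "Fluency", "Relevance"];
const ONE_DAY_MS = 86400000;

function generateDevData(seed: number, runCount: number, casesPerRun: number): DevRun[] {
  const rand = mulberry32(seed);
  // Fixed base time instead of Date.now(), so output never depends on the clock.
  const baseTime = Date.UTC(2026, 0, 1);
  const runs: DevRun[] = [];
  for (let r = 0; r < runCount; r++) {
    const cases: DevCase[] = [];
    for (let c = 0; c < casesPerRun; c++) {
      const metrics = METRIC_NAMES.map((name) => {
        const score = 1 + Math.floor(rand() * 5); // integer in 1..5
        return { name, score, failed: score < 3 }; // assumed pass threshold
      });
      cases.push({ scenario: `Scenario ${c + 1}`, metrics });
    }
    runs.push({
      executionName: `run-${r + 1}`,
      createdAt: new Date(baseTime + r * ONE_DAY_MS).toISOString(),
      cases,
    });
  }
  return runs;
}
```

Calling `generateDevData(42, 3, 4)` twice produces structurally identical results, which keeps screenshots and review sessions comparable between machines.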

New dependencies

  • react-markdown + remark-gfm — markdown/GFM rendering in transcripts and reasons.
  • Dev/test only: vitest, @testing-library/react, @testing-library/jest-dom, jsdom, @vitest/coverage-v8.

Testing

Adds a test suite where none existed before: 87 tests across 11 files (vitest + Testing Library). It covers the load-bearing behavior the redesign must preserve: chronological run ordering, movers/comparison ordering, per-execution re-scoping, the "New" badge, transcript rendering, ADO theme/persistKey, and the Cases sort/filter.
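As an illustration of two of the behaviors listed above, the sketch below shows one way chronological run ordering and the Cases sort/filter could be expressed as pure functions that are easy to unit test. The `RunSummary`, `CaseRow`, and `CaseQuery` types and both function names are hypothetical and are not taken from the PR's code.

```typescript
// Hypothetical shapes for illustration only.
type RunSummary = { executionName: string; createdAt: string };
type CaseRow = { scenario: string; failed: boolean };
type CaseQuery = { search: string; failedOnly: boolean; sort: "asc" | "desc" };

// Oldest run first. Works on a copy so the caller's array is not mutated;
// ties on timestamp fall back to the execution name for a stable order.
function sortChronologically(runs: RunSummary[]): RunSummary[] {
  return runs
    .slice()
    .sort(
      (a, b) =>
        Date.parse(a.createdAt) - Date.parse(b.createdAt) ||
        a.executionName.localeCompare(b.executionName)
    );
}

// Applies the "show failed" filter, a case-insensitive scenario search,
// and the scenario sort direction, in that order.
function selectCases(rows: CaseRow[], query: CaseQuery): CaseRow[] {
  const needle = query.search.trim().toLowerCase();
  const direction = query.sort === "asc" ? 1 : -1;
  return rows
    .filter((r) => !query.failedOnly || r.failed)
    .filter((r) => needle === "" || r.scenario.toLowerCase().indexOf(needle) !== -1)
    .sort((a, b) => direction * a.scenario.localeCompare(b.scenario));
}
```

Keeping this logic outside the components means a test can assert on plain arrays without rendering anything; whether the PR structures its code this way is not stated in the description.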

Verification

  • npm run build (tsc composite + vite build) — clean; single-file bundle ~860 kB.
  • npm test — 87/87 pass.
  • eslint — 0 errors (2 inherent react-refresh warnings on the context/entry files).
  • Manually verified both light and dark themes across all four tabs.

Risk

As called out in #7593, the main risk is visual regression in the shared components, since the same React app drives both the standalone report and the Azure DevOps extension. Both hosts were exercised. There are no API or data-shape changes.

Microsoft Reviewers: Open in CodeFlow

grafanaKibana and others added 13 commits July 1, 2026 14:31
Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>
…ified comparisons

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>
…ed unused styles, refactored context and metric formatting

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>
@grafanaKibana

Author

@dotnet-policy-service agree

…pendency

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>
Development

Successfully merging this pull request may close these issues.

[AI Evaluation] Redesign the generated HTML report
