[Microsoft.Extensions.AI.Evaluation.Reporting] Evaluation report redesign by grafanaKibana · Pull Request #7609 · dotnet/extensions

grafanaKibana · 2026-07-04T06:13:33Z

Closes #7593.

Summary

Redesigns the generated AI Evaluation HTML report (the React app under Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript) to match the FluentUI 2 mockup agreed on in #7593. This is a visual/UX refresh only: the same Dataset / ScenarioRunResult / EvaluationMetric data is read through ReportContext, with no data or API changes.

The old single-page layout is replaced by four focused, tab-based views driven by a shared app shell:

Overview — headline KPIs (pass rate, cases failing, scenarios fully passing, good ratings), biggest movers vs. the previous run, a "needs attention" panel, and pass-rate-by-scenario-group.
Cases — searchable/taggable case list with inline drill-down (transcript + per-metric panel + diagnostics), pagination, "show failed" filter, and a scenario sort control.
History — per-metric trend chart and dumbbell deltas across runs.
Comparison — side-by-side A/B comparison of two runs, per scenario/metric.
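
As a rough illustration of the Overview KPIs listed above, the sketch below computes a pass rate and the "biggest movers" between two runs. The `CaseResult` shape and the `passRate`, `passRateByScenario`, and `biggestMovers` helpers are hypothetical names made up for this example; they are not the PR's actual `viewModels` or `scoring` code.

```typescript
// Hypothetical, simplified shape. The real report reads ScenarioRunResult data via ReportContext.
interface CaseResult {
  scenario: string;
  passed: boolean;
}

// Fraction of passing cases in a run (0 when the run is empty).
function passRate(cases: CaseResult[]): number {
  if (cases.length === 0) return 0;
  return cases.filter((c) => c.passed).length / cases.length;
}

// Pass rate per scenario, keyed by scenario name.
function passRateByScenario(cases: CaseResult[]): Map<string, number> {
  const groups = new Map<string, CaseResult[]>();
  for (const c of cases) {
    const bucket = groups.get(c.scenario) ?? [];
    bucket.push(c);
    groups.set(c.scenario, bucket);
  }
  const rates = new Map<string, number>();
  groups.forEach((bucket, name) => rates.set(name, passRate(bucket)));
  return rates;
}

interface Mover {
  scenario: string;
  delta: number; // current pass rate minus previous pass rate
}

// Scenarios present in both runs whose pass rate changed, largest absolute change first.
function biggestMovers(previous: CaseResult[], current: CaseResult[], limit = 3): Mover[] {
  const before = passRateByScenario(previous);
  const after = passRateByScenario(current);
  const movers: Mover[] = [];
  after.forEach((rate, scenario) => {
    const old = before.get(scenario);
    if (old !== undefined && old !== rate) movers.push({ scenario, delta: rate - old });
  });
  return movers.sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta)).slice(0, limit);
}
```

In this sketch, scenarios that appear only in the current run are skipped rather than reported as movers, on the assumption that they would be surfaced separately (for example with a "New" badge).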

The same components drive both the standalone report and the Azure DevOps extension tab.

What changed

Restructured components/ into feature folders (previously a flat directory):

| Folder | Contents |
| --- | --- |
| `shell/` | app frame: `App`, `AppShell` (top bar + pivot), `SidebarTree`, `ViewRouter`, theming (`theme.ts` / `theme.css`), ADO host-resize hook |
| `core/` | data/state: `ReportContext`, `Summary`, `viewModels`, `scoring` |
| `overview/`, `cases/`, `history/` | the four views + their pieces (`TranscriptBlock`, `MetricPanel`, `TrendChart`, `dumbbellGeometry`, …) |
| `styles/` | shared Griffel `makeStyles` slots |
| `index.ts` | public barrel; internal files don't import through it |
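
For readers unfamiliar with dumbbell charts, the following is a minimal sketch of the kind of calculation a geometry helper for one might perform: mapping a previous and a current metric value onto horizontal positions along a track. The `computeDumbbell` function and its parameters are assumptions for illustration only and do not reflect the actual contents of `dumbbellGeometry`.

```typescript
// Hypothetical output shape: pixel positions of the two dots plus the direction of change.
interface DumbbellGeometry {
  x1: number; // position of the previous value
  x2: number; // position of the current value
  direction: "up" | "down" | "flat";
}

// Maps two values in [min, max] onto a track of the given pixel width.
// Values outside the range are clamped; a zero-width range centers both dots.
function computeDumbbell(
  previous: number,
  current: number,
  min: number,
  max: number,
  width: number,
): DumbbellGeometry {
  const span = max - min;
  const scale = (v: number): number =>
    span === 0 ? width / 2 : ((Math.min(Math.max(v, min), max) - min) / span) * width;
  const direction = current > previous ? "up" : current < previous ? "down" : "flat";
  return { x1: scale(previous), x2: scale(current), direction };
}
```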

Removed 14 superseded components (App, MetricCard, PassFailBar, ScenarioTree, ScoreNodeHistory, TagsDisplay, …) and the old flat ReportContext/Summary/Styles.
Styling moved onto FluentUI 2 design tokens (a theme.css token layer + per-component Griffel slots), with light/dark parity and an acrylic material for elevated surfaces.
Added html-report/gen-devdata.js — a deterministic dev-data generator so the report can be run and reviewed locally with realistic multi-run data.
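
To illustrate what "deterministic" means for a dev-data generator, here is a minimal sketch that uses a small seeded pseudo-random number generator (in the style of mulberry32) so the same seed always yields the same data. The `seededRandom` and `generateDevData` names, the `DevCase` shape, the 1 to 5 score scale, and the pass threshold are all assumptions for this example; the actual gen-devdata.js may work differently.

```typescript
// Small seeded PRNG (mulberry32-style): same seed, same sequence of numbers in [0, 1).
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical record shape for generated data.
interface DevCase {
  run: string;
  scenario: string;
  score: number; // integer from 1 to 5 (assumed scale)
  passed: boolean;
}

// Generates one case per scenario per run; no clock or Math.random involved.
function generateDevData(seed: number, runs: number, scenarios: string[]): DevCase[] {
  const rand = seededRandom(seed);
  const out: DevCase[] = [];
  for (let r = 1; r <= runs; r++) {
    for (const scenario of scenarios) {
      const score = 1 + Math.floor(rand() * 5);
      out.push({ run: `run-${r}`, scenario, score, passed: score >= 4 });
    }
  }
  return out;
}
```

Because the output depends only on the seed and the arguments, two reviewers running the generator with the same inputs would see the same report.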

New dependencies

react-markdown + remark-gfm — markdown/GFM rendering in transcripts and reasons.
Dev/test only: vitest, @testing-library/react, @testing-library/jest-dom, jsdom. (@vitest/coverage-v8 was also added initially, then removed as unused in a73cf7f.)

Testing

Adds a test suite (none existed before): 87 tests across 11 files (vitest + Testing Library), covering the load-bearing behavior the redesign must preserve: chronological run ordering, movers/comparison ordering, per-execution re-scoping, the "New" badge, transcript rendering, ADO theme/persistKey, and the Cases sort/filter.
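
Two of the behaviors named above, chronological run ordering and the Cases filter, can be sketched as pure functions that are easy to unit test. Everything here is hypothetical: the `RunInfo` and `CaseRow` shapes, the `createdAt` timestamp field, and the `sortRunsChronologically` and `visibleCases` helpers are assumptions for illustration, not the PR's implementation.

```typescript
// Hypothetical run descriptor; createdAt is assumed to be an ISO 8601 timestamp.
interface RunInfo {
  name: string;
  createdAt: string;
}

// Returns a new array ordered oldest to newest, leaving the input untouched.
function sortRunsChronologically(runs: RunInfo[]): RunInfo[] {
  return runs.slice().sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
}

// Hypothetical row and filter state for the Cases view.
interface CaseRow {
  name: string;
  tags: string[];
  failed: boolean;
}

interface CaseFilter {
  search: string;
  showFailedOnly: boolean;
  page: number; // zero-based page index
  pageSize: number;
}

// Applies the "show failed" filter, a case-insensitive name/tag search, then pagination.
function visibleCases(rows: CaseRow[], filter: CaseFilter): CaseRow[] {
  const needle = filter.search.trim().toLowerCase();
  const matched = rows.filter((row) => {
    if (filter.showFailedOnly && !row.failed) return false;
    if (needle === "") return true;
    return (
      row.name.toLowerCase().indexOf(needle) >= 0 ||
      row.tags.some((tag) => tag.toLowerCase().indexOf(needle) >= 0)
    );
  });
  const start = filter.page * filter.pageSize;
  return matched.slice(start, start + filter.pageSize);
}
```

Keeping logic like this outside the React components is one way to test it without rendering anything; whether the PR structures its tests this way is not stated in the description.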

Verification

npm run build (tsc composite + vite build) — clean; single-file bundle ~860 kB.
npm test — 87/87 pass.
eslint — 0 errors (2 inherent react-refresh warnings on the context/entry files).
Manually verified both light and dark themes across all four tabs.

Risk

As called out in #7593, the main risk is visual regression in the shared components, since the same React app drives both the standalone report and the Azure DevOps extension. Both hosts were exercised. There are no API or data-shape changes.



grafanaKibana · 2026-07-04T06:16:25Z

@dotnet-policy-service agree


grafanaKibana and others added 13 commits July 1, 2026 14:31

Redesign AI Evaluation Report: Part 1

cebdffa

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>

Redesign AI Evaluation Report: Part 2

45fc08f

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>

Redesign AI Evaluation Report: Bug fixes

aa34e48

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>

Merge branch 'dotnet:main' into ai-eval-report-redesign

2e5ca3a

Redesign AI Evaluation Report: Bug fixes

458131d

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>

Merge remote-tracking branch 'origin/ai-eval-report-redesign' into ai-eval-report-redesign

e945747

Redesign AI Evaluation Report: Refactoring

9a3d7a1

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>

Redesign AI Evaluation Report: Refactoring

6d1f037

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>

Merge remote-tracking branch 'origin/ai-eval-report-redesign' into ai-eval-report-redesign

2e5a555

Redesign AI Evaluation Report: Refactoring

e051fbe

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>

Redesign AI Evaluation Report: Fixed accent acrylic

b4275a4

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>

Redesign AI Evaluation Report: Encapsulated dumbbell styles and simplified comparisons

46d6460

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>

Redesign AI Evaluation Report: Added scenario sorting dropdown, removed unused styles, refactored context and metric formatting

9c69245

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>

grafanaKibana requested a review from a team as a code owner July 4, 2026 06:13

dotnet-policy-service Bot assigned grafanaKibana Jul 4, 2026

Redesign AI Evaluation Report: Removed unused "@vitest/coverage-v8" dependency

a73cf7f

Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>


grafanaKibana wants to merge 14 commits into dotnet:main from grafanaKibana:ai-eval-report-redesign
