[Microsoft.Extensions.AI.Evaluation.Reporting] Evaluation report redesign #7609
Open
grafanaKibana wants to merge 14 commits into
Signed-off-by: nikitareshetnik <reshetnik.nikita@gmail.com>
…ified comparisons
…ed unused styles, refactored context and metric formatting
…pendency
Closes #7593.
Summary
Redesigns the generated AI Evaluation HTML report (the React app under Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript) to match the FluentUI 2 mockup agreed on in #7593. This is a visual/UX refresh only: the same Dataset/ScenarioRunResult/EvaluationMetric data is read through ReportContext, and there are no data or API changes.
The old single-page layout is replaced by four focused, tab-based views driven by a shared app shell:
The same components drive both the standalone report and the Azure DevOps extension tab.
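Because the data is unchanged, the new views differ only in how they derive display state from it. The sketch below illustrates that kind of derivation: ordering runs chronologically, flagging scenarios that are absent from the previous run as "New", and ranking movers by absolute score change. It assumes simplified, hypothetical shapes (RunSummary, ScenarioScore, Mover) and hypothetical helpers (orderRunsChronologically, computeMovers) rather than the real Dataset/ScenarioRunResult/EvaluationMetric types. It is not the PR's actual viewModels/scoring code.

```typescript
// Hypothetical, simplified shapes. The real report reads Dataset /
// ScenarioRunResult / EvaluationMetric through ReportContext.
interface ScenarioScore {
  scenario: string; // scenario name
  score: number; // numeric metric value for this scenario in one run
}

interface RunSummary {
  executionName: string;
  creationTime: string; // ISO-8601 timestamp
  scores: ScenarioScore[];
}

interface Mover {
  scenario: string;
  previous: number | undefined; // undefined when there is no earlier value
  current: number;
  delta: number; // current - previous, or 0 when there is no earlier value
  isNew: boolean; // present in the latest run but not in the run before it
}

// Oldest run first, so the latest run is always the last element.
// Sorts a copy so shared context data is never mutated.
function orderRunsChronologically(runs: RunSummary[]): RunSummary[] {
  return [...runs].sort(
    (a, b) => Date.parse(a.creationTime) - Date.parse(b.creationTime),
  );
}

// Compares the latest run with the run immediately before it.
function computeMovers(runs: RunSummary[]): Mover[] {
  const ordered = orderRunsChronologically(runs);
  const latest: RunSummary | undefined = ordered[ordered.length - 1];
  if (latest === undefined) return [];
  const prior: RunSummary | undefined =
    ordered.length > 1 ? ordered[ordered.length - 2] : undefined;

  const priorScores = new Map<string, number>(
    (prior?.scores ?? []).map((s): [string, number] => [s.scenario, s.score]),
  );

  const movers = latest.scores.map((s): Mover => {
    const previous = priorScores.get(s.scenario);
    return {
      scenario: s.scenario,
      previous,
      current: s.score,
      delta: previous === undefined ? 0 : s.score - previous,
      // "New" only makes sense when there is an earlier run to compare with.
      isNew: prior !== undefined && previous === undefined,
    };
  });

  // Largest absolute change first; ties broken by name for a stable order.
  return movers.sort(
    (a, b) =>
      Math.abs(b.delta) - Math.abs(a.delta) ||
      a.scenario.localeCompare(b.scenario),
  );
}
```

For example, comparing a run that scored scenarios a=3 and b=5 with a later run that scored a=4, b=2 and c=5 yields the order b (-3), a (+1), c (New). Sorting by absolute delta puts regressions and improvements on equal footing. A real view might instead separate the two groups.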
What changed
- Restructured components/ into feature folders (previously a flat directory):
  - shell/: App, AppShell (top bar + pivot), SidebarTree, ViewRouter, theming (theme.ts/theme.css), ADO host-resize hook
  - core/: ReportContext, Summary, viewModels, scoring
  - overview/, cases/, history/ (… TranscriptBlock, MetricPanel, TrendChart, dumbbellGeometry, …)
  - styles/: makeStyles slots
  - index.ts
- … (App, MetricCard, PassFailBar, ScenarioTree, ScoreNodeHistory, TagsDisplay, …) and the old flat ReportContext/Summary/Styles.
- … (theme.css token layer + per-component Griffel slots), with light/dark parity and an acrylic material for elevated surfaces.
- html-report/gen-devdata.js: a deterministic dev-data generator so the report can be run and reviewed locally with realistic multi-run data.
New dependencies
- react-markdown + remark-gfm: markdown/GFM rendering in transcripts and reasons.
- vitest, @testing-library/react, @testing-library/jest-dom, jsdom, @vitest/coverage-v8.
Testing
Adds a test suite (none existed before): 87 tests across 11 files (vitest + Testing Library). They cover the load-bearing behavior the redesign must preserve: chronological run ordering, movers/comparison ordering, per-execution re-scoping, the "New" badge, transcript rendering, ADO theme/persistKey, and the Cases sort/filter.
Verification
- npm run build (tsc composite + vite build): clean; single-file bundle ~860 kB.
- npm test: 87/87 pass.
- eslint: 0 errors (2 inherent react-refresh warnings on the context/entry files).
Risk
As called out in #7593, the main risk is visual regression in the shared components, since the same React app drives both the standalone report and the Azure DevOps extension; both hosts were exercised. There are no API or data-shape changes.