feat: add data licensing source registry #337
Co-Authored-By: OpenAI Codex <codex@openai.com>
📝 Walkthrough
Adds a data licensing document, a canonical source-rights registry, schema and resolution logic, validation/export wiring, fixture generation, and updated crowd-report URL requirements across ingest and triage paths.
Changes: Data Licensing and Source Registry
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/plans/active/data-licensing-attribution.md`:
- Around lines 142-156: The documented taxonomy is missing the live generic-web
category, so update the recurring categories list to include it alongside the
existing entries. Use the existing generic-web symbol from Rights.ts and the
source-registry mappings for lta-website and web-archive-snapshot as the
reference point, and make sure the plan reflects the same category set used by
the schema and registry.
In `@packages/core/src/schema/Rights.ts`:
- Around lines 36-51: The SourceRegistryRuleMatchSchema in Rights.ts currently
allows evidenceType-only matches, but the rights matcher in
packages/fs/src/rights.ts cannot use them when sourceUrl is missing or invalid.
Fix this by either tightening SourceRegistryRuleMatchSchema to require a URL
selector alongside evidenceType, or updating the rights matching logic
(especially the selector evaluation path) so evidenceType-only rules are still
checked without relying on sourceUrl parsing.
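For the second option, a matcher could evaluate each selector on its own so that an evidenceType-only rule is checked before any URL parsing happens. This is a minimal sketch: the `RuleMatch` and `EvidenceLike` shapes, the `ruleMatches` and `parseHostname` helpers, and the `crowd-report` / `official-statement` type values are hypothetical and not taken from `packages/fs/src/rights.ts`.

```typescript
// Hypothetical shapes; the real SourceRegistryRuleMatchSchema may differ.
interface RuleMatch {
  evidenceType?: string;
  host?: string;
}

interface EvidenceLike {
  type: string;
  sourceUrl?: string;
}

// Returns the lowercase hostname, or null when the URL is missing or invalid.
function parseHostname(raw: string | undefined): string | null {
  if (raw === undefined) return null;
  try {
    return new URL(raw).hostname.toLowerCase();
  } catch {
    return null;
  }
}

// Each selector is checked independently: a rule that only sets evidenceType
// never depends on sourceUrl parsing, while a URL selector fails closed.
function ruleMatches(match: RuleMatch, evidence: EvidenceLike): boolean {
  const hasSelector = match.evidenceType !== undefined || match.host !== undefined;
  if (!hasSelector) return false; // an empty rule must not match everything

  if (match.evidenceType !== undefined && match.evidenceType !== evidence.type) {
    return false;
  }
  if (match.host !== undefined) {
    const hostname = parseHostname(evidence.sourceUrl);
    if (hostname === null || hostname !== match.host.toLowerCase()) return false;
  }
  return true;
}
```

Tightening the schema instead (requiring a URL selector whenever evidenceType is set) would make this path unnecessary; which option fits depends on whether evidenceType-only rules are intended to exist.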
In `@packages/fs/src/rights.ts`:
- Around lines 46-49: The host matching in rights.ts uses sourceUrl.host, which
includes the port and can cause false mismatches for rules like x.com versus
x.com:8443. Update the matching logic in the sourceUrlHost check to use
sourceUrl.hostname instead, keeping the normalizeHost comparison in place so the
existing rule matching behavior in the rights matching flow remains unchanged.
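The difference is visible with the WHATWG `URL` class that Node.js and browsers provide: `host` keeps a non-default port, `hostname` does not. In this sketch `normalizeHost` is a hypothetical stand-in (lowercase, strip one trailing dot) and `hostMatches` is a hypothetical helper; the real helper in `rights.ts` may normalize differently.

```typescript
// Hypothetical stand-in for normalizeHost in packages/fs/src/rights.ts.
function normalizeHost(host: string): string {
  return host.toLowerCase().replace(/\.$/, "");
}

// Compare on hostname so a port in the evidence URL cannot cause a mismatch.
function hostMatches(ruleHost: string, sourceUrl: URL): boolean {
  return normalizeHost(sourceUrl.hostname) === normalizeHost(ruleHost);
}

const withPort = new URL("https://x.com:8443/status/1");
console.log(withPort.host); // "x.com:8443"
console.log(withPort.hostname); // "x.com"
console.log(hostMatches("x.com", withPort)); // true
```

A default port such as `:443` on an `https:` URL is already dropped from `host`, so the mismatch only appears with non-default ports.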
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 884cd8b4-75a3-4727-954c-d1a22d9db990
📒 Files selected for processing (11)
- LICENSE-DATA.md
- README.md
- data/rights/source-registry.json
- docs/plans/README.md
- docs/plans/active/data-licensing-attribution.md
- packages/core/src/index.ts
- packages/core/src/schema/Rights.test.ts
- packages/core/src/schema/Rights.ts
- packages/fs/src/index.ts
- packages/fs/src/rights.test.ts
- packages/fs/src/rights.ts
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d17174f119
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f6124b04d6
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/fs/src/index.test.ts`:
- Around lines 256-264: The tests are parsing and rewriting evidence.ndjson as a
single JSON document instead of using the already-parsed bundle evidence from
listIssueBundles(). Update the relevant assertions in the evidence-related test
cases to mutate bundle.evidence directly, then write the file back in NDJSON
form while preserving all existing rows. Use the listIssueBundles() result and
the existing EvidenceSchema/EvidenceSchema.parse flow as the entry points to
locate the affected test logic.
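Because NDJSON stores one JSON value per line, the file has to be parsed and written back row by row. A minimal sketch of that round trip, assuming a hypothetical `EvidenceRow` shape with an `id` field and hypothetical `parseNdjson` / `toNdjson` / `updateRow` helpers; in the actual tests the rows would come from `bundle.evidence` returned by `listIssueBundles()` and still go through `EvidenceSchema.parse`.

```typescript
// Hypothetical row shape; real rows are validated with EvidenceSchema.parse.
interface EvidenceRow {
  id: string;
  [key: string]: unknown;
}

// One JSON value per non-empty line; JSON.parse on the whole file would throw
// as soon as the file holds more than one row.
function parseNdjson(text: string): EvidenceRow[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as EvidenceRow);
}

// Serialize every row back onto its own line, with a trailing newline.
function toNdjson(rows: readonly EvidenceRow[]): string {
  return rows.length === 0 ? "" : rows.map((row) => JSON.stringify(row)).join("\n") + "\n";
}

// Patch a single row by id and leave every other row untouched.
function updateRow(
  rows: readonly EvidenceRow[],
  id: string,
  patch: Record<string, unknown>,
): EvidenceRow[] {
  return rows.map((row) => (row.id === id ? { ...row, ...patch, id: row.id } : row));
}
```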
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 00ac726e-886c-410a-88e4-9c2e937360e7
📒 Files selected for processing (16)
- data/rights/source-registry.json
- docs/plans/active/data-licensing-attribution.md
- fixtures/ingest/crowd-report.json
- packages/core/src/schema/Rights.ts
- packages/fs/src/constants.ts
- packages/fs/src/index.test.ts
- packages/fs/src/rights.test.ts
- packages/fs/src/rights.ts
- packages/fs/src/validate.ts
- packages/ingest-contracts/README.md
- packages/ingest-contracts/src/index.test.ts
- packages/ingest-contracts/src/index.ts
- packages/triage/src/util/ingestContent/helpers/formatContentTextForIngest.test.ts
- packages/triage/src/util/ingestContent/helpers/getEvidenceProvenanceForIngestContent.test.ts
- packages/triage/src/util/ingestContent/index.test.ts
- scripts/generate-fixtures.mjs
✅ Files skipped from review due to trivial changes (2)
- packages/triage/src/util/ingestContent/helpers/getEvidenceProvenanceForIngestContent.test.ts
- packages/ingest-contracts/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- data/rights/source-registry.json
- docs/plans/active/data-licensing-attribution.md
- packages/core/src/schema/Rights.ts
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dd7ba45bf3
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f78e88d33
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/fs/src/publicExport.ts`:
- Around lines 31-33: The fallback in publicExport currently fails open because
unresolved rights checks (`!result.ok`) return the original evidence unchanged;
update the export flow in `publicExport()` so inconclusive results do not pass
through full text. Either redact the evidence when `resolveSourceRegistryRule()`
is not ok or abort the export before `validateDataRoot()`, and keep the allow
path limited to `result.rule.publicExportAllowed` only.
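A fail-closed version keeps a single allow path and treats everything else, including an inconclusive resolution, as a redaction. This sketch assumes a hypothetical `RuleResult` shape and `toPublicEvidence` helper modelled on the comment's description of `resolveSourceRegistryRule()`; the real return type and evidence fields may differ.

```typescript
// Hypothetical result shape, modelled on the review comment.
type RuleResult =
  | { ok: true; rule: { id: string; publicExportAllowed: boolean } }
  | { ok: false; reason: string };

interface Evidence {
  id: string;
  text: string | null;
}

// Only a resolved rule with publicExportAllowed passes full text through.
// Denied and inconclusive (!result.ok) results are both redacted.
function toPublicEvidence(evidence: Evidence, result: RuleResult): Evidence {
  if (result.ok && result.rule.publicExportAllowed) {
    return evidence;
  }
  return { ...evidence, text: null };
}
```

Aborting the whole export on any inconclusive row is the stricter alternative; redaction keeps the export running while still never emitting unresolved text.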
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 65638d2f-d980-40e8-9968-fa7992046e7f
📒 Files selected for processing (12)
- .github/workflows/pages-deploy.yml
- .github/workflows/pages-preview.yml
- packages/cli/src/args.ts
- packages/cli/src/index.test.ts
- packages/core/src/schema/Manifest.ts
- packages/fs/src/index.test.ts
- packages/fs/src/index.ts
- packages/fs/src/manifest.ts
- packages/fs/src/pagesIndex.ts
- packages/fs/src/publicExport.ts
- packages/fs/src/validate.ts
- scripts/build-pages-artifact.mjs
💤 Files with no reviewable changes (1)
- packages/fs/src/validate.ts
✅ Files skipped from review due to trivial changes (1)
- .github/workflows/pages-deploy.yml
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/fs/src/index.ts
- packages/fs/src/index.test.ts
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: baa9eca838
Summary
Adds the Phase 1 data licensing foundation for MRTDown data:
- `LICENSE-DATA.md`
- `data/rights/source-registry.json` for source rights and attribution policy
- `@mrtdown/core`
- `@mrtdown/fs`
- `README.md`

Impact
Downstream consumers get a clearer `CC-BY-4.0` boundary for MRTDown-authored data, while third-party evidence source material remains explicitly carved out. Later validation and attribution generation can now build on stable schemas and deterministic source-rule resolution.

Validation
- `npm run test:core`
- `npm run test:fs`
- `npm run build:packages`
- `npm run lint`
- `npm run check:boundaries`
- `npm run check:docs`

The fs test suite also parses the canonical source registry and confirms that all current canonical and fixture evidence rows resolve to exactly one source rule.
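The "exactly one source rule" check amounts to a resolver that succeeds only on a unique match and reports zero matches, multiple matches, and unparseable URLs as distinct failures. A minimal sketch under assumptions: the `SourceRule` shape, `resolveExactlyOne`, and the host selectors attached to `lta-website` and `web-archive-snapshot` are hypothetical, not the contents of `data/rights/source-registry.json` or the repo's `resolveSourceRegistryRule()`.

```typescript
// Hypothetical rule shape: an id plus a predicate over the evidence hostname.
interface SourceRule {
  id: string;
  matchesHost: (hostname: string) => boolean;
}

type Resolution =
  | { ok: true; rule: SourceRule }
  | { ok: false; reason: "invalid-url" | "no-match" | "ambiguous"; candidates: string[] };

// Deterministic: succeed only when exactly one rule matches.
function resolveExactlyOne(rules: readonly SourceRule[], sourceUrl: string): Resolution {
  let hostname: string;
  try {
    hostname = new URL(sourceUrl).hostname;
  } catch {
    return { ok: false, reason: "invalid-url", candidates: [] };
  }
  const hits = rules.filter((rule) => rule.matchesHost(hostname));
  const [first] = hits;
  if (hits.length === 1 && first !== undefined) {
    return { ok: true, rule: first };
  }
  return {
    ok: false,
    reason: hits.length === 0 ? "no-match" : "ambiguous",
    candidates: hits.map((rule) => rule.id),
  };
}

// Hypothetical selectors for two rule ids mentioned in the review.
const rules: SourceRule[] = [
  { id: "lta-website", matchesHost: (h) => h === "www.lta.gov.sg" },
  { id: "web-archive-snapshot", matchesHost: (h) => h === "web.archive.org" },
];
```

Reporting "ambiguous" separately from "no-match" lets a registry test fail on overlapping rules, not just on uncovered evidence.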
Summary by CodeRabbit
- … `rights` to export manifests.
- … `--scope rights`.