feat: Phase 5+ P1 providers + shared core provider helpers#5

Status: Merged. MyPrototypeWhat merged 34 commits into main from m13t/reverent-chaplygin-0a65c4 on Jun 29, 2026.

@MyPrototypeWhat (Owner)

Summary

Implements the Phase 5+ P1 provider backlog from docs/provider-roadmap.md and centralizes the per-provider helper duplication into @refkit/core.

New provider satellites (6 packages, ~7 provider ids)

| Package | Source | Modality | Auth | License |
|---|---|---|---|---|
| @refkit/provider-rijksmuseum | Rijksmuseum (modern Linked-Art API) | image | keyless | CC0 / PD |
| @refkit/provider-polyhaven | Poly Haven + ambientCG (two factories) | image | keyless | CC0 |
| @refkit/provider-freesound | Freesound | audio | API key | per-item CC / CC0 |
| @refkit/provider-jamendo | Jamendo | audio | API key | per-item CC |
| @refkit/provider-europeana | Europeana | image | API key | per-item CC / PD / rights-statement |
| @refkit/provider-internet-archive | Internet Archive | video · text | keyless | per-item CC (dirty) → unknown |

Shared core helpers (packages/core/src/provider-helpers.ts)

New exports that replace the per-provider duplicates (setIfString alone had been copy-pasted into 11 providers):

  • URL param setters: setIfString, setIfBoolean, setIfStringList, setIfInt/setIfPositiveInt/setIfNonNegativeInt/setIfNumber (with an opt-in clamp mode to preserve providers' Math.min clamping)
  • first (array-field helper)
  • mapCcDeedUrl (CC deed URL → LicenseId + version) and mapRightsUrl (CC deeds + faithful rightsstatements.org mapping)
  • image-URL heuristics: isLikelyImageUrl, imageMediaType, IMAGE_EXT
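A minimal sketch of how the URL param setters could behave, assuming they write into a standard `URLSearchParams` and accept an untyped `providerOptions` value. The helper names come from the list above, but the signatures and the `{ clamp }` option shape are assumptions, not the actual `@refkit/core` API. It shows the two behaviors the PR calls out: empty strings and empty arrays are dropped instead of emitted as `key=`, and clamping via `Math.min` is opt-in.

```typescript
// Hypothetical sketch: helper names follow the PR, but the signatures and the
// `{ clamp }` option shape are assumptions, not the real @refkit/core exports.

/** Set `key` only when `value` is a non-empty string. */
function setIfString(params: URLSearchParams, key: string, value: unknown): void {
  if (typeof value === "string" && value !== "") params.set(key, value);
}

/** Set `key` to "true" or "false" only when `value` is a boolean. */
function setIfBoolean(params: URLSearchParams, key: string, value: unknown): void {
  if (typeof value === "boolean") params.set(key, String(value));
}

/** Join the non-empty string entries of an array; skip the key if none remain. */
function setIfStringList(
  params: URLSearchParams,
  key: string,
  value: unknown,
  separator = ",",
): void {
  if (!Array.isArray(value)) return;
  const items = value.filter((v): v is string => typeof v === "string" && v !== "");
  if (items.length > 0) params.set(key, items.join(separator));
}

/** Set `key` for positive integers; with `clamp`, cap the value via Math.min. */
function setIfPositiveInt(
  params: URLSearchParams,
  key: string,
  value: unknown,
  opts: { clamp?: number } = {},
): void {
  if (typeof value !== "number" || !Number.isInteger(value) || value <= 0) return;
  const n = opts.clamp !== undefined ? Math.min(value, opts.clamp) : value;
  params.set(key, String(n));
}

// Usage: empty values never reach the query string.
const params = new URLSearchParams();
setIfString(params, "q", "");        // dropped: empty string
setIfString(params, "q", "tulips");
setIfStringList(params, "type", []); // dropped: empty array
setIfPositiveInt(params, "limit", 500, { clamp: 100 });
console.log(params.toString());      // q=tulips&limit=100
```

Taking `unknown` rather than `string` keeps the type checks inside the helper, so each provider can pass raw `providerOptions` fields straight through.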

All 11 existing providers, plus the 5 new ones that had inline copies, were retrofitted to import these (poly-haven has nothing to centralize; poetrydb is path-based with no query setters).
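The image-URL heuristics from the helper list can be sketched as an extension lookup on the URL path. The extension set, the media types, and the use of a `Map` for `IMAGE_EXT` are assumptions about what core actually exports; the point is that one shared table lets a provider such as polyhaven drop its inline ternary.

```typescript
// Hypothetical sketch: the extension list and media types are assumptions about
// what IMAGE_EXT covers, and a Map stands in for whatever shape core exports.
const IMAGE_EXT = new Map<string, string>([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["gif", "image/gif"],
  ["webp", "image/webp"],
  ["tif", "image/tiff"],
  ["tiff", "image/tiff"],
]);

/** Lower-cased extension of the URL's last path segment, or undefined. */
function urlExtension(raw: string): string | undefined {
  let path: string;
  try {
    path = new URL(raw).pathname; // query string and fragment are excluded
  } catch {
    return undefined; // not an absolute URL
  }
  const dot = path.lastIndexOf(".");
  if (dot === -1 || dot < path.lastIndexOf("/")) return undefined; // no extension in last segment
  return path.slice(dot + 1).toLowerCase();
}

/** Media type inferred from the extension, e.g. "image/png". */
function imageMediaType(raw: string): string | undefined {
  const ext = urlExtension(raw);
  return ext === undefined ? undefined : IMAGE_EXT.get(ext);
}

/** Heuristic only: true when the URL path ends in a known image extension. */
function isLikelyImageUrl(raw: string): boolean {
  return imageMediaType(raw) !== undefined;
}

console.log(isLikelyImageUrl("https://example.org/scan.JPG?size=full")); // true
console.log(imageMediaType("https://example.org/texture.png"));          // image/png
```

A `Map` avoids the prototype-key pitfall of `ext in someObject` (a path ending in `.constructor` would otherwise match).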

Faithful license normalization

Continues refkit's moat (honest per-item license). Notably, rightsstatements.org statements are mapped faithfully per token, not collapsed to unknown:

  • In-Copyright (InC*) → proprietary (we know it's copyrighted → denied, not needs-review)
  • NoC-US → PD + jurisdiction: 'US' (jurisdiction-scoped; gated by evaluateUse when the caller supplies a jurisdiction)
  • NoC-NC → proprietary (non-commercial)
  • opaque/undetermined (NoC-OKLR/NoC-CR/CNE/UND/NKC) → unknown
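The per-token mapping above can be sketched as follows. This covers only the rightsstatements.org branch (the CC-deed branch of `mapRightsUrl` is omitted), the function name `mapRightsStatementUrl` is hypothetical, and the `LicenseId` strings and result shape are assumptions rather than refkit's actual types. It also returns `unknown` for non-string input, in the spirit of the hardening fix listed under Process.

```typescript
// Hypothetical sketch of the rightsstatements.org branch only. The LicenseId
// strings and the result shape are assumptions, not refkit's actual types.
type LicenseId = "proprietary" | "PD" | "unknown";

interface RightsMapping {
  license: LicenseId;
  jurisdiction?: string; // set only for jurisdiction-scoped statements
}

function mapRightsStatementUrl(raw: unknown): RightsMapping {
  if (typeof raw !== "string") return { license: "unknown" }; // non-string input
  let url: URL;
  try {
    url = new URL(raw.trim());
  } catch {
    return { license: "unknown" };
  }
  if (url.hostname !== "rightsstatements.org") return { license: "unknown" };
  // Statement URIs look like /vocab/InC-EDU/1.0/ (or /page/... for the HTML view).
  const match = /^\/(?:vocab|page)\/([A-Za-z-]+)\/\d+\.\d+\/?$/.exec(url.pathname);
  const token = match?.[1];
  if (token === undefined) return { license: "unknown" };

  if (token === "InC" || token.startsWith("InC-")) return { license: "proprietary" };
  if (token === "NoC-US") return { license: "PD", jurisdiction: "US" };
  if (token === "NoC-NC") return { license: "proprietary" }; // non-commercial only
  // NoC-OKLR, NoC-CR, CNE, UND, NKC and anything unrecognized stay opaque.
  return { license: "unknown" };
}

const usPd = mapRightsStatementUrl("http://rightsstatements.org/vocab/NoC-US/1.0/");
console.log(usPd.license, usPd.jurisdiction); // PD US
```

Matching on the whole token (rather than a substring) is what keeps `NoC-NC` from being read as public domain and every `InC-*` variant from falling through to `unknown`.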

A source not annotating a license still correctly yields unknown → needs-review (e.g. Internet Archive, where ~93% of items carry no licenseurl); that's the faithful output, not a defect.
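How a normalized license might turn into a use decision can be sketched like this. It is not refkit's `evaluateUse`: the function `decideUse`, the `"allowed"` outcome, and the rule that a jurisdiction mismatch falls back to needs-review are all assumptions. Only "proprietary → denied" and "unknown → needs-review" come from the text above, and attribution obligations (e.g. for CC-BY) are ignored for brevity.

```typescript
// Hypothetical sketch, not refkit's evaluateUse. "denied" and "needs-review"
// follow the PR text; "allowed" and the jurisdiction-mismatch rule are assumed.
type LicenseId = "CC0" | "CC-BY" | "PD" | "proprietary" | "unknown";
type Decision = "allowed" | "denied" | "needs-review";

interface ItemLicense {
  license: LicenseId;
  /** Set when the license only holds in one jurisdiction, e.g. NoC-US gives "US". */
  jurisdiction?: string;
}

function decideUse(item: ItemLicense, callerJurisdiction?: string): Decision {
  switch (item.license) {
    case "proprietary":
      return "denied"; // known to be copyrighted
    case "unknown":
      return "needs-review"; // the source did not annotate a license
    default:
      // Jurisdiction-scoped licenses are only checked when the caller says where it is.
      if (item.jurisdiction && callerJurisdiction && item.jurisdiction !== callerJurisdiction) {
        return "needs-review";
      }
      return "allowed";
  }
}

console.log(decideUse({ license: "unknown" }));                       // needs-review
console.log(decideUse({ license: "PD", jurisdiction: "US" }, "DE")); // needs-review
```

Keeping `unknown` distinct from `proprietary` is what lets the decision layer separate "ask a human" from "known no".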

Testing

  • pnpm -r typecheck: clean across all 22 packages
  • pnpm test:run → 37 test files / 250 tests, all passing
  • Each provider was built TDD (red→green); the helper refactor is behavior-preserving, with each provider's existing suite as the gate

Process

Built via subagent-driven development (implementer + spec/quality review per unit), then an adversarial multi-agent review of the full branch diff (14 findings confirmed, 6 refuted). The confirmed fixes are in 471e5c0:

  • internet-archive: coerce array-valued Solr licenseurl/title to scalar + harden core mappers against non-string input (was: one array-valued doc could throw and sink the whole batch)
  • rijksmuseum: use mapRightsUrl so a found rightsstatements URI is mapped faithfully (was: matcher found it but CC-only mapper dropped it to unknown)
  • freesound / jamendo: null-drop items missing their canonical URL (batch resilience, matching met/rijksmuseum)
  • polyhaven: use core imageMediaType instead of an inline ternary
  • changeset + README accuracy
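The batch-resilience fixes above (coerce array-valued Solr fields to a scalar, null-drop items without a canonical URL) can be sketched together. The `SolrDoc` and `Item` shapes and the `toItem`/`toItems` names are hypothetical, and this `first` is a guess at what the core array-field helper does; only the `https://archive.org/details/<identifier>` URL pattern is Internet Archive's real item-page form.

```typescript
// Hypothetical sketch: SolrDoc/Item shapes and toItem/toItems are assumptions;
// `first` is a guess at the core array-field helper's behavior.
interface SolrDoc {
  identifier?: unknown;
  title?: unknown;
  licenseurl?: unknown;
}

interface Item {
  url: string;
  title?: string;
  licenseUrl?: string;
}

/** Collapse a scalar-or-array field to its first non-empty string value. */
function first(value: unknown): string | undefined {
  const v = Array.isArray(value) ? value[0] : value;
  return typeof v === "string" && v !== "" ? v : undefined;
}

/** Map one doc; return null instead of throwing when the canonical URL is missing. */
function toItem(doc: SolrDoc): Item | null {
  const id = first(doc.identifier);
  if (id === undefined) return null;
  return {
    url: `https://archive.org/details/${encodeURIComponent(id)}`,
    title: first(doc.title),
    licenseUrl: first(doc.licenseurl),
  };
}

/** One malformed doc no longer sinks the whole batch. */
function toItems(docs: readonly SolrDoc[]): Item[] {
  return docs.map(toItem).filter((item): item is Item => item !== null);
}

const batch = toItems([
  { identifier: "abc", title: ["First title", "Alt title"] }, // array-valued title
  { title: "no identifier" },                                // dropped, not thrown
]);
console.log(batch.length); // 1
```

Returning `null` per item and filtering once at the batch level keeps the failure local to the bad record.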

Intended behavior changes (not bugs — noted for reviewers)

  • Empty-string providerOptions and empty arrays are now dropped rather than emitted as empty query params (key=).
  • CC-deed mapping requires the creativecommons.org host (a CC path on another host → unknown).
  • A versionless …/licenses/by/ deed now maps to CC-BY (no version) instead of unknown.
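The two CC-deed rules above (host must be creativecommons.org; a versionless `/licenses/by/` deed still maps to CC-BY) can be sketched as below. The result shape, the license-id strings, and accepting the `www.` subdomain are assumptions, and this is not the actual `mapCcDeedUrl` implementation.

```typescript
// Hypothetical sketch of the deed rules described above; the result shape,
// license-id strings, and the accepted www. subdomain are assumptions.
interface CcMapping {
  license: string; // e.g. "CC-BY-SA" or "CC0"; "unknown" when unmapped
  version?: string;
}

const CC_CODES = new Set(["by", "by-sa", "by-nd", "by-nc", "by-nc-sa", "by-nc-nd"]);

/** Keep the license; attach the version only when it looks like "4.0". */
function withVersion(license: string, version: string | undefined): CcMapping {
  return version !== undefined && /^\d+(\.\d+)?$/.test(version) ? { license, version } : { license };
}

function mapCcDeedUrl(raw: unknown): CcMapping {
  if (typeof raw !== "string") return { license: "unknown" };
  let url: URL;
  try {
    url = new URL(raw.trim());
  } catch {
    return { license: "unknown" };
  }
  // A CC-shaped path on any other host is not trusted.
  if (url.hostname !== "creativecommons.org" && url.hostname !== "www.creativecommons.org") {
    return { license: "unknown" };
  }
  // e.g. ["licenses", "by-sa", "4.0"], ["licenses", "by"], ["publicdomain", "zero", "1.0"]
  const [kind, code, version] = url.pathname.split("/").filter((p) => p !== "");
  if (kind === "publicdomain" && code === "zero") return withVersion("CC0", version);
  if (kind === "licenses" && code !== undefined && CC_CODES.has(code.toLowerCase())) {
    return withVersion(`CC-${code.toUpperCase()}`, version);
  }
  return { license: "unknown" };
}

console.log(mapCcDeedUrl("https://creativecommons.org/licenses/by/").license); // CC-BY
console.log(mapCcDeedUrl("https://example.org/licenses/by/4.0/").license);     // unknown
```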

Changesets

Included: @refkit/core minor (new exports), 6 new providers minor, @refkit/mcp minor (zero-config server now registers the new providers), 11 existing providers patch (internal refactor).

🤖 Generated with Claude Code

MyPrototypeWhat merged commit 47fe6fc into main on Jun 29, 2026
1 check passed