Merged
34 commits
65a01b0
docs: finalize P1 provider plans + helper refactor + roadmap status
MyPrototypeWhat Jun 29, 2026
34dff0e
feat(core): shared provider helpers (setIf*, first, mapCcDeedUrl, ima…
MyPrototypeWhat Jun 29, 2026
c12ca1e
feat(provider-rijksmuseum): keyless CC0/PD art search satellite (P1)
MyPrototypeWhat Jun 29, 2026
43147ee
feat(provider-polyhaven): polyhaven() CC0 image satellite
MyPrototypeWhat Jun 29, 2026
3db0edb
feat(provider-polyhaven): ambientcg() sibling CC0 image satellite
MyPrototypeWhat Jun 29, 2026
958d8fc
feat(provider-freesound): scaffold + license mapper
MyPrototypeWhat Jun 29, 2026
03cc675
feat(provider-freesound): audio search + reference mapper
MyPrototypeWhat Jun 29, 2026
e739c9e
feat(provider-jamendo): scaffold + CC-BY audio mapping (P1)
MyPrototypeWhat Jun 29, 2026
cb5279c
test(provider-jamendo): NC/ND → proprietary denied for commercial
MyPrototypeWhat Jun 29, 2026
4824064
test(provider-jamendo): missing/unknown ccurl → needs-review
MyPrototypeWhat Jun 29, 2026
ceb283f
test(provider-jamendo): request forwarding (client_id/search/limit/op…
MyPrototypeWhat Jun 29, 2026
9cea671
feat(provider-europeana): scaffold + edm:rights mapper
MyPrototypeWhat Jun 29, 2026
643fd75
feat(provider-europeana): toReference mapper (image-only, hotlink reh…
MyPrototypeWhat Jun 29, 2026
8f89f7b
feat(provider-europeana): search + factory wiring
MyPrototypeWhat Jun 29, 2026
1469ee9
feat(provider-internet-archive): license + mediatype mappers
MyPrototypeWhat Jun 29, 2026
a03e765
feat(provider-internet-archive): toReference + search
MyPrototypeWhat Jun 29, 2026
29822e1
refactor(provider-met): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
4df9033
refactor(provider-artic): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
8382f70
refactor(provider-openverse): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
b595e45
refactor(provider-unsplash): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
5802722
refactor(provider-pexels): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
91b8221
refactor(provider-pixabay): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
e930f2f
refactor(provider-gutendex): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
20904f1
refactor(provider-smithsonian): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
a58f603
refactor(provider-brave): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
64f817c
refactor(provider-flickr): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
335fb81
refactor(provider-wikimedia-commons): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
c70d02b
refactor(provider-rijksmuseum): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
398e36d
refactor(provider-freesound): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
dbd8d23
refactor(provider-jamendo): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
ba39e0f
refactor(provider-europeana): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
039eebd
refactor(provider-internet-archive): use shared core provider helpers
MyPrototypeWhat Jun 29, 2026
2b16960
feat: register P1 providers + wire shared helpers (central wiring)
MyPrototypeWhat Jun 29, 2026
471e5c0
fix: address final review (IA array fields, rijks faithful rights, fr…
MyPrototypeWhat Jun 29, 2026
5 changes: 5 additions & 0 deletions .changeset/provider-europeana.md
@@ -0,0 +1,5 @@
---
"@refkit/provider-europeana": minor
---

Add @refkit/provider-europeana: Europeana as license-normalized image references (BYOK; per-item CC / PD / rightsstatements.org, hotlink-required media).
5 changes: 5 additions & 0 deletions .changeset/provider-freesound.md
@@ -0,0 +1,5 @@
---
"@refkit/provider-freesound": minor
---

Add @refkit/provider-freesound: Freesound as license-normalized audio references (BYOK; per-item CC / CC0, CC name-string mapping with no version).
16 changes: 16 additions & 0 deletions .changeset/provider-helpers.md
@@ -0,0 +1,16 @@
---
"@refkit/core": minor
"@refkit/provider-met": patch
"@refkit/provider-artic": patch
"@refkit/provider-openverse": patch
"@refkit/provider-unsplash": patch
"@refkit/provider-pexels": patch
"@refkit/provider-pixabay": patch
"@refkit/provider-gutendex": patch
"@refkit/provider-smithsonian": patch
"@refkit/provider-brave": patch
"@refkit/provider-flickr": patch
"@refkit/provider-wikimedia-commons": patch
---

Add shared provider helpers to @refkit/core (setIf* URL setters, first, mapCcDeedUrl, mapRightsUrl, image-URL heuristics) and refactor all providers to use them instead of per-package copies.
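
To illustrate the kind of work a helper such as `mapCcDeedUrl` might do, here is a minimal sketch. The real signature and return type in `@refkit/core` are not visible in this diff, so the function name `mapCcDeedUrlSketch`, the result shape, the license ids, and the NC/ND → `proprietary` rule (borrowed from the Jamendo commit titles in this PR) are all assumptions.

```typescript
// Hypothetical sketch only: the real mapCcDeedUrl in @refkit/core may differ.
// License ids and the NC/ND → "proprietary" rule are assumptions.
type CcMappingSketch = { license: string; version?: string | undefined };

function mapCcDeedUrlSketch(url: string | undefined): CcMappingSketch {
  if (!url) return { license: "unknown" };

  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return { license: "unknown" }; // not a parseable URL
  }

  const host = parsed.hostname.replace(/^www\./, "");
  if (host !== "creativecommons.org") return { license: "unknown" };

  // "/licenses/by-sa/3.0/" → ["licenses", "by-sa", "3.0"]
  const [kind, code, rawVersion] = parsed.pathname.split("/").filter(Boolean);
  const version =
    rawVersion !== undefined && /^\d+\.\d+$/.test(rawVersion) ? rawVersion : undefined;

  if (kind === "publicdomain" && code === "zero") return { license: "CC0", version };
  if (kind !== "licenses" || code === undefined) return { license: "unknown" };

  if (code === "by") return { license: "CC-BY", version };
  if (code === "by-sa") return { license: "CC-BY-SA", version };
  // Any NC or ND variant is treated as not usable commercially.
  if (code.split("-").some((part) => part === "nc" || part === "nd")) {
    return { license: "proprietary" };
  }
  return { license: "unknown" };
}
```

Keeping the version separate from the license family means a CC-BY 2.0 or 3.0 deed still maps to a known license rather than falling through to `unknown`.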
5 changes: 5 additions & 0 deletions .changeset/provider-internet-archive.md
@@ -0,0 +1,5 @@
---
"@refkit/provider-internet-archive": minor
---

Add @refkit/provider-internet-archive: Internet Archive as license-normalized video / text references (movies → video, texts → text; dirty per-item CC licenseurl → unknown fallback).
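
The `movies → video, texts → text` mapping could look roughly like the sketch below. The function names and the assumption that a metadata field may arrive either as a string or as an array of strings are illustrative; the provider's actual mapper is not shown in this diff.

```typescript
// Hypothetical sketch: names and field shapes are assumptions.
type ArchiveModality = "video" | "text";

// A metadata field may be a single string or an array of strings,
// so normalise to the first string value before mapping.
function firstString(value: unknown): string | undefined {
  const head = Array.isArray(value) ? value[0] : value;
  return typeof head === "string" ? head : undefined;
}

function mapMediatypeSketch(mediatype: unknown): ArchiveModality | undefined {
  switch (firstString(mediatype)) {
    case "movies":
      return "video";
    case "texts":
      return "text";
    default:
      return undefined; // any other mediatype is skipped
  }
}
```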
5 changes: 5 additions & 0 deletions .changeset/provider-jamendo.md
@@ -0,0 +1,5 @@
---
"@refkit/provider-jamendo": minor
---

Add @refkit/provider-jamendo: Jamendo as license-normalized audio references (BYOK; per-item CC via license_ccurl URL matching).
5 changes: 5 additions & 0 deletions .changeset/provider-polyhaven.md
@@ -0,0 +1,5 @@
---
"@refkit/provider-polyhaven": minor
---

Add @refkit/provider-polyhaven: Poly Haven and ambientCG (sibling factory `ambientcg`) as CC0-normalized image references (textures/HDRIs/materials; 3D model formats skipped for v1).
8 changes: 8 additions & 0 deletions .changeset/provider-rijksmuseum.md
@@ -0,0 +1,8 @@
---
"@refkit/provider-rijksmuseum": minor
"@refkit/mcp": minor
---

Add @refkit/provider-rijksmuseum: Rijksmuseum as license-normalized image references (keyless; CC0 / Public Domain).

Register the P1 providers in the @refkit/mcp zero-config server — rijksmuseum, polyhaven, ambientCG and internet-archive (keyless); freesound, jamendo and europeana (when their API key/token is set).
8 changes: 7 additions & 1 deletion README.md
@@ -165,6 +165,12 @@ const refkit = createRefkit({
| `@refkit/provider-gutendex` | Project Gutenberg | text | keyless | per-item PD |
| `@refkit/provider-poetrydb` | PoetryDB | text | keyless | PD |
| `@refkit/provider-brave` | Brave web search (discovery) | image (web) | API key | unknown → needs-review |
| `@refkit/provider-rijksmuseum` | Rijksmuseum | image | keyless | CC0 / PD |
| `@refkit/provider-polyhaven` | Poly Haven + ambientCG | image | keyless | CC0 |
| `@refkit/provider-freesound` | Freesound | audio | API key | per-item CC / CC0 |
| `@refkit/provider-jamendo` | Jamendo | audio | API key | per-item CC |
| `@refkit/provider-europeana` | Europeana | image | API key | per-item CC / PD / rights-statement |
| `@refkit/provider-internet-archive` | Internet Archive | video · text | keyless | per-item CC (dirty) → unknown |

Audio/video are extra factories on existing packages: `openverseAudio()`, `pexelsVideo()`, `pixabayVideo()`. Modality routing is automatic — an `['audio']` search only hits audio-capable providers.
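
One way to picture the modality routing described above is a simple filter over declared capabilities. This is a sketch under assumptions: the `ProviderSketch` shape, the `modalities` field, and `routeByModality` are hypothetical and not taken from refkit's source.

```typescript
// Hypothetical sketch: the provider shape and function name are assumptions.
type Modality = "image" | "audio" | "video" | "text";

interface ProviderSketch {
  id: string;
  modalities: Modality[];
}

// Keep only providers that can serve at least one requested modality.
function routeByModality(providers: ProviderSketch[], wanted: Modality[]): ProviderSketch[] {
  return providers.filter((p) => p.modalities.some((m) => wanted.includes(m)));
}

const registered: ProviderSketch[] = [
  { id: "openverse", modalities: ["image"] },
  { id: "openverse-audio", modalities: ["audio"] },
  { id: "pexels-video", modalities: ["video"] },
  { id: "gutendex", modalities: ["text"] },
];

const audioOnly = routeByModality(registered, ["audio"]).map((p) => p.id);
```

With this registration, an `['audio']` search would be dispatched only to `openverse-audio`.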

@@ -208,7 +214,7 @@ Agents can use refkit in two ways:
npx -y @refkit/mcp
```

It boots with the keyless sources (Met, Art Institute, Wikimedia, Openverse, Project Gutenberg, PoetryDB) and auto-adds any BYOK source whose key is in the environment (`UNSPLASH_KEY`, `PEXELS_KEY`, `BRAVE_TOKEN`, …). Pass `intent` to annotate each result with a use-verdict (may I use this, is attribution required); `gateFor` to return only allowed results. Or wire your own providers/keys via `serveStdio(createRefkit({ … }))` — see [`@refkit/mcp`](https://www.npmjs.com/package/@refkit/mcp).
It boots with the keyless sources (Met, Art Institute, Wikimedia, Openverse, Project Gutenberg, PoetryDB, Rijksmuseum, Poly Haven, ambientCG, Internet Archive) and auto-adds any BYOK source whose key is in the environment (`UNSPLASH_KEY`, `PEXELS_KEY`, `BRAVE_TOKEN`, …). Pass `intent` to annotate each result with a use-verdict (may I use this, is attribution required); `gateFor` to return only allowed results. Or wire your own providers/keys via `serveStdio(createRefkit({ … }))` — see [`@refkit/mcp`](https://www.npmjs.com/package/@refkit/mcp).
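
The "keyless by default, BYOK when a key is present" behaviour can be sketched as follows. The provider ids and `selectProviderIdsSketch` are hypothetical; only the three environment variable names come from the README text, and the server's real wiring may differ.

```typescript
// Hypothetical sketch: provider ids and the function name are assumptions.
const KEYLESS_IDS: string[] = [
  "met", "artic", "wikimedia-commons", "openverse", "gutendex", "poetrydb",
  "rijksmuseum", "polyhaven", "ambientcg", "internet-archive",
];

// BYOK provider id → environment variable that enables it.
const BYOK_ENV: Record<string, string> = {
  unsplash: "UNSPLASH_KEY",
  pexels: "PEXELS_KEY",
  brave: "BRAVE_TOKEN",
};

function selectProviderIdsSketch(env: Record<string, string | undefined>): string[] {
  const byok = Object.entries(BYOK_ENV)
    .filter(([, name]) => (env[name] ?? "").trim() !== "") // key set and non-empty
    .map(([id]) => id);
  return [...KEYLESS_IDS, ...byok];
}
```

In a Node process this would be called as `selectProviderIdsSketch(process.env)`; passing the environment in explicitly keeps the selection logic easy to test.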

## Not legal advice

59 changes: 36 additions & 23 deletions docs/provider-roadmap.md
@@ -1,24 +1,34 @@
# refkit provider roadmap

Status as of 2026-06-23. Grounded in a web-verified landscape scan (104
Status as of 2026-06-29. Grounded in a web-verified landscape scan (104
candidate sources → 101 unique → 16 depth-verified). This is the contract for
expanding refkit's provider coverage; execute against it, not against memory.

## Current inventory (7 providers)
> **Progress: Phases 1–4 are DONE in refkit.** §1 (CC version axis) shipped in
> commit `75c557e`; §2 P0 providers (flickr, wikimedia-commons, met, artic,
> smithsonian) and §3/§4 cheap modality wins (openverse-audio, pexels-video,
> pixabay-video) are all built and tested. The remaining work is the §3 **P1
> backlog** (rijksmuseum, europeana, freesound, jamendo, internet-archive,
> poly-haven/ambientcg). One open caveat: §1 item 7 lives in the **Slate** repo
> (not this worktree) and is not verified here.

| Modality | Providers | Verdict |
## Current inventory (12 provider packages, ~15 provider ids)

| Modality | Providers | Status |
|---|---|---|
| image | openverse, unsplash, pexels, pixabay | mainstream stock + the main CC aggregator — solid, **but two glaring omissions: Flickr, Wikimedia Commons** |
| text | gutendex (Project Gutenberg), poetrydb | thin — only PD books + a niche poetry DB |
| image | openverse, unsplash, pexels, pixabay, **flickr**, **wikimedia-commons**, **met**, **artic**, **smithsonian** | ✅ Flickr + Wikimedia gaps closed; GLAM CC0 cluster (met/artic/smithsonian) added |
| text | gutendex (Project Gutenberg), poetrydb | unchanged — still thin (PD books + niche poetry DB) |
| audio | **openverse-audio** | ✅ cheap leg added (§4) |
| video | **pexels-video**, **pixabay-video** | ✅ cheap legs added (§4) |
| grey/discovery | brave | represents the web-search category; do **not** bulk-add more (every web source is `license:unknown`) |
| video / audio / icon·vector / 3d·texture | — | **no leg at all** |
| icon·vector / 3d·texture | — | still no leg (P1 backlog: poly-haven/ambientcg) |

The moat is per-item license normalization, so the highest-value additions are
mainstream sources that return **structured per-item license** (Flickr,
Wikimedia, the GLAM museum APIs), not more commodity stock or more grey web
search.

## §1 — Prerequisite: CC version axis (Phase 1, atomic, blocks everything)
## §1 — Prerequisite: CC version axis (Phase 1) — ✅ DONE (`75c557e`)

The current `LicenseId` enum only models `CC-BY-4.0` / `CC-BY-SA-4.0`. Every
CC-BY/BY-SA at version 1.0–3.0 collapses to `unknown` → `needs-review`. The
@@ -49,7 +59,7 @@ which today throws away CC-BY-2.0/3.0 results as `unknown`.

**Files (atomic — a partial rename leaves the build red, so it is one phase):**

refkit:
refkit (all done in `75c557e`):
1. `packages/core/src/license.ts` — `LicenseId` union + `LICENSE_FACTS` keys.
2. `packages/core/src/rights.ts` — `licenseIdSchema` enum + add `licenseVersion?`
to interface & schema.
@@ -62,7 +72,7 @@ refkit:
"older CC-BY → CC-BY, allowed-with-attribution, version preserved". This is
the proof the fix works.

Slate (consumes refkit via link — same atomic change):
Slate (consumes refkit via link — same atomic change) — ⚠️ NOT verified in this worktree:
7. `packages/core/src/retrieval/__tests__/reference-to-asset.test.ts` — test
data `'CC-BY-4.0'` → `'CC-BY'` (+ `licenseVersion: '4.0'`), and the
`metadata.license`/attribution assertions.
@@ -73,9 +83,11 @@ suite green in the Slate worktree.
Optional follow-up (not Phase 1): a `licenseDeedUrl(license, version?)` helper so
attribution links the exact CC deed instead of only the source page.
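
A possible shape for that optional helper, as a sketch only: the roadmap gives the signature `licenseDeedUrl(license, version?)` but no implementation, so the slug table and the choice to return `undefined` when no exact deed can be built are assumptions. The URL patterns follow the public creativecommons.org deed layout.

```typescript
// Hypothetical sketch of the optional follow-up helper; behaviour is assumed.
const CC_SLUGS: Record<string, string> = {
  "CC-BY": "by",
  "CC-BY-SA": "by-sa",
};

function licenseDeedUrlSketch(license: string, version?: string): string | undefined {
  // CC0 has a single deed, so no version is needed.
  if (license === "CC0") return "https://creativecommons.org/publicdomain/zero/1.0/";

  const slug = CC_SLUGS[license];
  // Without a known family and a version there is no exact deed to link.
  if (slug === undefined || version === undefined) return undefined;

  return `https://creativecommons.org/licenses/${slug}/${version}/`;
}
```

Returning `undefined` instead of guessing a version lets the caller fall back to linking the source page, which is the current behaviour the roadmap describes.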

## §2 — P0 providers (mainstream + per-item clean license + i2i-usable)
## §2 — P0 providers (mainstream + per-item clean license + i2i-usable) — ✅ DONE

Each is an independent `@refkit/provider-*` satellite. Build after Phase 1.
All five shipped as independent `@refkit/provider-*` satellites
(`provider-flickr`, `provider-wikimedia-commons`, `provider-met`,
`provider-artic`, `provider-smithsonian`).

| Provider | Modality | Effort | Auth | License field (verified) | Mapping |
|---|---|---|---|---|---|
@@ -96,13 +108,13 @@ Notes:

## §3 — P1 providers, modality gaps & cheap wins

**Cheapest wins first — reuse an existing integration's key + license mapping:**
- **openverse audio** — the openverse API already serves audio under the same
key/shape; near-free audio leg.
- **pexels-video / pixabay-video** — same keys, same license as the image
providers we already ship; a different endpoint adds the video leg cheaply.
**Cheapest wins — ✅ DONE (all three built):**
- **openverse-audio** ✅ — same key/shape as openverse image; `openverseAudio()`
in `provider-openverse`.
- **pexels-video** ✅ / **pixabay-video** ✅ — `pexelsVideo()` etc., same keys as
the image providers, video endpoint.

**Other P1:**
**Other P1 — ⬜ REMAINING (this is the actual next work):**

| Provider | Leg | Caveat (verified) |
|---|---|---|
@@ -129,11 +141,12 @@ Notes:

## §5 — Sequencing

1. **Phase 1** — §1 CC version axis (atomic, refkit + Slate test). ← do first.
2. **Phase 2** — flickr + wikimedia-commons (the two mainstream image gaps).
3. **Phase 3** — met + artic + smithsonian (GLAM CC0 cluster; Met/Artic are S).
4. **Phase 4** — cheap modality wins: openverse-audio, pexels-video, pixabay-video.
5. **Phase 5+** — P1 backlog as demand dictates.
1. ✅ **Phase 1** — §1 CC version axis (refkit done in `75c557e`; Slate test
unverified here).
2. ✅ **Phase 2** — flickr + wikimedia-commons (the two mainstream image gaps).
3. ✅ **Phase 3** — met + artic + smithsonian (GLAM CC0 cluster).
4. ✅ **Phase 4** — cheap modality wins: openverse-audio, pexels-video, pixabay-video.
5. ⬜ **Phase 5+** — P1 backlog as demand dictates. ← **only remaining work**.

Phases 2–4 are independent per-package satellites → parallelizable via
Phase 5+ items are independent per-package satellites → parallelizable via
worktree-isolated subagents (one provider per agent).