docs(#1458): SparkAdapter Codec Protocol spec + explainer#188

Open
dimitri-yatsenko wants to merge 2 commits into main from feat/1458-renderable-spec
@dimitri-yatsenko

Member

Summary

Spec-first pair for the Renderable Codec Protocol landing in DataJoint 2.3 (#1458). Brought into 2.3 scope per user direction 2026-06-23 (was previously deferred to 2.4).

| File | Role |
|---|---|
| `src/reference/specs/renderable.md` (new) | Normative spec: Protocol signature, return-value shape constraints (Spark-native: primitives / lists / dicts), why it's a Protocol vs. abstract method, eligibility detection via `isinstance`, out-of-scope items, worked codec examples. |
| `src/explanation/renderable-codecs.md` (new) | Explainer: Bronze/Silver layer model, why `<blob@>` is bronze-only, design rationale (smaller OSS surface, cleaner opt-in, no plugin churn, structural typing), decision guide for choosing codecs in a new pipeline. |
| `mkdocs.yaml` | Nav entries under Reference → Specifications → Type System, and Concepts → Storage. |
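
To make the return-value shape constraint concrete, here is a minimal sketch of a recursive check for "primitives / lists / dicts". The helper name `is_spark_native`, the exact set of primitives, and the string-keys rule are assumptions for illustration only; the spec file is the authority on what is actually allowed.

```python
from typing import Any

# Assumed set of primitives; the spec's actual list may differ.
_PRIMITIVES = (bool, int, float, str, bytes, type(None))


def is_spark_native(value: Any) -> bool:
    """Return True if value is built only from primitives, lists, and dicts."""
    if isinstance(value, _PRIMITIVES):
        return True
    if isinstance(value, list):
        return all(is_spark_native(item) for item in value)
    if isinstance(value, dict):
        # Simplifying assumption: keys are strings, as for struct field names.
        return all(
            isinstance(key, str) and is_spark_native(item)
            for key, item in value.items()
        )
    # Tuples, sets, arrays, and arbitrary objects are rejected in this sketch.
    return False
```

Under these assumptions a list of floats or a flat dict of primitives passes, while a set or a dict holding an arbitrary object does not.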

Why spec-first

The Protocol itself is tiny (~10 lines: a `@runtime_checkable Protocol` declaration). The design conversation in the issue body settled the shape after #1457 (the earlier abstract-method-on-Codec framing) was rejected. Locking the spec now gives:

  • The downstream consumer (`datajoint-databricks`) a stable contract to build the silver-layer publish pipeline against.
  • Codec plugin authors (current and future) a clear opt-in target.
  • The implementation PR a small, well-scoped diff to land against.
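
For reviewers who want a picture of the "~10 lines", the following is a rough sketch of a `@runtime_checkable` Protocol and the `isinstance` eligibility check. The method name `render_spark` is the original name this PR later renames; its signature, the two example codec classes, and the `is_eligible` helper are assumptions, not the contents of the implementation PR.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Renderable(Protocol):
    """Hypothetical shape of the Protocol; the signature is an assumption."""

    def render_spark(self, value: Any) -> Any:
        """Return a Spark-native value: a primitive, list, or dict."""
        ...


class FloatArrayCodec:
    """Opts in structurally: defines render_spark and inherits nothing."""

    def render_spark(self, value: Any) -> list[float]:
        return [float(x) for x in value]


class OpaqueBlobCodec:
    """Has no render_spark method, so it is not eligible."""

    def encode(self, value: Any) -> bytes:
        return repr(value).encode()


def is_eligible(codec: object) -> bool:
    # A runtime_checkable isinstance check only verifies that the method
    # exists; it does not check the signature or the return type.
    return isinstance(codec, Renderable)
```

Because the check is structural, existing plugin codecs need no new base class: adding the one method is the whole opt-in.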

Marked draft

Stays draft until the matching implementation PR opens in `datajoint-python` — same pattern as the provenance trinity spec (#183) before #1471 landed against it.

Sequencing

Independent of the provenance trinity (no code overlap). Can land in parallel with T2.2 implementation work.

  1. This PR (spec) — review while implementation is drafted.
  2. datajoint-python implementation PR — adds `src/datajoint/rendering.py` with the Protocol, re-exports as `dj.Renderable`, adds unit tests.
  3. Flip this PR from draft to ready alongside the implementation PR.

Test plan

  • `mkdocs serve` renders both pages under the new nav groups
  • Cross-links resolve (codec-api.md, custom-codecs.md explainer, the issue, #1457)
  • Examples use core DataJoint types per project convention
  • Reviewers can sketch a plugin codec from the worked examples without ambiguity

Spec-first pair for the Renderable Protocol landing in DataJoint 2.3
(per user direction 2026-06-23, bringing T3.2 back into 2.3 scope).

New files:

- src/reference/specs/renderable.md — normative spec for the Renderable
  Protocol. Covers signature, return-value shape constraints (primitives /
  lists / dicts mapping to Spark ArrayType / StructType / MapType), why
  the contract is a Protocol rather than an abstract method on Codec,
  eligibility detection via isinstance, out-of-scope items, and three
  worked example codec implementations (FloatArrayCodec, Image2DCodec,
  PointWithLabelCodec).

- src/explanation/renderable-codecs.md — explainer. Covers the
  Bronze/Silver layer model (CDC mirror vs typed silver layer), why
  <blob@> is bronze-only, what typed renderable codecs are, the design
  rationale for the Protocol pattern (smaller OSS surface, cleaner
  opt-in, no churn for existing plugins, structural typing), what's
  out of scope, and a decision guide for choosing codecs in a new
  pipeline.

Nav entries added:
- Reference > Specifications > Type System > Renderable Codec Protocol
- Concepts > Storage > Renderable Codecs

Implementation (against this spec) follows in datajoint-python; the
addition is small (~10 lines: a runtime_checkable Protocol declaration
in src/datajoint/rendering.py, re-exported as dj.Renderable).

Examples use core DataJoint types (float64, int32) per project convention.
Cross-links to codec-api.md (the base Codec interface that renderable
codecs extend by composition, not inheritance).

Renderable conflicts with the broader notion of graphically renderable
field types and is too generic for an interface targeted specifically at
Spark / Lakehouse Sync. Rename for clarity:

- Class: Renderable → SparkAdapter (parallels StorageAdapter)
- Method: render_spark → to_spark (matches pandas/Arrow conventions like
  to_pandas, to_arrow, __dataframe__)
- Spec file: renderable.md → spark-adapter.md
- Explainer: renderable-codecs.md → spark-adapters.md
- Nav entries updated in mkdocs.yaml
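
A small sketch of what the rename implies for codec authors, assuming the Protocol keeps the same one-method shape under the new names. The `to_spark` signature, the field names in the example codec, and `LegacyCodec` are all hypothetical.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SparkAdapter(Protocol):
    """Assumed post-rename shape: one method, to_spark."""

    def to_spark(self, value: Any) -> Any:
        ...


class PointWithLabelCodec:
    """Hypothetical codec; the x / y / label field names are illustrative."""

    def to_spark(self, value: Any) -> dict[str, Any]:
        x, y, label = value
        return {"x": float(x), "y": float(y), "label": str(label)}


class LegacyCodec:
    """Exposes only the pre-rename method name."""

    def render_spark(self, value: Any) -> Any:
        return value
```

In this sketch a codec that still defines only `render_spark` no longer satisfies `isinstance(codec, SparkAdapter)`, so any draft codec written against the earlier name would need the method renamed too.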

@dimitri-yatsenko changed the title from "docs(#1458): Renderable Codec Protocol spec + explainer" to "docs(#1458): SparkAdapter Codec Protocol spec + explainer" on Jun 26, 2026