PDF stage 3.4: bare CFF (/FontFile3 / Type1C)#551

Draft
andiwand wants to merge 4 commits into main from pdf-stage-3.4-cff

andiwand commented Jun 23, 2026

Member

Fourth piece of stage 3 — embedded CFF fonts (/FontFile3 Type1C / CIDFontType0C, OpenType-CFF). Stacked on #550 (3.3). Draft.

Design: docs/design/pdf/stage-3.4-cff.md.

Landed

  • cff::CffFont : abstract::Font — parses the CFF structure (INDEX/DICT, Name/Top-DICT/String INDEXes, charset 0/1/2, CharStrings INDEX, Private DICT) for the abstract::Font facts. Raw bytes pass through for the CFF embed.
  • cff::wrap_to_otf — synthesizes the SFNT skeleton (head/hhea/maxp/hmtx/name/post/OS/2), embeds CFF verbatim, bakes the uniform PUA cmap (reuses the 3.1 serializers).
  • /FontFile3 loading (bare CFF vs. full SFNT via magic) + HTML @font-face wiring.
  • CFF Standard Strings + AGL reverse map: the 391-entry table is generated (tools/font/generate_cff_standard_strings.py → cff_standard_strings.{hpp,cpp}), and code_point_for_glyph now maps glyph → name → Unicode through the real Adobe Glyph List. Decision (2026-06-23): the font module depends on the pdf module for the AGL (a font-domain table that lives in pdf today; no link cycle).
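
For reviewers unfamiliar with the "via magic" split in the /FontFile3 bullet, here is a minimal sketch of how a stream could be classified. The names `FontFile3Kind` and `sniff_font_file3` are hypothetical and not taken from this PR; the byte values are the standard SFNT version tags and the CFF header layout (major, minor, hdrSize, offSize).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; the PR's real detection code may differ.
enum class FontFile3Kind { bare_cff, sfnt, unknown };

inline FontFile3Kind sniff_font_file3(const std::vector<std::uint8_t> &data) {
  if (data.size() < 4) {
    return FontFile3Kind::unknown;
  }
  const std::uint32_t tag = (static_cast<std::uint32_t>(data[0]) << 24) |
                            (static_cast<std::uint32_t>(data[1]) << 16) |
                            (static_cast<std::uint32_t>(data[2]) << 8) |
                            static_cast<std::uint32_t>(data[3]);
  // SFNT containers: 'OTTO' (OpenType-CFF), 0x00010000 (TrueType),
  // 'true' (Apple TrueType), 'ttcf' (font collection).
  if (tag == 0x4F54544Fu || tag == 0x00010000u || tag == 0x74727565u ||
      tag == 0x74746366u) {
    return FontFile3Kind::sfnt;
  }
  // Bare CFF header: major version 1, then minor, hdrSize (at least 4),
  // and offSize in the range 1..4.
  if (data[0] == 1 && data[2] >= 4 && data[3] >= 1 && data[3] <= 4) {
    return FontFile3Kind::bare_cff;
  }
  return FontFile3Kind::unknown;
}
```

The SFNT tags are checked first because none of them starts with the byte 0x01, so the two families cannot be confused on the first four bytes.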

Tests

CFF facts, custom + standard glyph-name resolution, the AGL reverse/forward maps, charstring widths, magic, and wrap_to_otf round-tripping through SfntFont. Full font + pdf + html corpus green.

🤖 Generated with Claude Code

Base automatically changed from pdf-stage-3.3-truetype-fontface to main June 23, 2026 16:37
andiwand and others added 2 commits June 23, 2026 21:32
Seed the stage-3.4 branch with the detailed design that precedes
implementation (roadmap in src/odr/internal/pdf/AGENTS.md). Read a bare CFF
into abstract::Font, wrap to OTF reusing the 3.1 PUA pipeline, wire into PDF
@font-face reusing 3.3. Implementation follows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
First implementation piece of 3.4: a bare-CFF abstract::Font reader. Parses
the CFF structure (INDEX/DICT primitives, Name/Top-DICT/String INDEXes, Top
DICT, charset formats 0/1/2, CharStrings INDEX, Private DICT) for the
abstract::Font facts — glyph count, units-per-em (FontMatrix), bbox, advance
widths (Type2 charstring leading-width extraction + default/nominalWidthX),
glyph names (custom String-INDEX SIDs), CID-keyed facts — while the raw CFF
bytes pass through for later verbatim embedding as a `CFF ` table.
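
As an illustration of the INDEX primitive mentioned above, the following is a rough sketch of reading one CFF INDEX (count, offSize, count+1 offsets, object data). `IndexEntry` and `parse_cff_index` are hypothetical names, not the identifiers used in this branch, and the error handling is simplified to returning an empty optional.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Byte range of one INDEX object inside the CFF buffer (hypothetical type).
struct IndexEntry {
  std::size_t offset;
  std::size_t length;
};

// Parses the INDEX starting at `pos` and advances `pos` past it on success.
inline std::optional<std::vector<IndexEntry>>
parse_cff_index(const std::vector<std::uint8_t> &data, std::size_t &pos) {
  if (pos + 2 > data.size()) {
    return std::nullopt;
  }
  const std::size_t count =
      (static_cast<std::size_t>(data[pos]) << 8) | data[pos + 1];
  std::size_t cursor = pos + 2;
  std::vector<IndexEntry> entries;
  if (count == 0) {
    pos = cursor; // an empty INDEX consists of the 2-byte count only
    return entries;
  }
  if (cursor >= data.size()) {
    return std::nullopt;
  }
  const std::size_t off_size = data[cursor++];
  if (off_size < 1 || off_size > 4) {
    return std::nullopt;
  }
  const std::size_t offsets_end = cursor + (count + 1) * off_size;
  if (offsets_end > data.size()) {
    return std::nullopt;
  }
  // Offsets are 1-based, relative to the byte preceding the object data.
  const std::size_t data_base = offsets_end - 1;
  const auto read_offset = [&](std::size_t i) {
    std::size_t value = 0;
    for (std::size_t k = 0; k < off_size; ++k) {
      value = (value << 8) | data[cursor + i * off_size + k];
    }
    return value;
  };
  std::size_t prev = read_offset(0);
  if (prev != 1) {
    return std::nullopt; // the first offset is always 1
  }
  for (std::size_t i = 1; i <= count; ++i) {
    const std::size_t cur = read_offset(i);
    if (cur < prev || data_base + cur > data.size()) {
      return std::nullopt;
    }
    entries.push_back({data_base + prev, cur - prev});
    prev = cur;
  }
  pos = data_base + prev;
  return entries;
}
```

The same routine would serve the Name, Top DICT, String, and CharStrings INDEXes, since they share this layout.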

Reverse map (code_point_for_glyph) currently covers the algorithmic
uniXXXX/uXXXXXX names; the 391-entry standard-strings table and full AGL
hookup are follow-ups (see TODO + design doc). OTF wrap + PDF /FontFile3
wiring land next.
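
To make the "algorithmic uniXXXX/uXXXXXX names" concrete, here is a small sketch of that decoding, following the Adobe Glyph List naming rules (uppercase hex only, surrogates rejected, any suffix after the first period dropped). `algorithmic_code_point` is a hypothetical name; this sketch handles a single `uni` group only and ignores underscore-joined ligature components.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Parses uppercase hexadecimal digits; lowercase is rejected on purpose.
inline std::optional<std::uint32_t> parse_upper_hex(std::string_view digits) {
  if (digits.empty()) {
    return std::nullopt;
  }
  std::uint32_t value = 0;
  for (const char c : digits) {
    std::uint32_t digit = 0;
    if (c >= '0' && c <= '9') {
      digit = static_cast<std::uint32_t>(c - '0');
    } else if (c >= 'A' && c <= 'F') {
      digit = static_cast<std::uint32_t>(c - 'A') + 10;
    } else {
      return std::nullopt;
    }
    value = (value << 4) | digit;
  }
  return value;
}

// Hypothetical helper: decodes "uniXXXX" (exactly four digits) and
// "uXXXX".."uXXXXXX" (four to six digits) glyph names.
inline std::optional<std::uint32_t>
algorithmic_code_point(std::string_view name) {
  name = name.substr(0, name.find('.')); // drop a suffix such as ".alt"
  std::string_view digits;
  if (name.size() == 7 && name.substr(0, 3) == "uni") {
    digits = name.substr(3);
  } else if (name.size() >= 5 && name.size() <= 7 && name[0] == 'u') {
    digits = name.substr(1);
  } else {
    return std::nullopt;
  }
  const auto value = parse_upper_hex(digits);
  if (!value || *value > 0x10FFFF || (*value >= 0xD800 && *value <= 0xDFFF)) {
    return std::nullopt; // not a Unicode scalar value
  }
  return value;
}
```

Names such as "A" or "union" fall through to an empty result here, which is the gap the standard-strings table and AGL hookup are meant to close.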

Tests: a hand-built minimal CFF (assertion-based, no fixtures) covering the
facts, custom-name resolution, charstring width vs. default, and magic.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
andiwand force-pushed the pdf-stage-3.4-cff branch from f479767 to 83c13af June 23, 2026 19:32
andiwand and others added 2 commits June 23, 2026 21:46
Wire embedded CFF fonts end to end. cff::wrap_to_otf synthesizes the SFNT
skeleton (head/hhea/maxp v0.5/hmtx/name/post/OS/2) from the abstract::Font
facts and embeds the CFF verbatim as a `CFF ` table, with a uniform PUA cmap
(pua_code_point(glyph) -> glyph) baked in — reusing the 3.1 serializers
(build_sfnt, serialize_cmap/post/os2). load_embedded_font now reads
/FontFile3 (bare CFF -> CffFont, or a full SFNT -> SfntFont via magic), and
the HTML font_family path wraps a CffFont the same way it re-encodes an
SfntFont, so embedded CFF renders real glyphs via @font-face with the
transparent Unicode selection layer.
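
The uniform PUA cmap described above can be pictured with the sketch below. The actual layout of pua_code_point is not shown in this PR, so the mapping here is purely an assumption: glyphs fill the BMP Private Use Area (U+E000..U+F8FF) first and then continue in the plane 15 Private Use Area from U+F0000. `sketch_pua_code_point` and `sketch_glyph_for_pua` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumed layout, not confirmed by the PR.
constexpr std::uint32_t bmp_pua_first = 0xE000;
constexpr std::uint32_t bmp_pua_size = 0x1900; // U+E000..U+F8FF, 6400 slots
constexpr std::uint32_t plane15_pua_first = 0xF0000;

// Maps a glyph id to a private-use code point (one code point per glyph).
inline std::uint32_t sketch_pua_code_point(std::uint16_t glyph) {
  if (glyph < bmp_pua_size) {
    return bmp_pua_first + glyph;
  }
  return plane15_pua_first + (glyph - bmp_pua_size);
}

// Inverse of the mapping above; empty for code points outside both ranges.
inline std::optional<std::uint16_t> sketch_glyph_for_pua(std::uint32_t cp) {
  if (cp >= bmp_pua_first && cp < bmp_pua_first + bmp_pua_size) {
    return static_cast<std::uint16_t>(cp - bmp_pua_first);
  }
  if (cp >= plane15_pua_first) {
    const std::uint32_t glyph = cp - plane15_pua_first + bmp_pua_size;
    if (glyph <= 0xFFFF) {
      return static_cast<std::uint16_t>(glyph);
    }
  }
  return std::nullopt;
}
```

Because the mapping depends only on the glyph id, the same cmap can be baked into every wrapped font, and the HTML side can select any glyph by emitting its private-use code point.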

Tests: wrap_to_otf round-trips through SfntFont (OTTO, glyph count, PUA cmap,
synthesized hmtx widths). Full font + PDF HTML-output corpus green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
Close the CFF reverse-map gap. Generate the 391-entry CFF Standard Strings
table (Adobe TN #5176 Appendix A) as committed C++
(tools/font/generate_cff_standard_strings.py -> cff_standard_strings.{hpp,cpp},
mirroring the pdf encoding-data generator), so charset SIDs < 391 resolve to
their glyph names. CffFont::code_point_for_glyph now maps glyph -> name ->
Unicode through the real Adobe Glyph List (pdf::glyph_name_to_unicode) instead
of the algorithmic uni-names only — so a name-keyed CFF's reverse map and
glyph_for_code_point work for standard glyph names (e.g. "A" -> U+0041), which
sharpens simple-font glyph selection for embedded CFF (and, downstream, Type1).
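
The SID split described above (standard strings below 391, String INDEX entries from 391 upward) could be sketched roughly as follows. `resolve_sid` is a hypothetical name, and the table here is only a five-entry excerpt of the 391 standard strings so the example stays short; the generated cff_standard_strings table would take its place.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

constexpr std::size_t standard_string_count = 391;

// Excerpt only: the first five of the 391 CFF standard strings.
constexpr std::array<std::string_view, 5> standard_strings_excerpt = {
    ".notdef", "space", "exclam", "quotedbl", "numbersign"};

// Hypothetical helper: resolves a charset SID to a glyph name.
// `string_index` holds the font's own String INDEX entries in order.
inline std::optional<std::string>
resolve_sid(std::uint16_t sid, const std::vector<std::string> &string_index) {
  if (sid < standard_string_count) {
    if (sid < standard_strings_excerpt.size()) {
      return std::string(standard_strings_excerpt[sid]);
    }
    return std::nullopt; // limitation of this excerpt, not of the format
  }
  // Custom strings are numbered from 391 upward.
  const std::size_t custom = sid - standard_string_count;
  if (custom < string_index.size()) {
    return string_index[custom];
  }
  return std::nullopt;
}
```

With the full table in place, the resolved name would then go through the AGL lookup (pdf::glyph_name_to_unicode in this PR) to obtain the code point.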

This intentionally makes the font module depend on the pdf module for the AGL
(decision 2026-06-23): the AGL is a font-domain table that lives in pdf today,
and the static lib has no link cycle.

Test: a standard-SID ("A") glyph resolves its name and round-trips through the
AGL reverse/forward maps. Full font + pdf + html corpus green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz