PDF stage 3.5: Type1 (/FontFile) #552
Draft
andiwand wants to merge 6 commits into
Seed the stage-3.5 branch. Read a Type1 program (eexec + charstring decryption), translate Type1 -> Type2 charstrings, build a CFF and reuse 3.4's CFF -> OTF path; reverse map via glyph names -> AGL. Stacked on 3.4. Implementation follows.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
First self-contained piece of 3.5: the Type1 running-key cipher (font::type1::decrypt) and its two entry points: decrypt_eexec (key 55665, 4-byte skip, binary or ASCII-hex/PFA auto-detected) and decrypt_charstring (key 4330, /lenIV-aware). These don't depend on the CFF translation work, so they land ahead of the full Type1Font reader (eexec parse + Type1->Type2 charstring translation -> reuse 3.4's CFF->OTF path).
Tests: round-trips against an independent forward-cipher reference (so they're not circular), the lenIV override, and the hex eexec form.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
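The running-key cipher described above can be sketched as follows. This is a minimal Python illustration, not the project's own code: the constants 52845 and 22719 and the two keys come from the Adobe Type 1 font format, while the function bodies, the `type1_encrypt` reference helper, and the hex auto-detection rule (first four bytes all ASCII hex digits) are assumptions made for the example. The hex branch assumes it is handed only the encrypted section, with an even number of hex digits.

```python
C1, C2 = 52845, 22719                     # cipher constants (Type 1 font format)
EEXEC_KEY, CHARSTRING_KEY = 55665, 4330   # initial keys named in the commit

def type1_decrypt(cipher: bytes, key: int, skip: int) -> bytes:
    """Decrypt with the 16-bit running key, then drop `skip` lead-in bytes."""
    r = key
    plain = bytearray()
    for c in cipher:
        plain.append(c ^ (r >> 8))
        r = ((c + r) * C1 + C2) & 0xFFFF  # key depends on the ciphertext byte
    return bytes(plain[skip:])

def type1_encrypt(plain: bytes, key: int, lead: bytes) -> bytes:
    """Independent forward cipher (hypothetical test helper)."""
    r = key
    out = bytearray()
    for p in lead + plain:
        c = p ^ (r >> 8)
        out.append(c)
        r = ((c + r) * C1 + C2) & 0xFFFF
    return bytes(out)

def decrypt_eexec(data: bytes) -> bytes:
    """eexec section: key 55665, 4-byte skip, binary or ASCII-hex input."""
    head = data[:4]
    if len(head) == 4 and all(b in b"0123456789abcdefABCDEF" for b in head):
        # Assumed detection rule: four leading hex digits mean ASCII-hex form.
        data = bytes.fromhex(data.decode("ascii"))  # skips ASCII whitespace
    return type1_decrypt(data, EEXEC_KEY, 4)

def decrypt_charstring(data: bytes, len_iv: int = 4) -> bytes:
    """Charstring: key 4330, skipping /lenIV lead-in bytes (default 4)."""
    return type1_decrypt(data, CHARSTRING_KEY, len_iv)
```

Keeping the forward cipher separate from the decryptor is what makes a round-trip test non-circular: the encryptor never calls the code under test.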
Parse an Adobe Type1 font program into its decrypted parts: split the clear-text header / eexec section / trailer, read /FontName, /FontMatrix, /FontBBox and /Encoding (StandardEncoding or a custom dup-code-name-put array) from the header, decrypt the eexec section (type1_crypt) and extract every glyph's decrypted charstring plus /Subrs (RD/-| binary entries, /lenIV-aware). PFB segment framing is stripped if present. Charstrings are not yet interpreted; that is the Type1->Type2 translation that follows, feeding 3.4's CFF->OTF path.
Tests: a hand-built encrypted Type1 program (independent forward cipher), covering magic, header/encoding parse, and the decrypted charstrings/subrs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
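As an illustration of the PFB framing step mentioned above, here is a minimal Python sketch. It assumes the usual PFB layout (a 0x80 marker, a segment type byte of 1 for ASCII, 2 for binary or 3 for end-of-file, then a 4-byte little-endian length); the function name `strip_pfb` and its error handling are hypothetical and not taken from the PR.

```python
import struct

def strip_pfb(data: bytes) -> bytes:
    """Concatenate PFB segment payloads; return unframed input unchanged."""
    if not data.startswith(b"\x80"):
        return data                      # no PFB framing present
    out = bytearray()
    pos = 0
    while pos + 2 <= len(data) and data[pos] == 0x80:
        kind = data[pos + 1]
        if kind == 3:                    # end-of-file segment
            break
        if kind not in (1, 2):           # 1 = ASCII, 2 = binary
            raise ValueError(f"unknown PFB segment type {kind}")
        (length,) = struct.unpack_from("<I", data, pos + 2)
        start = pos + 6                  # payload follows the 6-byte header
        if start + length > len(data):
            raise ValueError("truncated PFB segment")
        out += data[start:start + length]
        pos = start + length
    return bytes(out)
```

After stripping, the header, eexec section and trailer sit back to back, so the same section splitter can serve both framed and unframed input.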
cff::build_cff serializes a name-keyed CFF from a list of (name, Type2 charstring) glyphs + default/nominalWidthX + bbox: Header, Name INDEX, Top DICT (FontBBox + charset/CharStrings/Private offsets, fixed-width so the layout resolves in one pass), String INDEX (every glyph name as a custom SID, so no standard-strings table is needed), empty Global Subr INDEX, CharStrings INDEX, format-0 charset, Private DICT. This is the assembly target for the Type1 -> CFF path: the translated Type2 charstrings land here, the result feeds CffFont + wrap_to_otf (3.4).
Test: build a 2-glyph CFF, read it back through CffFont (name, glyph name, bbox, charstring width vs. default) and confirm it wraps to a loadable OTTO.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
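Two building blocks of such a serializer can be sketched in Python. The INDEX layout (16-bit count, offset size, 1-based offsets, data) and the 5-byte DICT integer form (operand prefix 29 followed by a 32-bit big-endian value) follow the CFF format; the function names are hypothetical and the code is an illustration, not the PR's `build_cff`.

```python
import struct

def encode_index(items: list[bytes]) -> bytes:
    """Serialize a CFF INDEX: count, offSize, count+1 offsets, then data."""
    if not items:
        return b"\x00\x00"               # an empty INDEX is just count = 0
    offsets = [1]                        # offsets are 1-based
    for item in items:
        offsets.append(offsets[-1] + len(item))
    off_size = max(1, (offsets[-1].bit_length() + 7) // 8)
    out = struct.pack(">HB", len(items), off_size)
    for off in offsets:
        out += off.to_bytes(off_size, "big")
    return out + b"".join(items)

def dict_int32(value: int) -> bytes:
    """Encode a DICT integer operand in the fixed 5-byte form."""
    return b"\x1d" + struct.pack(">i", value)
```

Always using the 5-byte operand form means the Top DICT has the same size whatever the offsets turn out to be, which is what lets the layout resolve in a single pass.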
type1::to_type2 translates a decrypted Type1 charstring to Type2 (CFF): a stack machine that flattens callsubr (inlining the font's /Subrs, depth guarded), folds div, lifts the hsbw side bearing into the first moveto and returns the advance width separately, drops Type1-only hints (dotsection, *stem3, hint-replacement OtherSubr 3), and translates the flex OtherSubrs (1/2/0 -> two rrcurvetos) and seac (-> Type2 endchar form). Path operators (r/h/v lineto, rr/vh/hv curveto, stems, moves, endchar) share opcodes with Type2 and pass through. Best-effort / display-oriented: hints affect rendering quality, not glyph shape.
Tests: exact Type2 output for hsbw width + side-bearing folding into the first move, callsubr inlining, and div folding.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
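The hsbw and div handling can be illustrated with a small Python sketch that works on already-decoded tokens (numbers and operator names) rather than bytes. It covers only div folding, side-bearing lifting into the first moveto, and dropping dotsection; callsubr, flex, seac and stem hints are left out. The function name, the token representation, and the handling of closepath (dropped here because Type2 closes subpaths implicitly) are assumptions for the example, not the PR's behavior.

```python
DROPPED = {"dotsection", "closepath"}    # no Type2 equivalent in this sketch

def to_type2_tokens(tokens):
    """Return (advance_width, type2_tokens) for a simple Type1 token list."""
    stack, out = [], []
    width, sbx = None, 0
    first_move = True
    for tok in tokens:
        if not isinstance(tok, str):
            stack.append(tok)            # operand: push onto the stack
        elif tok == "div":
            b = stack.pop()
            a = stack.pop()
            stack.append(a / b)          # fold the division at translate time
        elif tok == "hsbw":
            sbx, width = stack           # operands are: sbx wx
            stack = []
        elif tok in DROPPED:
            stack = []
        elif tok in ("rmoveto", "hmoveto", "vmoveto") and first_move:
            first_move = False
            if tok == "rmoveto":
                dx, dy = stack
            elif tok == "hmoveto":
                dx, dy = stack[0], 0
            else:
                dx, dy = 0, stack[0]
            dx += sbx                    # Type2 starts at (0, 0), not (sbx, 0)
            if dy == 0:
                out += [dx, "hmoveto"]
            elif dx == 0:
                out += [dy, "vmoveto"]
            else:
                out += [dx, dy, "rmoveto"]
            stack = []
        else:
            out += stack + [tok]         # shared operators pass through
            stack = []
    return width, out
```

For example, `to_type2_tokens([50, 600, "hsbw", 10, 20, "rmoveto", 30, "hlineto", "endchar"])` yields a width of 600 and a first move of `60 20 rmoveto`, since the 50-unit side bearing is added to the horizontal delta.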
type1::to_cff translates every glyph (to_type2, flattening /Subrs), places .notdef at glyph 0 (synthesizing one when absent) and assembles a CFF via the builder. load_embedded_font now reads /FontFile: parse the Type1 program, convert to CFF, and hold it as a CffFont, so embedded Type1 reuses the entire 3.4 CFF path (PUA re-encode, @font-face wrap, reverse map) with no new abstract::Font subclass. Simple-font glyph selection by PostScript name (PDF /Encoding -> name -> glyph) is the shared CFF/Type1 follow-up tied to the AGL/name-mapping decision; composite and the wrap/display path work today.
Tests: a Type1 program converts to a CFF that reads back through CffFont (glyph count incl. synthesized .notdef, names) and wraps to a loadable OTTO. Full font + PDF + HTML corpus green (460 tests).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014hm5SrdJvGNJNEHxpxR1dz
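The glyph-ordering rule can be sketched in a few lines of Python. The helper name `order_glyphs` is hypothetical, and synthesizing the missing .notdef as a bare Type2 endchar (opcode 14) is an assumption; the PR does not say what the synthesized glyph contains.

```python
ENDCHAR = b"\x0e"                        # Type2 endchar operator (opcode 14)

def order_glyphs(glyphs: dict[str, bytes]) -> list[tuple[str, bytes]]:
    """Put .notdef at glyph 0, synthesizing an empty one when absent."""
    notdef = glyphs.get(".notdef", ENDCHAR)
    rest = [(name, cs) for name, cs in glyphs.items() if name != ".notdef"]
    return [(".notdef", notdef)] + rest
```

CFF requires glyph 0 to be .notdef, so fixing the order before the builder runs keeps the charset and CharStrings INDEX aligned.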
Fifth piece of stage 3: embedded Type1 fonts (/FontFile). Stacked on the 3.4 CFF PR. Now functionally complete end-to-end (still draft pending review).

Design: docs/design/pdf/stage-3.5-type1.md.

Landed (the full Type1 pipeline)
- font::type1 decryption (type1_crypt): eexec (key 55665, binary + ASCII-hex) and charstring (key 4330, /lenIV).
- Type1Program (type1_font): split sections, strip PFB framing, parse header (/FontName, /FontMatrix, /FontBBox, /Encoding), decrypt eexec, extract glyph charstrings + /Subrs.
- type1_charstring: stack machine: flatten callsubr, fold div, lift the hsbw side bearing, drop Type1-only hints, translate flex + seac.
- cff_builder: serialize a CFF from Type2 charstrings (INDEX/DICT/charset/Private, single-pass layout).
- type1::to_cff assembles it all, .notdef first; /FontFile wiring reads it back as a CffFont, reusing the entire 3.4 CFF path (PUA re-encode, @font-face wrap, reverse map).

Tests
Each layer assertion-based (non-circular cipher round-trips; exact Type2 output; CFF round-trip through the reader; Type1→CFF→OTTO end to end). Full font + PDF + HTML corpus green (460 tests).

Known follow-up
Simple-font glyph selection by PostScript name (PDF /Encoding → name → glyph) is the shared CFF/Type1 item tied to the AGL / name-mapping decision; composite fonts and the wrap/display path work today.

🤖 Generated with Claude Code