4 changes: 4 additions & 0 deletions CMakeLists.txt
@@ -198,9 +198,13 @@ set(ODR_SOURCE_FILES
"src/odr/internal/pdf/pdf_object_parser.cpp"
"src/odr/internal/pdf/pdf_page_text.cpp"

"src/odr/internal/font/cff_builder.cpp"
"src/odr/internal/font/cff_font.cpp"
"src/odr/internal/font/cff_standard_strings.cpp"
"src/odr/internal/font/cff_transform.cpp"
"src/odr/internal/font/type1_charstring.cpp"
"src/odr/internal/font/type1_crypt.cpp"
"src/odr/internal/font/type1_font.cpp"
"src/odr/internal/font/sfnt_font.cpp"
"src/odr/internal/font/sfnt_parser.cpp"
"src/odr/internal/font/sfnt_transform.cpp"
79 changes: 79 additions & 0 deletions docs/design/pdf/stage-3.5-type1.md
@@ -0,0 +1,79 @@
# PDF stage 3.5 — Type1 (`/FontFile`)

Design for the Type1 font sub-stage. Status: **design draft** (no implementation
yet — this PR seeds the branch). Roadmap entry lives in
[`src/odr/internal/pdf/AGENTS.md`](../../../src/odr/internal/pdf/AGENTS.md).

Stacked on **3.4** — the whole point is to reuse 3.4's CFF → OTF path, so Type1
support is "translate Type1 to CFF, then everything downstream is 3.4".

## Goal

Read a PDF `/FontFile` (a Type1 / PostScript font program) and render it through
the same `@font-face` + dual-layer pipeline as TrueType (3.3) and CFF (3.4). This
is the hardest single font piece, but it is precisely specified (Adobe *Type 1
Font Format* specification, with pdf.js as a reference implementation).

## What gets read (`internal/font/type1_font.{hpp,cpp}`)

`/FontFile` has three parts, sized by the font stream dictionary's `/Length1`
(clear ASCII), `/Length2` (binary eexec), and `/Length3` (trailer of zeros +
`cleartomark`):

1. **Clear text** — `/Encoding` (code → glyph name, or `StandardEncoding`),
`/FontMatrix`, `/FontBBox`, `/FontName`.
2. **eexec section** — decrypt with R = 55665 (skip the 4 random bytes), then
   parse:
   - **`/Subrs`**: index → charstring, decrypted the same way as the
     `/CharStrings` entries.
   - **`/CharStrings`**: glyph name → charstring; each charstring is decrypted
     with R = 4330 and its leading `lenIV` (default 4) bytes are dropped.
3. **Trailer** — ignored.
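
The two decryption passes above share one algorithm and differ only in the initial key and the number of leading bytes dropped. A minimal sketch, assuming the binary (not ASCII-hex) eexec form; the function and constant names are illustrative and not part of this PR:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Keys and multipliers from the Adobe Type 1 Font Format specification.
constexpr std::uint16_t eexec_key = 55665;
constexpr std::uint16_t charstring_key = 4330;
constexpr std::uint32_t crypt_c1 = 52845;
constexpr std::uint32_t crypt_c2 = 22719;

// Decrypts `cipher` with initial key `r` and drops the first `skip`
// plaintext bytes (4 for eexec, `lenIV` for charstrings and subrs).
std::string type1_decrypt(const std::string_view cipher, std::uint16_t r,
                          const std::size_t skip) {
  std::string plain;
  for (std::size_t i = 0; i < cipher.size(); ++i) {
    const auto c = static_cast<std::uint8_t>(cipher[i]);
    const auto p = static_cast<std::uint8_t>(c ^ (r >> 8));
    // Unsigned 32-bit arithmetic avoids signed overflow; keep the low 16 bits.
    r = static_cast<std::uint16_t>((c + r) * crypt_c1 + crypt_c2);
    if (i >= skip) {
      plain += static_cast<char>(p);
    }
  }
  return plain;
}

// Inverse of type1_decrypt, handy for hand-built test fixtures: encrypts
// `pad` zero bytes (standing in for the random ones) followed by `plain`.
std::string type1_encrypt(const std::string_view plain, std::uint16_t r,
                          const std::size_t pad) {
  std::string cipher;
  const auto step = [&](const std::uint8_t p) {
    const auto c = static_cast<std::uint8_t>(p ^ (r >> 8));
    r = static_cast<std::uint16_t>((c + r) * crypt_c1 + crypt_c2);
    cipher += static_cast<char>(c);
  };
  for (std::size_t i = 0; i < pad; ++i) {
    step(0);
  }
  for (const char ch : plain) {
    step(static_cast<std::uint8_t>(ch));
  }
  return cipher;
}
```

The eexec section would be decrypted with `type1_decrypt(data, eexec_key, 4)`, and each charstring found inside it with `type1_decrypt(cs, charstring_key, len_iv)`.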

PFB segment framing (`0x80` markers) is stripped if present, although PDF
normally embeds the raw three-segment form without it.
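
Stripping the PFB framing could look like the sketch below. It assumes the usual PFB layout: each segment starts with `0x80`, a type byte (1 = ASCII, 2 = binary, 3 = end of file) and, for types 1 and 2, a 4-byte little-endian payload length. `strip_pfb_framing` is a hypothetical name, not a function in this PR:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Concatenates the payloads of all PFB segments. Input that does not start
// with the 0x80 marker is assumed to be raw already and is returned as-is.
// Returns nullopt when the framing is malformed.
std::optional<std::string> strip_pfb_framing(const std::string_view data) {
  if (data.empty() || static_cast<std::uint8_t>(data[0]) != 0x80) {
    return std::string(data);
  }
  std::string out;
  std::size_t pos = 0;
  while (pos + 2 <= data.size()) {
    if (static_cast<std::uint8_t>(data[pos]) != 0x80) {
      return std::nullopt;
    }
    const auto type = static_cast<std::uint8_t>(data[pos + 1]);
    if (type == 3) {
      return out; // end-of-file segment
    }
    if ((type != 1 && type != 2) || pos + 6 > data.size()) {
      return std::nullopt;
    }
    std::uint32_t length = 0;
    for (int i = 0; i < 4; ++i) {
      const auto byte = static_cast<std::uint8_t>(data[pos + 2 + i]);
      length |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    pos += 6;
    if (length > data.size() - pos) {
      return std::nullopt;
    }
    out.append(data.substr(pos, length));
    pos += length;
  }
  if (pos != data.size()) {
    return std::nullopt; // stray trailing byte
  }
  return out; // tolerate a missing end-of-file segment
}
```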

## Type1 → Type2 charstring translation (the core)

Translate each decrypted **Type1** charstring into a **Type2 (CFF)** charstring,
then build a CFF and hand it to 3.4's wrap. The non-trivial cases:

- `hsbw` → seed the left side bearing + advance width, emit as the Type2 width +
initial `rmoveto`.
- `seac` (accented composite) → decompose into base + accent (StandardEncoding
lookup), or emit `endchar` with the seac operands (Type2 deprecated-seac form).
- `div`, `callsubr` / `return` (Subrs), and the `callothersubr` family —
**flex** (OtherSubrs 0–2) and **hint replacement** (OtherSubr 3) must be
interpreted/flattened, not passed through; this is the part that needs care.
- hint operators (`hstem`/`vstem`/`dotsection`) → Type2 equivalents (or drop;
display tolerates missing hints).
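
As an illustration of the simplest case, `hsbw`, here is a sketch of the Type2 output for `sbx wx hsbw`. It relies on the Type2 width rule: the width is an optional first operand of the first stack-clearing operator, stored as `wx - nominalWidthX` and omitted when `wx` equals `defaultWidthX`. The names `type2_int` and `translate_hsbw` are hypothetical, and a real translator might fold the side bearing into the first Type1 moveto rather than emit a separate `rmoveto`:

```cpp
#include <cassert>
#include <string>

// Appends an integer operand in Type2 charstring encoding.
void type2_int(std::string &s, const int v) {
  assert(v >= -32768 && v <= 32767); // larger values need the 16.16 form
  if (v >= -107 && v <= 107) {
    s += static_cast<char>(v + 139);
  } else if (v >= 108 && v <= 1131) {
    const int u = v - 108;
    s += static_cast<char>((u >> 8) + 247);
    s += static_cast<char>(u & 0xff);
  } else if (v >= -1131 && v <= -108) {
    const int u = -v - 108;
    s += static_cast<char>((u >> 8) + 251);
    s += static_cast<char>(u & 0xff);
  } else {
    s += static_cast<char>(28); // 16-bit integer follows
    s += static_cast<char>((v >> 8) & 0xff);
    s += static_cast<char>(v & 0xff);
  }
}

constexpr char type2_rmoveto = 21;

// Type1 `sbx wx hsbw` -> Type2 `[wx - nominal_width] sbx 0 rmoveto`.
std::string translate_hsbw(const int sbx, const int wx,
                           const int default_width, const int nominal_width) {
  std::string out;
  if (wx != default_width) {
    type2_int(out, wx - nominal_width); // width operand, only when needed
  }
  type2_int(out, sbx); // move the current point to (sbx, 0)
  type2_int(out, 0);
  out += type2_rmoveto;
  return out;
}
```

With `defaultWidthX = nominalWidthX = 0`, `50 600 hsbw` becomes the bytes `F8 EC BD 8B 15` (600, 50, 0, `rmoveto`).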

Output: a `cff::CffFont` (3.4) built from the translated charstrings, a charset
from the glyph names, and a private dict carrying the widths. Everything after
that — OTF wrap, PUA re-encode, OTS gate — is 3.4 unchanged.

## Reverse map

Charstring **glyph names** → AGL → Unicode (reuse `pdf_encoding`), same shape as
CFF. A symbolic Type1 with a built-in encoding becomes selectable via this map.
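
The name → Unicode step might be sketched as follows. The four-entry table is a stand-in for the full Adobe Glyph List, only the single-code-point `uniXXXX` form is handled, and `glyph_name_to_unicode` is a hypothetical name (the real lookup lives in `pdf_encoding`, whose API is not shown in this PR):

```cpp
#include <optional>
#include <string_view>
#include <unordered_map>

// Maps a PostScript glyph name to a Unicode code point, or nullopt.
std::optional<char32_t> glyph_name_to_unicode(std::string_view name) {
  // Tiny stand-in for the Adobe Glyph List.
  static const std::unordered_map<std::string_view, char32_t> agl = {
      {"A", U'A'},
      {"space", U' '},
      {"eacute", U'\u00E9'},
      {"fi", U'\uFB01'},
  };
  // Drop a variant suffix such as ".alt" or ".sc"; keep names like ".notdef".
  if (const auto dot = name.find('.');
      dot != std::string_view::npos && dot > 0) {
    name = name.substr(0, dot);
  }
  if (const auto it = agl.find(name); it != agl.end()) {
    return it->second;
  }
  // "uni" followed by exactly four uppercase hex digits.
  if (name.size() == 7 && name.substr(0, 3) == "uni") {
    char32_t code_point = 0;
    for (const char ch : name.substr(3)) {
      int digit = 0;
      if (ch >= '0' && ch <= '9') {
        digit = ch - '0';
      } else if (ch >= 'A' && ch <= 'F') {
        digit = ch - 'A' + 10;
      } else {
        return std::nullopt;
      }
      code_point = code_point * 16 + static_cast<char32_t>(digit);
    }
    return code_point;
  }
  return std::nullopt;
}
```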

## PDF wiring (reuse 3.3 / 3.4)

- `pdf_document_parser`: `/FontFile` → `Type1Font` → (translate) → `CffFont` →
`Font::embedded_font`.
- `Font::glyph_for_code` simple-font branch resolves code → glyph name via the
PDF `/Encoding` (Differences over base) or the font's built-in `/Encoding`,
then name → glyph id through the CFF charset.
- `to_unicode` reverse-map fallback and HTML dual-layer emission unchanged.
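
The code → glyph name → glyph id resolution in the second bullet can be sketched in isolation. `SimpleEncoding` and this free-function `glyph_for_code` are hypothetical stand-ins for the real `Font` members, and the charset is modelled as a plain vector of names indexed by glyph id:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of a simple font's encoding.
struct SimpleEncoding {
  // Base encoding, or the font's built-in /Encoding; empty = unmapped.
  std::array<std::string, 256> base;
  // PDF /Differences entries, which override the base.
  std::unordered_map<std::uint8_t, std::string> differences;
};

// Resolves a character code to a glyph id: code -> name -> charset index.
std::optional<std::uint16_t>
glyph_for_code(const SimpleEncoding &encoding,
               const std::vector<std::string> &charset,
               const std::uint8_t code) {
  assert(charset.size() <= 65536); // glyph ids are 16-bit
  const auto diff = encoding.differences.find(code);
  const std::string &name = diff != encoding.differences.end()
                                ? diff->second
                                : encoding.base[code];
  if (name.empty()) {
    return std::nullopt;
  }
  for (std::size_t gid = 0; gid < charset.size(); ++gid) {
    if (charset[gid] == name) {
      return static_cast<std::uint16_t>(gid);
    }
  }
  return std::nullopt; // the name is not in this font
}
```

A linear scan keeps the sketch short; a name → glyph id map built once per font would be the obvious choice for real use.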

## Scope / non-goals

- CID-keyed Type1 (Type1 in a Type0, rare) — defer unless a corpus file needs it.
- Multiple Master Type1 — out of scope.
- Hinting fidelity is best-effort (display only).

## Tests

Font-only, assertion-based, on a minimal hand-built (or frozen-literal) Type1:

- eexec + charstring decryption;
- an `hsbw` and a `flex`/hint-replacement charstring, translated and
  round-tripped through 3.4's CFF wrap and OTS;
- the glyph-name reverse map.

Plus a `pdf_document_parser` case: `/FontFile` → `embedded_font` with Unicode
recovery.
184 changes: 184 additions & 0 deletions src/odr/internal/font/cff_builder.cpp
@@ -0,0 +1,184 @@
#include <odr/internal/font/cff_builder.hpp>

#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

namespace odr::internal::font::cff {

namespace {

void put16(std::string &s, const std::uint16_t v) {
  s += static_cast<char>(v >> 8);
  s += static_cast<char>(v & 0xff);
}

/// A CFF DICT integer in the compact encoding (used for widths / bbox).
void dict_int(std::string &s, const int v) {
  if (v >= -107 && v <= 107) {
    s += static_cast<char>(v + 139);
  } else if (v >= 108 && v <= 1131) {
    const int u = v - 108;
    s += static_cast<char>((u >> 8) + 247);
    s += static_cast<char>(u & 0xff);
  } else if (v >= -1131 && v <= -108) {
    const int u = -v - 108;
    s += static_cast<char>((u >> 8) + 251);
    s += static_cast<char>(u & 0xff);
  } else if (v >= -32768 && v <= 32767) {
    s += static_cast<char>(28);
    put16(s, static_cast<std::uint16_t>(v));
  } else {
    s += static_cast<char>(29);
    s += static_cast<char>((v >> 24) & 0xff);
    s += static_cast<char>((v >> 16) & 0xff);
    s += static_cast<char>((v >> 8) & 0xff);
    s += static_cast<char>(v & 0xff);
  }
}

/// A CFF DICT integer in the fixed 5-byte form (`29 + int32`), so an operand
/// whose value (an offset) is not yet known can be sized before it is filled.
void dict_int_fixed(std::string &s, const std::int32_t v) {
  s += static_cast<char>(29);
  s += static_cast<char>((v >> 24) & 0xff);
  s += static_cast<char>((v >> 16) & 0xff);
  s += static_cast<char>((v >> 8) & 0xff);
  s += static_cast<char>(v & 0xff);
}

void dict_operator(std::string &s, const int op) {
  if (op >= 1200) {
    s += static_cast<char>(12);
    s += static_cast<char>(op - 1200);
  } else {
    s += static_cast<char>(op);
  }
}

/// Serialize a CFF INDEX from its members.
std::string build_index(const std::vector<std::string> &members) {
  std::string out;
  put16(out, static_cast<std::uint16_t>(members.size()));
  if (members.empty()) {
    return out; // count 0: no offSize/offsets
  }
  std::uint32_t total = 1;
  for (const std::string &m : members) {
    total += static_cast<std::uint32_t>(m.size());
  }
  const std::uint8_t off_size = total <= 0xff       ? 1
                                : total <= 0xffff   ? 2
                                : total <= 0xffffff ? 3
                                                    : 4;
  out += static_cast<char>(off_size);
  const auto put_off = [&](const std::uint32_t off) {
    for (int i = off_size - 1; i >= 0; --i) {
      out += static_cast<char>((off >> (8 * i)) & 0xff);
    }
  };
  std::uint32_t offset = 1;
  put_off(offset);
  for (const std::string &m : members) {
    offset += static_cast<std::uint32_t>(m.size());
    put_off(offset);
  }
  for (const std::string &m : members) {
    out += m;
  }
  return out;
}

} // namespace

std::string build_cff(const std::string_view name,
                      const std::vector<BuilderGlyph> &glyphs,
                      const double default_width, const double nominal_width,
                      const FontBBox bbox) {
  // CharStrings INDEX (one Type2 charstring per glyph).
  std::vector<std::string> charstrings;
  charstrings.reserve(glyphs.size());
  for (const BuilderGlyph &glyph : glyphs) {
    charstrings.push_back(glyph.charstring);
  }
  const std::string charstrings_index = build_index(charstrings);

  // String INDEX: every glyph name gets a custom SID (391 + position). Glyph 0
  // is the implicit `.notdef` (SID 0), so its name is not stored; the charset
  // lists SIDs for glyphs 1..n-1.
  std::vector<std::string> strings;
  for (std::size_t i = 1; i < glyphs.size(); ++i) {
    strings.push_back(glyphs[i].name);
  }
  const std::string string_index = build_index(strings);

  // Format-0 charset: SID per glyph 1..n-1.
  std::string charset;
  charset += static_cast<char>(0); // format 0
  for (std::size_t i = 1; i < glyphs.size(); ++i) {
    put16(charset, static_cast<std::uint16_t>(391 + (i - 1)));
  }

  // Private DICT: defaultWidthX (20), nominalWidthX (21).
  std::string private_dict;
  dict_int(private_dict, static_cast<int>(default_width));
  dict_operator(private_dict, 20);
  dict_int(private_dict, static_cast<int>(nominal_width));
  dict_operator(private_dict, 21);

  const std::string name_index =
      build_index({std::string(name.empty() ? "ODRType1" : name)});
  const std::string global_subrs = build_index({});

  // Top DICT, with the offsets to charset / CharStrings / Private filled once
  // the layout is known. Fixed-width offset integers keep the size constant.
  const auto top_dict = [&](const std::uint32_t charset_off,
                            const std::uint32_t charstrings_off,
                            const std::uint32_t private_off) {
    std::string d;
    dict_int(d, bbox.x_min);
    dict_int(d, bbox.y_min);
    dict_int(d, bbox.x_max);
    dict_int(d, bbox.y_max);
    dict_operator(d, 5); // FontBBox
    dict_int_fixed(d, static_cast<std::int32_t>(charset_off));
    dict_operator(d, 15); // charset
    dict_int_fixed(d, static_cast<std::int32_t>(charstrings_off));
    dict_operator(d, 17); // CharStrings
    dict_int_fixed(d, static_cast<std::int32_t>(private_dict.size()));
    dict_int_fixed(d, static_cast<std::int32_t>(private_off));
    dict_operator(d, 18); // Private [size offset]
    return d;
  };

  const std::string top_dict_probe = build_index({top_dict(0, 0, 0)});
  constexpr std::uint32_t header_size = 4;
  const auto prefix = static_cast<std::uint32_t>(
      header_size + name_index.size() + top_dict_probe.size() +
      string_index.size() + global_subrs.size());
  // Layout after the prefix: CharStrings, charset, Private.
  const std::uint32_t charstrings_off = prefix;
  const std::uint32_t charset_off =
      charstrings_off + static_cast<std::uint32_t>(charstrings_index.size());
  const std::uint32_t private_off =
      charset_off + static_cast<std::uint32_t>(charset.size());

  const std::string top_dict_index =
      build_index({top_dict(charset_off, charstrings_off, private_off)});

  std::string out;
  out += static_cast<char>(1); // major
  out += static_cast<char>(0); // minor
  out += static_cast<char>(4); // hdrSize
  out += static_cast<char>(4); // offSize (absolute offsets; legacy/unused)
  out += name_index;
  out += top_dict_index;
  out += string_index;
  out += global_subrs;
  out += charstrings_index;
  out += charset;
  out += private_dict;
  return out;
}

} // namespace odr::internal::font::cff
40 changes: 40 additions & 0 deletions src/odr/internal/font/cff_builder.hpp
@@ -0,0 +1,40 @@
#pragma once

#include <odr/font.hpp>

#include <string>
#include <string_view>
#include <vector>

namespace odr::internal::font::cff {

/// One glyph for the CFF builder: its PostScript name and its **Type2**
/// charstring (already translated from Type1, if applicable).
struct BuilderGlyph {
  std::string name;
  std::string charstring;
};

/// Serialize a name-keyed CFF font from Type2 charstrings.
///
/// Assembles the minimal CFF a `CffFont` reader (and, after wrapping, a
/// browser) needs: Header, Name INDEX, Top DICT (FontBBox +
/// charset/CharStrings/Private offsets), String INDEX (every glyph name, SID
/// 391+), an empty Global Subr INDEX, the CharStrings INDEX, a format-0 charset
/// and a Private DICT (`defaultWidthX`/`nominalWidthX`). Glyph 0 is the implicit
/// `.notdef`; the caller orders @p glyphs so glyph 0 is `.notdef`.
///
/// This is the assembly target for the Type1 -> CFF path (stage 3.5): the
/// translated Type2 charstrings go in here, the result feeds `CffFont` +
/// `wrap_to_otf` (3.4). No `FontMatrix` is emitted, so the font is 1000
/// units/em (the Type1 default); a non-default matrix is a follow-up.
///
/// Offsets in the Top DICT use the fixed-width 5-byte integer form, so the Top
/// DICT's size is independent of the offset values and the layout can be
/// computed before they are filled in.
[[nodiscard]] std::string build_cff(std::string_view name,
                                    const std::vector<BuilderGlyph> &glyphs,
                                    double default_width, double nominal_width,
                                    FontBBox bbox);

} // namespace odr::internal::font::cff