Skip to content

perf: make Node subclass bytes (C-level hash/equality)#43

Merged
AmitMY merged 1 commit into
mainfrom
perf/node-subclass-bytes
Jun 28, 2026
Merged

perf: make Node subclass bytes (C-level hash/equality)#43
AmitMY merged 1 commit into
mainfrom
perf/node-subclass-bytes

Conversation

@AmitMY

@AmitMY AmitMY commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Answers "is there a faster counter?" — the bottleneck isn't Counter (it's already C); it's hashing/comparing the candidate node-tuples it's fed. Counter(graph.get_merges()) re-hashes them every step, and merge scans compare them. With Node a frozen dataclass, __hash__/__eq__ are Python calls.

Make a Node be its bytes:

class Node(bytes, GraphVertex):
    __str__ = GraphVertex.__str__   # bytes wins the MRO; keep our str()
    def __new__(cls, value): return super().__new__(cls, value)
    @property
    def value(self): return self[:]
    ...

Now __hash__/__eq__/__len__ are bytes' C-level operations (and tuples of bytes hash ~4× faster than tuples of dataclass instances). It's net simpler (drops the custom __eq__/__hash__/__len__).

Why this one matters for the whole pipeline: it speeds the recount path, so unlike the disconnected-only incremental work (#42) it also helps Boundless and SuperBPE — and the two compose.

Output identical (merge digests unchanged at 50/200 and 100/400), 138 tests pass, memory unchanged (nodes are interned, so the slightly larger object is negligible):

main this PR
BPE 0.384s / 1.33MB 0.220s / 1.33MB — 1.7×
BNE n=4 0.768s / 2.63MB 0.337s / 2.63MB — 2.3×
Boundless 0.469s / 1.42MB 0.264s / 1.42MB — 1.8×

Note: Node now compares equal to a raw bytes of the same value (bytes __eq__), but in the graph only Node/NodesSequence are ever compared, so behavior is unchanged in practice.

🤖 Generated with Claude Code

@AmitMY AmitMY force-pushed the perf/node-subclass-bytes branch from 4c4bec0 to a0d00c9 Compare June 28, 2026 04:03
The training loop's cost is hashing and comparing candidate node-tuples:
Counter(graph.get_merges()) hashes them every step, and merge scans
compare them. With Node a frozen dataclass, __hash__/__eq__ were Python
calls. Making Node a bytes subclass (a Node *is* its bytes) gives those
operations C-level implementations — and tuples of bytes hash ~4x faster
than tuples of dataclass instances.

This speeds the recount path too, so unlike the disconnected-only
incremental work it also helps Boundless/SuperBPE. It's net simpler:
__eq__/__hash__/__len__ are inherited from bytes, and Node no longer
needs a `value` field/property at all (a Node *is* its bytes) — Tree.merge
just returns the token. bytes wins the MRO for the dunders we want; we
keep GraphVertex.__str__ explicitly. Memory is unchanged (nodes are
interned).

~1.7x BPE, ~2.3x BNE, ~1.8x Boundless; memory unchanged; output identical
(digests unchanged); 138 tests pass.

Note: drops the public Node.value attribute — a Node is now its own bytes,
so use the node directly or bytes(node).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AmitMY AmitMY force-pushed the perf/node-subclass-bytes branch from a0d00c9 to 0e651f2 Compare June 28, 2026 04:17
@AmitMY AmitMY merged commit 587df0f into main Jun 28, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant