Fix: a very large numeric annotation permanently broke indexing and queries #35

Merged
pswies merged 3 commits into main from fix-numeric-values-above-int64
Jul 3, 2026
pswies commented Jul 3, 2026

Contributor

What was broken

If anyone saved an entity with a numeric annotation of 2⁶³ or larger (roughly 9.2 quintillion), indexing that entity failed — and the indexer shut down for good. From that moment every arkiv_query on the node timed out with context cancelled, and a restart didn't help: the node re-read the same entity and broke again.

This is what took down query serving on all Braga RPC nodes on 2026-07-03 (the entity had a numeric annotation named big with such a value).

Why it happened

Numeric annotation values are kept in an SQLite column. SQLite stores whole numbers as signed 64-bit integers, and Go's database/sql package refuses to bind a uint64 whose high bit is set (that is, any value of 2⁶³ or more):

sql: converting argument $2 type: uint64 values with high bit set are not supported

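The value was passed to the insert unchanged, so the insert failed and the indexer stopped. Because the node re-reads the same entity after a restart, a single oversized value halted indexing permanently.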
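
The rejection can be reproduced without a database. The sketch below calls driver.DefaultParameterConverter from Go's standard database/sql/driver package, which is the converter database/sql uses for ordinary arguments by default. Whether the node's SQLite driver goes through exactly this path is an assumption, and the helper name bindable is hypothetical.

```go
package main

import (
	"database/sql/driver"
	"fmt"
)

// bindable reports whether database/sql's default converter accepts v
// as a query argument. The name is hypothetical, for illustration only.
func bindable(v uint64) error {
	_, err := driver.DefaultParameterConverter.ConvertValue(v)
	return err
}

func main() {
	// Largest uint64 that still fits in int64: accepted (nil error).
	fmt.Println(bindable(1<<63 - 1))
	// Smallest uint64 with the high bit set: rejected with an error.
	fmt.Println(bindable(1 << 63))
}
```

The boundary sits exactly at 2⁶³, which matches the threshold described above.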

The fix

Store the value with its top bit flipped, as a signed number (value XOR 2⁶³). This fixes the crash and keeps every kind of lookup correct:

  • No more crash: the stored number always fits the database's signed 64-bit column.
  • Ordering stays right for every value: the flipped form grows in exactly the same order as the original number over the whole 0 … 2⁶⁴−1 range, so <, <=, >, >= give correct answers even across the 2⁶³ boundary. (An earlier revision of this PR kept large values misordered; review pushback was correct and this now handles them properly.)
  • Exact matches and IN lookups are one-to-one, so they stay exact for every possible value.
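
A minimal Go sketch of the encoding follows. The function names encode, decode, and reinterpret are hypothetical and are not taken from the PR's code. reinterpret is the plain bit-pattern cast from the earlier revision, included only to show why it misorders large values.

```go
package main

import "fmt"

const signBit = uint64(1) << 63

// encode flips the top bit and reinterprets the result as int64, so the
// stored value always fits a signed 64-bit column.
func encode(v uint64) int64 { return int64(v ^ signBit) }

// decode reverses encode.
func decode(s int64) uint64 { return uint64(s) ^ signBit }

// reinterpret is the plain cast used by the earlier revision:
// values of 2^63 and above come out negative.
func reinterpret(v uint64) int64 { return int64(v) }

func main() {
	values := []uint64{0, 100, 1<<63 - 1, 1 << 63, 1<<64 - 1}
	for i, v := range values {
		fmt.Printf("%20d  xor=%21d  cast=%21d\n", v, encode(v), reinterpret(v))
		if decode(encode(v)) != v {
			panic("round trip failed")
		}
		// The input slice is ascending, so the encoded form must be too.
		if i > 0 && encode(values[i-1]) >= encode(v) {
			panic("encode is not strictly increasing")
		}
	}
}
```

With the XOR form, 0 maps to the smallest int64 and 2⁶⁴−1 to the largest, so signed comparison in SQL agrees with unsigned comparison of the original values. Query bounds would need the same encoding before being compared against stored rows.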

Existing databases

A one-time migration (000002) re-encodes the numeric index table automatically the first time a node starts on the new version. Notes for reviewers:

  • It rebuilds the table rather than updating in place — the transform swaps the two halves of the number line, so an in-place update could momentarily create duplicate keys.
  • The shift constant is written as two steps (± 9223372036854775807 ± 1): the one-step constant 2⁶³ (9223372036854775808) doesn't fit in a signed 64-bit integer, so SQLite would silently switch to floating point and corrupt values. This exact failure was caught by the migration test in this PR.
  • Wedged Braga nodes recover on upgrade: they resume from where they stopped, and the previously-fatal entity now indexes fine.
  • One-time startup cost proportional to the size of the numeric index table (a table copy).
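
The arithmetic behind the two-step shift can be checked in Go. This is a sketch under stated assumptions: it does not reproduce the SQL in migration 000002, the function names are hypothetical, and floatShift only imitates double-precision rounding rather than SQLite's exact behavior. It relies on the fact that rows written before the fix hold values below 2⁶³, since larger ones failed on insert.

```go
package main

import (
	"fmt"
	"math"
)

// encode is the new storage form: value XOR 2^63, as int64.
func encode(v uint64) int64 { return int64(v ^ (1 << 63)) }

// twoStepShift subtracts 2^63 using two constants that each fit in
// int64. For old values in [0, 2^63-1] no intermediate result overflows.
func twoStepShift(old int64) int64 { return old - math.MaxInt64 - 1 }

// floatShift imitates a one-step shift carried out in double precision,
// which cannot represent most integers near 2^63 exactly.
func floatShift(old int64) int64 {
	return int64(float64(old) - 9223372036854775808.0)
}

func main() {
	for _, old := range []int64{0, 100, math.MaxInt64} {
		fmt.Printf("old=%d encode=%d twoStep=%d float=%d\n",
			old, encode(uint64(old)), twoStepShift(old), floatShift(old))
	}
}
```

For every pre-fix value the two-step result equals the XOR encoding, while the floating-point version collapses nearby values such as 0 and 100 onto the same result.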

Tests

  • Regression: index an entity with numeric annotations 2⁶³ and 2⁶⁴−1 through the same code path that failed in production; find it again by equality and IN lookup.
  • Range ordering across the boundary: entities with values 100, 2⁶³, 2⁶⁴−1 answer <, <=, >, >= correctly in all combinations.
  • Migration: build a database exactly as the released version wrote it (schema version 1, raw values), open it with the new code, and verify old rows still answer equality and range queries correctly.

🤖 Generated with Claude Code

pswies and others added 3 commits July 3, 2026 15:32
Saving an entity with a numeric annotation of 2^63 or larger killed the
event follower: SQLite stores integers as signed 64-bit, and Go's
database/sql refuses to bind a uint64 with the high bit set, so the
insert failed with "uint64 values with high bit set are not supported"
and the follower stopped permanently. Every arkiv_query then timed out
("context cancelled") until restart, and re-broke on the same entity.

Store the value's 8 bytes reinterpreted as int64 instead (two's
complement, lossless). Values below 2^63 keep the exact same stored
form, so existing databases need no migration and wedged nodes recover
by simply resuming. Equality and IN lookups stay exact for all values.

Known trade-off, left as follow-up: values of 2^63+ sort as negative in
SQL, so range comparisons order that extreme band incorrectly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Answers the review question "doesn't this break range queries for
large numbers?":

- For values below 2^63 (all data that could exist before the fix,
  since larger values crashed the indexer on write) the stored number
  is unchanged, so <, <=, >, >= behave exactly as before. Pinned by a
  spec.
- For values of 2^63+ the documented trade-off applies: they sort as
  if negative, so ranges misplace them. A second spec pins that
  current behavior explicitly so the follow-up (order-preserving
  encoding + migration) must consciously flip it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…lues

Addresses the review concern that values of 2^63+ stored as plain
int64 bit patterns sort as negative and give wrong range results.

Store value XOR 2^63 (as int64) instead: the mapping is one-to-one and
strictly increasing over the whole uint64 range, so SQL's signed
ordering now equals numeric ordering everywhere - range queries are
correct for every value, not just those below 2^63.

Migration 000002 re-encodes existing rows once on startup via a table
rebuild (an in-place UPDATE could transiently collide under the
(name, value) primary key). The shift is split into two in-range steps
because the literal 9223372036854775808 does not fit in int64 and
SQLite would read it as floating point, silently storing imprecise
REALs - caught by the new migration test.

Tests: cross-boundary range ordering (100 / 2^63 / 2^64-1), plus a
migration test that builds a version-1 database with raw values and
verifies queries still answer correctly after upgrade.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
pswies merged commit 9733fb1 into main Jul 3, 2026
1 check passed
pswies deleted the fix-numeric-values-above-int64 branch July 3, 2026 14:36