fix(scripts): distinguish absent vs failed title fetches; gate resume cursor (#236) #246

Merged: williamzujkowski merged 1 commit into main from fix/import-history-silent-drop-236 on Jun 30, 2026

@williamzujkowski (Collaborator)

Summary

fetchWithRateRetry returned the same null both for a title that is legitimately absent at a release point (301/302/404/doc-not-found) and for a transient failure (5xx after retries, 429 exhaustion, network error after retries). downloadAndExtractXml logged both with the same misleading info-level message, "Title not available"; processBatch skipped them identically; and processReleasePoint advanced lastCompletedReleasePoint unconditionally. As a result, a single network blip during a multi-hour import permanently dropped that title's update from the historical git record, and --resume never retried it. The main loop also advanced the cursor when a release point threw.

Fix

  • Discriminate outcomes: TitleFetch = { ok } | { absent } | { failed }. fetchWithRateRetry returns a FetchOutcome and downloadAndExtractXml returns a TitleFetch. A present-but-corrupt archive or a missing XML entry is classified as failed (worth retrying or surfacing), not absent.
  • Gate the cursor: processBatch counts failedTitles, and processReleasePoint advances lastCompletedReleasePoint only when failedTitles === 0. On failure it still saves manifests, so --resume reprocesses the release point and re-imports only the failed title; already-imported titles are no-op deltas, which keeps the retry cheap and idempotent.
  • Surface failures: the main loop no longer advances the cursor when a release point throws, and the script exits non-zero if any release point had unrecovered failures, so silent gaps no longer look like a clean run.
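
The three changes above can be sketched together in TypeScript. This is a minimal illustration under stated assumptions, not the script's actual code: TitleFetch, processBatch, processReleasePoint and lastCompletedReleasePoint are names from this PR, but their signatures here are guesses, and ResumeState, BatchResult and runImport are hypothetical. Fetching, committing titles and saving manifests are stubbed out. One point the PR does not spell out: if the loop kept going after a failed release point, a later clean release point would advance the cursor past the failed one, so this sketch stops at the first failure.

```typescript
// Hypothetical sketch: TitleFetch, processBatch, processReleasePoint and
// lastCompletedReleasePoint come from the PR description; every signature,
// plus ResumeState, BatchResult and runImport, is an assumption.
type TitleFetch =
  | { kind: "ok"; xml: string }
  | { kind: "absent" }
  | { kind: "failed"; reason: string };

interface ResumeState {
  lastCompletedReleasePoint: string | null;
}

interface BatchResult {
  imported: number;
  absentTitles: number;
  failedTitles: number;
}

function processBatch(results: Map<number, TitleFetch>): BatchResult {
  const counts: BatchResult = { imported: 0, absentTitles: 0, failedTitles: 0 };
  for (const [title, outcome] of results) {
    switch (outcome.kind) {
      case "ok":
        // The real script would commit the title's XML here; the sketch only counts it.
        counts.imported++;
        break;
      case "absent":
        // Legitimately not present at this release point: nothing to import.
        counts.absentTitles++;
        break;
      case "failed":
        // Surfaced, and counted so the cursor is not advanced past it.
        console.warn(`title ${title}: fetch failed (${outcome.reason})`);
        counts.failedTitles++;
        break;
    }
  }
  return counts;
}

// Returns true when the release point completed with no failed titles.
function processReleasePoint(
  releasePoint: string,
  results: Map<number, TitleFetch>,
  state: ResumeState,
): boolean {
  const { failedTitles } = processBatch(results);
  // Manifests would be saved here regardless, so completed imports persist.
  if (failedTitles > 0) {
    return false; // leave the cursor alone so --resume reprocesses this point
  }
  state.lastCompletedReleasePoint = releasePoint;
  return true;
}

// Returns the exit code: 0 for a clean run, 1 if any release point failed.
function runImport(
  releasePoints: Array<{ id: string; load: () => Map<number, TitleFetch> }>,
  state: ResumeState,
): number {
  for (const rp of releasePoints) {
    let clean = false;
    try {
      clean = processReleasePoint(rp.id, rp.load(), state);
    } catch (err) {
      // A thrown release point must not advance the cursor either.
      console.error(`release point ${rp.id} threw: ${String(err)}`);
    }
    if (!clean) {
      // Assumption: stop here so a later success cannot move the cursor past this point.
      return 1;
    }
  }
  return 0;
}
```

A caller could assign the result to Node's process.exitCode so a wrapper or CI job sees the failure; rerunning with --resume then restarts after lastCompletedReleasePoint, which still points before the failed release point.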

Verification

Classification verified via tsx (404/302/doc-not-found → absent; 200 ZIP → response; persistent 5xx / network error → failed).
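
The classification verified above could look roughly like the following sketch. It assumes an injectable fetcher and a hypothetical RawResponse shape whose docNotFound flag stands in for however the script actually detects doc-not-found; ABSENT_STATUSES, Fetcher and the default attempt count are illustrative, and backoff for 429 is omitted. The property it demonstrates is that absent responses return immediately, while 429, 5xx and network errors are retried and only reported as failed once retries are exhausted.

```typescript
// Hypothetical sketch: FetchOutcome and fetchWithRateRetry are names from the PR,
// but RawResponse, Fetcher, ABSENT_STATUSES and docNotFound are assumptions.
type FetchOutcome =
  | { kind: "response"; body: Uint8Array }
  | { kind: "absent"; reason: string }
  | { kind: "failed"; reason: string };

// Minimal response shape so the sketch does not depend on a real HTTP client.
interface RawResponse {
  status: number;
  body: Uint8Array;
  docNotFound?: boolean; // assumption: stand-in for the script's doc-not-found check
}

type Fetcher = (url: string) => Promise<RawResponse>;

// Statuses that mean "this title does not exist at this release point".
const ABSENT_STATUSES = new Set([301, 302, 404]);

async function fetchWithRateRetry(
  url: string,
  fetcher: Fetcher,
  maxAttempts = 3,
): Promise<FetchOutcome> {
  let lastReason = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetcher(url);
      if (ABSENT_STATUSES.has(res.status) || res.docNotFound) {
        // Legitimately absent: retrying would not change the answer.
        return { kind: "absent", reason: `status ${res.status}` };
      }
      if (res.status === 200) {
        return { kind: "response", body: res.body };
      }
      // Any other status (429, 5xx, ...) is treated as transient and retried.
      lastReason = `status ${res.status}`;
    } catch (err) {
      // Network error: also transient, so retry.
      lastReason = `network error: ${String(err)}`;
    }
    // A real implementation would back off here (e.g. honour Retry-After).
  }
  return { kind: "failed", reason: `${lastReason} after ${maxAttempts} attempts` };
}
```

Injecting the fetcher keeps the classification testable without network access: a stub returning status 404 classifies as absent, and a stub that always throws classifies as failed after maxAttempts calls.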

Note: scripts/ is not a pnpm workspace package, and it imports @civic-source/shared/types, which is not symlinked at the repo root, so it has no CI test or typecheck (the pre-existing delta-detector.test.ts is likewise local-only). Wiring scripts/ into CI is a worthwhile follow-up but is kept out of scope here. A green CI run therefore only shows that the workspace packages are unaffected.

Closes #236

… cursor (#236)

fetchWithRateRetry returned the same null for a title that is legitimately
absent at a release point (301/302/404/doc-not-found) and for a transient
failure (5xx after retries, 429 exhaustion, network error after retries).
downloadAndExtractXml logged both as the misleading info "Title not available",
processBatch skipped them identically, and processReleasePoint advanced
lastCompletedReleasePoint unconditionally — so a network blip permanently
dropped a title's update from the historical record and --resume never retried
it. The main loop also advanced the cursor on a thrown release point.

- Introduce a TitleFetch discriminated union (ok | absent | failed);
  fetchWithRateRetry returns a FetchOutcome, downloadAndExtractXml returns
  TitleFetch. A present-but-corrupt archive / missing XML is 'failed', not
  'absent'.
- processBatch counts failedTitles; processReleasePoint only advances the
  resume cursor when failedTitles === 0 (it still saves manifests so a retry
  re-imports only the failed title — already-imported titles are no-op deltas).
- The main loop no longer advances the cursor on a thrown release point and
  exits non-zero when any release point had unrecovered failures, so silent
  gaps surface instead of looking like a clean run.

Classification verified via tsx (404/302/doc-not-found -> absent; 200 zip ->
response; persistent 5xx / network error -> failed). scripts/ is not a
workspace package so it has no CI test/typecheck; wiring it in is a follow-up.

Closes #236

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
williamzujkowski requested a review from a team as a code owner on June 30, 2026 01:48
williamzujkowski merged commit 73c87c9 into main on Jun 30, 2026
3 checks passed
williamzujkowski deleted the fix/import-history-silent-drop-236 branch on June 30, 2026 02:01
Development

Successfully merging this pull request may close these issues.

bug(scripts): import-history silently drops a title's update on transient fetch failure; --resume never recovers it
