Skip to content

JET-2518: Log keyed-state key and state name on RocksDB migration failures#22

Open
rafael-sotelo wants to merge 3 commits into
release-1.20.3.ilfrom
feature/JET-2518
Open

JET-2518: Log keyed-state key and state name on RocksDB migration failures#22
rafael-sotelo wants to merge 3 commits into
release-1.20.3.ilfrom
feature/JET-2518

Conversation

@rafael-sotelo

@rafael-sotelo rafael-sotelo commented Jun 22, 2026

Copy link
Copy Markdown

Summary

When a savepoint is restored with an evolved POJO schema, Flink runs state migration: RocksDBKeyedStateBackend.migrateStateValues iterates the raw RocksDB bytes, deserializes each value with the old serializer, and re-serializes with the new one. The migration loop never decodes the key (it copies iterator.key() verbatim) and never calls setCurrentKey, so when a value fails to deserialize, PojoSerializer's error log shows key=null with no state name — leaving no way to identify which keyed entry holds the corrupt/incompatible state.

This change decodes the composite key during migration and threads it (plus the state name) into DeserializationContext, so the existing PojoSerializer failure log now prints [state=<name>, key=<key>].

Changes

  • In migrateStateValues, best-effort decode the composite key (key group + key) from each iterator entry using the existing CompositeKeySerializationUtils.
  • Wrap the per-entry migrateSerializedValue call with DeserializationContext.set(key, stateName) / clear() — the same try/finally pattern used by RocksDBValueState.value().
  • Key decoding is best-effort: a key that fails to decode leaves key=null and never masks the real value-migration failure.
  • No change to migration behavior or the bytes written.

Effect

Failure logs change from Deserialization failure: ... to Deserialization failure [state=<stateName>, key=<realKey>]: ..., pinpointing the offending entry.

Verification

  • flink-statebackend-rocksdb module compiles cleanly (./mvnw / Maven 3.8.6).

🤖 Generated with Claude Code

✨ PR Description

Purpose: Add diagnostic logging of keyed-state keys and state names during RocksDB migration failures to improve error attribution in state deserialization.

Main changes:

  • Import DeserializationContext and thread keyed-state key/name through migration loop for error diagnostics
  • Wrap migrateSerializedValue call with best-effort key decoding and DeserializationContext population/cleanup
  • Add PoisonV1TestTypeSerializer test utility and comprehensive migration failure test verifying context propagation

Generated by LinearB AI and added by gitStream.
AI-generated content may contain inaccuracies. Please verify before using.
💡 Tip: You can customize your AI Description using Guidelines Learn how

…lures

During savepoint state migration, RocksDBKeyedStateBackend.migrateStateValues
iterates raw RocksDB bytes and never decodes the key, so a value
deserialization failure logged "key=null" with no state name.

Best-effort decode the composite key (key group + key) from the iterator and
thread it, plus the state name, into DeserializationContext around the
per-entry migrateSerializedValue call (same try/finally pattern as
RocksDBValueState.value()). PojoSerializer's failure log now reports
[state=<name>, key=<key>], pinpointing the offending entry. Key decoding is
best-effort and never masks the real migration failure; migrated bytes are
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@gitstream-cm gitstream-cm Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✨ PR Review

LGTM

Generated by LinearB AI and added by gitStream.
AI-generated content may contain inaccuracies. Please verify before using.
💡 Tip: You can customize your AI Review using Guidelines Learn how

@gitstream-cm

gitstream-cm Bot commented Jun 22, 2026

Copy link
Copy Markdown

🥷 Code experts: kellinwood

kellinwood has most 👩‍💻 activity in the files.
kellinwood has most 🧠 knowledge in the files.

See details

flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java

Activity based on git-commit:

kellinwood
JUN 1089 additions & 0 deletions
MAY
APR
MAR
FEB
JAN

Knowledge based on git-blame:
kellinwood: 100%

flink-state-backends/flink-statebackend-rocksdb/src/test/java/org/apache/flink/contrib/streaming/state/RocksDBStateBackendMigrationTest.java

Activity based on git-commit:

kellinwood
JUN 66 additions & 0 deletions
MAY
APR
MAR
FEB
JAN

Knowledge based on git-blame:
kellinwood: 100%

✨ Comment /gs review for LinearB AI review. Learn how to automate it here.

@toolkit-timmy

toolkit-timmy Bot commented Jun 22, 2026

Copy link
Copy Markdown

[Automated] Latest version: 2026.6.23-4f28457

…d key

Adds RocksDBStateBackendMigrationTest#testMigrationFailureSurfacesStateAndKey
and a PoisonV1TestTypeSerializer that writes valid bytes but fails to read,
forcing migrateStateValues to fail mid-migration. The test captures the
DeserializationContext during the failing deserialize and asserts the keyed
key and state name are populated, exercising the diagnostics added in this PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codacy-production

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 1 medium

Results:
1 new issue

Category Results
Complexity 1 medium

View in Codacy

🟢 Metrics 18 complexity

Metric Results
Complexity 18

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

…izer

Reflow class Javadoc to satisfy the googleJavaFormat (AOSP) 100-column
limit enforced by spotless-check. No functional change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant