Skip to content

GH-590: Allow VARIANT value to be omitted#591

Open
HuaHuaY wants to merge 2 commits into
apache:masterfrom
HuaHuaY:omitted_variant_value
Open

GH-590: Allow VARIANT value to be omitted#591
HuaHuaY wants to merge 2 commits into
apache:masterfrom
HuaHuaY:omitted_variant_value

Conversation

@HuaHuaY

@HuaHuaY HuaHuaY commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Rationale for this change

Allow VARIANT value to be omitted.

There are some cases in parquet-testing https://github.com/apache/parquet-testing/tree/master/shredded_variant which don't have value field.

# Doesn't have top value field.
parquet-testing/shredded_variant on  master 
❯ ~/arrow/cpp/out/build/ninja-debug/debug/parquet-dump-schema case-131.parquet 
required group field_id=-1 table {
  required int32 field_id=1 id;
  required group field_id=2 var (Variant(1)) {
    required binary field_id=-1 metadata;
    optional int32 field_id=-1 typed_value;
  }
}

# Doesn't have inner value field.
parquet-testing/shredded_variant on  master 
❯ ~/arrow/cpp/out/build/ninja-debug/debug/parquet-dump-schema case-132.parquet
required group field_id=-1 table {
  required int32 field_id=1 id;
  optional group field_id=2 var (Variant(1)) {
    required binary field_id=-1 metadata;
    optional binary field_id=-1 value;
    optional group field_id=-1 typed_value {
      required group field_id=-1 a {
        optional int32 field_id=-1 typed_value;
      }
      required group field_id=-1 b {
        optional binary field_id=-1 typed_value (String);
      }
    }
  }
}

What changes are included in this PR?

Updated the descriptions in the variant-related Markdown files.

Do these changes have PoC implementations?

No.

@HuaHuaY

HuaHuaY commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

Comment thread VariantEncoding.md Outdated
Comment thread LogicalTypes.md Outdated
Comment thread VariantEncoding.md Outdated
@HuaHuaY HuaHuaY requested a review from wgtmac June 30, 2026 07:34
@wgtmac

wgtmac commented Jun 30, 2026

Copy link
Copy Markdown
Member

Could you help review this? @aihuaxu @steveloughran

@alamb

alamb commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Perhaps @rdblue or @gene-db (the authors of the original spec in #456 #460) can help take a look

The example files were generated by spark so I assume they are now out in the wild and we shoudl adjust the spec accordingly

@rdblue

rdblue commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

I don't think this is correct.

The test cases are the ones that we use in Parquet Java and Iceberg to ensure that we can read variant data as long as it can be interpreted correctly. Being liberal about what we accept does not mean that it is a good idea for writers to produce values that way. We don't need to adjust the spec, this is just a reasonable choice for readers so they don't needlessly fail.

@wgtmac

wgtmac commented Jul 2, 2026

Copy link
Copy Markdown
Member

Thanks @rdblue for the explanation! Then I agree that this spec doesn't need any change. Perhaps we have two follow-ups to avoid future confusion:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants