Fix NameMapping loss in ParquetUtil.footerMetrics#14617
Merged
Conversation
When footerMetrics was called with a NameMapping, it correctly applied the mapping to get field IDs but then passed the wrong MessageType to ParquetMetrics.metrics, causing the IDs to be lost. This resulted in empty metrics for Parquet files without embedded field IDs. Fixed by passing parquetTypeWithIds instead of the original messageType.
| .optionalString("data") | ||
| .endRecord(); | ||
|
|
||
| ParquetWriter<GenericData.Record> writer = |
Contributor
There was a problem hiding this comment.
nit: you can use try-with-resource here too.
ebyhr
reviewed
Nov 19, 2025
| .endRecord(); | ||
|
|
||
| ParquetWriter<GenericData.Record> writer = | ||
| AvroParquetWriter.<GenericData.Record>builder(new org.apache.hadoop.fs.Path(file.toURI())) |
Member
There was a problem hiding this comment.
builder(Path file) is deprecated. Can we builder(OutputFile file) instead? You can refer to #14620
| reader.getFooter(), Stream.empty(), MetricsConfig.getDefault(), nameMapping); | ||
|
|
||
| // The key assertion: column sizes should be keyed by field IDs from NameMapping | ||
| assertThat(metrics.columnSizes()).containsKeys(1, 2); |
Member
There was a problem hiding this comment.
This assertion passes even when the metrics.columnSizes() contains additional keys. Could you use containsOnlyKeys instead?
nandorKollar
approved these changes
Nov 19, 2025
ebyhr
approved these changes
Nov 25, 2025
Contributor
|
Thanks @tmater for the PR. Thanks @ebyhr @nandorKollar for the review! |
Contributor
Author
|
Thanks @huaxingao , @nandorKollar, @ebyhr for the review! |
huaxingao
pushed a commit
to huaxingao/iceberg
that referenced
this pull request
Dec 6, 2025
* Fix NameMapping loss in ParquetUtil.footerMetrics When footerMetrics was called with a NameMapping, it correctly applied the mapping to get field IDs but then passed the wrong MessageType to ParquetMetrics.metrics, causing the IDs to be lost. This resulted in empty metrics for Parquet files without embedded field IDs. Fixed by passing parquetTypeWithIds instead of the original messageType. * Addressed review comments (cherry picked from commit a3c538f)
singhpk234
pushed a commit
that referenced
this pull request
Dec 6, 2025
* Fix NameMapping loss in ParquetUtil.footerMetrics When footerMetrics was called with a NameMapping, it correctly applied the mapping to get field IDs but then passed the wrong MessageType to ParquetMetrics.metrics, causing the IDs to be lost. This resulted in empty metrics for Parquet files without embedded field IDs. Fixed by passing parquetTypeWithIds instead of the original messageType. * Addressed review comments (cherry picked from commit a3c538f)
thomaschow
pushed a commit
to thomaschow/iceberg
that referenced
this pull request
Jan 19, 2026
* Fix NameMapping loss in ParquetUtil.footerMetrics When footerMetrics was called with a NameMapping, it correctly applied the mapping to get field IDs but then passed the wrong MessageType to ParquetMetrics.metrics, causing the IDs to be lost. This resulted in empty metrics for Parquet files without embedded field IDs. Fixed by passing parquetTypeWithIds instead of the original messageType. * Addressed review comments
talatuyarer
pushed a commit
to talatuyarer/iceberg
that referenced
this pull request
Apr 1, 2026
* Fix NameMapping loss in ParquetUtil.footerMetrics When footerMetrics was called with a NameMapping, it correctly applied the mapping to get field IDs but then passed the wrong MessageType to ParquetMetrics.metrics, causing the IDs to be lost. This resulted in empty metrics for Parquet files without embedded field IDs. Fixed by passing parquetTypeWithIds instead of the original messageType. * Addressed review comments
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix NameMapping loss in ParquetUtil.footerMetrics
Summary
Fixed a bug where
ParquetUtil.footerMetricswas losing field IDs when using NameMapping, resulting in empty metrics for Parquet files without embedded field IDs.Background
When
footerMetricsis called with a NameMapping, it applies the mapping to get field IDs viagetParquetTypeWithIds(), but then passed the original MessageType toParquetMetrics.metrics. Later in themetrics()call, field IDs are extracted from the MessageType viatype.getColumnDescription().getPrimitiveType().getId(), which returns null for the original MessageType without IDs, causing all metrics to be skipped.Changes
parquetTypeWithIdstoParquetMetrics.metricsto preserve field IDs from NameMappingmessageTypevariableTesting
testFooterMetricsWithNameMappingForFileWithoutIdsthat verifies metrics are keyed by field IDs from NameMapping