Skip to content

Fix NameMapping loss in ParquetUtil.footerMetrics#14617

Merged
huaxingao merged 2 commits into
apache:mainfrom
tmater:footermetrics_namemapping
Nov 25, 2025
Merged

Fix NameMapping loss in ParquetUtil.footerMetrics#14617
huaxingao merged 2 commits into
apache:mainfrom
tmater:footermetrics_namemapping

Conversation

@tmater

@tmater tmater commented Nov 18, 2025

Copy link
Copy Markdown
Contributor

Fix NameMapping loss in ParquetUtil.footerMetrics

Summary

Fixed a bug where ParquetUtil.footerMetrics was losing field IDs when using NameMapping, resulting in empty metrics for Parquet files without embedded field IDs.

Background

When footerMetrics is called with a NameMapping, it applies the mapping to get field IDs via getParquetTypeWithIds(), but then passed the original MessageType to ParquetMetrics.metrics. Later in the metrics() call, field IDs are extracted from the MessageType via type.getColumnDescription().getPrimitiveType().getId(), which returns null for the original MessageType without IDs, causing all metrics to be skipped.

Changes

  • Pass parquetTypeWithIds to ParquetMetrics.metrics to preserve field IDs from NameMapping
  • Removed unused messageType variable

Testing

  • Added testFooterMetricsWithNameMappingForFileWithoutIds that verifies metrics are keyed by field IDs from NameMapping
When footerMetrics was called with a NameMapping, it correctly applied
the mapping to get field IDs but then passed the wrong MessageType to
ParquetMetrics.metrics, causing the IDs to be lost. This resulted in
empty metrics for Parquet files without embedded field IDs.

Fixed by passing parquetTypeWithIds instead of the original messageType.
.optionalString("data")
.endRecord();

ParquetWriter<GenericData.Record> writer =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use try-with-resource here too.

.endRecord();

ParquetWriter<GenericData.Record> writer =
AvroParquetWriter.<GenericData.Record>builder(new org.apache.hadoop.fs.Path(file.toURI()))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builder(Path file) is deprecated. Can we builder(OutputFile file) instead? You can refer to #14620

reader.getFooter(), Stream.empty(), MetricsConfig.getDefault(), nameMapping);

// The key assertion: column sizes should be keyed by field IDs from NameMapping
assertThat(metrics.columnSizes()).containsKeys(1, 2);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion passes even when the metrics.columnSizes() contains additional keys. Could you use containsOnlyKeys instead?

@tmater tmater requested review from ebyhr and nandorKollar November 19, 2025 10:31

@huaxingao huaxingao left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@huaxingao huaxingao merged commit a3c538f into apache:main Nov 25, 2025
44 checks passed
@huaxingao

Copy link
Copy Markdown
Contributor

Thanks @tmater for the PR. Thanks @ebyhr @nandorKollar for the review!

@tmater

tmater commented Nov 25, 2025

Copy link
Copy Markdown
Contributor Author

Thanks @huaxingao , @nandorKollar, @ebyhr for the review!

@tmater tmater deleted the footermetrics_namemapping branch November 25, 2025 05:25
huaxingao pushed a commit to huaxingao/iceberg that referenced this pull request Dec 6, 2025
* Fix NameMapping loss in ParquetUtil.footerMetrics

When footerMetrics was called with a NameMapping, it correctly applied
the mapping to get field IDs but then passed the wrong MessageType to
ParquetMetrics.metrics, causing the IDs to be lost. This resulted in
empty metrics for Parquet files without embedded field IDs.

Fixed by passing parquetTypeWithIds instead of the original messageType.

* Addressed review comments

(cherry picked from commit a3c538f)
@huaxingao huaxingao added this to the Iceberg 1.10.1 milestone Dec 6, 2025
singhpk234 pushed a commit that referenced this pull request Dec 6, 2025
* Fix NameMapping loss in ParquetUtil.footerMetrics

When footerMetrics was called with a NameMapping, it correctly applied
the mapping to get field IDs but then passed the wrong MessageType to
ParquetMetrics.metrics, causing the IDs to be lost. This resulted in
empty metrics for Parquet files without embedded field IDs.

Fixed by passing parquetTypeWithIds instead of the original messageType.

* Addressed review comments

(cherry picked from commit a3c538f)
thomaschow pushed a commit to thomaschow/iceberg that referenced this pull request Jan 19, 2026
* Fix NameMapping loss in ParquetUtil.footerMetrics

When footerMetrics was called with a NameMapping, it correctly applied
the mapping to get field IDs but then passed the wrong MessageType to
ParquetMetrics.metrics, causing the IDs to be lost. This resulted in
empty metrics for Parquet files without embedded field IDs.

Fixed by passing parquetTypeWithIds instead of the original messageType.

* Addressed review comments
talatuyarer pushed a commit to talatuyarer/iceberg that referenced this pull request Apr 1, 2026
* Fix NameMapping loss in ParquetUtil.footerMetrics

When footerMetrics was called with a NameMapping, it correctly applied
the mapping to get field IDs but then passed the wrong MessageType to
ParquetMetrics.metrics, causing the IDs to be lost. This resulted in
empty metrics for Parquet files without embedded field IDs.

Fixed by passing parquetTypeWithIds instead of the original messageType.

* Addressed review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4 participants