
[Data] - Remove the calls to Iceberg Catalog Table in write tasks#60476

Merged
alexeykudinkin merged 5 commits into
ray-project:master from
goutamvenkat-anyscale:goutam/reduce_iceberg_catalog_calls
Jan 26, 2026


@goutamvenkat-anyscale

Contributor

Description

This PR removes the calls to catalog.load_table from write tasks to reduce the likelihood of hitting rate limits on the remote Iceberg catalog. To make this work, the underlying FileIO and TableMetadata objects are serialized and passed to the writers, so they can write to cloud storage without obtaining those objects from the table instance.
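
For illustration only, the pattern described above could be sketched as follows. This is not the code from this PR and does not use the PyIceberg or Ray APIs: `TableSnapshot`, `CountingCatalog`, `prepare_write`, and `write_task` are hypothetical stand-ins, and JSON is used only to keep the sketch dependency-free. The point is that the catalog is queried once on the driver and each write task works from the serialized payload.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical stand-in for the state a writer needs (not a PyIceberg class)."""

    location: str        # table root in cloud storage
    schema_fields: list  # column names taken from the table metadata
    io_properties: dict  # settings needed to rebuild a FileIO-like client


class CountingCatalog:
    """Hypothetical catalog that counts load_table calls (the rate-limited operation)."""

    def __init__(self):
        self.load_calls = 0

    def load_table(self, name):
        self.load_calls += 1
        return TableSnapshot(
            location=f"s3://bucket/{name}",
            schema_fields=["id", "value"],
            io_properties={"region": "us-west-2"},
        )


def prepare_write(catalog, name):
    """Driver side: make one catalog call, then serialize what the writers need."""
    return json.dumps(asdict(catalog.load_table(name)))


def write_task(payload, task_index):
    """Worker side: rebuild state from the payload; the catalog is never touched."""
    snapshot = TableSnapshot(**json.loads(payload))
    return f"{snapshot.location}/data/part-{task_index:05d}.parquet"


catalog = CountingCatalog()
payload = prepare_write(catalog, "db.events")
paths = [write_task(payload, i) for i in range(4)]
```

In this sketch the number of catalog calls stays at one regardless of how many write tasks run, which is the property the description is after.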


Signed-off-by: Goutam <goutam@anyscale.com>
@goutamvenkat-anyscale goutamvenkat-anyscale requested a review from a team as a code owner January 24, 2026 17:04
@goutamvenkat-anyscale goutamvenkat-anyscale changed the title [Data] - Remove the calls to Iceberg Catalog in tasks Jan 24, 2026
@goutamvenkat-anyscale goutamvenkat-anyscale added the data (Ray Data-related issues) and go (add ONLY when ready to merge, run all tests) labels Jan 24, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request effectively removes the need for worker tasks to call the Iceberg catalog by serializing the FileIO and TableMetadata objects on the driver and passing them to the workers. This is a good optimization to prevent potential rate-limiting issues with the remote catalog. The implementation is clean and directly addresses the problem. I have one minor suggestion to improve code readability.

Comment thread python/ray/data/_internal/datasource/iceberg_datasink.py Outdated
@goutamvenkat-anyscale goutamvenkat-anyscale changed the title [Data] - Remove the calls to Iceberg Catalog in write tasks Jan 24, 2026
goutamvenkat-anyscale and others added 3 commits January 25, 2026 16:06
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
@alexeykudinkin alexeykudinkin merged commit 7d33f91 into ray-project:master Jan 26, 2026
6 checks passed

Labels

data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)

2 participants