
[Data] Remove deprecated read_parquet_bulk API#58970

Merged
bveeramani merged 4 commits into ray-project:master from rushikeshadhav:rushikesh/remove-read-parquet-bulk-api on Nov 26, 2025

Conversation

@rushikeshadhav

Contributor

Description

This PR removes the deprecated read_parquet_bulk API from Ray Data, along with its implementation and documentation. This function was deprecated in favor of read_parquet, which now covers all equivalent use cases. The deprecation warning stated removal after May 2025, and that deadline has passed — so this cleanup reduces maintenance burden and prevents user confusion.
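For context on the deprecate-then-remove cycle described above, here is a minimal, generic sketch of a deprecated alias that warns and forwards to its replacement. The names `read_table` and `read_table_bulk` are hypothetical stand-ins, not Ray's code; Ray's actual deprecation mechanism and warning text may differ.

```python
import warnings
from typing import List, Union


def read_table(paths: Union[str, List[str]]) -> List[str]:
    """Hypothetical replacement API: accepts a single path or a list of paths."""
    if isinstance(paths, str):
        paths = [paths]
    # A real reader would load data here; this stand-in just echoes the paths.
    return list(paths)


def read_table_bulk(paths: List[str]) -> List[str]:
    """Hypothetical deprecated alias, kept only during the deprecation window."""
    warnings.warn(
        "read_table_bulk is deprecated and will be removed; use read_table instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    # Forward to the replacement so both entry points behave identically.
    return read_table(paths)
```

Once the announced removal date passes, deleting such an alias is low-risk because each call can be rewritten against the replacement, which is the same reasoning this PR applies to `read_parquet_bulk` and `read_parquet`.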

Summary of changes

  • Removed read_parquet_bulk from read_api.py and __init__.py
  • Deleted ParquetBulkDatasource + its file
  • Removed related tests and documentation
  • Updated references and docstrings mentioning the deprecated API
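Downstream projects upgrading past this change may want to confirm they have no remaining calls to the removed function. The helper below, `find_removed_calls`, is a hypothetical sketch built on the standard-library `ast` module; it is not part of Ray or this PR. Because it inspects the syntax tree, it finds calls but not mentions in comments or docstrings, which still need a plain-text search.

```python
import ast
from typing import List, Tuple

REMOVED_NAME = "read_parquet_bulk"  # the API removed in this PR


def find_removed_calls(source: str, name: str = REMOVED_NAME) -> List[Tuple[int, str]]:
    """Return (line number, call text) for each call to `name` in `source`.

    Matches bare calls such as read_parquet_bulk(...) and attribute calls
    such as ray.data.read_parquet_bulk(...). Requires Python 3.9+ for
    ast.unparse.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Attribute calls expose the name as .attr; plain calls as .id.
        called = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if called == name:
            hits.append((node.lineno, ast.unparse(node)))
    return sorted(hits)


# Example input: the source is only parsed, never executed, so Ray is not needed.
example = '''
import ray
ds = ray.data.read_parquet_bulk(["a.parquet", "b.parquet"])
other = ray.data.read_parquet("c.parquet")
'''

for lineno, call in find_removed_calls(example):
    print(f"line {lineno}: {call}")
```

Each hit is a candidate for rewriting to `ray.data.read_parquet(...)`. Keyword arguments passed to the old function should be reviewed individually rather than copied over blindly, since the two signatures are not guaranteed to match.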

Related issues

Fixes #58969

@rushikeshadhav rushikeshadhav requested a review from a team as a code owner November 25, 2025 12:06

@gemini-code-assist bot left a comment

Contributor

Code Review

This pull request effectively removes the deprecated read_parquet_bulk API, which helps reduce maintenance and prevent user confusion. The changes are comprehensive, covering the function's implementation, tests, documentation, and internal references. The code removal is clean and I only have one minor suggestion to improve the clarity of an updated docstring.

Comment thread python/ray/data/_internal/datasource/parquet_datasource.py Outdated

@cursor bot left a comment

Bug: Orphaned parametrize decorators stacked on unrelated test function

The @pytest.mark.parametrize decorators for the deleted test_parquet_read_bulk and test_parquet_read_bulk_meta_provider functions were not removed along with the functions. These orphaned decorators (lines 234-249 and 250-265) are now stacked on top of test_parquet_read_partitioned, causing that test to have three parametrize decorators instead of one. This results in the test running with a Cartesian product of parameters from all three decorators, dramatically increasing test execution time and potentially causing failures from duplicate parameter name conflicts.

python/ray/data/tests/test_parquet.py#L233-L265

https://github.com/ray-project/ray/blob/bbfae94d1ecf252f23e498890c4fd03f2b8d6975/python/ray/data/tests/test_parquet.py#L233-L265

@ray-gardener bot added the docs, data, and community-contribution labels Nov 25, 2025

@bveeramani left a comment

Member

ty for the contribution! Overall LGTM, just left a couple comments

Comment thread python/ray/data/_internal/datasource/parquet_datasource.py Outdated
Comment thread python/ray/data/tests/test_parquet.py Outdated
Comment thread python/ray/data/tests/test_parquet.py Outdated
Signed-off-by: rushikesh.adhav <adhavrushikesh6@gmail.com>
@rushikeshadhav force-pushed the rushikesh/remove-read-parquet-bulk-api branch from bbfae94 to 5619e3f November 26, 2025 05:31
rushikeshadhav and others added 2 commits November 26, 2025 11:02
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>

@bveeramani left a comment

Member

LGTM! 🚢

@bveeramani

Member

@rushikeshadhav as a follow up, would you be interested in removing FastFileMetadataProvider?

@bveeramani changed the title from data: remove deprecated read_parquet_bulk API to [Data] Remove deprecated read_parquet_bulk API Nov 26, 2025
@bveeramani enabled auto-merge (squash) November 26, 2025 08:19
@github-actions bot added the go label Nov 26, 2025
@bveeramani self-assigned this Nov 26, 2025
@bveeramani merged commit 2fbb0bd into ray-project:master Nov 26, 2025
7 of 8 checks passed
@rushikeshadhav

Contributor Author

> @rushikeshadhav as a follow up, would you be interested in removing FastFileMetadataProvider?

Yes, I would love to.

SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
## Description
>This PR removes the deprecated read_parquet_bulk API from Ray Data,
along with its implementation and documentation. This function was
deprecated in favor of read_parquet, which now covers all equivalent use
cases. The deprecation warning stated removal after May 2025, and that
deadline has passed — so this cleanup reduces maintenance burden and
prevents user confusion.

Summary of changes

- Removed read_parquet_bulk from read_api.py and __init__.py
- Deleted ParquetBulkDatasource + its file
- Removed related tests and documentation
- Updated references and docstrings mentioning the deprecated API

## Related issues
> Fixes ray-project#58969

---------

Signed-off-by: rushikesh.adhav <adhavrushikesh6@gmail.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>

Labels

  • community-contribution: Contributed by the community
  • data: Ray Data-related issues
  • docs: An issue or change related to documentation
  • go: add ONLY when ready to merge, run all tests

2 participants