[Data] Remove legacy BlockList class by pushpavanthar · Pull Request #60575 · ray-project/ray

pushpavanthar · 2026-01-29T03:19:04Z

Remove the BlockList class from Ray Data, eliminating unnecessary conversion overhead between RefBundle representations.

Why
BlockList existed as a legacy abstraction from an older execution model. After LazyBlockList was removed in #46054, the remaining BlockList only served as an intermediate conversion layer:

1. Executor produces RefBundle
2. legacy_compat.py converts to BlockList
3. plan.py converts back to RefBundle

This round-trip is unnecessary overhead.
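To make the round-trip concrete, here is a minimal sketch. `ToyBlockList`, `ToyRefBundle`, and both conversion helpers are hypothetical stand-ins written for illustration; they are not Ray's classes, and they only assume that a bundle holds (block reference, metadata) pairs while a block list holds the two as parallel lists.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyBlockList:
    """Hypothetical stand-in: parallel lists of block refs and metadata."""
    blocks: List[str]
    metadata: List[int]


@dataclass
class ToyRefBundle:
    """Hypothetical stand-in: a tuple of (block ref, metadata) pairs."""
    blocks: Tuple[Tuple[str, int], ...]
    owns_blocks: bool


def to_block_list(bundle: ToyRefBundle) -> ToyBlockList:
    # Step 2 of the old flow: split the pairs into two parallel lists.
    blocks = [ref for ref, _ in bundle.blocks]
    metadata = [meta for _, meta in bundle.blocks]
    return ToyBlockList(blocks, metadata)


def to_ref_bundle(block_list: ToyBlockList, owns_blocks: bool) -> ToyRefBundle:
    # Step 3 of the old flow: zip the lists back into pairs.
    pairs = tuple(zip(block_list.blocks, block_list.metadata))
    return ToyRefBundle(pairs, owns_blocks)


produced = ToyRefBundle((("ref-0", 10), ("ref-1", 20)), owns_blocks=True)
round_tripped = to_ref_bundle(to_block_list(produced), owns_blocks=True)

# The round-trip rebuilds exactly what the executor already produced.
assert round_tripped == produced
```

In this toy model the two conversions cancel out, which is the sense in which the intermediate representation adds work without adding information.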

Changes

- legacy_compat.py: Renamed execute_to_legacy_block_list() → execute_to_ref_bundle(), returns RefBundle directly
- plan.py: Uses RefBundle directly from executor
- stats.py: Removed unused _DatasetStatsBuilder.build() method and BlockList import
- test_split.py: Updated test helper to use RefBundle
- Deleted block_list.py

Testing
All existing tests pass (424 split tests, execution tests, basic dataset operations).

Fixes #60621

gemini-code-assist

Code Review

This pull request is a solid refactoring that removes the legacy BlockList class, simplifying the data flow within Ray Data and eliminating unnecessary conversion overhead. The changes are clean, consistent, and well-motivated. I've included one suggestion to further optimize the logic in legacy_compat.py for better performance and memory efficiency. Overall, this is an excellent improvement to the codebase.

owenowenisme · 2026-01-29T09:50:20Z

@@ -169,8 +168,8 @@ def _get_initial_stats_from_plan(plan: ExecutionPlan) -> DatasetStats:
        return plan._in_stats


-def _bundles_to_block_list(bundles: Iterator[RefBundle]) -> BlockList:
-    blocks, metadata = [], []
+def _bundles_to_ref_bundle(bundles: Iterator[RefBundle]) -> RefBundle:


I think we can reuse merge_ref_bundles with some changes to it? Something like

def merge_ref_bundles(cls, bundles: Iterable["RefBundle"]) -> "RefBundle":
    bundles = list(bundles)
    if not bundles:
        return cls(blocks=(), owns_blocks=True, schema=None)
    merged_blocks = list(itertools.chain.from_iterable(bundle.blocks for bundle in bundles))
    merged_slices = list(itertools.chain.from_iterable(bundle.slices for bundle in bundles))
    owns_blocks = all(bundle.owns_blocks for bundle in bundles)
    schema = _take_first_non_empty_schema(bundle.schema for bundle in bundles)
    return cls(
        blocks=tuple(merged_blocks),
        schema=schema,
        owns_blocks=owns_blocks,
        slices=merged_slices,
    )
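The behavior this suggestion aims for can be tried out in isolation. The sketch below uses a hypothetical `ToyBundle` dataclass rather than Ray's `RefBundle`, leaves out `slices` for brevity, and replaces `_take_first_non_empty_schema` with a plain `next(...)` expression, so it illustrates the three rules (empty input, ownership, schema choice) and is not the library's implementation.

```python
import itertools
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ToyBundle:
    """Hypothetical stand-in for RefBundle; not Ray's actual class."""
    blocks: Tuple[Tuple[str, int], ...]  # (block id, row count) pairs
    owns_blocks: bool
    schema: Optional[Tuple[str, ...]] = None

    @classmethod
    def merge(cls, bundles: Iterable["ToyBundle"]) -> "ToyBundle":
        # Materialize first: an iterator can only be scanned once,
        # and the merge walks the bundles several times.
        bundles = list(bundles)
        if not bundles:
            # Empty input yields an empty bundle instead of an assertion error.
            return cls(blocks=(), owns_blocks=True, schema=None)
        merged_blocks = tuple(
            itertools.chain.from_iterable(b.blocks for b in bundles)
        )
        # The merged bundle owns its blocks only if every input did.
        owns_blocks = all(b.owns_blocks for b in bundles)
        # Take the first schema that is neither None nor empty.
        schema = next((b.schema for b in bundles if b.schema), None)
        return cls(blocks=merged_blocks, owns_blocks=owns_blocks, schema=schema)


first = ToyBundle(blocks=(("a", 1),), owns_blocks=True, schema=None)
second = ToyBundle(blocks=(("b", 2),), owns_blocks=False, schema=("x",))
merged = ToyBundle.merge(iter([first, second]))
```

Here `merged` keeps the blocks in input order, reports `owns_blocks=False` because `second` does not own its blocks, and picks up `("x",)` as the schema because `first` has none.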

Implemented! Using merge_ref_bundles() is cleaner, and I also fixed a couple of bugs in that method (schema selection and ownership calculation).

BlockList was an intermediate conversion layer between the executor's RefBundle output and the plan's RefBundle consumption. This removes the unnecessary round-trip by having execute_to_ref_bundle() return RefBundle directly.

Changes:
- Rename execute_to_legacy_block_list to execute_to_ref_bundle
- Remove _bundles_to_block_list, add _bundles_to_ref_bundle
- Remove unused _DatasetStatsBuilder.build() method
- Update test_split.py to use RefBundle
- Delete block_list.py

Signed-off-by: Purushotham Pushpavanth <pushpavanthar@gmail.com>

BlockList was an intermediate conversion layer between the executor's RefBundle output and the plan's RefBundle consumption. This removes the unnecessary round-trip by returning RefBundle directly.

Changes:
- Update RefBundle.merge_ref_bundles() to handle empty input, use _take_first_non_empty_schema, and properly compute owns_blocks
- Rename execute_to_legacy_block_list to execute_to_ref_bundle
- Use RefBundle.merge_ref_bundles() instead of custom helper
- Remove unused _DatasetStatsBuilder.build() method
- Update test_split.py to use RefBundle
- Delete block_list.py

Signed-off-by: Purushotham Pushpavanth <pushpavanthar@gmail.com>

alexeykudinkin · 2026-02-10T03:05:20Z

-        assert bundles, "Cannot merge an empty list of RefBundles."
-        merged_blocks = list(itertools.chain(*[bundle.blocks for bundle in bundles]))
-        merged_slices = list(itertools.chain(*[bundle.slices for bundle in bundles]))
+    def merge_ref_bundles(cls, bundles: Iterable["RefBundle"]) -> "RefBundle":


Let's add a test for this method

I think this method is already tested here:

ray/python/ray/data/tests/test_ref_bundle.py

Line 379 in 7ecbca7

def test_merge_ref_bundles():

Is there anything new we need to test for this refactor?

Yeah, let's test the owns_blocks semantics properly (while we're at it)
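A test along these lines might look like the sketch below. It runs against a hypothetical `ToyBundle` and `merge` helper rather than Ray's `RefBundle.merge_ref_bundles`, and it assumes the ownership rule from the earlier suggestion: the merged bundle owns its blocks only when every input bundle does.

```python
from typing import List, NamedTuple, Tuple


class ToyBundle(NamedTuple):
    """Hypothetical stand-in for RefBundle; not Ray's actual class."""
    blocks: Tuple[str, ...]
    owns_blocks: bool


def merge(bundles: List[ToyBundle]) -> ToyBundle:
    # Assumed rule: ownership is the logical AND over all inputs.
    blocks = tuple(b for bundle in bundles for b in bundle.blocks)
    return ToyBundle(blocks, all(bundle.owns_blocks for bundle in bundles))


def test_merge_owns_blocks_semantics():
    owned = ToyBundle(("a",), True)
    borrowed = ToyBundle(("b",), False)

    # All inputs own their blocks, so the merged bundle does too.
    assert merge([owned, owned]).owns_blocks is True
    # A single non-owning input makes the merged bundle non-owning.
    assert merge([owned, borrowed]).owns_blocks is False
    assert merge([borrowed, borrowed]).owns_blocks is False
    # all() over an empty sequence is True, matching the empty-input default.
    assert merge([]).owns_blocks is True
    # Block order follows input order.
    assert merge([owned, borrowed]).blocks == ("a", "b")
```

The mixed case is the interesting one: a merged bundle that claimed ownership of blocks it only borrowed could lead to them being freed while another holder still needs them.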

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>

Remove the `BlockList` class from Ray Data, eliminating unnecessary conversion overhead between `RefBundle` representations.

**Why**
`BlockList` existed as a legacy abstraction from an older execution model. After `LazyBlockList` was removed in ray-project#46054, the remaining `BlockList` only served as an intermediate conversion layer:

1. Executor produces `RefBundle`
2. `legacy_compat.py` converts to `BlockList`
3. `plan.py` converts back to `RefBundle`

This round-trip is unnecessary overhead.

**Changes**
- `legacy_compat.py`: Renamed `execute_to_legacy_block_list()` → `execute_to_ref_bundle()`, returns `RefBundle` directly
- `plan.py`: Uses `RefBundle` directly from executor
- `stats.py`: Removed unused `_DatasetStatsBuilder.build()` method and `BlockList` import
- `test_split.py`: Updated test helper to use `RefBundle`
- Deleted `block_list.py`

**Testing**
All existing tests pass (424 split tests, execution tests, basic dataset operations).

Fixes ray-project#60621

---------

Signed-off-by: Purushotham Pushpavanth <pushpavanthar@gmail.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Adel Nour <ans9868@nyu.edu>

pushpavanthar requested a review from a team as a code owner January 29, 2026 03:19

gemini-code-assist Bot reviewed Jan 29, 2026

ray-gardener Bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Jan 29, 2026

owenowenisme reviewed Jan 29, 2026

pushpavanthar added 2 commits January 29, 2026 18:55

pushpavanthar force-pushed the deprecate_blocklist branch from c986f54 to d13bf00 Compare January 29, 2026 13:25

pushpavanthar requested a review from owenowenisme January 29, 2026 13:26

bveeramani approved these changes Jan 30, 2026

bveeramani enabled auto-merge (squash) January 30, 2026 21:36

github-actions Bot added the go (add ONLY when ready to merge, run all tests) label Jan 30, 2026

Merge branch 'master' into deprecate_blocklist

53ab24b

github-actions Bot disabled auto-merge January 31, 2026 03:26

pushpavanthar and others added 2 commits February 2, 2026 15:44

Merge branch 'master' into deprecate_blocklist

9df29ed

Merge branch 'master' into deprecate_blocklist

1d33657

owenowenisme approved these changes Feb 3, 2026

iamjustinhsu assigned bveeramani Feb 4, 2026

alexeykudinkin reviewed Feb 10, 2026

bveeramani added 2 commits February 9, 2026 20:06

Merge branch 'master' into pr/60575

9a0579d

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>

Address review comments

b5d751e

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>

bveeramani enabled auto-merge (squash) February 11, 2026 01:39

github-actions Bot disabled auto-merge February 11, 2026 01:39

bveeramani merged commit 6f0458b into ray-project:master Feb 11, 2026
7 checks passed

bveeramani merged 7 commits into ray-project:master from pushpavanthar:deprecate_blocklist

5 participants
