[Data] Support strict=False mode for StreamingRepartition by machichima · Pull Request #60295 · ray-project/ray

machichima · 2026-01-19T12:00:26Z

Description

Currently, the StreamingRepartition operator is effectively strict=True. We want to relax this by adding a non-strict mode, with the following guarantees:

Strict mode: guarantees that all output blocks, except possibly the last one, have exactly target_num_rows rows.
Non-strict mode: provides a more relaxed guarantee, producing at most 1 block with fewer than target_num_rows rows per input block (i.e., it doesn't do any stitching).

Non-strict mode will be the default, and it allows StreamingRepartition to be fused into the preceding operator.
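The difference between the two guarantees can be illustrated with a toy model in which each block is represented only by its row count. This is a minimal sketch under that assumption, not Ray's implementation; `split_strict` and `split_non_strict` are hypothetical helpers.

```python
from typing import List


def split_strict(input_block_rows: List[int], target: int) -> List[int]:
    """Strict mode (toy model): stitch rows across input blocks so every
    output block has exactly `target` rows, except possibly the last one."""
    full, remainder = divmod(sum(input_block_rows), target)
    return [target] * full + ([remainder] if remainder else [])


def split_non_strict(input_block_rows: List[int], target: int) -> List[int]:
    """Non-strict mode (toy model): split each input block independently,
    so each input block yields at most one block smaller than `target`."""
    output: List[int] = []
    for rows in input_block_rows:
        full, remainder = divmod(rows, target)
        output.extend([target] * full)
        if remainder:
            output.append(remainder)
    return output


blocks = [25, 25, 25]
print(split_strict(blocks, 10))      # [10, 10, 10, 10, 10, 10, 10, 5]
print(split_non_strict(blocks, 10))  # [10, 10, 5, 10, 10, 5, 10, 10, 5]
```

Both modes emit the same 75 rows. Strict mode has to carry leftover rows from one input block over to the next, whereas non-strict mode handles each input block on its own and needs no buffering across blocks.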

Related issues

Closes #60026

Additional information

Added strict: bool = False parameter to repartition()
Added mode-specific bundler selection in _get_fused_streaming_repartition_operator() and plan_streaming_repartition_op():
- Strict: uses ref_bundler=StreamingRepartitionRefBundler
- Non-strict: uses ref_bundler=None (default BlockRefBundler)
Added unit tests

Signed-off-by: machichima <nary12321@gmail.com>

gemini-code-assist

Code Review

This pull request introduces a strict parameter to StreamingRepartition, allowing for a non-strict mode. In non-strict mode, repartitioning doesn't stitch blocks, which enables more operator fusion opportunities. The changes are well-implemented across the logical planning, fusion rules, and physical planning layers. The default for repartition is now non-strict, which is a good choice for performance. The added tests are comprehensive and cover both the new non-strict behavior and the fusion logic. My main feedback is to add documentation for the new strict parameter in the user-facing Dataset.repartition method to ensure users understand how to use it.


machichima · 2026-01-19T12:22:47Z

@owenowenisme PTAL. Thank you!


owenowenisme

test_operator_fusion is failing; could you please take a look?

owenowenisme · 2026-01-21T16:14:05Z

        input_physical_dag,
        data_context,
        name=op.name,
        compute_strategy=compute,


I think we need min_rows_per_bundle = op.target_num_rows_per_block here if strict=False?

Updated in 89965d0

It seems that when we set min_rows_per_bundle here, BlockRefBundler tries to stitch the output:

ray/python/ray/data/_internal/execution/operators/map_operator.py

Line 864 in 68d01c4

return list(output_buffer), _merge_ref_bundles(*output_buffer)

Therefore, I think we should keep it as None here to prevent stitching:

ray/python/ray/data/_internal/execution/operators/map_operator.py

Lines 828 to 835 in 68d01c4

if self._min_rows_per_bundle is None:
    # Short-circuit if no bundle row target was defined.
    assert len(self._bundle_buffer) == 1
    bundle = self._bundle_buffer[0]
    self._bundle_buffer = []
    self._bundle_buffer_size = 0
    self._bundle_buffer_size_bytes = 0
    return [bundle], bundle
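A simplified toy model of the two code paths discussed here: with no row target, each incoming bundle passes straight through; with a row target, bundles are buffered until the target is met and are then merged into one task input, which is the stitching being avoided. `ToyBundler` is hypothetical: it ignores byte sizes and the real RefBundle type, and models a bundle as a list of block row counts.

```python
from typing import List, Optional

Bundle = List[int]  # toy: a bundle is a list of block row counts


class ToyBundler:
    """Hypothetical, simplified stand-in for BlockRefBundler."""

    def __init__(self, min_rows_per_bundle: Optional[int]):
        self._min_rows = min_rows_per_bundle
        self._buffer: List[Bundle] = []

    def add(self, bundle: Bundle) -> Optional[Bundle]:
        """Add a bundle; return a bundle ready for a task, or None if buffering."""
        self._buffer.append(bundle)
        if self._min_rows is None:
            # Short-circuit: pass the single bundle through without merging.
            out = self._buffer[0]
            self._buffer = []
            return out
        if sum(sum(b) for b in self._buffer) >= self._min_rows:
            # Merge ("stitch") all buffered bundles into one task input.
            merged = [rows for b in self._buffer for rows in b]
            self._buffer = []
            return merged
        return None


passthrough = ToyBundler(None)
print([passthrough.add([7]), passthrough.add([5])])  # [[7], [5]]

stitching = ToyBundler(10)
print([stitching.add([7]), stitching.add([5])])  # [None, [7, 5]]
```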

owenowenisme · 2026-01-21T16:15:40Z

+            strict: If ``True``, ``repartition`` guarantees that all output blocks,
+                except for the last one, will have exactly ``target_num_rows_per_block`` rows.
+                If ``False``, ``repartition`` is more relaxed and may produce blocks smaller
+                than ``target_num_rows_per_block`` without stitching them together.
+                This parameter is only used with ``target_num_rows_per_block``.
+                Defaults to ``False``.


It might be better to say that if strict is False, it will produce at most 1 block with fewer than target_num_rows_per_block rows per input block.

Updated in f748b79
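The docstring wording agreed on here can be phrased as two checkable properties. This is a hypothetical test-style sketch, not code from the PR: `satisfies_strict` takes the flat sequence of output block sizes, while `satisfies_non_strict` takes the output block sizes grouped by the input block that produced them.

```python
from typing import List


def satisfies_strict(output_rows: List[int], target: int) -> bool:
    """Every output block except possibly the last has exactly `target` rows."""
    return all(rows == target for rows in output_rows[:-1])


def satisfies_non_strict(outputs_per_input: List[List[int]], target: int) -> bool:
    """Per input block, at most one output block has fewer than `target` rows."""
    return all(
        sum(1 for rows in outputs if rows < target) <= 1
        for outputs in outputs_per_input
    )


print(satisfies_strict([10, 10, 10, 5], 10))               # True
print(satisfies_strict([10, 5, 10, 5], 10))                # False
print(satisfies_non_strict([[10, 10, 5], [10, 5]], 10))    # True
print(satisfies_non_strict([[5, 5]], 10))                  # False
```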

owenowenisme · 2026-01-21T16:17:04Z

+
+
+@pytest.mark.parametrize("batch_size", [30, 35, 45])
+def test_streaming_repartition_fusion_non_strict(


I think the fusion test should be in python/ray/data/tests/test_operator_fusion.py

There's an existing test related to fusion and streaming repartition in this file, so I think we can keep this test here, as it aligns with the existing ones. WDYT?

ray/python/ray/data/tests/test_repartition_e2e.py

Line 313 in 45b5d6b

def test_streaming_repartition_fusion_output_shape(

owenowenisme · 2026-01-21T16:18:36Z

+            ref_bundler = StreamingRepartitionRefBundler(batch_size)
+            # No further fusion because StreamingRepartitionRefBundler is stateful
+            # and maintains internal buffering state across bundles.
+            supports_fusion = False


Will this prevent fusion when batch_size == target_num_rows_per_block?

Yes, but I think that's intended, as the original code (strict mode) hard-coded supports_fusion=False to prevent further fusion:

# For now, we don't want to over-fuse StreamingRepartition with other map operators,
# so the result operator does not support further fusion.
supports_fusion=False,

owenowenisme · 2026-01-21T16:20:01Z

+        strict: If True, guarantees that all output blocks, except for the last one,
+            will have exactly target_num_rows_per_block rows. If False, is more relaxed
+            and may produce blocks smaller than target_num_rows_per_block without
+            stitching them together. Defaults to False.


Ditto the comment in dataset.py

Updated in f748b79


alexeykudinkin · 2026-02-06T22:50:52Z

+            ref_bundler = StreamingRepartitionRefBundler(batch_size)
+            # No further fusion because StreamingRepartitionRefBundler is stateful
+            # and maintains internal buffering state across bundles.
+            supports_fusion = False


We shouldn't be blocking any subsequent fusion like that.

Let's add a test verifying that we're able to fuse multiple ops like this:

Map > Map > SR

Map > SR > SR

Although the comment is on line 338 (supports_fusion=False), I want to confirm: do we want to support fusion for strict mode, or just add a test for non-strict mode? I assume it's the latter?

The Map > SR > SR case cannot work here because after the first Map > SR fusion, the logical operator becomes AbstractUDFMap rather than MapBatches.

ray/python/ray/data/_internal/logical/rules/operator_fusion.py

Lines 355 to 369 in f3d444a

logical_op = AbstractUDFMap(
    name,
    input_op,
    up_logical_op.fn,
    can_modify_num_rows=up_logical_op.can_modify_num_rows,
    fn_args=up_logical_op.fn_args,
    fn_kwargs=up_logical_op.fn_kwargs,
    fn_constructor_args=up_logical_op.fn_constructor_args,
    fn_constructor_kwargs=up_logical_op.fn_constructor_kwargs,
    min_rows_per_bundled_input=batch_size,
    compute=compute,
    ray_remote_args_fn=ray_remote_args_fn,
    ray_remote_args=ray_remote_args,
)
self._op_map[op] = logical_op

The current implementation only allows MapBatches > SR fusion:

ray/python/ray/data/_internal/logical/rules/operator_fusion.py

Line 126 in f3d444a

and isinstance(self._op_map[upstream_ops[0]], MapBatches)

Supporting Map > SR > SR fusion would require more changes, which I think is somewhat out of scope for this PR.

Updated in:

111c054

83c5ddb
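A toy illustration of why the `isinstance(..., MapBatches)` check blocks a second fusion: the fused operator is constructed as an AbstractUDFMap, so it no longer passes the check. The classes and both functions below are hypothetical stand-ins, not Ray's logical operators; the toy also assumes MapBatches is a subclass of AbstractUDFMap.

```python
class AbstractUDFMap:
    """Toy stand-in for the generic UDF map logical operator."""

    def __init__(self, name: str):
        self.name = name


class MapBatches(AbstractUDFMap):
    """Toy stand-in for the map_batches logical operator."""


def can_fuse_into_sr(upstream: AbstractUDFMap) -> bool:
    # Same shape as the quoted check: only a MapBatches upstream qualifies.
    return isinstance(upstream, MapBatches)


def fuse_with_sr(upstream: AbstractUDFMap) -> AbstractUDFMap:
    # Like the quoted fusion code, the fused result is a plain AbstractUDFMap.
    return AbstractUDFMap(f"{upstream.name}->StreamingRepartition")


first = MapBatches("MapBatches")
print(can_fuse_into_sr(first))  # True
fused = fuse_with_sr(first)
print(can_fuse_into_sr(fused))  # False: a second SR cannot fuse onto it
```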

Let's keep it MapBatches then. Map > SR > SR needs to work.

I looked into it more. It seems Map > SR > SR already works, but only because CombineShuffles._combine() combines the two SRs into one, so the result is just Map > SR.

Updated the test in 8552ff9

@machichima but this makes it order-dependent -- my point is we should avoid setting supports_fusion=False for the resulting operator

Got it! I updated it in 210b634 to set supports_fusion=True for both of them.


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


alexeykudinkin · 2026-02-13T02:07:34Z

+            ref_bundler = StreamingRepartitionRefBundler(batch_size)
+            # No further fusion because StreamingRepartitionRefBundler is stateful
+            # and maintains internal buffering state across bundles.
+            supports_fusion = False


@machichima but this makes it order-dependent -- my point is we should avoid setting supports_fusion=False for the resulting operator

alexeykudinkin · 2026-02-13T02:09:42Z

+        # StreamingRepartition can only fuse in non-strict mode.
+        # In strict mode, it does not support further fusion.
        if isinstance(up_logical_op, StreamingRepartition):
-            return False
+            return not up_logical_op._strict


We actually don't want to fuse SR > Map, because that would reduce parallelism for the Map (I believe we have a test for that).

Updated in 90153fc (and updated the test as well).

alexeykudinkin · 2026-02-13T02:12:00Z

+        # In non-strict mode, use min_rows_per_bundle to ensure creating batches with batch_size.
+        # In strict mode, ref_bundler handles bundling, so do not set min_rows_per_bundle.
+        min_rows = None if down_logical_op._strict else batch_size


Ah, we should clean up that parameter by replacing it with the bundler (no need to do that in this PR; we can do it separately).

alexeykudinkin · 2026-02-13T02:17:19Z

+    Case 1: map_batches -> map_batches -> streaming_repartition(strict=True) -> map_batches -> map_batches
            Result: map -> (map -> s_r)-> (map -> map)
            The fused (map -> s_r) doesn't fuse further with surrounding maps.


@machichima this is the case we've talked about:

It should be (Map -> Map -> SR) -> (Map -> Map).

Updated in 90153fc
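The target grouping can be reproduced with a toy planner over a linear chain of operators. This is a hypothetical sketch, not Ray's fusion rule: it encodes only two rules from this discussion (a Map or SR may fuse onto a preceding Map group, and nothing fuses onto a group ending in SR, since SR > Map would reduce the Map's parallelism) and processes ops left to right.

```python
from typing import List


def plan_fusion(ops: List[str]) -> List[List[str]]:
    """Toy fusion planner over a linear chain of "Map" / "SR" ops.

    Hypothetical rules, following the review discussion:
    - a Map or SR may fuse onto a preceding group that ends in Map;
    - nothing fuses onto a group that ends in SR (no SR > Map fusion).
    """
    groups: List[List[str]] = []
    for op in ops:
        if groups and groups[-1][-1] != "SR":
            groups[-1].append(op)  # fuse into the current group
        else:
            groups.append([op])  # start a new fused group
    return groups


print(plan_fusion(["Map", "Map", "SR", "Map", "Map"]))
# [['Map', 'Map', 'SR'], ['Map', 'Map']]
```

Because the fused (Map -> Map -> SR) group stays eligible for fusion on its upstream side, the result does not depend on which pair the optimizer happens to fuse first, which is the order-dependence concern raised earlier.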


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-02-18T10:16:14Z

-            # For now, we don't want to over-fuse StreamingRepartition with other map operators,
-            # so the result operator does not support further fusion.
-            supports_fusion=False,
+            supports_fusion=True,


Strict mode loses StreamingRepartitionRefBundler during further fusion

Medium Severity

Setting supports_fusion=True unconditionally allows the fused MapBatches→StreamingRepartition operator (in strict mode) to be further fused with upstream MapBatches via _get_fused_map_operator during the map fusion phase. That generic method doesn't preserve the ref_bundler, replacing StreamingRepartitionRefBundler with a BlockRefBundler. Unlike StreamingRepartitionRefBundler, which slices input blocks to guarantee exact-multiple row counts per task, BlockRefBundler simply accumulates blocks until a minimum threshold, potentially sending non-multiple row counts. This causes tasks to produce partial output blocks, breaking the strict mode guarantee that all blocks (except the last) have exactly target_num_rows_per_block rows. Non-strict mode is unaffected since it already uses ref_bundler=None.

Additional Locations (1)

python/ray/data/_internal/logical/rules/operator_fusion.py#L507-L520
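A toy model of the failure mode described above, with blocks represented by row counts. `slice_to_multiples` stands in for the behavior the comment attributes to StreamingRepartitionRefBundler (slicing input so tasks get exact target-sized chunks), and `accumulate_until` stands in for BlockRefBundler (grouping whole blocks until a minimum is reached). All functions are hypothetical simplifications; each task simply cuts its input into target-sized blocks plus a remainder.

```python
from typing import List


def task_output(input_rows: int, target: int) -> List[int]:
    """One fused task: cut its input into `target`-row blocks plus a remainder."""
    full, rem = divmod(input_rows, target)
    return [target] * full + ([rem] if rem else [])


def accumulate_until(blocks: List[int], min_rows: int) -> List[int]:
    """BlockRefBundler-like: group whole blocks until at least `min_rows`."""
    tasks: List[int] = []
    current = 0
    for rows in blocks:
        current += rows
        if current >= min_rows:
            tasks.append(current)
            current = 0
    if current:
        tasks.append(current)  # flush the leftover at the end of the input
    return tasks


def slice_to_multiples(blocks: List[int], target: int) -> List[int]:
    """StreamingRepartitionRefBundler-like (simplified): slice rows so each
    task gets exactly `target` rows; only the final task may get fewer."""
    full, rem = divmod(sum(blocks), target)
    return [target] * full + ([rem] if rem else [])


def run_tasks(task_inputs: List[int], target: int) -> List[int]:
    """Concatenate the output blocks of all tasks."""
    return [rows for t in task_inputs for rows in task_output(t, target)]


blocks = [7, 7, 7, 7]
print(run_tasks(accumulate_until(blocks, 10), 10))    # [10, 4, 10, 4]
print(run_tasks(slice_to_multiples(blocks, 10), 10))  # [10, 10, 8]
```

With the accumulating bundler, each task receives 14 rows and emits a partial 4-row block mid-stream, which breaks the strict guarantee; with slicing, only the final block is short.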

@machichima PTAL ^

This review comment is related to your previous comment: #60295 (comment)

In the original codebase, we set supports_fusion=False for strict mode.

ray/python/ray/data/_internal/logical/rules/operator_fusion.py

Lines 344 to 346 in eabc0ac

# For now, we don't want to over-fuse StreamingRepartition with other map operators,
# so the result operator does not support further fusion.
supports_fusion=False,

In 210b634, I updated it to supports_fusion=True for both strict and non-strict mode, which breaks the CI test.

I want to confirm whether we should:

- change it back to supports_fusion=False for strict mode, or
- update the test to make CI pass, and keep supports_fusion=True for both strict and non-strict mode

@machichima yeah, we want to keep supports_fusion=True; we just need to fix the fusion to make sure the appropriate bundler is preserved.

Sure! Updated in 6d25a27
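One way to express "preserve the appropriate bundler" is to carry a custom ref_bundler through fusion instead of falling back to the default. The sketch below is hypothetical: `OpSpec` and `fuse_specs` are not Ray APIs, bundlers are represented by name strings, and this is not necessarily how 6d25a27 implements the fix. `None` stands for the default BlockRefBundler, as in the PR description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OpSpec:
    name: str
    ref_bundler: Optional[str] = None  # None means the default BlockRefBundler
    supports_fusion: bool = True


def fuse_specs(up: OpSpec, down: OpSpec) -> OpSpec:
    """Fuse two map-like ops, keeping whichever custom bundler is set."""
    if up.ref_bundler and down.ref_bundler and up.ref_bundler != down.ref_bundler:
        raise ValueError("cannot fuse ops with conflicting ref bundlers")
    return OpSpec(
        name=f"{up.name}->{down.name}",
        ref_bundler=up.ref_bundler or down.ref_bundler,
        supports_fusion=up.supports_fusion and down.supports_fusion,
    )


map_op = OpSpec("MapBatches")
fused_sr = OpSpec("MapBatches->SR", ref_bundler="StreamingRepartitionRefBundler")
print(fuse_specs(map_op, fused_sr).ref_bundler)  # StreamingRepartitionRefBundler
```

This toy only tracks which bundler survives; it does not model whether an upstream op that changes row counts still preserves the strict guarantee.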

alexeykudinkin

@machichima changes LGTM!

We just need to address the last Bugbot comment and 1 test failure, and then we should be good to go.


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

alexeykudinkin · 2026-02-23T21:40:30Z

-    Case 1: map_batches -> map_batches -> streaming_repartition -> map_batches -> map_batches
-            Result: map -> (map -> s_r)-> (map -> map)
-            The fused (map -> s_r) doesn't fuse further with surrounding maps.
+    Case 1: map_batches -> map_batches -> streaming_repartition(strict=True) -> map_batches -> map_batches


This shouldn't fuse regardless of whether it's strict (otherwise we might decrease parallelism in either mode).

alexeykudinkin · 2026-02-23T21:41:10Z

Thanks for bringing this over the finish line @machichima!


machichima added 7 commits January 15, 2026 04:29

feat: add strict option in StreamingRepartition · 838513c
feat: enable fusion for non-strict mode · 08008d1
fix: .strict to ._strict · 591b00f
test: add non strict tests · ae95e03
test: set stirct=True for existing tests · 8f7282a
fix: include strict=... in operator name · cdf8f9f
fix: min_row_per_bundle and support fusion issue · def13b2

machichima requested a review from a team as a code owner January 19, 2026 12:00

gemini-code-assist Bot reviewed Jan 19, 2026


Comment thread python/ray/data/dataset.py

This comment was marked as outdated.


machichima added 2 commits January 19, 2026 20:10

docs: update docstring · dc609e1
feat: validate strict with target_num_rows_per_block · 2c87758

ray-gardener Bot added data Ray Data-related issues community-contribution Contributed by the community labels Jan 19, 2026

machichima added 2 commits January 20, 2026 04:49

refactor: precommit · d9d4295
docs: update docstring · 04964bc

machichima force-pushed the streamingrepartition-strict-false branch from 6cfbfc5 to 04964bc Compare January 19, 2026 21:25

This comment was marked as outdated.


machichima added 4 commits January 21, 2026 20:40

fix: pass strict param in CombineRepartitions · a9fbce0
fix: verify target_num_rows_per_block in StreamingRepartition · 7b825e5
Merge branch 'master' of github.com:ray-project/ray into streamingrepartition-strict-false · 55c79bd
fix: pass strict param in CombineShuffles · accb54a

owenowenisme reviewed Jan 21, 2026


machichima added 3 commits January 23, 2026 19:14

fix: pass min_rows_per_bundle in non-strict mode · 89965d0
docs: update docstring · f748b79
test: set strict=True · 49cc5fc

cursor Bot reviewed Jan 23, 2026


Comment thread python/ray/data/_internal/logical/rules/combine_shuffles.py

Comment thread python/ray/data/_internal/logical/rules/operator_fusion.py

fix: set min_rows_per_bundle to None · 68d01c4

machichima requested a review from owenowenisme January 27, 2026 03:22

machichima added 3 commits February 5, 2026 18:12

Merge branch 'master' of github.com:ray-project/ray into streamingrepartition-strict-false · 8a48fdd
fix: update _can_fuse logic for batch size · c77787c
refactor: precommit · 6a2fec8

alexeykudinkin reviewed Feb 6, 2026


machichima and others added 4 commits February 7, 2026 14:07

fix: enable fuse with other operations in non-strict mode · 111c054
test: add map>map>sr and map>sr>map case · 83c5ddb
test: update test case to Map > SR > SR · 6656b1e
Merge branch 'master' into streamingrepartition-strict-false · 8552ff9

cursor Bot reviewed Feb 9, 2026


Comment thread python/ray/data/tests/test_operator_fusion.py Outdated

Merge branch 'master' into streamingrepartition-strict-false · 7804f94

cursor Bot reviewed Feb 10, 2026


Comment thread python/ray/data/tests/test_operator_fusion.py Outdated

test: fix to fit actual behavior · 4eac46d

alexeykudinkin reviewed Feb 18, 2026


machichima added 3 commits February 18, 2026 17:05

Merge branch 'master' of github.com:ray-project/ray into streamingrepartition-strict-false · fc6ad97
fix: avoid setting supports_fusion=False for result op · 210b634
fix+test: prevent fuse SR > Map · 90153fc

cursor Bot reviewed Feb 18, 2026


alexeykudinkin reviewed Feb 20, 2026


machichima added 2 commits February 21, 2026 11:41

fix: preserve StreamingRepartitionRefBundler · 6d25a27
refactor: precommit · ac0e943

cursor Bot reviewed Feb 21, 2026


Comment thread python/ray/data/_internal/logical/rules/operator_fusion.py

alexeykudinkin approved these changes Feb 23, 2026


alexeykudinkin merged commit 35b297f into ray-project:master Feb 23, 2026
6 checks passed

claude Bot added the claude-code-assisted label Feb 24, 2026

This was referenced May 25, 2026

[Data] Extend StreamingRepartition non-strict fusion to Filter/MapRows/FlatMap #63624

Open

[Data] Extend StreamingRepartition non-strict fusion to Filter/MapRows/FlatMap #63625

Closed


4 participants
