
[Data] Enable filter pushdown through StreamingRepartition OP #62347

Merged
goutamvenkat-anyscale merged 1 commit into ray-project:master from owenowenisme:data/streaming-repartition-filter-pushdown
Apr 6, 2026

@owenowenisme

Member

Description

StreamingRepartition only re-bundles rows into fixed-size blocks without modifying schema or row content. Filters can safely pass through it, same as the existing Repartition operator. This allows predicates to be evaluated earlier in the pipeline, reducing the volume of data that gets re-partitioned.
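The argument above can be checked on a toy model. The following is a minimal sketch in plain Python, not Ray Data code: `rebundle` and `filter_blocks` are hypothetical stand-ins for a streaming repartition and a filter, and blocks are modeled as lists of dicts. It shows that filtering before or after re-bundling keeps the same rows in the same order, while filtering first means fewer rows get re-bundled.

```python
from typing import Callable, Iterable, Iterator, List

Row = dict
Block = List[Row]


def rebundle(blocks: Iterable[Block], target_rows: int) -> Iterator[Block]:
    """Hypothetical stand-in for a streaming repartition.

    Regroups rows into blocks of `target_rows` rows (the last block may be
    smaller). Rows and their fields are passed through unchanged.
    """
    buffer: Block = []
    for block in blocks:
        for row in block:
            buffer.append(row)
            if len(buffer) == target_rows:
                yield buffer
                buffer = []
    if buffer:
        yield buffer


def filter_blocks(
    blocks: Iterable[Block], predicate: Callable[[Row], bool]
) -> Iterator[Block]:
    """Hypothetical stand-in for a filter: keep rows matching `predicate`."""
    for block in blocks:
        kept = [row for row in block if predicate(row)]
        if kept:
            yield kept


def flatten(blocks: Iterable[Block]) -> List[Row]:
    return [row for block in blocks for row in block]


def is_even(row: Row) -> bool:
    return row["id"] % 2 == 0


# Twelve rows arriving as four blocks of three rows each.
source = [[{"id": i} for i in range(start, start + 3)] for start in range(0, 12, 3)]

# Original order: re-bundle everything, then filter.
filter_after = flatten(filter_blocks(rebundle(source, 4), is_even))

# Pushed-down order: filter first, then re-bundle only the surviving rows.
survivors = list(filter_blocks(source, is_even))
filter_before = flatten(rebundle(survivors, 4))

rows_rebundled_without_pushdown = len(flatten(source))
rows_rebundled_with_pushdown = len(flatten(survivors))
```

Block boundaries differ between the two orderings, since the pushed-down plan re-bundles after filtering, but the set and order of surviving rows is identical, which is the property the pushdown relies on.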

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
@owenowenisme owenowenisme requested a review from a team as a code owner April 4, 2026 19:18
@owenowenisme owenowenisme added the data (Ray Data-related issues) and go (add ONLY when ready to merge, run all tests) labels Apr 4, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request enables predicate passthrough for the StreamingRepartition logical operator by implementing the LogicalOperatorSupportsPredicatePassThrough interface. This change allows filters to be pushed through repartitioning operations that only adjust block sizes without altering the schema. Additionally, a new test case has been added to verify that filters correctly push through StreamingRepartition. I have no feedback to provide as there were no review comments.
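The marker-interface pattern the review describes can be sketched as follows. This is an illustrative toy, not Ray Data's optimizer: `SupportsPredicatePassThrough`, `Read`, `Rebundle`, `Filter`, `push_filter_down`, and `describe` are all hypothetical names invented for this example. The assumption is only that an operator opts in to passthrough by inheriting a marker class, and that a rewrite rule swaps a filter below any such operator.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Op:
    """Toy logical operator with a single optional input."""
    input: Optional["Op"] = None


class SupportsPredicatePassThrough:
    """Hypothetical marker: a filter may be moved below this operator."""


@dataclass
class Read(Op):
    pass


@dataclass
class Rebundle(Op, SupportsPredicatePassThrough):
    # Changes block sizes only, so it opts in to predicate passthrough.
    target_rows: int = 0


@dataclass
class Filter(Op):
    predicate: str = ""


def push_filter_down(op: Optional[Op]) -> Optional[Op]:
    """Return a new plan with filters moved below passthrough operators."""
    if op is None:
        return None
    if isinstance(op, Filter) and isinstance(op.input, SupportsPredicatePassThrough):
        child = op.input
        # Swap the two operators, then keep pushing the filter further down.
        pushed = Filter(input=child.input, predicate=op.predicate)
        return replace(child, input=push_filter_down(pushed))
    return replace(op, input=push_filter_down(op.input))


def describe(op: Optional[Op]) -> List[str]:
    """List operator names from the plan's output down to its source."""
    names = []
    while op is not None:
        names.append(type(op).__name__)
        op = op.input
    return names


plan = Filter(input=Rebundle(input=Read(), target_rows=4), predicate="id % 2 == 0")
optimized = push_filter_down(plan)

print(describe(plan))
print(describe(optimized))
```

In this sketch, opting an operator in is a one-line change to its class definition, which matches the small footprint the review summarizes.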

@goutamvenkat-anyscale goutamvenkat-anyscale merged commit a6372ee into ray-project:master Apr 6, 2026
8 checks passed
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
…oject#62347)


Labels

data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)

2 participants