Skip to content

[Data] Consolidate logical operator repr#63140

Merged
bveeramani merged 3 commits into
ray-project:masterfrom
myandpr:data-next8-consolidate-logical-repr
May 6, 2026
Merged

[Data] Consolidate logical operator repr#63140
bveeramani merged 3 commits into
ray-project:masterfrom
myandpr:data-next8-consolidate-logical-repr

Conversation

@myandpr

@myandpr myandpr commented May 5, 2026

Copy link
Copy Markdown
Member

Description

Why this is needed:

This is the next small cleanup in the logical operator dataclass migration stack. After Operator becomes the shared base contract, LogicalOperator no longer needs to duplicate the same string representation logic.

What this PR changes:

Removes the redundant dag_str, __repr__, and __str__ implementations from LogicalOperator so logical operators use the shared Operator implementation.

Related issues

Part of #60312.

Additional information

This PR is stacked on #63137 and only contains the repr/string consolidation step.

Tests

Ran targeted repr/state-export tests and touched-file pre-commit checks.

Stack Plan

Done:

  • Convert remaining concrete logical operators to frozen dataclasses.
  • Make logical abstract classes frozen dataclasses.
  • Derive logical operator names from the base class.
  • Replace single-input input_op InitVars with input_dependencies.
  • Remove the one-to-one input_dependency convenience property.
  • Clean up logical operator args export.
  • Make Operator a thin shared base.

This PR:

  • Consolidate redundant logical operator repr/string logic.

Next:

  • Clean up special-casing in logical rules.
  • Finish equality/comparability end-state.
@myandpr myandpr requested a review from a team as a May 5, 2026 19:17

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes the dag_str, __repr__, and __str__ methods from the LogicalOperator class to rely on inherited implementations from the Operator base class. Feedback indicates that the base dag_str implementation lacks memoization for shared nodes in a DAG, which can lead to redundant computations and confusing representations; updating the implementation to use a memoized approach or a visitor pattern is recommended.

I am having trouble creating individual review comments. Click here to see my feedback.

python/ray/data/_internal/logical/interfaces/logical_operator.py (114-122)

medium

The dag_str implementation in the base Operator class (which this class now relies on) is recursive and does not handle shared nodes in the DAG. For plans that are not simple trees (e.g., those involving joins or zips of branches originating from a common source), this will result in redundant string computation and a potentially confusing representation where the same sub-plan appears multiple times.

While consolidating the logic is a good step, consider updating the base implementation in Operator.dag_str to use a memoized approach or a visitor pattern to handle DAG structures more efficiently.

@ray-gardener ray-gardener Bot added data Ray Data-related issues community-contribution Contributed by the community labels May 5, 2026
Signed-off-by: yaommen <myanstu@163.com>
@myandpr myandpr force-pushed the data-next7-thin-operator-base branch from 2db8559 to 9d08b71 Compare May 6, 2026 03:07
Signed-off-by: yaommen <myanstu@163.com>
@myandpr myandpr force-pushed the data-next8-consolidate-logical-repr branch from 5373c3a to af220f7 Compare May 6, 2026 03:08
@myandpr myandpr added the go add ONLY when ready to merge, run all tests label May 6, 2026
@bveeramani bveeramani changed the base branch from data-next7-thin-operator-base to master May 6, 2026 05:14
@bveeramani bveeramani enabled auto-merge (squash) May 6, 2026 05:14
@bveeramani bveeramani merged commit 4b02ada into ray-project:master May 6, 2026
7 checks passed
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
## Description

#### Why this is needed:

This is the next small cleanup in the logical operator dataclass
migration stack. After `Operator` becomes the shared base contract,
`LogicalOperator` no longer needs to duplicate the same string
representation logic.

#### What this PR changes:

Removes the redundant `dag_str`, `__repr__`, and `__str__`
implementations from `LogicalOperator` so logical operators use the
shared `Operator` implementation.

## Related issues

Part of ray-project#60312.

## Additional information

This PR is stacked on ray-project#63137 and only contains the repr/string
consolidation step.

### Tests

Ran targeted repr/state-export tests and touched-file pre-commit checks.

### Stack Plan

Done:
- Convert remaining concrete logical operators to frozen dataclasses.
- Make logical abstract classes frozen dataclasses.
- Derive logical operator names from the base class.
- Replace single-input `input_op` InitVars with `input_dependencies`.
- Remove the one-to-one `input_dependency` convenience property.
- Clean up logical operator args export.
- Make `Operator` a thin shared base.

This PR:
- Consolidate redundant logical operator repr/string logic.

Next:
- Clean up special-casing in logical rules.
- Finish equality/comparability end-state.

---------

Signed-off-by: yaommen <myanstu@163.com>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community data Ray Data-related issues go add ONLY when ready to merge, run all tests

2 participants