[Data] Consolidate logical operator repr#63140
Conversation
There was a problem hiding this comment.
Code Review
This pull request removes the dag_str, __repr__, and __str__ methods from the LogicalOperator class to rely on inherited implementations from the Operator base class. Feedback indicates that the base dag_str implementation lacks memoization for shared nodes in a DAG, which can lead to redundant computations and confusing representations; updating the implementation to use a memoized approach or a visitor pattern is recommended.
I am having trouble creating individual review comments. Click here to see my feedback.
python/ray/data/_internal/logical/interfaces/logical_operator.py (114-122)
The dag_str implementation in the base Operator class (which this class now relies on) is recursive and does not handle shared nodes in the DAG. For plans that are not simple trees (e.g., those involving joins or zips of branches originating from a common source), this will result in redundant string computation and a potentially confusing representation where the same sub-plan appears multiple times.
While consolidating the logic is a good step, consider updating the base implementation in Operator.dag_str to use a memoized approach or a visitor pattern to handle DAG structures more efficiently.
Signed-off-by: yaommen <myanstu@163.com>
2db8559 to
9d08b71
Compare
Signed-off-by: yaommen <myanstu@163.com>
5373c3a to
af220f7
Compare
## Description #### Why this is needed: This is the next small cleanup in the logical operator dataclass migration stack. After `Operator` becomes the shared base contract, `LogicalOperator` no longer needs to duplicate the same string representation logic. #### What this PR changes: Removes the redundant `dag_str`, `__repr__`, and `__str__` implementations from `LogicalOperator` so logical operators use the shared `Operator` implementation. ## Related issues Part of ray-project#60312. ## Additional information This PR is stacked on ray-project#63137 and only contains the repr/string consolidation step. ### Tests Ran targeted repr/state-export tests and touched-file pre-commit checks. ### Stack Plan Done: - Convert remaining concrete logical operators to frozen dataclasses. - Make logical abstract classes frozen dataclasses. - Derive logical operator names from the base class. - Replace single-input `input_op` InitVars with `input_dependencies`. - Remove the one-to-one `input_dependency` convenience property. - Clean up logical operator args export. - Make `Operator` a thin shared base. This PR: - Consolidate redundant logical operator repr/string logic. Next: - Clean up special-casing in logical rules. - Finish equality/comparability end-state. --------- Signed-off-by: yaommen <myanstu@163.com> Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Description
Why this is needed:
This is the next small cleanup in the logical operator dataclass migration stack. After
Operatorbecomes the shared base contract,LogicalOperatorno longer needs to duplicate the same string representation logic.What this PR changes:
Removes the redundant
dag_str,__repr__, and__str__implementations fromLogicalOperatorso logical operators use the sharedOperatorimplementation.Related issues
Part of #60312.
Additional information
This PR is stacked on #63137 and only contains the repr/string consolidation step.
Tests
Ran targeted repr/state-export tests and touched-file pre-commit checks.
Stack Plan
Done:
input_opInitVars withinput_dependencies.input_dependencyconvenience property.Operatora thin shared base.This PR:
Next: