[Data] Make Operator a thin shared base#63137
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the Operator base class into an Abstract Base Class (ABC), making name and input_dependencies abstract properties and removing the base init method. Subclasses like PhysicalOperator now explicitly manage these attributes and implement the required properties. A potential issue was identified in the PhysicalOperator.input_dependencies setter, which fails to maintain bidirectional graph consistency by not updating the _output_dependencies of the input operators when dependencies are changed.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 8fcb1ad32089f75ef1fe5d6fdf5cf04a5f336a6e. Configure here.
8fcb1ad to
2db8559
Compare
Signed-off-by: yaommen <myanstu@163.com>
2db8559 to
9d08b71
Compare
## Description #### Why this is needed: This is the next small cleanup in the logical operator dataclass migration stack. After `Operator` becomes the shared base contract, `LogicalOperator` no longer needs to duplicate the same string representation logic. #### What this PR changes: Removes the redundant `dag_str`, `__repr__`, and `__str__` implementations from `LogicalOperator` so logical operators use the shared `Operator` implementation. ## Related issues Part of #60312. ## Additional information This PR is stacked on #63137 and only contains the repr/string consolidation step. ### Tests Ran targeted repr/state-export tests and touched-file pre-commit checks. ### Stack Plan Done: - Convert remaining concrete logical operators to frozen dataclasses. - Make logical abstract classes frozen dataclasses. - Derive logical operator names from the base class. - Replace single-input `input_op` InitVars with `input_dependencies`. - Remove the one-to-one `input_dependency` convenience property. - Clean up logical operator args export. - Make `Operator` a thin shared base. This PR: - Consolidate redundant logical operator repr/string logic. Next: - Clean up special-casing in logical rules. - Finish equality/comparability end-state. --------- Signed-off-by: yaommen <myanstu@163.com> Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
## Description #### Why this is needed: This is the next PR in the `ray-project#60312` logical plan migration stack. After cleaning up logical-operator fields and `_get_args`, the shared `Operator` base still owns concrete `_name` / `_input_dependencies` storage through a constructor. That keeps the base contract tied to mutable storage even though `LogicalOperator` now provides its own frozen dataclass-backed fields. This PR makes the shared base contract explicit before the later `__repr__` / `__str__`, rule-cleanup, and equality/comparability steps. This removes the old shared storage model from the base class, so later cleanup PRs can reason about logical operator state through the public contract instead of working around `_name` / `_input_dependencies` ownership in `Operator`. #### What this PR changes: Makes `Operator` a thin shared base with no constructor-owned `_name` / `_input_dependencies` storage. `Operator` now defines the public `name` / `input_dependencies` contract and its default methods use that public contract instead of touching `_name` / `_input_dependencies` directly. `LogicalOperator` continues to provide the frozen dataclass-backed implementation of that contract. `PhysicalOperator` remains stateful and keeps the existing physical execution behavior, but now owns the mutable name/dependency storage it needs instead of relying on shared `Operator` storage. This physical-side change is compatibility-only: physical operators are not being made immutable or dataclass-backed. This PR does not make physical operators dataclasses or immutable, remove redundant `__repr__` / `__str__`, clean up broader logical-rule special-casing, or change equality/comparability semantics. Those remain separate follow-ups in the current split plan. ## Related issues Part of ray-project#60312 ## Additional information This PR corresponds to the shared `Operator` / `LogicalOperator` base-contract cleanup step in the current split plan. ### Tests - Ran targeted logical, physical-operator transform, operator-fusion, and state-export tests. - Ran touched-file pre-commit checks. ### Stack Plan Done: - PR-A: Add a default property implementation for `LogicalOperator.name` - PR-B: Move logical `output_dependencies` handling out of logical operators - PR-C: Make `LogicalOperator` an ABC with abstract `num_outputs` - PR-D1: Convert one-to-one logical operators to frozen dataclasses - PR-D2: Convert map logical operators to frozen dataclasses - PR-D3: Convert all-to-all, join, read, and write logical operators to frozen dataclasses - PR-D4: Convert remaining source logical operators to frozen dataclasses - PR-Next-0: Convert the remaining concrete logical operators to frozen dataclasses - PR-Next-1: Convert abstract logical operator classes to frozen dataclasses - PR-Next-2: make logical-operator names derived by default - PR-Next-3: deduplicate `_apply_transform` - PR-Next-4: replace `input_op: InitVar` with a real `input_dependencies` field - PR-Next-5: remove `input_dependency` on `AbstractOneToOne` - PR-Next-6: clean up `_get_args` - This PR: make `Operator` a thin shared base Next: - remove redundant `__repr__` / `__str__` - clean up special-casing in logical rules - finalize equality / comparability work for `ray-project#60312` Signed-off-by: yaommen <myanstu@163.com>
## Description #### Why this is needed: This is the next small cleanup in the logical operator dataclass migration stack. After `Operator` becomes the shared base contract, `LogicalOperator` no longer needs to duplicate the same string representation logic. #### What this PR changes: Removes the redundant `dag_str`, `__repr__`, and `__str__` implementations from `LogicalOperator` so logical operators use the shared `Operator` implementation. ## Related issues Part of ray-project#60312. ## Additional information This PR is stacked on ray-project#63137 and only contains the repr/string consolidation step. ### Tests Ran targeted repr/state-export tests and touched-file pre-commit checks. ### Stack Plan Done: - Convert remaining concrete logical operators to frozen dataclasses. - Make logical abstract classes frozen dataclasses. - Derive logical operator names from the base class. - Replace single-input `input_op` InitVars with `input_dependencies`. - Remove the one-to-one `input_dependency` convenience property. - Clean up logical operator args export. - Make `Operator` a thin shared base. This PR: - Consolidate redundant logical operator repr/string logic. Next: - Clean up special-casing in logical rules. - Finish equality/comparability end-state. --------- Signed-off-by: yaommen <myanstu@163.com> Co-authored-by: Balaji Veeramani <balaji@anyscale.com>

Description
Why this is needed:
This is the next PR in the
#60312logical plan migration stack.After cleaning up logical-operator fields and
_get_args, the sharedOperatorbase still owns concrete_name/_input_dependenciesstorage through a constructor. That keeps the base contract tied to mutable storage even thoughLogicalOperatornow provides its own frozen dataclass-backed fields.This PR makes the shared base contract explicit before the later
__repr__/__str__, rule-cleanup, and equality/comparability steps. This removes the old shared storage model from the base class, so later cleanup PRs can reason about logical operator state through the public contract instead of working around_name/_input_dependenciesownership inOperator.What this PR changes:
Makes
Operatora thin shared base with no constructor-owned_name/_input_dependenciesstorage.Operatornow defines the publicname/input_dependenciescontract and its default methods use that public contract instead of touching_name/_input_dependenciesdirectly.LogicalOperatorcontinues to provide the frozen dataclass-backed implementation of that contract.PhysicalOperatorremains stateful and keeps the existing physical execution behavior, but now owns the mutable name/dependency storage it needs instead of relying on sharedOperatorstorage. This physical-side change is compatibility-only: physical operators are not being made immutable or dataclass-backed.This PR does not make physical operators dataclasses or immutable, remove redundant
__repr__/__str__, clean up broader logical-rule special-casing, or change equality/comparability semantics. Those remain separate follow-ups in the current split plan.Related issues
Part of #60312
Additional information
This PR corresponds to the shared
Operator/LogicalOperatorbase-contract cleanup step in the current split plan.Tests
Stack Plan
Done:
LogicalOperator.nameoutput_dependencieshandling out of logical operatorsLogicalOperatoran ABC with abstractnum_outputs_apply_transforminput_op: InitVarwith a realinput_dependenciesfieldinput_dependencyonAbstractOneToOne_get_argsOperatora thin shared baseNext:
__repr__/__str__#60312