Skip to content

[Data] Make Operator a thin shared base#63137

Merged
bveeramani merged 1 commit into
ray-project:masterfrom
myandpr:data-next7-thin-operator-base
May 6, 2026
Merged

[Data] Make Operator a thin shared base#63137
bveeramani merged 1 commit into
ray-project:masterfrom
myandpr:data-next7-thin-operator-base

Conversation

@myandpr

@myandpr myandpr commented May 5, 2026

Copy link
Copy Markdown
Member

Description

Why this is needed:

This is the next PR in the #60312 logical plan migration stack.

After cleaning up logical-operator fields and _get_args, the shared Operator base still owns concrete _name / _input_dependencies storage through a constructor. That keeps the base contract tied to mutable storage even though LogicalOperator now provides its own frozen dataclass-backed fields.

This PR makes the shared base contract explicit before the later __repr__ / __str__, rule-cleanup, and equality/comparability steps. This removes the old shared storage model from the base class, so later cleanup PRs can reason about logical operator state through the public contract instead of working around _name / _input_dependencies ownership in Operator.

What this PR changes:

Makes Operator a thin shared base with no constructor-owned _name / _input_dependencies storage.

Operator now defines the public name / input_dependencies contract and its default methods use that public contract instead of touching _name / _input_dependencies directly.

LogicalOperator continues to provide the frozen dataclass-backed implementation of that contract.

PhysicalOperator remains stateful and keeps the existing physical execution behavior, but now owns the mutable name/dependency storage it needs instead of relying on shared Operator storage. This physical-side change is compatibility-only: physical operators are not being made immutable or dataclass-backed.

This PR does not make physical operators dataclasses or immutable, remove redundant __repr__ / __str__, clean up broader logical-rule special-casing, or change equality/comparability semantics. Those remain separate follow-ups in the current split plan.

Related issues

Part of #60312

Additional information

This PR corresponds to the shared Operator / LogicalOperator base-contract cleanup step in the current split plan.

Tests

  • Ran targeted logical, physical-operator transform, operator-fusion, and state-export tests.
  • Ran touched-file pre-commit checks.

Stack Plan

Done:

  • PR-A: Add a default property implementation for LogicalOperator.name
  • PR-B: Move logical output_dependencies handling out of logical operators
  • PR-C: Make LogicalOperator an ABC with abstract num_outputs
  • PR-D1: Convert one-to-one logical operators to frozen dataclasses
  • PR-D2: Convert map logical operators to frozen dataclasses
  • PR-D3: Convert all-to-all, join, read, and write logical operators to frozen dataclasses
  • PR-D4: Convert remaining source logical operators to frozen dataclasses
  • PR-Next-0: Convert the remaining concrete logical operators to frozen dataclasses
  • PR-Next-1: Convert abstract logical operator classes to frozen dataclasses
  • PR-Next-2: make logical-operator names derived by default
  • PR-Next-3: deduplicate _apply_transform
  • PR-Next-4: replace input_op: InitVar with a real input_dependencies field
  • PR-Next-5: remove input_dependency on AbstractOneToOne
  • PR-Next-6: clean up _get_args
  • This PR: make Operator a thin shared base

Next:

  • remove redundant __repr__ / __str__
  • clean up special-casing in logical rules
  • finalize equality / comparability work for #60312
@myandpr myandpr requested a review from a team as a code owner May 5, 2026 18:26

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Operator base class into an Abstract Base Class (ABC), making name and input_dependencies abstract properties and removing the base init method. Subclasses like PhysicalOperator now explicitly manage these attributes and implement the required properties. A potential issue was identified in the PhysicalOperator.input_dependencies setter, which fails to maintain bidirectional graph consistency by not updating the _output_dependencies of the input operators when dependencies are changed.

Comment thread python/ray/data/_internal/execution/interfaces/physical_operator.py Outdated

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 8fcb1ad32089f75ef1fe5d6fdf5cf04a5f336a6e. Configure here.

Comment thread python/ray/data/_internal/logical/interfaces/operator.py Outdated
@myandpr myandpr force-pushed the data-next7-thin-operator-base branch from 8fcb1ad to 2db8559 Compare May 5, 2026 18:58
@myandpr myandpr added the go add ONLY when ready to merge, run all tests label May 5, 2026
@ray-gardener ray-gardener Bot added data Ray Data-related issues community-contribution Contributed by the community labels May 5, 2026
Signed-off-by: yaommen <myanstu@163.com>
@myandpr myandpr force-pushed the data-next7-thin-operator-base branch from 2db8559 to 9d08b71 Compare May 6, 2026 03:06
@bveeramani bveeramani merged commit edd3a96 into ray-project:master May 6, 2026
6 checks passed
bveeramani added a commit that referenced this pull request May 6, 2026
## Description

#### Why this is needed:

This is the next small cleanup in the logical operator dataclass
migration stack. After `Operator` becomes the shared base contract,
`LogicalOperator` no longer needs to duplicate the same string
representation logic.

#### What this PR changes:

Removes the redundant `dag_str`, `__repr__`, and `__str__`
implementations from `LogicalOperator` so logical operators use the
shared `Operator` implementation.

## Related issues

Part of #60312.

## Additional information

This PR is stacked on #63137 and only contains the repr/string
consolidation step.

### Tests

Ran targeted repr/state-export tests and touched-file pre-commit checks.

### Stack Plan

Done:
- Convert remaining concrete logical operators to frozen dataclasses.
- Make logical abstract classes frozen dataclasses.
- Derive logical operator names from the base class.
- Replace single-input `input_op` InitVars with `input_dependencies`.
- Remove the one-to-one `input_dependency` convenience property.
- Clean up logical operator args export.
- Make `Operator` a thin shared base.

This PR:
- Consolidate redundant logical operator repr/string logic.

Next:
- Clean up special-casing in logical rules.
- Finish equality/comparability end-state.

---------

Signed-off-by: yaommen <myanstu@163.com>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
## Description

#### Why this is needed:

This is the next PR in the `ray-project#60312` logical plan migration stack.

After cleaning up logical-operator fields and `_get_args`, the shared
`Operator` base still owns concrete `_name` / `_input_dependencies`
storage through a constructor. That keeps the base contract tied to
mutable storage even though `LogicalOperator` now provides its own
frozen dataclass-backed fields.

This PR makes the shared base contract explicit before the later
`__repr__` / `__str__`, rule-cleanup, and equality/comparability steps.
This removes the old shared storage model from the base class, so later
cleanup PRs can reason about logical operator state through the public
contract instead of working around `_name` / `_input_dependencies`
ownership in `Operator`.

#### What this PR changes:

Makes `Operator` a thin shared base with no constructor-owned `_name` /
`_input_dependencies` storage.

`Operator` now defines the public `name` / `input_dependencies` contract
and its default methods use that public contract instead of touching
`_name` / `_input_dependencies` directly.

`LogicalOperator` continues to provide the frozen dataclass-backed
implementation of that contract.

`PhysicalOperator` remains stateful and keeps the existing physical
execution behavior, but now owns the mutable name/dependency storage it
needs instead of relying on shared `Operator` storage. This
physical-side change is compatibility-only: physical operators are not
being made immutable or dataclass-backed.

This PR does not make physical operators dataclasses or immutable,
remove redundant `__repr__` / `__str__`, clean up broader logical-rule
special-casing, or change equality/comparability semantics. Those remain
separate follow-ups in the current split plan.

## Related issues

Part of ray-project#60312

## Additional information

This PR corresponds to the shared `Operator` / `LogicalOperator`
base-contract cleanup step in the current split plan.

### Tests

- Ran targeted logical, physical-operator transform, operator-fusion,
and state-export tests.
- Ran touched-file pre-commit checks.

### Stack Plan

Done:
- PR-A: Add a default property implementation for `LogicalOperator.name`
- PR-B: Move logical `output_dependencies` handling out of logical
operators
- PR-C: Make `LogicalOperator` an ABC with abstract `num_outputs`
- PR-D1: Convert one-to-one logical operators to frozen dataclasses
- PR-D2: Convert map logical operators to frozen dataclasses
- PR-D3: Convert all-to-all, join, read, and write logical operators to
frozen dataclasses
- PR-D4: Convert remaining source logical operators to frozen
dataclasses
- PR-Next-0: Convert the remaining concrete logical operators to frozen
dataclasses
- PR-Next-1: Convert abstract logical operator classes to frozen
dataclasses
- PR-Next-2: make logical-operator names derived by default
- PR-Next-3: deduplicate `_apply_transform`
- PR-Next-4: replace `input_op: InitVar` with a real
`input_dependencies` field
- PR-Next-5: remove `input_dependency` on `AbstractOneToOne`
- PR-Next-6: clean up `_get_args`
- This PR: make `Operator` a thin shared base

Next:
- remove redundant `__repr__` / `__str__`
- clean up special-casing in logical rules
- finalize equality / comparability work for `ray-project#60312`

Signed-off-by: yaommen <myanstu@163.com>
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
## Description

#### Why this is needed:

This is the next small cleanup in the logical operator dataclass
migration stack. After `Operator` becomes the shared base contract,
`LogicalOperator` no longer needs to duplicate the same string
representation logic.

#### What this PR changes:

Removes the redundant `dag_str`, `__repr__`, and `__str__`
implementations from `LogicalOperator` so logical operators use the
shared `Operator` implementation.

## Related issues

Part of ray-project#60312.

## Additional information

This PR is stacked on ray-project#63137 and only contains the repr/string
consolidation step.

### Tests

Ran targeted repr/state-export tests and touched-file pre-commit checks.

### Stack Plan

Done:
- Convert remaining concrete logical operators to frozen dataclasses.
- Make logical abstract classes frozen dataclasses.
- Derive logical operator names from the base class.
- Replace single-input `input_op` InitVars with `input_dependencies`.
- Remove the one-to-one `input_dependency` convenience property.
- Clean up logical operator args export.
- Make `Operator` a thin shared base.

This PR:
- Consolidate redundant logical operator repr/string logic.

Next:
- Clean up special-casing in logical rules.
- Finish equality/comparability end-state.

---------

Signed-off-by: yaommen <myanstu@163.com>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community data Ray Data-related issues go add ONLY when ready to merge, run all tests

2 participants