Skip to content

[Data] Move requester_id in DefaultAutoscalingCoordinator from Method Signatures into the Constructor#62838

Merged
bveeramani merged 3 commits into
ray-project:masterfrom
rayhhome:requester-id-in-dac
Apr 21, 2026
Merged

[Data] Move requester_id in DefaultAutoscalingCoordinator from Method Signatures into the Constructor#62838
bveeramani merged 3 commits into
ray-project:masterfrom
rayhhome:requester-id-in-dac

Conversation

@rayhhome

@rayhhome rayhhome commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

Description

The client wrapper DefaultAutoscalingCoordinator is always tied to a single requester for its entire lifetime—it is constructed once per execution with a fixed requester_id and never shared across multiple requesters. Despite this, requester_id was threaded through every method call (request_resources, cancel_request, get_allocated_resources), adding noise to both call sites and the abstract base class interface.

This PR moves requester_id from the method signatures into the constructor, making the single-requester ownership explicit in the type. The abstract base class (AutoscalingCoordinator) and testing implementation (FakeAutoscalingCoordinator) are updated to match. FakeAutoscalingCoordinator is also simplified from a Dict-keyed multi-tenant structure to a single Optional[Allocation], reflecting its single-tenant usage.

Additional information

API changes:

  • AutoscalingCoordinator abstract methods no longer take requester_id:
# Before
def request_resources(self, requester_id: str, resources, ...) -> None: ...
def cancel_request(self, requester_id: str) -> None: ...
def get_allocated_resources(self, requester_id: str) -> List[ResourceDict]: ...

# After
def request_resources(self, resources, ...) -> None: ...
def cancel_request(self) -> None: ...
def get_allocated_resources(self) -> List[ResourceDict]: ...
  • DefaultAutoscalingCoordinator.__init__ now requires requester_id:
# Before
DefaultAutoscalingCoordinator()

# After
DefaultAutoscalingCoordinator(requester_id="data-<execution_id>")

The _AutoscalingCoordinatorActor (underlying Ray actor that actually coordinates across multiple datasets) is unchanged. It still takes requester_id per call, as it genuinely serves multiple requesters.

Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
@rayhhome rayhhome self-assigned this Apr 21, 2026
Copilot AI review requested due to automatic review settings April 21, 2026 21:34
@rayhhome rayhhome requested a review from a team as a code owner April 21, 2026 21:34
@rayhhome rayhhome added data Ray Data-related issues go add ONLY when ready to merge, run all tests labels Apr 21, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the autoscaling coordinator to be single-tenant by moving the requester_id from method arguments to the class constructor. This change simplifies the API across the base, default, and fake coordinator implementations. The review feedback suggests further simplifying the internal caching mechanism in DefaultAutoscalingCoordinator by replacing the dictionary-based cache with a single list, as the class no longer needs to manage resources for multiple requesters.

Comment thread python/ray/data/_internal/cluster_autoscaler/default_autoscaling_coordinator.py Outdated
Comment thread python/ray/data/_internal/cluster_autoscaler/default_autoscaling_coordinator.py Outdated
Comment thread python/ray/data/_internal/cluster_autoscaler/default_autoscaling_coordinator.py Outdated
Comment thread python/ray/data/tests/test_autoscaling_coordinator.py Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the AutoscalingCoordinator interface so requester_id is bound at construction time (single-tenant coordinator instances), removing requester_id from method signatures and updating the default + fake implementations and call sites accordingly.

Changes:

  • Update AutoscalingCoordinator abstract methods to drop the requester_id parameter.
  • Make DefaultAutoscalingCoordinator require requester_id in its constructor and thread it internally to the underlying Ray actor.
  • Simplify FakeAutoscalingCoordinator from multi-tenant dict storage to a single optional allocation and update tests/callers.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
python/ray/data/_internal/cluster_autoscaler/base_autoscaling_coordinator.py Updates the abstract interface to remove requester_id from method signatures and docs.
python/ray/data/_internal/cluster_autoscaler/default_autoscaling_coordinator.py Binds requester_id in the constructor and uses it internally for actor calls and timeout handling.
python/ray/data/_internal/cluster_autoscaler/default_cluster_autoscaler_v2.py Constructs DefaultAutoscalingCoordinator with a per-execution requester id; updates coordinator calls to the new signatures.
python/ray/data/_internal/cluster_autoscaler/fake_autoscaling_coordinator.py Updates fake coordinator to the new single-tenant interface and simplifies internal state.
python/ray/data/tests/test_autoscaling_coordinator.py Adjusts tests to construct and call DefaultAutoscalingCoordinator using the new API.
Comments suppressed due to low confidence (1)

python/ray/data/_internal/cluster_autoscaler/default_cluster_autoscaler_v2.py:345

  • This warning message includes self._requester_id, but AutoscalingCoordinator.cancel_request() no longer takes a requester id, so the id in the log is no longer guaranteed to correspond to what was actually canceled (especially when a custom coordinator is injected). Consider removing the id from the message or deriving it from the coordinator instance (if available) to avoid misleading operational logs.
            msg = (
                f"Failed to cancel resource request for {self._requester_id}."
                " The request will still expire after the timeout of"
                f" {self._min_gap_between_autoscaling_requests_s} seconds."
            )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread python/ray/data/_internal/cluster_autoscaler/fake_autoscaling_coordinator.py Outdated

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 0454a47. Configure here.

Comment thread python/ray/data/_internal/cluster_autoscaler/fake_autoscaling_coordinator.py Outdated
…Coordinator

Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
@bveeramani bveeramani enabled auto-merge (squash) April 21, 2026 22:19
@bveeramani bveeramani merged commit 3823f83 into ray-project:master Apr 21, 2026
7 checks passed
HLDKNotFound pushed a commit to chichic21039/ray that referenced this pull request Apr 22, 2026
…thod Signatures into the Constructor (ray-project#62838)

##  Description
The client wrapper `DefaultAutoscalingCoordinator` is always tied to a
single requester for its entire lifetime—it is constructed once per
execution with a fixed `requester_id` and never shared across multiple
requesters. Despite this, `requester_id` was threaded through every
method call (`request_resources`, `cancel_request`,
`get_allocated_resources`), adding noise to both call sites and the
abstract base class interface.

This PR moves `requester_id` from the method signatures into the
constructor, making the single-requester ownership explicit in the type.
The abstract base class (`AutoscalingCoordinator`) and testing
implementation (`FakeAutoscalingCoordinator`) are updated to match.
`FakeAutoscalingCoordinator` is also simplified from a Dict-keyed
multi-tenant structure to a single `Optional[Allocation]`, reflecting
its single-tenant usage.

## Additional information
API changes:
- `AutoscalingCoordinator` abstract methods no longer take
`requester_id`:
```python
# Before
def request_resources(self, requester_id: str, resources, ...) -> None: ...
def cancel_request(self, requester_id: str) -> None: ...
def get_allocated_resources(self, requester_id: str) -> List[ResourceDict]: ...

# After
def request_resources(self, resources, ...) -> None: ...
def cancel_request(self) -> None: ...
def get_allocated_resources(self) -> List[ResourceDict]: ...
```
- `DefaultAutoscalingCoordinator.__init__` now requires `requester_id`:
```python
# Before
DefaultAutoscalingCoordinator()

# After
DefaultAutoscalingCoordinator(requester_id="data-<execution_id>")
```

The `_AutoscalingCoordinatorActor` (underlying Ray actor that actually
coordinates across multiple datasets) is unchanged. It still takes
`requester_id` per call, as it genuinely serves multiple requesters.

---------

Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
@rayhhome rayhhome deleted the requester-id-in-dac branch April 22, 2026 20:33
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
…thod Signatures into the Constructor (ray-project#62838)

##  Description
The client wrapper `DefaultAutoscalingCoordinator` is always tied to a
single requester for its entire lifetime—it is constructed once per
execution with a fixed `requester_id` and never shared across multiple
requesters. Despite this, `requester_id` was threaded through every
method call (`request_resources`, `cancel_request`,
`get_allocated_resources`), adding noise to both call sites and the
abstract base class interface.

This PR moves `requester_id` from the method signatures into the
constructor, making the single-requester ownership explicit in the type.
The abstract base class (`AutoscalingCoordinator`) and testing
implementation (`FakeAutoscalingCoordinator`) are updated to match.
`FakeAutoscalingCoordinator` is also simplified from a Dict-keyed
multi-tenant structure to a single `Optional[Allocation]`, reflecting
its single-tenant usage.

## Additional information
API changes:
- `AutoscalingCoordinator` abstract methods no longer take
`requester_id`:
```python
# Before
def request_resources(self, requester_id: str, resources, ...) -> None: ...
def cancel_request(self, requester_id: str) -> None: ...
def get_allocated_resources(self, requester_id: str) -> List[ResourceDict]: ...

# After
def request_resources(self, resources, ...) -> None: ...
def cancel_request(self) -> None: ...
def get_allocated_resources(self) -> List[ResourceDict]: ...
```
- `DefaultAutoscalingCoordinator.__init__` now requires `requester_id`:
```python
# Before
DefaultAutoscalingCoordinator()

# After
DefaultAutoscalingCoordinator(requester_id="data-<execution_id>")
```

The `_AutoscalingCoordinatorActor` (underlying Ray actor that actually
coordinates across multiple datasets) is unchanged. It still takes
`requester_id` per call, as it genuinely serves multiple requesters.

---------

Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

3 participants