[Data] Move requester_id in DefaultAutoscalingCoordinator from Method Signatures into the Constructor#62838
Conversation
Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request refactors the autoscaling coordinator to be single-tenant by moving the requester_id from method arguments to the class constructor. This change simplifies the API across the base, default, and fake coordinator implementations. The review feedback suggests further simplifying the internal caching mechanism in DefaultAutoscalingCoordinator by replacing the dictionary-based cache with a single list, as the class no longer needs to manage resources for multiple requesters.
There was a problem hiding this comment.
Pull request overview
This PR refactors the AutoscalingCoordinator interface so requester_id is bound at construction time (single-tenant coordinator instances), removing requester_id from method signatures and updating the default + fake implementations and call sites accordingly.
Changes:
- Update
AutoscalingCoordinatorabstract methods to drop therequester_idparameter. - Make
DefaultAutoscalingCoordinatorrequirerequester_idin its constructor and thread it internally to the underlying Ray actor. - Simplify
FakeAutoscalingCoordinatorfrom multi-tenant dict storage to a single optional allocation and update tests/callers.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
python/ray/data/_internal/cluster_autoscaler/base_autoscaling_coordinator.py |
Updates the abstract interface to remove requester_id from method signatures and docs. |
python/ray/data/_internal/cluster_autoscaler/default_autoscaling_coordinator.py |
Binds requester_id in the constructor and uses it internally for actor calls and timeout handling. |
python/ray/data/_internal/cluster_autoscaler/default_cluster_autoscaler_v2.py |
Constructs DefaultAutoscalingCoordinator with a per-execution requester id; updates coordinator calls to the new signatures. |
python/ray/data/_internal/cluster_autoscaler/fake_autoscaling_coordinator.py |
Updates fake coordinator to the new single-tenant interface and simplifies internal state. |
python/ray/data/tests/test_autoscaling_coordinator.py |
Adjusts tests to construct and call DefaultAutoscalingCoordinator using the new API. |
Comments suppressed due to low confidence (1)
python/ray/data/_internal/cluster_autoscaler/default_cluster_autoscaler_v2.py:345
- This warning message includes
self._requester_id, butAutoscalingCoordinator.cancel_request()no longer takes a requester id, so the id in the log is no longer guaranteed to correspond to what was actually canceled (especially when a custom coordinator is injected). Consider removing the id from the message or deriving it from the coordinator instance (if available) to avoid misleading operational logs.
msg = (
f"Failed to cancel resource request for {self._requester_id}."
" The request will still expire after the timeout of"
f" {self._min_gap_between_autoscaling_requests_s} seconds."
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 0454a47. Configure here.
…Coordinator Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
…thod Signatures into the Constructor (ray-project#62838) ## Description The client wrapper `DefaultAutoscalingCoordinator` is always tied to a single requester for its entire lifetime—it is constructed once per execution with a fixed `requester_id` and never shared across multiple requesters. Despite this, `requester_id` was threaded through every method call (`request_resources`, `cancel_request`, `get_allocated_resources`), adding noise to both call sites and the abstract base class interface. This PR moves `requester_id` from the method signatures into the constructor, making the single-requester ownership explicit in the type. The abstract base class (`AutoscalingCoordinator`) and testing implementation (`FakeAutoscalingCoordinator`) are updated to match. `FakeAutoscalingCoordinator` is also simplified from a Dict-keyed multi-tenant structure to a single `Optional[Allocation]`, reflecting its single-tenant usage. ## Additional information API changes: - `AutoscalingCoordinator` abstract methods no longer take `requester_id`: ```python # Before def request_resources(self, requester_id: str, resources, ...) -> None: ... def cancel_request(self, requester_id: str) -> None: ... def get_allocated_resources(self, requester_id: str) -> List[ResourceDict]: ... # After def request_resources(self, resources, ...) -> None: ... def cancel_request(self) -> None: ... def get_allocated_resources(self) -> List[ResourceDict]: ... ``` - `DefaultAutoscalingCoordinator.__init__` now requires `requester_id`: ```python # Before DefaultAutoscalingCoordinator() # After DefaultAutoscalingCoordinator(requester_id="data-<execution_id>") ``` The `_AutoscalingCoordinatorActor` (underlying Ray actor that actually coordinates across multiple datasets) is unchanged. It still takes `requester_id` per call, as it genuinely serves multiple requesters. --------- Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
…thod Signatures into the Constructor (ray-project#62838) ## Description The client wrapper `DefaultAutoscalingCoordinator` is always tied to a single requester for its entire lifetime—it is constructed once per execution with a fixed `requester_id` and never shared across multiple requesters. Despite this, `requester_id` was threaded through every method call (`request_resources`, `cancel_request`, `get_allocated_resources`), adding noise to both call sites and the abstract base class interface. This PR moves `requester_id` from the method signatures into the constructor, making the single-requester ownership explicit in the type. The abstract base class (`AutoscalingCoordinator`) and testing implementation (`FakeAutoscalingCoordinator`) are updated to match. `FakeAutoscalingCoordinator` is also simplified from a Dict-keyed multi-tenant structure to a single `Optional[Allocation]`, reflecting its single-tenant usage. ## Additional information API changes: - `AutoscalingCoordinator` abstract methods no longer take `requester_id`: ```python # Before def request_resources(self, requester_id: str, resources, ...) -> None: ... def cancel_request(self, requester_id: str) -> None: ... def get_allocated_resources(self, requester_id: str) -> List[ResourceDict]: ... # After def request_resources(self, resources, ...) -> None: ... def cancel_request(self) -> None: ... def get_allocated_resources(self) -> List[ResourceDict]: ... ``` - `DefaultAutoscalingCoordinator.__init__` now requires `requester_id`: ```python # Before DefaultAutoscalingCoordinator() # After DefaultAutoscalingCoordinator(requester_id="data-<execution_id>") ``` The `_AutoscalingCoordinatorActor` (underlying Ray actor that actually coordinates across multiple datasets) is unchanged. It still takes `requester_id` per call, as it genuinely serves multiple requesters. --------- Signed-off-by: Sirui Huang <ray.huang@anyscale.com>

Description
The client wrapper
DefaultAutoscalingCoordinatoris always tied to a single requester for its entire lifetime—it is constructed once per execution with a fixedrequester_idand never shared across multiple requesters. Despite this,requester_idwas threaded through every method call (request_resources,cancel_request,get_allocated_resources), adding noise to both call sites and the abstract base class interface.This PR moves
requester_idfrom the method signatures into the constructor, making the single-requester ownership explicit in the type. The abstract base class (AutoscalingCoordinator) and testing implementation (FakeAutoscalingCoordinator) are updated to match.FakeAutoscalingCoordinatoris also simplified from a Dict-keyed multi-tenant structure to a singleOptional[Allocation], reflecting its single-tenant usage.Additional information
API changes:
AutoscalingCoordinatorabstract methods no longer takerequester_id:DefaultAutoscalingCoordinator.__init__now requiresrequester_id:The
_AutoscalingCoordinatorActor(underlying Ray actor that actually coordinates across multiple datasets) is unchanged. It still takesrequester_idper call, as it genuinely serves multiple requesters.