[Data] Relaxing DefaultActorAutoscaler constraint to keep autoscaling #61917

Merged: alexeykudinkin merged 3 commits into master from ak/act-ascl-fix on Mar 24, 2026

@alexeykudinkin

Description

Currently, DefaultActorAutoscaler stops scaling up whenever the number of task slots in the pool (i.e., actor pool size x max_tasks_in_flight_per_actor) exceeds the number of pending inputs available.

This limits autoscaling unnecessarily:

  • We want to increase actual concurrency, which is measured by actor pool utilization. This check is outdated and vestigial in the new Autoscaler architecture.

Case in point: if you increase max_tasks_in_flight_per_actor to improve locality, the slot count grows with it and can easily exceed the pending inputs, so you essentially disable the ability to autoscale.
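
To make the arithmetic concrete, here is a minimal sketch of the check described above and of the decision without it. It is a simplified model, not the actual DefaultActorAutoscaler code: `PoolSnapshot`, `should_scale_up_before`, `should_scale_up_after`, the 0.8 threshold, and the way utilization is passed in are all hypothetical, and the resource-throttling condition is left out.

```python
from dataclasses import dataclass


@dataclass
class PoolSnapshot:
    """Hypothetical, simplified view of an actor pool (not a Ray Data class)."""

    num_actors: int
    max_size: int
    max_tasks_in_flight_per_actor: int
    num_pending_inputs: int
    utilization: float  # assumed to be supplied by the caller (0.9 = 90% busy)

    @property
    def total_task_slots(self) -> int:
        # Slot count as described in the PR: pool size x max tasks in flight per actor.
        return self.num_actors * self.max_tasks_in_flight_per_actor


def should_scale_up_before(pool: PoolSnapshot, threshold: float = 0.8) -> bool:
    """Simplified model of the old behavior: the slot check can veto a scale-up."""
    if pool.num_actors >= pool.max_size:
        return False
    # The constraint this PR removes: slots already exceed pending inputs.
    if pool.total_task_slots > pool.num_pending_inputs:
        return False
    return pool.utilization >= threshold


def should_scale_up_after(pool: PoolSnapshot, threshold: float = 0.8) -> bool:
    """Simplified model of the relaxed behavior: utilization alone decides."""
    if pool.num_actors >= pool.max_size:
        return False
    return pool.utilization >= threshold


# Same pool and workload; only max_tasks_in_flight_per_actor differs.
base = dict(num_actors=4, max_size=16, num_pending_inputs=10, utilization=0.9)
low = PoolSnapshot(max_tasks_in_flight_per_actor=2, **base)    # 8 slots
high = PoolSnapshot(max_tasks_in_flight_per_actor=16, **base)  # 64 slots

print(should_scale_up_before(low))   # True: 8 slots do not exceed 10 pending inputs
print(should_scale_up_before(high))  # False: 64 slots exceed 10 pending inputs
print(should_scale_up_after(high))   # True: the pool is 90% utilized
```

In this model the busy pool with the larger max_tasks_in_flight_per_actor never scales up under the old check, even though its utilization is high, which is the situation the description calls out.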

@alexeykudinkin alexeykudinkin requested a review from a team as a code owner March 20, 2026 18:31
@alexeykudinkin alexeykudinkin added the "go" label (add ONLY when ready to merge, run all tests) Mar 20, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request successfully relaxes the DefaultActorAutoscaler constraint, allowing the autoscaler to scale up based on utilization even when there are sufficient free task slots to handle pending inputs. This change aligns with the goal of improving concurrency and addressing limitations with max_tasks_in_flight_per_actor settings. The corresponding test case has been updated to reflect this new behavior, ensuring the logic is correctly applied.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

# Do not scale up if either
# - Actor Pool is at max size already
# - Op is throttled (ie exceeding allocated resource quota)
elif actor_pool.current_size() >= actor_pool.max_size():

Unused function left behind after removing its caller

Low Severity

The _estimate_expected_tasks function is now completely dead code. It was only called from the removed "enough free task slots" check in _derive_target_scaling_config, and a codebase-wide search confirms it has zero remaining callers — no production code or test code references it. The sibling function _estimate_total_available_task_slots is similarly unused in production but is at least still exercised by tests in test_actor_pool_map_operator.py.

+1, remove it if it is dead code.

@ray-gardener ray-gardener Bot added the "data" label (Ray Data-related issues) Mar 20, 2026
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) March 21, 2026 00:37
@github-actions github-actions Bot disabled auto-merge March 21, 2026 00:37
@alexeykudinkin alexeykudinkin merged commit 732c259 into master Mar 24, 2026
6 checks passed
@alexeykudinkin alexeykudinkin deleted the ak/act-ascl-fix branch March 24, 2026 19:34
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Mar 25, 2026
…ng (ray-project#61917)

Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
…ng (ray-project#61917)

Labels

data: Ray Data-related issues
go: add ONLY when ready to merge, run all tests

2 participants