Skip to content

[Data] Fixing ReservationOpResourceAllocator to properly borrow resources for ActorPoolMapOperator#60882

Merged
alexeykudinkin merged 5 commits into
masterfrom
ak/ra-v1-gpu-alloc-fix
Feb 9, 2026
Merged

[Data] Fixing ReservationOpResourceAllocator to properly borrow resources for ActorPoolMapOperator#60882
alexeykudinkin merged 5 commits into
masterfrom
ak/ra-v1-gpu-alloc-fix

Conversation

@alexeykudinkin

@alexeykudinkin alexeykudinkin commented Feb 9, 2026

Copy link
Copy Markdown
Contributor

Description

ActorPoolMapOperator.incremental_resource_usage returns 0, since no new incremental resources are necessary to launch new tasks.

However, for resource allocation to properly borrow from shared resources min_scheduling_resources should be used instead.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin requested a review from a team as a code owner February 9, 2026 21:15
@alexeykudinkin alexeykudinkin added the go add ONLY when ready to merge, run all tests label Feb 9, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes a resource allocation bug for ActorPoolMapOperator by using min_scheduling_resources() instead of incremental_resource_usage() for resource borrowing. This change allows actor-based operators to properly acquire resources for scaling up. The addition of a targeted regression test is excellent, and the refactoring of existing tests to remove mocking of incremental_resource_usage improves the test suite's realism and maintainability. The changes are well-implemented and look solid.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

# is still enough.
to_borrow = (
op.incremental_resource_usage()
op.min_scheduling_resources()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Borrowing and task submission use mismatched resource methods

Low Severity

The borrowing logic now uses op.min_scheduling_resources() while can_submit_new_task still checks against op.incremental_resource_usage(). HashShuffleOperator overrides incremental_resource_usage() to return cpu=1 but does not override min_scheduling_resources() (inheriting the base class zero return). This means the borrowing logic will never borrow CPU for shuffle operators, yet can_submit_new_task still requires cpu >= 1 — so in resource-constrained scenarios the shuffle operator may be unable to make progress where it previously could.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this behavior is correct.

  • can_submit_new_task checks whether a new task can be launched which for APs won't incur new resource usage (unlike scheduling new actors)

@iamjustinhsu iamjustinhsu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!

Comment thread python/ray/data/_internal/execution/resource_manager.py Outdated
# is still enough.
to_borrow = (
op.incremental_resource_usage()
op.min_scheduling_resources()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do u know why it was set to incremental_resource_usage usage in the 1st place? Since this is all 0s for actors, I wonder how much this affect CPU workloads?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

min_scheduling_resources is a recent thing

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) February 9, 2026 21:35
@alexeykudinkin alexeykudinkin merged commit 0d024e6 into master Feb 9, 2026
6 of 7 checks passed
@alexeykudinkin alexeykudinkin deleted the ak/ra-v1-gpu-alloc-fix branch February 9, 2026 22:11
alexeykudinkin added a commit that referenced this pull request Feb 9, 2026
…ources for `ActorPoolMapOperator` (#60882)

## Description

`ActorPoolMapOperator.incremental_resource_usage` returns 0, since no
new incremental resources are necessary to launch new tasks.

However, for resource allocation to properly borrow from shared
resources `min_scheduling_resources` should be used instead.

## Related issues
> Link related issues: "Fixes #1234", "Closes #1234", or "Related to
#1234".

## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
aslonnie pushed a commit that referenced this pull request Feb 9, 2026
…ources for `ActorPoolMapOperator` (#60891)

## Description

Cherry-pick of #60882

## Related issues
> Link related issues: "Fixes #1234", "Closes #1234", or "Related to
#1234".

## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
…ources for `ActorPoolMapOperator` (ray-project#60882)

## Description

`ActorPoolMapOperator.incremental_resource_usage` returns 0, since no
new incremental resources are necessary to launch new tasks.

However, for resource allocation to properly borrow from shared
resources `min_scheduling_resources` should be used instead.

## Related issues
> Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to
ray-project#1234".

## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
…ources for `ActorPoolMapOperator` (ray-project#60882)

## Description

`ActorPoolMapOperator.incremental_resource_usage` returns 0, since no
new incremental resources are necessary to launch new tasks.

However, for resource allocation to properly borrow from shared
resources `min_scheduling_resources` should be used instead.

## Related issues
> Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to
ray-project#1234".

## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

3 participants