
[Serve] Downstream deployments over-provision when receiving Deployme… #60747

Merged
abrarsheikh merged 1 commit into master from 60624-abrar-auto
Feb 4, 2026


@abrarsheikh

Contributor

fixes #60624

…ntResponse arguments from slow upstream

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh requested a review from a team as a code owner February 4, 2026 18:55
@abrarsheikh abrarsheikh added the go label (add ONLY when ready to merge, run all tests) Feb 4, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request correctly addresses an over-provisioning issue in downstream deployments by resolving request arguments before they are counted as queued. The logic change in router.py is direct and well-commented, and the new test case in test_autoscaling_policy.py effectively validates the fix. I have one suggestion to make the test even more robust against potential timing issues.
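The ordering change described above can be illustrated with a small asyncio model. This is only a sketch under stated assumptions: `ToyRouter`, `num_queued`, and `queued_while_upstream_blocked` are hypothetical names invented for illustration and are not the code in router.py. The model assumes an autoscaler that reads a queued-request counter, and compares counting a request before versus after its upstream argument resolves.

```python
import asyncio


class ToyRouter:
    """Hypothetical stand-in for a router; NOT Ray Serve's actual class."""

    def __init__(self, resolve_args_first: bool):
        self.resolve_args_first = resolve_args_first
        self.num_queued = 0  # the metric a (modelled) autoscaler would read

    async def assign_request(self, upstream: asyncio.Future) -> str:
        if self.resolve_args_first:
            # Fixed ordering: wait for the upstream result before counting.
            arg = await upstream
        self.num_queued += 1
        try:
            if not self.resolve_args_first:
                # Old ordering: already counted as queued while upstream is slow.
                arg = await upstream
            return f"handled {arg}"
        finally:
            self.num_queued -= 1


async def queued_while_upstream_blocked(resolve_args_first: bool, n: int = 5) -> int:
    """Return the queued count observed while all n upstream results are pending."""
    router = ToyRouter(resolve_args_first)
    loop = asyncio.get_running_loop()
    upstreams = [loop.create_future() for _ in range(n)]
    tasks = [asyncio.create_task(router.assign_request(f)) for f in upstreams]
    await asyncio.sleep(0)  # let every task run up to its first await
    observed = router.num_queued
    for i, f in enumerate(upstreams):
        f.set_result(i)  # unblock the slow upstream
    await asyncio.gather(*tasks)
    return observed


if __name__ == "__main__":
    print(asyncio.run(queued_while_upstream_blocked(resolve_args_first=False)))
    print(asyncio.run(queued_while_upstream_blocked(resolve_args_first=True)))
```

In this model, the old ordering reports every blocked request as queued downstream even though none of them can run yet, which is the kind of signal that could drive an unnecessary upscale. With the arguments resolved first, the counter stays at zero until the upstream finishes.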


# Wait for all 5 requests to be blocked at SlowUpstream (waiting on signal)
wait_for_condition(lambda: ray.get(signal.cur_num_waiters.remote()) == 5)


medium

To make this test more robust against timing-related flakiness, it would be beneficial to add a short time.sleep() after waiting for the requests to be blocked and before asserting the number of replicas. This ensures that the autoscaler has had sufficient time to make a (potentially incorrect) scaling decision. Given upscale_delay_s is 0.2s, a sleep of 0.5s should be adequate.

Suggested change
# Give the autoscaler time to potentially make a wrong decision.
# A sleep duration longer than upscale_delay_s (0.2s) ensures that
# we would have seen an upscale event if the fix was not effective.
time.sleep(0.5)
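The reasoning behind the suggested sleep can be sketched with a deterministic toy model. Everything here is an assumption made for illustration: `ToyAutoscaler` and `replicas_after` are hypothetical, time is simulated rather than real, and the upscale rule (scale once load has persisted for `upscale_delay_s`) is a simplification, not Ray Serve's actual autoscaling policy.

```python
class ToyAutoscaler:
    """Hypothetical model: upscale once overload has lasted upscale_delay_s."""

    def __init__(self, upscale_delay_s: float = 0.2):
        self.upscale_delay_s = upscale_delay_s
        self.num_replicas = 1
        self._overloaded_since = None  # simulated time when overload began

    def tick(self, now: float, queued: int) -> None:
        if queued <= self.num_replicas:
            self._overloaded_since = None  # no overload, reset the timer
            return
        if self._overloaded_since is None:
            self._overloaded_since = now
        if now - self._overloaded_since >= self.upscale_delay_s:
            self.num_replicas = queued  # simplistic: one replica per request


def replicas_after(wait_s: float, queued: int, step_s: float = 0.05) -> int:
    """Replica count seen by a test that asserts after waiting wait_s."""
    autoscaler = ToyAutoscaler()
    steps = int(round(wait_s / step_s))
    for i in range(steps + 1):
        autoscaler.tick(now=i * step_s, queued=queued)
    return autoscaler.num_replicas


if __name__ == "__main__":
    print(replicas_after(0.0, queued=5))  # assert immediately, buggy metric
    print(replicas_after(0.5, queued=5))  # assert after 0.5s, buggy metric
    print(replicas_after(0.5, queued=0))  # assert after 0.5s, fixed metric
```

Under this model, an assertion made immediately still sees one replica even when the queued metric is wrong, so the test would pass without the fix. Waiting longer than `upscale_delay_s` lets the wrong decision surface, so the same assertion separates the buggy case from the fixed one.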
@ray-gardener ray-gardener Bot added the serve label (Ray Serve Related Issue) Feb 4, 2026
@abrarsheikh abrarsheikh merged commit c40ef35 into master Feb 4, 2026
6 checks passed
@abrarsheikh abrarsheikh deleted the 60624-abrar-auto branch February 4, 2026 23:32
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
ray-project#60747)


fixes ray-project#60624

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: tiennguyentony <46289799+tiennguyentony@users.noreply.github.com>
elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
#60747)

fixes #60624

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
ray-project#60747)

fixes ray-project#60624

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026

Labels

go: add ONLY when ready to merge, run all tests
serve: Ray Serve Related Issue

2 participants