Skip to content

[Serve][3/n] Add router queue latency#59233

Merged
abrarsheikh merged 8 commits into
masterfrom
59218-abrar-router_latency
Dec 16, 2025
Merged

[Serve][3/n] Add router queue latency#59233
abrarsheikh merged 8 commits into
masterfrom
59218-abrar-router_latency

Conversation

@abrarsheikh

@abrarsheikh abrarsheikh commented Dec 7, 2025

Copy link
Copy Markdown
Contributor

fixes #59218

import asyncio
from ray import serve
from typing import List

@serve.deployment(max_ongoing_requests=1000)
class MyDeployment:
    async def __call__(self) -> List[int]:
        await asyncio.sleep(.01)
        return 1

app = MyDeployment.bind()

testing with 100 users

Metric With Change Master Δ (Master – With Change)
Requests 128,197 128,136 –61
Fails 1,273 1,363 +90
Median (ms) 170 170 0
95%ile (ms) 260 260 0
99%ile (ms) 310 320 +10 ms
Average (ms) 178.66 178.99 +0.33 ms
Min (ms) 9 8 –1 ms
Max (ms) 1,171 2,995 +1,824 ms
Average size (bytes) 0.99 0.99 0
Current RPS 538.8 558.2 +19.4
Current Failures/s 0 0 0
Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh added the go add ONLY when ready to merge, run all tests label Dec 7, 2025

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new metric, serve_queue_wait_time_ms, to measure the time requests spend in the router queue. The implementation adds logic to record this latency in RequestRouter and FIFOMixin, and includes a new test case to verify the metric's correctness.

My review focuses on improving code maintainability by addressing duplication and ensuring consistency in metric tagging. I've also identified a bug in the new test case where time units are being compared incorrectly.

Key feedback points:

  • Refactor duplicated metric recording logic into a helper method.
  • Ensure consistent metric tagging for the new histogram.
  • Correct the assertion in the new test to use the correct time unit (milliseconds).
Comment thread python/ray/serve/tests/test_metrics.py Outdated
Comment thread python/ray/serve/_private/request_router/request_router.py Outdated
Comment thread python/ray/serve/_private/request_router/request_router.py Outdated
Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh marked this pull request as ready for review December 15, 2025 07:58
@abrarsheikh abrarsheikh requested review from a team as code owners December 15, 2025 07:58
Comment thread python/ray/serve/_private/request_router/request_router.py Outdated
@ray-gardener ray-gardener Bot added serve Ray Serve Related Issue observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling labels Dec 15, 2025
Signed-off-by: abrar <abrar@anyscale.com>

@harshit-anyscale harshit-anyscale left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment from cursor seems legit, else LGTM

Comment thread python/ray/serve/_private/request_router/request_router.py Outdated
Comment thread python/ray/serve/_private/request_router/request_router.py
Signed-off-by: abrar <abrar@anyscale.com>
Comment thread python/ray/serve/_private/request_router/request_router.py Outdated
Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh merged commit 31a0e1e into master Dec 16, 2025
6 checks passed
@abrarsheikh abrarsheikh deleted the 59218-abrar-router_latency branch December 16, 2025 05:34
kriyanshii pushed a commit to kriyanshii/ray that referenced this pull request Dec 16, 2025
fixes ray-project#59218

```python
import asyncio
from ray import serve
from typing import List

@serve.deployment(max_ongoing_requests=1000)
class MyDeployment:
    async def __call__(self) -> List[int]:
        await asyncio.sleep(.01)
        return 1

app = MyDeployment.bind()

```

testing with 100 users

Metric | With Change | Master | Δ (Master – With Change)
-- | -- | -- | --
Requests | 128,197 | 128,136 | –61
Fails | 1,273 | 1,363 | +90
Median (ms) | 170 | 170 | 0
95%ile (ms) | 260 | 260 | 0
99%ile (ms) | 310 | 320 | +10 ms
Average (ms) | 178.66 | 178.99 | +0.33 ms
Min (ms) | 9 | 8 | –1 ms
Max (ms) | 1,171 | 2,995 | +1,824 ms
Average size (bytes) | 0.99 | 0.99 | 0
Current RPS | 538.8 | 558.2 | +19.4
Current Failures/s | 0 | 0 | 0

---------

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: kriyanshii <kriyanshishah06@gmail.com>
aslonnie pushed a commit that referenced this pull request Jan 21, 2026
## Why are these changes needed?

The `test_router_queue_len_metric` test was flaky because the router
queue length gauge has a 100ms throttle
(`RAY_SERVE_ROUTER_QUEUE_LEN_GAUGE_THROTTLE_S`) that can skip updates
when they happen too quickly.

When replica initialization sets the gauge to 0 and a request
immediately updates it to 1, the second update may be throttled, causing
the test to see 0 instead of 1.

## Related issue number

Fixes flaky test introduced in #59233 after #60139 added throttling.

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling serve Ray Serve Related Issue

3 participants