[Serve] Optimize replica routing request data structures by abrarsheikh · Pull Request #60139 · ray-project/ray

abrarsheikh · 2026-01-14T16:36:05Z

O(1) Pending Request Lookups
- Added dict indices (_pending_requests_by_id and _pending_requests_by_model_id) for fast lookups
- Replaced O(n) linear scans with O(1) dict lookups when finding requests by ID or multiplexed model
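
The index idea above can be sketched as follows. This is a minimal illustration, not the router's actual code: `PendingRequest`, `PendingIndex`, and their fields are hypothetical names, and only the pattern (a queue plus dict indices over the same objects) comes from the description.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class PendingRequest:
    """Hypothetical stand-in for the router's pending-request record."""
    request_id: str
    model_id: str = ""


class PendingIndex:
    """Keeps a FIFO queue plus two dict indices over the same objects."""

    def __init__(self) -> None:
        self.queue: Deque[PendingRequest] = deque()
        self.by_id: Dict[str, PendingRequest] = {}
        self.by_model_id: Dict[str, Deque[PendingRequest]] = defaultdict(deque)

    def add(self, req: PendingRequest) -> None:
        self.queue.append(req)
        self.by_id[req.request_id] = req
        # Only multiplexed requests (non-empty model ID) enter the model index.
        if req.model_id:
            self.by_model_id[req.model_id].append(req)

    def find_by_id(self, request_id: str) -> Optional[PendingRequest]:
        # One hash lookup instead of scanning the whole queue.
        return self.by_id.get(request_id)
```

The trade-off is extra bookkeeping on insert in exchange for constant-time lookups on the hot path.
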
Cached Replica List
- Added _replicas_list cache to avoid O(n) dict-to-list conversion on every routing iteration
- List updated only when replicas change via update_replicas() or on_replica_actor_died()
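
A rough sketch of the caching pattern, assuming replicas are stored in a dict keyed by replica ID. The class name `ReplicaSet`, the `candidates()` method, and the use of plain strings as replica objects are illustrative assumptions; only `_replicas_list`, `update_replicas()`, and `on_replica_actor_died()` are named in the description.

```python
from typing import Dict, List


class ReplicaSet:
    """Hypothetical container; only the caching pattern mirrors the PR text."""

    def __init__(self) -> None:
        self._replicas: Dict[str, str] = {}
        self._replicas_list: List[str] = []

    def update_replicas(self, replicas: Dict[str, str]) -> None:
        self._replicas = dict(replicas)
        # Rebuild the cached list only when membership changes.
        self._replicas_list = list(self._replicas.values())

    def on_replica_actor_died(self, replica_id: str) -> None:
        if self._replicas.pop(replica_id, None) is not None:
            self._replicas_list = list(self._replicas.values())

    def candidates(self) -> List[str]:
        # Hot path: returns the cached list with no per-call conversion.
        return self._replicas_list
```

Callers on the routing path read the cached list repeatedly, so the O(n) conversion cost is paid once per membership change rather than once per routing iteration.
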
Lazy Cleanup Strategy
- Done futures are lazily cleaned from _pending_requests_by_model_id during lookups using O(1) popleft()
- Avoids expensive O(n) removal from deques
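
The lazy cleanup described above might look roughly like this. The `Pending` record and `first_live_request()` helper are hypothetical, the `done` flag stands in for `future.done()`, and deleting the emptied deque is an assumption of this sketch rather than something the description states.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class Pending:
    """Hypothetical record; `done` stands in for `future.done()`."""
    request_id: str
    done: bool = False


def first_live_request(
    by_model_id: Dict[str, Deque[Pending]], model_id: str
) -> Optional[Pending]:
    queue = by_model_id.get(model_id)
    if queue is None:
        return None
    # Drop finished entries from the front; each popleft() is O(1).
    while queue and queue[0].done:
        queue.popleft()
    if not queue:
        # Assumption: remove the empty deque so the index stays small.
        del by_model_id[model_id]
        return None
    return queue[0]
```

Finished entries are never removed from the middle of a deque; they are discarded only once they reach the front during a lookup.
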
Optimized Retry Insertion
- Extracted sorted insertion logic into _insert_pending_request_sorted() helper
- O(1) fast path for common case (recent retries append to end)
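
One way to picture the fast path is the sketch below. It is not the body of `_insert_pending_request_sorted()`; the `(created_at, request_id)` tuple layout and the function name `insert_sorted` are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Tuple

# (created_at, request_id); the tuple layout is an assumption for this sketch.
Entry = Tuple[float, str]


def insert_sorted(queue: Deque[Entry], entry: Entry) -> None:
    """Insert `entry` keeping `queue` ordered by created_at (oldest first)."""
    # Fast path: the new entry is at least as recent as the tail, so append.
    if not queue or queue[-1][0] <= entry[0]:
        queue.append(entry)
        return
    # Slow path: walk back from the tail to find the insertion point.
    pos = len(queue)
    while pos > 0 and queue[pos - 1][0] > entry[0]:
        pos -= 1
    queue.insert(pos, entry)
```

When retried requests are usually the most recent ones, almost every call takes the append branch and the linear walk is rarely reached.
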
Simplified pow_2_router
- Removed redundant dict creation per routing call
- Direct lookup via self._replicas[chosen_id] instead of building temporary map
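
A simplified sketch of the direct-lookup idea. The function `pick_least_loaded` and its `(replica_id, queue_len)` input are hypothetical; the only detail taken from the description is that the chosen ID is looked up in the existing replica dict instead of a freshly built map.

```python
from typing import Dict, List, Tuple


def pick_least_loaded(
    replicas: Dict[str, object], queue_lens: List[Tuple[str, int]]
) -> object:
    """Return the replica object with the shortest reported queue."""
    chosen_id, _ = min(queue_lens, key=lambda pair: pair[1])
    # Direct lookup in the long-lived dict; no temporary {id: replica} map.
    return replicas[chosen_id]
```
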

random.sample → Direct Selection
Lazy Hash Caching (common.py)
Metrics Throttling (request_router.py, constants.py)
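
The "random.sample → Direct Selection" item has no body on this page, so the exact replacement is not shown here. As one possible illustration of picking two distinct candidates without `random.sample`, the sketch below uses `random.randrange`; the function name `choose_two` and this particular index trick are assumptions, not the PR's code.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def choose_two(items: Sequence[T]) -> List[T]:
    """Pick up to two distinct items uniformly at random."""
    n = len(items)
    if n <= 2:
        # Nothing to choose between; return everything available.
        return list(items)
    i = random.randrange(n)
    # Draw from the remaining n - 1 slots, then skip over index i.
    j = random.randrange(n - 1)
    if j >= i:
        j += 1
    return [items[i], items[j]]
```
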

[Figure: flamegraph of the router after all the optimizations]

Signed-off-by: abrar <abrar@anyscale.com>

gemini-code-assist

Code Review

This pull request significantly optimizes the replica routing mechanism in Ray Serve by refactoring data structures and lookup logic. The changes introduce dictionary-based indices (_pending_requests_by_id, _pending_requests_by_model_id) for O(1) lookups of pending requests, replacing previous O(N) iterations over deques. Lazy cleanup of completed futures is implemented to prevent memory leaks, and a cached list of replicas (_replicas_list) is maintained to avoid redundant list conversions. These improvements enhance the efficiency of request matching, fulfillment, and replica selection, leading to better performance, especially in high-throughput or multiplexed model scenarios. The code is well-commented, explaining the rationale behind the optimizations.

harshit-anyscale

great improvements, nice work!
left some comments, else LGTM

## Why are these changes needed?

The `test_router_queue_len_metric` test was flaky because the router queue length gauge has a 100 ms throttle (`RAY_SERVE_ROUTER_QUEUE_LEN_GAUGE_THROTTLE_S`) that can skip updates when they happen too quickly. When replica initialization sets the gauge to 0 and a request immediately updates it to 1, the second update may be throttled, causing the test to see 0 instead of 1.

## Related issue number

Fixes flaky test introduced in #59233 after #60139 added throttling.

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
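
The skipped update described in this commit message can be reproduced with a small model of a throttled gauge. `ThrottledGauge` is a hypothetical class written for this illustration and takes the current time as an explicit argument so the behavior is deterministic; it is not Ray Serve's metrics API.

```python
from typing import Optional


class ThrottledGauge:
    """Hypothetical gauge that drops updates arriving within `throttle_s`."""

    def __init__(self, throttle_s: float) -> None:
        self.throttle_s = throttle_s
        self.value: Optional[float] = None
        self._last_set: Optional[float] = None

    def set(self, value: float, now: float) -> bool:
        """Record `value` at time `now`; return False if it was throttled."""
        if self._last_set is not None and now - self._last_set < self.throttle_s:
            return False  # Update skipped; the gauge keeps its old value.
        self.value = value
        self._last_set = now
        return True
```

With a 0.1 s throttle, setting 0 at t=0.0 and then 1 at t=0.01 leaves the gauge at 0, which matches the stale reading the flaky test observed.
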

abrarsheikh added 2 commits January 14, 2026 14:38

[Serve] optimize request router

a5f96e7

Signed-off-by: abrar <abrar@anyscale.com>

cache replica list

7031e86

Signed-off-by: abrar <abrar@anyscale.com>

gemini-code-assist Bot reviewed Jan 14, 2026

abrarsheikh added the `go` label (add ONLY when ready to merge, run all tests) Jan 14, 2026

bug

1a9c9a3

Signed-off-by: abrar <abrar@anyscale.com>

abrarsheikh marked this pull request as ready for review January 14, 2026 20:04

abrarsheikh requested a review from a team as a code owner January 14, 2026 20:04

abrarsheikh requested a review from akyang-anyscale January 14, 2026 20:05

throttle metrics for queue len

84ac4ae

Signed-off-by: abrar <abrar@anyscale.com>

abrarsheikh requested a review from harshit-anyscale January 14, 2026 22:33

randomize 2 replicas

e9969f6

Signed-off-by: abrar <abrar@anyscale.com>

cursor Bot reviewed Jan 15, 2026

Comment thread python/ray/serve/_private/request_router/request_router.py

pop queue len

a140850

Signed-off-by: abrar <abrar@anyscale.com>

ray-gardener Bot added the `serve` label (Ray Serve Related Issue) Jan 15, 2026

abrarsheikh added 2 commits January 15, 2026 02:36

fix test

0f89ca5

Signed-off-by: abrar <abrar@anyscale.com>

fix test

476c2b0

Signed-off-by: abrar <abrar@anyscale.com>

harshit-anyscale reviewed Jan 15, 2026

use randbits

f0456ea

Signed-off-by: abrar <abrar@anyscale.com>

akyang-anyscale approved these changes Jan 16, 2026

Comment thread python/ray/serve/_private/request_router/request_router.py Outdated

simplify code

c952ea7

Signed-off-by: abrar <abrar@anyscale.com>

cursor Bot reviewed Jan 16, 2026

Comment thread python/ray/serve/_private/request_router/request_router.py

dedupe

7d9b35e

Signed-off-by: abrar <abrar@anyscale.com>

harshit-anyscale approved these changes Jan 16, 2026

abrarsheikh merged commit 00c877d into master Jan 16, 2026
6 checks passed

abrarsheikh deleted the opt-routing branch January 16, 2026 18:10

abrarsheikh mentioned this pull request Jan 20, 2026

[Serve] send requests to replica immediately when replicas are full and max_queued = -1 #60306

Closed

eicherseiji mentioned this pull request Jan 20, 2026

[Serve] Fix flaky test_router_queue_len_metric #60333

Merged

6 tasks

abrarsheikh merged 11 commits into master from opt-routing
