[serve][3/N] Introduce experimental `ConsistentHashRouter` for session-sticky routing by jeffreywang88 · Pull Request #62906 · ray-project/ray

jeffreywang88 · 2026-04-24T07:15:59Z

Summary

Adds ConsistentHashRouter, an experimental subclass of RequestRouter that maps session_id → replica via a consistent-hash ring for sticky-session request routing. Clients must send a session_id with each request (for HTTP, in a request header) to benefit from session stickiness.

Preceding PRs:

Plumbing: [serve][1/N] Plumb session_id through request metadata and proxy layers #62905
mmh3 dependency introduction: [serve][2/N] Add mmh3 for consistent hashing #63096

Changes

Build a ring with V=100 virtual nodes per replica. When the assigned replica rejects the request, walk clockwise for up to K=2 fallback replicas.
Route session-less requests through the same ring using internal_request_id. Do not fall back to power-of-two-choices.
Rebuild the ring only when the replica set changes.
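The ring described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `HashRing`, `stable_hash`, and `ranked_replicas` are hypothetical names, and `hashlib.blake2b` stands in for `mmh3` so the snippet runs on the standard library alone.

```python
import bisect
import hashlib
from typing import List, Sequence, Tuple


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash; a stdlib stand-in for mmh3 (assumption)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HashRing:
    """Hypothetical consistent-hash ring with V virtual nodes per replica."""

    def __init__(self, replica_ids: Sequence[str], num_virtual_nodes: int = 100):
        points: List[Tuple[int, str]] = []
        for replica_id in replica_ids:
            for v in range(num_virtual_nodes):
                # Each replica contributes V points spread around the ring.
                points.append((stable_hash(f"{replica_id}#{v}"), replica_id))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [r for _, r in points]

    def ranked_replicas(self, key: str, num_fallback_replicas: int = 2) -> List[str]:
        """Return the primary, then up to K distinct fallbacks, walking clockwise."""
        if not self._hashes:
            return []
        start = bisect.bisect_right(self._hashes, stable_hash(key))
        ranked: List[str] = []
        for offset in range(len(self._owners)):
            owner = self._owners[(start + offset) % len(self._owners)]
            if owner not in ranked:
                ranked.append(owner)
                if len(ranked) == num_fallback_replicas + 1:
                    break
        return ranked


ring = HashRing(["replica-a", "replica-b", "replica-c", "replica-d"])
ranked = ring.ranked_replicas("user_123")
print(ranked)  # primary first, then two distinct fallbacks
```

The same key always yields the same ordered list for a fixed replica set, which is what makes the routing sticky. A session-less request would use its `internal_request_id` as the key instead.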

Implementation gotchas

choose_replicas returns [[primary], [fallback_1], [fallback_2]] instead of one multi-element rank; otherwise the framework's _select_from_candidate_replicas would pick the lowest-queue-length replica, defeating stickiness.
Override _fulfill_pending_requests: ConsistentHashRouter cannot safely use RequestRouter's FIFO-style task-shedding behavior. Once a routing task pops a request from _pending_requests_to_route, that task owns the request metadata needed to compute the consistent-hash replica. If the base loop exits early because there are “too many” routing tasks, the popped request can remain unfulfilled but no longer be available for another task to route. Pow-2 can recover from that with FIFO fallback. Consistent hashing cannot, because assigning a replica chosen for one request/session to a different pending request would break stickiness. Therefore, the override enforces: if a task pops a request, it must keep trying until that exact request is fulfilled.
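To illustrate the first gotcha, here is a simplified model of rank-based selection. It is an assumption-laden sketch: `select_from_ranks` is a hypothetical stand-in for the behavior attributed to `_select_from_candidate_replicas` above (pick the lowest-queue-length replica within the first rank that has capacity), not Ray's actual implementation.

```python
from typing import Dict, List, Optional


def select_from_ranks(
    ranks: List[List[str]], queue_len: Dict[str, int], max_queue_len: int
) -> Optional[str]:
    """Try ranks in order; within a rank, pick the least-loaded replica with capacity."""
    for rank in ranks:
        available = [r for r in rank if queue_len[r] < max_queue_len]
        if available:
            return min(available, key=lambda r: queue_len[r])
    return None


queue_len = {"primary": 3, "fallback_1": 0, "fallback_2": 1}

# One multi-element rank: load decides, so the sticky primary is skipped.
one_rank = [["primary", "fallback_1", "fallback_2"]]
chosen_one_rank = select_from_ranks(one_rank, queue_len, max_queue_len=5)

# One replica per rank: ring order decides, so the primary wins while it has capacity.
singleton_ranks = [["primary"], ["fallback_1"], ["fallback_2"]]
chosen_singleton = select_from_ranks(singleton_ranks, queue_len, max_queue_len=5)

print(chosen_one_rank, chosen_singleton)
```

Under this model, the singleton ranks fall through to `fallback_1` only when the primary is at `max_queue_len`, which matches the clockwise fallback walk.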

Opt-in API

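from ray import serve
from ray.serve.config import RequestRouterConfig


# Opt in per deployment by pointing request_router_class at the experimental router.
@serve.deployment(
    request_router_config=RequestRouterConfig(
        request_router_class=(
            "ray.serve.experimental.consistent_hash_router:ConsistentHashRouter"
        ),
        request_router_kwargs={"num_virtual_nodes": 100, "num_fallback_replicas": 2},
    ),
)
class SessionAwareDeployment: ...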

Benchmarks

Performance comparison w/ `PowerOfTwoChoicesRouter`

No overhead over power-of-two-choices router.

Correctness

During a scaling event, the session affinity rate drops by roughly M/(N+1), because about M/(N+1) sessions are re-assigned to different replicas.
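The re-assignment claim can be checked with a small simulation. This sketch reads M as the number of sessions and N as the replica count before one replica is added, which the PR does not spell out. All names are hypothetical, and `hashlib.blake2b` again stands in for `mmh3`.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash; a stdlib stand-in for mmh3 (assumption)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_ring(replica_ids: List[str], num_virtual_nodes: int = 100) -> List[Tuple[int, str]]:
    """Sorted (hash, replica) points, V per replica."""
    return sorted(
        (stable_hash(f"{replica_id}#{v}"), replica_id)
        for replica_id in replica_ids
        for v in range(num_virtual_nodes)
    )


def assign(ring: List[Tuple[int, str]], keys: List[str]) -> Dict[str, str]:
    """Map each key to the owner of the first ring point clockwise from its hash."""
    hashes = [h for h, _ in ring]
    return {
        key: ring[bisect.bisect_right(hashes, stable_hash(key)) % len(ring)][1]
        for key in keys
    }


sessions = [f"session-{i}" for i in range(10_000)]  # M = 10,000
old_replicas = [f"replica-{i}" for i in range(4)]   # N = 4
new_replicas = old_replicas + ["replica-4"]         # scale up to N + 1

before = assign(build_ring(old_replicas), sessions)
after = assign(build_ring(new_replicas), sessions)
moved = [s for s in sessions if before[s] != after[s]]
moved_fraction = len(moved) / len(sessions)
print(f"moved {len(moved)} of {len(sessions)} sessions ({moved_fraction:.1%})")
```

With N = 4, the moved fraction should land near 1/(N+1) = 20%, and every moved session should now belong to the newly added replica; sessions on the other replicas keep their assignment.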

Effectiveness -- LLM session affinity

Replica assignment distribution

With a higher number of virtual nodes, the request -> replica assignment is more uniform.
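A rough way to see the effect of virtual nodes is to measure how far the busiest replica sits above the mean load. The sketch below is illustrative only: the names are hypothetical, the keys are synthetic, and `hashlib.blake2b` stands in for `mmh3`.

```python
import bisect
import hashlib
from collections import Counter
from typing import Dict


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash; a stdlib stand-in for mmh3 (assumption)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def imbalance(num_replicas: int, num_virtual_nodes: int, num_keys: int = 20_000) -> float:
    """Return max-load / mean-load across replicas (1.0 means perfectly uniform)."""
    ring = sorted(
        (stable_hash(f"replica-{r}#{v}"), f"replica-{r}")
        for r in range(num_replicas)
        for v in range(num_virtual_nodes)
    )
    hashes = [h for h, _ in ring]
    counts = Counter(
        ring[bisect.bisect_right(hashes, stable_hash(f"request-{i}")) % len(ring)][1]
        for i in range(num_keys)
    )
    return max(counts.values()) / (num_keys / num_replicas)


results: Dict[int, float] = {v: imbalance(8, v) for v in (1, 10, 100, 1000)}
for v, ratio in results.items():
    print(f"V={v:>4}: max/mean load = {ratio:.2f}")
```

The ratio should move toward 1.0 as V grows, at the cost of a larger ring to build and search.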


gemini-code-assist

Code Review

This pull request introduces a ConsistentHashRouter to Ray Serve, providing session stickiness via a consistent-hash ring with virtual nodes. The implementation supports fallback replicas during backpressure and maintains affinity during scaling. The PR also includes extensive unit and integration tests, refactors test utilities, and adds the mmh3 dependency. Review feedback suggests optimizing the ring unzipping logic and the ranked replica lookup loop for better performance and idiomatic Python usage.

jeffreywang88 · 2026-04-24T07:45:34Z

@cursor review

…yers (#62905)

## Summary

Adds a new `session_id` field that flows from the client to `RequestMetadata`, giving session-aware request routers a stable key to hash on. In the follow-up [PR](#62906), we introduce a new router that applies consistent hashing based on `session_id`.

No router consumes `session_id` yet. This PR is pure plumbing -- behavior is unchanged.

## API

### 1. Python handle: `handle.options(session_id=...)`

```python
handle.options(session_id="user_123").remote(data)
```

Threaded through `DynamicHandleOptions.session_id` → `get_request_metadata` → `RequestMetadata.session_id`.

### 2. HTTP: `x-session-id` header

```
GET /chat HTTP/1.1
x-session-id: user_123
```

Extracted in `HTTPProxy.setup_request_context_and_handle`. Case-insensitive, accepts both `x-session-id` and `x_session_id`.

### 3. gRPC: `session_id` invocation metadata

```python
stub.__call__.with_call(
    request=req,
    metadata=(("session_id", "user_123"),),
)
```

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
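For the HTTP path in the plumbing commit message above, a client-side sketch might look like the following. The base URL and the `build_chat_request` helper are assumptions for illustration; only the `x-session-id` header name and the `/chat` path come from the commit message.

```python
import urllib.request


def build_chat_request(
    session_id: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a GET /chat request that carries the session key in x-session-id."""
    return urllib.request.Request(
        f"{base_url}/chat",
        headers={"x-session-id": session_id},
        method="GET",
    )


request = build_chat_request("user_123")
# Sending it requires a running Serve app at base_url (assumed, not shown here):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Reusing the same `session_id` value across requests is what lets a session-aware router map them to the same replica.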

## Description

We need a fast, deterministic hashing algorithm with good avalanche and uniformity, and `mmh3`, i.e. `MurmurHash3`, has been proven as a good fit. For example, Cassandra uses `MurmurHash3` for partition tokens ([reference](https://javadoc.io/static/org.apache.cassandra/cassandra-all/3.11.4/org/apache/cassandra/dht/Murmur3Partitioner.html)).

Next PR #62906 uses `mmh3` to implement a consistent-hashing based router to satisfy session affinity.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit d3e3ff9.

cursor · 2026-05-07T18:08:49Z

+                            pending_request.future.done()
+                            and len(self._routing_tasks) > self.target_num_routing_tasks
+                        ):
+                            break


Routing task probes cancelled request indefinitely when under target

Medium Severity

The break condition requires both pending_request.future.done() AND len(self._routing_tasks) > self.target_num_routing_tasks. When a request is externally cancelled but routing tasks are at or below target, the routing task continues probing replicas indefinitely (with backoff sleeps) for a request nobody is waiting for. Since the non-FIFO _fulfill_next_pending_request cannot reassign a found replica to another request, any probed replica with capacity is simply wasted. Under sustained load with cancellations and saturated replicas, this can keep a routing task slot occupied for extended periods, blocking real pending requests from being routed.
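The reviewer's point can be made concrete by isolating the two quoted clauses as a predicate. This is a sketch of the reasoning only: the function names are hypothetical, the full condition in the PR may contain more than the two clauses visible in the diff excerpt, and the variant is one possible reading of the suggestion rather than the fix that was merged.

```python
def should_stop_probing(future_done: bool, num_routing_tasks: int, target: int) -> bool:
    """The two quoted clauses: stop only if the request is done AND tasks exceed target."""
    return future_done and num_routing_tasks > target


def should_stop_probing_variant(future_done: bool, num_routing_tasks: int, target: int) -> bool:
    """Hypothetical variant: stop as soon as nobody is waiting on the request."""
    return future_done


# A cancelled request (future done) while routing tasks are at or below target:
cancelled_under_target = should_stop_probing(True, num_routing_tasks=2, target=4)
cancelled_over_target = should_stop_probing(True, num_routing_tasks=5, target=4)
variant_under_target = should_stop_probing_variant(True, num_routing_tasks=2, target=4)
print(cancelled_under_target, cancelled_over_target, variant_under_target)
```

In the first case the quoted condition keeps probing, which is the scenario the review flags; the variant would release the routing task slot immediately.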



jeffreywang88 mentioned this pull request Apr 24, 2026

[serve][1/N] Plumb session_id through request metadata and proxy layers #62905

Merged

gemini-code-assist Bot reviewed Apr 24, 2026


Comment thread python/ray/serve/experimental/consistent_hash_router.py Outdated

Comment thread python/ray/serve/experimental/consistent_hash_router.py

cursor Bot reviewed Apr 24, 2026


Comment thread python/ray/serve/experimental/consistent_hash_router.py

jeffreywang88 closed this Apr 24, 2026

jeffreywang88 force-pushed the consistent-hashing-router branch from e360b97 to ffdbd63 Compare April 24, 2026 17:22

jeffreywang88 reopened this Apr 24, 2026

jeffreywang88 changed the title from ~~[serve][2/N] Introduce experimental ConsistentHashRouter for session-sticky routing~~ to [serve][3/N] Introduce experimental ConsistentHashRouter for session-sticky routing May 4, 2026

jeffreywang88 mentioned this pull request May 4, 2026

[serve][2/N] Add mmh3 for consistent hashing #63096

Merged

Base automatically changed from consistent-hashing-plumbing to master May 4, 2026 17:54

jeffreywang88 force-pushed the consistent-hashing-router branch from d2a68e5 to c582bd4 Compare May 5, 2026 18:28

jeffreywang88 changed the base branch from master to mmh3 May 5, 2026 18:29

Base automatically changed from mmh3 to master May 5, 2026 23:00

jeffreywang88 added 4 commits May 5, 2026 23:11

Add mmh3 deps for consistent hashing

80be990

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

[serve] Add ConsistentHashRouter for session-sticky routing

7cb393e

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

Add backoff when all primary and fallback replicas are saturated

ca28247

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

[serve] Fix ConsistentHashRouter orphan livelock under concurrent same-session bursts

56d71d0

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

jeffreywang88 force-pushed the consistent-hashing-router branch from c582bd4 to 56d71d0 Compare May 5, 2026 23:12

Remove mmh3 from serve-requirements.txt

8a8f0f5

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

jeffreywang88 marked this pull request as ready for review May 6, 2026 00:48

jeffreywang88 requested a review from a team as a code owner May 6, 2026 00:48

jeffreywang88 added the `go` label (add ONLY when ready to merge, run all tests) May 6, 2026

jeffreywang88 requested a review from abrarsheikh May 6, 2026 00:48

ray-gardener Bot added the `serve` label (Ray Serve Related Issue) May 6, 2026

abrarsheikh reviewed May 6, 2026


Rename variables

2f39e05

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

abrarsheikh approved these changes May 7, 2026


Comment thread python/ray/serve/tests/unit/test_pow_2_request_router.py Outdated

Comment thread python/ray/serve/experimental/consistent_hash_router.py

Move test utilities

d3e3ff9

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

cursor Bot reviewed May 7, 2026


Merge branch 'master' into consistent-hashing-router

63e2ce5

abrarsheikh merged commit fa4a6f6 into master May 8, 2026
6 checks passed

abrarsheikh deleted the consistent-hashing-router branch May 8, 2026 22:07

abrarsheikh merged 8 commits into master from consistent-hashing-router
