[serve][3/N] Introduce experimental ConsistentHashRouter for session-sticky routing #62906

Merged: abrarsheikh merged 8 commits into master from consistent-hashing-router on May 8, 2026

@jeffreywang88 (Contributor) commented Apr 24, 2026

Summary

Adds ConsistentHashRouter, an experimental subclass of RequestRouter that maps session_id → replica via a consistent-hash ring for sticky-session request routing. Clients must send session_id with each request (for example, via the x-session-id HTTP header) to benefit from session stickiness.

Preceding PRs:

  • Plumbing: #62905
  • mmh3 dependency introduction: #63096

Changes

  • Build a ring with V=100 virtual nodes per replica. When the assigned replica rejects the request, walk clockwise for up to K=2 fallback replicas.
  • Route session-less requests through the same ring using internal_request_id. Do not fall back to power-of-two-choices.
  • Rebuild the ring only when the replica set changes.
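
The three bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `hashlib.md5` stands in for `mmh3` so the snippet needs only the standard library, and `build_ring`, `ranked_replicas`, and `RingCache` are hypothetical names.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    # Stand-in for mmh3 (assumption): any deterministic, well-mixed hash works here.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(replica_ids: List[str], num_virtual_nodes: int = 100) -> List[Tuple[int, str]]:
    """Place num_virtual_nodes points per replica on the ring, sorted by hash."""
    ring = [
        (_hash(f"{rid}#{v}"), rid)
        for rid in replica_ids
        for v in range(num_virtual_nodes)
    ]
    ring.sort()
    return ring


def ranked_replicas(ring: List[Tuple[int, str]], key: str, num_fallback_replicas: int = 2) -> List[str]:
    """Return the primary plus up to K distinct fallbacks, walking clockwise."""
    if not ring:
        return []
    hashes = [h for h, _ in ring]
    start = bisect.bisect_right(hashes, _hash(key))
    ranked: List[str] = []
    for i in range(len(ring)):
        rid = ring[(start + i) % len(ring)][1]
        if rid not in ranked:
            ranked.append(rid)
            if len(ranked) == num_fallback_replicas + 1:
                break
    return ranked


class RingCache:
    """Rebuild the ring only when the set of replica IDs changes."""

    def __init__(self, num_virtual_nodes: int = 100):
        self._num_virtual_nodes = num_virtual_nodes
        self._replicas = frozenset()
        self._ring: List[Tuple[int, str]] = []
        self.rebuilds = 0

    def get(self, replica_ids: List[str]) -> List[Tuple[int, str]]:
        current = frozenset(replica_ids)
        if current != self._replicas:
            self._ring = build_ring(sorted(current), self._num_virtual_nodes)
            self._replicas = current
            self.rebuilds += 1
        return self._ring
```

A session-less request would use the same lookup with its internal request ID as the key, for example `ranked_replicas(ring, internal_request_id)`.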

Implementation gotchas

  • choose_replicas returns [[primary], [fallback_1], [fallback_2]] instead of one multi-element rank; otherwise the framework's _select_from_candidate_replicas would pick the lowest-queue-length replica, defeating stickiness.
  • Override _fulfill_pending_requests: ConsistentHashRouter cannot safely use RequestRouter's FIFO-style task-shedding behavior. Once a routing task pops a request from _pending_requests_to_route, that task owns the request metadata needed to compute the consistent-hash replica. If the base loop exits early because there are “too many” routing tasks, the popped request can remain unfulfilled but no longer be available for another task to route. Pow-2 can recover from that with FIFO fallback. Consistent hashing cannot, because assigning a replica chosen for one request/session to a different pending request would break stickiness. Therefore, the override enforces: if a task pops a request, it must keep trying until that exact request is fulfilled.
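
To illustrate the first gotcha, here is a toy model of rank-based selection. It is an assumption-laden sketch, not the real `_select_from_candidate_replicas`: `pick_from_ranks`, the queue lengths, and `max_ongoing` are all hypothetical, and the model only encodes the behaviour described above (ranks are tried in order, and within a rank the shortest queue wins).

```python
from typing import Dict, List, Optional


def pick_from_ranks(
    ranks: List[List[str]], queue_len: Dict[str, int], max_ongoing: int
) -> Optional[str]:
    """Try ranks in order; within a rank, pick the shortest queue with capacity."""
    for rank in ranks:
        available = [r for r in rank if queue_len[r] < max_ongoing]
        if available:
            return min(available, key=lambda r: queue_len[r])
    return None


queue_len = {"primary": 3, "fallback_1": 0, "fallback_2": 1}

# One multi-element rank: the least-loaded replica wins, so stickiness is lost.
one_rank = pick_from_ranks([["primary", "fallback_1", "fallback_2"]], queue_len, 5)

# Singleton ranks: the primary is used whenever it has capacity.
singleton_ranks = pick_from_ranks([["primary"], ["fallback_1"], ["fallback_2"]], queue_len, 5)
```

In this model `one_rank` is `"fallback_1"` while `singleton_ranks` is `"primary"`; the fallbacks are only reached once the primary is saturated.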

Opt-in API

from ray import serve
from ray.serve.config import RequestRouterConfig


@serve.deployment(
    request_router_config=RequestRouterConfig(
        request_router_class=(
            "ray.serve.experimental.consistent_hash_router:ConsistentHashRouter"
        ),
        request_router_kwargs={"num_virtual_nodes": 100, "num_fallback_replicas": 2},
    ),
)
class SessionAwareDeployment: ...

Benchmarks

Performance comparison w/ PowerOfTwoChoicesRouter

No overhead over power-of-two-choices router.

Screenshot 2026-05-05 at 5 33 04 PM Screenshot 2026-05-05 at 5 33 18 PM

Correctness

During a scaling event, the session affinity rate drops by M/(N+1), because M/(N+1) sessions are reassigned to different replicas.
Screenshot 2026-05-01 at 5 10 26 PM
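
The remapping behaviour can be checked with a small simulation. This is a sketch under stated assumptions: `hashlib.md5` stands in for `mmh3`, the replica and session names are made up, and `remapped_sessions` is a hypothetical helper. On a consistent-hash ring, adding one replica to N should move roughly 1/(N+1) of the keys, and every moved key should land on the new replica.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def _hash(key: str) -> int:
    # Stand-in for mmh3 (assumption).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(replica_ids: List[str], num_virtual_nodes: int = 100) -> Tuple[List[int], List[str]]:
    """Return parallel lists of sorted ring positions and their owning replicas."""
    points = sorted(
        (_hash(f"{rid}#{v}"), rid) for rid in replica_ids for v in range(num_virtual_nodes)
    )
    return [h for h, _ in points], [rid for _, rid in points]


def owner(ring: Tuple[List[int], List[str]], key: str) -> str:
    """First virtual node clockwise from the key's hash, wrapping around."""
    hashes, owners = ring
    return owners[bisect.bisect_right(hashes, _hash(key)) % len(hashes)]


def remapped_sessions(
    old_replicas: List[str], new_replicas: List[str], session_ids: List[str]
) -> Dict[str, Tuple[str, str]]:
    """Map each session whose owner changed to its (old_owner, new_owner) pair."""
    old_ring, new_ring = build_ring(old_replicas), build_ring(new_replicas)
    moved = {}
    for sid in session_ids:
        before, after = owner(old_ring, sid), owner(new_ring, sid)
        if before != after:
            moved[sid] = (before, after)
    return moved


sessions = [f"session-{i}" for i in range(2000)]
moved = remapped_sessions(["r0", "r1", "r2", "r3"], ["r0", "r1", "r2", "r3", "r4"], sessions)
moved_fraction = len(moved) / len(sessions)
```

Sessions that are not in `moved` keep their original replica, which is what preserves affinity across the scaling event.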

Effectiveness -- LLM session affinity

Screenshot 2026-05-06 at 2 21 35 PM

Replica assignment distribution

With a higher number of virtual nodes, the request -> replica assignment is more uniform.
Screenshot 2026-05-04 at 3 01 20 PM
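
One way to reproduce this kind of measurement locally is sketched below. It assumes `hashlib.md5` in place of `mmh3`, synthetic request keys, and hypothetical helpers `assignment_counts` and `imbalance`; it does not reproduce the numbers in the screenshot.

```python
import bisect
import hashlib
from collections import Counter
from typing import List


def _hash(key: str) -> int:
    # Stand-in for mmh3 (assumption).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def assignment_counts(replica_ids: List[str], num_virtual_nodes: int, num_requests: int = 5000) -> Counter:
    """Count how many synthetic request keys each replica owns on the ring."""
    points = sorted(
        (_hash(f"{rid}#{v}"), rid) for rid in replica_ids for v in range(num_virtual_nodes)
    )
    hashes = [h for h, _ in points]
    counts: Counter = Counter()
    for i in range(num_requests):
        idx = bisect.bisect_right(hashes, _hash(f"request-{i}")) % len(points)
        counts[points[idx][1]] += 1
    return counts


def imbalance(counts: Counter, replica_ids: List[str]) -> float:
    """Ratio of the busiest replica's share to the mean share (1.0 is perfectly even)."""
    mean = sum(counts.values()) / len(replica_ids)
    return max(counts.values()) / mean


replicas = [f"r{i}" for i in range(8)]
for v in (1, 10, 100):
    print(v, round(imbalance(assignment_counts(replicas, v), replicas), 2))
```

Comparing the printed ratio across values of `num_virtual_nodes` gives a quick view of how the spread changes.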


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a ConsistentHashRouter to Ray Serve, providing session stickiness via a consistent-hash ring with virtual nodes. The implementation supports fallback replicas during backpressure and maintains affinity during scaling. The PR also includes extensive unit and integration tests, refactors test utilities, and adds the mmh3 dependency. Review feedback suggests optimizing the ring unzipping logic and the ranked replica lookup loop for better performance and idiomatic Python usage.

Comment thread python/ray/serve/experimental/consistent_hash_router.py Outdated
Comment thread python/ray/serve/experimental/consistent_hash_router.py
@jeffreywang88


@cursor review

Comment thread python/ray/serve/experimental/consistent_hash_router.py
@jeffreywang88 jeffreywang88 force-pushed the consistent-hashing-router branch from e360b97 to ffdbd63 Compare April 24, 2026 17:22
@jeffreywang88 jeffreywang88 reopened this Apr 24, 2026
@jeffreywang88 jeffreywang88 changed the title [serve][2/N] Introduce experimental ConsistentHashRouter for session-sticky routing May 4, 2026
abrarsheikh pushed a commit that referenced this pull request May 4, 2026
…yers (#62905)

## Summary

Adds a new `session_id` field that flows from the client to
`RequestMetadata`, giving session-aware request routers a stable key to
hash on. In the follow-up
[PR](#62906), we introduce a new
router that applies consistent hashing based on `session_id`.

No router consumes `session_id` yet. This PR is pure plumbing --
behavior is unchanged.

## API
### 1. Python handle: `handle.options(session_id=...)`

```python
handle.options(session_id="user_123").remote(data)
```

Threaded through `DynamicHandleOptions.session_id` →
`get_request_metadata` → `RequestMetadata.session_id`.

### 2. HTTP: `x-session-id` header

```
GET /chat HTTP/1.1
x-session-id: user_123
```

Extracted in `HTTPProxy.setup_request_context_and_handle`. Header
matching is case-insensitive and accepts both `x-session-id` and
`x_session_id`.
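
A client-side sketch of attaching the header, using only the standard
library. The base URL, the `/chat` route, and the helper name
`build_chat_request` are assumptions for illustration.

```python
import urllib.request


def build_chat_request(
    session_id: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) a GET /chat request carrying the session header."""
    # base_url and the /chat route are placeholders, not part of this PR.
    return urllib.request.Request(
        f"{base_url}/chat",
        headers={"x-session-id": session_id},
        method="GET",
    )


req = build_chat_request("user_123")
```

Against a running Serve application, the request could then be sent
with `urllib.request.urlopen(req)`.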

### 3. gRPC: `session_id` invocation metadata

```python
stub.__call__.with_call(
    request=req,
    metadata=(("session_id", "user_123"),),
)
```


---------

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Base automatically changed from consistent-hashing-plumbing to master May 4, 2026 17:54
@jeffreywang88 jeffreywang88 force-pushed the consistent-hashing-router branch from d2a68e5 to c582bd4 Compare May 5, 2026 18:28
@jeffreywang88 jeffreywang88 changed the base branch from master to mmh3 May 5, 2026 18:29
elliot-barn pushed a commit that referenced this pull request May 5, 2026
## Description
We need a fast, deterministic hashing algorithm with good avalanche and
uniformity properties, and `mmh3` (`MurmurHash3`) has proven to be a good
fit. For example, Cassandra uses `MurmurHash3` for partition tokens
([reference](https://javadoc.io/static/org.apache.cassandra/cassandra-all/3.11.4/org/apache/cassandra/dht/Murmur3Partitioner.html)).

The next PR, #62906, uses `mmh3` to implement a
consistent-hashing-based router that provides session affinity.


---------

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Base automatically changed from mmh3 to master May 5, 2026 23:00
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
…e-session bursts

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang88 jeffreywang88 force-pushed the consistent-hashing-router branch from c582bd4 to 56d71d0 Compare May 5, 2026 23:12
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang88 jeffreywang88 marked this pull request as ready for review May 6, 2026 00:48
@jeffreywang88 jeffreywang88 requested a review from a team as a code owner May 6, 2026 00:48
@jeffreywang88 jeffreywang88 added the go add ONLY when ready to merge, run all tests label May 6, 2026
@jeffreywang88 jeffreywang88 requested a review from abrarsheikh May 6, 2026 00:48
@ray-gardener ray-gardener Bot added the serve Ray Serve Related Issue label May 6, 2026
Comment thread python/ray/serve/experimental/consistent_hash_router.py Outdated
Comment thread python/ray/serve/experimental/consistent_hash_router.py Outdated
Comment thread python/ray/serve/experimental/consistent_hash_router.py Outdated
Comment thread python/ray/serve/experimental/consistent_hash_router.py Outdated
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Comment thread python/ray/serve/tests/unit/test_pow_2_request_router.py Outdated
Comment thread python/ray/serve/experimental/consistent_hash_router.py
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d3e3ff9.

pending_request.future.done()
and len(self._routing_tasks) > self.target_num_routing_tasks
):
break


Routing task probes cancelled request indefinitely when under target

Medium Severity

The break condition requires both pending_request.future.done() AND len(self._routing_tasks) > self.target_num_routing_tasks. When a request is externally cancelled but routing tasks are at or below target, the routing task continues probing replicas indefinitely (with backoff sleeps) for a request nobody is waiting for. Since the non-FIFO _fulfill_next_pending_request cannot reassign a found replica to another request, any probed replica with capacity is simply wasted. Under sustained load with cancellations and saturated replicas, this can keep a routing task slot occupied for extended periods, blocking real pending requests from being routed.
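
The reviewer's point can be made concrete with a small truth-table sketch. Both functions are hypothetical: the first mirrors only the two clauses visible in the quoted excerpt (the full condition in the PR may include more), and the second is one possible stricter alternative, not necessarily what was merged.

```python
def keeps_probing_as_quoted(future_done: bool, num_routing_tasks: int, target: int) -> bool:
    """Quoted condition: break only if the future is done AND tasks exceed the target."""
    should_break = future_done and num_routing_tasks > target
    return not should_break


def keeps_probing_if_done_breaks(future_done: bool, num_routing_tasks: int, target: int) -> bool:
    """Hypothetical alternative: a finished or cancelled future always ends probing."""
    return not future_done


# A cancelled request (future done) while routing tasks are at the target:
cancelled_at_target = keeps_probing_as_quoted(True, 4, 4)
cancelled_at_target_alt = keeps_probing_if_done_breaks(True, 4, 4)
```

Under the quoted condition the task keeps probing in this case; under the alternative it stops.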


@abrarsheikh abrarsheikh merged commit fa4a6f6 into master May 8, 2026
6 checks passed
@abrarsheikh abrarsheikh deleted the consistent-hashing-router branch May 8, 2026 22:07
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
…yers (ray-project#62905)

Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
…n-sticky routing (ray-project#62906)

## Summary
Adds `ConsistentHashRouter`, an experimental subclass of `RequestRouter`
that maps `session_id` → replica via a consistent-hash ring for
sticky-session request routing. **Clients must send `session_id` with
each request (for example, via the `x-session-id` HTTP header)** to
benefit from session stickiness.

Preceding PRs:
- Plumbing: ray-project#62905
- `mmh3` dependency introduction:
ray-project#63096

## Changes
- Build a ring with V=100 virtual nodes per replica. When the assigned
replica rejects the request, walk clockwise for up to K=2 fallback
replicas.
- Route session-less requests through the same ring using
`internal_request_id`. **Do not** fall back to power-of-two-choices.
- Rebuild the ring only when the replica set changes.

## Implementation gotchas
- `choose_replicas` returns `[[primary], [fallback_1], [fallback_2]]`
instead of one multi-element rank; otherwise the framework's
`_select_from_candidate_replicas` would pick the lowest-queue-length
replica, defeating stickiness.
- Override `_fulfill_pending_requests`: `ConsistentHashRouter` cannot
safely use `RequestRouter`'s FIFO-style task-shedding behavior. Once a
routing task pops a request from `_pending_requests_to_route`, that task
owns the request metadata needed to compute the consistent-hash replica.
If the base loop exits early because there are “too many” routing tasks,
the popped request can remain unfulfilled but no longer be available for
another task to route. Pow-2 can recover from that with FIFO fallback.
Consistent hashing cannot, because assigning a replica chosen for one
request/session to a different pending request would break stickiness.
Therefore, the override enforces: if a task pops a request, it must keep
trying until that exact request is fulfilled.

## Opt-in API

```python
@serve.deployment(
    request_router_config=RequestRouterConfig(
        request_router_class=(
            "ray.serve.experimental.consistent_hash_router:ConsistentHashRouter"
        ),
        request_router_kwargs={"num_virtual_nodes": 100, "num_fallback_replicas": 2},
    ),
)
class SessionAwareDeployment: ...
```

## Benchmarks
### Performance comparison w/ `PowerOfTwoChoicesRouter`
No overhead over power-of-two-choices router.

<img width="1183" height="574" alt="Screenshot 2026-05-05 at 5 33 04 PM"
src="https://github.com/user-attachments/assets/0feacc4b-1700-4336-af12-ce31604bed64"
/>
<img width="1185" height="583" alt="Screenshot 2026-05-05 at 5 33 18 PM"
src="https://github.com/user-attachments/assets/23f26f57-93fa-4a05-9639-b5b97a77db99"
/>

### Correctness
During a scaling event, the session affinity rate drops by `M/(N+1)`,
because `M/(N+1)` sessions are reassigned to different replicas.
<img width="1522" height="698" alt="Screenshot 2026-05-01 at 5 10 26 PM"
src="https://github.com/user-attachments/assets/f1a2f8a9-b038-45fd-8c86-2460db098110"
/>

### Effectiveness -- LLM session affinity
<img width="1117" height="362" alt="Screenshot 2026-05-06 at 2 21 35 PM"
src="https://github.com/user-attachments/assets/730d33f7-d41b-4936-9736-b289fa627c6f"
/>

### Replica assignment distribution
With a higher number of virtual nodes, the request -> replica assignment
is more uniform.
<img width="1325" height="673" alt="Screenshot 2026-05-04 at 3 01 20 PM"
src="https://github.com/user-attachments/assets/3be2630d-afe9-4f8f-8f31-a5606553a8af"
/>


---------

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

Labels

go add ONLY when ready to merge, run all tests serve Ray Serve Related Issue

2 participants