[Feat][Executor] Introduce RayExecutorV2 by jeffreywang88 · Pull Request #36836 · vllm-project/vllm

jeffreywang88 · 2026-03-12T01:40:50Z

Purpose

Implement RayExecutorV2, a new Ray-based distributed executor that uses MessageQueue (shared memory + TCP fallback) for the control plane instead of Ray compiled graphs. It reuses MultiprocExecutor's MQ-based RPC and NCCL data plane while spawning workers as Ray actors into placement group bundles.
Workers on the same node as the driver communicate via shared memory; cross-node workers automatically fall back to ZMQ TCP transport. Bundle assignments are sorted driver-node-first to ensure rank 0 is co-located with the executor.
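
The placement rule above can be illustrated with a minimal sketch. It is not the code from this PR: the `Bundle` record, the node IDs, and both helper functions are hypothetical, and grouping the remaining bundles by node is an assumption made only for this example.

```python
from typing import NamedTuple


class Bundle(NamedTuple):
    """Hypothetical record for one placement-group bundle."""
    bundle_id: int
    node_id: str


def sort_bundles_driver_first(bundles, driver_node_id):
    """Order bundles so that those on the driver's node come first.

    Remaining bundles are grouped by node, then by bundle id (an assumption
    for this sketch), so ranks on the same node stay contiguous.
    """
    return sorted(
        bundles,
        key=lambda b: (b.node_id != driver_node_id, b.node_id, b.bundle_id),
    )


def transport_for(bundle, driver_node_id):
    """Pick the control-plane transport: shared memory locally, TCP otherwise."""
    return "shm" if bundle.node_id == driver_node_id else "tcp"


bundles = [
    Bundle(0, "node-b"),
    Bundle(1, "node-a"),
    Bundle(2, "node-b"),
    Bundle(3, "node-a"),
]
ordered = sort_bundles_driver_first(bundles, driver_node_id="node-a")

# Rank i is assigned to the i-th bundle in the sorted order.
for rank, bundle in enumerate(ordered):
    print(rank, bundle.bundle_id, bundle.node_id, transport_for(bundle, "node-a"))
```

With the driver on `node-a`, rank 0 lands on a `node-a` bundle and uses shared memory, while the `node-b` ranks use TCP.
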
Add VLLM_USE_RAY_V2_EXECUTOR_BACKEND env var feature flag (default off) to opt into the new executor when distributed_executor_backend="ray". Enable async scheduling support for the new backend.
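
As an illustration of how the flag gates the new executor, here is a hedged sketch. The function name `pick_ray_backend`, the returned labels, and the parsing of the variable (`"1"` meaning on) are assumptions for the example; only the variable name, its default-off behavior, and the `distributed_executor_backend="ray"` condition come from the description above.

```python
import os

FLAG = "VLLM_USE_RAY_V2_EXECUTOR_BACKEND"


def pick_ray_backend(distributed_executor_backend, env=None):
    """Return a label for the executor that would be selected.

    The labels "ray_v2" and "ray" are hypothetical; they stand in for the
    new and the existing Ray executors respectively.
    """
    env = os.environ if env is None else env
    if distributed_executor_backend != "ray":
        # The flag only matters when the Ray backend is requested.
        return distributed_executor_backend
    # Default off: an unset variable keeps the existing Ray executor.
    return "ray_v2" if env.get(FLAG, "0") == "1" else "ray"


print(pick_ray_backend("ray", {FLAG: "1"}))
print(pick_ray_backend("ray", {}))
print(pick_ray_backend("mp", {FLAG: "1"}))
```
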

For more details, please refer to RFC: #35848.

EEP support is out-of-scope for this PR and is tracked here: #38164.

Test Plan

Unit tests

pytest tests/distributed/test_ray_v2_executor.py: executor init, TP/PP combos, placement groups, RPC, worker death, shutdown
pytest tests/utils_/test_ray_utils.py: bundle sorting logic
Validate cross-node TCP path for MessageQueue with test_mq_tcp_multinode.py

Integration tests

pytest tests/distributed/test_ray_v2_executor_e2e.py: creates Ray actors that initialize AsyncLLMEngine internally and verifies that they can serve requests
pytest tests/distributed/test_pipeline_parallel.py -k "ray": PP correctness with the new backend
pytest tests/basic_correctness/test_basic_correctness.py -k "ray": basic correctness

Test Result

Benchmark results (Qwen/Qwen3-8B on L4)

Server:

# MP backend
vllm serve Qwen/Qwen3-8B --tensor-parallel-size 4 --distributed-executor-backend mp --port 8000

# Existing Ray backend
VLLM_USE_RAY_V2_EXECUTOR_BACKEND=0 vllm serve Qwen/Qwen3-8B --tensor-parallel-size 4 --distributed-executor-backend ray --port 8000

# Ray V2 backend
VLLM_USE_RAY_V2_EXECUTOR_BACKEND=1 vllm serve Qwen/Qwen3-8B --tensor-parallel-size 4 --distributed-executor-backend ray --port 8000

Client:

vllm bench serve --model Qwen/Qwen3-8B --dataset-name random --input-len 512 --output-len 128 --num-prompts 500 --request-rate 10 --port 8000

TP=4; MP backend (async scheduling is on by default)

============ Serving Benchmark Result ============
Successful requests:                     500       
Failed requests:                         0         
Request rate configured (RPS):           10.00     
Benchmark duration (s):                  53.64     
Total input tokens:                      256000    
Total generated tokens:                  64000     
Request throughput (req/s):              9.32      
Output token throughput (tok/s):         1193.20   
Peak output token throughput (tok/s):    1475.00   
Peak concurrent requests:                82.00     
Total token throughput (tok/s):          5965.99   
---------------Time to First Token----------------
Mean TTFT (ms):                          117.12    
Median TTFT (ms):                        117.26    
P99 TTFT (ms):                           156.28    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          40.95     
Median TPOT (ms):                        41.81     
P99 TPOT (ms):                           46.68     
---------------Inter-token Latency----------------
Mean ITL (ms):                           40.95     
Median ITL (ms):                         40.80     
P99 ITL (ms):                            54.51

TP=4; Ray backend

============ Serving Benchmark Result ============
Successful requests:                     500       
Failed requests:                         0         
Request rate configured (RPS):           10.00     
Benchmark duration (s):                  53.93     
Total input tokens:                      256000    
Total generated tokens:                  64000     
Request throughput (req/s):              9.27      
Output token throughput (tok/s):         1186.80   
Peak output token throughput (tok/s):    1464.00   
Peak concurrent requests:                84.00     
Total token throughput (tok/s):          5934.02   
---------------Time to First Token----------------
Mean TTFT (ms):                          86.00     
Median TTFT (ms):                        86.32     
P99 TTFT (ms):                           120.62    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.88     
Median TPOT (ms):                        47.14     
P99 TPOT (ms):                           51.94     
---------------Inter-token Latency----------------
Mean ITL (ms):                           45.88     
Median ITL (ms):                         47.21     
P99 ITL (ms):                            58.59

TP=4; Ray V2 backend w/ async scheduling

============ Serving Benchmark Result ============
Successful requests:                     500       
Failed requests:                         0         
Request rate configured (RPS):           10.00     
Benchmark duration (s):                  53.67     
Total input tokens:                      256000    
Total generated tokens:                  64000     
Request throughput (req/s):              9.32      
Output token throughput (tok/s):         1192.53   
Peak output token throughput (tok/s):    1442.00   
Peak concurrent requests:                82.00     
Total token throughput (tok/s):          5962.65   
---------------Time to First Token----------------
Mean TTFT (ms):                          119.11    
Median TTFT (ms):                        120.43    
P99 TTFT (ms):                           154.20    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          41.11     
Median TPOT (ms):                        42.06     
P99 TPOT (ms):                           46.64     
---------------Inter-token Latency----------------
Mean ITL (ms):                           41.11     
Median ITL (ms):                         40.82     
P99 ITL (ms):                            54.10
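
To make the three result blocks easier to compare, the following sketch computes relative differences in the mean latencies. The numbers are copied from the output above; the dictionary layout and the `pct_change` helper are illustrative only, and a single run per backend says nothing about run-to-run variance.

```python
# Mean latencies in milliseconds, copied from the three result blocks above.
results = {
    "mp": {"ttft": 117.12, "tpot": 40.95, "itl": 40.95},
    "ray": {"ttft": 86.00, "tpot": 45.88, "itl": 45.88},
    "ray_v2": {"ttft": 119.11, "tpot": 41.11, "itl": 41.11},
}


def pct_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0


for metric in ("ttft", "tpot", "itl"):
    v2 = results["ray_v2"][metric]
    vs_ray = pct_change(v2, results["ray"][metric])
    vs_mp = pct_change(v2, results["mp"][metric])
    print(f"mean {metric.upper()}: {vs_ray:+.1f}% vs Ray, {vs_mp:+.1f}% vs MP")
```

On these numbers, Ray V2's mean TPOT and ITL are about 10% lower than the existing Ray backend and within 0.5% of MP, while its mean TTFT is about 38% higher than the existing Ray backend and within 2% of MP.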

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

mergify · 2026-03-12T20:54:42Z

Hi @jeffreywang-anyscale, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

jeffreywang88 · 2026-03-12T20:55:19Z

@njhill FYI, this PR is not ready for review yet as I'm still iterating on the CI. Will let you know once it's in good shape for review!

mergify · 2026-03-16T20:36:04Z

Hi @jeffreywang-anyscale, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

mergify · 2026-03-17T07:06:39Z

Hi @jeffreywang-anyscale, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

kouroshHakha

ok beautiful. some broad comments after the first pass.

jeffreywang88 added 2 commits March 10, 2026 13:54

Implement RayExecutorV2 & tested on a single-node

3a3a250

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

Enable multinode

df75664

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

mergify Bot added ci/build v1 labels Mar 12, 2026

khluu added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 12, 2026

jeffreywang88 marked this pull request as ready for review March 12, 2026 20:54

jeffreywang88 requested a review from njhill as a code owner March 12, 2026 20:54

jeffreywang88 added 2 commits March 16, 2026 11:46

Fix pre-commit

bbaa21b

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

Fix RayExecutorV2 monitor thread self-join

2541f2d

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

jeffreywang88 force-pushed the ray branch from 44868f7 to 39402d7 March 17, 2026 05:43

Remove unnecessary changes

c3ad8e5

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

jeffreywang88 force-pushed the ray branch from 39402d7 to c3ad8e5 March 17, 2026 05:45

Extract bundle sorting to a utility

300d0ae

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

jeffreywang88 added 2 commits March 17, 2026 07:28

Fix linter

11d32eb

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

Enable async scheduling

5795f1d

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

jeffreywang88 requested review from ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and youkaichao as code owners March 18, 2026 01:46

jeffreywang88 marked this pull request as draft March 18, 2026 02:18

kouroshHakha reviewed Mar 18, 2026

puririshi98 pushed a commit to puririshi98/vllm that referenced this pull request Apr 7, 2026

[Feat][Executor] Introduce RayExecutorV2 (vllm-project#36836)

187ea0a

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Signed-off-by: Rishi Puri <riship@nvidia.com>

mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026

[Feat][Executor] Introduce RayExecutorV2 (vllm-project#36836)

05ba12f

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

aidendle94 pushed a commit to aidendle94/vllm that referenced this pull request Apr 11, 2026

[Feat][Executor] Introduce RayExecutorV2 (vllm-project#36836)

e382d13

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

jeffreywang88 mentioned this pull request Apr 16, 2026

[core][cgraph] Introduce fault-tolerant PushMutableObject ray-project/ray#58866

Open

tomeras91 mentioned this pull request Apr 20, 2026

[Bugfix][Ray] Fix RayExecutorV2 actor name collision with DP > 1 #40398

Merged

4 tasks

jamesbraza mentioned this pull request Apr 29, 2026

vllm==0.20.0 (and thus torch==2.11.0) support NovaSky-AI/SkyRL#1590

Closed

sigridjineth mentioned this pull request Apr 30, 2026

[Bugfix][DeepSeek V4] Enable cross-node TP=16 FP8 serving #41312

Open

4 tasks

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026

[Feat][Executor] Introduce RayExecutorV2 (vllm-project#36836)

54a5314

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

devdev999 mentioned this pull request May 11, 2026

[Async Scheduling] Support async scheduling with ray backend #29012

Open

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[Feat][Executor] Introduce RayExecutorV2 (vllm-project#36836)

3bf45d8

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[Feat][Executor] Introduce RayExecutorV2 (vllm-project#36836)

34ff608

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[Feat][Executor] Introduce RayExecutorV2 (vllm-project#36836)

4811215

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

penfever mentioned this pull request May 22, 2026

[Bug]: RayExecutorV2 multi-node DP hangs on shm_broadcast — cross-node ranks can't share single-host shared memory #43420

Closed

mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026

[Feat][Executor] Introduce RayExecutorV2 (vllm-project#36836)

12eb5ca

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[Feat][Executor] Introduce RayExecutorV2 (vllm-project#36836)

56e40f7

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

njhill merged 31 commits into vllm-project:main from jeffreywang88:ray

5 participants
