[Bug] Fix deepseek v4 OOM issue by yewentao256 · Pull Request #44914 · vllm-project/vllm

yewentao256 · 2026-06-08T18:42:38Z

Purpose

On an H200 node, running

vllm serve deepseek-ai/DeepSeek-V4-Pro   --trust-remote-code   --kv-cache-dtype fp8   --block-size 256   --enable-expert-parallel   --tensor-parallel-size 8   --max-model-len 800000   --gpu-memory-utilization 0.95   --max-num-seqs 512   --max-num-batched-tokens 512   --no-enable-flashinfer-autotune   --compilation-config '{"mode": 0, "cudagraph_mode": "FULL_DECODE_ONLY"}'

raises an out-of-memory error while the MoE expert weights are being created:

(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]     self.ffn = DeepseekV4MoE(vllm_config, prefix=f"{prefix}.ffn")
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]   File "/home/yewentao256/vllm-source/vllm/models/deepseek_v4/nvidia/model.py", line 569, in __init__
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]     self._init_fused_moe_experts(config, quant_config, prefix)
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]   File "/home/yewentao256/vllm-source/vllm/models/deepseek_v4/nvidia/model.py", line 634, in _init_fused_moe_experts
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]     self.experts = FusedMoE(
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]                    ^^^^^^^^^
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]   File "/home/yewentao256/vllm-source/vllm/model_executor/layers/fused_moe/layer.py", line 336, in FusedMoE
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]     routed_experts = routed_experts_cls(
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]   File "/home/yewentao256/vllm-source/vllm/model_executor/layers/fused_moe/routed_experts.py", line 165, in __init__
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]     self.quant_method.create_weights(layer=self, **moe_quant_params)
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]   File "/home/yewentao256/vllm-source/vllm/model_executor/layers/quantization/fp8.py", line 657, in create_weights
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]     torch.empty(
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]   File "/home/yewentao256/.venv/lib/python3.12/site-packages/torch/utils/_device.py", line 116, in __torch_function__
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=1170423) ERROR 06-08 16:38:17 [multiproc_executor.py:888] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1008.00 MiB. GPU 0 has a total capacity of 139.80 GiB of which 979.00 MiB is free. Including non-PyTorch memory, this process has 138.84 GiB memory in use. Of the allocated memory 136.71 GiB is allocated by PyTorch, and 111.48 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)
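To read the numbers in the final error line, the sizes can be pulled out of the message and compared. The following is a minimal sketch, not part of the PR; the helper name `summarize_oom` and the regular expressions are assumptions written against the message text quoted above.

```python
import re

# Excerpt of the OOM message from the traceback above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 1008.00 MiB. "
    "GPU 0 has a total capacity of 139.80 GiB of which 979.00 MiB is free. "
    "Including non-PyTorch memory, this process has 138.84 GiB memory in use."
)

_UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def _to_mib(value: str, unit: str) -> float:
    """Convert a '<number> <unit>' pair to MiB."""
    return float(value) * _UNIT_TO_MIB[unit]


def summarize_oom(message: str) -> dict[str, float]:
    """Extract requested, total, and free sizes (in MiB) from the message.

    Hypothetical helper: the patterns only target the wording shown above.
    """
    patterns = {
        "requested_mib": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total_mib": r"total capacity of ([\d.]+) (MiB|GiB)",
        "free_mib": r"of which ([\d.]+) (MiB|GiB) is free",
    }
    sizes: dict[str, float] = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {key} in message")
        sizes[key] = _to_mib(*match.groups())
    # How much more memory the failed allocation needed than was free.
    sizes["shortfall_mib"] = sizes["requested_mib"] - sizes["free_mib"]
    return sizes


if __name__ == "__main__":
    for name, value in summarize_oom(OOM_MESSAGE).items():
        print(f"{name}: {value:.2f}")
```

For the message above, the requested 1008 MiB exceeds the 979 MiB that was free by 29 MiB, so the GPU was already almost full when this allocation was attempted.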

PR #41184 introduced the issue: the new class it added is not accounted for in the DeepSeek V4 (DSV4) path.
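The diff itself is not shown here, so the following is only a toy illustration of how a newly introduced class can go "not considered" by a type-based check and silently fall through to a default. Every name in it (`LegacyExperts`, `NewRoutedExperts`, `select_quant_buggy`, `select_quant_fixed`) is hypothetical, and none of this is vLLM's code.

```python
from dataclasses import dataclass


@dataclass
class LegacyExperts:
    """Hypothetical: the expert container that existed before a refactor."""
    model: str


@dataclass
class NewRoutedExperts:
    """Hypothetical: a class introduced by a refactor."""
    model: str


def select_quant_buggy(layer: object) -> str:
    """Only the old class is recognized, so the new one gets the default."""
    if isinstance(layer, LegacyExperts) and layer.model == "dsv4":
        return "mxfp4"
    return "fp8"  # default path


def select_quant_fixed(layer: object) -> str:
    """Both classes are recognized for the model-specific path."""
    if isinstance(layer, (LegacyExperts, NewRoutedExperts)) and layer.model == "dsv4":
        return "mxfp4"
    return "fp8"  # default path


if __name__ == "__main__":
    layer = NewRoutedExperts(model="dsv4")
    print("buggy:", select_quant_buggy(layer))
    print("fixed:", select_quant_fixed(layer))
```

In this sketch the buggy check raises no error of its own; the wrong path is simply taken, and the symptom only surfaces later, which is consistent with a failure that shows up as an allocation error rather than a configuration error.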

This PR fixes the bug. With the fix, the server starts and logs normally:

(APIServer pid=2147715) INFO 06-08 18:33:42 [loggers.py:271] Engine 000: Avg prompt throughput: 6.4 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=2147715) INFO 06-08 18:33:52 [loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%

Signed-off-by: yewentao256 <zhyanwentao@126.com>

sfeng33

LGTM

Upstream vllm-project#44914 carries the runtime fix. Keep a local regression test for the DeepSeek V4 MoE runner refactor path so RoutedExperts continues to use MXFP4 expert quantization.

Signed-off-by: jasl <jasl9187@hotmail.com>
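Why the choice of expert quantization matters for memory can be sketched with simple arithmetic. This is a rough estimate under stated assumptions, not a measurement from this PR: MXFP4 is taken to follow the OCP Microscaling layout (4-bit elements plus one 8-bit shared scale per 32-element block), FP8 is counted as 1 byte per element with its scale tensors ignored, and the function name `expert_weight_bytes` and the parameter count are made up for illustration.

```python
MXFP4_BLOCK_SIZE = 32  # elements sharing one 8-bit scale (assumed MX layout)


def expert_weight_bytes(num_params: int, fmt: str) -> int:
    """Approximate storage for `num_params` weights in the given format.

    Hypothetical helper for illustration only.
    """
    if fmt == "fp8":
        # 1 byte per element; scale tensors are ignored in this estimate.
        return num_params
    if fmt == "mxfp4":
        if num_params % MXFP4_BLOCK_SIZE != 0:
            raise ValueError("num_params must be a multiple of the block size")
        element_bytes = num_params // 2               # 4 bits per element
        scale_bytes = num_params // MXFP4_BLOCK_SIZE  # 1 byte per block
        return element_bytes + scale_bytes
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    n = 32 * 1024  # illustrative parameter count, not taken from the model
    fp8 = expert_weight_bytes(n, "fp8")
    mxfp4 = expert_weight_bytes(n, "mxfp4")
    print(f"fp8:   {fp8} bytes")
    print(f"mxfp4: {mxfp4} bytes ({mxfp4 / fp8:.5f}x of fp8)")
```

Under these assumptions MXFP4 weights take a little over half the space of FP8 weights, so allocating expert weights through an FP8 path when MXFP4 was intended would roughly double their footprint.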

fix deepseek v4 oom issue

bdbd602

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested a review from zyongye as a code owner June 8, 2026 18:42

yewentao256 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 8, 2026

mergify Bot added the deepseek (Related to DeepSeek models) and bug (Something isn't working) labels Jun 8, 2026

Merge branch 'main' into wentao-fix-dsv4-oom

1875839

sfeng33 approved these changes Jun 9, 2026

vllm-project deleted a comment from mergify Bot Jun 9, 2026

yewentao256 enabled auto-merge (squash) June 9, 2026 18:39

Merge branch 'main' into wentao-fix-dsv4-oom

4fb6b54

vllm-bot merged commit d7607ad into main Jun 9, 2026
38 of 41 checks passed

vllm-bot deleted the wentao-fix-dsv4-oom branch June 9, 2026 22:47

khluu added this to the v0.23.0 cherry picks milestone Jun 9, 2026

waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026

[Bug] Fix deepseek v4 OOM issue (vllm-project#44914)

327ada5

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026

[Bug] Fix deepseek v4 OOM issue (vllm-project#44914)

a394cbf

Signed-off-by: yewentao256 <zhyanwentao@126.com>

vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026

[Bug] Fix deepseek v4 OOM issue (vllm-project#44914)

c2f6906

Signed-off-by: yewentao256 <zhyanwentao@126.com>

divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026

[Bug] Fix deepseek v4 OOM issue (vllm-project#44914)

2d21fbd

Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: divineearthly <divineearthly@gmail.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[Bug] Fix deepseek v4 OOM issue (vllm-project#44914)

a7de704

Signed-off-by: yewentao256 <zhyanwentao@126.com>

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[Bug] Fix deepseek v4 OOM issue (vllm-project#44914)

b71e3d4

Signed-off-by: yewentao256 <zhyanwentao@126.com>

ohsono pushed a commit to ohsono/vllm that referenced this pull request Jul 3, 2026

[Bug] Fix deepseek v4 OOM issue (vllm-project#44914)

9001e69

Signed-off-by: yewentao256 <zhyanwentao@126.com>

vllm-bot merged 3 commits into main from wentao-fix-dsv4-oom

4 participants
