[Bugfix] Re-enable FP8 MoE on NVIDIA Thor by DarkLight1337 · Pull Request #46339 · vllm-project/vllm

DarkLight1337 · 2026-06-22T07:07:31Z

Purpose

Partially revert a change in #45277, which broke Qwen/Qwen3.5-35B-A3B-FP8 inference on NVIDIA Thor (SM101 under CUDA 12, SM110 under CUDA 13). This parallels how cutlass_3x_gemm_sm100_fp8 is also enabled for this architecture.

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Harry-Chen · 2026-06-22T07:32:23Z

                    bool per_out_ch) {
  int32_t version_num = get_sm_version_num();
#if defined ENABLE_CUTLASS_MOE_SM100 && ENABLE_CUTLASS_MOE_SM100
  if (version_num >= 100 && version_num < 110) {


I think cutlass_moe_mm_sm100 from grouped_mm_c3x_sm100.cu (on which you changed CUDA targets) is actually guarded here?

DarkLight1337 · Jun 22, 2026

Yes, but the code from your PR essentially disables the ENABLE_CUTLASS_MOE_SM100 flag for SM110, which in turn makes cutlass_group_gemm_supported resolve to false and prevents the model from running.
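The effect described in this comment, where a build-time flag that is absent for one architecture makes a runtime support check return false, can be sketched in isolation. This is a minimal illustration and not vLLM's code: `DEMO_ENABLE_MOE`, `demo_supported_flag_on`, and `demo_supported_flag_off` are hypothetical names, and the `version_num >= 100` bound is an assumption chosen for the demo. Only the `#if defined X && X` idiom is taken from the snippet under review.

```cpp
#include <cassert>
#include <cstdint>

// Build A: the hypothetical flag is set, as if the build system had enabled
// the kernels for the target architecture.
#define DEMO_ENABLE_MOE 1
inline bool demo_supported_flag_on(int32_t version_num) {
#if defined DEMO_ENABLE_MOE && DEMO_ENABLE_MOE
  // Kernels are compiled in, so the answer depends on the device version.
  return version_num >= 100;  // assumed bound, for illustration only
#else
  (void)version_num;
  return false;
#endif
}
#undef DEMO_ENABLE_MOE

// Build B: the flag is not defined, as if the build system had dropped the
// kernels for the target architecture. The body is identical, but the
// preprocessor now keeps only the `return false` branch.
inline bool demo_supported_flag_off(int32_t version_num) {
#if defined DEMO_ENABLE_MOE && DEMO_ENABLE_MOE
  return version_num >= 100;
#else
  (void)version_num;
  return false;
#endif
}
```

With the flag set, a device reporting version 110 passes the check; with the flag absent, the same device is rejected before any kernel is considered, which matches the failure mode described above.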

Harry-Chen · Jun 22, 2026

Oh, I see... Sorry for the breakage. Can we add something like ENABLE_CUTLASS_MOE_SM110 to make its support clearer?

DarkLight1337 · Jun 22, 2026

Unless you decide to create a new set of kernels for SM110 as well, I think breaking the 1-to-1 mapping between files and kernels would introduce some confusion. But adding a new set of kernels would also introduce a bunch of duplicate code. So I prefer to just keep it as-is.

DarkLight1337 · Jun 22, 2026

Another way would be to simply rename all relevant flags/kernels to use sm100_to_110 instead of sm100.

Harry-Chen · Jun 24, 2026

Yeah, I think something like sm100_or_110 makes more sense to me.
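The renaming idea from this thread can be sketched as a single flag whose name states that one kernel set serves both architectures. Everything here is hypothetical: `DEMO_ENABLE_MOE_SM100_OR_110` and `demo_moe_kernels_available` are illustrative names, and the version bounds are assumptions chosen so that 100, 101, and 110 are accepted. None of it is taken from the vLLM source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag: a single name covering the SM100 family and SM110, so
// the file-to-kernel mapping stays 1-to-1 and no kernels are duplicated.
#define DEMO_ENABLE_MOE_SM100_OR_110 1

inline bool demo_moe_kernels_available(int32_t version_num) {
#if defined DEMO_ENABLE_MOE_SM100_OR_110 && DEMO_ENABLE_MOE_SM100_OR_110
  // Assumed ranges, for illustration only.
  const bool is_sm100_family = version_num >= 100 && version_num < 110;
  const bool is_sm110 = version_num == 110;
  return is_sm100_family || is_sm110;
#else
  (void)version_num;
  return false;
#endif
}
```

The trade-off is the one raised above: a separate SM110 flag would need either its own kernels or a flag that no longer matches a file, while a combined name keeps one flag per kernel set at the cost of a longer identifier.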

DarkLight1337 · 2026-06-23T10:09:13Z

Is it ok to merge this first so we can cherry-pick this into v0.24?

Harry-Chen · 2026-06-24T01:06:58Z

> Is it ok to merge this first so we can cherry-pick this into v0.24?

Sure, I think so.

Harry-Chen · 2026-06-30T08:21:04Z

@DarkLight1337 Were you testing this PR on CUDA 12.9?

DarkLight1337 · 2026-06-30T15:08:38Z

No, I was on CUDA 13.

[Bugfix] Re-enable FP8 on NVIDIA Thor

7fa23c4

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

DarkLight1337 requested review from Isotr0py, mgoin and ywang96 June 22, 2026 07:07

DarkLight1337 requested review from Harry-Chen, LucasWilkinson and tlrmchlsmth as code owners June 22, 2026 07:07

mergify Bot added the ci/build, nvidia, and bug labels Jun 22, 2026

github-project-automation Bot added this to NVIDIA Jun 22, 2026

DarkLight1337 changed the title from "[Bugfix] Re-enable FP8 on NVIDIA Thor" to "[Bugfix] Re-enable FP8 MoE on NVIDIA Thor" Jun 22, 2026

Unify

b67b57a

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Harry-Chen reviewed Jun 22, 2026

Merge branch 'main' into fix-fp8-thor

c542f9f

DarkLight1337 added the ready label Jun 23, 2026

Harry-Chen mentioned this pull request Jun 24, 2026

Fix unresolved cutlass_moe_mm_sm100 symbol in stable libtorch #46436

Closed

4 tasks

DarkLight1337 added this to the v0.24.0 cherrypick milestone Jun 24, 2026

Isotr0py approved these changes Jun 24, 2026

github-project-automation Bot moved this to Ready in NVIDIA Jun 24, 2026

vllm-bot merged commit 24d5186 into vllm-project:main Jun 24, 2026
210 of 214 checks passed

github-project-automation Bot moved this from Ready to Done in NVIDIA Jun 24, 2026

DarkLight1337 deleted the fix-fp8-thor branch June 24, 2026 14:36

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[Bugfix] Re-enable FP8 MoE on NVIDIA Thor (vllm-project#46339)

83161f4

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

khluu pushed a commit that referenced this pull request Jun 25, 2026

[Bugfix] Re-enable FP8 MoE on NVIDIA Thor (#46339)

6829a6d

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> (cherry picked from commit 24d5186)

qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026

[Bugfix] Re-enable FP8 MoE on NVIDIA Thor (vllm-project#46339)

bc6afad

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Qiang Li <qiang.li2@amd.com>

wincent8 pushed a commit to wincent8/vllm that referenced this pull request Jun 29, 2026

[Bugfix] Re-enable FP8 MoE on NVIDIA Thor (vllm-project#46339)

70f64f3

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Harry-Chen mentioned this pull request Jun 30, 2026

[Build] Fix CUDA arch coverage checks and scoped kernel feature flags #47149

Open

4 tasks

vllm-bot merged 3 commits into vllm-project:main from DarkLight1337:fix-fp8-thor
