[Bugfix] Re-enable FP8 MoE on NVIDIA Thor#46339

Merged
vllm-bot merged 3 commits into
vllm-project:mainfrom
DarkLight1337:fix-fp8-thor
Jun 24, 2026
@DarkLight1337 DarkLight1337 commented Jun 22, 2026

Member

Purpose

Partially revert a change in #45277 which broke Qwen/Qwen3.5-35B-A3B-FP8 inference on NVIDIA Thor (SM101 for CUDA 12 and SM110 for CUDA 13). This parallels how cutlass_3x_gemm_sm100_fp8 is also enabled for this architecture.
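To make the two architecture numbers concrete, here is a minimal sketch of how a numeric SM version interacts with a half-open dispatch range. It assumes the version number is encoded as major * 10 + minor; the helper names `sm_version_num` and `in_half_open_range` are hypothetical and are not vLLM functions.

```cpp
// Hypothetical helper: encode a compute capability as major * 10 + minor
// (assumed encoding, for illustration only).
constexpr int sm_version_num(int major, int minor) {
  return major * 10 + minor;
}

// Hypothetical helper: true when lo <= v < hi.
constexpr bool in_half_open_range(int v, int lo, int hi) {
  return v >= lo && v < hi;
}

// The two numbers the description attributes to Thor.
static_assert(sm_version_num(10, 1) == 101, "Thor as SM101 (CUDA 12)");
static_assert(sm_version_num(11, 0) == 110, "Thor as SM110 (CUDA 13)");

// A range of [100, 110) admits 101 but not 110, so the same device can fall
// inside or outside such a range depending on which number it reports.
static_assert(in_half_open_range(101, 100, 110), "101 is inside [100, 110)");
static_assert(!in_half_open_range(110, 100, 110), "110 is outside [100, 110)");
```

Under these assumptions, any check written against one of the two numbers has to be reviewed for the other as well.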

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@mergify mergify Bot added ci/build nvidia bug Something isn't working labels Jun 22, 2026
@DarkLight1337 DarkLight1337 changed the title from "[Bugfix] Re-enable FP8 on NVIDIA Thor" to "[Bugfix] Re-enable FP8 MoE on NVIDIA Thor" Jun 22, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
    bool per_out_ch) {
  int32_t version_num = get_sm_version_num();
#if defined ENABLE_CUTLASS_MOE_SM100 && ENABLE_CUTLASS_MOE_SM100
  if (version_num >= 100 && version_num < 110) {

Member

I think cutlass_moe_mm_sm100 from grouped_mm_c3x_sm100.cu (on which you changed CUDA targets) is actually guarded here?

@DarkLight1337 DarkLight1337 Jun 22, 2026

Member Author

Yes, but the code from your PR essentially disables the ENABLE_CUTLASS_MOE_SM100 flag for SM110, which in turn makes cutlass_group_gemm_supported resolve to false and prevents the model from running.
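The chain described here (flag off, support check false, model refused) can be modeled with a small sketch. This is only an illustration under stated assumptions: `BuildFlags` and `group_gemm_supported_sketch` are hypothetical names, the flag is modeled as a runtime field rather than a preprocessor macro so both states can be exercised in one program, and the `>= 100` bound is assumed rather than taken from vLLM.

```cpp
#include <cstdint>

// Hypothetical model of the build configuration. In the real build the flag
// is a preprocessor macro; a plain field is used here for illustration.
struct BuildFlags {
  bool enable_cutlass_moe_sm100;  // mirrors ENABLE_CUTLASS_MOE_SM100
};

// Hypothetical stand-in for the support query, not vLLM's actual function.
// Assumption: with the flag off, no device can report support, whatever its
// SM version is.
bool group_gemm_supported_sketch(const BuildFlags& flags,
                                 int32_t version_num) {
  if (!flags.enable_cutlass_moe_sm100) {
    return false;
  }
  return version_num >= 100;  // assumed lower bound, for illustration only
}
```

For example, `group_gemm_supported_sketch(BuildFlags{false}, 110)` is false while the same call with `BuildFlags{true}` is true, which is the difference between the broken and the fixed build in this model.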

Member

Oh, I see... Sorry for the breakage. Can we add something like ENABLE_CUTLASS_MOE_SM110 to make its support clearer?

Member Author

Unless you decide to create a new set of kernels for SM110 as well, I think breaking the 1-to-1 mapping between files and kernels would introduce some confusion. But adding a new set of kernels would also introduce a bunch of duplicate code. So I prefer to just keep it as-is.

Member Author

Another way would be to simply rename all relevant flags/kernels to use sm100_to_110 instead of sm100.

Member

Yeah I think something like sm100_or_110 makes more sense to me.

@DarkLight1337

Member Author

Is it ok to merge this first so we can cherry-pick this into v0.24?

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 23, 2026
@Harry-Chen

Member

> Is it ok to merge this first so we can cherry-pick this into v0.24?

Sure, I think so.

@DarkLight1337 DarkLight1337 added this to the v0.24.0 cherrypick milestone Jun 24, 2026
@github-project-automation github-project-automation Bot moved this to Ready in NVIDIA Jun 24, 2026
@vllm-bot vllm-bot merged commit 24d5186 into vllm-project:main Jun 24, 2026
210 of 214 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Done in NVIDIA Jun 24, 2026
@DarkLight1337 DarkLight1337 deleted the fix-fp8-thor branch June 24, 2026 14:36
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
khluu pushed a commit that referenced this pull request Jun 25, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
(cherry picked from commit 24d5186)
qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Qiang Li <qiang.li2@amd.com>
wincent8 pushed a commit to wincent8/vllm that referenced this pull request Jun 29, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@Harry-Chen

Member

@DarkLight1337 Were you testing this PR on CUDA 12.9?

@DarkLight1337

Member Author

No, I was on CUDA 13
