[Perf] fuse qk rmsnorm rope gate for qwen3.5 by ZJY0516 · Pull Request #44176 · vllm-project/vllm

ZJY0516 · 2026-06-01T08:11:40Z

Purpose

Combine the split, QK-RMSNorm, partial RoPE, and gate copy into a single kernel launch.
Ref: lightseekorg/tokenspeed#228
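To make the fused stages concrete, here is an unfused pure-Python reference for one head of one token. This is a sketch under stated assumptions, not the PR's Triton kernel. It assumes the query projection yields a per-head row laid out as [q | gate], a plain weight-scaled RMSNorm, and NeoX-style ("rotate half") RoPE applied only to the first `rotary_dim` channels. All function names are hypothetical, and the real model may use a different layout or norm-weight convention.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    # Root-mean-square normalization over the head dimension, then per-channel scale.
    ms = sum(v * v for v in x) / len(x)
    inv = 1.0 / math.sqrt(ms + eps)
    return [v * inv * w for v, w in zip(x, weight)]


def partial_rope_neox(x, pos, rotary_dim, base=10000.0):
    # Rotate only the first `rotary_dim` channels, pairing channel i with
    # channel i + rotary_dim // 2 (NeoX style); remaining channels pass through.
    half = rotary_dim // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out


def unfused_reference(q_gate, k, q_weight, k_weight, pos, rotary_dim, eps=1e-6):
    head_dim = len(k)
    # 1. Split the projected row into query and gate halves (assumed layout).
    q, gate = q_gate[:head_dim], q_gate[head_dim:]
    # 2. Per-head RMSNorm on q and k.
    q = rms_norm(q, q_weight, eps)
    k = rms_norm(k, k_weight, eps)
    # 3. Partial RoPE on the first rotary_dim channels of q and k.
    q = partial_rope_neox(q, pos, rotary_dim)
    k = partial_rope_neox(k, pos, rotary_dim)
    # 4. Copy the gate into its own output buffer.
    return q, k, list(gate)
```

Run naively, each numbered step is a separate pass over the activations; the PR folds all four into one launch.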

Test Plan

vllm bench throughput \
 --model Qwen/Qwen3.5-27B \
 --dataset-name random --language-model-only

Test Result

main

Throughput: 14.35 requests/s, 16535.51 total tokens/s, 1837.28 output tokens/s

PR

Throughput: 14.65 requests/s, 16877.84 total tokens/s, 1875.32 output tokens/s
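The relative gains implied by the two runs can be computed directly from the numbers above. This is plain arithmetic on a single run per branch; run-to-run variance is not reported, so treat the result as indicative.

```python
# (main, PR) throughput pairs copied from the results above.
THROUGHPUT = {
    "requests/s": (14.35, 14.65),
    "total tokens/s": (16535.51, 16877.84),
    "output tokens/s": (1837.28, 1875.32),
}


def relative_gain(before, after):
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100.0


for metric, (main, pr) in THROUGHPUT.items():
    print(f"{metric}: {relative_gain(main, pr):+.2f}%")
```

This prints about +2.09% for requests/s and +2.07% for both token rates, i.e. roughly a 2% end-to-end improvement.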

Accuracy

main

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.6702	±	0.0129
		strict-match	5	exact_match	↑	0.6710	±	0.0129

PR

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.6619	±	0.013
		strict-match	5	exact_match	↑	0.6603	±	0.013
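One rough way to judge whether the gsm8k drop matters is to divide each difference by the combined standard error. This sketch treats the two runs as independent, which is an approximation: both runs score the same questions, so a paired test on per-sample results would be tighter. A small z-score does not prove the absence of a regression.

```python
import math

# gsm8k exact_match (value, stderr) copied from the tables above.
RESULTS = {
    "flexible-extract": {"main": (0.6702, 0.0129), "pr": (0.6619, 0.013)},
    "strict-match": {"main": (0.6710, 0.0129), "pr": (0.6603, 0.013)},
}


def z_score(main, pr):
    # Difference of means over the combined standard error,
    # assuming independent runs (an approximation).
    (m, se_m), (p, se_p) = main, pr
    return (m - p) / math.sqrt(se_m ** 2 + se_p ** 2)


for name, r in RESULTS.items():
    delta = r["main"][0] - r["pr"][0]
    print(f"{name}: delta={delta:+.4f}, z={z_score(r['main'], r['pr']):.2f}")
```

Both z-scores come out around 0.45 and 0.58, well under one combined standard error.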

GPQA

main: 0.8485

PR: 0.8485


Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 410c9f80da


dllehr-amd · 2026-06-01T17:19:23Z

This looks like one of the first PRs to tackle the new RFC for putting fusions directly in model files. Given the size of the kernel and how much it adds to the overall LoC in qwen3_next.py, would it be worth having a fusion or kernel file that can hold these instances?

I'm looking at this from a ROCm/Aiter perspective, where we have "aiter_ops" to track a lot of these kernels, and keep the model file focused more on the overall flow.

What are people's thoughts here @SageMoore @robertgshaw2-redhat @WoosukKwon ?

vadiklyutiy

LGTM
But only one thing: I'd propose moving the Triton kernel and its Python interface to another file.

ZJY0516 · 2026-06-06T16:52:51Z

LGTM But only one thing. I'd propose to move triton kernel and its python interface to another file

ok, I'll refactor it. cc @dllehr-amd


mergify · 2026-06-09T06:47:58Z

Hi @ZJY0516, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.


ZJY0516 added 3 commits June 1, 2026 07:14

update

a493b26

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

update

4649d31

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

update

410c9f8

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

ZJY0516 requested review from AndreasKaratzas, WoosukKwon, mgoin, sighingnow, tlrmchlsmth, vadiklyutiy, yewentao256 and zyongye as code owners June 1, 2026 08:11

mergify Bot added the qwen label ("Related to Qwen models") on Jun 1, 2026

chatgpt-codex-connector Bot reviewed Jun 1, 2026


Comment thread vllm/model_executor/models/qwen3_next.py

AndreasKaratzas reviewed Jun 1, 2026


Comment thread tests/kernels/test_fused_qk_norm_rope_gate.py Outdated

ZJY0516 added 4 commits June 1, 2026 08:24

update

5538ef9

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

update

4f9b46f

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

update

ff5662e

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

add kernel in CI

db6835e

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

ZJY0516 requested review from Harry-Chen and khluu as code owners June 1, 2026 09:36

mergify Bot added the ci/build label Jun 1, 2026

vadiklyutiy approved these changes Jun 6, 2026


ZJY0516 added 2 commits June 9, 2026 06:25

Merge branch 'main' into fused_qk_rmsnorm_rope_gate

4ae5951

update

bd67e2f

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

ZJY0516 added the ready label ("ONLY add when PR is ready to merge/full CI is needed") on Jun 9, 2026

ZJY0516 merged commit 7a89b72 into vllm-project:main Jun 9, 2026
96 of 98 checks passed

waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026

[Perf] fuse qk rmsnorm rope gate for qwen3.5 (vllm-project#44176)

a95c070

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026

[Perf] fuse qk rmsnorm rope gate for qwen3.5 (vllm-project#44176)

16f5386

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026

[Perf] fuse qk rmsnorm rope gate for qwen3.5 (vllm-project#44176)

04a1013

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026

[Perf] fuse qk rmsnorm rope gate for qwen3.5 (vllm-project#44176)

1a88abd

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: divineearthly <divineearthly@gmail.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[Perf] fuse qk rmsnorm rope gate for qwen3.5 (vllm-project#44176)

2de3538

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[Perf] fuse qk rmsnorm rope gate for qwen3.5 (vllm-project#44176)

0b2a303

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

ohsono pushed a commit to ohsono/vllm that referenced this pull request Jul 3, 2026

[Perf] fuse qk rmsnorm rope gate for qwen3.5 (vllm-project#44176)

53e09de

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

ZJY0516 merged 9 commits into vllm-project:main from ZJY0516:fused_qk_rmsnorm_rope_gate
