[Model] Fix MiniMaxM2ForCausalLM perf regression #45935

Merged: jeejeelee merged 11 commits into main from fix-m2-regression on Jun 21, 2026

@jeejeelee (Member) commented Jun 17, 2026

Purpose

The root cause of the MiniMax-M2.5 perf regression is that torch.compile cannot fuse these torch glue ops. This PR fuses them manually with Triton.
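To illustrate what fusing glue ops means, here is a minimal pure-Python sketch. It is not the Triton kernel from this PR, and the three ops (scale, add bias, clamp at zero) and the function names `unfused` and `fused` are hypothetical, since the description does not list which ops were fused. The unfused version makes three full passes over the data and materializes two intermediates, roughly analogous to three separate kernel launches. The fused version reads and writes each element once.

```python
# Hypothetical glue ops for illustration only; the PR does not name the real ones.

def unfused(x, scale, bias):
    """Three elementwise passes, each producing a full intermediate list."""
    scaled = [v * scale for v in x]        # pass 1 (analogous to one kernel launch)
    shifted = [v + bias for v in scaled]   # pass 2
    return [max(v, 0.0) for v in shifted]  # pass 3


def fused(x, scale, bias):
    """Single pass: every element is read once and written once."""
    return [max(v * scale + bias, 0.0) for v in x]


x = [-2.0, -0.5, 0.0, 1.5]
result_unfused = unfused(x, 2.0, 1.0)
result_fused = fused(x, 2.0, 1.0)
print(result_unfused == result_fused)  # both versions compute the same values
```

On a GPU the gain from fusion typically comes from fewer kernel launches and less memory traffic, not from doing less arithmetic.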

Test Plan

Test Result

  • Perf benchmark (nvidia/MiniMax-M2.5-NVFP4, TP4 on GB200*4)
    [image: benchmark results, not reproduced here]
  • Accuracy evaluation for nvidia/MiniMax-M2.5-NVFP4
| / | GPQA Diamond | AIME 2025 |
| --- | --- | --- |
| Official score | 0.839 | 0.853 |
| This PR | 0.833 | 0.858 |

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
@jeejeelee marked this pull request as ready for review June 18, 2026 13:52
@ZJY0516 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Jun 18, 2026
@jeejeelee merged commit 745bba5 into main Jun 21, 2026
79 checks passed
@jeejeelee deleted the fix-m2-regression branch June 21, 2026 16:28
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Signed-off-by: Qiang Li <qiang.li2@amd.com>