
[Kernel] GLM5 Router GEMM#46385

Merged
simon-mo merged 5 commits into main from glm5-router-gemm on Jun 24, 2026


@jeejeelee jeejeelee commented Jun 22, 2026

Member

Purpose

Borrows the idea from NVIDIA/TensorRT-LLM#13740.
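For context, the router (gate) GEMM in an MoE layer projects each token's hidden state onto one logit per expert. What follows is a minimal pure-Python sketch of those semantics that could serve as a correctness reference. It assumes a bias-free router weight of shape [num_experts, hidden_size]. The function name and toy shapes are hypothetical, and this is not the kernel added in this PR or in NVIDIA/TensorRT-LLM#13740.

```python
from typing import List

Matrix = List[List[float]]


def router_gemm_reference(hidden_states: Matrix, router_weight: Matrix) -> Matrix:
    """Hypothetical reference for an MoE router GEMM (not this PR's kernel).

    hidden_states: [num_tokens, hidden_size]
    router_weight: [num_experts, hidden_size]  (assumed bias-free)
    returns logits: [num_tokens, num_experts], where
        logits[t][e] = sum_k hidden_states[t][k] * router_weight[e][k]
    """
    if not router_weight:
        raise ValueError("router_weight must have at least one expert row")
    hidden_size = len(router_weight[0])
    # Every token and every expert row must share the same hidden size.
    if any(len(row) != hidden_size for row in hidden_states + router_weight):
        raise ValueError("hidden size mismatch")
    # One dot product per (token, expert) pair: x @ W^T.
    return [
        [sum(x * w for x, w in zip(token, expert)) for expert in router_weight]
        for token in hidden_states
    ]


# Usage: 2 tokens, hidden_size 3, 2 experts (toy sizes, not GLM-5's).
x = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
w = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(router_gemm_reference(x, w))  # [[1.0, 3.0], [0.0, 0.0]]
```

An optimized kernel would compute the same logits. The reference only fixes what "correct" means for comparison.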

Test Plan

Test Result

GSM8K

  • main branch
local-completions ({'model': 'zai-org/GLM-5.2-FP8', 'base_url': 'http://0.0.0.0:8000/v1/completions', 'tokenized_requests': False, 'tokenizer_backend': None, 'num_concurrent': 256}), gen_kwargs: ({}), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9454|±  |0.0063|
|     |       |strict-match    |     5|exact_match|↑  |0.9454|±  |0.0063|

  • this PR
local-completions ({'model': 'zai-org/GLM-5.2-FP8', 'base_url': 'http://0.0.0.0:8000/v1/completions', 'tokenized_requests': False, 'tokenizer_backend': None, 'num_concurrent': 256}), gen_kwargs: ({}), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9424|±  |0.0064|
|     |       |strict-match    |     5|exact_match|↑  |0.9431|±  |0.0064|
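As a quick check on whether the GSM8K drop is meaningful, the sketch below divides each accuracy difference by the combined standard error from the two tables. This z-score treats the two runs as independent. That is only an approximation, because both runs score the same GSM8K test questions. The helper name is illustrative.

```python
import math

# GSM8K exact_match (value, stderr) pairs copied from the tables above.
results = {
    "flexible-extract": {"main": (0.9454, 0.0063), "pr": (0.9424, 0.0064)},
    "strict-match": {"main": (0.9454, 0.0063), "pr": (0.9431, 0.0064)},
}


def z_score(base, new):
    """(new - base) divided by the combined stderr, assuming independent runs."""
    (mean_base, se_base), (mean_new, se_new) = base, new
    return (mean_new - mean_base) / math.sqrt(se_base**2 + se_new**2)


for name, r in results.items():
    diff = r["pr"][0] - r["main"][0]
    print(f"{name}: diff={diff:+.4f}, z={z_score(r['main'], r['pr']):+.2f}")
```

Both z-scores come out at roughly -0.33 and -0.26. Under this approximation, they sit well inside run-to-run noise.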

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
@jeejeelee jeejeelee marked this pull request as draft June 22, 2026 16:27
@jeejeelee jeejeelee marked this pull request as ready for review June 23, 2026 02:49
@zyongye zyongye added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 23, 2026
@simon-mo simon-mo merged commit 9d6fdc2 into main Jun 24, 2026
218 of 222 checks passed
@simon-mo simon-mo deleted the glm5-router-gemm branch June 24, 2026 05:54
@noooop noooop added this to the v0.24.0 cherrypick milestone Jun 24, 2026
khluu pushed a commit that referenced this pull request Jun 24, 2026
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
(cherry picked from commit 9d6fdc2)
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Signed-off-by: Qiang Li <qiang.li2@amd.com>
wincent8 pushed a commit to wincent8/vllm that referenced this pull request Jun 29, 2026
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

Labels

ready ONLY add when PR is ready to merge/full CI is needed

4 participants