[Bugfix] Restore is_sym guard for zp in GPTQ/CT MoE to fix symmetric quant regression by yuwenzho · Pull Request #45656 · vllm-project/vllm

yuwenzho · 2026-06-15T06:54:38Z

Purpose

Fix a regression introduced by #43409 (comment) that broke symmetric GPTQ MoE quantized models (e.g. AutoRound) on NVIDIA GPUs.

#43409 removed the `if not self.quant_config.is_sym else None` guard in `AutoGPTQMoEMethod.get_fused_moe_quant_config` so that the CPU backend could pass synthesized zero points for symmetric models. As a side effect, symmetric quantized models on GPU (AutoRound and standard GPTQ with `is_sym=True`) began passing meaningless `qzeros` tensors to the Marlin kernel, which produced incorrect results.
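To see why an unneeded zero-point tensor corrupts results, consider a simplified dequantization model. This is only an illustrative sketch, not the Marlin kernel: it assumes 4-bit symmetric weights are stored as unsigned codes with a fixed, implicit zero point of 8, and the `dequantize` helper and sample values are hypothetical.

```python
# Illustrative only: a toy dequantizer, not the Marlin kernel.
# Assumption: symmetric 4-bit weights are stored as unsigned codes 0..15
# with a fixed, implicit zero point at the midpoint (8).

def dequantize(codes, scale, zero_point):
    """Map integer codes to floats: w = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]

IMPLICIT_ZP = 8  # midpoint of the 4-bit range (assumed)
codes = [0, 8, 15]
scale = 0.5

# Correct symmetric path: only the implicit midpoint is applied.
expected = dequantize(codes, scale, IMPLICIT_ZP)

# Faulty path: a placeholder zero-point value (here 0) is applied instead.
corrupted = dequantize(codes, scale, 0)

print(expected)   # [-4.0, 0.0, 3.5]
print(corrupted)  # [0.0, 4.0, 7.5]
```

Every weight is shifted by the same wrong offset, which is consistent with the incorrect outputs reported for symmetric models.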

The same pattern existed in `CompressedTensorsWNA16MarlinMoEMethod.process_weights_after_loading`, where the `if not self.symmetric:` guard was absent, so zero points (zp) could be registered incorrectly for symmetric models on non-CPU backends.

Fix

`auto_gptq.py` and `compressed_tensors_moe_wna16_marlin.py`: Restore the `is_sym` guard with an explicit CPU exception, so that zero points are passed only for asymmetric quantization or when running on CPU.
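The restored guard can be sketched as follows. This is a minimal illustration of the decision logic described in this PR, not the code from the diff: `select_zero_points` and its `is_cpu` parameter are hypothetical names, and plain lists stand in for tensors.

```python
from typing import Optional, Sequence


def select_zero_points(
    qzeros: Optional[Sequence[int]],
    is_sym: bool,
    is_cpu: bool,
) -> Optional[Sequence[int]]:
    """Decide whether zero points should be handed to the MoE kernel.

    Hypothetical helper: symmetric models skip zero points, except on
    CPU, where synthesized zero points are still expected.
    """
    if is_sym and not is_cpu:
        # Symmetric quantization on a non-CPU backend: pass nothing.
        return None
    # Asymmetric quantization, or the explicit CPU exception.
    return qzeros


placeholder = [8, 8, 8]  # stand-in for a qzeros tensor

print(select_zero_points(placeholder, is_sym=True, is_cpu=False))   # None
print(select_zero_points(placeholder, is_sym=True, is_cpu=True))    # [8, 8, 8]
print(select_zero_points(placeholder, is_sym=False, is_cpu=False))  # [8, 8, 8]
```

Keeping the CPU case as an explicit exception preserves the behaviour added in #43409 while restoring the earlier behaviour for symmetric models on GPU.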

Test Plan

CPU tests:

GPTQ W4A16 INT4 MoE: Qwen/Qwen3-30B-A3B-GPTQ-Int4 passed
Compressed-tensors W4A16 INT4 MoE: RedHatAI/Qwen3-30B-A3B-quantized.w4a16 passed

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

…quant regression (vllm-project#45656) Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Signed-off-by: divineearthly <divineearthly@gmail.com>

fix zp bug

5b7eb2a

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

yuwenzho requested review from mgoin, pavanimajety, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and zyongye as code owners June 15, 2026 06:54

mergify Bot added the bug Something isn't working label Jun 15, 2026

bigPYJ1151 added the verified Run pre-commit for new contributors without triggering other tests label Jun 15, 2026

yuwenzho mentioned this pull request Jun 15, 2026

[CPU] Support CPU W4A16 INT4 MoE #43409

Merged

4 tasks

bigPYJ1151 approved these changes Jun 18, 2026

bigPYJ1151 enabled auto-merge (squash) June 18, 2026 01:50

github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 18, 2026

Merge branch 'main' into zyw/fix_zp

f7f219c

bigPYJ1151 merged commit 058cc0a into vllm-project:main Jun 18, 2026
87 checks passed

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Jun 21, 2026

[Bugfix] Restore is_sym guard for zp in GPTQ/CT MoE to fix symmetric …

f63efe6

…quant regression (vllm-project#45656) Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[Bugfix] Restore is_sym guard for zp in GPTQ/CT MoE to fix symmetric …

c4f0de7

…quant regression (vllm-project#45656) Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[Bugfix] Restore is_sym guard for zp in GPTQ/CT MoE to fix symmetric …

692e826

…quant regression (vllm-project#45656) Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

Coisinixixi mentioned this pull request Jul 2, 2026

sync(VLLM-QUANT): cherry-pick initial quantization bugfixes vLLM-HUST/vllm-hust#87

Draft

bigPYJ1151 merged 2 commits into vllm-project:main from yuwenzho:zyw/fix_zp
