
[Bugfix] Restore is_sym guard for zp in GPTQ/CT MoE to fix symmetric quant regression#45656

Merged
bigPYJ1151 merged 2 commits into
vllm-project:mainfrom
yuwenzho:zyw/fix_zp
Jun 18, 2026

yuwenzho commented Jun 15, 2026

Purpose

Fix a regression introduced by #43409 that broke symmetric GPTQ MoE quantizations (e.g. AutoRound) on NVIDIA GPUs.

#43409 removed the if not self.quant_config.is_sym else None guard in AutoGPTQMoEMethod.get_fused_moe_quant_config so that the CPU backend could pass synthesized zero points for symmetric models. However, this also made symmetric models on GPU (AutoRound and standard GPTQ with is_sym=True) pass meaningless qzeros tensors to the Marlin kernel, producing incorrect outputs.
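To make the failure mode concrete, here is a toy sketch of 4-bit group dequantization. It assumes the common convention w = (q - z) * s, where a symmetric path uses an implicit midpoint zero point of 2**(bits - 1). The function names and the placeholder zero point below are hypothetical and are not taken from vLLM or the Marlin kernel.

```python
# Toy illustration (not vLLM/Marlin code) of why zero points matter for
# 4-bit symmetric quantization. Assumption: w = (q - z) * s, and the
# symmetric path uses the implicit midpoint z = 2**(bits - 1).

BITS = 4
MIDPOINT = 2 ** (BITS - 1)  # 8 for 4-bit


def dequant_symmetric(q, scale):
    """Symmetric path: the zero point is implied; no zp tensor is read."""
    return [(v - MIDPOINT) * scale for v in q]


def dequant_asymmetric(q, zp, scale):
    """Asymmetric path: an explicit zero point is subtracted."""
    return [(v - zp) * scale for v in q]


q = [0, 5, 8, 15]  # 4-bit codes for one quantization group
scale = 0.5

reference = dequant_symmetric(q, scale)

# CPU-style approach: synthesize zp equal to the midpoint, so the
# asymmetric formula reproduces the symmetric result exactly.
synthesized = dequant_asymmetric(q, MIDPOINT, scale)

# Regression-style case: a placeholder zp (hypothetical value 0) shifts
# every weight in the group.
placeholder_zp = 0
wrong = dequant_asymmetric(q, placeholder_zp, scale)

print(reference)    # [-4.0, -1.5, 0.0, 3.5]
print(synthesized)  # [-4.0, -1.5, 0.0, 3.5]
print(wrong)        # [0.0, 2.5, 4.0, 7.5]
```

A synthesized midpoint zero point is harmless, which is why the CPU path can rely on it, while any other value silently offsets every dequantized weight in the group.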

The same pattern existed in CompressedTensorsWNA16MarlinMoEMethod.process_weights_after_loading: the if not self.symmetric: guard was missing, so zero points could be registered incorrectly for symmetric models on non-CPU backends.

Fix

auto_gptq.py and compressed_tensors_moe_wna16_marlin.py: restore the symmetric-quantization guard (is_sym / self.symmetric), with an explicit exception for CPU.
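The restored guard can be sketched as a single predicate shared by both call sites. This is a hypothetical helper, not the actual code in auto_gptq.py or compressed_tensors_moe_wna16_marlin.py; is_cpu stands in for whatever platform check the real code performs.

```python
# Hypothetical sketch of the restored guard; names are illustrative only.
from typing import Optional, Sequence


def select_zero_points(
    qzeros: Sequence[int],
    is_sym: bool,
    is_cpu: bool,
) -> Optional[Sequence[int]]:
    """Return the zero points to hand to the MoE kernel, or None.

    - Asymmetric checkpoints always need their zero points.
    - Symmetric checkpoints on CPU keep (synthesized) zero points.
    - Symmetric checkpoints elsewhere (e.g. Marlin on NVIDIA GPUs) get None,
      so the kernel takes its symmetric path.
    """
    if not is_sym or is_cpu:
        return qzeros
    return None


zp = [8, 8]
print(select_zero_points(zp, is_sym=True, is_cpu=False))   # None
print(select_zero_points(zp, is_sym=True, is_cpu=True))    # [8, 8]
print(select_zero_points(zp, is_sym=False, is_cpu=False))  # [8, 8]
```

Keeping the CPU exception explicit in the condition makes it clear that only the CPU path relies on synthesized zero points for symmetric models.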

Test Plan

CPU tests:

  • GPTQ W4A16 INT4 MoE: Qwen/Qwen3-30B-A3B-GPTQ-Int4 passed
  • Compressed-tensor W4A16 INT4 MoE: RedHatAI/Qwen3-30B-A3B-quantized.w4a16 passed

Test Result


Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
mergify bot added the bug label Jun 15, 2026
bigPYJ1151 added the verified label Jun 15, 2026
yuwenzho mentioned this pull request Jun 15, 2026
bigPYJ1151 enabled auto-merge (squash) June 18, 2026 01:50
github-actions bot added the ready label Jun 18, 2026
bigPYJ1151 merged commit 058cc0a into vllm-project:main Jun 18, 2026
87 checks passed
divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026
…quant regression (vllm-project#45656)

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: divineearthly <divineearthly@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Jun 21, 2026
…quant regression (vllm-project#45656)

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
…quant regression (vllm-project#45656)

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
…quant regression (vllm-project#45656)

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>