[XPU] add awq format for INCXPULinear by Liangliang-Ma · Pull Request #43404 · vllm-project/vllm

Liangliang-Ma · 2026-05-22T08:35:57Z

Convert AWQ-packed weights to the GPTQ layout so that INCXPULinear can load AWQ-format AutoRound models.

gemini-code-assist

Code Review

This pull request introduces support for AWQ-packed AutoRound checkpoints within the Intel Neural Compressor (INC) quantization backend. Key changes include the addition of a packing_format parameter to quantization layers and the implementation of a lossless conversion method, _convert_awq_qweight_to_gptq, which transforms AWQ-style nibble ordering into the GPTQ-style layout required by the underlying kernels. The INCXPULinearBase class and its derivatives, INCXPULinearMethod and INCARKLinearMethod, have been updated to handle these different packing formats during weight initialization and processing. I have no feedback to provide as there were no review comments.

yiliu30 · 2026-06-17T14:29:10Z

Hi @Liangliang-Ma, thanks for the fix, and that makes sense to me!
Please help adapt this to the new INC flow (#40601).

vllm/vllm/model_executor/layers/quantization/inc/schemes/inc_wna16_linear.py

Line 185 in 0b131b1

class INCXPULinearBase(INCLinearScheme):

mergify · 2026-06-17T16:13:58Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Liangliang-Ma.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Add AWQ-packed checkpoint support to the XPU INC w4a16 path inside the INC scheme orchestrator introduced in vllm-project#40601:

- INCXPULinearBase reads layer_config.is_awq and creates qweight with the AWQ shape [K, N // pack_factor] (packed along output dim) when the checkpoint is AWQ-packed, or the GPTQ shape [K // pack_factor, N] (packed along input dim) otherwise.
- A lossless _convert_awq_qweight_to_gptq helper reorders the AWQ nibble layout ([0, 2, 4, 6, 1, 3, 5, 7]) into sequential order and repacks along the input dim, matching the GPTQ layout that the oneDNN int4_gemm_w4a16 kernel and the ARK backend already consume.
- Both INCXPULinearMethod.process_weights_after_loading and INCARKLinearMethod.process_weights_after_loading invoke the converter before the existing NT transpose / ARK weight copy.
- test_auto_round_model[auto_round:auto_awq] is enabled on XPU.

Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
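The repacking described in the commit message can be illustrated with a minimal pure-Python sketch. This is not the PR's `_convert_awq_qweight_to_gptq`: every function name below is hypothetical, plain Python integers stand in for int32 tensors, and the sketch assumes that nibble i occupies bits 4i to 4i+3 of each packed word. Only qweight is handled; zero points and scales are out of scope here.

```python
import random

PACK_FACTOR = 8                       # eight 4-bit values per 32-bit word
AWQ_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]  # assumed: nibble i of an AWQ word holds column AWQ_ORDER[i]


def pack_awq(w):
    """Pack a K x N matrix of 4-bit ints into K x (N // 8) words (along the output dim)."""
    return [
        [
            sum(row[c * PACK_FACTOR + src] << (4 * i) for i, src in enumerate(AWQ_ORDER))
            for c in range(len(row) // PACK_FACTOR)
        ]
        for row in w
    ]


def unpack_awq(qweight):
    """Undo pack_awq, restoring the sequential column order."""
    out = []
    for row in qweight:
        vals = [0] * (len(row) * PACK_FACTOR)
        for c, word in enumerate(row):
            for i, dst in enumerate(AWQ_ORDER):
                vals[c * PACK_FACTOR + dst] = (word >> (4 * i)) & 0xF
        out.append(vals)
    return out


def pack_gptq(w):
    """Pack a K x N matrix into (K // 8) x N words (along the input dim, sequential nibbles)."""
    k, n = len(w), len(w[0])
    return [
        [
            sum(w[r * PACK_FACTOR + i][col] << (4 * i) for i in range(PACK_FACTOR))
            for col in range(n)
        ]
        for r in range(k // PACK_FACTOR)
    ]


def convert_awq_qweight_to_gptq(qweight):
    """Hypothetical stand-in for the converter: unpack the AWQ layout, repack as GPTQ."""
    return pack_gptq(unpack_awq(qweight))


# Round-trip check on random 4-bit weights with K = 16 input and N = 24 output features.
random.seed(0)
K, N = 16, 24
weights = [[random.randrange(16) for _ in range(N)] for _ in range(K)]
awq = pack_awq(weights)                  # shape [K, N // 8]
gptq = convert_awq_qweight_to_gptq(awq)  # shape [K // 8, N]
assert gptq == pack_gptq(weights)        # lossless: same result as packing GPTQ directly
```

The conversion is lossless because both layouts store exactly the same 4-bit values; only the position of each nibble and the dimension along which the words are packed differ.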

yiliu30 · 2026-06-22T09:19:14Z

LGTM! Thanks for the support!

Liangliang-Ma requested review from mgoin, pavanimajety, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and zyongye as code owners May 22, 2026 08:35

mergify Bot added the intel-gpu label May 22, 2026

gemini-code-assist Bot reviewed May 22, 2026

mergify Bot added the needs-rebase label Jun 17, 2026

Liangliang-Ma force-pushed the mll_fix_1460 branch from 38cb574 to baa0a3e Compare June 22, 2026 07:42

Liangliang-Ma requested a review from AndreasKaratzas as a code owner June 22, 2026 07:42

mergify Bot removed the needs-rebase label Jun 22, 2026

yiliu30 approved these changes Jun 22, 2026

jikunshang approved these changes Jun 22, 2026

jikunshang added the ready label Jun 22, 2026

jikunshang merged commit 3da4a1b into vllm-project:main Jun 22, 2026
89 of 90 checks passed

yiliu30 mentioned this pull request Jun 24, 2026

[RFC]: Intel Quantization Support Roadmap (H1 2026) #37979 (Open, 1 task)

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[XPU] add awq format for INCXPULinear (vllm-project#43404)

b046a40

Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>

qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026

[XPU] add awq format for INCXPULinear (vllm-project#43404)

66c56ea

Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com> Signed-off-by: Qiang Li <qiang.li2@amd.com>
