
[XPU] add awq format for INCXPULinear #43404

Merged
jikunshang merged 1 commit into vllm-project:main from Liangliang-Ma:mll_fix_1460
Jun 22, 2026

Conversation

@Liangliang-Ma


Convert AWQ-packed weights to the GPTQ layout so that INCXPULinear can load AWQ-format AutoRound models.

cc: @yiliu30 @jikunshang

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for AWQ-packed AutoRound checkpoints within the Intel Neural Compressor (INC) quantization backend. Key changes include the addition of a packing_format parameter to quantization layers and the implementation of a lossless conversion method, _convert_awq_qweight_to_gptq, which transforms AWQ-style nibble ordering into the GPTQ-style layout required by the underlying kernels. The INCXPULinearBase class and its derivatives, INCXPULinearMethod and INCARKLinearMethod, have been updated to handle these different packing formats during weight initialization and processing. I have no feedback to provide as there were no review comments.

@yiliu30

yiliu30 commented Jun 17, 2026


Hi @Liangliang-Ma, thanks for the fix; that makes sense to me!
Please help adapt this change to the new INC flow (#40601).

class INCXPULinearBase(INCLinearScheme):

@mergify

mergify Bot commented Jun 17, 2026


This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Liangliang-Ma.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 17, 2026
Add AWQ-packed checkpoint support to the XPU INC w4a16 path inside
the INC scheme orchestrator introduced in vllm-project#40601:

- INCXPULinearBase reads layer_config.is_awq and creates qweight with
  the AWQ shape [K, N // pack_factor] (packed along output dim) when
  the checkpoint is AWQ-packed, or the GPTQ shape [K // pack_factor, N]
  (packed along input dim) otherwise.
- A lossless _convert_awq_qweight_to_gptq helper reorders the AWQ
  nibble layout ([0, 2, 4, 6, 1, 3, 5, 7]) into sequential order and
  repacks along the input dim, matching the GPTQ layout that the
  oneDNN int4_gemm_w4a16 kernel and the ARK backend already consume.
- Both INCXPULinearMethod.process_weights_after_loading and
  INCARKLinearMethod.process_weights_after_loading invoke the
  converter before the existing NT transpose / ARK weight copy.
- test_auto_round_model[auto_round:auto_awq] is enabled on XPU.

Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
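
The repacking described in the commit message above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code merged in this PR: the helper names `pack_awq`, `unpack_awq`, and `pack_gptq` are hypothetical, weights are modelled as nested lists of unsigned Python ints rather than int32 tensors, and 4-bit weights with a pack factor of 8 are assumed. Only the nibble order `[0, 2, 4, 6, 1, 3, 5, 7]` and the two shapes (`[K, N // pack_factor]` for AWQ, `[K // pack_factor, N]` for GPTQ) come from the commit message.

```python
# Hypothetical pure-Python sketch of AWQ -> GPTQ qweight repacking (4-bit).
AWQ_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]  # nibble slot i holds column AWQ_ORDER[i]
PACK_FACTOR = 8  # eight 4-bit values per 32-bit word
BITS = 4
MASK = 0xF


def pack_awq(w):
    """Pack a K x N matrix of 4-bit ints along N (output dim), AWQ nibble order."""
    k, n = len(w), len(w[0])
    packed = [[0] * (n // PACK_FACTOR) for _ in range(k)]
    for r in range(k):
        for c in range(n // PACK_FACTOR):
            for slot, col in enumerate(AWQ_ORDER):
                packed[r][c] |= (w[r][c * PACK_FACTOR + col] & MASK) << (BITS * slot)
    return packed


def unpack_awq(packed):
    """Undo pack_awq: recover the K x N matrix with columns in sequential order."""
    k, nc = len(packed), len(packed[0])
    w = [[0] * (nc * PACK_FACTOR) for _ in range(k)]
    for r in range(k):
        for c in range(nc):
            for slot, col in enumerate(AWQ_ORDER):
                w[r][c * PACK_FACTOR + col] = (packed[r][c] >> (BITS * slot)) & MASK
    return w


def pack_gptq(w):
    """Pack a K x N matrix of 4-bit ints along K (input dim), sequential order."""
    k, n = len(w), len(w[0])
    packed = [[0] * n for _ in range(k // PACK_FACTOR)]
    for r in range(k // PACK_FACTOR):
        for c in range(n):
            for slot in range(PACK_FACTOR):
                packed[r][c] |= (w[r * PACK_FACTOR + slot][c] & MASK) << (BITS * slot)
    return packed


def convert_awq_qweight_to_gptq(awq_qweight):
    """[K, N // 8] AWQ-packed words -> [K // 8, N] GPTQ-packed words."""
    return pack_gptq(unpack_awq(awq_qweight))
```

Because every 4-bit value is extracted and re-inserted unchanged, the conversion only moves nibbles around, which is why it can be lossless. A tensor-based version would do the same with shifts, masks, and a column permutation instead of Python loops.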
@yiliu30

yiliu30 commented Jun 22, 2026


LGTM! Thanks for the support!

@jikunshang jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 22, 2026
@jikunshang jikunshang merged commit 3da4a1b into vllm-project:main Jun 22, 2026
89 of 90 checks passed
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
qli88 pushed a commit to qli88/vllm that referenced this pull request Jun 26, 2026
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Signed-off-by: Qiang Li <qiang.li2@amd.com>

Labels

intel-gpu Related to Intel GPU ready ONLY add when PR is ready to merge/full CI is needed

3 participants