[XPU] add awq format for INCXPULinear#43404
Code Review
This pull request introduces support for AWQ-packed AutoRound checkpoints within the Intel Neural Compressor (INC) quantization backend. Key changes include the addition of a packing_format parameter to quantization layers and the implementation of a lossless conversion method, _convert_awq_qweight_to_gptq, which transforms AWQ-style nibble ordering into the GPTQ-style layout required by the underlying kernels. The INCXPULinearBase class and its derivatives, INCXPULinearMethod and INCARKLinearMethod, have been updated to handle these different packing formats during weight initialization and processing. I have no feedback to provide as there were no review comments.
Hi @Liangliang-Ma, thanks for the fix, and that makes sense to me!
This pull request has merge conflicts that must be resolved before it can be
Add AWQ-packed checkpoint support to the XPU INC w4a16 path inside the INC scheme orchestrator introduced in vllm-project#40601:

- INCXPULinearBase reads layer_config.is_awq and creates qweight with the AWQ shape [K, N // pack_factor] (packed along output dim) when the checkpoint is AWQ-packed, or the GPTQ shape [K // pack_factor, N] (packed along input dim) otherwise.
- A lossless _convert_awq_qweight_to_gptq helper reorders the AWQ nibble layout ([0, 2, 4, 6, 1, 3, 5, 7]) into sequential order and repacks along the input dim, matching the GPTQ layout that the oneDNN int4_gemm_w4a16 kernel and the ARK backend already consume.
- Both INCXPULinearMethod.process_weights_after_loading and INCARKLinearMethod.process_weights_after_loading invoke the converter before the existing NT transpose / ARK weight copy.
- test_auto_round_model[auto_round:auto_awq] is enabled on XPU.

Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
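The reorder-and-repack step described in the commit message can be sketched roughly as follows. This is not the code from the PR: it uses plain Python lists of unsigned integers in place of int32 tensors, every function name is hypothetical, and it assumes the AutoAWQ-style convention that nibble i of a packed word holds the column at offset AWQ_ORDER[i] within its group of eight. It covers qweight only, not zero points or scales.

```python
# Illustrative sketch only: lists of Python ints stand in for int32 tensors,
# and all function names below are hypothetical, not the PR's helpers.
AWQ_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]  # assumed: nibble i holds column AWQ_ORDER[i]
BITS = 4
PACK_FACTOR = 32 // BITS  # eight 4-bit values per 32-bit word
MASK = (1 << BITS) - 1


def pack_awq(w):
    """Pack a [K][N] matrix of 4-bit values into AWQ layout [K][N // 8]."""
    packed = []
    for row in w:
        words = [0] * (len(row) // PACK_FACTOR)
        for c in range(len(words)):
            for i in range(PACK_FACTOR):
                value = row[c * PACK_FACTOR + AWQ_ORDER[i]] & MASK
                words[c] |= value << (BITS * i)
        packed.append(words)
    return packed


def unpack_awq(qweight):
    """Unpack AWQ layout [K][N // 8] into sequential [K][N] 4-bit values."""
    out = []
    for row in qweight:
        vals = [0] * (len(row) * PACK_FACTOR)
        for c, word in enumerate(row):
            for i in range(PACK_FACTOR):
                # Undo the interleave: nibble i goes back to column AWQ_ORDER[i].
                vals[c * PACK_FACTOR + AWQ_ORDER[i]] = (word >> (BITS * i)) & MASK
        out.append(vals)
    return out


def pack_gptq(w):
    """Pack a [K][N] matrix into GPTQ layout [K // 8][N] (along the input dim)."""
    k, n = len(w), len(w[0])
    assert k % PACK_FACTOR == 0, "K must be a multiple of the pack factor"
    packed = [[0] * n for _ in range(k // PACK_FACTOR)]
    for r in range(k):
        for c in range(n):
            # Row r lands in nibble (r % 8) of packed row (r // 8), in sequential order.
            packed[r // PACK_FACTOR][c] |= (w[r][c] & MASK) << (BITS * (r % PACK_FACTOR))
    return packed


def unpack_gptq(qweight):
    """Unpack GPTQ layout [K // 8][N] back into [K][N] 4-bit values."""
    out = []
    for packed_row in qweight:
        for i in range(PACK_FACTOR):
            out.append([(word >> (BITS * i)) & MASK for word in packed_row])
    return out


def convert_awq_qweight_to_gptq(qweight):
    """AWQ [K][N // 8] -> GPTQ [K // 8][N]; no 4-bit value is altered."""
    return pack_gptq(unpack_awq(qweight))
```

Because the conversion only moves 4-bit values between positions, packing a matrix with pack_awq, converting it, and unpacking the result with unpack_gptq returns the original matrix, which is the sense in which the conversion is lossless. On real int32 tensors the same bit patterns apply, but the top nibble overlaps the sign bit, so shifts and masks need to account for that.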
38cb574 to baa0a3e
LGTM! Thanks for the support!
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com> Signed-off-by: Qiang Li <qiang.li2@amd.com>
Convert AWQ-packed weights to the GPTQ layout so that INCXPULinear can load AWQ-format AutoRound models.
cc: @yiliu30 @jikunshang