[CPU][RISC-V] Enable oneDNN W8A8 INT8 to run on RISC-V by velonica0 · Pull Request #44478 · vllm-project/vllm

velonica0 · 2026-06-04T02:14:11Z

Purpose

Problem

vLLM's CPU W8A8 dispatcher (vllm/model_executor/kernels/linear/scaled_mm/cpu.py:40-55) has two branches: an x86-only SGL fast path, and a oneDNN path for all other architectures.
On RISC-V, dispatch therefore always falls through to the oneDNN else branch. However, oneDNN itself is not compiled in on RISC-V: cmake/cpu_extension.cmake does not enable it for RISC-V, and csrc/cpu/torch_bindings.cpp does not register the ops under __riscv_v. As a result, loading any compressed-tensors W8A8 model on RISC-V crashes immediately.
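A minimal Python sketch of this failure mode, assuming a simplified dispatcher. The function name `select_w8a8_kernel`, the op name `onednn_scaled_mm`, and the plain set standing in for the registered ops are all hypothetical and do not mirror cpu.py line for line. The point it illustrates: a non-x86 machine always takes the oneDNN branch, so whether the op was registered at build time is the only thing that decides success.

```python
def select_w8a8_kernel(machine: str, registered_ops: set) -> str:
    """Simplified two-branch CPU W8A8 dispatch (hypothetical names)."""
    # Branch 1: x86-only SGL fast path.
    if machine in ("x86_64", "AMD64"):
        return "sgl"
    # Branch 2: every other architecture, riscv64 included, lands here
    # unconditionally. If the oneDNN op was never compiled in, this fails.
    # (The real exception raised inside vLLM may differ.)
    if "onednn_scaled_mm" not in registered_ops:  # hypothetical op name
        raise RuntimeError(f"oneDNN W8A8 op is not registered on {machine}")
    return "onednn"


# Unpatched riscv64 build: the oneDNN op set is empty.
try:
    select_w8a8_kernel("riscv64", set())
except RuntimeError as err:
    print("before:", err)

# Patched build: the cmake gate and __riscv_v registration provide the op.
print("after:", select_w8a8_kernel("riscv64", {"onednn_scaled_mm"}))
```

The fix therefore belongs on the build side rather than in the dispatcher: once the op exists, the existing else branch works unchanged.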

What this PR does

This PR makes the existing oneDNN dispatcher reachable on RISC-V with four minimal changes, each on the dependency chain from "user loads a W8A8 model" down to "the GEMM kernel exists".
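For context on what the GEMM at the bottom of that chain computes, here is a pure-Python reference sketch of a W8A8 scaled matmul. It assumes symmetric quantization with per-token activation scales and per-output-channel weight scales, and omits asymmetric (zero-point) handling. The function names are illustrative and are not oneDNN's or vLLM's API. Real kernels accumulate in int32 and apply the scales in an epilogue; Python's unbounded ints stand in for that accumulator here.

```python
def quantize_rows(x: list[list[float]]) -> tuple[list[list[int]], list[float]]:
    """Symmetric int8 quantization, one scale per row: q = round(v / s), s = max|v| / 127."""
    q_rows, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        s = amax / 127.0 if amax > 0 else 1.0  # avoid dividing by zero on all-zero rows
        q_rows.append([max(-128, min(127, round(v / s))) for v in row])
        scales.append(s)
    return q_rows, scales


def transpose(m):
    return [list(col) for col in zip(*m)]


def w8a8_scaled_mm(a_q, b_q, scale_a, scale_b, bias=None):
    """out[i][j] = scale_a[i] * scale_b[j] * sum_p a_q[i][p] * b_q[p][j] (+ bias[j])."""
    k = len(b_q)
    if any(len(row) != k for row in a_q):
        raise ValueError("inner dimensions of a_q and b_q must match")
    b_cols = transpose(b_q)
    out = []
    for i, a_row in enumerate(a_q):
        row = []
        for j, b_col in enumerate(b_cols):
            acc = sum(a * b for a, b in zip(a_row, b_col))  # integer accumulator (int32 in real kernels)
            val = scale_a[i] * scale_b[j] * acc  # dequantize in the epilogue
            row.append(val + (bias[j] if bias is not None else 0.0))
        out.append(row)
    return out


# Usage: activations x (M x K) and weights w (K x N).
x = [[0.5, -1.0], [2.0, 0.25]]
w = [[1.0, -0.5], [0.25, 2.0]]
a_q, scale_a = quantize_rows(x)  # per-token scales
w_t_q, scale_b = quantize_rows(transpose(w))  # per-output-channel scales
b_q = transpose(w_t_q)  # back to K x N
approx = w8a8_scaled_mm(a_q, b_q, scale_a, scale_b)
exact = [[sum(a * b for a, b in zip(r, c)) for c in transpose(w)] for r in x]
max_err = max(abs(p - e) for pr, er in zip(approx, exact) for p, e in zip(pr, er))
print(f"max abs error vs fp32: {max_err:.4f}")
```

On this toy input the int8 result stays within 0.01 of the fp32 product; per-token and per-channel scales keep the rounding error proportional to each row's and column's own range.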

Test Plan

1. Build: confirm that the new cmake gate compiles dnnl_kernels.cpp and registers the ops on a RISC-V target.
2. Op registration: verify that the two oneDNN ops appear in the C++ op namespace (`torch.ops._C`).
3. Unit tests: run vLLM's existing oneDNN op tests, both unquantized (3a, `test_onednn_gemm`) and W8A8 (3b, `test_onednn_int8_scaled_gemm`).
4. End-to-end model: load and generate from a compressed-tensors W8A8 checkpoint.
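The op-registration check can be scripted as a small helper. This sketch uses a `SimpleNamespace` stand-in so it runs without a RISC-V build, and the op names are placeholders because the PR text does not name the two ops. On a real build you would pass `torch.ops._C` instead; that assumes the installed PyTorch raises `AttributeError` for an unregistered op (so `hasattr` returns `False`), which is worth verifying on your build.

```python
from types import SimpleNamespace


def missing_ops(namespace: object, op_names: list[str]) -> list[str]:
    """Return the entries of op_names that the namespace does not expose."""
    return [name for name in op_names if not hasattr(namespace, name)]


# Stand-in for torch.ops._C on a patched build; op names are placeholders.
fake_c_ops = SimpleNamespace(
    hypothetical_onednn_op_a=lambda *args: None,
    hypothetical_onednn_op_b=lambda *args: None,
)
missing = missing_ops(fake_c_ops, ["hypothetical_onednn_op_a", "hypothetical_onednn_op_b"])
print("missing ops:", missing or "none")
```

Returning the list of missing names, rather than a single boolean, makes a partial registration (one op present, one absent) obvious in CI logs.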

Test Result

Hardware: Spacemit X100 (RISC-V, RVA23 + RVV 1.0, VLEN=256), 16 cores (SoC: K3 / BananaPi BPI-F3 class).
Software: GCC 15.2, PyTorch 2.12.0a0+git0d62256 (RISC-V build), oneDNN v3.10 fetched by cmake.

| Test | Result |
|---|---|
| Build (1) | Succeeds; `dnnl_kernels.cpp` compiles and the ops register |
| Op registration (2) | Both ops present in `torch.ops._C` |
| `test_onednn_gemm` (3a) | Pass (fp/bf16 paths) |
| `test_onednn_int8_scaled_gemm` (3b) | Pass |
| `vllm bench latency` (4) | 3 iterations complete; avg 261.4 s/iter; p50 261.0 s; p99 262.4 s; 0.245 generated tok/s |


Signed-off-by: velonica0 <like@mail.nankai.edu.cn>


velonica0 requested a review from bigPYJ1151 as a code owner June 4, 2026 02:14

mergify Bot added the ci/build and cpu labels Jun 4, 2026

Enable oneDNN W8A8 INT8 to run on RISC-V

d79f2a9

Signed-off-by: velonica0 <like@mail.nankai.edu.cn>

velonica0 force-pushed the risc-v-W8A8 branch from f53cde8 to d79f2a9 Compare June 4, 2026 02:17

bigPYJ1151 approved these changes Jun 9, 2026


Merge branch 'main' into risc-v-W8A8

eb09692

bigPYJ1151 enabled auto-merge (squash) June 9, 2026 05:42

github-actions Bot added the ready label Jun 9, 2026

bigPYJ1151 and others added 3 commits June 9, 2026 14:27

Merge branch 'main' into risc-v-W8A8

6fbe7e2

Merge branch 'main' into risc-v-W8A8

389d39f

Merge branch 'main' into risc-v-W8A8

5be5050

bigPYJ1151 merged commit f31bc2e into vllm-project:main Jun 11, 2026
23 checks passed

wcynb1023 pushed a commit to wcynb1023/vllm that referenced this pull request Jun 11, 2026

[CPU][RISC-V] Enable oneDNN W8A8 INT8 to run on RISC-V (vllm-project#…

4f56516

…44478) Signed-off-by: velonica0 <like@mail.nankai.edu.cn>

Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026

[CPU][RISC-V] Enable oneDNN W8A8 INT8 to run on RISC-V (vllm-project#…

a295ecf

…44478) Signed-off-by: velonica0 <like@mail.nankai.edu.cn>

vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026

[CPU][RISC-V] Enable oneDNN W8A8 INT8 to run on RISC-V (vllm-project#…

1713731

…44478) Signed-off-by: velonica0 <like@mail.nankai.edu.cn>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[CPU][RISC-V] Enable oneDNN W8A8 INT8 to run on RISC-V (vllm-project#…

99e8c76

…44478) Signed-off-by: velonica0 <like@mail.nankai.edu.cn>

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[CPU][RISC-V] Enable oneDNN W8A8 INT8 to run on RISC-V (vllm-project#…

742dd73

…44478) Signed-off-by: velonica0 <like@mail.nankai.edu.cn>

bigPYJ1151 merged 5 commits into vllm-project:main from velonica0:risc-v-W8A8
