
[CPU][RISC-V] Enable oneDNN W8A8 INT8 to run on RISC-V#44478

Merged
bigPYJ1151 merged 5 commits into
vllm-project:mainfrom
velonica0:risc-v-W8A8
Jun 11, 2026

velonica0 (Contributor) commented Jun 4, 2026

Purpose

Problem

vLLM's CPU W8A8 dispatcher (vllm/model_executor/kernels/linear/scaled_mm/cpu.py:40-55) has two branches: an x86-only SGL fast path and a oneDNN path for every other architecture.
On RISC-V the dispatcher therefore always takes the oneDNN branch. But oneDNN itself isn't compiled in on RISC-V: cmake/cpu_extension.cmake doesn't enable it for that target, and csrc/cpu/torch_bindings.cpp doesn't register the ops under the __riscv_v guard. As a result, loading any compressed-tensors W8A8 model on RISC-V crashes immediately.
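
The failure mode described above can be sketched in a few lines. This is only an illustration of the two-branch shape, not the code in cpu.py: the function name `select_w8a8_backend`, the architecture strings, the `onednn_compiled` flag, and the explicit `RuntimeError` are all assumptions made for the example (the real crash surfaces differently).

```python
def select_w8a8_backend(arch: str, onednn_compiled: bool) -> str:
    """Pick a backend the way a two-branch dispatcher would (illustrative only)."""
    # Branch 1: the x86-only fast path.
    if arch in ("x86_64", "AMD64"):
        return "sgl"
    # Branch 2: every other architecture falls through to oneDNN,
    # whether or not the oneDNN kernels were actually built.
    if not onednn_compiled:
        # Hypothetical stand-in for the crash seen when the ops are missing.
        raise RuntimeError(f"oneDNN W8A8 ops are not available on {arch}")
    return "onednn"


print(select_w8a8_backend("x86_64", onednn_compiled=True))
print(select_w8a8_backend("riscv64", onednn_compiled=True))
```

With `onednn_compiled=False` and a non-x86 architecture, the sketch raises, which mirrors the situation on RISC-V before this change: the branch is selected but nothing backs it.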

What this PR does

Makes the existing oneDNN dispatch path reachable on RISC-V through four minimal changes, each on the dependency chain from "user loads a W8A8 model" down to "GEMM kernel exists".

Test Plan

  1. Build: confirm the new cmake gate compiles dnnl_kernels.cpp and registers the ops on a RISC-V target.
  2. Op registration: verify that the two oneDNN ops appear in the torch.ops._C namespace.
  3. Unit tests: run vLLM's existing oneDNN op tests, both unquantized and W8A8.
  4. End-to-end model: load a compressed-tensors W8A8 checkpoint and generate from it.
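
Step 2 amounts to an attribute check on the op namespace. A minimal sketch follows, assuming hypothetical op names (`onednn_mm`, `onednn_scaled_mm`), since the PR text does not name the two ops; a stand-in namespace is used so the snippet runs without a vLLM build.

```python
from types import SimpleNamespace


def missing_ops(namespace, required):
    """Return the names in `required` that `namespace` does not expose."""
    return [name for name in required if not hasattr(namespace, name)]


# Hypothetical op names; substitute the ones the build actually registers.
REQUIRED_OPS = ("onednn_mm", "onednn_scaled_mm")

# Stand-in for torch.ops._C with only one of the two ops registered.
fake_namespace = SimpleNamespace(onednn_mm=object())
print(missing_ops(fake_namespace, REQUIRED_OPS))
```

On a real build, the same helper could be pointed at `torch.ops._C` once the compiled extension has been loaded; an empty list would mean both ops are registered.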

Test Result

Hardware: Spacemit X100 (RISC-V, RVA23 + RVV 1.0, VLEN=256), 16 cores (SoC: K3 / BananaPi BPI-F3 class).
Software: GCC 15.2, PyTorch 2.12.0a0+git0d62256 (RISC-V build), oneDNN v3.10 fetched by cmake.

| Test | Result |
|---|---|
| Build (1) | succeeds; dnnl_kernels.cpp compiles, ops register |
| Op registration (2) | both ops present in torch.ops._C |
| test_onednn_gemm (3a) | pass (fp/bf16 paths) |
| test_onednn_int8_scaled_gemm (3b) | pass |
| vllm bench latency (4) | 3 iters complete; avg 261.4 s / iter; p50 261.0 s; p99 262.4 s; 0.245 generation tok/s |
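
As a quick sanity check on the latency figures, the reported throughput and average latency can be multiplied to estimate how many tokens each iteration generated. This assumes the throughput is generated tokens divided by average per-iteration latency for a single sequence; the PR does not state the benchmark's output length or batch size.

```python
avg_latency_s = 261.4   # reported average latency per iteration
gen_tok_per_s = 0.245   # reported generation throughput

# Tokens generated per iteration implied by the two figures above.
implied_tokens = avg_latency_s * gen_tok_per_s
# Average time spent per generated token under the same assumption.
seconds_per_token = 1.0 / gen_tok_per_s

print(round(implied_tokens))        # about 64
print(round(seconds_per_token, 2))  # about 4.08
```

Under that assumption the two numbers are mutually consistent with roughly 64 generated tokens per iteration, at about 4 seconds per token.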

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
@velonica0 velonica0 requested a review from bigPYJ1151 as a code owner June 4, 2026 02:14
@mergify mergify Bot added ci/build cpu Related to CPU backends labels Jun 4, 2026
Signed-off-by: velonica0 <like@mail.nankai.edu.cn>
@bigPYJ1151 bigPYJ1151 enabled auto-merge (squash) June 9, 2026 05:42
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 9, 2026
@bigPYJ1151 bigPYJ1151 merged commit f31bc2e into vllm-project:main Jun 11, 2026
23 checks passed
