[Security] Fix info disclosure via int32 truncation in GGUF dequantize kernels#44971

Merged
Isotr0py merged 2 commits into vllm-project:main from jperezdealgaba:fix/gguf-int-truncation-info-disclosure on Jun 11, 2026

@jperezdealgaba (Contributor)

Purpose

Fix an information disclosure vulnerability caused by integer truncation in GGUF dequantize kernels (csrc/libtorch_stable/quantization/gguf/).

The to_cuda_ggml_t function pointer typedef declares its element-count parameter k as int (32-bit). When a GGUF model has weight tensor dimensions whose product exceeds INT_MAX (e.g. a 65536x65536 matrix, whose 2^32 elements truncate to 0), the int64_t product m * n is silently narrowed to int at the call site, keeping only the low 32 bits. The CUDA kernel then processes only the truncated number of elements. Since output tensors are allocated with torch::empty (uninitialized memory), the unfilled portion retains stale GPU memory which, in multi-tenant inference deployments, may contain tensor data from other users' requests.
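The narrowing at the function-pointer boundary can be reproduced on the host. The sketch below is illustrative only: `covered_narrow`, `covered_wide` and the `stale_elements_*` helpers are hypothetical stand-ins, and instead of launching a CUDA kernel each one simply returns how many output elements a launch with that `k` would cover.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a to_cuda_ggml_t-style pointer before and after
// the fix. Each returns how many output elements the launch would cover;
// a non-positive k covers nothing.
using covered_narrow_t = int64_t (*)(int k);    // pre-fix: 32-bit k
using covered_wide_t = int64_t (*)(int64_t k);  // post-fix: 64-bit k

int64_t covered_narrow(int k) { return k > 0 ? k : 0; }
int64_t covered_wide(int64_t k) { return k > 0 ? k : 0; }

// Elements of an m x n output that the launch never writes. If the output was
// allocated uninitialized, these still hold whatever was in that memory before.
int64_t stale_elements_narrow(int64_t m, int64_t n) {
  assert(m >= 0 && n >= 0);
  const int64_t total = m * n;  // exact in 64 bits
  covered_narrow_t dequant = covered_narrow;
  // Implicit int64_t -> int conversion at the call, as at the original call
  // site. It keeps only the low 32 bits (modular since C++20 and on GCC/Clang);
  // -Wconversion would flag it, default warning sets do not.
  return total - dequant(total);
}

int64_t stale_elements_wide(int64_t m, int64_t n) {
  assert(m >= 0 && n >= 0);
  const int64_t total = m * n;
  covered_wide_t dequant = covered_wide;
  return total - dequant(total);  // no narrowing: every element is covered
}
```

For a 65536x65537 tensor, the narrowed `k` is 65536, so `stale_elements_narrow(65536, 65537)` reports 4294967296 unwritten elements, while `stale_elements_wide` reports 0.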

Changes:

  • Widen k from int to int64_t in the to_cuda_ggml_t typedef (ggml-common.h) and all 18 dequantize function signatures in dequantize.cuh
  • Widen the dequantize_block kernel's index arithmetic to int64_t to match
  • Widen col, batch, vecs, and padded local variables in gguf_kernel.cu from int to int64_t to prevent the same class of truncation in the matmul and MoE paths
  • Defense-in-depth: zero-initialize all output tensors via torch::stable::fill_(Y, 0.0) in ggml_dequantize, ggml_mul_mat_vec_a8, ggml_mul_mat_a8, and ggml_moe_a8, so that even if a future truncation bug occurs, stale GPU memory is never exposed
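Widening `k` alone is not enough if the kernel's per-thread index is still computed in 32 bits, which is why the index arithmetic was widened too. In CUDA, `blockDim.x`, `blockIdx.x` and `threadIdx.x` are 32-bit unsigned, so their product wraps unless one operand is promoted first. The host-only emulation below illustrates that pattern; the function names and the block size of 256 are assumptions for illustration, not the actual `dequantize_block` code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed block size for illustration; not vLLM's actual constant.
constexpr int64_t kBlockSize = 256;

// Pre-fix pattern (hypothetical): 32-bit unsigned multiply wraps modulo 2^32,
// then the result is narrowed to int.
int global_index_narrow(uint32_t block_dim, uint32_t block_idx, uint32_t thread_idx) {
  return static_cast<int>(block_dim * block_idx + thread_idx);
}

// Post-fix pattern (hypothetical): promote to 64 bits before multiplying,
// so the global element index is exact.
int64_t global_index_wide(uint32_t block_dim, uint32_t block_idx, uint32_t thread_idx) {
  return static_cast<int64_t>(block_dim) * block_idx + thread_idx;
}

// Grid size for k elements, computed in 64 bits as a widened launcher would.
int64_t num_blocks(int64_t k) {
  assert(k >= 0);
  return (k + kBlockSize - 1) / kBlockSize;
}
```

With 256 threads per block, block 8388608 starts at element 2^31: the narrow index becomes negative, while the wide index is 2147483648. Covering all 2^32 elements of a 65536x65536 tensor needs `num_blocks(4294967296)` = 16777216 blocks.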

Test Plan

```bash
# Build and verify compilation succeeds
python setup.py build_ext --inplace
# Run existing GGUF tests to verify no regression
python -m pytest tests/ -k "gguf" -v
```

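The type change from int to int64_t is source-compatible: all downstream call sites already pass int64_t values (e.g. m * n in ggml_dequantize), so no caller needs to change. Strictly speaking, widening a parameter changes the ABI (parameter width and the C++ mangled name), but every caller is rebuilt together with the extension, so this has no practical effect. The fix eliminates the silent truncation at the function pointer boundary.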

Test Result

  • All existing GGUF kernel call sites pass int64_t values — the widened signatures eliminate the implicit narrowing conversion
  • clang-format passes on all changed files
  • ggml_moe_a8_vec already used fill_(Y, 0.0) — the other four functions now match
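The defense-in-depth fill can be illustrated with a host-only model. In the sketch below, which is not the PR's code, a `std::vector` pre-filled with a sentinel value stands in for memory that `torch::empty` may hand back still holding a previous request's data, and `written` models a buggy launch that covers only a prefix of the output. `produce_output` and `leaked_elements` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// `stale` stands in for leftover data in reused memory; `written` models a
// launch that, because of a bug, only covers the first `written` elements.
std::vector<float> produce_output(std::size_t total, std::size_t written,
                                  float stale, bool zero_init) {
  std::vector<float> y(total, stale);  // pretend this is recycled, uninitialized memory
  if (zero_init) {
    std::fill(y.begin(), y.end(), 0.0f);  // analogue of fill_(Y, 0.0)
  }
  const std::size_t n = std::min(written, total);
  for (std::size_t i = 0; i < n; ++i) {
    y[i] = 1.0f;  // "dequantized" values from the partial launch
  }
  return y;
}

// Number of output elements that still carry the stale sentinel, i.e. leaked data.
std::size_t leaked_elements(const std::vector<float>& y, float stale) {
  return static_cast<std::size_t>(std::count(y.begin(), y.end(), stale));
}
```

Without the fill, a launch that writes 3 of 8 elements leaks the other 5; with it, the unwritten tail reads as zeros. The trade-off is one extra write pass over each output tensor, which is redundant whenever the kernel does cover every element.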

MR was created with the assistance of opus-4.6-high.

Widen the element-count parameter `k` in `to_cuda_ggml_t` and all
dequantize functions from `int` to `int64_t`.  When tensor dimensions
exceed INT_MAX (e.g. 65536x65536), the 32-bit `k` silently truncates,
causing the kernel to process only a fraction of the output tensor.
Since output tensors are allocated with `torch::empty` (uninitialized),
the unfilled portion retains stale GPU memory which may contain data
from other users' inference requests in multi-tenant deployments.

Also widen `col`, `batch`, `vecs`, and `padded` variables in
gguf_kernel.cu from `int` to `int64_t` to prevent the same class of
truncation in the matmul and MoE paths.

As defense-in-depth, zero-initialize all output tensors via
`torch::stable::fill_(Y, 0.0)` so that even if a future truncation
bug occurs, stale GPU memory is never exposed.

Signed-off-by: Juan Pérez de Algaba <jperezde@redhat.com>

Signed-off-by: jperezde <jperezde@redhat.com>
@Isotr0py Isotr0py enabled auto-merge (squash) June 11, 2026 01:30
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 11, 2026
@Isotr0py Isotr0py merged commit f219788 into vllm-project:main Jun 11, 2026
188 checks passed