[Security] Fix info disclosure via int32 truncation in GGUF dequantize kernels by jperezdealgaba · Pull Request #44971 · vllm-project/vllm

jperezdealgaba · 2026-06-09T07:09:29Z

Purpose

Fix an information disclosure vulnerability caused by integer truncation in GGUF dequantize kernels (csrc/libtorch_stable/quantization/gguf/).

The to_cuda_ggml_t function pointer typedef declares its element count parameter k as int (32-bit). When a GGUF model has weight tensor dimensions whose product exceeds INT_MAX (e.g. a 65536x65536 matrix), the int64_t product m * n is silently truncated to int at the call site. The CUDA kernel then processes only the truncated number of elements. Since output tensors are allocated with torch::empty (uninitialized memory), the unfilled portion retains stale GPU memory which — in multi-tenant inference deployments — may contain tensor data from other users' requests.
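The narrowing described above can be reproduced on the host without a GPU. The following is a minimal sketch, not the vLLM code: `elements_seen_by_int_callee` and `truncated_count` are hypothetical names, and the explicit cast stands in for the implicit conversion at the original call site. For a 65536x65536 tensor the product is 2^32, whose low 32 bits are all zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a callee whose element count is a 32-bit int,
// as in the pre-fix typedef. It reports how many elements it would process.
inline int64_t elements_seen_by_int_callee(int k) { return k; }

// Mimics the call site: the int64_t product m * n is narrowed to int
// on the way into the callee.
inline int64_t truncated_count(int64_t m, int64_t n) {
    const int64_t total = m * n;
    // The cast is explicit here only to make the narrowing visible.
    // The result is modular on mainstream compilers (guaranteed since C++20).
    return elements_seen_by_int_callee(static_cast<int>(total));
}
```

Under these assumptions, `truncated_count(65536, 65536)` yields 0, so a callee with an `int` count would process no elements at all, while `truncated_count(1024, 1024)` passes through unchanged as 1048576.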

Changes:

Widen k from int to int64_t in the to_cuda_ggml_t typedef (ggml-common.h) and all 18 dequantize function signatures in dequantize.cuh
Widen the dequantize_block kernel's index arithmetic to int64_t to match
Widen col, batch, vecs, and padded local variables in gguf_kernel.cu from int to int64_t to prevent the same class of truncation in the matmul and MoE paths
Defense-in-depth: zero-initialize all output tensors via torch::stable::fill_(Y, 0.0) in ggml_dequantize, ggml_mul_mat_vec_a8, ggml_mul_mat_a8, and ggml_moe_a8, so that even if a future truncation bug occurs, stale GPU memory is never exposed
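Widening the kernel's index arithmetic matters for the same reason as widening `k`. The sketch below is a host-side model only; the actual `dequantize_block` body is not shown in this PR description, and `global_index_32` and `global_index_64` are hypothetical helpers. It assumes the usual CUDA situation where block and thread indices are 32-bit unsigned values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of a kernel's global index computation.
// In CUDA, blockIdx.x, blockDim.x and threadIdx.x are 32-bit unsigned values.
inline int64_t global_index_32(uint32_t block_idx, uint32_t block_dim,
                               uint32_t thread_idx) {
    // The whole expression is evaluated in 32 bits and wraps modulo 2^32
    // before the result is widened.
    return static_cast<int64_t>(block_idx * block_dim + thread_idx);
}

inline int64_t global_index_64(uint32_t block_idx, uint32_t block_dim,
                               uint32_t thread_idx) {
    // Widening one operand first forces 64-bit evaluation of the expression.
    return static_cast<int64_t>(block_idx) * block_dim + thread_idx;
}
```

With `block_idx = 16777216` (2^24), `block_dim = 256` and `thread_idx = 5`, the 32-bit version wraps to 5, whereas the 64-bit version gives 4294967301.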

Test Plan

# Build and verify compilation succeeds
python setup.py build_ext --inplace
# Run existing GGUF tests to verify no regression
python -m pytest tests/ -k "gguf" -v

The type change from int to int64_t is ABI-compatible and all downstream call sites already pass int64_t values (e.g. m * n in ggml_dequantize). The fix eliminates the silent truncation at the function pointer boundary.
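One possible way to keep the function pointer boundary from narrowing again is a compile-time check on the typedef's count parameter. This is a sketch under assumptions: the two typedefs are simplified, hypothetical stand-ins (the real `to_cuda_ggml_t` parameter list is not reproduced here), and `count_param` is an illustrative helper, not part of vLLM.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical, simplified typedefs modelling the signature before and after
// the change. The real typedef has a different parameter list.
using to_ggml_narrow_t = void (*)(const void* src, float* dst, int k);
using to_ggml_wide_t = void (*)(const void* src, float* dst, int64_t k);

// Extracts the type of the last (element-count) parameter of a
// three-argument function pointer type.
template <typename F>
struct count_param;
template <typename R, typename A, typename B, typename K>
struct count_param<R (*)(A, B, K)> {
    using type = K;
};

// True when the element-count parameter is a 64-bit signed integer.
template <typename F>
constexpr bool count_is_64bit =
    std::is_same<typename count_param<F>::type, int64_t>::value;

// A build using the narrow typedef would fail the second assertion.
static_assert(!count_is_64bit<to_ggml_narrow_t>, "narrow typedef uses int");
static_assert(count_is_64bit<to_ggml_wide_t>, "wide typedef uses int64_t");
```

A check of this kind fails at build time rather than at run time, which suits a bug that only shows up with very large tensors.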

Test Result

All existing GGUF kernel call sites pass int64_t values — the widened signatures eliminate the implicit narrowing conversion
clang-format passes on all changed files
ggml_moe_a8_vec already used fill_(Y, 0.0) — the other four functions now match
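The effect of the zero-fill can be illustrated with a small host-side model. This is only a sketch: `partial_dequantize` and `leaked_values` are hypothetical names, a `std::vector` stands in for the output tensor, and the value 42.0f stands in for whatever a recycled allocation happened to contain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a short write: only the first `processed`
// elements of the output are filled.
inline void partial_dequantize(std::vector<float>& out, std::size_t processed) {
    for (std::size_t i = 0; i < processed && i < out.size(); ++i) {
        out[i] = 1.0f;
    }
}

// Counts how many stale values remain visible after a short write.
inline std::size_t leaked_values(bool zero_fill_first, std::size_t total,
                                 std::size_t processed) {
    // 42.0f models stale contents of an uninitialized allocation.
    std::vector<float> out(total, 42.0f);
    if (zero_fill_first) {
        for (float& v : out) v = 0.0f;  // plays the role of fill_(Y, 0.0)
    }
    partial_dequantize(out, processed);
    std::size_t leaked = 0;
    for (float v : out) {
        if (v == 42.0f) ++leaked;
    }
    return leaked;
}
```

In this model, `leaked_values(false, 8, 3)` reports 5 stale values and `leaked_values(true, 8, 3)` reports 0: the short write still produces wrong output, but no prior contents are exposed.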

MR was created with the assistance of: opus-4.6-high

Widen the element-count parameter `k` in `to_cuda_ggml_t` and all dequantize functions from `int` to `int64_t`. When tensor dimensions exceed INT_MAX (e.g. 65536x65536), the 32-bit `k` silently truncates, causing the kernel to process only a fraction of the output tensor. Since output tensors are allocated with `torch::empty` (uninitialized), the unfilled portion retains stale GPU memory which may contain data from other users' inference requests in multi-tenant deployments.

Also widen `col`, `batch`, `vecs`, and `padded` variables in gguf_kernel.cu from `int` to `int64_t` to prevent the same class of truncation in the matmul and MoE paths.

As defense-in-depth, zero-initialize all output tensors via `torch::stable::fill_(Y, 0.0)` so that even if a future truncation bug occurs, stale GPU memory is never exposed.

Signed-off-by: Juan Pérez de Algaba <jperezde@redhat.com>
Signed-off-by: jperezde <jperezde@redhat.com>

…e kernels (vllm-project#44971) Signed-off-by: jperezde <jperezde@redhat.com>

…e kernels (vllm-project#44971) Signed-off-by: jperezde <jperezde@redhat.com> Signed-off-by: divineearthly <divineearthly@gmail.com>

…e kernels (vllm-project#44971) Signed-off-by: jperezde <jperezde@redhat.com>

Isotr0py approved these changes Jun 11, 2026

Isotr0py enabled auto-merge (squash) June 11, 2026 01:30

github-actions Bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Jun 11, 2026

Merge branch 'main' into fix/gguf-int-truncation-info-disclosure

1ec1bd1

Isotr0py merged commit f219788 into vllm-project:main Jun 11, 2026
188 checks passed

Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026

[Security] Fix info disclosure via int32 truncation in GGUF dequantiz…

e99c22b

…e kernels (vllm-project#44971) Signed-off-by: jperezde <jperezde@redhat.com>

vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026

[Security] Fix info disclosure via int32 truncation in GGUF dequantiz…

0180848

…e kernels (vllm-project#44971) Signed-off-by: jperezde <jperezde@redhat.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[Security] Fix info disclosure via int32 truncation in GGUF dequantiz…

a754879

…e kernels (vllm-project#44971) Signed-off-by: jperezde <jperezde@redhat.com>

nixpkgs-security-tracker Bot mentioned this pull request Jun 23, 2026

vLLM: security issues < 0.23.1rc0 NixOS/nixpkgs#534486

Open

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[Security] Fix info disclosure via int32 truncation in GGUF dequantiz…

49f2637

…e kernels (vllm-project#44971) Signed-off-by: jperezde <jperezde@redhat.com>

Isotr0py merged 2 commits into vllm-project:main from jperezdealgaba:fix/gguf-int-truncation-info-disclosure
