Pull requests: vllm-project/vllm
[Rust Frontend] Bump llm-multimodal version
ready
ONLY add when PR is ready to merge/full CI is needed
rust
#47530
opened Jul 3, 2026 by
Isotr0py
Member
1 of 4 tasks
[Frontend] Limit SO_REUSEPORT to multi-worker serving
frontend
#47529
opened Jul 3, 2026 by
BugenZhao
Member
4 tasks
fix: return SSE content type from NIXL toy proxy
kv-connector
v1
#47526
opened Jul 3, 2026 by
Spycsh
3 of 4 tasks
Add assertion for group_size and BLOCK_K consistency
#47525
opened Jul 3, 2026 by
hnhyzz
4 tasks
[MRV2] Draw 64-bit uniforms for fp32 Gumbel sampling
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#47524
opened Jul 3, 2026 by
WoosukKwon
Collaborator
[Rust Frontend] Speed up chat roundtrip tests
ready
ONLY add when PR is ready to merge/full CI is needed
rust
#47523
opened Jul 3, 2026 by
BugenZhao
Member
4 tasks
[Quantization][INC] Support INT2 XPU Linear
intel-gpu
Related to Intel GPU
#47521
opened Jul 3, 2026 by
Zhenzhong1
Contributor
[Attention] Derive Triton 3D flash-decoding threshold from SM count a…
v1
#47520
opened Jul 3, 2026 by
tuananhlfc
5 tasks done
[ROCm][CI] Fix Kernels and Kernels attention test failures
rocm
Related to AMD ROCm
#47519
opened Jul 3, 2026 by
cpersson-amd
4 tasks done
[ROCm][DSV4] Enable fused AITER mHC post+pre kernel for decode
rocm
Related to AMD ROCm
#47518
opened Jul 3, 2026 by
Fangzhou-Ai
Contributor
Draft
[XPU][UT]fix _POSSIBLE_KERNELS error on XPU
intel-gpu
Related to Intel GPU
#47516
opened Jul 3, 2026 by
Yejing-Lai
Contributor
[Quantization] Fix NVFP4 per-half global scale for fused gate_up_proj
#47515
opened Jul 3, 2026 by
Charles-JCJ
5 tasks done
[Quantization][INC]Add MXFP8 Linear Support
#47514
opened Jul 3, 2026 by
Zhenzhong1
Contributor
[Spec Decode] Enable full CUDA graphs on padded DSpark
nvidia
qwen
Related to Qwen models
speculative-decoding
v1
fix: avoid JSON constraints for native tool parsers
tool-calling
#47512
opened Jul 3, 2026 by
hubunt
DFlash SWA — resolved for personal build
ci/build
cpu
Related to CPU backends
deepseek
Related to DeepSeek models
documentation
Improvements or additions to documentation
frontend
needs-rebase
nvidia
qwen
Related to Qwen models
rocm
Related to AMD ROCm
speculative-decoding
v1
[Bugfix] 'Already borrowed' by pooling tokenizer in StructuredOutputManager
bug
Something isn't working
structured-output
v1
#47509
opened Jul 3, 2026 by
junhee-yoo
[Bugfix] Gemma-4 k_eq_v x compressed-tensors: propagate shard aliases
bug
Something isn't working
#47507
opened Jul 3, 2026 by
soaringk
Contributor
2 tasks done
[KVConnector] Guard lmcache_mp_connector state transition with num_external_tokens
kv-connector
v1
#47505
opened Jul 3, 2026 by
Alex-ai-future
Contributor
[Bugfix][Tool Parser] deepseek_v3: accept optional newline before JSON arguments
bug
Something isn't working
deepseek
Related to DeepSeek models
tool-calling
#47503
opened Jul 3, 2026 by
weizhoublue
Contributor
4 tasks
[Minimax-M3] Using tok_sparse_select from MSA instead of triton kernels
ci/build
#47502
opened Jul 3, 2026 by
zyongye
Member
[Rust Frontend] add gigachat3 tool parser
rust
#47501
opened Jul 3, 2026 by
yangyang-cs95
Contributor
1 of 2 tasks