Pull requests: sgl-project/sglang
Pull requests list
Fix kv_b_proj channel scale broadcast when reshape hasn't run yet
#30029
opened Jul 3, 2026 by
kewang-amd
Warn when Dumper may capture CUDA graph outputs
documentation
Improvements or additions to documentation
#30028
opened Jul 3, 2026 by
feichai0017
Add deterministic inference for eagle parity test
#30026
opened Jul 3, 2026 by
ANSHUMAN87
Contributor
perf: reorder DSA indexer dual-stream ops to avoid CUDA graph stream explosion
#30025
opened Jul 3, 2026 by
kpham-sgl
Collaborator
3 tasks
perf(sgl-kernel): default block_quota=16 for MLA page_first KV gather…
sgl-kernel
#30024
opened Jul 3, 2026 by
TianDi101
5 tasks
[tracing] sglang tracing v2: support exporting tracing data asynchronously
documentation
#30023
opened Jul 3, 2026 by
sufeng-buaa
Collaborator
4 of 5 tasks
fix: serialize FanOutCommunicator queueing calls with a lock
#30022
opened Jul 3, 2026 by
lyang24
5 tasks
[codex] Support CUDA 12.2 source builds
blackwell
SM100/SM120
jit-kernel
npu
quant
LLM Quantization
sgl-kernel
[diffusion] feat: performance_mode=speed enables torch.compile by default
diffusion
SGLang Diffusion
run-ci
#30016
opened Jul 3, 2026 by
mickqian
Collaborator
For hybrid sliding-window (SWA) models the SWA KV pool is small and quickly
#30013
opened Jul 3, 2026 by
TensorGlue-IEIT
[DSv4] Use BF16 instead of FP32 for indexer score computation
#30012
opened Jul 3, 2026 by
TTThanos
Contributor
5 tasks
[AMD] WIP - Set REQUEST_TIMEOUT=30 for AMD to deflake multimodal tests
amd
bypass-fastfail
run-ci
#30008
opened Jul 3, 2026 by
yctseng0211
Collaborator
5 tasks
[CI] increase XPU container shm-size from default 64MB to 8GB
run-ci
run-ci-extra
#30007
opened Jul 3, 2026 by
vshekhawat-hlab
Contributor
5 tasks
Fix prefill CUDA graph disabled for deeply-nested multimodal models
#30006
opened Jul 3, 2026 by
rahulvijayaraghavan
Contributor
refactor: make time_stats msgpack-native
#30005
opened Jul 3, 2026 by
oleksii-tumanov
Contributor
5 tasks done
[diffusion] feat: per-layer TP shard planner for DiT linears (--dit-tp-plan)
diffusion
#30004
opened Jul 3, 2026 by
mickqian
Collaborator
[MoE] Retire the AOT moe_fused_gate / kimi_k2_moe_fused_gate gate kernels (#26771)
jit-kernel
mthreads
run-ci
sgl-kernel
#29997
opened Jul 3, 2026 by
BBuf
Collaborator
3 tasks done
Fix device mismatch when mixing JPEG (GPU-decoded) and other type (CP…
#29996
opened Jul 3, 2026 by
yuanshaochen
1 of 5 tasks
fix(mimo-vl): pass padded_context_dim to Qwen2_5_VisionPatchMerger
#29994
opened Jul 3, 2026 by
alisonshao
Collaborator
2 of 3 tasks
FlashInfer Backend for MXFP8 Grouped Quantization
documentation
quant
sgl-kernel
#29992
opened Jul 3, 2026 by
philipphack
5 tasks done
[docs] Multi-node deployment: add PD disaggregation and Apptainer examples for SLURM
documentation
#29991
opened Jul 3, 2026 by
davislx
3 of 5 tasks