LMSYS Org (@lmsysorg) / X

LMSYS Org

1,164 posts

@lmsysorg

Large Model Systems Organization. Join our Slack: slack.sglang.io. We developed SGLang (sglang.io), Chatbot Arena (now @arena), and Vicuna!

Joined August 2024

LMSYS Org
@lmsysorg
Sep 22, 2025
SGLang now supports deterministic LLM inference! Building on @thinkymachines batch-invariant kernels, we integrated deterministic attention & sampling ops into a high-throughput engine, fully compatible with chunked prefill, CUDA graphs, radix cache, and non-greedy sampling. ✅
112K
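Why batch-invariant kernels matter: floating-point addition is not associative, so a reduction whose split strategy changes with batch size can return different results for the same request depending on what else is in the batch. The minimal Python sketch below illustrates this under stated assumptions. `split_reduce`, `splits_for_batch`, and the two row-sum functions are hypothetical names, and the "more splits for small batches" heuristic is invented for illustration. This is not SGLang's or Thinking Machines' code.

```python
from typing import List


def sequential_sum(xs: List[float]) -> float:
    # Plain left-to-right accumulation. The built-in sum() is avoided because
    # Python 3.12+ uses compensated summation for floats, which hides the effect.
    total = 0.0
    for x in xs:
        total += x
    return total


def split_reduce(values: List[float], num_splits: int) -> float:
    # Split into contiguous chunks, reduce each chunk, then reduce the partials
    # (roughly what a split-K / split-KV kernel does on a GPU).
    n = len(values)
    chunk = -(-n // num_splits)  # ceiling division
    partials = [sequential_sum(values[i:i + chunk]) for i in range(0, n, chunk)]
    return sequential_sum(partials)


def splits_for_batch(batch_size: int) -> int:
    # Hypothetical heuristic: a lone request gets 2 splits to keep the device busy.
    return 2 if batch_size == 1 else 1


def batch_variant_row_sum(row: List[float], batch_size: int) -> float:
    # Reduction order depends on batch size, so results can differ.
    return split_reduce(row, splits_for_batch(batch_size))


def batch_invariant_row_sum(row: List[float], batch_size: int) -> float:
    # Same reduction order regardless of batch size (batch_size is ignored).
    return split_reduce(row, 1)


ROW = [1.0, 1e16, -1e16, 1.0]  # values chosen so rounding depends on order

if __name__ == "__main__":
    for b in (1, 4):
        print(b, batch_variant_row_sum(ROW, b), batch_invariant_row_sum(ROW, b))
```

With this row, the batch-variant version returns 0.0 when the request runs alone and 1.0 when it is batched, while the batch-invariant version returns 1.0 both times. Fixing the reduction order trades some scheduling flexibility for reproducible outputs.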
LMSYS Org
@lmsysorg
May 5, 2025
🚀 Breaking: SGLang provides the first open-source implementation to serve @deepseek_ai V3/R1 models with large-scale expert parallelism and prefill-decode disaggregation on 96 GPUs. It nearly matches the throughput reported by the official DeepSeek blog, achieving 52.3K input
162K
LMSYS Org
@lmsysorg
Oct 14, 2025
🚀 SGLang In-Depth Review of the NVIDIA DGX Spark is LIVE! Thanks to @nvidia’s early access program, SGLang makes its first-ever appearance in a consumer product, the brand-new DGX Spark. The DGX Spark’s 128GB Unified Memory and Blackwell architecture set a new standard for
411K
LMSYS Org
@lmsysorg
Sep 29, 2025
🎉 Congrats to the DeepSeek team on the amazing release of Sparse Attention (DSA) in V3.2! This fine-grained design sets a new bar for long-context efficiency 🚀 We’re proud that SGLang is an official inference framework for DeepSeek-V3.2, with optimized sparse attention
DeepSeek
@deepseek_ai
Sep 29, 2025
🚀 Introducing DeepSeek-V3.2-Exp, our latest experimental model! ✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context. 👉 Now live on App, Web, and API. 💰 API prices cut by 50%+! 1/n
56K
LMSYS Org
@lmsysorg
Nov 7, 2025
🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models. ⚡️ Up to 5.9× faster inference 🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux 🧰 Easy to use via OpenAI-compatible API, CLI & Python API
109K
LMSYS Org
@lmsysorg
May 31, 2025
Hello everyone! The SGLang community, in collaboration with the Search-R1 team, has quickly reproduced Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, building on our previously open-sourced multi-turn RL. We welcome you to get hands-on
18K
LMSYS Org
@lmsysorg
Jul 30, 2025
🚨Big News! We collaborated with @nvidia to release a DeepSeek R1 inference container optimized for large scale deployment on GB200 NVL72, the world’s most advanced data center–scale accelerated computing platform. This docker container runs a single copy of the model across 56
29K
LMSYS Org
@lmsysorg
Sep 25, 2025
🚀 Follow-up to our last breakthrough on DeepSeek V3/R1 inference! On NVIDIA GB200 NVL72, SGLang now achieves 26k input tokens/s and 13k output tokens/s per GPU with FP8 attention + NVFP4 MoE - that’s a 3.8× / 4.8× speedup vs H100 settings. See the details in the 🧵 (1/4)
69K
LMSYS Org
@lmsysorg
Dec 26, 2024
The best open-source LLM, DeepSeek V3, has just been released! SGLang v0.4.1 is the officially recommended inference solution for it. The SGLang and DeepSeek teams worked together to support DeepSeek V3 FP8 on NVIDIA and AMD GPUs from day one. SGLang has supported MLA and DP
DeepSeek
@deepseek_ai
Dec 26, 2024
🚀 Introducing DeepSeek-V3! Biggest leap forward yet: ⚡ 60 tokens/second (3x faster than V2!) 💪 Enhanced capabilities 🛠 API compatibility intact 🌍 Fully open-source models & papers 🐋 1/n
33K
LMSYS Org
@lmsysorg
May 14, 2025
SGLang, verl, OpenBMB and Tsinghua University: Pioneering End-to-End Multi-Turn RLHF. We are thrilled to announce the release of the first fully functional, convergence-verified, end-to-end open-source multi-turn Reinforcement Learning from Human Feedback (RLHF) framework,
18K
LMSYS Org
@lmsysorg
Aug 11, 2025
Honored to see SGLang adopted in RL training for GLM-4.5 at @Zai_org: large-scale validation from a frontier AI lab pushing the boundaries of LLMs!
86K
LMSYS Org
@lmsysorg
Jun 12, 2025
Huge thanks to @AMD for donating an MI350 to SGLang! This advanced AI accelerator is making a meaningful difference, enabling us to move faster in developing scalable LLM systems and pushing the limits of inference optimization. Special thanks to our awesome infra partner
44K
LMSYS Org
@lmsysorg
Apr 28, 2025
Qwen 3 @Alibaba_Qwen has been released! SGLang is proud to be a close partner supporting it from day 0!
28K
LMSYS Org
@lmsysorg
Sep 11, 2025
We are excited to announce SGLang HiCache, our community solution for hierarchical KV caching to power high-performance LLM serving. ⚡ Performance: up to 6× throughput and 80% TTFT reduction demonstrated in benchmarks and real-world deployments. 🗂️ Flexibility: seamless
60K
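The post above is truncated and does not describe HiCache's internals. As a rough illustration of what "hierarchical KV caching" means in general, here is a hypothetical two-tier LRU cache. A small fast tier stands in for GPU memory and a larger slow tier stands in for host memory. Entries evicted from the fast tier are demoted rather than discarded, and slow-tier hits are promoted back. `TwoTierKVCache` and its methods are invented for this sketch and are not SGLang's API.

```python
from collections import OrderedDict
from typing import Any, Optional, Tuple


class TwoTierKVCache:
    """Hypothetical two-tier LRU cache keyed by prompt prefix (not SGLang's API)."""

    def __init__(self, fast_capacity: int, slow_capacity: int) -> None:
        self.fast: "OrderedDict[str, Any]" = OrderedDict()  # stands in for GPU memory
        self.slow: "OrderedDict[str, Any]" = OrderedDict()  # stands in for host memory
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def get(self, prefix_key: str) -> Tuple[Optional[Any], str]:
        # Fast-tier hit: refresh recency and return.
        if prefix_key in self.fast:
            self.fast.move_to_end(prefix_key)
            return self.fast[prefix_key], "fast"
        # Slow-tier hit: promote back to the fast tier (a "load" from host memory).
        if prefix_key in self.slow:
            value = self.slow.pop(prefix_key)
            self._put_fast(prefix_key, value)
            return value, "slow"
        # Miss: the caller would have to recompute this prefix's KV.
        return None, "miss"

    def put(self, prefix_key: str, kv: Any) -> None:
        self.slow.pop(prefix_key, None)  # keep each key in exactly one tier
        self._put_fast(prefix_key, kv)

    def _put_fast(self, key: str, value: Any) -> None:
        self.fast[key] = value
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            # Demote the least recently used fast entry instead of dropping it.
            old_key, old_val = self.fast.popitem(last=False)
            self.slow[old_key] = old_val
            self.slow.move_to_end(old_key)
            while len(self.slow) > self.slow_capacity:
                self.slow.popitem(last=False)  # only now is the KV truly evicted


if __name__ == "__main__":
    cache = TwoTierKVCache(fast_capacity=2, slow_capacity=4)
    cache.put("sys", "kv0")
    cache.put("a", "kv1")
    cache.put("b", "kv2")     # "sys" is demoted to the slow tier
    print(cache.get("sys"))   # served from the slow tier, then promoted
    print(cache.get("sys"))   # now served from the fast tier
```

The point of the extra tier is that a demoted prefix costs a memory transfer to reuse instead of a full prefill recomputation. A real system also has to decide when to copy data between tiers and how to overlap those copies with compute, which this sketch ignores.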