LMSYS Org (@lmsysorg) / X

LMSYS Org

1,164 posts

@lmsysorg

Large Model Systems Organization. Join our Slack: slack.sglang.io. We developed SGLang (sglang.io), Chatbot Arena (now @arena), and Vicuna!

lmsys.org

Joined: August 2024

LMSYS Org
@lmsysorg
Sep 22, 2025
SGLang now supports deterministic LLM inference! Building on @thinkymachines batch-invariant kernels, we integrated deterministic attention & sampling ops into a high-throughput engine - fully compatible with chunked prefill, CUDA graphs, radix cache, and non-greedy sampling. ✅
112K
LMSYS Org
@lmsysorg
May 5, 2025
🚀 Breaking: SGLang provides the first open-source implementation to serve @deepseek_ai V3/R1 models with large-scale expert parallelism and prefill-decode disaggregation on 96 GPUs. It nearly matches the throughput reported by the official DeepSeek blog, achieving 52.3K input
162K
LMSYS Org
@lmsysorg
Oct 14, 2025
🚀 SGLang In-Depth Review of the NVIDIA DGX Spark is LIVE! Thanks to @nvidia’s early access program, SGLang makes its first-ever appearance in a consumer product, the brand-new DGX Spark. The DGX Spark’s 128GB Unified Memory and Blackwell architecture set a new standard for
411K
LMSYS Org
@lmsysorg
Sep 29, 2025
🎉 Congrats to the DeepSeek team on the amazing release of Sparse Attention (DSA) in V3.2! This fine-grained design sets a new bar for long-context efficiency 🚀 We’re proud that SGLang is an official inference framework for DeepSeek-V3.2 — with optimized sparse attention
DeepSeek
@deepseek_ai
Sep 29, 2025
🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model! ✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention(DSA) for faster, more efficient training & inference on long context. 👉 Now live on App, Web, and API. 💰 API prices cut by 50%+! 1/n
56K
LMSYS Org
@lmsysorg
Nov 7, 2025
🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models. ⚡️ Up to 5.9× faster inference 🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux 🧰 Easy to use via OpenAI-compatible API, CLI & Python API
109K
LMSYS Org
@lmsysorg
May 31, 2025
Hello everyone, the SGLang community, in collaboration with the Search R1 team, has quickly reproduced "Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning" based on the previously open-sourced multi-turn RL. We welcome you to get hands-on
18K
LMSYS Org
@lmsysorg
Jul 30, 2025
🚨Big News! We collaborated with @nvidia to release a DeepSeek R1 inference container optimized for large scale deployment on GB200 NVL72, the world’s most advanced data center–scale accelerated computing platform. This docker container runs a single copy of the model across 56
29K
LMSYS Org
@lmsysorg
Sep 25, 2025
🚀 Follow-up to our last breakthrough on DeepSeek V3/R1 inference! On NVIDIA GB200 NVL72, SGLang now achieves 26k input tokens/s and 13k output tokens/s per GPU with FP8 attention + NVFP4 MoE - that’s a 3.8× / 4.8× speedup vs H100 settings. See the details in the 🧵 (1/4)
69K
LMSYS Org
@lmsysorg
Dec 26, 2024
The best open-source LLM, DeepSeek V3, has just been released! SGLang v0.4.1 is the officially recommended inference solution for it. The SGLang and DeepSeek teams worked together to support DeepSeek V3 FP8 on NVIDIA and AMD GPUs from day one. SGLang has supported MLA and DP
DeepSeek
@deepseek_ai
Dec 26, 2024
🚀 Introducing DeepSeek-V3! Biggest leap forward yet: ⚡ 60 tokens/second (3x faster than V2!) 💪 Enhanced capabilities 🛠 API compatibility intact 🌍 Fully open-source models & papers 🐋 1/n
33K
LMSYS Org
@lmsysorg
May 14, 2025
SGLang, verl, OpenBMB and Tsinghua University: Pioneering End-to-End Multi-Turn RLHF. We are thrilled to announce the release of the first fully functional, convergence-verified, end-to-end open source multi-turn Reinforcement Learning with Human Feedback (RLHF) framework,
18K
LMSYS Org
@lmsysorg
Aug 11, 2025
Honored to see SGLang adopted in RL training for GLM-4.5 at @Zai_org — large-scale validation from a frontier AI lab pushing the boundaries of LLMs!
86K
LMSYS Org
@lmsysorg
Jun 12, 2025
Huge thanks to @AMD for donating an MI350 to SGLang! This advanced AI accelerator is making a meaningful difference, enabling us to move faster in developing scalable LLM systems and pushing the limits of inference optimization. Special thanks to our awesome infra partner
44K
LMSYS Org
@lmsysorg
Apr 28, 2025
Qwen 3 @Alibaba_Qwen has been released! SGLang is proud to be a close partner supporting it from day 0!
28K
LMSYS Org
@lmsysorg
Sep 11, 2025
We are excited to announce SGLang HiCache, our community solution for hierarchical KV caching to power high-performance LLM serving. ⚡ Performance: up to 6× throughput and 80% TTFT reduction demonstrated in benchmarks and real-world deployments. 🗂️ Flexibility: seamless
60K