LMSYS Org
1,164 posts
LMSYS Org
@lmsysorg
Large Model Systems Organization. Join our Slack: slack.sglang.io. We developed SGLang (sglang.io), Chatbot Arena (now @arena), and Vicuna!
US
lmsys.org
Joined August 2024
199 Following · 15.8K Followers
  • LMSYS Org
    @lmsysorg
    22 Sept 2025
    SGLang now supports deterministic LLM inference! Building on @thinkymachines batch-invariant kernels, we integrated deterministic attention & sampling ops into a high-throughput engine - fully compatible with chunked prefill, CUDA graphs, radix cache, and non-greedy sampling. ✅
    112K
  • LMSYS Org
    @lmsysorg
    5 May 2025
    🚀 Breaking: SGLang provides the first open-source implementation to serve @deepseek_ai V3/R1 models with large-scale expert parallelism and prefill-decode disaggregation on 96 GPUs. It nearly matches the throughput reported by the official DeepSeek blog, achieving 52.3K input
    162K
  • LMSYS Org
    @lmsysorg
    14 Oct 2025
    🚀 SGLang In-Depth Review of the NVIDIA DGX Spark is LIVE! Thanks to @nvidia’s early access program, SGLang makes its first ever appearance in a consumer product, the brand-new DGX Spark. The DGX Spark’s 128GB Unified Memory and Blackwell architecture set a new standard for
    411K
  • LMSYS Org
    @lmsysorg
    29 Sept 2025
    🎉 Congrats to the DeepSeek team on the amazing release of Sparse Attention (DSA) in V3.2! This fine-grained design sets a new bar for long-context efficiency 🚀 We’re proud that SGLang is an official inference framework for DeepSeek-V3.2 — with optimized sparse attention
    DeepSeek
    @deepseek_ai
    29 Sept 2025
    🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model! ✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention(DSA) for faster, more efficient training & inference on long context. 👉 Now live on App, Web, and API. 💰 API prices cut by 50%+! 1/n
    56K
  • LMSYS Org
    @lmsysorg
    7 Nov 2025
    🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models. ⚡️ Up to 5.9× faster inference 🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux 🧰 Easy to use via OpenAI-compatible API, CLI & Python API
    109K
  • LMSYS Org
    @lmsysorg
    31 May 2025
    Hello everyone, the SGLang community, in collaboration with the Search R1 team, has quickly reproduced Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning based on the previously open-sourced multi-turn RL. We welcome you to get hands-on
    18K
  • LMSYS Org
    @lmsysorg
    30 Jul 2025
    🚨Big News! We collaborated with @nvidia to release a DeepSeek R1 inference container optimized for large scale deployment on GB200 NVL72, the world’s most advanced data center–scale accelerated computing platform. This docker container runs a single copy of the model across 56
    29K
  • LMSYS Org
    @lmsysorg
    25 Sept 2025
    🚀 Follow-up to our last breakthrough on DeepSeek V3/R1 inference! On NVIDIA GB200 NVL72, SGLang now achieves 26k input tokens/s and 13k output tokens/s per GPU with FP8 attention + NVFP4 MoE - that’s a 3.8× / 4.8× speedup vs H100 settings. See the details in the 🧵 (1/4)
    69K
  • LMSYS Org
    @lmsysorg
    26 Dec 2024
    The best open-source LLM, DeepSeek V3, has just been released! SGLang v0.4.1 is the officially recommended inference solution for it. The SGLang and DeepSeek teams worked together to support DeepSeek V3 FP8 on NVIDIA and AMD GPUs from day one. SGLang has supported MLA and DP
    DeepSeek
    @deepseek_ai
    26 Dec 2024
    🚀 Introducing DeepSeek-V3! Biggest leap forward yet: ⚡ 60 tokens/second (3x faster than V2!) 💪 Enhanced capabilities 🛠 API compatibility intact 🌍 Fully open-source models & papers 🐋 1/n
    33K
  • LMSYS Org
    @lmsysorg
    14 May 2025
    SGLang, verl, OpenBMB and Tsinghua University: Pioneering End-to-End Multi-Turn RLHF. We are thrilled to announce the release of the first fully functional, convergence-verified, end-to-end open-source multi-turn Reinforcement Learning from Human Feedback (RLHF) framework,
    18K
  • LMSYS Org
    @lmsysorg
    11 Aug 2025
    Honored to see SGLang adopted in RL training for GLM-4.5 at @Zai_org — large-scale validation from a frontier AI lab pushing the boundaries of LLMs!
    86K
  • LMSYS Org
    @lmsysorg
    12 Jun 2025
    Huge thanks to @AMD for donating an MI350 to SGLang! This advanced AI accelerator is making a meaningful difference, enabling us to move faster in developing scalable LLM systems and pushing the limits of inference optimization. Special thanks to our awesome infra partner
    44K
  • LMSYS Org
    @lmsysorg
    28 Apr 2025
    Qwen 3 @Alibaba_Qwen has been released! SGLang is proud to be a close partner supporting it from day 0!
    28K
  • LMSYS Org
    @lmsysorg
    11 Sept 2025
    We are excited to announce SGLang HiCache, our community solution for hierarchical KV caching to power high-performance LLM serving. ⚡ Performance: up to 6× throughput and 80% TTFT reduction demonstrated in benchmarks and real-world deployments. 🗂️ Flexibility: seamless
    60K