vLLM
1,091 posts
vLLM
@vllm_project
A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss together with the community!
vllm.ai
Joined March 2024
36 Following
43.1K Followers
  • vLLM
    @vllm_project
    Oct 20, 2025
    🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping
    2M
  • vLLM
    @vllm_project
    Apr 14, 2025
    🙏 @deepseek_ai's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are bringing changes to the open source community so everyone can immediately benefit!
    open-infra-index/OpenSourcing_DeepSeek_Inference_Engine/README.md at main · deepseek-ai/open-infr...
    From github.com
    203K
  • vLLM
    @vllm_project
    Nov 3, 2025
    Wow, excited to see PewDiePie using vLLM to serve language models locally 😃 vLLM brings easy, fast, and cheap LLM serving to everyone 🥰
    Yuchen Jin
    @Yuchenj_UW
    Oct 31, 2025
    PewDiePie in 2025: – built a 10×4090 rig – runs Llama 70B, gpt-oss-120B & Qwen 245B locally via vLLM – built a custom web UI (chat, RAG, search, TTS) – ran protein-folding simulations for charity – created an AI “council”, a swarm of 64 models – now fine-tuning his own model
    164K
  • vLLM
    @vllm_project
    Sep 18, 2025
    Congrats to @deepseek_ai ! DeepSeek-R1 was published in Nature yesterday as the cover article, and vLLM is proud to have supported its RL training and inference🥰
    213K
  • vLLM
    @vllm_project
    Aug 17, 2025
    🚀 Amazing community project! vLLM CLI — a command-line tool for serving LLMs with vLLM: ✅ Interactive menu-driven UI & scripting-friendly CLI ✅ Local + HuggingFace Hub model management ✅ Config profiles for perf/memory tuning ✅ Real-time server & GPU monitoring ✅ Error
    71K
  • vLLM
    @vllm_project
    Feb 21, 2025
    We're excited to receive our first #NVIDIADGX B200 system which we'll use for vLLM research and development! Thank you @nvidia!
    119K
  • vLLM
    @vllm_project
    Oct 16, 2025
    Announcing the completely reimagined vLLM TPU! In collaboration with @Google, we've launched a new high-performance TPU backend unifying @PyTorch and JAX under a single lowering path for amazing performance and flexibility. 🚀 What's New? - JAX + PyTorch: Run PyTorch models on
    157K
  • vLLM
    @vllm_project
    Apr 17, 2025
    vLLM🤝🤗! You can now deploy any @huggingface language model with vLLM's speed. This integration makes it possible to use one consistent implementation of the model in HF for both training and inference. 🧵
    Transformers modeling backend integration in vLLM
    From vllm.ai
    73K
  • vLLM
    @vllm_project
    Feb 1, 2025
    We landed the 1st batch of enhancements to the @deepseek_ai models, starting with MLA and CUTLASS FP8 kernels. Compared to v0.7.0, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism.
    90K
  • vLLM
    @vllm_project
    Sep 29, 2025
    How does @deepseek_ai Sparse Attention (DSA) work? It has 2 components: the Lightning Indexer and Sparse Multi-head Latent Attention (MLA). The indexer keeps a small key cache of 128 dimensions per token (vs. 512 for MLA). For each incoming query, it scores the cached tokens, and the top-2048 tokens are passed to Sparse MLA.
    This post is unavailable.
    103K
  • vLLM
    @vllm_project
    Sep 28, 2025
    🚀 New in vLLM: dots.ocr 🔥 A powerful multilingual OCR model from @xiaohongshu hi lab is now officially supported in vLLM! 📝 Single end-to-end parser for text, tables (HTML), formulas (LaTeX), and layouts (Markdown) 🌍 Supports 100 languages with robust performance on
    merve
    @mervenoyann
    Aug 5, 2025
    we're all sleeping on this OCR model 🔥 dots.ocr is a new 3B model with sota performance, support for 100 languages & allowing commercial use! 🤯 single e2e model to extract image, convert tables, formula, and more into markdown 📝
    69K
  • vLLM
    @vllm_project
    Oct 22, 2025
    it’s tokenization again! 🤯 did you know tokenize(detokenize(token_ids)) ≠ token_ids? RL researchers from Agent Lightning coined the term Retokenization Drift — a subtle mismatch between what your model generated and what your trainer thinks it generated. why? because most
    170K
  • vLLM
    @vllm_project
    Jan 27, 2025
    🚀 With the v0.7.0 release today, we are excited to announce the alpha release of vLLM V1: A major architectural upgrade with 1.7x speedup! Clean code, optimized execution loop, zero-overhead prefix caching, enhanced multimodal support, and more.
    95K
  • vLLM
    @vllm_project
    Sep 9, 2025
    The amazing blog post from @gordic_aleksa is now live on the vLLM blog at blog.vllm.ai/2025/09/05/ana… (after more proofreading and clarifications)! Looking forward to a future series of technical deep-dive blog posts😍
    Aleksa Gordić (水平问题)
    @gordic_aleksa
    Sep 1, 2025
    New in-depth blog post - "Inside vLLM: Anatomy of a High-Throughput LLM Inference System". Probably the most in-depth explanation of how LLM inference engines and vLLM in particular work! Took me a while to get this level of understanding of the codebase and then to write up
    47K
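The Sep 29, 2025 post describes DSA as a two-stage flow. A lightweight indexer scores the cached tokens for each query, and only the top 2048 go on to sparse attention. The NumPy code below is a minimal sketch of that flow, not vLLM's or DeepSeek's kernels. It rests on several assumptions. The indexer score is modeled as a ReLU-gated, per-head-weighted dot product, which is our reading of DeepSeek's description and is not stated in the post. The function names and shapes are hypothetical. Plain softmax attention stands in for Sparse MLA.

```python
import numpy as np

# Simplified sketch of DSA-style token selection. The ReLU-weighted multi-head
# score, the shapes, and all function names are illustrative assumptions.

def indexer_scores(q_idx, w_idx, k_idx):
    """Score every cached token for one incoming query.

    q_idx: (H, d) indexer query heads for the current token
    w_idx: (H,)   per-head weights
    k_idx: (T, d) indexer key cache, one small d-dim key per cached token
    Returns (T,) scores: sum over heads of w[h] * relu(q[h] . k[s]).
    """
    per_head = k_idx @ q_idx.T                  # (T, H) dot products
    return np.maximum(per_head, 0.0) @ w_idx    # ReLU, then weighted sum over heads

def select_top_k(scores, k=2048):
    """Indices of the k highest-scoring tokens, sorted by position."""
    if scores.shape[0] <= k:
        return np.arange(scores.shape[0])       # short context: keep every token
    top = np.argpartition(scores, -k)[-k:]      # unordered top-k without a full sort
    return np.sort(top)

def sparse_attention(q, keys, values, selected):
    """Softmax attention over the selected tokens only (plain attention
    standing in for Sparse MLA)."""
    k_sel, v_sel = keys[selected], values[selected]
    logits = k_sel @ q / np.sqrt(q.shape[0])
    weights = np.exp(logits - logits.max())     # numerically stable softmax
    weights /= weights.sum()
    return weights @ v_sel

rng = np.random.default_rng(0)
T, H, d_idx, d = 10_000, 4, 128, 64             # 128-dim indexer keys, as in the post
k_idx = rng.standard_normal((T, d_idx))
q_idx = rng.standard_normal((H, d_idx))
w_idx = rng.random(H)

scores = indexer_scores(q_idx, w_idx, k_idx)
selected = select_top_k(scores, k=2048)

keys = rng.standard_normal((T, d))
values = rng.standard_normal((T, d))
out = sparse_attention(rng.standard_normal(d), keys, values, selected)
print(selected.shape, out.shape)                # (2048,) (64,)
```

The design point is that the indexer keys are small, so scoring all T cached tokens stays cheap. The expensive attention step then runs over at most 2,048 tokens, however long the context grows.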
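The Oct 22, 2025 post points out that tokenize(detokenize(token_ids)) ≠ token_ids. A toy tokenizer is enough to reproduce this. The sketch below makes two assumptions, both hypothetical. The vocabulary has four pieces, and the string "ab" can be written either as one token or as "a" + "b". The tokenizer does greedy longest-match encoding. Real BPE tokenizers work differently, but the same mismatch appears whenever one string has more than one valid tokenization.

```python
# Hypothetical toy vocabulary: "ab" exists both as one token and as "a" + "b".
VOCAB = {"a": 0, "b": 1, "ab": 2, " ": 3}
ID_TO_PIECE = {i: p for p, i in VOCAB.items()}

def detokenize(token_ids):
    """Concatenate the text piece for each token ID."""
    return "".join(ID_TO_PIECE[i] for i in token_ids)

def tokenize(text):
    """Greedy longest-match tokenizer: always prefers the longest vocab piece."""
    max_len = max(len(p) for p in VOCAB)
    ids, pos = [], 0
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos]!r}")
    return ids

generated = [0, 1, 3, 2]          # the model sampled "a", "b", " ", "ab"
text = detokenize(generated)      # "ab ab"
retokenized = tokenize(text)      # greedy encoding merges "a" + "b": [2, 3, 2]
print(repr(text), generated, retokenized)
```

Both ID sequences decode to the same text. A trainer that re-tokenizes the decoded string would therefore score three tokens instead of the four the model actually sampled, and the per-token log-probabilities no longer line up. One straightforward mitigation is to pass the sampled token IDs to the trainer instead of re-tokenizing the text. That mitigation is our assumption here and does not come from the post.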