vLLM
1,091 posts
vLLM
@vllm_project
A high-throughput and memory-efficient inference and serving engine for LLMs. Join slack.vllm.ai to discuss together with the community!
vllm.ai
Joined March 2024
36 Following
43.1K Followers

  • vLLM
    @vllm_project
    October 20, 2025
    🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping
    2M
  • vLLM
    @vllm_project
    April 14, 2025
    🙏 @deepseek_ai's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are bringing changes to the open source community so everyone can immediately benefit!
    open-infra-index/OpenSourcing_DeepSeek_Inference_Engine/README.md at main · deepseek-ai/open-infr...
    From github.com
    203K
  • vLLM
    @vllm_project
    November 3, 2025
    Wow excited to see PewDiePie using vLLM to serve language models locally 😃 vLLM brings easy, fast, and cheap LLM serving for everyone 🥰
    Yuchen Jin
    @Yuchenj_UW
    October 31, 2025
    PewDiePie in 2025: – built a 10×4090 rig – runs Llama 70B, gpt-oss-120B & Qwen 245B locally via vLLM – built a custom web UI (chat, RAG, search, TTS) – ran protein-folding simulations for charity – created an AI “council”, a swarm of 64 models – now fine-tuning his own model
    164K
  • vLLM
    @vllm_project
    September 18, 2025
    Congrats to @deepseek_ai ! DeepSeek-R1 was published in Nature yesterday as the cover article, and vLLM is proud to have supported its RL training and inference🥰
    213K
  • vLLM
    @vllm_project
    August 17, 2025
    🚀 Amazing community project! vLLM CLI — a command-line tool for serving LLMs with vLLM: ✅ Interactive menu-driven UI & scripting-friendly CLI ✅ Local + HuggingFace Hub model management ✅ Config profiles for perf/memory tuning ✅ Real-time server & GPU monitoring ✅ Error
    71K
  • vLLM
    @vllm_project
    February 21, 2025
    We're excited to receive our first #NVIDIADGX B200 system which we'll use for vLLM research and development! Thank you @nvidia!
    119K
  • vLLM
    @vllm_project
    October 16, 2025
    Announcing the completely reimagined vLLM TPU! In collaboration with @Google, we've launched a new high-performance TPU backend unifying @PyTorch and JAX under a single lowering path for amazing performance and flexibility. 🚀 What's New? - JAX + PyTorch: Run PyTorch models on
    157K
  • vLLM
    @vllm_project
    April 17, 2025
    vLLM🤝🤗! You can now deploy any @huggingface language model with vLLM's speed. This integration makes it possible for one consistent implementation of the model in HF for both training and inference. 🧵
    Transformers modeling backend integration in vLLM
    From vllm.ai
    73K
  • vLLM
    @vllm_project
    February 1, 2025
    We landed the 1st batch of enhancements to the @deepseek_ai models, starting MLA and cutlass fp8 kernels. Compared to v0.7.0, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism.
    90K
  • vLLM
    @vllm_project
    September 29, 2025
    How does @deepseek_ai Sparse Attention (DSA) work? It has 2 components: the Lightning Indexer and Sparse Multi-Latent Attention (MLA). The indexer keeps a small key cache of 128 per token (vs. 512 for MLA). It scores incoming queries and selects the top-2048 tokens to pass to Sparse MLA.
    This post is unavailable.
    103K
  • vLLM
    @vllm_project
    September 28, 2025
    🚀 New in vLLM: dots.ocr 🔥 A powerful multilingual OCR model from @xiaohongshu hi lab is now officially supported in vLLM! 📝 Single end-to-end parser for text, tables (HTML), formulas (LaTeX), and layouts (Markdown) 🌍 Supports 100 languages with robust performance on
    merve
    @mervenoyann
    August 5, 2025
    we're all sleeping on this OCR model 🔥 dots.ocr is a new 3B model with sota performance, support for 100 languages & allowing commercial use! 🤯 single e2e model to extract image, convert tables, formula, and more into markdown 📝
    69K
  • vLLM
    @vllm_project
    October 22, 2025
    it’s tokenization again! 🤯 did you know tokenize(detokenize(token_ids)) ≠ token_ids? RL researchers from Agent Lightning coined the term Retokenization Drift — a subtle mismatch between what your model generated and what your trainer thinks it generated. why? because most
    170K
  • vLLM
    @vllm_project
    January 27, 2025
    🚀 With the v0.7.0 release today, we are excited to announce the alpha release of vLLM V1: A major architectural upgrade with 1.7x speedup! Clean code, optimized execution loop, zero-overhead prefix caching, enhanced multimodal support, and more.
    95K
  • vLLM
    @vllm_project
    September 9, 2025
    The amazing blogpost from @gordic_aleksa is alive at vLLM's blogpost blog.vllm.ai/2025/09/05/ana… (after more proofreading and clarifications)! Looking forward to future series of tech deep dive blogposts😍
    Aleksa Gordić (水平问题)
    @gordic_aleksa
    September 1, 2025
    New in-depth blog post - "Inside vLLM: Anatomy of a High-Throughput LLM Inference System". Probably the most in depth explanation of how LLM inference engines and vLLM in particular work! Took me a while to get this level of understanding of the codebase and then to write up
    47K
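
The Retokenization Drift post in the feed above states that tokenize(detokenize(token_ids)) ≠ token_ids. The post text is cut off before its explanation, so what follows is only a minimal sketch of one way such a mismatch can arise, not the Agent Lightning authors' analysis. The three-entry vocabulary, the greedy longest-match rule, and the function names `tokenize` and `detokenize` are all hypothetical and chosen for illustration; real tokenizers are more involved.

```python
# Toy vocabulary (hypothetical, for illustration only): token text -> token id.
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TEXT = {token_id: text for text, token_id in VOCAB.items()}


def detokenize(token_ids):
    """Concatenate the text of each token id."""
    return "".join(ID_TO_TEXT[token_id] for token_id in token_ids)


def tokenize(text):
    """Greedy longest-match tokenization over the toy vocabulary (an assumption)."""
    max_len = max(len(piece) for piece in VOCAB)
    token_ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                token_ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot tokenize text at position {pos}")
    return token_ids


# Suppose a model sampled the tokens "a" and "b" one after the other.
generated = [0, 1]
retokenized = tokenize(detokenize(generated))

print("generated:  ", generated)
print("retokenized:", retokenized)
print("round trip matches:", retokenized == generated)
```

In this toy setup the sampled sequence and the re-tokenized sequence decode to the same string but are different id sequences, so any component that re-tokenizes the text would see tokens the model never sampled. Passing the original token ids along instead of the decoded text avoids the round trip entirely.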