🚨 Big News! We collaborated with @nvidia to release a DeepSeek R1 inference container optimized for large-scale deployment on GB200 NVL72, the world’s most advanced data center–scale accelerated computing platform. This Docker container runs a single copy of the model across 56 Blackwell GPUs, achieving over 13,149 tokens/sec for prefill and 9,290 tokens/sec for decode. These results represent a 2x and 3x per-GPU increase, respectively, compared to running the model on an H100 cluster nearly twice as large. Read our blog to learn more and download the container to reproduce the results! 👇
