Anyscale @anyscalecompute

We’ve recently contributed FP8 support to the @vllm_project in collaboration with @neuralmagic. With this feature, you can see up to a 1.8x reduction in inter-token latency, with >99% accuracy preservation! 1/n

3:25 PM · Jul 10, 2024