As platform teams begin supporting agentic AI systems, many are discovering that the infrastructure built for cloud-native applications doesn't naturally extend to AI workloads. Kubernetes excels at scaling stateless services, but AI introduces fundamentally different workload patterns:
▪️ Training needs distributed scheduling, fault tolerance, and fair GPU sharing.
▪️ Inference demands low-latency serving, efficient GPU utilization, and cost-aware placement.
▪️ Reinforcement learning loops combine data processing, training, simulation, and inference into a single continuous workflow.
Teams that simply run containerized AI models on K8s run into GPU contention, fragmented tooling, scheduling complexity, and infrastructure that wasn't designed for multiple AI workload types. The next evolution isn't replacing Kubernetes; it's extending it with AI-native workload orchestration, multi-workload support, and smarter GPU scheduling. Learn how to address this with Ray on Anyscale in Christian Stano's session at PlatformCon 2026 from Platform Engineering: https://lnkd.in/gK_3aMRv
Anyscale
Software Development
San Francisco, California 60,970 followers
Scalable compute for AI and Python. Creators of Ray distributed compute framework.
About us
Anyscale enables Python developers to build and run all their AI—from data prep to training and inference—at any scale. Anyscale is trusted by leading AI teams at Canva, TripAdvisor, Physical Intelligence, Coinbase and more.
- Website: https://anyscale.com
- Industry: Software Development
- Company size: 201-500 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2019
Locations
- Primary: 600 Harrison St, San Francisco, California 94107, US
- 411 High St, Palo Alto, California 94301, US
Updates
-
Anyscale reposted this
I got to present our team's work on how we scale vision AI inference on satellite imagery to continental scale using Ray on Anyscale. A big thanks to the Anyscale team for hosting us and for the invaluable technical guidance they gave us throughout. https://lnkd.in/e238HpYd Milos Colic Vuong Nguyen Pritimoy Podder Pablo Hidalgo Ryan Bashir Ali Sezer Alexandr Plashchinsky
-
Most GPU platforms are built for one user, one cluster, one workload at a time. At Geotab, GPU Docker images took 20 to 30 minutes to load, one researcher occupying a machine meant everyone else waited, and every new team needed its own Terraform setup just to get started. With Anyscale, they built a platform where data scientists can annotate a job with "GPU = 0.1", get exactly that fraction of a GPU, and run alongside a dozen other workloads simultaneously, all without touching Kubernetes. Image load times dropped to 4 to 5 minutes. GPU utilization improved 4x. And the platform team now supports a growing organization of researchers without growing the infrastructure burden alongside it. Full case study: https://lnkd.in/gftDrqb9
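To make the "GPU = 0.1" idea concrete, here is a minimal sketch of how fractional GPU requests can be packed onto whole GPUs. It is a toy model, not Anyscale's or Ray's scheduler: the function name `pack_fractional_jobs`, the job names, and the first-fit placement rule are all assumptions made for illustration. In Ray itself, a request of this kind is usually written as `num_gpus=0.1` on a `@ray.remote` task or actor; the sketch below does not call Ray.

```python
from fractions import Fraction


def pack_fractional_jobs(requests, num_gpus):
    """First-fit placement of fractional GPU requests (hypothetical helper).

    requests: list of (job_name, gpu_fraction) pairs, e.g. ("job-0", 0.1).
    Returns (placement, free): job -> GPU index (or None if nothing fits),
    and the remaining capacity of each GPU.
    """
    # Exact fractions avoid float drift when summing many 0.1 requests.
    free = [Fraction(1) for _ in range(num_gpus)]
    placement = {}
    for name, gpu_fraction in requests:
        need = Fraction(str(gpu_fraction))
        for gpu_id, capacity in enumerate(free):
            if need <= capacity:
                free[gpu_id] = capacity - need
                placement[name] = gpu_id
                break
        else:
            # No GPU has enough room left; a real system would queue the job.
            placement[name] = None
    return placement, free


# Twelve jobs, each asking for a tenth of a GPU, on a two-GPU machine.
jobs = [(f"job-{i}", 0.1) for i in range(12)]
placement, free = pack_fractional_jobs(jobs, num_gpus=2)
jobs_per_gpu = {g: sum(1 for p in placement.values() if p == g) for g in range(2)}
print(jobs_per_gpu)
print(free)
```

Under these assumptions, ten of the twelve jobs share the first GPU and the remaining two land on the second, which is the kind of co-location the post describes.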
-
Anyscale reposted this
Try Ray 2.56!
🚀 Ray 2.56 just landed! The team has been doing a lot of work to reduce OOMs and unnecessary spilling in Ray Data pipelines, driven by improvements in Ray Data memory management, better defaults, better process management, and more. In our testing, we’ve seen:
📉 Batch inference pipelines go from 300+ OOMs in 2.55 to 0 in 2.56
⚡ Training data pipelines with local shuffle improve throughput by 3x
⏱️ Ray Data scheduling loop latency drop by 6x at 2,000-worker scale
🧹 Training pipelines that previously spilled over 70 GB in 2.55 drop to zero spilling in 2.56
If you’ve run into Ray Data issues in the past, we encourage you to try Ray Data 2.56! Read more on the release blog: https://lnkd.in/gK6xuBwN
-
Anyscale reposted this
Robert Nishihara says “Inference is a subroutine of larger more complex AI pipelines”. This is a very succinct way to understand what is happening in AI right now. AI projects are graduating from custom inference to custom models. The business imperative is shifting from simply lower costs to owning a moat. The moat is the data and the AI learning loop. Learning loops require complex orchestration of rollouts, data, evals, policy updates and more across a heterogeneous compute estate of GPUs and CPUs. Inference is a subroutine in this context. It’s still critical. But a part of a whole that is more complex. For this new era of AI, composability becomes a key aspect without giving up on performance. Ray is the backbone for this era with Ray Serve as the most ergonomic way for developers to compose model serving as a part of the AI learning loop. But that is not an excuse for lower performance. Performance still matters in this context. This is why we have focused on improving Ray Serve performance 4.4x for prefill and 28x for decode stages. We are excited for what this does to unify the disparate parts of the AI learning loop into a single cohesive AI backbone for all your varied workload needs. Read more about the performance optimizations in this blog: https://lnkd.in/gVdsg7cj Try it out in Ray 2.56 or easier still on Anyscale, and join us on the Ray Slack to share feedback!
-
Evaluating a robot foundation model is one of the most demanding closed-loop problems in robotics. Before you can trust a policy on a real robot, you need to validate it across thousands of starting conditions, pairing GPU-heavy model inference with GPU-heavy physics simulation, step by step. At scale, evaluation quickly becomes an infrastructure challenge, not just a robotics problem. In this new blog, Ian D. Jordan, PhD explains how to run thousands of simulation rollouts in parallel, scaling from a single machine to distributed clusters with minimal code changes. Learn how robotics teams can maximize GPU utilization, reduce infrastructure overhead, and spend more time improving robot policies. Read the blog: https://lnkd.in/gHvqbEi5
-
Anyscale reposted this
Anyscale just published a case study on our work at Geotab! We've been building our AI platform with Ray and Anyscale to run AI/ML inference at scale efficiently and without burning through GPUs. And the results are:
- 43x peak-hour throughput
- 4x GPU utilization
- 40% fewer GPUs at peak
Read the full case study here: https://lnkd.in/gPHGGgvn #Geotab #Ray #Anyscale #AIPlatform
-
Ray Summit kicks off with a full day of hands-on training on Aug 24. 🛠️ Built for engineers running AI in production, not a weekend hackathon or a deploy-your-first-model tutorial. Choose your own training: select 1 AM track and 1 PM track.
Morning
→ Multimodal data processing pipelines for AI systems
→ Foundation model distributed training with Ray
→ Production-ready distributed inference with Ray Serve
Afternoon
→ Scaling physical AI & robotics systems with Ray
→ Real-time search & recommendation systems for AI commerce
→ LLM post-training and high-performance serving
Passes with training are limited. Secure them now with early pricing at $250! https://lnkd.in/gF3FqYHS
-
Anyscale reposted this
Fast LLM inference with Ray Serve + vLLM + GKE. https://lnkd.in/gMsuYSZR