San Francisco Bay Area
28K followers
500+ connections
Activity
-
Robert Nishihara reposted this:
I got to present our team's work on how we scale vision AI inference on satellite imagery to continental scale using Ray on Anyscale. A big thanks to the Anyscale team for hosting us and for the invaluable technical guidance they gave us throughout. https://lnkd.in/e238HpYd
Milos Colic, Vuong Nguyen, Pritimoy Podder, Pablo Hidalgo, Ryan Bashir, Ali Sezer, Alexandr Plashchinsky
Linked article: How Adyen trains a Transaction Foundation Model (TFM) on 51 trillion tokens and other stories on scaling AI with Ray from Xoople, Criteo, and BMW | Anyscale
-
Robert Nishihara reposted this:
Really enjoyed showcasing how platform teams need to evolve to support large-scale foundation model building and where we see the ecosystem evolving. I frequently talk with infra orgs thinking about how to make the transition from cloud microservices to supporting large-scale GPU fleets for data processing, training, and inference. The number 1 challenge I hear is the amount of noise in the space right now: the tool and cognitive overload is real. This talk is my attempt to create some clarity on what teams should be building in this space. Check it out!

As platform teams begin supporting agentic AI systems, many are discovering that the infrastructure built for cloud-native applications doesn't naturally extend to AI workloads. Kubernetes excels at scaling stateless services, but AI introduces fundamentally different workload patterns:
▪️ Training needs distributed scheduling, fault tolerance, and fair GPU sharing.
▪️ Inference demands low-latency serving, efficient GPU utilization, and cost-aware placement.
▪️ Reinforcement learning loops combine data processing, training, simulation, and inference into a single continuous workflow.
When teams simply try to run containers with AI models on K8s, they run into GPU contention, fragmented tooling, scheduling complexity, and infrastructure that wasn't designed for multiple AI workload types. The next evolution isn't replacing Kubernetes; it's extending it with AI-native workload orchestration, multi-workload support, and smarter GPU scheduling. Learn more from Christian Stano on how to address this with Ray on Anyscale from his session at PlatformCon 2026 from Platform Engineering: https://lnkd.in/gK_3aMRv
-
Robert Nishihara shared this:
Try Ray 2.56!

🚀 Ray 2.56 just landed! The team has been doing a lot of work to reduce OOMs and unnecessary spilling in Ray Data pipelines, driven by improvements in Ray Data memory management, better defaults, better process management, and more. In our testing, we’ve seen:
📉 Batch inference pipelines go from 300+ OOMs in 2.55 to 0 in 2.56
⚡ Training data pipelines with local shuffle improve throughput by 3x
⏱️ Ray Data scheduling loop latency drop by 6x at 2,000-worker scale
🧹 Training pipelines that previously spilled over 70 GB in 2.55 drop to zero spilling in 2.56
If you’ve run into Ray Data issues in the past, we encourage you to try Ray Data 2.56! Read more on the release blog: https://lnkd.in/gK6xuBwN
Linked article: Ray Data 2.56: Improving Reliability for AI Data Pipelines | Anyscale
-
Robert Nishihara reposted this:
Robert Nishihara says “Inference is a subroutine of larger more complex AI pipelines”. This is a very succinct way to understand what is happening in AI right now. AI projects are graduating from custom inference to custom models. The business imperative is shifting from simply lower costs to owning a moat. The moat is the data and the AI learning loop. Learning loops require complex orchestration of rollouts, data, evals, policy updates and more across a heterogeneous compute estate of GPUs and CPUs.

Inference is a subroutine in this context. It’s still critical. But a part of a whole that is more complex. For this new era of AI, composability becomes a key aspect without giving up on performance. Ray is the backbone for this era with Ray Serve as the most ergonomic way for developers to compose model serving as a part of the AI learning loop. But that is not an excuse for lower performance. Performance still matters in this context. This is why we have focused on improving Ray Serve performance 4.4x for prefill and 28x for decode stages.

We are excited for what this does to unify the disparate parts of the AI learning loop into a single cohesive AI backbone for all your varied workload needs. Read more about the performance optimizations in this blog: https://lnkd.in/gVdsg7cj
Try it out in Ray 2.56 or easier still on Anyscale, and join us on the Ray Slack to share feedback!
Linked article: High Performance Distributed Inference with Ray Serve LLM | Anyscale
-
Robert Nishihara reposted this:
Excited to share our latest case study with Geotab, a global leader in connected fleet operations and video telematics! Geotab processes billions of dashcam frames every day to power real-time driver safety insights: things like speed sign detection, risky driving alerts, and live coaching. The challenge? Getting their data scientists to move fast without drowning in infrastructure complexity. By building their video AI platform on Anyscale, here's what they unlocked:

Business Benefits:
- 43x higher peak-hour video processing throughput
- 40% fewer GPUs needed at peak (real cost savings)
- Faster iteration cycles mean safer roads sooner
- Data scientists can self-serve at scale, no bottlenecks

Technical Wins:
- 4x improvement in GPU utilization
- Docker image load times dropped from 20-30 min to 4-5 min
- Fractional GPU allocation with simple Python annotations (no Kubernetes headaches)
- On-demand model deployment: spin up, process, shut down. No idle GPU waste.
- End-to-end batch pipelines without bolting on a separate orchestrator

A huge thank you to the team at Geotab for being incredible partners in pushing the boundaries of fleet AI. It's been awesome seeing what your team builds when infrastructure gets out of the way. Check out the full case study here: https://lnkd.in/e6fEb3tf
#Geotab #Anyscale #GPUCompute #Ray #AIInfrastructure
-
Robert Nishihara reposted this:
What if you could do 43x more processing with 40% fewer GPUs? That's what happened when Geotab made Anyscale the foundational platform to run multiple AI workloads across multiple teams. A great example of how better orchestration and resource utilization can unlock massive gains, without simply throwing more hardware at the problem. 🔥 https://lnkd.in/g45VWQnV
-
Robert Nishihara shared this:
Fast LLM inference with Ray Serve + vLLM + GKE. https://lnkd.in/gMsuYSZR
Linked article: Improving Ray Serve LLM on GKE throughput, latency | Google Cloud Blog
-
Robert Nishihara reposted this:
Today we are excited to announce, in partnership with the Google Kubernetes Engine (GKE) team at Google Cloud, a major milestone in Ray Serve LLM’s throughput and latency characteristics: Ray Serve LLM now matches high-performance, Rust-based routing frameworks such as vllm-router in benchmarks across a variety of workloads and deployment patterns.

In our new blog, we cover three major optimizations to the Ray Serve LLM + vLLM stack that made this possible: direct streaming, a new vLLM Ray executor backend, and HAProxy integration. As a result, we see up to 4.4x higher request throughput than previous versions on prefill-heavy workloads, and up to 24x higher request throughput on decode-heavy workloads.

Ray is a popular choice for complex, Python-native distributed computing and batch inference pipelines with heterogeneous hardware. And now, we believe that Ray’s powerful primitives for fault tolerance, observability, and flexibility across Kubernetes and VMs will enable the next generation of optimizations as LLM inference deployments become increasingly complex.

Thanks to Spencer Peterson, Andrew Sy Kim, Kourosh Hakhamaneshi, Jeffrey (Yu-Che) Wang, Richard Liaw, Akshay Malik, Abrar Sheikh, and Alex Yang, whose contributions to Ray Serve and Ray Serve LLM made this possible. A special thanks to the vllm-router (vLLM) and SGLang Model Gateway (SGLang) teams for great engineering on their respective projects. Read the full writeup here: https://lnkd.in/guHrz_FA
-
Robert Nishihara reacted on this:
Most GPU platforms are built for one user, one cluster, one workload at a time. At Geotab, GPU Docker images took 20 to 30 minutes to load, one researcher occupying a machine meant everyone else waited, and every new team needed its own Terraform setup just to get started.

With Anyscale, they built a platform where data scientists can annotate a job with "GPU = 0.1", get exactly that fraction of a GPU, and run alongside a dozen other workloads simultaneously, all without touching Kubernetes. Image load times dropped to 4 to 5 minutes. GPU utilization improved 4x. And the platform team now supports a growing organization of researchers without growing the infrastructure burden alongside it.

Full case study: https://lnkd.in/gftDrqb9
Experience & Education
-
Anyscale