Robert Nishihara

San Francisco Bay Area

28K followers 500+ connections

Anyscale

University of California, Berkeley

Personal Website

Activity

Robert Nishihara reposted this

Marcell Ferencz

17h

I got to present our team's work on how we scale vision AI inference on satellite imagery to continental scale using Ray on Anyscale. A big thanks to the Anyscale team for hosting us and for the invaluable technical guidance they gave us throughout. https://lnkd.in/e238HpYd Milos Colic Vuong Nguyen Pritimoy Podder Pablo Hidalgo Ryan Bashir Ali Sezer Alexandr Plashchinsky

How Adyen trains a Transaction Foundation Model (TFM) on 51 trillion tokens and other stories on scaling AI with Ray from Xoople, Criteo, and BMW | Anyscale

1 Comment
Robert Nishihara reposted this
Christian Stano

9h

Really enjoyed showcasing how platform teams need to evolve to support large scale foundation model building and where we see the ecosystem evolving. I frequently talk with infra orgs thinking about how to make the transition from cloud microservices to supporting large scale GPU fleets for data processing, training, and inference. The number 1 challenge I hear is the amount of noise in the space right now - the tool and cognitive overload is real. This talk is my attempt to create some clarity on what teams should be building in this space. Check it out!

Anyscale

12h

Robert Nishihara reposted this
As platform teams begin supporting agentic AI systems, many are discovering that the infrastructure built for cloud-native applications doesn't naturally extend to AI workloads. Kubernetes excels at scaling stateless services, but AI introduces fundamentally different workload patterns:

▪️ Training needs distributed scheduling, fault tolerance, and fair GPU sharing.
▪️ Inference demands low-latency serving, efficient GPU utilization, and cost-aware placement.
▪️ Reinforcement learning loops combine data processing, training, simulation, and inference into a single continuous workflow.

Teams that simply run containers with AI models on K8s run into GPU contention, fragmented tooling, scheduling complexity, and infrastructure that wasn't designed for multiple AI workload types. The next evolution isn't replacing Kubernetes; it's extending it with AI-native workload orchestration, multi-workload support, and smarter GPU scheduling.

Learn more from Christian Stano on how to address this with Ray on Anyscale from his session at PlatformCon 2026 from Platform Engineering: https://lnkd.in/gK_3aMRv

Robert Nishihara

2d
Robert Nishihara shared this
Try Ray 2.56!

Richard Liaw

2d

🚀 Ray 2.56 just landed! The team has been doing a lot of work to reduce OOMs and unnecessary spilling in Ray Data pipelines, driven by improvements in Ray Data memory management, better defaults, better process management, and more. In our testing, we’ve seen:

📉 Batch inference pipelines go from 300+ OOMs in 2.55 to 0 in 2.56
⚡ Training data pipelines with local shuffle improve throughput by 3x
⏱️ Ray Data scheduling loop latency drop by 6x at 2,000-worker scale
🧹 Training pipelines that previously spilled over 70 GB in 2.55 drop down to zero spilling in 2.56

If you’ve run into Ray Data issues in the past, we encourage you to try Ray Data 2.56! Read more on the release blog: https://lnkd.in/gK6xuBwN

Ray Data 2.56: Improving Reliability for AI Data Pipelines | Anyscale

1 Comment
Robert Nishihara reposted this

Keerti Melkote

2d

Robert Nishihara says “Inference is a subroutine of larger more complex AI pipelines”. This is a very succinct way to understand what is happening in AI right now. AI projects are graduating from custom inference to custom models. The business imperative is shifting from simply lower costs to owning a moat. The moat is the data and the AI learning loop.

Learning loops require complex orchestration of rollouts, data, evals, policy updates and more across a heterogeneous compute estate of GPUs and CPUs. Inference is a subroutine in this context. It’s still critical. But a part of a whole that is more complex.

For this new era of AI, composability becomes a key aspect without giving up on performance. Ray is the backbone for this era with Ray Serve as the most ergonomic way for developers to compose model serving as a part of the AI learning loop. But that is not an excuse for lower performance. Performance still matters in this context. This is why we have focused on improving Ray Serve performance 4.4x for prefill and 28x for decode stages.

We are excited for what this does to unify the disparate parts of the AI learning loop into a single cohesive AI backbone for all your varied workload needs. Read more about the performance optimizations in this blog: https://lnkd.in/gVdsg7cj Try it out in Ray 2.56 or easier still on Anyscale, and join us on the Ray Slack to share feedback!

High Performance Distributed Inference with Ray Serve LLM | Anyscale

Robert Nishihara reposted this

Chad Carlisle

3d

Excited to share our latest case study with Geotab, a global leader in connected fleet operations and video telematics! Geotab processes billions of dashcam frames every day to power real-time driver safety insights: things like speed sign detection, risky driving alerts, and live coaching. The challenge? Getting their data scientists to move fast without drowning in infrastructure complexity. By building their video AI platform on Anyscale, here's what they unlocked:

Business Benefits:
- 43x higher peak-hour video processing throughput
- 40% fewer GPUs needed at peak (real cost savings)
- Faster iteration cycles mean safer roads sooner
- Data scientists can self-serve at scale, no bottlenecks

Technical Wins:
- 4x improvement in GPU utilization
- Docker image load times dropped from 20-30 min to 4-5 min
- Fractional GPU allocation with simple Python annotations (no Kubernetes headaches)
- On-demand model deployment: spin up, process, shut down. No idle GPU waste.
- End-to-end batch pipelines without bolting on a separate orchestrator

A huge thank you to the team at Geotab for being incredible partners in pushing the boundaries of fleet AI. It's been awesome seeing what your team builds when infrastructure gets out of the way. Check out the full case study here: https://lnkd.in/e6fEb3tf #Geotab #Anyscale #GPUCompute #Ray #AIInfrastructure
Robert Nishihara reposted this

Julian Forero

1w

What if you could do 43x more processing with 40% fewer GPUs? That's what happened when Geotab made Anyscale the foundational platform to run multiple AI workloads across multiple teams. A great example of how better orchestration and resource utilization can unlock massive gains without simply throwing more hardware at the problem. 🔥 https://lnkd.in/g45VWQnV

Robert Nishihara

1w
Robert Nishihara shared this
Fast LLM inference with Ray Serve + vLLM + GKE. https://lnkd.in/gMsuYSZR

Improving Ray Serve LLM on GKE throughput, latency | Google Cloud Blog

3 Comments
Robert Nishihara reposted this

Seiji Eicher

2w

Today we are excited to announce, in partnership with the Google Kubernetes Engine (GKE) team at Google Cloud, a major milestone in Ray Serve LLM’s throughput and latency characteristics: Ray Serve LLM now matches high-performance, Rust-based routing frameworks such as vllm-router in benchmarks across a variety of workloads and deployment patterns.

In our new blog, we cover three major optimizations to the Ray Serve LLM + vLLM stack that made this possible: direct streaming, a new vLLM Ray executor backend, and HAProxy integration. As a result, we see up to 4.4x higher request throughput than previous versions on prefill-heavy workloads, and up to 24x higher request throughput on decode-heavy workloads.

Ray is a popular choice for complex, Python-native distributed computing batch inference pipelines with heterogeneous hardware. And now, we believe that Ray’s powerful primitives for fault tolerance, observability, and flexibility across Kubernetes and VMs will enable the next generation of optimizations as LLM inference deployments become increasingly complex.

Thanks to Spencer Peterson, Andrew Sy Kim, Kourosh Hakhamaneshi, Jeffrey (Yu-Che) Wang, Richard Liaw, Akshay Malik, Abrar Sheikh, and Alex Yang, whose contributions to Ray Serve and Ray Serve LLM made this possible. A special thanks to the vllm-router (vLLM) and SGLang Model Gateway (SGLang) teams for great engineering on their respective projects. Read the full writeup here: https://lnkd.in/guHrz_FA

4 Comments

Robert Nishihara reacted on this

Anyscale

1d

Most GPU platforms are built for one user, one cluster, one workload at a time. At Geotab, GPU Docker images took 20 to 30 minutes to load, one researcher occupying a machine meant everyone else waited, and every new team needed its own Terraform setup just to get started. With Anyscale, they built a platform where data scientists can annotate a job with "GPU = 0.1", get exactly that fraction of a GPU, and run alongside a dozen other workloads simultaneously, all without touching Kubernetes. Image load times dropped to 4 to 5 minutes. GPU utilization improved 4x. And the platform team now supports a growing organization of researchers without growing the infrastructure burden alongside it. Full case study: https://lnkd.in/gftDrqb9

Experience & Education

Anyscale

********

******** ****** ********* ** *********
********* ********

******** ****** ******** ******** *** ********** ******
********** ** *********** ********

****** ** ********** ***** ******** *******

2013 - 2019
******* **********

** ***********

2009 - 2013

Other similar profiles

Josh Lessing

Shockwave Ventures

14K followers
United States

Jian Huang

University of Tennessee

10K followers
Knoxville, TN

Matt Conger

Entrepreneurs' Organization

17K followers
Singapore

Robert Sweeney

Meta

167K followers
Mountain View, CA

Joel Butterly

InGenius Prep

27K followers
Greater Boston

Kieran Snyder

Microsoft

27K followers
Seattle, WA

Samay Kohli

Budy

11K followers
San Francisco Bay Area

Vidya Narayanan

FinalLayer

15K followers
San Francisco Bay Area

Sachin Gupta

Breakout

29K followers
San Jose, CA

Robert Rose

Reliable Robotics Corporation

17K followers
San Francisco Bay Area
