Akshay Malik

San Francisco Bay Area


3K followers 500+ connections


Anyscale

University of California, Berkeley, Haas School of Business

Activity

Seiji Eicher

2w

Akshay Malik reposted this
Today we are excited to announce, in partnership with the Google Kubernetes Engine (GKE) team at Google Cloud, a major milestone in Ray Serve LLM’s throughput and latency characteristics: Ray Serve LLM now matches high-performance, Rust-based routing frameworks such as vllm-router in benchmarks across a variety of workloads and deployment patterns.

In our new blog, we cover three major optimizations to the Ray Serve LLM + vLLM stack that made this possible: direct streaming, a new vLLM Ray executor backend, and HAProxy integration. As a result, we see up to 4.4x higher request throughput than previous versions on prefill-heavy workloads, and up to 24x higher request throughput on decode-heavy workloads.

Ray is a popular choice for complex, Python-native distributed computing and batch inference pipelines with heterogeneous hardware. And now, we believe that Ray’s powerful primitives for fault tolerance, observability, and flexibility across Kubernetes and VMs will enable the next generation of optimizations as LLM inference deployments become increasingly complex.

Thanks to Spencer Peterson, Andrew Sy Kim, Kourosh Hakhamaneshi, Jeffrey (Yu-Che) Wang, Richard Liaw, Akshay Malik, Abrar Sheikh, and Alex Yang, whose contributions to Ray Serve and Ray Serve LLM made this possible. A special thanks to the vllm-router (vLLM) and SGLang Model Gateway (SGLang) teams for great engineering on their respective projects.

Read the full writeup here: https://lnkd.in/guHrz_FA

4 Comments

Goku Mohandas

2w

Akshay Malik reposted this
Wrote a technical deep dive on Satya Nadella's point that the moat is the learning loop you build on a model (not the model you rent). A top-down walkthrough, with diagrams and code, of why companies across industries (finance, robotics, autonomy, ecommerce, and biology) are already doing it and how they're winning.

A technical guide to building your own learning loop

Goku Mohandas
5 Comments

Kourosh Hakhamaneshi

2w

Akshay Malik reposted this
One pattern we keep seeing with teams serving LLMs at scale: Prefill-decode disaggregation is often treated like a magic wand. Turn it on, and latency/throughput should get better. But the reality is more nuanced.

PD can deliver major gains (in our experiments, up to 2.7x better goodput and up to 67% compute cost reduction on AMD MI325X with Ray Serve + vLLM), but only when the workload and SLA are a good fit. That is why we wrote this post: to share the core insights for when PD helps, when it does not, and how to reason about it in practice.

A few takeaways:
1. PD does not make prefill faster. It adds a KV transfer step, so TTFT can get worse. If your SLA is strictly TTFT-bound, aggregated serving is often simpler and better.
2. PD’s real win is TPOT. By separating prefill and decode onto dedicated GPUs, decode avoids prefill interruptions and TPOT stays much flatter under load.
3. TPOT savings compound over generation length. A few milliseconds per token may look small, but over hundreds or thousands of generated tokens, it can become a meaningful E2E latency and throughput improvement.
4. The P ratio is workload-dependent. Input/output length, KV cache hit rate, target QPS, and latency SLA all affect the optimal split. A bad ratio can make PD worse than aggregated.

We also validated this on AMD + vLLM, where the path for prefill-decode disaggregation has been much less paved.

Full post with intuition, benchmarks, and reproducible AMD + Ray + vLLM setup: https://lnkd.in/gnrmmSrK
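
The first three takeaways can be made concrete with a little arithmetic. The sketch below models end-to-end latency as TTFT plus TPOT for each remaining output token. The function names and every millisecond figure are illustrative assumptions, not numbers from the post or its benchmarks.

```python
import math


def e2e_latency_ms(ttft_ms: float, tpot_ms: float, output_tokens: int) -> float:
    """Simple model: the first token costs TTFT, each later token costs TPOT."""
    return ttft_ms + tpot_ms * (output_tokens - 1)


def break_even_tokens(extra_ttft_ms: float, tpot_saving_ms: float) -> int:
    """Smallest output length at which PD is strictly faster end to end."""
    if tpot_saving_ms <= 0:
        raise ValueError("PD needs a positive TPOT saving to ever break even")
    # PD wins once tpot_saving * (n - 1) > extra_ttft.
    return max(1, math.floor(extra_ttft_ms / tpot_saving_ms) + 2)


# Hypothetical numbers, chosen only for illustration.
AGG = {"ttft_ms": 300.0, "tpot_ms": 30.0}  # aggregated serving
PD = {"ttft_ms": 350.0, "tpot_ms": 25.0}   # PD: +50 ms KV transfer, -5 ms TPOT

for n in (8, 1000):
    agg = e2e_latency_ms(output_tokens=n, **AGG)
    pd = e2e_latency_ms(output_tokens=n, **PD)
    print(f"{n:>5} tokens: aggregated={agg:.0f} ms, PD={pd:.0f} ms")

print("break-even output length:", break_even_tokens(50.0, 5.0))
```

Under these assumed numbers, short generations favor aggregated serving because the KV transfer penalty dominates, while long generations favor PD because the per-token saving accumulates.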

Achieving Up to 67% Cost Savings with Prefill-Decode Disaggregation Using Ray + vLLM on AMD MI325X | Anyscale

6 Comments
Sumanth R Hegde

1mo

Akshay Malik reposted this
Check out our work on native RL APIs for vLLM! Blog: https://lnkd.in/giq22pwP

Aaron Hao

1mo

Akshay Malik reposted this
Excited to share some of our work on improving vLLM for RL! A number of RL frameworks, including SkyRL, use vLLM for inference, and we’ve noticed some common problems:
- Weight syncing between training and inference is implemented in an ad-hoc fashion and duplicated across frameworks.
- Asynchronous RL is prone to break at scale, especially in P/D and DPEP deployments.

We’ve been working on improving both! For more details check out: https://lnkd.in/geWNvSav

Weight syncing in vLLM

Weight syncing with vLLM has typically been implemented with ad-hoc worker extensions and RPC endpoints. While this works, it leads to a few issues. Most frameworks typically care about specifying the transport logic of how exactly vLLM will receive weights, but now they also need to deal with ad-hoc pre/postprocessing. Many frameworks also end up duplicating transport logic for popular strategies like NCCL and CUDA IPC, as well as implementing the same optimizations (e.g. packed tensor). This also leads to version-locked implementations because they reach into vLLM internals.

We introduce native APIs for weight transfer: /init_weight_transfer_engine, /start_weight_update, /update_weights and /finish_weight_update for different stages of weight transfer. Along with these endpoints is a WeightTransferEngine abstraction allowing users to specify custom transport logic for receiving weights. We provide NCCL and CUDA IPC implementations out of the box, but framework developers can bring their own. Because the APIs are simple, they still allow for advanced use cases like sharded weight transfer from M trainer ranks to N inference ranks. See my prototype here: https://lnkd.in/gbZAhNik

Asynchronous RL with vLLM

Async RL is the default for reasoning and agentic RL to maximize utilization with long-tailed trajectories. We’ve worked on upgrading async RL with vLLM:
- New pause mode to preserve requests in the scheduler: Users don’t need to manually bookkeep requests.
- Deadlock fixes for DPEP: This one took many iterations! DPEP requires careful coordination between vLLM engines for generation, and we ensure the same with weight syncing!

The fixes have been tested at scale: Prime Intellect has validated async RL training for zai-org/GLM-5.1-FP8 in a P/D, DPEP32 deployment across 16 8xH200 nodes.

It’s been great working with Sumanth R Hegde on this effort!
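
As a rough illustration of how the four weight-transfer stages might fit together, the sketch below tracks the order in which a trainer could call them. Only the endpoint paths come from the post. The call order is inferred from the endpoint names, and the `Phase` and `WeightSyncSession` names are hypothetical; nothing here sends a request or models vLLM's actual request bodies.

```python
from enum import Enum, auto


class Phase(Enum):
    UNINITIALIZED = auto()
    READY = auto()
    UPDATING = auto()


# endpoint -> (phase required before the call, phase after the call).
# This ordering is an assumption inferred from the endpoint names.
TRANSITIONS = {
    "/init_weight_transfer_engine": (Phase.UNINITIALIZED, Phase.READY),
    "/start_weight_update": (Phase.READY, Phase.UPDATING),
    "/update_weights": (Phase.UPDATING, Phase.UPDATING),
    "/finish_weight_update": (Phase.UPDATING, Phase.READY),
}


class WeightSyncSession:
    """Hypothetical client-side bookkeeping; performs no network I/O."""

    def __init__(self) -> None:
        self.phase = Phase.UNINITIALIZED
        self.calls: list[str] = []

    def call(self, endpoint: str) -> None:
        required, next_phase = TRANSITIONS[endpoint]
        if self.phase is not required:
            raise RuntimeError(
                f"{endpoint} needs phase {required.name}, got {self.phase.name}"
            )
        self.calls.append(endpoint)
        self.phase = next_phase


session = WeightSyncSession()
session.call("/init_weight_transfer_engine")  # once per deployment
session.call("/start_weight_update")          # begin one sync round
session.call("/update_weights")               # may repeat, e.g. per chunk
session.call("/update_weights")
session.call("/finish_weight_update")         # ready for the next round
```

Keeping the ordering in one table makes it easy to adjust if the real lifecycle differs from this guess.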

Akshay Malik

1mo
Akshay Malik shared this
The Ray Core and Ray Data teams at Anyscale are actively hiring systems engineers in Bengaluru!

Ray Core serves as the cornerstone of the entire Ray ecosystem and powers libraries like Ray Train, Ray Data, and Ray Serve, which have been quickly adopted by companies like OpenAI, DeepSeek, Spotify, Uber, DoorDash, Pinterest, Apple, and many more. Ray Data takes this further as the scalable data processing engine behind the training and inference pipelines of some of the largest AI models in the world.

In this role, you will play a pivotal part in shaping the future of Ray and Anyscale. Particularly in the context of the growing importance of open source and LLMs, you will be a crucial contributor to our strategic goal of establishing ourselves as the compute substrate of this unprecedented AI wave.

If the prospect of tackling challenges like:
→ Scaling clusters to 10K+ nodes
→ Optimizing network transfer speed for petabyte-scale workloads
→ Building the data and execution infrastructure behind large multi-modal models
→ Pushing the limits of distributed training and inference
excites you, I encourage you to apply or message me directly if you have any questions.

📍 Bengaluru
🔗 Open roles: https://lnkd.in/ga6R6bvJ

#Hiring #Bengaluru #DistributedSystems #RayCore #RayData #OpenSource #LLM #MLInfra

Anyscale Jobs

3 Comments

Kunling Geng

2mo

Akshay Malik reposted this
Today, we are officially making Anyscale Agent Skills Generally Available!

Over the past year at Anyscale, I've watched the same pattern repeat: teams adopt AI coding agents, point them at Ray workloads, and hit the same walls: wrong GPU configs, stale APIs, broken deploys. The agent writes confident code that fails at runtime.

My team and I wanted to fix that at the source. Not by building another chatbot, but by encoding what our field engineering team has learned across hundreds of production Ray deployments directly into the tools developers already use.

The part I'm most proud of: agents with these skills don't just generate code. They ask the right questions before writing a single line. They validate GPU memory constraints, use current Ray APIs, and produce configs pulled from tested templates, not hallucinated ones.

Now, your agent actually understands how to build and operate Ray:
🔹 Workload Skills: Turn a single prompt into production-ready configs using current Ray APIs and validated templates.
🔹 Platform Skills: Read live logs and metrics from the Anyscale API to diagnose failures, patch code, and redeploy in the exact same conversation.
🔹 Infra Skills: Get guided, step-by-step help deploying Anyscale on Kubernetes or cloud VMs tailored to your specific environment.

In addition, Anyscale is launching a limited-access Optimization Services Program where agents paired with our engineers analyze throughput bottlenecks and GPU waste to generate tuning recommendations that help optimize the cost and performance of production AI workloads.

Read the launch blog: https://lnkd.in/gDki6PMm

3 Comments

Julian Forero

2mo

Akshay Malik reposted this
AI is not only shifting to bigger models. It is also shifting toward more complex data pipelines.

To build AI models or search services with multimodal capabilities (e.g. vision language models, or VLMs, or search over both images and text), teams need to combine a mix of Python libraries and frameworks, along with a combination of CPU and GPU resources, to complete one end-to-end pipeline.

While CPUs still play a critical role, GPU demand is increasing for data preprocessing. Preparing data now also requires GPUs, since embedding generation steps and tasks like image captioning and text summarization rely on AI models. This increased demand for GPUs makes workload orchestration more complex, as GPU capacity is limited and not always available in the same cloud or region when you need it.

The industry most clearly driving the demands of this modern AI stack is physical AI, and it's cool to see it all happening in real time in teams like Multiply Labs, Bonsai Robotics, Physical Intelligence, and others pushing the boundaries of what infrastructure needs to look like for this new world.

Link to their stories in the comments

3 Comments

Varun Bhatia

2mo

Akshay Malik reposted this
I recently had the opportunity to join a panel with Nebius and Anyscale to discuss a challenge every robotics team is facing: developing resilient, scalable architecture for the era of foundation models and physical AI.

Brittle infrastructure is one of the biggest bottlenecks to R&D velocity today, but at Multiply Labs, we’ve turned that hurdle into a competitive advantage. I supported the shift to a "one-click" deployment model, building the infrastructure-as-code pipelines that allow our team to deploy seamlessly across AWS, GCP, and Nebius.

By partnering with Anyscale to create a fully portable, multi-cloud Ray environment, we’ve been able to drastically reduce the time it takes for a developer to spin up training software. Our training is now cloud-agnostic, letting us follow GPU capacity across clouds without any operational burden.

If you’re scaling distributed training and thinking about how your infrastructure needs to evolve as robotics models get more capable, check out the full case study on how Multiply Labs built our multi-cloud foundation using Anyscale: https://lnkd.in/gS28uAXy

Multiply Labs Advances AI for Biologics Robotics on Anyscale

7 Comments

Jeffrey (Yu-Che) Wang

3mo

Akshay Malik reposted this
WideEP has become the industry standard for serving large MoE models like DeepSeek-V3. By distributing experts across a large number of GPUs, WideEP expands effective GPU memory for KV caches, enabling larger batch sizes and higher throughput.

But WideEP introduces a critical production challenge: fault blast radius. Because the dispatch-combine collective requires all ranks to participate together, a single GPU failure can take down the entire DP/EP group. At a typical WideEP width of 32 GPUs, one failed GPU means 32 GPUs go dark, and so does your serving availability.

Ray Serve LLM solves this with DP Group Fault Tolerance. Here's how it works 👉

WideEP Refresher

In DeepSeek-style MoE, the attention layer uses Multi-head Latent Attention (MLA). Unlike standard multi-head attention, where tensor parallelism can be applied across KV heads, MLA compresses the KV cache into a shared latent representation, making KV-head sharding incompatible. The common WideEP strategy is therefore to replicate the MLA layer across all participating ranks and to apply data parallelism (DP) at the request level.

In sparse MoE LLMs, the linear layers consist of a collection of smaller linear layers, each representing an expert. For example, DeepSeek-V3 has 256 experts per MoE layer, where only 8 experts are activated at a time. In WideEP deployments, the experts are spread across the participating ranks, and this is known as expert parallelism (EP). Together, the replicated DP attention layer and the sharded EP MoEs form a DP/EP group.

Why DP Group is the Right Unit for Fault Tolerance

In WideEP deployments, partial DP/EP groups are not functional. Tokens processed on a certain DP attention rank may be routed to an expert living on a different EP rank. The control plane should never expose partial groups to traffic. We apply gang scheduling to deliver this atomicity requirement within a DP group.

Ray Serve, acting as the control plane, implements gang scheduling to ensure all ranks in a gang are scheduled, health-checked, torn down, and recovered together. This gives Ray Serve the right orchestration semantics for WideEP:
• All ranks in a DP group are scheduled together.
• A rank failure invalidates the DP group.
• The faulty DP group is torn down and recreated atomically.
• Ray Serve router continues sending traffic to other healthy DP groups.

This enables minimal downtime and an effective recovery mechanism for serving WideEP deployments in production.

Shout out to the team who makes this happen: Kourosh Hakhamaneshi, Abrar Sheikh, and Seiji Eicher! Check out the full writeup here: https://lnkd.in/gMqkmxxE.
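
The group-level semantics described above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Ray Serve's implementation: `DPGroup`, `Router`, and all of their methods are hypothetical names, and the round-robin policy is chosen only for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class DPGroup:
    """Toy DP/EP group: it serves traffic only if every rank is healthy."""
    group_id: int
    width: int                                   # e.g. 32 GPUs per group
    healthy: list = field(init=False, default_factory=list)
    generation: int = 0                          # bumped on each recreation

    def __post_init__(self) -> None:
        self.healthy = [True] * self.width

    def is_serving(self) -> bool:
        return all(self.healthy)

    def fail_rank(self, rank: int) -> None:
        self.healthy[rank] = False               # one failure invalidates the group

    def dark_gpus(self) -> int:
        """Blast radius: every GPU in a non-serving group is out of service."""
        return 0 if self.is_serving() else self.width

    def recreate(self) -> None:
        """Tear down and bring back all ranks together (atomic in this model)."""
        self.healthy = [True] * self.width
        self.generation += 1


class Router:
    """Round-robin over fully healthy groups only; partial groups get no traffic."""

    def __init__(self, groups: list) -> None:
        self.groups = groups
        self._next = 0

    def route(self) -> int:
        serving = [g for g in self.groups if g.is_serving()]
        if not serving:
            raise RuntimeError("no healthy DP group available")
        group = serving[self._next % len(serving)]
        self._next += 1
        return group.group_id

    def recover(self) -> list:
        recreated = []
        for g in self.groups:
            if not g.is_serving():
                g.recreate()
                recreated.append(g.group_id)
        return recreated
```

For example, with two groups of width 32, failing a single rank in group 0 takes all 32 of its GPUs out of rotation while the router keeps sending requests to group 1 until `recover()` recreates group 0.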

2 Comments


Lanxiang Hu

1w

Akshay Malik liked this
Introducing JetSpec: we find speculative decoding can push LLM generation latency to the extreme by co-optimizing drafting cost and drafting quality with causal parallel tree drafting. JetSpec reaches up to 9.64× end-to-end speedup on MATH-500 and 4.58× on open-ended chat while remaining lossless. With CUDA graph and kernel optimizations, JetSpec further translates to around 1000 TPS on a single B200 GPU. ⚡️

Prior SD faces a dilemma:
1. AR-style draft heads preserve causality for quality, but drafting cost grows with tree depth.
2. Block-diffusion style heads draft cheaply in one pass, but branches are often scored independently, so deeper paths can become mutually inconsistent.

JetSpec enables such speed by drafting a causality-preserving tree in one single pass. 🚀🌳

Check out our project page for demos and how we built it 👇 https://lnkd.in/gjg9rutV
📖 Paper: https://lnkd.in/gcGFWMTp
💻 Code: https://lnkd.in/gywd9vu2


Arun Kumar

1w

Akshay Malik liked this
Promoted to Full Professor at UC San Diego. 😊 13 years ago, I almost dropped out of my PhD midway, believing I was not cut out for research. My family and my then advisors/mentors convinced me to continue, via their faith in me and by giving me the freedom to pursue my research interests while staying engaged with their insightful feedback. It took me 7 years (!) to finish my MS + PhD at University of Wisconsin-Madison. Longer than most students around me, but I had internalized a key message in Prof. David Patterson’s famous advice deck: “”” Concentrate on graduating as fast as possible? … To a person in their 40s or 50s, 1 or 2 more years is roundoff error (27 = 29) “”” So, in 2015 I convinced my advisors to fund me for an extra year so that I can finish one last paper for my thesis. That paper almost got rejected by SIGMOD’16, with sharply polarized reviews from 6 (!) reviewers (likely a record for DB venues). But it got accepted after a revision and ended up being a big part of my job talk, spurring new links between the DB and ML/AI worlds. Looking back now as a full professor at 38, perhaps that extra year turned out to be a reasonable trade after all. 😄 I am grateful for the last decade at UCSD being an absolute blast — fantastic students, amazing colleagues/mentors/friends, a thriving and inclusive campus community — and all that in an incredibly beautiful city! I’ve been enjoying working with and helping various parts of UCSD — CSE, HDSI, SDSC, QI/CalIT2, Public Health, Extension, LGBT Resource Center, oSTEM, STARS, MAP — via my service, teaching, mentoring, and/or research, by establishing new bridges with industry/startups/OSS communities, and via outreach to the wider SD/SoCal ecosystem. Looking forward to amping up on all fronts in the coming decades! 
🥂 Finally, if you are reading this as a grad student or junior faculty doubting yourself, I have this to say: hold that doubt gently, balanced just enough to perhaps propel your self-growth but without crushing your self-confidence. You have my best wishes!
151 Comments

Amir Haghighat

1w

Akshay Malik liked this
We closed our Series F today at a $13B valuation. Our inference business grew 20x in the last year. I want to explain why: The growth comes from a shift I think is permanent: companies want to own their intelligence layer. Instead of relying exclusively on closed models, teams are post-training open models for their specific use cases. Customers like Abridge, Cursor, Decagon, Harvey, HubSpot, Lovable, Notion, OpenEvidence, and Parallel are building this way. But post-training is still more of an art than a science. That’s why we’ve been working hands-on with customers to build specialized models that match or exceed closed models on the tasks they care about. We provide not just the weights, but also the training recipes and tooling so that they're in charge of the continual learning process. I think more companies, both AI-natives and enterprises, will own their intelligence layer. And I’m excited to help build that future.
127 Comments
Vatshank Chaturvedi

2w

Akshay Malik liked this
Our tech report "Dissecting Model Behavior Using Agent Trajectories" on the work behind building the coding agent harness SSA is out at https://lnkd.in/gs2a2Pc7

We discuss how we designed SSA to be minimal and still work well across different frontier model families (Claude, GPT, Gemini, Qwen). We then analyzed over 100k agent trajectories to see how different models, even when they are neck-and-neck in accuracy, go about solving problems differently from each other.

SSA is fully open-sourced at https://lnkd.in/gvuHAM3c, is simple to use, and comes with pre-packaged configs for benchmarking 21 models from different providers on SWE-Bench-Pro, SWE-Bench-Verified and Terminal-Bench-2.

It was awesome working with Gaurav Gupta, and big thanks to WEI XIA, Jun (Luke) Huan and Anoop Deoras!

Gaurav Gupta

2w

Akshay Malik liked this
🚀 Excited to share Part II of our work: “Dissecting Model Behavior Using Agent Trajectories” https://lnkd.in/guXicF4N

As we studied what makes agents perform well in real environments, one thing became clear: success rates alone do not tell the full story. In Part I, we showed that a simple intent–execution gap was enough to reach state-of-the-art results on popular agentic benchmarks. For Part II, we went deeper. We conducted a large-scale study across 21 models from diverse model-provider families and collected 138K high-quality agent trajectories.

🧭 By mapping these trajectories into code state spaces and extracting transient behavioral metrics, we found that models with similar pass@1 scores can behave very differently internally. In other words: two models may solve the same task at similar rates, but take fundamentally different paths to get there. These differences are often invisible in aggregate benchmark scores, but become measurable through trajectory-level analysis.

This is joint work with amazing collaborators: Vatshank Chaturvedi, Jun (Luke) Huan, and Anoop Deoras

1 Comment

Anyscale

2w

Akshay Malik liked this
Together with the Google Kubernetes Engine (GKE) team at Google Cloud, we're announcing a major throughput and latency milestone for Ray Serve LLM. With architecture changes across the whole stack, Ray Serve is able to achieve up to 4.4x higher request throughput on prefill-heavy workloads and up to 24.8x on decode-heavy workloads vs. the pre-optimized baseline. Ray Serve LLM now matches vllm-router, a high-performance Rust-based routing framework, while keeping Ray's primitives for fault tolerance, observability, and portability across Kubernetes and VMs for distributed inference. Read more about the three optimizations we made in this blog: https://lnkd.in/gfaBesSn

4 Comments
Richard Liaw

2w

Akshay Malik liked this
Very exciting work! Ray Serve LLM, which helps scale vLLM for distributed, disaggregated, and multi-replica inference, now offers 4.4x higher request throughput for prefill-heavy workloads compared to previous versions of Ray. Combined with its flexibility to easily support inference disaggregation, Ray Serve LLM is now a very competitive offering for large-scale distributed inference. The team has implemented a ton of performance optimizations on the stack; take a look at the blog and give it a try!


Jim Fan

2w

Akshay Malik liked this
Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and a generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy but stay safe, don't waste precious compute. Make no mistake. Then humans step aside and our watch begins.

The robot fleet starts to come alive: they learn to look for visual clues, reset the scene, practice novel skills, tinker with the control stack, read papers online, debate, reflect, get stuck, and try again directly on the hardware. All we did was give Codex an API to the world of atoms, and the rest is emergence.

ENPIRE is able to solve high-precision tasks like tying zip-ties, organizing fine pins, and installing GPUs all by itself. We also discovered a new type of "physical scaling": 8 robots exploring in parallel solve the task significantly faster than fewer ones.

A part of our NVIDIA GEAR lab now self-improves tirelessly overnight. We just read the reports in the morning. /goal: we all take a holiday and Jensen wouldn't even notice ;)

We will be open-sourcing everything, so you can host your self-running robot lab at home too! Project site and paper: https://lnkd.in/g3t4qS8Y

99 Comments

Experience & Education

Anyscale
