Gemma 4 open models available on Google Cloud
Today, we’re introducing Gemma 4 — our most intelligent open models to date. Purpose-built for advanced reasoning and agentic workflows, Gemma 4 delivers an unprecedented level of intelligence-per-parameter.
In this edition, we’ll share six reasons Gemma 4 is our most capable open model family yet, and all the surfaces you can get started with today on Google Cloud.
Six reasons Gemma 4 is our most capable open family model yet
- Advanced reasoning: Capable of multi-step planning and deep logic, Gemma 4 demonstrates significant improvements in math and instruction-following benchmarks that require it.
- Agentic workflows: Native support for function-calling, structured JSON output, and native system instructions enables you to build autonomous agents that can interact with different tools and APIs and execute workflows reliably.
- Code generation: Gemma 4 supports high-quality offline code, turning your workstation into a local-first AI code assistant.
- Vision and audio: All models natively process video and images, supporting variable resolutions, and excelling at visual tasks like OCR and chart understanding. Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.
- Longer context: Process long-form content seamlessly. The edge models feature a 128K context window, while the larger models offer up to 256K, allowing you to pass repositories or long documents in a single prompt.
- 140+ languages: Natively trained on over 140 languages, Gemma 4 helps developers build inclusive, high-performance applications for a global audience.
Why it matters for your business: Enterprise AI requires models that execute complex logic while keeping data within secure boundaries. Gemma 4 gives you this balance. Organizations can deploy these models across Google Cloud to meet strict compliance guarantees, including Sovereign Cloud solutions. This provides a foundation for digital sovereignty, granting teams complete control over their data, infrastructure, and models.
Where you can get started with Gemma 4
Vertex AI
Deploy Gemma 4 to your own Vertex AI endpoints. Select the model from Model Garden and provision the specific compute resources your application requires. This self-deployment model gives you direct control over your serving infrastructure and costs while keeping your data within your Google Cloud environment.
You can also fine-tune Gemma 4 using Vertex AI Training Clusters (VTC), which offer optimized SFT recipes and high-scale resiliency through NVIDIA NeMo Megatron. This ensures you can efficiently adapt any variant, from the effective 2B (E2B) model for edge tasks to the 31B dense model for complex enterprise orchestration.
Additionally, we're committed to empowering customer choice and innovation through our curated collection of first-party, open, and third-party models available on Vertex AI. That’s why we're thrilled to announce that Gemma 4 26B MoE model will be available as fully managed and serverless on Model Garden over the coming days.
Agent Development Kit (ADK)
ADK is a flexible and modular open-source framework for developing and deploying AI agents. Gemma 4 offers advanced agentic capabilities, including reasoning, function calling, code generation, and structured output. ADK helps you build fully functional AI agents with Gemma 4. Start building AI agents with Gemma 4 and Google ADK today.
Cloud Run
You can now run demanding Gemma 4 inference workloads efficiently on Cloud Run, leveraging the power of NVIDIA RTX PRO 6000 (Blackwell) GPUs. With 96GB of vGPU memory, you can easily deploy models like Gemma-4-31B-it on serverless GPUs.
Cloud Run handles the underlying infrastructure, allowing you to focus on your applications. Your models scale to zero when inactive and dynamically adjust with demand, ensuring optimized costs as you only pay for what you use. Plus, you have the flexibility to tailor CPU and memory configurations for each inference workload. Try it out now, on demand with no reservations, in us-central1 or europe-west4.
Recommended by LinkedIn
Google Kubernetes Engine (GKE)
GKE provides a highly scalable and customizable environment for deploying Gemma 4, perfect for teams that require fine-grained control over their AI infrastructure. By managing your own infrastructure on GKE, you gain the flexibility to tailor compute resources, select specific GPU or TPU accelerators, and implement custom autoscaling metrics that match your exact traffic patterns. This level of control also ensures your AI workloads can seamlessly integrate with your existing microservices while adhering to your organization's strict security and data compliance requirements.
Starting today, you can efficiently serve Gemma 4 models on GKE using vLLM, a high-throughput and memory-efficient LLM serving engine. By leveraging GKE, you can seamlessly scale your inference workloads from zero to peak demand while optimizing your resource utilization and costs. To help you get started, check out our newly updated tutorial on how to serve Gemma 4 on GKE.
Looking ahead, Gemma 4 is uniquely positioned to power the next generation of agentic applications on Google Cloud. Pairing Gemma 4’s multiit-step planning capabilities with the new GKE Agent Sandbox, developers can safely execute LLM-generated code and tool calls within highly isolated, Kubernetes-native environments that offer sub-second cold starts with up to 300 sandboxes per second for secure, efficient multi-step planning.
Google Cloud TPUs
Gemma 4 will be available on TPUs across Google Cloud through GKE, GCE, and Vertex AI. Starting today, you can now use a number of popular open source TPU projects to serve, pretrain, and post-train Gemma-4-31B dense and Gemma-4-26B-A4B MoE.
- For pretraining and post-training experimentation, you can leverage MaxText and perform post training to customize for text analysis and generation, reasoning and image analysis use cases.
Stay tuned for community- contributed SGLang-JAX tutorials.
Sovereign Cloud
Gemma 4 will be available across all our sovereign cloud offerings, including public cloud with Data Boundary, Google Cloud Dedicated (such as S3NS in France), and Google Distributed Cloud for air-gapped and on-premises deployments. This expansion reinforces our commitment to an open, sovereign digital world where organizations maintain total control over their data, encryption, and operational environment.
By providing open weights, Gemma 4 empowers developers to build specialized solutions for highly sensitive environments. Enterprise and government agencies can now deploy localized services that respect regional nuances and domain expertise while meeting strict data residency and sovereignty rules. This approach ensures that organizations can innovate rapidly with AI while remaining fully compliant with national and industry requirements.
Get started today
By choosing Gemma 4 on Google Cloud, enterprises and sovereign organizations gain a trusted, transparent foundation that delivers state-of-the-art capabilities while meeting the highest standards for security and reliability. Learn more in our blog here.
👍
We’re still waiting for the Gemma 4 model to be available on Ollama 🤗
“Huge respect and congratulations to you 😎🔥 Your success speaks louder than words, truly inspiring to see your journey growing so strong 🚀 Keep shining like this, because people like you set the standard, not follow it 💯
I like the open Gemma 4 model, especially for content related work... also when passing the whole email thread to summarize via cli automation...
Yaa it a very confusing to the poster of google