Google has unveiled Gemma 4 on Google Cloud, marking a significant advancement in open-model technology.
Key Features: Gemma 4 represents the most capable family of open models to date, built on the same research foundations as Gemini 3. The models support context windows of up to 256K tokens, process vision and audio natively, and are fluent in over 140 languages. They excel at complex logic execution, offline code generation, and agentic workflows.
Importance for Businesses: As enterprises increasingly rely on AI, the need for models that can execute complex logic while ensuring data security has become paramount. Gemma 4 offers a solution that balances these requirements, allowing organizations to deploy models across Google Cloud while adhering to compliance standards, including Sovereign Cloud solutions.
Getting Started with Gemma 4
Vertex AI: Users can deploy Gemma 4 on their own Vertex AI endpoints. By selecting the model from the Model Garden, organizations can provision the necessary compute resources tailored to their applications. This approach allows for direct control over infrastructure and costs while maintaining data security within the Google Cloud environment.
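As a rough sketch of the programmatic path (the Model Garden console flow described above is the primary route), deploying to a dedicated endpoint can also be done with the `gcloud ai` command group. The display names, serving container image, placeholder IDs, and machine/accelerator types below are illustrative assumptions, not documented values for Gemma 4:

```shell
# 1. Create a dedicated endpoint in your project (names are assumptions).
gcloud ai endpoints create \
  --region=us-central1 \
  --display-name=gemma-4-endpoint

# 2. Register the model with its serving container (image URI is a placeholder).
gcloud ai models upload \
  --region=us-central1 \
  --display-name=gemma-4-31b-it \
  --container-image-uri=<serving-container-image>

# 3. Deploy the uploaded model to the endpoint on dedicated accelerators.
gcloud ai endpoints deploy-model <ENDPOINT_ID> \
  --region=us-central1 \
  --model=<MODEL_ID> \
  --display-name=gemma-4-31b-it \
  --machine-type=a2-highgpu-1g \
  --accelerator=type=nvidia-tesla-a100,count=1
```

Because the endpoint is provisioned in your own project, you control the machine type, accelerator count, and scaling settings directly, which is where the infrastructure and cost control mentioned above comes from.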
Fine-tuning is also possible using Vertex AI Training Clusters (VTC), which provide optimized training recipes and resilient performance through NVIDIA NeMo Megatron. This flexibility enables adaptation across model sizes, from the 2B model for edge tasks to the 31B dense model for complex enterprise needs.
Agent Development Kit (ADK): The ADK is a modular open-source framework designed for developing AI agents. With Gemma 4, users can leverage advanced capabilities such as reasoning, function calling, and structured output to create fully functional AI agents.
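As a minimal sketch of what this looks like in the ADK's Python package (`pip install google-adk`), an agent can be given a plain Python function as a tool. The model identifier, agent name, and `get_order_status` tool below are hypothetical; the import is deferred inside the builder so the file loads even without the package installed:

```python
def get_order_status(order_id: str) -> dict:
    """Hypothetical tool the agent can invoke via function calling."""
    return {"order_id": order_id, "status": "shipped"}


def build_agent():
    """Return a minimal ADK agent (requires `pip install google-adk`)."""
    from google.adk.agents import Agent  # deferred: third-party dependency

    return Agent(
        name="support_agent",
        model="gemma-4-31b-it",  # assumed Gemma 4 model identifier
        description="Answers order-status questions.",
        instruction="Use get_order_status to look up an order before answering.",
        tools=[get_order_status],
    )
```

The agent's reasoning decides when to call the tool, and structured output from the model is what makes the function-calling loop reliable.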
Cloud Run: Gemma 4 can now run inference workloads on Cloud Run, utilizing NVIDIA RTX PRO 6000 GPUs. This setup allows for efficient deployment of models like Gemma-4-31B-it on serverless GPUs, with the infrastructure managed by Cloud Run. The service scales dynamically based on demand, ensuring cost optimization and flexibility in resource allocation.
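A deployment along these lines might use `gcloud run deploy` with Cloud Run's GPU flags. The service name and container image are placeholders, and the `--gpu-type` value for RTX PRO 6000 is an assumption (check the current list of supported GPU types for your region):

```shell
# Deploy a GPU-backed serving container for Gemma 4 on Cloud Run.
# Service name, image URI, and the --gpu-type value are assumptions.
gcloud run deploy gemma-4-service \
  --image=us-docker.pkg.dev/<project>/<repo>/vllm-gemma4:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-rtx-pro-6000 \
  --cpu=8 \
  --memory=32Gi \
  --no-cpu-throttling \
  --max-instances=3
```

`--max-instances` caps spend while Cloud Run scales instances with demand, which is the cost behavior described above.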
Google Kubernetes Engine (GKE): GKE offers a customizable environment for deploying Gemma 4, suitable for teams needing precise control over their AI infrastructure. Organizations can tailor compute resources, choose specific GPU or TPU accelerators, and implement custom autoscaling metrics to align with their traffic patterns. GKE also supports efficient scaling of inference workloads, optimizing resource utilization and costs.
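The controls described above map onto a standard Kubernetes Deployment manifest. A minimal sketch, in which the container image, accelerator label value, and resource sizes are all assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gemma-4-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gemma-4
  template:
    metadata:
      labels:
        app: gemma-4
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4  # choose your GPU/TPU node pool
      containers:
        - name: vllm
          image: us-docker.pkg.dev/<project>/<repo>/vllm-gemma4:latest  # assumed image
          resources:
            limits:
              nvidia.com/gpu: "1"
```

Custom autoscaling then attaches to this Deployment via a HorizontalPodAutoscaler keyed to whatever metric matches your traffic pattern.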
Future Developments: Gemma 4 is poised to drive the next generation of agentic applications on Google Cloud. Its multi-step planning capabilities, combined with the GKE Agent Sandbox, allow for secure execution of LLM-generated code in isolated environments. The GKE Inference Gateway enhances this by providing predictive latency features that significantly reduce response times.
TPU Availability: Gemma 4 will also be accessible on TPUs across Google Cloud, supporting various open-source TPU projects for serving, pretraining, and post-training tasks.
- Pretraining and post-training can use MaxText to customize models for text analysis and reasoning.
- For production workloads, vLLM TPU will facilitate online serving and batch inference using prebuilt containers and tutorials.
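Since vLLM exposes an OpenAI-compatible HTTP API, online serving against any of the deployments above can be exercised with nothing beyond the standard library. A minimal sketch, in which the endpoint URL and model name are assumptions:

```python
import json
from urllib import request

# Build an OpenAI-compatible chat completion request for a vLLM server.
payload = {
    "model": "gemma-4-31b-it",  # assumed served model name
    "messages": [
        {"role": "user", "content": "Summarize our Q3 compliance report."}
    ],
    "max_tokens": 256,
}
body = json.dumps(payload).encode("utf-8")

req = request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed server address
    data=body,
    headers={"Content-Type": "application/json"},
)
# response = request.urlopen(req)  # uncomment against a running server
```

The same request shape works whether the server runs on Cloud Run GPUs, GKE, or vLLM TPU, which keeps client code portable across the deployment options above.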
Sovereign Cloud Solutions: Gemma 4 will be available across all sovereign cloud offerings, ensuring organizations maintain control over their data and operational environments. This commitment to digital sovereignty allows enterprises to innovate rapidly while complying with regional data residency requirements.
Next Steps
Organizations can begin building with Gemma 4 today, leveraging its capabilities across Vertex AI and Sovereign Cloud. This launch provides a robust foundation for enterprises looking to enhance their AI strategies while ensuring security and compliance.