Google Unveils Advanced AI Infrastructure for the Agentic Era

The landscape of artificial intelligence is evolving beyond simple question-and-answer capabilities, progressing into a realm where reasoning and action are paramount. To thrive in this agentic era, businesses require computing infrastructure that is specifically designed and optimized for these new demands. At Google Cloud Next, the company unveiled a suite of new AI infrastructure features aimed at accelerating innovation, enhancing user experiences, and optimizing costs and energy efficiency.

Transitioning to Agentic Intelligence

In the agentic era, a single intention can trigger a chain reaction. Unlike a basic chatbot, a primary AI agent breaks the goal down into detailed tasks and assigns them to groups of specialized agents. Those agents collaborate, maintain state, and use reinforcement learning to produce results in real time. This raises the density of intelligence per interaction, but it also introduces complexity that existing architectures struggle to manage without escalating costs or performance bottlenecks.
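The pattern described above can be sketched in a few lines of Python. This is an illustrative, hypothetical orchestrator rather than any specific Google product: a primary agent splits a goal into tasks, hands each to a specialized worker agent, and keeps shared state across the exchange. All class and function names here are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    description: str
    assigned_to: str
    result: str | None = None


@dataclass
class PrimaryAgent:
    """Illustrative orchestrator: decomposes a goal, delegates to specialists,
    and keeps shared state across the whole interaction."""
    specialists: dict               # name -> callable that handles one task
    state: dict = field(default_factory=dict)

    def plan(self, goal: str) -> list[Task]:
        # In a real system a model call would produce this plan;
        # here the decomposition is hard-coded for illustration.
        return [
            Task("research the topic", assigned_to="researcher"),
            Task("draft an answer", assigned_to="writer"),
            Task("check the draft", assigned_to="reviewer"),
        ]

    def run(self, goal: str) -> dict:
        for task in self.plan(goal):
            handler = self.specialists[task.assigned_to]
            task.result = handler(task.description, self.state)
            self.state[task.assigned_to] = task.result   # results feed later tasks
        return self.state


# Usage: each specialist is just a function here; in practice each would wrap a model.
agents = {
    "researcher": lambda task, state: f"notes for: {task}",
    "writer": lambda task, state: f"draft using {state['researcher']}",
    "reviewer": lambda task, state: f"approved: {state['writer']}",
}
print(PrimaryAgent(agents).run("summarize the launch"))
```

Even this toy version shows why one user request fans out into many coordinated model calls, which is the load pattern the infrastructure below is built to serve.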

Scaling effectively requires moving away from manually stitching together fragmented components and technologies. An integrated infrastructure stack that spans dedicated hardware, open software, and flexible consumption models is needed to deliver an agentic experience that is smart, fast, scalable, and cost-efficient.

Google's AI Hypercomputer is built and optimized for the agentic era, designed to meet these new requirements. It serves as the foundation for Google’s flagship model, Gemini, along with consumer AI services and enterprise solutions. The company announced a significant expansion of its AI infrastructure portfolio, including:

  • TPU 8t and TPU 8i: Google’s eighth-generation Tensor Processing Units
  • A5X Bare Metal Instances: Based on NVIDIA Vera Rubin NVL72
  • Axion N4A VM: Featuring Google’s custom Axion Arm-based CPU
  • Google Compute Engine 4th Gen VM: Utilizing Intel and AMD x86-based CPUs
  • Virgo Network: An innovative data center fabric for AI workloads
  • Google Cloud Managed Lustre: A high-performance parallel file system
  • Z4M VM: Large local SSD storage and RDMA support for open parallel file systems
  • KV Cache: An expandable storage subsystem for the key-value caches used during inference
  • Native PyTorch support for TPU
  • New GKE features for orchestrating agent-native workloads

These advancements are set to accelerate the development of models and complex agentic workflows, enabling faster innovation and responsive services while reducing costs and promoting responsible energy use.

Introducing the 8th Generation TPU Systems

The announcement of the eighth-generation Tensor Processing Units (TPUs) marks a significant milestone. This generation features two distinct chips and specialized systems tailored for the agentic era.

  • TPU 8t: Optimized for high-throughput AI workloads, offering approximately three times the computing performance of its predecessor. A single superpod can house 9,600 chips, delivering 121 exaflops of compute alongside 2 petabytes of shared memory (a rough per-chip breakdown follows this list).
  • TPU 8i: Designed for inference and reinforcement learning, providing ultra-low latency. It features enhanced on-chip SRAM and high-bandwidth memory, significantly improving cost efficiency per inference.
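Taking the quoted TPU 8t superpod figures at face value, simple division gives a rough per-chip view. These derived numbers are back-of-the-envelope estimates, not published specifications.

```python
# Rough per-chip estimates derived from the quoted superpod figures (not official specs).
chips = 9_600
superpod_exaflops = 121          # total compute, as quoted
shared_memory_pb = 2             # total shared memory, as quoted

flops_per_chip_pflops = superpod_exaflops * 1_000 / chips   # 1 exaFLOP = 1,000 petaFLOPS
memory_per_chip_gb = shared_memory_pb * 1_000_000 / chips   # 1 PB = 1,000,000 GB

print(f"~{flops_per_chip_pflops:.1f} PFLOPS per chip")      # ~12.6 PFLOPS
print(f"~{memory_per_chip_gb:.0f} GB per chip")             # ~208 GB
```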

Both TPU 8t and TPU 8i will soon be available to cloud customers.

A5X Instances Based on NVIDIA Vera Rubin Platform

Recognizing that there is no one-size-fits-all solution, Google collaborates closely with NVIDIA to offer the latest GPU platform as a reliable and scalable service on Google Cloud. Google will be among the first partners to provide next-generation instances based on the Vera Rubin platform.

Additionally, the two companies are co-designing the open-source Falcon networking protocol to provide reliable, high-performance transport for these clusters.

Enhancing Agentic Logic and Reinforcement Learning with Axion

While GPUs and TPUs excel at training and serving AI models, high-performance CPU-based services are essential for the logic that surrounds core model calls, such as orchestration, data preparation, and tool execution. Google’s new Axion-based N4A instances deliver strong cost efficiency, outperforming competing offerings by up to 30% on agent workloads.
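That division of labor can be made concrete with a small, hypothetical request handler: the CPU does the parsing, validation, and tool calls that surround a model invocation, while only the model forward pass runs on an accelerator-backed endpoint. The endpoint URL, response shape, and helper names below are placeholders, not a real Google API.

```python
import json
import urllib.request

MODEL_ENDPOINT = "http://model-server.internal/v1/generate"   # placeholder URL


def lookup_order(order_id: str) -> dict:
    # CPU-bound "tool" logic: database lookups, business rules, validation.
    return {"order_id": order_id, "status": "shipped"}


def call_model(prompt: str) -> str:
    # Only this step runs on the accelerator-backed model server.
    body = json.dumps({"prompt": prompt}).encode()
    req = urllib.request.Request(MODEL_ENDPOINT, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]   # assumed response shape


def handle_request(user_message: str, order_id: str) -> str:
    # Everything around the model call is ordinary CPU work, which is the
    # kind of load the Axion-based instances are positioned to serve.
    order = lookup_order(order_id)                              # tool call on CPU
    prompt = f"Customer asks: {user_message}\nOrder state: {json.dumps(order)}"
    draft = call_model(prompt)                                  # accelerator-side inference
    return draft.strip()                                        # post-processing on CPU
```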

Virgo Network for Data Center-Grade Scale-Out Fabric

The Virgo Network, part of the AI Hypercomputer, is designed to meet the demanding requirements of modern large-scale AI workloads. Its integrated fabric architecture significantly increases bandwidth and reduces the cost of scaling out, allowing ambitious AI workloads to expand efficiently.

  • Supports clustering of up to 134,000 TPUs within a single data center and over 1 million TPUs across multiple sites.
  • Enables GPU connections of up to 80,000 in a single data center and 960,000 across multiple sites.

Innovations in Storage to Minimize Data Bottlenecks

The efficiency of large computing clusters is heavily dependent on the performance of the storage systems supplying data. Google introduces four key storage innovations to prevent bottlenecks:

  • Accelerated Learning and Inference: Google Cloud Managed Lustre now offers 10 TB/s of aggregate bandwidth, significantly faster than competitors (a rough loading-time estimate follows this list).
  • Minimized Latency: Managed Lustre utilizes TPUDirect and RDMA to enable direct data movement to accelerators.
  • Maximized Learning Utilization: Rapid Buckets in Google Cloud Storage maintain over 95% utilization rates for accelerators.
  • Custom Solution Development: Z4M instances provide scalable local SSD capacity for high-performance parallel file systems.
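To give a sense of scale, the estimate below assumes the quoted 10 TB/s of aggregate bandwidth and an arbitrary, illustrative checkpoint size; neither the checkpoint size nor the derived times are figures from the announcement.

```python
# Rough estimate: time to stream a checkpoint at a given aggregate bandwidth.
bandwidth_tb_per_s = 10   # aggregate read bandwidth quoted for Managed Lustre
checkpoint_tb = 4         # example checkpoint size (illustrative, not from the announcement)

seconds = checkpoint_tb / bandwidth_tb_per_s
print(f"~{seconds:.1f} s to load a {checkpoint_tb} TB checkpoint")   # ~0.4 s at full bandwidth

# At a tenth of that bandwidth the same load takes ~4 s, which is why aggregate
# storage throughput directly limits how often large jobs can checkpoint or restart.
```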

GKE: Orchestrating Agent-Native Workloads

In the agentic era, how efficiently intelligence is delivered depends directly on how quickly workloads can scale. Google Kubernetes Engine (GKE) has been transformed into a premier orchestration engine for agent-native workloads.

  • Accelerated Node and Pod Start Times: GKE now offers up to four times faster node start speeds and up to 80% faster pod start times.
  • Rapid Model Loading: GKE leverages the Run:ai Model Streamer and Rapid Cache to accelerate model loading.

GKE’s Inference Gateway uses ML-based, capacity-aware routing in real time, significantly reducing latency and improving user interactions.
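Capacity-aware routing can be illustrated with a minimal sketch. This is not the Inference Gateway's actual algorithm; it simply shows the idea of steering each request to the model replica with the most available capacity, here approximated by queue depth and KV-cache utilization, instead of using round-robin.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    queue_depth: int        # requests waiting on this replica
    kv_cache_util: float    # fraction of KV cache in use, 0.0 - 1.0


def pick_replica(replicas: list[Replica]) -> Replica:
    # Score each replica by how loaded it is; lower is better.
    # A real gateway would use live metrics and a learned model instead of this heuristic.
    return min(replicas, key=lambda r: r.queue_depth + 10 * r.kv_cache_util)


replicas = [
    Replica("replica-a", queue_depth=4, kv_cache_util=0.9),
    Replica("replica-b", queue_depth=2, kv_cache_util=0.3),
    Replica("replica-c", queue_depth=6, kv_cache_util=0.1),
]
print(pick_replica(replicas).name)   # replica-b: short queue and spare KV cache
```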

An Open Software Ecosystem for the Entire AI Lifecycle

Hardware achieves its full potential when paired with well-designed software. The AI Hypercomputer optimizes support for widely used frameworks like JAX and PyTorch, facilitating faster workflows for engineers.

The introduction of TorchTPU for native PyTorch support allows for seamless execution of models on TPU, emphasizing Google’s commitment to openness and customer choice.
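Today, PyTorch models typically reach TPUs through the existing torch_xla package; the snippet below shows that current public flow, which native TorchTPU support is positioned to streamline. The torch_xla calls are today's PyTorch/XLA API, not the new TorchTPU interface.

```python
import torch
import torch_xla.core.xla_model as xm   # existing PyTorch/XLA bridge for TPUs

device = xm.xla_device()                 # picks up the attached TPU core

model = torch.nn.Linear(128, 10).to(device)
x = torch.randn(32, 128, device=device)

logits = model(x)                        # traced and compiled for the TPU via XLA
xm.mark_step()                           # flushes the pending XLA graph for execution
print(logits.shape)
```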

Foundation for Agentic Growth

To innovate quickly and cost-effectively in the agentic era, teams need integrated systems that do not compromise on performance or choice. The AI Hypercomputer delivers this by co-designing every layer of the stack, allowing teams to focus on advancing their business.

This integrated stack supports all of Google’s top-tier services, ensuring that infrastructure innovations translate directly into business value.

This editorial summary reflects Google announcements and other public reporting on Google's unveiling of advanced AI infrastructure for the agentic era.

Reviewed by WTGuru editorial team.