Google Cloud Introduces Fluid Compute for Enhanced AI and General Workloads

At Google Cloud Next '26, Google Cloud announced advancements aimed at improving the performance and cost-efficiency of core general workloads and AI tasks within an agentic framework.

Importance of the Announcement

IT leaders and developers face the challenge of balancing computing investment across general use cases, such as web servers and databases, while adapting to the unpredictable demand generated by agents.

These agents can create sudden spikes in demand, requiring high throughput and low latency for tasks triggered by a single user interaction. At the same time, general workloads generate and store the data that powers this agentic world. Relying on static, fragmented infrastructure can create performance bottlenecks and runaway costs, ultimately hindering an organization's ability to respond to surges in demand.

Consider a global travel application where a simple vacation search can trigger complex orchestration tasks like inventory checks and dynamic pricing models. Without a modern architecture, such demand spikes could paralyze critical reservation databases and disrupt business operations.

To address these issues, Google Cloud introduced Fluid Compute, an adaptive infrastructure designed to support both general and agentic workflows. Fluid Compute enables real-time adjustments in performance, capacity, and scale, ensuring both workloads function effectively. This dynamic flexibility is built on the automated orchestration of Google Kubernetes Engine (GKE) and the new Agent Sandbox, which provisions secure, isolated execution environments at machine speed.

Integration of AI and General Workloads

Agentic planning and reinforcement learning depend heavily on highly flexible computing to manage unpredictable surges in autonomous tasks. Relying on static infrastructure for isolating agent-generated code can lead to significant provisioning delays and increased cloud budgets. Adopting an adaptive cloud foundation can eliminate these bottlenecks. Utilizing the GKE Agent Sandbox allows teams to securely launch thousands of execution environments, accelerating AI innovation while optimizing total cost of ownership (TCO).
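The article does not detail the Agent Sandbox API itself, but GKE's existing sandboxing documentation describes kernel-level isolation via a gVisor RuntimeClass. As a minimal sketch of that general pattern (the pod name, image, and labels below are hypothetical), a manifest for an isolated execution environment might be built like this:

```python
def build_sandbox_pod(name: str, image: str) -> dict:
    """Build a pod manifest that requests gVisor-based isolation
    via a RuntimeClass, as GKE Sandbox documents."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "agent-sandbox"}},
        "spec": {
            # GKE Sandbox schedules the pod onto a kernel-isolated
            # gVisor runtime instead of the host kernel.
            "runtimeClassName": "gvisor",
            "restartPolicy": "Never",
            "containers": [{
                "name": "agent-code",
                "image": image,
                # Keep untrusted agent code unprivileged inside the sandbox.
                "securityContext": {"allowPrivilegeEscalation": False},
            }],
        },
    }

manifest = build_sandbox_pod("agent-run-001", "python:3.12-slim")
```

Each agent task gets its own short-lived pod, so untrusted code never shares a kernel with other tenants; the new Agent Sandbox service is described as provisioning such environments at machine speed.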

Key announcements regarding Google Cloud's computing capabilities include:

  • Google Axion N4A General Availability: Leverage the agility of Google's custom Arm-based Axion CPU, achieving up to twice the cost-effectiveness compared to the latest x86-based VMs for cost-sensitive workloads like Java applications and SaaS.
  • GKE Agent Sandbox Launch: The industry's only native sandbox service allows agents to execute untrusted code and tool calls securely without performance degradation, providing scalable low-latency infrastructure.
  • First Axion Bare Metal Instance Preview: The C4A.metal instance supports various workloads, including Android development and CI/CD pipelines, without the complexity of nested virtualization.
  • C4 Instances with Expanded Intel Xeon 6 Support: Achieve high performance for AI workloads with support for native FP16 and improved cost-effectiveness.
  • Flexible Committed Use Discounts: Optimize TCO while allowing flexible spending across regions and VM families.

Customer Success Stories

  • Unity: Improved cost efficiency by 20% by migrating on-demand compute workloads to Google Axion N4A instances.
  • Deutsche Börse: Modernized key financial applications to enhance performance and reduce TCO by 33%.
  • WP Engine: Reduced latency by up to 60% for mobile-optimized REST APIs using GKE clusters.
  • eDreams ODIGEO: Migrated Java-based e-commerce modules to Axion VMs, significantly improving latency.
  • Chainguard: Implemented Axion C4A bare metal instances for a secure development pipeline.

Unified Execution for I/O and Latency-Sensitive Workloads

Both AI and core workloads rely on the ability to store, read, and move data efficiently. Traditionally, these processes have been slowed by network and storage limitations tied to vCPU counts, leading to insufficient data supply for AI models.

Accelerated Hyperdisk performance now enables rapid data access and high-performance networking, eliminating these constraints. By allowing data pipelines to scale independently from computing, AI training and I/O-sensitive workloads can maintain stability even during peak demand periods.

  • C4N Preview: Designed for large network applications, C4N can handle up to 95 million packets per second, providing 400Gbps inter-VM bandwidth.
  • M4N Preview: Addresses memory-intensive database needs by offering high RAM per vCPU, significantly reducing TCO.
  • Z4D Announcement: Optimizes I/O-intensive workloads with high-performance local SSDs and eliminates network storage bottlenecks.
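The two C4N figures above can be related with a back-of-envelope check: at 400 Gbps and 95 million packets per second, the bandwidth limit binds only down to roughly 526-byte packets; below that average size, the packet rate becomes the bottleneck. A small sketch of that arithmetic (ignoring per-packet framing overheads):

```python
def min_packet_bytes(bandwidth_gbps: float, mpps: float) -> float:
    """Smallest average packet size (bytes) at which the packet-rate
    limit, rather than bandwidth, becomes the bottleneck."""
    bits_per_packet = (bandwidth_gbps * 1e9) / (mpps * 1e6)
    return bits_per_packet / 8

size = min_packet_bytes(400, 95)  # ~526 bytes per packet
```

Workloads dominated by small packets (telecom, gaming, market data) are exactly those the quoted 95 Mpps figure targets.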

Addressing Challenging Storage Requirements

Core workloads like web servers and databases hold the crucial data needed to drive the agentic world. Storing this information on rigid hardware can create bottlenecks that stall modernization efforts.

Organizations need high-performance database hosts backed by high IOPS and throughput to prevent data delivery blockages. Migrating these applications to modern cloud infrastructure can significantly enhance TCO and operational throughput, removing architectural barriers to modernization and opening data for AI.

  • Hyperdisk Balanced Performance Improvement: Now supports up to 2.4 GiB/s throughput and 160K IOPS, enhancing performance for general workloads.
  • Hyperdisk ML Performance Improvement: Increases aggregate throughput to 2 TiB/s, eliminating AI storage bottlenecks.
  • Z4M Announcement: Offers local SSDs and high network bandwidth to support distributed parallel file systems.
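The Hyperdisk Balanced maximums quoted above (2.4 GiB/s, 160K IOPS) imply a crossover point: at average I/O sizes above roughly 15.7 KiB the throughput cap binds first, while smaller I/Os hit the IOPS cap first. A minimal sketch of that sizing check, using only the figures from the announcement:

```python
def crossover_io_kib(throughput_gib_s: float, iops: float) -> float:
    """I/O size (KiB) above which the throughput cap, rather than
    the IOPS cap, limits a volume running at its quoted maximums."""
    bytes_per_op = (throughput_gib_s * 2**30) / iops
    return bytes_per_op / 1024

size = crossover_io_kib(2.4, 160_000)  # ~15.7 KiB per operation
```

This kind of estimate helps decide whether a database's typical page size will saturate IOPS or bandwidth first when planning a migration.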

Conclusion

With Fluid Compute, Google Cloud enables organizations to avoid bottlenecks and allows core workloads and AI agents to thrive together in a collaborative cloud infrastructure.

This editorial summary reflects Google and other public reporting on Google Cloud Introduces Fluid Compute for Enhanced AI and General Workloads.

Reviewed by WTGuru editorial team.