Agent Factory Recap: How Gemma 4 Taught Itself Physics

In this episode of The Agent Factory, Vlad Kolesnikov and I sat down with Omar Sanseviero from the Developer Experience team at Google DeepMind. We explored the groundbreaking release of Gemma 4: a new family of open models designed to bring high-level intelligence and agentic capabilities directly to consumer hardware and mobile devices. Since its launch last month, Gemma 4 has seen over 50 million downloads!

This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.

Gemma 4 - What is it?

Gemma 4 is the latest generation of open models from Google DeepMind, built on the same foundational research as Gemini 3. The family is designed to deliver exceptional "intelligence per parameter" across a range of deployment scenarios, from mobile phones to powerful workstations. The Gemma 4 model family now spans three distinct architectures:

  • Small Sizes (E2B & E4B): Optimized for ultra-mobile, edge, and browser deployment (such as Pixel or Chrome).
  • Dense (31B): A powerful 31-billion parameter model that provides server-grade performance for local execution on consumer GPUs.
  • Mixture-of-Experts (26B MoE): A highly efficient architecture designed for high-throughput tasks and advanced reasoning.

With the shift to an Apache 2.0 license, these models give developers and startups the flexibility to build, modify, and commercialize applications while maintaining full control over their infrastructure.

Omar Sanseviero on how Gemma 4 changes the landscape for agent developers

Timestamp: 1:40 

Omar highlighted that Gemma 4 brings "very high intelligence per parameter," making it possible to run agentic workflows entirely offline. We saw examples of multiple Gemma instances running locally to generate SVGs (1:53) and an Android-based agent picking specific skills, like playing the piano, to complete tasks (2:45). As Omar noted, "This means that you can run very powerful things with very little hardware overhead...even in the phone that you have in your pocket."

The Factory Floor

Building a Local Food Tour Agent

Timestamp: 5:29 

We showcased a food tour agent powered by Gemma 4 using the Agent Development Kit (ADK) and a Google Maps MCP server. We demonstrated how a local model can handle complex, multi-step reasoning tasks.

  • The agent identified the best ramen spots in Seattle under a $30 budget.

  • It verified that the locations were within walking distance of each other.

  • It processed search results to provide specific tips on what to order and what to avoid.
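The multi-step loop behind a demo like this can be sketched in plain Python. Everything below is illustrative: the restaurant data, the `find_restaurants` tool, and the proximity check are stand-ins for the Google Maps MCP server and the local Gemma model used in the actual ADK-based demo.

```python
# Minimal sketch of the tool-using plan behind the food tour agent.
# The "tool" is stubbed with fixed data; in the real demo the model
# calls a Google Maps MCP server through the Agent Development Kit.

def find_restaurants(city, dish):
    """Stub for a Maps search tool; returns hypothetical candidates."""
    return [
        {"name": "Ramen A", "price": 18, "location": (47.61, -122.33)},
        {"name": "Ramen B", "price": 25, "location": (47.61, -122.34)},
        {"name": "Ramen C", "price": 42, "location": (47.66, -122.31)},
    ]

def within_walking_distance(a, b, max_delta=0.02):
    """Crude lat/lon proximity check (stand-in for a routing API)."""
    return abs(a[0] - b[0]) <= max_delta and abs(a[1] - b[1]) <= max_delta

def plan_food_tour(city, dish, budget):
    """Search, filter by budget, then keep only mutually walkable spots."""
    candidates = [r for r in find_restaurants(city, dish) if r["price"] <= budget]
    tour = []
    for r in candidates:
        if all(within_walking_distance(r["location"], t["location"]) for t in tour):
            tour.append(r)
    return tour

tour = plan_food_tour("Seattle", "ramen", budget=30)
print([r["name"] for r in tour])  # → ['Ramen A', 'Ramen B']
```

In the real agent, the model decides when to call each tool and interprets the results in natural language; this sketch only shows the deterministic skeleton of the plan.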

Autonomous Python Code Execution

Timestamp: 8:03 

In this demo, we pushed Gemma 4’s coding capabilities to the limit by asking it to express itself through animation. Using a sandbox execution environment, the model performed the following:

  • Wrote Python code using the Matplotlib library.

  • Attempted to build a physics engine to simulate a bouncing ball.

  • Self-corrected when the initial execution environment lacked certain CPU features, finding an alternative path to successfully generate the animation.

  • Demonstrated a deep understanding of real-world physics and gravity through code.
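The core of a bouncing-ball simulation like the one the model wrote is just a few lines of numerical integration. This is a minimal sketch of that idea, not the model's actual output; plotting is omitted so the physics stays self-contained, whereas the demo rendered frames with Matplotlib inside a sandboxed interpreter.

```python
# Semi-implicit Euler integration of a ball dropped under gravity,
# with an inelastic bounce when it hits the floor.

GRAVITY = -9.81      # m/s^2
RESTITUTION = 0.8    # fraction of speed kept after each bounce
DT = 0.01            # integration step in seconds

def simulate(height, steps):
    """Return the ball's height at each step, starting from rest."""
    y, vy = height, 0.0
    trajectory = []
    for _ in range(steps):
        vy += GRAVITY * DT       # gravity accelerates the ball downward
        y += vy * DT
        if y < 0.0:              # floor hit: reflect and damp the velocity
            y = 0.0
            vy = -vy * RESTITUTION
        trajectory.append(y)
    return trajectory

path = simulate(height=2.0, steps=500)
print(max(path[250:]))  # later peaks sit well below the 2.0 m start
```

Because each bounce keeps only 80% of the impact speed, successive peaks decay, which is the "real-world physics" behavior the demo highlighted.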

The Shift to Apache 2.0 Licensing

Timestamp: 4:05 

A major theme of the conversation was the community-driven decision to move Gemma 4 to an Apache 2.0 license. This change gives developers and startups maximum flexibility to build, modify, and commercialize applications. Omar emphasized that the move was a direct response to developer feedback, aiming to unlock a new wave of innovation in the open models ecosystem.

Developer Q&A

Architectural Decisions and Mixture of Experts (MoE)

Timestamp: 17:23

Omar explained the technical shifts that make Gemma 4 so efficient. For the first time, the Gemma family includes a Mixture of Experts (MoE) architecture, which optimizes for extremely low latency in production. Additionally, the smaller E2B and E4B models utilize per-layer embeddings to remain "cheap" to run on GPUs. For vision tasks, the model now supports variable aspect ratios, allowing it to understand images of various sizes more accurately than previous fixed-resolution versions.
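The efficiency win of an MoE layer comes from routing: each token activates only the top-k experts by gate score, so most parameters sit idle per token. The toy sketch below illustrates just that routing idea; the expert count, k, and the tiny lambda "experts" are illustrative and bear no relation to Gemma 4's actual 26B MoE internals.

```python
# Toy top-k Mixture-of-Experts routing: only k of the experts run per
# token, and their outputs are combined with normalized gate weights.

def top_k_experts(gate_scores, k=2):
    """Return (expert_index, normalized_weight) pairs for the k best experts."""
    ranked = sorted(enumerate(gate_scores), key=lambda p: p[1], reverse=True)[:k]
    total = sum(score for _, score in ranked)
    return [(idx, score / total) for idx, score in ranked]

def moe_layer(token, gate_scores, experts, k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(w * experts[idx](token) for idx, w in top_k_experts(gate_scores, k))

# Four tiny stand-in "experts"; only two execute for this token.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 1, lambda x: x * 3]
gates = [0.1, 0.6, 0.1, 0.2]  # hypothetical router scores for one token
print(moe_layer(10.0, gates, experts))
```

With these gates, experts 1 and 3 are selected with weights 0.75 and 0.25, giving 0.75·20 + 0.25·30 = 22.5; experts 0 and 2 never run, which is the per-token compute saving MoE buys.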

Comparing Gemma to Gemini

Timestamp: 19:51  

When asked how Gemma stacks up against its larger sibling, Gemini, Omar clarified that they serve different purposes. While Gemini excels at massive-scale tasks and deep "world knowledge" due to its size, Gemma is the "best open model that can run on a single consumer GPU." It is specifically optimized for instruction following, coding, and agentic use cases where local deployment or fine-tuning is required.

Fine-Tuning for Specialized Industries

Timestamp: 21:10 

The conversation touched on the importance of "Sovereign AI" and privacy. Because Gemma is an open model, developers in regulated industries, like healthcare or finance, can fine-tune the model on their private data and deploy it within their own air-gapped infrastructure. This gives developers full control over their data and the model's specialized expertise.

Conclusion

Gemma 4 marks a turning point for agentic development, proving that you don't always need a massive cloud cluster to build something smart. Whether it's running a physics simulation on a laptop or a travel guide on a phone, the barrier to entry for high-performance AI has never been lower. We are entering an era where the "conductor" of the AI orchestra can be any developer with a single GPU and a great idea.

Your turn to build

Now that you've seen what Gemma 4 can do, it's time to start building. Check out the resources in our show notes: the food tour agent, the coding agent, and the ADK support. Then try running Gemma 4 on your local machine or on Cloud Run. We can't wait to see what agents you create!

Watch more of The Agent Factory → Reinforcement learning & fine-tuning on TP...  

Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

Connect with us

  • Shir Meir Lador → LinkedIn, X

  • Vlad Kolesnikov → LinkedIn, X

  • Omar Sanseviero → LinkedIn, X