
Google DeepMind Is Done Playing Catch-Up

With Gemini 3.5, 3.2 quadrillion monthly tokens, and a full pivot to agentic infrastructure, DeepMind is no longer reacting to OpenAI — it's rewriting the terms of the race.

Flux Desk·2026-05-10·6 min read

At last year's Google I/O, Sundar Pichai said Google was processing roughly 480 trillion tokens a month. This May, on the same stage at Shoreline Amphitheatre, he updated the figure: 3.2 quadrillion. That is nearly a 7x increase (about 6.7x) in twelve months, a number that would strain credulity if Google weren't the one entity on earth with the infrastructure to make it plausible. The scale of the compute story at Google I/O 2026 was not incidental. It was the argument.
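For readers who want to check the headline multiple, here is a minimal sketch of the arithmetic. The two token counts are the figures quoted above; the month-over-month rate is an illustration that assumes smooth compounding, which Google has not claimed.

```python
# Token volumes as quoted in the article (short-scale units).
TRILLION = 10**12

tokens_last_year = 480 * TRILLION     # monthly tokens cited at last year's I/O
tokens_this_year = 3200 * TRILLION    # 3.2 quadrillion monthly tokens cited this May

# Year-over-year multiple.
growth_multiple = tokens_this_year / tokens_last_year

# Implied month-over-month rate, ASSUMING growth compounded evenly over 12 months.
monthly_rate = growth_multiple ** (1 / 12) - 1

print(f"growth multiple: {growth_multiple:.2f}x")
print(f"implied monthly growth: {monthly_rate:.1%}")
```

The multiple comes out just under 7x, which is why the rounded figure is fair shorthand for the growth.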

Google DeepMind has spent three years absorbing the accusation that it was perpetually second. Second on chatbots, second on code, second on multimodal, second on agents. The accusation is no longer supportable. What emerged from I/O 2026 is a lab that has stopped trying to win individual model benchmarks and started trying to own the layer beneath them — the agentic infrastructure, the world model, the token economy itself.

Gemini 3.5 Flash Is the Shape of the New Playbook

The flagship announcement wasn't a dramatic numerical leap. It was structural. Gemini 3.5 Flash — described as the first in a family combining "frontier intelligence with action" — beats its predecessor, Gemini 3.1 Pro, on nearly every agentic benchmark while costing 40% less and running 4x faster. This is not a research flex. This is a unit-economics argument aimed directly at every enterprise that ran the math on GPT-4o costs and winced.
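To see why the pricing matters more than the benchmark delta, consider a rough sketch of the unit economics. Only the 40% and 4x figures come from the announcement as reported here. The baseline spend and latency are hypothetical numbers chosen for illustration, and "4x faster" is read as 4x lower latency per task.

```python
# Hypothetical baseline workload; only the 40% and 4x figures come from the article.
BASELINE_MONTHLY_COST_USD = 10_000.0   # assumed spend on the older model
BASELINE_SECONDS_PER_TASK = 8.0        # assumed latency on the older model

COST_REDUCTION = 0.40   # "costing 40% less"
SPEEDUP = 4.0           # "running 4x faster", read here as 4x lower latency

# Same workload on the cheaper, faster model.
new_monthly_cost = BASELINE_MONTHLY_COST_USD * (1 - COST_REDUCTION)
new_seconds_per_task = BASELINE_SECONDS_PER_TASK / SPEEDUP

# Alternatively, hold the budget fixed: each dollar buys 1 / (1 - 0.40) times the work.
work_per_dollar_multiple = 1 / (1 - COST_REDUCTION)

print(f"monthly cost: ${new_monthly_cost:,.0f}")
print(f"seconds per task: {new_seconds_per_task:.1f}")
print(f"work per dollar: {work_per_dollar_multiple:.2f}x")
```

Under these assumptions a team either pockets the savings or runs roughly two-thirds more agent tasks on the same budget, with each task finishing in a quarter of the time.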

Intelligence at the frontier, delivered at commodity margins: that is Google's competitive moat in 2026.

3.5 Flash launched with a 1 million-token context window, audio/video input support, and a 64,000-token output limit. On release day it became the default model in the Gemini app and in Google's AI Mode inside Search — for all users, including the free tier. The strategic meaning of that last clause is easy to understate: Google immediately put its best agentic model in front of 900 million monthly active users and gave it to them for free. The Gemini app has more than doubled its user base in a year, from 400 million to 900 million. That is a distribution advantage OpenAI's API business cannot replicate.

Gemini Omni and the World Model Ambition

If 3.5 Flash is the product, Gemini Omni is the statement of intent. Described internally as a leap forward in "world understanding, multimodality, and editing," Omni accepts any modality as input and generates any modality as output — text, image, audio, video, in any combination. Gemini Omni Flash is available immediately to developers; the full Omni model is being staged.

The "world model" framing is deliberate. DeepMind's research roots — AlphaGo, AlphaFold, now AlphaProof — have always been oriented toward systems that build internal models of reality rather than systems that pattern-match to surface outputs. Gemini Omni is the first product expression of that research DNA at scale. It is not a chatbot with video bolted on. It is an attempt to build a model that reasons about physics, causality, and temporal coherence across modalities.

The 2024 Nobel Prize in Chemistry, awarded in part to Demis Hassabis and John Jumper for AlphaFold, wasn't lost on the industry. It anchored DeepMind's credibility as a scientific institution in a way that no product benchmark can. When Google says Omni "understands" video, there is now a track record that gives that claim weight it wouldn't have had three years ago.

Antigravity and the Agent Infrastructure Play

The announcement that drew the least consumer coverage and the most enterprise attention was Antigravity: Google's unified platform for builders working with agents, shipping with a standalone desktop app. Antigravity is not a framework in the LangGraph or CrewAI sense. It is an orchestration surface — a place where agents, tools, memory, and human review points get wired together without requiring a team to own the plumbing.

Alongside it: Gemini Spark, a personal agent that runs continuously in the background inside the Gemini app, connected to Google Workspace tools and opening to third-party integrations via Model Context Protocol. The MCP move is meaningful. It signals that Google has accepted the emerging agent interoperability layer rather than trying to replace it — a pragmatism that has historically been hard for Google to execute.
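For a sense of what that interoperability layer looks like on the wire: MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with the tools/call method. The sketch below only builds such a request. The tool name and arguments are hypothetical, and nothing here describes how Spark itself is implemented.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP-style tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical third-party tool; the name and argument schema are illustrative only.
request = build_tool_call(1, "calendar_lookup", {"date": "2026-05-10"})
print(request)
```

Because the envelope is the same for every tool, a third-party service that speaks it can plug into any MCP client, which is the point of adopting the protocol rather than replacing it.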

The bet is that the agent surface, not the model, is where user lock-in gets built.

This tracks with what Satya Nadella has been saying at Microsoft about outcome-based pricing. Google hasn't moved there publicly yet, but Antigravity's architecture — agents that complete tasks in the background, not conversations users must initiate — is precisely the infrastructure you build when you intend to charge for results rather than tokens.

Veo Holds the Line on Video

On the generative video front, Veo 3.1 Lite entered the market in March as a lower-cost, developer-accessible model supporting text-to-video and image-to-video. It is not Google's most capable video model. It is Google's most accessible one — a deliberate move to seed the Veo ecosystem among developers who are building product features, not one-off cinematic demos.

The broader video arms race is real: Sora, Kling, and Veo are all within striking distance of each other on quality benchmarks, and the differentiation is rapidly shifting to latency, cost, and API ergonomics. Google's distribution advantage — Veo integrated directly into Workspace, YouTube creation tools, and Google Cloud's Vertex AI — is the moat here, not raw generative quality.

The Lab That Runs the Whole Stack

What makes Google DeepMind distinctive in May 2026 is not any single model. It is the vertical integration. TPU v6 (Trillium) powers training. Gemini Flash powers the free tier. Omni anchors the world model research agenda. Veo holds video. AlphaFold holds scientific credibility. Antigravity holds the agent surface. Gemini Spark holds the consumer daily habit.

No other lab controls this many layers simultaneously. Anthropic builds great models. OpenAI builds a great API and consumer product. Meta builds great open weights. None of them own the compute, the distribution, the consumer surface, the enterprise cloud, and the foundational research in a single org.

The nearly 7x token growth is the tell. It means developers are building on Gemini at a rate that keeps compounding. Every new model generation gives them a reason not to switch. The catch-up story is over.

The harder question — the one that doesn't show up in keynote slides — is whether the company that built Gmail and Search can actually ship with the speed and precision the agentic era demands. Google's historical failure mode isn't talent or research. It's focus. Antigravity, Spark, Omni, and 3.5 Flash are all shipping at once. So is everything else. The lab is running the whole stack. Whether any one team is running it well is what the next twelve months will tell us.
