Jensen Huang walked onto the SAP Center stage on March 16 in his leather jacket, and two hours later walked off having repositioned Nvidia from a chipmaker into a full-stack agentic AI platform company. The keynote was dense with product announcements, but the throughline was unmistakable: Nvidia thinks the next phase of AI is agents, and it plans to own every layer they run on.
That framing has uncomfortable implications for the rest of the industry. If Jensen is right that AI models are rapidly commoditizing while infrastructure becomes the real moat, then Nvidia just drew a circle around the only part of the stack that prints money long-term.
The trillion-dollar demand signal
The headline number: Nvidia now projects at least $1 trillion in revenue opportunity from Blackwell and Vera Rubin platforms through 2027. That is double the $500 billion estimate from last year's GTC. Jensen attributed this to a straightforward calculation: AI compute demand has grown roughly one million times over the past two years as reasoning models replaced retrieval-based systems and usage scaled simultaneously.
"If they could just get more capacity, they could generate more tokens, their revenues would go up," Huang told the crowd, referring to Nvidia's cloud and enterprise customers. The company reported 11 straight quarters of revenue growth above 55%, and the current quarter is tracking at about $78 billion, up 77% year-over-year.
These numbers are large enough that they tend to make people skeptical. But the demand signal appears real. AWS, Microsoft Azure, Google Cloud, Oracle, and CoreWeave are all expanding their Nvidia deployments. Meta, ByteDance, and Alibaba are building at similar scale. The constraint right now is not demand but power delivery.
Vera Rubin: the inference machine
The hardware centerpiece was the Vera Rubin platform, which is now in production. It is not a single chip but a full-stack system with seven chip types assembled into five rack-scale designs: Vera CPUs, Rubin GPUs, NVLink 6 switches, ConnectX-9 NICs, BlueField-4 DPUs, Spectrum-X optical NICs, and Groq 3 LPUs. Combined specs: 3.6 exaflops and 260 terabytes per second of NVLink bandwidth.
The Groq integration matters here. Nvidia acquired Groq for $20 billion in late 2025, and the Groq 3 LPU is purpose-built for inference decode. Nvidia's new Dynamo software layer disaggregates inference: the compute-bound prefill phase goes to the Rubin GPU, and the memory-bandwidth-bound decode phase goes to the Groq LPU. The result, according to both Nvidia and third-party analysis from SemiAnalysis, is 35x more throughput per megawatt compared to Blackwell alone.
That figure deserves scrutiny. SemiAnalysis not only verified Nvidia's claims independently but exceeded them, finding roughly 50x more tokens per watt against the older Hopper H200 baseline. Jensen joked that analyst Dylan Patel "accused me of sandbagging. He was right." Whether these benchmarks hold across diverse production workloads remains to be seen, but the direction is clear: Nvidia is engineering specifically for inference throughput, not just training FLOPS.
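The disaggregation idea is easy to sketch in code. Prefill is compute-bound (the whole prompt is encoded in parallel), while decode is memory-bandwidth-bound (one token at a time against a growing KV cache), which is why routing the two phases to different silicon can pay off. The toy scheduler below is purely illustrative; every name in it is invented, and it is not the Dynamo API.

```python
# Toy sketch of disaggregated inference: prefill and decode as separate
# stages with a KV-cache handoff between them. Illustrative only; the
# names are invented and this is not Nvidia's Dynamo API.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    kv_cache: dict = field(default_factory=dict)  # handed off between pools
    output: list = field(default_factory=list)

def prefill(req: Request) -> Request:
    # Compute-bound phase: encode the full prompt at once and build the
    # KV cache. In the disaggregated design this runs on the GPU pool.
    tokens = req.prompt.split()
    req.kv_cache = {"tokens": tokens, "pos": len(tokens)}
    return req

def decode(req: Request) -> Request:
    # Bandwidth-bound phase: generate one token at a time against the
    # KV cache. In the disaggregated design this runs on the LPU pool.
    for i in range(req.max_new_tokens):
        req.kv_cache["pos"] += 1
        req.output.append(f"tok{i}")  # stand-in for a sampled token
    return req

def serve(req: Request) -> list:
    # The scheduler's job: route each phase to different hardware,
    # shipping the KV cache across the interconnect in between.
    return decode(prefill(req)).output

print(serve(Request("why is the sky blue", max_new_tokens=3)))
```

A production scheduler would also batch decode steps across many requests, which is where most of the claimed throughput-per-watt gain comes from.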
The system is 100% liquid-cooled at 45 degrees Celsius, and installation time has dropped from two days to two hours. At the scale Nvidia is targeting, those operational details matter as much as the silicon.
The agentic play: NemoClaw and the OS analogy
About two hours into the keynote, Jensen pivoted to what Forbes called "the most consequential announcement" of the show, and it was not a chip.
He spotlighted OpenClaw, the open-source agentic framework created by developer Peter Steinberger in January, and compared it to Linux. "OpenClaw has open-sourced the operating system of agentic computers," Huang said. He described its primitives in OS terms: resource management, tool access, file system access, LLM connectivity, scheduling, sub-agent spawning.
Then he introduced NemoClaw, Nvidia's enterprise reference stack built on top of OpenClaw. "It finds OpenClaw, it downloads it. It builds you an AI agent," Huang said. NemoClaw adds the things enterprises need and OpenClaw alone does not provide: privacy guardrails, sandboxed execution via OpenShell, security certification, and one-command deployment on RTX PCs, DGX systems, or cloud instances.
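Huang's OS analogy gets concrete if you imagine those primitives as an interface. The toy runtime below is a hypothetical illustration of the primitives he listed, with tool access, LLM connectivity, and sub-agent spawning stubbed out; it is not the actual OpenClaw or NemoClaw API.

```python
# Toy agent runtime illustrating OS-style primitives: tool registration,
# LLM connectivity, and sub-agent spawning. Hypothetical; not the
# OpenClaw or NemoClaw API.

class ToyAgentRuntime:
    def __init__(self, name="root"):
        self.name = name
        self.tools = {}      # resource management: a table of registered tools
        self.children = []   # sub-agents this agent has spawned

    def register_tool(self, name, fn):
        self.tools[name] = fn            # tool access primitive

    def invoke_tool(self, name, *args):
        return self.tools[name](*args)

    def call_llm(self, prompt):
        # LLM connectivity, stubbed: a real runtime would call a model here.
        return f"[{self.name} -> model]: {prompt}"

    def spawn(self, task):
        # Sub-agent spawning: the child inherits the parent's tool table,
        # much like a forked process inherits open resources.
        child = ToyAgentRuntime(name=f"{self.name}/{task}")
        child.tools = dict(self.tools)
        self.children.append(child)
        return child

rt = ToyAgentRuntime()
rt.register_tool("add", lambda a, b: a + b)
worker = rt.spawn("math")
print(worker.invoke_tool("add", 2, 3))  # sub-agent uses an inherited tool
```

The enterprise additions NemoClaw claims, such as guardrails and sandboxed execution, would wrap exactly these call sites, which is what makes the Red Hat-for-Linux comparison legible.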
The Linux comparison is ambitious but not crazy. If agentic AI really does become the dominant computing paradigm, then whoever provides the reliable, secure, standards-based layer for deploying agents has a position similar to what Red Hat held for Linux in the enterprise. Nvidia is betting NemoClaw can be that layer.
Jensen also claimed that every SaaS company will eventually become "Agentic-as-a-Service" and that 100 percent of Nvidia employees already use agentic coding tools like Claude Code and Cursor. Take the first claim as directional. Take the second at face value, because Nvidia has the infrastructure budget to back it.
Tokens as the new commodity
The most interesting conceptual shift in the keynote was Jensen's framing of data centers as "token factories." In this model, the unit economics are simple: token throughput per watt is your revenue. Every efficiency gain translates directly to margin.
Jensen presented a tiered token market structured like SaaS pricing: free-tier tokens at one end, premium research-grade tokens at $150 per million at the other. He pointed out that platform inference providers saw generation speeds climb from 700 to nearly 5,000 tokens per second after Nvidia pushed software updates to existing hardware. A 7x revenue multiplier from a software patch, no new silicon required.
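The token-factory arithmetic is worth doing explicitly. The sketch below uses the keynote's 700 and 5,000 tokens-per-second figures; the $2-per-million price and 70 percent utilization are illustrative assumptions, not numbers from the keynote.

```python
# Back-of-the-envelope token-factory economics for a single serving
# instance. Price and utilization are assumed for illustration; the
# 700 -> 5,000 tokens/sec figures are from the keynote.
def annual_revenue(tokens_per_sec, price_per_million, utilization=0.7):
    seconds_per_year = 365 * 24 * 3600
    tokens = tokens_per_sec * seconds_per_year * utilization
    return tokens / 1e6 * price_per_million

before = annual_revenue(700, price_per_million=2.0)
after = annual_revenue(5000, price_per_million=2.0)
print(f"before: ${before:,.0f}/yr  after: ${after:,.0f}/yr  x{after/before:.1f}")
# -> before: $30,905/yr  after: $220,752/yr  x7.1
```

Because revenue scales linearly with throughput in this model, the 7x multiplier falls straight out of the speed ratio, which is exactly the point of the "token factory" framing: at a fixed power draw, tokens per second is the whole top line.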
This reframe is the quiet center of the commoditization debate. If models themselves are becoming interchangeable, then the value migrates to whoever can produce tokens cheapest and fastest. That is a game Nvidia is extraordinarily well-positioned to win, because it controls the silicon, the interconnect, the inference software, and now the agentic deployment layer.
The counter-argument comes from model labs: differentiated capabilities in reasoning, tool use, and domain expertise are not easily commoditized. Both things can be true simultaneously. The model layer may retain value through specialization while the infrastructure layer captures value through scale. But Jensen's keynote made clear which side of that bet Nvidia is on.
What comes next
Beyond Vera Rubin, Jensen sketched the Feynman architecture for 2028: a new GPU, the LP40 LPU co-developed with the Groq team, the Rosa CPU, BlueField-5, and Kyber co-packaged optics for scale-up. He even teased Vera Rubin Space-1 for orbital data centers, which sounds like science fiction until you consider the thermal advantages of radiating waste heat into vacuum.
The more immediate signal is on the software side. Nvidia launched Dynamo 1.0 as an open-source inference engine alongside the Nemotron model coalition, partnering with Mistral, Perplexity, Cursor, and LangChain. Combined with physical AI advances like the Uber robotaxi fleet (28 cities, four continents by 2028) and Disney's Olaf robot demo powered by Newton physics simulation, Nvidia is clearly trying to make the case that its platform spans from cloud to edge to the physical world.
For anyone building on or investing in AI infrastructure, the takeaway from GTC 2026 is fairly direct. Nvidia is not content to be the company that sells shovels during a gold rush. It wants to own the mine, the refinery, and the distribution network. Whether it can execute on that ambition at the speed the market demands is the open question. But after this week in San Jose, dismissing the attempt would be a mistake.
Kai Nakamura covers AI for The Daily Vibe.



