Jensen Huang’s GTC Speech, Full Text: The Age of Inference Has Arrived, At Least $1 Trillion in Demand by 2027, Robotics Is the New Operating System
NVIDIA is developing and deploying space-based data center computers called “Vera Rubin Space-1,” opening up the prospect of extending AI computing power beyond Earth.
Source: Wall Street Insights
On March 16, 2026, NVIDIA’s GTC 2026 conference officially opened, with founder and CEO Jensen Huang delivering the keynote speech.
At this event, regarded as the “AI industry’s annual pilgrimage,” Huang explained NVIDIA’s transformation from a “chip company” to an “AI infrastructure and factory company.” Confronted with market concerns about sustained performance and growth potential, Huang detailed the underlying business logic driving future expansion—“Token Factory Economics.”
Performance guidance is extremely optimistic: “Demand of at least $1 trillion by 2027.”
Over the past two years, global AI computing demand has exploded exponentially. As large models evolve from “perception” and “generation” to “reasoning” and “action” (task execution), their consumption of computing power has surged. Addressing market concerns about a ceiling on orders and revenue, Huang offered very strong guidance.
In his speech, Huang openly stated:
Last year, I mentioned we saw a high-confidence demand of $500 billion, covering Blackwell and Rubin through 2026. Now, right here, I see at least $1 trillion in demand by 2027.
Huang’s trillion-dollar forecast briefly drove NVIDIA’s stock price up more than 4.3%.
Moreover, he added:
Is this reasonable? That’s what I’m about to discuss. In fact, we might even be undersupplied. I am certain that actual computing demand will be much higher than this.
Huang pointed out that NVIDIA’s current systems have proven themselves to be the world’s “lowest-cost infrastructure.” Because NVIDIA can run nearly all AI models across various fields, this versatility allows the $1 trillion investment from customers to be fully utilized and maintain a long lifecycle.
Currently, 60% of NVIDIA’s business comes from the top five hyperscale cloud providers, while the remaining 40% is widely distributed across sovereign clouds, enterprises, industrial sectors, robotics, and edge computing.
Token Factory Economics: Performance per Watt Determines Business Vitality
To explain why this $1 trillion in demand is reasonable, Huang presented a new business mindset to global CEOs. He pointed out that future data centers will no longer be file-storage warehouses but “token factories”: production lines for tokens, the fundamental units of AI output.
Huang emphasized:
Every data center, every factory, is by definition limited by power. A 1GW (gigawatt) factory will never become a 2GW one; that is a law of physics. Under fixed power, whoever achieves the highest token throughput per watt will have the lowest production cost.
Huang divided future AI services into four business tiers, ranging from a free tier up to the highest-value inference tier.
He pointed out that as models grow larger and context lengths increase, AI becomes smarter, but token generation speed decreases. Huang stated:
In this Token Factory, your throughput and token generation speed will directly translate into your precise revenue next year.
Huang emphasized that NVIDIA’s architecture enables customers to achieve extremely high throughput at the free tier, while at the highest inference tier, performance can be improved by an astonishing 35 times.
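Huang’s performance-per-watt argument reduces to simple arithmetic: once power is fixed, the cost per token is set entirely by token throughput per watt. The sketch below is illustrative only; the power budget, electricity price, and throughput figures are invented assumptions, not numbers from the speech.

```python
# Illustrative token-factory economics. All numbers are invented
# assumptions, not figures from the speech. With power fixed, the
# electricity cost per token depends only on throughput per watt.

POWER_BUDGET_W = 1e9      # a 1 GW facility (fixed by physics)
PRICE_PER_KWH = 0.08      # assumed electricity price, USD

def cost_per_million_tokens(tokens_per_sec_per_watt: float) -> float:
    """USD electricity cost to produce one million tokens at full load."""
    tokens_per_sec = tokens_per_sec_per_watt * POWER_BUDGET_W
    kwh_per_sec = POWER_BUDGET_W / 1000 / 3600  # kWh drawn each second
    usd_per_sec = kwh_per_sec * PRICE_PER_KWH
    return usd_per_sec / tokens_per_sec * 1e6

# Doubling performance per watt halves cost per token, at any scale.
print(round(cost_per_million_tokens(0.5) / cost_per_million_tokens(1.0), 2))  # -> 2.0
```

Facility size cancels out of the ratio: the only lever a power-limited operator controls is tokens per second per watt, which is exactly Huang’s point.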
Vera Rubin achieved 350x acceleration in two years, with Groq filling the gap for ultra-fast inference
Under these physical limits, NVIDIA introduced its most complex AI computing system ever, Vera Rubin. Huang said:
Last year, I mentioned Hopper, and I would hold up a chip—very cute. But when I mention Vera Rubin, everyone thinks of the entire system. In this fully liquid-cooled system, eliminating traditional cables, racks that once took two days to install now take only two hours.
Huang pointed out that through extreme end-to-end hardware-software co-design, Vera Rubin has created astonishing data leaps within the same 1GW data center:
In just two years, we increased the token generation rate from 22 million to 700 million, a 350-fold increase. Moore’s Law during the same period only provides about a 1.5x boost.
To address bandwidth bottlenecks under ultra-fast inference (e.g., 1,000 tokens/sec), NVIDIA offers an integrated solution built on technology acquired from Groq: asymmetric, disaggregated inference. Huang explained:
These two processors have very different characteristics. Groq chips have 500MB of SRAM, while a Rubin chip has 288GB of memory.
Huang noted that NVIDIA’s Dynamo software system assigns the “prefill” phase, which requires massive computation and memory, to Vera Rubin, and the latency-sensitive “decode” phase to Groq. He also offered compute-configuration guidance for enterprises:
If your workload is mainly high throughput, use 100% Vera Rubin; if you have substantial high-value programming-level token generation needs, allocate about 25% of your data center to Groq.
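The prefill/decode split Huang describes can be sketched as a toy scheduler. This is an illustration of the disaggregation idea only; the pool names, request fields, and `schedule` function are invented for the example and are not Dynamo’s actual API.

```python
# Toy sketch of disaggregated inference scheduling in the spirit of
# Dynamo: the compute- and memory-heavy prefill phase goes to one
# accelerator pool, the latency-sensitive decode phase to another.
# Pool names and request fields are invented; not Dynamo's real API.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: int
    kv_cache: list = field(default_factory=list)  # built during prefill

class Pool:
    def __init__(self, name: str):
        self.name = name
        self.jobs = []

    def submit(self, req: Request) -> str:
        self.jobs.append(req)
        return self.name

rubin = Pool("rubin")  # large HBM capacity: suited to prefill
groq = Pool("groq")    # on-chip SRAM, low latency: suited to decode

def schedule(req: Request, phase: str) -> str:
    # Prefill processes the whole prompt and fills the KV cache, so it
    # is throughput-bound; decode emits one token at a time and is
    # latency-bound. Route each phase to the pool that suits it.
    return rubin.submit(req) if phase == "prefill" else groq.submit(req)

req = Request(prompt_tokens=8192)
print(schedule(req, "prefill"), schedule(req, "decode"))  # -> rubin groq
```

The KV cache produced by prefill must be handed from one pool to the other, which is why the text emphasizes the Ethernet coupling between the two processors.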
Huang also revealed that Groq LP30 chips, manufactured by Samsung, are already in mass production and expected to ship in Q3, and that the first Vera Rubin rack is already running on Microsoft Azure.
Additionally, Huang showcased Spectrum X, the world’s first mass-produced co-packaged optics (CPO) switch, addressing market concerns about the shift from copper to optics:
We need more copper cable capacity, more optical chip capacity, and more CPO capacity.
Agents End Traditional SaaS: “Salary Plus Tokens” Becomes the Silicon Valley Standard
Beyond hardware, Huang dedicated much of his speech to the revolution in AI software and ecosystems, especially the explosion of AI agents.
He described the open-source project OpenClaw as “the most popular open-source project in human history,” claiming it surpassed what Linux achieved in 30 years within just a few weeks. Huang straightforwardly said that OpenClaw is essentially the “operating system” for agent computers.
Huang asserted:
Every SaaS (Software-as-a-Service) company will become an AaaS (Agent-as-a-Service) company.
To ensure the safe deployment of these agents, which can access sensitive data and execute code, NVIDIA has launched the enterprise-grade NeMo Claw reference design, adding policy engines and privacy routers.
For ordinary workers, this transformation is also imminent. Huang depicted the future workplace:
In the future, every engineer in our company will have an annual token budget. Their base salary might be hundreds of thousands of dollars, and I will allocate about half that amount again as token quota, enabling them to achieve 10x efficiency improvements. This has become a new bargaining chip in Silicon Valley recruiting: how many tokens are included in your offer?
He also “leaked” that the next-generation computing architecture, Feynman, will enable the first co-expansion of copper and CPO. More intriguingly, NVIDIA is developing and deploying space-based data center computers called “Vera Rubin Space-1,” opening up the prospect of extending AI computing power beyond Earth.
The full text of Huang’s GTC 2026 speech (transcribed with AI assistance) follows:
Host: Welcome to the stage, NVIDIA founder and CEO Jensen Huang.
Jensen Huang, Founder and CEO:
Welcome to GTC. I want to remind everyone that this is a technology conference. Seeing so many people queuing early in the morning, and being here with all of you, makes me very happy.
At GTC, we focus on three main themes: technology, platform, and ecosystem. NVIDIA currently has three major platforms: the CUDA-X platform, system platform, and our latest AI factory platform.
Before we begin, I want to thank our pre-show hosts—Sarah Guo from Conviction, Alfred Lin from Sequoia Capital (NVIDIA’s first venture investor), and NVIDIA’s first major institutional investor Gavin Baker. These three have deep insights into technology and broad influence in the entire tech ecosystem. Of course, I also want to thank all the distinguished guests I personally invited to attend today. Thank you to this all-star team.
I also want to thank all the companies present today. NVIDIA is a platform company with technology, platforms, and a rich ecosystem. The companies here represent nearly all participants in the $100 trillion industry—450 companies sponsored this event, for which I am deeply grateful.
This conference features 1,000 technical forums and 2,000 speakers, covering every level of the AI “five-layer cake” architecture—from infrastructure like land, power, and data centers, to chips, platforms, models, and the various applications driving the industry forward.
CUDA: Twenty Years of Technological Accumulation
Everything starts here. This year marks the 20th anniversary of CUDA.
For twenty years, we have been committed to developing this architecture. CUDA is a revolutionary invention—SIMT (Single Instruction, Multiple Threads) technology allows developers to write scalar code and extend it to multi-threaded applications, with much lower programming difficulty than previous SIMD architectures. Recently, we added Tiles functionality to help developers more easily program Tensor Cores, and various mathematical structures essential for AI today. Currently, CUDA has thousands of tools, compilers, frameworks, and libraries, with hundreds of thousands of open projects in the open-source community, deeply integrated into every tech ecosystem.
This chart captures NVIDIA’s entire strategic logic, a slide I have shown from the very beginning. The most difficult and most important element is the “installed base” at the bottom of the chart. Over twenty years, we have accumulated hundreds of millions of GPUs and computing systems running CUDA worldwide.
Our GPUs cover all cloud platforms, serving nearly all computer manufacturers and industries. The vast installed base of CUDA is the fundamental reason this flywheel accelerates continuously. The installed base attracts developers, who create new algorithms and breakthroughs, which in turn spawn new markets, form new ecosystems, and attract more companies, further expanding the installed base—this flywheel keeps speeding up.
NVIDIA’s software downloads are growing at an astonishing rate, large in scale and increasing rapidly. This flywheel enables our computing platform to support massive applications and continuous breakthroughs.
More importantly, it also gives this infrastructure a very long lifespan. The reason is clear: the applications running on NVIDIA CUDA are extremely diverse, covering every stage of the AI lifecycle, a wide range of data-processing platforms, and scientific solvers. Once installed, NVIDIA GPUs retain high practical value. That is why, even six years after the release of the Ampere architecture GPU, its cloud rental prices have risen.
All this is driven by the enormous installed base, a powerful flywheel, and a broad developer ecosystem. When these factors work together, along with our ongoing software updates, computing costs keep decreasing. Accelerated computing not only boosts application performance significantly but also, through long-term software maintenance and iteration, allows users to enjoy ongoing performance gains and decreasing costs. We are committed to supporting every GPU globally for the long term because of their architecture compatibility.
We do this because of the huge installed base—every time we release an optimization, it benefits millions of users. This dynamic combination continuously expands our reach, accelerates our growth, and drives down costs, ultimately fueling new growth. CUDA is at the core of all this.
From GeForce to CUDA: Twenty-Five Years of Evolution
Our journey with CUDA actually began twenty-five years ago.
GeForce—many of you grew up with GeForce. GeForce is NVIDIA’s most successful marketing project. We started cultivating future customers when you couldn’t afford our products—your parents became NVIDIA’s earliest users, buying our products year after year, until one day you grew into excellent computer scientists and true customers and developers.
This foundation was laid by GeForce twenty-five years ago. We invented programmable shaders—an obvious yet profound invention that enabled accelerators to become programmable, and the world’s first programmable accelerator, the pixel shader. Five years later, we created CUDA—one of our most important investments ever. At that time, our financial resources were limited, but we bet most of our profits on it, aiming to extend CUDA from GeForce to every computer. Our conviction was deep because we believed in its potential. Despite initial hardships, we persisted through 13 generations over twenty years, and now CUDA is everywhere.
It was the pixel shader that drove the GeForce revolution. About eight years ago, we launched RTX—a comprehensive overhaul of architecture for modern computer graphics. GeForce brought CUDA to the world, and because of that, researchers like Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton, and Andrew Ng discovered that GPUs could be powerful accelerators for deep learning, igniting the AI explosion a decade ago.
Ten years ago, we decided to fuse programmable shading with two new ideas: first, hardware ray tracing, which was technically challenging; second, a forward-looking idea—about ten years ago, we foresaw that AI would fundamentally transform computer graphics. Just as GeForce brought AI to the world, AI is now reshaping how computer graphics are realized.
Today, I want to show you the future. It’s our next-generation graphics technology, called Neural Rendering—a deep fusion of 3D graphics and AI. This is DLSS 5, please watch.
Neural Rendering: The Fusion of Structured Data and Generative AI
Isn’t this breathtaking? Computer graphics are coming alive again.
What have we done? We combined controllable 3D graphics (the real foundation of virtual worlds) with structured data, then integrated generative AI and probabilistic computing. One is fully deterministic, the other probabilistic but highly realistic—we merge these two concepts, achieving precise control through structured data while generating in real-time. The result is content that is both stunning and fully controllable.
The idea of merging structured information with generative AI will repeatedly appear across industries. Structured data is the foundation of trustworthy AI.
Accelerating Platforms for Structured and Unstructured Data
Now I will show a technical architecture diagram.
Structured data—familiar platforms like SQL, Spark, Pandas, Velox, and major cloud platforms such as Snowflake, Databricks, Amazon EMR, Azure Fabric, Google BigQuery—all handle data frames. These data frames are like giant spreadsheets, carrying all the information of the business world, the fundamental facts (Ground Truth) for enterprise computing.
In the AI era, we need AI to use structured data, and to accelerate that use dramatically. In the past, accelerating structured data processing was about making enterprises more efficient. In the future, AI will use these data structures at speeds far beyond humans, and AI agents will invoke structured databases heavily.
As for unstructured data—vector databases, PDFs, videos, audio—these make up most of the world’s data: about 90% of the data generated each year is unstructured. In the past, such data was almost unusable: we wrote it to file systems and that was it. It could not easily be queried or retrieved, because unstructured data lacks simple indexing; its meaning and context must be understood first. Now AI can do this: using multimodal perception and understanding, AI can read PDFs, grasp their meaning, and embed them into larger, queryable structures.
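The embed-and-query pattern just described can be sketched with plain NumPy cosine similarity; GPU libraries such as cuVS accelerate exactly this kind of nearest-neighbor search at scale. The documents and the tiny 4-dimensional “embeddings” below are invented for illustration; real embeddings come from a multimodal model and have thousands of dimensions.

```python
# Minimal sketch of embedding-based retrieval over unstructured data.
# The documents and 4-d "embeddings" are invented for illustration;
# in practice a multimodal model produces the vectors, and libraries
# like cuVS run this nearest-neighbor search on GPUs.
import numpy as np

docs = ["quarterly revenue report", "gpu cooling manual", "token pricing memo"]
emb = np.array([[0.9, 0.1, 0.0, 0.2],
                [0.1, 0.8, 0.3, 0.0],
                [0.7, 0.0, 0.1, 0.6]])

def query(q_vec: np.ndarray, k: int = 1) -> list:
    # Cosine similarity between the query vector and every document.
    sims = emb @ q_vec / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q_vec))
    top = np.argsort(-sims)[:k]  # indices of the k best matches
    return [docs[i] for i in top]

print(query(np.array([0.8, 0.0, 0.1, 0.5])))  # -> ['token pricing memo']
```

This is what makes previously “write-only” data queryable: the index lives in vector space rather than in a file path or a schema.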
NVIDIA has created two foundational libraries for this: cuDF, which accelerates data-frame processing for structured data, and cuVS, which accelerates vector search over unstructured data. These two will become some of the most important foundational platforms of the future.
Today, we announce collaborations with multiple companies. IBM—creator of SQL—will use cuDF to accelerate its WatsonX Data platform. Dell has partnered with us to build the Dell AI Data Platform, integrating cuDF and cuVS, achieving significant performance improvements in real projects with NTT Data. Google Cloud is now accelerating not only Vertex AI but also BigQuery, and has partnered with Snapchat to reduce their computing costs by nearly 80%.
The benefits of accelerated computing are threefold: speed, scale, and cost. This aligns with Moore’s Law—achieving performance leaps through acceleration while continuously optimizing algorithms, allowing everyone to enjoy steadily decreasing costs.
NVIDIA has built an accelerated computing platform that consolidates many libraries: RTX, cuDF, cuVS, and more. These libraries are integrated into global cloud services and OEM systems, reaching users worldwide.
Deep Collaboration with Cloud Providers
Partnerships with major cloud providers
Google Cloud: We accelerate Vertex AI and BigQuery, deeply integrate with JAX/XLA, and perform excellently on PyTorch—NVIDIA is the only accelerator that performs well on both PyTorch and JAX/XLA. We have onboarded clients like Base10, CrowdStrike, Puma, and Salesforce into the Google Cloud ecosystem.
AWS: We accelerate EMR, SageMaker, and Bedrock, with deep integration. This year, I am especially excited that we will bring OpenAI into AWS, significantly boosting AWS cloud consumption and helping OpenAI expand regional deployment and compute scale.
Microsoft Azure: The first supercomputer we built, at 100 PFLOPS, was also the first supercomputer deployed on Azure, laying the foundation for the collaboration with OpenAI. We accelerate Azure cloud services and AI Foundry, jointly expand Azure regions, and work closely with Bing Search. Notably, NVIDIA GPUs are among the first to support Confidential Computing, which ensures that even operators cannot view user data and models, enabling secure deployment of OpenAI and Anthropic models across cloud regions. We also accelerate all Synopsys EDA and CAD workflows deployed on Microsoft Azure.
Oracle: We are Oracle’s first AI customer, proud to be the first to explain AI cloud concepts to Oracle. Since then, Oracle has grown rapidly, and we have introduced partners like Cohere, Fireworks, and OpenAI.
CoreWeave: The world’s first AI-native cloud, born for GPU hosting and AI cloud services, with an excellent customer base and strong growth momentum.
Palantir + Dell: A tripartite collaboration creating a new AI platform based on Palantir’s Ontology Platform and AI platform, capable of deploying AI fully locally in any country, any air-gapped environment—from data processing (vectorized or structured) to the entire AI acceleration stack.
NVIDIA has established this kind of mutually beneficial ecosystem with global cloud providers—bringing customers into the cloud.
Vertical Integration and Horizontal Openness: NVIDIA’s Core Strategy
NVIDIA is the world’s first vertically integrated, horizontally open company.
The necessity of this model is simple: accelerated computing is not just about chips or systems; it’s about application acceleration. CPUs can make computers run faster overall, but this approach has reached a bottleneck. In the future, only application- or domain-specific acceleration can continue to deliver performance leaps and cost reductions.
This is why NVIDIA must deeply develop one library after another, one industry after another, one vertical sector after another. We are a vertically integrated computing company with no other path. We must understand applications, understand domains, deeply grasp algorithms, and be able to deploy them in any scenario—data centers, cloud, on-premises, edge, and even robotics.
At the same time, NVIDIA remains horizontally open, willing to integrate its technology into any partner’s platform, so that the benefits of accelerated computing can be enjoyed worldwide.
The participant structure of this GTC fully reflects this. Among attendees, the highest proportion is from the financial services industry—developers, not traders. Our ecosystem covers upstream and downstream supply chains. Whether companies are 50, 70, or 150 years old, last year was their best year ever. We are at the beginning of something very, very significant.
CUDA-X: Accelerated Computing Engines for Every Industry
In every vertical sector, NVIDIA has deep deployments.
All these core areas are supported by our CUDA-X libraries—fundamental to NVIDIA as an algorithm company. These libraries are our most vital assets, enabling our computing platform to deliver real value across industries.
One of the most important libraries is cuDNN (CUDA Deep Neural Network library), which revolutionized AI and triggered the modern AI explosion.
(Playing CUDA-X demo video)
What you just saw is all simulation—including physics-based solvers, AI agent physics models, and physical-AI robot models. Everything is simulated; there is no hand-keyed animation or manual rigging. This is NVIDIA’s core capability: unlocking these opportunities through deep understanding of algorithms and organic integration with the computing platform.
AI Native Enterprises and the New Computing Era
You saw industry giants like Walmart, L’Oréal, JPMorgan Chase, Roche, Toyota, and many others, as well as a large number of companies you’ve never heard of—we call them AI-native enterprises. The list is enormous, including OpenAI, Anthropic, and many emerging companies serving different verticals.
In the past two years, this industry has experienced a remarkable leap. Venture capital inflows into startups reached $150 billion, a record in human history. More importantly, the size of individual investments has jumped from millions to hundreds of millions or billions of dollars. The reason is clear: for the first time in history, every such company needs massive compute and enormous volumes of tokens. These companies are either generating tokens themselves or adding value on top of tokens from providers like Anthropic and OpenAI.
Just as the PC revolution, internet revolution, and mobile cloud revolution spawned epoch-making companies, this generation of computing platform transformation will also give rise to influential companies that will become key forces in the future world.
Three Historic Breakthroughs Driving It All
What exactly happened in the past two years? Three major events.
First: ChatGPT, ushering in the generative AI era (late 2022 to 2023)
It not only perceives and understands but also generates unique content. I showed the fusion of generative AI and computer graphics. Generative AI fundamentally changes the way we compute—shifting from retrieval-based to generation-based computing, profoundly impacting architecture, deployment, and overall significance.
Second: Reasoning AI, exemplified by o1
Reasoning enables AI to reflect, plan, and decompose problems—breaking incomprehensible issues down into manageable steps. o1 makes generative AI trustworthy, capable of reasoning over real information. Doing this dramatically increases both input context tokens and reasoning output tokens, significantly raising the computational load.
Third: Claude Code, the first intelligent agent model
It can read files, write code, compile, test, evaluate, and iterate. Claude Code revolutionizes software engineering: 100% of NVIDIA’s engineers use one or more of Claude Code, Codex, and Cursor. No software engineer works without AI assistance.
This is a new inflection point: you no longer ask AI “what, where, how,” but let it “create, execute, build”—actively using tools, reading files, decomposing problems, and taking action. AI has moved from perception to generation, to reasoning, and now to actually completing work.
In the past two years, the computational demand for reasoning has increased about 10,000 times, and usage has grown about 100 times. I have always believed that compute demand has grown about 1,000,000 times over these two years—the product of those two factors—and this is a feeling shared by everyone, including OpenAI and Anthropic. More compute means more tokens generated, higher revenue, and smarter AI. The reasoning inflection point has arrived.
The Era of Trillion-Dollar AI Infrastructure
Last year at this time, I said we had high confidence in the demand and procurement orders for Blackwell and Rubin through 2026, totaling about $500 billion. Today, one GTC later, I stand here to tell you: looking to 2027, I see at least $1 trillion. And I am certain that actual compute demand will be far beyond that.
2025: NVIDIA’s Year of Inference
2025 is NVIDIA’s Year of Inference. We aim to ensure excellence at every stage of the AI lifecycle—beyond training and post-training—so that invested infrastructure can operate efficiently and have longer effective lifespans at lower unit costs.
Meanwhile, Anthropic and Meta have officially joined the NVIDIA platform, representing about one-third of global AI compute demand. Open-source models are approaching cutting-edge levels and are ubiquitous.
NVIDIA is currently the only platform capable of running all AI domains—language, biology, graphics, vision, speech, proteins, chemistry, robotics—across edge and cloud, in any language. Our architecture is universal for all these scenarios, making us the lowest-cost, highest-confidence platform.
Currently, 60% of NVIDIA’s business comes from the top five hyperscale cloud providers worldwide, with the remaining 40% spread across regional clouds, sovereign clouds, enterprises, industrial sectors, robotics, and edge computing. The breadth of AI coverage itself is its resilience—this is undoubtedly a new computing platform revolution.
Grace Blackwell and NVLink 72: Bold Architectural Innovation
While the Hopper architecture was still at its peak, we decided to completely redesign the system, expanding NVLink from 8 to 72 links, and restructuring the entire compute system. Grace Blackwell NVLink 72 is a major technological gamble, not easy for all partners—my sincere thanks to everyone involved.
At the same time, we launched NVFP4, a new type of tensor core and compute unit, not just ordinary FP4. We have demonstrated that NVFP4 can perform inference without precision loss, delivering huge gains in performance and energy efficiency, and it is also suitable for training. New algorithms such as Dynamo and TensorRT-LLM have emerged as well, and we even built DGX Cloud, a multi-billion-dollar supercomputer dedicated to kernel optimization.
The results are impressive: according to Semi Analysis—the most comprehensive AI inference performance evaluation to date—NVIDIA leads in both watts per token and cost per token. While Moore’s Law might have given H200 a 1.5x boost, we achieved 35x. Dylan Patel from Semi Analysis even said, “Jensen was conservative—actually, it’s 50x.” He’s right.
I quote him: “Jensen sandbagged.”
NVIDIA’s cost per token is the lowest globally—no one can match it. The reason lies in extreme co-design.
Take Fireworks as an example: before NVIDIA’s software and algorithm updates, average token speed was about 700 per second; after updates, it approached 5,000 per second—about a 7x improvement. That’s the power of extreme co-design.
AI Factory: From Data Center to Token Factory
Data centers used to store files; now they are token-producing factories. Every cloud provider and AI company will soon use “token factory efficiency” as a core metric.
My core argument: tokens are the new commodity. Once the market matures, tokens will be priced in tiers, with higher-value tokens commanding higher prices.
Compared with Hopper, Grace Blackwell increases throughput at the highest-value tier by 35 times and introduces a new tier. A simplified estimate, allocating 25% of power to each tier, suggests that Grace Blackwell can generate five times the revenue of Hopper.
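The five-fold revenue comparison can be reproduced as a back-of-envelope model. All tier throughputs and token prices below are invented placeholders chosen so the ratio lands near 5x; only the structure—power split equally across tiers, a new premium tier on the newer system, higher prices for higher-value tokens—follows the speech.

```python
# Back-of-envelope model of tiered token pricing. Tier throughputs
# and prices are invented placeholders; only the structure (equal
# power per tier, a new premium tier, higher prices for higher-value
# tokens) follows the speech.

def annual_revenue(power_gw: float, tiers: list) -> float:
    """tiers: (tokens_per_sec_per_gw, usd_per_million_tokens) pairs.
    The power budget is split equally across all tiers."""
    share = power_gw / len(tiers)
    usd_per_sec = sum(share * tps * price / 1e6 for tps, price in tiers)
    return usd_per_sec * 3600 * 24 * 365  # annualized

hopper = [(22e6, 0.10), (10e6, 1.00), (2e6, 5.00)]
# Blackwell: higher throughput per tier plus a new high-value tier.
blackwell = [(60e6, 0.10), (30e6, 1.00), (20e6, 5.00), (0.24e6, 50.00)]

print(round(annual_revenue(1, blackwell) / annual_revenue(1, hopper), 1))  # -> 5.0
```

With these placeholders, most of Blackwell’s extra revenue comes from the high-value tiers: it is the tier mix, not raw throughput alone, that drives the multiple.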
Vera Rubin: The Next-Generation AI Computing System
(Playing Vera Rubin system introduction video)
Vera Rubin is a complete, end-to-end optimized system designed specifically for agent workloads.
Vera Rubin is fully liquid-cooled, reducing installation time from two days to two hours, using 45°C hot water cooling, greatly easing data center cooling pressure. Satya Nadella has confirmed that the first Vera Rubin rack is now running on Microsoft Azure, which excites me greatly.
Groq Integration: The Ultimate Extension of Inference Performance
We acquired Groq’s team and licensed their technology. Groq is a deterministic dataflow processor, using static compilation and compiler scheduling, with large SRAM, optimized for inference workloads—offering extremely low latency and very high token generation speed.
However, Groq’s memory capacity is limited (500MB on-chip SRAM), making it difficult to independently handle large model parameters and KV caches, restricting large-scale applications.
The solution is Dynamo, a suite of inference-scheduling software that disaggregates the inference pipeline: the compute- and memory-heavy prefill phase runs on Vera Rubin, while the latency-sensitive decode phase runs on Groq.
The two are tightly coupled via Ethernet, with special modes that roughly halve latency. Under Dynamo’s unified scheduling—an “AI factory operating system”—overall performance improves 35x, opening inference-performance levels previously unreachable even with NVLink 72.
Huang recommends using Groq and Vera Rubin together: workloads dominated by high throughput should run 100% on Vera Rubin, while data centers with substantial high-value, programming-level token generation should allocate about 25% of capacity to Groq.
Groq LP30 chips, manufactured by Samsung, are already in mass production, expected to ship in Q3. Thanks to Samsung’s full cooperation.
Historical Leap in Inference Performance
Quantifying the progress: in two years, a 1 GW AI factory’s token generation rate will jump from 22 million tokens/sec to 700 million tokens/sec—a 350x increase. That’s the power of extreme co-design.
Roadmap
The roadmap clearly advances along three parallel paths: copper expansion, optical scale-up, and optical scale-out. We need all partners to continue expanding capacity in copper, fiber, and CPO.
NVIDIA DSX: Digital Twin Platform for AI Factories
AI factories are becoming ever more complex, yet their component suppliers have never collaborated during design; their parts first “meet” only inside the data center. That is no longer sufficient.
To address this, we created Omniverse and, on top of it, the NVIDIA DSX platform, on which all partners can co-design and operate gigawatt-scale AI factories in virtual worlds.
Conservatively, this system can improve energy utilization efficiency by about 2x, which is a significant gain at this scale. Omniverse, starting from the digital Earth, will host various digital twins, and we are building the largest computer in human history with global partners.
Furthermore, NVIDIA is venturing into space. Thor chips have passed radiation certification and are operating in satellites. We are developing Vera Rubin Space-1 for space-based data centers. In space, heat can be rejected only by radiation, making thermal management a key challenge that top engineers are working to solve.
OpenClaw: The Operating System of the Agent Era
Peter Steinberger developed software called OpenClaw. It is the most popular open-source project in human history, surpassing Linux’s achievements in just a few weeks.
OpenClaw is essentially a general-purpose agent system.
In OS terms, it’s truly an operating system—the operating system for agent computers. Windows made personal computing possible; OpenClaw makes personal agents possible.
Every enterprise needs to develop its own OpenClaw strategy, just as we need Linux, HTML, and Kubernetes strategies.
Revolutionizing Enterprise IT
Before OpenClaw, enterprise IT meant data and files entering systems and flowing through tools and workflows that ultimately served humans. Software companies built the tools; system integrators and consultants helped enterprises use them.
After OpenClaw, every SaaS company will become an AaaS (Agentic as a Service) company—not just providing tools, but offering specialized AI agents.
But there is a key challenge: enterprise agents can access sensitive data, execute code, and communicate externally, so strict controls are necessary.
To this end, we partnered with Peter to embed security into enterprise-grade versions, launching:
This is a renaissance for enterprise IT: a $2 trillion industry poised to grow into a multi-trillion-dollar sector as it shifts from providing tools to delivering specialized AI agent services.
I foresee that in the future, every engineer in a company will have an annual token budget. Their salary might be hundreds of thousands of dollars, and I will allocate roughly half of that as token quota, multiplying their productivity tenfold. “How many tokens are in your onboarding package?” has become a new hiring topic in Silicon Valley.
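As a rough sanity check of this budgeting idea, the sketch below converts a hypothetical half-salary token quota into tokens per year. Both the salary and the price per million tokens are illustrative assumptions, not figures from the speech:

```python
# Hypothetical back-of-the-envelope for an engineer's annual token budget.
# Both the salary and the price per million tokens are illustrative
# assumptions; neither figure comes from the keynote.
salary_usd = 200_000                    # example annual salary
token_budget_usd = salary_usd / 2       # "roughly half" per the speech
price_per_million_tokens_usd = 5.0      # assumed blended inference price
tokens_per_year = token_budget_usd / price_per_million_tokens_usd * 1_000_000
print(f"{tokens_per_year:,.0f} tokens/year")  # prints 20,000,000,000 tokens/year
```

At these assumed prices the quota lands in the tens of billions of tokens per year, which gives a sense of the scale behind the “token budget” framing.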
Every enterprise will be both a user of tokens (for its engineers) and a producer of tokens (serving its clients). The significance of OpenClaw is comparable to that of HTML and Linux: it is foundational.
NVIDIA’s Open Model Initiative
For custom agents, we offer NVIDIA’s own cutting-edge models:
the Nemotron language models; Cosmos, a world foundation model; the GROOT humanoid robot model; Alpamayo for autonomous driving; BioNeMo for digital biology; and Phys-AI for physics.
We are at the forefront in each of these fields and committed to continuous iteration: Nemotron 4 after Nemotron 3, Cosmos 2 after Cosmos 1, a second generation of GROOT after the first.
Within OpenClaw, Nemotron 3 ranks among the top three models globally. Nemotron 3 Ultra will be the most powerful foundation model ever built, supporting nations’ sovereign AI development.
Today, we announce the Nemotron Alliance, investing billions of dollars to advance AI foundational model R&D. Members include BlackForest Labs, Cursor, LangChain, Mistral, Perplexity, Reflection, Sarvam (India), Thinking Machines (Mira Murati’s lab), and others. Many enterprise software companies are joining, integrating NeMo Claw reference design and NVIDIA’s agent AI toolkit into their products.
Physical AI and Robotics
Digital agents act in the digital world, writing code and analyzing data; physical AI means embodied agents: robots.
At this GTC, 110 robots appeared, representing nearly every robotics R&D company worldwide. NVIDIA provides three computers (for training, simulation, and onboard operation) along with a complete software stack and AI models.
In autonomous driving, the “ChatGPT moment” has arrived. Today, we announce four new partners joining NVIDIA RoboTaxi Ready: BYD, Hyundai, Nissan, and Geely, with a combined annual output of 18 million vehicles. Alongside Mercedes-Benz, Toyota, and GM, the lineup keeps growing stronger. We also announce a major partnership with Uber to deploy and connect RoboTaxi-ready vehicles across multiple cities.
In industrial robotics, companies like ABB, Universal Robots, KUKA are collaborating with us to integrate physical AI models with simulation systems, advancing robot deployment in manufacturing lines worldwide.
In telecommunications, Caterpillar and T-Mobile are also involved. Future wireless base stations will no longer be mere communication nodes but NVIDIA Aerial AI RAN nodes: sensing traffic in real time, adjusting beamforming, and performing energy-efficient intelligent edge computing.
Special Segment: Olaf Robot Debuts
(Playing Disney Olaf robot demo video)
Huang: Snowman appears! Newton is running fine! Omniverse is working perfectly! Olaf, how are you?
Olaf: I’m so happy to see you.
Huang: Yes, because I gave you a computer—Jetson!
Olaf: What’s that?
Huang: Right inside your belly.
Olaf: Amazing.
Huang: You learned to walk in Omniverse.
Olaf: I like walking. It’s much better than riding a reindeer and gazing at the beautiful sky.
Huang: That’s thanks to physics simulation—based on NVIDIA Warp’s Newton solver, developed jointly with Disney and DeepMind, enabling you to adapt to the real physical world.
Olaf: I was just about to say that.
Huang: That’s your smart part.
Olaf: I am a snowman, not a snowball.
Huang: Can you imagine? The future Disney parks—where all these robot characters walk freely. Honestly, I thought you’d be taller. I’ve never seen such a short snowman.
Olaf: (shrugs)
Huang: Come help me finish today’s speech?
Olaf: Awesome!
Summary of the Keynote
Huang: Today, we discussed the following core themes: