NVIDIA Vera Rubin & Physical AI: Jensen Huang CES Keynote

📺 Today’s recommended deep-dive video: https://www.youtube.com/watch?v=0NBILspM4c4


Beyond Blackwell: Jensen Huang Unveils the Vera Rubin Era and the Rise of Physical AI

NVIDIA is no longer just a chip company; it is the architect of a new industrial revolution in which software is trained rather than written and robots reason through the physical world. Jensen Huang’s CES 2026 keynote marks a pivotal transition from generative chatbots to “physical AI” that understands the laws of physics.

Core Question: How is NVIDIA reinventing the entire computing stack to support the massive shift toward reasoning agents and autonomous physical systems?

Highlights

  • Introduction of the Vera Rubin platform, featuring the new Vera CPU and Rubin GPU with extreme co-design.
  • The launch of Alpamayo, a “thinking” autonomous vehicle AI developed in partnership with Mercedes-Benz.
  • The emergence of “test-time scaling,” in which models spend extra compute at inference time to reason through a problem before answering, instead of returning a single-pass prediction.
  • Strategic collaborations with Siemens and Cadence to build “factories as robots” through digital twins and simulation.

⏱️ Reading time: approx. 8 minutes · Saves you about 83 minutes vs. watching.


The New Architecture of Reasoning

Moving from Pre-recorded Software to Thinking Agents

The computer industry is currently undergoing a double platform shift: we are moving to AI-centric applications while fundamentally reinventing how software is developed and executed. We no longer simply program software; we train it, shifting from CPU-dominant processing to GPU-accelerated computing that generates every pixel and token from scratch.

This shift means trillions of dollars of existing computing infrastructure are now being modernized.

The “ChatGPT moment” has evolved into the “reasoning moment,” characterized by test-time scaling where the model “thinks” before it speaks. Instead of a one-shot answer based on memorized patterns, new models like DeepSeek R1 and OpenAI’s o1 use reinforcement learning to break problems into steps, research information, and plan outcomes in real-time.
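Test-time scaling covers several techniques. Reasoning models such as o1 and DeepSeek R1 mainly spend the extra inference compute on long chains of thought, which is hard to reduce to a few lines. A simpler, well-documented relative, self-consistency (sample several answers and keep the majority), makes the compute-versus-accuracy tradeoff easy to quantify. The sketch assumes each sample is independently correct with a fixed probability p and treats all wrong answers as a single wrong answer, a pessimistic simplification; it illustrates the principle, not how any specific model works.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n sampled answers is correct.

    Illustrative assumptions: each sample is independently correct with
    probability p, and n is odd so ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    need = n // 2 + 1  # smallest strict majority
    # Binomial tail: sum over every outcome with at least `need` correct samples.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


# More samples = more inference-time compute spent on the same question.
for n in (1, 5, 15, 41):
    print(n, round(majority_vote_accuracy(0.6, n), 4))
```

With p = 0.6, a single sample is right 60% of the time and five votes raise that to about 68%. The gain only appears when p > 0.5; for p < 0.5, extra votes make the answer worse, so test-time compute amplifies a model's existing competence rather than replacing it.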

NVIDIA is facilitating this through “Blueprints,” which are integrated frameworks for building agentic AI. These systems are multi-modal, multi-model, and multi-cloud, allowing enterprises to use a “model router” to decide which specialized AI—whether it’s a proprietary frontier model or a local open-source model—is best suited for a specific task based on privacy and complexity.

Architecture diagram showing a central "Intent-Based Model Router" receiving a user prompt and distributing tasks to specialized models (LLMs, Physics AIs, Search Tools) and local edge devices like the DGX Spark.
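A minimal sketch of the routing decision described above, assuming two hypothetical targets and two hand-set signals: a privacy flag and a 0-to-1 complexity score. All names, including `Request`, `route`, and the 0.7 threshold, are illustrative inventions, not part of NVIDIA's Blueprints or any real API; a production router would typically score intent with a trained classifier or a small language model.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL_OPEN_MODEL = "local open-source model"
    CLOUD_FRONTIER_MODEL = "cloud frontier model"


@dataclass
class Request:
    prompt: str
    contains_private_data: bool  # hypothetical flag from an upstream privacy check
    complexity: float            # hypothetical score: 0.0 (trivial) to 1.0 (hard)


def route(req: Request, complexity_threshold: float = 0.7) -> Target:
    """Pick a model for a request; every rule here is an illustrative assumption."""
    # Privacy is checked first: sensitive data never leaves the local model.
    if req.contains_private_data:
        return Target.LOCAL_OPEN_MODEL
    # Hard, non-sensitive tasks go to the most capable model available.
    if req.complexity >= complexity_threshold:
        return Target.CLOUD_FRONTIER_MODEL
    # Everything else stays local, which is typically cheaper and faster.
    return Target.LOCAL_OPEN_MODEL


print(route(Request("Summarize my medical records", True, 0.9)).value)
print(route(Request("Prove this theorem step by step", False, 0.95)).value)
print(route(Request("What is 2 + 2?", False, 0.05)).value)
```

Putting the privacy check first means no complexity score, however high, can send sensitive data to the cloud.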

💡 Digging Deeper

Q: What is test-time scaling?
A: It is the process where an AI model uses additional compute at inference time to “think” through a problem, generating intermediate reasoning steps before committing to an answer rather than replying in a single pass.

Q: How does the “Model Router” work?
A: It acts as a manager that analyzes the user’s intent to determine if a task should be handled by a massive frontier model in the cloud or a smaller, private model running locally.

Q: Why is open source emphasized?
A: Open models like DeepSeek R1 allow every country and company to customize intelligence, ensuring that no industry is left behind in the AI revolution.


Physical AI: Common Sense for Robots

The Three-Computer Strategy for Autonomous Systems

The next frontier is Physical AI—systems that understand the laws of physics, such as gravity, friction, and object permanence. While digital AI interacts through screens, physical AI must interact with the world, requiring a “common sense” that cannot be learned from text alone but must be simulated in virtual environments.

Teaching a robot the “long tail” of reality requires a massive amount of synthetic data.

NVIDIA’s strategy relies on three distinct computers: one for training the models, one for real-time inference (the robot’s “brain”), and a third for simulation. This simulation environment, powered by Omniverse and the Cosmos world foundation models, lets robots and vehicles accumulate trillions of miles of travel or millions of task attempts in a physically accurate digital twin before they are deployed in the real world.
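Generating the “long tail” in simulation usually starts with varying scene parameters. The sketch below uses domain randomization, a widely used sim-to-real technique swapped in here for illustration: it samples physical and visual parameters for many scenario variants that a simulator would then render and run. It does not use Omniverse or Cosmos APIs; the `Scenario` fields and every range are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    friction: float         # tire-road friction coefficient
    object_mass_kg: float   # mass of a movable obstacle
    light_lux: float        # scene illuminance
    pedestrian_count: int


# Hypothetical ranges, chosen only for illustration. Wide ranges put a
# meaningful share of samples into rare, "long-tail" conditions.
RANGES = {
    "friction": (0.1, 1.0),          # roughly ice up to dry asphalt
    "object_mass_kg": (0.5, 50.0),
    "light_lux": (1.0, 100_000.0),   # dark street up to direct sunlight
    "pedestrian_count": (0, 20),
}


def sample_scenarios(n: int, seed: int = 0) -> list[Scenario]:
    """Draw n randomized scenario configurations for a (separate) simulator."""
    rng = random.Random(seed)  # fixed seed keeps the synthetic dataset reproducible
    lo_lux, hi_lux = RANGES["light_lux"]
    scenarios = []
    for _ in range(n):
        scenarios.append(Scenario(
            friction=rng.uniform(*RANGES["friction"]),
            object_mass_kg=rng.uniform(*RANGES["object_mass_kg"]),
            # Log-uniform: each decade of brightness is equally likely,
            # so dim scenes are not drowned out by bright ones.
            light_lux=10 ** rng.uniform(math.log10(lo_lux), math.log10(hi_lux)),
            pedestrian_count=rng.randint(*RANGES["pedestrian_count"]),
        ))
    return scenarios


batch = sample_scenarios(10_000)
# Count one rare combination: a slippery road at night.
rare = [s for s in batch if s.friction < 0.3 and s.light_lux < 10]
print(f"{len(rare)} of {len(batch)} scenarios are icy and dark")
```

Uniform sampling leaves rare combinations with only a small share of the batch; a real pipeline might deliberately oversample them.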

This methodology has culminated in Alpamayo, which NVIDIA calls the world’s first “thinking” autonomous vehicle AI. Trained end-to-end, it does not just issue steering and braking commands; it also produces a “chain of thought” explaining each action, which lets it handle complex, previously unseen road scenarios by reasoning through them the way a human driver would.

Process map illustrating the "Three-Computer Strategy": Computer 1 (DGX) for model training, Computer 2 (Omniverse/Cosmos) for synthetic data generation and simulation, and Computer 3 (Drive/Thor) for real-time inference inside the vehicle.


Vera Rubin: Engineering the Next Frontier

Extreme Co-Design and the 6-Chip Supercomputer

As AI models grow by 10x every year, Moore’s Law alone can no longer keep pace with the demand for computation. NVIDIA’s answer is “extreme co-design”: reinventing all six chips in the supercomputer stack (the Vera CPU, Rubin GPU, NVLink switch, SuperNIC, BlueField DPU, and Spectrum-X Ethernet switch) so that they function as a single, coherent engine.
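A back-of-the-envelope comparison makes the gap concrete. It takes the keynote's 10x-per-year model growth at face value and assumes the classic Moore's-law cadence of transistor counts doubling roughly every two years (an outside assumption, not a keynote figure), over n years:

```latex
\underbrace{10^{n}}_{\text{model growth}} \;\text{vs.}\; \underbrace{2^{n/2}}_{\text{transistor growth}}
\qquad
n = 3:\quad \frac{10^{3}}{2^{1.5}} \approx \frac{1000}{2.83} \approx 354
```

After three years, demand would outrun transistor growth by a factor of roughly 350, which is the case the keynote makes for co-designing the whole system rather than relying on smaller transistors.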

The Vera Rubin platform is the result of this massive 15,000-engineer-year effort.

The Vera CPU offers twice the performance per watt of the world’s most advanced processors, while the Rubin GPU delivers five times the peak inference performance of Blackwell. Despite having only 1.6 times the transistors of its predecessor, Rubin achieves this leap largely through its NVFP4 Tensor Core, which adapts numerical precision dynamically to maximize throughput while preserving accuracy.
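Five times the inference performance from 1.6 times the transistors implies roughly 3.1x more throughput per transistor (5 / 1.6 ≈ 3.1), so much of the gain has to come from elsewhere, such as narrower number formats. The publicly documented NVFP4 format stores 4-bit E2M1 values in blocks of 16 that share an FP8 (E4M3) scale, plus a per-tensor scale. The NumPy sketch below shows only the block-scaling idea, under stated simplifications: scales stay in float precision instead of FP8, there is no per-tensor scale, and rounding is plain nearest-value. It is not NVIDIA's Tensor Core implementation and does not model how the hardware chooses precision at runtime.

```python
import numpy as np

# Non-negative magnitudes a 4-bit E2M1 float can represent; the sign bit is separate.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # number of elements that share one scale factor


def quantize_blocks(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Block-scaled 4-bit quantization. Simplification: scales stay in float
    precision instead of being rounded to FP8, and there is no per-tensor scale."""
    if x.size % BLOCK:
        raise ValueError(f"size must be a multiple of {BLOCK}")
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    # Scale each block so its largest magnitude lands on the top grid value (6.0).
    scales = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    scaled = np.abs(blocks) / scales
    # Round every scaled magnitude to the nearest representable E2M1 value.
    nearest = np.abs(scaled[..., None] - E2M1_GRID).argmin(axis=-1)
    codes = np.sign(blocks) * E2M1_GRID[nearest]
    return codes, scales


def dequantize(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (codes * scales).reshape(-1)


rng = np.random.default_rng(0)
x = rng.normal(size=64).astype(np.float32)
codes, scales = quantize_blocks(x)
x_hat = dequantize(codes, scales)
print("max abs error:", float(np.abs(x - x_hat).max()))
```

Because each block of 16 gets its own scale, a block of small values is not forced onto the coarse grid that a single large outlier elsewhere in the tensor would otherwise impose.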

A critical innovation in this generation is the management of “Context Memory” or KV Cache. As AI conversations become longer and models more complex, the memory required to remember past interactions has exploded. NVIDIA introduced BlueField-4 to manage this memory off-chip but within the rack, providing each GPU with an additional 16 terabytes of fast, accessible context storage.
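To see why context memory balloons, the standard KV-cache estimate multiplies 2 (one key and one value vector) by the number of layers, KV heads, head dimension, and bytes per value, for every token kept. The sketch computes this for a hypothetical 70B-class configuration (80 layers, 8 KV heads, head dimension 128, 16-bit values; all assumed). It then models a toy two-tier placement policy in which least-recently-used sessions spill from GPU memory into a larger rack-level context tier instead of being discarded. The class and its LRU policy are illustrative assumptions, not BlueField-4's actual design or any NVIDIA API.

```python
from collections import OrderedDict


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_value: int = 2) -> int:
    # Factor 2: one key and one value vector per KV head per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


class TwoTierKVCache:
    """Toy LRU cache: hot sessions on the GPU, evicted ones in a larger context tier."""

    def __init__(self, gpu_capacity_bytes: int, context_capacity_bytes: int):
        self.gpu: OrderedDict[str, int] = OrderedDict()      # session id -> bytes, LRU first
        self.context: OrderedDict[str, int] = OrderedDict()
        self.gpu_cap = gpu_capacity_bytes
        self.ctx_cap = context_capacity_bytes

    @staticmethod
    def _insert(tier: OrderedDict, cap: int, sid: str, size: int) -> list:
        """Add a session as most recent; return the (sid, size) pairs evicted to fit."""
        tier[sid] = size
        tier.move_to_end(sid)
        evicted = []
        while sum(tier.values()) > cap and len(tier) > 1:
            evicted.append(tier.popitem(last=False))  # drop least recently used
        return evicted

    def access(self, sid: str, size: int) -> str:
        """Report where the session's KV cache was found: 'gpu', 'context', or 'miss'."""
        if sid in self.gpu:
            self.gpu.move_to_end(sid)
            return "gpu"
        # A 'miss' means the prefill must be recomputed from scratch.
        found = "context" if self.context.pop(sid, None) is not None else "miss"
        for old_sid, old_size in self._insert(self.gpu, self.gpu_cap, sid, size):
            # Spill to the context tier; anything pushed out of it is simply dropped.
            self._insert(self.context, self.ctx_cap, old_sid, old_size)
        return found


per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
session = per_token * 128 * 1024  # one 128K-token conversation
print(per_token, "bytes/token;", session / 2**30, "GiB per session")

cache = TwoTierKVCache(gpu_capacity_bytes=2 * session, context_capacity_bytes=10 * session)
results = [cache.access(sid, session) for sid in ["a", "b", "c", "a", "c", "b"]]
print(results)
```

For this configuration each token costs 320 KiB, so a single 128K-token conversation needs 40 GiB of KV cache on its own; at that size, the 16 terabytes per GPU cited above would hold several hundred such sessions. Every "context" hit in the trace is a prefill the GPU does not have to recompute.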

Functional block diagram of the Vera Rubin NVL72 rack, showing the interconnection between the Vera CPU, Rubin GPU, BlueField-4 DPU, and the 240 terabytes-per-second NVLink switch backplane.


Key Takeaways

The keynote highlights a fundamental shift in NVIDIA’s identity from a hardware vendor to a full-stack AI foundry. By open-sourcing models like Alpamayo and Cosmos, NVIDIA is positioning itself as the infrastructure layer for the entire “Physical AI” industry, aiming to make its software (Omniverse, Isaac, NeMo) as indispensable as its silicon.

The Vera Rubin platform shows that hardware scaling now depends on system-level integration and liquid cooling to stay efficient. By cooling the rack entirely with 45°C water, which removes the need for energy-hungry chillers, NVIDIA targets the data center’s energy problem while pushing inference and training speeds to levels that were unimaginable just 24 months ago.


Q&A

Q1: What is the significance of the “Vera Rubin” name?
A1: It honors American astronomer Vera Rubin, who discovered evidence of dark matter, symbolizing the “invisible” computation and data that powers the modern world.

Q2: How does Alpamayo differ from traditional self-driving stacks?
A2: Traditional stacks rely on rigid, hand-coded rules, whereas Alpamayo is a reasoning AI that can explain its decisions and handle “long-tail” edge cases through human-like logic.

Q3: Is the new hardware air-cooled or liquid-cooled?
A3: The Vera Rubin NVL72 is 100% liquid-cooled, allowing it to handle twice the power of previous generations while using “hot water” (45°C) to eliminate the need for expensive chillers.

Q4: What is the “KV Cache” problem?
A4: As AI conversations get longer, the “working memory” (Key-Value cache) grows too large for the GPU’s onboard memory. NVIDIA solved this by using BlueField-4 to create a dedicated context memory store in the rack.

Q5: How are Siemens and NVIDIA working together?
A5: They are integrating NVIDIA’s AI and Omniverse into Siemens’ industrial tools to create “agentic” chip and factory designers, essentially allowing factories to be designed and tested as giant robots in a computer.

Q6: What is Spectrum-X?
A6: It is NVIDIA’s high-performance Ethernet platform specifically designed for AI traffic, which is much more bursty and latency-sensitive than traditional internet data.
