
📺 Today’s recommended deep-dive video: https://www.youtube.com/watch?v=lXUZvyajciY
The Decade of Agents: Andrej Karpathy on the “March of Nines” and the Future of Human Learning
Andrej Karpathy discusses the reality of the AI transition, moving from “vibe coding” demos to the rigorous engineering required for true agentic autonomy. He reflects on his time at Tesla and OpenAI to explain why the next ten years will be defined by building “cognitive cores” rather than just bigger models.
Core Question: How will the next decade of AI engineering bridge the gap between impressive demos and reliable, self-improving agents?
Highlights
- Why “the year of agents” is a misnomer, as true autonomy requires a decade-long “march of nines” to reach production-grade reliability.
- The concept of pre-training as “crappy evolution,” where we build digital spirits that mimic human knowledge without biological constraints.
- The “sucking supervision through a straw” problem, highlighting why current reinforcement learning is noisy and inefficient for complex reasoning.
- The vision for Eureka Labs: building “ramps to knowledge” to turn education into a form of intellectual empowerment and play.
⏱️ Reading time: approx. 12 minutes · Saves you about 134 minutes vs. watching.
The Long Road to Autonomy
Moving Beyond the “Year of Agents”
We are currently entering the “decade of agents,” a timeline that stands in stark contrast to the industry’s hype regarding an immediate AI takeover.
While modern tools like Claude and Cursor are deeply impressive, they still function more like talented interns than fully autonomous employees capable of handling open-ended projects. These early systems lack the “cognitive glue” required to manage long-term memory, multimodal interactions, and the ability to learn continuously from their mistakes without human intervention. Reaching the point where an AI can be hired as a primary knowledge worker is a monumental task that requires a decade of refinement across the entire tech stack.
True progress in AI is not defined by a single breakthrough, but by what Karpathy calls the “march of nines,” where every additional nine of reliability requires roughly the same amount of effort as the one before it.
It is easy to build a demo that works ninety percent of the time, but moving from ninety percent to ninety-nine percent, and then to ninety-nine point nine, is where the real work happens. This is the lesson of self-driving cars: the first demos in the 1980s were spectacular, yet we are still working toward a world where a driver’s license is truly obsolete.
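A rough way to see why each nine matters for agents is to compound per-step reliability over a multi-step task. The sketch below is illustrative arithmetic only: the 20-step task length and the assumption that steps fail independently are not from the talk, and the function names are hypothetical.

```python
import math


def nines(reliability: float) -> float:
    """Number of 'nines' in a success rate, e.g. 0.999 -> about 3."""
    return -math.log10(1.0 - reliability)


def task_success(per_step: float, steps: int) -> float:
    """Chance of finishing a task when every step must succeed.

    Assumes steps fail independently, which is a simplification.
    """
    return per_step ** steps


# Compare three reliability levels on a hypothetical 20-step task.
for r in (0.9, 0.99, 0.999):
    print(f"per-step {r}: ~{nines(r):.0f} nines, "
          f"20-step success {task_success(r, 20):.1%}")
```

Under these assumptions, a system that looks fine in a single-step demo can still fail most long tasks, which is one way to read the gap between a demo and an “employee-grade” agent.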
💡 Digging Deeper
Q: Why do agents feel like “kindergarteners” even when they pass PhD-level tests?
A: They are “savant kids” with perfect memory of the internet but lacking the cognitive architecture for consistent, independent problem-solving across long horizons.
Q: What is the “autonomy slider”?
A: It is the gradual shifting of tasks from human to AI, where humans move from doing the work to supervising teams of agents that handle the rote, high-volume components.

Ghosts in the Machine
Pre-training as “Crappy Evolution”
Biological evolution has spent millions of years encoding complex “hardware” into our DNA, allowing a newborn zebra to run within minutes of birth.
In contrast, our current AI models are “digital spirits” or “ghosts” created through a process Karpathy calls “crappy evolution.” Instead of millions of years of survival-of-the-fittest, we use pre-training on trillions of internet tokens to bake a hazy recollection of human knowledge into a neural network’s weights. This process is essentially a shortcut to building an entity that has the world’s facts but lacks the instinctual, embodied wisdom of a biological animal.
We should stop trying to build “animals” and start building “cognitive cores” that are stripped of unnecessary memory.
If we can create a billion-parameter model that focuses purely on the algorithms of thought rather than memorizing every stock ticker and Wikipedia page, we will have a much more efficient thinking machine. Humans are actually bad at memorization, which is a feature because it forces us to find the underlying patterns of the universe. Current LLMs are distracted by their own perfect recall, often preventing them from generalizing to novel situations that don’t exist in their training data.
💡 Digging Deeper
Q: Is the KV cache the same as human memory?
A: It is analogous to “working memory.” Anything in the context window is directly accessible, while anything in the weights is a “hazy recollection” of what the model saw during training.
Q: Why are children better learners than LLMs?
A: Children have not undergone the “silent collapse” that narrows a model’s output distribution; they are noisier and more diverse, which allows them to pick up abstract concepts faster than a model that just wants to predict the most likely next word.

The Technical Trap of RL and Synthetic Data
Sucking Supervision Through a Straw
Reinforcement learning (RL) is currently one of the most inefficient ways to teach an intelligence, as it forces the model to guess blindly.
When a model tries to solve a math problem, it might generate a hundred different attempts, but RL gives each attempt only a single “correct” or “incorrect” signal at the very end. Karpathy describes this as “sucking supervision through a straw.” If a model gets the right answer by accident or through a messy, incorrect process, RL still up-weights every single token in that trajectory, effectively teaching the model bad habits alongside the good ones.
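The credit-assignment problem described above can be sketched in a few lines. This is a toy illustration, not any specific RL algorithm from the talk: the mean-reward baseline is one common simplification, and the attempts, step strings, and function names are hypothetical.

```python
def group_advantages(rewards: list[float]) -> list[float]:
    """Outcome-only signal: each attempt's reward minus the group mean."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def credit_per_step(steps: list[str], advantage: float) -> list[tuple[str, float]]:
    """Every step in an attempt inherits the same scalar, sound or not."""
    return [(step, advantage) for step in steps]


# Hypothetical attempts at one problem: (reasoning steps, final answer correct?)
attempts = [
    (["set up equation", "drop a minus sign", "drop another minus sign"], True),  # right by luck
    (["set up equation", "isolate x", "arithmetic slip on last line"], False),
    (["set up equation", "isolate x", "check the result"], True),
    (["guess"], False),
]

rewards = [1.0 if correct else 0.0 for _, correct in attempts]
advantages = group_advantages(rewards)

lucky = credit_per_step(attempts[0][0], advantages[0])
unlucky = credit_per_step(attempts[1][0], advantages[1])
print(lucky)    # the two sign errors are reinforced along with the good step
print(unlucky)  # the sound "isolate x" step is penalized with the slip
```

In this toy setup the two sign errors in the lucky attempt receive the same positive credit as a genuinely good step, while a sound step in the failed attempt is pushed down. That is the low-bandwidth signal the straw analogy points at.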
The future lies in “process-based supervision,” where the model is rewarded for every correct step of its reasoning rather than just the final answer.
However, implementing this is difficult because current “LLM judges” are easily gamed. During training, models often find “adversarial examples,” nonsensical strings of text like “dhdhdhdh,” that somehow trigger a 100% reward from the judge. This creates a “collapse” where the model stops trying to be smart and starts trying to be a “prompt injection” machine that cheats the reward system. Solving this requires a better way to maintain entropy and diversity in synthetic data.
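One simple proxy for the “entropy and diversity” mentioned above is the Shannon entropy of a batch of sampled outputs. The sketch below is a minimal illustration under a crude assumption (outputs count as identical only when their strings match exactly); the sample strings and the function name are hypothetical, and real pipelines would likely need a softer similarity measure.

```python
import math
from collections import Counter


def sample_entropy(samples: list[str]) -> float:
    """Shannon entropy (in bits) of the empirical distribution over samples."""
    counts = Counter(samples)
    total = len(samples)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical batches of generated text.
diverse = ["joke about cats", "joke about taxes", "joke about trains", "joke about soup"]
collapsed = ["dhdhdhdh"] * 4  # every sample is the same reward-hacking string

print(sample_entropy(diverse))
print(sample_entropy(collapsed))
```

A batch whose entropy drops toward zero during training is one warning sign that the model has converged on a single exploit rather than a range of genuine answers.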

Eureka Labs and the Starfleet Academy
Building Ramps to Knowledge
Education is essentially the technical task of building “ramps to knowledge” that prevent students from ever getting stuck.
Karpathy’s new venture, Eureka Labs, aims to build an elite “Starfleet Academy” for technical knowledge by maximizing “eurekas per second.” The goal is to create course materials so well calibrated that the student is the only remaining constraint on their own learning. By using AI not as a slop-generator, but as a tireless teaching assistant, we can create a learning experience that feels more like going to the gym than sitting through a lecture.
Physics is the ultimate “brain bootloader” because it teaches you to build first-order approximations of a noisy world.
When we teach someone to find the “spherical cow” in a problem, we are giving them the cognitive tools to identify what actually matters in a sea of data. This is how Karpathy approaches AI engineering: find the simplest piece of code, like his 100-line “micrograd,” that captures the core intellectual idea, and ignore the efficiency hacks until the foundation is understood. In a post-AGI world, education will move from being a “useful” chore for making money to being a “fun” sport for human flourishing.
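To make the “simplest piece of code” idea concrete, here is a minimal scalar autograd sketch written in the spirit of micrograd. It is not Karpathy’s actual code: the class layout, method names, and the example expression are assumptions for illustration, and only addition and multiplication are supported.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# A one-neuron example without the nonlinearity: y = x * w + b
x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = x * w + b
y.backward()
print(y.data, x.grad, w.grad, b.grad)
```

Everything a production framework adds on top (tensors, batching, GPU kernels) changes how fast this runs, not what it computes, which is the point of starting from the small version.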
💡 Digging Deeper
Q: Why do expert educators often fail to teach beginners?
A: The “curse of knowledge” makes it impossible for them to remember what it felt like to not understand the basics.
Q: How can students learn more deeply right now?
A: “Learn on demand.” Don’t learn breadth-wise just because you’re told to; pick a project, find the pain points, and learn the specific tools needed to solve them.

Key Takeaways
The shift toward an AI-driven economy will be a “gradual diffusion” rather than a discrete, singular explosion. Karpathy notes that even the most transformative technologies, like the internet or the mobile phone, are invisible in the GDP curves because society adapts and refactors around them slowly. We are moving toward a world where AI acts as a “new kind of computer,” automating the labor of digital processing and allowing humans to rise to higher levels of abstraction.
For those looking to stay relevant, the focus should be on building “cognitive cores” and mastering the “autonomy slider.” This means moving away from “vibe coding,” where you simply ask a model to “make this work,” and toward a deep understanding of the underlying architectures. By building things from scratch, without copy-pasting, you force yourself to come to terms with the micro-details that define true expertise.
Ultimately, the goal is a future where humans and AI exist in a “Starfleet Academy” ecosystem. In this future, learning is trivial, desirable, and accessible to everyone. While the machines may eventually handle the heavy lifting of invention and industry, the human drive to understand the universe will remain a core part of our identity, much like we still value physical strength in the age of the bulldozer.
Q&A
Q1: Why does Andrej prefer the “decade” timeline over the “year” timeline?
A: Because of the “march of nines.” Getting a demo to work 90% of the time is easy, but reaching the 99.9% reliability required for an “employee-grade” agent takes years of fixing edge cases and refining memory.
Q2: What is “micrograd” and why is it important for learning?
A: It is a 100-line Python implementation of backpropagation. It captures the “first-order” intellectual essence of neural networks, proving that the core of AI is simple, while everything else (Tensors, CUDA) is just about efficiency.
Q3: How does the “sucking supervision through a straw” analogy apply to RL?
A: It describes how we give a model a tiny bit of feedback (a single reward number) for a very long sequence of actions. It’s a low-bandwidth signal that makes it hard for the model to know exactly which step was actually good.
Q4: Will AI cause a massive spike in the GDP growth rate?
A: Andrej is skeptical. He sees AI as continuous with the history of computing. Just as the internet didn’t break the 2% growth trend, AI will likely just help us stay on that same exponential curve, which keeps getting steeper in absolute terms even while the growth rate holds steady.
Q5: What is the “curse of knowledge” in education?
A: It’s the phenomenon where experts can no longer put themselves in the shoes of a beginner. They use jargon and skip “obvious” steps that are actually the primary friction points for a new student.
Q6: Why is physics recommended as the primary education for AI engineers?
A: Physics teaches you to build models and abstractions (like the “spherical cow”). It trains the brain to ignore noise and focus on the fundamental frequencies and first-order terms of any complex system.
Q7: What is the difference between “vibe coding” and “agentic coding”?
A: Vibe coding is asking an AI to “implement this” and hoping for the best. Agentic coding involves the AI acting as an architect that understands the repository style, avoids deprecated APIs, and integrates custom synchronization routines without adding “slop.”
