AI Agency & Energy-Based Models: Decoding Intelligence

📺 Today’s recommended deep-dive video: https://www.youtube.com/watch?v=Ucqfb33GJJ4


Decoding Intelligence: From Geometric Symmetries to Agentic Autonomy

The boundaries between simple objects and complex agents are blurring as geometric deep learning allows us to encode the fundamental symmetries of the physical world directly into our models. By moving beyond brute-force prediction toward energy-based frameworks and counterfactual reasoning, we are uncovering a new metric for “sophistication” that defines what it truly means to be an agent. This discussion explores the architecture of future intelligence, where specialized, modular systems cooperate to solve problems that a single general-purpose algorithm never could.

Core Question: How can we distinguish true agency from sophisticated policy mapping, and what does this mean for the future of autonomous scientific discovery?

Highlights

  • Agency is defined by internal planning and counterfactual reasoning rather than simple input-output policy mappings.
  • Energy-based models (EBMs) offer a natural framework for Bayesian inference: they minimize a cost over internal states alongside the prediction error.
  • The path to advanced AI lies in “collective specialized intelligence” rather than a singular, monolithic General Intelligence.
  • Safe AI development requires perturbing empirically observed human reward distributions instead of manually specifying high-level goals.

⏱️ Reading time: approx. 8 minutes · Saves you about 39 minutes vs. watching.


Symmetry and the Spectrum of Agency

The Mathematician’s View of the World

Geometric deep learning is no longer just a niche technique; it is becoming an essential method for modeling the physical world by incorporating inherent symmetries such as translation and rotation.

While a brute-force approach can eventually “discover” that the world is rotationally invariant in the XY plane, building these constraints into the model from the start is far more efficient. It satisfies the mathematical urge to reflect reality as it is—subject to gravity and specific principal axes—using tools developed over the last several years to ensure our models don’t waste computation learning what we already know to be true.
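As a minimal sketch of what "building the constraint in" can mean, the snippet below uses a hand-picked feature (sorted pairwise distances) that is unchanged by rotation in the XY plane and by translation. The function names and the choice of feature are illustrative assumptions, not the method discussed in the video; real geometric deep learning layers are considerably more elaborate.

```python
import math

def rotate_xy(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def translate(points, dx, dy):
    """Shift every point by the same offset."""
    return [(x + dx, y + dy) for x, y in points]

def invariant_features(points):
    """Sorted pairwise distances: unchanged by rotation and translation."""
    dists = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            dists.append(math.hypot(x1 - x2, y1 - y2))
    return sorted(dists)

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = translate(rotate_xy(square, 0.7), 3.0, -2.0)

# The raw coordinates differ, but the invariant features agree.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(invariant_features(square), invariant_features(moved))
)
print(same)
```

A model fed these features never has to learn from data that a rotated square is still a square, which is the efficiency argument made above.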

Figure: A Geometric Deep Learning pipeline. On the left, a 3D object (a cube) undergoes rotation and translation. Arrows point to a 'Symmetry Filter' box that preserves features regardless of orientation. The output is a 'Canonical Representation' that feeds into a neural network, showing that the model's internal weights remain stable despite physical transformations of the input.

What Makes an Agent?

Agency is often misunderstood as a binary trait, yet it is better viewed as a spectrum of computational sophistication rather than a structural distinction from ordinary objects.

When we examine why we consider humans more “agentic” than amoebas, the distinction usually boils down to latent variables representing goal-oriented behavior and planning. This involves counterfactual reasoning (the ability to roll out future consequences internally before committing to an action), which decouples behavior from the immediate stimulus and impulse.

From an external perspective, however, distinguishing a truly agentic planner from a massive, brute-force lookup table is mathematically challenging. If the policy function is sufficiently complex, the behavior looks identical unless we “crack open” the black box to verify that internal simulations are actually occurring during the decision-making process.
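The toy example below sketches this point under simple assumptions: a planner that simulates futures in a seven-cell corridor, and a hand-written lookup table, produce exactly the same actions. The corridor, the goal position, and all names are hypothetical and chosen only for illustration.

```python
GOAL = 3
STATES = range(7)
ACTIONS = (-1, 0, 1)

def step(state, action):
    """Deterministic toy dynamics: move along a corridor, clipped at the walls."""
    return min(max(state + action, 0), 6)

def rollout_cost(state, depth):
    """Steps needed to reach the goal, found by simulating futures internally."""
    if state == GOAL or depth == 0:
        return abs(state - GOAL)
    return 1 + min(rollout_cost(step(state, a), depth - 1) for a in ACTIONS)

def plan(state, horizon=6):
    """Planner: pick the action whose simulated future reaches the goal soonest."""
    return min(ACTIONS, key=lambda a: rollout_cost(step(state, a), horizon - 1))

# A lookup table with no internal simulation at all.
TABLE = {0: 1, 1: 1, 2: 1, 3: 0, 4: -1, 5: -1, 6: -1}

# Observed only through their input-output behavior, the two cannot be told apart.
identical = all(plan(s) == TABLE[s] for s in STATES)
print(identical)
```

Only by inspecting the implementation (the recursive rollouts in `plan` versus the dictionary access) can we tell which one is doing counterfactual simulation.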

💡 Digging Deeper

Q: Is a rock an agent if it follows a “policy” of remaining stationary?
A: Technically, if a policy is just an input-output relationship, anything is an agent, but we usually reserve the term for things that use planning and counterfactual reasoning to compute those policies.

Q: How can we measure the “strength” of an agent?
A: We can use metrics like transfer entropy to estimate the timescale over which a system incorporates information or exhibits context-dependent behavior.
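To make the transfer entropy idea concrete, here is a minimal sketch using a plug-in estimator on binary time series with a history length of one. The synthetic data (a target that copies the source after three steps) and the lag scan are assumptions made for illustration; practical estimators need bias correction and longer histories.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target, lag=1):
    """Plug-in estimate of TE(source -> target) in bits, history length 1."""
    triples = Counter(
        (target[t + 1], target[t], source[t + 1 - lag])
        for t in range(lag - 1, len(target) - 1)
    )
    n = sum(triples.values())
    pairs_yy, pairs_yx, singles_y = Counter(), Counter(), Counter()
    for (y_next, y_now, x_past), c in triples.items():
        pairs_yy[(y_next, y_now)] += c
        pairs_yx[(y_now, x_past)] += c
        singles_y[y_now] += c
    te = 0.0
    for (y_next, y_now, x_past), c in triples.items():
        p_full = c / pairs_yx[(y_now, x_past)]                 # p(y_next | y_now, x_past)
        p_self = pairs_yy[(y_next, y_now)] / singles_y[y_now]  # p(y_next | y_now)
        te += (c / n) * math.log2(p_full / p_self)
    return te

random.seed(0)
DELAY = 3  # hypothetical: the target copies the source after 3 steps
x = [random.randint(0, 1) for _ in range(5000)]
y = [0] * DELAY + x[:-DELAY]

# Scanning over lags shows when the target incorporates the source's information.
te_by_lag = {lag: transfer_entropy(x, y, lag) for lag in range(1, 6)}
best_lag = max(te_by_lag, key=te_by_lag.get)
print(best_lag, round(te_by_lag[best_lag], 2))
```

The lag at which the estimate peaks is one way to read off the timescale mentioned in the answer above.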


Energy-Based Models and Predictive Frameworks

Beyond Feed-Forward Networks

Traditional neural networks typically focus on a direct mapping from inputs to outputs where the cost function only optimizes the weights of the model.

In an energy-based model (EBM), the architecture is more complex because the cost function operates on the internal, hidden states of the model as well. This requires two distinct minimizations: one to find the energetic minimum for the latent variables and another to minimize the actual prediction error. This approach acts as an inductive prior, placing constraints on the relationship between input and output that a standard feed-forward network would ignore.
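The two minimizations can be sketched with a scalar toy model in the spirit of predictive coding. The energy function, learning rates, and data below are assumptions chosen for illustration, not the specific model from the discussion: the inner loop settles the latent `z` to its energetic minimum, and the outer loop adjusts the weights to shrink the remaining errors.

```python
def settle(x, y, w1, w2, steps=50, lr_z=0.1):
    """Inner minimization: find the latent z that minimizes
    E = 0.5*(z - w1*x)**2 + 0.5*(y - w2*z)**2 for fixed weights."""
    z = w1 * x  # start from the feed-forward guess
    for _ in range(steps):
        grad_z = (z - w1 * x) - w2 * (y - w2 * z)
        z -= lr_z * grad_z
    return z

def train(data, epochs=300, lr_w=0.1):
    """Outer minimization: descend the same energy with respect to the weights."""
    w1, w2 = 0.5, 0.5
    for _ in range(epochs):
        for x, y in data:
            z = settle(x, y, w1, w2)
            e1 = z - w1 * x   # error between the latent and its prediction from x
            e2 = y - w2 * z   # error between the output and its prediction from z
            w1 += lr_w * e1 * x
            w2 += lr_w * e2 * z
    return w1, w2

# Hypothetical data generated by y = 2x.
data = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
w1, w2 = train(data)

def predict(x):
    """With y unknown, the energy is minimized by z = w1*x and y = w2*z."""
    return w2 * (w1 * x)

print(round(w1 * w2, 3), round(predict(3.0), 3))
```

The first energy term plays the role of the constraint on internal states: it penalizes latents that stray from what the input predicts, which a plain input-to-output regression has no way to express.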

Figure: A comparison between a 'Traditional Neural Network' and an 'Energy-Based Model'. The Traditional side shows a linear flow: Input -> Weights -> Output -> Cost Function. The EBM side shows a looped architecture: Input and Internal States feed into an 'Energy Function' (represented as a topographic map with peaks and valleys). Arrows show the system 'settling' into a valley (energetic minimum) before producing a final output.

The Bayesian Connection

Energy-based models and Bayesian frameworks are two sides of the same coin: energy is essentially the negative log probability of a state, up to an additive constant.

A Variational Autoencoder (VAE) is perhaps the most familiar model that can be read as an EBM: it applies a cost function to the internal representation, pushing the latent distribution to be as Gaussian as possible. By calculating the curvature at the energetic minimum, researchers can use Laplace approximations to make these complex probabilistic models computationally tractable. This bridge also lets us treat weights as latent variables during test-time training, so the model can adapt to new data without a full supervised retraining cycle.
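A Laplace approximation can be sketched in a few lines: find the energy minimum, measure the curvature there, and read off a Gaussian whose variance is the inverse curvature. The toy energy below (a hyperbolic cosine, treated as a negative log density up to a constant) and the finite-difference derivatives are assumptions for illustration only.

```python
import math

def energy(z):
    """Toy non-quadratic energy: a negative log density up to a constant."""
    return math.cosh(z - 1.0)

def derivatives(f, z, h=1e-4):
    """Central finite differences for the slope and curvature of f at z."""
    grad = (f(z + h) - f(z - h)) / (2 * h)
    curv = (f(z + h) - 2 * f(z) + f(z - h)) / (h * h)
    return grad, curv

def laplace_approximation(f, z0=0.0, steps=20):
    """Newton steps to the energy minimum, then variance = 1 / curvature."""
    z = z0
    for _ in range(steps):
        grad, curv = derivatives(f, z)
        z -= grad / curv
    _, curv = derivatives(f, z)
    return z, 1.0 / curv

mean, variance = laplace_approximation(energy)
print(round(mean, 4), round(variance, 4))
```

For this energy the minimum sits at z = 1 with curvature cosh(0) = 1, so the approximating Gaussian has mean 1 and variance 1. A sharper valley (higher curvature) would give a narrower, more confident Gaussian.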


JEPA and the Evolution of Learning

The Power of Joint Embedding

Science is fundamentally about data compression and prediction, and architectures like JEPA (Joint Embedding Predictive Architecture) make this explicit by learning in a compressed space.

Instead of trying to predict every single pixel in an image—a task that often wastes energy on irrelevant noise—JEPA embeds both the input and the output into a latent space and learns the relationship between them there. This allows the model to capture “gestalt” concepts and high-level conceptual understandings, which is far more useful for navigating the world than perfect pixel-level reconstruction.
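The difference between comparing in pixel space and comparing in embedding space can be sketched as follows. This is not JEPA itself: the encoder here is a fixed block average rather than a learned network, the predictor is assumed to be the identity, and the signal is synthetic. It only illustrates why an embedding-space loss is less sensitive to pixel-level noise.

```python
import random

def encode(signal, block=4):
    """Hypothetical fixed encoder: average each block of `block` samples."""
    return [sum(signal[i:i + block]) / block for i in range(0, len(signal), block)]

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

random.seed(0)
# Coarse structure shared by context and target: 0,0,0,0,1,1,1,1,2,...
context = [float(i // 4) for i in range(16)]
# The target has the same structure plus unpredictable pixel-level noise.
target = [v + random.gauss(0.0, 0.5) for v in context]

pixel_loss = mse(context, target)                       # compare every pixel
embedding_loss = mse(encode(context), encode(target))   # compare in latent space
print(pixel_loss > embedding_loss)
```

The pixel-space loss is dominated by noise that no predictor could anticipate, while the embedding-space loss mostly reflects whether the coarse structure matches.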

Figure: The JEPA architecture. On the left, an 'Input' image passes through an 'Encoder x'. On the right, an 'Output' target passes through an 'Encoder y'. Both feed into a central 'Predictor' block. A 'Latent Variable' input also feeds into the Predictor to account for uncertainty. The diagram highlights that comparison happens in the 'Embedding Space' rather than the 'Pixel Space'.

Avoiding the Collapse

The primary challenge of joint embedding is “representation collapse,” where the model finds a trivial solution by mapping all inputs and outputs to the same constant embedding, such as zero.

To solve this, researchers use non-contrastive learning and various forms of regularization to maintain the richness and fidelity of the representations. Unlike supervised learning, which often discards “long-tail” information that seems irrelevant to a specific task, self-supervised learning aims to preserve as much ambiguity as necessary to remain useful for multiple downstream applications. This mirrors how the brain processes data—constantly deciding what is task-irrelevant while maintaining enough signal to adapt if the context changes.
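One way to see why regularization is needed is to score a collapsed batch of embeddings: its prediction loss is perfect, so only an extra term can reject it. The sketch below uses a hinge on the per-dimension standard deviation, in the style of variance regularizers such as VICReg. Treating that as the kind of regularization meant here is an assumption, and the embeddings are made up.

```python
import math

def prediction_loss(pred, target):
    """Mean squared error between predicted and target embeddings."""
    n = len(pred) * len(pred[0])
    return sum(
        (p - t) ** 2 for pv, tv in zip(pred, target) for p, t in zip(pv, tv)
    ) / n

def variance_penalty(batch, gamma=1.0, eps=1e-4):
    """Hinge penalty on each embedding dimension whose std falls below gamma."""
    dims = len(batch[0])
    total = 0.0
    for d in range(dims):
        col = [v[d] for v in batch]
        mean = sum(col) / len(col)
        var = sum((c - mean) ** 2 for c in col) / (len(col) - 1)
        total += max(0.0, gamma - math.sqrt(var + eps))
    return total / dims

# Hypothetical batches of 2-dimensional embeddings.
collapsed = [[0.0, 0.0] for _ in range(4)]
healthy = [[2.0, -1.0], [-2.0, 1.5], [0.5, -1.5], [-0.5, 1.0]]

# The collapsed solution predicts itself perfectly ...
print(prediction_loss(collapsed, collapsed))
# ... so only the variance term distinguishes it from a useful representation.
print(variance_penalty(collapsed), variance_penalty(healthy))
```

Adding the penalty to the prediction loss makes the all-zeros solution costly without requiring negative (contrastive) pairs.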

💡 Digging Deeper

Q: Why is PCA risky for neural data?
A: Principal Component Analysis throws away low-variability dimensions, but in neural data, those “quiet” dimensions often carry the most valuable information.

Q: What is “transduction” in this context?
A: It is the process of performing search or optimization based on the specific test samples at hand, rather than relying on a smoothed-out average from training.
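The PCA caveat in the first answer above can be illustrated with a two-dimensional toy dataset. The "recordings" are hypothetical: a high-variance dimension unrelated to the label and a low-variance dimension whose sign encodes it. The principal axis uses the closed-form angle for a 2x2 covariance matrix.

```python
import math

# Hypothetical data: (loud value, quiet value, label).
data = [(float(loud), 0.1 if label else -0.1, label)
        for loud in range(-3, 4) for label in (0, 1)]
xs = [p[0] for p in data]
ys = [p[1] for p in data]
labels = [p[2] for p in data]

n = len(data)
mx, my = sum(xs) / n, sum(ys) / n
cxx = sum((x - mx) ** 2 for x in xs) / n
cyy = sum((y - my) ** 2 for y in ys) / n
cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Direction of the top principal component of a 2x2 covariance matrix.
theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
pc1 = (math.cos(theta), math.sin(theta))
pc2 = (-math.sin(theta), math.cos(theta))

kept = [(x - mx) * pc1[0] + (y - my) * pc1[1] for x, y in zip(xs, ys)]
dropped = [(x - mx) * pc2[0] + (y - my) * pc2[1] for x, y in zip(xs, ys)]

def sign_accuracy(values):
    """Accuracy of the better of the two sign-based classifiers."""
    hits = sum((v > 0) == bool(label) for v, label in zip(values, labels))
    return max(hits, n - hits) / n

print(sign_accuracy(kept), sign_accuracy(dropped))
```

Reducing to one component keeps the loud dimension, which predicts the label at roughly chance level, and discards the quiet dimension, which predicts it perfectly.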


The Future: Specialized Intelligence and Safety

Collective Intelligence vs. AGI

The term “Artificial General Intelligence” may be a misnomer; what we are actually building toward is a system of collective specialized intelligences.

Just as the human brain evolved by linking specialized modules—like the olfactory cortex adapting into the associative frontal cortex—AI will likely reach its peak by combining highly specialized agents that communicate. This modularity is like a set of Legos; the bricks are specialized, but their ability to connect in infinite ways is what gives rise to emergent creativity and system-level engineering.

Figure: A concept map of 'Collective Specialized Intelligence'. Central nodes are labeled 'Module: Logic', 'Module: Vision', 'Module: Physics', and 'Module: Planning'. These are interconnected by a 'Communication Layer'. The map shows that while each node is specialized, the 'Emergent Intelligence' (represented by a glowing aura around the network) arises from their interaction.

A Safe Path to Super-Intelligence

The greatest risk in AI safety isn’t a “rogue Skynet” but the naive specification of goals, such as asking an AI to “end world hunger” without defining the constraints.

A safer approach involves empirical reward estimation: observing how humans currently make decisions and modeling that stationary distribution of actions and outcomes. Instead of writing a reward function by hand, we can use Maximum Entropy Inverse Reinforcement Learning to capture the current state of human policy and then make small, controlled perturbations to improve it. This ensures the AI remains a partner that improves our understanding of the world, rather than a crutch that leads to human enfeeblement.
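The "estimate, then perturb" recipe can be sketched in a drastically simplified one-step setting, where the maximum-entropy policy is p(a) proportional to exp(r(a)) and the reward is therefore recoverable from action frequencies up to a constant. Full Maximum Entropy IRL works over trajectories and feature expectations; the action counts, the perturbation size, and all names below are hypothetical.

```python
import math

# Hypothetical counts of observed human choices.
observed_counts = {"walk": 50, "bike": 30, "drive": 20}

def estimate_reward(counts):
    """One-step max-entropy model: r(a) = log p(a), up to an additive constant."""
    total = sum(counts.values())
    return {a: math.log(c / total) for a, c in counts.items()}

def softmax_policy(reward):
    """Max-entropy policy implied by a reward: p(a) proportional to exp(r(a))."""
    z = sum(math.exp(r) for r in reward.values())
    return {a: math.exp(r) / z for a, r in reward.items()}

def kl(p, q):
    """KL divergence KL(p || q) in nats."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in p)

reward = estimate_reward(observed_counts)
baseline = softmax_policy(reward)  # reproduces the observed behavior

# A small, controlled perturbation: slightly raise the reward for cycling.
perturbed_reward = dict(reward, bike=reward["bike"] + 0.2)
perturbed = softmax_policy(perturbed_reward)

shift = kl(perturbed, baseline)
print({a: round(p, 3) for a, p in perturbed.items()}, round(shift, 4))
```

Measuring the divergence from the baseline gives a simple handle on how far a proposed change moves away from empirically observed behavior, which is the sense in which the perturbation stays "controlled".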


Key Takeaways

Agency is not a mystical quality but a measurable degree of computational sophistication. By focusing on the internal states of models—specifically their ability to simulate counterfactuals and plan for future states—we can move away from “impulse-response” machines toward true autonomous agents. This transition is supported by energy-based modeling, which provides a more robust mathematical framework for understanding how systems “settle” into the most logical explanations for the data they observe.

The future of AI will be defined by modularity and specialization. Rather than a single “god-like” algorithm, we are creating a technological ecosystem that mimics the evolutionary history of the brain. By emphasizing experimental design and self-supervised learning, these systems will eventually be capable of autonomous scientific discovery, poking at the world to learn its properties much like a child with a beach ball.

Safety in this new era depends on our ability to integrate with these technologies as partners. We must resist the urge to offload all thinking to machines, which risks a dystopian “couch potato” future. Instead, by using AI to automate the drudgery of empirical inquiry, much as the tractor automated the drudgery of agriculture a century ago, we free the human spirit to pursue more complex, satisfying, and deeply intellectual challenges.


Q&A

Q1: Why is counterfactual reasoning considered the hallmark of agency?
A: Because it demonstrates that the system isn’t just reacting to an input; it is internally simulating multiple “what if” scenarios and selecting a path based on projected outcomes, which separates a planner from a simple function.

Q2: How does an energy-based model differ from a standard neural network?
A: A standard network optimizes weights to map X to Y. An EBM adds a layer of complexity where the model must also find the minimum “energy” for its internal latent states, effectively regularizing how it represents information.

Q3: Can a computer simulation ever be a “true” agent?
A: This is a philosophical divide. From a modeling perspective, if the simulation is indistinguishable from a physical agent, it has the same degree of agency. However, some argue that physical embodiment in the real world is a necessary component for “true” agency.

Q4: What is the benefit of “joint embedding” in JEPA?
A: It allows the model to ignore noisy, irrelevant details (like individual pixels) and focus on the conceptual relationship between the input and the goal, leading to better data compression and more abstractive power.

Q5: Will AI eventually run out of challenges to solve?
A: Likely not. As soon as an algorithm masters one benchmark, humans create more complex ones. It is a perpetual game of one-upmanship that pushes the boundaries of what we define as “intelligence.”

Q6: What is the safest way to give an AI a goal?
A: Instead of manual goal-setting, use Inverse Reinforcement Learning to observe current human behavior, estimate the reward function that produces that behavior, and then make minor, tested improvements to that distribution.

Q7: Is AGI the ultimate goal of AI research?
A: The speaker suggests that “Collective Specialized Intelligence” is a more accurate and desirable goal—a modular system where specialized agents work together, similar to how the different regions of the human brain cooperate.
