Learning as Gradient Descent · Cognitive Psychology

Try it right now: recite your own phone number, then recite a phone number you looked up ten seconds ago. The first comes back free — no friction, no search. The second is a climb: you grope, you guess, you check. Same task, two completely different costs. That difference is everything this module is about. Learning is the slow conversion of the second kind of recall into the first.

Here is the move. A skill you don't have is a displacement — you're far from the configuration where the right action is cheap and obvious. A skill you do have is ground — you're already there, and the answer costs almost nothing. Learning, then, is not "acquiring information." It is reshaping the landscape so that the path from where-you-are back to the right answer gets shorter, steeper, and cheaper every time you walk it.

And the brain only reshapes when it's wrong. This is one of the deepest findings in the science of learning, and it is almost embarrassingly simple.

Error is the teaching signal

In the Rescorla–Wagner model (1972), an association strengthens only to the degree that an outcome was surprising. If a cue already predicts the reward, a second cue paired with it learns nothing — there is no error left to explain. This is prediction error: the gap between what you expected and what happened. Wolfram Schultz and colleagues later found a physical substrate — dopamine neurons fire not for reward, but for reward you didn't see coming. Expected reward, no signal. Better than expected, a burst. Worse than expected, a dip. Dopamine is broadcasting the size of your mistake.

In displacement terms: error is the felt distance from ground. A correct prediction means you're already home — nothing to learn, no signal. A wrong one means you're displaced, and the size of the surprise sets the size of the correction the brain is willing to make. No error, no learning. A lesson that never surprises you teaches nothing, no matter how long you sit with it.
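
The case above, where a second cue learns nothing because the first already predicts the reward, is often called blocking, and a few lines of Python can reproduce it. This is a minimal sketch, not the model as published: the function name `rescorla_wagner`, the cue labels, the trial counts, and the values of `alpha` and `lam` are all illustrative assumptions, and the single learning rate folds the model's separate salience parameters together.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Run Rescorla-Wagner style updates over a sequence of trials.

    trials: list of (cues_present, rewarded) pairs, where cues_present is a
            tuple of cue names and rewarded is True or False.
    alpha:  learning rate (assumed value; stands in for the salience terms).
    lam:    maximum association the reward supports (assumed value).
    Returns the final association strength V for every cue seen.
    """
    V = {}
    for cues, rewarded in trials:
        for c in cues:
            V.setdefault(c, 0.0)
        prediction = sum(V[c] for c in cues)   # summed prediction of all present cues
        target = lam if rewarded else 0.0
        delta = target - prediction            # prediction error
        for c in cues:
            V[c] += alpha * delta              # every present cue shares the same error
    return V


# Blocking run: cue A alone predicts reward, then A and B appear together.
blocking = [(("A",), True)] * 30 + [(("A", "B"), True)] * 30
# Control run: A and B are paired with reward from the start.
control = [(("A", "B"), True)] * 30

v_block = rescorla_wagner(blocking)
v_ctrl = rescorla_wagner(control)
print(f"B after blocking: {v_block['B']:.4f}")
print(f"B in control:     {v_ctrl['B']:.4f}")
```

Under these assumptions, B ends near zero in the blocking run, because A has already used up the error, and near 0.5 in the control run, where A and B split the error from the first trial.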

Difficulty you want

This is why the easy way fails. Robert Bjork's research on desirable difficulties shows that the conditions that make practice feel productive — rereading, cramming, staying on one topic — build shallow, fast-fading gradients. The conditions that feel harder build durable ones. Three of them:

Retrieval over review. Pulling an answer out of memory (the testing effect) carves a deeper return path than putting it in again.
Spacing over massing. Letting the memory partly fade before you reload it forces a real climb each time, and the climb is what strengthens.
Interleaving over blocking. Mixing problem types forces you to first identify which path you're on, instead of walking a path already selected for you.

Each of these works by manufacturing a little error and a little effort: a real displacement to return from.

The plateau, and the path going quiet

Sometimes you stop improving while still practicing hard. That's a plateau — a local attractor. Your current method has reached the bottom of its own little valley, and getting better means climbing back out and over a ridge into a deeper one. Plateaus break when you change the method, not when you add hours.

And the endpoint of all this is automaticity. The expert isn't displaced where the novice is. The path that once cost full conscious effort now runs almost for free, below awareness. Skill is the path becoming cheap.

What you'll be able to do

Diagnose why a study session felt good but taught nothing — and swap it for retrieval, spacing, and interleaving.
Treat your mistakes as the signal rather than the failure, and seek the size of surprise that actually moves you.
Recognize a plateau as a local attractor and respond by changing the method, not grinding more reps.

The precise version

The rigorous layer. Optional — skip it and the practical core above still stands.

Let the cognitive ground state $S^0_{cog}$ be the configuration in which the correct response is produced at minimum cost. A not-yet-learned skill leaves the system at displacement $\xi_{cog} > 0$: the right action is far, and reaching it incurs an instantaneous cost $D_{cog}(\xi_{cog})$ in working-memory load, attentional effort, and metabolic draw. Recall that for a living system the ground is never free: $D_{cog}(0) = \theta > 0$, so even the expert idles above zero.

Learning is descent in this cost landscape, and the teaching signal plays the role of the gradient. Define prediction error $\delta$ as the actual outcome minus the expected one, so that a better-than-expected result gives $\delta > 0$ and a worse-than-expected result gives $\delta < 0$. Rescorla–Wagner updates the association in proportion to $\delta$; the dopamine signal Schultz recorded tracks $\delta$ in both directions. When $\delta = 0$ you sit at a minimum and nothing changes: no surprise, no descent. The brain reshapes the landscape only where the slope, and with it $\delta$, is nonzero.
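
One way to make the link between $\delta$ and a slope explicit is sketched here, under assumptions that are not part of the text above: write $V_i$ for the associative strength of cue $i$, $x_i \in \{0, 1\}$ for whether that cue is present, $\lambda$ for the outcome actually received, and $\alpha$ for a single learning rate standing in for Rescorla–Wagner's salience parameters. If the cost of a trial is taken to be half the squared error, the update is a gradient-descent step on that cost:

$$
\delta = \lambda - \sum_j V_j x_j, \qquad
\Delta V_i = \alpha\,\delta\,x_i = -\alpha\,\frac{\partial}{\partial V_i}\left(\tfrac{1}{2}\,\delta^2\right)
$$

On this reading the slope with respect to $V_i$ is $-\delta\,x_i$, so it vanishes exactly when $\delta = 0$ or when the cue is absent.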

Over an episode, the price paid is $\Phi_{cog} = \int D_{cog}\,dt$. Here is the desirable-difficulty paradox stated cleanly: practice that minimizes $\Phi_{cog}$ in the moment (massed, blocked rereading) produces a shallow gradient that decays fast. Practice that raises $\Phi_{cog}$ now — forcing each retrieval to begin from genuine displacement (spacing, interleaving, testing) — steepens the return gradient so that future episodes start nearer ground. You pay more per rep to lower the lifetime integral.

A plateau is a local minimum: $\nabla D_{cog} \approx 0$ along every direction your current method explores, while a deeper basin sits beyond a ridge it never probes. Escaping requires perturbing the method to expose a new descent direction. Automaticity is the limit in which the once-costly return path approaches the floor — the trajectory back to $S^0_{cog}$ approaches cost $\theta$, the irreducible idle.
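
The plateau picture can be tried out on a toy landscape. The sketch below assumes a hypothetical one-dimensional cost with a shallow valley and a deeper one; the function `cost`, the step size, the starting point, and the size of the perturbation are all illustrative choices, and nothing here is a model of real skill learning.

```python
def cost(x):
    # Hypothetical cost landscape with two valleys:
    # a shallow one near x = +1 and a deeper one near x = -1.
    return (x**2 - 1) ** 2 + 0.3 * x


def grad(x):
    # Analytic derivative of cost(x).
    return 4 * x * (x**2 - 1) + 0.3


def descend(x, lr=0.05, steps=500):
    # Plain gradient descent: keep stepping downhill along the local slope.
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# "More hours": extra steps of the same method stay in the shallow valley.
stuck = descend(1.5)
still_stuck = descend(stuck, steps=5000)

# "Change the method": a perturbation past the ridge exposes a new descent direction.
escaped = descend(stuck - 1.2)

print(f"stuck at x = {stuck:.3f}, cost = {cost(stuck):.3f}")
print(f"escaped to x = {escaped:.3f}, cost = {cost(escaped):.3f}")
```

Running more steps from `stuck` changes nothing, because the slope there is already zero. Only the restart on the far side of the ridge reaches the lower-cost valley.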

Worked example

You have two weeks for fifty vocabulary pairs. Plan A: eight hours of rereading the list, blocked by chapter. It feels fluent by hour two — low in-session $\Phi_{cog}$, low $\delta$, almost no surprise — and by exam day half has evaporated, because a shallow gradient was all you ever built.

Plan B: the same eight hours as spaced self-tests, words interleaved out of order, each session begun after enough delay that you genuinely fail a few. Every failure is a nonzero $\delta$ — a real displacement the brain corrects. It feels worse the whole way and the in-session cost is higher, but each retrieval steepens the return path, and recall holds. Same hours, opposite landscapes. The discomfort was the learning.
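
The bookkeeping behind the two plans can be sketched with a toy memory model. Everything in it is an assumption made for illustration: recall decays exponentially with a `stability` measured in days, each rep multiplies stability by an amount that grows with how hard the retrieval was, and `1 - recall` serves as a stand-in for both $\delta$ and in-session cost. The model builds in the mechanism it illustrates, so it shows how the accounting works rather than offering evidence for it.

```python
import math


def simulate(gaps_days, s0=1.0, gain=2.0, test_delay=7.0):
    """Toy memory model (all parameters are illustrative assumptions).

    gaps_days:  waiting time before each practice rep, in days.
    s0:         initial memory stability in days.
    gain:       how strongly an effortful retrieval increases stability.
    test_delay: days between the last rep and the final test.
    Returns (total in-session effort, recall probability at the test).
    """
    stability = s0
    effort = 0.0
    for gap in gaps_days:
        recall = math.exp(-gap / stability)    # exponential forgetting over the gap
        difficulty = 1.0 - recall              # stand-in for prediction error
        effort += difficulty                   # proxy for in-session cost
        stability *= 1.0 + gain * difficulty   # harder retrieval, larger gain
    return effort, math.exp(-test_delay / stability)


massed = simulate([0.01] * 8)   # Plan A: eight reps a few minutes apart
spaced = simulate([2.0] * 8)    # Plan B: eight reps two days apart

print(f"massed: effort = {massed[0]:.2f}, recall a week later = {massed[1]:.2f}")
print(f"spaced: effort = {spaced[0]:.2f}, recall a week later = {spaced[1]:.2f}")
```

With these assumed parameters the spaced schedule accumulates more in-session effort and ends with much higher recall a week after the last rep, which is the trade the worked example describes.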

Exercises

Take something you're currently studying by rereading. Convert one session into closed-book retrieval: read once, then write everything you can from memory before checking. Note where you failed — those gaps are your $\delta$, the only places learning happened.
For one skill, schedule three short practices spaced across a week instead of one long block, and deliberately interleave two or three sub-types. After a week, compare retention against something you crammed.
(Open-ended.) Find a plateau in your own life — a skill that stopped improving despite steady effort. Describe the local attractor you're stuck in, then name one change of method (not more hours) that would force a new descent direction.

Sources

Rincón, D., alice, & clöe (2026). Cognitive Displacement: A Planck Scale for Human Understanding.
The Displacement Framework.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical Conditioning II: Current Research and Theory (pp. 64–99). Appleton-Century-Crofts.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.
Bjork, R. A., & Bjork, E. L. — work on desirable difficulties, including the testing effect, spacing, and interleaving.