Untrained CNNs Outperform Backpropagation in Early Visual Cortex

Key takeaways

Random weights beat backprop at V1 and V2
STDP led the trained models at V1
Feedback alignment lagged across early and middle areas
All five conditions converged at IT

A network with random weights can match, and even beat, a trained one at resembling activity in the brain's earliest visual stages. This paper compared four learning rules (backpropagation, feedback alignment, predictive coding, and spike-timing-dependent plasticity, or STDP) on the same convolutional neural network (a layered image model) and checked how closely their internal patterns matched human fMRI scans. The test set came from THINGS-fMRI: 720 stimuli seen by 3 subjects, analyzed with representational similarity analysis, or RSA, which asks whether two systems organize images in similar ways. At V1/V2, the untrained random-weights baseline scored higher than backpropagation, with rho = 0.076 versus 0.034. STDP had the best trained score at V1, while feedback alignment was the weakest there, at rho = 0.012. At LOC, only backpropagation reliably beat the random baseline. By IT, all five conditions converged to similar values, and no trained rule stood out. The authors say early visual alignment seems driven mainly by architecture, while learning rules matter more in the middle of the hierarchy.

V1 and V2 are the brain's first two visual stops. There, the untrained baseline scored rho = 0.076 and backpropagation scored 0.034. The test used 720 object pictures from THINGS-fMRI, a brain-scan set, and 3 human subjects. If you expected training to win everywhere, this is the surprise. At the earliest visual stages, a randomly initialized network could look more brain-like than one trained with backpropagation.

Why a random start mattered

At V1, STDP led the trained models with rho = 0.064. Feedback alignment was the weakest at V1 with rho = 0.012. It stayed low at V2 and LOC, the object area. At LOC, only backpropagation beat the random baseline. At IT, the final visual stage, all five conditions sat in a narrow band. Their rho values ran from 0.008 to 0.014. No trained rule pulled ahead there. The main split happened earlier, not later.

rho = 0.076 vs 0.034 at V1/V2 (untrained baseline vs backpropagation)

The random baseline beat backpropagation at V1 and V2.
STDP gave the best trained score at V1.
Feedback alignment ranked last at V1, V2, and LOC.
IT erased most gaps among all five conditions.

“early visual alignment is architecture-driven” (from the abstract)

How the comparison stayed fair

All models shared one convolutional network, a layered image model. Only the learning rule, or its absence in the untrained baseline, changed. Backpropagation sends error signals backward through the network along the same weights used on the way forward. Feedback alignment sends them through fixed random backward links instead. Predictive coding updates each layer from the gap between its guess and the input it receives. STDP, or spike-timing-dependent plasticity, changes links when spikes line up in time.

The human side came from fMRI, a brain scan that tracks blood flow. THINGS-fMRI supplied responses to 720 stimuli from 3 subjects. RSA, or representational similarity analysis, checks whether two systems group images in the same way. The study used 224 by 224 images and 5 random seeds. A partial RSA control removed simple pixel-matching effects, which kept the brain result from riding on image look-alikes.
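As a rough illustration of the RSA step, the sketch below builds a dissimilarity vector for each system (1 minus the Pearson correlation between the response patterns of every stimulus pair) and then takes the Spearman correlation of the two vectors. It is a minimal sketch on synthetic data, not the authors' pipeline: the distance measure, the function names (`rdm_upper`, `rsa_score`), and the toy sizes are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rdm_upper(patterns):
    """Upper triangle of the dissimilarity matrix (1 - Pearson r per stimulus pair)."""
    n = len(patterns)
    return [1.0 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]


def rsa_score(model_patterns, brain_patterns):
    """Spearman correlation between the two systems' dissimilarity vectors."""
    return spearman(rdm_upper(model_patterns), rdm_upper(brain_patterns))


# Synthetic stand-ins (assumption): 12 stimuli, 20 features or voxels each.
rng = random.Random(0)
model = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(12)]
# Hypothetical "brain" patterns: the model patterns plus independent noise.
brain = [[v + rng.gauss(0, 1) for v in row] for row in model]

print("model vs itself:", rsa_score(model, model))
print("model vs noisy copy:", rsa_score(model, brain))
```

Comparing dissimilarity vectors rather than raw responses is what lets a network layer and a set of voxels be scored against each other even though they have different numbers of units.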

“The shape came first. The training rule came second.”

Seed-to-seed noise stayed small at V1 and V2. A random seed fixes the random starting weights of one run, so five seeds give five independent runs. Because scores barely moved from seed to seed, the gaps between conditions are more convincing there. Partial RSA kept the same pattern after pixel control, so the early visual result did not come from easy image overlap alone. LOC gave backpropagation a small win. IT flattened the field. The highest visual level looked like a truce, not a race.
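One common way to implement a pixel control of this kind is a partial Spearman correlation: rank the model, brain, and pixel dissimilarity vectors, then remove the pixel vector's contribution from the model-brain correlation. The sketch below assumes that formulation; the source does not say which one the study used. The data are synthetic, and `partial_spearman` and the toy vectors are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rank(values):
    """Ranks starting at 1. Assumes no tied values (true for the toy data here)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for position, index in enumerate(order):
        out[index] = position + 1.0
    return out


def partial_spearman(model, brain, control):
    """Rank correlation of model and brain after removing the control's share."""
    x, y, z = rank(model), rank(brain), rank(control)
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy dissimilarity vectors (assumption): model and brain share ONLY a
# pixel-driven component, so any raw agreement is an image look-alike effect.
rng = random.Random(0)
n_pairs = 1000
pixel = [rng.gauss(0, 1) for _ in range(n_pairs)]
model = [p + rng.gauss(0, 1) for p in pixel]
brain = [p + rng.gauss(0, 1) for p in pixel]

raw = pearson(rank(model), rank(brain))
controlled = partial_spearman(model, brain, pixel)
print("raw Spearman:", raw)
print("partial Spearman (pixel removed):", controlled)
```

In this toy case the raw score is clearly positive while the controlled score falls close to zero, because the shared structure was entirely pixel-driven. The study reports the opposite outcome: its pattern held up after the pixel control.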

Why the middle layers mattered

This result suggests that alignment with early visual cortex may depend more on network shape than on the training rule. A network's shape is its architecture, or the fixed layout of layers and links. That makes the random baseline a real test, not a throwaway. The learning rules split apart in the middle of the hierarchy instead: LOC still gave backpropagation a small edge, and IT erased the gap. For model builders, the first design choice may matter before any learning trick does.

The next test: all three THINGS-fMRI subjects

The next check is the same V1/V2 contrast across all three THINGS-fMRI subjects. That would tell us whether the random-baseline edge is stable person by person. If it holds, early visual match may depend more on network shape than on how the model learns. That would make many learning-rule debates less central at the first visual stage. It would also push design work toward architecture choices first. If it does not hold, the surprise may be narrower than it looks. Either way, the random network has earned a seat at the table.
