Key takeaways
  • 150B neural tokens from mouse visual cortex
  • One model for prediction, decoding, forecasting
  • More data kept helping at every size tested
  • Bigger models hit a performance ceiling
  • Brain models still look data-limited

If you want brain-reading software to get smarter, this paper says the fastest path is more data, not a bigger model. The team built OmniMouse from recordings of 3.1 million neurons in the visual cortex of 73 mice, collected over 323 sessions and totaling more than 150 billion neural tokens. Those recordings cover natural movies, images, parametric stimuli, and behavior. OmniMouse is a multi-modal, multi-task model that can flexibly handle neural prediction, behavioral decoding, neural forecasting, or any combination of the three at test time. It reached state-of-the-art performance and beat specialized baselines across nearly all evaluation settings. The surprise was in how it improved: adding more data kept helping, while increasing model size eventually hit a wall. That flips the usual AI scaling story from language and vision, where bigger models often carry the day. Here, even in the mouse visual cortex, the models still look data-limited. The authors say this kind of scaling pattern could hint at phase transitions, where larger and richer datasets unlock new capabilities.

150 billion neural tokens is the kind of feed that should make a model feel spoiled. OmniMouse got that stream from 3.1 million neurons in the visual cortex of 73 mice, collected over 323 sessions while the animals viewed natural movies, images, and parametric stimuli and their behavior was recorded. You would expect that kind of scale to reward a larger model first. It does not. As the data pile grows, OmniMouse keeps improving; as the model itself gets bigger, the gains flatten. If you want brain-reading software that gets smarter, this study changes the shopping list: more experience comes before more size, because the bottleneck sits in the recordings more than in the network.

Why this flip matters

OmniMouse does more than one job, and that is part of its appeal. The same model can predict neural activity, decode behavior, forecast future neural activity, or handle any combination of those three at test time. Across nearly all evaluation regimes, it beat specialized baselines, so the win was not just a narrow trick on one task. The scaling curve is the real twist. As training data grew, performance rose in a steady way. As model size grew, the gains did not keep rising at the same pace and eventually saturated. In language and vision, bigger models often become the main engine of progress once data is abundant. Here, the limit showed up somewhere else, because the system still wanted more data first.
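To make the two curve shapes concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the functional forms, the coefficients, and the scale factors are made up to show the difference between a curve that keeps paying off and one that flattens, and none of the numbers come from the paper. It tracks error (lower is better) rather than performance.

```python
import numpy as np

def power_law(x, a, b):
    """Error that keeps falling as x grows: a * x**(-b)."""
    return a * np.power(x, -b)

def saturating(x, floor, a, b):
    """Error that levels off at `floor` however large x gets."""
    return floor + a * np.power(x, -b)

# Hypothetical scale factors (1x, 2x, 4x, ...), not values from the paper.
scale = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])

# Made-up coefficients, chosen only to show the two shapes.
data_curve = power_law(scale, a=1.0, b=0.3)               # "more data" axis
model_curve = saturating(scale, floor=0.5, a=0.5, b=1.0)  # "bigger model" axis

def gain_per_doubling(errors):
    """Relative error reduction bought by each doubling of scale."""
    return 1.0 - errors[1:] / errors[:-1]

data_gains = gain_per_doubling(data_curve)
model_gains = gain_per_doubling(model_curve)

for label, gains in (("data", data_gains), ("model size", model_gains)):
    print(label, np.round(gains, 3))
```

In this toy setup every doubling on the data axis buys the same relative improvement, while each doubling on the model-size axis buys less than the one before. That shrinking return is what "saturates" means in the article's description.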

How OmniMouse turns recordings into three answers

OmniMouse trains a multi-modal, multi-task model on a huge set of recordings, so one system can learn from what the mouse saw, how it behaved, and how its neurons fired together. The data cover natural movies, images, parametric stimuli, and behavior, and the model treats them as a shared stream rather than separate islands. That is why it can switch at test time between neural prediction, behavioral decoding, neural forecasting, or a mix of the three without being rebuilt for each job. The key idea is simple enough to picture: instead of teaching three small tools one at a time, OmniMouse learns one broad map of the same brain activity and then asks that map to answer different questions.

150B neural tokens from 3.1M neurons (73 mice, 323 sessions)

  • It predicts neural activity from visual input and behavior.
  • It decodes behavior from the same recordings.
  • It forecasts future neural activity, alone or together with the other tasks.
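One way to picture "any combination of the three at test time" is as a choice of which data streams the model reads and which it must fill in. The Python sketch below is an assumption for illustration only: the article does not describe OmniMouse's actual mechanism, and the stream names, the `TASKS` table, and the `plan` helper are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical task table: each task maps to (streams it reads, streams it predicts).
# The stream names are illustrative labels, not taken from the paper.
TASKS: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "neural_prediction": (frozenset({"stimulus", "behavior"}), frozenset({"neural_now"})),
    "behavioral_decoding": (frozenset({"neural_now"}), frozenset({"behavior"})),
    "neural_forecasting": (frozenset({"neural_now"}), frozenset({"neural_future"})),
}

def plan(*task_names: str) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Merge the requested tasks into one (visible streams, target streams) pair.

    Every requested target is predicted; a stream stays visible only if
    no requested task asks the model to predict it.
    """
    inputs, targets = set(), set()
    for name in task_names:
        task_inputs, task_targets = TASKS[name]
        inputs |= task_inputs
        targets |= task_targets
    return frozenset(inputs - targets), frozenset(targets)

visible, targets = plan("behavioral_decoding", "neural_forecasting")
print(sorted(visible), sorted(targets))
```

Under these assumptions, switching jobs means changing the request rather than rebuilding the model: asking for decoding and forecasting together leaves current neural activity visible and marks behavior and future activity as the things to predict.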

"performance scales reliably with more data, but gains from increasing model size saturate." (from the abstract)

Why it matters for brain modeling

That reversal changes the budget conversation. If performance keeps climbing with more data, then future gains may depend less on adding parameters and more on recording from more neurons, more sessions, and more varied stimuli. OmniMouse also pushes back on the idea that mouse visual cortex is too simple to teach us much about scaling. Even in this relatively tidy system, the old AI rule does not fully hold: data still looks like the scarce ingredient. The paper also raises a bigger possibility without claiming it outright: larger and richer datasets might one day unlock phase transitions in neural modeling, the same way scale has unlocked surprising abilities in other AI systems.

The next test

The most interesting follow-up is not abstract. It is whether the same curve holds when OmniMouse faces even larger mouse visual-cortex recordings with more sessions and richer sensory drive. If the data-first pattern stays intact, then the field will know where to spend its effort: on better coverage of neural activity, not on ever-larger parameter counts. If it bends or breaks, that would mark the place where scale starts to behave more like language and vision after all. For now, the surprise stays sharp: in this brain system, more data still looks like the stronger fuel.