Key takeaways
  • Separates steady traffic from sudden fluctuations
  • Uses two temporal paths for two kinds of motion
  • Masks some road links as traffic shifts
  • Tested on four real-world datasets

Traffic forecasts can make the difference between a smooth commute and a gridlocked one. The catch is that road data mixes steady rhythms, like recurring daily patterns, with sudden bursts from events and disruptions. This paper tackles that problem with ADMFormer, a transformer model (a neural network built to track relationships across data) that first separates traffic signals into dominant regularities and leftover fluctuations. It does that with a time-node adaptive gating mechanism, then sends those two parts through separate temporal branches: one for global periodic dependencies and one for high-frequency irregular changes. The model also trims the spatial links it pays attention to by using time-varying masked spatial attention, which keeps dynamic and informative connections while filtering redundant ones. On four real-world datasets, ADMFormer achieves state-of-the-art performance. That matters because traffic networks are both sparse and fast-changing, and the paper argues that treating every road connection the same can blur the very patterns forecasting systems need to see.

A morning commute can change in minutes. One road settles into the same daily rhythm, then a crash, a concert, or a rainstorm tears a hole in that pattern. ADMFormer starts from that everyday mess. Instead of treating traffic as one smooth stream, it splits the signal into what stays steady and what jumps around. It also avoids giving equal attention to every road-to-road link, because traffic networks are sparse and fast-moving. That choice matters: in traffic forecasting, the wrong connection can be worse than no connection at all, since noise spreads just as fast as useful detail. The model asks a simple question with a sharp edge: which parts of the map deserve attention right now?

Why the old one-size-fits-all view breaks down

The core claim is not just that ADMFormer predicts traffic well. It is that traffic needs two kinds of reading at once. The paper says traffic series contain stable periodic regularities and event-driven fluctuations, and those two patterns often live side by side. If a model folds them into one representation, it blurs the timing cues that repeat every day and the irregular bursts that matter most when the system gets stressed. ADMFormer answers that by first separating the signal and then giving each part its own path. On four real-world datasets, the model reaches state-of-the-art performance, which suggests that the split is not cosmetic: it helps the model keep rhythm and surprise apart long enough to learn both clearly.

Two kinds of time, one changing road map

ADMFormer begins with a time-node adaptive gating mechanism. In plain terms, it does not split every sensor the same way at every moment. It decides, for each time and node, how much of the signal belongs to the dominant regular pattern and how much belongs to the leftover fluctuation. After that, a dual-branch temporal module sends those two pieces down different routes. One branch looks for global periodic dependencies, while the other watches high-frequency irregular variation. That design matters because a rush-hour wave and a sudden disruption do not behave the same way, so one temporal lens would miss part of the story. The model then adds time-varying masked spatial attention, which keeps only the more useful road interactions as traffic states change.
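
The gating step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the sigmoid gate, the additive time and node embeddings, and every name here (`adaptive_gate_split`, `time_emb`, `node_emb`, `W`) are assumptions made for the example. The one property it is meant to show is that each time-node pair gets its own gate value, and that the two parts always add back up to the original signal.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_gate_split(h, time_emb, node_emb, W):
    """Split features h of shape (T, N, D) into two parts.

    Hypothetical gate (an assumption, not the paper's formula):
        gate[t, n] = sigmoid(h[t, n] @ W + time_emb[t] + node_emb[n])
    """
    # Broadcasting gives every (time, node) pair its own gate vector.
    logits = h @ W + time_emb[:, None, :] + node_emb[None, :, :]
    gate = sigmoid(logits)          # values strictly between 0 and 1
    regular = gate * h              # share assigned to the dominant pattern
    residual = h - regular          # leftover fluctuation
    return regular, residual, gate

# Toy data: 12 time steps, 4 sensors, 3 features per reading.
rng = np.random.default_rng(0)
T, N, D = 12, 4, 3
h = rng.normal(size=(T, N, D))
time_emb = 0.1 * rng.normal(size=(T, D))
node_emb = 0.1 * rng.normal(size=(N, D))
W = 0.1 * rng.normal(size=(D, D))

regular, residual, gate = adaptive_gate_split(h, time_emb, node_emb, W)
```

In a trained model the embeddings and `W` would be learned, so the gate could lean toward the regular part on an ordinary weekday morning and toward the residual during a disruption. Here they are random, so only the shapes and the additive split are meaningful.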

4 real-world datasets used for evaluation (multiple traffic benchmarks)

  1. The adaptive gate separates stable regularities from residual fluctuations at each time and node.
  2. The temporal branches split long-range periodic motion from high-frequency irregular change.
  3. The masked spatial attention keeps only the road links that look useful in the current traffic state.
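
The second step in the list above, the two temporal branches, might look roughly like the sketch below for a single sensor. It rests on assumptions that the source does not confirm: the global branch is modeled as plain self-attention over all time steps, the irregular branch as attention restricted to a short window around each step, and the two outputs are fused by simple addition. The function names and the `window` parameter are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def global_branch(regular):
    """Full self-attention over time: every step can see every other step."""
    d = regular.shape[-1]
    scores = regular @ regular.T / np.sqrt(d)       # (T, T)
    return softmax(scores, axis=-1) @ regular       # (T, D)

def local_branch(residual, window=2):
    """Banded self-attention: each step sees only neighbours within `window`."""
    T, d = residual.shape
    scores = residual @ residual.T / np.sqrt(d)     # (T, T)
    idx = np.arange(T)
    band = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(band, scores, -np.inf)        # block distant time steps
    return softmax(scores, axis=-1) @ residual      # (T, D)

# Toy inputs for one sensor: 10 time steps, 4 features per step.
rng = np.random.default_rng(1)
T, D = 10, 4
regular = rng.normal(size=(T, D))
residual = rng.normal(size=(T, D))

# Assumed fusion: add the two branch outputs.
fused = global_branch(regular) + local_branch(residual, window=2)
```

The design point is the difference in reach. The global branch can relate this morning's peak to yesterday's, while the local branch stays close to the moment where a sudden change occurs.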

"Traffic series contain heterogeneous temporal patterns, where stable periodic regularities coexist with event-driven fluctuations." (From the abstract)

Dense all-pairs attention often introduces redundant interactions and amplifies noise.


Why the mask matters as much as the split

Traffic maps are not neat grids. Connections are sparse, and the important ones shift as conditions change. That is why ADMFormer does not rely on dense all-pairs attention, which would make every node listen to every other node and risk piling noise on top of noise. The masked spatial attention instead trims the interaction set as traffic states change, so the model keeps the links that look informative at that moment. The payoff is practical: the network can focus on dynamic dependencies rather than drown in irrelevant ones. In a field where a jam on one artery may matter more than half a dozen quiet side streets, that selectivity is the whole game.
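
One way to picture a time-varying mask is the NumPy sketch below. It is an illustration under stated assumptions, not the paper's method: the source does not say how the mask is built, so this example keeps the `k` highest-scoring links per node at each time step (a top-k rule chosen for simplicity) and uses the raw features as both queries and keys. The name `masked_spatial_attention` and the parameter `k` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def masked_spatial_attention(h, k):
    """Attention across nodes, recomputed and re-masked at every time step.

    h: (T, N, D) node features over time.
    k: number of links each node keeps per time step (assumed rule).
    """
    T, N, D = h.shape
    scores = h @ h.transpose(0, 2, 1) / np.sqrt(D)          # (T, N, N)
    # Threshold per row: the k-th largest score for that node at that time.
    kth = np.sort(scores, axis=-1)[..., -k][..., None]      # (T, N, 1)
    mask = scores >= kth                                    # True = link kept
    masked = np.where(mask, scores, -np.inf)                # drop other links
    attn = softmax(masked, axis=-1)                         # rows sum to 1
    return attn @ h, attn, mask

# Toy data: 5 time steps, 6 sensors, 4 features per sensor.
rng = np.random.default_rng(2)
T, N, D = 5, 6, 4
h = rng.normal(size=(T, N, D))

out, attn, mask = masked_spatial_attention(h, k=2)
```

Because the scores are recomputed from the features at each time step, the set of kept links can differ from one step to the next, which is the sense in which the mask is time-varying. Dense all-pairs attention corresponds to `k = N`, where nothing is filtered out.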

What this could change next

The strongest consequence of ADMFormer is a better fit between the model and the road network itself. Instead of asking traffic to behave like one smooth, uniform series, the method treats it as a mix of rhythm and disruption. That makes it a good template for other spatio-temporal systems that share the same shape of problem: stable cycles mixed with sudden shocks, plus connections that come and go with the moment. The next hard test is whether the same adaptive split and time-varying mask still help when traffic patterns shift across different cities, sensor layouts, or rare disruption-heavy days. If they do, then the central trick here is not just a better transformer. It is a better way to look at moving systems.