- Rebuilding proof states eats most branch time
- Import loading alone takes about 60 s per branch
- Snapshotting reuses one live proof state across branches
- Speedups reach 5.6–50× on miniF2F-v2
When a theorem prover keeps rebuilding the same proof state, parallel search wastes most of its time. In Lean 4 with Mathlib, each branch normally reloads imports and re-runs elaboration, the step in which Lean type-checks the theorem and fills in implicit details. The paper estimates import loading at about 60 seconds per branch and theorem-body elaboration at 18 to 735 seconds, and says those two costs make up more than 99% of per-branch wall time. The fix is proof-state snapshotting: capture the elaborated proof state once, then reuse it across branches instead of reconstructing it each time. The authors implement this with a small extension to the Lean 4 language server so tactic branches can fork from the same live state. On 48 miniF2F-v2 problems, the approach delivers a 5.6× to 50× wall-time speedup over the standard fallback; across 45 hand-crafted prove-phase benchmarks, the average speedup is 14× and the median 9.7×. The gains grow with the number of branches. The paper argues this is complementary to import-level caching, which avoids repeated import loading but not theorem-body elaboration, and says the patched Lean binary and Snapshot-DSP pipeline will be released as open source.
Sixty seconds can disappear before a single tactic branch starts proving anything. In Lean 4 with Mathlib, each branch often reloads imports first, then reruns theorem-body elaboration, the check that builds the theorem context and resolves the missing details needed to reach the goal. Those two chores can eat more than 99% of per-branch wall time, so the expensive part is not the tactic itself; it is the rebuild. That matters because proof search works like a trial-and-error game: the more branches you try, the more you pay for the same setup again and again. Snapshotting attacks that waste by keeping one proof state alive and handing it to every branch.
Why the proof search is wasting time
Across 48 miniF2F-v2 problems, Snapshot-DSP cut wall time by 5.6× to 50× compared with the standard fallback. The average speedup across the 45 hand-crafted prove-phase benchmarks was 14×, and the median was 9.7×. The gain grows as branch count rises, which fits the method: the more often search would normally rebuild the same state, the more it saves by reusing it. Tactic execution itself typically takes between a few milliseconds and 500 milliseconds per branch, with a 95th percentile of 289 ms, although aesop can take several seconds on hard goals. That leaves import loading and elaboration as the real drag, and snapshotting pays for them once rather than once per branch.
- Import loading deserializes pre-compiled libraries and costs about 60 seconds per branch.
- Theorem-body elaboration re-checks the context up to the goal and can take 18 to 735 seconds.
- Tactic execution itself usually takes only a few milliseconds to 500 milliseconds, so it is not the bottleneck.
“Together, these account for >99% of per-branch wall time, making portfolio-based search impractical at scale.”
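The share quoted above can be checked with simple arithmetic. The sketch below plugs the article's figures into a per-branch cost model; the function name `overhead_share` and the choice of 500 ms as the tactic time are illustrative assumptions, not something taken from the paper.

```python
# Illustrative per-branch cost model built from the figures quoted above.
IMPORT_S = 60.0                       # import loading, seconds per branch
ELAB_MIN_S, ELAB_MAX_S = 18.0, 735.0  # theorem-body elaboration range, seconds
TACTIC_S = 0.5                        # assumed: upper end of typical tactic time


def overhead_share(import_s: float, elab_s: float, tactic_s: float) -> float:
    """Fraction of per-branch wall time spent on setup rather than the tactic."""
    setup = import_s + elab_s
    return setup / (setup + tactic_s)


best_case = overhead_share(IMPORT_S, ELAB_MIN_S, TACTIC_S)
worst_case = overhead_share(IMPORT_S, ELAB_MAX_S, TACTIC_S)
print(f"cheapest elaboration:  {best_case:.2%} of branch time is setup")
print(f"costliest elaboration: {worst_case:.2%} of branch time is setup")
```

Even with the cheapest elaboration and a slow 500 ms tactic, setup stays above 99% of the branch in this model. The share only drops noticeably when the tactic itself runs for seconds, as aesop can on hard goals.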
“Keep one proof state alive, then branch from it.”
How snapshotting keeps one state alive
The trick is not to make Lean prove faster from scratch. It is to stop making it start from scratch. Snapshotting captures the fully elaborated proof state once, after the theorem context has already been checked, and then forks lightweight branches from that saved state inside a small extension to the Lean 4 language server. That shift matters because the expensive work sits before the real tactic search begins. If a portfolio search tries several candidate tactics for the same hole, each candidate can inherit the same prepared state instead of paying for import loading and theorem-body elaboration all over again. In practice, branching turns from repeated reconstruction into direct reuse.
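The pattern can be illustrated with a toy model. This is a sketch of the capture-once, fork-many idea only: `ToyProver`, `ProofState`, and both search functions are hypothetical names, `copy.deepcopy` stands in for whatever forking mechanism the patched language server actually uses, and the tactic names are plain labels that are never executed by Lean.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ProofState:
    """Toy stand-in for an elaborated proof state (not Lean's real structure)."""
    goal: str
    applied: list[str] = field(default_factory=list)


class ToyProver:
    def __init__(self) -> None:
        self.elaborations = 0  # counts how often the expensive setup runs

    def elaborate(self, theorem: str) -> ProofState:
        """Pretend to load imports and elaborate the theorem body (the costly step)."""
        self.elaborations += 1
        return ProofState(goal=theorem)

    def run_tactic(self, state: ProofState, tactic: str) -> ProofState:
        """Record a tactic on the given state and return it (the cheap step)."""
        state.applied.append(tactic)
        return state


def search_rebuild(prover: ToyProver, theorem: str, tactics: list[str]) -> list[ProofState]:
    """Baseline: every branch reconstructs the proof state from scratch."""
    return [prover.run_tactic(prover.elaborate(theorem), t) for t in tactics]


def search_snapshot(prover: ToyProver, theorem: str, tactics: list[str]) -> list[ProofState]:
    """Snapshotting: elaborate once, then fork an independent copy per branch."""
    snapshot = prover.elaborate(theorem)
    return [prover.run_tactic(copy.deepcopy(snapshot), t) for t in tactics]


tactics = ["simp", "omega", "aesop", "linarith"]
baseline, snap = ToyProver(), ToyProver()
rebuilt = search_rebuild(baseline, "a + b = b + a", tactics)
forked = search_snapshot(snap, "a + b = b + a", tactics)
print("elaborations with rebuild: ", baseline.elaborations)
print("elaborations with snapshot:", snap.elaborations)
```

Both searches produce the same branch results, but the rebuild version elaborates once per tactic while the snapshot version elaborates once in total. Copying the snapshot per branch keeps branches isolated, so one tactic's changes cannot leak into another.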
Why the speedup matters
That difference changes the economics of proof search. Portfolio-based search only makes sense when each extra branch is cheap enough to try in parallel, because otherwise the setup work overwhelms the search itself. With snapshotting, a branch costs little more than its own tactic run rather than a fresh rebuild of the theorem context, so the search can widen without paying the same toll each time. The gain is also complementary to import-level caching such as Kimina Lean Server, which avoids repeated import loading but still leaves theorem-body elaboration in place. Snapshotting reaches deeper: it cuts both major sources of overhead within a theorem, which is why it can scale parallel proof search more cleanly.
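A rough cost model shows why the gain grows with branch count and why import-level caching alone leaves most of the cost behind. This is a back-of-the-envelope sketch, not the paper's measurement: it assumes branches run one after another on a single worker, uses the cheapest elaboration time and the 95th-percentile tactic time quoted earlier, and the three function names are hypothetical.

```python
IMPORT_S = 60.0   # import loading, seconds (figure quoted in the article)
ELAB_S = 18.0     # cheapest theorem-body elaboration quoted, seconds
TACTIC_S = 0.289  # 95th-percentile tactic time quoted, seconds


def rebuild_cost(n: int) -> float:
    """Every branch reloads imports and re-elaborates the theorem body."""
    return n * (IMPORT_S + ELAB_S + TACTIC_S)


def import_cache_cost(n: int) -> float:
    """Imports load once; each branch still re-elaborates the theorem body."""
    return IMPORT_S + n * (ELAB_S + TACTIC_S)


def snapshot_cost(n: int) -> float:
    """Imports and elaboration happen once; each branch only runs its tactic."""
    return IMPORT_S + ELAB_S + n * TACTIC_S


for n in (1, 4, 16, 64):
    base = rebuild_cost(n)
    print(f"{n:>3} branches: import cache {base / import_cache_cost(n):5.1f}x, "
          f"snapshot {base / snapshot_cost(n):5.1f}x")
```

In this model the snapshot speedup keeps climbing as branches are added, while the import-cache speedup flattens out because every branch still pays for elaboration. The absolute numbers depend entirely on the assumed costs and should not be read as the benchmark results.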
What still needs testing
The next test is the hardest miniF2F-v2 goals, especially the ones where aesop already takes several seconds. Those are the cases where tactic work starts to compete with setup time, so the gain measured on them matters more than the benchmark average. If snapshotting still wins there, the old habit of reconstructing proof states for every fork starts to look wasteful rather than inevitable. The patched Lean binary and Snapshot-DSP pipeline will be released as open source, which makes that live-state trick easier to try on other Lean 4 searches.
