Skip to content

Details

Abstract
Formal verification is at a promising inflection point: thanks to AI, we can do it at scale, and because of AI, we must do it at scale.

Nada will share what we have been learning at Midspiral about proof-driven development with LLMs, drawing on a stack of tools we have built: dafny-replay, a testbed for verified web apps; lemmafit, an extension to Claude Code that streamlines the dafny-replay methodology; LemmaScript, a verification toolchain for TypeScript that uses LLMs to discharge proof obligations against Lean or Dafny models; and lemmacore, an upcoming VSCode extension that combines cloud services for AI, verification and compilation.

Several observations have emerged. Since Opus 4.5, LLMs are capable of sustained work in Dafny: given an assumed lemma, a model can go off for thirty minutes and come back with a complete proof. Proofs in practice are built incrementally with intermediate `assume`s discharged step by step. The experience in Lean hasn’t been as smooth. In LemmaScript, we got to compare the same problem statement in Dafny and Lean (with Loom/Velvet), and Dafny is almost always easier while Lean is sometimes even intractable.

Nada will use these observations to sketch what proof-driven development looks like as a discipline, and what it will take to bring it to mainstream developers, as proving recedes into the background while the strong guarantees surface.
Bio
Nada Amin is an assistant professor of computer science at Harvard SEAS, where she combines PL and AI to create systems that are correct by construction. With Fernanda Graciolli, she has recently co-founded Midspiral to bring verification to mainstream development with AI

Related topics

Mathematics
Software Development
Software Engineering

You may also like