Days

Paper summary

High-level summary of the LeanGuard/Days research narrative.

LeanGuard overview: the simulator emits a per-transition trace certificate and a trusted Lean checker replays it.

Core idea: trace certificates + trusted replay

LeanGuard follows an "untrusted producer / trusted checker" workflow:

  1. Days emits a per-transition trace (one row per protocol transition), including a deterministic snapshot of post-state.
  2. A Lean 4 checker canonicalizes the trace (typically sorting by (time_ns, event_id) and rejecting duplicates).
  3. The checker replays an executable reference model and returns ACCEPT or REJECT with the first counterexample row.

See:

Why canonicalization matters

Because Days can run many models concurrently, the physical file order of trace rows is not treated as semantically meaningful. Canonicalization is part of the specification: the checker defines what ordering is acceptable, and the simulator only needs to log enough information to support that ordering.

Scaling beyond one run: coverage-guided generation

The paper also describes a test generation loop that uses checkers as both:

  • a correctness oracle (ACCEPT/REJECT + first failing step)
  • a semantic coverage sensor (which spec-level branches/obligations were exercised)

The generator mutates scenarios, keeps those that expand coverage, and minimizes failures into stable regression tests.

See:

Evaluation (high level)

The paper includes an experimental evaluation of:

  • trace checking cost (runtime/memory) on a suite of protocol traces
  • coverage-guided test generation behavior on a prototype corpus

On this page