Visualizing Raft Consensus and Chandy-Lamport Snapshots with Interactive Simulations

Distributed algorithms are notoriously difficult to learn from text alone. Papers and textbooks describe invariants and message sequences precisely, but the mental model often clicks only once you've watched the algorithm execute, step by step, across multiple nodes. The Process Simulation project provides two interactive browser simulations to make this concrete: one for the Raft consensus protocol, and one for the Chandy-Lamport distributed snapshot algorithm.

Why These Two Algorithms?

Raft and Chandy-Lamport address two fundamental problems in distributed systems that practitioners routinely encounter. Raft solves consensus: how a cluster of nodes agrees on a sequence of values, even when a minority of nodes fail. It underpins etcd, CockroachDB, TiKV, and many other production systems. Chandy-Lamport solves a different problem: how do you take a consistent snapshot of a distributed system's global state without stopping it? This is essential for fault tolerance, checkpointing, and distributed debugging.

The Raft Simulation

The Raft simulation shows a cluster of nodes progressing through the two core phases of the protocol: leader election and log replication. In the leader election phase, you watch nodes time out, transition to candidate state, send RequestVote RPCs, and converge on a leader. In the log replication phase, you can submit commands to the leader and watch them propagate as AppendEntries messages to followers, with responses flowing back and the commit index advancing. Node failures can be injected to trigger new elections and to show that Raft's safety guarantees hold: committed entries survive a change of leader.

Visualize follower, candidate, and leader states across all nodes simultaneously
Step through individual message deliveries or run at continuous speed
Inject node failures to trigger and observe leader re-election
Watch log entries progress from proposed → replicated → committed across the cluster
Observe split-vote scenarios and their resolution
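
The election and replication behaviour described above comes down to two small rules, sketched here in Python. This is a minimal illustration following the Raft paper, not the simulation's own code: the names Node, Entry, handle_request_vote, and advance_commit_index are hypothetical, log indices are 1-based, and timers, networking, and persistence are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    term: int
    command: str


@dataclass
class Node:
    # Hypothetical, simplified node state; not taken from the simulation's source.
    current_term: int = 0
    voted_for: Optional[str] = None
    log: List[Entry] = field(default_factory=list)
    commit_index: int = 0  # 1-based index of the highest committed entry; 0 means none

    def last_log(self) -> Tuple[int, int]:
        """Return (term, index) of the last log entry, or (0, 0) for an empty log."""
        return (self.log[-1].term, len(self.log)) if self.log else (0, 0)

    def handle_request_vote(self, term: int, candidate_id: str,
                            last_log_term: int, last_log_index: int) -> bool:
        """Decide whether to grant a vote to candidate_id for the given term."""
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            # A newer term resets our vote.
            self.current_term = term
            self.voted_for = None
        # The candidate's log must be at least as up to date as ours:
        # a later last term wins; with equal terms, the longer log wins.
        up_to_date = (last_log_term, last_log_index) >= self.last_log()
        if up_to_date and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def advance_commit_index(leader: Node, match_index: Dict[str, int],
                         cluster_size: int) -> int:
    """Commit the highest index stored on a majority whose entry is from the leader's term."""
    for n in range(len(leader.log), leader.commit_index, -1):
        replicas = 1 + sum(1 for m in match_index.values() if m >= n)  # 1 counts the leader
        if replicas * 2 > cluster_size and leader.log[n - 1].term == leader.current_term:
            leader.commit_index = n
            break
    return leader.commit_index
```

Because each node grants at most one vote per term, two candidates in the same term can each fall short of a majority, which is the split-vote case. The term check in advance_commit_index reflects the rule that a leader counts replicas only for entries from its own term; earlier entries become committed indirectly once a current-term entry commits.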

The Chandy-Lamport Simulation

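The Chandy-Lamport simulation demonstrates the global snapshot algorithm on a network of communicating processes. Each process records its own local state when it initiates the snapshot or receives its first marker message, and then sends a marker on each of its outgoing channels. The state of an incoming channel is the sequence of messages received on that channel after the process recorded its own state and before the marker arrives on that channel; the channel that delivered the first marker is recorded as empty. The algorithm assumes reliable FIFO channels, so a marker cleanly separates the messages sent before the sender's snapshot from those sent after it. The simulation shows how a consistent global state emerges from these local recordings, and why the algorithm works even when processes don't share a clock.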
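
The marker rules can be sketched from the point of view of a single process. This is a minimal Python illustration under stated assumptions, not the simulation's implementation: the Process class and MARKER sentinel are hypothetical, application messages are integers added to a running total, channels are assumed FIFO, and sending markers on outgoing channels is only indicated by a comment.

```python
from typing import Dict, List, Optional

MARKER = "MARKER"  # hypothetical sentinel that distinguishes markers from application messages


class Process:
    """One process in the snapshot algorithm; its application state is a running total."""

    def __init__(self, name: str, incoming: List[str], state: int = 0):
        self.name = name
        self.state = state
        self.incoming = incoming  # names of processes with a channel into this one
        self.recorded_state: Optional[int] = None
        self.recording: Dict[str, bool] = {}
        self.channel_state: Dict[str, List[int]] = {}

    def _record(self) -> None:
        # Record local state and start recording every incoming channel.
        self.recorded_state = self.state
        self.channel_state = {src: [] for src in self.incoming}
        self.recording = {src: True for src in self.incoming}
        # A full implementation would now send MARKER on every outgoing channel.

    def start_snapshot(self) -> None:
        """Initiate the snapshot at this process."""
        self._record()

    def receive(self, src: str, msg) -> None:
        if msg == MARKER:
            if self.recorded_state is None:
                self._record()  # first marker seen: record local state now
            # Stop recording this channel. If this was the first marker,
            # the channel's recorded state stays empty.
            self.recording[src] = False
            return
        if self.recording.get(src):
            # Arrived after our snapshot but before the marker on this channel,
            # so the message was in flight at snapshot time.
            self.channel_state[src].append(msg)
        self.state += msg  # normal application processing continues throughout

    def done(self) -> bool:
        """True once local state is recorded and a marker has arrived on every incoming channel."""
        return self.recorded_state is not None and not any(self.recording.values())
```

For example, a process that initiates a snapshot at total 15 and then receives 3 from Q before Q's marker records state 15 and channel state [3] for Q, while its live total keeps advancing. The application is never paused, which is the property the simulation makes visible.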

Running the Simulations

The simulations are self-contained HTML5 files with no external dependencies. The easiest way to run them is through the hosted version:

Access the simulations at borisbesky.github.io/process-simulations. The source is available at github.com/BorisBesky/process-simulations. To run them locally, serve a local copy of the files with python3 -m http.server 8000, then open http://localhost:8000.

Who These Are For

Students working through distributed systems coursework will find that these simulations complement the Raft paper and the 1985 Chandy-Lamport paper. Practitioners who use Raft-based systems (etcd, Consul, CockroachDB) and want a deeper intuition for how those systems behave during network partitions or leader failures will find the failure injection scenarios particularly useful. The simulations are also useful preparation for technical interviews, where consensus protocols are a common topic.