CAP Theorem & Distributed Consensus
In any distributed system, partition tolerance is a must. Network failures will happen, and your system needs to handle them. This means that in practice, the CAP theorem boils down to a single choice: do you prioritize consistency or availability when a network partition occurs?
Once you understand that tradeoff, the next question is: how do nodes actually agree on anything? That's where consensus algorithms come in. Raft is the most approachable one, and it's the engine behind systems like etcd, Consul, and CockroachDB.
The CAP Theorem
Three Guarantees, Pick Two
Eric Brewer proposed in 2000 that a distributed data store can only provide two of the following three guarantees at the same time:

Consistency (C): every read sees the most recent write, or returns an error.
Availability (A): every request receives a non-error response, though not necessarily the most recent data.
Partition tolerance (P): the system keeps operating even when messages between nodes are dropped or delayed.
The “pick two” framing is a bit misleading. In any distributed system, network partitions will happen. Switches fail. Cables get cut. Cloud availability zones lose connectivity. You don't get to opt out of P.
So the real choice is: when a partition occurs, do you sacrifice consistency or availability?
CP (consistency + partition tolerance): during a partition, these systems return errors rather than stale data. Examples: HBase, ZooKeeper, etcd, MongoDB (strong reads)
AP (availability + partition tolerance): during a partition, these systems keep responding but may serve stale data. Examples: Cassandra, DynamoDB, CouchDB, DNS
CP vs AP in Action
During a partition, a CP system would rather return an error than risk serving outdated data. A banking system does this: if the node handling your request can't verify your balance with the primary, it refuses the transaction.
An AP system takes the opposite approach. A social media feed keeps loading even if some backend nodes are unreachable. You might see a post from 30 seconds ago instead of the absolute latest, but the app stays responsive.
Neither is universally better. The right choice depends on what your users can tolerate: stale data or downtime.
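The contrast can be made concrete with a small sketch. The class and method names below are illustrative, not a real database API; the point is only the decision each style makes when a node can't reach a quorum:

```python
# Hypothetical sketch: how a CP-style replica vs an AP-style replica
# might answer a read during a partition.

class Replica:
    def __init__(self, value, can_reach_quorum):
        self.value = value                      # last value this node saw
        self.can_reach_quorum = can_reach_quorum

    def read_cp(self):
        # CP: refuse to answer unless a majority can confirm the value
        if not self.can_reach_quorum:
            raise RuntimeError("unavailable: cannot reach quorum")
        return self.value

    def read_ap(self):
        # AP: always answer, even though the value may be stale
        return self.value

isolated = Replica(value="balance=500", can_reach_quorum=False)
print(isolated.read_ap())   # serves possibly stale data, stays available
try:
    isolated.read_cp()
except RuntimeError as err:
    print(err)              # errors out rather than risk staleness
```

The banking system above is `read_cp`; the social feed is `read_ap`.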
Beyond CAP: PACELC
CAP only describes behavior during a partition. But partitions are rare. Most of the time your system is running fine. PACELC extends CAP to cover normal operation too.
The idea: if there's a Partition, choose between Availability and Consistency. Else (normal operation), choose between Latency and Consistency.
PACELC is more useful in practice because it captures the tradeoff you make every day, not just during rare failure scenarios.
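A few commonly cited PACELC classifications (following Abadi's original analysis) make the framework concrete. The lookup helper is illustrative; the classifications of specific systems are approximate and depend on configuration:

```python
# Commonly cited PACELC classifications (approximate; config-dependent).
# PA/PC = availability vs consistency during a partition,
# EL/EC = latency vs consistency during normal operation.
PACELC = {
    "Cassandra": ("PA", "EL"),
    "DynamoDB":  ("PA", "EL"),
    "HBase":     ("PC", "EC"),
    "PNUTS":     ("PC", "EL"),  # consistent under partition, fast otherwise
}

def describe(system):
    p, e = PACELC[system]
    during = "stays available" if p == "PA" else "refuses requests"
    normal = "favors low latency" if e == "EL" else "favors consistency"
    return f"{system}: during a partition it {during}; otherwise it {normal}."

print(describe("Cassandra"))
```

Note that PNUTS shows why the extra letter matters: two systems can make the same partition choice yet behave very differently day to day.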
Real-World Tradeoffs
The consistency vs availability decision isn't abstract. It shows up in real product decisions every day.
Parking and reservation apps (choose consistency): If two drivers see the same spot as available and both try to park, you have a conflict. The system needs to reject one of them. Showing an error is better than double-booking a spot.
Banking (choose consistency): You can't show a balance of $500 when the real balance is $0. An ATM that's temporarily unavailable is better than one that dispenses money it shouldn't.
Social feeds (choose availability): If your Instagram feed is 30 seconds behind, you won't notice. But if the app refuses to load because one server is unreachable, you'll switch to another app.
E-commerce browsing (choose availability): Showing a product page with a slightly outdated price is better than showing an error. The inventory check at checkout is where you enforce consistency.
Collaborative editing (choose availability): Google Docs lets everyone keep typing even during network hiccups. Edits merge later. Blocking all users until every keystroke is confirmed would make the product unusable.
The pattern: choose consistency when wrong data causes real-world harm. Choose availability when a slightly stale experience is better than no experience at all.
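The "reject one of them" rule from the parking example is usually implemented as a conditional write: succeed only if the state hasn't changed since you checked. A minimal sketch, with hypothetical names:

```python
# Illustrative conditional write: the first reservation wins,
# the second is rejected rather than double-booked.
spots = {"A1": None}   # spot -> driver holding the reservation

def reserve(spot, driver):
    if spots[spot] is not None:
        return False        # conflict: reject instead of overwriting
    spots[spot] = driver
    return True

assert reserve("A1", "driver-1") is True
assert reserve("A1", "driver-2") is False   # an error beats double-booking
```

In a real distributed store this check-and-write must be atomic (a compare-and-set operation), which is exactly the kind of primitive consensus systems provide.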
Distributed Consensus
The Problem
If you have three database replicas and a client writes a value, how do all three nodes agree on what that value is? What if one node is slow? What if the leader crashes mid-write?
Consensus is the process of getting multiple nodes to agree on a single value, even when some nodes fail or messages get lost. Without it, replicas drift apart and your system returns contradictory answers.
This is the foundation of every CP system. Leader election, log replication, distributed locks, configuration management - they all depend on consensus under the hood.
Raft: Leader Election
Raft was published in 2014 by Diego Ongaro and John Ousterhout, designed specifically to be understandable. It breaks consensus into three sub-problems: leader election, log replication, and safety.
Every Raft cluster has exactly one leader at any time. All writes go through the leader. If the leader goes down, the remaining nodes detect the failure (via heartbeat timeouts) and elect a new one.
Election process:
1. A follower that hears no heartbeat within its election timeout becomes a candidate.
2. The candidate increments its term number and requests votes from the other nodes.
3. Each node grants at most one vote per term, first come, first served.
4. A candidate that collects votes from a majority becomes leader and starts sending heartbeats.
5. If no one wins (a split vote), nodes wait out new randomized timeouts and try again.
The term number is critical. It acts as a logical clock. If a node receives a message with a higher term, it knows its information is outdated and steps down. This prevents stale leaders from causing split-brain scenarios.
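The term rule can be sketched in a few lines. This is a simplified model with hypothetical names, not a full Raft implementation (it omits vote tracking and log comparison):

```python
# Minimal sketch of Raft's term rule: any node that sees a higher
# term knows it is stale and reverts to follower.
import random

class Node:
    def __init__(self):
        self.term = 0
        self.state = "follower"
        # Randomized timeouts make simultaneous candidacies (and
        # therefore split votes) unlikely.
        self.election_timeout_ms = random.uniform(150, 300)

    def start_election(self):
        self.term += 1
        self.state = "candidate"   # would now request votes from peers

    def on_message(self, sender_term):
        if sender_term > self.term:
            self.term = sender_term
            self.state = "follower"   # stale info: step down

n = Node()
n.start_election()
print(n.term, n.state)       # 1 candidate
n.on_message(sender_term=5)  # a peer is on a later term
print(n.term, n.state)       # 5 follower
```

This is the mechanism that defuses split-brain: a deposed leader rejoining the cluster immediately sees a higher term and demotes itself.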
Raft: Log Replication
Once a leader is elected, it's responsible for accepting writes and replicating them to followers. Every write becomes an entry in an ordered log.
The leader appends the entry to its own log, sends it to all followers, and waits for a majority to acknowledge. Once a majority confirms, the entry is committed and the leader responds to the client. Even if a minority of nodes are down or slow, the system keeps making progress.
This is what makes Raft a CP protocol. A write is only considered successful when a majority of nodes have it. If the leader crashes after committing, the new leader is guaranteed to have all committed entries: winning an election requires a majority of votes, nodes only vote for candidates whose log is at least as up-to-date as their own, and any two majorities overlap in at least one node that holds every committed entry.
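The commit rule itself is just arithmetic over quorum sizes. A minimal sketch (the function name is illustrative):

```python
# Sketch of the majority-commit rule: an entry is committed once a
# majority of the cluster (counting the leader itself) has appended it.
def committed(acks, cluster_size):
    # acks = number of nodes holding the entry, leader included
    return acks >= cluster_size // 2 + 1

assert committed(acks=3, cluster_size=5)       # 3 of 5: committed
assert not committed(acks=2, cluster_size=5)   # 2 of 5: not yet
assert committed(acks=2, cluster_size=3)       # 2 of 3 is enough
```

This is also why clusters use odd sizes: a 4-node cluster needs 3 acks and tolerates only 1 failure, the same as a 3-node cluster, so the fourth node adds cost without adding fault tolerance.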
Other Consensus Algorithms
Paxos came first (1989, published 1998). It solves the same fundamental problem as Raft but is notoriously hard to understand and even harder to implement correctly. Most engineers who work with Paxos rely on battle-tested libraries rather than writing it from scratch.
Multi-Paxos extends basic Paxos for continuous operation (a replicated log rather than a single agreed value). Viewstamped Replication, developed independently, solves the same problem with a leader-based design. Raft is essentially a more structured and teachable member of this family.
PBFT (Practical Byzantine Fault Tolerance) handles a different threat model: nodes that actively lie or behave maliciously. This is relevant for blockchain systems but overkill for trusted datacenter environments.
Where This Shows Up
You don't usually implement consensus yourself, but understanding it helps you reason about the systems you depend on:
etcd, the datastore behind Kubernetes, uses Raft to replicate cluster state.
Consul uses Raft for service discovery and configuration data.
CockroachDB runs a Raft group per range of data to replicate writes.
ZooKeeper uses ZAB, a Paxos-like protocol, and coordinates systems such as Kafka and HBase.
The pattern is the same everywhere: a small cluster of nodes (usually 3 or 5) runs consensus to agree on state, and the rest of the system trusts that source of truth. Understanding Raft gives you a mental model for how all of these work.