A Bit of Theory

When you run a program on a single machine, you can rely on events happening in a definite order and every part of the program seeing the same data. If you write a value to a variable, you can immediately read that value back. None of these guarantees hold automatically in a distributed system.

Clocks and Order

Lamport1978 is one of the most influential papers in computer science. Its starting observation is that physical clocks are unreliable in distributed systems. Two machines might disagree about the time by milliseconds or even seconds, and there is no way to synchronize them perfectly. This means we cannot use wall-clock timestamps to determine which of two events on different machines happened first.

Lamport proposed an alternative. Instead of asking, "What time did this happen?" we should ask, "What happened before what?" He defined a happens-before relation:

  1. If event A occurs before event B on the same machine, then A happens-before B.

  2. If machine X sends a message and machine Y receives it, then the send happens-before the receive.

  3. If A happens-before B and B happens-before C, then A happens-before C (i.e., the relation is transitive).

Two events are concurrent if neither happens-before the other. This does not mean they occurred at the same instant; it just means that we cannot put them in order.

To track this relation in practice, Lamport introduced logical clocks. Each machine maintains a counter; it increments the counter before each event and includes the counter's value in every message it sends. When a machine receives a message, it sets its counter to the maximum of its current value and the value in the message, then increments it. This guarantees that if A happens-before B, then A's timestamp is less than B's. Note that the converse is not true: a lower timestamp does not prove that one event happened first. This means that logical clocks capture a partial order, not a total order.
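The update rules above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the class and method names are invented for the example.

```python
class LamportClock:
    """A logical clock: one integer counter per machine."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        """Increment before each local event; return the event's timestamp."""
        self.counter += 1
        return self.counter

    def send(self):
        """A send is an event: tick and attach the counter to the message."""
        return self.tick()

    def receive(self, message_timestamp):
        """On receipt, jump to the max of both clocks, then tick."""
        self.counter = max(self.counter, message_timestamp)
        return self.tick()

# Machine X sends a message to machine Y.
x, y = LamportClock(), LamportClock()
t_send = x.send()           # x.counter is now 1
t_recv = y.receive(t_send)  # y.counter becomes max(0, 1) + 1 == 2
assert t_send < t_recv      # send happens-before receive, so timestamps agree
```

Note that the final assertion is exactly the guarantee stated above: happens-before implies a smaller timestamp, but not the other way around.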

Vector clocks extend this idea to capture the full happens-before relation. Instead of a single counter, each machine maintains a vector with one entry per machine. A machine increments its own entry for each event and sends the entire vector with every message. When a machine receives a message, it takes the element-wise maximum of its vector and the incoming one. Two events are concurrent if and only if neither vector is element-wise less than or equal to the other. The G-Counter CRDT uses exactly this structure: each replica tracks its own count in a vector, and merging takes the element-wise maximum.
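A sketch of the vector-clock rules, with the concurrency test spelled out. The names here are illustrative, and the fixed vector size assumes the set of machines is known in advance.

```python
class VectorClock:
    """One counter per machine; each machine increments only its own entry."""

    def __init__(self, machine_id, num_machines):
        self.id = machine_id
        self.v = [0] * num_machines

    def tick(self):
        """Record a local event and return its vector timestamp."""
        self.v[self.id] += 1
        return list(self.v)

    def receive(self, incoming):
        """Element-wise maximum, then count the receive as a local event."""
        self.v = [max(a, b) for a, b in zip(self.v, incoming)]
        return self.tick()

def happens_before(a, b):
    """True iff a <= b element-wise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    return not happens_before(a, b) and not happens_before(b, a)

# Two machines that never exchange messages produce concurrent events.
m0, m1 = VectorClock(0, 2), VectorClock(1, 2)
e0 = m0.tick()  # [1, 0]
e1 = m1.tick()  # [0, 1]
assert concurrent(e0, e1)
assert happens_before([1, 0], [2, 1])
```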

Consistency Models

In a single-machine program, reading a variable always returns the most recent value written to it. This property is called strong consistency, and it is expensive in a distributed system because it requires all replicas to coordinate on every operation.

At the other end of the spectrum is eventual consistency, which only guarantees that if no new updates are made, all replicas will eventually converge to the same state. Eventual consistency says nothing about what a replica might return in the meantime: a read might return a stale value, a newer value, or even a value that no single client ever wrote if partial updates have been merged in some strange way. Most eventually-consistent systems behave better in practice than this worst case, but the guarantee is still very weak.

Strong eventual consistency is a middle ground: replicas that have received the same set of updates are guaranteed to be in the same state, regardless of the order in which those updates arrived. This is the consistency model that CRDTs provide. No coordination or consensus is needed, and there are no conflicts to resolve, because the data structures are designed to allow concurrent updates to be merged deterministically.
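The G-Counter mentioned earlier makes this concrete: because merge is an element-wise maximum, replicas that have seen the same updates end up in the same state no matter how the merges are ordered. A minimal sketch (names are illustrative, and the fixed replica count is an assumption of the example):

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merge = element-wise max."""

    def __init__(self, replica_id, num_replicas):
        self.id = replica_id
        self.counts = [0] * num_replicas

    def increment(self):
        self.counts[self.id] += 1

    def merge(self, other):
        """Merging is commutative, associative, and idempotent."""
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)

# Two replicas accept writes independently, then exchange state.
a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()
b.increment()
a.merge(b)
b.merge(a)
assert a.counts == b.counts == [2, 1]  # same updates, same state
assert a.value() == 3
```

The merge direction does not matter: swapping the two merge calls produces the same final state, which is exactly the strong-eventual-consistency guarantee.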

Several other models lie between strong consistency and eventual consistency. Linearizability means that every operation appears to take effect at some instant between its invocation and its response, and all operations are consistent with a single global order. This is both very useful and very expensive, since it typically requires consensus protocols like Paxos or Raft.

Sequential consistency is slightly weaker: operations appear in some total order that respects each machine's local order, but that order need not correspond to real time. Causal consistency lies between sequential and eventual consistency: it guarantees that operations related by happens-before are seen in order by everyone, but concurrent operations may be seen in different orders by different replicas. This is closely related to Lamport's happens-before relation and is often the strongest consistency model that can be achieved without expensive global coordination.

The CAP Theorem

Gilbert2002 proved that a distributed system cannot simultaneously provide all three of the following properties:

  1. Consistency: every read sees the most recent write.

  2. Availability: every request receives a response.

  3. Partition tolerance: the system continues to operate when the network drops or delays messages between machines.

Since network partitions are inevitable in any real distributed system, the theorem effectively forces a choice between consistency and availability. A CP system (i.e., one that is consistent and partition-tolerant) will refuse to respond rather than return stale data during a partition. Traditional relational databases with synchronous replication behave this way: if a replica cannot reach the primary, it rejects writes rather than risk inconsistency.

On the other hand, an AP system (i.e., one that is available and partition-tolerant) will always respond, but may return stale or divergent data during a partition. CRDTs are a tool for building AP systems because each replica can accept writes independently and merge state later. They sacrifice strong consistency in exchange for guaranteed availability and automatic conflict resolution. Each CRDT encodes a particular conflict-resolution policy into the data structure itself, so that no external coordination is ever required. The cost is that the policies are fixed: an LWW register always discards concurrent writes in favor of the latest timestamp, and an OR-Set always resolves a concurrent add and remove in favor of the add. Whether these policies are acceptable depends on the application.
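The LWW policy described above can be sketched as follows. This is an illustrative implementation, not a reference one; in particular, breaking timestamp ties with a replica identifier is one common convention, assumed here to make the merge deterministic.

```python
class LWWRegister:
    """Last-writer-wins register: concurrent writes resolved by timestamp."""

    def __init__(self):
        self.value = None
        # (time, replica_id): the replica id deterministically breaks ties.
        self.stamp = (0, 0)

    def set(self, value, time, replica_id):
        if (time, replica_id) > self.stamp:
            self.value = value
            self.stamp = (time, replica_id)

    def merge(self, other):
        """Keep whichever write carries the larger (time, replica_id) pair."""
        if other.stamp > self.stamp:
            self.value = other.value
            self.stamp = other.stamp

# Two replicas write concurrently with the same wall-clock time.
r1, r2 = LWWRegister(), LWWRegister()
r1.set("alice", 10, 0)
r2.set("bob", 10, 1)   # same time: the higher replica id wins the tie
r1.merge(r2)
r2.merge(r1)
assert r1.value == r2.value == "bob"  # the concurrent write from r1 is discarded
```

The example shows the cost mentioned above: the "alice" write is silently dropped, which is acceptable for some applications and not for others.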

The CAP theorem is sometimes misunderstood as a simple menu of three options. In practice, most of the time there is no partition and the system can be both consistent and available. The question is what happens during the (hopefully brief) periods when partitions occur. Modern systems often provide tunable consistency: a database might allow you to choose strong consistency for financial transactions and eventual consistency for analytics queries.