Spanner's Correctness Constraints on Transactions
1. Serializable
Same results as if transactions executed one-by-one.
Even though they may actually execute concurrently.
r/w transactions are serializable
implemented through 2PL
r/o transactions fit between r/w transactions
assign each transaction a timestamp
use timestamped snapshots to ensure this; r/o transactions are lock-free.
a r/o transaction sees the results of all committed r/w transactions whose timestamps are lower than its own timestamp
Example 1
T1: Wx Wy C
T2: Wx Wy C
T3: Rx Ry
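if T3 simply reads the latest committed values, and T2 commits between T3's Rx and Ry, T3 sees x from T1 but y from T2. No serial order of T1, T2, T3 produces that result, so the history cannot be serializable. Read skew occurs in T3.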
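The interleaving that breaks Example 1 can be replayed with a minimal Python sketch. It assumes a plain single-version store where every read returns the latest committed value; the concrete values (9, 11, 8, 12) are borrowed from Example 2 purely for illustration.

```python
# Single-version store: a read always returns the latest committed value.
store = {}

def commit(writes):
    """Apply all writes of one r/w transaction at once."""
    store.update(writes)

commit({"x": 9, "y": 11})    # T1: Wx Wy C
t3_x = store["x"]            # T3: Rx (sees T1's x)
commit({"x": 8, "y": 12})    # T2: Wx Wy C, lands between T3's two reads
t3_y = store["y"]            # T3: Ry (sees T2's y)

t3_view = (t3_x, t3_y)
# What T3 would see if it ran entirely after T1 or entirely after T2.
serial_views = {(9, 11), (8, 12)}
print(t3_view, t3_view in serial_views)
```

T3's view mixes one value from each writer, which matches neither serial placement.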
Example 2 (Snapshot Isolation)
x@10=9 x@20=8 version history of x
y@10=11 y@20=12 version history of y
T1 @ 10: Wx Wy C
T2 @ 20: Wx Wy C
T3 @ 15: Rx Ry
"@ 10" indicates the timestamp of T1 is 10.
T3 will see x@10=9, y@10=11
Example 3 (Snapshot Isolation)
x@10=9
y@10=11 y@14=12
T1 @ 10: Wx Wy C
T2 @ 14: Wy C
T3 @ 15: Rx Ry
T3 will see x@10=9, y@14=12
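Examples 2 and 3 can be reproduced with a small multi-version store. This is only a sketch under stated assumptions: `VersionedStore`, `write`, and `snapshot_read` are hypothetical names, and a read at timestamp TS returns the newest version with a strictly lower timestamp, following the rule stated in these notes.

```python
from bisect import bisect_left, insort

class VersionedStore:
    """Hypothetical multi-version store: key -> sorted list of (commit_ts, value)."""

    def __init__(self):
        self.versions = {}

    def write(self, key, commit_ts, value):
        # Keep each key's history sorted by commit timestamp.
        insort(self.versions.setdefault(key, []), (commit_ts, value))

    def snapshot_read(self, key, read_ts):
        """Return the newest (commit_ts, value) with commit_ts < read_ts, or None."""
        history = self.versions.get(key, [])
        # (read_ts,) sorts before any (read_ts, value), so i counts versions < read_ts.
        i = bisect_left(history, (read_ts,))
        return history[i - 1] if i > 0 else None

# Example 2: T1 @ 10 and T2 @ 20 both write x and y; T3 reads @ 15.
ex2 = VersionedStore()
for key, ts, value in [("x", 10, 9), ("x", 20, 8), ("y", 10, 11), ("y", 20, 12)]:
    ex2.write(key, ts, value)
print(ex2.snapshot_read("x", 15), ex2.snapshot_read("y", 15))

# Example 3: T1 @ 10 writes x and y, T2 @ 14 writes y; T3 reads @ 15.
ex3 = VersionedStore()
for key, ts, value in [("x", 10, 9), ("y", 10, 11), ("y", 14, 12)]:
    ex3.write(key, ts, value)
print(ex3.snapshot_read("x", 15), ex3.snapshot_read("y", 15))
```

Because the read timestamp alone selects the version, no locks are needed on the read path.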
2. Externally Consistent
If T1 completes before T2 starts (in real time), T2 must see T1's writes.
External consistency rules out reading stale data
Implementation
we want to ensure that
if T1 commits before T2 starts, T1's timestamp TS1 < T2's timestamp TS2
so we can ensure serializability and external consistency all together using timestamps
set the timestamp TS of a r/w transaction T to TT.now().latest when it is ready to commit
T must then wait until TS < TT.now().earliest before committing (commit wait)
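The two rules above can be sketched in Python. This is an illustration, not Spanner's implementation: `TrueTime`, `TTInterval`, and `commit_with_wait` are hypothetical names, the clock is the local monotonic clock, and the uncertainty `epsilon` is an assumed constant.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float

class TrueTime:
    """Toy stand-in for TT: the local clock widened by an assumed error bound."""

    def __init__(self, epsilon: float = 0.004,
                 clock: Callable[[], float] = time.monotonic):
        self.epsilon = epsilon
        self.clock = clock

    def now(self) -> TTInterval:
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

def commit_with_wait(tt: TrueTime, poll: float = 0.001) -> float:
    """Pick TS = TT.now().latest, then block until TS < TT.now().earliest."""
    ts = tt.now().latest
    while not ts < tt.now().earliest:   # commit wait
        time.sleep(poll)
    return ts                           # only now may the commit become visible

tt = TrueTime(epsilon=0.002)
ts = commit_with_wait(tt)
print(ts < tt.now().earliest)
```

In this toy model the wait lasts a little over twice `epsilon`, which is why a tight clock uncertainty bound matters for write latency.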
If T1 commits before T2 starts (in real time), and T1 is a r/w transaction
TS1 = TT.now().latest, taken when T1 is ready to commit
if T2 is also a r/w transaction,
TS2 = TT.now().latest, taken when T2 is ready to commit
then:
real time of T1's commit < real time of T2's start (assumption)
TS1 < real time of T1's commit (commit wait)
TS2 >= real time at which TS2 is assigned (TT.now().latest is never behind real time) >= real time of T2's start
=> TS1 < TS2
if T2 is a r/o transaction
TS2 = TT.now().latest, taken when T2 starts
then:
real time of T1's commit < real time of T2's start (assumption)
TS1 < real time of T1's commit (commit wait)
TS2 >= real time at which TS2 is assigned (TT.now().latest is never behind real time) >= real time of T2's start
=> TS1 < TS2
T2 reads the versions whose timestamps are lower than TS2, so T2 sees the writes of every r/w transaction that committed before T2 started
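The chain of inequalities for the r/o case can be checked with a small deterministic simulation. Everything here is an assumption made for illustration: `SimTrueTime` and `run_trial` are hypothetical, real time is an integer counter, and the local clock error is drawn at random within the bound `epsilon`, so the returned interval always contains real time.

```python
import random

class SimTrueTime:
    """Simulated TT: now() returns (earliest, latest) that always contains self.real."""

    def __init__(self, epsilon, seed=0):
        self.real = 0                    # true ("absolute") time
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def advance(self, dt=1):
        self.real += dt

    def now(self):
        skew = self.rng.randint(-self.epsilon, self.epsilon)   # local clock error
        local = self.real + skew
        return (local - self.epsilon, local + self.epsilon)

def run_trial(tt):
    ts1 = tt.now()[1]                 # TS1 = TT.now().latest at commit time
    while not ts1 < tt.now()[0]:      # commit wait: until TS1 < TT.now().earliest
        tt.advance()
    t1_commit = tt.real               # real time of T1's commit
    tt.advance()                      # T2 starts strictly after T1 commits
    t2_start = tt.real
    ts2 = tt.now()[1]                 # r/o T2: TS2 = TT.now().latest at start
    return ts1, t1_commit, t2_start, ts2

results = [run_trial(SimTrueTime(epsilon=5, seed=s)) for s in range(100)]
ok = all(ts1 < c1 < s2 <= ts2 for ts1, c1, s2, ts2 in results)
print(ok)
```

Each trial checks TS1 < commit time < start time <= TS2, which is the argument above written out step by step.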
Is it possible that TS1 < TS2 and T1 commits after T2 starts?
Yes. In this case, T2 will also see T1's writes.
External consistency says nothing about the case where T1 commits after T2 starts.
In the Spanner paper, r/o transactions can read at any sufficiently up-to-date replica. How does a replica ensure that a r/o transaction T with timestamp TS sees all writes of r/w transactions whose timestamps < TS?
Safe wait (the paper calls this the safe time). Before serving a read with timestamp TS, the replica must ensure that no write with timestamp < TS can still commit at the replica.
If the replica has seen a Paxos write whose timestamp >= TS, no later write can have a timestamp < TS, due to the timestamp monotonicity invariant:
within each Paxos group, Spanner assigns timestamps to Paxos writes in monotonically increasing order
so it is safe to serve the read.
But what if there are no such writes in a relatively long time?
the read will be blocked and the latency will be high.
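Solution:
if the replica has seen no prepared but not yet committed transactions, such transactions do not delay the read; in the paper, the Paxos leader can also promise a minimum timestamp for its future writes, so the safe point advances even when there are no writes.
if there are prepared but not yet committed transactions, the replica knows a lower bound on the commit timestamp of each of them
Spanner ensures that the commit timestamp of such a transaction >= the known lower bound
so if every such lower bound >= TS, the read can be safely served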
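A replica-side check along these lines might look like the sketch below. It is an assumption-laden illustration, not Spanner's code: `Replica` and its methods are hypothetical, a read at TS only needs writes with timestamps < TS (the convention used in these notes), and the Paxos condition and the prepared-transaction condition must both hold, as with the paper's safe time.

```python
class Replica:
    """Hypothetical replica state used to decide whether a read at TS is safe."""

    def __init__(self):
        self.last_applied_ts = 0   # timestamp of the latest Paxos write applied here
        self.prepared = {}         # txn id -> lower bound on its commit timestamp

    def apply_write(self, ts):
        # Timestamp monotonicity invariant within one Paxos group.
        assert ts > self.last_applied_ts
        self.last_applied_ts = ts

    def prepare(self, txn_id, lower_bound):
        self.prepared[txn_id] = lower_bound

    def commit(self, txn_id, commit_ts):
        # The commit timestamp never falls below the promised lower bound.
        assert commit_ts >= self.prepared.pop(txn_id)
        self.apply_write(commit_ts)

    def can_serve(self, read_ts):
        # No future Paxos write can carry a timestamp < read_ts ...
        paxos_ok = self.last_applied_ts >= read_ts
        # ... and no prepared transaction can still commit below read_ts.
        prepared_ok = all(lb >= read_ts for lb in self.prepared.values())
        return paxos_ok and prepared_ok

r = Replica()
r.apply_write(10)
print(r.can_serve(8), r.can_serve(15))   # a write >= 8 was seen; none >= 15 yet
r.prepare("T", 12)                       # T may commit at any timestamp >= 12
r.apply_write(20)
print(r.can_serve(15))                   # blocked: T could still commit at 12..14
r.commit("T", 21)
print(r.can_serve(15))
```

A blocked read would simply retry (or wait) until `can_serve` returns true.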
Note:
Internal consistency: within a transaction, reads observe that transaction’s most recent writes (if any)
External consistency: reads without a preceding write in transaction T1 must observe the state written by a transaction T0, such that T0 is visible to T1, and no more recent transaction wrote to that object.
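Spanner does not guarantee internal consistency: writes are buffered at the client until commit, so reads in a transaction do not see that transaction's own writes.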
References
Spanner: Google’s Globally-Distributed Database [OSDI 2012]
JEPSEN Blog: Snapshot Isolation