Why failover-based implementations are not enough Redis分布式锁实现 SET resource_name my_random_value NX PX 30000

核心

    SET resource_name my_random_value NX PX 30000

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

Distributed locks with Redis – Redis https://redis.io/topics/distlock

There is an obvious race condition with this model:

Client A acquires the lock in the master.
The master crashes before the write to the key is transmitted to the slave.
The slave gets promoted to master.
Client B acquires the lock to the same resource A already holds a lock for. SAFETY VIOLATION!

Redlock：Redis分布式锁最牛逼的实现 - 简书 https://www.jianshu.com/p/7e47a4503b87

Distributed locks with Redis

Distributed locks are a very useful primitive in many environments where different processes must operate with shared resources in a mutually exclusive way.

There are a number of libraries and blog posts describing how to implement a DLM (Distributed Lock Manager) with Redis, but every library uses a different approach, and many use a simple approach with lower guarantees compared to what can be achieved with slightly more complex designs.

This page is an attempt to provide a more canonical algorithm to implement distributed locks with Redis. We propose an algorithm, called Redlock, which implements a DLM which we believe to be safer than the vanilla single instance approach. We hope that the community will analyze it, provide feedback, and use it as a starting point for the implementations or more complex or alternative designs.

Implementations

Before describing the algorithm, here are a few links to implementations already available that can be used for reference.

Redlock-rb (Ruby implementation). There is also a fork of Redlock-rb that adds a gem for easy distribution and perhaps more.
Redlock-py (Python implementation).
Pottery (Python implementation).
Aioredlock (Asyncio Python implementation).
Redlock-php (PHP implementation).
PHPRedisMutex (further PHP implementation)
cheprasov/php-redis-lock (PHP library for locks)
rtckit/react-redlock (Async PHP implementation)
Redsync (Go implementation).
Redisson (Java implementation).
Redis::DistLock (Perl implementation).
Redlock-cpp (C++ implementation).
Redlock-cs (C#/.NET implementation).
RedLock.net (C#/.NET implementation). Includes async and lock extension support.
ScarletLock (C# .NET implementation with configurable datastore)
Redlock4Net (C# .NET implementation)
node-redlock (NodeJS implementation). Includes support for lock extension.

Safety and Liveness guarantees

We are going to model our design with just three properties that, from our point of view, are the minimum guarantees needed to use distributed locks in an effective way.

Safety property: Mutual exclusion. At any given moment, only one client can hold a lock.
Liveness property A: Deadlock free. Eventually it is always possible to acquire a lock, even if the client that locked a resource crashes or gets partitioned.
Liveness property B: Fault tolerance. As long as the majority of Redis nodes are up, clients are able to acquire and release locks.

Why failover-based implementations are not enough

To understand what we want to improve, let’s analyze the current state of affairs with most Redis-based distributed lock libraries.

The simplest way to use Redis to lock a resource is to create a key in an instance. The key is usually created with a limited time to live, using the Redis expires feature, so that eventually it will get released (property 2 in our list). When the client needs to release the resource, it deletes the key.

Superficially this works well, but there is a problem: this is a single point of failure in our architecture. What happens if the Redis master goes down? Well, let’s add a slave! And use it if the master is unavailable. This is unfortunately not viable. By doing so we can’t implement our safety property of mutual exclusion, because Redis replication is asynchronous.

There is an obvious race condition with this model:

Client A acquires the lock in the master.
The master crashes before the write to the key is transmitted to the slave.
The slave gets promoted to master.
Client B acquires the lock to the same resource A already holds a lock for. SAFETY VIOLATION!

Sometimes it is perfectly fine that under special circumstances, like during a failure, multiple clients can hold the lock at the same time. If this is the case, you can use your replication based solution. Otherwise we suggest to implement the solution described in this document.

Correct implementation with a single instance

Before trying to overcome the limitation of the single instance setup described above, let’s check how to do it correctly in this simple case, since this is actually a viable solution in applications where a race condition from time to time is acceptable, and because locking into a single instance is the foundation we’ll use for the distributed algorithm described here.

To acquire the lock, the way to go is the following:

    SET resource_name my_random_value NX PX 30000

The command will set the key only if it does not already exist (NX option), with an expire of 30000 milliseconds (PX option). The key is set to a value “myrandomvalue”. This value must be unique across all clients and all lock requests.

Basically the random value is used in order to release the lock in a safe way, with a script that tells Redis: remove the key only if it exists and the value stored at the key is exactly the one I expect to be. This is accomplished by the following Lua script:

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

This is important in order to avoid removing a lock that was created by another client. For example a client may acquire the lock, get blocked in some operation for longer than the lock validity time (the time at which the key will expire), and later remove the lock, that was already acquired by some other client. Using just DEL is not safe as a client may remove the lock of another client. With the above script instead every lock is “signed” with a random string, so the lock will be removed only if it is still the one that was set by the client trying to remove it.

What should this random string be? I assume it’s 20 bytes from /dev/urandom, but you can find cheaper ways to make it unique enough for your tasks. For example a safe pick is to seed RC4 with /dev/urandom, and generate a pseudo random stream from that. A simpler solution is to use a combination of unix time with microseconds resolution, concatenating it with a client ID, it is not as safe, but probably up to the task in most environments.

The time we use as the key time to live, is called the “lock validity time”. It is both the auto release time, and the time the client has in order to perform the operation required before another client may be able to acquire the lock again, without technically violating the mutual exclusion guarantee, which is only limited to a given window of time from the moment the lock is acquired.

So now we have a good way to acquire and release the lock. The system, reasoning about a non-distributed system composed of a single, always available, instance, is safe. Let’s extend the concept to a distributed system where we don’t have such guarantees.

The Redlock algorithm

In the distributed version of the algorithm we assume we have N Redis masters. Those nodes are totally independent, so we don’t use replication or any other implicit coordination system. We already described how to acquire and release the lock safely in a single instance. We take for granted that the algorithm will use this method to acquire and release the lock in a single instance. In our examples we set N=5, which is a reasonable value, so we need to run 5 Redis masters on different computers or virtual machines in order to ensure that they’ll fail in a mostly independent way.

In order to acquire the lock, the client performs the following operations:

It gets the current time in milliseconds.
It tries to acquire the lock in all the N instances sequentially, using the same key name and random value in all the instances. During step 2, when setting the lock in each instance, the client uses a timeout which is small compared to the total lock auto-release time in order to acquire it. For example if the auto-release time is 10 seconds, the timeout could be in the ~ 5-50 milliseconds range. This prevents the client from remaining blocked for a long time trying to talk with a Redis node which is down: if an instance is not available, we should try to talk with the next instance ASAP.
The client computes how much time elapsed in order to acquire the lock, by subtracting from the current time the timestamp obtained in step 1. If and only if the client was able to acquire the lock in the majority of the instances (at least 3), and the total time elapsed to acquire the lock is less than lock validity time, the lock is considered to be acquired.
If the lock was acquired, its validity time is considered to be the initial validity time minus the time elapsed, as computed in step 3.
If the client failed to acquire the lock for some reason (either it was not able to lock N/2+1 instances or the validity time is negative), it will try to unlock all the instances (even the instances it believed it was not able to lock).

Is the algorithm asynchronous?

The algorithm relies on the assumption that while there is no synchronized clock across the processes, still the local time in every process flows approximately at the same rate, with an error which is small compared to the auto-release time of the lock. This assumption closely resembles a real-world computer: every computer has a local clock and we can usually rely on different computers to have a clock drift which is small.

At this point we need to better specify our mutual exclusion rule: it is guaranteed only as long as the client holding the lock will terminate its work within the lock validity time (as obtained in step 3), minus some time (just a few milliseconds in order to compensate for clock drift between processes).

For more information about similar systems requiring a bound clock drift, this paper is an interesting reference: Leases: an efficient fault-tolerant mechanism for distributed file cache consistency.

Retry on failure

When a client is unable to acquire the lock, it should try again after a random delay in order to try to desynchronize multiple clients trying to acquire the lock for the same resource at the same time (this may result in a split brain condition where nobody wins). Also the faster a client tries to acquire the lock in the majority of Redis instances, the smaller the window for a split brain condition (and the need for a retry), so ideally the client should try to send the SET commands to the N instances at the same time using multiplexing.

It is worth stressing how important it is for clients that fail to acquire the majority of locks, to release the (partially) acquired locks ASAP, so that there is no need to wait for key expiry in order for the lock to be acquired again (however if a network partition happens and the client is no longer able to communicate with the Redis instances, there is an availability penalty to pay as it waits for key expiration).

Releasing the lock

Releasing the lock is simple and involves just releasing the lock in all instances, whether or not the client believes it was able to successfully lock a given instance.

Safety arguments

Is the algorithm safe? We can try to understand what happens in different scenarios.

To start let’s assume that a client is able to acquire the lock in the majority of instances. All the instances will contain a key with the same time to live. However, the key was set at different times, so the keys will also expire at different times. But if the first key was set at worst at time T1 (the time we sample before contacting the first server) and the last key was set at worst at time T2 (the time we obtained the reply from the last server), we are sure that the first key to expire in the set will exist for at least MIN_VALIDITY=TTL-(T2-T1)-CLOCK_DRIFT. All the other keys will expire later, so we are sure that the keys will be simultaneously set for at least this time.

During the time that the majority of keys are set, another client will not be able to acquire the lock, since N/2+1 SET NX operations can’t succeed if N/2+1 keys already exist. So if a lock was acquired, it is not possible to re-acquire it at the same time (violating the mutual exclusion property).

However we want to also make sure that multiple clients trying to acquire the lock at the same time can’t simultaneously succeed.

If a client locked the majority of instances using a time near, or greater, than the lock maximum validity time (the TTL we use for SET basically), it will consider the lock invalid and will unlock the instances, so we only need to consider the case where a client was able to lock the majority of instances in a time which is less than the validity time. In this case for the argument already expressed above, for MIN_VALIDITY no client should be able to re-acquire the lock. So multiple clients will be able to lock N/2+1 instances at the same time (with "time" being the end of Step 2) only when the time to lock the majority was greater than the TTL time, making the lock invalid.

Are you able to provide a formal proof of safety, point to existing algorithms that are similar, or find a bug? That would be greatly appreciated.

Liveness arguments

The system liveness is based on three main features:

The auto release of the lock (since keys expire): eventually keys are available again to be locked.
The fact that clients, usually, will cooperate removing the locks when the lock was not acquired, or when the lock was acquired and the work terminated, making it likely that we don’t have to wait for keys to expire to re-acquire the lock.
The fact that when a client needs to retry a lock, it waits a time which is comparably greater than the time needed to acquire the majority of locks, in order to probabilistically make split brain conditions during resource contention unlikely.

However, we pay an availability penalty equal to TTL time on network partitions, so if there are continuous partitions, we can pay this penalty indefinitely. This happens every time a client acquires a lock and gets partitioned away before being able to remove the lock.

Basically if there are infinite continuous network partitions, the system may become not available for an infinite amount of time.

Performance, crash-recovery and fsync

Many users using Redis as a lock server need high performance in terms of both latency to acquire and release a lock, and number of acquire / release operations that it is possible to perform per second. In order to meet this requirement, the strategy to talk with the N Redis servers to reduce latency is definitely multiplexing (or poor man's multiplexing, which is, putting the socket in non-blocking mode, send all the commands, and read all the commands later, assuming that the RTT between the client and each instance is similar).

However there is another consideration to do about persistence if we want to target a crash-recovery system model.

Basically to see the problem here, let’s assume we configure Redis without persistence at all. A client acquires the lock in 3 of 5 instances. One of the instances where the client was able to acquire the lock is restarted, at this point there are again 3 instances that we can lock for the same resource, and another client can lock it again, violating the safety property of exclusivity of lock.

If we enable AOF persistence, things will improve quite a bit. For example we can upgrade a server by sending SHUTDOWN and restarting it. Because Redis expires are semantically implemented so that virtually the time still elapses when the server is off, all our requirements are fine. However everything is fine as long as it is a clean shutdown. What about a power outage? If Redis is configured, as by default, to fsync on disk every second, it is possible that after a restart our key is missing. In theory, if we want to guarantee the lock safety in the face of any kind of instance restart, we need to enable fsync=always in the persistence setting. This in turn will totally ruin performances to the same level of CP systems that are traditionally used to implement distributed locks in a safe way.

However things are better than what they look like at a first glance. Basically the algorithm safety is retained as long as when an instance restarts after a crash, it no longer participates to any currently active lock, so that the set of currently active locks when the instance restarts, were all obtained by locking instances other than the one which is rejoining the system.

To guarantee this we just need to make an instance, after a crash, unavailable for at least a bit more than the max TTL we use, which is, the time needed for all the keys about the locks that existed when the instance crashed, to become invalid and be automatically released.

Using delayed restarts it is basically possible to achieve safety even without any kind of Redis persistence available, however note that this may translate into an availability penalty. For example if a majority of instances crash, the system will become globally unavailable for TTL (here globally means that no resource at all will be lockable during this time).

Making the algorithm more reliable: Extending the lock

If the work performed by clients is composed of small steps, it is possible to use smaller lock validity times by default, and extend the algorithm implementing a lock extension mechanism. Basically the client, if in the middle of the computation while the lock validity is approaching a low value, may extend the lock by sending a Lua script to all the instances that extends the TTL of the key if the key exists and its value is still the random value the client assigned when the lock was acquired.

The client should only consider the lock re-acquired if it was able to extend the lock into the majority of instances, and within the validity time (basically the algorithm to use is very similar to the one used when acquiring the lock).

However this does not technically change the algorithm, so the maximum number of lock reacquisition attempts should be limited, otherwise one of the liveness properties is violated.

Want to help?

If you are into distributed systems, it would be great to have your opinion / analysis. Also reference implementations in other languages could be great.

Thanks in advance!

Analysis of Redlock

Martin Kleppmann analyzed Redlock here. I disagree with the analysis and posted my reply to his analysis here.

《Redis官方文档》用Redis构建分布式锁

My-Interpretations/distlock-zh.md at master · log4leo/My-Interpretations https://github.com/log4leo/My-Interpretations/blob/master/Redis/distlock-zh.md

《Redis官方文档》用Redis构建分布式锁-阿里云开发者社区 https://developer.aliyun.com/article/16963

用Redis构建分布式锁

在不同进程需要互斥地访问共享资源时，分布式锁是一种非常有用的技术手段。有很多三方库和文章描述如何用Redis实现一个分布式锁管理器，但是这些库实现的方式差别很大，而且很多简单的实现其实只需采用稍微增加一点复杂的设计就可以获得更好的可靠性。这篇文章的目的就是尝试提出一种官方权威的用Redis实现分布式锁管理器的算法，我们把这个算法称为RedLock，我们相信这个算法会比一般的普通方法更加安全可靠。我们也希望社区能一起分析这个算法，提供一些反馈，然后我们以此为基础，来设计出更加复杂可靠的算法，或者更好的新算法。

实现

在描述具体的算法之前，下面是已经实现了的项目可以作为参考： Redlock-rb (Ruby实现)。还有一个Redlock-rb的分支，添加了一些特性使得实现分布式锁更简单

Redlock-py (Python 实现).
Redlock-php (PHP 实现).
PHPRedisMutex (PHP 更完整的实现)
Redsync.go (Go 实现).
Redisson (Java 实现).
Redis::DistLock (Perl 实现).
Redlock-cpp (C++ 实现).
Redlock-cs (C#/.NET 实现).
node-redlock (NodeJS 实现). Includes support for lock extension.

安全和可靠性保证

在描述我们的设计之前，我们想先提出三个属性，这三个属性在我们看来，是实现高效分布式锁的基础。

安全属性：互斥，不管任何时候，只有一个客户端能持有同一个锁。
效率属性A：不会死锁，最终一定会得到锁，就算一个持有锁的客户端宕掉或者发生网络分区。
效率属性B：容错，只要大多数Redis节点正常工作，客户端应该都能获取和释放锁。

为什么基于故障切换的方案不够好

为了理解我们想要提高的到底是什么，我们先看下当前大多数基于Redis的分布式锁三方库的现状。用Redis来实现分布式锁最简单的方式就是在实例里创建一个键值，创建出来的键值一般都是有一个超时时间的（这个是Redis自带的超时特性），所以每个锁最终都会释放（参见前文属性2）。而当一个客户端想要释放锁时，它只需要删除这个键值即可。表面来看，这个方法似乎很管用，但是这里存在一个问题：在我们的系统架构里存在一个单点故障，如果Redis的master节点宕机了怎么办呢？有人可能会说：加一个slave节点！在master宕机时用slave就行了！但是其实这个方案明显是不可行的，因为这种方案无法保证第1个安全互斥属性，因为Redis的复制是异步的。总的来说，这个方案里有一个明显的竞争条件（race condition），举例来说：

客户端A在master节点拿到了锁。
master节点在把A创建的key写入slave之前宕机了。
slave变成了master节点 4.B也得到了和A还持有的相同的锁（因为原来的slave里还没有A持有锁的信息）

当然，在某些特殊场景下，前面提到的这个方案则完全没有问题，比如在宕机期间，多个客户端允许同时都持有锁，如果你可以容忍这个问题的话，那用这个基于复制的方案就完全没有问题，否则的话我们还是建议你采用这篇文章里接下来要描述的方案。

采用单实例的正确实现

在讲述如何用其他方案突破单实例方案的限制之前，让我们先看下是否有什么办法可以修复这个简单场景的问题，因为这个方案其实如果可以忍受竞争条件的话是有望可行的，而且单实例来实现分布式锁是我们后面要讲的算法的基础。要获得锁，要用下面这个命令： SET resource_name my_random_value NX PX 30000 这个命令的作用是在只有这个key不存在的时候才会设置这个key的值（NX选项的作用），超时时间设为30000毫秒（PX选项的作用）这个key的值设为“my_random_value”。这个值必须在所有获取锁请求的客户端里保持唯一。基本上这个随机值就是用来保证能安全地释放锁，我们可以用下面这个Lua脚本来告诉Redis：删除这个key当且仅当这个key存在而且值是我期望的那个值。

if redis.call("get",KEYS[1]) == ARGV[1] then
        return redis.call("del",KEYS[1])
    else
        return 0
    end

这个很重要，因为这可以避免误删其他客户端得到的锁，举个例子，一个客户端拿到了锁，被某个操作阻塞了很长时间，过了超时时间后自动释放了这个锁，然后这个客户端之后又尝试删除这个其实已经被其他客户端拿到的锁。所以单纯的用DEL指令有可能造成一个客户端删除了其他客户端的锁，用上面这个脚本可以保证每个客户单都用一个随机字符串’签名’了，这样每个锁就只能被获得锁的客户端删除了。

这个随机字符串应该用什么生成呢？我假设这是从/dev/urandom生成的20字节大小的字符串，但是其实你可以有效率更高的方案来保证这个字符串足够唯一。比如你可以用RC4加密算法来从/dev/urandom生成一个伪随机流。还有更简单的方案，比如用毫秒的unix时间戳加上客户端id，这个也许不够安全，但是也许在大多数环境下已经够用了。

key值的超时时间，也叫做”锁有效时间”。这个是锁的自动释放时间，也是一个客户端在其他客户端能抢占锁之前可以执行任务的时间，这个时间从获取锁的时间点开始计算。所以现在我们有很好的获取和释放锁的方式，在一个非分布式的、单点的、保证永不宕机的环境下这个方式没有任何问题，接下来我们看看无法保证这些条件的分布式环境下我们该怎么做。

Redlock算法

在分布式版本的算法里我们假设我们有N个Redis master节点，这些节点都是完全独立的，我们不用任何复制或者其他隐含的分布式协调算法。我们已经描述了如何在单节点环境下安全地获取和释放锁。因此我们理所当然地应当用这个方法在每个单节点里来获取和释放锁。在我们的例子里面我们把N设成5，这个数字是一个相对比较合理的数值，因此我们需要在不同的计算机或者虚拟机上运行5个master节点来保证他们大多数情况下都不会同时宕机。一个客户端需要做如下操作来获取锁：

1.获取当前时间（单位是毫秒）。

2.轮流用相同的key和随机值在N个节点上请求锁，在这一步里，客户端在每个master上请求锁时，会有一个和总的锁释放时间相比小的多的超时时间。比如如果锁自动释放时间是10秒钟，那每个节点锁请求的超时时间可能是5-50毫秒的范围，这个可以防止一个客户端在某个宕掉的master节点上阻塞过长时间，如果一个master节点不可用了，我们应该尽快尝试下一个master节点。

3.客户端计算第二步中获取锁所花的时间，只有当客户端在大多数master节点上成功获取了锁（在这里是3个），而且总共消耗的时间不超过锁释放时间，这个锁就认为是获取成功了。

4.如果锁获取成功了，那现在锁自动释放时间就是最初的锁释放时间减去之前获取锁所消耗的时间。

5.如果锁获取失败了，不管是因为获取成功的锁不超过一半（N/2+1)还是因为总消耗时间超过了锁释放时间，客户端都会到每个master节点上释放锁，即便是那些他认为没有获取成功的锁。

这个算法是否是异步的？

这个算法是基于一个假设：虽然不存在可以跨进程的同步时钟，但是不同进程时间都是以差不多相同的速度前进，这个假设不一定完全准确，但是和自动释放锁的时间长度相比不同进程时间前进速度差异基本是可以忽略不计的。这个假设就好比真实世界里的计算机：每个计算机都有本地时钟，但是我们可以说大部分情况下不同计算机之间的时间差是很小的。现在我们需要更细化我们的锁互斥规则，只有当客户端能在T时间内完成所做的工作才能保证锁是有效的（详见算法的第3步），T的计算规则是锁失效时间T1减去一个用来补偿不同进程间时钟差异的delta值（一般只有几毫秒而已）如果想了解更多基于有限时钟差异的类似系统，可以参考这篇有趣的文章：《Leases: an efficient fault-tolerant mechanism for distributed file cache consistency.》

失败的重试

当一个客户端获取锁失败时，这个客户端应该在一个随机延时后进行重试，之所以采用随机延时是为了避免不同客户端同时重试导致谁都无法拿到锁的情况出现。同样的道理客户端越快尝试在大多数Redis节点获取锁，出现多个客户端同时竞争锁和重试的时间窗口越小，可能性就越低，所以最完美的情况下，客户端应该用多路传输的方式同时向所有Redis节点发送SET命令。这里非常有必要强调一下客户端如果没有在多数节点获取到锁，一定要尽快在获取锁成功的节点上释放锁，这样就没必要等到key超时后才能重新获取这个锁（但是如果网络分区的情况发生而且客户端无法连接到Redis节点时，会损失等待key超时这段时间的系统可用性）

释放锁

释放锁比较简单，因为只需要在所有节点都释放锁就行，不管之前有没有在该节点获取锁成功。

安全性的论证

这个算法到底是不是安全的呢？我们可以观察不同场景下的情况来理解这个算法为什么是安全的。开始之前，让我们假设客户端可以在大多数节点都获取到锁，这样所有的节点都会包含一个有相同存活时间的key。但是需要注意的是，这个key是在不同时间点设置的，所以这些key也会在不同的时间超时，但是我们假设最坏情况下第一个key是在T1时间设置的（客户端连接到第一个服务器时的时间），最后一个key是在T2时间设置的（客户端收到最后一个服务器返回结果的时间），从T2时间开始，我们可以确认最早超时的key至少也会存在的时间为MIN_VALIDITY=TTL-(T2-T1)-CLOCK_DRIFT，TTL是锁超时时间、（T2-T1）是最晚获取到的锁的耗时，CLOCK_DRIFT是不同进程间时钟差异，这个是用来补偿前面的(T2-T1）。其他的key都会在这个时间点之后才会超时，所以我们可以确定这些key在这个时间点之前至少都是同时存在的。

在大多数节点的key都set了的时间段内，其他客户端无法抢占这个锁，因为在N/2+1个客户端的key已经存在的情况下不可能再在N/2+1个客户端上获取锁成功，所以如果一个锁获取成功了，就不可能同时重新获取这个锁成功（不然就违反了分布式锁互斥原则），然后我们也要确保多个客户端同时尝试获取锁时不会都同时成功。如果一个客户端获取大多数节点锁的耗时接近甚至超过锁的最大有效时间时（就是我们为SET操作设置的TTL值），那么系统会认为这个锁是无效的同时会释放这些节点上的锁，所以我们仅仅需要考虑获取大多数节点锁的耗时小于有效时间的情况。在这种情况下，根据我们前面的证明，在MIN_VALIDITY时间内，没有客户端能重新获取锁成功，所以多个客户端都能同时成功获取锁的结果，只会发生在多数节点获取锁的时间都大大超过TTL时间的情况下，实际上这种情况下这些锁都会失效。我们非常期待和欢迎有人能提供这个算法安全性的公式化证明，或者发现任何bug。

性能论证

这个系统的性能主要基于以下三个主要特征：

1.锁自动释放的特征（超时后会自动释放），一定时间后某个锁都能被再次获取。

2.客户端通常会在不再需要锁或者任务执行完成之后主动释放锁，这样我们就不用等到超时时间会再去获取这个锁。

3.当一个客户端需要重试获取锁时，这个客户端会等待一段时间，等待的时间相对来说会比我们重新获取大多数锁的时间要长一些，这样可以降低不同客户端竞争锁资源时发生死锁的概率。

然而，我们在网络分区时要损失TTL的可用性时间，所以如果网络分区持续发生，这个不可用会一直持续。这种情况在每次一个客户端获取到了锁并在释放锁之前被网络分区了时都会出现。

基本来说，如果持续的网络分区发生的话，系统也会在持续不可用。

性能、故障恢复和fsync

很多使用Redis做锁服务器的用户在获取锁和释放锁时不止要求低延时，同时要求高吞吐量，也即单位时间内可以获取和释放的锁数量。为了达到这个要求，一定会使用多路传输来和N个服务器进行通信以降低延时（或者也可以用假多路传输，也就是把socket设置成非阻塞模式，发送所有命令，然后再去读取返回的命令，假设说客户端和不同Redis服务节点的网络往返延时相差不大的话）。

然后如果我们想让系统可以自动故障恢复的话，我们还需要考虑一下信息持久化的问题。

为了更好的描述问题，我们先假设我们Redis都是配置成非持久化的，某个客户端拿到了总共5个节点中的3个锁，这三个已经获取到锁的节点中随后重启了，这样一来我们又有3个节点可以获取锁了（重启的那个加上另外两个），这样一来其他客户端又可以获得这个锁了，这样就违反了我们之前说的锁互斥原则了。

如果我们启用AOF持久化功能，情况会好很多。举例来说，我们可以发送SHUTDOWN命令来升级一个Redis服务器然后重启之，因为Redis超时时效是语义层面实现的，所以在服务器关掉期间时超时时间还是算在内的，我们所有要求还是满足了的。然后这个是基于我们做的是一次正常的shutdown，但是如果是断电这种意外停机呢？如果Redis是默认地配置成每秒在磁盘上执行一次fsync同步文件到磁盘操作，那就可能在一次重启后我们锁的key就丢失了。理论上如果我们想要在所有服务重启的情况下都确保锁的安全性，我们需要在持久化设置里设置成永远执行fsync操作，但是这个反过来又会造成性能远不如其他同级别的传统用来实现分布式锁的系统。然后问题其实并不像我们第一眼看起来那么糟糕，基本上只要一个服务节点在宕机重启后不去参与现在所有仍在使用的锁，这样正在使用的锁集合在这个服务节点重启时，算法的安全性就可以维持，因为这样就可以保证正在使用的锁都被所有没重启的节点持有。为了满足这个条件，我们只要让一个宕机重启后的实例，至少在我们使用的最大TTL时间内处于不可用状态，超过这个时间之后，所有在这期间活跃的锁都会自动释放掉。使用延时重启的策略基本上可以在不适用任何Redis持久化特性情况下保证安全性，然后要注意这个也必然会影响到系统的可用性。举个例子，如果系统里大多数节点都宕机了，那在TTL时间内整个系统都处于全局不可用状态（全局不可用的意思就是在获取不到任何锁）。

扩展锁来使得算法更可靠

如果客户端做的工作都是由一些小的步骤组成，那么就有可能使用更小的默认锁有效时间，而且扩展这个算法来实现一个锁扩展机制。基本上，客户端如果在执行计算期间发现锁快要超时了，客户端可以给所有服务实例发送一个Lua脚本让服务端延长锁的时间，只要这个锁的key还存在而且值还等于客户端获取时的那个值。客户端应当只有在失效时间内无法延长锁时再去重新获取锁（基本上这个和获取锁的算法是差不多的）然而这个并不会对从本质上改变这个算法，所以最大的重新获取锁数量应该被设置成合理的大小，不然性能必然会受到影响。

SET – Redis https://redis.io/commands/set

Available since 1.0.0.

Time complexity: O(1)

Set key to hold the string value. If key already holds a value, it is overwritten, regardless of its type. Any previous time to live associated with the key is discarded on successful SET operation.

Options

The SET command supports a set of options that modify its behavior:

EX seconds -- Set the specified expire time, in seconds.
PX milliseconds -- Set the specified expire time, in milliseconds.
EXAT timestamp-seconds -- Set the specified Unix time at which the key will expire, in seconds.
PXAT timestamp-milliseconds -- Set the specified Unix time at which the key will expire, in milliseconds.
NX -- Only set the key if it does not already exist.
XX -- Only set the key if it already exist.
KEEPTTL -- Retain the time to live associated with the key.
GET -- Return the old value stored at key, or nil when key did not exist.

Note: Since the SET command options can replace SETNX, SETEX, PSETEX, GETSET, it is possible that in future versions of Redis these commands will be deprecated and finally removed.

Return value

Simple string reply: OK if SET was executed correctly. Bulk string reply: when GET option is set, the old value stored at key, or nil when key did not exist. Null reply: a Null Bulk Reply is returned if the SET operation was not performed because the user specified the NX or XX option but the condition was not met, or if the user specified the GET option and there was no previous value for the key.

History

>= 2.6.12: Added the EX, PX, NX and XX options.
>= 6.0: Added the KEEPTTL option.
>= 6.2: Added the GET, EXAT and PXAT option.

Examples

redis> SET mykey "Hello"

"OK"

redis> GET mykey

"Hello"

redis> SET anotherkey "will expire in a minute" EX 60

"OK"

*Patterns

Note: The following pattern is discouraged in favor of the Redlock algorithm which is only a bit more complex to implement, but offers better guarantees and is fault tolerant.

The command SET resource-name anystring NX EX max-lock-time is a simple way to implement a locking system with Redis.

A client can acquire the lock if the above command returns OK (or retry after some time if the command returns Nil), and remove the lock just using DEL.

The lock will be auto-released after the expire time is reached.

It is possible to make this system more robust modifying the unlock schema as follows:

Instead of setting a fixed string, set a non-guessable large random string, called token.
Instead of releasing the lock with DEL, send a script that only removes the key if the value matches.

This avoids that a client will try to release the lock after the expire time deleting the key created by another client that acquired the lock later.

An example of unlock script would be similar to the following:

if redis.call("get",KEYS[1]) == ARGV[1]
then
    return redis.call("del",KEYS[1])
else
    return 0
end

The script should be called with EVAL ...script... 1 resource-name token-value

EVAL – Redis https://redis.io/commands/eval

EVAL script numkeys key [key ...] arg [arg ...]

EVAL and EVALSHA are used to evaluate scripts using the Lua interpreter built into Redis starting from version 2.6.0.

The first argument of EVAL is a Lua 5.1 script. The script does not need to define a Lua function (and should not). It is just a Lua program that will run in the context of the Redis server.

The second argument of EVAL is the number of arguments that follows the script (starting from the third argument) that represent Redis key names. The arguments can be accessed by Lua using the KEYS global variable in the form of a one-based array (so KEYS[1], KEYS[2], ...).

All the additional arguments should not represent key names and can be accessed by Lua using the ARGV global variable, very similarly to what happens with keys (so ARGV[1], ARGV[2], ...).

The following example should clarify what stated above:

> eval "return {KEYS[1],KEYS[2],ARGV[1],ARGV[2]}" 2 key1 key2 first second
1) "key1"
2) "key2"
3) "first"
4) "second"

Note: as you can see Lua arrays are returned as Redis multi bulk replies, that is a Redis return type that your client library will likely convert into an Array type in your programming language.

It is possible to call Redis commands from a Lua script using two different Lua functions:

redis.call()
redis.pcall()

redis.call() is similar to redis.pcall(), the only difference is that if a Redis command call will result in an error, redis.call() will raise a Lua error that in turn will force EVAL to return an error to the command caller, while redis.pcall will trap the error and return a Lua table representing the error.

The arguments of the redis.call() and redis.pcall() functions are all the arguments of a well formed Redis command:

> eval "return redis.call('set','foo','bar')" 0
OK

The above script sets the key foo to the string bar. However it violates the EVAL command semantics as all the keys that the script uses should be passed using the KEYS array:

> eval "return redis.call('set',KEYS[1],'bar')" 1 foo
OK

All Redis commands must be analyzed before execution to determine which keys the command will operate on. In order for this to be true for EVAL, keys must be passed explicitly. This is useful in many ways, but especially to make sure Redis Cluster can forward your request to the appropriate cluster node.

Note this rule is not enforced in order to provide the user with opportunities to abuse the Redis single instance configuration, at the cost of writing scripts not compatible with Redis Cluster.

Lua scripts can return a value that is converted from the Lua type to the Redis protocol using a set of conversion rules.

Conversion between Lua and Redis data types

Redis return values are converted into Lua data types when Lua calls a Redis command using call() or pcall(). Similarly, Lua data types are converted into the Redis protocol when calling a Redis command and when a Lua script returns a value, so that scripts can control what EVAL will return to the client.

This conversion between data types is designed in a way that if a Redis type is converted into a Lua type, and then the result is converted back into a Redis type, the result is the same as the initial value.

In other words there is a one-to-one conversion between Lua and Redis types. The following table shows you all the conversions rules:

Redis to Lua conversion table.

Redis integer reply -> Lua number
Redis bulk reply -> Lua string
Redis multi bulk reply -> Lua table (may have other Redis data types nested)
Redis status reply -> Lua table with a single ok field containing the status
Redis error reply -> Lua table with a single err field containing the error
Redis Nil bulk reply and Nil multi bulk reply -> Lua false boolean type

Lua to Redis conversion table.

Lua number -> Redis integer reply (the number is converted into an integer)
Lua string -> Redis bulk reply
Lua table (array) -> Redis multi bulk reply (truncated to the first nil inside the Lua array if any)
Lua table with a single ok field -> Redis status reply
Lua table with a single err field -> Redis error reply
Lua boolean false -> Redis Nil bulk reply.

There is an additional Lua-to-Redis conversion rule that has no corresponding Redis to Lua conversion rule:

Lua boolean true -> Redis integer reply with value of 1.

Lastly, there are three important rules to note:

Lua has a single numerical type, Lua numbers. There is no distinction between integers and floats. So we always convert Lua numbers into integer replies, removing the decimal part of the number if any. If you want to return a float from Lua you should return it as a string, exactly like Redis itself does (see for instance the ZSCORE command).
There is no simple way to have nils inside Lua arrays, this is a result of Lua table semantics, so when Redis converts a Lua array into Redis protocol the conversion is stopped if a nil is encountered.
When a Lua table contains keys (and their values), the converted Redis reply will not include them.

RESP3 mode conversion rules: note that the Lua engine can work in RESP3 mode using the new Redis 6 protocol. In this case there are additional conversion rules, and certain conversions are also modified compared to the RESP2 mode. Please refer to the RESP3 section of this document for more information.

Here are a few conversion examples:

> eval "return 10" 0
(integer) 10

> eval "return {1,2,{3,'Hello World!'}}" 0
1) (integer) 1
2) (integer) 2
3) 1) (integer) 3
   2) "Hello World!"

> eval "return redis.call('get','foo')" 0
"bar"

The last example shows how it is possible to receive the exact return value of redis.call() or redis.pcall() from Lua that would be returned if the command was called directly.

In the following example we can see how floats and arrays containing nils and keys are handled:

> eval "return {1,2,3.3333,somekey='somevalue','foo',nil,'bar'}" 0
1) (integer) 1
2) (integer) 2
3) (integer) 3
4) "foo"

As you can see 3.333 is converted into 3, somekey is excluded, and the bar string is never returned as there is a nil before.

Helper functions to return Redis types

There are two helper functions to return Redis types from Lua.

redis.error_reply(error_string) returns an error reply. This function simply returns a single field table with the err field set to the specified string for you.
redis.status_reply(status_string) returns a status reply. This function simply returns a single field table with the ok field set to the specified string for you.

There is no difference between using the helper functions or directly returning the table with the specified format, so the following two forms are equivalent:

return {err="My Error"}
return redis.error_reply("My Error")

Atomicity of scripts

Redis uses the same Lua interpreter to run all the commands. Also Redis guarantees that a script is executed in an atomic way: no other script or Redis command will be executed while a script is being executed. This semantic is similar to the one of MULTI / EXEC. From the point of view of all the other clients the effects of a script are either still not visible or already completed.

However this also means that executing slow scripts is not a good idea. It is not hard to create fast scripts, as the script overhead is very low, but if you are going to use slow scripts you should be aware that while the script is running no other client can execute commands.

Error handling

As already stated, calls to redis.call() resulting in a Redis command error will stop the execution of the script and return an error, in a way that makes it obvious that the error was generated by a script:

> del foo
(integer) 1
> lpush foo a
(integer) 1
> eval "return redis.call('get','foo')" 0
(error) ERR Error running script (call to f_6b1bf486c81ceb7edf3c093f4c48582e38c0e791): ERR Operation against a key holding the wrong kind of value

Using redis.pcall() no error is raised, but an error object is returned in the format specified above (as a Lua table with an err field). The script can pass the exact error to the user by returning the error object returned by redis.pcall()

https://mp.weixin.qq.com/s/lrSQBK-Kihkj6994kQFpUQ

Go：分布式锁实现原理与最佳实践

Go语言中文网 2021-02-05

posted @ 2020-06-01 12:36 papering 阅读(464) 评论(0) 收藏举报

刷新页面返回顶部