[翻译][英语学习] Redis 与分布式锁

Distributed^[1] Locks with Redis

使用 Redis 的分布式锁

Implementations
Safety and Liveness Guarantees
Why Failover-based Implementations Are Not Enough
Correct Implementation with a Single Instance
The Redlock Algorithm
Is the Algorithm Asynchronous?
Retry on Failure
Releasing the Lock
Safety Arguments
Liveness Arguments
Performance, Crash Recovery and fsync
Making the algorithm more reliable: Extending the lock
Reference

A distributed lock pattern^[2] with Redis
使用 Redis 的分布式锁模式

Distributed locks are a very useful primitive^[3] in many environments where different processes must operate with shared resources in a mutually exclusive^[4] way.
分布式锁在许多环境中是一个非常有用的基本工具，在这些环境中，不同的进程必须以互斥的方式操作共享资源。

There are a number of libraries and blog posts describing how to implement^[5] a DLM (Distributed Lock Manager) with Redis, but every library uses a different approach^[6], and many use a simple approach with lower guarantees^[7] compared to what can be achieved^[8] with slightly^[9] more complex^[10] designs.
许多的库和博客描述如何使用 Redis 来实现一个 DLM（分布式锁管理），但每个库都采用不同的方法，并且有很多都是用了一个简单的方法，而这种方法相较于稍加复杂的设计来说，不太敢担保能实现(DLM)。

This page describes a more canonical^[11] algorithm^[12] to implement distributed locks with Redis.
本页描述了使用 Redis 实现分布式锁的一个更权威的算法。

We propose an algorithm, called Redlock, which implements a DLM which we believe to be safer than the vanilla^[13] single instance approach.
我们提出了一个算法，称其为“红锁”，它实现了一个DLM，并且我们相信它比普通的单实例方法更安全。

We hope that the community will analyze^[14] it, provide feedback^[15], and use it as a starting point for the implementations or more complex or alternative^[16] designs.
我们希望社区会分析它，提供反馈意见，并用它作为一个实现的或者更复杂的或者替代设计的起点。

Implementations

Before describing the algorithm, here are a few links to implementations already available that can be used for reference.
在描述算法之前，这里是一些已经可以使用的来实施的链接可以作为参考。

Redlock-rb (Ruby implementation). There is also a fork of Redlock-rb that adds a gem for easy distribution.
还有 Redlock-rb 的一个分支，添加了 gem 以便于分发。
RedisQueuedLocks (Ruby implementation).
Redlock-py (Python implementation).
Pottery (Python implementation).
Aioredlock (Asyncio Python implementation).
Redlock-php (PHP implementation).
PHPRedisMutex (further PHP implementation).
cheprasov/php-redis-lock (PHP library for locks).
rtckit/react-redlock (Async PHP implementation).
Redsync (Go implementation).
Redisson (Java implementation).
Redis::DistLock (Perl implementation).
Redlock-cpp (C++ implementation).
Redis-plus-plus (C++ implementation).
Redlock-cs (C#/.NET implementation).
RedLock.net (C#/.NET implementation). Includes async and lock extension support.
包括异步和锁定扩展支持。
ScarletLock (C# .NET implementation with configurable datastore).
具有可配置的数据存储的 C# .NET 实现。
Redlock4Net (C# .NET implementation).
node-redlock (NodeJS implementation). Includes support for lock extension.
Deno DLM (Deno implementation)
Rslock (Rust implementation). Includes async and lock extension support.

Safety and Liveness^[17] Guarantees

安全性和活性保证

We are going to model^[18] our design with just three properties that, from our point of view, are the minimum^[19] guarantees needed to use distributed locks in an effective^[20] way.
我们将只使用三个属性来对我们的设计进行建模，从我们的观点来看，这是以有效方式使用分布式锁的最小保证。

Safety property: Mutual exclusion. At any given moment, only one client can hold a lock.
安全属性：互斥。在任何给定的时刻，只有一个客户端持有锁。
Liveness property A: Deadlock free. Eventually^[21] it is always possible^[22] to acquire^[23] a lock, even if the client that locked a resource crashes^[24] or gets partitioned^[25].
活跃属性A：无死锁。最终，即使锁定资源的客户端崩溃或分区，也始终有可能获取锁。
Liveness property B: Fault tolerance^[26]. As long as the majority[^majority] of Redis nodes are up, clients are able to acquire and release^[27] locks.
活跃属性B：容错。只要大多数 Redis 节点正常运行，客户端就可以获取和释放锁。

Why Failover-based Implementations Are Not Enough

为什么基于故障转移的实施还不够

To understand what we want to improve^[28], let’s analyze the current state of affairs with most Redis-based distributed lock libraries.
了解我们想要改进的地方，让我们一起来分析一下大部分基于 Redis 的分布式锁库的当前状态。

The simplest way to use Redis to lock a resource is to create a key in an instance.
使用 Redis 锁定资源的最简单方法是在一个实例中创建 Key 。

The key is usually created with a limited time to live, using the Redis expires^[29] feature, so that eventually it will get released (property 2 in our list).
通常使用 Redis 过期功能（类似于 TTL）创建的 Key 具有有限的生存时间，因此最终它会被释放（我们列表中的属性 2）。

When the client needs to release the resource, it deletes the key.
当客户端需要释放资源时，它会删除该 Key 。

Superficially^[30] this works well, but there is a problem: this is a single point of failure in our architecture^[31].
从表面上看，这个效果很好，但有一个问题：这在我们的架构中是一个单点故障。

What happens if the Redis master goes down? Well, let’s add a replica!
如果 Redis 主节点挂掉了会发生什么？好吧，让我们加上一个副本。

And use it if the master is unavailable. This is unfortunately^[32] not viable.
如果主节点不可用那就使用它（副本）。可不幸的是，这是行不通的。

By doing so we can’t implement our safety property of mutual exclusion, because Redis replication^[33] is asynchronous^[34].
通过这样做，我们无法实现我们互斥的安全属性，因为 Redis 复制是异步的。

There is a race^[35] condition with this model:
该模型存在竞争条件：

Client A acquires the lock in the master.
客户端 A 在主节点中获得锁
The master crashes before the write to the key is transmitted^[36] to the replica.
在对 Key 的写入传到副本之前，主节点崩溃了。
The replica gets promoted to master.
副本被提升为主节点。
Client B acquires the lock to the same resource A already holds a lock for. SAFETY VIOLATION^[37]!
客户端 B 获取了 A 已经持有的同一资源的锁。安全违规！
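
The four steps above can be replayed with a small in-memory sketch. Nothing here talks to a real Redis server: `Node`, `set_nx` and `replication_backlog` are hypothetical names used only to illustrate how asynchronous replication can lose the key that represents the lock.

```python
class Node:
    """Hypothetical in-memory stand-in for a Redis server."""

    def __init__(self):
        self.data = {}

    def set_nx(self, key, value):
        """Set the key only if it is absent, like SET ... NX."""
        if key in self.data:
            return False
        self.data[key] = value
        return True


def failover_race():
    master, replica = Node(), Node()
    replication_backlog = []  # writes accepted by the master, not yet shipped

    # 1. Client A acquires the lock on the master.
    a_ok = master.set_nx("resource", "client-A")
    replication_backlog.append(("resource", "client-A"))

    # 2. The master crashes before the backlog reaches the replica.
    replication_backlog.clear()

    # 3. The replica is promoted and knows nothing about A's key.
    new_master = replica

    # 4. Client B acquires the same lock on the new master.
    b_ok = new_master.set_nx("resource", "client-B")
    return a_ok, b_ok


print(failover_race())  # both clients believe they hold the lock
```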

Sometimes it is perfectly fine that, under special circumstances^[38], for example during a failure, multiple clients can hold the lock at the same time.
有时，在特殊情况下（例如在发生故障时），多个客户端可以同时持有锁，这是完全没问题的。

If this is the case, you can use your replication-based solution. Otherwise we suggest implementing the solution described in this document.
如果是这种情况，您可以使用基于复制的解决方案。否则，我们建议实施本文档中描述的解决方案。

Correct Implementation with a Single Instance

正确实现单个实例

Before trying to overcome^[39] the limitation of the single instance setup described above, let’s check how to do it correctly in this simple case,
在尝试克服上述单实例设置的限制之前，让我们看看在这个简单的情况下如何正确地做到这一点，

since this is actually a viable solution in applications where a race condition from time to time is acceptable^[40],
由于这实际上是在某些应用中可以接受偶尔发生竞态条件的一种可行解决方案,

and because locking into a single instance is the foundation^[41] we’ll use for the distributed algorithm described here.
因此，锁定单个实例是我们将用于描述的分布式算法的基础。

To acquire the lock, the way to go is the following:
要获取锁，请按照以下方式操作：

SET resource_name my_random_value NX PX 30000

The command will set the key only if it does not already exist (NX option), with an expire of 30000 milliseconds (PX option).
该命令只有在 Key 不存在时（NX选项）才会设置 Key ，并设置过期时间为30000毫秒（PX选项）。

The key is set to a value “my_random_value”. This value must be unique across all clients and all lock requests.
这个 Key 设置的值为“my_random_value”。这个值在所有客户端和所有锁定请求中必须是唯一的。
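
To make the NX and PX semantics concrete, here is a minimal in-memory model of the command, not a Redis client. `SingleInstance`, `set_nx_px` and the injectable `clock` are hypothetical names chosen for this sketch; a real deployment would send the `SET` command shown above to the server instead.

```python
import time


class SingleInstance:
    """Tiny in-memory stand-in for one Redis instance (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expiry deadline in seconds)
        self._clock = clock  # injectable so expiry can be demonstrated

    def _purge_if_expired(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]

    def set_nx_px(self, key, value, ttl_ms):
        """Model of SET key value NX PX ttl_ms: True if the key was set."""
        self._purge_if_expired(key)
        if key in self._data:
            return False  # NX: the key already exists, so nothing is written
        self._data[key] = (value, self._clock() + ttl_ms / 1000.0)
        return True

    def get(self, key):
        self._purge_if_expired(key)
        entry = self._data.get(key)
        return entry[0] if entry is not None else None


now = [0.0]  # fake clock, in seconds
redis = SingleInstance(clock=lambda: now[0])
first = redis.set_nx_px("resource_name", "value-of-client-1", 30000)
second = redis.set_nx_px("resource_name", "value-of-client-2", 30000)
now[0] = 30.0  # 30000 milliseconds later the key has expired
third = redis.set_nx_px("resource_name", "value-of-client-2", 30000)
print(first, second, third)
```

The second call fails because the key still exists; the third succeeds only because the time to live has run out.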

Basically the random value is used in order to release the lock in a safe way, with a script that tells Redis: remove the key only if it exists and the value stored at the key is exactly the one I expect it to be.
基本上，这个随机值被用来以安全的方式释放锁，并通过一个脚本告诉 Redis: 仅当 Key 存在并且存储在 Key 中的值正是我期望的值时才删除该 Key 。

This is accomplished^[42] by the following Lua script:
通过以下 Lua 脚本来实现：

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

This is important in order to avoid^[43] removing a lock that was created by another client.
这对于避免删除一个由另一个客户端创建的锁至关重要。
For example a client may acquire the lock, get blocked performing some operation for longer than the lock validity time (the time at which the key will expire^[44]), and later remove the lock, which by then has already been acquired by some other client.
例如一个客户端可能会获取锁，在执行某些操作时被阻止，时间长于锁有效期（ Key 过期时间），然后删除已被其他客户端获取的锁。
Using just DEL is not safe as a client may remove another client's lock. With the above script instead every lock is “signed” with a random string, so the lock will be removed only if it is still^[45] the one that was set by the client trying to remove it.
仅使用 DEL 并不安全，因为客户端可能会删除另一个客户端的锁。使用上面的脚本，每个锁都用随机字符串“签名”，因此，除非它仍然是由试图删除它的客户端所设置的锁时，该锁才会被删除。
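
The difference between a plain DEL and the check-then-delete script can be sketched in Python over an ordinary dictionary. This only mirrors the logic of the Lua script; `release_lock` and the `token-A` / `token-B` values are hypothetical, and the sketch does not reproduce the atomic execution that Redis gives a script.

```python
def release_lock(store, key, expected_value):
    """Delete `key` only if it currently holds `expected_value`.

    Returns 1 if the key was deleted and 0 otherwise, like the script.
    """
    if key in store and store[key] == expected_value:
        del store[key]
        return 1
    return 0


store = {}
store["resource_name"] = "token-A"  # client A acquires the lock
del store["resource_name"]          # A stalls and the key expires
store["resource_name"] = "token-B"  # client B acquires the lock

late_release = release_lock(store, "resource_name", "token-A")  # A wakes up
own_release = release_lock(store, "resource_name", "token-B")   # B finishes
print(late_release, own_release)
```

Client A's late release is refused because the stored value is no longer its own, so client B's lock survives until B releases it.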

What should this random string be? We assume^[46] it’s 20 bytes from /dev/urandom, but you can find cheaper^[47] ways to make it unique enough for your tasks.
这个随机字符串应该是什么？我们假设它的20个字符是来自/dev/urandom，但你可以找到更便宜的方法来使其对于您的任务来说足够独特（唯一性）。
For example a safe pick^[48] is to seed RC4 with /dev/urandom, and generate a pseudo^[49] random stream from that.
例如，一个安全的选择是使用 /dev/urandom 为 RC4 播种，并从中生成伪随机流。
A simpler solution is to use a UNIX timestamp with microsecond precision^[50], concatenating the timestamp with a client ID. It is not as safe, but probably sufficient for most environments.
一个更简单的解决方案是使用微秒精度的 UNIX 时间辍，将时间戳与客户端 ID 连接起来。它不太安全，但对于大多数环境来说可能足够了。
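
Two of the options above can be sketched with the Python standard library. The function names are hypothetical. `os.urandom` reads from the operating system's random source, which is assumed here to be an acceptable substitute for reading /dev/urandom directly; the RC4 variant is not shown.

```python
import os
import time


def random_token():
    """20 bytes from the OS random source, hex-encoded to 40 characters."""
    return os.urandom(20).hex()


def timestamp_token(client_id):
    """Cheaper and weaker: microsecond UNIX timestamp joined with a client ID."""
    micros = time.time_ns() // 1_000
    return f"{micros}:{client_id}"


print(random_token())
print(timestamp_token("client-42"))
```

The timestamp variant relies on client IDs being unique and on one client never issuing two requests within the same microsecond, which is why it is described as less safe.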

The "lock validity time" is the time we use as the key's time to live.
“锁有效期”是我们用作 Key 生存时间的时间。
It is both the auto release time, and the time the client has in order to perform the operation required before another client may be able to acquire the lock again, without technically violating^[51] the mutual exclusion guarantee, which is only limited to a given window of time from the moment the lock is acquired.
它既是自动释放时间，也是客户端在另一个客户端能够再次获取锁之前执行所需操作的时间，而不会在技术上违反互斥保证，互斥保证仅限于给定的窗口从获取锁的那一刻起的时间。

So now we have a good way to acquire and release the lock. With this system, reasoning about a non-distributed system composed of a single, always available, instance, is safe. Let’s extend the concept to a distributed system where we don’t have such guarantees.
所以现在我们有一个很好的方法来获取和释放锁。使用这个系统，推理对于一个单一、始终可用的实例组成的非分布式系统是安全的。让我们将这个概念扩展到没有这样保证的分布式系统。

The Redlock Algorithm

红锁算法

In the distributed version of the algorithm we assume we have N Redis masters.
在算法的分布式版本中，我们假设我们有N个 Redis 主节点

Those nodes are totally independent, so we don’t use replication or any other implicit^[52] coordination^[53] system^[54].
这些节点是完全独立的，所以我们不使用副本或任何其他的隐性协调系统。
We already described how to acquire and release the lock safely in a single instance.
我们已经描述了如何在单个实例中安全地获取并释放锁。
We take for granted^[55] that the algorithm will use this method to acquire and release the lock in a single instance.
我们理所当然地认为算法将使用此方法在单个实例中获取和释放锁。
In our examples we set N=5, which is a reasonable value, so we need to run 5 Redis masters on different computers or virtual machines in order to ensure that they’ll fail in a mostly independent way.
在我们的示例中，我们设置 N=5，这是一个合理的值，所以我们需要在不同的计算机或虚拟机中运行5个 Redis 主节点，以确保他们将以一种基本独立的方式失败。

In order to acquire the lock, the client performs the following operations:
为了获取锁，客户端执行以下操作：

It gets the current time in milliseconds.
获取当前时间（以毫秒为单位）。
It tries to acquire the lock in all the N instances sequentially^[56], using the same key name and random value in all the instances.
尝试依次获取N个实例中所有的锁，在所有实例中使用相同的 Key 名称和随机值。
During step 2, when setting the lock in each instance, the client uses a timeout which is small compared to the total lock auto-release time in order to acquire it.
在步骤2中，当在每个实例中设置锁时，客户端使用与总锁自动释放时间相比较小的超时来获取锁。
For example if the auto-release time is 10 seconds, the timeout could be in the ~ 5-50 milliseconds range.
例如当自动释放时间为10秒时，超时时间可能会在5~50毫秒之间。
This prevents^[57] the client from remaining^[58] blocked for a long time trying to talk with a Redis node which is down: if an instance is not available, we should try to talk with the next instance ASAP.
这可以防止客户端在尝试与已关闭的 Redis 节点通信时长时间处于阻塞状态：如果某个实例不可用，我们应该尽快尝试与下一个实例对话。

The client computes how much time elapsed in order to acquire the lock, by subtracting from the current time the timestamp obtained in step 1.
客户端通过从当前时间减去步骤 1 中获得的时间戳来计算获取锁所花费的时间。
If and only if the client was able to acquire the lock in the majority of the instances (at least 3), and the total time elapsed to acquire the lock is less than lock validity time, the lock is considered to be acquired.
当且仅当客户端能够在大多数实例（至少3个）中获取锁，并且获取锁所花费的总时间小于锁的有效期时，才被认为获取了锁。
If the lock was acquired, its validity time is considered to be the initial validity time minus^[59] the time elapsed^[60], as computed in step 3.
如果获取了锁，则其有效时间被视为初始有效时间减去经过的时间（如步骤 3 中计算的那样）。
If the client failed to acquire the lock for some reason (either it was not able to lock N/2+1 instances or the validity time is negative), it will try to unlock all the instances (even the instances it believed it was not able to lock).
如果客户端由于一些原因（要么无法锁定 N/2+1 个实例，要么有效时间为负）获取锁失败，它将尝试解锁所有实例（甚至它认为无法锁定的实例能够锁定）。
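
The acquisition procedure can be sketched end to end against in-memory instances. This is an illustration under stated assumptions, not a reference implementation: `FakeInstance`, `acquire` and the `drift_ms` allowance are hypothetical, instances are contacted sequentially, and the short per-instance network timeout is modelled only as an immediate `ConnectionError` from an instance that is down.

```python
import secrets
import time


class FakeInstance:
    """In-memory stand-in for one independent Redis master (hypothetical)."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}  # key -> (value, deadline on time.monotonic())

    def set_nx_px(self, key, value, ttl_ms):
        if not self.up:
            raise ConnectionError("instance unreachable")
        entry = self.data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return False  # key exists and has not expired
        self.data[key] = (value, time.monotonic() + ttl_ms / 1000.0)
        return True

    def delete_if_equal(self, key, value):
        if not self.up:
            raise ConnectionError("instance unreachable")
        entry = self.data.get(key)
        if entry is not None and entry[0] == value:
            del self.data[key]


def acquire(instances, key, ttl_ms, drift_ms=2):
    """Return (token, validity_ms) on success, or None on failure."""
    token = secrets.token_hex(20)
    quorum = len(instances) // 2 + 1
    start = time.monotonic()              # sample the current time
    locked = 0
    for inst in instances:                # try every instance in turn
        try:
            if inst.set_nx_px(key, token, ttl_ms):
                locked += 1
        except ConnectionError:
            pass                          # move on to the next instance
    elapsed_ms = (time.monotonic() - start) * 1000.0
    validity_ms = ttl_ms - elapsed_ms - drift_ms
    if locked >= quorum and validity_ms > 0:
        return token, validity_ms
    for inst in instances:                # failure: unlock everywhere
        try:
            inst.delete_if_equal(key, token)
        except ConnectionError:
            pass
    return None


nodes = [FakeInstance() for _ in range(5)]
nodes[0].up = False
nodes[1].up = False
result = acquire(nodes, "resource", ttl_ms=10000)
print(result is not None)  # 3 of 5 instances is still a majority
```

On failure the sketch sends the unlock to every instance, including those it believes it never locked, as the last step of the procedure requires.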

Is the Algorithm Asynchronous?

这个算法是异步的吗？

The algorithm relies on^[61] the assumption^[62] that while there is no synchronized clock across the processes, the local time in every process updates at approximately^[63] the same rate, with a small margin^[64] of error compared to the auto-release time of the lock.
算法依赖于这样的假设：虽然进程之间没有同步时钟，但每个进程中的本地时间以大致相同的速率更新，与锁的自动释放时间相比，误差幅度很小。
This assumption closely resembles^[65] a real-world computer: every computer has a local clock and we can usually rely on different computers to have a clock drift^[66] which is small.
这个假设非常类似于现实世界的计算机：每台计算机都有一个本地时钟，我们通常可以依靠不同的计算机来获得很小的时钟漂移。
At this point we need to better specify our mutual exclusion rule: it is guaranteed only as long as the client holding the lock terminates^[67] its work within the lock validity time (as obtained in step 3), minus some time (just a few milliseconds in order to compensate^[68] for clock drift between processes).
此刻，我们需要更好地指定我们的互斥规则：只有持有锁的客户端在锁有效时间内（如步骤 3 中获得的）终止其工作，减去一些时间（仅几毫秒，以补偿进程之间的时钟漂移），才能保证它。

This paper contains^[69] more information about similar systems requiring a bounded clock drift: Leases: an efficient^[70] fault-tolerant mechanism^[71] for distributed file cache consistency.
本文包含有关需要绑定时钟漂移的类似系统的更多信息：租约：分布式文件缓存一致性的高效容错机制。

Retry on Failure

失败重试

When a client is unable to acquire the lock, it should try again after a random delay in order to try to desynchronize multiple clients trying to acquire the lock for the same resource at the same time (this may result in a split brain condition where nobody wins).
当客户端没能获取锁时，它应该在随机延时后重试，以便于尝试使多个客户端同时尝试获取同一资源的锁不同步（这可能会导致没有人获胜的裂脑情况[各执己见]）。
Also the faster a client tries to acquire the lock in the majority of Redis instances, the smaller the window for a split brain condition (and the need for a retry), so ideally the client should try to send the SET commands to the N instances at the same time using multiplexing.
另外，客户端在大多数 Redis 实例中尝试获取锁的速度越快，则脑裂情况的窗口就越小（并且需要重试），因此理想情况下，客户端应尝试使用多路复用同时将 SET 命令发送到 N 个实例。

It is worth^[72] stressing^[73] how important it is for clients that fail to acquire the majority of locks, to release the (partially) acquired locks ASAP,
值得强调的是，对于未能获取大部分锁的客户端来说，尽快释放（部分）获取的锁是有多重要，
so that there is no need to wait for key expiry in order for the lock to be acquired again (however if a network partition happens and the client is no longer able to communicate with the Redis instances, there is an availability penalty to pay as it waits for key expiration).
这样就不需要等待 Key 过期了再次获取锁（然而，如果发生网络分区并且客户端不再能够与 Redis 实例通信，则在等待 Key 过期时会产生可用性损失）。
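
A retry loop with a random delay might look like the following sketch. `acquire_with_retry`, its default delay range and the `try_acquire` callback are hypothetical; the callback is assumed to return `None` on failure and to have already released any partially acquired locks before returning.

```python
import random
import time


def acquire_with_retry(try_acquire, retries=3, min_delay_ms=50,
                       max_delay_ms=200, sleep=time.sleep):
    """Call try_acquire() up to `retries` times, pausing randomly in between."""
    for attempt in range(retries):
        result = try_acquire()
        if result is not None:
            return result
        if attempt < retries - 1:
            # A random pause desynchronizes clients competing for the lock.
            sleep(random.uniform(min_delay_ms, max_delay_ms) / 1000.0)
    return None


outcomes = iter([None, None, ("token", 9000)])  # fail twice, then succeed
pauses = []                                     # record delays instead of sleeping
lock = acquire_with_retry(lambda: next(outcomes), sleep=pauses.append)
print(lock, len(pauses))
```

Passing `sleep` as a parameter keeps the example fast to run; production code would normally leave the default `time.sleep` in place.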

Releasing the Lock

释放锁

Releasing the lock is simple, and can be performed whether or not the client believes it was able to successfully lock a given instance.
释放锁很简单，并且无论客户端是否认为它能够成功锁定给定实例都可以执行。

Safety Arguments

安全论点

Is the algorithm safe? Let's examine what happens in different scenarios.
算法安全吗？让我们看看不同场景下会发生什么。

To start let’s assume that a client is able to acquire the lock in the majority of instances.
首先，让我们假设客户端在大多数情况下都能获取锁。
All the instances will contain a key with the same time to live. However, the key was set at different times, so the keys will also expire at different times.
所有实例将包含一个有着相同存活时间的 Key 。但是， Key 是在不同时间设置的，因此 Key 也会在不同时间过期。
But if the first key was set at worst at time T1 (the time we sample before contacting the first server) and the last key was set at worst at time T2 (the time we obtained the reply from the last server), we are sure that the first key to expire in the set will exist for at least MIN_VALIDITY=TTL-(T2-T1)-CLOCK_DRIFT.
但如果第一个 Key 在 T1（我们在联系第一台服务器之前采样的时间）时刻被设置为最差，最后一个 Key 在 T2（我们从最后一个服务器获得回复的时间）时刻被设置为最差，那么我们确信集合中第一个过期的 Key 将至少存在 MIN_VALIDITY=TTL-(T2-T1)-CLOCK_DRIFT。
All the other keys will expire later, so we are sure that the keys will be simultaneously^[74] set for at least this time.
所有其他 Key 稍后都会过期，因此我们确信至少这一次将同时设置这些 Key 。
During the time that the majority of keys are set, another client will not be able to acquire the lock, since N/2+1 SET NX operations can’t succeed if N/2+1 keys already exist. So if a lock was acquired, it is not possible to re-acquire it at the same time (violating the mutual exclusion property).
在设置大多数 Key 期间，另一个客户端将无法获取锁，因为如果 N/2+1 个 Key 已经存在，则 N/2+1 SET NX 操作无法成功。因此，如果获取了锁，则不可能在同一时间重新获取它（违反了互斥属性）。
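
Both parts of this argument can be checked with a few lines of Python. The numbers in the example (a 10 second TTL, 120 ms spent contacting the servers, 10 ms of clock drift) are made up for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def min_validity_ms(ttl_ms, t1_ms, t2_ms, clock_drift_ms):
    """MIN_VALIDITY = TTL - (T2 - T1) - CLOCK_DRIFT, all in milliseconds."""
    return ttl_ms - (t2_ms - t1_ms) - clock_drift_ms


def majorities_always_overlap(n):
    """Check by brute force that any two majorities of n nodes share a node."""
    quorum = n // 2 + 1
    nodes = range(n)
    return all(
        set(a) & set(b)
        for a in combinations(nodes, quorum)
        for b in combinations(nodes, quorum)
    )


print(min_validity_ms(10_000, 0, 120, 10))  # 9870
print(majorities_always_overlap(5))         # True
```

Because every pair of majorities shares at least one instance, a second client's SET NX must fail on that shared instance while the first client's key is still there, so it cannot reach N/2+1 successes.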

However we want to also make sure that multiple clients trying to acquire the lock at the same time can’t simultaneously succeed.
然而，我们还想要确保多个客户端尝试在同一时间获取锁不能同时成功。

If a client locked the majority of instances using a time near, or greater, than the lock maximum validity time (the TTL we use for SET basically), it will consider the lock invalid and will unlock the instances,
如果客户端使用接近或大于锁最大有效时间（我们基本上用于 SET 的 TTL）的时间锁定了大多数实例，那么它将认为锁无效并解锁实例，
so we only need to consider the case where a client was able to lock the majority of instances in a time which is less than the validity time.
所以我们只需要考虑客户端能够在小于有效时间的时间内锁定大多数实例的情况。
In this case for the argument already expressed above, for MIN_VALIDITY no client should be able to re-acquire the lock.
在这种情况下，对于上面已经表达的论点，对于 MIN_VALIDITY，没有客户端应该能够重新获取锁。
So multiple clients will be able to lock N/2+1 instances at the same time (with "time" being the end of Step 2) only when the time to lock the majority was greater than the TTL time, making the lock invalid.
因此，只有当锁定多数实例的时间大于 TTL 时间时，多个客户端才能够在同一时间（“时间”是第 2 步的结束时间）锁定 N/2+1个实例，从而使锁定无效。

Liveness Arguments

活跃度论点

The system liveness is based on three main features:
系统活跃度基于3个主要特征：

The auto release of the lock (since keys expire): eventually keys are available again to be locked.
锁（由于 Key 过期）自动释放：最终 Key 可以再次被锁定。
The fact that clients, usually, will cooperate^[75] removing the locks when the lock was not acquired, or when the lock was acquired and the work terminated, making it likely that we don’t have to wait for keys to expire to re-acquire the lock.
实际上，当没有获取锁时，或者当获取锁并且工作终止时，客户端通常会配合删除锁，这使得我们不必等待 Key 过期来重新获取锁。
The fact that when a client needs to retry a lock, it waits a time which is comparably greater than the time needed to acquire the majority of locks, in order to probabilistically^[76] make split brain conditions during resource contention unlikely.
实际上，当客户端需要重试锁时，它等待的时间比获取大多数锁所需的时间要长，以便在概率上使资源争用期间不太可能出现脑裂情况。

However, we pay an availability penalty equal to TTL time on network partitions, so if there are continuous partitions, we can pay this penalty indefinitely.
然而，我们在网络分区上支付相当于 TTL 时间的可用性惩罚，因此若有连续分区，我们可以无限期地支付这笔罚款。
This happens every time a client acquires a lock and gets partitioned away before being able to remove the lock.
每当客户端获取锁并在能够删除锁之前被分区时，就会发生这种情况。

Basically if there are infinite continuous network partitions, the system may become not available for an infinite amount of time.
基本上，若有无限连续的网络分区，那么系统可能会在无限期内不可用（永不可用）。

Performance, Crash Recovery and fsync

性能，崩溃恢复及 fsync

Many users using Redis as a lock server need high performance in terms of both latency^[77] to acquire and release a lock, and number of acquire / release operations that it is possible to perform per second.
许多使用 Redis 作为锁服务的用户在获取和释放锁的延迟以及每秒可以执行的获取/释放操作的数量方面都需要高性能。
In order to meet this requirement, the strategy to talk with the N Redis servers to reduce latency is definitely multiplexing (putting the socket in non-blocking mode, sending all the commands, and reading all the replies later, assuming that the RTT between the client and each instance is similar).
为了满足需求，与N个 Redis 服务器通信以减少延迟的策略一定是多路复用（将套接字置于非阻塞模式，发送所有命令，然后读取所有命令，假设 Redis 服务器之间的 RTT 客户端和每个实例都是相似的）。
However there is another consideration around persistence if we want to target a crash-recovery system model.
然而，如果我们想要以崩溃恢复系统模型为目标，则还需要考虑持久性。

Basically to see the problem here, let’s assume we configure Redis without persistence at all. A client acquires the lock in 3 of 5 instances.
基本上为了看到这儿的问题，我们假设我们根本没有配置 Redis 持久性。客户端在 5 个实例中的 3 个实例中获取了锁。
One of the instances where the client was able to acquire the lock is restarted, at this point there are again 3 instances that we can lock for the same resource, and another client can lock it again, violating the safety property of exclusivity of lock.
客户端能够获取锁的实例之一被重新启动，此时我们又可以对同一资源锁定3个实例，并且另一个客户端可以再次锁定它，这违反了锁独占性的安全属性。

If we enable AOF persistence, things will improve quite a bit. For example we can upgrade a server by sending it a SHUTDOWN command and restarting it.
如果我们启用 AOF 持久化，事情将会改善很多。例如，我们可以通过向服务器发送 SHUTDOWN 命令并重新启动来升级服务器。
Because Redis expires are semantically implemented so that time still elapses when the server is off, all our requirements are fine.
因为 Redis 过期是语义实现的，所以当服务器关闭时，时间仍会流逝，所以我们的所有要求都很好。
However everything is fine as long as it is a clean shutdown. What about a power outage? If Redis is configured, as by default, to fsync on disk every second, it is possible that after a restart our key is missing.
但是，只要干净关闭，一切都很好。停电了怎么办（指非正常关机）？如果 Redis 默认情况下配置为每秒在磁盘上 fsync 一次，则重新启动后我们的 Key 可能会丢失。
In theory^[78], if we want to guarantee the lock safety in the face of any kind of instance restart, we need to enable fsync=always in the persistence settings. This will affect performance due to the additional sync overhead^[79].
理论上讲，如果我们想在任何类型的实例重启时保证锁的安全，我们需要在持久化设置中启用 fsync=always。由于额外的同步开销，这将影响性能。
However, things are better than they look at first glance. Basically, the algorithm's safety is retained^[80] as long as an instance that restarts after a crash no longer participates^[81] in any currently active lock.
然而，事情比乍一看要好。基本上，只要当实例在崩溃后重新启动，它不再参与任何当前活动的锁，算法的安全性就会得到保留。
This means that the set of currently active locks when the instance restarts were all obtained by locking instances other than the one which is rejoining^[82] the system.
这意味着当实例重新启动时当前活动的一组锁都是通过锁定除重新加入系统之外的实例而获得的。

To guarantee this we just need to make an instance, after a crash, unavailable for at least a bit more than the max TTL we use.
为了保证这一点，我们只需要让一个实例在崩溃后不可用的时间至少比我们使用的最大 TTL 长一点。
This is the time needed for all the keys about the locks that existed when the instance crashed to become invalid and be automatically released.
这是当实例崩溃时存在的锁的所有 Key 失效并自动释放所需的时间。
Using delayed restarts it is basically possible to achieve safety even without any kind of Redis persistence available, however note that this may translate into an availability penalty.
使用延迟重启，即使没有任何可用的 Redis 持久性，基本上也可以实现安全性，但请注意，这可能会转化为可用性损失。
For example if a majority of instances crash, the system will become globally unavailable for TTL (here globally means that no resource at all will be lockable during this time).
例如如果大多数实例崩溃了，系统将变得全局不可用于 TTL（这里的全局指的是在这段时间内没有任何资源是可锁定的）。
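
The restart problem and the delayed-restart remedy described in this section can be replayed with plain dictionaries standing in for five instances without persistence. `try_lock` and `simulate` are hypothetical names, key expiry is ignored, and everything is assumed to happen within one TTL.

```python
def try_lock(instances, key, token):
    """Run SET NX on each instance; return how many accepted the key."""
    accepted = 0
    for store in instances:
        if key not in store:
            store[key] = token
            accepted += 1
    return accepted


def simulate(restart_immediately):
    n = 5
    quorum = n // 2 + 1
    instances = [dict() for _ in range(n)]

    # Client A reaches only instances 0, 1 and 2 and locks all three.
    a_holds = try_lock(instances[:3], "resource", "token-A") >= quorum

    # Instance 2 crashes with no persistence, so its key is lost.
    instances[2].clear()

    if restart_immediately:
        reachable = instances                      # rejoins at once, empty
    else:
        reachable = instances[:2] + instances[3:]  # kept out of service
    b_holds = try_lock(reachable, "resource", "token-B") >= quorum
    return a_holds, b_holds


print(simulate(restart_immediately=True))   # (True, True): safety violated
print(simulate(restart_immediately=False))  # (True, False): B cannot reach a majority
```

Keeping the crashed instance out of service until the old locks have expired is what prevents the second client from assembling a majority.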

Making the algorithm more reliable: Extending the lock

使算法更可靠：延伸锁

If the work performed by clients consists of small steps, it is possible to use smaller lock validity times by default, and extend the algorithm implementing a lock extension mechanism.
如果客户端执行的工作由小步骤组成，则默认情况下可以使用较小的锁有效时间，并扩展实现锁扩展机制的算法。
Basically the client, if in the middle of the computation while the lock validity is approaching a low value, may extend the lock by sending a Lua script to all the instances that extends the TTL of the key if the key exists and its value is still the random value the client assigned when the lock was acquired.
基本上，如果在计算过程中，当锁有效性接近较低值时，客户端可以通过向所有扩展该 Key 的 TTL 的实例发送 Lua 脚本（如果该 Key 存在并且其值仍然是）来扩展锁。获取锁时客户端分配的随机值。

The client should only consider the lock re-acquired if it was able to extend the lock into the majority of instances, and within the validity time (basically the algorithm to use is very similar to the one used when acquiring the lock).
仅当客户端能够将锁扩展到大多数实例并且在有效时间内（基本上使用的算法与获取锁时使用的算法非常相似）时，客户端才应该考虑重新获取锁。

However this does not technically change the algorithm, so the maximum number of lock reacquisition^[83] attempts^[84] should be limited, otherwise one of the liveness properties is violated.
然而，这在技术上并没有改变算法，因此应该限制锁重新获取尝试的最大次数，否则就会违反活性属性之一。
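
The source does not give the extension script, so the following is only a sketch of the check it describes: extend the TTL if the key exists and still holds the client's random value, and count the extension as successful only on a majority. `extend_if_owner`, `extend_lock` and the dictionary layout are hypothetical, time is passed in explicitly as `now_ms`, and the elapsed-time check used during acquisition is left out.

```python
def extend_if_owner(store, key, token, new_ttl_ms, now_ms):
    """Extend the key's TTL only if it exists and still holds `token`."""
    entry = store.get(key)
    if entry is None or entry["expires_at_ms"] <= now_ms:
        return False  # key missing or already expired
    if entry["value"] != token:
        return False  # the lock now belongs to another client
    entry["expires_at_ms"] = now_ms + new_ttl_ms
    return True


def extend_lock(instances, key, token, new_ttl_ms, now_ms):
    """The lock counts as extended only if a majority extended it."""
    quorum = len(instances) // 2 + 1
    extended = sum(
        extend_if_owner(store, key, token, new_ttl_ms, now_ms)
        for store in instances
    )
    return extended >= quorum


def held(token, expires_at_ms):
    return {"resource": {"value": token, "expires_at_ms": expires_at_ms}}


instances = [held("A", 1000), held("A", 1000), held("A", 1000), {}, {}]
print(extend_lock(instances, "resource", "A", 5000, now_ms=800))
```

A caller would also cap the number of extensions per lock, since unlimited extension would let one client hold the resource forever.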

Reference

https://redis.io/docs/latest/develop/use/patterns/distributed-locks/

distribute: v. 分发，分配，分布 ↩︎
pattern: n. 方式，形式；模式；图案；图样 ↩︎
primitive: 原始的，早期的；简陋的；在该文章中指的是 “基本工具（或概念）” ↩︎
mutually: adv. 相互地； exclusive: adj. 独有的，独占的； mutually exclusive: 相互排斥，互斥 ↩︎
implement: v. 实施，实现； n. 工具，器具 ↩︎
approach: n. 方式；方法；商谈，接洽； v. 靠近，接近 ↩︎
guarantee: v.保证，保修，确保； n. 保证，担保 ↩︎
achieve: v. 完成；达到；实现 ↩︎
slightly: adv. 稍微地，略微地 ↩︎
complex: adj. 复杂的;难懂的； n. 建筑群 ↩︎
canonical: adj. 最佳的，经典的，权威的 ↩︎
algorithm: n. 算法 ↩︎
vanilla: adj. 普通的，基本的； n. 香草精（调味料） ↩︎
analyze: v. 分析（实际为 analyse） ↩︎
feedback: n. 反馈信息，反馈意见 ↩︎
alternative: adj. （计划或方法）可替代的；可供选择的;另类的 ↩︎
liveness: 没有这个单词，在这里指应用程序的活跃性 ↩︎
model: v. 建模，塑造； n. 模型；模特儿 ↩︎
minimum: adv. 至少； adj. 最小的，最低限度的； n. 最小值，最低限度 ↩︎
effective: adj. 有效的；实际上的；（法律法规）生效的 ↩︎
eventually: adv. 最终，终于 ↩︎
possible: adj. 可能的 ↩︎
acquire: v. 取得，获得;购得;学到 ↩︎
crash: v. 瘫痪，死机；崩溃，破产；撞坏，坠毁 ↩︎
partition: v. 分隔，隔开，分区；分裂； n. 隔墙，隔板；分裂 ↩︎
fault: n. 故障；缺点；过失； v. 挑剔，指责； tolerance: n. 宽容，忍受，容忍；忍耐力；公差； fault tolerance: 容错 ↩︎
release: v. 释放，放走；释出；发泄；松开；公开，发布，发行； n. 释放；公开，发布；排放；解脱 ↩︎
improve: v. 改进，改善 ↩︎
expires: v. 到期，过期，结束；亡故，逝世 ↩︎
superficially: adv. 表面地，肤浅地；当放在句首时译为“从表面上看” ↩︎
architecture: n. 架构，体系结构；建筑学；建筑风格 ↩︎
unfortunately: adv. 不幸地，倒霉地 ↩︎
replication: n. 复现，重复，复制；再造，重生 ↩︎
asynchronous: adj. 不同时存在的，不同时发生的，异步的 ↩︎
race: n. 竞争；争夺；抢先; 赛跑，速度竞赛 ↩︎
transmit: v. 发射，播送；传递，传播 ↩︎
violation: n. 违反，违背，违犯（尤指法律、协议、原则等） ↩︎
circumstance: n. 条件，情况；情形，形势; 无法控制的因素；客观环境；命运 ↩︎
overcome: v. 克服；战胜；攻克；解决; 使受不了；使无法行动（或思考） ↩︎
acceptable: adj. 令人满意的；可以接受的；可容许的；赞同的; 勉强可以的，差强人意的 ↩︎
foundation: n. 基础；创建;建立; 基金会；粉底霜 ↩︎
accomplished: v. 完成；实现；达到；做到（accomplish 的过去式）; adj. 熟练的；有造诣的；有才艺的 ↩︎
avoid: v. 避开，逃避；避免，防止 ↩︎
expire: v. 到期，期满;结束；逝世，亡故 ↩︎
still: adv. 还，还是; 仍然，依旧 ↩︎
assume: v. 假定，假设; 假冒 ↩︎
cheaper: adj. 更便宜的，更廉价的（cheap 的比较级） ↩︎
pick: n. 挑选;选择；牙签; v. 挑选;选择 ↩︎
pseudo-: prefix. 伪，假 ↩︎
precision: n.准确（性），精确（性），精密（度） ↩︎
violate: v.违反，违背，违犯（尤指法律、协议、原则等） ↩︎
implicit: adj. 不明言的，含蓄的; 无疑问的；无保留的 ↩︎
coordination: n. 协调，调节 ↩︎
implicit coordination system: here, any mechanism (such as replication) through which the Redis nodes would share state with one another; Redlock assumes that the N masters use none. ↩︎
take for granted: 理所当然的认为...；...是理所当然的 ↩︎
sequentially: adv. 按特定顺序地;连续地 ↩︎
prevent: v. 阻止，妨碍；预防 ↩︎
remain: v. 停留，留下;保持不变，仍然是 ↩︎
minus: prep. 减（去）; n. 缺点 ↩︎
elapse: v. （时间）流逝，过去 ↩︎
rely on: 依赖；依靠；依仗 ↩︎
assumption: n. 假定;假设;臆断 ↩︎
approximately: adv.约, 大约, 大致, 左右 ↩︎
margin: n.余量, 幅度, 边, 差额 ↩︎
closely: adv. 相似地，紧密地; resemble: v. 像；看起来像；与…相似; closely resemble: 非常相似 ↩︎
drift: n. 漂移； v. 漂移，漂流 ↩︎
terminate: v. （使…）结束，停止，终止 ↩︎
compensate: v. 赔偿;补偿; 弥补 ↩︎
contain: v.包含, 含有, 含, 遏制 ↩︎
efficient: adj. 效率高的；有能力的；有效的；生效的 ↩︎
mechanism: n. 体制，结构方式; 机械部件；机械装置 ↩︎
worth: n. 价格;价值; 重要性; adj. 值得；值…钱的 ↩︎
stressing: v. 强调；紧张；焦虑 ↩︎
simultaneously: adv.同时, 并, 齐 ↩︎
cooperate: v.合作, 协作, 搭档 ↩︎
probabilistically: adv. 概率上 ↩︎
latency: n. delay; in this article, the time it takes to acquire or release a lock (other sense: dormancy) ↩︎
theory: 理论;学说;意见; In theory：理论上讲 ↩︎
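overhead: n. extra cost; in this article, the additional time spent on syncing to disk (other senses: running costs; overhead projector slide) ↩︎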
retain: v. 保持;保留;保有 ↩︎
participate: v. 参与，参加 ↩︎
rejoin: v. 重返;返回;与…再会合; 回答，反驳 ↩︎
acquisition: n.获得（re + acquisition = 重新获得） ↩︎
attempt: v. 努力，尝试 ↩︎

posted on 2024-10-14 00:04 hao7Chen
