Redis Stream Commands 命令学习-3 XREADGROUP

Redis Streams tutorial | Redis

xread命令有几个问题：

1、消息丢失的问题

2、消息多人重复读取

3、消息是否读取的状态标记

针对上面的问题，引出另一个概念，“消费者组"(consumber group),可以这么理解，stream 是水果生产基地，组就是中间批发商，消费者就是消费者了。。。苹果就是entry

每个苹果都独一无二，卖了就再也没有有了。消费者买到手里，如果一时吃不掉放在冰箱里。

以下为官方资料，

Consumer groups

When the task at hand is to consume the same stream from different clients, then XREAD already offers a way to fan-out to N clients, potentially also using replicas in order to provide more read scalability. However in certain problems what we want to do is not to provide the same stream of messages to many clients, but to provide a different subset of messages from the same stream to many clients. An obvious case where this is useful is that of messages which are slow to process: the ability to have N different workers that will receive different parts of the stream allows us to scale message processing, by routing different messages to different workers that are ready to do more work.

In practical terms, if we imagine having three consumers C1, C2, C3, and a stream that contains the messages 1, 2, 3, 4, 5, 6, 7 then what we want is to serve the messages according to the following diagram:

1 -> C1
2 -> C2
3 -> C3
4 -> C1
5 -> C2
6 -> C3
7 -> C1

In order to achieve this, Redis uses a concept called consumer groups. It is very important to understand that Redis consumer groups have nothing to do, from an implementation standpoint, with Kafka (TM) consumer groups. Yet they are similar in functionality, so I decided to keep Kafka's (TM) terminology, as it originally popularized this idea.

A consumer group is like a pseudo consumer that gets data from a stream, and actually serves multiple consumers, providing certain guarantees:

Each message is served to a different consumer so that it is not possible that the same message will be delivered to multiple consumers.
Consumers are identified, within a consumer group, by a name, which is a case-sensitive string that the clients implementing consumers must choose. This means that even after a disconnect, the stream consumer group retains all the state, since the client will claim again to be the same consumer. However, this also means that it is up to the client to provide a unique identifier.
Each consumer group has the concept of the first ID never consumed so that, when a consumer asks for new messages, it can provide just messages that were not previously delivered.
Consuming a message, however, requires an explicit acknowledgment using a specific command. Redis interprets the acknowledgment as: this message was correctly processed so it can be evicted from the consumer group.
A consumer group tracks all the messages that are currently pending, that is, messages that were delivered to some consumer of the consumer group, but are yet to be acknowledged as processed. Thanks to this feature, when accessing the message history of a stream, each consumer will only see messages that were delivered to it.

In a way, a consumer group can be imagined as some amount of state about a stream:

+----------------------------------------+
| consumer_group_name: mygroup           |
| consumer_group_stream: somekey         |
| last_delivered_id: 1292309234234-92    |
|                                        |
| consumers:                             |
|    "consumer-1" with pending messages  |
|       1292309234234-4                  |
|       1292309234232-8                  |
|    "consumer-42" with pending messages |
|       ... (and so forth)               |
+----------------------------------------+

If you see this from this point of view, it is very simple to understand what a consumer group can do, how it is able to just provide consumers with their history of pending messages, and how consumers asking for new messages will just be served with message IDs greater than last_delivered_id. At the same time, if you look at the consumer group as an auxiliary data structure for Redis streams, it is obvious that a single stream can have multiple consumer groups, that have a different set of consumers. Actually, it is even possible for the same stream to have clients reading without consumer groups via XREAD, and clients reading via XREADGROUP in different consumer groups.

Now it's time to zoom in to see the fundamental consumer group commands. They are the following:

XGROUP is used in order to create, destroy and manage consumer groups.
XREADGROUP is used to read from a stream via a consumer group.
XACK is the command that allows a consumer to mark a pending message as correctly processed.

Creating a consumer group

Assuming I have a key mystream of type stream already existing, in order to create a consumer group I just need to do the following:

> XGROUP CREATE mystream mygroup $
OK

As you can see in the command above when creating the consumer group we have to specify an ID, which in the example is just $. This is needed because the consumer group, among the other states, must have an idea about what message to serve next at the first consumer connecting, that is, what was the last message ID when the group was just created. If we provide $ as we did, then only new messages arriving in the stream from now on will be provided to the consumers in the group. If we specify 0 instead the consumer group will consume all the messages in the stream history to start with. Of course, you can specify any other valid ID. What you know is that the consumer group will start delivering messages that are greater than the ID you specify. Because $ means the current greatest ID in the stream, specifying $ will have the effect of consuming only new messages.

XGROUP CREATE also supports creating the stream automatically, if it doesn't exist, using the optional MKSTREAM subcommand as the last argument:

> XGROUP CREATE newstream mygroup $ MKSTREAM
OK

Now that the consumer group is created we can immediately try to read messages via the consumer group using the XREADGROUP command. We'll read from consumers, that we will call Alice and Bob, to see how the system will return different messages to Alice or Bob.

XREADGROUP is very similar to XREAD and provides the same BLOCK option, otherwise it is a synchronous command. However there is a mandatory option that must be always specified, which is GROUP and has two arguments: the name of the consumer group, and the name of the consumer that is attempting to read. The option COUNT is also supported and is identical to the one in XREAD.

Before reading from the stream, let's put some messages inside:

> XADD mystream * message apple
1526569495631-0
> XADD mystream * message orange
1526569498055-0
> XADD mystream * message strawberry
1526569506935-0
> XADD mystream * message apricot
1526569535168-0
> XADD mystream * message banana
1526569544280-0

Note: here message is the field name, and the fruit is the associated value, remember that stream items are small dictionaries.

It is time to try reading something using the consumer group:

> XREADGROUP GROUP mygroup Alice COUNT 1 STREAMS mystream >
1) 1) "mystream"
   2) 1) 1) 1526569495631-0
         2) 1) "message"
            2) "apple"

XREADGROUP replies are just like XREAD replies. Note however the GROUP <group-name> <consumer-name> provided above. It states that I want to read from the stream using the consumer group mygroup and I'm the consumer Alice. Every time a consumer performs an operation with a consumer group, it must specify its name, uniquely identifying this consumer inside the group.

There is another very important detail in the command line above, after the mandatory STREAMS option the ID requested for the key mystream is the special ID >. This special ID is only valid in the context of consumer groups, and it means: messages never delivered to other consumers so far.

This is almost always what you want, however it is also possible to specify a real ID, such as 0 or any other valid ID, in this case, however, what happens is that we request from XREADGROUP to just provide us with the history of pending messages, and in such case, will never see new messages in the group. So basically XREADGROUP has the following behavior based on the ID we specify:

If the ID is the special ID > then the command will return only new messages never delivered to other consumers so far, and as a side effect, will update the consumer group's last ID.
If the ID is any other valid numerical ID, then the command will let us access our history of pending messages. That is, the set of messages that were delivered to this specified consumer (identified by the provided name), and never acknowledged so far with XACK.

We can test this behavior immediately specifying an ID of 0, without any COUNT option: we'll just see the only pending message, that is, the one about apples:

> XREADGROUP GROUP mygroup Alice STREAMS mystream 0
1) 1) "mystream"
   2) 1) 1) 1526569495631-0
         2) 1) "message"
            2) "apple"

However, if we acknowledge the message as processed, it will no longer be part of the pending messages history, so the system will no longer report anything:

> XACK mystream mygroup 1526569495631-0
(integer) 1
> XREADGROUP GROUP mygroup Alice STREAMS mystream 0
1) 1) "mystream"
   2) (empty list or set)

Don't worry if you yet don't know how XACK works, the idea is just that processed messages are no longer part of the history that we can access.

Now it's Bob's turn to read something:

> XREADGROUP GROUP mygroup Bob COUNT 2 STREAMS mystream >
1) 1) "mystream"
   2) 1) 1) 1526569498055-0
         2) 1) "message"
            2) "orange"
      2) 1) 1526569506935-0
         2) 1) "message"
            2) "strawberry"

Bob asked for a maximum of two messages and is reading via the same group mygroup. So what happens is that Redis reports just new messages. As you can see the "apple" message is not delivered, since it was already delivered to Alice, so Bob gets orange and strawberry, and so forth.

This way Alice, Bob, and any other consumer in the group, are able to read different messages from the same stream, to read their history of yet to process messages, or to mark messages as processed. This allows creating different topologies and semantics for consuming messages from a stream.

There are a few things to keep in mind:

Consumers are auto-created the first time they are mentioned, no need for explicit creation.
Even with XREADGROUP you can read from multiple keys at the same time, however for this to work, you need to create a consumer group with the same name in every stream. This is not a common need, but it is worth mentioning that the feature is technically available.
XREADGROUP is a write command because even if it reads from the stream, the consumer group is modified as a side effect of reading, so it can only be called on master instances.

XREADGROUP

The XREADGROUP command is a special version of the XREAD command with support for consumer groups. Probably you will have to understand the XREAD command before reading this page will makes sense.

Moreover, if you are new to streams, we recommend to read our introduction to Redis Streams. Make sure to understand the concept of consumer group in the introduction so that following how this command works will be simpler.

Consumer groups in 30 seconds

The difference between this command and the vanilla XREAD is that this one supports consumer groups.

Without consumer groups, just using XREAD, all the clients are served with all the entries arriving in a stream. Instead using consumer groups with XREADGROUP, it is possible to create groups of clients that consume different parts of the messages arriving in a given stream. If, for instance, the stream gets the new entries A, B, and C and there are two consumers reading via a consumer group, one client will get, for instance, the messages A and C, and the other the message B, and so forth.

Within a consumer group, a given consumer (that is, just a client consuming messages from the stream), has to identify with a unique consumer name. Which is just a string.

One of the guarantees of consumer groups is that a given consumer can only see the history of messages that were delivered to it, so a message has just a single owner. However there is a special feature called message claiming that allows other consumers to claim messages in case there is a non recoverable failure of some consumer. In order to implement such semantics, consumer groups require explicit acknowledgment of the messages successfully processed by the consumer, via the XACK command. This is needed because the stream will track, for each consumer group, who is processing what message.

This is how to understand if you want to use a consumer group or not:

If you have a stream and multiple clients, and you want all the clients to get all the messages, you do not need a consumer group.
If you have a stream and multiple clients, and you want the stream to be partitioned or sharded across your clients, so that each client will get a sub set of the messages arriving in a stream, you need a consumer group.

Differences between XREAD and XREADGROUP

From the point of view of the syntax, the commands are almost the same, however XREADGROUP requires a special and mandatory option:

GROUP <group-name> <consumer-name>

The group name is just the name of a consumer group associated to the stream. The group is created using the XGROUP command. The consumer name is the string that is used by the client to identify itself inside the group. The consumer is auto created inside the consumer group the first time it is saw. Different clients should select a different consumer name.

When you read with XREADGROUP, the server will remember that a given message was delivered to you: the message will be stored inside the consumer group in what is called a Pending Entries List (PEL), that is a list of message IDs delivered but not yet acknowledged.

The client will have to acknowledge the message processing using XACK in order for the pending entry to be removed from the PEL. The PEL can be inspected using the XPENDING command.

The NOACK subcommand can be used to avoid adding the message to the PEL in cases where reliability is not a requirement and the occasional message loss is acceptable. This is equivalent to acknowledging the message when it is read.

The ID to specify in the STREAMS option when using XREADGROUP can be one of the following two:

The special > ID, which means that the consumer want to receive only messages that were never delivered to any other consumer. It just means, give me new messages.
Any other ID, that is, 0 or any other valid ID or incomplete ID (just the millisecond time part), will have the effect of returning entries that are pending for the consumer sending the command with IDs greater than the one provided. So basically if the ID is not >, then the command will just let the client access its pending entries: messages delivered to it, but not yet acknowledged. Note that in this case, both BLOCK and NOACK are ignored.

Like XREAD the XREADGROUP command can be used in a blocking way. There are no differences in this regard.

What happens when a message is delivered to a consumer?

Two things:

If the message was never delivered to anyone, that is, if we are talking about a new message, then a PEL (Pending Entries List) is created.
If instead the message was already delivered to this consumer, and it is just re-fetching the same message again, then the last delivery counter is updated to the current time, and the number of deliveries is incremented by one. You can access those message properties using the XPENDING command.

Usage example

Normally you use the command like that in order to get new messages and process them. In pseudo-code:

WHILE true
    entries = XREADGROUP GROUP $GroupName $ConsumerName BLOCK 2000 COUNT 10 STREAMS mystream >
    if entries == nil
        puts "Timeout... try again"
        CONTINUE
    end

    FOREACH entries AS stream_entries
        FOREACH stream_entries as message
            process_message(message.id,message.fields)

            # ACK the message as processed
            XACK mystream $GroupName message.id
        END
    END
END

In this way the example consumer code will fetch only new messages, process them, and acknowledge them via XACK. However the example code above is not complete, because it does not handle recovering after a crash. What will happen if we crash in the middle of processing messages, is that our messages will remain in the pending entries list, so we can access our history by giving XREADGROUP initially an ID of 0, and performing the same loop. Once providing an ID of 0 the reply is an empty set of messages, we know that we processed and acknowledged all the pending messages: we can start to use > as ID, in order to get the new messages and rejoin the consumers that are processing new things.

To see how the command actually replies, please check the XREAD command page.

What happens when a pending message is deleted?

Entries may be deleted from the stream due to trimming or explicit calls to XDEL at any time. By design, Redis doesn't prevent the deletion of entries that are present in the stream's PELs. When this happens, the PELs retain the deleted entries' IDs, but the actual entry payload is no longer available. Therefore, when reading such PEL entries, Redis will return a null value in place of their respective data.

Example:

> XADD mystream 1 myfield mydata
"1-0"
> XGROUP CREATE mystream mygroup 0
OK
> XREADGROUP GROUP mygroup myconsumer STREAMS mystream >
1) 1) "mystream"
   2) 1) 1) "1-0"
         2) 1) "myfield"
            2) "mydata"
> XDEL mystream 1-0
(integer) 1
> XREADGROUP GROUP mygroup myconsumer STREAMS mystream 0
1) 1) "mystream"
   2) 1) 1) "1-0"
         2) (nil)

Syntax

XREADGROUP GROUP group consumer [COUNT count] [BLOCK milliseconds]
  [NOACK] STREAMS key [key ...] id [id ...]

posted on 2023-03-13 13:08 hztech 阅读(1089) 评论(0) 编辑收藏举报

刷新页面返回顶部