Storm Stream grouping

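In Storm, developers can specify which task(s) of a downstream bolt should process the tuples emitted by an upstream spout/bolt. In Storm this is called grouping the stream, i.e. stream grouping. The main grouping methods are the following six: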

  • Shuffle Grouping or None Grouping
  • Fields Grouping
  • All Grouping
  • Global Grouping
  • LocalOrShuffle Grouping
  • Direct Grouping

1. Shuffle Grouping or None Grouping

1.1 Definition

    Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.

    None grouping: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).

Source: official Storm documentation

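Shuffle grouping dispatches the tuples of a stream randomly, and each task of the downstream bolt receives an equal share of the upstream tuples.

(Figure 1)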
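To make "random but equal" concrete, here is a small plain-Java simulation. It is only a sketch under our own assumptions: the class ShuffleGroupingSketch and its methods are hypothetical, not Storm classes, and Storm's real implementation may differ in detail. The idea shown is to shuffle the list of target task ids, hand them out in round-robin order, and reshuffle after every full pass, so that each task receives exactly one tuple per pass. In a real topology you would simply subscribe with shuffleGrouping(...) on the declarer returned by TopologyBuilder.setBolt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical simulation of shuffle grouping (not Storm's source code).
class ShuffleGroupingSketch {
    private final List<Integer> order; // target task ids in the current shuffled order
    private final Random random;
    private int next = 0;              // position of the next task to hand out

    ShuffleGroupingSketch(List<Integer> targetTasks, long seed) {
        this.order = new ArrayList<>(targetTasks);
        this.random = new Random(seed);
        Collections.shuffle(order, random);
    }

    // Picks the task that receives the next tuple.
    int chooseTask() {
        if (next == order.size()) {
            // Every task got one tuple in this pass: reshuffle and start a new pass.
            Collections.shuffle(order, random);
            next = 0;
        }
        return order.get(next++);
    }

    // Routes 'tuples' tuples through the grouping and counts arrivals per task.
    static Map<Integer, Integer> countPerTask(List<Integer> targetTasks, int tuples, long seed) {
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(targetTasks, seed);
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < tuples; i++) {
            counts.merge(grouping.chooseTask(), 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, ShuffleGroupingSketch.countPerTask(List.of(1, 2, 3, 4), 400, 42L) gives each of the four tasks a count of 100, whatever seed is used; only the order within each pass is random.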

2. Fields Grouping

2.1 Definition

    The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.

Source: official Storm documentation

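Fields grouping partitions the stream by the specified field(s). For example, when grouping by user-id, tuples with the same user-id always go to the same task of the downstream bolt, while tuples with different user-ids may go to different tasks (two different user-ids can also end up on the same task).

(Figure 2)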
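The essential property of fields grouping is that the target task is a deterministic function of the grouping field values. A common way to achieve that is to hash the selected values and take the result modulo the number of target tasks; the sketch below shows this hash-and-modulo approach. The class FieldsGroupingSketch is hypothetical and the exact hash Storm uses is not reproduced here. In a real topology the subscription would be fieldsGrouping("spout", new Fields("user-id")), where "spout" stands for whatever the upstream component id is.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of fields grouping: hash the grouping field values and
// map the hash onto one of the target tasks. Not Storm's source code.
class FieldsGroupingSketch {
    static int chooseTask(List<Integer> targetTasks, Object... groupingValues) {
        // Equal field values always produce the same hash...
        int hash = Objects.hash(groupingValues);
        // ...and therefore the same index; floorMod keeps it non-negative.
        int index = Math.floorMod(hash, targetTasks.size());
        return targetTasks.get(index);
    }
}
```

Because the mapping depends only on the values, chooseTask(tasks, "user-42") returns the same task on every call. Two different user-ids can still hash to the same index, which is why the definition only says they may go to different tasks. The mapping also depends on the number of tasks, so changing the task count changes which task owns which key.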

3. All Grouping

3.1 Definition

    The stream is replicated across all the bolt's tasks. Use this grouping with care.

Source: official Storm documentation

Broadcast: every tuple is delivered to all tasks of the downstream bolt.

(Figure 3)
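The plain-Java sketch below (hypothetical class AllGroupingSketch, not a Storm API) shows why the official documentation says to use this grouping with care: each emitted tuple is copied to every target task, so a bolt with N tasks receives N times as many tuples as were emitted. In a real topology the subscription would be allGrouping(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of all grouping (broadcast); not Storm's source code.
class AllGroupingSketch {
    // Returns, for each target task, the tuples it received in emit order.
    static Map<Integer, List<String>> broadcast(List<Integer> targetTasks, List<String> tuples) {
        Map<Integer, List<String>> received = new TreeMap<>();
        for (int task : targetTasks) {
            received.put(task, new ArrayList<>());
        }
        for (String tuple : tuples) {
            // Every tuple is delivered to every task.
            for (int task : targetTasks) {
                received.get(task).add(tuple);
            }
        }
        return received;
    }

    // Total deliveries = number of tasks x number of tuples.
    static int totalDeliveries(Map<Integer, List<String>> received) {
        int total = 0;
        for (List<String> inbox : received.values()) {
            total += inbox.size();
        }
        return total;
    }
}
```

With 3 tasks and 2 emitted tuples, totalDeliveries reports 6: the downstream traffic grows linearly with the bolt's parallelism.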

4. Global Grouping

4.1 Definition

    The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.

Source: official Storm documentation

Global grouping: all tuples are sent to a single task of the downstream bolt, specifically the task with the lowest id.

(Figure 4)
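As a sketch (hypothetical class GlobalGroupingSketch, not Storm's code), the routing rule reduces to picking the minimum task id, which is the same task for every tuple. The other tasks of the downstream bolt receive nothing from this stream. In a real topology the subscription would be globalGrouping(...).

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of global grouping; not Storm's source code.
class GlobalGroupingSketch {
    // The tuple parameter is deliberately ignored: the destination never
    // depends on tuple content, it is always the task with the lowest id.
    static int chooseTask(List<Integer> targetTasks, Object tuple) {
        return Collections.min(targetTasks);
    }
}
```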

5. LocalOrShuffle Grouping

5.1 Definition

    If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping.

Source: official Storm documentation

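If one or more tasks of the downstream bolt run in the same worker process as the emitting spout/bolt task, the tuples emitted by that task are shuffled only among those in-process tasks; otherwise, this grouping behaves like an ordinary shuffle grouping. This keeps tuples inside the worker process instead of transferring them to another worker.

(Figure 5)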
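The sketch below models that decision with a hypothetical map from downstream task id to the worker it runs in (worker names such as "w1" are made up for illustration): keep only the tasks in the sender's worker, and fall back to all tasks when there are none. It is not Storm's implementation, and for simplicity it picks a candidate uniformly at random. In a real topology the subscription would be localOrShuffleGrouping(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of local-or-shuffle grouping; not Storm's source code.
class LocalOrShuffleGroupingSketch {
    // taskToWorker: downstream task id -> worker process it runs in (hypothetical names).
    static List<Integer> candidates(Map<Integer, String> taskToWorker, String senderWorker) {
        List<Integer> local = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : taskToWorker.entrySet()) {
            if (entry.getValue().equals(senderWorker)) {
                local.add(entry.getKey());
            }
        }
        // No downstream task in the sender's worker: behave like plain shuffle.
        return local.isEmpty() ? new ArrayList<>(taskToWorker.keySet()) : local;
    }

    // Picks one candidate at random for the next tuple.
    static int chooseTask(Map<Integer, String> taskToWorker, String senderWorker, Random random) {
        List<Integer> pool = candidates(taskToWorker, senderWorker);
        return pool.get(random.nextInt(pool.size()));
    }
}
```

With tasks 1 and 3 in worker "w1" and task 2 in "w2", a sender in "w1" only ever chooses 1 or 3, a sender in "w2" always chooses 2, and a sender in a worker with no downstream task may choose any of the three.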

 

6. Direct Grouping

6.1 Definition

    This is a special kind of grouping. A stream grouped this way means that the producer of the tuple decides which task of the consumer will receive this tuple. Direct groupings can only be declared on streams that have been declared as direct streams. Tuples emitted to a direct stream must be emitted using one of the emitDirect methods. A bolt can get the task ids of its consumers by either using the provided TopologyContext or by keeping track of the output of the emit method in OutputCollector (which returns the task ids that the tuple was sent to).

Source: official Storm documentation

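Direct grouping: with this grouping, the producer of a tuple decides which task of the consumer will process it. Only streams declared as direct streams can use this grouping, and tuples on such a stream must be emitted with one of the emitDirect methods. The producer can obtain the task ids of its consumers through TopologyContext, or from the return value of OutputCollector.emit, which returns the ids of the tasks the tuple was sent to.

(Figure 6)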
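In Storm the pieces are roughly: the producer declares the stream as direct in declareOutputFields (for example with declareStream(streamId, true, fields) on its OutputFieldsDeclarer), the consumer subscribes with directGrouping(...), and the producer sends each tuple with emitDirect(taskId, ...), taking candidate ids from, e.g., TopologyContext.getComponentTasks(componentId). Because that needs a running topology, the plain-Java simulation below (hypothetical class DirectGroupingSketch) only models the routing contract: the producer names the receiving task. Rejecting a task id that does not consume the stream is a choice made for this sketch, not a statement about Storm's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of a direct stream; not Storm's source code.
class DirectGroupingSketch {
    private final Map<Integer, List<String>> inboxes = new TreeMap<>();

    DirectGroupingSketch(List<Integer> consumerTasks) {
        for (int task : consumerTasks) {
            inboxes.put(task, new ArrayList<>());
        }
    }

    // Stand-in for looking up the consumer's task ids (cf. getComponentTasks).
    List<Integer> consumerTasks() {
        return new ArrayList<>(inboxes.keySet());
    }

    // Mirrors the idea of emitDirect: the caller, not the grouping, picks the task.
    void emitDirect(int taskId, String tuple) {
        List<String> inbox = inboxes.get(taskId);
        if (inbox == null) {
            throw new IllegalArgumentException("task " + taskId + " does not consume this stream");
        }
        inbox.add(tuple);
    }

    List<String> inbox(int taskId) {
        return inboxes.get(taskId);
    }
}
```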

 

 

 

posted @ 2018-10-14 15:38  ENOTS