Getting Started with Kafka in PHP

  1. Preparation

    Before reading on, you should be comfortable working with Docker.

    My environment is as follows:

    CentOS Linux release 8.2.2004 (Core)

    Docker version 19.03.13

    docker-compose version 1.27.4

    Zookeeper version 3.6.2

    Kafka version 2.13-2.6.0

    PHP version 7.4.12

    To make it easier for you to follow along (and for my own later review), I have recorded some of the commands below.

    Install a specific version of Docker:

    yum install -y docker-ce-19.03.13 docker-ce-cli-19.03.13 containerd.io
    

    Change Docker's default storage path:

    vim /usr/lib/systemd/system/docker.service
    # ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --graph /data/docker
    # Append --graph /data/docker to the end of the ExecStart line; here I changed the default storage path to /data/docker.
    # (On newer Docker releases --data-root is the preferred replacement for the deprecated --graph flag.)
    # Reload the systemd configuration
    systemctl daemon-reload
    # Start docker
    systemctl start docker
    

    Install a specific version of docker-compose:

    curl -L https://get.daocloud.io/docker/compose/releases/download/1.27.4/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose; chmod +x /usr/local/bin/docker-compose
    
  2. Getting Started

    Kafka is a distributed messaging engine whose core job is to provide a complete solution for publishing and subscribing to messages. In Kafka, the unit of publish/subscribe is the topic (Topic); you can create a dedicated topic for every business domain, every application, or even every category of data.

    Kafka depends on Zookeeper. Zookeeper is a distributed coordination framework that coordinates, manages, and stores all of the Kafka cluster's metadata: which Brokers are running, which Topics exist, how many partitions (Partition) each Topic has, which machines host the leader replicas of those partitions, and so on.

    The Zookeeper-related settings in the Kafka configuration file are mainly the zookeeper.connect and zookeeper.connection.timeout.ms parameters, which set the connection string and the connection timeout respectively.

    The Kafka server side consists of service processes called Brokers; that is, a Kafka cluster is made up of multiple Brokers. Brokers receive and handle client requests and persist messages. Although several Broker processes can run on one machine, the more common practice is to spread Brokers across different machines: if one machine in the cluster goes down and every Broker process on it dies with it, the Brokers on the other machines can still serve clients. This is one of the ways Kafka achieves high availability.

    The other high-availability mechanism is replication (Replication). The idea is simple: copy the same data onto multiple machines; in Kafka those copies are called replicas (Replica). The number of replicas is configurable. The replicas hold identical data but play different roles. Kafka defines two kinds: the leader replica (Leader Replica) and the follower replica (Follower Replica). The former serves the outside world, meaning it interacts with client programs; the latter merely follows the leader passively and never talks to clients. Replicas are defined at the partition level: each partition can be configured with several replicas, of which exactly 1 is the leader and the other N-1 are followers.

    Producers write messages into partitions, and each message's position within its partition is identified by a value called the offset (Offset). Partition offsets always start at 0: if a producer writes 10 messages into an empty partition, their offsets are 0, 1, 2, ..., 9.

    Put simply, Kafka's architecture looks like this:

    Broker => Topic => Partition => Replication. Producers write messages into partitions, where they are replicated across brokers; consumers then read messages from each partition's leader replica.

  3. Single Node

    Single-node mode is straightforward and generally only suited to development environments, so I won't spend much ink on it; the important parameter settings and caveats are covered in the cluster section.

    Directory layout:

    ├── docker-compose
    │   └── docker-compose.yml
    ├── kafka
    │   ├── conf
    │   │   └── server.properties
    │   ├── data
    │   ├── log                  # chmod 777 log
    │   └── start-kafka.sh       # chmod 777 start-kafka.sh
    └── zookeeper
        ├── conf
        │   └── zoo.cfg
        ├── data
        │   └── myid
        └── log                  # chmod 777 log
    

    The zoo.cfg configuration file:

    tickTime=2000			# heartbeat interval between client and server, in milliseconds
    clientPort=2181			# client port
    dataDir=/data			# data directory
    dataLogDir=/log			# transaction log directory
    

    The server.properties configuration file:

    # Broker
    broker.id=1                                                            # broker ID
    log.dirs=/kafka/data                                                   # data directory
    log.retention.hours=168                                                # how long message data is retained
    log.retention.bytes=-1                                                 # total disk space reserved for messages; -1 = unlimited
    message.max.bytes=2097152                                              # maximum message size the broker will accept
    auto.create.topics.enable=false                                        # do not auto-create Topics
    auto.leader.rebalance.enable=false                                     # disable periodic automatic leader rebalancing
    unclean.leader.election.enable=false                                   # do not let out-of-sync replicas be elected Leader
    offsets.topic.replication.factor=1                                     # replication factor of the offsets Topic, <= number of brokers
    replication.factor=1                                                   # Topic replication factor, <= number of brokers
    min.insync.replicas=1                                                  # minimum number of replicas a message must be written to; default 1
    
    # Connect
    listeners=PLAINTEXT://:9091                                            # listener the broker binds to
    advertised.listeners=PLAINTEXT://××.××.××.××:9091                      # address advertised to clients
    zookeeper.connect=zookeeper:2181                                       # Zookeeper address
    zookeeper.connection.timeout.ms=18000                                  # Zookeeper connection timeout
    
    # Producer
    acks=all                                                               # a send succeeds only after all in-sync replicas have the message
    retries=1                                                              # retries > 0 enables automatic send retries
    
    # Consumer
    enable.auto.commit=false                                               # disable automatic offset commits
    

    Note that acks, retries, and enable.auto.commit are client-side (producer/consumer) parameters; I keep them in this file purely as a reference, since they take effect in the client configuration rather than in server.properties. Likewise, replication.factor is normally set per topic (the broker-side default is called default.replication.factor).

    The start-kafka.sh script:

    #!/bin/bash -e
    exec "$KAFKA_HOME/bin/kafka-server-start.sh" "$KAFKA_HOME/config/server.properties"
    

    The docker-compose configuration file:

    services:
      zookeeper:
        image: zookeeper:3.6.2
        container_name: zookeeper
        restart: always
        ports:
          - 2181:2181
        volumes:
          - /work/docker/zookeeper/conf/zoo.cfg:/conf/zoo.cfg
          - /work/docker/zookeeper/data:/data
          - /work/docker/zookeeper/log:/log
        networks:
          - kafka-net
    
      kafka:
        image: wurstmeister/kafka:2.13-2.6.0
        container_name: kafka
        restart: always
        ports:
          - 9091:9091
        volumes:
          - /work/docker/kafka/conf/server.properties:/opt/kafka/config/server.properties
          - /work/docker/kafka/data:/kafka/data
          - /work/docker/kafka/log:/opt/kafka/logs
          - /work/docker/kafka/start-kafka.sh:/usr/bin/start-kafka.sh
        networks:
          - kafka-net
    
    networks:
      kafka-net:
        driver: bridge
    

    Next comes the client's interaction with Kafka. I am using PHP, which requires the RdKafka extension.

    RdKafka documentation: https://arnaud.le-blanc.net/php-rdkafka-doc/phpdoc/book.rdkafka.html
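
    As a rough sketch of the installation (assuming a PECL-based setup on CentOS; the package name and ini path may differ on your distribution), the extension builds against librdkafka:

    # librdkafka is a build dependency of the extension (available from EPEL)
    yum install -y librdkafka-devel
    # install and enable the extension
    pecl install rdkafka
    echo "extension=rdkafka.so" > /etc/php.d/rdkafka.ini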

    Producer

    <?php
    
    $conf = new RdKafka\Conf();
    $conf->set('metadata.broker.list', '××.××.××.××:9091');
    
    $producer = new RdKafka\Producer($conf);
    
    $topic = $producer->newTopic("test");
    
    for ($i = 0; $i < 10; $i++) {
        $topic->produce(RD_KAFKA_PARTITION_UA, 0, "Message $i");
        $producer->poll(0);
    }
    
    for ($flushRetries = 0; $flushRetries < 10; $flushRetries++) {
        $result = $producer->flush(10000);
        if (RD_KAFKA_RESP_ERR_NO_ERROR === $result) {
            break;
        }
    }
    
    if (RD_KAFKA_RESP_ERR_NO_ERROR !== $result) {
        throw new \RuntimeException('Was unable to flush, messages might be lost!');
    }
    

    Consumer

    <?php
    
    $conf = new \RdKafka\Conf();
    
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("××.××.××.××:9091");
    
    $topicConf = new RdKafka\TopicConf();
    $topic = $rk->newTopic('test', $topicConf);
    
    // Start consuming partition 0
    // Start consuming offset 0, or you can control the offset by yourself
    $topic->consumeStart(0, 0);
    
    while (true) {
        $message = $topic->consume(0, 120 * 1000); // 120-second timeout
        switch ($message->err) {
            case RD_KAFKA_RESP_ERR_NO_ERROR:
                var_dump($message);
                break;
            case RD_KAFKA_RESP_ERR__PARTITION_EOF:
                echo "No more messages; will wait for more\n";
                break;
            case RD_KAFKA_RESP_ERR__TIMED_OUT:
                echo "Timed out\n";
                break;
            default:
                throw new \Exception($message->errstr(), $message->err);
                break;
        }
    }
    

    Note: because I set auto.create.topics.enable=false, the test Topic in the examples has to be created by hand in advance (see the command below). In development I usually give a Topic 1 partition and 1 replica (on a single node a Topic can only have one replica). Also, in $topic->consumeStart(0, 0), the first 0 is the partition (I only have one partition here) and the second 0 is the offset to start consuming from. One way to maintain offsets by hand: in MySQL, keep an offset column per Topic and do offset++ for every successfully consumed message; for performance, do the offset++ in Redis and write it back to the database every 1000 successful consumptions (a sketch follows). With this approach you never have to worry about whether the consumer committed its offset after consuming. If you would rather have Kafka track the offset itself, change the second argument to RD_KAFKA_OFFSET_STORED and set the group id and related parameters. See: https://arnaud.le-blanc.net/php-rdkafka-doc/phpdoc/rdkafka.examples-low-level-consumer-basic.html
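
    Creating the test Topic by hand might look like this, assuming the single-node layout above (container name kafka, Kafka installed under /opt/kafka):

    docker exec -it kafka bash -c '/opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper:2181 --create --topic test --partitions 1 --replication-factor 1'

    And a minimal sketch of the Redis-buffered offset bookkeeping described above, assuming the phpredis extension and hypothetical handleMessage() / saveOffsetToMySQL() helpers:

    <?php

    // Reuses the low-level consumer setup from the example above.
    $conf = new RdKafka\Conf();
    $rk = new RdKafka\Consumer($conf);
    $rk->addBrokers("××.××.××.××:9091");
    $topic = $rk->newTopic('test');

    $redis = new Redis();                      // phpredis
    $redis->connect('127.0.0.1', 6379);

    $key = 'kafka:offset:test:0';              // offset for partition 0 of topic "test"
    $offset = (int) $redis->get($key);         // resume from the last recorded offset

    $topic->consumeStart(0, $offset);

    while (true) {
        $message = $topic->consume(0, 120 * 1000);
        if ($message === null || $message->err !== RD_KAFKA_RESP_ERR_NO_ERROR) {
            continue;
        }

        handleMessage($message->payload);      // hypothetical business-logic handler

        $offset = $redis->incr($key);          // offset++ in Redis on every successful consume
        if ($offset % 1000 === 0) {
            saveOffsetToMySQL('test', 0, $offset);  // hypothetical: flush the offset to MySQL every 1000 messages
        }
    }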

  4. Cluster

    In production you should always run a distributed, highly available Kafka cluster. Since Kafka depends on Zookeeper, Zookeeper must be a cluster too for the whole thing to be highly available.

    The directory layout for Zookeeper and Kafka stays the same; we just go from one machine to three. I don't have three machines here and don't feel like setting up three VMs, so I'll keep simulating with Docker.

    • Zookeeper cluster

      The zoo.cfg configuration file:

      tickTime=2000
      initLimit=10                                # time limit for initial cluster synchronization (in ticks)
      syncLimit=5                                 # time limit for follower-leader synchronization (in ticks)
      clientPort=2181
      dataDir=/data
      dataLogDir=/log
      server.1=0.0.0.0:2888:3888                  # on the local machine use 0.0.0.0 in place of its IP
      server.2=zookeeper2:2888:3888
      server.3=zookeeper3:2888:3888
      

      The docker-compose configuration file:

      services:
        zookeeper1:
          image: zookeeper:3.6.2
          container_name: zookeeper1
          restart: always
          ports:
            - 2181:2181
          volumes:
            - /work/docker/zookeeper1/conf/zoo.cfg:/conf/zoo.cfg
            - /work/docker/zookeeper1/data:/data
            - /work/docker/zookeeper1/log:/log
          networks:
            - kafka-net
      
        zookeeper2:
          image: zookeeper:3.6.2
          container_name: zookeeper2
          restart: always
          ports:
            - 2182:2181
          volumes:
            - /work/docker/zookeeper2/conf/zoo.cfg:/conf/zoo.cfg
            - /work/docker/zookeeper2/data:/data
            - /work/docker/zookeeper2/log:/log
          networks:
            - kafka-net
      
        zookeeper3:
          image: zookeeper:3.6.2
          container_name: zookeeper3
          restart: always
          ports:
            - 2183:2181
          volumes:
            - /work/docker/zookeeper3/conf/zoo.cfg:/conf/zoo.cfg
            - /work/docker/zookeeper3/data:/data
            - /work/docker/zookeeper3/log:/log
          networks:
            - kafka-net
            
      networks:
        kafka-net:
          driver: bridge
      

      As you can see, the Zookeeper cluster came up successfully (here zookeeper3 is the leader):

      [root@iZuf6c82diwquwsq69eqejZ docker-compose]# docker exec -it zookeeper1 bash -c './bin/zkServer.sh status'
      ZooKeeper JMX enabled by default
      Using config: /conf/zoo.cfg
      Client port found: 2181. Client address: localhost. Client SSL: false.
      Mode: follower
      [root@iZuf6c82diwquwsq69eqejZ docker-compose]# docker exec -it zookeeper2 bash -c './bin/zkServer.sh status'
      ZooKeeper JMX enabled by default
      Using config: /conf/zoo.cfg
      Client port found: 2181. Client address: localhost. Client SSL: false.
      Mode: follower
      [root@iZuf6c82diwquwsq69eqejZ docker-compose]# docker exec -it zookeeper3 bash -c './bin/zkServer.sh status'
      ZooKeeper JMX enabled by default
      Using config: /conf/zoo.cfg
      Client port found: 2181. Client address: localhost. Client SSL: false.
      Mode: leader
      

      Note: the myid file in each Zookeeper data directory must be set to a different number; I used 1, 2, and 3 respectively (see below). When deploying on three real machines you also need to open ports 2181, 2888, and 3888: 2181 serves client connections, 2888 data synchronization, and 3888 leader election.
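
      A quick way to write the myid files, assuming the host paths from the compose file above:

      echo 1 > /work/docker/zookeeper1/data/myid
      echo 2 > /work/docker/zookeeper2/data/myid
      echo 3 > /work/docker/zookeeper3/data/myid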

    • Kafka cluster

      The server.properties configuration file:

      # Broker
      broker.id=1
      log.dirs=/kafka/data
      log.retention.hours=168
      log.retention.bytes=-1
      message.max.bytes=2097152
      auto.create.topics.enable=false
      auto.leader.rebalance.enable=false
      unclean.leader.election.enable=false
      offsets.topic.replication.factor=3                                    # replication factor of the offsets Topic, <= number of brokers
      replication.factor=3                                                  # Topic replication factor, <= number of brokers
      min.insync.replicas=2                                                 # minimum number of replicas a message must be written to
      
      # Connect
      listeners=PLAINTEXT://:9091
      advertised.listeners=PLAINTEXT://××.××.××.××:9091
      zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181     # inside the docker network every container listens on 2181; 2182/2183 are only the host-mapped ports
      zookeeper.connection.timeout.ms=18000
      
      # Producer
      acks=all                                                              # a send succeeds only after all in-sync replicas have the message
      retries=1
      
      # Consumer
      enable.auto.commit=false
      
      • A few parameters here deserve special attention:
        • broker.id identifies the Broker and must be unique; I suggest keeping it consistent with Zookeeper's myid. Each of the three brokers gets its own server.properties with its own broker.id and listener port (9092 and 9093 for kafka2 and kafka3 here).
        • zookeeper.connect is a CSV-formatted parameter: zk1:2181 for a single node, zk1:2181,zk2:2181,zk3:2181 for a cluster. Self-explanatory.
        • replication.factor is the Topic's replication factor; in a cluster it is best set to >= 3 to guard against message loss.
        • min.insync.replicas is a Broker-side parameter that controls how many replicas a message must be written to before it counts as "committed". Setting it above 1 improves message durability; never run production with the default of 1. Make sure replication.factor > min.insync.replicas: if the two are equal, the loss of a single replica makes the whole partition unwritable. We want to improve durability and prevent data loss without sacrificing availability, so the recommended setting is replication.factor = min.insync.replicas + 1.
        • acks is a Producer-side parameter and at first glance seems to conflict with min.insync.replicas. In fact, min.insync.replicas puts a floor under acks=all. Suppose there are three replicas and one machine dies, leaving two. acks is still "all", but the Producer obviously can no longer write to three replicas. This is where min.insync.replicas comes in: as long as at least 2 replicas are in sync, writes can continue (see the sketch after this list).
        • offsets.topic.replication.factor defaults to 3; just keep it equal to replication.factor. The offsets Topic is the internal offsets topic that is created automatically once the cluster is up and stores Kafka consumers' offsets. It has 50 partitions by default.
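
      For reference, a minimal sketch of setting the producer-side knobs from PHP. Recent librdkafka versions accept acks and retries as property names (the canonical names are request.required.acks and message.send.max.retries):

      <?php

      $conf = new RdKafka\Conf();
      $conf->set('metadata.broker.list', '××.××.××.××:9091,××.××.××.××:9092,××.××.××.××:9093');
      $conf->set('acks', 'all');     // wait for all in-sync replicas, bounded below by min.insync.replicas
      $conf->set('retries', '3');    // retry transient send failures automatically

      $producer = new RdKafka\Producer($conf);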

      The docker-compose configuration file:

        kafka1:
          image: wurstmeister/kafka:2.13-2.6.0
          container_name: kafka1
          restart: always
          ports:
            - 9091:9091
          volumes:
            - /work/docker/kafka1/conf/server.properties:/opt/kafka/config/server.properties
            - /work/docker/kafka1/data:/kafka/data
            - /work/docker/kafka1/log:/opt/kafka/logs
            - /work/docker/kafka1/start-kafka.sh:/usr/bin/start-kafka.sh
          networks:
            - kafka-net
      
        kafka2:
          image: wurstmeister/kafka:2.13-2.6.0
          container_name: kafka2
          restart: always
          ports:
            - 9092:9092
          volumes:
            - /work/docker/kafka2/conf/server.properties:/opt/kafka/config/server.properties
            - /work/docker/kafka2/data:/kafka/data
            - /work/docker/kafka2/log:/opt/kafka/logs
            - /work/docker/kafka2/start-kafka.sh:/usr/bin/start-kafka.sh
          networks:
            - kafka-net
      
        kafka3:
          image: wurstmeister/kafka:2.13-2.6.0
          container_name: kafka3
          restart: always
          ports:
            - 9093:9093
          volumes:
            - /work/docker/kafka3/conf/server.properties:/opt/kafka/config/server.properties
            - /work/docker/kafka3/data:/kafka/data
            - /work/docker/kafka3/log:/opt/kafka/logs
            - /work/docker/kafka3/start-kafka.sh:/usr/bin/start-kafka.sh
          networks:
            - kafka-net
      

      Once the cluster is up, listing the Topics shows that a __consumer_offsets offsets topic has appeared:

      [root@iZuf6c82diwquwsq69eqejZ docker]# docker exec -it kafka1 bash -c './opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181,zookeeper2:2182,zookeeper3:2183 --describe'
      Topic: __consumer_offsets	PartitionCount: 50	ReplicationFactor: 3	Configs: compression.type=producer,cleanup.policy=compact,segment.bytes=104857600
      	Topic: __consumer_offsets	Partition: 0	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
      	Topic: __consumer_offsets	Partition: 1	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 2	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
      	Topic: __consumer_offsets	Partition: 3	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
      	Topic: __consumer_offsets	Partition: 4	Leader: 1	Replicas: 1,2,3	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 5	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
      	Topic: __consumer_offsets	Partition: 6	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
      	Topic: __consumer_offsets	Partition: 7	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 8	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
      	Topic: __consumer_offsets	Partition: 9	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
      	Topic: __consumer_offsets	Partition: 10	Leader: 1	Replicas: 1,2,3	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 11	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
      	Topic: __consumer_offsets	Partition: 12	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
      	Topic: __consumer_offsets	Partition: 13	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 14	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
      	Topic: __consumer_offsets	Partition: 15	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
      	Topic: __consumer_offsets	Partition: 16	Leader: 1	Replicas: 1,2,3	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 17	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
      	Topic: __consumer_offsets	Partition: 18	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
      	Topic: __consumer_offsets	Partition: 19	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 20	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
      	Topic: __consumer_offsets	Partition: 21	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
      	Topic: __consumer_offsets	Partition: 22	Leader: 1	Replicas: 1,2,3	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 23	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
      	Topic: __consumer_offsets	Partition: 24	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
      	Topic: __consumer_offsets	Partition: 25	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 26	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
      	Topic: __consumer_offsets	Partition: 27	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
      	Topic: __consumer_offsets	Partition: 28	Leader: 1	Replicas: 1,2,3	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 29	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
      	Topic: __consumer_offsets	Partition: 30	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
      	Topic: __consumer_offsets	Partition: 31	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 32	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
      	Topic: __consumer_offsets	Partition: 33	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
      	Topic: __consumer_offsets	Partition: 34	Leader: 1	Replicas: 1,2,3	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 35	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
      	Topic: __consumer_offsets	Partition: 36	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
      	Topic: __consumer_offsets	Partition: 37	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 38	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
      	Topic: __consumer_offsets	Partition: 39	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
      	Topic: __consumer_offsets	Partition: 40	Leader: 1	Replicas: 1,2,3	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 41	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
      	Topic: __consumer_offsets	Partition: 42	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
      	Topic: __consumer_offsets	Partition: 43	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 44	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
      	Topic: __consumer_offsets	Partition: 45	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
      	Topic: __consumer_offsets	Partition: 46	Leader: 1	Replicas: 1,2,3	Isr: 1,3,2
      	Topic: __consumer_offsets	Partition: 47	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
      	Topic: __consumer_offsets	Partition: 48	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
      	Topic: __consumer_offsets	Partition: 49	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
      

      Note: once the cluster is up, I recommend kafka-manager for managing Clusters and Topics:

      docker run -d -p 8080:9000 -e ZK_HOSTS="zookeeper1:2181,zookeeper2:2181,zookeeper3:2181" --name=kafka-manager --net=docker-compose_kafka-net sheepkiller/kafka-manager
      

    On the client side, the only change needed is to replace $rk->addBrokers("××.××.××.××:9091") and the like with $rk->addBrokers("××.××.××.××:9091,××.××.××.××:9092,××.××.××.××:9093").

  5. In Practice

    In real production there are plenty of scenarios for Kafka: writing logs asynchronously, post-order business processing, post-login business processing, and so on. For high-concurrency workloads one partition is not necessarily enough. In the example below I created a userLogin topic with 4 partitions and 3 replicas (see the command below).
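
    The topic was presumably created with something along these lines (same CLI conventions as the describe command below):

    docker exec -it kafka1 bash -c './opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181 --create --topic userLogin --partitions 4 --replication-factor 3'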

    [root@iZuf6c82diwquwsq69eqejZ docker]# docker exec -it kafka1 bash -c './opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181,zookeeper2:2182,zookeeper3:2183 --describe --topic=userLogin'
    Topic: userLogin	PartitionCount: 4	ReplicationFactor: 3	Configs: 
    	Topic: userLogin	Partition: 0	Leader: 1	Replicas: 1,3,2	Isr: 1,3,2
    	Topic: userLogin	Partition: 1	Leader: 2	Replicas: 2,1,3	Isr: 2,1,3
    	Topic: userLogin	Partition: 2	Leader: 3	Replicas: 3,2,1	Isr: 3,2,1
    	Topic: userLogin	Partition: 3	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
    

    Producer

    <?php
    
    class UserLoginProducer
    {
        public function login()
        {
            try {
                $params = [
                    'user_id' => 74,
                    'login_area' => 'Shanghai',
                    'login_time' => time()
                ];
    
                $this->handleAfterLogin($params);
    
                return json_encode(['code' => 200]);
            } catch (\Exception $e) {
                return json_encode([
                    'code' => 400,
                    'msg' => $e->getMessage()
                ]);
            }
        }
    
        protected function handleAfterLogin(array $params)
        {
            $conf = new RdKafka\Conf();
            $conf->set('metadata.broker.list', '××.××.××.××:9091,××.××.××.××:9092,××.××.××.××:9093');
    
            $conf->setDrMsgCb(function ($kafka, $message) {
                if ($message->err) {
                    throw new \Exception('message permanently failed to be delivered');
                } else {
                    // message successfully delivered
                }
            });
    
            $producer = new RdKafka\Producer($conf);
    
            $topic = $producer->newTopic("userLogin");
    
            // The first argument is the partition. RD_KAFKA_PARTITION_UA stands for unassigned, and lets librdkafka choose the partition.
            // The second argument are message flags and should be either 0
            // or RD_KAFKA_MSG_F_BLOCK to block produce on full queue. The message payload can be anything.
            $topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($params));
    
            // Polls the producer for events and calls the corresponding callbacks (if registered).
            $producer->poll(0);
    
            // This should be done prior to destroying a producer instance
            // to make sure all queued and in-flight produce requests are completed
            // before terminating. Use a reasonable value for $timeout_ms.
            $result = $producer->flush(10000);
    
            if (RD_KAFKA_RESP_ERR_NO_ERROR !== $result) {
                throw new \RuntimeException('Was unable to flush, messages might be lost!');
            }
        }
    }
    
    $user = new UserLoginProducer();
    $res = $user->login();
    print_r($res); // {"code":200}
    

    Note: for a producer, the most important thing of all is making sure the message was actually sent successfully. "Sent" here means not merely that the Producer sent it, but that the Broker received it. To know whether the Broker received it, we must add a callback; $conf->setDrMsgCb together with $producer->poll(0) implements that callback. See: https://arnaud.le-blanc.net/php-rdkafka-doc/phpdoc/rdkafka-conf.setdrmsgcb.html

    Also, you should know exactly which partition your messages go to. In $topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($params)), the first argument is the partition. RD_KAFKA_PARTITION_UA means "unassigned": if you don't pick a partition, rdkafka picks one for you. With a single partition this doesn't matter, since RD_KAFKA_PARTITION_UA effectively means 0. But here I have four partitions, and without knowing rdkafka's partition-selection logic I prefer to control it myself, so replacing RD_KAFKA_PARTITION_UA with random_int(0, 3) (send to one of the four partitions at random) is more appropriate. Finally, messages are ordered within a partition but may be out of order across partitions; whenever ordering matters, pick the partition yourself and keep order-sensitive messages in the same partition (see the sketch below).
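
    A minimal sketch of both options, assuming the four-partition userLogin topic above. Routing by user_id keeps each user's messages in one partition and therefore in order:

    <?php

    $conf = new RdKafka\Conf();
    $conf->set('metadata.broker.list', '××.××.××.××:9091,××.××.××.××:9092,××.××.××.××:9093');

    $producer = new RdKafka\Producer($conf);
    $topic = $producer->newTopic("userLogin");

    $params = ['user_id' => 74, 'login_area' => 'Shanghai', 'login_time' => time()];

    // Option 1: spread load randomly across the four partitions (no ordering guarantee).
    $topic->produce(random_int(0, 3), 0, json_encode($params));

    // Option 2: derive the partition from user_id so that all of one user's
    // messages land in the same partition and stay ordered.
    $topic->produce($params['user_id'] % 4, 0, json_encode($params));

    $producer->poll(0);
    $producer->flush(10000);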

    Consumer

    <?php
    
    $conf = new RdKafka\Conf();
    
    // Set a rebalance callback to log partition assignments (optional)
    $conf->setRebalanceCb(function (RdKafka\KafkaConsumer $kafka, $err, array $partitions = null) {
        switch ($err) {
            case RD_KAFKA_RESP_ERR__ASSIGN_PARTITIONS:
                echo "Assign: ";
                var_dump($partitions);
                $kafka->assign($partitions);
                break;
    
            case RD_KAFKA_RESP_ERR__REVOKE_PARTITIONS:
                echo "Revoke: ";
                var_dump($partitions);
                $kafka->assign(NULL);
                break;
    
            default:
                throw new \Exception($err);
        }
    });
    
    // Configure the group.id. All consumer with the same group.id will consume
    // different partitions.
    $conf->set('group.id', 'userLoginConsumerGroup');
    
    // Initial list of Kafka brokers
    $conf->set('metadata.broker.list', '××.××.××.××:9091,××.××.××.××:9092,××.××.××.××:9093');
    
    // Set where to start consuming messages when there is no initial offset in
    // offset store or the desired offset is out of range.
    // 'earliest': start from the beginning
    $conf->set('auto.offset.reset', 'earliest');
    
    $conf->set('enable.auto.commit', 'false');
    
    $consumer = new RdKafka\KafkaConsumer($conf);
    
    // Subscribe to topic 'userLogin'
    $consumer->subscribe(['userLogin']);
    
    echo "Waiting for partition assignment... (make take some time when quickly re-joining the group after leaving it.)\n";
    
    while (true) {
        $message = $consumer->consume(120 * 1000);
        switch ($message->err) {
            case RD_KAFKA_RESP_ERR_NO_ERROR:
                // var_dump($message);
                if (handleAfterLogin($message->payload)) {
                    $consumer->commit($message);
                    echo '[ ' . date('Y-m-d H:i:s') . ' ] ' . $message->payload . ' consume successful' . "\n";
                }
    
                break;
            case RD_KAFKA_RESP_ERR__PARTITION_EOF:
                echo "No more messages; will wait for more\n";
                break;
            case RD_KAFKA_RESP_ERR__TIMED_OUT:
                echo "Timed out\n";
                break;
            default:
                throw new \Exception($message->errstr(), $message->err);
                break;
        }
    }
    
    function handleAfterLogin($params)
    {
        $data = json_decode($params);
        if ($data->user_id == 74) {
            return true;
        }
    
        return false;
    }
    

    Note: for a consumer, the most important thing of all is making sure messages are consumed correctly: commit the offset when consumption succeeds, change nothing when it fails, and avoid consuming a message twice. $conf->set('enable.auto.commit', 'false') together with calling $consumer->commit($message) after successful processing guarantees this. In a production environment with multiple partitions, it is best to use a consumer group. I have four partitions here and started two consumers in the same group, so each consumer consumes two partitions. Ideally run as many consumers as there are partitions. Also, try not to add or remove consumers in a group: that triggers a rebalance, which hurts performance badly.

    Output

    # consumer1
    $ php ./UserLoginHighConsumer.php
    Waiting for partition assignment... (make take some time when quickly re-joining the group after leaving it.)
    Assign: array(2) {
      [0]=>
      object(RdKafka\TopicPartition)#4 (3) {
        ["topic"]=>
        string(9) "userLogin"
        ["partition"]=>
        int(2)
        ["offset"]=>
        int(-1001)
      }
      [1]=>
      object(RdKafka\TopicPartition)#5 (3) {
        ["topic"]=>
        string(9) "userLogin"
        ["partition"]=>
        int(3)
        ["offset"]=>
        int(-1001)
      }
    }
    [ 2021-01-07 16:05:45 ] {"user_id":74,"login_area":"Shanghai","login_time":1610006745} consume successful
    [ 2021-01-07 16:05:47 ] {"user_id":74,"login_area":"Shanghai","login_time":1610006747} consume successful
    
    # consumer2
    $ php ./UserLoginHighConsumer.php
    Waiting for partition assignment... (make take some time when quickly re-joining the group after leaving it.)
    Assign: array(2) {
      [0]=>
      object(RdKafka\TopicPartition)#4 (3) {
        ["topic"]=>
        string(9) "userLogin"
        ["partition"]=>
        int(0)
        ["offset"]=>
        int(-1001)
      }
      [1]=>
      object(RdKafka\TopicPartition)#5 (3) {
        ["topic"]=>
        string(9) "userLogin"
        ["partition"]=>
        int(1)
        ["offset"]=>
        int(-1001)
      }
    }
    [ 2021-01-07 16:05:46 ] {"user_id":74,"login_area":"Shanghai","login_time":1610006746} consume successful
    [ 2021-01-07 16:05:48 ] {"user_id":74,"login_area":"Shanghai","login_time":1610006748} consume successful
    [ 2021-01-07 16:05:49 ] {"user_id":74,"login_area":"Shanghai","login_time":1610006749} consume successful
    

Reference: 《Kafka核心技术与实战》 (Kafka Core Technology and Practice)
