【Kafka】Integrating Flume with Kafka


  • Requirement

    Use Flume to monitor all files under a given directory and forward the collected data to the Kafka messaging system.

  • 1. Flume download location

    http://archive.cloudera.com/cdh5/cdh/5

  • 2. Upload and extract Flume

    cd /export/softwares
    tar -zxvf apache-flume-1.6.0-cdh5.14.0-bin.tar.gz -C ../servers

  • 3. Configure flume_kafka.conf

    Use Flume to watch a directory: as soon as data lands in it, the data is sent on to Kafka.
    mkdir -p /export/servers/flumedata   # first create the directory to be monitored
    cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
    vim flume_kafka.conf

    # Name the agent's components
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    
    # Bind the source to the channel its data goes into
    a1.sources.r1.channels = c1
    # Source collection strategy: watch a spooling directory
    a1.sources.r1.type = spooldir
    a1.sources.r1.spoolDir = /export/servers/flumedata
    a1.sources.r1.deletePolicy = never
    a1.sources.r1.fileSuffix = .COMPLETED
    a1.sources.r1.ignorePattern = ^(.)*\\.tmp$
    a1.sources.r1.inputCharset = UTF-8
    
    # Use a memory channel: all events are buffered in memory
    a1.channels.c1.type = memory
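    # Optional: the memory channel defaults to capacity=100 and
    # transactionCapacity=100; a larger capacity helps absorb bursts
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100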
    
    # Use the Kafka sink and specify which channel it reads from
    a1.sinks.k1.channel = c1
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.kafka.topic = test
    a1.sinks.k1.kafka.bootstrap.servers = node01:9092,node02:9092,node03:9092
    a1.sinks.k1.kafka.flumeBatchSize = 20
    a1.sinks.k1.kafka.producer.acks = 1
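
    Before starting the agent, make sure the test topic exists, since clusters often have topic auto-creation disabled. A minimal sketch, assuming kafka-topics.sh is on the PATH and ZooKeeper runs on node01:2181 (topic creation went through ZooKeeper in Kafka versions of the CDH 5.x era); the replication factor and partition count are illustrative:

```shell
TOPIC=test   # must match a1.sinks.k1.kafka.topic above

# Create the topic if the Kafka CLI is available; otherwise just report.
if command -v kafka-topics.sh >/dev/null 2>&1; then
  kafka-topics.sh --create \
    --zookeeper node01:2181 \
    --replication-factor 2 \
    --partitions 3 \
    --topic "$TOPIC"
else
  echo "kafka-topics.sh not on PATH; create topic $TOPIC manually"
fi
```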
    
  • 4. Start Flume

    bin/flume-ng agent --conf conf --conf-file conf/flume_kafka.conf --name a1 -Dflume.root.logger=INFO,console

  • 5. Test the integration

    Once Flume has started successfully, start a Kafka console consumer:
    bin/kafka-console-consumer.sh --from-beginning --bootstrap-server node01:9092 --topic test
    Then place text files into the /export/servers/flumedata directory.
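
    One way to feed test data in, bearing in mind that the spooldir source expects each file to be complete and unchanging once it appears in the watched directory. A sketch: the paths come from the config above, and the temp-dir fallback is only for trying the commands outside the cluster:

```shell
# Directory watched by the spooldir source; fall back to a temp dir
# if the cluster path is not writable (e.g. trying this locally)
SPOOL_DIR=/export/servers/flumedata
mkdir -p "$SPOOL_DIR" 2>/dev/null || SPOOL_DIR=$(mktemp -d)

# The spooldir source may treat a file as corrupt if it changes after
# being picked up, so write it elsewhere first and then mv it in.
STAGE=$(mktemp)
printf 'hello kafka line 1\nhello kafka line 2\n' > "$STAGE"
mv "$STAGE" "$SPOOL_DIR/events_$(date +%s).txt"
```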

posted @ 2020-03-22 23:24  _codeRookie