使用Flume往kafka和hdfs里同时写数据
环境背景
组件名称 | 组件版本 | 百度网盘地址 |
Flume | flume-ng-1.6.0-cdh5.7.0.tar.gz | 链接:https://pan.baidu.com/s/11QeF7rk2rqnOrFankr4TzA 提取码:3ojw |
Zookeeper | Zookeeper-3.4.5 | 链接:https://pan.baidu.com/s/1upNcB53WGWP_89lhYnqP6g 提取码:j50f |
Kafka | kafka_2.11-0.10.0.0.tgz | 链接:https://pan.baidu.com/s/1TpU6QPnoF1tuUy-7HnGgmQ 提取码:aapj |
Zookeeper部署 参照第4部
-
kafka部署
#解压 [hadoop@hadoop001 soft]$ cd ~/soft [hadoop@hadoop001 soft]$ tar -zxvf kafka_2.11-0.10.0.0.tgz -C ~/app/ #修改数据存储位置 [hadoop@hadoop001 soft]$ cd ~/app/kafka_2.11-0.10.0.0/ [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ mkdir -p ~/app/kafka_2.11-0.10.0.0/datalogdir [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ vim config/server.properties log.dirs=/home/hadoop/app/kafka_2.11-0.10.0.0/datalogdir #添加环境变量 [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ vim ~/.bash_profile export KAFKA_HOME=/home/hadoop/app/kafka_2.11-0.10.0.0 export PATH=$KAFKA_HOME/bin:$PATH [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ source ~/.bash_profile [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ which kafka-topics.sh ~/app/kafka-0.10.1.1/bin/kafka-topics.sh #启动 [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-server-start.sh config/server.properties #测试:创建Topic [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wsk_test #测试:显示Topic列表 [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-topics.sh --list --zookeeper localhost:2181 #测试:控制台生产者 [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wsk_test #测试:控制台消费者 [hadoop@hadoop001 kafka_2.11-0.10.0.0]$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wsk_test --from-beginning
-
配置Flume作业
使用Flume的TailDir Source采集数据发送到Kafka以及HDFS。具体配置如下:
Taildir-HdfsAndKafka-Agnet.sources = taildir-source Taildir-HdfsAndKafka-Agnet.channels = c1 c2 Taildir-HdfsAndKafka-Agnet.sinks = hdfs-sink kafka-sink Taildir-HdfsAndKafka-Agnet.sources.taildir-source.type = TAILDIR Taildir-HdfsAndKafka-Agnet.sources.taildir-source.filegroups = f1 Taildir-HdfsAndKafka-Agnet.sources.taildir-source.filegroups.f1 = /home/hadoop/data/flume/HdfsAndKafka/input/.* Taildir-HdfsAndKafka-Agnet.sources.taildir-source.positionFile = /home/hadoop/data/flume/HdfsAndKafka/taildir_position/taildir_position.json Taildir-HdfsAndKafka-Agnet.sources.taildir-source.selector.type = replicating Taildir-HdfsAndKafka-Agnet.channels.c1.type = memory Taildir-HdfsAndKafka-Agnet.channels.c2.type = memory Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.type = hdfs Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.path = hdfs://hadoop001:9000/flume/HdfsAndKafka/%Y%m%d%H%M Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.useLocalTimeStamp=true Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.filePrefix = wsktest- Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.rollInterval = 10 Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.rollSize = 100000000 Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.rollCount = 0 Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.fileType=DataStream Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.writeFormat=Text Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.brokerList = localhost:9092 Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.topic = wsk_test Taildir-HdfsAndKafka-Agnet.sources.taildir-source.channels = c1 c2 Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.channel = c1 Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.channel = c2
-
启动命令
flume-ng agent \ --name Taildir-HdfsAndKafka-Agnet \ --conf $FLUME_HOME/conf \ --conf-file $FLUME_HOME/conf/Taildir-HdfsAndKafka-Agnet.conf \ -Dflume.root.logger=INFO,console