Flume Hands-On Example -- Reading a File from HDFS into a Local Directory
Requirements Analysis
-
We want to read a file from a specific directory on HDFS and write its contents to a specific directory on the local file system.
-
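Based on this requirement, we first define the three key components:
-
Data source component (source): follows the file on HDFS with an exec source running 'hdfs dfs -tail -f'
-
Sink component (sink): file roll sink
-
Channel component (channel): either a file channel or a memory channel can be used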
-
Developing the Flume Configuration File
-
Create the configuration file:
cd /bigdata/install/flume-1.9.0/conf/
vim hdfs2local.conf
-
Its content is as follows:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# The exec source runs the command below and turns each line of its output into an event
a1.sources.r1.type = exec
a1.sources.r1.command = hdfs dfs -tail -f /hdfs2flume/test/a.txt

# Sink configuration
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /bigdata/install/mydata/flume/hdfs2local
a1.sinks.k1.sink.rollInterval = 3600
a1.sinks.k1.sink.pathManager.prefix = event-
a1.sinks.k1.sink.serializer = TEXT
a1.sinks.k1.sink.batchSize = 100

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Maximum number of events stored in the channel
a1.channels.c1.capacity = 1000
# Maximum number of events taken from the source or handed to the sink per transaction
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
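The requirements analysis notes that a file channel can be used in place of the memory channel. The sketch below writes the channel lines for that variant into a separate fragment file, so they can be compared with (or pasted over) the memory channel lines in hdfs2local.conf. The fragment file name and the two directories are assumptions made for illustration, not part of the original setup; `checkpointDir` and `dataDirs` are the file channel's own properties.

```shell
# Write to a scratch directory by default so the real hdfs2local.conf is untouched.
# Set CONF_DIR to the Flume conf directory to keep the fragment next to the config.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/file-channel.properties" <<'EOF'
# File channel variant: events are persisted on local disk instead of in memory.
# The two directories below are hypothetical; choose paths with enough free space.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /bigdata/install/mydata/flume/checkpoint
a1.channels.c1.dataDirs = /bigdata/install/mydata/flume/data
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
EOF

echo "wrote $CONF_DIR/file-channel.properties"
```

The source and sink sections stay the same, since both only refer to the channel by its name c1. The trade-off is the usual one: the memory channel is faster but loses buffered events if the agent stops, while the file channel keeps them on disk.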
-
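Prepare the file on HDFS
vi a.txt
# Enter the following content and save the file
1 zhangsan 21
2 lisi 22
3 wangwu 23
4 zhaoliu 24
5 guangyunchang 25
6 gaojianli 27
# Create the target directory if it does not exist yet, then upload the file to HDFS
hdfs dfs -mkdir -p /hdfs2flume/test
hdfs dfs -put ./a.txt /hdfs2flume/test/a.txt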
-
Prepare the local directory
mkdir -p /bigdata/install/mydata/flume/hdfs2local
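Before starting the agent, it can help to confirm that the pieces prepared so far are in place. The following is a minimal sketch, assuming the paths used in this walkthrough; the `preflight` function and the `CONF_FILE` and `SINK_DIR` variables are hypothetical helpers introduced here, not part of Flume.

```shell
# Paths from this walkthrough; override the variables if your layout differs.
CONF_FILE="${CONF_FILE:-/bigdata/install/flume-1.9.0/conf/hdfs2local.conf}"
SINK_DIR="${SINK_DIR:-/bigdata/install/mydata/flume/hdfs2local}"

# Returns 0 when the config file exists and the sink directory is writable,
# and 1 otherwise, printing one message per problem found.
preflight() {
  status=0
  if [ ! -f "$CONF_FILE" ]; then
    echo "missing config file: $CONF_FILE" >&2
    status=1
  fi
  if [ ! -d "$SINK_DIR" ] || [ ! -w "$SINK_DIR" ]; then
    echo "sink directory missing or not writable: $SINK_DIR" >&2
    status=1
  fi
  return "$status"
}
```

Calling `preflight && echo "ready to start"` prints the confirmation only when both checks pass.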
-
Start Flume
cd /bigdata/install/flume-1.9.0
bin/flume-ng agent -c ./conf -f ./conf/hdfs2local.conf -n a1 -Dflume.root.logger=INFO,console
-
Append content to a.txt on HDFS and check the local directory to verify the result, as shown in the figure below.
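The verification step can be sketched with two small shell helpers: one appends a record to the HDFS file through `hdfs dfs -appendToFile` (reading the record from standard input via `-`), the other prints the end of the newest file in the sink directory. The function names and the sample record are assumptions for illustration, and the sketch presumes that appends are permitted on the cluster.

```shell
# Paths from this walkthrough; override the variables if your layout differs.
HDFS_FILE="${HDFS_FILE:-/hdfs2flume/test/a.txt}"
SINK_DIR="${SINK_DIR:-/bigdata/install/mydata/flume/hdfs2local}"

# Append one line of text to the HDFS file that the exec source is following.
append_record() {
  printf '%s\n' "$1" | hdfs dfs -appendToFile - "$HDFS_FILE"
}

# Print the last few lines of the most recently modified file in the sink directory.
show_latest_output() {
  latest=$(ls -t "$SINK_DIR" 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "no files in $SINK_DIR yet" >&2
    return 1
  fi
  tail -n 5 "$SINK_DIR/$latest"
}
```

With the agent running, `append_record "7 jingke 28"` followed a few seconds later by `show_latest_output` should show the new record at the end of the current event- file. The record "7 jingke 28" is a made-up sample in the same format as a.txt.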