Flume (3): Reading files from a directory into HDFS in real time

1) Requirement:

Use Flume to monitor all files in a directory and upload them to HDFS.

2) Requirement analysis:

3) Implementation steps:

1. Create the configuration file flume-dir-hdfs.conf

Create the file and open it:

[ck@hadoop102 job]$ touch flume-dir-hdfs.conf
[ck@hadoop102 job]$ vim flume-dir-hdfs.conf

Add the following content:

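a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Keep every comment on its own line. Flume loads this file as Java
# properties, so text after a value (including "# ...") becomes part of the value.

# Describe/configure the source
# Source type: spooling directory
a3.sources.r3.type = spooldir
# Directory to monitor
a3.sources.r3.spoolDir = /opt/module/flume-1.9.0/upload
# Suffix appended to a file once it has been fully ingested
a3.sources.r3.fileSuffix = .COMPLETED
# Add a header holding the absolute path of the source file to each event
a3.sources.r3.fileHeader = true
# Ignore (do not upload) every file whose name ends in .tmp
a3.sources.r3.ignorePattern = ([^ ]*\.tmp)

# Describe the sink
# Sink type: HDFS
a3.sinks.k3.type = hdfs
# Target path on HDFS, one directory per day and hour
a3.sinks.k3.hdfs.path = hdfs://hadoop102:9000/flume-1.9.0/upload/%Y%m%d/%H
# Prefix for the files created on HDFS
a3.sinks.k3.hdfs.filePrefix = upload-
# Round the event timestamp down, here to 1 hour. This affects the
# %Y%m%d/%H directory, not when files are rolled.
a3.sinks.k3.hdfs.round = true
a3.sinks.k3.hdfs.roundValue = 1
a3.sinks.k3.hdfs.roundUnit = hour
# Use the agent's local time instead of a timestamp header on the event
a3.sinks.k3.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a3.sinks.k3.hdfs.batchSize = 100
# File type: DataStream writes plain, uncompressed output
a3.sinks.k3.hdfs.fileType = DataStream
# Roll to a new file every 60 seconds
a3.sinks.k3.hdfs.rollInterval = 60
# Roll to a new file at 134217700 bytes, just under 128 MB (134217728 bytes)
a3.sinks.k3.hdfs.rollSize = 134217700
# 0 disables rolling by event count
a3.sinks.k3.hdfs.rollCount = 0
# Minimum number of replicas per HDFS block
a3.sinks.k3.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3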
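To see which file names the ignorePattern above would skip, the pattern can be tried out with grep. This is only a rough sketch: it assumes Flume requires the pattern to match the whole file name (hence the added ^ and $ anchors), and the helper name is_ignored is made up for this illustration.

```shell
# Same pattern as a3.sources.r3.ignorePattern, anchored with ^...$ on the
# assumption that the whole file name has to match.
ignore_re='^([^ ]*\.tmp)$'

# Returns 0 (true) when the given file name matches the ignore pattern.
is_ignored() {
  printf '%s\n' "$1" | grep -Eq "$ignore_re"
}

# Classify the three file names used later in this post.
for name in ck.log ck.txt ck.tmp; do
  if is_ignored "$name"; then
    echo "$name: ignored"
  else
    echo "$name: picked up"
  fi
done
```

Under that assumption only ck.tmp is ignored. A name containing a space, such as "a b.tmp", would not match either, because the pattern excludes spaces.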

2. Start the agent that monitors the folder

[ck@hadoop102 flume-1.9.0]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/flume-dir-hdfs.conf
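Before launching, it can help to confirm that the directory named in spoolDir is actually there. The helper below is a small sketch, not part of Flume: the function names are hypothetical, and it assumes the agent and source are called a3 and r3 as in this post.

```shell
# Print the spoolDir value configured for source r3 of agent a3.
# Anything after the first whitespace on the line is dropped.
spool_dir_from_conf() {
  sed -n 's/^[[:space:]]*a3\.sources\.r3\.spoolDir[[:space:]]*=[[:space:]]*//p' "$1" |
    awk 'NR == 1 { print $1 }'
}

# Succeeds only if the configured spooling directory already exists.
# The body runs in a subshell so that dir does not leak into the caller.
spool_dir_ready() (
  dir=$(spool_dir_from_conf "$1")
  [ -n "$dir" ] && [ -d "$dir" ]
)
```

One way to use it is `spool_dir_ready job/flume-dir-hdfs.conf && bin/flume-ng agent --conf conf/ --name a3 --conf-file job/flume-dir-hdfs.conf`, so the agent starts only when the check passes.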

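Notes on using the Spooling Directory Source:

1) Do not create a file in the monitored directory and then keep modifying it.

2) Files that have been fully uploaded are renamed with the .COMPLETED suffix.

3) The monitored folder is scanned for file changes every 500 ms.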
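One way to respect the first note is to make a file appear under its final name only when it is complete. The sketch below copies the file in under a .tmp name, which the ignorePattern in this post skips, and then renames it. The function name deliver_to_spool is hypothetical, and the approach relies on a rename within one directory being atomic on typical local file systems.

```shell
# Copy a finished file into the spooling directory in two steps:
# 1) copy under a .tmp name, which the ignorePattern in this post skips;
# 2) rename to the final name once the copy is complete.
# The body runs in a subshell so that its variables stay local.
deliver_to_spool() (
  src=$1
  spool_dir=$2
  name=$(basename "$src")
  cp "$src" "$spool_dir/$name.tmp" &&
    mv "$spool_dir/$name.tmp" "$spool_dir/$name"
)
```

For example, `deliver_to_spool /tmp/app.log /opt/module/flume-1.9.0/upload` (the source path here is only an illustration) leaves app.log in the spooling directory without it ever being visible in a half-written state under its final name.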

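3. Add files to the upload folder (Flume expects the spooling directory to exist when the agent starts, so if upload does not exist yet, create it before step 2)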

[ck@hadoop102 flume-1.9.0]$ mkdir upload
[ck@hadoop102 flume-1.9.0]$ cd upload/
[ck@hadoop102 upload]$ touch ck.log
[ck@hadoop102 upload]$ touch ck.txt
[ck@hadoop102 upload]$ touch ck.tmp

4. View the data on HDFS
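The sink path in the configuration ends in %Y%m%d/%H, so the output lands in a directory named after the day and hour. As a rough sketch, the directory for the current hour can be built with date. The function name current_upload_dir is hypothetical, and the sketch assumes this machine's clock and time zone match those of the agent host.

```shell
# Build the HDFS directory that the sink path .../upload/%Y%m%d/%H resolves to
# for the current local hour (the config sets useLocalTimeStamp = true).
current_upload_dir() {
  date +"/flume-1.9.0/upload/%Y%m%d/%H"
}

echo "Expecting sink output under: $(current_upload_dir)"
```

On a machine with the Hadoop client installed, `hdfs dfs -ls "$(current_upload_dir)"` then lists the uploaded files, assuming the client's default file system is hdfs://hadoop102:9000.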

5. Wait 1 s, then list the upload folder again
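According to the notes above, ingested files should now carry the .COMPLETED suffix, while ck.tmp is left untouched. The following sketch lists whatever is still waiting to be picked up. The function name pending_files is hypothetical, and the suffixes are taken from the configuration in this post.

```shell
# List regular files in a directory that are neither finished (.COMPLETED)
# nor ignored (.tmp), i.e. files the source has not processed yet.
# The body runs in a subshell so that its variables stay local.
pending_files() (
  dir=$1
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    name=${f##*/}
    case $name in
      *.COMPLETED|*.tmp) ;;
      *) printf '%s\n' "$name" ;;
    esac
  done
)
```

Running `pending_files /opt/module/flume-1.9.0/upload` prints nothing once every eligible file has been ingested.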

This example is taken from the atguigu videos.

posted @ 2020-09-09 20:09 cqyyck
