Flume in Practice -- Collecting a File into HDFS

Requirement analysis:
  • Requirement: a business system writes logs with log4j; the log file keeps growing, and the data appended to it needs to be collected into HDFS in real time.

  • From this requirement, first define the three key elements:

    • The collection source (source): monitor appends to a file with exec 'tail -f file'

    • The sink target (sink): the HDFS file system, i.e. the hdfs sink

    • The channel carrying events from source to sink: either a file channel or a memory channel
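
The three elements above map onto a Flume agent definition of this general shape (a bare skeleton for orientation; the full configuration used in this example follows below):

```properties
# skeleton: one source, one channel, one sink, wired together
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
```

Note that a source lists its channels (plural, it can fan out to several), while a sink binds to exactly one channel (singular).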

Developing the Flume configuration file
  • Create the configuration file on hadoop03

    cd /bigdata/install/flume-1.9.0/conf
    vim tail-file.conf

  • Configuration file contents

    agent1.sources = source1
    agent1.sinks = sink1
    agent1.channels = channel1
    
    # Describe/configure the exec source (tail -f follows appends to the file;
    # tail -F would additionally re-open it after log rotation)
    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -f /bigdata/install/mydata/flume/taillogs/access_log
    agent1.sources.source1.channels = channel1
    
    # Describe sink1
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = hdfs://hadoop01:8020/weblog/flume-collection/%y-%m-%d/%H-%M
    agent1.sinks.sink1.hdfs.filePrefix = access_log
    # maximum number of open files; beyond 5000, the oldest files are closed
    agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
    agent1.sinks.sink1.hdfs.batchSize = 100
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.hdfs.writeFormat = Text
    # roll to a new HDFS file when the first of these thresholds is reached:
    agent1.sinks.sink1.hdfs.rollSize = 102400     # bytes written
    agent1.sinks.sink1.hdfs.rollCount = 1000000   # events written
    agent1.sinks.sink1.hdfs.rollInterval = 60     # seconds the file has been open
    # round the timestamp used in the path down to a multiple of 10 minutes
    agent1.sinks.sink1.hdfs.round = true
    agent1.sinks.sink1.hdfs.roundValue = 10
    agent1.sinks.sink1.hdfs.roundUnit = minute
    # use the agent's local time for %y-%m-%d/%H-%M instead of an event-header timestamp
    agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
    
    # Use a channel which buffers events in memory
    agent1.channels.channel1.type = memory
    # timeout (seconds) when adding an event to, or taking one from, the channel
    agent1.channels.channel1.keep-alive = 120
    agent1.channels.channel1.capacity = 5000    ## setting this much higher brings little extra benefit
    agent1.channels.channel1.transactionCapacity = 4500
    
    # Bind the source and sink to the channel
    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1.channel = channel1
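
With hdfs.round = true, roundValue = 10 and roundUnit = minute, the %H-%M portion of the HDFS path is rounded down to the nearest 10 minutes, so events arriving at, say, 14:37 land in a 14-30 directory. A minimal sketch of that rounding in plain shell arithmetic (an illustration, not Flume code):

```shell
# mimic hdfs.round with roundValue=10, roundUnit=minute:
# the minute component rounds down to the nearest multiple of 10
minute=37
rounded=$(( minute / 10 * 10 ))
printf '%02d\n' "$rounded"   # prints 30
```

Rounding keeps the number of time-bucket directories (and therefore open files) manageable compared with one directory per minute.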
    

    Official documentation for these components:

    hdfs sink

    memory channel

Start Flume
cd /bigdata/install/flume-1.9.0
# -c: conf directory; -f: agent config file; -n: agent name, must match the prefix used in tail-file.conf
bin/flume-ng agent -c conf -f conf/tail-file.conf -n agent1 -Dflume.root.logger=INFO,console
Develop a shell script that keeps appending to the file
mkdir -p /home/hadoop/shells/
cd /home/hadoop/shells/
vim tail-file.sh
  • Contents:
#!/bin/bash
while true
do
  date >> /bigdata/install/mydata/flume/taillogs/access_log
  sleep 0.5
done
  • Create the directory
mkdir -p /bigdata/install/mydata/flume/taillogs/
  • Run the script
chmod u+x tail-file.sh
sh /home/hadoop/shells/tail-file.sh
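
To sanity-check the generator loop without a running agent, the same pattern can be exercised a fixed number of times against a temporary file (the temp path is only for illustration):

```shell
# append three timestamped lines to a temp file, then count them
log=$(mktemp)
for i in 1 2 3; do
  date >> "$log"
done
lines=$(wc -l < "$log" | tr -d ' ')
echo "$lines"   # prints 3
rm -f "$log"
```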
  • Verify the result: the collected files appear under the configured path in the HDFS web UI, and the write activity shows in the Flume console output

posted @ 2021-06-20 01:06  Tenic