Solving Resume-from-Breakpoint (断点续传) in Flume
Based on the requirements, first define the three key elements:
The collection source (source): monitor file content updates, i.e. exec 'tail -F file'
The sink target (sink): the HDFS file system, i.e. an hdfs sink
The channel between source and sink: either a file channel or a memory channel can be used
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure tail -F source1
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /root/flumedata/logs/text.txt
# Configure the host interceptor for source1
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname
# Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://hadoop01:9000/weblog/flume-collection/%y-%m-%d/%H-%M
agent1.sinks.sink1.hdfs.filePrefix = access_log
agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
agent1.sinks.sink1.hdfs.batchSize = 10
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = Text
# Roll to a new file when any limit is reached: rollSize bytes, rollCount events, or rollInterval seconds
agent1.sinks.sink1.hdfs.rollSize = 10
agent1.sinks.sink1.hdfs.rollCount = 100
agent1.sinks.sink1.hdfs.rollInterval = 6
# Round the timestamp used in %H-%M down to 1-minute buckets, using the agent's local time
agent1.sinks.sink1.hdfs.round = true
agent1.sinks.sink1.hdfs.roundValue = 1
agent1.sinks.sink1.hdfs.roundUnit = minute
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive = 120
agent1.channels.channel1.capacity = 500000
agent1.channels.channel1.transactionCapacity = 600
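# Alternative channel (sketch): since a file channel was listed as an option above,
# the memory channel can be swapped for a durable file channel; the two
# directories below are illustrative paths, not part of the original config.
# agent1.channels.channel1.type = file
# agent1.channels.channel1.checkpointDir = /root/flumedata/checkpoint
# agent1.channels.channel1.dataDirs = /root/flumedata/data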
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
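With the configuration complete, start the agent. A typical launch command, assuming the file above is saved as /root/flumedata/tail-hdfs.conf (an illustrative path) and flume-ng is on the PATH; --name must match the agent1 prefix used throughout the configuration:

flume-ng agent \
  --conf $FLUME_HOME/conf \
  --conf-file /root/flumedata/tail-hdfs.conf \
  --name agent1 \
  -Dflume.root.logger=INFO,console   # log to the console for easier debugging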
Then append date lines to /root/flumedata/logs/text.txt to generate test data:
while true
do
  date >> /root/flumedata/logs/text.txt
  sleep 0.5   # throttle the generator so the loop does not spin flat out
done
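After the generator has run for a while, the result can be checked on HDFS. A quick verification sketch based on the sink path configured above:

hdfs dfs -ls /weblog/flume-collection/                          # one directory per %y-%m-%d bucket
hdfs dfs -cat /weblog/flume-collection/*/*/access_log.* | head  # sample the collected lines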
The difference between tail -f and tail -F:
tail -f follows the open file descriptor, so once the file is rotated or recreated it stops producing output.
tail -F follows the file by name and keeps retrying, so it resumes output after the file is rotated or recreated.
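The difference is easy to reproduce by hand. A minimal demo, with /tmp/demo.log as an arbitrary test file (run the tail once with -f and once with -F to compare):

# terminal 1: follow the file
tail -F /tmp/demo.log

# terminal 2: write a line, rotate the file, then write again
echo "line 1" >> /tmp/demo.log
mv /tmp/demo.log /tmp/demo.log.1   # simulate log rotation
echo "line 2" >> /tmp/demo.log     # recreates the file under the original name
# with -f nothing more is printed after the rotation;
# with -F "line 2" appears, because -F re-opens the path by name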
Therefore we can build on tail -F to implement resume-from-breakpoint, persisting the last read position to a file and starting from it when the agent restarts:
a1.sources.r2.command = tail -n +$(tail -n1 /root/log) -F /root/data/nginx.log | awk 'ARGIND==1{i=$0;next}{i++;if($0~/^tail/){i=0};print $0;print i >> "/root/log";fflush("")}' /root/log -
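To make the one-liner easier to audit, here is the same pipeline broken up with comments. The position file /root/log must exist before the first run (seeding it with 1 is a first-time setup assumption), and ARGIND is a gawk extension, so gawk is required:

# first-time setup: seed the position file so $(tail -n1 /root/log) expands to 1
echo 1 > /root/log

tail -n +$(tail -n1 /root/log) -F /root/data/nginx.log | \
awk '
  ARGIND==1 { i = $0; next }        # first file (/root/log): load the saved line number
  {
    i++                             # one more line consumed from nginx.log
    if ($0 ~ /^tail/) { i = 0 }     # a tail status line means the file was replaced: reset
    print $0                        # forward the data line to Flume via stdout
    print i >> "/root/log"          # persist the current position
    fflush("")                      # flush stdout and the position file immediately
  }' /root/log -

Only the last line of /root/log is ever read back, so the file grows by one line per event and can be truncated periodically.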
If there are multiple sources, each one needs its own command entry (a1.sources.rX.command) and its own position file, as sketched below.
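A sketch of what that looks like for two monitored files; the source names r2/r3, the log paths, and the position files /root/log2 and /root/log3 are all illustrative:

a1.sources = r2 r3
a1.sources.r2.type = exec
a1.sources.r2.command = tail -n +$(tail -n1 /root/log2) -F /root/data/nginx.log | awk 'ARGIND==1{i=$0;next}{i++;if($0~/^tail/){i=0};print $0;print i >> "/root/log2";fflush("")}' /root/log2 -
a1.sources.r3.type = exec
a1.sources.r3.command = tail -n +$(tail -n1 /root/log3) -F /root/data/access.log | awk 'ARGIND==1{i=$0;next}{i++;if($0~/^tail/){i=0};print $0;print i >> "/root/log3";fflush("")}' /root/log3 -

Each source must get its own position file, otherwise the pipelines would overwrite each other's saved line numbers.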