Flume Operation Examples
Example 1: Spool
The Spool source monitors the configured directory for newly added files and reads the data out of them. Two points to note (see the sketch after this list):

- A file copied into the spool directory must not be opened and edited afterwards.
- The spool directory must not contain subdirectories.
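A common way to satisfy the first point is to finish writing the file somewhere outside the spool directory and then move it in with a single mv, so Flume never sees a half-written file. A minimal sketch, assuming the spoolDir configured below (/opt/apache-flume-1.6.0-bin/logs) and a purely illustrative staging path under /tmp:

```
# Finish writing the file in a staging location first (path is illustrative) ...
echo "hello world" > /tmp/spool_staging.log
# ... then move it into the monitored spool directory in one step.
mv /tmp/spool_staging.log /opt/apache-flume-1.6.0-bin/logs/spool_staging.log
```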
Configuration file jobs/spool.conf
```
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/apache-flume-1.6.0-bin/logs
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```
Startup command
```
bin/flume-ng agent \
  -c conf \
  -f jobs/spool.conf \
  -n a1 \
  -Dflume.root.logger=INFO,console
```
Test
$ echo "hello world" > logs/spool.log $ more logs/spool.log.COMPLETED hello world
Example 2: Exec
The Exec source runs a given command and uses its output as the source of events.
Configuration file jobs/exec.conf
```
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/apache-flume-1.6.0-bin/logs/log_exec_tail
a1.sources.r1.channels = c1

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```
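The startup command is only shown for the Spool example above; this agent should start the same way, with -f pointing at jobs/exec.conf:

```
bin/flume-ng agent \
  -c conf \
  -f jobs/exec.conf \
  -n a1 \
  -Dflume.root.logger=INFO,console
```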
Test
```
for i in {1..1000}
do
  echo "exec tail$i" >> /opt/apache-flume-1.6.0-bin/logs/log_exec_tail
done
```
Example 3: JSONHandler
Receives data from a remote client over HTTP.
Configuration file
```
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```
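The original does not name this configuration file; assuming it is saved as jobs/post.conf (an illustrative name), the agent starts the same way as before:

```
bin/flume-ng agent \
  -c conf \
  -f jobs/post.conf \
  -n a1 \
  -Dflume.root.logger=INFO,console
```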
Test
```
curl -X POST \
  -d '[{ "headers" : {"a" : "a1", "b" : "b1"}, "body" : "shiyanlou.org_body" }]' \
  http://localhost:8888
```
Example 4: Syslogtcp
Writes data to HDFS.
Configuration file
```
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 4444
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/user/hadoop/syslogtcp
a1.sinks.k1.hdfs.filePrefix = Syslog
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.channel = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
```
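As before, the agent is started with the usual command; the configuration file name jobs/syslog_hdfs.conf is only illustrative, and HDFS (the NameNode at localhost:9000 referenced in hdfs.path) must already be running:

```
bin/flume-ng agent \
  -c conf \
  -f jobs/syslog_hdfs.conf \
  -n a1 \
  -Dflume.root.logger=INFO,console
```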
Test
$ echo "hello flume" | nc localhost 4444
Check the output
$ hadoop fs -lsr /user/hadoop
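To look inside one of the written files, hadoop fs -cat can be used on a path from the listing above (the file name below is a placeholder; substitute the actual name). Note that the HDFS sink writes SequenceFiles by default, so the event body appears inside binary framing; setting hdfs.fileType = DataStream in the sink configuration yields plain text.

```
$ hadoop fs -cat /user/hadoop/syslogtcp/Syslog.<file-id>
```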
Example 5: File Roll Sink
Writes the received events to files in a local directory with the File Roll sink, rolling over to a new file at a fixed interval (30 seconds by default).
Configuration file
```
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5556
a1.sources.r1.host = localhost

a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /opt/apache-flume-1.6.0-bin/logs

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
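Again assuming an illustrative configuration file name, jobs/file_roll.conf, start the agent with:

```
bin/flume-ng agent \
  -c conf \
  -f jobs/file_roll.conf \
  -n a1 \
  -Dflume.root.logger=INFO,console
```

The 30-second roll interval mentioned below is the file_roll sink's sink.rollInterval default and can be overridden in the configuration.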
Test
$ echo "Hello world!"|nc localhost 5556
Check whether files have been generated under /opt/apache-flume-1.6.0-bin/logs; by default a new file is rolled every 30 seconds.
$ ls -alh /opt/apache-flume-1.6.0-bin/logs/