Flume Operation Examples

1. Case 1: Spool

The Spool (spooldir) source watches a configured directory for newly added files and reads their contents as events. Two points to note:

  • Files copied into the spool directory must not be opened or edited afterwards (a safe delivery pattern is sketched after this list).

  • The spool directory must not contain subdirectories.
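
Because files may not change once they are inside the spool directory, a safe pattern is to write the complete file somewhere else first and then move it in with a single mv. A minimal sketch, assuming a hypothetical staging directory /opt/apache-flume-1.6.0-bin/staging on the same filesystem as the spool directory configured below:

$ mkdir -p /opt/apache-flume-1.6.0-bin/staging
$ echo "hello world" > /opt/apache-flume-1.6.0-bin/staging/app.log   # finish writing outside the spool directory
$ mv /opt/apache-flume-1.6.0-bin/staging/app.log /opt/apache-flume-1.6.0-bin/logs/   # a same-filesystem mv is atomic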

Configuration file jobs/spool.conf

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/apache-flume-1.6.0-bin/logs
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

 

Start command (-c points to the configuration directory, -f to the agent configuration file, -n names the agent, which must match the a1 prefix used in the file):

bin/flume-ng agent \
    -c conf \
    -f jobs/spool.conf \
    -n a1 \
    -Dflume.root.logger=INFO,console
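
The later cases are started the same way; only the -f argument changes to the corresponding configuration file, for example for case 2:

bin/flume-ng agent \
    -c conf \
    -f jobs/exec.conf \
    -n a1 \
    -Dflume.root.logger=INFO,console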

 

Test. After Flume has ingested a file from the spool directory, it renames it with the .COMPLETED suffix:

$ echo "hello world" > logs/spool.log

$ more logs/spool.log.COMPLETED
hello world

 

2. Case 2: Exec

The Exec source runs a given command and turns its output into events; here tail -F is used so the source keeps following the log file even if it is rotated or recreated.

Configuration file jobs/exec.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/apache-flume-1.6.0-bin/logs/log_exec_tail
a1.sources.r1.channels = c1

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

 

Test

for i in {1..1000}
do
echo "exec tail$i" >> /opt/apache-flume-1.6.0-bin/logs/log_exec_tail
done
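
A quick check that the generator wrote all the lines; the logger sink should print the same events on the agent console:

$ wc -l /opt/apache-flume-1.6.0-bin/logs/log_exec_tail   # expect 1000 if the file did not exist beforehand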

 

3. Case 3: JSONHandler

Receive data from remote clients over HTTP. The HTTPSource's default handler (JSONHandler) expects a JSON array of events, each with a headers map and a body string.

Configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

 

Test

curl -X POST \
    -d '[{ 
          "headers" :{"a" : "a1","b" : "b1"},
          "body" : "shiyanlou.org_body"
        }]' http://localhost:8888
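
Because the handler takes an array, several events can be posted in one request; a sketch with an explicit Content-Type header:

curl -X POST \
    -H "Content-Type: application/json" \
    -d '[{"headers" : {"a" : "a1"}, "body" : "event_one"},
         {"headers" : {"b" : "b1"}, "body" : "event_two"}]' \
    http://localhost:8888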

 

4. Case 4: Syslogtcp

Receive syslog data over TCP and write it to HDFS.
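
This case assumes HDFS is running and reachable at hdfs://localhost:9000, matching the hdfs.path in the sink below; a quick check before starting the agent:

$ hadoop fs -ls hdfs://localhost:9000/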

Configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 4444
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/user/hadoop/syslogtcp
a1.sinks.k1.hdfs.filePrefix = Syslog
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.channel = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

 

Test

$ echo "hello  flume" | nc localhost 4444

Check the output

$ hadoop fs -lsr /user/hadoop
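
Since hdfs.fileType is not set, the sink writes SequenceFiles by default; hadoop fs -text can decode them to inspect the event bodies (the exact generated file name will differ):

$ hadoop fs -text /user/hadoop/syslogtcp/Syslog.*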

 

5. Case 5: File Roll Sink

Use the File Roll sink to write received events to files in a local directory, rolling over to a new file at a fixed interval.

Configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5556
a1.sources.r1.host = localhost

a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /opt/apache-flume-1.6.0-bin/logs

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

 

Test

$ echo "Hello world!"|nc localhost 5556

 

Check whether files have been generated under /opt/apache-flume-1.6.0-bin/logs; by default the sink rolls over to a new file every 30 seconds (controlled by sink.rollInterval).

$ ls -alh /opt/apache-flume-1.6.0-bin/logs/
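
To find which rolled file received the test events (a sketch):

$ grep -r "Hello world!" /opt/apache-flume-1.6.0-bin/logs/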

 
