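Flume Worked Examples

Requirement 1: Collect data from a given network port (44444) and print it to the console
Requirement 2: Monitor a file, collect newly appended data in real time, and print it to the console
Requirement 3: Collect the logs on server A to server B in real time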


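I. Requirement 1: Collect data from a given network port (44444) and print it to the console

1. Create test.conf (a simple single-node Flume configuration)

(1) The key to using Flume is writing the configuration file

a) Configure the source
b) Configure the channel
c) Configure the sink
d) Wire the three components together

a1: name of the agent
r1: name of the source
k1: name of the sink
c1: name of the channel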


(2) Create the file test.conf in the /kbb/install/flume/conf directory
vim test.conf

(3) The contents of test.conf are as follows:
#name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#describe/configure the source
a1.sources.r1.type=netcat
a1.sources.r1.bind=node01
a1.sources.r1.port=44444

#describe the sink
a1.sinks.k1.type=logger

#use a channel which buffers events in memory
a1.channels.c1.type=memory

#bind the source and sink to channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
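
As a quick sanity check on step d), the sketch below verifies that every source and sink in a configuration points at a channel that was actually declared. It is not part of Flume: parse_properties and check_wiring are hypothetical helper names, and the parser only handles the plain key=value lines used in this post, not the full Java properties syntax.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and lines starting with '#'."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems in how sources and sinks reference channels."""
    channels = set(props.get(f"{agent}.channels", "").split())
    problems = []
    # A source may feed several channels (space-separated list).
    for source in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not bound:
            problems.append(f"source {source} has no channels")
        for channel in bound:
            if channel not in channels:
                problems.append(f"source {source} -> undeclared channel {channel}")
    # A sink reads from exactly one channel.
    for sink in props.get(f"{agent}.sinks", "").split():
        bound = props.get(f"{agent}.sinks.{sink}.channel")
        if bound is None:
            problems.append(f"sink {sink} has no channel")
        elif bound not in channels:
            problems.append(f"sink {sink} -> undeclared channel {bound}")
    return problems


# Same settings as test.conf above.
TEST_CONF = """
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type=netcat
a1.sources.r1.bind=node01
a1.sources.r1.port=44444
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
"""

print(check_wiring(parse_properties(TEST_CONF), "a1"))  # prints []
```

An empty list means the wiring is consistent. Note the asymmetry the check relies on: a source uses the plural key channels, while a sink uses the singular key channel.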

 

 

 

2. Start the agent

Run the following command from the /kbb/install/flume/bin directory:
./flume-ng agent --name a1 --conf /kbb/install/flume/conf --conf-file /kbb/install/flume/conf/test.conf -Dflume.root.logger=INFO,console

Open a second terminal window (clone the session)
and test with telnet:
telnet node01 44444

Each message you send shows up in the agent window in the following format:
Event: { headers:{} body: 68 65 6C 6C 6F 0D hello. }
An Event is Flume's basic unit of data transfer.
Event = optional headers + byte array
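
To make the "byte array" part concrete, the short sketch below decodes the hex bytes from the log line above. It only uses the Python standard library; the variable names are illustrative.

```python
# Hex bytes copied from the logger sink output shown above.
hex_dump = "68 65 6C 6C 6F 0D"

# bytes.fromhex skips the spaces between byte pairs.
body = bytes.fromhex(hex_dump)
print(body)  # b'hello\r'

# The trailing 0x0D is a carriage return, which telnet typically sends
# before the newline; it is not a printable character.
text = body.decode("ascii")
print(repr(text.rstrip("\r")))  # 'hello'
```

This also suggests why the log line ends in "hello." rather than "hello": the final dot appears to stand in for the non-printable carriage return.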

 


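II. Requirement 2: Monitor a file, collect newly appended data in real time, and print it to the console

1. Output to the console

Agent selection: exec source + memory channel + logger sink

1) Create the file exec-memory-logger.conf
The contents of exec-memory-logger.conf are as follows: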

#name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#describe/configure the source
a1.sources.r1.type=exec
#path of the monitored file (the comment sits on its own line because a properties value runs to the end of the line)
a1.sources.r1.command=tail -F /kbb/install/flume/data/data.log
a1.sources.r1.shell=/bin/sh -c

#describe the sink
a1.sinks.k1.type=logger

#use a channel which buffers events in memory
a1.channels.c1.type=memory

#bind the source and sink to channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1

 

 

2) Start the agent
Run the following command from the /kbb/install/flume/bin directory:
./flume-ng agent --name a1 --conf /kbb/install/flume/conf --conf-file /kbb/install/flume/conf/exec-memory-logger.conf -Dflume.root.logger=INFO,console

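Open a second terminal window (clone the session)
echo welcome >> /kbb/install/flume/data/data.log (appends welcome, or any other content, to the monitored file)
Whatever is appended to the monitored file data.log shows up on the agent console, which gives real-time monitoring of the file.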

 

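2. Output to HDFS (offline)

Create the target directory in HDFS (general form: hadoop fs -mkdir /filename; -p also creates missing parent directories):
hadoop fs -mkdir -p /user/flume/test

1) Configuration file file-flume-hdfs.conf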

#name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#describe/configure the source

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir=/home/hadoop/flume

#describe the sink
a1.sinks.k1.type=hdfs
#the port must be the NameNode RPC port from fs.defaultFS in core-site.xml
#(8020 is assumed here; 9000 is also common), not the 9870 web UI port
a1.sinks.k1.hdfs.path=hdfs://node01:8020/user/flume/test/%y-%m-%d/%H%M/

a1.sinks.k1.hdfs.filePrefix = Data
#round the event timestamp down to 10-minute buckets
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
#use the local time instead of a timestamp header on the event
a1.sinks.k1.hdfs.useLocalTimeStamp = true

#use a channel which buffers events in memory
a1.channels.c1.type=memory

#bind the source and sink to channel

a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
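
To see what directory an event lands in, the sketch below mimics how the %y-%m-%d/%H%M escapes combine with the round, roundValue and roundUnit settings. It is a simplified model for illustration, not Flume's own code: bucket_path is a hypothetical helper, and it relies on Python's strftime using the same letters for these five escapes.

```python
from datetime import datetime


def bucket_path(ts, round_minutes=10, pattern="/user/flume/test/%y-%m-%d/%H%M/"):
    """Round ts down to a multiple of round_minutes within the hour, then format the path."""
    rounded = ts.replace(
        minute=ts.minute - ts.minute % round_minutes,
        second=0,
        microsecond=0,
    )
    return rounded.strftime(pattern)


# An event stamped 14:25:42 falls into the 14:20 bucket.
print(bucket_path(datetime(2022, 9, 15, 14, 25, 42)))  # /user/flume/test/22-09-15/1420/
```

With these settings a new directory is started every 10 minutes, so one day produces at most 144 directories under its date.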


2) Start the agent from the bin directory
./flume-ng agent --name a1 --conf /kbb/install/flume/conf/test --conf-file /kbb/install/flume/conf/test/file-flume-hdfs.conf -Dflume.root.logger=INFO,console

 

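III. Requirement 3: Collect the logs on server A to server B in real time

1. Analysis


Technology selection: exec source + memory channel + avro sink (server A)
avro source + memory channel + logger sink (server B)

Server A:
Agent:
source: type=exec
sink: type=avro

Server B:
Agent:
source: type=avro
sink: type=logger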

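This requirement needs two configuration files. Here file 1 names its agent a1 and file 2 names its agent a2, so the two agents cannot be confused with each other.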

2. Configuration file 1: exec-memory-avro.conf

#name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#describe/configure the source
a1.sources.r1.type=exec
#path of the monitored file (the comment sits on its own line because a properties value runs to the end of the line)
a1.sources.r1.command=tail -F /kbb/install/flume/data/data.log
a1.sources.r1.shell=/bin/sh -c

#describe the sink: host and port on which the avro source of server B listens
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=node01
a1.sinks.k1.port=44444

#use a channel which buffers events in memory
a1.channels.c1.type=memory

#bind the source and sink to channel
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1

 

 

 

3. Configuration file 2: avro-memory-logger.conf

a2.sources = r2
a2.channels = c2
a2.sinks = k2

#describe/configure the source
a2.sources.r2.type=avro
a2.sources.r2.bind=node01
a2.sources.r2.port=44444

#describe the sink
a2.sinks.k2.type=logger

#use a channel which buffers events in memory
a2.channels.c2.type=memory

#bind the source and sink to channel
a2.sources.r2.channels=c2
a2.sinks.k2.channel=c2
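
The two files only work together if the avro sink in file 1 targets the host and port that the avro source in file 2 listens on. The sketch below checks exactly that for the two configurations above. It is an illustration under assumptions: parse_properties and endpoints_match are hypothetical helpers, the parser handles only plain key=value lines, and the comparison is a literal string match (a source bound to 0.0.0.0 would be reported as a mismatch even though it could accept the connection).

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and lines starting with '#'."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def endpoints_match(sender, sink_prefix, receiver, source_prefix):
    """True if the avro sink's hostname/port equal the avro source's bind/port."""
    sink = (sender.get(f"{sink_prefix}.hostname"), sender.get(f"{sink_prefix}.port"))
    source = (receiver.get(f"{source_prefix}.bind"), receiver.get(f"{source_prefix}.port"))
    # A missing key on either side counts as a mismatch.
    return None not in sink and sink == source


# Relevant lines from exec-memory-avro.conf (server A).
SENDER_CONF = """
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=node01
a1.sinks.k1.port=44444
"""

# Relevant lines from avro-memory-logger.conf (server B).
RECEIVER_CONF = """
a2.sources.r2.type=avro
a2.sources.r2.bind=node01
a2.sources.r2.port=44444
"""

ok = endpoints_match(
    parse_properties(SENDER_CONF), "a1.sinks.k1",
    parse_properties(RECEIVER_CONF), "a2.sources.r2",
)
print(ok)  # True
```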

 

 

4. Start the agents

1) Always start avro-memory-logger.conf first (it is the side that listens)
./flume-ng agent --name a2 --conf /kbb/install/flume/conf/test --conf-file /kbb/install/flume/conf/test/avro-memory-logger.conf -Dflume.root.logger=INFO,console

2) Then start exec-memory-avro.conf
./flume-ng agent --name a1 --conf /kbb/install/flume/conf/test --conf-file /kbb/install/flume/conf/test/exec-memory-avro.conf -Dflume.root.logger=INFO,console
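
The start order matters because the avro sink connects to a port that only exists once the avro source is listening. One way to script that order is to wait for the port before launching the second agent. The sketch below is a generic TCP check, not a Flume feature; wait_for_port is a hypothetical helper and the timeout values are arbitrary.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example use (not executed here): start the a1 agent only when this returns True.
# wait_for_port("node01", 44444)
```

A successful connection only shows that some process is listening on the port, so it is a convenience for scripting the start order rather than proof that the a2 agent is healthy.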

 

 

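Summary of the log collection process:

1) Machine A (exec source + memory channel + avro sink) monitors a file, for example an access.log that records user behavior whenever the main site is visited, and picks up newly appended content in real time
2) The avro sink sends the newly produced log lines to the hostname and port on which the matching avro source (the source on machine B) listens
3) The agent that owns the avro source (machine B) prints the logs to the console through its logger sink (this is where Kafka will be connected later)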

posted on 2022-09-15 14:25  桑榆非晚柠月如风