Flume in Practice (Case Studies)
Log Collection
Flume's internals are easy enough to understand; what really matters is mastering how to use it. Flume ships with a large number of built-in Source, Channel, and Sink types, and different types can be combined freely through a user-supplied configuration file, which makes it very flexible. For example, a Channel can buffer events in memory or persist them to local disk, and a Sink can write events to HDFS or HBase, or even forward them to another agent's Source. The cases below walk through Flume's usage in detail.
Using Flume is straightforward: write a configuration file that describes the concrete source, channel, and sink, then start an agent instance. The agent reads the configuration at startup and begins collecting data.
Guidelines for writing the configuration file:
1> Name the sources, sinks, and channels that the agent uses:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
2> Describe each source, sink, and channel in detail. For a source, specify its type: whether it reads files, accepts HTTP requests, accepts Thrift, and so on. The same goes for a sink: specify whether output goes to HDFS, HBase, etc. For a channel, specify whether it is backed by memory, a database, or files.
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
3> Bind the source and the sink together through the channel:
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Shell command to start the agent:
flume-ng agent -n a1 -c ../conf -f ../conf/example.file -Dflume.root.logger=DEBUG,console
Parameter notes (the equivalent long-option form is shown below):
-n  the agent name (must match the agent name used in the configuration file)
-c  the Flume configuration directory
-f  the configuration file to run
-Dflume.root.logger=DEBUG,console  sets the log level and sends logging to the console
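The same command can also be written with the long option names, which some find more readable:
flume-ng agent --name a1 --conf ../conf --conf-file ../conf/example.file -Dflume.root.logger=DEBUG,console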
Case 1
NetCat Source: listens on a given network port; as soon as an application writes data to that port, the source picks it up. Sink: logger. Channel: memory.
NetCat Source description from the Flume documentation:
Property Name Default Description
channels –
type – The component type name, needs to be netcat
bind – Host name or IP address to send data to; the netcat source listens on this host
port – Port number to send data to; the netcat source listens on this port
Write the configuration file:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent -n a1 -c ../conf -f netcat.conf -Dflume.root.logger=DEBUG,console
Send data with telnet:
[root@node-247 ~]# telnet 192.168.1.246 44444
Trying 192.168.1.246...
Connected to 192.168.1.246.
Escape character is '^]'.
111111
OK
Check the output on the agent node:
18/08/01 17:32:21 INFO sink.LoggerSink: Event: { headers:{} body: 31 31 31 31 31 31 0D 111111. }
Case 2:
NetCat Source: listens on a given network port; as soon as an application writes data to that port, the source picks it up. Sink: hdfs. Channel: file (the two changes relative to case 1).
HDFS Sink description from the Flume documentation (the property table from the original post is not reproduced here). The properties used below are: hdfs.path, the target HDFS directory; hdfs.writeFormat and hdfs.fileType, set so plain text is written instead of the default SequenceFile; hdfs.rollInterval/rollSize/rollCount, set so files roll every 10 seconds and never by size or event count; hdfs.filePrefix, a timestamp-based file name prefix; and hdfs.useLocalTimeStamp, so the agent's local time is used for the escape sequences in the prefix.
Write the configuration file:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/netcat
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent -n a1 -c ../conf -f netcat2hdfs.conf -Dflume.root.logger=DEBUG,console
Send data with telnet:
[root@node-247 ~]# telnet 192.168.1.246 44444
Trying 192.168.1.246...
Connected to 192.168.1.246.
Escape character is '^]'.
write to hdfs
OK
Agent node log output:
18/08/01 17:39:28 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/01 17:39:28 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp
18/08/01 17:39:39 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp
18/08/01 17:39:39 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959.tmp to hdfs://node-231:8020/user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
18/08/01 17:39:39 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/01 17:39:53 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 1
18/08/01 17:39:53 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533116333542, queueSize: 0, queueHead: 0
18/08/01 17:39:53 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-1 position: 171 logWriteOrderID: 1533116333542
The write succeeded; verify it:
[root@node-235 bin]# hadoop fs -ls /user/hdfs/flume/netcat/
Found 1 items
-rw-r--r--   3 root hdfs         15 2018-08-01 17:39 /user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
[root@node-235 bin]# hadoop fs -cat /user/hdfs/flume/netcat/2018-08-01-17-39-28.1533116368959
write to hdfs
If you telnet in and send data again, a second data file appears under the HDFS directory, because hdfs.rollInterval = 10 closes each file ten seconds after it is created.
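Listing the output directory again after the second telnet session should confirm this (same command as above, output omitted):
hadoop fs -ls /user/hdfs/flume/netcat/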
Case 3:
Spooling Directory Source: watches a given directory; whenever an application places a new file in that directory, the source picks it up, parses its contents, and writes them to the channel. Once the contents have been written, the file is either marked as completed or deleted. Sink: logger. Channel: memory.
Spooling Directory Source description from the Flume documentation:
Property Name Default Description
channels –
type – The component type name, needs to be spooldir.
spoolDir – The directory that the Spooling Directory Source watches
fileSuffix .COMPLETED Suffix appended to mark a file once its contents have been written to the channel
deletePolicy never What to do with a file after its contents have been written to the channel: never or immediate
fileHeader false Whether to add a header storing the absolute path filename.
ignorePattern ^$ Regular expression specifying which files to ignore (skip)
interceptors – Interceptors that set event headers; the timestamp interceptor is commonly used
Two caveats for the Spooling Directory Source (a safe delivery pattern is sketched after these notes):
①If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
That is, files copied into the spool directory must not be opened and edited afterwards.
②If a file name is reused at a later time, Flume will print an error to its log file and stop processing.
That is, a file whose name has already been used must not be copied into this directory again.
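A common way to satisfy both caveats is to finish writing the file somewhere else and then move it into the spool directory under a unique name; on the same filesystem, mv is atomic, so Flume never sees a half-written file. A minimal sketch (the source path is hypothetical):
# stage the finished file outside the spool directory, then move it in with a unique name
TS=$(date +%s)
cp /var/log/app/app.log /usr/local/test/app.${TS}.tmp
mv /usr/local/test/app.${TS}.tmp /usr/local/test/datainput/app.${TS}.log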
Write the configuration file:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/test/datainput
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent -n a1 -c ../conf -f spool.conf -Dflume.root.logger=DEBUG,console
Place a test file whose content is "hello spool" into the watched directory:
cp test.txt datainput/
Agent log:
18/08/01 17:52:48 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/test/datainput/test.txt to /usr/local/test/datainput/test.txt.COMPLETED
18/08/01 17:52:48 INFO sink.LoggerSink: Event: { headers:{file=/usr/local/test/datainput/test.txt, timestamp=1533117168275} body: 68 65 6C 6C 6F 20 73 70 6F 6F 6C hello spool }
The console output shows that the event headers include a timestamp.
Looking at the spooling directory itself, the file was marked once its contents had been written to the channel:
[root@node-246 test]# ls datainput/
test.txt.COMPLETED
Case 4:
Spooling Directory Source: watches a given directory; whenever an application places a new file in that directory, the source picks it up, parses its contents, and writes them to the channel. Once the contents have been written, the file is either marked as completed or deleted. Sink: hdfs. Channel: file (the two changes relative to case 3).
Write the configuration file:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/test/datainput
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent -n a1 -c ../conf -f spool2hdfs.conf -Dflume.root.logger=DEBUG,console
Place a new file, test1.txt, under datainput:
cp test1.txt datainput
Agent log:
18/08/01 17:58:41 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 1
18/08/01 17:58:41 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533117491901, queueSize: 0, queueHead: 0
18/08/01 17:58:42 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-3 position: 241 logWriteOrderID: 1533117491901
18/08/01 17:58:42 INFO file.LogFile: Closing RandomReader /usr/flume/data/log-1
18/08/01 17:58:42 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:42 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:43 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp
18/08/01 17:58:43 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp to hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
18/08/01 17:58:43 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/01 17:58:32 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /usr/local/test/datainput/test1.txt to /usr/local/test/datainput/test1.txt.COMPLETED
18/08/01 17:58:32 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
18/08/01 17:58:32 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/01 17:58:32 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457.tmp
Check HDFS:
[root@node-235 bin]# hadoop fs -ls /user/hdfs/flume/spool/dataoutput
Found 1 items
-rw-r--r--   3 root hdfs         12 2018-08-01 17:58 /user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
[root@node-235 bin]# hadoop fs -cat /user/hdfs/flume/spool/dataoutput/2018-08-01-17-58-32.1533117512457
hello spool
Check the file status under datainput:
[root@node-246 test]# ls datainput/
test1.txt.COMPLETED  test.txt.COMPLETED
Case 5:
Exec Source: runs a given command and uses the command's output as its data source.
The most common command is tail -F file: whenever an application appends to the log file, the source picks up the newest lines. Sink: hdfs. Channel: file.
To make the effect of the Exec Source easier to see, this case pairs it with a Hive external table over the sink's output directory.
Write the configuration file:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/test/log.file

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/exec/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Create the Hive external table:
create external table flume_exec_table (info String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
location '/user/hdfs/flume/exec/dataoutput'
Start the agent:
flume-ng agent -n a1 -c ../conf -f exec.conf -Dflume.root.logger=DEBUG,console
Use echo to append data to /usr/local/test/log.file:
echo firstline=1 >> /usr/local/test/log.file
Check the data in Hive (one way to do this is shown below).
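A quick check, assuming the hive CLI is on the path, is to query the external table directly:
hive -e "select * from flume_exec_table;"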
Summary of the Exec Source: the Exec Source and the Spooling Directory Source are the two most common ways of collecting logs. The Exec Source collects log data in real time, which the Spooling Directory Source cannot quite match; however, whenever the Flume agent is not running or the command fails, the Exec Source collects nothing and those log entries are simply lost, so the completeness of the collected logs cannot be guaranteed (a partial mitigation is sketched below).
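The Exec Source does support restarting its command if the process exits (per the Flume documentation); this does not recover events missed while the agent itself was down, but it keeps a dead tail from silently stopping collection. A sketch of the two extra properties for the configuration above:
# restart the tail command if it exits, waiting 10 seconds between attempts
a1.sources.r1.restart = true
a1.sources.r1.restartThrottle = 10000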
Case 6:
Avro Source: listens on a given Avro port and receives events sent by Avro clients; any application that sends a file through the Avro port will have its contents picked up by the source. Sink: hdfs. Channel: file.
(Note: the Avro and Thrift sources are RPC-style network endpoints through which events can be sent and received. An Avro client can send a given file to Flume; the Avro source uses the Avro RPC mechanism.)
(The diagram of the Avro Source workflow from the original post is omitted here.)
Avro Source description from the Flume documentation:
Property Name Default Description
channels –
type – The component type name, needs to be avro
bind – Host name or IP address to listen on; the Avro source runs on this host
port – Port number to listen on; the Avro source is bound to this port
Write the configuration file:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent -n a1 -c ../conf -f avro.conf -Dflume.root.logger=DEBUG,console
Send a file with avro-client:
flume-ng avro-client -c ../conf -H 192.168.1.246 -p 4141 -F /usr/local/test/log.file
Agent log:
18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] OPEN
18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] BOUND: /192.168.1.246:4141
18/08/02 09:56:14 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 => /192.168.1.246:4141] CONNECTED: /192.168.1.246:43750
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] DISCONNECTED
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] UNBOUND
18/08/02 09:56:15 INFO ipc.NettyServer: [id: 0xdc3eb5d1, /192.168.1.246:43750 :> /192.168.1.246:4141] CLOSED
18/08/02 09:56:15 INFO ipc.NettyServer: Connection to /192.168.1.246:43750 disconnected.
18/08/02 09:56:18 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/02 09:56:18 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp
18/08/02 09:56:19 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/08/02 09:56:19 INFO hdfs.BucketWriter: Creating hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp
18/08/02 09:56:22 INFO file.EventQueueBackingStoreFile: Start checkpoint for /usr/flume/checkpoint/checkpoint_1533116333069, elements to sync = 3
18/08/02 09:56:22 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1533174892909, queueSize: 0, queueHead: 1
18/08/02 09:56:23 INFO file.Log: Updated checkpoint for file: /usr/flume/data/log-5 position: 361 logWriteOrderID: 1533174892909
18/08/02 09:56:23 INFO file.LogFile: Closing RandomReader /usr/flume/data/log-3
18/08/02 09:56:29 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp
18/08/02 09:56:29 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446.tmp to hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446
18/08/02 09:56:29 INFO hdfs.HDFSEventSink: Writer callback called.
18/08/02 09:56:29 INFO hdfs.BucketWriter: Closing hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp
18/08/02 09:56:29 INFO hdfs.BucketWriter: Renaming hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087.tmp to hdfs://node-231:8020/user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087
18/08/02 09:56:29 INFO hdfs.HDFSEventSink: Writer callback called.
Check the files on HDFS:
[root@node-231 ~]# hadoop fs -ls /user/hdfs/flume/avro/dataoutput/
Found 2 items
-rw-r--r--   3 root hdfs         12 2018-08-02 09:56 /user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-18.1533174978446
-rw-r--r--   3 root hdfs         25 2018-08-02 09:56 /user/hdfs/flume/avro/dataoutput/2018-08-02-09-56-19.1533174979087
Case 7:
syslogtcp
The syslogtcp source listens on a TCP port and treats syslog messages arriving there as its data source.
Agent configuration file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent -n a1 -c ../conf -f syslogtcp.conf -Dflume.root.logger=DEBUG,console
Generate a test syslog message:
echo "test syslogtcp" | nc 192.168.1.246 5140
Agent log (note the "Invalid Syslog data" warning, explained below):
18/08/02 11:13:48 WARN source.SyslogUtils: Event created from Invalid Syslog data.
18/08/02 11:13:49 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 74 65 73 74 20 73 79 73 6C 6F 67 74 63 70 test syslogtcp }
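The warning appears because the plain text sent by echo is not a well-formed syslog message, so Flume falls back to treating the whole line as the body and marks the event flume.syslog.status=Invalid. Sending a message in RFC 3164 syslog format should avoid the warning; a sketch (the priority value, timestamp and hostname are arbitrary):
echo "<13>Aug  2 11:20:00 node-246 test: hello syslogtcp" | nc 192.168.1.246 5140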
Case 8:
JSONHandler (the default handler of the HTTP Source)
Create the agent configuration file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent -n a1 -c ../conf -f httpsource.conf -Dflume.root.logger=DEBUG,console
Send a JSON-formatted POST request:
curl -X POST -d '[{ "headers" :{"a" : "a1","b" : "b1"},"body" : "idoall.org_body"}]' http://192.168.1.246:8888
Agent log:
18/08/02 11:28:34 INFO sink.LoggerSink: Event: { headers:{a=a1, b=b1} body: 69 64 6F 61 6C 6C 2E 6F 72 67 5F 62 6F 64 79 idoall.org_body }
Case 9:
File Roll Sink
The "file_roll" sink type writes events to the local file system.
Create the configuration file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5555
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /usr/local/test/fileroll

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent -n a1 -c ../conf -f fileroll.conf -Dflume.root.logger=DEBUG,console
Generate a test log entry:
echo "hello idoall.org syslog" | nc 192.168.1.246 5555
Agent log:
18/08/02 11:53:34 WARN source.SyslogUtils: Event created from Invalid Syslog data.
Check the files under /usr/local/test/fileroll (several files appear even though only one event was sent; see the note below):
[root@node-246 fileroll]# ls
1533181857932-1  1533181857932-2  1533181857932-3  1533181857932-4  1533181857932-5  1533181857932-6  1533181857932-7
[root@node-246 fileroll]# cat 1533181857932-6
hello idoall.org syslog
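The extra (mostly empty) files exist because the File Roll Sink rolls to a new output file every 30 seconds by default. If that is too aggressive, the interval can presumably be raised, or time-based rolling disabled entirely:
# roll the output file every 5 minutes instead of every 30 seconds (0 disables time-based rolling)
a1.sinks.k1.sink.rollInterval = 300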
Case 10
Replicating Channel Selector
Flume can fan out the flow from one source to multiple channels. There are two fan-out modes: replicating and multiplexing. In replicating mode, each event is sent to every configured channel. In multiplexing mode, an event is sent to only a subset of the channels. Fan-out requires specifying both the source and the fan-out rules for the channels.
Create the replicating_Channel_Selector configuration file:
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
Create the replicating_Channel_Selector_avro configuration file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Copy both configuration files to the other machine (247) and adjust the IP addresses inside them:
scp replicating_Channel_Selector* root@node-247:/usr/local/test/flume/
Open four terminal windows and start the two agents on each host (the avro receiver first, then the replicating sender):
flume-ng agent -n a1 -c ../conf -f replicating_Channel_Selector_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f replicating_Channel_Selector.conf -Dflume.root.logger=DEBUG,console
Generate a test syslog message:
echo "hello idoall.org syslog" | nc 192.168.1.246 5140
Agent log:
18/08/02 14:09:53 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }
Case 11
Multiplexing Channel Selector
Create the Multiplexing_Channel_Selector configuration file:
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
# A header value may map to more than one channel; the default mapping may list any number of channels.
a1.sources.r1.selector.mapping.baidu = c1
a1.sources.r1.selector.mapping.ali = c2
a1.sources.r1.selector.default = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
Create the Multiplexing_Channel_Selector_avro configuration file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Copy the configuration files to the other node and change the IP addresses accordingly:
scp Multiplexing_Channel_Selector* root@192.168.1.247:/usr/local/test/flume/
Open four windows (two on 246 and two on 247) and start the agents:
flume-ng agent -n a1 -c ../conf -f Multiplexing_Channel_Selector_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f Multiplexing_Channel_Selector.conf -Dflume.root.logger=DEBUG,console
From any node, generate test events:
curl -X POST -d '[{ "headers" :{"type" : "baidu"},"body" : "idoall_TEST1"}]' http://192.168.1.247:5140 && curl -X POST -d '[{ "headers" :{"type" : "ali"},"body" : "idoall_TEST2"}]' http://192.168.1.247:5140 && curl -X POST -d '[{ "headers" :{"type" : "qq"},"body" : "idoall_TEST3"}]' http://192.168.1.246:5140
Agent logs:
On 246:
18/08/02 14:36:04 INFO sink.LoggerSink: Event: { headers:{type=qq} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 33 idoall_TEST3 }
18/08/02 14:36:04 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31 idoall_TEST1 }
On 247:
18/08/02 14:36:06 INFO sink.LoggerSink: Event: { headers:{type=ali} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 32 idoall_TEST2 }
As the output shows, events are routed to different channels according to the value of the type header.
Case 12
Flume Sink Processors
With the failover sink processor, events always go to a single sink; when that sink becomes unavailable, they are automatically sent to the next sink.
Create the Flume_Sink_Processors configuration file:
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# The key to configuring failover is a sink group
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The processor type is failover
a1.sinkgroups.g1.processor.type = failover
# Priorities: higher numbers mean higher priority; each sink must have a distinct priority
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Set to 10 seconds here; tune it to your situation
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
Create the Flume_Sink_Processors_avro configuration file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Copy both files to node 247 and change the host addresses accordingly:
scp Flume_Sink_Processors* root@192.168.1.247:/usr/local/test/flume/
Open four windows and start the two agents on each host:
flume-ng agent -n a1 -c ../conf -f Flume_Sink_Processors_avro.conf -Dflume.root.logger=DEBUG,console
flume-ng agent -n a1 -c ../conf -f Flume_Sink_Processors.conf -Dflume.root.logger=DEBUG,console
Generate a test log entry:
echo "idoall.org test1 failover" | nc 192.168.1.246 5140
Because 247 has the higher priority, the event shows up in the sink window on 247:
18/08/02 15:47:44 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
Now stop the sink agent on 247 (Ctrl+C) and send test data again:
echo "idoall.org test1 failover" | nc 192.168.1.246 5140
This time the event appears in the sink log on 246:
18/08/02 15:51:23 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
Case 13
Load balancing Sink Processor
Unlike failover, the load-balancing processor offers two selection mechanisms: round-robin and random (switching between them is shown after the configuration). In either mode, if the selected sink is unavailable, the processor automatically tries the next available sink.
Create the Load_balancing_Sink_Processors configuration file:
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1

# The key to configuring load balancing is a sink group
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 192.168.1.246
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = 192.168.1.247
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
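To select sinks at random instead of in round-robin order, only the selector line should need to change:
a1.sinkgroups.g1.processor.selector = random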
Create the Load_balancing_Sink_Processors_avro configuration file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Copy both files to node 247 and change the IP addresses:
scp Load_balancing_Sink_Processors* root@192.168.1.247:/usr/local/test/flume/
Open four windows and start the four agents (the avro receiver command is shown here; the sending agent is started analogously, as sketched below):
flume-ng agent -n a1 -c ../conf -f Load_balancing_Sink_Processors_avro.conf -Dflume.root.logger=DEBUG,console
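By analogy with the earlier fan-out cases, the sending agent on 246 would presumably be started with the other configuration file:
flume-ng agent -n a1 -c ../conf -f Load_balancing_Sink_Processors.conf -Dflume.root.logger=DEBUG,console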
Generate test log entries:
[root@node-246 ~]# echo "idoall.org test1" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test2" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test3" | nc 192.168.1.246 5140
[root@node-246 ~]# echo "idoall.org test4" | nc 192.168.1.246 5140
Log on 247:
18/08/02 18:36:20 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
18/08/02 18:36:35 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2 }
18/08/02 18:36:58 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34 idoall.org test4 }
Log on 246:
18/08/02 18:36:47 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33 idoall.org test3 }
Case 14
HBase sink
Copy the HBase client jars from the HBase lib directory into the Flume lib directory (the exact file names depend on your HBase distribution; the list below is for HBase 0.96.2, while the command that follows copies the HDP 2.6.1 equivalents):
protobuf-java-2.5.0.jar
hbase-client-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2.jar
hbase-protocol-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2.jar
hbase-hadoop2-compat-0.96.2-hadoop2.jar
hbase-hadoop-compat-0.96.2-hadoop2.jar
htrace-core-2.04.jar

cp protobuf-java-2.5.0.jar hbase-client-1.1.2.2.6.1.0-129.jar hbase-common-1.1.2.2.6.1.0-129.jar hbase-protocol-1.1.2.2.6.1.0-129.jar hbase-server-1.1.2.2.6.1.0-129.jar hbase-hadoop2-compat-1.1.2.2.6.1.0-129.jar hbase-hadoop-compat-1.1.2.2.6.1.0-129.jar htrace-core-3.1.0-incubating.jar /usr/hdp/2.6.1.0-129/flume/lib/
Create an HBase table named flume_test with a column family called name:
hbase(main):003:0> create 'flume_test', 'name'
0 row(s) in 2.3900 seconds
=> Hbase::Table - flume_test
Create the agent configuration file hbase_simple:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = 192.168.1.246
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = flume_test
a1.sinks.k1.columnFamily = name
a1.sinks.k1.column = message
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent -n a1 -c ../conf -f hbase_simple.conf -Dflume.root.logger=DEBUG,console
Generate a test log entry:
echo "hello zzz.org from flume" | nc 192.168.1.246 5140
Agent log:
18/08/03 10:01:07 WARN source.SyslogUtils: Event created from Invalid Syslog data.
Check HBase:
hbase(main):006:0> scan 'flume_test'
ROW                          COLUMN+CELL
 1533261667472-IamY4IbgS7-0  column=name:payload, timestamp=1533261670851, value=hello zzz.org from flume
1 row(s) in 0.1130 seconds
Case 15
Collecting application (platform) logs with the Flume Avro source
The agent configuration is as follows; collected events are written straight to HDFS:
[root@node-246 flume]# cat avro_tag.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.1.246
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node-231:8020/user/hdfs/flume/avro_tag/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Dependencies the application needs to pull in:
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-flume-ng</artifactId>
    <version>${log4j.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.flume.flume-ng-clients</groupId>
    <artifactId>flume-ng-log4jappender</artifactId>
    <version>1.8.0</version>
</dependency>
<!-- log4j-core -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>${log4j.version}</version>
</dependency>
<!-- log4j-api -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>${log4j.version}</version>
</dependency>
<!-- log4j-web -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-web</artifactId>
    <version>${log4j.version}</version>
</dependency>
log4j2.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Log levels in order of priority: OFF > FATAL > ERROR > WARN > INFO > DEBUG > TRACE > ALL -->
<!-- The status attribute controls log4j2's own internal logging; setting it to trace shows detailed internal output -->
<!-- monitorInterval: log4j2 re-reads the configuration file automatically at this interval, in seconds -->
<configuration status="INFO" monitorInterval="30">
    <properties>
        <property name="LOG_HOME">../logs</property>
        <property name="TMP_LOG_FILE_NAME">tmp</property>
        <property name="INFO_LOG_FILE_NAME">info</property>
        <property name="WARN_LOG_FILE_NAME">warn</property>
        <property name="ERROR_LOG_FILE_NAME">error</property>
    </properties>
    <!-- Define all appenders first -->
    <appenders>
        <!-- Console output -->
        <console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
        </console>
        <!-- Receives everything; cleared on every run (append="false"), handy for ad-hoc testing -->
        <File name="log" fileName="${LOG_HOME}/${TMP_LOG_FILE_NAME}.log" append="false">
            <PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level %class{36} %L %M - %msg%xEx%n"/>
        </File>
        <!-- Once a file exceeds the configured size it is rolled into a dated, numbered archive file -->
        <RollingFile name="RollingFileInfo" fileName="${LOG_HOME}/${INFO_LOG_FILE_NAME}.log"
                     filePattern="${LOG_HOME}/${INFO_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
            <!-- Accept events at this level and above (onMatch); reject everything else (onMismatch) -->
            <ThresholdFilter level="info" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
        </RollingFile>
        <RollingFile name="RollingFileWarn" fileName="${LOG_HOME}/${WARN_LOG_FILE_NAME}.log"
                     filePattern="${LOG_HOME}/${WARN_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
            <ThresholdFilter level="warn" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
            <!-- DefaultRolloverStrategy defaults to at most 7 archive files per folder; raised to 20 here -->
            <DefaultRolloverStrategy max="20"/>
        </RollingFile>
        <RollingFile name="RollingFileError" fileName="${LOG_HOME}/${ERROR_LOG_FILE_NAME}.log"
                     filePattern="${LOG_HOME}/${ERROR_LOG_FILE_NAME}-%d{yyyy-MM-dd}-%i.log">
            <ThresholdFilter level="error" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
        </RollingFile>
        <!-- Flume appender -->
        <Flume name="FlumeAppender" compress="true">
            <Agent host="192.168.1.246" port="44444"/>
            <!-- <RFC5424Layout charset="UTF-8" enterpriseNumber="18060" includeMDC="true" appName="myapp"/> -->
            <PatternLayout charset="GBK" pattern="[%d{HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
        </Flume>
    </appenders>
    <!-- An appender only takes effect when a logger references it -->
    <loggers>
        <!-- Filter out noisy DEBUG output from Spring and MyBatis -->
        <logger name="org.springframework" level="INFO"></logger>
        <logger name="org.mybatis" level="INFO"></logger>
        <!-- <Logger name="sysLog" level="trace"> <AppenderRef ref="FlumeAppender"/> </Logger> -->
        <root level="info">
            <appender-ref ref="Console"/>
            <appender-ref ref="RollingFileInfo"/>
            <appender-ref ref="RollingFileWarn"/>
            <appender-ref ref="RollingFileError"/>
            <!-- Ship log events to the Flume source -->
            <appenderRef ref="FlumeAppender"/>
        </root>
    </loggers>
</configuration>
Start the agent:
flume-ng agent -n a1 -c ../conf -f avro_tag.conf -Dflume.root.logger=DEBUG,console
Once the application starts, its log output is shipped through Flume and written to HDFS (this can be verified as shown below).
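As in the earlier HDFS cases, a simple way to confirm that events are arriving is to list the sink's output directory:
hadoop fs -ls /user/hdfs/flume/avro_tag/dataoutput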