06_Flume_interceptor_timestamp+Host
1. Target scenario
2. Flume agent configuration file
# 01 define agent name and source/sink/channel names
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# 02 source: http, JSONHandler
a1.sources.r1.type = http
a1.sources.r1.bind = master
a1.sources.r1.port = 6666
a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler

# 03 timestamp and host interceptors, applied to events from the source before they reach the channel
# the two interceptors are chained and act on each event in order
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i1.preserveExisting = false
# the host interceptor adds "hostname": <actual host name> to the flume event header
a1.sources.r1.interceptors.i2.type = host
# hostHeader sets the header key; the value is filled with the host name of the node running the flume agent
a1.sources.r1.interceptors.i2.hostHeader = hostname
# IP or host name, choose one
a1.sources.r1.interceptors.i2.useIP = false

# 04 hdfs sink
a1.sinks.k1.type = hdfs
# the hdfs sink replaces %Y-%m-%d based on the timestamp in the event header
a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/%Y-%m-%d/
# must match the hostHeader value: the hdfs sink reads the header whose key is hostname
# and uses its value as the file name prefix
a1.sinks.k1.hdfs.filePrefix = %{hostname}
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollCount = 10
a1.sinks.k1.hdfs.rollSize = 1024000

# channel, memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind source and sink to channel
a1.sinks.k1.channel = c1
a1.sources.r1.channels = c1
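To make the relationship between the event headers and the HDFS output concrete, here is a minimal Python sketch of how the escape sequences in hdfs.path and hdfs.filePrefix would resolve. It is only an approximation for illustration, not Flume's implementation: the function name resolve_hdfs_target and the sample header values are hypothetical, and UTC is assumed for the date, whereas the HDFS sink uses the agent's local time zone unless hdfs.timeZone is set.

```python
from datetime import datetime, timezone


def resolve_hdfs_target(headers, path_pattern, prefix_pattern, tz=timezone.utc):
    """Approximate the directory and file prefix the hdfs sink would derive.

    Hypothetical helper for illustration only; tz=UTC is an assumption.
    """
    # The timestamp interceptor stores epoch milliseconds as a string.
    ts = datetime.fromtimestamp(int(headers["timestamp"]) / 1000, tz=tz)
    # %Y-%m-%d in the path is filled in from the timestamp header.
    directory = ts.strftime(path_pattern)
    # %{key} in the prefix is replaced by the value of the header with that key.
    prefix = prefix_pattern
    for key, value in headers.items():
        prefix = prefix.replace("%{" + key + "}", value)
    return directory, prefix


# Sample headers as the two interceptors might produce them (values are made up).
sample_headers = {"timestamp": "1700000000000", "hostname": "master"}
directory, prefix = resolve_hdfs_target(
    sample_headers,
    "hdfs://master:9000/flume/%Y-%m-%d/",
    "%{hostname}",
)
print(directory)  # hdfs://master:9000/flume/2023-11-14/
print(prefix)     # master
```

If the hostname header were missing, or if hostHeader and the key inside %{...} did not match, the prefix would not be replaced as intended, which is why the two settings have to agree.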
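3. Verifying the timestamp + host interceptors
Verification approach:
1) First print the events, after the interceptors have acted on them, to the console through a logger sink, and check that the headers were added correctly
2) Change the sink to hdfs and check that the directory and file names are created as expected (timestamp for the directory, hostname for the file prefix)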
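Verification steps:
1) Send an HTTP request with empty headers, let the logger sink print the event to the terminal, and check whether timestamp and hostname were added to the event header
2) In the event that logger prints to the console, the headers have changed
3) Change the sink to hdfs and check the HDFS directory name (timestamp) and file prefix (hostname)
* The directory name is replaced correctly (based on the timestamp in the event header)
* The file prefix is replaced correctly (based on hostname in the event header: the actual host name)
* The file content is written as the event body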
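The test request in step 1) can be sent with any HTTP client. The following is a minimal Python sketch, assuming the agent is reachable at http://master:6666 as in the configuration above; the names FLUME_URL, build_payload and send_events are hypothetical helpers, not part of Flume. JSONHandler expects a JSON array of events, each with a "headers" object and a "body" string, so the headers are deliberately left empty here and the interceptors are expected to fill them in.

```python
import json
import urllib.request

# Assumed endpoint, taken from bind = master and port = 6666 in the agent config.
FLUME_URL = "http://master:6666"


def build_payload(bodies):
    """Build the JSON array that JSONHandler accepts, with empty headers."""
    return json.dumps([{"headers": {}, "body": body} for body in bodies])


def send_events(bodies, url=FLUME_URL):
    """POST the events to the HTTP source and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(bodies).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Usage, on a machine that can reach the agent:
# send_events(["hello flume"])
```

With rollCount = 10 in the hdfs sink, sending ten events should be enough to make the sink roll a file, which makes the directory name and file prefix easier to inspect.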