Flume: the HTTP source
1. Flume is a data-collection tool for big-data platforms; its sources, channels, and sinks can be combined to suit different scenarios. For a detailed breakdown see https://www.cnblogs.com/zhangyinhua/p/7803486.html#_lab2_2_3 and the official user guide: http://flume.apache.org/FlumeUserGuide.html.
Users can also develop custom interceptors for simple data processing while events are in transit.
2. Scenario 1: http-memory-logger
a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=http
a1.sources.r1.bind=duan140
a1.sources.r1.port=50000
a1.sources.r1.channels=c1
a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100
Start the agent: flume-ng agent -f /root/bigdata/http_test.conf -n a1
Test: in another terminal, run:
curl -X POST -d '[{"headers":{"h1":"v1","h2":"v2"},"body":"hello body"}]' http://duan140:50000
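Besides curl, events can also be posted from a script. Below is a minimal Python sketch (helper names are my own; it assumes the agent above is listening on duan140:50000) that builds the JSON array the HTTP source's default JSONHandler expects, i.e. a list of objects each with "headers" and "body":

```python
import json
from urllib import request

def build_events(bodies, headers=None):
    """Build the JSON array the Flume HTTP source's default JSONHandler
    expects: a list of {"headers": {...}, "body": "..."} objects."""
    return [{"headers": headers or {}, "body": b} for b in bodies]

def post_events(url, events):
    """POST a batch of events to a running HTTP source; returns the HTTP status."""
    data = json.dumps(events).encode("utf-8")
    req = request.Request(url, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.status

# Requires the agent from above to be running:
# post_events("http://duan140:50000",
#             build_events(["hello body"], {"h1": "v1", "h2": "v2"}))
```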
3. Scenario 2: http-file-hdfs
#set name
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1
#link sources and sinks
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
#set sources
agent1.sources.source1.type=http
agent1.sources.source1.bind=duan140
agent1.sources.source1.port=50000
#set sinks (required in this example)
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/duan/http/%Y%m%d
#not necessary set in this example
agent1.sinks.sink1.hdfs.filePrefix = duan
agent1.sinks.sink1.hdfs.fileSuffix = .log
#By default, Flume rolls the written file every 30 s, every 10 events, or every 1024 bytes.
#To roll every 100 MB instead:
agent1.sinks.sink1.hdfs.rollInterval=0
agent1.sinks.sink1.hdfs.rollCount=0
agent1.sinks.sink1.hdfs.rollSize=104857600
#fileType defaults to SequenceFile; DataStream writes the events as plain, uncompressed text
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.kerberosPrincipal = duan@HADOOP.COM
agent1.sinks.sink1.hdfs.kerberosKeytab = /tmp/keytab/duan.keytab
#the following settings round the timestamp down to the nearest 10 minutes
agent1.sinks.sink1.hdfs.round = true
agent1.sinks.sink1.hdfs.roundValue = 10
agent1.sinks.sink1.hdfs.roundUnit = minute
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
#Request a replication factor of 1. This is only a hint; the effective value still depends on the HDFS configuration. Often suggested online as a fix for the small-files problem, and it took effect in our test.
agent1.sinks.sink1.hdfs.minBlockReplicas = 1
#set channels
agent1.channels.channel1.type = file
agent1.channels.channel1.checkpointDir=/root/bigdata/flume/checkpoint
agent1.channels.channel1.dataDirs=/root/bigdata/flume/data
#agent1.channels.channel1.type=memory
agent1.channels.channel1.capacity=100000
#agent1.channels.channel1.transactionCapacity=100
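As a worked illustration of the round/roundValue/roundUnit settings above, the sketch below (pure illustration, not Flume's internal code) rounds an event timestamp down to the nearest 10 minutes and then resolves the %Y%m%d escape used in hdfs.path:

```python
from datetime import datetime

def round_down_minutes(ts: datetime, step: int) -> datetime:
    """Round a timestamp down to the nearest `step` minutes,
    mimicking hdfs.round=true / roundValue / roundUnit=minute."""
    return ts.replace(minute=ts.minute - ts.minute % step,
                      second=0, microsecond=0)

ts = datetime(2018, 12, 13, 16, 27, 54)
rounded = round_down_minutes(ts, 10)
path = rounded.strftime("/user/duan/http/%Y%m%d")
print(rounded, path)  # 2018-12-13 16:20:00 /user/duan/http/20181213
```

Note that with useLocalTimeStamp=true Flume takes the agent's local clock rather than a timestamp header on the event.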
Start: flume-ng agent -f /root/bigdata/http_hdfs.conf -n agent1
Test:
curl -X POST -d '[{"headers": {"timestamp": "434324343","host": "random_host.example.com"},"body": "random_body"}]' http://duan140:50000
The following error appeared:
18/12/13 16:27:54 WARN hdfs.HDFSEventSink: HDFS IO error
java.io.IOException: File type SequenceFile #文件格式,不压缩 not supported
Fix: do not append an inline # comment after a property value; Flume treats it as part of the value, as the error message shows.
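The failure can be reproduced outside Flume: a Java-properties-style parser keeps everything after the = as the value, so a trailing comment corrupts it. A small sketch (the comment text here is hypothetical):

```python
# A config line with an inline comment after the value:
line = "agent1.sinks.sink1.hdfs.fileType = DataStream #file format, uncompressed"

# Splitting on "=" the way a properties parser does keeps the comment
# inside the value, so the sink never sees a plain "DataStream".
key, _, value = line.partition("=")
print(repr(value.strip()))  # 'DataStream #file format, uncompressed'
```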