Flume HTTP source

1. Flume is a data collection tool for big data platforms; different sources, channels, and sinks can be combined to fit the application scenario. For a detailed breakdown of the available types, see https://www.cnblogs.com/zhangyinhua/p/7803486.html#_lab2_2_3 and the official user guide: http://flume.apache.org/FlumeUserGuide.html.

Users can also develop custom interceptors to do simple data processing on events as they pass through Flume.
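A custom interceptor only has to implement the org.apache.flume.interceptor.Interceptor interface. Below is a minimal sketch (the class name AddHeaderInterceptor and the header key are hypothetical) of an interceptor that stamps an extra header onto every event:

import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

// Hypothetical example: add a fixed header to every event that passes through.
public class AddHeaderInterceptor implements Interceptor {

    @Override
    public void initialize() {
        // no setup needed for this sketch
    }

    @Override
    public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        headers.put("processed", "true");   // simple per-event processing
        return event;                       // returning null would drop the event
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
            intercept(e);
        }
        return events;
    }

    @Override
    public void close() {
        // nothing to release
    }

    // Flume creates interceptor instances through a Builder.
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new AddHeaderInterceptor();
        }

        @Override
        public void configure(Context context) {
            // read custom properties from the agent config here if needed
        }
    }
}

The compiled class goes on the Flume classpath and is wired into a source with, e.g., a1.sources.r1.interceptors = i1 and a1.sources.r1.interceptors.i1.type = <fully qualified class name>$Builder.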

 

2. Scenario 1: http-memory-logger

a1.sources=r1
a1.sinks=k1
a1.channels=c1
 
a1.sources.r1.type=http
a1.sources.r1.bind=duan140
a1.sources.r1.port=50000
a1.sources.r1.channels=c1
 
a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1
 
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100

Start the agent: flume-ng agent -f /root/bigdata/http_test.conf -n a1
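To see the logger sink's output directly in the console, the agent can also be started with the console logger enabled (the -c conf directory below is an assumption about the local install):

flume-ng agent -c /root/bigdata/apache-flume/conf -f /root/bigdata/http_test.conf -n a1 -Dflume.root.logger=INFO,console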

Test: in another terminal window, run: curl -X POST -d'[{"headers":{"h1":"v1","h2":"v2"},"body":"hello body"}]' http://duan140:50000
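The HTTP source uses JSONHandler by default, which expects a JSON array of events, each with a "headers" map and a "body" string, so several events can be posted in a single request, for example:

curl -X POST -d'[{"headers":{"h1":"v1"},"body":"event one"},{"headers":{"h1":"v2"},"body":"event two"}]' http://duan140:50000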

3. Scenario 2: http-file-hdfs

#set name
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1

#link sources and sinks
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

#set sources
agent1.sources.source1.type=http
agent1.sources.source1.bind=duan140
agent1.sources.source1.port=50000

#set sinks - settings required in this example

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/duan/http/%Y%m%d

#optional settings in this example
agent1.sinks.sink1.hdfs.filePrefix = duan
agent1.sinks.sink1.hdfs.fileSuffix = .log

#By default, Flume rolls to a new file every 30 seconds, 10 events, or 1024 bytes, whichever comes first.
#To roll every 100 MB instead:
agent1.sinks.sink1.hdfs.rollInterval=0
agent1.sinks.sink1.hdfs.rollCount=0
agent1.sinks.sink1.hdfs.rollSize=104857600

#fileType defaults to SequenceFile; the Kerberos properties below default to empty

agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.kerberosPrincipal = duan@HADOOP.COM
agent1.sinks.sink1.hdfs.kerberosKeytab = /tmp/keytab/duan.keytab

#The settings below round the timestamp down to the nearest 10 minutes.
agent1.sinks.sink1.hdfs.round = true
agent1.sinks.sink1.hdfs.roundValue = 10
agent1.sinks.sink1.hdfs.roundUnit = minute

agent1.sinks.sink1.hdfs.useLocalTimeStamp = true

#Set the block replication hint to 1. By itself this is only a hint; the effective replication still depends on the HDFS configuration. It is commonly suggested online as a fix for the small-files problem, and it took effect in this test.

agent1.sinks.sink1.hdfs.minBlockReplicas = 1


#set channels
agent1.channels.channel1.type = file
agent1.channels.channel1.checkpointDir=/root/bigdata/flume/checkpoint
agent1.channels.channel1.dataDirs=/root/bigdata/flume/data

#agent1.channels.channel1.type=memory
agent1.channels.channel1.capacity=100000
#agent1.channels.channel1.transactionCapacity=100
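Note that the %Y%m%d escape in hdfs.path needs a timestamp, supplied either by useLocalTimeStamp=true or by a timestamp header on the event (the test below sends one). For example, an event received on 2018-12-13 ends up under a directory like:

/user/duan/http/20181213/

with file names built from the duan prefix and .log suffix configured above. Because the path only goes down to the day, the 10-minute rounding configured above has no visible effect here; it would only matter if the path also contained %H%M.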

Start: flume-ng agent -f /root/bigdata/http_hdfs.conf -n agent1

Test:

curl -X POST -d'[{"headers": {"timestamp": "434324343","host": "random_host.example.com"},"body": "random_body"}]' http://duan140:50000

The following error appeared:

18/12/13 16:27:54 WARN hdfs.HDFSEventSink: HDFS IO error
java.io.IOException: File type SequenceFile #文件格式,不压缩 not supported 

Fix: do not append a # comment after a property value on the same line; Flume reads everything after the = (including the comment) as the value.
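Judging from the error message, the failing line presumably looked like the first line below (the trailing Chinese comment "文件格式,不压缩" means "file format, not compressed"); moving the comment to its own line fixes it:

#wrong - the trailing comment becomes part of the property value
agent1.sinks.sink1.hdfs.fileType = SequenceFile #文件格式,不压缩

#right - comment on its own line, value is clean
#文件格式,不压缩
agent1.sinks.sink1.hdfs.fileType = SequenceFile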

 
