Flume
Step 1. Download the tar package from the Apache website.
Step 2. Extract the archive and set the environment variables.
vim /etc/profile
export FLUME_HOME=/home/hadoop/flume1.4
export PATH=$PATH:$FLUME_HOME/bin
source /etc/profile
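The profile edit can also be scripted so that running it twice does not duplicate the lines. This is only a sketch: the function name add_flume_env is made up for illustration, and writing to /etc/profile needs root.

```shell
#!/bin/sh
# Append the Flume environment variables to a profile file, once.
# add_flume_env is a hypothetical helper name, not part of Flume.
# Usage: add_flume_env PROFILE_FILE FLUME_HOME_DIR
add_flume_env() {
    profile=$1
    flume_home=$2
    # Do nothing if FLUME_HOME is already exported in this file.
    if grep -q '^export FLUME_HOME=' "$profile" 2>/dev/null; then
        return 0
    fi
    {
        printf 'export FLUME_HOME=%s\n' "$flume_home"
        # Single quotes keep $PATH and $FLUME_HOME literal in the file.
        printf 'export PATH=$PATH:$FLUME_HOME/bin\n'
    } >> "$profile"
}

# Example (as root):
#   add_flume_env /etc/profile /home/hadoop/flume1.4
#   . /etc/profile
```

After sourcing the profile, "flume-ng version" should print the installed version if PATH is set correctly.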
Step 3. Start an agent, then send the files in a directory to its Avro source.
flume-ng agent -n agent1 -f confs/avrotest.conf
flume-ng avro-client -H namenode -p 55555 -F /home/hadoop/data/xml/*.*
flume-ng avro-client -H namenode -p 55555 -F /home/hadoop/data/xml/*.zip
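Note that the shell expands the glob before flume-ng runs, so the client receives "-F first-file second-file ...". Assuming -F takes a single file (that is my reading of the option, not something verified here), a loop that calls the client once per file is safer. The function name send_files and the FLUME_NG override are made up for this sketch.

```shell
#!/bin/sh
# Send each file to an Avro source with its own avro-client call.
# send_files is a hypothetical helper; FLUME_NG may be overridden
# (for example with "echo") to do a dry run without Flume installed.
# Usage: send_files HOST PORT FILE...
send_files() {
    host=$1
    port=$2
    shift 2
    for f in "$@"; do
        # Skip unmatched glob patterns and anything that is not a regular file.
        [ -f "$f" ] || continue
        "${FLUME_NG:-flume-ng}" avro-client -H "$host" -p "$port" -F "$f" || return 1
    done
}

# Example:
#   send_files namenode 55555 /home/hadoop/data/xml/*.zip
```

The loop stops at the first failed transfer, so a file that fails is not silently skipped.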
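Transfer process: first send the data in Avro form, then use an agent with an avro source and an HDFS sink.
Start the Flume agent that uses the HDFS config first, then start the agent that uses the avro source config.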
flume-ng agent -n agent2 -f $FLUME_HOME/confs/avrosink.conf
flume-ng agent -n agent3 -f $FLUME_HOME/confs/hdfssink.conf
flume-ng agent -n agent1 -f $FLUME_HOME/confs/avrotest.conf
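The contents of hdfssink.conf are not recorded in these notes. As a rough idea of what an avro source feeding an HDFS sink looks like, the sketch below writes a minimal config. The component names (avroIn, memCh, hdfsOut), the HDFS path and the helper name write_hdfs_conf are assumptions; only the agent name agent3 and port 55555 come from the commands above.

```shell
#!/bin/sh
# Write a minimal "avro source -> memory channel -> HDFS sink" config.
# write_hdfs_conf is a hypothetical helper, not part of Flume.
# Usage: write_hdfs_conf OUTPUT_FILE
write_hdfs_conf() {
    cat > "$1" <<'EOF'
# The agent name must match the -n argument (agent3 above).
agent3.sources = avroIn
agent3.channels = memCh
agent3.sinks = hdfsOut

# Avro source listening on the port the avro-client sends to.
agent3.sources.avroIn.type = avro
agent3.sources.avroIn.bind = 0.0.0.0
agent3.sources.avroIn.port = 55555
agent3.sources.avroIn.channels = memCh

# In-memory buffer between source and sink.
agent3.channels.memCh.type = memory
agent3.channels.memCh.capacity = 1000

# HDFS sink; the path is a placeholder, adjust it to the cluster.
agent3.sinks.hdfsOut.type = hdfs
agent3.sinks.hdfsOut.hdfs.path = hdfs://namenode/flume/events
agent3.sinks.hdfsOut.hdfs.fileType = DataStream
agent3.sinks.hdfsOut.channel = memCh
EOF
}

# Example:
#   write_hdfs_conf "$FLUME_HOME/confs/hdfssink.conf"
```

Starting this receiving agent before any sender means the Avro port is already open when the first events arrive.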
./flume-ng agent -n agent1 -f /home/yaxiaohu/flumeconf/flume-dest.conf
./flume-ng agent -n agent -f /home/yaxiaohu/flumeconf/flume-source.conf
./flume-ng agent -n agent-1 -f /home/yaxiaohu/flumeconf/evantest.conf
Actually it might be tricky to use the directory spooling source to read a
compressed archive. It's possible, but you would definitely need to write
your own deserializer.
Flume is an event-oriented streaming system; it's not really optimized to
be a plain file transfer mechanism like FTP.