在linux上一行代码不用写实现自动采集+hadoop分词

在linux上一行代码不用写实现自动采集+hadoop分词

将下面的shell脚本保存成到xxx.sh,然后执行即可大笑

cd /opt/hadoop

mkdir spider
wget -O spider/test.html "http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html"  
hadoop fs -mkdir /spider
hadoop fs -put spider/test.html /spider

hadoop jar share/hadoop/mapreduce/wordcount.jar wordcount.wordcount /spider/test.html /fenci2


执行结果如下:


posted @ 2016-12-06 16:57  chinacloudy  阅读(110)  评论(0编辑  收藏  举报