Running Hadoop's Built-in WordCount Example
The WordCount example that ships with Hadoop counts the number of occurrences of each word across a set of text files.
The steps below walk through running the WordCount example.
1. Start Hadoop
[root@hadoop ~]# start-all.sh    # start Hadoop
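Before moving on, it is worth verifying that the daemons actually came up. A quick check, assuming the JDK's jps utility is on the PATH (the exact process list depends on your cluster layout):

[root@hadoop ~]# jps    # on a single-node setup, expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager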
2. Create a local directory and two test files
[root@hadoop ~]# mkdir input
[root@hadoop ~]# cd input/
[root@hadoop input]# echo "hello world">test1.txt    # create two test files
[root@hadoop input]# echo "hello hadoop">test2.txt
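As a quick sanity check before uploading, you can print both files; the output below follows directly from the two echo commands:

[root@hadoop input]# cat test1.txt test2.txt
hello world
hello hadoop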
3. Copy the input directory from the local file system to the HDFS root directory, renaming it to in
[root@hadoop ~]# hdfs dfs -put input/ /in
[root@hadoop ~]# hdfs dfs -ls /    # list the HDFS root directory
Found 1 items
drwxr-xr-x   - root supergroup          0 2018-07-20 03:06 /in
[root@hadoop ~]# hdfs dfs -ls /in    # list the contents of /in
Found 2 items
-rw-r--r--   1 root supergroup         12 2018-07-20 03:06 /in/test1.txt
-rw-r--r--   1 root supergroup         13 2018-07-20 03:06 /in/test2.txt
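To confirm the file contents were uploaded intact, you can also read them straight from HDFS; given the two files created above, this prints:

[root@hadoop ~]# hdfs dfs -cat /in/*
hello world
hello hadoop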
4. Run the following commands
[root@hadoop ~]# cd /usr/local/hadoop/share/hadoop/mapreduce/    # the example jar lives in this directory
[root@hadoop mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.7.jar wordcount /in /out    # /out is the output directory; it must not already exist, otherwise the job fails
18/07/30 14:02:11 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.42.133:8032
18/07/30 14:02:13 INFO input.FileInputFormat: Total input paths to process : 2
18/07/30 14:02:13 INFO mapreduce.JobSubmitter: number of splits:2
18/07/30 14:02:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532913019648_0002
18/07/30 14:02:14 INFO impl.YarnClientImpl: Submitted application application_1532913019648_0002
18/07/30 14:02:14 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1532913019648_0002/
18/07/30 14:02:14 INFO mapreduce.Job: Running job: job_1532913019648_0002
18/07/30 14:02:36 INFO mapreduce.Job: Job job_1532913019648_0002 running in uber mode : false
18/07/30 14:02:36 INFO mapreduce.Job:  map 0% reduce 0%
18/07/30 14:04:37 INFO mapreduce.Job:  map 67% reduce 0%
18/07/30 14:04:42 INFO mapreduce.Job:  map 100% reduce 0%
18/07/30 14:05:21 INFO mapreduce.Job:  map 100% reduce 100%
18/07/30 14:05:23 INFO mapreduce.Job: Job job_1532913019648_0002 completed successfully
18/07/30 14:05:26 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=55
		FILE: Number of bytes written=368074
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=217
		HDFS: Number of bytes written=25
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=259093
		Total time spent by all reduces in occupied slots (ms)=21736
		Total time spent by all map tasks (ms)=259093
		Total time spent by all reduce tasks (ms)=21736
		Total vcore-milliseconds taken by all map tasks=259093
		Total vcore-milliseconds taken by all reduce tasks=21736
		Total megabyte-milliseconds taken by all map tasks=265311232
		Total megabyte-milliseconds taken by all reduce tasks=22257664
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=41
		Map output materialized bytes=61
		Input split bytes=192
		Combine input records=4
		Combine output records=4
		Reduce input groups=3
		Reduce shuffle bytes=61
		Reduce input records=4
		Reduce output records=3
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=847
		CPU time spent (ms)=4390
		Physical memory (bytes) snapshot=461631488
		Virtual memory (bytes) snapshot=6226669568
		Total committed heap usage (bytes)=277356544
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=25
	File Output Format Counters
		Bytes Written=25
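Since the job refuses to start when the output directory already exists, remove the old directory before re-running the example (note this deletes the previous results; they are only recoverable if HDFS trash is enabled):

[root@hadoop mapreduce]# hdfs dfs -rm -r /out    # clear the previous output before re-running the job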
5. View the output
1) View the output files directly on HDFS
[root@hadoop mapreduce]# hdfs dfs -ls /out
Found 2 items
-rw-r--r--   1 root supergroup          0 2018-07-30 14:05 /out/_SUCCESS
-rw-r--r--   1 root supergroup         25 2018-07-30 14:05 /out/part-r-00000
[root@hadoop mapreduce]# hdfs dfs -cat /out/part-r-00000
hadoop	1
hello	2
world	1
2) Alternatively, view the result with the following command
[root@hadoop mapreduce]# hdfs dfs -cat /out/*
hadoop	1
hello	2
world	1
3) Or copy the files to the local file system and view them there
[root@hadoop mapreduce]# hdfs dfs -get /out /root/output
[root@hadoop mapreduce]# cd /root/output/
[root@hadoop output]# ll
total 4
-rw-r--r-- 1 root root 25 Jul 30 17:18 part-r-00000
-rw-r--r-- 1 root root  0 Jul 30 17:18 _SUCCESS
[root@hadoop output]# cat part-r-00000
hadoop	1
hello	2
world	1
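A related option not shown above is getmerge, which concatenates every file under the output directory into a single local file; the target path /root/wordcount.txt here is just an example name. With a single reducer, the merged file is identical to part-r-00000 (_SUCCESS is empty and contributes nothing):

[root@hadoop output]# hdfs dfs -getmerge /out /root/wordcount.txt
[root@hadoop output]# cat /root/wordcount.txt
hadoop	1
hello	2
world	1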