MapReduce_wordcount
Test data:
[hadoop@h201 mapreduce]$ more counttext.txt
hello mama
hello baba
hello word
cai wen wei
mama baba jiejie gege
gege jiejie didi
meimei jiejie
didi mama
ayi shushu
ayi mama
hello mama
hello baba
hello word
cai wen wei
mama baba jiejie gege
gege jiejie didi
meimei jiejie
didi mama
ayi shushu
ayi mama
hello mama
hello baba
hello word
cai wen wei
mama baba jiejie gege
gege jiejie didi
meimei jiejie
didi mama
ayi shushu
ayi mama
hello mama
hello baba
hello word
cai wen wei
mama baba jiejie gege
gege jiejie didi
meimei jiejie
didi mama
ayi shushu
ayi mama
hello mama
hello baba
hello word
cai wen wei
mama baba jiejie gege
gege jiejie didi
meimei jiejie
didi mama
ayi shushu
ayi mama
vim WordCount2.java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Kept in the default package so that "hadoop jar wc1.jar WordCount2" can find the class.
public class WordCount2 {
    // Fixed HDFS paths, used only by the commented-out lines in main().
    private static final String INPUT_PATH = "hdfs://h201:9000/user/hadoop/counttext.txt";
    private static final String OUTPUT_PATH = "hdfs://h201:9000/user/hadoop/output";

    // Mapper: emits (word, 1) for every space-separated token of an input line.
    public static class WordCount2Mapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] words = value.toString().split(" ");
            for (String str : words) {
                word.set(str);
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts received for each word.
    public static class WordCount2Reducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable val : values) {
                // Add the value instead of counting elements, so the total stays correct
                // if a combiner ever pre-aggregates the map output.
                total += val.get();
            }
            context.write(key, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "mapred.jar" is the old property name; the job log suggests mapreduce.job.jar instead.
        conf.set("mapred.jar", "wc1.jar");
        // This constructor is deprecated; Job.getInstance(conf, "wordcount") is the replacement.
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount2.class);
        job.setMapperClass(WordCount2Mapper.class);
        job.setReducerClass(WordCount2Reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] = input path, args[1] = output path (must not exist yet).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // FileInputFormat.addInputPath(job, new Path(INPUT_PATH));  // addInputPaths accepts multiple paths
        // FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
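The data flow of WordCount2 can be traced without a cluster. The following is only a rough sketch in plain Java (Java 7 syntax, no Hadoop classes): the class LocalWordCount and its methods are hypothetical names made up for this illustration, and the shuffle is approximated with a sorted in-memory map rather than Hadoop's real sort and merge.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the job: replays map -> shuffle -> reduce in memory.
class LocalWordCount {

    // Map step: one (word, 1) pair per space-separated token, like WordCount2Mapper.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String token : line.split(" ")) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(token, 1));
        }
        return pairs;
    }

    // Shuffle step (approximation): group values by key; TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = groups.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                groups.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return groups;
    }

    // Reduce step: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Runs the three steps over all input lines and returns word -> count.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello mama", "hello baba", "hello word");
        System.out.println(run(lines)); // prints {baba=1, hello=3, mama=1, word=1}
    }
}
```

Feeding it the first three lines of counttext.txt gives hello=3 and one each for mama, baba and word, which is the per-block share of the totals the real job reports.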
[hadoop@h201 mapreduce]$ /usr/jdk1.7.0_25/bin/javac WordCount2.java
Note: WordCount2.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
[hadoop@h201 mapreduce]$ ls
counttext.txt WordCount2.class WordCount2.java WordCount2$WordCount2Mapper.class WordCount2$WordCount2Reducer.class
[hadoop@h201 mapreduce]$ /usr/jdk1.7.0_25/bin/jar cvf wc1.jar WordCount2*class
added manifest
adding: WordCount2.class(in = 1531) (out= 815)(deflated 46%)
adding: WordCount2$WordCount2Mapper.class(in = 1831) (out= 783)(deflated 57%)
adding: WordCount2$WordCount2Reducer.class(in = 1623) (out= 670)(deflated 58%)
[hadoop@h201 mapreduce]$ ls
counttext.txt wc1.jar WordCount2.class WordCount2.java WordCount2$WordCount2Mapper.class WordCount2$WordCount2Reducer.class
[hadoop@h201 mapreduce]$ hadoop jar wc1.jar WordCount2 hdfs://h201:9000/user/hadoop/counttext.txt hdfs://h201:9000/user/hadoop/output
18/03/09 23:33:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/09 23:33:39 INFO client.RMProxy: Connecting to ResourceManager at h201/192.168.121.132:8032
18/03/09 23:33:55 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/03/09 23:34:05 INFO input.FileInputFormat: Total input paths to process : 1
18/03/09 23:34:06 INFO mapreduce.JobSubmitter: number of splits:1
18/03/09 23:34:06 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/03/09 23:34:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1516635595760_0001
18/03/09 23:34:21 INFO impl.YarnClientImpl: Submitted application application_1516635595760_0001
18/03/09 23:34:21 INFO mapreduce.Job: The url to track the job: http://h201:8088/proxy/application_1516635595760_0001/
18/03/09 23:34:21 INFO mapreduce.Job: Running job: job_1516635595760_0001
18/03/09 23:35:32 INFO mapreduce.Job: Job job_1516635595760_0001 running in uber mode : false
18/03/09 23:35:32 INFO mapreduce.Job: map 0% reduce 0%
18/03/09 23:36:33 INFO mapreduce.Job: map 100% reduce 0%
18/03/09 23:36:45 INFO mapreduce.Job: map 100% reduce 100%
18/03/09 23:36:47 INFO mapreduce.Job: Job job_1516635595760_0001 completed successfully
18/03/09 23:36:47 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=1366
FILE: Number of bytes written=221143
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=747
HDFS: Number of bytes written=101
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=55286
Total time spent by all reduces in occupied slots (ms)=8704
Total time spent by all map tasks (ms)=55286
Total time spent by all reduce tasks (ms)=8704
Total vcore-seconds taken by all map tasks=55286
Total vcore-seconds taken by all reduce tasks=8704
Total megabyte-seconds taken by all map tasks=56612864
Total megabyte-seconds taken by all reduce tasks=8912896
Map-Reduce Framework
Map input records=50
Map output records=120
Map output bytes=1120
Map output materialized bytes=1366
Input split bytes=107
Combine input records=0
Combine output records=0
Reduce input groups=13
Reduce shuffle bytes=1366
Reduce input records=120
Reduce output records=13
Spilled Records=240
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1264
CPU time spent (ms)=4210
Physical memory (bytes) snapshot=223772672
Virtual memory (bytes) snapshot=2148155392
Total committed heap usage (bytes)=136712192
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=640
File Output Format Counters
Bytes Written=101
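Several of the counters above can be recomputed by hand from counttext.txt. The sketch below is only a cross-check in plain Java, and CounterCheck is a hypothetical name. It assumes that the file is exactly the ten-line block repeated five times, with single spaces and a newline after every line, and that each map output record is serialized as a Text key (a 1-byte length prefix for keys this short, plus the UTF-8 bytes) followed by a 4-byte IntWritable.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the job: recomputes a few counters from the input text.
class CounterCheck {
    // The ten distinct lines of counttext.txt.
    static final String[] BLOCK = {
        "hello mama", "hello baba", "hello word", "cai wen wei",
        "mama baba jiejie gege", "gege jiejie didi", "meimei jiejie",
        "didi mama", "ayi shushu", "ayi mama"
    };

    // The whole file: the block repeated five times (assumption taken from the listing).
    static String[] fileLines() {
        String[] lines = new String[BLOCK.length * 5];
        for (int i = 0; i < lines.length; i++) {
            lines[i] = BLOCK[i % BLOCK.length];
        }
        return lines;
    }

    // One map output record per space-separated token.
    static int mapOutputRecords(String[] lines) {
        int records = 0;
        for (String line : lines) {
            records += line.split(" ").length;
        }
        return records;
    }

    // Assumed record size: 1-byte length prefix + key bytes + 4-byte int value.
    static long mapOutputBytes(String[] lines) {
        long bytes = 0;
        for (String line : lines) {
            for (String word : line.split(" ")) {
                bytes += 1 + word.getBytes(StandardCharsets.UTF_8).length + 4;
            }
        }
        return bytes;
    }

    // File size, assuming one '\n' after every line including the last.
    static long inputBytes(String[] lines) {
        long bytes = 0;
        for (String line : lines) {
            bytes += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return bytes;
    }

    // One reduce group per distinct word.
    static int reduceInputGroups(String[] lines) {
        Set<String> words = new HashSet<String>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                words.add(word);
            }
        }
        return words.size();
    }

    public static void main(String[] args) {
        String[] lines = fileLines();
        System.out.println("Map input records=" + lines.length);               // 50
        System.out.println("Map output records=" + mapOutputRecords(lines));   // 120
        System.out.println("Map output bytes=" + mapOutputBytes(lines));       // 1120
        System.out.println("Reduce input groups=" + reduceInputGroups(lines)); // 13
        System.out.println("Bytes Read=" + inputBytes(lines));                 // 640
    }
}
```

Under those assumptions the five values agree with the job log, which suggests the job read the whole file and emitted exactly one record per word.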
[hadoop@h201 mapreduce]$ hadoop fs -lsr /user/hadoop/output
lsr: DEPRECATED: Please use 'ls -R' instead.
18/03/09 23:37:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r-- 2 hadoop supergroup 0 2018-03-09 23:36 /user/hadoop/output/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 101 2018-03-09 23:36 /user/hadoop/output/part-r-00000
[hadoop@h201 mapreduce]$ hadoop fs -cat /user/hadoop/output/part-r-00000
18/03/09 23:39:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ayi 10
baba 10
cai 5
didi 10
gege 10
hello 15
jiejie 15
mama 20
meimei 5
shushu 5
wei 5
wen 5
word 5
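The part-r-00000 file is plain text, one "key<TAB>value" line per word, since the job uses the default text output format with its default tab separator. As a small sketch of how the result could be consumed afterwards, the hypothetical PartFile class below (plain Java, no Hadoop classes) formats and parses that layout; the counts in listedCounts() are copied from the output above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the job: reads and writes the key<TAB>value layout.
class PartFile {

    // Writes one "key<TAB>value<NEWLINE>" line per entry.
    static String format(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Parses the same layout back into a sorted word -> count map.
    static Map<String, Integer> parse(String contents) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : contents.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] kv = line.split("\t");
            counts.put(kv[0], Integer.parseInt(kv[1]));
        }
        return counts;
    }

    // The thirteen counts printed by "hadoop fs -cat" above.
    static Map<String, Integer> listedCounts() {
        Map<String, Integer> c = new TreeMap<String, Integer>();
        c.put("ayi", 10);   c.put("baba", 10);  c.put("cai", 5);
        c.put("didi", 10);  c.put("gege", 10);  c.put("hello", 15);
        c.put("jiejie", 15); c.put("mama", 20); c.put("meimei", 5);
        c.put("shushu", 5); c.put("wei", 5);    c.put("wen", 5);
        c.put("word", 5);
        return c;
    }

    public static void main(String[] args) {
        String text = format(listedCounts());
        int total = 0;
        for (int n : parse(text).values()) {
            total += n;
        }
        System.out.println("bytes=" + text.getBytes(StandardCharsets.UTF_8).length); // bytes=101
        System.out.println("total words=" + total);                                  // total words=120
    }
}
```

The 101 bytes match the "Bytes Written=101" counter and the size shown by the HDFS listing, and the 120 total matches "Map output records=120".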