MapReduce Counters

Counters

Counters are an effective way to gather statistics about a job, for quality control or application-level accounting. They can also help diagnose system problems. Recording that a particular event occurred via a counter value is far easier than digging through a pile of log files.
Built-in Counters

Hadoop maintains several built-in counters for every job, covering a range of metrics. For example, some counters record the number of bytes and records processed, letting users monitor how much input has been consumed and how much output has been produced.
These counters are divided into several groups, described below.

1. Task Counters


Task counters gather information about tasks as they run, and the results are aggregated over all the tasks in a job. For example, the MAP_INPUT_RECORDS counter counts the input records for each map task and is aggregated over all the map tasks in a job, so the final figure is the total number of input records for the whole job.
Task counters are maintained by the task they belong to and are periodically sent to the tasktracker (the nodemanager in YARN), which forwards them to the jobtracker (the application master in YARN), so counters can be aggregated globally. Counter values are sent in full each time rather than as deltas since the last transmission, which avoids errors caused by lost messages. Note also that if a task fails during a job, the associated counter values will decrease.
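The full-value reporting scheme described above can be sketched in plain Java. This is an illustrative model, not Hadoop's actual code: the aggregator keeps only the latest reported value per task, so a lost or duplicated heartbeat never corrupts the job-wide total.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not Hadoop source): each counter report carries the
// task's full value so far. The aggregator overwrites the previous value per
// task, so a dropped report is corrected by the next one and a duplicate
// report is a no-op.
public class CounterAggregator {

    private final Map<String, Long> latestPerTask = new HashMap<>();

    // A task periodically reports its complete counter value, not a delta.
    public void report(String taskId, long fullValue) {
        latestPerTask.put(taskId, fullValue); // overwrite, never accumulate
    }

    // The job-wide value is the sum of each task's latest report.
    public long globalTotal() {
        long sum = 0;
        for (long v : latestPerTask.values()) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        CounterAggregator agg = new CounterAggregator();
        agg.report("map_0", 100);  // first heartbeat from map_0
        agg.report("map_1", 50);
        agg.report("map_0", 250);  // later heartbeat supersedes the first
        agg.report("map_0", 250);  // a resent duplicate changes nothing
        System.out.println(agg.globalTotal()); // 300
    }
}
```

If reports were deltas instead, a single lost message would silently under-count, whereas with full values the next heartbeat repairs the state. Dropping a failed task's entry from the map likewise models why counter values can decrease when a task fails.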
Although counter values are only complete and reliable once the whole job has finished, some counters can still provide useful diagnostic information while a task is in progress, and can be monitored through the web UI. For example, the PHYSICAL_MEMORY_BYTES, VIRTUAL_MEMORY_BYTES and COMMITTED_HEAP_BYTES counters show how memory usage varies over the course of a particular task.
The built-in task counters include those in the MapReduce task counter group as well as those in the file-related counter groups.
Built-in MapReduce task counters

Built-in filesystem task counters

Built-in FileInputFormat task counters

Built-in FileOutputFormat task counters

2. Job Counters


Job counters are maintained by the jobtracker (the application master in YARN), so unlike other counters, including user-defined counters, they do not need to be transmitted across the network. They are job-level statistics whose values do not change while a task is running. For example, TOTAL_LAUNCHED_MAPS counts the number of map tasks launched over the course of a job, including ones that failed.
Built-in job counters


User-Defined Java Counters


Counter values can be incremented in a mapper or reducer. Counters are defined by a Java enum type, which groups related counters: the enum's name is the group name, and the enum's fields are the counter names. A job may define any number of enum types, each with any number of fields. Counters are global: the MapReduce framework aggregates them across all maps and reduces, producing a final total at the end of the job.
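The enum-to-group mapping can be sketched in plain Java. This is a hypothetical stand-in for the framework's bookkeeping, not Hadoop's implementation: the enum class supplies the group name and its constants supply the counter names, which is why the group appears as `SimpleCounterTest$Empty` in the job output further down.

```java
import java.util.EnumMap;

// Hypothetical sketch of how an enum defines a counter group: the enum
// class name is the group name; each constant is one counter in the group.
public class EnumCounters<E extends Enum<E>> {

    enum Empty { EMPTY, NOT_EMPTY } // same shape as the enum in the job below

    private final String group;          // group name = enum class name
    private final EnumMap<E, Long> values;

    public EnumCounters(Class<E> enumClass) {
        this.group = enumClass.getName();
        this.values = new EnumMap<>(enumClass);
        for (E e : enumClass.getEnumConstants()) values.put(e, 0L);
    }

    public void increment(E counter, long amount) {
        values.merge(counter, amount, Long::sum);
    }

    public long get(E counter) { return values.get(counter); }

    public String groupName() { return group; }

    public static void main(String[] args) {
        EnumCounters<Empty> c = new EnumCounters<>(Empty.class);
        c.increment(Empty.EMPTY, 1);
        c.increment(Empty.NOT_EMPTY, 1);
        c.increment(Empty.NOT_EMPTY, 1);
        // Print the group the way it shows up in the job counters output
        System.out.println(c.groupName() + ": EMPTY=" + c.get(Empty.EMPTY)
                + " NOT_EMPTY=" + c.get(Empty.NOT_EMPTY));
    }
}
```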

Example: counting the number of empty lines in a data file
The code:

package com.zhen.mapreduce.counter;
 
import java.io.IOException;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
 
/**
 * @author FengZhen
 * @date August 29, 2018
 * Counter example: counts the number of blank lines in the input file
 */
public class SimpleCounterTest extends Configured implements Tool{
 
    enum Empty{
        EMPTY,
        NOT_EMPTY
    }
     
    static class SimpleCounterMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
        @Override
        protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            if (line.equals("")) {
                context.getCounter(Empty.EMPTY).increment(1);
            }else {
                context.getCounter(Empty.NOT_EMPTY).increment(1);
            }
            context.write(value, new IntWritable(1));
        }
    }
     
    static class SimpleCounterReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable intWritable : values) {
                sum += intWritable.get();
            }
            Counter empty = context.getCounter(Empty.EMPTY);
            Counter not_empty = context.getCounter(Empty.NOT_EMPTY);
            System.out.println("empty:"+empty.getValue() + "----not_empty:"+not_empty.getValue());
            context.write(key, new IntWritable(sum));
        }
    }
     
    public int run(String[] args) throws Exception {
        Configuration conf = getConf(); // use ToolRunner's configuration so generic options are honored
        Job job = Job.getInstance(conf);
        job.setJobName("SimpleCounterTest");
        job.setJarByClass(SimpleCounterTest.class);
         
        job.setMapperClass(SimpleCounterMapper.class);
        job.setReducerClass(SimpleCounterReducer.class);
         
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
         
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
         
//      job.setInputFormatClass(FileInputFormat.class);
//      job.setOutputFormatClass(FileOutputFormat.class);
         
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
         
        return job.waitForCompletion(true) ? 0 : 1;
    }
 
    public static void main(String[] args) throws Exception {
        String[] params = new String[]{"hdfs://fz/user/hdfs/MapReduce/data/counter/containsEmpty/input","hdfs://fz/user/hdfs/MapReduce/data/counter/containsEmpty/output"};
        int exitCode = ToolRunner.run(new SimpleCounterTest(), params);
        System.exit(exitCode);
    }
     
}


Build the jar, upload it to the server, and run:

scp /Users/FengZhen/Desktop/Hadoop/file/SimpleCounter.jar root@192.168.1.124:/usr/local/test/mr
hadoop jar SimpleCounter.jar com.zhen.mapreduce.counter.SimpleCounterTest


The output is as follows:

[root@HDP4 mr]# hadoop jar SimpleCounter.jar com.zhen.mapreduce.counter.SimpleCounterTest
18/09/08 20:15:44 INFO client.RMProxy: Connecting to ResourceManager at HDP4/192.168.1.124:8032
18/09/08 20:15:46 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/09/08 20:15:47 INFO input.FileInputFormat: Total input paths to process : 1
18/09/08 20:15:47 INFO mapreduce.JobSubmitter: number of splits:1
18/09/08 20:15:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1535207597429_0006
18/09/08 20:15:51 INFO impl.YarnClientImpl: Submitted application application_1535207597429_0006
18/09/08 20:15:52 INFO mapreduce.Job: The url to track the job: http://HDP4:8088/proxy/application_1535207597429_0006/
18/09/08 20:15:52 INFO mapreduce.Job: Running job: job_1535207597429_0006
18/09/08 20:16:09 INFO mapreduce.Job: Job job_1535207597429_0006 running in uber mode : false
18/09/08 20:16:09 INFO mapreduce.Job:  map 0% reduce 0%
18/09/08 20:16:22 INFO mapreduce.Job:  map 100% reduce 0%
18/09/08 20:16:34 INFO mapreduce.Job:  map 100% reduce 100%
18/09/08 20:16:34 INFO mapreduce.Job: Job job_1535207597429_0006 completed successfully
18/09/08 20:16:34 INFO mapreduce.Job: Counters: 51
    File System Counters
        FILE: Number of bytes read=78
        FILE: Number of bytes written=298025
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=174
        HDFS: Number of bytes written=28
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=10609
        Total time spent by all reduces in occupied slots (ms)=8008
        Total time spent by all map tasks (ms)=10609
        Total time spent by all reduce tasks (ms)=8008
        Total vcore-milliseconds taken by all map tasks=10609
        Total vcore-milliseconds taken by all reduce tasks=8008
        Total megabyte-milliseconds taken by all map tasks=10863616
        Total megabyte-milliseconds taken by all reduce tasks=8200192
    Map-Reduce Framework
        Map input records=9
        Map output records=9
        Map output bytes=70
        Map output materialized bytes=74
        Input split bytes=141
        Combine input records=0
        Combine output records=0
        Reduce input groups=4
        Reduce shuffle bytes=74
        Reduce input records=9
        Reduce output records=4
        Spilled Records=18
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=261
        CPU time spent (ms)=5410
        Physical memory (bytes) snapshot=499347456
        Virtual memory (bytes) snapshot=5458706432
        Total committed heap usage (bytes)=361893888
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    com.zhen.mapreduce.counter.SimpleCounterTest$Empty
        EMPTY=4
        NOT_EMPTY=5
    File Input Format Counters
        Bytes Read=33
    File Output Format Counters
        Bytes Written=28

You can see EMPTY=4 and NOT_EMPTY=5 in the custom counter group at the end of the output.

Posted by 嘣嘣嚓

