Counting Apache Log Status Codes with MapReduce

Count the number of occurrences of each access status code in a log file.

1. Upload the log data to HDFS

Path: /mapreduce/data/apachelog/in

The contents are as follows:

0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] "GET / HTTP/1.1" 200 11452
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] "GET /tomcat.css HTTP/1.1" 200 5926
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] "GET /tomcat.png HTTP/1.1" 200 5103
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] "GET /bg-nav.png HTTP/1.1" 200 1401
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] "GET /asf-logo.png HTTP/1.1" 200 17811
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] "GET /bg-upper.png HTTP/1.1" 200 3103
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] "GET /bg-button.png HTTP/1.1" 200 713
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] "GET /bg-middle.png HTTP/1.1" 200 1918
127.0.0.1 - - [15/Feb/2017:16:25:53 +0800] "GET / HTTP/1.1" 404 994
127.0.0.1 - - [15/Feb/2017:16:25:53 +0800] "GET / HTTP/1.1" 404 994
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:25:53 +0800] "GET / HTTP/1.1" 404 994
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:28:39 +0800] "GET / HTTP/1.1" 404 994
127.0.0.1 - - [15/Feb/2017:16:30:32 +0800] "GET / HTTP/1.1" 404 994
127.0.0.1 - - [15/Feb/2017:16:30:32 +0800] "GET / HTTP/1.1" 404 994
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:30:33 +0800] "GET / HTTP/1.1" 404 994
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:33:52 +0800] "GET / HTTP/1.1" 404 994
127.0.0.1 - - [15/Feb/2017:16:40:54 +0800] "GET / HTTP/1.1" 404 994
127.0.0.1 - - [15/Feb/2017:16:40:54 +0800] "GET / HTTP/1.1" 404 994
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:40:59 +0800] "GET /sentiment_ms/login HTTP/1.1" 404 1030
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:41:07 +0800] "GET / HTTP/1.1" 404 994
0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:41:08 +0800] "GET / HTTP/1.1" 404 994
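Each line follows the Apache access log format: the status code is the first token after the closing quote of the request field. A minimal standalone sketch of that extraction (class name ParseStatus is illustrative, not part of the job):

```java
public class ParseStatus {
    public static void main(String[] args) {
        String line = "127.0.0.1 - - [15/Feb/2017:16:25:53 +0800] \"GET / HTTP/1.1\" 404 994";
        // Split on the closing quote of the request field; the remainder
        // begins with the status code followed by the response size.
        String[] parts = line.split("\" ");
        String status = parts[1].split(" ")[0];
        System.out.println(status); // prints 404
    }
}
```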

2. Code

package com.zhen.apachelog;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ApacheLog {

    // Mapper: emit (statusCode, 1) for every log line.
    public static class apacheMapper extends Mapper<Object, Text, Text, IntWritable> {

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String valueStr = value.toString();
            // The status code is the first token after the closing quote of the
            // request field, e.g. ... "GET / HTTP/1.1" 200 11452
            String[] strings = valueStr.split("\" ");
            String status = strings[1].split(" ")[0];
            context.write(new Text(status), new IntWritable(1));
        }
    }

    // Reducer: sum the 1s emitted for each status code.
    public static class apacheReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> value, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable intWritable : value) {
                count += intWritable.get();
            }
            context.write(key, new IntWritable(count));
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "ApacheLog");
        job.setJarByClass(ApacheLog.class);

        job.setMapperClass(apacheMapper.class);
        job.setReducerClass(apacheReduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Use the args left over after GenericOptionsParser, not the raw args array.
        FileInputFormat.addInputPaths(job, otherArgs[0]);
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
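The split-based parsing in the mapper assumes every line is well formed and that the request field contains no escaped quotes; a malformed line would throw ArrayIndexOutOfBoundsException and fail the task. A more defensive sketch uses a regex anchored on the end of the line (StatusExtractor is a hypothetical helper, not part of the original job, and it assumes the response size is always numeric as in the sample data):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatusExtractor {
    // Matches the 3-digit status code that follows the quoted request field.
    private static final Pattern STATUS = Pattern.compile("\" (\\d{3}) \\d+$");

    public static String extract(String logLine) {
        Matcher m = STATUS.matcher(logLine);
        return m.find() ? m.group(1) : null; // null for malformed lines
    }

    public static void main(String[] args) {
        System.out.println(extract(
            "0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] \"GET / HTTP/1.1\" 200 11452"));
        // prints 200
    }
}
```

In the mapper, a null result could simply be skipped (or counted under a "malformed" key) instead of crashing the task.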

 

3. Package the code into a jar

4. Run the job

EFdeMacBook-Pro:hadoop-2.8.0 FengZhen$ hadoop jar /Users/FengZhen/Desktop/ApacheLog.jar com.zhen.apachelog.ApacheLog /mapreduce/data/apachelog/in /mapreduce/data/apachelog/out
17/09/13 15:32:22 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
17/09/13 15:32:23 INFO input.FileInputFormat: Total input files to process : 1
17/09/13 15:32:23 INFO mapreduce.JobSubmitter: number of splits:1
17/09/13 15:32:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505268150495_0017
17/09/13 15:32:23 INFO impl.YarnClientImpl: Submitted application application_1505268150495_0017
17/09/13 15:32:23 INFO mapreduce.Job: The url to track the job: http://192.168.1.64:8088/proxy/application_1505268150495_0017/
17/09/13 15:32:23 INFO mapreduce.Job: Running job: job_1505268150495_0017
17/09/13 15:32:32 INFO mapreduce.Job: Job job_1505268150495_0017 running in uber mode : false
17/09/13 15:32:32 INFO mapreduce.Job: map 0% reduce 0%
17/09/13 15:32:37 INFO mapreduce.Job: map 100% reduce 0%
17/09/13 15:32:43 INFO mapreduce.Job: map 100% reduce 100%
17/09/13 15:32:43 INFO mapreduce.Job: Job job_1505268150495_0017 completed successfully
17/09/13 15:32:43 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=216
FILE: Number of bytes written=272795
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1776
HDFS: Number of bytes written=13
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3160
Total time spent by all reduces in occupied slots (ms)=3167
Total time spent by all map tasks (ms)=3160
Total time spent by all reduce tasks (ms)=3167
Total vcore-milliseconds taken by all map tasks=3160
Total vcore-milliseconds taken by all reduce tasks=3167
Total megabyte-milliseconds taken by all map tasks=3235840
Total megabyte-milliseconds taken by all reduce tasks=3243008
Map-Reduce Framework
Map input records=21
Map output records=21
Map output bytes=168
Map output materialized bytes=216
Input split bytes=150
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=216
Reduce input records=21
Reduce output records=2
Spilled Records=42
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=54
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=358612992
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1626
File Output Format Counters
Bytes Written=13

 

5. View the results

EFdeMacBook-Pro:lib FengZhen$ hadoop fs -ls /mapreduce/data/apachelog/out
Found 2 items
-rw-r--r--   1 FengZhen supergroup          0 2017-09-13 15:32 /mapreduce/data/apachelog/out/_SUCCESS
-rw-r--r--   1 FengZhen supergroup         13 2017-09-13 15:32 /mapreduce/data/apachelog/out/part-r-00000
EFdeMacBook-Pro:lib FengZhen$ hadoop fs -text /mapreduce/data/apachelog/out/part-r-00000
200 8
404 13
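The two output lines can be sanity-checked against the input: 8 of the 21 records have status 200 and 13 have status 404. A minimal local sketch of the same count, using the mapper's parsing logic on an abbreviated sample (class name LocalCount is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class LocalCount {
    public static void main(String[] args) {
        // Abbreviated sample: two 200 lines and one 404 line.
        String[] lines = {
            "0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] \"GET / HTTP/1.1\" 200 11452",
            "0:0:0:0:0:0:0:1 - - [15/Feb/2017:16:24:16 +0800] \"GET /tomcat.css HTTP/1.1\" 200 5926",
            "127.0.0.1 - - [15/Feb/2017:16:25:53 +0800] \"GET / HTTP/1.1\" 404 994"
        };
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // Same extraction the mapper uses: first token after the request field.
            String status = line.split("\" ")[1].split(" ")[0];
            counts.merge(status, 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

Run over the full 21-line input, this produces 200 -> 8 and 404 -> 13, matching the job output above.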

Posted by 嘣嘣嚓