运维界的卡乐咪


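BEGIN and END give a program a place to set up its initial state and to do cleanup work after processing finishes. Any action listed after BEGIN (inside {}) runs before awk starts scanning its input, and any action listed after END runs after all input has been scanned. BEGIN is therefore typically used to print headings and preset (initialize) variables, and END to print the final result.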

awk 'BEGIN{array[1]="张三";array[2]="李四";for(key in array) print key,array[key]}'

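awk 'BEGIN{array[1]="张三";array[2]="李四"};END {for(key in array) print key,array[key]}' /etc/hosts

Because this version has an END block, awk reads its input before running END, so it needs a file argument (or piped input, as in the next command); without one it waits on standard input.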

cat /etc/hosts |awk 'BEGIN{array[1]="张三";array[2]="李四"};END {for(key in array) print key,array[key]}'
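The order in which BEGIN, the per-line rule, and END run can be seen in a minimal sketch like the one below. The function name `count_lines` and the sample input are made up for illustration and are not part of the original post.

```shell
# count_lines: hypothetical helper that shows when each awk block runs.
count_lines() {
  awk '
    BEGIN { n = 0; print "start" }   # runs once, before any input is read
          { n++ }                    # runs once for every input line
    END   { print "lines:", n }      # runs once, after the last line
  '
}

# Usage: feed three lines on standard input.
printf 'a\nb\nc\n' | count_lines
```

With three input lines, this should print `start` first and `lines: 3` last.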

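Use the first column of each line as the index key and the second column as the value S[key], store them in the array S, then print the array: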

[root@localhost ~]# cat test.log
1 张三
2 李四
[root@localhost ~]# awk '{S[$1]=$2}END{for(key in S) print key,S[key]}' test.log
1 张三
2 李四

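Example 1:

Simulated log analysis: extract the domain from each URL in the file, deduplicate, and count how many times each domain appears, using an awk array: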

[root@localhost ~]# cat test.log
https://www.taobao.com/index.html
https://www.taobao.com/markets/xie/nvxie/index?spm=a21bo.2017.201867-main.4.5af911d953rZGw
https://login.taobao.com/member/login.jhtml
https://3c.tmall.com/?spm=a21bo.2017.201859.5.795811d9YuIM1h
https://neiyi.taobao.com/?spm=a21bo.2017.201867-main.3.795811d9YuIM1h
https://3c.tmall.com/?spm=a21b5811d9YuIM1h
https://3c.tmall.com/?spm=a21bo.20
[root@localhost ~]# awk -F "/" '{S[$3]=S[$3]+1}END{for (k in S) print k,S[k]}' test.log
login.taobao.com 1
neiyi.taobao.com 1
3c.tmall.com 3
www.taobao.com 2

awk -F "/" '{S[$3]++}END{for (k in S) print k,S[k]}' test.log

awk -F "/" '{S[$3]+=1}END{for (k in S) print k,S[k]}' test.log

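Note: S[$3]++ and S[$3]+=1 are both equivalent to S[$3]=S[$3]+1.
Explanation:

The array S uses the domain extracted as $3 as its index, and each line adds 1 to the element at that index, so lines with the same domain accumulate in the same element. An element that has not been set yet counts as 0.

The $3 values of the seven lines, in order:

www.taobao.com, www.taobao.com, login.taobao.com, 3c.tmall.com, neiyi.taobao.com, 3c.tmall.com, 3c.tmall.com

Line 1: S[www.taobao.com]=1

Line 2: S[www.taobao.com]=1+1=2 (this overwrites the value from line 1, because the index is the same)

Line 3: S[login.taobao.com]=1

Line 4: S[3c.tmall.com]=1

Line 5: S[neiyi.taobao.com]=1

Line 6: S[3c.tmall.com]=1+1=2

Line 7: S[3c.tmall.com]=2+1=3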
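To watch the counter change line by line, one option is to print the element right after incrementing it. This is only a sketch: the function name `trace_counts` is hypothetical, and the input is a shortened three-line version of the sample URLs.

```shell
# trace_counts: hypothetical helper that prints the counter after every line.
trace_counts() {
  # -F "/" makes $3 the domain, e.g. https://www.taobao.com/index.html -> www.taobao.com
  awk -F "/" '{ S[$3]++; print "line " NR ": S[" $3 "]=" S[$3] }'
}

# Usage: three shortened sample URLs on standard input.
printf '%s\n' \
  'https://www.taobao.com/index.html' \
  'https://www.taobao.com/markets/xie/nvxie/index' \
  'https://login.taobao.com/member/login.jhtml' | trace_counts
```

The second line should report S[www.taobao.com]=2, while login.taobao.com starts its own counter at 1.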

Example 2:

Rank client IPs by request count in a web log (a common task)

Suppose the log file contains:

10.0.0.41 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.43 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.42 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.46 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.42 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.47 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.41 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.47 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.41 - - [03/Dec/2010:23:27:03 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.46 - - [03/Dec/2010:23:27:03 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
Method 1: use an awk array

[root@localhost ~]# awk '{S[$1]+=1}END{for (k in S) print k ,S[k]}' test.log | sort -rn -k2
10.0.0.41 3
10.0.0.47 2
10.0.0.46 2
10.0.0.42 2
10.0.0.43 1

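# Tip: $1 is the first field (the IP). sort -k2 sorts on the second field, the count; -n compares numerically and -r reverses the order so the largest count comes first.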

Method 2: use awk together with sort and uniq

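[root@localhost ~]# awk '{print $1}' test.log|sort|uniq -c |sort -rn -k1
      3 10.0.0.41
      2 10.0.0.47
      2 10.0.0.46
      2 10.0.0.42
      1 10.0.0.43
# Tip: $1 is the first column when awk splits on whitespace, so the command is equivalent to awk -F " " '{print $1}' test.log|sort|uniq -c |sort -rn -k1. uniq -c puts the count first, so here -k1 sorts on the first field, the count.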

Method 3: use sed

[root@localhost ~]# sed 's/ - -.*$//' test.log|sort|uniq -c|sort -rn -k1
      3 10.0.0.41
      2 10.0.0.47
      2 10.0.0.46
      2 10.0.0.42
      1 10.0.0.43

 

Tip: the first sort after sed puts identical IPs on adjacent lines, because uniq -c only collapses and counts duplicate lines that are adjacent.
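The effect of the missing sort is easy to reproduce on a tiny input. The sketch below uses two hypothetical helper functions, `count_unsorted` and `count_sorted`, and three made-up lines in which the same IP appears on non-adjacent lines.

```shell
# Hypothetical helpers for comparing uniq -c with and without a preceding sort.
count_unsorted() { uniq -c; }          # counts runs of adjacent identical lines only
count_sorted()   { sort | uniq -c; }   # sort first so identical lines become adjacent

# 10.0.0.41 appears twice, but not on adjacent lines.
printf '10.0.0.41\n10.0.0.42\n10.0.0.41\n' | count_unsorted
printf '10.0.0.41\n10.0.0.42\n10.0.0.41\n' | count_sorted
```

Without sort, 10.0.0.41 is reported twice with a count of 1 each; with sort it is reported once with a count of 2. The width of the leading padding in uniq -c output varies between implementations.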

 

Example 3: find the files in a log that are requested most often and use the most bandwidth

Suppose the log contains:

[root@localhost ~]# cat test.log
2019-08-19 02:12:24 GET /uploadfiles/20170524-153504.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 206 382
2019-08-19 02:12:24 GET /jquery/jquery.cycle.all.js - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 19173
2019-08-19 02:12:25 GET /images/iproducts_title_bg.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/css/style_en.css 200 2431
2019-08-19 02:12:25 GET /thumbs/20170828-115757.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 48569
2019-08-19 02:12:27 GET /uploadfiles/20170704-184425.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 183697
2019-08-19 02:12:28 GET /thumbs/20170828-115554.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 39900
2019-08-19 02:12:28 GET /thumbs/20170828-115533.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 41881
2019-08-19 02:12:30 GET /thumbs/20170828-115633.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 45656
2019-08-19 02:12:25 GET /thumbs/20170828-115757.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 48569
2019-08-19 02:12:25 GET /thumbs/20170828-115757.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 48569
2019-08-19 02:12:25 GET /thumbs/20170828-115757.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 48569
2019-08-19 02:12:25 GET /thumbs/20170828-115757.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 48569
2019-08-19 02:12:25 GET /thumbs/20170828-115757.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 48569
2019-08-19 02:12:24 GET /uploadfiles/20170524-153504.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 206 382
2019-08-19 02:12:24 GET /uploadfiles/20170524-153504.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 206 382
2019-08-19 02:12:30 GET /thumbs/20170828-115633.jpg - - 14.248.67.178 HTTP/1.1 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/76.0.3809.100+Safari/537.36 http://www.baidu.com/ 200 45656

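Output format: [hits] * [size of one file]  [hits]  [file path (as it appears in the URL)]

In this log, $4 is the requested file path and $12 (the last field) is its size.

Method 1: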

[root@localhost ~]# awk '{print $4"\t " $12}' test.log|sort|uniq -c|awk '{print $1*$3"\t" $1"\t" $2}'|sort -rn
291414    6    /thumbs/20170828-115757.jpg
183697    1    /uploadfiles/20170704-184425.jpg
91312    2    /thumbs/20170828-115633.jpg
41881    1    /thumbs/20170828-115533.jpg
39900    1    /thumbs/20170828-115554.jpg
19173    1    /jquery/jquery.cycle.all.js
2431    1    /images/iproducts_title_bg.jpg
1146    3    /uploadfiles/20170524-153504.jpg

Method 2:

[root@localhost ~]# awk '{S_num[$4]++;S_size[$4]=S_size[$4]+$12}END{for (k in S_num) print S_size[k],S_num[k],k}' test.log | sort -rn
291414 6 /thumbs/20170828-115757.jpg
183697 1 /uploadfiles/20170704-184425.jpg
91312 2 /thumbs/20170828-115633.jpg
41881 1 /thumbs/20170828-115533.jpg
39900 1 /thumbs/20170828-115554.jpg
19173 1 /jquery/jquery.cycle.all.js
2431 1 /images/iproducts_title_bg.jpg
1146 3 /uploadfiles/20170524-153504.jpg
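The two methods agree on this log because every request for a given file reports the same size. Method 1 groups on the (path, size) pair, so a file served with two different sizes would show up as two rows, while Method 2 adds up the sizes per path. The sketch below wraps the Method 2 logic in a hypothetical function `bytes_per_file` (with array names chosen for illustration) and runs it on made-up lines in the same 12-field layout.

```shell
# bytes_per_file: hypothetical helper wrapping the Method 2 logic.
bytes_per_file() {
  awk '{ hits[$4]++; bytes[$4] += $12 }    # $4 = path, $12 = size
       END { for (k in hits) print bytes[k], hits[k], k }' | sort -rn
}

# Made-up log lines; /a.jpg is served with two different sizes.
printf '%s\n' \
  'd t GET /a.jpg - - ip HTTP/1.1 ua ref 200 1000' \
  'd t GET /a.jpg - - ip HTTP/1.1 ua ref 206 300' \
  'd t GET /b.js - - ip HTTP/1.1 ua ref 200 500' | bytes_per_file
```

Here /a.jpg should be reported once, with a total of 1300 over 2 hits, followed by /b.js with 500 over 1 hit.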

-----------------------------------------------------------------------------------------------

Adapted from https://blog.51cto.com/oldboy/1184177

Study notes, for reference only.

 

posted on 2020-02-12 20:33  卡乐咪运维