day18-Linux Commands-The Three Musketeers: awk

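awk command format

awk [options] '[pattern {action}]'

options, match pattern, action

Common options:

-F  sets the field separator; to allow several separator characters, list them inside [ ]

Match patterns:

//      regular-expression match
> < == != >= <=    comparison-expression match
n,m     range match

Actions:

print   prints output

Built-in variables

NR      number of the current record (line)
RS      input record separator (a newline by default)
FS      input field separator (whitespace by default)
OFS     output field separator (a space by default)
$0      the whole record (all columns)
$N      column N
$NF     the last column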

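How awk works

awk is a text-processing tool for data streams. It treats a file as a sequence of records; by default each line is one record, and each record is split into fields (columns).

A first try with awk

Print columns on demand

Test sample: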

cat >person.txt << EOF
101,Zhangya:CEO-12000
102,BanZhang:CTO-15000
103,CKman:COO-18000
104,Mr.Sheng:COO-2x000
105,GuanRu:CXO-13000
EOF
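
The sample mixes three separators (`,`, `:` and `-`), so one bracket expression can cut every column in a single pass. A minimal sketch, assuming the file is recreated in a scratch directory; the names `workdir` and `name_salary` are only for illustration:

```shell
# Recreate the sample file in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
cat > "$workdir/person.txt" << 'EOF'
101,Zhangya:CEO-12000
102,BanZhang:CTO-15000
103,CKman:COO-18000
104,Mr.Sheng:COO-2x000
105,GuanRu:CXO-13000
EOF

# Split on any one of ',', ':' or '-' and print the name and salary columns.
name_salary() {
    awk -F'[,:-]' '{print $2, $4}' "$1"
}

name_salary "$workdir/person.txt"
```

With this separator, $1 is the ID, $2 the name, $3 the title and $4 the salary.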

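Comparison match patterns

Operator  Meaning  Example
<   less than   x<y
<=  less than or equal to   x<=y
==  equal to   x==y
!=  not equal to   x!=y
>   greater than   x>y
>=  greater than or equal to   x>=y
The operators above are for numbers; the two below are for strings.
~   matches the regular expression   x~/y/
!~  does not match the regular expression   x!~/y/

Learning awk comes down to two questions:
how to split the input, and how to print it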

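Chapter 1 Introduction to AWK

1. What the command does

Outputs content column by column
how to split, how to print

2. Command format

awk [options] '[pattern {action}]'
options, match pattern, action

Chapter 2 Printing Columns with AWK

1. Positional variables

,  in print, separates items with the output separator (a space by default)
$0 all columns
$1 column 1
$N column N
$NF the last column
$(NF-1) the second-to-last column

2. Print all columns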

echo '123 456 789'|awk '{print $0}'

[root@centos7-100 ~]# echo '123 456 789' |awk '{print$0}'
123 456 789

3. Print only column 1

echo '123 456 789'|awk '{print $1}'

[root@centos7-100 ~]# echo '123 456 789'|awk '{print$1}'
123

4. Print columns 1 and 2, separated by a space

echo '123 456 789'|awk '{print $1,$2}'

[root@centos7-100 ~]# echo '123 456 789'|awk '{print$1,$2}'
123 456

5. Print columns 1 through 3

echo '123 456 789'|awk '{print $1,$2,$3}'

[root@centos7-100 ~]# echo '123 456 789'|awk '{print$1,$2,$3}'
123 456 789

6. Change the output order to column 3, column 2, column 1

echo '123 456 789'|awk '{print $3,$2,$1}'

[root@centos7-100 ~]# echo '123 456 789'|awk '{print$3,$2,$1}'
789 456 123

7. Prefix column 1 with A, column 2 with B, and column 3 with C

echo '123 456 789'|awk '{print "A"$1,"B"$2,"C"$3}'

[root@centos7-100 ~]# echo '123 456 789'|awk '{print"A"$1,"B"$2,"C"$3}'
A123 B456 C789

awk '{print "custom text"$0}'

8. Print columns 1-3 separated by :

[root@centos7-100 ~]# echo '123 456 789'|awk '{print$1":"$2":"$3}'
123:456:789
[root@centos7-100 ~]# echo '123 456 789'|awk -v OFS=":" '{print$1,$2,$3}'
123:456:789

echo '123 456 789'|awk '{print $1":"$2":"$3}'

9. Print the last column

echo '123 456 789'|awk '{print $NF}'

[root@centos7-100 ~]# echo '123 456 789'|awk '{print$NF}'
789

Chapter 3 Changing the Default Separator with -F or FS

1. Syntax:

awk -F ":"
awk -F ":/"
awk -F "[:/]"

2. Changing the default separator with FS

echo 1:2:3|awk -F":" '{print $1,$2}'

[root@centos7-100 ~]# echo '1:2:3'|awk -F':'  '{print$1,$2}'
1 2
[root@centos7-100 ~]# echo '1:2:3'|awk -v FS=':'  '{print$1,$2}'
1 2
[root@centos7-100 ~]# echo '1:2:3'|awk -v FS=":"  '{print$1,$2}'
1 2

echo 1-2:3|awk -F"[:-]" '{print $1,$2}'

[root@centos7-100 ~]# echo '1-2:3'|awk -F"[-:]" '{print$1,$2}'
1 2
[root@centos7-100 ~]# echo '1-2:3'|awk -F'[-:]' '{print$1,$2}'
1 2

3. Print column 1 and the last column of /etc/passwd, using : as the output separator

Format: last column:first column

cat /etc/passwd|awk -F":" '{print $NF":"$1}'

[root@centos7-100 ~]# cat /etc/passwd|awk -F':' '{print$NF":"$1}'
/bin/bash:root
/sbin/nologin:bin
/sbin/nologin:daemon
/sbin/nologin:adm
/sbin/nologin:lp
/bin/sync:sync
/sbin/shutdown:shutdown
/sbin/halt:halt
/sbin/nologin:mail
/sbin/nologin:operator
/sbin/nologin:games
/sbin/nologin:ftp
/sbin/nologin:nobody
/sbin/nologin:systemd-network
/sbin/nologin:dbus
/sbin/nologin:polkitd
/sbin/nologin:sshd
/sbin/nologin:postfix
/sbin/nologin:chrony
/bin/bash:oldboy
/bin/bash:adminuser
/sbin/nologin:nginx
/bin/bash:zhaocheng
/bin/bash:ops1
/bin/bash:ops2
/bin/bash:dev1
/bin/bash:dev2
/sbin/nologin:ntp

4. Print the second-to-last column of /etc/passwd:

4.1 cat /etc/passwd|awk -F":" '{print $(NF-1)}'

[root@centos7-100 ~]# cat /etc/passwd|awk -F':' '{print$(NF-1)}'
/root
/bin
/sbin
/var/adm
/var/spool/lpd
/sbin
/sbin
/sbin
/var/spool/mail
/root
/usr/games
/var/ftp
/
/
/
/
/var/empty/sshd
/var/spool/postfix
/var/lib/chrony
/home/oldboy
/home/adminuser
/var/cache/nginx
/home/zhaocheng
/home/ops1
/home/ops2
/home/dev1
/home/dev2
/etc/ntp

4.2 Strip the leading /

[root@centos7-100 ~]# cat /etc/passwd|awk -F ':/' '{print$(NF-1)}'
root
bin
sbin
var/adm
var/spool/lpd
sbin
sbin
sbin
var/spool/mail
root
usr/games
var/ftp




var/empty/sshd
var/spool/postfix
var/lib/chrony
home/oldboy
home/adminuser
var/cache/nginx
home/zhaocheng
home/ops1
home/ops2
home/dev1
home/dev2
etc/ntp

5. Matching on several separator characters: extract oldbug

echo ':://...---oldbug:..oldboy'

[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[-:]' '{print$4}'

[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[-:]' '{print$5}'

[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[-:]' '{print$6}'
oldbug

[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[-:/.]' '{print$14}'
oldboy
[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[-:/.]' '{print$13}'

[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[-:/.]' '{print$12}'

[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[-:/.]' '{print$11}'
oldbug
[root@centos7-100 ~]#
[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[-:/.]+' '{print$2}'
oldbug
[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[-:/.]+' '{print$1}'

[root@centos7-100 ~]# 
[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[:-]+' '{print$3}'
oldbug
[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[:-]+' '{print$(NF-2)}'
//...
[root@centos7-100 ~]# echo ':://...---oldbug:..oldboy'|awk -F'[:-]+' '{print$(NF-1)}'
oldbug
[root@centos7-100 ~]#

echo ':://...---oldbug:..oldboy'|awk -F'[:-]' '{print $(NF-1)}'
echo ':://...---oldbug:..oldboy'|awk -F"[:/.-]" '{print$(NF-3)}'
echo ':://...---oldbug:..oldboy'|awk -F'[:/.-]+' '{print $2}'
echo ':://...---oldbug:..oldboy'|awk -F'[-:]+' '{print $3}'
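
Why the field numbers jump from $11 to $2 once `+` is added: without `+`, every single separator character starts a new (often empty) field; with `+`, a whole run of separator characters counts as one. A small sketch that prints NF for both forms; the helper name `count_fields` is made up for illustration:

```shell
s=':://...---oldbug:..oldboy'

# Print how many fields awk sees for a given separator regex.
count_fields() {
    printf '%s\n' "$s" | awk -F"$1" '{print NF}'
}

count_fields '[-:/.]'     # each separator character splits on its own
count_fields '[-:/.]+'    # a run of separator characters counts as one
```

The first call reports 14 fields and the second reports 3. The line starts with separators, so $1 is empty in both cases.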

6. Extract each user's home directory from passwd, without the leading /

[root@centos7-100 ~]# cat /etc/passwd |awk -F ':/' '{print$(NF-1)}'
root
bin
sbin
var/adm
var/spool/lpd
sbin
sbin
sbin
var/spool/mail
root
usr/games
var/ftp




var/empty/sshd
var/spool/postfix
var/lib/chrony
home/oldboy
home/adminuser
var/cache/nginx
home/zhaocheng
home/ops1
home/ops2
home/dev1
home/dev2
etc/ntp

cat /etc/passwd |awk -F":/" '{print$(NF-1)}'
awk -F':/' '{print $2}' /etc/passwd

[root@centos7-100 ~]# cat /etc/passwd |awk -F ':/' '{print$2}'
root
bin
sbin
var/adm
var/spool/lpd
sbin
sbin
sbin
var/spool/mail
root
usr/games
var/ftp




var/empty/sshd
var/spool/postfix
var/lib/chrony
home/oldboy
home/adminuser
var/cache/nginx
home/zhaocheng
home/ops1
home/ops2
home/dev1
home/dev2
etc/ntp

7. Special case: a single or double quote as the separator

echo 1\'2\'3\'4
echo 1\'2\'3\'4|awk -F"'" '{print $4}'
[root@centos7-100 ~]# echo echo 1'2'3'4 
> 
[root@centos7-100 ~]# echo echo 1\'2\'3\'4 |awk -F "\'" '{print$1":"$2":"$3":"$4}'
awk: warning: escape sequence `\'' treated as plain `''
echo 1:2:3:4
[root@centos7-100 ~]# echo echo 1\'2\'3\'4 |awk -F "'" '{print$1":"$2":"$3":"$4}'
echo 1:2:3:4
echo 1\"2\"3\"4
echo 1\"2\"3\"4|awk -F'"' '{print $4}'
[root@centos7-100 ~]# echo 1\"2\"3\"4|awk -F '"' '{print$1":"$2":"$3":"$4}'
1:2:3:4
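
The warning in the transcript comes from writing `\'` inside double quotes: the backslash reaches awk, which does not know that escape. One way around it, sketched here with a made-up helper name, is to escape the quote at the shell level so that awk receives a bare `'`:

```shell
# -F\' hands awk a bare single quote without opening a shell string.
last_field_by_quote() {
    awk -F\' '{print $NF}'
}

printf '%s\n' "1'2'3'4" | last_field_by_quote
```

This prints 4, the same result as -F"'" above.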

Chapter 4 OFS: the Output Field Separator

1. Changing the output separator

(diagram to be added)
echo 1-2-3|awk -v FS='-' -v OFS=':' '{print $1,$2,$3}'

[root@centos7-100 ~]# echo 1-2-3-4|awk -v FS="-" -v OFS=":" '{print$1,$2,$3,$4}'
1:2:3:4

echo 1:2:3|awk -v FS=':' -v OFS='-' '{print $1,$2,$3}'
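
One thing worth knowing about OFS: it is only applied when awk builds the output from separate fields. `print $0` prints the original line untouched unless a field has been assigned, which makes awk rebuild $0 with OFS. A minimal sketch (the function names are illustrative only):

```shell
# $0 is printed as read: OFS is not applied.
unchanged() {
    awk -v FS='-' -v OFS=':' '{print $0}'
}

# Assigning a field (even to itself) rebuilds $0 using OFS.
rebuilt() {
    awk -v FS='-' -v OFS=':' '{$1 = $1; print $0}'
}

echo 1-2-3 | unchanged
echo 1-2-3 | rebuilt
```

The first command prints 1-2-3, the second prints 1:2:3.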

Chapter 5 Comparison Matching

1. Comparison operators

For numbers:

>
<
==
!=
>=
<=

For strings:
~
!~

Examples:
awk -F"," '$1 > 1{print $0}' num.txt
awk -F"," '$1 != 1{print $0}' num.txt
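
A comparison is only numeric when the column really looks like a number. The person.txt sample contains the salary `2x000`, which is a handy test case. The sketch below inlines that sample (so it does not depend on num.txt) and compares a plain test with one that forces a numeric comparison by adding 0; the names `loose` and `strict` are made up:

```shell
people='101,Zhangya:CEO-12000
102,BanZhang:CTO-15000
103,CKman:COO-18000
104,Mr.Sheng:COO-2x000
105,GuanRu:CXO-13000'

# Plain comparison: a field that does not look numeric is compared as a string.
loose() {
    printf '%s\n' "$people" | awk -F'[,:-]' '$4 >= 13000 {print $2}'
}

# Adding 0 forces a numeric comparison ("2x000" is converted to 2).
strict() {
    printf '%s\n' "$people" | awk -F'[,:-]' '$4 + 0 >= 13000 {print $2}'
}

strict
```

`strict` prints BanZhang, CKman and GuanRu. With `loose`, an awk that follows the POSIX comparison rules (gawk, for example) is expected to print Mr.Sheng as well, because "2x000" sorts after "13000" as a string.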

2. Print the lines of /etc/passwd whose column 3 equals 0

[root@centos7-100 ~]# cat /etc/passwd |awk -F ':' '$3==0 {print$0}'
root:x:0:0:root:/root:/bin/bash
[root@centos7-100 ~]# 

3. Print the lines of /etc/passwd whose column 3 is greater than or equal to 0

[root@centos7-100 ~]# cat /etc/passwd |awk -F ':' '$3>=0 {print$0}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
oldboy:x:1000:2000::/home/oldboy:/bin/bash
adminuser:x:1001:1001::/home/adminuser:/bin/bash
nginx:x:997:995:nginx user:/var/cache/nginx:/sbin/nologin
zhaocheng:x:2223:2223::/home/zhaocheng:/bin/bash
ops1:x:2224:2224::/home/ops1:/bin/bash
ops2:x:2225:2225::/home/ops2:/bin/bash
dev1:x:2226:2226::/home/dev1:/bin/bash
dev2:x:2227:2227::/home/dev2:/bin/bash
ntp:x:38:38::/etc/ntp:/sbin/nologin

4. Print the lines of /etc/passwd whose column 3 is less than 900

[root@centos7-100 ~]# cat /etc/passwd |awk -F ':' '$3<900 {print$0}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin

5. Print the lines of /etc/passwd whose column 3 is greater than 300

[root@centos7-100 ~]# cat /etc/passwd |awk -F ':' '$3>300 {print$0}'
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
oldboy:x:1000:2000::/home/oldboy:/bin/bash
adminuser:x:1001:1001::/home/adminuser:/bin/bash
nginx:x:997:995:nginx user:/var/cache/nginx:/sbin/nologin
zhaocheng:x:2223:2223::/home/zhaocheng:/bin/bash
ops1:x:2224:2224::/home/ops1:/bin/bash
ops2:x:2225:2225::/home/ops2:/bin/bash
dev1:x:2226:2226::/home/dev1:/bin/bash
dev2:x:2227:2227::/home/dev2:/bin/bash

6. Print the lines of /etc/passwd whose last column matches /sbin/nologin

[root@centos7-100 ~]# cat /etc/passwd |awk -F ':' '$NF~"/sbin/nologin"  {print$0}'
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
nginx:x:997:995:nginx user:/var/cache/nginx:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin

7. Print the lines of /etc/passwd whose last column is not /sbin/nologin

[root@centos7-100 ~]# cat /etc/passwd |awk -F ':' '$NF!~"/sbin/nologin"  {print$0}'
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
oldboy:x:1000:2000::/home/oldboy:/bin/bash
adminuser:x:1001:1001::/home/adminuser:/bin/bash
zhaocheng:x:2223:2223::/home/zhaocheng:/bin/bash
ops1:x:2224:2224::/home/ops1:/bin/bash
ops2:x:2225:2225::/home/ops2:/bin/bash
dev1:x:2226:2226::/home/dev1:/bin/bash
dev2:x:2227:2227::/home/dev2:/bin/bash

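A classmate's requirements:

Requirement: URL

1. For a given time period, find the most-requested interfaces and sort them by request count.

Example: service=App.Liveshow.hotAward is one such interface; how many kinds there are is not known in advance.

1.1 Look at the interface in the first line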

[root@centos7-100 ~]# head -1 2021-04-14.log 
["2021-04-14 00:00:00","223.104.36.83:63069","apiv2-rds.3yakj.com","POST",200,"\/?service=App.User.Userinfo",16649,null,"vivo 5.9.1","HTTP\/1.1",375]
[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F '[?"]' '{print$11}'
service=App.User.Userinfo
[root@centos7-100 ~]# 

1.2 Count and sort by number of requests

[root@centos7-100 ~]# cat 2021-04-14.log |awk -F '[?"]' '{print$11}'|sort|uniq -c |sort -rn |head 
 351990 service=App.Liveshow.hotAward
 174249 service=App.Wsapi.Userinfo
  99854 service=App.Seeding.Monthrank
  99526 service=App.User.Userinfo
  71505 service=App.PublicClass.financeNotify
  68327 service=App.Seeding.UserList
  68178 service=App.User.Myinfo
  63720 service=App.Wsapi.newjoinroom
  61872 service=App.Wsapi.onclose
  60381 service=App.Wsapi.sendgift1

2. Count the IPs that visited during a time period (first extract the log lines in that period:

2021-04-14 00:00
2021-04-14 05:05)

2.1 First extract the IP address

[root@centos7-100 ~]# head -1 2021-04-14.log 
["2021-04-14 00:00:00","223.104.36.83:63069","apiv2-rds.3yakj.com","POST",200,"\/?service=App.User.Userinfo",16649,null,"vivo 5.9.1","HTTP\/1.1",375]
[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F '[":]' '{print$7}'
63069
[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F '[":]' '{print$6}'
223.104.36.83

2.2 Count IP addresses over a time period

[root@centos7-100 ~]# cat 2021-04-14.log |awk '/2021-04-14 00:00/,/2021-04-14 05:05/ {print$0}'|head -1
["2021-04-14 00:00:00","223.104.36.83:63069","apiv2-rds.3yakj.com","POST",200,"\/?service=App.User.Userinfo",16649,null,"vivo 5.9.1","HTTP\/1.1",375]
[root@centos7-100 ~]# cat 2021-04-14.log |awk -F'[":]' '/2021-04-14 00:00/,/2021-04-14 05:05/ {print$6}'|sort|uniq -c |sort -rn |head 
  40044 47.103.144.36
  28833 47.103.127.196
  13896 47.103.97.109
   6003 106.92.113.138
   3896 223.90.121.154
   2973 117.189.143.23
   2638 122.9.69.120
   2624 116.63.98.153
   2568 116.63.44.202
   2485 116.63.237.164
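
A range pattern ends at the first line that matches the end expression, so `/2021-04-14 00:00/,/2021-04-14 05:05/` stops at the first record of 05:05 and skips the rest of that minute. Because the timestamps in this log sort correctly as plain strings, an alternative is to compare the timestamp column directly. The sketch below uses a few trimmed sample lines in the same layout; the function name and the sample records are assumptions for illustration:

```shell
log_sample='["2021-04-14 00:00:00","223.104.36.83:63069","apiv2-rds.3yakj.com","POST",200]
["2021-04-14 05:04:59","47.103.144.36:1000","apiv2-rds.3yakj.com","POST",200]
["2021-04-14 05:05:00","47.103.144.36:1001","apiv2-rds.3yakj.com","POST",200]
["2021-04-14 05:05:30","116.63.98.153:1002","apiv2-rds.3yakj.com","POST",200]
["2021-04-14 05:06:00","122.9.69.120:1003","apiv2-rds.3yakj.com","POST",200]'

# With " as the separator, $2 is the timestamp and $4 is ip:port.
# Keep records with from <= timestamp < to and print only the IP.
ips_in_window() {
    awk -F'"' -v from="$1" -v to="$2" \
        '$2 >= from && $2 < to { split($4, a, ":"); print a[1] }'
}

printf '%s\n' "$log_sample" | ips_in_window '2021-04-14 00:00:00' '2021-04-14 05:06:00'
```

All four records before 05:06:00 are kept, including the one at 05:05:30. The output can be piped into sort|uniq -c|sort -rn|head as in the transcript.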

4. Count the failing interfaces (any status code other than 200 counts as an error)

4.1 First extract the status-code column

[root@centos7-100 ~]# head -1 2021-04-14.log 
["2021-04-14 00:00:00","223.104.36.83:63069","apiv2-rds.3yakj.com","POST",200,"\/?service=App.User.Userinfo",16649,null,"vivo 5.9.1","HTTP\/1.1",375]
[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F',' '{print$5}'
200

4.2 Filter out status 200; the first 200 lines turn out to be all normal

[root@centos7-100 ~]# head -20 2021-04-14.log |awk -F',' '$5!=200 {print$5}'
[root@centos7-100 ~]# head -200 2021-04-14.log |awk -F',' '$5!=200 {print$5}'

4.3 Count the responses with an error status code

[root@centos7-100 ~]# cat  2021-04-14.log |awk -F',' '$5!=200 {print$5}'|wc -l
11720

5. Count HTTP response status codes

[root@centos7-100 ~]# cat  2021-04-14.log |awk -F',' '{print$5}' |sort|uniq -c |sort -rn|head 
2492540 200
  11430 504
    175 499
    111 502
      4 408

6. Measure server concurrency (how many requests reach the server in a given second)

6.1 Count the one second between /2021-04-14 00:00:00/ and /2021-04-14 00:00:01/

[root@centos7-100 ~]# cat  2021-04-14.log |awk '/2021-04-14 00:00:00/,/2021-04-14 00:00:01/ {print$0}'|wc -l 
60

6.2 Count the requests in the window /2021-04-14 00:30/ to /2021-04-14 00:31/

[root@centos7-100 ~]# cat  2021-04-14.log |awk '/2021-04-14 00:30/,/2021-04-14 00:31/  {print$0}'|wc -l
2413
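
Instead of checking one second at a time, an awk array keyed by the timestamp can count every second in one pass. A rough sketch on a few trimmed sample lines (the sample records and the function name are made up; on the real log the input would be 2021-04-14.log):

```shell
sec_sample='["2021-04-14 00:00:00","223.104.36.83:63069"]
["2021-04-14 00:00:00","47.103.144.36:1000"]
["2021-04-14 00:00:00","47.103.144.36:1001"]
["2021-04-14 00:00:01","116.63.98.153:1002"]'

# $2 is the timestamp down to the second; count the records per timestamp
# and list the busiest seconds first.
per_second() {
    awk -F'"' '{hits[$2]++} END {for (t in hits) print hits[t], t}' | sort -rn
}

printf '%s\n' "$sec_sample" | per_second
```

Adding `| head` to the pipeline shows the peak seconds of the day.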

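Question 7: for example, how many times did xxx.xxx.xxx.xxx call service=App.User.Userinfo, and how many times did other IPs call it? Break it down by time period and sort.

7.1 First extract the xxx.xxx.xxx.xxx and service=App.User.Userinfo columns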

[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F '["?]' '{print$6,$7}'
apiv2-rds.3yakj.com ,
[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F '["?]' '{print$6,$13}'
apiv2-rds.3yakj.com vivo 5.9.1
[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F '["?]' '{print$6,$11}'
apiv2-rds.3yakj.com service=App.User.Userinfo

7.2 Then deduplicate and sort

[root@centos7-100 ~]# cat 2021-04-14.log |awk -F '["?]' '{print$6,$11}' |sort|uniq -c|sort -rn|head

8. How many times other IPs called this interface

8.1 First extract the IP and interface columns

[root@centos7-100 ~]# head -1 2021-04-14.log 
["2021-04-14 00:00:00","223.104.36.83:63069","apiv2-rds.3yakj.com","POST",200,"\/?service=App.User.Userinfo",16649,null,"vivo 5.9.1","HTTP\/1.1",375]
[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F '[":?]' '{print$4,$12}'
00 ,200,
[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F '[":?]' '{print$5,$13}'
, \/
[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F '[":?]' '{print$3,$14}'
00 service=App.User.Userinfo
[root@centos7-100 ~]# head -1 2021-04-14.log |awk -F '[":?]' '{print$6,$14}'
223.104.36.83 service=App.User.Userinfo
[root@centos7-100 ~]# 

8.2 Deduplicate and sort

[root@centos7-100 ~]# cat 2021-04-14.log |awk -F '[":?]' '{print$6,$14}' |sort|uniq -c |sort -rn |head  
 103032 222.189.85.141 service=App.Liveshow.hotAward
  87469 47.103.144.36 service=App.Wsapi.Userinfo
  86780 47.103.127.196 service=App.Wsapi.Userinfo
  71505 47.103.144.36 service=App.PublicClass.financeNotify
  51612 119.39.248.64 service=App.Liveshow.hotAward
  43347 120.228.1.250 service=App.Liveshow.hotAward
  32403 47.103.144.36 service=App.Wsapi.sendgift1
  31910 47.103.144.36 service=App.Wsapi.newjoinroom
  31769 47.103.127.196 service=App.Wsapi.newjoinroom
  31108 47.103.144.36 service=App.Wsapi.onclose

Chapter 6 Regular-Expression Matching

1. Format

awk '/regular expression/{print $0}'

2. Print lines that start with nginx

awk '/^nginx/{print $0}' /etc/passwd
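
A regular expression can be tested against the whole line or against a single column. The sketch below shows both forms on three passwd-style lines taken from the earlier output; the function names are illustrative only:

```shell
passwd_sample='root:x:0:0:root:/root:/bin/bash
nginx:x:997:995:nginx user:/var/cache/nginx:/sbin/nologin
ops1:x:2224:2224::/home/ops1:/bin/bash'

# Whole-line match: lines that begin with "nginx".
starts_with_nginx() {
    awk '/^nginx/ {print $0}'
}

# Column match: user names whose last column ends in "bash".
bash_users() {
    awk -F: '$NF ~ /bash$/ {print $1}'
}

printf '%s\n' "$passwd_sample" | starts_with_nginx
printf '%s\n' "$passwd_sample" | bash_users
```

The first call prints the nginx line, the second prints root and ops1.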

Chapter 7 Range Matching

1. Format

NR            line number
NR==N         line N
NR>=N         line N and after
NR<=N         line N and before
NR>=N&&NR<=M  lines N through M

2. Print line 2 of /etc/passwd

[root@centos7-100 ~]# cat /etc/passwd |awk 'NR==2 {print$0}'
bin:x:1:1:bin:/bin:/sbin/nologin

3. Print every line after line 2 of /etc/passwd

[root@centos7-100 ~]# cat /etc/passwd |awk 'NR>2 {print$0}'
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
oldboy:x:1000:2000::/home/oldboy:/bin/bash
adminuser:x:1001:1001::/home/adminuser:/bin/bash
nginx:x:997:995:nginx user:/var/cache/nginx:/sbin/nologin
zhaocheng:x:2223:2223::/home/zhaocheng:/bin/bash
ops1:x:2224:2224::/home/ops1:/bin/bash
ops2:x:2225:2225::/home/ops2:/bin/bash
dev1:x:2226:2226::/home/dev1:/bin/bash
dev2:x:2227:2227::/home/dev2:/bin/bash
ntp:x:38:38::/etc/ntp:/sbin/nologin

4. Print lines 1 through 5 of /etc/passwd

[root@centos7-100 ~]# cat /etc/passwd |awk 'NR>=1&&NR <=5 {print$0}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

5. Print /etc/passwd from the line starting with root to the line starting with sshd

[root@centos7-100 ~]# cat /etc/passwd |awk '/^root/,/^sshd/ {print$0}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

6. Special patterns BEGIN and END: run an action before the first line and after the last line

seq 1 10|awk 'BEGIN{print "start"}{print $0}END{print "end"}'
awk -F: '$3>100{a++}END{print a}' /etc/passwd
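
BEGIN is a natural place to initialise variables and END a natural place to print totals. A small sketch that sums and averages a column; the function name is made up, and the `NR > 0` check simply avoids dividing by zero on empty input:

```shell
# Add up column 1, then print the sum and the average once all lines are read.
sum_and_average() {
    awk 'BEGIN {sum = 0}
         {sum += $1}
         END {if (NR > 0) print "sum=" sum, "avg=" (sum / NR)}'
}

seq 1 10 | sum_and_average
```

For the numbers 1 to 10 this prints sum=55 avg=5.5.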

Chapter 7 Extension: Arrays

Top 10 IP addresses by request count
Method 1:
awk '{print $1}' access_log |sort | uniq -c | sort -rn | head -10

Method 2:
awk '{IP[$1]++}END{for(key in IP) print IP[key],key }' access_log |sort -rn | head -10
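
How the array version works: the IP is used as the array key, every line adds 1 to that key's value, and the END block walks over all keys. The `for (key in array)` order is unspecified, which is why the result still goes through sort. A self-contained sketch on made-up sample lines (`top_ips` is a hypothetical helper name):

```shell
access_sample='10.0.0.1 - - GET /a
10.0.0.2 - - GET /b
10.0.0.1 - - GET /c
10.0.0.3 - - GET /a
10.0.0.1 - - GET /a
10.0.0.2 - - GET /a'

# count[ip] is incremented once per line; END prints "count ip" pairs.
top_ips() {
    awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' |
        sort -rn | head -n "${1:-10}"
}

printf '%s\n' "$access_sample" | top_ips 2
```

Compared with method 1, the log is read once and only the per-IP totals are sorted.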

Chapter 8 Log-Analysis Requirements from Production

1. Find the ten IPs that visit the site most often

[root@centos7-100 ~]# cat  bbs.goumin.com_access.log |awk '{print$1}'|sort|uniq -c |sort -rn|head
  29438 207.46.13.69
  24726 106.11.157.236
  24713 106.11.158.236
  24706 106.11.153.243
  24501 106.11.152.242
  24385 106.11.156.242
  24335 106.11.159.234
  19165 207.46.13.125
  16978 207.46.13.131
  14918 207.46.13.181

2. Find the ten most-requested URLs

[root@centos7-100 ~]# cat  bbs.goumin.com_access.log |awk '{print$7}' |sort|uniq -c |sort -rn |head 
  15239 /images/common/back.gif
  12909 /attachments/month_1201/30/902039460.gif
  10567 /
   8539 /attachments/month_1201/16/699338809.gif
   7980 /attachments/month_1112/31/294664463.gif
   4716 /attachments/month_1305/02/462755223.gif
   4263 /attachments/month_1305/13/435842670.gif
   2615 /forum-66-1.html
   1907 /forum-44-1.html
   1427 /thread-4438751-1-1.html

3. Find the IPs that visit the www site most often between 10:00 and 14:00

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |awk '/10:00:00/,/14:00:00/ {print$0}'|awk '/www/ {print$0}'|awk '{print$1}'|sort|uniq -c|sort -rn|head 
   4452 207.46.13.69
   1711 207.46.13.43
   1616 207.46.13.125
   1223 157.55.39.192
    976 157.55.39.158
    710 157.55.39.177
    670 207.46.13.112
    650 207.46.13.5
    633 203.208.60.231
    609 203.208.60.230
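
The range `/10:00:00/,/14:00:00/` only starts if some line contains exactly 10:00:00, and it matches that text anywhere in the line. A sturdier option is to cut the hour out of column 4 and compare it as a number. A sketch on made-up sample lines in the same access-log layout (`hour_window_ips` is a hypothetical helper name):

```shell
hours_sample='1.1.1.1 - - [22/Oct/2017:09:59:59 +0800] "GET / HTTP/1.1" 200 1
2.2.2.2 - - [22/Oct/2017:10:00:03 +0800] "GET / HTTP/1.1" 200 1
2.2.2.2 - - [22/Oct/2017:12:30:00 +0800] "GET / HTTP/1.1" 200 1
3.3.3.3 - - [22/Oct/2017:13:59:59 +0800] "GET / HTTP/1.1" 200 1
4.4.4.4 - - [22/Oct/2017:14:00:00 +0800] "GET / HTTP/1.1" 200 1'

# $4 looks like [22/Oct/2017:10:00:03, so the second ":"-separated piece is the hour.
# Print the IP of every request with from <= hour < to.
hour_window_ips() {
    awk -v from="$1" -v to="$2" '{
        split($4, t, ":")
        h = t[2] + 0
        if (h >= from + 0 && h < to + 0) print $1
    }'
}

printf '%s\n' "$hours_sample" | hour_window_ips 10 14 | sort | uniq -c | sort -rn | head
```

Here 2.2.2.2 is counted twice and 3.3.3.3 once; the 09:59:59 and 14:00:00 requests fall outside the window.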

3. Count how often specific pages were visited (here, the thread pages)

awk '$7 ~ "/thread-.*"{print $7}' bbs.goumin.com_access.log |sort|uniq -c|sort -rn|head -10

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |awk '$7~"thread.*"{print$7}'|sort|uniq -c |sort -rn |head
   1427 /thread-4438751-1-1.html
   1225 /thread-1135240-1-1.html
   1065 /thread-4731571-1-1.html
   1050 /thread-4678741-1-1.html
   1050 /thread-4677392-1-1.html
    942 /forum.php?mod=post&action=newthread&fid=148
    806 /thread-4734106-1-1.html
    787 /thread-2994344-1-1.html
    775 /thread-4729764-1-1.html
    763 /thread-4731586-1-1.html

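4. Find the suspicious IP address, list the pages it visited, check whether it also came in the previous days, and work out when it started and stopped visiting

4.1 Start with the IPs ranked by request count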

[root@centos7-100 ~]# cat  bbs.goumin.com_access.log |awk '{print$1}'|sort|uniq -c |sort -rn|head
  29438 207.46.13.69
  24726 106.11.157.236
  24713 106.11.158.236
  24706 106.11.153.243
  24501 106.11.152.242
  24385 106.11.156.242
  24335 106.11.159.234
  19165 207.46.13.125
  16978 207.46.13.131
  14918 207.46.13.181

4.2 See which pages it visited

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |awk '/207.46.13.69/ {print$7}' |sort|uniq -c|sort -rn|head 
      6 /forum-121-1.html
      5 /forum-44-1.html
      4 /forum-36-1.html
      4 /forum-120-72.html
      3 /thread-4734488-1-1.html
      3 /thread-4734486-1-1.html
      3 /thread-4734177-1-1.html
      3 /thread-4343484-1-1.html
      3 /thread-3535729-1-1.html
      3 /thread-2890650-1-1.html
[root@centos7-100 ~]# cat bbs.goumin.com_access.log |awk '/207.46.13.69/ {print$7}' |sort|uniq -c|sort -rn|wc -l
28869

egrep -vi 'bot|spider' bbs.goumin.com_access.log >> no_bot.txt

4.3 Look at the first 10 lines of bbs.goumin.com_access.log to see who is visiting the site

[root@centos7-100 ~]# head bbs.goumin.com_access.log 
222.73.68.131 - - [21/Oct/2017:23:55:13 +0800] "GET /forum.php?mod=forumdisplay&fid=46&page=2 HTTP/1.1" 200 74519 "-" "Mozilla/5.2 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0 GreaterBrowser/9661" "222.73.68.131" "0.218" 0.202
207.46.13.181 - - [21/Oct/2017:23:55:13 +0800] "GET /thread-852746-1-1.html HTTP/1.1" 200 22439 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "207.46.13.181" "0.257" 0.252
222.73.68.131 - - [21/Oct/2017:23:55:14 +0800] "GET /forum.php?mod=forumdisplay&fid=47&page=1 HTTP/1.1" 200 89493 "-" "Mozilla/5.2 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0 GreaterBrowser/1693" "222.73.68.131" "0.310" 0.281
207.46.13.131 - - [21/Oct/2017:23:55:14 +0800] "GET /thread-253884-1-1.html HTTP/1.1" 200 23804 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "207.46.13.131" "0.344" 0.338
123.125.71.49 - - [21/Oct/2017:23:55:14 +0800] "GET /forum36/type1605/ HTTP/1.1" 200 18610 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "123.125.71.49" "0.254" 0.247
39.78.49.9 - - [21/Oct/2017:23:55:15 +0800] "GET /forum-22-1.html HTTP/1.1" 200 20808 "http://bbs.goumin.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063" "39.78.49.9" "0.235" 0.228
210.14.154.178 - - [21/Oct/2017:23:55:15 +0800] "GET / HTTP/1.1" 200 73320 "-" "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; Trident/5.0)" "210.14.154.178" "0.150" 0.139
207.46.13.181 - - [21/Oct/2017:23:55:15 +0800] "GET /thread-3391940-1-1.html HTTP/1.1" 200 42131 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "207.46.13.181" "0.421" 0.415
222.73.68.131 - - [21/Oct/2017:23:55:15 +0800] "GET /forum.php?mod=forumdisplay&fid=51&page=1 HTTP/1.1" 200 81309 "-" "Mozilla/5.2 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0 GreaterBrowser/5825" "222.73.68.131" "0.276" 0.259
207.46.13.181 - - [21/Oct/2017:23:55:16 +0800] "GET /thread-2891676-1-1.html HTTP/1.1" 200 24482 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "207.46.13.181" "0.362" 0.356

4.4 Exclude the lines from the Baiduspider and bingbot search-engine crawlers

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |egrep -iv 'bot|spider' >no_bot.txt
[root@centos7-100 ~]# wc -l no_bot.txt 
491777 no_bot.txt

4.5 Rank the IP addresses in no_bot.txt

[root@centos7-100 ~]# cat no_bot.txt |awk '{print$1}'|sort|uniq -c |sort -rn|head 
   6131 122.10.12.122
   6015 68.180.231.54
   3676 104.197.197.203
   3587 212.150.211.165
   3334 210.14.154.178
   2824 108.177.181.202
   2768 185.25.32.45
   2727 146.148.67.171
   2554 175.20.88.183
   2554 119.57.159.183

awk '{print $1}' no_bot.txt|sort|uniq -c|sort -rn|head -10

4.6 See which pages the IP 122.10.12.122 visited

grep '122.10.12.122' no_bot.txt|awk '{print $4,$7}'

[root@centos7-100 ~]# cat no_bot.txt |grep "122.10.12.122" |awk '{print$4,$7}'|head 
[21/Oct/2017:23:56:11 /thread-34470-1-1.html
[21/Oct/2017:23:58:38 /thread-4728992-1-1.html
[21/Oct/2017:23:59:51 /home.php?mod=spacecp&ac=favorite&type=forum&id=75&handlekey=favoriteforum
[22/Oct/2017:00:01:46 /forum.php?mod=attachment&aid=NTMxNjg2OXw3NzVkMGQzNXwxNTA4MDA3OTc1fDB8Mjk4MjU5Nw%3D%3D&nothumb=yes
[22/Oct/2017:00:02:19 /thread-323057-1-1.html
[22/Oct/2017:00:03:23 /thread-201907-1-1.html
[22/Oct/2017:00:03:31 /thread-376800-1-1.html
[22/Oct/2017:00:04:50 /forum.php?mod=attachment&aid=ODQzNDQ0OXxjOGJkNGE1OHwxNTA4MDIzODkwfDB8NDY4MzI4Mg%3D%3D&nothumb=yes
[22/Oct/2017:00:05:12 /forum.php?mod=redirect&tid=4305037&goto=lastpost
[22/Oct/2017:00:06:14 /thread-116752-1-1.html

grep '122.10.12.122' no_bot.txt|awk '{print $4,$7}'|head -1

[root@centos7-100 ~]# cat no_bot.txt |grep "122.10.12.122" |head -1 
122.10.12.122 - - [21/Oct/2017:23:56:11 +0800] "GET /thread-34470-1-1.html HTTP/1.1" 499 0 "-" "-" "122.10.12.122" "0.012" -
[root@centos7-100 ~]# cat no_bot.txt |grep "122.10.12.122" |awk '{print$4}'|head -1
[21/Oct/2017:23:56:11
[root@centos7-100 ~]# cat no_bot.txt |grep "122.10.12.122" |awk '{print$7}'|head -1
/thread-34470-1-1.html

grep '122.10.12.122' no_bot.txt|awk '{print $4,$7}'|tail -1

[root@centos7-100 ~]# cat no_bot.txt |grep "122.10.12.122" |awk '{print$4,$7}'|tail 
[22/Oct/2017:23:44:45 /forum-22-139.html
[22/Oct/2017:23:45:54 /forum-114-130.html
[22/Oct/2017:23:46:38 /forum.php?mod=post&action=reply&fid=55&tid=62562&repquote=1826305&extra=page%3D1&page=1
[22/Oct/2017:23:47:36 /thread-1035263-1-1.html
[22/Oct/2017:23:49:14 /forum.php?mod=redirect&tid=4311161&goto=lastpost
[22/Oct/2017:23:49:45 /thread-309515-1-1.html
[22/Oct/2017:23:51:31 /thread-4704791-1-1.html
[22/Oct/2017:23:52:18 /forum-22-142.html
[22/Oct/2017:23:54:26 /thread-355261-1-1.html
[22/Oct/2017:23:54:46 /thread-4712254-1-1.html
[root@centos7-100 ~]# 

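5. How many times did each search engine crawl the site today? Which pages did they fetch? How were the response times?

5.1 First see which search engines are crawling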

[root@centos7-100 ~]# head bbs.goumin.com_access.log 
[root@centos7-100 ~]# tail bbs.goumin.com_access.log 

5.2 echo "Total search-engine requests: $(egrep -i 'bot|spider' bbs.goumin.com_access.log |wc -l)"

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |egrep -i "bot|spider"|wc -l
465941

5.3 echo "Baidu requests: $(egrep -i 'Baiduspider' bbs.goumin.com_access.log |wc -l)"

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |egrep -i "baiduspider"|wc -l
54374

5.4 echo "bing requests: $(egrep -i 'bingbot' bbs.goumin.com_access.log |wc -l)"

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |egrep -i "bingbot"|wc -l
166589

5.5 echo "Google requests: $(egrep -i 'googlebot' bbs.goumin.com_access.log |wc -l)"

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |egrep -i "googlebot"|wc -l
27837

5.6 echo "sougou requests: $(egrep -i 'Sogou web spider|pic.sogou.com' bbs.goumin.com_access.log |wc -l)"

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |egrep -i "Sogou web spider"|wc -l
42301

5.7 echo "yisou requests: $(egrep -i 'YisouSpider' bbs.goumin.com_access.log |wc -l)"

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |egrep -i "YisouSpider"|wc -l
146952

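5.8 echo "brandwatch requests: $(egrep -i 'brandwatch' bbs.goumin.com_access.log |wc -l)"

About: Brandwatch provides social-intelligence monitoring and analysis tools. It aggregates large volumes of user activity on social sites to support data-driven business strategy.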

[root@centos7-100 ~]# cat bbs.goumin.com_access.log |egrep -i "brandwatch"|wc -l
5101
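
Each egrep above reads the whole log again. As a possible alternative, one awk pass can sort every line into a per-crawler counter. The sketch below runs on trimmed sample lines modelled on the head output shown earlier; the category names and the function name `bot_counts` are made up for illustration:

```shell
bots_sample='207.46.13.181 - - [21/Oct/2017:23:55:13 +0800] "GET /thread-852746-1-1.html HTTP/1.1" 200 22439 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.131 - - [21/Oct/2017:23:55:14 +0800] "GET /thread-253884-1-1.html HTTP/1.1" 200 23804 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
123.125.71.49 - - [21/Oct/2017:23:55:14 +0800] "GET /forum36/type1605/ HTTP/1.1" 200 18610 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
39.78.49.9 - - [21/Oct/2017:23:55:15 +0800] "GET /forum-22-1.html HTTP/1.1" 200 20808 "http://bbs.goumin.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/52.0.2743.116"'

# Lower-case the line once, then bump the counter of the first crawler that matches.
bot_counts() {
    awk '{
        line = tolower($0)
        if (line ~ /baiduspider/)      c["baidu"]++
        else if (line ~ /bingbot/)     c["bing"]++
        else if (line ~ /googlebot/)   c["google"]++
        else if (line ~ /bot|spider/)  c["other"]++
    } END { for (k in c) print k, c[k] }' | sort
}

printf '%s\n' "$bots_sample" | bot_counts
```

On the sample this reports baidu 1 and bing 2; the Chrome request is not counted.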

11. Report the result within 5 minutes

Chapter 9 Reference Log-Analysis Script

#!/bin/bash

# 1. Show server information

echo "==============================
Server name: $(hostname)
Server IP: $(hostname -I)
Log file: xxx.com_access.log
Date: $(date +%F)
=============================="

# 2. PV count

echo "PV count: $(wc -l bbs.xxxx.com_access.log|awk '{print $1}')"
echo "=============================="

# 3. Search-engine requests

echo "Search-engine summary"
echo "Total search-engine requests: $(egrep -i 'bot|spider' bbs.xxxx.com_access.log |wc -l)"
echo "Baidu requests: $(egrep -i 'Baiduspider' bbs.xxxx.com_access.log |wc -l)"
echo "bing requests: $(egrep -i 'bingbot' bbs.xxxx.com_access.log |wc -l)"
echo "Google requests: $(egrep -i 'googlebot' bbs.xxxx.com_access.log |wc -l)"
echo "sougou requests: $(egrep -i 'Sogou web spider|pic.sogou.com' bbs.xxxx.com_access.log |wc -l)"
echo "yisou requests: $(egrep -i 'YisouSpider' bbs.xxxx.com_access.log |wc -l)"
echo "brandwatch requests: $(egrep -i 'brandwatch' bbs.xxxx.com_access.log |wc -l)"

# 4. TOP IP

echo "=============================="
echo "Top 10 IPs by request count:"
num=1
# ip.txt is expected to hold one "count ip" pair per line
exec < ip.txt
while read line
do
# command substitution: the backquotes were lost in the original listing
num=$(echo ${line}|awk '{print $1}')
ip=$(echo ${line}|awk '{print $2}')
# cip.cc answers in Chinese; the line containing 地址 (address) holds the location
host=$(curl -s cip.cc/${ip}|awk '/地址/{print $3}')
echo "${num} ${ip} ${host}"
sleep 2
done

# 5. Other

echo ""
echo "Monitored key link: GET /thread-"
echo "
"
echo "Key-link PV count: $(grep "GET /thread-" bbs.xxxx.com_access.log|wc -l)"
echo ""
echo "Key-link average response time: $(grep "GET /thread-" bbs.xxxx.com_access.log|awk '{sum+=$NF} END {if (NR) print sum/NR}')"
echo "
"
echo "Key-link response-time ranking"
echo "$(awk '{print $NF}' bbs.xxxx.com_access.log |grep -v "-"|cut -b -3|sort|uniq -c|sort -nr|head -10)"
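
The TOP IP step of the script reads ip.txt, but the script never creates that file. One way it could be produced, and read back without calling awk twice per line, is sketched below. The "count ip" layout of ip.txt, the function names and the demo log are assumptions; the cip.cc lookup is left out so that the example runs offline:

```shell
# Build "count ip" lines, highest count first.
make_ip_list() {
    awk '{print $1}' "$1" | sort | uniq -c | sort -rn | head -n 10
}

# read splits each line on whitespace, so the count and the IP land in
# separate variables without an extra awk call.
print_ip_list() {
    while read -r num ip; do
        echo "${num} ${ip}"
    done
}

# Tiny demo log (made-up data) in place of bbs.xxxx.com_access.log.
demo_log=$(mktemp)
printf '%s\n' '10.0.0.1 - -' '10.0.0.2 - -' '10.0.0.1 - -' > "$demo_log"

make_ip_list "$demo_log" | print_ip_list
```

In the script itself, `make_ip_list bbs.xxxx.com_access.log > ip.txt` would run before the loop.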

posted @ 2021-10-19 21:41  zhaocheng690