The Linux Text-Processing Trio: awk from Beginner to Proficient

1-1 Introduction to awk and how it works

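GNU awk:
The three text-processing tools: grep, sed, awk
grep, egrep, fgrep: text filters that select lines matching a pattern
sed: a stream editor that processes text line by line (good at substitution and at extracting lines)
It works with two buffers:
the pattern space and the hold space
awk: a report generator that formats text output (good at extracting columns)
AWK: named after Aho, Weinberger, Kernighan --> New AWK, NAWK
GNU awk, gawk
gawk - pattern scanning and processing language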

[root@linux-node11 ~]# which awk
/bin/awk
[root@linux-node11 ~]# ll /bin/awk 
lrwxrwxrwx. 1 root root 4 Jun 18 16:50 /bin/awk -> gawk

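Basic usage: gawk [options] 'program' FILE ...
program: PATTERN{ACTION STATEMENTS}
Statements are separated by semicolons
print, printf
Options:
-F: specifies the input field separator;
-v var=value: defines a variable;
How awk processes text, compared with sed:
sed reads each line into the pattern space and processes it there.
awk splits each line it reads into fields and stores them in the built-in field variables $1, $2, $3, ... ($0 holds the whole line). Each field can then be processed further; which lines get processed is decided by the conditions or line ranges you specify.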
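
The field splitting described above can be seen with a small sketch. The two input lines are made up for illustration; any whitespace-separated text behaves the same way.

```shell
# Feed two sample records to awk and show how each one is split into fields.
out=$(printf 'alice 30 admin\nbob 25\n' |
  awk '{ print "record=" $0 "; first=" $1 "; last=" $NF "; fields=" NF }')
printf '%s\n' "$out"
```

$0 is the whole record, $1 the first field, and $NF the last one, so the second line reports 2 fields and 25 as its last field.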

1-2 Built-in awk variables by example (RS, FS, ORS, etc.)

[root@linux-node1 ~]# tail -3 /etc/fstab |awk '{print $2,$4}'
/ defaults
/boot defaults
swap defaults

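1. print

print item1, item2...
Key points:
1) Items are separated by commas; on output they are joined with OFS;
2) Each item can be a string, a number, a field of the current record, a variable, or an awk expression;
3) If no item is given, it is equivalent to print $0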
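
A minimal sketch of the three points above, using a made-up one-line input and OFS set to "-" so the effect of the comma is visible:

```shell
# Comma-separated items are joined with OFS; adjacent items are concatenated;
# a bare print outputs $0 unchanged.
out=$(printf 'a b\n' | awk -v OFS='-' '{ print $1, $2; print $1 $2; print }')
printf '%s\n' "$out"
```

The first print yields a-b, the second ab, and the third the original line a b.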

[root@linux-node1 ~]# tail -3 /etc/fstab |awk '{print "hello",$2,$4,6}' 
hello / defaults 6
hello /boot defaults 6
hello swap defaults 6

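2. Variables

2.1 Built-in variables
FS: input field separator; defaults to whitespace;
OFS: output field separator; defaults to a single space;
RS: input record separator; defaults to a newline;
ORS: output record separator; defaults to a newline;
NF: number of fields in the current record
{print NF} prints the field count, {print $NF} prints the last field
NR: number of records read so far (counted across all input files)
FNR: number of records read from the current file (counted per file)
FILENAME: name of the current file
ARGC: number of command-line arguments;
ARGV: array holding the command-line arguments;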

[root@linux-node1 ~]# awk -v FS=":" '{print $1}' /etc/passwd
# Or, equivalently:
[root@linux-node1 ~]# awk -F: '{print $1}' /etc/passwd
[root@linux-node1 ~]# awk -v FS=":" -v OFS=':' '{print $1,$3,$7}' /etc/passwd
# Use a space as the record separator
[root@linux-node1 ~]# awk -v RS=' ' '{print $0}' /etc/passwd
[root@linux-node1 ~]# awk -v RS=' ' -v ORS='#' '{print $0}' /etc/passwd
[root@linux-node1 ~]# awk '{print NF}' /etc/fstab 
0
1
2
10
1
9
12
1
6
6
6
[root@linux-node1 ~]# awk '{print $NF}' /etc/fstab   
#
/etc/fstab
2017
#
'/dev/disk'
info
#
0
0
0
[root@linux-node1 ~]# awk '{print FNR}' /etc/fstab /etc/issue
1
2
3
4
5
6
7
8
9
10
11
1
2
3
[root@linux-node1 ~]# awk '{print FILENAME}' /etc/fstab /etc/issue   
/etc/fstab
/etc/fstab
/etc/fstab
/etc/fstab
/etc/fstab
/etc/fstab
/etc/fstab
/etc/fstab
/etc/fstab
/etc/fstab
/etc/fstab
/etc/issue
/etc/issue
/etc/issue
[root@linux-node1 ~]# awk 'BEGIN{print ARGC}' /etc/fstab /etc/issue
3
[root@linux-node1 ~]# awk 'BEGIN{print ARGV[0]}' /etc/fstab /etc/issue         
awk
[root@linux-node1 ~]# awk 'BEGIN{print ARGV[1]}' /etc/fstab /etc/issue
/etc/fstab
[root@linux-node1 ~]# awk 'BEGIN{print ARGV[2]}' /etc/fstab /etc/issue 
/etc/issue
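
To contrast NR with FNR directly, here is a small sketch that uses two throwaway files (one.txt and two.txt are hypothetical names created in a temporary directory):

```shell
# NR keeps counting across files; FNR restarts at 1 for each file.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n'    > "$tmpdir/two.txt"
out=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' one.txt two.txt)
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

The single record of two.txt is reported with NR 3 but FNR 1.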

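1-3 User-defined variables, referencing variables, using printf

3. User-defined variables

(1) -v var=value
Variable names are case-sensitive;
(2) Define them directly in the program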

[root@linux-node1 ~]# awk -v test='hello gawk' 'BEGIN{print test}'
hello gawk
awk variables are referenced without a $ prefix.
Variables can also be defined directly in the program:
[root@linux-node1 ~]# awk 'BEGIN{test="hello gawk"; print test}'
hello gawk
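
A common use of -v is passing a shell value into the awk program. The sketch below assumes a hypothetical shell variable threshold and made-up name:x:uid records:

```shell
# Pass a shell variable into awk with -v and use it in a pattern.
threshold=1000   # hypothetical value, chosen only for this example
out=$(printf 'www:x:1000\nntp:x:38\npapa:x:1001\n' |
  awk -F: -v min="$threshold" '$3 >= min { print $1 }')
printf '%s\n' "$out"
```

Inside the program the value is referred to as min, without a $ prefix.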

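4. The printf command
Formatted output: printf FORMAT, item1, item2...
(1) FORMAT is required;
(2) No newline is added automatically; the newline \n must be given explicitly;
(3) FORMAT must contain one format specifier for each of the items that follow;
Format specifiers:
%c: prints a single character (a numeric argument is taken as a character code);
%d, %i: decimal integer;
%e, %E: number in scientific notation;
%f: floating-point number;
%g, %G: number in scientific or floating-point notation, whichever is shorter;
%s: string;
%u: unsigned integer;
%%: a literal %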
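
A short sketch exercising several of these specifiers; the values are arbitrary and chosen only for illustration:

```shell
# %c turns the code 65 into a character, %d truncates 42.9 to an integer,
# and %e, %f, %g show three renderings of the same number.
out=$(awk 'BEGIN {
  printf "%c|%d|%s|%%\n", 65, 42.9, "text"
  printf "%e|%f|%g\n", 1234.5, 1234.5, 1234.5
}')
printf '%s\n' "$out"
```

Each printf call ends its format with \n because printf never adds a newline on its own.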

[root@linux-node1 ~]# awk -F: '{printf "%s\n",$1}' /etc/passwd
[root@linux-node1 ~]# awk -F: '{printf "Username: %s\n",$1}' /etc/passwd
Username: systemd-network
Username: dbus
Username: polkitd
Username: tss
Username: postfix
Username: sshd
Username: apache
Username: mysql
Username: redis
Username: zabbix
Username: ntp
Username: www
[root@linux-node1 ~]# awk -F: '{printf "Username: %s, UID: %d\n",$1,$3}' /etc/passwd
Username: polkitd, UID: 997
Username: tss, UID: 59
Username: postfix, UID: 89
Username: sshd, UID: 74
Username: apache, UID: 48
Username: mysql, UID: 27
Username: redis, UID: 996
Username: zabbix, UID: 995
Username: ntp, UID: 38
Username: www, UID: 1000

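1-4 printf modifiers, operators, patterns

Modifiers:

#[.#]: the first number sets the display width; the second sets the precision after the decimal point
%3.1f
-: left-align
+: show the sign of a number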
[root@linux-node1 ~]# awk -F: '{printf "Username: %15s, UID: %d\n",$1,$3}' /etc/passwd
[root@linux-node1 ~]# awk -F: '{printf "Username: %-15s, UID: %d\n",$1,$3}' /etc/passwd
Username: polkitd        , UID: 997
Username: tss            , UID: 59
Username: postfix        , UID: 89
Username: sshd           , UID: 74
Username: apache         , UID: 48
Username: mysql          , UID: 27
Username: redis          , UID: 996
Username: zabbix         , UID: 995
Username: ntp            , UID: 38
Username: www            , UID: 1000
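
The modifiers can also be combined in one format string. A minimal sketch with arbitrary values, using brackets to make the padding visible:

```shell
# Width and precision, left alignment, default right alignment, and explicit sign.
out=$(awk 'BEGIN { printf "[%8.2f][%-6s][%6s][%+d]\n", 3.14159, "ab", "ab", 7 }')
printf '%s\n' "$out"
```

This prints [    3.14][ab    ][    ab][+7].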

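4. Operators

Arithmetic operators:
x+y, x-y, x*y, x/y, x^y, x%y
-x
+x: converts to a number
String operator: concatenation, written by juxtaposition with no operator symbol
Assignment operators:
=, +=, -=, *=, /=, %=, ^=
++, --
Comparison operators:
>, >=, <, <=, !=, ==
Pattern-matching operators:
~: matches
!~: does not match
Logical operators:
&&
||
!

Function calls:
function_name(argu1, argu2, ...)
Conditional expression:
selector?if-true-expression:if-false-expression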

[root@linux-node1 ~]# awk -F: '{usertype=($3>=1000?"Common User":"Sysadmin  or SysUser");printf "%15s:%-s\n",$1,usertype}' /etc/passwd
           root:Sysadmin  or SysUser
            bin:Sysadmin  or SysUser
         daemon:Sysadmin  or SysUser
            adm:Sysadmin  or SysUser
             lp:Sysadmin  or SysUser
           sync:Sysadmin  or SysUser
       shutdown:Sysadmin  or SysUser
           halt:Sysadmin  or SysUser
           mail:Sysadmin  or SysUser
       operator:Sysadmin  or SysUser
          games:Sysadmin  or SysUser
            ftp:Sysadmin  or SysUser
         nobody:Sysadmin  or SysUser
  avahi-autoipd:Sysadmin  or SysUser
systemd-bus-proxy:Sysadmin  or SysUser
systemd-network:Sysadmin  or SysUser
           dbus:Sysadmin  or SysUser
        polkitd:Sysadmin  or SysUser
            tss:Sysadmin  or SysUser
        postfix:Sysadmin  or SysUser
           sshd:Sysadmin  or SysUser
         apache:Sysadmin  or SysUser
          mysql:Sysadmin  or SysUser
          redis:Sysadmin  or SysUser
         zabbix:Sysadmin  or SysUser
            ntp:Sysadmin  or SysUser
            www:Common User
           papa:Common User
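
Several of the operators listed above can be combined in one action. The sketch below uses made-up name:shell records; the variable name kind is hypothetical:

```shell
# Regex match (~), conditional expression, string concatenation, and arithmetic.
out=$(printf 'root:/bin/bash\nntp:/sbin/nologin\n' |
  awk -F: '{
    kind = ($2 ~ /bash$/) ? "login" : "nologin"   # regex match plus ternary
    print $1 " -> " kind, 2^3 + 7 % 4             # concatenation, then 8 + 3
  }')
printf '%s\n' "$out"
```

Exponentiation and modulo bind tighter than addition, so the second item is 11 on every line.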

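1-5 Pattern examples and an introduction to control statements

5. PATTERN

(1) empty: the empty pattern matches every line;
(2) /regular expression/: processes only lines matched by the regular expression;
(3) relational expression: evaluates to true or false; the line is processed only when the result is true
True: a non-zero number or a non-empty string;
(4) line ranges:
startline, endline
/pat1/,/pat2/
Note: bare line numbers are not supported; use a condition on NR instead
(5) BEGIN/END patterns
BEGIN{}: runs once before any input is read (for example, to print a table header);
END{}: runs once after all input has been processed, before awk exits;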

[root@linux-node1 ~]# awk '/^UUID/{print $1}' /etc/fstab  
UUID=742654df-f0e1-4468-a163-68c795a6d553
[root@linux-node1 ~]# awk '!/^UUID/{print $1}' /etc/fstab  

#
#
#
#
#
#
#
/dev/mapper/centos-root
/dev/mapper/centos-swap
[root@linux-node1 ~]# awk -F: '$3>=1000{print $1,$3}' /etc/passwd 
www 1000
papa 1001
[root@linux-node1 ~]# awk -F: '$3<1000{print $1,$3}' /etc/passwd
[root@linux-node1 ~]# awk -F: '$NF=="/bin/bash"{print $1,$NF}' /etc/passwd
root /bin/bash
papa /bin/bash
[root@linux-node1 ~]# awk -F: '$NF~/bash$/{print $1,$NF}' /etc/passwd   
root /bin/bash
www /sbin/bash
papa /bin/bash
[root@linux-node1 ~]# awk -F: '/^root/,/^papa/{print $1}' /etc/passwd
root
bin
daemon
adm
lp
sync
shutdown
halt
mail
operator
games
ftp
nobody
avahi-autoipd
systemd-bus-proxy
systemd-network
dbus
polkitd
tss
postfix
sshd
apache
mysql
redis
zabbix
ntp
www
papa
[root@linux-node1 ~]# awk -F: '/^h/,/^w/{print $1}' /etc/passwd 
halt
mail
operator
games
ftp
nobody
avahi-autoipd
systemd-bus-proxy
systemd-network
dbus
polkitd
tss
postfix
sshd
apache
mysql
redis
zabbix
ntp
www
[root@linux-node1 ~]# awk -F: '(NR>=2&&NR<=10){print $1}' /etc/passwd
bin
daemon
adm
lp
sync
shutdown
halt
mail
operator
[root@linux-node1 ~]# awk -F: 'BEGIN{print "    username    uid \n-------------------"}' 
    username    uid 
-------------------
[root@linux-node1 ~]# awk -F: 'BEGIN{print "    username    uid \n-------------------"}{print $1,$3}' /etc/passwd
[root@linux-node1 ~]# awk -F: '{print "    username    uid \n-------------------"}{print $1,$3}' /etc/passwd
    username    uid 
-------------------
root 0
    username    uid 
-------------------
bin 1
    username    uid 
-------------------
daemon 2
    username    uid 
-------------------
adm 3
    username    uid 
-------------------
lp 4
    username    uid 
-------------------
sync 5
    username    uid 
-------------------
[root@linux-node1 ~]# awk -F: 'BEGIN{print "    username    uid \n-------------------"}{print $1,$3}END{print "===============\n  end"}' /etc/passwd
ntp 38
www 1000
papa 1001
===============
  end
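
BEGIN, a relational pattern, and END can be combined into a small report. This sketch uses made-up name:uid records and a hypothetical counter n:

```shell
# Header from BEGIN, filtered rows from a relational pattern, summary from END.
out=$(printf 'root:0\nbin:1\nwww:1000\npapa:1001\n' |
  awk -F: '
    BEGIN      { print "name uid" }            # runs once, before any input
    $2 >= 1000 { print $1, $2; n++ }           # only records with uid >= 1000
    END        { print "regular users:", n }   # runs once, after the last record
  ')
printf '%s\n' "$out"
```

Variables such as n keep their value between records, which is why END can report the total.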

6. Common actions

(1) Expressions
(2) Control statements: if, while, etc.
(3) Compound statements: statements grouped in braces
(4) Input statements
(5) Output statements

7. Control statements

if(condition) {statements}
if(condition) {statements} else {statements}
while(condition) {statements}
do {statements} while(condition)
for (expr1;expr2;expr3) {statements}
break
continue
exit
delete array[index]
delete array

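1-6 if-else and while: use cases and examples

The if-else statement

Syntax: if(condition) statement [else statement]
Multiple statements in a branch must be enclosed in {}
eg1: if a user's ID is greater than or equal to 1000, show the user name and ID

# Single branch: a lone print statement needs no {}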
[root@linux-node1 ~]# awk -F: '{if($3>=1000) print $1,$3}' /etc/passwd
www 1000
papa 1001
# Two branches
eg2: if a user's ID is greater than or equal to 1000, label the user a common user; otherwise an administrator or system user
[root@linux-node1 ~]# awk -F: '{if($3>=1000) {printf "Common user: %s\n",$1} else {printf "root or Sysuser: %s\n",$1}}' /etc/passwd
root or Sysuser: root
root or Sysuser: bin
root or Sysuser: daemon
root or Sysuser: adm
root or Sysuser: lp
root or Sysuser: sync
root or Sysuser: shutdown
root or Sysuser: halt
root or Sysuser: mail
root or Sysuser: operator
root or Sysuser: games
root or Sysuser: ftp
root or Sysuser: nobody
root or Sysuser: avahi-autoipd
root or Sysuser: systemd-bus-proxy
root or Sysuser: systemd-network
root or Sysuser: dbus
root or Sysuser: polkitd
root or Sysuser: tss
root or Sysuser: postfix
root or Sysuser: sshd
root or Sysuser: apache
root or Sysuser: mysql
root or Sysuser: redis
root or Sysuser: zabbix
root or Sysuser: ntp
Common user: www
Common user: papa
# Use case: testing a condition on the whole line or on a single field read by awk;
eg3: if a user's default shell is bash, show that user's name
[root@linux-node1 ~]# awk -F: '{if($NF=="/bin/bash")print $1}' /etc/passwd  
root
papa
[root@linux-node1 ~]# awk '{if(NF>5)print $0}' /etc/fstab
[root@linux-node1 ~]# df -h|awk -F "[% ]+" '/^\/dev/{print $5}'
8
44
[root@linux-node1 ~]# df -h|awk -F'[%]' '/^\/dev/{print $1}'|awk '{if($NF>=20)print $1}'
/dev/sda1

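The while loop

Syntax: while(condition) statement
Note: more than one statement must be enclosed in {}
The loop is entered while the condition is true and exits when it becomes false
Use cases: applying similar processing to each of several fields in a line; processing each element of an array;

# Report on each field of every matching line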
[root@linux-node1 ~]# awk '/^[[:space:]]*linux16/{print}' /etc/grub2.cfg
[root@linux-node1 ~]# awk '/^[[:space:]]*linux16/{i=1;while(i<=NF) {print $i,length($i);i++}}' /etc/grub2.cfg 
linux16 7
/vmlinuz-3.10.0-327.el7.x86_64 30
root=/dev/mapper/centos-root 28
ro 2
crashkernel=auto 16
rd.lvm.lv=centos/root 21
rd.lvm.lv=centos/swap 21
biosdevname=0 13
net.ifnames=0 13
rhgb 4
quiet 5
LANG=en_US.UTF-8 16
linux16 7
/vmlinuz-0-rescue-004f45a8807740d880f2253f514a959e 50
root=/dev/mapper/centos-root 28
ro 2
crashkernel=auto 16
rd.lvm.lv=centos/root 21
rd.lvm.lv=centos/swap 21
biosdevname=0 13
net.ifnames=0 13
rhgb 4
quiet 5
[root@linux-node1 ~]# awk '/^[[:space:]]*linux16/{i=1;while(i<=NF) {if(length($i)>=7) {print $i,length($i)};i++}}' /etc/grub2.cfg 
linux16 7
/vmlinuz-3.10.0-327.el7.x86_64 30
root=/dev/mapper/centos-root 28
crashkernel=auto 16
rd.lvm.lv=centos/root 21
rd.lvm.lv=centos/swap 21
biosdevname=0 13
net.ifnames=0 13
LANG=en_US.UTF-8 16
linux16 7
/vmlinuz-0-rescue-004f45a8807740d880f2253f514a959e 50
root=/dev/mapper/centos-root 28
crashkernel=auto 16
rd.lvm.lv=centos/root 21
rd.lvm.lv=centos/swap 21
biosdevname=0 13
net.ifnames=0 13

1-7 Using do-while, for, and switch

The do-while loop

Syntax: do statement while(condition)
Meaning: the loop body runs at least once
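
A minimal sketch of the "at least once" behaviour, using a made-up input whose second line is empty:

```shell
# The body runs before the condition is tested, so even the empty
# second line (NF is 0) produces one line of output.
out=$(printf 'a b c\n\n' |
  awk '{ i = 1; do { print NR ": [" $i "]"; i++ } while (i <= NF) }')
printf '%s\n' "$out"
```

A while loop with the same condition would print nothing for the empty line.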

The for loop

Syntax: for(expr1;expr2;expr3) statement
for(variable assignment;condition;iteration process) {for-body}

[root@linux-node1 ~]# awk '/^[[:space:]]*linux16/{for(i=1;i<=NF;i++) {print $i,length($i)}}' /etc/grub2.cfg 
linux16 7
/vmlinuz-3.10.0-327.el7.x86_64 30
root=/dev/mapper/centos-root 28
ro 2
crashkernel=auto 16
rd.lvm.lv=centos/root 21
rd.lvm.lv=centos/swap 21
biosdevname=0 13
net.ifnames=0 13
rhgb 4
quiet 5
LANG=en_US.UTF-8 16
linux16 7
/vmlinuz-0-rescue-004f45a8807740d880f2253f514a959e 50
root=/dev/mapper/centos-root 28
ro 2
crashkernel=auto 16
rd.lvm.lv=centos/root 21
rd.lvm.lv=centos/swap 21
biosdevname=0 13
net.ifnames=0 13
rhgb 4
quiet 5

Special form:
Iterates over the elements of an array:
Syntax: for (var in array) {for-body}

The switch statement (a gawk extension)

Syntax: switch(expression) {case VALUE1 or /REGEXP/: statement; case VALUE2 or /REGEXP2/: statement; ...; default: statement}

break and continue

break
continue
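
A small sketch of both statements inside a for loop over the fields of one made-up line; the word stop is an arbitrary sentinel chosen for this example:

```shell
# Sum the non-negative fields up to, but not including, the word "stop".
out=$(printf '3 -1 4 stop 5\n' |
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i == "stop") break      # leave the loop entirely
      if ($i < 0) continue         # skip this field, go on to the next
      sum += $i
    }
    print "sum=" sum
  }')
printf '%s\n' "$out"
```

The loop adds 3 and 4, skips -1, and never reaches 5, so it prints sum=7.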

next

Ends processing of the current line early and moves directly to the next line;

[root@linux-node1 ~]# awk -F: '{if($3%2!=0) next; print $1,$3}' /etc/passwd 
root 0
daemon 2
lp 4
shutdown 6
mail 8
games 12
ftp 14
avahi-autoipd 170
systemd-network 998
sshd 74
apache 48
redis 996
ntp 38
www 1000

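1-8 Associative arrays and usage examples

array

Associative array: array[index-expression]
index-expression:
(1) Any string can be used; string literals must be in double quotes;
(2) If an element does not exist, referencing it makes awk create it automatically and initialize its value to the empty string;
To test whether an element exists in an array, use the form index in array;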
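
The difference between testing with in and merely referencing an element can be sketched as follows (the array name seen and its keys are hypothetical):

```shell
# "in" tests membership without side effects; a plain reference creates the element.
out=$(awk 'BEGIN {
  seen["mon"] = 1
  print (("tue" in seen) ? "tue present" : "tue absent")
  x = seen["wed"]                 # reading it is enough to create seen["wed"]
  print (("wed" in seen) ? "wed present" : "wed absent")
}')
printf '%s\n' "$out"
```

This is why existence checks written as array[key] == "" can silently grow the array.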

[root@linux-node1 ~]# awk 'BEGIN{weekdays["mon"]="Monday";weekdays["tue"]="Tuesday";print weekdays["mon"]}'
Monday

To iterate over every element of an array, use a for loop:
for(var in array) {for-body}

[root@linux-node1 ~]# awk 'BEGIN{weekdays["mon"]="Monday";weekdays["tue"]="Tuesday";for(i in weekdays) {print weekdays[i]}}'    
Tuesday
Monday

Note: var takes each index of array in turn, not each value; the traversal order is unspecified.

[root@linux-node1 ~]# netstat -tan|awk '/^tcp\>/{state[$NF]++}END{for(i in state) {print i,state[i]}}'
LISTEN 6
ESTABLISHED 3
[root@linux-node1 ~]# awk '{ip[$1]++}END{for(i in ip){print i,ip[i]}}' /var/log/httpd/access_log-20171009
192.168.56.1 77

Exercise 1: count how many times each filesystem type appears in /etc/fstab;

[root@linux-node1 ~]# awk '/^UUID/{fs[$3]++}END{for(i in fs) {print i,fs[i]}}' /etc/fstab 
xfs 1

Exercise 2: count how many times each word appears in a given file;

[root@linux-node1 ~]# awk '{for(i=1;i<=NF;i++){count[$i]++}}END{for(i in count){print i,count[i]}}' /etc/fstab 
swap 2
fstab(5), 1
filesystems, 1
on 1
/etc/fstab 1
/boot 1
more 1
mount(8) 1
UUID=742654df-f0e1-4468-a163-68c795a6d553 1
pages 1

1-9 Built-in and user-defined functions by example

Functions

9.1 Built-in functions
Numeric:
rand(): returns a random number between 0 and 1;

[root@linux-node1 ~]# awk 'BEGIN{print rand()}'
0.237788
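
Without a seed, rand() typically produces the same sequence on every run. A minimal sketch that seeds the generator with srand() and scales the result to a die roll (the variable names are hypothetical):

```shell
# Seed from the current time, then map rand() onto the integers 1..6.
out=$(awk 'BEGIN {
  srand()                           # seed from the current time
  r = rand()                        # 0 <= r < 1
  in_range = (r >= 0 && r < 1)
  print in_range, int(r * 6) + 1    # second value: a die roll from 1 to 6
}')
printf '%s\n' "$out"
```

The first value printed is always 1; the second varies between runs.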

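String:
length([s]): returns the length of the given string ($0 if s is omitted);
sub(r,s[,t]): searches the string t for the pattern r and replaces the first match with s; t defaults to $0;
gsub(r,s[,t]): searches the string t for the pattern r and replaces all matches with s; t defaults to $0;
split(s,a[,r]): splits the string s using r as the separator and stores the pieces in the array a (indices start at 1)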
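
The four functions can be compared side by side in a small sketch; the strings and the variable names s, t, part are made up for illustration:

```shell
# sub replaces the first match, gsub replaces all and returns the count,
# split returns the number of pieces and fills the array from index 1.
out=$(awk 'BEGIN {
  s = "foo boo"
  t = s; sub(/o/, "0", t); print t
  t = s; n = gsub(/o/, "0", t); print t, n
  m = split("10.0.0.1:22", part, ":")
  print m, part[1], part[2], length(part[1])
}')
printf '%s\n' "$out"
```

Both sub and gsub modify their third argument in place, which is why s is copied into t first.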

# r must be a regex and s a string; print is needed to see the modified line
[root@linux-node1 ~]# awk -F: '{sub(/o/,"O",$0); print}' /etc/passwd
[root@linux-node1 ~]# echo $?
[root@linux-node1 ~]# netstat -tan| awk '/^tcp\>/{split($5,ip,":");print ip[1]}'
0.0.0.0
0.0.0.0
0.0.0.0
0.0.0.0
0.0.0.0
0.0.0.0
192.168.56.11
192.168.56.11
192.168.56.1
[root@linux-node1 ~]# netstat -tan| awk '/^tcp\>/{split($5,ip,":");count[ip[1]]++}END{for(i in count){print i,count[i]}}'
192.168.56.1 1
0.0.0.0 6
192.168.56.11 2
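
The section title also mentions user-defined functions. As a minimal sketch (usertype is a hypothetical helper, not something defined in these notes), a function is declared with the function keyword outside any pattern-action block:

```shell
# Define a helper once, then call it for every record.
out=$(printf 'root:0\nwww:1000\n' |
  awk -F: '
    # usertype() is a hypothetical helper written for this example
    function usertype(uid) { return (uid >= 1000 ? "common" : "system") }
    { print $1, usertype($2) }
  ')
printf '%s\n' "$out"
```

This mirrors the conditional-expression example from section 1-4, with the classification moved into a reusable function.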

Two recommended books:
sed & awk
Linux Command Line and Shell Scripting Bible

posted @ 2017-10-22 22:11  ShenghuiChen