0x61: Shell Programming Study Notes (4)

This post is mainly about awk.

basic syntax of awk:
awk 'BEGIN {start_action} {action} END {stop_action}' filename
(1) awk '{print $1}' input_file
Prints the first column of each row.
(2) awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}' filename
Prints the sum of the values in the 5th column.
(3) With awksum.sh and ls.log both in ~/Documents/codes/sh, run
awk -f awksum.sh ls.log
This runs the script in awksum.sh and displays the sum of the 5th column of ls.log.
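A minimal sketch of (2) and (3) that runs without ls.log. The sample rows and the temporary script file are assumptions made for illustration; they mimic `ls -l` output, where column 5 holds the file size.

```shell
# Hypothetical sample in `ls -l` layout (column 5 = size, column 9 = name).
sample='-rw-r--r-- 1 user group 100 Jun 19 16:13 a.txt
-rw-r--r-- 1 user group 250 Jun 19 16:13 b.txt
-rw-r--r-- 1 user group 50 Jun 19 16:13 demo_file'

# (2): inline program, input piped on stdin.
total=$(printf '%s\n' "$sample" | awk 'BEGIN {sum = 0} {sum += $5} END {print sum}')
echo "$total"            # 400

# (3): the same program stored in a file and run with -f.
script=$(mktemp)
printf '%s\n' 'BEGIN {sum = 0}' '{sum += $5}' 'END {print sum}' > "$script"
total_from_file=$(printf '%s\n' "$sample" | awk -f "$script")
rm -f "$script"
echo "$total_from_file"  # 400
```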
(4) awk '{if($9=="demo_file") print $0;}' ls.log
This awk command checks whether the 9th column equals the string "demo_file"; if it matches, it prints the entire line.
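The same filter can also be written as a bare pattern, since a pattern with no action prints the matching line. A small sketch on made-up `ls -l` style rows (the sample data is an assumption, not the real ls.log):

```shell
# Hypothetical sample in `ls -l` layout; the file name is column 9.
sample='-rw-r--r-- 1 user group 100 Jun 19 16:13 a.txt
-rw-r--r-- 1 user group 50 Jun 19 16:13 demo_file'

# Pattern only: equivalent to '{if($9=="demo_file") print $0;}'.
hit=$(printf '%s\n' "$sample" | awk '$9 == "demo_file"')
echo "$hit"   # -rw-r--r-- 1 user group 50 Jun 19 16:13 demo_file
```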
(5) awk 'BEGIN{for(i=1;i<=5;++i) print "square of", i, "is", i*i}'
Prints the squares of 1 through 5. The BEGIN block runs without reading any input, so no file name is needed.
(6) FS - Input field separator variable

By default, awk assumes that fields are separated by whitespace (spaces and tabs). Set the FS variable to use a different delimiter.
    awk 'BEGIN{FS=":"}{print $2}' ls.log
    OR
    awk -F: '{print $2}' ls.log
    
    OFS - Output field separator variable
By default, print separates output fields with a single space. We can change this separator with the OFS variable:
    awk 'BEGIN{OFS=";"}{print $4,$5}' ls.log
    
    NF - Number of fields in the current record
    NR - Number of records read so far (the current line number)
    FILENAME - Name of the file awk is currently reading
    awk '{print NF,NR, FILENAME}' ls.log
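To see FS, OFS, NR and NF working together, here is a small sketch on two made-up colon-separated lines (the input is an assumption for illustration, not taken from ls.log):

```shell
# Split on ":" and join the printed fields with ";".
out=$(printf 'root:x:0:0\nuser:x:1000:1000\n' |
    awk 'BEGIN {FS = ":"; OFS = ";"} {print NR, NF, $1, $3}')
echo "$out"
# 1;4;root;0
# 2;4;user;1000
```

OFS only takes effect between the comma-separated arguments of print; writing `print $1 $3` would concatenate the fields with nothing in between.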

(7) Filtering lines using awk split function

split(string, array, delimiter)
awk '{
    split($2, arr,",");
    if(arr[3] == "UNIX")
        print $0;
    }' file.txt
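A runnable sketch of the same filter, with made-up input in place of file.txt. The extra check on the return value of split is my own addition; split returns the number of pieces, so rows with fewer than three pieces are skipped explicitly.

```shell
# Column 2 is a comma-separated list; keep rows whose 3rd item is UNIX.
out=$(printf 'r1 a,b,UNIX\nr2 a,b,LINUX\nr3 x,y\n' |
    awk '{ n = split($2, arr, ","); if (n >= 3 && arr[3] == "UNIX") print $0 }')
echo "$out"   # r1 a,b,UNIX
```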

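(8) sub(r, t, s)
Substitutes t for the first occurrence of the regular expression r in the string s. If s is not given, $0 is used.
The example below replaces the 4th occurrence of AAA in the first column with ZZZ (input on the left, output on the right).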

AAA 1    ->    AAA 1
BBB 2          BBB 2
CCC 3          CCC 3
AAA 4          AAA 4
AAA 5          AAA 5
BBB 6          BBB 6
CCC 7          CCC 7
AAA 8          ZZZ 8 
BBB 9          BBB 9
AAA 0          AAA 0
awk 'BEGIN {count=0}
{
    if($1 == "AAA")
    {
        count++;
    }
    if(count == 4)
    {
        sub("AAA", "ZZZ", $1)
    }
}
{
    print $0
}' file.txt
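A tiny sketch of the "first occurrence only" behaviour, using a made-up string. sub also returns the number of substitutions it made, which is handy for checking whether anything changed.

```shell
out=$(awk 'BEGIN {
    s = "AAA AAA"
    n = sub(/AAA/, "ZZZ", s)   # n is the number of substitutions made (0 or 1)
    print n, s
}')
echo "$out"   # 1 ZZZ AAA
```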

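(9) Use printf for formatted output
awk '{printf "%d\n", 012}' input_file
Whether the constant 012 is read as octal or as decimal depends on the awk implementation, so avoid leading zeros in numeric constants.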
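A short sketch of a few common printf conversions; the values are made up for illustration. Unlike print, printf adds no newline on its own, so the format string ends with \n.

```shell
# %-6s: left-justified string in 6 columns; %4d: integer in 4 columns;
# %.2f: floating point with 2 decimals.
out=$(awk 'BEGIN { printf "%-6s|%4d|%.2f\n", "abc", 42, 3.14159 }')
echo "$out"   # abc   |  42|3.14
```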
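(10) Matching regular expressions: use ~. awk regular expressions support '?'.
awk '$1~/pattern/' filename
This prints the whole matching line. The operator can also be used inside an if:
awk '{if($2~/pattern/) print $1}' filename
Similarly, !~ means "does not match".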
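A small sketch of ~, !~ and '?' on made-up input (the words below are just sample data):

```shell
data='apple 1
banana 2
cherry 3'

# ~ : first field matches /an/; a bare pattern prints the whole line.
matched=$(printf '%s\n' "$data" | awk '$1 ~ /an/')
echo "$matched"      # banana 2

# !~ : first field does not match /an/.
unmatched=$(printf '%s\n' "$data" | awk '$1 !~ /an/ {print $1}')
echo "$unmatched"    # apple and cherry, one per line

# ? makes the preceding character optional.
optional=$(printf 'color\ncolour\ncolr\n' | awk '/^colou?r$/')
echo "$optional"     # color and colour, one per line
```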
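(11) Built-in string functions
1. gsub(r,s[,t]) - replaces every match of r in t (default: $0) with s
Replace 4842 with 4899:
awk 'gsub(/4842/,4899) {print $0}' grade.txt
2. index(s,t) - returns the position of the first occurrence of string t in s
The strings must be enclosed in double quotes.
awk 'BEGIN {print index("Bunny","ny")}' filename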
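3. length(s) - returns the length of s
4. match(s,r) - tests whether s contains a substring matching r
awk 'BEGIN {print match("ANCD", /d/)}'
Matching is case-sensitive, so there is no d and the output is 0. Otherwise it prints the position where the match starts.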
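5. split(s,a,fs) - splits s into array a using fs as the separator, and returns the number of elements in a
awk 'BEGIN {print split("123#456#789", array, "#")}'
6. sub(r, s, str) - replaces the leftmost, longest substring of str matched by r with s
awk '$1=="J.Troll" {sub(/26/, "29", $0); print $0}' grade.txt
7. substr(s,p[,n]) - returns the substring of s starting at position p with length n (default: the rest of the string)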
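8. Passing strings from the shell into awk
Example 1:
$ echo "Stand-by" | awk '{print length($0)}'
Example 2:
STR="mydoc.txt"
echo "$STR" | awk '{print substr($0,7)}'
Inside the single-quoted program the shell does not expand $STR; the piped string arrives in awk as $0.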
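The functions in this list can be tried together in one BEGIN block. The sketch below hands the shell variable to awk with the standard -v option instead of a pipe; the awk variable names `name`, `parts` and `t` are my own choices for illustration.

```shell
STR="mydoc.txt"
out=$(awk -v name="$STR" 'BEGIN {
    print index(name, "doc")      # 3
    print length(name)            # 9
    print match(name, /\.txt$/)   # 6
    print substr(name, 7)         # txt
    print substr(name, 1, 5)      # mydoc
    n = split("123#456#789", parts, "#")
    print n, parts[2]             # 3 456
    t = name; gsub(/d/, "D", t); print t   # myDoc.txt
}')
echo "$out"
```

Using -v keeps the shell value out of the program text, so the single quotes around the awk program can stay intact.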
(12) awk scripts
1. Passing variables
#!/usr/bin/awk -f
# to call: awksum.sh PRE=STR OFS=":" grade.txt
BEGIN {sum=0}
{sum=sum+$5}
END {print PRE,sum}
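The first line says the script is run with awk; the second line is just a comment that shows how to call it. This is how PRE gets passed in.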
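A sketch of the same call with the program given inline and made-up `ls -l` style input on stdin (the sample rows are an assumption; the trailing `-` names stdin as the input file). Assignments written as operands take effect only when awk reaches them in the argument list, so they are visible in the main rules and in END but not in BEGIN; -v sets a variable before BEGIN runs.

```shell
# Hypothetical sample; column 5 is the size.
sample='-rw-r--r-- 1 user group 100 Jun 19 16:13 a.txt
-rw-r--r-- 1 user group 250 Jun 19 16:13 b.txt
-rw-r--r-- 1 user group 50 Jun 19 16:13 demo_file'

# var=value operands, as in the "to call" comment of the script.
out=$(printf '%s\n' "$sample" |
    awk 'BEGIN {sum = 0} {sum += $5} END {print PRE, sum}' PRE=STR OFS=":" -)
echo "$out"        # STR:400

# -v makes the variable available already in BEGIN.
in_begin=$(awk -v PRE=STR 'BEGIN {print PRE}')
echo "$in_begin"   # STR
```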
2. Looping over an array with for
#!/bin/awk -f
BEGIN{
record="123#456#789";
split(record, myarray, "#")}
END { for (i in myarray) {print myarray[i]}}
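One thing to keep in mind: the order in which `for (i in myarray)` visits the elements is not specified, and an END block only runs after awk has finished reading its input. A sketch of an alternative that stays entirely in BEGIN and walks the array by index, using the count returned by split:

```shell
out=$(awk 'BEGIN {
    n = split("123#456#789", myarray, "#")
    # Indices from split run from 1 to n, so this loop keeps the original order.
    for (i = 1; i <= n; i++) print myarray[i]
}')
echo "$out"
# 123
# 456
# 789
```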

posted @ 2012-06-19 16:13  cuero