Awk Basics [1]: Awk Syntax and Basic Commands
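awk is a powerful tool for processing text files, especially record-oriented text in which each line contains several fields separated by a delimiter. It can even perform some logic when there is no input text at all.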
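As a small illustration of awk running without any input file, the sketch below uses only a BEGIN block (described in section 2). The arithmetic and the variable name `result` are arbitrary choices made for this example.

```shell
# No input file is given: awk runs the BEGIN block and then exits.
result=$(awk 'BEGIN { print 2 + 3 }')
echo "$result"   # prints 5
```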
The examples that follow repeatedly use the files below as input:
employee.txt is a comma delimited file that contains 5 employee records in the following format:
employee-number,employee-name,employee-title
Create the file:
$ vi employee.txt
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager
items.txt sample file
items.txt is a comma delimited text file that contains 5 item records in the following format:
item-number,item-description,item-category,cost,quantity-available
Create the file:
$ vi items.txt
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
items-sold.txt sample file
items-sold.txt is a space delimited text file that contains 5 item records. Each record describes one item: the item number, followed by the number of units sold in each of the last 6 months.
Create the file:
$ vi items-sold.txt
101 2 10 5 8 10 12
102 0 1 4 3 0 2
103 10 6 11 20 5 13
104 2 3 4 0 6 5
105 10 2 5 7 12 6
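If you would rather not type the records into an editor, the three files can also be created non-interactively. This is only a sketch using shell here-documents; it writes the same contents shown above into the current directory, then checks that each file has 5 records.

```shell
# Create the three sample files with here-documents (same contents as above).
cat > employee.txt <<'EOF'
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager
EOF

cat > items.txt <<'EOF'
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
EOF

cat > items-sold.txt <<'EOF'
101 2 10 5 8 10 12
102 0 1 4 3 0 2
103 10 6 11 20 5 13
104 2 3 4 0 6 5
105 10 2 5 7 12 6
EOF

# Each file should report 5 lines.
wc -l employee.txt items.txt items-sold.txt
```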
1、Awk Command Syntax
Basic Awk Syntax:
awk -Fs '/pattern/ {action}' input-file
(or)
awk -Fs '{action}' input-file
In the above syntax:
- -F specifies the field separator (the s in -Fs stands for the separator character). If you don't specify it, awk splits fields on whitespace (runs of spaces and tabs).
- The /pattern/ and the {action} should be enclosed inside single quotes.
- /pattern/ is optional. If you don't provide it, awk will process all the records from the input-file. If you specify a pattern, it will process only those records from the input-file that match the given pattern.
- {action} - These are the awk programming commands, which can be one or multiple awk commands. The whole action block (including all the awk commands together) should be enclosed between { and }.
- input-file - The input file that needs to be processed.
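The two forms of the syntax can be tried on the sample files. The sketch below recreates items-sold.txt and items.txt so that it runs on its own; the variable names `first_col` and `sports` and the choice of the /Sports/ pattern are illustrative only.

```shell
# Recreate the two sample files so this block is self-contained.
cat > items-sold.txt <<'EOF'
101 2 10 5 8 10 12
102 0 1 4 3 0 2
103 10 6 11 20 5 13
104 2 3 4 0 6 5
105 10 2 5 7 12 6
EOF
cat > items.txt <<'EOF'
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
EOF

# No -F and no pattern: fields are split on whitespace and every record
# is processed, so this prints the item number of each line.
first_col=$(awk '{print $1}' items-sold.txt)
echo "$first_col"

# -F, sets the separator to a comma; only records matching /Sports/
# reach the action.
sports=$(awk -F, '/Sports/ {print $2}' items.txt)
echo "$sports"   # prints: Tennis Racket
```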
You can also put the commands to run in a separate file and invoke them with the following syntax:
awk -Fs -f myscript.awk input-file
In this syntax, myscript.awk contains the awk commands to be executed.
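A minimal sketch of the -f form follows. The contents of myscript.awk here are a hypothetical one-line program chosen for illustration; the file holds exactly what would otherwise go inside the single quotes on the command line.

```shell
# Recreate employee.txt so this block is self-contained.
cat > employee.txt <<'EOF'
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager
EOF

# Hypothetical script file: print the name of every record containing "Manager".
cat > myscript.awk <<'EOF'
/Manager/ { print $2 }
EOF

# -F, sets the comma separator; -f reads the program from the file.
managers=$(awk -F, -f myscript.awk employee.txt)
echo "$managers"
```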
2、Awk Program Structure (BEGIN, body, END block)
A typical awk program has the following three blocks.
1. BEGIN Block
Syntax of begin block:
BEGIN { awk-commands }
The begin block gets executed only once at the beginning, before awk starts executing the body block for all the lines in the input file.
2. Body Block
Syntax of body block:
/pattern/ {action}
The body block gets executed once for every line in the input file (or, when a pattern is given, once for every line that matches it).
3. END Block
Syntax of end block:
END { awk-commands }
The end block gets executed only once at the end, after awk completes executing the body block for all the lines in the input-file.
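The three blocks can be combined in a single program. The following is a sketch only: the header and footer strings and the variable name `report` are made up for this example, and employee.txt is recreated so that the block runs on its own.

```shell
# Recreate employee.txt so this block is self-contained.
cat > employee.txt <<'EOF'
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager
EOF

report=$(awk -F, '
BEGIN { print "Name Title" }      # runs once, before any input is read
      { print $2, $3 }            # runs once for every input line
END   { print "End of report" }   # runs once, after the last line
' employee.txt)
echo "$report"
```

With the 5 records in employee.txt this should produce 7 lines: the header, one line per employee, and the footer.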
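The awk execution flow is as follows: the BEGIN block runs once, then awk reads each input line and runs the body block on it until the input is exhausted, and finally the END block runs once.
[Figure: awk execution flow chart]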
3、Print Command
By default, the print command (with no arguments) prints the entire record, as in the following example, which is equivalent to the command 'cat employee.txt':
$ awk '{print}' employee.txt
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager
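You can also pass specific field numbers to print so that only those fields are printed. For example, to print only the employee names (the second column), use the following command:
$ awk -F ',' '{print $2}' employee.txt
John Doe
Jason Smith
Raj Reddy
Anand Ram
Jane Miller
Because the default field separator is whitespace, we need -F ',' to set the separator to a comma and get the intended column.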
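To see why the separator matters, the sketch below runs the same print command with and without -F ','. The variable names `wrong` and `right` are illustrative, and employee.txt is recreated so that the block runs on its own.

```shell
# Recreate employee.txt so this block is self-contained.
cat > employee.txt <<'EOF'
101,John Doe,CEO
102,Jason Smith,IT Manager
103,Raj Reddy,Sysadmin
104,Anand Ram,Developer
105,Jane Miller,Sales Manager
EOF

# Without -F, each line is split on whitespace, so $2 is whatever follows
# the first space (for example "Doe,CEO"), not the employee name.
wrong=$(awk '{print $2}' employee.txt)
echo "$wrong"

# With -F ',' the second field is the full employee name.
right=$(awk -F ',' '{print $2}' employee.txt)
echo "$right"
```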
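4、Pattern Matching
You can run a command only on the lines that match a specific pattern, for example:
$ awk -F ',' '/Manager/ {print $2, $3}' employee.txt
Jason Smith IT Manager
Jane Miller Sales Manager
This example prints the names and titles of the managers.
$ awk -F ',' '/^102/ {print "Emp id 102 is", $2}' employee.txt
Emp id 102 is Jason Smith
This example prints only the name of the employee whose number starts with 102.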
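The same two kinds of pattern can be tried on items.txt. This is a sketch for illustration: the patterns /Audio/ and /^105/, the output text, and the variable names `audio` and `item` are choices made for this example, and the file is recreated so that the block runs on its own.

```shell
# Recreate items.txt so this block is self-contained.
cat > items.txt <<'EOF'
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
EOF

# /Audio/ is matched against the whole record; print description and cost.
audio=$(awk -F ',' '/Audio/ {print $2, $4}' items.txt)
echo "$audio"   # prints: MP3 Player 270

# ^ anchors the match to the start of the record (the item number).
item=$(awk -F ',' '/^105/ {print "Item 105 is", $2}' items.txt)
echo "$item"    # prints: Item 105 is Laser Printer
```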