Advanced sed
The commands below may not be needed often, but it is good to know them when the need arises.
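1. Multiline commands
The sed editor normally processes one line of data at a time, then repeats the same processing on the next line.
It provides three special commands for processing multiline text:
- N: adds the next line in the data stream to create a multiline group for processing
- D: deletes the first line of a multiline group
- P: prints the first line of a multiline group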
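1.1 The next command
Two ways to delete blank lines: /^$/d deletes all of them, while /header/{n ; d} moves to the line after the match and deletes only that one.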
$ cat data1.txt
This is the header line.

This is a data line.

This is the last line.
$ sed '/^$/d' data1.txt
This is the header line.
This is a data line.
This is the last line.
$ sed '/header/{n ; d}' data1.txt
This is the header line.
This is a data line.

This is the last line.
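1.2 Combining lines of text
The uppercase N command appends the next line to the pattern space, so two lines are processed together. The lowercase n command, by contrast, moves on to the next line and replaces the pattern space with it.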
$ sed '/first/{ N ; s/\n/ / }' data2.txt
This is the header line.
This is the first data line. This is the second data line.
This is the last line.
$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$ cat data3.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.
All System Administrators should attend.
Thank you for your attendance.
$ sed 'N ; s/System.Administrator/Desktop User/' data3.txt
On Tuesday, the Linux Desktop User's group meeting will be held.
All Desktop Users should attend.
Thank you for your attendance.
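To see the difference between n and N on a small input, the sketch below pipes four lines through each command. The function names join_pairs and every_second are made up for this illustration and are not part of sed.

```shell
# join_pairs: N appends the next line to the pattern space, so the
# embedded newline can be replaced with a space.
join_pairs() { sed 'N; s/\n/ /'; }

# every_second: under -n, the lowercase n replaces the pattern space with
# the next line without printing, so p prints only every second line.
every_second() { sed -n 'n; p'; }

printf 'a\nb\nc\nd\n' | join_pairs     # prints "a b" then "c d"
printf 'a\nb\nc\nd\n' | every_second   # prints "b" then "d"
```

The input has an even number of lines on purpose; what happens when N or n runs on the last line is a separate question.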
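Commands that match across two lines belong after N, while single-line commands are best placed before N. When N runs on the last line there is no next line to read, and sed stops without running the commands that follow. For example: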
$ sed 'N
> s/System\nAdministrator/Desktop\nUser/
> s/System Administrator/Desktop User/
> ' data4.txt
On Tuesday, the Linux Desktop
User's group meeting will be held.
All System Administrators should attend.
$ cat data4.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.
All System Administrators should attend.
$ sed '
> s/System Administrator/Desktop User/
> N
> s/System\nAdministrator/Desktop\nUser/
> ' data4.txt
On Tuesday, the Linux Desktop
User's group meeting will be held.
All Desktop Users should attend.
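1.3 The multiline delete command
Used after N, the lowercase d deletes the whole pattern space, that is, both lines. The uppercase D deletes only the text up to and including the first newline.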
$ sed 'N ; /System\nAdministrator/d' data4.txt
All System Administrators should attend.
$ cat data4.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.
All System Administrators should attend.
$ sed 'N ; /System\nAdministrator/D' data4.txt
Administrator's group meeting will be held.
All System Administrators should attend.
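The contrast between d and D can be reproduced without a data file. This is a minimal sketch assuming GNU sed, which prints a leftover last line when N has nothing more to read; the function names are hypothetical.

```shell
# After N the pattern space holds "one\ntwo".
# d discards the whole pattern space and starts a new cycle.
delete_pair() { sed 'N; /one\ntwo/d'; }

# D removes only the text up to the first newline, then restarts the
# script on what is left ("two") without reading a new line first.
delete_first() { sed 'N; /one\ntwo/D'; }

printf 'one\ntwo\nthree\nfour\n' | delete_pair    # three, four
printf 'one\ntwo\nthree\nfour\n' | delete_first   # two, three, four
```

With D, the surviving line "two" is paired with "three" on the restarted cycle, which is why the pairing of the remaining lines shifts by one.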
Another example deletes the blank line that precedes the header:
$ sed '/^$/{N ; /header/D}' data5.txt
This is the header line.
This is a data line.

This is the last line.
$ sed '{N ; /header/D}' data5.txt
This is a data line.

This is the last line.
$ sed '{N ; /header/d}' data5.txt
This is a data line.

This is the last line.
$ cat data5.txt

This is the header line.
This is a data line.

This is the last line.
Like D, the P command is used together with N: it prints only the first line of the pattern space.
$ cat data3.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.
All System Administrators should attend.
Thank you for your attendance.
$ sed -n 'N ; /System\nAdministrator/P' data3.txt
On Tuesday, the Linux System
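N, P and D are often combined into a two-line sliding window. The sketch below is one well-known use of that idiom, not something taken from the examples above: it drops consecutive duplicate lines, much like uniq. The function name dedupe is hypothetical, and the $!N form simply skips N on the last line.

```shell
dedupe() {
    # $!N : append the next line, except on the last line
    # !P  : print the first line unless both lines are identical
    # D   : drop the first line and restart with the remainder
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

printf 'a\na\nb\nc\nc\n' | dedupe   # prints a, b, c
```

Because D restarts the script without reading new input, every line is compared with its successor exactly once.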
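2. The hold space
The sed editor has a buffer area called the hold space. While processing some lines, you can use the hold space to store other lines temporarily.
- h: copies the pattern space to the hold space
- H: appends the pattern space to the hold space
- g: copies the hold space to the pattern space
- G: appends the hold space to the pattern space
- x: exchanges the contents of the pattern space and the hold space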
$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$ sed -n '/first/ {h ; p; n ; p ; g ; p }' data2.txt
This is the first data line.
This is the second data line.
This is the first data line.
$ sed -n '/first/ {h ; n ; p ; g ; p }' data2.txt
This is the second data line.
This is the first data line.
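As a further illustration of h and G, the following sketch swaps every pair of lines. It assumes the input has an even number of lines, and the function name swap_pairs is made up for this example.

```shell
swap_pairs() {
    # h : save the first line of the pair in the hold space
    # n : load the second line (nothing is printed because of -n)
    # G : append the saved first line after the second one
    # p : print the swapped pair
    sed -n 'h; n; G; p'
}

printf '1\n2\n3\n4\n' | swap_pairs   # prints 2, 1, 4, 3
```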
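3. Negating a command
The exclamation mark (!) negates a command: the command is applied to the lines that do not match the address.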
$ sed -n '/header/!p' data2.txt
This is the first data line.
This is the second data line.
This is the last line.
$ sed 'N;
> s/System\nAdministrator/Desktop\nUser/
> s/System Administrator/Desktop User/
> ' data4.txt
On Tuesday, the Linux Desktop
User's group meeting will be held.
All System Administrators should attend.
$ sed '$!N;
> s/System\nAdministrator/Desktop\nUser/
> s/System Administrator/Desktop User/
> ' data4.txt
On Tuesday, the Linux Desktop
User's group meeting will be held.
All Desktop Users should attend.
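Two small sketches of negation on throwaway input, with hypothetical function names. The first negates a line range, the second uses $!N so that an odd final line still passes through the rest of the script.

```shell
# Delete every line that is NOT in the range 2-3.
only_2_to_3() { sed '2,3!d'; }

# Join lines in pairs with a dash; $!N skips N on the last line.
join_pairs_safe() { sed '$!N; s/\n/-/'; }

printf 'a\nb\nc\nd\n' | only_2_to_3      # prints b, c
printf 'a\nb\nc\n' | join_pairs_safe     # prints a-b, c
```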
Next, an example that reverses the order of the lines in a file:
$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$ sed -n '{1!G ; h ; $p }' data2.txt
This is the last line.
This is the second data line.
This is the first data line.
This is the header line.
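The reversal is easier to follow with a trace. The sketch below wraps the same script in a function (the name reverse_lines is hypothetical) and records, as comments, what the pattern space and hold space contain after each line of a three-line input.

```shell
reverse_lines() { sed -n '1!G; h; $p'; }

# Trace for the input A, B, C (pattern space | hold space, after h):
#   line 1: "A"        | "A"         1!G skips G on the first line
#   line 2: "B\nA"     | "B\nA"      G appended the saved "A"
#   line 3: "C\nB\nA"  | "C\nB\nA"   $p prints on the last line only
printf 'A\nB\nC\n' | reverse_lines   # prints C, B, A
```

On systems with GNU coreutils, tac produces the same result; the sed version is mainly useful for understanding the hold space.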
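4. Changing the flow
4.1 Branching
The branch command lets you skip a whole block of commands based on an address, an address pattern, or an address range, so that a set of commands runs only on specific lines of the data stream.
Format of the branch command b: [address]b [label]
The address parameter determines which lines trigger the branch command.
The label parameter defines the location to branch to.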
$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$ sed '{2,3b ; s/This is/Is this/ ; s/line./test?/}' data2.txt
Is this the header test?
This is the first data line.
This is the second data line.
Is this the last test?
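In the example above, the branch skips lines 2 and 3. If you do not want to jump straight to the end of the script, define a label for the branch command to jump to.
To stay portable, keep labels to at most seven characters. An example: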
chen@ubuntu:~/shell/ch21$ sed '{/first/b jump1 ; s/This is the/No jump on/
> :jump1
> s/This is the/Jump here on/}' data2.txt
No jump on header line.
Jump here on first data line.
No jump on second data line.
No jump on last line.
chen@ubuntu:~/shell/ch21$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
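If a line contains first, sed branches to the script line labeled jump1. If there is no match, sed continues with the commands in the script, including those after the label.
Branching to a label that appears earlier in the script creates a loop. The following script never terminates on its own and has to be interrupted with Ctrl-C: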
chen@ubuntu:~/shell/ch21$ echo "This, is, a, test, to, remove, commas." | sed -n '{
> :start
> s/,//1p
> b start
> }'
This is, a, test, to, remove, commas.
This is a, test, to, remove, commas.
This is a test, to, remove, commas.
This is a test to, remove, commas.
This is a test to remove, commas.
This is a test to remove commas.
^C
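To make the loop stop, put a pattern address in front of the b command, so that it branches only while a comma is still present: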
chen@ubuntu:~/shell/ch21$ echo "This, is, a, test, to, remove, commas." | sed -n '{
> :start
> s/,//1p
> /,/b start
> }'
This is, a, test, to, remove, commas.
This is a, test, to, remove, commas.
This is a test, to, remove, commas.
This is a test to, remove, commas.
This is a test to remove, commas.
This is a test to remove commas.
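A conditional branch is also the usual way to gather a whole file into the pattern space. The following is a minimal sketch of that idiom, with a hypothetical function name: it loops with N until the last line has been read, then replaces every newline with a space.

```shell
join_lines() {
    # :a    label marking the top of the loop
    # N     append the next input line to the pattern space
    # $!ba  branch back to a unless the last line has been read
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
}

printf 'one\ntwo\nthree\n' | join_lines   # prints "one two three"
```

Each command is passed with its own -e option so that the label ends cleanly, which avoids differences in how sed implementations parse text after a label.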
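4.2 Testing
Like the branch command, the test command (t) changes the flow of a sed script.
If a substitution command has successfully matched and replaced a pattern, the test command branches to the specified label. If the substitution did not match the pattern, the test command does not branch.
The test command has the same format as the branch command: [address]t [label]
As with the branch command, if no label is given and the test succeeds, sed branches to the end of the script.
The test command is basically an if-then statement: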
chen@ubuntu:~/shell/ch21$ sed '{
> s/first/matched/
> t
> s/This is the/No match on/
> }' data2.txt
No match on header line.
This is the matched data line.
No match on second data line.
No match on last line.
chen@ubuntu:~/shell/ch21$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
A loop built with the test command:
chen@ubuntu:~/shell/ch21$ echo "This, is, a, test, to, remove, commas." | sed -n '{
> :start
> s/,//1p
> t start
> }'
This is, a, test, to, remove, commas.
This is a, test, to, remove, commas.
This is a test, to, remove, commas.
This is a test to, remove, commas.
This is a test to remove, commas.
This is a test to remove commas.
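The same substitute-then-test pattern works for any edit that has to be repeated until nothing changes. As a sketch with a hypothetical function name, the loop below collapses runs of spaces one step at a time:

```shell
squeeze_spaces() {
    # Replace one pair of spaces with a single space, then loop for as
    # long as the substitution keeps succeeding.
    sed -e ':again' -e 's/  / /' -e 't again'
}

echo 'a  b    c' | squeeze_spaces   # prints "a b c"
```

A single s/  */ /g would do the same job here; the loop is only meant to show how t ends the iteration once the substitution fails.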
5. Pattern replacement
When a pattern uses wildcards, it is hard to know in advance exactly which text it will match.
chen@ubuntu:~/shell/ch21$ echo "The cat sleeps in his hat." | sed 's/cat/"cat"/'
The "cat" sleeps in his hat.
chen@ubuntu:~/shell/ch21$ echo "The cat sleeps in his hat." | sed 's/.at/".at"/g'
The ".at" sleeps in his ".at".
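5.1 The ampersand
In the replacement part of a substitution command, the & symbol stands for the text matched by the pattern. Whatever text the pattern matches, & lets you reuse it in the replacement.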
myfly2@ubuntu:~/shell/ch21$ echo "The cat sleeps in his hat." | sed 's/.at/"&"/g'
The "cat" sleeps in his "hat".
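Two more small sketches of the ampersand, using made-up function names. The first wraps every run of digits in brackets; the second shows that a backslash turns & back into a literal character.

```shell
# & is replaced by whatever the pattern matched, here each number.
bracket_numbers() { sed 's/[0-9][0-9]*/[&]/g'; }

# \& inserts a literal ampersand instead of the matched text.
and_to_ampersand() { sed 's/and/\&/'; }

echo 'ports 80 and 443' | bracket_numbers   # ports [80] and [443]
echo 'salt and pepper' | and_to_ampersand   # salt & pepper
```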
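5.2 Replacing individual words
Sometimes you do not want the whole matched string, only a part of it.
The sed editor uses parentheses, escaped as \( and \), to define subpatterns within the search pattern, and the special sequences \1, \2 and so on to reference each subpattern in the replacement.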
chen@ubuntu:~/shell/ch21$ echo "The System Administrator manual" | sed '
> s/\(System\) Administrator/\1 User/'
The System User manual
chen@ubuntu:~/shell/ch21$ echo "The System Administrator manual" | sed ' s/System \(Administrator\)/\1 User/'
The Administrator User manual
chen@ubuntu:~/shell/ch21$ echo "That furry cat is pretty" | sed 's/furry \(.at\)/\1/'
That cat is pretty
chen@ubuntu:~/shell/ch21$ echo "That furry hat is pretty" | sed 's/furry \(.at\)/\1/'
That hat is pretty
# This feature is especially useful when text has to be inserted between two or more subpatterns
chen@ubuntu:~/shell/ch21$ echo "1234567" | sed '{
> :start
> s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
> t start
> }'
1,234,567
# The pattern has two parts:
# .*[0-9]
# [0-9]{3}
# The first subpattern is any number of characters ending in a digit.
# The second subpattern is a group of exactly three digits.
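Subpatterns can also be reordered. The sketch below, with hypothetical function names, turns "Last, First" into "First Last". The second variant assumes a sed that supports the -E option (GNU and BSD sed both do), where the parentheses need no backslashes.

```shell
# Basic regular expressions: groups are written \( ... \).
swap_name() { sed 's/^\([^,]*\), *\(.*\)$/\2 \1/'; }

# Extended regular expressions with -E: groups are written ( ... ).
swap_name_ere() { sed -E 's/^([^,]*), *(.*)$/\2 \1/'; }

echo 'Doe, John' | swap_name       # John Doe
echo 'Doe, John' | swap_name_ere   # John Doe
```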
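6. Using sed in scripts
Using wrappers: typing a sed script on the command line is tedious, especially when the script is long. You can put the sed editor commands in a shell wrapper script instead.
That way you do not have to retype the script every time.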
chen@ubuntu:~/shell/ch21$ cat reverse.sh
#!/bin/bash
# Shell wrapper for sed editor script.
# to reverse text file lines.
#
sed -n '{ 1!G ; h ; $p }' "$1"
#
chen@ubuntu:~/shell/ch21$ chmod +x reverse.sh
chen@ubuntu:~/shell/ch21$ ls
data2.txt data4.txt reverse.sh
chen@ubuntu:~/shell/ch21$ ./reverse.sh data2.txt
This is the last line.
This is the second data line.
This is the first data line.
This is the header line.
6.2 Redirecting sed output
chen@ubuntu:~/shell/ch21$ cat fact.sh
#!/bin/bash
# Add commas to number in factorial answer
#
factorial=1
counter=1
number=$1
#
while [ "$counter" -le "$number" ]
do
   factorial=$(( factorial * counter ))
   counter=$(( counter + 1 ))
done
#
result=$(echo "$factorial" | sed '{
:start
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t start
}')
#
echo "The result is $result"
#
chen@ubuntu:~/shell/ch21$ ./fact.sh 20
The result is 2,432,902,008,176,640,000
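The comma-insertion step can be isolated in a small function and its output captured with command substitution, in the same way fact.sh does. This is a sketch only; add_commas is a hypothetical helper name, not part of the original script.

```shell
add_commas() {
    # Insert one comma per pass, working from the right, until the
    # substitution no longer matches.
    printf '%s\n' "$1" |
        sed -e ':start' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 't start'
}

total=$(add_commas 1234567)
echo "Total: $total"   # Total: 1,234,567
```

Capturing the output in a variable keeps the sed call in one place, so the rest of the script can reuse the formatted value.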
7. Creating sed utilities
7.1 Double-spacing lines
chen@ubuntu:~/shell/ch21$ sed 'G' data2.txt
This is the header line.

This is the first data line.

This is the second data line.

This is the last line.

chen@ubuntu:~/shell/ch21$ sed '$!G' data2.txt
This is the header line.

This is the first data line.

This is the second data line.

This is the last line.
7.2 Double-spacing files that may already contain blank lines
# Delete the existing blank lines first, then add new ones
chen@ubuntu:~/shell/ch21$ sed '/^$/d;$!G' data6.txt
This is line one.

This is line two.

This is line three.

This is line four.
7.3 Numbering lines in a file
chen@ubuntu:~/shell/ch21$ sed '=' data2.txt
1
This is the header line.
2
This is the first data line.
3
This is the second data line.
4
This is the last line.
chen@ubuntu:~/shell/ch21$ sed '=' data2.txt | sed 'N; s/\n/ /'
1 This is the header line.
2 This is the first data line.
3 This is the second data line.
4 This is the last line.
# The = command prints the line number before each line
# The N command merges the two lines into one pattern space
# The substitution then replaces the newline with a space
7.4 Printing the last lines
chen@ubuntu:~/shell/ch21$ sed '{
> :start
> $q ; N ; 11,$D
> b start
> }' data7.txt
This is line 6.
This is line 7.
This is line 8.
This is line 9.
This is line 10.
This is line 11.
This is line 12.
This is line 13.
This is line 14.
This is line 15.
chen@ubuntu:~/shell/ch21$ cat data7.txt
This is line 1.
This is line 2.
This is line 3.
This is line 4.
This is line 5.
This is line 6.
This is line 7.
This is line 8.
This is line 9.
This is line 10.
This is line 11.
This is line 12.
This is line 13.
This is line 14.
This is line 15.
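The script keeps a rolling window in the pattern space: N adds a line, and once more than ten lines have been read, D drops the oldest one. The window size can be made a parameter, as in this sketch (last_lines is a hypothetical function name; the address is the window size plus one, matching the 11,$D used above for ten lines).

```shell
last_lines() {
    # $1 is the number of trailing lines to keep.
    # $q quits (printing the window) once the last line has been read.
    sed -e ':start' -e '$q' -e 'N' -e "$(( $1 + 1 )),\$D" -e 'b start'
}

printf '%s\n' 1 2 3 4 5 | last_lines 3   # prints 3, 4, 5
```

If the input has fewer lines than requested, the whole input is printed, which matches the behavior of tail.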
7.5 Deleting lines
Deleting every blank line is easy; deleting blank lines selectively takes a little creativity.
7.5.1 Deleting consecutive blank lines
chen@ubuntu:~/shell/ch21$ sed '/./,/^$/!d' data8.txt
This is line one.

This is line two.

This is line three.

This is line four.
chen@ubuntu:~/shell/ch21$ cat data8.txt
This is line one.


This is line two.

This is line three.



This is line four.
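The address range /./,/^$/ starts at any non-empty line and ends at the first blank line after it, so exactly one blank line survives after each block of text and every further blank line falls outside the range and is deleted. A small sketch on inline input, with a hypothetical function name:

```shell
squeeze_blank() { sed '/./,/^$/!d'; }

# Three blank lines after "one" and one blank line after "two".
printf 'one\n\n\n\ntwo\n\nthree\n' | squeeze_blank
# prints: one, a blank line, two, a blank line, three
```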
7.5.2 Deleting leading blank lines
chen@ubuntu:~/shell/ch21$ cat data9.txt


This is line one.

This is line two.
chen@ubuntu:~/shell/ch21$ sed '/./,$!d' data9.txt
This is line one.

This is line two.
7.5.3 Deleting trailing blank lines
chen@ubuntu:~/shell/ch21$ sed '{
> :start
> /^\n*$/{$d ; N ; b start}
> }' data10.txt
This is the first line.
This is the second line.
chen@ubuntu:~/shell/ch21$ cat data10.txt
This is the first line.
This is the second line.


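The script collects blank lines in the pattern space: while the pattern space contains only newlines, it deletes it if the last line has been reached, and otherwise appends the next line and tests again. The sketch below, assuming GNU sed and using a hypothetical function name, shows that blank lines in the middle of the input are kept while those at the end are dropped.

```shell
trim_trailing_blank() {
    # The closing brace gets its own -e so that the label name after
    # "b" ends at the end of that script fragment.
    sed -e ':start' -e '/^\n*$/{$d;N;b start' -e '}'
}

# One blank line in the middle, two at the end.
printf 'first\n\nsecond\n\n\n' | trim_trailing_blank | wc -l   # 3 lines remain
```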
7.6 Removing HTML tags