Linux text processing: sed

sed

GNU documentation

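sed is known as a stream editor. A stream editor edits a data stream according to a set of rules supplied before it starts processing the data. sed processes the data in the stream according to its commands, performing the following steps:

  1. Read one line of data from the input.
  2. Match that data against the supplied editor commands.
  3. Modify the data in the stream as the commands specify.
  4. Write the new data to STDOUT.

After the stream editor has matched all commands against a line of data, it reads the next line and repeats the process. Once it has processed every line in the stream, it terminates.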
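
The cycle above can be seen in a small sketch. The two-line input is invented for illustration; each line is read, matched against the s command, modified, and written to STDOUT before the next line is read.

```shell
# Hypothetical two-line input; sed applies the script to each line in turn.
cycle_out=$(printf 'line one\nline two\n' | sed 's/line/row/')
printf '%s\n' "$cycle_out"
```

Both lines are changed because the script is re-run from the top for every line read.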

Format

sed options script file 
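Options
-e script  add the commands in script to those run on the input; repeat -e to run multiple scripts, `echo "there is a test"
-f file  read the script from a separate file
-n  suppress automatic output; lines are printed only when a command such as p requests it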

Examples

sed 's/hello/world/' input.txt > output.txt

sed -e 's/hello/world/' input.txt > output.txt
sed --expression='s/hello/world/' input.txt > output.txt

echo 's/hello/world/' > myscript.sed
sed -f myscript.sed input.txt > output.txt
sed --file=myscript.sed input.txt > output.txt

If none of -e, -f, --expression, or --file is given, sed takes the first non-option argument as the script and treats the remaining non-option arguments as input files.
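
A minimal sketch of the two calling styles, assuming a POSIX shell with mktemp available. The temporary file and its text are invented for illustration.

```shell
# Sketch only: the file name comes from mktemp and the text is made up.
input=$(mktemp)
echo "hello sed" > "$input"

# No -e or -f: the first non-option argument is the script, the next is the input file.
implicit=$(sed 's/hello/world/' "$input")

# With -e, each script is named explicitly, so every non-option argument is an input file.
chained=$(sed -e 's/hello/world/' -e 's/sed/editor/' "$input")

printf '%s\n%s\n' "$implicit" "$chained"
rm -f "$input"
```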

Sample data (the contents of tata.txt used in later examples)

The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.

sed -f

cat sedtest.sed
s/test/prod/
s/is/are/
echo "there is a test"|sed -f sedtest.sed
there are a prod

The s command: substitute

s/pattern/replacement/flags

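The substitute command works fine across multiple lines of input, but by default it replaces only the first occurrence on each line. To replace other occurrences on a line, use a substitution flag:

  • a number, indicating which occurrence of the pattern the new text should replace;
  • g, indicating that the new text should replace all occurrences;
  • p, indicating that the line should be printed when a substitution is made;
  • w file, indicating that the result of the substitution should be written to a file.
# replace the second occurrence on each line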
echo "this is a test"|sed -e 's/test/prod/;s/is/are/2'
this are a prod
echo "this is a test"|sed -e 's/test/prod/;s/is/are/g'
thare are a prod

sed 's/test/trial/w test.txt' data5.txt
This is a trial line.
This is a different line.

cat test.txt
This is a trial line.

sed -n

$ cat data5.txt
This is a test line.
This is a different line.
$
$ sed -n 's/test/trial/p' data5.txt
This is a trial line.

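The -n option suppresses sed's normal output, while the p substitution flag prints every line the substitute command modified. Used together, they output only the lines that were changed.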

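Escaping characters

When the string being replaced contains forward slashes, each one must be escaped with a backslash, for example: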

$ sed 's/\/bin\/bash/\/bin\/csh/' /etc/passwd

sed also lets you choose a different character as the string delimiter in the substitute command:

$ sed 's!/bin/bash!/bin/csh!' /etc/passwd

In this example the exclamation mark is used as the string delimiter.
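
To check that both forms give the same result without touching /etc/passwd, here is a small sketch. The passwd-style entry is a made-up sample, and | is just one possible delimiter.

```shell
# Made-up passwd-style line; nothing is read from /etc/passwd.
entry='alice:x:1000:1000::/home/alice:/bin/bash'

# Default delimiter: every slash in the pattern and replacement needs a backslash.
escaped=$(printf '%s\n' "$entry" | sed 's/\/bin\/bash/\/bin\/csh/')

# Alternative delimiter: the character right after s becomes the delimiter.
piped=$(printf '%s\n' "$entry" | sed 's|/bin/bash|/bin/csh|')

printf '%s\n%s\n' "$escaped" "$piped"
```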

Addressing by line number

sed -n 'xp' filename    # print line x of the file

sed -n 'x,yp' filename  # print lines x through y of the file

# only the specified line is modified
sed '2s/dog/cat/' tata.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
# modify a range of lines
sed '2,3s/dog/cat/' tata.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy dog.

# every line from a given line to the end
sed '2,$s/dog/cat/' tata.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.

Printing only the specified lines

sed -n 4,8p file # print lines 4-8 of file
sed -n 4p file # print line 4 of file

Pattern (text / regex) matching

Enclose the pattern in forward slashes. sed applies the command only to lines that contain the specified text pattern.

Format: /pattern/command

cat dataa6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 1 again.
This is text you want to keep.
This is the last line in the file.

sed '/text/s/you/iam/' dataa6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 1 again.
This is text iam want to keep.
This is the last line in the file.
sed -n '/text/s/you/iam/p' dataa6.txt
This is text iam want to keep.
sed -n '/text/p' dataa6.txt
This is text you want to keep.
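
The pattern in an address is a regular expression, not only literal text. A minimal sketch with made-up input: ^ and a bracket expression select only the lines that start with a digit, and the s command runs on those lines alone.

```shell
# Invented three-line input; only lines beginning with a digit are edited.
addr_out=$(printf '1 apple\nbanana\n2 cherry\n' | sed '/^[0-9]/s/ /: /')
printf '%s\n' "$addr_out"
```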

The d command: deleting lines

sed '2d' tata.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
sed '2,3d' tata.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.


Matching a pattern

sed '/text/d' dataa6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 1 again.
This is the last line in the file.

Matching a range between two patterns

sed '/1/,/3/d' dataa6.txt
This is line number 4.

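Line 1 matches 1, so deletion starts there and continues until line 3 matches 3, where it stops; line 4 is therefore printed.

Line 5 then matches 1 again and deletion restarts; because no later line matches 3, deletion continues to the end of the file.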
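
The same behaviour can be reproduced on a smaller input. This is a sketch with invented BEGIN/END markers: the first range is closed by END, the second is never closed and therefore runs to the end of the input.

```shell
# Made-up input: one closed BEGIN..END block and one that is never closed.
range_out=$(printf 'keep 1\nBEGIN\ndrop\nEND\nkeep 2\nBEGIN\ndrop too\n' |
  sed '/BEGIN/,/END/d')
printf '%s\n' "$range_out"
```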

Insert and append

  • The insert command (i) adds a new line before the specified line
  • The append command (a) adds a new line after the specified line
sed '[address]command\
new line'
## the text in new line appears in the sed output at the position you specify

echo "Test Line 2" | sed 'i\Test Line 1'
Test Line 1
Test Line 2

echo "Test Line 2" | sed 'a\Test Line 1'
Test Line 2
Test Line 1
## append after a specific line
sed '3a\this insert by command' dataa6.txt
This is line number 1.
This is line number 2.
This is line number 3.
this insert by command
This is line number 4.
This is line number 1 again.
This is text you want to keep.
This is the last line in the file.

## append after the lines that match a pattern
sed '/text/a\this insert by command' dataa6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 1 again.
This is text you want to keep.
this insert by command
This is the last line in the file.

Change (c)

sed '/text/c\this change by command' dataa6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 1 again.
this change by command
This is the last line in the file.

As with insert and append, the line to change can be selected either by line number or by pattern match.
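
A short sketch of the line-number form, using the two-line command\ layout shown in the insert and append section. The three-line input is invented for illustration.

```shell
# Replace line 2 by line number; the text follows on the line after c\.
change_out=$(printf 'first\nsecond\nthird\n' | sed '2c\
replaced by line number')
printf '%s\n' "$change_out"
```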

Transform (character mapping)

sed 'y/123/456/' dataa6.txt
This is line number 4.
This is line number 5.
This is line number 6.
This is line number 4.
This is line number 4 again.
This is text you want to keep.
This is the last line in the file.

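The transform command (y) has the form y/inchars/outchars/ and maps the characters in inchars one-to-one onto those in outchars. The first character of inchars is converted to the first character of outchars, the second to the second, and so on. inchars and outchars must be the same length.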
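
As a further sketch of the one-to-one mapping, the following upper-cases three letters wherever they appear on the line. The input string is made up.

```shell
# a->A, b->B, c->C; every occurrence on the line is mapped, no g flag needed.
upper=$(echo "abc cab" | sed 'y/abc/ABC/')
printf '%s\n' "$upper"
```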

Printing line numbers

sed '=' dataa6.txt
1
This is line number 1.
2
This is line number 2.
3
This is line number 3.
4
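This is line number 4.
5
This is line number 1 again.
6
This is text you want to keep.
7
This is the last line in the file.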

Reading a file

sed '3r tata.txt' dataa6.txt
This is line number 1.
This is line number 2.
This is line number 3.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
This is line number 4.
This is line number 1 again.
This is text you want to keep.
This is the last line in the file.

After line 3, sed reads the contents of tata.txt and appends them to the output stream.

next, single line (n)

cat data8.txt
This is the header line.

This is a data line.

This is the last line.

## delete all blank lines
sed '/^$/d' data8.txt
This is the header line.
This is a data line.
This is the last line.

# delete the line that follows the one matching header
sed '/header/{n;d}' data8.txt
This is the header line.
This is a data line.

This is the last line.

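The lowercase n command tells sed to move to the next line of the data stream without going back to the start of the script and running the commands again.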

next, multiline (N)

cat data9.txt
This is the header line.
This is a first line.
This is a second line.
This is the last line.
sed '/first/{ N ; s/\n/ / }' data9.txt
This is the header line.
This is a first line. This is a second line.
This is the last line.

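When the line containing first is matched, the N command appends the next line to the pattern space, and the substitute command then replaces the embedded newline with a space, merging the two lines into one.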

cat data10.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.System
Administrator,hello world!!
All System Administrators should attend
All System Administrators should attend..

### match System Administrator
sed 'N
s/System\nAdministrator/Desk\nUser/
s/System Administrator/Desk User/
' data10.txt
On Tuesday, the Linux Desk
User's group meeting will be held.System
Administrator,hello world!!
All Desk Users should attend
All System Administrators should attend..

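Opening the script with a single quote lets it span several lines; the shell runs the command once the closing single quote has been entered.

N joins the current line with the next one, so the pattern System\nAdministrator can match text that is split across a line break.

For the last line there is no next line to join. GNU sed then prints the pattern space unchanged and exits without running the remaining commands, which is why line 5 above was not substituted. Single-line commands should therefore be placed before N.

Note also that the System Administrator split across lines 2 and 3 was not replaced. Joining lines this way has a limitation: lines are always paired as (1,2), (3,4), and so on.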
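
One common way around both limitations is a sliding two-line window. The sketch below assumes GNU sed (\n in the replacement is a GNU extension) and uses a made-up sample rather than data10.txt: $!N skips N on the last line, P prints the first line of the pattern space, and D removes it and restarts the script on the remainder, so every adjacent pair of lines is examined.

```shell
# Assumes GNU sed; the four input lines are invented for illustration.
window_out=$(printf '%s\n' \
  'the Linux System' \
  'Administrator group met. System' \
  'Administrator said hi' \
  'All System Administrators attend' |
  sed '$!N; s/System\nAdministrator/Desk\nUser/; s/System Administrator/Desk User/g; P; D')
printf '%s\n' "$window_out"
```

Here the phrase split across lines 2 and 3 is replaced as well, and the last line still goes through the single-line substitution.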

Multiline delete

cat data11.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.
hello world!!
All System Administrators should attend
All System Administrators should attend..


sed 'N ; /System\nAdministrator/D' data11.txt
Administrator's group meeting will be held.
hello world!!
All System Administrators should attend
All System Administrators should attend..
sed 'N ; /System\nAdministrator/d' data11.txt
hello world!!
All System Administrators should attend
All System Administrators should attend..


## note the difference here
sed 'N ; /System\nAdministrator/D' data10.txt
Administrator,hello world!!
All System Administrators should attend
All System Administrators should attend..

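The D command deletes the first line of the pattern space.

When used with N, d deletes both joined lines, whereas D deletes only the first one, that is, everything up to and including the first newline.

Note the difference between data10.txt and data11.txt.

After D deletes the first line, sed restarts the script on what remains in the pattern space without reading a new line, so N then joins the leftover line with the next one. That is why both the first and second lines of data10.txt were deleted here, instead of lines being paired in the fixed (1,2) fashion described above.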

Multiline print (P)

sed -n 'N ; /System\nAdministrator/p' data11.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.
sed -n 'N ; /System\nAdministrator/P' data11.txt
On Tuesday, the Linux System

As with multiline delete, P prints only the first line of the pattern space.

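Hold space

Command  Description
h        copy the pattern space to the hold space
H        append the pattern space to the hold space
g        copy the hold space to the pattern space
G        append the hold space to the pattern space
x        exchange the contents of the pattern space and the hold space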
cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.

sed -n '/first/ {h ; p ; n ; p ; g ; p }' data2.txt
This is the first data line.
This is the second data line.
This is the first data line.
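  1. The sed script uses a regular expression address to select the line containing the word first;
  2. When that line appears, the h command copies it to the hold space;
  3. The p command prints the pattern space, which is the first data line;
  4. The n command fetches the next line of the stream (This is the second data line.) and places it in the pattern space;
  5. The p command prints the pattern space, which is now the second data line;
  6. The g command copies the hold space (This is the first data line.) back into the pattern space, replacing the current text;
  7. The p command prints the pattern space, which is the first data line again.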
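
The same commands can be combined in other ways. As a hedged sketch on invented input, the following swaps each pair of adjacent lines: h saves the first line, n loads the second, G appends the saved line after it, and p prints the pair. It assumes an even number of input lines, since with -n a trailing unpaired line is not printed.

```shell
# Swap adjacent lines pairwise; input is a made-up four-line sample.
swapped=$(printf 'a\nb\nc\nd\n' | sed -n 'h;n;G;p')
printf '%s\n' "$swapped"
```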

The dollar sign $

As an address, the dollar sign stands for the last line of the data stream.
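
Two short sketches of $ used as an address, on a made-up three-line input: one prints only the last line, the other deletes it.

```shell
# $ selects the last line of the stream.
last_line=$(printf 'a\nb\nc\n' | sed -n '$p')
without_last=$(printf 'a\nb\nc\n' | sed '$d')
printf '%s\n' "$last_line" "$without_last"
```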

Negating a command (!)

sed -n '/hello/!p' data10.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.System
All System Administrators should attend
All System Administrators should attend..

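The exclamation mark (!) negates a command: the command is applied to every line that the address does not select.

Here it prints every line except those that match hello.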

cat data7.txt
This is a test of a line.
The cat is sleeping.
That is a very nice hat.
This test is at line four.
at ten o'clock we'll go home.

sed -n '{1!G ;h;$p}' data7.txt
at ten o'clock we'll go home.
This test is at line four.
That is a very nice hat.
The cat is sleeping.
This is a test of a line.


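This reverses the order of the lines in the text; the tac command does the same thing.

1!G means the G command is not run on the first line; otherwise the output would contain an extra blank line, because the hold space is still empty at that point.

When the last line of the stream is reached, the pattern space holds the whole reversed text and needs to be printed. The command for that is $p, which prints only on the last line.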

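Branching (b)

sed provides a way to skip a whole block of commands based on an address, an address pattern, or an address range.

The branch command b has the following format:

[address]b [label]

The address parameter determines which lines trigger the branch. The label parameter names the location to jump to. If label is omitted, the branch jumps to the end of the script, so none of the commands after b are run for that line.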

cat data2.txt
This is the head line.
This is the first data line.
This is the second data line.
This is the last line.
sed '{2,3b;s/This is the/Is this/; s/line./test?/}' data2.txt
Is this head test?
This is the first data line.
This is the second data line.
Is this last test?

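The branch command skipped the two substitute commands on lines 2 and 3 of the stream.

If you don't want to jump straight to the end of the script, you can define a label for the branch to jump to. A label starts with a colon and can be up to 7 characters long (GNU sed also accepts longer labels).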

 sed '{2,3b t1;s/This is the/Is this/;:t1 s/line./test?/}' data2.txt
Is this head test?
This is the first data test?
This is the second data test?
Is this last test?

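:label defines the label name.

When the branch command is triggered, sed jumps to the position of that label and continues running the script from there, rather than skipping everything after the b command.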
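
A label can also sit before the branch, which turns the branch into a loop. The sketch below is a well-known idiom rather than something from the examples above: it joins all input lines into one. Each part is passed as a separate -e so that the label name ends unambiguously; the three-line input is invented.

```shell
# :a defines the label; N appends the next line; $!ba jumps back to :a
# on every line except the last; the final s turns all newlines into spaces.
joined=$(printf 'a\nb\nc\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
printf '%s\n' "$joined"
```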

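The test command (t)

The test command (t) can also change the flow of a sed script. Instead of jumping based on an address, it jumps to a label based on the outcome of a substitute command.

The format of the test command is: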

[address]t [label]  
sed '{s/first/matched/;t; s/This is the/No matched on/}' data2.txt
No matched on head line.
This is the matched data line.
No matched on second data line.
No matched on last line.

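If the substitution of first succeeds, t jumps to the end of the script, which is why This is the was not replaced on the second line.

A label can also be given to specify where in the script to jump.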
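
With a label placed before the substitution, t forms a loop that repeats for as long as the substitution keeps succeeding. A minimal sketch, using a hypothetical label named start and a made-up number, that inserts thousands separators one comma per pass:

```shell
# Each pass inserts one comma before the last run of three digits that still
# has a digit in front of it; t jumps back to :start while s keeps matching.
grouped=$(echo "1234567" |
  sed -e ':start' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 't start')
printf '%s\n' "$grouped"
```

The loop ends on the first pass in which the pattern no longer matches, so t does not branch and the script finishes.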

The wildcard . and the & symbol

echo "The cat sleeps in his hat." | sed 's/.at/"cat"/'
The "cat" sleeps in his hat.
echo "The caaaet sleeps in his hat." | sed 's/c.*et/"&"/g'
The "caaaet" sleeps in his hat.
# the intent was to wrap each matched word in double quotes
echo "The cat sleeps in his hat." | sed 's/.at/".at"/g'
The ".at" sleeps in his ".at".
# use the & symbol instead
echo "The cat sleeps in his hat." | sed 's/.at/"&"/g'
The "cat" sleeps in his "hat".

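The & symbol stands for the text matched by the pattern in the substitute command. Whatever text the pattern matches, you can use & in the replacement to refer to it.

Adding blank lines to text

# delete existing blank lines first so they do not turn into multiple blank lines
# $!G means the G command is not run on the last line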
sed '/^$/d ; $!G' data7.txt
This is a test of a line.

The cat is sleeping.

That is a very nice hat.

This test is at line four.

at ten o'clock we'll go home.

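When sed starts, the hold space is empty. The G command appends a newline followed by that empty hold space to the pattern space, which inserts a blank line after the current line.

Numbering lines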

sed '=' data7.txt |sed 'N; s/\n/ /'
1 This is a test of a line.
2 The cat is sleeping.
3 That is a very nice hat.
4 This test is at line four.
5 at ten o'clock we'll go home.

cat -n data7.txt
     1  This is a test of a line.
     2  The cat is sleeping.
     3  That is a very nice hat.
     4  This test is at line four.
     5  at ten o'clock we'll go home.

cat -n pads the numbers with extra whitespace.

posted @ 2021-06-26 23:21  froggengo