The Three Musketeers of Linux text processing: the grep filter - 杨文昭

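I. Syntax

grep [options]... pattern file

-E : use extended regular expressions
-c : print the number of lines that match the pattern (a count of lines, not of individual matches)
-i : ignore case, so uppercase and lowercase letters are treated as the same
-o : print only the part of each line that matches the pattern
-v : invert the match, printing only the lines that do not match the pattern
--color=auto : highlight the matched text in color
-n : prefix each output line with its line number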
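
The difference between counting lines and counting matches is easy to overlook, so here is a minimal sketch of it. The sample file and its contents are hypothetical stand-ins for /etc/passwd, not data from the original article.

```shell
# Hypothetical sample: the first line contains "root" three times.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1::/:/sbin/nologin' \
  'operator:x:11:0::/root:/sbin/nologin' > "$sample"

# -c counts matching lines, not individual matches.
line_count=$(grep -c 'root' "$sample")

# -o prints each match on its own line, so wc -l counts occurrences.
match_count=$(grep -o 'root' "$sample" | wc -l | tr -d ' ')

echo "lines: $line_count, matches: $match_count"   # lines: 2, matches: 4
rm -f "$sample"
```

If the number of occurrences is what you need, the `-o` pipeline is the one to use; `-c` alone answers "how many lines".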

grep -c root /etc/passwd            # count the lines containing "root"; same as: cat /etc/passwd | grep -c root
grep -i "the" web.sh                # find all lines containing "the", ignoring case
grep -v root /etc/passwd            # print the lines of /etc/passwd that do not contain "root"
cat web.sh | grep -v '^$' > test.txt    # write the non-empty lines to test.txt
ifconfig ens33 | grep -o "[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+" | head -1    # extract the IP address
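
The IP-extraction pipeline can be tried without a real network interface. The sketch below feeds grep a canned line that only imitates `ifconfig` output (an assumption for illustration), and compares the basic-regex form used above with the equivalent `-E` form.

```shell
# Canned text standing in for real ifconfig output (hypothetical values).
sample_output='inet 192.168.10.20  netmask 255.255.255.0  broadcast 192.168.10.255'

# Basic regex form from the text: \+ is a GNU extension in basic regular expressions.
ip_bre=$(printf '%s\n' "$sample_output" | grep -o '[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+' | head -1)

# Equivalent extended regex: with -E, + needs no backslash.
ip_ere=$(printf '%s\n' "$sample_output" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -1)

echo "$ip_bre"   # 192.168.10.20
echo "$ip_ere"   # 192.168.10.20
```

The pattern only checks the shape "digits, dot, digits, ...", so it would also accept something like 999.999.999.999; `head -1` simply keeps the first match, which here is the address rather than the netmask.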

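II. Examples

Worked examples

(1) Finding a specific string
Finding a specific string is simple. The following command finds every line of test.txt that contains the string "the". The -n option shows line numbers; adding -i would make the search case-insensitive. When color output is enabled, the matched text is displayed in red (this chapter marks it in bold instead).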

grep -n 'the' test.txt


To invert the selection, for example to find the lines that do not contain "the", use grep's -v option, combined with -n to show line numbers.

grep -vn 'the' test.txt
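
To see how -n, -i and -v interact, here is a small sketch on a three-line sample file. The file contents are made up for illustration and are not the article's test.txt.

```shell
# Hypothetical three-line sample file.
sample=$(mktemp)
printf '%s\n' 'the first line' 'The second line' 'a third line' > "$sample"

# Case-sensitive search: only line 1 contains lowercase "the".
sensitive=$(grep -n 'the' "$sample")

# -i ignores case, so "The" on line 2 matches as well.
insensitive=$(grep -in 'the' "$sample")

# -v inverts the (case-sensitive) match: lines 2 and 3 remain.
inverted=$(grep -vn 'the' "$sample")

printf '%s\n' "$sensitive" '--' "$insensitive" '--' "$inverted"
rm -f "$sample"
```

Note that the line numbers printed with -v are still the numbers from the original file, not positions in the filtered output.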


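(2) Using brackets "[]" to match a set of characters

To find both "shirt" and "short", note that the two strings share "sh" and "rt". The following command finds both at once. However many characters appear inside "[]", the brackets stand for exactly one character, so "[io]" matches either "i" or "o".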

grep -n 'sh[io]rt' test.txt


To find lines containing the doubled character "oo", run the following command.

grep -n 'oo' test.txt


To find "oo" that is not preceded by "w", use the negated set "[^]". For example, grep -n '[^w]oo' test.txt searches test.txt for strings in which "oo" is preceded by a character other than "w".

grep -n '[^w]oo' test.txt

3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
11:#woood #
12:#woooooood #
14:I bet this place is really spooky late at night!



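In the output above, "woood" and "woooooood" also match, even though both contain "w". The matched text (shown in bold) explains why: in "#woood #" the matched part is "ooo", so the character in front of the final "oo" is an "o", which is not "w" and therefore satisfies the rule. "#woooooood #" matches for the same reason.
To exclude any lowercase letter in front of "oo", use grep -n '[^a-z]oo' test.txt. Here "a-z" denotes the lowercase letters; uppercase letters are written "A-Z".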


[root@localhost ~]# grep -n '[^a-z]oo' test.txt
3:The home of Football on BBC Sport online.
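
The reasoning about "woood" can be checked directly with -o, which prints only the matched text. The sketch below uses short inline strings rather than test.txt; the extra word "shart" is a made-up non-match.

```shell
# In "#woood #" the match is "ooo": the first o plays the role of [^w].
matched=$(printf '%s\n' '#woood #' | grep -o '[^w]oo')
echo "$matched"   # ooo

# "wood" has exactly two o's right after the w, so [^w]oo cannot match.
if printf '%s\n' 'a wood cross!' | grep -q '[^w]oo'; then
  wood_matches=yes
else
  wood_matches=no
fi
echo "$wood_matches"   # no

# [io] stands for exactly one character, either i or o.
both=$(printf '%s\n' 'shirt' 'short' 'shart' | grep 'sh[io]rt')
echo "$both"   # shirt and short, one per line
```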


Lines containing a digit can be found with grep -n '[0-9]' test.txt.

[root@localhost ~]# grep -n '[0-9]' test.txt
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429


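(3) Matching the start of a line "^" and the end of a line "$"
Basic regular expressions have two anchor metacharacters: "^" (start of line) and "$" (end of line). The earlier search for "the" returned many lines that contain "the" anywhere; to return only the lines that begin with "the", use the "^" anchor.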

[root@localhost ~]# grep -n '^the' test.txt
4:the tongue is boneless but it breaks bones.12!


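Lines beginning with a lowercase letter can be selected with "^[a-z]", lines beginning with an uppercase letter with "^[A-Z]", and lines that do not begin with a letter with "^[^a-zA-Z]". The last rule still requires one character at the start of the line, so the empty line 10 is not reported.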

[root@localhost ~]# grep -n '^[a-z]' test.txt
1:he was short and fat.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
8:a wood cross!
[root@localhost ~]# grep -n '^[A-Z]' test.txt
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
6:The year ahead will test our political establishment to the limit.
7:PI=3.141592653589793238462643383249901429
9:Actions speak louder than words
13:AxyzxyzxyzxyzC
14:I bet this place is really spooky late at night!
15:Misfortunes never come alone/single.
16:I shouldn't have lett so tast.
[root@localhost ~]# grep -n '^[^a-zA-Z]' test.txt
11:#woood #
12:#woooooood #


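The "^" symbol means different things inside and outside a bracket expression "[]":
inside "[]" it negates the set, while outside "[]" it anchors the match to the start of the line.
Conversely, to find lines that end with a particular character, use the "$" anchor.
For example, the following command finds lines that end with a period (.). Because the period is itself a regular-expression metacharacter (covered below), it must be escaped with a backslash "\" so that it is treated as a literal character.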

[root@localhost ~]# grep -n '\.$' test.txt
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
15:Misfortunes never come alone/single.
16:I shouldn't have lett so tast.


To find blank lines, run grep -n '^$' test.txt.

[root@localhost ~]# grep -n '^$' test.txt
10:
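
The anchors, and the need to escape the period, can be illustrated on a tiny sample. The four lines below are hypothetical and chosen so that each pattern selects a different subset.

```shell
# Hypothetical sample; line 3 is empty.
sample=$(mktemp)
printf '%s\n' 'the end.' 'no dot here!' '' 'The Start.' > "$sample"

# ^ anchors to the start of the line (case-sensitive, so line 4 is skipped).
starts_with_the=$(grep -n '^the' "$sample")

# \. is a literal period: lines 1 and 4 end with one.
ends_with_dot=$(grep -c '\.$' "$sample")

# An unescaped . matches any character, so every non-empty line qualifies.
ends_with_any=$(grep -c '.$' "$sample")

# ^$ matches a line with nothing between its start and its end.
blank_lines=$(grep -n '^$' "$sample")

echo "$starts_with_the"                  # 1:the end.
echo "$ends_with_dot $ends_with_any"     # 2 3
echo "$blank_lines"                      # 3:
rm -f "$sample"
```

Comparing the two counts shows why the backslash matters: without it, the pattern silently selects far more lines than intended.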


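(4) Matching any single character "." and repetition "*"
As mentioned above, the period (.) is also a metacharacter in regular expressions and matches any single character. For example, the following command finds strings of the form "w??d": four characters, starting with w and ending with d.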


[root@localhost ~]# grep -n 'w..d' test.txt
5:google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words


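In the results above, the string "wood" matches the pattern "w..d". To find oo, ooo, ooooo and so on, use the asterisk (*) metacharacter. Note, however, that "*" means zero or more repetitions of the preceding single character. "o*" matches zero "o" characters (the empty string) or one or more of them, and because the empty string is allowed, grep -n 'o*' test.txt prints every line of the file. With "oo*", the first o must be present and the second may occur zero or more times, so any text containing o, oo, ooo, and so on matches. Likewise, to find strings containing at least two consecutive o characters, run grep -n 'ooo*' test.txt.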

[root@localhost ~]# grep -n 'ooo*' test.txt
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
12:#woooooood #
14:I bet this place is really spooky late at night!
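
The "zero or more" behavior of "*" is easiest to see by counting matching lines. This is a minimal sketch on a made-up five-word sample, not on the article's test.txt.

```shell
# Hypothetical sample: words with zero, one, two and three o's, plus a word without w or o.
sample=$(mktemp)
printf '%s\n' 'wd' 'wod' 'wood' 'woood' 'xyz' > "$sample"

# o* also matches the empty string, so every line matches.
all=$(grep -c 'o*' "$sample")

# oo* requires at least one o.
one_plus=$(grep -c 'oo*' "$sample")

# ooo* requires at least two consecutive o's.
two_plus=$(grep -c 'ooo*' "$sample")

# woo*d: w, then one or more o's, then d.
w_o_d=$(grep 'woo*d' "$sample")

echo "$all $one_plus $two_plus"   # 5 3 2
echo "$w_o_d"                     # wod, wood, woood, one per line
rm -f "$sample"
```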

To find strings that start with w, end with d, and have one or more o characters (and nothing else) in between, run the following command.

[root@localhost ~]# grep -n 'woo*d' test.txt
8:a wood cross!
11:#woood #
12:#woooooood #

The following command finds strings that start with w and end with d, with any characters, or none at all, in between.

[root@localhost ~]# grep -n 'w.*d' test.txt
1:he was short and fat.
5:google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words
11:#woood #
12:#woooooood #
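
Line 1 appearing in this output may look surprising, since it contains no word like "wood". Printing only the matched text with -o shows what "w.*d" actually matched. The sketch below uses inline strings; "wd" is a made-up input for the zero-character case.

```shell
# .* is greedy: the match runs from the first w to the last d on the line.
greedy=$(printf '%s\n' 'he was short and fat.' | grep -o 'w.*d')
echo "$greedy"   # was short and

# w..d needs exactly two characters between w and d.
fixed=$(printf '%s\n' 'search keyword.' | grep -o 'w..d')
echo "$fixed"    # word

# .* may also match nothing at all, so "wd" matches w.*d.
empty_middle=$(printf '%s\n' 'wd' | grep -c 'w.*d')
echo "$empty_middle"   # 1
```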


The following command finds the lines that contain any digit.


[root@localhost ~]# grep -n '[0-9][0-9]*' test.txt
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429
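
"[0-9]" and "[0-9][0-9]*" select the same lines, since both need only one digit. They differ in what they match, which becomes visible with -o. A small sketch on a made-up input line:

```shell
# Hypothetical input containing the digit runs 12, 3 and 14.
line='bones.12! PI=3.14'

# [0-9] matches one digit at a time: five separate matches.
single_digits=$(printf '%s\n' "$line" | grep -o '[0-9]' | wc -l | tr -d ' ')

# [0-9][0-9]* matches each whole run of digits.
digit_runs=$(printf '%s\n' "$line" | grep -o '[0-9][0-9]*')

echo "$single_digits"   # 5
echo "$digit_runs"      # 12, 3, 14, one per line
```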


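(5) Matching a repetition range "{}"
The examples above used "." and "*" to match zero or more repeated characters. To restrict the repetition to a range, for example three to five consecutive o characters, use the interval expression "{}" of basic regular expressions. In a basic regular expression an unescaped "{" or "}" is an ordinary character, so the braces must be written as "\{" and "\}" to act as an interval; with grep -E they are written without the backslashes. "{}" is used as follows.
① Find two consecutive o characters.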

[root@localhost ~]# grep -n 'o\{2\}' test.txt
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
12:#woooooood #
14:I bet this place is really spooky late at night!


② Find strings that start with w, end with d, and have 2 to 5 o characters in between.

[root@localhost ~]# grep -n 'wo\{2,5\}d' test.txt
8:a wood cross!
11:#woood #


③ Find strings that start with w, end with d, and have 2 or more o characters in between.


[root@localhost ~]# grep -n 'wo\{2,\}d' test.txt
8:a wood cross!
11:#woood #
12:#woooooood #
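
The three interval forms can be compared side by side, together with the -E spelling that drops the backslashes. The sample words below are hypothetical, and the last check relies on the rule that an unescaped brace is a literal character in a basic regular expression.

```shell
# Hypothetical sample: w, then 1, 2, 3 and 7 o's, then d.
sample=$(mktemp)
printf '%s\n' 'wod' 'wood' 'woood' 'woooooood' > "$sample"

# Basic regex: the interval braces are escaped.
bre=$(grep -n 'wo\{2,5\}d' "$sample")

# Extended regex (-E): the same interval without backslashes.
ere=$(grep -En 'wo{2,5}d' "$sample")

# Open-ended interval: two or more o's.
two_or_more=$(grep -c 'wo\{2,\}d' "$sample")

# o\{2\} is satisfied by any line with at least two consecutive o's.
has_double_o=$(grep -c 'o\{2\}' "$sample")

# Without -E and without backslashes, the braces are literal characters.
literal=$(printf '%s\n' 'o{2}' 'wood' | grep 'o{2}')

echo "$bre"                          # 2:wood and 3:woood, one per line
echo "$two_or_more $has_double_o"    # 3 3
echo "$literal"                      # o{2}
rm -f "$sample"
```

The seven-o word is rejected by the 2-to-5 interval because the pattern also requires w immediately before and d immediately after the run of o's.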

posted on 2022-02-14 20:52 杨文昭