The Three Musketeers of Linux Text Processing: grep

Introduction to grep
Using grep

Linux has three powerful text-processing tools: grep, sed, and awk.

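grep: best at finding and locating data; it searches text with regular expressions and prints the matching lines
sed: best at modifying data; it edits the text that matches
awk: best at slicing and formatting data; it can apply complex formatting to text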

Introduction to grep

grep (global search regular expression (RE) and print out the line) is a powerful text search tool: it searches text with regular expressions and prints the matching lines.

The grep family
The grep family has three members: grep, egrep, and fgrep.

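egrep and fgrep differ from grep only slightly:
egrep is an extension of grep that supports more regular-expression metacharacters (extended regular expressions, for example +, ?, | and grouping with ( ) without backslashes)
fgrep means fixed grep (or fast grep); it does not interpret regular expressions and is rarely used
- every character in the pattern is taken literally: regular-expression metacharacters stand for themselves and lose their special meaning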

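Linux uses GNU grep, which is more capable: the -E option gives egrep's behavior, -F gives fgrep's behavior, and -G selects basic regular expressions, which is grep's default.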
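A minimal sketch of how the three matching modes treat the same pattern, assuming GNU grep run from a POSIX shell; the scratch file sample.txt and its four lines are invented purely for illustration:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'color' 'colour' 'a+b' 'aab' > sample.txt

# -G (default, basic regex): an unescaped + is an ordinary character.
grep -G 'a+b' sample.txt       # prints: a+b
# -E (what egrep does): + means "one or more of the preceding character".
grep -E 'a+b' sample.txt       # prints: aab
# -F (what fgrep does): the whole pattern is a fixed string.
grep -F 'a+b' sample.txt       # prints: a+b
# -E also supports ? ("zero or one"), so one pattern covers both spellings.
grep -E 'colou?r' sample.txt   # prints: color, then colour
```

The same pattern a+b gives a different answer under -E than under -G or -F, which is the practical difference between the family members.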

Command syntax

grep [option] pattern file

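option: options passed to grep
pattern: the text or regular expression to search for or filter on
file: the file(s) to search; if no file is given, grep reads standard input, which is how it is used after a pipe |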

Command options

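-e: specify a pattern; repeat -e to match lines containing any of several patterns
- each -e pattern is still a basic regular expression, so it lacks the full syntax that -E supports
-E: enable extended regular expressions
-n: show line numbers
-i: ignore case
- -in: ignore case and show line numbers (single-letter options can be combined)
-v: invert the match: print only the lines that do not match
-r: search directories recursively
-w: match only whole words, never part of a word
- for example, if the text contains liker but you only want like, -w keeps liker from matching
-c: print how many lines matched instead of the lines themselves
- combined as -cv, it prints how many lines did not match
-o: print only the matched part of each line
--color: highlight the matched text in color
-A n: print each matching line and the n lines after it
-B n: print each matching line and the n lines before it
-C n: print each matching line and the n lines before and after it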
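Several of these options are easiest to compare side by side on a tiny file. This is a sketch assuming GNU grep; demo.txt and its lines are invented for the example:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'I like grep' 'She is a liker' 'LIKE it or not' 'nothing here' > demo.txt

grep -in 'like' demo.txt       # 1:I like grep / 2:She is a liker / 3:LIKE it or not
grep -w 'like' demo.txt        # I like grep  ("liker" is not the whole word like)
grep -c 'like' demo.txt        # 2  (case-sensitive, so only lines 1 and 2)
grep -cv 'like' demo.txt       # 2  (lines 3 and 4 do not match)
grep -o 'like[a-z]*' demo.txt  # like / liker  (only the matched text)
grep -A 1 'liker' demo.txt     # She is a liker / LIKE it or not
```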

Using grep

Common grep usage

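Matching several patterns at once (-e) and ignoring case (-i) make searching convenient
Showing line numbers (-n) lets you locate matches quickly
Inverted matching (-v) is also used often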

1. Searching several files at once

# Search info.log and warn.log for the string "key", ignoring case and showing line numbers
grep -in "key"  info.log  warn.log 

# Search every file in the current directory for the string "key", showing line numbers
grep -n "key"  *
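When grep is given more than one file, it prefixes each hit with the file name. A minimal sketch, assuming GNU grep; the two scratch files reuse the names info.log and warn.log from the example above, but their contents are invented:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'start' 'KEY found' > info.log
printf '%s\n' 'a key here' 'end' > warn.log

# Output format with several files: file name, line number, line text.
grep -in "key" info.log warn.log
# info.log:2:KEY found
# warn.log:1:a key here
```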

2. Recursive search

# Search every file in the current directory and its subdirectories for the string "key", showing line numbers
grep -rn "key" *
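A sketch of recursive search on a small invented directory tree (logs/, logs/old/ and the files in them are hypothetical), assuming GNU grep. It also shows --include, which limits the recursive search to files whose base name matches a glob:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p logs/old
printf '%s\n' 'key one' > logs/a.log
printf '%s\n' 'no match' 'key two' > logs/old/b.log
printf '%s\n' 'key three' > logs/notes.txt

# -r also descends into logs/old. sort only fixes the order, because grep
# visits directory entries in whatever order the filesystem returns them.
grep -rn "key" logs | LC_ALL=C sort
# logs/a.log:1:key one
# logs/notes.txt:1:key three
# logs/old/b.log:2:key two

# Quote the glob so the shell passes *.log to grep unexpanded.
grep -rn "key" --include='*.log' logs | LC_ALL=C sort
# logs/a.log:1:key one
# logs/old/b.log:2:key two
```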

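3. Showing n lines around a match

# Show each line containing "bug" plus the 10 lines after it (A = after)
grep -A 10 "bug" info.log

# Show each line containing "bug" plus the 10 lines before it (B = before)
grep -B 10 "bug" info.log

# Show each line containing "bug" plus 10 lines before and after it (C = context)
# The letters must be uppercase: lowercase -a, -b and -c are different options
grep -C 10 "bug" info.log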
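The context options are easiest to see on numbered lines. A minimal sketch, assuming GNU grep and coreutils seq; nums.txt is a scratch file holding the numbers 1 to 10:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
seq 1 10 > nums.txt            # one number per line, 1 to 10

grep -A 2 '^5$' nums.txt       # 5 6 7  (match plus 2 lines after)
grep -B 2 '^5$' nums.txt       # 3 4 5  (2 lines before plus match)
grep -C 1 '^5$' nums.txt       # 4 5 6  (1 line on each side)
# Lowercase -c is unrelated: it counts matching lines.
grep -c '^5$' nums.txt         # 1
```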

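grep and regular expressions

The examples below search a sample file, regular_express.txt. Some of the outputs are abridged and show only part of the matching lines.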

1. Character classes

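(1) Searching with a character class
Find lines containing test or taste; both share the shape t?st, so the middle character can be written as the class [ae]: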

[root@www ~]# grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.
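The same character class can be tried on a few invented lines (words.txt is a scratch file, not regular_express.txt); adding -o shows which text the class actually matched. A sketch assuming GNU grep:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'the test' 'a taste' 'a toast' > words.txt

grep -n 't[ae]st' words.txt    # 1:the test / 2:a taste
grep -o 't[ae]st' words.txt    # test / tast  ("toast" contains no t[ae]st)
```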

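(2) Negated character classes [^]
Find lines containing oo that is not preceded by g. A line still matches if any oo in it is preceded by some other character, which is why lines 18 ("tools") and 19 (the long run of o's in "goooooogle") appear: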

[root@www ~]# grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!
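To see why a line that contains goo can still match [^g]oo, -o prints the exact text that matched. A sketch with an invented scratch file oo.txt, assuming GNU grep:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'google' 'tools' 'good' > oo.txt

grep -n '[^g]oo' oo.txt        # 2:tools
grep -o '[^g]oo' oo.txt        # too  (the t in "tools" satisfies [^g])
```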

(3) Character ranges
To find lines where oo is not preceded by a lowercase letter, you could write [^abcd....z]oo, which simplifies to the range [^a-z]oo:

[root@www ~]# grep -n '[^a-z]oo' regular_express.txt
3:Football game is not use feet only.
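A sketch showing that the range [^a-z] is shorthand for listing every lowercase letter, assuming GNU grep; range.txt is an invented scratch file, and LC_ALL=C is set so that a-z means exactly the ASCII lowercase letters:

```shell
export LC_ALL=C                # make a-z mean the ASCII lowercase letters
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'Football' 'food' 'oops' > range.txt

grep -n '[^abcdefghijklmnopqrstuvwxyz]oo' range.txt   # 1:Football
grep -n '[^a-z]oo' range.txt                          # 1:Football (same thing)
# "oops" does not match: [^a-z] needs a character before oo, and there is none.
```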

2. Line anchors ^ and $

(1) Start of line
Find lines that begin with the:

[root@www ~]# grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.

Find lines that begin with a lowercase letter:

[root@www ~]# grep -n '^[a-z]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.

Find lines that do not begin with an English letter:

[root@www ~]# grep -n '^[^a-zA-Z]' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird

The ^ symbol means negation inside [], and anchors the match to the start of the line outside [].
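A sketch contrasting ^ as an anchor with ^ as negation inside [], assuming GNU grep; anchors.txt is an invented scratch file, and LC_ALL=C pins down the letter ranges:

```shell
export LC_ALL=C                # make a-z and A-Z mean the ASCII letters
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'the start' 'not the start' 'Upper case' '# comment' > anchors.txt

grep -c 'the' anchors.txt          # 2  (the anywhere in the line)
grep -n '^the' anchors.txt         # 1:the start  (the only at line start)
grep -n '^[a-z]' anchors.txt       # 1:the start / 2:not the start
grep -n '^[^a-zA-Z]' anchors.txt   # 4:# comment  (first ^ anchors, second negates)
```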

(2) End of line
Find lines that end with a period:

[root@www ~]# grep -n '\.$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.

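Because . has a special meaning in regular expressions, it must be escaped with a backslash (\) to match a literal period.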

Find blank lines:

[root@www ~]# grep -n '^$' regular_express.txt
22:

'^$' matches only a line with nothing between its start and its end, that is, an empty line.
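A sketch of the end-of-line anchor and blank-line matching, assuming GNU grep; ends.txt is an invented scratch file that includes two empty lines. Dropping the backslash shows why the escape matters:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'ends with a dot.' 'no dot here' '' 'another.' '' > ends.txt

grep -n '\.$' ends.txt         # 1:ends with a dot. / 4:another.
grep -n '.$' ends.txt          # lines 1, 2 and 4: an unescaped . is any character
grep -c '^$' ends.txt          # 2  (the two empty lines)
grep -v '^$' ends.txt          # the three non-empty lines, blank lines dropped
```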

3. Any single character . and repetition *

. : exactly one character, whatever it is
* : the preceding character repeated zero or more times

Find lines containing g??d: four characters, starting with g and ending with d:

[root@www ~]# grep -n 'g..d' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
16:The world <Happy> is the same with "glad".

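Find strings with at least two o's in a row; o* means zero or more o's, so ooo* means two o's followed by any number of further o's: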

[root@www ~]# grep -n 'ooo*' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
18:google is the best tools for search keyword.
19:goooooogle yes!

Find strings that start and end with g with at least one o, and nothing but o's, between them:

[root@www ~]# grep -n 'goo*g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!

Find lines containing a string that starts with g and ends with g, with anything (or nothing) in between:

[root@www ~]# grep -n 'g.*g' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

Find lines containing any digit:

[root@www ~]# grep -n '[0-9][0-9]*' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.
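With -o, grep prints each match on its own line, which makes the difference between . and .* visible. A sketch assuming GNU grep; dots.txt is an invented one-line scratch file:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'gag gig gog' > dots.txt

grep -o 'g.g' dots.txt         # gag / gig / gog  (exactly one character between)
grep -o 'g.*g' dots.txt        # gag gig gog  (.* is greedy: first g to last g)
grep -c 'z*' dots.txt          # 1  (z* also matches zero z's, so every line matches)
```

The last command is why the examples above write ooo* rather than o* when they need at least two o's: a pattern made only of a starred character can match the empty string.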

4. Limiting repetition with {}

With . and * you can match zero to infinitely many repetitions; to restrict the number of repetitions to a range, use the interval {}.

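In grep's default basic regular expressions, unescaped { and } are ordinary characters, so the interval has to be written as \{ and \}; with -E the braces work without backslashes, as in 'go{2,5}g'. The single quotes around the pattern keep the shell from interpreting any of these characters.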

Find strings with two o's in a row:

[root@www ~]# grep -n 'o\{2\}' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
18:google is the best tools for search keyword.
19:goooooogle yes!

Find strings with g, then 2 to 5 o's, then g:

[root@www ~]# grep -n 'go\{2,5\}g' regular_express.txt
18:google is the best tools for search keyword.

Find strings that start and end with g with at least 2 o's, and nothing but o's, between them. Besides gooo*g, you can also write:

[root@www ~]# grep -n 'go\{2,\}g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
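A sketch comparing basic and extended interval syntax, assuming GNU grep; braces.txt is an invented scratch file whose last line contains a literal o{2}:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'gog' 'goog' 'goooooooooog' 'o{2}' > braces.txt

grep 'go\{2,5\}g' braces.txt    # goog  (basic regex: escaped braces)
grep -E 'go{2,5}g' braces.txt   # goog  (extended regex: plain braces)
grep -c 'go\{2,\}g' braces.txt  # 2     (goog and the long run of o's)
grep 'o{2}' braces.txt          # o{2}  (unescaped braces are literal in a basic regex)
```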

Common grep command combinations

# Find which delete (rm) commands appear in the command history
history|grep rm 

# Find lines containing exception in all .log files in the current directory
cat *.log | grep 'exception' 

# The most common command in day-to-day operations: check whether a program (here java) is running
ps -ef|grep java 

# Check whether an rpm package (here yum) is installed
rpm -qa |grep yum 

# Use a regular expression to find lines starting with 2020, showing line numbers
grep -En '^2020' info.log 

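# Recursively search all .log files in the current directory and its subdirectories for "warn", showing line numbers
# --include takes a glob, so it must be "*.log" (quoted so the shell does not expand it)
grep -rn "warn" --include="*.log" .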

# Filter on several conditions in turn
netstat -nap|grep -E "6651"|grep -E "203.130.41.24" 

# Lines containing either aaa or bbb match; in practice this is very handy for following a log in real time
tail -n 1000 -f info.log | grep -E "aaa|bbb"

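# Search, and append the matching lines to another file
# --line-buffered makes grep write each matching line immediately instead of buffering its output
tail -n 10000 -f info.log | grep --line-buffered 'check' >> call.log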

# Find lines where several keywords all appear (this pattern requires keyword1 to come before keyword2)
grep -E 'keyword1.*keyword2' info.log

# Find lines starting with Error, keep only those containing failed, split on spaces and print the 10th field
grep -E '^Error' info.log |grep 'failed'|awk -F ' ' '{print $10}'

| : the pipe; the output of the command on its left becomes the input of the command on its right
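A sketch of two of the pipelines above on invented data, assuming GNU grep and a POSIX awk; app.log and kw.txt are hypothetical scratch files. The second part shows that chaining two greps finds both keywords in either order, while keyword1.*keyword2 finds them only in that order:

```shell
# Scratch directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' \
  'Error 2022-03-30 disk failed on sda' \
  'Error 2022-03-30 login ok' \
  'Info 2022-03-30 disk failed on sdb' > app.log

# Lines starting with Error, then only those containing failed, then field 6.
grep -E '^Error' app.log | grep 'failed' | awk -F ' ' '{print $6}'   # sda

printf '%s\n' 'keyword2 then keyword1' 'keyword1 then keyword2' > kw.txt
grep -E 'keyword1.*keyword2' kw.txt       # keyword1 then keyword2
grep 'keyword1' kw.txt | grep 'keyword2'  # both lines, in either order
```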

posted @ 2022-03-30 17:56 当康