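Common Text-Processing Tools

Viewing text file contents: cat

Syntax

cat [OPTION]... [FILE]...

Common options

-E: mark the end of each line with $
-A: show all control characters (tabs as ^I, line ends as $)
-n: number every output line
-b: number only non-empty lines
-s: squeeze runs of consecutive blank lines into a single blank line

Example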

[root@centos8 ~]# cat -A /data/fa.txt
a b$
c $
d^Ib^Ic$
[root@centos8 ~]# cat /data/fa.txt
a b
c
d b c
[root@centos8 ~]# cat /data/fb.txt
a
b
c
[root@centos8 ~]# hexdump -C /data/fb.txt
00000000 61 0d 0a 62 0d 0a 63 0d 0a |a..b..c..|
00000009
[root@centos8 ~]# cat -A /data/fb.txt
a^M$
b^M$
c^M$
[root@centos8 ~]# file /data/fb.txt
/data/fb.txt: ASCII text, with CRLF line terminators
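
The CRLF case above can be reproduced without an existing file. The following is a minimal sketch, assuming GNU cat (for -A) and mktemp are available; the file names crlf.txt and lf.txt are made up for illustration.

```shell
# Scratch directory so the example does not touch real files.
tmpdir=$(mktemp -d)

# Write two lines that end in CR LF, as a Windows editor would.
printf 'a\r\nb\r\n' > "$tmpdir/crlf.txt"

# cat -A shows each carriage return as ^M and each line end as $.
visible=$(cat -A "$tmpdir/crlf.txt")

# Deleting the carriage returns with tr leaves plain LF line endings.
tr -d '\r' < "$tmpdir/crlf.txt" > "$tmpdir/lf.txt"
cleaned=$(cat -A "$tmpdir/lf.txt")

printf '%s\n' "$visible" "$cleaned"
rm -rf "$tmpdir"
```

The first two output lines end in ^M$, the last two in a plain $, which is a quick way to check that a conversion worked.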

nl

Numbers lines; by default it behaves like cat -b (only non-empty lines are numbered)

[root@centos8 ~]# cat /data/f1.txt
a
b
c
d

[root@centos8 ~]# nl /data/f1.txt
1 a
2 b
3 c
4 d
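
The difference between numbering non-empty lines and numbering all lines only shows up when the input has blank lines. A small sketch, using inline sample text rather than a real file:

```shell
# Three input lines, the middle one empty.
sample=$(printf 'a\n\nb')

# Default nl (like cat -b) skips the empty line, so "b" gets number 2.
default_last=$(printf '%s\n' "$sample" | nl | tail -n 1 | tr -d ' \t')

# nl -b a numbers every line (like cat -n), so "b" gets number 3.
all_last=$(printf '%s\n' "$sample" | nl -b a | tail -n 1 | tr -d ' \t')

echo "$default_last $all_last"
```

The tr -d ' \t' step only strips the padding that nl puts around the number, so that the two results are easy to compare.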

tac

Prints the lines of a file in reverse order (last line first)

[root@centos8 ~]# cat /data/fa.txt
1
2
3
4
5
[root@centos8 ~]# tac /data/fa.txt
5
4
3
2
1
[root@centos8 ~]# tac
a
bb
ccc    (press Ctrl+D)
ccc
bb
a

[root@centos8 ~]# seq 10 | tac
10
9
8
7
6
5
4
3
2
1

rev

Reverses the characters within each line

[root@centos8 ~]# cat /data/fa.txt
1 2 3 4 5
a b c
[root@centos8 ~]# tac /data/fa.txt
a b c
1 2 3 4 5
[root@centos8 ~]# rev /data/fa.txt
5 4 3 2 1
c b a
[root@centos8 ~]# rev
abcdef
fedcba
[root@centos8 ~]# echo {1..10} |rev
01 9 8 7 6 5 4 3 2 1
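
One common use of rev is to pick the last field of a line when the number of fields is unknown: reverse the line, take the first field, and reverse it back. A sketch, assuming the cut command covered later in these notes; the path is only sample data.

```shell
# Hypothetical sample path, used only for illustration.
path=/usr/share/dict/linux.words

# Reverse the line, take the first '/'-separated field, then reverse back.
last=$(echo "$path" | rev | cut -d/ -f1 | rev)

echo "$last"
```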

hexdump

Views the contents of non-text (binary) files

Example

[root@centos8 ~]# hexdump -C -n 512 /dev/sda
00000000 eb 63 90 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e c0 |.c..............|
[root@centos8 ~]# echo {a..z} | tr -d ' '| hexdump -C
00000000 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 |abcdefghijklmnop|
00000010 71 72 73 74 75 76 77 78 79 7a 0a |qrstuvwxyz.|
0000001b

od

od dumps files in octal and other formats

Example:

[root@centos8 ~]# echo {a..z} | tr -d ' '|od -t x
0000000 64636261 68676665 6c6b6a69 706f6e6d
0000020 74737271 78777675 000a7a79
0000033
[root@centos8 ~]# echo {a..z} | tr -d ' '|od -t x1
0000000 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70
0000020 71 72 73 74 75 76 77 78 79 7a 0a
0000033
[root@centos8 ~]# echo {a..z} | tr -d ' '|od -t x1z
0000000 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 >abcdefghijklmnop<
0000020 71 72 73 74 75 76 77 78 79 7a 0a >qrstuvwxyz.<
0000033
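
In the od -t x output above the bytes look shuffled (64636261 for abcd) because od groups the input into 4-byte words and prints each word in the host's byte order, while -t x1 prints one byte at a time. A short sketch of the two forms; the value of the 4-byte form depends on the machine it runs on.

```shell
# One byte per column: the order matches the input on any CPU.
bytes=$(printf 'abcd' | od -An -t x1 | tr -d ' \n')

# Four-byte words: the digits depend on the host byte order
# (64636261 on little-endian machines such as x86, 61626364 on big-endian ones).
word=$(printf 'abcd' | od -An -t x4 | tr -d ' \n')

echo "$bytes $word"
```

The -An option suppresses the address column so that only the data remains.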

xxd

[root@centos8 ~]# echo {a..z} | tr -d ' '|xxd
00000000: 6162 6364 6566 6768 696a 6b6c 6d6e 6f70 abcdefghijklmnop
00000010: 7172 7374 7576 7778 797a 0a qrstuvwxyz.

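Paging through file contents

more

more pages through a file one screen at a time; combined with a pipe it can also page the output of another command

Syntax

more [OPTIONS...] FILE...

Options:

-d: display prompts for paging and quitting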

less

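less can also page through a file or STDIN; it is the pager used by the man command

Useful commands while viewing:

/text    search for text

n/N    jump to the next or the previous match

Example: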

[root@centos8 ~]# cat /etc/init.d/functions | less
# -*-Shell-script-*-
#
# functions This file contains functions to be used by most or all
# shell scripts in the /etc/init.d directory.
#
TEXTDOMAIN=initscripts
# Make sure umask is sane
umask 022
# Set up a default search path.
PATH="/sbin:/usr/sbin:/bin:/usr/bin"
export PATH
... (output omitted) ...

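Displaying the first or last lines of text

head

Displays the first lines of a file or of standard input

Syntax:

head [OPTION]... [FILE]...

Options:

-c # print the first # bytes
-n # print the first # lines; if # is negative, print everything except the last # lines
-# same as -n #

Example: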

[root@centos8 ~]# head -n 3 /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

[root@centos8 ~]# head -3 /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

[root@centos8 ~]# echo a我b | head -c4
a我[root@centos8 ~]#

[root@centos8 ~]# cat /dev/urandom | tr -dc '[:alnum:]'| head -c10

G755MlZatW[root@centos8 ~]# cat /dev/urandom | tr -dc '[:alnum:]'| head -c10
ASsax6DeBz[root@centos8 ~]# cat /dev/urandom | tr -dc '[:alnum:]'| head -c10 |
tee pass.txt | passwd --stdin long
Changing password for user long.
passwd: all authentication tokens updated successfully.

[root@centos8 ~]# cat pass.txt
AGT952Essg[root@centos8 ~]# su - wang
[wang@centos8 ~]$ su - long
Password:

[root@centos8 ~]# cat seq.log
1
2
3
4
5
6
7
8
9
10
[root@centos8 ~]# head -n 3 seq.log
1
2
3
[root@centos8 ~]# head -n -3 seq.log
1
2
3
4
5
6
7
[root@centos8 ~]# head -n +3 seq.log
1
2
3
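
head -c counts bytes, not characters, which is why head -c4 in the example above returns "a我": the character 我 takes three bytes in UTF-8. The sketch below makes the byte counts visible; it writes the text as octal escapes so that it does not depend on how this page is encoded.

```shell
# "a我b" written as raw UTF-8 bytes (我 is the three bytes E6 88 91).
text=$(printf 'a\346\210\221b')

# The whole string is 5 bytes: 1 + 3 + 1.
total=$(printf '%s' "$text" | wc -c | tr -d ' ')

# head -c 4 keeps "a" plus the complete 3-byte character.
kept=$(printf '%s' "$text" | head -c 4 | wc -c | tr -d ' ')

# head -c 3 cuts the character short: "a" plus only two of its three bytes,
# an incomplete sequence that a terminal cannot display properly.
cut_short=$(printf '%s' "$text" | head -c 3 | od -An -t x1 | tr -d ' \n')

echo "$total $kept $cut_short"
```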

tail

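tail is the opposite of head: it displays the last lines of a file or of standard input

Syntax:

tail [OPTION]... [FILE]...

Common options:

-c # print the last # bytes
-n # print the last # lines; if the number is written as +#, print from line # through the end of the file
-# same as -n #
-f follow new content appended to the file by file descriptor, commonly used for log monitoring; equivalent to --follow=descriptor. If the file is deleted and a new file with the same name is created, tail can no longer follow it
-F follow by file name; equivalent to --follow=name --retry. If the file is deleted and a new file with the same name is created, tail keeps following it
tailf is similar to tail -f, but it does not access the file while the file is not growing, which saves resources; CentOS 8 does not ship this tool

Example: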

[root@centos8 ~]# cat /data/f1.txt
1
2
3
4
5
6
7
8
9
10
[root@centos8 ~]# tail -n 3 /data/f1.txt
8
9
10
[root@centos8 ~]# tail -n +3 /data/f1.txt
3
4
5
6
7
8
9
10

Example:

[root@centos8 ~]# tail -3 /var/log/messages
Dec 20 09:50:01 centos8 dbus-daemon[952]: [system] Successfully activated
service 'net.reactivated.Fprint'
Dec 20 09:50:01 centos8 systemd[1]: Started Fingerprint Authentication Daemon.
Dec 20 09:50:13 centos8 su[6887]: (to mage) root on pts/0

[root@centos8 ~]# tail -f /var/log/messages
Dec 20 08:36:40 centos8 systemd[1321]: Startup finished in 52ms.
Dec 20 08:36:40 centos8 systemd[1]: Started User Manager for UID 0.
Dec 20 08:47:01 centos8 systemd[1]: Starting dnf makecache...
Dec 20 08:47:02 centos8 dnf[1465]: AppStream

# Show only log entries written from now on
[root@centos8 ~]# tail -fn0 /var/log/messages
[root@centos8 ~]# tail -0f /var/log/messages
# Extract the line that contains the IP address
[root@centos8 data]# ifconfig | head -2 | tail -1
inet 10.0.0.8 netmask 255.255.255.0 broadcast 10.0.0.255

Example: display line 6

[root@centos8 ~]# seq 20 | head -n 6 | tail -n1
6
[root@centos8 ~]# seq 20 | tail -n +6 | head -n1
6
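
The same head/tail combination extends from a single line to a range of lines. A minimal sketch; line_range is a hypothetical helper name, not a standard command.

```shell
# line_range FIRST LAST: print lines FIRST..LAST of standard input.
line_range() {
    first=$1
    last=$2
    # tail -n +FIRST starts at line FIRST; head then keeps LAST-FIRST+1 lines.
    tail -n "+$first" | head -n "$(( last - first + 1 ))"
}

range=$(seq 20 | line_range 6 8)
echo "$range"
```

Here the call prints lines 6, 7 and 8 of the seq output.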

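Extracting columns from text: cut

cut extracts the specified columns from a text file or from STDIN

Syntax

cut [OPTION]... [FILE]...

Common options

-d DELIMITER: set the field delimiter; the default is tab

-f FIELDS:

#: field number #, e.g. 3

#,#[,#]: several separate fields, e.g. 1,3,6

#-#: a contiguous range of fields, e.g. 1-6

Mixed: 1-3,7

-c: select by character position

--output-delimiter=STRING: set the output delimiter

Example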

[root@centos8 ~]# cut -d: -f1,3-4,7 /etc/passwd
[root@centos8 ~]# ifconfig | head -n2 | tail -n1 | cut -d" " -f10
10.0.0.8
[root@centos8 ~]# ifconfig | head -n2 | tail -n1 | tr -s " " | cut -d " " -f3
10.0.0.8
[root@centos8 ~]# df | tr -s ' '| cut -d' ' -f5 | tr -dc "[0-9\n]"
0
0
1
0
5
1
15
1
[root@centos8 ~]# df | tr -s ' ' % | cut -d% -f5 | tr -d '[:alpha:]'
0
0
1
0
6
1
13
1
[root@centos8 ~]# df | cut -c44-46 | tr -d '[:alpha:]'
0
0
1
0
6
1
13
1

[root@centos8 ~]# cut -d: -f1,3,7 --output-delimiter="---" /etc/passwd
root---0---/bin/bash
bin---1---/sbin/nologin
daemon---2---/sbin/nologin
cat /etc/passwd | cut -d: -f7
cut -c2-5 /usr/share/dict/words

[root@centos8 ~]# echo {1..10} | cut -d ' ' -f1-10 --output-delimiter="+" | bc
55

Example: extract partition usage

# Extract partition usage (the Use% column)
[root@centos8 ~]# df | tr -s ' ' | cut -d' ' -f5 | tr -d %
[root@centos8 ~]# df | tr -s ' ' '%'| cut -d% -f5
Use
0
0
2
0
3
1
15
0
100
[root@centos8 ~]# df |cut -c 44-46 | tail -n +2
0
0
3
0
3
1
13
0
[root@centos8 ~]# df | tail -n +2 | tr -s ' ' % | cut -d% -f5
0
0
1
0
3
1
19
0
100
[root@centos8 ~]# df | tail -n +2 | tr -s ' ' | cut -d' ' -f5 | tr -d %
0
0
1
0
3
1
19
0
100
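
Because live df output changes from run to run (as the differing numbers above show), it can help to try a pipeline on fixed text first. The sketch below uses a few rows from the df listing shown in the sort section of these notes as sample data. Splitting on fields is usually more robust than cut -c, whose column positions shift when the column widths change.

```shell
# Fixed sample rows so that the result is reproducible.
df_sample='Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs          391676       0    391676   0% /dev
/dev/sda2      104806400 2259416 102546984   3% /
/dev/sda1         999320  130848    799660  15% /boot
/dev/sr0         7377866 7377866         0 100% /misc/cd'

# Drop the header, squeeze the padding to single spaces, take field 5, strip %.
usage=$(printf '%s\n' "$df_sample" | tail -n +2 | tr -s ' ' | cut -d' ' -f5 | tr -d '%')

echo "$usage"
```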

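Merging multiple files: paste

paste joins the lines with the same line number from several files into a single line

Syntax

paste [OPTION]... [FILE]...

Common options:

-d DELIMITER: set the delimiter; the default is TAB
-s: merge all lines of each file into one line, producing one output line per file

Example: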

[root@centos8 ~]# cat alpha.log
a
b
c
d
e
f
g
h
[root@centos8 ~]# cat seq.log
1
2
3
4
5
[root@centos8 ~]# cat alpha.log seq.log
a
b
c
d
e
f
g
h
1
2
3
4
5
[root@centos8 ~]# paste alpha.log seq.log
a 1
b 2
c 3
d 4
e 5
f
g
h
[root@centos8 ~]# paste -d":" alpha.log seq.log
a:1
b:2
c:3
d:4
e:5
f:
g:
h:

[root@centos8 ~]# paste -s seq.log
1 2 3 4 5
[root@centos8 ~]# paste -s alpha.log
a b c d e f g h
[root@centos8 ~]# paste -s alpha.log seq.log
a b c d e f g h
1 2 3 4 5

[root@centos8 ~]# cat title.txt
ceo
coo
cto
[root@centos8 ~]# cat emp.txt
long
zhang
wang
xu
[root@centos8 ~]# paste title.txt emp.txt
ceo long
coo zhang
cto wang
xu
[root@centos8 ~]# paste -s title.txt emp.txt
ceo coo cto
long zhang wang xu

[root@centos8 ~]# paste -s -d: f1.log f2.log
1:2:3:4:5:6:7:8:9:10
a:b:c:d:e:f:g:h:i:j
[root@centos8 ~]# seq 10
1
2
3
4
5
6
7
8
9
10
[root@centos8 ~]# seq 10 |paste -s -d+|bc
55
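
If bc is not installed, the expression built by paste can be evaluated by the shell itself. A sketch of that variant, assuming integer input only, since shell arithmetic does not handle decimals:

```shell
# Join the numbers 1..10 with "+" to build the expression 1+2+...+10.
# The trailing "-" tells paste to read standard input.
expr_text=$(seq 10 | paste -s -d+ -)

# Let the shell evaluate the expression instead of piping it to bc.
sum=$(( $expr_text ))

echo "$expr_text = $sum"
```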

Example: changing passwords in bulk

[root@centos8 ~]# cat user.txt
wang
long
[root@centos8 ~]# cat pass.txt
123456
longge
[root@centos8 ~]# paste -d: user.txt pass.txt
wang:123456
long:longge
[root@centos8 ~]# paste -d: user.txt pass.txt | chpasswd

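Tools for analyzing text

Text statistics: wc

Sorting text: sort

Comparing files: diff and patch

Collecting text statistics: wc

wc counts the total number of lines, words, bytes and characters in a file

It can count data from files or from STDIN

Common options

-l count lines only

-w count words only

-c count bytes only

-m count characters only

-L print the length of the longest line

Example: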

wc story.txt
39 237 1901 story.txt
lines words bytes

[root@centos8 ~]# ll title.txt
-rw-r--r-- 1 root root 30 Dec 20 11:05 title.txt
[root@centos8 ~]# ll title1.txt
-rw-r--r-- 1 root root 28 Dec 20 11:06 title1.txt
[root@centos8 ~]# cat title.txt
ceo mage
coo zhang
cto 老王
[root@centos8 ~]# cat title1.txt
ceo mage
coo zhang
cto wang
[root@centos8 ~]# wc title.txt
3 6 30 title.txt
[root@centos8 ~]# wc title1.txt
3 6 28 title1.txt
[root@centos8 ~]# wc -l title.txt
3 title.txt
[root@centos8 ~]# cat title.txt | wc -l
3

[root@centos8 ~]# df | tail -n $(echo `df | wc -l`-1|bc)
devtmpfs 910220 0 910220 0% /dev
tmpfs 924728 0 924728 0% /dev/shm
tmpfs 924728 9224 915504 1% /run
tmpfs 924728 0 924728 0% /sys/fs/cgroup
/dev/sda2 104806400 4836160 99970240 5% /
/dev/sda3 52403200 398580 52004620 1% /data
/dev/sda1 999320 131764 798744 15% /boot
tmpfs 184944 4 184940 1% /run/user/0
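
The df | tail -n $(echo ... | bc) command above computes "all lines except the header" from the line count. The same idea can be written with shell arithmetic instead of bc, as sketched below on stand-in data; in practice tail -n +2 gives the same result with less work.

```shell
# Stand-in for df output: one header line followed by three data lines.
table=$(printf 'Header\nrow1\nrow2\nrow3')

# wc -l reading from a pipe prints only the count, with no file name.
total=$(printf '%s\n' "$table" | wc -l)

# Shell arithmetic replaces the echo ... | bc step.
body=$(printf '%s\n' "$table" | tail -n "$(( total - 1 ))")

echo "$body"
```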

Example: the words dictionary file

[root@centos8 ~]# yum -y install words
[root@centos8 ~]# wc -l /usr/share/dict/linux.words
479829 /usr/share/dict/linux.words

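Sorting text: sort

Writes the sorted text to STDOUT without changing the original file

Syntax:

sort [options] file(s)

Common options

-r reverse the sort order (descending instead of ascending)

-R sort in random order

-n sort by numeric value

-h sort human-readable sizes, e.g. 2K 1G

-f fold case, i.e. ignore upper/lower case when comparing

-u unique: merge duplicate lines, i.e. deduplicate

-t c use c as the field delimiter

-k # sort by field #, where fields are separated by the character c given to -t; may be given multiple times

Example: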

[root@centos8 data]# cut -d: -f1,3 /etc/passwd | sort -t: -k2 -nr | head -n3
nobody:65534
xiaoming:1002
mage:1001
# Count the distinct client IPs in the access log
[root@centos8 data]# cut -d" " -f1 /var/log/nginx/access_log | sort -u | wc -l
201
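
A detail worth knowing about -k: -k2 means "from field 2 to the end of the line", while -k2,2 restricts the key to field 2 alone, and a letter such as n or r can be attached to a single key. The sketch below uses made-up name:uid:shell records, loosely modelled on /etc/passwd, to show a single key and a two-key sort.

```shell
# Hypothetical sample records, not read from the real /etc/passwd.
records='mage:1001:/bin/bash
root:0:/bin/bash
nobody:65534:/sbin/nologin
bin:1:/sbin/nologin'

# Sort by UID only (field 2), numerically, highest first.
by_uid=$(printf '%s\n' "$records" | sort -t: -k2,2nr | cut -d: -f1)

# Two keys: shell (field 3) ascending, then UID (field 2) descending.
by_shell_uid=$(printf '%s\n' "$records" | sort -t: -k3,3 -k2,2nr | cut -d: -f1)

echo "$by_uid"
echo "$by_shell_uid"
```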

Example: partition usage statistics

[root@centos8 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 391676 0 391676 0% /dev
tmpfs 408092 0 408092 0% /dev/shm
tmpfs 408092 5816 402276 2% /run
tmpfs 408092 0 408092 0% /sys/fs/cgroup
/dev/sda2 104806400 2259416 102546984 3% /
/dev/sda3 52403200 398608 52004592 1% /data
/dev/sda1 999320 130848 799660 15% /boot
tmpfs 81616 0 81616 0% /run/user/0
/dev/sr0 7377866 7377866 0 100% /misc/cd

# Find the highest partition usage
[root@centos8 ~]# df | tr -s ' ' '%' | cut -d% -f5 | sort -nr | head -1
100
[root@centos8 ~]# df | tr -s " " % | cut -d% -f5 | tr -d '[:alpha:]' | sort
0
0
0
1
1
1
15
5
[root@centos8 ~]# df | tr -s " " % | cut -d% -f5 | tr -d '[:alpha:]' | sort -n
0
0
0
1
1
1
5
15
[root@centos8 ~]# df | tr -s " " % | cut -d% -f5 | tr -d '[:alpha:]' | sort -n | tail -n1
15
[root@centos8 ~]# df | tr -s " " % | cut -d% -f5 | tr -d '[:alpha:]' | sort -nr
15
5
1
1
1
0
0
0
[root@centos8 ~]# df | tr -s " " %  | cut -d% -f5 | tr -d '[:alpha:]' | sort -nr | head -n1
15

Interview question: given two files, a.txt and b.txt, merge them so that every number in the output is unique

# Every number in a.txt is unique within that file
200
100
34556
23
...
# Every number in b.txt is unique within that file
123
43
200
3321
...
# In other words: after merging the two files, drop every line that occurs more than once; no copy of it is kept
100
34556
23
123
43
3321
...
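
One possible answer, sketched with the sort and uniq commands from these notes: sorting brings equal lines together, and uniq -u keeps only the lines that occur exactly once. The temporary files hold just the numbers visible in the question, and the result comes out in sorted order rather than in the original file order.

```shell
tmpdir=$(mktemp -d)

# The visible numbers from the question; the "..." entries are left out.
printf '200\n100\n34556\n23\n' > "$tmpdir/a.txt"
printf '123\n43\n200\n3321\n' > "$tmpdir/b.txt"

# sort puts equal lines next to each other; uniq -u then prints only the
# lines that occur exactly once, so 200 (present in both files) disappears.
# The final sort -n orders the surviving numbers by value.
unique=$(sort "$tmpdir/a.txt" "$tmpdir/b.txt" | uniq -u | sort -n)

echo "$unique"
rm -rf "$tmpdir"
```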

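Removing duplicates: uniq

uniq removes adjacent duplicate lines from its input

Syntax:

uniq [OPTION]... [FILE]...

Common options:

-c: prefix each line with the number of times it occurred

-d: print only lines that are repeated

-u: print only lines that are not repeated

uniq is usually combined with sort:

Example:

sort userlist.txt | uniq -c

Example: the client IPs with the most requests in the log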

[root@centos8 data]# cut -d" " -f1 access_log | sort | uniq -c | sort -nr | head -3
4870 172.20.116.228
3429 172.20.116.208
2834 172.20.0.222
[root@centos8 data]# lastb -f btmp-34 | tr -s ' ' | cut -d ' ' -f3 | sort | uniq -c | sort -nr | head -3
86294 58.218.92.37
43148 58.218.92.26
18036 112.85.42.201

Example: the remote host IPs with the most concurrent connections

[root@centos8 ~]# ss -nt| tail -n+2 | tr -s ' ' : | cut -d: -f6 | sort | uniq -c | sort -nr | head -n2
7 10.0.0.1
2 10.0.0.7

Example: find the lines two files have in common and the lines that differ

[root@centos8 data]# cat test1.txt
a
b
1
c
[root@centos8 data]# cat test2.txt
b
e
f
c
1
2
# Lines common to both files
[root@centos8 data]# cat test1.txt test2.txt | sort | uniq -d
1
b
c
# Lines that appear in only one of the files
[root@centos8 data]# cat test1.txt test2.txt | sort | uniq -u
2
a
e
f
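
The cat | sort | uniq -d approach above is only correct when no line is repeated inside a single file. A sketch of a more defensive variant that deduplicates each file first; the file contents are variations of the example above, with "a" deliberately doubled in test1.txt.

```shell
tmpdir=$(mktemp -d)

# test1.txt now contains "a" twice; test2.txt does not contain "a" at all.
printf 'a\nb\n1\nc\na\n' > "$tmpdir/test1.txt"
printf 'b\ne\nf\nc\n1\n2\n' > "$tmpdir/test2.txt"

# Naive version: the doubled "a" is wrongly reported as common to both files.
naive=$(cat "$tmpdir/test1.txt" "$tmpdir/test2.txt" | sort | uniq -d)

# Deduplicate each file first (sort -u), then look for lines seen twice.
common=$( { sort -u "$tmpdir/test1.txt"; sort -u "$tmpdir/test2.txt"; } | sort | uniq -d )

echo "naive:  $(echo $naive)"
echo "common: $(echo $common)"
rm -rf "$tmpdir"
```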

Comparing files: diff

diff compares two files and reports the differences between them

-u outputs the "unified" diff format, which is the best fit for patch files

Example:

[root@centos8 ~]# cat f1.txt
mage
zhang
wang
xu
[root@centos8 ~]# cat f2.txt
magedu
zhang sir
wang
xu
shi
[root@centos8 ~]# diff f1.txt f2.txt
1,2c1,2
< mage
< zhang
---
> magedu
> zhang sir
4a5
> shi
[root@centos8 ~]# diff -u f1.txt f2.txt
--- f1.txt 2019-12-13 21:31:30.892775671 +0800
+++ f2.txt 2019-12-13 22:00:14.373677728 +0800
@@ -1,4 +1,5 @@
-mage
-zhang
+magedu
+zhang sir
 wang
 xu
+shi
[root@centos8 ~]# diff -u f1.txt f2.txt > f.patch
[root@centos8 ~]# rm -f f2.txt
[root@centos8 ~]# patch -b f1.txt f.patch
patching file f1.txt
[root@centos8 ~]# cat f1.txt
magedu
zhang sir
wang
xu
shi
[root@centos8 ~]# cat f1.txt.orig
mage
zhang
wang
xu
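
The diff/patch round trip above can be checked in a scratch directory before touching real files. The sketch below assumes GNU patch, whose --dry-run option reports what would happen without modifying anything; if patch is not installed the check is skipped. The file contents repeat the f1.txt/f2.txt example.

```shell
tmpdir=$(mktemp -d)
printf 'mage\nzhang\nwang\nxu\n' > "$tmpdir/f1.txt"
printf 'magedu\nzhang sir\nwang\nxu\nshi\n' > "$tmpdir/f2.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
diff -u "$tmpdir/f1.txt" "$tmpdir/f2.txt" > "$tmpdir/f.patch" || true

roundtrip=skipped
if command -v patch >/dev/null 2>&1; then
    # Rehearse first: --dry-run does not modify f1.txt.
    patch --dry-run "$tmpdir/f1.txt" "$tmpdir/f.patch" >/dev/null
    # Apply for real, keeping the original as f1.txt.orig (-b).
    patch -b "$tmpdir/f1.txt" "$tmpdir/f.patch" >/dev/null
    # cmp -s is silent and returns 0 only if the files are byte-identical.
    if cmp -s "$tmpdir/f1.txt" "$tmpdir/f2.txt"; then
        roundtrip=identical
    else
        roundtrip=different
    fi
fi

echo "round trip: $roundtrip"
rm -rf "$tmpdir"
```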

patch

patch applies to a file the changes recorded in a diff (use with caution)

-b automatically backs up the file being changed

Example:

[root@centos8 ~]# diff -u foo.conf foo2.conf > foo.patch
[root@centos8 ~]# patch -b foo.conf foo.patch

vimdiff

Equivalent to vim -d

[root@centos8 ~]# cat f1.txt
mage
zhangsir
wang
lilaoshi
zhao
[root@centos8 ~]# cat f2.txt
mage
zhang
wang
li
zhao
[root@centos8 ~]# which vimdiff
/usr/bin/vimdiff
[root@centos8 ~]# ll /usr/bin/vimdiff
lrwxrwxrwx. 1 root root 3 Nov 12 2019 /usr/bin/vimdiff -> vim
[root@centos8 ~]# vimdiff f1.txt f2.txt

cmp

Example: finding where two binary files differ

[root@centos8 data]# ll /usr/bin/dir /usr/bin/ls
-rwxr-xr-x. 1 root root 166448 May 12 2019 /usr/bin/dir
-rwxr-xr-x. 1 root root 166448 May 12 2019 /usr/bin/ls

[root@centos8 data]# ll /usr/bin/dir /usr/bin/ls -i
201839444 -rwxr-xr-x. 1 root root 166448 May 12 2019 /usr/bin/dir
201839465 -rwxr-xr-x. 1 root root 166448 May 12 2019 /usr/bin/ls

[root@centos8 data]# diff /usr/bin/dir /usr/bin/ls
Binary files /usr/bin/dir and /usr/bin/ls differ

[root@centos8 ~]# cmp /bin/dir /bin/ls
/bin/dir /bin/ls differ: byte 737, line 2
# Skip the first 735 bytes and inspect the following 30 bytes
[root@centos8 ~]# hexdump -s 735 -Cn 30 /bin/ls
000002df 00 05 6d da 3f 1b 77 91 91 63 a7 de 55 63 a2 b9 |..m.?.w..c..Uc..|
000002ef d9 d2 45 55 4c 00 00 00 00 03 00 00 00 7d |..EUL........}|
000002fd
[root@centos8 ~]# hexdump -s 735 -Cn 30 /bin/dir
000002df 00 f1 21 4e f2 19 7e ef 38 0d 9b 3e d7 54 08 39 |..!N..~.8..>.T.9|
000002ef e4 74 4d 69 25 00 00 00 00 03 00 00 00 7d |.tMi%........}|
000002fd
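
cmp numbers bytes starting from 1, whereas hexdump -s takes the number of bytes to skip, so -s 735 above starts one byte before the first difference that cmp reported at byte 737. For scripts, cmp -s (silent, exit status only) and cmp -l (list every differing byte) are often more convenient than parsing the default message. A sketch on two tiny made-up files:

```shell
tmpdir=$(mktemp -d)

# Two 5-byte files that differ only in the third byte (c vs x).
printf 'abcde' > "$tmpdir/one.bin"
printf 'abxde' > "$tmpdir/two.bin"

# cmp -s prints nothing; its exit status is 0 for identical files, 1 otherwise.
if cmp -s "$tmpdir/one.bin" "$tmpdir/two.bin"; then
    same=yes
else
    same=no
fi

# cmp -l lists each differing byte: 1-based offset, then both values in octal.
# The unquoted expansion splits that line into three positional parameters.
set -- $(cmp -l "$tmpdir/one.bin" "$tmpdir/two.bin")
offset=$1
left=$2
right=$3

echo "$same $offset $left $right"
rm -rf "$tmpdir"
```

Octal 143 and 170 are the character codes of c and x.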

posted @ 2021-04-14 20:10 空白的旋律
