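[Linux] Randomly splitting the contents of a file
1. Randomly selecting a number of lines from the original file
The shuf command does this directly: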
$ shuf -n 100 source.txt > target.txt
Help text for the shuf command:
$ shuf --help
Usage: shuf [OPTION]... [FILE]
  or:  shuf -e [OPTION]... [ARG]...
  or:  shuf -i LO-HI [OPTION]...
Write a random permutation of the input lines to standard output.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -e, --echo                treat each ARG as an input line
  -i, --input-range=LO-HI   treat each number LO through HI as an input line
  -n, --head-count=COUNT    output at most COUNT lines
  -o, --output=FILE         write result to FILE instead of standard output
      --random-source=FILE  get random bytes from FILE
  -r, --repeat              output lines can be repeated
  -z, --zero-terminated     line delimiter is NUL, not newline
      --help                display this help and exit
      --version             output version information and exit
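To make a few of the options above concrete, here is a small sketch. The file names source.txt and target.txt follow the earlier example; the 1000-line sample file generated with seq and the fruit names are assumptions used only for illustration.

```shell
# Generate a sample input file with 1000 lines (illustrative data only).
seq 1 1000 > source.txt

# -i: pick 5 numbers from the range 1..100, no input file needed.
shuf -i 1-100 -n 5

# -e: treat each argument as an input line and shuffle the arguments.
shuf -e apple banana cherry

# -o: write the result to a file instead of using shell redirection.
# Without -r, the 100 sampled lines are drawn without repetition.
shuf -n 100 -o target.txt source.txt
wc -l < target.txt
```

Since -r is not given, each line of source.txt appears at most once in target.txt.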
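2. Randomly splitting a file into several parts
My approach here is to shuffle the entire file first and then split it sequentially.
(1) Shuffle the whole file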
$ shuf source.txt > source_shuffle.txt
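(2) Split sequentially
There are many ways to split: split, head/tail, awk, and sed all work, so pick whichever fits your needs.
(See also: "[Linux] Printing specified lines of a file" and "Splitting and merging large files on Linux")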
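As one possible illustration of the split route, the sketch below cuts a shuffled file into equal chunks of 100 lines each. The 300-line sample file and the part_ output prefix are assumptions made for this example, and -d (numeric suffixes) is a GNU split option.

```shell
# Sample data: 300 lines, shuffled (illustrative only).
seq 1 300 > source.txt
shuf source.txt > source_shuffle.txt

# Remove leftovers from earlier runs so the listing below is accurate.
rm -f part_*

# Cut the shuffled file into pieces of 100 lines each.
# -l 100: lines per output file; -d: numeric suffixes (part_00, part_01, ...).
split -l 100 -d source_shuffle.txt part_

# Show how many lines ended up in each piece.
wc -l part_*
```

If the total line count is not a multiple of 100, the last piece simply holds the remainder.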
For example, here the first 100 lines of the shuffled file and the remaining lines form the two parts of the random split:
$ head -n100 source_shuffle.txt > target1.txt
$ tail -n+101 source_shuffle.txt > target2.txt
# or
$ awk 'NR>=101' source_shuffle.txt > target2.txt
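The same head/tail idea can be driven by a ratio instead of a fixed line count. The sketch below assumes an 80/20 split and a 1000-line sample file; both numbers, and the helper files expected.sorted and actual.sorted, are hypothetical choices for illustration. It also checks that the two parts together contain exactly the original lines.

```shell
# Sample data (illustrative only).
seq 1 1000 > source.txt
shuf source.txt > source_shuffle.txt

# Compute how many lines go into the first part (80% here, an assumed ratio).
total=$(wc -l < source_shuffle.txt)
n_first=$(( total * 80 / 100 ))

# First n_first lines go to target1.txt, everything after goes to target2.txt.
head -n "$n_first" source_shuffle.txt > target1.txt
tail -n +"$(( n_first + 1 ))" source_shuffle.txt > target2.txt

# Sanity check: the two parts, merged and sorted, should equal the sorted original.
sort source.txt > expected.sorted
cat target1.txt target2.txt | sort > actual.sorted
cmp -s expected.sorted actual.sorted && echo "OK: parts cover the original exactly"
```

Integer arithmetic rounds the first part down, so any leftover line goes to target2.txt.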
If you know a more efficient or convenient method, suggestions are welcome~