A shell script for counting word frequency
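#!/bin/bash
# Count how often each word occurs in the given file.
if [ $# -ne 1 ]; then
    echo "Usage: $0 filename"
    exit 1
fi
filename=$1
# -o prints every match on its own line; awk then tallies the lines.
egrep -o "\b[[:alpha:]]+\b" "$filename" | awk '
    { count[$0]++ }
    END {
        printf("%-14s%s\n", "Word", "Count")
        for (ind in count) {
            printf("%-14s%d\n", ind, count[ind])
        }
    }'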
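To try the pipeline quickly, here is a minimal sketch that wraps it in a function and runs it on a throwaway file. The name word_freq and the sample text are made up for illustration, and GNU grep is assumed. grep -E is used as the equivalent of egrep. Because awk does not guarantee any order for "for (w in count)", the sketch pipes the result through sort to get stable output.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): same pipeline,
# without the header line, sorted so the output order is stable.
word_freq() {
    grep -E -o "\b[[:alpha:]]+\b" "$1" |
        awk '{ count[$0]++ } END { for (w in count) printf("%-14s%d\n", w, count[w]) }' |
        sort
}

# Made-up sample input in a temporary file.
sample=$(mktemp)
printf 'the cat and the dog\nthe end\n' > "$sample"
word_freq "$sample"
rm -f "$sample"
```

On this sample, "the" is counted three times and every other word once.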
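A few points to note:
egrep vs. grep: egrep is equivalent to grep -E and uses extended regular expressions, so operators such as +, ? and | work without a backslash.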
\b
The symbol \b matches the empty string at the edge of a word, that is, a word boundary at either the start or the end of a word.
\< \>
The symbols \< and \> respectively match the empty string at the beginning and at the end of a word.
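The difference between the three anchors is easiest to see on one line of input. This is a small sketch assuming GNU grep (\b, \< and \> are extensions there); the sample string is made up.

```shell
#!/bin/bash
text="cat concat cats"

# \bcat\b needs a boundary on both sides: only the standalone word matches.
echo "$text" | grep -o '\bcat\b'

# \<cat needs a word start before "cat": matches in "cat" and "cats".
echo "$text" | grep -o '\<cat'

# cat\> needs a word end after "cat": matches in "cat" and "concat".
echo "$text" | grep -o 'cat\>'
```

The first command prints one match; the other two print two matches each.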
%-14s: the - means left-aligned, and 14 is the minimum field width, so a shorter string is padded with spaces to 14 characters.
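The same format specifier works in the shell's own printf, which makes it easy to check. A minimal sketch follows; the square brackets are only there to make the padding visible.

```shell
#!/bin/bash
# Left-aligned: "Word" followed by 10 spaces of padding.
printf '[%-14s]\n' "Word"

# Without the "-": right-aligned, padding comes first.
printf '[%14s]\n' "Word"

# The padded field is always 14 characters wide for short strings.
padded=$(printf '%-14s|' "Word")
echo "${#padded}"   # 14 characters plus the trailing "|"
```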
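[:alpha:] is a POSIX character class that matches any alphabetic character; in the C locale it is equivalent to a-zA-Z. It has to be written inside a bracket expression, as in [[:alpha:]]. For details see: http://www.cnblogs.com/zhuyp1015/archive/2012/07/01/2572289.html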
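A short sketch of what the class picks out of mixed input. The sample string is made up, and grep -E is used so that + needs no backslash.

```shell
#!/bin/bash
# Each run of letters is printed on its own line; digits and spaces
# split the runs, so "d4e" yields "d" and "e" separately.
echo "abc 123 d4e" | grep -E -o '[[:alpha:]]+'
```

This prints abc, d and e on three separate lines, which is why the script's pattern never counts digits as part of a word.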
posted on 2015-09-16 10:45 ggbond1988