Using awk

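-F is followed by the field separator, here ' "p_words":' (including the leading space). print $2 prints the second field, i.e. the text to the right of the separator.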

awk -F ' "p_words":' '{print $2}'
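Here is a minimal sketch of what the command above does on a single record. The record and the helper name after_pwords are hypothetical, and the real lines of the data file may be laid out differently. Because the separator is longer than one character, awk treats it as a regular expression. This one contains no regex metacharacters, so it matches literally.

```shell
# Hypothetical helper: print whatever follows ' "p_words":' on each line.
after_pwords() {
  awk -F ' "p_words":' '{print $2}' "$@"
}

# Hypothetical sample record.
# $1 is '{"id": 7,' and $2 is ' "the cat sat", "p_q_relation": [0, 1]}'
printf '%s\n' '{"id": 7, "p_words": "the cat sat", "p_q_relation": [0, 1]}' | after_pwords

# A line without the separator has only one field, so $2 is empty.
printf '%s\n' '{"id": 8}' | after_pwords
```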

Count the words in each passage of the file (the text between "p_words": and , "p_q_relation":):

cat data/train_middle-processed.json | awk -F ' "p_words":' '{print $2}'|  awk -F ', "p_q_relation":' '{print $1}' | awk '{print NF}'
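The same extraction can also be done in a single awk process with index() and substr(), which avoids cat and the three chained awk calls. This is a sketch under some assumptions: each JSON record sits on its own line, "p_words" comes before "p_q_relation", and passage_word_counts is a hypothetical name. split() without a separator argument uses the default FS, so it splits on whitespace exactly as NF does.

```shell
# Hypothetical helper: print the word count of the passage on each line.
passage_word_counts() {
  awk -v start=' "p_words":' -v stop=', "p_q_relation":' '
    {
      i = index($0, start)              # position of the separator, 0 if absent
      rest = (i > 0) ? substr($0, i + length(start)) : ""
      j = index(rest, stop)             # cut the text off at the next key
      if (j > 0) rest = substr(rest, 1, j - 1)
      print split(rest, words)          # whitespace split, like NF
    }' "$@"
}

printf '%s\n' \
  '{"id": 1, "p_words": "the cat sat .", "p_q_relation": [0]}' \
  '{"id": 2, "p_words": "hello world", "p_q_relation": []}' |
  passage_word_counts
```

Like the original pipeline, a line without "p_words" prints 0.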

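Counting words (NF is the number of fields in the current line, which are separated by whitespace by default):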

echo 'he said no SDG JCD DDDV .' | awk '{print NF}'
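NF is counted per line, so multi-line input gives one number per line. To get a total like wc -w, you can accumulate NF and print it in the END block. Below is a small sketch in which total_words is a hypothetical helper name. The + 0 makes empty input print 0 instead of an empty line.

```shell
# One count per line: prints 3, then 4.
printf 'he said no\nSDG JCD DDDV .\n' | awk '{print NF}'

# Hypothetical helper: total number of words over all lines.
total_words() {
  awk '{total += NF} END {print total + 0}' "$@"
}

printf 'he said no\nSDG JCD DDDV .\n' | total_words   # 7
```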

Counting word frequencies:

Given two sentences (in words.txt):

the day is sunny the the
the sunny is is

We want:

the 4
is 3
sunny 2
day 1

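Command (array[$i] counts each word, the END block prints every word with its count, and sort -nr -k 2 sorts the lines numerically by the second column, largest first):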

awk -F" " '{for(i=1;i<=NF;i++){array[$i]+=1;}} END{for(s in array){print s" "array[s];}}' words.txt|sort -nr -k 2
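The variant below is a slightly more idiomatic take on the same idea. It uses count[$i]++ instead of array[$i]+=1, and an explicit sort key that breaks ties in the count alphabetically. for (w in count) visits keys in an unspecified order, so without a tie-breaker, words with equal counts may come out in any order. word_freq is a hypothetical helper name.

```shell
# Hypothetical helper: word frequencies, most frequent first.
word_freq() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w, count[w] }' "$@" |
    sort -k2,2nr -k1,1    # by count (numeric, descending), then by word
}

printf 'the day is sunny the the\nthe sunny is is\n' | word_freq
```

To count "The" and "the" as the same word, use count[tolower($i)]++ instead.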

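Computing an average:

Input file:

Command (sums the second column, counts the lines, and prints the sum and the average):

$ awk -F' ' '{sum+=$2;count+=1} END{print "SUM:"sum"\nAVG:"sum/count}' inputfile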
SUM:150
AVG:37.5
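Here is a sketch of a more defensive version. It averages a column chosen with -v and exits with status 1 instead of dividing by zero when the input is empty. col_avg and the sample numbers are hypothetical and do not come from the original inputfile.

```shell
# Hypothetical helper: col_avg COLUMN [FILE...] prints the sum and average of COLUMN.
col_avg() {
  col=$1
  shift
  awk -v c="$col" '
    { sum += $c; n++ }
    END {
      if (n == 0) exit 1        # empty input: no average to report
      print "SUM:" sum
      print "AVG:" sum / n
    }' "$@"
}

printf 'a 10\nb 20\nc 30\n' | col_avg 2
```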

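Usage in the project (counting how often each value occurs, e.g. how many passages have each word count):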

cat length.txt |  awk -F" " '{for(i=1;i<=NF;i++){array[$i]+=1;}} END{for(s in array){print s" "array[s];}}' |sort -nr -k 2

cat data/train_middle-processed.json | awk -F ' "p_words":' '{print $2}'|  awk -F ', "p_q_relation":' '{print $1}'  |awk '{print NF}' | awk -F" " '{for(i=1;i<=NF;i++){array[$i]+=1;}} END{for(s in array){print s" "array[s];}}'|sort -nr -k 2
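The chain of awk calls above can also be collapsed into one awk process. That process extracts each passage, counts its words, and tallies the counts, and the result then goes through the same kind of sort. This sketch assumes one JSON record per line with "p_words" before "p_q_relation", and length_histogram is a hypothetical name. Each output line is a passage length followed by the number of passages with that length.

```shell
# Hypothetical helper: "<word count> <number of passages>", most common first.
length_histogram() {
  awk -v start=' "p_words":' -v stop=', "p_q_relation":' '
    {
      i = index($0, start)
      rest = (i > 0) ? substr($0, i + length(start)) : ""
      j = index(rest, stop)
      if (j > 0) rest = substr(rest, 1, j - 1)
      hist[split(rest, words)]++        # tally this passage by its word count
    }
    END { for (len in hist) print len, hist[len] }' "$@" |
    sort -k2,2nr -k1,1n
}

printf '%s\n' \
  '{"id": 1, "p_words": "a b c", "p_q_relation": []}' \
  '{"id": 2, "p_words": "d e f", "p_q_relation": []}' \
  '{"id": 3, "p_words": "g h", "p_q_relation": []}' |
  length_histogram
```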

posted @ 2018-08-08 13:45 hozhangel