Using awk
-F sets the field separator (here ' "p_words":'); print $2 prints the second field, i.e. the text to the right of the separator.
awk -F ' "p_words":' '{print $2}'
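A minimal sketch of how the multi-character separator splits a line. The sample record and the function name after_p_words are made up for illustration; the real file is only assumed to look similar.

```shell
# Hypothetical one-line record (not taken from the project data).
sample='{"id": 1, "p_words": "he said no .", "p_q_relation": 0}'

# Print the text to the right of the separator ' "p_words":'.
# A separator longer than one character is treated by awk as a regular expression.
after_p_words() {
    awk -F ' "p_words":' '{print $2}'
}

printf '%s\n' "$sample" | after_p_words
```

Note that the second field keeps the space that follows the separator, and everything up to the end of the line, since the separator occurs only once.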
Count the words in each passage of the file:
cat data/train_middle-processed.json | awk -F ' "p_words":' '{print $2}'| awk -F ', "p_q_relation":' '{print $1}' | awk '{print NF}'
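The three chained awk calls above could also be folded into a single one. This is only a sketch under the assumption that each record sits on one line and that "p_q_relation" directly follows "p_words"; the function name passage_word_count and the sample record are hypothetical.

```shell
# Count whitespace-separated tokens between "p_words": and "p_q_relation":.
passage_word_count() {
    awk -F ' "p_words":' '{
        # $2 is the text after the first separator; cut it at the next key.
        split($2, parts, ", \"p_q_relation\":")
        # Splitting on " " uses default whitespace splitting and returns the token count.
        print split(parts[1], words, " ")
    }'
}

printf '%s\n' '{"id": 1, "p_words": "he said no .", "p_q_relation": 0}' | passage_word_count
```

As in the original pipeline, a line without "p_words" yields 0, and punctuation or quotes attached to a word are counted as part of that word.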
Count the words in a line:
echo 'he said no SDG JCD DDDV .' | awk '{print NF}'
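NF is the number of fields on the current line, so the same idea extends to several lines. A small sketch (the function name count_words and the input text are assumptions) that prints the count per line and a running total:

```shell
# Print the word count of every input line, then the total over all lines.
count_words() {
    awk '{ print NF; total += NF } END { print "total", total + 0 }'
}

printf 'he said no SDG JCD DDDV .\n\nyes\n' | count_words
```

An empty line has NF equal to 0, so it contributes nothing to the total.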
Word frequency:
Given two sentences:
the day is sunny the the the sunny is is
Desired output:
the 4
is 3
sunny 2
day 1
Command:
awk -F" " '{for(i=1;i<=NF;i++){array[$i]+=1;}} END{for(s in array){print s" "array[s];}}' words.txt|sort -nr -k 2
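The command above leaves the order of words with equal counts up to sort. A possible variant, sketched here with a hypothetical function word_freq and a temporary file standing in for words.txt (the position of the line break in the sample is an assumption), sorts by count descending and then by word:

```shell
# $1: path to a text file; prints "word count", most frequent first.
word_freq() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w, count[w] }' "$1" |
    sort -k2,2nr -k1,1
}

tmp=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmp"
word_freq "$tmp"
rm -f "$tmp"
```

Restricting each key with -k2,2 and -k1,1 keeps the numeric comparison on the count column only and uses the word itself as the tie-breaker.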
Compute the average:
Input file:
1 50
2 30
3 20
4 50
Command:
awk -F' ' '{sum+=$2;count+=1} END{print "SUM:"sum"\nAVG:"sum/count}' inputfile
Output:
SUM:150
AVG:37.5
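The one-liner divides by zero when the input is empty, and it counts blank lines as rows. A hedged sketch that guards against both; the function name column_avg and the "no data" message are assumptions, not part of the original notes:

```shell
# $1: path to a file with a numeric second column; prints the sum and average.
column_avg() {
    awk 'NF >= 2 { sum += $2; count++ }   # skip lines without a second field
         END {
             if (count == 0) { print "no data"; exit }
             print "SUM:" sum
             print "AVG:" sum / count
         }' "$1"
}

tmp=$(mktemp)
printf '1 50\n2 30\n3 20\n4 50\n' > "$tmp"
column_avg "$tmp"
rm -f "$tmp"
```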
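Usage in the project:
1. Frequency of each value in length.txt:
cat length.txt | awk -F" " '{for(i=1;i<=NF;i++){array[$i]+=1;}} END{for(s in array){print s" "array[s];}}' |sort -nr -k 2
2. Frequency of each passage word count in the training file: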
cat data/train_middle-processed.json | awk -F ' "p_words":' '{print $2}'| awk -F ', "p_q_relation":' '{print $1}' |awk '{print NF}' | awk -F" " '{for(i=1;i<=NF;i++){array[$i]+=1;}} END{for(s in array){print s" "array[s];}}'|sort -nr -k 2
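In the last stage every input line holds a single number, so the loop over fields is not strictly needed. A simplified sketch of that final counting step, assuming one word count per line (the function name length_histogram and the sample numbers are hypothetical):

```shell
# Read one number per line; print "value occurrences", most frequent first.
length_histogram() {
    awk 'NF { count[$1]++ } END { for (k in count) print k, count[k] }' |
    sort -k2,2nr -k1,1n
}

printf '3\n5\n3\n7\n3\n5\n' | length_histogram
```

It could replace the final awk and sort in the pipeline above, with ties ordered by the value itself.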