Assignment 3

1) Basic information:

Student ID: 2017*****1035
Name: 陈慧霖
Gitee (码云) repository URL: https://gitee.com/chl035/word_frequency

2) Program analysis: a brief explanation of the four parts of the program, with each piece of code followed by its description.

Part 1: Open the file and read its contents into a buffer

def process_file(dst):     # read the file into a buffer
    try:     # open the file
        f = open(dst)
    except IOError as s:
        print(s)
        return None
    with f:     # closes the file even if reading fails
        try:     # read the whole file into the buffer
            bvffer = f.read()
        except (IOError, UnicodeDecodeError):
            print("Read File Error!")
            return None
    return bvffer
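For comparison, here is a minimal sketch of the same read step using pathlib, assuming the input file is UTF-8 encoded (open(dst) without an encoding argument uses the platform default, which differs between systems, e.g. on a Chinese-locale Windows machine). The function name read_text_file and its encoding parameter are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Optional


def read_text_file(dst: str, encoding: str = "utf-8") -> Optional[str]:
    """Read the whole file into a string; return None on failure, like process_file."""
    try:
        # read_text opens, reads and closes the file in a single call
        return Path(dst).read_text(encoding=encoding)
    except OSError as err:  # missing file, permission denied, ...
        print(err)
        return None
    except UnicodeDecodeError:  # bytes are not valid text in the chosen encoding
        print("Read File Error!")
        return None
```

Returning None on both kinds of failure keeps the same contract as process_file, so process_buffer can treat "no text" uniformly.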

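Part 2: Process the buffer bvffer: strip punctuation from the ends of each token, count how often each word occurs, and store the counts in the dictionary word_freq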

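from string import punctuation


def process_buffer(bvffer):
    if bvffer:
        word_freq = {}
        # process the buffer bvffer: count each word's frequency and store it in the dict word_freq
        for item in bvffer.strip().split():
            word = item.strip(punctuation + ' ')     # remove leading/trailing punctuation
            if not word:     # skip tokens that were only punctuation, e.g. "--"
                continue
            if word in word_freq:
                word_freq[word] += 1
            else:
                word_freq[word] = 1
        return word_freq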
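The loop above is the usual dictionary-counting pattern. As an alternative sketch (not the repository's code), Python's collections.Counter performs the same tallying; the helper name count_words is hypothetical. It skips tokens that are left empty after stripping punctuation, and, because str.strip only removes characters from the ends, interior apostrophes such as in "don't" are kept.

```python
from collections import Counter
from string import punctuation


def count_words(bvffer: str) -> Counter:
    """Count whitespace-separated words after stripping surrounding punctuation."""
    words = (item.strip(punctuation + " ") for item in bvffer.split())
    # drop tokens that were pure punctuation, e.g. "--" becomes ""
    return Counter(word for word in words if word)


print(count_words("Hello, world! Hello -- again."))
# Counter({'Hello': 2, 'world': 1, 'again': 1})
```

Like the original, this is case-sensitive: "The" and "the" are counted as different words.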

Part 3: Output function: sort the words by frequency and print the top 10 together with their counts

def output_result(word_freq):
    if word_freq:
        sorted_word_freq = sorted(word_freq.items(), key=lambda v: v[1], reverse=True)
        for item in sorted_word_freq[:10]:  # print the top 10 words
            print(item)
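sorted() stays stable even with reverse=True, so words with equal counts are printed in the order in which they were first inserted into word_freq. If a deterministic alphabetical tie-break is preferred, one possible sketch sorts on the key (-count, word); the function name top_n and its n parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def top_n(word_freq: Dict[str, int], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent (word, count) pairs; ties are broken alphabetically."""
    # negate the count so the highest count sorts first, then order words A to Z
    return sorted(word_freq.items(), key=lambda v: (-v[1], v[0]))[:n]


for item in top_n({"b": 2, "a": 2, "c": 5}, n=2):
    print(item)
# ('c', 5)
# ('a', 2)
```

Negating the count avoids reverse=True, which would also reverse the alphabetical order of tied words.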

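Part 4: The main block: read the file path dst from the command line, call the three functions in turn, and print the result to the console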

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('dst')
    args = parser.parse_args()
    dst = args.dst
    bvffer = process_file(dst)
    word_freq = process_buffer(bvffer)
    output_result(word_freq)
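Because args comes from sys.argv, the main block is normally run from a shell, for example python word_freq.py sample.txt (both file names are hypothetical). For a quick check without a shell, ArgumentParser.parse_args also accepts an explicit list of arguments. The sketch below uses that together with a temporary file to run a condensed version of the same read, count and print pipeline; the function main and its argv parameter are assumptions for illustration, and the file is assumed to be UTF-8.

```python
import argparse
import os
import tempfile
from collections import Counter
from string import punctuation


def main(argv=None):
    """Parse dst, read the file, count words, print and return the top 10."""
    parser = argparse.ArgumentParser()
    parser.add_argument("dst")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    with open(args.dst, encoding="utf-8") as f:
        bvffer = f.read()
    words = (item.strip(punctuation + " ") for item in bvffer.split())
    word_freq = Counter(w for w in words if w)
    top = sorted(word_freq.items(), key=lambda v: v[1], reverse=True)[:10]
    for item in top:
        print(item)
    return top


# demo: write a small sample file and pass its path as if typed on the command line
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w", encoding="utf-8") as fh:
        fh.write("the cat and the hat, the end.")
    result = main([sample])
```

The first line printed is ('the', 3); the remaining words each occur once and keep their first-occurrence order.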