Homework 3: Individual Project (Word Frequency Counting and Performance Analysis)

1. Personal information

Student ID: 2017*****7189
Name: 李博文
Gitee repository: https://gitee.com/libowena9/word_frequency/tree/ES7189/

2. Program analysis

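Read the file into a buffer
def process_file(dst):  # read the file into a buffer
    try:  # open the file
        txt = open(dst)
    except IOError as s:
        print(s)
        return None
    try:  # read the whole file into the buffer
        bvffer = txt.read()
    except Exception:  # reading failed
        print("Read File Error!")
        return None
    finally:
        txt.close()  # close the file whether or not the read succeeded
    return bvffer  # return the buffer contents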
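As an alternative sketch of the same step, the open, read, and close sequence can be written with a with-statement. The name read_text and the utf-8 default are assumptions made for this illustration; the original function relies on the platform's default encoding.

```python
def read_text(path, encoding="utf-8"):
    """Return the whole file as a string, or None if it cannot be read."""
    try:
        # The with-block closes the file even if read() raises.
        with open(path, encoding=encoding) as handle:
            return handle.read()
    except (OSError, UnicodeDecodeError) as err:
        # Same contract as process_file: report the problem and return None.
        print(err)
        return None
```

Returning None on failure keeps the contract that process_buffer expects, so this version could stand in for process_file without touching the rest of the program.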
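Split the buffer into words and count how often each word occurs
from string import punctuation

def process_buffer(bvffer):
    if bvffer:
        word_freq = {}
        # Process the buffer bvffer: count the frequency of each word
        # and store the counts in the dictionary word_freq
        for i in bvffer.split():
            # punctuation is a string constant; strip() removes those
            # characters (and spaces) from both ends of the token
            word_ = i.strip(punctuation + " ")
            # count the word
            if word_ in word_freq:
                word_freq[word_] += 1
            else:
                word_freq[word_] = 1
        # return the dictionary that holds the counts
        return word_freq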
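The counting loop above can also be sketched with collections.Counter from the standard library. The name count_words is hypothetical, and skipping tokens that consist only of punctuation is an assumption of this sketch; the original function counts such tokens as an empty string.

```python
from collections import Counter
from string import punctuation


def count_words(text):
    """Count whitespace-separated tokens after stripping ASCII punctuation."""
    counts = Counter()
    for token in text.split():
        # split() has already removed whitespace, so only punctuation is stripped.
        word = token.strip(punctuation)
        if word:  # assumption: ignore tokens that were punctuation only, such as "--"
            counts[word] += 1
    return counts


print(count_words("the cat -- the hat."))
```

Like the original, this sketch is case-sensitive, so "The" and "the" are counted as different words.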
Sort the counts and print the top 10 words
def output_result(word_freq):
    if word_freq:
        sorted_word_freq = sorted(word_freq.items(), key=lambda v: v[1], reverse=True)
        for item in sorted_word_freq[:10]:  # print the top 10 words
            print(item)
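Since only ten entries are printed, sorting the whole dictionary is not strictly necessary. A possible variant, sketched here with heapq.nlargest from the standard library, selects the top entries directly. The name top_words is hypothetical and is not part of the original program.

```python
import heapq


def top_words(word_freq, n=10):
    """Return the n (word, count) pairs with the highest counts."""
    # nlargest keeps only n candidates at a time instead of sorting every item.
    return heapq.nlargest(n, word_freq.items(), key=lambda item: item[1])


for item in top_words({"the": 5, "cat": 2, "hat": 3}, n=2):
    print(item)
```

For a vocabulary of a few thousand words the difference is likely small, so this is a matter of taste rather than a required change.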
The main block imports the argparse package and then calls process_file(), process_buffer(), and output_result() in turn
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('dst')  # path of the text file to analyze
    args = parser.parse_args()
    dst = args.dst
    bvffer = process_file(dst)
    word_freq = process_buffer(bvffer)
    output_result(word_freq)

3. Performance analysis results and improvements

The line executed most often is:
    word_ = i.strip(punctuation + " ")
The line that takes the longest to execute is:
    word_freq = process_buffer(bvffer)
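The post does not say which profiler produced these two findings. As a minimal sketch, assuming the standard-library cProfile module, measurements of this kind could be collected as follows. The helper profile_counting and the sample text are made up for illustration, and process_buffer is repeated in shortened form so the block runs on its own.

```python
import cProfile
import io
import pstats
from string import punctuation


def process_buffer(bvffer):
    # Shortened copy of the counting function from section 2.
    word_freq = {}
    for i in bvffer.split():
        word_ = i.strip(punctuation + " ")
        word_freq[word_] = word_freq.get(word_, 0) + 1
    return word_freq


def profile_counting(text):
    """Run process_buffer under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = process_buffer(text)
    profiler.disable()
    stream = io.StringIO()
    # Sort by cumulative time and keep the five most expensive entries.
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


word_freq, report = profile_counting("to be or not to be " * 1000)
print(report)
```

cProfile reports per function rather than per line. In its output the call count of the str.strip method equals the number of times the strip line ran, and the cumulative time of process_buffer shows how much of the run is spent in the counting step.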

4. Run command, screenshots of the results, and the run command and results after the improvement

Run command and screenshot of the result:

5. Summary and reflection on this task

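This task let me revisit file handling in Python, and I also learned how to profile a Python program and how to use git branch commands. Along the way I found that my understanding of git is still lacking, so I will keep studying how to use it.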

posted @ 2019-04-03 11:31 o3o
