第一次个人编程作业
我的github
计算模块接口的设计与实现过程
具体算法流程图如下
模块介绍
基本思想:余弦相似度算法 参考博客
one_hot用于onehot编码
def one_hot(word_dict, keywords):
cut_code = [word_dict[word] for word in keywords]
cut_code = [0]*len(word_dict)
for word in keywords:
cut_code[word_dict[word]] += 1
return cut_code
def extract_keyword用于提取关键词
def extract_keyword(content):
re_exp = re.compile(r'(<style>.*?</style>)|(<[^>]+>)', re.S)
content = re_exp.sub(' ', content)
content = html.unescape(content)
seg = [i for i in jieba.cut(content, cut_all=True) if i != '']
# 提取关键词
keywords = jieba.analyse.extract_tags("|".join(seg), topK=200, withWeight=False)
return keywords
计算模块接口部分的性能改进
消耗最大的部分
如图所示,main.py消耗最大
性能分析图
计算模块部分单元测试展示
测试结果:
基本都在0.8左右,上下浮动,较为符合预期。
部分测试代码:
if __name__ == '__main__':
with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x1, open('F:/qq/sim_0.8/orig_0.8_add.txt', 'r',
encoding="UTF-8") as y1:
content_x1 = x1.read()
content_y1 = y1.read()
similarity = CosineSimilarity(content_x1, content_y1)
similarity = similarity.main()
print('相似度: %.2f%%\n' % (similarity * 100))
with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x2, open('F:/qq/sim_0.8/orig_0.8_del.txt', 'r',
encoding="UTF-8") as y2:
content_x2 = x2.read()
content_y2 = y2.read()
similarity = CosineSimilarity(content_x2, content_y2)
similarity = similarity.main()
print('相似度: %.2f%%\n' % (similarity * 100))
with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x3, open('F:/qq/sim_0.8/orig_0.8_dis_1.txt', 'r',
encoding="UTF-8") as y3:
content_x3 = x3.read()
content_y3 = y3.read()
similarity = CosineSimilarity(content_x3, content_y3)
similarity = similarity.main()
print('相似度: %.2f%%\n' % (similarity * 100))
with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x4, open('F:/qq/sim_0.8/orig_0.8_dis_3.txt', 'r',
encoding="UTF-8") as y4:
content_x4 = x4.read()
content_y4 = y4.read()
similarity = CosineSimilarity(content_x4, content_y4)
similarity = similarity.main()
print('相似度: %.2f%%\n' % (similarity * 100))
with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x6, open('F:/qq/sim_0.8/orig_0.8_dis_7.txt', 'r',
encoding="UTF-8") as y6:
content_x6 = x6.read()
content_y6 = y6.read()
similarity = CosineSimilarity(content_x6, content_y6)
similarity = similarity.main()
print('相似度: %.2f%%\n' % (similarity * 100))
with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x7, open('F:/qq/sim_0.8/orig_0.8_dis_10.txt', 'r',
encoding="UTF-8") as y7:
content_x7 = x7.read()
content_y7 = y7.read()
similarity = CosineSimilarity(content_x7, content_y7)
similarity = similarity.main()
print('相似度: %.2f%%\n' % (similarity * 100))
with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x8, open('F:/qq/sim_0.8/orig_0.8_dis_15.txt', 'r',
encoding="UTF-8") as y8:
content_x8 = x8.read()
content_y8 = y8.read()
similarity = CosineSimilarity(content_x8, content_y8)
similarity = similarity.main()
print('相似度: %.2f%%\n' % (similarity * 100))
with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x9, open('F:/qq/sim_0.8/orig_0.8_mix.txt', 'r',
encoding="UTF-8") as y9:
content_x9 = x9.read()
content_y9 = y9.read()
similarity = CosineSimilarity(content_x9, content_y9)
similarity = similarity.main()
print('相似度: %.2f%%\n' % (similarity * 100))
with open('F:/qq/sim_0.8/orig.txt', 'r', encoding="UTF-8") as x0, open('F:/qq/sim_0.8/orig_0.8_rep.txt', 'r',
encoding="UTF-8") as y0:
content_x0 = x0.read()
content_y0 = y0.read()
similarity = CosineSimilarity(content_x0, content_y0)
similarity = similarity.main()
print('相似度: %.2f%%\n' % (similarity * 100))
计算模块部分异常处理说明
设计空白对比文档和完全一致的文档
空白文档的结果:
没有异常。
完全一致文档的结果:
没有异常。
时间有限,暂时没有发现模块异常。
PSP表格如下
PSP2.1 | Personal Software Process Stages | 预估耗时(分钟)| 实际耗时(分钟)
- | - | :-: |:-:
Planning|计划|30|40
Estimate|估计这个任务需要多少时间|20|20
Development|开发|480|300
Analysis|需求分析 (包括学习新技术)|300|200
Design Spec|生成设计文档|60|30
Design Review|设计复审|30|20
Coding Standard|代码规范 (为目前的开发制定合适的规范)|30|30
Design|具体设计|60|60
Coding|具体编码|300|200
Code Review|代码复审|30|30
Test|测试(自我测试,修改代码,提交修改)|120|90
Reporting|报告|60|80
Test Repor|测试报告|30|20
Size Measurement|计算工作量|30|15
Postmortem & Process Improvement Plan|事后总结, 并提出过程改进计划|40|35
Total|合计|1620|1140
小总结
第一次做这种作业,没有经验,难度有点高。只能在网上论坛上找找别人的东西,参考了很多才完成作业。自己还是有很多不足,希望以后再接再厉。