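Assignment Overview
| Course for this assignment | Software Engineering |
| Assignment requirements | Individual Project |
| Goal of this assignment | Complete the individual project: design a plagiarism-detection algorithm for papers |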
GitHub Link
GitHub link
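PSP Table
| PSP2.1 | Personal Software Process Stages | Estimated time (minutes) | Actual time (minutes) |
| Planning | Planning | 20 | 20 |
| · Estimate | · Estimate how long the task will take | 340 | 360 |
| Development | Development | 300 | 320 |
| · Analysis | · Requirements analysis (including learning new technologies) | 100 | 100 |
| · Design Spec | · Produce the design document | 30 | 25 |
| · Design Review | · Design review | 20 | 15 |
| · Coding Standard | · Coding standard (define suitable conventions for the current development) | 30 | 40 |
| · Design | · Detailed design | 30 | 40 |
| · Coding | · Coding | 30 | 30 |
| · Code Review | · Code review | 20 | 15 |
| · Test | · Testing (self-testing, fixing code, committing changes) | 40 | 55 |
| Reporting | Reporting | 40 | 40 |
| · Test Report | · Test report | 15 | 15 |
| · Size Measurement | · Measure the amount of work | 10 | 10 |
| · Postmortem & Process Improvement Plan | · Postmortem and process improvement plan | 15 | 15 |
| | · Total | 360 | 380 |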
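Module Interface Design and Implementation
- TxtIOUtil class: reads a given file into a String, and can also write a String out to a specified file
- SimHashUtils class: takes a String, computes its hash value, and returns it as a string
- HammingUtils class: takes two simHash values, computes their Hamming distance, and derives the similarity from it
- main class: the program entry point; file locations are passed as command-line arguments, and it calls the classes in the Util package to output the result
- MainTest class: the unit test class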
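The SimHash and Hamming-distance steps listed above can be sketched as follows. This is a minimal illustration, not the project's actual SimHashUtils or HammingUtils: the class and method names (`SimHashSketch`, `hashToBits`, `simHash`, `hammingDistance`, `similarity`) are hypothetical, MD5 is assumed as the per-token hash, every token is given weight 1, and the text is assumed to be tokenized already (a real Chinese text would need a word segmenter with keyword weights).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; names and hashing choices are assumptions. */
class SimHashSketch {
    static final int BITS = 128; // an MD5 digest is 128 bits long

    /** Hashes one token with MD5 and returns it as a 128-character binary string. */
    static String hashToBits(String token) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(token.getBytes(StandardCharsets.UTF_8));
            String bits = new BigInteger(1, digest).toString(2);
            // Left-pad with zeros so every hash has exactly BITS characters.
            StringBuilder sb = new StringBuilder(BITS);
            for (int i = bits.length(); i < BITS; i++) {
                sb.append('0');
            }
            return sb.append(bits).toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Combines token hashes: each bit position votes +1 for '1' and -1 for '0'. */
    static String simHash(List<String> tokens) {
        int[] votes = new int[BITS];
        for (String token : tokens) {
            String bits = hashToBits(token);
            for (int i = 0; i < BITS; i++) {
                votes[i] += bits.charAt(i) == '1' ? 1 : -1;
            }
        }
        StringBuilder sb = new StringBuilder(BITS);
        for (int vote : votes) {
            sb.append(vote > 0 ? '1' : '0');
        }
        return sb.toString();
    }

    /** Counts the positions at which two equal-length bit strings differ. */
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("hashes must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    /** Maps the Hamming distance to a similarity: 1 - distance / length. */
    static double similarity(String a, String b) {
        return 1.0 - (double) hammingDistance(a, b) / a.length();
    }

    public static void main(String[] args) {
        String h1 = simHash(Arrays.asList("paper", "duplicate", "check"));
        String h2 = simHash(Arrays.asList("paper", "duplicate", "detect"));
        System.out.println(similarity(h1, h1)); // identical hashes give 1.0
        System.out.println(similarity(h1, h2));
    }
}
```

Dividing the distance by the hash length is one common way to turn a Hamming distance into a score; the project may use a different formula.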
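Performance Analysis of the Module Interfaces
Memory usage comes mainly from creating floating-point values, arrays, and collections. These allocations happen inside the methods of the Util package that the program calls, so no further improvement was needed.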
Unit Tests
@Test
public void Test1() {
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str2 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_add.txt");
    String str3 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_del.txt");
    String str4 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_1.txt");
    String str5 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_10.txt");
    String str6 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_15.txt");
    String ansFileName = "src/test/resources/test/test1.txt";

    double ans1 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str1));
    double ans2 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str2));
    double ans3 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str3));
    double ans4 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str4));
    double ans5 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str5));
    double ans6 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str6));

    TxtIOUtil.writeTxt(ans1, ansFileName);
    TxtIOUtil.writeTxt(ans2, ansFileName);
    TxtIOUtil.writeTxt(ans3, ansFileName);
    TxtIOUtil.writeTxt(ans4, ansFileName);
    TxtIOUtil.writeTxt(ans5, ansFileName);
    TxtIOUtil.writeTxt(ans6, ansFileName);
}
@Test
public void Test2() {
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String ansFileName = "src/test/resources/test/test2.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

@Test
public void Test3() {
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_add.txt");
    String ansFileName = "src/test/resources/test/test3.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

@Test
public void Test4() {
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_del.txt");
    String ansFileName = "src/test/resources/test/test4.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

@Test
public void Test5() {
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_1.txt");
    String ansFileName = "src/test/resources/test/test5.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

@Test
public void Test6() {
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_10.txt");
    String ansFileName = "src/test/resources/test/test6.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

@Test
public void Test7() {
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_15.txt");
    String ansFileName = "src/test/resources/test/test7.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}
- Test coverage:
- Test execution time:
Exception Handling
/**
 * Test for the case where a file does not exist
 * @throws Exception if either input could not be read
 */
@Test
public void Test8() throws Exception {
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_.txt");
    String ansFileName = "src/test/resources/test/test8.txt";
    // Compare string contents with isEmpty(); == only compares references.
    if (str0 == null || str0.isEmpty() || str1 == null || str1.isEmpty()) {
        throw new Exception("File does not exist");
    }
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}
/**
 * Test for the case where a file is empty
 */
@Test
public void Test9() {
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.1.txt");
    String ansFileName = "src/test/resources/test/test9.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}

/**
 * Test for the case where a file contains too little text
 */
@Test
public void Test10() {
    String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
    String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.2.txt");
    String ansFileName = "src/test/resources/test/test10.txt";
    double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
    TxtIOUtil.writeTxt(ans, ansFileName);
}
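Test8 treats an empty String returned by readTxt as a missing file. The sketch below illustrates that check under one assumption: that readTxt returns "" when the file cannot be read, which this post does not show. `ReadCheckSketch`, `readOrEmpty`, and `isMissingOrEmpty` are hypothetical names, not part of TxtIOUtil. It also shows why `isEmpty()` is a safer test than comparing with `== ""`, which compares object references rather than contents.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Illustrative sketch only; not the project's TxtIOUtil. */
class ReadCheckSketch {

    /** Hypothetical stand-in for readTxt: returns "" when the file cannot be read. */
    static String readOrEmpty(String fileName) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(fileName));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Covers a missing file as well as other read failures.
            return "";
        }
    }

    /** Content-based check that works for any empty String instance. */
    static boolean isMissingOrEmpty(String text) {
        return text == null || text.isEmpty();
    }

    public static void main(String[] args) {
        String text = readOrEmpty("no/such/file.txt");
        System.out.println(isMissingOrEmpty(text)); // true

        // An empty String created at runtime is a different object from the literal "".
        String built = new String("");
        System.out.println(built == "");     // false: references differ
        System.out.println(built.isEmpty()); // true: contents are empty
    }
}
```

With a reader like this one, an existing but empty file also produces a newly created empty String, so a `== ""` comparison would miss it while `isEmpty()` would not.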
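- Code added to the getSimHash method of SimHashUtils to handle the exceptional cases:
try {
    if (str.length() == 0) throw new Exception("File is empty");
    if (str.length() < 200) throw new ShortStringException("Text is too short to judge!");
} catch (Exception e) {
    // Log the problem and signal failure to the caller with null.
    e.printStackTrace();
    return null;
}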
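The snippet above throws and catches inside the same method and returns null, so the caller has to cope with a null hash. One alternative, sketched here under assumptions, is to validate the input and let the exception propagate to the caller. The definition of ShortStringException is not shown in this post, so the one below is hypothetical, as are `InputGuardSketch`, `validate`, and `MIN_LENGTH`. The 200-character threshold is taken from the snippet.

```java
/** Illustrative sketch only; not the project's SimHashUtils. */
class InputGuardSketch {

    /** Hypothetical definition; the project's own exception class is not shown. */
    static class ShortStringException extends Exception {
        ShortStringException(String message) {
            super(message);
        }
    }

    static final int MIN_LENGTH = 200; // threshold used in the snippet above

    /** Rejects empty or too-short text before any hashing is attempted. */
    static void validate(String str) throws ShortStringException {
        if (str == null || str.isEmpty()) {
            throw new IllegalArgumentException("File is empty");
        }
        if (str.length() < MIN_LENGTH) {
            throw new ShortStringException("Text is too short to judge");
        }
    }

    public static void main(String[] args) {
        try {
            validate("too short");
        } catch (ShortStringException e) {
            // The caller decides how to report the problem.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Using an unchecked IllegalArgumentException for the empty case is a choice made for this sketch; the original code throws a plain Exception there.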