First Individual Programming Assignment
I. Flowchart
II. Module Design
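1. Module breakdown:
• SimHash module: computes the SimHash value of a text.
• Hamming distance module: computes the Hamming distance between two SimHash values.
• File I/O module: reads the input files and writes the comparison results.
2. Interface design:
SimHash interface: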
public interface SimHashCalculator {
    String calculateSimHash(String content);
}
Hamming distance interface:
public interface HammingDistanceCalculator {
    int calculateHammingDistance(String simHash1, String simHash2);
}
File I/O interface:
import java.util.List;

public interface FileHandler {
    List<String> readFile(String filePath);
    void writeFile(String filePath, String content);
}
public class SimHashCalculatorImpl implements SimHashCalculator {
    @Override
    public String calculateSimHash(String content) {
        // SimHash algorithm implementation (assumes the SimHash code written earlier, which produces simHash)
        return simHash;
    }
}
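The skeleton above omits the SimHash algorithm itself. The sketch below shows SimHash as it is commonly described. Each feature is hashed, its weight is added to every bit position where the hash bit is 1 and subtracted where it is 0, and the sign of each sum becomes one fingerprint bit. The choices are assumptions, not the author's: features are overlapping character bigrams with weight 1 (no word segmentation or TF-IDF weighting), and each feature is hashed with 64-bit FNV-1a. `SimHashSketch`, `fnv1a64` and `simHash` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the SimHash steps; not the author's implementation.
class SimHashSketch {
    private static final int BITS = 64;

    // 64-bit FNV-1a hash of one feature string (assumed feature hash)
    static long fnv1a64(String feature) {
        long h = 0xcbf29ce484222325L;
        for (byte b : feature.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Returns the fingerprint as a 64-character binary string, most significant bit first
    static String simHash(String content) {
        int[] weights = new int[BITS];
        // Assumption: features are overlapping character bigrams, each with weight 1
        for (int i = 0; i + 2 <= content.length(); i++) {
            long h = fnv1a64(content.substring(i, i + 2));
            for (int bit = 0; bit < BITS; bit++) {
                // Add the weight where the hash bit is 1, subtract it where the bit is 0
                weights[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        StringBuilder sb = new StringBuilder(BITS);
        for (int bit = BITS - 1; bit >= 0; bit--) {
            // Collapse each summed weight: positive -> 1, otherwise -> 0
            sb.append(weights[bit] > 0 ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(simHash("the quick brown fox jumps over the lazy dog"));
    }
}
```

Returning a binary string matches the `String calculateSimHash(String)` signature in the interface. Texts with mostly shared bigrams push most bit sums in the same direction, so their fingerprints differ in only a few bits.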
public class HammingDistanceCalculatorImpl implements HammingDistanceCalculator {
    @Override
    public int calculateHammingDistance(String simHash1, String simHash2) {
        // Hamming distance computation (assumes the implementation written earlier, which produces distance)
        return distance;
    }
}
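The distance computation is likewise omitted above. When both fingerprints are binary strings of equal length, the Hamming distance is simply the number of positions at which the characters differ. The following is a minimal sketch under that assumption. `HammingSketch` is a hypothetical name, and rejecting unequal lengths is a design choice made here, not something stated in the original.

```java
// Hypothetical helper; assumes both fingerprints are equal-length binary strings such as "1011...".
class HammingSketch {
    static int hammingDistance(String simHash1, String simHash2) {
        if (simHash1.length() != simHash2.length()) {
            throw new IllegalArgumentException("SimHash values must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < simHash1.length(); i++) {
            // Count every position where the two fingerprints differ
            if (simHash1.charAt(i) != simHash2.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        System.out.println(hammingDistance("10110", "10011")); // prints 2
    }
}
```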
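import java.util.List;

public class FileHandlerImpl implements FileHandler {
    @Override
    public List<String> readFile(String filePath) {
        // Read the file and return its contents as a list of lines
    }

    @Override
    public void writeFile(String filePath, String content) {
        // Append the content to the file: Main calls this once per result with the same path,
        // so overwriting would keep only the last result
    }
}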
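`Main` calls `writeFile` for every suspect file with the same `result.txt` path, so `writeFile` has to append rather than overwrite. One way to do that is shown in the sketch below, which uses `java.nio.file.Files.write` with `StandardOpenOption.CREATE` and `StandardOpenOption.APPEND`. It assumes UTF-8 output and one result per line. `AppendWriterSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper showing one way to implement writeFile so that results accumulate.
class AppendWriterSketch {
    static void writeFile(String filePath, String content) {
        try {
            // CREATE makes the file if it is missing; APPEND keeps earlier results instead of truncating
            Files.write(Paths.get(filePath),
                    (content + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            System.out.println("Failed to write file: " + filePath);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("result", ".txt");
        writeFile(tmp.toString(), "first");
        writeFile(tmp.toString(), "second");
        System.out.println(Files.readAllLines(tmp, StandardCharsets.UTF_8)); // prints [first, second]
    }
}
```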
import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        SimHashCalculator simHashCalculator = new SimHashCalculatorImpl();
        HammingDistanceCalculator hammingDistanceCalculator = new HammingDistanceCalculatorImpl();
        FileHandler fileHandler = new FileHandlerImpl();
        // Read the original file and list the suspected plagiarized files
        List<String> origContent = fileHandler.readFile("E:\\java\\test\\MyProject\\src\\data\\orig.txt");
        List<String> suspectFiles = Arrays.asList("orig_0.8_add.txt", "orig_0.8_del.txt", "orig_0.8_dis_1.txt", "orig_0.8_dis_10.txt", "orig_0.8_dis_15.txt");
        // Join the lines with String.join: List.toString() would add "[", "]" and ", " to the hashed text
        String origSimHash = simHashCalculator.calculateSimHash(String.join("\n", origContent));
        for (String suspectFile : suspectFiles) {
            List<String> suspectContent = fileHandler.readFile("E:\\java\\test\\MyProject\\src\\data\\" + suspectFile);
            String suspectSimHash = simHashCalculator.calculateSimHash(String.join("\n", suspectContent));
            int hammingDistance = hammingDistanceCalculator.calculateHammingDistance(origSimHash, suspectSimHash);
            // Output the result; a smaller Hamming distance means the two texts are more similar
            fileHandler.writeFile("E:\\java\\test\\MyProject\\src\\data\\result.txt", suspectFile + " Hamming distance: " + hammingDistance);
        }
    }
}
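The value `Main` computes is a Hamming distance (smaller means more similar), not a similarity ratio. If a ratio is wanted in the result file, one simple convention is the fraction of fingerprint bits that agree, `1 - d / 64` for 64-bit fingerprints. This convention is an assumption made here and does not come from the original. `SimilaritySketch` is a hypothetical name.

```java
// Hypothetical helper: converts a Hamming distance between two fingerprints into a similarity in [0, 1].
class SimilaritySketch {
    static double similarity(int hammingDistance, int bits) {
        if (bits <= 0 || hammingDistance < 0 || hammingDistance > bits) {
            throw new IllegalArgumentException("distance must be between 0 and bits");
        }
        // Assumed convention: the fraction of fingerprint bits that agree
        return 1.0 - (double) hammingDistance / bits;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%n", similarity(16, 64)); // prints 0.75
    }
}
```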
III. Unit Testing
1. Test code (checking the SimHash value):
```java
import static org.junit.Assert.assertEquals;

import java.math.BigInteger;
import org.junit.Test;

public class SimHashCalculatorTest {
    @Test
    public void testCalculateSimHash() {
        // Compute the SimHash of a sample text ("测试文本" means "test text") with the SimHash class
        SimHashBatchCompare.SimHash simHash = new SimHashBatchCompare.SimHash("测试文本");
        // Expected value: the SimHash printed in an earlier run
        BigInteger expectedSimHash = new BigInteger("707195883685897721"); // replace with the value actually printed
        // Print the SimHash actually computed
        System.out.println("Actual SimHash: " + simHash.simHash().toString());
        // Print the expected SimHash
        System.out.println("Expected SimHash: " + expectedSimHash.toString());
        // Assert that the computed SimHash equals the expected value
        assertEquals(expectedSimHash, simHash.simHash());
    }
}
```
2. Test results:
IV. Test Coverage
The JaCoCo plugin was used to generate a code coverage report and check whether the code is adequately tested:
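V. Performance Improvements
Java profilers such as JProfiler and VisualVM can be used to locate performance bottlenecks. Possible optimizations include:
• File reading: wrap the reader in a BufferedReader so the file is read in large blocks, reducing the number of disk I/O calls.
• SimHash computation: make the bit operations cheaper, for example by keeping fingerprints as 64-bit long values and using shifts and XOR instead of string manipulation.
• Parallel processing: when several files need to be compared, use multiple threads or parallel streams to speed up processing.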
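The bit-operation point can be made concrete. If a fingerprint is kept as a 64-bit `long` instead of a string of "0"/"1" characters, the Hamming distance becomes one XOR followed by `Long.bitCount`, which counts the set bits. `Long.parseUnsignedLong(s, 2)` converts an existing binary string, including one whose leading bit is 1. This is a sketch assuming 64-bit fingerprints; `BitOpsSketch` and its methods are hypothetical names.

```java
// Hypothetical helper: stores 64-bit fingerprints as long values instead of "0"/"1" strings.
class BitOpsSketch {
    // Parse a binary fingerprint of up to 64 characters; parseUnsignedLong accepts a leading 1 bit
    static long toLong(String binarySimHash) {
        return Long.parseUnsignedLong(binarySimHash, 2);
    }

    // XOR leaves a 1 exactly where the fingerprints differ; bitCount counts those 1s
    static int hammingDistance(long simHash1, long simHash2) {
        return Long.bitCount(simHash1 ^ simHash2);
    }

    public static void main(String[] args) {
        long a = toLong("1011");
        long b = toLong("0010");
        System.out.println(hammingDistance(a, b)); // prints 2
    }
}
```

Compared with a character-by-character loop over two 64-character strings, this avoids string allocation and per-character comparisons. On HotSpot, `Long.bitCount` is commonly compiled to a hardware population-count instruction where the CPU supports one.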
- Optimized file reading:
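// Method of FileHandlerImpl; needs java.io.BufferedReader, java.io.FileInputStream, java.io.IOException,
// java.io.InputStreamReader, java.nio.charset.StandardCharsets, java.util.ArrayList, java.util.List
public List<String> readFile(String filePath) {
    List<String> content = new ArrayList<>();
    // BufferedReader reads the file in large blocks and returns it line by line,
    // instead of going to disk for every character; UTF-8 input files are assumed
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), StandardCharsets.UTF_8))) {
        String line;
        while ((line = br.readLine()) != null) {
            content.add(line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return content;
}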
- Computing Hamming distances in parallel:
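// Needs java.util.stream.Collectors. Files are read and hashed in parallel, but result.txt is
// written from a single thread afterwards so concurrent writes cannot interleave
List<String> results = suspectFiles.parallelStream()
        .map(suspectFile -> {
            List<String> suspectContent = fileHandler.readFile("E:\\java\\test\\MyProject\\src\\data\\" + suspectFile);
            String suspectSimHash = simHashCalculator.calculateSimHash(String.join("\n", suspectContent));
            int hammingDistance = hammingDistanceCalculator.calculateHammingDistance(origSimHash, suspectSimHash);
            return suspectFile + " Hamming distance: " + hammingDistance;
        })
        .collect(Collectors.toList()); // keeps the original order of suspectFiles
results.forEach(line -> fileHandler.writeFile("E:\\java\\test\\MyProject\\src\\data\\result.txt", line));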
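VI. Exception Handling
Exception handling is added at key points so that the program does not crash when an exception occurs and prints a clear, user-friendly message instead.
- File read errors: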
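// Method of FileHandlerImpl; needs java.io.BufferedReader, java.io.FileInputStream, java.io.FileNotFoundException,
// java.io.IOException, java.io.InputStreamReader, java.nio.charset.StandardCharsets, java.util.ArrayList, java.util.List
public List<String> readFile(String filePath) {
    List<String> content = new ArrayList<>();
    // UTF-8 input files are assumed
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), StandardCharsets.UTF_8))) {
        String line;
        while ((line = br.readLine()) != null) {
            content.add(line);
        }
    } catch (FileNotFoundException e) {
        // Caught before IOException, because FileNotFoundException is a subclass of it
        System.out.println("File not found: " + filePath);
    } catch (IOException e) {
        System.out.println("Error while reading file: " + filePath);
    }
    return content;
}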
- SimHash input errors:
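public String calculateSimHash(String content) {
    // Reject null or empty input early with a clear message
    if (content == null || content.isEmpty()) {
        throw new IllegalArgumentException("Content must not be empty");
    }
    // Continue with the SimHash computation (omitted here)
    return simHash;
}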
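Throwing `IllegalArgumentException` keeps bad input from producing a meaningless fingerprint, but the program only avoids crashing if a caller catches it. The sketch below catches it per file, so one empty file is reported and skipped while the remaining comparisons continue. `SafeCompareSketch` and `compareOne` are hypothetical names. The `calculateSimHash` here is a stand-in that reproduces only the empty-input check and returns a placeholder string, not a real SimHash.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: skip files whose content is rejected by the SimHash step instead of aborting the run.
class SafeCompareSketch {
    // Stand-in for calculateSimHash: only the empty-input check from the text is reproduced
    static String calculateSimHash(String content) {
        if (content == null || content.isEmpty()) {
            throw new IllegalArgumentException("Content must not be empty");
        }
        return "hash-of-" + content.length(); // placeholder value, not a real SimHash
    }

    static String compareOne(String fileName, String content) {
        try {
            return fileName + ": " + calculateSimHash(content);
        } catch (IllegalArgumentException e) {
            // Report the problem for this file and let the caller move on to the next one
            return fileName + ": skipped (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a.txt", "empty.txt");
        List<String> contents = Arrays.asList("some text", "");
        for (int i = 0; i < names.size(); i++) {
            System.out.println(compareOne(names.get(i), contents.get(i)));
        }
    }
}
```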
VII. PSP Table
VIII. Results