Software Engineering Assignment 2: Individual Project

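Assignment Overview

Course: Software Engineering
Assignment requirements: Individual Project
Assignment goal: Complete the individual project by designing a paper plagiarism-detection algorithm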

GitHub Link

GitHub link

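PSP Table

| PSP2.1 | Personal Software Process Stages | Estimated time (min) | Actual time (min) |
|---|---|---|---|
| Planning | Planning | 20 | 20 |
| · Estimate | · Estimate how long this task will take | 340 | 360 |
| Development | Development | 300 | 320 |
| · Analysis | · Requirements analysis (including learning new technologies) | 100 | 100 |
| · Design Spec | · Produce the design document | 30 | 25 |
| · Design Review | · Design review | 20 | 15 |
| · Coding Standard | · Coding standard (set suitable conventions for the current development) | 30 | 40 |
| · Design | · Detailed design | 30 | 40 |
| · Coding | · Coding | 30 | 30 |
| · Code Review | · Code review | 20 | 15 |
| · Test | · Testing (self-testing, fixing code, committing changes) | 40 | 55 |
| Reporting | Reporting | 40 | 40 |
| · Test Report | · Test report | 15 | 15 |
| · Size Measurement | · Measure the amount of work | 10 | 10 |
| · Postmortem & Process Improvement Plan | · Postmortem and process improvement plan | 15 | 15 |
| | · Total | 360 | 380 |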

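Module Interface Design and Implementation

  • TxtIOUtil class: reads a given file into a String, and can also write a String out to a specified file
  • SimHashUtils class: takes a String, computes its SimHash value, and returns it as a string
  • HammingUtils class: takes two SimHash values, computes their Hamming distance, and returns the resulting similarity
  • main class: the program entry point; receives the file locations as command-line arguments and calls the classes in the Util package to output the result
  • MainTest class: the unit test class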
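
The fingerprint-and-compare flow described for SimHashUtils and HammingUtils can be sketched roughly as follows. This is a minimal illustration, not the project's code: the class name SimHashSketch, the 64-bit fingerprint width, the use of overlapping two-character tokens with equal weight (in place of a real word segmenter), the MD5-based token hash, and the similarity formula 1 - distance / bits are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of a SimHash fingerprint plus Hamming-distance similarity.
final class SimHashSketch {
    static final int BITS = 64; // assumed fingerprint width

    // Hash a token to 64 bits using the first 8 bytes of its MD5 digest.
    static long hash64(String token) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(token.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Build the fingerprint: each token votes +1 or -1 on every bit position,
    // and the sign of the total decides the final bit. Returned as a bit string.
    static String simHash(String text) {
        int[] votes = new int[BITS];
        for (int i = 0; i + 2 <= text.length(); i++) {
            long h = hash64(text.substring(i, i + 2));
            for (int b = 0; b < BITS; b++) {
                votes[b] += ((h >>> b) & 1L) == 1L ? 1 : -1;
            }
        }
        StringBuilder sb = new StringBuilder(BITS);
        for (int b = 0; b < BITS; b++) {
            sb.append(votes[b] > 0 ? '1' : '0');
        }
        return sb.toString();
    }

    // Number of positions at which two equal-length bit strings differ.
    static int hammingDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("fingerprints differ in length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) distance++;
        }
        return distance;
    }

    // Assumed similarity measure: fraction of bit positions that agree.
    static double similarity(String a, String b) {
        if (a.isEmpty()) {
            throw new IllegalArgumentException("empty fingerprint");
        }
        return 1.0 - (double) hammingDistance(a, b) / a.length();
    }

    public static void main(String[] args) {
        String original = "The quick brown fox jumps over the lazy dog near the river bank.";
        String edited = "The quick brown fox leaps over the lazy dog near the river bank.";
        System.out.println(similarity(simHash(original), simHash(edited)));
    }
}
```

Returning the fingerprint as a bit string mirrors the description above of SimHashUtils producing a string, which keeps the Hamming distance a simple character-by-character comparison.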

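Performance Analysis of the Module Interfaces

Memory is consumed mainly by floating-point values and by the arrays and collections that get created. These allocations come from calls to methods in the Util package, so no further improvement is needed.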

Unit Tests

  • Test1 through Test7 compare the original text with the modified copies:
    @Test
    public void Test1(){
        String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String str2 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_add.txt");
        String str3 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_del.txt");
        String str4 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_1.txt");
        String str5 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_10.txt");
        String str6 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_15.txt");
        String ansFileName = "src/test/resources/test/test1.txt";
        double ans1 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str1));
        double ans2 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str2));
        double ans3 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str3));
        double ans4 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str4));
        double ans5 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str5));
        double ans6 = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str1), SimHashUtils.getSimHash(str6));
        TxtIOUtil.writeTxt(ans1, ansFileName);
        TxtIOUtil.writeTxt(ans2, ansFileName);
        TxtIOUtil.writeTxt(ans3, ansFileName);
        TxtIOUtil.writeTxt(ans4, ansFileName);
        TxtIOUtil.writeTxt(ans5, ansFileName);
        TxtIOUtil.writeTxt(ans6, ansFileName);
    }

    @Test
    public void Test2(){
        String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String ansFileName = "src/test/resources/test/test2.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        TxtIOUtil.writeTxt(ans, ansFileName);
    }

    @Test
    public void Test3(){
        String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_add.txt");
        String ansFileName = "src/test/resources/test/test3.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        TxtIOUtil.writeTxt(ans, ansFileName);
    }

    @Test
    public void Test4(){
        String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_del.txt");
        String ansFileName = "src/test/resources/test/test4.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        TxtIOUtil.writeTxt(ans, ansFileName);
    }

    @Test
    public void Test5(){
        String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_1.txt");
        String ansFileName = "src/test/resources/test/test5.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        TxtIOUtil.writeTxt(ans, ansFileName);
    }

    @Test
    public void Test6(){
        String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_10.txt");
        String ansFileName = "src/test/resources/test/test6.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        TxtIOUtil.writeTxt(ans, ansFileName);
    }

    @Test
    public void Test7(){
        String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_15.txt");
        String ansFileName = "src/test/resources/test/test7.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        TxtIOUtil.writeTxt(ans, ansFileName);
    }
  • Test coverage:
  • Test run time:

Exception Handling

  • Test for a file that does not exist:
    /**
     * Test for a nonexistent file
     * @throws Exception if either file could not be read
     */
    @Test
    public void Test8() throws Exception {
        String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.8_dis_.txt");
        String ansFileName = "src/test/resources/test/test8.txt";
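        // Check string contents with isEmpty(); == only compares object references
        if (str0 == null || str0.isEmpty() || str1 == null || str1.isEmpty()) {
            throw new Exception("File does not exist");
        }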
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        TxtIOUtil.writeTxt(ans, ansFileName);
    }
  • Test for an empty file:
    /**
     * Test for an empty file
     */
    @Test
    public void Test9(){
        String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.1.txt");
        String ansFileName = "src/test/resources/test/test9.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        TxtIOUtil.writeTxt(ans, ansFileName);
    }
  • Test for a file with too little text:
    /**
     * Test for a file with too little text
     */
    @Test
    public void Test10(){
        String str0 = TxtIOUtil.readTxt("src/test/resources/test/orig.txt");
        String str1 = TxtIOUtil.readTxt("src/test/resources/test/orig_0.2.txt");
        String ansFileName = "src/test/resources/test/test10.txt";
        double ans = HammingUtils.getSimilarity(SimHashUtils.getSimHash(str0), SimHashUtils.getSimHash(str1));
        TxtIOUtil.writeTxt(ans, ansFileName);
    }
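  • Code added to the getSimHash method of SimHashUtils to handle these cases:
        try {
            if (str.length() == 0) throw new Exception("File is empty");
            if (str.length() < 200) throw new ShortStringException("Text is too short to judge reliably!");
        } catch (Exception e) {
            // Print the problem and return null so the caller can detect the failure
            e.printStackTrace();
            return null;
        }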
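
The snippet above relies on a custom ShortStringException whose definition is not shown in the post. One possible shape for it, together with a standalone validation helper, is sketched below. The class InputCheckSketch, the method requireUsable, and the choice to let the exception reach the caller (instead of catching it and returning null) are assumptions for illustration only; the 200-character threshold is taken from the snippet.

```java
// Hypothetical sketch: input validation that reports problems through exceptions.
final class InputCheckSketch {
    static final int MIN_LENGTH = 200; // threshold used in the snippet above

    // Throws if the text is missing, empty, or too short to fingerprint reliably.
    static void requireUsable(String str) throws ShortStringException {
        if (str == null || str.isEmpty()) {
            throw new IllegalArgumentException("File is empty");
        }
        if (str.length() < MIN_LENGTH) {
            throw new ShortStringException("Text is too short to judge reliably!");
        }
    }

    public static void main(String[] args) {
        try {
            requireUsable("only a few words");
            System.out.println("accepted");
        } catch (ShortStringException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Assumed definition: a checked exception carrying only a message.
final class ShortStringException extends Exception {
    ShortStringException(String message) {
        super(message);
    }
}
```

Letting the exception propagate means a test can assert on the failure directly, whereas a null return value has to be checked separately by every caller.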
posted @ 2023-09-13 23:15  machuze