Software Engineering: Assignment 1

Course this assignment belongs to: https://edu.cnblogs.com/campus/gdgy/informationsecurity1812
Assignment requirements: https://edu.cnblogs.com/campus/gdgy/informationsecurity1812/homework/11155
Goal of this assignment: learn to use the PSP table and learn commit conventions

Design and implementation of the computation module interface

Implementation approach

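At first I had no clear idea. I considered walking through the text character by character, storing the characters in a stack, and comparing them against the plagiarized text to get a duplication rate. On reflection this does not work: inserting a few extra words into the text is enough to hide the duplication.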

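I then searched online and found an article by Ruan Yifeng (阮一峰), "TF-IDF与余弦相似性的应用(二):找出相似文章" (Applications of TF-IDF and Cosine Similarity, Part 2: Finding Similar Articles). The idea is to segment both articles into words, count the word frequencies, build a term-frequency vector for each article, and compute the cosine of the angle between the two vectors. The smaller the angle, the closer the two vectors are, and so the higher the duplication rate.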
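To make the measure concrete, here is the standard cosine similarity formula. The notation is introduced here and is not from the original post: A and B are the two term-frequency vectors over the same vocabulary of n words, with components a_i and b_i.

```latex
\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|}
           = \frac{\sum_{i=1}^{n} a_i b_i}
                  {\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
```

As a small worked case, A = (1, 1) and B = (1, 0) give 1 / (√2 · 1) ≈ 0.707. Identical vectors give 1, and vectors with no words in common give 0.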

For word segmentation I used HanLP. At this step each character is checked to be a Chinese character, so punctuation is not counted.
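The filtering and counting step might look roughly like the sketch below. It does not call HanLP: it assumes the segmenter has already produced a list of tokens, and the class and method names (`HanTokenCounter`, `isHanWord`, `countWords`) are hypothetical. It also uses a plain `Map<String, Integer>` rather than the `Map<String, List<Integer>>` from the post, purely to keep the example short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the author's TokenizerUtil.
class HanTokenCounter {

    // True only if every code point in the token is a Han (Chinese) character.
    static boolean isHanWord(String token) {
        if (token == null || token.isEmpty()) {
            return false;
        }
        return token.codePoints()
                .allMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    // Counts how often each Han-only token occurs; punctuation and other tokens are skipped.
    static Map<String, Integer> countWords(List<String> tokens) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : tokens) {
            if (isHanWord(token)) {
                freq.merge(token, 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        // Assumed segmenter output for a short sentence, including punctuation.
        List<String> tokens = Arrays.asList("今天", "天气", ",", "今天", "abc", "。");
        System.out.println(countWords(tokens));
    }
}
```

Checking the Unicode script instead of a fixed code point range is one way to keep the test readable; a range check on the CJK Unified Ideographs block would be another option.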

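Next, I iterate over the map that holds the words and their frequencies and compute the cosine value. The final result is derived from that cosine value.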
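A minimal sketch of this cosine step is shown below. It assumes each text has already been reduced to a word-to-count map; the post's `CountCos` takes `Map<String, List<Integer>>` arguments whose exact layout is not described, so the `Map<String, Integer>` shape and the name `CosineSketch` are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the author's TokenizerUtil.CountCos.
class CosineSketch {

    // Cosine of the angle between two term-frequency vectors given as word -> count maps.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        // The shared vocabulary is the union of the words in both texts.
        Set<String> vocabulary = new HashSet<>(a.keySet());
        vocabulary.addAll(b.keySet());

        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (String word : vocabulary) {
            int x = a.getOrDefault(word, 0);
            int y = b.getOrDefault(word, 0);
            dot += (double) x * y;
            normA += (double) x * x;
            normB += (double) y * y;
        }
        // An empty text has a zero vector; return 0 instead of dividing by zero.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("今天", 2);
        first.put("天气", 1);
        Map<String, Integer> second = new HashMap<>();
        second.put("今天", 1);
        System.out.println(cosine(first, second));
    }
}
```

Returning 0 for an empty input is a design choice of this sketch; it keeps a case like two empty paths from producing NaN.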

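String readFile(String fileName)
Boolean writeFile(String value, String fileName)
// Methods of the class used to read and write files

Map<String, List<Integer>> CountWord(String value)
// Performs word segmentation and counts the words

Double CountCos(Map<String,List<Integer>>, Map<String,List<Integer>>)
// Computes the similarity as the cosine of the angle between the two vectors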
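For the file helpers, one possible shape using `java.nio.file` is sketched below. The class name `FileSketch`, the UTF-8 encoding, and the error handling are assumptions; the post only gives the two signatures, and the parameter order here follows the `writeFile(String value, String fileName)` declaration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not the author's FileUtil.
class FileSketch {

    // Reads the whole file as UTF-8 text (encoding is an assumption).
    static String readFile(String fileName) {
        try {
            return new String(Files.readAllBytes(Paths.get(fileName)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes value to fileName, creating parent directories if needed.
    // Returns false instead of throwing when the write fails.
    static boolean writeFile(String value, String fileName) {
        try {
            Path target = Paths.get(fileName);
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            Files.write(target, value.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String path = Paths.get(System.getProperty("java.io.tmpdir"), "res.txt").toString();
        System.out.println(writeFile("0.87", path) + " " + readFile(path));
    }
}
```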

Program flowchart

Program output

orig_0.8_add.txt  0.8695990639733713
orig_0.8_del.txt  0.7498838191640381
orig_0.8_dis_1.txt 0.9206491294709916
orig_0.8_dis_10.txt 0.804067893296461
orig_0.8_dis_15.txt 0.6575365154781483

Performance analysis


The lists use the most memory, because every word is stored with its own list.
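One way to cut the per-word list overhead, offered only as a possible alternative and not as what the program does, is to keep both counts for a word in a single two-slot `int[]`. The class name `PairCountSketch` and the slot layout are assumptions, and no measurements were taken for this variant.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical alternative layout: one int[2] per word instead of a List<Integer>.
class PairCountSketch {

    // Slot 0 holds the count in the first text, slot 1 the count in the second text.
    static Map<String, int[]> countBoth(List<String> first, List<String> second) {
        Map<String, int[]> counts = new HashMap<>();
        for (String word : first) {
            counts.computeIfAbsent(word, k -> new int[2])[0]++;
        }
        for (String word : second) {
            counts.computeIfAbsent(word, k -> new int[2])[1]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, int[]> counts = countBoth(
                Arrays.asList("今天", "天气", "今天"),
                Arrays.asList("今天", "下雨"));
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            System.out.println(e.getKey() + " " + Arrays.toString(e.getValue()));
        }
    }
}
```

A primitive array avoids boxing each count into an `Integer` and avoids the list object per word, at the cost of a fixed two-document layout.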

Unit tests

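// Assumes JUnit 4; with JUnit 5 import org.junit.jupiter.api.Test instead.
import org.junit.Test;

import java.util.List;
import java.util.Map;

public class MainStart {
    @Test
    public void sameTest(){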
        String path = "D:\\test\\orig_0.8_add.txt";
        String path2 = "D:\\test\\orig_0.8_add.txt";
        Map<String, List<Integer>> stringListMap = TokenizerUtil.CountWord(path);
        Map<String, List<Integer>> stringListMap2 = TokenizerUtil.CountWord(path2);
        Double aDouble = TokenizerUtil.CountCos(stringListMap, stringListMap2);
        FileUtil.writeFile("D:\\test\\test\\res.txt",String.valueOf(aDouble));
        System.out.println(aDouble);
    }
    @Test
    public void addTest(){
        String path = "D:\\test\\orig.txt";
        String path2 = "D:\\test\\orig_0.8_add.txt";
        Map<String, List<Integer>> stringListMap = TokenizerUtil.CountWord(path);
        Map<String, List<Integer>> stringListMap2 = TokenizerUtil.CountWord(path2);
        Double aDouble = TokenizerUtil.CountCos(stringListMap, stringListMap2);
        FileUtil.writeFile("D:\\test\\test\\res.txt",String.valueOf(aDouble));
        System.out.println(aDouble);
    }
    @Test
    public void delTest(){
        String path = "D:\\test\\orig.txt";
        String path2 = "D:\\test\\orig_0.8_del.txt";
        Map<String, List<Integer>> stringListMap = TokenizerUtil.CountWord(path);
        Map<String, List<Integer>> stringListMap2 = TokenizerUtil.CountWord(path2);
        Double aDouble = TokenizerUtil.CountCos(stringListMap, stringListMap2);
        FileUtil.writeFile("D:\\test\\test\\res.txt",String.valueOf(aDouble));
        System.out.println(aDouble);
    }
    @Test
    public void disTest(){
        String path = "D:\\test\\orig.txt";
        String path2 = "D:\\test\\orig_0.8_dis_1.txt";
        Map<String, List<Integer>> stringListMap = TokenizerUtil.CountWord(path);
        Map<String, List<Integer>> stringListMap2 = TokenizerUtil.CountWord(path2);
        Double aDouble = TokenizerUtil.CountCos(stringListMap, stringListMap2);
        FileUtil.writeFile("D:\\test\\test\\res.txt",String.valueOf(aDouble));
        System.out.println(aDouble);
    }
    @Test
    public void dis10Test(){
        String path = "D:\\test\\orig.txt";
        String path2 = "D:\\test\\orig_0.8_dis_10.txt";
        Map<String, List<Integer>> stringListMap = TokenizerUtil.CountWord(path);
        Map<String, List<Integer>> stringListMap2 = TokenizerUtil.CountWord(path2);
        Double aDouble = TokenizerUtil.CountCos(stringListMap, stringListMap2);
        FileUtil.writeFile("D:\\test\\test\\res.txt",String.valueOf(aDouble));
        System.out.println(aDouble);
    }
    @Test
    public void dis15Test(){
        String path = "D:\\test\\orig.txt";
        String path2 = "D:\\test\\orig_0.8_dis_15.txt";
        Map<String, List<Integer>> stringListMap = TokenizerUtil.CountWord(path);
        Map<String, List<Integer>> stringListMap2 = TokenizerUtil.CountWord(path2);
        Double aDouble = TokenizerUtil.CountCos(stringListMap, stringListMap2);
        FileUtil.writeFile("D:\\test\\test\\res.txt",String.valueOf(aDouble));
        System.out.println(aDouble);
    }
    @Test
    public void Test(){
        String path = "D:\\test\\orig.txt";
        String path2 = "D:\\test\\orig_0.8_dis_15.txt";
        Map<String, List<Integer>> stringListMap = TokenizerUtil.CountWord(path);
        Map<String, List<Integer>> stringListMap2 = TokenizerUtil.CountWord(path2);
        Double aDouble = TokenizerUtil.CountCos(stringListMap, stringListMap2);
        FileUtil.writeFile("D:\\test\\test\\res.txt",String.valueOf(aDouble));
        System.out.println(aDouble);
    }

    @Test
    public void NullpointTest(){
        String path = "";
        String path2 = "";
        Map<String, List<Integer>> stringListMap = TokenizerUtil.CountWord(path);
        Map<String, List<Integer>> stringListMap2 = TokenizerUtil.CountWord(path2);
        Double aDouble = TokenizerUtil.CountCos(stringListMap, stringListMap2);
        FileUtil.writeFile("D:\\test\\test\\res.txt",String.valueOf(aDouble));
        System.out.println(aDouble);
    }
    @Test
    public void DIYpointTest(){
        String path = "D:\\test\\1000.txt";
        String path2 = "D:\\test\\4.txt";
        Map<String, List<Integer>> stringListMap = TokenizerUtil.CountWord(path);
        Map<String, List<Integer>> stringListMap2 = TokenizerUtil.CountWord(path2);
        Double aDouble = TokenizerUtil.CountCos(stringListMap, stringListMap2);
        FileUtil.writeFile("D:\\test\\test\\res.txt",String.valueOf(aDouble));
        System.out.println(aDouble);
    }
}



PSP table

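PSP2.1 | Personal Software Process Stages | Estimated time (minutes) | Actual time (minutes)
Planning | Planning | 100 | 120
· Estimate | Estimate how long the task will take | 400 | 800
Development | Development | 480 | 640
· Analysis | Requirements analysis (including learning new technologies) | 60 | 120
· Design Spec | Write the design document | 60 | 60
· Design Review | Design review | 60 | 60
· Coding Standard | Coding standard (define suitable conventions for the current development) | 30 | 30
· Design | Detailed design | 30 | 30
· Coding | Coding | 180 | 300
· Code Review | Code review | 60 | 80
· Test | Testing (self-testing, fixing code, committing changes) | 60 | 120
Reporting | Reporting | 60 | 60
· Test Report | Test report | 20 | 30
· Size Measurement | Workload measurement | 10 | 10
· Postmortem & Process Improvement Plan | Postmortem and process improvement plan | 30 | 30
Total | | 1290 | 2250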
posted @ 2020-09-24 22:30  一个兢兢业业的切图仔