WordCounter Teamwork

基础任务

Contributor	贡献占比
蒋志远	37%
李露阳	30%
鲁平	33%

PSP

PSP2.1	PSP阶段	预估耗时实际耗时（分钟）	实际耗时（分钟）
Planning	计划	20	30
Estimate	估计这个任务需要多少时间	10	10
Development	开发	530	740
- Analysis	- 需求分析（包括学习新技术）	100	160
- Design Spec	- 生成设计文档	100	150
- Coding Standard	- 代码规范 (为目前的开发制定合适的规范)	10	10
- Design	- 具体设计	30	30
- Coding	- 具体编码	200	240
- Code Review	- 代码复审	30	30
- Test	- 测试（自我测试，修改代码，提交修改）	60	120
Reporting	报告	180	320
- Test Report	- 测试报告	40	65
- Size Measurement	- 计算工作量	10	15
- Postmortem & Process Improvement Plan	- 事后总结, 并提出过程改进计划	130	240
	合计	740	1100

模块划分

将程序划分成三个部分，分别管控 IO 、核心功能的实现，和主函数。
因为核心功能比较复杂，其中我负责和李露阳结对编程，实现核心功能 WordCounter.java，编写主要的单元测试，并且教他优化代码，编写 Javadoc。
各模块设计如下：

1. Main

/**
 * com.hust.wcPro
 * Created by Blues on 2018/3/27.
 */

import java.util.HashMap;

public class Main {
    static public void main(String[] args) {

        IOController io_control = new IOController();
        
        String valid_file = io_control.get(args);
        if (valid_file.equals("")) {
            return ;
        }
        
        WordCounter wordcounter = new WordCounter();
        
        HashMap<String, Integer> result = wordcounter.count(valid_file);

        io_control.save(result);

    }
}

Main函数负责所有接口的调用，逻辑很简单，即IO获取有效的文件参数，调用 WordCounter 类的核心函数，IO 将结果排序后存入 result.txt 中。

2. IOController

IOController 类负责管控 io，具体设计如下：

class IOController {
    IOController() {}
    
    /**
     * Parses the main function arguments
     * 
     * @param args the main function arguments
     * @return a valid file name
     */
    public String get(String[] args);

    /**
     * Saves the result sorted
     * 
     * @param result the result contain word as key as count as value
     * @return the state code of operation
     */
    public int save(HashMap<String, Integer> result);
}

get() 负责解析主函数的参数，返回一个合法的，存在的文件名。
save() 负责将输出传入的结果排序后输出到 result.txt 文件中。

3. WordCounter

WordCounter 类负责实现核心功能 count() 函数，判断合法的单词，处理老师要求的各种特殊情况，统计传入的文件中的各字符的数量，结果以 HashMap 的形式返回。

/**
 * The {@code WordCounter} class is a multi-purpose text file
 * counter, which can judge a legal word and count word numbers,
 * to put the result to a {@code HashMap}.
 * <br><br>
 *
 * @author VectorLu
 * @author YangLeee
 * @since JDK1.8
 */
public class WordCounter {
    
    WordCounter() {
    }
    
     /**
     * Return whether the argument is in the English alphabet.
     * @param c
     * @return {@code true} if the argument is in the English alphabet,
     * otherwise {@code false}.
     */
    private boolean isEngChar(char c);
    
    /**
     * Return whether the argument is hyphen, which is {@code -}.
     * @param c
     * @return {@code true} if the argument is hyphen
     * otherwise {@code false}.
     */
    private boolean isHyphen(char c);
    
    /**
     * Counts the words in the specific file
     * 
     * @param filename the file to be counted
     * @return the result saves the word(lowercased) as key and count as value
     */
    public HashMap<String, Integer> count(String filename);
}

项目管理

为了能高效的合作以及更好的项目管理，我们选择使用 Gradle 进行项目的管理以及依赖管理，使用也可以更好的使用 JUnit5 进行单元测试（其中曾经混用过 JUnit4，并且发现 JUnit5 更为严谨，后全部迁移至 JUnit5)。因为多成员合作，我们使用 Git 进行源代码管理。

build.gradle 来自我们组组员蒋志远同学

其中，Gradle 的配置文件 build.gradle 内容如下，可供参考：

buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath 'org.junit.platform:junit-platform-gradle-plugin:1.1.0'
    }
}

plugins {
    id 'com.gradle.build-scan' version '1.12.1'
    id 'java'
    id 'eclipse'
    id 'idea'
    id 'maven'
}

buildScan {
    licenseAgreementUrl = "https://gradle.com/terms-of-service"
    licenseAgree = "yes"
}

apply plugin: 'org.junit.platform.gradle.plugin'

int javaVersion = Integer.valueOf((String) JavaVersion.current().getMajorVersion())
if (javaVersion < 10) apply plugin: 'jacoco'

jar {
    baseName = 'wcPro'
    version = '0.0.1'
    manifest {
        attributes 'Main-Class': 'Main'
    }
}

repositories {
    mavenCentral()
}

dependencies {
    testCompile (
        'org.junit.jupiter:junit-jupiter-api:5.0.3',
        'org.json:json:20090211'
    )

    testRuntime(
        'org.junit.jupiter:junit-jupiter-engine:5.0.3',
        'org.junit.vintage:junit-vintage-engine:4.12.1',
        'org.junit.platform:junit-platform-launcher:1.0.1',
        'org.junit.platform:junit-platform-runner:1.0.1'
    )
}

task wrapper(type: Wrapper) {
    description = 'Generates gradlew[.bat] scripts'
    gradleVersion = '4.6'
}

测试

1. 单元测试

白盒测试的测试方法有代码检查法、静态结构分析法、静态质量度量法、逻辑覆盖法、基本路径测试法、域测试、符号测试、路径覆盖和程序变异。
白盒测试法的覆盖标准有逻辑覆盖、循环覆盖和基本路径测试。其中逻辑覆盖包括语句覆盖、判定覆盖、条件覆盖、判定/条件覆盖、条件组合覆盖和路径覆盖。

要保证测试到每一个方法，而复杂的方法，可能需要使用多种测试方法，如边界测试，路径覆盖。

`public` 方法测试

单元测试我们测试的粒度是到接口，因为项目主要包含 3 个大的接口，所以我们要对其分别进行测试。主要接口：

IOController.get()
IOController.save()
WordCounter.count()

我们设计了 UnitTest 类来进行接口测试，我主要负责对 WordCounter.count() 进行单元测试（包括边界测试），测试了非法路径、针对老师要求的五种特殊情况做了详细的测试。

第一，由连续的若干个英文字母组成的字符串，例如，software，
第二，用连字符（即短横线）所连接的若干个英文单词也视为1个单词，例如，content-based，视为1个单词。
注意，单词不区分大小写，不考虑英文以外的其他语言，且仅考虑半角。
有关单词识别的部分典型情况的说明：
第一，Let’s，这种包含单引号的情况，视为2个单词，即let和s。
第二，night-，带短横线的单词，视为1个单词，即night。
第三，“I，带双引号的单词，视为1个单词，即i。
第四，TABLE1-2，带数字的单词，视为1个单词，即table。
第五，(see Box 3–2).8885d_c01_016，带数字、常用字符和单词的情况，视为4个单词，即see, box, d, c。

单元测试内容（使用了 @DisplayName 来说明，而且由测试方法名易知其测试目的，如下：

使用命令 gradle test 进行测试

或者在 Idea 里直接点击运行测试：

class UnitTest {

    UnitTest() {}

    String fileParentPath = "src/test/resources/";

    @Test
    void testCountEmptyFile() {
        String fileName = "emptyFile.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap result = wc.count(relativePath);
        assertEquals(0, result.size());
    }

    @Test
    @DisplayName("Border test: wc.count(endWithHyphen.txt)")
    void testCountFileEndWithHyphen() {
        String fileName = "endWithHyphen.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap result = wc.count(relativePath);
        assertEquals(1, result.size());
    }

    @Test
    @DisplayName("Bord test: wc.count(startWithHyphen.txt)")
    void testCountFileStartWithHyphen() {
        String fileName = "startWithHyphen.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap result = wc.count(relativePath);
        assertEquals(true, result.containsKey("hyphen"));
    }

    @Test
    @DisplayName("Bord test: wc.count(startWithHyphen.txt)")
    void testNumberStartWithHyphen() {
        String fileName = "startWithHyphen.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap result = wc.count(relativePath);
        assertEquals(1, result.size());
    }

    @Test
    @DisplayName("Bord test: wc.count(startWithHyphen.txt)")
    void testCountFileWithQuatation() {
        String fileName = "withQuatation.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap result = wc.count(relativePath);
        assertEquals(2, result.size());
    }


    @Test
    void testCountHyphen() {
        String fileName = "endWithHyphen.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap result = wc.count(relativePath);
        HashMap expect = new HashMap(1);
        expect.put("hyphen", 1);
        assertEquals(expect.keySet(), result.keySet());
        for (Object key: expect.keySet()) {
            assertEquals((int)expect.get(key), (int)result.get(key));
        }
    }

    @Test
    @DisplayName("Border test: single quotation mark")
    void testCountSingleQuotationMark() {
        String fileName = "singleQuotationMark.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap result = wc.count(relativePath);
        assertEquals(2, result.size());
    }

    @Test
    @DisplayName("Border test: double quotation mark")
    void testCountDoubleQuotationMark() {
        String fileName = "doubleQuotationMark.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap result = wc.count(relativePath);
        assertEquals(1, result.size());
    }

    @Test
    @DisplayName("Border test: word with number")
    void testCountWordWithNumber() {
        String fileName = "wordWithNumber.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap result = wc.count(relativePath);
        assertEquals(1, result.size());
    }

    @Test
    @DisplayName("Border test: word with multiple kinds of char")
    void testCountMultiple() {
        String fileName = "multiple.txt";
        String relativePath = fileParentPath + fileName;
        WordCounter wc = new WordCounter();
        HashMap result = wc.count(relativePath);
        assertEquals(4, result.size());
    }

}

`private` 方法测试

除了对于公有方法的测试，我还负责对私有方法进行测试。而对于无法没有访问权限的两个私有方法，要如何测试呢？（虽然 WordCounter.java 的两个私有方法十分简单，不过仍有测试以防万一的必要，另外也可以学习如何测试私有方法）。

很显然，碰到这种情况，我们只能用 reflection 即反射来处理，如下：

/**
 * Use reflection to test {@code private} method {@isEngChar()}
 */
@Test
void testIsEngChar() {
    /* a English Alphabet */
    final char[] ALPHABET = new char[52];

    final char[] NOT_ALPHABET = {',', '&', '@', '\\', '/'};

    char ch = 'a';
    for (int i = 0; i < 26; i++) {
        ALPHABET[i] = ch;
        ch++;
    }
    ch = 'A';
    for (int i = 26; i < 52; i++) {
        ALPHABET[i] = ch;
        ch++;
    }

    /* get a {@code Class} object of {@code WordCounter} */
    Class<WordCounter> classOfWordCounter = WordCounter.class;
    try {
        /* get an instance of target class */
        Object wcInstance = classOfWordCounter.newInstance();
        try {
            Method privateMethod = classOfWordCounter.getDeclaredMethod("isEngChar", char.class);
            privateMethod.setAccessible(true);
            for (char letter: ALPHABET) {
                try {
                    boolean result = (Boolean) privateMethod.invoke(wcInstance, new Object[]{letter});
                    assertEquals(true, result);
                } catch (InvocationTargetException e) {
                    e.printStackTrace();
                }
            }
            for (char notLetter: NOT_ALPHABET) {
                try {
                    boolean result = (Boolean) privateMethod.invoke(wcInstance, new Object[]{notLetter});
                    assertEquals(false, result);
                } catch (InvocationTargetException e) {
                    e.printStackTrace();
                }
            }

        } catch (NoSuchMethodException e) {
            e.printStackTrace();
        }
    } catch (InstantiationException e) {
        e.printStackTrace();
    } catch (IllegalAccessException e) {
        e.printStackTrace();
    }
}

2. 自动化黑盒测试

——来自我们组员蒋志远同学

为了能高效进行测试，我们采用了自动化脚本的方式进行测试能更好的进行压力测试。

首先我们需要大量的、正确的测试用例，每个测试用例的大小必须要足够大、内容也要保证正确。为此，手写测试用例是绝对不实际的，所以我们需要自动生成正确的测试用例。为了达到这个目的，我们组用 Python 写了一个简单的脚本，用来自动生成测试用例，内容随机但是大小可控：

from functools import reduce
import numpy as np
from numpy.random import randint
import json
import sys, os, re

elements = {
    "words": "abcdefghijklmnopqrstuvwxyz-",
    "symbol": "!@#$%^&*()~`_+=|\\:;\"'<>?/ \t\r\n1234567890-"
}

def generate_usecase(configs):
    global elements
    
    path = os.path.join('test', 'testcase')
    result_path = os.path.join('test', 'result')
    if not os.path.exists(path):
        os.makedirs(path)
    if not os.path.exists(result_path):
        os.makedirs(result_path)
    for config_idx, config in enumerate(configs):
        word_dict = {}
        i = 0
        # 这里用于生成一个合法的单词
        while i < config['num_of_type']:
            word_len = randint(*config['word_size'])
            word_elements = randint(0, len(elements['words']), word_len)
            word = np.array(list(elements['words']))[word_elements]
            word = ''.join(word)
            # 这里将单词中不合法的 ‘-’ 转化删除掉
            word = re.sub(r'-{2,}','-', word)
            word = re.sub(r'^-*', '', word)
            word = re.sub(r'-*$', '', word)
            if len(word) == 0: # 运气不好全是 ‘-’ 那么单词生成失败，从新生成单词 
                continue
            word_dict[word] = 0
            i += 1
        total_count = 0
        # 设置单词重复出现的次数
        for key in word_dict.keys():
            word_dict[key] = randint(*config['word_repeat'])
            total_count += word_dict[key]
        word_dict_tmp = word_dict.copy()
        final_string = ''
        # 构造最终的用例文本
        for i in range(total_count):
            key, val = None, 0
            while (val == 0):
                key_tmp = list(word_dict_tmp.keys())[randint(len(word_dict))]
                val = word_dict_tmp[key_tmp]
                if val != 0:
                    key = key_tmp
                    word_dict_tmp[key_tmp] = val-1
            # 这里将单词的内容随机大小写
            word_upper_case = randint(0, 2, len(key))
            key = ''.join([s.upper() if word_upper_case[i] > 0 else s for i, s in enumerate(list(key))])
            final_string += key
            sep = ''
            # 构造合法的分隔符
            for _ in range(randint(*config['sep_size'])):
                sep += elements['symbol'][randint(0, len(elements['symbol']))]
            if sep == '-':
                while sep == '-':
                    sep = elements['symbol'][randint(0, len(elements['symbol']))]
            final_string += sep

        with open(os.path.join(path, '{}_usecase.txt').format(config_idx), 'w') as f:
            f.write(final_string)
                   
        sorted_key = sorted(word_dict.items(), key=lambda kv:(-kv[1], kv[0]))
        result = ''
        for key, val in sorted_key:
            result += key + ': ' + str(val) + '\n'

        with open(os.path.join(path, '{}_result_true.txt'.format(config_idx)), 'w') as f:
            f.write(result)

        print('test case {} generated'.format(config_idx))

def main():
    config = sys.argv[-1]
    with open(config) as f:
        config = json.load(f)
    
    generate_usecase(config)

if __name__ == '__main__':
    main()

其中的配置文件的格式大致如下：

[
    {
        "num_of_type": 10,
        "word_size": [1, 10],
        "sep_size": [1,3],
        "word_repeat": [1, 300]
    },
    {
        "num_of_type": 20,
        "word_size": [1, 20],
        "sep_size": [1,3],
        "word_repeat": [20, 300]
    }
]

内容很简单，只需要配置有多少个单词，每个单词长度范围，分隔符的长度范围，每个单词重复出现的大小范围，即可生成相应的测试用例和正确的排序后的结果。

..........
YMtyibqY
zxz*^QtRWv*O=3KDvJKmpQb86MThOdnP
ZXZ>#aAys>&mthodnP>`qtRWv(QTRWV*YmTYiBqY^\O9Zxz_?MthOdNP$ zxZ="MtHODnP#!yMTYibqY:o%2AaYS<#QTRwV8MTHOdnp!o#+MTHodNP)*QTRWV;YmtyiBQY	ZXz$hesS`aayS_#FKcU=)AAys;fKcu-$Z$MthoDnp
 YMTYIBqy/3aAyS!Zxz'yMtyiBQY~1KdvjKMpQB'@aAYs'Z'zXZ3z2hESs5aAys@yMtyiBQy4qtRWV3kDvJKMpQB:9yMTyIbqy_YmtyIBqY
KdvJKmpqB>YMtYibQy
>z2O
z`^FKCu$<QTRwv#<mtHOdnP%z+z"*FKCu9hESs<fkcu!YMtYiBqY"HesS9MtHODNp
ZxZ
.........

👆上面是自动生成的用例的部分内容。

mthodnp: 287
o: 253
aays: 250
kdvjkmpqb: 232
fkcu: 170
qtrwv: 151
ymtyibqy: 133
hess: 67
zxz: 52
z: 32

👆上面是生成的正确答案。

测试用例已经生成好了，要做的就是让他能自动运行以及统计运行时间了，所以我设计的一下的脚本来完成这个费事的工作，内容在项目的 build.sh 中：

echo '--------- building jar -----------'
gradle build -x test

echo '------ generating test case ------'
python ./scripts/testcase_generate.py ./scripts/config.json

echo '-------- setting test env --------'
cp ./build/libs/* ./test
echo 'jar copyed to ./test'
echo '----------- testing --------------'
declare -i num_test
num_test=($(ls -l ./test/testcase | wc -l)-1)/2
echo 'number of test:' ${num_test}
cd test
jarname=$(ls | grep *.jar)
declare -i correct_cnt
correct_cnt=0
echo 'testing ' ${jarname}
num_test=num_test-1
for i in $(seq 0 ${num_test})
do
    start=`python -c 'import time; print (time.time())'`
    java -jar ${jarname} ./testcase/${i}_usecase.txt
    end=`python -c 'import time; print (time.time())'`
    cmp result.txt ./testcase/${i}_result_true.txt
    if [ ${?} == 0 ]; then
        correct_cnt=correct_cnt+1
        echo 'test ' $i ' passed...time: ' `bc <<< $end-$start`
    else
        echo 'test ' $i ' failed...time: ' `bc <<< $end-$start`
    fi
    mv result.txt ./result/${i}_result.txt
done
num_test=num_test+1
echo 'test passed: ' ${correct_cnt} 'total: ' ${num_test}
cd ..

在 WordCounter/ 的目录下用 sh build.sh 运行黑盒测试。我们组的运行结果如下：

--------- building jar -----------

BUILD SUCCESSFUL in 0s
2 actionable tasks: 2 executed
------ generating test case ------
test case 0 generated
test case 1 generated
test case 2 generated
test case 3 generated
test case 4 generated
test case 5 generated
test case 6 generated
test case 7 generated
test case 8 generated
-------- setting test env --------
jar copyed to ./test
----------- testing --------------
number of test: 9
testing  wcPro-0.0.1.jar
test  0  passed...time:  0.16925787925720215
test  1  passed...time:  0.20428085327148438
test  2  passed...time:  0.31932592391967773
test  3  passed...time:  0.7172000408172607
test  4  passed...time:  1.7908999919891357
test  5  passed...time:  2.6144556999206543
test  6  passed...time:  0.38278985023498535
test  7  passed...time:  0.2935812473297119
test  8  passed...time:  0.29720091819763184
test passed:  9 total:  9

由此可以完成自动化的黑盒测试，可以及时查看运行时间以及正确性。

扩展任务：静态测试

我们组使用了 Alibaba Coding Guidelines 来做静态测试。使用了其 Plugin for IntelliJ Idea，插件来检测代码。检测到如下缺陷：

解决 `NullPointerException` 隐患

《阿里巴巴Java开发手册》中指出：

可见 nowWord.equals("") 中由于变量 nowWord 可能为 null，这样会抛出严重的异常 NullPointerException，故每次调用该方法之前要首先判断 nowWord 是否为 null。而如果用字符串常量 "" 调用 equals() 方法，即将代码改为 "".equals(nowWord) 就不会有这样的隐患。

单行注释独立成行

李露阳同学的代码中，为了方便将注释写在了代码的后面，不符合规范，告知他修改。

`HashMap` 初始化

如果我们能提前得知 HashMap 的大小，可以在初始化的时候就设定。不过我们的代码中，HashMap 的大小是不确定的，选择 Disable Inspection 选项忽略它。

高级任务：性能测试及优化

性能测试

以下时间以秒记。

test  0  passed...time:  0.16925787925720215
test  1  passed...time:  0.20428085327148438
test  2  passed...time:  0.31932592391967773
test  3  passed...time:  0.7172000408172607
test  4  passed...time:  1.7908999919891357
test  5  passed...time:  2.6144556999206543
test  6  passed...time:  0.38278985023498535
test  7  passed...time:  0.2935812473297119
test  8  passed...time:  0.29720091819763184
test passed:  9 total:  9

TestCase	Size	Time / s
0	4K	0.16
1	56K	0.20
2	510K	0.30
3	3.5M	0.57
4	17.7M	1.81
5	25.4M	2.26
6	1.5M	0.39
7	300K	0.27
8	167K	0.23

评审

我们对核心功能 WordCounter 的 count() 方法进行了代码评审，参考静态测试给出的结果，发现有很多编码习惯的问题需要改进，对于一些不安全的操作，比如我提出利用单个字符读取的过程中，循环必须首先判断是否为 -1 ，否则在使用 (char) 在强制类型转换时会得到意想不到的结果。

还发现对文件的读取、遍历方式和情况进行了讨论，商讨是否一个一个字符的读取分析会使用太多 IO 时间。

结合 config.json，可知单词种类和数量，是影响程序运行的重要因素。核心代码的算法已经没有什么可以简化的余地。不过对于文件读取的操作，我认为可以使用更加安全高效的 try-with-resources 语法来改进。

优化

try-with-resources 是 JDK1.7 之后引入的语法，可以自动调用 close() 方法，而不用担心嵌套的异常导致资源不关闭。详见 Oracle Official Java Document -- try-with-resource。

posted @ 2018-04-08 21:05 VectorLu 阅读(244) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

VectorLu