第二周作业WordCount

第二周作业心得

github项目地址

PSP表格

PSP2.1	PSP阶段	预估耗时（分钟）	实际耗时（分钟）
Planning	计划	10	10
· Estimate	· 估计这个任务需要多少时间	10	10
Development	开发	575	1180
· Analysis	· 需求分析 (包括学习新技术)	60	60
· Design Spec	· 生成设计文档	30	30
· Design Review	· 设计复审 (和同事审核设计文档)	0	0
· Coding Standard	· 代码规范 (为目前的开发制定合适的规范)	5	10
· Design	· 具体设计	30	60
· Coding	· 具体编码	300	500
· Code Review	· 代码复审	30	200
· Test	· 测试（自我测试，修改代码，提交修改）	100	320
Reporting	报告	45	55
· Test Report	· 测试报告	30	40
· Size Measurement	· 计算工作量	5	5
· Postmortem & Process Improvement Plan	· 事后总结, 并提出过程改进计划	10	10
	合计	630	1345

解题思路

刚拿到题目的时候，觉得这个程序功能并不复杂，而且与编译原理课程中编写的词法分析器有很多相似之处。但是我想到当时手写词法分析器的时候，十分繁琐，特别是一些边界条件的处理总是判断不好。后来想到flex/bison中使用正则表达式来进行词法分析，觉得这是一个非常容易实现的思路，所以就决定使用正则表达式来编写WordCount。

1.在网络上查找了正则表达式手册[1]，知道了常用的正则表达式的写法

2.查找Java源程序如何打包为jar包，并转成exe格式的方法[2]

程序设计实现过程

因为代码是围绕文本处理来进行，所以只设计了一个类WordCount，一共有如下个函数：

public int countWord();
public int countLine();
public int countChar();
public int countEmptyLine();
public int countCommentLine();
public int countCodeLine();
public int countWordWithStopList();

这些函数之间的关系并不是很大，在countCodeLine()中调用了countCommentLine(),countEmptyLine(), countLine()。

后面会看到这些函数的具体实现都很简单，所以不太需要流程图来辅助说明。

代码说明

将要统计的文件读入一个String类型的buffer变量中

则行数可以按照下列方式统计：

public int countLine()
{
    if(buffer.length() == 0)
        return 0;
	String[] lines = buffer.split("\n", -1);
    return lines.length;
}

同理，字符数是按照,\space\t\n来进行分割的，则字符数统计方式为

public int countWord()
{
    if(buffer.length() == 0)
        return 0;
	String[] words = buffer.split("[, \t\n]+");
    return words.length;
}

对于空行，写出空行对应的正则表达式[\\s]*[\\S]?[\\s]*

然后用这个正则表达式去匹配文本内容即可：

public int countEmptyLine()
    {
        if(buffer.length() == 0)    return 0;
        String regex = "[\\s]*[\\S]?[\\s]*";
        Pattern emptyLinePattern = Pattern.compile(regex);

        String[] lines = buffer.split("\n", -1);

        Matcher matcher;
        int result = 0;
        for(String line : lines)
        {
            matcher = emptyLinePattern.matcher(line);
            if(matcher.matches())
                result++;
        }
        return result;

    }

同理，注释行的正则表达式为(//.*\\n)|(/\\*(.|\\n)*\\*/[\\n]?)，函数编写方法与空行类似。

public int countCommentLine()
    {
        if(buffer.length() == 0)    return 0;
        String temp = "";
        String regex = "(//.*\\n)|(/\\*(.|\\n)*\\*/[\\n]?)";
        Pattern commentPattern = Pattern.compile(regex);
        Matcher matcher = commentPattern.matcher(buffer);

        while(matcher.find())
        {
            temp += matcher.group();
        }
        if(temp.equals("")) return 0;

        return temp.split("\n").length;
    }

代码行数的计算方式用代码行数 = 总行数 - 空行数 - 注释行数即可。

测试设计过程

1.统计单词数,行数,代码行数

-c 统计字符数

路径	输入	预期输出	实际输出
A-B-C	null	0	0
A-B-D	a\n	2	2

-l 统计行数

路径	输入	预期输出	实际输出
A-B-C	null	0	0
A-B-D	abc\nabc	2	2

-w 统计单词数

路径	输入	预期输出	实际输出
A-B-C	null	0	0
A-B-D	dog cat mouse	3	3

2.统计空行数，注释行数

路径	输入	预期输出	实际输出
A-B-D	null	空行数：0，注释行数：0	空行数：0，注释行数：0
A-B-C-E-G	"}//"	空行数：0，注释行数：1	空行数：0，注释行数：1
A-B-C-E-F	"\t\t\nabc\n\t\t"	空行数：2，注释行数：0	空行数：2，注释行数：0

对于一些文件的测试用例：

file.c的内容如下：

File write,name = new File(outputPath);
            writename.createNe|wFile();
            Buffe,redWriter out = new BufferedWrit,er(new FileWriter(writename));
            out.close();

stopList.txt内容如下：

out = file else void

输入	预期输出	实际输出
wc.exe -w file.c	file.c,单词数：16	file.c,单词数：16
wc.exe -l file.c	file.c,行数：4	file.c,行数：4
wc.exe -c file.c	file.c,字符数：186	file.c,字符数：186
wc.exe -w -l -c file.c	file.c,字符数：186 file.c,单词数：16 file.c,行数：4	file.c,字符数：186 file.c,单词数：16 file.c,行数：4
wc.exe -a file.c	file.c,代码行/空行/注释行：4/0/0	file.c,代码行/空行/注释行：4/0/0
wc.exe -w -l -c empty.c	empty.c,字符数：0 empty.c,单词数：0 empty.c,行数：0	empty.c,字符数：0 empty.c,单词数：0 empty.c,行数：0
wc.exe -w -l -c -a file.c	file.c,字符数：186 file.c,单词数：16 file.c,行数：4 file.c,代码行/空行/注释行：4/0/0	file.c,字符数：186 file.c,单词数：16 file.c,行数：4 file.c,代码行/空行/注释行：4/0/0
wc.exe -s -w -l -c -a *.c	file2.c,字符数：1 file2.c,单词数：0 file2.c,行数：2 file2.c,代码行/空行/注释行：0/2/0 aaa.c,字符数：1 aaa.c,单词数：0 aaa.c,行数：2 aaa.c,代码行/空行/注释行：0/2/0 file.c,字符数：186 file.c,单词数：16 file.c,行数：4 file.c,代码行/空行/注释行：4/0/0	file2.c,字符数：1 file2.c,单词数：0 file2.c,行数：2 file2.c,代码行/空行/注释行：0/2/0 aaa.c,字符数：1 aaa.c,单词数：0 aaa.c,行数：2 aaa.c,代码行/空行/注释行：0/2/0 file.c,字符数：186 file.c,单词数：16 file.c,行数：4 file.c,代码行/空行/注释行：4/0/0
wc.exe -w -l -c -o output.txt	output.txt	output.txt
wc.exe -w -l -c -a -e stopList.txt	file2.c,字符数：1 file2.c,单词数：0 file2.c,行数：2 file2.c,代码行/空行/注释行：0/2/0 aaa.c,字符数：1 aaa.c,单词数：0 aaa.c,行数：2 aaa.c,代码行/空行/注释行：0/2/0 file.c,字符数：186 file.c,单词数：14 file.c,行数：4 file.c,代码行/空行/注释行：4/0/0	file2.c,字符数：1 file2.c,单词数：0 file2.c,行数：2 file2.c,代码行/空行/注释行：0/2/0 aaa.c,字符数：1 aaa.c,单词数：0 aaa.c,行数：2 aaa.c,代码行/空行/注释行：0/2/0 file.c,字符数：186 file.c,单词数：14 file.c,行数：4 file.c,代码行/空行/注释行：4/0/0
wc.exe -w -l -c -a -s *.c -e stopList.txt -o output2.txt	output2.txt

output.txt内容如下：

file2.c,字符数：1
file2.c,行数：2
file2.c,单词数：0
file2.c,代码行/空行/注释行：0/2/0
aaa.c,字符数：0
aaa.c,行数：0
aaa.c,单词数：0
aaa.c,代码行/空行/注释行：0/0/0
file.c,字符数：186
file.c,行数：4
file.c,单词数：14
file.c,代码行/空行/注释行：4/0/0

读写文件部分其实是高风险代码，但是也都有try,catch和exception的处理。

测试脚本

test.bat

#(test1.c是空文件)
wc.exe -w -l -c test\test1.c
#(test2.c是单字符文件)
wc.exe -w -l -c test\test2.c
#(test3.c是全为符号的字符文件)
wc.exe -w -l -c -a test\test3.c
#(test4.c是普通的代码文件)
wc.exe -w -l -c test\test4.c -e stopList.txt
wc.exe -w -l -c -a test\test4.c -o output1.txt
wc.exe -w -l -c -a test\test4.c -e stopList.txt -o output2.txt
wc.exe -w -l -c -a -s test\*.c -e stopList.txt -o output3.txt

参考文献链接

[1] http://tool.oschina.net/uploads/apidocs/jquery/regexp.html

[2] http://blog.csdn.net/sunkun2013/article/details/13167099

posted @ 2018-03-21 23:38 LarryYang97 阅读(146) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

LarryYang97