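BUAA Object-Oriented Design and Construction 2021: Unit 1 Assignment Summary
I. Program Characteristics Analysis
The assignments in this unit implement a program that differentiates polynomials, supporting the four arithmetic operations, exponentiation, trigonometric functions, nested expressions, and detection of malformed input.
1. Complexity Analysis
The complexity matrices are shown below. Apart from utility methods (parsing constants, generating the reverse Polish expression, building the expression tree) and utility classes (format checking, expression parsing), the core architecture contains no particularly complex methods or classes.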
Method | CogC | ev(G) | iv(G) | v(G) |
---|---|---|---|---|
derivative.Main.main(String[]) | 1 | 1 | 2 | 2 |
derivative.atom.Constant.Constant(BigInteger) | 0 | 1 | 1 | 1 |
derivative.atom.Constant.add(Constant) | 0 | 1 | 1 | 1 |
derivative.atom.Constant.getValue() | 0 | 1 | 1 | 1 |
derivative.atom.Constant.multiply(Constant) | 0 | 1 | 1 | 1 |
derivative.atom.Constant.takeDerivative() | 0 | 1 | 1 | 1 |
derivative.atom.Constant.toString() | 0 | 1 | 1 | 1 |
derivative.atom.Variable.takeDerivative() | 0 | 1 | 1 | 1 |
derivative.atom.Variable.toString() | 0 | 1 | 1 | 1 |
derivative.compound.Add.Add(Compound, Compound) | 0 | 1 | 1 | 1 |
derivative.compound.Add.takeDerivative() | 0 | 1 | 1 | 1 |
derivative.compound.Add.toString() | 3 | 4 | 3 | 4 |
derivative.compound.Cosine.Cosine(Compound) | 0 | 1 | 1 | 1 |
derivative.compound.Cosine.takeDerivative() | 0 | 1 | 1 | 1 |
derivative.compound.Cosine.toString() | 2 | 2 | 1 | 4 |
derivative.compound.Multiply.Multiply(Compound, Compound) | 0 | 1 | 1 | 1 |
derivative.compound.Multiply.takeDerivative() | 0 | 1 | 1 | 1 |
derivative.compound.Multiply.toString() | 5 | 5 | 5 | 6 |
derivative.compound.Negate.Negate(Compound) | 0 | 1 | 1 | 1 |
derivative.compound.Negate.takeDerivative() | 0 | 1 | 1 | 1 |
derivative.compound.Negate.toString() | 4 | 4 | 3 | 6 |
derivative.compound.Power.Power(Compound, Constant) | 0 | 1 | 1 | 1 |
derivative.compound.Power.takeDerivative() | 0 | 1 | 1 | 1 |
derivative.compound.Power.toString() | 2 | 3 | 3 | 3 |
derivative.compound.Sine.Sine(Compound) | 0 | 1 | 1 | 1 |
derivative.compound.Sine.takeDerivative() | 0 | 1 | 1 | 1 |
derivative.compound.Sine.toString() | 2 | 2 | 1 | 4 |
derivative.utility.ExpressionParser.createTree(String) | 9 | 1 | 5 | 12 |
derivative.utility.ExpressionParser.getRpnFrom(String) | 8 | 1 | 4 | 7 |
derivative.utility.ExpressionParser.operator2rank(char) | 1 | 11 | 11 | 11 |
derivative.utility.ExpressionParser.orderBetween(char, char) | 0 | 1 | 1 | 1 |
derivative.utility.ExpressionParser.parse(String) | 0 | 1 | 1 | 1 |
derivative.utility.ExpressionParser.preProcess(String) | 0 | 1 | 1 | 1 |
derivative.utility.ExpressionParser.scanToEndOfInt(IterableString) | 0 | 1 | 1 | 1 |
derivative.utility.FormatChecker.check(String) | 2 | 3 | 1 | 3 |
derivative.utility.FormatChecker.checkConstant(IterableString) | 10 | 4 | 6 | 7 |
derivative.utility.FormatChecker.checkExponent(IterableString) | 2 | 3 | 1 | 3 |
derivative.utility.FormatChecker.checkExpression(IterableString) | 4 | 1 | 3 | 5 |
derivative.utility.FormatChecker.checkExpressionFactor(IterableString) | 2 | 3 | 1 | 3 |
derivative.utility.FormatChecker.checkFactor(IterableString) | 4 | 1 | 5 | 5 |
derivative.utility.FormatChecker.checkPowerFunction(IterableString) | 2 | 2 | 2 | 3 |
derivative.utility.FormatChecker.checkTerm(IterableString) | 3 | 1 | 3 | 4 |
derivative.utility.FormatChecker.checkTrigFunction(IterableString) | 5 | 4 | 3 | 6 |
derivative.utility.FormatChecker.checkVariable(IterableString) | 2 | 1 | 2 | 2 |
derivative.utility.FormatChecker.checkWhiteSpace(IterableString) | 4 | 3 | 3 | 4 |
derivative.utility.IterableString.IterableString(String) | 0 | 1 | 1 | 1 |
derivative.utility.IterableString.current() | 2 | 2 | 2 | 2 |
derivative.utility.IterableString.hasNext() | 0 | 1 | 1 | 1 |
derivative.utility.IterableString.iterator() | 0 | 1 | 1 | 1 |
derivative.utility.IterableString.next() | 1 | 1 | 2 | 2 |
derivative.utility.IterableString.previous() | 0 | 1 | 1 | 1 |
derivative.utility.IterableString.remaining() | 0 | 1 | 1 | 1 |
derivative.utility.IterableString.skipCharsBy(int) | 0 | 1 | 1 | 1 |
derivative.utility.IterableString.startsWith(String) | 0 | 1 | 1 | 1 |
derivative.utility.IterableString.toString() | 0 | 1 | 1 | 1 |
derivative.utility.WrongFormatException.WrongFormatException() | 0 | 1 | 1 | 1 |
Class | OCavg | OCmax | WMC |
---|---|---|---|
derivative.Atom | n/a | n/a | 0 |
derivative.Compound | n/a | n/a | 0 |
derivative.Main | 1.00 | 1 | 1 |
derivative.atom.Constant | 1.00 | 1 | 6 |
derivative.atom.Variable | 1.00 | 1 | 2 |
derivative.compound.Add | 2.00 | 4 | 6 |
derivative.compound.Cosine | 1.33 | 2 | 4 |
derivative.compound.Multiply | 2.33 | 5 | 7 |
derivative.compound.Negate | 2.00 | 4 | 6 |
derivative.compound.Power | 1.67 | 3 | 5 |
derivative.compound.Sine | 1.33 | 2 | 4 |
derivative.utility.ExpressionParser | 5.29 | 13 | 37 |
derivative.utility.FormatChecker | 3.18 | 5 | 35 |
derivative.utility.IterableString | 1.20 | 2 | 12 |
derivative.utility.Operator | n/a | n/a | 0 |
derivative.utility.WrongFormatException | 1.00 | 1 | 1 |
Package | v(G)avg | v(G)tot |
---|---|---|
derivative | 2.00 | 2 |
derivative.atom | 1.00 | 8 |
derivative.compound | 2.17 | 39 |
derivative.utility | 3.17 | 92 |
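2. Code Size Analysis
The code size matrices are shown below. Apart from the utility classes for format checking and expression parsing, only the `toString` methods of some classes in the core architecture are relatively long, and these are also where most of the bugs appeared. This is in fact a consequence of the architecture design, as discussed later.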
Method | CLOC | JLOC | LOC | NCLOC | RLOC |
---|---|---|---|---|---|
derivative.Compound.takeDerivative() | 0 | n/a | n/a | 2 | 33.33% |
derivative.Compound.toString() | 0 | n/a | n/a | 2 | 33.33% |
derivative.Derivable.takeDerivative() | 0 | n/a | n/a | 1 | 33.33% |
derivative.Main.main(String[]) | 0 | 0 | 11 | 11 | 84.62% |
derivative.atom.Constant.Constant(BigInteger) | 0 | 0 | 3 | 3 | 13.04% |
derivative.atom.Constant.add(Constant) | 0 | 0 | 3 | 3 | 13.04% |
derivative.atom.Constant.getValue() | 0 | 0 | 3 | 3 | 13.04% |
derivative.atom.Constant.multiply(Constant) | 0 | 0 | 3 | 3 | 13.04% |
derivative.atom.Constant.takeDerivative() | 0 | 0 | 4 | 4 | 17.39% |
derivative.atom.Constant.toString() | 0 | 0 | 4 | 4 | 17.39% |
derivative.atom.Variable.takeDerivative() | 0 | 0 | 4 | 4 | 40.00% |
derivative.atom.Variable.toString() | 0 | 0 | 4 | 4 | 40.00% |
derivative.compound.Add.Add(Compound, Compound) | 0 | 0 | 4 | 4 | |
derivative.compound.Add.takeDerivative() | 0 | 0 | 4 | 4 | 16.00% |
derivative.compound.Add.toString() | 0 | 0 | 13 | 13 | 52.00% |
derivative.compound.Cosine.Cosine(Compound) | 0 | 0 | 3 | 3 | 17.65% |
derivative.compound.Cosine.takeDerivative() | 0 | 0 | 4 | 4 | 23.53% |
derivative.compound.Cosine.toString() | 0 | 0 | 7 | 7 | 41.18% |
derivative.compound.Multiply.Multiply(Compound, Compound) | 0 | 0 | 4 | 4 | 14.29% |
derivative.compound.Multiply.takeDerivative() | 0 | 0 | 5 | 5 | 17.86% |
derivative.compound.Multiply.toString() | 0 | 0 | 15 | 15 | 53.57% |
derivative.compound.Negate.Negate(Compound) | 0 | 0 | 3 | 3 | 12.50% |
derivative.compound.Negate.takeDerivative() | 0 | 0 | 4 | 4 | 16.67% |
derivative.compound.Negate.toString() | 0 | 0 | 14 | 14 | 58.33% |
derivative.compound.Power.Power(Compound, Constant) | 0 | 0 | 4 | 4 | 16.00% |
derivative.compound.Power.takeDerivative() | 0 | 0 | 7 | 7 | 28.00% |
derivative.compound.Power.toString() | 0 | 0 | 10 | 10 | 40.00% |
derivative.compound.Sine.Sine(Compound) | 0 | 0 | 3 | 3 | 17.65% |
derivative.compound.Sine.takeDerivative() | 0 | 0 | 4 | 4 | 23.53% |
derivative.compound.Sine.toString() | 0 | 0 | 7 | 7 | 41.18% |
derivative.utility.ExpressionParser.createTree(String) | 0 | 0 | 53 | 53 | 34.64% |
derivative.utility.ExpressionParser.getRpnFrom(String) | 11 | 0 | 35 | 33 | 22.88% |
derivative.utility.ExpressionParser.operator2rank(char) | 1 | 0 | 15 | 15 | 9.80% |
derivative.utility.ExpressionParser.orderBetween(char, char) | 1 | 0 | 3 | 3 | 1.96% |
derivative.utility.ExpressionParser.parse(String) | 0 | 0 | 4 | 4 | 2.61% |
derivative.utility.ExpressionParser.preProcess(String) | 0 | 0 | 19 | 19 | 12.42% |
derivative.utility.ExpressionParser.scanToEndOfInt(IterableString) | 0 | 0 | 6 | 6 | 3.92% |
derivative.utility.FormatChecker.check(String) | 0 | 0 | 10 | 10 | 5.32% |
derivative.utility.FormatChecker.checkConstant(IterableString) | 3 | 3 | 24 | 21 | 12.77% |
derivative.utility.FormatChecker.checkExponent(IterableString) | 4 | 3 | 13 | 10 | 6.91% |
derivative.utility.FormatChecker.checkExpression(IterableString) | 4 | 4 | 19 | 15 | 10.11% |
derivative.utility.FormatChecker.checkExpressionFactor(IterableString) | 0 | 0 | 9 | 9 | 4.79% |
derivative.utility.FormatChecker.checkFactor(IterableString) | 3 | 3 | 13 | 10 | 6.91% |
derivative.utility.FormatChecker.checkPowerFunction(IterableString) | 4 | 3 | 12 | 9 | 6.38% |
derivative.utility.FormatChecker.checkTerm(IterableString) | 3 | 3 | 18 | 15 | 9.57% |
derivative.utility.FormatChecker.checkTrigFunction(IterableString) | 4 | 4 | 24 | 20 | 12.77% |
derivative.utility.FormatChecker.checkVariable(IterableString) | 3 | 3 | 10 | 7 | 5.32% |
derivative.utility.FormatChecker.checkWhiteSpace(IterableString) | 0 | 0 | 8 | 8 | 4.26% |
derivative.utility.IterableString.IterableString(String) | 0 | 0 | 3 | 3 | 6.38% |
derivative.utility.IterableString.current() | 0 | 0 | 7 | 7 | 14.89% |
derivative.utility.IterableString.hasNext() | 0 | 0 | 4 | 4 | 8.51% |
derivative.utility.IterableString.iterator() | 0 | 0 | 4 | 4 | 8.51% |
derivative.utility.IterableString.next() | 0 | 0 | 8 | 8 | 17.02% |
derivative.utility.IterableString.previous() | 0 | 0 | 3 | 3 | 6.38% |
derivative.utility.IterableString.remaining() | 0 | 0 | 3 | 3 | 6.38% |
derivative.utility.IterableString.skipCharsBy(int) | 0 | 0 | 4 | 4 | 8.51% |
derivative.utility.IterableString.startsWith(String) | 0 | 0 | 3 | 3 | 6.38% |
derivative.utility.IterableString.toString() | 0 | 0 | 4 | 4 | 8.51% |
derivative.utility.WrongFormatException.WrongFormatException() | 0 | 0 | 3 | 3 | 60.00% |
Class | CLOC | JLOC | LOC |
---|---|---|---|
derivative.Atom | 0 | 0 | 2 |
derivative.Compound | 0 | 0 | 6 |
derivative.Main | 0 | 0 | 13 |
derivative.atom.Constant | 0 | 0 | 23 |
derivative.atom.Variable | 0 | 0 | 10 |
derivative.compound.Add | 0 | 0 | 25 |
derivative.compound.Cosine | 0 | 0 | 17 |
derivative.compound.Multiply | 0 | 0 | 28 |
derivative.compound.Negate | 0 | 0 | 24 |
derivative.compound.Power | 0 | 0 | 25 |
derivative.compound.Sine | 0 | 0 | 17 |
derivative.utility.ExpressionParser | 28 | 3 | 153 |
derivative.utility.FormatChecker | 53 | 51 | 188 |
derivative.utility.IterableString | 0 | 0 | 47 |
derivative.utility.Operator | 0 | 0 | 3 |
derivative.utility.WrongFormatException | 0 | 0 | 5 |
Interface | CLOC | JLOC | LOC | NCLOC |
---|---|---|---|---|
derivative.Derivable | 0 | 0 | 3 | 4 |
Package | CLOC | CLOC(rec) | JLOC | JLOC(rec) | LOC | LOC(rec) | LOCp | LOCp(rec) | LOCt | LOCt(rec) | NCLOC | NCLOCp | NCLOCp(rec) | NCLOCt |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
total | n/a | 27 | n/a | 54 | n/a | 637 | n/a | 637 | n/a | 0 | n/a | n/a | 634 | n/a |
derivative | 0 | 27 | 0 | 54 | 31 | 637 | 31 | 637 | 0 | 0 | 31 | 31 | 634 | 0 |
derivative.atom | 0 | 0 | 0 | 0 | 39 | 39 | 39 | 39 | 0 | 0 | 39 | 39 | 39 | 0 |
derivative.compound | 0 | 0 | 0 | 0 | 151 | 151 | 151 | 151 | 0 | 0 | 151 | 151 | 151 | 0 |
derivative.utility | 0 | 27 | 54 | 54 | 416 | 416 | 416 | 416 | 0 | 0 | 413 | 413 | 413 | 0 |
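3. Architecture and Dependency Analysis
This is the architecture of my Assignment 3, which is essentially the same as that of Assignment 2. It largely follows the principle of high cohesion and low coupling, and there is essentially no "long-arm jurisdiction" (one class reaching deep into the internals of another).
II. Refactoring Experience and Reflections
1. Controlling Complexity, Method One: Abstraction
In Assignment 1, I tried complex regular expressions with a simple design. My understanding of a regular expression is that it is a complex rule described with abstract symbols meant for humans to read. The simple but poorly extensible design was: a factor class storing the sign, coefficient, and exponent, and an expression class storing all factors in a `TreeMap` (a red-black tree that keeps its keys ordered automatically). If two adjacent items are joined by multiplication, they are computed directly; if they are joined by addition, they are combined according to their exponents. But once a variable is no longer a plain power function, this design has no way to cope.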
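To make the Assignment 1 idea concrete, here is a minimal sketch of a `TreeMap`-backed polynomial. It is a simplification, not the original code: `PolynomialSketch`, `addTerm`, and the exponent-to-coefficient mapping are assumptions chosen for illustration, and the separate factor class described above is folded into the map entries.

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a polynomial stored as exponent -> coefficient.
class PolynomialSketch {
    // TreeMap keeps exponents sorted; merging on the key combines like terms.
    private final TreeMap<BigInteger, BigInteger> terms = new TreeMap<>();

    void addTerm(BigInteger coefficient, BigInteger exponent) {
        terms.merge(exponent, coefficient, BigInteger::add);
    }

    // Power rule: d/dx (c * x^n) = (c * n) * x^(n - 1); constants vanish.
    PolynomialSketch takeDerivative() {
        PolynomialSketch result = new PolynomialSketch();
        for (Map.Entry<BigInteger, BigInteger> term : terms.entrySet()) {
            BigInteger exponent = term.getKey();
            if (exponent.signum() != 0) {
                result.addTerm(term.getValue().multiply(exponent),
                        exponent.subtract(BigInteger.ONE));
            }
        }
        return result;
    }

    @Override
    public String toString() {
        return terms.toString();
    }
}
```

The sketch also shows the limitation noted above: every term must fit the shape coefficient times a power of x, so a factor such as sin(x) has nowhere to go.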
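In Assignment 2, I came to my senses and rebuilt everything from scratch. My attempt at one giant regular expression failed, because Java regular expressions support neither recursion nor manipulation of the regex stack (I tried writing it and despaired), and parsing manually with regex plus a stack was still complicated.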
static final String FACTOR_REGEX =
        "(?<mulSign>\\*)?" +
        "(?:(?<coe>[+-]?\\d+)|" +
        "(?:(?<fSign>[+-]?)(?:x|(?<trig>sin|cos)\\(x\\))(?:\\*\\*(?<pow>[+-]?\\d+))?)|" +
        "(?<subExpr>(?<subExprSign>[+-]?)\\((?:[^()]|(?:\\(.*\\)))*\\)))";
static final Pattern FACTOR_PATTERN = Pattern.compile(FACTOR_REGEX);
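So I returned to a data-structure solution: use two stacks to parse the string and generate a reverse Polish expression (this step can actually be skipped by building the expression tree directly), then build the expression tree. Since Java cannot use pointers explicitly the way C/C++ can, I wrote an `IterableString` class that implements an iterator and emulates string-pointer operations such as `*p++`: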
import java.util.Iterator;

public class IterableString implements Iterator<Character>, Iterable<Character> {
    private final String value;
    private int cursor = 0;

    public IterableString(String value) {
        this.value = value;
    }

    @Override
    public boolean hasNext() {
        return cursor < value.length();
    }

    // Returns the current character and advances the cursor, like *p++ in C.
    @Override
    public Character next() {
        if (!hasNext()) {
            // Running past the end means the input was malformed.
            System.out.println("WRONG FORMAT!");
            System.exit(0);
        }
        return value.charAt(cursor++);
    }

    @Override
    public Iterator<Character> iterator() {
        return this;
    }
}
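As for the parsing step itself, the infix-to-RPN conversion mentioned above can be sketched with the classic shunting-yard algorithm. This is only an illustration and may differ from the actual `getRpnFrom`: the class `RpnSketch`, the single-character `^` standing in for `**`, the reduced operator set, and the assumption that the input has already passed format checking are all choices made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: infix string -> RPN token list using an operator stack.
class RpnSketch {
    // Assumed precedence: '^' binds tighter than '*', which binds tighter than '+'.
    static int rank(char op) {
        switch (op) {
            case '+': return 1;
            case '*': return 2;
            case '^': return 3;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    // Assumes well-formed input made of digits, 'x', '+', '*', '^' and parentheses.
    static List<String> toRpn(String input) {
        List<String> output = new ArrayList<>();
        Deque<Character> operators = new ArrayDeque<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                // Scan a whole integer literal as one token.
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                output.add(input.substring(start, i));
                continue;
            }
            if (c == 'x') {
                output.add("x");
            } else if (c == '(') {
                operators.push(c);
            } else if (c == ')') {
                // Flush operators back to the matching '(' and discard it.
                while (operators.peek() != '(') {
                    output.add(String.valueOf(operators.pop()));
                }
                operators.pop();
            } else {
                // Pop operators that bind tighter, or equally for left-associative ones;
                // '^' is treated as right-associative.
                while (!operators.isEmpty() && operators.peek() != '('
                        && (rank(operators.peek()) > rank(c)
                        || (rank(operators.peek()) == rank(c) && c != '^'))) {
                    output.add(String.valueOf(operators.pop()));
                }
                operators.push(c);
            }
            i++;
        }
        while (!operators.isEmpty()) {
            output.add(String.valueOf(operators.pop()));
        }
        return output;
    }
}
```

For example, `RpnSketch.toRpn("(1+x)*x^2")` yields the tokens `1 x + x 2 ^ *`. A second stack of tree nodes can then consume these tokens, pushing operands and combining the top two nodes on each operator, to build the expression tree.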
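My deepest takeaway is that the core of object orientation is abstraction: classes and relationships must be abstracted from the problem definition. This led to the architecture shown in the figure above. In a language that supports operator overloading, the code could be more abstract still. For example, with `*` and `+` defined, the derivative function in the multiplication class could be written as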
Add takeDerivative() {
    return leftValue->takeDerivative() * rightValue + leftValue * rightValue->takeDerivative();
}
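Without operator overloading, the same product rule can be written in Java by constructing nodes explicitly. The following is a minimal sketch, not the original source: the class names echo those in the metrics tables, but the bodies, the `long` constructor of `Constant`, and the naive `toString` methods (no simplification at all) are assumptions.

```java
import java.math.BigInteger;

// Hypothetical sketch of a binary expression tree with symbolic differentiation.
class DerivativeSketch {
    interface Derivable {
        Derivable takeDerivative();
    }

    static final class Constant implements Derivable {
        final BigInteger value;

        Constant(long value) {
            this.value = BigInteger.valueOf(value);
        }

        @Override
        public Derivable takeDerivative() {
            return new Constant(0); // c' = 0
        }

        @Override
        public String toString() {
            return value.toString();
        }
    }

    static final class Variable implements Derivable {
        @Override
        public Derivable takeDerivative() {
            return new Constant(1); // x' = 1
        }

        @Override
        public String toString() {
            return "x";
        }
    }

    static final class Add implements Derivable {
        final Derivable left;
        final Derivable right;

        Add(Derivable left, Derivable right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public Derivable takeDerivative() {
            // (u + v)' = u' + v'
            return new Add(left.takeDerivative(), right.takeDerivative());
        }

        @Override
        public String toString() {
            return "(" + left + "+" + right + ")";
        }
    }

    static final class Multiply implements Derivable {
        final Derivable left;
        final Derivable right;

        Multiply(Derivable left, Derivable right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public Derivable takeDerivative() {
            // (u * v)' = u' * v + u * v'
            return new Add(new Multiply(left.takeDerivative(), right),
                    new Multiply(left, right.takeDerivative()));
        }

        @Override
        public String toString() {
            return left + "*" + right;
        }
    }
}
```

Differentiating `new Multiply(new Variable(), new Variable())` prints as `(1*x+x*1)`, which is correct but unsimplified. That gap is exactly where simplification logic tends to pile up in `toString`.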
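The flaw in my architecture is that, because it uses no containers, simplification is inconvenient. I only did the necessary simplification in the `toString` methods, which not only may have increased the logical complexity but also made it easy to produce output that violates the required format.
In Assignment 3, I tried to finish smoothly with the least effort. Malformed inputs are hard to enumerate, whereas the problem statement already gives a formal description of the correct format, and the run-time and memory limits are fairly loose. So I decided to keep the original architecture unchanged and only add a utility class that checks the format first, by recursive descent. If the format is wrong, it throws an exception or terminates; if the format is correct, the original string is passed unchanged to the state machine for parsing. Recursive descent is like a state machine written as functions, and it is not hard.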
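A recursive-descent checker can be sketched as one function per grammar rule. The grammar below is a deliberately reduced assumption (no whitespace, signs on constants, exponents, or trigonometric functions), `FormatCheckSketch` is a hypothetical name, and the sketch returns a boolean where the actual checker throws `WrongFormatException`.

```java
// Hypothetical sketch of a recursive-descent format checker for the grammar
//   expression := term (('+' | '-') term)*
//   term       := factor ('*' factor)*
//   factor     := 'x' | digits | '(' expression ')'
class FormatCheckSketch {
    private final String text;
    private int pos = 0;

    private FormatCheckSketch(String text) {
        this.text = text;
    }

    // Valid only if an expression matches and consumes the whole input.
    static boolean isValid(String text) {
        FormatCheckSketch checker = new FormatCheckSketch(text);
        return checker.expression() && checker.pos == text.length();
    }

    // Consumes the expected character if it is next; otherwise leaves pos unchanged.
    private boolean accept(char expected) {
        if (pos < text.length() && text.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private boolean expression() {
        if (!term()) {
            return false;
        }
        while (accept('+') || accept('-')) {
            if (!term()) {
                return false;
            }
        }
        return true;
    }

    private boolean term() {
        if (!factor()) {
            return false;
        }
        while (accept('*')) {
            if (!factor()) {
                return false;
            }
        }
        return true;
    }

    private boolean factor() {
        if (accept('x')) {
            return true;
        }
        if (accept('(')) {
            return expression() && accept(')');
        }
        int start = pos;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) {
            pos++;
        }
        return pos > start; // at least one digit
    }
}
```

Each method mirrors one rule of the formal description, which is why this approach maps so directly onto a grammar given in the problem statement. For instance, `FormatCheckSketch.isValid("x*(x+1)")` is true, while `"x*"` and `"(x"` are rejected.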
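2. Overall Impressions
A large share of the time in this unit went into thinking about and trying out ways to parse strings, which is procedural work. Regular expressions? A state machine? Or learn recursive descent? Still, my object-oriented design and construction skills did improve. Corners cut early must be paid for later: flaws in the initial design seriously hurt code reuse and extensibility. Fortunately, my C data-structure code from back then was written in a decent style, so during refactoring I could translate it directly. A better architecture, in turn, lets you be lazy on a larger scale: for example, when the run-time and memory limits are loose and adding one layer is enough to meet a new requirement, there is no need to refactor.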
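III. Problems Found in Mutual Testing
1. Bugs in My Own Program
In Assignment 1, repeated signs such as `+-` would have made the regular expression more complex, so the program replaced them before parsing the expression. But because I misunderstood the problem statement and did not realize that `-+-` is legal, the program had a bug. In this situation, neither hand-written nor automatically generated test cases could reveal the bug, because it stemmed from a misreading of the requirements. Moreover, the automatic tester generates cases randomly; although the weights of the different combinations can be tuned, it still lacks focus. The automatic tester is written in Python and takes the output of the `sympy` library as the reference result.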
import subprocess
from random import randint

import sympy

NEG_HUGE_NUM = -31415926535897932384626433
HUGE_NUM = 31415926535897932384626433
JAR_PATH = "derivative.jar"
x = sympy.symbols("x")
start_symbols = ("", "+", "-")
symbols = ("+", "-", "*")


def gen_factor(option):
    # Options 3 and above all produce a small constant, which weights that case.
    if option == 0:
        return str(num_tup[randint(0, 1)])
    elif option == 1:
        return "x**" + str(num_tup[randint(0, 1)])
    elif option == 2:
        return "x"
    else:
        return str(randint(-1, 1))


cnt = 0
while True:
    cnt += 1
    num = randint(NEG_HUGE_NUM, HUGE_NUM)
    num_tup = (-num, num)  # read as a global by gen_factor
    original_expr = start_symbols[randint(0, 2)]
    for _ in range(20):
        original_expr += gen_factor(randint(0, 5)) + symbols[randint(0, 2)]
    original_expr += gen_factor(randint(0, 5))
    print(f"ORIGINAL: {original_expr}")
    master_expr = sympy.diff(original_expr, x)
    print(f"MASTER: {master_expr}")
    # Pass the command as a list so it also runs without a shell.
    query_expr = subprocess.run(
        ["java", "-jar", JAR_PATH],
        input=original_expr, stdout=subprocess.PIPE, encoding='utf-8').stdout.strip()
    print(f"QUERY: {query_expr}")
    if master_expr.equals(query_expr):
        print(f"Test case {cnt} passed.\n\n")
    else:
        print(f"Your output on test case {cnt} is wrong.\n\n")
        input("Press Enter to continue...")  # portable replacement for the Windows-only "pause"
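In Assignment 2, the priorities of negation and exponentiation were reversed in the operator precedence table, and the output logic was not thought through, so multiplication printed strings such as `*-x`, which were judged as format errors. In addition, in many places the return value of a function call was not stored in a temporary variable, which caused repeated computation, and a few test cases timed out.
In Assignment 3, calling the `next` method during format handling without checking the termination condition caused an index out of bounds, and multiplication printed strings such as `*-sin` and `*-cos`, which were judged as format errors. Since the logic I had put into the `toString` methods was still incomplete, the attackers' data was evidently carefully constructed; of course, this also shows that the code was readable enough for attackers to find the holes easily. Most importantly, it confirms what was said earlier: methods with a lot of code are prone to bugs. It reminds me to design more convenient architectures in future work and to control complexity further.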
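One way to avoid output such as `*-sin` is to parenthesize any operand that begins with a sign before joining it with `*`. The sketch below is not the fix that was actually applied; it assumes that a parenthesized sub-expression is always accepted as a factor, and `ProductPrintSketch`, `asFactor`, and `product` are hypothetical names. Operands are plain strings here for brevity rather than tree nodes.

```java
// Hypothetical sketch: guard the operands of a product against a leading sign.
class ProductPrintSketch {
    // Wraps the operand in parentheses when it starts with '+' or '-'.
    static String asFactor(String operand) {
        boolean signed = operand.startsWith("-") || operand.startsWith("+");
        return signed ? "(" + operand + ")" : operand;
    }

    static String product(String left, String right) {
        return asFactor(left) + "*" + asFactor(right);
    }
}
```

With this guard, `product("x", "-sin(x)")` gives `x*(-sin(x))` instead of `x*-sin(x)`. It only covers the leading-sign case; an operand that is an unparenthesized sum would need the same treatment.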
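2. Bugs and Problems in Others' Programs
I usually start by roughly guessing the other person's level from their code style. Elegant code most easily makes me want to learn from it, while ugly code most easily makes me want to attack it. I usually test the simplest inputs first; in Assignment 2 I found that one classmate did no simplification at all, so as soon as there were many parentheses the output exceeded ten thousand characters. If the simple inputs pass, I then test with automatically generated data.
3. Controlling Complexity, Method Two: Consistent Code Style
I found that many classmates' code has style problems, especially odd names. For example: abbreviations, or pinyin in names or comments, so that readers cannot grasp the meaning at a glance; verbs used as class names, which feels awkward; even meaningless names, which, though I can understand the mood behind them, are meaningless all the same; a `MainClass` holding too much, when those static methods could be moved into a separate factory class; and so on. I feel these bad habits mostly formed in the first year, when we first learned C programming. C programs are hard to read to begin with, but the instructors did not stress style much at the time, so the habits have been carried to this day. Large projects, in fact, all have good code style. Git, for example, is written in C, yet its style is surely very different from that of many classmates.
If it were written like the following, would it feel the same?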
int main(int argc, const char **argv) {
    init_clk_trace();
    stdfds_sanit();
    default_sigpipe();
    exe_dir(argv[0]);
    gettext_setup();
    repo_init();
    start_attr();
    init_trace();
    cmd_start_trace(argv);
    proc_info_collect_trace(0);
    int res = cmd(argc, argv);
    exit_cmd_trace(res);
    return res;
}
Programs must be written for people to read, and only incidentally for machines to execute.
——Harold Abelson
If you don’t know what a thing should be called, you cannot know what it is.
If you don’t know what it is, you cannot sit down and write the code.
——Sam Gardiner
Let us encourage one another.
Note: this article merges the 5 required parts into 3.