记一次遗留代码的重构改造:数十万行国家标准坐标文件分析方法的改造与提速
重构这个词我们经常不停的提到,但是每次遇到的时候,总觉得无处下手,怕改坏了得不偿失。本周我进行了一次重构,并从中总结了部分经验。
事情从周一说起,我收到一个需求:公司的维护项目耕地保护监管平台中的导入地块坐标文件模块速度太慢,平均需要数分钟,较大的文件甚至需要十几分钟,严重影响用户体验。
一般拿到这样的需求之后,第一反应就时查看源码,通过调试找到慢的位置,优化代码结构。于是我按照管理查看了相关源码,发现原先负责的同事的大致思路是:将文件读取到流中,并循环遍历文件中的每一行坐标进行处理,拼接等操作。我就在想,是不是文件行数太多,循环10万+本身比较慢呢?如果是的,涉及到数据机构,改起来可就有些棘手了。光想没有用,要实验。我找了一个20万+行数的txt,然后直接读取文件流逐行循环,程序很快就结束了,只花了不到2s。那就是我们自己的问题了,每次循环过程中做的事情太慢:质变产生量变,最终导致速度的恐怖下降。
部坐标文件格式如下(代表两个地块,第一个地块有一个挖空区。):
[属性描述] 格式版本号=1.0 数据产生单位=国土资源部 数据产生日期=2019-12-25 坐标系=中国2000国家大地坐标系 几度分带=3 投影类型=高斯克吕格 计量单位=米 带号=37 精度=0.01 转换参数=,,,,,, [地块坐标] 906,44.07,1,湘乡市月山镇、白田镇、育塅乡、金薮乡高标准农田建设项目1,面,,农业综合开发,,@ J0001,1,3093358.43,37636109.39 J0002,1,3093331.62,37636110.99 J0003,1,3093330.65,37636104.15 J0004,1,3093324.79,37636089.83 J0005,1,3093320.56,37636083.32 J0001,1,3093358.43,37636109.39 J0001,2,3092589.84,37635980.86 J0002,2,3092591.65,37635973.27 J0003,2,3092590.57,37635965.67 J0004,2,3092585.14,37635957.72 J0013,2,3092571.04,37635983.03 J0014,2,3092579.36,37635984.48 J0015,2,3092586.23,37635983.76 J0001,2,3092589.84,37635980.86 125,3.49,3,湘乡市月山镇、白田镇、育塅乡、金薮乡高标准农田建设项目3,面,,农业综合开发,,@ J0001,1,3091162.58,37629834.86 J0002,1,3091158.84,37629835.69 J0003,1,3091152.92,37629839.93 J0004,1,3091149.27,37629843.58 J0001,1,3091162.58,37629834.86
仔细分析代码,发现每次循环过程中时获取部坐标格式文件中的坐标,并验证,检查,汇总合并,每一步仿佛都不能在少,也没有可精简的循环可言。那能不能不逐行处理,而是将文件中的数百个地块同时处理,既:将整个坐标文件进行截取,采用多线程的思路按不同的项目分别同时进行处理最后再合并成完成的坐标串呢?
理论上来说可行,但是一旦着手开始修改,原来的历史遗留代码立刻放我犯难了。牵一发而动全身。代码循环逻辑比较复杂,通过条件分支判断是否继续循环,方法体超长,全局变量和临时变量过多,变量命名不规范...
原始代码如下:
public static string ResolveDeptFile(StreamReader sr, out string paramJZDH, out string paramDKQH, out string infoMsgParam, out string paramDOT, out string errorMsg, string zbx) { string strPoint = ""; errorMsg = string.Empty; paramJZDH = string.Empty; infoMsgParam = string.Empty; paramDKQH = string.Empty; paramDOT = string.Empty; //一、解析,按照地块取出来 StringBuilder sPointBul = new StringBuilder(); //存点 StringBuilder sDKQH = new StringBuilder(); //地块圈号 StringBuilder sJZDH = new StringBuilder(); //界址点号 StringBuilder sPointList = new StringBuilder(); StringBuilder sRetrunPot = new StringBuilder(); //对于图符号、地类编码、地块名称等信息在这里不分开,一整块保存 StringBuilder sInfoBul = new StringBuilder(); string pointX = ""; string pointY = ""; string temp = sr.ReadLine(); string[] pointlist; int iFlag = 0; try { while (!sr.EndOfStream) { if (temp.EndsWith("@")) { iFlag++; //信息,用其本身的结尾@区分 sInfoBul.Append(temp); string[] checklist = temp.Split(','); if (checklist[2] == "") { errorMsg = "第" + iFlag.ToString() + "个地块的地块编号不存在,请检查坐标文件!"; return ""; } if (checklist[3] == "") { errorMsg = "第" + iFlag.ToString() + "个地块的地块名称不存在,请检查坐标文件!"; return ""; } temp = sr.ReadLine(); pointlist = temp.Split(','); string aaa = pointlist[1]; bool flag = true; sPointList = new StringBuilder(); while (temp != null && temp.Split(',').Length == 4) { pointlist = temp.Split(','); //点 string point = string.Format("{0},{1},{2} ", pointlist[1].Replace(" ", ""), pointlist[2].Replace(" ", ""), pointlist[3].Replace(" ", "")); if (sPointList.ToString().IndexOf(point) == 0) { if (flag) { flag = false; } else { errorMsg = "第" + iFlag.ToString() + "个地块的坐标点(" + point + ")重复,请检查坐标文件!"; return ""; } } else if (sPointList.ToString().IndexOf(point) > 0) { errorMsg = "第" + iFlag.ToString() + "个地块的坐标点(" + point + ")重复,请检查坐标文件!"; return ""; } //界址点 sJZDH.Append(pointlist[0]); sJZDH.Append(","); //地块圈号 if (aaa != pointlist[1]) { sDKQH.Append("@"); sPointBul.Remove(sPointBul.Length - 1, 1); sPointBul.Append("@"); // 圈号 sRetrunPot.Remove(sRetrunPot.Length - 1, 1); sRetrunPot.Append("$"); sPointList.Clear(); flag = true; } sPointBul.Append(pointlist[2].Replace(" ", ""));//替换掉里面的空格,谭正文让改 sPointBul.Append(","); sPointBul.Append(pointlist[3].Replace(" ", "")); sPointBul.Append(" "); sPointList.Append(pointlist[1].Replace(" ", "")); sPointList.Append(","); sPointList.Append(pointlist[2].Replace(" ", "")); sPointList.Append(","); sPointList.Append(pointlist[3].Replace(" ", "")); sPointList.Append(" "); sRetrunPot.Append(pointlist[2].Replace(" ", ""));//替换掉里面的空格,谭正文让改 sRetrunPot.Append(","); sRetrunPot.Append(pointlist[3].Replace(" ", "")); sRetrunPot.Append(" "); pointX = pointlist[2].Replace(" ", ""); pointY = pointlist[3].Replace(" ", ""); aaa = pointlist[1]; sDKQH.Append(pointlist[1]); sDKQH.Append(","); temp = sr.ReadLine(); } sPointBul.Remove(sPointBul.Length - 1, 1); sPointBul.Append("#"); // 地块 sRetrunPot.Remove(sRetrunPot.Length - 1, 1); sRetrunPot.Append("#"); sJZDH.Remove(sJZDH.Length - 1, 1); sJZDH.Append("*"); sDKQH.Remove(sDKQH.Length - 1, 1); sDKQH.Append("*"); } else { temp = sr.ReadLine(); } } sInfoBul.Remove(sInfoBul.Length - 1, 1); sPointBul.Remove(sPointBul.Length - 1, 1); sRetrunPot.Remove(sRetrunPot.Length - 1, 1); sJZDH.Remove(sJZDH.Length - 1, 1); sDKQH.Remove(sDKQH.Length - 1, 1); paramJZDH = sJZDH.ToString(); paramDKQH = sDKQH.ToString(); paramDOT = sPointBul.ToString(); strPoint = sRetrunPot.ToString(); infoMsgParam = sInfoBul.ToString(); strPoint = string.Format("{0}*{1}", strPoint, GetZBX(zbx, pointY)); } catch (Exception ex) { errorMsg = "解析坐标异常:请检查坐标文件内容!"; NLogLogger.Error("解析坐标异常:" + ex.ToString()); mapLog.WriteLog("ex.Message:" + ex.Message); return ""; } return strPoint; }
一股子妥妥的坏代码味道。
鉴于此种情况,我想起了重构的原则之一:单元测试覆盖。只有在单元测试覆盖的情况下, 小步小步的完成修改,并不停的运行单元测试,才能保证自己的修改不会破坏代码原有的结构和逻辑以及功能。但是这个代码,要加上单元测试,还需要做一些准备工作。
首先,我们需要将其复杂的超长的参数列表结构化,并将文件流的创建和关闭推到方法的外部。
将复杂的参数列表进行封装。根据参数要求封装一个类,改造原有的代码,使其返回这个类而不是通过out的方式返回,减少参数传入和干扰。
public class ResolveDeptFileResult { private string _errorMsg = ""; /// <summary> /// 错误信息 /// </summary> public string errorMsg { get => _errorMsg; set => _errorMsg = value; } private string _paramJZDH = ""; /// <summary> /// 界址点集合 /// </summary> public string paramJZDH { get => _paramJZDH; set => _paramJZDH = value; } private string _infoMsgParam = ""; /// <summary> /// 地块信息集合 /// </summary> public string infoMsgParam { get => _infoMsgParam; set => _infoMsgParam = value; } private string _paramDKQH = ""; public string paramDKQH { get => _paramDKQH; set => _paramDKQH = value; } private string _paramDOT = ""; /// <summary> /// 坐标集合(待数据检查格式) /// </summary> public string paramDOT { get => _paramDOT; set => _paramDOT = value; } private string _strPoint = ""; /// <summary> /// 坐标集合(待入库格式) /// </summary> public string strPoint { get => _strPoint; set => _strPoint = value; } /// <summary> /// 根据部坐标格式文件封装的地块信息集合 /// </summary> public List<DotInfoModelWithOneCase> DotInfoModelWithOneCases { get; set; } private string _zbx = ""; /// <summary> /// 坐标系 /// </summary> public string Zbx { get => _zbx; set => _zbx = value; } private string _firstPointY = ""; /// <summary> /// 第一个纵坐标,用来支持带号判断算法 /// </summary> public string firstPointY { get => _firstPointY; set => _firstPointY = value; } }
public static ResolveDeptFileResult ResolveDeptFileFast(StreamReader sr, string zbx) { ResolveDeptFileResult resolveDeptFileResult = new ResolveDeptFileResult() { Zbx = zbx }; string fileInOneLine = sr.ReadLine(); //...... }
此实例中有一个运气比较好的因素是要修改的部分没有涉及到数据库,网络等复杂类和方法,不然按照历史代码的编程风格,要将耦合在里面的类库剥离并依赖注入,将是一个巨大的工程。
其次,准备单元测试代码。为了保证结果与原方法一致以及改造不受原方法的掣肘,我新建了一个方法,将原方法内容复制过来,并在单元测试中对比同一个数据在两种不同方法跑出来的结果和运行效率。并大致准备了5个单元测试。一个正常情况下正确的数百行的坐标文件的单元测试,3个坐标有问题情况下的单元测试,一个是正常情况但是坐标超过10万行的测试。
[TestMethod] [DataRow(@"C:\Users\wenpeng\Desktop\3.txt")] [DataRow(@"C:\Users\wenpeng\Desktop\4.txt")] [DataRow(@"C:\Users\wenpeng\Desktop\5.txt")] [DataRow(@"C:\Users\wenpeng\Desktop\6.txt")] [DataRow(@"C:\Users\wenpeng\Desktop\NZ17高标准农田坐标3910.txt")] public void TestMethod1(string filepath) { Stream objStream = System.IO.File.OpenRead(filepath); StreamReader sr = new StreamReader(objStream, Encoding.GetEncoding("GB2312")); string paramJZDH = string.Empty; string paramDKQH = string.Empty; string infoMsgParam = string.Empty; string errorMsg = string.Empty; string paramdot = string.Empty; string zbx = string.Empty; string dh = string.Empty; Stopwatch stopwatch = new Stopwatch(); Console.WriteLine("开始执行ResolveDeptFileFast"); stopwatch.Start(); //解析部坐标 ResolveDeptFileResult resolveDeptFileResult = ResolveDeptFileFast(sr, zbx); stopwatch.Stop(); Console.WriteLine(string.Format("ResolveDeptFileFast执行完毕,共花费时间{0}s", stopwatch.ElapsedMilliseconds / 1000.0)); paramJZDH = resolveDeptFileResult.paramJZDH; paramDKQH = resolveDeptFileResult.paramDKQH; infoMsgParam = resolveDeptFileResult.infoMsgParam; string strPoints = resolveDeptFileResult.strPoint; paramdot = resolveDeptFileResult.paramDOT; errorMsg = resolveDeptFileResult.errorMsg; sr.BaseStream.Seek(0, SeekOrigin.Begin); Console.WriteLine("开始执行原方法ResolveDeptFile"); stopwatch.Restart(); strPoints = ResolveDeptFile(sr, out paramJZDH, out paramDKQH, out infoMsgParam, out paramdot, out errorMsg, zbx); stopwatch.Stop(); Console.WriteLine(string.Format("原方法ResolveDeptFile执行完毕,共花费时间{0}s", stopwatch.ElapsedMilliseconds / 1000.0)); Assert.AreEqual(resolveDeptFileResult.strPoint, strPoints); Assert.AreEqual(resolveDeptFileResult.paramJZDH, paramJZDH); Assert.AreEqual(resolveDeptFileResult.paramDKQH, paramDKQH); Assert.AreEqual(resolveDeptFileResult.infoMsgParam, infoMsgParam); Assert.AreEqual(resolveDeptFileResult.paramDOT, paramdot); Assert.AreEqual(resolveDeptFileResult.errorMsg, errorMsg); }
接下来就可以放心大胆的修改了,每次完成一次可编译的修改就运行一次单元测试,只要我的修改影响了原来的逻辑、改变了结果,那么单元测试就编译不通过,我就知道肯定是我上次的修改除了问题,只需要仔细检查上次修改改变了什么,无需从头检查,避免整个重构修改功亏一篑。
通过仔细审视原有代码,发现它的循环之所有点、多有点绕,是因为它在逐行循环文件流的同时还在处理坐标传逻辑。并且因为部坐标格式的业务逻辑,制作成了循环嵌套循环再嵌套循环的模式,这种模式下想要增加和删除逻辑比较困难,根本不可能进行预计的多线程改造。
怎么办?我首先想到的是,专心循环文件流,并按照不同的业务块分别存储,循环结束后再对内存中的多个地块的坐标同时处理,最后汇总。
我们先设计一个类,对照了坐标文件中的一个地块,方便将文件流结构化保存。
public class DotInfoModelWithOneCase { public int no { get; set; } public string caseinfo { get; set; } public List<string> dotinfo { get; set; } }
然后循环这个文件,存储到对应的DotInfoModelWithOneCase列表结构中,通过NO保持地块的顺序
public static ResolveDeptFileResult ResolveDeptFileFast(StreamReader sr, string zbx) { string fileInOneLine = sr.ReadLine(); int iFlag = 0; try { List<DotInfoModelWithOneCase> dotInfoModelWithOneCases = new List<DotInfoModelWithOneCase>(); while (!sr.EndOfStream) { //通过循环跳过部坐标的头部 if (fileInOneLine.EndsWith("@")) { iFlag++; DotInfoModelWithOneCase dotInfoModelWithOneCase = new DotInfoModelWithOneCase() { no = iFlag }; dotInfoModelWithOneCase.caseinfo = fileInOneLine; SetDotInfoModelWithOneCaseDotInfo(sr, iFlag, ref fileInOneLine, dotInfoModelWithOneCase); dotInfoModelWithOneCases.Add(dotInfoModelWithOneCase); } else { fileInOneLine = sr.ReadLine(); } } } catch (Exception ex) { resolveDeptFileResult.errorMsg = "解析坐标异常:请检查坐标文件内容!"; NLogLogger.Error("解析坐标异常:" + ex.ToString()); mapLog.WriteLog("ex.Message:" + ex.Message); return resolveDeptFileResult; } } private static void SetDotInfoModelWithOneCaseDotInfo(StreamReader sr, int iFlag, ref string fileInOneLineWithDot, DotInfoModelWithOneCase dotInfoModelWithOneCase) { //获取下一行 fileInOneLineWithDot = sr.ReadLine(); List<string> listDotinfo = new List<string>(); while (fileInOneLineWithDot != null && fileInOneLineWithDot.Split(',').Length == 4) { listDotinfo.Add(fileInOneLineWithDot); fileInOneLineWithDot = sr.ReadLine(); } dotInfoModelWithOneCase.dotinfo = listDotinfo; }
至此,部坐标格式文件中的数据已经被我按照地块分别存储到列表中了,多线程并发操作成为可能。接下来,我们可以并发执行原来在循环这个文件时执行的操作,将操作结果汇总成完整的字符串返回即可。
按照这个思路,我们先完成每个地块的自己的坐标的处理方法,并标志为可异步执行。
public Task<ResolveDeptFileResult> GetResolveDeptFileResultInThisCase() { return Task.Run(() => { ResolveDeptFileResult resolveDeptFileResult = new ResolveDeptFileResult(); resolveDeptFileResult.errorMsg += Checkcaseinfo(); //获取项目信息 resolveDeptFileResult.infoMsgParam = caseinfo; if (!string.IsNullOrEmpty(resolveDeptFileResult.errorMsg)) { return resolveDeptFileResult; } StringBuilder sbparamJZDH = new StringBuilder(); StringBuilder sbparamDKQH = new StringBuilder(); StringBuilder sbparamDOT = new StringBuilder(); StringBuilder sbstrPoint = new StringBuilder(); StringBuilder sbforCheckRepeat = new StringBuilder(); bool dkbegin = true; foreach (var item in dotinfo) { //获取界址点 string[] arrDot = item.Split(','); sbparamJZDH.Append(arrDot[0]).Append(","); //获取地块圈号,获取坐标检查用的点,如数据库用的点 if (sbparamDKQH.Length > 0 && !sbparamDKQH.ToString().Contains(arrDot[1])) { dkbegin = true; sbparamDKQH.Append("@"); //去掉多余的空格 sbparamDOT.Remove(sbparamDOT.Length - 1, 1); sbstrPoint.Remove(sbstrPoint.Length - 1, 1); sbparamDOT.Append("@"); sbstrPoint.Append("$"); } sbparamDKQH.Append(arrDot[1]).Append(","); sbparamDOT.Append(arrDot[2].Trim()).Append(",").Append(arrDot[3].Trim()).Append(" "); sbstrPoint.Append(arrDot[2].Trim()).Append(",").Append(arrDot[3].Trim()).Append(" "); resolveDeptFileResult.errorMsg += Checkrepeat(item, sbforCheckRepeat); if (!dkbegin) { sbforCheckRepeat.Append(arrDot[1].Trim()).Append(",").Append(arrDot[2].Trim()).Append(",").Append(arrDot[3].Trim()).Append(" "); } else { dkbegin = false; } if (!string.IsNullOrEmpty(resolveDeptFileResult.errorMsg)) { return resolveDeptFileResult; } } resolveDeptFileResult.paramJZDH = sbparamJZDH.ToString().TrimEnd(','); resolveDeptFileResult.paramDKQH = sbparamDKQH.ToString().TrimEnd(','); resolveDeptFileResult.paramDOT = sbparamDOT.ToString().TrimEnd(' '); resolveDeptFileResult.strPoint = sbstrPoint.ToString().TrimEnd(' '); return resolveDeptFileResult; }); }
再将整体项目所有地块的处理操作执行,并汇总
public void GetTotalResolveDeptFileResult() { List<Task<ResolveDeptFileResult>> arr = new List<Task<ResolveDeptFileResult>>(); for (int i = 0; i < DotInfoModelWithOneCases.Count; i++) { arr.Add(DotInfoModelWithOneCases[i].GetResolveDeptFileResultInThisCase()); } Task.WaitAll(arr.ToArray()); foreach (var item in arr) { this.paramDKQH += item.Result.paramDKQH + "*"; this.infoMsgParam += item.Result.infoMsgParam; this.paramJZDH += item.Result.paramJZDH + "*"; this.paramDOT += item.Result.paramDOT + "#"; this.strPoint += item.Result.strPoint + "#"; if (!string.IsNullOrEmpty(item.Result.errorMsg)) { this.errorMsg += item.Result.errorMsg + ";"; } } this.paramDKQH = this.paramDKQH.TrimEnd('*'); this.infoMsgParam = this.infoMsgParam.TrimEnd('@'); this.paramJZDH = this.paramJZDH.TrimEnd('*'); this.paramDOT = this.paramDOT.TrimEnd('#'); this.strPoint = this.strPoint.TrimEnd('#'); this.strPoint = string.Format("{0}*{1}", this.strPoint, GetZBX(Zbx, firstPointY)); }
最后将这些方法归并到合适的类中
public class ResolveDeptFileResult { /..其他方法../ public void GetTotalResolveDeptFileResult() { /..具体实现../ } /..其他方法../ } public class DotInfoModelWithOneCase { /..其他方法../ public Task<ResolveDeptFileResult> GetResolveDeptFileResultInThisCase() { /..具体实现../ } /..其他方法../ }
至此改造完成。改造后的主函数相对更加简洁,逻辑更清晰,以后要增加循环过程中的处理,直接修改DotInfoModelWithOneCase类即可,要修改地块与地块直接的拼接逻辑,直接修改ResolveDeptFileResult类即可。具体代码如下:
public static ResolveDeptFileResult ResolveDeptFileFast(StreamReader sr, string zbx) { ResolveDeptFileResult resolveDeptFileResult = new ResolveDeptFileResult() { Zbx = zbx }; string fileInOneLine = sr.ReadLine(); int iFlag = 0; try { List<DotInfoModelWithOneCase> dotInfoModelWithOneCases = new List<DotInfoModelWithOneCase>(); while (!sr.EndOfStream) { //通过循环跳过部坐标的头部 if (fileInOneLine.EndsWith("@")) { iFlag++; DotInfoModelWithOneCase dotInfoModelWithOneCase = new DotInfoModelWithOneCase() { no = iFlag }; dotInfoModelWithOneCase.caseinfo = fileInOneLine; SetDotInfoModelWithOneCaseDotInfo(sr, iFlag, ref fileInOneLine, dotInfoModelWithOneCase); dotInfoModelWithOneCases.Add(dotInfoModelWithOneCase); } else { fileInOneLine = sr.ReadLine(); } } resolveDeptFileResult.firstPointY = dotInfoModelWithOneCases[0].dotinfo[0].Split(',')[3]; resolveDeptFileResult.DotInfoModelWithOneCases = dotInfoModelWithOneCases; resolveDeptFileResult.GetTotalResolveDeptFileResult(); if (!string.IsNullOrEmpty(resolveDeptFileResult.errorMsg)) { return new ResolveDeptFileResult() { errorMsg = resolveDeptFileResult.errorMsg }; } return resolveDeptFileResult; } catch (Exception ex) { resolveDeptFileResult.errorMsg = "解析坐标异常:请检查坐标文件内容!"; NLogLogger.Error("解析坐标异常:" + ex.ToString()); mapLog.WriteLog("ex.Message:" + ex.Message); return resolveDeptFileResult; } }
总结以上经验,我觉得要完成历史遗留代码的重构,一定要抓住几个要点:
- 改造之前先对自己的想法进行验证,一定要有可行性才能修改,不做没有意义的修改。所谓大胆假设,小心求证。
- 改造前一定要覆盖单元测试,不然改造难度将成指数上升,千辛万苦改造完成后,依旧不敢更新,给自己,公司带来巨大的测试压力。
- 理清楚原始代码中带来大量干扰的全局变量,历史变量,尽可能的用类来封装和处理,将长的方法进行拆分,将方法放置到正确的类中。
当然,好的代码可以更好,我在此处依然存在大量的遗留代码,你们能找出来么?欢迎在评论中告诉我。(*^_^*)