Software Engineering Practice Winter Break Assignment (2/2)

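Course this assignment belongs to	2020 Spring | Class S (Fuzhou University)
Where the assignment requirements are	Software Engineering Practice Winter Break Assignment (2/2)
Goal of this assignment	Get familiar with GitHub; implement an epidemic data statistics tool
Assignment body	here
Other references	CSDN, cnblogs (博客园), python.org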

1. GitHub repository URL

https://github.com/ldc-37/InfectStatistic-main

2. PSP table

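PSP2.1	Personal Software Process Stages	Estimated time (minutes)	Actual time (minutes)
Planning	Planning	10	10
Estimate	Estimate how long the task will take	1200	1260
Development	Development	400	450
Analysis	Requirements analysis (including learning new technology)	600	600
Design Spec	Produce the design document	10	10
Design Review	Design review	15	15
Coding Standard	Coding standard (define suitable conventions for the current development)	15	15
Design	Detailed design	40	40
Coding	Detailed coding	360	410
Code Review	Code review	30	30
Test	Testing (self-testing, fixing code, committing changes)	60	180
Reporting	Reporting	30	60
Test Report	Test report	30	30
Size Measurement	Measure the workload	10	10
Postmortem & Process Improvement Plan	Postmortem, with a process improvement plan	20	20
Total		1230	1290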

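3. Approach

After reading the requirements through, I concluded that the project needs three main steps, all controlled by the command-line arguments: read the data, process the data, and output the result.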

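Language choice
After getting the assignment, I decided to pick up Python quickly and use it for this project. I did so by typing out the demos in the official tutorial (https://docs.python.org/zh-cn/3.7/tutorial/index.html), which went fairly fast: about 3 to 4 days.
Command-line arguments
The project has to process command-line arguments, so I used Python's built-in argparse library (writing a dedicated argument-handling class would also work, but to keep things simple and save effort I did not go that way).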

4. Design and implementation

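Data structure
A dictionary of the form {patient type name: number of patients of that type} holds the data for a single province, and a dictionary of the form {province name: province patient data} holds the data for all provinces.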
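The nested-dictionary layout can be sketched as follows. This is a minimal illustration rather than the repository's code: the key names `infected`, `suspect`, `cure`, and `death` are taken from the unit tests in section 6, while `PATIENT_TYPES`, `new_province_record`, and `sum_provinces` are hypothetical names introduced only for this example.

```python
from typing import Dict

# Key names follow the unit tests in this post; the helpers are hypothetical.
PATIENT_TYPES = ('infected', 'suspect', 'cure', 'death')

ProvinceRecord = Dict[str, int]


def new_province_record() -> ProvinceRecord:
    """Return a per-province record with every patient type set to zero."""
    return {t: 0 for t in PATIENT_TYPES}


def sum_provinces(data: Dict[str, ProvinceRecord]) -> ProvinceRecord:
    """Add up every province's counts into one nationwide record."""
    total = new_province_record()
    for record in data.values():
        for t in PATIENT_TYPES:
            total[t] += record[t]
    return total


# Outer dict: province name -> inner dict: patient type -> count
data = {
    '福建': {'infected': 10, 'suspect': 20, 'cure': 30, 'death': 40},
    '湖北': {'infected': 100, 'suspect': 200, 'cure': 300, 'death': 400},
}
print(sum_provinces(data))
```

Keeping the nationwide row as just another record of the same shape means the output code can treat it like any province.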
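Command-line argument handling
Input validation comes first. argparse itself enforces the number of values each option accepts and the allowed choices, and its help parameters generate the command-line help text. Further validity checks follow, such as whether the given provinces are valid and whether any output type is repeated.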
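The second stage of checks, which argparse cannot express through `choices` or `nargs`, might look like the sketch below. It is an assumption about the shape of such checks, not the code from the repository: `KNOWN_PROVINCES` is a deliberately tiny hypothetical list, and `find_option_errors` is a made-up helper name.

```python
import argparse
from typing import List, Optional

# Hypothetical subset of valid names; a real tool would list every province.
KNOWN_PROVINCES = {'全国', '福建', '湖北', '广东'}


def find_option_errors(types: Optional[List[str]],
                       provinces: Optional[List[str]]) -> List[str]:
    """Return messages for problems argparse itself does not catch."""
    errors = []
    # A repeated output type such as "-type ip ip" passes `choices` unnoticed.
    if types and len(set(types)) != len(types):
        errors.append('duplicate value in -type')
    for name in provinces or []:
        if name not in KNOWN_PROVINCES:
            errors.append('unknown province: ' + name)
    return errors


parser = argparse.ArgumentParser(prog='InfectStatistic')
parser.add_argument('-type', nargs='*', choices=['ip', 'sp', 'cure', 'dead'])
parser.add_argument('-province', nargs='*')

args = parser.parse_args(['-type', 'ip', 'sp', '-province', '福建'])
errors = find_option_errors(args.type, args.province)
if errors:
    # parser.error prints the usage text plus the message and exits with status 2
    parser.error('; '.join(errors))
```

Reporting through `parser.error` keeps these messages in the same format as the ones argparse produces on its own.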
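Log record type detection
Each line is matched against the distinguishing features of the different record types to decide which type it is, and the corresponding handler function is then called.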

5. Code walkthrough

Command-line argument handling

    import argparse

    parser = argparse.ArgumentParser(prog='InfectStatistic', description='show statistics of epidemic data')
    parser.add_argument('option', choices=['list'], type=str, help='only "list" is supported')
    parser.add_argument('-log', type=str, required=True, help='*directory* that contains the log files')
    parser.add_argument('-out', type=argparse.FileType('w', encoding='utf8'), required=True,
                        help='*file* path this script writes its output to')
    parser.add_argument('-date', type=str, help='report data up to this date, format YYYY-mm-DD')
    parser.add_argument('-type', type=str, nargs='*', choices=['ip', 'sp', 'cure', 'dead'], help='type(s) to output')
    parser.add_argument('-province', type=str, nargs='*',
                        help='province(s) to display; pass "全国" for the nationwide total')
    option = parser.parse_args()

Processing input data according to the log record type

import re


class Statistic:
    # The bodies of the four update methods are omitted in this post
    def change_infected(self, province, amount): ...
    def change_suspect(self, province, amount): ...
    def add_death(self, province, amount): ...
    def add_cure(self, province, amount): ...

    def parse_record_line(self, info_str: str):
        keywords = info_str.split(' ')
        count = int(re.findall(r'\d+', keywords[-1])[0])
        province = keywords[0]
        province2 = keywords[3] if len(keywords) >= 4 else None  # meaningless when the record has no second province
        switch = {
            1: lambda: self.change_infected(province, count),  # new confirmed cases
            2: lambda: self.change_suspect(province, count),  # new suspected cases
            3: lambda: (self.change_infected(province, -count), self.change_infected(province2, count)),  # confirmed cases move to another province
            4: lambda: (self.change_suspect(province, -count), self.change_suspect(province2, count)),  # suspected cases move to another province
            5: lambda: (self.add_death(province, count), self.change_infected(province, -count)),  # deaths
            6: lambda: (self.add_cure(province, count), self.change_infected(province, -count)),  # cured
            7: lambda: (self.change_suspect(province, -count), self.change_infected(province, count)),  # suspected -> confirmed
            8: lambda: self.change_suspect(province, -count)  # suspected cases ruled out
        }
        switch[map_info_type(info_str)]()

def map_info_type(s: str) -> int:
    """
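    Classify the record type of s
    :param s: a line of text read from a log file
    :return: the record type number, in the same order as the examples in the assignment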
    """
    keywords = s.split(' ')
    if keywords[1] == '新增':
        if keywords[2] == '感染患者':
            return 1
        else:
            return 2
    if keywords[1] == '感染患者':
        return 3
    if keywords[1] == '疑似患者':
        if keywords[2] == '流入':
            return 4
        else:
            return 7
    if keywords[1] == '死亡':
        return 5
    if keywords[1] == '治愈':
        return 6
    if keywords[1] == '排除':
        return 8
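The bodies of the four update methods are not shown in this post. The sketch below is one guess at how they could work, chosen only so that it agrees with the unit tests in section 6 (for example, a province that has not been seen before is created on first use). `StatisticSketch` and `_change` are hypothetical names, and `EMPTY_DATA` is assumed to be an all-zero record.

```python
from typing import Dict

# Assumed shape of EMPTY_DATA: an all-zero record (not shown in the post).
EMPTY_DATA = {'infected': 0, 'suspect': 0, 'cure': 0, 'death': 0}


class StatisticSketch:
    """Hypothetical stand-in for the update methods of Statistic."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, int]] = {}

    def _change(self, province: str, key: str, amount: int) -> None:
        # Create the province record on first use, then apply the delta.
        record = self.data.setdefault(province, dict(EMPTY_DATA))
        record[key] += amount

    def change_infected(self, province: str, amount: int) -> None:
        self._change(province, 'infected', amount)

    def change_suspect(self, province: str, amount: int) -> None:
        self._change(province, 'suspect', amount)

    def add_death(self, province: str, amount: int) -> None:
        self._change(province, 'death', amount)

    def add_cure(self, province: str, amount: int) -> None:
        self._change(province, 'cure', amount)


stat = StatisticSketch()
stat.change_infected('福建', 5)   # creates the record for 福建
stat.change_infected('福建', -2)  # negative amounts model outflow, death, or cure
stat.add_cure('福建', 2)
print(stat.data)
```

Funnelling every update through one private helper keeps the create-on-first-use rule in a single place.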

Processing the log files

    import os
    import sys

    stat = Statistic()
    filesTuple = next(os.walk(option.log))
    # os.walk does not guarantee any order, so sort the date-prefixed file names
    filenames = sorted(filesTuple[2])
    # Scan every file in the directory
    for filename in filenames:
        # If a date is given, stop at the first file later than it; otherwise process all files
        if option.date and str_to_date(filename[:10]) > str_to_date(option.date):
            break
        with open(os.path.join(filesTuple[0], filename), encoding='utf8') as fr:
            # Handle each line of the file
            for line in fr:
                line = line.rstrip('\n')
                # Skip blank lines and "//" comment lines
                if not line or line.startswith('//'):
                    continue
                stat.parse_record_line(line)
    # Warn when -date is later than the newest log file ("日期超出范围" means "date out of range")
    if option.date and filenames and str_to_date(filenames[-1][:10]) < str_to_date(option.date):
        sys.stderr.write('warning: 日期超出范围\n')
    # "全国" is the nationwide total
    stat.data['全国'] = stat.get_total()
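The helper `str_to_date` is used above and tested in section 6, but its body does not appear in this post. A minimal sketch that matches the tested signature (a string plus an optional separator) is given below, together with the date cutoff applied to a sorted list of names. The `.log.txt` suffix in the sample names is an assumption; the code above relies only on the first ten characters being the date.

```python
from datetime import date


def str_to_date(s: str, sep: str = '-') -> date:
    """Parse 'YYYY<sep>mm<sep>DD' into a datetime.date (a reconstruction, not the original)."""
    year, month, day = (int(part) for part in s.split(sep))
    return date(year, month, day)


# Sample file names; only the leading 'YYYY-mm-DD' part matters here.
log_names = ['2020-01-23.log.txt', '2020-01-22.log.txt', '2020-01-27.log.txt']
cutoff = str_to_date('2020-01-23')

# Sort first so that files are visited in date order, then keep those up to the cutoff.
selected = [name for name in sorted(log_names) if str_to_date(name[:10]) <= cutoff]
print(selected)
```

Comparing `date` objects rather than raw strings also rejects malformed input early, because `date(...)` raises `ValueError` for an impossible day or month.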

Preparing the output data

    output = ''
    data = {}
    if option.province:
        for p in option.province:
            data[p] = stat.data[p] if p in stat.data.keys() else {**EMPTY_DATA}
    else:
        data = {**stat.data}  # todo: this is only a shallow copy, not a deep copy
    # sortedProvince = sorted(list(data.keys()), key=lambda x: lazy_pinyin(x[0]))
    # Drop the types that are not listed in the -type argument
    if option.type:
        mapList = []
        new_data = {}
        for t in option.type:
            if t == 'ip':
                mapList.append('infected')
            elif t == 'sp':
                mapList.append('suspect')
            elif t == 'dead':
                mapList.append('death')
            else:
                mapList.append('cure')
        for k, prov in data.items():
            new_data[k] = {}
            for t in mapList:
                new_data[k][t] = prov[t]
        data = new_data
    # Build the output text
    if not option.province or '全国' in option.province:
        output += parse_output_line('全国', data)
    for prov in SORTED_PROVENCE_NAME:
        output += parse_output_line(prov, data)
    output += '// 该文档并非真实数据，仅供测试使用\n' \
              '// 命令：' + ' '.join(sys.argv[1:]) + '\n'
    # Write to the file given by the -out argument
    option.out.write(output)

def parse_output_line(province, data):
    """
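    Format one line of output data the way the assignment specifies
    :param province: province name
    :param data: dictionary holding the data, with the same structure as Statistic.data
    :return: the formatted line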
    """
    if province in data.keys():
        oneLine = province
        for k2, v2 in data[province].items():
            oneLine += ' ' + TYPE_CN_MAP[k2] + str(v2) + '人'
        return oneLine + '\n'
    return ''
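The constant `TYPE_CN_MAP` is referenced above without being shown. Its likely contents can be inferred from the expected strings in the unit tests of section 6, and the `-type` filtering can be written as a table lookup instead of an if/elif chain. The sketch below illustrates both under those assumptions; `OPTION_TO_KEY`, `filter_types`, and `format_line` are hypothetical names.

```python
from typing import Dict, List

# Labels inferred from the expected output strings in the unit tests;
# the real TYPE_CN_MAP is not shown in this post.
TYPE_CN_MAP = {'infected': '感染患者', 'suspect': '疑似患者', 'cure': '治愈', 'death': '死亡'}

# Hypothetical lookup table standing in for the if/elif chain over -type values.
OPTION_TO_KEY = {'ip': 'infected', 'sp': 'suspect', 'cure': 'cure', 'dead': 'death'}


def filter_types(data: Dict[str, Dict[str, int]],
                 option_types: List[str]) -> Dict[str, Dict[str, int]]:
    """Keep only the requested patient types, in command-line order."""
    keys = [OPTION_TO_KEY[t] for t in option_types]
    return {prov: {k: record[k] for k in keys} for prov, record in data.items()}


def format_line(province: str, record: Dict[str, int]) -> str:
    """Render one record as '<province> <label><count>人 ...'."""
    return province + ''.join(' ' + TYPE_CN_MAP[k] + str(v) + '人' for k, v in record.items())


data = {'福建': {'infected': 22, 'suspect': 38, 'cure': 3, 'death': 0}}
filtered = filter_types(data, ['dead', 'ip'])
print(format_line('福建', filtered['福建']))
```

Because dictionaries preserve insertion order in Python 3.7 and later, the output columns follow the order in which the types were given on the command line.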

6. Unit tests

I used pytest as the test framework and pytest-cov to check the unit test coverage.

import pytest

from InfectStatistic import Statistic, map_info_type


@pytest.fixture(scope='function')
def stat_data():
    return {
        '福建': {'infected': 10, 'suspect': 20, 'cure': 30, 'death': 40},
        '湖北': {'infected': 100, 'suspect': 200, 'cure': 300, 'death': 400}
    }


@pytest.mark.parametrize('info_str', [
    "<省> 新增 感染患者 n人",
    "<省> 新增 疑似患者 n人",
    "<省1> 感染患者 流入 <省2> n人",
    "<省1> 疑似患者 流入 <省2> n人",
    "<省> 死亡 n人", "<省> 治愈 n人",
    "<省> 疑似患者 确诊感染 n人",
    "<省> 排除 疑似患者 n人"
])
def test_map_info_type(info_str):
    strList = [
        "<省> 新增 感染患者 n人",
        "<省> 新增 疑似患者 n人",
        "<省1> 感染患者 流入 <省2> n人",
        "<省1> 疑似患者 流入 <省2> n人",
        "<省> 死亡 n人", "<省> 治愈 n人",
        "<省> 疑似患者 确诊感染 n人",
        "<省> 排除 疑似患者 n人"
    ]
    assert map_info_type(info_str) == int(strList.index(info_str)) + 1


class TestStatistic:

    def test_infected(self, stat_data):
        stat = Statistic()
        stat.data = stat_data
        stat.change_infected('福建', 1)
        assert stat.data['福建']['infected'] == 11

    def test_suspect(self, stat_data):
        stat = Statistic()
        stat.data = stat_data
        stat.change_suspect('福建', -1)
        assert stat.data['福建']['suspect'] == 19

    def test_death(self, stat_data):
        stat = Statistic()
        stat.data = stat_data
        stat.add_death('福建', 5)
        assert stat.data['福建']['death'] == 45

    def test_cure(self, stat_data):
        stat = Statistic()
        stat.data = stat_data
        stat.add_cure('福建', 10)
        assert stat.data['福建']['cure'] == 40

    def test_add_province(self, stat_data):
        stat = Statistic()
        stat.data = stat_data
        stat.change_infected('广东', 1)
        assert stat.data['广东']['infected'] == 1

    def test_total(self, stat_data):
        stat = Statistic()
        stat.data = stat_data
        assert stat.get_total() == {'infected': 110, 'suspect': 220, 'cure': 330, 'death': 440}

    @pytest.mark.parametrize('line', [
        "福建 新增 感染患者 2人",
        "福建 新增 疑似患者 5人",
        "湖北 新增 感染患者 15人",
        "湖北 感染患者 流入 福建 2人",
        "湖北 疑似患者 流入 福建 3人",
        "湖北 死亡 1人",
        "湖北 治愈 2人",
        "福建 疑似患者 确诊感染 1人",
        "湖北 排除 疑似患者 2人"
    ])
    def test_parse_record_line(self, line):
        ans = {
            '福建 新增 感染患者 2人': {
                '福建': {'infected': 12, 'suspect': 20, 'cure': 30, 'death': 40},
                '湖北': {'infected': 100, 'suspect': 200, 'cure': 300, 'death': 400}
            },
            '福建 新增 疑似患者 5人': {
                '福建': {'infected': 10, 'suspect': 25, 'cure': 30, 'death': 40},
                '湖北': {'infected': 100, 'suspect': 200, 'cure': 300, 'death': 400}
            },
            '湖北 新增 感染患者 15人': {
                '福建': {'infected': 10, 'suspect': 20, 'cure': 30, 'death': 40},
                '湖北': {'infected': 115, 'suspect': 200, 'cure': 300, 'death': 400}
            },
            '湖北 感染患者 流入 福建 2人': {
                '福建': {'infected': 12, 'suspect': 20, 'cure': 30, 'death': 40},
                '湖北': {'infected': 98, 'suspect': 200, 'cure': 300, 'death': 400}
            },
            '湖北 疑似患者 流入 福建 3人': {
                '福建': {'infected': 10, 'suspect': 23, 'cure': 30, 'death': 40},
                '湖北': {'infected': 100, 'suspect': 197, 'cure': 300, 'death': 400}
            },
            '湖北 死亡 1人': {
                '福建': {'infected': 10, 'suspect': 20, 'cure': 30, 'death': 40},
                '湖北': {'infected': 99, 'suspect': 200, 'cure': 300, 'death': 401}
            },
            '湖北 治愈 2人': {
                '福建': {'infected': 10, 'suspect': 20, 'cure': 30, 'death': 40},
                '湖北': {'infected': 98, 'suspect': 200, 'cure': 302, 'death': 400}
            },
            '福建 疑似患者 确诊感染 1人': {
                '福建': {'infected': 11, 'suspect': 19, 'cure': 30, 'death': 40},
                '湖北': {'infected': 100, 'suspect': 200, 'cure': 300, 'death': 400}
            },
            '湖北 排除 疑似患者 2人': {
                '福建': {'infected': 10, 'suspect': 20, 'cure': 30, 'death': 40},
                '湖北': {'infected': 100, 'suspect': 198, 'cure': 300, 'death': 400}
            }
        }
        stat = Statistic()
        # stat.data = stat_data
        stat.data = {
            '福建': {'infected': 10, 'suspect': 20, 'cure': 30, 'death': 40},
            '湖北': {'infected': 100, 'suspect': 200, 'cure': 300, 'death': 400}
        }
        stat.parse_record_line(line)
        assert stat.data == ans[line]

import pytest
from datetime import date
from Lib import parse_output_line, str_to_date


@pytest.mark.parametrize('province', ['福建', '湖北', '全国'])
def test_parse_output_line(province):
    data = {'福建': {'infected': 22, 'suspect': 38, 'cure': 3, 'death': 0}, '湖北': {'infected': 125, 'suspect': 279, 'cure': 24, 'death': 21}, '全国': {'infected': 147, 'suspect': 317, 'cure': 27, 'death': 21}}
    ans = {
        '福建': '福建 感染患者22人 疑似患者38人 治愈3人 死亡0人\n',
        '湖北': '湖北 感染患者125人 疑似患者279人 治愈24人 死亡21人\n',
        '全国': '全国 感染患者147人 疑似患者317人 治愈27人 死亡21人\n'
    }
    assert parse_output_line(province, data) == ans[province]


def test_parse_output_line_2():
    data = {'福建': {'infected': 22, 'suspect': 38, 'cure': 3, 'death': 0}, '湖北': {'infected': 125, 'suspect': 279, 'cure': 24, 'death': 21}, '全国': {'infected': 147, 'suspect': 317, 'cure': 27, 'death': 21}}
    assert parse_output_line('北京', data) == ''


@pytest.mark.parametrize('s, sep', [
    ('2020-01-01', '-'),
    ('2019 12 31', ' ')
])
def test_str_to_date(s, sep):
    ans = {
        '2020-01-01': date(2020, 1, 1),
        '2019 12 31': date(2019, 12, 31)
    }
    assert str_to_date(s, sep) == ans[s]

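7. Unit test coverage optimization and performance testing

The main flow is written in a procedural style, and even after a long search I could not work out how to unit test that kind of code (my impression is that unit tests target a function, class, or module). All the other code is covered.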
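One common way to bring procedural top-level code under test is to move it into a `main(argv)` function that takes its argument list explicitly, so that a test can call it with temporary files. The sketch below shows the idea on a deliberately tiny, hypothetical tool that only counts record lines; it is not the InfectStatistic code, and `main` and `test_main` are names made up for the example.

```python
import argparse
import os


def main(argv=None) -> int:
    """Hypothetical entry point: count the record lines in every file of a directory."""
    parser = argparse.ArgumentParser(prog='demo')
    parser.add_argument('-log', required=True)
    parser.add_argument('-out', required=True)
    option = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]

    count = 0
    for filename in sorted(os.listdir(option.log)):
        with open(os.path.join(option.log, filename), encoding='utf8') as fr:
            # Count lines that are neither blank nor "//" comments
            count += sum(1 for line in fr if line.strip() and not line.startswith('//'))
    with open(option.out, 'w', encoding='utf8') as fw:
        fw.write(str(count) + '\n')
    return 0


def test_main(tmp_path):
    """pytest injects tmp_path, a fresh temporary directory as a pathlib.Path."""
    log_dir = tmp_path / 'log'
    log_dir.mkdir()
    (log_dir / '2020-01-22.log.txt').write_text(
        '福建 新增 感染患者 2人\n// comment line\n', encoding='utf8')
    out_file = tmp_path / 'out.txt'

    assert main(['-log', str(log_dir), '-out', str(out_file)]) == 0
    assert out_file.read_text(encoding='utf8') == '1\n'
```

In a real script, `main()` would then be called from an `if __name__ == '__main__':` guard, and because importing the module no longer runs the whole flow, pytest-cov can attribute coverage to it.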

8. Coding standard

https://github.com/ldc-37/InfectStatistic-main/blob/master/221701331/codestyle.md

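9. Reflections and takeaways

I found the production code to be of ordinary difficulty, and neither the time I spent on it nor the amount of code was large. The unit tests, however, caused a fair amount of trouble. On one hand I was unfamiliar with the tools: there are few useful Chinese tutorials online for pytest and its plugins, so I skimmed the English documentation. On the other hand I simply knew little about testing, and I plan to learn more about it. In addition, I did not allocate my time well toward the end, which left this blog post thinner than it should be; I will try to do better next time.
As for takeaways: through this project I got started with Python and learned a few commonly used libraries. I also became familiar with the relevant git and GitHub operations.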

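10. Repositories related to my technology roadmap

Element UI
Link: https://github.com/ElemeFE/element
Description: A Vue 2.0-based desktop component library for developers, designers, and product managers that can greatly speed up project development.
Vue-element-admin
Link: https://github.com/PanJiaChen/vue-element-admin
Description: A front-end solution for admin back ends, built on vue and element-ui. It uses the latest front-end technology stack, has built-in i18n internationalization, dynamic routing, and permission checks, distills typical business models, and provides a rich set of functional components.
puppeteer
Link: https://github.com/puppeteer/puppeteer
Description: A Node.js library that drives headless Chromium; a powerful tool for crawling and performance testing.
Mpvue
Link: https://github.com/Meituan-Dianping/mpvue
Description: A front-end framework for developing mini programs with Vue.js, currently supporting WeChat, Alipay, Toutiao, and other mini programs. The framework is based on Vue.js, with modified runtime and compiler implementations so that it can run in the mini program environment.
Layui
Link: https://github.com/sentsin/layui
Description: A front-end UI framework with a low barrier to entry. It follows native HTML/CSS/JS conventions for writing and organizing code, which makes it fairly friendly to beginners.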

posted @ 2020-02-21 22:57 ldc-37