沉淀,再出发:结合案例看python

沉淀,再出发:结合案例看python

一、前言

  关于python,如果不经过大型程序开发的洗礼,我们很难说自己已经懂得了python了,因此,我们需要通过稍微结构化的编程来学习python。

二、一个案例

   首先我们看一下需要具备的前提知识。

   2.1、新建pandas表格

man_num = 100
women_num = 100
pd.DataFrame( [['w'+str(i) for i in random.sample(range(1,women_num+1),women_num)] for j in range(man_num)], index = ['m'+str(i) for i in range(1,man_num+1)], columns = ['level'+str(i) for i in range(1,women_num+1)] )

    通过上述的方式,我们创建出了一个表格:

   其中random的sample方法,我们可以从下面的例子中理解,在上面就是挑选自身的一种置换来填充每一个单元,for j in range(man_num)其实就是重复多少行的意思:

import random

list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in range(3):
    slice = random.sample(list, 5)  # 从list中随机获取5个元素,作为一个片断返回
    print(slice)
    print(list, '\n')  # 原有序列并没有改变

      2.2、reset_index()和to_csv()

    def to_csv(self, path=None, index=True, sep=",", na_rep='',
               float_format=None, header=False, index_label=None,
               mode='w', encoding=None, compression=None, date_format=None,
               decimal='.'):
        """
        Write Series to a comma-separated values (csv) file
        Parameters
        ----------
        path : string or file handle, default None
            File path or object, if None is provided the result is returned as
            a string.
        na_rep : string, default ''
            Missing data representation
        float_format : string, default None
            Format string for floating point numbers
        header : boolean, default False
            Write out series name
        index : boolean, default True
            Write row names (index)
        index_label : string or sequence, default None
            Column label for index column(s) if desired. If None is given, and
            `header` and `index` are True, then the index names are used. A
            sequence should be given if the DataFrame uses MultiIndex.
        mode : Python write mode, default 'w'
        sep : character, default ","
            Field delimiter for the output file.
        encoding : string, optional
            a string representing the encoding to use if the contents are
            non-ascii, for python versions prior to 3
        compression : string, optional
            A string representing the compression to use in the output file.
            Allowed values are 'gzip', 'bz2', 'zip', 'xz'. This input is only
            used when the first argument is a filename.
        date_format: string, default None
            Format string for datetime objects.
        decimal: string, default '.'
            Character recognized as decimal separator. E.g. use ',' for
            European data
        """
        from pandas.core.frame import DataFrame
        df = DataFrame(self)
        # result is only a string if no path provided, otherwise None
        result = df.to_csv(path, index=index, sep=sep, na_rep=na_rep,
                           float_format=float_format, header=header,
                           index_label=index_label, mode=mode,
                           encoding=encoding, compression=compression,
                           date_format=date_format, decimal=decimal)
        if path is None:
           return result

     2.3、stack()和unstack()

     在用pandas进行数据重排时,经常用到stack和unstack两个函数。stack的意思是堆叠,堆积,unstack即“不要堆叠”。常见的数据的层次化结构有两种,一种是表格,一种是“花括号”,即下面这样的l两种形式:

 

       表格在行列方向上均有索引(类似于DataFrame),花括号结构只有“列方向”上的索引(类似于层次化的Series),结构更加偏向于堆叠(Series-stack,方便记忆)。stack函数会将数据从”表格结构“变成”花括号结构“,即将其行索引变成列索引,反之,unstack函数将数据从”花括号结构“变成”表格结构“,即要将其中一层的列索引变成行索引。

import numpy as np
import pandas as pd
from pandas import Series,DataFrame
data=DataFrame(np.arange(6).reshape((2,3)),index=pd.Index(['street1','street2']),columns=pd.Index(['one','two','three']))
print(data)
print('-----------------------------------------\n')
data2=data.stack()
data3=data2.unstack()
print(data2)
print('-----------------------------------------\n')
print(data3)

   2.4、DataFrame上的concat()

    2.5、案例

1    有一座城市,当地风俗是,想结婚的男子必须先向心仪的女子求婚,而女子则需要等待求婚。
2    牧师每年会邀请人数相同的适婚男女参与一次集体相亲。一次相亲活动可能有很多轮,男子会首先向自己最爱的女子求婚,女子则会在所有的追求者中选择她的最爱;
如果男子被拒绝,下一轮会向他第二喜欢的女子求婚;上一轮已经订婚的女子如果得到她更爱的人的求婚,则会毫不留情地抛弃未婚夫,和更爱的人在一起。
被抛弃的男子需要重新参与求婚。如此反复,等大家都订婚,就举办集体婚礼。
3 假设: 4 1)参加求婚的男女数量保持一致 5 2)每个男子都按喜爱程度对女子进行排序,比如最爱a,其次爱b,再次爱c 6 3)每个女子也同样给每个男子排序

    2.6、安装 pyecharts和pyecharts_snapshot

    2.7、创造样例

 1 import pandas as pd
 2 import random
 3 
 4 from configuration import MAN_NUM,WOMAN_NUM
 5 
 6 
 7 def create_sample():
 8     man_num = MAN_NUM
 9     women_num = WOMAN_NUM
10 
11     #设置男女生喜好样本
12     print('==============================生成样本数据==============================')
13     man = pd.DataFrame( [['w'+str(i) for i in random.sample(range(1,women_num+1),women_num)] \
14                           for i in range(man_num)],
15                         index = ['m'+str(i) for i in range(1,man_num+1)],
16                         columns = ['level'+str(i) for i in range(1,women_num+1)]
17                         )
18 
19     women = pd.DataFrame( [['m'+str(i) for i in random.sample(range(1,man_num+1),man_num)] \
20                           for i in range(women_num)],
21                         index = ['w'+str(i) for i in range(1,women_num+1)],
22                         columns = ['level'+str(i) for i in range(1,man_num+1)]
23                         )
24     return (man,women)
25 
26 if __name__ == '__main__':
27     create_sample(man,women)

   2.8、生成映射表

 1 import pandas as pd
 2 
 3 #设置姻缘关系表
 4 def create_mapping_table(man,women):
 5 
 6     man_ismapping = pd.DataFrame({
 7             'man_id':man.index,
 8             'target':'n',
 9             'love_level':0,
10             'range':0
11             }).set_index('man_id')
12 
13     women_ismapping = pd.DataFrame({
14             'women_id':women.index,
15             'target':'n',
16             'love_level':0,
17             'range':0
18             }).set_index('women_id')
19     return (man_ismapping,women_ismapping)
20 
21 if __name__ == '__main__':
22     create_mapping_table(man,women)

    2.9、创建目录,完成初始化

 1 from configuration import  TEST_NUM
 2 import os
 3 from scr.create_sample import create_sample
 4 from scr.create_mapping_table import create_mapping_table
 5 import pandas as pd
 6 
 7 def loop_script():
 8     test_num = TEST_NUM
 9     if not os.path.exists('./data'):
10         os.makedirs('./data')
11 
12     for i in range(1,test_num+1):
13         print('==============================开始创建测试文件夹{}=============================='.format(i))
14         path = './data/test' + str(i)
15         if not os.path.exists(path):
16             os.makedirs(path)
17         sample_data = create_sample()
18         man = sample_data[0]
19         women = sample_data[1]
20         man.reset_index().to_csv(path+'/'+'man_sample.csv', index=0)
21         women.reset_index().to_csv(path+'/'+'woman_sample.csv', index=0)
22         print('==============================样本数据生成成功==============================')
23         print('==============================创建婚姻关系表==============================')
24         man = pd.read_csv(path+'/'+'man_sample.csv').set_index('index')
25         women = pd.read_csv(path+'/'+'woman_sample.csv').set_index('index')
26         mapping_data = create_mapping_table(man, women)
27         man_ismapping = mapping_data[0]
28         women_ismapping = mapping_data[1]
29         man_ismapping.reset_index().to_csv(path+'/'+'man_ismapping.csv', index=0)
30         women_ismapping.reset_index().to_csv(path+'/'+'women_ismapping.csv', index=0)
31         print('==============================婚姻关系表创建完成==============================')

   2.10、业务逻辑计算

 1 import pandas as pd
 2 from configuration import  TEST_NUM
 3 
 4 def calculation():
 5     test_num =  TEST_NUM
 6     for i in range(1,test_num+1):
 7         path = './data/test' + str(i)
 8         man = pd.read_csv(path + '/' + 'man_sample.csv').set_index('index')
 9         women = pd.read_csv(path + '/' + 'woman_sample.csv').set_index('index')
10         man_ismapping = pd.read_csv(path + '/' + 'man_ismapping.csv').set_index('man_id')
11         women_ismapping = pd.read_csv(path + '/' + 'women_ismapping.csv').set_index('women_id')
12         print('==============================测试集{}模拟开始=============================='.format(i))
13         print('==============================开始模拟求婚过程==============================')
14         level_num = 0
15         while man_ismapping['love_level'].min() == 0:
16             level_num += 1
17             print('==============================开始第{}天婚姻配对=============================='.format(level_num))
18             u_mapping_man = man_ismapping[man_ismapping.target == 'n'].index.tolist()
19 
20             if level_num < 2:
21                 level_col = 'level' + str(level_num)
22                 man_choose = man[man.index.isin(u_mapping_man)][level_col].to_frame().reset_index()
23                 man_choose.columns = ['man_id', 'women_id']
24                 man_choose['range'] = 1
25             else:
26                 m_id = u_mapping_man
27                 l = []
28                 for man_id in m_id:
29                     col_n = int(man_ismapping[man_ismapping.index == man_id].range[0])
30                     level_col = 'level' + str(col_n + 1)
31                     women_id = man[man.index == man_id][level_col][0]
32                     rg = col_n + 1
33                     l.append([man_id, women_id, rg])
34                 man_choose = pd.DataFrame(l, columns=['man_id', 'women_id', 'range'])
35 
36             for r in range(0, len(man_choose)):
37                 relationship = man_choose[man_choose.index == r]
38                 m = [i for i in relationship['man_id']][0]
39                 w = [i for i in relationship['women_id']][0]
40                 find = women[women.index == w].unstack().reset_index()
41                 find.columns = ['level', 'women_id', 'man_id']
42                 find = int([i for i in find[find['man_id'] == m]['level']][0].split('level')[1])
43                 o_love_level = [i for i in women_ismapping[women_ismapping.index == w]['love_level']][0]
44                 rg = [i for i in relationship['range']][0]
45                 if o_love_level == 0:
46                     women_ismapping.loc[w, 'love_level'] = find
47                     women_ismapping.loc[w, 'target'] = m
48                     women_ismapping.loc[w, 'range'] = level_num
49                     man_ismapping.loc[m, 'love_level'] = rg
50                     man_ismapping.loc[m, 'target'] = w
51                     man_ismapping.loc[m, 'range'] = rg
52                 elif o_love_level > find:
53                     m_o = women_ismapping.loc[w, 'target']
54                     man_ismapping.loc[m_o, 'love_level'] = 0
55                     man_ismapping.loc[m_o, 'target'] = 'n'
56                     man_ismapping.loc[m, 'love_level'] = rg
57                     man_ismapping.loc[m, 'target'] = w
58                     man_ismapping.loc[m, 'range'] = rg
59                     women_ismapping.loc[w, 'love_level'] = find
60                     women_ismapping.loc[w, 'target'] = m
61                     women_ismapping.loc[w, 'range'] = level_num
62                 else:
63                     man_ismapping.loc[m, 'range'] = rg
64                     pass
65 
66         print('==============================婚姻配对完成==============================')
67         print('共进行了{}次牵线搭桥,在第{}天举办集体婚礼。'.format(level_num, level_num + 1))
68 
69         print('==============================导出配对明细表==============================')
70         man_love_level_mean = man_ismapping.love_level.mean()
71         women_love_level_mean = women_ismapping.love_level.mean()
72         detail = [[level_num,level_num,man_love_level_mean,women_love_level_mean]]
73         pd.DataFrame(detail,columns=['match_range','party_time','man_love_level_mean','women_love_level_mean'])\
74                 .to_csv(path+'/'+'test_detail.csv', index=0)
75         man_ismapping.reset_index().to_csv(path+'/'+'man_match_table.csv', index=0)
76         women_ismapping.reset_index().to_csv(path+'/'+'woman_match_table.csv', index=0)
77         print('==============================导出完毕==============================')
78 
79 if __name__ == '__main__':
80     calculation()

   2.11、绘制图表

 1 import pandas as pd
 2 from configuration import  TEST_NUM
 3 from pyecharts import Bar
 4 import matplotlib.pyplot as plt
 5 import warnings
 6 warnings.filterwarnings("ignore")
 7 
 8 def drawing():
 9     test_num =  TEST_NUM
10     l_table = []
11     for i in range(1,test_num+1):
12         path = './data/test' + str(i)
13         man_match_table = pd.read_csv(path + '/' + 'man_match_table.csv').set_index('man_id')
14         woman_match_table = pd.read_csv(path + '/' + 'woman_match_table.csv').set_index('women_id')
15 
16         man_match_table = man_match_table.groupby('love_level').count()['range'].sort_values(ascending=False)
17         woman_match_table = woman_match_table.groupby('love_level').count()['range'].sort_values(ascending=False)
18 
19         man_attr = man_match_table.index.tolist()
20         man_v = man_match_table.values.tolist()
21         bar_man = Bar('男生匹配对象喜爱程度分布',width=900,height=500)
22         bar_man.add('频数',man_attr,man_v,mark_line=["max"],label_color = ['#9932CC'])
23         bar_man.render(path + '/' + '男生匹配对象喜爱程度分布.html')
24 
25         women_attr = woman_match_table.index.tolist()
26         women_v = woman_match_table.values.tolist()
27         bar_women = Bar('女生匹配对象喜爱程度分布',width=900,height=500)
28         bar_women.add('频数',women_attr,women_v,mark_line=["max"],label_color = ['#FF3030'])
29         bar_women.render(path + '/' + '女生匹配对象喜爱程度分布.html')
30 
31         detail = pd.read_csv(path + '/' + 'test_detail.csv')
32         l_table.append(detail)
33         detail_table = pd.concat(l_table)
34         s_match_range = detail_table['match_range']
35         s_man_love_level_mean = detail_table['man_love_level_mean']
36         women_love_level_mean = detail_table['women_love_level_mean']
37     fig = plt.figure(figsize=(10, 6), facecolor='gray')
38     ax1 = fig.add_subplot(2, 2, 1)
39     ax1.hist(s_match_range,bins = 20,
40                         histtype = 'bar',
41                         align = 'mid',
42                         orientation = 'vertical',
43                         alpha=0.5,
44                         normed =False
45                         )
46     plt.grid(True)
47     ax2 = fig.add_subplot(2, 2, 2)
48     ax2.hist(s_man_love_level_mean, bins=20,
49                          histtype='bar',
50                          align='mid',
51                          orientation='vertical',
52                          alpha=0.5,
53                          normed=False
54                          )
55     plt.grid(True)
56     ax3 = fig.add_subplot(2, 2, 3)
57     ax3.hist(women_love_level_mean, bins=20,
58                          histtype='bar',
59                          align='mid',
60                          orientation='vertical',
61                          alpha=0.5,
62                          normed=False
63                          )
64     plt.grid(True)
65     plt.savefig('./data/detail_graph.png',dpi=400)
66     plt.show()
67     print('==========================制图完成==========================')

     2.12、配置文件和程序入口

 1 #基本参数配置
 2 
 3 #MAN_NUM:男生样本数量
 4 #WOMAN_NUM:女生样本数量
 5 #TEST_NUM:测试次数
 6 
 7 
 8 MAN_NUM = 100
 9 WOMAN_NUM = 100
10 TEST_NUM = 50
1 from scr.loop_script import loop_script
2 from scr.calculation import calculation
3 from scr.drawing import drawing
4 
5 if __name__ == '__main__':
6     loop_script()
7     calculation()
8     drawing()
9     print('========================================任务结束========================================')

三、总结

      通过一个案例,我们更好的理解了python的相关语法和应用,以及结合其他工具的强大的绘图能力和表达能力,对语言的高度浓缩和精炼等等。

  参考资料:https://mp.weixin.qq.com/s?__biz=MzAxMjUyNDQ5OA==&mid=2653557526&idx=1&sn=fa6afe368f3395afc80758f88c81868f&chksm=806e3dabb719b4bd452fe6f2389a56966a65e8e542d34ca475fc8061ace1a591f8ce27be6e8e&mpshare=1&scene=23&srcid=1016fiPYJuCJ4b33lW5V0jpx#rd

posted @ 2018-10-16 11:02  精心出精品  阅读(419)  评论(0编辑  收藏  举报