2017数学建模B题-众包任务定价优化方案

一、案例背景

2017年高教社杯全国大学生数学建模竞赛题目

B题 “拍照赚钱”的任务定价

“拍照赚钱”是移动互联网下的一种自助式服务模式。用户下载APP,注册成为APP的会员,然后从APP上领取需要拍照的任务(比如上超市去检查某种商品的上架情况),赚取APP对任务所标定的酬金。这种基于移动互联网的自助式劳务众包平台,为企业提供各种商业检查和信息搜集,相比传统的市场调查方式可以大大节省调查成本,而且有效地保证了调查数据真实性,缩短了调查的周期。因此APP成为该平台运行的核心,而APP中的任务定价又是其核心要素。如果定价不合理,有的任务就会无人问津,而导致商品检查的失败。
附件一是一个已结束项目的任务数据,包含了每个任务的位置、定价和完成情况(“1”表示完成,“0”表示未完成);附件二是会员信息数据,包含了会员的位置、信誉值、参考其信誉给出的任务开始预订时间和预订限额,原则上会员信誉越高,越优先开始挑选任务,其配额也就越大(任务分配时实际上是根据预订限额所占比例进行配发);附件三是一个新的检查项目任务数据,只有任务的位置信息。请完成下面的问题:

  1. 研究附件一中项目的任务定价规律,分析任务未完成的原因。
  2. 为附件一中的项目设计新的任务定价方案,并和原方案进行比较。
  3. 实际情况下,多个任务可能因为位置比较集中,导致用户会争相选择,一种考虑是将这些任务联合在一起打包发布。在这种考虑下,如何修改前面的定价模型,对最终的任务完成情况又有什么影响?
  4. 对附件三中的新项目给出你的任务定价方案,并评价该方案的实施效果。

附件一:已结束项目任务数据
附件二:会员信息数据
附件三:新项目任务数据

二、案例目标及实现思路

image.png

三、数据获取与探索

1、Folium地理信息可视化包安装
pip install folium
Requirement already satisfied: folium in c:\users\13717\anaconda3\lib\site-packages (0.12.1.post1)
Requirement already satisfied: branca>=0.3.0 in c:\users\13717\anaconda3\lib\site-packages (from folium) (0.5.0)
Requirement already satisfied: requests in c:\users\13717\anaconda3\lib\site-packages (from folium) (2.24.0)
Requirement already satisfied: numpy in c:\users\13717\appdata\roaming\python\python37\site-packages (from folium) (1.19.5)
Requirement already satisfied: jinja2>=2.9 in c:\users\13717\anaconda3\lib\site-packages (from folium) (3.0.3)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\13717\anaconda3\lib\site-packages (from jinja2>=2.9->folium) (2.1.1)
Requirement already satisfied: idna<3,>=2.5 in c:\users\13717\appdata\roaming\python\python37\site-packages (from requests->folium) (2.6)
Requirement already satisfied: chardet<4,>=3.0.2 in c:\users\13717\appdata\roaming\python\python37\site-packages (from requests->folium) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\13717\anaconda3\lib\site-packages (from requests->folium) (2021.10.8)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\13717\anaconda3\lib\site-packages (from requests->folium) (1.25.11)
Note: you may need to restart the kernel to use updated packages.
2、数据读取与地图可视化
import pandas as pd
#1.读取任务数据和会员数据
A = pd.read_excel('附件一:已结束项目任务数据.xls') 
B = pd.read_excel('附件二:会员信息数据.xlsx')
A
任务号码 任务gps纬度 任务gps经度 任务标价 任务执行情况
0 A0001 22.566142 113.980837 66.0 0
1 A0002 22.686205 113.940525 65.5 0
2 A0003 22.576512 113.957198 65.5 1
3 A0004 22.564841 114.244571 75.0 0
4 A0005 22.558888 113.950723 65.5 0
... ... ... ... ... ...
830 A0831 23.044062 113.125784 65.5 0
831 A0832 22.833262 113.280152 72.0 1
832 A0833 22.814676 113.827731 85.0 1
833 A0834 23.063674 113.771188 65.5 1
834 A0835 23.123294 113.110382 85.0 1

835 rows × 5 columns

A.iloc[0,1]
22.566142254795
A.iloc[0,2]
113.980836777953
B
会员编号 维度 经度 预订任务限额 预订任务开始时间 信誉值
0 B0001 22.947097 113.679983 114 06:30:00 67997.3868
1 B0002 22.577792 113.966524 163 06:30:00 37926.5416
2 B0003 23.192458 113.347272 139 06:30:00 27953.0363
3 B0004 23.255965 113.318750 98 06:30:00 25085.6986
4 B0005 33.652050 116.970470 66 06:30:00 20919.0667
... ... ... ... ... ... ...
1872 B1873 22.840505 113.277245 1 08:00:00 0.0124
1873 B1874 23.069415 113.287606 1 08:00:00 0.0121
1874 B1875 23.333446 113.301736 1 08:00:00 0.0062
1875 B1876 22.693506 113.994101 1 08:00:00 0.0036
1876 B1877 23.133238 113.239864 1 08:00:00 0.0001

1877 rows × 6 columns

#2.导入地图可视化包
import folium as f
#利用map函数创建地图,参数依次为地图中心位置(纬度,经度)、地图缩放大小、地理坐标系编码
M=f.Map([A.iloc[0,1],A.iloc[0,2]],zoom_start=14,crs='EPSG3857')
#在地图上画圆圈,利用Circle函数实现,参数依次为半径大小(单位:米)、圆心位置(纬度、经度)、颜色……
for t in range(len(A)): 
   f.Circle(radius=50, location=[A.iloc[t,1],A.iloc[t,2]], color='black', 
            fill=True, fill_color='black').add_to(M)

for t in range(len(B)): 
   f.Circle(radius=50, location=[B.iloc[t,1],B.iloc[t,2]], color='red', 
            fill=True, fill_color='red').add_to(M)
#保存地图,html文件
M.save('f.html')

image.png

四、指标计算

1、指标设计
探究影响任务定价的主要因素,是本案例的主要任务。实际上,一个任务的定价不仅与其周围的任务数量、会员数量有关,还与任务发布时间有一定关系。
通过分析数据,我们发现任务的发布时间有一定的规律,即任务从6:30开始发布一批,之后三分钟发布一批贸易知道8:00任务发放结束。

根据以上分析,对附件一的每个任务,设计了12个指标

字段名称 字段解释

Z1  对每一个任务,计算其 Q 公里范围内的所有任务数量
Z2  对每一个任务,计算其 Q 公里范围内的所有任务平均价格
Z2	对每一个任务,计算其 Q 公里范围内的所有会员数量
Z3	对每一个任务,计算其 Q 公里范围内的所有会员信誉平均值
Z4	对每一个任务,计算其 Q 公里范围内的所有会员所有时段可预订任务限额
Z5	对每一个任务,计算其 Q 公里范围内的所有会员在 6:30 时段可预订任务限额
Z6	对每一个任务,计算其 Q 公里范围内的所有会员在 6:33-6:45 时段可预订任务限额
Z7	对每一个任务,计算其 Q 公里范围内的所有会员在 6:48-7:03 时段可预订任务限额
Z8	对每一个任务,计算其 Q 公里范围内的所有会员在 7:06-7:21 时段可预订任务限额
Z9	对每一个任务,计算其 Q 公里范围内的所有会员在 7:24-7:39 时段可预订任务限额
Z10	对每一个任务,计算其 Q 公里范围内的所有会员在 7:42-7:57 时段可预订任务限额
Z11	对每一个任务,计算其 Q 公里范围内的所有会员在 8:00 时段可预订任务限额

其中Q为公里

五、程序实现

1、Z1~Z5的计算

import pandas as pd      #导入pandas库
import math               #导入数学函数包
A=pd.read_excel('附件一:已结束项目任务数据.xls') 
B=pd.read_excel('附件二:会员信息数据.xlsx')
A_W0=A.iloc[0,1]  #第0个任务的维度
A_J0=A.iloc[0,2]  #第0个任务的经度
A_W1=A.iloc[1,1]  #第1个任务的维度
A_J1=A.iloc[1,2]  #第1个任务的经度
B_W0=B.iloc[0,1]  #第0个会员的维度
B_J0=B.iloc[0,2]  #第0个会员的经度
#第0个任务到第1个任务之间的距离
d1=111.19*math.sqrt((A_W0-A_W1)**2+(A_J0-A_J1)**2*
math.cos((A_W0+A_W1)*math.pi/180)**2);  
#第0个任务到第0个会员之间的距离
d2=111.19*math.sqrt((A_W0-B_W0)**2+(A_J0-B_J0)**2*
   math.cos((A_W0+B_W0)*math.pi/180)**2);
print('第0个任务到第1个任务之间的距离 ',d1)
print('第0个任务到第0个会员之间的距离 ',d2)
第0个任务到第1个任务之间的距离  13.71765563354376
第0个任务到第0个会员之间的距离  48.41201229628393
import pandas as pd     #导入pandas库
import numpy as np      #导入numpy库
import math               #导入数学函数库
A=pd.read_excel('附件一:已结束项目任务数据.xls') 
B=pd.read_excel('附件二:会员信息数据.xlsx')
A_W0=A.iloc[0,1]  #第0个任务的维度
A_J0=A.iloc[0,2]  #第0个任务的经度
# 预定义数组D1,用于存放第0个任务与所有任务之间的距离
# 预定义数组D2,用于存放第0个任务与所有会员之间的距离
D1=np.zeros((len(A)))
D2=np.zeros((len(B)))
for t in range(len(A)):
     A_Wt=A.iloc[t,1]  #第t个任务的维度
     A_Jt=A.iloc[t,2]  #第t个任务的经度
     #第0个任务到第t个任务之间的距离
     dt=111.19*math.sqrt((A_W0-A_Wt)**2+(A_J0-A_Jt)**2*
       math.cos((A_W0+A_Wt)*math.pi/180)**2);  
     D1[t]=dt
for k in range(len(B)):
     B_Wk=B.iloc[k,1] #第k个会员的维度
     B_Jk=B.iloc[k,2] #第k个会员的经度
     #第0个任务到第k个会员之间的距离
     dk=111.19*math.sqrt((A_W0-B_Wk)**2+(A_J0-B_Jk)**2*
         math.cos((A_W0+B_Wk)*math.pi/180)**2); 
     D2[k]=dk
print('第0个任务到第t个任务之间的距离:',D1)
print('第0个任务到第k个会员之间的距离:',D2)
第0个任务到第t个任务之间的距离: [  0.          13.71765563   2.1832139   20.68868509   2.49640437
  20.45047092   2.02098319   1.93986332   9.94037198   5.10756368
   6.03794875   6.31279022   2.03332205   7.71496502   8.17411737
   5.32790828   6.38589159   9.94579264   2.28216922   8.33877549
   9.04664348   6.61468372   6.37344585   4.04451543   2.77471416
   8.85894546   6.67029682   3.43526419   0.18415874  13.87878228
  12.09422895   5.53411942   2.53045351  13.17378353  11.98863144
   5.71679957  21.68853625  35.61596943  17.71053466  12.09682323
  12.89954308  20.72116616  29.25735922  15.16117831  10.28273508
  25.62972416  25.49548553  13.20342429  18.80104662  31.65704503
   9.63653185  12.99370838  37.27384638  18.86826057  21.23483439
  34.73757543   6.32988843  19.24385294   7.41199119  10.47540666
  12.57686501  14.97279106  35.61596943  34.61291     29.03948838
  22.33983045  11.32811984  12.78642487  20.5968566   22.99858815
  31.50338743   7.02562194  20.55211491   8.82616822  12.29696271
   9.97547228  30.55368026  22.99858815  19.69627459   9.1471004
  20.90173168  14.5267522   13.24286315  11.4271028   19.08456138
  23.41188368  21.96542574  24.06340157  20.60692268  15.41552785
  11.12272331  18.37654908  17.06594362  20.43696859   9.94050367
  18.97382648  17.11957907  17.3408921   21.96542574   6.77197638
  19.42323153  10.38253467  17.68104329   7.1452814   17.02467404
  11.72041453  12.44227191  18.3294174    5.80780346  16.20561512
  12.6133022    6.64543644  12.31311155  10.41455456  18.7289315
  18.1004653   19.51094679   9.84294994   5.69288489   6.79225606
  30.07249674   7.48796433   6.07138118   5.57129353  24.74619694
  71.23157263  81.0013278   82.67192184  79.90532072  79.40640293
  76.5698521   78.45183644  75.07740179  76.17678281  79.21066692
  83.332455    78.38057328  84.72478215  68.32965227  70.1463727
  71.25069175  76.35848965  83.4299794   78.17910844  81.04463525
  75.04494585  82.68239575  73.02645158  73.02142274  80.78793774
  71.25069175  70.75284883  71.39237676  70.12031435  71.25069175
  70.77725424  74.17820485  75.91556359  81.55686844  60.0882045
 104.54122971  68.03816085  89.43247706  60.6978272   91.25004318
  95.33426155  95.89058955 101.62356846 101.2870407   92.39121716
  83.14175329  91.25004318  94.37102238  94.78329168  90.23958129
  94.64350867  10.58139099  37.20914895  84.87133744  83.3877189
  83.47253643  76.12491657  79.21066692  82.28239467  83.52513852
  82.00925957  79.47790834  80.01988251  54.18968497  80.26804983
  78.33278689  80.94159869  85.89478376  79.42154355  83.03411625
  83.09111094  79.31276994  46.81162556  39.97837706  38.93061536
  79.15892808 108.19109898  79.12143374  42.74741561  42.83353957
  42.91370757  37.36270468  42.60872633  37.31256948  71.25069175
  86.95228537 116.06280182 101.27792509 118.48351318  86.82673843
 109.22073614 118.78433393  85.08132635 112.77612033 109.25626288
 116.56771017  37.11925585 118.64551264 109.24984224 109.13708435
 116.44176205 109.78479507 109.65234039 110.31674925  78.15068799
 107.76005604 112.69874826 108.35533384 109.26469372 110.5985257
 109.78295962  82.46025224 110.87608913  76.90164111 109.24984224
  77.06030973 112.51849872  80.96087355  82.32045173 112.81308912
  75.6606271   76.04836598  78.66393233  75.7367637   78.39887554
  74.81567715  74.25471244  76.75739287  80.06692786  77.13389937
  76.21597272  76.75124572  75.78362713  67.27772921  76.05908203
  75.91556359  77.14518269  63.60302447  78.08479355  71.69385654
  51.84369599  62.43415603  60.0882045   65.11850743  60.800911
  54.19557967  70.1336686   51.90934755  71.07498891  63.26717431
  61.51910038  65.7978305   60.0882045   68.92740328  61.41645185
  63.1832834   65.7089122   61.627246    63.92317183  64.40071506
  64.36108473  63.32173634  64.90137402  63.74257556  77.45370447
  63.95281033 113.85392192  61.67680339  64.66253015 115.09995896
 125.14950401 149.77285419 129.97443507 114.34227409 112.80836985
 112.83065543 113.42447652 138.99476086 113.37414902  92.51698962
 114.0734442  113.31758755 112.18392355  90.10984263 101.15223727
 114.37717989  91.24213803  86.49740675  88.27643885  91.82985884
  89.9153451   89.99055321  92.22157241  91.63104466  90.20735126
  95.33426155  92.84802084  91.35442172  89.87104461  90.91592043
  92.11405484  85.2400677   92.24276291  92.84186127  90.83828621
  90.10753667  88.76313372  90.68898545  89.94597796  95.33426155
  88.10732228  90.18469612  88.72991895  92.91754055  92.62948768
  95.33426155  92.51698962  91.65591219  87.0245856   90.85408385
  92.96370858  84.62291166  87.00235654  92.24150658  90.48865758
  89.97043298  87.84606913  86.32338492  86.50586831  10.57375851
   7.09638549  14.330216     2.48862129  10.84186512   2.48465565
  17.06594362   0.72883643  67.24101369  13.74630904  92.67357207
  94.67056438  94.69688128  94.58160528   4.18735646  86.92334905
  94.66136391  86.40815745  64.13662035  67.37930737  11.00802208
  73.13290291  68.94989468  83.45282242  83.24891806  60.68180368
  81.83721752  83.04758127  84.22244683  38.27386183  30.26547531
  81.42433784  80.01276563  42.86580736  83.52513852  46.15345599
  82.25160017  83.93519529 114.17379355  79.22512732 119.08476178
  83.58340216 109.62474462 106.2173463  109.61127812 109.28753383
 105.3148437   73.14570034  68.10745412  67.44580613  54.19557967
  58.35591392 117.75667811  72.76224014  59.08274625  57.90451079
  62.4052644   72.58003396  64.49325873  67.6137512   60.6978272
  62.83263328  62.13587865 112.44515057  61.30219469  57.59223812
  71.11666999  88.56911016  64.07931345 113.75605306  94.47497216
 105.57683502  93.03561663  98.15615742 113.70516773  93.31596272
 113.76682074  90.37718578  90.68898545  90.86139508 116.58319608
 107.78651802 118.01157757 118.66360741 118.60844268   4.48981547
  10.68008144  27.00122046  15.47640147   0.82023256  13.3242637
  13.11627029  10.4449241    5.75412302   5.33860499  40.23811654
   9.62156702  22.0004146   25.20552354  24.99767491  24.50162149
  11.20880803 118.48351318   7.49569868  42.10008837  19.87723065
  85.73944692   5.53411942  79.03707872   7.19145944   9.97873465
  12.90948872   6.3373011    7.35803549  81.34565256  70.62983946
  81.73341568  82.28674622  87.76095063  76.31930807  80.08024598
  82.40044331  76.59011277  88.40402699  62.05276334  80.83164624
  89.18736446  82.65970851  55.29741244  84.70645115  82.70787
  29.55476029  23.17457011  31.21514768  31.22662694  84.56459008
  55.12377449  47.04959649  36.45315856  57.25914118  54.59313824
  60.39091798  56.85614542  61.90311024  34.85345386  29.29672716
  34.02790116  34.83801004  34.90527977  53.29704116  56.4325512
  40.56078293  41.85005139  34.28899047  31.39645065  28.97749494
  30.49144675  56.06361973  29.42959121  84.88046383  54.91329761
  29.1786203   55.04443536  49.99191669  58.67879483  57.01622987
  56.81422225  26.79629398  63.24011861  31.35753323  29.27511388
  58.14710457  51.69747181  56.22395678  45.31161953  52.54683822
  49.74295782  55.97055203  50.89971277  51.26635152  53.230007
  49.25578268  47.97512721  64.45016382  56.67349236 108.12087388
  97.90221513 109.21548315 106.95747843 107.07835632 107.72678737
 106.0771403  104.6423619   92.14069181  80.11023968  94.63563728
 107.08916282  68.29624064  91.6882411   87.49455891  93.44553111
  87.08341746 107.11282168  85.26622128  97.4446242   85.73317322
  98.04797615  82.26430976  90.14209967  89.66844107  92.60417239
  96.60050358  80.11023968  85.11251413  92.77952803  85.90730097
  93.68944555  82.06042249  86.25736185 106.23680081  91.08959428
  89.26578201  82.85621885  92.77023818 100.1658063   89.26578201
  96.34908428  83.82105952  83.080916    88.4890959   93.84065857
  80.73272222 105.31609949  84.22680773  83.33802813  93.22049403
  82.43650623  86.83009095  86.70502323  92.57248874  88.07101524
  82.792144    85.77303736  85.89487784  85.989464    88.76039295
  88.29955422  84.47181652  82.72893026  85.04752989  83.03265734
  86.31206623  88.41861044  83.38128772  85.49567154  86.21430444
  83.92786765  45.44180727  34.963663    84.90747059  46.04053766
  81.60011195  81.96247287  48.23987384  57.19293647  40.82969274
  28.9206645   30.27882715  45.93691773  33.36829762  24.70339267
  24.63676649  26.39176853  31.35753323  54.54524996  31.21624744
  46.51279332  37.10099369  52.97570115  55.69079154  26.84178321
  47.89812326  22.93053331  56.58722433  36.75803354  35.22908366
  24.23520731  46.41132223  21.32280346  47.27817231  53.03205473
  52.73873859  61.17781726  51.94235829  34.41445029  56.4873337
  52.43124476  34.42290675  52.29041014  79.32083087  88.4890959
  69.45915433  69.22320485  70.53940071  72.42349097  80.1174977
  80.1174977   65.55966269  85.37177398  62.73003992  66.83434646
  64.37666585  58.29832455  63.12140592  73.12675304  71.90151741
  73.72490405  79.285107    64.36364164  69.10885502  80.11023968
  64.88967859  97.70045127  84.31507841  61.19160337  93.72549232
  63.83673053  64.03188955  64.36364164  64.54374859  34.2101134
  47.50876198  60.51295837 100.4185163   32.55860583 101.29327203
  97.70045127  93.96219427  85.98318585  79.15967083  59.52301144
  44.7890111   31.57657118  53.3068013   87.6168451   46.92343638
  48.25351089  34.4548615   24.63676649  39.20121111  57.99136765
  26.67957862  61.56797339  43.33887115  36.8570619   50.62897673
  30.62684113  33.13060559  64.36817078  36.81330551  48.38390884
  23.52948452  37.10099369  21.37417322  34.99081974  51.91351922
  23.73736234  41.96175316  29.80677898  57.99136765  46.41132223
  31.71191367  42.26425937  48.97795828  69.2854667   60.38489142
  66.52827572  64.28138548  70.18381085  66.11302768  95.16799804
  59.03934712 107.05553788  49.6250453   85.39305344  86.05927804
  36.81330551  47.62161426  47.97512721  31.12814763 109.58045132
  53.03360903  36.81330551  64.07769235  45.21448941  51.52391471
  61.77949536  31.18068415  50.00712041  57.66130936  31.79314272
  57.532146    62.84935144  97.73107237  56.38092266  28.53644007
  61.20890377  57.42174096  59.91466624  54.62803362  49.26724583
  52.71559894  42.00543973  57.70056643  60.74709554  57.05603575
  39.12861989  54.29275916  56.48626104  57.89191975  92.80699323
  47.018495    97.69975655  99.01842838  41.75131897  50.7239668
  39.96259416  49.30050271 101.7815881   88.64981674  57.84882558
  92.90773236  61.66448168  93.58619274  85.3579635  101.28525767
  90.27532486  87.60418079  86.69360277 101.40862438 101.88850648
  90.72877064  60.21037229  98.99069411  93.7926647   89.58234356
  90.74105198  41.60973231  61.68403832  39.2692827   38.89245497
  62.00454026  55.01299023  34.16403776  90.72877064  73.50866964
  82.38697483  82.31678746  81.99069083  78.79685006  90.05740222
  81.92393797  80.95802135  90.66119696  82.20062296  85.35450874
  68.48929783  87.27112429  82.02334907  80.89938474  87.99751122
  81.22868519  84.97901477  52.54648892 109.50282524  89.36447218
  85.12968156  62.2476069   30.11050454  57.67224278  91.69956714]
第0个任务到第k个会员之间的距离: [ 48.4120123    1.71402098  85.23709477 ... 100.20090358  14.19957514
  85.36443222]
对第0个任务计算指标z1~z5
Z1 = len(D1[D1<5])
print('周围任务数量:',Z1)


Z2=A.iloc[D1<=5,3].mean()
Z3=len(D2[D2<=5])   
Z4=B.iloc[D2<=5,5].mean()
Z5=B.iloc[D2<=5,3].sum()
周围任务数量: 18
Z2=A.iloc[D1<=5,3].mean()
print('周围任务平均价格:',Z2)
周围任务平均价格: 66.19444444444444
Z3=len(D2[D2<=5])   
print('周围会员数量:',Z3)
周围会员数量: 45
Z4=B.iloc[D2<=5,5].mean()
print('周围会员平均信誉值:',Z4)
周围会员平均信誉值: 1302.327115555555
Z5=B.iloc[D2<=5,3].sum()
print('周围会员总共可预定任务限额:',Z5)
周围会员总共可预定任务限额: 548

所有任务计算指标z1~z5
import pandas as pd     #导入pandas库
import numpy as np      #导入numpy库
import math               #导入数学函数包
A=pd.read_excel('附件一:已结束项目任务数据.xls') 
B=pd.read_excel('附件二:会员信息数据.xlsx')
# 预定义,存放所有任务的指标Z1、Z2、Z3、Z4、Z5
Z=np.zeros((len(A),6))
for t in range(len(A)):
    A_Wt=A.iloc[t,1]  #第q个任务的维度
    A_Jt=A.iloc[t,2]  #第q个任务的经度
    # 预定义数组D1,用于存放第q个任务与所有任务之间的距离
    # 预定义数组D2,用于存放第q个任务与所有会员之间的距离
    D1=np.zeros((len(A)))
    D2=np.zeros((len(B)))
    for i in range(len(A)):
       A_Wi=A.iloc[i,1]  #第t个任务的维度
       A_Ji=A.iloc[i,2]  #第t个任务的经度
       #第q个任务到第t个任务之间的距离
       d1=111.19*math.sqrt((A_Wt-A_Wi)**2+(A_Jt-A_Ji)**2*
          math.cos((A_Wt+A_Wi)*math.pi/180)**2);  
       D1[i]=d1
    for k in range(len(B)):
       B_Wk=B.iloc[k,1]          #第k个会员的维度
       B_Jk=B.iloc[k,2]  #第k个会员的经度
       #第q个任务到第k个会员之间的距离
       d2=111.19*math.sqrt((A_Wt-B_Wk)**2+(A_Jt-B_Jk)**2*
          math.cos((A_Wt+B_Wk)*math.pi/180)**2); 
       D2[k]=d2
    Z[t,0]=t
    Z[t,1]=len(D1[D1<=5])
    Z[t,2]=A.iloc[D1<=5,3].mean()
    Z[t,3]=len(D2[D2<=5])
    Z[t,4]=B.iloc[D2<=5,5].mean()
    Z[t,5]=B.iloc[D2<=5,3].sum()
Z
array([[0.00000000e+00, 1.80000000e+01, 6.61944444e+01, 4.50000000e+01,
        1.30232712e+03, 5.48000000e+02],
       [1.00000000e+00, 9.00000000e+00, 6.83888889e+01, 4.30000000e+01,
        1.24269663e+02, 1.52000000e+02],
       [2.00000000e+00, 2.40000000e+01, 6.58750000e+01, 6.00000000e+01,
        1.01404358e+03, 6.79000000e+02],
       ...,
       [8.32000000e+02, 1.10000000e+01, 6.80454545e+01, 3.00000000e+01,
        9.68627900e+01, 7.70000000e+01],
       [8.33000000e+02, 2.90000000e+01, 6.71206897e+01, 4.40000000e+01,
        5.06203720e+02, 8.61000000e+02],
       [8.34000000e+02, 1.20000000e+01, 7.74166667e+01, 1.00000000e+01,
        5.68841400e+01, 8.70000000e+01]])


2、Z6~Z12的计算

import datetime
def find_I(h1,m1,h2,m2,D2,B):
   I1=B.iloc[:,4].values>=datetime.time(h1,m1)
   I2=B.iloc[:,4].values<=datetime.time(h2,m2)
   I3=D2<=5
   I=I1&I2&I3
   return I
#以第0个任务为例,计算z5~z12

import pandas as pd     #导入pandas库
import numpy as np      #导入nmypy库
import math             #导入数学函数模
A=pd.read_excel('附件一:已结束项目任务数据.xls') 
B=pd.read_excel('附件二:会员信息数据.xlsx')
Z=np.zeros((len(A),13))
A_W0=A.iloc[0,1]  #第0个任务的维度
A_J0=A.iloc[0,2]  #第0个任务的经度
D2=np.zeros((len(B))) #预定义,第0个任务与所有会员之间的距离
for k in range(len(B)):
    B_Wk=B.iloc[k,1]   #第k个会员的维度
    B_Jk=B.iloc[k,2]   #第k个会员的经度
    d2=111.19*math.sqrt((A_W0-B_Wk)**2+(A_J0-B_Jk)**2*
       math.cos((A_W0+B_Wk)*math.pi/180)**2);
    D2[k]=d2
    
Z5=B.iloc[D2<=5,3].sum()
Z6=B.iloc[find_I(6,30,6,30,D2,B),3].sum()
Z7=B.iloc[find_I(6,33,6,45,D2,B),3].sum()
Z8=B.iloc[find_I(6,48,7,3,D2,B),3].sum()
Z9=B.iloc[find_I(7,6,7,21,D2,B),3].sum()
Z10=B.iloc[find_I(7,24,7,39,D2,B),3].sum()
Z11=B.iloc[find_I(7,42,7,57,D2,B),3].sum()
Z12=B.iloc[find_I(8,0,8,0,D2,B),3].sum()

Z6_12=sum([Z6,Z7,Z8,Z9,Z10,Z11,Z12])
print('Z5= ',Z5)
print('sum(Z6~Z12)=',Z6_12)
Z5=  548
sum(Z6~Z12)= 548

3、所有指标的计算

import pandas as pd     #导入pandas库
import numpy as np      #导入nmypy库
import math             #导入数学函数模

A=pd.read_excel('附件一:已结束项目任务数据.xls') 
B=pd.read_excel('附件二:会员信息数据.xlsx')
Z=np.zeros((len(A),13))
for t in range(len(A)):
   A_Wt=A.iloc[t,1]  #第t个任务的维度
   A_Jt=A.iloc[t,2]  #第t个任务的经度
   D1=np.zeros(len(A))
   D2=np.zeros(len(B))
   for i in range(len(A)):
      A_Wi=A.iloc[i,1]  #第i个任务的维度
      A_Ji=A.iloc[i,2]  #第i个任务的经度
      d1=111.19*math.sqrt((A_Wt-A_Wi)**2+(A_Jt-A_Ji)**2*
         math.cos((A_Wt+A_Wi)*math.pi/180)**2);  
      D1[i]=d1
   for k in range(len(B)):
      B_Wk=B.iloc[k,1]   #第k个会员的维度
      B_Jk=B.iloc[k,2]   #第k个会员的经度
      d2=111.19*math.sqrt((A_Wt-B_Wk)**2+(A_Jt-B_Jk)**2*
         math.cos((A_Wt+B_Wk)*math.pi/180)**2);
      D2[k]=d2

   Z[t,0]=t
   Z[t,1]=len(D1[D1<=5])
   Z[t,2]=A.iloc[D1<=5,3].mean()
   Z[t,3]=len(D2[D2<=5])
   Z[t,4]=B.iloc[D2<=5,5].mean()
   Z[t,5]=B.iloc[D2<=5,3].sum()
   Z[t,6]=B.iloc[find_I(6,30,6,30,D2,B),3].sum()
   Z[t,7]=B.iloc[find_I(6,33,6,45,D2,B),3].sum()
   Z[t,8]=B.iloc[find_I(6,48,7,3,D2,B),3].sum()
   Z[t,9]=B.iloc[find_I(7,6,7,21,D2,B),3].sum()
   Z[t,10]=B.iloc[find_I(7,24,7,39,D2,B),3].sum()
   Z[t,11]=B.iloc[find_I(7,42,7,57,D2,B),3].sum()
   Z[t,12]=B.iloc[find_I(8,0,8,0,D2,B),3].sum()
np.save('Z',Z)
Z
array([[  0.        ,  18.        ,  66.19444444, ...,   9.        ,
         13.        , 128.        ],
       [  1.        ,   9.        ,  68.38888889, ...,  12.        ,
         14.        ,  76.        ],
       [  2.        ,  24.        ,  65.875     , ...,  17.        ,
         13.        , 157.        ],
       ...,
       [832.        ,  11.        ,  68.04545455, ...,  16.        ,
          4.        ,  12.        ],
       [833.        ,  29.        ,  67.12068966, ...,  66.        ,
        109.        ,  38.        ],
       [834.        ,  12.        ,  77.41666667, ...,  11.        ,
          0.        ,  15.        ]])

五、任务定价模型构建

1、指标数据预处理

1、空值处理
#先将12个指标的存放数组Z转换成数据框,进而通过数据框的fillna()方法将空值填充
import numpy as np 
import pandas as pd
Z=np.load('Z.npy')
Data=pd.DataFrame(Z[:,1:])
Data=Data.fillna(0)
Z
array([[  0.        ,  18.        ,  66.19444444, ...,   9.        ,
         13.        , 128.        ],
       [  1.        ,   9.        ,  68.38888889, ...,  12.        ,
         14.        ,  76.        ],
       [  2.        ,  24.        ,  65.875     , ...,  17.        ,
         13.        , 157.        ],
       ...,
       [832.        ,  11.        ,  68.04545455, ...,  16.        ,
          4.        ,  12.        ],
       [833.        ,  29.        ,  67.12068966, ...,  66.        ,
        109.        ,  38.        ],
       [834.        ,  12.        ,  77.41666667, ...,  11.        ,
          0.        ,  15.        ]])
2、相关性分析
#我们通过数据处理获得的12个指标,其之间是否存在较强的相关性?
R = Data.corr()
R
0 1 2 3 4 5 6 7 8 9 10 11
0 1.000000 -0.616120 0.703197 0.117067 0.740254 0.221523 0.288878 0.677010 0.546820 0.353876 0.558365 0.730319
1 -0.616120 1.000000 -0.637442 -0.236682 -0.591313 -0.223977 -0.331366 -0.385930 -0.361395 -0.466696 -0.461521 -0.549973
2 0.703197 -0.637442 1.000000 0.103925 0.745033 0.042267 0.200093 0.628015 0.523935 0.383107 0.461526 0.949560
3 0.117067 -0.236682 0.103925 1.000000 0.381843 0.588993 0.451751 0.137975 0.105006 0.171130 0.196037 0.072652
4 0.740254 -0.591313 0.745033 0.381843 1.000000 0.600282 0.682588 0.679009 0.730280 0.543601 0.732003 0.723339
5 0.221523 -0.223977 0.042267 0.588993 0.600282 1.000000 0.671223 0.146441 0.381157 0.347796 0.568981 -0.014438
6 0.288878 -0.331366 0.200093 0.451751 0.682588 0.671223 1.000000 0.209410 0.497034 0.362163 0.420605 0.133269
7 0.677010 -0.385930 0.628015 0.137975 0.679009 0.146441 0.209410 1.000000 0.297241 0.262464 0.470540 0.657746
8 0.546820 -0.361395 0.523935 0.105006 0.730280 0.381157 0.497034 0.297241 1.000000 0.468725 0.516445 0.499932
9 0.353876 -0.466696 0.383107 0.171130 0.543601 0.347796 0.362163 0.262464 0.468725 1.000000 0.470236 0.286150
10 0.558365 -0.461521 0.461526 0.196037 0.732003 0.568981 0.420605 0.470540 0.516445 0.470236 1.000000 0.398002
11 0.730319 -0.549973 0.949560 0.072652 0.723339 -0.014438 0.133269 0.657746 0.499932 0.286150 0.398002 1.000000
#绘制各变量数据相关性的热力图
import seaborn as sns
import matplotlib
#指定默认字体
matplotlib.rcParams['font.sans-serif'] = ['SimHei']
matplotlib.rcParams['font.family']='sans-serif'
#解决负号'-'显示为方块的问题
matplotlib.rcParams['axes.unicode_minus'] = False
#绘制热力图关键代码
sns.heatmap(Data.corr(),cmap="YlGnBu_r",linewidths=0.1,vmax=1.0, square=True,linecolor='white', annot=True)
<matplotlib.axes._subplots.AxesSubplot at 0x19f7c163148>

png

3、标准化处理
#这里采用均值-方差规范方法
#导入预处理库
from sklearn import preprocessing
#均值-方差规范化(Z-Score规范化)
data = preprocessing.scale(Data)
pd.DataFrame(data)
0 1 2 3 4 5 6 7 8 9 10 11
0 -0.033442 -0.857642 0.193732 2.469184 1.147050 2.112414 1.915995 -0.476946 0.157467 -0.396735 -0.221478 0.422689
1 -0.753540 -0.182698 0.133707 -0.247119 -0.497675 0.125506 -0.816221 -0.653897 -0.901890 -0.209398 -0.181570 -0.068339
2 0.446624 -0.955893 0.643915 1.804475 1.691138 2.112414 2.414133 0.496284 0.687146 0.102830 -0.221478 0.696532
3 -1.313616 1.850677 -1.066782 -0.532798 -1.116524 -0.478514 -0.816221 -0.919324 -0.901890 -0.958745 -0.700374 -0.767110
4 1.406755 -0.925991 1.004062 1.433901 2.010946 2.112414 2.429229 1.513752 0.531358 0.914622 0.057879 0.838175
... ... ... ... ... ... ... ... ... ... ... ... ...
830 1.886820 -0.847878 0.463842 -0.269858 1.529158 0.411621 1.206524 0.982899 1.995764 1.539078 1.095488 0.960932
831 -0.353485 0.224953 -0.886709 -0.470258 -0.559976 0.522888 -0.152036 -0.919324 -0.652630 -0.209398 -0.381110 -0.776553
832 -0.593518 -0.288328 -0.256452 -0.310313 -0.809176 -0.478514 -0.529414 -0.676016 -0.434527 0.040384 -0.580650 -0.672682
833 0.846678 -0.572757 0.163719 0.633524 2.447048 3.225083 2.655655 0.562640 2.774703 3.162663 3.609695 -0.427167
834 -0.513507 2.593969 -0.856697 -0.402493 -0.767643 -0.478514 -0.816221 0.120263 -0.465684 -0.271844 -0.740282 -0.644353

835 rows × 12 columns

4、主成分分析
#1.导入主成分分析模块PCA
from sklearn.decomposition import PCA
#2.利用PCA创建主成分分析对象pca
pca = PCA(n_components = 0.9) # 保留95%的信息量,那么PCA就会自动选使得信息量>=95%的特征数量(贡献率)
#3.调用pca对象中的fit()方法,对待分析的数据进行拟合训练
pca.fit(data)
PCA(n_components=0.9)
#4.调用pca对象中的transform()方法,返回提取的主成分
x= pca.transform(data)
x
array([[ 1.88813556,  2.75633337,  1.78617768,  0.48935664,  1.28646235,
        -0.48351853],
       [-1.1349675 , -0.2409384 ,  0.14757657,  0.64031776, -0.3301473 ,
        -0.263724  ],
       [ 3.10338507,  2.25231433,  1.32128386,  0.05102004,  1.2776191 ,
        -0.1233936 ],
       ...,
       [-1.48822184, -0.17173705, -0.31113801,  0.84155013, -0.01903963,
        -0.22246347],
       [ 5.59135074,  3.98004815, -2.65199462, -0.6723497 , -1.12130499,
         0.3550163 ],
       [-2.45968856, -0.13832587, -0.08423316, -1.3640026 , -0.2457046 ,
         1.51009867]])
#5.通过pca对象中的components_属性、explained_variance_属性、explained_variance_ratio_属性,
#返回主成分分析中对应的特征向量、特征值和主成分方差百分比(贡献率)


#返回特征向量
tzxl = pca.components_
print('特征向量:')
print(tzxl)
print()

#返回特征值
tz = pca.explained_variance_
print('特征值:')
print(tz)
print()

#返回主成分方差百分比(贡献率)
gxl = pca.explained_variance_ratio_
print('方差百分比(贡献率)、每个特征在原始数据信息占比:')
print(gxl)
特征向量:
[[ 0.33374174 -0.28965409  0.32903688  0.14329823  0.39790519  0.20823662
   0.23896688  0.28231708  0.29428419  0.24414444  0.30591633  0.31244789]
 [-0.20402499  0.08763272 -0.32570851  0.40217845  0.07577229  0.53198212
   0.41760583 -0.22216794  0.0535007   0.11460792  0.12360449 -0.37294204]
 [ 0.06199872 -0.04408505  0.08889273  0.62706601  0.02467587  0.05848869
  -0.02328601  0.31765507 -0.44061552 -0.48336889 -0.19145921  0.15224327]
 [-0.10906755 -0.61400776  0.11189973  0.28168321 -0.15209996 -0.1469648
  -0.11233818 -0.30155108 -0.24740316  0.51486107 -0.2109514  -0.01245172]
 [-0.09748172  0.05587617  0.20586487  0.13773047  0.07432791 -0.16309512
   0.33468972 -0.37902011  0.48274625 -0.23269545 -0.54353412  0.23581049]
 [-0.24592211  0.56781209  0.07076979  0.2370881   0.07580318 -0.08781261
  -0.11464317  0.35107648  0.03399221  0.57193268 -0.25619566  0.11017719]]

特征值:
[6.02239628 2.1069067  0.91921536 0.74194921 0.63357744 0.48293426]

方差百分比(贡献率)、每个特征在原始数据信息占比:
[0.50126532 0.17536529 0.07650954 0.06175505 0.05273489 0.04019632]

这里我们可以看出,原来的12个指标数据,经过主分析分析后,在累计贡献率0.9以上的要求下,降为6个综合指标数据,即6个主成分。基于这六个主成分数据,就可以构建任务定价模型了

2、多元线性回归模型

根据上面获得六个主成分数据,将附件1的任务定价数据拆分为未执行任务和已执行任务两种情况
#线性回归
A=pd.read_excel('附件一:已结束项目任务数据.xls') 
A4=A.iloc[:,4].values
x_0=x[A4==0,:] #未执行任务主成分数据
x_1=x[A4==1,:] #执行任务主成分数据
y=A.iloc[:,3].values
y=y.reshape(len(y),1)
y_0=y[A4==0]#未执行任务定价数据
y_1=y[A4==1]#执行任务定价数据



from sklearn.linear_model import LinearRegression as LR
lr = LR()    #创建线性回归模型类
lr.fit(x_1, y_1) #拟合
Slr=lr.score(x_1,y_1)   # 判定系数 R^2
c_x=lr.coef_        # x对应的回归系数
c_b=lr.intercept_   # 回归系数常数项
print('判定系数: ',Slr)

判定系数:  0.526173439561583

可以看出,多元线性回归模型的判定系数为 0.526173439561583,其线性关系较弱,所以这里考虑使用非线性神经网络模型

3、神经网络模型

from sklearn.neural_network import MLPRegressor 
#两个隐含层300*5
clf = MLPRegressor(solver='lbfgs', alpha=1e-5,hidden_layer_sizes=(300,5), random_state=1,max_iter=5000) 
clf.fit(x_1, y_1);   
rv1=clf.score(x_1,y_1)
y_0r=clf.predict(x_0)
print('拟合优度: ',rv1)
拟合优度:  0.9565623194412773
print('未执行的任务,利用模型重新预测定价:',y_0r)
未执行的任务,利用模型重新预测定价: [ 80.44388588  66.81514851  74.94944184  69.32090977  75.61555564
  66.50766349  73.39593324  64.55910094  64.21833084  70.95075732
  64.53694016  70.762575    51.73427798  80.17798174  70.86391429
  78.44687835  65.5200754   78.15422279  69.08827421  74.56798559
  70.95826185  73.31931118  49.51370471  79.07693939  82.8844776
  50.02346443  67.9868474   74.37683493  71.10611037  47.39867621
  54.85182513  65.89467657  50.02364029  72.44343038  70.79706155
  64.91992986  59.66025425  70.99421182  59.83164316  78.87093592
  76.62932912  46.13669937  81.40580123  74.56798559  70.49298111
  72.65927885  87.4222359   63.50814134  58.92072527  83.93482824
  66.55898071  88.08443762  68.54551205  51.54352791  78.39956867
  99.71636365  76.11052173  73.80315339  81.79033118  81.40580123
  60.00311414  58.88444509  73.66401633  63.51566604  65.92526655
  77.93720103  68.19248082  84.46806964  65.26855086  72.7602534
  73.45485468  69.39175757  70.53199214  74.64057376  78.27095656
  65.92526655  66.66599358  64.28454453  57.97978822  62.42910676
  76.15082698  64.87665761  71.89584321  65.74039955  92.82162049
  67.35494947  63.05240792  61.52006801  70.06164625  86.40458254
  73.77466041  64.28454453  69.67475292  70.714708    55.52369418
  58.11552507  57.35581937  67.55087931  74.66848073  63.90134384
  42.24709198  58.07602003  72.27910917  56.31806302  75.39077678
  66.66787083  64.19496287  48.90633039  75.62013172  56.91551924
  70.77186226  49.8600344   79.27220255  79.27220255  74.63521147
  63.53427063  74.04946322  66.88659111  63.05717058  70.03887049
  74.64416753  55.67717532  39.29106158  71.53194885  66.06114094
  68.9910377   60.58266266  77.37101974  74.56944815  95.23750992
  47.22807472  66.06114094  69.20635318  94.0392552   87.05606018
  69.83309812  70.0373717   69.29768051  67.53135228  61.52636495
  59.86243708  56.70859406   4.31369256  74.16992645  16.10115134
  73.12644125  50.89060718  52.93999836  65.52454388  66.98660889
  63.17908818  78.80476767  67.05499367  65.65511802  67.05499367
  69.99964874  67.05499367  67.64900059  61.71251231  67.64900059
  95.33090912  93.12011972  59.94673929  59.56711859  61.98787835
  60.24581789  63.86033459  76.43651166  66.88659111  71.8919227
  71.8919227   65.21648249  65.49891636  65.37870861  64.91005132
  64.91005132  56.02473578 101.48603703  65.81104418  65.68309688
  82.53875221  69.08022867  65.63685766  52.81125505  63.39088353
  71.77760388  58.36598829  82.4051133   52.9220859  108.09493393
  82.65877356  76.29075085  63.90309546  51.10264051  66.78303839
  60.88191579  73.45485468  58.8473046   69.72263288  69.32729995
  65.49694344 101.41154323  65.46310205  93.08755655  74.48371234
  65.30573783  65.46310205  60.95977065  76.71216353  69.31641487
  65.63963975  52.73504807  58.32146038  61.64022362  67.51519132
  72.05020922  65.95261042  72.34192943  61.50287115  70.12817271
  72.45014607  67.41022999  67.05499367  71.96248389  70.70250927
  84.00655871  74.55839518  63.61600085  60.89184263  65.1120001
  79.91802582  80.86814356  53.70635129  83.34643972  65.92526655
  74.67761252  52.83019962  64.86287839  63.50814134  67.05499367
  66.46803221  67.97524304  52.30288581  72.62706887  81.70496474
  74.37724064  62.04576893  71.83357278  79.44504391   4.31369256
  68.76768305  61.52636495  65.81076788  66.44368506  60.40960326
  50.11122369  60.40960326  49.37834209  66.10658976  73.11458477
  65.95914473  17.90328159  73.29603956  76.26975922  87.97994874
  77.27055419  72.19071291  69.1265566   76.55084371  89.27746274
  80.5622614   49.37834209  52.51138688  87.75022386  59.94917604
  51.79928736  81.79952408  58.6141055   56.8601307   77.09103933
  53.25229624  59.09719434  72.91341126  62.79108776  68.97444158
  70.74490565  70.54036397  72.27066791  72.27066791  71.34241185
  70.26640967  68.97444158  51.71956966  75.93964537  71.24485045
  74.98047904  74.52152133  42.3951782   41.8251968   42.58400044
  72.44531911  41.01576785  45.00294179  72.0471998   43.78673929
  71.34241185  76.13707588  72.39188053  74.79377183  77.57277078
  71.35024166  77.9940469   53.88832827]

六、方案评价

xx=pd.concat((Data,A.iloc[:,[3]]),axis=1) #12个指标+任务定价,自变量
xx=xx.values#转化为数组
yy=A4.reshape(len(A4),1)                  #任务执行情况,因变量
#对自变量与因变量按训练80%、测试20%随机拆分
from sklearn.model_selection import train_test_split
xx_train, xx_test, yy_train, yy_test = train_test_split(xx, yy, test_size=0.2, random_state=4)

from sklearn import svm
#用高斯核,训练数据类别标签作平衡策略
clf = svm.SVC(kernel='rbf',class_weight='balanced')  
clf.fit(xx_train, yy_train) 
rv2=clf.score(xx_train, yy_train);#模型准确率
yy1=clf.predict(xx_test)
yy1=yy1.reshape(len(yy1),1)
r=yy_test-yy1
rv3=len(r[r==0])/len(r) #预测准确率
print('模型准确率: ',rv2)
print('预测准确率: ',rv3)
xx_0=np.hstack((Z[A4==0,1:],y_0r.reshape(len(y_0r),1)))#预测自变量
P=clf.predict(xx_0)   #预测结果,1-执行,0-未被执行
R1=len(P[P==1])      #预测被执行的个数
R1=int(R1*rv3)       #任务完成增加量
print('任务完成增加量: ',R1)
R2=sum(y_0r)-sum(y_0)   #成本增加额
print('成本增加额: ',R2)
模型准确率:  0.6961077844311377
预测准确率:  0.6766467065868264
任务完成增加量:  92
成本增加额:  [-112.65800635]

posted @ 2022-05-15 21:36  AubeLiang  阅读(1640)  评论(0编辑  收藏  举报