数据分析与挖掘练习

1.0 背景

该数据集是澳大利亚某公司无人机送货的记录(2018年8月之前),主要包括以下的列:

  • 'Id' : 记录的ID
  • 'Drone Type' : 无人机的类别分 1类 2类 3类
  • 'Post Type' : 运送的类别 0为普通运送 1为速运
  • 'Package Weight' :包裹的重量
  • 'Origin Region' :出发地的区域代码
  • 'Destination Region' :目的地的区域代码
  • 'Origin Latitude' :出发纬度
  • 'Origin Longitude' :出发经度
  • 'Destination Latitude' :目的地纬度
  • 'Destination Longitude' :目的地经度
  • 'Journey Distance' :运送距离
  • 'Departure Date' :出发日期
  • 'Departure Time' :出发时间
  • 'Travel Time' :飞行时间
  • 'Delivery Time' :到达时间
  • 'Delivery Fare' :运送费用
pd.options.display.max_rows = 10

2.0 载入包和数据

#loading library
import pandas as pd
import re
import matplotlib.pyplot as plt
#import seaborn as sns  !pip intall seaborn
import scipy.stats as st
import numpy as np
import math
from math import *
from datetime import datetime,timedelta

任务1:载入名为‘data.csv’的数据

data = pd.read_csv('data.csv')

DataFrame
Series

type(data)
pandas.core.frame.DataFrame

3.0 数据初步探索

任务2:找出数据有多少行列

data.shape
(37903, 16)

任务3:查看列的统计信息

提示:describe()

data.describe()
Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Travel Time Delivery Fare
count 37893.000000 37883.000000 37903.000000 37893.000000 37893.000000 37903.000000 37903.000000 37903.000000 37903.000000 37903.000000 37863.000000 37874.000000
mean 1.699285 0.298709 25.669901 20.476684 20.452722 -37.728867 145.423058 -37.722054 145.434035 221.954150 208.794518 126.814976
std 0.779845 0.457698 12.107150 11.501110 11.509311 1.899183 6.923993 1.895621 6.909055 116.604355 107.612447 59.314445
min 1.000000 0.000000 5.001000 1.000000 1.000000 -39.006941 -148.337157 -39.006941 -147.691902 0.664000 7.420000 54.020000
25% 1.000000 0.000000 15.199000 11.000000 11.000000 -38.443034 143.965002 -38.431293 143.951543 131.044500 125.165000 97.440000
50% 2.000000 0.000000 25.446000 20.000000 20.000000 -37.707244 145.423386 -37.700695 145.450794 209.796000 196.370000 120.045000
75% 2.000000 1.000000 35.953500 30.000000 30.000000 -37.094433 147.170334 -37.080256 147.216886 302.052000 281.250000 145.800000
max 3.000000 1.000000 55.992000 40.000000 40.000000 38.986998 148.450576 38.989473 148.450576 556.637000 545.460000 1217.690000

任务4:找出每个列名称

data.columns

Index(['Id', 'Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
       'Destination Region', 'Origin Latitude', 'Origin Longitude',
       'Destination Latitude', 'Destination Longitude', 'Journey Distance',
       'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
       'Delivery Fare'],
      dtype='object')

任务5:找出每个列的属性

是Obeject 还是 float

data.info()
data['Drone Type'] = data['Drone Type'].astype('str')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37903 entries, 0 to 37902
Data columns (total 16 columns):
Id                       37878 non-null object
Drone Type               37893 non-null float64
Post Type                37883 non-null float64
Package Weight           37903 non-null float64
Origin Region            37893 non-null float64
Destination Region       37893 non-null float64
Origin Latitude          37903 non-null float64
Origin Longitude         37903 non-null float64
Destination Latitude     37903 non-null float64
Destination Longitude    37903 non-null float64
Journey Distance         37903 non-null float64
Departure Date           37903 non-null object
Departure Time           37903 non-null object
Travel Time              37863 non-null float64
Delivery Time            37903 non-null object
Delivery Fare            37874 non-null float64
dtypes: float64(12), object(4)
memory usage: 4.6+ MB
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37903 entries, 0 to 37902
Data columns (total 16 columns):
Id                       37878 non-null object
Drone Type               37903 non-null object
Post Type                37883 non-null float64
Package Weight           37903 non-null float64
Origin Region            37893 non-null float64
Destination Region       37893 non-null float64
Origin Latitude          37903 non-null float64
Origin Longitude         37903 non-null float64
Destination Latitude     37903 non-null float64
Destination Longitude    37903 non-null float64
Journey Distance         37903 non-null float64
Departure Date           37903 non-null object
Departure Time           37903 non-null object
Travel Time              37863 non-null float64
Delivery Time            37903 non-null object
Delivery Fare            37874 non-null float64
dtypes: float64(11), object(5)
memory usage: 4.6+ MB

任务6:找出数据的前5行和后5行

data.head()

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
0 ID1645282128 2.0 0.0 21.686 19.0 38.0 -37.089338 144.429529 -37.639134 142.891391 149.212 2018-01-16 09:38:17 140.19 11:58:28 99.25
1 ID1697620764 nan 0.0 39.075 15.0 15.0 -38.481935 146.009567 -38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 ID1543933503 2.0 0.0 7.243 33.0 28.0 -38.754167 144.509664 -38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 ID1756517608 2.0 0.0 13.383 10.0 38.0 -37.240526 147.568019 -37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 ID1832325834 2.0 0.0 8.123 1.0 8.0 -38.143985 143.798292 -38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
5 ID1802448576 2.0 0.0 32.859 2.0 28.0 -37.421211 148.044072 -38.159627 148.194048 83.250 2018-05-15 16:35:50 81.12 17:56:57 113.88
6 ID1940231408 1.0 0.0 20.616 29.0 36.0 -37.173949 143.140662 -37.021605 145.197043 183.363 2018-04-01 19:31:12 184.22 22:35:25 85.60
7 ID1299303958 2.0 0.0 44.577 36.0 31.0 -37.123190 145.236196 -37.667199 143.877650 134.543 2018-05-01 18:39:36 127.05 20:46:38 114.22
8 ID1752722028 1.0 0.0 15.363 20.0 30.0 -38.850561 148.317253 -38.024914 144.823938 318.132 2018-05-27 14:48:17 314.64 20:02:55 87.39
9 ID5995243590 1.0 1.0 36.190 18.0 28.0 -38.070189 142.950207 -37.996817 148.026520 445.106 2018-06-17 12:53:02 437.52 20:10:33 142.95
data.tail()

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
37898 NaN 3.0 1.0 27.153 39.0 16.0 -38.446310 148.292498 -36.739777 143.604529 454.968 2018-07-23 08:29:19 366.09 14:35:24 188.49
37899 ID5862552991 1.0 1.0 40.363 9.0 38.0 -38.983710 145.320518 -37.673908 142.879230 258.259 2018-06-26 15:55:37 256.70 20:12:18 122.98
37900 ID5339104082 1.0 1.0 35.955 13.0 32.0 -38.292301 147.562013 -36.605285 148.293183 198.597 2018-03-19 16:41:10 198.97 20:00:08 118.47
37901 ID5468787866 2.0 1.0 29.566 33.0 23.0 -38.853243 144.508346 -37.727691 145.662270 160.816 2018-02-26 04:22:30 150.58 6:53:04 161.96
37902 ID1448126768 3.0 0.0 44.070 36.0 34.0 -37.129313 145.266426 -38.428477 143.341632 222.687 2018-07-07 08:01:42 182.71 11:04:24 144.41

任务7:找出所有列的缺失值个数并且按照多到少排列

隐藏任务:可视化缺失值的列

data.isnull().sum().sort_values(ascending=False)

Travel Time              40
Delivery Fare            29
Id                       25
Post Type                20
Destination Region       10
Origin Region            10
Delivery Time             0
Departure Time            0
Departure Date            0
Journey Distance          0
Destination Longitude     0
Destination Latitude      0
Origin Longitude          0
Origin Latitude           0
Package Weight            0
Drone Type                0
dtype: int64
count = {}
for col in data.columns:
    count_null = data[col].isnull().sum()
    count[col] = count_null
for i,j in sorted(count.items(),key = lambda s: s[1], reverse=True):
    print('列名:%s,存在缺失值 %s 个'%(i,j))

列名:Travel Time,存在缺失值 40 个
列名:Delivery Fare,存在缺失值 29 个
列名:Id,存在缺失值 25 个
列名:Post Type,存在缺失值 20 个
列名:Drone Type,存在缺失值 10 个
列名:Origin Region,存在缺失值 10 个
列名:Destination Region,存在缺失值 10 个
列名:Package Weight,存在缺失值 0 个
列名:Origin Latitude,存在缺失值 0 个
列名:Origin Longitude,存在缺失值 0 个
列名:Destination Latitude,存在缺失值 0 个
列名:Destination Longitude,存在缺失值 0 个
列名:Journey Distance,存在缺失值 0 个
列名:Departure Date,存在缺失值 0 个
列名:Departure Time,存在缺失值 0 个
列名:Delivery Time,存在缺失值 0 个

任务8:找出所有至少含有一个缺失值的行,并统计有多少行

data.isnull().any(axis=1)  # 判断至少有一个缺失值

0        False
1         True
2        False
3        False
4        False
         ...  
37898     True
37899    False
37900    False
37901    False
37902    False
Length: 37903, dtype: bool
data.drop(data.iloc[0,2])

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
1 ID1697620764 NaN 0.0 39.075 15.0 15.0 -38.481935 146.009567 -38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 ID1543933503 2.0 0.0 7.243 33.0 28.0 -38.754167 144.509664 -38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 ID1756517608 2.0 0.0 13.383 10.0 38.0 -37.240526 147.568019 -37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 ID1832325834 2.0 0.0 8.123 1.0 8.0 -38.143985 143.798292 -38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
5 ID1802448576 2.0 0.0 32.859 2.0 28.0 -37.421211 148.044072 -38.159627 148.194048 83.250 2018-05-15 16:35:50 81.12 17:56:57 113.88
6 ID1940231408 1.0 0.0 20.616 29.0 36.0 -37.173949 143.140662 -37.021605 145.197043 183.363 2018-04-01 19:31:12 184.22 22:35:25 85.60
7 ID1299303958 2.0 0.0 44.577 36.0 31.0 -37.123190 145.236196 -37.667199 143.877650 134.543 2018-05-01 18:39:36 127.05 20:46:38 114.22
8 ID1752722028 1.0 0.0 15.363 20.0 30.0 -38.850561 148.317253 -38.024914 144.823938 318.132 2018-05-27 14:48:17 314.64 20:02:55 87.39
9 ID5995243590 1.0 1.0 36.190 18.0 28.0 -38.070189 142.950207 -37.996817 148.026520 445.106 2018-06-17 12:53:02 437.52 20:10:33 142.95
10 ID1483358088 2.0 0.0 23.172 13.0 27.0 -38.225456 147.425515 -37.642798 147.124104 70.051 2018-03-19 09:59:10 69.30 11:08:27 96.95
11 ID1626798395 2.0 0.0 19.754 23.0 26.0 -37.625368 145.838281 -36.789955 147.133916 147.791 2018-02-28 17:40:59 138.92 19:59:54 117.48
12 ID5277549009 3.0 1.0 12.807 4.0 6.0 -36.855984 142.929596 -36.906838 145.696986 246.465 2018-03-26 07:55:48 201.49 11:17:17 173.32
13 ID1950928883 2.0 0.0 22.332 33.0 19.0 -38.894115 144.457143 -37.173740 144.152105 193.365 2018-06-28 16:05:55 179.73 19:05:38 117.67
14 ID5143738648 2.0 1.0 25.880 33.0 5.0 -38.872372 144.606034 -37.553304 145.120753 153.580 2018-05-09 08:33:29 144.10 10:57:34 132.98
15 ID5132897910 2.0 1.0 38.691 7.0 15.0 -38.844622 144.093195 -38.476630 145.849992 158.102 2018-01-05 15:55:00 148.15 18:23:09 147.73
16 ID1290889802 1.0 0.0 30.742 19.0 14.0 -37.178059 144.403991 -37.713867 146.382965 184.783 2018-06-05 13:03:43 185.60 16:09:18 84.82
17 ID5226355535 1.0 1.0 18.055 10.0 18.0 -37.141728 147.256091 -37.983127 143.290272 362.227 2018-05-04 07:42:51 357.32 13:40:10 116.35
18 ID1898978312 1.0 0.0 5.986 23.0 18.0 -37.706960 145.718119 -37.983127 143.182464 224.997 2018-05-23 12:53:12 224.51 16:37:42 82.47
19 ID5284908619 2.0 1.0 30.664 13.0 14.0 -38.250238 147.366610 -37.774617 146.503568 92.371 2018-07-11 12:52:57 89.29 14:22:14 143.96
20 ID1585556406 3.0 0.0 24.942 3.0 39.0 -38.322643 145.505910 -38.453191 148.300405 244.251 2018-03-14 22:16:33 199.74 1:36:17 167.45
21 ID1901962779 1.0 0.0 34.012 27.0 37.0 -37.516090 146.969053 -38.852488 147.816987 166.236 2018-07-17 16:26:55 167.65 19:14:34 88.98
22 ID5590279060 1.0 1.0 55.229 7.0 6.0 -38.876145 143.911302 -36.875012 145.759049 275.633 2018-05-01 12:42:19 273.52 17:15:50 688.24
23 ID1473718059 3.0 0.0 40.741 33.0 17.0 -38.756634 144.375564 -38.817727 147.071495 234.014 2018-06-21 01:04:15 191.66 4:15:54 165.37
24 ID5551646734 3.0 1.0 15.419 35.0 21.0 -36.922413 146.362349 -37.240137 143.768208 233.068 2018-01-05 07:19:48 190.91 10:30:42 174.52
25 ID1772122934 2.0 0.0 10.663 29.0 1.0 -37.019910 142.798295 -38.358124 143.943955 179.929 2018-02-27 04:42:03 167.70 7:29:44 124.95
26 ID1987608852 1.0 0.0 20.685 9.0 35.0 -38.970092 145.435801 -37.068624 146.317717 225.349 2018-05-15 15:35:10 224.85 19:20:00 80.66
27 ID1249352358 2.0 0.0 10.272 8.0 34.0 -38.485641 144.522135 -38.505484 143.311788 105.471 2018-02-05 06:13:29 101.02 7:54:30 102.09
28 ID1611614450 1.0 0.0 7.373 16.0 38.0 -36.600877 143.566811 -37.804113 142.793139 150.483 2018-06-18 14:01:42 152.40 16:34:06 82.11
29 ID1262379299 2.0 0.0 31.358 13.0 16.0 -38.195911 147.436921 -36.734698 143.764279 362.941 2018-04-24 22:40:54 331.59 4:12:29 147.21
30 ID5498216777 2.0 1.0 6.383 40.0 29.0 -37.692167 147.890721 -37.273398 142.951899 438.698 2018-05-09 10:31:10 399.43 17:10:35 152.50
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37873 NaN 1.0 0.0 39.038 7.0 22.0 -38.664403 143.702592 -36.670478 144.312065 228.361 2018-02-02 18:03:25 227.77 21:51:11 87.87
37874 NaN 1.0 0.0 7.171 14.0 1.0 -37.589382 146.559422 -38.229669 143.710836 260.125 2018-02-03 22:11:31 258.51 2:30:01 106.47
37875 NaN 1.0 0.0 39.141 5.0 1.0 -37.491746 145.168879 -38.168680 143.936761 131.959 2018-04-19 22:54:02 134.48 1:08:30 94.83
37876 NaN 2.0 0.0 42.330 25.0 5.0 -36.618297 147.643165 -37.563856 145.038804 254.069 2018-01-27 11:06:09 234.09 15:00:14 116.10
37877 NaN 1.0 1.0 43.965 25.0 38.0 -36.542082 147.863736 -37.549355 142.980000 448.104 2018-06-14 22:15:50 440.42 5:36:15 149.26
37878 ID1525565031 2.0 0.0 27.560 7.0 35.0 -38.689638 144.093996 -37.046945 146.460389 276.894 2018-02-22 23:43:48 254.53 3:58:19 135.68
37879 NaN 1.0 0.0 18.540 36.0 18.0 -36.974023 145.036046 -38.015640 142.955101 217.299 2018-05-28 07:14:50 217.06 10:51:53 76.31
37880 ID1272392458 1.0 0.0 44.226 40.0 3.0 -37.740012 147.768225 -38.276119 145.403536 215.813 2018-02-21 21:28:22 215.63 1:03:59 105.89
37881 ID1909131399 2.0 0.0 6.415 25.0 27.0 -36.585246 147.853062 -37.700285 147.027318 144.135 2018-01-16 08:26:33 135.64 10:42:11 100.45
37882 ID5420355345 3.0 1.0 30.471 3.0 15.0 -38.345519 145.542105 -38.712933 145.977537 55.772 2018-07-22 00:13:54 50.94 1:04:50 183.73
37883 ID5164753016 2.0 1.0 6.553 5.0 15.0 -37.625311 145.097611 -38.630132 146.040376 139.018 2018-01-14 16:51:56 131.06 19:02:59 145.71
37884 ID1122103211 1.0 0.0 37.850 38.0 10.0 -37.510578 142.826103 -37.337327 147.315144 397.278 2018-04-27 14:54:35 391.24 21:25:49 102.88
37885 NaN 1.0 0.0 13.544 34.0 38.0 -38.528632 143.386140 -37.561137 142.835312 118.028 2018-04-09 11:48:46 120.99 13:49:45 69.88
37886 ID1970402579 2.0 0.0 38.656 1.0 32.0 -38.171542 143.871582 -36.756484 148.322553 423.585 2018-03-18 01:01:33 385.90 7:27:26 142.19
37887 ID1388385049 1.0 0.0 23.699 10.0 10.0 -37.352794 147.476662 -37.104700 147.417027 28.118 2018-05-24 00:41:08 33.99 1:15:07 83.77
37888 NaN 1.0 1.0 34.923 36.0 28.0 -37.044972 144.920592 -38.218799 148.050871 305.307 2018-07-20 03:13:54 302.23 8:16:07 130.70
37889 ID1281653747 1.0 0.0 34.130 8.0 36.0 -38.434373 144.730220 -37.047801 145.309517 162.554 2018-02-06 06:27:55 164.08 9:11:59 69.44
37890 ID5349085772 1.0 1.0 18.286 22.0 14.0 -36.720129 144.588398 -37.695893 146.430793 196.154 2018-02-15 17:04:44 196.60 20:21:19 121.11
37891 ID5972337482 1.0 1.0 11.538 33.0 2.0 -38.836802 144.357057 -37.549931 148.306793 374.024 2018-05-26 02:07:50 368.73 8:16:33 141.06
37892 NaN 3.0 1.0 5.416 20.0 33.0 -38.959090 148.294700 -38.930545 144.661887 314.514 2018-04-27 20:53:02 255.21 1:08:14 185.96
37893 ID1539650034 2.0 0.0 34.355 8.0 39.0 -38.520278 144.408786 -38.447195 148.416066 349.251 2018-05-09 07:26:49 319.33 12:46:08 121.30
37894 NaN 2.0 0.0 41.232 38.0 39.0 -37.657406 142.777301 -38.622040 148.366529 500.901 2018-07-02 08:59:29 455.14 16:34:37 139.79
37895 ID1796943211 1.0 0.0 44.341 23.0 24.0 -37.777223 146.024184 -38.913981 142.913934 299.552 2018-02-20 05:08:12 296.66 10:04:51 113.70
37896 ID5429883749 2.0 1.0 17.798 11.0 40.0 -38.045551 146.736254 -37.633007 147.639273 91.711 2018-05-03 10:19:32 88.70 11:48:14 130.51
37897 NaN 1.0 0.0 8.865 9.0 2.0 -38.839254 145.226776 -37.695101 148.251214 293.394 2018-03-11 12:18:21 290.70 17:09:02 88.71
37898 NaN 3.0 1.0 27.153 39.0 16.0 -38.446310 148.292498 -36.739777 143.604529 454.968 2018-07-23 08:29:19 366.09 14:35:24 188.49
37899 ID5862552991 1.0 1.0 40.363 9.0 38.0 -38.983710 145.320518 -37.673908 142.879230 258.259 2018-06-26 15:55:37 256.70 20:12:18 122.98
37900 ID5339104082 1.0 1.0 35.955 13.0 32.0 -38.292301 147.562013 -36.605285 148.293183 198.597 2018-03-19 16:41:10 198.97 20:00:08 118.47
37901 ID5468787866 2.0 1.0 29.566 33.0 23.0 -38.853243 144.508346 -37.727691 145.662270 160.816 2018-02-26 04:22:30 150.58 6:53:04 161.96
37902 ID1448126768 3.0 0.0 44.070 36.0 34.0 -37.129313 145.266426 -38.428477 143.341632 222.687 2018-07-07 08:01:42 182.71 11:04:24 144.41

37902 rows × 16 columns

# axis=1针对的是行;=0针对的是列
data[data.isnull().any(axis=1)].shape
#(data.isnull().sum(axis=1) >= 1).sum()
data[data.isnull().any(axis=1)].shape
data.isnull().any(axis=1)

0        False
1         True
2        False
3        False
4        False
5        False
6        False
7        False
8        False
9        False
10       False
11       False
12       False
13       False
14       False
15       False
16       False
17       False
18       False
19       False
20       False
21       False
22       False
23       False
24       False
25       False
26       False
27       False
28       False
29       False
         ...  
37873     True
37874     True
37875     True
37876     True
37877     True
37878    False
37879     True
37880    False
37881    False
37882    False
37883    False
37884    False
37885     True
37886    False
37887    False
37888     True
37889    False
37890    False
37891    False
37892     True
37893    False
37894     True
37895    False
37896    False
37897     True
37898     True
37899    False
37900    False
37901    False
37902    False
dtype: bool

4.0 数据清洗

任务9: 填补 'Id'列的空值

任务9.1 统计‘id’列有多少个空值

data['Id'].isnull().sum()

25

任务9.2 找出所有‘id’为空的行

data[data['Id'].isnull()]

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
37844 NaN 1.0 0.0 22.498 30.0 15.0 -37.885792 144.875305 -38.680341 145.874064 124.252 2018-01-28 13:07:09 127.02 15:14:10 74.69
37845 NaN 3.0 0.0 32.300 16.0 36.0 -36.571169 143.741010 -36.993435 144.983048 120.297 2018-03-26 02:34:49 101.88 4:16:41 162.20
37846 NaN 3.0 0.0 18.601 38.0 29.0 -37.694132 142.851548 -37.058014 142.873698 70.838 2018-03-02 20:33:11 62.83 21:36:00 142.10
37850 NaN 2.0 1.0 28.203 18.0 21.0 -38.139260 143.345778 -37.279348 143.604189 98.391 2018-04-17 03:28:41 94.68 5:03:21 152.30
37851 NaN 2.0 1.0 45.696 28.0 1.0 -38.152835 147.793072 -38.162103 144.043047 328.220 2018-03-25 13:50:57 300.50 18:51:27 167.39
37852 NaN 1.0 0.0 27.143 38.0 32.0 -37.613135 142.854194 -36.713765 148.383062 500.500 2018-05-12 08:54:52 491.13 17:05:59 97.44
37854 NaN 1.0 0.0 13.002 27.0 3.0 -37.428594 147.056992 -38.383631 145.590528 167.005 2018-02-20 03:35:43 168.39 6:24:06 92.12
37857 NaN 1.0 0.0 19.468 5.0 31.0 -37.570514 145.253281 -37.824303 143.862472 125.716 2018-04-20 11:48:14 128.43 13:56:39 69.86
37860 NaN 1.0 0.0 10.647 21.0 5.0 -37.239046 143.524656 -37.546884 145.204154 152.434 2018-06-13 23:05:39 154.29 1:39:56 95.65
37861 NaN 2.0 0.0 40.775 22.0 19.0 -36.678761 144.345069 -37.167099 144.257043 54.922 2018-03-27 04:31:43 55.75 5:27:28 120.70
37863 NaN 1.0 0.0 35.410 9.0 16.0 -39.006941 145.406988 -36.525200 143.517906 322.399 2018-07-21 06:48:39 318.77 12:07:25 88.14
37866 NaN 1.0 1.0 40.151 14.0 34.0 -37.644061 146.625820 -38.518928 143.369714 301.447 2018-04-14 07:29:13 298.50 12:27:43 118.89
37869 NaN 3.0 1.0 44.559 33.0 10.0 -38.733722 144.474460 -37.220132 147.498884 314.323 2018-04-18 04:57:37 255.06 9:12:40 201.89
37873 NaN 1.0 0.0 39.038 7.0 22.0 -38.664403 143.702592 -36.670478 144.312065 228.361 2018-02-02 18:03:25 227.77 21:51:11 87.87
37874 NaN 1.0 0.0 7.171 14.0 1.0 -37.589382 146.559422 -38.229669 143.710836 260.125 2018-02-03 22:11:31 258.51 2:30:01 106.47
37875 NaN 1.0 0.0 39.141 5.0 1.0 -37.491746 145.168879 -38.168680 143.936761 131.959 2018-04-19 22:54:02 134.48 1:08:30 94.83
37876 NaN 2.0 0.0 42.330 25.0 5.0 -36.618297 147.643165 -37.563856 145.038804 254.069 2018-01-27 11:06:09 234.09 15:00:14 116.10
37877 NaN 1.0 1.0 43.965 25.0 38.0 -36.542082 147.863736 -37.549355 142.980000 448.104 2018-06-14 22:15:50 440.42 5:36:15 149.26
37879 NaN 1.0 0.0 18.540 36.0 18.0 -36.974023 145.036046 -38.015640 142.955101 217.299 2018-05-28 07:14:50 217.06 10:51:53 76.31
37885 NaN 1.0 0.0 13.544 34.0 38.0 -38.528632 143.386140 -37.561137 142.835312 118.028 2018-04-09 11:48:46 120.99 13:49:45 69.88
37888 NaN 1.0 1.0 34.923 36.0 28.0 -37.044972 144.920592 -38.218799 148.050871 305.307 2018-07-20 03:13:54 302.23 8:16:07 130.70
37892 NaN 3.0 1.0 5.416 20.0 33.0 -38.959090 148.294700 -38.930545 144.661887 314.514 2018-04-27 20:53:02 255.21 1:08:14 185.96
37894 NaN 2.0 0.0 41.232 38.0 39.0 -37.657406 142.777301 -38.622040 148.366529 500.901 2018-07-02 08:59:29 455.14 16:34:37 139.79
37897 NaN 1.0 0.0 8.865 9.0 2.0 -38.839254 145.226776 -37.695101 148.251214 293.394 2018-03-11 12:18:21 290.70 17:09:02 88.71
37898 NaN 3.0 1.0 27.153 39.0 16.0 -38.446310 148.292498 -36.739777 143.604529 454.968 2018-07-23 08:29:19 366.09 14:35:24 188.49

9.2.1 删除 除ID列之外其余数据重复的行

data[
    ['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
       'Destination Region', 'Origin Latitude', 'Origin Longitude',
       'Destination Latitude', 'Destination Longitude', 'Journey Distance',
       'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
       'Delivery Fare']
]

Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
0 2.0 0.0 21.686 19.0 38.0 -37.089338 144.429529 -37.639134 142.891391 149.212 2018-01-16 09:38:17 140.19 11:58:28 99.25
1 NaN 0.0 39.075 15.0 15.0 -38.481935 146.009567 -38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 2.0 0.0 7.243 33.0 28.0 -38.754167 144.509664 -38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 2.0 0.0 13.383 10.0 38.0 -37.240526 147.568019 -37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 2.0 0.0 8.123 1.0 8.0 -38.143985 143.798292 -38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
5 2.0 0.0 32.859 2.0 28.0 -37.421211 148.044072 -38.159627 148.194048 83.250 2018-05-15 16:35:50 81.12 17:56:57 113.88
6 1.0 0.0 20.616 29.0 36.0 -37.173949 143.140662 -37.021605 145.197043 183.363 2018-04-01 19:31:12 184.22 22:35:25 85.60
7 2.0 0.0 44.577 36.0 31.0 -37.123190 145.236196 -37.667199 143.877650 134.543 2018-05-01 18:39:36 127.05 20:46:38 114.22
8 1.0 0.0 15.363 20.0 30.0 -38.850561 148.317253 -38.024914 144.823938 318.132 2018-05-27 14:48:17 314.64 20:02:55 87.39
9 1.0 1.0 36.190 18.0 28.0 -38.070189 142.950207 -37.996817 148.026520 445.106 2018-06-17 12:53:02 437.52 20:10:33 142.95
10 2.0 0.0 23.172 13.0 27.0 -38.225456 147.425515 -37.642798 147.124104 70.051 2018-03-19 09:59:10 69.30 11:08:27 96.95
11 2.0 0.0 19.754 23.0 26.0 -37.625368 145.838281 -36.789955 147.133916 147.791 2018-02-28 17:40:59 138.92 19:59:54 117.48
12 3.0 1.0 12.807 4.0 6.0 -36.855984 142.929596 -36.906838 145.696986 246.465 2018-03-26 07:55:48 201.49 11:17:17 173.32
13 2.0 0.0 22.332 33.0 19.0 -38.894115 144.457143 -37.173740 144.152105 193.365 2018-06-28 16:05:55 179.73 19:05:38 117.67
14 2.0 1.0 25.880 33.0 5.0 -38.872372 144.606034 -37.553304 145.120753 153.580 2018-05-09 08:33:29 144.10 10:57:34 132.98
15 2.0 1.0 38.691 7.0 15.0 -38.844622 144.093195 -38.476630 145.849992 158.102 2018-01-05 15:55:00 148.15 18:23:09 147.73
16 1.0 0.0 30.742 19.0 14.0 -37.178059 144.403991 -37.713867 146.382965 184.783 2018-06-05 13:03:43 185.60 16:09:18 84.82
17 1.0 1.0 18.055 10.0 18.0 -37.141728 147.256091 -37.983127 143.290272 362.227 2018-05-04 07:42:51 357.32 13:40:10 116.35
18 1.0 0.0 5.986 23.0 18.0 -37.706960 145.718119 -37.983127 143.182464 224.997 2018-05-23 12:53:12 224.51 16:37:42 82.47
19 2.0 1.0 30.664 13.0 14.0 -38.250238 147.366610 -37.774617 146.503568 92.371 2018-07-11 12:52:57 89.29 14:22:14 143.96
20 3.0 0.0 24.942 3.0 39.0 -38.322643 145.505910 -38.453191 148.300405 244.251 2018-03-14 22:16:33 199.74 1:36:17 167.45
21 1.0 0.0 34.012 27.0 37.0 -37.516090 146.969053 -38.852488 147.816987 166.236 2018-07-17 16:26:55 167.65 19:14:34 88.98
22 1.0 1.0 55.229 7.0 6.0 -38.876145 143.911302 -36.875012 145.759049 275.633 2018-05-01 12:42:19 273.52 17:15:50 688.24
23 3.0 0.0 40.741 33.0 17.0 -38.756634 144.375564 -38.817727 147.071495 234.014 2018-06-21 01:04:15 191.66 4:15:54 165.37
24 3.0 1.0 15.419 35.0 21.0 -36.922413 146.362349 -37.240137 143.768208 233.068 2018-01-05 07:19:48 190.91 10:30:42 174.52
25 2.0 0.0 10.663 29.0 1.0 -37.019910 142.798295 -38.358124 143.943955 179.929 2018-02-27 04:42:03 167.70 7:29:44 124.95
26 1.0 0.0 20.685 9.0 35.0 -38.970092 145.435801 -37.068624 146.317717 225.349 2018-05-15 15:35:10 224.85 19:20:00 80.66
27 2.0 0.0 10.272 8.0 34.0 -38.485641 144.522135 -38.505484 143.311788 105.471 2018-02-05 06:13:29 101.02 7:54:30 102.09
28 1.0 0.0 7.373 16.0 38.0 -36.600877 143.566811 -37.804113 142.793139 150.483 2018-06-18 14:01:42 152.40 16:34:06 82.11
29 2.0 0.0 31.358 13.0 16.0 -38.195911 147.436921 -36.734698 143.764279 362.941 2018-04-24 22:40:54 331.59 4:12:29 147.21
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37873 1.0 0.0 39.038 7.0 22.0 -38.664403 143.702592 -36.670478 144.312065 228.361 2018-02-02 18:03:25 227.77 21:51:11 87.87
37874 1.0 0.0 7.171 14.0 1.0 -37.589382 146.559422 -38.229669 143.710836 260.125 2018-02-03 22:11:31 258.51 2:30:01 106.47
37875 1.0 0.0 39.141 5.0 1.0 -37.491746 145.168879 -38.168680 143.936761 131.959 2018-04-19 22:54:02 134.48 1:08:30 94.83
37876 2.0 0.0 42.330 25.0 5.0 -36.618297 147.643165 -37.563856 145.038804 254.069 2018-01-27 11:06:09 234.09 15:00:14 116.10
37877 1.0 1.0 43.965 25.0 38.0 -36.542082 147.863736 -37.549355 142.980000 448.104 2018-06-14 22:15:50 440.42 5:36:15 149.26
37878 2.0 0.0 27.560 7.0 35.0 -38.689638 144.093996 -37.046945 146.460389 276.894 2018-02-22 23:43:48 254.53 3:58:19 135.68
37879 1.0 0.0 18.540 36.0 18.0 -36.974023 145.036046 -38.015640 142.955101 217.299 2018-05-28 07:14:50 217.06 10:51:53 76.31
37880 1.0 0.0 44.226 40.0 3.0 -37.740012 147.768225 -38.276119 145.403536 215.813 2018-02-21 21:28:22 215.63 1:03:59 105.89
37881 2.0 0.0 6.415 25.0 27.0 -36.585246 147.853062 -37.700285 147.027318 144.135 2018-01-16 08:26:33 135.64 10:42:11 100.45
37882 3.0 1.0 30.471 3.0 15.0 -38.345519 145.542105 -38.712933 145.977537 55.772 2018-07-22 00:13:54 50.94 1:04:50 183.73
37883 2.0 1.0 6.553 5.0 15.0 -37.625311 145.097611 -38.630132 146.040376 139.018 2018-01-14 16:51:56 131.06 19:02:59 145.71
37884 1.0 0.0 37.850 38.0 10.0 -37.510578 142.826103 -37.337327 147.315144 397.278 2018-04-27 14:54:35 391.24 21:25:49 102.88
37885 1.0 0.0 13.544 34.0 38.0 -38.528632 143.386140 -37.561137 142.835312 118.028 2018-04-09 11:48:46 120.99 13:49:45 69.88
37886 2.0 0.0 38.656 1.0 32.0 -38.171542 143.871582 -36.756484 148.322553 423.585 2018-03-18 01:01:33 385.90 7:27:26 142.19
37887 1.0 0.0 23.699 10.0 10.0 -37.352794 147.476662 -37.104700 147.417027 28.118 2018-05-24 00:41:08 33.99 1:15:07 83.77
37888 1.0 1.0 34.923 36.0 28.0 -37.044972 144.920592 -38.218799 148.050871 305.307 2018-07-20 03:13:54 302.23 8:16:07 130.70
37889 1.0 0.0 34.130 8.0 36.0 -38.434373 144.730220 -37.047801 145.309517 162.554 2018-02-06 06:27:55 164.08 9:11:59 69.44
37890 1.0 1.0 18.286 22.0 14.0 -36.720129 144.588398 -37.695893 146.430793 196.154 2018-02-15 17:04:44 196.60 20:21:19 121.11
37891 1.0 1.0 11.538 33.0 2.0 -38.836802 144.357057 -37.549931 148.306793 374.024 2018-05-26 02:07:50 368.73 8:16:33 141.06
37892 3.0 1.0 5.416 20.0 33.0 -38.959090 148.294700 -38.930545 144.661887 314.514 2018-04-27 20:53:02 255.21 1:08:14 185.96
37893 2.0 0.0 34.355 8.0 39.0 -38.520278 144.408786 -38.447195 148.416066 349.251 2018-05-09 07:26:49 319.33 12:46:08 121.30
37894 2.0 0.0 41.232 38.0 39.0 -37.657406 142.777301 -38.622040 148.366529 500.901 2018-07-02 08:59:29 455.14 16:34:37 139.79
37895 1.0 0.0 44.341 23.0 24.0 -37.777223 146.024184 -38.913981 142.913934 299.552 2018-02-20 05:08:12 296.66 10:04:51 113.70
37896 2.0 1.0 17.798 11.0 40.0 -38.045551 146.736254 -37.633007 147.639273 91.711 2018-05-03 10:19:32 88.70 11:48:14 130.51
37897 1.0 0.0 8.865 9.0 2.0 -38.839254 145.226776 -37.695101 148.251214 293.394 2018-03-11 12:18:21 290.70 17:09:02 88.71
37898 3.0 1.0 27.153 39.0 16.0 -38.446310 148.292498 -36.739777 143.604529 454.968 2018-07-23 08:29:19 366.09 14:35:24 188.49
37899 1.0 1.0 40.363 9.0 38.0 -38.983710 145.320518 -37.673908 142.879230 258.259 2018-06-26 15:55:37 256.70 20:12:18 122.98
37900 1.0 1.0 35.955 13.0 32.0 -38.292301 147.562013 -36.605285 148.293183 198.597 2018-03-19 16:41:10 198.97 20:00:08 118.47
37901 2.0 1.0 29.566 33.0 23.0 -38.853243 144.508346 -37.727691 145.662270 160.816 2018-02-26 04:22:30 150.58 6:53:04 161.96
37902 3.0 0.0 44.070 36.0 34.0 -37.129313 145.266426 -38.428477 143.341632 222.687 2018-07-07 08:01:42 182.71 11:04:24 144.41

37903 rows × 15 columns

# drop_duplicates返回一个dataframe,重复的行会标为False
# data_1 = data.drop_duplicates(data[['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
#        'Destination Region', 'Origin Latitude', 'Origin Longitude',
#        'Destination Latitude', 'Destination Longitude', 'Journey Distance',
#        'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
#        'Delivery Fare']])
# 2
data_1 = data.drop_duplicates(['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
    'Destination Region', 'Origin Latitude', 'Origin Longitude',
    'Destination Latitude', 'Destination Longitude', 'Journey Distance',
    'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
    'Delivery Fare'])
data_1

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
0 ID1645282128 2.0 0.0 21.686 19.0 38.0 -37.089338 144.429529 -37.639134 142.891391 149.212 2018-01-16 09:38:17 140.19 11:58:28 99.25
1 ID1697620764 NaN 0.0 39.075 15.0 15.0 -38.481935 146.009567 -38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 ID1543933503 2.0 0.0 7.243 33.0 28.0 -38.754167 144.509664 -38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 ID1756517608 2.0 0.0 13.383 10.0 38.0 -37.240526 147.568019 -37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 ID1832325834 2.0 0.0 8.123 1.0 8.0 -38.143985 143.798292 -38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37839 ID1879423081 2.0 0.0 17.081 20.0 14.0 -38.860358 148.174855 -37.562224 146.443991 209.278 2018-02-15 12:09:15 193.98 15:23:13 121.33
37840 ID5705840841 1.0 1.0 35.164 29.0 23.0 -37.250331 142.837244 -37.739285 145.963420 281.403 2018-01-01 00:07:35 279.10 4:46:41 136.58
37841 ID1276239209 3.0 0.0 36.704 12.0 11.0 -36.578568 145.273744 -38.305557 146.997297 245.269 2018-01-24 06:48:05 200.54 10:08:37 146.69
37842 ID1432868583 3.0 0.0 13.195 37.0 21.0 -38.755802 147.744770 -37.487866 143.585293 390.599 2018-04-07 18:57:52 315.28 0:13:08 170.52
37889 ID1281653747 1.0 0.0 34.130 8.0 36.0 -38.434373 144.730220 -37.047801 145.309517 162.554 2018-02-06 06:27:55 164.08 9:11:59 69.44

37844 rows × 16 columns

9.2.2 设置ID列相同,但其余数据不重复的ID为np.nan

# 返回一个布尔型的series,表示id是否重复行
data_1['Id'].duplicated()

0        False
1        False
2        False
3        False
4        False
         ...  
37839    False
37840    False
37841    False
37842    False
37889     True
Name: Id, Length: 37844, dtype: bool
(data_1['Id'].duplicated()) & ~(data_1[['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
       'Destination Region', 'Origin Latitude', 'Origin Longitude',
       'Destination Latitude', 'Destination Longitude', 'Journey Distance',
       'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
       'Delivery Fare']].duplicated())

0        False
1        False
2        False
3        False
4        False
         ...  
37839    False
37840    False
37841    False
37842    False
37889     True
Length: 37844, dtype: bool
data_1.loc[(data_1['Id'].duplicated()) & ~(data_1[['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
       'Destination Region', 'Origin Latitude', 'Origin Longitude',
       'Destination Latitude', 'Destination Longitude', 'Journey Distance',
       'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
       'Delivery Fare']].duplicated()),'Id'] = np.nan

data_1['Id'].value_counts()

ID1858018960    1
ID1792956453    1
ID1574575344    1
ID1146475421    1
ID1897230811    1
               ..
ID5349642032    1
ID5451065609    1
ID5131869013    1
ID1550669799    1
ID5260585195    1
Name: Id, Length: 37843, dtype: int64

任务9.3 想出空值的填补的2种方法,并且选一种应用

#方法1 :随机数
import random
# 'Id'+str(random.randint(1000000000,2000000000)) in data_1['Id']
#方法2 :随机数填补

def fill_id(Id):
    Id_out = Id
    while Id_out in data_1['Id'].tolist():
        Id_out = 'ID'+str(random.randint(1000000000,2000000000))
        return Id_out
data_1.loc[(data_1['Post Type']==0)&(data_1['Id'].isnull()),'Id'].apply(fill_id)

37889    ID1117118084
Name: Id, dtype: object
data_1.loc[(data_1['Post Type']==0)&(data_1['Id'].isnull()),'Id']

37889    NaN
Name: Id, dtype: object
data_1['Id']

0        ID1645282128
1        ID1697620764
2        ID1543933503
3        ID1756517608
4        ID1832325834
             ...     
37839    ID1879423081
37840    ID5705840841
37841    ID1276239209
37842    ID1432868583
37889             NaN
Name: Id, Length: 37844, dtype: object

任务9.4 检查‘id’是否还有空值

data_1['Id'].isnull().sum()

1

任务10:找出所有重复的id

data[data.Id.duplicated()].Id

37843    ID1874340610
37845             NaN
37846             NaN
37847    ID5156350605
37848    ID1176413101
             ...     
37898             NaN
37899    ID5862552991
37900    ID5339104082
37901    ID5468787866
37902    ID1448126768
Name: Id, Length: 59, dtype: object
data['Id'].value_counts() >= 2

ID5281864060     True
ID1796943211     True
ID1877344172     True
ID5122284320     True
ID1238297934     True
                ...  
ID5672029782    False
ID1114364309    False
ID5495523518    False
ID1874532678    False
ID5260585195    False
Name: Id, Length: 37843, dtype: bool

任务11:删除重复行

data.duplicated()

0        False
1        False
2        False
3        False
4        False
5        False
6        False
7        False
8        False
9        False
10       False
11       False
12       False
13       False
14       False
15       False
16       False
17       False
18       False
19       False
20       False
21       False
22       False
23       False
24       False
25       False
26       False
27       False
28       False
29       False
         ...  
37873    False
37874    False
37875    False
37876    False
37877    False
37878     True
37879    False
37880     True
37881     True
37882     True
37883     True
37884     True
37885    False
37886     True
37887     True
37888    False
37889    False
37890     True
37891     True
37892    False
37893     True
37894    False
37895     True
37896     True
37897    False
37898    False
37899     True
37900     True
37901     True
37902     True
dtype: bool
data.drop_duplicates()

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
0 ID1645282128 2.0 0.0 21.686 19.0 38.0 -37.089338 144.429529 -37.639134 142.891391 149.212 2018-01-16 09:38:17 140.19 11:58:28 99.25
1 ID1697620764 NaN 0.0 39.075 15.0 15.0 -38.481935 146.009567 -38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 ID1543933503 2.0 0.0 7.243 33.0 28.0 -38.754167 144.509664 -38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 ID1756517608 2.0 0.0 13.383 10.0 38.0 -37.240526 147.568019 -37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 ID1832325834 2.0 0.0 8.123 1.0 8.0 -38.143985 143.798292 -38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
5 ID1802448576 2.0 0.0 32.859 2.0 28.0 -37.421211 148.044072 -38.159627 148.194048 83.250 2018-05-15 16:35:50 81.12 17:56:57 113.88
6 ID1940231408 1.0 0.0 20.616 29.0 36.0 -37.173949 143.140662 -37.021605 145.197043 183.363 2018-04-01 19:31:12 184.22 22:35:25 85.60
7 ID1299303958 2.0 0.0 44.577 36.0 31.0 -37.123190 145.236196 -37.667199 143.877650 134.543 2018-05-01 18:39:36 127.05 20:46:38 114.22
8 ID1752722028 1.0 0.0 15.363 20.0 30.0 -38.850561 148.317253 -38.024914 144.823938 318.132 2018-05-27 14:48:17 314.64 20:02:55 87.39
9 ID5995243590 1.0 1.0 36.190 18.0 28.0 -38.070189 142.950207 -37.996817 148.026520 445.106 2018-06-17 12:53:02 437.52 20:10:33 142.95
10 ID1483358088 2.0 0.0 23.172 13.0 27.0 -38.225456 147.425515 -37.642798 147.124104 70.051 2018-03-19 09:59:10 69.30 11:08:27 96.95
11 ID1626798395 2.0 0.0 19.754 23.0 26.0 -37.625368 145.838281 -36.789955 147.133916 147.791 2018-02-28 17:40:59 138.92 19:59:54 117.48
12 ID5277549009 3.0 1.0 12.807 4.0 6.0 -36.855984 142.929596 -36.906838 145.696986 246.465 2018-03-26 07:55:48 201.49 11:17:17 173.32
13 ID1950928883 2.0 0.0 22.332 33.0 19.0 -38.894115 144.457143 -37.173740 144.152105 193.365 2018-06-28 16:05:55 179.73 19:05:38 117.67
14 ID5143738648 2.0 1.0 25.880 33.0 5.0 -38.872372 144.606034 -37.553304 145.120753 153.580 2018-05-09 08:33:29 144.10 10:57:34 132.98
15 ID5132897910 2.0 1.0 38.691 7.0 15.0 -38.844622 144.093195 -38.476630 145.849992 158.102 2018-01-05 15:55:00 148.15 18:23:09 147.73
16 ID1290889802 1.0 0.0 30.742 19.0 14.0 -37.178059 144.403991 -37.713867 146.382965 184.783 2018-06-05 13:03:43 185.60 16:09:18 84.82
17 ID5226355535 1.0 1.0 18.055 10.0 18.0 -37.141728 147.256091 -37.983127 143.290272 362.227 2018-05-04 07:42:51 357.32 13:40:10 116.35
18 ID1898978312 1.0 0.0 5.986 23.0 18.0 -37.706960 145.718119 -37.983127 143.182464 224.997 2018-05-23 12:53:12 224.51 16:37:42 82.47
19 ID5284908619 2.0 1.0 30.664 13.0 14.0 -38.250238 147.366610 -37.774617 146.503568 92.371 2018-07-11 12:52:57 89.29 14:22:14 143.96
20 ID1585556406 3.0 0.0 24.942 3.0 39.0 -38.322643 145.505910 -38.453191 148.300405 244.251 2018-03-14 22:16:33 199.74 1:36:17 167.45
21 ID1901962779 1.0 0.0 34.012 27.0 37.0 -37.516090 146.969053 -38.852488 147.816987 166.236 2018-07-17 16:26:55 167.65 19:14:34 88.98
22 ID5590279060 1.0 1.0 55.229 7.0 6.0 -38.876145 143.911302 -36.875012 145.759049 275.633 2018-05-01 12:42:19 273.52 17:15:50 688.24
23 ID1473718059 3.0 0.0 40.741 33.0 17.0 -38.756634 144.375564 -38.817727 147.071495 234.014 2018-06-21 01:04:15 191.66 4:15:54 165.37
24 ID5551646734 3.0 1.0 15.419 35.0 21.0 -36.922413 146.362349 -37.240137 143.768208 233.068 2018-01-05 07:19:48 190.91 10:30:42 174.52
25 ID1772122934 2.0 0.0 10.663 29.0 1.0 -37.019910 142.798295 -38.358124 143.943955 179.929 2018-02-27 04:42:03 167.70 7:29:44 124.95
26 ID1987608852 1.0 0.0 20.685 9.0 35.0 -38.970092 145.435801 -37.068624 146.317717 225.349 2018-05-15 15:35:10 224.85 19:20:00 80.66
27 ID1249352358 2.0 0.0 10.272 8.0 34.0 -38.485641 144.522135 -38.505484 143.311788 105.471 2018-02-05 06:13:29 101.02 7:54:30 102.09
28 ID1611614450 1.0 0.0 7.373 16.0 38.0 -36.600877 143.566811 -37.804113 142.793139 150.483 2018-06-18 14:01:42 152.40 16:34:06 82.11
29 ID1262379299 2.0 0.0 31.358 13.0 16.0 -38.195911 147.436921 -36.734698 143.764279 362.941 2018-04-24 22:40:54 331.59 4:12:29 147.21
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37839 ID1879423081 2.0 0.0 17.081 20.0 14.0 -38.860358 148.174855 -37.562224 146.443991 209.278 2018-02-15 12:09:15 193.98 15:23:13 121.33
37840 ID5705840841 1.0 1.0 35.164 29.0 23.0 -37.250331 142.837244 -37.739285 145.963420 281.403 2018-01-01 00:07:35 279.10 4:46:41 136.58
37841 ID1276239209 3.0 0.0 36.704 12.0 11.0 -36.578568 145.273744 -38.305557 146.997297 245.269 2018-01-24 06:48:05 200.54 10:08:37 146.69
37842 ID1432868583 3.0 0.0 13.195 37.0 21.0 -38.755802 147.744770 -37.487866 143.585293 390.599 2018-04-07 18:57:52 315.28 0:13:08 170.52
37844 NaN 1.0 0.0 22.498 30.0 15.0 -37.885792 144.875305 -38.680341 145.874064 124.252 2018-01-28 13:07:09 127.02 15:14:10 74.69
37845 NaN 3.0 0.0 32.300 16.0 36.0 -36.571169 143.741010 -36.993435 144.983048 120.297 2018-03-26 02:34:49 101.88 4:16:41 162.20
37846 NaN 3.0 0.0 18.601 38.0 29.0 -37.694132 142.851548 -37.058014 142.873698 70.838 2018-03-02 20:33:11 62.83 21:36:00 142.10
37850 NaN 2.0 1.0 28.203 18.0 21.0 -38.139260 143.345778 -37.279348 143.604189 98.391 2018-04-17 03:28:41 94.68 5:03:21 152.30
37851 NaN 2.0 1.0 45.696 28.0 1.0 -38.152835 147.793072 -38.162103 144.043047 328.220 2018-03-25 13:50:57 300.50 18:51:27 167.39
37852 NaN 1.0 0.0 27.143 38.0 32.0 -37.613135 142.854194 -36.713765 148.383062 500.500 2018-05-12 08:54:52 491.13 17:05:59 97.44
37854 NaN 1.0 0.0 13.002 27.0 3.0 -37.428594 147.056992 -38.383631 145.590528 167.005 2018-02-20 03:35:43 168.39 6:24:06 92.12
37857 NaN 1.0 0.0 19.468 5.0 31.0 -37.570514 145.253281 -37.824303 143.862472 125.716 2018-04-20 11:48:14 128.43 13:56:39 69.86
37860 NaN 1.0 0.0 10.647 21.0 5.0 -37.239046 143.524656 -37.546884 145.204154 152.434 2018-06-13 23:05:39 154.29 1:39:56 95.65
37861 NaN 2.0 0.0 40.775 22.0 19.0 -36.678761 144.345069 -37.167099 144.257043 54.922 2018-03-27 04:31:43 55.75 5:27:28 120.70
37863 NaN 1.0 0.0 35.410 9.0 16.0 -39.006941 145.406988 -36.525200 143.517906 322.399 2018-07-21 06:48:39 318.77 12:07:25 88.14
37866 NaN 1.0 1.0 40.151 14.0 34.0 -37.644061 146.625820 -38.518928 143.369714 301.447 2018-04-14 07:29:13 298.50 12:27:43 118.89
37869 NaN 3.0 1.0 44.559 33.0 10.0 -38.733722 144.474460 -37.220132 147.498884 314.323 2018-04-18 04:57:37 255.06 9:12:40 201.89
37873 NaN 1.0 0.0 39.038 7.0 22.0 -38.664403 143.702592 -36.670478 144.312065 228.361 2018-02-02 18:03:25 227.77 21:51:11 87.87
37874 NaN 1.0 0.0 7.171 14.0 1.0 -37.589382 146.559422 -38.229669 143.710836 260.125 2018-02-03 22:11:31 258.51 2:30:01 106.47
37875 NaN 1.0 0.0 39.141 5.0 1.0 -37.491746 145.168879 -38.168680 143.936761 131.959 2018-04-19 22:54:02 134.48 1:08:30 94.83
37876 NaN 2.0 0.0 42.330 25.0 5.0 -36.618297 147.643165 -37.563856 145.038804 254.069 2018-01-27 11:06:09 234.09 15:00:14 116.10
37877 NaN 1.0 1.0 43.965 25.0 38.0 -36.542082 147.863736 -37.549355 142.980000 448.104 2018-06-14 22:15:50 440.42 5:36:15 149.26
37879 NaN 1.0 0.0 18.540 36.0 18.0 -36.974023 145.036046 -38.015640 142.955101 217.299 2018-05-28 07:14:50 217.06 10:51:53 76.31
37885 NaN 1.0 0.0 13.544 34.0 38.0 -38.528632 143.386140 -37.561137 142.835312 118.028 2018-04-09 11:48:46 120.99 13:49:45 69.88
37888 NaN 1.0 1.0 34.923 36.0 28.0 -37.044972 144.920592 -38.218799 148.050871 305.307 2018-07-20 03:13:54 302.23 8:16:07 130.70
37889 ID1281653747 1.0 0.0 34.130 8.0 36.0 -38.434373 144.730220 -37.047801 145.309517 162.554 2018-02-06 06:27:55 164.08 9:11:59 69.44
37892 NaN 3.0 1.0 5.416 20.0 33.0 -38.959090 148.294700 -38.930545 144.661887 314.514 2018-04-27 20:53:02 255.21 1:08:14 185.96
37894 NaN 2.0 0.0 41.232 38.0 39.0 -37.657406 142.777301 -38.622040 148.366529 500.901 2018-07-02 08:59:29 455.14 16:34:37 139.79
37897 NaN 1.0 0.0 8.865 9.0 2.0 -38.839254 145.226776 -37.695101 148.251214 293.394 2018-03-11 12:18:21 290.70 17:09:02 88.71
37898 NaN 3.0 1.0 27.153 39.0 16.0 -38.446310 148.292498 -36.739777 143.604529 454.968 2018-07-23 08:29:19 366.09 14:35:24 188.49

37869 rows × 16 columns

任务12:填补 'Post Type'的空值

提示:可能与id有关

data_1[data_1['Post Type'].isnull()]

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
2179 ID5742860733 1.0 NaN 27.126 22.0 34.0 -36.679647 144.463364 -38.469625 143.202877 228.181 2018-02-28 14:01:50 227.59 17:49:25 114.27
4212 ID1710452907 2.0 NaN 37.466 22.0 21.0 -36.695380 144.474875 -37.357030 143.508873 113.113 2018-07-28 11:30:33 107.86 13:18:24 103.59
4219 ID1787036788 1.0 NaN 30.877 3.0 6.0 -38.311620 145.478355 -36.832092 145.600747 165.050 2018-07-18 11:58:01 166.50 14:44:31 75.01
4253 ID1680028038 2.0 NaN 13.860 22.0 19.0 -36.837363 144.515369 -37.089526 144.531402 28.106 2018-06-17 04:43:00 31.74 5:14:44 120.43
6299 ID1377767619 1.0 NaN 45.416 23.0 13.0 -37.720326 146.008038 -38.222304 147.498125 142.197 2018-01-13 09:36:02 144.38 12:00:24 69.85
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
26778 ID1571528676 1.0 NaN 15.062 33.0 9.0 -38.965718 144.650812 -38.962812 145.292314 55.525 2018-04-28 21:13:25 60.51 22:13:55 89.97
28691 ID5883461461 1.0 NaN 40.325 4.0 29.0 -36.542055 143.024959 -37.232887 142.980514 77.003 2018-05-01 04:18:09 81.29 5:39:26 124.92
30861 ID1971183564 3.0 NaN 42.700 19.0 4.0 -37.120747 144.287009 -36.762751 143.078632 114.656 2018-02-25 16:19:23 97.43 17:56:48 144.33
32921 ID5291598933 2.0 NaN 27.189 2.0 7.0 -37.564903 148.097962 -38.797131 143.873724 394.211 2018-01-09 12:39:15 359.59 18:38:50 165.54
36991 ID1484986030 1.0 NaN 27.517 37.0 36.0 -38.938504 147.755657 -37.060130 144.993660 320.003 2018-01-07 20:24:23 316.45 1:40:49 99.58

20 rows × 16 columns

#用投递速度来区分
def speed(df):
    speed = df['Journey Distance']/df['Travel Time']
    return round(speed,2)

data_1.loc[data_1['Post Type']==0].apply(speed,axis=1)

0        1.06
1        0.88
2        1.09
3        1.10
4        1.04
         ... 
37837    0.99
37839    1.08
37841    1.22
37842    1.24
37889    0.99
Length: 26529, dtype: float64
#用投递价格来区分
def price(df):
    price = df['Delivery Fare']/df['Package Weight']
    return round(price,2)
data_1.loc[data_1['Post Type']==0,['Delivery Fare','Package Weight']].apply(price,axis=1)

0         4.58
1         3.81
2        19.53
3         9.18
4        13.78
         ...  
37837    13.20
37839     7.10
37841     4.00
37842    12.92
37889     2.03
Length: 26529, dtype: float64
#ID以5还是1开头为区别
data_1.loc[data_1['Post Type'].isnull(),'Post Type'] = data_1.loc[data_1['Post Type'].isnull(),'Id'].apply(lambda s:s[2]=='1')

任务13:修复 'Origin Longitude'与 'Origin Latitude' 列中错误的值

data_1['Origin Longitude'].describe()
#有负值

count    37844.000000
mean       145.423081
std          6.929107
min       -148.337157
25%        143.964265
50%        145.424189
75%        147.171954
max        148.450576
Name: Origin Longitude, dtype: float64
def fix_Longitude_Latitude(flt):
    if flt<0:
        return -flt
    else:
        return flt
data_1['Origin Longitude'].apply(fix_Longitude_Latitude).describe()

count    37844.000000
mean       145.577375
std          1.764044
min        142.769991
25%        143.966143
50%        145.426161
75%        147.172965
max        148.450576
Name: Origin Longitude, dtype: float64
data_1['Origin Longitude'] = data_1['Origin Longitude'].apply(fix_Longitude_Latitude)

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
data_1['Origin Latitude'].describe()

count    37844.000000
mean       -37.728738
std          1.900393
min        -39.006941
25%        -38.442905
50%        -37.707244
75%        -37.094433
max         38.986998
Name: Origin Latitude, dtype: float64
data_1['Origin Latitude'] = data_1['Origin Latitude'].apply(fix_Longitude_Latitude)

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

任务14:修复 'Destination Latitude' 与 'Destination Longitude'列中错误的值

data_1['Destination Latitude'] = data_1['Destination Latitude'].apply(fix_Longitude_Latitude)

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
data_1['Destination Longitude'] = data_1['Destination Longitude'].apply(fix_Longitude_Latitude)

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

任务 15:填补'Origin Region'和‘Destination Region’中的空值

data_1[data_1['Origin Region'].isnull()]

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
2080 ID1634835235 1.0 0 31.871 NaN 22.0 37.300712 143.610855 36.794684 144.280083 81.904 2018-06-09 03:21:49 86.04 4:47:51 95.70
4442 ID1321659107 2.0 0 12.991 NaN 10.0 38.180248 146.775985 37.396662 147.313624 99.224 2018-04-12 11:25:54 95.42 13:01:19 98.48
8220 ID1522212019 1.0 0 25.282 NaN 23.0 36.756457 148.437792 37.665363 145.818608 253.275 2018-03-27 17:56:31 251.88 22:08:23 94.56
8228 ID1946204979 2.0 0 12.779 NaN 25.0 37.391203 148.116452 36.520786 147.557258 108.914 2018-07-08 10:57:12 104.10 12:41:17 102.49
15023 ID1891262263 1.0 0 36.466 NaN 7.0 37.537619 145.202427 38.723869 143.685908 187.265 2018-03-13 07:21:08 188.00 10:29:08 72.78
20513 ID1490968406 1.0 0 42.648 NaN 9.0 36.654443 143.691689 38.849982 145.479151 290.643 2018-02-16 18:46:02 288.04 23:34:04 99.81
20515 ID5148310393 1.0 1 44.505 NaN 7.0 38.973121 142.989422 38.823172 143.990199 88.293 2018-04-18 23:35:27 92.22 1:07:40 119.24
28710 ID5941350307 1.0 1 17.232 NaN 32.0 37.616289 146.052853 36.696768 148.280560 222.563 2018-02-22 02:17:46 222.16 5:59:55 125.46
34853 ID5234294750 2.0 1 29.063 NaN 7.0 37.048542 142.843945 38.852320 143.963431 223.542 2018-07-14 13:22:02 206.75 16:48:47 151.47
36904 ID1724943602 1.0 0 31.774 NaN 31.0 36.736858 143.688416 37.548149 143.885770 91.993 2018-02-12 14:31:24 95.80 16:07:11 81.57
data_1.loc[data_1['Origin Region'].notnull()]

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
0 ID1645282128 2.0 0 21.686 19.0 38.0 37.089338 144.429529 37.639134 142.891391 149.212 2018-01-16 09:38:17 140.19 11:58:28 99.25
1 ID1697620764 NaN 0 39.075 15.0 15.0 38.481935 146.009567 38.585528 146.199827 20.185 2018-02-10 04:28:17 22.84 4:51:07 149.04
2 ID1543933503 2.0 0 7.243 33.0 28.0 38.754167 144.509664 38.242224 147.855342 296.975 2018-05-05 01:38:03 272.52 6:10:34 141.48
3 ID1756517608 2.0 0 13.383 10.0 38.0 37.240526 147.568019 37.687178 142.991188 407.396 2018-06-11 11:43:04 371.40 17:54:27 122.82
4 ID1832325834 2.0 0 8.123 1.0 8.0 38.143985 143.798292 38.548315 144.769228 95.974 2018-03-16 14:50:25 92.51 16:22:55 111.97
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
37839 ID1879423081 2.0 0 17.081 20.0 14.0 38.860358 148.174855 37.562224 146.443991 209.278 2018-02-15 12:09:15 193.98 15:23:13 121.33
37840 ID5705840841 1.0 1 35.164 29.0 23.0 37.250331 142.837244 37.739285 145.963420 281.403 2018-01-01 00:07:35 279.10 4:46:41 136.58
37841 ID1276239209 3.0 0 36.704 12.0 11.0 36.578568 145.273744 38.305557 146.997297 245.269 2018-01-24 06:48:05 200.54 10:08:37 146.69
37842 ID1432868583 3.0 0 13.195 37.0 21.0 38.755802 147.744770 37.487866 143.585293 390.599 2018-04-07 18:57:52 315.28 0:13:08 170.52
37889 NaN 1.0 0 34.130 8.0 36.0 38.434373 144.730220 37.047801 145.309517 162.554 2018-02-06 06:27:55 164.08 9:11:59 69.44

37834 rows × 16 columns

data_2 = data_1.loc[data_1['Origin Region'].notnull()]
def fill_region(df):
    longitude,latitude = df['Origin Longitude'],df['Origin Latitude']
    distance_0 = 100
    region_out = ''

    for index in data_2.index:
        longitude_ = data_2.loc[index,'Origin Longitude']
        latitude_ = data_2.loc[index,'Origin Latitude']
        region = data_2.loc[index,'Origin Region']
        
        distance = (longitude-longitude_)**2+(latitude_-latitude)**2
        if distance<distance_0 :
            region_out = region
            distance_0 = distance
    return region_out

data_1.loc[data_1['Origin Region'].isnull()].apply(fill_region,axis=1)

2080     21.0
4442     11.0
8220     32.0
8228      2.0
15023     5.0
20513    16.0
20515    24.0
28710    23.0
34853    29.0
36904    16.0
dtype: float64
data_1.loc[data_1['Origin Region'].isnull()]

Id Drone Type Post Type Package Weight Origin Region Destination Region Origin Latitude Origin Longitude Destination Latitude Destination Longitude Journey Distance Departure Date Departure Time Travel Time Delivery Time Delivery Fare
2080 ID1634835235 1.0 0 31.871 NaN 22.0 37.300712 143.610855 36.794684 144.280083 81.904 2018-06-09 03:21:49 86.04 4:47:51 95.70
4442 ID1321659107 2.0 0 12.991 NaN 10.0 38.180248 146.775985 37.396662 147.313624 99.224 2018-04-12 11:25:54 95.42 13:01:19 98.48
8220 ID1522212019 1.0 0 25.282 NaN 23.0 36.756457 148.437792 37.665363 145.818608 253.275 2018-03-27 17:56:31 251.88 22:08:23 94.56
8228 ID1946204979 2.0 0 12.779 NaN 25.0 37.391203 148.116452 36.520786 147.557258 108.914 2018-07-08 10:57:12 104.10 12:41:17 102.49
15023 ID1891262263 1.0 0 36.466 NaN 7.0 37.537619 145.202427 38.723869 143.685908 187.265 2018-03-13 07:21:08 188.00 10:29:08 72.78
20513 ID1490968406 1.0 0 42.648 NaN 9.0 36.654443 143.691689 38.849982 145.479151 290.643 2018-02-16 18:46:02 288.04 23:34:04 99.81
20515 ID5148310393 1.0 1 44.505 NaN 7.0 38.973121 142.989422 38.823172 143.990199 88.293 2018-04-18 23:35:27 92.22 1:07:40 119.24
28710 ID5941350307 1.0 1 17.232 NaN 32.0 37.616289 146.052853 36.696768 148.280560 222.563 2018-02-22 02:17:46 222.16 5:59:55 125.46
34853 ID5234294750 2.0 1 29.063 NaN 7.0 37.048542 142.843945 38.852320 143.963431 223.542 2018-07-14 13:22:02 206.75 16:48:47 151.47
36904 ID1724943602 1.0 0 31.774 NaN 31.0 36.736858 143.688416 37.548149 143.885770 91.993 2018-02-12 14:31:24 95.80 16:07:11 81.57

任务16:找出 'Departure Date'中错误的值

data_1['Departure Date'].sort_values(ascending=False)

31975    2018-30-06
4990     2018-28-06
17911    2018-28-06
28350    2018-28-05
35740    2018-28-05
            ...    
12702    2018-01-01
36755    2018-01-01
12684    2018-01-01
36782    2018-01-01
27230    2018-01-01
Name: Departure Date, Length: 37844, dtype: object
def fix_date(dt):
    split_ = dt.split('-')
    year,month,day = split_[0],split_[1],split_[2]
    if month > '08':
        return year+'-'+day+'-'+month
    else:
        return dt
data_1['Departure Date'] = data_1['Departure Date'].apply(fix_date)
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:8: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
data_1['Departure Date'].sort_values(ascending=False)
17849    2018-07-28
17570    2018-07-28
988      2018-07-28
592      2018-07-28
12772    2018-07-28
            ...    
29092    2018-01-01
5232     2018-01-01
12584    2018-01-01
29041    2018-01-01
5519     2018-01-01
Name: Departure Date, Length: 37844, dtype: object

任务17:输出数据集为‘solution.csv’到当前目录下面

data_1.to_csv('solution.csv')
posted @ 2019-12-06 21:00  Sean_Yang  阅读(446)  评论(0编辑  收藏  举报