数据分析与挖掘练习
1.0 背景
该数据集是澳大利亚某公司无人机送货的记录(2018年8月之前),主要包括以下的列:
- 'Id' : 记录的ID
- 'Drone Type' : 无人机的类别分 1类 2类 3类
- 'Post Type' : 运送的类别 0为普通运送 1为速运
- 'Package Weight' :包裹的重量
- 'Origin Region' :出发地的区域代码
- 'Destination Region' :目的地的区域代码
- 'Origin Latitude' :出发纬度
- 'Origin Longitude' :出发经度
- 'Destination Latitude' :目的地纬度
- 'Destination Longitude' :目的地经度
- 'Journey Distance' :运送距离
- 'Departure Date' :出发日期
- 'Departure Time' :出发时间
- 'Travel Time' :飞行时间
- 'Delivery Time' :到达时间
- 'Delivery Fare' :运送费用
pd.options.display.max_rows = 10
2.0 载入包和数据
#loading library
import pandas as pd
import re
import matplotlib.pyplot as plt
#import seaborn as sns !pip intall seaborn
import scipy.stats as st
import numpy as np
import math
from math import *
from datetime import datetime,timedelta
任务1:载入名为‘data.csv’的数据
data = pd.read_csv('data.csv')
DataFrame
Series
type(data)
pandas.core.frame.DataFrame
3.0 数据初步探索
任务2:找出数据有多少行列
data.shape
(37903, 16)
任务3:查看列的统计信息
提示:describe()
data.describe()
Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Travel Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 37893.000000 | 37883.000000 | 37903.000000 | 37893.000000 | 37893.000000 | 37903.000000 | 37903.000000 | 37903.000000 | 37903.000000 | 37903.000000 | 37863.000000 | 37874.000000 |
mean | 1.699285 | 0.298709 | 25.669901 | 20.476684 | 20.452722 | -37.728867 | 145.423058 | -37.722054 | 145.434035 | 221.954150 | 208.794518 | 126.814976 |
std | 0.779845 | 0.457698 | 12.107150 | 11.501110 | 11.509311 | 1.899183 | 6.923993 | 1.895621 | 6.909055 | 116.604355 | 107.612447 | 59.314445 |
min | 1.000000 | 0.000000 | 5.001000 | 1.000000 | 1.000000 | -39.006941 | -148.337157 | -39.006941 | -147.691902 | 0.664000 | 7.420000 | 54.020000 |
25% | 1.000000 | 0.000000 | 15.199000 | 11.000000 | 11.000000 | -38.443034 | 143.965002 | -38.431293 | 143.951543 | 131.044500 | 125.165000 | 97.440000 |
50% | 2.000000 | 0.000000 | 25.446000 | 20.000000 | 20.000000 | -37.707244 | 145.423386 | -37.700695 | 145.450794 | 209.796000 | 196.370000 | 120.045000 |
75% | 2.000000 | 1.000000 | 35.953500 | 30.000000 | 30.000000 | -37.094433 | 147.170334 | -37.080256 | 147.216886 | 302.052000 | 281.250000 | 145.800000 |
max | 3.000000 | 1.000000 | 55.992000 | 40.000000 | 40.000000 | 38.986998 | 148.450576 | 38.989473 | 148.450576 | 556.637000 | 545.460000 | 1217.690000 |
任务4:找出每个列名称
data.columns
Index(['Id', 'Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
'Destination Region', 'Origin Latitude', 'Origin Longitude',
'Destination Latitude', 'Destination Longitude', 'Journey Distance',
'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
'Delivery Fare'],
dtype='object')
任务5:找出每个列的属性
是Obeject 还是 float
data.info()
data['Drone Type'] = data['Drone Type'].astype('str')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37903 entries, 0 to 37902
Data columns (total 16 columns):
Id 37878 non-null object
Drone Type 37893 non-null float64
Post Type 37883 non-null float64
Package Weight 37903 non-null float64
Origin Region 37893 non-null float64
Destination Region 37893 non-null float64
Origin Latitude 37903 non-null float64
Origin Longitude 37903 non-null float64
Destination Latitude 37903 non-null float64
Destination Longitude 37903 non-null float64
Journey Distance 37903 non-null float64
Departure Date 37903 non-null object
Departure Time 37903 non-null object
Travel Time 37863 non-null float64
Delivery Time 37903 non-null object
Delivery Fare 37874 non-null float64
dtypes: float64(12), object(4)
memory usage: 4.6+ MB
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37903 entries, 0 to 37902
Data columns (total 16 columns):
Id 37878 non-null object
Drone Type 37903 non-null object
Post Type 37883 non-null float64
Package Weight 37903 non-null float64
Origin Region 37893 non-null float64
Destination Region 37893 non-null float64
Origin Latitude 37903 non-null float64
Origin Longitude 37903 non-null float64
Destination Latitude 37903 non-null float64
Destination Longitude 37903 non-null float64
Journey Distance 37903 non-null float64
Departure Date 37903 non-null object
Departure Time 37903 non-null object
Travel Time 37863 non-null float64
Delivery Time 37903 non-null object
Delivery Fare 37874 non-null float64
dtypes: float64(11), object(5)
memory usage: 4.6+ MB
任务6:找出数据的前5行和后5行
data.head()
Id | Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ID1645282128 | 2.0 | 0.0 | 21.686 | 19.0 | 38.0 | -37.089338 | 144.429529 | -37.639134 | 142.891391 | 149.212 | 2018-01-16 | 09:38:17 | 140.19 | 11:58:28 | 99.25 |
1 | ID1697620764 | nan | 0.0 | 39.075 | 15.0 | 15.0 | -38.481935 | 146.009567 | -38.585528 | 146.199827 | 20.185 | 2018-02-10 | 04:28:17 | 22.84 | 4:51:07 | 149.04 |
2 | ID1543933503 | 2.0 | 0.0 | 7.243 | 33.0 | 28.0 | -38.754167 | 144.509664 | -38.242224 | 147.855342 | 296.975 | 2018-05-05 | 01:38:03 | 272.52 | 6:10:34 | 141.48 |
3 | ID1756517608 | 2.0 | 0.0 | 13.383 | 10.0 | 38.0 | -37.240526 | 147.568019 | -37.687178 | 142.991188 | 407.396 | 2018-06-11 | 11:43:04 | 371.40 | 17:54:27 | 122.82 |
4 | ID1832325834 | 2.0 | 0.0 | 8.123 | 1.0 | 8.0 | -38.143985 | 143.798292 | -38.548315 | 144.769228 | 95.974 | 2018-03-16 | 14:50:25 | 92.51 | 16:22:55 | 111.97 |
5 | ID1802448576 | 2.0 | 0.0 | 32.859 | 2.0 | 28.0 | -37.421211 | 148.044072 | -38.159627 | 148.194048 | 83.250 | 2018-05-15 | 16:35:50 | 81.12 | 17:56:57 | 113.88 |
6 | ID1940231408 | 1.0 | 0.0 | 20.616 | 29.0 | 36.0 | -37.173949 | 143.140662 | -37.021605 | 145.197043 | 183.363 | 2018-04-01 | 19:31:12 | 184.22 | 22:35:25 | 85.60 |
7 | ID1299303958 | 2.0 | 0.0 | 44.577 | 36.0 | 31.0 | -37.123190 | 145.236196 | -37.667199 | 143.877650 | 134.543 | 2018-05-01 | 18:39:36 | 127.05 | 20:46:38 | 114.22 |
8 | ID1752722028 | 1.0 | 0.0 | 15.363 | 20.0 | 30.0 | -38.850561 | 148.317253 | -38.024914 | 144.823938 | 318.132 | 2018-05-27 | 14:48:17 | 314.64 | 20:02:55 | 87.39 |
9 | ID5995243590 | 1.0 | 1.0 | 36.190 | 18.0 | 28.0 | -38.070189 | 142.950207 | -37.996817 | 148.026520 | 445.106 | 2018-06-17 | 12:53:02 | 437.52 | 20:10:33 | 142.95 |
data.tail()
Id | Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
37898 | NaN | 3.0 | 1.0 | 27.153 | 39.0 | 16.0 | -38.446310 | 148.292498 | -36.739777 | 143.604529 | 454.968 | 2018-07-23 | 08:29:19 | 366.09 | 14:35:24 | 188.49 |
37899 | ID5862552991 | 1.0 | 1.0 | 40.363 | 9.0 | 38.0 | -38.983710 | 145.320518 | -37.673908 | 142.879230 | 258.259 | 2018-06-26 | 15:55:37 | 256.70 | 20:12:18 | 122.98 |
37900 | ID5339104082 | 1.0 | 1.0 | 35.955 | 13.0 | 32.0 | -38.292301 | 147.562013 | -36.605285 | 148.293183 | 198.597 | 2018-03-19 | 16:41:10 | 198.97 | 20:00:08 | 118.47 |
37901 | ID5468787866 | 2.0 | 1.0 | 29.566 | 33.0 | 23.0 | -38.853243 | 144.508346 | -37.727691 | 145.662270 | 160.816 | 2018-02-26 | 04:22:30 | 150.58 | 6:53:04 | 161.96 |
37902 | ID1448126768 | 3.0 | 0.0 | 44.070 | 36.0 | 34.0 | -37.129313 | 145.266426 | -38.428477 | 143.341632 | 222.687 | 2018-07-07 | 08:01:42 | 182.71 | 11:04:24 | 144.41 |
任务7:找出所有列的缺失值个数并且按照多到少排列
隐藏任务:可视化缺失值的列
data.isnull().sum().sort_values(ascending=False)
Travel Time 40
Delivery Fare 29
Id 25
Post Type 20
Destination Region 10
Origin Region 10
Delivery Time 0
Departure Time 0
Departure Date 0
Journey Distance 0
Destination Longitude 0
Destination Latitude 0
Origin Longitude 0
Origin Latitude 0
Package Weight 0
Drone Type 0
dtype: int64
count = {}
for col in data.columns:
count_null = data[col].isnull().sum()
count[col] = count_null
for i,j in sorted(count.items(),key = lambda s: s[1], reverse=True):
print('列名:%s,存在缺失值 %s 个'%(i,j))
列名:Travel Time,存在缺失值 40 个
列名:Delivery Fare,存在缺失值 29 个
列名:Id,存在缺失值 25 个
列名:Post Type,存在缺失值 20 个
列名:Drone Type,存在缺失值 10 个
列名:Origin Region,存在缺失值 10 个
列名:Destination Region,存在缺失值 10 个
列名:Package Weight,存在缺失值 0 个
列名:Origin Latitude,存在缺失值 0 个
列名:Origin Longitude,存在缺失值 0 个
列名:Destination Latitude,存在缺失值 0 个
列名:Destination Longitude,存在缺失值 0 个
列名:Journey Distance,存在缺失值 0 个
列名:Departure Date,存在缺失值 0 个
列名:Departure Time,存在缺失值 0 个
列名:Delivery Time,存在缺失值 0 个
任务8:找出所有至少含有一个缺失值的行,并统计有多少行
data.isnull().any(axis=1) # 判断至少有一个缺失值
0 False
1 True
2 False
3 False
4 False
...
37898 True
37899 False
37900 False
37901 False
37902 False
Length: 37903, dtype: bool
data.drop(data.iloc[0,2])
Id | Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | ID1697620764 | NaN | 0.0 | 39.075 | 15.0 | 15.0 | -38.481935 | 146.009567 | -38.585528 | 146.199827 | 20.185 | 2018-02-10 | 04:28:17 | 22.84 | 4:51:07 | 149.04 |
2 | ID1543933503 | 2.0 | 0.0 | 7.243 | 33.0 | 28.0 | -38.754167 | 144.509664 | -38.242224 | 147.855342 | 296.975 | 2018-05-05 | 01:38:03 | 272.52 | 6:10:34 | 141.48 |
3 | ID1756517608 | 2.0 | 0.0 | 13.383 | 10.0 | 38.0 | -37.240526 | 147.568019 | -37.687178 | 142.991188 | 407.396 | 2018-06-11 | 11:43:04 | 371.40 | 17:54:27 | 122.82 |
4 | ID1832325834 | 2.0 | 0.0 | 8.123 | 1.0 | 8.0 | -38.143985 | 143.798292 | -38.548315 | 144.769228 | 95.974 | 2018-03-16 | 14:50:25 | 92.51 | 16:22:55 | 111.97 |
5 | ID1802448576 | 2.0 | 0.0 | 32.859 | 2.0 | 28.0 | -37.421211 | 148.044072 | -38.159627 | 148.194048 | 83.250 | 2018-05-15 | 16:35:50 | 81.12 | 17:56:57 | 113.88 |
6 | ID1940231408 | 1.0 | 0.0 | 20.616 | 29.0 | 36.0 | -37.173949 | 143.140662 | -37.021605 | 145.197043 | 183.363 | 2018-04-01 | 19:31:12 | 184.22 | 22:35:25 | 85.60 |
7 | ID1299303958 | 2.0 | 0.0 | 44.577 | 36.0 | 31.0 | -37.123190 | 145.236196 | -37.667199 | 143.877650 | 134.543 | 2018-05-01 | 18:39:36 | 127.05 | 20:46:38 | 114.22 |
8 | ID1752722028 | 1.0 | 0.0 | 15.363 | 20.0 | 30.0 | -38.850561 | 148.317253 | -38.024914 | 144.823938 | 318.132 | 2018-05-27 | 14:48:17 | 314.64 | 20:02:55 | 87.39 |
9 | ID5995243590 | 1.0 | 1.0 | 36.190 | 18.0 | 28.0 | -38.070189 | 142.950207 | -37.996817 | 148.026520 | 445.106 | 2018-06-17 | 12:53:02 | 437.52 | 20:10:33 | 142.95 |
10 | ID1483358088 | 2.0 | 0.0 | 23.172 | 13.0 | 27.0 | -38.225456 | 147.425515 | -37.642798 | 147.124104 | 70.051 | 2018-03-19 | 09:59:10 | 69.30 | 11:08:27 | 96.95 |
11 | ID1626798395 | 2.0 | 0.0 | 19.754 | 23.0 | 26.0 | -37.625368 | 145.838281 | -36.789955 | 147.133916 | 147.791 | 2018-02-28 | 17:40:59 | 138.92 | 19:59:54 | 117.48 |
12 | ID5277549009 | 3.0 | 1.0 | 12.807 | 4.0 | 6.0 | -36.855984 | 142.929596 | -36.906838 | 145.696986 | 246.465 | 2018-03-26 | 07:55:48 | 201.49 | 11:17:17 | 173.32 |
13 | ID1950928883 | 2.0 | 0.0 | 22.332 | 33.0 | 19.0 | -38.894115 | 144.457143 | -37.173740 | 144.152105 | 193.365 | 2018-06-28 | 16:05:55 | 179.73 | 19:05:38 | 117.67 |
14 | ID5143738648 | 2.0 | 1.0 | 25.880 | 33.0 | 5.0 | -38.872372 | 144.606034 | -37.553304 | 145.120753 | 153.580 | 2018-05-09 | 08:33:29 | 144.10 | 10:57:34 | 132.98 |
15 | ID5132897910 | 2.0 | 1.0 | 38.691 | 7.0 | 15.0 | -38.844622 | 144.093195 | -38.476630 | 145.849992 | 158.102 | 2018-01-05 | 15:55:00 | 148.15 | 18:23:09 | 147.73 |
16 | ID1290889802 | 1.0 | 0.0 | 30.742 | 19.0 | 14.0 | -37.178059 | 144.403991 | -37.713867 | 146.382965 | 184.783 | 2018-06-05 | 13:03:43 | 185.60 | 16:09:18 | 84.82 |
17 | ID5226355535 | 1.0 | 1.0 | 18.055 | 10.0 | 18.0 | -37.141728 | 147.256091 | -37.983127 | 143.290272 | 362.227 | 2018-05-04 | 07:42:51 | 357.32 | 13:40:10 | 116.35 |
18 | ID1898978312 | 1.0 | 0.0 | 5.986 | 23.0 | 18.0 | -37.706960 | 145.718119 | -37.983127 | 143.182464 | 224.997 | 2018-05-23 | 12:53:12 | 224.51 | 16:37:42 | 82.47 |
19 | ID5284908619 | 2.0 | 1.0 | 30.664 | 13.0 | 14.0 | -38.250238 | 147.366610 | -37.774617 | 146.503568 | 92.371 | 2018-07-11 | 12:52:57 | 89.29 | 14:22:14 | 143.96 |
20 | ID1585556406 | 3.0 | 0.0 | 24.942 | 3.0 | 39.0 | -38.322643 | 145.505910 | -38.453191 | 148.300405 | 244.251 | 2018-03-14 | 22:16:33 | 199.74 | 1:36:17 | 167.45 |
21 | ID1901962779 | 1.0 | 0.0 | 34.012 | 27.0 | 37.0 | -37.516090 | 146.969053 | -38.852488 | 147.816987 | 166.236 | 2018-07-17 | 16:26:55 | 167.65 | 19:14:34 | 88.98 |
22 | ID5590279060 | 1.0 | 1.0 | 55.229 | 7.0 | 6.0 | -38.876145 | 143.911302 | -36.875012 | 145.759049 | 275.633 | 2018-05-01 | 12:42:19 | 273.52 | 17:15:50 | 688.24 |
23 | ID1473718059 | 3.0 | 0.0 | 40.741 | 33.0 | 17.0 | -38.756634 | 144.375564 | -38.817727 | 147.071495 | 234.014 | 2018-06-21 | 01:04:15 | 191.66 | 4:15:54 | 165.37 |
24 | ID5551646734 | 3.0 | 1.0 | 15.419 | 35.0 | 21.0 | -36.922413 | 146.362349 | -37.240137 | 143.768208 | 233.068 | 2018-01-05 | 07:19:48 | 190.91 | 10:30:42 | 174.52 |
25 | ID1772122934 | 2.0 | 0.0 | 10.663 | 29.0 | 1.0 | -37.019910 | 142.798295 | -38.358124 | 143.943955 | 179.929 | 2018-02-27 | 04:42:03 | 167.70 | 7:29:44 | 124.95 |
26 | ID1987608852 | 1.0 | 0.0 | 20.685 | 9.0 | 35.0 | -38.970092 | 145.435801 | -37.068624 | 146.317717 | 225.349 | 2018-05-15 | 15:35:10 | 224.85 | 19:20:00 | 80.66 |
27 | ID1249352358 | 2.0 | 0.0 | 10.272 | 8.0 | 34.0 | -38.485641 | 144.522135 | -38.505484 | 143.311788 | 105.471 | 2018-02-05 | 06:13:29 | 101.02 | 7:54:30 | 102.09 |
28 | ID1611614450 | 1.0 | 0.0 | 7.373 | 16.0 | 38.0 | -36.600877 | 143.566811 | -37.804113 | 142.793139 | 150.483 | 2018-06-18 | 14:01:42 | 152.40 | 16:34:06 | 82.11 |
29 | ID1262379299 | 2.0 | 0.0 | 31.358 | 13.0 | 16.0 | -38.195911 | 147.436921 | -36.734698 | 143.764279 | 362.941 | 2018-04-24 | 22:40:54 | 331.59 | 4:12:29 | 147.21 |
30 | ID5498216777 | 2.0 | 1.0 | 6.383 | 40.0 | 29.0 | -37.692167 | 147.890721 | -37.273398 | 142.951899 | 438.698 | 2018-05-09 | 10:31:10 | 399.43 | 17:10:35 | 152.50 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
37873 | NaN | 1.0 | 0.0 | 39.038 | 7.0 | 22.0 | -38.664403 | 143.702592 | -36.670478 | 144.312065 | 228.361 | 2018-02-02 | 18:03:25 | 227.77 | 21:51:11 | 87.87 |
37874 | NaN | 1.0 | 0.0 | 7.171 | 14.0 | 1.0 | -37.589382 | 146.559422 | -38.229669 | 143.710836 | 260.125 | 2018-02-03 | 22:11:31 | 258.51 | 2:30:01 | 106.47 |
37875 | NaN | 1.0 | 0.0 | 39.141 | 5.0 | 1.0 | -37.491746 | 145.168879 | -38.168680 | 143.936761 | 131.959 | 2018-04-19 | 22:54:02 | 134.48 | 1:08:30 | 94.83 |
37876 | NaN | 2.0 | 0.0 | 42.330 | 25.0 | 5.0 | -36.618297 | 147.643165 | -37.563856 | 145.038804 | 254.069 | 2018-01-27 | 11:06:09 | 234.09 | 15:00:14 | 116.10 |
37877 | NaN | 1.0 | 1.0 | 43.965 | 25.0 | 38.0 | -36.542082 | 147.863736 | -37.549355 | 142.980000 | 448.104 | 2018-06-14 | 22:15:50 | 440.42 | 5:36:15 | 149.26 |
37878 | ID1525565031 | 2.0 | 0.0 | 27.560 | 7.0 | 35.0 | -38.689638 | 144.093996 | -37.046945 | 146.460389 | 276.894 | 2018-02-22 | 23:43:48 | 254.53 | 3:58:19 | 135.68 |
37879 | NaN | 1.0 | 0.0 | 18.540 | 36.0 | 18.0 | -36.974023 | 145.036046 | -38.015640 | 142.955101 | 217.299 | 2018-05-28 | 07:14:50 | 217.06 | 10:51:53 | 76.31 |
37880 | ID1272392458 | 1.0 | 0.0 | 44.226 | 40.0 | 3.0 | -37.740012 | 147.768225 | -38.276119 | 145.403536 | 215.813 | 2018-02-21 | 21:28:22 | 215.63 | 1:03:59 | 105.89 |
37881 | ID1909131399 | 2.0 | 0.0 | 6.415 | 25.0 | 27.0 | -36.585246 | 147.853062 | -37.700285 | 147.027318 | 144.135 | 2018-01-16 | 08:26:33 | 135.64 | 10:42:11 | 100.45 |
37882 | ID5420355345 | 3.0 | 1.0 | 30.471 | 3.0 | 15.0 | -38.345519 | 145.542105 | -38.712933 | 145.977537 | 55.772 | 2018-07-22 | 00:13:54 | 50.94 | 1:04:50 | 183.73 |
37883 | ID5164753016 | 2.0 | 1.0 | 6.553 | 5.0 | 15.0 | -37.625311 | 145.097611 | -38.630132 | 146.040376 | 139.018 | 2018-01-14 | 16:51:56 | 131.06 | 19:02:59 | 145.71 |
37884 | ID1122103211 | 1.0 | 0.0 | 37.850 | 38.0 | 10.0 | -37.510578 | 142.826103 | -37.337327 | 147.315144 | 397.278 | 2018-04-27 | 14:54:35 | 391.24 | 21:25:49 | 102.88 |
37885 | NaN | 1.0 | 0.0 | 13.544 | 34.0 | 38.0 | -38.528632 | 143.386140 | -37.561137 | 142.835312 | 118.028 | 2018-04-09 | 11:48:46 | 120.99 | 13:49:45 | 69.88 |
37886 | ID1970402579 | 2.0 | 0.0 | 38.656 | 1.0 | 32.0 | -38.171542 | 143.871582 | -36.756484 | 148.322553 | 423.585 | 2018-03-18 | 01:01:33 | 385.90 | 7:27:26 | 142.19 |
37887 | ID1388385049 | 1.0 | 0.0 | 23.699 | 10.0 | 10.0 | -37.352794 | 147.476662 | -37.104700 | 147.417027 | 28.118 | 2018-05-24 | 00:41:08 | 33.99 | 1:15:07 | 83.77 |
37888 | NaN | 1.0 | 1.0 | 34.923 | 36.0 | 28.0 | -37.044972 | 144.920592 | -38.218799 | 148.050871 | 305.307 | 2018-07-20 | 03:13:54 | 302.23 | 8:16:07 | 130.70 |
37889 | ID1281653747 | 1.0 | 0.0 | 34.130 | 8.0 | 36.0 | -38.434373 | 144.730220 | -37.047801 | 145.309517 | 162.554 | 2018-02-06 | 06:27:55 | 164.08 | 9:11:59 | 69.44 |
37890 | ID5349085772 | 1.0 | 1.0 | 18.286 | 22.0 | 14.0 | -36.720129 | 144.588398 | -37.695893 | 146.430793 | 196.154 | 2018-02-15 | 17:04:44 | 196.60 | 20:21:19 | 121.11 |
37891 | ID5972337482 | 1.0 | 1.0 | 11.538 | 33.0 | 2.0 | -38.836802 | 144.357057 | -37.549931 | 148.306793 | 374.024 | 2018-05-26 | 02:07:50 | 368.73 | 8:16:33 | 141.06 |
37892 | NaN | 3.0 | 1.0 | 5.416 | 20.0 | 33.0 | -38.959090 | 148.294700 | -38.930545 | 144.661887 | 314.514 | 2018-04-27 | 20:53:02 | 255.21 | 1:08:14 | 185.96 |
37893 | ID1539650034 | 2.0 | 0.0 | 34.355 | 8.0 | 39.0 | -38.520278 | 144.408786 | -38.447195 | 148.416066 | 349.251 | 2018-05-09 | 07:26:49 | 319.33 | 12:46:08 | 121.30 |
37894 | NaN | 2.0 | 0.0 | 41.232 | 38.0 | 39.0 | -37.657406 | 142.777301 | -38.622040 | 148.366529 | 500.901 | 2018-07-02 | 08:59:29 | 455.14 | 16:34:37 | 139.79 |
37895 | ID1796943211 | 1.0 | 0.0 | 44.341 | 23.0 | 24.0 | -37.777223 | 146.024184 | -38.913981 | 142.913934 | 299.552 | 2018-02-20 | 05:08:12 | 296.66 | 10:04:51 | 113.70 |
37896 | ID5429883749 | 2.0 | 1.0 | 17.798 | 11.0 | 40.0 | -38.045551 | 146.736254 | -37.633007 | 147.639273 | 91.711 | 2018-05-03 | 10:19:32 | 88.70 | 11:48:14 | 130.51 |
37897 | NaN | 1.0 | 0.0 | 8.865 | 9.0 | 2.0 | -38.839254 | 145.226776 | -37.695101 | 148.251214 | 293.394 | 2018-03-11 | 12:18:21 | 290.70 | 17:09:02 | 88.71 |
37898 | NaN | 3.0 | 1.0 | 27.153 | 39.0 | 16.0 | -38.446310 | 148.292498 | -36.739777 | 143.604529 | 454.968 | 2018-07-23 | 08:29:19 | 366.09 | 14:35:24 | 188.49 |
37899 | ID5862552991 | 1.0 | 1.0 | 40.363 | 9.0 | 38.0 | -38.983710 | 145.320518 | -37.673908 | 142.879230 | 258.259 | 2018-06-26 | 15:55:37 | 256.70 | 20:12:18 | 122.98 |
37900 | ID5339104082 | 1.0 | 1.0 | 35.955 | 13.0 | 32.0 | -38.292301 | 147.562013 | -36.605285 | 148.293183 | 198.597 | 2018-03-19 | 16:41:10 | 198.97 | 20:00:08 | 118.47 |
37901 | ID5468787866 | 2.0 | 1.0 | 29.566 | 33.0 | 23.0 | -38.853243 | 144.508346 | -37.727691 | 145.662270 | 160.816 | 2018-02-26 | 04:22:30 | 150.58 | 6:53:04 | 161.96 |
37902 | ID1448126768 | 3.0 | 0.0 | 44.070 | 36.0 | 34.0 | -37.129313 | 145.266426 | -38.428477 | 143.341632 | 222.687 | 2018-07-07 | 08:01:42 | 182.71 | 11:04:24 | 144.41 |
37902 rows × 16 columns
# axis=1针对的是行;=0针对的是列
data[data.isnull().any(axis=1)].shape
#(data.isnull().sum(axis=1) >= 1).sum()
data[data.isnull().any(axis=1)].shape
data.isnull().any(axis=1)
0 False
1 True
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
...
37873 True
37874 True
37875 True
37876 True
37877 True
37878 False
37879 True
37880 False
37881 False
37882 False
37883 False
37884 False
37885 True
37886 False
37887 False
37888 True
37889 False
37890 False
37891 False
37892 True
37893 False
37894 True
37895 False
37896 False
37897 True
37898 True
37899 False
37900 False
37901 False
37902 False
dtype: bool
4.0 数据清洗
任务9: 填补 'Id'列的空值
任务9.1 统计‘id’列有多少个空值
data['Id'].isnull().sum()
25
任务9.2 找出所有‘id’为空的行
data[data['Id'].isnull()]
Id | Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
37844 | NaN | 1.0 | 0.0 | 22.498 | 30.0 | 15.0 | -37.885792 | 144.875305 | -38.680341 | 145.874064 | 124.252 | 2018-01-28 | 13:07:09 | 127.02 | 15:14:10 | 74.69 |
37845 | NaN | 3.0 | 0.0 | 32.300 | 16.0 | 36.0 | -36.571169 | 143.741010 | -36.993435 | 144.983048 | 120.297 | 2018-03-26 | 02:34:49 | 101.88 | 4:16:41 | 162.20 |
37846 | NaN | 3.0 | 0.0 | 18.601 | 38.0 | 29.0 | -37.694132 | 142.851548 | -37.058014 | 142.873698 | 70.838 | 2018-03-02 | 20:33:11 | 62.83 | 21:36:00 | 142.10 |
37850 | NaN | 2.0 | 1.0 | 28.203 | 18.0 | 21.0 | -38.139260 | 143.345778 | -37.279348 | 143.604189 | 98.391 | 2018-04-17 | 03:28:41 | 94.68 | 5:03:21 | 152.30 |
37851 | NaN | 2.0 | 1.0 | 45.696 | 28.0 | 1.0 | -38.152835 | 147.793072 | -38.162103 | 144.043047 | 328.220 | 2018-03-25 | 13:50:57 | 300.50 | 18:51:27 | 167.39 |
37852 | NaN | 1.0 | 0.0 | 27.143 | 38.0 | 32.0 | -37.613135 | 142.854194 | -36.713765 | 148.383062 | 500.500 | 2018-05-12 | 08:54:52 | 491.13 | 17:05:59 | 97.44 |
37854 | NaN | 1.0 | 0.0 | 13.002 | 27.0 | 3.0 | -37.428594 | 147.056992 | -38.383631 | 145.590528 | 167.005 | 2018-02-20 | 03:35:43 | 168.39 | 6:24:06 | 92.12 |
37857 | NaN | 1.0 | 0.0 | 19.468 | 5.0 | 31.0 | -37.570514 | 145.253281 | -37.824303 | 143.862472 | 125.716 | 2018-04-20 | 11:48:14 | 128.43 | 13:56:39 | 69.86 |
37860 | NaN | 1.0 | 0.0 | 10.647 | 21.0 | 5.0 | -37.239046 | 143.524656 | -37.546884 | 145.204154 | 152.434 | 2018-06-13 | 23:05:39 | 154.29 | 1:39:56 | 95.65 |
37861 | NaN | 2.0 | 0.0 | 40.775 | 22.0 | 19.0 | -36.678761 | 144.345069 | -37.167099 | 144.257043 | 54.922 | 2018-03-27 | 04:31:43 | 55.75 | 5:27:28 | 120.70 |
37863 | NaN | 1.0 | 0.0 | 35.410 | 9.0 | 16.0 | -39.006941 | 145.406988 | -36.525200 | 143.517906 | 322.399 | 2018-07-21 | 06:48:39 | 318.77 | 12:07:25 | 88.14 |
37866 | NaN | 1.0 | 1.0 | 40.151 | 14.0 | 34.0 | -37.644061 | 146.625820 | -38.518928 | 143.369714 | 301.447 | 2018-04-14 | 07:29:13 | 298.50 | 12:27:43 | 118.89 |
37869 | NaN | 3.0 | 1.0 | 44.559 | 33.0 | 10.0 | -38.733722 | 144.474460 | -37.220132 | 147.498884 | 314.323 | 2018-04-18 | 04:57:37 | 255.06 | 9:12:40 | 201.89 |
37873 | NaN | 1.0 | 0.0 | 39.038 | 7.0 | 22.0 | -38.664403 | 143.702592 | -36.670478 | 144.312065 | 228.361 | 2018-02-02 | 18:03:25 | 227.77 | 21:51:11 | 87.87 |
37874 | NaN | 1.0 | 0.0 | 7.171 | 14.0 | 1.0 | -37.589382 | 146.559422 | -38.229669 | 143.710836 | 260.125 | 2018-02-03 | 22:11:31 | 258.51 | 2:30:01 | 106.47 |
37875 | NaN | 1.0 | 0.0 | 39.141 | 5.0 | 1.0 | -37.491746 | 145.168879 | -38.168680 | 143.936761 | 131.959 | 2018-04-19 | 22:54:02 | 134.48 | 1:08:30 | 94.83 |
37876 | NaN | 2.0 | 0.0 | 42.330 | 25.0 | 5.0 | -36.618297 | 147.643165 | -37.563856 | 145.038804 | 254.069 | 2018-01-27 | 11:06:09 | 234.09 | 15:00:14 | 116.10 |
37877 | NaN | 1.0 | 1.0 | 43.965 | 25.0 | 38.0 | -36.542082 | 147.863736 | -37.549355 | 142.980000 | 448.104 | 2018-06-14 | 22:15:50 | 440.42 | 5:36:15 | 149.26 |
37879 | NaN | 1.0 | 0.0 | 18.540 | 36.0 | 18.0 | -36.974023 | 145.036046 | -38.015640 | 142.955101 | 217.299 | 2018-05-28 | 07:14:50 | 217.06 | 10:51:53 | 76.31 |
37885 | NaN | 1.0 | 0.0 | 13.544 | 34.0 | 38.0 | -38.528632 | 143.386140 | -37.561137 | 142.835312 | 118.028 | 2018-04-09 | 11:48:46 | 120.99 | 13:49:45 | 69.88 |
37888 | NaN | 1.0 | 1.0 | 34.923 | 36.0 | 28.0 | -37.044972 | 144.920592 | -38.218799 | 148.050871 | 305.307 | 2018-07-20 | 03:13:54 | 302.23 | 8:16:07 | 130.70 |
37892 | NaN | 3.0 | 1.0 | 5.416 | 20.0 | 33.0 | -38.959090 | 148.294700 | -38.930545 | 144.661887 | 314.514 | 2018-04-27 | 20:53:02 | 255.21 | 1:08:14 | 185.96 |
37894 | NaN | 2.0 | 0.0 | 41.232 | 38.0 | 39.0 | -37.657406 | 142.777301 | -38.622040 | 148.366529 | 500.901 | 2018-07-02 | 08:59:29 | 455.14 | 16:34:37 | 139.79 |
37897 | NaN | 1.0 | 0.0 | 8.865 | 9.0 | 2.0 | -38.839254 | 145.226776 | -37.695101 | 148.251214 | 293.394 | 2018-03-11 | 12:18:21 | 290.70 | 17:09:02 | 88.71 |
37898 | NaN | 3.0 | 1.0 | 27.153 | 39.0 | 16.0 | -38.446310 | 148.292498 | -36.739777 | 143.604529 | 454.968 | 2018-07-23 | 08:29:19 | 366.09 | 14:35:24 | 188.49 |
9.2.1 删除 除ID列之外其余数据重复的行
data[
['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
'Destination Region', 'Origin Latitude', 'Origin Longitude',
'Destination Latitude', 'Destination Longitude', 'Journey Distance',
'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
'Delivery Fare']
]
Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2.0 | 0.0 | 21.686 | 19.0 | 38.0 | -37.089338 | 144.429529 | -37.639134 | 142.891391 | 149.212 | 2018-01-16 | 09:38:17 | 140.19 | 11:58:28 | 99.25 |
1 | NaN | 0.0 | 39.075 | 15.0 | 15.0 | -38.481935 | 146.009567 | -38.585528 | 146.199827 | 20.185 | 2018-02-10 | 04:28:17 | 22.84 | 4:51:07 | 149.04 |
2 | 2.0 | 0.0 | 7.243 | 33.0 | 28.0 | -38.754167 | 144.509664 | -38.242224 | 147.855342 | 296.975 | 2018-05-05 | 01:38:03 | 272.52 | 6:10:34 | 141.48 |
3 | 2.0 | 0.0 | 13.383 | 10.0 | 38.0 | -37.240526 | 147.568019 | -37.687178 | 142.991188 | 407.396 | 2018-06-11 | 11:43:04 | 371.40 | 17:54:27 | 122.82 |
4 | 2.0 | 0.0 | 8.123 | 1.0 | 8.0 | -38.143985 | 143.798292 | -38.548315 | 144.769228 | 95.974 | 2018-03-16 | 14:50:25 | 92.51 | 16:22:55 | 111.97 |
5 | 2.0 | 0.0 | 32.859 | 2.0 | 28.0 | -37.421211 | 148.044072 | -38.159627 | 148.194048 | 83.250 | 2018-05-15 | 16:35:50 | 81.12 | 17:56:57 | 113.88 |
6 | 1.0 | 0.0 | 20.616 | 29.0 | 36.0 | -37.173949 | 143.140662 | -37.021605 | 145.197043 | 183.363 | 2018-04-01 | 19:31:12 | 184.22 | 22:35:25 | 85.60 |
7 | 2.0 | 0.0 | 44.577 | 36.0 | 31.0 | -37.123190 | 145.236196 | -37.667199 | 143.877650 | 134.543 | 2018-05-01 | 18:39:36 | 127.05 | 20:46:38 | 114.22 |
8 | 1.0 | 0.0 | 15.363 | 20.0 | 30.0 | -38.850561 | 148.317253 | -38.024914 | 144.823938 | 318.132 | 2018-05-27 | 14:48:17 | 314.64 | 20:02:55 | 87.39 |
9 | 1.0 | 1.0 | 36.190 | 18.0 | 28.0 | -38.070189 | 142.950207 | -37.996817 | 148.026520 | 445.106 | 2018-06-17 | 12:53:02 | 437.52 | 20:10:33 | 142.95 |
10 | 2.0 | 0.0 | 23.172 | 13.0 | 27.0 | -38.225456 | 147.425515 | -37.642798 | 147.124104 | 70.051 | 2018-03-19 | 09:59:10 | 69.30 | 11:08:27 | 96.95 |
11 | 2.0 | 0.0 | 19.754 | 23.0 | 26.0 | -37.625368 | 145.838281 | -36.789955 | 147.133916 | 147.791 | 2018-02-28 | 17:40:59 | 138.92 | 19:59:54 | 117.48 |
12 | 3.0 | 1.0 | 12.807 | 4.0 | 6.0 | -36.855984 | 142.929596 | -36.906838 | 145.696986 | 246.465 | 2018-03-26 | 07:55:48 | 201.49 | 11:17:17 | 173.32 |
13 | 2.0 | 0.0 | 22.332 | 33.0 | 19.0 | -38.894115 | 144.457143 | -37.173740 | 144.152105 | 193.365 | 2018-06-28 | 16:05:55 | 179.73 | 19:05:38 | 117.67 |
14 | 2.0 | 1.0 | 25.880 | 33.0 | 5.0 | -38.872372 | 144.606034 | -37.553304 | 145.120753 | 153.580 | 2018-05-09 | 08:33:29 | 144.10 | 10:57:34 | 132.98 |
15 | 2.0 | 1.0 | 38.691 | 7.0 | 15.0 | -38.844622 | 144.093195 | -38.476630 | 145.849992 | 158.102 | 2018-01-05 | 15:55:00 | 148.15 | 18:23:09 | 147.73 |
16 | 1.0 | 0.0 | 30.742 | 19.0 | 14.0 | -37.178059 | 144.403991 | -37.713867 | 146.382965 | 184.783 | 2018-06-05 | 13:03:43 | 185.60 | 16:09:18 | 84.82 |
17 | 1.0 | 1.0 | 18.055 | 10.0 | 18.0 | -37.141728 | 147.256091 | -37.983127 | 143.290272 | 362.227 | 2018-05-04 | 07:42:51 | 357.32 | 13:40:10 | 116.35 |
18 | 1.0 | 0.0 | 5.986 | 23.0 | 18.0 | -37.706960 | 145.718119 | -37.983127 | 143.182464 | 224.997 | 2018-05-23 | 12:53:12 | 224.51 | 16:37:42 | 82.47 |
19 | 2.0 | 1.0 | 30.664 | 13.0 | 14.0 | -38.250238 | 147.366610 | -37.774617 | 146.503568 | 92.371 | 2018-07-11 | 12:52:57 | 89.29 | 14:22:14 | 143.96 |
20 | 3.0 | 0.0 | 24.942 | 3.0 | 39.0 | -38.322643 | 145.505910 | -38.453191 | 148.300405 | 244.251 | 2018-03-14 | 22:16:33 | 199.74 | 1:36:17 | 167.45 |
21 | 1.0 | 0.0 | 34.012 | 27.0 | 37.0 | -37.516090 | 146.969053 | -38.852488 | 147.816987 | 166.236 | 2018-07-17 | 16:26:55 | 167.65 | 19:14:34 | 88.98 |
22 | 1.0 | 1.0 | 55.229 | 7.0 | 6.0 | -38.876145 | 143.911302 | -36.875012 | 145.759049 | 275.633 | 2018-05-01 | 12:42:19 | 273.52 | 17:15:50 | 688.24 |
23 | 3.0 | 0.0 | 40.741 | 33.0 | 17.0 | -38.756634 | 144.375564 | -38.817727 | 147.071495 | 234.014 | 2018-06-21 | 01:04:15 | 191.66 | 4:15:54 | 165.37 |
24 | 3.0 | 1.0 | 15.419 | 35.0 | 21.0 | -36.922413 | 146.362349 | -37.240137 | 143.768208 | 233.068 | 2018-01-05 | 07:19:48 | 190.91 | 10:30:42 | 174.52 |
25 | 2.0 | 0.0 | 10.663 | 29.0 | 1.0 | -37.019910 | 142.798295 | -38.358124 | 143.943955 | 179.929 | 2018-02-27 | 04:42:03 | 167.70 | 7:29:44 | 124.95 |
26 | 1.0 | 0.0 | 20.685 | 9.0 | 35.0 | -38.970092 | 145.435801 | -37.068624 | 146.317717 | 225.349 | 2018-05-15 | 15:35:10 | 224.85 | 19:20:00 | 80.66 |
27 | 2.0 | 0.0 | 10.272 | 8.0 | 34.0 | -38.485641 | 144.522135 | -38.505484 | 143.311788 | 105.471 | 2018-02-05 | 06:13:29 | 101.02 | 7:54:30 | 102.09 |
28 | 1.0 | 0.0 | 7.373 | 16.0 | 38.0 | -36.600877 | 143.566811 | -37.804113 | 142.793139 | 150.483 | 2018-06-18 | 14:01:42 | 152.40 | 16:34:06 | 82.11 |
29 | 2.0 | 0.0 | 31.358 | 13.0 | 16.0 | -38.195911 | 147.436921 | -36.734698 | 143.764279 | 362.941 | 2018-04-24 | 22:40:54 | 331.59 | 4:12:29 | 147.21 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
37873 | 1.0 | 0.0 | 39.038 | 7.0 | 22.0 | -38.664403 | 143.702592 | -36.670478 | 144.312065 | 228.361 | 2018-02-02 | 18:03:25 | 227.77 | 21:51:11 | 87.87 |
37874 | 1.0 | 0.0 | 7.171 | 14.0 | 1.0 | -37.589382 | 146.559422 | -38.229669 | 143.710836 | 260.125 | 2018-02-03 | 22:11:31 | 258.51 | 2:30:01 | 106.47 |
37875 | 1.0 | 0.0 | 39.141 | 5.0 | 1.0 | -37.491746 | 145.168879 | -38.168680 | 143.936761 | 131.959 | 2018-04-19 | 22:54:02 | 134.48 | 1:08:30 | 94.83 |
37876 | 2.0 | 0.0 | 42.330 | 25.0 | 5.0 | -36.618297 | 147.643165 | -37.563856 | 145.038804 | 254.069 | 2018-01-27 | 11:06:09 | 234.09 | 15:00:14 | 116.10 |
37877 | 1.0 | 1.0 | 43.965 | 25.0 | 38.0 | -36.542082 | 147.863736 | -37.549355 | 142.980000 | 448.104 | 2018-06-14 | 22:15:50 | 440.42 | 5:36:15 | 149.26 |
37878 | 2.0 | 0.0 | 27.560 | 7.0 | 35.0 | -38.689638 | 144.093996 | -37.046945 | 146.460389 | 276.894 | 2018-02-22 | 23:43:48 | 254.53 | 3:58:19 | 135.68 |
37879 | 1.0 | 0.0 | 18.540 | 36.0 | 18.0 | -36.974023 | 145.036046 | -38.015640 | 142.955101 | 217.299 | 2018-05-28 | 07:14:50 | 217.06 | 10:51:53 | 76.31 |
37880 | 1.0 | 0.0 | 44.226 | 40.0 | 3.0 | -37.740012 | 147.768225 | -38.276119 | 145.403536 | 215.813 | 2018-02-21 | 21:28:22 | 215.63 | 1:03:59 | 105.89 |
37881 | 2.0 | 0.0 | 6.415 | 25.0 | 27.0 | -36.585246 | 147.853062 | -37.700285 | 147.027318 | 144.135 | 2018-01-16 | 08:26:33 | 135.64 | 10:42:11 | 100.45 |
37882 | 3.0 | 1.0 | 30.471 | 3.0 | 15.0 | -38.345519 | 145.542105 | -38.712933 | 145.977537 | 55.772 | 2018-07-22 | 00:13:54 | 50.94 | 1:04:50 | 183.73 |
37883 | 2.0 | 1.0 | 6.553 | 5.0 | 15.0 | -37.625311 | 145.097611 | -38.630132 | 146.040376 | 139.018 | 2018-01-14 | 16:51:56 | 131.06 | 19:02:59 | 145.71 |
37884 | 1.0 | 0.0 | 37.850 | 38.0 | 10.0 | -37.510578 | 142.826103 | -37.337327 | 147.315144 | 397.278 | 2018-04-27 | 14:54:35 | 391.24 | 21:25:49 | 102.88 |
37885 | 1.0 | 0.0 | 13.544 | 34.0 | 38.0 | -38.528632 | 143.386140 | -37.561137 | 142.835312 | 118.028 | 2018-04-09 | 11:48:46 | 120.99 | 13:49:45 | 69.88 |
37886 | 2.0 | 0.0 | 38.656 | 1.0 | 32.0 | -38.171542 | 143.871582 | -36.756484 | 148.322553 | 423.585 | 2018-03-18 | 01:01:33 | 385.90 | 7:27:26 | 142.19 |
37887 | 1.0 | 0.0 | 23.699 | 10.0 | 10.0 | -37.352794 | 147.476662 | -37.104700 | 147.417027 | 28.118 | 2018-05-24 | 00:41:08 | 33.99 | 1:15:07 | 83.77 |
37888 | 1.0 | 1.0 | 34.923 | 36.0 | 28.0 | -37.044972 | 144.920592 | -38.218799 | 148.050871 | 305.307 | 2018-07-20 | 03:13:54 | 302.23 | 8:16:07 | 130.70 |
37889 | 1.0 | 0.0 | 34.130 | 8.0 | 36.0 | -38.434373 | 144.730220 | -37.047801 | 145.309517 | 162.554 | 2018-02-06 | 06:27:55 | 164.08 | 9:11:59 | 69.44 |
37890 | 1.0 | 1.0 | 18.286 | 22.0 | 14.0 | -36.720129 | 144.588398 | -37.695893 | 146.430793 | 196.154 | 2018-02-15 | 17:04:44 | 196.60 | 20:21:19 | 121.11 |
37891 | 1.0 | 1.0 | 11.538 | 33.0 | 2.0 | -38.836802 | 144.357057 | -37.549931 | 148.306793 | 374.024 | 2018-05-26 | 02:07:50 | 368.73 | 8:16:33 | 141.06 |
37892 | 3.0 | 1.0 | 5.416 | 20.0 | 33.0 | -38.959090 | 148.294700 | -38.930545 | 144.661887 | 314.514 | 2018-04-27 | 20:53:02 | 255.21 | 1:08:14 | 185.96 |
37893 | 2.0 | 0.0 | 34.355 | 8.0 | 39.0 | -38.520278 | 144.408786 | -38.447195 | 148.416066 | 349.251 | 2018-05-09 | 07:26:49 | 319.33 | 12:46:08 | 121.30 |
37894 | 2.0 | 0.0 | 41.232 | 38.0 | 39.0 | -37.657406 | 142.777301 | -38.622040 | 148.366529 | 500.901 | 2018-07-02 | 08:59:29 | 455.14 | 16:34:37 | 139.79 |
37895 | 1.0 | 0.0 | 44.341 | 23.0 | 24.0 | -37.777223 | 146.024184 | -38.913981 | 142.913934 | 299.552 | 2018-02-20 | 05:08:12 | 296.66 | 10:04:51 | 113.70 |
37896 | 2.0 | 1.0 | 17.798 | 11.0 | 40.0 | -38.045551 | 146.736254 | -37.633007 | 147.639273 | 91.711 | 2018-05-03 | 10:19:32 | 88.70 | 11:48:14 | 130.51 |
37897 | 1.0 | 0.0 | 8.865 | 9.0 | 2.0 | -38.839254 | 145.226776 | -37.695101 | 148.251214 | 293.394 | 2018-03-11 | 12:18:21 | 290.70 | 17:09:02 | 88.71 |
37898 | 3.0 | 1.0 | 27.153 | 39.0 | 16.0 | -38.446310 | 148.292498 | -36.739777 | 143.604529 | 454.968 | 2018-07-23 | 08:29:19 | 366.09 | 14:35:24 | 188.49 |
37899 | 1.0 | 1.0 | 40.363 | 9.0 | 38.0 | -38.983710 | 145.320518 | -37.673908 | 142.879230 | 258.259 | 2018-06-26 | 15:55:37 | 256.70 | 20:12:18 | 122.98 |
37900 | 1.0 | 1.0 | 35.955 | 13.0 | 32.0 | -38.292301 | 147.562013 | -36.605285 | 148.293183 | 198.597 | 2018-03-19 | 16:41:10 | 198.97 | 20:00:08 | 118.47 |
37901 | 2.0 | 1.0 | 29.566 | 33.0 | 23.0 | -38.853243 | 144.508346 | -37.727691 | 145.662270 | 160.816 | 2018-02-26 | 04:22:30 | 150.58 | 6:53:04 | 161.96 |
37902 | 3.0 | 0.0 | 44.070 | 36.0 | 34.0 | -37.129313 | 145.266426 | -38.428477 | 143.341632 | 222.687 | 2018-07-07 | 08:01:42 | 182.71 | 11:04:24 | 144.41 |
37903 rows × 15 columns
# drop_duplicates返回一个dataframe,重复的行会标为False
# data_1 = data.drop_duplicates(data[['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
# 'Destination Region', 'Origin Latitude', 'Origin Longitude',
# 'Destination Latitude', 'Destination Longitude', 'Journey Distance',
# 'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
# 'Delivery Fare']])
# 2
data_1 = data.drop_duplicates(['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
'Destination Region', 'Origin Latitude', 'Origin Longitude',
'Destination Latitude', 'Destination Longitude', 'Journey Distance',
'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
'Delivery Fare'])
data_1
Id | Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ID1645282128 | 2.0 | 0.0 | 21.686 | 19.0 | 38.0 | -37.089338 | 144.429529 | -37.639134 | 142.891391 | 149.212 | 2018-01-16 | 09:38:17 | 140.19 | 11:58:28 | 99.25 |
1 | ID1697620764 | NaN | 0.0 | 39.075 | 15.0 | 15.0 | -38.481935 | 146.009567 | -38.585528 | 146.199827 | 20.185 | 2018-02-10 | 04:28:17 | 22.84 | 4:51:07 | 149.04 |
2 | ID1543933503 | 2.0 | 0.0 | 7.243 | 33.0 | 28.0 | -38.754167 | 144.509664 | -38.242224 | 147.855342 | 296.975 | 2018-05-05 | 01:38:03 | 272.52 | 6:10:34 | 141.48 |
3 | ID1756517608 | 2.0 | 0.0 | 13.383 | 10.0 | 38.0 | -37.240526 | 147.568019 | -37.687178 | 142.991188 | 407.396 | 2018-06-11 | 11:43:04 | 371.40 | 17:54:27 | 122.82 |
4 | ID1832325834 | 2.0 | 0.0 | 8.123 | 1.0 | 8.0 | -38.143985 | 143.798292 | -38.548315 | 144.769228 | 95.974 | 2018-03-16 | 14:50:25 | 92.51 | 16:22:55 | 111.97 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
37839 | ID1879423081 | 2.0 | 0.0 | 17.081 | 20.0 | 14.0 | -38.860358 | 148.174855 | -37.562224 | 146.443991 | 209.278 | 2018-02-15 | 12:09:15 | 193.98 | 15:23:13 | 121.33 |
37840 | ID5705840841 | 1.0 | 1.0 | 35.164 | 29.0 | 23.0 | -37.250331 | 142.837244 | -37.739285 | 145.963420 | 281.403 | 2018-01-01 | 00:07:35 | 279.10 | 4:46:41 | 136.58 |
37841 | ID1276239209 | 3.0 | 0.0 | 36.704 | 12.0 | 11.0 | -36.578568 | 145.273744 | -38.305557 | 146.997297 | 245.269 | 2018-01-24 | 06:48:05 | 200.54 | 10:08:37 | 146.69 |
37842 | ID1432868583 | 3.0 | 0.0 | 13.195 | 37.0 | 21.0 | -38.755802 | 147.744770 | -37.487866 | 143.585293 | 390.599 | 2018-04-07 | 18:57:52 | 315.28 | 0:13:08 | 170.52 |
37889 | ID1281653747 | 1.0 | 0.0 | 34.130 | 8.0 | 36.0 | -38.434373 | 144.730220 | -37.047801 | 145.309517 | 162.554 | 2018-02-06 | 06:27:55 | 164.08 | 9:11:59 | 69.44 |
37844 rows × 16 columns
9.2.2 设置ID列相同,但其余数据不重复的ID为np.nan
# 返回一个布尔型的series,表示id是否重复行
data_1['Id'].duplicated()
0 False
1 False
2 False
3 False
4 False
...
37839 False
37840 False
37841 False
37842 False
37889 True
Name: Id, Length: 37844, dtype: bool
(data_1['Id'].duplicated()) & ~(data_1[['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
'Destination Region', 'Origin Latitude', 'Origin Longitude',
'Destination Latitude', 'Destination Longitude', 'Journey Distance',
'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
'Delivery Fare']].duplicated())
0 False
1 False
2 False
3 False
4 False
...
37839 False
37840 False
37841 False
37842 False
37889 True
Length: 37844, dtype: bool
data_1.loc[(data_1['Id'].duplicated()) & ~(data_1[['Drone Type', 'Post Type', 'Package Weight', 'Origin Region',
'Destination Region', 'Origin Latitude', 'Origin Longitude',
'Destination Latitude', 'Destination Longitude', 'Journey Distance',
'Departure Date', 'Departure Time', 'Travel Time', 'Delivery Time',
'Delivery Fare']].duplicated()),'Id'] = np.nan
data_1['Id'].value_counts()
ID1858018960 1
ID1792956453 1
ID1574575344 1
ID1146475421 1
ID1897230811 1
..
ID5349642032 1
ID5451065609 1
ID5131869013 1
ID1550669799 1
ID5260585195 1
Name: Id, Length: 37843, dtype: int64
任务9.3 想出空值的填补的2种方法,并且选一种应用
#方法1 :随机数
import random
# 'Id'+str(random.randint(1000000000,2000000000)) in data_1['Id']
#方法2 :随机数填补
def fill_id(Id):
Id_out = Id
while Id_out in data_1['Id'].tolist():
Id_out = 'ID'+str(random.randint(1000000000,2000000000))
return Id_out
data_1.loc[(data_1['Post Type']==0)&(data_1['Id'].isnull()),'Id'].apply(fill_id)
37889 ID1117118084
Name: Id, dtype: object
data_1.loc[(data_1['Post Type']==0)&(data_1['Id'].isnull()),'Id']
37889 NaN
Name: Id, dtype: object
data_1['Id']
0 ID1645282128
1 ID1697620764
2 ID1543933503
3 ID1756517608
4 ID1832325834
...
37839 ID1879423081
37840 ID5705840841
37841 ID1276239209
37842 ID1432868583
37889 NaN
Name: Id, Length: 37844, dtype: object
任务9.4 检查‘id’是否还有空值
data_1['Id'].isnull().sum()
1
任务10:找出所有重复的id
data[data.Id.duplicated()].Id
37843 ID1874340610
37845 NaN
37846 NaN
37847 ID5156350605
37848 ID1176413101
...
37898 NaN
37899 ID5862552991
37900 ID5339104082
37901 ID5468787866
37902 ID1448126768
Name: Id, Length: 59, dtype: object
data['Id'].value_counts() >= 2
ID5281864060 True
ID1796943211 True
ID1877344172 True
ID5122284320 True
ID1238297934 True
...
ID5672029782 False
ID1114364309 False
ID5495523518 False
ID1874532678 False
ID5260585195 False
Name: Id, Length: 37843, dtype: bool
任务11:删除重复行
data.duplicated()
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
...
37873 False
37874 False
37875 False
37876 False
37877 False
37878 True
37879 False
37880 True
37881 True
37882 True
37883 True
37884 True
37885 False
37886 True
37887 True
37888 False
37889 False
37890 True
37891 True
37892 False
37893 True
37894 False
37895 True
37896 True
37897 False
37898 False
37899 True
37900 True
37901 True
37902 True
dtype: bool
data.drop_duplicates()
Id | Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ID1645282128 | 2.0 | 0.0 | 21.686 | 19.0 | 38.0 | -37.089338 | 144.429529 | -37.639134 | 142.891391 | 149.212 | 2018-01-16 | 09:38:17 | 140.19 | 11:58:28 | 99.25 |
1 | ID1697620764 | NaN | 0.0 | 39.075 | 15.0 | 15.0 | -38.481935 | 146.009567 | -38.585528 | 146.199827 | 20.185 | 2018-02-10 | 04:28:17 | 22.84 | 4:51:07 | 149.04 |
2 | ID1543933503 | 2.0 | 0.0 | 7.243 | 33.0 | 28.0 | -38.754167 | 144.509664 | -38.242224 | 147.855342 | 296.975 | 2018-05-05 | 01:38:03 | 272.52 | 6:10:34 | 141.48 |
3 | ID1756517608 | 2.0 | 0.0 | 13.383 | 10.0 | 38.0 | -37.240526 | 147.568019 | -37.687178 | 142.991188 | 407.396 | 2018-06-11 | 11:43:04 | 371.40 | 17:54:27 | 122.82 |
4 | ID1832325834 | 2.0 | 0.0 | 8.123 | 1.0 | 8.0 | -38.143985 | 143.798292 | -38.548315 | 144.769228 | 95.974 | 2018-03-16 | 14:50:25 | 92.51 | 16:22:55 | 111.97 |
5 | ID1802448576 | 2.0 | 0.0 | 32.859 | 2.0 | 28.0 | -37.421211 | 148.044072 | -38.159627 | 148.194048 | 83.250 | 2018-05-15 | 16:35:50 | 81.12 | 17:56:57 | 113.88 |
6 | ID1940231408 | 1.0 | 0.0 | 20.616 | 29.0 | 36.0 | -37.173949 | 143.140662 | -37.021605 | 145.197043 | 183.363 | 2018-04-01 | 19:31:12 | 184.22 | 22:35:25 | 85.60 |
7 | ID1299303958 | 2.0 | 0.0 | 44.577 | 36.0 | 31.0 | -37.123190 | 145.236196 | -37.667199 | 143.877650 | 134.543 | 2018-05-01 | 18:39:36 | 127.05 | 20:46:38 | 114.22 |
8 | ID1752722028 | 1.0 | 0.0 | 15.363 | 20.0 | 30.0 | -38.850561 | 148.317253 | -38.024914 | 144.823938 | 318.132 | 2018-05-27 | 14:48:17 | 314.64 | 20:02:55 | 87.39 |
9 | ID5995243590 | 1.0 | 1.0 | 36.190 | 18.0 | 28.0 | -38.070189 | 142.950207 | -37.996817 | 148.026520 | 445.106 | 2018-06-17 | 12:53:02 | 437.52 | 20:10:33 | 142.95 |
10 | ID1483358088 | 2.0 | 0.0 | 23.172 | 13.0 | 27.0 | -38.225456 | 147.425515 | -37.642798 | 147.124104 | 70.051 | 2018-03-19 | 09:59:10 | 69.30 | 11:08:27 | 96.95 |
11 | ID1626798395 | 2.0 | 0.0 | 19.754 | 23.0 | 26.0 | -37.625368 | 145.838281 | -36.789955 | 147.133916 | 147.791 | 2018-02-28 | 17:40:59 | 138.92 | 19:59:54 | 117.48 |
12 | ID5277549009 | 3.0 | 1.0 | 12.807 | 4.0 | 6.0 | -36.855984 | 142.929596 | -36.906838 | 145.696986 | 246.465 | 2018-03-26 | 07:55:48 | 201.49 | 11:17:17 | 173.32 |
13 | ID1950928883 | 2.0 | 0.0 | 22.332 | 33.0 | 19.0 | -38.894115 | 144.457143 | -37.173740 | 144.152105 | 193.365 | 2018-06-28 | 16:05:55 | 179.73 | 19:05:38 | 117.67 |
14 | ID5143738648 | 2.0 | 1.0 | 25.880 | 33.0 | 5.0 | -38.872372 | 144.606034 | -37.553304 | 145.120753 | 153.580 | 2018-05-09 | 08:33:29 | 144.10 | 10:57:34 | 132.98 |
15 | ID5132897910 | 2.0 | 1.0 | 38.691 | 7.0 | 15.0 | -38.844622 | 144.093195 | -38.476630 | 145.849992 | 158.102 | 2018-01-05 | 15:55:00 | 148.15 | 18:23:09 | 147.73 |
16 | ID1290889802 | 1.0 | 0.0 | 30.742 | 19.0 | 14.0 | -37.178059 | 144.403991 | -37.713867 | 146.382965 | 184.783 | 2018-06-05 | 13:03:43 | 185.60 | 16:09:18 | 84.82 |
17 | ID5226355535 | 1.0 | 1.0 | 18.055 | 10.0 | 18.0 | -37.141728 | 147.256091 | -37.983127 | 143.290272 | 362.227 | 2018-05-04 | 07:42:51 | 357.32 | 13:40:10 | 116.35 |
18 | ID1898978312 | 1.0 | 0.0 | 5.986 | 23.0 | 18.0 | -37.706960 | 145.718119 | -37.983127 | 143.182464 | 224.997 | 2018-05-23 | 12:53:12 | 224.51 | 16:37:42 | 82.47 |
19 | ID5284908619 | 2.0 | 1.0 | 30.664 | 13.0 | 14.0 | -38.250238 | 147.366610 | -37.774617 | 146.503568 | 92.371 | 2018-07-11 | 12:52:57 | 89.29 | 14:22:14 | 143.96 |
20 | ID1585556406 | 3.0 | 0.0 | 24.942 | 3.0 | 39.0 | -38.322643 | 145.505910 | -38.453191 | 148.300405 | 244.251 | 2018-03-14 | 22:16:33 | 199.74 | 1:36:17 | 167.45 |
21 | ID1901962779 | 1.0 | 0.0 | 34.012 | 27.0 | 37.0 | -37.516090 | 146.969053 | -38.852488 | 147.816987 | 166.236 | 2018-07-17 | 16:26:55 | 167.65 | 19:14:34 | 88.98 |
22 | ID5590279060 | 1.0 | 1.0 | 55.229 | 7.0 | 6.0 | -38.876145 | 143.911302 | -36.875012 | 145.759049 | 275.633 | 2018-05-01 | 12:42:19 | 273.52 | 17:15:50 | 688.24 |
23 | ID1473718059 | 3.0 | 0.0 | 40.741 | 33.0 | 17.0 | -38.756634 | 144.375564 | -38.817727 | 147.071495 | 234.014 | 2018-06-21 | 01:04:15 | 191.66 | 4:15:54 | 165.37 |
24 | ID5551646734 | 3.0 | 1.0 | 15.419 | 35.0 | 21.0 | -36.922413 | 146.362349 | -37.240137 | 143.768208 | 233.068 | 2018-01-05 | 07:19:48 | 190.91 | 10:30:42 | 174.52 |
25 | ID1772122934 | 2.0 | 0.0 | 10.663 | 29.0 | 1.0 | -37.019910 | 142.798295 | -38.358124 | 143.943955 | 179.929 | 2018-02-27 | 04:42:03 | 167.70 | 7:29:44 | 124.95 |
26 | ID1987608852 | 1.0 | 0.0 | 20.685 | 9.0 | 35.0 | -38.970092 | 145.435801 | -37.068624 | 146.317717 | 225.349 | 2018-05-15 | 15:35:10 | 224.85 | 19:20:00 | 80.66 |
27 | ID1249352358 | 2.0 | 0.0 | 10.272 | 8.0 | 34.0 | -38.485641 | 144.522135 | -38.505484 | 143.311788 | 105.471 | 2018-02-05 | 06:13:29 | 101.02 | 7:54:30 | 102.09 |
28 | ID1611614450 | 1.0 | 0.0 | 7.373 | 16.0 | 38.0 | -36.600877 | 143.566811 | -37.804113 | 142.793139 | 150.483 | 2018-06-18 | 14:01:42 | 152.40 | 16:34:06 | 82.11 |
29 | ID1262379299 | 2.0 | 0.0 | 31.358 | 13.0 | 16.0 | -38.195911 | 147.436921 | -36.734698 | 143.764279 | 362.941 | 2018-04-24 | 22:40:54 | 331.59 | 4:12:29 | 147.21 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
37839 | ID1879423081 | 2.0 | 0.0 | 17.081 | 20.0 | 14.0 | -38.860358 | 148.174855 | -37.562224 | 146.443991 | 209.278 | 2018-02-15 | 12:09:15 | 193.98 | 15:23:13 | 121.33 |
37840 | ID5705840841 | 1.0 | 1.0 | 35.164 | 29.0 | 23.0 | -37.250331 | 142.837244 | -37.739285 | 145.963420 | 281.403 | 2018-01-01 | 00:07:35 | 279.10 | 4:46:41 | 136.58 |
37841 | ID1276239209 | 3.0 | 0.0 | 36.704 | 12.0 | 11.0 | -36.578568 | 145.273744 | -38.305557 | 146.997297 | 245.269 | 2018-01-24 | 06:48:05 | 200.54 | 10:08:37 | 146.69 |
37842 | ID1432868583 | 3.0 | 0.0 | 13.195 | 37.0 | 21.0 | -38.755802 | 147.744770 | -37.487866 | 143.585293 | 390.599 | 2018-04-07 | 18:57:52 | 315.28 | 0:13:08 | 170.52 |
37844 | NaN | 1.0 | 0.0 | 22.498 | 30.0 | 15.0 | -37.885792 | 144.875305 | -38.680341 | 145.874064 | 124.252 | 2018-01-28 | 13:07:09 | 127.02 | 15:14:10 | 74.69 |
37845 | NaN | 3.0 | 0.0 | 32.300 | 16.0 | 36.0 | -36.571169 | 143.741010 | -36.993435 | 144.983048 | 120.297 | 2018-03-26 | 02:34:49 | 101.88 | 4:16:41 | 162.20 |
37846 | NaN | 3.0 | 0.0 | 18.601 | 38.0 | 29.0 | -37.694132 | 142.851548 | -37.058014 | 142.873698 | 70.838 | 2018-03-02 | 20:33:11 | 62.83 | 21:36:00 | 142.10 |
37850 | NaN | 2.0 | 1.0 | 28.203 | 18.0 | 21.0 | -38.139260 | 143.345778 | -37.279348 | 143.604189 | 98.391 | 2018-04-17 | 03:28:41 | 94.68 | 5:03:21 | 152.30 |
37851 | NaN | 2.0 | 1.0 | 45.696 | 28.0 | 1.0 | -38.152835 | 147.793072 | -38.162103 | 144.043047 | 328.220 | 2018-03-25 | 13:50:57 | 300.50 | 18:51:27 | 167.39 |
37852 | NaN | 1.0 | 0.0 | 27.143 | 38.0 | 32.0 | -37.613135 | 142.854194 | -36.713765 | 148.383062 | 500.500 | 2018-05-12 | 08:54:52 | 491.13 | 17:05:59 | 97.44 |
37854 | NaN | 1.0 | 0.0 | 13.002 | 27.0 | 3.0 | -37.428594 | 147.056992 | -38.383631 | 145.590528 | 167.005 | 2018-02-20 | 03:35:43 | 168.39 | 6:24:06 | 92.12 |
37857 | NaN | 1.0 | 0.0 | 19.468 | 5.0 | 31.0 | -37.570514 | 145.253281 | -37.824303 | 143.862472 | 125.716 | 2018-04-20 | 11:48:14 | 128.43 | 13:56:39 | 69.86 |
37860 | NaN | 1.0 | 0.0 | 10.647 | 21.0 | 5.0 | -37.239046 | 143.524656 | -37.546884 | 145.204154 | 152.434 | 2018-06-13 | 23:05:39 | 154.29 | 1:39:56 | 95.65 |
37861 | NaN | 2.0 | 0.0 | 40.775 | 22.0 | 19.0 | -36.678761 | 144.345069 | -37.167099 | 144.257043 | 54.922 | 2018-03-27 | 04:31:43 | 55.75 | 5:27:28 | 120.70 |
37863 | NaN | 1.0 | 0.0 | 35.410 | 9.0 | 16.0 | -39.006941 | 145.406988 | -36.525200 | 143.517906 | 322.399 | 2018-07-21 | 06:48:39 | 318.77 | 12:07:25 | 88.14 |
37866 | NaN | 1.0 | 1.0 | 40.151 | 14.0 | 34.0 | -37.644061 | 146.625820 | -38.518928 | 143.369714 | 301.447 | 2018-04-14 | 07:29:13 | 298.50 | 12:27:43 | 118.89 |
37869 | NaN | 3.0 | 1.0 | 44.559 | 33.0 | 10.0 | -38.733722 | 144.474460 | -37.220132 | 147.498884 | 314.323 | 2018-04-18 | 04:57:37 | 255.06 | 9:12:40 | 201.89 |
37873 | NaN | 1.0 | 0.0 | 39.038 | 7.0 | 22.0 | -38.664403 | 143.702592 | -36.670478 | 144.312065 | 228.361 | 2018-02-02 | 18:03:25 | 227.77 | 21:51:11 | 87.87 |
37874 | NaN | 1.0 | 0.0 | 7.171 | 14.0 | 1.0 | -37.589382 | 146.559422 | -38.229669 | 143.710836 | 260.125 | 2018-02-03 | 22:11:31 | 258.51 | 2:30:01 | 106.47 |
37875 | NaN | 1.0 | 0.0 | 39.141 | 5.0 | 1.0 | -37.491746 | 145.168879 | -38.168680 | 143.936761 | 131.959 | 2018-04-19 | 22:54:02 | 134.48 | 1:08:30 | 94.83 |
37876 | NaN | 2.0 | 0.0 | 42.330 | 25.0 | 5.0 | -36.618297 | 147.643165 | -37.563856 | 145.038804 | 254.069 | 2018-01-27 | 11:06:09 | 234.09 | 15:00:14 | 116.10 |
37877 | NaN | 1.0 | 1.0 | 43.965 | 25.0 | 38.0 | -36.542082 | 147.863736 | -37.549355 | 142.980000 | 448.104 | 2018-06-14 | 22:15:50 | 440.42 | 5:36:15 | 149.26 |
37879 | NaN | 1.0 | 0.0 | 18.540 | 36.0 | 18.0 | -36.974023 | 145.036046 | -38.015640 | 142.955101 | 217.299 | 2018-05-28 | 07:14:50 | 217.06 | 10:51:53 | 76.31 |
37885 | NaN | 1.0 | 0.0 | 13.544 | 34.0 | 38.0 | -38.528632 | 143.386140 | -37.561137 | 142.835312 | 118.028 | 2018-04-09 | 11:48:46 | 120.99 | 13:49:45 | 69.88 |
37888 | NaN | 1.0 | 1.0 | 34.923 | 36.0 | 28.0 | -37.044972 | 144.920592 | -38.218799 | 148.050871 | 305.307 | 2018-07-20 | 03:13:54 | 302.23 | 8:16:07 | 130.70 |
37889 | ID1281653747 | 1.0 | 0.0 | 34.130 | 8.0 | 36.0 | -38.434373 | 144.730220 | -37.047801 | 145.309517 | 162.554 | 2018-02-06 | 06:27:55 | 164.08 | 9:11:59 | 69.44 |
37892 | NaN | 3.0 | 1.0 | 5.416 | 20.0 | 33.0 | -38.959090 | 148.294700 | -38.930545 | 144.661887 | 314.514 | 2018-04-27 | 20:53:02 | 255.21 | 1:08:14 | 185.96 |
37894 | NaN | 2.0 | 0.0 | 41.232 | 38.0 | 39.0 | -37.657406 | 142.777301 | -38.622040 | 148.366529 | 500.901 | 2018-07-02 | 08:59:29 | 455.14 | 16:34:37 | 139.79 |
37897 | NaN | 1.0 | 0.0 | 8.865 | 9.0 | 2.0 | -38.839254 | 145.226776 | -37.695101 | 148.251214 | 293.394 | 2018-03-11 | 12:18:21 | 290.70 | 17:09:02 | 88.71 |
37898 | NaN | 3.0 | 1.0 | 27.153 | 39.0 | 16.0 | -38.446310 | 148.292498 | -36.739777 | 143.604529 | 454.968 | 2018-07-23 | 08:29:19 | 366.09 | 14:35:24 | 188.49 |
37869 rows × 16 columns
任务12:填补 'Post Type'的空值
提示:可能与id有关
data_1[data_1['Post Type'].isnull()]
Id | Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2179 | ID5742860733 | 1.0 | NaN | 27.126 | 22.0 | 34.0 | -36.679647 | 144.463364 | -38.469625 | 143.202877 | 228.181 | 2018-02-28 | 14:01:50 | 227.59 | 17:49:25 | 114.27 |
4212 | ID1710452907 | 2.0 | NaN | 37.466 | 22.0 | 21.0 | -36.695380 | 144.474875 | -37.357030 | 143.508873 | 113.113 | 2018-07-28 | 11:30:33 | 107.86 | 13:18:24 | 103.59 |
4219 | ID1787036788 | 1.0 | NaN | 30.877 | 3.0 | 6.0 | -38.311620 | 145.478355 | -36.832092 | 145.600747 | 165.050 | 2018-07-18 | 11:58:01 | 166.50 | 14:44:31 | 75.01 |
4253 | ID1680028038 | 2.0 | NaN | 13.860 | 22.0 | 19.0 | -36.837363 | 144.515369 | -37.089526 | 144.531402 | 28.106 | 2018-06-17 | 04:43:00 | 31.74 | 5:14:44 | 120.43 |
6299 | ID1377767619 | 1.0 | NaN | 45.416 | 23.0 | 13.0 | -37.720326 | 146.008038 | -38.222304 | 147.498125 | 142.197 | 2018-01-13 | 09:36:02 | 144.38 | 12:00:24 | 69.85 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
26778 | ID1571528676 | 1.0 | NaN | 15.062 | 33.0 | 9.0 | -38.965718 | 144.650812 | -38.962812 | 145.292314 | 55.525 | 2018-04-28 | 21:13:25 | 60.51 | 22:13:55 | 89.97 |
28691 | ID5883461461 | 1.0 | NaN | 40.325 | 4.0 | 29.0 | -36.542055 | 143.024959 | -37.232887 | 142.980514 | 77.003 | 2018-05-01 | 04:18:09 | 81.29 | 5:39:26 | 124.92 |
30861 | ID1971183564 | 3.0 | NaN | 42.700 | 19.0 | 4.0 | -37.120747 | 144.287009 | -36.762751 | 143.078632 | 114.656 | 2018-02-25 | 16:19:23 | 97.43 | 17:56:48 | 144.33 |
32921 | ID5291598933 | 2.0 | NaN | 27.189 | 2.0 | 7.0 | -37.564903 | 148.097962 | -38.797131 | 143.873724 | 394.211 | 2018-01-09 | 12:39:15 | 359.59 | 18:38:50 | 165.54 |
36991 | ID1484986030 | 1.0 | NaN | 27.517 | 37.0 | 36.0 | -38.938504 | 147.755657 | -37.060130 | 144.993660 | 320.003 | 2018-01-07 | 20:24:23 | 316.45 | 1:40:49 | 99.58 |
20 rows × 16 columns
#用投递速度来区分
def speed(df):
speed = df['Journey Distance']/df['Travel Time']
return round(speed,2)
data_1.loc[data_1['Post Type']==0].apply(speed,axis=1)
0 1.06
1 0.88
2 1.09
3 1.10
4 1.04
...
37837 0.99
37839 1.08
37841 1.22
37842 1.24
37889 0.99
Length: 26529, dtype: float64
#用投递价格来区分
def price(df):
price = df['Delivery Fare']/df['Package Weight']
return round(price,2)
data_1.loc[data_1['Post Type']==0,['Delivery Fare','Package Weight']].apply(price,axis=1)
0 4.58
1 3.81
2 19.53
3 9.18
4 13.78
...
37837 13.20
37839 7.10
37841 4.00
37842 12.92
37889 2.03
Length: 26529, dtype: float64
#ID以5还是1开头为区别
data_1.loc[data_1['Post Type'].isnull(),'Post Type'] = data_1.loc[data_1['Post Type'].isnull(),'Id'].apply(lambda s:s[2]=='1')
任务13:修复 'Origin Longitude'与 'Origin Latitude' 列中错误的值
data_1['Origin Longitude'].describe()
#有负值
count 37844.000000
mean 145.423081
std 6.929107
min -148.337157
25% 143.964265
50% 145.424189
75% 147.171954
max 148.450576
Name: Origin Longitude, dtype: float64
def fix_Longitude_Latitude(flt):
if flt<0:
return -flt
else:
return flt
data_1['Origin Longitude'].apply(fix_Longitude_Latitude).describe()
count 37844.000000
mean 145.577375
std 1.764044
min 142.769991
25% 143.966143
50% 145.426161
75% 147.172965
max 148.450576
Name: Origin Longitude, dtype: float64
data_1['Origin Longitude'] = data_1['Origin Longitude'].apply(fix_Longitude_Latitude)
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.
data_1['Origin Latitude'].describe()
count 37844.000000
mean -37.728738
std 1.900393
min -39.006941
25% -38.442905
50% -37.707244
75% -37.094433
max 38.986998
Name: Origin Latitude, dtype: float64
data_1['Origin Latitude'] = data_1['Origin Latitude'].apply(fix_Longitude_Latitude)
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.
任务14:修复 'Destination Latitude' 与 'Destination Longitude'列中错误的值
data_1['Destination Latitude'] = data_1['Destination Latitude'].apply(fix_Longitude_Latitude)
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.
data_1['Destination Longitude'] = data_1['Destination Longitude'].apply(fix_Longitude_Latitude)
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.
任务 15:填补'Origin Region'和‘Destination Region’中的空值
data_1[data_1['Origin Region'].isnull()]
Id | Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2080 | ID1634835235 | 1.0 | 0 | 31.871 | NaN | 22.0 | 37.300712 | 143.610855 | 36.794684 | 144.280083 | 81.904 | 2018-06-09 | 03:21:49 | 86.04 | 4:47:51 | 95.70 |
4442 | ID1321659107 | 2.0 | 0 | 12.991 | NaN | 10.0 | 38.180248 | 146.775985 | 37.396662 | 147.313624 | 99.224 | 2018-04-12 | 11:25:54 | 95.42 | 13:01:19 | 98.48 |
8220 | ID1522212019 | 1.0 | 0 | 25.282 | NaN | 23.0 | 36.756457 | 148.437792 | 37.665363 | 145.818608 | 253.275 | 2018-03-27 | 17:56:31 | 251.88 | 22:08:23 | 94.56 |
8228 | ID1946204979 | 2.0 | 0 | 12.779 | NaN | 25.0 | 37.391203 | 148.116452 | 36.520786 | 147.557258 | 108.914 | 2018-07-08 | 10:57:12 | 104.10 | 12:41:17 | 102.49 |
15023 | ID1891262263 | 1.0 | 0 | 36.466 | NaN | 7.0 | 37.537619 | 145.202427 | 38.723869 | 143.685908 | 187.265 | 2018-03-13 | 07:21:08 | 188.00 | 10:29:08 | 72.78 |
20513 | ID1490968406 | 1.0 | 0 | 42.648 | NaN | 9.0 | 36.654443 | 143.691689 | 38.849982 | 145.479151 | 290.643 | 2018-02-16 | 18:46:02 | 288.04 | 23:34:04 | 99.81 |
20515 | ID5148310393 | 1.0 | 1 | 44.505 | NaN | 7.0 | 38.973121 | 142.989422 | 38.823172 | 143.990199 | 88.293 | 2018-04-18 | 23:35:27 | 92.22 | 1:07:40 | 119.24 |
28710 | ID5941350307 | 1.0 | 1 | 17.232 | NaN | 32.0 | 37.616289 | 146.052853 | 36.696768 | 148.280560 | 222.563 | 2018-02-22 | 02:17:46 | 222.16 | 5:59:55 | 125.46 |
34853 | ID5234294750 | 2.0 | 1 | 29.063 | NaN | 7.0 | 37.048542 | 142.843945 | 38.852320 | 143.963431 | 223.542 | 2018-07-14 | 13:22:02 | 206.75 | 16:48:47 | 151.47 |
36904 | ID1724943602 | 1.0 | 0 | 31.774 | NaN | 31.0 | 36.736858 | 143.688416 | 37.548149 | 143.885770 | 91.993 | 2018-02-12 | 14:31:24 | 95.80 | 16:07:11 | 81.57 |
data_1.loc[data_1['Origin Region'].notnull()]
Id | Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ID1645282128 | 2.0 | 0 | 21.686 | 19.0 | 38.0 | 37.089338 | 144.429529 | 37.639134 | 142.891391 | 149.212 | 2018-01-16 | 09:38:17 | 140.19 | 11:58:28 | 99.25 |
1 | ID1697620764 | NaN | 0 | 39.075 | 15.0 | 15.0 | 38.481935 | 146.009567 | 38.585528 | 146.199827 | 20.185 | 2018-02-10 | 04:28:17 | 22.84 | 4:51:07 | 149.04 |
2 | ID1543933503 | 2.0 | 0 | 7.243 | 33.0 | 28.0 | 38.754167 | 144.509664 | 38.242224 | 147.855342 | 296.975 | 2018-05-05 | 01:38:03 | 272.52 | 6:10:34 | 141.48 |
3 | ID1756517608 | 2.0 | 0 | 13.383 | 10.0 | 38.0 | 37.240526 | 147.568019 | 37.687178 | 142.991188 | 407.396 | 2018-06-11 | 11:43:04 | 371.40 | 17:54:27 | 122.82 |
4 | ID1832325834 | 2.0 | 0 | 8.123 | 1.0 | 8.0 | 38.143985 | 143.798292 | 38.548315 | 144.769228 | 95.974 | 2018-03-16 | 14:50:25 | 92.51 | 16:22:55 | 111.97 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
37839 | ID1879423081 | 2.0 | 0 | 17.081 | 20.0 | 14.0 | 38.860358 | 148.174855 | 37.562224 | 146.443991 | 209.278 | 2018-02-15 | 12:09:15 | 193.98 | 15:23:13 | 121.33 |
37840 | ID5705840841 | 1.0 | 1 | 35.164 | 29.0 | 23.0 | 37.250331 | 142.837244 | 37.739285 | 145.963420 | 281.403 | 2018-01-01 | 00:07:35 | 279.10 | 4:46:41 | 136.58 |
37841 | ID1276239209 | 3.0 | 0 | 36.704 | 12.0 | 11.0 | 36.578568 | 145.273744 | 38.305557 | 146.997297 | 245.269 | 2018-01-24 | 06:48:05 | 200.54 | 10:08:37 | 146.69 |
37842 | ID1432868583 | 3.0 | 0 | 13.195 | 37.0 | 21.0 | 38.755802 | 147.744770 | 37.487866 | 143.585293 | 390.599 | 2018-04-07 | 18:57:52 | 315.28 | 0:13:08 | 170.52 |
37889 | NaN | 1.0 | 0 | 34.130 | 8.0 | 36.0 | 38.434373 | 144.730220 | 37.047801 | 145.309517 | 162.554 | 2018-02-06 | 06:27:55 | 164.08 | 9:11:59 | 69.44 |
37834 rows × 16 columns
data_2 = data_1.loc[data_1['Origin Region'].notnull()]
def fill_region(df):
longitude,latitude = df['Origin Longitude'],df['Origin Latitude']
distance_0 = 100
region_out = ''
for index in data_2.index:
longitude_ = data_2.loc[index,'Origin Longitude']
latitude_ = data_2.loc[index,'Origin Latitude']
region = data_2.loc[index,'Origin Region']
distance = (longitude-longitude_)**2+(latitude_-latitude)**2
if distance<distance_0 :
region_out = region
distance_0 = distance
return region_out
data_1.loc[data_1['Origin Region'].isnull()].apply(fill_region,axis=1)
2080 21.0
4442 11.0
8220 32.0
8228 2.0
15023 5.0
20513 16.0
20515 24.0
28710 23.0
34853 29.0
36904 16.0
dtype: float64
data_1.loc[data_1['Origin Region'].isnull()]
Id | Drone Type | Post Type | Package Weight | Origin Region | Destination Region | Origin Latitude | Origin Longitude | Destination Latitude | Destination Longitude | Journey Distance | Departure Date | Departure Time | Travel Time | Delivery Time | Delivery Fare | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2080 | ID1634835235 | 1.0 | 0 | 31.871 | NaN | 22.0 | 37.300712 | 143.610855 | 36.794684 | 144.280083 | 81.904 | 2018-06-09 | 03:21:49 | 86.04 | 4:47:51 | 95.70 |
4442 | ID1321659107 | 2.0 | 0 | 12.991 | NaN | 10.0 | 38.180248 | 146.775985 | 37.396662 | 147.313624 | 99.224 | 2018-04-12 | 11:25:54 | 95.42 | 13:01:19 | 98.48 |
8220 | ID1522212019 | 1.0 | 0 | 25.282 | NaN | 23.0 | 36.756457 | 148.437792 | 37.665363 | 145.818608 | 253.275 | 2018-03-27 | 17:56:31 | 251.88 | 22:08:23 | 94.56 |
8228 | ID1946204979 | 2.0 | 0 | 12.779 | NaN | 25.0 | 37.391203 | 148.116452 | 36.520786 | 147.557258 | 108.914 | 2018-07-08 | 10:57:12 | 104.10 | 12:41:17 | 102.49 |
15023 | ID1891262263 | 1.0 | 0 | 36.466 | NaN | 7.0 | 37.537619 | 145.202427 | 38.723869 | 143.685908 | 187.265 | 2018-03-13 | 07:21:08 | 188.00 | 10:29:08 | 72.78 |
20513 | ID1490968406 | 1.0 | 0 | 42.648 | NaN | 9.0 | 36.654443 | 143.691689 | 38.849982 | 145.479151 | 290.643 | 2018-02-16 | 18:46:02 | 288.04 | 23:34:04 | 99.81 |
20515 | ID5148310393 | 1.0 | 1 | 44.505 | NaN | 7.0 | 38.973121 | 142.989422 | 38.823172 | 143.990199 | 88.293 | 2018-04-18 | 23:35:27 | 92.22 | 1:07:40 | 119.24 |
28710 | ID5941350307 | 1.0 | 1 | 17.232 | NaN | 32.0 | 37.616289 | 146.052853 | 36.696768 | 148.280560 | 222.563 | 2018-02-22 | 02:17:46 | 222.16 | 5:59:55 | 125.46 |
34853 | ID5234294750 | 2.0 | 1 | 29.063 | NaN | 7.0 | 37.048542 | 142.843945 | 38.852320 | 143.963431 | 223.542 | 2018-07-14 | 13:22:02 | 206.75 | 16:48:47 | 151.47 |
36904 | ID1724943602 | 1.0 | 0 | 31.774 | NaN | 31.0 | 36.736858 | 143.688416 | 37.548149 | 143.885770 | 91.993 | 2018-02-12 | 14:31:24 | 95.80 | 16:07:11 | 81.57 |
任务16:找出 'Departure Date'中错误的值
data_1['Departure Date'].sort_values(ascending=False)
31975 2018-30-06
4990 2018-28-06
17911 2018-28-06
28350 2018-28-05
35740 2018-28-05
...
12702 2018-01-01
36755 2018-01-01
12684 2018-01-01
36782 2018-01-01
27230 2018-01-01
Name: Departure Date, Length: 37844, dtype: object
def fix_date(dt):
split_ = dt.split('-')
year,month,day = split_[0],split_[1],split_[2]
if month > '08':
return year+'-'+day+'-'+month
else:
return dt
data_1['Departure Date'] = data_1['Departure Date'].apply(fix_date)
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:8: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
data_1['Departure Date'].sort_values(ascending=False)
17849 2018-07-28
17570 2018-07-28
988 2018-07-28
592 2018-07-28
12772 2018-07-28
...
29092 2018-01-01
5232 2018-01-01
12584 2018-01-01
29041 2018-01-01
5519 2018-01-01
Name: Departure Date, Length: 37844, dtype: object
任务17:输出数据集为‘solution.csv’到当前目录下面
data_1.to_csv('solution.csv')
作者信息
Sean
Stay hungry,Stay foolish.