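Correlation Analysis of Air Quality and Traffic Volume
Correlation between air-quality indicators and the corresponding traffic-volume indicators
Data preprocessing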
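1. Days on which all air-quality readings, all traffic readings, or both are missing make up less than 5% of the data. These days were dropped, on the assumption that removing them does not affect the accuracy of the results.
2. The remaining data were joined on date, keeping only the dates present in both tables (2017/3/23 to 2023/6/25): 1789 days in total, not all of them consecutive.
3. Some air-quality indicators are still partly missing. Since the table contains no zero values, a missing entry is assumed to mean that the pollutant was not detected, and it is set to 0.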
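A minimal pandas sketch of the three preprocessing steps above. The two small tables, their column lists and their values are made up for illustration (the real source files and their layout are not given here); only the operations follow the description: drop days on which a whole category is missing, inner-join on date, then fill the remaining gaps with 0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the traffic and air-quality tables.
traffic = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-23", "2017-03-24", "2017-03-25", "2017-03-26"]),
    "stream": [6245, np.nan, 16541, 13164],
    "little-car": [4690, np.nan, 9654, 7429],
})
air = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-23", "2017-03-25", "2017-03-26", "2017-03-27"]),
    "pm25": [123, 88, np.nan, 70],
    "so2": [4, np.nan, 6, 5],
})
traffic_cols = ["stream", "little-car"]
air_cols = ["pm25", "so2"]

# Step 1: drop days on which every column of a category is missing.
traffic = traffic.dropna(subset=traffic_cols, how="all")
air = air.dropna(subset=air_cols, how="all")

# Step 2: keep only dates present in both tables (inner join on date).
merged = traffic.merge(air, on="date", how="inner")

# Step 3: treat remaining missing pollutant readings as "not detected" (0).
merged[air_cols] = merged[air_cols].fillna(0)
print(merged)
```

An inner join is what "keep only the dates present in both tables" amounts to; days dropped in step 1 simply never take part in the join.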
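Objective
Measure the correlation between vehicle counts and air quality, and between large-vehicle counts and air quality.
Implementation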
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import chardet  # detects the text encoding of the CSV file
from sklearn.preprocessing import StandardScaler  # for standardization

sns.set(color_codes=True)  # use seaborn's predefined color codes

# Detect the encoding of the CSV file before reading it
with open('last.csv', 'rb') as f:
    content = f.read()
encoding = chardet.detect(content)['encoding']
print(encoding)
out:
UTF-8-SIG
# Read the CSV with the detected encoding (UTF-8-SIG), skipping the 'date' column
df = pd.read_csv('last.csv', encoding='UTF-8-SIG', usecols=lambda column: column != 'date')
df.head()
out:
| | stream | long-car | large-car | middle-car | light-car | little-car | pm25 | pm10 | o3 | no2 | so2 | co |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6245 | 601 | 218 | 347 | 389 | 4690 | 123 | 43 | 71 | 23 | 4 | 8 |
1 | 18504 | 2401 | 932 | 1612 | 1339 | 12220 | 87 | 45 | 34 | 30 | 4 | 9 |
2 | 16541 | 2528 | 1047 | 1808 | 1504 | 9654 | 88 | 44 | 36 | 21 | 4 | 8 |
3 | 13164 | 2194 | 876 | 1410 | 1255 | 7429 | 77 | 57 | 60 | 29 | 6 | 8 |
4 | 8533 | 1490 | 559 | 973 | 821 | 4690 | 104 | 67 | 73 | 28 | 6 | 7 |
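Column meanings: stream = total traffic volume; long-car = long-vehicle volume; large-car = large-vehicle volume; middle-car = medium-vehicle volume; light-car = light-vehicle volume; little-car = mini-vehicle volume.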
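In the five rows shown above, `stream` is exactly the sum of the five vehicle-type columns. The check below uses only those rows, copied from the `df.head()` output. If the identity holds for the whole table, the high correlations between `stream` and the individual types partly reflect a part-whole relationship, with `little-car`, the largest component, tracking the total most closely.

```python
import pandas as pd

# First five rows of df, copied from the df.head() output above.
head = pd.DataFrame({
    "stream":     [6245, 18504, 16541, 13164, 8533],
    "long-car":   [601, 2401, 2528, 2194, 1490],
    "large-car":  [218, 932, 1047, 876, 559],
    "middle-car": [347, 1612, 1808, 1410, 973],
    "light-car":  [389, 1339, 1504, 1255, 821],
    "little-car": [4690, 12220, 9654, 7429, 4690],
})

vehicle_types = ["long-car", "large-car", "middle-car", "light-car", "little-car"]
# Row-wise sum of the five vehicle-type columns
type_total = head[vehicle_types].sum(axis=1)
print((type_total == head["stream"]).all())  # True for these five rows
```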
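# Initialize the standardizer (z-score: subtract the column mean, divide by the standard deviation)
scaler = StandardScaler()
# Standardize every column and keep the original column names
df_normalized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
df_normalized.head()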
out:
| | stream | long-car | large-car | middle-car | light-car | little-car | pm25 | pm10 | o3 | no2 | so2 | co |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -0.315919 | -0.472971 | -0.011644 | -0.676805 | 0.263679 | -0.254230 | 1.058263 | -0.109127 | 1.027142 | 0.474434 | 0.130986 | 1.102301 |
1 | 1.028897 | 0.594040 | 1.689531 | 0.981990 | 2.363031 | 0.885089 | 0.006061 | 0.001327 | -0.141546 | 1.312421 | 0.130986 | 1.693079 |
2 | 0.813555 | 0.669323 | 1.963529 | 1.239005 | 2.727656 | 0.496843 | 0.035289 | -0.053900 | -0.078374 | 0.235009 | 0.130986 | 1.102301 |
3 | 0.443097 | 0.471333 | 1.556105 | 0.717108 | 2.177405 | 0.160192 | -0.286217 | 0.664057 | 0.679694 | 1.192709 | 1.132417 | 1.102301 |
4 | -0.064925 | 0.054014 | 0.800822 | 0.144069 | 1.218332 | -0.254230 | 0.502934 | 1.216331 | 1.090314 | 1.072996 | 1.132417 | 0.511523 |
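For reference, `StandardScaler` maps each column to z = (x - mean) / std, using the population standard deviation (ddof=0). The sketch checks this on the five `stream` values shown above; these toy z-scores will not match the table, which was standardized with the mean and standard deviation of all 1789 rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy column: the five stream values from df.head() above
x = np.array([[6245.0], [18504.0], [16541.0], [13164.0], [8533.0]])

z_sklearn = StandardScaler().fit_transform(x)
# Same transform by hand, with the population standard deviation (ddof=0)
z_manual = (x - x.mean(axis=0)) / x.std(axis=0, ddof=0)

print(np.allclose(z_sklearn, z_manual))  # True
```

Note that pandas `DataFrame.std()` defaults to ddof=1, so a hand-written `(df - df.mean()) / df.std()` gives slightly different values from `StandardScaler`.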
df_normalized.corr()
out:
| | stream | long-car | large-car | middle-car | light-car | little-car | pm25 | pm10 | o3 | no2 | so2 | co |
---|---|---|---|---|---|---|---|---|---|---|---|---|
stream | 1.000000 | 0.960397 | 0.334564 | 0.936301 | 0.290630 | 0.984934 | -0.291996 | -0.259647 | -0.249167 | -0.195712 | -0.076936 | -0.182893 |
long-car | 0.960397 | 1.000000 | 0.468035 | 0.893379 | 0.401189 | 0.909110 | -0.219200 | -0.208401 | -0.184057 | -0.138806 | -0.000536 | -0.128781 |
large-car | 0.334564 | 0.468035 | 1.000000 | 0.274743 | 0.953798 | 0.181474 | 0.137599 | 0.086044 | 0.172481 | 0.161718 | 0.342927 | 0.185937 |
middle-car | 0.936301 | 0.893379 | 0.274743 | 1.000000 | 0.257476 | 0.912902 | -0.296563 | -0.250422 | -0.324601 | -0.197463 | -0.107954 | -0.189003 |
light-car | 0.290630 | 0.401189 | 0.953798 | 0.257476 | 1.000000 | 0.139703 | 0.148665 | 0.091512 | 0.195345 | 0.176855 | 0.386891 | 0.230381 |
little-car | 0.984934 | 0.909110 | 0.181474 | 0.912902 | 0.139703 | 1.000000 | -0.331484 | -0.287759 | -0.283557 | -0.234101 | -0.141788 | -0.225157 |
pm25 | -0.291996 | -0.219200 | 0.137599 | -0.296563 | 0.148665 | -0.331484 | 1.000000 | 0.722665 | 0.422200 | 0.561810 | 0.473017 | 0.550983 |
pm10 | -0.259647 | -0.208401 | 0.086044 | -0.250422 | 0.091512 | -0.287759 | 0.722665 | 1.000000 | 0.523732 | 0.736929 | 0.530307 | 0.550590 |
o3 | -0.249167 | -0.184057 | 0.172481 | -0.324601 | 0.195345 | -0.283557 | 0.422200 | 0.523732 | 1.000000 | 0.417945 | 0.424055 | 0.373651 |
no2 | -0.195712 | -0.138806 | 0.161718 | -0.197463 | 0.176855 | -0.234101 | 0.561810 | 0.736929 | 0.417945 | 1.000000 | 0.552486 | 0.585678 |
so2 | -0.076936 | -0.000536 | 0.342927 | -0.107954 | 0.386891 | -0.141788 | 0.473017 | 0.530307 | 0.424055 | 0.552486 | 1.000000 | 0.465094 |
co | -0.182893 | -0.128781 | 0.185937 | -0.189003 | 0.230381 | -0.225157 | 0.550983 | 0.550590 | 0.373651 | 0.585678 | 0.465094 | 1.000000 |
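- Later finding: standardizing the data does not change the correlation coefficients; the table computed from the raw data below is identical. This is expected, because the Pearson coefficient is unchanged when a variable is shifted or multiplied by a positive constant, and standardization applies exactly such a transformation to each column.
Correlation table (raw, unstandardized data)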
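The invariance can be checked numerically. This sketch uses random data (an assumption; any two numeric columns would do) and compares the Pearson coefficient of the raw values with that of the z-scored values and of an arbitrarily rescaled and shifted copy.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

def standardize(v):
    # z-score with the population standard deviation, as StandardScaler does
    return (v - v.mean()) / v.std()

r_raw = np.corrcoef(x, y)[0, 1]
r_std = np.corrcoef(standardize(x), standardize(y))[0, 1]
# Any positive rescaling plus a shift leaves r unchanged
r_affine = np.corrcoef(3.0 * x + 7.0, 0.1 * y - 2.0)[0, 1]
print(np.isclose(r_raw, r_std), np.isclose(r_raw, r_affine))  # True True
```

A negative scale factor would flip the sign of r, but standardization always divides by a positive standard deviation.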
df.corr()
out:
| | stream | long-car | large-car | middle-car | light-car | little-car | pm25 | pm10 | o3 | no2 | so2 | co |
---|---|---|---|---|---|---|---|---|---|---|---|---|
stream | 1.000000 | 0.960397 | 0.334564 | 0.936301 | 0.290630 | 0.984934 | -0.291996 | -0.259647 | -0.249167 | -0.195712 | -0.076936 | -0.182893 |
long-car | 0.960397 | 1.000000 | 0.468035 | 0.893379 | 0.401189 | 0.909110 | -0.219200 | -0.208401 | -0.184057 | -0.138806 | -0.000536 | -0.128781 |
large-car | 0.334564 | 0.468035 | 1.000000 | 0.274743 | 0.953798 | 0.181474 | 0.137599 | 0.086044 | 0.172481 | 0.161718 | 0.342927 | 0.185937 |
middle-car | 0.936301 | 0.893379 | 0.274743 | 1.000000 | 0.257476 | 0.912902 | -0.296563 | -0.250422 | -0.324601 | -0.197463 | -0.107954 | -0.189003 |
light-car | 0.290630 | 0.401189 | 0.953798 | 0.257476 | 1.000000 | 0.139703 | 0.148665 | 0.091512 | 0.195345 | 0.176855 | 0.386891 | 0.230381 |
little-car | 0.984934 | 0.909110 | 0.181474 | 0.912902 | 0.139703 | 1.000000 | -0.331484 | -0.287759 | -0.283557 | -0.234101 | -0.141788 | -0.225157 |
pm25 | -0.291996 | -0.219200 | 0.137599 | -0.296563 | 0.148665 | -0.331484 | 1.000000 | 0.722665 | 0.422200 | 0.561810 | 0.473017 | 0.550983 |
pm10 | -0.259647 | -0.208401 | 0.086044 | -0.250422 | 0.091512 | -0.287759 | 0.722665 | 1.000000 | 0.523732 | 0.736929 | 0.530307 | 0.550590 |
o3 | -0.249167 | -0.184057 | 0.172481 | -0.324601 | 0.195345 | -0.283557 | 0.422200 | 0.523732 | 1.000000 | 0.417945 | 0.424055 | 0.373651 |
no2 | -0.195712 | -0.138806 | 0.161718 | -0.197463 | 0.176855 | -0.234101 | 0.561810 | 0.736929 | 0.417945 | 1.000000 | 0.552486 | 0.585678 |
so2 | -0.076936 | -0.000536 | 0.342927 | -0.107954 | 0.386891 | -0.141788 | 0.473017 | 0.530307 | 0.424055 | 0.552486 | 1.000000 | 0.465094 |
co | -0.182893 | -0.128781 | 0.185937 | -0.189003 | 0.230381 | -0.225157 | 0.550983 | 0.550590 | 0.373651 | 0.585678 | 0.465094 | 1.000000 |
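Light-vehicle volume shows the clearest positive correlation with the air-quality indicators (strongest with SO2, r ≈ 0.387), followed by large-vehicle volume (also strongest with SO2, r ≈ 0.343). Both values are below 0.4, so these are weak correlations. All other traffic columns, including total volume, are negatively correlated with every pollutant to varying degrees (for long vehicles the SO2 coefficient is essentially zero, about -0.0005).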
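The vehicle-by-pollutant block of the table can also be read programmatically. The sketch below hard-codes that block from the output above (values copied, not recomputed) and reports, for each traffic column, the pollutant with the largest coefficient. On the real data the block would come from something like `df.corr().loc[vehicle_cols, pollutant_cols]`, where `vehicle_cols` and `pollutant_cols` are hypothetical lists of column names; note that in `df` the pollutant names carry a leading space, e.g. `' pm25'`.

```python
import pandas as pd

pollutants = ["pm25", "pm10", "o3", "no2", "so2", "co"]
# Vehicle-by-pollutant block of the correlation table above (values copied)
block = pd.DataFrame(
    [
        [-0.291996, -0.259647, -0.249167, -0.195712, -0.076936, -0.182893],
        [-0.219200, -0.208401, -0.184057, -0.138806, -0.000536, -0.128781],
        [0.137599, 0.086044, 0.172481, 0.161718, 0.342927, 0.185937],
        [-0.296563, -0.250422, -0.324601, -0.197463, -0.107954, -0.189003],
        [0.148665, 0.091512, 0.195345, 0.176855, 0.386891, 0.230381],
        [-0.331484, -0.287759, -0.283557, -0.234101, -0.141788, -0.225157],
    ],
    index=["stream", "long-car", "large-car", "middle-car", "light-car", "little-car"],
    columns=pollutants,
)

# Pollutant with the largest coefficient for each traffic column
summary = pd.DataFrame({"pollutant": block.idxmax(axis=1), "r": block.max(axis=1)})
print(summary.sort_values("r", ascending=False))
```

Run on these values, SO2 comes out on top for every traffic column, and only `light-car` and `large-car` have positive maxima.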
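Among the vehicle types, mini vehicles (little-car) correlate most strongly with total traffic volume (r ≈ 0.985), followed by long vehicles (long-car, r ≈ 0.960); both are at or above 0.96.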
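Ranking the `stream` row of the correlation table makes this ordering explicit; the values below are copied from the output above.

```python
import pandas as pd

# Correlation of each vehicle type with total traffic (stream row of the table above)
r_with_stream = pd.Series({
    "long-car": 0.960397,
    "large-car": 0.334564,
    "middle-car": 0.936301,
    "light-car": 0.290630,
    "little-car": 0.984934,
})
print(r_with_stream.sort_values(ascending=False))
```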
Pairwise relationship plots
sns.pairplot(df_normalized)
[Figure: pair plot of all standardized variables]
Correlation heatmap
sns.heatmap(df_normalized.corr())
[Figure: heatmap of the correlation matrix]
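An annotated heatmap with a fixed -1 to 1 color range is often easier to read than the default call. This is a sketch on a 3x3 excerpt of the correlation table above; `annot`, `fmt`, `mask`, `cmap`, `vmin`, `vmax` and `center` are standard `sns.heatmap` parameters, and hiding the duplicated upper triangle is a presentation choice, not part of the original analysis.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Small excerpt of the correlation table above, used as stand-in input
corr = pd.DataFrame(
    [[1.000000, 0.984934, -0.291996],
     [0.984934, 1.000000, -0.331484],
     [-0.291996, -0.331484, 1.000000]],
    index=["stream", "little-car", "pm25"],
    columns=["stream", "little-car", "pm25"],
)

# Hide the redundant upper triangle, including the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            cmap="coolwarm", vmin=-1, vmax=1, center=0, ax=ax)
fig.tight_layout()
```

Applied to the full matrix, the same call would take `df_normalized.corr()` in place of `corr`.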
Clustered correlation map
sns.clustermap(df_normalized.corr(), figsize=(7, 7))
[Figure: hierarchically clustered correlation matrix]
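`sns.clustermap` orders the variables by hierarchical clustering; its documented defaults are `method='average'` and `metric='euclidean'`, applied to the rows of the matrix it is given. The sketch below approximates that step with SciPy on a 7-variable excerpt of the correlation table (values copied from above). Cutting the tree into three clusters separates the total/mini-vehicle pair, the large/light-vehicle pair, and the pollutants; the exact leaf order may differ from seaborn's plot.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage

names = ["stream", "little-car", "large-car", "light-car", "pm25", "pm10", "so2"]
# 7x7 excerpt of the correlation table above
corr = pd.DataFrame(
    [
        [1.000000, 0.984934, 0.334564, 0.290630, -0.291996, -0.259647, -0.076936],
        [0.984934, 1.000000, 0.181474, 0.139703, -0.331484, -0.287759, -0.141788],
        [0.334564, 0.181474, 1.000000, 0.953798, 0.137599, 0.086044, 0.342927],
        [0.290630, 0.139703, 0.953798, 1.000000, 0.148665, 0.091512, 0.386891],
        [-0.291996, -0.331484, 0.137599, 0.148665, 1.000000, 0.722665, 0.473017],
        [-0.259647, -0.287759, 0.086044, 0.091512, 0.722665, 1.000000, 0.530307],
        [-0.076936, -0.141788, 0.342927, 0.386891, 0.473017, 0.530307, 1.000000],
    ],
    index=names, columns=names,
)

# Each row is treated as a feature vector; average linkage on Euclidean distance
Z = linkage(corr.values, method="average", metric="euclidean")
order = [names[i] for i in leaves_list(Z)]  # leaf order of the dendrogram
groups = dict(zip(names, fcluster(Z, t=3, criterion="maxclust")))
print(order)
print(groups)
```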
Column names
print(df.columns)
out:
Index(['stream', 'long-car', 'large-car', 'middle-car', 'light-car',
'little-car', ' pm25', ' pm10', ' o3', ' no2', ' so2', ' co'],
dtype='object')
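Note the leading space in the pollutant names (`' pm25'`, `' so2'`, ...), which is why the plotting calls below must write `y=' pm25'`. One option is to strip the names once right after loading; a minimal sketch on a toy frame follows. If this were applied to `df`, the later calls would use `y='pm25'` and so on.

```python
import pandas as pd

# Toy frame whose pollutant headers carry a leading space, like the output above
df = pd.DataFrame(columns=["stream", "large-car", " pm25", " so2"])

df.columns = df.columns.str.strip()  # remove surrounding whitespace from every name
print(list(df.columns))  # ['stream', 'large-car', 'pm25', 'so2']
```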
Plots of the more strongly correlated pairs
Total traffic vs. PM2.5 (scatter)
sns.jointplot(x='stream',y=' pm25',data=df_normalized)
[Figure: joint plot of stream vs. pm25]
Total traffic vs. PM10 (scatter)
sns.jointplot(x='stream',y=' pm10',data=df_normalized)
[Figure: joint plot of stream vs. pm10]
Large-vehicle volume vs. SO2 concentration (scatter)
sns.jointplot(x='large-car',y=' so2',data=df_normalized)
[Figure: joint plot of large-car vs. so2]
Large-vehicle volume vs. CO concentration (scatter)
sns.jointplot(x='large-car',y=' co',data=df_normalized)
[Figure: joint plot of large-car vs. co]
Total traffic vs. mini-vehicle volume (hexbin)
sns.jointplot(x='stream',y='little-car',data=df_normalized,kind='hex')
[Figure: hexbin joint plot of stream vs. little-car]
Total traffic vs. large-vehicle volume (hexbin)
sns.jointplot(x='stream',y='large-car',data=df_normalized,kind='hex')
[Figure: hexbin joint plot of stream vs. large-car]
Large-vehicle volume vs. SO2 concentration (hexbin)
sns.jointplot(x='large-car',y=' so2',data=df_normalized,kind='hex')
[Figure: hexbin joint plot of large-car vs. so2]
Total traffic vs. PM10 concentration (hexbin)
sns.jointplot(x='stream',y=' pm10',data=df_normalized,kind='hex')
[Figure: hexbin joint plot of stream vs. pm10]
Total traffic vs. PM10 concentration (regression)
sns.jointplot(x='stream',y=' pm10',data=df_normalized,kind='reg')
[Figure: regression joint plot of stream vs. pm10]
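Because both axes are standardized, the least-squares line drawn by `kind='reg'` passes through the origin and its slope equals the Pearson coefficient, so in this plot it should be close to the stream/pm10 value of about -0.26 from the table. A quick check on synthetic data (made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = -0.3 * x + rng.normal(size=500)

# z-score both variables (population std, as StandardScaler does)
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

slope, intercept = np.polyfit(zx, zy, deg=1)  # ordinary least-squares line
r = np.corrcoef(x, y)[0, 1]
print(np.isclose(slope, r), np.isclose(intercept, 0.0))  # True True
```

The reason: the OLS slope is cov(zx, zy) / var(zx), and for z-scores the variance is 1 while the covariance equals r.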
Large-vehicle volume vs. total traffic (regression)
sns.jointplot(x='large-car',y='stream',data=df_normalized,kind='reg')
[Figure: regression joint plot of large-car vs. stream]
Light-vehicle volume vs. total traffic (regression)
sns.jointplot(x='light-car',y='stream',data=df_normalized,kind='reg')
[Figure: regression joint plot of light-car vs. stream]
Light-vehicle volume vs. SO2 concentration (regression)
sns.jointplot(x='light-car',y=' so2',data=df_normalized,kind='reg')
[Figure: regression joint plot of light-car vs. so2]
Large-vehicle volume vs. SO2 concentration (regression)
sns.jointplot(x='large-car',y=' so2',data=df_normalized,kind='reg')
[Figure: regression joint plot of large-car vs. so2]