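Correlation Analysis of Air Quality and Traffic Flow

Correlation analysis between air quality indicators and the corresponding traffic flow indicators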

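Data preprocessing

1. Days on which the air quality data, the traffic flow data, or both were entirely missing make up less than 5% of the full dataset. These rows were dropped, on the assumption that this does not affect accuracy;

2. The remaining data were joined on date, keeping only the dates present in both tables (2017/3/23-2023/6/25), 1789 days in total (some days are not consecutive);

3. Some air quality indicators are partially missing. Since the table contains no 0 values, empty values are assumed to mean that the corresponding pollutant was not detected, so they are set to 0.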
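
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the original preprocessing script (which is not shown in this post): the frames `air` and `traffic`, their toy values, and the reduced column set are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables; the real source files are not shown in this post.
air = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-23", "2017-03-24", "2017-03-25", "2017-03-26"]),
    "pm25": [123.0, np.nan, 88.0, np.nan],
    "so2": [4.0, np.nan, np.nan, 6.0],
})
traffic = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-23", "2017-03-24", "2017-03-25", "2017-03-27"]),
    "stream": [6245.0, 18504.0, 16541.0, 13164.0],
})

# Step 1: drop days on which every measurement in a table is missing.
air = air.dropna(how="all", subset=["pm25", "so2"])
traffic = traffic.dropna(how="all", subset=["stream"])

# Step 2: an inner join keeps only the dates present in both tables.
merged = traffic.merge(air, on="date", how="inner")

# Step 3: remaining gaps in pollutant columns are treated as "not detected" (0).
merged[["pm25", "so2"]] = merged[["pm25", "so2"]].fillna(0)

print(merged)
```

Filling with 0 pulls the affected values far below the typical range, which can shift the correlations. If that is a concern, one alternative is to leave the gaps as NaN, since `DataFrame.corr` excludes missing values pair by pair.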

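Goal

Obtain the correlation between total vehicle count and air quality, and between large-vehicle count and air quality.

Code implementation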

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import chardet  # detects the character encoding of the file
from sklearn.preprocessing import StandardScaler  # for standardization

sns.set(color_codes=True)  # remap matplotlib shorthand color codes to the seaborn palette

# Read the raw bytes and let chardet guess the file encoding
with open('last.csv', 'rb') as f:
    content = f.read()
    encoding = chardet.detect(content)['encoding']

print(encoding)

out:
UTF-8-SIG
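
The `UTF-8-SIG` result means the file starts with a byte order mark (BOM). A small standard-library illustration of why the choice of codec matters; the sample bytes below are made up and only stand in for the start of a CSV file:

```python
import codecs

# Made-up file content: a UTF-8 BOM followed by a CSV header line.
raw = codecs.BOM_UTF8 + "date,stream\n".encode("utf-8")

plain = raw.decode("utf-8")         # the BOM survives as the character U+FEFF
stripped = raw.decode("utf-8-sig")  # the BOM is removed during decoding

print(repr(plain.split(",")[0]))     # '\ufeffdate'
print(repr(stripped.split(",")[0]))  # 'date'
```

With a plain `utf-8` decode, the first field name carries an invisible extra character, so a comparison such as `column != 'date'` could silently fail to match.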


# Read the CSV file with the detected encoding (UTF-8-SIG), skipping the 'date' column
df = pd.read_csv('last.csv', encoding='UTF-8-SIG', usecols=lambda column: column != 'date')
df.head()

out:

stream long-car large-car middle-car light-car little-car pm25 pm10 o3 no2 so2 co
0 6245 601 218 347 389 4690 123 43 71 23 4 8
1 18504 2401 932 1612 1339 12220 87 45 34 30 4 9
2 16541 2528 1047 1808 1504 9654 88 44 36 21 4 8
3 13164 2194 876 1410 1255 7429 77 57 60 29 6 8
4 8533 1490 559 973 821 4690 104 67 73 28 6 7

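Column meanings, in order: stream = total traffic flow, long-car = long-vehicle flow, large-car = large-vehicle flow, middle-car = medium-vehicle flow, light-car = light-vehicle flow, little-car = mini-vehicle flow.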

# Initialize the scaler
scaler = StandardScaler()
# Standardize every column (zero mean, unit variance)
df_normalized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
df_normalized.head()

out:

stream long-car large-car middle-car light-car little-car pm25 pm10 o3 no2 so2 co
0 -0.315919 -0.472971 -0.011644 -0.676805 0.263679 -0.254230 1.058263 -0.109127 1.027142 0.474434 0.130986 1.102301
1 1.028897 0.594040 1.689531 0.981990 2.363031 0.885089 0.006061 0.001327 -0.141546 1.312421 0.130986 1.693079
2 0.813555 0.669323 1.963529 1.239005 2.727656 0.496843 0.035289 -0.053900 -0.078374 0.235009 0.130986 1.102301
3 0.443097 0.471333 1.556105 0.717108 2.177405 0.160192 -0.286217 0.664057 0.679694 1.192709 1.132417 1.102301
4 -0.064925 0.054014 0.800822 0.144069 1.218332 -0.254230 0.502934 1.216331 1.090314 1.072996 1.132417 0.511523
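
`StandardScaler` centers each column on its mean and divides by its population standard deviation (`ddof=0`). A minimal pandas-only sketch of the same per-column transform, using the first five `stream` and `pm25` values shown earlier as toy input (so the numbers will not match the full-data output above):

```python
import pandas as pd

# First five rows of two columns, copied from the df.head() output.
toy = pd.DataFrame({
    "stream": [6245, 18504, 16541, 13164, 8533],
    "pm25": [123, 87, 88, 77, 104],
})

# z = (x - mean) / std, using the population standard deviation (ddof=0),
# which is the convention StandardScaler follows.
z = (toy - toy.mean()) / toy.std(ddof=0)

print(z.round(3))
```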

df_normalized.corr()

out:

stream long-car large-car middle-car light-car little-car pm25 pm10 o3 no2 so2 co
stream 1.000000 0.960397 0.334564 0.936301 0.290630 0.984934 -0.291996 -0.259647 -0.249167 -0.195712 -0.076936 -0.182893
long-car 0.960397 1.000000 0.468035 0.893379 0.401189 0.909110 -0.219200 -0.208401 -0.184057 -0.138806 -0.000536 -0.128781
large-car 0.334564 0.468035 1.000000 0.274743 0.953798 0.181474 0.137599 0.086044 0.172481 0.161718 0.342927 0.185937
middle-car 0.936301 0.893379 0.274743 1.000000 0.257476 0.912902 -0.296563 -0.250422 -0.324601 -0.197463 -0.107954 -0.189003
light-car 0.290630 0.401189 0.953798 0.257476 1.000000 0.139703 0.148665 0.091512 0.195345 0.176855 0.386891 0.230381
little-car 0.984934 0.909110 0.181474 0.912902 0.139703 1.000000 -0.331484 -0.287759 -0.283557 -0.234101 -0.141788 -0.225157
pm25 -0.291996 -0.219200 0.137599 -0.296563 0.148665 -0.331484 1.000000 0.722665 0.422200 0.561810 0.473017 0.550983
pm10 -0.259647 -0.208401 0.086044 -0.250422 0.091512 -0.287759 0.722665 1.000000 0.523732 0.736929 0.530307 0.550590
o3 -0.249167 -0.184057 0.172481 -0.324601 0.195345 -0.283557 0.422200 0.523732 1.000000 0.417945 0.424055 0.373651
no2 -0.195712 -0.138806 0.161718 -0.197463 0.176855 -0.234101 0.561810 0.736929 0.417945 1.000000 0.552486 0.585678
so2 -0.076936 -0.000536 0.342927 -0.107954 0.386891 -0.141788 0.473017 0.530307 0.424055 0.552486 1.000000 0.465094
co -0.182893 -0.128781 0.185937 -0.189003 0.230381 -0.225157 0.550983 0.550590 0.373651 0.585678 0.465094 1.000000

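  • Later finding: whether or not the data are standardized makes no difference to the correlation coefficients.

Correlation coefficient table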
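
This is expected: the Pearson coefficient is unchanged by a positive linear rescaling of either variable, and standardization is exactly such a rescaling. A quick check on synthetic data (the random series below are made up and unrelated to `last.csv`):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)
raw = pd.DataFrame({
    "stream": 10000 + 3000 * x,
    "pm25": 80 - 10 * x + rng.normal(scale=20, size=200),
})

# Same transform as StandardScaler: subtract the mean, divide by the std.
standardized = (raw - raw.mean()) / raw.std(ddof=0)

# The two correlation matrices agree up to floating-point error.
print(np.allclose(raw.corr(), standardized.corr()))
```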

df.corr()

out:

stream long-car large-car middle-car light-car little-car pm25 pm10 o3 no2 so2 co
stream 1.000000 0.960397 0.334564 0.936301 0.290630 0.984934 -0.291996 -0.259647 -0.249167 -0.195712 -0.076936 -0.182893
long-car 0.960397 1.000000 0.468035 0.893379 0.401189 0.909110 -0.219200 -0.208401 -0.184057 -0.138806 -0.000536 -0.128781
large-car 0.334564 0.468035 1.000000 0.274743 0.953798 0.181474 0.137599 0.086044 0.172481 0.161718 0.342927 0.185937
middle-car 0.936301 0.893379 0.274743 1.000000 0.257476 0.912902 -0.296563 -0.250422 -0.324601 -0.197463 -0.107954 -0.189003
light-car 0.290630 0.401189 0.953798 0.257476 1.000000 0.139703 0.148665 0.091512 0.195345 0.176855 0.386891 0.230381
little-car 0.984934 0.909110 0.181474 0.912902 0.139703 1.000000 -0.331484 -0.287759 -0.283557 -0.234101 -0.141788 -0.225157
pm25 -0.291996 -0.219200 0.137599 -0.296563 0.148665 -0.331484 1.000000 0.722665 0.422200 0.561810 0.473017 0.550983
pm10 -0.259647 -0.208401 0.086044 -0.250422 0.091512 -0.287759 0.722665 1.000000 0.523732 0.736929 0.530307 0.550590
o3 -0.249167 -0.184057 0.172481 -0.324601 0.195345 -0.283557 0.422200 0.523732 1.000000 0.417945 0.424055 0.373651
no2 -0.195712 -0.138806 0.161718 -0.197463 0.176855 -0.234101 0.561810 0.736929 0.417945 1.000000 0.552486 0.585678
so2 -0.076936 -0.000536 0.342927 -0.107954 0.386891 -0.141788 0.473017 0.530307 0.424055 0.552486 1.000000 0.465094
co -0.182893 -0.128781 0.185937 -0.189003 0.230381 -0.225157 0.550983 0.550590 0.373651 0.585678 0.465094 1.000000

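Light vehicles show the clearest positive correlation with the air quality indicators (the highest is SO2 vs. light-vehicle flow, r ≈ 0.387), followed by large vehicles (the highest is SO2 vs. large-vehicle flow, r ≈ 0.343). All other vehicle categories, and total flow, are negatively correlated with the air quality indicators to varying degrees, although the long-vehicle vs. SO2 coefficient (-0.0005) is effectively zero.

Among the vehicle categories, mini vehicles (little-car) correlate most strongly with total traffic flow (r ≈ 0.985), followed by long vehicles (long-car, r ≈ 0.960); both are above 0.96. Large vehicles (large-car), by contrast, correlate only weakly with total flow (r ≈ 0.335).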
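
One way to read such findings off the matrix rather than by eye is to flatten the vehicle-by-pollutant block and sort it. The sketch below re-enters a few coefficients from the table above by hand (four vehicle columns and two pollutants only, to keep it short); on the full data the same steps would be applied to the matching slice of `df.corr()`.

```python
import pandas as pd

# Coefficients copied from the correlation table above (subset only).
block = pd.DataFrame(
    {
        "pm25": [-0.291996, 0.137599, 0.148665, -0.331484],
        "so2": [-0.076936, 0.342927, 0.386891, -0.141788],
    },
    index=["stream", "large-car", "light-car", "little-car"],
)

# Flatten to (pollutant, vehicle) pairs and sort by coefficient.
pairs = block.unstack().sort_values(ascending=False)

print(pairs.head(2))   # the two strongest positive pairs
print(pairs.idxmax())  # ('so2', 'light-car')
print(pairs.idxmin())  # ('pm25', 'little-car')
```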

Pairwise relationship plots

sns.pairplot(df_normalized)

out:
<seaborn.axisgrid.PairGrid at 0x26837f3c190>

[Figure: pair plot of all standardized columns]

Correlation coefficient heatmap

sns.heatmap(df_normalized.corr())

out:
<Axes: >

[Figure: heatmap of the correlation matrix]


Correlation coefficient cluster map

sns.clustermap(df_normalized.corr(), figsize=(7, 7))

out:
<seaborn.matrix.ClusterGrid at 0x2685052ffd0>

[Figure: hierarchically clustered heatmap of the correlation matrix]


Column names

print(df.columns)

out:
Index(['stream', 'long-car', 'large-car', 'middle-car', 'light-car',
'little-car', ' pm25', ' pm10', ' o3', ' no2', ' so2', ' co'],
dtype='object')
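
Note the leading space in the pollutant names (`' pm25'`, `' so2'`, and so on), which is why the plotting calls in this post pass names such as `y=' pm25'` with a space. If that is inconvenient, the names could be normalized once after loading, in which case later calls would use `'pm25'` instead. A minimal sketch on a stand-in frame with the same column names (the values are made up):

```python
import pandas as pd

# Stand-in frame with the same column names as printed above.
demo = pd.DataFrame(
    [[0] * 12],
    columns=['stream', 'long-car', 'large-car', 'middle-car', 'light-car',
             'little-car', ' pm25', ' pm10', ' o3', ' no2', ' so2', ' co'],
)

# Strip surrounding whitespace from every column name.
demo.columns = demo.columns.str.strip()

print(list(demo.columns))
```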


Plots of the more strongly correlated relationships

Traffic flow vs. PM2.5

sns.jointplot(x='stream',y=' pm25',data=df_normalized)

out:
<seaborn.axisgrid.JointGrid at 0x268546bcd00>

[Figure: joint plot of stream vs. pm25]


Traffic flow vs. PM10

sns.jointplot(x='stream',y=' pm10',data=df_normalized)

out:
<seaborn.axisgrid.JointGrid at 0x26847b7d160>

[Figure: joint plot of stream vs. pm10]


Large-vehicle flow vs. SO2 concentration

sns.jointplot(x='large-car',y=' so2',data=df_normalized)

out:
<seaborn.axisgrid.JointGrid at 0x268458aa040>

[Figure: joint plot of large-car vs. so2]


Large-vehicle flow vs. CO concentration

sns.jointplot(x='large-car',y=' co',data=df_normalized)

out:
<seaborn.axisgrid.JointGrid at 0x268485f2c40>

[Figure: joint plot of large-car vs. co]


Traffic flow vs. mini-vehicle flow (little-car), hexbin

sns.jointplot(x='stream',y='little-car',data=df_normalized,kind='hex')

out:
<seaborn.axisgrid.JointGrid at 0x26848ac0100>

[Figure: hexbin joint plot of stream vs. little-car]


Traffic flow vs. large-vehicle flow, hexbin

sns.jointplot(x='stream',y='large-car',data=df_normalized,kind='hex')

out:
<seaborn.axisgrid.JointGrid at 0x26848ce2250>

[Figure: hexbin joint plot of stream vs. large-car]


Large-vehicle flow vs. SO2 concentration, hexbin

sns.jointplot(x='large-car',y=' so2',data=df_normalized,kind='hex')

out:
<seaborn.axisgrid.JointGrid at 0x268493f6730>

[Figure: hexbin joint plot of large-car vs. so2]


Traffic flow vs. PM10 concentration, hexbin

sns.jointplot(x='stream',y=' pm10',data=df_normalized,kind='hex')

out:
<seaborn.axisgrid.JointGrid at 0x268498d8a30>

[Figure: hexbin joint plot of stream vs. pm10]


Traffic flow vs. PM10 concentration, regression fit

sns.jointplot(x='stream',y=' pm10',data=df_normalized,kind='reg')

out:
<seaborn.axisgrid.JointGrid at 0x26849b37430>

[Figure: joint plot with regression line, stream vs. pm10]


Traffic flow vs. large-vehicle flow, regression fit

sns.jointplot(x='large-car',y='stream',data=df_normalized,kind='reg')

out:
<seaborn.axisgrid.JointGrid at 0x26849f8a2b0>

[Figure: joint plot with regression line, large-car vs. stream]


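Traffic flow vs. light-vehicle flow, regression fit

sns.jointplot(x='light-car',y='stream',data=df_normalized,kind='reg')

out:
<seaborn.axisgrid.JointGrid at 0x2684ebc6af0>

[Figure: joint plot with regression line, light-car vs. stream]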


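Light-vehicle flow vs. SO2 concentration, regression fit

sns.jointplot(x='light-car',y=' so2',data=df_normalized,kind='reg')

out:
<seaborn.axisgrid.JointGrid at 0x2684df7e040>

[Figure: joint plot with regression line, light-car vs. so2]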


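Large-vehicle flow vs. SO2 concentration, regression fit

sns.jointplot(x='large-car',y=' so2',data=df_normalized,kind='reg')

out:
<seaborn.axisgrid.JointGrid at 0x2684e46f1c0>

[Figure: joint plot with regression line, large-car vs. so2]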
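
With `kind='reg'`, seaborn overlays a fitted linear regression line. Because both axes are standardized in these plots, the slope of a least-squares line equals the Pearson coefficient, which gives a quick sanity check against the correlation table. A sketch on synthetic data (the series are made up, not taken from `last.csv`):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.4 * x + rng.normal(size=500)

# Standardize both variables (population std, as StandardScaler does).
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

# Least-squares line through the standardized points.
slope, intercept = np.polyfit(zx, zy, deg=1)
r = np.corrcoef(x, y)[0, 1]

print(np.isclose(slope, r), np.isclose(intercept, 0.0))
```

So a line with slope near 0.34 in the large-car vs. so2 plot would be consistent with the coefficient reported in the table.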


posted @ 2024-05-25 22:01  岁月月宝贝