
5-3 Seaborn Visualization Library: Plotting for Variable Analysis

 

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
from scipy import stats,integrate
import matplotlib.pyplot as plt

import seaborn as sns
sns.set(color_codes=True)
# Fix the random seed so the generated Gaussian data is reproducible
np.random.seed(sum(map(ord,"distributions")))
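The seed expression turns the string "distributions" into an integer by summing the code points of its characters, so every run draws the same "random" numbers. Below is a small sketch of that idea. It uses NumPy's newer `np.random.default_rng` Generator instead of the legacy global `np.random.seed` from the cell above, so the actual numbers differ from the notebook's. Only the reproducibility carries over.

```python
import numpy as np

# Sum of the Unicode code points of "distributions" (the seed used above)
seed = sum(map(ord, "distributions"))

# Two generators built from the same seed produce identical draws
first = np.random.default_rng(seed).normal(size=5)
second = np.random.default_rng(seed).normal(size=5)
print(seed, np.array_equal(first, second))  # prints: 1427 True
```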
 

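1. Plotting the features of simple univariate data

  • Histogram: shows how many data points fall into each bin (sub-interval) of the data range
  • kde: kernel density estimate, a smooth curve that estimates the probability density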
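A kernel density estimate places a small smooth bump (a kernel) on every data point and averages the bumps into one curve. The following is a minimal NumPy sketch that assumes a Gaussian kernel and Scott's rule for the bandwidth, which is a common default. `gaussian_kde_1d` is a hypothetical helper name, not a seaborn API. SciPy's real equivalent is `scipy.stats.gaussian_kde`.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `data` at each point of `grid`."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = data.size
    if bandwidth is None:
        # Scott's rule in 1D: h = sample standard deviation * n^(-1/5)
        bandwidth = data.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    u = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to 1
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
grid = np.linspace(-6, 6, 1201)
density = gaussian_kde_1d(x, grid)

# Riemann-sum check that the estimated density has total area close to 1
area = density.sum() * (grid[1] - grid[0])
```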
In [2]:
x=np.random.normal(size=100)
# Draw a histogram; the bins are chosen automatically
sns.distplot(x,kde=False)
 
Out[2]:
<matplotlib.axes._subplots.AxesSubplot at 0x98fb978>
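When `bins` is not given, distplot picks the bin count automatically. To our knowledge, seaborn versions of that era used the Freedman-Diaconis rule (bin width = 2 * IQR / n^(1/3)) and capped the result at 50 bins. The sketch below applies that rule with NumPy. `freedman_diaconis_bins` is a hypothetical helper, and seaborn's exact code may differ.

```python
import numpy as np

def freedman_diaconis_bins(a, max_bins=50):
    """Number of histogram bins from the Freedman-Diaconis rule, capped at max_bins."""
    a = np.asarray(a, dtype=float)
    if a.size < 2:
        return 1
    q75, q25 = np.percentile(a, [75, 25])
    width = 2 * (q75 - q25) / a.size ** (1 / 3)
    if width == 0:
        # Zero IQR: fall back to sqrt(n) bins
        return int(np.sqrt(a.size))
    return min(int(np.ceil((a.max() - a.min()) / width)), max_bins)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
n_auto = freedman_diaconis_bins(x)
counts_auto, edges_auto = np.histogram(x, bins=n_auto)
```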
 
 

Pass bins to distplot() to change the number of bins. Here the data are split into 30 bins.

In [3]:
sns.distplot(x,bins=30,kde=False)
 
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x9e0d2b0>
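With `kde=False` the bars show raw counts per bin. Normalizing the same 30 bins as a density, which is what matplotlib's `density=True` does, divides each count by (n * bin width) so that the bar areas sum to 1. The NumPy check below illustrates the relationship and assumes equal-width bins.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)

# Raw counts: how many points fall into each of the 30 bins
counts, edges = np.histogram(x, bins=30)

# Density: counts / (n * bin width), so the total bar area is 1
density, _ = np.histogram(x, bins=30, density=True)
widths = np.diff(edges)
total_area = np.sum(density * widths)
```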
 
 

2. Examining how the data are distributed

In [4]:
x=np.random.gamma(6,size=200)
sns.distplot(x,kde=False,fit=stats.gamma)  # fit=stats.gamma fits a gamma distribution to x and overlays its density curve
 
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0xae8ab00>
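As we understand it, `fit=stats.gamma` makes distplot call SciPy's `stats.gamma.fit` on the data, which is a maximum-likelihood fit that also estimates a location parameter. It then draws `stats.gamma.pdf` with those parameters. The sketch below swaps in a simpler technique: a method-of-moments fit with location fixed at 0 (shape = mean^2 / variance, scale = variance / mean). It uses only NumPy and the standard library. `gamma_moments_fit` and `gamma_pdf` are hypothetical helpers.

```python
import math
import numpy as np

def gamma_moments_fit(x):
    """Method-of-moments estimates (shape, scale) of a gamma distribution with loc = 0."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = x.var(ddof=1)
    return mean**2 / var, var / mean

def gamma_pdf(t, shape, scale):
    """Gamma density t^(k-1) e^(-t/theta) / (Gamma(k) theta^k), evaluated in log space for stability."""
    t = np.asarray(t, dtype=float)
    log_pdf = ((shape - 1) * np.log(t) - t / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return np.exp(log_pdf)

rng = np.random.default_rng(0)
x = rng.gamma(6, size=200)  # shape 6, scale 1, like np.random.gamma(6, size=200) above
shape, scale = gamma_moments_fit(x)

# Fitted curve on a grid, ready to overlay on a density-normalized histogram
grid = np.linspace(0.01, x.max(), 200)
pdf = gamma_pdf(grid, shape, scale)
```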
 
 

3. Generating data from a mean vector and a covariance matrix

In [5]:
mean,cov=[0,1],[(1,.5),(.5,1)]
data=np.random.multivariate_normal(mean,cov,200)
df=pd.DataFrame(data,columns=["x","y"])
df
Out[5]:
 
     x         y
0 2.190873 2.902961
1 0.387901 3.441322
2 -1.304909 0.586173
3 -0.016867 0.907323
4 0.284953 1.189304
5 -0.050474 0.670980
6 0.722333 1.062931
7 -0.026326 1.294782
8 -0.788587 0.669541
9 -0.372764 1.731517
10 0.793945 0.844329
11 -1.587542 -0.325003
12 0.982330 -0.079164
13 -0.709190 0.617583
14 -0.320185 1.700419
15 -1.107602 1.969576
16 -0.152153 0.863231
17 0.672152 -0.337424
18 -0.054459 1.291490
19 -0.854301 0.461832
20 -1.467110 0.988230
21 0.769059 0.464059
22 0.864182 2.160841
23 -0.320895 -0.682581
24 0.201675 0.767145
25 0.910064 0.352476
26 -0.203879 2.281753
27 -1.968103 0.814249
28 -0.312965 1.835252
29 -1.017516 2.107019
... ... ...
170 -0.820467 2.150415
171 1.987218 2.863377
172 0.541367 1.672410
173 -0.230476 1.188198
174 0.654961 3.311254
175 -0.393180 -0.064882
176 -0.466270 -0.311687
177 -1.669818 -0.640678
178 -0.010700 1.530689
179 -0.726582 0.929317
180 2.601033 1.901285
181 -0.035434 2.095059
182 -1.025942 0.567045
183 0.029807 -0.504842
184 -0.469849 0.985867
185 -0.759971 0.572691
186 -1.028649 0.214142
187 -0.875858 0.196325
188 -0.473615 0.036407
189 0.736970 2.111486
190 -0.739024 0.271240
191 -0.278210 -0.210885
192 0.073279 2.083343
193 -0.302893 -0.749108
194 1.776171 2.567845
195 -0.804338 0.139381
196 1.674393 2.735944
197 -1.237634 0.002766
198 -1.044683 0.482758
199 -0.890160 0.042753

200 rows × 2 columns
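`multivariate_normal(mean, cov, 200)` returns 200 rows whose sample mean and covariance approximate `[0, 1]` and `[[1, .5], [.5, 1]]`. One standard way to build such samples is to factor the covariance as cov = L @ L.T (Cholesky) and transform independent standard normals. The sketch below shows that construction. `sample_mvn` is a hypothetical helper, and NumPy's own implementation may use a different factorization (for example, SVD).

```python
import numpy as np

def sample_mvn(mean, cov, n, rng):
    """Draw n samples from N(mean, cov) using a Cholesky factor of cov."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    L = np.linalg.cholesky(cov)              # lower-triangular L with cov = L @ L.T
    z = rng.standard_normal((n, mean.size))  # independent N(0, 1) values
    # Each sample is mean + L @ z_i; for all rows at once that is z @ L.T
    return mean + z @ L.T

rng = np.random.default_rng(0)
mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = sample_mvn(mean, cov, 200, rng)

# With 200 rows these are close to, but not exactly, mean and cov
sample_mean = data.mean(axis=0)
sample_cov = np.cov(data, rowvar=False)
```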

 

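4. To see how two variables are distributed together, a scatter plot works best (jointplot also draws each variable's histogram in the margins)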

In [6]:
sns.jointplot(x="x",y="y",data=df)
 
Out[6]:
<seaborn.axisgrid.JointGrid at 0xae3ae80>
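The tilt of the scatter in jointplot reflects how strongly x and y vary together. With the covariance used in section 3, the theoretical correlation is 0.5 / sqrt(1 * 1) = 0.5. The sketch below computes Pearson's r by hand and checks it against `np.corrcoef`. `pearson_r` is a hypothetical helper.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

rng = np.random.default_rng(0)
mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = rng.multivariate_normal(mean, cov, 200)
r = pearson_r(data[:, 0], data[:, 1])
r_numpy = np.corrcoef(data[:, 0], data[:, 1])[0, 1]
```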
 
 

5. With large amounts of data, use color intensity to judge the density of each region

  • kind: the plot type, one of { "scatter" | "reg" | "resid" | "kde" | "hex" }
In [7]:
x,y=np.random.multivariate_normal(mean,cov,2000).T
with sns.axes_style("white"):  # apply the "white" style only inside this block
    sns.jointplot(x=x,y=y,kind="hex",color="k")
 
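kind="hex" counts how many points land in each hexagonal cell and shades each cell by that count, so darker cells mean denser regions. NumPy has no hexagonal binning, so the sketch below swaps in rectangular cells with `np.histogram2d` to show the same idea of counting points per 2D cell. matplotlib's `Axes.hexbin` provides the hexagonal version.

```python
import numpy as np

rng = np.random.default_rng(0)
mean, cov = [0, 1], [(1, .5), (.5, 1)]
x, y = rng.multivariate_normal(mean, cov, 2000).T

# Count the points in a 20 x 20 grid of rectangular cells
counts, x_edges, y_edges = np.histogram2d(x, y, bins=20)

# The densest cell is the one a hex/heat-map plot would shade darkest
i, j = np.unravel_index(np.argmax(counts), counts.shape)
densest_center = ((x_edges[i] + x_edges[i + 1]) / 2,
                  (y_edges[j] + y_edges[j + 1]) / 2)
```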
 
In [8]:
sns.jointplot(x=x,y=y,kind="kde",color="k")
Out[8]:
<seaborn.axisgrid.JointGrid at 0xb0880f0>
 
In [9]:
sns.jointplot(x=x,y=y,kind="reg",color="k")
 
Out[9]:
<seaborn.axisgrid.JointGrid at 0xb3115f8>
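kind="reg" adds a fitted straight line (with a confidence band) to the scatter. For the mean and covariance used here, the true least-squares slope is cov(x, y) / var(x) = 0.5 / 1 = 0.5, and the intercept is mean_y - slope * mean_x = 1 - 0.5 * 0 = 1. The sketch below uses `np.polyfit` with degree 1 as an ordinary least-squares stand-in. Seaborn's own fitting and confidence-band code is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
mean, cov = [0, 1], [(1, .5), (.5, 1)]
x, y = rng.multivariate_normal(mean, cov, 2000).T

# Least-squares line y ~ slope * x + intercept (highest degree first)
slope, intercept = np.polyfit(x, y, deg=1)

# The same slope from the closed form sample cov(x, y) / sample var(x)
slope_closed = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
```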
 
 

6. Plotting pairwise relationships between variables with the iris dataset

In [10]:
iris=sns.load_dataset("iris")
sns.pairplot(iris)
Out[10]:
<seaborn.axisgrid.PairGrid at 0xb3eeeb8>
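pairplot builds a grid with one row and one column per numeric variable. Each diagonal cell shows that variable's own distribution (a histogram when no `hue` is given), and each off-diagonal cell is a scatter plot of one variable against another. For iris this means the four numeric measurement columns, with the text `species` column left out, giving a 4 x 4 grid. The plain-Python sketch below lists that layout. `pairplot_layout` is a hypothetical helper, and the column names are assumed to be those of seaborn's iris dataset.

```python
from itertools import product

def pairplot_layout(columns):
    """Return (row_var, col_var, plot_kind) for every cell of a pair-plot grid.

    row_var goes on the cell's y-axis and col_var on its x-axis.
    """
    cells = []
    for row_var, col_var in product(columns, repeat=2):
        kind = "hist" if row_var == col_var else "scatter"
        cells.append((row_var, col_var, kind))
    return cells

iris_numeric = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
layout = pairplot_layout(iris_numeric)
n_scatter = sum(1 for *_, kind in layout if kind == "scatter")
```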
 
posted @ 2019-10-30 20:37  karina512