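4.2 Visual Analysis of Data
4.2.1 Common Plotting Functions
matplotlib is Python's basic plotting package and serves as a general graphics framework. It provides a set of command-style APIs similar to MATLAB's, which makes it well suited to drawing basic statistical charts.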
# A few basic settings are needed before plotting
%config InlineBackend.figure_format='retina' # sharpen the rendered figures
4.2.1.1 Charts for Count Data
X=['A','B','C','D','E','F','G']
Y=[1,4,7,3,2,5,6]
- Bar chart
import matplotlib.pyplot as plt # load the basic plotting package
plt.bar(X,Y) # bar chart
<BarContainer object of 7 artists>
- Pie chart
plt.pie(Y,labels=X) # pie chart
([<matplotlib.patches.Wedge at 0x2ddde98f190>,
<matplotlib.patches.Wedge at 0x2ddde98f6d0>,
<matplotlib.patches.Wedge at 0x2ddde98fbb0>,
<matplotlib.patches.Wedge at 0x2ddde99c0d0>,
<matplotlib.patches.Wedge at 0x2ddde99c5b0>,
<matplotlib.patches.Wedge at 0x2ddde99ca90>,
<matplotlib.patches.Wedge at 0x2ddde99cf10>],
[Text(1.0930834302648262, 0.12316092919623813, 'A'),
Text(0.8600146100749843, 0.6858388079261577, 'B'),
Text(-0.3633070202274737, 1.038271645116746, 'C'),
Text(-1.0930834374717984, 0.12316086523257752, 'D'),
Text(-0.9910657227744873, -0.47727217930807914, 'E'),
Text(-0.3633068744124594, -1.038271696139623, 'F'),
Text(0.8600147063942706, -0.6858386871455828, 'G')])
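A pie chart is usually easier to read when each wedge shows its share of the total. The following is a minimal sketch, assuming the same X and Y as above; the `autopct` format string and the title text are illustrative choices rather than part of the original example.

```python
import matplotlib.pyplot as plt

X = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
Y = [1, 4, 7, 3, 2, 5, 6]

fig, ax = plt.subplots(figsize=(5, 5))
# autopct formats each wedge's share of the total; when it is set,
# pie() also returns the list of percentage labels.
wedges, texts, autotexts = ax.pie(Y, labels=X, autopct='%1.1f%%')
ax.set_title('Share of each category')  # illustrative title

# Collect the percentage strings, e.g. to compare them with the raw counts
shares = [t.get_text() for t in autotexts]
```

Assigning the return value to names also keeps the long list of Wedge and Text objects from being echoed as cell output.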
4.2.1.2 Charts for Measurement Data
- Line chart
plt.plot(X,Y)
[<matplotlib.lines.Line2D at 0x2ddde9dfb20>]
- Histogram
import pandas as pd
BSdata=pd.read_excel('data/BSdata.xlsx','Sheet1');BSdata # read the data
| Region/Country/Area | Unnamed: 1 | Year | Series | Value | Footnotes | Source |
---|---|---|---|---|---|---|---|
0 | 1 | Total, all countries or areas | 2010 | Population mid-year estimates (millions) | 6956.82 | NaN | United Nations Population Division, New York, ... |
1 | 1 | Total, all countries or areas | 2010 | Population mid-year estimates for males (milli... | 3507.70 | NaN | United Nations Population Division, New York, ... |
2 | 1 | Total, all countries or areas | 2010 | Population mid-year estimates for females (mil... | 3449.12 | NaN | United Nations Population Division, New York, ... |
3 | 1 | Total, all countries or areas | 2010 | Sex ratio (males per 100 females) | 101.70 | NaN | United Nations Population Division, New York, ... |
4 | 1 | Total, all countries or areas | 2010 | Population aged 0 to 14 years old (percentage) | 27.00 | NaN | United Nations Population Division, New York, ... |
5 | 1 | Total, all countries or areas | 2010 | Population aged 60+ years old (percentage) | 11.00 | NaN | United Nations Population Division, New York, ... |
6 | 1 | Total, all countries or areas | 2010 | Population density | 53.50 | NaN | United Nations Population Division, New York, ... |
7 | 1 | Total, all countries or areas | 2015 | Population mid-year estimates (millions) | 7379.80 | NaN | United Nations Population Division, New York, ... |
8 | 1 | Total, all countries or areas | 2015 | Population mid-year estimates for males (milli... | 3720.70 | NaN | United Nations Population Division, New York, ... |
9 | 1 | Total, all countries or areas | 2015 | Population mid-year estimates for females (mil... | 3659.10 | NaN | United Nations Population Division, New York, ... |
10 | 1 | Total, all countries or areas | 2015 | Sex ratio (males per 100 females) | 101.70 | NaN | United Nations Population Division, New York, ... |
11 | 1 | Total, all countries or areas | 2015 | Population aged 0 to 14 years old (percentage) | 26.20 | NaN | United Nations Population Division, New York, ... |
12 | 1 | Total, all countries or areas | 2015 | Population aged 60+ years old (percentage) | 12.20 | NaN | United Nations Population Division, New York, ... |
13 | 1 | Total, all countries or areas | 2015 | Population density | 56.70 | NaN | United Nations Population Division, New York, ... |
14 | 1 | Total, all countries or areas | 2015 | Surface area (thousand km2) | 136162.00 | NaN | United Nations Statistics Division, New York, ... |
15 | 1 | Total, all countries or areas | 2019 | Population mid-year estimates (millions) | 7713.47 | NaN | United Nations Population Division, New York, ... |
16 | 1 | Total, all countries or areas | 2019 | Population mid-year estimates for males (milli... | 3889.03 | NaN | United Nations Population Division, New York, ... |
17 | 1 | Total, all countries or areas | 2019 | Population mid-year estimates for females (mil... | 3824.43 | NaN | United Nations Population Division, New York, ... |
18 | 1 | Total, all countries or areas | 2019 | Sex ratio (males per 100 females) | 101.70 | NaN | United Nations Population Division, New York, ... |
19 | 1 | Total, all countries or areas | 2019 | Population aged 0 to 14 years old (percentage) | 25.60 | NaN | United Nations Population Division, New York, ... |
20 | 1 | Total, all countries or areas | 2019 | Population aged 60+ years old (percentage) | 13.20 | NaN | United Nations Population Division, New York, ... |
21 | 1 | Total, all countries or areas | 2019 | Population density | 59.30 | NaN | United Nations Population Division, New York, ... |
22 | 1 | Total, all countries or areas | 2019 | Surface area (thousand km2) | 130094.00 | NaN | United Nations Statistics Division, New York, ... |
23 | 1 | Total, all countries or areas | 2021 | Population mid-year estimates (millions) | 7874.97 | Projected estimate (medium fertility variant). | United Nations Population Division, New York, ... |
24 | 1 | Total, all countries or areas | 2021 | Population mid-year estimates for males (milli... | 3970.24 | Projected estimate (medium fertility variant). | United Nations Population Division, New York, ... |
plt.hist(BSdata['Year']) # frequency histogram; density=False by default
(array([7., 0., 0., 0., 8., 0., 0., 0., 8., 2.]),
array([2010. , 2011.1, 2012.2, 2013.3, 2014.4, 2015.5, 2016.6, 2017.7,
2018.8, 2019.9, 2021. ]),
<BarContainer object of 10 artists>)
plt.hist(BSdata.Series)
(array([4., 4., 3., 0., 3., 3., 0., 3., 3., 2.]),
array([0. , 0.7, 1.4, 2.1, 2.8, 3.5, 4.2, 4.9, 5.6, 6.3, 7. ]),
<BarContainer object of 10 artists>)
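To see what the `density` switch changes, the sketch below draws the same sample twice. The data are hypothetical random draws (not taken from BSdata), used only so that the example is self-contained. With `density=False` the bar heights are counts and sum to the sample size; with `density=True` the bar areas sum to 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample (not from BSdata): 200 draws from a standard normal
rng = np.random.default_rng(0)
values = rng.normal(loc=0.0, scale=1.0, size=200)

fig, ax = plt.subplots(1, 2, figsize=(10, 4))
counts, bins, _ = ax[0].hist(values, bins=10)                  # frequencies
dens, dbins, _ = ax[1].hist(values, bins=10, density=True)     # densities
ax[0].set_title('density=False')
ax[1].set_title('density=True')

total_count = counts.sum()                    # number of observations
total_area = (dens * np.diff(dbins)).sum()    # height * bin width, summed
```

Both panels use the same bin edges, so only the vertical scale differs between them.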
- Scatter plot (scatter)
plt.scatter(BSdata.Year,BSdata.Series)
<matplotlib.collections.PathCollection at 0x2dde10a7460>
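4.2.1.3 Setting Plot Parameters
Titles, labels, axis scales, and colors
plt.plot(X,Y,c='red') # c (color) sets the line color; c='red' draws it in red
plt.ylim(0,8) # plt.xlim, plt.ylim: set the x- and y-axis ranges
plt.xlabel('x');plt.ylabel('y'); # plt.xlabel, plt.ylabel: set the axis labels
plt.plot(X,Y,linestyle='--',marker='.')
# linestyle: controls the line style ('-': solid, '--': dashed, ':': dotted)
# marker: controls the marker type, e.g. 'o' draws filled circles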
[<matplotlib.lines.Line2D at 0x2dde117e4c0>]
plt.plot(X,Y,linestyle='-',marker='o')
[<matplotlib.lines.Line2D at 0x2dde11deb20>]
plt.plot(X,Y,linestyle=':',marker='o')
[<matplotlib.lines.Line2D at 0x2dde26b4100>]
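The three line styles can be compared side by side instead of in separate cells. This is a small sketch using the same X and Y; the particular style/marker pairs and the figure size are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

X = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
Y = [1, 4, 7, 3, 2, 5, 6]

# (linestyle, marker) pairs to compare; chosen only for illustration
styles = [('-', 'o'), ('--', '.'), (':', 's')]

fig, ax = plt.subplots(1, len(styles), figsize=(12, 3))
lines = []
for axis, (ls, mk) in zip(ax, styles):
    (line,) = axis.plot(X, Y, linestyle=ls, marker=mk, c='red')
    axis.set_title(f'linestyle={ls!r}, marker={mk!r}')
    axis.set_ylim(0, 8)   # same y range in every panel for a fair comparison
    lines.append(line)
```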
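Adding auxiliary elements to a plot
plt.plot(X,Y,'o--')
plt.axvline(x=1) # vertical line: plt.axvline draws a vertical line at the given x
plt.axhline(y=4) # horizontal line: plt.axhline draws a horizontal line at the given y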
<matplotlib.lines.Line2D at 0x2dde216a340>
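Reference lines are most useful when their positions come from the data rather than being typed in by hand. The sketch below, assuming the same X and Y, places the horizontal line at the mean of Y and the vertical line at the position of the largest value; the gray dotted styling is an arbitrary choice. Because X holds category labels, matplotlib places them at positions 0, 1, 2, ..., so a vertical line is positioned by that index.

```python
import matplotlib.pyplot as plt

X = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
Y = [1, 4, 7, 3, 2, 5, 6]

mean_y = sum(Y) / len(Y)       # average of the plotted values
peak_pos = Y.index(max(Y))     # 0-based position of the largest value

fig, ax = plt.subplots()
ax.plot(X, Y, 'o--')
hline = ax.axhline(y=mean_y, c='gray', linestyle=':')    # line at the mean
vline = ax.axvline(x=peak_pos, c='gray', linestyle=':')  # line through the peak
```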
Text function: text(x,y,labels,...) adds the text given by labels at the point (x,y)
plt.plot(X,Y);plt.text(2,7,'peak point');
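In the call above, the coordinates (2, 7) work because the category labels in X sit at positions 0, 1, 2, ..., so 'C' is at x=2. A possible variation, sketched here with the same X and Y, looks up the peak from the data and uses `annotate` to draw an arrow to it; the text offset and arrow style are illustrative assumptions.

```python
import matplotlib.pyplot as plt

X = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
Y = [1, 4, 7, 3, 2, 5, 6]

peak_y = max(Y)
peak_pos = Y.index(peak_y)   # 0-based position of the peak along the x axis

fig, ax = plt.subplots()
ax.plot(X, Y)
ax.set_ylim(0, 9)            # leave room above the peak for the label
note = ax.annotate(
    f'peak point ({X[peak_pos]}, {peak_y})',
    xy=(peak_pos, peak_y),                # the point being labeled
    xytext=(peak_pos + 1, peak_y + 1),    # where the text is placed
    arrowprops=dict(arrowstyle='->'),     # arrow from the text to the point
)
```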
Legend: after drawing a plot, the legend function can be used to add a legend to it
plt.plot(X,Y,label='line');plt.legend();
plt.plot(X,Y,'.',label='point');plt.legend()
<matplotlib.legend.Legend at 0x2dde2a9ba90>
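When several series share one set of axes, each `label` becomes one legend entry. The sketch below is only an illustration: `Y2` is a made-up second series, and the `loc` and `title` values are arbitrary.

```python
import matplotlib.pyplot as plt

X = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
Y = [1, 4, 7, 3, 2, 5, 6]
Y2 = [y / 2 for y in Y]   # hypothetical second series, for illustration only

fig, ax = plt.subplots()
ax.plot(X, Y, label='line')          # drawn as a connected line
ax.plot(X, Y2, '.', label='point')   # drawn as isolated points
leg = ax.legend(loc='upper left', title='series')

# The legend lists the labels in plotting order
labels = [t.get_text() for t in leg.get_texts()]
```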
Line plot with error bars
s=[0.1,0.4,0.7,0.3,0.2,0.5,0.6] # error values
plt.plot(X,Y);plt.errorbar(X,Y,yerr=s,fmt='o',capsize=4)
<ErrorbarContainer object of 3 artists>
Bar chart with error bars
plt.bar(X,Y,yerr=s,capsize=4) # capsize can also be passed as error_kw={'capsize': 4}
<BarContainer object of 7 artists>
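The error values above are typed in directly. In practice they are often computed from repeated measurements. The following sketch assumes hypothetical raw data (five simulated measurements per category) and uses the standard error of the mean as the error value; neither the data nor this choice of error measure comes from the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

X = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
Y = [1, 4, 7, 3, 2, 5, 6]

# Hypothetical raw data: 5 simulated measurements per category (one row each)
rng = np.random.default_rng(0)
samples = np.array(Y)[:, None] + rng.normal(0.0, 0.5, size=(len(Y), 5))

means = samples.mean(axis=1)
# Standard error of the mean: sample standard deviation / sqrt(n)
sem = samples.std(axis=1, ddof=1) / np.sqrt(samples.shape[1])

fig, ax = plt.subplots()
bars = ax.bar(X, means, yerr=sem, capsize=4)   # bar heights with error bars
```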
4.2.1.4 Arranging and Drawing Multiple Plots
- In matplotlib, one Figure object can contain several subplots (Axes); there are two ways to create them
- subplot(numRows,numCols,plotNum)
- fig,ax=plt.subplots(numRows,numCols,figsize=(width,height))
Two plots in one row
plt.subplot(121);plt.bar(X,Y);
plt.subplot(122);plt.plot(Y);
Two plots in one column
plt.subplot(211);plt.bar(X,Y);
plt.subplot(212);plt.plot(Y);
Two plots sized to fit the page
fig,ax=plt.subplots(1,2,figsize=(10,4))
ax[0].bar(X,Y);ax[1].plot(X,Y);
Four plots on one page
fig,ax=plt.subplots(2,2,figsize=(10,8))
ax[0,0].bar(X,Y);
ax[0,1].pie(Y,labels=X);
ax[1,0].plot(Y);
ax[1,1].plot(Y,'.-',linewidth=3);
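With a 2x2 grid, `plt.subplots` returns the Axes as a two-dimensional NumPy array, which can be flattened to apply the same setting to every panel. The sketch below repeats the four plots and adds a title to each; the title strings and the call to `tight_layout` are additions for illustration.

```python
import matplotlib.pyplot as plt

X = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
Y = [1, 4, 7, 3, 2, 5, 6]

fig, ax = plt.subplots(2, 2, figsize=(10, 8))
ax[0, 0].bar(X, Y)
ax[0, 1].pie(Y, labels=X)
ax[1, 0].plot(Y)
ax[1, 1].plot(Y, '.-', linewidth=3)

# Illustrative titles, one per panel in row-major order
titles = ['bar', 'pie', 'line', 'line with markers']
for axis, title in zip(ax.flat, titles):   # ax.flat iterates over all four Axes
    axis.set_title(title)

fig.tight_layout()   # adjust spacing so titles do not overlap neighboring panels
```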