Descriptive Analysis of Data
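Common statistical functions:
- Count (categorical) data
  value_counts: one-way frequency table
  crosstab: two-way contingency table
  pivot_table: multi-way pivot table
- Measurement (quantitative) data
  mean: arithmetic mean
  median: median
  quantile: quantiles
  std: standard deviation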
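The list above names crosstab and pivot_table, but the analysis below never uses them. The following is a minimal sketch of both calls on a small hypothetical DataFrame. The Kind column and all the numbers are invented for illustration and do not come from BSdata.

```python
import pandas as pd

# Hypothetical toy data (not BSdata): a year, a category label and a numeric value
df = pd.DataFrame({
    'Year':  [2010, 2010, 2015, 2015, 2015, 2019],
    'Kind':  ['pop', 'area', 'pop', 'pop', 'area', 'pop'],
    'Value': [1.0, 2.0, 3.0, 5.0, 4.0, 6.0],
})

# Two-way contingency table: how many rows fall in each (Year, Kind) combination
ct = pd.crosstab(df['Year'], df['Kind'])
print(ct)

# Pivot table: mean Value for each (Year, Kind) cell; combinations with no rows become NaN
pt = pd.pivot_table(df, values='Value', index='Year', columns='Kind', aggfunc='mean')
print(pt)
```

crosstab only counts co-occurrences. pivot_table aggregates a value column, and aggfunc can be swapped for 'sum', 'median' and so on.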
import pandas as pd
BSdata = pd.read_excel('data/BSdata.xlsx', sheet_name='Sheet1')  # read the data (.xlsx files need openpyxl)
BSdata
| | Region/Country/Area | Unnamed: 1 | Year | Series | Value | Footnotes | Source |
---|---|---|---|---|---|---|---|
0 | 1 | Total, all countries or areas | 2010 | Population mid-year estimates (millions) | 6956.82 | NaN | United Nations Population Division, New York, ... |
1 | 1 | Total, all countries or areas | 2010 | Population mid-year estimates for males (milli... | 3507.70 | NaN | United Nations Population Division, New York, ... |
2 | 1 | Total, all countries or areas | 2010 | Population mid-year estimates for females (mil... | 3449.12 | NaN | United Nations Population Division, New York, ... |
3 | 1 | Total, all countries or areas | 2010 | Sex ratio (males per 100 females) | 101.70 | NaN | United Nations Population Division, New York, ... |
4 | 1 | Total, all countries or areas | 2010 | Population aged 0 to 14 years old (percentage) | 27.00 | NaN | United Nations Population Division, New York, ... |
5 | 1 | Total, all countries or areas | 2010 | Population aged 60+ years old (percentage) | 11.00 | NaN | United Nations Population Division, New York, ... |
6 | 1 | Total, all countries or areas | 2010 | Population density | 53.50 | NaN | United Nations Population Division, New York, ... |
7 | 1 | Total, all countries or areas | 2015 | Population mid-year estimates (millions) | 7379.80 | NaN | United Nations Population Division, New York, ... |
8 | 1 | Total, all countries or areas | 2015 | Population mid-year estimates for males (milli... | 3720.70 | NaN | United Nations Population Division, New York, ... |
9 | 1 | Total, all countries or areas | 2015 | Population mid-year estimates for females (mil... | 3659.10 | NaN | United Nations Population Division, New York, ... |
10 | 1 | Total, all countries or areas | 2015 | Sex ratio (males per 100 females) | 101.70 | NaN | United Nations Population Division, New York, ... |
11 | 1 | Total, all countries or areas | 2015 | Population aged 0 to 14 years old (percentage) | 26.20 | NaN | United Nations Population Division, New York, ... |
12 | 1 | Total, all countries or areas | 2015 | Population aged 60+ years old (percentage) | 12.20 | NaN | United Nations Population Division, New York, ... |
13 | 1 | Total, all countries or areas | 2015 | Population density | 56.70 | NaN | United Nations Population Division, New York, ... |
14 | 1 | Total, all countries or areas | 2015 | Surface area (thousand km2) | 136162.00 | NaN | United Nations Statistics Division, New York, ... |
15 | 1 | Total, all countries or areas | 2019 | Population mid-year estimates (millions) | 7713.47 | NaN | United Nations Population Division, New York, ... |
16 | 1 | Total, all countries or areas | 2019 | Population mid-year estimates for males (milli... | 3889.03 | NaN | United Nations Population Division, New York, ... |
17 | 1 | Total, all countries or areas | 2019 | Population mid-year estimates for females (mil... | 3824.43 | NaN | United Nations Population Division, New York, ... |
18 | 1 | Total, all countries or areas | 2019 | Sex ratio (males per 100 females) | 101.70 | NaN | United Nations Population Division, New York, ... |
19 | 1 | Total, all countries or areas | 2019 | Population aged 0 to 14 years old (percentage) | 25.60 | NaN | United Nations Population Division, New York, ... |
20 | 1 | Total, all countries or areas | 2019 | Population aged 60+ years old (percentage) | 13.20 | NaN | United Nations Population Division, New York, ... |
21 | 1 | Total, all countries or areas | 2019 | Population density | 59.30 | NaN | United Nations Population Division, New York, ... |
22 | 1 | Total, all countries or areas | 2019 | Surface area (thousand km2) | 130094.00 | NaN | United Nations Statistics Division, New York, ... |
23 | 1 | Total, all countries or areas | 2021 | Population mid-year estimates (millions) | 7874.97 | Projected estimate (medium fertility variant). | United Nations Population Division, New York, ... |
24 | 1 | Total, all countries or areas | 2021 | Population mid-year estimates for males (milli... | 3970.24 | Projected estimate (medium fertility variant). | United Nations Population Division, New York, ... |
1 Summary analysis of count data
# [1] Frequencies: absolute counts
T1=BSdata['Year'].value_counts();T1
2015 8
2019 8
2010 7
2021 2
Name: Year, dtype: int64
# [2] Relative frequencies: percentages
T1/T1.sum()*100
2015 32.0
2019 32.0
2010 28.0
2021 8.0
Name: Year, dtype: float64
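value_counts can also produce the relative frequencies directly. The sketch below rebuilds the Year column from the frequency table above (7 × 2010, 8 × 2015, 8 × 2019, 2 × 2021) so that it runs without data/BSdata.xlsx. That reconstruction is the only assumption.

```python
import pandas as pd

# Year column rebuilt from the frequency table above (assumption: same counts as BSdata)
year = pd.Series([2010] * 7 + [2015] * 8 + [2019] * 8 + [2021] * 2, name='Year')

# normalize=True returns proportions instead of counts; multiply by 100 for percentages
pct = year.value_counts(normalize=True) * 100
print(pct)
```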
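2 Summary analysis of measurement data
- Central tendency: mean, median, mode
- Dispersion: variance, standard deviation, coefficient of variation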
# Measures of central tendency
# Mean (arithmetic average)
X=BSdata['Value']
X.mean()
12911.647199999998
# Median
X.median()
3449.12
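If the mean and median are close, the distribution is roughly symmetric, although that alone does not show it is normal. Here the mean (about 12912) is nearly four times the median (3449.12), which points to a strong right skew.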
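The central-tendency list also names the mode, which the code does not compute. The sketch below uses the Value column transcribed row by row from the data table above, so it runs without the Excel file. It also uses the Year column rebuilt from the frequency table. Both reconstructions are assumptions.

```python
import pandas as pd

# 'Value' column transcribed from the data table above, in row order
value = pd.Series([6956.82, 3507.70, 3449.12, 101.70, 27.00, 11.00, 53.50,           # 2010
                   7379.80, 3720.70, 3659.10, 101.70, 26.20, 12.20, 56.70, 136162.00,  # 2015
                   7713.47, 3889.03, 3824.43, 101.70, 25.60, 13.20, 59.30, 130094.00,  # 2019
                   7874.97, 3970.24],                                                  # 2021
                  name='Value')

mean, median = value.mean(), value.median()
print(mean, median, mean / median)  # mean is about 3.7 times the median: right-skewed

# mode() returns a Series, because several values can tie for the highest count
print(value.mode().tolist())        # [101.7]: the sex ratio, which appears three times

year = pd.Series([2010] * 7 + [2015] * 8 + [2019] * 8 + [2021] * 2, name='Year')
print(year.mode().tolist())         # [2015, 2019]: both occur 8 times, so both are returned
```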
# Measures of dispersion
# Range
X.max()-X.min()  # simple, but heavily affected by the largest and smallest values
136151.0
# Variance: sum of squared deviations from the mean, divided by n-1
X.var()  # unbiased estimate, i.e. divides by n-1 (ddof=1 by default)
1317422274.184596
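The sketch below computes the variance formula by hand and checks it against pandas. It also shows how ddof switches between the sample and population versions. It uses the Year column rebuilt from the frequency table above, which is an assumption made so the block is self-contained.

```python
import math
import pandas as pd

# Year column rebuilt from the frequency table above
year = pd.Series([2010] * 7 + [2015] * 8 + [2019] * 8 + [2021] * 2, name='Year')

n = year.count()
ss = ((year - year.mean()) ** 2).sum()   # sum of squared deviations from the mean

sample_var = ss / (n - 1)   # what .var() returns by default (ddof=1)
pop_var = ss / n            # population variance, same as .var(ddof=0)

print(sample_var, year.var())             # both about 15.49, matching stats(BSdata.Year)
print(pop_var, year.var(ddof=0))          # both about 14.8704
print(math.sqrt(sample_var), year.std())  # the standard deviation is the square root
```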
# Standard deviation: square root of the variance
X.std()
36296.31212925903
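The dispersion list also mentions the coefficient of variation (CV), the standard deviation divided by the mean, which is not computed here. The sketch below assumes the Value column transcribed from the data table above. coef_var is a hypothetical helper name. Because CV is unitless, it is only meaningful for ratio-scale data with a positive mean.

```python
import pandas as pd

# 'Value' column transcribed from the data table above, in row order
value = pd.Series([6956.82, 3507.70, 3449.12, 101.70, 27.00, 11.00, 53.50,
                   7379.80, 3720.70, 3659.10, 101.70, 26.20, 12.20, 56.70, 136162.00,
                   7713.47, 3889.03, 3824.43, 101.70, 25.60, 13.20, 59.30, 130094.00,
                   7874.97, 3970.24], name='Value')

def coef_var(x: pd.Series) -> float:
    """Hypothetical helper: coefficient of variation = sample std / mean (unitless)."""
    return x.std() / x.mean()

cv = coef_var(value)
print(round(cv, 3))  # 2.811: the standard deviation is almost three times the mean
```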
# Interquartile range (IQR)
X.quantile(0.75)-X.quantile(0.25)
3916.74
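Unlike the range, the IQR resists extreme values. It also underlies Tukey's 1.5 × IQR rule for flagging outliers, which is a standard convention and not something the original analysis applies. The sketch below assumes the Value column transcribed from the data table above.

```python
import pandas as pd

# 'Value' column transcribed from the data table above, in row order
value = pd.Series([6956.82, 3507.70, 3449.12, 101.70, 27.00, 11.00, 53.50,
                   7379.80, 3720.70, 3659.10, 101.70, 26.20, 12.20, 56.70, 136162.00,
                   7713.47, 3889.03, 3824.43, 101.70, 25.60, 13.20, 59.30, 130094.00,
                   7874.97, 3970.24], name='Value')

q1, q3 = value.quantile(0.25), value.quantile(0.75)  # linear interpolation by default
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr        # Tukey's fences

outliers = value[(value < lower) | (value > upper)]
print(q1, q3, iqr)
print(outliers.tolist())  # [136162.0, 130094.0]: the two surface-area rows
```

These two rows are also what inflates the range, variance and skewness above.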
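# Skewness: third central moment divided by the cubed standard deviation (pandas returns the bias-corrected sample estimate)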
X.skew()
3.267375071429257
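# Kurtosis: fourth central moment divided by the squared variance, minus 3 (excess kurtosis, 0 for a normal distribution); pandas returns the bias-corrected estimate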
X.kurt()
9.528076655103652
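The sketch below writes out the standard bias-corrected sample formulas for skewness and excess kurtosis. These are the adjusted Fisher-Pearson G1 and the corresponding G2, and the comparison against pandas is the check. It uses the Year column rebuilt from the frequency table above, so the results can be compared with the Skew and Kurt rows of stats(BSdata.Year).

```python
import numpy as np
import pandas as pd

# Year column rebuilt from the frequency table above
year = pd.Series([2010] * 7 + [2015] * 8 + [2019] * 8 + [2021] * 2, name='Year')

def sample_skew(x):
    """Adjusted Fisher-Pearson skewness G1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2, m3 = (d ** 2).mean(), (d ** 3).mean()   # 2nd and 3rd central moments (divide by n)
    g1 = m3 / m2 ** 1.5                          # biased moment coefficient
    return np.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample correction

def sample_excess_kurt(x):
    """Bias-corrected excess kurtosis G2 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2, m4 = (d ** 2).mean(), (d ** 4).mean()   # 2nd and 4th central moments
    g2 = m4 / m2 ** 2 - 3                        # biased excess kurtosis
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(sample_skew(year), year.skew())         # both about -0.247878
print(sample_excess_kurt(year), year.kurt())  # both about -1.361406
```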
3 Summary statistics
By default, describe() summarizes only the numeric (measurement) columns.
BSdata.describe()
| | Region/Country/Area | Year | Value |
---|---|---|---|
count | 25.0 | 25.000000 | 25.000000 |
mean | 1.0 | 2015.360000 | 12911.647200 |
std | 0.0 | 3.935734 | 36296.312129 |
min | 1.0 | 2010.000000 | 11.000000 |
25% | 1.0 | 2010.000000 | 53.500000 |
50% | 1.0 | 2015.000000 | 3449.120000 |
75% | 1.0 | 2019.000000 | 3970.240000 |
max | 1.0 | 2021.000000 | 136162.000000 |
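describe() reports the 25%/50%/75% quantiles by default, and its percentiles parameter swaps in other cut points. The sketch below assumes the Value column transcribed from the data table above.

```python
import pandas as pd

# 'Value' column transcribed from the data table above, in row order
value = pd.Series([6956.82, 3507.70, 3449.12, 101.70, 27.00, 11.00, 53.50,
                   7379.80, 3720.70, 3659.10, 101.70, 26.20, 12.20, 56.70, 136162.00,
                   7713.47, 3889.03, 3824.43, 101.70, 25.60, 13.20, 59.30, 130094.00,
                   7874.97, 3970.24], name='Value')

# percentiles= replaces the default quartile rows; the median (50%) is always included
d = value.describe(percentiles=[0.1, 0.9])
print(d)
```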
BSdata[['Unnamed: 1','Series','Footnotes','Source']].describe()  # statistics for count (text) columns
| | Unnamed: 1 | Series | Footnotes | Source |
---|---|---|---|---|
count | 25 | 25 | 2 | 25 |
unique | 1 | 8 | 1 | 3 |
top | Total, all countries or areas | Population mid-year estimates (millions) | Projected estimate (medium fertility variant). | United Nations Population Division, New York, ... |
freq | 25 | 4 | 2 | 14 |
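Instead of listing the text columns by hand, describe() can select them by dtype. The sketch below runs on a small hypothetical frame, not BSdata. On BSdata, the same exclude='number' call would be expected to pick out the non-numeric columns, including the four listed above.

```python
import pandas as pd

# Hypothetical toy frame with one numeric and two text columns (not BSdata)
df = pd.DataFrame({
    'Year': [2010, 2015, 2021],
    'Footnotes': [None, 'Projected', 'Projected'],
    'Kind': ['pop', 'area', 'area'],
})

# exclude='number' keeps only non-numeric columns, giving count/unique/top/freq rows
desc = df.describe(exclude='number')
print(desc)
```

As with Footnotes above, count ignores missing entries.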
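- A custom function that computes the basic statistics
def stats(x):
    # Collect the common descriptive statistics of a numeric Series into one labeled Series
    stat = [x.count(), x.min(), x.quantile(.25), x.mean(), x.median(), x.quantile(.75),
            x.max(), x.max() - x.min(), x.var(), x.std(), x.skew(), x.kurt()]
    stat = pd.Series(stat, index=['Count', 'Min', 'Q1(25%)', 'Mean', 'Median', 'Q3(75%)',
                                  'Max', 'Range', 'Var', 'Std', 'Skew', 'Kurt'])
    return stat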
stats(BSdata.Year)
Count 25.000000
Min 2010.000000
Q1(25%) 2010.000000
Mean 2015.360000
Median 2015.000000
Q3(75%) 2019.000000
Max 2021.000000
Range 11.000000
Var 15.490000
Std 3.935734
Skew -0.247878
Kurt -1.361406
dtype: float64
stats(BSdata.Value)
Count 2.500000e+01
Min 1.100000e+01
Q1(25%) 5.350000e+01
Mean 1.291165e+04
Median 3.449120e+03
Q3(75%) 3.970240e+03
Max 1.361620e+05
Range 1.361510e+05
Var 1.317422e+09
Std 3.629631e+04
Skew 3.267375e+00
Kurt 9.528077e+00
dtype: float64
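stats() handles one Series at a time. DataFrame.apply can run it on every column and place the results side by side. In the sketch below, stats() is repeated so the block runs alone. The Year and Value columns are rebuilt from the frequency table and data table above, in their original row order; this reconstruction is an assumption.

```python
import pandas as pd

def stats(x):
    # Same statistics as stats() above
    stat = [x.count(), x.min(), x.quantile(.25), x.mean(), x.median(), x.quantile(.75),
            x.max(), x.max() - x.min(), x.var(), x.std(), x.skew(), x.kurt()]
    return pd.Series(stat, index=['Count', 'Min', 'Q1(25%)', 'Mean', 'Median', 'Q3(75%)',
                                  'Max', 'Range', 'Var', 'Std', 'Skew', 'Kurt'])

# Year and Value rebuilt from the tables above (rows 0-6: 2010, 7-14: 2015, 15-22: 2019, 23-24: 2021)
data = pd.DataFrame({
    'Year': [2010] * 7 + [2015] * 8 + [2019] * 8 + [2021] * 2,
    'Value': [6956.82, 3507.70, 3449.12, 101.70, 27.00, 11.00, 53.50,
              7379.80, 3720.70, 3659.10, 101.70, 26.20, 12.20, 56.70, 136162.00,
              7713.47, 3889.03, 3824.43, 101.70, 25.60, 13.20, 59.30, 130094.00,
              7874.97, 3970.24],
})

table = data.apply(stats)  # one column of statistics per input column
print(table.round(3))
```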