Descriptive Analysis of Data

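Common statistical functions:

  • Counting
    value_counts: one-way frequency table
    crosstab: two-way contingency table
    pivot_table: multi-way pivot table
  • Measurement
    mean: arithmetic mean
    median: median
    quantile: quantiles
    std: standard deviation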
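As a quick illustration of the three counting functions listed above, here is a minimal sketch on a small made-up table. The DataFrame `df` and its columns `year`, `sex`, and `value` are hypothetical and are not part of BSdata.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the three functions
df = pd.DataFrame({
    "year": [2010, 2010, 2015, 2015, 2015, 2019],
    "sex": ["M", "F", "M", "F", "F", "M"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# One-way frequency table: how often each year occurs
freq = df["year"].value_counts()

# Two-way contingency table: counts for every (year, sex) pair
ctab = pd.crosstab(df["year"], df["sex"])

# Pivot table: mean of `value` for every (year, sex) pair
pivot = df.pivot_table(index="year", columns="sex", values="value", aggfunc="mean")

print(freq, ctab, pivot, sep="\n\n")
```

Combinations that never occur show up as 0 in `crosstab` but as NaN in `pivot_table`, because there is nothing to average.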
import pandas as pd
BSdata=pd.read_excel('data/BSdata.xlsx',sheet_name='Sheet1') # read the data
BSdata
Region/Country/Area Unnamed: 1 Year Series Value Footnotes Source
0 1 Total, all countries or areas 2010 Population mid-year estimates (millions) 6956.82 NaN United Nations Population Division, New York, ...
1 1 Total, all countries or areas 2010 Population mid-year estimates for males (milli... 3507.70 NaN United Nations Population Division, New York, ...
2 1 Total, all countries or areas 2010 Population mid-year estimates for females (mil... 3449.12 NaN United Nations Population Division, New York, ...
3 1 Total, all countries or areas 2010 Sex ratio (males per 100 females) 101.70 NaN United Nations Population Division, New York, ...
4 1 Total, all countries or areas 2010 Population aged 0 to 14 years old (percentage) 27.00 NaN United Nations Population Division, New York, ...
5 1 Total, all countries or areas 2010 Population aged 60+ years old (percentage) 11.00 NaN United Nations Population Division, New York, ...
6 1 Total, all countries or areas 2010 Population density 53.50 NaN United Nations Population Division, New York, ...
7 1 Total, all countries or areas 2015 Population mid-year estimates (millions) 7379.80 NaN United Nations Population Division, New York, ...
8 1 Total, all countries or areas 2015 Population mid-year estimates for males (milli... 3720.70 NaN United Nations Population Division, New York, ...
9 1 Total, all countries or areas 2015 Population mid-year estimates for females (mil... 3659.10 NaN United Nations Population Division, New York, ...
10 1 Total, all countries or areas 2015 Sex ratio (males per 100 females) 101.70 NaN United Nations Population Division, New York, ...
11 1 Total, all countries or areas 2015 Population aged 0 to 14 years old (percentage) 26.20 NaN United Nations Population Division, New York, ...
12 1 Total, all countries or areas 2015 Population aged 60+ years old (percentage) 12.20 NaN United Nations Population Division, New York, ...
13 1 Total, all countries or areas 2015 Population density 56.70 NaN United Nations Population Division, New York, ...
14 1 Total, all countries or areas 2015 Surface area (thousand km2) 136162.00 NaN United Nations Statistics Division, New York, ...
15 1 Total, all countries or areas 2019 Population mid-year estimates (millions) 7713.47 NaN United Nations Population Division, New York, ...
16 1 Total, all countries or areas 2019 Population mid-year estimates for males (milli... 3889.03 NaN United Nations Population Division, New York, ...
17 1 Total, all countries or areas 2019 Population mid-year estimates for females (mil... 3824.43 NaN United Nations Population Division, New York, ...
18 1 Total, all countries or areas 2019 Sex ratio (males per 100 females) 101.70 NaN United Nations Population Division, New York, ...
19 1 Total, all countries or areas 2019 Population aged 0 to 14 years old (percentage) 25.60 NaN United Nations Population Division, New York, ...
20 1 Total, all countries or areas 2019 Population aged 60+ years old (percentage) 13.20 NaN United Nations Population Division, New York, ...
21 1 Total, all countries or areas 2019 Population density 59.30 NaN United Nations Population Division, New York, ...
22 1 Total, all countries or areas 2019 Surface area (thousand km2) 130094.00 NaN United Nations Statistics Division, New York, ...
23 1 Total, all countries or areas 2021 Population mid-year estimates (millions) 7874.97 Projected estimate (medium fertility variant). United Nations Population Division, New York, ...
24 1 Total, all countries or areas 2021 Population mid-year estimates for males (milli... 3970.24 Projected estimate (medium fertility variant). United Nations Population Division, New York, ...

1 Summarizing count data

# [1] Frequency: absolute counts
T1=BSdata['Year'].value_counts();T1
2015    8
2019    8
2010    7
2021    2
Name: Year, dtype: int64
# [2] Relative frequency: share of each year, in percent
T1/T1.sum()*100
2015    32.0
2019    32.0
2010    28.0
2021     8.0
Name: Year, dtype: float64
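The same percentages can be obtained in one step with the `normalize=True` option of `value_counts`. The sketch below rebuilds a Series with the year counts shown above (8, 8, 7, 2) instead of reading the Excel file; the name `years` is only illustrative.

```python
import pandas as pd

# Rebuild a Year column with the same counts as in the output above
years = pd.Series([2015] * 8 + [2019] * 8 + [2010] * 7 + [2021] * 2, name="Year")

# normalize=True returns proportions; multiply by 100 for percentages
pct = years.value_counts(normalize=True) * 100

print(pct.round(1))
```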

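2 Summarizing measurement data

  • Central tendency: mean, median, mode
  • Dispersion: variance, standard deviation, coefficient of variation
# Measures of central tendency
# Mean (arithmetic average)
X=BSdata['Value']
X.mean()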
12911.647199999998
# Median
X.median()
3449.12

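If the mean and the median are close, the distribution is roughly symmetric; that alone does not show that the data are normal. Here the mean (about 12911.65) is far above the median (3449.12), so Value is strongly right-skewed. Note also that Value mixes series with different units (populations, percentages, areas), so these figures only illustrate the functions.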
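To see how a single extreme value separates the mean from the median, and to show the mode, which is listed among the measures of central tendency but not computed in this post, here is a minimal sketch on a made-up sample. The Series `x` is hypothetical.

```python
import pandas as pd

# Hypothetical sample with one extreme value on the right
x = pd.Series([1, 2, 2, 3, 100])

mean_x = x.mean()      # pulled upward by the extreme value
median_x = x.median()  # unaffected by the size of the extreme value
mode_x = x.mode()      # a Series, because several values can tie for the mode

print(mean_x, median_x, mode_x.tolist())
```

A mean well above the median, as here, is a typical sign of right skew.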

# Measures of dispersion
# Range
X.max()-X.min() # simple, but very sensitive to the largest and smallest values
136151.0
# Variance: sum of squared deviations from the mean, divided by n-1
X.var() # unbiased estimate (default ddof=1), i.e. divides by n-1
1317422274.184596
# Standard deviation: square root of the variance
X.std()
36296.31212925903
# Interquartile range (IQR)
X.quantile(0.75)-X.quantile(0.25)
3916.74
# Skewness: standardized third central moment (pandas applies a bias correction)
X.skew()
3.267375071429257
# Kurtosis: standardized fourth central moment minus 3 (excess kurtosis; pandas applies a bias correction)
X.kurt()
9.528076655103652
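The coefficient of variation is listed among the dispersion measures but has no dedicated pandas method; it can be computed as the standard deviation divided by the mean. The sketch below also recomputes skewness and excess kurtosis by hand, using the bias-corrected sample formulas that, to my understanding, match what `skew()` and `kurt()` return. The sample `x` and all variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample
x = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

# Central moments (divided by n)
dev = x - x.mean()
m2 = (dev ** 2).mean()
m3 = (dev ** 3).mean()
m4 = (dev ** 4).mean()

# Coefficient of variation: standard deviation relative to the mean
cv = x.std() / x.mean()

# Bias-corrected sample skewness
skew_manual = (n * (n - 1)) ** 0.5 / (n - 2) * m3 / m2 ** 1.5

# Bias-corrected sample excess kurtosis (0 for a normal distribution)
kurt_manual = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)

print(cv, skew_manual, x.skew(), kurt_manual, x.kurt())
```

The coefficient of variation is only meaningful for data measured on a ratio scale with a positive mean.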

3 Summary statistics

By default, describe() computes basic statistics for the numeric columns only.

BSdata.describe()
Region/Country/Area Year Value
count 25.0 25.000000 25.000000
mean 1.0 2015.360000 12911.647200
std 0.0 3.935734 36296.312129
min 1.0 2010.000000 11.000000
25% 1.0 2010.000000 53.500000
50% 1.0 2015.000000 3449.120000
75% 1.0 2019.000000 3970.240000
max 1.0 2021.000000 136162.000000
BSdata[['Unnamed: 1','Series','Footnotes','Source']].describe() # summary of the categorical (count) columns
Unnamed: 1 Series Footnotes Source
count 25 25 2 25
unique 1 8 1 3
top Total, all countries or areas Population mid-year estimates (millions) Projected estimate (medium fertility variant). United Nations Population Division, New York, ...
freq 25 4 2 14
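Instead of listing the text columns by name, the non-numeric columns can be selected by dtype, or every column can be summarized at once with `include="all"`. A minimal sketch on a hypothetical two-column frame (`df`, `year`, and `series` are illustrative names):

```python
import pandas as pd

# Hypothetical frame with one numeric and one text column
df = pd.DataFrame({
    "year": [2010, 2015, 2015, 2019],
    "series": ["density", "density", "sex ratio", "density"],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max)
numeric_summary = df.describe()

# Non-numeric columns only (count, unique, top, freq)
text_summary = df.select_dtypes(exclude="number").describe()

# All columns in one table; statistics that do not apply are NaN
full_summary = df.describe(include="all")

print(numeric_summary, text_summary, full_summary, sep="\n\n")
```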

- A custom function for computing basic statistics

def stats(x):
    stat=[x.count(),x.min(),x.quantile(.25),x.mean(),x.median(),x.quantile(.75),x.max(),x.max()-x.min(),x.var(),x.std(),x.skew(),x.kurt()]
    stat=pd.Series(stat,index=['Count','Min','Q1(25%)','Mean','Median','Q3(75%)','Max','Range','Var','Std','Skew','Kurt'])
    return stat
stats(BSdata.Year)
Count        25.000000
Min        2010.000000
Q1(25%)    2010.000000
Mean       2015.360000
Median     2015.000000
Q3(75%)    2019.000000
Max        2021.000000
Range        11.000000
Var          15.490000
Std           3.935734
Skew         -0.247878
Kurt         -1.361406
dtype: float64
stats(BSdata.Value)
Count      2.500000e+01
Min        1.100000e+01
Q1(25%)    5.350000e+01
Mean       1.291165e+04
Median     3.449120e+03
Q3(75%)    3.970240e+03
Max        1.361620e+05
Range      1.361510e+05
Var        1.317422e+09
Std        3.629631e+04
Skew       3.267375e+00
Kurt       9.528077e+00
dtype: float64
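Because the function returns a Series, it can be applied to several numeric columns at once with `DataFrame.apply`, which places one column of statistics per variable side by side. The sketch below repeats the function so that it runs on its own; the frame `df` and its columns `a` and `b` are hypothetical.

```python
import pandas as pd

def stats(x):
    """Basic descriptive statistics for a numeric Series."""
    return pd.Series({
        "Count": x.count(),
        "Min": x.min(),
        "Q1(25%)": x.quantile(0.25),
        "Mean": x.mean(),
        "Median": x.median(),
        "Q3(75%)": x.quantile(0.75),
        "Max": x.max(),
        "Range": x.max() - x.min(),
        "Var": x.var(),
        "Std": x.std(),
        "Skew": x.skew(),
        "Kurt": x.kurt(),
    })

# Hypothetical data: one symmetric column, one right-skewed column
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [10.0, 10.0, 20.0, 40.0, 100.0],
})

# One column of statistics per variable
table = df.apply(stats)
print(table)
```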
posted @ 2022-10-12 21:52  LUNA2333