Pandas (Part 3)

3.3.4 DataFrame operations

(1) Displaying a DataFrame

info() shows the structure of the DataFrame
head() shows the first 5 rows by default
tail() shows the last 5 rows by default

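# Show the result of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# Import the pandas package
import pandas as pd
# Read the CSV data
BSdata = pd.read_csv("data/test.csv", encoding="utf-8")  # use encoding="GBK" for GBK-encoded files
BSdata.info()  # summary of the DataFrame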
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6944 entries, 0 to 6943
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Region/Country/Area  25 non-null     float64
 1   Unnamed: 1           25 non-null     object 
 2   Year                 25 non-null     float64
 3   Series               25 non-null     object 
 4   Value                25 non-null     object 
 5   Footnotes            2 non-null      object 
 6   Source               25 non-null     object 
dtypes: float64(2), object(5)
memory usage: 379.9+ KB
BSdata.head()  # show the first 5 rows
Region/Country/Area Unnamed: 1 Year Series Value Footnotes Source
0 1.0 Total, all countries or areas 2010.0 Population mid-year estimates (millions) 6,956.82 NaN United Nations Population Division, New York, ...
1 1.0 Total, all countries or areas 2010.0 Population mid-year estimates for males (milli... 3,507.70 NaN United Nations Population Division, New York, ...
2 1.0 Total, all countries or areas 2010.0 Population mid-year estimates for females (mil... 3,449.12 NaN United Nations Population Division, New York, ...
3 1.0 Total, all countries or areas 2010.0 Sex ratio (males per 100 females) 101.7 NaN United Nations Population Division, New York, ...
4 1.0 Total, all countries or areas 2010.0 Population aged 0 to 14 years old (percentage) 27 NaN United Nations Population Division, New York, ...
BSdata.tail()  # show the last 5 rows
Region/Country/Area Unnamed: 1 Year Series Value Footnotes Source
6939 NaN NaN NaN NaN NaN NaN NaN
6940 NaN NaN NaN NaN NaN NaN NaN
6941 NaN NaN NaN NaN NaN NaN NaN
6942 NaN NaN NaN NaN NaN NaN NaN
6943 NaN NaN NaN NaN NaN NaN NaN
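The tail() output consists entirely of NaN, and info() reports only 25 non-null values out of 6944 entries. This suggests the CSV file contains many rows with no data. The sketch below uses a small made-up DataFrame, not the actual test.csv. It shows that head() and tail() accept a row count, and that dropna(how="all") drops rows in which every value is missing:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for BSdata: two rows with data followed by two empty rows
df = pd.DataFrame({
    "Year": [2010.0, 2015.0, np.nan, np.nan],
    "Series": ["Population density", "Sex ratio", np.nan, np.nan],
})

print(df.head(2))  # head(n) and tail(n) take the number of rows to show
print(df.tail(2))  # the last two rows are entirely NaN

# Drop only the rows in which every column is missing
clean = df.dropna(how="all")
print(clean.shape)  # (2, 2)
```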

(2) DataFrame column names (variable names)

columns: view the column names

BSdata.columns  # view the column names
Index(['Region/Country/Area', 'Unnamed: 1', 'Year', 'Series', 'Value',
       'Footnotes', 'Source'],
      dtype='object')

(3) DataFrame row labels (sample names)

index: view the row labels

BSdata.index  # row labels of the DataFrame
RangeIndex(start=0, stop=6944, step=1)
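The default RangeIndex simply numbers the rows from 0 to n-1. The sketch below takes three population values from the output in this post. It replaces the default index with a column via set_index(), so that loc can look rows up by year:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2010, 2015, 2019],
                   "Value": [6956.82, 7379.80, 7713.47]})
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# Use the Year column as the row labels instead of 0..n-1
by_year = df.set_index("Year")
print(by_year.index.tolist())       # [2010, 2015, 2019]
print(by_year.loc[2015, "Value"])   # look up a value by its year label
```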

(4) DataFrame dimensions

shape: number of rows and columns

BSdata.shape     # number of rows and columns of the DataFrame
BSdata.shape[0]  # number of rows
BSdata.shape[1]  # number of columns
(6944, 7)
6944
7
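shape is an ordinary (rows, columns) tuple, so it can be unpacked directly. The sketch below uses a made-up frame and also shows len() and size:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = df.shape   # shape is a plain (rows, columns) tuple
print(n_rows, n_cols)       # 3 2
print(len(df) == n_rows)    # len() also counts rows: True
print(df.size)              # rows * columns = 6
```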

(5) DataFrame values (array)

values: the data as a NumPy array

BSdata.values[:5]  # array of the values, first 5 rows
array([[1.0, 'Total, all countries or areas', 2010.0,
        'Population mid-year estimates (millions)', '6,956.82', nan,
        'United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2021.'],
       [1.0, 'Total, all countries or areas', 2010.0,
        'Population mid-year estimates for males (millions)', '3,507.70',
        nan,
        'United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2021.'],
       [1.0, 'Total, all countries or areas', 2010.0,
        'Population mid-year estimates for females (millions)',
        '3,449.12', nan,
        'United Nations Population Division, New York, World Population Prospects: The 2019 Revision, last accessed June 2021.'],
       [1.0, 'Total, all countries or areas', 2010.0,
        'Sex ratio (males per 100 females)', '101.7', nan,
        'United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2019 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2021.'],
       [1.0, 'Total, all countries or areas', 2010.0,
        'Population aged 0 to 14 years old (percentage)', '27', nan,
        'United Nations Population Division, New York, World Population Prospects: The 2019 Revision; supplemented by data from the United Nations Statistics Division, New York, Demographic Yearbook 2019 and Secretariat for the Pacific Community (SPC) for small countries or areas, last accessed June 2021.']],
      dtype=object)
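The array above has dtype=object because the columns mix floats and strings, and a NumPy array needs a single element type. The sketch below uses made-up data. The pandas documentation recommends to_numpy() in place of .values:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2010.0, 2015.0],
                   "Value": ["6,956.82", "7,379.80"]})

arr = df.to_numpy()
print(arr.dtype)   # object: float and str columns share a common object dtype

# Selecting only numeric columns keeps a numeric dtype
print(df[["Year"]].to_numpy().dtype)  # float64
```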

3.3.4.2 Selecting variables (columns)

(1) Selecting columns with [""] or with .

BSdata['Year']  # select one column [method 1]
0       2010.0
1       2010.0
2       2010.0
3       2010.0
4       2010.0
         ...  
6939       NaN
6940       NaN
6941       NaN
6942       NaN
6943       NaN
Name: Year, Length: 6944, dtype: float64
BSdata[['Year','Series']]  # select two columns
Year Series
0 2010.0 Population mid-year estimates (millions)
1 2010.0 Population mid-year estimates for males (milli...
2 2010.0 Population mid-year estimates for females (mil...
3 2010.0 Sex ratio (males per 100 females)
4 2010.0 Population aged 0 to 14 years old (percentage)
... ... ...
6939 NaN NaN
6940 NaN NaN
6941 NaN NaN
6942 NaN NaN
6943 NaN NaN

6944 rows × 2 columns

BSdata.Year  # select one column [method 2]
0       2010.0
1       2010.0
2       2010.0
3       2010.0
4       2010.0
         ...  
6939       NaN
6940       NaN
6941       NaN
6942       NaN
6943       NaN
Name: Year, Length: 6944, dtype: float64
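Single brackets return a Series, while a list of names inside the brackets returns a DataFrame. Attribute access (method 2) only works when the column name is a valid Python identifier that does not clash with a DataFrame attribute. A column such as Region/Country/Area therefore needs brackets. The sketch below uses made-up data:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2010.0, 2015.0],
                   "Region/Country/Area": [1.0, 1.0]})

s = df["Year"]     # single brackets -> Series
d = df[["Year"]]   # a list inside the brackets -> DataFrame, even for one column
print(type(s).__name__, type(d).__name__)  # Series DataFrame

print(df.Year.equals(df["Year"]))  # attribute access gives the same Series: True

# df.Region/Country/Area would be parsed as a division,
# so names that are not valid identifiers must use brackets
print(df["Region/Country/Area"].tolist())  # [1.0, 1.0]
```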
(2) Selecting by position or label

[Counting starts from zero]

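  • iloc: selection by integer position [slices are half-open: start included, end excluded]
  • loc: selection by label [slices are closed: both ends included]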
BSdata.iloc[:,2]  # all rows, column at position 2 (the third column, Year)
0       2010.0
1       2010.0
2       2010.0
3       2010.0
4       2010.0
         ...  
6939       NaN
6940       NaN
6941       NaN
6942       NaN
6943       NaN
Name: Year, Length: 6944, dtype: float64
BSdata.iloc[:,2:4]  # all rows, columns at positions 2 and 3 [counting from 0] [half-open: position 4 excluded]
Year Series
0 2010.0 Population mid-year estimates (millions)
1 2010.0 Population mid-year estimates for males (milli...
2 2010.0 Population mid-year estimates for females (mil...
3 2010.0 Sex ratio (males per 100 females)
4 2010.0 Population aged 0 to 14 years old (percentage)
... ... ...
6939 NaN NaN
6940 NaN NaN
6941 NaN NaN
6942 NaN NaN
6943 NaN NaN

6944 rows × 2 columns
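The half-open and closed slicing rules are easiest to compare side by side. The sketch below uses a one-row made-up frame with the same column names as BSdata. A comparable range of columns gives different results with iloc and loc:

```python
import pandas as pd

df = pd.DataFrame([[1.0, "Total", 2010.0, "Sex ratio", "101.7"]],
                  columns=["Region/Country/Area", "Unnamed: 1",
                           "Year", "Series", "Value"])

# iloc: integer positions, end excluded -> positions 2 and 3 only
print(df.iloc[:, 2:4].columns.tolist())            # ['Year', 'Series']

# loc: labels, end included -> Year through Value
print(df.loc[:, "Year":"Value"].columns.tolist())  # ['Year', 'Series', 'Value']
```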

3.3.4.3 Selecting samples (rows)
BSdata.iloc[3,:]  # row at position 3, all columns [counting from 0]
Region/Country/Area                                                  1.0
Unnamed: 1                                 Total, all countries or areas
Year                                                              2010.0
Series                                 Sex ratio (males per 100 females)
Value                                                              101.7
Footnotes                                                            NaN
Source                 United Nations Population Division, New York, ...
Name: 3, dtype: object
BSdata.loc[3]  # row with label 3; same result as above because the default RangeIndex labels equal the positions
Region/Country/Area                                                  1.0
Unnamed: 1                                 Total, all countries or areas
Year                                                              2010.0
Series                                 Sex ratio (males per 100 females)
Value                                                              101.7
Footnotes                                                            NaN
Source                 United Nations Population Division, New York, ...
Name: 3, dtype: object
BSdata.loc[3:5]  # rows with labels 3 to 5 [closed interval: 5 is included]
Region/Country/Area Unnamed: 1 Year Series Value Footnotes Source
3 1.0 Total, all countries or areas 2010.0 Sex ratio (males per 100 females) 101.7 NaN United Nations Population Division, New York, ...
4 1.0 Total, all countries or areas 2010.0 Population aged 0 to 14 years old (percentage) 27 NaN United Nations Population Division, New York, ...
5 1.0 Total, all countries or areas 2010.0 Population aged 60+ years old (percentage) 11 NaN United Nations Population Division, New York, ...
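loc[3] and iloc[3] returned the same row above only because the default RangeIndex makes each label equal to its position. When rows are filtered out, the original labels are kept and the two methods diverge. The sketch below uses made-up data:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2010, 2010, 2015, 2015, 2019]})
later = df[df["Year"] > 2010]    # filtering keeps the original labels
print(later.index.tolist())      # [2, 3, 4]

print(later.iloc[0]["Year"])     # first row by position -> 2015
print(later.loc[2]["Year"])      # row whose label is 2 -> the same row here
# later.loc[0] would raise KeyError because label 0 was filtered out

print(later.loc[3:4].index.tolist())   # labels 3 through 4, both included: [3, 4]
print(later.iloc[0:1].index.tolist())  # position 0 only: [2]
```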
3.3.4.4 Selecting observations and variables
BSdata.loc[:3,['Year','Series']]  # rows 0 to 3 and the two columns Year and Series [closed interval]
Year Series
0 2010.0 Population mid-year estimates (millions)
1 2010.0 Population mid-year estimates for males (milli...
2 2010.0 Population mid-year estimates for females (mil...
3 2010.0 Sex ratio (males per 100 females)
BSdata.iloc[:3,:5]  # rows 0 to 2 and columns 0 to 4 [half-open interval]
Region/Country/Area Unnamed: 1 Year Series Value
0 1.0 Total, all countries or areas 2010.0 Population mid-year estimates (millions) 6,956.82
1 1.0 Total, all countries or areas 2010.0 Population mid-year estimates for males (milli... 3,507.70
2 1.0 Total, all countries or areas 2010.0 Population mid-year estimates for females (mil... 3,449.12
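A single loc call can take a boolean condition for the rows and a list of labels for the columns. iloc needs positions instead, and columns.get_loc() can supply them. The sketch below uses a made-up frame with placeholder values:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2010.0, 2015.0, 2019.0],
                   "Series": ["s1", "s2", "s3"],
                   "Value": ["v1", "v2", "v3"]})

# Rows by a boolean condition, columns by a list of labels, in one loc call
sub = df.loc[df["Year"] >= 2015, ["Year", "Value"]]
print(sub)

# iloc needs positions; get_loc converts a column label into its position
pos = df.columns.get_loc("Value")  # 2
print(df.iloc[:2, pos].tolist())   # ['v1', 'v2']
```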
3.3.4.5 Conditional selection
BSdata[BSdata['Year']>2010]
Region/Country/Area Unnamed: 1 Year Series Value Footnotes Source
7 1.0 Total, all countries or areas 2015.0 Population mid-year estimates (millions) 7,379.80 NaN United Nations Population Division, New York, ...
8 1.0 Total, all countries or areas 2015.0 Population mid-year estimates for males (milli... 3,720.70 NaN United Nations Population Division, New York, ...
9 1.0 Total, all countries or areas 2015.0 Population mid-year estimates for females (mil... 3,659.10 NaN United Nations Population Division, New York, ...
10 1.0 Total, all countries or areas 2015.0 Sex ratio (males per 100 females) 101.7 NaN United Nations Population Division, New York, ...
11 1.0 Total, all countries or areas 2015.0 Population aged 0 to 14 years old (percentage) 26.2 NaN United Nations Population Division, New York, ...
12 1.0 Total, all countries or areas 2015.0 Population aged 60+ years old (percentage) 12.2 NaN United Nations Population Division, New York, ...
13 1.0 Total, all countries or areas 2015.0 Population density 56.7 NaN United Nations Population Division, New York, ...
14 1.0 Total, all countries or areas 2015.0 Surface area (thousand km2) 136,162 NaN United Nations Statistics Division, New York, ...
15 1.0 Total, all countries or areas 2019.0 Population mid-year estimates (millions) 7,713.47 NaN United Nations Population Division, New York, ...
16 1.0 Total, all countries or areas 2019.0 Population mid-year estimates for males (milli... 3,889.03 NaN United Nations Population Division, New York, ...
17 1.0 Total, all countries or areas 2019.0 Population mid-year estimates for females (mil... 3,824.43 NaN United Nations Population Division, New York, ...
18 1.0 Total, all countries or areas 2019.0 Sex ratio (males per 100 females) 101.7 NaN United Nations Population Division, New York, ...
19 1.0 Total, all countries or areas 2019.0 Population aged 0 to 14 years old (percentage) 25.6 NaN United Nations Population Division, New York, ...
20 1.0 Total, all countries or areas 2019.0 Population aged 60+ years old (percentage) 13.2 NaN United Nations Population Division, New York, ...
21 1.0 Total, all countries or areas 2019.0 Population density 59.3 NaN United Nations Population Division, New York, ...
22 1.0 Total, all countries or areas 2019.0 Surface area (thousand km2) 130,094 NaN United Nations Statistics Division, New York, ...
23 1.0 Total, all countries or areas 2021.0 Population mid-year estimates (millions) 7,874.97 Projected estimate (medium fertility variant). United Nations Population Division, New York, ...
24 1.0 Total, all countries or areas 2021.0 Population mid-year estimates for males (milli... 3,970.24 Projected estimate (medium fertility variant). United Nations Population Division, New York, ...
BSdata[(BSdata['Year']>2010) & (BSdata['Year']<2016)]
Region/Country/Area Unnamed: 1 Year Series Value Footnotes Source
7 1.0 Total, all countries or areas 2015.0 Population mid-year estimates (millions) 7,379.80 NaN United Nations Population Division, New York, ...
8 1.0 Total, all countries or areas 2015.0 Population mid-year estimates for males (milli... 3,720.70 NaN United Nations Population Division, New York, ...
9 1.0 Total, all countries or areas 2015.0 Population mid-year estimates for females (mil... 3,659.10 NaN United Nations Population Division, New York, ...
10 1.0 Total, all countries or areas 2015.0 Sex ratio (males per 100 females) 101.7 NaN United Nations Population Division, New York, ...
11 1.0 Total, all countries or areas 2015.0 Population aged 0 to 14 years old (percentage) 26.2 NaN United Nations Population Division, New York, ...
12 1.0 Total, all countries or areas 2015.0 Population aged 60+ years old (percentage) 12.2 NaN United Nations Population Division, New York, ...
13 1.0 Total, all countries or areas 2015.0 Population density 56.7 NaN United Nations Population Division, New York, ...
14 1.0 Total, all countries or areas 2015.0 Surface area (thousand km2) 136,162 NaN United Nations Statistics Division, New York, ...
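Each comparison in a combined condition needs its own parentheses, because & and | bind more tightly than > and < in Python. The sketch below uses made-up years. It writes the same filter with between() and also shows isin() and the negation operator ~. The inclusive="neither" argument assumes pandas 1.3 or later:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2010.0, 2015.0, 2019.0, 2021.0]})

# Parentheses around each comparison are required
a = df[(df["Year"] > 2010) & (df["Year"] < 2016)]

# Same filter with between(); "neither" excludes both endpoints
b = df[df["Year"].between(2010, 2016, inclusive="neither")]
print(a.equals(b))  # True

print(df[df["Year"].isin([2010.0, 2021.0])].index.tolist())  # [0, 3]
print(df[~(df["Year"] > 2010)].index.tolist())               # ~ negates: [0]
```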
3.3.4.6 DataFrame arithmetic
(1) Creating a new column
BSdata["年/值"]=BSdata['Region/Country/Area']+1  # new column "年/值" ("year/value"); as written it is just the Region/Country/Area code plus 1
BSdata.head()
Region/Country/Area Unnamed: 1 Year Series Value Footnotes Source 年/值
0 1.0 Total, all countries or areas 2010.0 Population mid-year estimates (millions) 6,956.82 NaN United Nations Population Division, New York, ... 2.0
1 1.0 Total, all countries or areas 2010.0 Population mid-year estimates for males (milli... 3,507.70 NaN United Nations Population Division, New York, ... 2.0
2 1.0 Total, all countries or areas 2010.0 Population mid-year estimates for females (mil... 3,449.12 NaN United Nations Population Division, New York, ... 2.0
3 1.0 Total, all countries or areas 2010.0 Sex ratio (males per 100 females) 101.7 NaN United Nations Population Division, New York, ... 2.0
4 1.0 Total, all countries or areas 2010.0 Population aged 0 to 14 years old (percentage) 27 NaN United Nations Population Division, New York, ... 2.0
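The new column is computed from Region/Country/Area because info() shows Value stored as text (dtype object). Its entries contain thousands separators such as "6,956.82", so the column cannot be used in arithmetic as it is. The sketch below assumes the commas are the only non-numeric characters and converts such a column first. The column names Value_num and Value_x2 are hypothetical:

```python
import pandas as pd

# Two values copied from the output above, in the same text format
df = pd.DataFrame({"Series": ["Population mid-year estimates (millions)",
                              "Surface area (thousand km2)"],
                   "Value": ["6,956.82", "136,162"]})

# Strip the commas, then convert; errors="coerce" turns unparsable entries into NaN
df["Value_num"] = pd.to_numeric(df["Value"].str.replace(",", "", regex=False),
                                errors="coerce")
print(df["Value_num"].dtype)  # float64

# Column-wise arithmetic now works, e.g. a derived column
df["Value_x2"] = df["Value_num"] * 2
print(df)
```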
(2) Concatenating DataFrames

concat()

pd.concat([BSdata.Year,BSdata.Series],axis=0)  # stack end to end along the rows (axis=0)
0       2010.0
1       2010.0
2       2010.0
3       2010.0
4       2010.0
         ...  
6939       NaN
6940       NaN
6941       NaN
6942       NaN
6943       NaN
Length: 13888, dtype: object
pd.concat([BSdata.Year,BSdata.Series],axis=1)  # place side by side as columns (axis=1)
Year Series
0 2010.0 Population mid-year estimates (millions)
1 2010.0 Population mid-year estimates for males (milli...
2 2010.0 Population mid-year estimates for females (mil...
3 2010.0 Sex ratio (males per 100 females)
4 2010.0 Population aged 0 to 14 years old (percentage)
... ... ...
6939 NaN NaN
6940 NaN NaN
6941 NaN NaN
6942 NaN NaN
6943 NaN NaN

6944 rows × 2 columns
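With axis=0 the two Series were stacked end to end: 6944 + 6944 = 13888 entries, with dtype object because floats and strings were combined. With axis=1 they were aligned on the index and placed side by side. Whole DataFrames work the same way. The sketch below uses two made-up one-row frames. ignore_index=True renumbers the stacked rows instead of repeating the label 0:

```python
import pandas as pd

top = pd.DataFrame({"Year": [2010.0], "Value": ["6,956.82"]})
bottom = pd.DataFrame({"Year": [2015.0], "Value": ["7,379.80"]})

rows = pd.concat([top, bottom], axis=0, ignore_index=True)  # stacked: 2 rows, 2 columns
cols = pd.concat([top, bottom], axis=1)                     # side by side: 1 row, 4 columns
print(rows.shape, cols.shape)  # (2, 2) (1, 4)
print(rows.index.tolist())     # [0, 1]
```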

(3) Transposing a DataFrame

T: swaps rows and columns

BSdata.iloc[:3,:5].T
0 1 2
Region/Country/Area 1.0 1.0 1.0
Unnamed: 1 Total, all countries or areas Total, all countries or areas Total, all countries or areas
Year 2010.0 2010.0 2010.0
Series Population mid-year estimates (millions) Population mid-year estimates for males (milli... Population mid-year estimates for females (mil...
Value 6,956.82 3,507.70 3,449.12
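Transposing turns the column names into row labels and the row labels into column names, so the shape is reversed. The sketch below uses made-up data. When the columns have different types, the transposed frame falls back to dtype object:

```python
import pandas as pd

df = pd.DataFrame({"Year": [2010.0, 2015.0, 2019.0],
                   "Value": ["6,956.82", "7,379.80", "7,713.47"]})

t = df.T                    # same as df.transpose()
print(df.shape, t.shape)    # (3, 2) (2, 3)
print(t.index.tolist())     # former column names: ['Year', 'Value']
print(t.dtypes.unique())    # mixed float/str data becomes object
```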
posted @ 2022-10-05 19:13  LUNA2333