pandas练习(二)------ 数据过滤与排序

数据过滤与排序------探索2012欧洲杯数据

相关数据见(github

步骤1 - 导入pandas库

import pandas as pd

步骤2 - 数据集

path2 = "./data/Euro2012.csv"      # Euro2012.csv

步骤3 - 将数据集命名为euro12

euro12 = pd.read_csv(path2)
euro12.tail()

输出:

步骤4 选取 Goals 这一列

euro12.Goals  # euro12['Goals'] 

输出:

步骤5 有多少球队参与了2012欧洲杯?

euro12.shape[0]

输出:

16

步骤6 该数据集中一共有多少列(columns)?

euro12.info()

输出:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 35 columns):
Team                          16 non-null object
Goals                         16 non-null int64
Shots on target               16 non-null int64
Shots off target              16 non-null int64
Shooting Accuracy             16 non-null object
% Goals-to-shots              16 non-null object
Total shots (inc. Blocked)    16 non-null int64
Hit Woodwork                  16 non-null int64
Penalty goals                 16 non-null int64
Penalties not scored          16 non-null int64
Headed goals                  16 non-null int64
Passes                        16 non-null int64
Passes completed              16 non-null int64
Passing Accuracy              16 non-null object
Touches                       16 non-null int64
Crosses                       16 non-null int64
Dribbles                      16 non-null int64
Corners Taken                 16 non-null int64
Tackles                       16 non-null int64
Clearances                    16 non-null int64
Interceptions                 16 non-null int64
Clearances off line           15 non-null float64
Clean Sheets                  16 non-null int64
Blocks                        16 non-null int64
Goals conceded                16 non-null int64
Saves made                    16 non-null int64
Saves-to-shots ratio          16 non-null object
Fouls Won                     16 non-null int64
Fouls Conceded                16 non-null int64
Offsides                      16 non-null int64
Yellow Cards                  16 non-null int64
Red Cards                     16 non-null int64
Subs on                       16 non-null int64
Subs off                      16 non-null int64
Players Used                  16 non-null int64
dtypes: float64(1), int64(29), object(5)
memory usage: 4.5+ KB

步骤7 将数据集中的列Team, Yellow Cards和Red Cards单独存为一个名叫discipline的数据框

discipline = euro12[['Team', 'Yellow Cards', 'Red Cards']]
discipline

输出:

 

步骤8 对数据框discipline按照先Red Cards再Yellow Cards进行排序

discipline.sort_values(['Red Cards', 'Yellow Cards'], ascending = False)

 输出:

 

步骤9 计算每个球队拿到的黄牌数的平均值

round(discipline['Yellow Cards'].mean())

输出:

7.0

步骤10 找到进球数Goals超过6的球队数据

euro12[euro12.Goals > 6]

输出:

步骤11 选取以字母G开头或以e结尾的球队数据

# euro12[euro12.Team.str.startswith('G')]
euro12[euro12.Team.str.endswith('e')]  # 以字母e结束的球队

输出:

步骤12 选取前7列

euro12.iloc[: , 0:7]

输出:

步骤13 选取除了最后3列之外的全部列

euro12.iloc[: , :-3]

输出:

 

步骤14 找到英格兰(England)、意大利(Italy)和俄罗斯(Russia)的命中率(Shooting Accuracy)

euro12.loc[euro12.Team.isin(['England', 'Italy', 'Russia']), ['Team','Shooting Accuracy']]

输出:

 

参考链接:

1、http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook

2、https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/

3、https://github.com/guipsamora/pandas_exercises

posted @ 2018-06-13 11:15  半夜打老虎  阅读(6543)  评论(0编辑  收藏  举报