python数据处理：数据合并和Reshaping

本文资料来自于：

　　Python for Data Analysis: Chapter5, 7, 12

文中实例查看地址：http://nbviewer.jupyter.org/github/RZAmber/for_blog/blob/master/learn_numpy.ipynb

1. Combing and Merging Data Sets

在pandas中，数据可以通过三种方式进行合并。

1.1 panda.merge

　通过一个或者多个key连接两个df的row。这根跟sql中的join运算类似。等复习完sql之后再来补充

1.2 pandas.concat

　numpy有concatenate函数来连接两个ndarray，pandas中用concat函数，沿着轴粘连或者堆积对象

　　默认按照行（垂直方向）来合并，axis=1为按照列

　　参数说明：join：选择合并类型，inner outer等

　　　　　　 join_axes: 规定合并的轴

　　　　　　 keys: 建立分层索引

b[:-2]

　　拼接有overlapping data的对象，用其中一个boject的值补充另外一个object中的missing data

这是一个if else的关系，类似numpy.where函数。

b[:-2].combine_first(a[2:])：解释就是如果(if) pd.isnull(b[:-2])中为Ture，就用a[2:]中的值补充，否则(else)还是使用b[:-2]的值

a[2:].combine_first(b[:-2])：就是相反的，所以可以看到a[2:]中d的值为NaN，所以用了b[:-2]中d的值2.0。

2. Reshaping

2.1 Hierarchical indexing

了解reshaping要先了解分层索引，因为这两者经常一起使用。

分层索引可以让一个轴(axis)拥有多个index level。It provides a way for you to work with higher dimensional data in a lower dmensional form.

调取index参数的时候，名称是MultiIndex，通过这个也可以确定这个是分层索引

Hierachical indexing plays a critical role in reshaping data and group-based operations like forming a pivot table.

2.2 reshaping：用来rearrange tablular data

2.2.1 reshaping in hierarchical indexing

a: stack rotetes from the columns in the data to rows

b: unstack pivot from the rows into the col

　　注意：1. 默认最底层进行unstack或者stack。如果换其他level的话可以输入level number或者name

　　　　　2. 默认stacking筛除missing data，加入stack(dropna=False)就可以了

3 advanced numpy

如上文所说，numpy有concatenate函数

numpy.concatenate takes a swquencc (tuple, list, etc) of array and joins them together in order along the input axis.

这经常跟stack系列函数放在一起使用来进行data的重组或者调用，但是我们常用panda.concat函数，跟numpy.concatenate一个意思

而且下列函数比numpy.concatenate更加简便

下面是一个函数列表：

stack() Join a sequence of arrays along a new axis.

hstack() Stack arrays in sequence horizontally (column wise).

vstack() Stack arrays in sequence vertically (row wise).

dstack() Stack arrays in sequence depth wise (along third dimension).

concatenate() Join a sequence of arrays along an existing axis.

vsplit () Split array into a list of multiple sub-arrays vertically.

posted on 2017-07-18 19:24 xbamboo 阅读(760) 评论(0) 收藏举报

刷新页面返回顶部