源地址为:http://pandas.pydata.org/pandas-docs/stable/10min.html#min
Pandas(Python Data Analysis Library) 是基于NumPy 的一种工具,该工具是为了解决数据分析任务而创建的。Pandas 纳入了大量库和一些标准的数据模型,提供了高效地操作大型数据集所需的工具。pandas提供了大量能使我们快速便捷地处理数据的函数和方法。你很快就会发现,它是使Python成为强大而高效的数据分析环境的重要因素之一。
Pandas的数据结构:
Series:一维数组,与Numpy中的一维array类似。二者与Python基本的数据结构List也很相近,其区别是:List中的元素可以是不同的数据类型,而Array和Series中则只允许存储相同的数据类型,这样可以更有效的使用内存,提高运算效率。
Time- Series:以时间为索引的Series。
DataFrame:二维的表格型数据结构。很多功能与R中的data.frame类似。可以将DataFrame理解为Series的容器。以下的内容主要以DataFrame为主。
Panel :三维的数组,可以理解为DataFrame的容器。
对象建立¶
创建Series,在pandas中,Series是代labels的数组。 首先用list创建数据,pandas会自动创建labels
Out[70]:
0 1.0
1 3.0
2 5.0
3 NaN
4 6.0
5 8.0
dtype: float64
通过传递numpy数组,使用datetime索引和标记列来创建DataFrame:
Out[71]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06'],
dtype='datetime64[ns]', freq='D')
Out[72]:
| A | B | C | D |
2013-01-01 |
-0.234526 |
-0.758501 |
-0.609560 |
-0.541163 |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
-1.512732 |
2013-01-05 |
0.119948 |
0.932712 |
-0.375521 |
-1.901977 |
2013-01-06 |
1.153657 |
1.556955 |
-1.172664 |
-0.097324 |
通过传递可以转换为类似Series的对象的Dict来创建DataFrame。
Out[73]:
| A | B | C | D | E | F |
0 |
1.0 |
2013-01-02 |
1.0 |
3 |
test |
foo |
1 |
1.0 |
2013-01-02 |
1.0 |
3 |
train |
foo |
2 |
1.0 |
2013-01-02 |
1.0 |
3 |
test |
foo |
3 |
1.0 |
2013-01-02 |
1.0 |
3 |
train |
foo |
Out[74]:
A float64
B datetime64[ns]
C float32
D int32
E category
F object
dtype: object
Out[75]:
| A | B | C | D |
2013-01-01 |
-0.234526 |
-0.758501 |
-0.609560 |
-0.541163 |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
Out[76]:
| A | B | C | D |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
-1.512732 |
2013-01-05 |
0.119948 |
0.932712 |
-0.375521 |
-1.901977 |
2013-01-06 |
1.153657 |
1.556955 |
-1.172664 |
-0.097324 |
Out[77]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06'],
dtype='datetime64[ns]', freq='D')
Out[78]:
Index(['A', 'B', 'C', 'D'], dtype='object')
Out[79]:
array([[-0.23452598, -0.75850137, -0.60955962, -0.54116297],
[-1.37778166, -0.80120685, 2.16304408, 0.48161429],
[ 0.34576422, 0.03167245, 0.66053017, 0.24051834],
[-0.89690453, -1.61403402, -0.53950167, -1.5127315 ],
[ 0.11994836, 0.93271156, -0.37552097, -1.90197732],
[ 1.1536568 , 1.55695533, -1.17266403, -0.097324 ]])
Out[80]:
| A | B | C | D |
count |
6.000000 |
6.000000 |
6.000000 |
6.000000 |
mean |
-0.148307 |
-0.108734 |
0.021055 |
-0.555177 |
std |
0.904502 |
1.187138 |
1.207575 |
0.964228 |
min |
-1.377782 |
-1.614034 |
-1.172664 |
-1.901977 |
25% |
-0.731310 |
-0.790530 |
-0.592045 |
-1.269839 |
50% |
-0.057289 |
-0.363414 |
-0.457511 |
-0.319243 |
75% |
0.289310 |
0.707452 |
0.401517 |
0.156058 |
max |
1.153657 |
1.556955 |
2.163044 |
0.481614 |
Out[81]:
| 2013-01-01 00:00:00 | 2013-01-02 00:00:00 | 2013-01-03 00:00:00 | 2013-01-04 00:00:00 | 2013-01-05 00:00:00 | 2013-01-06 00:00:00 |
A |
-0.234526 |
-1.377782 |
0.345764 |
-0.896905 |
0.119948 |
1.153657 |
B |
-0.758501 |
-0.801207 |
0.031672 |
-1.614034 |
0.932712 |
1.556955 |
C |
-0.609560 |
2.163044 |
0.660530 |
-0.539502 |
-0.375521 |
-1.172664 |
D |
-0.541163 |
0.481614 |
0.240518 |
-1.512732 |
-1.901977 |
-0.097324 |
Out[82]:
| D | C | B | A |
2013-01-01 |
-0.541163 |
-0.609560 |
-0.758501 |
-0.234526 |
2013-01-02 |
0.481614 |
2.163044 |
-0.801207 |
-1.377782 |
2013-01-03 |
0.240518 |
0.660530 |
0.031672 |
0.345764 |
2013-01-04 |
-1.512732 |
-0.539502 |
-1.614034 |
-0.896905 |
2013-01-05 |
-1.901977 |
-0.375521 |
0.932712 |
0.119948 |
2013-01-06 |
-0.097324 |
-1.172664 |
1.556955 |
1.153657 |
Out[83]:
| A | B | C | D |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
-1.512732 |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
2013-01-01 |
-0.234526 |
-0.758501 |
-0.609560 |
-0.541163 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
2013-01-05 |
0.119948 |
0.932712 |
-0.375521 |
-1.901977 |
2013-01-06 |
1.153657 |
1.556955 |
-1.172664 |
-0.097324 |
注意:对于标准的Python/Numpy方法,选择与设置元素是直观并且易于交互的。但,对于生成代码,建议采用Pandas方法, .at, .iat, .loc, .iloc and .ix.
Out[84]:
2013-01-01 -0.234526
2013-01-02 -1.377782
2013-01-03 0.345764
2013-01-04 -0.896905
2013-01-05 0.119948
2013-01-06 1.153657
Freq: D, Name: A, dtype: float64
Out[85]:
| A | B | C | D |
2013-01-01 |
-0.234526 |
-0.758501 |
-0.609560 |
-0.541163 |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
Out[86]:
| A | B | C | D |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
-1.512732 |
通过label选择数据¶
通过label获取对应数据
Out[87]:
A -0.234526
B -0.758501
C -0.609560
D -0.541163
Name: 2013-01-01 00:00:00, dtype: float64
Out[88]:
| A | B |
2013-01-01 |
-0.234526 |
-0.758501 |
2013-01-02 |
-1.377782 |
-0.801207 |
2013-01-03 |
0.345764 |
0.031672 |
2013-01-04 |
-0.896905 |
-1.614034 |
2013-01-05 |
0.119948 |
0.932712 |
2013-01-06 |
1.153657 |
1.556955 |
Out[89]:
| A | B |
2013-01-02 |
-1.377782 |
-0.801207 |
2013-01-03 |
0.345764 |
0.031672 |
2013-01-04 |
-0.896905 |
-1.614034 |
Out[90]:
A -1.377782
B -0.801207
Name: 2013-01-02 00:00:00, dtype: float64
Out[93]:
A -0.896905
B -1.614034
C -0.539502
D -1.512732
Name: 2013-01-04 00:00:00, dtype: float64
通过行与列的位置选择数据,类似于python与numpy
Out[94]:
| A | B |
2013-01-04 |
-0.896905 |
-1.614034 |
2013-01-05 |
0.119948 |
0.932712 |
Out[95]:
| A | C |
2013-01-01 |
-0.234526 |
-0.609560 |
2013-01-03 |
0.345764 |
0.660530 |
2013-01-05 |
0.119948 |
-0.375521 |
Out[96]:
| A | B | C | D |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
Out[97]:
| B | C |
2013-01-01 |
-0.758501 |
-0.609560 |
2013-01-02 |
-0.801207 |
2.163044 |
2013-01-03 |
0.031672 |
0.660530 |
2013-01-04 |
-1.614034 |
-0.539502 |
2013-01-05 |
0.932712 |
-0.375521 |
2013-01-06 |
1.556955 |
-1.172664 |
Out[100]:
| A | B | C | D |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
2013-01-05 |
0.119948 |
0.932712 |
-0.375521 |
-1.901977 |
2013-01-06 |
1.153657 |
1.556955 |
-1.172664 |
-0.097324 |
Out[101]:
| A | B | C | D |
2013-01-01 |
NaN |
NaN |
NaN |
NaN |
2013-01-02 |
NaN |
NaN |
2.163044 |
0.481614 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
2013-01-04 |
NaN |
NaN |
NaN |
NaN |
2013-01-05 |
0.119948 |
0.932712 |
NaN |
NaN |
2013-01-06 |
1.153657 |
1.556955 |
NaN |
NaN |
Out[102]:
| A | B | C | D | E |
2013-01-01 |
-0.234526 |
-0.758501 |
-0.609560 |
-0.541163 |
one |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
two |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
three |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
-1.512732 |
four |
2013-01-05 |
0.119948 |
0.932712 |
-0.375521 |
-1.901977 |
five |
2013-01-06 |
1.153657 |
1.556955 |
-1.172664 |
-0.097324 |
six |
Out[103]:
| A | B | C | D | E |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
two |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
-1.512732 |
four |
Out[104]:
2013-01-02 1
2013-01-03 2
2013-01-04 3
2013-01-05 4
2013-01-06 5
2013-01-07 6
Freq: D, dtype: int64
Out[105]:
| A | B | C | D | F |
2013-01-01 |
-0.234526 |
-0.758501 |
-0.609560 |
-0.541163 |
NaN |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
1.0 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
2.0 |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
-1.512732 |
3.0 |
2013-01-05 |
0.119948 |
0.932712 |
-0.375521 |
-1.901977 |
4.0 |
2013-01-06 |
1.153657 |
1.556955 |
-1.172664 |
-0.097324 |
5.0 |
Out[106]:
| A | B | C | D | F |
2013-01-01 |
0.000000 |
-0.758501 |
-0.609560 |
-0.541163 |
NaN |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
1.0 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
2.0 |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
-1.512732 |
3.0 |
2013-01-05 |
0.119948 |
0.932712 |
-0.375521 |
-1.901977 |
4.0 |
2013-01-06 |
1.153657 |
1.556955 |
-1.172664 |
-0.097324 |
5.0 |
Out[107]:
| A | B | C | D | F |
2013-01-01 |
0.000000 |
0.000000 |
-0.609560 |
-0.541163 |
NaN |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
0.481614 |
1.0 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
0.240518 |
2.0 |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
-1.512732 |
3.0 |
2013-01-05 |
0.119948 |
0.932712 |
-0.375521 |
-1.901977 |
4.0 |
2013-01-06 |
1.153657 |
1.556955 |
-1.172664 |
-0.097324 |
5.0 |
Out[109]:
| A | B | C | D | F |
2013-01-01 |
0.000000 |
0.000000 |
-0.609560 |
5 |
NaN |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
5 |
1.0 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
5 |
2.0 |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
5 |
3.0 |
2013-01-05 |
0.119948 |
0.932712 |
-0.375521 |
5 |
4.0 |
2013-01-06 |
1.153657 |
1.556955 |
-1.172664 |
5 |
5.0 |
Out[112]:
| A | B | C | D | F |
2013-01-01 |
0.000000 |
0.000000 |
-0.609560 |
-5 |
NaN |
2013-01-02 |
-1.377782 |
-0.801207 |
-2.163044 |
-5 |
-1.0 |
2013-01-03 |
-0.345764 |
-0.031672 |
-0.660530 |
-5 |
-2.0 |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
-5 |
-3.0 |
2013-01-05 |
-0.119948 |
-0.932712 |
-0.375521 |
-5 |
-4.0 |
2013-01-06 |
-1.153657 |
-1.556955 |
-1.172664 |
-5 |
-5.0 |
Pandas基本采用np.nan表示缺失数据,缺失数据默认不会参加运算。
重新索引允许您更改/添加/删除指定行或列上的索引。 这将返回数据的副本。
Out[113]:
| A | B | C | D | F | E |
2013-01-01 |
0.000000 |
0.000000 |
-0.609560 |
5 |
NaN |
1.0 |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
5 |
1.0 |
1.0 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
5 |
2.0 |
NaN |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
5 |
3.0 |
NaN |
Out[114]:
| A | B | C | D | F | E |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
5 |
1.0 |
1.0 |
Out[116]:
| A | B | C | D | F | E |
2013-01-01 |
0.000000 |
0.000000 |
-0.609560 |
5 |
5.0 |
1.0 |
2013-01-02 |
-1.377782 |
-0.801207 |
2.163044 |
5 |
1.0 |
1.0 |
2013-01-03 |
0.345764 |
0.031672 |
0.660530 |
5 |
2.0 |
5.0 |
2013-01-04 |
-0.896905 |
-1.614034 |
-0.539502 |
5 |
3.0 |
5.0 |
返回布尔数据,如果为nan返回true,否则返回false
Out[117]:
| A | B | C | D | F | E |
2013-01-01 |
False |
False |
False |
False |
True |
False |
2013-01-02 |
False |
False |
False |
False |
False |
False |
2013-01-03 |
False |
False |
False |
False |
False |
True |
2013-01-04 |
False |
False |
False |
False |
False |
True |
Out[118]:
A -0.109219
B 0.017683
C 0.021055
D 5.000000
F 3.000000
dtype: float64
Out[119]:
2013-01-01 1.097610
2013-01-02 1.196811
2013-01-03 1.607593
2013-01-04 0.989912
2013-01-05 1.935428
2013-01-06 2.307590
Freq: D, dtype: float64
操作具有不同维度和需要对齐的对象。 此外,Pandas会沿着指定的维度自动广播。
Out[120]:
2013-01-01 NaN
2013-01-02 NaN
2013-01-03 1.0
2013-01-04 3.0
2013-01-05 5.0
2013-01-06 NaN
Freq: D, dtype: float64
Out[121]:
| A | B | C | D | F |
2013-01-01 |
NaN |
NaN |
NaN |
NaN |
NaN |
2013-01-02 |
NaN |
NaN |
NaN |
NaN |
NaN |
2013-01-03 |
-0.654236 |
-0.968328 |
-0.339470 |
4.0 |
1.0 |
2013-01-04 |
-3.896905 |
-4.614034 |
-3.539502 |
2.0 |
0.0 |
2013-01-05 |
-4.880052 |
-4.067288 |
-5.375521 |
0.0 |
-1.0 |
2013-01-06 |
NaN |
NaN |
NaN |
NaN |
NaN |
Out[122]:
| A | B | C | D | F |
2013-01-01 |
0.000000 |
0.000000 |
-0.609560 |
5 |
NaN |
2013-01-02 |
-1.377782 |
-0.801207 |
1.553484 |
10 |
1.0 |
2013-01-03 |
-1.032017 |
-0.769534 |
2.214015 |
15 |
3.0 |
2013-01-04 |
-1.928922 |
-2.383568 |
1.674513 |
20 |
6.0 |
2013-01-05 |
-1.808974 |
-1.450857 |
1.298992 |
25 |
10.0 |
2013-01-06 |
-0.655317 |
0.106098 |
0.126328 |
30 |
15.0 |
Out[123]:
A 2.531438
B 3.170989
C 3.335708
D 0.000000
F 4.000000
dtype: float64
Out[125]:
0 5
1 4
2 0
3 6
4 1
5 0
6 4
7 1
8 1
9 4
dtype: int32
Out[126]:
4 3
1 3
0 2
6 1
5 1
dtype: int64
Series在string中提供配置了一组字符串处理方法,可以方便地对数组的每个元素进行操作。
Out[128]:
0 A
1 B
2 C
3 Aaba
4 Baca
5 NaN
6 CABA
7 dog
8 cat
dtype: object
Out[129]:
0 a
1 b
2 c
3 aaba
4 baca
5 NaN
6 caba
7 dog
8 cat
dtype: object
Pandas提供一系列合并数据的操作,例如:Series, DataFrame, and Panel操作
Out[133]:
| 0 | 1 | 2 | 3 |
0 |
-0.382355 |
-1.457945 |
-1.058879 |
1.106354 |
1 |
1.120769 |
-0.559993 |
-2.319053 |
1.404413 |
2 |
-0.382069 |
1.362170 |
0.289857 |
1.009210 |
3 |
-1.280824 |
1.158049 |
-0.356128 |
0.023975 |
4 |
0.797376 |
0.384624 |
0.152443 |
-1.392915 |
5 |
-1.886364 |
-0.276635 |
-1.159430 |
2.355427 |
6 |
-1.851654 |
0.781897 |
1.054916 |
0.323811 |
7 |
0.708013 |
1.648491 |
-0.031482 |
-1.245125 |
8 |
0.502539 |
-1.013665 |
-1.653738 |
0.748612 |
9 |
-0.773031 |
-1.085371 |
-1.070529 |
-0.190790 |
Out[135]:
| 0 | 1 | 2 | 3 |
0 |
-0.382355 |
-1.457945 |
-1.058879 |
1.106354 |
1 |
1.120769 |
-0.559993 |
-2.319053 |
1.404413 |
2 |
-0.382069 |
1.362170 |
0.289857 |
1.009210 |
3 |
-1.280824 |
1.158049 |
-0.356128 |
0.023975 |
4 |
0.797376 |
0.384624 |
0.152443 |
-1.392915 |
5 |
-1.886364 |
-0.276635 |
-1.159430 |
2.355427 |
6 |
-1.851654 |
0.781897 |
1.054916 |
0.323811 |
7 |
0.708013 |
1.648491 |
-0.031482 |
-1.245125 |
8 |
0.502539 |
-1.013665 |
-1.653738 |
0.748612 |
9 |
-0.773031 |
-1.085371 |
-1.070529 |
-0.190790 |
Out[139]:
| key | lval | rval |
0 |
foo |
1 |
4 |
1 |
foo |
1 |
5 |
2 |
foo |
2 |
4 |
3 |
foo |
2 |
5 |
Out[143]:
| key | lval | rval |
0 |
foo |
1 |
4 |
1 |
bar |
2 |
5 |
Out[145]:
| A | B | C | D |
0 |
-0.789698 |
0.807736 |
-0.535839 |
1.633274 |
1 |
1.122605 |
0.550968 |
-0.337497 |
-0.287827 |
2 |
0.231271 |
0.273175 |
0.891461 |
-0.531991 |
3 |
0.902182 |
0.802540 |
-0.319774 |
0.131826 |
4 |
-1.999558 |
-0.114146 |
0.449653 |
0.174982 |
5 |
1.688851 |
0.789424 |
-0.194151 |
2.017002 |
6 |
0.738474 |
-0.682955 |
-1.420737 |
-0.978726 |
7 |
-0.234849 |
0.682447 |
0.753980 |
0.327666 |
Out[146]:
| A | B | C | D |
0 |
-0.789698 |
0.807736 |
-0.535839 |
1.633274 |
1 |
1.122605 |
0.550968 |
-0.337497 |
-0.287827 |
2 |
0.231271 |
0.273175 |
0.891461 |
-0.531991 |
3 |
0.902182 |
0.802540 |
-0.319774 |
0.131826 |
4 |
-1.999558 |
-0.114146 |
0.449653 |
0.174982 |
5 |
1.688851 |
0.789424 |
-0.194151 |
2.017002 |
6 |
0.738474 |
-0.682955 |
-1.420737 |
-0.978726 |
7 |
-0.234849 |
0.682447 |
0.753980 |
0.327666 |
8 |
0.902182 |
0.802540 |
-0.319774 |
0.131826 |
group by操作可以分为如下操作:1.基于某些规则分组数据;
Out[147]:
| A | B | C | D |
0 |
foo |
one |
0.645354 |
0.110824 |
1 |
bar |
one |
1.090199 |
-1.103622 |
2 |
foo |
two |
0.492488 |
0.918727 |
3 |
bar |
three |
-1.938242 |
0.122475 |
4 |
foo |
two |
-0.416253 |
-1.056679 |
5 |
bar |
two |
-0.616998 |
0.482990 |
6 |
foo |
one |
0.311077 |
0.641402 |
7 |
foo |
three |
-1.400911 |
-0.126283 |
Out[148]:
| C | D |
A | | |
bar |
-1.465042 |
-0.498157 |
foo |
-0.368245 |
0.487990 |
Out[149]:
| | C | D |
A | B | | |
bar | one |
1.090199 |
-1.103622 |
three |
-1.938242 |
0.122475 |
two |
-0.616998 |
0.482990 |
foo | one |
0.956431 |
0.752225 |
three |
-1.400911 |
-0.126283 |
two |
0.076235 |
-0.137952 |
Out[151]:
| | A | B |
first | second | | |
bar | one |
0.482226 |
-1.085646 |
two |
0.805270 |
1.785375 |
baz | one |
-0.860945 |
-1.018662 |
two |
0.921586 |
-0.216946 |
Out[152]:
first second
bar one A 0.482226
B -1.085646
two A 0.805270
B 1.785375
baz one A -0.860945
B -1.018662
two A 0.921586
B -0.216946
dtype: float64
unstack()方法,是stack()方法的反操作,例如:
Out[153]:
| | A | B |
first | second | | |
bar | one |
0.482226 |
-1.085646 |
two |
0.805270 |
1.785375 |
baz | one |
-0.860945 |
-1.018662 |
two |
0.921586 |
-0.216946 |
Out[154]:
| second | one | two |
first | | | |
bar | A |
0.482226 |
0.805270 |
B |
-1.085646 |
1.785375 |
baz | A |
-0.860945 |
0.921586 |
B |
-1.018662 |
-0.216946 |
Out[155]:
| first | bar | baz |
second | | | |
one | A |
0.482226 |
-0.860945 |
B |
-1.085646 |
-1.018662 |
two | A |
0.805270 |
0.921586 |
B |
1.785375 |
-0.216946 |
Out[156]:
| A | B | C | D | E |
0 |
one |
A |
foo |
-1.863191 |
-0.420588 |
1 |
one |
B |
foo |
1.308556 |
1.373186 |
2 |
two |
C |
foo |
0.969745 |
-0.244350 |
3 |
three |
A |
bar |
0.308731 |
0.904415 |
4 |
one |
B |
bar |
1.918195 |
-0.113832 |
5 |
one |
C |
bar |
0.121828 |
-0.364853 |
6 |
two |
A |
foo |
0.240318 |
-0.536804 |
7 |
three |
B |
foo |
-1.067721 |
0.743653 |
8 |
one |
C |
foo |
-0.837986 |
0.775367 |
9 |
one |
A |
bar |
-0.622907 |
1.653944 |
10 |
two |
B |
bar |
0.221569 |
-2.451701 |
11 |
three |
C |
bar |
-2.312243 |
0.502830 |
Out[157]:
| C | bar | foo |
A | B | | |
one | A |
-0.622907 |
-1.863191 |
B |
1.918195 |
1.308556 |
C |
0.121828 |
-0.837986 |
three | A |
0.308731 |
NaN |
B |
NaN |
-1.067721 |
C |
-2.312243 |
NaN |
two | A |
NaN |
0.240318 |
B |
0.221569 |
NaN |
C |
NaN |
0.969745 |
Pandas具有简单,强大和高效的功能,用于在频率转换期间执行重采样操作
Out[158]:
2012-01-01 26325
Freq: 5T, dtype: int32
Out[159]:
2012-03-06 -0.756171
2012-03-07 -1.818210
2012-03-08 -1.742229
2012-03-09 -0.666278
2012-03-10 -0.246013
Freq: D, dtype: float64
Out[160]:
2012-03-06 00:00:00+00:00 -0.756171
2012-03-07 00:00:00+00:00 -1.818210
2012-03-08 00:00:00+00:00 -1.742229
2012-03-09 00:00:00+00:00 -0.666278
2012-03-10 00:00:00+00:00 -0.246013
Freq: D, dtype: float64
Out[161]:
2012-03-05 19:00:00-05:00 -0.756171
2012-03-06 19:00:00-05:00 -1.818210
2012-03-07 19:00:00-05:00 -1.742229
2012-03-08 19:00:00-05:00 -0.666278
2012-03-09 19:00:00-05:00 -0.246013
Freq: D, dtype: float64
Out[162]:
2012-01-31 -2.280217
2012-02-29 -1.265666
2012-03-31 -0.693234
2012-04-30 -0.160583
2012-05-31 0.394237
Freq: M, dtype: float64
Out[163]:
2012-01 -2.280217
2012-02 -1.265666
2012-03 -0.693234
2012-04 -0.160583
2012-05 0.394237
Freq: M, dtype: float64
Out[164]:
2012-01-01 -2.280217
2012-02-01 -1.265666
2012-03-01 -0.693234
2012-04-01 -0.160583
2012-05-01 0.394237
Freq: MS, dtype: float64
在周期和时间戳之间进行转换,可以使用一些方便的数学函数。
Out[165]:
1990-03-01 09:00 -0.116439
1990-06-01 09:00 0.689428
1990-09-01 09:00 1.603478
1990-12-01 09:00 -0.074146
1991-03-01 09:00 -0.642818
Freq: H, dtype: float64
pandas自0.15版本后,Pandas包含了Categorical Data
Out[167]:
0 a
1 b
2 b
3 a
4 a
5 e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
将Categorical Data重命名为更有意义的名称
Out[169]:
0 very good
1 good
2 good
3 very good
4 very good
5 very bad
Name: grade, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
Out[170]:
| id | raw_grade | grade |
5 |
6 |
e |
very bad |
1 |
2 |
b |
good |
2 |
3 |
b |
good |
0 |
1 |
a |
very good |
3 |
4 |
a |
very good |
4 |
5 |
a |
very good |
Out[171]:
grade
very bad 1
bad 0
medium 0
good 2
very good 3
dtype: int64
Out[174]:
<matplotlib.axes._subplots.AxesSubplot at 0x15444d8a518>
Out[176]:
<matplotlib.legend.Legend at 0x154451d2048>
Out[178]:
| Unnamed: 0 | A | B | C | D |
0 |
2000-01-01 |
-1.393111 |
1.514788 |
0.280721 |
-0.294551 |
1 |
2000-01-02 |
-0.642525 |
2.195259 |
0.592064 |
-0.057531 |
2 |
2000-01-03 |
-1.001919 |
1.979959 |
2.552074 |
1.107618 |
3 |
2000-01-04 |
0.949372 |
1.834230 |
2.081279 |
1.344826 |
4 |
2000-01-05 |
1.031375 |
2.840040 |
1.999606 |
2.436098 |
5 |
2000-01-06 |
0.709656 |
3.388670 |
1.952937 |
1.355730 |
6 |
2000-01-07 |
1.723083 |
3.503932 |
0.290802 |
1.950817 |
7 |
2000-01-08 |
2.374127 |
5.203101 |
-0.330689 |
3.287472 |
8 |
2000-01-09 |
2.674692 |
8.018739 |
-0.036404 |
3.521545 |
9 |
2000-01-10 |
2.630701 |
8.189795 |
0.385022 |
4.931897 |
10 |
2000-01-11 |
3.126444 |
7.579697 |
0.679853 |
5.769419 |
11 |
2000-01-12 |
2.709425 |
8.329440 |
-1.069793 |
5.963032 |
12 |
2000-01-13 |
2.860447 |
7.253739 |
-1.309274 |
5.421635 |
13 |
2000-01-14 |
3.314667 |
8.721476 |
-1.965677 |
4.064228 |
14 |
2000-01-15 |
2.349331 |
8.863615 |
-2.761195 |
4.930499 |
15 |
2000-01-16 |
1.958310 |
7.119070 |
-2.625528 |
2.593476 |
16 |
2000-01-17 |
2.070067 |
9.004844 |
-2.052658 |
2.760979 |
17 |
2000-01-18 |
1.965295 |
8.512214 |
-2.359618 |
0.699228 |
18 |
2000-01-19 |
0.900034 |
8.133227 |
-1.982716 |
1.624482 |
19 |
2000-01-20 |
0.035752 |
6.445181 |
-2.893294 |
1.594081 |
20 |
2000-01-21 |
-2.014831 |
6.620557 |
-4.773599 |
1.002027 |
21 |
2000-01-22 |
-0.923967 |
8.693741 |
-3.695795 |
0.558935 |
22 |
2000-01-23 |
-1.668225 |
7.790933 |
-3.573539 |
0.506860 |
23 |
2000-01-24 |
-2.056015 |
6.824887 |
-2.564427 |
-0.662262 |
24 |
2000-01-25 |
-1.843934 |
8.955545 |
-2.683112 |
-0.809258 |
25 |
2000-01-26 |
-1.856735 |
8.607138 |
-2.440229 |
-2.196696 |
26 |
2000-01-27 |
-3.463581 |
8.966359 |
-3.873806 |
-1.524501 |
27 |
2000-01-28 |
-4.788617 |
7.752132 |
-4.553026 |
-2.399321 |
28 |
2000-01-29 |
-5.976016 |
7.036011 |
-4.460207 |
-1.605071 |
29 |
2000-01-30 |
-5.034437 |
6.138244 |
-4.451524 |
-1.628326 |
... |
... |
... |
... |
... |
... |
970 |
2002-08-28 |
7.087637 |
-10.915543 |
-58.482667 |
-74.821127 |
971 |
2002-08-29 |
6.120060 |
-10.997083 |
-57.452650 |
-73.742471 |
972 |
2002-08-30 |
7.691062 |
-11.176013 |
-57.091866 |
-73.281751 |
973 |
2002-08-31 |
8.452480 |
-11.293687 |
-56.313169 |
-74.869274 |
974 |
2002-09-01 |
7.553114 |
-10.510296 |
-56.900910 |
-75.732503 |
975 |
2002-09-02 |
8.275944 |
-9.958087 |
-56.768495 |
-75.445624 |
976 |
2002-09-03 |
10.713038 |
-10.430143 |
-56.750073 |
-76.824079 |
977 |
2002-09-04 |
12.299215 |
-11.939397 |
-58.104176 |
-77.024297 |
978 |
2002-09-05 |
13.160027 |
-12.323100 |
-58.055834 |
-79.144961 |
979 |
2002-09-06 |
14.413937 |
-11.905493 |
-57.145252 |
-78.776055 |
980 |
2002-09-07 |
14.514871 |
-11.271294 |
-58.854309 |
-78.735910 |
981 |
2002-09-08 |
14.828057 |
-10.749718 |
-59.836208 |
-77.881782 |
982 |
2002-09-09 |
14.525107 |
-9.981579 |
-59.973495 |
-77.019424 |
983 |
2002-09-10 |
13.248493 |
-10.430729 |
-60.947579 |
-76.936804 |
984 |
2002-09-11 |
13.866722 |
-9.581368 |
-61.176041 |
-76.366701 |
985 |
2002-09-12 |
13.711199 |
-9.163406 |
-60.199849 |
-76.076018 |
986 |
2002-09-13 |
12.278210 |
-9.744253 |
-61.488643 |
-77.830263 |
987 |
2002-09-14 |
13.872331 |
-9.769701 |
-60.596225 |
-79.301200 |
988 |
2002-09-15 |
13.742700 |
-11.035063 |
-61.132231 |
-79.930390 |
989 |
2002-09-16 |
13.117977 |
-10.444062 |
-61.367571 |
-79.468517 |
990 |
2002-09-17 |
12.547621 |
-13.403313 |
-61.542260 |
-77.685127 |
991 |
2002-09-18 |
12.156448 |
-14.764817 |
-62.347422 |
-77.060493 |
992 |
2002-09-19 |
12.356847 |
-14.440572 |
-62.154593 |
-75.846297 |
993 |
2002-09-20 |
12.967553 |
-13.593848 |
-62.368733 |
-76.248747 |
994 |
2002-09-21 |
12.188960 |
-13.960158 |
-64.801744 |
-77.560436 |
995 |
2002-09-22 |
10.930343 |
-14.180484 |
-61.880778 |
-78.704844 |
996 |
2002-09-23 |
11.546164 |
-14.640839 |
-60.011920 |
-79.081217 |
997 |
2002-09-24 |
12.010131 |
-13.948313 |
-58.988687 |
-79.707070 |
998 |
2002-09-25 |
10.203590 |
-14.170849 |
-59.165740 |
-78.678488 |
999 |
2002-09-26 |
9.985623 |
-13.509717 |
-57.876236 |
-78.605853 |
1000 rows × 5 columns
Out[180]:
| A | B | C | D |
2000-01-01 |
-1.393111 |
1.514788 |
0.280721 |
-0.294551 |
2000-01-02 |
-0.642525 |
2.195259 |
0.592064 |
-0.057531 |
2000-01-03 |
-1.001919 |
1.979959 |
2.552074 |
1.107618 |
2000-01-04 |
0.949372 |
1.834230 |
2.081279 |
1.344826 |
2000-01-05 |
1.031375 |
2.840040 |
1.999606 |
2.436098 |
2000-01-06 |
0.709656 |
3.388670 |
1.952937 |
1.355730 |
2000-01-07 |
1.723083 |
3.503932 |
0.290802 |
1.950817 |
2000-01-08 |
2.374127 |
5.203101 |
-0.330689 |
3.287472 |
2000-01-09 |
2.674692 |
8.018739 |
-0.036404 |
3.521545 |
2000-01-10 |
2.630701 |
8.189795 |
0.385022 |
4.931897 |
2000-01-11 |
3.126444 |
7.579697 |
0.679853 |
5.769419 |
2000-01-12 |
2.709425 |
8.329440 |
-1.069793 |
5.963032 |
2000-01-13 |
2.860447 |
7.253739 |
-1.309274 |
5.421635 |
2000-01-14 |
3.314667 |
8.721476 |
-1.965677 |
4.064228 |
2000-01-15 |
2.349331 |
8.863615 |
-2.761195 |
4.930499 |
2000-01-16 |
1.958310 |
7.119070 |
-2.625528 |
2.593476 |
2000-01-17 |
2.070067 |
9.004844 |
-2.052658 |
2.760979 |
2000-01-18 |
1.965295 |
8.512214 |
-2.359618 |
0.699228 |
2000-01-19 |
0.900034 |
8.133227 |
-1.982716 |
1.624482 |
2000-01-20 |
0.035752 |
6.445181 |
-2.893294 |
1.594081 |
2000-01-21 |
-2.014831 |
6.620557 |
-4.773599 |
1.002027 |
2000-01-22 |
-0.923967 |
8.693741 |
-3.695795 |
0.558935 |
2000-01-23 |
-1.668225 |
7.790933 |
-3.573539 |
0.506860 |
2000-01-24 |
-2.056015 |
6.824887 |
-2.564427 |
-0.662262 |
2000-01-25 |
-1.843934 |
8.955545 |
-2.683112 |
-0.809258 |
2000-01-26 |
-1.856735 |
8.607138 |
-2.440229 |
-2.196696 |
2000-01-27 |
-3.463581 |
8.966359 |
-3.873806 |
-1.524501 |
2000-01-28 |
-4.788617 |
7.752132 |
-4.553026 |
-2.399321 |
2000-01-29 |
-5.976016 |
7.036011 |
-4.460207 |
-1.605071 |
2000-01-30 |
-5.034437 |
6.138244 |
-4.451524 |
-1.628326 |
... |
... |
... |
... |
... |
2002-08-28 |
7.087637 |
-10.915543 |
-58.482667 |
-74.821127 |
2002-08-29 |
6.120060 |
-10.997083 |
-57.452650 |
-73.742471 |
2002-08-30 |
7.691062 |
-11.176013 |
-57.091866 |
-73.281751 |
2002-08-31 |
8.452480 |
-11.293687 |
-56.313169 |
-74.869274 |
2002-09-01 |
7.553114 |
-10.510296 |
-56.900910 |
-75.732503 |
2002-09-02 |
8.275944 |
-9.958087 |
-56.768495 |
-75.445624 |
2002-09-03 |
10.713038 |
-10.430143 |
-56.750073 |
-76.824079 |
2002-09-04 |
12.299215 |
-11.939397 |
-58.104176 |
-77.024297 |
2002-09-05 |
13.160027 |
-12.323100 |
-58.055834 |
-79.144961 |
2002-09-06 |
14.413937 |
-11.905493 |
-57.145252 |
-78.776055 |
2002-09-07 |
14.514871 |
-11.271294 |
-58.854309 |
-78.735910 |
2002-09-08 |
14.828057 |
-10.749718 |
-59.836208 |
-77.881782 |
2002-09-09 |
14.525107 |
-9.981579 |
-59.973495 |
-77.019424 |
2002-09-10 |
13.248493 |
-10.430729 |
-60.947579 |
-76.936804 |
2002-09-11 |
13.866722 |
-9.581368 |
-61.176041 |
-76.366701 |
2002-09-12 |
13.711199 |
-9.163406 |
-60.199849 |
-76.076018 |
2002-09-13 |
12.278210 |
-9.744253 |
-61.488643 |
-77.830263 |
2002-09-14 |
13.872331 |
-9.769701 |
-60.596225 |
-79.301200 |
2002-09-15 |
13.742700 |
-11.035063 |
-61.132231 |
-79.930390 |
2002-09-16 |
13.117977 |
-10.444062 |
-61.367571 |
-79.468517 |
2002-09-17 |
12.547621 |
-13.403313 |
-61.542260 |
-77.685127 |
2002-09-18 |
12.156448 |
-14.764817 |
-62.347422 |
-77.060493 |
2002-09-19 |
12.356847 |
-14.440572 |
-62.154593 |
-75.846297 |
2002-09-20 |
12.967553 |
-13.593848 |
-62.368733 |
-76.248747 |
2002-09-21 |
12.188960 |
-13.960158 |
-64.801744 |
-77.560436 |
2002-09-22 |
10.930343 |
-14.180484 |
-61.880778 |
-78.704844 |
2002-09-23 |
11.546164 |
-14.640839 |
-60.011920 |
-79.081217 |
2002-09-24 |
12.010131 |
-13.948313 |
-58.988687 |
-79.707070 |
2002-09-25 |
10.203590 |
-14.170849 |
-59.165740 |
-78.678488 |
2002-09-26 |
9.985623 |
-13.509717 |
-57.876236 |
-78.605853 |
1000 rows × 4 columns
Out[182]:
| A | B | C | D |
2000-01-01 |
-1.393111 |
1.514788 |
0.280721 |
-0.294551 |
2000-01-02 |
-0.642525 |
2.195259 |
0.592064 |
-0.057531 |
2000-01-03 |
-1.001919 |
1.979959 |
2.552074 |
1.107618 |
2000-01-04 |
0.949372 |
1.834230 |
2.081279 |
1.344826 |
2000-01-05 |
1.031375 |
2.840040 |
1.999606 |
2.436098 |
2000-01-06 |
0.709656 |
3.388670 |
1.952937 |
1.355730 |
2000-01-07 |
1.723083 |
3.503932 |
0.290802 |
1.950817 |
2000-01-08 |
2.374127 |
5.203101 |
-0.330689 |
3.287472 |
2000-01-09 |
2.674692 |
8.018739 |
-0.036404 |
3.521545 |
2000-01-10 |
2.630701 |
8.189795 |
0.385022 |
4.931897 |
2000-01-11 |
3.126444 |
7.579697 |
0.679853 |
5.769419 |
2000-01-12 |
2.709425 |
8.329440 |
-1.069793 |
5.963032 |
2000-01-13 |
2.860447 |
7.253739 |
-1.309274 |
5.421635 |
2000-01-14 |
3.314667 |
8.721476 |
-1.965677 |
4.064228 |
2000-01-15 |
2.349331 |
8.863615 |
-2.761195 |
4.930499 |
2000-01-16 |
1.958310 |
7.119070 |
-2.625528 |
2.593476 |
2000-01-17 |
2.070067 |
9.004844 |
-2.052658 |
2.760979 |
2000-01-18 |
1.965295 |
8.512214 |
-2.359618 |
0.699228 |
2000-01-19 |
0.900034 |
8.133227 |
-1.982716 |
1.624482 |
2000-01-20 |
0.035752 |
6.445181 |
-2.893294 |
1.594081 |
2000-01-21 |
-2.014831 |
6.620557 |
-4.773599 |
1.002027 |
2000-01-22 |
-0.923967 |
8.693741 |
-3.695795 |
0.558935 |
2000-01-23 |
-1.668225 |
7.790933 |
-3.573539 |
0.506860 |
2000-01-24 |
-2.056015 |
6.824887 |
-2.564427 |
-0.662262 |
2000-01-25 |
-1.843934 |
8.955545 |
-2.683112 |
-0.809258 |
2000-01-26 |
-1.856735 |
8.607138 |
-2.440229 |
-2.196696 |
2000-01-27 |
-3.463581 |
8.966359 |
-3.873806 |
-1.524501 |
2000-01-28 |
-4.788617 |
7.752132 |
-4.553026 |
-2.399321 |
2000-01-29 |
-5.976016 |
7.036011 |
-4.460207 |
-1.605071 |
2000-01-30 |
-5.034437 |
6.138244 |
-4.451524 |
-1.628326 |
... |
... |
... |
... |
... |
2002-08-28 |
7.087637 |
-10.915543 |
-58.482667 |
-74.821127 |
2002-08-29 |
6.120060 |
-10.997083 |
-57.452650 |
-73.742471 |
2002-08-30 |
7.691062 |
-11.176013 |
-57.091866 |
-73.281751 |
2002-08-31 |
8.452480 |
-11.293687 |
-56.313169 |
-74.869274 |
2002-09-01 |
7.553114 |
-10.510296 |
-56.900910 |
-75.732503 |
2002-09-02 |
8.275944 |
-9.958087 |
-56.768495 |
-75.445624 |
2002-09-03 |
10.713038 |
-10.430143 |
-56.750073 |
-76.824079 |
2002-09-04 |
12.299215 |
-11.939397 |
-58.104176 |
-77.024297 |
2002-09-05 |
13.160027 |
-12.323100 |
-58.055834 |
-79.144961 |
2002-09-06 |
14.413937 |
-11.905493 |
-57.145252 |
-78.776055 |
2002-09-07 |
14.514871 |
-11.271294 |
-58.854309 |
-78.735910 |
2002-09-08 |
14.828057 |
-10.749718 |
-59.836208 |
-77.881782 |
2002-09-09 |
14.525107 |
-9.981579 |
-59.973495 |
-77.019424 |
2002-09-10 |
13.248493 |
-10.430729 |
-60.947579 |
-76.936804 |
2002-09-11 |
13.866722 |
-9.581368 |
-61.176041 |
-76.366701 |
2002-09-12 |
13.711199 |
-9.163406 |
-60.199849 |
-76.076018 |
2002-09-13 |
12.278210 |
-9.744253 |
-61.488643 |
-77.830263 |
2002-09-14 |
13.872331 |
-9.769701 |
-60.596225 |
-79.301200 |
2002-09-15 |
13.742700 |
-11.035063 |
-61.132231 |
-79.930390 |
2002-09-16 |
13.117977 |
-10.444062 |
-61.367571 |
-79.468517 |
2002-09-17 |
12.547621 |
-13.403313 |
-61.542260 |
-77.685127 |
2002-09-18 |
12.156448 |
-14.764817 |
-62.347422 |
-77.060493 |
2002-09-19 |
12.356847 |
-14.440572 |
-62.154593 |
-75.846297 |
2002-09-20 |
12.967553 |
-13.593848 |
-62.368733 |
-76.248747 |
2002-09-21 |
12.188960 |
-13.960158 |
-64.801744 |
-77.560436 |
2002-09-22 |
10.930343 |
-14.180484 |
-61.880778 |
-78.704844 |
2002-09-23 |
11.546164 |
-14.640839 |
-60.011920 |
-79.081217 |
2002-09-24 |
12.010131 |
-13.948313 |
-58.988687 |
-79.707070 |
2002-09-25 |
10.203590 |
-14.170849 |
-59.165740 |
-78.678488 |
2002-09-26 |
9.985623 |
-13.509717 |
-57.876236 |
-78.605853 |
1000 rows × 4 columns