Python Pandas -- Series
pandas.Series
class pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
One-dimensional ndarray with axis labels (including time series).
Labels need not be unique but must be any hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
Operations between Series (+, -, /, *, **) align values based on their associated index values; they need not be the same length. The result index will be the sorted union of the two indexes.
Parameters:
data : array-like, dict, or scalar value
index : array-like or Index (1d)
dtype : numpy.dtype or None
copy : boolean, default False
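The docstring's remark that statistical methods skip missing data can be checked with a small sketch. The Series `s` below is illustrative data, not taken from the original post, and the comparison with a plain NumPy reduction is only there to show the contrast.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch): one value is missing.
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])

# Series reductions skip NaN by default (skipna=True).
total = s.sum()      # adds 1.0 and 3.0 only
average = s.mean()   # averages the two non-missing values

# Passing skipna=False, or reducing the raw NumPy array, propagates NaN.
strict_total = s.sum(skipna=False)
raw_total = np.sum(s.to_numpy())

print(total, average, strict_total, raw_total)
```

Passing `skipna=False` is the way to opt out when a missing value should invalidate the whole result.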
A Series is similar to an array, but it also carries labels, i.e. an index.
1. Start with the simplest Series.
from pandas import Series, DataFrame
import pandas as pd

ser1 = Series([1, 2, 3, 4])
print(ser1)
# 0    1
# 1    2
# 2    3
# 3    4
# dtype: int64
Because no index was specified, the default integer index (0, 1, 2, ...) is used.
2. Adding an index
ser2 = Series(range(4), index=['a', 'b', 'c', 'd'])
print(ser2)
# a    0
# b    1
# c    2
# d    3
# dtype: int64
3. A dictionary as input
dict1 = {'ohio': 35000, 'Texas': 71000, 'Oregon': 1600, 'Utah': 500}
ser3 = Series(dict1)
print(ser3)
# Oregon     1600
# Texas     71000
# Utah        500
# ohio      35000
# dtype: int64
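The dictionary keys become the index by default. The output above comes from an older pandas release that sorted the keys; with pandas 0.23 or later on Python 3.6+, the keys keep their insertion order.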
states = ['California', 'Ohio', 'Oregon', 'Texas']
ser4 = Series(dict1, index=states)
print(ser4)
# California        NaN
# Ohio              NaN
# Oregon         1600.0
# Texas         71000.0
# dtype: float64
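When the input is a dictionary you can still pass an explicit index. Any index label with no matching key gets NaN. Labels are case-sensitive, so 'Ohio' does not match the key 'ohio'.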
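As a rough sketch of what happens here, passing an explicit index together with a dict behaves like building the Series from the dict first and then calling `reindex`. The variable names `from_constructor` and `from_reindex` are made up for this illustration.

```python
import pandas as pd

dict1 = {'ohio': 35000, 'Texas': 71000, 'Oregon': 1600, 'Utah': 500}
states = ['California', 'Ohio', 'Oregon', 'Texas']

# Route 1: dict plus explicit index, as in the text.
from_constructor = pd.Series(dict1, index=states)

# Route 2: build from the dict, then conform to the wanted labels.
from_reindex = pd.Series(dict1).reindex(states)

# equals() treats NaN values in the same positions as equal.
same = from_constructor.equals(from_reindex)

# Labels that found no matching key.
missing = from_constructor[from_constructor.isna()].index.tolist()

print(same)
print(missing)
```

Both routes drop 'Utah' and 'ohio', since those keys are not among the requested labels.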
print(pd.isnull(ser4))
# California     True
# Ohio           True
# Oregon        False
# Texas         False
# dtype: bool
The isnull function tests each value for being missing (null).
The notnull function tests each value for being present (not null).
print(pd.notnull(ser4))
# California    False
# Ohio          False
# Oregon         True
# Texas          True
# dtype: bool
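The boolean masks returned by isnull and notnull are mostly useful for counting or filtering. The following is a minimal sketch of both uses, rebuilding `ser4` so that it runs on its own. The names `n_missing` and `present` are chosen only for this example.

```python
import pandas as pd

dict1 = {'ohio': 35000, 'Texas': 71000, 'Oregon': 1600, 'Utah': 500}
states = ['California', 'Ohio', 'Oregon', 'Texas']
ser4 = pd.Series(dict1, index=states)

# True counts as 1, so summing the mask counts the missing values.
n_missing = int(ser4.isnull().sum())

# Using the notnull mask as a filter keeps only the present values.
present = ser4[ser4.notnull()]

# The two masks are element-wise opposites.
masks_are_opposites = bool((ser4.notnull() == ~ser4.isnull()).all())

print(n_missing)
print(present.index.tolist())
```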
4. Accessing elements and the index
print(ser2['a'])
# 0

# print(ser2['a', 'c'])  # error: pass a list of labels instead
print(ser2[['a', 'c']])
# a    0
# c    2
# dtype: int64

print(ser2.values)
# [0 1 2 3]

print(ser2.index)
# Index(['a', 'b', 'c', 'd'], dtype='object')
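The docstring mentions both label-based and integer-based indexing. One way to keep the two apart is to use the explicit `.loc` (labels) and `.iloc` (positions) accessors, sketched below on the same `ser2`. The variable names on the left-hand side are illustrative only.

```python
import pandas as pd

ser2 = pd.Series(range(4), index=['a', 'b', 'c', 'd'])

by_label = ser2.loc['b']          # look up by label
by_position = ser2.iloc[1]        # look up by integer position

# Label slices include the end label; position slices exclude the end.
label_slice = ser2.loc['a':'c']
position_slice = ser2.iloc[0:2]

# Membership tests look at the index, not the values.
has_b = 'b' in ser2
has_z = 'z' in ser2

print(by_label, by_position)
print(label_slice.tolist(), position_slice.tolist())
print(has_b, has_z)
```

The inclusive end point of label slices is the detail that most often surprises people coming from plain Python lists.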
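5. Operations: a pandas Series keeps NumPy's array operations (boolean filtering, scalar arithmetic, math functions), and the index stays attached to the result.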
import numpy as np  # needed for np.exp below

print(ser2[ser2 > 2])
# d    3
# dtype: int64

print(ser2 * 2)
# a    0
# b    2
# c    4
# d    6
# dtype: int64

print(np.exp(ser2))
# a     1.000000
# b     2.718282
# c     7.389056
# d    20.085537
# dtype: float64
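6. Automatic alignment of Series. This is somewhat like a full join in SQL: values are matched on their index labels, and a label missing from either side yields NaN.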
print(ser3 + ser4)
# California         NaN
# Ohio               NaN
# Oregon          3200.0
# Texas         142000.0
# Utah               NaN
# ohio               NaN
# dtype: float64
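When a label exists on only one side, the `+` operator gives NaN. If the one-sided value should be kept instead, the `Series.add` method accepts a `fill_value`. This is a sketch of that alternative, not something the original post uses. Labels that are missing or NaN on both sides still come out as NaN.

```python
import pandas as pd

dict1 = {'ohio': 35000, 'Texas': 71000, 'Oregon': 1600, 'Utah': 500}
states = ['California', 'Ohio', 'Oregon', 'Texas']
ser3 = pd.Series(dict1)
ser4 = pd.Series(dict1, index=states)

# Plain operator: any label not present with a value on both sides is NaN.
plain = ser3 + ser4

# add() with fill_value=0 treats a value missing on one side as 0.
filled = ser3.add(ser4, fill_value=0)

print(plain)
print(filled)
```

Here 'Utah' and 'ohio' keep their values from `ser3`, while 'California' and 'Ohio' remain NaN because neither Series has a value for them.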
7. Both the Series object and its index have a name attribute
ser4.index.name = 'state'
ser4.name = 'population count'
print(ser4)
# state
# California        NaN
# Ohio              NaN
# Oregon         1600.0
# Texas         71000.0
# Name: population count, dtype: float64
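One place where these names matter is converting a Series to a DataFrame: `reset_index` uses the index name and the Series name as column labels. This is a small sketch that rebuilds the values of `ser4` by hand so it runs on its own. `frame` is a name chosen just for the example.

```python
import pandas as pd

# Same values as ser4 in the text, written out directly.
ser4 = pd.Series(
    [float('nan'), float('nan'), 1600.0, 71000.0],
    index=['California', 'Ohio', 'Oregon', 'Texas'],
)
ser4.index.name = 'state'
ser4.name = 'population count'

# The index becomes a regular column named after index.name,
# and the values column is named after Series.name.
frame = ser4.reset_index()

print(frame.columns.tolist())
```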
8. Previewing data
print(ser4.head(2))
# state
# California   NaN
# Ohio         NaN
# Name: population count, dtype: float64

print(ser4.tail(2))
# state
# Oregon     1600.0
# Texas     71000.0
# Name: population count, dtype: float64
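For reference, here is a short sketch of how the argument to head and tail behaves on a longer, made-up Series: with no argument both return five rows, and a negative argument to head returns everything except that many rows from the end.

```python
import pandas as pd

# Illustrative Series with ten values (not from the original post).
s = pd.Series(range(10))

first_five = s.head()          # n defaults to 5
last_three = s.tail(3)
all_but_last_two = s.head(-2)  # negative n drops rows from the end

print(first_five.tolist())
print(last_three.tolist())
print(all_but_last_two.tolist())
```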