Data cleaning with pandas
1. Handling missing values
DataFrame.dropna
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Returns a DataFrame with NA entries dropped from it.
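A minimal sketch of how the how, thresh and subset arguments change which rows survive. The sample frame and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [4.0, 5.0, np.nan, np.nan],
    "c": [7.0, 8.0, np.nan, np.nan],
})

any_dropped = df.dropna()                 # how='any': drop rows with at least one NA
all_dropped = df.dropna(how="all")        # drop only rows where every value is NA
thresh_kept = df.dropna(thresh=2)         # keep rows with at least 2 non-NA values
subset_dropped = df.dropna(subset=["a"])  # only column 'a' is checked for NA

print(any_dropped.index.tolist())
print(all_dropped.index.tolist())
print(thresh_kept.index.tolist())
print(subset_dropped.index.tolist())
```

With inplace=False (the default) each call returns a new DataFrame and leaves df unchanged.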
DataFrame.fillna
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
Fills NA/NaN values using the specified value or method.
Returns the filled DataFrame.
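A short sketch of the common ways to fill gaps, using made-up data. Forward filling is shown with DataFrame.ffill() rather than the method argument, because recent pandas releases deprecate method in fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, 2.0, np.nan, 8.0],
})

zero_filled = df.fillna(0)  # one scalar for every NA

# A dict gives a different fill value per column.
per_column = df.fillna({"a": df["a"].mean(), "b": -1.0})

# Forward fill: propagate the last valid value downward.
# A leading NA has nothing before it, so it stays NA.
forward = df.ffill()

print(per_column)
print(forward)
```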
DataFrame.replace
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
Replaces the values given in to_replace with value.
Returns the DataFrame after replacement.
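As an illustration, replace is often used to turn sentinel values into NaN or to normalize labels. The sentinels (-999, -1, "n/a", "unknown") and column names below are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with sentinel values standing in for "missing".
df = pd.DataFrame({
    "score": [90, -999, 75, -1],
    "city": ["NY", "n/a", "LA", "unknown"],
})

single = df.replace(-999, np.nan)      # one value
many = df.replace([-999, -1], np.nan)  # a list of values, all mapped to NaN

# A dict maps each old value to its replacement.
mapped = df.replace({"n/a": "missing", "unknown": "missing"})

# With regex=True, to_replace is treated as a regular expression
# and is applied to string values only.
by_regex = df.replace(r"^(n/a|unknown)$", "missing", regex=True)

print(many)
print(by_regex)
```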
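2. Series string-handling functions
Series.str.contains: tests whether each string contains a pattern
Series.str.strip: removes leading and trailing whitespace
Series.str.split: splits each string around a delimiter
Series.str.join: joins the list held in each element with a separator
Series.str.upper, Series.str.lower, Series.str.title: change the case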
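The methods listed above can be chained on a Series of strings. A small sketch with made-up names:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
s = pd.Series(["  alice smith ", "BOB jones", "carol-ann lee  "])

stripped = s.str.strip()                     # remove surrounding whitespace
has_smith = stripped.str.contains("smith")   # boolean mask, case-sensitive
parts = stripped.str.split(" ")              # each element becomes a list of words
joined = parts.str.join("_")                 # join each list back with '_'

upper = stripped.str.upper()
lower = stripped.str.lower()
title = stripped.str.title()

print(has_smith.tolist())
print(joined.tolist())
print(title.tolist())
```

Each accessor method returns a new Series, so s itself is not modified.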
3. Removing duplicates
DataFrame.duplicated() returns a boolean Series marking the rows that repeat an earlier row
DataFrame.drop_duplicates() removes the duplicate rows
DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
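A brief sketch of duplicated and drop_duplicates, showing the effect of subset and keep. The frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: row 2 is an exact copy of row 0,
# and row 4 repeats the name 'b' with a different value.
df = pd.DataFrame({
    "name": ["a", "b", "a", "c", "b"],
    "value": [1, 2, 1, 3, 9],
})

mask = df.duplicated()          # True where the whole row repeats an earlier one
deduped = df.drop_duplicates()  # keep='first': the first occurrence is kept

# Compare on 'name' only and keep the last occurrence of each name.
by_name_last = df.drop_duplicates(subset=["name"], keep="last")

# keep=False drops every row whose name occurs more than once.
unique_names_only = df.drop_duplicates(subset=["name"], keep=False)

print(mask.tolist())
print(by_name_last)
```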
4. Discretization
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise')
Splits the values into intervals at the edges given in bins.
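A minimal sketch of pandas.cut with explicit bin edges. The ages, edges and label names are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
ages = pd.Series([5, 18, 30, 45, 70])
bins = [0, 18, 40, 100]
names = ["child", "adult", "senior"]

# right=True (default): intervals are (0, 18], (18, 40], (40, 100],
# so 18 falls into the first bin.
labeled = pd.cut(ages, bins, labels=names)

# right=False: intervals are [0, 18), [18, 40), [40, 100),
# so 18 now falls into the second bin.
left_closed = pd.cut(ages, bins, right=False, labels=names)

print(labeled.tolist())
print(left_closed.tolist())
```

Without labels, the result holds the interval objects themselves instead of names.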
pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')
Discretizes by quantile, so each bin holds roughly the same share of the values.
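For comparison with cut, a small sketch of pandas.qcut, where the bin edges come from the data's quantiles rather than being given up front. The values and labels are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the integers 1..12.
values = pd.Series(range(1, 13))

# q=4 splits at the quartiles, giving four equally populated bins.
quartiles = pd.qcut(values, 4, labels=["q1", "q2", "q3", "q4"])
counts = quartiles.value_counts()

# q can also be an explicit list of quantiles; here the median splits the data in two.
halves = pd.qcut(values, [0, 0.5, 1], labels=["low", "high"])

print(counts)
print(halves.tolist())
```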
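Inserting an empty row:
import pandas as pd  # df is assumed to be an existing DataFrame
index1 = list(df.index)               # current row labels
index1.append('seven')                # add the new label
df = pd.DataFrame(df, index=index1)   # realigns to the new index; row 'seven' is all NaN
df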
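The same idea can be written with DataFrame.reindex, which makes the intent a little more explicit. This is a self-contained sketch; the starting frame and its labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical starting frame with two labeled rows.
df = pd.DataFrame({"x": [1, 2]}, index=["one", "two"])

# reindex keeps the existing rows and adds 'seven' filled with NaN.
new_index = list(df.index) + ["seven"]
with_empty_row = df.reindex(new_index)

print(with_empty_row)
```

Because NaN is a float, an integer column such as x is converted to float64 once the empty row is added.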