Common Pandas Commands
We continue to use the Lianjia housing data scraped earlier:
Checking for Missing Data
# Check how many non-null values each column has
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 82644 entries, 0 to 82643
Data columns (total 10 columns):
Region        82644 non-null object
Garden        82644 non-null object
Layout        82644 non-null object
Area          82644 non-null float64
Direction     82644 non-null object
Renovation    82644 non-null object
Elevator      64162 non-null object
Price         82644 non-null float64
Year          62450 non-null float64
PerPrice      82644 non-null int64
dtypes: float64(3), int64(1), object(6)
memory usage: 6.3+ MB
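Reading the output above, Elevator has 64162 non-null values out of 82644 entries (18482 missing) and Year has 62450 (20194 missing), while every other column is complete. To get those counts directly instead of subtracting by hand, something like the following sketch may help. It runs on a small made-up frame named `toy` (a hypothetical stand-in, not the real Lianjia data); on the real data you would call the same methods on `df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data; all values are invented for illustration.
toy = pd.DataFrame({
    "Elevator": ["yes", None, "no", None],
    "Year": [2005.0, np.nan, 1998.0, 2010.0],
    "Price": [500.0, 320.0, 410.0, 780.0],
})

# isna() marks missing cells as True; summing counts them per column.
missing_counts = toy.isna().sum()

# The mean of the boolean mask is the share of missing values per column.
missing_share = toy.isna().mean().round(2)

print(missing_counts)
print(missing_share)
```

Here `df.info()` gives a quick overview, while `isna().sum()` returns a Series that can be sorted or filtered further.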
Summary Statistics
# Round the results to two decimal places
df.describe().round(2)
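Note that `describe()` summarizes only the numeric columns by default, so text columns such as Region or Layout would not appear in that table. A minimal sketch of the difference, again on a made-up frame (`toy` and its values are assumptions for illustration only):

```python
import pandas as pd

# Toy stand-in; all values are invented for illustration.
toy = pd.DataFrame({
    "Region": ["A", "B", "A", "C"],
    "Area": [50.0, 80.5, 120.25, 65.0],
    "Price": [300.0, 450.0, 900.0, 380.0],
})

# Default behaviour: numeric columns only, rounded to two decimals.
numeric_summary = toy.describe().round(2)

# include="all" adds count/unique/top/freq rows for non-numeric columns.
full_summary = toy.describe(include="all")

print(numeric_summary)
print(full_summary)
```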
# Reorder the columns so that Price is the last column
columns = ['Region', 'Garden', 'Layout', 'Area', 'Direction', 'Renovation', 'Elevator', 'Year', 'PerPrice', 'Price']
df = pd.DataFrame(df, columns=columns)
df.head()
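Another common way to reorder columns is to index the DataFrame with the list of names. A possible advantage is that a misspelled column name raises a `KeyError` right away. The sketch below uses a made-up frame (`toy`, with invented values and a shortened column list) rather than the real data:

```python
import pandas as pd

# Toy stand-in with a subset of the columns; all values are invented.
toy = pd.DataFrame({
    "Region": ["A", "B"],
    "Price": [300.0, 450.0],
    "Area": [50.0, 80.5],
    "PerPrice": [60000, 55900],
})

# Select the columns in the desired order, with Price last.
columns = ["Region", "Area", "PerPrice", "Price"]
reordered = toy[columns]

# A misspelled name ("Prise") is rejected instead of being silently accepted.
try:
    toy[["Region", "Prise"]]
    typo_detected = False
except KeyError:
    typo_detected = True

print(list(reordered.columns), typo_detected)
```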