Pandas

  • DataFrame & Series
  • Columns & Index
  • Missing values: NaN

df.index
df.columns
df.values // underlying data as a NumPy array (or df.to_numpy())
type(...)

df.dtypes
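A minimal sketch of the inspection attributes above. The sample frame and its column names are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid"], "age": [31, 25, 40]},
    index=["a", "b", "c"],
)

index_labels = df.index.tolist()      # row labels
column_labels = df.columns.tolist()   # column labels
values = df.to_numpy()                # 2-D NumPy array of the cell values
age_dtype = df.dtypes["age"]          # dtype of a single column

# A DataFrame is a collection of Series sharing one index.
print(type(df), type(df["age"]))
```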

series.to_frame()
s.value_counts()
s.describe()
s.isnull()
s.fillna(0)
s.dropna()

s.value_counts(normalize=True)

s.hasnans // attribute, not a method
df.isnull()
df.sum()
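A short sketch of the missing-value helpers, assuming a small made-up Series and DataFrame. Note that hasnans is read as an attribute, and that isnull() followed by sum() counts the missing cells per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
s = pd.Series([1.0, np.nan, 3.0, 1.0])

has_missing = s.hasnans                    # attribute, no parentheses
mask = s.isnull()                          # boolean Series, True where NaN
filled = s.fillna(0)                       # NaN replaced by 0
dropped = s.dropna()                       # entries with NaN removed
shares = s.value_counts(normalize=True)    # relative frequencies, NaN excluded

df = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, np.nan]})
missing_per_column = df.isnull().sum()     # number of NaN cells in each column
```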

pd.read_csv(..., index_col="...")
df.reset_index()
df.rename(index={...}, columns={...})

idx_list = df.index.tolist()
idx_list[1] = ...
df.index = idx_list
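The index edits above can be sketched as follows. The labels are hypothetical. rename returns a new frame, while assigning to df.index changes the frame itself.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])

# Rename selected row and column labels; df itself is left unchanged.
renamed = df.rename(index={"x": "first"}, columns={"a": "alpha"})

# Replace a single label by round-tripping the index through a list.
idx_list = df.index.tolist()
idx_list[1] = "middle"
df.index = idx_list

# Move the index back into an ordinary column (named "index" by default).
flat = df.reset_index()
```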

df.drop("...", axis="columns")
df.insert(loc=..., column="...", value=[])
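A small sketch of dropping and inserting columns, with made-up column names. drop returns a new frame by default, whereas insert modifies the frame in place.

```python
import pandas as pd

# Hypothetical three-column frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Remove column "b"; the original df still has it.
without_b = df.drop("b", axis="columns")

# Insert a new column at position 1; this changes df itself.
df.insert(loc=1, column="new", value=[10, 20])
```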

Operations

df.filter(like="...")
df.filter(regex="...")
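A quick sketch of filter, assuming hypothetical column names. It selects by label (column names by default for a DataFrame), not by cell values.

```python
import pandas as pd

# Hypothetical frame; only the column names matter here.
df = pd.DataFrame({"price_usd": [1], "price_eur": [2], "qty": [3]})

by_substring = df.filter(like="price")   # columns whose name contains "price"
by_regex = df.filter(regex=r"^q")        # columns whose name starts with "q"
```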

df.count(...) // counts non-NaN values

df.isnull()
df.sum()
df.head()

df.memory_usage()

df.nunique()
col.astype("category")
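A sketch of the counting and memory helpers on made-up data. Converting a low-cardinality text column to the "category" dtype often reduces memory on large frames, though on a tiny frame like this one it may not.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame(
    {"city": ["Paris", "Lyon", "Paris", None], "n": [1, 2, np.nan, 4]}
)

non_null = df.count()                        # non-missing cells per column
distinct = df.nunique()                      # distinct non-missing values per column
as_category = df["city"].astype("category")  # dtype name is "category"

# Bytes used per column; deep=True also measures the Python string objects.
usage = df.memory_usage(deep=True)
```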

df.nlargest(n, "...")
df.sort_values(...)

df.drop_duplicates()
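A brief sketch of ranking, sorting and de-duplicating, using a hypothetical score table. nlargest needs both the number of rows and the column to rank by.

```python
import pandas as pd

# Hypothetical table; the last two rows are identical.
df = pd.DataFrame(
    {"name": ["a", "b", "c", "c"], "score": [10, 30, 20, 20]}
)

top2 = df.nlargest(2, "score")                      # two rows with the highest score
ordered = df.sort_values("score", ascending=False)  # full sort, highest first
unique_rows = df.drop_duplicates()                  # keeps the first of each duplicate
```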

df.iloc[...] // by integer position
df.loc[...] // by label

df.columns
df.columns.get_loc(...)
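A sketch contrasting positional and label-based selection on a hypothetical frame. get_loc is a method of the Index object (here df.columns) and translates a label into its integer position. Label slices with loc include the end label, while iloc slices exclude the end position.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

by_position = df.iloc[0, 1]        # first row, second column
by_label = df.loc["x", "b"]        # same cell, addressed by labels
pos = df.columns.get_loc("b")      # integer position of column "b"

label_slice = df.loc["x":"y"]      # end label included
position_slice = df.iloc[0:1]      # end position excluded
```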

df.col.pct_change()

pd.cut(col, bins)
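A small sketch of pct_change and pd.cut. The numbers, bin edges and labels are assumptions made for illustration.

```python
import pandas as pd

# Relative change between consecutive values; the first entry is NaN.
prices = pd.Series([100.0, 110.0, 99.0])
change = prices.pct_change()

# Bin numeric values into intervals; each interval is open on the left
# and closed on the right by default, e.g. (0, 18].
ages = pd.Series([5, 17, 30, 70])
groups = pd.cut(ages, bins=[0, 18, 65, 100], labels=["child", "adult", "senior"])
```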

Tidy data => Hadley Wickham

  • Stack & melt
  • vs Unstack & pivot
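A sketch of reshaping between wide and long form, assuming a hypothetical table with one column per year. melt goes from wide to long; pivot and unstack go back to wide.

```python
import pandas as pd

# Hypothetical wide table: one row per city, one column per year.
wide = pd.DataFrame(
    {"city": ["Paris", "Lyon"], "2020": [10, 20], "2021": [11, 21]}
)

# Wide to long: one row per (city, year) observation.
long = wide.melt(id_vars="city", var_name="year", value_name="value")

# Long to wide again with pivot.
back = long.pivot(index="city", columns="year", values="value")

# Same result via a MultiIndex Series and unstack.
unstacked = long.set_index(["city", "year"])["value"].unstack()
```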

The Zen of Python

Combining Pandas Objects

df.loc[len(df)] = {"Age": ...}

pd.concat([df1, df2])
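A sketch of combining frames and appending one row. The "Name" column and all values are hypothetical. Assigning to df.loc[len(df)] adds a row only when the index is the default 0..n-1 range, which is why ignore_index=True is used here.

```python
import pandas as pd

# Two hypothetical frames with the same columns.
df1 = pd.DataFrame({"Name": ["Ann"], "Age": [31]})
df2 = pd.DataFrame({"Name": ["Bob"], "Age": [25]})

# Stack the rows; ignore_index=True renumbers the index 0..n-1.
combined = pd.concat([df1, df2], ignore_index=True)

# Append one row by assigning to the next free integer label.
combined.loc[len(combined)] = {"Name": "Cid", "Age": 40}
```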

Time Series Analysis

  • date
  • time
  • datetime
  • timedelta
  • pd.Timestamp
df.between_time()
df.at_time()

df.resample("W")
df.resample("W").size()

df.resample("W", on="col1")
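A sketch of the time-based selections and weekly resampling, on made-up timestamps. between_time and at_time require a DatetimeIndex; resample uses the index unless on= names a datetime column. The column name "when" is an assumption.

```python
import pandas as pd

# Hypothetical observations; 2021-01-04 is a Monday.
ts = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.to_datetime(
        [
            "2021-01-04 09:00",
            "2021-01-05 15:00",
            "2021-01-11 09:00",
            "2021-01-12 22:00",
        ]
    ),
)

morning = ts.between_time("08:00", "12:00")   # rows with a time of day in the range
at_nine = ts.at_time("09:00")                 # rows exactly at 09:00

# Weekly bins ("W" ends each week on Sunday).
weekly_sum = ts.resample("W").sum()
weekly_size = ts.resample("W").size()         # number of rows per week

# Resample on a datetime column instead of the index.
flat = ts.rename_axis("when").reset_index()
by_column = flat.resample("W", on="when")["value"].sum()
```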


REF
https://gist.github.com/MaximePawlakFr/71a5cfbaef45ad5b0f4f23536752f229
posted @ 2021-03-07 21:40  emanlee