Pandas Study Notes

df.loc

Access a group of rows and columns by label(s) or a boolean array.
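loc selects data only by label or by boolean mask, not by integer position.
Difference between df.loc[] and df.loc[[]] (the same holds for df.iloc):
A single label in [] returns a Series; a list of labels in [[]] returns a DataFrame.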
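
A minimal sketch of the Series versus DataFrame behaviour described above. The frame, its labels, and the variable names are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)

row_as_series = df.loc["x"]      # single label -> Series
row_as_frame = df.loc[["x"]]     # list of labels -> DataFrame
filtered = df.loc[df["A"] > 1]   # boolean mask -> rows where the mask is True
cell = df.loc["y", "B"]          # row label and column label -> scalar

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
```

The double brackets are not special syntax: the inner brackets simply build a Python list, and passing a list of labels is what makes loc keep the result two-dimensional.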

df.dropna

DataFrame.dropna(*, axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default, subset=None, inplace=False)
Drops rows (axis=0, the default) or columns (axis=1) that contain missing values.
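
A short sketch of how the main parameters in the signature above change the result. The sample frame is hypothetical; note that how and thresh are used in separate calls here, since they are alternatives.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values, used only for illustration.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, 3.0],
        "B": [np.nan, np.nan, 6.0],
        "C": [7.0, 8.0, 9.0],
    }
)

any_missing = df.dropna()             # drop rows with at least one NaN
all_missing = df.dropna(how="all")    # drop rows only if every value is NaN
at_least_two = df.dropna(thresh=2)    # keep rows with 2 or more non-NaN values
by_subset = df.dropna(subset=["A"])   # only look at column A when deciding
drop_cols = df.dropna(axis=1)         # drop columns that contain any NaN
```

None of these calls modify df, because inplace defaults to False; each returns a new DataFrame.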

Selecting rows by data type

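df["dtype"] = df['A'].apply(lambda x: isinstance(x, str)) checks the type of the value in each row and stores the result in a new column.
isinstance(object, classinfo)
Returns True if object is an instance of classinfo.
df[df["dtype"] == True] returns the rows where that column is True.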
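
The steps above can be put together as follows. This is only a sketch: the mixed-type column A is invented for the example, and the column name dtype follows the note above.

```python
import pandas as pd

# Hypothetical column holding values of mixed Python types.
df = pd.DataFrame({"A": ["apple", 1, "pear", 2.5]})

# Flag each row: True where the value in column A is a str.
df["dtype"] = df["A"].apply(lambda x: isinstance(x, str))

# A boolean column can be used directly as a mask,
# so the comparison with True is optional.
str_rows = df[df["dtype"]]

print(str_rows)
```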

json, dict and pandas

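JSON requires double quotes, not single quotes, around the strings inside {}.
If the text in a txt file uses single quotes, ast.literal_eval(txt) is a good option; it parses the string into a dict.
To turn a dict into a DataFrame, use pd.DataFrame.from_dict(dict, orient='index'); orient='index' treats the dict keys as rows. To treat the keys as columns instead, append .T to transpose the result.
Calling pd.DataFrame(dict) directly requires every value in the dict to have the same length, which is often inconvenient with real data.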
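
A minimal sketch of the whole path from single-quoted text to a DataFrame. The string below stands in for the contents of a txt file and is hypothetical; its two lists deliberately have different lengths.

```python
import ast
import json

import pandas as pd

# Hypothetical file contents: dict-like text with single quotes.
text = "{'a': [1, 2, 3], 'b': [4, 5]}"

# json.loads rejects single-quoted strings.
try:
    json.loads(text)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# ast.literal_eval safely evaluates the Python literal into a dict.
data = ast.literal_eval(text)

# The plain constructor fails because the lists differ in length.
try:
    pd.DataFrame(data)
    direct_ok = True
except ValueError:
    direct_ok = False

# orient="index" makes each key a row and pads short rows with NaN.
by_rows = pd.DataFrame.from_dict(data, orient="index")

# Transpose to get the keys as columns instead.
by_cols = by_rows.T
```

ast.literal_eval only accepts Python literals (strings, numbers, lists, dicts and so on), so it is a safer choice than eval for text read from a file.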

SettingWithCopyWarning

When I want to operate on part of a DataFrame, for example:

file_gs01 = file_gs[:1000]
file_gs01["variable"] = file_gs01["another variable"]

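The second line triggers a SettingWithCopyWarning, because file_gs01 is a slice of file_gs and pandas cannot tell whether the assignment is meant to change the original as well. For details see:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
One fix is to create file_gs01 with an explicit copy: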
file_gs01 = file_gs[:1000].copy()
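
A small self-contained version of the fix, as a sketch. The names file_gs, file_gs01 and the column names come from the note above; the contents of file_gs are invented so that the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the real file_gs data.
file_gs = pd.DataFrame({"another variable": range(2000)})

# .copy() makes file_gs01 an independent DataFrame,
# so assigning a new column to it is unambiguous.
file_gs01 = file_gs[:1000].copy()
file_gs01["variable"] = file_gs01["another variable"]

print(file_gs01.shape)
print(list(file_gs.columns))
```

The original file_gs is left untouched; only the copy gains the new column.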

concurrent.futures

Why is it called a Future?

The future is a proxy for a result that does not exist yet but will exist in the future.
A task is submitted to an executor, and the executor gives us back a future.
So we can think of it as a sort of receipt so that we can come back later and use it to get the result of our task

concurrent.futures.Future.result()

result() is a method of the Future class in the concurrent.futures module.

Return the value returned by the call. If the call hasn’t yet completed then this method will wait up to timeout seconds.
The result() method may block only if the task is not done by the time it is called.

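For a future returned by submit(), result() returns right away if the task has already finished; otherwise the call blocks until the result is available, or until timeout seconds have passed if a timeout is given. Iterating over the results of map() blocks in the same way.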
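
A minimal sketch of the receipt idea and of result() blocking. The function slow_square and its sleep time are hypothetical, chosen only to make the task take a noticeable moment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    """Hypothetical task: pretend to work, then return n squared."""
    time.sleep(0.1)
    return n * n


with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() returns a Future immediately, before the task has finished.
    future = executor.submit(slow_square, 4)

    # result() blocks here until the task is done (at most 5 seconds).
    value = future.result(timeout=5)

    # map() yields plain results in input order; consuming them blocks too.
    mapped = list(executor.map(slow_square, [1, 2, 3]))

print(value, mapped)
```

If the timeout passed to result() expires before the task finishes, the call raises a TimeoutError instead of returning.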


Descriptive statistics in pandas

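df["var"].min() returns the minimum value of the column.
df["var"].max() returns the maximum value of the column.
df["var"].value_counts() counts how many times each value occurs, similar to tab in Stata.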
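
A quick sketch of the three calls on a made-up column. The values are hypothetical.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"var": [3, 1, 2, 3, 3, 1]})

smallest = df["var"].min()
largest = df["var"].max()

# value_counts() returns a Series indexed by the distinct values,
# sorted by count in descending order by default.
counts = df["var"].value_counts()

print(smallest, largest)
print(counts)
```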

Posted 2023-03-15 20:15 by 梁书源
