Pandas一个需求：存在一个列表，需要在一个DataFrame中取到以该列表为索引的数据

需求：Pandas一个需求：存在一个列表，需要在一个DataFrame中取到以该列表为索引的数据

这里有一个坑，

In [103]: s = pd.Series([1, 2, 3])

In [104]: s
Out[104]: 
0    1
1    2
2    3
dtype: int64

当loc[]中的列表包含于S的索引中的话，没有问题

In [105]: s.loc[[1, 2]]
Out[105]: 
1    2
2    3
dtype: int64

In [4]: s.loc[[1, 2, 3]]

Passing list-likes to .loc with any non-matching elements will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike

由于 3不在s的索引中，所以引发了一个错误。

解决办法：

将s先用reindex() 接收需索引的列表，这样就会将s索引中没有的值，填充为Nan

In [106]: s.reindex([1, 2, 3])
Out[106]: 
1    2.0
2    3.0
3    NaN
dtype: float64

另外：如果你只想取到给定list和DataFrame索引中交集的值，那么就可以用一下方式：

In [107]: labels = [1, 2, 3]

In [108]: s.loc[s.index.intersection(labels)]
Out[108]: 
1    2
2    3
dtype: int64


同时需要注意：

------------------------- ------------------------- ------------------------- ------------------------- ------------------------- ------------------------- -------------------------

2023年11月6日补充：

可以直接用索引检测数据的形式替代merge: 具体代码如下：

import pandas as pd

df1 = pd.DataFrame({'key1': ['A', 'B', 'C', 'D'],

                    'value1': [1, 2, 3, 4]},index=['a','b', 'c', 'd'])
df2 = pd.DataFrame({'value2': [5, 6, 7, 8]}, index=['c', 'd', 'e', 'f'])
print(df1)
print(df2)
df1 = df1.reindex(df1.index.union(df2.index), copy=True)
df1.loc[df2.index, 'value2'] = df2['value2']
print(df1)
print(df1.merge(df2, right_index=True, left_index=True, how='outer'))

打印结果如下；

可以通过创建两个DF的索引交集/并集，计算出对应merge方法的inner join 和 outer join 结果出来。

posted @ 2021-05-24 08:39 脱离低级趣味阅读(376) 评论(0) 编辑收藏举报

努力加载评论中...

刷新页面返回顶部