pandas操作
python中使用了pandas的一些操作,特此记录下来:
生成DataFrame
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_2'],
'label': ["a,b", 'e,f,g'],
})
print(data)
得到结果为:
label v_id
0 a,b v_1
1 e,f,g v_2
按照逗号分隔并拼接
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_2'],
'label': ["a,b", 'e,f,g'],
})
df = data.drop('label', axis=1).join(data['label'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('label'))
print(df)
得到结果为:
v_id label
0 v_1 a
0 v_1 b
1 v_2 e
1 v_2 f
1 v_2 g
筛选符合条件的行
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
'label': ["a", 'b', "e", "f", "g"],
})
target_label = data.loc[data['label'].isin(["e", "f"])]
print(target_label)
得到结果为:
v_id label
1 v_2 e
1 v_2 f
筛选不符合条件的行
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
'label': ["a", 'b', "e", "f", "g"],
'num': [1, 2, 3, 4, 5],
})
other_label1 = data[~data['label'].isin(["f", "g"])]
print(other_label1)
other_label2 = data.query("num<=3 & label!='b'")
print(other_label2)
得到结果为:
v_id label
0 v_1 a
0 v_1 b
1 v_2 e
label num v_id
0 a 1 v_1
2 e 3 v_2
替换某一列的值
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
'label': ["a", 'b', "e", "f", "g"],
})
df = data.copy()
df.loc[df["label"] != "", 'label'] = "1"
print(df)
得到结果为:
v_id label
0 v_1 1
0 v_1 1
1 v_2 1
1 v_2 1
1 v_2 1
取某一列转换成list
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
'label': ["a", 'b', "e", "f", "g"],
})
print(data["label"].values.tolist())
得到结果为:
['a', 'b', 'e', 'f', 'g']
按照某一列去重
import pandas as pd
data = pd.DataFrame({
'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
'label': ["a", 'b', "e", "f", "g"],
})
print(data.drop_duplicates(subset=['v_id']))
得到结果为:
v_id label
0 v_1 a
1 v_2 e
复制dataframe并拼接
data = pd.DataFrame({
'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
'label': ["a", 'b', "e", "f", "g"],
})
data_copy = data.copy()
times = 2
for i in range(times):
data_copy = pd.concat([data_copy,data])
print(data_copy)
得到结果为:
v_id label
0 v_1 a
0 v_1 b
1 v_2 e
1 v_2 f
1 v_2 g
0 v_1 a
0 v_1 b
1 v_2 e
1 v_2 f
1 v_2 g
0 v_1 a
0 v_1 b
1 v_2 e
1 v_2 f
1 v_2 g
更改某一列类型
data = pd.DataFrame({
'v_id': ["v_1", 'v_1', "v_2", "v_2","v_2"],
'label': ["a", 'b', "e", "f", "g"],
'num': [1.0, 2.0, 3.0, 4.0, 5.0],
})
data["num"] = data[["num"]].astype(int)
print(data)
得到结果为:
label num v_id
0 a 1 v_1
1 b 2 v_1
2 e 3 v_2
3 f 4 v_2
4 g 5 v_2