Filtering rows in pandas where a column contains specific text
1. The cell holds a single value (a scalar): compare for an exact match
df_fintech = df_text[df_text['业务一级分类']=="金融科技"]
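2. The target text is only part of the cell's content
Access the column through the .str accessor and filter with contains, which matches substrings: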
df_fintech = df_text[df_text['业务一级分类'].str.contains("金融科技")]
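A minimal sketch of the substring filter on invented data (the cell values below are made up for illustration). It also passes two optional arguments of `Series.str.contains`: `na=False`, so that missing cells count as non-matches, and `regex=False`, so that the search text is taken literally rather than as a regular expression.

```python
import pandas as pd

# Hypothetical data: two cells contain the keyword, one does not, one is missing.
df_text = pd.DataFrame({
    "业务一级分类": ["金融科技", "金融科技/支付", "电商", None],
})

# na=False: a missing cell is treated as "no match" instead of yielding a
# missing value in the mask, which can raise an error during boolean indexing.
# regex=False: the pattern is plain text, not a regular expression.
mask = df_text["业务一级分类"].str.contains("金融科技", na=False, regex=False)

df_fintech = df_text[mask]
print(df_fintech)
```

Whether `na=False` is needed depends on whether the column can contain missing values; it is harmless when it cannot.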
3. To select rows whose column value belongs to a given set of values, use isin
df.loc[df['column_name'].isin(some_values)]  # some_values is an iterable, e.g. a list
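As a small illustration of `isin`, here is a sketch on an invented DataFrame; the column `value` and the contents of `some_values` are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical data for demonstration.
df = pd.DataFrame({
    "column_name": ["a", "b", "c", "a"],
    "value": [1, 2, 3, 4],
})

# Any list-like works here: list, set, tuple, another Series, ...
some_values = ["a", "c"]

# Keep only the rows whose column_name is one of some_values.
selected = df.loc[df["column_name"].isin(some_values)]
print(selected)
```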
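4. To combine several conditions, use &. Because & binds more tightly than >= or <=, each comparison must be wrapped in its own parentheses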
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
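The range filter can be sketched on invented numbers as follows. Without the parentheses, Python would parse the expression as a chained comparison around `A & df['column_name']`, which typically fails with a "truth value of a Series is ambiguous" error. For an inclusive range on a single column, `Series.between` is an alternative that avoids the parentheses altogether; the values of `A` and `B` are assumptions for the example.

```python
import pandas as pd

# Hypothetical data and bounds.
df = pd.DataFrame({"column_name": [1, 5, 10, 15, 20]})
A, B = 5, 15

# Each comparison is parenthesised so that & combines two boolean masks.
in_range = df.loc[(df["column_name"] >= A) & (df["column_name"] <= B)]

# Equivalent for an inclusive range: between() includes both endpoints by default.
same = df.loc[df["column_name"].between(A, B)]

print(in_range)
```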
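5. To select rows whose column value is not equal to one or more values
Invert the selection:
df.loc[df['column_name'] != 'some_value']
df.loc[~df['column_name'].isin(some_values)]  # ~ negates the mask; some_values must be list-like, e.g. ['str1', 'str2'], because a bare string raises a TypeError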
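Both forms of exclusion, sketched on invented data (the values "x", "y" and "z" are placeholders):

```python
import pandas as pd

# Hypothetical data for demonstration.
df = pd.DataFrame({"column_name": ["x", "y", "z", "x"]})

# Exclude a single value with !=.
not_x = df.loc[df["column_name"] != "x"]

# Exclude several values: build the isin mask, then negate it with ~.
not_x_or_y = df.loc[~df["column_name"].isin(["x", "y"])]

print(not_x)
print(not_x_or_y)
```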
Example: a more efficient way to search a pandas string column for several keywords at once (an extension of method 2 above)
import pandas as pd

# create regex pattern out of the list of words
positive_kw = '|'.join(['rise', 'positive', 'high', 'surge'])
negative_kw = '|'.join(['sink', 'lower', 'fall', 'drop', 'slip', 'loss', 'losses'])
neutral_kw = '|'.join(['flat', 'neutral'])

# creating some fake data for demonstration
words = [
    'rise high',
    'positive attitude',
    'something',
    'foo',
    'lowercase',
    'flat earth',
    'neutral opinion'
]
df = pd.DataFrame(data=words, columns=['words'])

# one 0/1 indicator column per keyword group
df['positive'] = df['words'].str.contains(positive_kw).astype(int)
df['negative'] = df['words'].str.contains(negative_kw).astype(int)
df['neutral'] = df['words'].str.contains(neutral_kw).astype(int)
print(df)
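One thing to watch in the example above: the patterns match substrings, so 'lowercase' is flagged as negative because it contains 'lower'. If whole-word matches are wanted, one possible variation (not part of the original example) is to wrap the alternatives in `\b` word boundaries and escape each keyword with `re.escape`. The sample sentences below are invented.

```python
import re
import pandas as pd

negative_words = ["sink", "lower", "fall", "drop", "slip", "loss", "losses"]

# \b anchors the match at word boundaries; (?:...) is a non-capturing group,
# and re.escape protects keywords that might contain regex metacharacters.
pattern = r"\b(?:" + "|".join(re.escape(w) for w in negative_words) + r")\b"

# Hypothetical data for demonstration.
df = pd.DataFrame({"words": ["lowercase", "prices fall", "lower bound", "something"]})

# 1 if any keyword appears as a whole word, otherwise 0.
df["negative"] = df["words"].str.contains(pattern).astype(int)
print(df)
```

Here 'lowercase' is no longer counted, while 'lower bound' and 'prices fall' still are.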
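6. Group with groupby and save each group as its own Excel file (get_group)
import pandas as pd

file_name = "总表.xlsx"
# skiprows=1 skips the first row of the sheet, so the header is read from the second row
df = pd.read_excel(file_name, skiprows=1)

grouped = df.groupby("列标题")
# unique group keys; missing values are dropped because they do not form a group
for key in df["列标题"].dropna().unique():
    sub_df = grouped.get_group(key)
    # file name = group key followed by the number of rows in the group
    sub_df.to_excel(f"{key}{len(sub_df)}.xlsx")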
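The same split can also be written by iterating over the groupby object directly, which yields each key together with its sub-DataFrame. The sketch below uses invented data and only collects the file names and groups in a dictionary; the actual `to_excel` call is left as a comment because it needs an Excel writer engine (for example openpyxl) to be installed. The `value` column and the `outputs` dictionary are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the contents of the workbook.
df = pd.DataFrame({
    "列标题": ["A", "B", "A", "C", "B", "A"],
    "value": [1, 2, 3, 4, 5, 6],
})

outputs = {}
# Grouping by a single column name yields (key, sub-DataFrame) pairs.
for key, sub_df in df.groupby("列标题"):
    # File name = group key followed by the number of rows in the group.
    file_name = f"{key}{len(sub_df)}.xlsx"
    outputs[file_name] = sub_df
    # sub_df.to_excel(file_name, index=False)  # uncomment to write the file

print(list(outputs))
```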
References: https://blog.csdn.net/weixin_43557139/article/details/109459352
https://www.coder.work/article/4980040