pandas notes
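1. Reading data
import pandas as pd
df = pd.read_csv('/home/greg/桌面/uk_rain_2014.csv', sep=',', header=0)
# By default the first row is used as the column names and is not read in as data
The first argument is the file path.
The second, sep, is the delimiter between columns; the default is ','.
The third, header, is the row to use as column names; if the file has no header row, set header=None.
To import data from delimited text: read_table
To read data from Excel: read_excel
df = pd.read_csv(r'e:\temp\1.csv', encoding='gbk')  # or encoding='utf-8'
# DataFrame.from_csv was removed in pandas 1.0; it used the first column as the index and parsed dates,
# so pass index_col=0, parse_dates=True to read_csv to keep that behaviour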
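A minimal sketch of the header options above. The CSV text is made up for illustration and read from memory with io.StringIO instead of a real file:

```python
import io
import pandas as pd

# Made-up sample data standing in for a real file on disk.
csv_text = "city,rain\nLeeds,10\nYork,12\n"

# header=0: the first row becomes the column names and is not part of the data.
with_header = pd.read_csv(io.StringIO(csv_text), sep=',', header=0)

# header=None: every row is treated as data; columns are labelled 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), sep=',', header=None)

print(with_header.shape)  # (2, 2)
print(no_header.shape)    # (3, 2)
```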
2. Constructing data
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['f1', 'f2', 'f3'])
Built column by column: df = pd.DataFrame({'user_id': [1, 2, 3], 'item_id': [12, 34, 56]})
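To compare the two constructors side by side, a small sketch using the same values as the notes (the names by_rows and by_cols are only for illustration):

```python
import pandas as pd

# Row-wise: each inner list is one row; columns= names the fields.
by_rows = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['f1', 'f2', 'f3'])

# Column-wise: each dict key is a column name, each value is that column's data.
by_cols = pd.DataFrame({'user_id': [1, 2, 3], 'item_id': [12, 34, 56]})

print(by_rows.shape)  # (2, 3)
print(by_cols.shape)  # (3, 2)
```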
3. Writing to a file
df.to_csv('~/桌面/test.csv', encoding='utf-8', index=False, sep=',', header=False)
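A quick way to see what index=False and header=False do is to call to_csv without a path, which returns the CSV text as a string. The sample frame here is made up:

```python
import pandas as pd

df = pd.DataFrame({'user_id': [1, 2], 'item_id': [12, 34]})

# With no path, to_csv returns the text instead of writing a file.
# index=False drops the row labels; header=False drops the column-name line.
text = df.to_csv(index=False, header=False, sep=',')
lines = text.splitlines()

print(lines)  # ['1,12', '2,34']
```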
4. Reading a cell value
_csv_files = pd.read_csv(csv_file, encoding='utf-8')
file_name = _csv_files.iloc[index, 0]  # index is the row position, 0 is the column position
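A small sketch of position-based versus label-based cell access, assuming a made-up two-column CSV held in memory:

```python
import io
import pandas as pd

# Made-up data standing in for csv_file.
csv_text = "file,label\na.jpg,0\nb.jpg,1\n"
_csv_files = pd.read_csv(io.StringIO(csv_text))

index = 1
# iloc is purely positional: row 1, column 0.
file_name = _csv_files.iloc[index, 0]
# loc uses labels: row label 1, column name 'label'.
label = _csv_files.loc[index, 'label']

print(file_name)  # b.jpg
```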
5. Merging multiple CSV files
import os
frames = []
for file in files:
    csv_file = os.path.join(csv_path, file)
    frames.append(pd.read_csv(csv_file, encoding="utf-8"))
df = pd.concat(frames)  # DataFrame.append was removed in pandas 2.0; use pd.concat
df.to_csv(newfile, encoding="utf-8", index=False)
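The merge step can be tried without touching the disk. This sketch assumes two made-up in-memory CSVs with the same columns, and adds ignore_index=True (not in the notes) so the merged rows are renumbered from 0:

```python
import io
import pandas as pd

# Two made-up sources with identical columns, standing in for files on disk.
sources = [io.StringIO("a,b\n1,2\n"), io.StringIO("a,b\n3,4\n")]

# Read every source first, then concatenate once; this avoids growing
# a DataFrame inside the loop.
frames = [pd.read_csv(src) for src in sources]
merged = pd.concat(frames, ignore_index=True)

print(merged.shape)  # (2, 2)
```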
15. Setting the column order:
df = pd.DataFrame({'filename': filename, 'classify': classify})
df = df.loc[:, ['filename', 'classify']]  # .ix was removed in pandas 1.0; df[['filename', 'classify']] also works
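A minimal sketch of reordering columns, with made-up values. Selecting with a list of names returns the columns in the order listed:

```python
import pandas as pd

# Columns start in the "wrong" order on purpose.
df = pd.DataFrame({'classify': [0, 1], 'filename': ['a.jpg', 'b.jpg']})

# Both forms return the columns in the order given in the list.
reordered = df[['filename', 'classify']]
same = df.loc[:, ['filename', 'classify']]

print(list(reordered.columns))  # ['filename', 'classify']
```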
16. Filtering:
df[df.A == 100]
df_train = df1[df1.iloc[:, 0].isin(files_train)]
df_train.to_csv(r'train.csv', index=False, header=False)
df[df['creativeID'] <= 10000]
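For the appID attribute, to drop the samples with appID=278 or appID=382 (use ~ to negate the mask; True - mask fails on boolean data in current versions):
df[~df['appID'].isin([278, 382])]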
To drop the samples with appPlatform=2 as well as those with appID=278 or appID=382:
df[~df['appID'].isin([278, 382]) & ~df['appPlatform'].isin([2])]
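The filters in this item can be checked on a tiny made-up frame. The sketch below uses ~ to negate a boolean mask and & to combine two masks; the row values are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'appID':       [278, 382, 100, 100],
    'appPlatform': [1,   1,   2,   1],
    'creativeID':  [5,   20000, 7, 9],
})

# Keep rows whose creativeID is at most 10000.
small = df[df['creativeID'] <= 10000]

# Drop rows whose appID is 278 or 382.
kept = df[~df['appID'].isin([278, 382])]

# Drop rows with appID 278/382 and also rows with appPlatform 2.
kept_both = df[~df['appID'].isin([278, 382]) & ~df['appPlatform'].isin([2])]

print(list(kept_both.index))  # [3]
```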
17. Bulk value replacement:
dit = {'positive':0,'negative':1}
train_df['labels'] = train_df['labels'].map(dit)
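One thing worth checking with map and a dict: values that are not keys of the dict become NaN. A small sketch, where the 'neutral' label is made up to show this:

```python
import pandas as pd

train_df = pd.DataFrame({'labels': ['positive', 'negative', 'neutral']})
dit = {'positive': 0, 'negative': 1}

# Each value is looked up in the dict; 'neutral' has no entry, so it maps to NaN.
mapped = train_df['labels'].map(dit)

print(mapped.isna().tolist())  # [False, False, True]
```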
18. Modifying cells:
df.loc[0, 'rate'] = hit_sum_rate_list
df.loc[(df.user_name == 'Xiaoming'), 'age'] = 8
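A minimal sketch of both assignment forms, on a made-up frame. The scalar 0.5 stands in for whatever value is being written:

```python
import pandas as pd

df = pd.DataFrame({
    'user_name': ['Xiaoming', 'Lily'],
    'age': [7, 9],
    'rate': [0.0, 0.0],
})

# Single cell: row label 0, column 'rate'.
df.loc[0, 'rate'] = 0.5

# Conditional: every row where the mask is True gets the new value.
df.loc[df.user_name == 'Xiaoming', 'age'] = 8

print(df['age'].tolist())  # [8, 9]
```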
19. Random split:
df = df.sample(frac=1.0)  # shuffle all rows
cut_idx = int(round(0.2 * df.shape[0]))
df_test, df_train = df.iloc[:cut_idx], df.iloc[cut_idx:]
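The same split on a made-up 10-row frame. random_state=0 is an addition here (not in the notes) so the shuffle is repeatable:

```python
import pandas as pd

df = pd.DataFrame({'x': range(10)})

# frac=1.0 returns every row in random order; random_state fixes the order.
shuffled = df.sample(frac=1.0, random_state=0)

# First 20% of the shuffled rows go to test, the rest to train.
cut_idx = int(round(0.2 * shuffled.shape[0]))
df_test, df_train = shuffled.iloc[:cut_idx], shuffled.iloc[cut_idx:]

print(len(df_test), len(df_train))  # 2 8
```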
20. Grouping:
df = pd.read_csv(f)
Sum every 5 rows as one group (assumes the default 0..n-1 integer index): df.groupby(df.index // 5).sum()
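The every-5-rows sum can be verified by hand on a made-up frame holding 1 to 10. Integer-dividing the default index by 5 gives group key 0 for rows 0-4 and 1 for rows 5-9:

```python
import pandas as pd

df = pd.DataFrame({'v': range(1, 11)})

# df.index // 5 -> 0,0,0,0,0,1,1,1,1,1, used directly as the group keys.
sums = df.groupby(df.index // 5).sum()

print(sums['v'].tolist())  # [15, 40]
```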
https://blog.csdn.net/weixin_45144170/article/details/104323786