pandas notes

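1. df = pd.read_csv('/home/greg/桌面/uk_rain_2014.csv', sep=',', header=0)

  # By default (header=0) the first row supplies the column names and is not read in as data

  The first argument is the file path,

  the second (sep) is the separator between two columns; the default is ',',

  the third (header) is the row number that holds the column names; if the file has no header row, set header=None

  To import from general delimited text (tab-separated by default): read_table

  To read data from Excel: read_excel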

 

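  # DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 and parse_dates=True reproduces its defaults
  df = pd.read_csv(r'e:\temp\1.csv', encoding='gbk', index_col=0, parse_dates=True) # or encoding='utf-8'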
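To make the effect of header concrete, here is a minimal sketch. The CSV text and the column names city and rain_mm are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Made-up CSV text; io.StringIO lets read_csv treat a string like a file.
csv_with_header = "city,rain_mm\nLeeds,12\nYork,7\n"
csv_without_header = "Leeds,12\nYork,7\n"

# header=0: the first line supplies the column names and is not data.
df_a = pd.read_csv(io.StringIO(csv_with_header), sep=",", header=0)

# header=None: every line is data; columns are numbered 0, 1, ...
df_b = pd.read_csv(io.StringIO(csv_without_header), sep=",", header=None)

# names= supplies column names for a file that has none.
df_c = pd.read_csv(io.StringIO(csv_without_header), header=None,
                   names=["city", "rain_mm"])

print(list(df_a.columns), df_a.shape)
print(list(df_b.columns), df_b.shape)
print(list(df_c.columns), df_c.shape)
```

All three frames hold the same two data rows; only the column labels differ.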

 

2. Constructing data

  Row by row: df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['f1', 'f2', 'f3'])

  

  Column by column: df = pd.DataFrame({'user_id':[1,2,3], 'item_id':[12,34,56]})
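The two constructors describe the same table from different directions. A small sketch, reusing the f1/f2/f3 names from above, checks that a row-wise list of lists and a column-wise dict produce equal frames:

```python
import pandas as pd

# Row-wise: each inner list is one row; columns= names the fields.
rows = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["f1", "f2", "f3"])

# Column-wise: each dict key is a column name, each value is that column.
cols = pd.DataFrame({"f1": [1, 4], "f2": [2, 5], "f3": [3, 6]})

same = rows.equals(cols)
print(rows.shape, same)
```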

  

3. Writing to a file

  df.to_csv('~/桌面/test.csv', encoding='utf-8', index=False, sep=',', header=False)
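A quick sketch of what index=False and header=False change in the output. Calling to_csv without a path returns the CSV text as a string, which makes the result easy to inspect; the data is the user_id/item_id example from section 2, shortened.

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 2], "item_id": [12, 34]})

# No path given: to_csv returns the CSV text instead of writing a file.
no_header = df.to_csv(index=False, sep=",", header=False)
with_header = df.to_csv(index=False)
with_index = df.to_csv()

# splitlines() avoids depending on the platform's line terminator.
print(no_header.splitlines())
print(with_header.splitlines())
print(with_index.splitlines())
```

With header=False the column names are not written, so a later read of that file needs header=None.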

4. Reading a cell value

  _csv_files = pd.read_csv(csv_file, encoding='utf-8')

  file_name = _csv_files.iloc[index,0] # index is the row position, 0 is the column position

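5. Row count and value access

  # number of rows read
  # len(df)
  # get a single value
  df.iloc[1,1]              # by position: second row, second column
  df.loc[1, df.columns[1]]  # by label: row labelled 1, second column
 
  # get one column
  df.iloc[:,2]
 
  # get one row
  df.iloc[2,:]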
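The difference between iloc (position) and loc (label) only shows up when the index labels are not 0, 1, 2, ... The sketch below uses a made-up frame whose row labels are 10, 20, 30 for exactly that reason.

```python
import pandas as pd

df = pd.DataFrame(
    {"f1": [1, 4, 7], "f2": [2, 5, 8], "f3": [3, 6, 9]},
    index=[10, 20, 30],  # labels deliberately differ from positions
)

n_rows = len(df)               # number of rows
by_position = df.iloc[1, 1]    # second row, second column
by_label = df.loc[20, "f2"]    # row labelled 20, column "f2"
third_column = df.iloc[:, 2]   # a Series holding column f3
third_row = df.iloc[2, :]      # a Series holding the row labelled 30

print(n_rows, by_position, by_label)
print(third_column.tolist(), third_row.tolist())
```

Here df.loc[1, "f2"] would raise a KeyError, because no row carries the label 1.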
 
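 6. Iterating over rows: for i,row in df.iterrows():
      info_list = ast.literal_eval(row['info'])  # needs import ast; safer than eval when the cell holds a Python literal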
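A runnable sketch of the loop, assuming the info column stores Python list literals as text (the data here is made up). It parses the cells with ast.literal_eval, which only accepts literals and is therefore a safer choice than eval for data read from a file.

```python
import ast
import pandas as pd

# Made-up frame: 'info' holds list literals stored as strings.
df = pd.DataFrame({"name": ["a", "b"], "info": ["[1, 2]", "[3]"]})

lengths = {}
for i, row in df.iterrows():
    # row is a Series; row['info'] is the cell text for this row.
    info_list = ast.literal_eval(row["info"])
    lengths[row["name"]] = len(info_list)

# Same parsing without an explicit loop.
df["info_list"] = df["info"].apply(ast.literal_eval)

print(lengths)
print(df["info_list"].tolist())
```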
 
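7. Drop columns: df.drop(['info'], axis=1, inplace=True); drop rows: df = df.drop([0,4]) (without inplace=True, drop returns a new DataFrame and leaves df unchanged)
8. Summary statistics: df_train.shape, df_train.describe()
9. Select several columns into a new DataFrame: data = pd.concat([df_train['SalePrice'], df_train['CentralAir']], axis=1), or equivalently data = df_train[['SalePrice', 'CentralAir']]
10. Values of one column: df_train.SalePrice.values
11. Combine two DataFrames into one: df_all_data = pd.concat([df_train, df_test])
12. Grouped statistics: xx = df_train.groupby('Neighborhood'); xx.describe(); xx['GarageCars'].describe(); xx.mean(numeric_only=True) (pandas 2.0+ raises on non-numeric columns without numeric_only=True); xx['GarageCars'].mean()
13. Single-column DataFrame: result = pd.DataFrame({"name":fid},columns=['name'])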
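Notes 7 to 12 can be tried together on a tiny made-up stand-in for df_train and df_test. The column names follow the notes above; the values are invented for illustration.

```python
import pandas as pd

# Made-up stand-ins for df_train / df_test.
df_train = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "GarageCars": [1, 2, 2],
    "SalePrice": [100, 200, 300],
    "info": ["x", "y", "z"],
})
df_test = pd.DataFrame({
    "Neighborhood": ["B"], "GarageCars": [3],
    "SalePrice": [400], "info": ["w"],
})

# 7: without inplace=True, drop returns a new frame.
no_info = df_train.drop(["info"], axis=1)
no_first_row = df_train.drop([0])

# 9 and 10: pick several columns, or one column's raw values.
data = df_train[["SalePrice", "GarageCars"]]
prices = df_train["SalePrice"].to_numpy()

# 11: stack two frames; ignore_index renumbers the rows 0..n-1.
df_all_data = pd.concat([df_train, df_test], ignore_index=True)

# 12: per-group statistics.
mean_cars = df_train.groupby("Neighborhood")["GarageCars"].mean()
group_means = df_train.groupby("Neighborhood").mean(numeric_only=True)

print(df_train.shape, no_info.shape, df_all_data.shape)
print(mean_cars.to_dict())
```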
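14. Merging CSV files:

import os
import pandas as pd

# files (list of file names), csv_path and newfile are defined elsewhere.
# DataFrame.append was removed in pandas 2.0; collect the frames and concat once.
frames = []
for file in files:
  csv_file = os.path.join(csv_path, file)
  frames.append(pd.read_csv(csv_file, encoding="utf-8"))
df = pd.concat(frames)
df.to_csv(newfile, encoding="utf-8", index=False)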
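A self-contained sketch of the merge, assuming all input files share the same columns. The file names part1.csv, part2.csv and merged.csv are hypothetical, and a temporary directory stands in for csv_path.

```python
import os
import tempfile
import pandas as pd

with tempfile.TemporaryDirectory() as csv_path:
    # Write two small made-up CSV files to merge.
    pd.DataFrame({"a": [1, 2]}).to_csv(
        os.path.join(csv_path, "part1.csv"), index=False)
    pd.DataFrame({"a": [3]}).to_csv(
        os.path.join(csv_path, "part2.csv"), index=False)

    # Sort the names so the merge order is deterministic.
    files = sorted(f for f in os.listdir(csv_path) if f.endswith(".csv"))

    frames = [pd.read_csv(os.path.join(csv_path, f), encoding="utf-8")
              for f in files]
    df = pd.concat(frames, ignore_index=True)

    newfile = os.path.join(csv_path, "merged.csv")
    df.to_csv(newfile, encoding="utf-8", index=False)

    # Read the result back to confirm what was written.
    merged = pd.read_csv(newfile)

print(merged["a"].tolist())
```

Concatenating once at the end avoids copying the growing frame on every iteration.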

 15. Specifying the column order:

df = pd.DataFrame({'filename':filename, 'classify':classify})
df = df[['filename', 'classify']]  # .ix was removed in pandas 1.0; select with a list of names instead
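A short sketch of reordering columns, using made-up file names. Selecting with a list of names returns the columns in the order listed; reindex(columns=...) is an equivalent spelling.

```python
import pandas as pd

# Made-up data; the dict lists 'classify' first on purpose.
df = pd.DataFrame({"classify": [0, 1], "filename": ["a.jpg", "b.jpg"]})

by_selection = df[["filename", "classify"]]
by_reindex = df.reindex(columns=["filename", "classify"])

print(list(by_selection.columns), list(by_reindex.columns))
```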

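16. Filtering:

df[df.A==100]  # rows where column A equals 100
df_train = df1[df1.iloc[:,0].isin(files_train)]  # rows whose first column is in files_train
df_train.to_csv(r'train.csv', index=False, header=False)

df[df['creativeID']<=10000]
For the appID column, to drop the samples with appID=278 or appID=382 (negate the mask with ~; True - mask raises a TypeError on boolean data in current NumPy/pandas):
df[~df['appID'].isin([278,382])]
To drop the samples with appPlatform=2 as well as those with appID=278 or appID=382:
df[~df['appID'].isin([278,382]) & ~df['appPlatform'].isin([2])]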
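The filters can be checked on a small made-up frame that uses the column names from this note. The sketch negates the isin masks with ~, the standard way to invert a boolean Series.

```python
import pandas as pd

# Made-up rows; only the column names come from the note.
df = pd.DataFrame({
    "creativeID": [5, 20000, 7, 8],
    "appID": [278, 100, 382, 101],
    "appPlatform": [1, 2, 1, 1],
})

# Comparison filter.
small_ids = df[df["creativeID"] <= 10000]

# Keep rows whose appID is NOT 278 or 382.
keep_apps = df[~df["appID"].isin([278, 382])]

# Additionally require appPlatform to be different from 2.
keep_both = df[~df["appID"].isin([278, 382]) & ~df["appPlatform"].isin([2])]

print(small_ids.index.tolist())
print(keep_apps.index.tolist())
print(keep_both.index.tolist())
```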

 17. Bulk value replacement:

  dit = {'positive':0,'negative':1}
  train_df['labels'] = train_df['labels'].map(dit)
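One detail worth seeing in action: Series.map with a dict turns any value missing from the dict into NaN. In the sketch below, the extra label 'neutral' is made up to trigger that case.

```python
import pandas as pd

# 'neutral' is a made-up label that the mapping does not cover.
train_df = pd.DataFrame({"labels": ["positive", "negative", "neutral"]})

dit = {"positive": 0, "negative": 1}
mapped = train_df["labels"].map(dit)

# Values absent from the dict become NaN, so the result is float.
unmapped_rows = mapped.isna().tolist()

print(mapped.tolist(), unmapped_rows)
```

Checking mapped.isna().any() before overwriting the column is a cheap way to catch labels the dict forgot.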

 18. Modifying cells:

  df.loc[0, 'rate'] = hit_sum_rate_list

  df.loc[(df.user_name == 'Xiaoming'), 'age'] = 8
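A minimal sketch of both assignment forms on made-up data. The second user 'Lili' and the value 0.5 are hypothetical; the column names come from the note.

```python
import pandas as pd

# Made-up frame; 'Lili' is a hypothetical second user.
df = pd.DataFrame({
    "user_name": ["Xiaoming", "Lili"],
    "age": [7, 9],
    "rate": [0.0, 0.0],
})

# Single cell: row label 0, column 'rate'.
df.loc[0, "rate"] = 0.5

# Conditional: every row matching the mask gets the new value.
df.loc[df.user_name == "Xiaoming", "age"] = 8

print(df["rate"].tolist(), df["age"].tolist())
```

Using a single df.loc[rows, column] = value call, rather than chained indexing, makes sure the assignment lands in the original frame.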

19. Random train/test split:

df = df.sample(frac=1.0) # shuffle all rows
cut_idx = int(round(0.2 * df.shape[0]))
df_test, df_train = df.iloc[:cut_idx], df.iloc[cut_idx:]
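A sketch of the split on a made-up 10-row frame. The random_state=0 argument is an addition for reproducibility, not part of the original note.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # made-up data

# frac=1.0 returns every row in random order; random_state fixes the order.
shuffled = df.sample(frac=1.0, random_state=0).reset_index(drop=True)

# First 20% of the shuffled rows become the test set.
cut_idx = int(round(0.2 * shuffled.shape[0]))
df_test, df_train = shuffled.iloc[:cut_idx], shuffled.iloc[cut_idx:]

print(len(df_test), len(df_train))
```

The two parts are disjoint and together contain every original row exactly once.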

 20. Grouping rows in fixed-size blocks:

df = pd.read_csv(f)

Sum every 5 rows as one group (relies on the default 0, 1, 2, ... index): df.groupby(df.index // 5).sum()

 https://blog.csdn.net/weixin_45144170/article/details/104323786
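The block-sum trick works because integer-dividing the row numbers 0..n-1 by 5 yields a group id per row. The sketch below uses made-up data; the np.arange variant is an addition that works even when the index is not the default one.

```python
import numpy as np
import pandas as pd

# 12 made-up rows with the default index 0..11.
df = pd.DataFrame({"v": range(1, 13)})

# Rows 0-4 -> group 0, rows 5-9 -> group 1, rows 10-11 -> group 2.
sums = df.groupby(df.index // 5).sum()

# Same grouping by position, for a frame with arbitrary index labels.
relabelled = df.set_index(pd.Index(list("abcdefghijkl")))
sums_by_position = relabelled.groupby(np.arange(len(relabelled)) // 5).sum()

print(sums["v"].tolist())
print(sums_by_position["v"].tolist())
```

The last group simply contains the leftover rows when the row count is not a multiple of 5.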

 

pandas operations: https://blog.csdn.net/chandelierds/article/details/83627060
set_index and reset_index: https://blog.csdn.net/cuit2016123070/article/details/83624074
 
posted @ 2019-09-30 22:03  碧水青山