第十一课: - 合并来自各种来源的数据
第 11 课
从多个Excel文件中抓取数据并将它们合并到一个数据帧中。
In [1]:
import pandas as pd
import matplotlib
import os
import sys
%matplotlib inline
In [2]:
print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
print('Matplotlib version ' + matplotlib.__version__)
创建3个excel文件
In [3]:
# Create DataFrame
d = {'Channel':[1], 'Number':[255]}
df = pd.DataFrame(d)
df
Out[3]:
In [4]:
# Export to Excel
df.to_excel('test1.xlsx', sheet_name = 'test1', index = False)
df.to_excel('test2.xlsx', sheet_name = 'test2', index = False)
df.to_excel('test3.xlsx', sheet_name = 'test3', index = False)
print('Done')
将所有三个Excel文件放入一个DataFrame中
获取文件名列表,确保文件夹中没有其他Excel文件。
In [5]:
# List to hold file names
FileNames = []
# Your path will be different, please modify the path below.
os.chdir(r"C:\Users\david\notebooks\update")
# Find any file that ends with ".xlsx"
for files in os.listdir("."):
if files.endswith(".xlsx"):
FileNames.append(files)
FileNames
Out[5]:
创建一个函数来处理所有的excel文件。
In [6]:
def GetFile(fnombre):
# Path to excel file
# Your path will be different, please modify the path below.
location = r'C:\Users\david\notebooks\update\\' + fnombre
# Parse the excel file
# 0 = first sheet
df = pd.read_excel(location, 0)
# Tag record to file name
df['File'] = fnombre
# Make the "File" column the index of the df
return df.set_index(['File'])
遍历每个文件名,创建一个数据帧,并将其添加到列表中。
i.e.
df_list = [df, df, df]
In [7]:
# Create a list of dataframes
df_list = [GetFile(fname) for fname in FileNames]
df_list
Out[7]:
In [8]:
# Combine all of the dataframes into one
big_df = pd.concat(df_list)
big_df
Out[8]:
In [9]:
big_df.dtypes
Out[9]:
In [10]:
# Plot it!
big_df['Channel'].plot.bar();
This tutorial was rewrited by CDS