Working With Data Sources 10

Preparing Data for SQL:

Sometimes we would like to stroe data into SQL server. However , the dataset need to be cleaned before it is sent. So here we use pandas to deal with dataset(.csv)file.

1. read_csv, set encoding:

　　file = pd.read_csv("academy_awards.csv",encoding = 'ISO-8859-1')

2. Use str function to read first 4 letters of all the strings in the column.

　　file["Year"] = file["Year"].str[0:4]

3. Use .isin function to get the target rows I need:

　　award_categories = ["Actor -- Leading Role","Actor -- Supporting Role",'Actress -- Leading Role','Actress -- Supporting Role']

　　nominations = later_than_2000[later_than_2000.isin(award_categories)['Category']]

4. Use .map() function to replace all the element in the column as I need:

　　won_dic = {
　　'NO' : 0,
　　'YES': 1
　　}
　　nominations.is_copy = False #Attention, here we can not directly modify the copied dataframe, we have to run this line to make copied dataframe changable.
　　nominations["Won?"] = nominations["Won?"].map(won_dic)

5. Use .drop() function to get rid of columns I do not need:

　　final_nominations = nominations.drop(delete_list,axis = 1)

6. Use vectorized string method to modify each string in a column in the dataframe:

　　additional_info_one = final_nominations["Additional Info"].str.rstrip("'}") #rstrip is to get rid of all the strings on the right side of the target string in the bracket.
　　additional_info_two = additional_info_one.str.split("{.")　　
　　movie_names = additional_info_two.str[0]
　　characters = additional_info_two.str[1]

7. Use to_sql request to save the dataset into the sql:

　　final_nominations.to_sql("nominations",conn,index = False)

posted on 2016-11-24 05:31 阿难1020 阅读(120) 评论(0) 收藏举报