Data Analysis with Pandas 2

1. Use pandas.read_csv() to read a .csv file. The result is returned as a DataFrame.
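
A minimal sketch of this step. The CSV text and column names are made up for illustration, and io.StringIO stands in for a file path so the snippet runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path instead.
csv_text = "name,protein_g,fat_g\nbutter,0.85,81.11\ncheese,21.40,28.74\n"

food = pd.read_csv(io.StringIO(csv_text))

print(type(food))   # the result is a DataFrame
print(food.shape)   # (number of rows, number of columns)
```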

2. The DataFrame is the core tabular structure in Pandas. It is not a matrix. The first row (row 0) is the first row of data, not the column names. The column names are stored separately from the rows of data.
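
As a rough illustration, the column names can be read from .columns while position 0 already holds data. The DataFrame contents below are invented for the example.

```python
import pandas as pd

# Hypothetical data; one text column and one numeric column,
# which a plain numeric matrix could not hold together.
food = pd.DataFrame({"name": ["butter", "cheese"], "protein_g": [0.85, 21.40]})

labels = food.columns.tolist()     # column names are kept in .columns
first_row = food.iloc[0].tolist()  # position 0 is the first row of data

print(labels)
print(first_row)
```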

3. Pandas uses the column labels to provide more context when returning a row or a column from a DataFrame. For example, when you select a row from a DataFrame, instead of just returning the values in that row as a list, Pandas returns a Series object that contains the column labels as well as the corresponding values.
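
A small sketch of what that looks like, using an invented two-row DataFrame:

```python
import pandas as pd

# Hypothetical data for illustration.
food = pd.DataFrame({"name": ["butter", "cheese"], "protein_g": [0.85, 21.40]})

row = food.loc[0]          # select the row labeled 0

print(type(row))           # a Series, not a plain list
print(row.index.tolist())  # the column labels travel with the values
print(row["name"])         # a value can be looked up by its column label
```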

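4. In NumPy, a[99, :] selects the row at index 99 of a matrix, which is the 100th row, while a[99, 0] selects only the first element of that row. In Pandas, we only need a.loc[99], which selects the row labeled 99. With the default integer index this is the 100th row, and it is displayed as a Series.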
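
The comparison can be sketched as follows. The 100 x 3 matrix and the column names x, y, z are assumptions made for the example, and the DataFrame uses the default integer index.

```python
import numpy as np
import pandas as pd

# Hypothetical 100 x 3 matrix holding the integers 0..299.
a = np.arange(300).reshape(100, 3)
df = pd.DataFrame(a, columns=["x", "y", "z"])

np_row = a[99, :]     # NumPy: row at index 99 (the 100th row) as an ndarray
np_item = a[99, 0]    # NumPy: a single element, row 99 and column 0
pd_row = df.loc[99]   # Pandas: row labeled 99 as a Series with column labels

print(np_row)
print(np_item)
print(pd_row)
```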

5. The dtypes attribute of a DataFrame returns a Series with the data type of each column.
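
For example, with an invented DataFrame that mixes text, float, and integer columns:

```python
import pandas as pd

# Hypothetical data for illustration.
food = pd.DataFrame({
    "name": ["butter", "cheese"],
    "protein_g": [0.85, 21.40],
    "rank": [1, 2],
})

types = food.dtypes  # a Series indexed by column name
print(types)
```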

6. To select certain columns out of all the columns of a DataFrame: first convert the column labels to a list with .columns.tolist(). Then loop over the list, pick the column names that satisfy the requirement, and append them to an empty list. Finally, index the DataFrame with the list of selected names (the food_list[[list]] form) to get a DataFrame containing only those columns.
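
One way this could look in practice. The data, the column names, and the "(g)" suffix rule are assumptions chosen for the example; food_list is used as the DataFrame name to match the note above.

```python
import pandas as pd

# Hypothetical nutrition table.
food_list = pd.DataFrame({
    "name": ["butter", "cheese"],
    "protein_(g)": [0.85, 21.40],
    "fat_(g)": [81.11, 28.74],
    "sodium_(mg)": [714, 1146],
})

# Step 1: column labels as a plain Python list.
col_names = food_list.columns.tolist()

# Step 2: keep only the names that satisfy the requirement
# (here, assumed to be "measured in grams").
gram_columns = []
for c in col_names:
    if c.endswith("(g)"):
        gram_columns.append(c)

# Step 3: indexing with a list of names returns a DataFrame.
gram_df = food_list[gram_columns]
print(gram_df)
```

The loop can also be written as a list comprehension; the explicit loop is kept here because it mirrors the steps in the note.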

7. To normalize the data in a column, divide each value by the column's maximum value, obtained with .max().
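
A short sketch with invented values. Dividing a column by a scalar applies the division to every element.

```python
import pandas as pd

# Hypothetical data for illustration.
food = pd.DataFrame({"protein_g": [5.0, 10.0, 20.0]})

# Each value divided by the column maximum, so the largest becomes 1.0.
normalized = food["protein_g"] / food["protein_g"].max()

print(normalized.tolist())  # [0.25, 0.5, 1.0]
```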

8. Assigning a new column works much like assigning a new key-value pair to a dictionary.
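
The parallel can be seen side by side. The names and values below are made up for the example.

```python
import pandas as pd

# Dictionary: assigning to a new key adds it.
d = {"a": 1}
d["b"] = 2

# DataFrame: assigning to a new column label adds the column.
food = pd.DataFrame({"protein_g": [5.0, 10.0, 20.0]})
food["protein_mg"] = food["protein_g"] * 1000

print(d)
print(food)
```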

9. Sort by column values:

DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')

Parameters:

by : string name or list of names which refer to the axis items

axis : index, columns to direct sorting

ascending : bool or list of bool

Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.

inplace : bool

if True, perform operation in-place

kind : {quicksort, mergesort, heapsort}

Choice of sorting algorithm. See also numpy.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.

na_position : {'first', 'last'}

first puts NaNs at the beginning, last puts NaNs at the end

Returns:

sorted_obj : DataFrame
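
A brief sketch of how by, ascending, na_position, and inplace behave. The DataFrame and its values are invented for the example.

```python
import pandas as pd

# Hypothetical data with one missing value.
food = pd.DataFrame({
    "name": ["butter", "cheese", "egg", "milk"],
    "fat_g": [81.1, 28.7, None, 3.3],
})

# Ascending by default; NaN goes last by default. Returns a new DataFrame.
by_fat = food.sort_values("fat_g")

# Descending, with NaN placed first.
by_fat_desc = food.sort_values("fat_g", ascending=False, na_position="first")

# inplace=True modifies food itself and returns None.
food.sort_values("name", ascending=False, inplace=True)

print(by_fat)
print(by_fat_desc)
print(food)
```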

Posted on 2016-10-06 14:02 by 阿难1020