ArmRoundMan


The DataFrame is the central data structure in the pandas API. It's like a spreadsheet, with numbered rows and named columns.

To make the examples below runnable, import the required modules first.

import numpy as np
import pandas as pd

The following code instantiates the pd.DataFrame class to create a DataFrame.

# Create and populate a 5x2 NumPy array.
my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])

# Create a Python list that holds the names of the two columns.
my_column_names = ['temperature', 'activity']

# Create a DataFrame.
my_dataframe = pd.DataFrame(data=my_data, columns=my_column_names)

# Print the entire DataFrame
print(my_dataframe)
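To check what the constructor produced, it can help to look at the shape, the column labels, and the row labels. A minimal sketch that rebuilds the same DataFrame so it runs on its own:

```python
import numpy as np
import pandas as pd

my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])
my_dataframe = pd.DataFrame(data=my_data, columns=['temperature', 'activity'])

print(my_dataframe.shape)          # (number of rows, number of columns)
print(list(my_dataframe.columns))  # column labels
print(list(my_dataframe.index))    # default row labels: 0, 1, 2, ...
print(my_dataframe.dtypes)         # one dtype per column
```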

You may add a new column to an existing pandas DataFrame just by assigning values to a new column name.

# Create a new column named adjusted.
my_dataframe["adjusted"] = my_dataframe["activity"] + 2

# Print the entire DataFrame
print(my_dataframe)

pandas provides multiple ways to isolate specific rows, columns, slices, or cells in a DataFrame.

print("Rows #0, #1, and #2:")
print(my_dataframe.head(3), '\n')

print("Row #2:")
print(my_dataframe.iloc[[2]], '\n')  # The type of the result is DataFrame.
print("Row #2:")
print(my_dataframe.iloc[2], '\n')  # The type of the result is Series.
print("Rows #1, #2, and #3:")
# Positions are zero-based, so the slice starts at the second row, not the
# first; the end position (4) is excluded.
print(my_dataframe[1:4], '\n')

print("Column 'temperature':")
print(my_dataframe['temperature'])
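Besides the forms above, rows and columns can be selected together with .iloc (by position) or .loc (by label), and rows can be filtered with a boolean mask. A short sketch on the same data; the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

my_dataframe = pd.DataFrame(
    data=np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]]),
    columns=['temperature', 'activity'],
)

# Position-based: rows at positions 1..3 (end excluded), first column.
by_position = my_dataframe.iloc[1:4, 0]

# Label-based: row labels 1..3 (end INCLUDED), column selected by name.
by_label = my_dataframe.loc[1:3, 'temperature']

# Boolean mask: keep only the rows where activity is greater than 8.
busy_rows = my_dataframe[my_dataframe['activity'] > 8]

print(by_position.tolist())
print(by_label.tolist())
print(busy_rows)
```

Note that the two slices select the same rows here only because the default row labels coincide with the positions.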

Q: What's the difference between Series and DataFrame? 

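A: A Series is a one-dimensional labeled array, and each column of a DataFrame is a Series. A single row taken out of a DataFrame (for example with .iloc[2]) is also returned as a Series, indexed by the column names, which is probably why Google Gemini insists on "row".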
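The relationship can be checked directly by looking at the types pandas returns. A small sketch, using the same temperature/activity data as earlier:

```python
import numpy as np
import pandas as pd

my_dataframe = pd.DataFrame(
    data=np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]]),
    columns=['temperature', 'activity'],
)

column = my_dataframe['temperature']     # Series, indexed by the row labels
row = my_dataframe.iloc[2]               # Series, indexed by the column names
one_row_frame = my_dataframe.iloc[[2]]   # DataFrame that holds a single row

print(type(column).__name__, list(column.index))
print(type(row).__name__, list(row.index))
print(type(one_row_frame).__name__, one_row_frame.shape)
```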

How to index a particular cell of the DataFrame?

# Create a Python list that holds the names of the four columns.
my_column_names = ['Eleanor', 'Chidi', 'Tahani', 'Jason']

# Create a 3x4 numpy array, each cell populated with a random integer.
my_data = np.random.randint(low=0, high=101, size=(3, 4))

# Create a DataFrame.
df = pd.DataFrame(data=my_data, columns=my_column_names)

# Print the entire DataFrame
print(df)

# Print the value in row #1 of the Eleanor column (chained indexing).
print("\nSecond row of the Eleanor column: %d\n" % df['Eleanor'][1])
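Chained indexing is not the only way to read a cell. The sketch below compares it with the single-step accessors .at, .iat, and .loc. It uses fixed values instead of random ones (an assumption made here so the result is predictable):

```python
import numpy as np
import pandas as pd

# Fixed, made-up values so that every run gives the same result.
df = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]),
    columns=['Eleanor', 'Chidi', 'Tahani', 'Jason'],
)

chained = df['Eleanor'][1]       # two steps: pick the column, then the row label
by_label = df.at[1, 'Eleanor']   # one step, by row label and column name
by_position = df.iat[1, 0]       # one step, by row and column position
via_loc = df.loc[1, 'Eleanor']   # one step, label-based like .at

print(chained, by_label, by_position, via_loc)
```

For reading a single value, all four forms return the same cell.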

The following code shows how to add a new column to an existing DataFrame through a row-by-row (element-wise) calculation on other columns:

# Create a column named Janet whose contents are the sum
# of two other columns.
df['Janet'] = df['Tahani'] + df['Jason']

# Print the enhanced DataFrame
print(df)
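To see that the addition really works row by row, the result can be compared with an explicit Python loop over the two columns. A sketch with fixed, made-up values in place of the random ones:

```python
import numpy as np
import pandas as pd

# Fixed, made-up values so that the result is reproducible.
df = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]),
    columns=['Eleanor', 'Chidi', 'Tahani', 'Jason'],
)

# Vectorized form: pandas adds the two columns element by element.
df['Janet'] = df['Tahani'] + df['Jason']

# The same calculation written out as an explicit row-by-row loop.
expected = [t + j for t, j in zip(df['Tahani'], df['Jason'])]

print(df['Janet'].tolist())
print(expected)
```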

Pandas provides two different ways to duplicate a DataFrame:

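  • Referencing: the two names stay tied to the same data, so a change made through one shows up through the other.
  • Copying: the two DataFrames are independent of each other.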

 

# Create a reference by assigning df to a new variable.
print("Experiment with a reference:")
reference_to_df = df

# Print the starting value of a particular cell.
print("  Starting value of df: %d" % df['Jason'][1])
print("  Starting value of reference_to_df: %d\n" % reference_to_df['Jason'][1])

# Modify a cell in df.
# Why not use chained indexing for DataFrame assignment?
df.at[1, 'Jason'] = df['Jason'][1] + 5
print("  Updated df: %d" % df['Jason'][1])
print("  Updated reference_to_df: %d\n\n" % reference_to_df['Jason'][1])

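There are important differences among .iloc, .at, and chained indexing. For reading a value, chained indexing such as df['Jason'][1] can be swapped freely with .at and gives exactly the same output. For assignment it is not a proper choice: the first step, df['Jason'], may return a temporary object, so the write is not guaranteed to reach df, and pandas warns about this pattern. Use a single-step accessor such as .at or .loc when assigning.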
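A small sketch of the assignment question raised above. The single-step forms write into df itself. The chained form is included only for comparison: whether it changes df depends on the pandas version and settings, so its outcome is printed rather than assumed, and its warning is silenced. The data is fixed and made up for illustration:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]),
    columns=['Eleanor', 'Chidi', 'Tahani', 'Jason'],
)

# Single-step assignment: the write goes straight into df.
df.at[1, 'Jason'] = df.at[1, 'Jason'] + 5
df.loc[2, 'Jason'] = 0

# Chained assignment: pandas first evaluates df['Jason'], then assigns into
# that intermediate object. The effect on df is version-dependent.
before = df.at[0, 'Jason']
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df['Jason'][0] = before + 100

print("after .at and .loc:", df['Jason'].tolist()[1:])
print("chained assignment changed df:", df.at[0, 'Jason'] != before)
```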

The following code shows an experiment with a copy (to be finished):

copy_of_my_dataframe = my_dataframe.copy()
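One possible way to finish the experiment, mirroring the reference experiment above. This is a sketch rather than the author's own version: it uses df with fixed, made-up values, and copy_of_df is a name chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]),
    columns=['Eleanor', 'Chidi', 'Tahani', 'Jason'],
)

reference_to_df = df      # a second name for the same object
copy_of_df = df.copy()    # an independent object (copy() is deep by default)

# Modify a cell in df only.
df.at[1, 'Jason'] = df.at[1, 'Jason'] + 5

print("  Updated df: %d" % df.at[1, 'Jason'])
print("  Updated reference_to_df: %d" % reference_to_df.at[1, 'Jason'])
print("  copy_of_df: %d" % copy_of_df.at[1, 'Jason'])
print("  reference is df:", reference_to_df is df)
print("  copy is df:", copy_of_df is df)
```

The reference follows the change because it is the same object; the copy keeps the starting value.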

 

posted on 2024-08-22 20:07  后生那各膊客圆了