【843】dataframe相关操作 (match merge)
Reference:
29-April-2024
-
use
to_string()
to print the entire DataFrame. -
Check the number of maximum returned rows:
print(pd.options.display.max_rows)
ref: https://www.w3schools.com/python/pandas/pandas_csv.asp -
Increase the maximum number of rows to display the entire DataFrame:
pd.options.display.max_rows = 9999
ref: https://www.w3schools.com/python/pandas/pandas_csv.asp -
The DataFrames object has a method called
info()
, that gives you more information about the data set. -
Remove all rows with NULL values:
dropna()
-
The
fillna()
method allows us to replace empty cells with a value -
Pandas uses the
mean()
median()
andmode()
methods to calculate the respective values for a specified column-
Mean = the average value (the sum of all values divided by number of values).
-
Median = the value in the middle, after you have sorted all values ascending.
-
Mode = the value that appears most frequently.
-
-
Pandas has a
to_datetime()
method to convert all cells in the 'Date' column into dates:df['Date'] = pd.to_datetime(df['Date'])
-
Remove rows with a NULL value in the "Date" column:
df.dropna(subset=['Date'], inplace = True)
pandas.DataFrame.merge (match columns)
- DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=None, indicator=False, validate=None)[source]
-
Merge DataFrame or named Series objects with a database-style join.
Parameters:
how{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’
Type of merge to be performed.
-
left: use only keys from left frame, similar to a SQL left outer join; preserve key order. (Keep all left dataframe's records, and no matched cells will be set with "NaN")
-
right: use only keys from right frame, similar to a SQL right outer join; preserve key order. (Keep all right dataframe's records, and no matched cells will be set with "NaN")
-
outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically. (Keep both of the records, and no matched cells will be set with "NaN")
-
inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys. (Only keep the matched records)
-
cross: creates the cartesian product from both frames, preserves the order of the left keys.
import pandas as pd # Create two sample DataFrames df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'E', 'F'], 'value1': [1, 2, 3, 4, 5, 6]}) df2 = pd.DataFrame({'key': ['A', 'B', 'C', 'G'], 'value2': [5, 6, 7, 8]}) print("Left DataFrame:") print(df1) print() print("Right DataFrame:") print(df2) print() # Merge the DataFrames based on the 'key' column merged_df1 = pd.merge(df1, df2, on='key', how='left') # Display the merged DataFrame print("how='left': Keep all left dataframe's records, and no matched cells will be set with \"NaN\"") print(merged_df1) print() # Merge the DataFrames based on the 'key' column merged_df2 = pd.merge(df1, df2, on='key', how='right') # Display the merged DataFrame print("how='right': Keep all right dataframe's records, and no matched cells will be set with \"NaN\"") print(merged_df2) print() # Merge the DataFrames based on the 'key' column merged_df3 = pd.merge(df1, df2, on='key', how='outer') # Display the merged DataFrame print("how='outer': Keep both of the records, and no matched cells will be set with \"NaN\"") print(merged_df3) print() # Merge the DataFrames based on the 'key' column merged_df4 = pd.merge(df1, df2, on='key', how='inner') # Display the merged DataFrame print("how='inner': Only keep the matched records") print(merged_df4) Output:
Left DataFrame: key value1 0 A 1 1 B 2 2 C 3 3 D 4 4 E 5 5 F 6 Right DataFrame: key value2 0 A 5 1 B 6 2 C 7 3 G 8 how='left': Keep all left dataframe's records, and no matched cells will be set with "NaN" key value1 value2 0 A 1 5.0 1 B 2 6.0 2 C 3 7.0 3 D 4 NaN 4 E 5 NaN 5 F 6 NaN how='right': Keep all right dataframe's records, and no matched cells will be set with "NaN" key value1 value2 0 A 1.0 5 1 B 2.0 6 2 C 3.0 7 3 G NaN 8 how='outer': Keep both of the records, and no matched cells will be set with "NaN" key value1 value2 0 A 1.0 5.0 1 B 2.0 6.0 2 C 3.0 7.0 3 D 4.0 NaN 4 E 5.0 NaN 5 F 6.0 NaN 6 G NaN 8.0 how='inner': Only keep the matched records key value1 value2 0 A 1 5 1 B 2 6 2 C 3 7 With two columns:
# Create two sample DataFrames df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'E', 'F'], 'value': [1, 2, 3, 4, 5, 6]}) df2 = pd.DataFrame({'key': ['A', 'B', 'C', 'G'], 'value': [1, 6, 3, 8]}) print("Left DataFrame:") print(df1) print() print("Right DataFrame:") print(df2) print() # Merge the DataFrames based on the 'key' column merged_df4 = pd.merge(df1, df2, on=['key', 'value'], how='inner') # Display the merged DataFrame print("how='inner': Only keep the matched records for both columns of 'key' and 'value'") print(merged_df4) Output:
Left DataFrame: key value 0 A 1 1 B 2 2 C 3 3 D 4 4 E 5 5 F 6 Right DataFrame: key value 0 A 1 1 B 6 2 C 3 3 G 8 how='inner': Only keep the matched records for both columns of 'key' and 'value' key value 0 A 1 1 C 3 -
pandas与SQL对比:Comparison with SQL
按照列排序:pandas.DataFrame.sort_values
创建dataframe:pandas读取字典(dict)数据
数据表join,生成新列:PANDAS 数据合并与重塑(join/merge篇)
- pandas的merge方法提供了一种类似于SQL的内存链接操作,官网文档提到它的性能会比其他开源语言的数据操作(例如R)要高效。和SQL语句的对比可以看这里
- merge的列需要有同样的名称
- merge的参数
- on:列名,join用来对齐的那一列的名字,用到这个参数的时候一定要保证左表和右表用来对齐的那一列都有相同的列名。
- left_on:左表对齐的列,可以是列名,也可以是和dataframe同样长度的arrays。
- right_on:右表对齐的列,可以是列名,也可以是和dataframe同样长度的arrays。
- left_index/ right_index: 如果是True的haunted以index作为对齐的key。
- how:数据融合的方法。
- sort:根据dataframe合并的keys按字典顺序排序,默认是,如果置false可以提高表现。
Example:
pd.merge(gdf_africa, df_af_mp, how='left', on='country')
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· DeepSeek 开源周回顾「GitHub 热点速览」
· 记一次.NET内存居高不下排查解决与启示
· 物流快递公司核心技术能力-地址解析分单基础技术分享
· .NET 10首个预览版发布:重大改进与新特性概览!
· .NET10 - 预览版1新功能体验(一)
2020-06-15 【469】RNN, LSTM参考博客
2018-06-15 【318】C# 学习笔记
2016-06-15 【206】Firefox 扩展收集