alex_bn_lee

导航

< 2025年3月 >
23 24 25 26 27 28 1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31 1 2 3 4 5

统计

【843】dataframe相关操作 (match merge)

Reference:


29-April-2024

  • use to_string() to print the entire DataFrame.

  • Check the number of maximum returned rows: print(pd.options.display.max_rows)ref: https://www.w3schools.com/python/pandas/pandas_csv.asp

  • Increase the maximum number of rows to display the entire DataFrame: pd.options.display.max_rows = 9999ref: https://www.w3schools.com/python/pandas/pandas_csv.asp

  • The DataFrames object has a method called info(), that gives you more information about the data set.

  • Remove all rows with NULL values: dropna()

  • The fillna() method allows us to replace empty cells with a value 

  • Pandas uses the mean() median() and mode() methods to calculate the respective values for a specified column

    • Mean = the average value (the sum of all values divided by number of values).

    • Median = the value in the middle, after you have sorted all values ascending.

    • Mode = the value that appears most frequently.

  • Pandas has a to_datetime() method to convert all cells in the 'Date' column into dates: df['Date'] = pd.to_datetime(df['Date'])

  • Remove rows with a NULL value in the "Date" column: df.dropna(subset=['Date'], inplace = True)


pandas.DataFrame.merge (match columns)

DataFrame.merge(righthow='inner'on=Noneleft_on=Noneright_on=Noneleft_index=Falseright_index=Falsesort=Falsesuffixes=('_x', '_y')copy=Noneindicator=Falsevalidate=None)[source]

Merge DataFrame or named Series objects with a database-style join.

Parameters:

how{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’

Type of merge to be performed.

  • left: use only keys from left frame, similar to a SQL left outer join; preserve key order. (Keep all left dataframe's records, and no matched cells will be set with "NaN")

  • right: use only keys from right frame, similar to a SQL right outer join; preserve key order. (Keep all right dataframe's records, and no matched cells will be set with "NaN")

  • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically. (Keep both of the records, and no matched cells will be set with "NaN")

  • inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys. (Only keep the matched records)

  • cross: creates the cartesian product from both frames, preserves the order of the left keys.

import pandas as pd
# Create two sample DataFrames
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'E', 'F'],
'value1': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'key': ['A', 'B', 'C', 'G'],
'value2': [5, 6, 7, 8]})
print("Left DataFrame:")
print(df1)
print()
print("Right DataFrame:")
print(df2)
print()
# Merge the DataFrames based on the 'key' column
merged_df1 = pd.merge(df1, df2, on='key', how='left')
# Display the merged DataFrame
print("how='left': Keep all left dataframe's records, and no matched cells will be set with \"NaN\"")
print(merged_df1)
print()
# Merge the DataFrames based on the 'key' column
merged_df2 = pd.merge(df1, df2, on='key', how='right')
# Display the merged DataFrame
print("how='right': Keep all right dataframe's records, and no matched cells will be set with \"NaN\"")
print(merged_df2)
print()
# Merge the DataFrames based on the 'key' column
merged_df3 = pd.merge(df1, df2, on='key', how='outer')
# Display the merged DataFrame
print("how='outer': Keep both of the records, and no matched cells will be set with \"NaN\"")
print(merged_df3)
print()
# Merge the DataFrames based on the 'key' column
merged_df4 = pd.merge(df1, df2, on='key', how='inner')
# Display the merged DataFrame
print("how='inner': Only keep the matched records")
print(merged_df4)

Output:

Left DataFrame:
key value1
0 A 1
1 B 2
2 C 3
3 D 4
4 E 5
5 F 6
Right DataFrame:
key value2
0 A 5
1 B 6
2 C 7
3 G 8
how='left': Keep all left dataframe's records, and no matched cells will be set with "NaN"
key value1 value2
0 A 1 5.0
1 B 2 6.0
2 C 3 7.0
3 D 4 NaN
4 E 5 NaN
5 F 6 NaN
how='right': Keep all right dataframe's records, and no matched cells will be set with "NaN"
key value1 value2
0 A 1.0 5
1 B 2.0 6
2 C 3.0 7
3 G NaN 8
how='outer': Keep both of the records, and no matched cells will be set with "NaN"
key value1 value2
0 A 1.0 5.0
1 B 2.0 6.0
2 C 3.0 7.0
3 D 4.0 NaN
4 E 5.0 NaN
5 F 6.0 NaN
6 G NaN 8.0
how='inner': Only keep the matched records
key value1 value2
0 A 1 5
1 B 2 6
2 C 3 7

With two columns:

# Create two sample DataFrames
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D', 'E', 'F'],
'value': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'key': ['A', 'B', 'C', 'G'],
'value': [1, 6, 3, 8]})
print("Left DataFrame:")
print(df1)
print()
print("Right DataFrame:")
print(df2)
print()
# Merge the DataFrames based on the 'key' column
merged_df4 = pd.merge(df1, df2, on=['key', 'value'], how='inner')
# Display the merged DataFrame
print("how='inner': Only keep the matched records for both columns of 'key' and 'value'")
print(merged_df4)

Output: 

Left DataFrame:
key value
0 A 1
1 B 2
2 C 3
3 D 4
4 E 5
5 F 6
Right DataFrame:
key value
0 A 1
1 B 6
2 C 3
3 G 8
how='inner': Only keep the matched records for both columns of 'key' and 'value'
key value
0 A 1
1 C 3

pandas与SQL对比:Comparison with SQL

按照列排序:pandas.DataFrame.sort_values

创建dataframe:pandas读取字典(dict)数据

数据表join,生成新列:PANDAS 数据合并与重塑(join/merge篇)

  • pandas的merge方法提供了一种类似于SQL的内存链接操作,官网文档提到它的性能会比其他开源语言的数据操作(例如R)要高效。和SQL语句的对比可以看这里
  • merge的列需要有同样的名称
  • merge的参数
    • on:列名,join用来对齐的那一列的名字,用到这个参数的时候一定要保证左表和右表用来对齐的那一列都有相同的列名。
    • left_on:左表对齐的列,可以是列名,也可以是和dataframe同样长度的arrays。
    • right_on:右表对齐的列,可以是列名,也可以是和dataframe同样长度的arrays。
    • left_index/ right_index: 如果是True的haunted以index作为对齐的key。
    • how:数据融合的方法。
    • sort:根据dataframe合并的keys按字典顺序排序,默认是,如果置false可以提高表现。

Example:

pd.merge(gdf_africa, df_af_mp, how='left', on='country')

 

posted on   McDelfino  阅读(18)  评论(0编辑  收藏  举报

相关博文:
阅读排行:
· DeepSeek 开源周回顾「GitHub 热点速览」
· 记一次.NET内存居高不下排查解决与启示
· 物流快递公司核心技术能力-地址解析分单基础技术分享
· .NET 10首个预览版发布:重大改进与新特性概览!
· .NET10 - 预览版1新功能体验(一)
历史上的今天:
2020-06-15 【469】RNN, LSTM参考博客
2018-06-15 【318】C# 学习笔记
2016-06-15 【206】Firefox 扩展收集
点击右上角即可分享
微信分享提示