Visualizing Derived Variables with Python Seaborn

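Seaborn is a Python library built on matplotlib for creating statistical graphics. A derived variable is a new variable computed from the original data. Plotting derived variables with Seaborn makes it easier to understand relationships in the data, uncover latent patterns, and spot outliers.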

1. Creating derived variables

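Visualizing derived variables is a useful practice in data analysis. Because they are generated from existing columns, derived variables can expose structure in a dataset that the raw columns do not show directly, such as group membership defined by a threshold.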

import pandas as pd
import numpy as np

# Generate sample data
np.random.seed(0)
data = {
    'Age': np.random.randint(20, 60, 100),
    'Salary': np.random.randint(50000, 150000, 100),
    'Education': np.random.choice(['Bachelor', 'Master', 'PhD'], 100),
    'City': np.random.choice(['Zhang San', 'Li Si', 'Wang Wu'], 100)
}

# Create the DataFrame
df = pd.DataFrame(data)
print(df.head())

# Create derived variables on top of the existing df

# A simple derived variable, "Seniority", based on age (40 and over is Senior)
df['Seniority'] = df['Age'].apply(lambda x: 'Senior' if x >= 40 else 'Junior')

# Treat salaries above 100000 as high income, otherwise medium income
df['Income Level'] = df['Salary'].apply(lambda x: 'High' if x > 100000 else 'Medium')

print(df.head())
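
The derived columns above are built with apply and a lambda. As a minimal sketch, the same columns can also be computed with vectorized operations, which tends to be faster on large frames. The use of np.where and pd.cut here is a swapped-in alternative, not the approach of the original example, and the 'Age Group' column with its bin edges is a hypothetical addition for illustration only.

```python
import numpy as np
import pandas as pd

# Same synthetic Age and Salary columns as in the example above
np.random.seed(0)
df = pd.DataFrame({
    'Age': np.random.randint(20, 60, 100),
    'Salary': np.random.randint(50000, 150000, 100),
})

# Vectorized equivalents of the apply/lambda versions
df['Seniority'] = np.where(df['Age'] >= 40, 'Senior', 'Junior')
df['Income Level'] = np.where(df['Salary'] > 100000, 'High', 'Medium')

# Hypothetical extra derived variable: ten-year age bands.
# right=False makes each bin closed on the left, e.g. [20, 30).
df['Age Group'] = pd.cut(
    df['Age'],
    bins=[20, 30, 40, 50, 60],
    right=False,
    labels=['20-29', '30-39', '40-49', '50-59'],
)

print(df.head())
```

np.where suits a two-way split on a single threshold, while pd.cut is a natural fit when a numeric column needs more than two bands.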

Reference: Using the Python pandas.DataFrame.apply function

2. Plotting pairwise bivariate distributions with pairplot()

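The pairplot() function plots the relationship between every pair of variables in a dataset, which gives a quick overview of how several variables relate to one another. By default it draws only the numeric columns; a categorical column can be mapped to color through hue. Commonly used parameters are listed below.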

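Parameter	Description
data	The DataFrame to plot.
hue	Name of the variable used for grouping (color), usually categorical.
palette	Color palette used for the hue levels.
vars	List of column names in the DataFrame to plot; all numeric columns are used if omitted.
kind	Type of plot for the off-diagonal panels (e.g. 'scatter', 'reg').
diag_kind	Type of plot for the diagonal panels (e.g. 'hist', 'kde').
markers	Marker for each level of the hue variable.
height	Height of each subplot, in inches.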

Usage example:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Generate sample data
np.random.seed(0)
data = {
    'Age': np.random.randint(20, 60, 100),
    'Salary': np.random.randint(50000, 150000, 100),
    'Education': np.random.choice(['Bachelor', 'Master', 'PhD'], 100),
    'City': np.random.choice(['Zhang San', 'Li Si', 'Wang Wu'], 100)
}

# Create the DataFrame
df = pd.DataFrame(data)

# Create derived variables on top of the existing df

# A simple derived variable, "Seniority", based on age (40 and over is Senior)
df['Seniority'] = df['Age'].apply(lambda x: 'Senior' if x >= 40 else 'Junior')

# Treat salaries above 100000 as high income, otherwise medium income
df['Income Level'] = df['Salary'].apply(lambda x: 'High' if x > 100000 else 'Medium')

# Visualize with pairplot, colored by the derived variable (only the numeric columns Age and Salary are plotted)
sns.pairplot(df, hue='Income Level', diag_kind='kde', height=2.5)

plt.show()

Reference: Python Seaborn derived variable visualization in practice - CJavaPy

posted @ 2024-01-29 22:35 leviliang
