kaggle2 - Data Visualization

# Set index_col to the name of the first column ("Date", found in cell A1 when the file is opened in Excel); parse_dates=True reads the row labels as dates
fifa_data = pd.read_csv(fifa_filepath, index_col="Date", parse_dates=True)
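
A minimal sketch of what index_col and parse_dates do. The CSV text and its column names are made up for illustration; they are not the course's data file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file
csv_text = """Date,ARG,BRA
1993-08-08,5.0,8.0
1993-09-23,12.0,1.0
1993-10-22,9.0,1.0
"""

# index_col="Date" turns the Date column into the row labels;
# parse_dates=True converts those labels into timestamps
demo_data = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(type(demo_data.index).__name__)  # DatetimeIndex
print(demo_data.loc[pd.Timestamp("1993-09-23"), "ARG"])  # 12.0
```

Because the index holds real timestamps rather than strings, plotting functions can space the dates correctly along the x-axis.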
# Plot the data with Seaborn
plt.figure(figsize=(14,6))

plt.title("Daily Global Streams of Popular Songs in 2017-2018")

sns.lineplot(data=spotify_data)
# List all column names
list(spotify_data.columns)
# Rotate the x-axis tick labels
plt.xticks(rotation=-45)
# Plot a single column
# Line chart showing daily global streams of 'Shape of You'
sns.lineplot(data=spotify_data['Shape of You'], label="Shape of You")

# Draw a bar chart
plt.figure(figsize=(10,6))

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])
# Passing ci=None removes the error bars (in seaborn 0.12 and later the equivalent argument is errorbar=None)
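
A self-contained sketch of the bar chart, using a tiny invented table in place of flight_data (the delay values are made up). Each month has a single value here, so there are no error bars to remove.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for flight_data: month as the index, one column per airline code
demo_delays = pd.DataFrame(
    {"NK": [11.4, 5.2, 3.3, 7.1]},
    index=pd.Index([1, 2, 3, 4], name="Month"),
)

plt.figure(figsize=(10, 6))

# One bar per month; the bar height is that month's value in the NK column
ax = sns.barplot(x=demo_delays.index, y=demo_delays["NK"])

# Replace the default y label (the column name) with something readable
plt.ylabel("Arrival delay (in minutes)")
```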

# Heatmap

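sns.heatmap    - This tells the notebook that we want to create a heatmap.

data=data_airplot  - This tells the notebook to use all of the entries in the flight data to create the heatmap.

annot=True     - This ensures that the value of each cell appears on the chart. (Leaving this out removes the numbers from each cell!)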
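
The three pieces described above combine into a single call. A sketch with an invented table (the values and the name demo_table are assumptions, not the course data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up table: one row per month, one column per airline code
demo_table = pd.DataFrame(
    {"AA": [6.9, 7.2, -0.3], "NK": [11.4, 5.2, 3.3]},
    index=pd.Index([1, 2, 3], name="Month"),
)

plt.figure(figsize=(6, 4))

# Every cell of the table becomes one colored cell; annot=True writes the value in it
ax = sns.heatmap(data=demo_table, annot=True)
plt.xlabel("Airline")
```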

 

# Return the index of the largest value in a list

np.argmax(alist)
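
A quick check of how np.argmax behaves, with a made-up list. When the maximum occurs more than once, the index of the first occurrence is returned.

```python
import numpy as np

# Made-up values; the maximum (9) appears at positions 1 and 3
alist = [3, 9, 4, 9]

idx = np.argmax(alist)
print(idx)         # 1
print(alist[idx])  # 9
```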

# Draw a scatter plot

sns.scatterplot(x = candy_data['sugarpercent'] ,y=candy_data['winpercent'])

   To color-code the points by a category, add:

  hue=candy_data['chocolate'] 
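
With the hue argument included, the full call might look like this sketch. The candy_demo rows are invented values that only borrow the column names used above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up rows with the same column names as candy_data
candy_demo = pd.DataFrame({
    "sugarpercent": [0.73, 0.60, 0.01, 0.46, 0.90],
    "winpercent": [66.9, 67.6, 32.2, 46.1, 52.3],
    "chocolate": ["Yes", "Yes", "No", "No", "Yes"],
})

# hue assigns one color per category and adds a legend
ax = sns.scatterplot(
    x=candy_demo["sugarpercent"],
    y=candy_demo["winpercent"],
    hue=candy_demo["chocolate"],
)
```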

# Scatter plot with a regression line

sns.regplot(x=candy_data['sugarpercent'], y=candy_data['winpercent'])

# Scatter plot with two regression lines (one per group)

sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)
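
Unlike the calls above, sns.lmplot takes column names plus a data argument, and it returns a FacetGrid rather than an Axes. A sketch with invented rows (insurance_demo and its values are assumptions); ci=None skips the confidence band to keep the example fast.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up rows with the same column names as insurance_data
insurance_demo = pd.DataFrame({
    "bmi": [22.0, 27.5, 31.0, 35.5, 24.0, 28.5, 33.0, 36.5],
    "charges": [3000, 4200, 5100, 6400, 15000, 22000, 35000, 42000],
    "smoker": ["no"] * 4 + ["yes"] * 4,
})

# One regression line is fitted per value of the hue column
grid = sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_demo, ci=None)
print(type(grid).__name__)  # FacetGrid
```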

# Draw a categorical scatter plot (it looks a bit like a small flower; works best when the x-axis has two categories, e.g. Yes or No)

sns.swarmplot(x=candy_data['chocolate'],y=candy_data['winpercent'])

 

 

# Draw a histogram

sns.distplot(a=iris_data['Petal Length (cm)'], kde=False)

kde=False  - Controls whether a kernel density estimate (KDE) curve is drawn on top of the histogram; False leaves it out.

 

# Kernel density estimate (KDE) plot (think of it as a smoothed histogram)

# KDE plot 

sns.kdeplot(data=iris_data['Petal Length (cm)'], shade=True)

 

# Two-dimensional KDE plot

# 2D KDE plot

sns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], kind="kde")

 

# Chart categories

Since it's not always easy to decide how to best tell the story behind your data, we've broken the chart types into three broad categories to help with this.

  • Trends - A trend is defined as a pattern of change.
    • sns.lineplot - Line charts are best to show trends over a period of time, and multiple lines can be used to show trends in more than one group.
  • Relationship - There are many different chart types that you can use to understand relationships between variables in your data.
    • sns.barplot - Bar charts are useful for comparing quantities corresponding to different groups.
    • sns.heatmap - Heatmaps can be used to find color-coded patterns in tables of numbers.
    • sns.scatterplot - Scatter plots show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.
    • sns.regplot - Including a regression line in the scatter plot makes it easier to see any linear relationship between two variables.
    • sns.lmplot - This command is useful for drawing multiple regression lines, if the scatter plot contains multiple, color-coded groups.
    • sns.swarmplot - Categorical scatter plots show the relationship between a continuous variable and a categorical variable.
  • Distribution - We visualize distributions to show the possible values that we can expect to see in a variable, along with how likely they are.
    • sns.distplot - Histograms show the distribution of a single numerical variable.
    • sns.kdeplot - KDE plots (or 2D KDE plots) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).
    • sns.jointplot - This command is useful for simultaneously displaying a 2D KDE plot with the corresponding KDE plots for each individual variable.

# Seaborn themes

sns.set_style("dark")

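Seaborn has five themes: (1) "darkgrid", (2) "whitegrid", (3) "dark", (4) "white", and (5) "ticks". To switch themes, use a command like the one in the code cell above, filling in the name of the theme you want.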
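
One way to see what the themes change is to apply each in turn and inspect matplotlib's settings. This sketch looks only at the axes.grid setting, which is a choice made here for illustration; the themes also differ in background and tick styling.

```python
import matplotlib
import seaborn as sns

themes = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# Record whether each theme switches the background grid on
grid_on = {}
for theme in themes:
    sns.set_style(theme)
    grid_on[theme] = bool(matplotlib.rcParams["axes.grid"])

# Only the two "grid" themes enable the grid
print(grid_on)
```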



 
posted @ 2019-06-24 21:42  childhood_2