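A First Look at mglearn
This example is taken from page 16 of 《Python机器学习基础教程》 (the Chinese edition of Introduction to Machine Learning with Python).
Code: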
# import numpy as np
# import matplotlib.pyplot as plt
# import pandas as pd
# import mglearn
# from sklearn.datasets import load_iris
# from sklearn.model_selection import train_test_split
# iris_dataset = load_iris()
# X_train,X_test,y_train,y_test = train_test_split(iris_dataset['data'],iris_dataset['target'],random_state=0)
# # Create a DataFrame from the data in X_train
# # # Label the columns with the strings in iris_dataset.feature_names
# iris_dataframe = pd.DataFrame(X_train,columns=iris_dataset.feature_names)
# grr = pd.plotting.scatter_matrix(iris_dataframe,c=y_train,figsize=(15,15),marker='o',hist_kwds={'bins':20},s=60,alpha=.8,cmap=mglearn.cm3)
# plt.show()
import matplotlib.pyplot as plt
import mglearn
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris data and split it into training and test sets
iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)
# Build a DataFrame from X_train, labeling the columns with the feature names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)
# Draw the scatter matrix, coloring each point by its class label
grr = pd.plotting.scatter_matrix(iris_dataframe, marker='o', c=y_train,
                                 hist_kwds={'bins': 20}, cmap=mglearn.cm3)
plt.show()
Result:
Now let's look at the parameters of scatter_matrix:
def scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, diagonal='hist', marker='.', density_kwds=None,hist_kwds=None, range_padding=0.05, **kwds)
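frame: the pandas DataFrame to plot
alpha: (float, optional), transparency of the plotted points, normally a value in (0, 1]
figsize: ((float, float), optional), figure size in inches, given as a (width, height) tuple
ax: (Matplotlib axis object, optional), usually left as None
diagonal: ({'hist', 'kde'}), must be exactly one of 'hist' or 'kde'; 'hist' draws a histogram on the diagonal and 'kde' draws a kernel density estimate. This is the key parameter of scatter_matrix
marker: (str, optional), any marker type available in Matplotlib, such as '.', ',', or 'o'
density_kwds: (other plotting keyword arguments, optional), a dict of arguments used for the kde plots
hist_kwds: (other plotting keyword arguments, optional), a dict of arguments used for the histograms
range_padding: (float, optional), relative amount by which each axis range is extended beyond the data range (max - min); the larger the value, the more blank margin is left between the points and the edges of each plot
kwds: (other plotting keyword arguments, optional), additional keyword arguments passed on to the underlying scatter plot call (for example c and cmap in the code above)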
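To make the parameter list concrete, here is a minimal sketch that passes most of these arguments explicitly. The data is a hypothetical random DataFrame (toy_frame, toy_labels), not the iris set, so that the snippet needs neither sklearn nor mglearn; the Agg backend is also an assumption, chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is needed
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the iris training set
rng = np.random.default_rng(0)
toy_frame = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
toy_labels = rng.integers(0, 3, size=50)

axes = pd.plotting.scatter_matrix(
    toy_frame,
    alpha=0.8,               # transparency, must lie in (0, 1]
    figsize=(6, 6),          # (width, height) in inches
    diagonal="hist",         # histograms on the diagonal
    marker="o",
    hist_kwds={"bins": 20},  # forwarded to the histogram plots
    range_padding=0.1,       # extra margin relative to each column's data range
    c=toy_labels,            # extra keyword, forwarded to the scatter plots
    s=30,                    # extra keyword, marker size
)

# scatter_matrix returns a 2-D array of Axes, one per pair of columns
print(axes.shape)  # (3, 3)
```

Because the return value is an array of Axes rather than a figure, the plot is displayed with plt.show(), as in the code above. Switching diagonal to "kde" should work the same way, but it relies on SciPy being installed.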
Reference: https://blog.csdn.net/hurry0808/article/details/78573585