Pipeline Workflow

1. Overview

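A Pipeline wraps feature processing and the machine learning model into a single, managed workflow. The parameters learned during fitting can then be reused conveniently on the test set and on future data.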

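  • Every Pipeline method calls the method of the same name on its estimators; if an estimator does not have that method, an error is raised

  • Assume the pipeline contains n estimators

  • fit calls fit and transform on each of the first n-1 estimators in order, then calls fit on the last estimator

  • predict calls transform on each of the first n-1 estimators, then calls predict on the last estimator

  • score calls transform on each of the first n-1 estimators, then calls score on the last estimator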
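The rules above can be sketched with a toy re-implementation. This is only an illustration of the call order, not scikit-learn's actual code; the names MiniPipeline, Center, and LineFit are hypothetical and exist only for this example.

```python
class MiniPipeline:
    """Toy pipeline (hypothetical): mimics the fit/predict/score call order."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) pairs

    def _transform(self, X):
        # Run transform on the first n-1 estimators, in order.
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return X

    def fit(self, X, y):
        # fit + transform on the first n-1 estimators, then fit on the last one.
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        return self.steps[-1][1].predict(self._transform(X))

    def score(self, X, y):
        return self.steps[-1][1].score(self._transform(X), y)


class Center:
    """Hypothetical transformer: subtracts the mean learned during fit."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class LineFit:
    """Hypothetical final estimator: 1-D least squares, assumes centered input."""

    def fit(self, X, y):
        self.intercept_ = sum(y) / len(y)
        self.slope_ = sum(x * t for x, t in zip(X, y)) / sum(x * x for x in X)
        return self

    def predict(self, X):
        return [self.slope_ * x + self.intercept_ for x in X]

    def score(self, X, y):
        # Coefficient of determination (R^2).
        pred = self.predict(X)
        mean_y = sum(y) / len(y)
        ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
        ss_tot = sum((t - mean_y) ** 2 for t in y)
        return 1 - ss_res / ss_tot


X = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

pipe = MiniPipeline([("center", Center()), ("fit", LineFit())]).fit(X, y)
print(pipe.predict([5.0]))  # the mean learned on the training data is reused here
print(pipe.score(X, y))

# A pipeline whose last step has no predict method fails when predict is called.
broken = MiniPipeline([("center", Center())]).fit(X, y)
try:
    broken.predict([5.0])
except AttributeError as exc:
    print("error:", type(exc).__name__)
```

Note that predict never refits the Center step: it only applies the mean stored during fit, which is the "reuse on the test set and future data" behavior described in the overview.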

 

 

2. Code Example

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
# Note: load_boston was removed in scikit-learn 1.2, so this code needs an older version.
from sklearn.datasets import load_boston
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
import warnings
warnings.filterwarnings("ignore")


X,y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)


pipe_lr = Pipeline([
    ('pf', PolynomialFeatures(degree=3,include_bias=False,interaction_only=False)),
    ('sc', StandardScaler()),
    ('clf', Ridge(alpha=0.8))])



# fit calls fit and transform on the first n-1 estimators in order, then fit on the last estimator
pipe_lr.fit(X_train, y_train)
# score calls transform on the first n-1 estimators in order, then score on the last estimator
print(f'Train score: {pipe_lr.score(X_train, y_train):.5%}, Test score: {pipe_lr.score(X_test, y_test):.5%}')


# The pipeline above does the same thing as the following code

X,y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

pf = PolynomialFeatures(degree=3,include_bias=False,interaction_only=False)
X_train = pf.fit_transform(X_train)
X_test = pf.transform(X_test)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

clf = Ridge(alpha=0.8)
clf.fit(X_train, y_train)
print(f'Train score: {clf.score(X_train, y_train):.5%}, Test score: {clf.score(X_test, y_test):.5%}')
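One way to check that the two versions really match is to compare their predictions directly. The sketch below is an assumption-laden variant of the code above: it swaps in synthetic data from make_regression and degree=2 so that it runs on current scikit-learn releases, and it is not the dataset used in this post.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic regression data (an assumption for this sketch, not the Boston data).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Pipeline version.
pipe = Pipeline([
    ('pf', PolynomialFeatures(degree=2, include_bias=False)),
    ('sc', StandardScaler()),
    ('clf', Ridge(alpha=0.8))])
pipe.fit(X_train, y_train)

# Manual version: fit_transform on the training set, transform only on the test set.
pf = PolynomialFeatures(degree=2, include_bias=False)
sc = StandardScaler()
clf = Ridge(alpha=0.8)
clf.fit(sc.fit_transform(pf.fit_transform(X_train)), y_train)
manual_pred = clf.predict(sc.transform(pf.transform(X_test)))

# Both routes should give the same predictions.
same = np.allclose(pipe.predict(X_test), manual_pred)
print(same)

# The fitted steps stay accessible by name through named_steps.
n_scaled_features = pipe.named_steps['sc'].mean_.shape[0]
print(n_scaled_features)
```

The named_steps lookup is handy when you want to inspect what a step learned, for example the per-feature means stored by the scaler.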

 

posted @ 2022-12-18 21:16  qsl_你猜