Processing a subset of features with a pipeline

2017-08-17 14:24  xplorerthik

http://scikit-learn.org/stable/auto_examples/preprocessing/plot_function_transformer.html#sphx-glr-auto-examples-preprocessing-plot-function-transformer-py

This can be done with the approach below: first select some of the columns, then use FeatureUnion to merge the results back into a complete feature set.

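from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def all_but_first_column(X):
    # Drop column 0 and keep every remaining column.
    return X[:, 1:]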


def drop_first_component(X, y):
    """
    Create a pipeline with PCA and the column selector and use it to
    transform the dataset.
    """
    pipeline = make_pipeline(
        # Run PCA on the full feature set, then drop column 0 (the first component).
        PCA(), FunctionTransformer(all_but_first_column),
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    pipeline.fit(X_train, y_train)
    return pipeline.transform(X_test), y_test
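
The pipeline above only drops a column. The select-then-merge idea mentioned at the top could look roughly like the sketch below. The selector functions `first_two_columns` and `remaining_columns`, the choice of StandardScaler as the processing step, and the random data are all assumptions made for illustration; they are not taken from the linked example.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def first_two_columns(X):
    # Hypothetical selector: the columns that need processing.
    return X[:, :2]


def remaining_columns(X):
    # Hypothetical selector: the columns passed through unchanged.
    return X[:, 2:]


# Branch 1: select the first two columns, then standardize them.
scaled_branch = make_pipeline(
    FunctionTransformer(first_two_columns), StandardScaler()
)
# Branch 2: select the remaining columns and leave them untouched.
passthrough_branch = FunctionTransformer(remaining_columns)

# FeatureUnion concatenates the outputs of both branches column-wise,
# so the result has the same number of columns as the input.
union = FeatureUnion([
    ("scaled", scaled_branch),
    ("rest", passthrough_branch),
])

# Illustrative data only: 20 samples, 4 features.
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=3.0, size=(20, 4))
X_out = union.fit_transform(X)
```

Here only the first two columns are standardized, and the other two come through unchanged. In scikit-learn 0.20 and later, ColumnTransformer is another way to get the same effect.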