# -*- coding: utf-8 -*-
"""
Created on Wed Aug 10 20:26:15 2016

@author: qqhfeng
"""

# Module 1: VarianceThreshold, feature selection by variance
'''
Feature selector that removes all low-variance features.

This feature selection algorithm looks only at the features (X), not the
desired outputs (y), and can thus be used for unsupervised learning.

VarianceThreshold is a simple baseline approach to feature selection. It
removes all features whose variance doesn't meet some threshold. By default,
it removes all zero-variance features, i.e. features that have the same value
in all samples.

As an example, suppose that we have a dataset with boolean features, and we
want to remove all features that are either one or zero (on or off) in more
than 80% of the samples. Boolean features are Bernoulli random variables, and
the variance of such variables is given by Var[X] = p(1 - p), so the 80%
cutoff corresponds to threshold = .8 * (1 - .8).
'''
from sklearn.feature_selection import VarianceThreshold

X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]

# Alternative: drop boolean features that take the same value in more than
# 80% of the samples.
# sel = VarianceThreshold(threshold=(.8 * (1 - .8)))

# Default threshold (0.0): only constant features are removed. No column of X
# is constant, so all three features are kept here.
sel = VarianceThreshold()
print(sel.fit_transform(X))

# Module 2: select the most important features.
# SelectKBest removes all but the k highest scoring features.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)

# chi2 (the chi-squared statistic) is one way to score feature importance.
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
print(X_new.shape)

# Module 3: recursive feature elimination
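
# The recursive feature elimination module is only named above, without code.
# What follows is a minimal sketch of how it might look, not the original
# author's implementation. It assumes scikit-learn's RFE class wrapped around a
# LogisticRegression estimator on the same iris data; the estimator choice,
# max_iter=1000 and n_features_to_select=2 are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Assumption: any estimator exposing coef_ or feature_importances_ could be
# used here; LogisticRegression is chosen only for illustration.
estimator = LogisticRegression(max_iter=1000)

# Repeatedly fit the estimator and drop the weakest feature (step=1)
# until only two features remain.
selector = RFE(estimator, n_features_to_select=2, step=1)
X_rfe = selector.fit_transform(X, y)

print(X_rfe.shape)        # (150, 2)
print(selector.support_)  # boolean mask of the features that were kept
print(selector.ranking_)  # rank 1 marks a selected feature
```

# Unlike SelectKBest, which scores each feature once, RFE refits the model
# after every elimination, so the ranking reflects how the features behave
# together rather than in isolation.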