
Feature Selection

2018-01-29 12:41  xplorerthik

Filter methods
These include simple statistical tests that determine whether a feature is statistically significant. One example is the p-value of a t-test, which is used to decide whether the null hypothesis should be accepted and the feature rejected. This does not take feature interactions into account and is generally not a recommended way of doing feature selection, as it can lead to a loss of information.
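A minimal sketch of such a t-test filter, assuming a binary target with labels 0 and 1. It uses SciPy's `scipy.stats.ttest_ind` with `equal_var=False` (Welch's variant). The function name `t_test_filter`, the threshold `alpha=0.05` and the synthetic data are illustrative assumptions, not part of the original post.

```python
import numpy as np
from scipy.stats import ttest_ind


def t_test_filter(X, y, alpha=0.05):
    """Keep columns whose two-sample t-test p-value is below alpha (binary y in {0, 1})."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Welch's t-test run column by column (axis=0), not assuming equal variances
    p_values = ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=False).pvalue
    # p < alpha: reject H0 ("same mean in both classes") and keep the feature;
    # otherwise H0 is not rejected and the feature is dropped
    selected = [j for j, p in enumerate(p_values) if p < alpha]
    return selected, p_values


# Hypothetical synthetic data: column 0 is shifted by class, columns 1-3 are pure noise
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n)
informative = rng.normal(loc=2.0 * y, scale=1.0)
noise = rng.normal(size=(2 * n, 3))
X = np.column_stack([informative, noise])

selected, p_values = t_test_filter(X, y)
print("selected columns:", selected)
print("p-values:", np.round(p_values, 4))
```

Because each column is tested on its own, a feature that is only informative in combination with another (for example an XOR-style interaction) can get a large p-value and be dropped. This is the limitation described above.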

Wrapper-based methods

Tree-based models such as RandomForest are also robust to issues like multicollinearity, missing values and outliers, and they can discover some interactions between features. However, this can be rather computationally expensive.
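The sketch below ranks features with a RandomForest, assuming scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_`. Ranking by a fitted model's importances is often also described as an embedded method. The helper `rank_features_by_forest` and the synthetic target are hypothetical. Fitting cost grows with `n_estimators` and the size of the data, which is where the computational expense comes from.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features_by_forest(X, y, n_estimators=200, random_state=0):
    """Fit a RandomForest and return feature indices sorted by impurity-based importance."""
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    forest.fit(X, y)
    importances = forest.feature_importances_  # normalized to sum to 1
    order = np.argsort(importances)[::-1]  # most important feature first
    return order.tolist(), importances


# Hypothetical data: the target depends only on features 0 and 1; features 2-4 are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

order, importances = rank_features_by_forest(X, y)
print("ranking:", order)
print("importances:", np.round(importances, 3))
```

Impurity-based importances can favour features with many distinct values. scikit-learn's `sklearn.inspection.permutation_importance` is a common alternative when that bias matters.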

 

A simple wrapper method is Forward Feature Selection (FFS): features are added step by step, one feature per iteration.
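The loop below is a minimal FFS sketch. It assumes a regression task scored by ordinary least squares on a held-out validation split, which is a stand-in for whatever model and metric you actually wrap. Each iteration tries every remaining feature, keeps the one that lowers validation MSE the most, and stops when no candidate helps. The names `validation_mse` and `forward_feature_selection` are hypothetical.

```python
import numpy as np


def validation_mse(X_train, y_train, X_val, y_val):
    """Fit least squares with an intercept on the training split; return validation MSE."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_val = np.column_stack([np.ones(len(X_val)), X_val])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residual = y_val - A_val @ coef
    return float(np.mean(residual ** 2))


def forward_feature_selection(X_train, y_train, X_val, y_val, max_features=None):
    """Greedily add one feature per iteration while the validation MSE improves."""
    n_features = X_train.shape[1]
    if max_features is None:
        max_features = n_features
    selected, remaining = [], list(range(n_features))
    # Baseline: intercept-only model with no features selected yet
    best_mse = validation_mse(X_train[:, selected], y_train, X_val[:, selected], y_val)
    while remaining and len(selected) < max_features:
        # Score every candidate subset "current selection + one new feature"
        scores = {
            j: validation_mse(X_train[:, selected + [j]], y_train,
                              X_val[:, selected + [j]], y_val)
            for j in remaining
        }
        best_j = min(scores, key=scores.get)
        if scores[best_j] >= best_mse:
            break  # no remaining feature lowers the validation error
        best_mse = scores[best_j]
        selected.append(best_j)
        remaining.remove(best_j)
    return selected, best_mse


# Hypothetical data: y depends on features 2 and 0; the other columns are noise
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 2] - 2.0 * X[:, 0] + rng.normal(scale=0.5, size=300)

selected, mse = forward_feature_selection(X[:200], y[:200], X[200:], y[200:])
print("selected (in order added):", selected, "validation MSE:", round(mse, 3))
```

Scoring on a separate validation split matters here: training error never increases when a feature is added, so it would never trigger the stopping rule. scikit-learn also ships `sklearn.feature_selection.SequentialFeatureSelector` with `direction="forward"`, which wraps an estimator and uses cross-validation instead of a single split.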

Feature engineering is a superset of activities that includes feature extraction, feature construction and feature selection. Each of the three is an important step, and none should be ignored. We can still generalize about their relative importance, though: in my experience, feature construction > feature extraction > feature selection.