单因素特征选择--Univariate Feature Selection

An example showing univariate feature selection.

Noisy (non informative) features are added to the iris data and univariate feature selection(单因素特征选择) is applied. For each feature, we plot the p-values for the univariate feature selection and the corresponding weights of an SVM. We can see that univariate feature selection selects the informative features and that these have larger SVM weights.

In the total set of features, only the 4 first ones are significant. We can see that they have the highest score with univariate feature selection. The SVM assigns a large weight to one of these features, but also Selects many of the non-informative features. Applying univariate feature selection before the SVM increases the SVM weight attributed to the significant features, and will thus improve classification.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
#encoding:utf-8
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets,svm
from sklearn.feature_selection import SelectPercentile,f_classif
 
###load iris dateset
iris=datasets.load_iris()
 
###Some Noisy data not correlated
E=np.random.uniform(0,0.1,size=(len(iris.data),20)) ###uniform distribution   150*20
X=np.hstack((iris.data,E))
y=iris.target
 
plt.figure(1)
plt.clf()
 
X_indices=np.arange(X.shape[-1])   ###X.shape=(150,24)    X.shape([-1])=24
 
selector=SelectPercentile(f_classif,percentile=10)
selector.fit(X,y)
scores=-np.log10(selector.pvalues_)
scores/=scores.max()
 
plt.bar(X_indices-0.45,scores,width=0.2,label=r"Univariate score ($-Log(p_{value})$)",color='darkorange')
# plt.show()
 
 
####Compare to weight of an svm
clf=svm.SVC(kernel='linear')
clf.fit(X,y)
 
svm_weights=(clf.coef_**2).sum(axis=0)
svm_weights/=svm_weights.max()
plt.bar(X_indices - .25, svm_weights, width=.2, label='SVM weight',
        color='navy')
clf_selected=svm.SVC(kernel='linear')
# clf_selected.fit(selector.transform((X,y)))
clf_selected.fit(selector.transform(X),y)
 
svm_weights_selected=(clf_selected.coef_**2).sum(axis=0)
svm_weights_selected/=svm_weights_selected.max()
 
plt.bar(X_indices[selector.get_support()]-.05,svm_weights_selected,width=.2,label='SVM weight after selection',color='c')
 
plt.title("Comparing feature selection")
plt.xlabel('Feature number')
plt.yticks(())
plt.axis('tight')
plt.legend(loc='upper right')
plt.show()

 实验结果:

 

posted on   波比12  阅读(1443)  评论(0编辑  收藏  举报

编辑推荐:
· .NET 9 new features-C#13新的锁类型和语义
· Linux系统下SQL Server数据库镜像配置全流程详解
· 现代计算机视觉入门之:什么是视频
· 你所不知道的 C/C++ 宏知识
· 聊一聊 操作系统蓝屏 c0000102 的故障分析
阅读排行:
· DeepSeek V3 两周使用总结
· 回顾我的软件开发经历(1)
· C#使用yield关键字提升迭代性能与效率
· 低成本高可用方案!Linux系统下SQL Server数据库镜像配置全流程详解
· 4. 使用sql查询excel内容

导航

< 2025年1月 >
29 30 31 1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31 1
2 3 4 5 6 7 8
点击右上角即可分享
微信分享提示