单因素特征选择--Univariate Feature Selection
An example showing univariate feature selection.
Noisy (non informative) features are added to the iris data and univariate feature selection(单因素特征选择) is applied. For each feature, we plot the p-values for the univariate feature selection and the corresponding weights of an SVM. We can see that univariate feature selection selects the informative features and that these have larger SVM weights.
In the total set of features, only the 4 first ones are significant. We can see that they have the highest score with univariate feature selection. The SVM assigns a large weight to one of these features, but also Selects many of the non-informative features. Applying univariate feature selection before the SVM increases the SVM weight attributed to the significant features, and will thus improve classification.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | #encoding:utf-8 import numpy as np import matplotlib.pyplot as plt from sklearn import datasets,svm from sklearn.feature_selection import SelectPercentile,f_classif ###load iris dateset iris = datasets.load_iris() ###Some Noisy data not correlated E = np.random.uniform( 0 , 0.1 ,size = ( len (iris.data), 20 )) ###uniform distribution 150*20 X = np.hstack((iris.data,E)) y = iris.target plt.figure( 1 ) plt.clf() X_indices = np.arange(X.shape[ - 1 ]) ###X.shape=(150,24) X.shape([-1])=24 selector = SelectPercentile(f_classif,percentile = 10 ) selector.fit(X,y) scores = - np.log10(selector.pvalues_) scores / = scores. max () plt.bar(X_indices - 0.45 ,scores,width = 0.2 ,label = r "Univariate score ($-Log(p_{value})$)" ,color = 'darkorange' ) # plt.show() ####Compare to weight of an svm clf = svm.SVC(kernel = 'linear' ) clf.fit(X,y) svm_weights = (clf.coef_ * * 2 ). sum (axis = 0 ) svm_weights / = svm_weights. max () plt.bar(X_indices - . 25 , svm_weights, width = . 2 , label = 'SVM weight' , color = 'navy' ) clf_selected = svm.SVC(kernel = 'linear' ) # clf_selected.fit(selector.transform((X,y))) clf_selected.fit(selector.transform(X),y) svm_weights_selected = (clf_selected.coef_ * * 2 ). sum (axis = 0 ) svm_weights_selected / = svm_weights_selected. max () plt.bar(X_indices[selector.get_support()] - . 05 ,svm_weights_selected,width = . 2 ,label = 'SVM weight after selection' ,color = 'c' ) plt.title( "Comparing feature selection" ) plt.xlabel( 'Feature number' ) plt.yticks(()) plt.axis( 'tight' ) plt.legend(loc = 'upper right' ) plt.show() |
实验结果:
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】凌霞软件回馈社区,博客园 & 1Panel & Halo 联合会员上线
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· .NET 9 new features-C#13新的锁类型和语义
· Linux系统下SQL Server数据库镜像配置全流程详解
· 现代计算机视觉入门之:什么是视频
· 你所不知道的 C/C++ 宏知识
· 聊一聊 操作系统蓝屏 c0000102 的故障分析
· DeepSeek V3 两周使用总结
· 回顾我的软件开发经历(1)
· C#使用yield关键字提升迭代性能与效率
· 低成本高可用方案!Linux系统下SQL Server数据库镜像配置全流程详解
· 4. 使用sql查询excel内容