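PySpark GBTRegressor: feature importances and ranking
GBTRegressor model evaluation metrics and feature-importance analysis
Official documentation: https://spark.apache.org/docs/2.2.0/api/python/_modules/pyspark/ml/regression.html
As with random forests, once the model is trained, the code below prints each feature together with its importance, sorted from most to least important: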
    # Print the feature-importance vector (size, indices, values)
    features_important = model.featureImportances
    print(features_important)

    # Get each feature's importance and print them sorted by weight, descending.
    # toArray() densifies the vector, so position i holds the importance of
    # feature i (zero-importance features are absent from .indices but present here).
    vs = features_important.toArray()
    print(len(vs))  # number of features the model was trained on

    # Assumes the "features" column of `train` was built by VectorAssembler, which
    # stores per-slot names in the column metadata, grouped by attribute type
    # ("numeric", "binary", "nominal"). Read names from every group, not just "numeric",
    # and look up each importance by its idx rather than by list position.
    attrs = train.schema["features"].metadata["ml_attr"]["attrs"]
    kv = {}
    for group in attrs.values():
        for it in group:
            kv[it["name"]] = float(vs[it["idx"]])
    print(len(kv))
    print(sorted(kv.items(), key=lambda el: el[1], reverse=True))
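To see why feature names must be matched to importances through each attribute's `idx` rather than by list position, here is a minimal pure-Python sketch that runs without Spark. The helpers `to_dense` and `rank_features`, the feature names, and the importance values are all hypothetical illustration data; `attrs` only mimics the shape of the `ml_attr` metadata that VectorAssembler writes, and the importances are given as a sparse `(size, indices, values)` triple like a PySpark `SparseVector`.

```python
# Hypothetical, Spark-free sketch: map sparse feature importances to names.
# `attrs` imitates df.schema["features"].metadata["ml_attr"]["attrs"];
# all names and numbers below are made-up illustration data.


def to_dense(size, indices, values):
    """Expand a sparse (size, indices, values) triple into a dense list."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


def rank_features(attrs, size, indices, values):
    """Return (name, importance) pairs sorted by importance, descending."""
    dense = to_dense(size, indices, values)
    pairs = []
    # Metadata is grouped by attribute type, e.g. "numeric" and "binary".
    for group in attrs.values():
        for attr in group:
            # Look up by the slot index stored in the metadata, not by position.
            pairs.append((attr["name"], dense[attr["idx"]]))
    return sorted(pairs, key=lambda kv: kv[1], reverse=True)


# Hypothetical metadata for a 4-slot feature vector.
attrs = {
    "numeric": [{"idx": 0, "name": "age"}, {"idx": 2, "name": "income"}],
    "binary": [{"idx": 1, "name": "is_member"}, {"idx": 3, "name": "has_car"}],
}
# Feature 1 is never used in a split, so it is absent from the sparse indices.
size, indices, values = 4, [0, 2, 3], [0.5, 0.3, 0.2]

ranked = rank_features(attrs, size, indices, values)
print(ranked)
# [('age', 0.5), ('income', 0.3), ('has_car', 0.2), ('is_member', 0.0)]

# A positional pairing such as zip(indices, dense) misaligns as soon as any
# feature has zero importance: index 2 gets feature 1's value, and so on.
print(list(zip(indices, to_dense(size, indices, values))))
# [(0, 0.5), (2, 0.0), (3, 0.3)]
```

Going through the dense array and each attribute's `idx` keeps the pairing correct regardless of how sparse the importance vector is or in which order the metadata groups are listed.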
Reference links