Data Analysis Models: Logistic Regression

Logistic Regression

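Logistic regression addresses classification problems (yes or no), unlike the linear, ridge, and Lasso regression covered earlier, which model continuous data. It is also called a generalized linear regression: at the formula level, it is obtained by applying the logit transformation to the multiple linear regression formula.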
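To make the logit transformation concrete, here is a minimal sketch. It assumes a linear predictor z = intercept + coef · x, exactly as in multiple linear regression, and shows that the sigmoid turns z into a probability while the logit (log-odds) turns that probability back into z. The intercept, coefficients, and observation are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map a linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log-odds of a probability p; the inverse of the sigmoid."""
    return np.log(p / (1.0 - p))

# Hypothetical intercept, coefficients and a single observation
intercept = -1.0
coef = np.array([0.8, -0.5])
x = np.array([2.0, 1.0])

z = intercept + coef @ x   # linear part, as in multiple linear regression
p = sigmoid(z)             # predicted probability of the positive class
label = int(p >= 0.5)      # classify with the usual 0.5 cutoff
print(z, p, label)
```

Because logit(p) recovers z, the model is linear in the log-odds even though it is not linear in the probability itself.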

Confusion Matrix
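As a small illustration of how a confusion matrix is read, the sketch below builds one from made-up labels and predictions and derives accuracy, sensitivity, and specificity from its four cells. The arrays are invented for the example and are not taken from the RunorWalk data.

```python
import numpy as np
from sklearn import metrics

# Made-up true labels and predictions, for illustration only
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

# Rows are actual classes, columns are predicted classes, in the order [0, 1]
cm = metrics.confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all samples classified correctly
sensitivity = tp / (tp + fn)      # true positive rate
specificity = tn / (tn + fp)      # true negative rate
print(cm)
print(accuracy, sensitivity, specificity)
```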

ROC Curve

KS Curve

LogisticRegression Function Parameters

LogisticRegression(tol=0.0001, fit_intercept=True,class_weight=None, max_iter=100)
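tol: the convergence tolerance for the model's iterations, default 0.0001
fit_intercept: bool; whether to fit the model's intercept term, default True
class_weight: the weights of the classes of the dependent variable. If a dict, the weight of each class is passed as {class_label: weight}; if the string 'balanced', each class is weighted inversely proportional to its share of the sample, which works better when the classes are severely imbalanced; if None, all classes are weighted equally
max_iter: the maximum number of iterations during model fitting, default 100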
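The three forms of class_weight described above can be compared with a short sketch. It assumes synthetic, imbalanced data generated with make_classification (roughly 90% class 0), and the manual weights {0: 1, 1: 9} are an arbitrary illustrative choice, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Synthetic imbalanced data: roughly 90% class 0 and 10% class 1
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)

# Equal weights (None), automatic weights ('balanced'), and a manual dict
plain = LogisticRegression(tol=0.0001, fit_intercept=True,
                           class_weight=None, max_iter=100).fit(X, y)
balanced = LogisticRegression(class_weight='balanced', max_iter=100).fit(X, y)
manual = LogisticRegression(class_weight={0: 1, 1: 9}, max_iter=100).fit(X, y)

# Recall on the minority class shows the effect of the weighting
recall_plain = recall_score(y, plain.predict(X))
recall_balanced = recall_score(y, balanced.predict(X))
recall_manual = recall_score(y, manual.predict(X))
print(recall_plain, recall_balanced, recall_manual)
```

Recall is measured on the training data here only to keep the sketch short; a real comparison would use a held-out test set.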

Logistic Regression Code

  # Import third-party modules
  import pandas as pd
  import numpy as np
  import matplotlib.pyplot as plt
  from sklearn import linear_model
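  # Read the data
  sports = pd.read_csv(r'RunorWalk.csv')
  # Build the model on the training set
  # (X_train, X_test, y_train and y_test are assumed to come from a train/test split of sports that is not shown here)
  sklearn_logistic = linear_model.LogisticRegression()
  sklearn_logistic.fit(X_train, y_train)
  # Print the fitted parameters: intercept and coefficients
  print(sklearn_logistic.intercept_, sklearn_logistic.coef_)
  # Import third-party modules
  from sklearn import metrics
  # Predict the classes of the test set
  sklearn_predict = sklearn_logistic.predict(X_test)
  # Confusion matrix
  cm = metrics.confusion_matrix(y_test, sklearn_predict, labels=[0, 1])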
  # y_score is the predicted probability of the positive class
  y_score = sklearn_logistic.predict_proba(X_test)[:, 1]
  # Compute fpr and tpr at each threshold, where fpr is 1-Specificity and tpr is Sensitivity
  fpr, tpr, threshold = metrics.roc_curve(y_test, y_score)
  # Draw the area under the curve
  plt.stackplot(fpr, tpr, colors=['steelblue'], alpha=0.5, edgecolor='black')
  # Add the outline of the ROC curve
  plt.plot(fpr, tpr, color='black', lw=1)
  # Add the diagonal
  plt.plot([0, 1], [0, 1], color='red', linestyle='--')
  # Show the figure
  plt.show()
  # Call the custom function plot_ks (defined elsewhere, not shown here) to draw the K-S curve
  plot_ks(y_test=y_test, y_score=y_score, positive_flag=1)
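The custom plot_ks function called above is not shown in this post. One possible sketch follows, assuming the K-S value is the largest gap between the cumulative share of positives and the cumulative share of negatives when samples are sorted by descending score. The helper ks_table and every implementation detail are hypothetical; only the call signature plot_ks(y_test, y_score, positive_flag) comes from the code above.

```python
import numpy as np

def ks_table(y_test, y_score, positive_flag=1):
    """Cumulative positive/negative rates by descending score, plus the KS value."""
    y_test = np.asarray(y_test)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score)                      # highest scores first
    is_pos = (y_test[order] == positive_flag)
    cum_pos = np.cumsum(is_pos) / is_pos.sum()        # cumulative share of positives
    cum_neg = np.cumsum(~is_pos) / (~is_pos).sum()    # cumulative share of negatives
    gap = cum_pos - cum_neg
    best = int(np.argmax(gap))                        # position of the largest gap
    return cum_pos, cum_neg, gap[best], best

def plot_ks(y_test, y_score, positive_flag=1):
    """Draw both cumulative curves and mark the largest gap (the KS value)."""
    import matplotlib.pyplot as plt  # imported here so ks_table works without matplotlib
    cum_pos, cum_neg, ks, best = ks_table(y_test, y_score, positive_flag)
    depth = np.arange(1, len(cum_pos) + 1) / len(cum_pos)  # share of samples covered
    plt.plot(depth, cum_pos, color='steelblue', label='cumulative positives')
    plt.plot(depth, cum_neg, color='red', label='cumulative negatives')
    plt.vlines(depth[best], cum_neg[best], cum_pos[best],
               colors='black', linestyles='dashed')
    plt.title('KS = %.2f' % ks)
    plt.legend()
    plt.show()
    return ks
```

With this sketch in scope, the call plot_ks(y_test = y_test, y_score = y_score, positive_flag = 1) from the code above would draw the curve and return the K-S value. Tied scores are ordered arbitrarily here, so the result can differ slightly from a threshold-based calculation.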

posted on 2020-10-25 20:50 勿要
