day08 - Logistic Regression
- Logistic regression is mainly used for binary classification problems.
- Logistic regression builds on linear regression by passing the linear output through the sigmoid function, which maps it into the (0, 1) interval (see the sketch after this list).
- Its main advantage is that it predicts the probability of each of the two classes. For example, in click-through-rate prediction it gives the probability that an ad is clicked and the probability that it is not.
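A minimal sketch of that idea, separate from the script below: assuming a weight vector `w` and bias `b` have already been learned (the values here are purely illustrative), the linear score `w·x + b` is squashed by the sigmoid into a probability for the positive class.

```python
import numpy as np

def sigmoid(z):
    # Map any real-valued score to the (0, 1) interval
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only: a learned weight vector, bias, and one sample
w = np.array([0.9, 0.5, 0.7])
b = -0.3
x = np.array([1.2, 0.4, 2.1])

score = np.dot(w, x) + b        # the linear-regression part
p_positive = sigmoid(score)     # probability of the positive class
p_negative = 1.0 - p_positive   # probability of the negative class
print(p_positive, p_negative)
```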
```python
# coding=utf-8
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np


def ljhg():
    # Load the data (the file has no header row, so column names are supplied here)
    columns = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
               'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size',
               'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
    data = pd.read_csv("../data/breast-cancer-wisconsin.data", names=columns)
    # Handle missing values: the dataset marks them with "?"
    data = data.replace(to_replace="?", value=np.nan)
    data = data.dropna()
    # Split features (columns 1-9) and label (Class), holding out 25% for testing
    x_train, x_test, y_train, y_test = train_test_split(
        data[columns[1:10]], data[columns[10]], test_size=0.25)
    # Standardize the features (fit on the training set only)
    std = StandardScaler()
    x_train = std.fit_transform(x_train)
    x_test = std.transform(x_test)
    # Logistic regression
    lg = LogisticRegression()
    lg.fit(x_train, y_train)
    print("Coefficients:", lg.coef_)
    print("Accuracy (less meaningful in this scenario):", lg.score(x_test, y_test))
    print("Classification report (precision/recall):",
          classification_report(y_test, lg.predict(x_test), labels=[2, 4],
                                target_names=["benign", "malignant"]))
    return None


if __name__ == '__main__':
    ljhg()
```
Result:
```
Coefficients: [[0.93399495 0.49202418 0.68596862 0.85106892 0.2800929  1.21018499
  0.98731303 0.75633944 0.93642056]]
Accuracy (less meaningful in this scenario): 0.9590643274853801
Classification report (precision/recall):
              precision    recall  f1-score   support

      benign       0.97      0.97      0.97       115
   malignant       0.93      0.95      0.94        56

    accuracy                           0.96       171
   macro avg       0.95      0.96      0.95       171
weighted avg       0.96      0.96      0.96       171
```
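Since the bullet list above emphasized that logistic regression outputs class probabilities, here is a short follow-up sketch. It assumes the `lg` model and the standardized `x_test` from the script above are still in scope; the 0.3 threshold is only an illustrative value, not something from the original notes.

```python
import numpy as np

# Probability of each class; columns follow lg.classes_, i.e. [2, 4] = benign, malignant
proba = lg.predict_proba(x_test)
print("class order:", lg.classes_)
print("first five samples:\n", proba[:5])

# Lowering the decision threshold on the malignant probability trades precision for
# recall, which matters here because missing a malignant tumour is the costly error.
custom_pred = np.where(proba[:, 1] > 0.3, 4, 2)
```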