Machine Learning: Logistic Regression

Key points:

"""
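Logistic regression: in its basic form, a binary classifier

Loss functions:
    1. Mean squared error (as used in linear regression): convex, with a single minimum and no other local minima
    2. Log-likelihood loss: convex for logistic regression itself, so it also has a single minimum; with non-convex models (e.g. neural networks) the same loss can have multiple local minima;
        Mitigations in the non-convex case: 1. Initialize randomly several times and compare the minima reached;
                  2. Adjust the learning rate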

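Drawback of logistic regression: it does not handle multi-class problems well without an extension such as one-vs-rest or softmax

Generative models: model a prior probability (naive Bayes, hidden Markov models)

Discriminative models: do not model a prior probability (logistic regression, KNN, decision trees, random forests, neural networks)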
"""
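To make the log-likelihood loss from the notes concrete, here is a minimal pure-Python sketch. It assumes labels encoded as 0/1 and a model whose output `sigmoid(z)` is read as the probability of class 1. The names `sigmoid` and `mean_log_loss` and the sample scores are illustrative choices, not part of the original post or of any library.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(y_true, probs, eps=1e-15):
    """Average negative log-likelihood for 0/1 labels.

    probs[i] is the predicted probability that sample i belongs to class 1.
    """
    total = 0.0
    for y, p in zip(y_true, probs):
        # Clip to avoid log(0) when a prediction is exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical scores and labels, for illustration only.
scores = [2.0, -1.5, 0.3]
labels = [1, 0, 1]
probs = [sigmoid(z) for z in scores]
print(mean_log_loss(labels, probs))
```

Confident, correct predictions drive the loss toward zero, while confident wrong ones are penalized heavily. This simple `sigmoid` can overflow for very large negative scores, which is acceptable for a sketch but not for production use.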

Code:

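import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def logistic():
    """
    Binary classification with logistic regression: predict breast cancer from cell attributes
    :return: None
    """
    # Column names for the dataset (the raw file has no header row)
    column = [
        'Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
        'Uniformity of Cell Shape', 'Marginal Adhesion',
        'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
        'Normal Nucleoli', 'Mitoses', 'Class',
    ]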

    # Load the data
    data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data", names=column)

    print(data)

    # Handle missing values: the file marks them with '?', so convert those to NaN and drop the rows
    data = data.replace(to_replace='?', value=np.nan)

    data = data.dropna()

    # Split into training and test sets (columns 1-9 are the features, column 10 is the class label)
    x_train, x_test, y_train, y_test = train_test_split(data[column[1:10]], data[column[10]], test_size=0.25)

    # Standardize the features: fit the scaler on the training set only, then reuse it on the test set
    std = StandardScaler()

    x_train = std.fit_transform(x_train)
    x_test = std.transform(x_test)

    # Fit logistic regression and predict
    lg = LogisticRegression(C=1.0)

    lg.fit(x_train, y_train)

    print(lg.coef_)

    y_predict = lg.predict(x_test)

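    print("Accuracy:", lg.score(x_test, y_test))

    # In this dataset, class 2 is benign and class 4 is malignant
    print("Classification report (precision, recall, F1):", classification_report(y_test, y_predict, labels=[2, 4], target_names=["benign", "malignant"]))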

    return None
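
The report printed at the end of `logistic()` lists precision and recall per class. As a rough sketch of what those two figures mean for the malignant class (label 4), the following pure-Python snippet counts true positives, false positives, and false negatives by hand. The helper name `precision_recall` and the sample labels are made up for illustration; they do not come from the dataset.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 2 = benign, 4 = malignant.
y_true = [4, 4, 4, 2, 2, 2, 2, 4]
y_pred = [4, 2, 4, 2, 2, 4, 2, 4]
precision, recall = precision_recall(y_true, y_pred, positive=4)
print("precision:", precision, "recall:", recall)
```

For a cancer screen, recall on the malignant class is usually the figure to watch, since a missed malignant case tends to cost more than a false alarm.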

Loss function:
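
For reference, the log-likelihood loss commonly used for logistic regression can be written as below. The notation is an assumption, since the post does not fix one: $m$ training samples, labels $y_i \in \{0, 1\}$, and $h_\theta(x) = 1 / (1 + e^{-\theta^{T} x})$ as the predicted probability of class 1.

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
    \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log\left(1 - h_\theta(x_i)\right) \right]
```

When $y_i = 1$ only the first term contributes, and when $y_i = 0$ only the second does, so each sample is penalized by the negative log of the probability assigned to its true class.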

 

posted @ 2019-05-21 11:17  小白啊小白,Fighting