Logistic Regression

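One day, someone asked me what logistic regression is. I have used it often in my years of data analysis, but only ever by way of `import some *`, without thinking it through. Unfortunately, the mathematical derivations of logistic regression that I found online were all wrong, including those in several classic machine learning textbooks. I spent a few days working through the derivation myself and found that the mathematics behind it is fairly involved, drawing on matrix dot products and matrix differentiation.
Logistic regression is simply linear regression on the log-odds ln(p/(1-p)).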
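
For reference, here is a sketch of the standard maximum-likelihood derivation. It may differ from the derivation the author has in mind. The notation is assumed rather than taken from the post: X is the data matrix with a leading column of ones, x_i is its i-th row, y holds the 0/1 labels, w is the weight vector, σ is the sigmoid, and α is the step size. The last line is the same update that `gradAscent` in the code below applies on every iteration.

```latex
\begin{aligned}
\ln\frac{p_i}{1-p_i} &= x_i^{\top} w
\;\Longrightarrow\;
p_i = \sigma(x_i^{\top} w) = \frac{1}{1+e^{-x_i^{\top} w}} \\
\ell(w) &= \sum_{i=1}^{m} \Bigl[\, y_i \ln p_i + (1-y_i)\ln(1-p_i) \Bigr] \\
\frac{\partial \ell}{\partial w}
 &= \sum_{i=1}^{m} (y_i - p_i)\, x_i
  = X^{\top}\bigl(y - \sigma(Xw)\bigr)
 \qquad \text{using } \sigma'(z) = \sigma(z)\bigl(1-\sigma(z)\bigr) \\
w &\leftarrow w + \alpha\, X^{\top}\bigl(y - \sigma(Xw)\bigr)
\end{aligned}
```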


from numpy import *

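def loadDataSet():
    # Each row of testSet.txt holds two features and an integer label, whitespace-separated.
    dataMat = []; labelMat = []
    with open('testSet.txt') as fr:         # the file is closed automatically
        for line in fr:
            lineArr = line.strip().split()
            # the leading 1.0 is the constant term that multiplies weights[0]
            dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
            labelMat.append(int(lineArr[2]))
    return dataMat,labelMat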

def sigmoid(inX):
    return 1.0/(1+exp(-inX))

def gradAscent(dataMatIn, classLabels):
    dataMatrix = mat(dataMatIn)             #convert to NumPy matrix
    labelMat = mat(classLabels).transpose() #convert to NumPy matrix
    m,n = shape(dataMatrix)
    alpha = 0.001
    maxCycles = 5000
    weights = ones((n,1))
    for k in range(maxCycles):              #heavy on matrix operations
        h = sigmoid(dataMatrix*weights)     #matrix mult
        error = (labelMat - h)              #vector subtraction
        weights = weights + alpha * dataMatrix.transpose()* error #matrix mult
    return weights

def plotBestFit(weights):
    import matplotlib.pyplot as plt
    dataMat,labelMat=loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0] 
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i])== 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    y = (-weights[0]-weights[1]*x)/weights[2]
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2');
    plt.show()

#import logRegres    

dataArr,labelMat=loadDataSet() 


weights=gradAscent(dataArr,labelMat)
plotBestFit(weights.getA())
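
One way to gain confidence in the update used by `gradAscent` is to compare `X.T @ (y - sigmoid(X @ w))` with a finite-difference gradient of the log-likelihood. The sketch below does this on small synthetic data, since `testSet.txt` is not reproduced here. The helper names, the random data, and the test point `w` are all made up for illustration, and it uses plain NumPy arrays rather than `mat`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    # Bernoulli log-likelihood of the labels y under p = sigmoid(X w)
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(w, X, y):
    # The same expression gradAscent multiplies by alpha: X^T (y - h)
    return X.T @ (y - sigmoid(X @ w))

def numeric_gradient(w, X, y, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (log_likelihood(w + step, X, y)
                   - log_likelihood(w - step, X, y)) / (2 * eps)
    return grad

# Synthetic data (an assumption for illustration, not the post's dataset)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = (rng.random(50) < 0.5).astype(float)
w = np.array([0.1, -0.2, 0.3])

max_abs_diff = np.max(np.abs(analytic_gradient(w, X, y)
                             - numeric_gradient(w, X, y)))
print("largest difference between the two gradients:", max_abs_diff)
```

If the two gradients agree to within numerical error, the matrix expression in the loop is indeed the gradient of the log-likelihood, which is why the update adds it (ascent) rather than subtracts it.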

The weights returned by gradAscent:

matrix([[ 9.35184677],
        [ 0.87401362],
        [-1.28891422]])
      

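With the weights rounded, xw = 9.35 + 0.87x - 1.29y.
Setting 9.35 + 0.87x - 1.29y = 0 gives the decision boundary. Why? Logistic regression classifies with a probability of 0.5 as the threshold. Since ln(p/(1-p)) = xw, setting p = 0.5 gives ln(1) = 0, that is, xw = 0.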
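
The relationship between the boundary and the 0.5 threshold can be checked numerically. The sketch below plugs in the weights printed above. The helper names `boundary_y` and `predict_proba` and the sample point `x0 = 1.0` are made up for illustration.

```python
import numpy as np

# Weights as printed in the post: constant term, coefficient of x, coefficient of y
w = np.array([9.35184677, 0.87401362, -1.28891422])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boundary_y(x):
    # Solve w0 + w1*x + w2*y = 0 for y (same formula as in plotBestFit)
    return (-w[0] - w[1] * x) / w[2]

def predict_proba(x, y):
    # p = sigmoid(xw) for a single point (x, y)
    return sigmoid(w[0] + w[1] * x + w[2] * y)

x0 = 1.0                                  # arbitrary sample x-coordinate
y0 = boundary_y(x0)
p_on = predict_proba(x0, y0)              # point on the boundary
p_below = predict_proba(x0, y0 - 1.0)     # one unit below the boundary
p_above = predict_proba(x0, y0 + 1.0)     # one unit above the boundary

print("on boundary:", p_on)
print("below:", p_below, "above:", p_above)
```

Because the coefficient of y is negative, points below the line get p > 0.5 (class 1) and points above it get p < 0.5 (class 0).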


posted @ 2022-08-19 22:58  luoganttcc