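Study notes on Zheng Jie, 《机器学习算法原理与编程实践》 (Machine Learning Algorithms: Principles and Programming Practice), Chapter 7: Prediction Techniques and Philosophy, 7.3 Ridge Regression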

7.3 Ridge Regression

7.3.1 Verifying Multicollinearity
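These notes do not reproduce the book's own procedure for this step. As a rough sketch of one common check, the snippet below computes the condition number of the centered X^T X and the feature correlation matrix; a very large condition number or a correlation close to ±1 suggests multicollinearity. The function name `collinearity_report` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def collinearity_report(X):
    """Return the condition number of X^T X and the feature correlation matrix."""
    X = np.asarray(X, dtype=float)
    # Center the columns so a constant offset does not inflate the condition number.
    Xc = X - X.mean(axis=0)
    cond = np.linalg.cond(Xc.T @ Xc)
    corr = np.corrcoef(Xc, rowvar=False)
    return cond, corr

# Synthetic data (an assumption, not the abalone file).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X_indep = np.column_stack([a, b])
# The third column is almost an exact copy of the first: strong collinearity.
X_coll = np.column_stack([a, b, a + 1e-4 * rng.normal(size=200)])

cond_indep, _ = collinearity_report(X_indep)
cond_coll, corr_coll = collinearity_report(X_coll)
print(cond_indep, cond_coll)
```

The nearly duplicated column makes X^T X close to singular, which is the situation ridge regression is meant to handle.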

7.3.2 Ridge Regression Theory
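For reference, the estimator that the code in section 7.3.6 evaluates can be written as below. The notation is chosen here to match the variable names xMat, yMat, and k in that code (X is the standardized feature matrix, y the centered target vector, I the identity matrix); it is not copied from the book.

```latex
\hat{w}(k) = \left(X^{\mathsf{T}} X + kI\right)^{-1} X^{\mathsf{T}} y, \qquad k \ge 0
```

With k = 0 this reduces to ordinary least squares. For any k > 0 the matrix X^T X + kI is invertible even when X^T X itself is singular.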

7.3.3 Ridge Trace Analysis

7.3.4 Choosing the Value of k

7.3.5 Helper Functions

(1) Importing a multidimensional dataset: loading the data

def loadDataSet(filename):
    dataMat    = []
    labelMat   = []
    with open(filename) as fr:
        # number of feature columns = fields in the first line minus the target column
        numFeat = len(fr.readline().split('\t'))-1
        fr.seek(0)  # rewind so the first line is parsed as data too
        for line in fr:
            curLine = line.strip().split('\t')
            lineArr = [float(curLine[i]) for i in range(numFeat)]  # feature values
            dataMat.append(lineArr)
            labelMat.append(float(curLine[-1]))  # target value is the last column
    return dataMat,labelMat
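For a purely numeric tab-separated file, the same split into features and target could also be sketched with `numpy.loadtxt`. The helper name `load_tab_dataset` and the tiny demo file are hypothetical and only illustrate the idea; they are not part of the book's code.

```python
import os
import tempfile
import numpy as np

def load_tab_dataset(path):
    """Hypothetical helper: read a tab-separated numeric file into (X, y)."""
    data = np.loadtxt(path, delimiter="\t", ndmin=2)
    # Every column but the last is a feature; the last column is the target.
    return data[:, :-1], data[:, -1]

# Tiny demo file: two feature columns followed by one target column.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1.0\t2.0\t3.0\n4.0\t5.0\t6.0\n")
    demo_path = tmp.name

X_demo, y_demo = load_tab_dataset(demo_path)
os.remove(demo_path)
print(X_demo.shape, y_demo.shape)
```

Unlike loadDataSet, this returns NumPy arrays rather than nested lists, so the result can be passed directly to array operations.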

(2) Standardizing the data matrix

 

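#Standardize the dataset
from numpy import mat, mean, var

def normData(xArr,yArr):
    xMat = mat(xArr)
    yMat = mat(yArr).T
    yMean = mean(yMat,0)
    xMean = mean(xMat,0)
    ynorm = yMat - yMean        # center the targets
    xVar  = var(xMat,0)
    xnorm = (xMat-xMean)/xVar   # center each column, then divide by its variance (not the standard deviation)
    return xnorm,ynorm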
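Note that normData divides by the column variance. The more usual z-score standardization divides by the standard deviation instead, which gives every feature unit variance. A minimal sketch of that variant follows; the name `standardize` and the zero-deviation guard are assumptions added here, not part of the book's code.

```python
import numpy as np

def standardize(X, y):
    """Hypothetical variant of normData: z-score the features, center the target."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    x_mean = X.mean(axis=0)
    x_std = X.std(axis=0)
    # Guard against constant columns, whose standard deviation is zero.
    x_std = np.where(x_std == 0, 1.0, x_std)
    return (X - x_mean) / x_std, y - y.mean(axis=0)

X_raw = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
y_raw = [1.0, 2.0, 6.0]
X_std, y_ctr = standardize(X_raw, y_raw)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

The two scalings lead to different weight magnitudes for the same k, so ridge traces produced with one are not directly comparable to those produced with the other.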

(3) Plotting

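import matplotlib.pyplot as plt
from numpy import shape

def scatterplot(wMat,k):  # plot each column of wMat against k
    fig = plt.figure()
    ax  = fig.add_subplot(111)
    wMatT = wMat.T
    m,n   = shape(wMatT)
    for i in range(m):
        ax.plot(k,wMatT[i,:])
        # label each curve at its starting value (k = 0)
        ax.annotate("feature["+str(i)+"]",xy = (0,wMatT[i,0]),color = 'black')
    plt.show()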

7.3.6 Ridge Regression Implementation and Choosing k

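import sys
from numpy import zeros, eye, shape, linalg

# The first 8 columns are xArr; the last column is yArr
xArr,yArr = loadDataSet('abalone.txt')
xMat,yMat = normData(xArr,yArr) # standardize the dataset

Knum      = 30 # number of k values to try
wMat      = zeros((Knum,shape(xMat)[1]))
klist     = zeros((Knum,1))
for i in range(Knum):
    k = float(i)/500  # candidate k; the goal of the loop is to find a suitable k
    klist[i] = k      # record the k values
    xTx      = xMat.T*xMat
    denom    = xTx + eye(shape(xMat)[1])*k
    if linalg.det(denom) == 0.0:
        print("This matrix is singular, cannot do inverse")
        sys.exit(1)
    ws = linalg.inv(denom) * (xMat.T*yMat)
    wMat[i,:] = ws.T
print(klist)
scatterplot(klist,klist)  # sanity plot: k against itself
scatterplot(wMat,klist)   # ridge trace: one curve per feature weight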
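The same loop can be sketched with plain NumPy arrays, using `numpy.linalg.solve` rather than an explicit inverse. This is an illustrative variant under stated assumptions, not the book's implementation: the function names `ridge_weights` and `ridge_trace` are made up here, and the data is synthetic rather than abalone.txt.

```python
import numpy as np

def ridge_weights(X, y, k):
    """Solve (X^T X + k I) w = X^T y without forming an explicit inverse."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(n_features), X.T @ y)

def ridge_trace(X, y, ks):
    """Return one row of weights per value of k."""
    return np.array([ridge_weights(X, y, k) for k in ks])

# Synthetic, already-centered data (an assumption; not the abalone file).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X -= X.mean(axis=0)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)
y -= y.mean()

ks = np.arange(30) / 500.0      # same grid as above: 0, 0.002, ..., 0.058
trace = ridge_trace(X, y, ks)   # shape (30, 3): one weight vector per k
norms = np.linalg.norm(trace, axis=1)
print(trace[0], trace[-1])
```

The first row (k = 0) is the ordinary least-squares solution, and the norm of the weight vector does not increase as k grows. Plotting each column of `trace` against `ks` gives the ridge trace.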

 

 

 

 

Reference: Zheng Jie, 《机器学习算法原理与编程实践》. For study and research purposes only.

Posted on 2017-02-20 10:41 by 金秀
