"Machine Learning" (Zhou Zhihua), solution to Exercise 5.5
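The exercise asks you to write a BP neural network and fit it to the watermelon dataset. I have already converted the dataset to numeric values, as follows: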
ID,color,root,knock_sound,texture,navel,touch,density,sugar_content,good_melon
1,1,1,3,1,1,1,0.697,0.46,1
2,2,1,2,1,1,1,0.774,0.376,1
3,2,1,3,1,1,1,0.634,0.264,1
4,1,1,2,1,1,1,0.608,0.318,1
5,3,1,3,1,1,1,0.556,0.215,1
6,1,2,3,1,2,2,0.403,0.237,1
7,2,2,3,2,2,2,0.481,0.149,1
8,2,2,3,1,2,1,0.437,0.211,1
9,2,2,2,2,2,1,0.666,0.091,0
10,1,3,1,1,3,2,0.243,0.267,0
11,3,3,1,3,3,1,0.245,0.057,0
12,3,1,3,3,3,2,0.343,0.099,0
13,1,2,3,2,1,1,0.639,0.161,0
14,3,2,2,2,1,1,0.657,0.198,0
15,2,2,3,1,2,2,0.36,0.37,0
16,3,1,3,3,3,1,0.593,0.042,0
17,1,1,2,2,2,1,0.719,0.103,0
Then I call the PyBrain library to build a neural network with a single hidden layer of 50 units, as follows:
#!/usr/bin/python
# -*- coding:utf-8 -*-
from __future__ import print_function
import numpy as np
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.datasets import SupervisedDataSet

# Read the CSV; the raw string keeps the backslashes in the Windows path literal
with open(r'c:\quant\watermelon.csv', 'r') as file1:
    data = [line.strip().split(',') for line in file1 if line.strip()]
data = np.array(data)
# Skip the header row and the ID column; the last column is the label
X = data[1:, 1:-1].astype(float)
y = np.array([1 if row[-1] == '1' else 0 for row in data[1:]])
print(X, y)
####################################################################### The watermelon data is loaded above
# 8 inputs, 50 hidden units, 1 output
fnn = buildNetwork(8, 50, 1)
DS = SupervisedDataSet(8, 1)
for a, b in zip(X, y):
    DS.addSample(a, b)
# The trainer uses the BP algorithm.
# verbose=True prints the total error during training.
trainer = BackpropTrainer(fnn, DS, verbose=True, learningrate=0.01)
# trainUntilConvergence holds out part of the data for validation
# (validationProportion, 0.25 by default, so roughly 3:1 training to validation);
# pass that argument to change the split.
# maxEpochs is the upper limit on training epochs. I usually set it to 1000; this run uses 10000.
trainer.trainUntilConvergence(maxEpochs=10000)
# activate returns the trained network's output for one input sample
for a, b in zip(X, y):
    prediction = fnn.activate(a)
    print(prediction, b)
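The script above leaves the BP update itself to PyBrain. Since the exercise is really about BP, here is a minimal NumPy sketch of what one standard (per-sample) BP step could look like for a single-hidden-layer network. It assumes sigmoid hidden units, one linear output unit (as far as I know these are the `buildNetwork` defaults) and a squared-error loss. The names `init_params`, `forward`, `bp_step` and `train` are made up for this sketch and are not part of PyBrain.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, rng):
    """Small random weights, zero biases (an arbitrary choice for this sketch)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.1, size=n_hidden),
        "b2": 0.0,
    }


def forward(params, x):
    """Return the hidden activations and the scalar network output."""
    h = sigmoid(np.dot(params["W1"], x) + params["b1"])
    out = np.dot(params["w2"], h) + params["b2"]  # linear output, not limited to [0, 1]
    return h, out


def bp_step(params, x, target, lr=0.01):
    """One per-sample update for the loss 0.5 * (out - target) ** 2."""
    h, out = forward(params, x)
    err = out - target  # derivative of the loss w.r.t. the output
    # Propagate the error back through the sigmoid, using the old output weights
    grad_hidden = err * params["w2"] * h * (1.0 - h)
    params["w2"] -= lr * err * h
    params["b2"] -= lr * err
    params["W1"] -= lr * np.outer(grad_hidden, x)
    params["b1"] -= lr * grad_hidden
    return 0.5 * err ** 2


def train(params, X, y, epochs=100, lr=0.01):
    """Sweep the data `epochs` times; return the mean loss of each sweep."""
    history = []
    for _ in range(epochs):
        losses = [bp_step(params, x, t, lr) for x, t in zip(X, y)]
        history.append(float(np.mean(losses)))
    return history
```

With the `X` and `y` arrays from the script, something like `params = init_params(8, 50, np.random.RandomState(0))` followed by `train(params, X, y, epochs=1000)` would train on all 17 samples. Unlike `trainUntilConvergence`, this sketch holds out no validation data, so its results should not be expected to match the PyBrain run.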
Below is a comparison of the results with maxEpochs set to 10000 and to 1000:
10000 epochs (left: network output, right: target output)
[ 0.99417443] 1
[ 0.99774329] 1
[ 1.00390992] 1
[ 0.99456691] 1
[ 0.99167349] 1
[ 0.99627566] 1
[-0.16419402] 1
[ 0.99678622] 1
[-0.00259512] 0
[ 1.46741515] 0
[ 0.57305884] 0
[-0.00284737] 0
[-0.0029103] 0
[-0.00400758] 0
[ 1.19899233] 0
[-0.00333452] 0
[-0.26766382] 0
1000 epochs (left: network output, right: target output)
[ 0.3439171] 1
[ 0.71063964] 1
[ 0.86324691] 1
[ 0.39205173] 1
[ 0.97416348] 1
[ 0.55886924] 1
[ 0.1247508] 1
[ 1.6945434] 1
[ 0.38352444] 0
[ 0.7585709] 0
[-0.23212559] 0
[ 0.03274158] 0
[-0.33641601] 0
[-0.65105817] 0
[ 1.22768539] 0
[ 0.11638493] 0
[-0.13244805] 0
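After 10000 epochs most outputs land almost exactly on their targets (12 of the 17 are within 0.01), which the 1000-epoch run does not come close to. Samples 7, 10, 11 and 15 are still far off, though, and the network may well be overfitting. Because trainUntilConvergence holds out part of the data for validation, those could be samples the network never trained on.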
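To put numbers on the comparison, here is a small sketch that summarizes the two output lists printed above. The 0.5 decision threshold, the 0.01 tolerance and the helper name `summarize` are my own choices for illustration; nothing in PyBrain prescribes them.

```python
import numpy as np

# Targets: samples 1-8 are good melons, samples 9-17 are not
targets = np.array([1] * 8 + [0] * 9)

# Network outputs copied from the two runs listed above
out_10000 = np.array([
    0.99417443, 0.99774329, 1.00390992, 0.99456691, 0.99167349, 0.99627566,
    -0.16419402, 0.99678622, -0.00259512, 1.46741515, 0.57305884, -0.00284737,
    -0.0029103, -0.00400758, 1.19899233, -0.00333452, -0.26766382,
])
out_1000 = np.array([
    0.3439171, 0.71063964, 0.86324691, 0.39205173, 0.97416348, 0.55886924,
    0.1247508, 1.6945434, 0.38352444, 0.7585709, -0.23212559, 0.03274158,
    -0.33641601, -0.65105817, 1.22768539, 0.11638493, -0.13244805,
])


def summarize(outputs, targets, threshold=0.5, tol=0.01):
    """Mean squared error, correct classifications, and near-exact outputs."""
    outputs = np.asarray(outputs, dtype=float)
    predicted = (outputs >= threshold).astype(int)  # threshold is an assumption
    return {
        "mse": float(np.mean((outputs - targets) ** 2)),
        "correct": int(np.sum(predicted == targets)),
        "near_target": int(np.sum(np.abs(outputs - targets) < tol)),
    }


for name, outputs in [("10000 epochs", out_10000), ("1000 epochs", out_1000)]:
    print(name, summarize(outputs, targets))
```

With a 0.5 threshold this counts 13 of 17 samples as correctly classified after 10000 epochs and 12 of 17 after 1000. The mean squared error over all 17 samples comes out at roughly 0.31 for both runs, because the few large misses in the 10000-epoch run offset its many near-exact outputs.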
Reference article: http://www.zengmingxia.com/use-pybrain-to-fit-neural-networks/