[Deep Learning Basics] Building and Training a Perceptron with NumPy

1. The Perceptron Model

The perceptron is a linear classifier, and it is only suitable for linearly separable data: \[ f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^\mathrm{T} \mathbf{x} + b) \]
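As a minimal sketch of this decision function: the weights, bias, and sample points below are made-up values for illustration, and the helper name `perceptron_output` is hypothetical. A score of exactly 0 is mapped to -1 here, which is one common convention.

```python
import numpy as np


def perceptron_output(x, w, b):
    """Return +1 if w.x + b > 0, otherwise -1."""
    return 1 if np.dot(w, x) + b > 0 else -1


w = np.array([0.5, 0.5])  # hypothetical normal vector of the hyperplane
b = -0.2                  # hypothetical bias

print(perceptron_output(np.array([1.0, 1.0]), w, b))   # score 0.8, positive side
print(perceptron_output(np.array([-1.0, 0.0]), w, b))  # score -0.7, negative side
```

The sign of the score only tells which side of the hyperplane a point lies on; its magnitude is not used for prediction.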

It searches the hypothesis space made up of all candidate hyperplanes for one that separates the data in the training set.

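The perceptron places no further requirement on this hyperplane: any hyperplane that separates the training data is acceptable. The hyperplane it finds is therefore not necessarily optimal. It may merely happen to fit the training data, in which case it generalizes poorly.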

2. Learning

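Because the number of misclassified points is not differentiable with respect to the parameters, it cannot be minimized directly by gradient methods. The perceptron's learning strategy is therefore defined as minimizing the total distance from the misclassified points to the hyperplane.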

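After initializing the hyperplane's parameters (the normal vector and the bias), the algorithm iteratively computes the loss on the samples, derives the gradients of the parameters, and updates the parameters with SGD. Learning stops once the loss has converged.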
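One way to make this strategy concrete is the standard textbook derivation sketched below. It assumes labels in {-1, +1}. The symbols \(M\) (the set of currently misclassified points) and \(\eta\) (the learning rate) are notation introduced here for illustration. For a misclassified point, the distance to the hyperplane is \(-y_i(\mathbf{w}^\mathrm{T}\mathbf{x}_i + b) / \lVert\mathbf{w}\rVert\). Dropping the factor \(1/\lVert\mathbf{w}\rVert\) gives the loss:

\[ L(\mathbf{w}, b) = -\sum_{\mathbf{x}_i \in M} y_i (\mathbf{w}^\mathrm{T} \mathbf{x}_i + b) \]

Its gradients with respect to the parameters are:

\[ \nabla_{\mathbf{w}} L = -\sum_{\mathbf{x}_i \in M} y_i \mathbf{x}_i, \qquad \frac{\partial L}{\partial b} = -\sum_{\mathbf{x}_i \in M} y_i \]

so a stochastic gradient step on a single misclassified point \((\mathbf{x}_i, y_i)\) is:

\[ \mathbf{w} \leftarrow \mathbf{w} + \eta\, y_i \mathbf{x}_i, \qquad b \leftarrow b + \eta\, y_i \]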

3. A NumPy-Based Perceptron Implementation

# coding: utf-8
import numpy as np


def prepare_data(n=100, input_size=2):
    # Generate random points and label them with a fixed OR-like linear rule.
    # Labels are -1/+1 so that they match the output of Perceptron.predict.
    def OR(x):
        w = np.array([0.5, 0.5])
        b = -0.2
        tmp = np.sum(w*x) + b
        if tmp <= 0:
            return -1
        else:
            return 1

    inputs = np.random.randn(n, input_size)
    labels = np.array([OR(inputs[i]) for i in range(n)])
    return inputs, labels


class Perceptron:
    def __init__(self, input_size, lr=0.001):
        # Initialize the weights and the bias (the bias is a plain scalar)
        self.w = np.random.randn(input_size)
        self.b = np.random.randn()
        self.lr = lr

    def predict(self, x):
        tmp = np.sum(self.w*x) + self.b
        if tmp <= 0:
            return -1
        else:
            return 1

    def update(self, x, y):
        # SGD parameter update (obtained by differentiating the loss that
        # minimizes the distance from misclassified points to the hyperplane)
        self.w = self.w + self.lr*y*x
        self.b = self.b + self.lr*y


n = 1000     # total number of samples
ratio = 0.8  # fraction of the samples used for training
input_size = 2

print("Preparing Data {}".format(n))
X, Y = prepare_data(n, input_size)
clip_num = int(n * ratio)
train_X, train_Y = X[:clip_num], Y[:clip_num]
test_X, test_Y = X[clip_num:], Y[clip_num:]

# Init model
lr = 0.005
model = Perceptron(input_size, lr)
s = model.predict(X[0])
print("Input: ({}, {}), Output: {}".format(X[0][0], X[0][1], s))

# Training
epochs = 100
for i in range(epochs):
    loss = 0
    wrong_index = []
    print("\nEpoch {}".format(i+1))
    print("Forward Computing")
    for idx in range(clip_num):
        pred_y = model.predict(train_X[idx])
        if pred_y != train_Y[idx]:
            wrong_index.append(idx)
            # |w.x + b| is the unnormalized distance to the hyperplane
            tmp_loss = abs(float(np.sum(model.w*train_X[idx]) + model.b))
            loss += tmp_loss

    print("Wrong predict samples: {}, Loss: {}".format(len(wrong_index), loss))
    if not wrong_index:
        # Every training sample is classified correctly: stop learning
        break
    print("Learning")
    for j in wrong_index:
        model.update(train_X[j], train_Y[j])


# Testing
wrong_num = 0
test_loss = 0
for j in range(test_X.shape[0]):
    pred_y = model.predict(test_X[j])
    if pred_y != test_Y[j]:
        tmp_loss = abs(float(np.sum(model.w*test_X[j]) + model.b))
        test_loss += tmp_loss
        wrong_num += 1
print("\nTest wrong predict samples: {}, Loss: {}".format(wrong_num, test_loss))

4. Extensions of the Perceptron

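The perceptron is a linear model. It cannot learn nonlinear functions, so it is helpless on data that is not linearly separable. For example, a perceptron can fit the data produced by an AND gate, an OR gate, or a NOT gate, but it cannot handle the data produced by an XOR gate.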
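This limitation can be checked with a small experiment. The sketch below is not the author's code: `train_perceptron` is a hypothetical helper that applies the classic per-sample perceptron rule from zero-initialized parameters, with labels encoded as -1/+1 and an illustrative learning rate of 1.0.

```python
import numpy as np


def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Per-sample perceptron rule; returns (w, b, errors in the last epoch)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    errors = 0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            # A point is treated as misclassified if it is on the wrong side
            # of the hyperplane or exactly on it.
            if yi * (np.dot(w, xi) + b) <= 0:
                w = w + lr * yi * xi
                b = b + lr * yi
                errors += 1
        if errors == 0:  # a full pass without mistakes: converged
            break
    return w, b, errors


# The four input combinations of a two-input logic gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([-1, -1, -1, 1])
y_or = np.array([-1, 1, 1, 1])
y_xor = np.array([-1, 1, 1, -1])

for name, y in [("AND", y_and), ("OR", y_or), ("XOR", y_xor)]:
    w, b, errors = train_perceptron(X, y)
    print("{}: errors in last epoch = {}".format(name, errors))
```

AND and OR reach an epoch with zero errors, while XOR never does: a mistake-free epoch would mean some hyperplane separates the XOR labels, and no such hyperplane exists.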

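The perceptron can be extended to logistic regression (LR) and support vector machines (SVM). It is also worth noting that, although a single perceptron has limited expressive power, stacking multiple perceptrons gives a far more expressive model. This is the idea behind the universal approximation theorem for the Multi-Layer Perceptron (MLP): a sufficiently wide two-layer MLP with a nonlinear activation can approximate any continuous function on a closed, bounded domain to arbitrary accuracy.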

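The book 《深度学习入门：基于Python的理论与实现》 (Introduction to Deep Learning: Theory and Implementation with Python) gives an intuitive example. Suppose three perceptrons fit an AND gate, a NOT gate, and an OR gate respectively. Combining these three gates, using basic digital-circuit knowledge, yields an XOR gate.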
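A minimal sketch of this construction follows. The weights and biases are hand-picked illustrative values rather than learned ones, the `gate` helper is hypothetical, and outputs use the 0/1 convention of logic gates. The wiring assumed here is XOR(x1, x2) = AND(OR(x1, x2), NOT(AND(x1, x2))).

```python
import numpy as np


def gate(w, b):
    """Build a perceptron with fixed weights w and bias b that outputs 0 or 1."""
    w = np.array(w, dtype=float)

    def fire(*inputs):
        score = np.dot(w, np.array(inputs, dtype=float)) + b
        return int(score > 0)

    return fire


# Hand-picked (not learned) parameters for the three basic gates
AND = gate([0.5, 0.5], -0.7)
OR = gate([0.5, 0.5], -0.2)
NOT = gate([-1.0], 0.5)


def XOR(x1, x2):
    # Two layers of perceptrons: the first computes OR and NOT(AND),
    # the second combines them with AND.
    return AND(OR(x1, x2), NOT(AND(x1, x2)))


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print("XOR({}, {}) = {}".format(x1, x2, XOR(x1, x2)))
```

No single gate here is nonlinear in its inputs beyond the threshold. The XOR behavior comes purely from feeding the outputs of one layer of perceptrons into another.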

posted @ 2022-04-18 17:24 LeonYi