From Two-Class to Multiclass Classification
Please credit the source when reposting: http://www.cnblogs.com/OldPanda
My previous notes on the perceptron algorithm ended with a mention of extending the algorithm to multiclass classification, along with a link to the relevant slides. Over the past couple of days I skimmed the perceptron chapter of Pattern Recognition, and found it somewhat more obscure in places than Statistical Learning Methods, but its coverage is more complete. In particular, it treats the extension from two-class to multiclass classification with a method called the Kesler construction, and gives a worked example. The slides linked earlier also cover this method in detail starting from slide 35, so I will not repeat the algorithm step by step here.
Here is my solution to the example from the book.
Example: consider a three-class problem in two-dimensional space. The training vectors for each class are as follows (these are the points used in the code below): class 1: (1, 1), (2, 2), (2, 1); class 2: (1, -1), (1, -2), (2, -2); class 3: (-1, 1), (-1, 2), (-2, 1).
Since the vectors of the different classes lie in different quadrants, the problem is clearly linearly separable.
The general idea is this: first extend each vector into three-dimensional space (by appending a constant bias component), then use the Kesler construction to expand these 9 vectors into 18 vectors, each of size 9×1. Correspondingly, each class gets its own 3×1 weight vector.
For coding convenience, I simply merge the three weight vectors into a single 1×9 vector w. We can then run the perceptron algorithm by requiring all 18 feature vectors to satisfy wx > 0; in other words, all of the vectors must end up on the same side of the decision hyperplane. The initial weight vector is usually generated randomly; for simplicity I set it to all zeros here.
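To make the construction concrete, here is a small illustrative sketch (not from the original post) of the two 9-dimensional vectors produced for a single training point, x = (1, 1) of class 1. The point goes (extended with a bias component) into the block of its own class, its negation into the block of one competing class, and zeros elsewhere; wz > 0 for these two vectors is equivalent to w1·x > w2·x and w1·x > w3·x.

```python
import numpy as np

# x = (1, 1), class 1, extended with a bias component of 1
x = np.array([1, 1, 1])
zero = np.zeros(3)

# Kesler construction: one 9-dim vector per competing class
z12 = np.concatenate([x, -x, zero])  # encodes w1.x > w2.x
z13 = np.concatenate([x, zero, -x])  # encodes w1.x > w3.x

print(z12)  # [ 1.  1.  1. -1. -1. -1.  0.  0.  0.]
print(z13)  # [ 1.  1.  1.  0.  0.  0. -1. -1. -1.]
```

Doing this for all 9 training points yields the 18 constructed vectors that the perceptron is trained on.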
As before, here is my solution code:
import numpy as np

# A simple example; the training set and parameter sizes are fixed.
# Each entry is (feature vector, class label).
training_set = [((1, 1), 1), ((2, 2), 1), ((2, 1), 1),
                ((1, -1), 2), ((1, -2), 2), ((2, -2), 2),
                ((-1, 1), 3), ((-1, 2), 3), ((-2, 1), 3)]
num_of_class = 3
new_features = np.array([])  # features after the Kesler construction
w = np.zeros((1, 9))         # initial weights; a random generator also works
learning_rate = 0.5

# extend a feature into a higher dimension, e.g. from 2D to 3D,
# by appending the constant 1 for the bias term
def extend(feature):
    return np.append(feature, 1)

# build the Kesler construction for one training example
def k_construct(item):
    global new_features
    extended_feature = extend(item[0])
    minus_feature = np.negative(extended_feature)
    block = len(extended_feature)  # size of one class block (3 here)
    zero_vector = np.zeros(block)
    i = item[1] - 1
    res_1 = np.zeros((1, block * num_of_class))
    res_2 = np.zeros((1, block * num_of_class))
    flag = False
    for j in range(num_of_class):
        if i == j:
            res_1[0, j * block : (j + 1) * block] = extended_feature
            res_2[0, j * block : (j + 1) * block] = extended_feature
        elif not flag:
            res_1[0, j * block : (j + 1) * block] = minus_feature
            res_2[0, j * block : (j + 1) * block] = zero_vector
            flag = True
        else:
            res_1[0, j * block : (j + 1) * block] = zero_vector
            res_2[0, j * block : (j + 1) * block] = minus_feature
    if len(new_features) == 0:
        new_features = np.vstack((res_1, res_2))
    else:
        new_features = np.vstack((new_features, res_1, res_2))

# update the weights with the perceptron rule
def update(item):
    global w
    w += learning_rate * item

# score of one constructed vector; the hyperplane should satisfy w.x > 0
def cal(item):
    return float(np.dot(item, np.transpose(w)))

# one pass over the data; returns True once everything is classified correctly
def check():
    misclassified = False
    for item in new_features:
        if cal(item) <= 0:
            misclassified = True
            update(item)
    return not misclassified

if __name__ == "__main__":
    for item in training_set:
        k_construct(item)
    for i in range(1000):
        # if 1000 passes are still not enough, okay, goodbye!
        if check():
            print("RESULT: w: " + str(w))
            break
    else:
        print("The training set is not linearly separable.")
The final output is RESULT: w: [[ 1. 0.5 -1. 0.5 -1.5 0.5 -1.5 1. 0.5]]; different initial values of w lead to different results. Since my Python plotting skills are still rather limited, I will skip the plot of the result.
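As a sanity check on the result above, here is a short sketch (my addition, not part of the original post) of how the learned 1×9 vector can be split back into the three per-class weight vectors and used for prediction with the usual argmax rule: a point is assigned to the class whose weight vector gives the largest score.

```python
import numpy as np

# the weight vector reported by the training run above
w = np.array([1.0, 0.5, -1.0, 0.5, -1.5, 0.5, -1.5, 1.0, 0.5])
W = w.reshape(3, 3)  # one 3-dimensional weight vector per class

def classify(point):
    x = np.append(point, 1)           # extend with the bias component
    return int(np.argmax(W @ x)) + 1  # classes are numbered 1..3

print(classify((2, 1)))   # 1
print(classify((1, -2)))  # 2
print(classify((-2, 1)))  # 3
```

Running this on all nine training points reproduces their labels, which is consistent with the problem being linearly separable.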