From Binary Classification to Multiclass Classification

Please credit the source when reposting: http://www.cnblogs.com/OldPanda

My last note on the perceptron algorithm ended by mentioning how to extend the algorithm to multiclass classification, with a link to related slides. Over the past couple of days I have been browsing the perceptron material in Pattern Recognition. In places it reads more densely than Statistical Learning Methods, but its coverage is more complete, and it treats exactly this binary-to-multiclass problem with a method called the Kesler construction. The book gives a worked example, and the slides (from slide 35 onward) walk through a very detailed example as well, so I will not repeat the step-by-step mechanics of the algorithm here.

What follows is my solution to the book's example.

Consider a three-class problem in two-dimensional space. The training vectors for each class are:

Class 1: (1, 1), (2, 2), (2, 1)

Class 2: (1, -1), (1, -2), (2, -2)

Class 3: (-1, 1), (-1, 2), (-2, 1)

Since the vectors of different classes lie in different quadrants, the problem is clearly linearly separable.

The overall approach is as follows: first extend each vector into three-dimensional space (by appending a bias component of 1), then use the Kesler construction to expand these 9 vectors into 18 vectors, each of size 9×1. The corresponding weight vectors are

w1, w2, w3 — one 3×1 weight vector per class.

For coding convenience, I merge these three weight vectors into a single 1×9 vector w. We can then run the perceptron algorithm so that all 18 expanded feature vectors satisfy wx > 0, i.e., so that every expanded vector lies on the same side of the decision hyperplane. The initial weight vector is usually generated randomly; for simplicity I set it to all zeros here.
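To make the expansion concrete, here is a minimal sketch (my own illustration, with variable names that are not from the book) of how a single class-1 sample turns into two 9-dimensional vectors under the Kesler construction:

```python
import numpy as np

# The class-1 sample (1, 1), extended with a bias component of 1.
x = np.array([1, 1, 1])
zero = np.zeros(3, dtype=int)

# Kesler construction for a class-1 sample: put x in block 1 and -x in
# each other class's block, with zeros in the remaining block.
v12 = np.concatenate([x, -x, zero])  # w·v12 > 0  <=>  w1·x > w2·x
v13 = np.concatenate([x, zero, -x])  # w·v13 > 0  <=>  w1·x > w3·x

print(v12.tolist())
print(v13.tolist())
```

Requiring the single stacked weight vector w = (w1, w2, w3) to satisfy w·v > 0 on both expanded vectors is exactly the multiclass condition that class 1 scores higher than every other class on this sample.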

As before, here is my code for the solution:

import sys

import numpy as np

# A simple example: the training set and parameter sizes are fixed.
# Each entry is (feature vector, class label).
training_set = [((1, 1), 1), ((2, 2), 1), ((2, 1), 1),
                ((1, -1), 2), ((1, -2), 2), ((2, -2), 2),
                ((-1, 1), 3), ((-1, 2), 3), ((-2, 1), 3)]
num_of_class = 3
new_features = np.array([])  # features after the Kesler construction
w = np.zeros((1, 9))  # initial weights could also come from a random generator
learning_rate = 0.5

# Extend a feature vector into one higher dimension, e.g. from 2-D to 3-D,
# by appending a constant bias component of 1.
def extend(feature):
    return np.append(feature, 1)

# Build the Kesler construction: a sample of class i yields num_of_class - 1
# expanded vectors, one for every other class j, with the extended feature
# in block i, its negative in block j, and zeros elsewhere.
def k_construct(item):
    global new_features
    feature, label = item
    extended_feature = extend(feature)
    block = len(extended_feature)  # width of each per-class block
    i = label - 1
    rows = []
    for j in range(num_of_class):
        if j == i:
            continue
        row = np.zeros((1, block * num_of_class))
        row[0, i * block:(i + 1) * block] = extended_feature
        row[0, j * block:(j + 1) * block] = -extended_feature
        rows.append(row)
    if new_features.size == 0:
        new_features = np.vstack(rows)
    else:
        new_features = np.vstack([new_features] + rows)

# Perceptron update for a misclassified expanded vector.
def update(item):
    global w
    w += learning_rate * item

# Score of an expanded vector against the current hyperplane; every
# expanded vector should satisfy w·x > 0.
def cal(item):
    return float(np.dot(w[0], item))

# One pass over the expanded vectors: update w on every violation and
# report whether the pass was mistake-free.
def check():
    made_mistake = False
    for item in new_features:
        if cal(item) <= 0:
            made_mistake = True
            update(item)
    return not made_mistake

if __name__ == "__main__":
    for item in training_set:
        k_construct(item)
    for i in range(1000):  # if 1000 passes are still not enough, give up
        if check():
            print("RESULT: w: " + str(w))
            sys.exit(0)
    print("The training set is not linearly separable.")

The final result is RESULT: w: [[ 1.   0.5 -1.   0.5 -1.5  0.5 -1.5  1.   0.5]]; different initial values of w lead to different results. Since my Python plotting skills are still rough, I will skip the result figure.
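As a quick verification (my own addition, not part of the original solution), the reported w can be split back into the three per-class weight vectors and checked against the training set with the usual argmax decision rule:

```python
import numpy as np

# The weight vector reported above, split into one 3-D weight per class.
w = np.array([1.0, 0.5, -1.0, 0.5, -1.5, 0.5, -1.5, 1.0, 0.5])
w1, w2, w3 = w.reshape(3, 3)

training_set = [((1, 1), 1), ((2, 2), 1), ((2, 1), 1),
                ((1, -1), 2), ((1, -2), 2), ((2, -2), 2),
                ((-1, 1), 3), ((-1, 2), 3), ((-2, 1), 3)]

# Decision rule: assign x to the class whose weight vector scores highest.
correct = 0
for (a, b), label in training_set:
    x = np.array([a, b, 1.0])  # extend with the bias component
    scores = [wi @ x for wi in (w1, w2, w3)]
    if int(np.argmax(scores)) + 1 == label:
        correct += 1
print(correct, "of", len(training_set), "training samples classified correctly")
```

All nine training samples come out correct, which is consistent with the 18 expanded vectors all satisfying wx > 0.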

posted @ 2013-04-18 23:59  OldPanda