From Two-Class to Multiclass Classification
Please credit the source when reposting: http://www.cnblogs.com/OldPanda
My previous notes on the perceptron algorithm ended with a mention of extending the algorithm to multiclass classification, along with a link to the relevant slides. Over the past couple of days I skimmed the perceptron chapter of Pattern Recognition, and found it somewhat more obscure in places than Statistical Learning Methods, but its coverage is more complete. In particular, it treats the extension from two-class to multiclass classification with a method called the Kesler construction, and gives a worked example. The slides linked earlier also cover this method in detail starting from slide 35, so I will not repeat the algorithm step by step here.
Here is my solution to the example from the book.
Example: consider a three-class problem in two-dimensional space. The training vectors for each class are as follows (these are the points used in the code below): class 1: (1, 1), (2, 2), (2, 1); class 2: (1, -1), (1, -2), (2, -2); class 3: (-1, 1), (-1, 2), (-2, 1).
Since the vectors of the different classes lie in different quadrants, the problem is clearly linearly separable.
The general idea is this: first extend each vector into three-dimensional space (by appending a constant bias component), then use the Kesler construction to expand these 9 vectors into 18 vectors, each of size 9×1. Correspondingly, each class gets its own 3×1 weight vector.
For coding convenience, I simply merge the three weight vectors into a single 1×9 vector w. We can then run the perceptron algorithm by requiring all 18 feature vectors to satisfy wx > 0; in other words, all of the vectors must end up on the same side of the decision hyperplane. The initial weight vector is usually generated randomly; for simplicity I set it to all zeros here.
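To make the construction concrete, here is a small illustrative sketch (not from the original post) of the two 9-dimensional vectors produced for a single training point, x = (1, 1) of class 1. The point goes (extended with a bias component) into the block of its own class, its negation into the block of one competing class, and zeros elsewhere; wz > 0 for these two vectors is equivalent to w1·x > w2·x and w1·x > w3·x.

```python
import numpy as np

# x = (1, 1), class 1, extended with a bias component of 1
x = np.array([1, 1, 1])
zero = np.zeros(3)

# Kesler construction: one 9-dim vector per competing class
z12 = np.concatenate([x, -x, zero])  # encodes w1.x > w2.x
z13 = np.concatenate([x, zero, -x])  # encodes w1.x > w3.x

print(z12)  # [ 1.  1.  1. -1. -1. -1.  0.  0.  0.]
print(z13)  # [ 1.  1.  1.  0.  0.  0. -1. -1. -1.]
```

Doing this for all 9 training points yields the 18 constructed vectors that the perceptron is trained on.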
As before, here is my solution code:
import numpy as np

# A simple example; the training set and parameter sizes are fixed.
# Each entry is (feature vector, class label).
training_set = [((1, 1), 1), ((2, 2), 1), ((2, 1), 1),
                ((1, -1), 2), ((1, -2), 2), ((2, -2), 2),
                ((-1, 1), 3), ((-1, 2), 3), ((-2, 1), 3)]
num_of_class = 3
new_features = np.array([])  # features after the Kesler construction
w = np.zeros((1, 9))         # initial weights; a random generator also works
learning_rate = 0.5

# extend a feature into a higher dimension, e.g. from 2D to 3D,
# by appending the constant 1 for the bias term
def extend(feature):
    return np.append(feature, 1)

# build the Kesler construction for one training example
def k_construct(item):
    global new_features
    extended_feature = extend(item[0])
    minus_feature = np.negative(extended_feature)
    block = len(extended_feature)  # size of one class block (3 here)
    zero_vector = np.zeros(block)
    i = item[1] - 1
    res_1 = np.zeros((1, block * num_of_class))
    res_2 = np.zeros((1, block * num_of_class))
    flag = False
    for j in range(num_of_class):
        if i == j:
            res_1[0, j * block : (j + 1) * block] = extended_feature
            res_2[0, j * block : (j + 1) * block] = extended_feature
        elif not flag:
            res_1[0, j * block : (j + 1) * block] = minus_feature
            res_2[0, j * block : (j + 1) * block] = zero_vector
            flag = True
        else:
            res_1[0, j * block : (j + 1) * block] = zero_vector
            res_2[0, j * block : (j + 1) * block] = minus_feature
    if len(new_features) == 0:
        new_features = np.vstack((res_1, res_2))
    else:
        new_features = np.vstack((new_features, res_1, res_2))

# update the weights with the perceptron rule
def update(item):
    global w
    w += learning_rate * item

# score of one constructed vector; the hyperplane should satisfy w.x > 0
def cal(item):
    return float(np.dot(item, np.transpose(w)))

# one pass over the data; returns True once everything is classified correctly
def check():
    misclassified = False
    for item in new_features:
        if cal(item) <= 0:
            misclassified = True
            update(item)
    return not misclassified

if __name__ == "__main__":
    for item in training_set:
        k_construct(item)
    for i in range(1000):
        # if 1000 passes are still not enough, okay, goodbye!
        if check():
            print("RESULT: w: " + str(w))
            break
    else:
        print("The training set is not linearly separable.")
The final output is RESULT: w: [[ 1. 0.5 -1. 0.5 -1.5 0.5 -1.5 1. 0.5]]; different initial values of w lead to different results. Since my Python plotting skills are still rather limited, I will skip the plot of the result.
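As a sanity check on the result above, here is a short sketch (my addition, not part of the original post) of how the learned 1×9 vector can be split back into the three per-class weight vectors and used for prediction with the usual argmax rule: a point is assigned to the class whose weight vector gives the largest score.

```python
import numpy as np

# the weight vector reported by the training run above
w = np.array([1.0, 0.5, -1.0, 0.5, -1.5, 0.5, -1.5, 1.0, 0.5])
W = w.reshape(3, 3)  # one 3-dimensional weight vector per class

def classify(point):
    x = np.append(point, 1)           # extend with the bias component
    return int(np.argmax(W @ x)) + 1  # classes are numbered 1..3

print(classify((2, 1)))   # 1
print(classify((1, -2)))  # 2
print(classify((-2, 1)))  # 3
```

Running this on all nine training points reproduces their labels, which is consistent with the problem being linearly separable.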