AdaBoost - Python Implementation
- Algorithm features:
①. Cascaded weighting of models (cascade weights); ②. Feature selection on the samples; ③. Sample weight updates (association weights)
- Algorithm principle:
Part Ⅰ:
Given the following original data set:
\begin{equation}
D = \{(x^{(1)}, \bar{y}^{(1)}), (x^{(2)}, \bar{y}^{(2)}), \cdots, (x^{(n)}, \bar{y}^{(n)})\}, \quad\text{where }\bar{y}^{(i)} \in \{-1, +1\}
\label{eq_1}
\end{equation}
Let the maximum number of cascaded weighted weak models be $T$. For the $t$-th weak model $h_t(x)$, the associated sample weight distribution is:
\begin{equation}
W_t = \{ w_t^{(1)}, w_t^{(2)}, \cdots, w_t^{(n)} \}
\label{eq_2}
\end{equation}
The main procedure is as follows:
Step 1. Initialize the sample association weights uniformly:
\begin{equation}
W_1 = \{ w_1^{(1)}, w_1^{(2)}, \cdots, w_1^{(n)}\}, \quad\text{where }w_1^{(i)} = \frac{1}{n}
\label{eq_3}
\end{equation}
Step 2. For iteration $t = 1, 2, \cdots, T$, obtain a weak model $h_t(x) = \pm 1$ on the training samples based on the association weight distribution $W_t$.
Notes:
①. Building this weak model relies on feature selection over the training samples;
②. If the weak model supports weighted samples, update it by re-weighting (i.e. adjusting its loss function); otherwise, update it by re-sampling (sampling with replacement), as in the sketch below.
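A minimal sketch of the re-sampling branch, assuming numpy arrays X (samples), y (labels) and w (current association weights); the function name resample is illustrative and is not part of the implementation later in this post, which uses an equivalent roulette-wheel scheme:

import numpy

def resample(X, y, w):
    # Draw n indices with replacement; sample i is chosen with probability w[i].
    # The weak model is then trained on the re-sampled set with uniform weights.
    n = X.shape[0]
    idx = numpy.random.choice(n, size=n, replace=True, p=w)
    return X[idx], y[idx]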
Step 3. Compute the weighted error rate of the weak model $h_t(x)$ on the training samples:
\begin{equation}
\epsilon_t = \sum_{i=1}^nw_t^{(i)}I(h_t(x^{(i)}) \neq \bar{y}^{(i)})
\label{eq_4}
\end{equation}
Then compute the cascade weight of $h_t(x)$ in the final decision model:
\begin{equation}
\alpha_t = \frac{1}{2}\mathrm{ln}\frac{1-\epsilon_t}{\epsilon_t}
\label{eq_5}
\end{equation}
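As a quick numerical illustration (my own example, not part of the original derivation): a weak model with weighted error rate $\epsilon_t = 0.2$ receives cascade weight
\begin{equation*}
\alpha_t = \frac{1}{2}\mathrm{ln}\frac{1 - 0.2}{0.2} = \frac{1}{2}\mathrm{ln}4 \approx 0.693,
\end{equation*}
while $\epsilon_t = 0.5$ gives $\alpha_t = 0$: only models better than random guessing contribute, and the more accurate the model, the larger its cascade weight.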
Step 4. Update the sample association weights:
\begin{equation}
\begin{split}
&W_{t+1} = \{ w_{t+1}^{(1)}, w_{t+1}^{(2)}, \cdots, w_{t+1}^{(n)} \} \\
&w_{t+1}^{(i)} = \frac{w_{t}^{(i)}}{Z_t}\mathrm{exp}(-\alpha_t\bar{y}^{(i)}h_t(x^{(i)})) \\
&Z_t = \sum_{i=1}^{n}w_{t}^{(i)}\mathrm{exp}(-\alpha_t\bar{y}^{(i)}h_t(x^{(i)}))
\end{split}
\label{eq_6}
\end{equation}
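A vectorized numpy sketch of this update (an illustrative helper, assuming labels and predictions take values in $\{-1, +1\}$; it is not part of the implementation below):

import numpy

def update_weights(w, y_true, y_pred, alpha):
    # Multiply each weight by exp(-alpha * y * h(x)), then divide by Z_t so the weights sum to 1.
    w_new = w * numpy.exp(-alpha * y_true * y_pred)
    return w_new / numpy.sum(w_new)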
Step 5. Return to Step 2 and repeat until $T$ weak models have been obtained.
Step 6. The final decision model is:
\begin{equation}
\begin{split}
f(x) &= \sum_{t=1}^{T}\alpha_th_t(x) \\
h_{final}(x) &= \mathrm{sign}(f(x)) = \mathrm{sign}(\sum_{t=1}^T\alpha_th_t(x))
\end{split}
\label{eq_7}
\end{equation}
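To make Steps 1-6 concrete, here is a compact sketch of the whole loop. It is my own simplified illustration under two assumptions: the weak model is a one-level decision stump (not the linear SVM used by the implementation later in this post), and the stump is refit by re-weighting rather than re-sampling; all function names are illustrative.

import numpy

def fit_stump(X, y, w):
    # Step 2: exhaustively pick (feature, threshold, polarity) minimizing the weighted error.
    best = (numpy.inf, None)
    for j in range(X.shape[1]):
        for thr in numpy.unique(X[:, j]):
            for polarity in (1, -1):
                pred = numpy.where(X[:, j] <= thr, polarity, -polarity)
                err = numpy.sum(w[pred != y])
                if err < best[0]:
                    best = (err, (j, thr, polarity))
    return best[1]

def stump_predict(stump, X):
    j, thr, polarity = stump
    return numpy.where(X[:, j] <= thr, polarity, -polarity)

def adaboost_fit(X, y, T=50):
    n = X.shape[0]
    w = numpy.ones(n) / n                          # Step 1: uniform association weights
    stumps, alphas = [], []
    for t in range(T):
        stump = fit_stump(X, y, w)                 # Step 2: weak model under W_t
        pred = stump_predict(stump, X)
        eps = numpy.sum(w[pred != y])              # Step 3: weighted error rate
        if eps >= 0.5:                             # no longer better than random guessing
            break
        eps = max(eps, 1e-12)                      # guard against log(0) when the stump is perfect
        alpha = 0.5 * numpy.log((1 - eps) / eps)   # cascade weight
        w = w * numpy.exp(-alpha * y * pred)       # Step 4: re-weight the samples
        w /= numpy.sum(w)                          # normalize by Z_t
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Step 6: sign of the cascade-weighted sum of weak models
    f = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return numpy.sign(f)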
Part Ⅱ:
Theorem:
If every weak model $h_t(x)$ has weighted error rate $\epsilon_t < 0.5$ on the training set, then as $T$ increases the error rate $E$ of the final AdaBoost decision model $h_{final}(x)$ on the training set keeps decreasing, in the sense that it is bounded above as follows:
\begin{equation*}
E = \frac{1}{n}\sum_{i=1}^{n}I(h_{final}(x^{(i)}) \neq \bar{y}^{(i)}) \leq \frac{1}{n}\sum_{i=1}^{n}\mathrm{exp}(-\bar{y}^{(i)}f(x^{(i)})) = \prod_{t=1}^{T}Z_t
\end{equation*}
Proof:
The first inequality below holds because $I(h_{final}(x^{(i)}) \neq \bar{y}^{(i)}) \leq \mathrm{exp}(-\bar{y}^{(i)}f(x^{(i)}))$ for every sample: whenever $h_{final}(x^{(i)}) \neq \bar{y}^{(i)}$ we have $\bar{y}^{(i)}f(x^{(i)}) \leq 0$, and hence $\mathrm{exp}(-\bar{y}^{(i)}f(x^{(i)})) \geq 1$.
\begin{equation*}
\begin{split}
E &\leq \frac{1}{n}\sum_{i=1}^n\mathrm{exp}(-\bar{y}^{(i)}f(x^{(i)})) \\
&=\frac{1}{n}\sum_{i=1}^n\mathrm{exp}(-\bar{y}^{(i)}\sum_{t=1}^T\alpha_th_t(x^{(i)})) \\
&=\frac{1}{n}\sum_{i=1}^n\mathrm{exp}(\sum_{t=1}^T-\alpha_t\bar{y}^{(i)}h_t(x^{(i)})) \\
&=\sum_{i=1}^nw_1^{(i)}\prod_{t=1}^T\mathrm{exp}(-\alpha_t\bar{y}^{(i)}h_t(x^{(i)})) \\
&=\sum_{i=1}^nw_1^{(i)}\mathrm{exp}(-\alpha_1\bar{y}^{(i)}h_1(x^{(i)}))\prod_{t=2}^T\mathrm{exp}(-\alpha_t\bar{y}^{(i)}h_t(x^{(i)})) \\
&=\sum_{i=1}^nZ_1w_2^{(i)}\prod_{t=2}^T\mathrm{exp}(-\alpha_t\bar{y}^{(i)}h_t(x^{(i)})) \\
&=Z_1\sum_{i=1}^nw_2^{(i)}\prod_{t=2}^T\mathrm{exp}(-\alpha_t\bar{y}^{(i)}h_t(x^{(i)})) \\
&=Z_1Z_2\sum_{i=1}^nw_3^{(i)}\prod_{t=3}^T\mathrm{exp}(-\alpha_t\bar{y}^{(i)}h_t(x^{(i)})) \\
&=\cdots \\
&=\prod_{t=1}^TZ_t
\end{split}
\end{equation*}
\begin{equation*}
\begin{split}
Z_t &= \sum_{i=1}^{n}w_{t}^{(i)}\mathrm{exp}(-\alpha_t\bar{y}^{(i)}h_t(x^{(i)})) \\
&=\sum_{i=1; \bar{y}^{(i)}=h_t(x^{(i)})}^nw_t^{(i)}\mathrm{exp}(-\alpha_t) + \sum_{i=1; \bar{y}^{(i)} \neq h_t(x^{(i)})}^nw_t^{(i)}\mathrm{exp}(\alpha_t) \\
&=\mathrm{exp}(-\alpha_t)\sum_{i=1; \bar{y}^{(i)}=h_t(x^{(i)})}^nw_t^{(i)} + \mathrm{exp}(\alpha_t)\sum_{i=1; \bar{y}^{(i)} \neq h_t(x^{(i)})}^nw_t^{(i)} \\
&=\mathrm{exp}(-\alpha_t)(1 - \epsilon_t) + \mathrm{exp}(\alpha_t)\epsilon_t \\
&=(1 - \epsilon_t)\sqrt{\frac{\epsilon_t}{1-\epsilon_t}} + \epsilon_t\sqrt{\frac{1-\epsilon_t}{\epsilon_t}} \qquad \left(\text{substituting } \alpha_t = \frac{1}{2}\mathrm{ln}\frac{1-\epsilon_t}{\epsilon_t}\right) \\
&=2\sqrt{\epsilon_t(1 - \epsilon_t)}
\end{split}
\end{equation*}
For the weak model $h_t(x)$, if its weighted error rate on the training set satisfies $\epsilon_t < 0.5$, then $Z_t < 1$. The bound $\prod_{t=1}^T Z_t$ therefore shrinks geometrically as $T$ grows, and with it the error rate $E$ of the final AdaBoost decision model $h_{final}(x)$ on the training set. For example, a constant $\epsilon_t = 0.4$ gives $Z_t \approx 0.98$, so after $T = 100$ rounds the bound is already about $0.98^{100} \approx 0.13$. Q.E.D.
- Code implementation:
This post implements AdaBoost with a linear SVM as the weak model. For details on the SVM, see:
Smooth Support Vector Machine - Python Implementation
# Implementation of AdaBoost
# Note: implemented by cascading weighted SVMs


import numpy
from matplotlib import pyplot as plt


def spiral_point(val, center=(0, 0)):
    rn = 0.4 * (105 - val) / 104
    an = numpy.pi * (val - 1) / 25

    x0 = center[0] + rn * numpy.sin(an)
    y0 = center[1] + rn * numpy.cos(an)
    z0 = -1
    x1 = center[0] - rn * numpy.sin(an)
    y1 = center[1] - rn * numpy.cos(an)
    z1 = 1

    return (x0, y0, z0), (x1, y1, z1)


def spiral_data(valList):
    dataList = list(spiral_point(val) for val in valList)
    data0 = numpy.array(list(item[0] for item in dataList))
    data1 = numpy.array(list(item[1] for item in dataList))
    return data0, data1


class SSVM(object):

    def __init__(self, trainingSet, c=1, mu=1, beta=100):
        self.__trainingSet = trainingSet       # training set data
        self.__c = c                           # weight of the error term
        self.__mu = mu                         # gaussian kernel parameter
        self.__beta = beta                     # smoothing parameter

        self.__A, self.__D = self.__get_AD()


    def get_cls(self, x, alpha, b):
        A, D = self.__A, self.__D
        mu = self.__mu

        x = numpy.array(x).reshape((-1, 1))
        KAx = self.__get_KAx(A, x, mu)
        clsVal = self.__calc_hVal(KAx, D, alpha, b)
        if clsVal >= 0:
            return 1
        else:
            return -1


    def optimize(self, maxIter=1000, epsilon=1.e-9):
        '''
        maxIter: maximum number of iterations
        epsilon: convergence criterion -- converged once the gradient approaches 0
        '''
        A, D = self.__A, self.__D
        c = self.__c
        mu = self.__mu
        beta = self.__beta

        alpha, b = self.__init_alpha_b((A.shape[1], 1))
        KAA = self.__get_KAA(A, mu)

        JVal = self.__calc_JVal(KAA, D, c, beta, alpha, b)
        grad = self.__calc_grad(KAA, D, c, beta, alpha, b)
        DMat = self.__init_D(KAA.shape[0] + 1)

        for i in range(maxIter):
            # print("iterCnt: {:3d}, JVal: {}".format(i, JVal))
            if self.__converged1(grad, epsilon):
                return alpha, b, True

            dCurr = -numpy.matmul(DMat, grad)
            ALPHA = self.__calc_ALPHA_by_ArmijoRule(alpha, b, JVal, grad, dCurr, KAA, D, c, beta)

            delta = ALPHA * dCurr
            alphaNew = alpha + delta[:-1, :]
            bNew = b + delta[-1, -1]
            JValNew = self.__calc_JVal(KAA, D, c, beta, alphaNew, bNew)
            if self.__converged2(delta, JValNew - JVal, epsilon ** 2):
                return alphaNew, bNew, True

            gradNew = self.__calc_grad(KAA, D, c, beta, alphaNew, bNew)
            DMatNew = self.__update_D_by_BFGS(delta, gradNew - grad, DMat)

            alpha, b, JVal, grad, DMat = alphaNew, bNew, JValNew, gradNew, DMatNew
        else:
            if self.__converged1(grad, epsilon):
                return alpha, b, True
        return alpha, b, False


    def __update_D_by_BFGS(self, sk, yk, D):
        rk = 1 / (numpy.matmul(yk.T, sk)[0, 0] + 1.e-30)

        term1 = rk * numpy.matmul(sk, yk.T)
        term2 = rk * numpy.matmul(yk, sk.T)
        I = numpy.identity(term1.shape[0])
        term3 = numpy.matmul(I - term1, D)
        term4 = numpy.matmul(term3, I - term2)
        term5 = rk * numpy.matmul(sk, sk.T)

        DNew = term4 + term5
        return DNew


    def __calc_ALPHA_by_ArmijoRule(self, alphaCurr, bCurr, JCurr, gCurr, dCurr, KAA, D, c, beta, C=1.e-4, v=0.5):
        i = 0
        ALPHA = v ** i
        delta = ALPHA * dCurr
        alphaNext = alphaCurr + delta[:-1, :]
        bNext = bCurr + delta[-1, -1]
        JNext = self.__calc_JVal(KAA, D, c, beta, alphaNext, bNext)
        while True:
            if JNext <= JCurr + C * ALPHA * numpy.matmul(dCurr.T, gCurr)[0, 0]: break
            i += 1
            ALPHA = v ** i
            delta = ALPHA * dCurr
            alphaNext = alphaCurr + delta[:-1, :]
            bNext = bCurr + delta[-1, -1]
            JNext = self.__calc_JVal(KAA, D, c, beta, alphaNext, bNext)
        return ALPHA


    def __converged1(self, grad, epsilon):
        if numpy.linalg.norm(grad, ord=numpy.inf) <= epsilon:
            return True
        return False


    def __converged2(self, delta, JValDelta, epsilon):
        val1 = numpy.linalg.norm(delta, ord=numpy.inf)
        val2 = numpy.abs(JValDelta)
        if val1 <= epsilon or val2 <= epsilon:
            return True
        return False


    def __init_D(self, n):
        D = numpy.identity(n)
        return D


    def __calc_grad(self, KAA, D, c, beta, alpha, b):
        grad_J1 = self.__calc_grad_J1(alpha)
        grad_J2 = self.__calc_grad_J2(KAA, D, c, beta, alpha, b)
        grad = grad_J1 + grad_J2
        return grad


    def __calc_grad_J2(self, KAA, D, c, beta, alpha, b):
        grad_J2 = numpy.zeros((KAA.shape[0] + 1, 1))
        Y = numpy.matmul(D, numpy.ones((D.shape[0], 1)))
        YY = numpy.matmul(Y, Y.T)
        KAAYY = KAA * YY

        z = 1 - numpy.matmul(KAAYY, alpha) - Y * b
        p = numpy.array(list(self.__p(z[i, 0], beta) for i in range(z.shape[0]))).reshape((-1, 1))
        s = numpy.array(list(self.__s(z[i, 0], beta) for i in range(z.shape[0]))).reshape((-1, 1))
        term = p * s

        for k in range(grad_J2.shape[0] - 1):
            val = -c * Y[k, 0] * numpy.sum(Y * KAA[:, k:k+1] * term)
            grad_J2[k, 0] = val
        grad_J2[-1, 0] = -c * numpy.sum(Y * term)
        return grad_J2


    def __calc_grad_J1(self, alpha):
        grad_J1 = numpy.vstack((alpha, [[0]]))
        return grad_J1


    def __calc_JVal(self, KAA, D, c, beta, alpha, b):
        J1 = self.__calc_J1(alpha)
        J2 = self.__calc_J2(KAA, D, c, beta, alpha, b)
        JVal = J1 + J2
        return JVal


    def __calc_J2(self, KAA, D, c, beta, alpha, b):
        tmpOne = numpy.ones((KAA.shape[0], 1))
        x = tmpOne - numpy.matmul(numpy.matmul(numpy.matmul(D, KAA), D), alpha) - numpy.matmul(D, tmpOne) * b
        p = numpy.array(list(self.__p(x[i, 0], beta) for i in range(x.shape[0])))
        J2 = numpy.sum(p * p) * c / 2
        return J2


    def __calc_J1(self, alpha):
        J1 = numpy.sum(alpha * alpha) / 2
        return J1


    def __get_KAA(self, A, mu):
        KAA = numpy.zeros((A.shape[1], A.shape[1]))
        for rowIdx in range(KAA.shape[0]):
            for colIdx in range(rowIdx + 1):
                x1 = A[:, rowIdx:rowIdx+1]
                x2 = A[:, colIdx:colIdx+1]
                val = self.__calc_gaussian(x1, x2, mu)
                KAA[rowIdx, colIdx] = KAA[colIdx, rowIdx] = val
        return KAA


    def __get_KAx(self, A, x, mu):
        KAx = numpy.zeros((A.shape[1], 1))
        for rowIdx in range(KAx.shape[0]):
            x1 = A[:, rowIdx:rowIdx+1]
            val = self.__calc_gaussian(x1, x, mu)
            KAx[rowIdx, 0] = val
        return KAx


    def __calc_hVal(self, KAx, D, alpha, b):
        hVal = numpy.matmul(numpy.matmul(alpha.T, D), KAx)[0, 0] + b
        return hVal


    def __calc_gaussian(self, x1, x2, mu):
        # val = numpy.exp(-mu * numpy.linalg.norm(x1 - x2) ** 2)    # gaussian kernel
        # val = (numpy.sum(x1 * x2) + 1) ** 1                       # polynomial kernel
        val = numpy.sum(x1 * x2)                                    # linear kernel
        return val


    def __init_alpha_b(self, shape):
        '''
        initialization of alpha and b
        '''
        alpha, b = numpy.zeros(shape), 0
        return alpha, b


    def __get_AD(self):
        A = self.__trainingSet[:, :2].T
        D = numpy.diag(self.__trainingSet[:, 2])
        return A, D


    def __p(self, x, beta):
        term = x * beta
        if term > 10:
            val = x + numpy.log(1 + numpy.exp(-term)) / beta
        else:
            val = numpy.log(numpy.exp(term) + 1) / beta
        return val


    def __s(self, x, beta):
        term = x * beta
        if term > 10:
            val = 1 / (numpy.exp(-beta * x) + 1)
        else:
            term1 = numpy.exp(term)
            val = term1 / (1 + term1)
        return val


class AdaBoost(object):

    def __init__(self, trainingSet):
        self.__trainingSet = trainingSet       # training data set

        self.__W = self.__init_weight()        # initialize the association weights


    def get_weakModels(self, T=100):
        '''
        T: maximum number of weak models
        '''
        T = T if T >= 1 else 1
        W = self.__W
        trainingSet = self.__trainingSet

        weakModels = list()                    # list of cascaded weak models (SVMs)
        alphaList = list()                     # list of cascade weights
        for t in range(T):
            print("getting the {}th weak model...".format(t))
            weakModel, alpha = self.__get_weakModel(trainingSet, W, 0.49)
            weakModels.append(weakModel)
            alphaList.append(alpha)

            W = self.__update_W(W, weakModel, alpha)                               # update the association weights
        else:
            realErr = self.__calc_realErr(weakModels, alphaList, trainingSet)      # compute the actual error rate
            print("Final error rate: {}".format(realErr))
        return weakModels, alphaList


    def get_realErr(self, weakModels, alphaList, dataSet=None):
        '''
        compute the error rate of AdaBoost on the given data set
        weakModels: list of weak models
        alphaList: list of cascade weights
        dataSet: the data set to evaluate on
        '''
        if dataSet is None:
            dataSet = self.__trainingSet

        realErr = self.__calc_realErr(weakModels, alphaList, dataSet)
        return realErr


    def get_cls(self, x, weakModels, alphaList):
        hVal = self.__calc_hVal(x, weakModels, alphaList)
        if hVal >= 0:
            return 1
        else:
            return -1


    def __calc_realErr(self, weakModels, alphaList, dataSet):
        cnt = 0
        num = dataSet.shape[0]
        for sample in dataSet:
            x, y_ = sample[:-1], sample[-1]
            y = self.get_cls(x, weakModels, alphaList)
            if y_ != y:
                cnt += 1
        err = cnt / num
        return err


    def __calc_hVal(self, x, weakModels, alphaList):
        if len(weakModels) == 0:
            raise Exception("Weak model list is empty!")

        hVal = 0
        for (ssvmObj, ssvmRet), alpha in zip(weakModels, alphaList):
            hVal += ssvmObj.get_cls(x, ssvmRet[0], ssvmRet[1]) * alpha
        return hVal


    def __update_W(self, W, weakModel, alpha):
        ssvmObj, ssvmRet = weakModel
        trainingSet = self.__trainingSet

        WNew = list()
        for sample in trainingSet:
            x, y_ = sample[:-1], sample[-1]
            val = numpy.exp(-alpha * y_ * ssvmObj.get_cls(x, ssvmRet[0], ssvmRet[1]))
            WNew.append(val)
        WNew = numpy.array(WNew) * W
        WNew = WNew / numpy.sum(WNew)
        return WNew


    def __get_weakModel(self, trainingSet, W, maxEpsilon=0.5, maxIter=100):
        '''
        obtain a weak model together with its cascade weight
        W: association weights
        maxEpsilon: maximum acceptable weighted error rate
        maxIter: maximum number of attempts
        '''
        roulette = self.__build_roulette(W)
        for idx in range(maxIter):
            dataSet = self.__get_dataSet(trainingSet, roulette)
            weakModel = self.__build_weakModel(dataSet)

            epsilon = self.__calc_weightedErr(trainingSet, weakModel, W)
            if epsilon == 0:
                raise Exception("The model is not weak enough with epsilon = 0")
            elif epsilon < maxEpsilon:
                alpha = self.__calc_alpha(epsilon)
                return weakModel, alpha
        else:
            raise Exception("Fail to get weak model after {} iterations!".format(maxIter))


    def __calc_alpha(self, epsilon):
        '''
        compute the cascade weight
        '''
        alpha = numpy.log(1 / epsilon - 1) / 2
        return alpha


    def __calc_weightedErr(self, trainingSet, weakModel, W):
        '''
        compute the weighted error rate
        '''
        ssvmObj, (alpha, b, tab) = weakModel

        epsilon = 0
        for idx, w in enumerate(W):
            x, y_ = trainingSet[idx, :-1], trainingSet[idx, -1]
            y = ssvmObj.get_cls(x, alpha, b)
            if y_ != y:
                epsilon += w
        return epsilon


    def __build_weakModel(self, dataSet):
        '''
        build an SVM weak model
        '''
        ssvmObj = SSVM(dataSet, c=0.1, mu=250, beta=100)
        ssvmRet = ssvmObj.optimize()
        return (ssvmObj, ssvmRet)


    def __get_dataSet(self, trainingSet, roulette):
        randomDart = numpy.sort(numpy.random.uniform(0, 1, trainingSet.shape[0]))
        dataSet = list()
        idxRoulette = idxDart = 0
        while idxDart < len(randomDart):
            if randomDart[idxDart] > roulette[idxRoulette]:
                idxRoulette += 1
            else:
                dataSet.append(trainingSet[idxRoulette])
                idxDart += 1
        return numpy.array(dataSet)


    def __build_roulette(self, W):
        roulette = list()
        val = 0
        for ele in W:
            val += ele
            roulette.append(val)
        return roulette


    def __init_weight(self):
        num = self.__trainingSet.shape[0]
        W = numpy.ones(num) / num
        return W


class AdaBoostPlot(object):

    @staticmethod
    def data_plot(trainingData0, trainingData1):
        fig = plt.figure(figsize=(5, 5))
        ax1 = plt.subplot()

        ax1.scatter(trainingData1[:, 0], trainingData1[:, 1], c="red", marker="o", s=10, label="Positive")
        ax1.scatter(trainingData0[:, 0], trainingData0[:, 1], c="blue", marker="o", s=10, label="Negative")

        ax1.set(xlim=(-0.5, 0.5), ylim=(-0.5, 0.5), xlabel="$x_1$", ylabel="$x_2$")
        ax1.legend(fontsize="x-small")

        fig.savefig("data.png", dpi=100)
        # plt.show()
        plt.close()


    @staticmethod
    def pred_plot(trainingData0, trainingData1, adaObj, weakModels, alphaList):
        x = numpy.linspace(-0.5, 0.5, 500)
        y = numpy.linspace(-0.5, 0.5, 500)
        x, y = numpy.meshgrid(x, y)
        z = numpy.zeros(x.shape)
        for rowIdx in range(x.shape[0]):
            print("on the {}th row".format(rowIdx))
            for colIdx in range(x.shape[1]):
                z[rowIdx, colIdx] = adaObj.get_cls((x[rowIdx, colIdx], y[rowIdx, colIdx]), weakModels, alphaList)

        errList = list()
        for idx in range(len(weakModels)):
            tmpWeakModels = weakModels[:idx+1]
            tmpAlphaList = alphaList[:idx+1]
            realErr = adaObj.get_realErr(tmpWeakModels, tmpAlphaList)
            print("idx = {}; realErr = {}".format(idx, realErr))
            errList.append(realErr)

        fig = plt.figure(figsize=(10, 3))
        ax1 = plt.subplot(1, 2, 1)
        ax2 = plt.subplot(1, 2, 2)

        ax1.plot(numpy.arange(len(errList))+1, errList, linestyle="--", marker=".")
        ax1.set(xlabel="T", ylabel="error rate")

        ax2.contourf(x, y, z, levels=[-1.5, 0, 1.5], colors=["blue", "red"], alpha=0.3)
        ax2.scatter(trainingData1[:, 0], trainingData1[:, 1], c="red", marker="o", s=10, label="Positive")
        ax2.scatter(trainingData0[:, 0], trainingData0[:, 1], c="blue", marker="o", s=10, label="Negative")
        ax2.set(xlim=(-0.5, 0.5), ylim=(-0.5, 0.5), xlabel="$x_1$", ylabel="$x_2$")
        ax2.legend(loc="upper left", fontsize="x-small")
        fig.tight_layout()
        fig.savefig("pred.png", dpi=100)
        # plt.show()
        plt.close()


if __name__ == "__main__":
    # generate the training data set
    trainingValList = numpy.arange(1, 101, 1)
    trainingData0, trainingData1 = spiral_data(trainingValList)
    trainingSet = numpy.vstack((trainingData0, trainingData1))

    adaObj = AdaBoost(trainingSet)
    weakModels, alphaList = adaObj.get_weakModels(200)

    AdaBoostPlot.data_plot(trainingData0, trainingData1)
    AdaBoostPlot.pred_plot(trainingData0, trainingData1, adaObj, weakModels, alphaList)
The distribution of the training data set used here is shown below:
Clearly, this data set is not linearly separable, so a single linear SVM would give poor classification results on it.
- Results:
The left panel shows how the error rate $E$ on the training set changes with the number of weak models $T$; the right panel shows the final classification result of AdaBoost on this training set. Compared with a single linear SVM, AdaBoost cascades multiple linear SVMs and reduces the training error rate from the initial 0.45 to 0.12, greatly enhancing the expressive power of the weak model.
- Usage suggestions:
①. Be careful to distinguish the cascade weights $\alpha$ from the association weights $w$;
②. Be careful to distinguish the error rate $E$ from the weighted error rate $\epsilon$.
- References:
Boosting之AdaBoost算法