[cs231n] Computer Vision Basics

Linear classifier loss function details:
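The optimization snippets in this post call a loss function `L(X, Y, W)` without defining it. As a minimal sketch, assuming the loss is the multiclass SVM (hinge) loss from cs231n, that examples are stored one per column, and that there is no regularization term, it could look like the following. The name `delta` and the toy shapes are assumptions for illustration only.

```python
import numpy as np

def L(X, y, W, delta=1.0):
    """Average multiclass SVM (hinge) loss, without regularization.

    X: (D, N) array, one example per column (assumed layout)
    y: (N,) integer array of correct class indices
    W: (C, D) weight matrix
    """
    num_examples = X.shape[1]
    cols = np.arange(num_examples)
    scores = W.dot(X)                      # (C, N) class scores
    correct = scores[y, cols]              # score of the true class per example
    margins = np.maximum(0.0, scores - correct + delta)
    margins[y, cols] = 0.0                 # the true class contributes no loss
    return margins.sum() / num_examples

# Toy check: with all-zero weights every wrong class violates the margin
# by exactly delta, so the loss is (C - 1) * delta.
X_toy = np.ones((4, 5))
y_toy = np.array([0, 1, 2, 0, 1])
W_toy = np.zeros((3, 4))
loss_toy = L(X_toy, y_toy, W_toy)
```

With zero weights the loss equals the number of wrong classes, which is a handy sanity check for any implementation of this loss.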

Code for the optimization part:

1. Random search

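import numpy as np

# Assumes Xtr_cols is 3073 x N (one example per column), Ytr holds the labels,
# and L(X, Y, W) returns the loss.
bestloss = float('inf')  # start at infinity
for num in range(1000):
    W = np.random.randn(10, 3073) * 0.0001  # a fresh random candidate
    loss = L(Xtr_cols, Ytr, W)
    if loss < bestloss:  # keep the best candidate seen so far
        bestloss = loss
        bestW = W

# Evaluate on the test set (Xte_cols: 3073 x N_test, Yte: test labels)
scores = bestW.dot(Xte_cols)
Yte_predict = np.argmax(scores, axis=0)  # highest-scoring class per column
np.mean(Yte_predict == Yte)  # accuracy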

Core idea: iterative refinement

2. Random local search

W = np.random.randn(10, 3073) * 0.001  # random starting point
bestloss = float('inf')
for i in range(1000):
    step_size = 0.0001
    Wtry = W + np.random.randn(10, 3073) * step_size  # perturb the current W
    loss = L(Xtr_cols, Ytr, Wtry)
    if loss < bestloss:  # move only if the perturbation helps
        W = Wtry
        bestloss = loss
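To see random local search run end to end without the CIFAR-10 data, here is a minimal sketch on a toy problem. The quadratic `toy_loss`, the `target` matrix, and the shapes are hypothetical stand-ins for the real loss `L` and the real data, chosen only so the snippet is self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.standard_normal((10, 20))   # hypothetical "ideal" weights

def toy_loss(W):
    # Stand-in for L(Xtr_cols, Ytr, W): squared distance to the target
    return float(np.sum((W - target) ** 2))

W = rng.standard_normal((10, 20)) * 0.001  # random starting point
initial_loss = toy_loss(W)
bestloss = initial_loss
step_size = 0.01
for _ in range(1000):
    # Perturb the current W rather than drawing a brand-new W
    Wtry = W + rng.standard_normal(W.shape) * step_size
    loss = toy_loss(Wtry)
    if loss < bestloss:                    # accept only improvements
        W, bestloss = Wtry, loss

print(initial_loss, bestloss)
```

Because a step is accepted only when it lowers the loss, `bestloss` can never end up above `initial_loss`; how far it drops depends on `step_size` and the number of iterations.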

3. Computing the gradient with finite differences (numerical gradient)

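def eval_numerical_gradient(f, x):
  """
  A naive implementation of the numerical gradient of f at x
  - f is a function that takes a single argument
  - x is the point (a numpy array) at which to evaluate the gradient
  """

  fx = f(x) # evaluate the function at the original point
  grad = np.zeros(x.shape)
  h = 0.00001

  # iterate over all indexes in x
  it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
  while not it.finished:

    # evaluate the function at x+h
    ix = it.multi_index
    old_value = x[ix]
    x[ix] = old_value + h # increment by h
    fxh = f(x) # evaluate f(x + h)
    x[ix] = old_value # restore the previous value (very important!)

    # compute the partial derivative
    grad[ix] = (fxh - fx) / h # the slope
    it.iternext() # step to the next dimension

  return grad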
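A common use of a numerical gradient is to check an analytic one. The sketch below is a variation, not the function above: it uses the centered difference (f(x+h) - f(x-h)) / 2h, which is usually more accurate than the one-sided formula, and compares the result with the known gradient 2x of f(x) = sum(x^2). The function name and the test function are assumptions for illustration.

```python
import numpy as np

def numerical_gradient_centered(f, x, h=1e-5):
    """Centered-difference gradient of f at x (x must be a float array)."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        old_value = x[ix]
        x[ix] = old_value + h
        fxph = f(x)                 # f(x + h)
        x[ix] = old_value - h
        fxmh = f(x)                 # f(x - h)
        x[ix] = old_value           # restore the original value
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

x = np.array([[1.0, -2.0], [0.5, 3.0]])
f = lambda v: np.sum(v ** 2)        # analytic gradient is 2 * v
numeric = numerical_gradient_centered(f, x)
analytic = 2 * x
max_abs_error = np.max(np.abs(numeric - analytic))
print(max_abs_error)
```

Each gradient costs two evaluations of `f` per element of `x`, which is why numerical gradients are used for checking rather than for training on a 10 x 3073 weight matrix.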

One-hot encoding

a 0,0,0,1

b 0,0,1,0

c 0,1,0,0

d 1,0,0,0

and so on: each category gets its own vector containing exactly one 1.
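The mapping above can be reproduced with a few lines of NumPy. This is a minimal sketch; the helper name `one_hot` is hypothetical, and placing the 1 from the right (a in the last slot, d in the first) is chosen only to match the listing above. Any consistent assignment of positions works equally well.

```python
import numpy as np

labels = ['a', 'b', 'c', 'd']
# Position of the 1 for each label: a -> 3, b -> 2, c -> 1, d -> 0
position = {label: len(labels) - 1 - i for i, label in enumerate(labels)}

def one_hot(label):
    vec = np.zeros(len(labels), dtype=int)
    vec[position[label]] = 1
    return vec

encoded = np.array([one_hot(label) for label in ['a', 'c', 'd']])
print(encoded)
```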

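Another aspect of data preprocessing

In theory the code below prints 1.0, but it actually prints about 0.95: floating-point arithmetic loses precision when a small increment is added to a value of large magnitude.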

a = 10**9
for i in range(10**6):
    a = a + 1e-6  # each addition is rounded to the nearest representable float
print(a - 10**9)

# 0.95367431640625
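The size of the error can be explained by the spacing between adjacent float64 values near 10^9. The sketch below measures that spacing with `np.spacing` and, for comparison, sums the same numbers with `math.fsum`, which tracks the rounding error that the naive loop discards. The variable names are illustrative.

```python
import math
import numpy as np

big = 1e9
gap = np.spacing(big)       # distance from 1e9 to the next float64, 2**-23
print(gap)                  # about 1.19e-07, so 1e-6 is only ~8.4 gaps wide

# Naive accumulation: every addition rounds 1e-6 down to 8 gaps
naive = big
for _ in range(10**6):
    naive += 1e-6
naive_result = naive - big

# Error-compensated summation of exactly the same values
accurate_result = math.fsum([big] + [1e-6] * 10**6) - big
print(naive_result, accurate_result)
```

Since 1e-6 is about 8.39 gaps and each addition can only land on a multiple of the gap, every step effectively adds 8 x 2^-23, and a million such steps give 0.95367431640625.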

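This is why the input data is preprocessed first, ideally so that every feature has zero mean and the same variance:

Taking the red channel as an example: (R - 128) / 128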
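As a minimal sketch of both ideas, the snippet below applies the fixed (x - 128) / 128 scaling and, alternatively, per-channel standardization using statistics computed from the data. The random `images` array (100 images of 32 x 32 x 3) is a hypothetical stand-in for a real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batch of RGB images with pixel values 0..255
images = rng.integers(0, 256, size=(100, 32, 32, 3)).astype(np.float64)

# Fixed scaling from the text: maps 0..255 into [-1, 1)
fixed = (images - 128.0) / 128.0

# Per-channel standardization: zero mean and unit variance per channel
mean = images.mean(axis=(0, 1, 2))    # shape (3,)
std = images.std(axis=(0, 1, 2))      # shape (3,)
standardized = (images - mean) / std

print(fixed.min(), fixed.max())
print(standardized.mean(axis=(0, 1, 2)), standardized.std(axis=(0, 1, 2)))
```

The fixed scaling only approximates zero mean, since it assumes pixel values are centered on 128. Standardization uses the actual statistics, which should be computed on the training set and reused for the test set.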

Sparse matrices

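A matrix in which most elements are 0 is a sparse matrix, and it is easier to optimize (convergence is faster). One explanation is that extracting a single feature does not require so many neurons to be active at once, so suppressing the other neurons actually works better. L1 regularization is a commonly used way to induce sparsity.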

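L2 regularization penalizes the square of each weight, so the penalty and its gradient become very small as a weight approaches zero. Weights therefore end up close to zero but are not pushed all the way to 0, and L2 does not produce sparsity.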
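The difference shows up even in a one-dimensional problem. The sketch below is an illustration under simplifying assumptions, not part of the course code: for each unregularized weight `a` it minimizes 0.5 * (w - a)^2 plus a penalty, using the closed-form minimizers (soft-thresholding for lam * |w|, and a / (1 + 2 * lam) for lam * w^2). The function names and the value of `lam` are hypothetical.

```python
import numpy as np

def l1_solution(a, lam):
    # Minimizer of 0.5 * (w - a)**2 + lam * |w| (soft-thresholding)
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def l2_solution(a, lam):
    # Minimizer of 0.5 * (w - a)**2 + lam * w**2 (pure shrinkage)
    return a / (1.0 + 2.0 * lam)

a = np.array([-2.0, -0.3, 0.05, 0.4, 1.5])  # unregularized weights
lam = 0.5
w_l1 = l1_solution(a, lam)
w_l2 = l2_solution(a, lam)

print(w_l1)   # entries with |a| <= lam become exactly zero
print(w_l2)   # every entry is halved, none becomes zero
```

With L1, every weight whose magnitude is at most `lam` is set exactly to zero, while L2 only rescales the weights, so a nonzero weight stays nonzero.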

posted @ 2017-06-11 20:06 叠加态的猫