Some notes on the SVM part of assignment1

import numpy as np

def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.

  Inputs and outputs are the same as svm_loss_naive.
  """
  loss = 0.0e0
  dW = np.zeros(W.shape,dtype='float64') # initialize the gradient as zero

  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the structured SVM loss, storing the    #
  # result in loss.                                                           #
  #############################################################################
  num_train = X.shape[0]
  score = np.dot(X, W)  # (N, C) scores for every sample and class
  # subtract each row's correct-class score (the N*1 column broadcasts against N*C) and add the margin of 1
  loss_matrix = np.maximum(0, score - score[np.arange(num_train), np.array(y)].reshape(-1, 1) + 1)
  loss_matrix[np.arange(num_train), np.array(y)] = 0  # the correct class contributes no loss
  loss = np.sum(loss_matrix)
  loss /= num_train
  loss += 0.5 * reg * np.sum(W * W)  # L2 regularization
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################

  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the gradient for the structured SVM     #
  # loss, storing the result in dW.                                           #
  #                                                                           #
  # Hint: Instead of computing the gradient from scratch, it may be easier    #
  # to reuse some of the intermediate values that you used to compute the     #
  # loss.                                                                     #
  #############################################################################
  num_classes = W.shape[1]
  # coeff_mat[i, j] counts how many copies of X[i] go into column j of the gradient
  coeff_mat = np.zeros((num_train, num_classes))
  coeff_mat[loss_matrix > 0] = 1  # every violated margin adds X[i] once to its class column
  coeff_mat[range(num_train), list(y)] = 0
  coeff_mat[range(num_train), list(y)] = -np.sum(coeff_mat, axis=1)  # the correct class loses X[i] once per violation

  dW = (X.T).dot(coeff_mat)
  dW /= num_train
  dW += reg * W
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################

  return loss, dW
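
A quick way to sanity-check the finished function is to run it on tiny random data and compare dW against a numerical gradient. This is only a sketch with made-up shapes (the CS231n convention of W being D x C and X being N x D is assumed):

import numpy as np

np.random.seed(0)
N, D, C = 5, 4, 3                       # tiny toy problem
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)
W = np.random.randn(D, C) * 0.01
reg = 0.1

loss, dW = svm_loss_vectorized(W, X, y, reg)

# spot-check a few entries of dW against a centered numerical gradient
h = 1e-5
for _ in range(3):
    i, j = np.random.randint(D), np.random.randint(C)
    W[i, j] += h
    loss_plus, _ = svm_loss_vectorized(W, X, y, reg)
    W[i, j] -= 2 * h
    loss_minus, _ = svm_loss_vectorized(W, X, y, reg)
    W[i, j] += h
    print((loss_plus - loss_minus) / (2 * h), dW[i, j])   # the two numbers should be close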

There is one line in this code that is quite hard to understand at first:

  loss_matrix = np.maximum(0, score - score[np.arange(num_train), np.array(y)].reshape(-1, 1) + 1)
I stared at it for a long time before it clicked. Taken apart, it is not hard at all.
score[np.arange(num_train), np.array(y)] pulls the correct-class score out of the score matrix for every sample (in the figure, the small red box marks the score of the currently correct class). The result of this indexing is a length-N array, and .reshape(-1, 1) turns it into an N*1 column.
score - score[np.arange(num_train), np.array(y)].reshape(-1, 1): the two operands of this subtraction have different shapes, but thanks to broadcasting the N*1 column on the right is automatically replicated across columns to shape N*C.
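
A minimal sketch with made-up numbers (N = 3 samples, C = 4 classes) makes the two steps visible:

import numpy as np

score = np.array([[3.2, 5.1, -1.7, 2.0],
                  [1.3, 4.9,  2.0, 0.5],
                  [2.2, 2.5, -3.1, 6.0]])
y = np.array([0, 1, 2])                 # correct class index for each row
num_train = score.shape[0]

correct = score[np.arange(num_train), np.array(y)]   # array([ 3.2,  4.9, -3.1]), shape (3,)
correct = correct.reshape(-1, 1)                      # shape (3, 1), one column

# (3, 4) minus (3, 1): broadcasting copies the column across all 4 classes
loss_matrix = np.maximum(0, score - correct + 1)
loss_matrix[np.arange(num_train), np.array(y)] = 0    # zero out the margin of 1 each correct class gave itself
print(loss_matrix)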

  num_classes = W.shape[1]
  coeff_mat = np.zeros((num_train, num_classes))
  coeff_mat[loss_matrix > 0] = 1
  coeff_mat[range(num_train), list(y)] = 0
  coeff_mat[range(num_train), list(y)] = -np.sum(coeff_mat, axis=1)

  dW = (X.T).dot(coeff_mat)
  dW /= num_train 
  dW += reg * W

 

dW = (X.T).dot(coeff_mat): here dW is computed with vectorized operations. The coefficient matrix coeff_mat determines which rows of X get added into which column of dW, and how many times. Once you understand what the naive loop version does, you can see that this line cleverly pushes the loop into X.T; in my runs it was roughly 16x faster.
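
To see why X.T.dot(coeff_mat) is exactly the loop, here is a small comparison against a naive double loop; the toy data and the random stand-in for coeff_mat are assumptions for illustration only:

import numpy as np

np.random.seed(1)
N, D, C = 6, 4, 3
X = np.random.randn(N, D)
coeff_mat = np.random.randint(-2, 2, size=(N, C)).astype(float)   # stand-in coefficients

# loop version: add coeff_mat[i, j] copies of sample X[i] into column j of dW
dW_loop = np.zeros((D, C))
for i in range(N):
    for j in range(C):
        dW_loop[:, j] += coeff_mat[i, j] * X[i]

# vectorized version: one matrix product performs the same accumulation
dW_vec = X.T.dot(coeff_mat)

print(np.allclose(dW_loop, dW_vec))    # True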

 

A few times along the way the loss kept overflowing and raising errors. It turned out the learning rate was too large; changing the exponent from -5 to -6 fixed it. The underlying reason is that there is no learning-rate decay strategy here.
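
If you do want the learning rate to shrink over time instead of hand-tuning the exponent, a minimal step-decay sketch looks like this; the decay factor, batch size and toy data below are my own assumptions, not part of the assignment code:

import numpy as np

np.random.seed(0)
N, D, C = 500, 32, 10                  # toy training set
X_train = np.random.randn(N, D)
y_train = np.random.randint(C, size=N)
W = np.random.randn(D, C) * 0.001

learning_rate = 1e-6                   # the value that stopped the overflow
decay = 0.95                           # assumed per-epoch decay factor
reg = 0.5
batch_size = 200

for epoch in range(10):
    idx = np.random.choice(N, batch_size, replace=False)
    loss, dW = svm_loss_vectorized(W, X_train[idx], y_train[idx], reg)
    W -= learning_rate * dW            # plain SGD step
    learning_rate *= decay             # shrink the step size each epoch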


posted @ 2018-10-30 10:53  dgi