Spring 2017 Assignment 1








(1) Useful numpy APIs:




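np.argmax: returns the index of the largest element

np.array_split: commonly used to split the data into k folds for cross-validation

np.random.randn: draws standard-normal samples; commonly used to initialize weight matrices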
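
As a quick illustration of the three calls listed above, here is a minimal sketch on made-up arrays. The shapes, the seed, and the 0.001 scale are assumptions chosen for the example, not values taken from the assignment.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

# np.argmax: index of the largest entry; with axis=1 it works row by row.
scores = np.array([[0.1, 2.0, -1.0],
                   [3.0, 0.5, 0.2]])
best_class = np.argmax(scores, axis=1)  # one index per row

# np.array_split: split 10 sample indices into k = 3 folds.
# Unlike np.split, it accepts lengths that do not divide evenly.
folds = np.array_split(np.arange(10), 3)
fold_sizes = [len(f) for f in folds]

# Use fold 0 for validation and the remaining folds for training.
val_idx = folds[0]
train_idx = np.concatenate(folds[1:])

# np.random.randn: standard-normal samples, scaled down to give a small
# initial weight matrix of shape (D, C) = (5, 3).
W = 0.001 * np.random.randn(5, 3)
```

Rotating which fold plays the validation role gives the k rounds of cross-validation.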

(2) Three ways to compute the L2 distance matrix between the training set and the test set (two loops, one loop, no loops):

two loops (the brute-force approach):

def compute_distances_two_loops(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a nested loop over both the training data and the
    test data.

    Inputs:
    - X: A numpy array of shape (num_test, D) containing test data.

    Returns:
    - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
      is the Euclidean distance between the ith test point and the jth training
      point.
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):  # range replaces the Python 2 xrange
      for j in range(num_train):
        # TODO:
        # Compute the l2 distance between the ith test point and the jth
        # training point, and store the result in dists[i, j]. You should
        # not use a loop over dimension.
        dists[i, j] = np.sqrt(np.sum((X[i, :] - self.X_train[j, :]) ** 2))
        # END OF YOUR CODE
    return dists
one loop (uses numpy array broadcasting):

def compute_distances_one_loop(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a single loop over the test data.

    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):  # range replaces the Python 2 xrange
      # TODO:
      # Compute the l2 distance between the ith test point and all training
      # points, and store the result in dists[i, :].
      # X[i, :] has shape (D,) and broadcasts against every row of X_train.
      dists[i, :] = np.sqrt(np.sum((X[i, :] - self.X_train) ** 2, axis=1))
      # END OF YOUR CODE
    return dists
no loops (expand the L2 distance expression, then implement the expanded form with vectorized operations):

def compute_distances_no_loops(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using no explicit loops.

    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    # TODO:
    # Compute the l2 distance between all test points and all training
    # points without using any explicit loops, and store the result in
    # dists.
    #
    # You should implement this function using only basic array operations;
    # in particular you should not use functions from scipy.
    #
    # HINT: Try to formulate the l2 distance using matrix multiplication
    #       and two broadcast sums.
    # ||x - t||^2 = ||x||^2 + ||t||^2 - 2 * x.t, accumulated term by term.
    dists += np.sum(X ** 2, axis=1).reshape((num_test, 1))  # broadcast over columns
    dists += np.sum(self.X_train ** 2, axis=1)              # broadcast over rows
    dists += X.dot(self.X_train.T) * (-2)
    # Round-off can leave tiny negative values; clip them before the sqrt.
    dists = np.sqrt(np.maximum(dists, 0))
    # END OF YOUR CODE
    return dists
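
To make the expansion behind the no-loop version concrete, the sketch below compares it with the brute-force computation on small random arrays. The standalone function names `l2_two_loops` and `l2_no_loops` are hypothetical stand-ins for the class methods above, and the array sizes are arbitrary.

```python
import numpy as np

def l2_two_loops(X, X_train):
    """Brute-force reference: one distance per (test, train) pair."""
    dists = np.zeros((X.shape[0], X_train.shape[0]))
    for i in range(X.shape[0]):
        for j in range(X_train.shape[0]):
            dists[i, j] = np.sqrt(np.sum((X[i] - X_train[j]) ** 2))
    return dists

def l2_no_loops(X, X_train):
    """Vectorized form of ||x - t||^2 = ||x||^2 + ||t||^2 - 2 x.t"""
    test_sq = np.sum(X ** 2, axis=1).reshape(-1, 1)  # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)          # (num_train,)
    cross = X.dot(X_train.T)                         # (num_test, num_train)
    sq = test_sq + train_sq - 2.0 * cross            # two broadcast sums
    return np.sqrt(np.maximum(sq, 0.0))              # clip round-off negatives

np.random.seed(0)
X_train = np.random.randn(7, 4)
X_test = np.random.randn(3, 4)

slow = l2_two_loops(X_test, X_train)
fast = l2_no_loops(X_test, X_train)
max_diff = np.max(np.abs(slow - fast))  # should be at round-off level
```

The two results should agree up to floating-point round-off, which is a convenient sanity check before timing the vectorized version.
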
(3) Full code




2 Multiclass SVM







def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):  # range replaces the Python 2 xrange
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
      if j == y[i]:
        continue  # the correct class contributes no margin term
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        # Each violated margin pushes class j up and the correct class down.
        dW[:, j] += X[i, :]
        dW[:, y[i]] += -X[i, :]
        loss += margin

  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train
  dW /= num_train

  # Add regularization to the loss.
  loss += reg * np.sum(W * W)
  dW += 2 * reg * W
  # TODO:
  # Compute the gradient of the loss function and store it dW.
  # Rather than first computing the loss and then computing the derivative,
  # it may be simpler to compute the derivative at the same time that the
  # loss is being computed. As a result you may need to modify some of the
  # code above to compute the gradient.

  return loss, dW
def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.

  Inputs and outputs are the same as svm_loss_naive.
  """
  loss = 0.0
  dW = np.zeros(W.shape) # initialize the gradient as zero
  num_classes = W.shape[1]
  num_train = X.shape[0]
  # TODO:
  # Implement a vectorized version of the structured SVM loss, storing the
  # result in loss.
  scores = X.dot(W)
  rightClassScores = scores[range(0, num_train), list(y)].reshape(num_train, 1)
  lossMat = scores - rightClassScores + 1
  lossMat[lossMat < 0] = 0.0
  # Each row's correct-class entry equals delta = 1, hence the - num_train.
  # The regularization term keeps the loss consistent with svm_loss_naive.
  loss = (np.sum(lossMat) - num_train) / num_train + reg * np.sum(W * W)
  # END OF YOUR CODE

  # TODO:
  # Implement a vectorized version of the gradient for the structured SVM
  # loss, storing the result in dW.
  #
  # Hint: Instead of computing the gradient from scratch, it may be easier
  # to reuse some of the intermediate values that you used to compute the
  # loss.
  lossMat[lossMat > 0] = 1.0
  # The correct-class entry becomes minus the number of violating classes;
  # the + 1 cancels the correct class's own indicator in the row sum.
  lossMat[range(0, num_train), list(y)] = -np.sum(lossMat, axis=1) + 1
  dW = X.T.dot(lossMat) / num_train + 2 * reg * W
  # END OF YOUR CODE

  return loss, dW
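
One way to sanity-check an analytic gradient like the one returned above is a centered finite difference. The sketch below is self-contained: `svm_loss` is a compact re-statement of the vectorized loss written for this example, `numeric_gradient` is a hypothetical helper (not the assignment's own checker), and the data are random.

```python
import numpy as np

def svm_loss(W, X, y, reg):
    """Compact vectorized multiclass SVM loss and gradient (delta = 1)."""
    N = X.shape[0]
    scores = X.dot(W)
    correct = scores[np.arange(N), y].reshape(N, 1)
    margins = np.maximum(0.0, scores - correct + 1.0)
    margins[np.arange(N), y] = 0.0                 # correct class adds no loss
    loss = np.sum(margins) / N + reg * np.sum(W * W)
    mask = (margins > 0).astype(float)
    mask[np.arange(N), y] = -np.sum(mask, axis=1)  # minus the violation count
    dW = X.T.dot(mask) / N + 2 * reg * W
    return loss, dW

def numeric_gradient(f, W, h=1e-5):
    """Centered difference (f(W + h) - f(W - h)) / (2h), one entry at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

np.random.seed(0)
X = np.random.randn(6, 5)
y = np.random.randint(0, 3, size=6)
W = 0.01 * np.random.randn(5, 3)
reg = 0.1

_, analytic = svm_loss(W, X, y, reg)
numeric = numeric_gradient(lambda w: svm_loss(w, X, y, reg)[0], W)
max_abs_error = np.max(np.abs(analytic - numeric))
```

The hinge loss is not differentiable where a margin is exactly zero, so an occasional mismatch at such a point does not necessarily indicate a bug.
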
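- Implemented a fully vectorized loss function for the SVM
- Implemented a fully vectorized expression for the analytic gradient
- Checked the analytic gradient against a numerical gradient
- Tuned the hyperparameters on the validation set: learning rate and regularization strength
- Implemented the SGD algorithm to optimize the loss function
- Visualized the final learned weights. The linear classifier in effect learns one template per class (one column of the weight matrix, given the (D, C) shape used above) and classifies by template matching. To visualize a template, the weights are normalized and then multiplied by 255.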
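
The rescaling mentioned in the last bullet can be sketched as follows. The 32x32x3 image size and the 10 classes are assumptions made for the illustration, and `W` here is a random stand-in rather than trained weights.

```python
import numpy as np

np.random.seed(0)
D, C = 32 * 32 * 3, 10               # assumed image size and class count
W = 0.001 * np.random.randn(D, C)    # stand-in for learned weights

# One template per class: column c of W reshaped back to image dimensions.
templates = W.T.reshape(C, 32, 32, 3)

# Min-max normalize to [0, 1], then scale to [0, 255] for display.
w_min, w_max = np.min(templates), np.max(templates)
scaled = (templates - w_min) / (w_max - w_min)
images = (255.0 * scaled).astype(np.uint8)
```

Each `images[c]` could then be passed to an image display routine to inspect the template learned for class `c`.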

3 Softmax

(2) Two ways to implement the softmax loss and its gradient
def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss function, naive implementation (with loops)

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  # TODO: Compute the softmax loss and its gradient using explicit loops.
  # Store the loss in loss and the gradient in dW. If you are not careful
  # here, it is easy to run into numeric instability. Don't forget the
  # regularization!
  train_num = X.shape[0]
  dim = X.shape[1]
  class_num = W.shape[1]
  for i in range(0, train_num):
      scores = X[i].dot(W)
      scores -= np.max(scores)  # shift so the largest score is 0; avoids overflow in exp
      Sum = 0.0
      for j in range(0, class_num):
          Sum += np.exp(scores[j])  # np.exp replaces math.exp; only numpy is needed
      for j in range(0, class_num):
          if j == y[i]:
              # Correct class: probability minus 1
              dW[:, j] += (np.exp(scores[y[i]]) / Sum - 1) * X[i]
          else:
              # Other classes: probability only
              dW[:, j] += np.exp(scores[j]) / Sum * X[i]
      loss += -np.log(np.exp(scores[y[i]]) / Sum)
  loss /= train_num
  loss += reg * np.sum(W * W)  # L2 term matching the 2 * reg * W gradient below
  dW /= train_num
  dW += 2 * reg * W
  # END OF YOUR CODE

  return loss, dW
def softmax_loss_vectorized(W, X, y, reg):
  """
  Softmax loss function, vectorized version.

  Inputs and outputs are the same as softmax_loss_naive.
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  # TODO: Compute the softmax loss and its gradient using no explicit loops.
  # Store the loss in loss and the gradient in dW. If you are not careful
  # here, it is easy to run into numeric instability. Don't forget the
  # regularization!
  train_num = X.shape[0]
  dim = X.shape[1]
  class_num = W.shape[1]
  scores = X.dot(W)
  # Shift each row so its largest score is 0; the probabilities are unchanged.
  scores -= np.max(scores, axis=1, keepdims=True)
  exp_scores = np.exp(scores)
  tmp = exp_scores[range(0, train_num), y] / np.sum(exp_scores, axis=1)
  # Data loss plus the L2 term that matches the 2 * reg * W gradient below.
  loss = np.sum(-np.log(tmp)) / train_num + reg * np.sum(W * W)
  H = exp_scores / np.sum(exp_scores, axis=1).reshape((train_num, 1))
  H[range(0, train_num), y] -= 1  # probabilities minus the one-hot labels
  dW = X.T.dot(H) / train_num + 2 * reg * W
  # END OF YOUR CODE

  return loss, dW
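
The numeric instability that the TODO comment warns about is easy to reproduce with large scores. The sketch below contrasts a direct softmax with one that subtracts the row maximum first; the function names and the input values are made up for the example.

```python
import numpy as np

def softmax_unstable(scores):
    """Direct exp / sum; overflows once scores exceed roughly 709."""
    e = np.exp(scores)
    return e / np.sum(e, axis=1, keepdims=True)

def softmax_stable(scores):
    """Subtract the row maximum first; the result is mathematically identical."""
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

big = np.array([[1000.0, 1001.0, 1002.0]])
with np.errstate(over='ignore', invalid='ignore'):
    naive = softmax_unstable(big)   # exp overflows to inf, and inf / inf is nan
stable = softmax_stable(big)        # finite probabilities that sum to 1

small = np.array([[1.0, 2.0, 3.0]])
same_on_small = np.allclose(softmax_stable(small), softmax_unstable(small))
```

Because softmax is invariant to adding a constant to every score in a row, the shift changes nothing mathematically and only keeps `np.exp` within range.
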
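- Implemented a fully vectorized loss function for the Softmax classifier
- Implemented fully vectorized code for the analytic gradient
- Checked the implementation against a numerical gradient
- Tuned the learning rate and regularization strength on the validation set
- Optimized the loss function with SGD
- Visualized the final learned weights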
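
The SGD step listed above can be sketched as a plain minibatch loop. Everything here is illustrative: the toy three-class data, the hyperparameter values, and the compact `softmax_loss` helper are assumptions for the example, not the assignment's training code.

```python
import numpy as np

def softmax_loss(W, X, y, reg):
    """Vectorized softmax loss and gradient with L2 regularization."""
    N = X.shape[0]
    scores = X.dot(W)
    scores -= np.max(scores, axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= np.sum(probs, axis=1, keepdims=True)
    loss = -np.sum(np.log(probs[np.arange(N), y])) / N + reg * np.sum(W * W)
    probs[np.arange(N), y] -= 1
    dW = X.T.dot(probs) / N + 2 * reg * W
    return loss, dW

np.random.seed(0)
# Toy data: three well-separated Gaussian blobs in 2-D.
num_per_class, D, C = 50, 2, 3
means = np.array([[0.0, 3.0], [3.0, -2.0], [-3.0, -2.0]])
X = np.vstack([means[c] + np.random.randn(num_per_class, D) for c in range(C)])
y = np.repeat(np.arange(C), num_per_class)

W = 0.001 * np.random.randn(D, C)
learning_rate, reg, batch_size, num_iters = 0.1, 1e-3, 32, 200  # illustrative

loss_history = []
for it in range(num_iters):
    # Sample a minibatch (with replacement) and take one gradient step.
    batch_idx = np.random.choice(X.shape[0], batch_size)
    loss, dW = softmax_loss(W, X[batch_idx], y[batch_idx], reg)
    loss_history.append(loss)
    W -= learning_rate * dW

train_acc = np.mean(np.argmax(X.dot(W), axis=1) == y)
```

Tuning on a validation set then amounts to repeating this loop for several `learning_rate` and `reg` values and keeping the pair with the best validation accuracy.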



4 Two-layer neural network

(1) Computing the softmax loss and gradients (fully vectorized)


    # Compute the loss
    loss = None
    # TODO: Finish the forward pass, and compute the loss. This should include
    # both the data loss and L2 regularization for W1 and W2. Store the result
    # in the variable loss, which should be a scalar. Use the Softmax
    # classifier loss.
    # Subtracting the row maximum leaves the probabilities unchanged and
    # keeps np.exp from overflowing.
    exp_scores = np.exp(scores - np.max(scores, axis=1, keepdims=True))
    loss = np.sum(-np.log(exp_scores[range(0, N), y] / np.sum(exp_scores, axis=1)))
    loss /= N
    loss += reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
    # END OF YOUR CODE
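a. The derivative of a matrix linear transformation is a frequently used result that is worth memorizing (it can be derived by flattening the matrices and working with the Jacobian).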
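
A sketch of the standard result, assuming the layer has the form $Y = XW + b$ with $X \in \mathbb{R}^{N \times D}$, $W \in \mathbb{R}^{D \times M}$, $b \in \mathbb{R}^{M}$ and a scalar loss $L$ whose upstream gradient $\partial L / \partial Y \in \mathbb{R}^{N \times M}$ is already known (the symbols are chosen for this illustration):

```latex
\frac{\partial L}{\partial W} = X^{\top} \frac{\partial L}{\partial Y},
\qquad
\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y} \, W^{\top},
\qquad
\frac{\partial L}{\partial b} = \sum_{i=1}^{N} \left( \frac{\partial L}{\partial Y} \right)_{i,:}
```

A shape check helps when recalling these: each gradient has the same shape as the quantity it differentiates with respect to. In the backward-pass code of this section, `h.T.dot(gradsOfLossByScore)` and `gradsOfLossByScore.dot(W2.T)` are instances of the first two identities.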


b. Derivative of the ReLU layer
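
As a minimal sketch of the ReLU backward step: the local derivative is 1 where the input is positive and 0 elsewhere (taking 0 at exactly zero, a common convention), so the upstream gradient is simply masked. The array values below are made up for the illustration.

```python
import numpy as np

a1 = np.array([[-1.5, 0.0, 2.0],
               [ 0.3, -0.2, 4.0]])   # pre-activation values (hypothetical)
h = np.maximum(0, a1)                # ReLU forward pass

upstream = np.ones_like(a1)          # stand-in for dL/dh from the next layer

# Backward pass: keep the upstream gradient only where the input was positive.
grad_a1 = upstream * (a1 > 0)

# The mask can equally be computed from the output, since h > 0 iff a1 > 0.
same_mask = np.array_equal(h > 0, a1 > 0)
```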





    # Backward pass: compute gradients
    grads = {}
    # TODO: Compute the backward pass, computing the derivatives of the weights
    # and biases. Store the results in the grads dictionary. For example,
    # grads['W1'] should store the gradient on W1, and be a matrix of same size
    # Gradient of the loss with respect to the scores: softmax probabilities
    # minus 1 at the correct class
    gradsOfLossByScore = exp_scores / np.sum(exp_scores, axis=1).reshape((N,1))
    gradsOfLossByScore[range(0, N), y] -= 1
    # Gradients for the second layer
    gradsOfLossByb2 = gradsOfLossByScore
    grads['b2'] = np.sum(gradsOfLossByb2, axis=0) / N
    grads['W2'] = h.T.dot(gradsOfLossByScore) / N + 2 * reg * W2
    # Backpropagate into the hidden layer, then through the ReLU
    gradsOfLossByh = gradsOfLossByScore.dot(W2.T)
    gradsOfLossBya1 = gradsOfLossByh * (h > 0)
    # Gradients for the first layer
    gradsOfLossByb1 = gradsOfLossBya1
    grads['b1'] = np.sum(gradsOfLossByb1, axis=0) / N
    grads['W1'] = X.T.dot(gradsOfLossBya1) / N + 2 * reg * W1
    # END OF YOUR CODE
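
The fragments above rely on `scores`, `h`, and the parameters being defined earlier in the method. For a self-contained view, the sketch below assumes the usual affine, ReLU, affine, softmax layout and checks the resulting gradients by centered differences. The function `two_layer_loss`, the layer sizes, and the data are hypothetical.

```python
import numpy as np

def two_layer_loss(params, X, y, reg):
    W1, b1, W2, b2 = params['W1'], params['b1'], params['W2'], params['b2']
    N = X.shape[0]
    # Forward pass: affine -> ReLU -> affine -> softmax
    a1 = X.dot(W1) + b1
    h = np.maximum(0, a1)
    scores = h.dot(W2) + b2
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= np.sum(probs, axis=1, keepdims=True)
    loss = -np.sum(np.log(probs[np.arange(N), y])) / N
    loss += reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
    # Backward pass, following the same steps as the fragment above
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N
    grads = {'W2': h.T.dot(dscores) + 2 * reg * W2,
             'b2': np.sum(dscores, axis=0)}
    da1 = dscores.dot(W2.T) * (a1 > 0)   # through the ReLU
    grads['W1'] = X.T.dot(da1) + 2 * reg * W1
    grads['b1'] = np.sum(da1, axis=0)
    return loss, grads

np.random.seed(0)
N, D, H, C = 5, 4, 10, 3
params = {'W1': 0.1 * np.random.randn(D, H), 'b1': np.zeros(H),
          'W2': 0.1 * np.random.randn(H, C), 'b2': np.zeros(C)}
X = np.random.randn(N, D)
y = np.random.randint(0, C, size=N)
reg = 0.05

loss, grads = two_layer_loss(params, X, y, reg)

# Centered-difference check on every parameter entry.
eps = 1e-5
max_abs_error = 0.0
for name, P in params.items():
    for idx in np.ndindex(*P.shape):
        old = P[idx]
        P[idx] = old + eps
        loss_plus, _ = two_layer_loss(params, X, y, reg)
        P[idx] = old - eps
        loss_minus, _ = two_layer_loss(params, X, y, reg)
        P[idx] = old  # restore
        numeric = (loss_plus - loss_minus) / (2 * eps)
        max_abs_error = max(max_abs_error, abs(numeric - grads[name][idx]))
```

Dividing `dscores` by `N` once up front is equivalent to dividing each gradient by `N` afterwards, as the fragment above does.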

















5 Image feature experiments

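This part verifies that extracting image features can yield higher classification accuracy. The extracted features are HOG and a color histogram, and they are fed to the SVM and the two-layer neural network implemented earlier. After hyperparameter tuning, the neural network exceeded 60% accuracy on the validation set and reached 58.3% on the test set.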





