Convolutional Neural Network: A Python Implementation

The implementation below is based on Andrew Ng's course assignment.

1. Padding

import numpy as np

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image, 
    as illustrated in Figure 1.
    
    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions
    
    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """
    

    # Pad only the height and width axes; the batch and channel axes get (0, 0)
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode='constant', constant_values=0)

    
    return X_pad

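As the zero_pad function shows, padding is applied only to the image matrix itself, i.e. the height and width axes. m is the number of images and n_C is the number of channels; neither of these two dimensions needs to be modified.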
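
To make the shape change concrete, here is a minimal sketch that calls np.pad directly on a toy batch. The array contents and sizes are made up for illustration and are not part of the original assignment.

```python
import numpy as np

# Hypothetical toy batch: 2 images, 3x3 pixels, 2 channels, all ones.
X = np.ones((2, 3, 3, 2))
pad = 1

# Pad only the height and width axes; batch and channel axes get (0, 0).
X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
               mode="constant", constant_values=0)

print(X.shape)            # (2, 3, 3, 2)
print(X_pad.shape)        # (2, 5, 5, 2)
print(X_pad[0, :, :, 0])  # 3x3 block of ones surrounded by a border of zeros
```

Height and width each grow by 2 * pad, while the batch and channel sizes stay the same.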

2. Convolution: a single step

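def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter to a single slice of the previous layer's output.

    a_slice_prev -- slice of the input, shape (f, f, n_C_prev)
    W -- filter weights, shape (f, f, n_C_prev)
    b -- filter bias, shape (1, 1, 1)
    """
    # Element-wise product of the slice and the filter
    s = a_slice_prev * W
    # Sum over all entries of the product
    Z = np.sum(s)
    # Add the bias; squeeze first so float() receives a 0-d array
    Z = Z + float(np.squeeze(b))

    return Z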

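In the convolution step, a_slice_prev is the current window taken from the image matrix and W holds the filter's parameters. We multiply the two element-wise, sum the result, and then add the bias b.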
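
A small worked example may help check the arithmetic by hand. The numbers below are invented for illustration; the shapes assume f = 2 and two input channels.

```python
import numpy as np

# Hypothetical window with values 1..8, shape (f, f, n_C_prev) = (2, 2, 2).
a_slice_prev = np.arange(1.0, 9.0).reshape(2, 2, 2)
W = np.ones((2, 2, 2))     # filter of ones, so the product equals the window
b = np.array([[[0.5]]])    # bias with shape (1, 1, 1)

# Element-wise product, sum over all entries, then add the scalar bias.
Z = np.sum(a_slice_prev * W) + b.item()
print(Z)  # 36.5, since 1 + 2 + ... + 8 = 36 and the bias adds 0.5
```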

3. Convolution forward pass

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function
    
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
        
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    
    ### START CODE HERE ###
    # Retrieve dimensions from A_prev's shape (≈1 line)  
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape
    
    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    pad = hparameters['pad']
    
    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    n_H = int((n_H_prev + 2 * pad - f) / stride + 1)
    n_W = int((n_W_prev + 2 * pad - f) / stride + 1)

    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m, n_H, n_W, n_C))
    
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)
    
    for i in range(m):                               # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]                               # Select ith training example's padded activation
        for h in range(n_H):                           # loop over vertical axis of the output volume
            for w in range(n_W):                       # loop over horizontal axis of the output volume
                for c in range(n_C):                   # loop over channels (= #filters) of the output volume
                    
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = h * stride + f
                    horiz_start = w * stride
                    horiz_end = w * stride + f
                    
                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start : vert_end, horiz_start : horiz_end]
                    
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev,W[:,:,:,c],b[:,:,:,c])
                                        
    ### END CODE HERE ###
    
    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))
    
    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)
    
    return Z, cache

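The arguments are the input activations A_prev, the weights W, the biases b, and the hyperparameters pad and stride. We first unpack all the shape parameters by tuple assignment and initialize the output from those shapes. We then iterate over every window of every image: a window starts at h * stride (or w * stride) and ends f positions later, which gives its vertical and horizontal bounds. Each window is passed through the single-step convolution to produce one output value. Note that the same filter parameters are shared by every window in the image.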
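
The output-size formula and the window bounds can be checked in isolation. The two helpers below, conv_output_size and window_bounds, are hypothetical names introduced only for this sketch; they mirror the arithmetic used in conv_forward along one axis.

```python
def conv_output_size(n_prev, f, pad, stride):
    """Hypothetical helper: floor((n_prev + 2*pad - f) / stride) + 1."""
    return (n_prev + 2 * pad - f) // stride + 1


def window_bounds(n_out, f, stride):
    """Hypothetical helper: (start, end) of each window along one axis."""
    return [(i * stride, i * stride + f) for i in range(n_out)]


# Assumed example values: 7-pixel axis, 3x3 filter, pad 1, stride 2.
n_H = conv_output_size(n_prev=7, f=3, pad=1, stride=2)
print(n_H)                                 # 4
print(window_bounds(n_H, f=3, stride=2))   # [(0, 3), (2, 5), (4, 7), (6, 9)]
```

With pad = 1 the padded axis has length 9, so the last window ends exactly at the padded border.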

4. Pooling layer

def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Implements the forward pass of the pooling layer
    
    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    
    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters 
    """
    
    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]
    
    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    
    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))              
    
    ### START CODE HERE ###
    for i in range(m):                         # loop over the training examples
        for h in range(n_H):                     # loop on the vertical axis of the output volume
            for w in range(n_W):                 # loop on the horizontal axis of the output volume
                for c in range(n_C):             # loop over the channels of the output volume
                    
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = A_prev[i, vert_start : vert_end, horiz_start : horiz_end, c]
                    
                    # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
    
    ### END CODE HERE ###
    
    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)
    
    # Making sure your output shape is correct
    assert(A.shape == (m, n_H, n_W, n_C))
    
    return A, cache

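The pooling layer's computation is largely the same as the convolution layer's. The one thing to note is the mode argument, which selects between max and average pooling.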
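
As a rough illustration of the two modes, the sketch below pools a single 4x4 channel with f = 2 and stride = 2. The input values are invented, and the loop is a simplified single-channel version of the one in pool_forward.

```python
import numpy as np

# Hypothetical single-channel input with values 0..15.
A_prev = np.arange(16.0).reshape(4, 4)
f, stride = 2, 2
n_out = (A_prev.shape[0] - f) // stride + 1

A_max = np.zeros((n_out, n_out))
A_avg = np.zeros((n_out, n_out))
for h in range(n_out):
    for w in range(n_out):
        # Same window arithmetic as in pool_forward
        window = A_prev[h * stride:h * stride + f, w * stride:w * stride + f]
        A_max[h, w] = np.max(window)
        A_avg[h, w] = np.mean(window)

print(A_max)  # [[ 5.  7.] [13. 15.]]
print(A_avg)  # [[ 2.5  4.5] [10.5 12.5]]
```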

5. Convolution layer backward pass

def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function
    
    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()
    
    Returns:
    dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
               numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    dW -- gradient of the cost with respect to the weights of the conv layer (W)
          numpy array of shape (f, f, n_C_prev, n_C)
    db -- gradient of the cost with respect to the biases of the conv layer (b)
          numpy array of shape (1, 1, 1, n_C)
    """
    
    ### START CODE HERE ###
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache
    
    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape
    
    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']
    
    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape
    
    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros(A_prev.shape)                           
    dW = np.zeros(W.shape)
    db = np.zeros(b.shape)

    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)
    
    for i in range(m):                       # loop over the training examples
        
        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]
        
        for h in range(n_H):                   # loop over vertical axis of the output volume
            for w in range(n_W):               # loop over horizontal axis of the output volume
                for c in range(n_C):           # loop over the channels of the output volume
                    
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = h * stride + f
                    horiz_start = w * stride
                    horiz_end = w * stride + f
                    
                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start : vert_end, horiz_start : horiz_end, : ]

                    # Update gradients for the window and the filter's parameters using the code formulas given above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[ i, h, w ,c]

                    dW[:,:,:,c] += a_slice * dZ[ i, h, w ,c]
                    db[:,:,:,c] += dZ[ i, h, w ,c]
                    
        # Set the ith training example's dA_prev to the unpadded da_prev_pad.
        # Explicit end indices are used because X[pad:-pad] yields an empty slice when pad == 0.
        dA_prev[i, :, :, :] = da_prev_pad[pad:pad + n_H_prev, pad:pad + n_W_prev, :]
    ### END CODE HERE ###
    
    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))
    
    return dA_prev, dW, db

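The computation of dW and db here is similar to that in an ordinary backpropagation network. To accumulate the gradients, we iterate over every position of the image and add one contribution per window.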
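
One way to gain confidence in these update rules is a numerical gradient check. The sketch below is a simplified single-image, single-channel version without padding or bias; conv2d and conv2d_backward are hypothetical names written for this check and are not the functions above. It assumes the loss is the sum of all outputs, so dZ is an array of ones.

```python
import numpy as np


def conv2d(x, w, stride=1):
    """Hypothetical single-channel convolution, no padding, no bias."""
    f = w.shape[0]
    n_out = (x.shape[0] - f) // stride + 1
    z = np.zeros((n_out, n_out))
    for h in range(n_out):
        for v in range(n_out):
            rows = slice(h * stride, h * stride + f)
            cols = slice(v * stride, v * stride + f)
            z[h, v] = np.sum(x[rows, cols] * w)
    return z


def conv2d_backward(dz, x, w, stride=1):
    """Accumulate dx and dw window by window, as in conv_backward."""
    f = w.shape[0]
    dx = np.zeros_like(x)
    dw = np.zeros_like(w)
    for h in range(dz.shape[0]):
        for v in range(dz.shape[1]):
            rows = slice(h * stride, h * stride + f)
            cols = slice(v * stride, v * stride + f)
            dx[rows, cols] += w * dz[h, v]
            dw += x[rows, cols] * dz[h, v]
    return dx, dw


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
w = rng.standard_normal((3, 3))

# Assumed loss: L = sum(Z), hence dL/dZ is all ones.
dz = np.ones((3, 3))
dx, dw = conv2d_backward(dz, x, w)

# Central finite differences on each filter weight.
eps = 1e-6
dw_num = np.zeros_like(w)
for a in range(3):
    for c in range(3):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[a, c] += eps
        w_minus[a, c] -= eps
        dw_num[a, c] = (conv2d(x, w_plus).sum() - conv2d(x, w_minus).sum()) / (2 * eps)

max_err = np.max(np.abs(dw - dw_num))
print(max_err)  # expected to be tiny, since the loss is linear in w
```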

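6. Pooling layer backward pass

Once the principle of the pooling layer is understood, we build the backward pass around its behavior. For max pooling, we create a mask that marks which entry of the window was actually selected, i.e. the maximum.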

def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.
    
    Arguments:
    x -- Array of shape (f, f)
    
    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """
    
    ### START CODE HERE ### (≈1 line)
    mask = (x == np.max(x))
    ### END CODE HERE ###
    
    return mask
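
One detail worth seeing in a small example is how the equality-based mask behaves when the maximum appears more than once. The window below is invented for illustration. The first_only variant is an assumption added here for comparison and is not part of the original code.

```python
import numpy as np

# Hypothetical window in which the maximum value 3.0 occurs twice.
x = np.array([[1.0, 3.0],
              [3.0, 2.0]])

# Mask as computed in create_mask_from_window: True at every maximum.
mask = (x == np.max(x))
print(mask)        # [[False  True] [ True False]]
print(mask.sum())  # 2, so a tied window passes the gradient to both entries

# Assumed alternative: mark only the first maximum in row-major order.
first_only = np.zeros(x.shape, dtype=bool)
first_only[np.unravel_index(np.argmax(x), x.shape)] = True
print(first_only)  # [[False  True] [False False]]
```

With real-valued activations exact ties are uncommon, so the simpler equality mask is usually adequate.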

For average pooling, the gradient has to be distributed evenly across every entry in the window.

def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape
    
    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz
    
    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """
    
    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape
    
    # Compute the value to distribute on the matrix (≈1 line)
    average = n_H * n_W
    
    # Create a matrix where every entry is the "average" value (≈1 line)
    a = dz / average * np.ones((n_H, n_W))
    ### END CODE HERE ###
    
    return a
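
A quick numeric check of this even split, using made-up values: each entry receives dz divided by the number of entries, so the entries sum back to dz.

```python
import numpy as np

dz = 2.0        # hypothetical upstream gradient for one output position
shape = (2, 2)  # window shape (f, f)

# Every entry gets dz / (number of entries in the window).
a = np.full(shape, dz / (shape[0] * shape[1]))
print(a)        # [[0.5 0.5] [0.5 0.5]]
print(a.sum())  # 2.0
```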

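We can then traverse the image in the same way as in the convolution backward pass, accumulating the gradient contributed by each window to obtain this layer's input gradient (dA_prev in the code).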

def pool_backward(dA, cache, mode = "max"):
    """
    Implements the backward pass of the pooling layer
    
    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters 
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    
    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """
    
    ### START CODE HERE ###
    
    # Retrieve information from cache (≈1 line)
    (A_prev, hparameters) = cache
    
    # Retrieve hyperparameters from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    f = hparameters['f']
    
    # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    m, n_H, n_W, n_C = dA.shape
    
    # Initialize dA_prev with zeros (≈1 line)
    dA_prev = np.zeros(A_prev.shape)
    
    for i in range(m):                       # loop over the training examples
        
        # select training example from A_prev (≈1 line)
        a_prev = A_prev[i]
        
        for h in range(n_H):                   # loop on the vertical axis
            for w in range(n_W):               # loop on the horizontal axis
                for c in range(n_C):           # loop over the channels (depth)
                    
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # Compute the backward propagation in both modes.
                    if mode == "max":
                        
                        # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                        a_prev_slice = a_prev[vert_start : vert_end, horiz_start : horiz_end, c]
                        # Create the mask from a_prev_slice (≈1 line)
                        mask = create_mask_from_window(a_prev_slice)
                        # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += mask * dA[i, h, w, c]
                        
                    elif mode == "average":
                        
                        # Get the value a from dA (≈1 line)
                        da = dA[i, h, w, c]
                        # Define the shape of the filter as fxf (≈1 line)
                        shape = (f, f)
                        # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da, shape)
                        
    ### END CODE ###
    
    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)
    
    return dA_prev
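
The routing of gradients in both modes can be traced on a tiny case. This sketch uses a single 4x4 channel with f = 2 and stride = 2; the input and the upstream gradient dA are invented values, and the loop is a simplified version of the one in pool_backward.

```python
import numpy as np

# Hypothetical single-channel input and upstream gradient.
A_prev = np.arange(16.0).reshape(4, 4)
dA = np.array([[1.0, 2.0],
               [3.0, 4.0]])
f, stride = 2, 2

dA_prev_max = np.zeros_like(A_prev)
dA_prev_avg = np.zeros_like(A_prev)
for h in range(dA.shape[0]):
    for w in range(dA.shape[1]):
        rows = slice(h * stride, h * stride + f)
        cols = slice(w * stride, w * stride + f)
        window = A_prev[rows, cols]
        # Max mode: only the position of the maximum receives the gradient.
        dA_prev_max[rows, cols] += (window == np.max(window)) * dA[h, w]
        # Average mode: the gradient is split evenly over the window.
        dA_prev_avg[rows, cols] += dA[h, w] / (f * f)

print(dA_prev_max)  # nonzero only at (1,1), (1,3), (3,1), (3,3)
print(dA_prev_avg)  # each 2x2 block holds dA[h, w] / 4
```

In both modes the entries of the result sum to the total of dA, because the windows here do not overlap and contain no ties.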

Posted 2020-03-02 10:07 by 金思远
