Andrew Ng Deep Learning Notes (6): Convolutional Neural Networks


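1) The input is an image, for example an RGB image, with shape n_H * n_W * n_C, where n_H and n_W are the height and width and n_C is the number of channels. An RGB image is three color channels stacked together, so n_C is 3.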

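2) The input is then convolved with several filters. Suppose there are k filters, each of shape f * f * n_C, where f is the filter's height and width and n_C is its number of channels. A filter's channel count must equal that of the volume it is convolved with, otherwise the shapes do not match and the convolution cannot be computed, so here n_C is 3. At each position a filter produces a single number (multiply corresponding elements, then sum). Sliding one filter over the whole input yields an n_new_H * n_new_W matrix. Stacking the k matrices produced by the k filters gives an n_new_H * n_new_W * k volume, where k is the new number of channels, so the shape can also be written n_new_H * n_new_W * n_new_C. This volume is the input to the next layer.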


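4) The output of the previous layer is fed into a pooling layer, whose main purpose is to shrink the height and width of its input and so reduce computation. The two common kinds are max pooling and average pooling.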
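The shape bookkeeping in the steps above can be sketched in a few lines. This is only an illustration, not code from the course: the helper names conv_output_shape and pool_output_shape are hypothetical, the 64 * 64 * 3 input and the filter counts are made-up numbers, and the output-size formula floor((n + 2*pad - f) / stride) + 1 is the one used by conv_forward later in this post.

```python
def conv_output_shape(n_H, n_W, n_C, f, k, pad=0, stride=1):
    """Shape after convolving an n_H * n_W * n_C volume with k filters of shape f * f * n_C."""
    n_new_H = (n_H + 2 * pad - f) // stride + 1
    n_new_W = (n_W + 2 * pad - f) // stride + 1
    # The number of output channels equals the number of filters.
    return n_new_H, n_new_W, k


def pool_output_shape(n_H, n_W, n_C, f, stride):
    """Shape after pooling with an f * f window; the channel count is unchanged."""
    n_new_H = (n_H - f) // stride + 1
    n_new_W = (n_W - f) // stride + 1
    return n_new_H, n_new_W, n_C


image_shape = (64, 64, 3)                                     # hypothetical RGB input
conv_shape = conv_output_shape(*image_shape, f=5, k=8)        # (60, 60, 8)
pooled_shape = pool_output_shape(*conv_shape, f=2, stride=2)  # (30, 30, 8)
print(conv_shape, pooled_shape)
```

Note that the convolution changes the channel count (3 becomes 8, the number of filters), while pooling only changes the height and width.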

import numpy as np

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image,
    as illustrated in Figure 1.

    Arguments:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions

    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """
    ### START CODE HERE ### (≈ 1 line)
    # Pad only the height and width axes; the batch and channel axes get (0, 0).
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)
    ### END CODE HERE ###
    return X_pad
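To see what the padding does to the shape, here is a small standalone check. It repeats the same np.pad call on a made-up batch of all-ones "images" (the array sizes are arbitrary choices for illustration).

```python
import numpy as np

# A batch of 2 tiny "images": height 3, width 3, 2 channels, all ones.
X = np.ones((2, 3, 3, 2))
pad = 1

# Same call as in zero_pad: pad only the height and width axes.
X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)

print(X.shape, "->", X_pad.shape)   # (2, 3, 3, 2) -> (2, 5, 5, 2)
print(X_pad[0, :, :, 0])            # a border of zeros around a 3 * 3 block of ones
```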

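1. The volume output by the previous layer has shape n_H * n_W * n_C, so it has n_C channels. Suppose the convolutional layer has k filters. For a filter to be convolved with this volume, it must also have n_C channels, so each filter has shape f * f * n_C.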

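2. Convolving one filter over the whole volume yields an n_new_H * n_new_W matrix. Stacking the k matrices produced by the k filters gives an n_new_H * n_new_W * k volume, where k is the new number of channels, so the shape can also be written n_new_H * n_new_W * n_new_C. This volume is the input to the next layer, and the process repeats from layer to layer.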
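The two points above can be checked with a deliberately naive sketch: convolve one volume with k filters, one filter at a time, and stack the resulting matrices. The sizes, the random data, and the variable names are assumptions for illustration, and there is no padding, stride, or bias here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_H, n_W, n_C = 6, 6, 3      # input volume
f, k = 3, 4                  # filter size and number of filters

volume = rng.standard_normal((n_H, n_W, n_C))
filters = rng.standard_normal((k, f, f, n_C))   # each filter has n_C channels

n_new_H, n_new_W = n_H - f + 1, n_W - f + 1     # no padding, stride 1

feature_maps = []
for filt in filters:
    fmap = np.zeros((n_new_H, n_new_W))
    for h in range(n_new_H):
        for w in range(n_new_W):
            # One step: multiply corresponding elements, then sum to a single number.
            fmap[h, w] = np.sum(volume[h:h + f, w:w + f, :] * filt)
    feature_maps.append(fmap)

# Stack the k matrices along a new last axis: the channels of the output volume.
output = np.stack(feature_maps, axis=-1)
print(output.shape)   # (4, 4, 4), i.e. n_new_H * n_new_W * k
```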

# GRADED FUNCTION: conv_single_step
def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation
    of the previous layer.

    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)

    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    # Element-wise product between a_slice and W. Do not add the bias yet.
    s = a_slice_prev * W
    # Sum over all entries of the volume s.
    Z = np.sum(s)
    # Add bias b to Z. Squeeze b to a 0-d array before float() so that Z is a scalar
    # (calling float() directly on a (1, 1, 1) array is deprecated in recent NumPy).
    Z = Z + float(np.squeeze(b))
    ### END CODE HERE ###
    return Z
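A tiny worked example of a single convolution step, with numbers chosen so the result can be checked by hand. The values are made up, and the arithmetic is written inline rather than calling conv_single_step so that the snippet runs on its own.

```python
import numpy as np

a_slice_prev = np.ones((2, 2, 3))   # f = 2, n_C_prev = 3, so 12 entries, all 1
W = np.full((2, 2, 3), 0.5)         # every weight is 0.5
b = np.array([[[1.0]]])             # bias of shape (1, 1, 1)

# 12 products of 1 * 0.5 sum to 6.0, and adding the bias gives 7.0.
Z = np.sum(a_slice_prev * W) + b.item()
print(Z)   # 7.0
```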

# GRADED FUNCTION: conv_forward

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function

    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"

    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    ### START CODE HERE ###
    # Retrieve dimensions from A_prev's shape (≈1 line)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape
    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    pad = hparameters['pad']
    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
    n_W = int((n_W_prev - f + 2 * pad) / stride) + 1
    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m, n_H, n_W, n_C))
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)
    for i in range(m):                                 # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]                     # Select ith training example's padded activation
        for h in range(n_H):                           # loop over vertical axis of the output volume
            for w in range(n_W):                       # loop over horizontal axis of the output volume
                for c in range(n_C):                   # loop over channels (= #filters) of the output volume
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the (3D) slice of a_prev_pad, keeping all input channels. (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[..., c], b[..., c])
    ### END CODE HERE ###
    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))
    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)
    return Z, cache
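The four nested loops above are easy to follow but slow. As a cross-check, the same forward pass can be written without explicit loops. The sketch below is not part of the course assignment: the function name conv_forward_vectorized is hypothetical, it takes stride and pad as plain arguments instead of the hparameters dictionary, it returns no cache, and it relies on numpy.lib.stride_tricks.sliding_window_view, which requires NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_forward_vectorized(A_prev, W, b, stride=1, pad=0):
    """Loop-free forward convolution (illustrative sketch, hypothetical name).

    A_prev: (m, n_H_prev, n_W_prev, n_C_prev)
    W:      (f, f, n_C_prev, n_C)
    b:      (1, 1, 1, n_C)
    Returns Z of shape (m, n_H, n_W, n_C).
    """
    f = W.shape[0]
    # Zero-pad the height and width axes only.
    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")
    # All f * f windows at stride 1: shape (m, H_out, W_out, n_C_prev, f, f).
    windows = sliding_window_view(A_pad, (f, f), axis=(1, 2))
    # Keep every stride-th window along height and width.
    windows = windows[:, ::stride, ::stride]
    # Multiply each window by each filter and sum over (channel, row, column).
    Z = np.einsum("mhwcij,ijcn->mhwn", windows, W)
    # The bias broadcasts over the batch, height and width axes.
    return Z + b


rng = np.random.default_rng(0)
A_prev = rng.standard_normal((2, 5, 7, 3))
W = rng.standard_normal((3, 3, 3, 4))
b = rng.standard_normal((1, 1, 1, 4))
Z = conv_forward_vectorized(A_prev, W, b, stride=2, pad=1)
print(Z.shape)   # (2, 3, 4, 4)
```

The loop version remains the better reference for understanding what is computed; a vectorized form like this is mainly useful for checking results on larger inputs.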

The pooling (POOL) layer reduces the height and width of the input. It reduces computation and helps make feature detectors more invariant to where a feature appears in the input. The two types of pooling layers are:

1) Max-pooling layer: slides an f * f window over the input and stores the max value of the window in the output.

2) Average-pooling layer: slides an f * f window over the input and stores the average value of the window in the output.

These pooling layers have no parameters for backpropagation to train. However, they have hyperparameters such as the window size f, which specifies the height and width of the f * f window over which the max or average is computed.
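A small worked example makes the two modes concrete. The 4 * 4 input below is made up, with f = 2 and stride = 2. The reshape trick used here is only a shortcut for this special case (stride equal to f, and height and width divisible by f); it is not how pool_forward is implemented.

```python
import numpy as np

x = np.arange(16, dtype=float).reshape(4, 4)
# [[ 0.  1.  2.  3.]
#  [ 4.  5.  6.  7.]
#  [ 8.  9. 10. 11.]
#  [12. 13. 14. 15.]]

f = 2
# Split into non-overlapping f * f blocks: axes are (block row, row in block, block col, col in block).
blocks = x.reshape(4 // f, f, 4 // f, f)

max_pooled = blocks.max(axis=(1, 3))
avg_pooled = blocks.mean(axis=(1, 3))

print(max_pooled)   # [[ 5.  7.] [13. 15.]]
print(avg_pooled)   # [[ 2.5  4.5] [10.5 12.5]]
```

Either way a 4 * 4 input becomes 2 * 2, and no weights are involved.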


# GRADED FUNCTION: pool_forward

def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Implements the forward pass of the pooling layer

    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters
    """
    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]
    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))
    ### START CODE HERE ###
    for i in range(m):                           # loop over the training examples
        for h in range(n_H):                     # loop on the vertical axis of the output volume
            for w in range(n_W):                 # loop on the horizontal axis of the output volume
                for c in range(n_C):             # loop over the channels of the output volume
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = stride * h
                    vert_end = vert_start + f
                    horiz_start = stride * w
                    horiz_end = horiz_start + f
                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
    ### END CODE HERE ###
    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)
    # Making sure your output shape is correct
    assert(A.shape == (m, n_H, n_W, n_C))
    return A, cache

9. 1 * 1 filter

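If the role of a pooling layer is to adjust n_W and n_H, then the role of 1 * 1 filters (more precisely, n_new_C filters of shape 1 * 1 * n_C) is to adjust the number of channels.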
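A minimal sketch of this idea, with made-up sizes and random data. Because a 1 * 1 filter covers a single pixel, the whole layer reduces to a matrix product over the channel axis. Storing the n_new_C filters as one (n_C, n_new_C) matrix is a choice made here for illustration, and the bias and activation are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_H, n_W, n_C = 2, 4, 4, 8
n_new_C = 3

A_prev = rng.standard_normal((m, n_H, n_W, n_C))
# n_new_C filters of shape 1 * 1 * n_C, stored as one (n_C, n_new_C) matrix.
W = rng.standard_normal((n_C, n_new_C))

# At every pixel, a 1 * 1 filter takes a weighted sum over the n_C input channels,
# so the whole layer is a matrix product over the last axis.
Z = A_prev @ W
print(A_prev.shape, "->", Z.shape)   # (2, 4, 4, 8) -> (2, 4, 4, 3)
```

The height and width are untouched; only the channel count changes, from n_C to n_new_C.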


posted on 2018-10-04 21:14  h_z_cong
