Andrew Ng's Coursera Machine Learning Specialization, Advanced Learning Algorithms: Week 4 Programming Assignment

All Jupyter notebook files for Week 4 of Andrew Ng's Coursera Machine Learning Specialization, Advanced Learning Algorithms:

All Python files for Week 4 of Andrew Ng's Machine Learning Specialization, Advanced Learning Algorithms

This week's assignment

Exercise 1

# UNQ_C1
# GRADED FUNCTION: compute_entropy

import numpy as np

def compute_entropy(y):
    """
    Computes the entropy at a node
    
    Args:
       y (ndarray): Numpy array indicating whether each example at a node is
           edible (`1`) or poisonous (`0`)
       
    Returns:
        entropy (float): Entropy at that node
        
    """
    # You need to return the following variables correctly
    entropy = 0.
    ### START CODE HERE ###
    if len(y) != 0:
        # p1 is the fraction of examples labeled 1 (edible)
        p1 = sum(y) / len(y)
        # A pure node (p1 == 0 or p1 == 1) has zero entropy;
        # checking first also avoids taking log2(0)
        if p1 == 0 or p1 == 1:
            entropy = 0.
        else:
            entropy = -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)
    ### END CODE HERE ###
    
    return entropy
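A quick sanity check of compute_entropy (the label vector here is made up for illustration, not taken from the lab's dataset):

y_example = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 5 edible, 5 poisonous

print(compute_entropy(y_example))     # 1.0: a 50/50 node is maximally impure
print(compute_entropy(np.ones(6)))    # 0.0: a pure node
print(compute_entropy(np.array([])))  # 0.0: an empty node, by convention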

Exercise 2

# UNQ_C2
# GRADED FUNCTION: split_dataset

def split_dataset(X, node_indices, feature):
    """
    Splits the data at the given node into
    left and right branches
    
    Args:
        X (ndarray):             Data matrix of shape (n_samples, n_features)
        node_indices (list):     List containing the active indices, i.e. the samples being considered at this step
        feature (int):           Index of feature to split on
    
    Returns:
        left_indices (list):  Indices with feature value == 1
        right_indices (list): Indices with feature value == 0
    """
    
    # You need to return the following variables correctly
    left_indices = []
    right_indices = []
    
    ### START CODE HERE ###
    # Only the samples active at this node are considered
    for i in node_indices:
        if X[i][feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)
    ### END CODE HERE ###
        
    return left_indices, right_indices
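To see the split in action, here is a small made-up 5x3 binary feature matrix (illustrative only, not the mushroom data from the lab):

X_example = np.array([[1, 0, 1],
                      [0, 1, 0],
                      [1, 1, 1],
                      [0, 0, 0],
                      [1, 0, 0]])
root = list(range(5))  # all five samples are active at the root

left, right = split_dataset(X_example, root, feature=0)
print(left)   # [0, 2, 4]: rows where feature 0 == 1
print(right)  # [1, 3]: rows where feature 0 == 0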

Exercise 3

# UNQ_C3
# GRADED FUNCTION: compute_information_gain

def compute_information_gain(X, y, node_indices, feature):
    """
    Computes the information gain of splitting the node on a given feature
    
    Args:
        X (ndarray):            Data matrix of shape (n_samples, n_features)
        y (array like):         list or ndarray with n_samples containing the target variable
        node_indices (list):    List containing the active indices, i.e. the samples being considered at this step
        feature (int):          Index of feature to split on
   
    Returns:
        information_gain (float): Information gain computed
    
    """    
    # Split dataset
    left_indices, right_indices = split_dataset(X, node_indices, feature)
    
    # Some useful variables
    X_node, y_node = X[node_indices], y[node_indices]
    X_left, y_left = X[left_indices], y[left_indices]
    X_right, y_right = X[right_indices], y[right_indices]
    
    # You need to return the following variables correctly
    information_gain = 0
    
    ### START CODE HERE ###
    
    # Weights: fraction of the node's samples sent to each branch
    w_left = len(left_indices) / len(node_indices)
    w_right = len(right_indices) / len(node_indices)
    
    # Weighted entropy of the two branches
    weighted_entropy = w_left * compute_entropy(y_left) + w_right * compute_entropy(y_right)
    
    # Information gain: node entropy minus weighted entropy after the split
    information_gain = compute_entropy(y_node) - weighted_entropy
    
    ### END CODE HERE ###  
    
    return information_gain
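Continuing the same made-up example, with labels chosen so that feature 0 separates the classes perfectly:

y_example = np.array([1, 0, 1, 0, 1])

gain = compute_information_gain(X_example, y_example, root, feature=0)
print(gain)  # ~0.971: both branches are pure, so the gain equals the node entropy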

Exercise 4

# UNQ_C4
# GRADED FUNCTION: get_best_split

def get_best_split(X, y, node_indices):   
    """
    Returns the optimal feature to split the node data
    
    Args:
        X (ndarray):            Data matrix of shape (n_samples, n_features)
        y (array like):         list or ndarray with n_samples containing the target variable
        node_indices (list):    List containing the active indices, i.e. the samples being considered at this step

    Returns:
        best_feature (int):     The index of the best feature to split on, or -1 if no split improves purity
    """    
    
    # Some useful variables
    num_features = X.shape[1]
    
    # You need to return the following variables correctly
    best_feature = -1
    
    ### START CODE HERE ###
    best_info_gain = 0
    for feature in range(num_features):
        info_gain = compute_information_gain(X, y, node_indices, feature)
        # Keep the feature only if it strictly improves on the best gain so far.
        # At a pure node every feature has zero gain, so best_feature stays -1.
        # (Checking the gain itself, rather than sum(y) over the full label
        # vector, is what makes this correct for nodes below the root.)
        if info_gain > best_info_gain:
            best_info_gain = info_gain
            best_feature = feature
    ### END CODE HERE ###
   
    return best_feature
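Tying the pieces together on the same toy data:

best = get_best_split(X_example, y_example, root)
print(best)  # 0: of the three features, feature 0 has the largest information gain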