1. Calculating the slope: divide the covariance of x and y by the variance of x (here x is density and y is quality).

  from numpy import cov
  # cov(x, y) from numpy returns a 2x2 covariance matrix; entry [0, 1] is cov(x, y).
  # .var() is the pandas variance; both use ddof=1 by default, so the N-1 factors cancel.
  slope_density = cov(wine_quality["quality"], wine_quality["density"])[0, 1] / wine_quality["density"].var()
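As a sanity check, the sketch below applies the same covariance-over-variance formula to a small synthetic DataFrame (the values are invented for illustration, not taken from the wine dataset; only the column names mirror the post) and compares the result with `numpy.polyfit`, which fits a degree-1 least-squares line.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for wine_quality: illustrative values only.
df = pd.DataFrame({
    "density": [0.990, 0.992, 0.995, 0.997, 1.000, 1.002],
    "quality": [7, 7, 6, 6, 5, 5],
})

# Slope = cov(x, y) / var(x); the off-diagonal entry [0, 1] of the
# 2x2 covariance matrix is cov(x, y).
slope = np.cov(df["density"], df["quality"])[0, 1] / df["density"].var()

# Cross-check with numpy's least-squares fit of a degree-1 polynomial.
ref_slope, ref_intercept = np.polyfit(df["density"], df["quality"], 1)
print(slope, ref_slope)
```

The two slopes should agree up to floating-point error, because `np.cov` and `Series.var` both divide by N-1 and the ratio is exactly the least-squares slope.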

2. Calculating the intercept: b = y_mean - a * x_mean, where a is the slope and x_mean and y_mean are the means of the density and quality columns.

  intercept_density = wine_quality["quality"].mean() - wine_quality["density"].mean() * slope_density  # reuse the slope from step 1
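The intercept formula can be checked the same way. This sketch (again on made-up numbers, not the real dataset) computes `b = mean(y) - a * mean(x)` and confirms two properties: it matches the intercept from `numpy.polyfit`, and the fitted line passes through the point of means.

```python
import numpy as np
import pandas as pd

# Illustrative data only; column names mirror the post.
df = pd.DataFrame({
    "density": [0.990, 0.992, 0.995, 0.997, 1.000, 1.002],
    "quality": [7, 7, 6, 6, 5, 5],
})

slope = np.cov(df["density"], df["quality"])[0, 1] / df["density"].var()
# Intercept: b = mean(y) - a * mean(x)
intercept = df["quality"].mean() - slope * df["density"].mean()

ref_slope, ref_intercept = np.polyfit(df["density"], df["quality"], 1)

# A least-squares line always passes through (mean(x), mean(y)).
y_at_mean_x = slope * df["density"].mean() + intercept
print(intercept, ref_intercept, y_at_mean_x)
```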

3. Making predictions: the slope and intercept computed from the dataset define the model y = slope * x + intercept. Applying the model to every density value gives the array of predicted quality scores.

  def predict_quality(x):
    # predicted quality for one density value, using the fitted line
    return slope_density * x + intercept_density

  predicted_quality = wine_quality["density"].apply(predict_quality)
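Since pandas Series support element-wise arithmetic, the prediction can also be written as a single vectorized expression instead of `apply`, which avoids a Python-level function call per row. The sketch below uses hypothetical data and shows that both forms produce the same values.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "density": [0.990, 0.992, 0.995, 0.997, 1.000, 1.002],
    "quality": [7, 7, 6, 6, 5, 5],
})

slope = np.cov(df["density"], df["quality"])[0, 1] / df["density"].var()
intercept = df["quality"].mean() - slope * df["density"].mean()

# Row-by-row prediction: apply calls the lambda once per element.
predicted_apply = df["density"].apply(lambda x: slope * x + intercept)

# Vectorized prediction: the arithmetic runs over the whole Series at once.
predicted_vec = slope * df["density"] + intercept
print(predicted_vec.tolist())
```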

4. Finding the error: subtract the predicted values from the actual values to get the residuals, then add up the squared residuals (the residual sum of squares, RSS) to evaluate the model:

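  # residual = actual - predicted; RSS is the sum of the squared residuals
  wine_quality["predicted"] = wine_quality["density"] * slope_density + intercept_density
  wine_quality["squared_residual"] = (wine_quality["quality"] - wine_quality["predicted"]) ** 2
  rss = wine_quality["squared_residual"].sum()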
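The RSS can also be computed in one line without storing intermediate columns. As a cross-check, `numpy.polyfit(..., full=True)` returns the sum of squared residuals of its own least-squares fit, so the two numbers should agree. The data below are illustrative, not from the wine dataset.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "density": [0.990, 0.992, 0.995, 0.997, 1.000, 1.002],
    "quality": [7, 7, 6, 6, 5, 5],
})

slope = np.cov(df["density"], df["quality"])[0, 1] / df["density"].var()
intercept = df["quality"].mean() - slope * df["density"].mean()
predicted = slope * df["density"] + intercept

# RSS = sum over rows of (actual - predicted)^2
rss = ((df["quality"] - predicted) ** 2).sum()

# With full=True, polyfit also returns the residual sum of squares of the fit.
coeffs, residuals, rank, singular_values, rcond = np.polyfit(
    df["density"], df["quality"], 1, full=True
)
print(rss, residuals[0])
```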

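5. Standard error: the standard error of the regression estimates, for the whole population, how far actual values typically fall from the regression line. Take the residual sum of squares, divide by the number of y-points minus two (two degrees of freedom are used up by estimating the slope and intercept), and then take the square root: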

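  import numpy as np

  predicted_y = predicted_quality  # predicted values from step 3
  standard_error = (rss / (len(predicted_y) - 2)) ** 0.5  # standard error of the model
  result = np.asarray(wine_quality["quality"] - predicted_y)  # residuals: actual minus predicted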
  count_one = 0
  count_two = 0
  count_three = 0

  for ele in result:
    if abs(ele) <= standard_error:
      count_one += 1
    elif abs(ele) <= standard_error * 2:
      count_two += 1
    elif abs(ele) <= standard_error * 3:
      count_three += 1
  within_one = count_one/len(result) # Calculate what percentage of actual y values are within 1 standard error of the predicted y value
  within_two = (count_one+count_two)/len(result) #Calculate what percentage of actual y values are within 2 standard errors of the predicted y value
  within_three = (count_one+count_two+count_three)/len(result) #Calculate what percentage of actual y values are within 3 standard errors of the predicted y value
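The counting loop can be replaced with boolean masks: `np.abs(residuals) <= k * standard_error` is True for every row within k standard errors, and its mean is the fraction of such rows. Because the mask for a larger k includes every row already counted for a smaller k, this gives the same cumulative fractions as `count_one + count_two + ...`. A sketch on illustrative data:

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "density": [0.990, 0.992, 0.995, 0.997, 1.000, 1.002],
    "quality": [7, 7, 6, 6, 5, 5],
})

slope = np.cov(df["density"], df["quality"])[0, 1] / df["density"].var()
intercept = df["quality"].mean() - slope * df["density"].mean()
residuals = (df["quality"] - (slope * df["density"] + intercept)).to_numpy()

rss = np.sum(residuals ** 2)
# Standard error of the regression: sqrt(RSS / (n - 2)).
standard_error = np.sqrt(rss / (len(residuals) - 2))

# Fraction of observations within 1, 2 and 3 standard errors of the line.
within = {k: np.mean(np.abs(residuals) <= k * standard_error) for k in (1, 2, 3)}
print(standard_error, within)
```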

posted on 2016-12-02 04:00  阿难1020