Study Notes | Regression Models | 02 Residuals

Residuals

The least-squares fit gives rise to two identities:
  1. The residuals sum to zero (positive and negative residuals cancel each other out).
  2. The residuals are uncorrelated with the predictor (here, the parents' heights).
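Both identities are easy to check numerically. A minimal sketch, using R's built-in mtcars data in place of the Galton heights (an assumption for illustration; the properties hold for any OLS fit):

```r
# Fit an OLS line: mpg as a function of car weight
fit <- lm(mpg ~ wt, data = mtcars)
e <- resid(fit)

# 1. The residuals sum to (numerically) zero
sum(e)             # ~ 0 up to floating-point error

# 2. The residuals are uncorrelated with the predictor
cov(e, mtcars$wt)  # ~ 0 as well
```

Both quantities come out at machine-precision zero because they are exactly the normal equations that least squares solves.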
 
 
The key strength of regression models is that they produce highly interpretable models. This contrasts with machine learning algorithms, which often sacrifice interpretability to improve predictive performance or automation. Those are, of course, valuable properties in their own right. Still, the simplicity, parsimony, and interpretability of regression models should make them the first tool of choice for any practical problem.
 
Hence ordinary least squares (OLS): choose the regression line that minimizes the sum of squared residuals over all observations (Q denotes this residual sum of squares), that is, use a squared-error loss function.
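The minimizing line has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A minimal sketch checking this against lm, again using mtcars as stand-in data (the lesson itself works with Galton's heights):

```r
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form OLS estimates
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() minimizes the same squared loss and reports the same numbers
coef(lm(y ~ x))
c(intercept, slope)
```

Any other slope/intercept pair yields a strictly larger residual sum of squares, which is what the tweak experiment at the end of these notes demonstrates.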
 
R:
 
The cov function

Description

var, cov and cor compute the variance of x and the covariance or correlation of x and y if these are vectors. If x and y are matrices then the covariances (or correlations) between the columns of x and the columns of y are computed.
cov2cor scales a covariance matrix into the corresponding correlation matrix efficiently. (Covariance measures how similarly two vectors vary together.)
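For instance, applying cov2cor to a covariance matrix reproduces what cor computes directly (mtcars columns chosen arbitrarily for illustration):

```r
# Covariance matrix of two mtcars columns
V <- cov(mtcars[, c("mpg", "wt")])

# Rescaling the covariances by the standard deviations
# gives exactly the correlation matrix
cov2cor(V)
cor(mtcars[, c("mpg", "wt")])  # identical up to rounding
```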
 
Usage
var(x, y = NULL, na.rm = FALSE, use)
cov(x, y = NULL, use = "everything",
    method = c("pearson", "kendall", "spearman"))
cor(x, y = NULL, use = "everything",
    method = c("pearson", "kendall", "spearman"))
cov2cor(V)
 
Arguments
x    
a numeric vector, matrix or data frame.
y    
NULL (default) or a vector, matrix or data frame with compatible dimensions to x. The default is equivalent to y = x (but more efficient).
na.rm    
logical. Should missing values be removed?
use    
an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".
method    
a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated.
V    
symmetric numeric matrix, usually positive definite such as a covariance matrix.
 
Examples
 
var(1:10)  # 9.166667
var(1:5, 1:5) # 2.5
## Two simple vectors
cor(1:10, 2:11) # == 1
## Correlation Matrix of Multivariate sample:
(Cl <- cor(longley))
## Graphical Correlation Matrix:
symnum(Cl) # highly correlated
## Spearman's rho  and  Kendall's tau
symnum(clS <- cor(longley, method = "spearman"))
symnum(clK <- cor(longley, method = "kendall"))
## How much do they differ?
i <- lower.tri(Cl)
cor(cbind(P = Cl[i], S = clS[i], K = clK[i]))
 
# For compatibility with 2.2.21
.get_course_path <- function(){
  tryCatch(swirl:::swirl_courses_dir(),
           error = function(c) {file.path(find.package("swirl"),"Courses")}
  )
}

galton <- read.csv(file.path(.get_course_path(),
	"Regression_Models","Introduction", "galton.csv"))
est <- function(slope, intercept) intercept + slope * galton$parent
sqe <- function(slope, intercept) sum((est(slope, intercept) - galton$child)^2)
attenu <- datasets::attenu
fname <- paste(.get_course_path(),
	"Regression_Models","Residuals","res_eqn.R",sep="/")


# ols.slope and ols.ic are the least-squares slope and intercept,
# taken from the OLS fit of child height on parent height
fit <- lm(child ~ parent, data = galton)
ols.ic <- coef(fit)[1]
ols.slope <- coef(fit)[2]

# Here are the vectors of variations, or tweaks
sltweak <- c(.01, .02, .03, -.01, -.02, -.03)  # one for the slope
ictweak <- c(.1, .2, .3, -.1, -.2, -.3)        # one for the intercept
lhs <- numeric()
rhs <- numeric()
# Left side of eqn: sum of squared residuals of the tweaked regression line
for (n in 1:6) lhs[n] <- sqe(ols.slope + sltweak[n], ols.ic + ictweak[n])
# Right side of eqn: original sum of squared residuals + sum of squares of the two tweaks
for (n in 1:6) rhs[n] <- sqe(ols.slope, ols.ic) + sum(est(sltweak[n], ictweak[n])^2)
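The identity these loops check, sqe(slope + dS, ic + dI) = sqe(slope, ic) + sum(est(dS, dI)^2), follows from the residuals being orthogonal to the intercept and predictor, so the cross term vanishes. It can be verified without the swirl course files; a self-contained sketch, assuming mtcars stands in for the Galton data:

```r
# Self-contained check of the tweak identity on mtcars (assumed data)
x <- mtcars$wt
y <- mtcars$mpg
fit <- lm(y ~ x)
ic <- coef(fit)[1]
sl <- coef(fit)[2]

est <- function(slope, intercept) intercept + slope * x
sqe <- function(slope, intercept) sum((est(slope, intercept) - y)^2)

dS <- 0.02; dI <- 0.2  # arbitrary tweaks to slope and intercept
lhs <- sqe(sl + dS, ic + dI)
rhs <- sqe(sl, ic) + sum(est(dS, dI)^2)
all.equal(lhs, rhs)    # TRUE: tweaking the OLS line only ever adds squared error
```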

  

Posted 2017-10-10 15:43 by 极客W先森