Study Notes | Regression Models | 02 Residuals

Residuals

The least-squares fit gives rise to two identities:
  1. The residuals sum to zero (positive and negative residuals cancel each other out).
  2. The residuals are uncorrelated with the predictor (here, the parents' heights).
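Both identities are easy to check numerically. A minimal sketch, using R's built-in mtcars data in place of the Galton heights (an assumption for illustration; the properties hold for any OLS fit):

```r
# Fit an OLS line: mpg as a function of car weight
fit <- lm(mpg ~ wt, data = mtcars)
e <- resid(fit)

# 1. The residuals sum to (numerically) zero
sum(e)             # ~ 0 up to floating-point error

# 2. The residuals are uncorrelated with the predictor
cov(e, mtcars$wt)  # ~ 0 as well
```

Both quantities come out at machine-precision zero because they are exactly the normal equations that least squares solves.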
 
 
The key strength of regression models is that they produce highly interpretable models. This contrasts with machine learning algorithms, which often sacrifice interpretability to improve predictive performance or automation. Those are, of course, valuable properties in their own right. Still, the simplicity, parsimony, and interpretability of regression models should make them the first tool of choice for any practical problem.
 
Hence ordinary least squares (OLS): choose the regression line that minimizes the sum of squared residuals over all observations (Q denotes this residual sum of squares), that is, use a squared-error loss function.
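The minimizing line has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A minimal sketch checking this against lm, again using mtcars as stand-in data (the lesson itself works with Galton's heights):

```r
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form OLS estimates
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() minimizes the same squared loss and reports the same numbers
coef(lm(y ~ x))
c(intercept, slope)
```

Any other slope/intercept pair yields a strictly larger residual sum of squares, which is what the tweak experiment at the end of these notes demonstrates.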
 
R:
 
The cov function

Description

var, cov and cor compute the variance of x and the covariance or correlation of x and y if these are vectors. If x and y are matrices then the covariances (or correlations) between the columns of x and the columns of y are computed.
cov2cor scales a covariance matrix into the corresponding correlation matrix efficiently. (Covariance measures how similarly two vectors vary together.)
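For instance, applying cov2cor to a covariance matrix reproduces what cor computes directly (mtcars columns chosen arbitrarily for illustration):

```r
# Covariance matrix of two mtcars columns
V <- cov(mtcars[, c("mpg", "wt")])

# Rescaling the covariances by the standard deviations
# gives exactly the correlation matrix
cov2cor(V)
cor(mtcars[, c("mpg", "wt")])  # identical up to rounding
```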
 
Usage
var(x, y = NULL, na.rm = FALSE, use)
cov(x, y = NULL, use = "everything",
    method = c("pearson", "kendall", "spearman"))
cor(x, y = NULL, use = "everything",
    method = c("pearson", "kendall", "spearman"))
cov2cor(V)
 
Arguments
x    
a numeric vector, matrix or data frame.
y    
NULL (default) or a vector, matrix or data frame with compatible dimensions to x. The default is equivalent to y = x (but more efficient).
na.rm    
logical. Should missing values be removed?
use    
an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".
method    
a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated.
V    
symmetric numeric matrix, usually positive definite such as a covariance matrix.
 
Examples
 
var(1:10)  # 9.166667
var(1:5, 1:5) # 2.5
## Two simple vectors
cor(1:10, 2:11) # == 1
## Correlation Matrix of Multivariate sample:
(Cl <- cor(longley))
## Graphical Correlation Matrix:
symnum(Cl) # highly correlated
## Spearman's rho  and  Kendall's tau
symnum(clS <- cor(longley, method = "spearman"))
symnum(clK <- cor(longley, method = "kendall"))
## How much do they differ?
i <- lower.tri(Cl)
cor(cbind(P = Cl[i], S = clS[i], K = clK[i]))
 
# For compatibility with 2.2.21
.get_course_path <- function(){
  tryCatch(swirl:::swirl_courses_dir(),
           error = function(c) {file.path(find.package("swirl"),"Courses")}
  )
}

galton <- read.csv(file.path(.get_course_path(),
	"Regression_Models","Introduction", "galton.csv"))
est <- function(slope, intercept) intercept + slope * galton$parent
sqe <- function(slope, intercept) sum((est(slope, intercept) - galton$child)^2)
attenu <- datasets::attenu
fname <- paste(.get_course_path(),
	"Regression_Models","Residuals","res_eqn.R",sep="/")


# ols.slope and ols.ic are the least-squares slope and intercept,
# taken from the OLS fit of child height on parent height
fit <- lm(child ~ parent, data = galton)
ols.ic <- coef(fit)[1]
ols.slope <- coef(fit)[2]

# Here are the vectors of variations, or tweaks
sltweak <- c(.01, .02, .03, -.01, -.02, -.03)  # one for the slope
ictweak <- c(.1, .2, .3, -.1, -.2, -.3)        # one for the intercept
lhs <- numeric()
rhs <- numeric()
# Left side of eqn: sum of squared residuals of the tweaked regression line
for (n in 1:6) lhs[n] <- sqe(ols.slope + sltweak[n], ols.ic + ictweak[n])
# Right side of eqn: original sum of squared residuals + sum of squares of the two tweaks
for (n in 1:6) rhs[n] <- sqe(ols.slope, ols.ic) + sum(est(sltweak[n], ictweak[n])^2)
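The identity these loops check, sqe(slope + dS, ic + dI) = sqe(slope, ic) + sum(est(dS, dI)^2), follows from the residuals being orthogonal to the intercept and predictor, so the cross term vanishes. It can be verified without the swirl course files; a self-contained sketch, assuming mtcars stands in for the Galton data:

```r
# Self-contained check of the tweak identity on mtcars (assumed data)
x <- mtcars$wt
y <- mtcars$mpg
fit <- lm(y ~ x)
ic <- coef(fit)[1]
sl <- coef(fit)[2]

est <- function(slope, intercept) intercept + slope * x
sqe <- function(slope, intercept) sum((est(slope, intercept) - y)^2)

dS <- 0.02; dI <- 0.2  # arbitrary tweaks to slope and intercept
lhs <- sqe(sl + dS, ic + dI)
rhs <- sqe(sl, ic) + sum(est(dS, dI)^2)
all.equal(lhs, rhs)    # TRUE: tweaking the OLS line only ever adds squared error
```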

  

Posted 2017-10-10 15:43 by 极客W先森