On Box-Cox transform in regression models

On Box-Cox transform in regression models

November 13, 2012

By arthur charpentier

 

A few days ago, a former student of mine, David, contacted me about Box-Cox tests in linear models. It made me look more carefully at the test, and I do not understand what is computed, to be honest. Let us start with something simple, like a linear simple regression, i.e.

http://latex.codecogs.com/gif.latex?Y_i=\beta_0+\beta_1%20X_i+\varepsilon_i

Let us introduced – as suggested in Box & Cox (1964) – the following family of (power) transformations

http://latex.codecogs.com/gif.latex?Y_i^{(\lambda)}%20=%20\begin{cases}%20\dfrac{Y_i^\lambda-1}{\lambda}%20&\text{%20%20}%20(\lambda%20\neq%200)\\[8pt]%20\log{(Y_i)}%20%20&\text{%20}%20(\lambda%20=%200)%20\end{cases}

on the variable of interest. Then assume that

http://latex.codecogs.com/gif.latex?Y_i^{(\lambda)}=\beta_0+\beta_1%20X_i+\varepsilon_i

As mentioned in Chapter 14 of Davidson & MacKinnon (1993)in French – the log-likelihood of this model (assuming that observations are independent, with distribution http://latex.codecogs.com/gif.latex?\varepsilon_i\sim\mathcal{N}(0,\sigma^2)) can be written

http://latex.codecogs.com/gif.latex?\log%20\mathcal{L}=-\frac{n}{2}\log(2\pi)-n\log\sigma%20\\-\frac{1}{2\sigma^2}\sum_{i=1}^n\left[Y_i^{(\lambda)}-(\beta_0+\beta_1%20X_i)\right]^2+(\lambda-1)\sum_{i=1}^n\log%20Y_i

We can then use profile-likelihood techniques (see here) to derive the optimal transformation.

This can be done in R extremely simply,

 library(MASS)
 boxcox(lm(dist~speed,data=cars),lambda=seq(0,1,by=.1))

we then get the following graph,

If we look at the code of the function, it is based on the QR decomposition of the http://latex.codecogs.com/gif.latex?\boldsymbol{X}matrix (since we assume that http://latex.codecogs.com/gif.latex?\boldsymbol{X} is a full-rank matrix). More precisely, http://latex.codecogs.com/gif.latex?\boldsymbol{X}=QR where http://latex.codecogs.com/gif.latex?\boldsymbol{X} is a http://latex.codecogs.com/gif.latex?n\times%202 matrix, http://latex.codecogs.com/gif.latex?Q is a http://latex.codecogs.com/gif.latex?n\times%202 orthonornal matrix, and http://latex.codecogs.com/gif.latex?R is a http://latex.codecogs.com/gif.latex?2\times2 upper triangle matrix. It might be convenient to use this matrix since, for instance, http://latex.codecogs.com/gif.latex?R\widehat{\boldsymbol{\beta}}=Q%27Y.  Thus, we do have an upper triangle system of equations.

X=lm(dist~speed,data=cars)$qr

 

The code used to get the previous graph is (more or less) the following,

 g=function(x,lambda){
 if(lambda!=0){y<-(x^lambda-1)/lambda}
 if(lambda==0){y<-log(x)}
 return(y)} 
 n=nrow(cars)
 X=lm(dist~speed,data=cars)$qr
 Y=cars$dist
 logv=function(lambda){
 -n/2*log(sum(qr.resid(X, g(Y,lambda)/
 exp(mean(log(Y)))^(lambda-1))^2))}
 L=seq(0,1,by=.05)
 LV=Vectorize(logv)(L)
 points(L,LV,pch=19,cex=.85,col="red")

As we can see (with those red dots) we can reproduce the R graph. But it might not be consistent with other techniques (and functions described above). For instance, we can plot the profile likelihood function, http://latex.codecogs.com/gif.latex?\lambda\mapsto\log\mathcal{L}

 logv=function(lambda){
 s=summary(lm(g(dist,lambda)~speed,
 data=cars))$sigma
 e=lm(g(dist,lambda)~speed,data=cars)$residuals
 -n/2*log(2 * pi)-n*log(s)-.5/s^2*(sum(e^2))
 (lambda-1)*sum(log(Y))
 }
 L=seq(0,1,by=.01)
 LV=Vectorize(logv)(L)
 plot(L,LV,type="l",ylab="")
 (maxf=optimize(logv,0:1,maximum=TRUE))
$maximum
[1] 0.430591
 
$objective
[1] -197.6966
 
 abline(v=maxf$maximum,lty=2)

The good point is that the optimal value of http://latex.codecogs.com/gif.latex?\lambda is the same as the one we got before. The only problem is that the http://latex.codecogs.com/gif.latex?y-axis has a different scale. And using profile likelihood techniques to derive a confidence interval will give us different results (with a larger confidence interval than the one given by the standard function),

ic=maxf$objective-qchisq(.95,1)
 #install.packages("rootSolve")
 library(rootSolve)
 f=function(x)(logv(x)-ic)
 (lower=uniroot(f, c(0,maxf$maximum))$root)
[1] 0.1383507
 (upper=uniroot(f, c(maxf$maximum,1))$root)
[1] 0.780573
 segments(lower,ic,upper,ic,lwd=2,col="red")

Actually, it possible to rewrite the log-likelihood as

http://latex.codecogs.com/gif.latex?\mathcal{L}=\star-\frac{n}{2}\log\left[\sum_{i=1}^n\left(\frac{Y_i^{(\lambda)}-(\beta_0+\beta_1%20X_i)}{\dot{Y}^\lambda}\right)^2\right]

(let us just get rid of the constant), where

http://latex.codecogs.com/gif.latex?\dot{Y}=\exp\left[\frac{1}{n}\sum_{i=1}^n%20\log%20Y_i\right]

Here, it becomes

 logv=function(lambda){
 e=lm(g(dist,lambda)~speed,data=cars)$residuals
 elY=(exp(mean(log(Y))))
 -n/2*log(sum((e/elY^lambda)^2))
 }

 L=seq(0,1,by=.01)
 LV=Vectorize(logv)(L)
 plot(L,LV,type="l",ylab="")
 optimize(logv,0:1,maximum=TRUE)
$maximum
[1] 0.430591
 
$objective
[1] -47.73436

with again the same optimal value for http://latex.codecogs.com/gif.latex?\lambda, and the same confidence interval, since the function is the same, up to some additive constant.

So we have been able to derive the optimal transformation according to Box-Cox transformation, but so far, the confidence interval is not the same (it might come from the fact that here we substituted an estimator to the unknown parameter http://latex.codecogs.com/gif.latex?\sigma.

 

转载:r-bloggers

posted @ 2017-09-19 21:18  aongao  阅读(144)  评论(0编辑  收藏  举报