On Box-Cox transform in regression models
A few days ago, a former student of mine, David, contacted me about Box-Cox tests in linear models. It made me look more carefully at the test and, to be honest, I did not understand what was computed. Let us start with something simple, like a simple linear regression, i.e.
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
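Throughout, the running example will be the cars dataset shipped with R (stopping distance against speed); as a starting point, here is the plain, untransformed fit (just a benchmark, nothing Box-Cox specific yet):

reg = lm(dist ~ speed, data = cars)   # the model the whole post builds on
summary(reg)$coefficients             # untransformed benchmark fit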
Let us introduce – as suggested in Box & Cox (1964) – the following family of (power) transformations
$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda}-1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(y) & \text{if } \lambda = 0 \end{cases}$$
on the variable of interest. Then assume that
$$y_i^{(\lambda)} = \beta_0 + \beta_1 x_i + \varepsilon_i$$
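As a quick numerical sanity check (a small sketch; bc_transform is just an illustrative name), the power transform converges to the logarithm as $\lambda \to 0$, so the $\lambda = 0$ case is simply the continuous extension of the family:

bc_transform = function(y, lambda){
  # (y^lambda - 1)/lambda for lambda != 0, log(y) by continuity at 0
  if(lambda == 0) log(y) else (y^lambda - 1)/lambda
}
sapply(c(1, .1, .01, .001), function(l) bc_transform(4, l))
log(4)   # the limiting value, approximately 1.3863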
As mentioned in Chapter 14 of Davidson & MacKinnon (1993) – in French – the log-likelihood of this model (assuming that observations are independent, with an $\mathcal{N}(0,\sigma^2)$ distribution for the errors) can be written
$$\log \mathcal{L}(\beta_0,\beta_1,\sigma^2,\lambda) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n \left(y_i^{(\lambda)} - \beta_0 - \beta_1 x_i\right)^2 + (\lambda-1)\sum_{i=1}^n \log(y_i)$$
where the last term is the Jacobian of the transformation.
We can then use profile-likelihood techniques to derive the optimal transformation parameter $\hat{\lambda}$.
This can be done in R extremely simply,
library(MASS)
boxcox(lm(dist ~ speed, data = cars), lambda = seq(0, 1, by = .1))
we then get the following graph,
If we look at the code of the function, it is based on the QR decomposition of the matrix $X$ (since we assume that $X$ is a full-rank matrix). More precisely, $X = QR$ where $X$ is an $n\times k$ matrix, $Q$ is an $n\times k$ orthonormal matrix (i.e. $Q^\top Q = \mathbb{I}$), and $R$ is a $k\times k$ upper triangular matrix. It might be convenient to use this decomposition since, for instance, $\widehat{\beta} = R^{-1}Q^\top y$. Thus, we do have an upper triangular system of equations.
X = lm(dist ~ speed, data = cars)$qr
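As a sanity check (a minimal sketch using qr.Q() and qr.R() from base R; the design matrix here is simply the two columns 1 and speed), we can verify the decomposition, and the fact that qr.resid() returns the usual least-squares residuals:

QR = lm(dist ~ speed, data = cars)$qr
Q = qr.Q(QR)   # n x k matrix with orthonormal columns
R = qr.R(QR)   # k x k upper triangular matrix
max(abs(Q %*% R - cbind(1, cars$speed)))            # ~ 0, i.e. X = QR
max(abs(t(Q) %*% Q - diag(2)))                      # ~ 0, i.e. Q'Q = I
max(abs(qr.resid(QR, cars$dist) -
        residuals(lm(dist ~ speed, data = cars))))  # ~ 0, same residuals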
The code used to get the previous graph is (more or less) the following,
g = function(x, lambda){
  # Box-Cox power transform, with the log as the lambda = 0 case
  if(lambda != 0){ y <- (x^lambda - 1)/lambda }
  if(lambda == 0){ y <- log(x) }
  return(y)
}
n = nrow(cars)
X = lm(dist ~ speed, data = cars)$qr
Y = cars$dist
logv = function(lambda){
  # concentrated log-likelihood, with the response rescaled by the
  # geometric mean of Y (to the power lambda - 1)
  -n/2*log(sum(qr.resid(X, g(Y, lambda)/exp(mean(log(Y)))^(lambda - 1))^2))
}
L = seq(0, 1, by = .05)
LV = Vectorize(logv)(L)
points(L, LV, pch = 19, cex = .85, col = "red")
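To confirm numerically that the dots fall on MASS's curve, note that boxcox() returns, invisibly, the grid of $\lambda$'s and log-likelihood values it plots; with interp = FALSE the raw (non-splined) grid is returned, so the comparison is a one-liner (a small check, assuming our logv matches the internal computation):

bc = boxcox(lm(dist ~ speed, data = cars),
            lambda = seq(0, 1, by = .1), interp = FALSE)
max(abs(Vectorize(logv)(bc$x) - bc$y))   # expected to be numerically ~ 0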
As we can see (with those red dots), we can reproduce the graph obtained with R. But it might not be consistent with other techniques (and the functions described above). For instance, we can plot the profile likelihood function,
logv = function(lambda){
  s = summary(lm(g(dist, lambda) ~ speed, data = cars))$sigma
  e = lm(g(dist, lambda) ~ speed, data = cars)$residuals
  -n/2*log(2*pi) - n*log(s) - .5/s^2*sum(e^2) + (lambda - 1)*sum(log(Y))
}
L = seq(0, 1, by = .01)
LV = Vectorize(logv)(L)
plot(L, LV, type = "l", ylab = "")
(maxf = optimize(logv, 0:1, maximum = TRUE))
$maximum
[1] 0.430591
$objective
[1] -197.6966
abline(v = maxf$maximum, lty = 2)
The good point is that the optimal value of $\lambda$ is the same as the one we got before. The only problem is that the $y$-axis has a different scale. And using profile-likelihood techniques to derive a confidence interval will give us different results (with a larger confidence interval than the one given by the standard function),
ic = maxf$objective - qchisq(.95, 1)
# install.packages("rootSolve")
library(rootSolve)
f = function(x) (logv(x) - ic)
(lower = uniroot(f, c(0, maxf$maximum))$root)
[1] 0.1383507
(upper = uniroot(f, c(maxf$maximum, 1))$root)
[1] 0.780573
segments(lower, ic, upper, ic, lwd = 2, col = "red")
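For comparison, note that the dotted 95% line drawn by boxcox() is placed at the maximum minus qchisq(.95, 1)/2 (the usual chi-square-over-two cutoff for a profile log-likelihood), not minus qchisq(.95, 1); here is a minimal sketch of the interval obtained with that cutoff (ic_half and f_half are illustrative names):

ic_half = maxf$objective - qchisq(.95, 1)/2   # cutoff used by boxcox()
f_half = function(x) (logv(x) - ic_half)
(lower_half = uniroot(f_half, c(0, maxf$maximum))$root)
(upper_half = uniroot(f_half, c(maxf$maximum, 1))$root)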
Actually, it is possible to rewrite the log-likelihood as
$$\log \mathcal{L} \propto -\frac{n}{2}\log\left(\sum_{i=1}^n \left(\frac{\widehat{\varepsilon}_i^{(\lambda)}}{\dot{y}^{\lambda}}\right)^2\right)$$
(let us just get rid of the constant), where $\widehat{\varepsilon}_i^{(\lambda)}$ are the residuals of the regression of $y^{(\lambda)}$ on $x$, and where
$$\dot{y} = \exp\left(\frac{1}{n}\sum_{i=1}^n \log(y_i)\right)$$
is the geometric mean of the $y_i$'s.
Here, it becomes
logv = function(lambda){
  e = lm(g(dist, lambda) ~ speed, data = cars)$residuals
  elY = exp(mean(log(Y)))   # geometric mean of the y's
  -n/2*log(sum((e/elY^lambda)^2))
}
L = seq(0, 1, by = .01)
LV = Vectorize(logv)(L)
plot(L, LV, type = "l", ylab = "")
optimize(logv, 0:1, maximum = TRUE)
$maximum
[1] 0.430591
$objective
[1] -47.73436
with again the same optimal value for $\lambda$, and the same confidence interval, since the function is the same, up to some additive constant.
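To double-check that claim (a small sketch, with illustrative names maxf_resc and f_resc), we can recompute the interval from this rescaled version with the same cutoff as above; since the two functions differ by an additive constant, the roots, and hence the bounds, should coincide:

maxf_resc = optimize(logv, 0:1, maximum = TRUE)
ic_resc = maxf_resc$objective - qchisq(.95, 1)   # same cutoff as before
f_resc = function(x) (logv(x) - ic_resc)
(lower_resc = uniroot(f_resc, c(0, maxf_resc$maximum))$root)
(upper_resc = uniroot(f_resc, c(maxf_resc$maximum, 1))$root)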
So we have been able to derive the optimal transformation according to the Box-Cox procedure but, so far, the confidence interval is not the same (it might come from the fact that here we substituted an estimator for the unknown parameter $\sigma$).