R生存分析AFT
γ = 1/scale =1/0.902
α = exp(−(Intercept)γ)=exp(-(7.111)*γ)
> library(survival) > myfit=survreg(Surv(futime, fustat)~1 , ovarian, dist="weibull",scale=0) > summary(myfit) Call: survreg(formula = Surv(futime, fustat) ~ 1, data = ovarian, dist = "weibull", scale = 0) Value Std. Error z p (Intercept) 7.111 0.293 24.292 2.36e-130 Log(scale) -0.103 0.254 -0.405 6.86e-01 Scale= 0.902 Weibull distribution Loglik(model)= -98 Loglik(intercept only)= -98 Number of Newton-Raphson Iterations: 5 n= 26
画生存函数图
d<- seq(0, 2000, length.out=10000) h<-1-pweibull(d,shape=1/0.902,scale=exp(7.111)) df<-data.frame(t=d,s=h) library(ggplot2) ggplot(df,aes(x=t,y=s))+ geom_line(colour="green")+ ggtitle("s(t) \n 生存函数")
1. Surv
Description
创建一个生存对象,通常用作模型公式中的响应变量。 参数匹配是此功能的特殊功能,请参阅下面的详细信息。
Surv(time, time2, event,
type=c('right', 'left', 'interval', 'counting', 'interval2', 'mstate'),
origin=0)
is.Surv(x)
Arguments
time
对于右删失数据,这是一个跟踪时间。对于区间数据,第一个参数是区间的开始时间。
event
状态指示,通常,0=活着,1=死亡。其他选择是TRUE/FALSE (TRUE = 死亡) or 1/2 (2=死亡)。对于区间删失数据,状态指示,0=右删失, 1=事件时间, 2=左删失, 3=区间删失.
右删失(Right Censoring):只知道实际寿命大于某数;
左删失(Left Censoring):只知道实际寿命小于某数;
区间删失(Interval Censoring):只知道实际寿命在一个时间区间内。
time2
区间删失区间的结束时间或仅对过程数据进行计数。
type
指定删失类型。 "right", "left", "counting", "interval", "interval2" or "mstate".
如果event变量是一个因子,假定type="mstate"。如果没有指定参数time2,type="right";如果指定参数time2,type="counting"
Surv使用示例
> str(lung) 'data.frame': 228 obs. of 10 variables: $ inst : num 3 3 3 5 1 12 7 11 1 7 ... $ time : num 306 455 1010 210 883 ... $ status : num 2 2 1 2 2 1 2 2 2 2 ... $ age : num 74 68 56 57 60 74 68 71 53 61 ... $ sex : num 1 1 1 1 1 1 2 2 1 1 ... $ ph.ecog : num 1 0 0 1 0 1 2 2 1 2 ... $ ph.karno : num 90 90 90 90 100 50 70 60 70 70 ... $ pat.karno: num 100 90 90 60 90 80 60 80 80 70 ... $ meal.cal : num 1175 1225 NA 1150 NA ... $ wt.loss : num NA 15 15 11 0 0 10 1 16 34 ... > with(lung, Surv(time, status)) [1] 306 455 1010+ 210 883 1022+ 310 361 218 [10] 166 170 654 728 71 567 144 613 707 [19] 61 88 301 81 624 371 394 520 574 [28] 118 390 12 473 26 533 107 53 122 [37] 814 965+ 93 731 460 153 433 145 583 [46] 95 303 519 643 765 735 189 53 246 [55] 689 65 5 132 687 345 444 223 175 [64] 60 163 65 208 821+ 428 230 840+ 305 [73] 11 132 226 426 705 363 11 176 791 [82] 95 196+ 167 806+ 284 641 147 740+ 163 [91] 655 239 88 245 588+ 30 179 310 477 [100] 166 559+ 450 364 107 177 156 529+ 11 [109] 429 351 15 181 283 201 524 13 212 [118] 524 288 363 442 199 550 54 558 207 [127] 92 60 551+ 543+ 293 202 353 511+ 267 [136] 511+ 371 387 457 337 201 404+ 222 62 [145] 458+ 356+ 353 163 31 340 229 444+ 315+ [154] 182 156 329 364+ 291 179 376+ 384+ 268 [163] 292+ 142 413+ 266+ 194 320 181 285 301+ [172] 348 197 382+ 303+ 296+ 180 186 145 269+ [181] 300+ 284+ 350 272+ 292+ 332+ 285 259+ 110 [190] 286 270 81 131 225+ 269 225+ 243+ 279+ [199] 276+ 135 79 59 240+ 202+ 235+ 105 224+ [208] 239 237+ 173+ 252+ 221+ 185+ 92+ 13 222+ [217] 192+ 183 211+ 175+ 197+ 203+ 116 188+ 191+ [226] 105+ 174+ 177+ > str(heart) 'data.frame': 172 obs. of 8 variables: $ start : num 0 0 0 1 0 36 0 0 0 51 ... $ stop : num 50 6 1 16 36 39 18 3 51 675 ... $ event : num 1 1 0 1 0 1 1 1 0 1 ... $ age : num -17.16 3.84 6.3 6.3 -7.74 ... $ year : num 0.123 0.255 0.266 0.266 0.49 ... $ surgery : num 0 0 0 0 0 0 0 0 0 0 ... $ transplant: Factor w/ 2 levels "0","1": 1 1 1 2 1 2 1 1 1 2 ... $ id : num 1 2 3 3 4 4 5 6 7 7 ... > Surv(heart$start, heart$stop, heart$event) [1] ( 0.0, 50.0] ( 0.0, 6.0] ( 0.0, 1.0+] [4] ( 1.0, 16.0] ( 0.0, 36.0+] ( 36.0, 39.0] [7] ( 0.0, 18.0] ( 0.0, 3.0] ( 0.0, 51.0+] [10] ( 51.0, 675.0] ( 0.0, 40.0] ( 0.0, 85.0] [13] ( 0.0, 12.0+] ( 12.0, 58.0] ( 0.0, 26.0+] [16] ( 26.0, 153.0] ( 0.0, 8.0] ( 0.0, 17.0+] [19] ( 17.0, 81.0] ( 0.0, 37.0+] ( 37.0,1387.0] [22] ( 0.0, 1.0] ( 0.0, 28.0+] ( 28.0, 308.0] [25] ( 0.0, 36.0] ( 0.0, 20.0+] ( 20.0, 43.0] [28] ( 0.0, 37.0] ( 0.0, 18.0+] ( 18.0, 28.0] [31] ( 0.0, 8.0+] ( 8.0,1032.0] ( 0.0, 12.0+] [34] ( 12.0, 51.0] ( 0.0, 3.0+] ( 3.0, 733.0] [37] ( 0.0, 83.0+] ( 83.0, 219.0] ( 0.0, 25.0+] [40] ( 25.0,1800.0+] ( 0.0,1401.0+] ( 0.0, 263.0] [43] ( 0.0, 71.0+] ( 71.0, 72.0] ( 0.0, 35.0] [46] ( 0.0, 16.0+] ( 16.0, 852.0] ( 0.0, 16.0] [49] ( 0.0, 17.0+] ( 17.0, 77.0] ( 0.0, 51.0+] [52] ( 51.0,1587.0+] ( 0.0, 23.0+] ( 23.0,1572.0+] [55] ( 0.0, 12.0] ( 0.0, 46.0+] ( 46.0, 100.0] [58] ( 0.0, 19.0+] ( 19.0, 66.0] ( 0.0, 4.5+] [61] ( 4.5, 5.0] ( 0.0, 2.0+] ( 2.0, 53.0] [64] ( 0.0, 41.0+] ( 41.0,1408.0+] ( 0.0, 58.0+] [67] ( 58.0,1322.0+] ( 0.0, 3.0] ( 0.0, 2.0] [70] ( 0.0, 40.0] ( 0.0, 1.0+] ( 1.0, 45.0] [73] ( 0.0, 2.0+] ( 2.0, 996.0] ( 0.0, 21.0+] [76] ( 21.0, 72.0] ( 0.0, 9.0] ( 0.0, 36.0+] [79] ( 36.0,1142.0+] ( 0.0, 83.0+] ( 83.0, 980.0] [82] ( 0.0, 32.0+] ( 32.0, 285.0] ( 0.0, 102.0] [85] ( 0.0, 41.0+] ( 41.0, 188.0] ( 0.0, 3.0] [88] ( 0.0, 10.0+] ( 10.0, 61.0] ( 0.0, 67.0+] [91] ( 67.0, 942.0+] ( 0.0, 149.0] ( 0.0, 21.0+] [94] ( 21.0, 343.0] ( 0.0, 78.0+] ( 78.0, 916.0+] [97] ( 0.0, 3.0+] ( 3.0, 68.0] ( 0.0, 2.0] [100] ( 0.0, 69.0] ( 0.0, 27.0+] ( 27.0, 842.0+] [103] ( 0.0, 33.0+] ( 33.0, 584.0] ( 0.0, 12.0+] [106] ( 12.0, 78.0] ( 0.0, 32.0] ( 0.0, 57.0+] [109] ( 57.0, 285.0] ( 0.0, 3.0+] ( 3.0, 68.0] [112] ( 0.0, 10.0+] ( 10.0, 670.0+] ( 0.0, 5.0+] [115] ( 5.0, 30.0] ( 0.0, 31.0+] ( 31.0, 620.0+] [118] ( 0.0, 4.0+] ( 4.0, 596.0+] ( 0.0, 27.0+] [121] ( 27.0, 90.0] ( 0.0, 5.0+] ( 5.0, 17.0] [124] ( 0.0, 2.0] ( 0.0, 46.0+] ( 46.0, 545.0+] [127] ( 0.0, 21.0] ( 0.0, 210.0+] (210.0, 515.0+] [130] ( 0.0, 67.0+] ( 67.0, 96.0] ( 0.0, 26.0+] [133] ( 26.0, 482.0+] ( 0.0, 6.0+] ( 6.0, 445.0+] [136] ( 0.0, 428.0+] ( 0.0, 32.0+] ( 32.0, 80.0] [139] ( 0.0, 37.0+] ( 37.0, 334.0] ( 0.0, 5.0] [142] ( 0.0, 8.0+] ( 8.0, 397.0+] ( 0.0, 60.0+] [145] ( 60.0, 110.0] ( 0.0, 31.0+] ( 31.0, 370.0+] [148] ( 0.0, 139.0+] (139.0, 207.0] ( 0.0, 160.0+] [151] (160.0, 186.0] ( 0.0, 340.0] ( 0.0, 310.0+] [154] (310.0, 340.0+] ( 0.0, 28.0+] ( 28.0, 265.0+] [157] ( 0.0, 4.0+] ( 4.0, 165.0] ( 0.0, 2.0+] [160] ( 2.0, 16.0] ( 0.0, 13.0+] ( 13.0, 180.0+] [163] ( 0.0, 21.0+] ( 21.0, 131.0+] ( 0.0, 96.0+] [166] ( 96.0, 109.0+] ( 0.0, 21.0] ( 0.0, 38.0+] [169] ( 38.0, 39.0+] ( 0.0, 31.0+] ( 0.0, 11.0+] [172] ( 0.0, 6.0]
2.survreg
拟合参数生存回归模型。 这些是用于时间变量的任意变换的位置尺度模型; 最常见的情况使用对数变换,建立加速失效时间模型。
survreg(formula, data, weights, subset,
na.action, dist="weibull", init=NULL, scale=0,
control,parms=NULL,model=FALSE, x=FALSE,
y=TRUE, robust=FALSE, score=FALSE, ...)
dist
y变量的假设分布。"weibull", "exponential", "gaussian", "logistic","lognormal" and "loglogistic"。
scale
可选的固定值。如果设置<=0,scale将被估计
# survreg's scale = 1/(rweibull shape)
# survreg's intercept = log(rweibull scale)
# survreg结果中输出的scale与“rweibull scale”不同
control
控制值列表,参考survreg.control()
survreg.control
survreg.control(maxiter=30, rel.tolerance=1e-09,
toler.chol=1e-10, iter.max, debug=0, outer.max=10)
maxiter
最大迭代次数
rel.tolerance
“相对容忍度”来声明收敛
toler.chol
Tolerance to declare Cholesky decomposition singular
iter.max
与maxiter相同
debug
打印调试信息
outer.max
用于选择惩罚参数的外部迭代的最大数目
convergence tolerance
/x(k+1)/=/x(k)/*Tr+Ta
其中:
k 迭代次数;
x(k+1) x的k次迭代计算值;
x(k) x的k次迭代初始值;
Tr 相对误差
Ta绝对误差
// 表示绝对值
因此,从上述公式我们可以得到一个重要结论:设定计算迭代误差的时候,要全面权衡物理量的绝对值大小,同时要衡量收敛迭代值的相对误差,这样才能正确满意的设定自己需要的计算误差!
不能简单的以为绝对误差和相对误差越小越好,也要杜绝tolerance设定时的随意性(按照公式进行合理的设定是一个优秀的过程工程师的基本素质)
survreg使用示例
> library(survival) > str(ovarian) 'data.frame': 26 obs. of 6 variables: $ futime : num 59 115 156 421 431 448 464 475 477 563 ... $ fustat : num 1 1 1 0 1 0 1 1 0 1 ... $ age : num 72.3 74.5 66.5 53.4 50.3 ... $ resid.ds: num 2 2 2 2 2 1 2 2 2 1 ... $ rx : num 1 1 1 2 1 1 2 2 1 2 ... $ ecog.ps : num 1 1 2 1 1 2 2 2 1 2 ... > # 拟合指数模型,以下2个模型是一样的 > survreg(Surv(futime, fustat) ~ ecog.ps + rx, ovarian, dist='weibull', + scale=1) Call: survreg(formula = Surv(futime, fustat) ~ ecog.ps + rx, data = ovarian, dist = "weibull", scale = 1) Coefficients: (Intercept) ecog.ps rx 6.9618376 -0.4331347 0.5815027 Scale fixed at 1 # 注意这里 Loglik(model)= -97.2 Loglik(intercept only)= -98 Chisq= 1.67 on 2 degrees of freedom, p= 0.43 n= 26 > survreg(Surv(futime, fustat) ~ ecog.ps + rx, ovarian, + dist="exponential") Call: survreg(formula = Surv(futime, fustat) ~ ecog.ps + rx, data = ovarian, dist = "exponential") Coefficients: (Intercept) ecog.ps rx 6.9618376 -0.4331347 0.5815027 Scale fixed at 1 # 注意这里 Loglik(model)= -97.2 Loglik(intercept only)= -98 Chisq= 1.67 on 2 degrees of freedom, p= 0.43 n= 26 > ########################################## > > str(lung) 'data.frame': 228 obs. of 10 variables: $ inst : num 3 3 3 5 1 12 7 11 1 7 ... $ time : num 306 455 1010 210 883 ... $ status : num 2 2 1 2 2 1 2 2 2 2 ... $ age : num 74 68 56 57 60 74 68 71 53 61 ... $ sex : num 1 1 1 1 1 1 2 2 1 1 ... $ ph.ecog : num 1 0 0 1 0 1 2 2 1 2 ... $ ph.karno : num 90 90 90 90 100 50 70 60 70 70 ... $ pat.karno: num 100 90 90 60 90 80 60 80 80 70 ... $ meal.cal : num 1175 1225 NA 1150 NA ... $ wt.loss : num NA 15 15 11 0 0 10 1 16 34 ... > # weibull分布,如果设置scale<=0,模型将使用最大似然估计,估计最优的scale > myfit<-survreg(Surv(time, status) ~ ph.ecog + age + sex, data=lung,dist = "weibull") > myfit Call: survreg(formula = Surv(time, status) ~ ph.ecog + age + sex, data = lung, dist = "weibull") Coefficients: (Intercept) ph.ecog age sex 6.273435252 -0.339638098 -0.007475439 0.401090541 Scale= 0.731109 Loglik(model)= -1132.4 Loglik(intercept only)= -1147.4 Chisq= 29.98 on 3 degrees of freedom, p= 1.4e-06 n=227 (1 observation deleted due to missingness) > summary(myfit) Call: survreg(formula = Surv(time, status) ~ ph.ecog + age + sex, data = lung, dist = "weibull") Value Std. Error z p (Intercept) 6.27344 0.45358 13.83 1.66e-43 ph.ecog -0.33964 0.08348 -4.07 4.73e-05 age -0.00748 0.00676 -1.11 2.69e-01 sex 0.40109 0.12373 3.24 1.19e-03 Log(scale) -0.31319 0.06135 -5.11 3.30e-07 Scale= 0.731 Weibull distribution Loglik(model)= -1132.4 Loglik(intercept only)= -1147.4 Chisq= 29.98 on 3 degrees of freedom, p= 1.4e-06 Number of Newton-Raphson Iterations: 5 n=227 (1 observation deleted due to missingness) # survreg's scale = 1/(rweibull shape) # survreg's intercept = log(rweibull scale) # survreg结果中输出的scale与“rweibull scale”不同 ############################################## > # weibull分布,以sex对数据分组,产生2个不同的scale > myfit1<-survreg(Surv(time, status) ~ ph.ecog + age + strata(sex), data=lung,dist = "weibull") > summary(myfit1) Call: survreg(formula = Surv(time, status) ~ ph.ecog + age + strata(sex), data = lung, dist = "weibull") Value Std. Error z p (Intercept) 6.73235 0.42396 15.880 8.75e-57 ph.ecog -0.32443 0.08649 -3.751 1.76e-04 age -0.00581 0.00693 -0.838 4.02e-01 sex=1 -0.24408 0.07920 -3.082 2.06e-03 sex=2 -0.42345 0.10669 -3.969 7.22e-05 Scale: sex=1 sex=2 0.783 0.655 Weibull distribution Loglik(model)= -1137.3 Loglik(intercept only)= -1146.2 Chisq= 17.8 on 2 degrees of freedom, p= 0.00014 Number of Newton-Raphson Iterations: 5 n=227 (1 observation deleted due to missingness) ################################################ # 具有高斯误差的线性回归,以及左删失数据 > survreg(Surv(durable, durable>0, type='left') ~ age + quant, + data=tobin, dist='gaussian', scale = 0) Call: survreg(formula = Surv(durable, durable > 0, type = "left") ~ age + quant, data = tobin, dist = "gaussian", scale = 0) Coefficients: (Intercept) age quant 15.14486636 -0.12905928 -0.04554166 Scale= 5.57254 Loglik(model)= -28.9 Loglik(intercept only)= -29.5 Chisq= 1.1 on 2 degrees of freedom, p= 0.58 n= 20
3.survdiff
测试在两条或更多条生存曲线之间是否存在差异
survdiff(formula, data, subset, na.action, rho=0)
rho = 0 表示log-rank or Mantel-Haenszel检验;
rho = 1 表示Wilcoxon检验
log-rank检验(时序检验)--生存时间分布近似weibull分布或者属于比例风险模型时效率较高
似然比检验(likelihood ratio test)--生存时间分布近似指数分布时效率较高
wilcoxon检验--生存时间分布近似对数正态分布效率较高
survdiff使用示例
> survdiff(Surv(futime, fustat) ~ rx,data=ovarian) Call: survdiff(formula = Surv(futime, fustat) ~ rx, data = ovarian) N Observed Expected (O-E)^2/E (O-E)^2/V rx=1 13 7 5.23 0.596 1.06 rx=2 13 5 6.77 0.461 1.06 Chisq= 1.1 on 1 degrees of freedom, p= 0.303 > # 用strata来控制协变量的影响 > survdiff(Surv(time, status) ~ pat.karno + strata(inst), data=lung) Call: survdiff(formula = Surv(time, status) ~ pat.karno + strata(inst), data = lung) n=224, 4 observations deleted due to missingness. N Observed Expected (O-E)^2/E (O-E)^2/V pat.karno=30 2 1 0.692 0.13720 0.15752 pat.karno=40 2 1 1.099 0.00889 0.00973 pat.karno=50 4 4 1.166 6.88314 7.45359 pat.karno=60 30 27 16.298 7.02790 9.57333 pat.karno=70 41 31 26.358 0.81742 1.14774 pat.karno=80 50 38 41.938 0.36978 0.60032 pat.karno=90 60 38 47.242 1.80800 3.23078 pat.karno=100 35 21 26.207 1.03451 1.44067 Chisq= 21.4 on 7 degrees of freedom, p= 0.00326 > survdiff(Surv(time, status) ~ pat.karno + sex, data=lung) Call: survdiff(formula = Surv(time, status) ~ pat.karno + sex, data = lung) n=225, 3 observations deleted due to missingness. N Observed Expected (O-E)^2/E (O-E)^2/V pat.karno=30, sex=1 1 1 0.246 2.31e+00 2.33e+00 pat.karno=30, sex=2 1 0 0.412 4.12e-01 4.16e-01 pat.karno=40, sex=1 1 1 0.132 5.68e+00 5.72e+00 pat.karno=40, sex=2 1 0 1.204 1.20e+00 1.22e+00 pat.karno=50, sex=1 2 2 0.111 3.23e+01 3.25e+01 pat.karno=50, sex=2 2 2 0.968 1.10e+00 1.11e+00 pat.karno=60, sex=1 18 17 9.173 6.68e+00 7.14e+00 pat.karno=60, sex=2 12 10 6.064 2.56e+00 2.68e+00 pat.karno=70, sex=1 30 23 16.737 2.34e+00 2.65e+00 pat.karno=70, sex=2 11 8 9.527 2.45e-01 2.62e-01 pat.karno=80, sex=1 32 26 24.570 8.32e-02 9.91e-02 pat.karno=80, sex=2 19 13 16.311 6.72e-01 7.55e-01 pat.karno=90, sex=1 31 25 24.992 2.66e-06 3.21e-06 pat.karno=90, sex=2 29 13 24.420 5.34e+00 6.33e+00 pat.karno=100, sex=1 21 15 14.002 7.11e-02 7.91e-02 pat.karno=100, sex=2 14 6 13.131 3.87e+00 4.25e+00 Chisq= 66.1 on 15 degrees of freedom, p= 2.17e-08
anova
计算一个或多个拟合模型的方差(或偏差)表的分析
mod1<-survreg(Surv(time, status) ~ ph.ecog + age + sex, data=lung,dist = "weibull") mod2<-survreg(Surv(time, status) ~ ph.ecog + age + sex+ph.karno*pat.karno, data=lung,dist = "weibull") anova(mod1,mod2,test="Chisq") Terms Resid. Df -2*LL Test Df Deviance Pr(>Chi) 1 ph.ecog + age + sex 222 2264.877 NA NA NA 2 ph.ecog + age + sex + ph.karno * pat.karno 215 2209.372 = 7 55.50527 1.183848e-09 library(MASS) # 简洁模型更好 stepAIC(mod1) stepAIC(mod2)