Beneath and Beyond the Cox Model

2022-09-06

by Joseph Rickert

The Cox Proportional Hazards model has so dominated survival analysis over the past forty years that I imagine quite a few people who regularly analyze survival data might assume that the Cox model, along with the Kaplan-Meier estimator and a few standard parametric models, encompass just about everything there is to say about the subject. It would not be surprising if this were true because it is certainly the case that these tools have dominated the teaching of survival analysis. Very few introductory textbooks look beyond the Cox Model and the handful of parametric models built around Gompertz, Weibull and logistic functions. But why do Cox models work so well? What is the underlying theory? How do all the pieces of the standard survival tool kit fit together?

As it turns out, Kaplan-Meier estimators, the Cox Proportional Hazards model, Aalen-Johansen estimators, parametric models, multistate models, competing risk models, frailty models and almost every other survival analysis technique implemented in the vast array of R packages comprising the CRAN Survival Task View, are supported by an elegant mathematical theory that formulates time-to-event analyses as stochastic counting models. The theory is about thirty years old. It was initiated by Odd Aalen in his 1975 Berkeley PhD dissertation, developed over the following twenty years largely by Scandinavian statisticians and their collaborators, and set down in more or less complete form in two complementary textbooks [5 and 9] by 1993. Unfortunately, because of its dependency on measure theory, martingales, stochastic integrals and other notions from advanced mathematics, it does not appear that the counting process theory of survival analysis has filtered down in a form that is readily accessible by practitioners.

In this rest of this post, I would like to suggest a path for getting a working knowledge of this theory by introducing two very readable papers, which taken together, provide an excellent overview of the relationship of counting processes to some familiar aspects of survival analysis.

The first paper, The History of applications of martingales in survival analysis by Aalen, Andersen, Borgan, Gill, and Keiding [4] is a beautiful historical exposition of the counting process theory by master statisticians who developed a good bit of the theory themselves. Read through this paper in an hour of so and you will have an overview of the theory, see elementary explanations for some of the mathematics involved, and gain a working idea of how the major pieces of the theory fit together how they came together.

The second paper, Who needs the Cox model anyway? [7], is actually a teaching note put together by Bendix Carstensen. It is a lesson with an attitude and the R code to back it up. Carstensen demonstrates the equivalence of the Cox model to a particular Poisson regression model. Working through this note is like seeing a magic trick and then learning how it works.

The following reproduces a portion of Carstensen’s note. I provide some commentary and fill in a few elementary details in the hope that I can persuade you that it is worth the trouble to spend some time with it yourself.

Carstensen use the North Central Cancer Treatment Group lung cancer survival data set which is included in the survival package for his examples.

## # A tibble: 228 × 10
##     inst  time status   age   sex ph.ecog ph.karno pat.karno meal.cal wt.loss
##    <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>    <dbl>     <dbl>    <dbl>   <dbl>
##  1     3   306      2    74     1       1       90       100     1175      NA
##  2     3   455      2    68     1       0       90        90     1225      15
##  3     3  1010      1    56     1       0       90        90       NA      15
##  4     5   210      2    57     1       1       90        60     1150      11
##  5     1   883      2    60     1       0      100        90       NA       0
##  6    12  1022      1    74     1       1       50        80      513       0
##  7     7   310      2    68     2       2       70        60      384      10
##  8    11   361      2    71     2       2       60        80      538       1
##  9     1   218      2    53     1       1       70        80      825      16
## 10     7   166      2    61     1       2       70        70      271      34
## # … with 218 more rows

It may not be obvious at first because there is no subject ID column, but this data frame contains one row for each of 228 subjects. The first column is an institution code. time is the time of death or censoring. status is the censoring indicator. The remaining columns are covariates. I select time, status, sex and age, drop the others from the our working data frame, and then replicate Carstensen’s preprocessing in a tidy way. The second line of mutate() adds a small number to each event time to avoid ties.

set.seed(1952)
lung <- lung %>% select(time, status, age, sex) %>% 
                  mutate(sex = factor(sex,labels=c("M","F")),
                         time = time + round(runif(nrow(lung),-3,3),2))

To get a feel for the data we fit a Kaplan-Meier Curve stratified by sex.

surv.obj <- with(lung, Surv(time, status == 2))
fit.by_sex <- survfit(surv.obj ~ sex, data = lung, conf.type = "log-log")
autoplot(fit.by_sex,
          xlab = "Survival Time (Days) ", 
          ylab = "Survival Probabilities",
         main = "Kaplan-Meier plot of lung data by sex") +  
 theme(plot.title = element_text(hjust = 0.5))

Next, following Carstensen, I fit the baseline Cox model to be used in the model comparisons below.

m0.cox <- coxph( Surv( time, status==2 ) ~ age + sex, data=lung )
summary( m0.cox )

## Call:
## coxph(formula = Surv(time, status == 2) ~ age + sex, data = lung)
## 
##   n= 228, number of events= 165 
## 
##          coef exp(coef) se(coef)     z Pr(>|z|)   
## age   0.01705   1.01720  0.00922  1.85   0.0643 . 
## sexF -0.52033   0.59433  0.16751 -3.11   0.0019 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      exp(coef) exp(-coef) lower .95 upper .95
## age      1.017      0.983     0.999     1.036
## sexF     0.594      1.683     0.428     0.825
## 
## Concordance= 0.603  (se = 0.025 )
## Likelihood ratio test= 14.4  on 2 df,   p=7e-04
## Wald test            = 13.8  on 2 df,   p=0.001
## Score (logrank) test = 14  on 2 df,   p=9e-04

The hazard ratios for age and sexF are given in the output column labeled exp(coef). As Carstensen points out mortality increases by 1.7% per year of age at diagnosis and that women have 40% lower mortality than men.

Carstensen next shows that this model can be exactly replicated by a particular and somewhat peculiar Poisson model. Doing this requires a shift in how time is conceived. In the Kaplan-Meier estimator and the Cox model, time is part of the response vector. In the counting process formulation, time is a covariate. Time is divided into many small intervals of length h in which an individuals “exit status” , d is recorded. d will be 1 if death occurred or 0 otherwise. The h intervals represent an individual’s risk time. The pair (d, h) are used to calculate an empirical rate for the process which corresponds to the theoretical rate:

λ(t) = lim_h→0 P[event in (t, t + h)| at risk at time t]/h (*)

The first step in formulating a Poisson model is to set up a data structure that will allow for this more nuanced treatment of time. The function Lexis from the Epi package creates an object of class Lexis, a data frame with columns that will be used to distinguish event time (death or censoring time) from the time intervals in which subjects are at risk for the event. Collectively, these intervals span period from when the first until the last recorded time.

Lung <- Epi::Lexis( exit = list( tfe=time ),
  exit.status = factor( status, labels=c("Alive","Dead") ),
  data = lung )

## NOTE: entry.status has been set to "Alive" for all.
## NOTE: entry is assumed to be 0 on the tfe timescale.

head(Lung)

##  lex.id tfe lex.dur lex.Cst lex.Xst   time status age sex
##       1   0   308.7   Alive    Dead  308.7      2  74   M
##       2   0   457.4   Alive    Dead  457.4      2  68   M
##       3   0  1008.6   Alive   Alive 1008.6      1  56   M
##       4   0   212.1   Alive    Dead  212.1      2  57   M
##       5   0   885.5   Alive    Dead  885.5      2  60   M
##       6   0  1023.7   Alive   Alive 1023.7      1  74   M

The new variables are:

lex.id - Subject ID
tfe - Time from entry at the beginning of the follow-up interval
lex.dur - Duration of the follow-up interval
lex.Cst - Entry status (Alive in our case)
lex.Xst - Exit status at the end of the follow-up interval: tfe + lex.dur (Either Alive or Dead in our case)

Next, the time variable is sorted to produce a vector of endpoints for the at risk intervals and a new time-split Lexis data frame is created using the splitMulti() function from the popEpi package.

Lung.s <- splitMulti( Lung, tfe=c(0,sort(unique(Lung$time))) )
head(Lung.s)

##  lex.id   tfe lex.dur lex.Cst lex.Xst  time status age sex
##       1  0.00    7.67   Alive   Alive 308.7      2  74   M
##       1  7.67    1.88   Alive   Alive 308.7      2  74   M
##       1  9.55    0.23   Alive   Alive 308.7      2  74   M
##       1  9.78    0.57   Alive   Alive 308.7      2  74   M
##       1 10.35    2.25   Alive   Alive 308.7      2  74   M
##       1 12.60    0.45   Alive   Alive 308.7      2  74   M

summary( Lung.s )

##        
## Transitions:
##      To
## From    Alive Dead  Records:  Events: Risk time:  Persons:
##   Alive 25941  165     26106      165      69632       228

tfe tracks the time from entry into the study. This is calendar time.

Carstensen then fits a Cox model to both the Lexis data set and the time-split Lexis data set and notes that the results match the original baseline Cox model. This is as one would expect since the three different data frames contain the same information. Nevertheless, it is a pleasant surprise that the coxph() and Surv() functions are flexible enough to assimilate the three different input data formats.

mL.cox <- coxph( Surv( tfe, tfe+lex.dur, lex.Xst=="Dead" ) ~ age + sex, eps=10^-11, iter.max=25, data=Lung )
mLs.cox <- coxph( Surv( tfe, tfe+lex.dur, lex.Xst=="Dead" ) ~ age + sex, eps=10^-11, iter.max=25, data=Lung.s )
round( cbind( ci.exp(m0.cox), ci.exp(mL.cox), ci.exp(mLs.cox) ), 6 )

##      exp(Est.)  2.5%  97.5% exp(Est.)  2.5%  97.5% exp(Est.)  2.5%  97.5%
## age     1.0172 0.999 1.0357    1.0172 0.999 1.0357    1.0172 0.999 1.0357
## sexF    0.5943 0.428 0.8253    0.5943 0.428 0.8253    0.5943 0.428 0.8253

Now, Carstensen executes what would seem to be a very strange modeling maneuver. He turns calender time, tfe into a factor and fits a Cox model with tfe as a covariate.

mLs.pois.fc <- glm( cbind(lex.Xst=="Dead",lex.dur) ~ 0 + factor(tfe) + age + sex, family=poisreg, data=Lung.s )

An important technical point is that the time intervals in equation (*) above do not satisfy the independence assumption for a Poisson regression model. Nevertheless, the standard glm() machinery can be used to fit the model because, as Carstensen demonstrates, the likelihood function for the conditional probabilities is proportional to the partial likelihood function of the Cox model.

cbind( ci.exp(mLs.cox),ci.exp( mLs.pois.fc, subset=c("age","sex") ) )

##      exp(Est.)  2.5%  97.5% exp(Est.)  2.5%  97.5%
## age     1.0172 0.999 1.0357    1.0172 0.999 1.0357
## sexF    0.5943 0.428 0.8253    0.5943 0.428 0.8253

Carstensen concludes that this demonstrates that the Cox model is equivalent to a specific Poisson model which has one rate parameter for each time internal, and emphasizes that this is not a new result. He notes that the equivalence was demonstrated some time ago, theoretically by Theodore Holford [10], and in practice, by John Whitehead [14]. Also, in a vignette [15] for the survival package Zhong et al. state that this trick may be used to approximate a Cox model.

Carstensen then demonstrates that more practical Poisson models can be fit by using splines to decrease the number of at risk intervals. The first uses a spline basis with arbitrary knot locations and the second fits a penalized spline gam model.

splines

t.kn <- c(0,25,100,500,1000) 
mLs.pois.sp <- glm( cbind(lex.Xst=="Dead",lex.dur) ~ Ns(tfe,knots=t.kn) + age + sex, family=poisreg, data=Lung.s )

Penalized splines

mLs.pois.ps <- mgcv::gam( cbind(lex.Xst=="Dead",lex.dur) ~ s(tfe) + age + sex, family=poisreg, data=Lung.s )

Carstensen finishes up this portion of his analysis by noting the similarity of the estimates of age and sex effects from the different models.

ests <-
 rbind( ci.exp(m0.cox),
 ci.exp(mLs.cox),
 ci.exp(mLs.pois.fc,subset=c("age","sex")),
 ci.exp(mLs.pois.sp,subset=c("age","sex")),
 ci.exp(mLs.pois.ps,subset=c("age","sex")) )

cmp <- cbind( ests[c(1,3,5,7,9) ,],
 ests[c(1,3,5,7,9)+1,] )

rownames( cmp ) <-
 c("Cox","Cox-split","Poisson-factor","Poisson-spline","Poisson-penSpl")

 colnames( cmp )[c(1,4)] <- c("age","sex")
round( cmp,5 )

##                  age   2.5% 97.5%    sex   2.5%  97.5%
## Cox            1.017 0.9990 1.036 0.5943 0.4280 0.8253
## Cox-split      1.017 0.9990 1.036 0.5943 0.4280 0.8253
## Poisson-factor 1.017 0.9990 1.036 0.5943 0.4280 0.8253
## Poisson-spline 1.016 0.9980 1.035 0.5993 0.4316 0.8322
## Poisson-penSpl 1.016 0.9983 1.035 0.6021 0.4338 0.8358

This demonstration provides some convincing evidence that both parametric and non-parametric models are part of a single underlying theory! When you think about it, this is an astonishing idea. To further explore the counting process theory of survival models, I provide a definition of Aalen’s multiplicative intensity model and a list of references below that I hope you will find helpful.

Finally, there is much more to Carstensen’s note than I have presented. He goes on to provide a fairly complete analysis of the lung data while looking at cumulative rates, survival, practical time splitting, time varying coefficients and more ideas along the way.

Appendix: Multiplicative Intensity Model

For some direct insight into how the Cox Proportional Hazards model fits into the counting process theory have a look at Odd Aalen’s definition of the multiplicative intensity model. Aalan begins his landmark 1978 paper Nonparametric Inference For a Family of Counting Processes] [1] by defining the fundamental components of his multiplicative intensity model.

Let N = (N₁, . . . N_k) be a multivariate counting process which is a collection of univariate counting processes on the interval [0,t] each of which counts events in [0,t]. The N_i may depend on each other. Let σ(F_t) be the sigma algebra which represents the collection of all events that can be determined to have happened by time, t. Let α = α₁, . . . α be an unknown, non-negative function and let Y = (Y₁, . . . Y_k) be a process observable over [0,t].

Define Λ_i(t) = lim_h→0E(N_i(t + h) - N_i(t) | F_t )/h i = 1, … k, to be the the intensity process of the counting process N.

Then the multiplicative intensity model is defined to be: Λ_i(t) = α_i(t)Y_i(t).

This last line certainly looks like the Cox model, and it is not to difficult to confirm that this is indeed the case. You can find the gory details in Fleming and Harrington [9 p 126] or comprehensive monograph by Andersen et al. [5 p481].

References

I believe that the following references (annotated with a few comments) comprise a reasonable basis from gaining familiarity with the counting process approach to survival modeling.

Aalen (1978) Odd O. Aalen Nonparametric Inference For a Family of Counting Processes. The Annals of Statistics 1978, vol 6, no 4, 701-726 This is the ‘Ur’ paper for the multiplicative intensity process. At least the first half should be approachable with some knowledge of measure theory and conditional expectation.
Aalen & Johansen (1978) Odd O. Allen and Soren Johansen. An Empirical Transition Matrix for Non-homogenous Markov Chains Based on Censored Observations. Scand J Statistics 5: 141-150, 1978. This is the source of the Aalen-Johansen estimator. The etm package provides an implementation.
Aalen et al. (2008) Odd O. Aalen, Ørnulf Borgan and Håkon K. Gjessing. Survival and Event History Analysis; A Process Point of View Springer Verlag 2008. This is definitely the text to read first. It is comprehensive, takes a modern point of view, is well written, and introduces the difficult mathematics without all of the technical details that often slow down the process of learning some new mathematics.
Aalen et al. (2009). Odd O. Aalen, Per Kragh Andersen, Ørnulf Borgan, Richard D. Gill and Niels Keiding. History of applications of martingales in survival analysis vol 5, no 1, June 2009
Andersen et al. (1993) Per Kragh Andersen, Ørnulf Borgan, Richard D. Gill, Niels Keiding. Statistical Models Based on Counting Processes. Springer-Verlag, 1993 This text presents numerous examples along with a discussion of the theory and emphasizes parametric models.
Borgan (1997). Ørnulf Borgan Three contributions to the Encyclopedia of Biostatistics: The Nelson-Aalen, Kaplan-Meier, and Aalen-Johansen estimators - Very clear summaries.
Carstensen (2019) Bendex Carstensen. Who needs the Cox model anyway?
Cox (1972) Regression Models and Life-Tables JRSS Vol. 34, No.2, 187-200
Fleming & Harrington (1991). Thomas R. Fleming and David P. Harrington. Counting Processes & Survival Analysis, John Wiley & Sons, Inc. 1991 This text book develops all of the math needed and goes on to study non-parametric models including the Cox model.
Holford. Life table with concomitant information. Biometrics, 32:587{597, 1976.
Mohammed (2019). Introduction to Survival Analysis using R
Threneau (2022). The survival package (vignette)
Therneau et al. (1990). Martingale based residuals for survival models. Biometrika 77, 147-160. Martingale residuals appear to be the one use case where martingales openly surface in the survival calculations.
Whitehead (1980). Fitting Cox’s regression model to survival data using GLIM. Applied Statistics, 29(3):268{275, 1980.
Zhong et al. (2019). Approximating a Cox Model. This is a survival package vignette.