<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Survival Analysis on R Views</title>
    <link>https://rviews.rstudio.com/tags/survival-analysis/</link>
    <description>Recent content in Survival Analysis on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 19 Apr 2023 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/tags/survival-analysis/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>Multistate Models for Medical Applications</title>
      <link>https://rviews.rstudio.com/2023/04/19/multistate-models-for-medical-applications/</link>
      <pubDate>Wed, 19 Apr 2023 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2023/04/19/multistate-models-for-medical-applications/</guid>
      <description>
        


&lt;p&gt;Clinical research studies and healthcare economics studies are frequently concerned with assessing the prognosis for survival in circumstances where patients suffer from a disease that progresses from state to state. Standard survival models only directly model two states: alive and dead. Multi-state models enable directly modeling disease progression where patients are observed to be in various states of health or disease at random intervals, but for which, except for death, the times of entering or leaving states are unknown. Multi-state models easily accommodate &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3684949/#:~:text=In%20statistical%20literature%2C%20interval%20censoring,instead%20of%20being%20observed%20exactly.&#34;&gt;interval censored&lt;/a&gt; intermediate states while making the usual assumption that death times are known but may be &lt;a href=&#34;https://en.wikipedia.org/wiki/Censoring_(statistics)&#34;&gt;right censored&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A natural way to conceptualize modeling the dynamics of disease progression with interval censored states is as continuous time Markov chains. The following diagram illustrates a possible disease progression model where there is some possibility of dying from any state, but otherwise a patient would progress from being healthy, to mild disease, to severe disease and then perhaps death.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/2023/04/19/multistate-models-for-medical-applications/index_files/figure-html/unnamed-chunk-1-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;It is true that the Markov assumption implies that the times patients spend in the various states are exponentially distributed. However, the mathematical theory of stochastic multi-state processes is very rich and can accommodate more realistic models with state dependent hazard rates that vary over time and other relaxations of the Markov assumption. Moreover, there is robust software in R (and other languages) that makes multi-state stochastic survival models practical.&lt;/p&gt;
&lt;p&gt;In the remainder of this post, I present a variation of a disease progression model discussed in some detail by Ardo van den Hout in his incredibly informative and very readable monograph &lt;a href=&#34;https://www.routledge.com/Multi-State-Survival-Models-for-Interval-Censored-Data/Hout/p/book/9780367570569&#34;&gt;Multi-State Survival Models for Interval-Censored Data&lt;/a&gt;. Van den Hout’s model is itself an elaboration of the main example presented by Christopher Jackson in the &lt;a href=&#34;https://cran.r-project.org/web/packages/msm/vignettes/msm-manual.pdf&#34;&gt;vignette&lt;/a&gt; to his &lt;code&gt;msm&lt;/code&gt; package. This post walks more slowly through the model developed by Jackson and van den Hout, which might make it easier to follow for a person not already familiar with the &lt;code&gt;msm&lt;/code&gt; package.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(tidymodels)
library(msm)&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;the-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;The Data&lt;/h3&gt;
&lt;p&gt;The data set explored by both Jackson and van den Hout is the Cardiac Allograft Vasculopathy (CAV) data set, which contains the individual histories of angiographic examinations of 622 heart transplant recipients collected at the Papworth Hospital in the United Kingdom. This data is included in the &lt;code&gt;msm&lt;/code&gt; package and is a good candidate to be the &lt;em&gt;iris&lt;/em&gt; dataset for progressive disease models. It is a rich data set with 2846 rows, multiple covariates (including patient age and time since transplant, both of which can be used as time scales), multiple state transitions among four states, and no missing values. Observations of intermediate states are interval censored and have been recorded at varying time intervals. Deaths are “exact” or right censored.&lt;/p&gt;
&lt;p&gt;The following code creates a new variable that preserves the original state data for each observation and displays the data in tibble format.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(1234)
df &amp;lt;- tibble(cav) %&amp;gt;% mutate(o_state = state)

df&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2,846 × 11
##     PTNUM   age years  dage   sex pdiag cumrej state firstobs statemax o_state
##     &amp;lt;int&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt;  &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;    &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt;   &amp;lt;int&amp;gt;
##  1 100002  52.5  0       21     0 IHD        0     1        1        1       1
##  2 100002  53.5  1.00    21     0 IHD        2     1        0        1       1
##  3 100002  54.5  2.00    21     0 IHD        2     2        0        2       2
##  4 100002  55.6  3.09    21     0 IHD        2     2        0        2       2
##  5 100002  56.5  4       21     0 IHD        3     2        0        2       2
##  6 100002  57.5  5.00    21     0 IHD        3     3        0        3       3
##  7 100002  58.4  5.85    21     0 IHD        3     4        0        4       4
##  8 100003  29.5  0       17     0 IHD        0     1        1        1       1
##  9 100003  30.7  1.19    17     0 IHD        1     1        0        1       1
## 10 100003  31.5  2.01    17     0 IHD        1     3        0        3       3
## # ℹ 2,836 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The state table, which presents the number of times each pair of states was observed at successive observation times, shows 46 transitions from state 2 (Mild CAV) to state 1 (No CAV), 4 transitions from state 3 (Severe CAV) to No CAV, and 13 transitions from Severe CAV to Mild CAV.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;statetable.msm(state = state, subject = PTNUM, data = df)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##     to
## from    1    2    3    4
##    1 1367  204   44  148
##    2   46  134   54   48
##    3    4   13  107   55&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I will follow van den Hout and assume these backward transitions are misclassifications, and alter the state variable so that there is no backsliding. The following code does this in a tidy way and also creates a new variable, &lt;code&gt;b_age&lt;/code&gt;, which records the baseline age at which patients entered the study. (Note: you can find van den Hout’s code &lt;a href=&#34;https://www.ucl.ac.uk/~ucakadl/Book/Ch1_CAV_MsmAnalysis.r&#34;&gt;here&lt;/a&gt;.)&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;df1 &amp;lt;- df %&amp;gt;% group_by(PTNUM) %&amp;gt;% 
                     mutate(b_age = min(age),
                            state = cummax(state)
                     )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This transformation will make the state transition table conform to the diagram above, but with state 1 representing No CAV rather than Health.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;statetable.msm(state = state, subject = PTNUM, data = df1)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##     to
## from    1    2    3    4
##    1 1336  185   40  139
##    2    0  220   52   49
##    3    0    0  140   63&lt;/code&gt;&lt;/pre&gt;
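&lt;p&gt;To see what &lt;code&gt;cummax()&lt;/code&gt; does to a single patient’s history, here is a minimal sketch on a made-up sequence of observed states. The vector &lt;code&gt;observed&lt;/code&gt; is hypothetical and is not taken from the CAV data. Each state is replaced by the highest state seen so far, so any apparent recovery is treated as a misclassification.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical sequence of observed states for one patient
observed &amp;lt;- c(1, 2, 1, 3, 2, 4)

# Running maximum: once a higher state is seen, later lower states are overwritten
progressive &amp;lt;- cummax(observed)

# Compare the two histories side by side
rbind(observed, progressive)&lt;/code&gt;&lt;/pre&gt;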
&lt;/div&gt;
&lt;div id=&#34;setting-up-and-running-the-model&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Setting Up and Running the Model&lt;/h3&gt;
&lt;p&gt;The next step is to set up the model using the function &lt;code&gt;msm()&lt;/code&gt; whose great flexibility means that some care must be taken to set parameter values.&lt;/p&gt;
&lt;p&gt;First, we set up the initial guess for the intensity matrix, Q, which determines the transition rates among states for a continuous time Markov chain. For the &lt;code&gt;msm&lt;/code&gt; function, positive values indicate possible transitions.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Intensity matrix Q:
q &amp;lt;- 0.01
Q &amp;lt;- rbind(c(0,q,0,q), c(0,0,q,q),c(0,0,0,q),c(0,0,0,0))
qnames &amp;lt;- c(&amp;quot;q12&amp;quot;,&amp;quot;q14&amp;quot;,&amp;quot;q23&amp;quot;,&amp;quot;q24&amp;quot;,&amp;quot;q34&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
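&lt;p&gt;As a quick check on the structure of Q, the following sketch lists the transitions that the positive entries permit and fills in the diagonal so that each row sums to zero, which is the usual convention for an intensity matrix. The names &lt;code&gt;allowed&lt;/code&gt;, &lt;code&gt;labels&lt;/code&gt; and &lt;code&gt;Q_full&lt;/code&gt; are mine. Filling in the diagonal is only for illustration: as far as I understand the &lt;code&gt;msm&lt;/code&gt; documentation, the diagonal of the initial matrix does not matter.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Same initial intensity matrix as above
q &amp;lt;- 0.01
Q &amp;lt;- rbind(c(0,q,0,q), c(0,0,q,q), c(0,0,0,q), c(0,0,0,0))

# Positions of the positive entries; transposing makes which() walk Q row by row
allowed &amp;lt;- which(t(Q) &amp;gt; 0, arr.ind = TRUE)

# In the transposed matrix, &amp;quot;col&amp;quot; is the from-state and &amp;quot;row&amp;quot; is the to-state
labels &amp;lt;- paste0(&amp;quot;q&amp;quot;, allowed[, &amp;quot;col&amp;quot;], allowed[, &amp;quot;row&amp;quot;])
labels

# A proper intensity matrix has rows that sum to zero
Q_full &amp;lt;- Q
diag(Q_full) &amp;lt;- -rowSums(Q)
rowSums(Q_full)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The labels produced this way should line up with the entries of &lt;code&gt;qnames&lt;/code&gt;.&lt;/p&gt;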
&lt;p&gt;Next, we set up the covariate structure which van den Hout discusses in his monograph, but does not show in the code on the book’s website referenced above. For this model, transitions from state 1 to state 2 and from state 1 to state 4 depend on time,&lt;code&gt;years&lt;/code&gt;, the age of the patient at transplant time &lt;code&gt;b_age&lt;/code&gt;, and &lt;code&gt;dage&lt;/code&gt;, the age of the donor. The other transitions depend only on &lt;code&gt;dage&lt;/code&gt;. So, we see that &lt;code&gt;msm()&lt;/code&gt; can deal with time varying covariates as well as permitting individual state transitions to be driven by different covariates.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;covariates = list(&amp;quot;1-2&amp;quot; = ~ years + b_age + dage , 
                  &amp;quot;1-4&amp;quot; = ~ years + b_age + dage ,
                  &amp;quot;2-3&amp;quot; = ~ dage,
                  &amp;quot;2-4&amp;quot; = ~ dage,
                  &amp;quot;3-4&amp;quot; = ~ dage)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, we set the remaining parameters for the model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;obstype &amp;lt;- 1
center &amp;lt;- FALSE
deathexact &amp;lt;- TRUE
method &amp;lt;- &amp;quot;BFGS&amp;quot;
control &amp;lt;- list(trace = 0, REPORT = 1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;obstype = 1&lt;/strong&gt; indicates that observations have been taken at arbitrary time points. They are &lt;em&gt;snapshots&lt;/em&gt; of the process that are common for panel data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;center = FALSE&lt;/strong&gt; means that covariates will not be centered at their means during the maximum likelihood estimation process. The default for this parameter is TRUE.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;deathexact = TRUE&lt;/strong&gt; indicates that the final absorbing state is exactly observed. This is the defining assumption of survival data. In &lt;code&gt;msm&lt;/code&gt; this is equivalent to setting obstype = 3 for state 4, our absorbing state.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;method = BFGS&lt;/strong&gt; tells &lt;code&gt;optim()&lt;/code&gt; to use the optimization method published simultaneously in 1970 by Broyden, Fletcher, Goldfarb and Shanno (see &lt;a href=&#34;https://en.wikipedia.org/wiki/Charles_George_Broyden&#34;&gt;here&lt;/a&gt;). This is a quasi-Newton method that uses function values and gradients to build up a picture of the surface to be optimized.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;control = list(trace=0,REPORT=1)&lt;/strong&gt; indicates more parameters that will be passed to &lt;code&gt;optim()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;REPORT&lt;/strong&gt; sets the frequency of reports for the “BFGS”, “L-BFGS-B” and “SANN” methods if control$trace is positive. It defaults to every 10 iterations for “BFGS” and “L-BFGS-B”, or every 100 temperatures for “SANN”. (Note: SANN is a variant of the simulated annealing method presented by C. J. P. Belisle (1992) &lt;em&gt;Convergence theorems for a class of simulated annealing algorithms on R&lt;sup&gt;d&lt;/sup&gt;&lt;/em&gt;, Journal of Applied Probability.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;trace&lt;/strong&gt; is also passed to &lt;code&gt;optim()&lt;/code&gt;. It must be a non-negative integer. If positive, tracing information on the progress of the optimization is produced. Higher values may produce more tracing information.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1 &amp;lt;- msm(state~years, subject = PTNUM, data = df1, center= center, 
             qmatrix=Q, obstype = obstype, deathexact = deathexact, method = method,
             covariates = covariates, control = control)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First, check whether the model has converged. The convergence codes returned by &lt;code&gt;optim()&lt;/code&gt; include 0, which indicates successful convergence, and 1, which indicates that the maximum iteration limit was reached. Codes 51 and 52 signal a warning and an error, respectively, but only from the “L-BFGS-B” method.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Model Status
conv &amp;lt;- model_1$opt$convergence; cat(&amp;quot;Convergence code =&amp;quot;, conv,&amp;quot;\n&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Convergence code = 0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, assess how well the model fits the data using a visual test proposed by &lt;a href=&#34;https://onlinelibrary.wiley.com/doi/10.1002/sim.4780130803&#34;&gt;Gentleman et al. (1994)&lt;/a&gt;, which plots, for each state, the observed numbers of individuals occupying the state at a series of times against forecasts from the fitted model. The &lt;code&gt;msm&lt;/code&gt; function &lt;code&gt;plot.prevalence.msm()&lt;/code&gt; produces a perfectly adequate base R plot. However, to emphasize that &lt;code&gt;msm&lt;/code&gt; users are not limited to base R plots, I’ll do a little extra work to use &lt;code&gt;ggplot()&lt;/code&gt;. When a package author is kind enough to provide an extractor function, you can do anything you want with the data.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;prevalence.msm()&lt;/code&gt; function extracts both the observed and forecast prevalence matrices.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;prev &amp;lt;- prevalence.msm(model_1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following code is not very elegant, but it is straightforward: it reshapes the observed and expected prevalence data into long format and plots them.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# reshape observed prevalence
do1 &amp;lt;-as_tibble(row.names(prev$Observed)) %&amp;gt;% rename(time = value) %&amp;gt;% 
          mutate(time = as.numeric(time))
do2 &amp;lt;-as_tibble(prev$Observed) %&amp;gt;% mutate(type = &amp;quot;observed&amp;quot;)
do &amp;lt;- cbind(do1,do2) %&amp;gt;% select(-Total)
do_l &amp;lt;- do %&amp;gt;% gather(state, number, -time, -type)
# reshape expected prevalence
de1 &amp;lt;-as_tibble(row.names(prev$Expected)) %&amp;gt;% rename(time = value) %&amp;gt;% 
          mutate(time = as.numeric(time))
de2 &amp;lt;-as_tibble(prev$Expected) %&amp;gt;% mutate(type = &amp;quot;expected&amp;quot;)
de &amp;lt;- cbind(de1,de2) %&amp;gt;% select(-Total) 
de_l &amp;lt;- de %&amp;gt;% gather(state, number, -time, -type) 

# bind into a single data frame
prev_l &amp;lt;-rbind(do_l,de_l) %&amp;gt;% mutate(type = factor(type),
                                     state = factor(state),
                                     time = round(time,3))


# plot for comparison
prev_gp &amp;lt;- prev_l %&amp;gt;% group_by(state)
pp &amp;lt;- prev_l %&amp;gt;% ggplot() +
     geom_line(aes(time, number, color = type)) +
     xlab(&amp;quot;time&amp;quot;) +
     ylab(&amp;quot;&amp;quot;) +
     ggtitle(&amp;quot;&amp;quot;)
pp + facet_wrap(~state)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/2023/04/19/multistate-models-for-medical-applications/index_files/figure-html/unnamed-chunk-13-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The agreement between the observed and forecast prevalence for states 1 through 3 looks pretty good. After about 8 years, the observed deaths are notably higher than the forecast. As Jackson points out (see page 33 of the &lt;a href=&#34;https://cran.r-project.org/web/packages/msm/vignettes/msm-manual.pdf&#34;&gt;msm Manual&lt;/a&gt;), this kind of discrepancy could indicate that the underlying process is not homogeneous. I have attempted to capture this non-homogeneity by having some of the transitions depend on time. Although the plot above looks a little better than the plot in the manual, which does not attempt to model non-homogeneity, it is apparent that there is room to find a better model!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;survival-curves-and-calculations&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Survival Curves and Calculations&lt;/h3&gt;
&lt;p&gt;Now, we can jump straight to the major result and look at the fitted survival curves. There is a &lt;code&gt;plot()&lt;/code&gt; method for &lt;code&gt;msm&lt;/code&gt; that will directly plot these curves. However, I want to emphasize that when a package author is kind enough to provide a &lt;code&gt;plot&lt;/code&gt; method, it is usually not too difficult to adapt the method’s code to an alternative plotting system. To save space, I will not show my code, but you can easily recreate it by starting with the &lt;code&gt;plot.msm()&lt;/code&gt; function, deleting the plotting parts, and returning the values for time and the states Health, Mild_CAV, and Severe_CAV, which are used in the code below. You can check my hack by running &lt;code&gt;plot(model_1)&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# plot_prep was obtained from plot.msm()
res &amp;lt;- plot_prep(model_1)
time &amp;lt;- res[[1]]
Health &amp;lt;- res[[2]]
Mild_CAV &amp;lt;- res[[3]]
Severe_CAV &amp;lt;- res[[4]]
df_w &amp;lt;- tibble(time,Health, Mild_CAV, Severe_CAV)
df_l &amp;lt;- df_w %&amp;gt;% gather(&amp;quot;state&amp;quot;, &amp;quot;survival&amp;quot;, -time)
p &amp;lt;- df_l %&amp;gt;% ggplot(aes(time, 1 - survival, group = state)) +
     geom_line(aes(color = state)) +
     xlab(&amp;quot;Years&amp;quot;) +
     ylab(&amp;quot;Probability&amp;quot;) +
     ggtitle(&amp;quot;Fitted Survival Probabilities&amp;quot;)
p&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/2023/04/19/multistate-models-for-medical-applications/index_files/figure-html/unnamed-chunk-15-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;These curves indicate that a treatment that could prevent CAV or at least delay progression from mild CAV to severe CAV might prolong survival. Additionally, the Markov structure of the model permits extracting information that relates to disease progression and the total time spent in each state.&lt;/p&gt;
&lt;p&gt;The function &lt;code&gt;totlos.msm()&lt;/code&gt; estimates the total expected time that a patient will spend in each state. Parameter settings for this function include:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;start&lt;/strong&gt; = c(1,0,0,0) specifies that patients will start in state 1 with probability 1.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;fromt&lt;/strong&gt; = 0 indicates starting at the beginning of the process.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;covariates&lt;/strong&gt; = “mean” indicates that the covariates will be set to their mean values for the calculation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;total_state_time &amp;lt;- totlos.msm(model_1, start = c(1,0,0,0), fromt = 0, covariates = &amp;quot;mean&amp;quot;)
total_state_time&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## State 1 State 2 State 3 State 4 
##   7.002   2.473   1.621     Inf&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The table indicates that the mean time a patient can expect to avoid CAV is about 7 years. After progressing to Mild CAV, a patient can expect about five additional years.&lt;/p&gt;
&lt;p&gt;A more direct calculation based on the intensity matrix, Q, gives the expected time to the “absorbing” state, Death, from each of the “transient” living states.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;time_to_death &amp;lt;- efpt.msm(model_1, tostate = 4, covariates = &amp;quot;mean&amp;quot;)
time_to_death&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 11.097  5.836  3.005  0.000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This agrees with the total state times calculated above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sum(total_state_time[1:3])&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 11.1&lt;/code&gt;&lt;/pre&gt;
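&lt;p&gt;The direct calculation can also be sketched by hand. For a continuous time Markov chain with one absorbing state, the expected times to absorption solve a linear system built from the rows and columns of Q that belong to the transient states. The sketch below copies the rounded point estimates from the &lt;code&gt;qmatrix.msm()&lt;/code&gt; output at mean covariate values that appears in the Hazard Ratios section of this post, so it is an illustration under that assumption rather than a description of what &lt;code&gt;efpt.msm()&lt;/code&gt; runs internally. The names &lt;code&gt;Q_mean&lt;/code&gt;, &lt;code&gt;transient&lt;/code&gt; and &lt;code&gt;m&lt;/code&gt; are mine.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Intensity matrix at mean covariate values (copied from printed output, rounded)
Q_mean &amp;lt;- rbind(c(-0.14281,  0.10019,  0.00000, 0.04262),
                c( 0.00000, -0.28369,  0.21821, 0.06548),
                c( 0.00000,  0.00000, -0.33281, 0.33281),
                c( 0.00000,  0.00000,  0.00000, 0.00000))

# States 1 to 3 are the living (transient) states
transient &amp;lt;- 1:3

# Expected time to death m from each living state solves (-Q_TT) m = 1
m &amp;lt;- solve(-Q_mean[transient, transient], rep(1, 3))
round(m, 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Up to rounding in the copied intensities, the three values should be close to the first three entries returned by &lt;code&gt;efpt.msm()&lt;/code&gt; above.&lt;/p&gt;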
&lt;p&gt;Within the scope of the information provided by the covariates, it is also possible to generate more individualized forecasts. For example, here is the expected time to death for a person starting off with no CAV at age 60, who received a heart from a 20 year old donor, 5 years after the transplant.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;efpt.msm(model_1, tostate = 4, start = c(1,0,0,0), covariates = list(years = 5, b_age = 60, dage = 20))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##       [,1]
## [1,] 7.953&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A related quantity, mean sojourn time, is the mean time that each visit to a state is expected to last. Since we are assuming a progressive disease model where each patient visits each state at most once, the estimate should be close to the total time spent in each state. However, Jackson notes that in a progressive model, the sojourn time in a disease state will be greater than the expected length of stay in that state, because the mean sojourn time in a state is conditional on entering the state, whereas the expected total time in a disease state is a forecast for an individual who may die before getting the disease (see help(totlos.msm)). And indeed, that is what we see here for states 2 and 3.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sojourn.msm(model_1, covariates=&amp;quot;mean&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         estimates     SE     L     U
## State 1     7.002 0.4024 6.256 7.837
## State 2     3.525 0.3020 2.980 4.169
## State 3     3.005 0.3748 2.353 3.837&lt;/code&gt;&lt;/pre&gt;
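&lt;p&gt;Jackson’s point can be checked with a little arithmetic. In this sketch, which copies rounded intensities from the &lt;code&gt;qmatrix.msm()&lt;/code&gt; output at mean covariate values shown in the Hazard Ratios section, the mean sojourn time in a state is the reciprocal of the total rate of leaving it, and the expected total time in a state is the sojourn time multiplied by the probability of ever reaching that state. The variable names are mine, and the calculation assumes the progressive structure of this particular model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Off-diagonal intensities at mean covariate values (copied, rounded)
q12 &amp;lt;- 0.10019; q14 &amp;lt;- 0.04262
q23 &amp;lt;- 0.21821; q24 &amp;lt;- 0.06548
q34 &amp;lt;- 0.33281

# Mean sojourn time = 1 / (total rate of leaving the state)
sojourn &amp;lt;- c(1 / (q12 + q14), 1 / (q23 + q24), 1 / q34)

# Probability of ever entering each state when starting in state 1
p_reach &amp;lt;- c(1,
             q12 / (q12 + q14),
             (q12 / (q12 + q14)) * (q23 / (q23 + q24)))

# Expected total time in each state, starting from state 1
total_time &amp;lt;- p_reach * sojourn
round(rbind(sojourn, p_reach, total_time), 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under these intensities only about 70% of patients who start in state 1 ever reach Mild CAV, which is why the expected total time in state 2 is shorter than its mean sojourn time.&lt;/p&gt;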
&lt;/div&gt;
&lt;div id=&#34;hazard-ratios&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Hazard Ratios&lt;/h3&gt;
&lt;p&gt;Printing &lt;code&gt;model_1&lt;/code&gt; also produces estimates of hazard ratios, which show the estimated effect of each covariate on each transition intensity.&lt;/p&gt;
&lt;p&gt;Here is the table of Hazard Ratios:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Call:
## msm(formula = state ~ years, subject = PTNUM, data = df1, qmatrix = Q,     obstype = obstype, covariates = covariates, deathexact = deathexact,     center = center, method = method, control = control)
## 
## Maximum likelihood estimates
## Baselines are with covariates set to 0
## 
## Transition intensities with hazard ratios for each covariate
##                   Baseline                         years              
## State 1 - State 1 -0.032750 (-0.0607897,-0.017644)                    
## State 1 - State 2  0.030957 ( 0.0160470, 0.059721) 1.112 (1.061,1.166)
## State 1 - State 4  0.001793 ( 0.0004703, 0.006836) 1.093 (1.012,1.182)
## State 2 - State 2 -0.395633 (-0.6488723,-0.241227)                    
## State 2 - State 3  0.264310 ( 0.1488153, 0.469441) 1.000              
## State 2 - State 4  0.131323 ( 0.0330133, 0.522385) 1.000              
## State 3 - State 3 -0.434548 (-0.9113857,-0.207192)                    
## State 3 - State 4  0.434548 ( 0.2071918, 0.911386) 1.000              
##                   b_age                dage                 
## State 1 - State 1                                           
## State 1 - State 2 1.001 (0.9884,1.014) 1.0281 (1.0159,1.040)
## State 1 - State 4 1.053 (1.0271,1.079) 1.0208 (1.0039,1.038)
## State 2 - State 2                                           
## State 2 - State 3 1.000                0.9932 (0.9756,1.011)
## State 2 - State 4 1.000                0.9757 (0.9298,1.024)
## State 3 - State 3                                           
## State 3 - State 4 1.000                0.9906 (0.9672,1.015)
## 
## -2 * log-likelihood:  3466&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The table shows that time, the covariate &lt;code&gt;years&lt;/code&gt;, affects disease progression as represented by the transition from state 1 to state 2, but has a smaller effect on the transition from state 1 to state 4.&lt;/p&gt;
&lt;p&gt;The covariate &lt;code&gt;b_age&lt;/code&gt;, the baseline age of the patient at transplant time, has a larger effect on dying before the onset of CAV than on the transition to CAV.&lt;/p&gt;
&lt;p&gt;The covariate &lt;code&gt;dage&lt;/code&gt; has a minor effect on the transitions from state 1 but apparently has no effect thereafter.&lt;/p&gt;
&lt;p&gt;The hazard ratios are computed by calculating the exponential of the estimated covariate effects on the log-transition intensities for the Markov process which are stored in the model object.&lt;/p&gt;
&lt;p&gt;To see how these work, first look at the baseline transition intensities.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1$Qmatrices$baseline&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##          State 1  State 2 State 3  State 4
## State 1 -0.03275  0.03096  0.0000 0.001793
## State 2  0.00000 -0.39563  0.2643 0.131323
## State 3  0.00000  0.00000 -0.4345 0.434548
## State 4  0.00000  0.00000  0.0000 0.000000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These baseline intensities are the entries of the fitted intensity matrix, Q, with all covariates set to zero. They can also be extracted directly from the model, together with confidence intervals, by the &lt;code&gt;qmatrix.msm()&lt;/code&gt; extractor function with covariates set to zero.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;qmatrix.msm(model_1,  covariates = 0)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1                          State 2                         
## State 1 -0.032750 (-0.0607897,-0.017644)  0.030957 ( 0.0160470, 0.059721)
## State 2 0                                -0.395633 (-0.6488723,-0.241227)
## State 3 0                                0                               
## State 4 0                                0                               
##         State 3                          State 4                         
## State 1 0                                 0.001793 ( 0.0004703, 0.006836)
## State 2  0.264310 ( 0.1488153, 0.469441)  0.131323 ( 0.0330133, 0.522385)
## State 3 -0.434548 (-0.9113857,-0.207192)  0.434548 ( 0.2071918, 0.911386)
## State 4 0                                0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The 95% confidence limits are computed by assuming normality of the log-effect.&lt;/p&gt;
&lt;p&gt;A more representative value for the intensity matrix for this model can be obtained by setting the covariates to their mean values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;qmatrix.msm(model_1,  covariates = &amp;quot;mean&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1                      State 2                     
## State 1 -0.14281 (-0.15984,-0.12760)  0.10019 ( 0.08742, 0.11482)
## State 2 0                            -0.28369 (-0.33556,-0.23984)
## State 3 0                            0                           
## State 4 0                            0                           
##         State 3                      State 4                     
## State 1 0                             0.04262 ( 0.03339, 0.05441)
## State 2  0.21821 ( 0.17636, 0.26999)  0.06548 ( 0.03872, 0.11075)
## State 3 -0.33281 (-0.42498,-0.26064)  0.33281 ( 0.26064, 0.42498)
## State 4 0                            0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we may want to examine the contribution of the individual covariates to the hazard ratios. To take a particular example, look at the estimated effects of &lt;code&gt;dage&lt;/code&gt; on the log transition intensities&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1$Qmatrices$dage&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2   State 3   State 4
## State 1       0 0.02771  0.000000  0.020575
## State 2       0 0.00000 -0.006783 -0.024625
## State 3       0 0.00000  0.000000 -0.009439
## State 4       0 0.00000  0.000000  0.000000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and focus on the contribution of &lt;code&gt;dage&lt;/code&gt; to the intensity for the transition from state 3 to state 4, which is given as -0.009439 in the table above. Taking the exponential of this value yields the hazard ratio for the &lt;code&gt;dage&lt;/code&gt; state 3 to 4 transition in the hazard ratio table we got by printing out &lt;code&gt;model_1&lt;/code&gt; above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;exp(-.009439)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.9906&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The corresponding tables for the remaining covariates, again on the log scale, are given by:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1$Qmatrices$years&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2 State 3 State 4
## State 1       0  0.1064       0 0.08933
## State 2       0  0.0000       0 0.00000
## State 3       0  0.0000       0 0.00000
## State 4       0  0.0000       0 0.00000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1$Qmatrices$b_age&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1   State 2 State 3 State 4
## State 1       0 0.0009645       0 0.05152
## State 2       0 0.0000000       0 0.00000
## State 3       0 0.0000000       0 0.00000
## State 4       0 0.0000000       0 0.00000&lt;/code&gt;&lt;/pre&gt;
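&lt;p&gt;Putting the pieces together, the intensity for a given patient profile is the baseline intensity multiplied by the exponential of the covariate effects times the covariate values. The sketch below does this by hand for the profile years = 5, b_age = 60, dage = 20, using rounded values copied from the tables above. The object names &lt;code&gt;baseline&lt;/code&gt;, &lt;code&gt;beta&lt;/code&gt;, &lt;code&gt;z&lt;/code&gt; and &lt;code&gt;q_z&lt;/code&gt; are mine, and the result is only as accurate as the rounding in the printed output.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Baseline intensities in the order q12, q14, q23, q24, q34 (copied from the printed model)
baseline &amp;lt;- c(q12 = 0.030957, q14 = 0.001793, q23 = 0.264310,
              q24 = 0.131323, q34 = 0.434548)

# Log hazard ratios per covariate, same column order (zero where a covariate is not in the model)
beta &amp;lt;- rbind(years = c(0.1064,    0.08933,   0,         0,         0),
              b_age = c(0.0009645, 0.05152,   0,         0,         0),
              dage  = c(0.02771,   0.020575, -0.006783, -0.024625, -0.009439))

# Patient profile, in the same order as the rows of beta
z &amp;lt;- c(years = 5, b_age = 60, dage = 20)

# Intensity at profile z = baseline * exp(sum of effect * covariate value)
q_z &amp;lt;- baseline * exp(colSums(beta * z))
round(q_z, 4)

# Probability of still being in state 1 after 5 years under these intensities
exp(-5 * (q_z[[&amp;quot;q12&amp;quot;]] + q_z[[&amp;quot;q14&amp;quot;]]))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The last value should be close to the State 1 to State 1 entry of the five-year transition matrix computed for a 60-year-old recipient in the next section.&lt;/p&gt;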
&lt;/div&gt;
&lt;div id=&#34;exploring-transition-probabilities-and-intensities&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Exploring Transition Probabilities and Intensities&lt;/h3&gt;
&lt;p&gt;It is also possible to look at the state transition matrix at different times and see how these probabilities change over time. Here we compute the transition matrix at 1 year.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pmatrix.msm(model_1, t = 1, covariates = &amp;quot;mean&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2  State 3 State 4
## State 1  0.8669 0.08101 0.008493 0.04357
## State 2  0.0000 0.75300 0.160340 0.08666
## State 3  0.0000 0.00000 0.716903 0.28310
## State 4  0.0000 0.00000 0.000000 1.00000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and at 5 years.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pmatrix.msm(model_1, t = 5, covariates = &amp;quot;mean&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2 State 3 State 4
## State 1  0.4897  0.1761 0.07871  0.2556
## State 2  0.0000  0.2421 0.23419  0.5237
## State 3  0.0000  0.0000 0.18937  0.8106
## State 4  0.0000  0.0000 0.00000  1.0000&lt;/code&gt;&lt;/pre&gt;
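&lt;p&gt;With the covariates held fixed, these matrices are the matrix exponential of t times the intensity matrix. As a rough check, the sketch below rebuilds them in base R from the rounded intensities at mean covariate values printed in the Hazard Ratios section, using an eigendecomposition, which works here because the diagonal entries of Q are distinct. The helper &lt;code&gt;p_from_q()&lt;/code&gt; is hypothetical and is not part of &lt;code&gt;msm&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Intensity matrix at mean covariate values (copied from printed output, rounded)
Q_mean &amp;lt;- rbind(c(-0.14281,  0.10019,  0.00000, 0.04262),
                c( 0.00000, -0.28369,  0.21821, 0.06548),
                c( 0.00000,  0.00000, -0.33281, 0.33281),
                c( 0.00000,  0.00000,  0.00000, 0.00000))

# P(t) = V exp(t * D) V^{-1}, where Q = V D V^{-1}
p_from_q &amp;lt;- function(Q, t) {
  e &amp;lt;- eigen(Q)
  P &amp;lt;- e$vectors %*% diag(exp(e$values * t)) %*% solve(e$vectors)
  Re(P)  # eigenvalues are real here; Re() only drops a zero imaginary part if present
}

P1 &amp;lt;- p_from_q(Q_mean, t = 1)
P5 &amp;lt;- p_from_q(Q_mean, t = 5)
round(P1, 4)
round(P5, 4)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Up to rounding in the copied intensities, the two results should agree with the one-year and five-year matrices above.&lt;/p&gt;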
&lt;p&gt;Additionally, it is possible to examine the effect of covariates on transition probabilities. Here are the 5-year transition probabilities for a patient with a baseline age of 35 and a 20-year-old donor, with &lt;code&gt;years&lt;/code&gt; set to 5.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pmatrix.msm(model_1, t = 5, covariates = list(years = 5, b_age = 35, dage = 20))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2 State 3 State 4
## State 1  0.5474  0.1674 0.07575  0.2095
## State 2  0.0000  0.2112 0.21622  0.5726
## State 3  0.0000  0.0000 0.16547  0.8345
## State 4  0.0000  0.0000 0.00000  1.0000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and for a patient who had the procedure at age 60.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pmatrix.msm(model_1, t = 5, covariates = list(years = 5, b_age = 60, dage = 20))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2 State 3 State 4
## State 1  0.3863  0.1409 0.06784  0.4050
## State 2  0.0000  0.2112 0.21622  0.5726
## State 3  0.0000  0.0000 0.16547  0.8345
## State 4  0.0000  0.0000 0.00000  1.0000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that the transitions from CAV states are unaffected.&lt;/p&gt;
&lt;p&gt;To summarize: Continuous Time Markov Chains provide a natural framework for working with multi-state survival models. The &lt;code&gt;msm&lt;/code&gt; package is sufficiently sophisticated to permit modeling clinical processes with a level of fidelity that may provide insight into clinically observed disease progression. The software is relatively easy to use and there is plenty of documentation to help you get started.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;learning-more-about-multi-state-survival-models&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Learning More About Multi-State Survival Models&lt;/h3&gt;
&lt;p&gt;To dive deeper into multi-state survival models, I am sure you will find Ardo van den Hout’s &lt;a href=&#34;https://www.routledge.com/Multi-State-Survival-Models-for-Interval-Censored-Data/Hout/p/book/9780367570569&#34;&gt;Multi-State Survival Models for Interval-Censored Data&lt;/a&gt; extraordinarily helpful. There are many good textbooks about the basics of Continuous Time Markov Chains. I recommend J. R. Norris’s &lt;a href=&#34;https://www.cambridge.org/core/books/markov-chains/A3F966B10633A32C8F06F37158031739&#34;&gt;Markov Chains&lt;/a&gt;, which is still modestly priced. There are also many expositions freely available on the internet, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;David F. Anderson - &lt;a href=&#34;https://u.math.biu.ac.il/~amirgi/CTMCnotes.pdf&#34;&gt;Chapter 6: Continuous Time Markov Chains&lt;/a&gt; from &lt;a href=&#34;https://u.math.biu.ac.il/~amirgi/SBA.pdf&#34;&gt;Lecture Notes on Stochastic Processes with Applications in Biology&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Miranda Holmes-Cerfon - &lt;a href=&#34;https://cims.nyu.edu/~holmes/teaching/asa19/handout_Lecture4_2019.pdf&#34;&gt;Lecture 4: Continuous-time Markov Chains&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Søren Feodor Nielsen - &lt;a href=&#34;http://web.math.ku.dk/~susanne/kursusstokproc/ContinuousTime.pdf&#34;&gt;Continuous-time homogeneous Markov chains&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Karl Sigman - &lt;a href=&#34;http://www.columbia.edu/~ks20/stochastic-I/stochastic-I-CTMC.pdf&#34;&gt;Continuous-Time Markov Chains&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

      </description>
    </item>
    
    <item>
      <title>Beneath and Beyond the Cox Model</title>
      <link>https://rviews.rstudio.com/2022/09/06/deep-survival/</link>
      <pubDate>Tue, 06 Sep 2022 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2022/09/06/deep-survival/</guid>
      <description>
        


&lt;p&gt;The Cox Proportional Hazards model has so dominated survival analysis over the past forty years that I imagine quite a few people who regularly analyze survival data might assume that the Cox model, along with the Kaplan-Meier estimator and a few standard parametric models, encompasses just about everything there is to say about the subject. It would not be surprising if this were true because it is certainly the case that these tools have dominated the teaching of survival analysis. Very few introductory textbooks look beyond the Cox Model and the handful of parametric models built around Gompertz, Weibull and logistic functions. But why do Cox models work so well? What is the underlying theory? How do all the pieces of the standard survival tool kit fit together?&lt;/p&gt;
&lt;p&gt;As it turns out, Kaplan-Meier estimators, the Cox Proportional Hazards model, Aalen-Johansen estimators, parametric models, multistate models, competing risk models, frailty models and almost every other survival analysis technique implemented in the vast array of R packages comprising the CRAN &lt;a href=&#34;https://cran.r-project.org/web/views/Survival.html&#34;&gt;Survival Task View&lt;/a&gt; are supported by an elegant mathematical theory that formulates time-to-event analyses as stochastic counting processes. The theory is about thirty years old. It was initiated by Odd Aalen in his 1975 Berkeley PhD dissertation, developed over the following twenty years largely by Scandinavian statisticians and their collaborators, and set down in more or less complete form in two complementary textbooks [5 and 9] by 1993. Unfortunately, because of its dependency on measure theory, martingales, stochastic integrals and other notions from advanced mathematics, it does not appear that the counting process theory of survival analysis has filtered down in a form that is readily accessible to practitioners.&lt;/p&gt;
&lt;p&gt;In the rest of this post, I would like to suggest a path for getting a working knowledge of this theory by introducing two very readable papers which, taken together, provide an excellent overview of the relationship of counting processes to some familiar aspects of survival analysis.&lt;/p&gt;
&lt;p&gt;The first paper, &lt;em&gt;History of applications of martingales in survival analysis&lt;/em&gt; by Aalen, Andersen, Borgan, Gill, and Keiding [4], is a beautiful historical exposition of the counting process theory by master statisticians who developed a good bit of the theory themselves. Read through this paper in an hour or so and you will have an overview of the theory, see elementary explanations for some of the mathematics involved, and gain a working idea of how the major pieces of the theory fit together and how they came together.&lt;/p&gt;
&lt;p&gt;The second paper, &lt;em&gt;Who needs the Cox model anyway?&lt;/em&gt; [7], is actually a teaching &lt;em&gt;note&lt;/em&gt; put together by Bendix Carstensen. It is a lesson with an attitude and the R code to back it up. Carstensen demonstrates the equivalence of the Cox model to a particular Poisson regression model. Working through this &lt;em&gt;note&lt;/em&gt; is like seeing a magic trick and then learning how it works.&lt;/p&gt;
&lt;p&gt;The following reproduces a portion of Carstensen’s &lt;em&gt;note&lt;/em&gt;. I provide some commentary and fill in a few elementary details in the hope that I can persuade you that it is worth the trouble to spend some time with it yourself.&lt;/p&gt;
&lt;p&gt;Carstensen uses the North Central Cancer Treatment Group lung cancer survival data set, which is included in the &lt;code&gt;survival&lt;/code&gt; package, for his examples.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 228 × 10
##     inst  time status   age   sex ph.ecog ph.karno pat.karno meal.cal wt.loss
##    &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt;     &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
##  1     3   306      2    74     1       1       90       100     1175      NA
##  2     3   455      2    68     1       0       90        90     1225      15
##  3     3  1010      1    56     1       0       90        90       NA      15
##  4     5   210      2    57     1       1       90        60     1150      11
##  5     1   883      2    60     1       0      100        90       NA       0
##  6    12  1022      1    74     1       1       50        80      513       0
##  7     7   310      2    68     2       2       70        60      384      10
##  8    11   361      2    71     2       2       60        80      538       1
##  9     1   218      2    53     1       1       70        80      825      16
## 10     7   166      2    61     1       2       70        70      271      34
## # … with 218 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It may not be obvious at first because there is no subject ID column, but this data frame contains one row for each of 228 subjects. The first column is an institution code. &lt;code&gt;time&lt;/code&gt; is the time of death or censoring. &lt;code&gt;status&lt;/code&gt; is the censoring indicator. The remaining columns are covariates. I select &lt;code&gt;time&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;sex&lt;/code&gt; and &lt;code&gt;age&lt;/code&gt;, drop the others from our working data frame, and then replicate Carstensen’s preprocessing in a tidy way. The second line of &lt;code&gt;mutate()&lt;/code&gt; adds a small random number to each event time to avoid ties.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(1952)
lung &amp;lt;- lung %&amp;gt;% select(time, status, age, sex) %&amp;gt;% 
                  mutate(sex = factor(sex,labels=c(&amp;quot;M&amp;quot;,&amp;quot;F&amp;quot;)),
                         time = time + round(runif(nrow(lung),-3,3),2))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To get a feel for the data we fit a Kaplan-Meier Curve stratified by sex.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;surv.obj &amp;lt;- with(lung, Surv(time, status == 2))
fit.by_sex &amp;lt;- survfit(surv.obj ~ sex, data = lung, conf.type = &amp;quot;log-log&amp;quot;)
# autoplot() for survfit objects is assumed to come from the ggfortify package
autoplot(fit.by_sex,
         xlab = &amp;quot;Survival Time (Days)&amp;quot;,
         ylab = &amp;quot;Survival Probabilities&amp;quot;,
         main = &amp;quot;Kaplan-Meier plot of lung data by sex&amp;quot;) +
  theme(plot.title = element_text(hjust = 0.5))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/2022/09/06/deep-survival/index_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Next, following Carstensen, I fit the baseline Cox model to be used in the model comparisons below.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;m0.cox &amp;lt;- coxph( Surv( time, status==2 ) ~ age + sex, data=lung )
summary( m0.cox )&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Call:
## coxph(formula = Surv(time, status == 2) ~ age + sex, data = lung)
## 
##   n= 228, number of events= 165 
## 
##          coef exp(coef) se(coef)     z Pr(&amp;gt;|z|)   
## age   0.01705   1.01720  0.00922  1.85   0.0643 . 
## sexF -0.52033   0.59433  0.16751 -3.11   0.0019 **
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1
## 
##      exp(coef) exp(-coef) lower .95 upper .95
## age      1.017      0.983     0.999     1.036
## sexF     0.594      1.683     0.428     0.825
## 
## Concordance= 0.603  (se = 0.025 )
## Likelihood ratio test= 14.4  on 2 df,   p=7e-04
## Wald test            = 13.8  on 2 df,   p=0.001
## Score (logrank) test = 14  on 2 df,   p=9e-04&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The hazard ratios for &lt;code&gt;age&lt;/code&gt; and &lt;code&gt;sexF&lt;/code&gt; are given in the output column labeled &lt;em&gt;exp(coef)&lt;/em&gt;. As Carstensen points out, mortality increases by 1.7% per year of age at diagnosis, and women have about 40% lower mortality than men.&lt;/p&gt;
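&lt;p&gt;To make the arithmetic behind those two percentages explicit, here is a small base R sketch that starts from the coefficients and standard errors printed in the &lt;code&gt;summary()&lt;/code&gt; output. The variable names are mine, and the confidence limits are ordinary Wald limits, which I am assuming is what the printed &lt;em&gt;lower .95&lt;/em&gt; and &lt;em&gt;upper .95&lt;/em&gt; columns report.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Coefficients and standard errors copied from the summary() output
coefs &amp;lt;- c(age = 0.01705, sexF = -0.52033)
ses   &amp;lt;- c(age = 0.00922, sexF = 0.16751)

hr    &amp;lt;- exp(coefs)                        # hazard ratios
lower &amp;lt;- exp(coefs - qnorm(0.975) * ses)   # Wald 95% lower limits
upper &amp;lt;- exp(coefs + qnorm(0.975) * ses)   # Wald 95% upper limits

# pct_change is the percentage change in the hazard per unit of the covariate
round(cbind(HR = hr, lower, upper, pct_change = 100 * (hr - 1)), 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;pct_change&lt;/code&gt; column is where the “1.7% per year” and “about 40% lower” statements come from.&lt;/p&gt;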
&lt;p&gt;Carstensen next shows that this model can be exactly replicated by a particular and somewhat peculiar Poisson model. Doing this requires a shift in how time is conceived. In the Kaplan-Meier estimator and the Cox model, time is part of the response vector. In the counting process formulation, time is a covariate. Time is divided into many small intervals of length &lt;em&gt;h&lt;/em&gt; in which an individual’s “exit status”, &lt;em&gt;d&lt;/em&gt;, is recorded. &lt;em&gt;d&lt;/em&gt; will be 1 if death occurred and 0 otherwise. The &lt;em&gt;h&lt;/em&gt; intervals represent an individual’s risk time. The pair (&lt;em&gt;d&lt;/em&gt;, &lt;em&gt;h&lt;/em&gt;) is used to calculate an empirical rate for the process which corresponds to the theoretical rate:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;λ(t) = lim&lt;sub&gt;h→0&lt;/sub&gt; &lt;strong&gt;P&lt;/strong&gt;[event in (t, t + h)| at risk at time t]/h      (*)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The first step in formulating a Poisson model is to set up a data structure that will allow for this more nuanced treatment of time. The function &lt;code&gt;Lexis&lt;/code&gt; from the &lt;a href=&#34;http://bendixcarstensen.com/Epi/&#34;&gt;&lt;code&gt;Epi&lt;/code&gt;&lt;/a&gt; package creates an object of class Lexis, a data frame with columns that will be used to distinguish event time (death or censoring time) from the time intervals in which subjects are at risk for the event. Collectively, these intervals span the period from the first to the last recorded time.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Lung &amp;lt;- Epi::Lexis( exit = list( tfe=time ),
  exit.status = factor( status, labels=c(&amp;quot;Alive&amp;quot;,&amp;quot;Dead&amp;quot;) ),
  data = lung )&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## NOTE: entry.status has been set to &amp;quot;Alive&amp;quot; for all.
## NOTE: entry is assumed to be 0 on the tfe timescale.&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(Lung)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  lex.id tfe lex.dur lex.Cst lex.Xst   time status age sex
##       1   0   308.7   Alive    Dead  308.7      2  74   M
##       2   0   457.4   Alive    Dead  457.4      2  68   M
##       3   0  1008.6   Alive   Alive 1008.6      1  56   M
##       4   0   212.1   Alive    Dead  212.1      2  57   M
##       5   0   885.5   Alive    Dead  885.5      2  60   M
##       6   0  1023.7   Alive   Alive 1023.7      1  74   M&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The new variables are:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;lex.id - Subject ID&lt;/li&gt;
&lt;li&gt;tfe - Time from entry at the beginning of the follow-up interval&lt;/li&gt;
&lt;li&gt;lex.dur - Duration of the follow-up interval&lt;/li&gt;
&lt;li&gt;lex.Cst - Entry status (Alive in our case)&lt;/li&gt;
&lt;li&gt;lex.Xst - Exit status at the end of the follow-up interval: tfe + lex.dur (Either Alive or Dead in our case)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Next, the &lt;code&gt;time&lt;/code&gt; variable is sorted to produce a vector of endpoints for the at risk intervals and a new &lt;em&gt;time-split&lt;/em&gt; Lexis data frame is created using the &lt;code&gt;splitMulti()&lt;/code&gt; function from the &lt;a href=&#34;https://github.com/FinnishCancerRegistry/popEpi&#34;&gt;&lt;code&gt;popEpi&lt;/code&gt;&lt;/a&gt; package.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Lung.s &amp;lt;- splitMulti( Lung, tfe=c(0,sort(unique(Lung$time))) )
head(Lung.s)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  lex.id   tfe lex.dur lex.Cst lex.Xst  time status age sex
##       1  0.00    7.67   Alive   Alive 308.7      2  74   M
##       1  7.67    1.88   Alive   Alive 308.7      2  74   M
##       1  9.55    0.23   Alive   Alive 308.7      2  74   M
##       1  9.78    0.57   Alive   Alive 308.7      2  74   M
##       1 10.35    2.25   Alive   Alive 308.7      2  74   M
##       1 12.60    0.45   Alive   Alive 308.7      2  74   M&lt;/code&gt;&lt;/pre&gt;
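&lt;p&gt;To see what time splitting does to a single record, here is a minimal base R sketch. The helper &lt;code&gt;split_followup()&lt;/code&gt; is hypothetical (it is not part of &lt;code&gt;Epi&lt;/code&gt; or &lt;code&gt;popEpi&lt;/code&gt;) and the subject is made up; only the break points are taken from the &lt;code&gt;tfe&lt;/code&gt; column printed above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper, not part of Epi or popEpi: split the follow-up of
# one subject, observed from time 0 to exit, at the supplied break points
split_followup &amp;lt;- function(exit, died, breaks) {
  inner &amp;lt;- breaks[breaks &amp;gt; 0 &amp;amp; breaks &amp;lt; exit]   # breaks inside the follow-up
  cuts  &amp;lt;- sort(unique(c(0, inner, exit)))
  start &amp;lt;- cuts[-length(cuts)]
  end   &amp;lt;- cuts[-1]
  data.frame(tfe     = start,                 # time from entry at interval start
             lex.dur = end - start,           # risk time in the interval
             lex.Xst = ifelse(died &amp;amp; end == exit, &amp;quot;Dead&amp;quot;, &amp;quot;Alive&amp;quot;))
}

# Made-up subject who dies at day 12.6
split_followup(exit = 12.6, died = TRUE,
               breaks = c(7.67, 9.55, 9.78, 10.35, 12.60, 15))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each row is one at risk interval: the durations add up to the original follow-up time, and only the last interval can carry the event.&lt;/p&gt;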
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary( Lung.s )&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##        
## Transitions:
##      To
## From    Alive Dead  Records:  Events: Risk time:  Persons:
##   Alive 25941  165     26106      165      69632       228&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;tfe&lt;/code&gt; tracks the time from entry into the study. This is each subject’s follow-up time, measured from their own entry, rather than calendar time.&lt;/p&gt;
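&lt;p&gt;The summary also gives everything needed for the crudest possible empirical rate, events divided by risk time. This is just my own back-of-the-envelope illustration of the (&lt;em&gt;d&lt;/em&gt;, &lt;em&gt;h&lt;/em&gt;) idea, using the two totals printed above and assuming, as in the &lt;code&gt;lung&lt;/code&gt; data, that time is measured in days.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;events    &amp;lt;- 165      # deaths, from the Events column
risk_time &amp;lt;- 69632    # person-days at risk, from the Risk time column

rate_per_day  &amp;lt;- events / risk_time
rate_per_year &amp;lt;- rate_per_day * 365.25
c(per_day = rate_per_day, per_year = rate_per_year)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That works out to roughly 0.0024 deaths per person-day. Splitting the follow-up into more records changes neither total, so this overall rate is the same for the split and unsplit data; the models below let the rate vary over time and with covariates instead.&lt;/p&gt;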
&lt;p&gt;Carstensen then fits a Cox model to both the Lexis data set and the time-split Lexis data set and notes that the results match the original baseline Cox model. This is as one would expect since the three different data frames contain the same information. Nevertheless, it is a pleasant surprise that the &lt;code&gt;coxph()&lt;/code&gt; and &lt;code&gt;Surv()&lt;/code&gt; functions are flexible enough to assimilate the three different input data formats.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mL.cox &amp;lt;- coxph( Surv( tfe, tfe+lex.dur, lex.Xst==&amp;quot;Dead&amp;quot; ) ~ age + sex, eps=10^-11, iter.max=25, data=Lung )
mLs.cox &amp;lt;- coxph( Surv( tfe, tfe+lex.dur, lex.Xst==&amp;quot;Dead&amp;quot; ) ~ age + sex, eps=10^-11, iter.max=25, data=Lung.s )
round( cbind( ci.exp(m0.cox), ci.exp(mL.cox), ci.exp(mLs.cox) ), 6 )&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##      exp(Est.)  2.5%  97.5% exp(Est.)  2.5%  97.5% exp(Est.)  2.5%  97.5%
## age     1.0172 0.999 1.0357    1.0172 0.999 1.0357    1.0172 0.999 1.0357
## sexF    0.5943 0.428 0.8253    0.5943 0.428 0.8253    0.5943 0.428 0.8253&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, Carstensen executes what would seem to be a very strange modeling maneuver. He turns the time scale, &lt;code&gt;tfe&lt;/code&gt;, into a factor and fits a Poisson model with &lt;code&gt;tfe&lt;/code&gt; as a covariate.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mLs.pois.fc &amp;lt;- glm( cbind(lex.Xst==&amp;quot;Dead&amp;quot;,lex.dur) ~ 0 + factor(tfe) + age + sex, family=poisreg, data=Lung.s ) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An important technical point is that the time intervals in equation (*) above do not satisfy the independence assumption for a Poisson regression model. Nevertheless, the standard &lt;code&gt;glm()&lt;/code&gt; machinery can be used to fit the model because, as Carstensen demonstrates, the likelihood function for the conditional probabilities is proportional to the partial likelihood function of the Cox model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cbind( ci.exp(mLs.cox),ci.exp( mLs.pois.fc, subset=c(&amp;quot;age&amp;quot;,&amp;quot;sex&amp;quot;) ) )&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##      exp(Est.)  2.5%  97.5% exp(Est.)  2.5%  97.5%
## age     1.0172 0.999 1.0357    1.0172 0.999 1.0357
## sexF    0.5943 0.428 0.8253    0.5943 0.428 0.8253&lt;/code&gt;&lt;/pre&gt;
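&lt;p&gt;If you would like to try the same trick without installing &lt;code&gt;Epi&lt;/code&gt; or &lt;code&gt;popEpi&lt;/code&gt;, the following is a minimal sketch that uses only the &lt;code&gt;survival&lt;/code&gt; package. It is not Carstensen’s code: I am swapping in &lt;code&gt;survSplit()&lt;/code&gt; for the Lexis machinery and a plain &lt;code&gt;poisson&lt;/code&gt; family with a log risk time offset for &lt;code&gt;poisreg&lt;/code&gt;. It also uses the unjittered &lt;code&gt;lung&lt;/code&gt; data, so I ask &lt;code&gt;coxph()&lt;/code&gt; for Breslow tie handling, which is the form of the partial likelihood that the Poisson model reproduces. Intervals in which nobody dies are dropped because they carry no information about the age and sex effects.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(survival)

# Untouched copy of the lung data (no jitter), with a 0/1 event indicator
d &amp;lt;- survival::lung[, c(&amp;quot;time&amp;quot;, &amp;quot;status&amp;quot;, &amp;quot;age&amp;quot;, &amp;quot;sex&amp;quot;)]
d$event &amp;lt;- as.integer(d$status == 2)

# Split follow-up at every distinct observed time, then keep only the
# intervals that end in at least one death
ds &amp;lt;- survSplit(Surv(time, event) ~ age + sex, data = d,
                cut = sort(unique(d$time)), episode = &amp;quot;interval&amp;quot;)
ds &amp;lt;- ds[ave(ds$event, ds$interval, FUN = sum) &amp;gt; 0, ]
ds$dur &amp;lt;- ds$time - ds$tstart

# One rate parameter per interval, log(risk time) as offset
fit_pois &amp;lt;- glm(event ~ 0 + factor(interval) + age + sex + offset(log(dur)),
                family = poisson, data = ds)

# Cox model with Breslow ties for comparison
fit_cox &amp;lt;- coxph(Surv(time, event) ~ age + sex, data = d, ties = &amp;quot;breslow&amp;quot;)

rbind(cox = coef(fit_cox), poisson = coef(fit_pois)[c(&amp;quot;age&amp;quot;, &amp;quot;sex&amp;quot;)])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The two rows should agree to within the convergence tolerance of the fitting routines. Because this sketch keeps &lt;code&gt;sex&lt;/code&gt; as a numeric 1/2 code and does not jitter the times, the estimates will be close to, but not identical with, the ones shown above.&lt;/p&gt;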
&lt;p&gt;Carstensen concludes that this demonstrates that the Cox model is equivalent to a specific Poisson model which has one rate parameter for each time interval, and emphasizes that this is not a new result. He notes that the equivalence was demonstrated some time ago, theoretically by Theodore Holford [10], and in practice, by John Whitehead [14]. Also, in a vignette [15] for the &lt;code&gt;survival&lt;/code&gt; package Zhong et al. state that this &lt;em&gt;trick&lt;/em&gt; may be used to approximate a Cox model.&lt;/p&gt;
&lt;p&gt;Carstensen then demonstrates that more practical Poisson models can be fit by using splines to decrease the number of at risk intervals. The first uses a spline basis with arbitrary knot locations and the second fits a penalized spline &lt;code&gt;gam&lt;/code&gt; model.&lt;/p&gt;
&lt;p&gt;Splines&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;t.kn &amp;lt;- c(0,25,100,500,1000) 
mLs.pois.sp &amp;lt;- glm( cbind(lex.Xst==&amp;quot;Dead&amp;quot;,lex.dur) ~ Ns(tfe,knots=t.kn) + age + sex, family=poisreg, data=Lung.s ) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Penalized splines&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mLs.pois.ps &amp;lt;- mgcv::gam( cbind(lex.Xst==&amp;quot;Dead&amp;quot;,lex.dur) ~ s(tfe) + age + sex, family=poisreg, data=Lung.s ) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Carstensen finishes up this portion of his analysis by noting the similarity of the estimates of age and sex effects from the different models.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ests &amp;lt;-
 rbind( ci.exp(m0.cox),
 ci.exp(mLs.cox),
 ci.exp(mLs.pois.fc,subset=c(&amp;quot;age&amp;quot;,&amp;quot;sex&amp;quot;)),
 ci.exp(mLs.pois.sp,subset=c(&amp;quot;age&amp;quot;,&amp;quot;sex&amp;quot;)),
 ci.exp(mLs.pois.ps,subset=c(&amp;quot;age&amp;quot;,&amp;quot;sex&amp;quot;)) )

cmp &amp;lt;- cbind( ests[c(1,3,5,7,9) ,],
 ests[c(1,3,5,7,9)+1,] )

rownames( cmp ) &amp;lt;-
 c(&amp;quot;Cox&amp;quot;,&amp;quot;Cox-split&amp;quot;,&amp;quot;Poisson-factor&amp;quot;,&amp;quot;Poisson-spline&amp;quot;,&amp;quot;Poisson-penSpl&amp;quot;)

 colnames( cmp )[c(1,4)] &amp;lt;- c(&amp;quot;age&amp;quot;,&amp;quot;sex&amp;quot;)
round( cmp,5 )&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                  age   2.5% 97.5%    sex   2.5%  97.5%
## Cox            1.017 0.9990 1.036 0.5943 0.4280 0.8253
## Cox-split      1.017 0.9990 1.036 0.5943 0.4280 0.8253
## Poisson-factor 1.017 0.9990 1.036 0.5943 0.4280 0.8253
## Poisson-spline 1.016 0.9980 1.035 0.5993 0.4316 0.8322
## Poisson-penSpl 1.016 0.9983 1.035 0.6021 0.4338 0.8358&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This demonstration provides some convincing evidence that both parametric and non-parametric models are part of a single underlying theory! When you think about it, this is an astonishing idea. To further explore the counting process theory of survival models, I provide a definition of Aalen’s multiplicative intensity model and a list of references below that I hope you will find helpful.&lt;/p&gt;
&lt;p&gt;Finally, there is much more to Carstensen’s note than I have presented. He goes on to provide a fairly complete analysis of the lung data while looking at cumulative rates, survival, practical time splitting, time varying coefficients and more ideas along the way.&lt;/p&gt;
&lt;div id=&#34;appendix-multiplicative-intensity-model&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Appendix: Multiplicative Intensity Model&lt;/h3&gt;
&lt;p&gt;For some direct insight into how the Cox Proportional Hazards model fits into the counting process theory, have a look at Odd Aalen’s definition of the multiplicative intensity model. Aalen begins his landmark 1978 paper &lt;em&gt;Nonparametric Inference For a Family of Counting Processes&lt;/em&gt; [1] by defining the fundamental components of his multiplicative intensity model.&lt;/p&gt;
&lt;p&gt;Let &lt;strong&gt;N&lt;/strong&gt; = (N&lt;sub&gt;1&lt;/sub&gt;, . . . N&lt;sub&gt;k&lt;/sub&gt;) be a multivariate counting process, that is, a collection of univariate counting processes, each of which counts events in the interval [0,t]. The N&lt;sub&gt;i&lt;/sub&gt; may depend on each other. Let σ(F&lt;sub&gt;t&lt;/sub&gt;) be the sigma algebra which represents the collection of all events that can be determined to have happened by time t. Let &lt;strong&gt;α&lt;/strong&gt; = (α&lt;sub&gt;1&lt;/sub&gt;, . . . α&lt;sub&gt;k&lt;/sub&gt;) be an unknown, non-negative function and let &lt;strong&gt;Y&lt;/strong&gt; = (Y&lt;sub&gt;1&lt;/sub&gt;, . . . Y&lt;sub&gt;k&lt;/sub&gt;) be a process observable over [0,t].&lt;/p&gt;
&lt;p&gt;Define Λ&lt;sub&gt;i&lt;/sub&gt;(t) = lim&lt;sub&gt;h→0&lt;/sub&gt;E(N&lt;sub&gt;i&lt;/sub&gt;(t + h) - N&lt;sub&gt;i&lt;/sub&gt;(t) | F&lt;sub&gt;t&lt;/sub&gt; )/h, i = 1, … k, to be the intensity process of the counting process &lt;strong&gt;N&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Then the multiplicative intensity model is defined to be:
&lt;strong&gt;Λ&lt;sub&gt;i&lt;/sub&gt;(t) = α&lt;sub&gt;i&lt;/sub&gt;(t)Y&lt;sub&gt;i&lt;/sub&gt;(t)&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This last line certainly looks like the Cox model, and it is not too difficult to confirm that this is indeed the case. You can find the gory details in &lt;em&gt;Fleming and Harrington&lt;/em&gt; [9, p. 126] or the comprehensive monograph by &lt;em&gt;Andersen et al.&lt;/em&gt; [5, p. 481].&lt;/p&gt;
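&lt;p&gt;As a rough sketch of that confirmation, leaving out all of the measure-theoretic care taken in the references: for subject i with a fixed covariate vector z&lt;sub&gt;i&lt;/sub&gt;, let Y&lt;sub&gt;i&lt;/sub&gt;(t) be the at risk indicator, which is 1 if subject i is still under observation just before time t and 0 otherwise, and let the individual hazard be a shared baseline hazard α&lt;sub&gt;0&lt;/sub&gt;(t) scaled by the subject’s relative risk, α&lt;sub&gt;i&lt;/sub&gt;(t) = α&lt;sub&gt;0&lt;/sub&gt;(t) exp(β&lt;sup&gt;T&lt;/sup&gt;z&lt;sub&gt;i&lt;/sub&gt;). Substituting into the multiplicative intensity model gives:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Λ&lt;sub&gt;i&lt;/sub&gt;(t) = α&lt;sub&gt;0&lt;/sub&gt;(t) exp(β&lt;sup&gt;T&lt;/sup&gt;z&lt;sub&gt;i&lt;/sub&gt;) Y&lt;sub&gt;i&lt;/sub&gt;(t)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While subject i is at risk, Y&lt;sub&gt;i&lt;/sub&gt;(t) = 1 and the intensity is the familiar proportional hazards form α&lt;sub&gt;0&lt;/sub&gt;(t) exp(β&lt;sup&gt;T&lt;/sup&gt;z&lt;sub&gt;i&lt;/sub&gt;); once the subject has died or been censored, Y&lt;sub&gt;i&lt;/sub&gt;(t) = 0 and the intensity vanishes. The symbols α&lt;sub&gt;0&lt;/sub&gt;, β and z&lt;sub&gt;i&lt;/sub&gt; are my notation for this sketch and do not appear in Aalen’s definition above.&lt;/p&gt;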
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;I believe that the following references (annotated with a few comments) comprise a reasonable basis for gaining familiarity with the counting process approach to survival modeling.&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.jstor.org/stable/2958850&#34;&gt;Aalen (1978)&lt;/a&gt; Odd O. Aalen &lt;em&gt;Nonparametric Inference For a Family of Counting Processes&lt;/em&gt;. The Annals of Statistics 1978, vol 6, no 4, 701-726 &lt;em&gt;This is the ‘Ur’ paper for the multiplicative intensity process. At least the first half should be approachable with some knowledge of measure theory and conditional expectation.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.jstor.org/stable/4615704?read-now=1&amp;amp;seq=1#metadata_info_tab_contents&#34;&gt;Aalen &amp;amp; Johansen (1978)&lt;/a&gt; Odd O. Aalen and Søren Johansen. &lt;em&gt;An Empirical Transition Matrix for Non-homogeneous Markov Chains Based on Censored Observations&lt;/em&gt;. Scand J Statistics 5: 141-150, 1978. &lt;em&gt;This is the source of the Aalen-Johansen estimator. The &lt;code&gt;etm&lt;/code&gt; package provides an implementation.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.amazon.com/Survival-Event-History-Analysis-Statistics/dp/0387202870/ref=sr_1_1?crid=1XEV3T73GK527&amp;amp;keywords=Survival+and+Event+History+Analysis%3B+A+Process+Point+of+View&amp;amp;qid=1662492084&amp;amp;sprefix=survival+and+event+history+analysis+a+process+point+of+view%2Caps%2C119&amp;amp;sr=8-1&#34;&gt;Aalen et al. (2008)&lt;/a&gt; Odd O. Aalen, Ørnulf Borgan and Håkon K. Gjessing. &lt;em&gt;Survival and Event History Analysis: A Process Point of View&lt;/em&gt;. Springer Verlag 2008. &lt;em&gt;This is definitely the text to read first. It is comprehensive, takes a modern point of view, is well written, and introduces the difficult mathematics without all of the technical details that often slow down the process of learning some new mathematics.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.jehps.net/juin2009/Aalenetal.pdf&#34;&gt;Aalen et al. (2009)&lt;/a&gt;. Odd O. Aalen, Per Kragh Andersen, Ørnulf Borgan, Richard D. Gill and Niels Keiding. &lt;em&gt;History of applications of martingales in survival analysis&lt;/em&gt;. Electronic Journal for History of Probability and Statistics, vol 5, no 1, June 2009&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.amazon.com/Statistical-Counting-Processes-Springer-Statistics/dp/0387978720/ref=sr_1_1?crid=YA0D7DHM43ZC&amp;amp;keywords=Statistical+Models+Based+on+Counting+Processes&amp;amp;qid=1662492269&amp;amp;sprefix=statistical+models+based+on+counting+processes%2Caps%2C124&amp;amp;sr=8-1&#34;&gt;Andersen et al. (1993)&lt;/a&gt; Per Kragh Andersen, Ørnulf Borgan, Richard D. Gill, Niels Keiding. &lt;em&gt;Statistical Models Based on Counting Processes&lt;/em&gt;. Springer-Verlag, 1993 &lt;em&gt;This text presents numerous examples along with a discussion of the theory and emphasizes parametric models.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.duo.uio.no/bitstream/handle/10852/10287/1/stat-res-03-97.pdf&#34;&gt;Borgan (1997)&lt;/a&gt;. Ørnulf Borgan &lt;em&gt;Three contributions to the Encyclopedia of Biostatistics: The Nelson-Aalen, Kaplan-Meier, and Aalen-Johansen estimators&lt;/em&gt; - &lt;em&gt;Very clear summaries.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://bendixcarstensen.com/WntCma.pdf&#34;&gt;Carstensen (2019)&lt;/a&gt; Bendix Carstensen. &lt;em&gt;Who needs the Cox model anyway?&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://www.biecek.pl/statystykamedyczna/cox.pdf&#34;&gt;Cox (1972)&lt;/a&gt; &lt;em&gt;Regression Models and Life-Tables&lt;/em&gt;. Journal of the Royal Statistical Society, Series B, Vol. 34, No. 2, 187-220&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.amazon.com/Counting-Processes-Survival-Analysis-Fleming/dp/0471769886/ref=sr_1_1?crid=2CXVSTHAPQ3AG&amp;amp;keywords=Counting+Processes+%26+Survival+Analysis&amp;amp;qid=1662492407&amp;amp;sprefix=counting+processes+%26+survival+analysis%2Caps%2C120&amp;amp;sr=8-1&#34;&gt;Fleming &amp;amp; Harrington (1991)&lt;/a&gt;. Thomas R. Fleming and David P. Harrington. &lt;em&gt;Counting Processes &amp;amp; Survival Analysis&lt;/em&gt;, John Wiley &amp;amp; Sons, Inc. 1991 &lt;em&gt;This text book develops all of the math needed and goes on to study non-parametric models including the Cox model.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.jstor.org/stable/2529747&#34;&gt;Holford (1976)&lt;/a&gt;. &lt;em&gt;Life tables with concomitant information&lt;/em&gt;. Biometrics, 32:587-597, 1976.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://shariq-mohammed.github.io/files/cbsa2019/1-intro-to-survival.html&#34;&gt;Mohammed (2019)&lt;/a&gt;. &lt;em&gt;Introduction to Survival Analysis using R&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/web/packages/survival/vignettes/survival.pdf&#34;&gt;Therneau (2022)&lt;/a&gt;. &lt;em&gt;The survival package&lt;/em&gt; (vignette)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.jstor.org/stable/2336057&#34;&gt;Therneau et al. (1990)&lt;/a&gt;. &lt;em&gt;Martingale based residuals for survival models&lt;/em&gt;. Biometrika 77, 147-160. &lt;em&gt;Martingale residuals appear to be the one use case where martingales openly surface in the survival calculations.&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://rss.onlinelibrary.wiley.com/doi/abs/10.2307/2346901&#34;&gt;Whitehead (1980)&lt;/a&gt;. &lt;em&gt;Fitting Cox’s regression model to survival data using GLIM&lt;/em&gt;. Applied Statistics, 29(3):268-275, 1980.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/web/packages/survival/vignettes/approximate.pdf&#34;&gt;Zhong et al. (2019)&lt;/a&gt;. &lt;em&gt;Approximating a Cox Model&lt;/em&gt;. This is a &lt;code&gt;survival&lt;/code&gt; package vignette.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

      </description>
    </item>
    
    <item>
      <title>Biologically Plausible Fake Survival Data</title>
      <link>https://rviews.rstudio.com/2020/11/02/simulating-biologically-plausible-survival-data/</link>
      <pubDate>Mon, 02 Nov 2020 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2020/11/02/simulating-biologically-plausible-survival-data/</guid>
      <description>
        


&lt;p&gt;In two recent posts, one on the &lt;a href=&#34;https://rviews.rstudio.com/2020/10/08/fake-data-for-the-illness-death-model/&#34;&gt;Disease Progression Model&lt;/a&gt; and the other on &lt;a href=&#34;https://rviews.rstudio.com/2020/09/09/fake-data-with-r/&#34;&gt;Fake Data&lt;/a&gt;, I highlighted some of &lt;code&gt;R&#39;s&lt;/code&gt; tools for simulating data that exhibit desired correlations and other statistical properties. In this post, I’ll focus on a small cluster of &lt;code&gt;R&lt;/code&gt; packages that support generating biologically plausible survival data.&lt;/p&gt;
&lt;div id=&#34;background&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Background&lt;/h3&gt;
&lt;p&gt;In an impressive &lt;a href=&#34;https://core.ac.uk/download/pdf/191273838.pdf&#34;&gt;paper&lt;/a&gt;, &lt;em&gt;Simulating biologically plausible complex survival data&lt;/em&gt; (&lt;a href=&#34;https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.5823&#34;&gt;Crowther &amp;amp; Lambert (2013)&lt;/a&gt;), that combines survival analysis theory and numerical methods, Michael Crowther and Paul Lambert address the problem of simulating data in which the event time, censoring and covariate distributions are all plausible. They develop a methodology for conducting survival analysis studies, and also provide computational tools for moving beyond the usual exponential, Weibull and Gompertz models. Building on the work by &lt;a href=&#34;http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.330.215&amp;amp;rep=rep1&amp;amp;type=pdf&#34;&gt;Bender et al. (2005)&lt;/a&gt; in establishing a framework for simulating survival data for Cox proportional hazards models, Crowther and Lambert discuss how modelers can incorporate non-proportional hazards, time-varying effects, delayed entry and random effects, and provide code examples based on the &lt;code&gt;Stata&lt;/code&gt; &lt;a href=&#34;https://www.stata-journal.com/sjpdf.html?articlenum=st0275&#34;&gt;&lt;code&gt;survsim&lt;/code&gt; package&lt;/a&gt;.&lt;/p&gt;
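&lt;p&gt;The core idea in the Bender et al. framework is inverse transform sampling: draw a uniform random number and solve the survival function for the event time. Here is a minimal base R sketch for a Weibull baseline hazard and a single binary covariate. All parameter values and variable names are made up for illustration, and nothing here uses either &lt;code&gt;survsim&lt;/code&gt; package.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(2020)
n      &amp;lt;- 10000
lambda &amp;lt;- 0.01   # Weibull scale of the baseline hazard (illustrative value)
nu     &amp;lt;- 1.5    # Weibull shape (illustrative value)
beta   &amp;lt;- 0.5    # log hazard ratio for the binary covariate (illustrative value)

x &amp;lt;- rbinom(n, size = 1, prob = 0.5)
u &amp;lt;- runif(n)

# Inverse transform: solve S(t | x) = u for t, where
# S(t | x) = exp(-lambda * exp(beta * x) * t^nu)
t_event &amp;lt;- (-log(u) / (lambda * exp(beta * x)))^(1 / nu)

# Simulated medians by covariate group next to the theoretical medians
theory &amp;lt;- (log(2) / (lambda * exp(beta * c(0, 1))))^(1 / nu)
rbind(simulated = tapply(t_event, x, median), theory = theory)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The simulated and theoretical medians should be close in both groups. The closed form inversion is what breaks down for more complex hazards, which is where the numerical methods developed by Crowther and Lambert come in.&lt;/p&gt;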
&lt;/div&gt;
&lt;div id=&#34;the-survsim-package&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;The &lt;code&gt;survsim&lt;/code&gt; package&lt;/h3&gt;
&lt;p&gt;Not long after the &lt;code&gt;Stata&lt;/code&gt; package appeared, Moriña and Navarro released the &lt;code&gt;R&lt;/code&gt; &lt;a href=&#34;https://cran.r-project.org/package=survsim&#34;&gt;&lt;code&gt;survsim&lt;/code&gt; package&lt;/a&gt; which implements some of the features in the &lt;code&gt;Stata&lt;/code&gt; package for simulating complex survival data. The &lt;code&gt;R&lt;/code&gt; package does not have a vignette, but you can find several examples in the &lt;em&gt;JSS&lt;/em&gt; paper &lt;a href=&#34;https://www.jstatsoft.org/article/view/v059i02&#34;&gt;Moriña &amp;amp; Navarro (2014)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The following example from section 4.3 of the paper simulates adverse events for a clinical trial with 100 patients followed up for 30 days. The authors suggest that the three covariates &lt;strong&gt;x&lt;/strong&gt; could represent body mass index, age at entry to the cohort, and whether or not the subject has hypertension. This is a somewhat unusual and sophisticated example of survival modeling.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(12345)
dist.ev &amp;lt;- c(&amp;quot;weibull&amp;quot;, &amp;quot;llogistic&amp;quot;, &amp;quot;weibull&amp;quot;)
anc.ev &amp;lt;- c(0.8, 0.9, 0.82)
beta0.ev &amp;lt;- c(3.56, 5.94, 5.78)
beta &amp;lt;- list(c(-0.04, -0.02, -0.01), c(-0.001, -0.0008, -0.0005),c(-0.7, -0.2, -0.1))
x &amp;lt;- list(c(&amp;quot;normal&amp;quot;, 26, 4.5), c(&amp;quot;unif&amp;quot;, 50, 75), c(&amp;quot;bern&amp;quot;, 0.25))
clinical.data &amp;lt;- mult.ev.sim(n = 100,      # number of patients in cohort
                            foltime = 30,  # maximal followup time
                            dist.ev,       # time to event distributions (t.e.d.)
                            anc.ev,        # ancillary parameters for the t.e.d.
                            beta0.ev,      # beta0 parameters for the t.e.d.
                            dist.cens = &amp;quot;weibull&amp;quot;, #censoring distribution
                            anc.cens = 1,  # parameters for censoring dist
                            beta0.cens = 5.2, # beta0 for censoring dist
                            z = list(c(&amp;quot;unif&amp;quot;, 0.6, 1.4)), # random effect dist
                            beta, # effect of covariate
                            x, # distributions of covariates
                            nsit = 3) # max number of adverse events for an individual
head(round(clinical.data,2))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   nid ev.num  time status start  stop    z     x   x.1 x.2
## 1   1      1  5.79      1     0  5.79 0.97 28.63 69.02   1
## 2   1      2 30.00      0     0 30.00 0.97 28.63 69.02   1
## 3   1      3 30.00      0     0 30.00 0.97 28.63 69.02   1
## 4   2      1  3.37      1     0  3.37 0.60 36.42 53.81   0
## 5   2      2 30.00      0     0 30.00 0.60 36.42 53.81   0
## 6   2      3 30.00      0     0 30.00 0.60 36.42 53.81   0&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;the-simsurv-package&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;The &lt;code&gt;simsurv&lt;/code&gt; package&lt;/h3&gt;
&lt;p&gt;In the &lt;a href=&#34;https://cran.r-project.org/web/packages/simsurv/vignettes/simsurv_usage.html&#34;&gt;vignette&lt;/a&gt; on &lt;em&gt;How to use the &lt;a href=&#34;https://cran.r-project.org/package=simsurv&#34;&gt;&lt;code&gt;simsurv&lt;/code&gt;&lt;/a&gt; package&lt;/em&gt;, the package authors Sam Brilleman and Alessandro Gasparini state that they directly modeled their package on the &lt;code&gt;Stata&lt;/code&gt; package &lt;code&gt;survsim&lt;/code&gt; and cite the Crowther and Lambert paper. They show how &lt;code&gt;simsurv&lt;/code&gt; builds out much of the functionality envisioned there in examples that demonstrate the interplay between model fitting and simulation. Example 2 of the vignette is concerned with constructing fake data modeled on the German breast cancer data by &lt;a href=&#34;https://pubmed.ncbi.nlm.nih.gov/7931478/&#34;&gt;Schumacher et al. (1994)&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data(&amp;quot;brcancer&amp;quot;)
head(brcancer)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   id hormon rectime censrec
## 1  1      0    1814       1
## 2  2      1    2018       1
## 3  3      1     712       1
## 4  4      1    1807       1
## 5  5      0     772       1
## 6  6      0     448       1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The example begins by fitting alternative models to the data using functions from the &lt;a href=&#34;https://cran.r-project.org/package=flexsurv&#34;&gt;&lt;code&gt;flexsurv&lt;/code&gt;&lt;/a&gt; package of Jackson, Metcalfe and Amdahl. Two candidate models are proposed and a spline model giving the best fit is used to simulate data. The example concludes with more model fitting to examine the fake data. All of the examples in the vignette showcase the interplay between &lt;code&gt;simsurv&lt;/code&gt; and &lt;code&gt;flexsurv&lt;/code&gt; functions and emphasize the flexible modeling tools in &lt;code&gt;flexsurv&lt;/code&gt; for building custom survival models.&lt;/p&gt;
&lt;p&gt;The following code replicates the portion of Example 2 that illustrates the use of the &lt;code&gt;flexsurvspline()&lt;/code&gt; function, which models the log cumulative hazard as a spline function whose shape depends on the knot locations.&lt;/p&gt;
&lt;p&gt;The code then produces the simulated data and uses the &lt;code&gt;survminer&lt;/code&gt; package of Kassambara et al. to produce high quality Kaplan-Meier plots.&lt;/p&gt;
&lt;p&gt;The first line of code fits a three-knot spline model to the &lt;code&gt;brcancer&lt;/code&gt; data. The &lt;code&gt;flexsurvspline()&lt;/code&gt; function, like other functions in the &lt;code&gt;flexsurv&lt;/code&gt; package, builds on the basic functionality of Terry Therneau’s fundamental &lt;code&gt;survival&lt;/code&gt; package.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;true_mod &amp;lt;- flexsurv::flexsurvspline(Surv(rectime, censrec) ~ hormon, 
                                     data = brcancer, k = 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This helper function returns the log cumulative hazard at time &lt;code&gt;t&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;logcumhaz &amp;lt;- function(t, x, betas, knots) {
  
  # Obtain the basis terms for the spline-based log
  # cumulative hazard (evaluated at time t)
  basis &amp;lt;- flexsurv::basis(knots, log(t))
  
  # Evaluate the log cumulative hazard under the
  # Royston and Parmar specification
  res &amp;lt;- 
    betas[[&amp;quot;gamma0&amp;quot;]] * basis[[1]] + 
    betas[[&amp;quot;gamma1&amp;quot;]] * basis[[2]] +
    betas[[&amp;quot;gamma2&amp;quot;]] * basis[[3]] +
    betas[[&amp;quot;gamma3&amp;quot;]] * basis[[4]] +
    betas[[&amp;quot;gamma4&amp;quot;]] * basis[[5]] +
    betas[[&amp;quot;hormon&amp;quot;]] * x[[&amp;quot;hormon&amp;quot;]]
  
  res
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;simsurv()&lt;/code&gt; function generates the simulated survival data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;covariates &amp;lt;- data.frame(id = 1:686, hormon = rbinom(686, 1, 0.5))
sim_data &amp;lt;- simsurv(
               betas = true_mod$coefficients, # &amp;quot;true&amp;quot; parameter values
               x = covariates,            # covariate data for 686 individuals
               knots = true_mod$knots,    # knot locations for splines
               logcumhazard = logcumhaz,  # definition of log cum hazard
               maxt = NULL,               # no right-censoring
               interval = c(1E-8,100000)) # interval for root finding
sim_data &amp;lt;- merge(covariates, sim_data)
head(sim_data)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   id hormon eventtime status
## 1  1      1     240.4      1
## 2  2      0     942.7      1
## 3  3      1     463.5      1
## 4  4      0    1762.1      1
## 5  5      0    3976.4      1
## 6  6      0    2288.2      1&lt;/code&gt;&lt;/pre&gt;
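&lt;p&gt;The &lt;code&gt;interval&lt;/code&gt; argument hints at what is going on under the hood: when the cumulative hazard has no closed-form inverse, an event time can be found by numerical root finding. The base &lt;code&gt;R&lt;/code&gt; sketch below illustrates that idea with &lt;code&gt;uniroot()&lt;/code&gt;. It is not the &lt;code&gt;simsurv&lt;/code&gt; source code, and the toy log cumulative hazard &lt;code&gt;logcumhaz_toy()&lt;/code&gt;, its coefficients and the helper &lt;code&gt;draw_time()&lt;/code&gt; are hypothetical names and values invented for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical log cumulative hazard, increasing in t (coefficients are made up)
logcumhaz_toy &amp;lt;- function(t, x) {
  -6 + 1.2 * log(t) + 0.002 * t + 0.5 * x
}

# Draw one event time: S(T) = exp(-H(T)) = U implies log H(T) = log(-log(U)),
# so solve logcumhaz_toy(t, x) - log(-log(u)) = 0 for t
draw_time &amp;lt;- function(x, interval = c(1e-8, 1e5)) {
  u      &amp;lt;- runif(1)
  target &amp;lt;- log(-log(u))
  uniroot(function(t) logcumhaz_toy(t, x) - target, interval = interval)$root
}

set.seed(1)
x_toy &amp;lt;- rbinom(5, 1, 0.5)
t_toy &amp;lt;- vapply(x_toy, draw_time, numeric(1))
data.frame(x = x_toy, eventtime = t_toy)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The search interval has to be wide enough that the function changes sign between its endpoints, which is presumably why the call above uses such a generous range.&lt;/p&gt;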
&lt;p&gt;We use the &lt;code&gt;surv_fit()&lt;/code&gt; function from the &lt;code&gt;survminer&lt;/code&gt; package to fit the Kaplan-Meier curves.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;KM_data &amp;lt;- survminer::surv_fit(Surv(rectime, censrec) ~ 1, data = brcancer)
KM_data_sim &amp;lt;- survminer::surv_fit(Surv(eventtime, status) ~ 1, data = sim_data)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, plotting the curves shows that the simulated data do plausibly resemble the original data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggsurvplot_combine(list(KM_data, KM_data_sim),
                risk.table = TRUE,
                conf.int = TRUE,
                censor = FALSE,
                conf.int.style = &amp;quot;step&amp;quot;,
                tables.theme = theme_cleantable(),
                palette = &amp;quot;jco&amp;quot;)

plot.new() 
print(p,newpage = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/2020/11/02/simulating-biologically-plausible-survival-data/index_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;I hope you find this small post helpful. The CRAN task view on Survival Analysis is a fantastic resource, but it can be a daunting task for non-experts to know where to begin to unravel the secrets there without a thread to pull on.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2020/11/02/simulating-biologically-plausible-survival-data/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>R/Medicine 2019 Workshops</title>
      <link>https://rviews.rstudio.com/2019/09/12/r-medicine-2019-workshops/</link>
      <pubDate>Thu, 12 Sep 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/09/12/r-medicine-2019-workshops/</guid>
      <description>
        &lt;p&gt;&lt;a href=&#34;https://r-medicine.com/&#34;&gt;R/Medicine 2019&lt;/a&gt; kicked off on Thursday with two outstanding workshops. It was difficult to choose between the two, but fortunately both presenters developed rich sets of materials that are available online.&lt;/p&gt;

&lt;p&gt;Alison Hill delivered &lt;a href=&#34;https://rmd4medicine.netlify.com/&#34;&gt;R Markdown for Medicine&lt;/a&gt; with an elegant HTML exposition masterfully created to cultivate beginners while still engaging experienced R Markdown users.
&lt;img src=&#34;/post/2019-09-12-rmedicine_files/surgery.jpg&#34; height = &#34;400&#34; width=&#34;600&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://unsplash.com/photos/FvNp_SY4kF0&#34;&gt;Photo by Samuel Zeller on Unsplash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In four sections: (1) &lt;a href=&#34;https://rmd4medicine.netlify.com/materials/01-rmd-anatomy/&#34;&gt;R Markdown Anatomy&lt;/a&gt;, (2) &lt;a href=&#34;https://rmd4medicine.netlify.com/materials/02-output-tables/&#34;&gt;Outputs and Tables&lt;/a&gt;, (3) &lt;a href=&#34;https://rmd4medicine.netlify.com/materials/03/&#34;&gt;Graphics for Communication&lt;/a&gt; and (4)
&lt;a href=&#34;https://rmd4medicine.netlify.com/materials/04-data-workflows/&#34;&gt;Data and Workflows&lt;/a&gt; she developed aspects of R Markdown aimed at statisticians and clinicians writing medical documents, which should also delight a wide audience of R Markdown users.&lt;/p&gt;

&lt;p&gt;In the parallel session, Elizabeth (Beth) Atkinson distilled years of experience &lt;a href=&#34;https://github.com/bethatkinson/rmed2019_surv&#34;&gt;Wrangling survival data&lt;/a&gt; at the Mayo Clinic while presenting new functionality from version 3.0 of Terry Therneau&amp;rsquo;s &lt;code&gt;survival&lt;/code&gt; package, which contains significant new material on multi-state models. (Version 3.0 is expected to make it to CRAN very soon, but if you can&amp;rsquo;t wait, you can install the new version from GitHub with: &lt;code&gt;install_github(&amp;quot;therneau/survival&amp;quot;, dependencies=TRUE)&lt;/code&gt;.) Terry, who will be delivering the opening keynote presentation, also attended the workshop. It was a rare treat to hear Beth and Terry discuss best practices, pitfalls and common errors while fielding questions from the attendees. Beth assembled so much material it will take a &amp;ldquo;month of Sundays&amp;rdquo; to work through it all, but I doubt that there is a better source of material anywhere that makes the special difficulties of wrangling survival data more easily accessible. The following gem shows up early in the presentation.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2019-09-12-rmedicine_files/kp.png&#34; height = &#34;400&#34; width=&#34;600&#34;&gt;&lt;/p&gt;

&lt;p&gt;R/Medicine is off to a great start.&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/09/12/r-medicine-2019-workshops/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Survival Analysis with R</title>
      <link>https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/</link>
      <pubDate>Mon, 25 Sep 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/</guid>
      <description>
        


&lt;p&gt;With roots dating back to at least 1662 when John Graunt, a London merchant, published an extensive set of inferences based on mortality records, survival analysis is one of the oldest subfields of Statistics [1]. Basic life-table methods, including techniques for dealing with censored data, were discovered before 1700 [2], and in the early eighteenth century, the old masters - de Moivre working on annuities, and Daniel Bernoulli studying competing risks for the analysis of smallpox inoculation - developed the modern foundations of the field [2]. Today, survival analysis models are important in Engineering, Insurance, Marketing, Medicine, and many more application areas. So, it is not surprising that R should be rich in survival analysis functions. CRAN’s Survival Analysis Task View, a curated list of the best relevant R survival analysis packages and functions, is indeed formidable. We all owe a great deal of gratitude to Arthur Allignol and Aurélien Latouche, the task view maintainers.&lt;/p&gt;
&lt;iframe src=&#34;https://cran.r-project.org/web/views/Survival.html&#34; width=&#34;90%&#34; height=&#34;450&#34;&gt;
&lt;/iframe&gt;
&lt;p&gt;Looking at the Task View on a small screen, however, is a bit like standing too close to a brick wall - left-right, up-down, bricks all around. It is a fantastic edifice that gives some idea of the significant contributions R developers have made both to the theory and practice of Survival Analysis. As well-organized as it is, however, I imagine that even survival analysis experts need some time to find their way around this task view. Newcomers - people either new to R or new to survival analysis or both - must find it overwhelming. So, it is with newcomers in mind that I offer the following narrow trajectory through the task view that relies on just a few packages: &lt;a href=&#34;https://CRAN.R-project.org/package=survival&#34;&gt;survival&lt;/a&gt;, &lt;a href=&#34;https://CRAN.R-project.org/package=ggplot2&#34;&gt;ggplot2&lt;/a&gt;, &lt;a href=&#34;https://CRAN.R-project.org/package=ggfortify&#34;&gt;ggfortify&lt;/a&gt;, and &lt;a href=&#34;https://CRAN.R-project.org/package=ranger&#34;&gt;ranger&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;survival&lt;/code&gt; package is the cornerstone of the entire R survival analysis edifice. Not only is the package itself rich in features, but the object created by the &lt;code&gt;Surv()&lt;/code&gt; function, which contains failure time and censoring information, is the basic survival analysis data structure in R. Dr. Terry Therneau, the package author, began working on the survival package in 1986. The first public release, in late 1989, used the Statlib service hosted by Carnegie Mellon University. Thereafter, the package was incorporated directly into &lt;a href=&#34;http://www.mayo.edu/research/documents/tr53pdf/doc-10027379&#34;&gt;Splus&lt;/a&gt;, and subsequently into R.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ggfortify&lt;/code&gt; enables producing handsome, one-line survival plots with &lt;code&gt;ggplot2::autoplot&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ranger&lt;/code&gt; might be the surprise in my very short list of survival packages. The &lt;code&gt;ranger()&lt;/code&gt; function is well-known for being a fast implementation of the Random Forests algorithm for building ensembles of classification and regression trees. But &lt;code&gt;ranger()&lt;/code&gt; also works with survival data. &lt;a href=&#34;https://arxiv.org/pdf/1508.04409.pdf&#34;&gt;Benchmarks&lt;/a&gt; indicate that &lt;code&gt;ranger()&lt;/code&gt; is suitable for building time-to-event models with the large, high-dimensional data sets important to internet marketing applications. Since &lt;code&gt;ranger()&lt;/code&gt; uses standard &lt;code&gt;Surv()&lt;/code&gt; survival objects, it’s an ideal tool for getting acquainted with survival analysis in this machine-learning age.&lt;/p&gt;
&lt;div id=&#34;load-the-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Load the data&lt;/h3&gt;
&lt;p&gt;This first block of code loads the required packages, along with the &lt;code&gt;veteran&lt;/code&gt; dataset from the &lt;code&gt;survival&lt;/code&gt; package that contains data from a two-treatment, randomized trial for lung cancer.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(survival)
library(ranger)
library(ggplot2)
library(dplyr)
library(ggfortify)

#------------
data(veteran)
head(veteran)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   trt celltype time status karno diagtime age prior
## 1   1 squamous   72      1    60        7  69     0
## 2   1 squamous  411      1    70        5  64    10
## 3   1 squamous  228      1    60        3  38     0
## 4   1 squamous  126      1    60        9  63    10
## 5   1 squamous  118      1    70       11  65    10
## 6   1 squamous   10      1    20        5  49     0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The variables in &lt;code&gt;veteran&lt;/code&gt; are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;trt&lt;/code&gt;: 1=standard 2=test&lt;/li&gt;
&lt;li&gt;&lt;code&gt;celltype&lt;/code&gt;: 1=squamous, 2=small cell, 3=adeno, 4=large&lt;/li&gt;
&lt;li&gt;&lt;code&gt;time&lt;/code&gt;: survival time in days&lt;/li&gt;
&lt;li&gt;&lt;code&gt;status&lt;/code&gt;: censoring status&lt;/li&gt;
&lt;li&gt;&lt;code&gt;karno&lt;/code&gt;: Karnofsky performance score (100=good)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;diagtime&lt;/code&gt;: months from diagnosis to randomization&lt;/li&gt;
&lt;li&gt;&lt;code&gt;age&lt;/code&gt;: in years&lt;/li&gt;
&lt;li&gt;&lt;code&gt;prior&lt;/code&gt;: prior therapy 0=no, 10=yes&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;kaplan-meier-analysis&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Kaplan Meier Analysis&lt;/h3&gt;
&lt;p&gt;The first thing to do is to use &lt;code&gt;Surv()&lt;/code&gt; to build the standard survival object. The variable &lt;code&gt;time&lt;/code&gt; records survival time; &lt;code&gt;status&lt;/code&gt; indicates whether the patient’s death was observed (&lt;code&gt;status = 1&lt;/code&gt;) or that survival time was censored (&lt;code&gt;status = 0&lt;/code&gt;). Note that a “+” after the time in the print out of &lt;code&gt;km&lt;/code&gt; indicates censoring.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Kaplan Meier Survival Curve
km &amp;lt;- with(veteran, Surv(time, status))
head(km,80)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1]  72  411  228  126  118   10   82  110  314  100+  42    8  144   25+
## [15]  11   30  384    4   54   13  123+  97+ 153   59  117   16  151   22 
## [29]  56   21   18  139   20   31   52  287   18   51  122   27   54    7 
## [43]  63  392   10    8   92   35  117  132   12  162    3   95  177  162 
## [57] 216  553  278   12  260  200  156  182+ 143  105  103  250  100  999 
## [71] 112   87+ 231+ 242  991  111    1  587  389   33&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To begin our analysis, we use the formula &lt;code&gt;Surv(time, status) ~ 1&lt;/code&gt; and the &lt;code&gt;survfit()&lt;/code&gt; function to produce the &lt;a href=&#34;https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator&#34;&gt;Kaplan-Meier&lt;/a&gt; estimates of the probability of survival over time. The &lt;code&gt;times&lt;/code&gt; parameter of the &lt;code&gt;summary()&lt;/code&gt; function gives some control over which times to print. Here, it is set to print the estimates for 1, 30, 60 and 90 days, and then every 90 days thereafter. This is the simplest possible model. It only takes three lines of R code to fit it, and produce numerical and graphical summaries.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;km_fit &amp;lt;- survfit(Surv(time, status) ~ 1, data=veteran)
summary(km_fit, times = c(1,30,60,90*(1:10)))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Call: survfit(formula = Surv(time, status) ~ 1, data = veteran)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1    137       2    0.985  0.0102      0.96552       1.0000
##    30     97      39    0.700  0.0392      0.62774       0.7816
##    60     73      22    0.538  0.0427      0.46070       0.6288
##    90     62      10    0.464  0.0428      0.38731       0.5560
##   180     27      30    0.222  0.0369      0.16066       0.3079
##   270     16       9    0.144  0.0319      0.09338       0.2223
##   360     10       6    0.090  0.0265      0.05061       0.1602
##   450      5       5    0.045  0.0194      0.01931       0.1049
##   540      4       1    0.036  0.0175      0.01389       0.0934
##   630      2       2    0.018  0.0126      0.00459       0.0707
##   720      2       0    0.018  0.0126      0.00459       0.0707
##   810      2       0    0.018  0.0126      0.00459       0.0707
##   900      2       0    0.018  0.0126      0.00459       0.0707&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#plot(km_fit, xlab=&amp;quot;Days&amp;quot;, main = &amp;#39;Kaplan Meier Plot&amp;#39;) #base graphics is always ready
autoplot(km_fit)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-05-05-survival-analysis-with-r-with-comments_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
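&lt;p&gt;To see what the product-limit calculation behind these estimates is doing, here is a small base &lt;code&gt;R&lt;/code&gt; sketch that computes Kaplan-Meier estimates by hand. The data frame &lt;code&gt;toy&lt;/code&gt; is made up for illustration and has nothing to do with the &lt;code&gt;veteran&lt;/code&gt; data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Made-up data: time in days, status 1 = death observed, 0 = censored
toy &amp;lt;- data.frame(time   = c(5, 8, 8, 12, 15, 20, 20, 25),
                  status = c(1, 1, 0,  1,  0,  1,  1,  0))

# Distinct times at which at least one death was observed
event_times &amp;lt;- sort(unique(toy$time[toy$status == 1]))

# Number still at risk just before each event time, and deaths at that time
n_risk  &amp;lt;- sapply(event_times, function(t) sum(toy$time &amp;gt;= t))
n_event &amp;lt;- sapply(event_times, function(t) sum(toy$time == t &amp;amp; toy$status == 1))

# Product-limit estimate: S(t) is the running product of (1 - deaths / at risk)
km_by_hand &amp;lt;- data.frame(time = event_times, n_risk, n_event,
                         survival = cumprod(1 - n_event / n_risk))
km_by_hand&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Censored subjects contribute to the risk set up to their censoring time but never to the event count. The &lt;code&gt;survival&lt;/code&gt; column should agree with what &lt;code&gt;survfit(Surv(time, status) ~ 1, data = toy)&lt;/code&gt; reports at the same event times.&lt;/p&gt;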
&lt;p&gt;Next, we look at survival curves by treatment.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;km_trt_fit &amp;lt;- survfit(Surv(time, status) ~ trt, data=veteran)
autoplot(km_trt_fit)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-05-05-survival-analysis-with-r-with-comments_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;And, to show one more small exploratory plot, I’ll do just a little data munging to look at survival by age. First, I create a new data frame with a categorical variable &lt;code&gt;AG&lt;/code&gt; that has values &lt;code&gt;LT60&lt;/code&gt; and &lt;code&gt;OV60&lt;/code&gt;, which respectively describe veterans younger than sixty and veterans sixty or older. While I am at it, I make &lt;code&gt;trt&lt;/code&gt; and &lt;code&gt;prior&lt;/code&gt; into factor variables. But note, &lt;code&gt;survfit()&lt;/code&gt; and &lt;code&gt;npsurv()&lt;/code&gt; worked just fine without this refinement.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;vet &amp;lt;- mutate(veteran, AG = ifelse((age &amp;lt; 60), &amp;quot;LT60&amp;quot;, &amp;quot;OV60&amp;quot;),
              AG = factor(AG),
              trt = factor(trt,labels=c(&amp;quot;standard&amp;quot;,&amp;quot;test&amp;quot;)),
              prior = factor(prior,labels=c(&amp;quot;No&amp;quot;,&amp;quot;Yes&amp;quot;)))

km_AG_fit &amp;lt;- survfit(Surv(time, status) ~ AG, data=vet)
autoplot(km_AG_fit)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-05-05-survival-analysis-with-r-with-comments_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Although the two curves appear to overlap in the first fifty days, younger patients clearly have a better chance of surviving more than a year.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;cox-proportional-hazards-model&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Cox Proportional Hazards Model&lt;/h3&gt;
&lt;p&gt;Next, I’ll fit a &lt;a href=&#34;https://en.wikipedia.org/wiki/Proportional_hazards_model&#34;&gt;Cox Proportional Hazards Model&lt;/a&gt; that makes use of all of the covariates in the data set.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Fit Cox Model
cox &amp;lt;- coxph(Surv(time, status) ~ trt + celltype + karno +
               diagtime + age + prior, data = vet)
summary(cox)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Call:
## coxph(formula = Surv(time, status) ~ trt + celltype + karno + 
##     diagtime + age + prior, data = vet)
## 
##   n= 137, number of events= 128 
## 
##                         coef  exp(coef)   se(coef)      z Pr(&amp;gt;|z|)    
## trttest            2.946e-01  1.343e+00  2.075e-01  1.419  0.15577    
## celltypesmallcell  8.616e-01  2.367e+00  2.753e-01  3.130  0.00175 ** 
## celltypeadeno      1.196e+00  3.307e+00  3.009e-01  3.975 7.05e-05 ***
## celltypelarge      4.013e-01  1.494e+00  2.827e-01  1.420  0.15574    
## karno             -3.282e-02  9.677e-01  5.508e-03 -5.958 2.55e-09 ***
## diagtime           8.132e-05  1.000e+00  9.136e-03  0.009  0.99290    
## age               -8.706e-03  9.913e-01  9.300e-03 -0.936  0.34920    
## priorYes           7.159e-02  1.074e+00  2.323e-01  0.308  0.75794    
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1
## 
##                   exp(coef) exp(-coef) lower .95 upper .95
## trttest              1.3426     0.7448    0.8939    2.0166
## celltypesmallcell    2.3669     0.4225    1.3799    4.0597
## celltypeadeno        3.3071     0.3024    1.8336    5.9647
## celltypelarge        1.4938     0.6695    0.8583    2.5996
## karno                0.9677     1.0334    0.9573    0.9782
## diagtime             1.0001     0.9999    0.9823    1.0182
## age                  0.9913     1.0087    0.9734    1.0096
## priorYes             1.0742     0.9309    0.6813    1.6937
## 
## Concordance= 0.736  (se = 0.03 )
## Rsquare= 0.364   (max possible= 0.999 )
## Likelihood ratio test= 62.1  on 8 df,   p=2e-10
## Wald test            = 62.37  on 8 df,   p=2e-10
## Score (logrank) test = 66.74  on 8 df,   p=2e-11&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cox_fit &amp;lt;- survfit(cox)
#plot(cox_fit, main = &amp;quot;cph model&amp;quot;, xlab=&amp;quot;Days&amp;quot;)
autoplot(cox_fit)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-05-05-survival-analysis-with-r-with-comments_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Note that the model flags small cell type, adeno cell type and karno as significant. However, some caution needs to be exercised in interpreting these results. While the Cox Proportional Hazards model is thought to be “robust”, a careful analysis would check the assumptions underlying the model. For example, the Cox model assumes that the effect of each covariate on the hazard does not vary with time. In a &lt;a href=&#34;https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf&#34;&gt;vignette&lt;/a&gt; [12] that accompanies the &lt;code&gt;survival&lt;/code&gt; package, Therneau, Crowson and Atkinson demonstrate that the effect of the Karnofsky score (karno) is, in fact, time-dependent, so the assumptions for the Cox model are not met. The vignette authors go on to present a strategy for dealing with time-dependent covariates and coefficients.&lt;/p&gt;
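&lt;p&gt;One common way to check the proportional hazards assumption is the &lt;code&gt;cox.zph()&lt;/code&gt; function in the &lt;code&gt;survival&lt;/code&gt; package, which tests for time trends in the scaled Schoenfeld residuals. The sketch below is only a starting point: to stay self-contained it refits the model on &lt;code&gt;survival::veteran&lt;/code&gt; directly rather than on the &lt;code&gt;vet&lt;/code&gt; data frame built above, and the object names &lt;code&gt;fit&lt;/code&gt; and &lt;code&gt;zph&lt;/code&gt; are my own.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(survival)

# Refit a Cox model with the same covariates on the raw veteran data
fit &amp;lt;- coxph(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior,
             data = survival::veteran)

# Test each term, and the model as a whole, for departures from proportional hazards
zph &amp;lt;- cox.zph(fit)
zph

# plot(zph)  # smoothed residuals against time, one panel per term&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A small p-value for a term suggests that its effect changes over follow-up time, which is the kind of evidence the vignette examines for &lt;code&gt;karno&lt;/code&gt;.&lt;/p&gt;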
&lt;p&gt;Data scientists who are accustomed to computing ROC curves to assess model performance should be interested in the Concordance statistic. The documentation for the &lt;code&gt;survConcordance()&lt;/code&gt; function in the &lt;code&gt;survival&lt;/code&gt; package defines concordance as “the probability of agreement for any two randomly chosen observations, where in this case agreement means that the observation with the shorter survival time of the two also has the larger risk score. The predictor (or risk score) will often be the result of a Cox model or other regression” and notes that: “For continuous covariates concordance is equivalent to Kendall’s tau, and for logistic regression is equivalent to the area under the ROC curve.”&lt;/p&gt;
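&lt;p&gt;The definition is easy to turn into code. The following base &lt;code&gt;R&lt;/code&gt; sketch counts concordant and discordant pairs for a tiny made-up data set with no ties. The vectors &lt;code&gt;time&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt; and &lt;code&gt;risk&lt;/code&gt; are invented for illustration, and the handling of ties, which the &lt;code&gt;survival&lt;/code&gt; package takes care of, is deliberately ignored.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Made-up data: observed times, event indicators (1 = event) and model risk scores
time   &amp;lt;- c(2, 4, 6, 8, 10)
status &amp;lt;- c(1, 1, 0, 1, 1)
risk   &amp;lt;- c(0.9, 0.4, 0.5, 0.6, 0.1)

concordant &amp;lt;- 0
discordant &amp;lt;- 0
for (i in which(status == 1)) {            # only an observed event can anchor a pair
  later &amp;lt;- which(time &amp;gt; time[i])         # subjects known to have outlived subject i
  concordant &amp;lt;- concordant + sum(risk[i] &amp;gt; risk[later])
  discordant &amp;lt;- discordant + sum(risk[later] &amp;gt; risk[i])
}

# Proportion of usable pairs in which the shorter survival time has the larger risk
c_index &amp;lt;- concordant / (concordant + discordant)
c_index&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A pair in which the earlier time is censored is not usable, because we cannot tell which of the two subjects actually survived longer.&lt;/p&gt;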
&lt;p&gt;To demonstrate using the &lt;code&gt;survival&lt;/code&gt; package, along with &lt;code&gt;ggplot2&lt;/code&gt; and &lt;code&gt;ggfortify&lt;/code&gt;, I’ll fit Aalen’s additive regression model for censored data to the veteran data. The documentation states: “The Aalen model assumes that the cumulative hazard H(t) for a subject can be expressed as a(t) + X B(t), where a(t) is a time-dependent intercept term, X is the vector of covariates for the subject (possibly time-dependent), and B(t) is a time-dependent matrix of coefficients.”&lt;/p&gt;
&lt;p&gt;The plots show how the effects of the covariates change over time. Notice the steep slope and then abrupt change in slope of karno.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;aa_fit &amp;lt;- aareg(Surv(time, status) ~ trt + celltype +
                  karno + diagtime + age + prior,
                data = vet)
aa_fit&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Call:
## aareg(formula = Surv(time, status) ~ trt + celltype + karno + 
##     diagtime + age + prior, data = vet)
## 
##   n= 137 
##     75 out of 97 unique event times used
## 
##                       slope      coef se(coef)      z        p
## Intercept          0.083400  3.81e-02 1.09e-02  3.490 4.79e-04
## trttest            0.006730  2.49e-03 2.58e-03  0.967 3.34e-01
## celltypesmallcell  0.015000  7.30e-03 3.38e-03  2.160 3.09e-02
## celltypeadeno      0.018400  1.03e-02 4.20e-03  2.450 1.42e-02
## celltypelarge     -0.001090 -6.21e-04 2.71e-03 -0.229 8.19e-01
## karno             -0.001180 -4.37e-04 8.77e-05 -4.980 6.28e-07
## diagtime          -0.000243 -4.92e-05 1.64e-04 -0.300 7.65e-01
## age               -0.000246 -6.27e-05 1.28e-04 -0.491 6.23e-01
## priorYes           0.003300  1.54e-03 2.86e-03  0.539 5.90e-01
## 
## Chisq=41.62 on 8 df, p=1.6e-06; test weights=aalen&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#summary(aa_fit)  # provides a more complete summary of results
autoplot(aa_fit)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-05-05-survival-analysis-with-r-with-comments_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;random-forests-model&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Random Forests Model&lt;/h3&gt;
&lt;p&gt;As a final example of what some might perceive as a data-science-like way to do time-to-event modeling, I’ll use the &lt;code&gt;ranger()&lt;/code&gt; function to fit a Random Forests Ensemble model to the data. Note however, that there is nothing new about building tree models of survival data. Terry Therneau also wrote the &lt;a href=&#34;https://CRAN.R-project.org/package=rpart&#34;&gt;&lt;code&gt;rpart&lt;/code&gt;&lt;/a&gt; package, R’s basic tree-modeling package, along with Brian Ripley. See section 8.4 of the &lt;a href=&#34;https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf&#34;&gt;rpart vignette&lt;/a&gt; [14], which contains a survival analysis example.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ranger()&lt;/code&gt; builds a model for each observation in the data set. The next block of code builds the model using the same variables used in the Cox model above, and plots twenty random curves, along with a curve that represents the global average for all of the patients. Note that I am using plain old base R graphics here.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# ranger model
r_fit &amp;lt;- ranger(Surv(time, status) ~ trt + celltype + 
                     karno + diagtime + age + prior,
                     data = vet,
                     mtry = 4,
                     importance = &amp;quot;permutation&amp;quot;,
                     splitrule = &amp;quot;extratrees&amp;quot;,
                     verbose = TRUE)

# Average the survival models
death_times &amp;lt;- r_fit$unique.death.times 
surv_prob &amp;lt;- data.frame(r_fit$survival)
avg_prob &amp;lt;- sapply(surv_prob,mean)

# Plot the survival models for each patient
plot(r_fit$unique.death.times,r_fit$survival[1,], 
     type = &amp;quot;l&amp;quot;, 
     ylim = c(0,1),
     col = &amp;quot;red&amp;quot;,
     xlab = &amp;quot;Days&amp;quot;,
     ylab = &amp;quot;survival&amp;quot;,
     main = &amp;quot;Patient Survival Curves&amp;quot;)

# Overlay the curves for 20 randomly chosen patients
cols &amp;lt;- colors()
for (n in sample(2:nrow(vet), 20)){
  lines(r_fit$unique.death.times, r_fit$survival[n,], type = &amp;quot;l&amp;quot;, col = cols[n])
}
lines(death_times, avg_prob, lwd = 2)
legend(500, 0.7, legend = c(&amp;#39;Average = black&amp;#39;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-05-05-survival-analysis-with-r-with-comments_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
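&lt;p&gt;The averaging step is easy to check on a small example. The sketch below uses a hypothetical 3-by-4 matrix as a stand-in for &lt;code&gt;r_fit$survival&lt;/code&gt; (rows are patients, columns are unique death times; the numbers are made up) and confirms that averaging each column with &lt;code&gt;sapply()&lt;/code&gt; matches &lt;code&gt;colMeans()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical survival matrix: 3 patients (rows) x 4 death times (columns)
surv &amp;lt;- matrix(c(1.0, 0.8, 0.5, 0.2,
                 1.0, 0.9, 0.7, 0.4,
                 0.9, 0.6, 0.3, 0.1),
               nrow = 3, byrow = TRUE)

# Column-wise mean, computed the way the post does it and with colMeans()
avg_by_sapply   &amp;lt;- unname(sapply(data.frame(surv), mean))
avg_by_colMeans &amp;lt;- colMeans(surv)

all.equal(avg_by_sapply, avg_by_colMeans)&lt;/code&gt;&lt;/pre&gt;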
&lt;p&gt;The next block of code illustrates how &lt;code&gt;ranger()&lt;/code&gt; ranks variable importance.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;vi &amp;lt;- data.frame(sort(round(r_fit$variable.importance, 4), decreasing = TRUE))
names(vi) &amp;lt;- &amp;quot;importance&amp;quot;
head(vi)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##          importance
## karno        0.0903
## celltype     0.0323
## diagtime    -0.0012
## trt         -0.0013
## prior       -0.0027
## age         -0.0037&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice that &lt;code&gt;ranger()&lt;/code&gt; flags &lt;code&gt;karno&lt;/code&gt; and &lt;code&gt;celltype&lt;/code&gt; as the two most important variables, the same variables that have the smallest p-values in the Cox model. Also note that the importance results just give variable names and not level names. This is because &lt;code&gt;ranger&lt;/code&gt; and other tree models do not usually create dummy variables.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ranger()&lt;/code&gt; also computes &lt;strong&gt;&lt;a href=&#34;https://pdfs.semanticscholar.org/7705/392f1068c76669de750c6d0da8144da3304d.pdf&#34;&gt;Harrell’s c-index&lt;/a&gt;&lt;/strong&gt; (see [8] p. 370 for the definition), which is similar to the Concordance statistic described above. The c-index generalizes the area under the ROC curve to censored data. For binary outcomes it reduces to the &lt;a href=&#34;https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test&#34;&gt;Wilcoxon-Mann-Whitney statistic&lt;/a&gt;, which, in turn, is equivalent to the area under the ROC curve.&lt;/p&gt;
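&lt;p&gt;For contrast, level names such as &lt;code&gt;celltypesmallcell&lt;/code&gt; in regression output come from the dummy coding of a factor in the design matrix. The following minimal sketch uses a made-up four-level factor (the object names are illustrative and not part of the analysis above) to show the columns that base R’s &lt;code&gt;model.matrix()&lt;/code&gt; generates under the default treatment contrasts.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Illustrative factor with the same level names as celltype
celltype &amp;lt;- factor(c(&amp;quot;squamous&amp;quot;, &amp;quot;smallcell&amp;quot;, &amp;quot;adeno&amp;quot;, &amp;quot;large&amp;quot;),
                   levels = c(&amp;quot;squamous&amp;quot;, &amp;quot;smallcell&amp;quot;, &amp;quot;adeno&amp;quot;, &amp;quot;large&amp;quot;))

# Regression-style design matrix: one dummy column per non-reference level
mm &amp;lt;- model.matrix(~ celltype)
colnames(mm)&lt;/code&gt;&lt;/pre&gt;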
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cat(&amp;quot;Prediction Error = 1 - Harrell&amp;#39;s c-index = &amp;quot;, r_fit$prediction.error)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Prediction Error = 1 - Harrell&amp;#39;s c-index =  0.3087233&lt;/code&gt;&lt;/pre&gt;
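&lt;p&gt;To make the definition concrete, here is a brute-force sketch of the c-index in base R. It is an illustration on made-up data, not the code &lt;code&gt;ranger&lt;/code&gt; uses internally: the function name &lt;code&gt;harrell_c()&lt;/code&gt; and the toy vectors are my own, &lt;code&gt;risk&lt;/code&gt; is assumed to be a score where larger values mean higher predicted risk, and pairs with tied times are simply skipped.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Brute-force c-index over all usable pairs (illustrative only)
harrell_c &amp;lt;- function(time, status, risk) {
  concordant &amp;lt;- 0
  usable &amp;lt;- 0
  for (i in seq_along(time)) {
    if (status[i] != 1) next          # censored subjects cannot anchor a pair
    for (j in seq_along(time)) {
      if (time[i] &amp;lt; time[j]) {     # subject i had the event first
        usable &amp;lt;- usable + 1
        if (risk[i] &amp;gt; risk[j]) {
          concordant &amp;lt;- concordant + 1    # higher risk, earlier event
        } else if (risk[i] == risk[j]) {
          concordant &amp;lt;- concordant + 0.5  # tied risk counts as half
        }
      }
    }
  }
  concordant / usable
}

# Hypothetical toy data: five patients, the third is censored
time   &amp;lt;- c(5, 8, 12, 20, 30)
status &amp;lt;- c(1, 1, 0, 1, 1)
risk   &amp;lt;- c(0.9, 0.4, 0.6, 0.5, 0.1)

c_index &amp;lt;- harrell_c(time, status, risk)
c_index      # 6 concordant pairs out of 8 usable pairs
1 - c_index  # the analogue of the prediction error reported above&lt;/code&gt;&lt;/pre&gt;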
&lt;p&gt;A c-index of about .69 (1 - 0.309) would normally be pretty good for a first try. But note that the &lt;code&gt;ranger&lt;/code&gt; model doesn’t do anything to address the time varying coefficients. This apparently is a challenge. In a 2011 &lt;a href=&#34;https://projecteuclid.org/download/pdfview_1/euclid.ssu/1315833185&#34;&gt;paper&lt;/a&gt; [16], Bou-Hamad observes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;However, in the context of survival trees, a further difficulty arises when time–varying effects are included. Hence, we feel that the interpretation of covariate effects with tree ensembles in general is still mainly unsolved and should attract future research.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I believe that the major use for tree-based models for survival data will be to deal with very large data sets.&lt;/p&gt;
&lt;p&gt;Finally, to provide an “eyeball comparison” of the three survival curves, I’ll plot them on the same graph. The following code pulls out the survival data from the three model objects and puts them into a data frame for &lt;code&gt;ggplot()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Set up for ggplot
kmi &amp;lt;- rep(&amp;quot;KM&amp;quot;,length(km_fit$time))
km_df &amp;lt;- data.frame(km_fit$time,km_fit$surv,kmi)
names(km_df) &amp;lt;- c(&amp;quot;Time&amp;quot;,&amp;quot;Surv&amp;quot;,&amp;quot;Model&amp;quot;)

coxi &amp;lt;- rep(&amp;quot;Cox&amp;quot;,length(cox_fit$time))
cox_df &amp;lt;- data.frame(cox_fit$time,cox_fit$surv,coxi)
names(cox_df) &amp;lt;- c(&amp;quot;Time&amp;quot;,&amp;quot;Surv&amp;quot;,&amp;quot;Model&amp;quot;)

rfi &amp;lt;- rep(&amp;quot;RF&amp;quot;,length(r_fit$unique.death.times))
rf_df &amp;lt;- data.frame(r_fit$unique.death.times,avg_prob,rfi)
names(rf_df) &amp;lt;- c(&amp;quot;Time&amp;quot;,&amp;quot;Surv&amp;quot;,&amp;quot;Model&amp;quot;)

plot_df &amp;lt;- rbind(km_df,cox_df,rf_df)

p &amp;lt;- ggplot(plot_df, aes(x = Time, y = Surv, color = Model))
p + geom_line()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-05-05-survival-analysis-with-r-with-comments_files/figure-html/unnamed-chunk-12-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;For this data set, I would put my money on a carefully constructed Cox model that takes into account the time varying coefficients. I suspect that there are neither enough observations nor enough explanatory variables for the &lt;code&gt;ranger()&lt;/code&gt; model to do better.&lt;/p&gt;
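&lt;p&gt;One possible starting point for such a model is to check which coefficients appear to change over time. The sketch below is not part of the analysis above. It assumes the &lt;code&gt;survival&lt;/code&gt; package is installed, uses its &lt;code&gt;veteran&lt;/code&gt; data directly rather than the &lt;code&gt;vet&lt;/code&gt; data frame, and applies &lt;code&gt;cox.zph()&lt;/code&gt; to test the proportional hazards assumption for each term; the object names &lt;code&gt;cox_check&lt;/code&gt; and &lt;code&gt;zph&lt;/code&gt; are my own.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(survival)

# Cox model with the same covariates as above, fit to the raw veteran data
cox_check &amp;lt;- coxph(Surv(time, status) ~ trt + celltype + karno +
                     diagtime + age + prior,
                   data = veteran)

# Test of proportional hazards for each term, plus a global test
zph &amp;lt;- cox.zph(cox_check)
print(zph)  # small p-values suggest a coefficient that varies with time&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Terms flagged by this test are candidates for the time dependent coefficient techniques described in the &lt;code&gt;survival&lt;/code&gt; vignette [12].&lt;/p&gt;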
&lt;p&gt;This four-package excursion only hints at the Survival Analysis tools that are available in R, but it does illustrate some of the richness of the R platform, which has been under continuous development and improvement for nearly twenty years. The &lt;code&gt;ranger&lt;/code&gt; package, which suggests the &lt;code&gt;survival&lt;/code&gt; package, and &lt;code&gt;ggfortify&lt;/code&gt;, which depends on &lt;code&gt;ggplot2&lt;/code&gt; and also suggests the &lt;code&gt;survival&lt;/code&gt; package, illustrate how open-source code allows developers to build on the work of their predecessors. The documentation that accompanies the &lt;code&gt;survival&lt;/code&gt; package, the numerous online resources, and the statistics such as concordance and Harrell’s c-index packed into the objects produced by fitting the models give some idea of the statistical depth that underlies almost everything in R.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;some-tutorials-and-papers&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Some Tutorials and Papers&lt;/h3&gt;
&lt;p&gt;For a very nice, basic tutorial on survival analysis, have a look at the &lt;a href=&#34;https://www.openintro.org/download.php?file=survival_analysis_in_R&amp;amp;referrer=/stat/surv.php&#34;&gt;Survival Analysis in R&lt;/a&gt; [5] and the &lt;a href=&#34;https://CRAN.R-project.org/package=OIsurv&#34;&gt;OIsurv&lt;/a&gt; package produced by the folks at &lt;a href=&#34;https://www.openintro.org/stat/surv.php&#34;&gt;OpenIntro&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Look &lt;a href=&#34;https://d1rkab7tlqy5f1.cloudfront.net/EWI/Over%20de%20faculteit/Afdelingen/Applied%20Mathematics/Applied%20Probability/Risk/Download/Cooke_CPH_encycl.pdf&#34;&gt;here&lt;/a&gt; for an exposition of the Cox Proportional Hazards Model, and &lt;a href=&#34;http://www.medicine.mcgill.ca/epidemiology/hanley/bios601/Encyclopedia%20of%20Biostatistics2ndEd2005.pdf&#34;&gt;here&lt;/a&gt; [11] for an introduction to Aalen’s Additive Regression Model.&lt;/p&gt;
&lt;p&gt;For an elementary treatment of evaluating the proportional hazards assumption that uses the veterans data set, see the text by Kleinbaum and Klein [13].&lt;/p&gt;
&lt;p&gt;For an exposition of the sort of predictive survival analysis modeling that can be done with &lt;code&gt;ranger&lt;/code&gt;, be sure to have a look at Manuel Amunategui’s &lt;a href=&#34;http://amunategui.github.io/survival-ensembles/&#34;&gt;post&lt;/a&gt; and &lt;a href=&#34;https://www.youtube.com/watch?v=6q-UFJUZK0g&amp;amp;list=UUq4pm1i_VZqxKVVOz5qRBIA&amp;amp;index=1&#34;&gt;video&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;See the &lt;a href=&#34;http://kooperberg.fhcrc.org/papers/1995trees.pdf&#34;&gt;1995 paper&lt;/a&gt; [15] by Intrator and Kooperberg for an early review of using classification and regression trees to study survival data.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;p&gt;For convenience, I have collected the references used throughout the post here.&lt;/p&gt;
&lt;p&gt;[1] Hacking, Ian. (2006) &lt;em&gt;The Emergence of Probability: A Philosophical Study of Early Ideas about Probability Induction and Statistical Inference.&lt;/em&gt; Cambridge University Press, 2nd ed., p. 11&lt;br /&gt;
[2] Andersen, P.K., Keiding, N. (1998) &lt;a href=&#34;http://www.pauldickman.com/survival/handouts/21%20-%20EoB%20Survival%20analysis%20overview.pdf&#34;&gt;&lt;em&gt;Survival analysis&lt;/em&gt;&lt;/a&gt; Encyclopedia of Biostatistics 6. Wiley, pp. 4452-4461&lt;br /&gt;
[3] Kaplan, E.L. &amp;amp; Meier, P. (1958). &lt;em&gt;Non-parametric estimation from incomplete observations&lt;/em&gt;, J American Stats Assn. 53, pp. 457–481, 562–563.&lt;br /&gt;
[4] Cox, D.R. (1972). &lt;em&gt;Regression models and life-tables&lt;/em&gt; (with discussion), Journal of the Royal Statistical Society (B) 34, pp. 187–220.&lt;br /&gt;
[5] Diez, David. &lt;a href=&#34;https://www.openintro.org/download.php?file=survival_analysis_in_R&amp;amp;referrer=/stat/surv.php&#34;&gt;&lt;em&gt;Survival Analysis in R&lt;/em&gt;&lt;/a&gt;, &lt;a href=&#34;https://www.openintro.org/stat/surv.php&#34;&gt;OpenIntro&lt;/a&gt;&lt;br /&gt;
[6] Klein, John P and Moeschberger, Melvin L. &lt;em&gt;Survival Analysis Techniques for Censored and Truncated Data&lt;/em&gt;, Springer. (1997)&lt;br /&gt;
[7] Wright, Marvin &amp;amp; Ziegler, Andreas. (2017) &lt;a href=&#34;https://www.jstatsoft.org/article/view/v077i01&#34;&gt;&lt;em&gt;ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R&lt;/em&gt;&lt;/a&gt;, JSS Vol 77, Issue 1.&lt;br /&gt;
[8] Harrell, Frank, Lee, Kerry &amp;amp; Mark, Daniel. &lt;a href=&#34;https://pdfs.semanticscholar.org/7705/392f1068c76669de750c6d0da8144da3304d.pdf&#34;&gt;&lt;em&gt;Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors&lt;/em&gt;&lt;/a&gt;. Statistics in Medicine, Vol 15 (1996), pp. 361-387&lt;br /&gt;
[9] Amunategui, Manuel. &lt;a href=&#34;http://amunategui.github.io/survival-ensembles/&#34;&gt;&lt;em&gt;Survival Ensembles: Survival Plus Classification for Improved Time-Based Predictions in R&lt;/em&gt;&lt;/a&gt;&lt;br /&gt;
[10] NUS Course Notes. &lt;a href=&#34;https://courses.nus.edu.sg/course/stacar/internet/st3242/handouts/notes3.pdf&#34;&gt;&lt;em&gt;Chapter 3 The Cox Proportional Hazards Model&lt;/em&gt;&lt;/a&gt;&lt;br /&gt;
[11] Encyclopedia of Biostatistics, 2nd Edition (2005). &lt;a href=&#34;http://www.medicine.mcgill.ca/epidemiology/hanley/bios601/Encyclopedia%20of%20Biostatistics2ndEd2005.pdf&#34;&gt;&lt;em&gt;Aalen’s Additive Regression Model&lt;/em&gt;&lt;/a&gt;&lt;br /&gt;
[12] Therneau et al. &lt;a href=&#34;https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf&#34;&gt;&lt;em&gt;Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model&lt;/em&gt;&lt;/a&gt;&lt;br /&gt;
[13] Kleinbaum, D.G. and Klein, M. &lt;a href=&#34;https://www.amazon.com/Survival-Analysis-Self-Learning-Statistics-Biology/dp/1441966455/ref=sr_1_1?ie=UTF8&amp;amp;qid=1504809712&amp;amp;sr=8-1&amp;amp;keywords=survival+analysis+a+self-learning+text&#34;&gt;&lt;em&gt;Survival Analysis, A Self Learning Text&lt;/em&gt;&lt;/a&gt; Springer (2005)&lt;br /&gt;
[14] Therneau, T and Atkinson, E. &lt;a href=&#34;https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf&#34;&gt;&lt;em&gt;An Introduction to Recursive Partitioning Using RPART Routines&lt;/em&gt;&lt;/a&gt;&lt;br /&gt;
[15] Intrator, O. and Kooperberg, C. &lt;a href=&#34;http://kooperberg.fhcrc.org/papers/1995trees.pdf&#34;&gt;&lt;em&gt;Trees and splines in survival analysis&lt;/em&gt;&lt;/a&gt; Statistical Methods in Medical Research (1995)&lt;br /&gt;
[16] Bou-Hamad, I. &lt;a href=&#34;https://projecteuclid.org/download/pdfview_1/euclid.ssu/1315833185&#34;&gt;&lt;em&gt;A review of survival trees&lt;/em&gt;&lt;/a&gt; Statistics Surveys Vol.5 (2011)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Author’s note: this post was originally published on April 26, 2017 but was subsequently withdrawn because of an error spotted by Dr. Terry Therneau. He observed that the Cox Proportional Hazards Model fitted in that post did not properly account for the time varying covariates. This revised post makes use of a different data set, and points to resources for addressing time varying covariates. Many thanks to Dr. Therneau. Any errors that remain are mine.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;

      </description>
    </item>
    
  </channel>
</rss>
