<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Continuous Time Markov Chains on R Views</title>
    <link>https://rviews.rstudio.com/tags/continuous-time-markov-chains/</link>
    <description>Recent content in Continuous Time Markov Chains on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 19 Apr 2023 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/tags/continuous-time-markov-chains/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>Multistate Models for Medical Applications</title>
      <link>https://rviews.rstudio.com/2023/04/19/multistate-models-for-medical-applications/</link>
      <pubDate>Wed, 19 Apr 2023 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2023/04/19/multistate-models-for-medical-applications/</guid>
      <description>
        


&lt;p&gt;Clinical research studies and healthcare economics studies are frequently concerned with assessing the prognosis for survival in circumstances where patients suffer from a disease that progresses from state to state. Standard survival models only directly model two states: alive and dead. Multi-state models enable directly modeling disease progression where patients are observed to be in various states of health or disease at random intervals, but for which, except for death, the times of entering or leaving states are unknown. Multi-state models easily accommodate &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3684949/#:~:text=In%20statistical%20literature%2C%20interval%20censoring,instead%20of%20being%20observed%20exactly.&#34;&gt;interval censored&lt;/a&gt; intermediate states while making the usual assumption that death times are known but may be &lt;a href=&#34;https://en.wikipedia.org/wiki/Censoring_(statistics)&#34;&gt;right censored&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A natural way to conceptualize modeling the dynamics of disease progression with interval censored states is as continuous time Markov chains. The following diagram illustrates a possible disease progression model where there is some possibility of dying from any state, but otherwise a patient would progress from being healthy, to mild disease, to severe disease and then perhaps death.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/2023/04/19/multistate-models-for-medical-applications/index_files/figure-html/unnamed-chunk-1-1.png&#34; width=&#34;672&#34; /&gt;
It is true that the Markov assumption implies that the times patients spend in the various states are exponentially distributed. However, the mathematical theory of stochastic multi-state processes is very rich and can accommodate more realistic models with state dependent hazard rates that vary over time and other relaxations of the Markov assumption. Moreover, there is robust software in R (and other languages) that makes multi-state stochastic survival models practical.&lt;/p&gt;
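&lt;p&gt;As a small illustration of the exponential sojourn times implied by the Markov assumption, the base R sketch below checks the memoryless property: the chance of remaining in a state for a further period does not depend on how long the patient has already been there. The exit rate of 0.14 per year and the object names are arbitrary choices for illustration, not estimates from any model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Exit rate from a state, per year (illustrative value only)
rate &amp;lt;- 0.14
# The mean of an exponential sojourn time is 1 / rate
mean_sojourn &amp;lt;- 1 / rate
# Memoryless property: P(T &amp;gt; s + u | T &amp;gt; s) equals P(T &amp;gt; u)
s &amp;lt;- 3
u &amp;lt;- 2
p_conditional &amp;lt;- pexp(s + u, rate, lower.tail = FALSE) / pexp(s, rate, lower.tail = FALSE)
p_unconditional &amp;lt;- pexp(u, rate, lower.tail = FALSE)
all.equal(p_conditional, p_unconditional)&lt;/code&gt;&lt;/pre&gt;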
&lt;p&gt;In the remainder of this post, I present a variation of a disease progression model discussed by Ardo van den Hout in some detail in his incredibly informative and very readable monograph &lt;a href=&#34;https://www.routledge.com/Mult-i--State-Survival-Models-for-Interval-Censored-Data/Hout/p/book/9780367570569&#34;&gt;Multi-State Survival Models for Interval-Censored Data&lt;/a&gt;. Also note that van den Hout’s model is itself an elaboration of the main example presented by Christopher Jackson in the &lt;a href=&#34;https://cran.r-project.org/web/packages/msm/vignettes/msm-manual.pdf&#34;&gt;vignette&lt;/a&gt; to his &lt;code&gt;msm&lt;/code&gt; package. This post presents a slower development of the model developed by Jackson and van den Hout that might be easier for a person not already familiar with the &lt;code&gt;msm&lt;/code&gt; package to follow.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)
library(tidymodels)
library(msm)&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;the-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;The Data&lt;/h3&gt;
&lt;p&gt;The data set explored by both Jackson and van den Hout is the Cardiac Allograft Vasculopathy (CAV) data set which contains the individual histories of angiographic examinations of 622 heart transplant recipients collected at the Papworth Hospital in the United Kingdom. This data is included in the &lt;code&gt;msm&lt;/code&gt; package and is a good candidate to be the &lt;em&gt;iris&lt;/em&gt; dataset for progressive disease models. It is a rich data set with 2846 rows and multiple covariates, including patient age and time since transplant, both of which can be used as time scales, multiple state transitions among four states, and no missing values. Observations of intermediate states are interval censored and have been recorded at varying time intervals. Deaths are “exact” or right censored.&lt;/p&gt;
&lt;p&gt;The following code creates a new variable that preserves the original state data for each observation and displays the data in tibble format.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(1234)
df &amp;lt;- tibble(cav) %&amp;gt;% mutate(o_state = state)

df&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2,846 × 11
##     PTNUM   age years  dage   sex pdiag cumrej state firstobs statemax o_state
##     &amp;lt;int&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;fct&amp;gt;  &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;    &amp;lt;int&amp;gt;    &amp;lt;dbl&amp;gt;   &amp;lt;int&amp;gt;
##  1 100002  52.5  0       21     0 IHD        0     1        1        1       1
##  2 100002  53.5  1.00    21     0 IHD        2     1        0        1       1
##  3 100002  54.5  2.00    21     0 IHD        2     2        0        2       2
##  4 100002  55.6  3.09    21     0 IHD        2     2        0        2       2
##  5 100002  56.5  4       21     0 IHD        3     2        0        2       2
##  6 100002  57.5  5.00    21     0 IHD        3     3        0        3       3
##  7 100002  58.4  5.85    21     0 IHD        3     4        0        4       4
##  8 100003  29.5  0       17     0 IHD        0     1        1        1       1
##  9 100003  30.7  1.19    17     0 IHD        1     1        0        1       1
## 10 100003  31.5  2.01    17     0 IHD        1     3        0        3       3
## # ℹ 2,836 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The state table, which presents the number of times each pair of states was observed at successive observation times, shows 46 transitions from state 2 (Mild CAV) to state 1 (No CAV), 4 transitions from state 3 (Severe CAV) to No CAV and 13 transitions from Severe CAV to Mild CAV.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;statetable.msm(state = state, subject = PTNUM, data = df)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##     to
## from    1    2    3    4
##    1 1367  204   44  148
##    2   46  134   54   48
##    3    4   13  107   55&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I will follow van den Hout and assume these backward transitions are misclassified and alter the state variable so there is no backsliding. The following code does this in a tidy way and also creates a new variable &lt;code&gt;b_age&lt;/code&gt; which records the baseline age at which patients entered the study. (Note: you can find van den Hout’s code &lt;a href=&#34;https://www.ucl.ac.uk/~ucakadl/Book/Ch1_CAV_MsmAnalysis.r&#34;&gt;here&lt;/a&gt;.)&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;df1 &amp;lt;- df %&amp;gt;% group_by(PTNUM) %&amp;gt;% 
                     mutate(b_age = min(age),
                            state = cummax(state)
                     )&lt;/code&gt;&lt;/pre&gt;
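&lt;p&gt;To see what &lt;code&gt;cummax()&lt;/code&gt; does within a single patient, here is a toy sequence of states (hypothetical, not taken from the CAV data) that contains two backward steps.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical observed states for one patient, with two backward steps
observed &amp;lt;- c(1, 2, 1, 3, 2, 4)
# cummax() replaces each value with the highest state seen so far
corrected &amp;lt;- cummax(observed)
corrected   # returns 1 2 2 3 3 4&lt;/code&gt;&lt;/pre&gt;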
&lt;p&gt;This transformation will make the state transition table conform to the diagram above, but with state 1 representing No CAV rather than Health.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;statetable.msm(state = state, subject = PTNUM, data = df1)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##     to
## from    1    2    3    4
##    1 1336  185   40  139
##    2    0  220   52   49
##    3    0    0  140   63&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;setting-up-and-running-the-model&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Setting Up and Running the Model&lt;/h3&gt;
&lt;p&gt;The next step is to set up the model using the function &lt;code&gt;msm()&lt;/code&gt; whose great flexibility means that some care must be taken to set parameter values.&lt;/p&gt;
&lt;p&gt;First, we set up the initial guess for the intensity matrix, Q, which determines the transition rates among states for a continuous time Markov chain. For the &lt;code&gt;msm&lt;/code&gt; function, positive values indicate possible transitions.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Intensity matrix Q:
q &amp;lt;- 0.01
Q &amp;lt;- rbind(c(0,q,0,q), c(0,0,q,q),c(0,0,0,q),c(0,0,0,0))
qnames &amp;lt;- c(&amp;quot;q12&amp;quot;,&amp;quot;q14&amp;quot;,&amp;quot;q23&amp;quot;,&amp;quot;q24&amp;quot;,&amp;quot;q34&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
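&lt;p&gt;Only the off-diagonal entries need to be supplied. In a continuous time Markov chain each row of the intensity matrix sums to zero, so each diagonal entry is minus the sum of the other entries in its row, and &lt;code&gt;msm()&lt;/code&gt; ignores whatever is placed on the diagonal. The following base R sketch makes this explicit for the initial matrix; the name &lt;code&gt;Q_full&lt;/code&gt; is hypothetical and is not used elsewhere in this post.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Initial off-diagonal intensities, as above
q &amp;lt;- 0.01
Q &amp;lt;- rbind(c(0,q,0,q), c(0,0,q,q), c(0,0,0,q), c(0,0,0,0))
# Complete the matrix: diagonal = minus the total rate of leaving each state
Q_full &amp;lt;- Q
diag(Q_full) &amp;lt;- -rowSums(Q)
Q_full
# Every row of a valid intensity matrix sums to zero
rowSums(Q_full)&lt;/code&gt;&lt;/pre&gt;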
&lt;p&gt;Next, we set up the covariate structure which van den Hout discusses in his monograph, but does not show in the code on the book’s website referenced above. For this model, transitions from state 1 to state 2 and from state 1 to state 4 depend on time, &lt;code&gt;years&lt;/code&gt;, the age of the patient at transplant time, &lt;code&gt;b_age&lt;/code&gt;, and &lt;code&gt;dage&lt;/code&gt;, the age of the donor. The other transitions depend only on &lt;code&gt;dage&lt;/code&gt;. So, we see that &lt;code&gt;msm()&lt;/code&gt; can deal with time varying covariates as well as permitting individual state transitions to be driven by different covariates.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;covariates = list(&amp;quot;1-2&amp;quot; = ~ years + b_age + dage , 
                  &amp;quot;1-4&amp;quot; = ~ years + b_age + dage ,
                  &amp;quot;2-3&amp;quot; = ~ dage,
                  &amp;quot;2-4&amp;quot; = ~ dage,
                  &amp;quot;3-4&amp;quot; = ~ dage)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, we set the remaining parameters for the model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;obstype &amp;lt;- 1
center &amp;lt;- FALSE
deathexact &amp;lt;- TRUE
method &amp;lt;- &amp;quot;BFGS&amp;quot;
control &amp;lt;- list(trace = 0, REPORT = 1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;obstype = 1&lt;/strong&gt; indicates that observations have been taken at arbitrary time points. They are &lt;em&gt;snapshots&lt;/em&gt; of the process that are common for panel data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;center = FALSE&lt;/strong&gt; means that covariates will not be centered at their means during the maximum likelihood estimation process. The default for this parameter is TRUE.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;deathexact = TRUE&lt;/strong&gt; indicates that the final absorbing state is exactly observed. This is the defining assumption of survival data. In &lt;code&gt;msm&lt;/code&gt; this is equivalent to setting obstype = 3 for state 4, our absorbing state.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;method = BFGS&lt;/strong&gt; signals &lt;code&gt;optim()&lt;/code&gt; to use the optimization method published simultaneously in 1970 by Broyden, Fletcher, Goldfarb and Shanno (look &lt;a href=&#34;https://en.wikipedia.org/wiki/Charles_George_Broyden&#34;&gt;here&lt;/a&gt;). This is a quasi-Newton method that uses function values and gradients to build up a picture of the surface to be optimized.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;control = list(trace=0,REPORT=1)&lt;/strong&gt; indicates more parameters that will be passed to &lt;code&gt;optim()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;REPORT&lt;/strong&gt; sets the frequency of reports for the “BFGS”, “L-BFGS-B” and “SANN” methods if control$trace is positive. Defaults to every 10 iterations for “BFGS” and “L-BFGS-B”, or every 100 temperatures for “SANN”. (Note: SANN is a variant of the simulated annealing method presented by C. J. P. Belisle (1992) &lt;em&gt;Convergence theorems for a class of simulated annealing algorithms on R&lt;sup&gt;d&lt;/sup&gt;&lt;/em&gt; Journal of Applied Probability.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;trace&lt;/strong&gt; is also passed to &lt;code&gt;optim()&lt;/code&gt;. trace must be a non-negative integer. If positive, tracing information on the progress of the optimization is produced. Higher values may produce more tracing information.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1 &amp;lt;- msm(state~years, subject = PTNUM, data = df1, center= center, 
             qmatrix=Q, obstype = obstype, deathexact = deathexact, method = method,
             covariates = covariates, control = control)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First, check whether the model has converged. The convergence code returned by &lt;code&gt;optim()&lt;/code&gt; is 0 for successful convergence and 1 when the maximum iteration limit has been reached. Codes 51 and 52 indicate a warning and an error, respectively, from the “L-BFGS-B” method, so they should not appear when using “BFGS”.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Model Status
conv &amp;lt;- model_1$opt$convergence; cat(&amp;quot;Convergence code =&amp;quot;, conv,&amp;quot;\n&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Convergence code = 0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, assess how well the model fits the data using a visual test proposed by &lt;a href=&#34;https://onlinelibrary.wiley.com/doi/10.1002/sim.4780130803&#34;&gt;Gentleman et al. (1994)&lt;/a&gt;, which plots the observed numbers of individuals occupying a state at a series of times against forecasts from the fitted model, for each state. The &lt;code&gt;msm&lt;/code&gt; function &lt;code&gt;plot.prevalence.msm()&lt;/code&gt; produces a perfectly adequate base R plot. However, to emphasize that &lt;code&gt;msm&lt;/code&gt; users are not limited to base R plots, I’ll do a little extra work to use &lt;code&gt;ggplot()&lt;/code&gt;. When a package author is kind enough to provide an extractor function you can do anything you want with the data.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;prevalence.msm()&lt;/code&gt; function extracts both the observed and forecast prevalence matrices.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;prev &amp;lt;- prevalence.msm(model_1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This not very elegant but straightforward code reshapes the data and plots it.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# reshape observed prevalence
do1 &amp;lt;-as_tibble(row.names(prev$Observed)) %&amp;gt;% rename(time = value) %&amp;gt;% 
          mutate(time = as.numeric(time))
do2 &amp;lt;-as_tibble(prev$Observed) %&amp;gt;% mutate(type = &amp;quot;observed&amp;quot;)
do &amp;lt;- cbind(do1,do2) %&amp;gt;% select(-Total)
do_l &amp;lt;- do %&amp;gt;% gather(state, number, -time, -type)
# reshape expected prevalence
de1 &amp;lt;-as_tibble(row.names(prev$Expected)) %&amp;gt;% rename(time = value) %&amp;gt;% 
          mutate(time = as.numeric(time))
de2 &amp;lt;-as_tibble(prev$Expected) %&amp;gt;% mutate(type = &amp;quot;expected&amp;quot;)
de &amp;lt;- cbind(de1,de2) %&amp;gt;% select(-Total) 
de_l &amp;lt;- de %&amp;gt;% gather(state, number, -time, -type) 

# bind into a single data frame
prev_l &amp;lt;-rbind(do_l,de_l) %&amp;gt;% mutate(type = factor(type),
                                     state = factor(state),
                                     time = round(time,3))


# plot for comparison
prev_gp &amp;lt;- prev_l %&amp;gt;% group_by(state)
pp &amp;lt;- prev_l %&amp;gt;% ggplot() +
     geom_line(aes(time, number, color = type)) +
     xlab(&amp;quot;time&amp;quot;) +
     ylab(&amp;quot;&amp;quot;) +
     ggtitle(&amp;quot;&amp;quot;)
pp + facet_wrap(~state)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/2023/04/19/multistate-models-for-medical-applications/index_files/figure-html/unnamed-chunk-13-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The agreement of the observed and forecast prevalence for states 1 through 3 looks pretty good. After about 8 years the observed deaths are notably higher than the forecast. As Jackson points out (see the &lt;a href=&#34;https://cran.r-project.org/web/packages/msm/vignettes/msm-manual.pdf&#34;&gt;msm Manual&lt;/a&gt;, page 33), this kind of discrepancy could indicate that the underlying process is not homogeneous. I have attempted to capture this non-homogeneity by having some of the transitions depend on time. And, although the plot above looks a little better than the plot in the manual, which does not attempt to model non-homogeneity, it is apparent that there is room to find a better model!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;survival-curves-and-calculations&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Survival Curves and Calculations&lt;/h3&gt;
&lt;p&gt;Now, we can jump straight to the major result and look at the fitted survival curves. There is a &lt;code&gt;plot()&lt;/code&gt; method for &lt;code&gt;msm&lt;/code&gt; that will directly plot these curves. However, I want to emphasize that if a package author is kind enough to provide a &lt;code&gt;plot&lt;/code&gt; method, it will probably not be too difficult to hack the code for the method to use an alternative plotting system. To save space, I will not show my code, but you can easily recreate it by starting with the &lt;code&gt;plot.msm()&lt;/code&gt; function, deleting the plotting parts and returning the values for time and the states Health, Mild_CAV, and Severe_CAV, which are used in the code below. Check my hack by running &lt;code&gt;plot(model_1)&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# plot_prep was obtained from plot.msm()
res &amp;lt;- plot_prep(model_1)
time &amp;lt;- res[[1]]
Health &amp;lt;- res[[2]]
Mild_CAV &amp;lt;- res[[3]]
Severe_CAV &amp;lt;- res[[4]]
df_w &amp;lt;- tibble(time,Health, Mild_CAV, Severe_CAV)
df_l &amp;lt;- df_w %&amp;gt;% gather(&amp;quot;state&amp;quot;, &amp;quot;survival&amp;quot;, -time)
p &amp;lt;- df_l %&amp;gt;% ggplot(aes(time, 1 - survival, group = state)) +
     geom_line(aes(color = state)) +
     xlab(&amp;quot;Years&amp;quot;) +
     ylab(&amp;quot;Probability&amp;quot;) +
     ggtitle(&amp;quot;Fitted Survival Probabilities&amp;quot;)
p&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/2023/04/19/multistate-models-for-medical-applications/index_files/figure-html/unnamed-chunk-15-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;These curves indicate that a treatment that could prevent CAV or at least delay progression from mild CAV to severe CAV might prolong survival. Additionally, the Markov structure of the model permits extracting information that relates to disease progression and the total time spent in each state.&lt;/p&gt;
&lt;p&gt;The function &lt;code&gt;totlos.msm()&lt;/code&gt; estimates the total expected time that a patient will spend in each state. Parameter settings for this function include:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;start&lt;/strong&gt; = c(1,0,0,0) specifies that patients will start in state 1 with probability 1.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;fromt&lt;/strong&gt; = 0 indicates starting at the beginning of the process.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;covariates&lt;/strong&gt; = “mean” indicates that the covariates will be set to their mean values for the calculation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;total_state_time &amp;lt;- totlos.msm(model_1, start = c(1,0,0,0), fromt = 0, covariates = &amp;quot;mean&amp;quot;)
total_state_time&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## State 1 State 2 State 3 State 4 
##   7.002   2.473   1.621     Inf&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The table indicates that the mean time a patient can expect to avoid CAV is about 7 years. After progressing to Mild_CAV, a patient can expect five additional years.&lt;/p&gt;
&lt;p&gt;A more direct calculation based on the intensity matrix, Q, gives the expected time to the “absorbing” state, Death, from each of the “transient” living states.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;time_to_death &amp;lt;- efpt.msm(model_1, tostate = 4, covariates = &amp;quot;mean&amp;quot;)
time_to_death&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 11.097  5.836  3.005  0.000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This agrees with the total state times calculated above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sum(total_state_time[1:3])&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 11.1&lt;/code&gt;&lt;/pre&gt;
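&lt;p&gt;The agreement is no accident. For a chain with a single absorbing state, the vector of expected times to absorption, e, solves the linear system -Q&lt;sub&gt;T&lt;/sub&gt; e = 1, where Q&lt;sub&gt;T&lt;/sub&gt; is the block of the intensity matrix restricted to the transient states. The base R sketch below applies this to the rounded intensities reported by &lt;code&gt;qmatrix.msm(model_1, covariates = &amp;quot;mean&amp;quot;)&lt;/code&gt; in the Hazard Ratios section of this post. The object names are illustrative, and small differences from &lt;code&gt;efpt.msm()&lt;/code&gt; would be due to rounding of the inputs.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Transient block (states 1 to 3) of Q at mean covariate values,
# typed in from the rounded qmatrix.msm() output
Q_T &amp;lt;- rbind(c(-0.14281,  0.10019,  0.00000),
             c( 0.00000, -0.28369,  0.21821),
             c( 0.00000,  0.00000, -0.33281))
# Solve -Q_T e = 1 for the expected time to reach state 4 from each state
time_to_absorption &amp;lt;- solve(-Q_T, rep(1, 3))
round(time_to_absorption, 3)&lt;/code&gt;&lt;/pre&gt;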
&lt;p&gt;Within the scope of the information provided by the covariates, it is also possible to generate more individualized forecasts. For example, here is the expected time to death for a person starting off with no CAV at age 60, who received a heart from a 20 year old donor, 5 years after the transplant.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;efpt.msm(model_1, tostate = 4, start = c(1,0,0,0), covariates = list(years = 5, b_age = 60, dage = 20))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##       [,1]
## [1,] 7.953&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A related quantity, mean sojourn time, is the mean time that each visit to each state is expected to last. Since we are assuming a progressive disease model where each patient visits each state at most once, the estimate should be close to the total time spent in each state. However, Jackson notes that in a progressive model, sojourn time in the disease state will be greater than the expected length of stay in the disease state because the mean sojourn time in a state is conditional on entering the state, whereas the expected total time in a diseased state is a forecast for an individual, who may die before getting the disease. (See &lt;code&gt;help(totlos.msm)&lt;/code&gt;.) And indeed, that is what we see here for states 2 and 3.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sojourn.msm(model_1, covariates=&amp;quot;mean&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         estimates     SE     L     U
## State 1     7.002 0.4024 6.256 7.837
## State 2     3.525 0.3020 2.980 4.169
## State 3     3.005 0.3748 2.353 3.837&lt;/code&gt;&lt;/pre&gt;
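&lt;p&gt;Both quantities can be recovered from the intensity matrix. The mean sojourn time in state r is -1/q&lt;sub&gt;rr&lt;/sub&gt;, and in a progressive model the expected total time in a state is the probability of ever entering it multiplied by its mean sojourn time. Here is a minimal base R sketch, assuming the rounded intensities from &lt;code&gt;qmatrix.msm(model_1, covariates = &amp;quot;mean&amp;quot;)&lt;/code&gt; and using illustrative object names.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Intensity matrix at mean covariate values (rounded)
Q &amp;lt;- rbind(c(-0.14281,  0.10019,  0.00000, 0.04262),
           c( 0.00000, -0.28369,  0.21821, 0.06548),
           c( 0.00000,  0.00000, -0.33281, 0.33281),
           c( 0.00000,  0.00000,  0.00000, 0.00000))
# Mean sojourn time in each transient state: -1 / q_rr
sojourn &amp;lt;- -1 / diag(Q)[1:3]
# Probability of ever entering states 1, 2 and 3 when starting in state 1
p_12 &amp;lt;- Q[1, 2] / (-Q[1, 1])
p_23 &amp;lt;- Q[2, 3] / (-Q[2, 2])
p_enter &amp;lt;- c(1, p_12, p_12 * p_23)
# Expected total time in each state = entry probability * mean sojourn time
total_time &amp;lt;- p_enter * sojourn
round(rbind(sojourn, p_enter, total_time), 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The last row should be close to the &lt;code&gt;totlos.msm()&lt;/code&gt; output above. The gap between the first and last rows for states 2 and 3 comes entirely from the entry probabilities being less than 1.&lt;/p&gt;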
&lt;/div&gt;
&lt;div id=&#34;hazard-ratios&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Hazard Ratios&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;model_1&lt;/code&gt; will also produce estimates of hazard ratios which show the estimated effect of each covariate on the transition intensities for each state.&lt;/p&gt;
&lt;p&gt;Here is the table of Hazard Ratios:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Call:
## msm(formula = state ~ years, subject = PTNUM, data = df1, qmatrix = Q,     obstype = obstype, covariates = covariates, deathexact = deathexact,     center = center, method = method, control = control)
## 
## Maximum likelihood estimates
## Baselines are with covariates set to 0
## 
## Transition intensities with hazard ratios for each covariate
##                   Baseline                         years              
## State 1 - State 1 -0.032750 (-0.0607897,-0.017644)                    
## State 1 - State 2  0.030957 ( 0.0160470, 0.059721) 1.112 (1.061,1.166)
## State 1 - State 4  0.001793 ( 0.0004703, 0.006836) 1.093 (1.012,1.182)
## State 2 - State 2 -0.395633 (-0.6488723,-0.241227)                    
## State 2 - State 3  0.264310 ( 0.1488153, 0.469441) 1.000              
## State 2 - State 4  0.131323 ( 0.0330133, 0.522385) 1.000              
## State 3 - State 3 -0.434548 (-0.9113857,-0.207192)                    
## State 3 - State 4  0.434548 ( 0.2071918, 0.911386) 1.000              
##                   b_age                dage                 
## State 1 - State 1                                           
## State 1 - State 2 1.001 (0.9884,1.014) 1.0281 (1.0159,1.040)
## State 1 - State 4 1.053 (1.0271,1.079) 1.0208 (1.0039,1.038)
## State 2 - State 2                                           
## State 2 - State 3 1.000                0.9932 (0.9756,1.011)
## State 2 - State 4 1.000                0.9757 (0.9298,1.024)
## State 3 - State 3                                           
## State 3 - State 4 1.000                0.9906 (0.9672,1.015)
## 
## -2 * log-likelihood:  3466&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The table shows that time, the covariate &lt;code&gt;years&lt;/code&gt;, affects disease progression represented by the transition from state 1 to state 2, but has a smaller effect on the transition from state 1 to state 4.&lt;/p&gt;
&lt;p&gt;The covariate &lt;code&gt;b_age&lt;/code&gt;, the baseline age of patient at transplant time has a larger effect on dying before the onset of CAV than on the transition to CAV.&lt;/p&gt;
&lt;p&gt;The covariate &lt;code&gt;dage&lt;/code&gt; has a minor effect on the transitions from state 1 but apparently has no effect thereafter.&lt;/p&gt;
&lt;p&gt;The hazard ratios are computed by calculating the exponential of the estimated covariate effects on the log-transition intensities for the Markov process which are stored in the model object.&lt;/p&gt;
&lt;p&gt;To see how these work, first look at the baseline transition intensities.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1$Qmatrices$baseline&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##          State 1  State 2 State 3  State 4
## State 1 -0.03275  0.03096  0.0000 0.001793
## State 2  0.00000 -0.39563  0.2643 0.131323
## State 3  0.00000  0.00000 -0.4345 0.434548
## State 4  0.00000  0.00000  0.0000 0.000000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These baseline intensities are the entries of the model intensity matrix, Q, with all covariates set to zero. They can also be extracted directly from the model by the &lt;code&gt;qmatrix.msm()&lt;/code&gt; extractor function with covariates set to zero.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;qmatrix.msm(model_1,  covariates = 0)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1                          State 2                         
## State 1 -0.032750 (-0.0607897,-0.017644)  0.030957 ( 0.0160470, 0.059721)
## State 2 0                                -0.395633 (-0.6488723,-0.241227)
## State 3 0                                0                               
## State 4 0                                0                               
##         State 3                          State 4                         
## State 1 0                                 0.001793 ( 0.0004703, 0.006836)
## State 2  0.264310 ( 0.1488153, 0.469441)  0.131323 ( 0.0330133, 0.522385)
## State 3 -0.434548 (-0.9113857,-0.207192)  0.434548 ( 0.2071918, 0.911386)
## State 4 0                                0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The 95% confidence limits are computed by assuming normality of the log-effect.&lt;/p&gt;
&lt;p&gt;A more representative value for the intensity matrix for this model can be obtained by setting the covariates to their expected mean values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;qmatrix.msm(model_1,  covariates = &amp;quot;mean&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1                      State 2                     
## State 1 -0.14281 (-0.15984,-0.12760)  0.10019 ( 0.08742, 0.11482)
## State 2 0                            -0.28369 (-0.33556,-0.23984)
## State 3 0                            0                           
## State 4 0                            0                           
##         State 3                      State 4                     
## State 1 0                             0.04262 ( 0.03339, 0.05441)
## State 2  0.21821 ( 0.17636, 0.26999)  0.06548 ( 0.03872, 0.11075)
## State 3 -0.33281 (-0.42498,-0.26064)  0.33281 ( 0.26064, 0.42498)
## State 4 0                            0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we may want to examine the contribution of the covariates to the hazard ratios. To take a particular example, look at the contribution of &lt;code&gt;dage&lt;/code&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1$Qmatrices$dage&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2   State 3   State 4
## State 1       0 0.02771  0.000000  0.020575
## State 2       0 0.00000 -0.006783 -0.024625
## State 3       0 0.00000  0.000000 -0.009439
## State 4       0 0.00000  0.000000  0.000000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and focus on the contribution of &lt;code&gt;dage&lt;/code&gt; to the intensity matrix for the transition from state 3 to state 4, which is given as -0.009439 in the table above. Taking the exponential of this value yields the hazard ratio for the &lt;code&gt;dage&lt;/code&gt; state 3 to 4 transition in the hazard ratio table we got by printing out &lt;code&gt;model_1&lt;/code&gt; above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;exp(-.009439)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.9906&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The corresponding tables of log hazard ratios for the remaining covariates are given by:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1$Qmatrices$years&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2 State 3 State 4
## State 1       0  0.1064       0 0.08933
## State 2       0  0.0000       0 0.00000
## State 3       0  0.0000       0 0.00000
## State 4       0  0.0000       0 0.00000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_1$Qmatrices$b_age&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1   State 2 State 3 State 4
## State 1       0 0.0009645       0 0.05152
## State 2       0 0.0000000       0 0.00000
## State 3       0 0.0000000       0 0.00000
## State 4       0 0.0000000       0 0.00000&lt;/code&gt;&lt;/pre&gt;
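&lt;p&gt;These matrices are all that is needed to rebuild the intensity matrix for any covariate profile, since each intensity is its baseline value multiplied by the exponential of the linear predictor. The sketch below is an illustration that uses the rounded values printed above and hypothetical object names. It reconstructs the intensities for years = 5, b_age = 60 and dage = 20 and then solves for the expected time to death from state 1, which should come out close to the 7.953 reported by &lt;code&gt;efpt.msm()&lt;/code&gt; earlier.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Covariate profile of interest
z &amp;lt;- c(years = 5, b_age = 60, dage = 20)
# Baseline intensities (rounded), in the order q12, q14, q23, q24, q34
baseline &amp;lt;- c(0.030957, 0.001793, 0.264310, 0.131323, 0.434548)
# Log hazard ratios (rounded): one row per covariate, same column order
log_hr &amp;lt;- rbind(years = c(0.1064,    0.08933,   0,         0,         0),
                b_age = c(0.0009645, 0.05152,   0,         0,         0),
                dage  = c(0.02771,   0.020575, -0.006783, -0.024625, -0.009439))
# q_rs(z) = q_rs(0) * exp(sum over covariates of beta * z)
q &amp;lt;- baseline * exp(as.vector(z %*% log_hr))
# Transient block of the intensity matrix for this profile
Q_T &amp;lt;- rbind(c(-(q[1] + q[2]), q[1],           0),
             c(0,              -(q[3] + q[4]), q[3]),
             c(0,              0,              -q[5]))
# Expected time to death from each living state
expected_time &amp;lt;- solve(-Q_T, rep(1, 3))
round(expected_time[1], 2)&lt;/code&gt;&lt;/pre&gt;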
&lt;/div&gt;
&lt;div id=&#34;exploring-transition-probabilities-and-intensities&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Exploring Transition Probabilities and Intensities&lt;/h3&gt;
&lt;p&gt;It is also possible to look at the state transition matrix at different times and see how these probabilities change over time. Here we compute the transition matrix at 1 year.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pmatrix.msm(model_1, t = 1, covariates = &amp;quot;mean&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2  State 3 State 4
## State 1  0.8669 0.08101 0.008493 0.04357
## State 2  0.0000 0.75300 0.160340 0.08666
## State 3  0.0000 0.00000 0.716903 0.28310
## State 4  0.0000 0.00000 0.000000 1.00000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and at 5 years.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pmatrix.msm(model_1, t = 5, covariates = &amp;quot;mean&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2 State 3 State 4
## State 1  0.4897  0.1761 0.07871  0.2556
## State 2  0.0000  0.2421 0.23419  0.5237
## State 3  0.0000  0.0000 0.18937  0.8106
## State 4  0.0000  0.0000 0.00000  1.0000&lt;/code&gt;&lt;/pre&gt;
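&lt;p&gt;With the covariates held fixed, these matrices are the matrix exponential P(t) = exp(tQ) of the fitted intensity matrix. Base R has no matrix exponential, so the sketch below uses an eigendecomposition, which works here because the diagonal entries of this triangular Q are distinct; &lt;code&gt;msm&lt;/code&gt; exports &lt;code&gt;MatrixExp()&lt;/code&gt; for the general case. The helper name &lt;code&gt;p_matrix&lt;/code&gt; is hypothetical, and the intensities are the rounded values from &lt;code&gt;qmatrix.msm(model_1, covariates = &amp;quot;mean&amp;quot;)&lt;/code&gt;, so results should match the output above only up to rounding.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Intensity matrix at mean covariate values (rounded)
Q &amp;lt;- rbind(c(-0.14281,  0.10019,  0.00000, 0.04262),
           c( 0.00000, -0.28369,  0.21821, 0.06548),
           c( 0.00000,  0.00000, -0.33281, 0.33281),
           c( 0.00000,  0.00000,  0.00000, 0.00000))
# exp(tQ) via eigendecomposition; valid when Q is diagonalizable
p_matrix &amp;lt;- function(Q, t) {
  e &amp;lt;- eigen(Q)
  P &amp;lt;- e$vectors %*% diag(exp(e$values * t)) %*% solve(e$vectors)
  Re(P)
}
P1 &amp;lt;- p_matrix(Q, 1)
P5 &amp;lt;- p_matrix(Q, 5)
round(P1, 4)
# Chapman-Kolmogorov check: five one-year steps equal one five-year step
all.equal(P5, P1 %*% P1 %*% P1 %*% P1 %*% P1)&lt;/code&gt;&lt;/pre&gt;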
&lt;p&gt;Additionally, it is possible to examine the effect of covariates on transition probabilities. Here are the 5 year transition probabilities for a patient with a baseline age of 35.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pmatrix.msm(model_1, t = 5, covariates = list(years = 5, b_age = 35, dage = 20))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2 State 3 State 4
## State 1  0.5474  0.1674 0.07575  0.2095
## State 2  0.0000  0.2112 0.21622  0.5726
## State 3  0.0000  0.0000 0.16547  0.8345
## State 4  0.0000  0.0000 0.00000  1.0000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and those who had the procedure at age 60.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pmatrix.msm(model_1, t = 5, covariates = list(years = 5, b_age = 60, dage = 20))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         State 1 State 2 State 3 State 4
## State 1  0.3863  0.1409 0.06784  0.4050
## State 2  0.0000  0.2112 0.21622  0.5726
## State 3  0.0000  0.0000 0.16547  0.8345
## State 4  0.0000  0.0000 0.00000  1.0000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that the transitions from CAV states are unaffected.&lt;/p&gt;
&lt;p&gt;To summarize: Continuous Time Markov Chains provide a natural framework for working with multi-state survival models. The &lt;code&gt;msm&lt;/code&gt; package is sufficiently sophisticated to permit modeling clinical processes with a level of fidelity that may provide insight into clinically observed disease progression. The software is relatively easy to use and there is plenty of documentation to help you get started.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;learning-more-about-multi-state-survival-models&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Learning More About Multi-State Survival Models&lt;/h3&gt;
&lt;p&gt;To dive deeper into multi-state survival models, I am sure you will find Ardo van den Hout’s &lt;a href=&#34;https://www.routledge.com/Multi-State-Survival-Models-for-Interval-Censored-Data/Hout/p/book/9780367570569&#34;&gt;Multi-State Survival Models for Interval-Censored Data&lt;/a&gt; extraordinarily helpful. There are many good textbooks about the basics of Continuous Time Markov Chains. I recommend J. R. Norris’s &lt;a href=&#34;https://www.cambridge.org/core/books/markov-chains/A3F966B10633A32C8F06F37158031739&#34;&gt;Markov Chains&lt;/a&gt;, which is still modestly priced. There are also many expositions freely available on the internet, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;David F. Anderson - &lt;a href=&#34;https://u.math.biu.ac.il/~amirgi/CTMCnotes.pdf&#34;&gt;Chapter 6: Continuous Time Markov Chains&lt;/a&gt; from &lt;a href=&#34;https://u.math.biu.ac.il/~amirgi/SBA.pdf&#34;&gt;Lecture Notes on Stochastic Processes with Applications in Biology&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Miranda Holmes-Cerfon - &lt;a href=&#34;https://cims.nyu.edu/~holmes/teaching/asa19/handout_Lecture4_2019.pdf&#34;&gt;Lecture 4: Continuous-time Markov Chains&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Søren Feodor Nielsen - &lt;a href=&#34;http://web.math.ku.dk/~susanne/kursusstokproc/ContinuousTime.pdf&#34;&gt;Continuous-time homogeneous Markov chains&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Karl Sigman - &lt;a href=&#34;http://www.columbia.edu/~ks20/stochastic-I/stochastic-I-CTMC.pdf&#34;&gt;Continuous-Time Markov Chains&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2023/04/19/multistate-models-for-medical-applications/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Fake Survival Data for the Disease Progression Model</title>
      <link>https://rviews.rstudio.com/2020/10/08/fake-data-for-the-illness-death-model/</link>
      <pubDate>Thu, 08 Oct 2020 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2020/10/08/fake-data-for-the-illness-death-model/</guid>
      <description>
        


&lt;p&gt;In a &lt;a href=&#34;https://rviews.rstudio.com/2020/09/09/fake-data-with-r/&#34;&gt;previous post&lt;/a&gt;, I showed some examples of simulating fake data from a few packages that are useful for common simulation tasks and indicated that I would be following up with a look at simulating survival data. A tremendous amount of work in survival analysis has been done in R&lt;sup&gt;1&lt;/sup&gt; and it will take some time to explore what’s out there. In this first post, I am just going to jump into the ocean of ideas and see if I can fish out an interesting example.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://link.springer.com/article/10.2165/00019053-199813040-00003&#34;&gt;Markov models&lt;/a&gt; are commonly used in Health Care Economics to model the progression of a disease, and the efficacy and potential benefits of various treatments. One popular approach is to consider cohorts of patients who move through the three states of being &lt;em&gt;healthy&lt;/em&gt; (no disease progression), &lt;em&gt;diseased&lt;/em&gt; (some level of disease progression) and &lt;em&gt;dead&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The following figure illustrates the process. (I will explain the labeling on the arrows below).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/post/2020-10-02-fake-data-for-the-illness-death-model/index_files/figure-html/unnamed-chunk-2-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;These kinds of models are commonly called multi-state models in the survival literature. In the simplest case, disease progression might be modeled as a discrete time &lt;a href=&#34;https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf&#34;&gt;Markov chain&lt;/a&gt; where patients move from state to state according to a matrix of transition probabilities which govern how the process develops at discrete time intervals. However, for many studies, limiting transitions to discrete, uniform intervals is a little too simplistic. For example, in most cases, the exact time when a patient “progresses” from healthy to diseased is not observed. To account for this, modelers frequently consider &lt;a href=&#34;http://u.math.biu.ac.il/~amirgi/CTMCnotes.pdf&#34;&gt;Continuous Time Markov Chains&lt;/a&gt;, which allow modeling the distribution of time spent in each state as well as the state-to-state transitions.&lt;/p&gt;
&lt;p&gt;One way to define a continuous time Markov chain is as a continuous time process that takes values in a discrete state space and obeys the Markov property where the transition to a future state depends only on the present and not on the past.&lt;/p&gt;
&lt;p&gt;A continuous-time stochastic process &lt;span class=&#34;math inline&#34;&gt;\(X_{t}, t \geq 0\)&lt;/span&gt; with discrete state space S is a continuous-time Markov chain if:
&lt;span class=&#34;math display&#34;&gt;\[P(X_{t+s}=j \:|\: X_{s}=i, X_u = x_u, 0 \leq u &amp;lt; s) = P(X_{t+s}=j \: | \: X_{s}=i)\]&lt;/span&gt; &lt;span class=&#34;math display&#34;&gt;\[ \forall s,t \geq 0 \:, i, j, x_{u} \in S, \: 0 \leq u &amp;lt; s \]&lt;/span&gt;
If the process does not depend on the particular value of &lt;em&gt;s&lt;/em&gt; (the time when the process is in state &lt;em&gt;i&lt;/em&gt;) then it is said to be &lt;em&gt;time homogeneous&lt;/em&gt;. For a very readable account of how the definition above, along with the assumption of time homogeneity, ensures both the Markov property and that the time the process spends in the various states will be exponentially distributed, see Chapter 7 of &lt;a href=&#34;https://www.amazon.com/Introduction-Stochastic-Processes-Robert-Dobrow/dp/1118740653/ref=sr_1_1?dchild=1&amp;amp;keywords=stochastic+processes+in+r&amp;amp;qid=1601771046&amp;amp;s=books&amp;amp;sr=1-1&#34;&gt;Dobrow (2016)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;One more bit of theory before we get to the example: unlike discrete time Markov chains, the development of a continuous time process is not driven by a transition matrix. Instead, state transition probabilities are generated by a matrix, &lt;em&gt;Q&lt;/em&gt;, that gives the instantaneous rates of going from one state to another. Transition probabilities for any time, &lt;em&gt;t&lt;/em&gt;, are then calculated from &lt;em&gt;Q&lt;/em&gt; using &lt;a href=&#34;https://cran.r-project.org/web/packages/expm/index.html&#34;&gt;matrix exponentiation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[P(t)=e^{tQ}\]&lt;/span&gt;
The following is the &lt;em&gt;Q&lt;/em&gt; matrix for our three state disease progression model. Notice that this is not a stochastic matrix: the rows sum to 0, not to 1. The basic idea is that each diagonal entry is the negative of the total rate of flow out of state &lt;em&gt;i&lt;/em&gt;, which balances the off-diagonal rates in that row. The final row is all zeroes because death is an &lt;em&gt;absorbing state&lt;/em&gt;, and the first entry of the second row is zero because there are no transitions back to &lt;em&gt;healthy&lt;/em&gt; from &lt;em&gt;diseased&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[Q = \begin{pmatrix}
        \ -(q_{12} + q_{13}) &amp;amp; q_{12} &amp;amp; q_{13} \\ 
        \ 0 &amp;amp; -q_{23} &amp;amp; q_{23} \\
        \  0 &amp;amp; 0 &amp;amp; 0          
     \end{pmatrix} \]&lt;/span&gt;&lt;/p&gt;
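&lt;p&gt;As a small illustration of how &lt;em&gt;Q&lt;/em&gt; generates transition probabilities, the sketch below builds a three state &lt;em&gt;Q&lt;/em&gt; matrix of this shape and computes &lt;span class=&#34;math inline&#34;&gt;\(P(t) = e^{tQ}\)&lt;/span&gt; with &lt;code&gt;expm()&lt;/code&gt; from the &lt;code&gt;expm&lt;/code&gt; package linked above. The rates &lt;code&gt;q12&lt;/code&gt;, &lt;code&gt;q13&lt;/code&gt;, &lt;code&gt;q23&lt;/code&gt; and the time &lt;code&gt;tt&lt;/code&gt; are assumed values chosen only for illustration; they are not estimates from any model in this post.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(expm)  # provides expm() for the matrix exponential

# Illustrative rates only (assumed values, not estimates from a fit)
q12 &amp;lt;- 0.10
q13 &amp;lt;- 0.05
q23 &amp;lt;- 0.08

Q &amp;lt;- matrix(c(-(q12 + q13), q12,  q13,
              0,            -q23, q23,
              0,            0,    0),
            nrow = 3, byrow = TRUE)

tt &amp;lt;- 2                 # an arbitrary time point
P_t &amp;lt;- expm(tt * Q)     # P(t) = exp(tQ)

rowSums(Q)     # each row of Q sums to 0
rowSums(P_t)   # each row of P(t) sums to 1

# The healthy state is left at total rate q12 + q13, so the probability
# of still being healthy at time tt should be exp(-(q12 + q13) * tt)
all.equal(P_t[1, 1], exp(-(q12 + q13) * tt))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The last line is a sanity check on the exponential holding time in the first state rather than anything specific to the example that follows.&lt;/p&gt;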
&lt;p&gt;Armed with a little bit of theory, let’s see how continuous time Markov chains can be used both to simulate survival data and also to fit a model to the fake data.&lt;/p&gt;
&lt;div id=&#34;generating-simulated-survival-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Generating Simulated Survival Data&lt;/h3&gt;
&lt;p&gt;The following is essentially the example on page 12 of the pdf for the &lt;a href=&#34;https://CRAN.R-project.org/package=genSurv&#34;&gt;genSurv&lt;/a&gt; package&lt;sup&gt;2&lt;/sup&gt; listed in the CRAN Survival Task View. This shows how to use the &lt;code&gt;genTHMM()&lt;/code&gt; function to simulate data from a time homogeneous, continuous time Markov Chain. In the code below, the &lt;code&gt;model.cens&lt;/code&gt; parameter indicates that censoring is accomplished via a uniform distribution over the interval [0, &lt;code&gt;cens.par&lt;/code&gt;]. A covariate is generated by a uniform distribution over the interval [0, &lt;code&gt;covar&lt;/code&gt;] and enters the model through the equation:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[q_{i,j} = \lambda_{i,j} \exp(\beta_{i,j} \cdot v)\]&lt;/span&gt;
where &lt;span class=&#34;math inline&#34;&gt;\(\lambda_{i,j}\)&lt;/span&gt; is the base rate, parameter &lt;code&gt;rate&lt;/code&gt; for the &lt;code&gt;genTHMM()&lt;/code&gt; function and &lt;span class=&#34;math inline&#34;&gt;\(\beta_{i,j}\)&lt;/span&gt; are the regression coefficients, &lt;code&gt;beta&lt;/code&gt; in the function. In the code below, we use the &lt;code&gt;covariate&lt;/code&gt; output to create a &lt;code&gt;sex&lt;/code&gt; covariate.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(genSurv)  # genTHMM()
library(dplyr)    # %&amp;gt;%, mutate(), if_else(), mutate_if()
set.seed(1234)
thmmdata &amp;lt;- genTHMM( n=100, model.cens=&amp;quot;uniform&amp;quot;, # censorship model
                     cens.par = 20, 
                     beta = c(0.01,0.08,0.05),
                     covar = 1, 
                     rate = c(0.1,0.05,0.08) )
                     
df &amp;lt;- thmmdata %&amp;gt;% mutate(sex = if_else(covariate &amp;lt;= .5,0,1 ))
df &amp;lt;- df %&amp;gt;% mutate_if(is.numeric, round, 3)
head(df,11)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    PTNUM   time state covariate sex
## 1      1  0.000     1     0.114   0
## 2      1  2.183     2     0.114   0
## 3      1  2.265     3     0.114   0
## 4      2  0.000     1     0.233   0
## 5      2  0.284     2     0.233   0
## 6      2  1.396     3     0.233   0
## 7      3  0.000     1     0.283   0
## 8      3  8.600     2     0.283   0
## 9      3 18.469     2     0.283   0
## 10     4  0.000     1     0.267   0
## 11     4  3.734     1     0.267   0&lt;/code&gt;&lt;/pre&gt;
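&lt;p&gt;To make the covariate equation concrete, here is a minimal sketch that evaluates &lt;span class=&#34;math inline&#34;&gt;\(q_{i,j} = \lambda_{i,j} \exp(\beta_{i,j} \cdot v)\)&lt;/span&gt; at the two ends of the covariate range, using the &lt;code&gt;rate&lt;/code&gt; and &lt;code&gt;beta&lt;/code&gt; values passed to &lt;code&gt;genTHMM()&lt;/code&gt; above. The helper &lt;code&gt;q_at()&lt;/code&gt; is hypothetical, and I am assuming that the three entries of each vector correspond to the 1 to 2, 1 to 3 and 2 to 3 transitions in that order; check the &lt;code&gt;genSurv&lt;/code&gt; documentation before relying on that ordering.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rate &amp;lt;- c(0.1, 0.05, 0.08)   # base rates lambda, as passed to genTHMM()
beta &amp;lt;- c(0.01, 0.08, 0.05)  # regression coefficients, as passed to genTHMM()

# Assumed ordering of the entries (not confirmed here): 1 to 2, 1 to 3, 2 to 3
names(rate) &amp;lt;- c(&amp;quot;q12&amp;quot;, &amp;quot;q13&amp;quot;, &amp;quot;q23&amp;quot;)
names(beta) &amp;lt;- names(rate)

# Hypothetical helper: q = lambda * exp(beta * v) for a covariate value v
q_at &amp;lt;- function(v) rate * exp(beta * v)

q_at(0)             # v = 0 recovers the base rates
q_at(1)             # v = 1, the upper end of the covariate range
q_at(1) / q_at(0)   # the ratio equals exp(beta)&lt;/code&gt;&lt;/pre&gt;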
&lt;p&gt;For more on the theory underlying the &lt;code&gt;genSurv&lt;/code&gt; package, have a look at the paper &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2692556/&#34;&gt;Meira-Machado et al. (2009)&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;fitting-the-survival-model&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Fitting the Survival Model&lt;/h3&gt;
&lt;p&gt;The code in this section fits a continuous time, Markov chain survival model to the data generated above using the &lt;a href=&#34;https://cran.r-project.org/package=msm&#34;&gt;&lt;code&gt;msm&lt;/code&gt;&lt;/a&gt; package&lt;sup&gt;3&lt;/sup&gt; and indicates how one might go about examining the output.&lt;/p&gt;
&lt;p&gt;First, let’s look at the transitions between states that occurred for the simulated patients.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(msm)  # statetable.msm(), msm(), qmatrix.msm(), pmatrix.msm(), sojourn.msm()
st &amp;lt;- statetable.msm(state, PTNUM, data = thmmdata)
st&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##     to
## from  1  2  3
##    1 28 50 22
##    2  0 25 25&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We see, for example, that 50 patients progressed to the diseased state and 22 patients went directly from being &lt;em&gt;healthy&lt;/em&gt; to &lt;em&gt;dead&lt;/em&gt;. 25 patients who progressed to disease subsequently died.&lt;/p&gt;
&lt;p&gt;Next, we set up the Q matrix of instantaneous transition rates described above,&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Q &amp;lt;- matrix(c(0, 1, 1, 0, 0 , 1, 0, 0 , 0), nrow = 3, byrow = TRUE)
rownames(Q) &amp;lt;- c(&amp;quot;S1&amp;quot;, &amp;quot;S2&amp;quot;, &amp;quot;S3&amp;quot;)
colnames(Q)  &amp;lt;- c(&amp;quot;S1&amp;quot;, &amp;quot;S2&amp;quot;, &amp;quot;S3&amp;quot;)
Q&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    S1 S2 S3
## S1  0  1  1
## S2  0  0  1
## S3  0  0  0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;fit the model, and plot the survival curves for states &lt;em&gt;S1&lt;/em&gt; and &lt;em&gt;S2&lt;/em&gt; using the “old school” pre-built plot method.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;- msm( state ~ time, subject=PTNUM, data = df, 
            qmatrix = Q, gen.inits = TRUE, covariates = ~ sex)
plot(fit)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2020-10-02-fake-data-for-the-illness-death-model/index_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The default print method for the model fit shows the transition intensities with the hazard ratio of the covariate.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Call:
## msm(formula = state ~ time, subject = PTNUM, data = df, qmatrix = Q,     gen.inits = TRUE, covariates = ~sex)
## 
## Maximum likelihood estimates
## Baselines are with covariates set to their means
## 
## Transition intensities with hazard ratios for each covariate
##         Baseline                    sex                   
## S1 - S1 -0.28724 (-0.37477,-0.2202)                       
## S1 - S2  0.24020 ( 0.17891, 0.3225) 0.8992 (0.49769,1.625)
## S1 - S3  0.04703 ( 0.02057, 0.1076) 0.2106 (0.04132,1.073)
## S2 - S2 -0.09756 (-0.14312,-0.0665)                       
## S2 - S3  0.09756 ( 0.06650, 0.1431) 0.6558 (0.30473,1.411)
## 
## -2 * log-likelihood:  413.3 
## [Note, to obtain old print format, use &amp;quot;printold.msm&amp;quot;]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can get the transition rates for sex = 0,&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;qmatrix.msm(fit, covariates = list(sex = 0))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    S1                          S2                         
## S1 -0.3583 (-0.51092,-0.25130)  0.2537 ( 0.16167, 0.39804)
## S2 0                           -0.1211 (-0.20856,-0.07037)
## S3 0                           0                          
##    S3                         
## S1  0.1046 ( 0.04986, 0.21962)
## S2  0.1211 ( 0.07037, 0.20856)
## S3 0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and for sex = 1.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;qmatrix.msm(fit, covariates = list(sex = 1))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    S1                            S2                           
## S1 -0.25013 (-0.357720,-0.17490)  0.22810 ( 0.155472, 0.33464)
## S2 0                             -0.07945 (-0.136411,-0.04627)
## S3 0                             0                            
##    S3                           
## S1  0.02204 ( 0.005169, 0.09396)
## S2  0.07945 ( 0.046269, 0.13641)
## S3 0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and &lt;code&gt;msm&lt;/code&gt; also allows us to calculate the transition function &lt;span class=&#34;math inline&#34;&gt;\(P(t)\)&lt;/span&gt; for arbitrary times.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pmatrix.msm(fit, t= 13.3)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         S1     S2     S3
## S1 0.02192 0.3182 0.6599
## S2 0.00000 0.2732 0.7268
## S3 0.00000 0.0000 1.0000&lt;/code&gt;&lt;/pre&gt;
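&lt;p&gt;The hazard ratios in the printed fit tie the two &lt;code&gt;qmatrix.msm()&lt;/code&gt; outputs together: with a log-linear covariate effect on the intensities, the rate at sex = 1 should equal the rate at sex = 0 multiplied by the hazard ratio for &lt;code&gt;sex&lt;/code&gt;. The following is a quick sketch of that check. The numbers are typed in by hand from the rounded point estimates printed above, and the vector names are mine, so agreement is only expected to about three decimal places.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Point estimates copied from the two qmatrix.msm() outputs above
q_sex0 &amp;lt;- c(S1_S2 = 0.2537,  S1_S3 = 0.1046,  S2_S3 = 0.1211)
q_sex1 &amp;lt;- c(S1_S2 = 0.22810, S1_S3 = 0.02204, S2_S3 = 0.07945)

# Hazard ratios for sex copied from the printed model fit
hr_printed &amp;lt;- c(S1_S2 = 0.8992, S1_S3 = 0.2106, S2_S3 = 0.6558)

round(q_sex1 / q_sex0, 4)   # ratio of rates, sex = 1 versus sex = 0
hr_printed                  # should agree up to rounding&lt;/code&gt;&lt;/pre&gt;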
&lt;p&gt;Finally, we look at the mean sojourn times for patients in the &lt;em&gt;healthy&lt;/em&gt; and &lt;em&gt;diseased&lt;/em&gt; states. Normally, for a process that can transition in and out of states, this means the average time spent in the state each time it is visited. In our model, patients only go forward through the chain; there is no getting better, so the sojourn time for S2 is essentially the average amount of time patients spend in the &lt;em&gt;diseased&lt;/em&gt; state.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sojourn.msm(fit)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    estimates     SE     L      U
## S1     3.481 0.4725 2.668  4.542
## S2    10.251 2.0045 6.987 15.038&lt;/code&gt;&lt;/pre&gt;
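&lt;p&gt;For a continuous time Markov chain, the time spent in state &lt;em&gt;i&lt;/em&gt; on each visit is exponentially distributed with rate &lt;span class=&#34;math inline&#34;&gt;\(-q_{ii}\)&lt;/span&gt;, so the mean sojourn time is &lt;span class=&#34;math inline&#34;&gt;\(-1/q_{ii}\)&lt;/span&gt;. As a rough check, the sketch below applies that formula to the baseline diagonal intensities from the printed fit. The values are typed in by hand from rounded output, so the result should match the estimates above only up to rounding.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Baseline diagonal intensities copied from the printed model fit
q_diag &amp;lt;- c(S1 = -0.28724, S2 = -0.09756)

# Mean of an exponential holding time with rate -q_ii
mean_sojourn &amp;lt;- -1 / q_diag
round(mean_sojourn, 3)&lt;/code&gt;&lt;/pre&gt;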
&lt;/div&gt;
&lt;div id=&#34;a-few-remarks&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;A Few Remarks&lt;/h3&gt;
&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt;The work done in R on survival analysis, and partially embodied in the two hundred thirty-three packages listed in the CRAN &lt;a href=&#34;https://cran.r-project.org/web/views/Survival.html&#34;&gt;Survival Analysis Task View&lt;/a&gt;, constitutes a fundamental contribution to statistics. There is enough material here for a lifetime of study. Even confining oneself to a tour of the eleven packages listed in the simulation section would be a significant undertaking.&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;2&lt;/sup&gt; &lt;code&gt;genSurv&lt;/code&gt; is a pretty bare bones package having just seven functions and little explanatory text. If it were not listed on the CRAN Task View, it would have been easy to pass by. Nevertheless, I have only shown a small portion of what it can do.&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;code&gt;msm&lt;/code&gt; is an example of why R is a treasury of statistical knowledge. Not only does the package offer an impressive array of capabilities for analyzing multi-state Markov models in continuous time, the basic documentation, the package’s pdf, includes references to quite a few of the fundamental papers.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2020/10/08/fake-data-for-the-illness-death-model/&#39;;&lt;/script&gt;
      </description>
    </item>
    
  </channel>
</rss>
