Health Care Economics on R Views
https://rviews.rstudio.com/tags/health-care-economics/
Fake Survival Data for the Disease Progression Model
https://rviews.rstudio.com/2020/10/08/fake-data-for-the-illness-death-model/
Thu, 08 Oct 2020 00:00:00 +0000
<p>In a <a href="https://rviews.rstudio.com/2020/09/09/fake-data-with-r/">previous post</a>, I showed some examples of simulating fake data with a few packages that are useful for common simulation tasks and indicated that I would follow up with a look at simulating survival data. A tremendous amount of work in survival analysis has been done in R<sup>1</sup>, and it will take some time to explore what’s out there. In this first post, I am just going to jump into the ocean of ideas and see if I can fish out an interesting example.</p>
<p><a href="https://link.springer.com/article/10.2165/00019053-199813040-00003">Markov models</a> are commonly used in Health Care Economics to model the progression of a disease, and the efficacy and potential benefits of various treatments. One popular approach is to consider cohorts of patients who move through the three states of being <em>healthy</em> (no disease progression), <em>diseased</em> (some level of disease progression) and <em>dead</em>.</p>
<p>The following figure illustrates the process. (I will explain the labeling on the arrows below).</p>
<p><img src="/post/2020-10-02-fake-data-for-the-illness-death-model/index_files/figure-html/unnamed-chunk-2-1.png" width="672" /></p>
<p>These kinds of models are commonly called multi-state models in the survival literature. In the simplest case, disease progression might be modeled as a discrete time <a href="https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf">Markov chain</a> where patients move from state to state according to a matrix of transition probabilities that governs how the process develops at discrete time intervals. However, for many studies, limiting transitions to discrete, uniform intervals is a little too simplistic. For example, in most cases, the exact time when a patient “progresses” from healthy to diseased is not observed. To account for this, modelers frequently consider <a href="http://u.math.biu.ac.il/~amirgi/CTMCnotes.pdf">Continuous Time Markov Chains</a>, which allow modeling the distribution of time spent in each state as well as the state-to-state transitions.</p>
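<p>As a minimal sketch of the discrete time case, the following base R code pushes a cohort through a three-state chain one cycle at a time. The transition probabilities in <code>P</code>, the state labels and the number of cycles are illustrative assumptions, not values from any study.</p>
<pre class="r"><code># Hypothetical one-cycle transition probabilities (each row sums to 1)
states <- c("healthy", "diseased", "dead")
P <- matrix(c(0.85, 0.10, 0.05,
              0.00, 0.90, 0.10,
              0.00, 0.00, 1.00),
            nrow = 3, byrow = TRUE,
            dimnames = list(states, states))

# Everyone starts out healthy
n_cycles <- 5
trace <- matrix(NA_real_, nrow = n_cycles + 1, ncol = 3,
                dimnames = list(as.character(0:n_cycles), states))
trace[1, ] <- c(1, 0, 0)

# Advance the cohort distribution one cycle at a time
for (k in seq_len(n_cycles)) {
  trace[k + 1, ] <- trace[k, ] %*% P
}
round(trace, 3)</code></pre>
<p>Each row of <code>trace</code> is the distribution of the cohort over the three states after that many cycles, so every row sums to 1.</p>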
<p>One way to define a continuous time Markov chain is as a continuous time process that takes values in a discrete state space and obeys the Markov property where the transition to a future state depends only on the present and not on the past.</p>
<p>A continuous-time stochastic process <span class="math inline">\(X_{t}, t \geq 0\)</span> with discrete state space S is a continuous-time Markov chain if:
<span class="math display">\[P(X_{t+s}=j \:|\: X_{s}=i, X_u = x_u, 0 \leq u < s) = P(X_{t+s}=j \: | \: X_{s}=i)\]</span> <span class="math display">\[ \forall s,t \geq 0 \:, i, j, x_{u} \in S, \: 0 \leq u < s \]</span>
If the process does not depend on the particular value of <em>s</em> (the time when the process is in state <em>i</em>) then it is said to be <em>time homogeneous</em>. For a very readable account of how the definition above, along with the assumption of time homogeneity, ensures both the Markov property and that the time the process spends in the various states will be exponentially distributed, see Chapter 7 of <a href="https://www.amazon.com/Introduction-Stochastic-Processes-Robert-Dobrow/dp/1118740653/ref=sr_1_1?dchild=1&keywords=stochastic+processes+in+r&qid=1601771046&s=books&sr=1-1">Dobrow (2016)</a>.</p>
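<p>To make the exponential holding times concrete, here is a small base R sketch that simulates one patient from a time homogeneous illness-death chain. The rates <code>q12</code>, <code>q13</code> and <code>q23</code> (the instantaneous rates of moving between states, discussed next) and the helper <code>simulate_path()</code> are hypothetical and chosen only for illustration.</p>
<pre class="r"><code>set.seed(42)

# Hypothetical rates: healthy to diseased, healthy to dead, diseased to dead
q12 <- 0.10
q13 <- 0.05
q23 <- 0.08

# Simulate one path that starts in state 1 (healthy) and ends in state 3 (dead)
simulate_path <- function() {
  # Time spent in state 1 is exponential with rate equal to the total exit rate
  t1 <- rexp(1, rate = q12 + q13)
  # The next state is chosen in proportion to the competing rates
  nxt <- sample(c(2, 3), size = 1, prob = c(q12, q13))
  if (nxt == 3) {
    return(data.frame(time = c(0, t1), state = c(1, 3)))
  }
  # Time spent in state 2 is exponential with rate q23
  t2 <- rexp(1, rate = q23)
  data.frame(time = c(0, t1, t1 + t2), state = c(1, 2, 3))
}

path <- simulate_path()
path

# The average of many holding times in state 1 should be near 1 / (q12 + q13)
mean(rexp(10000, rate = q12 + q13))</code></pre>
<p>Unlike the censored data generated later in the post, this sketch follows every patient all the way to death.</p>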
<p>One more bit of theory before we get to the example: unlike discrete time Markov chains, the development of a continuous time process is not driven by a transition matrix. Instead, state transition probabilities are generated by a matrix, <em>Q</em>, that gives the instantaneous rates of going from one state to another. Transition probabilities for any time, <em>t</em>, are then calculated from <em>Q</em> using <a href="https://cran.r-project.org/web/packages/expm/index.html">matrix exponentiation</a>.</p>
<p><span class="math display">\[P(t)=e^{tQ}\]</span>
The following is the <em>Q</em> matrix for our three state disease progression model. Notice that this is not a stochastic matrix: the rows sum to 0, not to 1. The basic idea is that each diagonal entry is the negative of the total rate of leaving that state, so it balances the off-diagonal rates in its row. The final row is all zeroes because death is an <em>absorbing state</em>, and the first entry of the second row is zero because there are no transitions back to <em>healthy</em> from <em>diseased</em>.</p>
<p><span class="math display">\[Q = \begin{pmatrix}
-(q_{12} + q_{13}) & q_{12} & q_{13} \\
0 & -q_{23} & q_{23} \\
0 & 0 & 0
\end{pmatrix} \]</span></p>
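<p>As a sketch of how transition probabilities are obtained from <em>Q</em> in practice, the code below builds a <em>Q</em> matrix of this form and exponentiates it with <code>expm()</code> from the <code>expm</code> package linked above. The three rates are hypothetical values used only for illustration.</p>
<pre class="r"><code>library(expm)  # provides expm() for the matrix exponential

# Hypothetical rates, for illustration only
q12 <- 0.10
q13 <- 0.05
q23 <- 0.08

Q <- matrix(c(-(q12 + q13),  q12, q13,
               0,           -q23, q23,
               0,            0,   0),
            nrow = 3, byrow = TRUE)

# Transition probabilities over 5 time units
P5 <- expm(Q * 5)
round(P5, 4)

# Rows of the probability matrix sum to 1, unlike the rows of Q
rowSums(P5)

# Chapman-Kolmogorov check: P(2) %*% P(3) should agree with P(5)
all.equal(expm(Q * 2) %*% expm(Q * 3), P5)</code></pre>
<p>Because death is absorbing, the last row of the result stays at (0, 0, 1) for every <em>t</em>.</p>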
<p>Armed with a little bit of theory, let’s see how continuous time Markov chains can be used both to simulate survival data and also to fit a model to the fake data.</p>
<div id="generating-simulated-survival-data" class="section level3">
<h3>Generating Simulated Survival Data</h3>
<p>The following is essentially the example on page 12 of the pdf for the <a href="https://CRAN.R-project.org/package=genSurv">genSurv</a> package<sup>2</sup> listed in the CRAN Survival Task View. This shows how to use the <code>genTHMM()</code> function to simulate data from a time homogeneous, continuous time Markov Chain. In the code below, the <code>model.cens</code> parameter indicates that censoring is accomplished via a uniform distribution over the interval [0, <code>cens.par</code>]. A covariate is generated by a uniform distribution over the interval [0, <code>covar</code>] and enters the model through the equation:</p>
<p><span class="math display">\[q_{i,j} = \lambda_{i,j} \exp(\beta_{i,j} \cdot v)\]</span>
where <span class="math inline">\(v\)</span> is the value of the covariate, <span class="math inline">\(\lambda_{i,j}\)</span> is the base rate (the <code>rate</code> parameter of the <code>genTHMM()</code> function) and <span class="math inline">\(\beta_{i,j}\)</span> are the regression coefficients (<code>beta</code> in the function). In the code below, we use the <code>covariate</code> output to create a <code>sex</code> covariate.</p>
<pre class="r"><code>library(genSurv)  # genTHMM()
library(dplyr)    # %>%, mutate(), if_else(), mutate_if()

set.seed(1234)
thmmdata <- genTHMM(n = 100,
                    model.cens = "uniform",  # censoring model
                    cens.par = 20,
                    beta = c(0.01, 0.08, 0.05),
                    covar = 1,
                    rate = c(0.1, 0.05, 0.08))
# Dichotomize the uniform covariate into a 0/1 sex indicator
df <- thmmdata %>% mutate(sex = if_else(covariate <= .5, 0, 1))
df <- df %>% mutate_if(is.numeric, round, 3)
head(df, 11)</code></pre>
<pre><code>## PTNUM time state covariate sex
## 1 1 0.000 1 0.114 0
## 2 1 2.183 2 0.114 0
## 3 1 2.265 3 0.114 0
## 4 2 0.000 1 0.233 0
## 5 2 0.284 2 0.233 0
## 6 2 1.396 3 0.233 0
## 7 3 0.000 1 0.283 0
## 8 3 8.600 2 0.283 0
## 9 3 18.469 2 0.283 0
## 10 4 0.000 1 0.267 0
## 11 4 3.734 1 0.267 0</code></pre>
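<p>The rate equation above is easy to check by hand. The sketch below plugs the covariate value of the first simulated patient (0.114 in the output above) into the equation. It assumes that the three entries of <code>rate</code> and <code>beta</code> refer to the transitions 1 to 2, 1 to 3 and 2 to 3, in that order; the names <code>q12</code>, <code>q13</code> and <code>q23</code> are labels added for illustration.</p>
<pre class="r"><code># Base rates and regression coefficients passed to genTHMM() above
# (the mapping of positions to transitions is an assumption)
rate <- c(q12 = 0.1, q13 = 0.05, q23 = 0.08)
beta <- c(q12 = 0.01, q13 = 0.08, q23 = 0.05)

# Covariate value of patient 1 in the simulated data
v <- 0.114

# Covariate-adjusted transition rates: lambda * exp(beta * v)
q <- rate * exp(beta * v)
round(q, 4)</code></pre>
<p>With coefficients this small and a covariate between 0 and 1, the adjusted rates stay close to the base rates.</p>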
<p>For more on the theory underlying the <code>genSurv</code> package, have a look at the paper <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2692556/">Meira-Machado et al. (2009)</a>.</p>
</div>
<div id="fitting-the-survival-model" class="section level3">
<h3>Fitting the Survival Model</h3>
<p>The code in this section fits a continuous time, Markov chain survival model to the data generated above using the <a href="https://cran.r-project.org/package=msm"><code>msm</code></a> package<sup>3</sup> and indicates how one might go about examining the output.</p>
<p>First, let’s look at the transitions between states that occurred for the simulated patients.</p>
<pre class="r"><code>library(msm)  # statetable.msm(), msm(), qmatrix.msm(), pmatrix.msm(), sojourn.msm()
st <- statetable.msm(state, PTNUM, data = thmmdata)
st</code></pre>
<pre><code>## to
## from 1 2 3
## 1 28 50 22
## 2 0 25 25</code></pre>
<p>We see, for example, that 50 patients progressed from <em>healthy</em> to the <em>diseased</em> state and 22 patients went directly from being <em>healthy</em> to <em>dead</em>. Of the patients who progressed to disease, 25 subsequently died.</p>
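<p>The table counts pairs of consecutive observations, so the diagonal entries are patients whose last observation found them still in the same state (that is, they were censored there). As a rough sketch, the counts can be turned into row proportions in base R. The matrix below is typed in by hand from the output above rather than computed from <code>st</code>.</p>
<pre class="r"><code># Transition counts copied from the statetable.msm() output
counts <- matrix(c(28, 50, 22,
                    0, 25, 25),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(from = c("1", "2"), to = c("1", "2", "3")))

# Share of observed transitions out of each starting state
props <- prop.table(counts, margin = 1)
round(props, 2)</code></pre>
<p>These are crude proportions that ignore how long patients were followed, which is exactly the information the continuous time model uses.</p>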
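<p>Next, we set up the Q matrix that tells <code>msm</code> which instantaneous transitions are allowed (a nonzero entry marks a permitted transition, following the structure described above),</p>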
<pre class="r"><code>Q <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)
rownames(Q) <- c("S1", "S2", "S3")
colnames(Q) <- c("S1", "S2", "S3")
Q</code></pre>
<pre><code>## S1 S2 S3
## S1 0 1 1
## S2 0 0 1
## S3 0 0 0</code></pre>
<p>fit the model, and plot the survival curves for states <em>S1</em> and <em>S2</em> using the “old school” pre-built plot method.</p>
<pre class="r"><code>fit <- msm( state ~ time, subject=PTNUM, data = df,
qmatrix = Q, gen.inits = TRUE, covariates = ~ sex)
plot(fit)</code></pre>
<p><img src="/post/2020-10-02-fake-data-for-the-illness-death-model/index_files/figure-html/unnamed-chunk-6-1.png" width="672" /></p>
<p>The default print method for the model fit shows the transition intensities along with the hazard ratios for the covariate.</p>
<pre class="r"><code>fit</code></pre>
<pre><code>##
## Call:
## msm(formula = state ~ time, subject = PTNUM, data = df, qmatrix = Q, gen.inits = TRUE, covariates = ~sex)
##
## Maximum likelihood estimates
## Baselines are with covariates set to their means
##
## Transition intensities with hazard ratios for each covariate
## Baseline sex
## S1 - S1 -0.28724 (-0.37477,-0.2202)
## S1 - S2 0.24020 ( 0.17891, 0.3225) 0.8992 (0.49769,1.625)
## S1 - S3 0.04703 ( 0.02057, 0.1076) 0.2106 (0.04132,1.073)
## S2 - S2 -0.09756 (-0.14312,-0.0665)
## S2 - S3 0.09756 ( 0.06650, 0.1431) 0.6558 (0.30473,1.411)
##
## -2 * log-likelihood: 413.3
## [Note, to obtain old print format, use "printold.msm"]</code></pre>
<p>We can get the transition rates for sex = 0,</p>
<pre class="r"><code>qmatrix.msm(fit, covariates = list(sex = 0))</code></pre>
<pre><code>## S1 S2
## S1 -0.3583 (-0.51092,-0.25130) 0.2537 ( 0.16167, 0.39804)
## S2 0 -0.1211 (-0.20856,-0.07037)
## S3 0 0
## S3
## S1 0.1046 ( 0.04986, 0.21962)
## S2 0.1211 ( 0.07037, 0.20856)
## S3 0</code></pre>
<p>and for sex = 1.</p>
<pre class="r"><code>qmatrix.msm(fit, covariates = list(sex = 1))</code></pre>
<pre><code>## S1 S2
## S1 -0.25013 (-0.357720,-0.17490) 0.22810 ( 0.155472, 0.33464)
## S2 0 -0.07945 (-0.136411,-0.04627)
## S3 0 0
## S3
## S1 0.02204 ( 0.005169, 0.09396)
## S2 0.07945 ( 0.046269, 0.13641)
## S3 0</code></pre>
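<p>Because <code>sex</code> enters each rate through a log-linear term, the ratio of the sex = 1 rate to the sex = 0 rate should reproduce the hazard ratio printed with the model fit. Here is a quick check in base R, with the point estimates copied by hand from the output above; the vector names are labels added for illustration.</p>
<pre class="r"><code># Off-diagonal transition rates from qmatrix.msm() for each value of sex
q_sex0 <- c(S1_S2 = 0.2537,  S1_S3 = 0.1046,  S2_S3 = 0.1211)
q_sex1 <- c(S1_S2 = 0.22810, S1_S3 = 0.02204, S2_S3 = 0.07945)

# Hazard ratios as printed by the model fit
hr_printed <- c(S1_S2 = 0.8992, S1_S3 = 0.2106, S2_S3 = 0.6558)

# Ratio of rates, side by side with the printed hazard ratios
hr_from_rates <- q_sex1 / q_sex0
round(cbind(hr_from_rates, hr_printed), 3)</code></pre>
<p>Any small disagreement between the two columns comes from the rounding of the printed estimates.</p>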
<p>The <code>msm</code> package also allows us to calculate the transition probability matrix <span class="math inline">\(P(t)\)</span> for arbitrary times.</p>
<pre class="r"><code>pmatrix.msm(fit, t= 13.3)</code></pre>
<pre><code>## S1 S2 S3
## S1 0.02192 0.3182 0.6599
## S2 0.00000 0.2732 0.7268
## S3 0.00000 0.0000 1.0000</code></pre>
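<p>For a three-state model of this shape, the matrix exponential has a simple closed form, so the first row of <span class="math inline">\(P(t)\)</span> can be reproduced by hand. The sketch below uses the rounded baseline intensities from the model output above, and assumes (as is the default) that <code>pmatrix.msm()</code> evaluates the covariate at its mean. The variable names are labels added for illustration.</p>
<pre class="r"><code># Baseline transition intensities copied from the printed model fit
q12 <- 0.24020
q13 <- 0.04703
q23 <- 0.09756
t_eval <- 13.3

# Probability of staying in S1, and of staying in S2, for the whole interval
p11 <- exp(-(q12 + q13) * t_eval)
p22 <- exp(-q23 * t_eval)

# Probability of being in S2 at t_eval having started in S1:
# leave S1 for S2 at some time s, then remain in S2 until t_eval
p12 <- q12 / (q12 + q13 - q23) * (p22 - p11)

# Whatever is left has been absorbed in S3
p13 <- 1 - p11 - p12

round(c(S1 = p11, S2 = p12, S3 = p13), 4)</code></pre>
<p>The closed form for <code>p12</code> requires that the total exit rate from S1 differ from <code>q23</code>, which holds here.</p>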
<p>Finally, we look at the mean sojourn times for patients in the <em>healthy</em> and <em>diseased</em> states. Normally, for a process that can transition in and out of states, this means the average time spent in the state each time it is visited. In our model, patients only go forward through the chain; there is no getting better. So the sojourn time for S2 is essentially the average amount of time patients spend in the <em>diseased</em> state.</p>
<pre class="r"><code>sojourn.msm(fit)</code></pre>
<pre><code>## estimates SE L U
## S1 3.481 0.4725 2.668 4.542
## S2 10.251 2.0045 6.987 15.038</code></pre>
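<p>In a continuous time Markov chain the holding time in state <em>i</em> is exponential with rate <span class="math inline">\(-q_{ii}\)</span>, so the mean sojourn time is <span class="math inline">\(-1/q_{ii}\)</span>. As a quick sketch, the estimates above can be recovered from the rounded diagonal intensities printed with the model fit.</p>
<pre class="r"><code># Diagonal baseline intensities copied from the printed model fit
q_diag <- c(S1 = -0.28724, S2 = -0.09756)

# Mean sojourn time in each transient state
mean_sojourn <- -1 / q_diag
round(mean_sojourn, 3)</code></pre>
<p>S3 is left out because it is absorbing: its diagonal entry is 0 and patients never leave it.</p>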
</div>
<div id="a-few-remarks" class="section level3">
<h3>A Few Remarks</h3>
<p><sup>1</sup>The work done in R on survival analysis, and partially embodied in the two hundred thirty-three packages listed in the CRAN <a href="https://cran.r-project.org/web/views/Survival.html">Survival Analysis Task View</a>, constitutes a fundamental contribution to statistics. There is enough material here for a lifetime of study. Even confining oneself to a tour of the eleven packages listed in the simulation section would be a significant undertaking.</p>
<p><sup>2</sup> <code>genSurv</code> is a pretty bare bones package having just seven functions and little explanatory text. If it were not listed on the CRAN Task View, it would have been easy to pass by. Nevertheless, I have only shown a small portion of what it can do.</p>
<p><sup>3</sup><code>msm</code> is an example of why R is a treasury of statistical knowledge. Not only does the package offer an impressive array of capabilities for analyzing multi-state Markov models in continuous time, the basic documentation, the package’s pdf, includes references to quite a few of the fundamental papers.</p>
</div>