R/Medicine on R Views
https://rviews.rstudio.com/tags/r/medicine/index.xml
Recent content in R/Medicine on R Views. RStudio, Inc. All Rights Reserved.

Statistics in Glaucoma: Part II
https://rviews.rstudio.com/2018/12/07/statistics-in-glaucoma-part-ii/
Fri, 07 Dec 2018 00:00:00 +0000
<p><em>Samuel Berchuck is a Postdoctoral Associate in Duke University’s Department of Statistical Science and Forge-Duke’s Center for Actionable Health Data Science.</em></p>
<p><em>Joshua L. Warren is an Assistant Professor of Biostatistics at Yale University.</em></p>
<div id="analyzing-visual-field-data" class="section level2">
<h2>Analyzing Visual Field Data</h2>
<p>In Part I of this series on statistics in glaucoma, we detailed the use of visual fields for understanding functional vision loss in glaucoma patients. There, before turning to a new method for modeling visual field data that accounts for the anatomy of the eye, we discussed how visual field data are typically analyzed, introducing a common diagnostic metric: point-wise linear regression (PLR). PLR is a trend-based diagnostic that uses slope p-values from location-specific linear regressions to discriminate progression status. The motivation for PLR is straightforward: large negative slopes at numerous visual field locations are indicative of progression. PLR is characteristic of a large class of methods that attempt to discriminate progression based on changes in differential light sensitivity (DLS) across time. The technique is simple, intuitive, and effective; however, it is often limited by naive modeling assumptions, including independence across visual field locations.</p>
</div>
<div id="ocular-anatomy-in-the-neighborhood-structure-of-the-visual-field" class="section level2">
<h2>Ocular Anatomy in the Neighborhood Structure of the Visual Field</h2>
<p>To properly account for the spatial dependencies on the visual field, Berchuck et al. (2018) introduce a neighborhood model that incorporates anatomical information through a dissimilarity metric. Full details can be found in the paper, but we provide a quick introduction here. The key development is the specification of the neighborhood structure through a new definition of adjacency weights. Typically in areal data, the adjacency for two locations <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span> is defined as <span class="math inline">\(w_{ij} = 1(i \sim j)\)</span>, where <span class="math inline">\(i \sim j\)</span> is the event that locations <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span> are neighbors. As discussed in Part I, this assumption is not sufficient due to the complex anatomy of the eye. To account for this additional structure, a more general adjacency is introduced as a function of a dissimilarity metric: <span class="math inline">\(w_{ij}(\alpha_t) = 1(i \sim j)\exp\{-z_{ij}\alpha_t\}\)</span>. Here, <span class="math inline">\(z_{ij}\)</span> is the absolute difference between the Garway-Heath angles of locations <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span>.</p>
<p>The parameter <span class="math inline">\(\alpha_t\)</span> dictates the importance of the dissimilarity metric at each visual field exam <span class="math inline">\(t\)</span>. When <span class="math inline">\(\alpha_t\)</span> becomes large, the model reduces to an independent process, and as <span class="math inline">\(\alpha_t\)</span> goes to zero, the process becomes the standard spatial model for areal data. Based on the specification of the adjacency weights, <span class="math inline">\(\alpha_t\)</span> has a useful interpretation with respect to deterioration of visual ability. In particular, <span class="math inline">\(\alpha_t\)</span> changing over exams indicates that the neighborhood structure on the visual field is changing, which in turn implies damage to the underlying retinal ganglion cell structure. This observation motivates a diagnostic of progression that quantifies variability in <span class="math inline">\(\alpha_t\)</span> across time. We choose the coefficient of variation (CV) and demonstrate that it is a highly significant predictor of progression and, furthermore, independent of trend-based methods such as PLR.</p>
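<p>The behavior of the adjacency weights at the two extremes of <span class="math inline">\(\alpha_t\)</span> can be illustrated with a small standalone sketch (a toy three-location adjacency matrix and hypothetical angles, not taken from <code>womblR</code>):</p>
<pre class="r"><code>###Toy illustration of w_ij(alpha) = 1(i ~ j) * exp(-z_ij * alpha)
W <- matrix(c(0, 1, 1,
              1, 0, 1,
              1, 1, 0), nrow = 3) # binary adjacency: all three locations are neighbors
angles <- c(10, 15, 170) # hypothetical Garway-Heath angles (degrees)
Z <- abs(outer(angles, angles, "-")) # dissimilarity metric z_ij
AdjWeights <- function(alpha) W * exp(-Z * alpha)
round(AdjWeights(0), 2)    # alpha = 0: reduces to the standard binary spatial weights
round(AdjWeights(0.05), 2) # larger alpha: anatomically dissimilar neighbors are down-weighted</code></pre>
<p>With <code>alpha = 0.05</code>, the anatomically similar pair (angles 10 and 15) retains most of its weight, while the pairs involving the dissimilar location (angle 170) have weights near zero, approaching independence.</p>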
</div>
<div id="navigating-the-womblr-package" class="section level2">
<h2>Navigating the <code>womblR</code> Package</h2>
<p>To make the method available to clinicians, the R package <code>womblR</code> was developed. The package provides a suite of functions that walk a user through the full process of analyzing a series of visual fields from beginning to end. The user interface was modeled after other impactful R packages for Bayesian spatial analysis, including <code>spBayes</code> and <code>CARBayes</code>. The package name combines Hadley Wickham’s naming convention for R packages (i.e., ending a package name with the letter R) with the name of the author of the seminal paper on boundary detection, a technique referred to as areal wombling (Womble 1951).</p>
<p>We will now walk through the process of analyzing visual field data, estimating the <span class="math inline">\(\alpha_t\)</span> parameters, and assessing progression status. The main function in <code>womblR</code> is the Spatiotemporal Boundary Detection with Dissimilarity Metric model function (<code>STBDwDM</code>). Inference for the method is obtained through Markov chain Monte Carlo (MCMC), a computationally intensive method that iteratively updates individual model parameters until enough post-convergence posterior samples have been collected for accurate posterior inference. Because of the iterative nature of MCMC, the majority of the computation occurs within a <code>for</code> loop, so the package is built on C++ through the packages <code>Rcpp</code> and <code>RcppArmadillo</code>. Given the added complexity of writing C++, the pre- and post-processing of the model are done in R, with the <code>for</code> loop implemented in C++. The MCMC method employed in <code>womblR</code> is a Metropolis-Hastings within Gibbs algorithm.</p>
<p>As a quick aside, with the more recent advent of probabilistic programming, this model could have been implemented using the Hamiltonian Monte Carlo methods available in software like Stan or PyMC3. These programs do not require the derivation of full conditionals, and push the MCMC algorithm into the background. There is undoubtedly a huge market for this type of software, and it is clearly playing a significant role in the popularization of Bayesian modeling. At the same time, implementing traditional MCMC samplers using <code>Rcpp</code> can be instructive and, for those with experience, nearly as quick a coding experience.</p>
<p>We now begin by formatting the visual field data for analysis. According to the manual, the observed data <code>Y</code> must first be ordered spatially and then temporally. Furthermore, we will remove all locations that correspond to the natural blind spot (which, in the Humphrey Field Analyzer-II, correspond to locations 26 and 35).</p>
<pre class="r"><code>###Load package
library(womblR)
###Format data
blind_spot <- c(26, 35) # define blind spot
VFSeries <- VFSeries[order(VFSeries$Location), ] # sort by location
VFSeries <- VFSeries[order(VFSeries$Visit), ] # sort by visit
VFSeries <- VFSeries[!VFSeries$Location %in% blind_spot, ] # remove blind spot locations
Y <- VFSeries$DLS # define observed outcome data</code></pre>
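<p>As a quick structural check (illustrative), the outcome vector should contain one entry for each of the 52 informative locations at each of the 9 visits:</p>
<pre class="r"><code>length(Y) # 52 locations x 9 visits = 468 observations</code></pre>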
<p>Now that we have assigned the observed outcomes to <code>Y</code>, we move onto the temporal variable <code>Time</code>. For visual field data, we define this to be the time from the baseline visit. We obtain the unique days from the baseline visit and scale them to be on the year scale.</p>
<pre class="r"><code>Time <- unique(VFSeries$Time) / 365 # years since baseline visit
print(Time)</code></pre>
<pre><code>## [1] 0.0000000 0.3452055 0.6520548 1.1123288 1.3808219 1.6109589 2.0712329
## [8] 2.3780822 2.5698630</code></pre>
<p>Next, we assign the adjacency matrix and dissimilarity metric (both discussed in Part I).</p>
<pre class="r"><code>W <- HFAII_Queen[-blind_spot, -blind_spot] # visual field adjacency matrix
DM <- GarwayHeath[-blind_spot] # Garway-Heath angles</code></pre>
<p>Now that we have specified the data objects <code>Y</code>, <code>DM</code>, <code>W</code>, and <code>Time</code>, we will customize the objects that characterize Bayesian MCMC methods, in particular, hyperparameters, starting values, Metropolis tuning values, and MCMC inputs. These objects have been detailed previously in the <code>womblR</code> package <a href="https://cran.r-project.org/web/packages/womblR/vignettes/womblR-example.html">vignette</a>, so we will not spend time going over their definitions. We will only note that they are each <code>list</code> objects similar to the <code>spBayes</code> package. We begin by specifying the hyperparameters.</p>
<pre class="r"><code>###Bounds for temporal tuning parameter phi
TimeDist <- abs(outer(Time, Time, "-"))
TimeDistVec <- TimeDist[lower.tri(TimeDist)]
minDiff <- min(TimeDistVec)
maxDiff <- max(TimeDistVec)
PhiUpper <- -log(0.01) / minDiff # shortest diff goes down to 1%
PhiLower <- -log(0.95) / maxDiff # longest diff goes up to 95%
###Hyperparameter object
Hypers <- list(Delta = list(MuDelta = c(3, 0, 0), OmegaDelta = diag(c(1000, 1000, 1))),
T = list(Xi = 4, Psi = diag(3)),
Phi = list(APhi = PhiLower, BPhi = PhiUpper))</code></pre>
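<p>The bounds above can be checked against their intended interpretation. Under the exponential temporal structure, the correlation between exams separated by <span class="math inline">\(d\)</span> years is <span class="math inline">\(\exp\{-\phi d\}\)</span>, so (a quick illustrative check using the objects just defined):</p>
<pre class="r"><code>###Verify the interpretation of the phi bounds
exp(-PhiUpper * minDiff) # at the upper bound, the closest pair of exams retains 1% correlation
exp(-PhiLower * maxDiff) # at the lower bound, the farthest pair retains 95% correlation</code></pre>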
<p>Then we specify the starting values for the parameters, Metropolis tuning variances, and MCMC details.</p>
<pre class="r"><code>###Starting values
Starting <- list(Delta = c(3, 0, 0), T = diag(3), Phi = 0.5)
###Metropolis tuning variances
Nu <- length(Time) # calculate number of visits
Tuning <- list(Theta2 = rep(1, Nu), Theta3 = rep(1, Nu), Phi = 1)
###MCMC inputs
MCMC <- list(NBurn = 10000, NSims = 250000, NThin = 25, NPilot = 20)</code></pre>
<p>We specify that our model will run for a burn-in period of 10,000 scans, followed by 250,000 scans post burn-in. In the burn-in period there will be 20 iterations of pilot adaptation evenly spaced out over the period. The final number of samples to be used for inference will be thinned down to 10,000 based on the thinning number of 25. We can now run the MCMC sampler. Details of the various options available in the sampler can be found in the documentation, <code>help(STBDwDM)</code>.</p>
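<p>The retained sample size and pilot-adaptation spacing implied by these settings can be computed directly from the <code>MCMC</code> list:</p>
<pre class="r"><code>MCMC$NSims / MCMC$NThin  # 10,000 posterior samples retained for inference
MCMC$NBurn / MCMC$NPilot # pilot adaptation every 500 scans during burn-in</code></pre>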
<pre class="r"><code>reg.STBDwDM <- STBDwDM(Y = Y, DM = DM, W = W, Time = Time,
Starting = Starting, Hypers = Hypers, Tuning = Tuning, MCMC = MCMC,
Family = "tobit",
TemporalStructure = "exponential",
Distance = "circumference",
Weights = "continuous",
Rho = 0.99,
ScaleY = 10,
ScaleDM = 100,
Seed = 54)
## Burn-in progress: |*************************************************|
## Sampler progress: 0%.. 10%.. 20%.. 30%.. 40%.. 50%.. 60%.. 70%.. 80%.. 90%.. 100%.. </code></pre>
<p>We quickly assess convergence by checking the traceplots of <span class="math inline">\(\alpha_t\)</span> (note that further MCMC convergence diagnostics should be used in practice).</p>
<pre class="r"><code>###Load coda package
library(coda)
###Convert alpha to an MCMC object
Alpha <- as.mcmc(reg.STBDwDM$alpha)
###Create traceplot
par(mfrow = c(3, 3))
for (t in 1:Nu) traceplot(Alpha[, t], ylab = bquote(alpha[.(t)]), main = bquote(paste("Posterior of " ~ alpha[.(t)])))</code></pre>
<p><img src="/post/2018-12-03-statistics-in-glaucoma-part-ii_files/figure-html/unnamed-chunk-8-1.png" width="689.28" /></p>
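<p>Traceplots alone are not sufficient evidence of convergence. Since <code>Alpha</code> is already an <code>mcmc</code> object, standard <code>coda</code> diagnostics can be applied to it directly (a brief illustrative sketch):</p>
<pre class="r"><code>###Further convergence diagnostics for the alpha_t chains
effectiveSize(Alpha) # effective sample size for each alpha_t
geweke.diag(Alpha)   # Geweke z-scores; |z| > 2 suggests lack of convergence</code></pre>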
</div>
<div id="converting-mcmc-samples-into-clinical-statements" class="section level2">
<h2>Converting MCMC Samples into Clinical Statements</h2>
<p>Now we calculate the posterior distribution of the CV of <span class="math inline">\(\alpha_t\)</span> and print its moments.</p>
<pre class="r"><code>###CV of alpha_t across exams, computed for each posterior sample (rows of Alpha)
CVAlpha <- apply(Alpha, 1, function(x) sd(x) / mean(x))
plot(density(CVAlpha, adjust = 2), main = expression("Posterior of CV"~(alpha[t])), xlab = expression("CV"~(alpha[t])))</code></pre>
plot(density(CVAlpha, adjust = 2), main = expression("Posterior of CV"~(alpha[t])), xlab = expression("CV"~(alpha[t])))</code></pre>
<p><img src="/post/2018-12-03-statistics-in-glaucoma-part-ii_files/figure-html/unnamed-chunk-9-1.png" width="50%" style="display: block; margin: auto;" /></p>
<pre class="r"><code>STCV <- c(mean(CVAlpha), sd(CVAlpha), quantile(CVAlpha, probs = c(0.025, 0.975)))
names(STCV)[1:2] <- c("Mean", "SD")
print(STCV)</code></pre>
<pre><code>## Mean SD 2.5% 97.5%
## 0.19121622 0.10205826 0.04636219 0.42744656</code></pre>
<p>For this information to be useful clinically, we convert it into a probability of progression based on a model trained on a large cohort of glaucoma patients (Berchuck et al. 2019). Because the information from <span class="math inline">\(\alpha_t\)</span> is independent of trend-based methods, we show that the optimal use of <span class="math inline">\(\alpha_t\)</span> is to combine it with a basic global metric: the slope and p-value (and their interaction) from a regression of the overall visual field mean across exams. The trained model coefficients are publicly available and are used below. The mean and standard deviation of the CV of <span class="math inline">\(\alpha_t\)</span>, along with their interaction, are also included. The probability of progression can be calculated as follows.</p>
<pre class="r"><code>###Calculate the global metric slope and p-value
MeanSens <- apply(t(matrix(VFSeries$DLS, ncol = Nu)) / 10, 1, mean) # scaled mean DLS
reg.global <- lm(MeanSens ~ Time) # global regression
GlobalS <- summary(reg.global)$coef[2, 1] # global slope
GlobalP <- summary(reg.global)$coef[2, 4] # global p-value
###Obtain probability of progression using estimated parameters from Berchuck et al. 2019
input <- c(1, GlobalP, GlobalS, STCV[1], STCV[2], GlobalS * GlobalP, STCV[1] * STCV[2])
coef <- c(-1.7471655, -0.2502131, -13.7317622, 7.4746348, -8.9152523, 18.6964153, -13.3706058)
fit <- input %*% coef
exp(fit) / (1 + exp(fit))</code></pre>
<pre><code>## [,1]
## [1,] 0.4355997</code></pre>
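<p>As a side note, the inverse-logit in the final line can equivalently be computed with base R’s <code>plogis</code>, which is numerically stabler when the linear predictor is extreme:</p>
<pre class="r"><code>plogis(drop(fit)) # same probability of progression, approximately 0.44</code></pre>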
<p>The probability of progression is calculated to be 0.44, which can be compared to the threshold cutoff of 0.325 for the trained model. This cutoff was determined using operating characteristics, with the specificity fixed at a clinically meaningful 85%. Based on this derived threshold, the probability of progression is high enough to indicate that this patient’s disease shows evidence of visual field progression (which is reassuring, since clinicians determined that this patient is progressing).</p>
<p><strong>Looking ahead:</strong> The third installment will wrap up the discussion of the <code>womblR</code> package and ponder future directions for the role of statistics in glaucoma research. Furthermore, the role of open-source software in medicine will be discussed.</p>
</div>
<div id="references" class="section level2">
<h2>References</h2>
<ol style="list-style-type: decimal">
<li>Berchuck, S.I., Mwanza, J.C., & Warren, J.L. (2018). <a href="https://arxiv.org/abs/1805.11636"><em>Diagnosing Glaucoma Progression with Visual Field Data Using a Spatiotemporal Boundary Detection Method</em></a>, In press at <em>Journal of the American Statistical Association</em>.</li>
<li>Womble, W. H. (1951). <a href="http://science.sciencemag.org/content/114/2961/315"><em>Differential Systematics</em></a>. <em>Science</em>, 114(2961), 315-322.</li>
<li>Berchuck, S.I., Mwanza, J.C., Tanna, A.P., Budenz, D.L., Warren, J.L. (2019). <em>Improved Detection of Visual Field Progression Using a Spatiotemporal Boundary Detection Method</em>. In press at <em>Scientific Reports</em> (Available upon request).</li>
</ol>
</div>
Statistics in Glaucoma: Part I
https://rviews.rstudio.com/2018/12/03/statistics-in-glaucoma-part-i/
Mon, 03 Dec 2018 00:00:00 +0000
<p><em>Samuel Berchuck is a Postdoctoral Associate in Duke University’s Department of Statistical Science and Forge-Duke’s Center for Actionable Health Data Science.</em></p>
<p><em>Joshua L. Warren is an Assistant Professor of Biostatistics at Yale University.</em></p>
<div id="introduction" class="section level2">
<h2>Introduction</h2>
<p>Glaucoma is a leading cause of blindness worldwide, with a prevalence of 4% in the population aged 40-80. The disease is characterized by retinal ganglion cell death and corresponding damage to the optic nerve head. Since visual impairment caused by glaucoma is irreversible and effective treatments exist, early detection of the disease is essential. Determining whether the disease is progressing remains one of the most challenging aspects of glaucoma management, since it is difficult to distinguish true progression from variability due to natural degradation or noise. In practice, clinicians monitor progression using a multifactorial approach that relies on various measurements of the disease. In this series of blog posts, we focus on the use of visual fields. Visual field examinations measure a patient’s actual vision, and the practice is thus referred to as a functional measurement. As such, visual fields are a proxy for a patient’s quality of life, and are therefore typically prioritized in practice.</p>
</div>
<div id="visual-field-data" class="section level2">
<h2>Visual Field Data</h2>
<p>Visual fields are complex spatiotemporal data generated from an intricate anatomical system, which is important to understand for modeling purposes. To illustrate visual field data, we load an example data set from the <code>womblR</code> package on CRAN. The package <code>womblR</code> was developed specifically for analyzing visual field data, and uses a Bayesian hierarchical model that accounts for the complex nature of the data (more details will be provided in Part II). The specific data set comes from the Vein Pulsation Study Trial in Glaucoma and the Lions Eye Institute trial registry, Perth, Western Australia. We begin by loading the package.</p>
<pre class="r"><code>library(womblR)</code></pre>
<p>The data set of interest is loaded lazily and can be accessed as follows; we also view the first six rows for illustration.</p>
<pre class="r"><code>data(VFSeries)
head(VFSeries)</code></pre>
<pre><code>## Visit DLS Time Location
## 1 1 25 0 1
## 2 2 23 126 1
## 3 3 23 238 1
## 4 4 23 406 1
## 5 5 24 504 1
## 6 6 21 588 1</code></pre>
<p>The data object <code>VFSeries</code> contains a longitudinal series of visual fields for a glaucoma patient that we will use throughout the three blog posts to exemplify the study of visual fields. This patient has been determined to be progressing, based on the expertise of two clinicians. <code>VFSeries</code> has four variables: <code>Visit</code>, <code>DLS</code>, <code>Time</code>, and <code>Location</code>. The variable <code>Visit</code> represents the visual field test visit number, <code>DLS</code> the observed differential light sensitivity (DLS) outcome, <code>Time</code> the time of the visual field test (in days from the baseline visit), and <code>Location</code> the spatial location on the visual field where the observation occurred. There are 9 visual field exams in this data set, with 117.25 days between visits on average.</p>
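<p>The visit count and spacing quoted above can be verified directly from the data (assuming <code>VFSeries</code> has been loaded as above):</p>
<pre class="r"><code>visit_times <- unique(VFSeries$Time) # days from baseline for each exam
length(visit_times)     # 9 visual field exams
mean(diff(visit_times)) # 117.25 days between visits, on average</code></pre>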
<p>To help visualize the data frame, we can use the <code>PlotVfTimeSeries</code> function, which plots a patient’s observed visual field sensitivity over time at each location on the visual field.</p>
<pre class="r"><code>PlotVfTimeSeries(Y = VFSeries$DLS,
Location = VFSeries$Location,
Time = VFSeries$Time,
main = "Visual field sensitivity time series \n at each location",
xlab = "Days from baseline visit",
ylab = "Differential light sensitivity (dB)",
line.reg = FALSE)</code></pre>
<p><img src="/post/2018-11-19-statistics-in-glaucoma-part-i_files/figure-html/unnamed-chunk-3-1.png" width="528" style="display: block; margin: auto;" /></p>
<p>The above figure demonstrates the visual field from a Humphrey Field Analyzer-II (HFA-II) testing machine, which generates 54 spatial locations (only 52 of them informative; note the two blank spots corresponding to the blind spot). The visual field map is constructed by assessing a patient’s response to varying levels of light. Patients are instructed to focus on a central fixation point while light stimuli are introduced at random locations over a grid on the visual field. When the patient observes a light, they press a button and the current light intensity is recorded. The process is repeated until the entire visual field has been tested. The light intensity is measured in differential light sensitivity (DLS), which quantifies the difference between the HFA-II background and the observed light intensity. Smaller values indicate worsening vision.</p>
</div>
<div id="spatial-anatomy-on-the-visual-field" class="section level2">
<h2>Spatial Anatomy on the Visual Field</h2>
<p>The spatial surface of the visual field is observed on a lattice (i.e., uniform areal data); however, it is a complex projection of the underlying optic nerve head and exhibits anatomically induced spatial dependencies. In particular, localized damage to the optic disc can result in clinically predictable patterns of deterioration across the visual field. Incorporating this non-standard spatial dependence structure into our methodology is a priority for properly analyzing these data, although it is commonly ignored. In statistical terms, this means that a naive model of the spatial surface of the visual field (i.e., one with neighbors defined only through adjacent locations) would be inappropriate. Instead, the definition of a neighbor when considering vision loss on the visual field must depend on the underlying anatomical proximities.</p>
<p>To illustrate this concept, we begin by displaying the visual field neighborhood structure. The adjacency matrix for the HFA-II is available in the <code>womblR</code> package. In this analysis, we use a queen specification, meaning that an adjacency is defined as any location that shares an edge or corner on the lattice. We now load this adjacency matrix and remove the two locations that correspond to the blind spot.</p>
<pre class="r"><code>blind_spot <- c(26, 35) # define blind spot
W <- HFAII_Queen[-blind_spot, -blind_spot] # HFA-II visual field adjacency matrix</code></pre>
<p>This adjacency structure can be displayed using the <code>graph.adjacency</code> function in the <code>igraph</code> package.</p>
<pre class="r"><code>library(igraph)
adj.graph <- graph.adjacency(W, mode = "undirected")
plot(adj.graph)</code></pre>
<p><img src="/post/2018-11-19-statistics-in-glaucoma-part-i_files/figure-html/unnamed-chunk-5-1.png" width="528" style="display: block; margin: auto;" /></p>
<p>As mentioned above, naively assuming that all of these adjacencies are equal ignores the important underlying anatomy that enforces these dependencies. The anatomical relationship between the visual field test points and the underlying optic nerve head was studied by Garway-Heath et al. (2000), who estimated the angle at which each test location’s underlying retinal ganglion cells enter the optic disc, measured in degrees. These angles are the missing link that allows the visual field adjacency structure to be dictated by the underlying anatomy. They can be visualized using the function <code>PlotAdjacency</code> from <code>womblR</code>, which displays neighborhood structures across the visual field. Before using this function, we need to load the angles measured in Garway-Heath et al. (2000). These are available from <code>womblR</code>; again, we remove the blind spot before using them.</p>
<pre class="r"><code>Angles <- GarwayHeath[-blind_spot] # Garway-Heath angles
summary(Angles)</code></pre>
<pre><code>## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.00 80.75 192.50 177.35 275.75 329.00</code></pre>
<p>We are now ready to visualize the neighborhood structure of the visual field using the <code>PlotAdjacency</code> function.</p>
<pre class="r"><code>###Plot the angles on the visual field
PlotAdjacency(W = W,
DM = Angles,
zlim = c(0, 180),
Visit = NA,
edgewidth = 3.75,
cornerwidth = 0.33,
lwd.border = 3.75,
main = "Garway-Heath angles\n across the visual field")</code></pre>
<p><img src="/post/2018-11-19-statistics-in-glaucoma-part-i_files/figure-html/unnamed-chunk-7-1.png" width="528" style="display: block; margin: auto;" /></p>
<p>The angles measured by Garway-Heath et al. are presented at each location on the visual field. More interestingly, the distances between these angles are presented for each of the neighbor pairs. This figure is equivalent to the adjacency plot displayed above, but allows the adjacencies to vary as a function of the anatomy. In particular, if two visual field locations are anatomically similar, the dependency is strengthened (i.e., more white), and if the locations are close to anatomically independent, the dependency is weaker (i.e., more black). Here the edge adjacencies are represented by lines, while the diagonal adjacencies are represented as two triangles. This view of the visual field illustrates the importance of anatomy in modeling visual field data, as neighboring locations can have underlying retinal ganglion cells that enter the optic nerve head with a large degree of separation. In particular, locations on either side of the equator, although adjacent, are close to anatomically independent.</p>
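<p>The neighbor-pair dissimilarities shown in the figure can also be summarized numerically (an illustrative sketch using the <code>W</code> and <code>Angles</code> objects defined above):</p>
<pre class="r"><code>###Absolute Garway-Heath angle differences for neighboring location pairs
AngleDiff <- abs(outer(Angles, Angles, "-")) # pairwise angle dissimilarities
summary(AngleDiff[W == 1]) # large values indicate near-independent neighbors</code></pre>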
</div>
<div id="how-to-model-visual-field-data" class="section level2">
<h2>How to Model Visual Field Data?</h2>
<p>If you have gotten this far in the post, hopefully you have the sense that the study of visual field data is statistically interesting and clinically important for properly assessing a glaucoma patient’s risk of vision loss. In the next two blog posts, we will explore how visual field data are currently analyzed and new methods that account for the anatomical structure detailed above. To accomplish this, we will break down the algorithm and software used to build the <code>womblR</code> package, and will attempt to illustrate the importance of R packages for open-source clinical research.</p>
</div>
<div id="reference" class="section level2">
<h2>Reference</h2>
<ol style="list-style-type: decimal">
<li>Garway-Heath, David F., Darmalingum Poinoosawmy, Frederick W. Fitzke, and Roger A. Hitchings. “Mapping the visual field to the optic disc in normal tension glaucoma eyes” <em>Ophthalmology</em> 107, no. 10 (2000): 1809-1815.</li>
</ol>
</div>
Serendipity at R / Medicine
https://rviews.rstudio.com/2018/10/16/serendipity-at-r-medicine/
Tue, 16 Oct 2018 00:00:00 +0000
<p>We knew we were on to something important early on in the process of organizing <a href="https://www.r-medicine.com">R / Medicine 2018</a>. Even during our initial attempts to articulate the differences between this conference and <a href="https://www.rinpharma.com">R / Pharma 2018</a>, it became clear that the focus on the use of R and statistics in clinical settings was going to be a richer topic than just the design of clinical trials. However, it wasn’t until the conference got underway that we realized there was magic in the mix of attendees. R / Medicine attracted quite a few clinicians who were themselves using R in their work, or were in the process of teaching themselves R. This group catalyzed the discussions that continued throughout the conference, enabling high-bandwidth exchanges that would have otherwise suffered from the effort to translate between the two cultures. The small, single-track nature of the conference helped to keep the conversations going, with the questions and answers at the end of a given talk helping to enrich the quality of successive discussions.</p>
<p>Rob Tibshirani set the collaborative tone for the conference with his opening <a href="https://r-medicine.netlify.com/talks/talk7.pdf">keynote talk</a> describing the clinical forecasting system he and his collaborators have built to predict platelet usage for the Stanford hospitals. Big-league and big-impact, the system shows the promise of delivering real clinical and financial benefits. Tibshirani’s presentation of the modeling process also set the bar for clarity.</p>
<p>The other keynotes were also “top shelf”. Michael Lawrence spoke about <a href="https://r-medicine.netlify.com/talks/michael-lawrence-keynote.pdf">Scientific Software In-the-Large</a>. He laid out three challenges for scientific programming at this scale:</p>
<ul>
<li>Integration of independently developed modules</li>
<li>Translation of analyses and prototypes into software</li>
<li>Scalability</li>
</ul>
<p>He addressed these issues using examples from the <a href="https://www.bioconductor.org/">Bioconductor</a> project.</p>
<p>Victoria Stodden’s Keynote, <a href="http://web.stanford.edu/~vcs/talks/Yale-Sept-2018-STODDEN.pdf">Computational Reproducibility in Medical Research: Toward Open Code and Data</a>, was a meditation on the need to reassess scientific transparency in an age where big data and computational power are driving medical research, and deep intellectual contributions are encoded in software. I was particularly struck by the idea that progress towards computational reproducibility depends on the coordination of stakeholders.</p>
<p><img src="/post/2018-10-10-RMedicine_files/progress.png" height = "500" width="700"></p>
<p>Perhaps the highest-energy talk of the conference (and maybe of all the conferences I have attended this year) was given by Yale’s <a href="https://medicine.yale.edu/intmed/people/harlan_krumholz.profile">Dr. Harlan Krumholz</a>. Unfortunately, we have neither video nor slides from this keynote, but to give you some idea of Dr. Krumholz’s iconoclastic work, look at the 2010 <a href="https://www.forbes.com/forbes/2010/0927/opinions-harlan-krumholz-yale-medicine-ideas-opinions.html#311fdfca6db3">Forbes article</a> and this more recent article published in <a href="https://www.healthaffairs.org/doi/10.1377/hlthaff.2014.0053">HealthAffairs</a>. The following are some notes I managed to take at the talk between moments of mesmerization. With respect to medicine in general, Dr. Krumholz said:</p>
<blockquote>
<p>There could not be a more exciting era in medicine. Medicine is emerging as an information science and the clinician’s role is changing to be a guide or interpreter, not a shaman.</p>
</blockquote>
<p>Commenting on evidence-based medicine:</p>
<blockquote>
<p>More than half of the guidelines in cardiology are not based on evidence.</p>
</blockquote>
<p>With respect to medical data, he said:</p>
<blockquote>
<p>The goal should be to take high-dimensional data and make it low-dimensional. Instead of thinking that everyone should have the same data, we should move towards thinking: How do we use the data that we do have? There should be no missing data.</p>
</blockquote>
<p>I took these statements to mean that teams of clinicians, statisticians, and data scientists should be working towards building predictive models for individual patients based on whatever data is available for them and whatever big data is relevant. This was clearly the music the crowd wanted to dance to.</p>
<p>The slides for most of the rest of the talks are available on the website. One talk I would like to highlight here is Nathaniel Phillips’ talk on <a href="https://ndphillips.github.io/RMedicine_2018/#1">Fast and Frugal Trees</a>.</p>
<p><img src="/post/2018-10-10-RMedicine_files/fft.png" height = "500" width="700"></p>
<p>This talk addressed a recurring theme throughout the conference: the difference in decision making between the two cultures of statisticians and physicians. Probabilistic estimates to characterize risk and to inform decision making are central to a statistician’s worldview. Physicians, on the other hand, are generally not comfortable with probabilities and, when push comes to shove, prefer unambiguous guidelines and thresholds, such as blood pressure ranges, to inform treatment decisions. A vexing cultural problem is to identify effective decision models that have a chance of actually being used by clinicians.</p>
<p>The conference finished with a roundtable discussion with the theme <em>Bridging the Two Cultures</em>, with panelists Beth Atkinson, Joseph Chou, Peter Higgins, Stephan Kadauke, Chinonyerem Madu, and Jack Wasey representing both the statistical and clinical points of view. The moderator (me) began by asking three questions:</p>
<ol>
<li>How do clinicians engage with statisticians and data scientists?</li>
<li>What are some key ideas you should know about collaborating?</li>
<li>In your experience, what kinds of engagements have been the most successful?</li>
</ol>
<p>Panelists were free to respond as they felt inclined to any or all of the questions. As I recall, a consensus emerged around three key ideas: make an effort to empathize with colleagues, meet frequently and go out of your way to interact with colleagues, and carefully select projects and then cultivate them.</p>
<p>Planning is already underway for R / Medicine 2019. Mark the week of September 23rd, and stay tuned!</p>