R Views
https://rviews.rstudio.com/index.xml
Recent content on R Views. RStudio, Inc. All Rights Reserved. Fri, 07 Dec 2018 00:00:00 +0000
Statistics in Glaucoma: Part II
https://rviews.rstudio.com/2018/12/07/statistics-in-glaucoma-part-ii/
Fri, 07 Dec 2018 00:00:00 +0000
<p><em>Samuel Berchuck is a Postdoctoral Associate in Duke University’s Department of Statistical Science and Forge-Duke’s Center for Actionable Health Data Science.</em></p>
<p><em>Joshua L. Warren is an Assistant Professor of Biostatistics at Yale University.</em></p>
<div id="analyzing-visual-field-data" class="section level2">
<h2>Analyzing Visual Field Data</h2>
<p>In Part I of this series on statistics in glaucoma, we detailed the use of visual fields for understanding functional vision loss in glaucoma patients. Before discussing a new method for modeling visual field data that accounts for the anatomy of the eye, we discuss how visual field data are typically analyzed by introducing a common diagnostic metric, point-wise linear regression (PLR). PLR is a trend-based diagnostic that uses slope p-values from the location-specific linear regressions to discriminate progression status. The motivation for PLR is straightforward: large negative slopes at numerous visual field locations are assumed to be indicative of progression. This is characteristic of a large class of methods for analyzing visual field data that attempt to discriminate progression based on changes in the differential light sensitivity (DLS) across time. This technique is simple, intuitive, and effective; however, it is often limited by naive modeling assumptions, including the independence of visual field locations.</p>
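<p>As a rough illustration of PLR, the sketch below fits a separate linear regression at each location and flags locations with a steep, significant negative slope. The data are simulated, and the flagging thresholds (a slope below -1 dB per year and a p-value below 0.01) are assumptions chosen for illustration, not values taken from this post.</p>
<pre class="r"><code>###Simulated example data (hypothetical, not the VFSeries data)
set.seed(1)
n_loc <- 52 # number of informative locations
time <- c(0, 126, 238, 406, 504, 588, 756, 868, 938) / 365 # years since baseline
true_slope <- c(rep(-2, 10), rep(0, n_loc - 10)) # dB per year at each location
dls <- sapply(true_slope, function(b) 25 + b * time + rnorm(length(time), sd = 1))
###Point-wise linear regression: one lm() per location (column of dls)
plr <- t(apply(dls, 2, function(y) {
  fit <- summary(lm(y ~ time))$coefficients
  c(slope = fit[2, 1], p_value = fit[2, 4])
}))
###Flag locations using assumed thresholds
flagged <- plr[, "slope"] < -1 & plr[, "p_value"] < 0.01
sum(flagged) # number of flagged locations</code></pre>
<p>Each location is treated in isolation here, which is exactly the independence assumption that the spatial model relaxes.</p>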
</div>
<div id="ocular-anatomy-in-the-neighborhood-structure-of-the-visual-field" class="section level2">
<h2>Ocular Anatomy in the Neighborhood Structure of the Visual Field</h2>
<p>To properly account for the spatial dependencies on the visual field, Berchuck et al. 2018 introduce a neighborhood model that incorporates anatomical information through a dissimilarity metric. Details of the method can be found in Berchuck et al. 2018, but we provide a quick introduction. The key development is the specification of the neighborhood structure through a new definition of adjacency weights. Typically in areal data, the adjacency for two locations <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span> is defined as <span class="math inline">\(w_{ij} = 1(i \sim j)\)</span>, where <span class="math inline">\(i \sim j\)</span> is the event that locations <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span> are neighbors. As discussed in Part I, this assumption is not sufficient due to the complex anatomy of the eye. To account for this additional structure, a more general adjacency is introduced that is a function of a dissimilarity metric, <span class="math inline">\(w_{ij}(\alpha_t) = 1(i \sim j)\exp\{-z_{ij}\alpha_t\}\)</span>. Here, <span class="math inline">\(z_{ij}\)</span> is a dissimilarity metric that represents the absolute difference between the Garway-Heath angles of locations <span class="math inline">\(i\)</span> and <span class="math inline">\(j\)</span>.</p>
<p>The parameter <span class="math inline">\(\alpha_t\)</span> dictates the importance of the dissimilarity metric at each visual field exam <span class="math inline">\(t\)</span>. When <span class="math inline">\(\alpha_t\)</span> becomes large, the model reduces to an independent process, and as <span class="math inline">\(\alpha_t\)</span> goes to zero, the process becomes the standard spatial model for areal data. Based on the specification of the adjacency weights, <span class="math inline">\(\alpha_t\)</span> has a useful interpretation with respect to deterioration of visual ability. In particular, <span class="math inline">\(\alpha_t\)</span> changing over exams indicates that the neighborhood structure on the visual field is changing, which in turn implies damage to the underlying retinal ganglion cell structure. This observation motivates a diagnostic of progression that quantifies variability in <span class="math inline">\(\alpha_t\)</span> across time. We choose the coefficient of variation (CV) and demonstrate that it is a highly significant predictor of progression and, furthermore, independent of trend-based methods such as PLR.</p>
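<p>The two limiting cases of the adjacency weights can be checked numerically. The following is a minimal sketch on a made-up four-location graph; the adjacency matrix, the angles, and the division by 100 (used only to keep the numbers readable) are all assumptions for illustration.</p>
<pre class="r"><code>###Toy adjacency matrix: pairs (1,2), (1,3), (2,4), (3,4) are neighbors
W <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 1,
              1, 0, 0, 1,
              0, 1, 1, 0), 4, 4, byrow = TRUE)
angles <- c(30, 40, 200, 210) # hypothetical angles in degrees
Z <- abs(outer(angles, angles, "-")) / 100 # rescaled absolute differences z_ij
###w_ij(alpha) = 1(i ~ j) * exp(-z_ij * alpha)
adj_weights <- function(alpha) W * exp(-Z * alpha)
round(adj_weights(0), 3) # alpha = 0 recovers the binary adjacency W
round(adj_weights(1), 3) # similar neighbors about 0.905, dissimilar about 0.183
round(adj_weights(50), 3) # large alpha: all weights close to zero</code></pre>
<p>With a moderate <span class="math inline">\(\alpha_t\)</span>, neighbors with similar angles keep most of their weight while dissimilar neighbors are nearly disconnected.</p>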
</div>
<div id="navigating-the-womblr-package" class="section level2">
<h2>Navigating the <code>womblR</code> Package</h2>
<p>To make the method available to clinicians, the R package <code>womblR</code> was developed. The package provides a suite of functions that walk a user through the full process of analyzing a series of visual fields from beginning to end. The user interface was modeled after other impactful R packages for Bayesian spatial analysis, including <code>spBayes</code> and <code>CARBayes</code>. The package name combines Hadley’s naming convention for R packages (i.e., ending a package with the letter R) with the name of the author of the seminal paper on boundary detection, originally referred to as areal wombling (Womble 1951).</p>
<p>We will now walk through the process of analyzing visual field data, estimating the <span class="math inline">\(\alpha_t\)</span> parameters, and assessing progression status. The main function in <code>womblR</code> is the Spatiotemporal Boundary Detection with Dissimilarity Metric model function (<code>STBDwDM</code>). Inference for the method is obtained through Markov chain Monte Carlo (MCMC), which is a computationally intensive method that iterates between updating individual model parameters until enough posterior samples have been collected post-convergence for making accurate posterior inference. Because of the iterative nature of MCMC, the majority of computation is performed within a <code>for</code> loop, so the package is built on C++ through the packages <code>Rcpp</code> and <code>RcppArmadillo</code>. Because of the increased complexity of writing in C++, the pre- and post-processing of the model are done in <code>R</code> with the <code>for</code> loop implemented in C++. The MCMC method employed in <code>womblR</code> is a Metropolis-Hastings within Gibbs algorithm.</p>
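<p>To make the idea of Metropolis-Hastings within Gibbs concrete, here is a toy sampler written entirely in R for a much simpler model than the one in <code>womblR</code>: normally distributed data with unknown mean and standard deviation. The model, priors, tuning value, and variable names are assumptions made for this sketch and are not taken from the package.</p>
<pre class="r"><code>###Toy data: y ~ N(mu, sigma^2)
set.seed(54)
y <- rnorm(50, mean = 3, sd = 2)
n <- length(y)
n_sims <- 5000
mu <- numeric(n_sims)
log_sigma <- numeric(n_sims)
mu_cur <- 0 # starting value for mu
ls_cur <- 0 # starting value for log(sigma)
prior_var <- 100^2 # prior: mu ~ N(0, 100^2)
tune_sd <- 0.3 # Metropolis proposal standard deviation
###Log full conditional of log(sigma), assuming a flat prior on log(sigma)
log_post_ls <- function(ls, mu) sum(dnorm(y, mean = mu, sd = exp(ls), log = TRUE))
for (s in seq_len(n_sims)) {
  ###Gibbs step: mu has a closed-form normal full conditional
  sigma2 <- exp(2 * ls_cur)
  post_var <- 1 / (n / sigma2 + 1 / prior_var)
  post_mean <- post_var * sum(y) / sigma2
  mu_cur <- rnorm(1, post_mean, sqrt(post_var))
  ###Metropolis step: random walk proposal on log(sigma)
  ls_prop <- rnorm(1, ls_cur, tune_sd)
  log_ratio <- log_post_ls(ls_prop, mu_cur) - log_post_ls(ls_cur, mu_cur)
  if (log(runif(1)) < log_ratio) ls_cur <- ls_prop
  mu[s] <- mu_cur
  log_sigma[s] <- ls_cur
}
keep <- seq(1001, n_sims) # discard the first 1,000 scans as burn-in
c(mu = mean(mu[keep]), sigma = mean(exp(log_sigma[keep])))</code></pre>
<p>Parameters with a recognizable full conditional are drawn directly, while the rest are updated with an accept/reject step. A loop like this is the part that benefits from being moved to C++.</p>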
<p>Just as a quick aside, with the more recent advent of probabilistic programming, this model could have been implemented using the Hamiltonian Monte Carlo methods used in software like Stan or PyMC3. These programs do not require the derivation of full conditionals, and push the MCMC algorithm to the background. There is undoubtedly a huge market for this type of software, and it is clearly playing a significant role in the popularization of Bayesian modeling. At the same time, implementing MCMC samplers using <code>Rcpp</code> with traditional MCMC algorithms can be instructive, and for those with experience, nearly as quick of a coding experience.</p>
<p>We now begin by formatting the visual field data for analysis. According to the manual, the observed data <code>Y</code> must first be ordered spatially and then temporally. Furthermore, we will remove all locations that correspond to the natural blind spot (which, in the Humphrey Field Analyzer-II, correspond to locations 26 and 35).</p>
<pre class="r"><code>###Load package
library(womblR)
###Format data
blind_spot <- c(26, 35) # define blind spot
VFSeries <- VFSeries[order(VFSeries$Location), ] # sort by location
VFSeries <- VFSeries[order(VFSeries$Visit), ] # sort by visit
VFSeries <- VFSeries[!VFSeries$Location %in% blind_spot, ] # remove blind spot locations
Y <- VFSeries$DLS # define observed outcome data</code></pre>
<p>Now that we have assigned the observed outcomes to <code>Y</code>, we move onto the temporal variable <code>Time</code>. For visual field data, we define this to be the time from the baseline visit. We obtain the unique days from the baseline visit and scale them to be on the year scale.</p>
<pre class="r"><code>Time <- unique(VFSeries$Time) / 365 # years since baseline visit
print(Time)</code></pre>
<pre><code>## [1] 0.0000000 0.3452055 0.6520548 1.1123288 1.3808219 1.6109589 2.0712329
## [8] 2.3780822 2.5698630</code></pre>
<p>Next, we assign the adjacency matrix and dissimilarity metric (both discussed in Part I).</p>
<pre class="r"><code>W <- HFAII_Queen[-blind_spot, -blind_spot] # visual field adjacency matrix
DM <- GarwayHeath[-blind_spot] # Garway-Heath angles</code></pre>
<p>Now that we have specified the data objects <code>Y</code>, <code>DM</code>, <code>W</code>, and <code>Time</code>, we will customize the objects that characterize Bayesian MCMC methods, in particular, hyperparameters, starting values, Metropolis tuning values, and MCMC inputs. These objects have been detailed previously in the <code>womblR</code> package <a href="https://cran.r-project.org/web/packages/womblR/vignettes/womblR-example.html">vignette</a>, so we will not spend time going over their definitions. We will only note that they are each <code>list</code> objects similar to the <code>spBayes</code> package. We begin by specifying the hyperparameters.</p>
<pre class="r"><code>###Bounds for temporal tuning parameter phi
TimeDist <- abs(outer(Time, Time, "-"))
TimeDistVec <- TimeDist[lower.tri(TimeDist)]
minDiff <- min(TimeDistVec)
maxDiff <- max(TimeDistVec)
PhiUpper <- -log(0.01) / minDiff # shortest diff goes down to 1%
PhiLower <- -log(0.95) / maxDiff # longest diff goes up to 95%
###Hyperparameter object
Hypers <- list(Delta = list(MuDelta = c(3, 0, 0), OmegaDelta = diag(c(1000, 1000, 1))),
T = list(Xi = 4, Psi = diag(3)),
Phi = list(APhi = PhiLower, BPhi = PhiUpper))</code></pre>
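<p>The bounds on <code>Phi</code> are easier to read if we assume an exponential temporal correlation of the form <code>exp(-Phi * d)</code> for two visits <code>d</code> years apart, which is how the comments in the code above suggest they were chosen. Under that assumption, the sketch below confirms that the upper bound drives the correlation at the shortest gap down to 0.01 and the lower bound keeps the correlation at the longest gap at 0.95. The visit days are reconstructed from the printed <code>Time</code> values multiplied by 365.</p>
<pre class="r"><code>###Visit times in years, reconstructed from the printed output
Time <- c(0, 126, 238, 406, 504, 588, 756, 868, 938) / 365
TimeDist <- abs(outer(Time, Time, "-"))
d <- TimeDist[lower.tri(TimeDist)] # all pairwise gaps between visits
PhiUpper <- -log(0.01) / min(d)
PhiLower <- -log(0.95) / max(d)
###Implied correlations under the assumed form exp(-Phi * d)
exp(-PhiUpper * min(d)) # 0.01 at the shortest gap
exp(-PhiLower * max(d)) # 0.95 at the longest gap</code></pre>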
<p>Then we specify the starting values for the parameters, Metropolis tuning variances, and MCMC details.</p>
<pre class="r"><code>###Starting values
Starting <- list(Delta = c(3, 0, 0), T = diag(3), Phi = 0.5)
###Metropolis tuning variances
Nu <- length(Time) # calculate number of visits
Tuning <- list(Theta2 = rep(1, Nu), Theta3 = rep(1, Nu), Phi = 1)
###MCMC inputs
MCMC <- list(NBurn = 10000, NSims = 250000, NThin = 25, NPilot = 20)</code></pre>
<p>We specify that our model will run for a burn-in period of 10,000 scans, followed by 250,000 scans post burn-in. In the burn-in period there will be 20 iterations of pilot adaptation evenly spaced out over the period. The final number of samples to be used for inference will be thinned down to 10,000 based on the thinning number of 25. We can now run the MCMC sampler. Details of the various options available in the sampler can be found in the documentation, <code>help(STBDwDM)</code>.</p>
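<p>The bookkeeping behind these settings is simple arithmetic, sketched below. Which scans are retained (here, every 25th) and the exact spacing of the pilot adaptations are assumptions about how the sampler applies these inputs.</p>
<pre class="r"><code>MCMC <- list(NBurn = 10000, NSims = 250000, NThin = 25, NPilot = 20)
n_keep <- MCMC$NSims / MCMC$NThin # 10000 samples kept for inference
kept_scans <- seq(MCMC$NThin, MCMC$NSims, by = MCMC$NThin) # assumed indices of kept scans
pilot_every <- MCMC$NBurn / MCMC$NPilot # 500 burn-in scans between pilot adaptations
c(n_keep = n_keep, pilot_every = pilot_every)</code></pre>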
<pre class="r"><code>reg.STBDwDM <- STBDwDM(Y = Y, DM = DM, W = W, Time = Time,
Starting = Starting, Hypers = Hypers, Tuning = Tuning, MCMC = MCMC,
Family = "tobit",
TemporalStructure = "exponential",
Distance = "circumference",
Weights = "continuous",
Rho = 0.99,
ScaleY = 10,
ScaleDM = 100,
Seed = 54)
## Burn-in progress: |*************************************************|
## Sampler progress: 0%.. 10%.. 20%.. 30%.. 40%.. 50%.. 60%.. 70%.. 80%.. 90%.. 100%.. </code></pre>
<p>We quickly assess convergence by checking the traceplots of <span class="math inline">\(\alpha_t\)</span> (note that further MCMC convergence diagnostics should be used in practice).</p>
<pre class="r"><code>###Load coda package
library(coda)
###Convert alpha to an MCMC object
Alpha <- as.mcmc(reg.STBDwDM$alpha)
###Create traceplot
par(mfrow = c(3, 3))
for (t in 1:Nu) traceplot(Alpha[, t], ylab = bquote(alpha[.(t)]), main = bquote(paste("Posterior of " ~ alpha[.(t)])))</code></pre>
<p><img src="/post/2018-12-03-statistics-in-glaucoma-part-ii_files/figure-html/unnamed-chunk-8-1.png" width="689.28" /></p>
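<p>Beyond traceplots, the <code>coda</code> package offers numerical diagnostics such as the effective sample size and the Geweke statistic. The sketch below runs them on simulated draws that stand in for a posterior sample; the matrix <code>fake_alpha</code> is made up for illustration.</p>
<pre class="r"><code>library(coda)
set.seed(1)
###Simulated stand-in for posterior draws (rows = samples, columns = parameters)
fake_alpha <- matrix(rnorm(2000 * 3), ncol = 3,
                     dimnames = list(NULL, paste0("alpha", 1:3)))
chain <- as.mcmc(fake_alpha)
effectiveSize(chain) # effective number of independent samples per parameter
geweke.diag(chain) # z-scores comparing early and late parts of each chain</code></pre>
<p>In the analysis above, the same calls could be applied to <code>Alpha</code> in place of <code>chain</code>.</p>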
</div>
<div id="converting-mcmc-samples-into-clinical-statements" class="section level2">
<h2>Converting MCMC Samples into Clinical Statements</h2>
<p>Now we calculate the posterior distribution of the CV of <span class="math inline">\(\alpha_t\)</span> and print its moments.</p>
<pre class="r"><code>CVAlpha <- apply(Alpha, 1, function(x) sd(x) / mean(x))
plot(density(CVAlpha, adjust = 2), main = expression("Posterior of CV"~(alpha[t])), xlab = expression("CV"~(alpha[t])))</code></pre>
<p><img src="/post/2018-12-03-statistics-in-glaucoma-part-ii_files/figure-html/unnamed-chunk-9-1.png" width="50%" style="display: block; margin: auto;" /></p>
<pre class="r"><code>STCV <- c(mean(CVAlpha), sd(CVAlpha), quantile(CVAlpha, probs = c(0.025, 0.975)))
names(STCV)[1:2] <- c("Mean", "SD")
print(STCV)</code></pre>
<pre><code>## Mean SD 2.5% 97.5%
## 0.19121622 0.10205826 0.04636219 0.42744656</code></pre>
<p>For this information to be useful clinically, we convert it into a probability of progression based on a model trained on a large cohort of glaucoma patients (Berchuck et al. 2019). Because the information from <span class="math inline">\(\alpha_t\)</span> is independent of trend-based methods, we show that the optimal use of <span class="math inline">\(\alpha_t\)</span> is combining it with a basic global metric that includes the slope and p-value (and their interaction) of the overall mean at each visual field exam. The trained model coefficients are publicly available and are used below. Furthermore, the mean and standard deviation of the CV of <span class="math inline">\(\alpha_t\)</span>, along with their interaction, are included. The probability of progression can be calculated as follows.</p>
<pre class="r"><code>###Calculate the global metric slope and p-value
MeanSens <- apply(t(matrix(VFSeries$DLS, ncol = Nu)) / 10, 1, mean) # scaled mean DLS
reg.global <- lm(MeanSens ~ Time) # global regression
GlobalS <- summary(reg.global)$coef[2, 1] # global slope
GlobalP <- summary(reg.global)$coef[2, 4] # global p-value
###Obtain probability of progression using estimated parameters from Berchuck et al. 2019
input <- c(1, GlobalP, GlobalS, STCV[1], STCV[2], GlobalS * GlobalP, STCV[1] * STCV[2])
coef <- c(-1.7471655, -0.2502131, -13.7317622, 7.4746348, -8.9152523, 18.6964153, -13.3706058)
fit <- input %*% coef
exp(fit) / (1 + exp(fit))</code></pre>
<pre><code>## [,1]
## [1,] 0.4355997</code></pre>
<p>The probability of progression is calculated to be 0.44, which can be compared to the threshold cutoff for the trained model of 0.325. This cutoff for the probability of progression was determined using operating characteristics, so that the specificity was forced to be in the clinically meaningful range of 85%. Based on this derived threshold, the probability of progression is high enough to indicate that this patient’s disease shows evidence of visual field progression (which is reassuring, because we know this patient has progression as determined by clinicians).</p>
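<p>The final step is an inverse logit transformation followed by a comparison against the cutoff. The sketch below starts from the printed probability, backs out the implied linear predictor, and shows that base R’s <code>plogis</code> gives the same result as the explicit formula; the helper <code>inv_logit</code> is a name introduced here for illustration.</p>
<pre class="r"><code>inv_logit <- function(x) exp(x) / (1 + exp(x)) # formula used in the code above
fit <- log(0.4355997 / (1 - 0.4355997)) # linear predictor implied by the printed probability
all.equal(inv_logit(fit), plogis(fit)) # TRUE: the two forms agree
prob <- plogis(fit)
cutoff <- 0.325 # threshold reported for the trained model
prob > cutoff # TRUE: classified as progressing
###plogis is safer for extreme values of the linear predictor
inv_logit(1000) # NaN, because exp(1000) overflows
plogis(1000) # 1</code></pre>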
<p><code>Looking ahead:</code> The third installment will wrap up the discussion on the <code>womblR</code> package and ponder future directions for the role of statistics in glaucoma research. Furthermore, the role of open-source software in medicine will be discussed.</p>
</div>
<div id="references" class="section level2">
<h2>References</h2>
<ol style="list-style-type: decimal">
<li>Berchuck, S.I., Mwanza, J.C., & Warren, J.L. (2018). <a href="https://arxiv.org/abs/1805.11636"><em>Diagnosing Glaucoma Progression with Visual Field Data Using a Spatiotemporal Boundary Detection Method</em></a>, In press at <em>Journal of the American Statistical Association</em>.</li>
<li>Womble, W. H. (1951). <a href="http://science.sciencemag.org/content/114/2961/315"><em>Differential Systematics</em></a>. <em>Science</em>, 114(2961), 315-322.</li>
<li>Berchuck, S.I., Mwanza, J.C., Tanna, A.P., Budenz, D.L., Warren, J.L. (2019). <em>Improved Detection of Visual Field Progression Using a Spatiotemporal Boundary Detection Method</em>. In press at <em>Scientific Reports</em> (Available upon request).</li>
</ol>
</div>
Statistics in Glaucoma: Part I
https://rviews.rstudio.com/2018/12/03/statistics-in-glaucoma-part-i/
Mon, 03 Dec 2018 00:00:00 +0000
<p><em>Samuel Berchuck is a Postdoctoral Associate in Duke University’s Department of Statistical Science and Forge-Duke’s Center for Actionable Health Data Science.</em></p>
<p><em>Joshua L. Warren is an Assistant Professor of Biostatistics at Yale University.</em></p>
<div id="introduction" class="section level2">
<h2>Introduction</h2>
<p>Glaucoma is a leading cause of blindness worldwide, with a prevalence of 4% in the population aged 40-80. The disease is characterized by retinal ganglion cell death and corresponding damage to the optic nerve head. Since visual impairment caused by glaucoma is irreversible and efficient treatments exist, early detection of the disease is essential. Determining if the disease is progressing remains one of the most challenging aspects of glaucoma management, since it is difficult to distinguish true progression from variability due to natural degradation or noise. In practice, clinicians monitor progression using a multifactorial approach that relies on various measurements of the disease. In this series of blog posts, we focus on the use of visual fields. Visual field examinations obtain levels of a patient’s actual vision, and the practice is thus referred to as a functional measurement. As such, visual fields are a proxy for a patient’s quality of life, and therefore are typically prioritized in practice.</p>
</div>
<div id="visual-field-data" class="section level2">
<h2>Visual Field Data</h2>
<p>Visual fields are complex spatiotemporal data generated from an intricate anatomical system, which is important to understand for modeling purposes. To illustrate visual field data, we load an example data set from the <code>womblR</code> package on CRAN. The package <code>womblR</code> was developed specifically for analyzing visual field data, and uses a Bayesian hierarchical model that accounts for the complex nature of the data (more details will be provided in Part II). The specific data set comes from the Vein Pulsation Study Trial in Glaucoma and the Lions Eye Institute trial registry, Perth, Western Australia. We begin by loading the package.</p>
<pre class="r"><code>library(womblR)</code></pre>
<p>The data set of interest is loaded lazily and can be accessed as follows; we also view the first six rows for illustration.</p>
<pre class="r"><code>data(VFSeries)
head(VFSeries)</code></pre>
<pre><code>## Visit DLS Time Location
## 1 1 25 0 1
## 2 2 23 126 1
## 3 3 23 238 1
## 4 4 23 406 1
## 5 5 24 504 1
## 6 6 21 588 1</code></pre>
<p>The data object <code>VFSeries</code> contains a longitudinal series of visual fields for a glaucoma patient that we will use throughout the three blog posts to exemplify the study of visual fields. This patient has been determined to be progressing, based on the expertise of two clinicians. <code>VFSeries</code> has four variables: <code>Visit</code>, <code>DLS</code>, <code>Time</code>, and <code>Location</code>. The variable <code>Visit</code> represents the visual field test visit number, <code>DLS</code> the observed measure, <code>Time</code> the time of the visual field test (in days from baseline visit), and <code>Location</code> the spatial location on the visual field where the observation occurred. There are 9 visual field exams contained in this data set, and on average 117.25 days between visits.</p>
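<p>The visit count and average spacing can be verified directly from the visit days. The vector below is typed in by hand from the values shown in this series (it should match <code>unique(VFSeries$Time)</code>, though that is not checked here).</p>
<pre class="r"><code>###Days from baseline for each visit, transcribed from the printed output
days <- c(0, 126, 238, 406, 504, 588, 756, 868, 938)
length(days) # 9 visual field exams
diff(days) # gaps between consecutive visits, in days
mean(diff(days)) # 117.25 days between visits on average</code></pre>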
<p>To help visualize the data frame, we can use the <code>PlotVfTimeSeries</code> function, which plots a patient’s observed visual field sensitivity over time at each location on the visual field.</p>
<pre class="r"><code>PlotVfTimeSeries(Y = VFSeries$DLS,
Location = VFSeries$Location,
Time = VFSeries$Time,
main = "Visual field sensitivity time series \n at each location",
xlab = "Days from baseline visit",
ylab = "Differential light sensitivity (dB)",
line.reg = FALSE)</code></pre>
<p><img src="/post/2018-11-19-statistics-in-glaucoma-part-i_files/figure-html/unnamed-chunk-3-1.png" width="528" style="display: block; margin: auto;" /></p>
<p>The above figure demonstrates the visual field from a Humphrey Field Analyzer-II (HFA-II) testing machine, which generates 54 spatial locations (only 52 informative locations; note the 2 blank spots corresponding to the blind spot). The visual field map is constructed by assessing a patient’s response to varying levels of light. Patients are instructed to focus on a central fixation point as light is introduced randomly in a preceding manner over a grid on the visual field. As light is observed, the patient presses a button and the current light intensity is recorded. The process is repeated until the entire visual field is tested. The light intensity is measured in differential light sensitivity (DLS), which quantifies the difference in the HFA-II background and observed light intensity. Smaller values indicate worsening vision.</p>
</div>
<div id="spatial-anatomy-on-the-visual-field" class="section level2">
<h2>Spatial Anatomy on the Visual Field</h2>
<p>The spatial surface of the visual field is observed on a lattice (i.e., uniform areal data); however, it is a complex projection of the underlying optic nerve head and exhibits anatomically induced spatial dependencies. In particular, localized damage to the optic disc can result in clinically deterministic deterioration across the visual field. Incorporating this non-standard spatial dependence structure into our methodology is a priority for properly analyzing these data, although it is commonly ignored. Translating this into math lingo, this means that a naive modeling of the spatial surface of the visual field would be inappropriate (i.e., neighbors defined through adjacent locations). Instead, the definition of a neighbor when considering vision loss on the visual field must depend on the underlying anatomical proximities.</p>
<p>To illustrate this concept, we begin by displaying the visual field neighborhood structure. The adjacency matrix for the HFA-II is available in the <code>womblR</code> package. In this analysis, we use a queen specification, meaning that an adjacency is defined as any location that shares an edge or corner on the lattice. We now load this adjacency matrix and remove the two locations that correspond to the blind spot.</p>
<pre class="r"><code>blind_spot <- c(26, 35) # define blind spot
W <- HFAII_Queen[-blind_spot, -blind_spot] # HFA-II visual field adjacency matrix</code></pre>
<p>This adjacency structure can be displayed using the <code>graph.adjacency</code> function in the <code>igraph</code> package.</p>
<pre class="r"><code>library(igraph)
adj.graph <- graph.adjacency(W, mode = "undirected")
plot(adj.graph)</code></pre>
<p><img src="/post/2018-11-19-statistics-in-glaucoma-part-i_files/figure-html/unnamed-chunk-5-1.png" width="528" style="display: block; margin: auto;" /></p>
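<p>The queen specification used for <code>W</code> can be illustrated on a small regular lattice. The sketch below builds queen and rook adjacency matrices for a hypothetical 3 x 3 grid in base R; it is not how <code>HFAII_Queen</code> was constructed, only a demonstration of the definition.</p>
<pre class="r"><code>###Hypothetical 3 x 3 lattice; location 5 is the center cell
n_row <- 3
n_col <- 3
grid <- expand.grid(row = seq_len(n_row), col = seq_len(n_col))
row_gap <- abs(outer(grid$row, grid$row, "-"))
col_gap <- abs(outer(grid$col, grid$col, "-"))
###Queen: cells sharing an edge or a corner
W_queen <- (pmax(row_gap, col_gap) == 1) * 1
###Rook: cells sharing an edge only
W_rook <- (row_gap + col_gap == 1) * 1
rowSums(W_queen) # corners have 3 neighbors, edges 5, center 8
rowSums(W_rook) # corners have 2 neighbors, edges 3, center 4</code></pre>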
<p>As mentioned above, naively assuming that all of these adjacencies are equal ignores the important underlying anatomy that enforces these dependencies. This anatomical relationship of the visual field test points and the underlying optic nerve head was studied by Garway-Heath et al. (2000), in which they estimated the angle at which each test location’s underlying retinal ganglion cells enter the optic disc, measured in degrees. These angles are the missing link that will allow the visual field adjacency structure to be dictated by the underlying anatomy. These angles can be visualized using the function <code>PlotAdjacency</code> from <code>womblR</code>, which displays neighborhood structures across the visual field. Before using this function, we need to load the angles measured in Garway-Heath et al. (2000). These are available from <code>womblR</code>; again, we remove the blind spot before using.</p>
<pre class="r"><code>Angles <- GarwayHeath[-blind_spot] # Garway-Heath angles
summary(Angles)</code></pre>
<pre><code>## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.00 80.75 192.50 177.35 275.75 329.00</code></pre>
<p>We are now ready to visualize the neighborhood structure of the visual field using the <code>PlotAdjacency</code> function.</p>
<pre class="r"><code>###Plot the angles on the visual field
PlotAdjacency(W = W,
DM = Angles,
zlim = c(0, 180),
Visit = NA,
edgewidth = 3.75,
cornerwidth = 0.33,
lwd.border = 3.75,
main = "Garway-Heath angles\n across the visual field")</code></pre>
<p><img src="/post/2018-11-19-statistics-in-glaucoma-part-i_files/figure-html/unnamed-chunk-7-1.png" width="528" style="display: block; margin: auto;" /></p>
<p>The angles measured by Garway-Heath et al. are presented at each location on the visual field. More interestingly, the distances between these angles are presented for each of the neighbor pairs. This figure is equivalent to the adjacency plot displayed above, but allows the adjacencies to vary as a function of the anatomy. In particular, if two visual field locations are anatomically similar, the dependency is strengthened (i.e., more white), and if the locations are close to anatomically independent, the dependency is weaker (i.e., more black). Here the edge adjacencies are represented by lines, while the diagonal adjacencies are represented as two triangles. This view of the visual field highlights the importance of anatomy in modeling visual field data, as neighboring locations can have underlying retinal ganglion cells that enter the optic nerve head with a large degree of separation. In particular, locations on either side of the equator, although adjacent, are close to anatomically independent.</p>
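<p>The plotting range of 0 to 180 degrees suggests that the distance between two angles is measured along the shorter arc of the circle. The helper below, <code>angle_dist</code>, is a hypothetical function written under that assumption; it is not part of <code>womblR</code>.</p>
<pre class="r"><code>###Shorter-arc distance between two angles in degrees (assumed definition)
angle_dist <- function(a, b) {
  d <- abs(a - b) %% 360
  pmin(d, 360 - d)
}
angle_dist(11, 329) # 42: the two angles are close once the circle wraps around
angle_dist(80, 275) # 165: nearly opposite sides of the optic disc</code></pre>
<p>Under this definition, no pair of locations can be more than 180 degrees apart.</p>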
</div>
<div id="how-to-model-visual-field-data" class="section level2">
<h2>How to Model Visual Field Data?</h2>
<p>If you have gotten this far in the post, hopefully you have the sense that the study of visual field data is statistically interesting and clinically important for properly assessing a glaucoma patient’s risk of vision loss. In the next two blog posts, we will explore how visual field data are currently analyzed and new methods that account for the anatomical structure detailed above. To accomplish this, we will break down the algorithm and software used to build the <code>womblR</code> package, and will attempt to illustrate the importance of R packages for open-source clinical research.</p>
</div>
<div id="reference" class="section level2">
<h2>Reference</h2>
<ol style="list-style-type: decimal">
<li>Garway-Heath, David F., Darmalingum Poinoosawmy, Frederick W. Fitzke, and Roger A. Hitchings. “Mapping the visual field to the optic disc in normal tension glaucoma eyes” <em>Ophthalmology</em> 107, no. 10 (2000): 1809-1815.</li>
</ol>
</div>
October 2018: “Top 40” New Packages
https://rviews.rstudio.com/2018/11/29/october-2018-top-40-new-packages/
Thu, 29 Nov 2018 00:00:00 +0000
<p>One hundred eighty-five new packages made it to CRAN in October. Here are my picks for the “Top 40” in eight categories: Computational Methods, Data, Machine Learning, Medicine, Science, Statistics, Utilities, and Visualization.</p>
<h3 id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=compboost">compboost</a> v0.1.0: Provides a C++ implementation of component-wise boosting written to obtain high run-time performance and full memory control. The <a href="https://cran.r-project.org/web/packages/compboost/vignettes/compboost.html">vignette</a> shows how to use the package.</p>
<p><a href="https://cran.r-project.org/package=RcppEnsmallen">RcppEnsmallen</a> v0.1.10.0.1: Implements an interface to the C++ based <a href="http://ensmallen.org/">Ensmallen</a> mathematical optimization library that provides a simple set of abstractions for writing an objective function to optimize. Optimizers include full-batch gradient descent techniques, small-batch techniques, gradient-free optimizers, and constrained optimization.</p>
<p><a href="https://cran.r-project.org/package=SAMCpack">SAMCpack</a> v0.1.1: Implements Stochastic Approximation Monte Carlo (SAMC) samplers capable of sampling from multimodal or doubly intractable distributions. See <a href="doi:10.1002/9780470669723">Liang et al (2010)</a> for a complete introduction to the method, and the <a href="https://cran.r-project.org/package=SAMCpack">vignette</a> for an introduction to the package.</p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=crimedata">crimedata</a> v0.1.0: Provides access to publicly available, police-recorded open crime data from large cities in the United States that are included in the <a href="https://osf.io/zyaqn/">Crime Open Database</a>.</p>
<p><a href="https://cran.r-project.org/web/packages/nasapower/index.html">nasapower</a> v1.02: Implements an interface to <a href="https://power.larc.nasa.gov/"><code>POWER</code> (Prediction Of Worldwide Energy Resource)</a>, NASA’s global meteorology, surface solar energy, and climatology data API. Look <a href="https://ropensci.github.io/nasapower/">here</a> for a quick start.</p>
<p><a href="https://cran.r-project.org/package=wikisourcer">wikisourcer</a> v0.1.1: Provides access to public domain works from <a href="https://wikisource.org/">Wikisource</a>, a free library from the Wikimedia Foundation project. See the <a href="https://cran.r-project.org/web/packages/wikisourcer/vignettes/wikisourcer.html">vignette</a> for a package tutorial.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/wikisourcer.png" height = "400" width="600"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=gcForest">gcForest</a> v0.2.7: Provides an API interface to the <a href="https://github.com/pylablanche/gcForest">Python implementation</a> of Deep Forest, an alternative to Deep Learning. The algorithm is described in <a href="arXiv:1702.08835v2">Zhou and Feng (2017)</a>, and there is a brief package <a href="https://cran.r-project.org/web/packages/gcForest/vignettes/gcForest-docs.html">tutorial</a>.</p>
<p><a href="https://cran.r-project.org/package=galgo">galgo</a> v1.4: Allows users to build multivariate predictive models from large data sets having a far larger number of features than samples, such as in functional genomics data sets. See <a href="doi:10.1093/bioinformatics/btl074">Trevino and Falciani (2006)</a> for details.</p>
<p><a href="https://cran.r-project.org/package=MachineShop">MachineShop</a> v0.2.0: Provides a common interface for machine learning model fitting, prediction, performance assessment, and presentation of results. There is an <a href="https://cran.r-project.org/web/packages/MachineShop/vignettes/Introduction.html">Introduction</a> and a note on <a href="https://cran.r-project.org/web/packages/MachineShop/vignettes/MLModels.html">Implementation Conventions</a>.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/MachineShop.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=mlflow">mlflow</a> v0.8.0: Provides an interface to <a href="https://mlflow.org/"><code>MLflow</code></a>, an open-source platform for the complete machine learning life cycle that supports installation, tracking experiments, running projects, and saving models.</p>
<p><a href="https://cran.r-project.org/package=sboost">sboost</a> v0.1.0: Provides a fast, C++-based implementation of Freund and Schapire’s Adaptive Boosting (AdaBoost) algorithm, and includes methods for classifier assessment, predictions, and cross-validation.</p>
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=CoRpower">CoRpower</a> v1.0.0: Provides functions to calculate power for assessment of intermediate biomarker responses as correlates of risk in the active treatment group in clinical efficacy trials, as described in <a href="https://www.ncbi.nlm.nih.gov/pubmed/27037797">Gilbert et al. (2016)</a>. The <a href="https://cran.r-project.org/web/packages/CoRpower/vignettes/CoRpower.html">vignette</a> demonstrates the math.</p>
<p><a href="https://cran.r-project.org/package=radtools">radtools</a> v1.0.0: Provides a collection of utilities for navigating medical image data in DICOM and NIfTI formats. An emphasis on metadata allows simple conversion of image metadata to familiar R data structures, such as lists and data frames. The <a href="https://cran.r-project.org/web/packages/radtools/vignettes/radtools_usage.html">vignette</a> shows how to use the package.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/radtools.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=rpact">rpact</a> v1.0.0: Provides functions for designing and analyzing confirmatory adaptive clinical trials with continuous, binary, and survival endpoints according to the methods described in the monograph by <a href="https://doi.org/10.1007/978-3-319-32562-0">Wassmer and Brannath (2016)</a>. Look <a href="https://www.rpact.org/">here</a> for an overview.</p>
<h3 id="science">Science</h3>
<p><a href="https://cran.r-project.org/package=ClimProjDiags">ClimProjDiags</a> v0.0.1: Provides functions for computing metrics and indices for climate analysis, comparing models, and combining them into ensembles. There are vignettes on <a href="https://cran.r-project.org/web/packages/ClimProjDiags/vignettes/anomaly_agreement.html">anomaly agreement</a>, <a href="https://cran.r-project.org/web/packages/ClimProjDiags/vignettes/diurnaltemp.html">diurnal temperatures</a>, <a href="https://cran.r-project.org/web/packages/ClimProjDiags/vignettes/extreme_indices.html">extreme indices</a>, and <a href="https://cran.r-project.org/web/packages/ClimProjDiags/vignettes/heatcoldwaves.html">heat and cold wave duration</a>.</p>
<p><a href="https://cran.r-project.org/package=DEVis">DEVis</a> v1.0.0: Provides a comprehensive tool set for data aggregation, visual analytics, exploratory analysis, and project management that builds upon the Bioconductor <a href="http://bioconductor.org/packages/release/bioc/html/DESeq2.html">DESeq2</a> differential expression package. The <a href="https://cran.r-project.org/web/packages/DEVis/vignettes/DEVis_vignette.pdf">vignette</a> offers a comprehensive introduction.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/DEVis.png" height = "300" width="400"></p>
<p><a href="https://cran.r-project.org/package=epimdr">epimdr</a> v0.6-1: Provides functions for studying epidemics, including the <a href="http://www.public.asu.edu/~hnesse/classes/seir.html">S(E)IR model</a>, time-series SIR and chain-binomial stochastic models, catalytic disease models, and coupled map lattice models. It is a companion to the book <a href="https://www.springer.com/gp/book/9783319974866">Epidemics: Models and Data in R</a> and the Coursera course <a href="https://www.coursera.org/learn/epidemics">Epidemics Massive Online Open Course</a>.</p>
<p><a href="https://cran.r-project.org/package=firebehavioR">firebehavioR</a> v0.1.1: Implements fire behavior prediction models, including those documented in <a href="https://doi.org/10.2737/RMRS-RP-29">Scott & Reinhardt (2001)</a> and <a href="https://doi.org/10.1016/j.foreco.2006.08.174">Alexander et al. (2006)</a>. The <a href="https://cran.r-project.org/web/packages/firebehavioR/vignettes/firebehavioR.html">vignette</a> is informative.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/firebehavioR.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=lorentz">lorentz</a> v1.0.0: Provides the functionality to work with Lorentz transforms and the gyrogroup structure in <a href="https://en.wikipedia.org/wiki/Special_relativity">Special Relativity</a>.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/lorentz.png" height = "300" width="400"></p>
<p><a href="https://cran.r-project.org/package=pubchunks">pubchunks</a> v0.1.0: Provides functions for extracting chunks of XML from scholarly articles without having to know how to work with XML. See <a href="https://cran.r-project.org/web/packages/pubchunks/readme/README.html">README</a> to get going.</p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=BayesMallows">BayesMallows</a> v0.1.1: Implements the Bayesian version of the Mallows rank model (<a href="http://jmlr.org/papers/v18/15-481.html">Vitelli et al. (2018)</a>). The <a href="https://cran.r-project.org/web/packages/BayesMallows/vignettes/BayesMallowsPackage.html">vignette</a> provides the details.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/BayesMallows.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=contextual">contextual</a> v0.9.1: Facilitates the simulation and evaluation of context-free and contextual multi-armed bandit policies or algorithms to ease the implementation, evaluation, and dissemination of both existing and new bandit algorithms and policies. See the <a href="https://cran.r-project.org/web/packages/contextual/vignettes/contextual.html">Getting Started Guide</a> and this <a href="https://cran.r-project.org/web/packages/contextual/vignettes/posts.html">list of posts</a> for more information.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/contextual.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=coxrt">coxrt</a> v1.0.0: Implements Cox Proportional Hazards regression for right-truncated data. The <a href="https://cran.r-project.org/web/packages/coxrt/vignettes/coxrt-vignette.html">vignette</a> gives the details.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/coxrt.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=crossrun">crossrun</a> v0.1.0: Estimates the joint distribution of number of crossings and the longest run in a series of independent Bernoulli trials. There is a <a href="https://cran.r-project.org/web/packages/crossrun/vignettes/vignettecrossrun.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=logisticRR">logisticRR</a> v0.2.0: Asserting that relative risk is often of interest in public health, this package provides functions to return adjusted relative risks from a logistic regression model in the presence of potential confounders. The <a href="https://cran.r-project.org/web/packages/logisticRR/vignettes/logisticRR.html">vignette</a> does the math.</p>
<p><a href="https://cran.r-project.org/package=lognorm">lognorm</a> v0.1.3: Estimates the distribution parameters and computes moments and other basic statistics of the lognormal distribution (see <a href="https://doi.org/10.1641/0006-3568(2001)051[0341:lndats]2.0.co;2">Limpert et al. (2001)</a>), and also provides an approximation to the distribution of the sum of several correlated lognormally distributed variables (see <a href="https://doi.org/10.12988/ams.2013.39511">Lo (2013)</a>). There is a vignette on <a href="https://cran.r-project.org/web/packages/lognorm/vignettes/aggregateCorrelated.html">Aggregating Correlated Random Variables</a> and another on <a href="https://cran.r-project.org/web/packages/lognorm/vignettes/lognormalSum.html">Approximating Sums</a>.</p>
<p><a href="https://cran.r-project.org/package=lolog">lolog</a> v1.1: Provides functions to estimate Latent Order Logistic (LOLOG) models for networks, along with visual diagnostics and goodness-of-fit metrics. See <a href="https://arxiv.org/abs/1804.04583">Fellows (2018)</a> for a detailed description of the methods. One vignette works through an <a href="https://cran.r-project.org/web/packages/lolog/vignettes/lolog-ergm.pdf">example</a>, and another introduces <a href="https://cran.r-project.org/web/packages/lolog/vignettes/lolog-introduction.pdf">lolog models</a>.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/lolog.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=matrixNormal">matrixNormal</a> v0.0.0: Provides the functions to compute densities, probabilities, and random deviates of the Matrix Normal distribution. See <a href="https://doi.org/10.7508/ijmsi.2010.02.004">Iranmanesh et al. (2010)</a>.</p>
<p><a href="https://cran.r-project.org/package=outcomerate">outcomerate</a> v1.0.1: Implements standardized survey outcome rate functions, including the response rate, contact rate, cooperation rate, and refusal rate that allow researchers to measure the quality of survey data using standards published by the <a href="https://www.aapor.org/">American Association of Public Opinion Research</a>. For details, see <a href="https://www.aapor.org/Standards-Ethics/Standard-Definitions-(1).aspx">AAPOR (2016)</a>. The vignette provides an <a href="https://cran.r-project.org/web/packages/outcomerate/vignettes/intro-to-outcomerate.html">Introduction</a>.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/outcomerate.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=parmsurvfit">parmsurvfit</a> v0.0.1: Fits right-censored data to a given parametric distribution, and produces summary statistics, hazard, cumulative hazard and probability plots, and the Anderson-Darling test statistic. There is a <a href="https://cran.r-project.org/web/packages/parmsurvfit/vignettes/parmsurvfit_vig.html">vignette</a>.</p>
<p><a href="https://CRAN.R-project.org/package=ppgmmga">ppgmmga</a> v1.0.1: Implements a Projection Pursuit algorithm for dimension reduction based on Gaussian Mixture Models. The <a href="https://cran.r-project.org/web/packages/ppgmmga/vignettes/ppgmmga.html">vignette</a> provides a quick tour of the package.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/ppgmmga.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=RcppDist">RcppDist</a> v0.1.1: Provides additional statistical distributions that can be called from C++ when writing code using Rcpp or RcppArmadillo. See the <a href="https://cran.r-project.org/web/packages/RcppDist/vignettes/RcppDist.pdf">vignette</a> for a list of the distributions supported.</p>
<p><a href="https://cran.r-project.org/package=simstandard">simstandard</a> v0.2.0: Enables the creation of simulated data from structural equation models with standardized loading. The <a href="https://cran.r-project.org/web/packages/simstandard/vignettes/simstandard_tutorial.html">vignette</a> shows how to use the package.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/simstandard.png" height = "300" width="400"></p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=carrier">carrier</a> v0.1.0: Enables users to create functions that are isolated from their environment. These isolated functions, also called crates, print at the console with their total size and can be easily tested locally before being sent to a remote.</p>
<p><a href="https://cran.r-project.org/package=carbonate">carbonate</a> v0.1.0: Implements an interface to <a href="https://carbon.now.sh/about">carbon.js</a>, which allows developers to create images of source code. There is a vignette on <a href="https://cran.r-project.org/web/packages/carbonate/vignettes/tests_and_coverage.html">Tests and Coverage</a>.</p>
<p><a href="https://cran.r-project.org/package=generics">generics</a> v0.0.1: In order to reduce potential package dependencies and conflicts, <code>generics</code> provides a number of commonly used S3 generics.</p>
<p><a href="https://cran.r-project.org/package=REPLesentR">REPLesentR</a> v0.3.0: Allows users to create presentations and display them inside the R <code>REPL</code> (console). Supports <code>RMarkdown</code> and other text formats.</p>
<p><a href="https://cran.r-project.org/package=stationery">stationery</a> v0.98.5.5: Provides templates, guides, and scripts for writing documents in <code>LaTeX</code> and <code>R markdown</code> to produce guides, slides, and reports; and includes several vignettes to assist new users of literate programming. There is an <a href="https://cran.r-project.org/web/packages/stationery/vignettes/stationery.pdf">Overview</a>, a vignette on <a href="https://cran.r-project.org/web/packages/stationery/vignettes/Rmarkdown.pdf">R Markdown Basics</a>, another on <a href="https://cran.r-project.org/web/packages/stationery/vignettes/HTML_special_features.html">R Markdown HTML</a>, and a comparison between <a href="https://cran.r-project.org/web/packages/stationery/vignettes/code_chunks.pdf">Sweave and Knitr code chunks</a>.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=balance">balance</a> v0.1.6: Provides an alternative scheme for visualizing balances (used in <a href="https://en.wikipedia.org/wiki/Compositional_data">compositional data analysis</a>) as described in <a href="https://doi.org/10.12688/f1000research.15858.1">Quinn (2018)</a>, as well as a method for principal balance analysis. See the <a href="https://cran.r-project.org/web/packages/balance/vignettes/balance.html">vignette</a> for details.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/balances.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=trelliscopejs">trelliscopejs</a> v0.1.14: Provides methods that make it easy to create a Trelliscope display specification for TrelliscopeJS, including high-level functions for creating displays from within <code>dplyr</code> or <code>ggplot2</code> workflows. There is a vignette on <a href="https://hafen.github.io/trelliscopejs/#trelliscope">trelliscope Documentation</a> and a <a href="https://cran.r-project.org/web/packages/trelliscopejs/vignettes/rd.html">trelliscope Package Reference</a>.</p>
<p><img src="/post/2018-11-19-Rickert-OctTop40_files/trelliscopejs.png" height = "400" width="600"></p>
Slack and Plumber, Part Two
https://rviews.rstudio.com/2018/11/27/slack-and-plumber-part-two/
Tue, 27 Nov 2018 00:00:00 +0000https://rviews.rstudio.com/2018/11/27/slack-and-plumber-part-two/
<p>This is the final entry in a three-part series about the <a href="https://www.rplumber.io/"><code>plumber</code></a> package. <a href="https://rviews.rstudio.com/2018/08/30/slack-and-plumber-part-one/">The first post</a> introduces <code>plumber</code> as an R package for building REST API endpoints in R. <a href="https://rviews.rstudio.com/2018/08/30/slack-and-plumber-part-one/">The second post</a> builds a working example of a <code>plumber</code> API that powers a <a href="https://api.slack.com/slash-commands">Slack slash command</a>. In this final entry, we will secure the API created in the previous post so that it only responds to authenticated requests, and deploy it using <a href="https://www.rstudio.com/products/connect/">RStudio Connect</a>.</p>
<div class="figure">
<img src="/post/2018-11-20-blair-plumber-slack-part-two-files/plumber-slack-demo.gif" />
</div>
<p>As a reminder, this API is built on top of simulated customer call data. The slash command we create will allow users to view a customer status report within Slack. This status report contains customer name, total calls, date of birth, and a plot of call history for the past 20 weeks. The simulated data, along with the script used to create it, can be found in the <a href="https://github.com/sol-eng/plumber-slack">GitHub repository</a> for this example.</p>
<div id="setup" class="section level2">
<h2>Setup</h2>
<p>Successfully following this example assumes you have created a <a href="https://slack.com">Slack</a> account and you have <a href="https://api.slack.com/slack-apps">followed the instructions for creating an app</a>. The Plumber API as it currently exists is described in detail in the <a href="https://rviews.rstudio.com/2018/08/30/slack-and-plumber-part-one/">previous post</a>.</p>
<p>This API can be run through the UI as previously described, or by running <code>plumber::plumb("plumber.R")$run(port = 5762)</code> from the directory containing the API defined in <code>plumber.R</code>. As it stands now, this API could be deployed and used by Slack. However, it’s important to remember that we have no control over the request that Slack makes to the API. Because of this, we can’t rely on RStudio Connect’s <a href="http://docs.rstudio.com/connect/admin/content-management.html#api-keys">built-in API authentication mechanism</a> to secure the API because there is no way to submit a key with the request. Our options are either to expose the API with no security, meaning anyone can access the endpoints we’ve defined, or to find some other mechanism for securing the API so that it only responds to authorized requests.</p>
</div>
<div id="api-security-patterns" class="section level2">
<h2>API Security Patterns</h2>
<p>The <a href="https://www.rplumber.io/docs/security.html"><code>plumber</code> documentation</a> provides a good introduction to API security for the R user:</p>
<blockquote>
<p>The majority of R programmers have not been trained to give much attention to the security of the code that they write. This is for good reason since running R code on your own machine with no external input gives little opportunity for attackers to leverage your R code to do anything malicious. However, as soon as you expose an API on a network, your concerns and thought process must adapt accordingly.</p>
</blockquote>
<p>API security can be challenging to address. As it stands today, it is the developer’s responsibility to provide proper security on API endpoints, though in the future, there may be additional security features added to <code>plumber</code> or available via other R packages.</p>
<p>As mentioned in the <a href="https://www.rplumber.io/docs/security.html"><code>plumber</code> documentation</a>, there are a number of things to consider when designing API security. For example, if the API is deployed on an internal network, securing the API may not be as important as it would be if the API was publicly exposed on the internet. When an API needs to be secured, there are several potential attack vectors that need to be handled. In this specific example, we are exposing a public endpoint that provides access to sensitive customer data. If we are unable to authenticate incoming requests, then we risk exposing sensitive data. To prevent this data from falling into the wrong hands, we will focus on verifying incoming requests so that the API only responds to requests made from Slack.</p>
<p>There are several different methods for authenticating requests made to API endpoints. One common method is the use of API keys, which are cryptographically secure values sent with the request to verify the identity of the client. However, in this case, we have no control over the request Slack sends, so we cannot include such a key in the request. Thankfully, Slack has provided an alternative authentication method using <a href="https://api.slack.com/docs/verifying-requests-from-slack">signed secrets</a>. Full details can be read in the Slack documentation, but in essence, each Slack application is assigned a unique secret value that, when used in connection with other request details, can be used to verify that an incoming request is indeed coming from Slack and not an unknown third party.</p>
</div>
<div id="securing-the-api" class="section level2">
<h2>Securing the API</h2>
<p>In order to secure our API so that only requests from Slack are honored, we first need to obtain the signing secret for our application. This value can be found in the Basic Information section of the Slack application settings. Now, it is important to remember that this is called a signing secret for a reason: it should not be shared with anyone. To avoid exposing this secret, we can save it as an environment variable. We add this in a current R session by using <code>Sys.setenv(SLACK_SIGNING_SECRET = <our signing secret>)</code>, or we can add it to our <a href="https://csgillespie.github.io/efficientR/set-up.html#renviron"><code>.Renviron</code></a> file so that it is set for every R session. Once this is done, we can access this value in R using <code>Sys.getenv("SLACK_SIGNING_SECRET")</code>. Now we are ready to create a function to verify if incoming requests are from Slack.</p>
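<p>As a minimal sketch of that setup, the snippet below sets a placeholder value and reads it back with a guard. The helper name <code>get_signing_secret()</code> and the value <code>"not-a-real-secret"</code> are hypothetical and only serve as illustration; a real secret belongs in <code>.Renviron</code>, not in source code.</p>
<pre class="r"><code># Hypothetical placeholder value; never hard-code a real signing secret
Sys.setenv(SLACK_SIGNING_SECRET = "not-a-real-secret")

# Sys.getenv() returns "" when a variable is not set, so nzchar() is a
# convenient guard to run before the API starts handling requests
get_signing_secret <- function() {
  secret <- Sys.getenv("SLACK_SIGNING_SECRET")
  if (!nzchar(secret)) {
    stop("SLACK_SIGNING_SECRET is not set", call. = FALSE)
  }
  secret
}

secret <- get_signing_secret()</code></pre>
<p>Failing early with a clear error is one possible design choice: without the guard, a missing secret would silently become an empty string.</p>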
<p>Slack provides the following three-step process for verifying requests:</p>
<ul>
<li>Your app receives a request from Slack</li>
<li>Your app computes a signature based on the request</li>
<li>You make sure the computed signature matches the signature on the request</li>
</ul>
<p>In order to verify all incoming requests, we can define an additional filter for our API that follows the above recipe.</p>
<pre class="r"><code>#* Verify incoming requests
#* @filter verify
function(req, res) {
  # Forward requests coming to swagger endpoints
  if (grepl("swagger", tolower(req$PATH_INFO))) return(forward())
  # Check for X_SLACK_REQUEST_TIMESTAMP header
  if (is.null(req$HTTP_X_SLACK_REQUEST_TIMESTAMP)) {
    res$status <- 401
  }
  # Build base string
  base_string <- paste(
    "v0",
    req$HTTP_X_SLACK_REQUEST_TIMESTAMP,
    req$postBody,
    sep = ":"
  )
  # Slack Signing secret is available as environment variable
  # SLACK_SIGNING_SECRET
  computed_request_signature <- paste0(
    "v0=",
    openssl::sha256(base_string, Sys.getenv("SLACK_SIGNING_SECRET"))
  )
  # If the computed request signature doesn't match the signature provided in the
  # request, set status of response to 401
  if (!identical(req$HTTP_X_SLACK_SIGNATURE, computed_request_signature)) {
    res$status <- 401
  } else {
    res$status <- 200
  }
  if (res$status == 401) {
    list(
      text = "Error: Invalid request"
    )
  } else {
    forward()
  }
}</code></pre>
<p>There are a lot of moving pieces to this filter, but essentially we are following the process outlined by Slack for verifying requests. We also allow Swagger endpoints to be served without verification so that the Swagger UI can still be generated for our API.</p>
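<p>To see the signature step on its own, outside of <code>plumber</code>, here is a small sketch that computes the HMAC-SHA256 signature for a request and shows that changing the body changes the signature. The function name <code>slack_signature()</code>, the secret, the timestamp, and the request bodies are all made-up values for illustration, not real Slack data.</p>
<pre class="r"><code>library(openssl)

# Hypothetical helper: build the "v0:timestamp:body" base string and sign it
# with HMAC-SHA256, using the signing secret as the key
slack_signature <- function(timestamp, body, secret) {
  base_string <- paste("v0", timestamp, body, sep = ":")
  paste0("v0=", sha256(base_string, key = secret))
}

# Made-up values standing in for a real request
secret <- "not-a-real-secret"
timestamp <- "1543276800"
body <- "command=%2Fcs&text=89"

# Signature as the sender would compute it
sent_signature <- slack_signature(timestamp, body, secret)

# The receiver recomputes the signature from what it actually received
valid <- identical(sent_signature, slack_signature(timestamp, body, secret))
tampered <- identical(
  sent_signature,
  slack_signature(timestamp, "command=%2Fcs&text=90", secret)
)</code></pre>
<p>Here <code>valid</code> is <code>TRUE</code> and <code>tampered</code> is <code>FALSE</code>: without the secret, a third party cannot produce a matching signature for a modified body.</p>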
<p>Once this filter is in place, all incoming requests will be verified. However, this will create issues with our <code>/plot/history/</code> endpoint since it is called using a standard GET request without any Slack authentication. To keep this endpoint usable, we’ll make some small updates to the endpoint and add <code>#* @preempt verify</code> to the <code>plumber</code> comments before the function. This prevents the <code>verify</code> filter from applying to this endpoint.</p>
<p>Now, this prevents the Slack authentication process from applying to our plot endpoint. However, this endpoint, if left unsecured, provides unfiltered access to sensitive customer data. We need an effective way to secure this endpoint so that it only responds to requests generated from Slack.</p>
<p>Since the only thing we control in the request to this endpoint is the URL, we can update our endpoint so that an encrypted parameter is passed as part of the URL. This parameter is a combination of the current datetime and the customer ID that is then encrypted using our Slack signing secret. We can use the <code>encrypt_string()</code> function from the <a href="https://talegari.github.io/safer/"><code>safer</code></a> package to securely encrypt this string. The following example illustrates this process.</p>
<pre class="r"><code>current_time <- Sys.time()
customer_id <- 89
parameter_string <- paste(current_time, customer_id, sep = ";")
safer::encrypt_string(parameter_string, Sys.getenv("SLACK_SIGNING_SECRET"))</code></pre>
<pre><code>## [1] "m7NfMZfpY1n5EuivjuiFQsyKopT68HiX+NIgk5S+VBlDHrVqzRM="</code></pre>
<p>Once we have created this encrypted value, we pass it to the URL of our plot endpoint. Then, within the plot endpoint, we decrypt the string, extract the customer ID, and check to see if the current time is within five seconds of the time encoded in the string. If more than five seconds have passed, we consider the request to be unauthorized. To help with this process, we define two helper functions:</p>
<pre class="r"><code>encrypt_string <- function(string) {
  urltools::url_encode(safer::encrypt_string(paste(Sys.time(), string, sep = ";"),
                                             key = Sys.getenv("SLACK_SIGNING_SECRET")))
}
plot_auth <- function(endpoint, time_limit = 5) {
  # Save current time to compare against endpoint time value
  current_time <- Sys.time()
  # Try to decrypt endpoint and extract user id
  tryCatch({
    # Decrypt endpoint using SLACK_SIGNING_SECRET
    decrypted_endpoint <- safer::decrypt_string(endpoint,
                                                key = Sys.getenv("SLACK_SIGNING_SECRET"))
    # Split endpoint on ;
    endpoint_split <- unlist(strsplit(decrypted_endpoint, split = ";"))
    # Convert time
    endpoint_time <- as.POSIXct(endpoint_split[1])
    # Calculate time difference
    time_diff <- difftime(current_time, endpoint_time, units = "secs")
    # If more than time_limit seconds (5 by default) have passed since the
    # request was generated, then error
    if (time_diff > time_limit) {
      "Unauthorized"
    } else {
      endpoint_split[2]
    }
  },
  error = function(e) "Unauthorized"
  )
}</code></pre>
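<p>The freshness check is easier to follow with the encryption stripped away. The sketch below uses base R only and works on the plain <code>time;id</code> string. The names <code>make_token()</code> and <code>check_token()</code> are hypothetical, and the fixed timestamp is chosen only so the example is reproducible.</p>
<pre class="r"><code># Hypothetical helper: build the plain "time;id" string (no encryption here)
make_token <- function(id, now = Sys.time()) {
  paste(format(now, "%Y-%m-%d %H:%M:%S", tz = "UTC"), id, sep = ";")
}

# Hypothetical helper: return the id if the token is fresh, else "Unauthorized"
check_token <- function(token, now = Sys.time(), time_limit = 5) {
  parts <- unlist(strsplit(token, split = ";", fixed = TRUE))
  if (length(parts) != 2) return("Unauthorized")
  # An explicit format makes as.POSIXct() return NA instead of raising an error
  token_time <- as.POSIXct(parts[1], format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  if (is.na(token_time)) return("Unauthorized")
  age <- as.numeric(difftime(now, token_time, units = "secs"))
  if (age > time_limit) "Unauthorized" else parts[2]
}

# Fixed issue time so the example does not depend on the clock
issued <- as.POSIXct("2018-11-27 12:00:00", tz = "UTC")
token <- make_token(89, now = issued)

fresh <- check_token(token, now = issued + 2)     # 2 seconds old
stale <- check_token(token, now = issued + 10)    # 10 seconds old
garbled <- check_token("not a token", now = issued)</code></pre>
<p>With these inputs, <code>fresh</code> is <code>"89"</code>, while <code>stale</code> and <code>garbled</code> are both <code>"Unauthorized"</code>. Like the helper above, this sketch only rejects tokens that are too old; it does not reject timestamps that lie in the future.</p>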
<p>Once these helper functions are in place, we can update our <code>/plot/history</code> endpoint as follows:</p>
<pre class="r"><code>#* Plot customer weekly calls
#* @png
#* @param cust_secret encrypted value calculated in /status endpoint
#* @response 400 No customer with the given ID was found.
#* @preempt verify
#* @get /plot/history
function(res, cust_secret) {
  # Authenticate that request came from /status
  cust_id <- plot_auth(cust_secret)
  # Return unauthorized error if cust_id is "Unauthorized"
  if (cust_id == "Unauthorized") {
    res$status <- 401
    stop("Unauthorized request")
  } else if (!cust_id %in% sim_data$id) {
    res$status <- 400
    stop("Customer id ", cust_id, " not found.")
  }
  # Filter data to customer id provided
  plot_data <- dplyr::filter(sim_data, id == cust_id)
  # Customer name (id)
  customer_name <- paste0(unique(plot_data$name), " (", unique(plot_data$id), ")")
  # Create plot
  history_plot <- plot_data %>%
    ggplot(aes(x = time, y = calls, col = calls)) +
    ggalt::geom_lollipop(show.legend = FALSE) +
    theme_light() +
    labs(
      title = paste("Weekly calls for", customer_name),
      x = "Week",
      y = "Calls"
    )
  # print() is necessary to render plot properly
  print(history_plot)
}</code></pre>
<p>Now we need to make one small change to our <code>/status</code> endpoint so that we properly build the appropriate URL for our image. We construct the list response to the <code>/status</code> endpoint as follows, where <code>image_url</code> has been updated.</p>
<pre class="r"><code>attachments = list(
  list(
    color = customer_status,
    title = paste0("Status update for ", customer_name, " (", customer_id, ")"),
    fallback = paste0("Status update for ", customer_name, " (", customer_id, ")"),
    # History plot
    image_url = paste0(base_url,
                       "/plot/history?cust_secret=",
                       encrypt_string(customer_id)),
    # Fields provide a way of communicating semi-tabular data in Slack
    fields = list(
      list(
        title = "Total Calls",
        value = sum(customer_data$calls),
        short = TRUE
      ),
      list(
        title = "DoB",
        value = unique(customer_data$dob),
        short = TRUE
      )
    )
  )
)</code></pre>
<p>Just like that, we have a secure API!</p>
</div>
<div id="all-together-now" class="section level2">
<h2>All Together Now</h2>
<p>Now, given the authorization pieces we have implemented, it is a bit more difficult to test our API since our endpoints will only respond to authorized requests. However, we can use the free version of <a href="https://www.getpostman.com">Postman</a> to test our API. An in-depth look at the capabilities of Postman is beyond the scope of this post, so hopefully a gif will suffice. Further details about using Postman in this context can be found in the <a href="https://github.com/sol-eng/plumber-slack#running-locally">GitHub repository</a> for this example.</p>
<div class="figure">
<img src="/post/2018-11-20-blair-plumber-slack-part-two-files/postman-demo.gif" />
</div>
<p>It appears that everything is working as expected! Our endpoints fail when the authorization criteria are not met, and otherwise they succeed. Notice that the plot endpoint works when initially called, but when a subsequent call is made it fails since more than five seconds have passed since the <code>/status</code> endpoint was invoked.</p>
<p>Now, the final step in this process is publishing this API so that Slack can properly interact with it. The easiest way to do this is to publish the API to <a href="http://docs.rstudio.com/connect/user/publishing.html#publishing-apis">RStudio Connect</a>. Once published, Slack can be updated to point the slash command to our nice, newly secured API.</p>
</div>
<div id="conclusion" class="section level2">
<h2>Conclusion</h2>
<p>This brings us to the conclusion of this series. We’ve discovered the power of <code>plumber</code> in exposing R to downstream consumers via RESTful API endpoints. We built a Slack app powered entirely by R and <code>plumber</code>, and now we have secured the underlying API so that it only responds to authorized requests. As we have seen, <code>plumber</code> provides a powerful and flexible framework for exposing R functions as APIs. These APIs can be safely secured so that only authorized requests are permitted.</p>
<p><em>James Blair is a solutions engineer at RStudio who focuses on tools, technologies, and best practices for using R in the enterprise.</em></p>
</div>
Many Factor Models
https://rviews.rstudio.com/2018/11/19/many-factor-models/
Mon, 19 Nov 2018 00:00:00 +0000https://rviews.rstudio.com/2018/11/19/many-factor-models/
<p>Today, we will return to the Fama French (FF) model of asset returns and use it as a proxy for fitting and evaluating multiple linear models. In a <a href="https://rviews.rstudio.com/2018/04/11/introduction-to-fama-french/">previous post</a>, we reviewed how to run the FF three-factor model on the returns of a portfolio. That is, we ran one model on one set of returns. Today, we will run multiple models on multiple streams of returns, which will allow us to compare those models and hopefully build a code scaffolding that can be used when we wish to explore other factor models.</p>
<p>Let’s get to it!</p>
<p>We will start by importing daily prices and calculating the returns of our five usual ETFs: SPY, EFA, IJS, EEM and AGG. I covered the logic for this task in a <a href="http://www.reproduciblefinance.com/2017/09/25/asset-prices-to-log-returns/">previous post</a> and the code is below.</p>
<pre class="r"><code>library(tidyverse)
library(broom)
library(tidyquant)
library(timetk)
symbols <- c("SPY", "EFA", "IJS", "EEM", "AGG")
# The prices object will hold our daily price data.
prices <-
  getSymbols(symbols, src = 'yahoo',
             from = "2012-12-31",
             to = "2017-12-31",
             auto.assign = TRUE,
             warnings = FALSE) %>%
  map(~Ad(get(.))) %>%
  reduce(merge) %>%
  `colnames<-`(symbols)
asset_returns_long <-
  prices %>%
  tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
  gather(asset, returns, -date) %>%
  group_by(asset) %>%
  mutate(returns = (log(returns) - log(lag(returns)))) %>%
  na.omit()
asset_returns_long %>%
  head()</code></pre>
<pre><code># A tibble: 6 x 3
# Groups:   asset [1]
  date       asset  returns
  <date>     <chr>    <dbl>
1 2013-01-02 SPY    0.0253
2 2013-01-03 SPY   -0.00226
3 2013-01-04 SPY    0.00438
4 2013-01-07 SPY   -0.00274
5 2013-01-08 SPY   -0.00288
6 2013-01-09 SPY    0.00254</code></pre>
<p>We now have the returns of our five ETFs saved in a tidy tibble called <code>asset_returns_long</code>. Normally we would combine these into one portfolio, but we will leave them as individual assets today so we can explore how to model the returns of multiple assets saved in one data object.</p>
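<p>The grouped log-return step can be checked on a toy example. The sketch below uses made-up prices for two hypothetical assets, <code>A</code> and <code>B</code>, each rising 10% per period, so every log return should equal <code>log(1.1)</code>. It relies only on <code>dplyr</code>.</p>
<pre class="r"><code>library(dplyr)

# Made-up prices: each asset rises by 10% per period
toy_prices <- tibble(
  asset = c("A", "A", "A", "B", "B"),
  price = c(100, 110, 121, 50, 55)
)

toy_returns <- toy_prices %>%
  # Grouping keeps lag() from reaching across assets
  group_by(asset) %>%
  mutate(returns = log(price) - log(lag(price))) %>%
  ungroup() %>%
  # The first observation of each asset has no previous price
  filter(!is.na(returns))</code></pre>
<p>Three rows remain, two for <code>A</code> and one for <code>B</code>. Without <code>group_by()</code>, the first return of <code>B</code> would wrongly be computed against the last price of <code>A</code>.</p>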
<p>We are going to model those individual assets on the Fama French set of five factors to see if those factors explain our asset returns. Those five factors are the market returns (similar to <a href="https://rviews.rstudio.com/2018/02/08/capm-beta/">CAPM</a>), firm size (small versus big), firm value (high versus low book-to-market), firm profitability (high versus low operating profits), and firm investment (high versus low total asset growth). To learn more about the theory behind using these factors, see <a href="https://www.sciencedirect.com/science/article/pii/S0304405X14002323">this article</a>.</p>
<p>Our next step is to import our factor data, and luckily FF make them available on <a href="http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html#Research">their website</a>. I presented the code showing how to import this data in a <a href="http://www.reproduciblefinance.com/2018/06/07/fama-french-write-up-part-one/">previous post on my blog</a> so I won’t go through the steps again, but the code is below.</p>
<pre class="r"><code>factors_data_address <-
"http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Global_5_Factors_Daily_CSV.zip"
factors_csv_name <- "Global_5_Factors_Daily.csv"
temp <- tempfile()
download.file(
# location of file to be downloaded
factors_data_address,
# where we want R to store that file
temp,
quiet = TRUE)
Global_5_Factors <-
read_csv(unz(temp, factors_csv_name), skip = 6 ) %>%
rename(date = X1, MKT = `Mkt-RF`) %>%
mutate(date = ymd(parse_date_time(date, "%Y%m%d")))%>%
mutate_if(is.numeric, funs(. / 100)) %>%
select(-RF)
Global_5_Factors %>%
head()</code></pre>
<pre><code># A tibble: 6 x 6
date MKT SMB HML RMW CMA
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1990-07-02 0.00700 -0.000600 -0.0031 0.0022 0.0004
2 1990-07-03 0.0018 0.0008 -0.0017 0.0007 0.0004
3 1990-07-04 0.0063 -0.0019 -0.0016 -0.0007 0.000300
4 1990-07-05 -0.0074 0.0031 0.0017 -0.0013 0.000600
5 1990-07-06 0.002 -0.0015 0.0002 0.002 -0.000300
6 1990-07-09 0.00240 0.0018 0.0001 0.0004 -0.00240 </code></pre>
<p>Next, we mash <code>asset_returns_long</code> and <code>Global_5_Factors</code> into one data object, using <code>left_join(..., by = "date")</code> because each object has a column called <code>date</code>.</p>
<pre class="r"><code>data_joined_tidy <-
asset_returns_long %>%
left_join(Global_5_Factors, by = "date") %>%
na.omit()
data_joined_tidy %>%
head(5)</code></pre>
<pre><code># A tibble: 5 x 8
# Groups: asset [1]
date asset returns MKT SMB HML RMW CMA
<date> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2013-01-02 SPY 0.0253 0.0199 -0.0043 0.0028 -0.0028 -0.0023
2 2013-01-03 SPY -0.00226 -0.0021 0.00120 0.000600 0.0008 0.0013
3 2013-01-04 SPY 0.00438 0.0064 0.0011 0.0056 -0.0043 0.0036
4 2013-01-07 SPY -0.00274 -0.0014 0.00580 0 -0.0015 0.0001
5 2013-01-08 SPY -0.00288 -0.0027 0.0018 -0.00120 -0.0002 0.00120</code></pre>
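<p>As a small illustration of what the join and <code>na.omit()</code> are doing, here is a sketch on two hypothetical tibbles whose dates only partly overlap. The names <code>toy_returns</code> and <code>toy_factors</code> and all of the values are invented for this example.</p>
<pre class="r"><code>library(dplyr)

# Hypothetical returns and factors; the dates deliberately do not fully overlap.
toy_returns <- tibble(
  date    = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  asset   = "AAA",
  returns = c(0.010, -0.020, 0.005)
)

toy_factors <- tibble(
  date = as.Date(c("2020-01-01", "2020-01-03", "2020-01-06")),
  MKT  = c(0.008, 0.004, -0.001)
)

toy_joined <-
  toy_returns %>%
  # Keep every row of toy_returns; dates with no match get NA in MKT.
  left_join(toy_factors, by = "date")

# Drop the rows that have no factor data, as in the pipeline above.
toy_clean <- na.omit(toy_joined)
toy_clean</code></pre>
<p>Because <code>left_join()</code> keeps every row of the left table, a date that is missing from the factor data shows up as an <code>NA</code> instead of silently disappearing, and the factor-only date is never added.</p>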
<p>We now have a tibble called <code>data_joined_tidy</code> with the asset names in the <code>asset</code> column, asset returns in the <code>returns</code> column, and our five FF factors in the <code>MKT</code>, <code>SMB</code>, <code>HML</code>, <code>RMW</code> and <code>CMA</code> columns. We want to explore whether there is a linear relationship between the returns of our five assets and any/all of the five FF factors. Specifically, we will run several linear regressions, save the results, examine the results, and then quickly visualize the results.</p>
<p>The <code>broom</code> and <code>purrr</code> packages will do a lot of the heavy lifting for us eventually, but let’s start with a simple example: regress the return of one ETF on one of the FF factors. We will use <code>do()</code> for this. I believe <code>do()</code> is no longer considered a best practice, but it’s so irresistibly simple to plunk into the pipes.</p>
<pre class="r"><code>data_joined_tidy %>%
filter(asset == "SPY") %>%
do(model = lm(returns ~ MKT, data = .))</code></pre>
<pre><code>Source: local data frame [1 x 2]
Groups: <by row>
# A tibble: 1 x 2
asset model
* <chr> <list>
1 SPY <S3: lm></code></pre>
<p>That worked, but our model is stored as a nested S3 object. Let’s use <code>tidy()</code>, <code>glance()</code>, and <code>augment()</code> to view the results.</p>
<pre class="r"><code>data_joined_tidy %>%
filter(asset == "SPY") %>%
do(model = lm(returns ~ MKT, data = .)) %>%
tidy(model)</code></pre>
<pre><code># A tibble: 2 x 6
# Groups: asset [1]
asset term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 SPY (Intercept) 0.000126 0.000100 1.26 0.208
2 SPY MKT 1.02 0.0154 66.0 0 </code></pre>
<pre class="r"><code>data_joined_tidy %>%
filter(asset == "SPY") %>%
do(model = lm(returns ~ MKT, data = .)) %>%
glance(model)</code></pre>
<pre><code># A tibble: 1 x 12
# Groups: asset [1]
asset r.squared adj.r.squared sigma statistic p.value df logLik
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 SPY 0.776 0.776 0.00354 4356. 0 2 5319.
# ... with 4 more variables: AIC <dbl>, BIC <dbl>, deviance <dbl>,
# df.residual <int></code></pre>
<pre class="r"><code>data_joined_tidy %>%
filter(asset == "SPY") %>%
do(model = lm(returns ~ MKT, data = .)) %>%
augment(model) %>%
head(5)</code></pre>
<pre><code># A tibble: 5 x 10
# Groups: asset [1]
asset returns MKT .fitted .se.fit .resid .hat .sigma .cooksd
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 SPY 0.0253 0.0199 0.0204 3.17e-4 4.90e-3 7.99e-3 0.00354 7.77e-3
2 SPY -0.00226 -0.0021 -0.00201 1.07e-4 -2.47e-4 9.17e-4 0.00354 2.24e-6
3 SPY 0.00438 0.0064 0.00665 1.36e-4 -2.27e-3 1.47e-3 0.00354 3.02e-4
4 SPY -0.00274 -0.0014 -0.00130 1.04e-4 -1.44e-3 8.59e-4 0.00354 7.07e-5
5 SPY -0.00288 -0.0027 -0.00263 1.11e-4 -2.56e-4 9.82e-4 0.00354 2.56e-6
# ... with 1 more variable: .std.resid <dbl></code></pre>
<p>We can quickly pop our augmented results into <code>ggplot()</code> and inspect our residuals versus our fitted values. The important takeaway here is that our augmented results are in a data frame, so we can use all of our <code>ggplot()</code> code flows to create visualizations.</p>
<pre class="r"><code>data_joined_tidy %>%
filter(asset == "SPY") %>%
do(model = lm(returns ~ MKT, data = .)) %>%
augment(model) %>%
ggplot(aes(y = .resid, x = .fitted)) +
geom_point(color = "cornflowerblue")</code></pre>
<p><img src="/post/2018-11-13-many-factor-models_files/figure-html/unnamed-chunk-8-1.png" width="672" /></p>
<p>Alright, we have run a simple linear regression and seen how <code>tidy()</code>, <code>glance()</code>, and <code>augment()</code> clean up the model results. We could, of course, keep repeating this process for any combination of the FF factors and any of our ETFs, but let’s look at a more efficient approach for fitting multiple models on all of our ETFs.</p>
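<p>Since <code>do()</code> is flagged above as not a best practice, here is a sketch of the same one-asset workflow without it: fit the model directly and hand the <code>lm</code> object to the three <code>broom</code> functions. The data are simulated, and <code>toy_spy</code> and its coefficients are hypothetical. It is also worth checking whether your installed version of <code>broom</code> still accepts the row-wise output of <code>do()</code>; if it does not, this form should be a safe fallback.</p>
<pre class="r"><code>library(dplyr)
library(broom)

set.seed(123)

# Simulated stand-in for one asset's returns and the market factor.
toy_spy <- tibble(MKT = rnorm(250, mean = 0, sd = 0.01)) %>%
  mutate(returns = 0.0001 + 1.0 * MKT + rnorm(250, mean = 0, sd = 0.003))

# Fit once, then reuse the fitted object.
spy_model <- lm(returns ~ MKT, data = toy_spy)

tidy(spy_model)                 # one row per coefficient
glance(spy_model)               # one row of model-level statistics
augment(spy_model) %>% head(5)  # one row per observation</code></pre>
<p>Fitting once and reusing <code>spy_model</code> also avoids refitting the same regression for each of the three summaries.</p>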
<p>First, let’s create and save three models as functions. That will allow us to pass them to the <code>map</code> functions from <code>purrr</code>. We will save a one-factor model as a function called <code>model_one</code>, a three-factor model as a function called <code>model_two</code> and a five-factor model as a function called <code>model_three</code>. Note that each function takes a data frame as an argument.</p>
<pre class="r"><code>model_one <- function(df) {
lm(returns ~ MKT, data = df)
}
model_two <- function(df) {
lm(returns ~ MKT + SMB + HML, data = df)
}
model_three <- function(df) {
lm(returns ~ MKT + SMB + HML + RMW + CMA, data = df)
}</code></pre>
<p>Now we want to run those three models on each of our five asset returns or, equivalently, we need to pass in a data frame of asset returns to each of those functions. However, we don’t want to save our five ETF returns as five separate tibbles. That would get quite unwieldy with a larger set of ETFs.</p>
<p>Instead, let’s use <code>nest()</code> to make our data more compact!</p>
<pre class="r"><code>data_joined_tidy %>%
group_by(asset) %>%
nest()</code></pre>
<pre><code># A tibble: 5 x 2
asset data
<chr> <list>
1 SPY <tibble [1,259 × 7]>
2 EFA <tibble [1,259 × 7]>
3 IJS <tibble [1,259 × 7]>
4 EEM <tibble [1,259 × 7]>
5 AGG <tibble [1,259 × 7]></code></pre>
<p>In my mind, our task has gotten a little conceptually simpler: we want to apply each of our models to each tibble in the <code>data</code> column, and to do that, we need to pass each tibble in that column to our functions.</p>
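<p>If the list column feels opaque, a tiny example may help. The sketch below nests a hypothetical six-row tibble and pulls out one element of the <code>data</code> column to show that it is an ordinary tibble; <code>toy_data</code> and its values are made up.</p>
<pre class="r"><code>library(dplyr)
library(tidyr)

# Hypothetical data: two made-up assets with three rows each.
toy_data <- tibble(
  asset   = rep(c("AAA", "BBB"), each = 3),
  returns = c(0.01, -0.02, 0.005, 0.00, 0.01, -0.01),
  MKT     = c(0.008, -0.015, 0.004, 0.008, -0.015, 0.004)
)

toy_nested <-
  toy_data %>%
  group_by(asset) %>%
  nest()

# Each element of the list column is an ordinary tibble
# holding every column except the grouping column.
toy_nested$data[[1]]</code></pre>
<p>Anything that accepts a data frame, including the model functions defined above, can therefore be applied to each element of that column.</p>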
<p>Let’s use a combination of <code>mutate()</code> and <code>map()</code>.</p>
<pre class="r"><code>data_joined_tidy %>%
group_by(asset) %>%
nest() %>%
mutate(one_factor_model = map(data, model_one),
three_factor_model = map(data, model_two),
five_factor_model = map(data, model_three))</code></pre>
<pre><code># A tibble: 5 x 5
asset data one_factor_model three_factor_mod… five_factor_mod…
<chr> <list> <list> <list> <list>
1 SPY <tibble [1,25… <S3: lm> <S3: lm> <S3: lm>
2 EFA <tibble [1,25… <S3: lm> <S3: lm> <S3: lm>
3 IJS <tibble [1,25… <S3: lm> <S3: lm> <S3: lm>
4 EEM <tibble [1,25… <S3: lm> <S3: lm> <S3: lm>
5 AGG <tibble [1,25… <S3: lm> <S3: lm> <S3: lm> </code></pre>
<p>From a substantive perspective, we’re done! We have run all three models on all five assets and stored the results. Of course, we’d like to be able to look at the results, but the substance is all there.</p>
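<p>One quick way to convince yourself that the models in those list columns are real is to pull a number out of each one. The sketch below uses simulated data with known market betas, so the recovered slopes can be compared with the truth; <code>toy_data</code>, <code>toy_model</code> and the betas of 0.5 and 1.5 are all hypothetical.</p>
<pre class="r"><code>library(dplyr)
library(tidyr)
library(purrr)

set.seed(42)

# Simulated data: two made-up assets with known market betas of 0.5 and 1.5.
toy_data <- tibble(
  asset = rep(c("AAA", "BBB"), each = 200),
  MKT   = rnorm(400, sd = 0.01)
) %>%
  mutate(beta    = if_else(asset == "AAA", 0.5, 1.5),
         returns = beta * MKT + rnorm(400, sd = 0.001)) %>%
  select(-beta)

toy_model <- function(df) {
  lm(returns ~ MKT, data = df)
}

toy_fits <-
  toy_data %>%
  group_by(asset) %>%
  nest() %>%
  ungroup() %>%
  mutate(model    = map(data, toy_model),
         # Pull the MKT slope out of each fitted model as a plain number.
         mkt_beta = map_dbl(model, ~ coef(.x)[["MKT"]]))

toy_fits %>% select(asset, mkt_beta)</code></pre>
<p>Here <code>map()</code> returns a list, which suits model objects, while <code>map_dbl()</code> returns a plain numeric vector, which suits a single coefficient per model.</p>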
<p>As we did above, we will use <code>tidy()</code>, <code>glance()</code>, and <code>augment()</code>, but now in combination with <code>mutate()</code> and <code>map()</code> to clean up the model results stored in each column. Let’s start by running just <code>model_one</code> and then tidying the results.</p>
<pre class="r"><code>data_joined_tidy %>%
group_by(asset) %>%
nest() %>%
mutate(one_factor_model = map(data, model_one),
tidied_one = map(one_factor_model, tidy))</code></pre>
<pre><code># A tibble: 5 x 4
asset data one_factor_model tidied_one
<chr> <list> <list> <list>
1 SPY <tibble [1,259 × 7]> <S3: lm> <data.frame [2 × 5]>
2 EFA <tibble [1,259 × 7]> <S3: lm> <data.frame [2 × 5]>
3 IJS <tibble [1,259 × 7]> <S3: lm> <data.frame [2 × 5]>
4 EEM <tibble [1,259 × 7]> <S3: lm> <data.frame [2 × 5]>
5 AGG <tibble [1,259 × 7]> <S3: lm> <data.frame [2 × 5]></code></pre>
<p>If we want to see those tidied results, we need to <code>unnest()</code> them.</p>
<pre class="r"><code>data_joined_tidy %>%
group_by(asset) %>%
nest() %>%
mutate(one_factor_model = map(data, model_one),
tidied_one = map(one_factor_model, tidy)) %>%
unnest(tidied_one)</code></pre>
<pre><code># A tibble: 10 x 6
asset term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 SPY (Intercept) 0.000126 0.000100 1.26 2.08e- 1
2 SPY MKT 1.02 0.0154 66.0 0.
3 EFA (Intercept) -0.000288 0.0000972 -2.97 3.07e- 3
4 EFA MKT 1.29 0.0150 86.0 0.
5 IJS (Intercept) 0.0000634 0.000171 0.371 7.11e- 1
6 IJS MKT 1.13 0.0264 42.9 3.25e-248
7 EEM (Intercept) -0.000491 0.000203 -2.42 1.57e- 2
8 EEM MKT 1.40 0.0313 44.8 4.94e-263
9 AGG (Intercept) 0.0000888 0.0000572 1.55 1.21e- 1
10 AGG MKT -0.0282 0.00883 -3.19 1.46e- 3</code></pre>
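<p>Once the tidied results are unnested, they behave like any other tibble, so the usual verbs apply. As a sketch on simulated data (the names and values below are hypothetical), we can ask <code>tidy()</code> for confidence intervals through <code>map()</code> and then keep only the market beta for each asset.</p>
<pre class="r"><code>library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)

# Simulated stand-in for two assets regressed on the market factor.
toy_data <- tibble(
  asset = rep(c("AAA", "BBB"), each = 100),
  MKT   = rnorm(200, sd = 0.01)
) %>%
  mutate(returns = 1.2 * MKT + rnorm(200, sd = 0.002))

toy_betas <-
  toy_data %>%
  group_by(asset) %>%
  nest() %>%
  ungroup() %>%
  mutate(model  = map(data, ~ lm(returns ~ MKT, data = .x)),
         # Extra arguments after the function name are passed on to tidy().
         tidied = map(model, tidy, conf.int = TRUE)) %>%
  select(asset, tidied) %>%
  unnest(tidied) %>%
  # Keep the slope on MKT and drop the intercept rows.
  filter(term == "MKT")

toy_betas</code></pre>
<p>The result has one row per asset, with <code>conf.low</code> and <code>conf.high</code> columns next to the estimate.</p>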
<p>Let’s use <code>glance()</code> and <code>augment()</code> as well.</p>
<pre class="r"><code>data_joined_tidy %>%
group_by(asset) %>%
nest() %>%
mutate(one_factor_model = map(data, model_one),
tidied_one = map(one_factor_model, tidy),
glanced_one = map(one_factor_model, glance),
augmented_one = map(one_factor_model, augment))</code></pre>
<pre><code># A tibble: 5 x 6
asset data one_factor_model tidied_one glanced_one augmented_one
<chr> <list> <list> <list> <list> <list>
1 SPY <tibble… <S3: lm> <data.frame… <data.frame… <data.frame […
2 EFA <tibble… <S3: lm> <data.frame… <data.frame… <data.frame […
3 IJS <tibble… <S3: lm> <data.frame… <data.frame… <data.frame […
4 EEM <tibble… <S3: lm> <data.frame… <data.frame… <data.frame […
5 AGG <tibble… <S3: lm> <data.frame… <data.frame… <data.frame […</code></pre>
<p>Again, we use <code>unnest()</code> if we wish to look at the glanced or augmented results.</p>
<pre class="r"><code>data_joined_tidy %>%
group_by(asset) %>%
nest() %>%
mutate(one_factor_model = map(data, model_one),
tidied_one = map(one_factor_model, tidy),
glanced_one = map(one_factor_model, glance),
augmented_one = map(one_factor_model, augment)) %>%
# unnest(tidied_one)
unnest(glanced_one)</code></pre>
<pre><code># A tibble: 5 x 16
asset data one_factor_model tidied_one augmented_one r.squared
<chr> <list> <list> <list> <list> <dbl>
1 SPY <tibble … <S3: lm> <data.frame … <data.frame [1… 0.776
2 EFA <tibble … <S3: lm> <data.frame … <data.frame [1… 0.855
3 IJS <tibble … <S3: lm> <data.frame … <data.frame [1… 0.594
4 EEM <tibble … <S3: lm> <data.frame … <data.frame [1… 0.615
5 AGG <tibble … <S3: lm> <data.frame … <data.frame [1… 0.00803
# ... with 10 more variables: adj.r.squared <dbl>, sigma <dbl>,
# statistic <dbl>, p.value <dbl>, df <int>, logLik <dbl>, AIC <dbl>,
# BIC <dbl>, deviance <dbl>, df.residual <int></code></pre>
<pre class="r"><code> # unnest(augmented_one)</code></pre>
<pre class="r"><code>data_joined_tidy %>%
group_by(asset) %>%
nest() %>%
mutate(one_factor_model = map(data, model_one),
tidied_one = map(one_factor_model, tidy),
glanced_one = map(one_factor_model, glance),
augmented_one = map(one_factor_model, augment)) %>%
# unnest(tidied_one)
# unnest(glanced_one)
unnest(augmented_one) %>%
head(5)</code></pre>
<pre><code># A tibble: 5 x 10
asset returns MKT .fitted .se.fit .resid .hat .sigma .cooksd
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 SPY 0.0253 0.0199 0.0204 3.17e-4 4.90e-3 7.99e-3 0.00354 7.77e-3
2 SPY -0.00226 -0.0021 -0.00201 1.07e-4 -2.47e-4 9.17e-4 0.00354 2.24e-6
3 SPY 0.00438 0.0064 0.00665 1.36e-4 -2.27e-3 1.47e-3 0.00354 3.02e-4
4 SPY -0.00274 -0.0014 -0.00130 1.04e-4 -1.44e-3 8.59e-4 0.00354 7.07e-5
5 SPY -0.00288 -0.0027 -0.00263 1.11e-4 -2.56e-4 9.82e-4 0.00354 2.56e-6
# ... with 1 more variable: .std.resid <dbl></code></pre>
<p>Let’s change gears a bit and evaluate how well each model explained one of our asset returns, IJS, based on adjusted R-squareds.</p>
<p>First, we use <code>filter(asset == "IJS")</code>, nest the data, then <code>map()</code> each of our models to the nested data. I don’t want the raw data anymore, so I will remove it with <code>select(-data)</code>.</p>
<pre class="r"><code>data_joined_tidy %>%
group_by(asset) %>%
filter(asset == "IJS") %>%
nest() %>%
mutate(one_factor_model = map(data, model_one),
three_factor_model = map(data, model_two),
five_factor_model = map(data, model_three)) %>%
select(-data)</code></pre>
<pre><code># A tibble: 1 x 4
asset one_factor_model three_factor_model five_factor_model
<chr> <list> <list> <list>
1 IJS <S3: lm> <S3: lm> <S3: lm> </code></pre>
<p>We now have our three model results for the returns of IJS. Let’s use <code>gather()</code> to put those results in tidy format, and then <code>glance()</code> to get at the adjusted R-squared, AIC, and BIC.</p>
<pre class="r"><code>models_results <-
data_joined_tidy %>%
group_by(asset) %>%
filter(asset == "IJS") %>%
nest() %>%
mutate(one_factor_model = map(data, model_one),
three_factor_model = map(data, model_two),
five_factor_model = map(data, model_three)) %>%
select(-data) %>%
gather(models, results, -asset) %>%
mutate(glanced_results = map(results, glance)) %>%
unnest(glanced_results) %>%
select(asset, models, adj.r.squared, AIC, BIC)
models_results</code></pre>
<pre><code># A tibble: 3 x 5
asset models adj.r.squared AIC BIC
<chr> <chr> <dbl> <dbl> <dbl>
1 IJS one_factor_model 0.594 -9282. -9266.
2 IJS three_factor_model 0.599 -9298. -9272.
3 IJS five_factor_model 0.637 -9421. -9385.</code></pre>
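<p>For IJS, all three measures point the same way: the five-factor model has the highest adjusted R-squared and the lowest AIC and BIC. As a reminder of what the adjustment does, adjusted R-squared penalizes R-squared for the number of predictors. The sketch below checks the textbook formula against what <code>glance()</code> reports, using simulated data; the data frame <code>toy</code> and its coefficients are hypothetical.</p>
<pre class="r"><code>library(broom)

set.seed(7)
n <- 300

# Simulated factors and returns; HML has no true effect here.
toy <- data.frame(MKT = rnorm(n), SMB = rnorm(n), HML = rnorm(n))
toy$returns <- 0.8 * toy$MKT + 0.1 * toy$SMB + rnorm(n)

fit   <- lm(returns ~ MKT + SMB + HML, data = toy)
stats <- glance(fit)

p <- 3  # number of predictors, excluding the intercept

# Adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1)
manual_adj <- 1 - (1 - stats$r.squared) * (n - 1) / (n - p - 1)

c(glance = stats$adj.r.squared, manual = manual_adj)</code></pre>
<p>With more than a thousand daily observations per asset and at most five predictors, the penalty is tiny, so adjusted and unadjusted R-squared will be nearly identical in this post.</p>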
<p>Let’s plot these results and quickly glance at where the adjusted R-squareds lie. We will call <code>ggplot(aes(x = models, y = adj.r.squared, color = models))</code> and then <code>geom_point()</code>.</p>
<pre class="r"><code>models_results %>%
ggplot(aes(x = models, y = adj.r.squared, color = models)) +
geom_point() +
labs(x = "",
title = "Models Comparison") +
theme(plot.title = element_text(hjust = 0.5))</code></pre>
<p><img src="/post/2018-11-13-many-factor-models_files/figure-html/unnamed-chunk-19-1.png" width="672" /></p>
<p>That chart looks alright, but the models are placed on the x-axis in alphabetical order, whereas I’d prefer they go in ascending order based on adjusted R-squared. We can make that happen with <code>ggplot(aes(x = reorder(models, adj.r.squared)...)</code>. Let’s also add labels on the chart with <code>geom_text(aes(label = models), nudge_y = .003)</code>. Since we’re labeling in the chart, we can remove the x-axis labels with <code>theme(axis.text.x = element_blank())</code>.</p>
<pre class="r"><code>models_results %>%
ggplot(aes(x = reorder(models, adj.r.squared), y = adj.r.squared, color = models)) +
geom_point(show.legend = NA) +
geom_text(aes(label = models),
nudge_y = .003) +
labs(x = "",
title = "Models Comparison") +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_blank(),
axis.ticks.x=element_blank())</code></pre>
<p><img src="/post/2018-11-13-many-factor-models_files/figure-html/unnamed-chunk-20-1.png" width="672" /></p>
<p>Pay close attention to the scale on the y-axis. The lowest adjusted R-squared is less than .05 away from the highest. Maybe that amount is meaningful in your world, and maybe it isn’t.</p>
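<p>For anyone curious why <code>reorder()</code> changes the x-axis, here is a small sketch of what it does to the factor levels. The values in <code>toy_results</code> are made up and only need to differ in alphabetical versus numeric order.</p>
<pre class="r"><code># Hypothetical values, chosen so alphabetical and numeric order differ.
toy_results <- data.frame(
  models        = c("one_factor_model", "three_factor_model", "five_factor_model"),
  adj.r.squared = c(0.59, 0.60, 0.64)
)

# Default factor levels are alphabetical, which is what ggplot uses for a
# character column on a discrete axis.
default_levels <- levels(factor(toy_results$models))
default_levels

# reorder() sorts the levels by the second argument instead.
reordered_levels <- levels(reorder(toy_results$models, toy_results$adj.r.squared))
reordered_levels</code></pre>
<p>Only the order of the levels changes; the underlying data are untouched, which is why the call can live inside <code>aes()</code>.</p>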
<p>Before we close, let’s get back to modeling and saving results. Here is the full code for running each model on each asset, then tidying, glancing, and augmenting those results. The result is a compact, nested tibble where the columns can be unnested depending on which results we wish to extract.</p>
<pre class="r"><code>data_joined_tidy %>%
group_by(asset) %>%
nest() %>%
mutate(one_factor_model = map(data, model_one),
three_factor_model = map(data, model_two),
five_factor_model = map(data, model_three)) %>%
mutate(tidied_one = map(one_factor_model, tidy),
tidied_three = map(three_factor_model, tidy),
tidied_five = map(five_factor_model, tidy)) %>%
mutate(glanced_one = map(one_factor_model, glance),
glanced_three = map(three_factor_model, glance),
glanced_five = map(five_factor_model, glance)) %>%
mutate(augmented_one = map(one_factor_model, augment),
augmented_three = map(three_factor_model, augment),
augmented_five = map(five_factor_model, augment)) # %>% </code></pre>
<pre><code># A tibble: 5 x 14
asset data one_factor_model three_factor_mod… five_factor_mod…
<chr> <list> <list> <list> <list>
1 SPY <tibble [1,25… <S3: lm> <S3: lm> <S3: lm>
2 EFA <tibble [1,25… <S3: lm> <S3: lm> <S3: lm>
3 IJS <tibble [1,25… <S3: lm> <S3: lm> <S3: lm>
4 EEM <tibble [1,25… <S3: lm> <S3: lm> <S3: lm>
5 AGG <tibble [1,25… <S3: lm> <S3: lm> <S3: lm>
# ... with 9 more variables: tidied_one <list>, tidied_three <list>,
# tidied_five <list>, glanced_one <list>, glanced_three <list>,
# glanced_five <list>, augmented_one <list>, augmented_three <list>,
# augmented_five <list></code></pre>
<pre class="r"><code> # unnest any broomed column for viewing
# unnest(Insert nested column name here)</code></pre>
<p>There are probably more efficient ways to do this, and in the future we’ll explore a package that runs these model comparisons for us, but for now, think about how we could wrap this work into a Shiny application or extend this code flow to accommodate more models and more assets.</p>
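<p>As one possible way to accommodate more models, here is a sketch that stores the model formulas in a named list and fits every asset-by-model combination in a single pass. This is an alternative layout and not the approach used above; the data are simulated, and <code>toy_data</code>, <code>model_formulas</code> and <code>all_fits</code> are hypothetical names.</p>
<pre class="r"><code>library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(2018)

# Simulated stand-in for data_joined_tidy with two made-up assets.
toy_data <- tibble(
  asset = rep(c("AAA", "BBB"), each = 150),
  MKT   = rnorm(300, sd = 0.01),
  SMB   = rnorm(300, sd = 0.005),
  HML   = rnorm(300, sd = 0.005)
) %>%
  mutate(returns = 0.9 * MKT + 0.3 * SMB + rnorm(300, sd = 0.004))

# One named formula per model; adding a model means adding one line here.
model_formulas <- list(
  one_factor   = returns ~ MKT,
  three_factor = returns ~ MKT + SMB + HML
)

nested <- toy_data %>% group_by(asset) %>% nest() %>% ungroup()

all_fits <-
  # Build every asset-by-model combination, then attach each asset's data.
  expand_grid(asset = nested$asset, model_name = names(model_formulas)) %>%
  left_join(nested, by = "asset") %>%
  mutate(fit     = map2(model_name, data, ~ lm(model_formulas[[.x]], data = .y)),
         glanced = map(fit, glance)) %>%
  select(asset, model_name, glanced) %>%
  unnest(glanced) %>%
  select(asset, model_name, adj.r.squared, AIC, BIC)

all_fits</code></pre>
<p>The result is already in long format, with one row per asset and model, so it can go straight into <code>ggplot()</code> without the <code>gather()</code> step.</p>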
<p>Thanks for reading and see you next time!</p>
A Mathematician's Perspective on Topological Data Analysis and R
https://rviews.rstudio.com/2018/11/14/a-mathematician-s-perspective-on-topological-data-analysis-and-r/
Wed, 14 Nov 2018 00:00:00 +0000https://rviews.rstudio.com/2018/11/14/a-mathematician-s-perspective-on-topological-data-analysis-and-r/
<p>A few years ago, when I first became aware of Topological Data Analysis (TDA), I was really excited by the possibility that the elegant theorems of Algebraic Topology could provide some new insights into the practical problems of data analysis. But time has passed, and the <a href="https://arxiv.org/pdf/1609.08227.pdf">sober assessment</a> of Larry Wasserman seems to describe where things stand.</p>
<blockquote>
<p><em>TDA is an exciting area and is full of interesting ideas. But so far, it has had little impact on data analysis.</em></p>
</blockquote>
<p>Nevertheless, TDA researchers have been quietly working the problem and at least some of them are using R (see below). Since I first read Professor Wasserman’s paper, I have been very keen on getting the perspective of a TDA researcher. So, I am delighted to present the following interview with <a href="https://sites.google.com/site/noahgian/">Noah Giansiracusa</a>, Algebraic Geometer, TDA researcher and co-author of a <a href="https://amstat.tandfonline.com/doi/full/10.1080/10618600.2017.1422432#.W-ICw5NKhpg">recent JCGS paper</a> on a new visualization tool for persistent homology.</p>
<p><strong>Q: Hello Dr. Giansiracusa. Thank you for making time for us at R Views. How did you get interested in TDA?</strong></p>
<p>While doing a postdoc in pure mathematics (algebraic geometry, specifically) I, probably like many people, could not escape a feeling that crept up from time to time—particularly during the more challenging moments of research frustration—that perhaps the efforts I was putting into proving abstract theorems might have been better spent working in a more practical, applied realm of mathematics. However, pragmatic considerations made me apprehensive at that point in my career to take a sudden departure, for I finally felt like I was gaining some momentum in algebraic geometry, developing a nice network of supportive colleagues, etc., and also that I would very soon be on the tenure track job market and I knew that if I was hired (a big “if”!) it would be for a subject I had actually published in, not one I was merely curious about or had recently moved into. But around this same time I kept hearing about an exciting but possibly over-hyped topic called topological data analysis: TDA. It really seemed to be in the air at the time (this was about five years ago).</p>
<p><strong>Q: Why do you think TDA took off?</strong></p>
<p>I can only speak for myself, but I think there were two big reasons that TDA generated so much buzz among mathematicians at the early stages.</p>
<p>First, it was then, and still is, impossible to escape the media blitz on “big data” and the “data revolution” and related sentiments. This is felt strongly within academic circles (our deans would love us all to be working in data, it seems!) but also in the mainstream press. Yet I think pure mathematicians often felt somewhat on the periphery of this revolution: we knew that the modern developments in data and deep learning and artificial intelligence would not be possible without the rigorous foundations our mathematical ancestors had laid, but we also knew that most of the theorems we are currently proving would in all likelihood play absolutely zero role in any of the contemporary story. TDA provided a hope for relevance, that in the end the pure mathematician would come out of the shadows of obscurity and strike a data science victory proving our ineluctable relevance and superiority in all things technical, and this hope quickly turned to hype.</p>
<p>I think I and many other pure mathematicians were rooting for TDA, to show the world that our work has value. We tired of telling stories of how mathematicians invented differential geometry before Einstein used it in relativity and your GPS would not be possible without this. We needed a more fresh, decisive victory in the intellectual landscape; number theory used in cryptography is great, but still too specialized: TDA had the promise of bringing us into the (big) data revolution. And so we hoped, and we hyped.</p>
<p>And second, from a very practical perspective, I simply did not have time to retrain myself in applied math, the usual form of applied math based heavily on differential equations, modeling, numerical analysis, etc. But TDA seemed to offer a chance to gently transition to data science mathematical relevance—instead of starting from scratch, pure mathematicians such as myself would simply need to add one more chapter to our background in topics like algebraic topology and then we’d be ready to go and could brand ourselves as useful! And if academia didn’t work out, Google would surely rather open the doors of employment to a TDA expert than to a traditional algebraic topologist (or algebraic geometer, in my case).</p>
<p>I think these are two of the main things that brought TDA so much early attention before it really had many real-world successes under its belt, and they are certainly what brought me to it; well, that and also an innocent curiosity to understand what TDA really is, how it works, and whether or not it does what it claims.</p>
<p><strong>Q: So how did you get started?</strong></p>
<p>I first dipped my toes in the TDA waters by agreeing to do a reading course with a couple undergraduates interested in the topic; then I led an undergraduate/master’s level course where we studied the basics of persistent homology, downloaded some data sets, and played around. We chose to use R for that since there are many data sets readily available, and also because we wanted to do some simple experiments like sampling points from nice objects like a sphere but then adding noise, so we knew we wanted to have a lot of statistical functions available to us and R had that plus TDA packages already. While doing this I grew to quite like R and so have stuck with it ever since. In fact, I’m now using it also on a (non-TDA) project to analyze Supreme Court voting patterns from a computational geometry perspective.</p>
<p><strong>Q: Do you think TDA might become a practical tool for statisticians?</strong></p>
<p>First of all, I think this is absolutely the correct way to phrase the question! A few years ago TDA seemed to have almost an adversarial nature to it, that topologists were going to do what data scientists were doing but better because fancy math and smart people were involved. So the question at the time seemed to be whether TDA would supplant other forms of data science, and this was a very unfortunate way to view things.</p>
<p>But it was easy to entirely discredit TDA by saying that it makes no practical difference whether your data has the homology of a Klein bottle, or that there were no real-world examples where TDA had outperformed machine learning. This type of dialogue was missing the point. As your question suggests, TDA should be viewed as a tool to be added to the quiver of data science arrows, rather than an entirely new weapon.<br />
In fact, while this clarification moves the dialogue in a healthy direction (TDA and machine learning should work together, rather than compete with each other!) I think there’s still one further step we should take here: TDA is not really a well-defined entity. For instance, when I see topics like random decision forests, it looks very much like topology to me! (Any graph, of which a tree is an example, is a 1-dimensional simplicial complex, and actually if you look under the hood, the standard Rips complex approach in TDA builds its higher dimensional simplicial complexes out of a 1-dimensional simplicial complex, so both random forests and TDA—and most of network theory—are really rooted in the common world of graph theory.)</p>
<p>Another example: the 0-dimensional barcode for the Rips flavor of TDA encodes essentially the same information as hierarchical clustering. All I’m saying here is that there’s more traditional data science in TDA than one might first imagine, and there’s more topology in traditional data science than one might realize. I think this is healthy, to recognize connections like these—it helps one see a continuum of intellectual development here rather than a discrete jump from ordinary data science to fancy topological data science.</p>
<p>That’s a long-winded way of saying that you phrased the question well. The (less long-winded) answer to the question is: Yes! Once one sees TDA as one more tool for extracting structure and statistics from data, it is much easier to imagine it being absorbed into the mainstream. It need not outperform all previous methods or revolutionize data science, it merely needs to be, exactly as you worded it, a practical tool. Data science is replete with tools that apply in some settings and not others, work better with some data than others, reveal relevant information sometimes more than others, and TDA (whatever it is exactly) fits right into this. There certainly will be some branches of TDA that gain more traction over the years than others, but I am absolutely convinced that at least some of the methods used in TDA will be absorbed into statistical learning just as things like random decision trees and SVMs (both of which have very topological/geometric flavors to them!) have. This does not mean that every statistician needs to learn TDA, just as not every statistician needs to learn all the latest methods in deep learning.</p>
<p><strong>Q: Where do you think TDA has had the most success?</strong></p>
<p>Over the past few years I think the biggest strides TDA has made have been in terms of better interweaving it with other methods and disciplines—so big topics with lots of progress but still room for more have included confidence intervals, distributions of barcodes, feature selection and kernel methods in persistent homology. These are all exciting topics and healthy for the long-term development of TDA.</p>
<p>I think, perhaps controversially, the next step might actually be to rid ourselves of the label TDA. For one thing, TDA is very geometric and not just topological (which is to say: distances matter!). But the bigger issue for me is that we should refer to the actual tools being used (mapper, persistent homology in its various flavors, etc.) rather than lump them arbitrarily together under this common label. It took many years for statisticians to jump on the machine learning bandwagon, and part of what prevented them from doing so sooner was language; the field of statistical learning essentially translates machine learning into more familiar statistical terminology and reveals that it is just another branch of the same discipline. I suspect something similar will take place with TDA… er, I should say, with these recent topological and geometric data analysis tools.</p>
<p><strong>Q: Do you think focusing on the kinds of concrete problems faced when trying to apply topological and algebraic ideas to data analysis will turn out to be a productive means of motivating mathematical research?</strong></p>
<p>Yes, absolutely—and this is also a great question and a healthy way to look at things! Pure mathematicians have no particular agenda or preconceived notion of what they should and should not be studying: pure mathematics, broadly speaking, is logical exploration and development of structure and symmetry. The more intricate a structure appears to be, and the more connected to other structures we have studied, the more interested we tend to be in it. But that really is pretty much all we need to be interested—and to be happy.</p>
<p>So TDA provides a whole range of new questions we can ask, and new structures we can uncover, and inevitably many of these will tie back to earlier areas of pure mathematics in fascinating ways. All the while, throughout these explorations, pure mathematicians likely will end up laying foundations that help provide a stable scaffolding for the adventurous data practitioners who jump into methodology before the full mathematical landscape has been revealed. So TDA absolutely will lead to new, important mathematical research: important both because we’ll uncover beautiful structures and connections, and important also because it will provide some certainty to the applied methods built on top of this.</p>
<p><strong>Q: More specifically, what role might the R language play in facilitating the practice or teaching of mathematics?</strong></p>
<p>Broadly speaking, I think many teachers—especially in pure mathematics—undervalue the importance of computer programming skills, though this is starting to change as pure mathematicians increasingly enjoy experimentation as a way of exploring small examples, honing intuition, and finding evidence for conjectures. While the idea of theorem-proof mathematics is certainly the staple of our discipline, it’s not the only way to understand mathematical structure. In fact, students often find mathematical material resonates with them much more strongly if they uncover a pattern by experimenting on a computer rather than just being fed it through lecture or textbooks. Concretely, if students play with something like the distribution of prime numbers, they might get excited to see the fascinating interplay between randomness and structure that emerges, and that can better prepare them to appreciate formally learning the prime number theorem in a classroom. So as things like TDA emerge, the number of pure mathematics topics that can be explored on a computer increases, and I think that’s a great thing.</p>
<p><strong>Q: Where does R fit in?</strong></p>
<p>Well, much of the mathematical exploration I’m referring to here is symbolic (so very precise and algebraic in flavor), and R certainly has no limitations working precisely, but that is not the main goal of the language, so one likely would use a computer algebra system instead. But one exciting thing TDA does help us see is that there’s a marvelous interface between the symbolic and numerical worlds (here represented by the topology and the statistics, respectively), and I think this is great for both teaching and research. The more common ground we find between topics that previously seemed quite disparate, the more chance we have of building meaningful interdisciplinary collaborations, the more perspectives we can provide our students to motivate and study something, and the more we see unity within mathematics. My favorite manifestation of this is that TDA is the study of the topology of discrete spaces, but discrete spaces have no non-trivial topology! What’s really going on then is that data gives us a discrete glimpse into a continuous, highly structured world, and TDA aims to restore the geometric structure lost due to sampling. In doing so one cannot, and should not, avoid statistics, so pure mathematics is brought meaningfully in contact with statistics and I absolutely love that. This means the R language finds a role in pure math where it may previously not have: topology with noise, algebra with uncertainty.</p>
<p><strong>Thank you again! I think your ideas are going to inspire some R Views readers.</strong></p>
<p>Editor’s note: here are some R packages for doing TDA:</p>
<ul>
<li><a href="https://cran.r-project.org/package=TDA">TDA</a> contains tools for the statistical analysis of persistent homology and for density clustering.</li>
<li><a href="https://cran.r-project.org/package=TDAmapper">TDAmapper</a> enables TDA using Discrete Morse Theory.</li>
<li><a href="https://cran.r-project.org/package=TDAstats">TDAstats</a> offers a tool set for TDA, including tools for calculating persistent homology in a Vietoris-Rips complex.</li>
<li><a href="https://cran.r-project.org/package=pterrace">pterrace</a> builds on TDA and offers a new multi-scale and parameter-free summary plot for studying topological signals.</li>
</ul>
<p><img src="/post/2018-11-07-Giansiracura-TDA_files/pterrace.png" height = "400" width="600"></p>
<script>window.location.href='https://rviews.rstudio.com/2018/11/14/a-mathematician-s-perspective-on-topological-data-analysis-and-r/';</script>
In-database xgboost predictions with R
https://rviews.rstudio.com/2018/11/07/in-database-xgboost-predictions-with-r/
Wed, 07 Nov 2018 00:00:00 +0000https://rviews.rstudio.com/2018/11/07/in-database-xgboost-predictions-with-r/
<p>Moving predictive machine learning algorithms into large-scale production environments can present many challenges. For example, problems arise when attempting to calculate prediction probabilities (“scores”) for many thousands of subjects using many thousands of features located on remote databases.</p>
<p><a href="https://cran.r-project.org/web/packages/xgboost/"><code>xgboost</code></a> (<a href="https://xgboost.readthedocs.io/en/latest/index.html">docs</a>), a popular algorithm for classification and regression, and the model of choice in many winning <a href="https://www.kaggle.com/">Kaggle</a> competitions, is no exception. However, to run <code>xgboost</code>, the subject-features matrix <a href="https://xgboost.readthedocs.io/en/latest/faq.html#i-have-a-big-dataset">must be loaded into memory</a>, a <strong>cumbersome and expensive</strong> process.</p>
<p>Available solutions require using expensive high-memory machines, or implementing external memory across distributed machines (expensive and in <a href="https://xgboost.readthedocs.io/en/latest/tutorials/external_memory.html">beta</a>). Both solutions still require transferring all feature data from the database to the local machine(s), loading it into memory, calculating the probabilities for the subjects, and then transferring the probabilities back to the database for storage. I have seen this take <strong>20-50 minutes for ~1MM subjects</strong>.</p>
<p>In this post, we will consider <strong>in-database scoring</strong>, a simple alternative for calculating batch predictions without having to transfer features stored in a database to the machine where the model is located. Instead, we will convert the model predictions into SQL commands and thereby transfer the scoring process to the database.</p>
<p><img src="/post/2018-11-05-Roland-xgboost_files/xgboost_workflows.PNG" height = "400" width="600"></p>
<p>We will convert the <code>xgboost</code> model prediction process into a SQL query, and thereby accomplish the same task while leveraging a cloud database’s scalability to efficiently calculate the predictions.</p>
<p>To accomplish this, we’ll need to work through a few steps. First, we’ll import the model as a list of nested tree structures that we can iterate through recursively. Then, we’ll create a function that will recursively descend through a tree and translate it into a SQL CASE statement. After that, we’ll create a query that sums the CASE statements for all trees and then applies the logistic (inverse-logit) function to that sum to calculate a probability.</p>
<p>This first block of code loads the required packages and converts the model object to a list of trees that we can work with:</p>
<pre><code class="language-r">library(xgboost)
library(jsonlite)
library(whisker)
# our model exists in the variable `xgb_model`:
# dump the list of trees as JSON and import it as `model_trees` using jsonlite
model_trees <- jsonlite::fromJSON(
  xgb.dump(xgb_model, with_stats = FALSE, dump_format='json'),
  simplifyDataFrame = FALSE)
</code></pre>
<p><img src="/post/2018-11-05-Roland-xgboost_files/xgboost_tree_structure.PNG" height = "400" width="600"></p>
<p>Now, we need to translate each tree into a SQL CASE statement. Each tree represents a set of decisions based on whether a variable (the ‘split’) is less than a threshold value (the ‘split_condition’). The result of the decision could be ‘yes’, ‘no’, or ‘missing’. In each case, the tree provides the ‘node_id’ of the next decision to evaluate. When we reach a leaf, no decision needs to be made and instead a value is returned. An example tree is shown in the figure above.</p>
<p>We’ll also need a dictionary that maps an integer to its associated feature name, since the trees themselves refer to 0-indexed integers instead of the feature names. We can accomplish that by creating the following list:</p>
<pre><code class="language-r">feature_dict <- as.list(xgb_model$feature_names)
</code></pre>
<p>Using our <code>feature_dict</code> object, we can recursively descend through the tree and translate each node into a CASE statement, producing a sequence of nested CASE statements. The following function does just that:</p>
<pre><code class="language-r">xgb_tree_sql <- function(tree, feature_dict, sig=5){
  # split variables must exist to generate subquery for tree children
  sv <- c("split", "split_condition", "yes", "no", "missing", "children")
  # we have a leaf, just return the leaf value
  if("leaf" %in% names(tree)){
    return(round(tree[['leaf']],sig))
  }
  else if(all(sv %in% names(tree))){
    # look up the feature name for this split
    tree$split_long <- feature_dict[[tree$split+1]] # +1 because xgboost is 0-indexed
    # map the child node ids to their positions (1 or 2) in tree$children
    cs <- c(tree$yes, tree$no, tree$missing)
    cd <- data.frame(
      k = c(min(cs), max(cs)),
      v = c(1,2)
    )
    # recurse into each branch, passing `sig` along so rounding stays consistent
    tree$missing_sql <- xgb_tree_sql(tree$children[[cd$v[cd$k==tree$missing]]], feature_dict, sig)
    tree$yes_sql <- xgb_tree_sql(tree$children[[cd$v[cd$k==tree$yes]]], feature_dict, sig)
    tree$no_sql <- xgb_tree_sql(tree$children[[cd$v[cd$k==tree$no]]], feature_dict, sig)
    q <- "
    CASE
    WHEN {{{split_long}}} IS NULL THEN {{{missing_sql}}}
    WHEN {{{split_long}}} < {{{split_condition}}} THEN {{{yes_sql}}}
    ELSE {{{no_sql}}}
    END
    "
    return(whisker.render(q,tree))
  }
}
</code></pre>
<p>When we transform one tree into a sequence of nested CASE statements, we are producing a statement that yields that tree’s contribution to the total score. We now need to sum the output of each tree and then calculate the total probability prediction. In other words, we need to add up a list of nested CASE statements and then apply the logistic (inverse-logit) function to the result.</p>
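<p>The arithmetic for a single subject can be sketched in a few lines of base R. The leaf values below are made up for illustration, and the sketch assumes a binary classification model whose base score adds no offset on the log-odds scale (for example, a <code>base_score</code> of 0.5); a model trained with a different base score would need that offset added to the sum.</p>
<pre><code class="language-r"># hypothetical leaf values returned by three trees for one subject
tree_scores <- c(0.42, -0.13, 0.27)

# the total score is the sum of the per-tree values (log-odds scale)
margin <- sum(tree_scores)

# the logistic function maps the summed score to a probability
prob <- 1 / (1 + exp(-margin))

# base R's plogis() computes the same quantity
all.equal(prob, plogis(margin))
</code></pre>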
<p>Note that below we make use of the R <code>whisker</code> package. This logic-less templating language makes it easy to turn associative arrays into SQL with clearly labeled placeholders. We find this more readable than sequences of <code>paste</code> statements.</p>
<pre><code class="language-r">xgb_sql_score_query <- function(list_of_trees, features_table, feature_dict, key_field = "id"){
  # a swap list to render queries via whisker
  swap <- list(
    key_field = key_field,
    features_table = features_table
  )
  # score_queries contains the score query for each tree in the list_of_trees
  score_queries <- lapply(list_of_trees, function(tree){
    xgb_tree_sql(tree, feature_dict)
  })
  # the query clause to sum the scores from each tree
  swap$sum_of_scores <- paste(score_queries, collapse=' + ')
  # score query that applies the logistic function to the sum_of_scores
  q <- "
  SELECT
  {{{key_field}}},
  1/(1+exp(-1*( {{{sum_of_scores}}} ))) AS score
  FROM `{{{features_table}}}`
  "
  return(whisker.render(q,swap))
}
</code></pre>
<p>We are now ready to generate the score query from our model:</p>
<pre><code class="language-r">library(bigrquery)

queries <- xgb_sql_score_query(
  model_trees,
  'mydataset.my_feature_table',
  feature_dict
)
for(q in queries){
  # example: run the query with the R bigrquery package
  bq_project_query('my_project', q)
}
</code></pre>
<p>In summary, production models typically calculate predictions for <strong>all subjects</strong> on a daily, hourly, or even more frequent basis; however, moving feature data between a database and a local “scoring” machine is expensive and slow. Transferring the scoring calculations to run within the database, as we’ve shown above, can significantly reduce both cost and run time.</p>
<p>The astute reader may notice that, depending on the database, this will only work for a limited number of trees. When that becomes a problem, it is possible to add another layer that stores the summed scores for batches of trees as views or tables, and then aggregates their results. Beyond that, when queries with views become too long, it is possible to add an additional layer that aggregates batches of views into tables. We will save all of this for a future post.</p>
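<p>Until then, here is a minimal sketch of what the batching step might look like, using only base R. The per-tree CASE statements, the batch size, and the <code>id</code> key column are placeholders made up for the example; in practice the statements would come from the tree-to-SQL function above, and the right batch size depends on the database.</p>
<pre><code class="language-r"># hypothetical per-tree CASE statements (stand-ins for real tree output)
score_queries <- sprintf("CASE WHEN f%d IS NULL THEN 0 ELSE 0.1 END", 1:7)
batch_size <- 3  # assumed limit; tune to what the database accepts

# assign each tree to a batch, then sum the CASE statements within a batch
batch_id <- ceiling(seq_along(score_queries) / batch_size)
batch_sums <- vapply(
  split(score_queries, batch_id),
  paste, character(1), collapse = " + "
)

# one intermediate query per batch; each could be stored as a view or table
batch_queries <- sprintf(
  "SELECT id, %s AS batch_score FROM `mydataset.my_feature_table`",
  batch_sums
)
length(batch_queries)
</code></pre>
<p>A final query would then add up <code>batch_score</code> across the batches for each <code>id</code> and apply the logistic function to that total, exactly as the single-query version does.</p>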
<p>Roland Stevenson is a data scientist and consultant who may be reached on <a href="https://www.linkedin.com/in/roland-stevenson/">LinkedIn</a>.</p>
<script>window.location.href='https://rviews.rstudio.com/2018/11/07/in-database-xgboost-predictions-with-r/';</script>
Communicating results with R Markdown
https://rviews.rstudio.com/2018/11/01/r-markdown-a-better-approach/
Thu, 01 Nov 2018 00:00:00 +0000https://rviews.rstudio.com/2018/11/01/r-markdown-a-better-approach/
<p><img src="/post/2018-10-31-Stephens-Communicate_files/r4ds-com.png" height = "400" width="600"></p>
<p>In my training as a consultant, I learned that long hours of analysis were typically followed by equally long hours of preparing for presentations. I had to turn my complex analyses into recommendations, and my success as a consultant depended on my ability to influence decision makers. I used a variety of tools to convey my insights, but over time I increasingly came to rely on <a href="https://rmarkdown.rstudio.com/">R Markdown</a> as my tool of choice. R Markdown is easy to use, allows others to reproduce my work, and has powerful features such as parameterized inputs and multiple output formats. With R Markdown, I can share more work with less effort than I did with previous tools, making me a more effective data scientist. In this post, I want to examine three commonly used communication tools and show how R Markdown is often the better choice.</p>
<div id="microsoft-office" class="section level3">
<h3>Microsoft Office</h3>
<p>The de facto tools for communication in the enterprise are still Microsoft Word, PowerPoint, and Excel. These tools, born in the 80’s and rising to prominence in the 90’s, are used everywhere for sharing reports, presentations, and dashboards. Although Microsoft Office documents are easy to share, they can be cumbersome for data scientists to write because they cannot be written with code. Additionally:</p>
<ul>
<li>They are not reproducible.</li>
<li>They are separate from the code you used to create your analysis.</li>
<li>They can be time-consuming to create and difficult to maintain.</li>
</ul>
<p>In data science, your code - not your report or presentation - is the source of your results. Therefore, your documents should also be based on code! You can accomplish this with R Markdown, which produces documents that are generated by code, reproducible, and easy to maintain. Moreover, R Markdown documents can be rendered in <a href="https://bookdown.org/yihui/rmarkdown/word-document.html">Word</a>, <a href="https://bookdown.org/yihui/rmarkdown/powerpoint-presentation.html">PowerPoint</a>, and many other output formats. So, even if your client insists on having Microsoft documents, by generating them with R Markdown, you can spend more time working on your code and less time maintaining reports.</p>
</div>
<div id="r-scripts" class="section level3">
<h3>R Scripts</h3>
<p>Data science often involves interactive analyses with code, but code by itself is usually not enough to communicate results in an enterprise setting. In a <a href="https://rviews.rstudio.com/2017/03/15/why-i-love-r-notebooks/">previous post</a>, I explained the benefits of using <a href="https://bookdown.org/yihui/rmarkdown/notebook.html">R Notebooks</a> over R scripts for doing data science. An R Notebook is a special execution mode of R Markdown with two characteristics that make it very useful for communicating results:</p>
<ul>
<li>Rendering a preview of an R Notebook does not execute R code, making it computationally convenient to create reports during or after interactive analyses.</li>
<li>R Notebooks have an embedded copy of the source code, making it convenient for others to examine your work.</li>
</ul>
<p>These two characteristics of R Notebooks combine the advantages of R scripts with the advantages of R Markdown. Like R scripts, you can do interactive data analyses and see all your code, but unlike R scripts you can easily create reports that explain why your code is important.</p>
</div>
<div id="shiny" class="section level3">
<h3>Shiny</h3>
<p>Shiny and R Markdown are both used to communicate results. They both depend on R, generate high-quality output, and can be designed to accept user inputs. In previous posts, we discussed <a href="https://rviews.rstudio.com/2017/09/20/dashboards-with-r-and-databases/">Dashboards with Shiny</a> and <a href="https://rviews.rstudio.com/2018/05/16/replacing-excel-reports-with-r-markdown-and-shiny/">Dashboards with R Markdown</a>. Knowing when to use Shiny and when to use R Markdown will increase your ability to influence decision makers.</p>
<table style="width:44%;">
<colgroup>
<col width="20%" />
<col width="23%" />
</colgroup>
<thead>
<tr class="header">
<th>Shiny Apps</th>
<th>R Markdown Documents</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Have an interactive and responsive user experience.</td>
<td>Are snapshots in time, rendered in batch.</td>
</tr>
<tr class="even">
<td>Are hosted on a web server that runs R.</td>
<td>Have multiple output types such as HTML, Word, PDF, and many more.</td>
</tr>
<tr class="odd">
<td>Are not portable (i.e., users must visit the app).</td>
<td>Are files that can be sent via email or otherwise shared.</td>
</tr>
</tbody>
</table>
<p>Shiny is great – even “magical” – when you want your end users to have an interactive experience, but R Markdown documents are often simpler to program, easier to maintain, and can reach a wider audience. I use Shiny when I need an interactive user experience, but for everything else, I use R Markdown.</p>
<p>If you need to accept user input, but you don’t require the reactive framework of Shiny, you can <a href="https://bookdown.org/yihui/rmarkdown/parameterized-reports.html">add parameters</a> to your R Markdown code. This <a href="https://resources.rstudio.com/rstudio-connect-2/parameterized-r-markdown-reports-with-rstudio-connect-aron-atkins">process is easy and powerful</a>, yet remains underutilized by most R users. It is a feature that would benefit a wide range of use cases, especially where the full power of Shiny is not required. Additionally, adding parameters to your document makes it easy to generate multiple versions of that document. If you host a document on <a href="https://www.rstudio.com/products/connect/">RStudio Connect</a>, then users can select inputs and generate new versions on demand. Many Shiny applications today would be better suited as parameterized R Markdown documents.</p>
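<p>As a minimal sketch of that idea, the code below writes a tiny parameterized report to a temporary file and renders one HTML version per parameter value with <code>rmarkdown::render()</code>. The report content, the <code>region</code> parameter, and the output file names are invented for illustration, and rendering assumes that pandoc is installed (it ships with the RStudio IDE).</p>
<pre class="r"><code>library(rmarkdown)

# write a tiny parameterized report to a temporary file (illustrative content)
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Regional summary\"",
  "output: html_document",
  "params:",
  "  region: \"East\"",
  "---",
  "",
  "This report covers the `r params$region` region."
), rmd)

# render one version of the document per parameter value
for (region in c("East", "West")) {
  render(rmd,
         params = list(region = region),
         output_file = paste0("summary-", region, ".html"),
         output_dir = tempdir(),
         quiet = TRUE)
}</code></pre>
<p>The default declared in the YAML header is used whenever <code>params</code> is not supplied, so the same file still knits on its own.</p>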
<p>Finally, Shiny and R Markdown are not mutually exclusive. You can include Shiny elements in an R Markdown document, which enables you to create a report that responds interactively to user inputs. These <a href="https://bookdown.org/yihui/rmarkdown/shiny-documents.html">Shiny documents</a> are created with the simplicity of R Markdown, but have the same hosting requirements as a Shiny app and are not portable.</p>
</div>
<div id="summary" class="section level3">
<h3>Summary</h3>
<p>Using the right tools for communication matters. R Markdown is a better solution than conventional tools for the following problems:</p>
<table>
<thead>
<tr class="header">
<th>Problem</th>
<th>Common tool</th>
<th>Better tool</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Share reports and presentations</td>
<td>Microsoft Office</td>
<td>R Markdown</td>
</tr>
<tr class="even">
<td>Summarize and share your interactive analyses</td>
<td>R Scripts</td>
<td>R Notebooks</td>
</tr>
<tr class="odd">
<td>Update results (in batch) based on new inputs</td>
<td>Shiny</td>
<td>Parameterized reports</td>
</tr>
</tbody>
</table>
<p><a href="http://r4ds.had.co.nz/index.html">R For Data Science</a> explains that, <em>“It doesn’t matter how great your analysis is unless you can explain it to others: you need to communicate your results.”</em> I highly recommend reading <a href="https://r4ds.had.co.nz/communicate-intro.html">Part V</a> of this book, which has chapters on using <a href="https://r4ds.had.co.nz/r-markdown.html">R Markdown</a> as a unified authoring framework for data science, using <a href="https://r4ds.had.co.nz/r-markdown-formats.html">R Markdown formats</a> for effective communication, and using <a href="https://r4ds.had.co.nz/r-markdown-workflow.html">R Markdown workflows</a> to create analysis notebooks. There are references at the end of these chapters that describe where to learn more about communication.</p>
</div>
<script>window.location.href='https://rviews.rstudio.com/2018/11/01/r-markdown-a-better-approach/';</script>
Reproducible Finance, the book! And a discount for our readers
https://rviews.rstudio.com/2018/10/29/reproducible-finance-the-book/
Mon, 29 Oct 2018 00:00:00 +0000https://rviews.rstudio.com/2018/10/29/reproducible-finance-the-book/
<p>I’m thrilled to announce the release of my new book <a href="https://www.crcpress.com/Reproducible-Finance-with-R-Code-Flows-and-Shiny-Apps-for-Portfolio-Analysis/Regenstein-Jr/p/book/9781138484030">Reproducible Finance with R: Code Flows and Shiny Apps for Portfolio Analysis</a>, which originated as a series of R Views posts in this space. The <a href="https://rviews.rstudio.com/2016/11/09/reproducible-finance-with-r-the-sharpe-ratio/">first post</a> was written way back in November of 2016 - thanks to all the readers who have supported us along the way!</p>
<p>If you are familiar with the R Views posts, then you probably have a pretty good sense for the book’s style, prose, and code approach, but I’d like to add a few quick words of background.</p>
<p>The book’s practical motivations are: (1) to introduce R to finance professionals, or aspiring finance professionals, who wish to move beyond Excel for their quantitative work, and (2) to introduce various finance coding paradigms to R coders.</p>
<p>The softer motivation is to demonstrate and emphasize readable, reusable, and reproducible R code, data visualizations, and Shiny dashboards. It will be very helpful to have some background in the R programming language <em>or</em> in finance, but the most important thing is a desire to learn about the landscape of R code and finance packages.</p>
<p>An overarching goal of the book is to introduce the three major R paradigms for portfolio analysis: <code>xts</code>, the <code>tidyverse</code>, and <code>tidyquant</code>. As a result, we will frequently run the same analysis using three different code flows.</p>
<p>If that ‘three-universe’ structure seems a bit unclear, have a quick look back at <a href="https://rviews.rstudio.com/2017/12/13/introduction-to-skewness/">this post on skewness</a> and <a href="https://rviews.rstudio.com/2018/01/04/introduction-to-kurtosis/">this post on kurtosis</a>, and you’ll notice that we solve the same task and get the same result with different code paths.</p>
<p>For example, if we had portfolio returns saved in a tibble object called <code>portfolio_returns_tq_rebalanced_monthly</code>, and an <code>xts</code> object called <code>portfolio_returns_xts_rebalanced_monthly</code>, and our goal was to find the Sharpe Ratio of portfolio returns, we would start in the <code>xts</code> world with <code>SharpeRatio()</code> from the <code>PerformanceAnalytics</code> package.</p>
<pre class="r"><code># define risk free rate variable
rfr <- .0003
sharpe_xts <-
SharpeRatio(portfolio_returns_xts_rebalanced_monthly,
Rf = rfr,
FUN = "StdDev") %>%
`colnames<-`("sharpe_xts")
sharpe_xts</code></pre>
<pre><code>## sharpe_xts
## StdDev Sharpe (Rf=0%, p=95%): 0.2748752</code></pre>
<p>We next would use the tidyverse and run our calculations in a piped flow:</p>
<pre class="r"><code>sharpe_tidyverse_byhand <-
portfolio_returns_tq_rebalanced_monthly %>%
summarise(sharpe_dplyr = mean(returns - rfr)/
sd(returns - rfr))
sharpe_tidyverse_byhand</code></pre>
<pre><code>## # A tibble: 1 x 1
## sharpe_dplyr
## <dbl>
## 1 0.275</code></pre>
<p>And then head to the <code>tidyquant</code> paradigm where we can apply the <code>SharpeRatio()</code> function to a tidy tibble.</p>
<pre class="r"><code>sharpe_tq <-
portfolio_returns_tq_rebalanced_monthly %>%
tq_performance(Ra = returns,
performance_fun = SharpeRatio,
Rf = rfr,
FUN = "StdDev") %>%
`colnames<-`("sharpe_tq")</code></pre>
<p>We can compare our three Sharpe objects and confirm consistent results.</p>
<pre class="r"><code>sharpe_tq %>%
mutate(tidy_sharpe = sharpe_tidyverse_byhand$sharpe_dplyr,
xts_sharpe = sharpe_xts)</code></pre>
<pre><code>## # A tibble: 1 x 3
## sharpe_tq tidy_sharpe xts_sharpe
## <dbl> <dbl> <dbl>
## 1 0.275 0.275 0.275</code></pre>
<p>We might be curious how the Sharpe Ratio and standard deviation of our portfolio compare to those of the component ETFs; a <code>ggplot</code> scatter plot is a nice way to visualize that.</p>
<pre class="r"><code>asset_returns_long %>%
na.omit() %>%
summarise(stand_dev = sd(returns),
sharpe = mean(returns - rfr)/
sd(returns - rfr)) %>%
add_row(asset = "Portfolio",
stand_dev =
portfolio_sd_xts_builtin[1],
sharpe =
sharpe_tq$sharpe_tq) %>%
ggplot(aes(x = stand_dev,
y = sharpe,
color = asset)) +
geom_point(size = 2) +
geom_text(
aes(x =
sd(portfolio_returns_tq_rebalanced_monthly$returns),
y =
sharpe_tq$sharpe_tq + .02,
label = "Portfolio")) +
ylab("Sharpe Ratio") +
xlab("standard deviation") +
ggtitle("Sharpe Ratio versus Standard Deviation") +
# The next line centers the title
theme_update(plot.title = element_text(hjust = 0.5))</code></pre>
<div class="figure"><span id="fig:unnamed-chunk-5"></span>
<img src="/post/2018-10-22-reproducible-finance-the-book_files/figure-html/unnamed-chunk-5-1.png" alt="Sharpe versus Standard Deviation" width="672" />
<p class="caption">
Figure 1: Sharpe versus Standard Deviation
</p>
</div>
<p>Finally, we are ready to calculate and visualize the Sharpe Ratio of a custom portfolio with Shiny and a flexdashboard, like the one found <a href="http://www.reproduciblefinance.com/shiny/sharpe-ratio/">here</a>.</p>
<p>As in the above example, most tasks in the book end with data visualization and then Shiny (a few early readers have commented with happy surprise that all the charts and code are in full color in the book - thanks to CRC press for making that happen!). Data visualization and Shiny are heavily emphasized - more so than in other finance books - and that might seem unusual. After all, every day we hear about how the financial world is becoming more quantitatively driven as firms race towards faster, more powerful algorithms. Why emphasize good ol’ data visualization? I believe, and have observed first-hand, that the ability to communicate and tell the story of data in a compelling way is only going to become more crucial as the financial world becomes more complex. Investors, limited partners, portfolio managers, clients, risk managers - they might not want to read our code or see our data, but we still need to communicate to them the value of our work. Data visualization and Shiny dashboards are a great way to do that. By book’s end, a reader will have built a collection of live, functioning flexdashboards that can be the foundation for more complex apps in the future.</p>
<p>If you’ve read this far, good news! Between now and December 31, 2018, there’s a 20% discount on the book being run at <a href="https://crcpress.com/Reproducible-Finance-with-R-Code-Flows-and-Shiny-Apps-for-Portfolio-Analysis/Jr/p/book/9781138484030">CRC</a>, and if you don’t see it applied, readers can use discount code SS120 on the <a href="https://crcpress.com/Reproducible-Finance-with-R-Code-Flows-and-Shiny-Apps-for-Portfolio-Analysis/Jr/p/book/9781138484030">CRC website</a>. The book is also available on <a href="https://www.amazon.com/Reproducible-Finance-Portfolio-Analysis-Chapman/dp/1138484032">Amazon as Kindle or paperback</a> (but there are fewer than 10 copies left as of right now).</p>
<p>Thanks so much for reading, and happy coding!</p>
<script>window.location.href='https://rviews.rstudio.com/2018/10/29/reproducible-finance-the-book/';</script>
CRAN’s New Missing Data Task View
https://rviews.rstudio.com/2018/10/26/cran-s-new-missing-values-task-view/
Fri, 26 Oct 2018 00:00:00 +0000https://rviews.rstudio.com/2018/10/26/cran-s-new-missing-values-task-view/
<p>It is a relatively rare event, and cause for celebration, when CRAN gets a new Task View. This week the <a href="https://github.com/R-miss-tastic">r-miss-tastic</a> team: Julie Josse, Nicholas Tierney and Nathalie Vialaneix launched the <a href="https://CRAN.R-project.org/view=MissingData">Missing Data Task View</a>. Even though I did some research on R packages for a <a href="https://rviews.rstudio.com/2016/11/30/missing-values-data-science-and-r/">post on missing values</a> a couple of years ago, I was dumbfounded by the number of packages included in the new Task View.</p>
<iframe src="https://cran.r-project.org/view=MissingData" width="90%" height="300"> </iframe>
<hr />
<p>This single page not only describes what R has to offer with respect to coping with missing data, it is probably the world’s most complete index of statistical knowledge on the subject. But the task view is not just devoted to esoterica. The <em>Exploration of missing data</em> section contains a number of tools that should be useful in any data wrangling effort, like this diagnostic plot from the <code>dlookr</code> package.</p>
<p><img src="/post/2018-10-25-Missing-Values_files/dlookr.png" height = "300" width=90%></p>
<p>The downside that may curb one’s enthusiasm is that mastering missing data techniques requires some serious study. Missing data problems are among the most vexing in statistics, and the newer techniques for tackling these problems are relatively sophisticated. So, unless you are a <code>na.omit()</code> kind of guy/gal (a data scientist?), coming to grips with <code>NA</code>s may involve a subtle inferential task embedded in the usual data wrangling effort.</p>
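<p>To make the <code>na.omit()</code> approach concrete, here is a small sketch using the <code>airquality</code> data set that ships with base R (the choice of data set and column is just for illustration). It shows how many rows listwise deletion discards and how a summary statistic can shift depending on which rows are kept.</p>
<pre class="r"><code># airquality ships with base R and has missing values in some columns
colSums(is.na(airquality))

# listwise deletion: drop every row that contains at least one NA
complete <- na.omit(airquality)
c(all_rows = nrow(airquality), complete_rows = nrow(complete))

# the same column summarized on two different sets of rows
c(all_available = mean(airquality$Solar.R, na.rm = TRUE),
  complete_only = mean(complete$Solar.R))</code></pre>
<p>Whether either number is a sensible estimate depends on why the values are missing, which is exactly the inferential question the Task View packages are meant to help with.</p>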
<p>It is also the case that it is not easy to find good introductory material on the subject. The notable exception is Stef van Buuren’s authoritative R-based textbook, <a href="https://stefvanbuuren.name/fimd/">Flexible Imputation of Missing Data</a> which he has graciously made available online.</p>
<p>I also found the review papers by <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3701793/">Dong and Peng</a> and by <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839993/">Horton and Kleinman</a> to be helpful. And, if you read French a little better than I do, the review by <a href="http://journal-sfds.fr/article/view/681">Imbert and Vialaneix</a> looks like it covers all of the basic material.</p>
<p>Finally, I should mention that the Missing Data Task View is part of the R Consortium funded project <a href="https://www.r-consortium.org/projects/awarded-projects">A unified platform for missing values methods and workflows</a>. Many thanks to the ISC for making it possible for the expert r-miss-tastic team to do the work.</p>
<script>window.location.href='https://rviews.rstudio.com/2018/10/26/cran-s-new-missing-values-task-view/';</script>
Searching for R Packages
https://rviews.rstudio.com/2018/10/22/searching-for-r-packages/
Mon, 22 Oct 2018 00:00:00 +0000https://rviews.rstudio.com/2018/10/22/searching-for-r-packages/
<script src="/rmarkdown-libs/htmlwidgets/htmlwidgets.js"></script>
<link href="/rmarkdown-libs/vis/vis.css" rel="stylesheet" />
<script src="/rmarkdown-libs/vis/vis.min.js"></script>
<script src="/rmarkdown-libs/visNetwork-binding/visNetwork.js"></script>
<script src="/rmarkdown-libs/FileSaver/FileSaver.min.js"></script>
<script src="/rmarkdown-libs/Blob/Blob.js"></script>
<script src="/rmarkdown-libs/canvas-toBlob/canvas-toBlob.js"></script>
<script src="/rmarkdown-libs/html2canvas/html2canvas.js"></script>
<script src="/rmarkdown-libs/jspdf/jspdf.debug.js"></script>
<hr />
<p>Searching for R packages is a vexing problem for both new and experienced R users. With over 13,000 packages already on <a href="https://cran.r-project.org/web/packages/">CRAN</a>, and new packages arriving at a rate of almost 200 per month, it is impossible to keep up. Package names can be almost anything, and they are rarely informative, so <a href="https://cran.r-project.org/web/packages/available_packages_by_name.html">searching by name</a> is of little help. I make it a point to look at all of the new packages arriving on CRAN each month, but after a month or so, when asked about packages related to some particular topic, more often than not, I have little more to offer than a vague memory that I saw something that might be useful.</p>
<p>Fortunately, package developers have provided some very useful tools, if you know where to look. :) This post presents a search strategy based on some relatively new packages I have come across in my monthly review.</p>
<pre class="r"><code>library(tidyverse)</code></pre>
<pre><code>## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──</code></pre>
<pre><code>## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.5
## ✔ tidyr 0.8.1 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0</code></pre>
<pre><code>## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()</code></pre>
<pre class="r"><code>library(packagefinder)
library(dlstats)
library(cranly)</code></pre>
<p><a href="https://CRAN.R-project.org/package=packagefinder">packagefinder v0.0.7</a>, which appeared on CRAN this past July, goes right to the heart of the problem and shows great promise. The function <code>findPackage()</code> allows you to do a keyword search through the metadata of all CRAN packages. Since I am researching a possible post on <a href="https://thomasleeper.com/Rcourse/Tutorials/permutationtests.html">Permutation Tests</a>, I thought I would give <code>packagefinder::findPackage()</code> the most straightforward search text I could think of. (Note that the link for <code>Permutation Tests</code> above goes to an example by Thomas Leeper that references the <code>coin</code> package. This is a pretty strong hint that I expect to find <code>coin</code> prominently listed among the results.)</p>
<p>Also note that making the output a <code>tibble</code> is not just obsessive-compulsive tidy behavior. The default print method sends the output to the Viewer in the RStudio IDE, so converting to a <code>tibble</code> keeps the results in the console.</p>
<pre class="r"><code>pt_pkg <- as_tibble(findPackage("permutation test"))</code></pre>
<pre><code>##
## 59 out of 13256 CRAN packages found in 6 seconds.</code></pre>
<pre class="r"><code>pt_pkg</code></pre>
<pre><code>## # A tibble: 59 x 5
## SCORE NAME DESC_SHORT DOWNL_TOTAL GO
## <dbl> <chr> <chr> <S3: format> <fct>
## 1 100 permutes Permutation Tests for Time Series … NA 8300
## 2 75 AUtests Approximate Unconditional and Perm… NA 502
## 3 75 jmuOutlier Permutation Tests for Nonparametri… NA 5564
## 4 75 lmPerm Permutation Tests for Linear Models NA 6083
## 5 75 NetRep Permutation Testing Network Module… NA 7453
## 6 75 perm Exact or Asymptotic permutation te… NA 8289
## 7 75 permDep Permutation Tests for General Depe… NA 8292
## 8 75 permuco "Permutation Tests for Regression,… NA 8297
## 9 75 RATest Randomization Tests NA 9287
## 10 75 treeperm Exact and Asymptotic K Sample Perm… NA 12442
## # ... with 49 more rows</code></pre>
<p>Unfortunately, the package is very new and not well-documented. It is not clear how <code>SCORE</code> is computed, and <code>DOWNL_TOTAL</code> is replete with <code>NA</code>s. Nevertheless, the function does seem to find packages. I can’t vouch for its completeness, but when I tried it out on some topics with which I am familiar, it did a pretty thorough job. Note that <code>findPackage()</code> allows a user to set a weights parameter that affects how the search scores “hits in the package’s title, short description and long description”. So far, I have not found this to be particularly useful, but I have not spent a lot of time with it, either.</p>
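<p>To make the idea of field weights concrete, here is a minimal base R sketch. It is emphatically not how <code>packagefinder</code> computes <code>SCORE</code>; the toy package table, the <code>hit()</code> and <code>toy_score()</code> helpers, and the weight values are all hypothetical, chosen only to show how a hit in a title can be made to count for more than a hit in a description.</p>
<pre class="r"><code># Hypothetical illustration only: NOT packagefinder's scoring algorithm.
toy_pkgs <- data.frame(
  name  = c("pkgA", "pkgB", "pkgC"),
  title = c("Permutation Tests for Widgets", "Fast Widgets", "Widget Plots"),
  short = c("Exact tests.", "Includes a permutation test.", "Plots only."),
  long  = c("Runs permutation tests.", "See the vignette.", "No tests here."),
  stringsAsFactors = FALSE
)

# 1 if the keyword occurs in the field (case-insensitive), 0 otherwise
hit <- function(field, keyword) {
  as.numeric(grepl(keyword, field, ignore.case = TRUE))
}

# Weighted sum of hits in the title, short description and long description
toy_score <- function(pkgs, keyword,
                      weights = c(title = 2, short = 1, long = 1)) {
  weights[["title"]] * hit(pkgs$title, keyword) +
    weights[["short"]] * hit(pkgs$short, keyword) +
    weights[["long"]]  * hit(pkgs$long,  keyword)
}

toy_pkgs$score <- toy_score(toy_pkgs, "permutation")

# Rank the toy packages from highest to lowest score
toy_pkgs[order(-toy_pkgs$score), c("name", "score")]</code></pre>
<p>Raising the title weight pushes packages that name the topic in their title further up the ranking, which is presumably the kind of control the real weights parameter is meant to offer.</p>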
<p>The next line of code just selects the columns we will be using.</p>
<pre class="r"><code>pt_pkg <- select(pt_pkg, NAME, DESC_SHORT)</code></pre>
<p>Now that we have a list of packages of interest, it would be nice to have an indication of the quality and usefulness of the packages selected. A natural measure of usefulness is the number of times the package has been downloaded. For this, we turn to the <code>cran_stats()</code> function from the <a href="https://cran.r-project.org/package=dlstats"><code>dlstats</code> package</a>. This function takes a vector of package names as input, queries the <a href="http://cran-logs.rstudio.com/">RStudio download logs</a>, and returns a data frame listing the number of downloads by month for each package.</p>
<pre class="r"><code>pt_downloads <- cran_stats(pt_pkg$NAME)
dim(pt_downloads)</code></pre>
<pre><code>## [1] 2784 4</code></pre>
<pre class="r"><code>head(pt_downloads)</code></pre>
<pre><code>## start end downloads package
## 4485 2018-05-01 2018-05-31 52 permutes
## 4544 2018-06-01 2018-06-30 89 permutes
## 4603 2018-07-01 2018-07-31 92 permutes
## 4662 2018-08-01 2018-08-31 74 permutes
## 4721 2018-09-01 2018-09-30 227 permutes
## 4780 2018-10-01 2018-10-22 142 permutes</code></pre>
<p>Just a little wrangling yields a data frame that lists total downloads for each package over its lifespan.</p>
<pre class="r"><code>top_downloads <- pt_downloads %>% group_by(package) %>%
summarize(downloads = sum(downloads)) %>%
arrange(desc(downloads))
head(top_downloads,10)</code></pre>
<pre><code>## # A tibble: 10 x 2
## package downloads
## <fct> <int>
## 1 coin 1103426
## 2 exactRankTests 137674
## 3 RVAideMemoire 108837
## 4 perm 97071
## 5 logcondens 83033
## 6 HardyWeinberg 55735
## 7 biotools 47694
## 8 smacof 45257
## 9 SNPassoc 38920
## 10 broman 30956</code></pre>
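<p>The ranking above no longer carries the package descriptions. One way to bring them back is a join against the search results. The sketch below uses small stand-in tibbles (<code>toy_pkg</code> and <code>toy_totals</code>, with made-up package names and counts) rather than the real objects, so it runs without querying CRAN.</p>
<pre class="r"><code>library(dplyr)
library(tibble)

# Toy stand-ins (hypothetical values) for pt_pkg and top_downloads
toy_pkg <- tibble(
  NAME       = c("pkgA", "pkgB", "pkgC"),
  DESC_SHORT = c("Tests for A", "Tests for B", "Tests for C")
)
toy_totals <- tibble(
  package   = c("pkgB", "pkgA"),
  downloads = c(500L, 120L)
)

# Keep every row of the totals and look up the matching description
toy_ranked <- toy_totals %>%
  left_join(toy_pkg, by = c("package" = "NAME")) %>%
  arrange(desc(downloads))
toy_ranked</code></pre>
<p>In the session above, <code>package</code> is a factor while <code>NAME</code> is a character column, so converting with <code>as.character()</code> before joining is likely the tidier route.</p>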
<p>As expected, <code>coin</code> has flipped to the head of the list. Plotting the downloads over time shows that the package has increased in popularity over the past five years, and it looks like people have been doing a crazy amount of permutation testing over the past year or so.</p>
<pre class="r"><code>top_pkgs <- pt_downloads %>% filter(package %in% top_downloads$package[1:3])
ggplot(top_pkgs, aes(end, downloads, group=package, color=package)) +
geom_line() + geom_point(aes(shape=package))</code></pre>
<p><img src="/post/2018-10-17-searching-for-r-packages_files/figure-html/unnamed-chunk-7-1.png" width="672" /></p>
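<p>One caveat when reading a plot like this: the most recent row returned by <code>cran_stats()</code> can cover only part of a month (the <code>head()</code> output above ends on 2018-10-22), so the last point may dip for no real reason. A minimal sketch of one way to drop incomplete months before plotting follows. The data frame <code>toy_downloads</code> is a hand-built stand-in whose three rows are copied from the output above, and the "day after <code>end</code> is the first of a month" rule is my own assumption, not something <code>dlstats</code> provides.</p>
<pre class="r"><code>library(dplyr)

# Stand-in for pt_downloads: three rows copied from the head() output above
toy_downloads <- data.frame(
  start     = as.Date(c("2018-08-01", "2018-09-01", "2018-10-01")),
  end       = as.Date(c("2018-08-31", "2018-09-30", "2018-10-22")),
  downloads = c(74L, 227L, 142L),
  package   = "permutes",
  stringsAsFactors = FALSE
)

# A month is complete when the day after `end` is the first of a month
complete_months <- toy_downloads %>%
  filter(format(end + 1, "%d") == "01")
complete_months</code></pre>
<p>Applying the same <code>filter()</code> to the real download data before calling <code>ggplot()</code> would remove the partial October point from each line.</p>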
<p>One way to gauge the quality and reliability of a package is to see how many other packages depend on it. These would be the packages listed as “Reverse depends” and “Reverse imports” on the CRAN page for a package. Using the canonical link, <a href="https://cran.r-project.org/package=coin" class="uri">https://cran.r-project.org/package=coin</a>, we see that 24 packages are listed in these fields on the <code>coin</code> page.</p>
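<p>The same count can be obtained programmatically with <code>tools::package_dependencies()</code>, which ships with R. The sketch below runs it against a tiny hand-made package database (<code>toy_db</code>, with hypothetical packages <code>pkgA</code>, <code>pkgB</code> and <code>pkgC</code>) so that it works offline; it illustrates the call rather than reproducing the CRAN figure.</p>
<pre class="r"><code># Toy package database (hypothetical packages) with the columns that
# tools::package_dependencies() reads
toy_db <- cbind(
  Package = c("coin", "pkgA", "pkgB", "pkgC"),
  Depends = c(NA, NA, "R (>= 3.0), coin", NA),
  Imports = c("stats", "coin (>= 1.0)", NA, "stats")
)

# reverse = TRUE asks which packages list "coin" in Depends or Imports
rev_deps <- tools::package_dependencies(
  "coin",
  db      = toy_db,
  which   = c("Depends", "Imports"),
  reverse = TRUE
)
rev_deps$coin
length(rev_deps$coin)</code></pre>
<p>Replacing <code>toy_db</code> with <code>available.packages()</code> should give the live answer for CRAN, which will drift from the number quoted above as packages come and go.</p>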
<p>Likewise, knowing something of an author’s background, his or her experience writing other R packages, and the prominent R developers he or she may have collaborated with is also helpful in assessing whether a newly found package is worth a try. The same link above also shows the package’s authors. Checking the <a href="http://www.cran.r-project.org/contributors.html">Contributors page</a> for the R Project, we see that two authors are members of R Core and the lead author, Torsten Hothorn, is listed with the contributors who have provided “invaluable help”. The background and collaborators couldn’t be better.</p>
<p>In most cases, background checks aren’t so easy. However, with the help of the <code>build_network()</code> function from the <a href="https://cran.r-project.org/web/packages/cranly/index.html">cranly package</a>, it is simple to track down an author’s collaboration network. Here, we see that Torsten has an extensive network of collaborators.</p>
<pre class="r"><code>p_db <- tools::CRAN_package_db()
clean_p_db <- clean_CRAN_db(p_db)
author_net <- build_network(object = clean_p_db, perspective = "author")
plot(author_net, author = "Torsten Hothorn", exact = FALSE)</code></pre>
<div id="htmlwidget-1" style="width:672px;height:480px;" class="visNetwork html-widget"></div>
<script type="application/json" data-for="htmlwidget-1">{"x":{"nodes":{"colorlabel":["Tony Plate","Richard Heiberger","Gilles Kratzer","Henrik Singmann","Sandrine Pavoine","Daniel Chessel","Stephane Dray","Christian Kleiber","Achim Zeileis","Ben Bolker","John Fox","Kevin Wright","Martin Maechler","Jeremy VanDerWal","Michael Sumner","Kamil Erguler","Max Kuhn","Hadley Wickham","Dirk Eddelbuettel","Renaud Lancelot","Simon Blomberg","Jim Lemon","Karline Soetaert","Friedrich Leisch","Arni Magnusson","Christian Buchta","Kurt Hornik","Ken Aho","Nick Parsons","Rolf Turner","Jon Eugster","Andrea Farnham","Raphael Hartmann","Tea Isler","Ke Li","Silvia Panunzi","Sophie Schneider","Craig Wang","Torsten Hothorn","Terry Therneau","Alexandros Karatzoglou","Andrie de Vries","Jeff Enos","Bill Venables","Roman Hornung","Brian Ripley","Peter Buehlmann","Barry Eggleston","Christopher Jackson","Thomas Kneib","Andreas Mayr","Benjamin Hofner","Matthias Schmid","Romain Francois","Mikko Korpela","Fabian Scheipl","Greg Snow","Kevin Ushey","Bjoern Bornkamp","Duncan Murdoch","Dieter Menne","Uwe Ligges","Klaus Nordhausen","Zhu Wang","Michael Friendly","David Ruegamer","Thomas Petzoldt","Spencer Graves","Henric Nilsson","Derek Ogle","David Winsemius","Roger Bivand","Andreas Alfons","Michael Smithson","Gabor Grothendieck","Matthieu Stigler","Venkatraman E Seshan","Andreas Bender","Mark A van de Wiel","Henric Winell","Tyler Rinker","Alec Stephenson","Christian W Hoffmann","Tal Galili","Gregory R Warnes","Barry Rowlingson","Rob J Hyndman","Joshua Ulrich","Marc Schwartz","Andri Signorell","Nanina Anderegg","Tomas Aragon","Antti Arppe","Adrian Baddeley","Kamil Barton","Frederico Caeiro","Stephane Champely","Leanne Chhay","Clint Cummins","Michael Dewey","Harold C Doran","Charles Dupont","Claus Ekstrom","Martin Elff","Richard W Farebrother","Matthias Gamer","Joseph L Gastwirth","Yulia R Gel","Juergen Gross","Frank E Harrell Jr","Michael Hoehle","Markus Huerzeler","Wallace W Hui","Pete Hurd","Pablo J 
Villacorta Iglesias","Matthias Kohl","Detlew Labes","Friederich Leisch","Dong Li","Daniel Malter","George Marsaglia","John Marsaglia","Alina Matei","David Meyer","Weiwen Miao","Giovanni Millo","Yongyi Min","David Mitchell","Markus Naepflin","Daniel Navarro","Hong Ooi","Roland Rapold","William Revelle","Caroline Rodriguez","Nathan Russell","Nick Sabbe","Werner A Stahel","Mark Stevenson","Matthias Templ","Yves Tille","Adrian Trapletti","John Verzani","Stefan Wellek","Rand R Wilcox","Peter Wolf","Daniel Wollschlaeger","Thomas Yee","Detlef Steuer","Frank Bretz","Andy Bunn","Sarah Goslee","Sarah Brockhaus","R community","Peter Dalgaard","Kjetil Brinchmann Halvorsen","Ray Brownrigg","David L Reiner","Berton Gunter","Roger Koenker","Charles Berry","Peter Dunn","Roland Rau","Mark Leeds","Emmanuel Charpentier","Chris Evans","Paolo Sonego","Peter Ehlers","Liviu Andronic","Brian Diggs","Richard M Heiberger","Patrick Burns","R Michael Weylandt","Jon Olav Skoien","Francois Morneau","Antony Unwin","Joshua Wiley","Bryan Hanson","Eduard Szoecs","Gregor Passolt","John C Nash","Matthias Speidel","Anne-Laure Boulesteix","Hannah Frick","Christina Riedel","Martin Spindler","Ivan Kondofersky","Oliver S Kuehnle","Christian Lindenlaub","Georg Pfundstein","Ariane Straub","Florian Wickler","Katharina Zink","Manuel Eugster","Heidi Seibold","Brian S Everitt","Andrea Peters","Beth Atkinson","Fabian Sobotka","Alan Genz","Nikhil Garge","Georgiy Bobashev","Benjamin Carper","Kasey Jones","Carolin Strobl","Basil Abou El-Komboz","Abdelilah El Hadad","Laura Goeres","Max Hughes-Brandl","Peter Westfall","Andre Schuetzenmeister","Susan Scheibe","Tetsuhisa Miwa","Xuefei Mi"],"id":["Tony Plate","Richard Heiberger","Gilles Kratzer","Henrik Singmann","Sandrine Pavoine","Daniel Chessel","Stephane Dray","Christian Kleiber","Achim Zeileis","Ben Bolker","John Fox","Kevin Wright","Martin Maechler","Jeremy VanDerWal","Michael Sumner","Kamil Erguler","Max Kuhn","Hadley Wickham","Dirk Eddelbuettel","Renaud 
Lancelot","Simon Blomberg","Jim Lemon","Karline Soetaert","Friedrich Leisch","Arni Magnusson","Christian Buchta","Kurt Hornik","Ken Aho","Nick Parsons","Rolf Turner","Jon Eugster","Andrea Farnham","Raphael Hartmann","Tea Isler","Ke Li","Silvia Panunzi","Sophie Schneider","Craig Wang","Torsten Hothorn","Terry Therneau","Alexandros Karatzoglou","Andrie de Vries","Jeff Enos","Bill Venables","Roman Hornung","Brian Ripley","Peter Buehlmann","Barry Eggleston","Christopher Jackson","Thomas Kneib","Andreas Mayr","Benjamin Hofner","Matthias Schmid","Romain Francois","Mikko Korpela","Fabian Scheipl","Greg Snow","Kevin Ushey","Bjoern Bornkamp","Duncan Murdoch","Dieter Menne","Uwe Ligges","Klaus Nordhausen","Zhu Wang","Michael Friendly","David Ruegamer","Thomas Petzoldt","Spencer Graves","Henric Nilsson","Derek Ogle","David Winsemius","Roger Bivand","Andreas Alfons","Michael Smithson","Gabor Grothendieck","Matthieu Stigler","Venkatraman E Seshan","Andreas Bender","Mark A van de Wiel","Henric Winell","Tyler Rinker","Alec Stephenson","Christian W Hoffmann","Tal Galili","Gregory R Warnes","Barry Rowlingson","Rob J Hyndman","Joshua Ulrich","Marc Schwartz","Andri Signorell","Nanina Anderegg","Tomas Aragon","Antti Arppe","Adrian Baddeley","Kamil Barton","Frederico Caeiro","Stephane Champely","Leanne Chhay","Clint Cummins","Michael Dewey","Harold C Doran","Charles Dupont","Claus Ekstrom","Martin Elff","Richard W Farebrother","Matthias Gamer","Joseph L Gastwirth","Yulia R Gel","Juergen Gross","Frank E Harrell Jr","Michael Hoehle","Markus Huerzeler","Wallace W Hui","Pete Hurd","Pablo J Villacorta Iglesias","Matthias Kohl","Detlew Labes","Friederich Leisch","Dong Li","Daniel Malter","George Marsaglia","John Marsaglia","Alina Matei","David Meyer","Weiwen Miao","Giovanni Millo","Yongyi Min","David Mitchell","Markus Naepflin","Daniel Navarro","Hong Ooi","Roland Rapold","William Revelle","Caroline Rodriguez","Nathan Russell","Nick Sabbe","Werner A Stahel","Mark Stevenson","Matthias 
Templ","Yves Tille","Adrian Trapletti","John Verzani","Stefan Wellek","Rand R Wilcox","Peter Wolf","Daniel Wollschlaeger","Thomas Yee","Detlef Steuer","Frank Bretz","Andy Bunn","Sarah Goslee","Sarah Brockhaus","R community","Peter Dalgaard","Kjetil Brinchmann Halvorsen","Ray Brownrigg","David L Reiner","Berton Gunter","Roger Koenker","Charles Berry","Peter Dunn","Roland Rau","Mark Leeds","Emmanuel Charpentier","Chris Evans","Paolo Sonego","Peter Ehlers","Liviu Andronic","Brian Diggs","Richard M Heiberger","Patrick Burns","R Michael Weylandt","Jon Olav Skoien","Francois Morneau","Antony Unwin","Joshua Wiley","Bryan Hanson","Eduard Szoecs","Gregor Passolt","John C Nash","Matthias Speidel","Anne-Laure Boulesteix","Hannah Frick","Christina Riedel","Martin Spindler","Ivan Kondofersky","Oliver S Kuehnle","Christian Lindenlaub","Georg Pfundstein","Ariane Straub","Florian Wickler","Katharina Zink","Manuel Eugster","Heidi Seibold","Brian S Everitt","Andrea Peters","Beth Atkinson","Fabian Sobotka","Alan Genz","Nikhil Garge","Georgiy Bobashev","Benjamin Carper","Kasey Jones","Carolin Strobl","Basil Abou El-Komboz","Abdelilah El Hadad","Laura Goeres","Max Hughes-Brandl","Peter Westfall","Andre Schuetzenmeister","Susan Scheibe","Tetsuhisa Miwa","Xuefei Mi"],"title":["Author: Tony Plate<br>148 collaborators in 10 packages: <br>abind, DescTools, Holidays, JuniperKernel<br>RSVGTipsDevice, scriptests, sfsmisc, svglite<br>TimeWarp, track","Author: Richard Heiberger<br>149 collaborators in 4 packages: <br>abind, car, DescTools, Rcmdr","Author: Gilles Kratzer<br>13 collaborators in 3 packages: <br>abn, ATR, varrank","Author: Henrik Singmann<br>147 collaborators in 11 packages: <br>acss, acss.data, afex, bridgesampling<br>emmeans, fortunes, LaplacesDemon, lme4<br>MPTinR, plotrix, rtdists","Author: Sandrine Pavoine<br>131 collaborators in 4 packages: <br>ade4, adiv, DescTools, seewave","Author: Daniel Chessel<br>112 collaborators in 2 packages: <br>ade4, DescTools","Author: Stephane 
Dray<br>109 collaborators in 4 packages: <br>adehabitatLT, DescTools, Guerry, HistData","Author: Christian Kleiber<br>77 collaborators in 5 packages: <br>AER, fortunes, ineq, plm<br>strucchange","Author: Achim Zeileis<br>349 collaborators in 53 packages: <br>AER, bamlss, BayesXsrc, betareg<br>bfast, car, coin, colorspace<br>condvis, crch, ctv, DescTools<br>dichromat, dynlm, evtree, exams<br>Formula, fortunes, fxregime, glmertree<br>glmx, glogis, ineq, lagsarlmtree<br>lmSubsets, lmtest, mobForest, model4you<br>modeltools, mpath, mpt, palmtree<br>party, partykit, plm, pscl<br>psychomix, psychotools, psychotree, pwt<br>pwt8, pwt9, quantreg, R2BayesX<br>RWeka, sandwich, stablelearner, strucchange<br>truncreg, tth, vcd, vcdExtra<br>zoo","Author: Ben Bolker<br>402 collaborators in 29 packages: <br>afex, ape, bbmle, broom<br>broom.mixed, car, DescTools, dotwhisker<br>emdbook, foghorn, fortunes, gdata<br>ggalt, glmmTMB, gmodels, gplots<br>gtools, lme4, MEMSS, metaplus<br>minpack.lm, mlmRev, plotrix, R2admb<br>RLRsim, rncl, rstanarm, SASmixed<br>sfsmisc","Author: John Fox<br>205 collaborators in 19 packages: <br>afex, candisc, car, carData<br>DescTools, effects, english, heplots<br>matlib, phia, polycor, pubh<br>Rcmdr, RcmdrMisc, RcmdrPlugin.survival, RcmdrPlugin.TeachingDemos<br>relimp, sem, twoway","Author: Kevin Wright<br>87 collaborators in 12 packages: <br>agridat, corrgram, desplot, fortunes<br>gge, lucid, mountainplot, nipals<br>pagenum, pals, Rcmdr, rseedcalc","Author: Martin Maechler<br>448 collaborators in 58 packages: <br>akima, bastah, Bessel, Biodem<br>bitops, car, CLA, classGraph<br>cluster, cobs, copula, copulaData<br>DEoptimR, DescTools, DetR, diptest<br>expm, fortunes, fracdiff, GLDEX<br>glmmTMB, gmp, gnm, gplots<br>hdi, hexbin, lasso2, lme4<br>lokern, longmemo, lpridge, Matrix<br>MatrixModels, MEMSS, mlmRev, mvtnorm<br>pcalg, pixmap, plugdensity, polynom<br>RankingProject, Rcmdr, Rmpfr, robust<br>robustbase, robustX, rstanarm, SASmixed<br>scatterplot3d, 
sfsmisc, simsalapar, sptm<br>stabledist, supclust, timeDate, TMB<br>VLMC, xgobi","Author: Jeremy VanDerWal<br>119 collaborators in 4 packages: <br>ALA4R, DescTools, landscapemetrics, SDMTools","Author: Michael Sumner<br>175 collaborators in 19 packages: <br>ALA4R, bsam, decido, fasterize<br>fortunes, GeoLight, gibble, hddtools<br>mapview, mregions, ncmeta, raster<br>rgdal, sf, sp, stars<br>tidytransit, trread, vapour","Author: Kamil Erguler<br>101 collaborators in 3 packages: <br>albopictus, Barnard, DescTools","Author: Max Kuhn<br>148 collaborators in 17 packages: <br>AmesHousing, AppliedPredictiveModeling, C50, caret<br>contrast, Cubist, DescTools, desirability<br>dials, embed, QSARdata, recipes<br>rsample, sparseLDA, spectacles, tidyposterior<br>yardstick","Author: Hadley Wickham<br>784 collaborators in 131 packages: <br>analogsea, anyflights, assertthat, babynames<br>bigQueryR, bigrquery, blob, bnclassify<br>bookdown, broom, cellranger, classifly<br>cli, clusterfly, conflicted, curl<br>damr, DBI, dbplyr, DescribeDisplay<br>DescTools, devtools, dplyr, dtplyr<br>ellipsis, evaluate, fda, feather<br>forcats, fs, fueleconomy, gdtools<br>geozoo, GGally, ggmap, ggmosaic<br>ggplot2, ggplot2movies, ggstance, ggthemes<br>ggvis, gh, gtable, haven<br>hflights, hipread, HistData, httr<br>itertools, knitr, knitrProgressBar, labelled<br>lazyeval, leaflet, lemon, lubridate<br>lvplot, magrittr, meifly, memoise<br>modelr, namespace, nasaweather, nlmixr<br>nullabor, nycflights13, odbc, packagedocs<br>partools, pillar, pkgbuild, pkgdown<br>pkgload, plotrix, plumbr, plyr<br>prettydoc, productplots, profr, proto<br>pryr, purrr, purrrlyr, rappdirs<br>Rd2roxygen, readr, readxl, recipes<br>remotes, reprex, reshape, reshape2<br>rggobi, RInno, rlang, RMariaDB<br>rmarkdown, RMySQL, roxygen2, RPostgres<br>rsample, RSQLite, rstan, rstudioapi<br>rticles, rvest, RxODE, scales<br>sessioninfo, sf, skimr, spectacles<br>stringr, svglite, testthat, tibble<br>tidymodels, tidyr, tidyselect, 
tidyverse<br>tidyxl, tourr, tourrGui, tribe<br>unjoin, usethis, wesanderson, withr<br>xml2, yaml, yesno","Author: Dirk Eddelbuettel<br>336 collaborators in 74 packages: <br>anytime, AsioHeaders, BH, bigFastlm<br>binb, DescTools, digest, drat<br>fortunes, gaussfacts, gcbd, gettz<br>gtrendsR, gunsales, hurricaneexposure, inline<br>komaletter, lbfgs, linl, littler<br>mvabund, mvst, n1qn1, nanotime<br>nlmixr, nloptr, permGPU, pinp<br>pkgKitten, prrd, random, RApiDatetime<br>RApiSerialize, Rblpapi, Rcpp, RcppAnnoy<br>RcppAPT, RcppArmadillo, RcppBDT, RcppBlaze<br>RcppCCTZ, RcppClassic, RcppClassicExamples, RcppCNPy<br>RcppDE, RcppEigen, RcppExamples, RcppFaddeeva<br>RcppGetconf, RcppGSL, RcppMsgPack, RcppNLoptExample<br>RcppQuantuccia, RcppRedis, RcppSMC, RcppStreams<br>RcppTOML, RcppXts, RcppZiggurat, RDieHarder<br>reticulate, rfoaas, RInside, Rmalschains<br>rmsfact, RPostgreSQL, RProtoBuf, RPushbullet<br>RQuantLib, RVowpalWabbit, sanitizers, tensorflow<br>tint, x13binary","Author: Renaud Lancelot<br>87 collaborators in 4 packages: <br>aod, aods3, fortunes, Rcmdr","Author: Simon Blomberg<br>91 collaborators in 2 packages: <br>ape, fortunes","Author: Jim Lemon<br>236 collaborators in 11 packages: <br>ape, clinsig, crank, DescTools<br>eventInterval, fortunes, irr, logmult<br>plotrix, prettyR, sp","Author: Karline Soetaert<br>138 collaborators in 23 packages: <br>AquaEnv, BCE, bvpSolve, DescTools<br>deSolve, deTestSet, diagram, diffEq<br>ecolMod, FME, inline, LIM<br>limSolve, marelac, MSCMT, NetIndices<br>OceanView, plot3D, plot3Drgl, ReacTran<br>rootSolve, seacarb, shape","Author: Friedrich Leisch<br>87 collaborators in 20 packages: <br>archetypes, biclust, bindata, bootstrap<br>e1071, exams, flexclust, flexmix<br>genetics, mda, mlbench, modeltools<br>mvtnorm, ockc, pixmap, psychomix<br>signal, StatDataML, strucchange, tth","Author: Arni Magnusson<br>185 collaborators in 18 packages: <br>areaplot, coda, DescTools, gdata<br>glmmTMB, gmt, gplots, icesAdvice<br>icesDatras, 
icesSAG, icesTAF, icesVocab<br>ora, plotMCMC, r2d2, scape<br>TMB, xtable","Author: Christian Buchta<br>39 collaborators in 14 packages: <br>arules, arulesSequences, cba, DSL<br>ISOcodes, proxy, relations, Rglpk<br>RWeka, seriation, sets, slam<br>tau, textcat","Author: Kurt Hornik<br>299 collaborators in 71 packages: <br>arules, aucm, bibtex, bindata<br>cclust, chron, clue, cluster<br>coin, colorspace, cordillera, ctv<br>date, dendextend, digest, e1071<br>exactRankTests, fortunes, gap, ISOcodes<br>isotone, kernlab, MASS, mda<br>mobForest, movMF, mvord, NLP<br>NLPutils, OAIHarvester, openNLP, openNLPdata<br>oz, pandocfilters, party, polyclip<br>polynom, PolynomF, princurve, qrmdata<br>qrmtools, Rcplex, relations, Rglpk<br>RKEA, RKEAjars, ROI, ROI.plugin.msbinlp<br>Rpoppler, Rsymphony, RWeka, RWekajars<br>seriation, sets, signal, skmeans<br>slam, stablelearner, strucchange, tau<br>textcat, tm, tm.plugin.mail, topicmodels<br>tseries, TSP, Unicode, vcd<br>W3CMarkupValidator, wordnet, xgobi","Author: Ken Aho<br>101 collaborators in 2 packages: <br>asbio, DescTools","Author: Nick Parsons<br>101 collaborators in 3 packages: <br>asd, DescTools, repolr","Author: Rolf Turner<br>315 collaborators in 11 packages: <br>AssetPricing, deldir, fortunes, hmm.discnp<br>Iso, maptools, mixreg, plotrix<br>spatstat, spatstat.data, spatstat.utils","Author: Jon Eugster<br>9 collaborators in 1 packages: <br>ATR","Author: Andrea Farnham<br>9 collaborators in 1 packages: <br>ATR","Author: Raphael Hartmann<br>9 collaborators in 1 packages: <br>ATR","Author: Tea Isler<br>12 collaborators in 2 packages: <br>ATR, eggCounts","Author: Ke Li<br>9 collaborators in 1 packages: <br>ATR","Author: Silvia Panunzi<br>9 collaborators in 1 packages: <br>ATR","Author: Sophie Schneider<br>9 collaborators in 1 packages: <br>ATR","Author: Craig Wang<br>12 collaborators in 3 packages: <br>ATR, eggCounts, variosig","Author: Torsten Hothorn<br>264 collaborators in 39 packages: <br>ATR, basefun, bst, 
coin<br>DescTools, exactRankTests, FDboost, fortunes<br>globalboosttest, hgam, HSAUR, HSAUR2<br>HSAUR3, inum, ipred, libcoin<br>lmtest, maxstat, mboost, mlt<br>mlt.docreg, mobForest, model4you, modeltools<br>MUCflights, multcomp, MVA, mvtnorm<br>palmtree, party, partykit, RWeka<br>sfa, stabs, StatDataML, TH.data<br>tram, trtf, variables","Author: Terry Therneau<br>180 collaborators in 10 packages: <br>attribrisk, bdsmatrix, date, deming<br>DescTools, fortunes, glmBfp, ipred<br>noweb, rpart","Author: Alexandros Karatzoglou<br>19 collaborators in 4 packages: <br>aucm, kernlab, personalized, RWeka","Author: Andrie de Vries<br>93 collaborators in 10 packages: <br>AzureML, dendextend, fortunes, ggdendro<br>miniCRAN, rfordummies, rrd, secret<br>sss, surveydata","Author: Jeff Enos<br>113 collaborators in 4 packages: <br>backtest, DescTools, portfolio, portfolioSim","Author: Bill Venables<br>252 collaborators in 23 packages: <br>bannerCommenter, BART, codingMatrices, conf.design<br>demoKde, DescTools, english, fortunes<br>fractional, gnm, gplots, lasso2<br>lazyData, LSD, MASS, minimax<br>oz, polynom, PolynomF, raster<br>sfsmisc, SOAR, sudokuAlt","Author: Roman Hornung<br>13 collaborators in 4 packages: <br>bapred, MUCflights, ordinalForest, prioritylasso","Author: Brian Ripley<br>541 collaborators in 41 packages: <br>BART, boot, car, class<br>crs, DescTools, fastICA, fortunes<br>FREGAT, gap, gee, ggdendro<br>gnm, ipred, itree, KernSmooth<br>LSD, MarginalMediation, MASS, mda<br>mpath, nnet, pbdMPI, pbdSLAP<br>pbdZMQ, polyclip, pspline, quantreg<br>rattle, Rcmdr, RODBC, RODBCext<br>rpart, rstanarm, sm, spatial<br>spatstat, tactile, tree, TSA<br>xgobi","Author: Peter Buehlmann<br>19 collaborators in 4 packages: <br>bastah, hdi, mboost, protiq","Author: Barry Eggleston<br>12 collaborators in 2 packages: <br>BayesCTDesign, mobForest","Author: Christopher Jackson<br>107 collaborators in 7 packages: <br>bayesDP, denstrip, DescTools, ecoreg<br>flexsurv, MetaAnalyser, msm","Author: 
Thomas Kneib<br>25 collaborators in 6 packages: <br>BayesX, BayesXsrc, cAIC4, mboost<br>R2BayesX, svcm","Author: Andreas Mayr<br>14 collaborators in 3 packages: <br>betaboost, gamboostLSS, mboost","Author: Benjamin Hofner<br>31 collaborators in 8 packages: <br>betaboost, Daim, gamboostLSS, kangar00<br>mboost, OpenML, papeR, stabs","Author: Matthias Schmid<br>20 collaborators in 7 packages: <br>betaboost, discSurv, DStree, gamboostLSS<br>kernDeepStackNet, mboost, survAUC","Author: Romain Francois<br>279 collaborators in 29 packages: <br>bibtex, bigFastlm, DescTools, highlight<br>hipread, inline, knitr, knitrProgressBar<br>mvst, operators, permGPU, Rcpp<br>Rcpp11, RcppArmadillo, RcppBDT, RcppBlaze<br>RcppClassic, RcppClassicExamples, RcppEigen, RcppExamples<br>RcppGSL, RcppParallel, readr, RInside<br>RProtoBuf, sos, svMisc, svTools<br>tibble","Author: Mikko Korpela<br>133 collaborators in 6 packages: <br>BINCOR, DescTools, dplR, RXKCD<br>sisal, skimr","Author: Fabian Scheipl<br>48 collaborators in 9 packages: <br>bioimagetools, dlnm, gamm4, lme4<br>mboost, mvtnorm, refund, RLRsim<br>spikeSlabGAM","Author: Greg Snow<br>237 collaborators in 11 packages: <br>blockrand, BrailleR, DescTools, fortunes<br>maptools, obsSens, qcc, sfsmisc<br>spData, sudoku, TeachingDemos","Author: Kevin Ushey<br>246 collaborators in 21 packages: <br>blogdown, bookdown, cloudml, cronR<br>DescTools, icd, packrat, Rcpp<br>Rcpp11, RcppParallel, RcppRoll, reticulate<br>rex, rmarkdown, rsnps, rstudioapi<br>sourcetools, sparklyr, tfdatasets, tfestimators<br>withr","Author: Bjoern Bornkamp<br>12 collaborators in 7 packages: <br>bnpmr, DoseFinding, iterLap, MCPMod<br>mvtnorm, SEL, txtplot","Author: Duncan Murdoch<br>265 collaborators in 22 packages: <br>BrailleR, car, digest, ellipse<br>fortunes, gpclib, gsl, inline<br>knitr, manipulateWidget, nlsr, orientlib<br>patchDVI, Rcmdr, Rdpack, rgl<br>rglwidget, sciplot, spatialkernel, tables<br>tkrgl, vcdExtra","Author: Dieter Menne<br>125 collaborators in 7 
packages: <br>breathtestcore, breathteststan, broom, broom.mixed<br>fortunes, gastempt, installr","Author: Uwe Ligges<br>206 collaborators in 16 packages: <br>BRugs, dendextend, fftw, fortunes<br>klaR, nortest, R2OpenBUGS, R2WinBUGS<br>Rcmdr, reliaR, RobPer, RWinEdt<br>scatterplot3d, signal, tuneR, xtable","Author: Klaus Nordhausen<br>153 collaborators in 17 packages: <br>BSSasymp, DescTools, fastM, fICA<br>ICS, ICSNP, ICSOutlier, ICSShiny<br>ICtest, JADE, LDRTools, MNM<br>OjaNP, REPPlab, SpatialNP, tensorBSS<br>tsBSS","Author: Zhu Wang<br>11 collaborators in 5 packages: <br>bst, bujar, cts, mpath<br>orsk","Author: Michael Friendly<br>386 collaborators in 25 packages: <br>ca, candisc, car, DescTools<br>effects, fortunes, genridge, Guerry<br>heplots, HistData, installr, knitr<br>Lahman, logmult, maptools, matlib<br>mvinfluence, sem, statquotes, tableplot<br>twoway, vcd, vcdExtra, vegan<br>WordPools","Author: David Ruegamer<br>5 collaborators in 2 packages: <br>cAIC4, FDboost","Author: Thomas Petzoldt<br>109 collaborators in 11 packages: <br>caper, cardidates, deSolve, FME<br>fortunes, growthrates, marelac, plotrix<br>proto, qualV, simecol","Author: Spencer Graves<br>97 collaborators in 7 packages: <br>car, Ecfun, fda, fortunes<br>maxLik, multcompView, sos","Author: Henric Nilsson<br>125 collaborators in 2 packages: <br>car, DescTools","Author: Derek Ogle<br>155 collaborators in 6 packages: <br>car, DescTools, FSA, FSAdata<br>plotrix, readbitmap","Author: David Winsemius<br>86 collaborators in 2 packages: <br>car, fortunes","Author: Roger Bivand<br>244 collaborators in 25 packages: <br>cartogram, classInt, DCluster, foreign<br>fortunes, INLABMA, interp, mapproj<br>maptools, MBA, PBSmapping, pixmap<br>raster, rgdal, rgeos, rgrass7<br>sf, sp, spatial, spData<br>spdep, spgrass6, spgwr, splancs<br>vapour","Author: Andreas Alfons<br>119 collaborators in 13 packages: <br>ccaPP, cvTools, DescTools, laeken<br>perry, robmed, robustHD, simFrame<br>simPop, sparseLTSEigen, 
sparsestep, VIM<br>VIMGUI","Author: Michael Smithson<br>103 collaborators in 3 packages: <br>cdfquantreg, DescTools, smdata","Author: Gabor Grothendieck<br>169 collaborators in 12 packages: <br>chron, DescTools, gdata, lme4<br>optimr, optimx, plotrix, proto<br>rockchalk, Ryacas, stinepack, zoo","Author: Matthieu Stigler<br>108 collaborators in 9 packages: <br>classInt, fortunes, rddapp, rddtools<br>rsdmx, tsDyn, urca, vars<br>xtable","Author: Venkatraman E Seshan<br>106 collaborators in 4 packages: <br>clinfun, DescTools, genepi, PSCBS","Author: Andreas Bender<br>12 collaborators in 2 packages: <br>coalitions, MUCflights","Author: Mark A van de Wiel<br>4 collaborators in 1 packages: <br>coin","Author: Henric Winell<br>4 collaborators in 1 packages: <br>coin","Author: Tyler Rinker<br>170 collaborators in 17 packages: <br>cowsay, DescTools, gofastr, lexicon<br>numform, pacman, qdap, qdapDictionaries<br>qdapRegex, qdapTools, reports, sentimentr<br>textclean, textreadr, textshape, textstem<br>wakefield","Author: Alec Stephenson<br>112 collaborators in 8 packages: <br>cubing, DescTools, evd, evdbayes<br>evir, PlayerRatings, texmex, TideHarmonics","Author: Christian W Hoffmann<br>101 collaborators in 2 packages: <br>cwhmisc, DescTools","Author: Tal Galili<br>218 collaborators in 9 packages: <br>d3heatmap, dendextend, DescTools, digitize<br>edfun, fortunes, heatmaply, installr<br>shinyHeatmaply","Author: Gregory R Warnes<br>168 collaborators in 13 packages: <br>daff, DescTools, gdata, gmodels<br>gplots, gtools, mcgibbsit, namespace<br>r2d2, SASxport, session, SII<br>yaml","Author: Barry Rowlingson<br>182 collaborators in 13 packages: <br>DClusterm, fortunes, geonames, gpclib<br>installr, plotrix, raster, rgdal<br>sp, spatialkernel, splancs, stplanr<br>stpp","Author: Rob J Hyndman<br>111 collaborators in 11 packages: <br>demography, DescTools, emma, expsmooth<br>fds, fma, fpp, ftsa<br>rainbow, smoothAPC, stR","Author: Joshua Ulrich<br>115 collaborators in 4 packages: 
<br>DEoptim, DescTools, PerformanceAnalytics, TTR","Author: Marc Schwartz<br>76 collaborators in 4 packages: <br>descr, fortunes, gplots, WriteXLS","Author: Andri Signorell<br>108 collaborators in 3 packages: <br>DescTools, DescToolsAddIns, kyotil","Author: Nanina Anderegg<br>101 collaborators in 1 packages: <br>DescTools","Author: Tomas Aragon<br>112 collaborators in 2 packages: <br>DescTools, pubh","Author: Antti Arppe<br>106 collaborators in 2 packages: <br>DescTools, ndl","Author: Adrian Baddeley<br>342 collaborators in 11 packages: <br>DescTools, globe, goftest, maptools<br>polyclip, scuba, spatstat, spatstat.data<br>spatstat.local, spatstat.utils, statip","Author: Kamil Barton<br>103 collaborators in 2 packages: <br>DescTools, svMisc","Author: Frederico Caeiro<br>105 collaborators in 3 packages: <br>DescTools, evt0, randtests","Author: Stephane Champely<br>109 collaborators in 4 packages: <br>DescTools, PairedData, pwr, RcmdrPlugin.pointG","Author: Leanne Chhay<br>116 collaborators in 2 packages: <br>DescTools, forecast","Author: Clint Cummins<br>106 collaborators in 2 packages: <br>DescTools, lmtest","Author: Michael Dewey<br>163 collaborators in 4 packages: <br>DescTools, fortunes, latdiag, metap","Author: Harold C Doran<br>101 collaborators in 1 packages: <br>DescTools","Author: Charles Dupont<br>107 collaborators in 4 packages: <br>DescTools, Hmisc, PResiduals, sensitivityPStrat","Author: Claus Ekstrom<br>115 collaborators in 5 packages: <br>DescTools, isdals, kulife, MethComp<br>pwr","Author: Martin Elff<br>101 collaborators in 4 packages: <br>DescTools, mclogit, memisc, munfold","Author: Richard W Farebrother<br>106 collaborators in 2 packages: <br>DescTools, lmtest","Author: Matthias Gamer<br>103 collaborators in 2 packages: <br>DescTools, irr","Author: Joseph L Gastwirth<br>106 collaborators in 2 packages: <br>DescTools, lawstat","Author: Yulia R Gel<br>120 collaborators in 5 packages: <br>DescTools, funtimes, lawstat, nparLD<br>snowboot","Author: 
Juergen Gross<br>102 collaborators in 2 packages: <br>DescTools, nortest","Author: Frank E Harrell Jr<br>187 collaborators in 5 packages: <br>DescTools, greport, Hmisc, knitr<br>rms","Author: Michael Hoehle<br>103 collaborators in 2 packages: <br>DescTools, polyCub","Author: Markus Huerzeler<br>101 collaborators in 1 packages: <br>DescTools","Author: Wallace W Hui<br>101 collaborators in 1 packages: <br>DescTools","Author: Pete Hurd<br>101 collaborators in 1 packages: <br>DescTools","Author: Pablo J Villacorta Iglesias<br>101 collaborators in 1 packages: <br>DescTools","Author: Matthias Kohl<br>148 collaborators in 21 packages: <br>DescTools, distr, distrDoc, distrEx<br>distrMod, distrSim, distrTeach, distrTEst<br>MKmisc, mpe, RandVar, RFLPtools<br>RobAStBase, RobAStRDA, RobExtremes, RobLox<br>RobLoxBioC, RobRex, ROptEst, ROptEstOld<br>ROptRegTS","Author: Detlew Labes<br>105 collaborators in 3 packages: <br>DescTools, Power2Stage, PowerTOST","Author: Friederich Leisch<br>101 collaborators in 1 packages: <br>DescTools","Author: Dong Li<br>101 collaborators in 1 packages: <br>DescTools","Author: Daniel Malter<br>101 collaborators in 1 packages: <br>DescTools","Author: George Marsaglia<br>104 collaborators in 2 packages: <br>DescTools, goftest","Author: John Marsaglia<br>104 collaborators in 2 packages: <br>DescTools, goftest","Author: Alina Matei<br>102 collaborators in 2 packages: <br>DescTools, sampling","Author: David Meyer<br>136 collaborators in 14 packages: <br>DescTools, e1071, kst, proxy<br>registry, relations, ROI, ROI.plugin.msbinlp<br>RWeka, sets, slam, StatDataML<br>tau, vcd","Author: Weiwen Miao<br>106 collaborators in 2 packages: <br>DescTools, lawstat","Author: Giovanni Millo<br>146 collaborators in 6 packages: <br>DescTools, lmtest, pder, plm<br>spdep, splm","Author: Yongyi Min<br>101 collaborators in 1 packages: <br>DescTools","Author: David Mitchell<br>134 collaborators in 3 packages: <br>DescTools, lmtest, xtable","Author: Markus Naepflin<br>101 
collaborators in 1 packages: <br>DescTools","Author: Daniel Navarro<br>101 collaborators in 2 packages: <br>DescTools, lsr","Author: Hong Ooi<br>102 collaborators in 2 packages: <br>DescTools, glmnetUtils","Author: Roland Rapold<br>101 collaborators in 1 packages: <br>DescTools","Author: William Revelle<br>101 collaborators in 2 packages: <br>DescTools, psych","Author: Caroline Rodriguez<br>101 collaborators in 1 packages: <br>DescTools","Author: Nathan Russell<br>108 collaborators in 3 packages: <br>DescTools, hashmap, Rcpp","Author: Nick Sabbe<br>104 collaborators in 2 packages: <br>DescTools, pim","Author: Werner A Stahel<br>103 collaborators in 2 packages: <br>DescTools, spate","Author: Mark Stevenson<br>128 collaborators in 3 packages: <br>DescTools, epiR, pubh","Author: Matthias Templ<br>131 collaborators in 9 packages: <br>DescTools, emdi, laeken, robCompositions<br>sdcMicro, simPop, sparkTable, VIM<br>VIMGUI","Author: Yves Tille<br>101 collaborators in 1 packages: <br>DescTools","Author: Adrian Trapletti<br>103 collaborators in 2 packages: <br>DescTools, tseries","Author: John Verzani<br>111 collaborators in 12 packages: <br>DescTools, Devore7, gWidgets, gWidgets2<br>gWidgets2RGtk2, gWidgets2tcltk, gWidgetsRGtk2, gWidgetstcltk<br>ProgGUIinR, RGtk2Extras, traitr, UsingR","Author: Stefan Wellek<br>103 collaborators in 3 packages: <br>DescTools, EQUIVNONINF, MIDN","Author: Rand R Wilcox<br>101 collaborators in 1 packages: <br>DescTools","Author: Peter Wolf<br>124 collaborators in 2 packages: <br>DescTools, Rcmdr","Author: Daniel Wollschlaeger<br>106 collaborators in 4 packages: <br>DescTools, DVHmetrics, epitools, shotGroups","Author: Thomas Yee<br>103 collaborators in 3 packages: <br>DescTools, VGAMdata, VGAMextra","Author: Detlef Steuer<br>73 collaborators in 5 packages: <br>desire, fortunes, loglognorm, mco<br>truncnorm","Author: Frank Bretz<br>17 collaborators in 4 packages: <br>DoseFinding, MCPMod, multcomp, mvtnorm","Author: Andy Bunn<br>74 collaborators 
in 2 packages: <br>dplR, fortunes","Author: Sarah Goslee<br>65 collaborators in 4 packages: <br>ecodist, fortunes, landsat, VFS","Author: Sarah Brockhaus<br>6 collaborators in 2 packages: <br>FDboost, mousetrap","Author: R community<br>62 collaborators in 1 packages: <br>fortunes","Author: Peter Dalgaard<br>70 collaborators in 3 packages: <br>fortunes, ISwR, pwr","Author: Kjetil Brinchmann Halvorsen<br>62 collaborators in 1 packages: <br>fortunes","Author: Ray Brownrigg<br>71 collaborators in 4 packages: <br>fortunes, mapdata, mapproj, maps","Author: David L Reiner<br>62 collaborators in 1 packages: <br>fortunes","Author: Berton Gunter<br>62 collaborators in 1 packages: <br>fortunes","Author: Roger Koenker<br>73 collaborators in 5 packages: <br>fortunes, glmx, quantreg, REBayes<br>SparseM","Author: Charles Berry<br>62 collaborators in 2 packages: <br>fortunes, sonicLength","Author: Peter Dunn<br>66 collaborators in 2 packages: <br>fortunes, statmod","Author: Roland Rau<br>69 collaborators in 6 packages: <br>fortunes, LogrankA, Miney, npst<br>ROMIplot, RVideoPoker","Author: Mark Leeds<br>62 collaborators in 1 packages: <br>fortunes","Author: Emmanuel Charpentier<br>74 collaborators in 3 packages: <br>fortunes, LaplacesDemon, patchSynctex","Author: Chris Evans<br>62 collaborators in 1 packages: <br>fortunes","Author: Paolo Sonego<br>63 collaborators in 3 packages: <br>fortunes, Rcolombos, RXKCD","Author: Peter Ehlers<br>62 collaborators in 1 packages: <br>fortunes","Author: Liviu Andronic<br>124 collaborators in 7 packages: <br>fortunes, plm, Rcmdr, RcmdrPlugin.Export<br>RcmdrPlugin.sos, RGtk2Extras, xtable","Author: Brian Diggs<br>147 collaborators in 2 packages: <br>fortunes, knitr","Author: Richard M Heiberger<br>72 collaborators in 6 packages: <br>fortunes, HH, microplot, multcomp<br>RcmdrPlugin.HH, twoway","Author: Patrick Burns<br>62 collaborators in 1 packages: <br>fortunes","Author: R Michael Weylandt<br>62 collaborators in 1 packages: <br>fortunes","Author: 
Jon Olav Skoien<br>71 collaborators in 4 packages: <br>fortunes, intamap, psgp, rtop","Author: Francois Morneau<br>62 collaborators in 1 packages: <br>fortunes","Author: Antony Unwin<br>62 collaborators in 3 packages: <br>fortunes, GDAdata, OutliersO3","Author: Joshua Wiley<br>64 collaborators in 2 packages: <br>fortunes, MplusAutomation","Author: Bryan Hanson<br>74 collaborators in 3 packages: <br>fortunes, hyperSpec, LindenmayeR","Author: Eduard Szoecs<br>89 collaborators in 3 packages: <br>fortunes, taxize, vegan","Author: Gregor Passolt<br>66 collaborators in 2 packages: <br>fortunes, vitality","Author: John C Nash<br>80 collaborators in 12 packages: <br>fortunes, lbfgsb3, lbfgsb3c, minqa<br>nlmrt, nlsr, optextras, optimr<br>optimx, Rcgmin, Rtnmin, Rvmmin","Author: Matthias Speidel<br>13 collaborators in 3 packages: <br>FourScores, hgam, hmi","Author: Anne-Laure Boulesteix<br>6 collaborators in 6 packages: <br>globalboosttest, ipflasso, MAclinical, plsgenomics<br>SNPmaxsel, WilcoxCV","Author: Hannah Frick<br>18 collaborators in 4 packages: <br>goodpractice, hgam, psychomix, trackeR","Author: Christina Riedel<br>11 collaborators in 2 packages: <br>GWG, MUCflights","Author: Martin Spindler<br>13 collaborators in 2 packages: <br>hdm, hgam","Author: Ivan Kondofersky<br>11 collaborators in 1 packages: <br>hgam","Author: Oliver S Kuehnle<br>11 collaborators in 1 packages: <br>hgam","Author: Christian Lindenlaub<br>22 collaborators in 2 packages: <br>hgam, MUCflights","Author: Georg Pfundstein<br>11 collaborators in 1 packages: <br>hgam","Author: Ariane Straub<br>23 collaborators in 3 packages: <br>hgam, MUCflights, sfa","Author: Florian Wickler<br>22 collaborators in 2 packages: <br>hgam, MUCflights","Author: Katharina Zink<br>11 collaborators in 1 packages: <br>hgam","Author: Manuel Eugster<br>25 collaborators in 3 packages: <br>hgam, MUCflights, roxygen2","Author: Heidi Seibold<br>11 collaborators in 5 packages: <br>highriskzone, model4you, palmtree, 
partykit<br>simex","Author: Brian S Everitt<br>4 collaborators in 4 packages: <br>HSAUR, HSAUR2, HSAUR3, MVA","Author: Andrea Peters<br>4 collaborators in 1 packages: <br>ipred","Author: Beth Atkinson<br>8 collaborators in 3 packages: <br>ipred, itree, rpart","Author: Fabian Sobotka<br>7 collaborators in 1 packages: <br>mboost","Author: Alan Genz<br>21 collaborators in 7 packages: <br>mnormpow, mnormt, mvord, mvtnorm<br>pbivnorm, PLordprob, SimplicialCubature","Author: Nikhil Garge<br>8 collaborators in 1 packages: <br>mobForest","Author: Georgiy Bobashev<br>8 collaborators in 1 packages: <br>mobForest","Author: Benjamin Carper<br>8 collaborators in 1 packages: <br>mobForest","Author: Kasey Jones<br>14 collaborators in 2 packages: <br>mobForest, rollmatch","Author: Carolin Strobl<br>27 collaborators in 6 packages: <br>mobForest, party, psychomix, psychotools<br>psychotree, stablelearner","Author: Basil Abou El-Komboz<br>11 collaborators in 1 packages: <br>MUCflights","Author: Abdelilah El Hadad<br>11 collaborators in 1 packages: <br>MUCflights","Author: Laura Goeres<br>11 collaborators in 1 packages: <br>MUCflights","Author: Max Hughes-Brandl<br>11 collaborators in 2 packages: <br>MUCflights, NightDay","Author: Peter Westfall<br>5 collaborators in 1 packages: <br>multcomp","Author: Andre Schuetzenmeister<br>8 collaborators in 4 packages: <br>multcomp, STB, VCA, VFP","Author: Susan Scheibe<br>5 collaborators in 1 packages: <br>multcomp","Author: Tetsuhisa Miwa<br>8 collaborators in 1 packages: <br>mvtnorm","Author: Xuefei Mi<br>11 collaborators in 2 packages: <br>mvtnorm, selectiongain"]},"edges":{"from":["Jon Eugster","Andrea Farnham","Raphael Hartmann","Tea Isler","Gilles Kratzer","Ke Li","Silvia Panunzi","Sophie Schneider","Craig Wang","Zhu Wang","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Andri Signorell","Ken Aho","Andreas Alfons","Nanina Anderegg","Tomas Aragon","Antti Arppe","Adrian Baddeley","Kamil Barton","Ben Bolker","Frederico 
Caeiro","Stephane Champely","Daniel Chessel","Leanne Chhay","Clint Cummins","Michael Dewey","Harold C Doran","Stephane Dray","Charles Dupont","Dirk Eddelbuettel","Jeff Enos","Claus Ekstrom","Martin Elff","Kamil Erguler","Richard W Farebrother","John Fox","Romain Francois","Michael Friendly","Tal Galili","Matthias Gamer","Joseph L Gastwirth","Yulia R Gel","Juergen Gross","Gabor Grothendieck","Frank E Harrell Jr","Richard Heiberger","Michael Hoehle","Christian W Hoffmann","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Sarah Brockhaus","David Ruegamer","Achim Zeileis","R community","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten 
Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Anne-Laure Boulesteix","Hannah Frick","Ivan Kondofersky","Oliver S Kuehnle","Christian Lindenlaub","Georg Pfundstein","Matthias Speidel","Martin Spindler","Ariane Straub","Florian Wickler","Katharina Zink","Manuel Eugster","Brian S Everitt","Brian S Everitt","Torsten Hothorn","Andrea Peters","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Nikhil Garge","Barry Eggleston","Georgiy Bobashev","Benjamin Carper","Kasey Jones","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Heidi Seibold","Achim Zeileis","Torsten Hothorn","Torsten Hothorn","Basil Abou El-Komboz","Andreas Bender","Abdelilah El Hadad","Laura Goeres","Roman Hornung","Max Hughes-Brandl","Christian Lindenlaub","Christina Riedel","Ariane Straub","Florian Wickler","Manuel Eugster","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Brian S Everitt","Alan Genz","Frank Bretz","Tetsuhisa Miwa","Xuefei Mi","Friedrich Leisch","Fabian Scheipl","Bjoern 
Bornkamp","Martin Maechler","Heidi Seibold","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Kurt Hornik","Christian Buchta","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Ariane Straub","Benjamin Hofner","David Meyer","Torsten Hothorn"],"to":["Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Kurt Hornik","Mark A van de Wiel","Henric Winell","Achim Zeileis","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Markus Huerzeler","Wallace W Hui","Pete Hurd","Rob J Hyndman","Pablo J Villacorta Iglesias","Christopher Jackson","Matthias Kohl","Mikko Korpela","Max Kuhn","Detlew Labes","Friederich Leisch","Jim Lemon","Dong Li","Martin Maechler","Arni Magnusson","Daniel Malter","George Marsaglia","John Marsaglia","Alina Matei","David Meyer","Weiwen Miao","Giovanni Millo","Yongyi Min","David Mitchell","Markus Naepflin","Daniel Navarro","Henric Nilsson","Klaus Nordhausen","Derek Ogle","Hong Ooi","Nick Parsons","Sandrine Pavoine","Tony Plate","Roland Rapold","William Revelle","Tyler Rinker","Brian Ripley","Caroline Rodriguez","Nathan Russell","Nick Sabbe","Venkatraman E Seshan","Greg Snow","Michael Smithson","Karline Soetaert","Werner A Stahel","Alec Stephenson","Mark 
Stevenson","Matthias Templ","Terry Therneau","Yves Tille","Adrian Trapletti","Joshua Ulrich","Kevin Ushey","Jeremy VanDerWal","Bill Venables","John Verzani","Gregory R Warnes","Stefan Wellek","Hadley Wickham","Rand R Wilcox","Peter Wolf","Daniel Wollschlaeger","Thomas Yee","Achim Zeileis","Kurt Hornik","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Peter Dalgaard","Uwe Ligges","Kevin Wright","Martin Maechler","Kjetil Brinchmann Halvorsen","Kurt Hornik","Duncan Murdoch","Andy Bunn","Ray Brownrigg","Roger Bivand","Spencer Graves","Jim Lemon","Christian Kleiber","David L Reiner","Berton Gunter","Roger Koenker","Charles Berry","Marc Schwartz","Michael Dewey","Ben Bolker","Peter Dunn","Sarah Goslee","Simon Blomberg","Bill Venables","Roland Rau","Thomas Petzoldt","Rolf Turner","Mark Leeds","Emmanuel Charpentier","Chris Evans","Paolo Sonego","Peter Ehlers","Detlef Steuer","Tal Galili","Greg Snow","Brian Ripley","Michael Sumner","David Winsemius","Liviu Andronic","Brian Diggs","Matthieu Stigler","Michael Friendly","Dirk Eddelbuettel","Richard M Heiberger","Patrick Burns","Dieter Menne","Andrie de Vries","Barry Rowlingson","Renaud Lancelot","R Michael Weylandt","Jon Olav Skoien","Francois Morneau","Antony Unwin","Joshua Wiley","Terry Therneau","Bryan Hanson","Henrik Singmann","Eduard Szoecs","Gregor Passolt","John C Nash","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Brian S Everitt","Torsten Hothorn","Brian Ripley","Terry Therneau","Beth Atkinson","Achim Zeileis","Richard W Farebrother","Clint Cummins","Giovanni Millo","David Mitchell","Peter Buehlmann","Thomas Kneib","Matthias Schmid","Benjamin Hofner","Fabian Sobotka","Fabian Scheipl","Andreas Mayr","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten 
Hothorn","Kurt Hornik","Carolin Strobl","Achim Zeileis","Torsten Hothorn","Torsten Hothorn","Friedrich Leisch","Achim Zeileis","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Frank Bretz","Peter Westfall","Richard M Heiberger","Andre Schuetzenmeister","Susan Scheibe","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Achim Zeileis","Kurt Hornik","Carolin Strobl","Achim Zeileis","Heidi Seibold","Achim Zeileis","Torsten Hothorn","Torsten Hothorn","Alexandros Karatzoglou","David Meyer","Achim Zeileis","Torsten Hothorn","Torsten Hothorn","Torsten Hothorn","Friedrich Leisch"],"colortitle":["collaborate in: ATR","collaborate in: ATR","collaborate in: ATR","collaborate in: ATR","collaborate in: ATR","collaborate in: ATR","collaborate in: ATR","collaborate in: ATR","collaborate in: ATR","collaborate in: bst","collaborate in: coin","collaborate in: coin","collaborate in: coin","collaborate in: coin","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: 
DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: DescTools","collaborate in: 
DescTools","collaborate in: DescTools","collaborate in: exactRankTests","collaborate in: FDboost","collaborate in: FDboost","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: fortunes","collaborate in: globalboosttest","collaborate in: hgam","collaborate in: hgam","collaborate in: hgam","collaborate in: hgam","collaborate in: hgam","collaborate in: hgam","collaborate in: 
hgam","collaborate in: hgam","collaborate in: hgam","collaborate in: hgam","collaborate in: hgam","collaborate in: HSAUR","collaborate in: HSAUR2","collaborate in: HSAUR3","collaborate in: ipred","collaborate in: ipred","collaborate in: ipred","collaborate in: ipred","collaborate in: lmtest","collaborate in: lmtest","collaborate in: lmtest","collaborate in: lmtest","collaborate in: lmtest","collaborate in: mboost","collaborate in: mboost","collaborate in: mboost","collaborate in: mboost","collaborate in: mboost","collaborate in: mboost","collaborate in: mboost","collaborate in: mobForest","collaborate in: mobForest","collaborate in: mobForest","collaborate in: mobForest","collaborate in: mobForest","collaborate in: mobForest","collaborate in: mobForest","collaborate in: mobForest","collaborate in: model4you","collaborate in: model4you","collaborate in: modeltools","collaborate in: modeltools","collaborate in: MUCflights","collaborate in: MUCflights","collaborate in: MUCflights","collaborate in: MUCflights","collaborate in: MUCflights","collaborate in: MUCflights","collaborate in: MUCflights","collaborate in: MUCflights","collaborate in: MUCflights","collaborate in: MUCflights","collaborate in: MUCflights","collaborate in: multcomp","collaborate in: multcomp","collaborate in: multcomp","collaborate in: multcomp","collaborate in: multcomp","collaborate in: MVA","collaborate in: mvtnorm","collaborate in: mvtnorm","collaborate in: mvtnorm","collaborate in: mvtnorm","collaborate in: mvtnorm","collaborate in: mvtnorm","collaborate in: mvtnorm","collaborate in: mvtnorm","collaborate in: palmtree","collaborate in: palmtree","collaborate in: party","collaborate in: party","collaborate in: party","collaborate in: partykit","collaborate in: partykit","collaborate in: RWeka","collaborate in: RWeka","collaborate in: RWeka","collaborate in: RWeka","collaborate in: RWeka","collaborate in: sfa","collaborate in: stabs","collaborate in: StatDataML","collaborate in: 
StatDataML"]},"nodesToDataframe":true,"edgesToDataframe":true,"options":{"width":"100%","height":"100%","nodes":{"shape":"dot"},"manipulation":{"enabled":false},"edges":{"physics":false},"interaction":{"dragNodes":true,"dragView":true,"zoomView":true}},"groups":null,"width":null,"height":null,"idselection":{"enabled":false,"style":"width: 150px; height: 26px","useLabels":true},"byselection":{"enabled":false,"style":"width: 150px; height: 26px","multiple":false,"hideColor":"rgba(200,200,200,0.5)"},"main":{"text":"cranly collaboration network<br> CRAN database version<br>Mon, 22 Oct 2018, 11:52 <br> Author names with<br> \"Torsten Hothorn\" <br> Package names with<br> \"Inf\"","style":"font-family:Georgia, Times New Roman, Times, serif;font-size:15px"},"submain":null,"footer":null,"background":"rgba(0, 0, 0, 0)","highlight":{"enabled":true,"hoverNearest":false,"degree":1,"algorithm":"all","hideColor":"rgba(200,200,200,0.5)","labelOnly":true},"collapse":{"enabled":false,"fit":false,"resetHighlight":true,"clusterOptions":null},"legend":{"width":0.2,"useGroups":false,"position":"left","ncol":1,"stepX":100,"stepY":100,"zoom":true,"nodes":{"label":["Authors matching query","Collaborators"],"color":["#4A6FE3","#ECEEFC"]},"nodesToDataframe":true},"tooltipStay":300,"tooltipStyle":"position: fixed;visibility:hidden;padding: 5px;white-space: nowrap;font-family: verdana;font-size:14px;font-color:#000000;background-color: #f5f4ed;-moz-border-radius: 3px;-webkit-border-radius: 3px;border-radius: 3px;border: 1px solid #808074;box-shadow: 3px 3px 10px rgba(0, 0, 0, 0.2);","export":{"type":"png","css":"float:right;","background":"#fff","name":"cranly_network-22-Oct-2018-Torsten Hothorn-Inf.png","label":"PNG snapshot"}},"evals":[],"jsHooks":[]}</script>
<p>It is also helpful to know who the most prolific CRAN package authors are. You can generally count on packages from this crew being top-shelf.</p>
<pre class="r"><code>author_summary <- summary(author_net)</code></pre>
<pre><code>## Warning in closeness(cranly_graph, normalized = FALSE): At centrality.c:
## 2784 :closeness centrality is not well-defined for disconnected graphs</code></pre>
<pre class="r"><code>plot(author_summary)</code></pre>
<p><img src="/post/2018-10-17-searching-for-r-packages_files/figure-html/unnamed-chunk-9-1.png" width="672" /></p>
<p>I am not claiming that the path I have taken here is the best, or even unique. I have by no means exhausted the possibilities with the packages I have highlighted. Previous posts explore <a href="https://rviews.rstudio.com/2018/05/31/exploring-r-packages/">cranly</a> and the <code>tools::CRAN_package_db()</code> function in a little more depth, but there is much more to explore.</p>
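<p>As a rough illustration of the kind of exploration <code>tools::CRAN_package_db()</code> allows, the sketch below counts packages per maintainer. The helper <code>count_by_maintainer()</code> and the toy data frame are hypothetical, made up for this example; the sketch assumes only that the package table has a <code>Maintainer</code> column in the usual "Name &lt;email&gt;" form.</p>

```r
# Count packages per maintainer in a CRAN-style package table.
# Assumption: `pkg_db` has a character column `Maintainer` formatted
# as "Name <email>".
count_by_maintainer <- function(pkg_db) {
  # Drop the "<email>" part and trim surrounding whitespace
  name <- trimws(sub("<[^>]*>", "", pkg_db$Maintainer))
  counts <- sort(table(name), decreasing = TRUE)
  data.frame(
    maintainer = names(counts),
    n_packages = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Toy data, invented for illustration only
toy_db <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  Maintainer = c(
    "Ada Example <ada@example.org>",
    "Bob Sample <bob@example.org>",
    "Ada Example <ada@example.org>",
    "Ada Example <ada@example.net>"
  ),
  stringsAsFactors = FALSE
)

top <- count_by_maintainer(toy_db)
print(top)

# With network access, the same helper could be applied to the live database:
# top_cran <- count_by_maintainer(tools::CRAN_package_db())
```

<p>Matching on the name alone is a simplification: it merges one person's different email addresses, but it would also merge distinct people who share a name.</p>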
<p>Finally, it would be remiss of me not to mention that the first thing anyone, novice or expert, should do when looking for a package to solve some new problem, or even to get an indication of the quality of a package, is to examine the <a href="https://cran.r-project.org/web/views/">CRAN Task Views</a>. These are lists of packages curated by experts and organized into functional areas. With just a little searching, you will see that <code>coin</code> shows up in multiple task views.</p>
Serendipity at R / Medicine
https://rviews.rstudio.com/2018/10/16/serendipity-at-r-medicine/
Tue, 16 Oct 2018 00:00:00 +0000https://rviews.rstudio.com/2018/10/16/serendipity-at-r-medicine/
<p>We knew we were on to something important early on in the process of organizing <a href="www.r-medicine.com">R / Medicine 2018</a>. Even during our initial attempts to articulate the differences between this conference and <a href="www.rinpharma.com">R / Pharma 2018</a>, it became clear that the focus on the use of R and statistics in clinical settings was going to be a richer topic than just the design of clinical trials. However, it wasn’t until the conference got underway that we realized there was magic in the mix of attendees. R / Medicine attracted quite a few clinicians who were themselves using R in their work, or were in the process of teaching themselves R. This group catalyzed the discussions that continued throughout the conference, enabling high-bandwidth exchanges that would have otherwise suffered from the effort to translate between the two cultures. The small, single-track nature of the conference helped to keep the conversations going, with the questions and answers at the end of a given talk helping to enrich the quality of successive discussions.</p>
<p>Rob Tibshirani set the collaborative tone for the conference with his opening <a href="https://r-medicine.netlify.com/talks/talk7.pdf">keynote talk</a> describing the clinical forecasting system he and his collaborators have built to predict platelet usage for the Stanford hospitals. Big-league and big-impact, the system shows the promise of delivering real clinical and financial benefits. Tibshirani’s presentation of the modeling process also set the bar for clarity.</p>
<p>The other keynotes were also “top shelf”. Michael Lawrence spoke about <a href="https://r-medicine.netlify.com/talks/michael-lawrence-keynote.pdf">Scientific Software In-the-Large</a>. He laid out three challenges for scientific programming at this scale:</p>
<ul>
<li>Integration of independently developed modules</li>
<li>Translation of analyses and prototypes into software</li>
<li>Scalability</li>
</ul>
<p>He addressed these issues using examples from the <a href="https://www.bioconductor.org/">Bioconductor</a> project.</p>
<p>Victoria Stodden’s Keynote, <a href="http://web.stanford.edu/~vcs/talks/Yale-Sept-2018-STODDEN.pdf">Computational Reproducibility in Medical Research: Toward Open Code and Data</a>, was a meditation on the need to reassess scientific transparency in an age where big data and computational power are driving medical research, and deep intellectual contributions are encoded in software. I was particularly struck by the idea that progress towards computational reproducibility depends on the coordination of stakeholders.</p>
<p><img src="/post/2018-10-10-RMedicine_files/progress.png" height = "500" width="700"></p>
<p>Perhaps the highest-energy talk of the conference (and maybe of all the conferences I have attended this year) was given by Yale’s <a href="https://medicine.yale.edu/intmed/people/harlan_krumholz.profile">Dr. Harlan Krumholz</a>. Unfortunately, we have neither video nor slides from this keynote, but to get some idea of Dr. Krumholz’s iconoclastic work, look at the 2010 <a href="https://www.forbes.com/forbes/2010/0927/opinions-harlan-krumholz-yale-medicine-ideas-opinions.html#311fdfca6db3">Forbes article</a> and this more recent article published in <a href="https://www.healthaffairs.org/doi/10.1377/hlthaff.2014.0053">Health Affairs</a>. The following are some notes I managed to take at the talk between moments of mesmerization. With respect to medicine in general, Dr. Krumholz said:</p>
<blockquote>
<p>There could not be a more exciting era in medicine. Medicine is emerging as an information science and the clinician’s role is changing to be a guide or interpreter, not a shaman.</p>
</blockquote>
<p>Commenting on evidence-based medicine:</p>
<blockquote>
<p>More than half of the guidelines in cardiology are not based on evidence.</p>
</blockquote>
<p>With respect to medical data, he said:</p>
<blockquote>
<p>The goal should be to take high-dimensional data and make it low-dimensional. Instead of thinking that everyone should have the same data, we should move towards thinking: How do we use the data that we do have? There should be no missing data.</p>
</blockquote>
<p>I took these statements to mean that teams of clinicians, statisticians, and data scientists should be working towards building predictive models for individual patients based on whatever data is available for them and whatever big data is relevant. This was clearly the music the crowd wanted to dance to.</p>
<p>The slides for most of the rest of the talks are available on the website. One talk I would like to highlight here is Nathaniel Phillips’ talk on <a href="https://ndphillips.github.io/RMedicine_2018/#1">Fast and Frugal Trees</a>.</p>
<p><img src="/post/2018-10-10-RMedicine_files/fft.png" height = "500" width="700"></p>
<p>This talk addressed a recurring theme throughout the conference: the difference in decision making between the two cultures of statisticians and physicians. Probabilistic estimates to characterize risk and to inform decision making are central to a statistician’s worldview. Physicians, on the other hand, are in general not comfortable with probabilities, and when push comes to shove, prefer unambiguous guidelines and thresholds, such as blood pressure ranges, to inform treatment decisions. A vexing cultural problem is to identify effective decision models that have a chance of actually being used by clinicians.</p>
<p>The conference finished with a roundtable discussion with the theme <em>Bridging the Two Cultures</em>, with panelists Beth Atkinson, Joseph Chou, Peter Higgins, Stephan Kadauke, Chinonyerem Madu, and Jack Wasey representing both the statistical and clinical points of view. The moderator (me) began by asking three questions:</p>
<ol>
<li>How do clinicians engage with statisticians and data scientists?</li>
<li>What are some key ideas you should know about collaborating?</li>
<li>In your experience, what kinds of engagements have been the most successful?</li>
</ol>
<p>Panelists were free to respond as they felt inclined to any or all of the questions. As I recall, a consensus emerged around three key ideas: make an effort to empathize with colleagues, meet frequently and go out of your way to interact with colleagues, and carefully select projects and then cultivate them.</p>
<p>Planning is already underway for R / Medicine 2019. Mark the week of September 23rd, and stay tuned!</p>
<script>window.location.href='https://rviews.rstudio.com/2018/10/16/serendipity-at-r-medicine/';</script>
September 2018: Top 40 New Packages
https://rviews.rstudio.com/2018/10/08/september-2018-top-40-new-packages/
Mon, 08 Oct 2018 00:00:00 +0000https://rviews.rstudio.com/2018/10/08/september-2018-top-40-new-packages/
<p>September was another relatively slow month for new package activity on CRAN: “only” 126 new packages by my count. My Top 40 list is heavy on what I characterize as “utilities”: packages that either extend R in some fashion or make it easier to do things in R. This month, the packages I selected fall into eight categories: Data, Finance, Machine Learning, Science, Statistics, Time Series, Utilities and Visualization.</p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=trigpoints">trigpoints</a> v1.0.0: Contains a complete data set of historic GB trig points (fixed survey points that help mapmakers and hikers) in the <a href="https://en.wikipedia.org/wiki/Ordnance_Survey_National_Grid">British National Grid (OSGB36)</a> coordinate reference system.</p>
<p><a href="https://cran.r-project.org/package=UKgrid">UKgrid</a> v0.1.0: Provides a time series of the national grid demand (high-voltage electric power transmission network) in the UK since 2011. The <a href="https://cran.r-project.org/web/packages/UKgrid/vignettes/UKgrid_vignette.html">vignette</a> shows how to use the package.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/UKgrid.png" height = "400" width="600"></p>
<h3 id="finance">Finance</h3>
<p><a href="https://cran.r-project.org/package=jubilee">jubilee</a> v0.2-5: Implements a long-term forecast model called the <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3156574">Jubilee-Tectonic model</a> to forecast future returns of the U.S. stock market, Treasury yield, and gold price. The <a href="https://cran.r-project.org/web/packages/jubilee/vignettes/jubilee-tutorial.pdf">vignette</a> shows the math.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/jubilee.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=portsort">portsort</a> v0.1.0: Provides functions to sort assets into portfolios for up to three factors via a conditional or unconditional sorting procedure. There is an <a href="https://cran.r-project.org/web/packages/portsort/vignettes/portsort.html">Introduction</a>.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/portsort.png" height = "300" width="500"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=crfsuite">crfsuite</a> v0.1.1: Wraps the <a href="https://github.com/chokkan/crfsuite">CRFsuite library</a> allowing users to fit a conditional random field model. The focus is Natural Language Processing, and there are models for named entity recognition, text chunking, part of speech tagging, intent recognition, and classification. The <a href="https://cran.r-project.org/web/packages/crfsuite/vignettes/crfsuite-nlp.html">vignette</a> shows how to use the package.</p>
<p><a href="https://cran.r-project.org/package=ELMSO">ELMSO</a> v1.0.0: Implements the algorithm described in <a href="http://journals.ama.org/doi/10.1509/jmr.15.0307">Paulson, Luo, and James (2018)</a>; see <a href="http://www-bcf.usc.edu/~gareth/research/ELMSO.pdf">here</a> for a full-text version of the paper. The algorithm allocates budget across a set of online advertising opportunities.</p>
<p><a href="https://cran.r-project.org/package=embed">embed</a> v0.0.1: Provides functions to convert factor predictors to one or more numeric representations using simple generalized <a href="arXiv:1611.09477">linear models</a> or <a href="arXiv:1604.06737">nonlinear models</a>.</p>
<p><a href="https://cran.r-project.org/package=newsmap">newsmap</a> v0.6: Implements a semi-supervised model for geographical document classification (<a href="doi:10.1080/21670811.2017.1293487">Watanabe (2018)</a>) with seed dictionaries in English, German, Spanish, Japanese, and Russian. See the <a href="https://cran.r-project.org/web/packages/newsmap/readme/README.html">README</a> for an example.</p>
<p><a href="https://CRAN.R-project.org/package=splinetree">splinetree</a> v0.1.0: Provides functions to build regression trees and random forests for longitudinal or functional data using a spline projection method. Implements and extends the work of <a href="doi:10.1080/10618600.1999.10474847">Yu and Lambert (1999)</a>. There is an <a href="https://cran.r-project.org/web/packages/splinetree/vignettes/Long-Intro.html">Introduction</a> and vignettes on <a href="https://cran.r-project.org/web/packages/splinetree/vignettes/Tree-Intro.html">trees</a> and <a href="https://cran.r-project.org/web/packages/splinetree/vignettes/Forest-Intro.html">forests</a>.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/splinetree.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=stylest">stylest</a> v0.1.0: Provides functions to estimate the distinctiveness in speakers’ (authors’) style. Fits models that can be used for predicting speakers of new texts. See <a href="doi:10.2139/ssrn.3235506">Spirling et al. (2018)</a> for the details and the <a href="https://cran.r-project.org/web/packages/stylest/vignettes/stylest-vignette.html">vignette</a> for an example of how to use the package.</p>
<h3 id="science">Science</h3>
<p><a href="https://cran.r-project.org/package=conStruct">conStruct</a> v1.0.0: Provides a method for modeling genetic data as a combination of discrete layers, within each of which relatedness may decay continuously with geographic distance. There are vignettes for <a href="https://cran.r-project.org/web/packages/conStruct/vignettes/format-data.html">formatting data</a>, <a href="https://cran.r-project.org/web/packages/conStruct/vignettes/model-comparison.html">model comparison</a>, and on <a href="https://cran.r-project.org/web/packages/conStruct/vignettes/run-conStruct.html">running</a> and <a href="https://cran.r-project.org/web/packages/conStruct/vignettes/visualize-results.html">visualizing</a> <code>conStruct</code> analyses.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/conStruct.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=episcan">episcan</a> v0.0.1: Provides some efficient mechanisms to scan epistasis in genome-wide interaction studies (GWIS), and supports both case-control status (binary outcome) and quantitative phenotype (continuous outcome) studies. See <a href="doi:10.1038/ejhg.2010.196">Kam-Thong and Czamara et al. (2011)</a>, <a href="doi:10.1093/bioinformatics/btr218">Kam-Thong and Pütz et al. (2011)</a>, and the <a href="https://cran.r-project.org/web/packages/episcan/vignettes/episcan.html">vignette</a>.</p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=ahpsurvey">ahpsurvey</a> v0.2.2: Implements the Analytic Hierarchy Process, a versatile multi-criteria decision-making tool introduced by <a href="doi:10.1016/0270-0255(87)90473-8">Saaty (1987)</a> that allows decision-makers to weigh attributes and evaluate alternatives presented to them. The <a href="https://cran.r-project.org/web/packages/ahpsurvey/vignettes/my-vignette.html">vignette</a> provides examples.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/ahpsurvey.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=empirical">empirical</a> v0.1.0: Implements empirical univariate probability density functions (continuous functions) and empirical cumulative distribution functions (step functions or continuous). The <a href="https://cran.r-project.org/web/packages/empirical/vignettes/empirical.pdf">vignette</a> provides examples.</p>
<p><a href="https://cran.r-project.org/package=basicMCMCplots">basicMCMCplots</a> v0.1.0: Provides functions for examining posterior MCMC samples from single and multiple chains that interface with the NIMBLE software package. See <a href="doi:10.1080/10618600.2016.1172487">de Valpine et al. (2017)</a>.</p>
<p><a href="https://cran.r-project.org/package=MetaStan">MetaStan</a> v0.0.1: Provides functions to perform Bayesian meta-analysis using <code>Stan</code>. Includes binomial-normal hierarchical models and the option to use weakly informative priors for the heterogeneity parameter and the treatment effect parameter, which are described in <a href="arXiv:1809.04407">Guenhan, Roever, and Friede (2018)</a>. The <a href="https://cran.r-project.org/web/packages/MetaStan/vignettes/MetaStan_BNHM.html">vignette</a> contains an example.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/metastan.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=Opt5PL">Opt5PL</a> v0.1.1: Provides functions to obtain and evaluate various optimal designs for the 3-, 4-, and 5-parameter logistic models. The optimal designs are obtained based on the numerical algorithm in <a href="doi:10.18637/jss.v083.i05">Hyun, Wong, and Yang (2018)</a>.</p>
<p><a href="https://cran.r-project.org/package=rmetalog">rmetalog</a> v1.0.0: Implements the metalog distribution, a modern, highly flexible, data-driven distribution. See <a href="doi:10.1287/deca.2016.0338">Keelin (2016)</a>. The <a href="https://cran.r-project.org/web/packages/rmetalog/vignettes/rmetalog-vignette.html">vignette</a> provides an example.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/rmetalog.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=rwavelet">rwavelet</a> v0.1.0: Provides functions to perform wavelet analysis (orthogonal and translation invariant transforms) with applications to data compression or denoising. Most of the code is a port of the <a href="https://statweb.stanford.edu/~wavelab/"><code>MATLAB</code> Wavelab toolbox</a> written by Donoho, Maleki and Shahram. The <a href="https://cran.r-project.org/web/packages/rwavelet/vignettes/rwaveletvignette.html">vignette</a> provides examples.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/rwavelet.png" height = "300" width="400"></p>
<p><a href="https://cran.r-project.org/package=SamplingBigData">SamplingBigData</a> v1.0.0: Provides methods for sampling large data sets, including spatially balanced sampling in multi-dimensional spaces with any prescribed inclusion probabilities. Written in C, it uses efficient data structures such as k-d trees that scale to several million rows on a modern desktop computer.</p>
<p><a href="https://cran.r-project.org/package=survivalAnalysis">survivalAnalysis</a> v0.1.0: Implements a high-level interface to perform survival analysis, including Kaplan-Meier analysis, log-rank tests, and Cox regression. There are vignettes for <a href="https://cran.r-project.org/web/packages/survivalAnalysis/vignettes/univariate.html">univariate</a> and <a href="https://cran.r-project.org/web/packages/survivalAnalysis/vignettes/multivariate.html">multivariate</a> survival analyses.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/survivalAnalysis.png" height = "300" width="700"></p>
<p><a href="https://cran.r-project.org/package=ungroup">ungroup</a> v1.1.0: Provides functions to implement a penalized composite link model for efficient estimation of smooth distributions from coarsely binned data. For a detailed description of the method and applications, see <a href="doi:10.1093/aje/kwv020">Rizzi et al. (2015)</a>. The <a href="https://cran.r-project.org/web/packages/ungroup/vignettes/Intro.pdf">vignette</a> provides examples.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/ungroup.png" height = "400" width="600"></p>
<h3 id="time-series">Time Series</h3>
<p><a href="https://CRAN.R-project.org/package=bayesdfa">bayesdfa</a> v0.1.0: Implements Bayesian dynamic factor analysis, a dimension-reduction tool for multivariate time series, with <code>Stan</code>. The <a href="https://cran.r-project.org/web/packages/bayesdfa/vignettes/bayesdfa.html">vignette</a> shows how to identify extremes and latent regimes with <code>glmmfields</code>.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/bayesdfa.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=tbrf">tbrf</a> v0.1.0: Provides rolling statistical functions based on date and time windows instead of n-lagged observations. The <a href="https://cran.r-project.org/web/packages/tbrf/vignettes/intro_to_tbrf.html">vignette</a> offers examples.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/tbrf.png" height = "400" width="700"></p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=atable">atable</a> v0.1.0: Provides functions to create tables for reporting clinical trials, calculate descriptive statistics and hypothesis tests, and arrange the results in a table with <code>LaTeX</code> or <code>Word</code>. The <a href="https://cran.r-project.org/web/packages/atable/vignettes/atable_usage.pdf">vignette</a> provides examples.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/atable.png" height = "400" width="700"></p>
<p><a href="https://cran.r-project.org/package=av">av</a> v0.2: Implements bindings to the <a href="http://www.ffmpeg.org/">FFmpeg</a> AV library for working with audio and video in R.</p>
<p><a href="https://cran.r-project.org/package=binb">binb</a> v0.0.2: Provides a collection of <code>LaTeX</code> styles using <code>Beamer</code> customization for PDF-based presentation slides in <code>RMarkdown</code>. The <a href="https://cran.r-project.org/web/packages/binb/vignettes/metropolisDemo.pdf">vignette</a> provides an example.</p>
<p><a href="https://cran.r-project.org/package=broom.mixed">broom.mixed</a> v0.2.2: Converts fitted objects from various R mixed-model packages into tidy data frames along the lines of the <code>broom</code> package.</p>
<p><a href="https://cran.r-project.org/package=codified">codified</a> v0.2.0: Allows authors to augment clinical data with metadata to create output used in conventional publications and reports. See the <a href="https://cran.r-project.org/web/packages/codified/vignettes/nih-enrollment-html.html">vignette</a> for examples.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/codified.png" height = "400" width="700"></p>
<p><a href="https://cran.r-project.org/package=duawranglr">duawranglr</a> v0.6.3: Allows users to create shareable data sets from raw data files that contain protected elements. There are vignettes on the <a href="https://cran.r-project.org/web/packages/duawranglr/vignettes/duawranglr.html">motivation</a> for the package and on <a href="https://cran.r-project.org/web/packages/duawranglr/vignettes/securing_data.html">securing data</a>.</p>
<p><a href="https://cran.r-project.org/package=ipc">ipc</a> v0.1.0: Provides tools for passing messages between R processes, with <code>Shiny</code> examples showing how to perform useful tasks. The <a href="https://cran.r-project.org/web/packages/ipc/vignettes/shinymp.html">vignette</a> shows how to use the package.</p>
<p><a href="https://cran.r-project.org/package=piggyback">piggyback</a> v0.0.8: Works around git’s 50MB commit limit to allow larger (up to 2 GB) data files to piggyback on a repository as assets attached to individual GitHub releases. There is a package <a href="https://cran.r-project.org/web/packages/piggyback/vignettes/intro.html">overview</a> and a vignette on <a href="https://cran.r-project.org/web/packages/piggyback/vignettes/alternatives.html">alternatives</a>.</p>
<p><a href="https://cran.r-project.org/package=pysd2r">pysd2r</a> v0.1.0: Uses <code>reticulate</code> to implement an interface to the <code>pysd</code> toolset, provides a number of <code>pysd</code> functions, and can read files in <code>Vensim</code>, <code>mdl</code>, and <code>xmile</code> formats. The vignette provides an <a href="https://cran.r-project.org/web/packages/pysd2r/vignettes/pysd2r.html">overview</a>.</p>
<p><a href="https://cran.r-project.org/package=radix">radix</a> v0.5: Provides functions to format scientific and technical articles for the web with Radix reader-friendly typography, flexible layout options for visualizations, and full support for footnotes and citations.</p>
<p><a href="https://cran.r-project.org/package=rbtc">rbtc</a> v0.1-5: Implements the <a href="https://en.bitcoin.it/wiki/API_reference_(JSON-RPC)">RPC-JSON API for Bitcoin</a> and provides utility functions for address creation and content analysis of the blockchain.</p>
<p><a href="https://cran.r-project.org/package=salty">salty</a> v0.1.0: Lets users take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc. See <a href="https://cran.r-project.org/web/packages/salty/readme/README.html">README</a> for examples.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=customLayout">customLayout</a> v0.2.0: Offers an extended version of the <code>graphics::layout()</code> function that also supports <code>grid</code> graphics, allowing users to create complicated drawing areas for multiple elements by combining much simpler layouts. The <a href="https://cran.r-project.org/web/packages/customLayout/vignettes/layouts-for-officer-power-point-document.html">vignette</a> shows how to build layouts for <code>PowerPoint</code> documents.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/customLayout.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=echarts4r">echarts4r</a> v0.1.1: Allows users to create interactive charts by leveraging the <a href="https://ecomfe.github.io/echarts-examples/public/index.html">Echarts</a> JavaScript library. It includes 33 chart types, themes, <code>Shiny</code> proxies, and animations. Look <a href="https://echarts4r.john-coene.com/">here</a> for an example.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/echarts4r.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=ggparliament">ggparliament</a> v2.0.0: Provides parliament plots to visualize election results as points in the architectural layout of the legislative chamber. There are vignettes for <a href="https://cran.r-project.org/web/packages/ggparliament/vignettes/arrange_parliament_8.html">arranging parliament</a>, <a href="https://cran.r-project.org/web/packages/ggparliament/vignettes/basic-parliament-plots_1.html">basic plots</a>, <a href="https://cran.r-project.org/web/packages/ggparliament/vignettes/draw-majority-threshold_3.html">drawing majorities</a>, <a href="https://cran.r-project.org/web/packages/ggparliament/vignettes/emphasize_parliamentarians_6.html">emphasizing parliamentarians</a>, <a href="https://cran.r-project.org/web/packages/ggparliament/vignettes/facet-parliament_5.html">faceting</a>,
<a href="https://cran.r-project.org/web/packages/ggparliament/vignettes/hanging_seats_7.html">hanging seats</a>, <a href="https://cran.r-project.org/web/packages/ggparliament/vignettes/highlight-government_4.html">highlighting government</a>, and <a href="https://cran.r-project.org/web/packages/ggparliament/vignettes/label-parties_2.html">labeling parties</a>.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/ggparliament.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=ggTimeSeries">ggTimeSeries</a> v1.0.1: Provides additional time series visualizations, such as calendar heat map, steamgraph, and marimekko. There is a <a href="https://cran.r-project.org/web/packages/ggTimeSeries/vignettes/ggTimeSeries.html">vignette</a>.</p>
<p><img src="/post/2018-10-08-Sept-Top40_files/ggTimeSeries.png" height = "400" width="600"></p>
<script>window.location.href='https://rviews.rstudio.com/2018/10/08/september-2018-top-40-new-packages/';</script>
Some Thoughts on R / Pharma 2018
https://rviews.rstudio.com/2018/10/03/some-thoughts-on-r-pharma-2018/
Wed, 03 Oct 2018 00:00:00 +0000https://rviews.rstudio.com/2018/10/03/some-thoughts-on-r-pharma-2018/
<p>It’s no secret that there are few industries more competitive than the pharmaceutical industry. Big money placed on long-shot bets for block-buster drugs where being first makes all the difference means a constant struggle to gain a <a href="https://pharma.elsevier.com/pharma-rd/gaining-competitive-advantage/">competitive edge</a>. So, you might find it surprising that the inaugural R / Pharma Conference held this past August on the Harvard campus in a very classy auditorium was all about collaboration.</p>
<p>Some might also find it surprising that data scientists from competitive companies would gather to share information, but this is quite common. I have seen it before in other competitive industries, for example in <a href="https://www.ieee.org/standards/index.html">IEEE</a>-led standards initiatives, where engineers gather to forge a common technology. Not only is there the human need to share and learn from peers (and also brag a little), there is a larger force at play: a kind of market clearing operation where experts gather to gain as much of an advantage as they can by ensuring that no easily exploitable arbitrage opportunities remain.</p>
<p>It was a surprise, though (and I think a source of general amusement as the conference proceeded), that nearly every talk seemed to be about Shiny. Looking back, it is clear that it should not have been: 49% of the <a href="http://rinpharma.com/program/talks-by-author.html">abstracts</a> explicitly mention Shiny. This word cloud was built from the abstract submissions.
<img src="/post/2018-09-28-Rickert-RPharma_files/titles.png" height = "500" width="700"></p>
<p>Shiny is basically a technology for sharing complex information across multiple organizations and stakeholders with different skill sets. Shiny, too, is all about collaboration. For a look into the large, production-grade Shiny app, <a href="https://zappingseb.github.io/RPharma2018/">bioWARP</a>, see Sebastian Wolf’s <a href="https://rviews.rstudio.com/2018/09/04/how-to-build-shiny-trucks-not-shiny-cars/">recent post</a>.</p>
<p>Other major themes addressed at the conference were: reproducible research, package administration, scaling R for production, and using R in a regulatory environment. This last theme was underscored by a strong FDA presence. Lilliam Rosario from the <a href="https://www.fda.gov/aboutfda/centersoffices/officeofmedicalproductsandtobacco/cder/">FDA Center for Drug Evaluation &amp; Research</a> delivered the opening keynote, in which she addressed the regulatory role of CDER and the use of R. FDA speaker Mat Soukup spoke about the need to transcend the compartmentalized culture common in medical research, and how open-source tools are helpful in working towards this goal. He explicitly noted along the way that the FDA does not specify what software may be used. The third FDA speaker, Paul Schuette, filled in some details associated with topics raised by Rosario and talked about the use of R and Shiny at CDER. Along these same lines, Andy Nicholls from <a href="https://www.gsk.com/">GSK</a> conducted a well-attended and very informative workshop on <em>The Challenges of Validating R</em>. You can find Andy’s slides <a href="https://github.com/andyofsmeg/RValidation">here</a>.</p>
<p>Other keynote speakers were Max Kuhn, who talked about Modeling in the tidyverse (slides <a href="http://appliedpredictivemodeling.com/blog/rpharma18">here</a>); Joe Cheng, who described how to use Shiny responsibly in pharma (slides <a href="https://speakerdeck.com/jcheng5/using-shiny-responsibly-in-pharma">here</a>); and Michael Lawrence, who spoke about enabling open-source analytics in the enterprise.</p>
<p>Slides for some of the other presentations made at the conference may be found <a href="http://rinpharma.com/program/schedule.html">here</a>. I expect more will become available soon.</p>
<p>My very biased impression was that R / Pharma was an unqualified success at accomplishing its major objectives: bringing together data scientists and statisticians working in the pharmaceutical industry, and presenting a high-quality program that explored several issues relating to the production use of R in a regulatory environment.</p>
<p>The following chart shows that representatives from quite a few pharmaceutical companies attended, in spite of organizational problems that artificially limited the overall number of attendees to about 140.</p>
<p><img src="/post/2018-09-28-Rickert-RPharma_files/attendees.png" height = "400" width="600"></p>
<p>Planning has already begun for R / Pharma 2019. The exact date has not yet been locked in, but I expect it will be mid-August. Please stay tuned for more information.</p>
<script>window.location.href='https://rviews.rstudio.com/2018/10/03/some-thoughts-on-r-pharma-2018/';</script>
August 2018: Top 40 New Packages
https://rviews.rstudio.com/2018/09/26/august-2018-top-40-new-packages/
Wed, 26 Sep 2018 00:00:00 +0000https://rviews.rstudio.com/2018/09/26/august-2018-top-40-new-packages/
<p>Package developers relaxed a bit in August; only 160 new packages went to CRAN that month. Here are my “Top 40” picks organized into seven categories: Data, Machine Learning, Science, Statistics, Time Series, Utilities, and Visualization.</p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=nsapi">nsapi</a> v0.1.1: Provides an interface to the <a href="https://www.ns.nl/en/travel-information/ns-api">Nederlandse Spoorwegen (Dutch Railways) API</a>, allowing users to download current departure times, disruptions and engineering work, the station list, and travel recommendations from station to station. There is a <a href="https://cran.r-project.org/web/packages/nsapi/vignettes/basic_use_nsapi_package.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=repec">repec</a> v0.1.0: Provides utilities for accessing <a href="http://repec.org/">RePEc</a> (Research Papers in Economics) through a RESTful API. You can request an access code and get detailed information <a href="https://ideas.repec.org/api.html">here</a>.</p>
<p><a href="https://cran.r-project.org/package=rfacebookstat">rfacebookstat</a> v1.8.3: Implements an interface to the <a href="https://developers.facebook.com/docs/marketing-apis/">Facebook Marketing API</a>, allowing users to load data by campaigns, ads, ad sets, and insights.</p>
<p><a href="https://CRAN.R-project.org/package=UCSCXenaTools">UCSCXenaTools</a> v0.2.4: Provides access to data sets from <a href="https://xena.ucsc.edu/public-hubs/">UCSC Xena data hubs</a>, which are a collection of UCSC-hosted public databases.</p>
<p><a href="https://cran.r-project.org/package=ZipRadius">ZipRadius</a> v1.0.1: Generates a data frame of US zip codes and their distance to the given zip code, when given a starting zip code and a radius in miles. Also includes functions for use with <code>choroplethrZip</code>, which are detailed in the <a href="https://cran.r-project.org/web/packages/ZipRadius/vignettes/ZipRadius.html">vignette</a>.
<img src="/post/2018-09-21-Aug-Top40_files/ZipRadius.png" height = "500" width="700"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=dials">dials</a> v0.0.1: Provides tools for creating model parameters that cannot be directly estimated from the data. There is a <a href="https://cran.r-project.org/web/packages/dials/vignettes/Basics.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=tosca">tosca</a> v0.1-2: Provides a framework for statistical analysis in content analysis. See the <a href="https://cran.r-project.org/web/packages/tosca/vignettes/Vignette.pdf">vignette</a> for details.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/tosca.png" height = "400" width="500"></p>
<p><a href="https://cran.r-project.org/package=tsmp">tsmp</a> v0.3.1: Implements the <a href="http://www.cs.ucr.edu/~eamonn/MatrixProfile.html">Matrix Profile concept</a> for classification.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/tsmap.png" height = "400" width="500"></p>
<h3 id="science">Science</h3>
<p><a href="https://cran.r-project.org/package=DSAIRM">DSAIRM</a> v0.4.0: Provides a collection of <code>Shiny</code> apps that implement dynamical systems simulations to explore within-host immune response scenarios. See the package <a href="https://cran.r-project.org/web/packages/DSAIRM/vignettes/DSAIRM.html">Tutorial</a>.</p>
<p><a href="https://cran.r-project.org/package=epiflows">epiflows</a> v0.2.0: Provides functions and classes designed to handle and visualize epidemiological flows between locations, as well as a statistical method for predicting disease spread from flow data initially described in <a href="doi:10.2807/1560-7917.ES.2017.22.28.30572">Dorigatti et al. (2017)</a>. For more information, see the <a href="http://www.repidemicsconsortium.org/">RECON toolkit</a> for outbreak analysis. There is an <a href="https://cran.r-project.org/web/packages/epiflows/vignettes/introduction.html">Overview</a> and a vignette on <a href="https://cran.r-project.org/web/packages/epiflows/vignettes/epiflows-class.html">Data Preparation</a>.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/epiflows.png" height = "400" width="500"></p>
<p><a href="https://cran.r-project.org/package=fieldRS">fieldRS</a> v0.1.1: Provides functions for remote-sensing field work using best practices suggested by <a href="doi:10.1016/j.rse.2014.02.015">Olofsson et al. (2014)</a>. See the <a href="https://cran.r-project.org/web/packages/fieldRS/vignettes/fieldRS.html">vignette</a> for details.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/fieldsRS.png" height = "500" width="700"></p>
<p><a href="https://cran.r-project.org/package=Rnmr1D">Rnmr1D</a> v1.2.1: Provides functions to perform the complete processing of proton nuclear magnetic resonance spectra from the free induction decay raw data. For details see <a href="doi:10.1007/s11306-017-1178-y">Jacob et al. (2017)</a> and the <a href="https://cran.r-project.org/web/packages/Rnmr1D/vignettes/Rnmr1D.html">vignette</a>.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/Rnmr1D.png" height = "500" width="700"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=bcaboot">bcaboot</a> v0.2-1: Provides functions to compute bootstrap confidence intervals in an almost automatic fashion. See the <a href="https://cran.r-project.org/web/packages/bcaboot/vignettes/bcaboot.html">vignette</a>.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/bcaboot.png" height = "500" width="700"></p>
<p><a href="https://cran.r-project.org/package=bivariate">bivariate</a> v0.2.2: Contains convenience functions for constructing and plotting bivariate probability distributions. See the <a href="https://cran.r-project.org/web/packages/bivariate/vignettes/bivariate.pdf">vignette</a> for details.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/bivariate.png" height = "400" width="500"></p>
<p><a href="https://CRAN.R-project.org/package=DesignLibrary">DesignLibrary</a> v0.1.1: Provides a simple interface to build designs and allows users to compare performance of a given design across a range of combinations of parameters, such as effect size, sample size, and assignment probabilities. Look <a href="https://declaredesign.org/library/">here</a> for more information.</p>
<p><a href="https://CRAN.R-project.org/package=doremi">doremi</a> v0.1.0: Provides functions to fit the dynamics of a regulated system experiencing exogenous inputs using differential equations and linear mixed-effects regressions to estimate the characteristic parameters of the equation. See the <a href="https://cran.r-project.org/web/packages/doremi/vignettes/Introduction-to-doremi.html">vignette</a>.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/doremi.png" height = "400" width="500"></p>
<p><a href="https://cran.r-project.org/package=eikosograms">eikosograms</a> v0.1.1: An eikosogram (probability picture from the ancient Greek εἰκός - likely or probable) divides the unit square into rectangular regions whose areas, sides, and widths represent various probabilities associated with the values of one or more categorical variates. For a discussion on the eikosogram and its superiority to Venn diagrams in teaching probability, see <a href="https://math.uwaterloo.ca/~rwoldfor/papers/eikosograms/paper.pdf">Cherry and Oldford (2003)</a>, and for a discussion of its value in exploring conditional independence structure and relation to graphical and log-linear models, see <a href="https://math.uwaterloo.ca/~rwoldfor/papers/eikosograms/independence/paper.pdf">Oldford (2003)</a>. There is an <a href="https://cran.r-project.org/web/packages/eikosograms/vignettes/Introduction.html">Introduction</a> and vignettes on <a href="https://cran.r-project.org/web/packages/eikosograms/vignettes/DataAnalysis.html">Data Analysis</a> and <a href="https://cran.r-project.org/web/packages/eikosograms/vignettes/IndependenceExploration.html">Independence Relations</a>.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/eikosograms.png" height = "400" width="500"></p>
<p><a href="https://cran.r-project.org/package=localIV">localIV</a> v0.1.0: Provides functions to estimate marginal treatment effects using local instrumental variables. See <a href="doi:10.1162/rest.88.3.389">Heckman et al. (2006)</a> and <a href="https://scholar.harvard.edu/files/xzhou/files/zhou-xie_mte2.pdf">Zhou and Xie (2018)</a> for background.</p>
<p><a href="https://cran.r-project.org/package=merlin">merlin</a> v0.0.1: Provides functions to fit linear, non-linear, and user-defined mixed effects regression models following the framework developed by <a href="arXiv:1710.02223">Crowther (2017)</a>. See the <a href="https://cran.r-project.org/web/packages/merlin/vignettes/merlin.html">vignette</a> for details.</p>
<p><a href="https://cran.r-project.org/package=MRFcov">MRFcov</a> v1.0.35: Provides functions to approximate node interaction parameters of Markov Random Fields graphical networks. The general methods are described in <a href="doi:10.1002/ecy.2221">Clark et al. (2018)</a>. There are vignettes on <a href="https://cran.r-project.org/web/packages/MRFcov/vignettes/CRF_data_prep.html">Preparing Datasets</a>, <a href="https://cran.r-project.org/web/packages/MRFcov/vignettes/Gaussian_Poisson_CRFs.html">Gaussian and Poisson Fields</a>, and an example using <a href="https://cran.r-project.org/web/packages/MRFcov/vignettes/Bird_Parasite_CRF.html">Bird parasite data</a>.</p>
<p><a href="https://CRAN.R-project.org/package=SCPME">SCPME</a> v1.0: Provides functions to estimate a penalized precision matrix via an augmented ADMM algorithm as described in <a href="doi:10.1093/biomet/asy023">Molstad and Rothman (2018)</a>. There is a <a href="https://cran.r-project.org/web/packages/SCPME/vignettes/Tutorial.html">Tutorial</a> and a vignette describing <a href="https://cran.r-project.org/web/packages/SCPME/vignettes/Details.html">Algorithm Details</a>.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/SCPME.png" height = "400" width="500"></p>
<p><a href="https://cran.r-project.org/package=survxai">survxai</a> v0.2.0: Contains functions for creating a unified representation of survival models, which can be further processed by various survival explainers. There are vignettes on <a href="https://cran.r-project.org/web/packages/survxai/vignettes/Local_explanations.html">local explanations</a>, <a href="https://cran.r-project.org/web/packages/survxai/vignettes/Global_explanations.html">global explanations</a>, <a href="https://cran.r-project.org/web/packages/survxai/vignettes/How_to_compare_models_with_survxai.html">comparing models</a>, and on a <a href="https://cran.r-project.org/web/packages/survxai/vignettes/Custom_predict_for_survival_models.html">custom prediction function</a>.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/survxai.png" height = "400" width="500"></p>
<h3 id="time-series">Time Series</h3>
<p><a href="https://CRAN.R-project.org/package=hpiR">hpiR</a> v0.2.0: Provides functions to compute house price indexes and series, and evaluate index goodness based on accuracy, volatility and revision statistics. For the background on model construction, see <a href="doi:10.2307/2109686">Case and Quigley (1991)</a>, and for hedonic pricing models, see <a href="doi:10.1016/j.jhe.2006.03.001">Bourassa et al. (2006)</a>. There is an <a href="https://cran.r-project.org/web/packages/hpiR/vignettes/introduction.html">introduction</a> to the package and a vignette on <a href="https://cran.r-project.org/web/packages/hpiR/vignettes/classstructure.html">Classes</a>.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/hpiR.png" height = "500" width="700"></p>
<p><a href="https://cran.r-project.org/package=STMotif">STMotif</a> v0.1.1: Provides functions to identify motifs (previously identified sub-sequences) in spatial-time series. There are vignettes on <a href="https://cran.r-project.org/web/packages/STMotif/vignettes/discovery-motifs.html">motif discovery</a>, <a href="https://cran.r-project.org/web/packages/STMotif/vignettes/examples.html">examples</a>, <a href="https://cran.r-project.org/web/packages/STMotif/vignettes/generation-of-candidates.html">candidate generation</a>, and <a href="https://cran.r-project.org/web/packages/STMotif/vignettes/validate-candidates.html">candidate validation</a>.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/STMotif.png" height = "400" width="500"></p>
<p><a href="https://cran.r-project.org/package=trawl">trawl</a> v0.2.1: Contains functions for simulating and estimating integer-valued trawl processes as described in <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3100076">Veraart (2018)</a>, and for simulating random vectors from the bivariate negative binomial and the bi- and trivariate logarithmic series distributions. There is a vignette on <a href="https://cran.r-project.org/web/packages/trawl/vignettes/my-vignette2.html">trawl processes</a>, and another on the <a href="https://cran.r-project.org/web/packages/trawl/vignettes/my-vignette.html">binomial distributions</a>.</p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/src/contrib/Archive/arkdb">arkdb</a> v0.0.3: Provides functions for exporting tables from relational database connections into compressed text files, and streaming those text files back into a database without requiring the whole table to fit in working memory. See the <a href="https://cran.r-project.org/web/packages/arkdb/vignettes/arkdb.html">vignette</a> for a tutorial.</p>
<p><a href="https://cran.r-project.org/package=aws.kms">aws.kms</a> v0.1.2: Implements an interface to <a href="https://aws.amazon.com/kms/">AWS Key Management Service</a>, a cloud service for managing encryption keys. See the <a href="https://cran.r-project.org/web/packages/aws.kms/readme/README.html">README</a> for details.</p>
<p><a href="https://cran.r-project.org/package=DataPackageR">DataPackageR</a> v0.15.3: Provides a framework to help construct R data packages in a reproducible manner. It maintains data provenance by turning the data-processing scripts into package vignettes, as well as enforcing documentation and version checking of included data objects. There is a <a href="https://cran.r-project.org/web/packages/DataPackageR/vignettes/usingDataPackageR.html">Guide</a> to using the package, and a vignette on <a href="https://cran.r-project.org/web/packages/DataPackageR/vignettes/YAML_CONFIG.html">YAML configuration</a>.</p>
<p><a href="https://cran.r-project.org/package=hedgehog">hedgehog</a> v0.1: Enables users to test properties of their programs against randomly generated input, providing far superior test coverage compared to unit testing. There is a general <a href="https://cran.r-project.org/web/packages/hedgehog/vignettes/hedgehog.html">tutorial</a> and a description of the <a href="https://cran.r-project.org/web/packages/hedgehog/vignettes/state-machines.html">Hedgehog state machine</a>.</p>
<p><a href="https://cran.r-project.org/package=jsonstat">jsonstat</a> v0.0.2: Implements an interface to <a href="https://json-stat.org/">JSON-stat</a>, a simple, lightweight ‘JSON’ format for data dissemination. There is a short <a href="https://cran.r-project.org/web/packages/jsonstat/vignettes/quickstart.html">quickstart guide</a>.</p>
<p><a href="https://cran.r-project.org/package=nseval">nseval</a> v0.4: Provides an API for Lazy and Non-Standard Evaluation with facilities to capture, inspect, manipulate, and create lazy values (promises), “…” lists, and active calls. See <a href="https://cran.r-project.org/web/packages/nseval/readme/README.html">README</a>.</p>
<p><a href="https://cran.r-project.org/package=runner">runner</a> v0.1.0: Provides running functions (windowed, rolling, cumulative) with varying window size and missing-value handling options for R vectors. See the <a href="https://cran.r-project.org/web/packages/runner/vignettes/runner.html">vignette</a> for details.</p>
<p><a href="https://cran.r-project.org/package=RTest">RTest</a> v1.1.9.0: Provides an XML-based testing framework for automated component tests of R packages developed for a regulatory environment. There is a short <a href="https://cran.r-project.org/web/packages/RTest/vignettes/RTest.pdf">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=sparkbq">sparkbq</a> v0.1.0: Extends <code>sparklyr</code> by providing integration with Google <a href="https://cloud.google.com/bigquery/">BigQuery</a>. It supports direct import/export from/to <code>BigQuery</code>, as well as intermediate data extraction from <a href="https://cloud.google.com/storage/">Google Cloud Storage</a>. See <a href="https://cran.r-project.org/web/packages/sparkbq/readme/README.html">README</a>.</p>
<p><a href="https://cran.r-project.org/package=vapour">vapour</a> v0.1.0: Provides low-level access to <code>GDAL</code>, the <a href="http://gdal.org/">Geospatial Data Abstraction Library</a>. There is a <a href="https://cran.r-project.org/web/packages/vapour/vignettes/vapour.html">vignette</a>.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=mapdeck">mapdeck</a> v0.1.0: Provides a mechanism to plot interactive maps using <a href="https://www.mapbox.com/mapbox-gl-js/api/">Mapbox GL</a>, a JavaScript library for interactive maps, and <a href="http://deck.gl/#/">Deck.gl</a>, a JavaScript library which uses <code>WebGL</code> for visualizing large data sets. The <a href="https://cran.r-project.org/web/packages/mapdeck/vignettes/mapdeck.html">vignette</a> explains how to use the package.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/mapdeck.gif" height = "400" width="500"></p>
<p><a href="https://cran.r-project.org/package=rayshader">rayshader</a> v0.5.1: Provides functions that use a combination of raytracing, spherical texture mapping, Lambertian reflectance, and ambient occlusion to produce hillshades of elevation matrices. Includes water-detection and layering functions, programmable color palette generation, built-in textures, 2D and 3D plotting options, and more. See <a href="https://cran.r-project.org/web/packages/rayshader/readme/README.html">README</a> for details and examples.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/rayshader.gif" height = "400" width="500"></p>
<p><a href="https://cran.r-project.org/package=sigmajs">sigmajs</a> v0.1.1: Provides an interface to the <a href="http://sigmajs.org/">sigma.js</a> graph-visualization library, including animations, plugins, and shiny proxies. There is a brief <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/get_started.html">Get Started Guide</a>, and vignettes on <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/animate.html">Animation</a>, <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/buttons.html">Buttons</a>, <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/cluster.html">Coloring by Cluster</a>, <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/dynamic.html">Dynamic graphs</a>, <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/formats.html">igraph &amp; gexf</a>, <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/layout.html">Layout</a>, <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/plugins.html">Plugins</a>, <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/settings.html">Settings</a>, <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/shiny.html">Shiny</a>, and <a href="https://cran.r-project.org/web/packages/sigmajs/vignettes/talkcross.html">Crosstalk</a>.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/sigmajs.png" height = "400" width="500"></p>
<p><a href="https://cran.r-project.org/package=survsup">survsup</a> v0.0.1: Implements functions to plot survival curves. The <a href="https://cran.r-project.org/web/packages/survsup/vignettes/survsup_intro.html">vignette</a> provides examples.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/survsup.png" height = "400" width="500"></p>
<p><a href="https://cran.r-project.org/package=tidybayes">tidybayes</a> v1.0.1: Provides functions for composing data and extracting, manipulating, and visualizing posterior draws from Bayesian models (<code>JAGS</code>, <code>Stan</code>, <code>rstanarm</code>, <code>brms</code>, <code>MCMCglmm</code>, <code>coda</code>, …) in a tidy data format. There is a vignette on <a href="https://cran.r-project.org/web/packages/tidybayes/vignettes/tidybayes.html">Using tidy data with Bayesian Models</a>, and vignettes for <a href="https://cran.r-project.org/web/packages/tidybayes/vignettes/tidy-brms.html">brms</a> and <a href="https://cran.r-project.org/web/packages/tidybayes/vignettes/tidy-rstanarm.html">rstanarm</a> models.</p>
<p><img src="/post/2018-09-21-Aug-Top40_files/tidybayes.png" height = "400" width="500"></p>