R Language on R Views
https://rviews.rstudio.com/categories/r-language/
Recent content in R Language on R Views
June 2019 "Top 40" R Packages
https://rviews.rstudio.com/2019/07/24/june-2019-top-40-r-packages/
Wed, 24 Jul 2019 00:00:00 +0000
<p>Approximately 136 new packages stuck to CRAN in June. (This number is difficult to nail down with certainty because packages may be removed from CRAN after sitting there for a few days.) Here are my picks for the June “Top 40” in eleven categories: Computational Methods, Data, Finance, Genomics, Marketing, Machine Learning, Science and Medicine, Statistics, Time Series, Utilities, and Visualization.</p>
<h3 id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=cppRouting">cppRouting</a> v1.1: Provides functions to calculate distances, shortest paths, and isochrones on weighted graphs using several variants of Dijkstra’s algorithm. Algorithms include unidirectional Dijkstra <a href="doi:10.1007/BF01386390">Dijkstra (1959)</a>, bidirectional Dijkstra <a href="https://pdfs.semanticscholar.org/0761/18dfbe1d5a220f6ac59b4de4ad07b50283ac.pdf">Goldberg et al. (2005)</a>, A* search <a href="doi:10.1109/TSSC.1968.300136">Hart et al. (1968)</a>, and the new bidirectional A* <a href="http://repub.eur.nl/pub/16100/ei2009-10.pdf">Pijls & Post (2009)</a>. See the <a href="https://cran.r-project.org/web/packages/cppRouting/vignettes/cpprouting.html">vignette</a> and <a href="https://github.com/vlarmet/cppRouting/blob/master/readme.md">website</a> for how to use the package.</p>
<p><img src="/post/2019-07-19-June-Top40_files/cppRouting.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=GuessCompx">GuessCompx</a> v1.0.3: Provides functions to test multiple increasing random samples of a data set and to fit various complexity functions, such as O(n), O(n<sup>2</sup>), and O(log(n)), in order to make an empirical guess about the time and memory complexities of an algorithm or a function. See the <a href="https://cran.r-project.org/web/packages/GuessCompx/vignettes/GuessCompx.html">vignette</a> for details.</p>
<p><img src="/post/2019-07-19-June-Top40_files/GuessCompx.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=SimJoint">SimJoint</a> v0.2.1: Provides functions to simulate multivariate correlated data given non-parametric marginal distributions and their covariance structure, characterized by a correlation matrix from a purely computational perspective. See <a href="doi:10.1080/03610918208812265">Iman and Conover (1982)</a>, <a href="doi:10.1080/00273170802285693">Ruscio and Kaczetow (2008)</a>, and the <a href="https://cran.r-project.org/web/packages/SimJoint/vignettes/SimulatedJointDistribution.pdf">vignette</a> for details.</p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=pinochet">pinochet</a> v0.1.0: Contains data about the victims of the Pinochet regime as compiled by the Chilean National Commission for Truth and Reconciliation Report (1991, ISBN:9780268016463). See the <a href="https://cran.r-project.org/web/packages/pinochet/vignettes/pinochet.html">vignette</a>.</p>
<p><img src="/post/2019-07-19-June-Top40_files/pinochet.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=usdarnass">usdarnass</a> v0.1.0: Offers an alternative for downloading various United States Department of Agriculture <a href="https://quickstats.nass.usda.gov/">(USDA)</a> data through R. There is a <a href="https://cran.r-project.org/web/packages/usdarnass/vignettes/usdarnass.html">Getting Started Guide</a> and a vignette on <a href="https://cran.r-project.org/web/packages/usdarnass/vignettes/usdarnass_output.html">usdarnass output</a>.</p>
<h3 id="finance">Finance</h3>
<p><a href="https://cran.r-project.org/package=ceRtainty">ceRtainty</a> v1.0.0: Provides functions to compute the certainty equivalents and premium risks as tools for risk-efficiency analysis. For more technical information, see <a href="doi:10.1111/j.1467-8489.2004.00239.x">Hardaker et al. (2004)</a>, and <a href="doi:10.2495/RISK080231">Richardson and Outlaw (2008)</a>. The <a href="https://cran.r-project.org/web/packages/ceRtainty/vignettes/ceRtainty.pdf">vignette</a> contains examples.</p>
<p><a href="https://cran.r-project.org/package=portfolioBacktest">portfolioBacktest</a> v0.1.1: Implements automated backtesting of multiple portfolios over multiple data sets of stock prices in a rolling-window fashion. Intended for researchers, practitioners, and Finance instructors. See the <a href="https://cran.r-project.org/web/packages/portfolioBacktest/vignettes/PortfolioBacktest.html">vignette</a> to get started.</p>
<p><img src="/post/2019-07-19-June-Top40_files/portfolioBacktest.png" height = "400" width="600"></p>
<h3 id="genomics">Genomics</h3>
<p><a href="https://cran.r-project.org/package=jackalope">jackalope</a> v0.1.2: Provides functions to simulate variants from reference genomes and reads from both <a href="https://www.illumina.com/">Illumina</a> and <a href="https://www.pacb.com/">Pacific Biosciences (PacBio)</a> platforms. Simulating Illumina sequencing is based on ART by <a href="doi:10.1093/bioinformatics/btr708">Huang et al. (2012)</a>. PacBio sequencing simulation is based on SimLoRD by <a href="doi:10.1093/bioinformatics/btw286">Stöcker et al. (2016)</a>. There is an <a href="https://cran.r-project.org/web/packages/jackalope/vignettes/jackalope-intro.html">Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/jackalope/vignettes/sub-models.html">Models of nucleotide substitution</a>.</p>
<p><img src="/post/2019-07-19-June-Top40_files/jacalope.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=Patterns">Patterns</a> v1.0: Implements tools for deciphering biological networks with patterned heterogeneous measurements that enable joint modeling of genes and proteins. There is an <a href="https://cran.r-project.org/web/packages/Patterns/vignettes/IntroPatterns.html">Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/Patterns/vignettes/ExampleCLL.html">Network Inference</a>.</p>
<p><img src="/post/2019-07-19-June-Top40_files/Patterns.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=subgxe">subgxe</a> v0.9.0: Implements functions to combine multiple GWAS using gene environment interactions and p-value assisted subset testing (PASTA), as described in <a href="doi:10.1159/000496867">Yu et al. (2019)</a>. Get started with the <a href="https://cran.r-project.org/web/packages/subgxe/vignettes/subgxe.pdf">vignette</a>.</p>
<h3 id="marketing">Marketing</h3>
<p><a href="https://cran.r-project.org/package=mmetrics">mmetrics</a> v0.2.0: Provides a mechanism for easy computation of marketing metrics. Default metrics include Click Through Rate, Conversion Rate, and Cost Per Click, but you can define your own metrics easily. See the <a href="https://cran.r-project.org/web/packages/mmetrics/vignettes/introduction.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=promotionImpact">promotionImpact</a> v0.1.2: Provides functions to analyze and measure promotion effectiveness on a given target variable (e.g., daily sales). Promotion effects are estimated while controlling for trend, periodicity, and structural change using <code>prophet</code> <a href="doi:10.7287/peerj.preprints.3190v2">Taylor and Letham (2017)</a>. See the <a href="https://github.com/ncsoft/promotionImpact">GitHub</a> page for information.</p>
<p><img src="/post/2019-07-19-June-Top40_files/promotionImpact.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=uplifteval">uplifteval</a>: Provides a variety of plots and metrics to evaluate uplift models, including the R <a href="https://CRAN.R-project.org/package=uplift">uplift</a> package’s Qini metric and Qini plot. For background, see <a href="https://pdfs.semanticscholar.org/147b/32f3d56566c8654a9999c5477dded233328e.pdf">Radcliffe (2007)</a>. There are vignettes on the <a href="https://cran.r-project.org/web/packages/uplifteval/vignettes/existing_packages.html"><code>plot_uplift_guelman()</code></a>, <a href="https://cran.r-project.org/web/packages/uplifteval/vignettes/plot_uplift.html"><code>plot_uplift()</code></a>, and <a href="https://cran.r-project.org/web/packages/uplifteval/vignettes/pylift.html"><code>pl_plot()</code></a> functions.</p>
<p><img src="/post/2019-07-19-June-Top40_files/uplifteval.png" height = "200" width="400"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=archetypal">archetypal</a> v1.0.0: Provides functions to perform archetypal analysis by using a convex hull approximation. See <a href="doi:10.1016/j.neucom.2011.06.033">Morup and Hansen (2012)</a>, <a href="doi:10.1287/moor.10.2.180">Hochbaum and Shmoys (1985)</a>, <a href="doi:10.1145/355759.355768">Eddy (1977)</a>, <a href="doi:10.1145/235815.235821">Barber et al. (1996)</a>, and <a href="doi:10.2139/ssrn.3043076">Christopoulos (2016)</a> for background information, and the <a href="https://cran.r-project.org/web/packages/archetypal/vignettes/archetypal.html">vignette</a> for an introduction.</p>
<p><img src="/post/2019-07-19-June-Top40_files/archetypal.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=googleCloudVisionR">googleCloudVisionR</a> v0.1.0: Provides access to the <a href="https://cloud.google.com/vision/">Google Cloud Vision</a> API in R. It is part of the <a href="https://cloudyr.github.io/">cloudyr</a> project.</p>
<p><a href="https://cran.r-project.org/package=modelDown">modelDown</a> v1.0.1: Implements a website generator with HTML summaries for predictive models. This package uses <a href="https://cran.r-project.org/package=DALEX">DALEX</a> explainers to describe global model behavior. See the <a href="https://github.com/MI2DataLab/modelDown">GitHub page</a> for getting started information and examples.</p>
<p><img src="/post/2019-07-19-June-Top40_files/modelDown.png" height = "400" width="600"></p>
<h3 id="science-and-medicine">Science and Medicine</h3>
<p><a href="https://cran.r-project.org/package=iCARH">iCARH</a> v2.0.0: Implements the integrative conditional autoregressive horseshoe model discussed in <a href="arXiv:1801.07767">Jendoubi and Ebbels (2018)</a>. The <a href="https://cran.r-project.org/web/packages/iCARH/vignettes/example.html">vignette</a> provides an example.</p>
<p><a href="https://cran.r-project.org/package=justifier">justifier</a> v0.1.0: Implements a <a href="https://yaml.org">YAML</a>-based standard for documenting justifications, such as for decisions taken during the planning, execution, and analysis of a study or during the development of a behavior change intervention. See <a href="doi:10.17605/osf.io/ndxha">Marques and Peters (2019)</a> for background. There is an <a href="https://cran.r-project.org/web/packages/justifier/vignettes/general-introduction-to-justifier.html">Introduction</a> and vignettes on using <code>justifier</code> in <a href="https://cran.r-project.org/web/packages/justifier/vignettes/justifier-in-intervention-development.html">behavior change intervention</a> and in <a href="https://cran.r-project.org/web/packages/justifier/vignettes/justifier-in-study-design.html">study design</a>.</p>
<p><a href="https://cran.r-project.org/package=replicateBE">replicateBE</a> v1.0.9: Implements comparative bioavailability-calculations for the EMA’s <a href="https://link.springer.com/article/10.1007/s11095-011-0651-y">Average Bioequivalence</a> with Expanding Limits (ABEL), and includes <code>Method A</code> and <code>Method B</code> detection of outliers. See the <a href="https://cran.r-project.org/web/packages/replicateBE/vignettes/vignette.html">vignette</a>.</p>
<p><img src="/post/2019-07-19-June-Top40_files/replicateBE.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=StratifiedMedicine">StratifiedMedicine</a> v0.1.0: Provides analytic and visualization tools to aid in stratified and personalized medicine. Stratified medicine aims to find subgroups of patients with similar treatment effects, while personalized medicine aims to understand treatment effects at the individual level. There is an <a href="https://cran.r-project.org/web/packages/StratifiedMedicine/vignettes/SM_PRISM.html">Introduction</a>.</p>
<p><img src="/post/2019-07-19-June-Top40_files/StratefiedMedicine.png" height = "400" width="600"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=cpsurvsim">cpsurvsim</a> v1.1.0: Provides functions to simulate time-to-event data with type I right censoring using two methods: the inverse CDF method and a proposed memoryless method. See <a href="https://www.demogr.mpg.de/papers/technicalreports/tr-2010-003.pdf">Rainer Walke (2010)</a> for background and the <a href="https://cran.r-project.org/web/packages/cpsurvsim/vignettes/cpsurvsim-vignette.html">vignette</a> for the math.</p>
<p><a href="https://cran.r-project.org/package=durmod">durmod</a> v1.1: Provides functions to estimate piecewise constant mixed proportional hazard competing risk models as described in <a href="doi:10.1016/j.jeconom.2007.01.015">Gaure et al. (2007)</a>, <a href="doi:10.2307/1911491">Heckman and Singer (1984)</a>, and <a href="doi:10.1214/aos/1176346059">Lindsay (1983)</a>. The <a href="https://cran.r-project.org/web/packages/durmod/vignettes/whatmph.pdf">vignette</a> provides examples.</p>
<p><a href="https://cran.r-project.org/package=kernelPSI">kernelPSI</a> v1.0.0: Implements post-selection inference strategies for kernel selection, as described in <a href="http://proceedings.mlr.press/v97/slim19a/slim19a.pdf">Slim et al. (2019)</a>. See the <a href="https://cran.r-project.org/web/packages/kernelPSI/vignettes/kernelPSI.html">vignette</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=missSBM">missSBM</a> v0.2.0: Provides methods for handling missing data in stochastic block models. See <a href="doi:10.1080/01621459.2018.1562934">Tabouy et al. (2019)</a> for background and the <a href="https://cran.r-project.org/web/packages/missSBM/vignettes/case_study_war_networks.html">vignette</a> for a case study.</p>
<p><img src="/post/2019-07-19-June-Top40_files/missSBM.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=RandomCoefficients">RandomCoefficients</a> v0.0.2: Implements adaptive estimation of the joint density of a linear model in which the coefficients (intercept and slopes) are random and independent of the regressors. See <a href="arXiv:1905.06584">Gaillac and Gautier (2019)</a> for background information and the <a href="https://cran.r-project.org/web/packages/RandomCoefficients/vignettes/RandomCoefficients.pdf">vignette</a> for the theory and examples.</p>
<p><img src="/post/2019-07-19-June-Top40_files/RandomCoefficients.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=spatialfusion">spatialfusion</a> v0.6: Provides functions for multivariate modelling of geostatistical (point), lattice (areal), and point pattern data in a unifying spatial fusion framework. Details are given in <a href="arXiv:1906.00364">Wang and Furrer (2019)</a>. Model inference is done using either <a href="https://mc-stan.org/">Stan</a> or <a href="http://www.r-inla.org">INLA</a>. See the <a href="https://cran.r-project.org/web/packages/spatialfusion/vignettes/spatialfusion_vignette.pdf">vignette</a> for examples.</p>
<p><img src="/post/2019-07-19-June-Top40_files/spatialfusion.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=ui">ui</a> v0.1.0: Implements functions to derive uncertainty intervals for (i) regression (linear and probit) parameters when outcomes are not missing at random (see <a href="doi:10.1007/s00362-014-0610-x">Genbaeck et al. (2015)</a> and <a href="doi:10.1007/s10433-017-0448-x">Genbaeck et al. (2018)</a>), and (ii) double robust and outcome regression estimators of average causal effects with possibly unobserved confounding (see <a href="doi:10.1111/biom.13001">Genbaeck and de Luna (2018)</a>).</p>
<p><a href="https://cran.r-project.org/package=varclust">varclust</a> v0.9.4: Provides functions to cluster quantitative variables, assuming that clusters lie in low-dimensional subspaces. Segmentation of variables, number of clusters, and their dimensions are selected based on BIC. There is a brief <a href="https://cran.r-project.org/web/packages/varclust/vignettes/varclustTutorial.html">tutorial</a>.</p>
<h3 id="time-series">Time Series</h3>
<p><a href="https://cran.r-project.org/package=bvartools">bvartools</a> v0.0.1: Implements some common functions used for Bayesian inference for multivariate time series models. There is an <a href="https://cran.r-project.org/web/packages/bvartools/vignettes/bvartools.html">Introduction</a> and vignettes on <a href="https://cran.r-project.org/web/packages/bvartools/vignettes/bsvar.html">Bayesian Structural Vector Autoregression</a>, <a href="https://cran.r-project.org/web/packages/bvartools/vignettes/bvec.html">Bayesian Error Correction</a>, and <a href="https://cran.r-project.org/web/packages/bvartools/vignettes/ssvs.html">Stochastic Search Variable Selection</a>.</p>
<p><a href="https://cran.r-project.org/package=wwntests">wwntests</a> v1.0.0: Provides an array of white noise hypothesis tests for functional data and related visualizations. Methods are described in <a href="doi:10.1016/j.jmva.2017.08.004">Kokoszka et al. (2017)</a>, <a href="doi:10.1016/j.ecosta.2019.01.003">Characiejus and Rice (2019)</a>, and <a href="doi:10.1198/016214507000001111">Gabrys and Kokoszka (2007)</a>. See the <a href="https://cran.r-project.org/web/packages/wwntests/vignettes/wwntests_vignette.html">vignette</a> for details.</p>
<p><img src="/post/2019-07-19-June-Top40_files/wwwntests.png" height = "400" width="600"></p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=gargle">gargle</a> v0.3.0: Provides utilities for working with <a href="https://developers.google.com/apis-explorer">Google APIs</a>, including functions and classes for handling common credential types and for preparing, executing, and processing HTTP requests. There are vignettes on <a href="https://cran.r-project.org/web/packages/gargle/vignettes/gargle-auth-in-client-package.html">authentication</a>, <a href="https://cran.r-project.org/web/packages/gargle/vignettes/get-api-credentials.html">API Credentials</a>, <a href="https://cran.r-project.org/web/packages/gargle/vignettes/how-gargle-gets-tokens.html">tokens</a>, and <a href="https://cran.r-project.org/web/packages/gargle/vignettes/request-helper-functions.html">helper functions</a>.</p>
<p><a href="https://cran.r-project.org/package=git2rdata">git2rdata</a> v0.1: Provides functions to store and retrieve data frames in a Git repository, making versioning of <code>data.frame</code>s easy and efficient. There is a <a href="https://cran.r-project.org/web/packages/git2rdata/vignettes/plain_text.html">Getting Started Guide</a> and vignettes on <a href="https://cran.r-project.org/web/packages/git2rdata/vignettes/efficiency.html">Efficiency</a>, <a href="https://cran.r-project.org/web/packages/git2rdata/vignettes/version_control.html">Optimizing Storage</a>, and a <a href="https://cran.r-project.org/web/packages/git2rdata/vignettes/workflow.html">Suggested Workflow</a>.</p>
<p><img src="/post/2019-07-19-June-Top40_files/git2rdata.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=metapost">metapost</a> v1.0-6: Provides an interface to the <a href="http://www.tug.org/docs/metapost/mpman.pdf">MetaPost</a> programming language. There are functions to generate an R description of a MetaPost curve, functions to generate MetaPost code from an R description, functions to process MetaPost code, and functions to read solved MetaPost paths back into R. Look <a href="https://stattech.wordpress.fos.auckland.ac.nz/2018/12/03/2018-12-metapost-three-ways/">here</a> for different approaches for communicating between R and MetaPost.</p>
<p><a href="https://cran.r-project.org/package=rless">rless</a>: Provides CSS preprocessor features using the <a href="http://lesscss.org/">LESS</a> language, a CSS extension that lets you use variables, functions, and operators while creating styles. See the <a href="https://cran.r-project.org/web/packages/rless/vignettes/basic-h">vignette</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=rock">rock</a> v0.0.1: Implements the Reproducible Open Coding Kit, which was developed to facilitate reproducible and open coding, specifically geared towards qualitative research methods. See the <a href="https://cran.r-project.org/web/packages/rock/vignettes/introduction_to_rock.html">Introduction</a> for details.</p>
<p><a href="https://cran.r-project.org/package=tidyrules">tidyrules</a> v0.1.0: Provides functions to convert a text-based summary of rule-based models to a tidy data frame (where each row represents a rule), with related metrics such as support, confidence, and lift. See the <a href="https://cran.r-project.org/web/packages/tidyrules/vignettes/tidyrules_vignette.html">vignette</a> for examples.</p>
<p><img src="/post/2019-07-19-June-Top40_files/tidyrules.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=tsibbledata">tsibbledata</a> v0.1.0: Provides diverse data sets in the <code>tsibble</code> data structure, which are useful for learning and demonstrating how tidy temporal data can be tidied, visualized, and forecasted. See <a href="https://cran.r-project.org/web/packages/tsibbledata/readme/README.html">README</a> for an example.</p>
<p><img src="/post/2019-07-19-June-Top40_files/tsibbledata.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=websocket">websocket</a> v1.0.0: Implements a <a href="https://en.wikipedia.org/wiki/WebSocket">WebSocket</a> client interface for R. WebSocket is a protocol for low-overhead real-time communication. There is an <a href="https://cran.r-project.org/web/packages/websocket/vignettes/overview.html">Overview</a> vignette.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=basetheme">basetheme</a> v0.1.1: Provides functions to create and select graphical themes for the base plotting system. See <a href="https://cran.r-project.org/web/packages/basetheme/readme/README.html">README</a> for the themes.</p>
<p><img src="/post/2019-07-19-June-Top40_files/basethemes.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=condvis2">condvis2</a> v0.1.0: Extends the <a href="https://cran.r-project.org/package=condvis">condvis</a> package and Shiny app with interactive displays for conditional visualization of models, data, and density functions. See <a href="doi:10.18637/jss.v081.i05">O’Connell et al. (2017)</a> for the background. There is an <a href="https://cran.r-project.org/web/packages/condvis2/vignettes/introduction.html">Introduction</a> and vignettes on <a href="https://cran.r-project.org/web/packages/condvis2/vignettes/Keras.html">Exploring keras models</a> and <a href="https://cran.r-project.org/web/packages/condvis2/vignettes/mclust.html">Exploring model-based clustering</a>.</p>
<p><img src="/post/2019-07-19-June-Top40_files/condvis2.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=nomnoml">nomnoml</a> v0.1.0: Implements a tool for drawing <a href="https://en.wikipedia.org/wiki/Unified_Modeling_Language">UML</a> diagrams based on a simple syntax. See the <a href="https://cran.r-project.org/web/packages/nomnoml/readme/README.html">README</a> for examples, and look <a href="http://www.nomnoml.com/">here</a> for a demo.</p>
<p><img src="/post/2019-07-19-June-Top40_files/nomnoml.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=ormPlot">ormPlot</a> v0.3.2: Extends the <a href="https://cran.r-project.org/package=rms"><code>rms</code></a> Regression Modeling Strategies package that facilitates plotting ordinal regression model predictions together with confidence intervals for each dependent variable level. The <a href="https://cran.r-project.org/web/packages/ormPlot/vignettes/ormPlot.htm">vignette</a> provides examples.</p>
<p><img src="/post/2019-07-19-June-Top40_files/ormPlot.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=sugarbag">sugarbag</a> v0.1.0: Provides functions to create a hexagon tilegram from spatial polygons. Developed to aid visualization and analysis of spatial distributions across Australia, which can be challenging due to the concentration of the population on the coast and the wide open interior of the country. There is a vignette pointing to <a href="https://cran.r-project.org/web/packages/sugarbag/vignettes/abs-data.html">ABS Data</a> and another developing an example for <a href="https://cran.r-project.org/web/packages/sugarbag/vignettes/tasmania.html">Tasmania</a>.</p>
<p><img src="/post/2019-07-19-June-Top40_files/sugarbag.png" height = "400" width="600"></p>
Three Strategies for Working with Big Data in R
https://rviews.rstudio.com/2019/07/17/3-big-data-strategies-for-r/
Wed, 17 Jul 2019 00:00:00 +0000
<p>For many R users, it’s obvious <em>why</em> you’d want to use R with big data, but not so obvious how. In fact, many people (wrongly) believe that R just doesn’t work very well for big data.</p>
<p>In this article, I’ll share three strategies for thinking about how to use big data in R, as well as some examples of how to execute each of them.</p>
<p>By default, R runs only on data that can fit into your computer’s memory. Hardware advances have made this less of a problem for many users: these days, most laptops come with at least 4-8 GB of memory, and you can get instances on any major cloud provider with terabytes of RAM. But this is still a real problem for almost any data set that could really be called <em>big data</em>.</p>
<p>The fact that R runs on in-memory data is the biggest issue that you face when trying to use Big Data in R. The data has to fit into the RAM on your machine, and it’s not even 1:1. Because you’re actually <em>doing</em> something with the data, a good rule of thumb is that your machine needs 2-3x the RAM of the size of your data.</p>
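<p>As a rough sketch of that rule of thumb, the arithmetic below estimates the footprint of a purely numeric table and the RAM it implies. The row and column counts are illustrative assumptions, and data frames with character or factor columns will behave differently.</p>
<pre class="r"><code># Illustrative assumption: 100 million rows of 10 numeric (double) columns.
n_rows <- 1e8
n_cols <- 10
bytes_per_double <- 8

# Approximate size of the data in (decimal) gigabytes.
data_gb <- n_rows * n_cols * bytes_per_double / 1e9

# Apply the 2-3x rule of thumb to get a RAM range.
ram_needed_gb <- data_gb * c(low = 2, high = 3)

data_gb
ram_needed_gb

# Sanity check on a small real object: one million doubles
# occupy roughly 8 MB, plus a small fixed overhead.
x <- numeric(1e6)
x_mb <- as.numeric(object.size(x)) / 1e6
x_mb</code></pre>
<p>Under these assumptions the data alone is about 8 GB, so the rule of thumb suggests a machine with roughly 16 to 24 GB of RAM.</p>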
<p>Another big issue for doing Big Data work in R is that data transfer speeds are extremely slow relative to the time it takes to actually do data processing once the data has transferred. For example, a call over the internet from San Francisco to New York City takes over 4 times longer than reading from a standard hard drive and over 200 times longer than reading from a solid state hard drive.<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a> This is an especially big problem early in developing a model or analytical project, when data might have to be pulled repeatedly.</p>
<p>Nevertheless, there are effective methods for working with big data in R. In this post, I’ll share three strategies. It is important to note that these strategies aren’t mutually exclusive – they can be combined as you see fit!</p>
<div id="strategy-1-sample-and-model" class="section level2">
<h2>Strategy 1: Sample and Model</h2>
<p>To sample and model, you downsample your data to a size that can be easily downloaded in its entirety and create a model on the sample. Downsampling to thousands – or even hundreds of thousands – of data points can make model runtimes feasible while also maintaining statistical validity.<a href="#fn2" class="footnote-ref" id="fnref2"><sup>2</sup></a></p>
<p>If maintaining class balance is necessary (or one class needs to be over/under-sampled), it’s reasonably simple to stratify the data set during sampling.</p>
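<p>A minimal sketch of stratified downsampling with <code>dplyr</code> might look like the following. The simulated data, the <code>class</code> column, and the per-class sample size of 200 are all assumptions made for illustration; <code>slice_sample()</code> requires dplyr 1.0.0 or later.</p>
<pre class="r"><code>library(dplyr)

set.seed(42)

# Hypothetical imbalanced data set standing in for a much larger table.
full <- tibble(
  id = 1:10000,
  class = sample(c("rare", "common"), 10000, replace = TRUE,
                 prob = c(0.05, 0.95)),
  x = rnorm(10000)
)

# Draw the same number of rows from each class so the sample is balanced.
balanced <- full %>%
  group_by(class) %>%
  slice_sample(n = 200) %>%
  ungroup()

count(balanced, class)</code></pre>
<p>Changing <code>n</code> per group, or using the <code>prop</code> argument instead, gives over- or under-sampling of a particular class.</p>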
<div class="figure">
<img src="/post/2019-07-01-3-big-data-paradigms-for-r_files/sample_model.png" alt="Illustration of Sample and Model" />
<p class="caption">Illustration of Sample and Model</p>
</div>
<div id="advantages" class="section level3">
<h3>Advantages</h3>
<ul>
<li><strong>Speed</strong> Relative to working on your entire data set, working on just a sample can drastically decrease run times and increase iteration speed.</li>
<li><strong>Prototyping</strong> Even if you’ll eventually have to run your model on the entire data set, this can be a good way to refine hyperparameters and do feature engineering for your model.</li>
<li><strong>Packages</strong> Since you’re working on a normal in-memory data set, you can use all your favorite R packages.</li>
</ul>
</div>
<div id="disadvantages" class="section level3">
<h3>Disadvantages</h3>
<ul>
<li><strong>Sampling</strong> Downsampling isn’t terribly difficult, but does need to be done with care to ensure that the sample is valid and that you’ve pulled enough points from the original data set.</li>
<li><strong>Scaling</strong> If you’re using sample and model to prototype something that will later be run on the full data set, you’ll need to have a strategy (such as <a href="#push-compute">pushing compute to the data</a>) for scaling your prototype version back to the full data set.</li>
<li><strong>Totals</strong> Business Intelligence (BI) tasks frequently answer questions about totals, like the count of all sales in a month. One of the other strategies is usually a better fit in this case.</li>
</ul>
</div>
</div>
<div id="strategy-2-chunk-and-pull" class="section level2">
<h2>Strategy 2: Chunk and Pull</h2>
<p>In this strategy, the data is chunked into separable units and each chunk is pulled separately and operated on serially, in parallel, or after recombining. This strategy is conceptually similar to the <a href="https://en.wikipedia.org/wiki/MapReduce">MapReduce</a> algorithm. Depending on the task at hand, the chunks might be time periods, geographic units, or logical units such as separate businesses, departments, products, or customer segments.</p>
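<p>The pattern can be sketched in base R as follows. Here the built-in <code>mtcars</code> data stands in for a large remote table and <code>cyl</code> plays the role of the chunking variable; in practice each chunk would be pulled from the database separately. The helper <code>fit_chunk()</code> is a hypothetical name used only for this illustration.</p>
<pre class="r"><code># Stand-in for pulling one chunk at a time from a database.
chunks <- split(mtcars, mtcars$cyl)

# Work done on each chunk: fit a small model and keep a summary.
fit_chunk <- function(chunk) {
  fit <- lm(mpg ~ wt, data = chunk)
  data.frame(n = nrow(chunk), slope = unname(coef(fit)["wt"]))
}

# Process the chunks serially; parallel::mclapply() or
# parallel::parLapply() could be swapped in to run them in parallel.
results <- lapply(chunks, fit_chunk)

# Recombine the per-chunk summaries into one data frame.
combined <- do.call(rbind, results)
combined$cyl <- names(results)

combined</code></pre>
<p>Because each chunk is processed independently, only one chunk’s worth of data needs to be in memory at a time if the chunks are pulled inside the loop.</p>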
<div class="figure">
<img src="/post/2019-07-01-3-big-data-paradigms-for-r_files/chunk_pull.png" alt="Chunk and Pull Illustration" />
<p class="caption">Chunk and Pull Illustration</p>
</div>
<div id="advantages-1" class="section level3">
<h3>Advantages</h3>
<ul>
<li><strong>Full data set</strong> The entire data set gets used.</li>
<li><strong>Parallelization</strong> If the chunks are run separately, the problem is easy to treat as <a href="https://en.wikipedia.org/wiki/Embarrassingly_parallel">embarrassingly parallel</a> and make use of parallelization to speed runtimes.</li>
</ul>
</div>
<div id="disadvantages-1" class="section level3">
<h3>Disadvantages</h3>
<ul>
<li><strong>Need Chunks</strong> Your data needs to have separable chunks for chunk and pull to be appropriate.</li>
<li><strong>Pull All Data</strong> You eventually have to pull in all the data, which may still be very time and memory intensive.</li>
<li><strong>Stale Data</strong> The data may require periodic refreshes from the database to stay up-to-date since you’re saving a version on your local machine.</li>
</ul>
</div>
</div>
<div id="push-compute" class="section level2">
<h2>Strategy 3: Push Compute to Data</h2>
<p>In this strategy, the data is compressed on the database, and only the compressed data set is moved out of the database into R. It is often possible to obtain significant speedups simply by doing summarization or filtering in the database before pulling the data into R.</p>
<p>Sometimes, more complex operations are also possible, including computing histograms and raster maps with <a href="https://db.rstudio.com/dbplot/"><code>dbplot</code></a>, building a model with <a href="https://cran.r-project.org/web/packages/modeldb/index.html"><code>modeldb</code></a>, and generating predictions from machine learning models with <a href="https://db.rstudio.com/tidypredict/"><code>tidypredict</code></a>.</p>
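<p>A minimal sketch of summarizing in the database before pulling data into R is shown below. It assumes the <code>RSQLite</code> and <code>dbplyr</code> packages are installed, and it uses an in-memory SQLite database and the <code>mtcars</code> data purely as stand-ins for a real database and a large table.</p>
<pre class="r"><code>library(DBI)
library(dplyr)

# Assumption: an in-memory SQLite database stands in for a remote database.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "cars", mtcars)

# This pipeline is translated to SQL and runs inside the database;
# only the small summary table is moved into R by collect().
summary_local <- tbl(con, "cars") %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

summary_local

DBI::dbDisconnect(con)</code></pre>
<p>The result has one row per group, so the amount of data transferred depends on the number of groups rather than the number of rows in the table.</p>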
<div class="figure">
<img src="/post/2019-07-01-3-big-data-paradigms-for-r_files/push_data.png" alt="Push Compute to Data Illustration" />
<p class="caption">Push Compute to Data Illustration</p>
</div>
<div id="advantages-2" class="section level3">
<h3>Advantages</h3>
<ul>
<li><strong>Use the Database</strong> Takes advantage of what databases are often best at: quickly summarizing and filtering data based on a query.</li>
<li><strong>More Info, Less Transfer</strong> By compressing before pulling data back to R, the entire data set gets used, but transfer times are far less than moving the entire data set.</li>
</ul>
</div>
<div id="disadvantages-2" class="section level3">
<h3>Disadvantages</h3>
<ul>
<li><strong>Database Operations</strong> Depending on what database you’re using, some operations might not be supported.</li>
<li><strong>Database Speed</strong> In some contexts, the limiting factor for data analysis is the speed of the database itself, and so pushing more work onto the database is the last thing analysts want to do.</li>
</ul>
</div>
</div>
<div id="an-example" class="section level2">
<h2>An Example</h2>
<p>I’ve preloaded the <code>flights</code> data set from the <a href="https://cran.r-project.org/web/packages/nycflights13/index.html"><code>nycflights13</code></a> package into a PostgreSQL database, which I’ll use for these examples.</p>
<p>Let’s start by connecting to the database. I’m using a config file here to connect to the database, one of RStudio’s <a href="https://db.rstudio.com/best-practices/managing-credentials/">recommended database connection methods</a>:</p>
<pre class="r"><code>library(DBI)
library(dplyr)
library(ggplot2)
db <- DBI::dbConnect(
odbc::odbc(),
Driver = config$driver,
Server = config$server,
Port = config$port,
Database = config$database,
UID = config$uid,
PWD = config$pwd,
BoolsAsChar = ""
)</code></pre>
<p>The <a href="https://dplyr.tidyverse.org/"><code>dplyr</code></a> package is a great tool for interacting with databases, since I can write normal R code that is translated into SQL on the backend. I could also use the <a href="https://db.rstudio.com/dbi/"><code>DBI</code></a> package to send queries directly, or a <a href="https://bookdown.org/yihui/rmarkdown/language-engines.html#sql">SQL chunk</a> in the R Markdown document.</p>
<pre class="r"><code>df <- dplyr::tbl(db, "flights")
tally(df)</code></pre>
<pre><code>## # A tibble: 1 x 1
## n
## <int>
## 1 336776</code></pre>
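<p>To see what “translated into SQL” means without needing a live database, here is a small sketch using <code>dbplyr</code>. The <code>flights_lazy</code> table and its two columns are made up for illustration, and <code>simulate_postgres()</code> only selects the SQL dialect. The exact SQL text depends on the <code>dbplyr</code> version, so no output is shown.</p>
<pre class="r"><code>library(dplyr)
library(dbplyr)

# A lazy table with no real connection behind it; simulate_postgres()
# only tells dbplyr which SQL dialect to generate.
flights_lazy <- lazy_frame(
  carrier   = "UA",
  arr_delay = 1,
  con       = simulate_postgres()
)

# Build a query lazily, then print the SQL instead of running it
query <- flights_lazy %>%
  filter(arr_delay > 0) %>%
  count(carrier)

show_query(query)</code></pre>
<p>Calling <code>show_query()</code> on a real <code>tbl()</code> works the same way, and it is a handy check that a pipeline is still running in the database rather than in R.</p>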
<p>With only a few hundred thousand rows, this example isn’t close to the kind of big data that really requires a Big Data strategy, but it’s rich enough to demonstrate on.</p>
</div>
<div id="sample-and-model" class="section level2">
<h2>Sample and Model</h2>
<p>Let’s say I want to model whether flights will be delayed or not. This is a great problem to sample and model.</p>
<p>Let’s start with some minor cleaning of the data:</p>
<pre class="r"><code># Create is_delayed column in database
df <- df %>%
mutate(
# Create is_delayed column
is_delayed = arr_delay > 0,
# Get just hour (currently formatted so 6 pm = 1800)
hour = sched_dep_time / 100
) %>%
# Remove small carriers that make modeling difficult
filter(!is.na(is_delayed) & !carrier %in% c("OO", "HA"))
df %>% count(is_delayed)</code></pre>
<pre><code>## # A tibble: 2 x 2
## is_delayed n
## <lgl> <int>
## 1 FALSE 194078
## 2 TRUE 132897</code></pre>
<p>These classes are reasonably well balanced, but since I’m going to be using logistic regression, I’m going to load a perfectly balanced sample of 40,000 data points.</p>
<p>For most databases, random sampling methods don’t work super smoothly with R, so I can’t use <code>dplyr::sample_n</code> or <code>dplyr::sample_frac</code>. I’ll have to be a little more manual.</p>
<pre class="r"><code>set.seed(1028)
# Create a modeling dataset
df_mod <- df %>%
# Within each class
group_by(is_delayed) %>%
# Assign random rank (using random and row_number from postgres)
mutate(x = random() %>% row_number()) %>%
ungroup()
# Take first 20K for each class for training set
df_train <- df_mod %>%
filter(x <= 20000) %>%
collect()
# Take next 5K for test set
df_test <- df_mod %>%
filter(x > 20000 & x <= 25000) %>%
collect()
# Double check I sampled right
count(df_train, is_delayed)
count(df_test, is_delayed)</code></pre>
<pre><code>## # A tibble: 2 x 2
## is_delayed n
## <lgl> <int>
## 1 FALSE 20000
## 2 TRUE 20000</code></pre>
<pre><code>## # A tibble: 2 x 2
## is_delayed n
## <lgl> <int>
## 1 FALSE 5000
## 2 TRUE 5000</code></pre>
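<p>The random-rank trick is easier to see on a small local example. This is a base R sketch of the same idea, not the code that ran against the database: <code>runif()</code> stands in for Postgres’s <code>random()</code>, <code>ave()</code> with <code>rank</code> stands in for the grouped <code>row_number()</code>, and the toy data and cutoffs are invented.</p>
<pre class="r"><code>set.seed(1028)

# Toy data: 60 on-time and 40 delayed flights
toy <- data.frame(is_delayed = rep(c(FALSE, TRUE), times = c(60, 40)))

# Random rank within each class: draw a uniform number per row, then
# rank those draws separately inside each is_delayed group.
toy$x <- ave(runif(nrow(toy)), toy$is_delayed, FUN = rank)

# First 20 ranks per class form the training set, the next 5 the test set
toy_train <- toy[toy$x <= 20, ]
toy_test  <- toy[toy$x > 20 & toy$x <= 25, ]

table(toy_train$is_delayed)
table(toy_test$is_delayed)</code></pre>
<p>Since the two filters use disjoint rank ranges, no row can land in both sets. One caveat: in the database version, the random numbers are drawn by Postgres, so <code>set.seed()</code> in R probably does not make that sample reproducible.</p>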
<p>Now let’s build a model – let’s see if we can predict whether there will be a delay or not by the combination of the carrier, the month of the flight, and the time of day of the flight.</p>
<pre class="r"><code>mod <- glm(is_delayed ~ carrier +
as.character(month) +
poly(sched_dep_time, 3),
family = "binomial",
data = df_train)
# Out-of-Sample AUROC
df_test$pred <- predict(mod, newdata = df_test)
auc <- suppressMessages(pROC::auc(df_test$is_delayed, df_test$pred))
auc</code></pre>
<pre><code>## Area under the curve: 0.6425</code></pre>
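<p>One detail worth spelling out: <code>predict()</code> on a <code>glm</code> returns log-odds by default, not probabilities. That is fine here because AUROC depends only on how the predictions are ordered. The sketch below computes AUROC from ranks in base R to show this; the <code>auroc()</code> helper and the six toy observations are mine, not part of <code>pROC</code>.</p>
<pre class="r"><code># AUROC from ranks (Mann-Whitney U statistic), base R only
auroc <- function(labels, scores) {
  r  <- rank(scores)            # average ranks handle ties
  n1 <- sum(labels)             # number of positives
  n0 <- sum(!labels)            # number of negatives
  (sum(r[labels]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

labels   <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
log_odds <- c(2.0, 0.5, 0.1, -0.3, -1.0, -2.5)

# plogis() maps log-odds to probabilities without changing their order
auroc(labels, log_odds)
auroc(labels, plogis(log_odds))</code></pre>
<p>Both calls return 8/9 (about 0.889): of the nine delayed/on-time pairs, eight are ranked correctly.</p>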
<p>As you can see, this is not a great model and any modelers reading this will have many ideas of how to improve what I’ve done. But that wasn’t the point!</p>
<p>I built a model on a small subset of a big data set. Including sampling time, this took my laptop less than 10 seconds to run, making it easy to iterate quickly as I want to improve the model. After I’m happy with this model, I could pull down a larger sample or even the entire data set if it’s feasible, or do something with the model from the sample.</p>
</div>
<div id="chunk-and-pull" class="section level2">
<h2>Chunk and Pull</h2>
<p>In this case, I want to build another model of on-time arrival, but I want to do it per-carrier. This is exactly the kind of use case that’s ideal for chunk and pull. I’m going to separately pull the data in by carrier and run the model on each carrier’s data.</p>
<p>I’m going to start by just getting the complete list of the carriers.</p>
<pre class="r"><code># Get all unique carriers
carriers <- df %>%
select(carrier) %>%
distinct() %>%
pull(carrier)</code></pre>
<p>Now, I’ll write a function that</p>
<ul>
<li>takes the name of a carrier as input</li>
<li>pulls the data for that carrier into R</li>
<li>splits the data into training and test</li>
<li>trains the model</li>
<li>outputs the out-of-sample AUROC (a common measure of model quality)</li>
</ul>
<pre class="r"><code>carrier_model <- function(carrier_name) {
# Pull a chunk of data
df_mod <- df %>%
dplyr::filter(carrier == carrier_name) %>%
collect()
# Split into training and test
split <- df_mod %>%
rsample::initial_split(prop = 0.9, strata = "is_delayed") %>%
suppressMessages()
# Get training data
df_train <- split %>% rsample::training()
# Train model
mod <- glm(is_delayed ~ as.character(month) + poly(sched_dep_time, 3),
family = "binomial",
data = df_train)
# Get out-of-sample AUROC
df_test <- split %>% rsample::testing()
df_test$pred <- predict(mod, newdata = df_test)
suppressMessages(auc <- pROC::auc(df_test$is_delayed ~ df_test$pred))
auc
}</code></pre>
<p>Now, I’m going to actually run the carrier model function across each of the carriers. This code runs pretty quickly, and so I don’t think the overhead of parallelization would be worth it. But if I wanted to, I would replace the <code>lapply</code> call below with a parallel backend.<a href="#fn3" class="footnote-ref" id="fnref3"><sup>3</sup></a></p>
<pre class="r"><code>set.seed(98765)
mods <- lapply(carriers, carrier_model) %>%
suppressMessages()
names(mods) <- carriers</code></pre>
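<p>For reference, here is one way the swap to a parallel backend might look using the base <code>parallel</code> package. This is only a sketch: <code>toy_model()</code> is a hypothetical stand-in for <code>carrier_model()</code>, and a real version would likely need each worker to open its own database connection, since connections generally cannot be shared across processes.</p>
<pre class="r"><code>library(parallel)

# Hypothetical stand-in for carrier_model(): any function that takes one
# chunk label and uses random numbers internally.
toy_model <- function(carrier_name) {
  mean(rnorm(100))
}

carriers_toy <- c("UA", "AA", "B6", "DL")

cl <- makeCluster(2)                 # two local worker processes
clusterSetRNGStream(cl, 98765)       # reproducible, independent RNG streams
mods_toy <- parLapply(cl, carriers_toy, toy_model)
stopCluster(cl)

names(mods_toy) <- carriers_toy</code></pre>
<p>A plain <code>set.seed()</code> in the main session does not reach the workers, which is why the sketch seeds them with <code>clusterSetRNGStream()</code> instead.</p>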
<p>Let’s look at the results.</p>
<pre class="r"><code>mods</code></pre>
<pre><code>## $UA
## Area under the curve: 0.6408
##
## $AA
## Area under the curve: 0.6041
##
## $B6
## Area under the curve: 0.6475
##
## $DL
## Area under the curve: 0.6162
##
## $EV
## Area under the curve: 0.6419
##
## $MQ
## Area under the curve: 0.5973
##
## $US
## Area under the curve: 0.6096
##
## $WN
## Area under the curve: 0.6968
##
## $VX
## Area under the curve: 0.6969
##
## $FL
## Area under the curve: 0.6347
##
## $AS
## Area under the curve: 0.6906
##
## $`9E`
## Area under the curve: 0.6071
##
## $F9
## Area under the curve: 0.625
##
## $YV
## Area under the curve: 0.7029</code></pre>
<p>So these models (again) are a little better than random chance. The point was that we used the chunk and pull strategy to pull the data separately by logical units and build a model on each chunk.</p>
</div>
<div id="push-compute-to-the-data" class="section level2">
<h2>Push Compute to the Data</h2>
<p>In this case, I’m doing a pretty simple BI task - plotting the proportion of flights that are late by the hour of departure and the airline.</p>
<p>Just by way of comparison, let’s run this first the naive way – pulling all the data to my system and then doing my data manipulation to plot.</p>
<pre class="r"><code>system.time(
df_plot <- df %>%
collect() %>%
# Change is_delayed to numeric
mutate(is_delayed = ifelse(is_delayed, 1, 0)) %>%
group_by(carrier, sched_dep_time) %>%
# Get proportion per carrier-time
summarize(delay_pct = mean(is_delayed, na.rm = TRUE)) %>%
ungroup() %>%
# Change string times into actual times
mutate(sched_dep_time = stringr::str_pad(sched_dep_time, 4, "left", "0") %>%
strptime("%H%M") %>%
as.POSIXct())) -> timing1</code></pre>
<p>Now that wasn’t too bad, just <code>2.366</code> seconds on my laptop.</p>
<p>But let’s see how much of a speedup we can get from pushing the compute to the database. The conceptual change here is significant - I’m doing as much work as possible on the Postgres server now instead of locally. But using <code>dplyr</code> means that the code change is minimal. The only difference in the code is that the <code>collect</code> call got moved down by a few lines (to below <code>ungroup()</code>).</p>
<pre class="r"><code>system.time(
df_plot <- df %>%
# Change is_delayed to numeric
mutate(is_delayed = ifelse(is_delayed, 1, 0)) %>%
group_by(carrier, sched_dep_time) %>%
# Get proportion per carrier-time
summarize(delay_pct = mean(is_delayed, na.rm = TRUE)) %>%
ungroup() %>%
collect() %>%
# Change string times into actual times
mutate(sched_dep_time = stringr::str_pad(sched_dep_time, 4, "left", "0") %>%
strptime("%H%M") %>%
as.POSIXct())) -> timing2</code></pre>
<p>It might have taken you the same time to read this code as the last chunk, but this took only <code>0.269</code> seconds to run, almost an order of magnitude faster!<a href="#fn4" class="footnote-ref" id="fnref4"><sup>4</sup></a> That’s pretty good for just moving one line of code.</p>
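<p>The speedup (2.366 / 0.269, or roughly 8.8x) comes from the database doing the grouping and averaging, so that only the summary rows are transferred. A connection-free sketch with <code>dbplyr</code> shows the kind of SQL the steps above <code>collect()</code> turn into. The <code>flights_lazy</code> table is a made-up stand-in with the same column names, and the exact SQL text varies by <code>dbplyr</code> version.</p>
<pre class="r"><code>library(dplyr)
library(dbplyr)

# Connection-free lazy table; column names mirror the flights table
flights_lazy <- lazy_frame(
  carrier        = "UA",
  sched_dep_time = 1800L,
  is_delayed     = TRUE,
  con            = simulate_postgres()
)

pushed <- flights_lazy %>%
  mutate(is_delayed = ifelse(is_delayed, 1, 0)) %>%
  group_by(carrier, sched_dep_time) %>%
  summarize(delay_pct = mean(is_delayed, na.rm = TRUE)) %>%
  ungroup()

# Everything above collect() is expressed as one SQL statement
show_query(pushed)</code></pre>
<p>The string padding and <code>strptime()</code> steps stay after <code>collect()</code> because they are ordinary R functions operating on the small, already-summarized result.</p>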
<p>Now that we’ve done a speed comparison, we can create the nice plot we all came for.</p>
<pre class="r"><code>df_plot %>%
mutate(carrier = paste0("Carrier: ", carrier)) %>%
ggplot(aes(x = sched_dep_time, y = delay_pct)) +
geom_line() +
facet_wrap("carrier") +
ylab("Proportion of Flights Delayed") +
xlab("Time of Day") +
scale_y_continuous(labels = scales::percent) +
scale_x_datetime(date_breaks = "4 hours",
date_labels = "%H")</code></pre>
<p><img src="/post/2019-07-01-3-big-data-paradigms-for-r_files/figure-html/unnamed-chunk-17-1.png" width="672" /></p>
<p>It looks to me like flights later in the day might be a little more likely to experience delays, but that’s a question for another blog post.</p>
</div>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p><a href="https://blog.codinghorror.com/the-infinite-space-between-words/" class="uri">https://blog.codinghorror.com/the-infinite-space-between-words/</a><a href="#fnref1" class="footnote-back">↩</a></p></li>
<li id="fn2"><p>This isn’t just a general heuristic. You’ll probably remember that the standard error in many statistical processes is proportional to <span class="math inline">\(\frac{1}{\sqrt{n}}\)</span> for sample size <span class="math inline">\(n\)</span>, so a lot of the statistical power in your model is driven by adding the first few thousand observations compared to the final millions.<a href="#fnref2" class="footnote-back">↩</a></p></li>
<li id="fn3"><p>One of the biggest problems when parallelizing is dealing with random number generation, which you use here to make sure that your test/training splits are reproducible. It’s not an insurmountable problem, but requires some careful thought.<a href="#fnref3" class="footnote-back">↩</a></p></li>
<li id="fn4"><p>And lest you think the real difference here is offloading computation to a more powerful database, this Postgres instance is running on a container on my laptop, so it’s got exactly the same horsepower behind it.<a href="#fnref4" class="footnote-back">↩</a></p></li>
</ol>
</div>
Dividend Sleuthing with R
https://rviews.rstudio.com/2019/07/09/dividend-sleuthing-with-r/
Tue, 09 Jul 2019 00:00:00 +0000https://rviews.rstudio.com/2019/07/09/dividend-sleuthing-with-r/
<p>Welcome to a mid-summer edition of <a href="http://www.reproduciblefinance.com/">Reproducible Finance with R</a>. Today, we’ll explore the dividend histories of some stocks in the S&P 500. By way of history for all you young tech IPO and crypto investors out there: way back, a long time ago in the dark ages, companies used to take pains to generate free cash flow and then return some of that free cash to investors in the form of dividends. That hasn’t been very popular in the last 15 years, but now that it looks increasingly likely that interest rates will never, ever, ever rise to “normal” levels again, dividend yields from those dinosaur companies should be an attractive source of cash for a while.</p>
<p>Let’s load up our packages.</p>
<pre class="r"><code>library(tidyverse)
library(tidyquant)
library(riingo)
riingo_set_token("your tiingo api key here")</code></pre>
<p>We are going to source our data from <code>tiingo</code> by way of the <code>riingo</code> package, the same as we did in this <a href="http://www.reproduciblefinance.com/2019/01/14/looking-back-on-last-year/">previous post</a>, but first we need the tickers from the S&P 500. Fortunately, the <code>tidyquant</code> package has this covered with the <code>tq_index()</code> function.</p>
<pre class="r"><code>sp_500 <-
tq_index("SP500")
sp_500 %>%
head()</code></pre>
<pre><code># A tibble: 6 x 5
symbol company weight sector shares_held
<chr> <chr> <dbl> <chr> <dbl>
1 MSFT Microsoft Corporation 0.0425 Information Techno… 85542230
2 AAPL Apple Inc. 0.0354 Information Techno… 48795028
3 AMZN Amazon.com Inc. 0.0327 Consumer Discretio… 4616564
4 FB Facebook Inc. Class A 0.0190 Communication Serv… 26820356
5 BRK.B Berkshire Hathaway Inc. Cl… 0.0169 Financials 21622312
6 JNJ Johnson & Johnson 0.0151 Health Care 29639160</code></pre>
<p>We want to <code>pull()</code> out the symbols, and we also want to make sure they are supported by <code>tiingo</code>. We can create a master list of supported tickers with the <code>supported_tickers()</code> function from <code>riingo</code>.</p>
<pre class="r"><code># heads up, there are ~79,000 tickers, about 4 MB of RAM on your machine
test_tickers <-
supported_tickers() %>%
select(ticker) %>%
pull()</code></pre>
<p>Let’s arrange the <code>sp_500</code> tickers by the <code>weight</code> column and then slice the top 30 for today’s illustrative purposes. We could easily extend this to the entire index by commenting out the <code>slice()</code> code and running this on all 505 tickers (caution: I did this to test the code flow and it works but it’s a big, RAM-intensive job.)</p>
<pre class="r"><code>tickers <-
sp_500 %>%
arrange(desc(weight)) %>%
# We'll run this on the top 30, easily extendable to whole 500
slice(1:30) %>%
filter(symbol %in% test_tickers) %>%
pull(symbol)</code></pre>
<p>Let’s import our data from <code>tiingo</code>, using the <code>riingo</code> package. Since we’re only halfway through 2019 and companies haven’t completed their annual dividend payments yet, let’s exclude this year and set <code>end_date = "2018-12-31"</code>.</p>
<pre class="r"><code>divs_from_riingo <-
tickers %>%
riingo_prices(start_date = "1990-01-01", end_date = "2018-12-31") %>%
arrange(ticker) %>%
mutate(date = ymd(date))
divs_from_riingo %>%
select(date, ticker, close, divCash) %>%
head()</code></pre>
<pre><code># A tibble: 6 x 4
date ticker close divCash
<date> <chr> <dbl> <dbl>
1 1990-01-02 AAPL 37.2 0
2 1990-01-03 AAPL 37.5 0
3 1990-01-04 AAPL 37.6 0
4 1990-01-05 AAPL 37.8 0
5 1990-01-08 AAPL 38 0
6 1990-01-09 AAPL 37.6 0</code></pre>
<p>Let’s take a look at the most recent dividend paid by each company, to search for any weird outliers. We don’t want any 0-dollar dividend dates, so we <code>filter(divCash > 0)</code>, and we get the most recent payment with <code>slice(n())</code>, which grabs the last row of each group.</p>
<pre class="r"><code>divs_from_riingo %>%
group_by(ticker) %>%
filter(divCash > 0) %>%
slice(n()) %>%
ggplot(aes(x = date, y = divCash, color = ticker)) +
geom_point() +
geom_label(aes(label = ticker)) +
scale_y_continuous(labels = scales::dollar) +
scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
labs(x = "", y = "div/share", title = "Most Recent Divs: Top 30 SP 500 companies") +
theme(legend.position = "none",
plot.title = element_text(hjust = 0.5)) </code></pre>
<p><img src="/post/2019-07-08-dividend-sleuthing-with-r_files/figure-html/unnamed-chunk-6-1.png" width="672" /></p>
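<p>To make the <code>filter()</code> then <code>slice(n())</code> step concrete, here is a toy version with two made-up tickers. Note that it relies on the rows already being in date order within each ticker, which we assume holds for the price data used here.</p>
<pre class="r"><code>library(dplyr)

toy_divs <- tibble(
  ticker  = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  date    = as.Date(c("2017-03-01", "2018-03-01", "2018-06-01",
                      "2016-05-01", "2018-05-01")),
  divCash = c(0.10, 0, 0.12, 0.50, 0)
)

latest <- toy_divs %>%
  group_by(ticker) %>%
  filter(divCash > 0) %>%   # drop the zero-dividend rows first
  slice(n()) %>%            # then keep the last remaining row per ticker
  ungroup()

latest</code></pre>
<p>For <code>BBB</code>, the last paying row is from 2016, not 2018. That is the same effect that lets an old payment show up in a “most recent dividend” plot.</p>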
<p>A $500 dividend from Google back in 2014? A little, um, internet search engine’ing reveals that Google had a stock split in 2014 and issued that split as a dividend. That’s not quite what we want to capture today - in fact, we’re pretty much going to ignore splits and special dividends. For now, let’s adjust our filter to <code>filter(date > "2017-12-31" & divCash > 0)</code> and grab the last dividend paid in 2018.</p>
<pre class="r"><code>divs_from_riingo %>%
group_by(ticker) %>%
filter(date > "2017-12-31" & divCash > 0) %>%
slice(n()) %>%
ggplot(aes(x = date, y = divCash, color = ticker)) +
geom_point() +
geom_label(aes(label = ticker)) +
scale_y_continuous(labels = scales::dollar) +
scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
labs(x = "", y = "div/share", title = "2018 Divs: Top 30 SP 500 companies") +
theme(legend.position = "none",
plot.title = element_text(hjust = 0.5)) </code></pre>
<p><img src="/post/2019-07-08-dividend-sleuthing-with-r_files/figure-html/unnamed-chunk-7-1.png" width="672" /></p>
<p>Note, this is the absolute cash dividend payout. The dividend yield, the total annual cash dividend divided by the share price, might be more meaningful to us.</p>
<p>To get the total annual yield, we want to sum the total dividends in 2018 and divide by the closing price at, say, the first dividend date.</p>
<p>To get total dividends in a year, we first create a year column with <code>mutate(year = year(date))</code>, then <code>group_by(year, ticker)</code>, and finally make the calculation with <code>mutate(div_total = sum(divCash))</code>. From there, the yield is <code>mutate(div_yield = div_total/close)</code>.</p>
<pre class="r"><code>divs_from_riingo %>%
group_by(ticker) %>%
filter(date > "2017-12-31" & divCash > 0) %>%
mutate(year = year(date)) %>%
group_by(year, ticker) %>%
mutate(div_total = sum(divCash)) %>%
slice(1) %>%
mutate(div_yield = div_total/close) %>%
ggplot(aes(x = date, y = div_yield, color = ticker)) +
geom_point() +
geom_text(aes(label = ticker), vjust = 0, nudge_y = 0.002) +
scale_y_continuous(labels = scales::percent, breaks = scales::pretty_breaks(n = 10)) +
scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
labs(x = "", y = "yield", title = "2018 Div Yield: Top 30 SP 500 companies") +
theme(legend.position = "none",
plot.title = element_text(hjust = 0.5)) </code></pre>
<p><img src="/post/2019-07-08-dividend-sleuthing-with-r_files/figure-html/unnamed-chunk-8-1.png" width="672" /></p>
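<p>Here is the yield arithmetic on a single made-up ticker, as a sanity check of the code flow. The prices and dividends are invented, and the year is extracted with base <code>format()</code> rather than <code>lubridate::year()</code> only to keep the sketch self-contained.</p>
<pre class="r"><code>library(dplyr)

# Hypothetical ticker paying four quarterly dividends in 2018
toy <- tibble(
  ticker  = "AAA",
  date    = as.Date(c("2018-02-01", "2018-05-01", "2018-08-01", "2018-11-01")),
  close   = c(50, 52, 55, 53),
  divCash = c(0.25, 0.25, 0.30, 0.30)
)

toy_yield <- toy %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(year, ticker) %>%
  mutate(div_total = sum(divCash)) %>%   # 1.10 repeated on every row
  slice(1) %>%                           # keep the first dividend date
  mutate(div_yield = div_total / close) %>%
  ungroup()

toy_yield$div_yield</code></pre>
<p>The total of 1.10 divided by the first-date close of 50 gives a yield of 0.022, or 2.2%. Using a different date’s close would give a slightly different yield, so the choice of date is a convention rather than the one right answer.</p>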
<p>Let’s nitpick this visualization and take issue with the fact that some of the labels are overlapping - just the type of small error that drives our end audience crazy. It is also a good opportunity to explore the <code>ggrepel</code> package. We can use the <code>geom_text_repel()</code> function, which will somehow “automatically” position our labels in a non-overlapping way.</p>
<pre class="r"><code>library(ggrepel)
divs_from_riingo %>%
group_by(ticker) %>%
filter(date > "2017-12-31" & divCash > 0) %>%
mutate(year = year(date)) %>%
group_by(year, ticker) %>%
mutate(div_total = sum(divCash)) %>%
slice(1) %>%
mutate(div_yield = div_total/close) %>%
ggplot(aes(x = date, y = div_yield, color = ticker)) +
geom_point() +
geom_text_repel(aes(label = ticker), vjust = 0, nudge_y = 0.002) +
scale_y_continuous(labels = scales::percent, breaks = scales::pretty_breaks(n = 10)) +
scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
labs(x = "", y = "yield", title = "2018 Div Yield: Top 30 SP 500 companies") +
theme(legend.position = "none",
plot.title = element_text(hjust = 0.5)) </code></pre>
<p><img src="/post/2019-07-08-dividend-sleuthing-with-r_files/figure-html/unnamed-chunk-9-1.png" width="672" /></p>
<p>We have a decent snapshot of these companies’ dividends in 2018. Let’s dig into their histories a bit. If you’re into this kind of thing, you might have heard of the Dividend Aristocrats or Dividend Kings or some other nomenclature that indicates a quality dividend-paying company. The core of these classifications is the consistency with which companies increase their dividend payouts, because those annual increases indicate a company that has strong free cash flow and believes that shareholder value is tied to the dividend. For examples of companies which don’t fit this description, check out every tech company that has IPO’d in the last decade. (Just kidding. Actually, I’m not kidding but now I guess I’m obligated to do a follow-up post on every tech company that has IPO’d in the last decade so we can dig into their free cash flow - stay tuned!)</p>
<p>Back to dividend consistency, we’re interested in how each company has behaved each year, and specifically scrutinizing whether the company increased its dividend from the previous year.</p>
<p>This is one of those fascinating areas of exploration that is quite easy to conceptualize and even describe in plain English, but turns out to require (at least for me) quite a bit of thought and white-boarding (I mean Googling and Stack Overflow snooping) to implement with code. In English, we want to calculate the total dividend payout for each company each year, then count how many years in a row the company has increased that dividend consecutively up to today. Let’s get to it with some code.</p>
<p>We’ll start with a very similar code flow as above, except we want the whole history, no filtering to just 2018. Let’s also <code>arrange(ticker)</code> so we can glance at how each ticker behaved.</p>
<pre class="r"><code>divs_from_riingo %>%
mutate(year = year(date)) %>%
#filter(year > "2017")
group_by(year, ticker) %>%
mutate(div_total = sum(divCash)) %>%
slice(1) %>%
arrange(ticker) %>%
select(div_total) %>%
head(10) </code></pre>
<pre><code># A tibble: 10 x 3
# Groups: year, ticker [10]
year ticker div_total
<dbl> <chr> <dbl>
1 1990 AAPL 0.45
2 1991 AAPL 0.48
3 1992 AAPL 0.48
4 1993 AAPL 0.48
5 1994 AAPL 0.48
6 1995 AAPL 0.48
7 1996 AAPL 0
8 1997 AAPL 0
9 1998 AAPL 0
10 1999 AAPL 0 </code></pre>
<p>Let’s save that as an object called <code>divs_total</code>.</p>
<pre class="r"><code>divs_total <-
divs_from_riingo %>%
mutate(year = year(date)) %>%
group_by(year, ticker) %>%
mutate(div_total = sum(divCash)) %>%
slice(1) %>%
arrange(ticker) %>%
select(div_total)</code></pre>
<p>The data now looks like we’re ready to get to work. But have a quick peek at AAPL’s more recent years, which fall outside the ten rows printed above. Notice anything weird? AAPL paid a dividend of <code>$5.30</code> in 2012, <code>$11.80</code> in 2013, then <code>$7.28</code> in 2014, then around <code>$2.00</code> going forward. There was a stock split and we probably shouldn’t evaluate that as a dividend decrease without adjusting for the split. We don’t have that stock split data here today, but if we were doing this in a more robust way, and maybe over the course of two posts, we would mash in stock split data and adjust the dividends accordingly. For now, we’ll just carry on using the <code>div_total</code> as the true dividend payout.</p>
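<p>To give a feel for what a split adjustment would do, here is a rough sketch using the AAPL totals quoted in the text. The split factor is an assumption that is not part of our data (AAPL’s 7-for-1 split in mid-2014), and 2014 is left out because that year mixes pre-split and post-split payments.</p>
<pre class="r"><code># AAPL yearly totals quoted in the text (2014 is left out because that
# year mixes pre-split and post-split payments).
aapl <- data.frame(
  year      = c(2012, 2013, 2015),
  div_total = c(5.30, 11.80, 2.03)
)

# Assumption: a single 7-for-1 split in mid-2014, so one pre-split share
# corresponds to seven post-split shares.
split_factor <- 7

aapl$div_adjusted <- ifelse(aapl$year < 2014,
                            aapl$div_total / split_factor,
                            aapl$div_total)
aapl</code></pre>
<p>On this adjusted basis the series rises from about 0.76 to about 1.69 to 2.03, so the apparent 2015 cut disappears. A proper treatment would adjust each individual payment by the splits that occurred after it.</p>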
<p>Now we want to find years of consecutive increase, up to present day.
Let’s use <code>mutate(div_increase = case_when(div_total > lag(div_total, 1) ~ 1, TRUE ~ 0))</code> to create a new column called <code>div_increase</code> that is a 1 if the dividend increased from the previous year and a 0 otherwise.</p>
<pre class="r"><code>divs_total %>%
group_by(ticker) %>%
mutate(div_increase = case_when(div_total > lag(div_total, 1) ~ 1,
TRUE ~ 0)) %>%
tail(10)</code></pre>
<pre><code># A tibble: 10 x 4
# Groups: ticker [1]
year ticker div_total div_increase
<dbl> <chr> <dbl> <dbl>
1 2009 XOM 1.66 1
2 2010 XOM 1.74 1
3 2011 XOM 1.85 1
4 2012 XOM 2.18 1
5 2013 XOM 2.46 1
6 2014 XOM 2.70 1
7 2015 XOM 2.88 1
8 2016 XOM 2.98 1
9 2017 XOM 3.06 1
10 2018 XOM 3.23 1</code></pre>
<p>We can see that XOM has a nice history of increasing its dividend.</p>
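<p>Here is the same flag on a tiny made-up data set, with the lagged value kept as its own column so we can see what is being compared. The tickers and totals are invented for illustration.</p>
<pre class="r"><code>library(dplyr)

toy_totals <- tibble(
  ticker    = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  year      = c(2016, 2017, 2018, 2016, 2017, 2018),
  div_total = c(1.00, 1.10, 1.05, 0.40, 0.40, 0.50)
)

toy_flags <- toy_totals %>%
  group_by(ticker) %>%
  mutate(
    prev_total   = lag(div_total, 1),   # NA for each ticker's first year
    div_increase = case_when(div_total > prev_total ~ 1,
                             TRUE ~ 0)
  ) %>%
  ungroup()

toy_flags$div_increase</code></pre>
<p>The flags come out as 0, 1, 0 for <code>AAA</code> and 0, 0, 1 for <code>BBB</code>. Two things to notice: an unchanged dividend counts as no increase, and each ticker’s first year is always 0 because there is no previous year to compare against.</p>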
<p>For my brain, it’s easier to work through the next steps if we put 2018 as the first observation. Let’s use <code>arrange(desc(year))</code> to accomplish that.</p>
<pre class="r"><code>divs_total %>%
group_by(ticker) %>%
mutate(div_increase = case_when(div_total > lag(div_total, 1) ~ 1,
TRUE ~ 0)) %>%
arrange(desc(year)) %>%
arrange(ticker) %>%
slice(1:10)</code></pre>
<pre><code># A tibble: 282 x 4
# Groups: ticker [29]
year ticker div_total div_increase
<dbl> <chr> <dbl> <dbl>
1 2018 AAPL 2.82 1
2 2017 AAPL 2.46 1
3 2016 AAPL 2.23 1
4 2015 AAPL 2.03 0
5 2014 AAPL 7.28 0
6 2013 AAPL 11.8 1
7 2012 AAPL 5.3 1
8 2011 AAPL 0 0
9 2010 AAPL 0 0
10 2009 AAPL 0 0
# … with 272 more rows</code></pre>
<p>Here I sliced off the first 10 rows of each group, so I could glance with my human eyes and see if anything looked weird or plain wrong. Try filtering by <code>AMZN</code> or <code>FB</code> or <code>GOOG</code>, companies we know haven’t paid a dividend. Make sure we see all zeroes. How about, say, <code>MSFT</code>?</p>
<pre class="r"><code>divs_total %>%
group_by(ticker) %>%
mutate(div_increase = case_when(div_total > lag(div_total, 1) ~ 1,
TRUE ~ 0)) %>%
arrange(desc(year)) %>%
arrange(ticker) %>%
filter(ticker == "MSFT")</code></pre>
<pre><code># A tibble: 29 x 4
# Groups: ticker [1]
year ticker div_total div_increase
<dbl> <chr> <dbl> <dbl>
1 2018 MSFT 1.72 1
2 2017 MSFT 1.59 1
3 2016 MSFT 1.47 1
4 2015 MSFT 1.29 1
5 2014 MSFT 1.15 1
6 2013 MSFT 0.97 1
7 2012 MSFT 0.83 1
8 2011 MSFT 0.68 1
9 2010 MSFT 0.55 1
10 2009 MSFT 0.52 1
# … with 19 more rows</code></pre>
<p>Looks like a solid history going back to 2003. In 2004 <code>MSFT</code> issued a special dividend that then made 2005 look like a down year. Again, we’re not correcting for splits and special events so we’ll just let this be. Back to it.</p>
<p>Now we want to start in 2018 (the last full year in our data) and count the number of consecutive dividend increases, or 1’s in the <code>div_increase</code> column, going back in time. This is the piece that I found most difficult to implement: how do we detect whether a certain behavior has occurred every year and, once we see it stop, delete all the rows after it has stopped? I considered trying to condition on both increases and consecutive years, but finally discovered a combination of <code>slice()</code>, <code>seq_len()</code>, and <code>min(which(div_increase == 0))</code> that worked well.</p>
<p>Let’s take it piece-by-painstaking-piece. If we just <code>slice(which(div_increase == 0))</code>, we get every instance of a dividend increase equal to 0.</p>
<pre class="r"><code>divs_total %>%
group_by(ticker) %>%
mutate(div_increase = case_when(div_total > lag(div_total, 1) ~ 1,
TRUE ~ 0)) %>%
arrange(desc(year)) %>%
arrange(ticker) %>%
slice(which(div_increase == 0))</code></pre>
<pre><code># A tibble: 319 x 4
# Groups: ticker [29]
year ticker div_total div_increase
<dbl> <chr> <dbl> <dbl>
1 2015 AAPL 2.03 0
2 2014 AAPL 7.28 0
3 2011 AAPL 0 0
4 2010 AAPL 0 0
5 2009 AAPL 0 0
6 2008 AAPL 0 0
7 2007 AAPL 0 0
8 2006 AAPL 0 0
9 2005 AAPL 0 0
10 2004 AAPL 0 0
# … with 309 more rows</code></pre>
<p>If we use <code>slice(min(which(div_increase == 0)))</code>, we get the first instance, for each group, of a dividend increase of zero.</p>
<pre class="r"><code>divs_total %>%
group_by(ticker) %>%
mutate(div_increase = case_when(div_total > lag(div_total, 1) ~ 1,
TRUE ~ 0)) %>%
arrange(desc(year)) %>%
arrange(ticker) %>%
slice(min(which(div_increase == 0)))</code></pre>
<pre><code># A tibble: 29 x 4
# Groups: ticker [29]
year ticker div_total div_increase
<dbl> <chr> <dbl> <dbl>
1 2015 AAPL 2.03 0
2 2018 AMZN 0 0
3 2011 BA 1.68 0
4 2013 BAC 0.04 0
5 2014 C 0.04 0
6 2017 CMCSA 0.473 0
7 2010 CSCO 0 0
8 2005 CVX 1.75 0
9 2009 DIS 0.35 0
10 2018 FB 0 0
# … with 19 more rows</code></pre>
<p>The magic comes from <code>seq_len</code>. If we use <code>slice(seq_len(min(which(div_increase == 0))))</code>, then we slice the rows from row 1 (which recall is the year 2018) through the first time we see a dividend increase of 0. <code>seq_len()</code> creates a sequence of 1 through some number. In this case, we create a sequence of 1 through the row number where we first see a dividend increase of 0 (there’s a good tutorial on <code>seq_len</code> <a href="http://www.rpubs.com/Mentors_Ubiqum/seq_len">here</a>). And that’s what we want to keep.</p>
<pre class="r"><code>divs_total %>%
group_by(ticker) %>%
mutate(div_increase = case_when(div_total > lag(div_total, 1) ~ 1,
TRUE ~ 0)) %>%
arrange(desc(year)) %>%
arrange(ticker) %>%
slice(seq_len(min(which(div_increase == 0))))</code></pre>
<pre><code># A tibble: 240 x 4
# Groups: ticker [29]
year ticker div_total div_increase
<dbl> <chr> <dbl> <dbl>
1 2018 AAPL 2.82 1
2 2017 AAPL 2.46 1
3 2016 AAPL 2.23 1
4 2015 AAPL 2.03 0
5 2018 AMZN 0 0
6 2018 BA 6.84 1
7 2017 BA 5.68 1
8 2016 BA 4.36 1
9 2015 BA 3.64 1
10 2014 BA 2.92 1
# … with 230 more rows</code></pre>
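<p>It can help to run the three nested pieces on a plain vector. This base R sketch uses a made-up set of flags for one ticker, ordered newest year first.</p>
<pre class="r"><code># Flags for one ticker, newest year first (1 = increase, 0 = no increase)
div_increase <- c(1, 1, 1, 0, 1, 0)

which(div_increase == 0)                 # positions of every 0: 4 and 6
first_zero <- min(which(div_increase == 0))
seq_len(first_zero)                      # row indices 1, 2, 3, 4

# The rows that slice() would keep for this ticker
kept <- div_increase[seq_len(first_zero)]
kept</code></pre>
<p>The kept rows are 1, 1, 1, 0: the current streak plus the row that ended it. This works because every ticker has at least one 0 (its earliest year has no previous year to compare to). If a group had no 0 at all, <code>which()</code> would return nothing and <code>min()</code> would return <code>Inf</code> with a warning.</p>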
<p>Now let’s order our data by those years of increase, so that the company with the longest consecutive years of increase is at the top.</p>
<pre class="r"><code>divs_total %>%
group_by(ticker) %>%
mutate(div_increase = case_when(div_total > lag(div_total, 1) ~ 1,
TRUE ~ 0)) %>%
arrange(desc(year)) %>%
arrange(ticker) %>%
slice(seq_len(min(which(div_increase ==0)))) %>%
mutate(div_inc_consec = sum(div_increase)) %>%
slice(1) %>%
arrange(desc(div_inc_consec))</code></pre>
<pre><code># A tibble: 29 x 5
# Groups: ticker [29]
year ticker div_total div_increase div_inc_consec
<dbl> <chr> <dbl> <dbl> <dbl>
1 2018 PEP 3.59 1 20
2 2018 JNJ 3.54 1 16
3 2018 XOM 3.23 1 16
4 2018 CVX 4.48 1 13
5 2018 MSFT 1.72 1 13
6 2018 PG 2.84 1 13
7 2018 T 2 1 12
8 2018 DIS 1.72 1 9
9 2018 HD 4.12 1 9
10 2018 UNH 3.45 1 9
# … with 19 more rows</code></pre>
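<p>As a cross-check on the streak count, here is an alternative we could have used. It is a sketch of a different technique, not the approach taken above: <code>cumprod()</code> on the newest-first flags stays at 1 until the first 0 and is 0 from then on, so its sum is the streak length.</p>
<pre class="r"><code># Same flags as a ticker's div_increase column, newest year first
div_increase <- c(1, 1, 1, 0, 1, 0)

# Approach from the post: keep rows up to the first 0, then sum
first_zero <- min(which(div_increase == 0))
streak_slice <- sum(div_increase[seq_len(first_zero)])

# Alternative: cumprod() stays 1 until the first 0 and is 0 afterwards
streak_cumprod <- sum(cumprod(div_increase))

c(streak_slice, streak_cumprod)</code></pre>
<p>Both give 3 for this vector. The <code>cumprod()</code> version would also cope with a group that has no 0 at all, though that case does not arise in our data.</p>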
<p>And the winner is….</p>
<pre class="r"><code>divs_total %>%
group_by(ticker) %>%
mutate(div_increase = case_when(div_total > lag(div_total, 1) ~ 1,
TRUE ~ 0)) %>%
arrange(desc(year)) %>%
arrange(ticker) %>%
slice(seq_len(min(which(div_increase ==0)))) %>%
mutate(div_inc_consec = sum(div_increase)) %>%
slice(1) %>%
arrange(desc(div_inc_consec)) %>%
filter(div_inc_consec > 0) %>%
ggplot(aes(x = reorder(ticker, div_inc_consec), y = div_inc_consec, fill = ticker)) +
geom_col(width = .5) +
geom_label_repel(aes(label = ticker), color = "white", nudge_y = .6) +
theme(legend.position = "none",
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
labs(x = "", y = "years consec div increase") +
scale_y_continuous(breaks = scales::pretty_breaks(n = 10))</code></pre>
<p><img src="/post/2019-07-08-dividend-sleuthing-with-r_files/figure-html/unnamed-chunk-19-1.png" width="672" /></p>
<p><code>PEP</code>!! Followed by <code>JNJ</code> and <code>XOM</code>, though <code>MSFT</code> would have made the leader board without that special dividend, which made their dividend seem to drop in 2005.</p>
<p>Before we close, for the curious, I did run this separately on all 505 members of the S&P 500. It took about five minutes to pull the data from <code>tiingo</code> because it’s 500 tickers, daily data, over 28 years, 2 million rows, 14 columns, and total size of 305.2 MB, but <em>fortes fortuna iuvat</em>. Here’s a plot of all the tickers with at least five consecutive years of dividend increases. This was created with the exact code flow we used above, except I didn’t filter down to the top 30 by market cap.</p>
<p><img src="/post/2019-07-08-dividend-sleuthing-with-r_files/plot_consec_greater_than_five.png" /></p>
<p>Aren’t you just dying to know which ticker is that really high bar at the extreme right? It’s
<code>ATO</code>, Atmos Energy Corporation, with 28 years of consecutive dividend increases. Side note: if we had adjusted for stock splits, <code>ATO</code> would not have won this horse race.</p>
<p>A couple of plugs before we close:</p>
<p>If you like this sort of thing check out my book, <a href="https://www.amazon.com/Reproducible-Finance-Portfolio-Analysis-Chapman/dp/1138484032">Reproducible Finance with R</a>.</p>
<p>I’m also going to be posting weekly code snippets on <a href="https://www.linkedin.com/in/jkregenstein/">LinkedIn</a>; connect with me there if you’re keen for some weekly R finance stuff.</p>
<p>Happy coding!</p>
May 2019: "Top 40" New CRAN Packages
https://rviews.rstudio.com/2019/06/25/may-2019-top-40-new-cran-packages/
Tue, 25 Jun 2019 00:00:00 +0000https://rviews.rstudio.com/2019/06/25/may-2019-top-40-new-cran-packages/
<p>Two hundred twenty-two new packages made it to CRAN in May, and it was more of an effort than usual to select the “Top 40”. Nevertheless, here they are in nine categories: Computational Methods, Data, Machine Learning, Mathematics, Medicine, Science, Statistics, Utilities, and Visualization.</p>
<h3 id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=dde">dde</a> v1.0.0: Implements a <a href="https://en.wikipedia.org/wiki/Dormand%E2%80%93Prince_method">Dormand-Prince</a> algorithm for solving “non-stiff” differential equations. See the <a href="https://cran.r-project.org/web/packages/dde/vignettes/dde.html">vignette</a> for examples.</p>
<p><img src="/post/2019-06-18-May-Top40_files/dde.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=rTRNG">rTRNG</a> v4.20-1: Implements an interface to <a href="https://www.numbercrunch.de/trng/">Tina’s Random Number Generator</a> C++ library and provides examples of how to use parallel RNG with <a href="https://cran.r-project.org/package=RcppParallel">RcppParallel</a>. See <a href="https://numbercrunch.de/trng/trng.pdf">Bauke (2018)</a> for implementation details, the <a href="https://cran.r-project.org/web/packages/rTRNG/vignettes/mcMat.html">vignette</a> for background, and the <a href="https://cran.r-project.org/web/packages/rTRNG/vignettes/rTRNG.useR2017.pdf">useR!2017 presentation</a> for examples.</p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=ramlegacy">ramlegacy</a> v0.2.0: Provides functions to download, cache and read an Excel version of the <a href="https://www.ramlegacy.org/">RAM Legacy Stock Assessment Data Base</a>, an online compilation of stock assessment results for commercially exploited marine populations from around the world. See the <a href="https://cran.r-project.org/web/packages/ramlegacy/vignettes/ramlegacy.html">vignette</a> for an introduction.</p>
<p><a href="https://cran.r-project.org/package=rnassqs">rnassqs</a> v0.4.0: Implements an interface to the United States Department of Agriculture’s National Agricultural Statistics Service (NASS) <a href="https://quickstats.nass.usda.gov/api">Quick Stats API</a>. There is a <a href="https://cran.r-project.org/web/packages/rnassqs/vignettes/rnassqs.html">vignette</a> showing how to use the package.</p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=EIX">EIX</a> v1.0: Provides functions to examine the structure and explain interactions in <a href="https://cran.r-project.org/package=XGBoost">XGBoost</a> and <a href="https://github.com/Microsoft/LightGBM">LightGBM</a> models, including functions to visualize tree-based ensemble models, identify interactions, and measure variable importance. EIX is a part of the <a href="https://github.com/ModelOriented/DrWhy/blob/master/README.md">DrWhy.AI</a> universe. There is a vignette on <a href="https://cran.r-project.org/web/packages/EIX/vignettes/EIX.html">Explaining Interactions</a> and another analyzing the <a href="https://cran.r-project.org/web/packages/EIX/vignettes/titanic_data.html">Titanic Data</a>.</p>
<p><img src="/post/2019-06-18-May-Top40_files/EIX.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=OTclust">OTclust</a> v1.0.2: Implements a mean partition for ensemble clustering by optimal transport alignment. See the <a href="https://cran.r-project.org/web/packages/OTclust/vignettes/OTclust.htm">vignette</a>.</p>
<p><img src="/post/2019-06-18-May-Top40_files/OTclust.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=paws">paws</a> v0.1.1: An ensemble of packages including <a href="https://cran.r-project.org/package=paws.compute">paws.compute</a>, <a href="https://cran.r-project.org/package=paws.storage">paws.storage</a>, <a href="https://cran.r-project.org/package=paws.database">paws.database</a>, <a href="https://cran.r-project.org/package=paws.networking">paws.networking</a>, <a href="https://cran.r-project.org/package=paws.management">paws.management</a>, <a href="https://cran.r-project.org/package=paws.machine.learning">paws.machine.learning</a>, <a href="https://cran.r-project.org/package=paws.analytics">paws.analytics</a>, <a href="https://cran.r-project.org/package=paws.security.identity">paws.security.identity</a>, <a href="https://cran.r-project.org/package=paws.application.integration">paws.application.integration</a>, <a href="https://cran.r-project.org/package=paws.cost.management">paws.cost.management</a>, and <a href="https://cran.r-project.org/package=paws.customer.engagement">paws.customer.engagement</a> that provides a comprehensive interface to <a href="https://aws.amazon.com/">Amazon Web Services</a>.</p>
<p><a href="https://cran.r-project.org/package=peRspective">peRspective</a> v0.1.0: Implements an interface to the <a href="https://github.com/conversationai/perspectiveapi#perspective-comment-analyzer-api">Perspective API</a>, which uses machine learning models to score the perceived impact a comment might have on a conversation (e.g., TOXICITY, INFLAMMATORY). See <a href="https://cran.r-project.org/web/packages/peRspective/readme/README.html">README</a> for an example.</p>
<p><img src="/post/2019-06-18-May-Top40_files/peRspective.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=SelectBoost">SelectBoost</a> v1.4.0: Implements the <a href="https://arxiv.org/abs/1810.01670">SelectBoost</a> algorithm to enhance the performance of variable selection methods in correlated data sets. The package provides vignettes on <a href="https://cran.r-project.org/web/packages/SelectBoost/vignettes/benchmarking-selectboost-networks.html">Benchmarking</a>, <a href="https://cran.r-project.org/web/packages/SelectBoost/vignettes/confidence-indices-Cascade-networks.html">Confidence Estimates</a>, and <a href="https://cran.r-project.org/web/packages/SelectBoost/vignettes/sim-with-sb.html">Simulation Tools</a>.</p>
<p><img src="/post/2019-06-18-May-Top40_files/SelectBoost.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=spectralGraphTopology">spectralGraphTopology</a> v0.1.1: Implements block coordinate descent estimators to learn k-component, bipartite, and k-component bipartite graphs from data by imposing spectral constraints on the eigenvalues and eigenvectors of the Laplacian and adjacency matrices. This package is based on the paper by <a href="arXiv:1904.09792">Kumar et al. (2019)</a>. There is a <a href="https://cran.r-project.org/web/packages/spectralGraphTopology/vignettes/SpectralGraphTopology-pdf.pdf">vignette</a>.</p>
<p><img src="/post/2019-06-18-May-Top40_files/spectralGraphTopology.png" height = "400" width="600"></p>
<h3 id="mathematics">Mathematics</h3>
<p><a href="https://cran.r-project.org/package=gridBezier">gridBezier</a> v1.1-1: Provides functions for rendering both quadratic and cubic <a href="https://pomax.github.io/bezierinfo/">Bezier curves</a> in grid. Look <a href="https://www.stat.auckland.ac.nz/~paul/Reports/VWline/offsetbezier/offsetbezier.html">here</a> for a tutorial on variable-width Bezier Splines in R.</p>
<p><img src="/post/2019-06-18-May-Top40_files/gridBezier.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=wedge">wedge</a> v1.0-1: Provides functions for working with differentials, k-forms, wedge products, Stokes’s theorem, and related concepts from the <a href="http://www.physics.usu.edu/Wheeler/GaugeTheory/09Jan12zNotes.pdf">exterior calculus</a>.</p>
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=adept">adept</a> v1.1.2: Provides functions for analyzing high-density data from walking strides collected from a wearable accelerometer worn during continuous walking activity. There is an <a href="https://cran.r-project.org/web/packages/adept/vignettes/adept-intro.html">Introduction</a> and a <a href="https://cran.r-project.org/web/packages/adept/vignettes/adept-strides-segmentation.html">vignette</a> on walking stride segmentation.</p>
<p><img src="/post/2019-06-18-May-Top40_files/adept.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=basket">basket</a> v0.9.2: Implements multisource exchangeability models for Bayesian analyses of prespecified subgroups arising in the context of basket trial design and monitoring. See <a href="https://doi.org/10.1056/NEJMoa1502309">Hyman et al. (2015)</a>, <a href="https://doi.org/10.1002/sim.7893">Hobbs & Landin (2018)</a>, <a href="https://doi.org/10.1093/annonc/mdy457">Hobbs et al. (2018)</a> and <a href="https://doi.org/10.1093/biostatistics/kxx031">Kaizer et al. (2017)</a> for background, and the <a href="https://cran.r-project.org/web/packages/basket/vignettes/using-the-basket-package.html">vignette</a> for how to use the package.</p>
<p><img src="/post/2019-06-18-May-Top40_files/basket.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=BayesianFROC">BayesianFROC</a> v0.1.3: Implements methods for Free-response Receiver Operating Characteristic (FROC) analysis to compare performance metrics such as area under the curve (AUC) for the purpose of finding lesions in radiographs of different modalities: Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), etc. See <a href="https://aapm.onlinelibrary.wiley.com/doi/abs/10.1118/1.596358">Chakraborty (1981)</a> for background and the vignettes <a href="https://cran.r-project.org/web/packages/BayesianFROC/vignettes/Theory_of_Bayesian_FROC_with_R_scripts.html">Theory of Bayesian FROC</a> and <a href="https://cran.r-project.org/web/packages/BayesianFROC/vignettes/Brief_explanation.html">Single Reader and Single Modality</a> for how to use the package.</p>
<p><img src="/post/2019-06-18-May-Top40_files/BayesianFROC.jpeg" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=gtsummary">gtsummary</a> v0.1.0: Provides functions to create presentation-ready tables summarizing data sets, regression models, and more. Function defaults follow reporting guidelines outlined in <a href="doi:10.1016/j.eururo.2018.12.014">Assel et al. (2019)</a>. There are tutorials on using the <a href="https://cran.r-project.org/web/packages/gtsummary/vignettes/fmt_regression.html"><code>fmt_regression()</code></a> and <a href="https://cran.r-project.org/web/packages/gtsummary/vignettes/fmt_table1.html"><code>fmt_table1()</code></a> functions.</p>
<h3 id="science">Science</h3>
<p><a href="https://cran.r-project.org/package=beastier">beastier</a> v2.01.15: Implements an API to access the <a href="http://www.beast2.org">BEAST2</a> tool for Bayesian phylogenetic analysis. The <a href="https://cran.r-project.org/web/packages/beastier/vignettes/demo.html">vignette</a> shows how to use the package.</p>
<p><a href="https://cran.r-project.org/package=gdalcubes">gdalcubes</a> v0.1.0: Provides functions to process collections of Earth observation images as on-demand multispectral, multitemporal data cubes. Users define cubes by spatiotemporal extent, resolution, and spatial reference system and let ‘gdalcubes’ automatically apply cropping, reprojection, and resampling using the <a href="https://www.dataone.org/software-tools/gdal-geospatial-data-abstraction-library">Geospatial Data Abstraction Library (GDAL)</a>. There is a <a href="https://cran.r-project.org/web/packages/gdalcubes/vignettes/getting_started.html">Getting Started Guide</a>.</p>
<p><img src="/post/2019-06-18-May-Top40_files/gdalcubes.png" height = "400" width="600"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=explore">explore</a> v0.4.3: Provides interactive functions to facilitate exploratory data analysis. See the <a href="https://cran.r-project.org/web/packages/explore/vignettes/explore.html">vignette</a>.</p>
<p><img src="/post/2019-06-18-May-Top40_files/explore.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=Monte.Carlo.se">Monte.Carlo.se</a> v0.1.0: Provides functions to compute Monte Carlo standard errors for summaries of Monte Carlo output. See <a href="doi:10.1111/insr.12087">Boos and Osborne (2015)</a> for background. Additionally, there is an <a href="https://cran.r-project.org/web/packages/Monte.Carlo.se/vignettes/Brief-Overview.html">Overview</a> and vignettes on <a href="https://cran.r-project.org/web/packages/Monte.Carlo.se/vignettes/Example1.html">Creating Tables</a>, <a href="https://cran.r-project.org/web/packages/Monte.Carlo.se/vignettes/Example2.html">Summary Statistics</a>, and <a href="https://cran.r-project.org/web/packages/Monte.Carlo.se/vignettes/Example3.html">Pairwise Comparisons</a>.</p>
<p><a href="https://cran.r-project.org/package=multinomineq">multinomineq</a> v0.2.1: Implements Gibbs sampling and Bayes factors for multinomial models with linear inequality constraints on the vector of probability parameters. See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0022249618301457?via%3Dihub">Heck and Davis-Stober (2019)</a> for background and the <a href="https://cran.r-project.org/web/packages/multinomineq/vignettes/multinomineq_intro.html">vignette</a> for an example.</p>
<p><a href="https://cran.r-project.org/package=PLNmodels">PLNmodels</a> v0.9.2: Implements the Poisson-lognormal model, which is applicable to a variety of multivariate problems when count data are at play, including principal component analysis for count data <a href="doi:10.1214/18-AOAS1177">Chiquet et al. (2018)</a> and network inference <a href="arXiv:1806.03120">Chiquet et al. (2018)</a>. There are vignettes for <a href="https://cran.r-project.org/web/packages/PLNmodels/vignettes/Import_data.html">importing data</a>, <a href="https://cran.r-project.org/web/packages/PLNmodels/vignettes/PLN.html">analyzing count data</a>, <a href="https://cran.r-project.org/web/packages/PLNmodels/vignettes/PLNLDA.html">classification</a>, <a href="https://cran.r-project.org/web/packages/PLNmodels/vignettes/PLNnetwork.html">sparse structure estimation</a>, <a href="https://cran.r-project.org/web/packages/PLNmodels/vignettes/PLNPCA.html">PCA</a>, and a description of the <a href="https://cran.r-project.org/web/packages/PLNmodels/vignettes/Trichoptera.html">Trichoptera</a> data set.</p>
<p><img src="/post/2019-06-18-May-Top40_files/PLNmodels.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=pvaluefunctions">pvaluefunctions</a> v1.2.0: Provides a function to compute and plot confidence distributions, confidence densities, p-value functions and s-value (surprisal) functions for several commonly used estimates. See <a href="https://ajph.aphapublications.org/doi/10.2105/AJPH.77.2.195">Poole (1987)</a>, <a href="https://ajph.aphapublications.org/doi/10.2105/AJPH.77.2.195">Schweder and Hjort (2002)</a> and <a href="https://projecteuclid.org/euclid.lnms/1196794948">Singh et al. (2007)</a> for background, and the <a href="https://cran.r-project.org/web/packages/pvaluefunctions/vignettes/pvaluefun.html">vignette</a> for details of the package.</p>
<p><img src="/post/2019-06-18-May-Top40_files/pvaluefunctions.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=SGB">SGB</a> v1.0: Implements multivariate regression using a generalization of the Dirichlet distribution, the Simplicial Generalized Beta distribution over the simplex space of compositions or positive vectors with sum of components equal to 1. See the <a href="https://cran.r-project.org/web/packages/SGB/vignettes/vignette.pdf">vignette</a> for details.</p>
<p><a href="https://cran.r-project.org/package=suddengains">suddengains</a> v0.2.1: Provides functions to identify sudden gains in longitudinal data based on the criteria outlined in <a href="doi:10.1037/0022-006X.67.6.894">Tang and DeRubeis (1999)</a>. There is a <a href="https://cran.r-project.org/web/packages/suddengains/vignettes/suddengains-tutorial.html">Tutorial</a>.</p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=distill">distill</a> v0.7: Enables <code>R Markdown</code> formatting for web-based scientific and technical articles. Look <a href="https://rstudio.github.io/distill/">here</a> for documentation.</p>
<p><img src="/post/2019-06-18-May-Top40_files/distill.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=packageRank">packageRank</a> v0.1.0: Allows users to compute and visualize the cross-sectional and longitudinal number and rank percentile of package downloads from RStudio’s CRAN mirror. The <a href="https://github.com/lindbrook/packageRank">documentation</a> is on GitHub.</p>
<p><img src="/post/2019-06-18-May-Top40_files/packageRank.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=table.express">table.express</a> v0.1.1: Offers versions of <code>dplyr</code> data manipulation verbs that parse and build expressions which are ultimately evaluated by <code>data.table</code>. See the <a href="https://cran.r-project.org/web/packages/table.express/vignettes/table.express.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=text2speech">text2speech</a> v0.2.4: Unifies different text-to-speech engines from Google, Microsoft, and Amazon by enabling users to switch between different services by setting a function argument. There is a vignette <a href="https://cran.r-project.org/web/packages/text2speech/vignettes/listing_voices.html">Listing out voices</a>.</p>
<p><a href="https://cran.r-project.org/package=tidycode">tidycode</a> v0.1.0: Provides functions to analyze lines of R code using tidy principles allowing users to input lines of R code and output a data frame with one row per function. See the <a href="https://cran.r-project.org/web/packages/tidycode/vignettes/tidycode.html">vignette</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=matahari">matahari</a> v0.1.0: Allows users to spy on their R sessions by logging everything they type into the R console. See the <a href="https://cran.r-project.org/web/packages/matahari/vignettes/matahari.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=vroom">vroom</a> v1.0.1: Uses lazy evaluation to quickly read data from <code>csv</code>, <code>tsv</code> and <code>fwf</code> files. There is a <a href="https://cran.r-project.org/web/packages/vroom/vignettes/vroom.html">Getting Started Guide</a> and a vignette with <a href="https://cran.r-project.org/web/packages/vroom/vignettes/benchmarks.html">Benchmarks</a>.</p>
<p><img src="/post/2019-06-18-May-Top40_files/vroom.png" height = "400" width="600"></p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=ggPMX">ggPMX</a> v0.9.4: Implements a toolbox of diagnostic functions and plots for non-linear mixed effects models that produces publication-ready plots. See the <a href="https://cran.r-project.org/web/packages/ggPMX/vignettes/ggPMX-guide.pdf">User Guide</a> for details.</p>
<p><img src="/post/2019-06-18-May-Top40_files/ggPMX.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=ggResidpanel">ggResidpanel</a> v0.3.0: Provides functions to create interactive diagnostic plots and panels of diagnostic plots for residuals. See the <a href="https://cran.r-project.org/web/packages/ggResidpanel/vignettes/introduction.html">vignette</a>.</p>
<p><img src="/post/2019-06-18-May-Top40_files/ggResidpanel.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=mapcan">mapcan</a> v0.0.1: Provides functions to create standard choropleth maps as well as choropleth alternatives for <code>ggplot2</code> that address the peculiarities of plotting statistics for Canadian data at the riding level. There is an <a href="https://cran.r-project.org/web/packages/mapcan/vignettes/choropleth_maps_vignette.html">Introduction</a> and a vignette on the <a href="https://cran.r-project.org/web/packages/mapcan/vignettes/riding_binplot_vignette.html"><code>riding_binplot()</code></a> function.</p>
<p><img src="/post/2019-06-18-May-Top40_files/mapcan.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=oceanis">oceanis</a> v0.8.5: Provides functions to create interactive maps for statistical analysis that may include proportional circles, choropleths, typology and flows. The <a href="https://cran.r-project.org/web/packages/oceanis/vignettes/oceanis.html">vignette</a> is in French.</p>
<p><img src="/post/2019-06-18-May-Top40_files/oceanis.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=parcoords">parcoords</a> v1.0.1: Implements an <code>htmlwidget</code> for the <code>d3.js</code> parallel coordinates function <a href="https://github.com/BigFatDog/parcoords-es">parcoords-es</a>. See the <a href="https://cran.r-project.org/web/packages/parcoords/vignettes/introduction-to-parcoords-.html">vignette</a>.</p>
<p><img src="/post/2019-06-18-May-Top40_files/parcoords.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=plotdap">plotdap</a> v0.0.2: Provides functions to visualize and animate data from <a href="https://upwell.pfeg.noaa.gov/erddap/information.html">ERDDAP</a> servers via the <a href="https://cran.r-project.org/package=rerddap">rerddap</a> package using base graphics or <code>ggplot2</code>. There is a <a href="https://cran.r-project.org/web/packages/plotdap/vignettes/using_plotdap.html">vignette</a>.</p>
<p><img src="/post/2019-06-18-May-Top40_files/plotdap.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=prettyB">prettyB</a> v0.2.1: Provides drop-in replacements for some standard, base R graphics functions.</p>
<p><img src="/post/2019-06-18-May-Top40_files/prettyB.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=radarBoxplot">radarBoxplot</a> v1.0.0: Creates radar-boxplots for visualizing multivariate data. See <a href="https://cran.r-project.org/web/packages/radarBoxplot/readme/README.html">README</a> for documentation.</p>
<p><img src="/post/2019-06-18-May-Top40_files/radarBoxplot.png" height = "400" width="600"></p>
A Gentle Introduction to tidymodels
https://rviews.rstudio.com/2019/06/19/a-gentle-intro-to-tidymodels/
Wed, 19 Jun 2019 00:00:00 +0000https://rviews.rstudio.com/2019/06/19/a-gentle-intro-to-tidymodels/
<p><img src="/post/2019-06-14-a-gentle-intro-to-tidymodels_files/figure-html/ds.png" /><!-- --></p>
<p>Recently, I had the opportunity to showcase <code>tidymodels</code> in workshops and talks. Because of my vantage point as a user, I figured it would be valuable to share what I have learned so far. Let’s begin by framing where <code>tidymodels</code> fits in our analysis projects.</p>
<p>The diagram above is based on the <a href="https://r4ds.had.co.nz/explore-intro.html">R for Data Science</a> book, by Wickham and Grolemund. The version in this article illustrates what step each package covers. Even though modeling is a single step in that diagram, it can benefit from having a <code>tidyverse</code>-friendly interface. That is where <code>tidymodels</code> comes in.</p>
<p>It is important to clarify that the group of packages that make up <code>tidymodels</code> do not implement statistical models themselves. Instead, they focus on making all the tasks around fitting the model much easier. Those tasks are data <em>pre-processing</em> and <em>results validation</em>.</p>
<p>In a way, the <strong>Model</strong> step itself has sub-steps. For these sub-steps, <code>tidymodels</code> provides one or several packages. This article will showcase functions from four <code>tidymodels</code> packages:</p>
<ul>
<li><code>rsample</code> - Different types of re-samples</li>
<li><code>recipes</code> - Transformations for model data pre-processing</li>
<li><code>parsnip</code> - A common interface for model creation</li>
<li><code>yardstick</code> - Measure model performance</li>
</ul>
<p>The following diagram illustrates each modeling step, and lines up the <code>tidymodels</code> packages that we will use in this article:</p>
<p><img src="/post/2019-06-14-a-gentle-intro-to-tidymodels_files/figure-html/tidymodels.png" /><!-- --></p>
<p>In a given analysis, a <code>tidyverse</code> package may or may not be used. Not all projects need to work with time variables, so there is no need to use functions from the <code>hms</code> package. The same idea applies to <code>tidymodels</code>. Depending on what type of modeling is going to be done, only functions from some of its packages will be used.</p>
<div id="an-example" class="section level2">
<h2>An Example</h2>
<p>We will use the <code>iris</code> data set for an example. Its data is already imported, and sufficiently tidy to move directly to modeling.</p>
<div id="load-only-the-tidymodels-library" class="section level3">
<h3>Load <em>only</em> the <code>tidymodels</code> library</h3>
<p>This may be the first article I have written where only one package is called via <code>library()</code>. Apart from loading its core modeling packages, <code>tidymodels</code> also conveniently loads some <code>tidyverse</code> packages, including <code>dplyr</code> and <code>ggplot2</code>. Throughout this exercise, we will use some functions out of those packages, but we don’t have to explicitly load them into our R session.</p>
<pre class="r"><code>library(tidymodels)</code></pre>
</div>
<div id="pre-process" class="section level3">
<h3>Pre-Process</h3>
<p>This step focuses on making data suitable for modeling by using data transformations. All transformations can be accomplished with <code>dplyr</code>, or other <code>tidyverse</code> packages. Consider using <code>tidymodels</code> packages when model development is heavier and more complex.</p>
<div id="data-sampling" class="section level4">
<h4>Data Sampling</h4>
<p>The <code>initial_split()</code> function is specially built to separate the data set into a <em>training</em> and <em>testing</em> set. By default, it holds 3/4 of the data for training and the rest for testing. That can be changed by passing the <code>prop</code> argument. This function generates an <code>rsplit</code> object, not a data frame. The printed output shows the row counts for training, testing, and the total.</p>
<pre class="r"><code>iris_split <- initial_split(iris, prop = 0.6)
iris_split</code></pre>
<pre><code>## <90/60/150></code></pre>
<p>To access the observations reserved for training, use the <code>training()</code> function. Similarly, use <code>testing()</code> to access the testing data.</p>
<pre class="r"><code>iris_split %>%
training() %>%
glimpse()</code></pre>
<pre><code>## Observations: 90
## Variables: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.9, 5.4, 4…
## $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 3.1, 3.7, 3…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.5, 1.5, 1…
## $ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.1, 0.2, 0…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, set…</code></pre>
<p>These sampling functions are courtesy of the <code>rsample</code> package, which is part of <code>tidymodels</code>.</p>
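<p>Because the split is random, the rows that land in each set change from run to run. Here is a minimal sketch, assuming you want a split you can reproduce: call <code>set.seed()</code> first (the seed value 123 and the name <code>demo_split</code> are arbitrary choices for this illustration), then confirm that the two pieces account for every row.</p>
<pre class="r"><code>library(rsample)

set.seed(123)  # arbitrary seed, used only so the split can be reproduced
demo_split <- initial_split(iris, prop = 0.6)

n_train <- nrow(training(demo_split))
n_test  <- nrow(testing(demo_split))

# Every row ends up in exactly one of the two sets
n_train + n_test == nrow(iris)</code></pre>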
</div>
<div id="pre-process-interface" class="section level4">
<h4>Pre-process interface</h4>
<p>In <code>tidymodels</code>, the <code>recipes</code> package provides an interface that specializes in data pre-processing. Within the package, the functions that start, or execute, the data transformations are named after cooking actions. That makes the interface more user-friendly. For example:</p>
<ul>
<li><p><code>recipe()</code> - Starts a new set of transformations to be applied, similar to the <code>ggplot()</code> command. Its main argument is the model’s formula.</p></li>
<li><p><code>prep()</code> - Executes the transformations on top of the data that is supplied (typically, the training data).</p></li>
</ul>
<p>Each data transformation is a step. Functions correspond to specific types of steps, each of which has a prefix of <code>step_</code>. There are several <code>step_</code> functions; in this example, we will use three of them:</p>
<ul>
<li><p><code>step_corr()</code> - Removes variables that have large absolute correlations with other variables</p></li>
<li><p><code>step_center()</code> - Normalizes numeric data to have a mean of zero</p></li>
<li><p><code>step_scale()</code> - Normalizes numeric data to have a standard deviation of one</p></li>
</ul>
<p>Another nice feature is that the step can be applied to a specific variable, groups of variables, or all variables. The <code>all_outcomes()</code> and <code>all_predictors()</code> functions provide a very convenient way to specify groups of variables. For example, if we want the <code>step_corr()</code> to only analyze the predictor variables, we use <code>step_corr(all_predictors())</code>. This capability saves us from having to enumerate each variable.</p>
<p>In the following example, we will put together the <code>recipe()</code>, <code>prep()</code>, and step functions to create a <code>recipe</code> object. The <code>training()</code> function is used to extract that data set from the previously created split sample data set.</p>
<pre class="r"><code>iris_recipe <- training(iris_split) %>%
  recipe(Species ~ .) %>%
  step_corr(all_predictors()) %>%
  step_center(all_predictors(), -all_outcomes()) %>%
  step_scale(all_predictors(), -all_outcomes()) %>%
  prep()</code></pre>
<p>If we call the <code>iris_recipe</code> object, it will print details about the recipe. The <strong>Operations</strong> section describes what was done to the data. One of the operations entries in the example explains that the correlation step removed the <code>Petal.Length</code> variable.</p>
<pre class="r"><code>iris_recipe</code></pre>
<pre><code>## Data Recipe
##
## Inputs:
##
## role #variables
## outcome 1
## predictor 4
##
## Training data contained 90 data points and no missing data.
##
## Operations:
##
## Correlation filter removed Petal.Length [trained]
## Centering for Sepal.Length, Sepal.Width, Petal.Width [trained]
## Scaling for Sepal.Length, Sepal.Width, Petal.Width [trained]</code></pre>
</div>
</div>
<div id="execute-the-pre-processing" class="section level3">
<h3>Execute the pre-processing</h3>
<p>The testing data can now be transformed using the exact same steps, weights, and categorization used to pre-process the training data. To do this, another function with a cooking term is used: <code>bake()</code>. Notice that the <code>testing()</code> function is used in order to extract the appropriate data set.</p>
<pre class="r"><code>iris_testing <- iris_recipe %>%
bake(testing(iris_split))
glimpse(iris_testing)</code></pre>
<pre><code>## Observations: 60
## Variables: 4
## $ Sepal.Length <dbl> -1.597601746, -1.138960096, 0.007644027, -0.7949788…
## $ Sepal.Width <dbl> -0.41010139, 0.71517681, 2.06551064, 1.61539936, 0.…
## $ Petal.Width <dbl> -1.2085003, -1.2085003, -1.2085003, -1.0796318, -1.…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, set…</code></pre>
<p>Performing the same operation over the training data is redundant, because that data has already been prepped. To load the prepared training data into a variable, we use <code>juice()</code>. It will extract the data from the <code>iris_recipe</code> object.</p>
<pre class="r"><code>iris_training <- juice(iris_recipe)
glimpse(iris_training)</code></pre>
<pre><code>## Observations: 90
## Variables: 4
## $ Sepal.Length <dbl> -0.7949789, -1.0242997, -1.2536205, -1.3682809, -0.…
## $ Sepal.Width <dbl> 0.94023245, -0.18504575, 0.26506553, 0.04000989, 1.…
## $ Petal.Width <dbl> -1.2085003, -1.2085003, -1.2085003, -1.2085003, -1.…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, set…</code></pre>
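<p>The reason the training statistics are reused on the testing data can be illustrated with a small base R sketch. This is only an illustration of the idea for a single variable, not a description of how <code>recipes</code> is implemented; the seed and the resulting 90/60 split are assumptions and will not reproduce the split used above.</p>
<pre class="r"><code># Base R illustration only; the seed is arbitrary
set.seed(1234)
train_rows <- sample(nrow(iris), size = 90)
train <- iris[train_rows, ]
test  <- iris[-train_rows, ]

# "prep": estimate the statistics from the training data only
mu    <- mean(train$Sepal.Width)
sigma <- sd(train$Sepal.Width)

# "bake": reuse those same statistics on any data set
train_scaled <- (train$Sepal.Width - mu) / sigma
test_scaled  <- (test$Sepal.Width - mu) / sigma

mean(train_scaled)  # zero, up to floating point error
sd(train_scaled)    # one, up to floating point error
mean(test_scaled)   # close to, but generally not exactly, zero</code></pre>
<p>The testing values are shifted and scaled by numbers learned from the training rows, so no information from the testing data leaks into the pre-processing.</p>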
</div>
<div id="model-training" class="section level3">
<h3>Model Training</h3>
<p>In R, there are multiple packages that fit the same type of model, and each package commonly provides its own interface. In other words, the argument that controls the same model attribute is often named differently in each package. For example, the <code>ranger</code> and <code>randomForest</code> packages both fit Random Forest models. In the <code>ranger()</code> function, the number of trees is set with <code>num.trees</code>; in <code>randomForest()</code>, that argument is named <code>ntree</code>. As a result, it is not easy to switch between packages to run the same model.</p>
<p>Instead of replacing the modeling package, <code>tidymodels</code> replaces the interface. Better said, <code>tidymodels</code> provides a single set of functions and arguments to define a model. It then fits the model against the requested modeling package.</p>
<p>In the example below, the <code>rand_forest()</code> function is used to initialize a Random Forest model. To define the number of trees, the <code>trees</code> argument is used. To use the <code>ranger</code> version of Random Forest, the <code>set_engine()</code> function is used. Finally, to execute the model, the <code>fit()</code> function is used. The expected arguments are the formula and data. Notice that the model runs on top of the <em>juiced</em> trained data.</p>
<pre class="r"><code>iris_ranger <- rand_forest(trees = 100, mode = "classification") %>%
set_engine("ranger") %>%
fit(Species ~ ., data = iris_training)</code></pre>
<p>The payoff is that if we now want to run the same model against <code>randomForest</code>, we simply change the value in <code>set_engine()</code> to “randomForest”.</p>
<pre class="r"><code>iris_rf <- rand_forest(trees = 100, mode = "classification") %>%
set_engine("randomForest") %>%
fit(Species ~ ., data = iris_training)</code></pre>
<p>It is also worth mentioning that the model is not defined in a single, large function with a lot of arguments. The model definition is separated into smaller functions such as <code>fit()</code> and <code>set_engine()</code>. This allows for a more flexible - and easier to learn - interface.</p>
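<p>One way to see this separation is to build the model specification once and attach an engine afterwards. The sketch below assumes the <code>parsnip</code> package is installed; the object names are illustrative. It uses <code>translate()</code>, which prints the engine-specific call template, to inspect how the common <code>trees</code> argument is mapped for each engine. No model is fitted here.</p>
<pre class="r"><code>library(parsnip)

# Engine-agnostic specification: nothing is fitted at this point
rf_spec <- rand_forest(trees = 100, mode = "classification")

# Attach two different engines to the same specification
rf_spec_ranger <- set_engine(rf_spec, "ranger")
rf_spec_rf     <- set_engine(rf_spec, "randomForest")

# Inspect how the specification is translated for each engine
translate(rf_spec_ranger)
translate(rf_spec_rf)</code></pre>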
</div>
<div id="predictions" class="section level3">
<h3>Predictions</h3>
<p>Instead of a vector, the <code>predict()</code> function run against a <code>parsnip</code> model returns a <code>tibble</code>. By default, the prediction variable is called <code>.pred_class</code>. In the example, notice that the <em>baked</em> testing data is used.</p>
<pre class="r"><code>predict(iris_ranger, iris_testing)</code></pre>
<pre><code>## # A tibble: 60 x 1
## .pred_class
## <fct>
## 1 setosa
## 2 setosa
## 3 setosa
## 4 setosa
## 5 setosa
## 6 setosa
## 7 setosa
## 8 setosa
## 9 setosa
## 10 setosa
## # … with 50 more rows</code></pre>
<p>It is very easy to add the predictions to the <em>baked</em> testing data by using <code>dplyr</code>’s <code>bind_cols()</code> function.</p>
<pre class="r"><code>iris_ranger %>%
predict(iris_testing) %>%
bind_cols(iris_testing) %>%
glimpse()</code></pre>
<pre><code>## Observations: 60
## Variables: 5
## $ .pred_class <fct> setosa, setosa, setosa, setosa, setosa, setosa, set…
## $ Sepal.Length <dbl> -1.597601746, -1.138960096, 0.007644027, -0.7949788…
## $ Sepal.Width <dbl> -0.41010139, 0.71517681, 2.06551064, 1.61539936, 0.…
## $ Petal.Width <dbl> -1.2085003, -1.2085003, -1.2085003, -1.0796318, -1.…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, set…</code></pre>
</div>
<div id="model-validation" class="section level3">
<h3>Model Validation</h3>
<p>Use the <code>metrics()</code> function to measure the performance of the model. It will automatically choose metrics appropriate for a given type of model. The function expects a <code>tibble</code> that contains the actual results (<code>truth</code>) and what the model predicted (<code>estimate</code>).</p>
<pre class="r"><code>iris_ranger %>%
predict(iris_testing) %>%
bind_cols(iris_testing) %>%
metrics(truth = Species, estimate = .pred_class)</code></pre>
<pre><code>## # A tibble: 2 x 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy multiclass 0.917
## 2 kap multiclass 0.874</code></pre>
<p>Because of the consistency of the new interface, measuring the same metrics against the <code>randomForest</code> model is as easy as replacing the model variable at the top of the code.</p>
<pre class="r"><code>iris_rf %>%
predict(iris_testing) %>%
bind_cols(iris_testing) %>%
metrics(truth = Species, estimate = .pred_class)</code></pre>
<pre><code>## # A tibble: 2 x 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy multiclass 0.883
## 2 kap multiclass 0.824</code></pre>
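<p>To make the two reported metrics less of a black box, here is a base R sketch that computes accuracy and Cohen's kappa (the statistic reported as <code>kap</code>) from a confusion matrix. The labels are made up for illustration and are not the predictions from this article.</p>
<pre class="r"><code># Hypothetical labels, for illustration only
lvls     <- c("a", "b", "c")
truth    <- factor(c("a", "a", "a", "b", "b", "c", "c", "c"), levels = lvls)
estimate <- factor(c("a", "a", "b", "b", "b", "c", "a", "c"), levels = lvls)

conf_mat <- table(truth, estimate)
n <- sum(conf_mat)

# Accuracy: share of observations on the diagonal
accuracy <- sum(diag(conf_mat)) / n

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(conf_mat) * colSums(conf_mat)) / n^2

# Kappa: accuracy rescaled so that chance agreement equals zero
kap <- (accuracy - expected) / (1 - expected)

accuracy  # 0.75
kap       # about 0.628</code></pre>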
<div id="per-classifier-metrics" class="section level4">
<h4>Per classifier metrics</h4>
<p>It is easy to obtain the probability for each possible predicted value by setting the <code>type</code> argument to <code>prob</code>. That will return a <code>tibble</code> with as many variables as there are possible predicted values. Their name will default to the original value name, prefixed with <code>.pred_</code>.</p>
<pre class="r"><code>iris_ranger %>%
predict(iris_testing, type = "prob") %>%
glimpse()</code></pre>
<pre><code>## Observations: 60
## Variables: 3
## $ .pred_setosa <dbl> 0.677480159, 0.978293651, 0.783250000, 0.983972…
## $ .pred_versicolor <dbl> 0.295507937, 0.011706349, 0.150833333, 0.001111…
## $ .pred_virginica <dbl> 0.02701190, 0.01000000, 0.06591667, 0.01491667,…</code></pre>
<p>Again, use <code>bind_cols()</code> to append the predictions to the <em>baked</em> testing data set.</p>
<pre class="r"><code>iris_probs <- iris_ranger %>%
predict(iris_testing, type = "prob") %>%
bind_cols(iris_testing)
glimpse(iris_probs)</code></pre>
<pre><code>## Observations: 60
## Variables: 7
## $ .pred_setosa <dbl> 0.677480159, 0.978293651, 0.783250000, 0.983972…
## $ .pred_versicolor <dbl> 0.295507937, 0.011706349, 0.150833333, 0.001111…
## $ .pred_virginica <dbl> 0.02701190, 0.01000000, 0.06591667, 0.01491667,…
## $ Sepal.Length <dbl> -1.597601746, -1.138960096, 0.007644027, -0.794…
## $ Sepal.Width <dbl> -0.41010139, 0.71517681, 2.06551064, 1.61539936…
## $ Petal.Width <dbl> -1.2085003, -1.2085003, -1.2085003, -1.0796318,…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa,…</code></pre>
<p>Now that everything is in one <code>tibble</code>, it is easy to calculate curve methods. In this case we are using <code>gain_curve()</code>.</p>
<pre class="r"><code>iris_probs %>%
gain_curve(Species, .pred_setosa:.pred_virginica) %>%
glimpse()</code></pre>
<pre><code>## Observations: 141
## Variables: 5
## $ .level <chr> "setosa", "setosa", "setosa", "setosa", "setosa"…
## $ .n <dbl> 0, 1, 3, 4, 5, 7, 8, 9, 10, 12, 13, 14, 15, 16, …
## $ .n_events <dbl> 0, 1, 3, 4, 5, 7, 8, 9, 10, 12, 13, 14, 15, 16, …
## $ .percent_tested <dbl> 0.000000, 1.666667, 5.000000, 6.666667, 8.333333…
## $ .percent_found <dbl> 0.000000, 5.882353, 17.647059, 23.529412, 29.411…</code></pre>
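<p>The two percentage columns can be understood with a simplified base R sketch for a single class: sort the observations by predicted probability, then track how many true events have been found as more observations are tested. The scores and labels below are hypothetical, and the sketch ignores ties and the initial zero row that appear in the output above.</p>
<pre class="r"><code># Hypothetical scores and labels for one class, for illustration only
prob_event <- c(0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10)
is_event   <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)

# Rank observations from most to least likely to be an event
ord <- order(prob_event, decreasing = TRUE)

n_tested <- seq_along(ord)        # observations examined so far
n_found  <- cumsum(is_event[ord]) # true events found so far

gain <- data.frame(
  percent_tested = 100 * n_tested / length(ord),
  percent_found  = 100 * n_found / sum(is_event)
)
gain</code></pre>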
<p>The curve methods include an <code>autoplot()</code> function that easily creates a <code>ggplot2</code> visualization.</p>
<pre class="r"><code>iris_probs %>%
gain_curve(Species, .pred_setosa:.pred_virginica) %>%
autoplot()</code></pre>
<p><img src="/post/2019-06-14-a-gentle-intro-to-tidymodels_files/figure-html/gain_curve-1.png" width="672" /></p>
<p>This is an example of a <code>roc_curve()</code>. Again, because of the consistency of the interface, only the function name needs to be modified; even the argument values remain the same.</p>
<pre class="r"><code>iris_probs %>%
roc_curve(Species, .pred_setosa:.pred_virginica) %>%
autoplot()</code></pre>
<p><img src="/post/2019-06-14-a-gentle-intro-to-tidymodels_files/figure-html/roc_curve-1.png" width="672" /></p>
<p>To evaluate the single predicted value together with the probability of each possible value, combine the two prediction modes (with and without the <code>prob</code> type). In this example, using <code>dplyr</code>’s <code>select()</code> makes the resulting <code>tibble</code> easier to read.</p>
<pre class="r"><code>predict(iris_ranger, iris_testing, type = "prob") %>%
bind_cols(predict(iris_ranger, iris_testing)) %>%
bind_cols(select(iris_testing, Species)) %>%
glimpse()</code></pre>
<pre><code>## Observations: 60
## Variables: 5
## $ .pred_setosa <dbl> 0.677480159, 0.978293651, 0.783250000, 0.983972…
## $ .pred_versicolor <dbl> 0.295507937, 0.011706349, 0.150833333, 0.001111…
## $ .pred_virginica <dbl> 0.02701190, 0.01000000, 0.06591667, 0.01491667,…
## $ .pred_class <fct> setosa, setosa, setosa, setosa, setosa, setosa,…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa,…</code></pre>
<p>Pipe the resulting table into <code>metrics()</code>. In this case, specify <code>.pred_class</code> as the <code>estimate</code>.</p>
<pre class="r"><code>predict(iris_ranger, iris_testing, type = "prob") %>%
bind_cols(predict(iris_ranger, iris_testing)) %>%
bind_cols(select(iris_testing, Species)) %>%
metrics(Species, .pred_setosa:.pred_virginica, estimate = .pred_class)</code></pre>
<pre><code>## # A tibble: 4 x 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy multiclass 0.917
## 2 kap multiclass 0.874
## 3 mn_log_loss multiclass 0.274
## 4 roc_auc hand_till 0.980</code></pre>
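<p>Of the two probability-based metrics, mean log loss is simple to reproduce by hand: it is the negative average of the log probability that the model assigned to the true class. The base R sketch below uses a small, made-up probability matrix rather than the predictions above.</p>
<pre class="r"><code># Hypothetical class probabilities (one row per observation), illustration only
probs <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.2, 0.3, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)
truth <- c("a", "b", "c")

# Probability assigned to the true class of each observation
p_true <- probs[cbind(seq_along(truth), match(truth, colnames(probs)))]

# Mean log loss: lower is better, zero means perfect confidence in the truth
log_loss <- -mean(log(p_true))
log_loss  # about 0.424</code></pre>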
</div>
</div>
</div>
<div id="closing-remarks" class="section level2">
<h2>Closing remarks</h2>
<p>This end-to-end example is intended to be a gentle introduction to <code>tidymodels</code>. The number of functions, and options of such functions, were kept at a minimum for the purposes of this demonstration, but there is much more that can be done with this wonderful group of packages. Hopefully, this article will help you get started, and maybe even encourage you to expand your knowledge further.</p>
</div>
<div id="thank-you" class="section level2">
<h2>Thank you!</h2>
<p>I would like to thank <a href="https://twitter.com/topepos">Max Kuhn</a> and <a href="https://twitter.com/dvaughan32">Davis Vaughan</a>, the primary developers of <code>tidymodels</code>. They have been very gracious in providing instruction, feedback, and guidance throughout my journey of learning <code>tidymodels</code>.</p>
</div>
April 2019: "Top 40" New CRAN Packages
https://rviews.rstudio.com/2019/05/30/april-2019-top-40-new-cran-packages/
Thu, 30 May 2019 00:00:00 +0000https://rviews.rstudio.com/2019/05/30/april-2019-top-40-new-cran-packages/
<p>One hundred eighty-seven new packages made it to CRAN in April. Here are my picks for the “Top 40”, organized into ten categories: Biotechnology, Data, Econometrics, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities, and Visualization.</p>
<h3 id="biotechnology">Biotechnology</h3>
<p><a href="https://cran.r-project.org/package=genpwr">genpwr</a> v1.00: Provides functions for power and sample size calculations for genetic association studies allowing for mis-specification of the model of genetic susceptibility. The methods employed are extensions of <a href="doi:10.1093/aje/155.5.478">Gauderman (2002)</a> and <a href="doi:10.1002/sim.973">Gauderman (2002)</a>. See the <a href="https://cran.r-project.org/web/packages/genpwr/vignettes/vignette.html">vignette</a> for details.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/genpwr.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=rabhit">rabhit</a> v0.1.1: Implements an adaptive Bayesian framework to infer V-D-J haplotypes and gene deletions from AIRR-seq data. See <a href="doi:10.1038/s41467-019-08489-3">Gidoni et al. (2019)</a> for background and the <a href="https://cran.r-project.org/web/packages/rabhit/vignettes/RAbHIT-vignette.pdf">vignette</a> for an introduction to the package.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/rabhit.png" height = "400" width="600"></p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=compstatr">compstatr</a> v0.1.1:
Provides a set of tools for creating yearly data sets of St. Louis Metropolitan Police Department (SLMPD) <a href="http://www.slmpd.org/Crimereports.shtml">crime data</a>, which are available from January 2008 onward as monthly CSV releases. See the <a href="https://cran.r-project.org/web/packages/compstatr/vignettes/compstatr.html">vignette</a>.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/compstatr.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=DataSpaceR">DataSpaceR</a> v0.6.3: Provides a convenient API interface to access immunological data within the <a href="https://dataspace.cavd.org">CAVD DataSpace</a>, a data sharing and discovery tool that facilitates exploration of HIV immunological data from pre-clinical and clinical HIV vaccine studies. There is an <a href="https://cran.r-project.org/web/packages/DataSpaceR/vignettes/Intro_to_DataSpaceR.html">Introduction</a>.</p>
<p><a href="https://cran.r-project.org/package=ebirdst">ebirdst</a> v0.1.0: Provides tools to download, map, plot, and analyze <a href="https://ebird.org/science/status-and-trends">Status and Trends data</a> from <a href="https://ebird.org">eBird</a>, a global database of bird observations collected by citizen scientists. There is an <a href="Introduction to loading, mapping, and plotting">Introduction</a> and vignettes on <a href="https://cran.r-project.org/web/packages/ebirdst/vignettes/ebirdst-advanced-mapping.html">Generating maps and stats</a>, <a href="https://cran.r-project.org/web/packages/ebirdst/vignettes/ebirdst-introduction.html">Data Structure</a>, and <a href="https://cran.r-project.org/web/packages/ebirdst/vignettes/ebirdst-non-raster.html">Predictor Performance, Directionality and Performance Metrics</a>.</p>
<p><a href="https://cran.r-project.org/package=PropublicaR">PropublicaR</a> v0.9.2: Provides wrapper functions to access ProPublica’s Congress and Campaign Finance <a href="https://www.propublica.org/datastore/apis">APIs</a>.</p>
<p><a href="https://cran.r-project.org/package=tradestatistics">tradestatistics</a> v0.2: Provides access to the <a href="https://tradestatistics.io/">Open Trade Statistics</a> API from R to download international trade data. There is an <a href="https://cran.r-project.org/web/packages/tradestatistics/vignettes/basic-usage.html">Introduction</a> and a <a href="https://cran.r-project.org/web/packages/tradestatistics/vignettes/creating-datasets.html">vignette</a> on data sets.</p>
<p><a href="https://cran.r-project.org/package=ukpolice">ukpolice</a> v0.1.2: Provides access to <a href="https://data.police.uk/docs/">UK Police public data</a>, including data on police forces and police force areas, crime reports, and the use of stop-and-search powers. See the <a href="https://cran.r-project.org/web/packages/ukpolice/vignettes/introduction.html">vignette</a> for details.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/ukpolice.png" height = "400" width="600"></p>
<h3 id="econometrics">Econometrics</h3>
<p><a href="https://cran.r-project.org/package=modelplotr">modelplotr</a> v1.0.0: Provides plots to assess the quality of predictive models from a business perspective, which can show how implementing the model will impact business targets like response on a campaign or return on investment. See the <a href="https://cran.r-project.org/web/packages/modelplotr/vignettes/modelplotr.html">vignette</a> for examples.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/modelplotr.jpeg" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=SortedEffects">SortedEffects</a> v1.0.0: Implements the estimation and inference methods for sorted causal effects and classification analysis as described in <a href="doi:10.3982/ECTA14415">Chernozhukov et al. (2018)</a>. See the <a href="https://cran.r-project.org/web/packages/SortedEffects/vignettes/VignetteGithub.html">vignette</a> for an introduction and example.</p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://CRAN.R-project.org/package=iBreakDown">iBreakDown</a> v0.9.6: Implements Break Down Tables and Plots, which are model-agnostic tools for the decomposition of predictions from black boxes. The methodology behind it is described in <a href="arXiv:1903.11420">Gosiewska and Biecek (2019)</a>. There are vignettes for <a href="https://cran.r-project.org/web/packages/iBreakDown/vignettes/vignette_iBreakDown_classification.html">classification models</a>, <a href="https://cran.r-project.org/web/packages/iBreakDown/vignettes/vignette_iBreakDown_regression.html">regression</a>, and an <a href="https://cran.r-project.org/web/packages/iBreakDown/vignettes/vignette_iBreakDown_titanic.html">Example</a>.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/iBreakDown.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=localModel">localModel</a> v0.3.11: Provides local explanations of machine learning models, and describes how features contributed to a single prediction. See <a href="doi:10.1145/2939672.2939778">Ribeiro & Singh (2016)</a> for details. There is an <a href="https://cran.r-project.org/web/packages/localModel/vignettes/regression_example.html">Introduction</a> and vignettes on <a href="https://cran.r-project.org/web/packages/localModel/vignettes/classification_example.html">Methodology</a> and <a href="https://cran.r-project.org/web/packages/localModel/vignettes/classification_example.html">Explaining Classification Models</a>.</p>
<p><a href="https://cran.r-project.org/package=polyreg">polyreg</a> v0.6.4: Automates the formation and evaluation of polynomial regression models, and provides support for cross-validating categorical variables. See <a href="arXiv:1806.06850">Cheng et al.</a>, and look <a href="https://github.com/matloff/polyreg">here</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=rfVarImpOOB">rfVarImpOOB</a> v1.0: Estimates variable importance for random forests by computing impurity reduction importance scores for out-of-bag (OOB) data, complementing the existing in-bag Gini importance. See <a href="doi:10.1186/1471-2105-8-25">Strobl et al. (2007)</a>, <a href="doi:10.1016/j.csda.2006.12.030">Strobl et al. (2007)</a>, and <a href="doi:10.1023/A:1010933404324">Breiman (2001)</a>. The <a href="https://cran.r-project.org/web/packages/rfVarImpOOB/vignettes/rfVarImpOOB-vignette.html">vignette</a> contains a small example.</p>
<p><a href="https://cran.r-project.org/package=rsparse">rsparse</a> v0.3.3.1: Implements several algorithms for statistical learning on sparse matrices, including matrix factorizations, matrix completion, elastic net regressions, and factorization machines. Look here for an <a href="https://github.com/dselivanov/rsparse">example</a>.</p>
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=blockRAR">blockRAR</a> v1.0: Provides functions to compute power for response-adaptive randomization clinical trials with a block design that captures both the time and treatment effect. See <a href="arXiv:1904.07758">Chandereng & Chappell (2019)</a> for details and the <a href="https://cran.r-project.org/web/packages/blockRAR/vignettes/blockRAR.html">vignette</a> for Bayesian and Frequentist examples.</p>
<p><a href="https://CRAN.R-project.org/package=gestate">gestate</a> v1.3.2: Provides tools to assist planning and monitoring of time-to-event trials under complicated censoring assumptions and/or non-proportional hazards. There are vignettes on <a href="https://cran.r-project.org/web/packages/gestate/vignettes/event_prediction.html">Predicting Events</a> and <a href="https://cran.r-project.org/web/packages/gestate/vignettes/trial_planning.html">Planning Trials</a>.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/gestate.png" height = "400" width="600"></p>
<h3 id="science">Science</h3>
<p><a href="https://cran.r-project.org/package=streamDepletr">streamDepletr</a> v0.1.0: Implements analytical models for estimating streamflow depletion due to groundwater pumping, and other related tools. See <a href="doi:10.1029/2018WR022707">Zipper et al. (2018)</a> for more information on depletion apportionment equations, and <a href="doi:10.31223/osf.io/uqbd7">Zipper et al. (2019)</a> for more information on analytical depletion functions. There is a <a href="https://cran.r-project.org/web/packages/streamDepletr/vignettes/intro-to-streamDepletr.html">vignette</a>.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/streamDepletr.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=robustSingleCell">RobustSingleCell</a> v0.1.1: Implements functions for the robust single cell clustering and comparison of population compositions across tissues and experimental models via similarity analysis. See <a href="doi:10.1101/543199">Magen (2019)</a> for details and the <a href="https://cran.r-project.org/web/packages/robustSingleCell/vignettes/lcmv.html">vignette</a> for examples.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/RobustSingleCell.png" height = "400" width="600"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=BayesSenMC">BayesSenMC</a> v0.1.1: Provides functions to generate different posterior distributions of adjusted odds ratio under different priors of sensitivity and specificity, and plots the models for comparison. See <a href="doi:10.1016/j.annepidem.2006.04.001">Chu et al. (2006)</a> and <a href="doi:10.1177/0272989X09353452">Chu et al. (2010)</a> for background, and the <a href="https://cran.r-project.org/web/packages/BayesSenMC/vignettes/BayesSenMC_demo.pdf">vignette</a> for details and examples.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/BayesSenMC.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=bayestestR">bayestestR</a> v0.1.0: Provides utilities to describe posterior distributions and Bayesian models. There is a <a href="https://cran.r-project.org/web/packages/bayestestR/vignettes/bayestestR.html">Getting Started Guide</a>, and vignettes on <a href="https://cran.r-project.org/web/packages/bayestestR/vignettes/indicesEstimationComparison.html">Comparison of Point Estimates</a>, <a href="https://cran.r-project.org/web/packages/bayestestR/vignettes/indicesExistenceComparison.html">Comparison of Indices</a>, and <a href="https://cran.r-project.org/web/packages/bayestestR/vignettes/example1_GLM.html">Examples</a>.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/bayestestR.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=CoSMoS">CoSMoS</a> v1.0.1: Implements a framework unifying, extending, and improving a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific ‘parent’ Gaussian process. See <a href="doi:10.1016/j.advwatres.2018.02.013">Papalexiou (2018)</a> and the <a href="https://cran.r-project.org/web/packages/CoSMoS/vignettes/vignette.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=fic">fic</a> v1.0.0: Provides functions to determine how well different models fitted by maximum likelihood estimate a quantity of interest, including generalized linear models and parametric survival models. See <a href="https://www.tandfonline.com/doi/abs/10.1198/016214503000000819">Claeskens and Hjort (2003)</a> for details. There is an <a href="https://cran.r-project.org/web/packages/fic/vignettes/fic.pdf">Introduction</a> along with vignettes for <a href="https://cran.r-project.org/web/packages/fic/vignettes/linear.pdf">Linear regression</a>, <a href="https://cran.r-project.org/web/packages/fic/vignettes/loss.pdf">loss functions</a>, <a href="https://cran.r-project.org/web/packages/fic/vignettes/multistate.pdf">multi-state models</a>, <a href="https://cran.r-project.org/web/packages/fic/vignettes/skewnormal.pdf">skew normal models</a>, and <a href="https://cran.r-project.org/web/packages/fic/vignettes/survival.pdf">survival models</a>.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/fic.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=foieGras">foieGras</a> v0.2.1: Provides functions to fit continuous-time state-space models for filtering Argos satellite (and other) telemetry data. The <a href="https://cran.r-project.org/web/packages/foieGras/vignettes/foiegras-basics.html">vignette</a> provides an overview.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/foieGras.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=glmaag">glmaag</a> v0.0.6: Implements efficient procedures for adaptive LASSO and network regularization for Gaussian, logistic, and Cox models. See <a href="doi:10.1093/bioinformatics/btm423">Ucar et al. (2007)</a> and <a href="doi:10.1214/009053606000000281">Meinshausen and Buhlmann (2006)</a> for a discussion of network estimation procedures. The <a href="https://cran.r-project.org/web/packages/glmaag/vignettes/glmaag.html">vignette</a> provides an example.</p>
<p><a href="https://CRAN.R-project.org/package=Irescale">Irescale</a> v0.2.6: Provides a scaling method to obtain a standardized <a href="https://en.wikipedia.org/wiki/Moran%27s_I">Moran’s I measure</a> for the spatial autocorrelation of a data set, which gives a measure of similarity between data and its surroundings. See <a href="arXiv:1606.03658">Chen (2009)</a> for the method of calculation and the <a href="https://cran.r-project.org/web/packages/Irescale/vignettes/irescale.html">vignette</a> for an example.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/Irescale.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=mvgraphnorm">mvgraphnorm</a> v1.81: Provides a function to compute a constrained covariance matrix for a given graph to generate samples from a Gaussian graphical model, using different algorithms for the analysis of complex network structure. See <a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-114">Kim et al. (2008)</a> and <a href="https://doi.org/10.1214/aos/1176349846">Speed et al. (1986)</a> for algorithms, and the <a href="https://cran.r-project.org/web/packages/mvgraphnorm/vignettes/mvgraphnorm-intro.pdf">vignette</a> for an example.</p>
<p><a href="https://cran.r-project.org/package=ptsuite">ptsuite</a> v1.0.0: Implements several methods for tail index estimation for power law distributions, including maximum likelihood <a href="doi:10.1016/j.cities.2012.03.001">Newman (2005)</a>, Hill’s estimator <a href="doi:10.1214/aos/1176343247">Hill (1975)</a>, least squares <a href="doi:10.9734/BJMCS/2014/10890">Zaher et al. (2014)</a>, method of moments <a href="doi:10.2143/AST.20.2.2005443">Rytgaard (1990)</a>, and percentiles <a href="doi:10.1371/journal.pone.0196456">Bhatti et al. (2018)</a>. There is a <a href="https://cran.r-project.org/web/packages/ptsuite/vignettes/ptsuite_vignette.pdf">vignette</a>.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/ptsuite.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=spatialreg">spatialreg</a> v1.1-3: Provides a collection of estimation functions for spatial cross-sectional models on lattice/areal data using spatial weights matrices. There is a vignette on <a href="https://cran.r-project.org/web/packages/spatialreg/vignettes/SpatialFiltering.html">Moran Eigenvectors</a> and another on the <a href="https://cran.r-project.org/web/packages/spatialreg/vignettes/sids_models.html">North Carolina SIDS data set</a>.</p>
<h3 id="time-series">Time Series</h3>
<p><a href="https://cran.r-project.org/package=CSTools">CSTools</a> v1.0.0: Implements process-based methods for assessing climate forecasts, including forecast calibration, bias correction, statistical and stochastic down-scaling, optimal forecast combination, and multivariate verification. See <a href="https://onlinelibrary.wiley.com/doi/full/10.1111/j.1600-0870.2005.00104.x">Doblas-Reyes et al. (2005)</a>, <a href="doi:10.1007/s00382-018-4404-z">Mishra et al. (2018)</a>, <a href="doi:10.5194/nhess-18-2825-2018">Terzago et al. (2018)</a>, <a href="doi:10.1175/JAMC-D-16-0204.1">Torralba et al. (2017)</a>, and <a href="doi:10.1175/JHM-D-13-096.1">D’Onofrio et al. (2014)</a> for details. There are vignettes on <a href="https://cran.r-project.org/web/packages/CSTools/vignettes/MultiModelSkill_vignette.html">Multi Model Skill Assessment</a>, <a href="https://cran.r-project.org/web/packages/CSTools/vignettes/MultivarRMSE_vignette.html">Multivariate RMSE</a>, and the <a href="https://cran.r-project.org/web/packages/CSTools/vignettes/RainFARM_vignette.html">RainFarm Model</a>.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/CSTools.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=DChaos">DChaos</a> v0.1-1: Implements several algorithms for detecting chaotic signals inside univariate time series using methods derived from chaos theory, which estimate the complexity of a data set through exploring the structure of the attractor. See <a href="https://link.springer.com/article/10.1007/BF01646553">Ruelle and Takens (1971)</a> for some deep background.</p>
<p><a href="https://cran.r-project.org/package=otsad">otsad</a> v0.1.0: Implements a set of online fault detectors for time-series, including PEWMA <a href="doi:10.1109/SSP.2012.6319708">Carter et al. (2012)</a>, SD-EWMA and TSSD-EWMA <a href="doi:10.1016/j.patcog.2014.07.028">H. Raza et al. (2015)</a>, KNN-CAD <a href="arXiv:1608.04585">Burnaev et al. (2016)</a>, KNN-LDCD <a href="arXiv:1706.03412">Ishimtsev et al. (2017)</a>, and CAD-OSE <a href="https://github.com/smirmik/CAD">M. Smirnov (2018)</a>. There is a <a href="https://cran.r-project.org/web/packages/otsad/vignettes/otsad.pdf">vignette</a>.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/otsad.png" height = "400" width="600"></p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=inspectdf">inspectdf</a> v0.0.2: Provides a collection of utilities for column-wise summary, comparison, and visualization of data frames. Functions report missingness, categorical levels, numeric distribution, correlation, column types, and memory usage. See <a href="https://cran.r-project.org/web/packages/inspectdf/readme/README.html">README</a> for examples.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/inspectdf.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=suppdata">suppdata</a> v1.1-1: Provides functions for downloading supplementary data from published manuscripts, using the papers’ DOIs as references. Facilitates open, reproducible research workflows: scientists re-analyzing published data sets can work with them as easily as if they were stored on their own computer. There is a brief <a href="https://cran.r-project.org/web/packages/suppdata/vignettes/suppdata-intro.pdf">Introduction</a>.</p>
<p><a href="https://cran.r-project.org/package=tidync">tidync</a> v0.2.1: Provides tidy tools for working with <a href="https://www.unidata.ucar.edu/software/netcdf/">NetCDF</a> data sources. The <a href="https://cran.r-project.org/web/packages/tidync/vignettes/netcdf-with-tidync.html">vignette</a> provides background and describes data extraction and exploration.</p>
<p><a href="https://cran.r-project.org/package=tinytest">tinytest</a> v0.9.3: Provides a lightweight (zero-dependency) and easy-to-use unit testing framework. Its main feature is that tests are installed with the package. The <a href="https://cran.r-project.org/web/packages/tinytest/vignettes/using_tinytest.pdf">vignette</a> shows how to use the package.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=frequentdirections">frequentdirections</a> v0.1.0: Implements the frequent-directions algorithm for efficient matrix sketching. See <a href="https://doi.org/10.1145/2487575.2487623">Edo Liberty (2013)</a> for details and the <a href="https://cran.r-project.org/web/packages/frequentdirections/readme/README.html">README</a> for examples.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/frequentdirections.png" height = "400" width="600"></p>
<p><a href="https://CRAN.R-project.org/package=ggdemetra">ggdemetra</a> v0.1.0: Provides <code>ggplot2</code> functions to return the results of seasonal and trading-day adjustment made by <a href="https://github.com/jdemetra/jdemetra-app">RJDemetra</a>, the seasonal adjustment software officially recommended to the members of the European Statistical System and the European System of Central Banks. The <a href="https://cran.r-project.org/web/packages/ggdemetra/vignettes/ggdemetra.html">vignette</a> provides examples.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/ggdemetra.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=graphlayouts">graphlayouts</a> v0.1.0: Implements several new layout algorithms to visualize networks. Most are based on the concept of stress majorization by <a href="https://doi.org/10.1007/978-3-540-31843-9_25">Gansner et al. (2004)</a>. The <a href="https://cran.r-project.org/web/packages/graphlayouts/vignettes/introduction.html">vignette</a> shows several examples.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/graphlayouts.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=tidymv">tidymv</a> v2.1.0: Provides functions for visualizing generalized additive models and getting predicted values using tidy tools from the <code>tidyverse</code> packages. There is a vignette for <a href="https://cran.r-project.org/web/packages/tidymv/vignettes/predict-gam.html">Plotting Model Predictions</a> and another for <a href="https://cran.r-project.org/web/packages/tidymv/vignettes/plot-smooths.html">Plotting Smoothing Curves</a>.</p>
<p><img src="/post/2019-05-21-AprilTop40_files/tidymv.png" height = "400" width="600"></p>
Momentum Investing with R
https://rviews.rstudio.com/2019/05/29/momentum-investing-with-r/
Wed, 29 May 2019 00:00:00 +0000
<script src="/rmarkdown-libs/htmlwidgets/htmlwidgets.js"></script>
<script src="/rmarkdown-libs/jquery/jquery.min.js"></script>
<script src="/rmarkdown-libs/proj4js/proj4.js"></script>
<link href="/rmarkdown-libs/highcharts/css/motion.css" rel="stylesheet" />
<link href="/rmarkdown-libs/highcharts/css/htmlwdgtgrid.css" rel="stylesheet" />
<script src="/rmarkdown-libs/highcharts/highcharts.js"></script>
<script src="/rmarkdown-libs/highcharts/highcharts-3d.js"></script>
<script src="/rmarkdown-libs/highcharts/highcharts-more.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/stock.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/map.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/annotations.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/boost.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/data.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/drag-panes.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/drilldown.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/item-series.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/offline-exporting.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/overlapping-datalabels.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/exporting.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/export-data.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/funnel.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/heatmap.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/treemap.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/sankey.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/solid-gauge.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/streamgraph.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/sunburst.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/vector.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/wordcloud.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/xrange.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/tilemap.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/venn.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/gantt.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/timeline.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/parallel-coordinates.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/grouped-categories.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/motion.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/multicolor_series.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/reset.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/symbols-extra.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/text-symbols.js"></script>
<script src="/rmarkdown-libs/highchart-binding/highchart.js"></script>
<p>After an extended hiatus, Reproducible Finance is back! We’ll celebrate by changing focus a bit and coding up an investment strategy called <a href="http://www.optimalmomentum.com/momentum.html">Momentum</a>. Before we even tiptoe in that direction, please note that this is not intended as investment advice and it’s not intended to be a script that can be implemented for trading. The goal is to explore some R code flows applied to a real-world project. Don’t live-trade this at home!</p>
<p>Back to the substance of the day: the theory behind momentum investing is that an asset that has done well in the recent past will continue to do so. It’s not “buy low and sell high”. It’s “buy high, and sell higher”! What might explain this anomaly? Behavioral economics has some possible answers, like <a href="https://en.wikipedia.org/wiki/Anchoring">anchoring</a>, <a href="https://en.wikipedia.org/wiki/Disposition_effect">disposition</a>, and <a href="https://en.wikipedia.org/wiki/Herd_behavior">herding</a>.</p>
<p>In practice, momentum entails a look back into the past to determine whether an asset has exceeded some benchmark and, if it has, buying and holding that asset for some time into the future. That flies in the face of the efficient market hypothesis, because it posits that the past gives us information that has not yet been reflected in the current price of the asset.</p>
<p>There’s a plethora of fascinating research on momentum investing; here’s a sampling.</p>
<p>The seminal academic paper is <a href="https://www.jstor.org/stable/2328882">Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency</a> by Jegadeesh and Titman from way back in 1993.</p>
<p>An excellent primer on the philosophy of momentum (and where I would recommend starting) is this <a href="https://alphaarchitect.com/2015/12/01/quantitative-momentum-investing-philosophy/">piece from Alpha Architect</a>. From there, AA also has a treasure trove of momentum posts <a href="https://alphaarchitect.com/category/architect-academic-insights/momentum-investing/">here</a>. More recently, there are two more advanced research papers: one from <a href="https://investresolve.com/global-equity-momentum-a-craftsmans-perspective-lp/">ReSolve Asset Management</a>, with some excellent data visualizations, and one from
<a href="https://blog.thinknewfound.com/2019/01/fragility-case-study-dual-momentum-gem/">Newfound Research</a>, with a concise explanation of the strategy logic.</p>
<p>That’s a lot of material - here’s a short quotation from Fama and French by way of Alpha Architect that should provide us plenty of motivation:<br />
“The premier anomaly is momentum.” <a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a></p>
<p>Let’s get to it and load up some packages.</p>
<pre class="r"><code>library(tidyverse)
library(highcharter)
library(tibbletime)
library(tidyquant)
library(timetk)
library(riingo)</code></pre>
<p>We are going to implement a simplified version of a momentum strategy that deals with four assets:</p>
<ul>
<li>SPY, an SP500 ETF</li>
<li>EFA, a non-US equities ETF</li>
<li>AGG, a bond ETF</li>
<li>TLT, a treasuries ETF</li>
</ul>
<p>The strategy logic goes as follows: compare the previous twelve months’ returns of the SP500 (SPY) to treasury bonds (TLT). If the returns of SPY do not exceed those of TLT, hold bonds (AGG) for the next month. If SPY returns do exceed TLT, compare the previous 12 months’ returns of the SP500 to the non-US equities (EFA). Whichever of SP500 or non-US equities had the higher previous twelve months’ returns, hold that asset for the next month. Thus, each month, our strategy will hold either AGG, SPY or EFA and we reexamine at the end of each month. That’s twelve look-backs per year, and twelve possible buy/sell transactions per year.</p>
<p>Here is the trading logic broken down into more code-oriented steps:</p>
<ol style="list-style-type: decimal">
<li>Import prices for SPY, TLT, AGG and EFA</li>
<li>Convert to monthly prices</li>
<li>Convert AGG to monthly returns</li>
<li>Convert TLT to twelve months’ returns</li>
<li>Convert SPY and EFA to both monthly and twelve months’ returns</li>
<li>Each month, compare previous twelve months’ SPY returns to those of TLT; if TLT exceeds SPY, hold AGG next month</li>
<li>Else, compare previous twelve months’ SPY returns to EFA</li>
<li>If previous twelve months’ SPY returns exceeded those of EFA, hold SPY next month</li>
<li>If previous twelve months’ EFA returns exceeded those of SPY, hold EFA next month</li>
</ol>
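<p>To make the decision rule concrete before touching any price data, here is a minimal sketch of those steps as a plain R function. The name <code>pick_asset()</code> and its arguments are made up for illustration; they are not part of the workflow in this post, which encodes the same rule with <code>if_else()</code> inside <code>mutate()</code>.</p>
<pre class="r"><code># Hypothetical helper: given last month's trailing twelve-month returns,
# return the ticker to hold for the coming month.
pick_asset <- function(spy_12m, tlt_12m, efa_12m) {
  if (spy_12m < tlt_12m) {
    "AGG"   # SPY trailed TLT: hold bonds
  } else if (spy_12m > efa_12m) {
    "SPY"   # SPY cleared the TLT hurdle and beat EFA
  } else {
    "EFA"   # SPY cleared the TLT hurdle but did not beat EFA
  }
}

pick_asset(spy_12m = 0.10, tlt_12m = 0.12, efa_12m = 0.20)  # "AGG"
pick_asset(spy_12m = 0.15, tlt_12m = 0.05, efa_12m = 0.08)  # "SPY"
pick_asset(spy_12m = 0.15, tlt_12m = 0.05, efa_12m = 0.22)  # "EFA"</code></pre>
<p>Exact ties are unlikely with real return data; in this sketch they simply fall through to the next branch because both comparisons are strict.</p>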
<p>Let’s start by importing the price data from <code>tiingo</code>. We used <code>tiingo</code> via the <code>riingo</code> package in a <a href="http://www.reproduciblefinance.com/2019/01/14/looking-back-on-last-year/">previous post</a>, if you’d like to see it in a different context and get a bit more explanation.</p>
<p>We’ll first create a vector that holds tickers for SPY, AGG, EFA and TLT, and then pass that vector to <code>riingo_prices()</code>.</p>
<pre class="r"><code># riingo requires an API key. You can get one free from tiingo.com
# riingo_set_token("your API key here")

symbols <-
  c("SPY", "AGG", "EFA", "TLT")

symbols %>%
  riingo_prices(.,
                start_date = "2000-01-01",
                end_date = "2018-12-31") %>%
  mutate(date = ymd(date)) %>%
  # group by ticker so that slice(1) returns the first row of each fund,
  # not just the first row of the whole tibble
  group_by(ticker) %>%
  slice(1)</code></pre>
<pre><code># A tibble: 4 x 14
# Groups: ticker [4]
ticker date close high low open volume adjClose adjHigh adjLow
<chr> <date> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 AGG 2003-09-26 102. 102. 102 102 1.18e4 60.9 60.9 60.7
2 EFA 2001-08-17 126. 126. 125. 126 1.61e5 27.0 27.0 26.9
3 SPY 2000-01-03 145. 148. 144. 148. 8.16e6 101. 103. 100.
4 TLT 2002-07-26 82.5 82.8 82.4 82.7 3.16e5 44.6 44.7 44.5
# … with 4 more variables: adjOpen <dbl>, adjVolume <int>, divCash <dbl>,
# splitFactor <dbl></code></pre>
<p>We imported price data for each of our ETFs, but they don’t have the same inception dates. AGG launched in September of 2003, so that is the earliest start date we can use. We will have about 15 years of data.</p>
<p>As noted above, our strategy is tied to a monthly periodicity, looking back twelve months to determine if we wish to hold a certain asset for the next month. Let’s convert our daily prices to monthly prices with <code>tq_transmute(select = adjClose, mutate_fun = to.monthly, indexAt = "lastof")</code> from the <code>tidyquant</code> package. That call tells the function to convert the column called ‘adjClose’ to monthly prices, with each observation anchored to the last day of the month. We will store the results in an object called <code>prices_monthly</code>.</p>
<pre class="r"><code>prices_monthly <-
  symbols %>%
  riingo_prices(.,
                start_date = "2000-01-01",
                end_date = "2018-12-31") %>%
  mutate(date = ymd(date)) %>%
  group_by(ticker) %>%
  tq_transmute(select = adjClose, mutate_fun = to.monthly, indexAt = "lastof")</code></pre>
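<p>If it helps to see what “monthly prices anchored to the end of the month” looks like, here is a rough sketch on made-up daily prices that uses only <code>dplyr</code> and <code>lubridate</code>. It is not how <code>to.monthly</code> works internally; it just keeps the last observation of each month and stamps it with the calendar month end, which should match the shape of the result we get from the call above. The tibble <code>toy_daily</code> and its values are invented.</p>
<pre class="r"><code>library(dplyr)
library(lubridate)

# Hypothetical daily prices straddling a month boundary
toy_daily <- tibble(
  date     = as.Date("2018-01-29") + 0:5,   # 2018-01-29 through 2018-02-03
  adjClose = c(10, 11, 12, 13, 14, 15)
)

toy_monthly <-
  toy_daily %>%
  group_by(month = floor_date(date, "month")) %>%
  # keep the last available observation in each month
  slice(n()) %>%
  ungroup() %>%
  # relabel that observation with the last calendar day of its month
  mutate(date = ceiling_date(date, "month") - days(1)) %>%
  select(date, adjClose)

toy_monthly</code></pre>
<p>The February row is stamped 2018-02-28 even though the last observation in the toy data is from February 3rd, just as AGG’s first monthly price is stamped with a month-end date although the fund’s first daily price falls a few days earlier.</p>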
<p>Let’s use <code>slice(1, n())</code> to grab the first and last observation for each of our funds. Inside <code>slice()</code>, <code>n()</code> returns the number of rows in the current group, so it points at the last observation.</p>
<pre class="r"><code>prices_monthly %>%
  slice(1, n())</code></pre>
<pre><code># A tibble: 8 x 3
# Groups: ticker [4]
ticker date adjClose
<chr> <date> <dbl>
1 AGG 2003-09-30 61.1
2 AGG 2018-12-31 105.
3 EFA 2001-08-31 26.8
4 EFA 2018-12-31 58.8
5 SPY 2000-01-31 97.3
6 SPY 2018-12-31 249.
7 TLT 2002-07-31 44.6
8 TLT 2018-12-31 120. </code></pre>
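<p>Here is the same <code>slice(1, n())</code> idiom on a tiny invented tibble, in case the grouping behavior is easier to see without real prices. The object names and values are hypothetical.</p>
<pre class="r"><code>library(dplyr)

# Two made-up tickers with different numbers of rows
toy <- tibble(
  ticker = c("A", "A", "A", "B", "B"),
  price  = c(1, 2, 3, 10, 20)
)

first_last <-
  toy %>%
  group_by(ticker) %>%
  # n() is evaluated per group: 3 for "A", 2 for "B"
  slice(1, n())

first_last</code></pre>
<p>Because the tibble is grouped, each ticker contributes its own first and last row, even though the groups have different lengths.</p>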
<p>We have monthly prices for our funds, with differing start dates and a common end date. Now let’s start to wrangle this data into a format that is suited for our strategy’s logic. Recall that we wish to implement this flow:</p>
<ol style="list-style-type: decimal">
<li>If prev twelve months’ SPY returns are lower than TLT’s, buy/hold AGG this month, else move to step 2.</li>
<li>If prev twelve months’ SPY is higher than prev twelve months’ EFA, buy/hold SPY this month, else move to step 3.</li>
<li>If prev twelve months’ EFA is higher than prev twelve months’ SPY, buy/hold EFA this month.</li>
</ol>
<p>Since we need to compare the past twelve months’ return of SPY to that of TLT and then to that of EFA, it will be nice if those twelve months’ returns for each asset are in three separate columns. That means we’ll want a <code>tibble</code> with one column for twelve months’ returns of SPY, one column for twelve months’ returns of TLT, and one column for twelve months’ returns of EFA.</p>
<p>Honestly, that last sentence makes me pause and wonder if I’m doing something totally wrong. Any time I start taking data that is tidy and making it wider, I stop and think about why I need to do that. There is currently one column for prices, one column for the date, and one column for the ticker name.</p>
<pre><code># A tibble: 8 x 3
# Groups: ticker [4]
ticker date adjClose
<chr> <date> <dbl>
1 AGG 2003-09-30 61.1
2 AGG 2003-10-31 60.5
3 EFA 2001-08-31 26.8
4 EFA 2001-09-30 24.2
5 SPY 2000-01-31 97.3
6 SPY 2000-02-29 95.8
7 TLT 2002-07-31 44.6
8 TLT 2002-08-31 47.0</code></pre>
<p>To me, by putting those symbols in one column, we are treating them as being the same variable but different categories or groups of that variable. However, our trading logic treats them as different classes of assets entirely, not just different groups within the same thing called ‘ticker’. Plus, it’s easier to compare these assets with <code>if_else()</code> logic when they are in different columns. So, we’ll take this nice, tidy tibble of ETF prices and make it wider with different returns data.</p>
<p>Let’s get to it and hopefully that will make sense.</p>
<p>Since we are going to be calculating monthly and rolling twelve month returns for several assets and mashing them together, let’s run the calculations in pieces then join them. We’ll start with TLT, which means using <code>filter(ticker == "TLT")</code>. Then we’ll use <code>mutate()</code> to calculate twelve-month returns. Notice that we use quite verbose names for the new column, e.g., <code>tbill_twelve_mon_ret</code>. That will get cumbersome, but it will also help our human eyes to quickly glance at our end data and find the new columns.</p>
<pre class="r"><code>n_lag <- 12

risk_free_tlt <-
  prices_monthly %>%
  filter(ticker == "TLT") %>%
  mutate(tbill_twelve_mon_ret = ((adjClose / lag(adjClose, n_lag)) - 1)) %>%
  # ungroup first: dplyr will not drop the grouping column (ticker) from a grouped tibble
  ungroup() %>%
  select(-adjClose, -ticker) %>%
  na.omit()

risk_free_tlt %>%
  head()</code></pre>
<pre><code># A tibble: 6 x 2
date tbill_twelve_mon_ret
<date> <dbl>
1 2003-07-31 0.0535
2 2003-08-31 0.0115
3 2003-09-30 0.0279
4 2003-10-31 0.0350
5 2003-11-30 0.0481
6 2003-12-31 0.0162</code></pre>
<p>We calculated just the twelve months’ returns and did not include the one month return, because our logic does not contemplate holding this asset for even a month; we use AGG for our bond exposure. The twelve months’ returns of AGG, in turn, are not part of our logic, so we won’t calculate them. This is another reason to break up these returns calculations and then do a big join at the end.</p>
<p>We will calculate the one month returns of AGG with the exact same code, except we lag by one instead of twelve.</p>
<pre class="r"><code>bond_returns <-
  prices_monthly %>%
  filter(ticker == "AGG") %>%
  mutate(bond_return = ((adjClose / lag(adjClose, 1)) - 1)) %>%
  ungroup() %>%
  select(-adjClose, -ticker) %>%
  na.omit()

bond_returns %>%
  head()</code></pre>
<pre><code># A tibble: 6 x 2
date bond_return
<date> <dbl>
1 2003-10-31 -0.00935
2 2003-11-30 0.00335
3 2003-12-31 0.00979
4 2004-01-31 0.00441
5 2004-02-29 0.0114
6 2004-03-31 0.00684</code></pre>
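<p>The two calculations above differ only in how far back <code>lag()</code> reaches. As a small sanity check, here is the same arithmetic on invented prices that grow 10% per period, with a two-period lag standing in for the twelve-month one. The names <code>toy_prices</code> and <code>toy_returns</code> are hypothetical.</p>
<pre class="r"><code>library(dplyr)

# Made-up prices that rise 10% each period
toy_prices <- tibble(
  period = 1:5,
  price  = c(100, 110, 121, 133.1, 146.41)
)

toy_returns <-
  toy_prices %>%
  mutate(one_period_ret = (price / lag(price)) - 1,      # 10% wherever defined
         two_period_ret = (price / lag(price, 2)) - 1)   # 1.1^2 - 1 = 21% wherever defined

toy_returns</code></pre>
<p>The first row of <code>one_period_ret</code> and the first two rows of <code>two_period_ret</code> are <code>NA</code> because there is nothing to look back to, which is why the code above finishes with <code>na.omit()</code>.</p>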
<p>For EFA and SPY, we need to calculate both twelve months’ returns and one month returns. The code flow for calculating their returns is identical. I’m not going to delete the <code>ticker</code> column from the <code>SPY</code> data, but I will rename it to <code>mom_asset</code> (just in case we want to try different assets later, or even use Shiny to dynamically choose different assets).</p>
<pre class="r"><code>equities_ex_us_returns <-
  prices_monthly %>%
  filter(ticker == "EFA") %>%
  mutate(ex_us_return = ((adjClose / lag(adjClose)) - 1),
         ex_us_twelve_mon_ret = ((adjClose / lag(adjClose, n_lag)) - 1)) %>%
  ungroup() %>%
  select(-adjClose, -ticker) %>%
  na.omit()

sp_500_returns <-
  prices_monthly %>%
  filter(ticker == "SPY") %>%
  mutate(spy_return = ((adjClose / lag(adjClose)) - 1),
         spy_twelve_mon_ret = ((adjClose / lag(adjClose, n_lag)) - 1)) %>%
  select(-adjClose) %>%
  rename(mom_asset = ticker)</code></pre>
<p>We now have four tibbles of returns, and want to combine them into one tibble. They all have a column called <code>date</code>, so we can chain several calls to <code>left_join(..., by = "date")</code>.</p>
<pre class="r"><code>joined_returns_tbl <-
  sp_500_returns %>%
  left_join(risk_free_tlt, by = "date") %>%
  left_join(equities_ex_us_returns, by = "date") %>%
  left_join(bond_returns, by = "date") %>%
  na.omit()

joined_returns_tbl %>%
  head()</code></pre>
<pre><code># A tibble: 6 x 8
# Groups: mom_asset [1]
mom_asset date spy_return spy_twelve_mon_… tbill_twelve_mo…
<chr> <date> <dbl> <dbl> <dbl>
1 SPY 2003-10-31 0.0535 0.209 0.0350
2 SPY 2003-11-30 0.0109 0.151 0.0481
3 SPY 2003-12-31 0.0503 0.282 0.0162
4 SPY 2004-01-31 0.0198 0.340 0.0411
5 SPY 2004-02-29 0.0136 0.377 0.0329
6 SPY 2004-03-31 -0.0132 0.356 0.0619
# … with 3 more variables: ex_us_return <dbl>, ex_us_twelve_mon_ret <dbl>,
# bond_return <dbl></code></pre>
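<p>Notice that the joined data starts in October 2003 even though SPY prices go back to 2000. Here is a toy sketch of why: <code>left_join()</code> fills dates missing from the right-hand tibble with <code>NA</code>, and <code>na.omit()</code> then drops every row that is incomplete. The two small tibbles below are invented for illustration.</p>
<pre class="r"><code>library(dplyr)

# A longer made-up series and a shorter one that starts later
toy_spy <- tibble(
  date = as.Date(c("2003-08-31", "2003-09-30", "2003-10-31")),
  spy_return = c(0.01, 0.02, 0.03)
)
toy_agg <- tibble(
  date = as.Date(c("2003-10-31", "2003-11-30")),
  bond_return = c(-0.009, 0.003)
)

toy_joined <-
  toy_spy %>%
  left_join(toy_agg, by = "date") %>%   # August and September get bond_return = NA
  na.omit()                             # only dates present in both series survive

toy_joined</code></pre>
<p>The November row of <code>toy_agg</code> never appears either, because a left join only keeps dates that exist in the left-hand tibble.</p>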
<p>Alright, we now have one massive tibble called <code>joined_returns_tbl</code> with eight columns. Those verbose names will come in handy here. For example, if we want to see all our twelve month calculations, we can use <code>select(date, contains("twelve"))</code> to grab each column whose name contains the string “twelve”. Because the tibble is still grouped by <code>mom_asset</code>, that column comes along as well.</p>
<pre class="r"><code>joined_returns_tbl %>%
  select(date, contains("twelve")) %>%
  head()</code></pre>
<pre><code># A tibble: 6 x 5
# Groups: mom_asset [1]
mom_asset date spy_twelve_mon_r… tbill_twelve_mon… ex_us_twelve_mo…
<chr> <date> <dbl> <dbl> <dbl>
1 SPY 2003-10-31 0.209 0.0350 0.285
2 SPY 2003-11-30 0.151 0.0481 0.247
3 SPY 2003-12-31 0.282 0.0162 0.398
4 SPY 2004-01-31 0.340 0.0411 0.477
5 SPY 2004-02-29 0.377 0.0329 0.544
6 SPY 2004-03-31 0.356 0.0619 0.581</code></pre>
<p>We selected twelve months’ returns for SPY, EFA, and TLT so we can quickly glance at the new data and make sure it looks how we were expecting.</p>
<p>Next, we’ll implement the trading logic, which is, to repeat:</p>
<ol style="list-style-type: decimal">
<li>If previous twelve months’ SP500 returns are lower than those of treasuries, buy/hold bonds via AGG this month, else go to the next step.</li>
<li>If previous twelve months’ SP500 returns are higher than those of EFA (equities ex-US proxy), buy/hold SP500 this month.</li>
<li>If previous twelve months’ EFA returns are higher than those of SP500, buy/hold EFA this month.</li>
</ol>
<p>We can implement this using the <code>if_else()</code> function from <code>dplyr</code>, and we’ll place those <code>if_else()</code> calls inside <code>mutate()</code>. To implement step one, we write
<code>if_else(lag(spy_twelve_mon_ret) < lag(tbill_twelve_mon_ret), bond_return</code>, which in English says if the lagged 12-month SPY return is less than the lagged 12-month TLT return, then this month we invest in AGG, or we book the one month return of AGG. We follow that with the <code>else</code> logic,
<code>if_else(lag(spy_twelve_mon_ret) > lag(ex_us_twelve_mon_ret), spy_return</code>, and that with our final <code>else</code> logic, <code>ex_us_return</code>. We labeled the new column that holds the results of those <code>if_else()</code> statements as <code>strat_returns</code>.</p>
<p>It might be interesting or important to know when the strategy is holding AGG, SPY, or EFA, so let’s create a column called <code>strat_label</code> that has the values <code>bond</code>, <code>spy</code>, and <code>ex_us</code> depending on where the strategy invests. That would allow us to calculate and visualize, for example, the proportion of time spent invested in EFA.</p>
<pre class="r"><code>joined_returns_tbl %>%
  # encode the signal by comparing last month's twelve-month returns;
  # we could complicate this by adding in other assets for comparison
  mutate(strat_returns = if_else(lag(spy_twelve_mon_ret) < lag(tbill_twelve_mon_ret),
                                 bond_return,
                                 if_else(lag(spy_twelve_mon_ret) > lag(ex_us_twelve_mon_ret),
                                         spy_return,
                                         ex_us_return)),
         strat_label = if_else(lag(spy_twelve_mon_ret) < lag(tbill_twelve_mon_ret),
                               "bond",
                               if_else(lag(spy_twelve_mon_ret) > lag(ex_us_twelve_mon_ret),
                                       "spy",
                                       "ex_us"))) %>%
  na.omit() %>%
  select(date, contains("strat")) %>%
  head()</code></pre>
<pre><code># A tibble: 6 x 4
# Groups: mom_asset [1]
mom_asset date strat_returns strat_label
<chr> <date> <dbl> <chr>
1 SPY 2003-11-30 0.0253 ex_us
2 SPY 2003-12-31 0.0834 ex_us
3 SPY 2004-01-31 0.0113 ex_us
4 SPY 2004-02-29 0.0230 ex_us
5 SPY 2004-03-31 0.000707 ex_us
6 SPY 2004-04-30 -0.0339 ex_us </code></pre>
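<p>To check that the <code>lag()</code> calls really make this month’s holding depend on last month’s signal, here is the same nested <code>if_else()</code> on four months of made-up numbers. The column names are shortened and every value is invented for this sketch.</p>
<pre class="r"><code>library(dplyr)

# Hypothetical trailing twelve-month returns (*_12m) and one-month returns (*_ret)
toy <- tibble(
  month   = 1:4,
  spy_12m = c(0.10, -0.05, 0.08, 0.02),
  tlt_12m = c(0.03,  0.04, 0.02, 0.01),
  efa_12m = c(0.06,  0.01, 0.12, 0.00),
  spy_ret = c(0.01,  0.02, 0.03, 0.04),
  efa_ret = c(0.05,  0.06, 0.07, 0.08),
  agg_ret = c(0.001, 0.002, 0.003, 0.004)
)

toy_signal <-
  toy %>%
  mutate(strat_returns = if_else(lag(spy_12m) < lag(tlt_12m),
                                 agg_ret,
                                 if_else(lag(spy_12m) > lag(efa_12m),
                                         spy_ret,
                                         efa_ret)))

# Month 1 has no prior signal, so the result is NA.
# Month 2 books spy_ret (month 1: SPY beat TLT and EFA).
# Month 3 books agg_ret (month 2: SPY trailed TLT).
# Month 4 books efa_ret (month 3: SPY beat TLT but trailed EFA).
toy_signal$strat_returns</code></pre>
<p>The leading <code>NA</code> is the reason the real pipeline calls <code>na.omit()</code> after the <code>mutate()</code>, and why its first row is November 2003 rather than October.</p>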
<p>All right, the logic or the signal is encoded. Let’s add a column for an 80/20 SPY/AGG portfolio as a sort of buy and hold benchmark and then save the whole massive <code>tibble</code> as <code>strat_returns</code>.</p>
<pre class="r"><code>strat_returns <-
  joined_returns_tbl %>%
  # encode the signal by comparing last month's twelve-month returns;
  # we could complicate this by adding in other assets for comparison
  mutate(strat_returns = if_else(lag(spy_twelve_mon_ret) < lag(tbill_twelve_mon_ret),
                                 bond_return,
                                 if_else(lag(spy_twelve_mon_ret) > lag(ex_us_twelve_mon_ret),
                                         spy_return,
                                         ex_us_return)),
         strat_label = if_else(lag(spy_twelve_mon_ret) < lag(tbill_twelve_mon_ret),
                               "bond",
                               if_else(lag(spy_twelve_mon_ret) > lag(ex_us_twelve_mon_ret),
                                       "spy",
                                       "ex_us")),
         bench_returns = (.8 * spy_return) + (.2 * bond_return)) %>%
  na.omit() %>%
  select(date, bench_returns, contains("strat"))</code></pre>
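<p>As an aside, nested <code>if_else()</code> calls get hard to read once more assets are involved. One possible alternative, which is not what this post uses, is <code>case_when()</code> from <code>dplyr</code>. The sketch below runs it on invented numbers; note the explicit <code>is.na()</code> branch, which is needed because <code>case_when()</code> would otherwise send the first row (where the lagged values are missing) to the catch-all branch instead of returning <code>NA</code> as <code>if_else()</code> does.</p>
<pre class="r"><code>library(dplyr)

# Made-up twelve-month returns, using the same column names as above
toy <- tibble(
  spy_twelve_mon_ret   = c(0.10, -0.05, 0.08, 0.02),
  tbill_twelve_mon_ret = c(0.03,  0.04, 0.02, 0.01),
  ex_us_twelve_mon_ret = c(0.06,  0.01, 0.12, 0.00)
)

toy_labels <-
  toy %>%
  mutate(strat_label = case_when(
    # no signal yet for the first row: keep it missing
    is.na(lag(spy_twelve_mon_ret)) ~ NA_character_,
    lag(spy_twelve_mon_ret) < lag(tbill_twelve_mon_ret) ~ "bond",
    lag(spy_twelve_mon_ret) > lag(ex_us_twelve_mon_ret) ~ "spy",
    TRUE ~ "ex_us"
  ))

toy_labels$strat_label</code></pre>
<p>Conditions are checked in order and the first match wins, so the branches read top to bottom like the three numbered steps of the trading logic.</p>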
<p>Let’s see how frequently we are invested in bonds versus SPY versus equities ex-USA.</p>
<pre class="r"><code>strat_returns %>%
  count(strat_label) %>%
  mutate(prop = prop.table(n))</code></pre>
<pre><code># A tibble: 3 x 4
# Groups: mom_asset [1]
mom_asset strat_label n prop
<chr> <chr> <int> <dbl>
1 SPY bond 63 0.346
2 SPY ex_us 65 0.357
3 SPY spy 54 0.297</code></pre>
<p>Let’s visualize the same data with a bar chart using <code>geom_col()</code> and the height set to the proportion of months in each asset. Instead of using a legend, or placing the labels on the x-axis, we will place them on the chart above the bars.</p>
<p>First we add the labels with <code>geom_label(aes(label = strat_label))</code> and let’s make them white with <code>fill = "white"</code>. We can remove the legend with <code>theme(legend.position = "none")</code> and the x-axis text can be removed with <code>theme(...axis.text.x = element_blank(), axis.ticks = element_blank())</code>.</p>
<pre class="r"><code>strat_returns %>%
  count(strat_label) %>%
  mutate(prop = prop.table(n)) %>%
  ggplot(aes(x = strat_label, y = prop, fill = strat_label)) +
  geom_col(width = .15) +
  scale_y_continuous(labels = scales::percent) +
  geom_label(aes(label = strat_label), vjust = -.5, fill = "white") +
  ylab("relative frequencies") +
  xlab("") +
  expand_limits(y = .4) +
  theme(legend.position = "none",
        axis.text.x = element_blank(),
        axis.ticks = element_blank())</code></pre>
<p><img src="/post/2019-05-22-momentum-investing-with-r_files/figure-html/unnamed-chunk-14-1.png" width="672" /></p>
<p>Since the data is in a nice, tidy format, we can also head straight to <code>highcharter</code> to build an interactive bar chart.</p>
<pre class="r"><code>strat_returns %>%
  count(strat_label) %>%
  mutate(prop = prop.table(n)) %>%
  hchart(., hcaes(x = strat_label, y = prop, color = strat_label),
         type = "column",
         pointWidth = 30) %>%
  hc_tooltip(pointFormat = "{point.strat_label}: {point.prop: .2f}")</code></pre>
<div id="htmlwidget-1" style="width:100%;height:500px;" class="highchart html-widget"></div>
<script type="application/json" data-for="htmlwidget-1">{"x":{"hc_opts":{"title":{"text":null},"yAxis":{"title":{"text":"prop"},"type":"linear"},"credits":{"enabled":false},"exporting":{"enabled":false},"plotOptions":{"series":{"label":{"enabled":false},"turboThreshold":0,"showInLegend":false},"treemap":{"layoutAlgorithm":"squarified"},"scatter":{"marker":{"symbol":"circle"}}},"series":[{"group":"group","data":[{"mom_asset":"SPY","strat_label":"bond","n":63,"prop":0.346153846153846,"y":0.346153846153846,"color":"#440154","name":"bond","colorv":"bond"},{"mom_asset":"SPY","strat_label":"ex_us","n":65,"prop":0.357142857142857,"y":0.357142857142857,"color":"#21908C","name":"ex_us","colorv":"ex_us"},{"mom_asset":"SPY","strat_label":"spy","n":54,"prop":0.296703296703297,"y":0.296703296703297,"color":"#FDE725","name":"spy","colorv":"spy"}],"type":"column","pointWidth":30}],"xAxis":{"type":"category","title":{"text":"strat_label"},"categories":null},"tooltip":{"pointFormat":"{point.strat_label}: {point.prop: .2f}"}},"theme":{"chart":{"backgroundColor":"transparent"}},"conf_opts":{"global":{"Date":null,"VMLRadialGradientURL":"http =//code.highcharts.com/list(version)/gfx/vml-radial-gradient.png","canvasToolsURL":"http =//code.highcharts.com/list(version)/modules/canvas-tools.js","getTimezoneOffset":null,"timezoneOffset":0,"useUTC":true},"lang":{"contextButtonTitle":"Chart context menu","decimalPoint":".","downloadJPEG":"Download JPEG image","downloadPDF":"Download PDF document","downloadPNG":"Download PNG image","downloadSVG":"Download SVG vector image","drillUpText":"Back to {series.name}","invalidDate":null,"loading":"Loading...","months":["January","February","March","April","May","June","July","August","September","October","November","December"],"noData":"No data to display","numericSymbols":["k","M","G","T","P","E"],"printChart":"Print chart","resetZoom":"Reset zoom","resetZoomTitle":"Reset zoom level 
1:1","shortMonths":["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"thousandsSep":" ","weekdays":["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]}},"type":"chart","fonts":[],"debug":false},"evals":[],"jsHooks":[]}</script>
<p>Maybe we want to see the distribution of returns for each strategy. Here is a histogram of each using <code>geom_histogram()</code>.</p>
<pre class="r"><code>strat_returns %>%
  ungroup() %>%
  select(bench_returns, strat_returns) %>%
  gather(type, returns) %>%
  ggplot(aes(returns, color = type, fill = type)) +
  geom_histogram(bins = 60) +
  facet_wrap(~type)</code></pre>
<p><img src="/post/2019-05-22-momentum-investing-with-r_files/figure-html/unnamed-chunk-16-1.png" width="672" /></p>
<p>Here’s a horizontal box plot and violin chart built using a combination of <code>geom_boxplot()</code> and <code>geom_violin()</code>.</p>
<pre class="r"><code>strat_returns %>%
  ungroup() %>%
  select(bench_returns, strat_returns) %>%
  gather(type, returns) %>%
  ggplot(aes(x = type, y = returns, color = type, fill = type)) +
  geom_boxplot(width = 0.1, fill = "white") +
  geom_violin(alpha = .05) +
  coord_flip()</code></pre>
<p><img src="/post/2019-05-22-momentum-investing-with-r_files/figure-html/unnamed-chunk-17-1.png" width="672" /></p>
<p>The distribution of returns is helpful, but let’s also have a look at which returns path delivered the highest growth. We won’t take account of risk here; we’ll just visualize how one dollar would have grown in each strategy. We will start with the seemingly underused <code>accumulate()</code> function from <code>purrr</code> to create two new columns called <code>strat_growth</code> and <code>bench_growth</code>. Then we use <code>select(date, contains("growth"))</code> to isolate just the date and growth columns.</p>
<pre class="r"><code>strat_growth <-
  strat_returns %>%
  mutate(strat_growth = accumulate(1 + strat_returns, `*`),
         bench_growth = accumulate(1 + bench_returns, `*`)) %>%
  ungroup() %>%
  select(date, contains("growth")) %>%
  gather(strat, growth, -date)</code></pre>
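<p>For anyone who has not met <code>accumulate()</code> before, here is what it does to three invented monthly returns. With multiplication as the function it is a running product, so it should agree with base R’s <code>cumprod()</code>; the vector <code>toy_returns</code> is made up for this sketch.</p>
<pre class="r"><code>library(purrr)

# Hypothetical monthly returns: +10%, -5%, +2%
toy_returns <- c(0.10, -0.05, 0.02)

# Growth of one dollar: 1.10, then 1.10 * 0.95, then 1.10 * 0.95 * 1.02
growth_accumulate <- accumulate(1 + toy_returns, `*`)

# Base R equivalent for the special case of a running product
growth_cumprod <- cumprod(1 + toy_returns)

growth_accumulate
all.equal(growth_accumulate, growth_cumprod)</code></pre>
<p>The appeal of <code>accumulate()</code> is that the function argument can be swapped for something other than multiplication, whereas <code>cumprod()</code> only covers this one case.</p>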
<p>Now we can pipe <code>strat_growth</code> to <code>hchart()</code>.</p>
<pre class="r"><code>strat_growth %>%
  hchart(., hcaes(x = date, y = growth, group = strat),
         type = "line") %>%
  hc_tooltip(pointFormat = "{point.strat}: ${point.growth: .2f}")</code></pre>
<div id="htmlwidget-2" style="width:100%;height:500px;" class="highchart html-widget"></div>
<script type="application/json" data-for="htmlwidget-2">{"x":{"hc_opts":{"title":{"text":null},"yAxis":{"title":{"text":"growth"},"type":"linear"},"credits":{"enabled":false},"exporting":{"enabled":false},"plotOptions":{"series":{"label":{"enabled":false},"turboThreshold":0,"showInLegend":true},"treemap":{"layoutAlgorithm":"squarified"},"scatter":{"marker":{"symbol":"circle"}}},"series":[{"name":"bench_growth","data":[{"date":"2003-11-30","strat":"bench_growth","growth":1.00940618250481,"x":1070150400000,"y":1.00940618250481},{"date":"2003-12-31","strat":"bench_growth","growth":1.05202003358152,"x":1072828800000,"y":1.05202003358152},{"date":"2004-01-31","strat":"bench_growth","growth":1.06958562997628,"x":1075507200000,"y":1.06958562997628},{"date":"2004-02-29","strat":"bench_growth","growth":1.08363868704131,"x":1078012800000,"y":1.08363868704131},{"date":"2004-03-31","strat":"bench_growth","growth":1.07367974774466,"x":1080691200000,"y":1.07367974774466},{"date":"2004-04-30","strat":"bench_growth","growth":1.0513999122807,"x":1083283200000,"y":1.0513999122807},{"date":"2004-05-31","strat":"bench_growth","growth":1.0650082720731,"x":1085961600000,"y":1.0650082720731},{"date":"2004-06-30","strat":"bench_growth","growth":1.08231514022806,"x":1088553600000,"y":1.08231514022806},{"date":"2004-07-31","strat":"bench_growth","growth":1.05642071288145,"x":1091232000000,"y":1.05642071288145},{"date":"2004-08-31","strat":"bench_growth","growth":1.06190008573509,"x":1093910400000,"y":1.06190008573509},{"date":"2004-09-30","strat":"bench_growth","growth":1.07150565533507,"x":1096502400000,"y":1.07150565533507},{"date":"2004-10-31","strat":"bench_growth","growth":1.08440908249718,"x":1099180800000,"y":1.08440908249718},{"date":"2004-11-30","strat":"bench_growth","growth":1.12142100199833,"x":1101772800000,"y":1.12142100199833},{"date":"2004-12-31","strat":"bench_growth","growth":1.15026149252695,"x":1104451200000,"y":1.15026149252695},{"date":"2005-01-31","strat":"bench_growth
","growth":1.13075298442515,"x":1107129600000,"y":1.13075298442515},{"date":"2005-02-28","strat":"bench_growth","growth":1.14882294948965,"x":1109548800000,"y":1.14882294948965},{"date":"2005-03-31","strat":"bench_growth","growth":1.12978479398818,"x":1112227200000,"y":1.12978479398818},{"date":"2005-04-30","strat":"bench_growth","growth":1.11674096710101,"x":1114819200000,"y":1.11674096710101},{"date":"2005-05-31","strat":"bench_growth","growth":1.14737765291019,"x":1117497600000,"y":1.14737765291019},{"date":"2005-06-30","strat":"bench_growth","growth":1.15075655420766,"x":1120089600000,"y":1.15075655420766},{"date":"2005-07-31","strat":"bench_growth","growth":1.18359954849724,"x":1122768000000,"y":1.18359954849724},{"date":"2005-08-31","strat":"bench_growth","growth":1.17752036711026,"x":1125446400000,"y":1.17752036711026},{"date":"2005-09-30","strat":"bench_growth","growth":1.18284355345837,"x":1128038400000,"y":1.18284355345837},{"date":"2005-10-31","strat":"bench_growth","growth":1.15812049228407,"x":1130716800000,"y":1.15812049228407},{"date":"2005-11-30","strat":"bench_growth","growth":1.19970943681718,"x":1133308800000,"y":1.19970943681718},{"date":"2005-12-31","strat":"bench_growth","growth":1.20061130007186,"x":1135987200000,"y":1.20061130007186},{"date":"2006-01-31","strat":"bench_growth","growth":1.22353338563158,"x":1138665600000,"y":1.22353338563158},{"date":"2006-02-28","strat":"bench_growth","growth":1.22953281343011,"x":1141084800000,"y":1.22953281343011},{"date":"2006-03-31","strat":"bench_growth","growth":1.24363335517453,"x":1143763200000,"y":1.24363335517453},{"date":"2006-04-30","strat":"bench_growth","growth":1.25583097028832,"x":1146355200000,"y":1.25583097028832},{"date":"2006-05-31","strat":"bench_growth","growth":1.22507352091182,"x":1149033600000,"y":1.22507352091182},{"date":"2006-06-30","strat":"bench_growth","growth":1.22688525158538,"x":1151625600000,"y":1.22688525158538},{"date":"2006-07-31","strat":"bench_growth","growth":1.2350741
0663869,"x":1154304000000,"y":1.23507410663869},{"date":"2006-08-31","strat":"bench_growth","growth":1.26056273619082,"x":1156982400000,"y":1.26056273619082},{"date":"2006-09-30","strat":"bench_growth","growth":1.29033552352095,"x":1159574400000,"y":1.29033552352095},{"date":"2006-10-31","strat":"bench_growth","growth":1.32463849633314,"x":1162252800000,"y":1.32463849633314},{"date":"2006-11-30","strat":"bench_growth","growth":1.3485223128685,"x":1164844800000,"y":1.3485223128685},{"date":"2006-12-31","strat":"bench_growth","growth":1.36186474377283,"x":1167523200000,"y":1.36186474377283},{"date":"2007-01-31","strat":"bench_growth","growth":1.37751339253658,"x":1170201600000,"y":1.37751339253658},{"date":"2007-02-28","strat":"bench_growth","growth":1.36037327833122,"x":1172620800000,"y":1.36037327833122},{"date":"2007-03-31","strat":"bench_growth","growth":1.3724687166813,"x":1175299200000,"y":1.3724687166813},{"date":"2007-04-30","strat":"bench_growth","growth":1.42272276731405,"x":1177891200000,"y":1.42272276731405},{"date":"2007-05-31","strat":"bench_growth","growth":1.45872103182193,"x":1180569600000,"y":1.45872103182193},{"date":"2007-06-30","strat":"bench_growth","growth":1.44053048836756,"x":1183161600000,"y":1.44053048836756},{"date":"2007-07-31","strat":"bench_growth","growth":1.40753495616816,"x":1185840000000,"y":1.40753495616816},{"date":"2007-08-31","strat":"bench_growth","growth":1.42559006121012,"x":1188518400000,"y":1.42559006121012},{"date":"2007-09-30","strat":"bench_growth","growth":1.47153185661302,"x":1191110400000,"y":1.47153185661302},{"date":"2007-10-31","strat":"bench_growth","growth":1.49047423272496,"x":1193788800000,"y":1.49047423272496},{"date":"2007-11-30","strat":"bench_growth","growth":1.44963490823813,"x":1196380800000,"y":1.44963490823813},{"date":"2007-12-31","strat":"bench_growth","growth":1.43647968149688,"x":1199059200000,"y":1.43647968149688},{"date":"2008-01-31","strat":"bench_growth","growth":1.37364388382329,"x":120173760000
0,"y":1.37364388382329},{"date":"2008-02-29","strat":"bench_growth","growth":1.34480568867139,"x":1204243200000,"y":1.34480568867139},{"date":"2008-03-31","strat":"bench_growth","growth":1.33543751491286,"x":1206921600000,"y":1.33543751491286},{"date":"2008-04-30","strat":"bench_growth","growth":1.38724011031619,"x":1209513600000,"y":1.38724011031619},{"date":"2008-05-31","strat":"bench_growth","growth":1.40050836087206,"x":1212192000000,"y":1.40050836087206},{"date":"2008-06-30","strat":"bench_growth","growth":1.30643414390996,"x":1214784000000,"y":1.30643414390996},{"date":"2008-07-31","strat":"bench_growth","growth":1.298074418013,"x":1217462400000,"y":1.298074418013},{"date":"2008-08-31","strat":"bench_growth","growth":1.31523210533141,"x":1220140800000,"y":1.31523210533141},{"date":"2008-09-30","strat":"bench_growth","growth":1.21375240488097,"x":1222732800000,"y":1.21375240488097},{"date":"2008-10-31","strat":"bench_growth","growth":1.04636035168074,"x":1225411200000,"y":1.04636035168074},{"date":"2008-11-30","strat":"bench_growth","growth":0.994437038949117,"x":1228003200000,"y":0.994437038949117},{"date":"2008-12-31","strat":"bench_growth","growth":1.01549331806287,"x":1230681600000,"y":1.01549331806287},{"date":"2009-01-31","strat":"bench_growth","growth":0.944749364017805,"x":1233360000000,"y":0.944749364017805},{"date":"2009-02-28","strat":"bench_growth","growth":0.861546089971479,"x":1235779200000,"y":0.861546089971479},{"date":"2009-03-31","strat":"bench_growth","growth":0.920985234305027,"x":1238457600000,"y":0.920985234305027},{"date":"2009-04-30","strat":"bench_growth","growth":0.995131546181699,"x":1241049600000,"y":0.995131546181699},{"date":"2009-05-31","strat":"bench_growth","growth":1.04308239975355,"x":1243728000000,"y":1.04308239975355},{"date":"2009-06-30","strat":"bench_growth","growth":1.04340670139126,"x":1246320000000,"y":1.04340670139126},{"date":"2009-07-31","strat":"bench_growth","growth":1.10726544705044,"x":1248998400000,"y":1.107265
44705044},{"date":"2009-08-31","strat":"bench_growth","growth":1.14289645751772,"x":1251676800000,"y":1.14289645751772},{"date":"2009-09-30","strat":"bench_growth","growth":1.17857257297047,"x":1254268800000,"y":1.17857257297047},{"date":"2009-10-31","strat":"bench_growth","growth":1.16049223106345,"x":1256947200000,"y":1.16049223106345},{"date":"2009-11-30","strat":"bench_growth","growth":1.22117287138367,"x":1259539200000,"y":1.22117287138367},{"date":"2009-12-31","strat":"bench_growth","growth":1.23561813473075,"x":1262217600000,"y":1.23561813473075},{"date":"2010-01-31","strat":"bench_growth","growth":1.20261555164908,"x":1264896000000,"y":1.20261555164908},{"date":"2010-02-28","strat":"bench_growth","growth":1.23361408824105,"x":1267315200000,"y":1.23361408824105},{"date":"2010-03-31","strat":"bench_growth","growth":1.29360842906005,"x":1269993600000,"y":1.29360842906005},{"date":"2010-04-30","strat":"bench_growth","growth":1.31118780145322,"x":1272585600000,"y":1.31118780145322},{"date":"2010-05-31","strat":"bench_growth","growth":1.23179582883007,"x":1275264000000,"y":1.23179582883007},{"date":"2010-06-30","strat":"bench_growth","growth":1.18516972034106,"x":1277856000000,"y":1.18516972034106},{"date":"2010-07-31","strat":"bench_growth","growth":1.25195541515052,"x":1280534400000,"y":1.25195541515052},{"date":"2010-08-31","strat":"bench_growth","growth":1.21012629542368,"x":1283212800000,"y":1.21012629542368},{"date":"2010-09-30","strat":"bench_growth","growth":1.29684371170088,"x":1285804800000,"y":1.29684371170088},{"date":"2010-10-31","strat":"bench_growth","growth":1.33687177720845,"x":1288483200000,"y":1.33687177720845},{"date":"2010-11-30","strat":"bench_growth","growth":1.33483404400689,"x":1291075200000,"y":1.33483404400689},{"date":"2010-12-31","strat":"bench_growth","growth":1.40422659446647,"x":1293753600000,"y":1.40422659446647},{"date":"2011-01-31","strat":"bench_growth","growth":1.43016258423087,"x":1296432000000,"y":1.43016258423087},{"date":"2
011-02-28","strat":"bench_growth","growth":1.47073969782229,"x":1298851200000,"y":1.47073969782229},{"date":"2011-03-31","strat":"bench_growth","growth":1.47021002201943,"x":1301529600000,"y":1.47021002201943},{"date":"2011-04-30","strat":"bench_growth","growth":1.50888236884921,"x":1304121600000,"y":1.50888236884921},{"date":"2011-05-31","strat":"bench_growth","growth":1.4990865179382,"x":1306800000000,"y":1.4990865179382},{"date":"2011-06-30","strat":"bench_growth","growth":1.4774950078181,"x":1309392000000,"y":1.4774950078181},{"date":"2011-07-31","strat":"bench_growth","growth":1.45883213404087,"x":1312070400000,"y":1.45883213404087},{"date":"2011-08-31","strat":"bench_growth","growth":1.39910620395612,"x":1314748800000,"y":1.39910620395612},{"date":"2011-09-30","strat":"bench_growth","growth":1.32352719237809,"x":1317340800000,"y":1.32352719237809},{"date":"2011-10-31","strat":"bench_growth","growth":1.43942822276752,"x":1320019200000,"y":1.43942822276752},{"date":"2011-11-30","strat":"bench_growth","growth":1.43379170186672,"x":1322611200000,"y":1.43379170186672},{"date":"2011-12-31","strat":"bench_growth","growth":1.4496560647058,"x":1325289600000,"y":1.4496560647058},{"date":"2012-01-31","strat":"bench_growth","growth":1.50554153640907,"x":1327968000000,"y":1.50554153640907},{"date":"2012-02-29","strat":"bench_growth","growth":1.55778089653617,"x":1330473600000,"y":1.55778089653617},{"date":"2012-03-31","strat":"bench_growth","growth":1.59607099523276,"x":1333152000000,"y":1.59607099523276},{"date":"2012-04-30","strat":"bench_growth","growth":1.59043968327309,"x":1335744000000,"y":1.59043968327309},{"date":"2012-05-31","strat":"bench_growth","growth":1.51745866508106,"x":1338422400000,"y":1.51745866508106},{"date":"2012-06-30","strat":"bench_growth","growth":1.56664507466277,"x":1341014400000,"y":1.56664507466277},{"date":"2012-07-31","strat":"bench_growth","growth":1.58569200512892,"x":1343692800000,"y":1.58569200512892},{"date":"2012-08-31","strat":"bench_
growth","growth":1.61747360496623,"x":1346371200000,"y":1.61747360496623},{"date":"2012-09-30","strat":"bench_growth","growth":1.65113960949843,"x":1348963200000,"y":1.65113960949843},{"date":"2012-10-31","strat":"bench_growth","growth":1.62695137952858,"x":1351641600000,"y":1.62695137952858},{"date":"2012-11-30","strat":"bench_growth","growth":1.63524441741861,"x":1354233600000,"y":1.63524441741861},{"date":"2012-12-31","strat":"bench_growth","growth":1.64616850760154,"x":1356912000000,"y":1.64616850760154},{"date":"2013-01-31","strat":"bench_growth","growth":1.71153758325087,"x":1359590400000,"y":1.71153758325087},{"date":"2013-02-28","strat":"bench_growth","growth":1.73103277630898,"x":1362009600000,"y":1.73103277630898},{"date":"2013-03-31","strat":"bench_growth","growth":1.7839646143604,"x":1364688000000,"y":1.7839646143604},{"date":"2013-04-30","strat":"bench_growth","growth":1.81484003566628,"x":1367280000000,"y":1.81484003566628},{"date":"2013-05-31","strat":"bench_growth","growth":1.84180775595479,"x":1369958400000,"y":1.84180775595479},{"date":"2013-06-30","strat":"bench_growth","growth":1.81640029853058,"x":1372550400000,"y":1.81640029853058},{"date":"2013-07-31","strat":"bench_growth","growth":1.89246829581418,"x":1375228800000,"y":1.89246829581418},{"date":"2013-08-31","strat":"bench_growth","growth":1.84393546619203,"x":1377907200000,"y":1.84393546619203},{"date":"2013-09-30","strat":"bench_growth","growth":1.89480512696434,"x":1380499200000,"y":1.89480512696434},{"date":"2013-10-31","strat":"bench_growth","growth":1.96815046190358,"x":1383177600000,"y":1.96815046190358},{"date":"2013-11-30","strat":"bench_growth","growth":2.01383485094302,"x":1385769600000,"y":2.01383485094302},{"date":"2013-12-31","strat":"bench_growth","growth":2.05331510179247,"x":1388448000000,"y":2.05331510179247},{"date":"2014-01-31","strat":"bench_growth","growth":2.00174246842352,"x":1391126400000,"y":2.00174246842352},{"date":"2014-02-28","strat":"bench_growth","growth":2.076
13434355836,"x":1393545600000,"y":2.07613434355836},{"date":"2014-03-31","strat":"bench_growth","growth":2.08932099478851,"x":1396224000000,"y":2.08932099478851},{"date":"2014-04-30","strat":"bench_growth","growth":2.10359864162632,"x":1398816000000,"y":2.10359864162632},{"date":"2014-05-31","strat":"bench_growth","growth":2.14758184069103,"x":1401494400000,"y":2.14758184069103},{"date":"2014-06-30","strat":"bench_growth","growth":2.18361032282227,"x":1404086400000,"y":2.18361032282227},{"date":"2014-07-31","strat":"bench_growth","growth":2.15903988845542,"x":1406764800000,"y":2.15903988845542},{"date":"2014-08-31","strat":"bench_growth","growth":2.23216371881198,"x":1409443200000,"y":2.23216371881198},{"date":"2014-09-30","strat":"bench_growth","growth":2.2047925151069,"x":1412035200000,"y":2.2047925151069},{"date":"2014-10-31","strat":"bench_growth","growth":2.25102026948456,"x":1414713600000,"y":2.25102026948456},{"date":"2014-11-30","strat":"bench_growth","growth":2.30345158667313,"x":1417305600000,"y":2.30345158667313},{"date":"2014-12-31","strat":"bench_growth","growth":2.29941632227864,"x":1419984000000,"y":2.29941632227864},{"date":"2015-01-31","strat":"bench_growth","growth":2.25435051650073,"x":1422662400000,"y":2.25435051650073},{"date":"2015-02-28","strat":"bench_growth","growth":2.35168051390927,"x":1425081600000,"y":2.35168051390927},{"date":"2015-03-31","strat":"bench_growth","growth":2.32382425104367,"x":1427760000000,"y":2.32382425104367},{"date":"2015-04-30","strat":"bench_growth","growth":2.34060180846411,"x":1430352000000,"y":2.34060180846411},{"date":"2015-05-31","strat":"bench_growth","growth":2.36262994347014,"x":1433030400000,"y":2.36262994347014},{"date":"2015-06-30","strat":"bench_growth","growth":2.31919463484046,"x":1435622400000,"y":2.31919463484046},{"date":"2015-07-31","strat":"bench_growth","growth":2.36510899979935,"x":1438300800000,"y":2.36510899979935},{"date":"2015-08-31","strat":"bench_growth","growth":2.24819601621605,"x":144097
9200000,"y":2.24819601621605},{"date":"2015-09-30","strat":"bench_growth","growth":2.2061058982469,"x":1443571200000,"y":2.2061058982469},{"date":"2015-10-31","strat":"bench_growth","growth":2.35652872815265,"x":1446249600000,"y":2.35652872815265},{"date":"2015-11-30","strat":"bench_growth","growth":2.36158579780638,"x":1448841600000,"y":2.36158579780638},{"date":"2015-12-31","strat":"bench_growth","growth":2.32821899536506,"x":1451520000000,"y":2.32821899536506},{"date":"2016-01-31","strat":"bench_growth","growth":2.24127186187555,"x":1454198400000,"y":2.24127186187555},{"date":"2016-02-29","strat":"bench_growth","growth":2.24376111761244,"x":1456704000000,"y":2.24376111761244},{"date":"2016-03-31","strat":"bench_growth","growth":2.36838787558439,"x":1459382400000,"y":2.36838787558439},{"date":"2016-04-30","strat":"bench_growth","growth":2.37706922803639,"x":1461974400000,"y":2.37706922803639},{"date":"2016-05-31","strat":"bench_growth","growth":2.40947778178633,"x":1464652800000,"y":2.40947778178633},{"date":"2016-06-30","strat":"bench_growth","growth":2.42549988247674,"x":1467244800000,"y":2.42549988247674},{"date":"2016-07-31","strat":"bench_growth","growth":2.49895790952268,"x":1469923200000,"y":2.49895790952268},{"date":"2016-08-31","strat":"bench_growth","growth":2.50027613980805,"x":1472601600000,"y":2.50027613980805},{"date":"2016-09-30","strat":"bench_growth","growth":2.50068731887274,"x":1475193600000,"y":2.50068731887274},{"date":"2016-10-31","strat":"bench_growth","growth":2.46191844271421,"x":1477872000000,"y":2.46191844271421},{"date":"2016-11-30","strat":"bench_growth","growth":2.52184034640917,"x":1480464000000,"y":2.52184034640917},{"date":"2016-12-31","strat":"bench_growth","growth":2.56404408465273,"x":1483142400000,"y":2.56404408465273},{"date":"2017-01-31","strat":"bench_growth","growth":2.60184178984303,"x":1485820800000,"y":2.60184178984303},{"date":"2017-02-28","strat":"bench_growth","growth":2.68698865694094,"x":1488240000000,"y":2.68698865
694094},{"date":"2017-03-31","strat":"bench_growth","growth":2.68940246975567,"x":1490918400000,"y":2.68940246975567},{"date":"2017-04-30","strat":"bench_growth","growth":2.71563538261834,"x":1493510400000,"y":2.71563538261834},{"date":"2017-05-31","strat":"bench_growth","growth":2.75003132996593,"x":1496188800000,"y":2.75003132996593},{"date":"2017-06-30","strat":"bench_growth","growth":2.76395581340613,"x":1498780800000,"y":2.76395581340613},{"date":"2017-07-31","strat":"bench_growth","growth":2.81125363469354,"x":1501459200000,"y":2.81125363469354},{"date":"2017-08-31","strat":"bench_growth","growth":2.82310453838704,"x":1504137600000,"y":2.82310453838704},{"date":"2017-09-30","strat":"bench_growth","growth":2.86536658842899,"x":1506729600000,"y":2.86536658842899},{"date":"2017-10-31","strat":"bench_growth","growth":2.91994377948576,"x":1509408000000,"y":2.91994377948576},{"date":"2017-11-30","strat":"bench_growth","growth":2.99047758677823,"x":1512000000000,"y":2.99047758677823},{"date":"2017-12-31","strat":"bench_growth","growth":3.02223334490709,"x":1514678400000,"y":3.02223334490709},{"date":"2018-01-31","strat":"bench_growth","growth":3.15169748940512,"x":1517356800000,"y":3.15169748940512},{"date":"2018-02-28","strat":"bench_growth","growth":3.05366483522254,"x":1519776000000,"y":3.05366483522254},{"date":"2018-03-31","strat":"bench_growth","growth":2.99078864452687,"x":1522454400000,"y":2.99078864452687},{"date":"2018-04-30","strat":"bench_growth","growth":2.99752846614175,"x":1525046400000,"y":2.99752846614175},{"date":"2018-05-31","strat":"bench_growth","growth":3.05979175354224,"x":1527724800000,"y":3.05979175354224},{"date":"2018-06-30","strat":"bench_growth","growth":3.07451084924468,"x":1530316800000,"y":3.07451084924468},{"date":"2018-07-31","strat":"bench_growth","growth":3.16545350260825,"x":1532995200000,"y":3.16545350260825},{"date":"2018-08-31","strat":"bench_growth","growth":3.24988524353731,"x":1535673600000,"y":3.24988524353731},{"date":"201
8-09-30","strat":"bench_growth","growth":3.26132638983188,"x":1538265600000,"y":3.26132638983188},{"date":"2018-10-31","strat":"bench_growth","growth":3.07682575890128,"x":1540944000000,"y":3.07682575890128},{"date":"2018-11-30","strat":"bench_growth","growth":3.12570640544882,"x":1543536000000,"y":3.12570640544882},{"date":"2018-12-31","strat":"bench_growth","growth":2.91819520307427,"x":1546214400000,"y":2.91819520307427}],"type":"line"},{"name":"strat_growth","data":[{"date":"2003-11-30","strat":"strat_growth","growth":1.02528089887913,"x":1070150400000,"y":1.02528089887913},{"date":"2003-12-31","strat":"strat_growth","growth":1.11076916796197,"x":1072828800000,"y":1.11076916796197},{"date":"2004-01-31","strat":"strat_growth","growth":1.12327526913934,"x":1075507200000,"y":1.12327526913934},{"date":"2004-02-29","strat":"strat_growth","growth":1.14909955597994,"x":1078012800000,"y":1.14909955597994},{"date":"2004-03-31","strat":"strat_growth","growth":1.14991164046945,"x":1080691200000,"y":1.14991164046945},{"date":"2004-04-30","strat":"strat_growth","growth":1.1109315848628,"x":1083283200000,"y":1.1109315848628},{"date":"2004-05-31","strat":"strat_growth","growth":1.12360010293368,"x":1085961600000,"y":1.12360010293368},{"date":"2004-06-30","strat":"strat_growth","growth":1.16128082335564,"x":1088553600000,"y":1.16128082335564},{"date":"2004-07-31","strat":"strat_growth","growth":1.11174366935232,"x":1091232000000,"y":1.11174366935232},{"date":"2004-08-31","strat":"strat_growth","growth":1.12498064657062,"x":1093910400000,"y":1.12498064657062},{"date":"2004-09-30","strat":"strat_growth","growth":1.14828747148676,"x":1096502400000,"y":1.14828747148676},{"date":"2004-10-31","strat":"strat_growth","growth":1.18848565383318,"x":1099180800000,"y":1.18848565383318},{"date":"2004-11-30","strat":"strat_growth","growth":1.17970823808495,"x":1101772800000,"y":1.17970823808495},{"date":"2004-12-31","strat":"strat_growth","growth":1.2362936795246,"x":1104451200000,"y":1.2362
936795246},{"date":"2005-01-31","strat":"strat_growth","growth":1.21276359701198,"x":1107129600000,"y":1.21276359701198},{"date":"2005-02-28","strat":"strat_growth","growth":1.20826050556799,"x":1109548800000,"y":1.20826050556799},{"date":"2005-03-31","strat":"strat_growth","growth":1.17656357045429,"x":1112227200000,"y":1.17656357045429},{"date":"2005-04-30","strat":"strat_growth","growth":1.15753059773369,"x":1114819200000,"y":1.15753059773369},{"date":"2005-05-31","strat":"strat_growth","growth":1.16710519899485,"x":1117497600000,"y":1.16710519899485},{"date":"2005-06-30","strat":"strat_growth","growth":1.17730158495779,"x":1120089600000,"y":1.17730158495779},{"date":"2005-07-31","strat":"strat_growth","growth":1.16512350040202,"x":1122768000000,"y":1.16512350040202},{"date":"2005-08-31","strat":"strat_growth","growth":1.17889185238287,"x":1125446400000,"y":1.17889185238287},{"date":"2005-09-30","strat":"strat_growth","growth":1.16784860339572,"x":1128038400000,"y":1.16784860339572},{"date":"2005-10-31","strat":"strat_growth","growth":1.13066237419901,"x":1130716800000,"y":1.13066237419901},{"date":"2005-11-30","strat":"strat_growth","growth":1.15679323795822,"x":1133308800000,"y":1.15679323795822},{"date":"2005-12-31","strat":"strat_growth","growth":1.1699286216287,"x":1135987200000,"y":1.1699286216287},{"date":"2006-01-31","strat":"strat_growth","growth":1.1692307817108,"x":1138665600000,"y":1.1692307817108},{"date":"2006-02-28","strat":"strat_growth","growth":1.16104653825018,"x":1141084800000,"y":1.16104653825018},{"date":"2006-03-31","strat":"strat_growth","growth":1.20754792155111,"x":1143763200000,"y":1.20754792155111},{"date":"2006-04-30","strat":"strat_growth","growth":1.26539564237845,"x":1146355200000,"y":1.26539564237845},{"date":"2006-05-31","strat":"strat_growth","growth":1.21703420374311,"x":1149033600000,"y":1.21703420374311},{"date":"2006-06-30","strat":"strat_growth","growth":1.21629018161205,"x":1151625600000,"y":1.21629018161205},{"date":"2006
-07-31","strat":"strat_growth","growth":1.22614847487093,"x":1154304000000,"y":1.22614847487093},{"date":"2006-08-31","strat":"strat_growth","growth":1.25739740444857,"x":1156982400000,"y":1.25739740444857},{"date":"2006-09-30","strat":"strat_growth","growth":1.26018748744749,"x":1159574400000,"y":1.26018748744749},{"date":"2006-10-31","strat":"strat_growth","growth":1.30743289288219,"x":1162252800000,"y":1.30743289288219},{"date":"2006-11-30","strat":"strat_growth","growth":1.3476100880515,"x":1164844800000,"y":1.3476100880515},{"date":"2006-12-31","strat":"strat_growth","growth":1.39071907437376,"x":1167523200000,"y":1.39071907437376},{"date":"2007-01-31","strat":"strat_growth","growth":1.41009265339585,"x":1170201600000,"y":1.41009265339585},{"date":"2007-02-28","strat":"strat_growth","growth":1.40895303109947,"x":1172620800000,"y":1.40895303109947},{"date":"2007-03-31","strat":"strat_growth","growth":1.44845993733853,"x":1175299200000,"y":1.44845993733853},{"date":"2007-04-30","strat":"strat_growth","growth":1.50354168161176,"x":1177891200000,"y":1.50354168161176},{"date":"2007-05-31","strat":"strat_growth","growth":1.53905990981351,"x":1180569600000,"y":1.53905990981351},{"date":"2007-06-30","strat":"strat_growth","growth":1.53412154653464,"x":1183161600000,"y":1.53412154653464},{"date":"2007-07-31","strat":"strat_growth","growth":1.49898319242898,"x":1185840000000,"y":1.49898319242898},{"date":"2007-08-31","strat":"strat_growth","growth":1.48948633996733,"x":1188518400000,"y":1.48948633996733},{"date":"2007-09-30","strat":"strat_growth","growth":1.56869008949212,"x":1191110400000,"y":1.56869008949212},{"date":"2007-10-31","strat":"strat_growth","growth":1.63535799376764,"x":1193788800000,"y":1.63535799376764},{"date":"2007-11-30","strat":"strat_growth","growth":1.57609763441313,"x":1196380800000,"y":1.57609763441313},{"date":"2007-12-31","strat":"strat_growth","growth":1.57602577791765,"x":1199059200000,"y":1.57602577791765},{"date":"2008-01-31","strat":"strat
_growth","growth":1.61247828676644,"x":1201737600000,"y":1.61247828676644},{"date":"2008-02-29","strat":"strat_growth","growth":1.60989955841388,"x":1204243200000,"y":1.60989955841388},{"date":"2008-03-31","strat":"strat_growth","growth":1.61198393326494,"x":1206921600000,"y":1.61198393326494},{"date":"2008-04-30","strat":"strat_growth","growth":1.61731034323042,"x":1209513600000,"y":1.61731034323042},{"date":"2008-05-31","strat":"strat_growth","growth":1.59686217741395,"x":1212192000000,"y":1.59686217741395},{"date":"2008-06-30","strat":"strat_growth","growth":1.59389002219241,"x":1214784000000,"y":1.59389002219241},{"date":"2008-07-31","strat":"strat_growth","growth":1.60018377684575,"x":1217462400000,"y":1.60018377684575},{"date":"2008-08-31","strat":"strat_growth","growth":1.61206961739563,"x":1220140800000,"y":1.61206961739563},{"date":"2008-09-30","strat":"strat_growth","growth":1.58367753054887,"x":1222732800000,"y":1.58367753054887},{"date":"2008-10-31","strat":"strat_growth","growth":1.54759686189933,"x":1225411200000,"y":1.54759686189933},{"date":"2008-11-30","strat":"strat_growth","growth":1.59450791561737,"x":1228003200000,"y":1.59450791561737},{"date":"2008-12-31","strat":"strat_growth","growth":1.70058923468343,"x":1230681600000,"y":1.70058923468343},{"date":"2009-01-31","strat":"strat_growth","growth":1.6668059360668,"x":1233360000000,"y":1.6668059360668},{"date":"2009-02-28","strat":"strat_growth","growth":1.64922149096558,"x":1235779200000,"y":1.64922149096558},{"date":"2009-03-31","strat":"strat_growth","growth":1.66739421814088,"x":1238457600000,"y":1.66739421814088},{"date":"2009-04-30","strat":"strat_growth","growth":1.67598746729818,"x":1241049600000,"y":1.67598746729818},{"date":"2009-05-31","strat":"strat_growth","growth":1.68790950068041,"x":1243728000000,"y":1.68790950068041},{"date":"2009-06-30","strat":"strat_growth","growth":1.69508047922248,"x":1246320000000,"y":1.69508047922248},{"date":"2009-07-31","strat":"strat_growth","growth":1.70
794233035881,"x":1248998400000,"y":1.70794233035881},{"date":"2009-08-31","strat":"strat_growth","growth":1.73038144969284,"x":1251676800000,"y":1.73038144969284},{"date":"2009-09-30","strat":"strat_growth","growth":1.75503717807023,"x":1254268800000,"y":1.75503717807023},{"date":"2009-10-31","strat":"strat_growth","growth":1.75538259636291,"x":1256947200000,"y":1.75538259636291},{"date":"2009-11-30","strat":"strat_growth","growth":1.82421467190484,"x":1259539200000,"y":1.82421467190484},{"date":"2009-12-31","strat":"strat_growth","growth":1.83704646419643,"x":1262217600000,"y":1.83704646419643},{"date":"2010-01-31","strat":"strat_growth","growth":1.74399780102854,"x":1264896000000,"y":1.74399780102854},{"date":"2010-02-28","strat":"strat_growth","growth":1.74865023418845,"x":1267315200000,"y":1.74865023418845},{"date":"2010-03-31","strat":"strat_growth","growth":1.86030862998732,"x":1269993600000,"y":1.86030862998732},{"date":"2010-04-30","strat":"strat_growth","growth":1.8081349152831,"x":1272585600000,"y":1.8081349152831},{"date":"2010-05-31","strat":"strat_growth","growth":1.66442005302084,"x":1275264000000,"y":1.66442005302084},{"date":"2010-06-30","strat":"strat_growth","growth":1.57831165084671,"x":1277856000000,"y":1.57831165084671},{"date":"2010-07-31","strat":"strat_growth","growth":1.68611146811488,"x":1280534400000,"y":1.68611146811488},{"date":"2010-08-31","strat":"strat_growth","growth":1.61026932717157,"x":1283212800000,"y":1.61026932717157},{"date":"2010-09-30","strat":"strat_growth","growth":1.61040445414104,"x":1285804800000,"y":1.61040445414104},{"date":"2010-10-31","strat":"strat_growth","growth":1.61285322546713,"x":1288483200000,"y":1.61285322546713},{"date":"2010-11-30","strat":"strat_growth","growth":1.61288725477699,"x":1291075200000,"y":1.61288725477699},{"date":"2010-12-31","strat":"strat_growth","growth":1.72066318254467,"x":1293753600000,"y":1.72066318254467},{"date":"2011-01-31","strat":"strat_growth","growth":1.76075497677813,"x":12964
32000000,"y":1.76075497677813},{"date":"2011-02-28","strat":"strat_growth","growth":1.82191890859551,"x":1298851200000,"y":1.82191890859551},{"date":"2011-03-31","strat":"strat_growth","growth":1.82211373564611,"x":1301529600000,"y":1.82211373564611},{"date":"2011-04-30","strat":"strat_growth","growth":1.87488480997287,"x":1304121600000,"y":1.87488480997287},{"date":"2011-05-31","strat":"strat_growth","growth":1.83352271205472,"x":1306800000000,"y":1.83352271205472},{"date":"2011-06-30","strat":"strat_growth","growth":1.81166106709703,"x":1309392000000,"y":1.81166106709703},{"date":"2011-07-31","strat":"strat_growth","growth":1.76858365895056,"x":1312070400000,"y":1.76858365895056},{"date":"2011-08-31","strat":"strat_growth","growth":1.67135463385874,"x":1314748800000,"y":1.67135463385874},{"date":"2011-09-30","strat":"strat_growth","growth":1.55528019419197,"x":1317340800000,"y":1.55528019419197},{"date":"2011-10-31","strat":"strat_growth","growth":1.55724058139326,"x":1320019200000,"y":1.55724058139326},{"date":"2011-11-30","strat":"strat_growth","growth":1.55206424012668,"x":1322611200000,"y":1.55206424012668},{"date":"2011-12-31","strat":"strat_growth","growth":1.5731150728972,"x":1325289600000,"y":1.5731150728972},{"date":"2012-01-31","strat":"strat_growth","growth":1.58452996685071,"x":1327968000000,"y":1.58452996685071},{"date":"2012-02-29","strat":"strat_growth","growth":1.58432172453201,"x":1330473600000,"y":1.58432172453201},{"date":"2012-03-31","strat":"strat_growth","growth":1.57524735338577,"x":1333152000000,"y":1.57524735338577},{"date":"2012-04-30","strat":"strat_growth","growth":1.58952142409785,"x":1335744000000,"y":1.58952142409785},{"date":"2012-05-31","strat":"strat_growth","growth":1.60666672263778,"x":1338422400000,"y":1.60666672263778},{"date":"2012-06-30","strat":"strat_growth","growth":1.60634604039347,"x":1341014400000,"y":1.60634604039347},{"date":"2012-07-31","strat":"strat_growth","growth":1.62822358837955,"x":1343692800000,"y":1.6282235
8837955},{"date":"2012-08-31","strat":"strat_growth","growth":1.62822859086216,"x":1346371200000,"y":1.62822859086216},{"date":"2012-09-30","strat":"strat_growth","growth":1.63253437337613,"x":1348963200000,"y":1.63253437337613},{"date":"2012-10-31","strat":"strat_growth","growth":1.60282512799027,"x":1351641600000,"y":1.60282512799027},{"date":"2012-11-30","strat":"strat_growth","growth":1.61195335033266,"x":1354233600000,"y":1.61195335033266},{"date":"2012-12-31","strat":"strat_growth","growth":1.62640100913864,"x":1356912000000,"y":1.62640100913864},{"date":"2013-01-31","strat":"strat_growth","growth":1.68704065281184,"x":1359590400000,"y":1.68704065281184},{"date":"2013-02-28","strat":"strat_growth","growth":1.66515889459883,"x":1362009600000,"y":1.66515889459883},{"date":"2013-03-31","strat":"strat_growth","growth":1.72839406959367,"x":1364688000000,"y":1.72839406959367},{"date":"2013-04-30","strat":"strat_growth","growth":1.76160059381321,"x":1367280000000,"y":1.76160059381321},{"date":"2013-05-31","strat":"strat_growth","growth":1.70841697885739,"x":1369958400000,"y":1.70841697885739},{"date":"2013-06-30","strat":"strat_growth","growth":1.66232349826352,"x":1372550400000,"y":1.66232349826352},{"date":"2013-07-31","strat":"strat_growth","growth":1.74822713746462,"x":1375228800000,"y":1.74822713746462},{"date":"2013-08-31","strat":"strat_growth","growth":1.695793794358,"x":1377907200000,"y":1.695793794358},{"date":"2013-09-30","strat":"strat_growth","growth":1.74951878445173,"x":1380499200000,"y":1.74951878445173},{"date":"2013-10-31","strat":"strat_growth","growth":1.8065280719669,"x":1383177600000,"y":1.8065280719669},{"date":"2013-11-30","strat":"strat_growth","growth":1.86006929305396,"x":1385769600000,"y":1.86006929305396},{"date":"2013-12-31","strat":"strat_growth","growth":1.90823737000139,"x":1388448000000,"y":1.90823737000139},{"date":"2014-01-31","strat":"strat_growth","growth":1.8409753348148,"x":1391126400000,"y":1.8409753348148},{"date":"2014-02-28
","strat":"strat_growth","growth":1.92476874577775,"x":1393545600000,"y":1.92476874577775},{"date":"2014-03-31","strat":"strat_growth","growth":1.94076488919149,"x":1396224000000,"y":1.94076488919149},{"date":"2014-04-30","strat":"strat_growth","growth":1.95425611616262,"x":1398816000000,"y":1.95425611616262},{"date":"2014-05-31","strat":"strat_growth","growth":1.99960739452054,"x":1401494400000,"y":1.99960739452054},{"date":"2014-06-30","strat":"strat_growth","growth":2.04086603955512,"x":1404086400000,"y":2.04086603955512},{"date":"2014-07-31","strat":"strat_growth","growth":2.01344177180497,"x":1406764800000,"y":2.01344177180497},{"date":"2014-08-31","strat":"strat_growth","growth":2.09289915593276,"x":1409443200000,"y":2.09289915593276},{"date":"2014-09-30","strat":"strat_growth","growth":2.06403557477979,"x":1412035200000,"y":2.06403557477979},{"date":"2014-10-31","strat":"strat_growth","growth":2.11264548781869,"x":1414713600000,"y":2.11264548781869},{"date":"2014-11-30","strat":"strat_growth","growth":2.17068404778395,"x":1417305600000,"y":2.17068404778395},{"date":"2014-12-31","strat":"strat_growth","growth":2.17389986925777,"x":1419984000000,"y":2.17389986925777},{"date":"2015-01-31","strat":"strat_growth","growth":2.21851495920041,"x":1422662400000,"y":2.21851495920041},{"date":"2015-02-28","strat":"strat_growth","growth":2.19866642023818,"x":1425081600000,"y":2.19866642023818},{"date":"2015-03-31","strat":"strat_growth","growth":2.20691769523211,"x":1427760000000,"y":2.20691769523211},{"date":"2015-04-30","strat":"strat_growth","growth":2.19977535027137,"x":1430352000000,"y":2.19977535027137},{"date":"2015-05-31","strat":"strat_growth","growth":2.19016635456374,"x":1433030400000,"y":2.19016635456374},{"date":"2015-06-30","strat":"strat_growth","growth":2.14572647688157,"x":1435622400000,"y":2.14572647688157},{"date":"2015-07-31","strat":"strat_growth","growth":2.19419685879852,"x":1438300800000,"y":2.19419685879852},{"date":"2015-08-31","strat":"strat_gro
wth","growth":2.06046029966032,"x":1440979200000,"y":2.06046029966032},{"date":"2015-09-30","strat":"strat_growth","growth":2.0771742486068,"x":1443571200000,"y":2.0771742486068},{"date":"2015-10-31","strat":"strat_growth","growth":2.07859682797985,"x":1446249600000,"y":2.07859682797985},{"date":"2015-11-30","strat":"strat_growth","growth":2.07051027172711,"x":1448841600000,"y":2.07051027172711},{"date":"2015-12-31","strat":"strat_growth","growth":2.0349404854165,"x":1451520000000,"y":2.0349404854165},{"date":"2016-01-31","strat":"strat_growth","growth":1.9336356442204,"x":1454198400000,"y":1.9336356442204},{"date":"2016-02-29","strat":"strat_growth","growth":1.93203060949258,"x":1456704000000,"y":1.93203060949258},{"date":"2016-03-31","strat":"strat_growth","growth":1.94893281336861,"x":1459382400000,"y":1.94893281336861},{"date":"2016-04-30","strat":"strat_growth","growth":1.95389693435084,"x":1461974400000,"y":1.95389693435084},{"date":"2016-05-31","strat":"strat_growth","growth":1.95416770723623,"x":1464652800000,"y":1.95416770723623},{"date":"2016-06-30","strat":"strat_growth","growth":1.99198905499764,"x":1467244800000,"y":1.99198905499764},{"date":"2016-07-31","strat":"strat_growth","growth":2.00283469215588,"x":1469923200000,"y":2.00283469215588},{"date":"2016-08-31","strat":"strat_growth","growth":1.99852375492778,"x":1472601600000,"y":1.99852375492778},{"date":"2016-09-30","strat":"strat_growth","growth":1.99954459045234,"x":1475193600000,"y":1.99954459045234},{"date":"2016-10-31","strat":"strat_growth","growth":1.96487842210174,"x":1477872000000,"y":1.96487842210174},{"date":"2016-11-30","strat":"strat_growth","growth":1.91446753446935,"x":1480464000000,"y":1.91446753446935},{"date":"2016-12-31","strat":"strat_growth","growth":1.95329907428658,"x":1483142400000,"y":1.95329907428658},{"date":"2017-01-31","strat":"strat_growth","growth":1.98825275521061,"x":1485820800000,"y":1.98825275521061},{"date":"2017-02-28","strat":"strat_growth","growth":2.0663742320
784,"x":1488240000000,"y":2.0663742320784},{"date":"2017-03-31","strat":"strat_growth","growth":2.06897387199815,"x":1490918400000,"y":2.06897387199815},{"date":"2017-04-30","strat":"strat_growth","growth":2.08951089948863,"x":1493510400000,"y":2.08951089948863},{"date":"2017-05-31","strat":"strat_growth","growth":2.11899996460198,"x":1496188800000,"y":2.11899996460198},{"date":"2017-06-30","strat":"strat_growth","growth":2.13250711646859,"x":1498780800000,"y":2.13250711646859},{"date":"2017-07-31","strat":"strat_growth","growth":2.1890905108188,"x":1501459200000,"y":2.1890905108188},{"date":"2017-08-31","strat":"strat_growth","growth":2.18810929588631,"x":1504137600000,"y":2.18810929588631},{"date":"2017-09-30","strat":"strat_growth","growth":2.23978661558091,"x":1506729600000,"y":2.23978661558091},{"date":"2017-10-31","strat":"strat_growth","growth":2.27739985459587,"x":1509408000000,"y":2.27739985459587},{"date":"2017-11-30","strat":"strat_growth","growth":2.29309929349177,"x":1512000000000,"y":2.29309929349177},{"date":"2017-12-31","strat":"strat_growth","growth":2.32410541140168,"x":1514678400000,"y":2.32410541140168},{"date":"2018-01-31","strat":"strat_growth","growth":2.440789981196,"x":1517356800000,"y":2.440789981196},{"date":"2018-02-28","strat":"strat_growth","growth":2.32278320664574,"x":1519776000000,"y":2.32278320664574},{"date":"2018-03-31","strat":"strat_growth","growth":2.30328068647944,"x":1522454400000,"y":2.30328068647944},{"date":"2018-04-30","strat":"strat_growth","growth":2.3383191125374,"x":1525046400000,"y":2.3383191125374},{"date":"2018-05-31","strat":"strat_growth","growth":2.29402525318104,"x":1527724800000,"y":2.29402525318104},{"date":"2018-06-30","strat":"strat_growth","growth":2.30722776347155,"x":1530316800000,"y":2.30722776347155},{"date":"2018-07-31","strat":"strat_growth","growth":2.39270269351686,"x":1532995200000,"y":2.39270269351686},{"date":"2018-08-31","strat":"strat_growth","growth":2.46907730762835,"x":1535673600000,"y":2.4
6907730762835},{"date":"2018-09-30","strat":"strat_growth","growth":2.48376408977902,"x":1538265600000,"y":2.48376408977902},{"date":"2018-10-31","strat":"strat_growth","growth":2.31212532889691,"x":1540944000000,"y":2.31212532889691},{"date":"2018-11-30","strat":"strat_growth","growth":2.35501366038692,"x":1543536000000,"y":2.35501366038692},{"date":"2018-12-31","strat":"strat_growth","growth":2.14792267778483,"x":1546214400000,"y":2.14792267778483}],"type":"line"}],"xAxis":{"type":"datetime","title":{"text":"date"},"categories":null},"tooltip":{"pointFormat":"{point.strat}: ${point.growth: .2f}"}},"theme":{"chart":{"backgroundColor":"transparent"}},"conf_opts":{"global":{"Date":null,"VMLRadialGradientURL":"http =//code.highcharts.com/list(version)/gfx/vml-radial-gradient.png","canvasToolsURL":"http =//code.highcharts.com/list(version)/modules/canvas-tools.js","getTimezoneOffset":null,"timezoneOffset":0,"useUTC":true},"lang":{"contextButtonTitle":"Chart context menu","decimalPoint":".","downloadJPEG":"Download JPEG image","downloadPDF":"Download PDF document","downloadPNG":"Download PNG image","downloadSVG":"Download SVG vector image","drillUpText":"Back to {series.name}","invalidDate":null,"loading":"Loading...","months":["January","February","March","April","May","June","July","August","September","October","November","December"],"noData":"No data to display","numericSymbols":["k","M","G","T","P","E"],"printChart":"Print chart","resetZoom":"Reset zoom","resetZoomTitle":"Reset zoom level 1:1","shortMonths":["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"thousandsSep":" ","weekdays":["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]}},"type":"chart","fonts":[],"debug":false},"evals":[],"jsHooks":[]}</script>
<p>Our momentum strategy did not grow as much as a pure buy-hold strategy. Let’s check some summary statistics by using <code>tq_performance()</code> and <code>table.Stats</code>.</p>
<pre class="r"><code>strat_returns %>%
ungroup() %>%
select(date, strat_returns, bench_returns) %>%
gather(strats, returns, -date) %>%
group_by(strats) %>%
tq_performance(Ra = returns,
performance_fun = table.Stats) %>%
gather(stat, value, -strats)</code></pre>
<pre><code># A tibble: 32 x 3
# Groups: strats [2]
strats stat value
<chr> <chr> <dbl>
1 strat_returns ArithmeticMean 0.0046
2 bench_returns ArithmeticMean 0.0064
3 strat_returns GeometricMean 0.0042
4 bench_returns GeometricMean 0.0059
5 strat_returns Kurtosis 1.01
6 bench_returns Kurtosis 2.57
7 strat_returns LCLMean(0.95) 0.0005
8 bench_returns LCLMean(0.95) 0.0018
9 strat_returns Maximum 0.0834
10 bench_returns Maximum 0.0876
# … with 22 more rows</code></pre>
<p>Thus far, our strategy seems inferior to buy-hold. Let’s check the max drawdowns with the <code>table.DownsideRisk</code> function.</p>
<pre class="r"><code>strat_returns %>%
ungroup() %>%
select(date, strat_returns, bench_returns) %>%
gather(strats, returns, -date) %>%
group_by(strats) %>%
tq_performance(Ra = returns,
performance_fun = table.DownsideRisk) %>%
gather(stat, value, -strats)</code></pre>
<pre><code># A tibble: 22 x 3
# Groups: strats [2]
strats stat value
<chr> <chr> <dbl>
1 strat_returns DownsideDeviation(0%) 0.0184
2 bench_returns DownsideDeviation(0%) 0.0213
3 strat_returns DownsideDeviation(MAR=10%) 0.0225
4 bench_returns DownsideDeviation(MAR=10%) 0.025
5 strat_returns DownsideDeviation(Rf=0%) 0.0184
6 bench_returns DownsideDeviation(Rf=0%) 0.0213
7 strat_returns GainDeviation 0.0176
8 bench_returns GainDeviation 0.0176
9 strat_returns HistoricalES(95%) -0.0622
10 bench_returns HistoricalES(95%) -0.0727
# … with 12 more rows</code></pre>
<p>If we isolate the <code>MaximumDrawdown</code> and VaR measures, we see this:</p>
<pre class="r"><code>strat_returns %>%
ungroup() %>%
select(date, strat_returns, bench_returns) %>%
gather(strats, returns, -date) %>%
group_by(strats) %>%
tq_performance(Ra = returns,
performance_fun = table.DownsideRisk) %>%
select(MaximumDrawdown, contains("VaR"))</code></pre>
<pre><code># A tibble: 2 x 4
# Groups: strats [2]
strats MaximumDrawdown `HistoricalVaR(95%)` `ModifiedVaR(95%)`
<chr> <dbl> <dbl> <dbl>
1 strat_returns 0.172 -0.0482 -0.0435
2 bench_returns 0.422 -0.0496 -0.0503</code></pre>
<p>The momentum strat didn’t grow as much as buy-hold, but it had a far smaller max drawdown.</p>
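<p>For readers unfamiliar with the statistic, maximum drawdown is the largest peak-to-trough fall in cumulative wealth over the period. The base-R sketch below illustrates the idea on made-up returns. The <code>max_drawdown()</code> helper and the numbers are hypothetical and written only for illustration; they are not the code that produced the table above.</p>
<pre class="r"><code># Hypothetical monthly returns, for illustration only
returns <- c(0.10, -0.20, 0.05, -0.10, 0.30)

max_drawdown <- function(returns) {
  # Wealth index, starting from 1 unit invested
  wealth <- cumprod(1 + returns)
  # Highest wealth seen so far, counting the initial investment as a peak
  running_peak <- cummax(c(1, wealth))[-1]
  # Fractional fall from the running peak at each date
  drawdown <- 1 - wealth / running_peak
  max(drawdown)
}

max_drawdown(returns)</code></pre>
<p>In this toy series, wealth peaks at 1.10 after the first month and bottoms out at 0.8316 after the fourth, a drawdown of about 24.4%.</p>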
<p>That’s all for today - thanks for reading!</p>
<p>A couple of plugs before we close:</p>
<p>If you like this sort of thing, check out my book, <a href="https://www.amazon.com/Reproducible-Finance-Portfolio-Analysis-Chapman/dp/1138484032">Reproducible Finance with R</a>.</p>
<p>This is not specific to finance, but several of the <code>stringr</code> and <code>ggplot</code> tricks in this post came from this awesome <a href="https://university.business-science.io/p/ds4b-101-r-business-analysis-r">Business Science University course</a>.</p>
<p>Want to get into Momentum Investing? Have a look at this book <a href="https://alphaarchitect.com/book/">Quantitative Momentum</a> first.</p>
<p>Happy coding!</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>Fama, E. and K. French, 2008, Dissecting Anomalies, The Journal of Finance, 63, pg. 1653-1678.<a href="#fnref1" class="footnote-back">↩</a></p></li>
</ol>
</div>
Analysing the HIV pandemic, Part 4: Classification of lab samples
https://rviews.rstudio.com/2019/05/23/pipeline-for-analysing-hiv-part-4/
Thu, 23 May 2019 00:00:00 +0000https://rviews.rstudio.com/2019/05/23/pipeline-for-analysing-hiv-part-4/
<p><em>Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio</em></p>
<p><em>Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the <a href="https://www.ufs.ac.za/health/departments-and-divisions/virology-home">Division of Virology</a>, <a href="https://www.ufs.ac.za">University of the Free State</a>, and <a href="http://www.nhls.ac.za/">National Health Laboratory Service (NHLS)</a>, Bloemfontein, South Africa</em></p>
<p>In this post we complete our series on analysing the HIV pandemic in Africa. Previously we covered the bigger picture of <a href="https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/">HIV infection in Africa</a>, and a <a href="https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/">pipeline for drug resistance testing</a> of samples in the lab.</p>
<p>Then, in <a href="https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/">part 3</a> we saw that sometimes the same patient’s genotype must be repeatedly analysed in the lab, from samples taken years apart.</p>
<blockquote>
<p>Let’s say we have genotyped a patient five years ago and we have a current genotype sequence. It should be possible to retrieve the previous sequence from a database of sequences without relying on identifiers only or at all. Sometimes when someone remarries they may change their surname or transcription errors can be made, which makes finding previous samples tedious and error-prone. So instead of using patient information to look for previous samples to include, we can rather use the sequence data itself and then confirm the sequences belong to the same patient or investigate any irregularities. If we suspect mother-to-child transmission from our analysis, we confirm this with the healthcare worker who sent the sample.</p>
</blockquote>
<p>In this final part, we discuss how the inter- and intra-patient HIV genetic distances were analyzed using logistic regression to gain insights into the probability distribution of these two classes. In other words, the goal is to find a way to tell whether two genetic samples are from the same person or from two different people.</p>
<p>Samples from the same person can have slightly different genetic sequences, due to mutations and other errors. A way of telling the two classes apart is therefore especially useful when comparing samples of genetic material from retroviruses.</p>
<div id="preliminary-analysis" class="section level2">
<h2>Preliminary analysis</h2>
<p>To help answer this question, we downloaded data from the <a href="https://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html">Los Alamos HIV sequence database</a> (specifically, <em>Virus HIV-1, subtype C, genetic region POL CDS</em>).</p>
<p>Each observation is the (dis)similarity distance between different samples.</p>
<pre class="r"><code>library(readr)
library(dplyr)
library(ggplot2)</code></pre>
<pre><code>## Warning: package 'ggplot2' was built under R version 3.5.2</code></pre>
<pre class="r"><code>pt_distance <-
read_csv("dist_sample_10.csv.zip", col_types = "ccdccf")
head(pt_distance)</code></pre>
<pre><code>## # A tibble: 6 x 6
## sample1 sample2 distance sub area type
## <chr> <chr> <dbl> <chr> <chr> <fct>
## 1 KI_797.67744.AB874124… KI_481.67593.AB873933.… 0.0644 B INT Inter
## 2 502-2794.39696.JF3202… WC3.27170.EF175209.B.U… 0.0418 B INT Inter
## 3 KI_882.67653.AB874186… KI_813.67589.AB874131.… 0.0347 B INT Inter
## 4 HTM360.13332.DQ322231… C11-2069070.63977.AB87… 0.0487 B INT Inter
## 5 O5598.34737.GQ372062.… LM49.4011.AF086817.B.T… 0.0360 B INT Inter
## 6 GKN.45901.HQ026515.B.… C11-2069083.65198.AB87… 0.0699 B INT Inter</code></pre>
<p>Next, we plot a histogram of the distance between samples. It shows that the distance between samples from the same subject (intra-patient) is typically smaller than the distance between samples from different subjects (inter-patient). This is not surprising.</p>
<p>However, the histogram also shows that there is no sharp demarcation between the two types. Eyeballing the data suggests that one could use an arbitrary threshold of around 0.025 to indicate whether two samples are from the same person or from different people.</p>
<pre class="r"><code>pt_distance %>%
mutate(
type = forcats::fct_rev(type)
) %>%
ggplot(aes(x = distance, fill = type)) +
geom_histogram(binwidth = 0.001) +
facet_grid(rows = vars(type), scales = "free_y") +
scale_fill_manual(values = c("red", "blue")) +
coord_cartesian(xlim = c(0, 0.1)) +
ggtitle("Histogram of phylogenetic distance by type")</code></pre>
<p><img src="/post/2019/2019-05-21_analysing-hiv-pandemic-part-4/2019-05-21-analysing-hiv-pandemic-part-4_files/figure-html/histogram-1.png" width="672" /></p>
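<p>To make the idea of an eyeballed threshold concrete, here is a minimal sketch of how such a cutoff would be applied and scored. The eight distances below are invented for illustration and are not taken from the Los Alamos data; only the 0.025 cutoff comes from the discussion above.</p>
<pre class="r"><code># Invented toy observations, for illustration only
toy <- data.frame(
  distance = c(0.002, 0.008, 0.015, 0.030, 0.021, 0.034, 0.047, 0.062),
  type     = rep(c("Intra", "Inter"), each = 4)
)

# Apply the eyeballed cutoff: small distances are called intra-patient
cutoff <- 0.025
toy$predicted <- ifelse(toy$distance < cutoff, "Intra", "Inter")

# Cross-tabulate the actual type against the predicted type
table(actual = toy$type, predicted = toy$predicted)

# Share of pairs classified correctly
mean(toy$predicted == toy$type)</code></pre>
<p>Any fixed cutoff misclassifies the pairs that fall in the overlap between the two distributions, which is one motivation for fitting a model instead.</p>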
</div>
<div id="modeling" class="section level2">
<h2>Modeling</h2>
<p>Since we have <strong>two</strong> sample types (intra-patient vs inter-patient), this is a binary classification problem.</p>
<p><a href="https://en.wikipedia.org/wiki/Logistic_regression">Logistic regression</a> is a simple algorithm for binary classification, and a special case of a <a href="https://en.wikipedia.org/wiki/Generalized_linear_model">generalized linear model</a> (<strong>GLM</strong>). In <strong>R</strong>, you can use the <code>glm()</code> function to fit a GLM, and to specify a logistic regression, use the <code>family = binomial</code> argument.</p>
<p>In this case we want to train a model with <code>distance</code> as the independent variable and <code>type</code> as the dependent variable, i.e. <code>type ~ distance</code>.</p>
<p>We train on 100,000 (<code>n = 1e5</code>) observations purely to reduce computation time:</p>
<pre class="r"><code>pt_sample <-
pt_distance %>%
sample_n(1e5)
model <- glm(type ~ distance, data = pt_sample, family = binomial)</code></pre>
<pre><code>## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred</code></pre>
<p>(Note that sometimes the model throws a warning indicating numerical problems. This happens because the overlap between intra and inter is very small. If there is a very sharp dividing line between classes, the logistic regression algorithm has trouble converging.)</p>
<p>However, in this case the numerical problems don’t actually cause a practical problem with the model itself.</p>
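<p>The effect of a sharp dividing line is easy to reproduce on toy data. The sketch below fits a logistic regression to ten invented points whose classes do not overlap at all, and collects any warnings that <code>glm()</code> raises. The data and the variable names are hypothetical.</p>
<pre class="r"><code># Toy data with no overlap: every x below 5.5 is class 0, every x above is class 1
x <- 1:10
y <- rep(c(0, 1), each = 5)

# Collect warnings instead of printing them
glm_warnings <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    glm_warnings <<- c(glm_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

glm_warnings
coef(fit)</code></pre>
<p>With fully separated classes, the estimated slope keeps growing from one iteration to the next, and that is the numerical symptom the warning points to.</p>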
<p>The model summary tells us that the <code>distance</code> variable is highly significant (indicated by the ***):</p>
<pre class="r"><code>summary(model)</code></pre>
<pre><code>##
## Call:
## glm(formula = type ~ distance, family = binomial, data = pt_sample)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.4035 -0.0050 -0.0010 -0.0002 8.4904
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.7887 0.1796 32.23 <2e-16 ***
## distance -355.1454 9.3247 -38.09 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 23659.2 on 99999 degrees of freedom
## Residual deviance: 1440.5 on 99998 degrees of freedom
## AIC: 1444.5
##
## Number of Fisher Scoring iterations: 12</code></pre>
<p>Now we can use the model to compute a prediction for a range of genetic distances (from 0 to 0.05) and create a plot.</p>
<pre class="r"><code>newdata <- data.frame(distance = seq(0, 0.05, by = 0.001))
pred <- predict(model, newdata, type = "response")</code></pre>
<pre class="r"><code>plot_inter <-
pt_sample %>%
filter(distance <= 0.05, type == "Inter") %>%
sample_n(2000)
plot_intra <-
pt_sample %>%
filter(distance <= 0.05, type == "Intra") %>%
sample_n(2000)
threshold <- with(newdata, approx(pred, distance, xout = 0.5))$y
ggplot() +
geom_point(data = plot_inter, aes(x = distance, y = 0), alpha = 0.05, col = "blue") +
geom_point(data = plot_intra, aes(x = distance, y = 1), alpha = 0.05, col = "red") +
geom_rug(data = plot_inter, aes(x = distance, y = 0), col = "blue") +
geom_rug(data = plot_intra, aes(x = distance, y = 0), col = "red") +
geom_line(data = newdata, aes(x = distance, y = pred)) +
annotate(x = 0.005, y = 0.9, label = "Type == intra", geom = "text", col = "red") +
annotate(x = 0.04, y = 0.1, label = "Type == inter", geom = "text", col = "blue") +
geom_vline(xintercept = threshold, col = "grey50") +
ggtitle("Model results", subtitle = "Predicted probability that Type == 'Intra'") +
xlab("Phylogenetic distance") +
ylab("Probability")</code></pre>
<p><img src="/post/2019/2019-05-21_analysing-hiv-pandemic-part-4/2019-05-21-analysing-hiv-pandemic-part-4_files/figure-html/predictionplot-1.png" width="672" /></p>
<p>Logistic regression essentially fits an s-curve that indicates the probability. In this case, for small distances (lower than ~0.01) the probability of being the same person (i.e., type is intra) is almost 100%. For distances greater than 0.03 the probability of being type intra is almost zero (i.e., the model predicts type inter).</p>
<p>The model puts the distance threshold at approximately 0.016.</p>
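<p>The threshold can also be read directly off the coefficients: the fitted probability equals 0.5 where the linear predictor is zero, i.e. at a distance of minus the intercept divided by the slope. A quick check using the rounded estimates printed in the model summary (the variable names here are illustrative):</p>
<pre class="r"><code># Rounded coefficient estimates copied from the model summary
intercept <- 5.7887
slope <- -355.1454

# Distance at which the linear predictor is zero, i.e. P(intra) = 0.5
threshold <- -intercept / slope
threshold

# Sanity check: the predicted probability at the threshold
plogis(intercept + slope * threshold)</code></pre>
<p>This gives roughly 0.0163, in line with the value of approximately 0.016 obtained by interpolation.</p>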
</div>
<div id="the-practical-value-of-this-work" class="section level2">
<h2>The practical value of this work</h2>
<p>In part 2, we discussed how <a href="https://journals.plos.org/plosone/article/authors?id=10.1371/journal.pone.0213241">researchers</a> developed an automated pipeline of phylogenetic analysis. The project was designed to run on the Raspberry Pi, a very low-cost computing device. This means that the cost of implementing the project is low, and the project has been implemented at the <a href="http://www.nhls.ac.za/">National Health Laboratory Service (NHLS)</a> in South Africa.</p>
<p>In this part, we described the very simple logistic regression model that runs as part of the pipeline. In addition to the descriptive analysis, e.g., heat maps and trees (as described in part 3), this logistic regression predicts whether two samples were obtained from the same person or from two different people. This prediction helps the laboratory staff to identify potential contamination of samples, or indeed to match samples from people who weren’t matched properly by their name and other identifying information (e.g., through spelling mistakes or name changes).</p>
<p>Finally, it’s interesting to note that traditionally the decision whether two samples were intra-patient or inter-patient was made on heuristics, instead of modelling. For example, a heuristic might say that if the genetic distance between two samples is less than 0.01, they should be considered a match from a single person.</p>
<p>Heuristics are easy to implement in the lab, but the origin of a heuristic can get lost over time. It is then possible that the heuristic is no longer applicable to the sample population.</p>
<p>This modelling gave the researchers a tool to establish confidence intervals around predictions. In addition, it is now possible to repeat the model for many different local sample populations of interest, and thus have a tool that is better able to discriminate given the most recent data.</p>
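<p>As an illustration of what a confidence interval around a prediction can look like, the sketch below builds approximate 95% Wald intervals from the standard errors that <code>predict()</code> returns for a <code>glm</code> fit. This is one common approach and not necessarily the one used in the pipeline. The data are simulated, and the coefficients used to simulate them are arbitrary assumptions.</p>
<pre class="r"><code>set.seed(1)

# Simulated stand-in for the real data: 1 = intra-patient, 0 = inter-patient
n <- 2000
distance <- runif(n, min = 0, max = 0.05)
intra <- rbinom(n, size = 1, prob = plogis(4 - 250 * distance))

fit <- glm(intra ~ distance, family = binomial)

# Predict on the link (log-odds) scale, where the normal approximation is applied
grid <- data.frame(distance = seq(0, 0.05, by = 0.01))
link <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Build the interval on the link scale, then map back to probabilities
grid$estimate <- plogis(link$fit)
grid$lower <- plogis(link$fit - 1.96 * link$se.fit)
grid$upper <- plogis(link$fit + 1.96 * link$se.fit)
grid</code></pre>
<p>Constructing the interval on the link scale and transforming it afterwards keeps the bounds between 0 and 1.</p>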
</div>
<div id="conclusion" class="section level2">
<h2>Conclusion</h2>
<p>In this multi-part series on HIV in Africa we covered four topics:</p>
<ul>
<li>In <a href="https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/">part 1</a>, we analysed the incidence of HIV in sub-Saharan Africa, with special mention of the effect of the widespread availability of anti-retroviral (ARV) drugs during 2004. Since then, there has been a rapid decline in HIV infection rates in South Africa.</li>
<li>In <a href="https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/">part 2</a>, we described the PhyloPi project - a phylogenetic pipeline to analyse HIV in the lab, available for the low-cost Raspberry Pi. This work was published in the <a href="https://journals.plos.org/plosone/">PLoS ONE journal</a>: “<a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241">PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility</a>”</li>
<li>Then, <a href="https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/">part 3</a> described the biological mechanism by which HIV mutates, how this can be modeled using a Markov chain, and how the results can be visualized as heat maps and phylogenetic trees.</li>
<li>This final part covered how we used a very simple logistic regression model to identify if two samples in the lab came from the same person or two different people.</li>
</ul>
</div>
<div id="closing-thoughts" class="section level2">
<h2>Closing thoughts</h2>
<p>Dear readers,</p>
<p>I hope that you enjoyed this series on ‘Analysing the HIV pandemic’ using R and some of the tools available as part of the <a href="https://www.tidyverse.org/"><code>tidyverse</code></a> packages. Learning R provided me not only with a tool set to analyse data problems, but also a <a href="https://stackoverflow.com/questions/tagged/r">community</a>. Being a biologist, I was not sure of the best approach for solving the problem of inter- and intra-patient genetic distances. I contacted Andrie from <a href="https://resources.rstudio.com/authors/andrie-de-vries">RStudio</a>, and not only did he help us with this, but he was also excited about it. It was a pleasure telling you about our journey on this blog site, and a privilege doing this with experts.</p>
<p>Armand</p>
</div>
Analysing the HIV pandemic, Part 3: Genetic diversity
https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/
Thu, 16 May 2019 00:00:00 +0000https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/
<script src="/rmarkdown-libs/htmlwidgets/htmlwidgets.js"></script>
<script src="/rmarkdown-libs/plotly-binding/plotly.js"></script>
<script src="/rmarkdown-libs/typedarray/typedarray.min.js"></script>
<script src="/rmarkdown-libs/jquery/jquery.min.js"></script>
<link href="/rmarkdown-libs/crosstalk/css/crosstalk.css" rel="stylesheet" />
<script src="/rmarkdown-libs/crosstalk/js/crosstalk.min.js"></script>
<link href="/rmarkdown-libs/plotly-htmlwidgets-css/plotly-htmlwidgets.css" rel="stylesheet" />
<script src="/rmarkdown-libs/plotly-main/plotly-latest.min.js"></script>
<p><em>Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the <a href="https://www.ufs.ac.za/health/departments-and-divisions/virology-home">Division of Virology</a>, <a href="https://www.ufs.ac.za">University of the Free State</a>, and <a href="http://www.nhls.ac.za/">National Health Laboratory Service (NHLS)</a>, Bloemfontein, South Africa</em></p>
<p><em>Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio</em></p>
<div id="recap" class="section level2">
<h2>Recap</h2>
<p>In <a href="https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/">part 2 of this series</a>, we discussed the <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241">PhyloPi</a> pipeline for conducting routine HIV phylogenetics in the drug-resistance testing laboratory as a part of quality control. As mentioned, during HIV replication the error-prone viral reverse transcriptase (RT) converts its RNA genome into DNA before it can be integrated into the host cell genome. During this conversion, the enzyme makes random mistakes in the copying process. These mistakes, or mutations, can be deleterious, beneficial or may have no measurable impact on the replicative fitness of the virus. However, the fast rate of mutation provides enough divergence to be useful for phylogenetic analysis.</p>
</div>
<div id="introduction" class="section level2">
<h2>Introduction</h2>
<p>As infections spread from person to person, the virus continues to mutate and become more and more divergent. This allows us to use the genetic information we obtain while doing the drug resistance test and analyse the sequences for abnormalities.</p>
<p>We showed how DNA sequences can be aligned and how, based on the composition of ‘columns’ in these strings, a distance matrix can be calculated by comparing each string against every other. In the example we discussed in part 2, we had a very simple method for calculating matches, i.e., we used either a one or zero. We can get closer to the truth by using substitution models, as we will explain below. Many machine learning algorithms require that one first calculate the distance between every pair of observations, and the choice of algorithm is up to the analyst. Phylogenetic inference is very similar in that a distance matrix needs to be constructed, from which the tree can be calculated.</p>
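<p>As a reminder of what the simple one-or-zero scoring amounts to, the sketch below computes the proportion of mismatching positions for every pair of sequences. The three sequences and the <code>p_distance()</code> helper are hypothetical and assume the sequences are already aligned and of equal length.</p>
<pre class="r"><code># Three short, already aligned toy sequences (hypothetical)
seqs <- c(s1 = "ACGTACGT", s2 = "ACGTACGA", s3 = "ACGAACGA")

# Proportion of aligned positions at which two sequences differ
p_distance <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  mean(a != b)
}

# Pairwise distance matrix: each sequence compared against every other
dist_matrix <- sapply(seqs, function(a) sapply(seqs, function(b) p_distance(a, b)))
dist_matrix</code></pre>
<p>Every mismatch counts the same here, whatever the substitution. Substitution models refine exactly this part of the calculation.</p>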
<p>If the sequence targeted for phylogenetic inference is very stable with little or no evolution, the distances calculated will be zero or very close to it. This will not allow for differentiation. However, as we mentioned, HIV has a very fast rate of evolution due to its error-prone reverse transcriptase.</p>
<p><a href="https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002251">Cuevas</a> <em>et al.</em> (2015) published work on the <em>in vivo</em> rate of HIV evolution. Their analysis revealed a mutation rate of <span class="math inline">\(4.1 \cdot 10^{-3}\)</span> (<span class="math inline">\(sd=1.7 \cdot 10^{-3}\)</span>), the highest of any biological entity. However, the error-prone reverse transcriptase is not the only mechanism of mutation. One defence against HIV infection is an enzyme called apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like or <strong><a href="https://en.wikipedia.org/wiki/APOBEC3G">APOBEC</a></strong>. These enzymes act on RNA and convert or mutate cytidine to uridine (uridine in RNA is the thymidine counterpart in DNA). This results in a G to A mutation on the cDNA.</p>
<p>As Cuevas <em>et al.</em> also showed, these enzymes are not equally active in all people. On the other hand, the viral Vif protein inhibits this hypermutation by ‘tagging’ the APOBEC protein with ubiquitin for degradation by the cytoplasmic ubiquitin-dependent proteasome machinery.</p>
<p>But how does this virus-driven mutation, or APOBEC-driven hypermutation, affect the virus in a negative (or positive) way?</p>
<p>We first need to understand how RNA is translated into proteins. Below is a table showing the codon combinations for each of the 20 amino acids.</p>
<div class="figure" style="text-align: center"><span id="fig:codons"></span>
<img src="/post/2019-05-14-analysis-hiv-pandemic-part-3_files/codon-table-by-sabal-edu.jpg" alt="Amino acid encoding. Available at https://www.biologyjunction.com/protein-synthesis-worksheet/" width="80%" style="margin:50px 10px" />
<p class="caption">
Figure 1: Amino acid encoding. Available at <a href="https://www.biologyjunction.com/protein-synthesis-worksheet/" class="uri">https://www.biologyjunction.com/protein-synthesis-worksheet/</a>
</p>
</div>
<p>As can be seen from the table above, some amino acids are encoded by more than one codon. For example, if we change the codon CGU to AGA, the resulting amino acid stays Arginine or R. This is referred to as a silent mutation, since the resulting protein will look the same. On the other hand, if we mutate AGU to CGU, the resulting mutation is from Serine to Arginine, or in single-letter notation, <strong>S to R</strong>. A change in the amino acid is referred to as a non-synonymous mutation.</p>
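<p>The distinction can be expressed as a simple lookup. The sketch below uses a three-codon excerpt of the codon table, limited to the codons mentioned above, and a hypothetical <code>classify_mutation()</code> helper.</p>
<pre class="r"><code># A small excerpt of the codon table (RNA codon to single-letter amino acid)
codon_to_aa <- c(CGU = "R", AGA = "R", AGU = "S")

# Compare the amino acids encoded before and after the mutation
classify_mutation <- function(from, to) {
  if (codon_to_aa[[from]] == codon_to_aa[[to]]) "silent" else "non-synonymous"
}

classify_mutation("CGU", "AGA")  # both codons encode R
classify_mutation("AGU", "CGU")  # S becomes R</code></pre>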
</div>
<div id="example" class="section level2">
<h2>Example</h2>
<p>In reality, the APOBEC enzyme recognizes specific RNA sequence motifs, but just to give an idea of how this works, let’s look at an example.</p>
<p>Load some packages:</p>
<pre class="r"><code>library(ape)
library(Biostrings)
library(tibble)
library(tidyr)
library(dplyr)
library(knitr)
library(plotly)
library(RColorBrewer)
library(diagram)</code></pre>
<p>Create an RNA sequence (remember that RNA uses <code>U</code> where DNA uses <code>T</code>):</p>
<pre class="r"><code>WT <- c("CGA", "GUU", "AUA", "GAG", "UGG", "AGU")</code></pre>
<p>The code block above creates the sequence CGAGUUAUAGAGUGGAGU, split into codons for clarity. We can now translate this sequence using the codon table or a function.</p>
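<pre class="r"><code># Translate a vector of RNA codons into single-letter amino acids
translate_dna_sequence <- function(x){
  x %>%
    paste0(collapse = "") %>%   # join the codons into one string
    gsub("U", "T", .) %>%       # convert RNA (U) to DNA (T) notation
    DNAString() %>%             # build a Biostrings DNA object
    as.DNAbin() %>%             # convert to ape's DNAbin format
    trans() %>%                 # translate nucleotides to amino acids
    .[[1]] %>%                  # take the first (only) sequence
    as.character.AAbin()
}
AA <- WT %>% translate_dna_sequence()</code></pre>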
<p>The code block above translated our RNA sequence into a protein sequence: R, V, I, E, W, S.</p>
<p>Now let’s mutate all occurrences of <code>C</code> to <code>U/T</code>:</p>
<pre class="r"><code>MUT <- gsub("C", "U", WT)</code></pre>
<p>The resulting mutant sequence is: UGA, GUU, AUA, GAG, UGG, AGU, and if we now translate that, we get …</p>
<pre class="r"><code>AA <- MUT %>% translate_dna_sequence()</code></pre>
<p>… the protein sequence: *, V, I, E, W, S.</p>
<p>The <code>*</code> means a <em>stop codon</em> was introduced. Stop codons are responsible for terminating translation from RNA to protein. If one of the viral genes has a stop codon in it, the protein will be truncated prematurely and will most likely be dysfunctional. Mutations other than stop codons can also have a negative effect on the virus, or they can cause resistance to an ARV.</p>
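<p>To make the effect of a premature stop concrete, here is a minimal base R sketch that keeps only the residues translated before the first <code>*</code>. The function name <code>truncate_at_stop</code> is hypothetical and not part of the pipeline:</p>
<pre class="r"><code># Keep only the residues translated before the first stop codon ("*")
truncate_at_stop <- function(aa) {
  stop_pos <- match("*", aa)   # position of the first stop, NA if none
  if (is.na(stop_pos)) aa else aa[seq_len(stop_pos - 1)]
}

truncate_at_stop(c("R", "V", "I", "E", "W", "S"))  # wild type: unchanged
truncate_at_stop(c("*", "V", "I", "E", "W", "S"))  # mutant: no residues left</code></pre>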
</div>
<div id="calculating-genetic-distances-from-a-multiple-sequence-alignment-msa" class="section level2">
<h2>Calculating genetic distances from a multiple sequence alignment (MSA)</h2>
<p>In <a href="https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/">part 2</a>, we showed the general principle of an MSA. In biology, sequence alignments are used to look at similarities between DNA or protein sequences. Most phylogenetic analyses require a multiple sequence alignment, and the more accurate the MSA, the more accurate the phylogenetic inference.</p>
<p>First, we read in the multiple sequence alignment file.</p>
<pre class="r"><code># Read in the alignment file
aln <- read.dna('example.aln', format = 'fasta')</code></pre>
<p>Next, we can calculate the distance matrix using the Kimura two-parameter (K80) model. Many DNA substitution models are available; the one we use here is based on <a href="https://en.wikipedia.org/wiki/Markov_chain">Markov chains</a>. Remember:</p>
<blockquote>
<p>“All models are wrong, but some are useful” - <a href="https://en.wikipedia.org/wiki/George_E._P._Box">George Box</a></p>
</blockquote>
<p>This is <strong>very</strong> true when it comes to estimating genetic distances and phylogenetic inference. Consider the image below:</p>
<div class="figure" style="text-align: center"><span id="fig:sumbstetutions"></span>
<img src="/post/2019-05-14-analysis-hiv-pandemic-part-3_files/1024px-All_transitions_and_transversions.svg.png" alt="transversions vs transitions. Available at https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/All_transitions_and_transversions.svg/1024px-All_transitions_and_transversions.svg.png" style="margin:50px 10px" />
<p class="caption">
Figure 2: transversions vs transitions. Available at <a href="https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/All_transitions_and_transversions.svg/1024px-All_transitions_and_transversions.svg.png" class="uri">https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/All_transitions_and_transversions.svg/1024px-All_transitions_and_transversions.svg.png</a>
</p>
</div>
<p>The figure above shows transition and transversion events. <strong>Transitions</strong>, that is, substitutions between <strong>A</strong> and <strong>G</strong> (the purines) or between <strong>C</strong> and <strong>T</strong> (the pyrimidines), are more likely than <strong>transversions</strong> (indicated by the red arrows). The K80 model takes this into account by using separate rates for transitions and transversions, and these rates, or probabilities, are estimated by maximum likelihood.</p>
<p>Let’s see what that looks like:</p>
<pre class="r"><code>tmDNA <- matrix(c(0.8,0.05,0.1,0.05,
0.05,0.8,0.05,0.1,
0.1,0.05,0.8,0.05,
0.05,0.1,0.05,0.8),
nrow = 4, byrow = TRUE)
stateNames <- c("A","C","G", "T")
row.names(tmDNA) <- stateNames; colnames(tmDNA) <- stateNames
tmDNA %>%
kable(
caption = "Example K80 probabilities of transitions or transversions"
)</code></pre>
<table>
<caption><span id="tab:unnamed-chunk-6">Table 1: </span>Example K80 probabilities of transitions or transversions</caption>
<thead>
<tr class="header">
<th></th>
<th align="right">A</th>
<th align="right">C</th>
<th align="right">G</th>
<th align="right">T</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>A</td>
<td align="right">0.80</td>
<td align="right">0.05</td>
<td align="right">0.10</td>
<td align="right">0.05</td>
</tr>
<tr class="even">
<td>C</td>
<td align="right">0.05</td>
<td align="right">0.80</td>
<td align="right">0.05</td>
<td align="right">0.10</td>
</tr>
<tr class="odd">
<td>G</td>
<td align="right">0.10</td>
<td align="right">0.05</td>
<td align="right">0.80</td>
<td align="right">0.05</td>
</tr>
<tr class="even">
<td>T</td>
<td align="right">0.05</td>
<td align="right">0.10</td>
<td align="right">0.05</td>
<td align="right">0.80</td>
</tr>
</tbody>
</table>
<pre class="r"><code>plotmat(tmDNA,pos = c(2,2),
lwd = 1, box.lwd = 2,
cex.txt = 0.8,
box.size = 0.1,
box.type = "circle",
box.prop = 0.5,
box.col = "light blue",
arr.length=.1,
arr.width=.1,
self.cex = .6,
self.shifty = -.01,
self.shiftx = .14,
main = "Markov Chain")</code></pre>
<p><img src="/post/2019/2019-05-14_analysing-hiv-pandemic-part-3/2019-05-14-analysing-hiv-pandemic-part-3_files/figure-html/unnamed-chunk-7-1.png" width="672" /></p>
<p>This example is contrived, but it should explain the concept of a substitution model. The viral reverse transcriptase is not a random sequence generator, but it does make mistakes. Most of the time when it copies the RNA into DNA, the base (state) stays the same. In addition, the probabilities of a transversion and a transition differ. If you look at the figure above where we introduced transversions and transitions, you will notice that A is more similar to G, and T is more similar to C, in chemical structure.</p>
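<p>One way to get a feel for the chain is to multiply the transition matrix by itself, which gives the probabilities after two copying steps. The sketch below rebuilds the same contrived matrix so that it runs on its own; the names <code>tm</code> and <code>tm2</code> are ours:</p>
<pre class="r"><code># Same contrived transition matrix as above, rebuilt to be self-contained
bases <- c("A", "C", "G", "T")
tm <- matrix(c(0.80, 0.05, 0.10, 0.05,
               0.05, 0.80, 0.05, 0.10,
               0.10, 0.05, 0.80, 0.05,
               0.05, 0.10, 0.05, 0.80),
             nrow = 4, byrow = TRUE,
             dimnames = list(bases, bases))

rowSums(tm)        # every row of a transition matrix must sum to 1
tm2 <- tm %*% tm   # two-step transition probabilities
tm2["A", ]         # where a base that starts as A ends up after two steps</code></pre>
<p>Even after two steps, staying an A remains the most likely outcome, and the transition to G is still more likely than either transversion.</p>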
<p>There are many other <a href="http://www.iqtree.org/doc/Substitution-Models">substitution models</a>. It is not always trivial to select the best model for phylogenetic inference. One technique is to run multiple maximum likelihood phylogenetic calculations using different models, and then pick the model with the lowest AIC (Akaike Information Criterion). For our pipeline, we selected the rather simple K80 model. Since we are looking at different sets of sequences at each submission, a simple model is probably better in order to avoid the problems caused by overfitting.</p>
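<p>The AIC comparison itself is simple arithmetic: AIC = 2k - 2 log(L), where k is the number of free parameters and L is the maximised likelihood. In the sketch below, the log-likelihood values are made-up placeholders, not results from our data, and k counts only the substitution-model parameters (assuming the same tree, branch lengths add the same constant to every model):</p>
<pre class="r"><code># Free substitution-model parameters per model
k <- c(JC69 = 0, K80 = 1, HKY85 = 4, GTR = 8)

# PLACEHOLDER log-likelihoods, invented purely to illustrate the calculation
logLik <- c(JC69 = -5230.4, K80 = -5105.2, HKY85 = -5101.9, GTR = -5099.5)

aic <- 2 * k - 2 * logLik
sort(aic)                # lower is better
names(which.min(aic))    # the model that would be selected</code></pre>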
<p>We can use the <code>ape</code> package and calculate distances using the <code>K80</code> model.</p>
<pre class="r"><code># Calculate the genetic distances between sequences using the K80 model; as.matrix = TRUE makes the rest easier
alnDist <- dist.dna(aln, model = "K80", as.matrix = TRUE)
alnDist[1:5, 1:5] %>%
kable(caption = "First few rows of our distance matrix")</code></pre>
<table>
<caption><span id="tab:unnamed-chunk-8">Table 2: </span>First few rows of our distance matrix</caption>
<thead>
<tr class="header">
<th></th>
<th align="right">01_AE.JP.AB253686_INT</th>
<th align="right">B.US.HM450245_INT</th>
<th align="right">B.AU.AF407664_INT</th>
<th align="right">B.CN.KJ820110_INT</th>
<th align="right">B.RU.HM466986_INT</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>01_AE.JP.AB253686_INT</td>
<td align="right">0.0000000</td>
<td align="right">0.0935626</td>
<td align="right">0.0961965</td>
<td align="right">0.0962887</td>
<td align="right">0.0962887</td>
</tr>
<tr class="even">
<td>B.US.HM450245_INT</td>
<td align="right">0.0935626</td>
<td align="right">0.0000000</td>
<td align="right">0.0378446</td>
<td align="right">0.0378167</td>
<td align="right">0.0378748</td>
</tr>
<tr class="odd">
<td>B.AU.AF407664_INT</td>
<td align="right">0.0961965</td>
<td align="right">0.0378446</td>
<td align="right">0.0000000</td>
<td align="right">0.0454602</td>
<td align="right">0.0494138</td>
</tr>
<tr class="even">
<td>B.CN.KJ820110_INT</td>
<td align="right">0.0962887</td>
<td align="right">0.0378167</td>
<td align="right">0.0454602</td>
<td align="right">0.0000000</td>
<td align="right">0.0479955</td>
</tr>
<tr class="odd">
<td>B.RU.HM466986_INT</td>
<td align="right">0.0962887</td>
<td align="right">0.0378748</td>
<td align="right">0.0494138</td>
<td align="right">0.0479955</td>
<td align="right">0.0000000</td>
</tr>
</tbody>
</table>
<p>The matrix has a shape of 47 by 47, so we just preview the first 5 rows and columns.</p>
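<p>To see what a single entry of this matrix represents, the K80 distance between two aligned sequences can be computed by hand from Kimura's closed-form expression, d = -0.5 * log((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the proportions of sites showing a transition and a transversion. This is a minimal sketch that assumes equal-length sequences containing only A, C, G and T; the function name <code>k80_distance</code> and the two toy sequences are ours, and <code>dist.dna</code> should be preferred for real alignments:</p>
<pre class="r"><code># Hand-rolled K80 distance for two aligned sequences (A, C, G, T only)
k80_distance <- function(seq1, seq2) {
  s1 <- strsplit(seq1, "")[[1]]
  s2 <- strsplit(seq2, "")[[1]]
  stopifnot(length(s1) == length(s2))
  purines <- c("A", "G")
  differs <- s1 != s2
  # A transition stays within the purines or within the pyrimidines
  same_class <- (s1 %in% purines) == (s2 %in% purines)
  P <- sum(differs[same_class]) / length(s1)    # proportion of transitions
  Q <- sum(differs[!same_class]) / length(s1)   # proportion of transversions
  -0.5 * log((1 - 2 * P - Q) * sqrt(1 - 2 * Q))
}

# Two toy sequences that differ by two transitions (A to G, C to T)
k80_distance("ACGTACGTAC", "ACGTGCGTAT")</code></pre>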
</div>
<div id="reduction-of-the-heatmap-to-focus-on-the-important-data" class="section level2">
<h2>Reduction of the heatmap to focus on the important data</h2>
<p>The PhyloPi pipeline uses the <strong>Basic Local Alignment Search Tool</strong> (BLAST) to retrieve previously sampled sequences and adds them to the analysis. <a href="https://blast.ncbi.nlm.nih.gov/Blast.cgi">BLAST</a> is like a web search engine, but for protein or DNA sequences. Including these retrospective sequences makes PhyloPi aware of past samples, not just the current batch. Have a look at the <a href="https://journals.plos.org/plosone/article/comments?id=10.1371/journal.pone.0213241">paper</a> for some examples.</p>
<p>The data we have is ready to use for heatmap plotting purposes, but since the data also contains previously sampled sequences, comparing those sequences amongst themselves would be a distraction. We are interested in those samples, but only compared to the current batch of samples analysed. The figures below should explain this a bit better.</p>
<hr />
<div class="figure" style="text-align: center">
<img src="/post/2019-05-14-analysis-hiv-pandemic-part-3_files/heatmap_full.png" alt="A diagram of a heatmap with lots of redundant and distracting data. " width="50%" style="margin:50px 10px" />
<p class="caption">
A diagram of a heatmap with lots of redundant and distracting data.
</p>
</div>
<hr />
<p>As the image above shows, the heatmap is, as is typical, symmetric about the diagonal. Submitted <em>vs</em> retrieved samples appear in both the horizontal and vertical directions. Notice also that, in the region annotated as “Distraction”, the previous samples are compared amongst themselves. We are not interested in those comparisons now, as we would already have acted on any issues at the time. What we want instead is a heatmap like the one depicted in the image below.</p>
<hr />
<div class="figure" style="text-align: center">
<img src="/post/2019-05-14-analysis-hiv-pandemic-part-3_files/heatmap_focused.png" alt="A diagram of a more focussed heatmap with the redundant and distracting data removed." width="50%" style="margin:50px 10px" />
<p class="caption">
A diagram of a more focussed heatmap with the redundant and distracting data removed.
</p>
</div>
<hr />
<p>Fortunately, we have a very powerful tool, <strong>R</strong>, at our disposal, and plenty of really useful and convenient packages like <code>dplyr</code> to fix this.</p>
<pre class="r"><code>alnDistLong <-
alnDist %>%
as.data.frame(stringsAsFactors = FALSE) %>%
rownames_to_column(var = "sample_1") %>%
gather(key = "sample_2", value = "distance", -sample_1, na.rm = TRUE) %>%
arrange(distance)
alnDistLong %>% head()</code></pre>
<pre><code>## sample_1 sample_2 distance
## 1 01_AE.JP.AB253686_INT 01_AE.JP.AB253686_INT 0
## 2 B.US.HM450245_INT B.US.HM450245_INT 0
## 3 B.AU.AF407664_INT B.AU.AF407664_INT 0
## 4 B.CN.KJ820110_INT B.CN.KJ820110_INT 0
## 5 B.RU.HM466986_INT B.RU.HM466986_INT 0
## 6 B.US.DQ127546_INT B.US.DQ127546_INT 0</code></pre>
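<p>The wide-to-long reshape may be easier to follow on a toy matrix. The sketch below uses only base R (<code>as.table</code> instead of <code>gather</code>) and three invented sample names, so it is an illustration of the idea, not the pipeline code:</p>
<pre class="r"><code># Toy 3 x 3 distance matrix with invented sample names
ids <- c("s1", "s2", "s3")
m <- matrix(c(0.0, 0.1, 0.2,
              0.1, 0.0, 0.3,
              0.2, 0.3, 0.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(ids, ids))

# One row per (sample_1, sample_2) pair, sorted by distance
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("sample_1", "sample_2", "distance")
long <- long[order(long$distance), ]
head(long)</code></pre>
<p>As in the real data, the self-comparisons on the diagonal have a distance of zero and therefore sort to the top.</p>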
<p>The final cleanup step puts the submitted samples first, so that the distracting data can be removed:</p>
<pre class="r"><code># get the names of samples originally in the fasta file used for submission
qSample <- names(read.dna("example.fasta", format = "fasta"))
# compute new order of samples, so the new alignment is in the order of the heatmap example
sample_1 <- unique(alnDistLong$sample_1)
new_order <- c(sort(qSample), setdiff(sample_1, qSample))
new_order</code></pre>
<pre><code>## [1] "01_AE.JP.AB253686_INT" "01_AE.TH.JX448243_INT"
## [3] "01_AE.VN.LC100946_INT" "38_BF1.UY.FJ213783_INT"
## [5] "B.AU.AF407664_INT" "B.CN.KJ820110_INT"
## [7] "B.KR.JN417106_INT" "B.RU.HM466986_INT"
## [9] "B.US.DQ127546_INT" "B.US.GU076504_INT"
## [11] "B.US.HM450245_INT" "BC.CN.JQ898256_INT"
## [13] "C.ZA.KT183056_INT" "C.ZM.KM049918_INT"
## [15] "C.ZM.KM050042_INT" "01_AE.TH.JX448252_INT"
## [17] "01_AE.TH.JX448250_INT" "01_AE.TH.JX448249_INT"
## [19] "C.ZA.KT183058_INT" "C.ZM.KM049913_INT"
## [21] "B.KR.JN417120_INT" "B.KR.JN417117_INT"
## [23] "B.KR.JN417116_INT" "57_BC.CN.JX679207_INT"
## [25] "C.ZM.KM050043_INT" "C.ZM.KM050041_INT"
## [27] "01_AE.JP.AB253682_INT" "01_AE.JP.AB253689_INT"
## [29] "B.US.KJ704790_INT" "B.ES.KC238594_INT"
## [31] "B.AU.AF407665_INT" "B.AU.AF407667_INT"
## [33] "B.CN.KC987976_INT" "B.CN.KT192001_INT"
## [35] "B.US.AF040369_INT" "B.US.M38429_INT"
## [37] "B.US.DQ127547_INT" "B.US.DQ127543_INT"
## [39] "C.ZA.KT183062_INT" "B.US.GU076505_INT"
## [41] "B.US.GU076507_INT" "C.ZM.KM049917_INT"
## [43] "01_AE.CN.JQ302565_INT" "01_AE.VN.FJ185234_INT"
## [45] "F1.BR.FJ771006_INT" "BF.AR.AF408631_INT"
## [47] "BC.CN.KC898983_INT"</code></pre>
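<p>The ordering logic is easier to verify on a handful of names. In this sketch the sample names are invented: submitted samples come first in alphabetical order, followed by the retrieved samples in their original order.</p>
<pre class="r"><code># Invented names: two submitted samples and two retrieved by BLAST
all_samples <- c("old_2", "new_b", "old_1", "new_a")
submitted   <- c("new_b", "new_a")

# Sorted submitted samples first, then everything else as encountered
toy_order <- c(sort(submitted), setdiff(all_samples, submitted))
toy_order</code></pre>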
<p>Plot the heatmap using <code>plotly</code> for interactivity, keeping only rows for submitted samples and dropping self-comparisons:</p>
<pre class="r"><code>alnDistLong %>%
filter(
sample_1 %in% qSample,
sample_1 != sample_2
) %>%
mutate(
sample_2 = factor(sample_2, levels = new_order)
) %>%
plot_ly(
x = ~sample_2,
y = ~sample_1,
z = ~distance,
type = "heatmap", colors = brewer.pal(11, "RdYlBu"),
zmin = 0.0, zmax = 0.03, xgap = 2, ygap = 1
) %>%
layout(
margin = list(l = 100, r = 10, b = 100, t = 10, pad = 4),
yaxis = list(tickfont = list(size = 10), showspikes = TRUE),
xaxis = list(tickfont = list(size = 10), showspikes = TRUE)
)</code></pre>
<div id="htmlwidget-1" style="width:672px;height:480px;" class="plotly html-widget"></div>
<script type="application/json" data-for="htmlwidget-1">{"x":{"visdat":{"538659887c83":["function () ","plotlyVisDat"]},"cur_data":"538659887c83","attrs":{"538659887c83":{"x":{},"y":{},"z":{},"zmin":0,"zmax":0.03,"xgap":2,"ygap":1,"colors":["#A50026","#D73027","#F46D43","#FDAE61","#FEE090","#FFFFBF","#E0F3F8","#ABD9E9","#74ADD1","#4575B4","#313695"],"alpha_stroke":1,"sizes":[10,100],"spans":[1,20],"type":"heatmap"}},"layout":{"margin":{"b":100,"l":100,"t":10,"r":10,"pad":4},"yaxis":{"domain":[0,1],"automargin":true,"tickfont":{"size":10},"showspikes":true,"title":"sample_1","type":"category","categoryorder":"array","categoryarray":["01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.VN.LC100946_INT","38_BF1.UY.FJ213783_INT","B.AU.AF407664_INT","B.CN.KJ820110_INT","B.KR.JN417106_INT","B.RU.HM466986_INT","B.US.DQ127546_INT","B.US.GU076504_INT","B.US.HM450245_INT","BC.CN.JQ898256_INT","C.ZA.KT183056_INT","C.ZM.KM049918_INT","C.ZM.KM050042_INT"]},"xaxis":{"domain":[0,1],"automargin":true,"tickfont":{"size":10},"showspikes":true,"title":"sample_2","type":"category","categoryorder":"array","categoryarray":["01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.VN.LC100946_INT","38_BF1.UY.FJ213783_INT","B.AU.AF407664_INT","B.CN.KJ820110_INT","B.KR.JN417106_INT","B.RU.HM466986_INT","B.US.DQ127546_INT","B.US.GU076504_INT","B.US.HM450245_INT","BC.CN.JQ898256_INT","C.ZA.KT183056_INT","C.ZM.KM049918_INT","C.ZM.KM050042_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","C.ZA.KT183058_INT","C.ZM.KM049913_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","57_BC.CN.JX679207_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","01_AE.JP.AB253682_INT","01_AE.JP.AB253689_INT","B.US.KJ704790_INT","B.ES.KC238594_INT","B.AU.AF407665_INT","B.AU.AF407667_INT","B.CN.KC987976_INT","B.CN.KT192001_INT","B.US.AF040369_INT","B.US.M38429_INT","B.US.DQ127547_INT","B.US.DQ127543_INT","C.ZA.KT183062_INT","B.US.GU076505_INT","B.US.GU076507_INT","C.ZM.KM049917_INT"
,"01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","F1.BR.FJ771006_INT","BF.AR.AF408631_INT","BC.CN.KC898983_INT"]},"scene":{"zaxis":{"title":"distance"}},"hovermode":"closest","showlegend":false,"legend":{"yanchor":"top","y":0.5}},"source":"A","config":{"showSendToCloud":false},"data":[{"colorbar":{"title":"distance","ticklen":2,"len":0.5,"lenmode":"fraction","y":1,"yanchor":"top"},"colorscale":[["0","rgba(165,0,38,1)"],["0.0416666666666667","rgba(186,25,39,1)"],["0.0833333333333333","rgba(207,42,39,1)"],["0.125","rgba(222,66,46,1)"],["0.166666666666667","rgba(235,91,57,1)"],["0.208333333333333","rgba(245,115,69,1)"],["0.25","rgba(249,143,82,1)"],["0.291666666666667","rgba(253,169,94,1)"],["0.333333333333333","rgba(254,191,112,1)"],["0.375","rgba(254,212,132,1)"],["0.416666666666667","rgba(254,229,152,1)"],["0.458333333333333","rgba(255,242,171,1)"],["0.5","rgba(255,255,191,1)"],["0.541666666666667","rgba(244,250,215,1)"],["0.583333333333333","rgba(230,245,239,1)"],["0.625","rgba(211,236,244,1)"],["0.666666666666667","rgba(189,226,238,1)"],["0.708333333333333","rgba(167,213,231,1)"],["0.75","rgba(144,195,221,1)"],["0.791666666666667","rgba(121,177,211,1)"],["0.833333333333333","rgba(101,154,199,1)"],["0.875","rgba(82,131,187,1)"],["0.916666666666667","rgba(67,106,175,1)"],["0.958333333333333","rgba(59,80,162,1)"],["1","rgba(49,54,149,1)"]],"showscale":true,"x":["01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","C.ZA.KT183058_INT","C.ZM.KM049913_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","57_BC.CN.JX679207_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","C.ZM.KM049917_INT","01_AE.JP.AB253682_INT","B.US.DQ127547_INT","01_AE.JP.AB253689_INT","B.US.GU076505_INT","B.AU.AF407665_INT","C.ZA.KT183062_INT","B.US.GU076507_INT","B.AU.AF407667_INT","B.CN.KC987976_INT","B.CN.KT192001_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","BC.CN.KC898983_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ
185234_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253689_INT","B.ES.KC238594_INT","B.US.DQ127543_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","01_AE.JP.AB253682_INT","B.US.KJ704790_INT","B.US.AF040369_INT","B.US.M38429_INT","01_AE.TH.JX448243_INT","01_AE.VN.LC100946_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","01_AE.JP.AB253682_INT","F1.BR.FJ771006_INT","B.US.KJ704790_INT","01_AE.JP.AB253686_INT","01_AE.VN.LC100946_INT","01_AE.JP.AB253689_INT","BF.AR.AF408631_INT","B.US.AF040369_INT","B.US.AF040369_INT","B.US.M38429_INT","B.US.AF040369_INT","B.US.KJ704790_INT","B.US.KJ704790_INT","B.US.KJ704790_INT","B.ES.KC238594_INT","B.US.M38429_INT","B.US.KJ704790_INT","B.US.AF040369_INT","B.US.AF040369_INT","B.ES.KC238594_INT","B.AU.AF407665_INT","B.US.DQ127543_INT","B.ES.KC238594_INT","B.US.AF040369_INT","B.CN.KT192001_INT","B.ES.KC238594_INT","B.ES.KC238594_INT","B.US.M38429_INT","B.US.M38429_INT","B.US.DQ127547_INT","B.US.KJ704790_INT","B.AU.AF407667_INT","B.US.DQ127543_INT","B.US.DQ127546_INT","B.KR.JN417106_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","B.US.HM450245_INT","B.CN.KJ820110_INT","B.ES.KC238594_INT","B.US.M38429_INT","B.US.DQ127543_INT","B.US.HM450245_INT","B.AU.AF407664_INT","B.US.HM450245_INT","B.RU.HM466986_INT","B.US.DQ127543_INT","B.US.HM450245_INT","B.KR.JN417106_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","B.CN.KT192001_INT","B.CN.KJ820110_INT","B.KR.JN417106_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","B.US.M38429_INT","B.CN.KC987976_INT","B.CN.KT192001_INT","B.US.DQ127543_INT","B.AU.AF407665_INT","B.CN.KC987976_INT","B.US.DQ127547_INT","B.US.DQ127547_INT","B.CN.KT192001_INT","B.US.DQ127543_INT","B.US.DQ127547_INT","B.US.HM450245_INT","B.US.DQ127546_INT","B.CN.KT192001_INT","B.CN.KJ820110_INT","B.RU.HM466986_INT","B.US.DQ127546_INT","B.KR.JN417106_INT","B.AU.AF407667_INT","B.KR.JN417120_INT","B.KR.JN417117_I
NT","B.KR.JN417116_INT","B.AU.AF407665_INT","B.AU.AF407667_INT","B.RU.HM466986_INT","B.US.DQ127546_INT","B.US.GU076505_INT","B.CN.KJ820110_INT","B.US.GU076504_INT","B.AU.AF407665_INT","B.AU.AF407667_INT","B.AU.AF407664_INT","B.CN.KJ820110_INT","B.CN.KT192001_INT","B.AU.AF407665_INT","B.US.HM450245_INT","B.US.GU076504_INT","B.US.GU076505_INT","B.CN.KC987976_INT","B.US.DQ127547_INT","B.US.DQ127547_INT","B.US.GU076505_INT","B.US.GU076507_INT","B.US.GU076504_INT","B.KR.JN417106_INT","B.CN.KC987976_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","B.AU.AF407664_INT","B.CN.KJ820110_INT","B.RU.HM466986_INT","B.US.DQ127546_INT","B.CN.KC987976_INT","B.AU.AF407664_INT","B.KR.JN417106_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","B.AU.AF407667_INT","B.US.DQ127546_INT","B.US.GU076504_INT","B.US.GU076505_INT","B.US.GU076507_INT","B.AU.AF407664_INT","B.RU.HM466986_INT","B.US.GU076507_INT","B.AU.AF407665_INT","B.CN.KC987976_INT","BC.CN.KC898983_INT","B.AU.AF407667_INT","C.ZA.KT183062_INT","BC.CN.JQ898256_INT","C.ZM.KM050042_INT","B.US.GU076507_INT","57_BC.CN.JX679207_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","C.ZM.KM049918_INT","BC.CN.JQ898256_INT","C.ZM.KM049913_INT","57_BC.CN.JX679207_INT","B.US.GU076505_INT","C.ZM.KM049917_INT","B.US.GU076505_INT","B.AU.AF407664_INT","B.US.GU076504_INT","B.RU.HM466986_INT","B.US.GU076504_INT","C.ZA.KT183056_INT","BC.CN.JQ898256_INT","C.ZA.KT183058_INT","57_BC.CN.JX679207_INT","B.US.GU076507_INT","C.ZM.KM049918_INT","C.ZM.KM050042_INT","C.ZM.KM049913_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","BC.CN.KC898983_INT","BC.CN.KC898983_INT","B.US.GU076507_INT","C.ZA.KT183062_INT","C.ZM.KM049917_INT","C.ZM.KM049917_INT","C.ZA.KT183056_INT","C.ZM.KM049918_INT","C.ZA.KT183058_INT","C.ZM.KM049913_INT","C.ZA.KT183062_INT","C.ZA.KT183056_INT","C.ZM.KM050042_INT","C.ZA.KT183058_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","BF.AR.AF408631_INT","B.CN.KT192001_INT","B.US.M38429_INT","F1.BR.FJ771006_INT","B.US.DQ127543_INT
","BC.CN.KC898983_INT","B.ES.KC238594_INT","B.US.AF040369_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","B.US.DQ127546_INT","38_BF1.UY.FJ213783_INT","B.US.DQ127543_INT","F1.BR.FJ771006_INT","B.US.KJ704790_INT","B.US.DQ127547_INT","B.US.DQ127547_INT","BF.AR.AF408631_INT","01_AE.CN.JQ302565_INT","01_AE.JP.AB253686_INT","C.ZM.KM050042_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","B.US.M38429_INT","B.ES.KC238594_INT","B.US.AF040369_INT","B.ES.KC238594_INT","B.US.DQ127546_INT","BC.CN.JQ898256_INT","B.CN.KC987976_INT","57_BC.CN.JX679207_INT","B.CN.KJ820110_INT","BC.CN.JQ898256_INT","57_BC.CN.JX679207_INT","B.KR.JN417106_INT","BC.CN.JQ898256_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","57_BC.CN.JX679207_INT","01_AE.JP.AB253682_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","B.US.M38429_INT","B.US.KJ704790_INT","B.ES.KC238594_INT","B.US.AF040369_INT","B.US.HM450245_INT","BC.CN.JQ898256_INT","B.AU.AF407665_INT","F1.BR.FJ771006_INT","F1.BR.FJ771006_INT","57_BC.CN.JX679207_INT","BC.CN.KC898983_INT","BC.CN.KC898983_INT","01_AE.VN.FJ185234_INT","01_AE.JP.AB253689_INT","B.US.KJ704790_INT","B.ES.KC238594_INT","B.US.AF040369_INT","B.US.AF040369_INT","B.AU.AF407667_INT","BC.CN.KC898983_INT","B.US.DQ127547_INT","B.US.M38429_INT","B.US.KJ704790_INT","B.US.GU076504_INT","BC.CN.JQ898256_INT","B.US.KJ704790_INT","B.US.GU076505_INT","BF.AR.AF408631_INT","57_BC.CN.JX679207_INT","B.US.M38429_INT","01_AE.JP.AB253686_INT","B.US.DQ127546_INT","B.US.DQ127547_INT","B.US.DQ127547_INT","01_AE.TH.JX448243_INT","C.ZM.KM050042_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","B.US.DQ127543_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","57_BC.CN.JX679207_INT","57_BC.CN.JX679207_INT","C.ZA.KT183062_INT","B.US.M38429_INT","BC.CN.KC898983_INT","B.AU.AF407664_INT","BC.CN.JQ898256_INT","
B.US.GU076507_INT","57_BC.CN.JX679207_INT","B.US.DQ127546_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253682_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","B.US.DQ127546_INT","01_AE.VN.LC100946_INT","BC.CN.KC898983_INT","BC.CN.KC898983_INT","B.US.DQ127543_INT","01_AE.JP.AB253682_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","C.ZA.KT183056_INT","38_BF1.UY.FJ213783_INT","C.ZA.KT183058_INT","38_BF1.UY.FJ213783_INT","B.KR.JN417106_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","B.US.M38429_INT","BC.CN.KC898983_INT","C.ZM.KM049918_INT","B.KR.JN417106_INT","C.ZA.KT183062_INT","C.ZM.KM049913_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","01_AE.JP.AB253689_INT","B.US.KJ704790_INT","F1.BR.FJ771006_INT","B.ES.KC238594_INT","F1.BR.FJ771006_INT","F1.BR.FJ771006_INT","01_AE.VN.LC100946_INT","C.ZM.KM050042_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","01_AE.JP.AB253689_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","B.US.DQ127543_INT","B.US.HM450245_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","BC.CN.JQ898256_INT","57_BC.CN.JX679207_INT","C.ZA.KT183062_INT","B.KR.JN417106_INT","C.ZM.KM050042_INT","C.ZM.KM049917_INT","BF.AR.AF408631_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","C.ZA.KT183056_INT","B.KR.JN417106_INT","C.ZA.KT183058_INT","BF.AR.AF408631_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","BC.CN.KC898983_INT","B.ES.KC238594_INT","B.US.AF040369_INT","B.US.KJ704790_INT","B.AU.AF407665_INT","C.ZA.KT183062_INT","C.ZA.KT183062_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","01_AE.VN.LC100946_INT","BC.CN.JQ898256_INT","57_BC.CN.JX679207_INT","BC.CN.KC898983_INT","B.CN.KT192001_INT","F1.BR.FJ771006_INT","BF.AR.AF408631_INT","B.RU.HM466986_INT","BC.CN.JQ898256_INT","B.CN.KT192001_INT","F1.BR.FJ771006_INT","57_BC.CN.JX679207_INT","B.US.AF040369_INT","01_AE.JP.AB253
686_INT","01_AE.TH.JX448243_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","B.AU.AF407665_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","C.ZA.KT183058_INT","C.ZA.KT183058_INT","F1.BR.FJ771006_INT","BC.CN.KC898983_INT","B.US.HM450245_INT","01_AE.TH.JX448243_INT","B.CN.KT192001_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","38_BF1.UY.FJ213783_INT","C.ZM.KM050042_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","B.AU.AF407665_INT","B.AU.AF407667_INT","B.US.DQ127547_INT","C.ZA.KT183062_INT","B.US.M38429_INT","B.US.DQ127543_INT","C.ZA.KT183062_INT","C.ZA.KT183062_INT","BF.AR.AF408631_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","01_AE.JP.AB253682_INT","B.AU.AF407667_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","B.US.HM450245_INT","01_AE.TH.JX448243_INT","C.ZM.KM049918_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","B.CN.KT192001_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","C.ZM.KM049913_INT","B.KR.JN417120_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","B.KR.JN417116_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","F1.BR.FJ771006_INT","B.US.DQ127546_INT","C.ZM.KM050042_INT","BF.AR.AF408631_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","B.US.HM450245_INT","C.ZA.KT183056_INT","C.ZA.KT183058_INT","C.ZA.KT183062_INT","B.US.DQ127543_INT","B.US.GU076505_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","B.AU.AF407664_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253689_INT","B.AU.AF407665_INT","B.AU.AF407667_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","B.CN.KJ820110_INT","01_AE.TH.JX448243_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","C.ZM.KM049917_INT","01_AE.CN.JQ302565_INT","01_AE.VN.FJ185234_INT","01_AE.JP.AB253682_INT","01_AE.JP.AB253682_INT","01_AE.JP.AB253682_INT","F1.BR.FJ771006_INT","F1.BR.FJ7
71006_INT","F1.BR.FJ771006_INT","B.AU.AF407664_INT","38_BF1.UY.FJ213783_INT","C.ZM.KM049918_INT","38_BF1.UY.FJ213783_INT","B.US.DQ127547_INT","C.ZM.KM049913_INT","01_AE.TH.JX448243_INT","B.US.GU076504_INT","B.US.DQ127543_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","01_AE.JP.AB253686_INT","B.AU.AF407664_INT","B.CN.KC987976_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","B.CN.KJ820110_INT","B.RU.HM466986_INT","B.US.HM450245_INT","01_AE.TH.JX448243_INT","C.ZM.KM049918_INT","01_AE.VN.LC100946_INT","01_AE.JP.AB253689_INT","01_AE.JP.AB253689_INT","01_AE.JP.AB253689_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","C.ZM.KM049913_INT","B.US.GU076505_INT","B.US.GU076507_INT","B.RU.HM466986_INT","38_BF1.UY.FJ213783_INT","B.CN.KJ820110_INT","38_BF1.UY.FJ213783_INT","B.US.HM450245_INT","C.ZM.KM050042_INT","B.AU.AF407665_INT","C.ZM.KM049917_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","B.US.DQ127546_INT","C.ZA.KT183056_INT","B.AU.AF407665_INT","B.US.DQ127547_INT","C.ZA.KT183058_INT","B.CN.KJ820110_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","B.US.GU076504_INT","B.CN.KT192001_INT","C.ZA.KT183058_INT","C.ZA.KT183058_INT","B.US.GU076505_INT","B.US.GU076505_INT","B.US.GU076507_INT","BF.AR.AF408631_INT","BF.AR.AF408631_INT","01_AE.JP.AB253682_INT","B.AU.AF407667_INT","B.CN.KC987976_INT","B.RU.HM466986_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253682_INT","01_AE.JP.AB253682_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","C.ZM.KM049917_INT","01_AE.VN.LC100946_INT","B.KR.JN417106_INT","B.CN.KT192001_INT","BF.AR.AF408631_INT","B.KR.JN417120_INT","B.KR.JN417117_INT","B.KR.JN417116_INT","B.US.GU076504_INT","38_BF1.UY.FJ213783_INT","BF.AR.AF408631_INT","B.RU.HM466986_INT","C.ZM.KM050042_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","B.AU.AF407664_INT","B.US.DQ127546_INT","C.ZA.KT183056_INT","C.ZM.KM049918_INT","C.ZA.KT183058_INT","C.ZM.KM049913_INT","B.US.GU076504_INT","01_AE.VN.LC100946_INT","B.
US.GU076505_INT","B.US.AF040369_INT","B.AU.AF407664_INT","01_AE.VN.LC100946_INT","01_AE.JP.AB253689_INT","B.RU.HM466986_INT","01_AE.VN.LC100946_INT","01_AE.JP.AB253689_INT","01_AE.JP.AB253689_INT","B.CN.KC987976_INT","B.US.GU076504_INT","C.ZM.KM050042_INT","B.US.GU076505_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","B.AU.AF407667_INT","C.ZM.KM049917_INT","B.AU.AF407667_INT","B.CN.KC987976_INT","01_AE.JP.AB253686_INT","B.US.GU076504_INT","B.US.GU076507_INT","B.US.GU076507_INT","B.US.KJ704790_INT","B.CN.KJ820110_INT","01_AE.VN.LC100946_INT","BF.AR.AF408631_INT","BF.AR.AF408631_INT","B.AU.AF407664_INT","C.ZM.KM050042_INT","C.ZA.KT183062_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","B.US.HM450245_INT","C.ZM.KM049918_INT","C.ZM.KM049913_INT","01_AE.JP.AB253682_INT","01_AE.VN.LC100946_INT","38_BF1.UY.FJ213783_INT","B.US.GU076507_INT","B.ES.KC238594_INT","C.ZA.KT183056_INT","01_AE.VN.LC100946_INT","B.CN.KC987976_INT","C.ZA.KT183058_INT","B.RU.HM466986_INT","C.ZA.KT183056_INT","C.ZA.KT183058_INT","B.US.GU076507_INT","C.ZM.KM049917_INT","01_AE.JP.AB253689_INT","C.ZA.KT183062_INT","B.CN.KC987976_INT","B.AU.AF407665_INT","B.AU.AF407667_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","01_AE.TH.JX448252_INT","01_AE.TH.JX448250_INT","01_AE.TH.JX448249_INT","C.ZM.KM049918_INT","01_AE.VN.LC100946_INT","C.ZM.KM049913_INT","B.CN.KJ820110_INT","C.ZM.KM050042_INT","C.ZM.KM050043_INT","C.ZM.KM050041_INT","B.CN.KT192001_INT","01_AE.JP.AB253682_INT","C.ZM.KM049917_INT","B.RU.HM466986_INT","C.ZM.KM049918_INT","C.ZM.KM049913_INT","01_AE.JP.AB253689_INT","C.ZM.KM049917_INT","B.AU.AF407664_INT","C.ZM.KM049918_INT","C.ZM.KM049913_INT","B.CN.KJ820110_INT","C.ZM.KM049918_INT","C.ZM.KM049913_INT","C.ZM.KM049917_INT","C.ZM.KM049917_INT","B.US.GU076504_INT","C.ZM.KM049918_INT","B.US.GU076505_INT","C.ZM.KM049913_INT","B.CN.KC987976_INT","C.ZM.KM049917_INT","B.US.GU076507_INT"],"y":["01_AE.TH.JX448243_INT","01_AE.TH.JX448243_INT","01_AE.TH.JX4482
43_INT","C.ZA.KT183056_INT","C.ZM.KM049918_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","BC.CN.JQ898256_INT","C.ZM.KM050042_INT","C.ZM.KM050042_INT","C.ZM.KM049918_INT","01_AE.JP.AB253686_INT","B.US.DQ127546_INT","01_AE.JP.AB253686_INT","B.US.GU076504_INT","B.AU.AF407664_INT","C.ZA.KT183056_INT","B.US.GU076504_INT","B.AU.AF407664_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","BC.CN.JQ898256_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.TH.JX448243_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","B.US.HM450245_INT","B.US.DQ127546_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","B.US.HM450245_INT","B.US.HM450245_INT","B.US.HM450245_INT","01_AE.VN.LC100946_INT","01_AE.TH.JX448243_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","38_BF1.UY.FJ213783_INT","B.CN.KJ820110_INT","01_AE.VN.LC100946_INT","01_AE.JP.AB253686_INT","01_AE.VN.LC100946_INT","38_BF1.UY.FJ213783_INT","B.CN.KJ820110_INT","B.US.DQ127546_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","B.US.DQ127546_INT","B.AU.AF407664_INT","B.RU.HM466986_INT","B.CN.KJ820110_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","B.AU.AF407664_INT","B.KR.JN417106_INT","B.US.DQ127546_INT","B.US.HM450245_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","B.US.GU076504_INT","B.US.HM450245_INT","B.AU.AF407664_INT","B.RU.HM466986_INT","B.US.DQ127546_INT","B.CN.KJ820110_INT","B.KR.JN417106_INT","B.US.GU076504_INT","B.US.HM450245_INT","B.US.HM450245_INT","B.KR.JN417106_INT","B.US.DQ127546_INT","B.US.DQ127546_INT","B.US.DQ127546_INT","B.US.DQ127546_INT","B.CN.KJ820110_INT","B.US.HM450245_INT","B.US.GU076504_INT","B.AU.AF407664_INT","B.RU.HM466986_INT","B.AU.AF407664_INT","B.US.HM450245_INT","B.RU.HM466986_INT","B.US.HM450245_INT","B.CN.KJ820110_INT","B.KR.JN417106_INT","B.US.HM450245_INT","B.US.HM450245_INT","B.U
S.HM450245_INT","B.US.HM450245_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","B.US.GU076504_INT","B.US.HM450245_INT","B.US.GU076504_INT","B.US.GU076504_INT","B.CN.KJ820110_INT","B.KR.JN417106_INT","B.US.HM450245_INT","B.CN.KJ820110_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","B.RU.HM466986_INT","B.US.DQ127546_INT","B.US.HM450245_INT","B.US.DQ127546_INT","B.US.DQ127546_INT","B.KR.JN417106_INT","B.CN.KJ820110_INT","B.RU.HM466986_INT","B.CN.KJ820110_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","B.US.DQ127546_INT","B.RU.HM466986_INT","B.US.HM450245_INT","B.US.GU076504_INT","B.CN.KJ820110_INT","B.US.DQ127546_INT","B.US.DQ127546_INT","B.CN.KJ820110_INT","B.AU.AF407664_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","B.US.GU076504_INT","B.US.HM450245_INT","B.CN.KJ820110_INT","B.US.DQ127546_INT","B.AU.AF407664_INT","B.US.GU076504_INT","B.US.DQ127546_INT","B.US.HM450245_INT","B.KR.JN417106_INT","B.US.GU076504_INT","B.US.GU076504_INT","B.US.GU076504_INT","B.US.GU076504_INT","B.US.GU076504_INT","B.US.DQ127546_INT","B.RU.HM466986_INT","B.CN.KJ820110_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","B.KR.JN417106_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","B.RU.HM466986_INT","B.US.GU076504_INT","B.US.DQ127546_INT","B.KR.JN417106_INT","B.CN.KJ820110_INT","B.RU.HM466986_INT","B.AU.AF407664_INT","B.US.DQ127546_INT","B.US.GU076504_INT","B.RU.HM466986_INT","C.ZM.KM049918_INT","B.US.GU076504_INT","BC.CN.JQ898256_INT","C.ZM.KM050042_INT","BC.CN.JQ898256_INT","B.KR.JN417106_INT","C.ZM.KM050042_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","C.ZM.KM049918_INT","BC.CN.JQ898256_INT","C.ZM.KM049918_INT","B.AU.AF407664_INT","BC.CN.JQ898256_INT","B.RU.HM466986_INT","B.US.GU076504_INT","B.AU.AF407664_INT","B.US.GU076504_INT","B.RU.HM466986_INT","BC.CN.JQ898256_INT","C.ZA.KT183056_INT","BC.CN.JQ898256_
INT","C.ZA.KT183056_INT","B.AU.AF407664_INT","C.ZM.KM050042_INT","C.ZM.KM049918_INT","C.ZM.KM050042_INT","C.ZM.KM049918_INT","C.ZM.KM049918_INT","C.ZA.KT183056_INT","C.ZM.KM050042_INT","B.RU.HM466986_INT","C.ZM.KM049918_INT","C.ZM.KM050042_INT","C.ZA.KT183056_INT","C.ZM.KM049918_INT","C.ZA.KT183056_INT","C.ZM.KM049918_INT","C.ZA.KT183056_INT","C.ZM.KM050042_INT","C.ZM.KM050042_INT","C.ZA.KT183056_INT","C.ZM.KM050042_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","B.US.DQ127546_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","B.US.DQ127546_INT","38_BF1.UY.FJ213783_INT","B.KR.JN417106_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","C.ZM.KM050042_INT","C.ZM.KM050042_INT","38_BF1.UY.FJ213783_INT","B.US.DQ127546_INT","BC.CN.JQ898256_INT","B.KR.JN417106_INT","BC.CN.JQ898256_INT","38_BF1.UY.FJ213783_INT","BC.CN.JQ898256_INT","B.KR.JN417106_INT","B.US.DQ127546_INT","C.ZM.KM050042_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","38_BF1.UY.FJ213783_INT","BC.CN.JQ898256_INT","B.US.DQ127546_INT","BC.CN.JQ898256_INT","B.US.DQ127546_INT","BC.CN.JQ898256_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","BC.CN.JQ898256_INT","B.KR.JN417106_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","B.KR.JN417106_INT","C.ZM.KM050042_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.TH.JX448243_INT","BC.CN.JQ898256_INT","B.US.HM450245_INT","BC.CN.JQ898256_INT","B.US.HM450245_INT","C.ZA.KT183056_INT","B.US.HM450245_INT","B.US.DQ127546_INT","B.CN.KJ820110_INT","B.US.DQ127546_INT","C.ZM.KM050042_INT","01_AE.TH.JX448243_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","38_BF1.UY.FJ213783_INT","BC.CN.JQ898256_INT","B.US.HM450245_INT","01_AE.JP.AB253686_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","BC.CN.JQ898256_INT","B.US.GU076504_INT","38_BF1.UY.FJ213783_INT","BC.CN.JQ898256_INT","B.US.HM450245_INT",
"B.US.GU076504_INT","C.ZA.KT183056_INT","B.US.DQ127546_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.VN.LC100946_INT","C.ZM.KM050042_INT","01_AE.TH.JX448243_INT","C.ZM.KM050042_INT","C.ZM.KM050042_INT","C.ZM.KM050042_INT","01_AE.TH.JX448243_INT","01_AE.TH.JX448243_INT","01_AE.TH.JX448243_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","BC.CN.JQ898256_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","B.US.GU076504_INT","BC.CN.JQ898256_INT","B.AU.AF407664_INT","BC.CN.JQ898256_INT","B.AU.AF407664_INT","01_AE.TH.JX448243_INT","B.US.DQ127546_INT","B.US.DQ127546_INT","B.US.DQ127546_INT","B.US.DQ127546_INT","B.US.DQ127546_INT","01_AE.VN.LC100946_INT","B.US.DQ127546_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253686_INT","BC.CN.JQ898256_INT","B.US.HM450245_INT","B.US.HM450245_INT","38_BF1.UY.FJ213783_INT","C.ZA.KT183056_INT","38_BF1.UY.FJ213783_INT","B.KR.JN417106_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","C.ZM.KM050042_INT","B.AU.AF407664_INT","B.KR.JN417106_INT","C.ZM.KM049918_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","C.ZM.KM049918_INT","C.ZM.KM049918_INT","C.ZM.KM049918_INT","B.US.DQ127546_INT","C.ZM.KM050042_INT","B.AU.AF407664_INT","C.ZA.KT183056_INT","B.CN.KJ820110_INT","BC.CN.JQ898256_INT","C.ZM.KM050042_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","BC.CN.JQ898256_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","01_AE.VN.LC100946_INT","38_BF1.UY.FJ213783_INT","B.US.HM450245_INT","BC.CN.JQ898256_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","B.US.HM450245_INT","C.ZM.KM050042_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","B.CN.KJ820110_INT","C.ZM.KM050042_INT","C.ZM.KM050042_INT","C.ZM.KM050042_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","C.ZA.KT183056_INT","B.KR.JN417106
_INT","B.AU.AF407664_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","B.RU.HM466986_INT","C.ZM.KM050042_INT","C.ZM.KM050042_INT","C.ZA.KT183056_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","BC.CN.JQ898256_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","B.RU.HM466986_INT","C.ZA.KT183056_INT","BC.CN.JQ898256_INT","B.RU.HM466986_INT","C.ZA.KT183056_INT","C.ZM.KM050042_INT","B.RU.HM466986_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253686_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","C.ZM.KM049918_INT","01_AE.VN.LC100946_INT","01_AE.TH.JX448243_INT","B.US.HM450245_INT","01_AE.TH.JX448243_INT","B.US.HM450245_INT","B.US.HM450245_INT","B.US.HM450245_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","C.ZM.KM050042_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","C.ZM.KM050042_INT","B.US.DQ127546_INT","C.ZM.KM049918_INT","C.ZM.KM050042_INT","B.CN.KJ820110_INT","B.US.GU076504_INT","C.ZM.KM050042_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","C.ZA.KT183056_INT","01_AE.TH.JX448243_INT","B.US.HM450245_INT","C.ZM.KM049918_INT","B.KR.JN417106_INT","01_AE.JP.AB253686_INT","B.KR.JN417106_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253686_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","B.KR.JN417106_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","C.ZM.KM049918_INT","C.ZM.KM049918_INT","B.US.GU076504_INT","C.ZM.KM050042_INT","B.US.DQ127546_INT","B.RU.HM466986_INT","B.US.DQ127546_INT","B.US.DQ127546_INT","C.
ZA.KT183056_INT","B.US.HM450245_INT","B.US.HM450245_INT","B.AU.AF407664_INT","C.ZA.KT183056_INT","01_AE.TH.JX448243_INT","B.US.GU076504_INT","B.US.GU076504_INT","01_AE.TH.JX448243_INT","B.AU.AF407664_INT","C.ZA.KT183056_INT","01_AE.VN.LC100946_INT","01_AE.JP.AB253686_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","01_AE.TH.JX448243_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","01_AE.JP.AB253686_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","B.US.HM450245_INT","C.ZM.KM049918_INT","B.KR.JN417106_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","01_AE.VN.LC100946_INT","38_BF1.UY.FJ213783_INT","B.AU.AF407664_INT","38_BF1.UY.FJ213783_INT","C.ZM.KM049918_INT","C.ZA.KT183056_INT","38_BF1.UY.FJ213783_INT","B.US.GU076504_INT","01_AE.TH.JX448243_INT","C.ZM.KM049918_INT","B.US.GU076504_INT","B.US.GU076504_INT","B.US.GU076504_INT","B.AU.AF407664_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","B.CN.KJ820110_INT","B.RU.HM466986_INT","01_AE.JP.AB253686_INT","01_AE.JP.AB253686_INT","01_AE.VN.LC100946_INT","C.ZM.KM049918_INT","01_AE.TH.JX448243_INT","B.US.HM450245_INT","B.US.HM450245_INT","C.ZM.KM049918_INT","B.KR.JN417106_INT","C.ZM.KM049918_INT","C.ZM.KM049918_INT","C.ZM.KM049918_INT","01_AE.TH.JX448243_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","B.RU.HM466986_INT","38_BF1.UY.FJ213783_INT","B.CN.KJ820110_INT","C.ZM.KM050042_INT","B.US.HM450245_INT","C.ZM.KM050042_INT","38_BF1.UY.FJ213783_INT","B.US.HM450245_INT","B.US.HM450245_INT","C.ZA.KT183056_INT","B.US.DQ127546_INT","C.ZA.KT183056_INT","C.ZM.KM049918_INT","B.US.DQ127546_INT","C.ZA.KT183056_INT","B.CN.KJ820110_INT","B.US.GU076504_INT","C.ZA.KT183056_INT","C.ZM.KM050042_INT","B.CN.KJ820110_INT","B.US.GU076504_INT","C.ZA.KT183056_INT","01_AE.VN.LC100946_INT","01_AE.TH.JX448243_INT","BC.CN.JQ898256_INT","C.ZM.KM049918_INT","B.AU.AF407664_INT","01_AE.VN.LC100946_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","B.RU.HM466986
_INT","B.CN.KJ820110_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","01_AE.TH.JX448243_INT","B.KR.JN417106_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","38_BF1.UY.FJ213783_INT","B.US.GU076504_INT","B.US.GU076504_INT","C.ZM.KM050042_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","C.ZA.KT183056_INT","C.ZM.KM049918_INT","B.AU.AF407664_INT","B.US.DQ127546_INT","B.AU.AF407664_INT","B.US.DQ127546_INT","01_AE.VN.LC100946_INT","B.US.GU076504_INT","01_AE.JP.AB253686_INT","C.ZM.KM049918_INT","01_AE.VN.LC100946_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","01_AE.VN.LC100946_INT","B.RU.HM466986_INT","B.CN.KJ820110_INT","B.RU.HM466986_INT","38_BF1.UY.FJ213783_INT","C.ZM.KM050042_INT","B.US.GU076504_INT","C.ZM.KM050042_INT","B.US.GU076504_INT","B.US.GU076504_INT","C.ZM.KM050042_INT","B.US.DQ127546_INT","C.ZA.KT183056_INT","C.ZA.KT183056_INT","B.US.GU076504_INT","01_AE.JP.AB253686_INT","C.ZA.KT183056_INT","01_AE.VN.LC100946_INT","C.ZM.KM049918_INT","01_AE.VN.LC100946_INT","B.CN.KJ820110_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","C.ZM.KM050042_INT","B.AU.AF407664_INT","B.RU.HM466986_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","C.ZM.KM049918_INT","B.US.HM450245_INT","B.US.HM450245_INT","B.US.GU076504_INT","38_BF1.UY.FJ213783_INT","01_AE.VN.LC100946_INT","01_AE.JP.AB253686_INT","C.ZM.KM049918_INT","01_AE.VN.LC100946_INT","C.ZA.KT183056_INT","01_AE.VN.LC100946_INT","01_AE.VN.LC100946_INT","C.ZA.KT183056_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","C.ZM.KM050042_INT","B.US.HM450245_INT","B.US.GU076504_INT","01_AE.VN.LC100946_INT","C.ZM.KM050042_INT","C.ZM.KM049918_INT","C.ZM.KM049918_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","01_AE.JP.AB253686_INT","01_AE.TH.JX448243_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","38_BF1.UY.FJ213783_INT","01_AE.VN.LC100946_INT","C.ZM.KM049918_INT","01_AE.VN.LC10
0946_INT","C.ZM.KM050042_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","C.ZM.KM049918_INT","38_BF1.UY.FJ213783_INT","01_AE.VN.LC100946_INT","C.ZM.KM049918_INT","B.RU.HM466986_INT","B.RU.HM466986_INT","38_BF1.UY.FJ213783_INT","B.RU.HM466986_INT","C.ZM.KM049918_INT","B.AU.AF407664_INT","B.AU.AF407664_INT","C.ZM.KM049918_INT","B.CN.KJ820110_INT","B.CN.KJ820110_INT","B.AU.AF407664_INT","B.CN.KJ820110_INT","C.ZM.KM049918_INT","B.US.GU076504_INT","C.ZM.KM049918_INT","B.US.GU076504_INT","C.ZM.KM049918_INT","B.US.GU076504_INT","C.ZM.KM049918_INT"],"z":[0,0,0,0,0,0,0,0,0,0,0,0.00118588838545844,0.00118624017681518,0.00118624017681518,0.00237530137929893,0.00356295206894832,0.00356719705693705,0.00356719705693705,0.00595350496095845,0.00595954611860516,0.00715254630296478,0.00715254630296478,0.0144078126601721,0.0144078126601721,0.0155870201477025,0.0156273254598515,0.0156273254598515,0.0168178038999123,0.0168178038999123,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.019250040193739,0.019250040193739,0.0204770346745223,0.0241565618280353,0.0241761884152448,0.0241761884152448,0.0241761884152448,0.0241761884152448,0.0241761884152448,0.0242220267343138,0.0253584872200611,0.0253941862223067,0.0254642633981448,0.0254642633981448,0.0254642633981448,0.0266143894883516,0.0266348816479994,0.0278192110973387,0.0278192110973387,0.0278786633839003,0.0290614477611697,0.0291019568084084,0.0291255468231015,0.0291255468231015,0.0302883205933425,0.030350396907702,0.030350396907702,0.0316019620167614,0.0328067836257485,0.0340614894284614,0.0340614894284614,0.0341145301180979,0.0352946202395574,0.035319351727085,0.0353755648344528,0.0354070364734542,0.0364714474011735,0.0365300078460321,0.0365300078460321,0.0365540667492993,0.0365803864434399,0.0365803864434399,0.0377910425623869,0.0377910425623869,0.0377910425623869,0.0377910425623869,0.0377910425623869,0.0378166936828
964,0.0378166936828964,0.0378166936828964,0.0378166936828964,0.0378166936828964,0.0378446096201193,0.0378446096201193,0.0378747853263698,0.0378747853263698,0.0378747853263698,0.0391120374217133,0.0391120374217133,0.0391120374217133,0.0391120374217133,0.0391120374217133,0.0391778645270624,0.0392141614660769,0.0392141614660769,0.0392141614660769,0.0392141614660769,0.0392141614660769,0.0402300862987281,0.0403226935406603,0.0403515533580218,0.0415651479742212,0.0416238186755191,0.0417288904690102,0.0428098896677898,0.0428993295713217,0.0429337120779973,0.0429337120779973,0.0440569279050769,0.0440837757071892,0.0440837757071892,0.0441443681969443,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0442141225088342,0.0442141225088342,0.0453324388008796,0.0453324388008796,0.0453324388008796,0.0454247786277811,0.0454247786277811,0.0454247786277811,0.0454247786277811,0.0454601546313492,0.0454601546313492,0.0454601546313492,0.0455377705654831,0.0466112118764381,0.0466112118764381,0.0466413260405744,0.0467084763804041,0.0467084763804041,0.0478367045264898,0.0478367045264898,0.0479250237931973,0.0479590969990992,0.0479590969990992,0.0479590969990992,0.0479590969990992,0.0479590969990992,0.0479590969990992,0.0479954783779905,0.0479954783779905,0.0479954783779905,0.0479954783779905,0.0479954783779905,0.0480751455557894,0.0480751455557894,0.0480751455557894,0.0480751455557894,0.0480751455557894,0.0481184214555959,0.0491187565109586,0.0491187565109586,0.0491786116124573,0.0492477575062494,0.0494137589676422,0.0494137589676422,0.0504345221143317,0.0504672721196076,0.0505397478892119,0.0505397478892119,0.0517592625025701,0.0517592625025701,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0519201928139014,0.0519201928139014,0.0519201928139014,0.0519201928139014,0.0530546000146163,0.0531768718957638,0.0543155247820763,0
.0543533020434591,0.0543533020434591,0.055615915632105,0.055615915632105,0.055655386112653,0.055655386112653,0.055655386112653,0.055655386112653,0.0556972010120974,0.055741355269655,0.055741355269655,0.055741355269655,0.055741355269655,0.055741355269655,0.0568808787627047,0.0569196973356454,0.0569608698830132,0.0570043912991329,0.0570043912991329,0.0582697711540526,0.05831500803907,0.05831500803907,0.05831500803907,0.05831500803907,0.05836259325208,0.0596783834851079,0.0596783834851079,0.0596783834851079,0.0596783834851079,0.0596783834851079,0.0727187684900024,0.0740077866714672,0.0752992347330038,0.0753554963823033,0.0766487632175863,0.0780618629356785,0.0781890298399144,0.0781890298399144,0.0783259358529553,0.0783259358529553,0.07924264135199,0.07924264135199,0.0794205594239284,0.0794205594239284,0.079551428503007,0.0805994942877881,0.0806581857138925,0.0807193420094362,0.0808490289258925,0.0809885151347227,0.0809885151347227,0.0809885151347227,0.0809885151347227,0.0812967310572018,0.0813798501393859,0.0813798501393859,0.0819600393449685,0.082020584376985,0.082020584376985,0.082020584376985,0.082020584376985,0.0820835993360382,0.0820835993360382,0.0820835993360382,0.0821490791554575,0.0821490791554575,0.0821490791554575,0.0821490791554575,0.0821490791554575,0.0821490791554575,0.0823602581075981,0.0825934447555855,0.0825934447555855,0.0826760422767417,0.082761066487341,0.082761066487341,0.082761066487341,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0834515892454756,0.0835887618082053,0.083735774793543,0.0841461089242488,0.0841461089242488,0.0841461089242488,0.0846922865810079,0.0847565693159816,0.0847565693159816,0.0848925702635871,0.0854461324645494,0.0855349987062,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0861301965535511,0.0862699816030094,0.0862699816030094,0.0862699816030094,0.0864196683669504,0.086
498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.0865792181538065,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0871923074146537,0.0872503555801866,0.0874395462398282,0.0875076078929734,0.0875076078929734,0.0875076078929734,0.0875076078929734,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0878047108038582,0.0878047108038582,0.0878851762008905,0.0878851762008905,0.0879681079357578,0.0880535013582811,0.0880535013582811,0.0880535013582811,0.0885640503875291,0.0885640503875291,0.0885640503875291,0.0886864316175879,0.0886864316175879,0.0886864316175879,0.0886864316175879,0.0886864316175879,0.088818857459368,0.0888888242409284,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0890362403878722,0.0890362403878722,0.0890362403878722,0.0891136799014978,0.0891136799014978,0.0891136799014978,0.089275997681829,0.089275997681829,0.089275997681829,0.089275997681829,0.0894482024020703,0.0894482024020703,0.0894482024020703,0.089538000835919,0.0900657428371278,0.0900657428371278,0.0901326079009568,0.0901326079009568,0.0901326079009568,0.0902019842058131,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0904251301698234,0.0904251301698234,0.0905045013824362,0.0905863591465226,0.0905863591465226,0.0905863591465226,0.0906706987256182,0.0906706987256182,0.0907575154492772,0.0907575154492772,0.090846804712601,0.090846804712601,0.090846804712601,0.0913826105051333,0.09144
88695835729,0.0915889476471757,0.0915889476471757,0.0917390717925819,0.0917390717925819,0.0917390717925819,0.0917390717925819,0.0917390717925819,0.0918992024262255,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0920693010361488,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0927676529420411,0.0927676529420411,0.0927676529420411,0.0927676529420411,0.0928358330249355,0.0928358330249355,0.0929065401198158,0.0929797691281142,0.0930555150205005,0.0930555150205005,0.0930555150205005,0.0930555150205005,0.093214537683343,0.0932978047367561,0.0932978047367561,0.0933835692391848,0.0933835692391848,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0936558008647126,0.0936558008647126,0.0942266545058739,0.094299298680529,0.094299298680529,0.094299298680529,0.094299298680529,0.094299298680529,0.0943744701719034,0.0943744701719034,0.0943744701719034,0.0943744701719034,0.0945323751469019,0.0945323751469019,0.0946150988231988,0.0946150988231988,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0949710235302174,0.0949710235302174,0.0949710235302174,0.0950662390355332,0.0950662390355332,0.0950662390355332,0.0956213555496632,0.0956213555496632,0.095773072482434,
0.095773072482434,0.095773072482434,0.095773072482434,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.096196516180269,0.096196516180269,0.096196516180269,0.0962887353590812,0.0962887353590812,0.0962887353590812,0.0962887353590812,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0966754557211777,0.0967392384692493,0.0968055853353737,0.0968055853353737,0.0969459502375118,0.0969459502375118,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0971755979462384,0.0971755979462384,0.0971755979462384,0.0971755979462384,0.0971755979462384,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0973413712990135,0.097428045764283,0.097428045764283,0.0975172390596885,0.0976089464980463,0.0976089464980463,0.0976089464980463,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0980682142818908,0.0980682142818908,0.0983465113773677,0.0984224833239981,0.0984224833239981,0.0984224833239981,0.0984224833239981,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0987518094698342,0.0987518094698342,0.0987518094698342,0.0989316671599273,0.0990253780401984,0.0990253780401984,0.0990253780401984,0.0991216041269198,0.0991216041269198,0.0991216041269198,0.0991216041269198,0.0996756187798842,0.0998289540123055,0.0998289540123055,0.0998289540123055,0.0998289540123055,0.0998289540123055,0.0999094557142755,0.0999094557142755,0.09999250
68053664,0.0999925068053664,0.100166237570073,0.100166237570073,0.100166237570073,0.100256907624212,0.100350107827527,0.100544080193036,0.100544080193036,0.100644843278986,0.100644843278986,0.101321886032053,0.101321886032053,0.101321886032053,0.101321886032053,0.101321886032053,0.101406934905605,0.101406934905605,0.101406934905605,0.101584678237673,0.101677363125014,0.101677363125014,0.101677363125014,0.101772583893643,0.101870335929038,0.101870335929038,0.101870335929038,0.101870335929038,0.102738317574205,0.102738317574205,0.102738317574205,0.102738317574205,0.102738317574205,0.103007154303788,0.103298908329428,0.104158773075007,0.104247851639321,0.104247851639321,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104942864013885,0.104942864013885,0.104942864013885,0.105674386130524,0.105674386130524,0.105674386130524,0.105674386130524,0.105864304935508,0.105963103650085,0.106274801952539,0.107011847863434,0.107011847863434,0.107011847863434,0.107399886247951,0.1083518876488,0.108539723485016,0.108539723485016,0.108539723485016,0.108637512537915,0.108637512537915,0.108637512537915,0.109881296197943,0.109978573398824,0.11113129880212,0.11113129880212,0.11113129880212,0.11113129880212,0.111421575843391,0.11247859423247,0.112668461221151],"zmin":0,"zmax":0.03,"xgap":2,"ygap":1,"type":"heatmap","xaxis":"x","yaxis":"y","frame":null}],"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.2,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot"],"base_url":"https://plot.ly"},"evals":[],"jsHooks":[]}</script>
</div>
<div id="phylogenetic-tree" class="section level2">
<h2>Phylogenetic tree</h2>
<p>Above we used the package <a href="http://ape-package.ird.fr/">ape</a> to calculate the genetic distances for the heatmap.</p>
<p>Another way of looking at our alignment data is to use phylogenetic inference. The PhyloPi pipeline saves each step of phylogenetic inference to allow the user to intercept at any step. We can use the <a href="https://en.wikipedia.org/wiki/Newick_format">newick tree file</a> (a text file formatted as newick) and draw our own tree:</p>
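<pre class="r"><code>library(ape)  # provides read.tree(), plot.phylo() and nodelabels()

# Read the Newick-formatted tree file
tree <- read.tree("example-tree.txt")

# Draw the tree using the branch lengths and showing the node labels
plot.phylo(
  tree, cex = 0.8,
  use.edge.length = TRUE,
  tip.color = 'blue',
  align.tip.label = FALSE,
  show.node.label = TRUE
)

# Add a red-framed text label to mark the node discussed below
nodelabels("This one", 9, frame = "r", bg = "red", adj = c(-8.2,-46))</code></pre>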
<p><img src="/post/2019/2019-05-14_analysing-hiv-pandemic-part-3/2019-05-14-analysing-hiv-pandemic-part-3_files/figure-html/unnamed-chunk-9-1.png" width="1152" /></p>
<p>We have highlighted a node with a red block, with the text “This one”, which we can now discuss. This node has three leaves (KM050043, KM050042, and KM050041), and if you look up these accession numbers at <a href="https://www.ncbi.nlm.nih.gov/nuccore/KM050041.1/">NCBI</a>, you will notice the publication they are tied to:</p>
<blockquote>
<p>“HIV transmission. Selection bias at the heterosexual HIV-1 transmission bottleneck”</p>
</blockquote>
<p>In this paper, the authors looked at selection bias when the infection is transmitted. They found that in a pool of viral quasi-species, transmission is biased to benefit the fittest viral quasi-species. The node highlighted above shows the kind of clustering one would expect with a study like the one mentioned above. You will also notice plenty of other nodes, which you can explore using the accession number and searching for it <a href="https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html">here</a>.</p>
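<p>If the tip labels of your tree follow the same pattern as the sequence names in the heatmap above (for example <code>C.ZM.KM050042_INT</code>), the accession numbers can be pulled out with a few lines of base R before searching for them. This is only a minimal sketch: it assumes a <code>subtype.country.accession_INT</code> layout, and the vector <code>labels</code> is a hand-typed sample, not output from the pipeline.</p>
<pre class="r"><code># Hand-typed sample labels (assumed layout: subtype.country.accession_INT)
labels <- c("C.ZM.KM050042_INT", "01_AE.JP.AB253686_INT", "B.US.DQ127546_INT")

# Split each label on the literal "." and keep the third field
parts <- strsplit(labels, ".", fixed = TRUE)
third <- vapply(parts, `[`, character(1), 3)

# Drop the trailing "_INT" suffix to leave the bare accession number
accession <- sub("_INT$", "", third)
accession</code></pre>
<p>With a tree read by <code>read.tree()</code>, the same steps could be applied to <code>tree$tip.label</code>, provided the labels follow this layout.</p>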
<p>The tree above is much like a <a href="https://en.wikipedia.org/wiki/Dendrogram">dendrogram</a> used when displaying <a href="https://en.wikipedia.org/wiki/Hierarchical_clustering#Agglomerative_clustering_example">agglomerative</a> or <a href="https://en.wikipedia.org/wiki/Hierarchical_clustering">hierarchical clustering</a>. The numbers on the tree indicate the probability that the corresponding clusters are correct. The branch lengths indicate the distances between samples. In conjunction with a properly coloured heatmap, this is very useful for finding relevant clusters to investigate. If the reason for close clustering cannot be explained, the tests are repeated.</p>
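<p>To make the link between branch lengths and distances concrete, the sketch below builds a three-tip toy tree from a Newick string and sums the branch lengths between each pair of tips with <code>cophenetic()</code>. The tree and its tip names (<code>A</code>, <code>B</code>, <code>C</code>) are invented for illustration and are not part of the PhyloPi output.</p>
<pre class="r"><code>library(ape)

# Toy tree in Newick format: A and B share a parent, C branches off at the root
toy <- read.tree(text = "((A:0.01,B:0.02):0.05,C:0.10);")

# Pairwise tip-to-tip distances: the sum of branch lengths along the connecting path
d <- cophenetic(toy)

d["A", "B"]  # path A to B: 0.01 + 0.02
d["A", "C"]  # path A to C: 0.01 + 0.05 + 0.10</code></pre>
<p>Closely related tips, such as <code>A</code> and <code>B</code> here, end up with a short path between them, which is the pattern to look for when checking clusters against the heatmap.</p>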
</div>
<div id="the-importance-of-phylogenetics" class="section level2">
<h2>The importance of phylogenetics</h2>
<p>Phylogenetics, and thus genetic distance calculations, are used in many branches of biology. It is one of the quality-control measures at our disposal, and it has also been used to reconstruct the origin of HIV. You may find the research papers listed below interesting; in them, the authors used phylogenetics to infer the zoonotic origins of HIV.</p>
<ul>
<li><p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3234451/">Paul M. Sharp and Beatrice H. Hahn</a></p></li>
<li><p><a href="https://science.sciencemag.org/content/287/5453/607.long">Beatrice H. Hahn et al.</a></p></li>
</ul>
<p>As another example, in 1998, six foreign medical workers were accused of deliberately infecting hospitalized children with HIV and were <a href="https://en.wikipedia.org/wiki/HIV_trial_in_Libya">sentenced to death in Libya</a>. In 2006, <a href="https://www.nature.com/articles/444836a">de Oliveira, et al.</a> used phylogenetics to provide evidence that the origin of the HIV strains that infected the children had an evolutionary history in the mid-90s, which was before the health care workers arrived in 1998. The six medics were released in 2007. There is also a very good writeup on the case by <a href="https://www.nature.com/articles/444658b">Declan Butler</a>. Although probably very emotional, this would be a great movie.</p>
<p>These techniques are also used in criminal convictions. However, the interpretation of this kind of evidence in court cases can be unsafe. The insights of <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1971185/">Pillay, et al.</a> should bring this to light.</p>
</div>
<div id="summary" class="section level2">
<h2>Summary</h2>
<p>In this post we discussed how, as infections spread from person to person, the virus continues to mutate and becomes more and more divergent. This allows us to use the genetic information obtained during drug resistance testing to analyse the sequences for abnormalities.</p>
<p>We then showed how to compute genetic distance using multiple sequence alignment (MSA) and that it’s possible to model this process as a Markov chain. You can then view the resulting model as a heatmap or a phylogenetic tree.</p>
<p>This finds practical application in diverse situations, for example shedding light on the origin of HIV, as well as providing evidence in legal trials.</p>
</div>
<div id="whats-next" class="section level2">
<h2>What’s next</h2>
<p>In the fourth and final part of this series, we will show how we analysed the inter- and intra-patient genetic distances of HIV sequences by logistic regression. This was useful for properly colouring the heatmap explained in this series. See you there!</p>
</div>
Analysing the HIV pandemic, Part 2: Drug resistance testing
https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/
Tue, 07 May 2019 00:00:00 +0000https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/
<p><em>Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the <a href="https://www.ufs.ac.za/health/departments-and-divisions/virology-home">Division of Virology</a>, <a href="https://www.ufs.ac.za">University of the Free State</a>, and <a href="http://www.nhls.ac.za/">National Health Laboratory Service (NHLS)</a>, Bloemfontein, South Africa</em></p>
<p><em>Dominique Goedhals is a pathologist, researcher, and lecturer at the <a href="https://www.ufs.ac.za/health/departments-and-divisions/virology-home">Division of Virology</a>, <a href="https://www.ufs.ac.za">University of the Free State</a>, and <a href="http://www.nhls.ac.za/">National Health Laboratory Service (NHLS)</a>, Bloemfontein, South Africa</em></p>
<p><em>Andrie de Vries is the author of “R for Dummies”, and a Solutions Engineer at RStudio</em></p>
<div id="introduction" class="section level2">
<h2>Introduction</h2>
<p>In <a href="https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/">part 1</a> of this four-part series about HIV/AIDS, we discussed the <a href="https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/">HIV pandemic in Sub-Saharan Africa</a>. In this second installment, we cover a recent publication in the <a href="https://journals.plos.org/plosone/">PLoS ONE journal</a>: “<a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241">PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility</a>”.</p>
<p>The authors described how they used affordable hardware to create a <a href="https://en.wikipedia.org/wiki/Phylogenetics">phylogenetic</a> pipeline, tailored for the HIV drug-resistance testing facility.</p>
</div>
<div id="hiv-drug-resistance" class="section level2">
<h2>HIV drug resistance</h2>
<p>Natural selection is the process by which some form of selective pressure favours a <strong>phenotypic</strong> trait or change. These phenotypic traits can be the blood group of a person, whether a pea is wrinkly or not, or whether an infectious organism is susceptible or resistant to a drug. Many times these phenotypic traits, or physical attributes, are caused by genetics.</p>
<p><strong>Genotyping</strong> is the process by which one can infer this phenotypic trait from a genotype, and this is used more and more frequently in medicine. For example, in breast cancer treatment, the BRCA (BReast CAncer) genes are genotyped to determine whether these cancer suppressing genes are intact. If there is a deleterious or damaging mutation in one of these genes, it can increase the risk of developing breast cancer, thus a phenotype of increased risk of breast cancer.</p>
<p>For most organisms, the copying of genetic material happens by very precise enzymes or pathways, but occasionally mutations do occur. If a mutation occurs and is sufficiently damaging, it gets removed from the gene pool. However, if the mutation is sufficiently beneficial, it increases the survival of this genetic variant, and selection will tend to favour it.</p>
<p>In the <a href="https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/">previous post</a>, we discussed <strong>ARVs</strong> (antiretrovirals) and how these drugs changed the landscape of HIV infection by preventing the development of AIDS. We mentioned that ARVs suppress viral replication. One of the steps in HIV replication is the conversion of its single-stranded RNA to DNA, which can then be incorporated into the DNA of infected cells. The enzyme responsible for this conversion is reverse transcriptase, and it has a high error rate when doing this conversion. One can thus say that HIV has a high evolutionary rate, or mutation rate. The viral genes are translated into viral proteins, which are required to make more virions (viral particles). Proteins are strings or polymers of amino acid residues with an alphabet of 20 choices of amino acids or letters. The sequence of the DNA or RNA influences the sequence of the protein; thus, mutations in the DNA or RNA can result in changes in the protein, and our targets for stopping HIV replication are proteins/enzymes.</p>
<p>There are various classes of ARVs which interfere with viral replication by inhibition of viral enzymes. If the DNA or RNA sequence encoding this enzyme is changed, the result might be an unfit virus not capable of further infection or replication. On the other hand, if this mutation results in an ARV-resistant virus, replication and infection can still continue in the presence of the ARV in question, possibly causing the ARV to become ineffective in stopping replication.</p>
<p>The question remains, why do people develop resistance? The short answer: it’s a numbers game.</p>
<p>If the patient receives the correct regimen of ARVs (known as <strong>HAART</strong>, or highly active antiretroviral treatment) and is taking the doses correctly, the viral load will be suppressed. Suppression is caused by stopping viral replication, and if the virus is not replicating, the error-prone reverse transcriptase can’t cause mutations, which in turn cannot be favoured by selective pressure. If the patient is not taking any treatment, the virus is replicating and thus inevitably mutating, but there is no selective pressure to select for these variants. Lastly, if the patient is adhering poorly to the treatment, there are times when the levels of the treatment are too low to suppress viral replication completely. In this scenario, mutants with a mutation which makes them less susceptible to the treatment will replicate more than their wild type counterparts; these are called escape mutants.</p>
<p>The reason why this is a numbers game is that the virus is mutating randomly and one resulting amino acid residue could be replaced by any of 19 other amino acid residues. It is only when this change causes an increase in replicative fitness while there is some form of selective pressure that this mutant can become a dominant quasi-species and the patient develops resistance.</p>
<p>Mutations are expressed using the notation <code>[WT AA][POS][Mutant AA]</code>, where:</p>
<ul>
<li>WT denotes wild type (the typical genotype)</li>
<li>AA denotes amino acid residue</li>
<li>POS denotes the position on the protein</li>
<li>Mutant means the changed genotype</li>
</ul>
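<p>As a small illustration of this notation, the sketch below splits mutation strings into their three parts using base R regular expressions. The function name <code>parse_mutation()</code> and the example input are our own illustrative choices, not part of any HIV drug resistance tool, and the pattern assumes single-letter amino acid codes with no insertions, deletions or mixtures.</p>
<pre class="r"><code># Hypothetical helper: split "M184V"-style strings into wild type, position and mutant
parse_mutation <- function(mutations) {
  pattern <- "^([A-Z])([0-9]+)([A-Z])$"
  parts <- regmatches(mutations, regexec(pattern, mutations))
  ok <- lengths(parts) == 4                      # full match plus three capture groups
  field <- function(i) vapply(parts[ok], function(p) p[i], character(1))
  out <- data.frame(
    mutation  = mutations,
    wild_type = NA_character_,
    position  = NA_integer_,
    mutant    = NA_character_,
    stringsAsFactors = FALSE
  )
  out$wild_type[ok] <- field(2)
  out$position[ok]  <- as.integer(field(3))
  out$mutant[ok]    <- field(4)
  out                                            # unparseable strings keep NA fields
}

parse_mutation(c("M184V", "K65R"))</code></pre>
<p>Strings that do not follow the simple pattern are returned with <code>NA</code> fields rather than raising an error, which makes it easy to spot entries that need a closer look.</p>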
<p>We mentioned some classes of ARVs in part 1. To the viral reverse transcriptase, <strong>NRTIs</strong> (Nucleoside/Nucleotide Reverse Transcriptase Inhibitors) look like the building blocks of DNA called nucleotides. If the reverse transcriptase incorporates one of these ‘fake’ nucleotides, it is not able to further extend the DNA strand, leaving it incomplete, thus interfering with replication. Not all mutations cause the same level of resistance. These levels are:</p>
<table>
<thead>
<tr class="header">
<th>Level</th>
<th>Total score</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Susceptible</td>
<td>0 to 9</td>
</tr>
<tr class="even">
<td>Potential low-level resistance</td>
<td>10 to 14</td>
</tr>
<tr class="odd">
<td>Low-level resistance</td>
<td>15 to 29</td>
</tr>
<tr class="even">
<td>Intermediate resistance</td>
<td>30 to 59</td>
</tr>
<tr class="odd">
<td>High-level resistance</td>
<td>>= 60</td>
</tr>
</tbody>
</table>
<p><a href="https://hivdb.stanford.edu/page/release-notes/">Source</a></p>
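<p>The table above can be turned into a small lookup function. This is only a sketch using base R’s <code>cut()</code>; the function name <code>resistance_level()</code> is hypothetical, and treating negative totals as “Susceptible” is our assumption, since the table starts at 0.</p>
<pre class="r"><code># Hypothetical helper: map a total mutation score to the resistance level in the table
resistance_level <- function(total_score) {
  cut(
    total_score,
    breaks = c(-Inf, 9, 14, 29, 59, Inf),   # upper bound of each band, inclusive
    labels = c(
      "Susceptible",
      "Potential low-level resistance",
      "Low-level resistance",
      "Intermediate resistance",
      "High-level resistance"
    ),
    right = TRUE
  )
}

resistance_level(c(0, 10, 15, 45, 60))</code></pre>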
<p>We can plot resistance scores for five commonly used NRTIs.</p>
<pre class="r"><code>suppressPackageStartupMessages({
  library(dplyr)
  library(readr)
  library(stringr)
  library(tidyr)
  library(ggplot2)
  library(knitr)
  library(broom)
})</code></pre>
<pre class="r"><code># Read the NRTI mutation scores (tab-separated file)
nrti_dr_scores <- read_tsv("ScoresNRTI_1555579653110.tsv", col_types = "cdcddddddd")
nrti_dr_scores %>%
  select(Rule, ABC:AZT, FTC:TDF) %>%   # keep the rule name and the five NRTI score columns
  gather(arv, score, 2:6) %>%          # wide to long: one row per rule and drug
  filter(!grepl(" ", Rule)) %>%        # drop rules whose name contains a space
  mutate(effect = ifelse(score > 0, "resistance", "hyper-susceptible")) %>%
  ggplot(aes(x = Rule, y = score, fill = effect)) +
  geom_col() +
  coord_flip() +
  theme_bw() +
  facet_grid(. ~ arv)</code></pre>
<p><img src="/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-1-1.png" width="960" /></p>
<p>We can see that 3TC and FTC have the exact same profiles, and they are chemically also very similar, as shown in the figure below.</p>
<hr />
<div class="figure" style="text-align: center">
<img src="/post/2019-05-07-analysis-hiv-pandemic-part-2_files/lamivu10.gif" alt="The chemical structures of 3TC (left) and FTC (right). Available at http://aras.ab.ca/articles/HAART-Nukes-AIDS-Umber" style="margin:50px 10px" />
<p class="caption">
The chemical structures of 3TC (left) and FTC (right). Available at <a href="http://aras.ab.ca/articles/HAART-Nukes-AIDS-Umber" class="uri">http://aras.ab.ca/articles/HAART-Nukes-AIDS-Umber</a>
</p>
</div>
<hr />
<p>Also, note that some of the mutations increase susceptibility for AZT and TDF, indicated by a negative value for resistance. This is called <strong>hyper-susceptibility</strong>, and is used by clinicians treating patients.</p>
<p>For example, the mutation <strong>M184V</strong> means that the wild type AA at position 184 is a methionine (M) and it has been mutated to valine (V). Although this mutation makes the virus highly resistant to 3TC, it has a crippling effect on viral replication, i.e., the virus can still replicate in the presence of 3TC, but more slowly. This mutation also makes the virus hyper-susceptible to AZT and TDF. Clinicians use this knowledge by keeping patients on 3TC in order to maintain the selective pressure for M184V, and using AZT or TDF as the other NRTI. It is typical to have a patient on two NRTIs, sometimes referred to as the “backbone”, and then one drug from another drug class to which the patient is fully susceptible. Knowing the genotype of the virus allows us to infer the phenotype, which in this case is the drug-resistance profile.</p>
</div>
<div id="phylopi-an-affordable-purpose-built-phylogenetic-pipeline-for-the-hiv-drug-resistance-testing-facility" class="section level2">
<h2>PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility</h2>
<p>The goal of HIV drug resistance genotyping is to determine which drugs will produce the best response in the patient, and, as mentioned earlier, we use the viral sequence information for this. Because HIV evolves so rapidly, we can also use the sequence data for quality assurance. <strong>PCR</strong> (polymerase chain reaction) is very sensitive to contamination, and if gross cross-contamination occurred during this process, the sequences of, say, two unrelated individuals would be suspiciously similar. Also, the viral sequences of a patient over time will be more similar than the sequences between different people.</p>
<p>Let’s say we genotyped a patient five years ago and we have a current genotype sequence. It should be possible to retrieve the previous sequence from a database of sequences without relying on identifiers only, or at all. People may change their surname when they marry, and transcription errors can be made, which makes finding previous samples by name tedious and error-prone. So instead of using patient information to look for previous samples to include, we can use the sequence data itself, and then confirm that the sequences belong to the same patient or investigate any irregularities. If we suspect mother-to-child transmission from our analysis, we confirm this with the health care worker who sent the sample.</p>
<p>We recently published an automated pipeline for maintaining a sequence database, automatically retrieving the most similar sequences from previously genotyped viral isolates, calculating genetic distances, and performing phylogenetic inference. Let’s look at each of these steps.</p>
<p>Firstly, we cannot conduct phylogenetic analysis on all past and present sequences; this would be very computationally expensive and time-consuming, and the result will be very difficult to interpret. Rather, we want to focus on the current batch of sequences the laboratory generated, but also the most similar sequences from previous batches stored in our rolling database:</p>
<ul>
<li>We used a tool called <a href="https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download"><code>BLAST</code></a> (Basic Local Alignment Search Tool) for this. This tool is used to add our new submissions to the current rolling database and then also retrieve the most similar previous sequences.</li>
<li>These sequences are aligned using <a href="https://mafft.cbrc.jp/alignment/software/"><code>MAFFT</code></a>.</li>
<li>The resulting multiple sequence alignment is automatically curated with <a href="http://trimal.cgenomics.org/"><code>trimAl</code></a>.</li>
</ul>
<p>Finally, the sequences are ready for phylogenetic inference.</p>
<ul>
<li>For this, we used <a href="http://www.microbesonline.org/fasttree/"><code>FastTree</code></a>. As its name implies, it is fast, and it can handle large datasets while requiring minimal resources.</li>
<li>The resulting tree is rendered using the <a href="http://etetoolkit.org/"><code>ETE3</code></a> Python API.</li>
<li>R is used to calculate a distance matrix from the multiple sequence alignment with the <a href="https://cran.r-project.org/web/packages/ape/index.html"><code>ape</code></a> package, and <a href="https://plot.ly/r/"><code>plotly</code></a> is used for visualization.</li>
</ul>
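<p>To give a feel for the last step, here is a minimal sketch of computing a distance matrix from an alignment with <code>ape</code>. The three toy sequences are made up, and the choice of <code>model = "raw"</code> (the plain proportion of differing sites) is our assumption for illustration; it is not necessarily the model used in PhyloPi.</p>
<pre class="r"><code>library(ape)

# Toy alignment: three already-aligned sequences of equal length (made up for illustration)
toy_alignment <- rbind(
  s1 = strsplit("acgtacgtac", "")[[1]],
  s2 = strsplit("acgtacgtac", "")[[1]],
  s3 = strsplit("acgtaagtat", "")[[1]]
)

aln <- as.DNAbin(toy_alignment)

# model = "raw" gives the proportion of sites that differ between each pair
d <- dist.dna(aln, model = "raw")
dist_matrix <- as.matrix(d)
dist_matrix</code></pre>
<p>In a real run the alignment would be read from the curated FASTA file, for example with <code>read.dna(file, format = "fasta")</code>, instead of being typed in by hand.</p>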
<p>In part 3 of this series, we will talk more about the distance matrix calculation and how logistic regression was used to look at inter- and intra-patient genetic distances of HIV sequences by mining a large public database, the <a href="https://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html">Los Alamos HIV sequence database</a>. This was important, as the insights gained here were used to colour the distance matrix so that the user’s attention is drawn to relevant samples.</p>
<p>This is an R for medicine blog post, but there is a lot of jargon in the paragraphs above. We will clear things up a bit in the next section; for the full details, please check out our <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241">publication</a>.</p>
</div>
<div id="how-does-it-work" class="section level2">
<h2>How does it work?</h2>
<p>Firstly, our DNA sequences are strings over a four-letter alphabet: A, C, G, and T. Genetic distances are much like <a href="https://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein</a> or <a href="https://en.wikipedia.org/wiki/Hamming_distance">Hamming</a> distances, or other <a href="https://en.wikipedia.org/wiki/Edit_distance">edit distance</a> measures.</p>
<div id="raw-strings" class="section level3">
<h3>Raw strings</h3>
<p>Consider the following strings, A, B and C:</p>
<pre><code>A: peter kicked the ball really far
B: i think it was yesterday when peter kicked the ball really far
C: pieter kicked the round ball really hard</code></pre>
<p>We can see that there are obvious similarities between these three sentences, but it would be much easier if they were aligned.</p>
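<p>For comparison, a plain edit distance can be computed on the raw strings with base R’s <code>adist()</code>, which returns the generalized Levenshtein distance. This is only an analogy for intuition and not part of the pipeline.</p>
<pre class="r"><code>sentences <- c(
  A = "peter kicked the ball really far",
  B = "i think it was yesterday when peter kicked the ball really far",
  C = "pieter kicked the round ball really hard"
)

# adist() (utils package, shipped with R) computes pairwise Levenshtein distances
edit_dist <- adist(sentences)
dimnames(edit_dist) <- list(names(sentences), names(sentences))
edit_dist</code></pre>
<p>The distance between A and B is 30, and all of it comes from the extra text at the start of B. An unsampled stretch of text can therefore dominate a raw edit distance even when the shared part is identical.</p>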
</div>
<div id="aligned-strings" class="section level3">
<h3>Aligned strings</h3>
<pre><code>A: ______________________________p eter kicked the _____ ball really far
B: i think it was yesterday when p eter kicked the _____ ball really far
C: ______________________________pieter kicked the round ball really hard</code></pre>
<p>By aligning the strings, it is much easier to calculate the similarities or differences.</p>
</div>
<div id="curated-strings" class="section level3">
<h3>Curated strings</h3>
<p>Next, we remove the overhangs, since it is possible that in reality strings A and C also had more text on the left-hand side that was simply not sampled. Depending on the situation, one could also remove the internal ‘gaps’, like the word ‘round’. For our pipeline, insertions and deletions, like the letter ‘i’ and the word ‘round’ in our example, are real features we would like to include. We also have a substitution in C, where the ‘f’ in A and B was changed to an ‘h’.</p>
<pre><code>A: p eter kicked the _____ ball really far
B: p eter kicked the _____ ball really far
C: pieter kicked the round ball really har</code></pre>
</div>
<div id="calculation" class="section level3">
<h3>Calculation</h3>
<pre><code>A: p eter kicked the _____ ball really far
B: p eter kicked the _____ ball really far
M: 111111 111111 111 11111 1111 111111 111</code></pre>
<p>We can see that for A and B we have matches for all of the features. If we sum up all the ones, we get 33, so the distance between them is:</p>
<p><span class="math display">\[ d = \frac{33 - 33}{33} = 0\]</span></p>
<pre><code>B: p eter kicked the _____ ball really far
C: pieter kicked the round ball really har
M: 101111 111111 111 00000 1111 111111 011</code></pre>
<p><span class="math display">\[ d = \frac{33 - 26}{33} = 0.212\]</span></p>
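<p>The same calculation can be sketched in a few lines of R. The function name <code>alignment_distance()</code> is hypothetical, and for clarity we write every gap as <code>-</code> (the example above shows gaps as a blank or as underscores). Blanks that separate words in both strings are not counted as features, which gives the 33 features used above.</p>
<pre class="r"><code># Hypothetical helper: proportion of mismatching features between two aligned strings
alignment_distance <- function(x, y) {
  a <- strsplit(x, "", fixed = TRUE)[[1]]
  b <- strsplit(y, "", fixed = TRUE)[[1]]
  stopifnot(length(a) == length(b))       # the strings must already be aligned
  sites <- !(a == " " & b == " ")         # ignore blanks that separate words in both strings
  matches <- sum(a[sites] == b[sites])
  (sum(sites) - matches) / sum(sites)
}

A <- "p-eter kicked the ----- ball really far"
B <- "p-eter kicked the ----- ball really far"
C <- "pieter kicked the round ball really har"

alignment_distance(A, B)   # 0
alignment_distance(B, C)   # 7 / 33, about 0.212</code></pre>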
<p>After the multiple sequence alignment and curation, each sequence is compared with every other sequence in order to calculate a distance matrix. This can then be used to create a phylogenetic tree, which resembles the kind of dendrogram that can be calculated using hierarchical clustering. The above is very simplified, but should give enough background to understand the rest of the post. The resource at <a href="https://www.ebi.ac.uk/training/online/course/introduction-phylogenetics/what-phylogenetics">EMBL-EBI Train Online</a> is a good place to get started if you want to know more.</p>
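<p>As a rough illustration of the dendrogram analogy, the sketch below clusters the three example strings from their pairwise distances using base R’s <code>hclust()</code>. Average linkage is an arbitrary choice made here. PhyloPi itself infers trees with FastTree, which uses an approximately-maximum-likelihood approach, so this shows the idea only and not the method.</p>
<pre class="r"><code># Pairwise distances between the curated strings A, B and C from the example
d <- matrix(
  c(0,    0,    7/33,
    0,    0,    7/33,
    7/33, 7/33, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)

# Average-linkage hierarchical clustering (an illustrative choice of method)
hc <- hclust(as.dist(d), method = "average")
plot(hc, main = "Toy dendrogram", xlab = "", sub = "")

# Cutting the tree into two groups puts A and B together and C apart
groups <- cutree(hc, k = 2)
groups</code></pre>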
</div>
</div>
<div id="the-pipeline-on-a-raspberry-pi" class="section level2">
<h2>The pipeline on a Raspberry Pi</h2>
<p>The <a href="https://www.raspberrypi.org/">Raspberry Pi</a> is a small and cheap single-board computer. It is popular among hobbyists for all kinds of projects, for example:</p>
<ul>
<li><a href="https://pyvideo.org/pycon-us-2012/militarizing-your-backyard-with-python-computer.html">Militarizing Your Backyard with Python: Computer Vision and the Squirrel Hordes</a></li>
<li><a href="https://hackaday.com/2013/01/20/raspberry-pi-and-r/">Brewing beer with the help of R</a></li>
<li><a href="https://retropie.org.uk/">Retro gaming machines</a></li>
</ul>
<p>One of the motivations behind developing this computer was to teach kids to <a href="http://blog.sparkfuneducation.com/teaching-coding-to-kids-using-raspberry-pi-3-and-scratch">code or engage in electronics</a>.</p>
<p>All of the above are very important, but the Raspberry Pi has made its way into <strong>science and medicine</strong> as well. For example, a group developed a cheap <a href="https://pubs.rsc.org/en/content/articlehtml/2017/sc/c7sc03281a">instrument</a> to diagnose Ebola virus infection in the field. Researchers can attach various sensors to the Raspberry Pi and use it for data collection.</p>
<div id="benchmarking" class="section level3">
<h3>Benchmarking</h3>
<p>For our application, we needed to show that the Pi can handle the problem we wanted it to solve, so we did some benchmarking.</p>
<p>We used <a href="https://www.seleniumhq.org/">Selenium WebDriver</a> to operate the pipeline as a human would, by actually browsing for an input file and submitting it through the button. Time stamps were taken for each step, and the number of blast hits that were included in the phylogenetic inference was also recorded. For this exercise, we set the number of closest sequences to retrieve for each sample to 5, which means the submitted sample and 4 of the genetically closest samples. However, it is possible that different submitted sequences have retrieved a sequence in common; these will be included in the analysis only once. When we start analyzing this data, we will see this.</p>
<pre class="r"><code># Read csv with time data
time_dat <- read_csv(
  "timeFile.csv",
  col_types = "ccd",
  col_names = c("Run", "Description", "Measure")
)
head(time_dat) %>%
  kable(caption = "First few lines of the benchmarking data.")</code></pre>
<table>
<caption><span id="tab:import">Table 1: </span>First few lines of the benchmarking data.</caption>
<thead>
<tr class="header">
<th align="left">Run</th>
<th align="left">Description</th>
<th align="right">Measure</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">final5best_random_1</td>
<td align="left">blastHits</td>
<td align="right">5.000000</td>
</tr>
<tr class="even">
<td align="left">final5best_random_1</td>
<td align="left">blast</td>
<td align="right">11.219230</td>
</tr>
<tr class="odd">
<td align="left">final5best_random_1</td>
<td align="left">mafftTime</td>
<td align="right">13.404623</td>
</tr>
<tr class="even">
<td align="left">final5best_random_1</td>
<td align="left">trimalTime</td>
<td align="right">0.111737</td>
</tr>
<tr class="odd">
<td align="left">final5best_random_1</td>
<td align="left">fasttreeTime</td>
<td align="right">0.986582</td>
</tr>
<tr class="even">
<td align="left">final5best_random_1</td>
<td align="left">heatmapTime</td>
<td align="right">2.354820</td>
</tr>
</tbody>
</table>
<p>The <code>Run</code> column shows some info regarding the benchmarking experiment. We asked for the five best hits to be included, and the sequences were pseudo-randomly selected. We started with one sequence for submission and then incremented this by one up to 50. The above again shows that data is not always in the most convenient format to work with. We need to extract the digits at the end of the <code>Run</code> variable. Previously we used the <code>tidyr::gather()</code> function to pivot data from wide to long. This time we will use the <code>spread()</code> function to make long data wide.</p>
<pre class="r"><code>time_dat <- time_dat %>%
  mutate(nSubmitted = str_extract(Run, "\\d+$") %>% as.numeric()) %>%
  select(-Run) %>%
  spread(Description, Measure)
head(time_dat) %>%
  kable(caption = "First few lines of the benchmarking data after some cleaning.")</code></pre>
<table>
<caption><span id="tab:unnamed-chunk-2">Table 2: </span>First few lines of the benchmarking data after some cleaning.</caption>
<thead>
<tr class="header">
<th align="right">nSubmitted</th>
<th align="right">blast</th>
<th align="right">blastHits</th>
<th align="right">fasttreeTime</th>
<th align="right">heatmapTime</th>
<th align="right">mafftTime</th>
<th align="right">renderTime</th>
<th align="right">trimalTime</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="right">1</td>
<td align="right">11.21923</td>
<td align="right">5</td>
<td align="right">0.986582</td>
<td align="right">2.354820</td>
<td align="right">13.40462</td>
<td align="right">1.686239</td>
<td align="right">0.1117370</td>
</tr>
<tr class="even">
<td align="right">2</td>
<td align="right">22.08694</td>
<td align="right">10</td>
<td align="right">3.129514</td>
<td align="right">2.369152</td>
<td align="right">30.26920</td>
<td align="right">1.890183</td>
<td align="right">0.2699649</td>
</tr>
<tr class="odd">
<td align="right">3</td>
<td align="right">33.67705</td>
<td align="right">15</td>
<td align="right">5.480334</td>
<td align="right">2.400223</td>
<td align="right">47.42213</td>
<td align="right">2.107776</td>
<td align="right">0.4849610</td>
</tr>
<tr class="even">
<td align="right">4</td>
<td align="right">43.58782</td>
<td align="right">21</td>
<td align="right">4.627502</td>
<td align="right">2.437273</td>
<td align="right">76.47209</td>
<td align="right">2.243336</td>
<td align="right">0.7980120</td>
</tr>
<tr class="odd">
<td align="right">5</td>
<td align="right">55.43246</td>
<td align="right">25</td>
<td align="right">10.753521</td>
<td align="right">2.476636</td>
<td align="right">105.21836</td>
<td align="right">2.494058</td>
<td align="right">1.0820050</td>
</tr>
<tr class="even">
<td align="right">6</td>
<td align="right">65.18629</td>
<td align="right">30</td>
<td align="right">9.688977</td>
<td align="right">2.516058</td>
<td align="right">128.93219</td>
<td align="right">2.653201</td>
<td align="right">1.4656579</td>
</tr>
</tbody>
</table>
<p>We got rid of the useless data in the <code>Run</code> variable and extracted the useful information into the <code>nSubmitted</code> variable.</p>
<p>Below are the explanations for the variables.</p>
<ul>
<li><code>nSubmitted</code>: Number of sequences submitted or uploaded to the pipeline</li>
<li><code>blast</code>: time in seconds for blast to find most similar previously sequenced samples</li>
<li><code>blastHits</code>: the number of sequences retrieved</li>
<li><code>mafftTime</code>: the time it took to create a multiple-sequence alignment</li>
<li><code>trimalTime</code>: the time it took to clean the multiple-sequence alignment</li>
<li><code>fasttreeTime</code>: the time it took for phylogenetic inference</li>
<li><code>heatmapTime</code>: the time it took to produce the heatmap</li>
<li><code>renderTime</code>: the time it took to render the tree</li>
</ul>
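<p>In newer versions of <code>tidyr</code> (1.0.0 and later), <code>gather()</code> and <code>spread()</code> are superseded by <code>pivot_longer()</code> and <code>pivot_wider()</code>. As a sketch, the same reshaping could be written as follows. The small <code>toy</code> table is hand-typed from the first two runs shown above, so the block runs without the original <code>timeFile.csv</code>.</p>
<pre class="r"><code>library(dplyr)
library(stringr)
library(tidyr)

# A few rows in the same long format as the benchmarking file
toy <- tibble::tribble(
  ~Run,                  ~Description, ~Measure,
  "final5best_random_1", "blastHits",   5,
  "final5best_random_1", "blast",      11.219230,
  "final5best_random_2", "blastHits",  10,
  "final5best_random_2", "blast",      22.08694
)

wide <- toy %>%
  mutate(nSubmitted = as.numeric(str_extract(Run, "\\d+$"))) %>%   # trailing digits of Run
  select(-Run) %>%
  pivot_wider(names_from = Description, values_from = Measure)     # long to wide

wide</code></pre>
<p>One visible difference is column order: <code>spread()</code> sorts the new columns alphabetically, whereas <code>pivot_wider()</code> keeps them in order of first appearance.</p>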
<div id="number-of-sequences-submitted-vs.-most-similar-sequences-retrieved" class="section level4">
<h4>Number of sequences submitted <em>vs.</em> most similar sequences retrieved</h4>
<pre class="r"><code>time_dat %>%
  ggplot(aes(x = nSubmitted, y = blastHits)) +
  geom_smooth(method = lm, se = FALSE, colour = "black", formula = y ~ x - 1, size = 0.25) +
  geom_point() +
  theme_bw() +
  xlab("Number of sequences submitted") +
  ylab("Number of sequences retrieved using blastn") +
  annotate("text", x = 41, y = 72, label = "y == 4.628 * x", parse = TRUE) +
  annotate("text", x = 40, y = 60, label = "R^2 == 0.998", parse = TRUE)</code></pre>
<p><img src="/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-3-1.png" width="672" /></p>
<pre class="r"><code>fit <- lm(blastHits ~ nSubmitted - 1, data = time_dat)
tidy(fit) %>%
kable(caption = "Regression analysis of the number of blast hits retrieved.") </code></pre>
<table>
<caption><span id="tab:unnamed-chunk-4">Table 3: </span>Regression analysis of the number of blast hits retrieved.</caption>
<thead>
<tr class="header">
<th align="left">term</th>
<th align="right">estimate</th>
<th align="right">std.error</th>
<th align="right">statistic</th>
<th align="right">p.value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">nSubmitted</td>
<td align="right">4.628026</td>
<td align="right">0.0280312</td>
<td align="right">165.1026</td>
<td align="right">0</td>
</tr>
</tbody>
</table>
<p>A straight line fits the data really well. We mentioned that if different sequences retrieve the same sequence from the database, it is used only once. The slope of this line will depend on the genetic diversity of the database. A more diverse database will have a steeper slope, whereas a less diverse database will have a shallower slope. Also, theoretically, at some point the line will reach an asymptote as the number of requested sequences starts to saturate the number of available sequences. Practically, one would not have to submit more than 16 - 24 samples at a time; thus, we are in the linear part of the rarefaction curve. We can thus see that, for the Los Alamos data used in the analysis, roughly 4.5 sequences (fitted slope 4.63) get retrieved for every sequence submitted.</p>
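<p>The de-duplication that keeps the slope below 5 can be sketched as follows. The sample and database identifiers are invented for illustration; each submitted sample brings itself plus its four closest database hits, and hits shared between samples are counted once.</p>
<pre class="r"><code># Hypothetical hit lists: each submitted sample plus its four closest database sequences
hits <- list(
  sampleA = c("sampleA", "db_0012", "db_0340", "db_0977", "db_1503"),
  sampleB = c("sampleB", "db_0340", "db_0412", "db_0977", "db_2001")
)

# Sequences retrieved by more than one sample are included only once
to_analyse <- unique(unlist(hits, use.names = FALSE))

length(unlist(hits))   # 10 requested in total
length(to_analyse)     # 8 unique sequences go into the alignment</code></pre>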
</div>
<div id="blast-time-vs.-number-of-sequences-submitted" class="section level4">
<h4>BLAST time <em>vs.</em> number of sequences submitted</h4>
<pre class="r"><code>time_dat %>%
  ggplot(aes(x = nSubmitted, y = blast)) +
  geom_smooth(method = lm, se = FALSE, colour = "black", formula = y ~ x, size = 0.25) +
  geom_point(colour = "blue") +
  theme_bw() +
  xlab("Number of input sequences") + ylab("Time in seconds (blastn)") +
  annotate("text", x = 41, y = 90, label = "y == 11.0453 * x", parse = TRUE) +
  annotate("text", x = 40, y = 60, label = "R^2 == 0.9999", parse = TRUE)</code></pre>
<p><img src="/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-5-1.png" width="672" /></p>
<pre class="r"><code>fit <- lm(time_dat$blast ~ time_dat$nSubmitted)
tidy(fit) %>%
kable(caption = "Regression analysis of blastn time vs. number of sequences.") </code></pre>
<table>
<caption><span id="tab:unnamed-chunk-6">Table 4: </span>Regression analysis of blastn time vs. number of sequences.</caption>
<thead>
<tr class="header">
<th align="left">term</th>
<th align="right">estimate</th>
<th align="right">std.error</th>
<th align="right">statistic</th>
<th align="right">p.value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">(Intercept)</td>
<td align="right">-0.8176139</td>
<td align="right">0.5185500</td>
<td align="right">-1.576731</td>
<td align="right">0.121426</td>
</tr>
<tr class="even">
<td align="left">time_dat$nSubmitted</td>
<td align="right">11.0453236</td>
<td align="right">0.0176978</td>
<td align="right">624.105409</td>
<td align="right">0.000000</td>
</tr>
</tbody>
</table>
<p>Again, we see a linear relationship between the number of sequences submitted and the time <code>blastn</code> takes to complete. For every sequence submitted, it takes about 11 seconds to search a database of about 11,000 sequence entries. We can say that <code>blastn</code> displays linear time complexity, or <span class="math inline">\(O(n)\)</span> time. We did not discover anything new here. Remember, the purpose of this is to show off the Pi flexing its muscles. (You can read about the BLAST algorithm <a href="https://www.ncbi.nlm.nih.gov/pubmed/2231712">here</a>.)</p>
</div>
<div id="multiple-sequence-alignment-time-vs.-number-of-total-sequences-submitted-and-retrieved" class="section level4">
<h4>Multiple sequence alignment time <em>vs.</em> number of total sequences, submitted and retrieved</h4>
<pre class="r"><code>fit <- lm(mafftTime ~ I(blastHits^2) - 1, data = time_dat)
time_dat %>%
  ggplot(aes(x = blastHits, y = mafftTime)) +
  geom_point(colour = "blue") +
  geom_smooth(method = "lm", formula = y ~ I(x^2) - 1, colour = "black", size = 0.25) +
  annotate("text", x = 190, y = 1800, label = "y == 0.09997 * x^2", parse = TRUE) +
  theme_bw() +
  xlab("Number of sequences in alignment") +
  ylab("Time in seconds (MAFFT)")</code></pre>
<p><img src="/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-7-1.png" width="672" /></p>
<pre class="r"><code>tidy(fit) %>%
kable(caption = "Regression analysis of multiple sequence alignment.")</code></pre>
<table>
<caption><span id="tab:unnamed-chunk-8">Table 5: </span>Regression analysis of multiple sequence alignment.</caption>
<thead>
<tr class="header">
<th align="left">term</th>
<th align="right">estimate</th>
<th align="right">std.error</th>
<th align="right">statistic</th>
<th align="right">p.value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">I(blastHits^2)</td>
<td align="right">0.099974</td>
<td align="right">0.0004048</td>
<td align="right">246.9813</td>
<td align="right">0</td>
</tr>
</tbody>
</table>
<p>Since in multiple sequence alignment each sequence is aligned with every other sequence, we would expect <span class="math inline">\(O(N^2)\)</span> time complexity, and the quadratic regression fits the data closely. The fitted constant is about a tenth of a second (0.09997). Thus, if we analyse 16 sequences, we would retrieve about <span class="math inline">\(16 * 4.5 = 72\)</span> sequences, and the multiple-sequence alignment would take <span class="math inline">\(0.09997 * 72^2 = 518\)</span> seconds or ~8.6 minutes, which is not bad. Also consider that you can submit your samples and walk away.</p>
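<p>The back-of-the-envelope estimate above can be wrapped in a small helper. This is a sketch only: <code>estimate_runtime()</code> is a hypothetical function, the coefficients are copied from Tables 4 and 5, the default of 4.5 hits per submitted sequence is the rounded figure used in the text, and the result only holds for this particular Raspberry Pi setup and database.</p>
<pre class="r"><code># Rough runtime estimate from the coefficients reported in Tables 4 and 5.
# hits_per_seq = 4.5 is the rounded figure used in the text (an assumption).
estimate_runtime <- function(n_submitted, hits_per_seq = 4.5) {
  hits <- n_submitted * hits_per_seq
  blast_seconds <- -0.8176139 + 11.0453236 * n_submitted   # linear fit, Table 4
  mafft_seconds <- 0.099974 * hits^2                        # quadratic fit, Table 5
  data.frame(
    n_submitted         = n_submitted,
    hits                = hits,
    blast_seconds       = blast_seconds,
    mafft_seconds       = mafft_seconds,
    blast_mafft_minutes = (blast_seconds + mafft_seconds) / 60
  )
}

estimate_runtime(c(8, 16, 24))</code></pre>
<p>Only the BLAST and MAFFT steps are included, since the remaining steps (trimAl, FastTree, rendering and the heatmap) took far less time in the first runs shown in Table 2.</p>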
</div>
</div>
<div id="impact" class="section level3">
<h3>Impact</h3>
<p>It is important to mention that PhyloPi is not used for tracking or detecting transmission clusters, but rather offers a way of automating phylogenetic analysis. Some patients will be genotyped more than once, and these sequences will cluster very closely on a phylogenetic tree. This offers a spot check into the quality of the results. Sometimes we find that the patient has two different first names, which they interchangeably use depending on the health care worker and patient language preference. We have also detected sample swaps which otherwise would have gone unnoticed.</p>
</div>
</div>
<div id="what-next" class="section level2">
<h2>What next?</h2>
<p>In part 3, we will discuss how the inter- and intra-patient HIV genetic distances were analyzed using logistic regression to gain insights into the probability distribution of these two classes. This is also where we asked Andrie from RStudio for help. It was useful for us biologists and virologists to have someone not just to oversee the analysis we did, but also to implement the correct analysis to get the job done. Hope to see you in the next part!</p>
</div>
Analysing the HIV pandemic, Part 1: HIV in sub-Sahara Africa
https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/
Tue, 30 Apr 2019 00:00:00 +0000https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/
<p><em>Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the <a href="https://www.ufs.ac.za/health/departments-and-divisions/virology-home">Division of Virology</a>, <a href="https://www.ufs.ac.za">University of the Free State</a>, and <a href="http://www.nhls.ac.za/">National Health Laboratory Service (NHLS)</a>, Bloemfontein, South Africa</em></p>
<p><em>Sabeehah Vawda is a pathologist, researcher, and lecturer at the <a href="https://www.ufs.ac.za/health/departments-and-divisions/virology-home">Division of Virology</a>, <a href="https://www.ufs.ac.za">University of the Free State</a>, and <a href="http://www.nhls.ac.za/">National Health Laboratory Service (NHLS)</a>, Bloemfontein, South Africa</em></p>
<p><em>Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio</em></p>
<div id="introduction" class="section level2">
<h2>Introduction</h2>
<p>The <a href="https://www.immunology.org/public-information/bitesized-immunology/pathogens-and-disease/human-immunodeficiency-virus-hiv">Human Immunodeficiency Virus</a> (<strong>HIV</strong>) is the virus that causes acquired immunodeficiency syndrome (<strong>AIDS</strong>). The virus invades various immune cells, causing loss of immunity, and thus increased susceptibility to infections, including Tuberculosis and cancer. In a recent publication in <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241">PLoS ONE</a>, the authors described how they used affordable hardware to create a <a href="https://en.wikipedia.org/wiki/Phylogenetics">phylogenetic</a> pipeline, tailored for the HIV drug resistance testing facility. In this series of blog posts we highlight the serious problem of HIV infection in sub-Saharan Africa, with special analysis of the situation in South Africa.</p>
<div id="stages-of-hiv-infection" class="section level3">
<h3>Stages of HIV infection</h3>
<p>HIV infection can be divided into three consecutive stages: acute primary infection, the asymptomatic stage, and the symptomatic stage.</p>
<p>The first stage, <strong>acute primary infection</strong>, has symptoms very much like flu and may last for a week or two. The body reacts with an immune response, which results in the production of antibodies to fight the HIV infection. This process is called seroconversion and can last a couple of months. During this stage, although the patient is infected and the virus is spreading through the body, the patient might not test positive. This initial period of seroconversion is called ‘the window period’ and depends on the type of test used. Rapid tests are done at the point of care. This means that the test can be done at the clinic with a finger prick and the result is ready in 20 minutes. The drawback of this test is a window period of three months and a small false positive rate. The rapid test detects HIV antibodies, and because the immune system needs some time to produce sufficient antibodies to be detected, there is this window period. Most laboratories these days use fourth-generation <a href="https://www.immunology.org/public-information/bitesized-immunology/experimental-techniques/enzyme-linked-immunosorbent-assay">ELISA</a> (Enzyme-Linked Immunosorbent Assay) for HIV diagnosis and confirmation. This technique detects both HIV antibodies and antigens. Antigens are the foreign objects that the immune system recognizes as ‘non-self’; in this case, it is the viral protein p24. The advantage of this technique is a window period of only one month.</p>
<p>This first stage, including the window period, is then followed by the <strong>asymptomatic stage</strong>, which may last for as long as ten years. During this stage, the infected person does not experience symptoms and feels healthy. However, the virus is still replicating and destroying immune cells, especially CD4 cells. This damages the immune system and ultimately leads to stage 3 if not treated. This does not mean that people at stage 3 are doomed, but the earlier treatment starts, the better the outcome.</p>
<p>Stage 3 is referred to as <strong>symptomatic HIV infection or AIDS</strong> (Acquired Immune Deficiency Syndrome). At this stage, the immune system is so weak that it cannot fight off bacterial or fungal infections that typically do not cause disease in immunocompetent people. These serious infections are called opportunistic infections, and they carry high morbidity and mortality rates.</p>
</div>
<div id="transmission-and-epidemiology" class="section level3">
<h3>Transmission and epidemiology</h3>
<p>Worldwide, approximately 36.9 million people are living with HIV (UNAIDS).</p>
<p>HIV is transmitted mainly by:</p>
<ul>
<li>Having unprotected sex</li>
<li>Using non-sterile or shared needles for drug injection</li>
<li>Mother-to-child transmission during birth or breastfeeding</li>
<li>Infected blood transfusions, transplants or other medical procedures (very unlikely)</li>
</ul>
<p>We mentioned the window period of the HIV infection as well as the asymptomatic stage. During any of the stages, it is possible to transmit the infection. The problem with the window period is an unknown HIV status or falsely assumed negative status, and during the asymptomatic stage, there is no reason for the infected person to seek medical attention. There are obviously behavioural issues in HIV transmission, and due to the long asymptomatic phase, HIV-positive status can be unknown for a long period. For these reasons, it is important that high-risk individuals do frequent HIV tests to determine their status.</p>
</div>
<div id="treatment-for-hiv-infection" class="section level3">
<h3>Treatment for HIV infection</h3>
<p>HIV is treatable but not (yet) curable. The good news, however, is that if a person receives <strong>antiretroviral (ARV) treatment</strong>, their viral load is suppressed (viral replication stops) and the chance of transmitting HIV decreases drastically.</p>
<p>So 30 years into this pandemic, the big question is, why is HIV still a problem?</p>
<p>Not all countries adopted the use of ARVs in an equal manner. Although AZT (Zidovudine) was the first drug to be approved by the <a href="https://www.fda.gov/forpatients/illness/hivaids/history/ucm151074.htm">FDA</a> in March 1987, it was soon discovered that monotherapy with only AZT was not effective for very long, as the virus developed resistance to the medicine quickly. Since then, ARVs have come a long way, and patients are placed on:</p>
<ul>
<li><strong>HAART</strong> (Highly Active Antiretroviral Treatment), or</li>
<li><strong>cART</strong> (combination Antiretroviral Treatment), which typically consists of 3 drugs of different classes.</li>
</ul>
</div>
</div>
<div id="hiv-in-africa" class="section level2">
<h2>HIV in Africa</h2>
<p>Let’s look at the rates of HIV infection in different African countries. The CIA World Factbook has some HIV infection rate <a href="https://www.cia.gov/LIBRARY/publications/the-world-factbook/rankorder/rawdata_2155.txt">data</a>.</p>
<pre class="r"><code>suppressPackageStartupMessages({
library(dplyr)
library(readr)
library(stringr)
library(tidyr)
library(ggplot2)
library(forcats)
library(knitr)
library(maptools)
library(viridis)
library(RColorBrewer)
library(mapproj)
library(broom)
library(ggrepel)
library(sf)
})</code></pre>
<pre class="r"><code># read the HIV data
HIV_rate_2016 <- read_csv(
file.path(file_path, "HIV rates.csv"), col_names = TRUE, col_types = "cd"
)
# read the Africa shape file
africa <-
sf::st_read(
file.path(file_path, "Africa_SHP/Africa.shp"),
stringsAsFactors = FALSE, quiet = TRUE
) %>%
rename(Country = "COUNTRY") %>%
left_join(HIV_rate_2016, by = "Country")
africa %>%
ggplot(aes(fill = Rate)) +
geom_sf() +
coord_sf() +
scale_fill_viridis(option = "plasma") +
theme_minimal()</code></pre>
<p><img src="/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/plot_map-1.png" width="672" /></p>
<p>In the choropleth above, we see that South Africa, Botswana, Lesotho, and Swaziland seem to have the highest rates of infection. This is presented as the percentage infected, which takes into account population sizes. It is important to understand that the level of denial is inversely related to the reported rate of infection: the more denial, the fewer infections are reported. Even in this day and age, denial of stigmatized diseases is an issue.</p>
<div id="cleaning-the-data" class="section level3">
<h3>Cleaning the data</h3>
<p>We can also look at the burden of HIV as the number of people infected, and we might get a different picture from what we saw from the choropleth.</p>
<p>Here, we read in the <a href="http://apps.who.int/gho/data/node.main.626">data</a>, and rename the columns to <code>Country</code>, <code>PersCov</code> (percentage ARV coverage), <code>NumberOnARV</code> (Number of patients on ARVs), and <code>NumberInfected</code> (Number of patients infected).</p>
<pre class="r"><code># Read csv with ARV infection dat
arv_dat <- read_csv(file.path(file_path, "ARV cov 2017.csv"),
col_types = "cccc",
col_names = c("Country", "PersCov", "NumberOnARV", "NumberInfected"),
skip = 1
)
head(arv_dat)</code></pre>
<pre><code>## # A tibble: 6 x 4
## Country PersCov NumberOnARV NumberInfected
## <chr> <chr> <chr> <chr>
## 1 Afghanistan No data 790 No data
## 2 Albania 42 [40-44] 570 1400 [1300-1400]
## 3 Algeria 80 [75-87] 11000 14 000 [13 000-15 000]
## 4 Andorra No data No data No data
## 5 Angola 26 [22-30] 78700 310 000 [260 000-360 000]
## 6 Antigua and Barbuda No data No data No data</code></pre>
<p>This data has several symptoms of being very messy:</p>
<ul>
<li>Very long variable names, descriptive, but difficult to work with; this was changed during import</li>
<li>The values contain confidence intervals in brackets; this will be difficult to work with as-is</li>
<li>We might want to transform “No data” to <code>NA</code></li>
<li>We are interested in Sub-Saharan Africa, but the data is for the whole world</li>
</ul>
<pre class="r"><code># A list of Sub-Saharan countries
sub_sahara <- readLines(file.path(file_path, "Sub-Saharan.txt"))
clean_column <- function(x){
# Remove the ranges in brackets and convert the values to numeric
x %>%
str_replace_all("\\[.*?\\]", "") %>%
str_replace_all("<", "") %>%
str_replace_all(" ", "") %>%
as.numeric()
}
arv_dat <-
arv_dat %>%
filter(Country %in% sub_sahara) %>%
na_if("No data") %>%
mutate_at(2:4, clean_column)
head(arv_dat)</code></pre>
<pre><code>## # A tibble: 6 x 4
## Country PersCov NumberOnARV NumberInfected
## <chr> <dbl> <dbl> <dbl>
## 1 Angola 26 78700 310000
## 2 Benin 55 38400 70000
## 3 Botswana 84 318000 380000
## 4 Burkina Faso 65 61400 94000
## 5 Burundi 77 60100 78000
## 6 Cameroon 49 254000 510000</code></pre>
<p>We use a regular expression to get rid of all the square bracket ranges. We also remove the “<” sign and spaces within numbers, change “No data” to <code>NA</code>, and convert the characters to numbers. We filter out the countries we don’t want. (Note that some countries are not available in the ARV data, e.g., Swaziland and Reunion.)</p>
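<p>To see what each cleaning step does, it can help to run the same replacements on a handful of strings. The sketch below restates <code>clean_column()</code> without the pipe so that it only needs <code>stringr</code>. The first two inputs are copied from the table above; the value with a “<” sign is a hypothetical example of the format, not a row from the data.</p>
<pre class="r"><code>library(stringr)

# Same cleaning steps as clean_column() above, written step by step
clean_column <- function(x) {
  x <- str_replace_all(x, "\\[.*?\\]", "")  # drop the bracketed range
  x <- str_replace_all(x, "<", "")          # drop the "less than" sign
  x <- str_replace_all(x, " ", "")          # drop spaces used as thousands separators
  as.numeric(x)
}

# The third value is hypothetical; NA stands in for a "No data" entry
raw <- c("42 [40-44]", "14 000 [13 000-15 000]", "<100 [<100-<200]", NA)
cleaned <- clean_column(raw)
cleaned</code></pre>
<p>The non-greedy pattern <code>.*?</code> stops at the first closing bracket, so only the bracketed range is removed and the leading point estimate is kept.</p>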
</div>
<div id="highest-infected-countries" class="section level3">
<h3>Highest infected countries</h3>
<p>Now look at the countries with the highest number of infected people of all ages.</p>
<pre class="r"><code>arv_dat %>%
top_n(4, wt = NumberInfected) %>%
arrange(-NumberInfected) %>%
kable(
caption = "Countries with the highest number of HIV infections"
)</code></pre>
<table>
<caption><span id="tab:unnamed-chunk-1">Table 1: </span>Countries with the highest number of HIV infections</caption>
<thead>
<tr class="header">
<th align="left">Country</th>
<th align="right">PersCov</th>
<th align="right">NumberOnARV</th>
<th align="right">NumberInfected</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">South Africa</td>
<td align="right">61</td>
<td align="right">4359000</td>
<td align="right">7200000</td>
</tr>
<tr class="even">
<td align="left">Nigeria</td>
<td align="right">33</td>
<td align="right">1040000</td>
<td align="right">3100000</td>
</tr>
<tr class="odd">
<td align="left">Mozambique</td>
<td align="right">54</td>
<td align="right">1156000</td>
<td align="right">2100000</td>
</tr>
<tr class="even">
<td align="left">Kenya</td>
<td align="right">75</td>
<td align="right">1122000</td>
<td align="right">1500000</td>
</tr>
</tbody>
</table>
<p>We can see that South Africa has the highest number of HIV-infected people in Sub-Saharan Africa.</p>
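<p>As a quick sanity check, the reported coverage can be compared with the ratio of the two count columns. The sketch below re-enters the four rows of Table 1 by hand; <code>ImpliedCov</code> and <code>NotOnARV</code> are column names introduced here for illustration. Small differences from <code>PersCov</code> are expected, presumably because the published figures are rounded estimates.</p>
<pre class="r"><code># Values copied from Table 1
top4 <- data.frame(
  Country = c("South Africa", "Nigeria", "Mozambique", "Kenya"),
  PersCov = c(61, 33, 54, 75),
  NumberOnARV = c(4359000, 1040000, 1156000, 1122000),
  NumberInfected = c(7200000, 3100000, 2100000, 1500000)
)

# Coverage implied by the counts, and the number of people not on ARVs
top4$ImpliedCov <- round(100 * top4$NumberOnARV / top4$NumberInfected, 1)
top4$NotOnARV <- top4$NumberInfected - top4$NumberOnARV

top4[, c("Country", "PersCov", "ImpliedCov", "NotOnARV")]</code></pre>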
</div>
</div>
<div id="hiv-in-southern-africa" class="section level2">
<h2>HIV in Southern Africa</h2>
<p>In South Africa, the first AIDS-related death occurred in 1985. Not all patients were eligible to receive ARVs, and it was only in 2004 that ARVs became available in the public sector in South Africa. Eligibility restrictions still applied, so not all HIV-infected patients received treatment.</p>
<p>Ideally, a country would have all its HIV-infected people on treatment, but due to financial constraints, this is not always possible. In South Africa, patients were only initiated on ARVs when their CD4 counts dropped below a certain level. This threshold was initially 200 cells/µL in 2004, which was then changed to 350 cells/µL and 500 cells/µL at later intervals. These recommendations were a compromise between the availability of funds and getting ARVs to the people needing them the most. CD4 cells are a major component of the immune system; the lower the CD4 cell count, the higher the chance of opportunistic infections. Thus, the idea is to support the patients who are most likely to contract an opportunistic infection.</p>
<p>The problem with this was that only about a third of the HIV-infected people in South Africa were receiving HAART treatment. In 2017, the guidelines changed to test and treat; i.e., any newly diagnosed patient will receive HAART treatment. This is a big improvement for many reasons, but notably a lower infection rate. If a patient is taking HAART treatment and it is effective in suppressing the viral replication, the chances of the patient transmitting the virus are very close to zero.</p>
<p>However, these treatments are not without side effects, which in some cases cause very poor adherence to the treatment. There are numerous factors to blame here, specifically socio-economic factors and depression. There is also ignorance and the “fear of knowing”, which causes people not to know their status. Finally, human nature brings with it various other complexities, such as conspiracy theories, and religious and personal beliefs. This will be a very long post if we delve into all the issues, but the take-home message is: the situation is complicated.</p>
<div id="arv-coverage-by-country" class="section level3">
<h3>ARV coverage by country</h3>
<p>We looked at the rate of HIV infections, and also the number of people infected, in the most endemic countries. We have talked about treatment. It would be interesting to look at ARV coverage by country.</p>
<p>Let’s see how these countries rank by ARV coverage:</p>
<pre class="r"><code>arv_dat %>%
na.omit(PersCov) %>%
ggplot(aes(x = reorder(Country, PersCov), y = PersCov)) +
geom_point(aes(colour = NumberInfected), size = 3) +
scale_colour_viridis(
name = "Number of people infected",
trans = "log10",
option = "plasma"
) +
coord_flip() +
ylab("% ARV coverage") + xlab("Country") +
theme_bw()</code></pre>
<p><img src="/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/plot_rank-1.png" width="768" /></p>
<p>This shows that Zimbabwe, Namibia, Botswana, and Rwanda have the highest ARV coverage (above 80%). South Africa has the highest number of infections (as we saw before), and coverage of just above 60%.</p>
<p>Botswana rolled out their treatment program in 2002, and by mid-2005, about half of the eligible population received ARV treatment. South Africa, on the other hand, only started treatment in 2004, which we discuss later.</p>
<p>When talking about treatment, we should also look at the changes in mortality.</p>
</div>
<div id="hiv-related-deaths" class="section level3">
<h3>HIV related deaths</h3>
<p>Read in the <a href="http://apps.who.int/gho/data/node.main.623?lang=en">data</a>:</p>
<pre class="r"><code>hiv_mort <-
read_csv(file.path(file_path, "HIV deaths.csv"), col_types = "ccccc") %>%
na_if("No data") %>%
mutate_at(vars(starts_with("Deaths")), clean_column) %>%
filter(Country %in% sub_sahara)
head(hiv_mort)</code></pre>
<pre><code>## # A tibble: 6 x 5
## Country Deaths_2017 Deaths_2010 Deaths_2005 Deaths_2000
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Angola 13000 10000 7900 3900
## 2 Benin 2500 2600 4300 2600
## 3 Botswana 4100 5900 13000 15000
## 4 Burkina Faso 2900 5400 12000 15000
## 5 Burundi 1700 5400 8600 8500
## 6 Cameroon 24000 25000 26000 17000</code></pre>
<pre class="r"><code>summary(hiv_mort)</code></pre>
<pre><code>## Country Deaths_2017 Deaths_2010 Deaths_2005
## Length:43 Min. : 100 Min. : 100 Min. : 100
## Class :character 1st Qu.: 1900 1st Qu.: 1975 1st Qu.: 2050
## Mode :character Median : 4400 Median : 5400 Median : 8250
## Mean : 15442 Mean : 23483 Mean : 33227
## 3rd Qu.: 16250 3rd Qu.: 27250 3rd Qu.: 48250
## Max. :150000 Max. :200000 Max. :260000
## NA's :3 NA's :3 NA's :3
## Deaths_2000
## Min. : 100
## 1st Qu.: 1150
## Median : 6500
## Mean : 26496
## 3rd Qu.: 41500
## Max. :130000
## NA's :3</code></pre>
<p>The 2017 mean for the dataset as a whole (15,442) is a bit less than half of the 2005 mean (33,227), and clearly below the 2000 mean (26,496). It would be interesting to plot this data, but it will probably be too busy as it is. We can instead have a look at countries which had the most change.</p>
<pre class="r"><code>hiv_mort <- hiv_mort %>%
mutate(
# Columns 2:4 are Deaths_2017, Deaths_2010, and Deaths_2005 (Deaths_2000 is not included)
min = apply(hiv_mort[, 2:4], 1, FUN = min),
max = apply(hiv_mort[, 2:4], 1, FUN = max),
Change = max - min
)</code></pre>
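<p>To make the <code>Change</code> column concrete, the sketch below applies the same max-minus-min idea to the six rows printed by <code>head(hiv_mort)</code>, using the same three columns (2017, 2010, and 2005). It uses the vectorised base R functions <code>pmax()</code> and <code>pmin()</code> instead of <code>apply()</code>; this is an alternative way to get the same numbers, not the code used for the post.</p>
<pre class="r"><code># First six rows of hiv_mort, copied from the output above
deaths <- data.frame(
  Country = c("Angola", "Benin", "Botswana", "Burkina Faso", "Burundi", "Cameroon"),
  Deaths_2017 = c(13000, 2500, 4100, 2900, 1700, 24000),
  Deaths_2010 = c(10000, 2600, 5900, 5400, 5400, 25000),
  Deaths_2005 = c(7900, 4300, 13000, 12000, 8600, 26000)
)

# Row-wise maximum and minimum across the three year columns
row_max <- pmax(deaths$Deaths_2017, deaths$Deaths_2010, deaths$Deaths_2005)
row_min <- pmin(deaths$Deaths_2017, deaths$Deaths_2010, deaths$Deaths_2005)
deaths$Change <- row_max - row_min

# Countries ordered by the size of the change
deaths[order(-deaths$Change), c("Country", "Change")]</code></pre>
<p>Note that <code>Change</code> measures the size of the swing only; it does not say whether deaths went up (as in Angola) or down (as in Botswana).</p>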
<p>Next, we can create a plot of the data, and look at the top five countries with the biggest change in HIV-related mortality.</p>
<pre class="r"><code>hiv_mort %>%
top_n(5, wt = Change) %>%
gather(Year, Deaths, Deaths_2017:Deaths_2000) %>%
na.omit() %>%
mutate(
Year = str_replace(Year, "Deaths_", "") %>% as.numeric(),
Country = fct_reorder(Country, Deaths)
) %>%
ggplot(aes(x = Year, y = Deaths, color = Country)) +
geom_line(size = 1) +
geom_vline(xintercept = 2004, color = "black", linetype = "dotted", size = 1.5) +
scale_color_viridis(option = "D", discrete = TRUE) +
theme_bw() +
theme(legend.position = "bottom") </code></pre>
<p><img src="/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/plot_hiv_mort-1.png" width="672" /></p>
<p>Remember, we mentioned that <strong>HAART</strong> (Highly Active Antiretroviral Treatment) was introduced in 2004 in South Africa, depicted here by the black dotted line. It is easy to appreciate the dramatic effect the introduction of ARVs had in South Africa.</p>
<p>Although the picture above is positive, the fight is not over. The target is to get at least 90% of HIV-infected patients on treatment. Adherence to ARV regimens stays crucial not only to suppress viral replication, but also to minimize the development of drug resistance.</p>
</div>
<div id="infection-rates" class="section level3">
<h3>Infection rates</h3>
<p>As mentioned earlier, if a patient is taking and responding to treatment, the viral load gets suppressed and the chances of transmitting the infection become very close to zero. Thus, the more patients with an undetectable viral load, the lower the transmission rate.</p>
<p>Read the <a href="http://aidsinfo.unaids.org/?did=5b4eaa7cdddb54192bb39714&r=world&t=null&tb=d&bt=dnli&ts=null&tr=world&tl=2">data</a>:</p>
<pre class="r"><code>new_infections <-
read_csv(file.path(file_path,
"Epidemic transition metrics_Trend of new HIV infections.csv"),
na = "...",
col_types = cols(
.default = col_character(),
`2017_1` = col_double()
)
) %>%
select(
-ends_with("_upper"),
-ends_with("lower"),
-ends_with("_1")
) %>%
mutate_at(-1, clean_column) %>%
na.omit()</code></pre>
<pre><code>## Warning: Duplicated column names deduplicated: '2017' => '2017_1' [26]</code></pre>
<pre class="r"><code>new_infections %>%
gather(Year, NewInfections, 2:9) %>%
ggplot(aes(x = Year, y = NewInfections, color = Country)) +
geom_point() +
theme_classic() +
theme(legend.position = "none") +
xlab("Year") +
ylab("Number of new infections")</code></pre>
<p><img src="/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/new_infections-1.png" width="672" /></p>
<p>This is a bit busy. Countries that are highly endemic with good ARV coverage and prevention of infection programs should have a steeper decline in the newly infected people. At first glance, it looks like some of the data points are fairly linear. Let’s go with that assumption, and apply linear regression to each country.</p>
<pre class="r"><code>rates_modeled <-
new_infections %>%
filter(Country %in% sub_sahara) %>%
na.omit() %>%
gather(Year, NewInfections, 2:9) %>%
mutate(Year = as.numeric(Year)) %>%
group_by(Country) %>%
do(tidy(lm(NewInfections ~ Year, data = .))) %>%
filter(term == "Year") %>%
ungroup() %>%
mutate(
Country = fct_reorder(Country, estimate, .desc = TRUE)
) %>%
arrange(desc(estimate)) %>%
select(-one_of("term", "statistic"))
rates_modeled %>%
head() %>%
kable(
caption = "Results of linear regression: Rate of new infections per year"
)</code></pre>
<table>
<caption><span id="tab:unnamed-chunk-3">Table 2: </span>Results of linear regression: Rate of new infections per year</caption>
<thead>
<tr class="header">
<th align="left">Country</th>
<th align="right">estimate</th>
<th align="right">std.error</th>
<th align="right">p.value</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Madagascar</td>
<td align="right">469.04762</td>
<td align="right">12.56126</td>
<td align="right">0.0000000</td>
</tr>
<tr class="even">
<td align="left">Côte d’Ivoire</td>
<td align="right">190.47619</td>
<td align="right">153.99689</td>
<td align="right">0.2623441</td>
</tr>
<tr class="odd">
<td align="left">Botswana</td>
<td align="right">130.95238</td>
<td align="right">92.46968</td>
<td align="right">0.2064860</td>
</tr>
<tr class="even">
<td align="left">Mali</td>
<td align="right">108.33333</td>
<td align="right">23.21683</td>
<td align="right">0.0034452</td>
</tr>
<tr class="odd">
<td align="left">Congo</td>
<td align="right">103.57143</td>
<td align="right">16.45271</td>
<td align="right">0.0007486</td>
</tr>
<tr class="even">
<td align="left">Eritrea</td>
<td align="right">89.28571</td>
<td align="right">23.05347</td>
<td align="right">0.0082374</td>
</tr>
</tbody>
</table>
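<p>The <code>estimate</code> column is the slope of the fitted line, i.e., the modeled change in new infections per year. A minimal base R sketch with made-up numbers shows how to read it: the toy series below falls by exactly 500 infections per year, so the fitted slope is -500. The data frame <code>toy</code> is invented for illustration and is not part of the UNAIDS data.</p>
<pre class="r"><code># Hypothetical country: 10000 new infections in 2010, falling by 500 each year
toy <- data.frame(
  Year = 2010:2017,
  NewInfections = 10000 - 500 * (0:7)
)

# Fit the same kind of model as above and extract the slope for Year
fit <- lm(NewInfections ~ Year, data = toy)
slope <- unname(coef(fit)["Year"])
slope</code></pre>
<p>A negative estimate therefore means fewer new infections each year, and a positive estimate, as for the countries in Table 2, means the number of new infections is rising.</p>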
<pre class="r"><code>rates_modeled %>%
na.omit() %>%
ggplot(aes(x = Country, y = estimate, fill = p.value >= 0.05)) +
geom_col() +
coord_flip() +
theme_bw() +
ylab("Estimated change in HIV infection (people/year)")</code></pre>
<p><img src="/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/rates_model_plot-1.png" width="672" /></p>
<p>With a quick look at the plot shown above, we can see that for most countries, the slope of the linear model is statistically significant at a p-value cutoff of 0.05.</p>
<p>It is important to note here that the data we have at hand is from 2010 to 2017. This shows that some countries - notably, South Africa - are on a good trajectory. Botswana, being the “Poster Child” of a good HIV treatment and prevention program, seems to have stabilized in terms of rate of infection, with a positive but insignificant estimate of the rate of infection. This could be explained by the following reasons:</p>
<ul>
<li>First African country to introduce HAART, 2002</li>
<li>Progressive in terms of prevention programs</li>
<li>Looking only from 2010, we are missing the dramatic decline in infection</li>
<li>The <a href="https://www.who.int/">WHO</a> goal is to get 90% of a country’s infected people on HAART, but the last 5-7% might be the hardest to convince</li>
</ul>
<p>We can combine the ARV and estimated rates of infection data.</p>
<pre class="r"><code>arv_on_infection <-
arv_dat %>%
left_join(rates_modeled, by = "Country") %>%
mutate(p_interpretation = if_else(p.value < 0.05, "Significant", "Insignificant"))</code></pre>
<pre><code>## Warning: Column `Country` joining character vector and factor, coercing
## into character vector</code></pre>
<pre class="r"><code>arv_on_infection %>%
na.omit() %>%
ggplot(aes(x = PersCov, y = estimate,
shape = p_interpretation)) +
geom_point(aes(color = NumberInfected), size = 2) +
geom_text_repel(aes(label = Country), size = 3) +
scale_color_gradient(high = "red", low = "blue") +
theme_grey() +
xlab("% ARV coverage") +
ylab("Estimated change in HIV infection\n(people/year)") +
ggtitle("Antiretroviral (ARV) coverage")</code></pre>
<p><img src="/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/arv_infection-1.png" width="672" /></p>
<p>South Africa has the highest number of infected people, but on the positive side, has a downward trajectory of about 15000 fewer people newly infected each year. Although ARVs do play a crucial role in controlling this epidemic, they are not the only factor involved. Prevention of mother-to-child transmission has been very successful in South Africa. Awareness campaigns and education are playing a big role as well. The plot above shows our linearly modeled rates.</p>
</div>
</div>
<div id="the-laboratory-hiv-diagnosis-and-monitoring" class="section level2">
<h2>The laboratory, HIV diagnosis and monitoring</h2>
<p>HIV-related laboratory tests are not the only diagnostics done in a Virology department, but in endemic countries, they account for the majority of tests done. The first HIV-related test done would be for diagnosis. This is done differently in adults than in infants. As we discussed earlier, after HIV infection, the immune system develops antibodies. We can use a field of study called <strong>serology</strong> to detect antibodies and antigens, and in most cases, an ELISA test is performed to confirm HIV seroconversion or status. Since the mother’s antibodies will be present in the infant, an ELISA will tell us the baby is positive even if the baby is not infected. Infants are therefore diagnosed by detecting viral RNA or DNA in their blood. This is done by PCR (Polymerase Chain Reaction).</p>
<p>Once a patient is diagnosed as HIV-positive, the patient will be initiated on HAART, and in most cases, the viral load will be suppressed. In the South African public sector treatment program, after HAART initiation, the patient gets two six-monthly viral load tests to make sure viral replication is suppressed. To keep an eye out for trouble, a yearly viral load is done to confirm adherence and effectiveness of the treatment.</p>
<p>When an unsuppressed viral load is detected, action is taken and adherence counselling is performed. If this does not solve the problem, drug-resistance testing is performed to assess the resistance profile of the infection in order to adjust the ARV regimen accordingly. This is done by isolating the viral RNA, converting it to DNA, and amplifying the DNA to quantities sufficient for sequencing. In our laboratory, we use <a href="https://en.wikipedia.org/wiki/Sanger_sequencing">Sanger sequencing</a>, but other sequencing technologies also exist.</p>
<hr />
<div class="figure" style="text-align: center"><span id="fig:unnamed-chunk-4"></span>
<img src="/post/2019-05-01-analysis-hiv-pandemic-part-1_files/hxb2genome.gif" alt="HIV Genome as depicted by the Los Alamos HIV sequence database. Available at https://www.hiv.lanl.gov/content/sequence/HIV/MAP/landmark.html" style="margin:50px 10px" />
<p class="caption">
Figure 1: HIV Genome as depicted by the Los Alamos HIV sequence database. Available at <a href="https://www.hiv.lanl.gov/content/sequence/HIV/MAP/landmark.html" class="uri">https://www.hiv.lanl.gov/content/sequence/HIV/MAP/landmark.html</a>
</p>
</div>
<hr />
<p>This diagram depicts the genome of HIV. The most common targets for interfering with viral replication are located in the <em>pol</em> gene. Specifically:</p>
<ul>
<li><p><strong>prot</strong>: The viral protease. Many of the viral proteins are translated as longer polypeptides, which are then cleaved into mature proteins by the protease.</p></li>
<li><p><strong>p51 RT</strong>: The viral reverse transcriptase. Each virion contains two copies of viral RNA. The reverse transcriptase converts the RNA to DNA.</p></li>
<li><p><strong>p31 int</strong>: The viral integrase. This enzyme integrates the reverse-transcribed viral DNA into the host genome of the infected cells, and establishes chronic infection.</p></li>
</ul>
<p>Essentially, ARVs interfere with these viral enzymes by inhibiting their action:</p>
<ul>
<li><p><strong>Protease inhibitors</strong> prevent the maturation of viral proteins.</p></li>
<li><p><strong>Reverse transcriptase inhibitors</strong> prevent the formation of a DNA copy of the viral genome, which then gives the integrase nothing to work with.</p></li>
<li><p><strong>Integrase inhibitors</strong> prevent the integration of viral DNA into the host genome, which is a crucial part of replication and infection.</p></li>
</ul>
<p>Combining these ARVs in clever ways results in HAART or cART. By sequencing the viral RNA, we can detect mutations that cause resistance to specific ARVs. This information is then used to adjust the ARV regimen to once again effectively suppress viral replication.</p>
<p>The viral reverse transcriptase has a high error rate when doing the conversion of RNA to DNA, and introduces random mutations in the viral genome. In the presence of selective pressure like ARVs, these random mutations might give advantageous phenotypic traits to the replicating virus, like drug resistance. On the other hand, if the patient is properly adhering to the treatment, the viral replication is suppressed, replication does not occur, thus mutations can’t occur.</p>
<p>This high rate of mutation can be used in the laboratory as one of the quality-control tools. The polymerase chain reaction is prone to contamination, so it is possible when doing these reactions that one sample might contaminate another. This will give rise to false mutations in the contaminated sample and an erroneous result being reported to the treating clinician, with a direct negative impact on the patient.</p>
</div>
<div id="what-next" class="section level2">
<h2>What next?</h2>
<p>In a recent publication in <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241">PLoS ONE</a>, the authors described how they used affordable hardware to create a <a href="https://en.wikipedia.org/wiki/Phylogenetics">phylogenetic</a> pipeline, tailored for the HIV drug resistance testing facility.</p>
<ul>
<li><p>In <strong>Part 2</strong> of this four part series, we discuss this pipeline.</p></li>
<li><p>In <strong>Part 3</strong>, we will discuss genetic distances and phylogenetics.</p></li>
<li><p>Finally, in <strong>Part 4</strong>, we will look at the application of logistic regression in analyzing inter- and intra-patient genetic distance of viral sequences.</p></li>
</ul>
<p>See you in the next section!</p>
</div>
March 2019: "Top 40" New CRAN Packages
https://rviews.rstudio.com/2019/04/26/march-2019-top-40-new-cran-packages/
Fri, 26 Apr 2019 00:00:00 +0000https://rviews.rstudio.com/2019/04/26/march-2019-top-40-new-cran-packages/
<p>By my count, two hundred and thirty-three packages stuck to CRAN last month. I have tried to capture something of the diversity of the offerings by selecting packages in ten categories: Computational Methods, Data, Machine Learning, Medicine, Science, Shiny, Statistics, Time Series, Utilities, and Visualization. The Shiny category contains packages that expand on Shiny capabilities, not just packages that implement a Shiny application. It is not clear whether this is going to be a new cottage industry or not.</p>
<h3 id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=DistributionOptimization">DistributionOptimization</a> v1.2.1: Fits Gaussian mixtures by applying genetic algorithms from the <a href="doi:10.18637/jss.v053.i04">GA package</a>, with Gaussian mixture logic that stems from <a href="doi:10.3390/ijms161025897">AdaptGauss</a>.</p>
<p><a href="https://cran.r-project.org/package=latte">latte</a> v0.2.1: Implements connections to <a href="https://www.math.ucdavis.edu/~latte"><code>LattE</code></a> for counting lattice points and integration inside convex polytopes, and <a href="http://www.4ti2.de/"><code>4ti2</code></a> for algebraic, geometric, and combinatorial problems on linear spaces, along with front-end tools facilitating their use in the ‘R’ ecosystem. Look <a href="https://github.com/dkahle/latt">here</a> for an example.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/latte.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=nlrx">nlrx</a> v0.2.0: Provides tools to set up, run, and analyze <a href="https://ccl.northwestern.edu/netlogo/">NetLogo</a> model simulations in R. There is a <a href="https://cran.r-project.org/web/packages/nlrx/vignettes/getstarted.html">Getting Started Guide</a>, vignettes for <a href="https://cran.r-project.org/web/packages/nlrx/vignettes/furthernotes.html">Advanced Configuration</a>, and <a href="https://cran.r-project.org/web/packages/nlrx/vignettes/simdesign-examples.html">Examples</a>.</p>
<p><a href="https://cran.r-project.org/package=nvctr">nvctr</a> v0.1.1: Implements the n-vector approach to calculating geographical positions using an ellipsoidal model of the Earth. This package is a translation of the <code>Matlab</code> library from FFI described in <a href="doi:10.1017/S0373463309990415">Gade (2010)</a>. The <a href="https://cran.r-project.org/web/packages/nvctr/vignettes/position-calculations.html">vignette</a> provides examples.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/nvctr.png" height = "200" width="400"></p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=EHRtemporalVariability">EHRtemporalVariability</a> v1.0: Provides functions to delineate reference changes over time in Electronic Health Records through the projection and visualization of dissimilarities among data temporal batches, and explore results through data temporal heat maps, information geometric temporal (IGT) plots, and a <a href="http://ehrtemporalvariability.upv.es">Shiny app</a>. The <a href="https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html">vignette</a> shows how to use the package.</p>
<p><a href="https://cran.r-project.org/package=kayadata">kayadata</a> v0.4.0: Provides data for <a href="https://en.wikipedia.org/wiki/Kaya_identity">Kaya identity variables</a> (population, gross domestic product, primary energy consumption, and energy-related CO2 emissions), and includes utility functions for exploring and plotting fuel mix for a given country or region. See the <a href="https://cran.r-project.org/web/packages/kayadata/vignettes/policy_analysis.html">vignette</a> for examples.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/kayadata.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=newsanchor">newsanchor</a> v0.1.0: Implements an interface to gather news from the <a href="https://newsapi.org/">News API</a>. A personal API key is required. The <a href="https://cran.r-project.org/web/packages/newsanchor/vignettes/scrape-nyt.html">vignette</a> shows how to scrape New York Times online articles.</p>
<p><a href="https://cran.r-project.org/package=raustats">raustats</a> v0.1.0: Provides functions for downloading Australian economic statistics from the <a href="https://www.abs.gov.au/">Australian Bureau of Statistics</a> and <a href="https://www.rba.gov.au/">Reserve Bank of Australia</a> websites. The <a href="https://cran.r-project.org/web/packages/raustats/vignettes/raustats_introduction.html">vignette</a> shows how to use the package.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/raustats.png" height = "400" width="600"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=akmedoids">akmedoids</a> v0.1.2: Provides a set of R functions for longitudinal clustering of long-term trajectories, and determines the optimal solution based on the Caliński-Harabasz criterion (<a href="https://doi.org/10.1080/03610927408827101">Caliński and Harabasz (1974)</a>). The <a href="https://cran.r-project.org/web/packages/akmedoids/vignettes/akmedoids-vignette.html">vignette</a> works through an extended example.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/akmedoids.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=shapper">shapper</a> v0.1.0: Implements a wrapper for the Python <code>shap</code> library that provides <a href="arXiv:1705.07874">SHapley Additive exPlanations (SHAP)</a> for the variables that influence particular observations in machine learning models. There are vignettes for <a href="https://cran.r-project.org/web/packages/shapper/vignettes/shapper_classification.html">classification</a> and <a href="https://cran.r-project.org/web/packages/shapper/vignettes/shapper_regression.html">regression</a>.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/shapper.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=sparkxgb">sparkxgb</a> v0.1.0: Implements a <a href="https://spark.rstudio.com/"><code>sparklyr</code></a> extension that provides an interface for <a href="https://github.com/dmlc/xgboost">XGBoost</a> on Apache Spark. See the <a href="https://cran.r-project.org/web/packages/sparkxgb/readme/README.html">README</a> for a brief overview.</p>
<p><a href="https://cran.r-project.org/package=xgb2sql">xgb2sql</a> v0.1.2: Enables in-database scoring of <a href="https://xgboost.readthedocs.io/en/latest/index.htm"><code>XGBoost</code></a> models built in R, by translating trained model objects into a SQL query. See <a href="doi:10.1145/2939672.2939785">Chen & Guestrin (2016)</a> for details on <code>XGBoost</code>, and the <a href="https://cran.r-project.org/web/packages/xgb2sql/vignettes/xgb2sql.html">vignette</a> for an overview of the package.</p>
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=ctrdata">ctrdata</a> v0.18: Provides functions for querying, retrieving, and analyzing protocol- and results-related information on clinical trials from two public registers, the <a href="https://www.clinicaltrialsregister.eu/">European Union Clinical Trials Register</a> and <a href="https://clinicaltrials.gov/">ClinicalTrials.gov</a>. There is a <a href="https://cran.r-project.org/web/packages/ctrdata/vignettes/ctrdata_get_started.html">Getting Started Guide</a> and a vignette with <a href="https://cran.r-project.org/web/packages/ctrdata/vignettes/ctrdata_usage_examples.html">examples</a>.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/ctrdata.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=pubtatordb">pubtatordb</a> v0.1.3: Provides functions to download <a href="https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/">PubTator</a> (National Center for Biotechnology Information) annotations, and then create and query a local version of the database. There is a <a href="https://cran.r-project.org/web/packages/pubtatordb/vignettes/pubtatordb.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=tacmagic">tacmagic</a> v0.2.1: Provides functions to facilitate the analysis of positron emission tomography (PET) time activity curve (TAC) data. See <a href="doi:10.1097/00004647-199609000-00008">Logan et al. (1996)</a> and <a href="doi:10.1001/archneur.65.11.1509">Aizenstein et al. (2008)</a> for use cases, and the <a href="https://cran.r-project.org/web/packages/tacmagic/vignettes/walkthrough.html">vignette</a> for a detailed overview.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/tacmagic.png" height = "400" width="600"></p>
<h3 id="science">Science</h3>
<p><a href="https://cran.r-project.org/package=bulletcp">bulletcp</a> v1.0.0: Provides functions to automatically detect groove locations via a Bayesian changepoint detection method, to be used in the data pre-processing step of forensic bullet matching algorithms. See <a href="doi:10.2307/2986119">Stephens (1994)</a> for reference, the <a href="https://cran.r-project.org/web/packages/bulletcp/vignettes/Bayesian_changepoint_groove_detection.html">vignette</a> for the theory, and <a href="https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2019.01251.x">Mejia et al.</a> in the most recent issue of <a href="https://rss.onlinelibrary.wiley.com/toc/17409713/2019/16/2">Significance</a> for the big picture.</p>
<p><a href="https://cran.r-project.org/package=earthtide">earthtide</a> v0.0.5: Ports the <a href="http://igets.u-strasbg.fr/soft_and_tool.php">Fortran ETERNA 3.4</a> program by H.G. Wenzel for calculating synthetic Earth tides using the <a href="doi:10.1029/95GL03324">Hartmann and Wenzel (1995)</a> or <a href="doi:10.1007/s00190-003-0361-2">Kudryavtsev (2004)</a> tidal catalogs. See the <a href="https://cran.r-project.org/web/packages/earthtide/vignettes/introduction.html">vignette</a> for an introduction.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/earthtide.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=steps">steps</a> v0.2.1: Implements functions to simulate population dynamics across space and time. The <a href="https://cran.r-project.org/web/packages/steps/vignettes/egk_vignette.pd">Eastern Grey Kangaroo</a> vignette offers an extended example.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/steps.png" height = "400" width="600"></p>
<h3 id="shiny">Shiny</h3>
<p><a href="https://cran.r-project.org/package=periscope">periscope</a> v0.4.1: Implements an enterprise-targeted, scalable and UI-standardized <code>shiny</code> framework. There are vignettes for a <a href="https://cran.r-project.org/web/packages/periscope/vignettes/downloadFile-module.html">downloadFile module</a>, <a href="https://cran.r-project.org/web/packages/periscope/vignettes/downloadablePlot-module.html">downloadablePlot module</a>, <a href="https://cran.r-project.org/web/packages/periscope/vignettes/downloadableTable-module.html">downloadableTable module</a>, and the creation of a <a href="https://cran.r-project.org/web/packages/periscope/vignettes/new-application.html">framework-based application</a>.</p>
<p><a href="https://cran.r-project.org/package=reactlog">reactlog</a> v1.0.0: Provides visual insight into that black box of <code>shiny</code> reactivity by constructing a directed dependency graph of the application’s reactive state at any point in a reactive recording. See the <a href="file:///Users/JBRickert/Desktop/reactlog.htm">vignette</a> for an introduction.</p>
<p><iframe src="https://player.vimeo.com/video/321837450?title=0&byline=0&portrait=0" width="640" height="361" frameborder="0" allow="autoplay; fullscreen" allowfullscreen></iframe></p>
<p><a href="https://vimeo.com/321837450">reactlog highlight filter</a> from <a href="https://vimeo.com/cpsievert">Carson Sievert</a> on <a href="https://vimeo.com">Vimeo</a>.</p>
<p><a href="https://cran.r-project.org/package=shinyhttr">shinyhttr</a> v1.0.0: Modifies the <code>progress()</code> function from the <code>httr</code> package to let it send output to the <code>progressBar()</code> function from the <code>shinyWidgets</code> package.</p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=CoopGame">CoopGame</a> v0.2.1: Provides a comprehensive set of tools for cooperative game theory with transferable utility, enabling users to create special families of cooperative games, such as bankruptcy games, cost-sharing games, and weighted-voting games. The <a href="https://cran.r-project.org/web/packages/CoopGame/vignettes/UsingCoopGame.pdf">vignette</a> offers theory and examples.</p>
<p><a href="https://cran.r-project.org/package=discfrail">discfrail</a> v0.1: Provides functions for fitting Cox proportional hazards models for grouped time-to-event data, where the shared group-specific frailties have a discrete non-parametric distribution. See <a href="doi:10.1093/biostatistics/kxy071">Gasperoni et al. (2018)</a>. The <a href="https://cran.r-project.org/web/packages/discfrail/vignettes/vignette.pdf">vignette</a> shows the math.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/discfrail.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=fastglm">fastglm</a> v0.1.1: Provides functions to fit generalized linear models efficiently using <code>RcppEigen</code>. The iteratively reweighted least squares implementation utilizes the step-halving approach of <a href="doi:10.32614/RJ-2011-012">Marschner (2011)</a>. There is a <a href="https://cran.r-project.org/web/packages/fastglm/vignettes/quick-usage-guide-to-the-fastglm-package.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=hettx">hettx</a> v0.1.1: Implements methods developed by <a href="arXiv:1412.5000">Ding, Feller, and Miratrix (2016)</a>, and <a href="arXiv:1605.06566">Ding, Feller, and Miratrix (2018)</a> for testing whether there is unexplained variation in treatment effects across observations, and for characterizing the extent of the explained and unexplained variation in treatment effects. There are vignettes on <a href="https://cran.r-project.org/web/packages/hettx/vignettes/detect_idiosyncratic_vignette.html">heterogeneous treatment effects</a> and <a href="https://cran.r-project.org/web/packages/hettx/vignettes/estimate_systematic_vignette.html">systematic variation estimation</a>.</p>
<p><a href="https://cran.r-project.org/package=mcmcabn">mcmcabn</a> v0.1: Implements a structural MCMC sampler for Directed Acyclic Graphs (DAGs). It supports the new edge reversal move from <a href="doi:10.1007/s10994-008-5057-7">Grzegorczyk and Husmeier (2008)</a> and the Markov blanket resampling from <a href="http://jmlr.org/papers/v17/su16a.html">Su and Borsuk (2016)</a>, and three priors: a prior controlling for structure complexity from <a href="http://dl.acm.org/citation.cfm?id=1005332.1005352">Koivisto and Sood (2004)</a>, an uninformative prior, and a user-defined prior. The <a href="https://cran.r-project.org/web/packages/mcmcabn/vignettes/mcmcabn.html">vignette</a> provides an overview of the package.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/mcmcabn.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=networkABC">networkABC</a> v0.5-3: Implements a new multi-level approximate Bayesian computation (ABC) algorithm to decipher network data and assess the strength of the inferred links between the network’s actors. The <a href="https://cran.r-project.org/web/packages/networkABC/vignettes/vignette.html">vignette</a> provides an example.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/networkABC.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=retrodesign">retrodesign</a> v0.1.0: Provides tools for working with Type S (Sign) and Type M (Magnitude) errors, as proposed in <a href="https://doi.org/10.1007/s001800000040">Gelman and Tuerlinckx (2000)</a> and <a href="https://doi.org/10.1177/1745691614551642">Gelman & Carlin (2014)</a>, using the closed-form solutions for the probability of a Type S/M error from <a href="https://doi.org/10.1111/bmsp.12132">Lu, Qiu, and Deng (2018)</a>. The <a href="https://cran.r-project.org/web/packages/retrodesign/vignettes/Intro_To_retrodesign.html">vignette</a> shows how to use Type S and M errors in hypothesis testing.</p>
<p><a href="https://cran.r-project.org/package=sensobol">sensobol</a> v0.1.1: Enables users to compute, bootstrap, and plot up to third-order <a href="https://en.wikipedia.org/wiki/Variance-based_sensitivity_analysis">Sobol</a> indices using the estimators by <a href="doi:10.1016/j.cpc.2009.09.018">Saltelli et al. (2010)</a> and <a href="doi:10.1016/S0010-4655(98)00154-4">Jansen (1999)</a>, and calculate the approximation error in the computation of Sobol first and total indices using the algorithm of <a href="doi:10.1016/j.envsoft.2017.02.001">Khorashadi Zadeh et al. (2017)</a>. The <a href="https://cran.r-project.org/web/packages/sensobol/vignettes/sensobol.html">vignette</a> provides an overview.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/sensobol.png" height = "400" width="600"></p>
<h3 id="time-series">Time Series</h3>
<p><a href="https://cran.r-project.org/package=DTSg">DTSg</a> v0.1.2: Provides a class for working with time series data based on <code>data.table</code> and <code>R6</code> with reference semantics. There are vignettes for <a href="https://cran.r-project.org/web/packages/DTSg/vignettes/basicUsage.html">Basic</a> and <a href="https://cran.r-project.org/web/packages/DTSg/vignettes/advancedUsage.html">Advanced</a> usage.</p>
<p><a href="https://cran.r-project.org/package=RJDemetra">RJDemetra</a> v0.1.2: Implements an interface to <a href="https://github.com/jdemetra/jdemetra-app">JDemetra+</a>, the seasonal adjustment software officially recommended to the members of the European Statistical System (ESS) and the European System of Central Banks.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/RJDemetra.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=runstats">runstats</a> v1.0.1: Provides methods for quickly computing time series sample statistics, including: (1) mean, (2) standard deviation, and (3) variance over a fixed-length window of a time series, and (4) correlation, (5) covariance, and (6) Euclidean distance (L2 norm) between a short-time pattern and a time series. See the <a href="https://cran.r-project.org/web/packages/runstats/vignettes/using-runstats.html">vignette</a> for examples.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/runstats.png" height = "400" width="600"></p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=aweek">aweek</a> v0.2.0: Converts dates to arbitrary week definitions. The <a href="https://cran.r-project.org/web/packages/aweek/vignettes/introduction.html">vignette</a> provides examples.</p>
<p><a href="https://cran.r-project.org/package=credentials">credentials</a> v1.1: Provides tools for managing <a href="https://en.wikipedia.org/wiki/Secure_Shell">SSH</a> and <a href="https://git-scm.com/">git</a> credentials. See the <a href="https://cran.r-project.org/web/packages/credentials/vignettes/intro.html">vignette</a> for details.</p>
<p><a href="https://cran.r-project.org/package=cyphr">cyphr</a> v1.0.1: Implements wrappers using low-level support from <a href="https://cran.r-project.org/web/packages/sodium/vignettes/intro.html"><code>sodium</code></a> and <a href="https://www.openssl.org/">OpenSSL</a> to facilitate using encryption for data analysis. There is an <a href="https://cran.r-project.org/web/packages/cyphr/vignettes/cyphr.html">Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/cyphr/vignettes/cyphr.html">Data Encryption</a>.</p>
<p><a href="https://cran.r-project.org/package=encryptr">encryptr</a> v0.1.2: Provides functions to encrypt data frame or tibble columns using strong RSA encryption. See <a href="https://cran.r-project.org/web/packages/encryptr/readme/README.html">README</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=lenses">lenses</a> v0.0.3: Provides tools for creating and using lenses to simplify data manipulation. Lenses are composable getter/setter pairs for working with data in a purely functional way, which were inspired by the Haskell library <code>lens</code> (<a href="https://hackage.haskell.org/package/lens">Kmett (2012)</a>). For a comprehensive history of lenses, see the <a href="https://github.com/ekmett/lens/wiki/History-of-Lenses"><code>lens</code> wiki</a> and look <a href="https://cfhammill.github.io/lenses/">here</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=yum">yum</a> v0.0.1: Provides functions to facilitate extracting information in <a href="https://en.wikipedia.org/wiki/YAML"><code>YAML</code></a> fragments from one or multiple files, optionally structuring the information in a <code>data.tree</code>. See the <a href="https://cran.r-project.org/web/packages/yum/readme/README.html">README</a> file.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/yum.png" height = "400" width="600"></p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=ggasym">ggasym</a> v0.1.1: Provides functions for asymmetric matrix plotting with <code>ggplot2</code>. See the <a href="https://cran.r-project.org/web/packages/ggasym/vignettes/ggasym-stats.html">vignette</a> for examples.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/ggasym.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=predict3d">predict3d</a> v0.1.0: Provides functions for 2- and 3-dimensional plots for multiple regression models using packages <code>ggplot2</code> and <code>rgl</code>. It supports linear models (lm), generalized linear models (glm), and local polynomial regression fittings (loess). There is a <a href="https://cran.r-project.org/web/packages/predict3d/vignettes/predict3d.html">vignette</a>.</p>
<p><img src="/post/2019-04-17-MarchTop40_files/predict3d.png" height = "400" width="600"></p>
Setting up RStudio Server on a Cloud for Collaboration and Reproducibility
https://rviews.rstudio.com/2019/04/17/setting-up-rstudio-server-on-a-cloud-with-linux/
Wed, 17 Apr 2019 00:00:00 +0000https://rviews.rstudio.com/2019/04/17/setting-up-rstudio-server-on-a-cloud-with-linux/
<p><em>Roland Stevenson is a data scientist and consultant who may be reached on <a href="https://www.linkedin.com/in/roland-stevenson/">LinkedIn</a>.</em></p>
<p>When setting up R and RStudio Server on a cloud Linux instance, some thought should be given to implementing a workflow that facilitates collaboration and ensures R project reproducibility. There are many possible workflows to accomplish this. In this post, we offer an “opinionated” solution based on what we have found to work in a production environment. We assume all development takes place on an RStudio Server cloud Linux instance, ensuring that only one operating system needs to be supported. We will keep the motivation for <a href="https://semver.org/">good versioning</a> and <a href="http://adv-r.had.co.nz/Reproducibility.html">reproducibility</a> short: R projects evolve over time, as do the packages that they rely on. R projects that do not control package versions will eventually break and/or not be shareable or <a href="https://en.wikipedia.org/wiki/Replication_crisis">reproducible</a><sup class="footnote-ref" id="fnref:1"><a href="#fn:1">1</a></sup>.</p>
<p>Since R is a slowly evolving language, it might be reasonable to require that a particular Linux instance have only one version of R installed. However, requiring all R users to use the same versions of all packages to facilitate collaboration is clearly out of the question. The solution is to control package versions at the project level.</p>
<p><img src="/post/2019-04-15-Roland_files/Roland1.png" alt="R system, user, and Packrat library locations in Linux CentOS 7" /></p>
<p>We use <a href="https://rstudio.github.io/packrat/"><code>packrat</code></a> to control package versions. Already integrated with RStudio Server, <code>packrat</code> ensures that all installed packages are stored <em>with</em> the project<sup class="footnote-ref" id="fnref:2"><a href="#fn:2">2</a></sup>, and that these packages are available when a project is opened. With <code>packrat</code>, we know that project A will always be able to use ggplot2 2.5.0 and project B will always be able to use ggplot2 3.1.0. This is important if we want to be able to reproduce results in the future.</p>
<p>On Linux, <code>packrat</code> stores compiled packages in <code>packrat/lib/<LINUX_FLAVOR>/<R_VERSION></code>, an R-version-specific path, relative to the project’s base directory. An issue arises if we are using R version 3.5.0 one week and then upgrade to R 3.5.1 the next week: a <code>packrat</code> project will not find the 3.5.0 libraries anymore, and we will need to rebuild all the packages to install them in the 3.5.1 path. <code>packrat</code> will automatically build all packages from source (sources are stored in <code>packrat/src</code>) if it notices they are missing. However, this process can take tens of minutes, depending on the number of packages being built. Since this can be cumbersome when collaborating, we also opt to include the <code>packrat/lib</code> path in version control, thereby committing the compiled libraries as well.</p>
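<p>As a rough illustration of the version-specific path, the following shell sketch builds the expected library directory for a project and reports whether it exists for two R versions. The helper name <code>packrat_lib_dir</code>, the platform string <code>x86_64-pc-linux-gnu</code>, and the throwaway project directory are assumptions made for this example; on a real instance, the platform and version components are whatever that R installation reports.</p>
<pre><code>#!/bin/sh
set -eu

# Hypothetical helper: print the library path packrat would use for a
# given project directory, platform string, and R version.
packrat_lib_dir() {
  printf '%s/packrat/lib/%s/%s\n' "$1" "$2" "$3"
}

# Throwaway project whose library was built under R 3.5.0 only.
project=$(mktemp -d)
platform="x86_64-pc-linux-gnu"   # example value, not taken from the post
mkdir -p "$(packrat_lib_dir "$project" "$platform" 3.5.0)"

# After an upgrade to R 3.5.1, the expected directory is missing,
# which is the situation in which packrat rebuilds from packrat/src.
for version in 3.5.0 3.5.1; do
  dir=$(packrat_lib_dir "$project" "$platform" "$version")
  if [ -d "$dir" ]; then
    echo "R $version: library found"
  else
    echo "R $version: library missing, rebuild needed"
  fi
done
</code></pre>
<p>A check along these lines can be handy before opening a shared project, since it shows in advance whether a lengthy rebuild is likely.</p>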
<p>Our solution is to bind one fixed R version to an instance<sup class="footnote-ref" id="fnref:5"><a href="#fn:5">3</a></sup> and release fixed-R instance images periodically. We prefer limited, consistent R-versions over continually upgrading to the most recent version of R. This approach helps to ensure reproducibility, makes collaboration easier, and avoids having to use Docker containers<sup class="footnote-ref" id="fnref:4"><a href="#fn:4">4</a></sup>. While binding a fixed version of R to an instance may seem restrictive, we have found that it is in fact quite liberating. Since we only update the existing R version infrequently (think once a year), the barrier of agreeing on an R-version is removed and with it any need to agree on package versions at the user level. Instead, packages are distributed with the project via git. The benefits of fixing the R version for a particular instance are:</p>
<ul>
<li>Sharing <code>packrat</code> projects and reproducing results are both made easier, since pre-compiled libraries are included with the projects.</li>
<li>Fixing the R-version on an instance doesn’t keep us from upgrading R for a project, as <code>packrat</code> will automatically build and install libraries if an upgraded version is detected. In this way, a project can be opened on an instance with an upgraded R version and have its libraries compiled. Our limited instance image release schedule means the overhead to handle this only occurs at a maximum of once each year.</li>
<li>Although results are very unlikely to differ across R-versions, tying project results to one R-version allows us to upgrade R for a project while checking that results remain as expected.</li>
</ul>
<p>What we lose by not being on the bleeding edge of (thankfully relatively non-critical) bug fixes we gain in ease of collaboration. Here’s what we’ve done to accomplish this:</p>
<ul>
<li><a href="https://github.com/ras44/rstudio-instance">rstudio-instance</a> contains branches with scripts to set up a Linux instance with fixed R and RStudio versions. We <code>git clone</code> the repo and <code>git checkout</code> the branch suitable for the Linux flavor, R-version, and RStudio version we want. The scripts also ensure R is not auto-updated in the future.</li>
<li>We then run the install script to set up the instance and archive an image of it for future use.</li>
<li>Once the fixed-R instance is set up, <a href="https://github.com/ras44/rstudio-project">rstudio-project</a> contains an R-version specific base project with pre-built, <code>packrat</code>-managed, fixed-versions of many popular data-science packages<sup class="footnote-ref" id="fnref:3"><a href="#fn:3">5</a></sup>.</li>
<li>We <code>git clone</code> <a href="https://github.com/ras44/rstudio-project">rstudio-project</a> to a new project directory locally and remove the existing <code>.git</code> directory so that it can be turned into a new git repo with <code>git init</code>.</li>
<li>We open the project in RStudio and begin work. All packages are pre-built, so we don’t have to go through lengthy installs. We can upgrade packages in the <code>packrat</code> library section of the “Packages” tab, and then run <code>packrat::snapshot()</code> to save any libraries and upgrades into the project’s <code>packrat/</code> directory. We can then <code>git add packrat</code> to add any <code>packrat</code> updates to the project’s git repo.</li>
<li>If we ever need to duplicate results, we can always build the same fixed-R instance (or clone the image we stored earlier), clone the project on the instance, and know that it will work exactly the same as when we previously worked on it… sometimes years earlier.</li>
</ul>
<p>Here is a quick example script showing the workflow:</p>
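<pre><code># Set up a fixed-R instance from the branch matching the Linux flavor, R, and RStudio Server versions
git clone git@github.com:ras44/rstudio-instance.git
cd rstudio-instance
git checkout centos7_R3.5.0_RSS1.1.453
./install.sh
sudo passwd <USERNAME> # set user password for RStudio Server login

# Start a new project from the R-version-specific base project
cd
git clone git@github.com:ras44/rstudio-project.git new-project
cd new-project
git checkout dev-linux-centos7-R3.5.0

# Drop the base project's history and start a fresh repo
rm -rf .git
git init
</code></pre>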
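<p>The later step of committing <code>packrat</code> updates can be sketched in the same way. The example below is only an illustration: it uses a throwaway repository and placeholder files in place of what <code>packrat::snapshot()</code> would write, and the commit message and demo identity are arbitrary. If a project’s <code>.gitignore</code> excludes the <code>packrat/lib</code> directory (<code>packrat</code> has a <code>vcs.ignore.lib</code> option that controls this), that entry would need to be removed before the compiled libraries can be committed.</p>
<pre><code>#!/bin/sh
set -eu

# Throwaway repository standing in for new-project.
project=$(mktemp -d)
cd "$project"
git init -q

# Placeholder files standing in for the output of packrat::snapshot().
mkdir -p packrat/lib packrat/src
touch packrat/packrat.lock packrat/lib/placeholder packrat/src/placeholder

# Stage and commit the whole packrat/ directory, compiled libraries included.
git add packrat
git -c user.name="demo" -c user.email="demo@example.com" commit -q -m "Update packrat library"

# List the tracked files to confirm packrat/lib is under version control.
git ls-files packrat
</code></pre>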
<p>Finally, here are some issues with <code>packrat</code> that we have run into along with our solutions. Note that RStudio support has been very helpful in addressing issues while monitoring and providing solutions via their <a href="https://github.com/rstudio/packrat/issues">github issue tracker</a>.</p>
<p><img src="/post/2019-04-15-Roland_files/Roland2.png" alt="packrat libraries are listed under &quot;Project Library&quot;" /></p>
<ul>
<li><p>If R crashes and the <code>packrat</code> libraries are not accessible after RStudio restarts the session, the project might need to be re-opened. Run <code>.libPaths()</code> to ensure the project library paths are correct. Verify that libraries are accessible by looking at the “Packages” tab in RStudio Server and ensuring a “Project Library” header exists with all packages (see the image above). Follow the <a href="https://github.com/rstudio/packrat/issues/549">issue discussion</a>.</p></li>
<li><p>An <a href="https://github.com/r-dbi/bigrquery/issues/247">issue</a> can arise when some packages are updated but others aren’t. This can be challenging to troubleshoot and raises the question of what to do when package versions become incompatible with each other. This is not a <code>packrat</code> problem, but a version-compatibility problem.</p></li>
<li><p>Support for installing packages directly from a private/internal GitHub is still evolving. An easy workaround exists: clone the package to a local directory such as <code>~/local_repos/</code>, then use <code>install_local()</code> to install from the <code>local_repos</code> directory. See the <a href="https://github.com/rstudio/packrat/issues/447">issue</a> for details.</p></li>
<li><p><code>packrat</code> can occasionally have <a href="https://github.com/rstudio/packrat/issues/347">very slow snapshots</a>, particularly with projects that contain many R-Markdown files and packages. This is likely due to <code>packrat</code> dependency searches. As discussed in the issue, we resolve it by ignoring all of our source directories with <code>packrat::set_opts(ignored.directories=c("all","my","R","src","directories"))</code> and then running <code>packrat::snapshot(ignore.stale=TRUE, infer.dependencies=FALSE)</code>.</p></li>
</ul>
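<p>The checks and workarounds in the list above can be sketched in R. This is a rough illustration rather than a tested recipe: the helper name <code>uses_packrat_library()</code> and the package path <code>~/local_repos/mypackage</code> are hypothetical, and the sketch assumes the project library lives under <code>packrat/lib</code>. The <code>packrat</code> and <code>devtools</code> calls are left commented out because they only make sense inside a <code>packrat</code> project.</p>
<pre class="r"><code># Hypothetical helper: TRUE if any library path points inside a packrat/lib directory
uses_packrat_library <- function(paths = .libPaths()) {
    paths <- chartr("\\", "/", paths)  # normalise Windows path separators
    any(grepl("packrat/lib", paths, fixed = TRUE))
}

# Check the library paths of the current session
uses_packrat_library()

## Inside a packrat project (assumed directory and package names):
# packrat::set_opts(ignored.directories = c("R", "src"))
# packrat::snapshot(ignore.stale = TRUE, infer.dependencies = FALSE)
# devtools::install_local("~/local_repos/mypackage")</code></pre>
<p>If the helper returns <code>FALSE</code> in a session that should be using <code>packrat</code>, re-opening the project is the first thing to try.</p>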
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">Unless you somehow exclusively use packages that are never updated, never implement version-breaking/major version updates, or always provide backwards-compatible version upgrades. Many R packages are in major version 0, meaning there is no guarantee that a future release will maintain the same API.
<a class="footnote-return" href="#fnref:1">↩</a></li>
<li id="fn:2">In the <code>packrat/</code> directory
<a class="footnote-return" href="#fnref:2">↩</a></li>
<li id="fn:5">It <a href="https://support.rstudio.com/hc/en-us/articles/215488098-Installing-multiple-versions-of-R-on-Linux">is possible</a> to have multiple R versions installed on a system. I have avoided that for simplicity.
<a class="footnote-return" href="#fnref:5">↩</a></li>
<li id="fn:4">Docker containers may be a good alternative solution, but in this case we are not using them.
<a class="footnote-return" href="#fnref:4">↩</a></li>
<li id="fn:3">rstudio-project contains all packages in the anaconda distribution and more.
<a class="footnote-return" href="#fnref:3">↩</a></li>
</ol>
</div>
On Meeting Data Journalists
https://rviews.rstudio.com/2019/04/08/some-impressions-from-ire-car-2019/
Mon, 08 Apr 2019 00:00:00 +0000https://rviews.rstudio.com/2019/04/08/some-impressions-from-ire-car-2019/
<p><strong>“I’d rather do data than date”</strong>. I overheard this while eavesdropping on a conversation among three female data journalists while waiting for an elevator at the <a href="https://www.ire.org/conferences/nicar-2019/">IRE-CAR</a> (Investigative Reporters and Editors - Computer-Assisted Reporting) conference last month. I would like to think the remark was overloaded with hyperbole, but maybe not. Most of the attendees at this conference were motivated, tenacious, and highly skilled data hounds, the kind of investigative journalists who pry information from government databases through persistent requests, legal leverage, and SQL expertise.</p>
<p>This was my first CAR conference, and I was very impressed by the mission-driven enthusiasm with which the speakers, panelists, and attendees focused on data as an essential tool for the pursuit of the truth. I was impressed, but not surprised, to find this passion for data. Journalists have been sifting through data to find the truth since at least the early twentieth century when social work, academic social science, and journalism were all in the same primeval soup<sup>1</sup>. The modern tradition of computer-assisted data journalism dates at least as far back as <a href="https://en.wikipedia.org/wiki/Philip_Meyer">Philip Meyer’s</a> coverage of the <a href="https://en.wikipedia.org/wiki/1967_Detroit_riot">Detroit Riots</a>, his 1973 book <a href="https://www.ebooks.com/en-us/1352166/precision-journalism/meyer-philip/"><em>Precision Journalism</em></a>, and subsequent collaboration with <a href="https://en.wikipedia.org/wiki/Donald_L._Barlett">Donald Barlett</a> and <a href="https://en.wikipedia.org/wiki/James_B._Steele">James Steele</a> examining patterns in 1970s Philadelphia criminal conviction sentences<sup>2</sup>. I mention all this to emphasize that data journalism is not just a trendy offshoot of data science. In fact, it might be the other way around! Data scientists probably owe as much to data journalists as they do to statisticians.</p>
<p>The number of packed workshops and talks on data wrangling and visualization far exceeded my expectations. I did expect some R content. (I saw the recent <a href="https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535">BBC post</a>, and New York Times <a href="https://flowingdata.com/tag/new-york-times/">R-based visualizations</a> are a daily part of my news consumption.) But there were over 15 R-related sessions on <a href="https://www.ire.org/events-and-training/conferences/nicar-2019/schedule">the schedule</a>, along with at least as many sessions devoted to Python, SQL, JavaScript, D3, and other programming tools. Moreover, the fact that several featured technical workshops were repeated over multiple days indicated that the conference organizers expected the data journalists to want to dig into the details of all these technologies.</p>
<p>You can get a flavor for the technology presentations by looking into the <a href="http://www.machlis.com/nicar19.html">tip sheets</a> that are available for many of these sessions. There is a wealth of information buried here, well worth a couple of hours of exploration. For example, see <a href="https://peteraldhous.com/">Peter Aldhous’</a> talk <a href="https://paldhous.github.io/NICAR/2019/r-text-analysis.html">Text mining in R with tidytext</a>,</p>
<p><img src="/post/2019-03-28-Rickert-IRECAR_files/aldhouse.png" height = "400" width="600"></p>
<p>Andrew Ba Tran’s <a href="https://github.com/andrewbtran/NICAR-2019-mapping">Mapping with R</a>, the panel discussion <a href="https://docs.google.com/document/d/18E7iilbiGKC4bM8i05sFvB3b0ekahWhJxYIORPX3lhI/edit">How and why to make your data analysis reproducible</a>, and Sharon Machlis’ video series <a href="https://www.youtube.com/playlist?list=PL7D2RMSmRO9JOvPC1gbA8Mc3azvSfm8Vv">Do More with R</a>.</p>
<p>A session on statistical inference that I very much enjoyed was <a href="https://jevinwest.org/">Jevin West’s</a> talk <a href="https://www.ire.org/events-and-training/event/3433/4193/">Calling bullshit: Data reasoning in a digital world</a>. There is no tip sheet for the talk, but the <a href="https://callingbullshit.org/syllabus.html#Introduction">website</a> for the course that he teaches at the University of Washington with his colleague <a href="http://octavia.zoology.washington.edu/">Carl Bergstrom</a> contains voluminous material. Something like this course ought to be included in every statistics and data science syllabus. In addition to discussing the standard topics, such as attributing cause to correlation and deconstructing misleading visualizations, it also presents several up-to-date cautionary tales: for example, have a look at the case study <a href="https://callingbullshit.org/case_studies/case_study_criminal_machine_learning.html">Criminal Machine Learning</a>.</p>
<p><img src="/post/2019-03-28-Rickert-IRECAR_files/criminal.png" height = "400" width="600"></p>
<p>Some additional tip sheets that I found illuminating for what they reveal about the types of data sources that data journalists seek out are: <a href="https://assets.documentcloud.org/documents/5757972/NICAR-2019-Dark-Money-Tip-Sheet-March-2019.pdf">Tracking dark money tips</a>, <a href="https://www.dropbox.com/s/vhgn04nvmewxgdn/Hansi_Wang_Tip_Sheet_Census_Reporting20190305.pdf?dl=0">2020 Census Reporting Mistakes</a>, <a href="https://docs.google.com/document/d/1-tt52jNG_lOYLm5m1Hk1QqLGQRkOwqD9FZLKPGN0648/edit">Tips on Finding Nonprofit Data</a>, and <a href="http://mjwebster.github.io/DataJ/tipsheets/BeforeYouEverStartYourAnalysis.pdf">Before you ever begin your analysis</a>.</p>
<p>If you are a data journalist, or a data journalist in training, or really anyone new to R, and are looking for a fast on-ramp to becoming productive at data wrangling and creating visualizations, I highly recommend Sharon Machlis’ book <a href="https://smach.github.io/R4JournalismBook/">Practical R for Mass Communication and Journalism</a> and Andrew Ba Tran’s tutorial, <a href="https://learn.r-journalism.com/en/">R For Journalists</a>. Both of these resources are unusual in that they provide up-to-date, <a href="https://www.tidyverse.org/">tidyverse</a>-based, GitHub-aware introductions to R, stressing data acquisition, manipulation, reporting, and graphing without the burden of having to simultaneously take an introductory course in statistics.</p>
<p>Finally, thanks to Andrew, here is a list of R-fluent journalists whom you may want to follow on Twitter:</p>
<ul>
<li><a href="https://twitter.com/paldhous">Peter Aldhous</a></li>
<li><a href="https://twitter.com/akesslerdc">Aaron Kessler</a></li>
<li><a href="https://twitter.com/sharon000">Sharon Machlis</a></li>
<li><a href="https://twitter.com/dhmontgomery">David Montgomery</a></li>
<li><a href="https://twitter.com/hannah_recht">Hannah Recht</a></li>
<li><a href="https://twitter.com/abtran">Andrew Ba Tran</a></li>
<li><a href="https://twitter.com/MaryJoWebster">MaryJo Webster</a></li>
<li><a href="https://twitter.com/christinezhang">Christine Zhang</a></li>
</ul>
<p><sup>1</sup> C.W. Anderson. <a href="https://www.amazon.com/Apostles-Certainty-Journalism-Politics-Studies/dp/0190492341/ref=sr_1_1?crid=35VDHWYA1JY8L&keywords=apostles+of+certainty&qid=1553882060&s=books&sprefix=apostles+of+certainty%2Cstripbooks%2C210&sr=1-1"><em>Apostles of Certainty</em></a>. Oxford University Press, 2018. Chapter 2.</p>
<p><sup>2</sup> <a href="https://en.wikipedia.org/wiki/James_B._Steele"><em>Data Journalism</em></a>, Wikipedia</p>
How to share R visualizations in Microsoft PowerPoint
https://rviews.rstudio.com/2019/04/04/sharing-r-visualizations-in-powerpoint/
Thu, 04 Apr 2019 00:00:00 +0000https://rviews.rstudio.com/2019/04/04/sharing-r-visualizations-in-powerpoint/
<p><em>Hadrien Dykiel is an RStudio Customer Success Engineer</em></p>
<p>Microsoft PowerPoint is often the de facto choice for creating presentation slides, especially at larger companies. In many organizations, it comes pre-installed on workstations and pretty much everybody knows how to use it. This can make it an effective medium for sharing information, since most folks are comfortable with it. Unfortunately, valuable time is often lost manually creating slides. R developers often find themselves copying and pasting their results into presentation decks. Moreover, results may change over time, requiring analysts and data scientists to manually update their slides with the latest results. So, in addition to being a time-consuming task, copying and pasting also introduces a big reproducibility problem. R can help solve these problems by programmatically exporting your results to PowerPoint for you.</p>
<p>Let’s say you are collaborating on a project in which members of your team will use other tools like SAS and Excel to perform their analyses. At the end of the day, you plan to combine all of your work together into a single presentation deck that will be shared with various business stakeholders. You boot up RStudio and open an R Markdown file and produce a correlation plot for the presentation.</p>
<p>Rather than manually copying and pasting your corrplot into the final PowerPoint deck, you can update the output document type in your document’s YAML header to <code>powerpoint_presentation</code>. Optionally, you may also want to customize the appearance of your slides by passing a custom reference document via the <code>reference_doc</code> option. This is a nice option to use if you want your slides to match your company’s color schemes, for example. The snippet below shows what the code for a typical <code>rmarkdown</code> file with the output format set to PowerPoint might look like. Like all <code>.Rmd</code> files, it contains three elements: a YAML header that contains the metadata for your RMD file, narrative in simple markdown syntax, and code.</p>
<div class="figure">
<img src="/post/2019-03-05-sharing-r-visualizations-in-powerpoint_files/rmd_powerpoint_screenshot.png" />
</div>
<p>As soon as you hit the Knit button (or use the keyboard shortcut Cmd/Ctrl + Shift + K), RStudio initiates the knitting process. The <code>rmarkdown</code> package, with the help of <code>knitr</code>, runs the code in your R Markdown document and turns the document into markdown, and Pandoc then converts it to the PowerPoint output format, as specified in your YAML header. This all happens under the hood, so as a user, the only thing you see is the final PowerPoint output file, which automatically opens as soon as your document finishes knitting. Because you created your PowerPoint slides programmatically, you can easily update them in the future, for example when new data becomes available and you wish to refresh your results.</p>
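<p>As a rough text-only counterpart to this workflow, the R sketch below writes a small <code>.Rmd</code> file whose YAML header requests PowerPoint output and defines a helper that knits it outside of RStudio. It is an illustration under assumptions, not the file from the screenshot: the title, the file names (<code>deck.Rmd</code>, <code>template.pptx</code>), the use of base R’s <code>heatmap()</code> in place of a <code>corrplot</code> call, and the helper <code>render_deck()</code> are all made up for the example.</p>
<pre class="r"><code># Three backticks, built programmatically so the chunk delimiters are easy to read
fence <- strrep("`", 3)

# Lines of a minimal R Markdown document (hypothetical content)
rmd_lines <- c(
    "---",
    'title: "Correlation results"',
    "output: powerpoint_presentation",
    "# To apply a company template instead, nest the option:",
    "# output:",
    "#   powerpoint_presentation:",
    "#     reference_doc: template.pptx",
    "---",
    "",
    "## Correlation plot",
    "",
    paste0(fence, "{r corr-plot, echo=FALSE}"),
    "heatmap(cor(mtcars), symm = TRUE)",
    fence
)

# Write the document to a temporary directory
rmd_path <- file.path(tempdir(), "deck.Rmd")
writeLines(rmd_lines, rmd_path)

# Hypothetical helper: knit the document, as the Knit button would
render_deck <- function(path) {
    # PowerPoint output needs the rmarkdown package and Pandoc 2.0.5 or later
    stopifnot(requireNamespace("rmarkdown", quietly = TRUE),
              rmarkdown::pandoc_available("2.0.5"))
    rmarkdown::render(path)  # returns the path of the generated .pptx file
}</code></pre>
<p>Calling <code>render_deck(rmd_path)</code> should then produce a <code>.pptx</code> file next to the <code>.Rmd</code> file. Keeping the rendering step in a function makes it easy to re-run whenever the underlying data changes.</p>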
<p>The <code>rmarkdown</code> package offers a fair amount of flexibility for customizing your PowerPoint slides, such as having the ability to include R code, images, R visualizations, speaker notes, and customized column layout.</p>
<p>To learn more about creating PowerPoint presentations with R, Yihui Xie’s <a href="https://bookdown.org/yihui/rmarkdown/powerpoint-presentation.html">R Markdown: The Definitive Guide</a> and RStudio’s article <a href="https://support.rstudio.com/hc/en-us/articles/360004672913-Rendering-PowerPoint-Presentations-with-RStudio">Rendering PowerPoint Presentations with RStudio</a> are both great resources.</p>
RInside Help in Testing
https://rviews.rstudio.com/2019/04/01/rinside-help-in-testing/
Mon, 01 Apr 2019 00:00:00 +0000https://rviews.rstudio.com/2019/04/01/rinside-help-in-testing/
<p>A problem that arises when building R interfaces to C/C++ libraries involves testing: how to go about replicating the existing C/C++ tests in R without undue effort. If the C/C++ tests are simple and small enough, they can be manually translated. However, when there are many tests, and each test initializes its own large data structures, the task becomes a chore.</p>
<p>We faced this problem with a recent release of <a href="https://cran.r-project.org/package=ECOSolveR"><code>ECOSolveR</code></a>, a solver package crucial to our larger package <a href="https://cvxr.rbind.io"><code>CVXR</code></a>. Until version 0.4, we had been content with including one small test and a larger one using saved <code>RDA</code> files in the R package. But with our work on <code>CVXR</code> moving towards a version 1.0 release, we wanted to batten down the hatches.</p>
<p>The <a href="https://github.com/embotech/ecos"><code>ECOS</code></a> C library has about 28 tests and many of them include large, initialized arrays as test data. For example, see <a href="https://raw.githubusercontent.com/embotech/ecos/2954b2a640f2194bf91dbf51e682be17012d7698/test/MPC/MPC01.h">this</a>.</p>
<p>The initial thought was to parse out the arrays in the C source and write out R equivalents for testing. But that could be error-prone and if a test failed, we could never really be sure that our translation was not a problem.</p>
<p>So we looked around for a <a href="http://threevirtues.com/">lazy</a> solution that would let us create R data structures from within the C test code.</p>
<div id="enter-rinside" class="section level2">
<h2>Enter <code>RInside</code></h2>
<p><a href="http://dirk.eddelbuettel.com/code/rinside.html">RInside</a> allows one to embed R inside C/C++.</p>
<p>To use <code>RInside</code>, one initializes a handle to the R process in the C/C++ program. This handle can be used to assign values held in C/C++ scalars, arrays, etc. to R variables. One can also evaluate arbitrary strings using the R process, which is quite handy: we can use it to execute a <code>saveRDS</code> call to save R structures in a file. The C++ snippet below shows a simple example. (To run the example, first untar the <a href="https://cran.r-project.org/package=RInside">RInside</a> source, copy <code>test.cpp</code> below to the <code>RInside/inst/examples/standard</code> directory, and run <code>make test</code>.)</p>
<pre class="c"><code>// test.cpp
#include <RInside.h>
#include <string>
#include <vector>

// Stuff a double array in a std::vector
std::vector< double > dVec(double *data, int len) {
    std::vector< double > result;
    for (int i = 0; i < len; i++) result.push_back(data[i]);
    return result;
}

// Number of elements in an initialized double array, or 0 if the pointer is NULL
#define DLEN(x) ((x) ? (int) (sizeof(x) / sizeof(double)) : 0)

int main(int argc, char* argv[]) {
    int n = 1;
    double y[] = {1.0, 2.0, 3.0};
    RInside R(argc, argv);            // handle to the embedded R process
    R.assign(n, "n");                 // assign the C scalar n to the R variable n
    R.assign(dVec(y, DLEN(y)), "y");  // assign the C array y to the R variable y
    std::string rds_file = "out.RDS";
    R.assign(rds_file, "rds_file");   // assign the value "out.RDS" to the R variable rds_file
    R.parseEvalQ("saveRDS(list(n = n, y = y), file = rds_file)");
    return 0;
}</code></pre>
<p>Note the use of some macros to determine lengths of initialized C arrays; in particular, we account for the fact that the array pointer could be <code>NULL</code>, common in the code we encounter. Not shown here is a similar macro that can be used for an initialized integer array. (Dynamically allocated arrays pose no difficulty since the lengths would be known.) Such macros are made accessible to the C/C++ code via included headers.</p>
<p>The following features of the <code>ECOS</code> C tests make it possible to exploit <code>RInside</code>.</p>
<ul>
<li>Each C test is in a single source file. One exception has five tests in a single file.</li>
<li>Each C test has a pattern: a call to a <em>setup</em> function to set up the data, followed by an actual call to the solver.</li>
<li>Most (but not all) arrays are initialized in the C source.</li>
<li>The types of the variables in the setup function call are fixed.</li>
<li>The tests are all invoked from a single test harness.</li>
</ul>
<p>This suggests the following strategy.</p>
<ol style="list-style-type: decimal">
<li>Create an R process in the C library testing harness.</li>
<li>Modify each C test source file by inserting <code>RInside</code> calls <em>before</em> calling the setup function. These calls will save the C data in R data structures, and export the R data using a call to <code>saveRDS</code>.</li>
<li>Rerun the C library tests to generate the <code>RDS</code> files.</li>
<li>Use the generated <code>RDS</code> files in R package tests.</li>
</ol>
</div>
<div id="example" class="section level2">
<h2>Example</h2>
<p>Taking <a href="https://raw.githubusercontent.com/embotech/ecos/2954b2a640f2194bf91dbf51e682be17012d7698/test/MPC/MPC01.h">one test</a> as an example, the setup call (near the bottom of the file) has the following form:</p>
<pre class="c"><code>mywork = ECOS_setup(MPC01_n, MPC01_m, MPC01_p, MPC01_l, MPC01_ncones, MPC01_q, 0,
MPC01_Gpr, MPC01_Gjc, MPC01_Gir,
MPC01_Apr, MPC01_Ajc, MPC01_Air,
MPC01_c, MPC01_h, MPC01_b);</code></pre>
<p>And since the signature of <code>ECOS_setup</code> is known, the parameter types (scalars or arrays) can be fixed.</p>
<pre class="r"><code>var_type <- c(n = "int", m = "int", p = "int", l = "int", ncones = "int",
q = "int*", e = "int",
Gpr = "double*", Gjc = "int*", Gir = "int*",
Apr = "double*", Ajc = "int*", Air = "int*",
c = "double*", h = "double*", b = "double*")</code></pre>
<div id="extracting-c-variable-names" class="section level3">
<h3>Extracting C variable names</h3>
<p>A simple string match in R can easily identify the starting and ending lines of the C setup invocation in the test source file and return a list of the C variable names in each call.</p>
<pre class="r"><code>library(magrittr)  # provides the %>% pipe used below

get_setup_vars <- function(source, file_source = TRUE) {
    lines <- if (file_source) readLines(source) else source
    starts <- grep("ECOS_setup", lines)
    potential_ends <- grep("\\);$", lines)
    ## For each start, the first line at or after it that ends with ");"
    ends <- sapply(starts,
                   function(i) potential_ends[potential_ends >= i][1])
    ## starts[i]:ends[i] are chunks of interest.
    lapply(seq_along(starts),
           function(i) {
               lines[starts[i]:ends[i]] %>%
                   paste(collapse = "") %>%
                   stringr::str_replace(pattern = "(.*ECOS_setup\\()", replacement = "") %>%
                   stringr::str_replace(pattern = "(\\);)$", replacement = "") %>%
                   stringr::str_split(pattern = ",") %>%
                   magrittr::extract2(1) %>%
                   stringr::str_trim()
           })
}</code></pre>
<p>A test invocation.</p>
<pre class="r"><code>source_lines <- c('mywork = ECOS_setup(MPC01_n, MPC01_m, MPC01_p, MPC01_l, MPC01_ncones, MPC01_q, 0,',
'MPC01_Gpr, MPC01_Gjc, MPC01_Gir,',
'MPC01_Apr, MPC01_Ajc, MPC01_Air,',
'MPC01_c, MPC01_h, MPC01_b);')
get_setup_vars(source_lines, file_source = FALSE)</code></pre>
<pre><code>[[1]]
[1] "MPC01_n" "MPC01_m" "MPC01_p" "MPC01_l"
[5] "MPC01_ncones" "MPC01_q" "0" "MPC01_Gpr"
[9] "MPC01_Gjc" "MPC01_Gir" "MPC01_Apr" "MPC01_Ajc"
[13] "MPC01_Air" "MPC01_c" "MPC01_h" "MPC01_b" </code></pre>
</div>
<div id="inserting-c-code" class="section level3">
<h3>Inserting C code</h3>
<p>The next task is to generate the code to be inserted before the call to setup. The following code does the job, taking the types of each of the variables in the setup call and making use of macros shown in the C++ example above.</p>
<pre class="r"><code>#' Assign a variable in R
#' @param x C variable name
#' @param r_name R variable name
set_val <- function(x, r_name) paste0('R.assign(', x, ', "', r_name, '");')
set_ivec <- function(x, r_name) paste0('R.assign(iVec(', x, ', ILEN(', x, ')), "', r_name, '");')
set_dvec <- function(x, r_name) paste0('R.assign(dVec(', x, ', DLEN(', x, ')), "', r_name, '");')

#' Generate C++ code lines for insertion into C test source
#' @param c_name a vector of C variable names that should be saved in R
#' @param var_type a named character vector of variable types; the names are the R variable names
#' @param rds_file a string naming the rds file for saving the list object
gen_cpp <- function(c_name, var_type, rds_file) {
    r_name <- names(var_type)
    ## One R.assign(...) line per argument, chosen by its C type
    result <- sapply(seq_along(var_type),
                     function(i) {
                         if (var_type[i] == "int" || var_type[i] == "double") {
                             set_val(c_name[i], r_name[i])
                         } else if (var_type[i] == "int*") {
                             set_ivec(c_name[i], r_name[i])
                         } else if (var_type[i] == "double*") {
                             set_dvec(c_name[i], r_name[i])
                         } else {
                             stop("Unknown variable type")
                         }
                     })
    ## Create list to save:
    output <- paste("foo <- list(",
                    paste(sapply(r_name, function(x) paste(x, "=", x)), collapse=", "),
                    ")")
    ## Set output file name and insert call to R saveRDS
    c(result,
      paste0("std::string fname = ", paste0('"', rds_file, '"'), ";"),
      set_val('fname', 'fname'),
      paste0('R.parseEvalQ("', output, '");'),
      paste0('R.parseEvalQ("saveRDS(foo, file=fname)");'))
}</code></pre>
</div>
<div id="does-it-work" class="section level3">
<h3>Does it work?</h3>
<p>We can check that the appropriate C code is generated for inserting into the file.</p>
<pre class="r"><code>c_name <- get_setup_vars(source_lines, file_source = FALSE)[[1]]
gen_cpp(c_name, var_type, "foo.rds")</code></pre>
<pre><code> [1] "R.assign(MPC01_n, \"n\");"
[2] "R.assign(MPC01_m, \"m\");"
[3] "R.assign(MPC01_p, \"p\");"
[4] "R.assign(MPC01_l, \"l\");"
[5] "R.assign(MPC01_ncones, \"ncones\");"
[6] "R.assign(iVec(MPC01_q, ILEN(MPC01_q)), \"q\");"
[7] "R.assign(0, \"e\");"
[8] "R.assign(dVec(MPC01_Gpr, DLEN(MPC01_Gpr)), \"Gpr\");"
[9] "R.assign(iVec(MPC01_Gjc, ILEN(MPC01_Gjc)), \"Gjc\");"
[10] "R.assign(iVec(MPC01_Gir, ILEN(MPC01_Gir)), \"Gir\");"
[11] "R.assign(dVec(MPC01_Apr, DLEN(MPC01_Apr)), \"Apr\");"
[12] "R.assign(iVec(MPC01_Ajc, ILEN(MPC01_Ajc)), \"Ajc\");"
[13] "R.assign(iVec(MPC01_Air, ILEN(MPC01_Air)), \"Air\");"
[14] "R.assign(dVec(MPC01_c, DLEN(MPC01_c)), \"c\");"
[15] "R.assign(dVec(MPC01_h, DLEN(MPC01_h)), \"h\");"
[16] "R.assign(dVec(MPC01_b, DLEN(MPC01_b)), \"b\");"
[17] "std::string fname = \"foo.rds\";"
[18] "R.assign(fname, \"fname\");"
[19] "R.parseEvalQ(\"foo <- list( n = n, m = m, p = p, l = l, ncones = ncones, q = q, e = e, Gpr = Gpr, Gjc = Gjc, Gir = Gir, Apr = Apr, Ajc = Ajc, Air = Air, c = c, h = h, b = b )\");"
[20] "R.parseEvalQ(\"saveRDS(foo, file=fname)\");" </code></pre>
<p>This C code can be inserted before the setup call in each test file in an automated way.</p>
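<p>One way such an automated insertion could look is sketched below. This is an assumption about the mechanics rather than the script we actually used: the function name <code>insert_before_setup()</code> is hypothetical, and it simply splices one block of generated lines in front of each line that contains <code>ECOS_setup</code>.</p>
<pre class="r"><code>#' Insert generated code before each ECOS_setup call (illustrative sketch)
#' @param lines character vector holding the C test source, one element per line
#' @param code_blocks list of character vectors, one block per ECOS_setup call
insert_before_setup <- function(lines, code_blocks) {
    starts <- grep("ECOS_setup", lines, fixed = TRUE)
    stopifnot(length(starts) == length(code_blocks))
    ## Work from the last call to the first so earlier line numbers stay valid
    for (k in rev(seq_along(starts))) {
        lines <- append(lines, code_blocks[[k]], after = starts[k] - 1)
    }
    lines
}

## A toy source file with a single setup call
src <- c("int status;",
         "mywork = ECOS_setup(MPC01_n, MPC01_m,",
         "MPC01_c, MPC01_h, MPC01_b);",
         "status = ECOS_solve(mywork);")
new_code <- c('R.assign(MPC01_n, "n");', 'R.assign(MPC01_m, "m");')
patched <- insert_before_setup(src, list(new_code))
patched</code></pre>
<p>Writing the patched lines back with <code>writeLines()</code> would complete the modification of a test file. Processing the calls in reverse order matters for the one source file that contains several tests, since each insertion shifts the lines after it.</p>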
</div>
</div>
<div id="summary" class="section level2">
<h2>Summary</h2>
<p><code>RInside</code> can be part of the solution for generating R package tests based on underlying C/C++ library tests. The above approach, with some minor modifications, enabled us to reprogram all 28 C tests for our R package. The modified C test source can be found on <a href="https://github.com/bnaras/ecos-2.0.7-rinside/">GitHub</a>. For instance, compare the <a href="https://raw.githubusercontent.com/embotech/ecos/2954b2a640f2194bf91dbf51e682be17012d7698/test/MPC/MPC01.h">original source of one test</a> to the <a href="https://raw.githubusercontent.com/bnaras/ecos-2.0.7-rinside/master/test/MPC/MPC01.h">modified one</a>. The modifications are towards the end.</p>
<p>Once the modifications were inserted into the C source files, the C tests were re-run to generate <code>RDS</code> files now included in <code>ECOSolveR</code> version 0.5. It was then quite straightforward to add the tests using <a href="https://cran.r-project.org/package=testthat"><code>testthat</code></a> as may be seen from the <a href="https://github.com/bnaras/ECOSolveR/blob/master/tests/testthat/test-ecos-src.R">R test source</a>.</p>
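<p>To give a flavor of how a saved list can be consumed on the R side, here is a small sketch that is not taken from the actual test source. It assumes that the <code>Gpr</code>, <code>Gjc</code>, and <code>Gir</code> arrays hold the values, zero-based column pointers, and zero-based row indices of an <code>m</code> by <code>n</code> matrix in compressed-column form; the data values are made up, and a temporary file stands in for an <code>RDS</code> file produced by the C tests.</p>
<pre class="r"><code>library(Matrix)  # recommended package shipped with R

## Stand-in for a list saved by a modified C test (values are made up)
saved <- list(n = 2L, m = 3L,
              Gpr = c(1, 3, 2),      # non-zero values, column by column
              Gjc = c(0L, 2L, 3L),   # zero-based column pointers
              Gir = c(0L, 2L, 1L))   # zero-based row indices
rds_file <- tempfile(fileext = ".rds")
saveRDS(saved, rds_file)

## In a package test, only this part would be needed
dat <- readRDS(rds_file)
G <- sparseMatrix(i = dat$Gir, p = dat$Gjc, x = dat$Gpr,
                  dims = c(dat$m, dat$n), index1 = FALSE)
as.matrix(G)</code></pre>
<p>Inside a <code>testthat</code> test, the reconstructed matrices and vectors would then be passed to the solver and the result compared with the value expected by the corresponding C test.</p>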
<p><sup>(1)</sup> <em>Balasubramanian Narasimhan is a Senior Research Scientist in Statistics at Stanford University, and Director of the Data Coordinating Center in the Department of Biomedical Data Sciences.</em></p>
<p><sup>(2)</sup> <em>Anqi Fu is a Ph.D Candidate in Electrical Engineering at Stanford University.</em></p>
</div>
February 2019: “Top 40” New CRAN Packages
https://rviews.rstudio.com/2019/03/26/february-2019-top-40-new-cran-packages/
Tue, 26 Mar 2019 00:00:00 +0000https://rviews.rstudio.com/2019/03/26/february-2019-top-40-new-cran-packages/
<p>One hundred and fifty-one new packages arrived at CRAN in February. Here are my “Top 40” picks organized into eight categories: Bioinformatics, Data, Machine Learning, Medicine, Statistics, Time Series, Utilities and Visualization.</p>
<h3 id="bioinfomatics">Bioinformatics</h3>
<p><a href="https://cran.r-project.org/package=Cascade">Cascade</a> v1.7: Implements a modeling tool allowing gene selection, reverse engineering, and prediction in cascade networks. See <a href="doi:10.1093/bioinformatics/btt705">Jung et al. (2014)</a> for details, along with a <a href="https://cran.r-project.org/web/packages/Cascade/vignettes/Cascade-manual.pdf">Package Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/Cascade/vignettes/E-MTAB-1475_re-analysis.pdf">re-analysis</a>.</p>
<figure>
<img src="/post/2019-03-18-Rickert-FebTop40_files/Cascade.png" height = "400" width="600"/>
<figcaption>
Result of reverse engineering a TH1 network
</figcaption>
</figure>
<p><a href="https://cran.r-project.org/package=countfitteR">countfitteR</a> v1.0: Implements functions and a <code>Shiny</code> app for the automated evaluation of distribution models for count data with an eye towards use in DNA analyses. The <a href="https://cran.r-project.org/web/packages/countfitteR/vignettes/countfitteR.html">vignette</a> provides an overview.</p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=noaaoceans">noaaoceans</a> v0.1.0: Provides tools to access the <a href="https://tidesandcurrents.noaa.gov/api/">National Oceanic and Atmospheric Administration (NOAA) API</a>. See the <a href="https://cran.r-project.org/web/packages/noaaoceans/vignettes/getting_started.html">vignette</a> for details.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/noaaoceans.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=guardianapi">guardianapi</a> v0.1.0: Provides functions to access <a href="https://open-platform.theguardian.com/">The Guardian’s open API</a>, containing all articles published in ‘The Guardian’ from 1999 to the present. The <a href="https://cran.r-project.org/web/packages/guardianapi/vignettes/introduction.html">vignette</a> shows how to use the package.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/guardianapi.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=RobinHood">RobinHood</a> v1.0.1: Implements an interface to the <a href="https://robinhood.com">RobinHood</a> investing platform, including the ability to access account data, retrieve investment statistics and quotes, place and cancel orders, and more.</p>
<p><a href="https://cran.r-project.org/package=stlcsb">stlcsb</a> v0.1.2: Provides functions for working with data from <a href="https://www.stlouis-mo.gov/government/departments/public-safety/neighborhood-stabilization-office/citizens-service-bureau/">The Citizens’ Service Bureau of the City of St. Louis</a> including downloading data, categorizing problem requests, cleaning and subsetting CSB data, and projecting the data using the x and y coordinates. See the <a href="https://cran.r-project.org/web/packages/stlcsb/vignettes/stlcsb.html">vignette</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/stlcsb.png" height = "200" width="400"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=bigMap">bigMap</a> v2.1.0: Implements an unsupervised clustering protocol for large scale structured data, based on a low dimensional representation of the data. See <a href="arXiv:1812.09869">Garriga and Bartumeus (2018)</a> and the <a href="https://cran.r-project.org/web/packages/bigMap/vignettes/bigMap_qckref.pdf">vignette</a> for details.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/bigMap.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=fastNaiveBayes">fastNaiveBayes</a> v1.0.1: Provides an extremely fast implementation of a Naive Bayes classifier that is largely based on the paper <a href="doi:10.3115/1067807">Schneider (2003)</a>. See the <a href="https://cran.r-project.org/web/packages/fastNaiveBayes/vignettes/fastnaivebayes.html">vignette</a> for an introduction.</p>
<p><a href="https://cran.r-project.org/package=gama">gama</a> v1.0.3: Implements a genetic, evolutionary approach to performing hard partitional clustering. For details see <a href="doi:10.18637/jss.v053.i04">Scrucca (2013)</a>, <a href="doi:10.18637/jss.v061.i06">Charrad et al. (2014)</a>, and <a href="doi:10.7287/peerj.preprints.26605v1">Tsagris and Papadakis (2018)</a>. The <a href="https://cran.r-project.org/web/packages/gama/vignettes/gama.html">vignette</a> shows how to use the package.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/gama.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=leiden">leiden</a> v0.2.3: Uses <code>reticulate</code> to implement the <code>Python leidenalg</code> clustering algorithm for partitioning graphs into communities in R. See the <a href="https://github.com/vtraag/leidenalg"><code>Python</code> repository</a> and <a href="arXiv:1810.08473">Traag et al. (2018)</a> for details. There is also a <a href="https://cran.r-project.org/web/packages/leiden/vignettes/run_leiden.html">vignette</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/leiden.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=r.blip">r.blip</a> v1.1: Provides functions to learn Bayesian networks from datasets containing thousands of variables, and includes algorithms for (1) parent set identification (<a href="http://papers.nips.cc/paper/5803-learning-bayesian-networks-with-thousands-of-variables">Scanagatta (2015)</a>), (2) general structure optimization (<a href="doi:10.1007/s10994-018-5701-9">Scanagatta (2018)</a>), (3) bounded tree width structure optimization (<a href="http://papers.nips.cc/paper/6232-learning-treewidth-bounded-bayesian-networks-with-thousands-of-variables">Scanagatta (2016)</a>), and (4) structure learning on incomplete data sets (<a href="doi:10.1016/j.ijar.2018.02.004">Scanagatta (2018)</a>).</p>
<p><a href="https://cran.r-project.org/package=RMTL">RMTL</a> v0.9: Implements efficient solvers for 10 regularized multi-task learning algorithms applicable for regression, classification, joint feature selection, task clustering, low-rank learning, sparse learning and network incorporation. The details are described in <a href="doi:10.1093/bioinformatics/bty831">Cao and Schwarz (2018)</a>. There is a <a href="https://cran.r-project.org/web/packages/RMTL/vignettes/rmtl.html">Tutorial</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/RTML.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=Spectrum">Spectrum</a> v0.4: Implements a fast, adaptive spectral clustering algorithm for single and multi-view data. The <a href="https://cran.r-project.org/web/packages/Spectrum/vignettes/Spectrum_vignette.pdf">vignette</a> provides an introduction.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/Spectrum.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=SAR">SAR</a> v1.0.0: Provides both a stand-alone and <a href="https://github.com/Microsoft/Product-Recommendations/blob/master/doc/sar.md">Azure Cloud</a> implementation of the Smart Adaptive Recommendations (SAR) algorithm for personalized recommendations. Look <a href="https://github.com/Microsoft/Product-Recommendations/blob/master/doc/sar.md">here</a> for a description of the SAR algorithm.</p>
<p><a href="https://cran.r-project.org/package=tfdeploy">tfdeploy</a> v0.6.0: Provides tools to deploy <a href="https://www.tensorflow.org/">TensorFlow</a> models across several services. There is a vignette on <a href="https://cran.r-project.org/web/packages/tfdeploy/vignettes/introduction.html">Deploying TensorFlow Models</a> and another on using <a href="https://cran.r-project.org/web/packages/tfdeploy/vignettes/saved_models.html">Saved Models</a>.</p>
<p><a href="https://cran.r-project.org/package=tfio">tfio</a> v0.4.0: Provides an interface to <a href="https://www.tensorflow.org/api_docs/python/tf/io">TensorFlow IO</a>. There is a brief <a href="https://cran.r-project.org/web/packages/tfio/vignettes/introduction.html">Introduction</a>.</p>
<p><a href="https://cran.r-project.org/package=stabm">stabm</a> v1.0.0: Implements several measures for the assessment of the stability of feature selection. See <a href="doi:10.1155/2017/7907163">Bommert et al. (2017)</a>.</p>
<p><a href="https://cran.r-project.org/package=tidystopwords">tidystopwords</a> v0.9.0: Provides functions to generate stopword lists in 53 languages, in a way that is consistent across all supported languages. There is a <a href="https://cran.r-project.org/web/packages/tidystopwords/vignettes/tidystopwords.html">vignette</a>.</p>
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=ClinReport">ClinReport</a> v0.9.1.11: Provides functions to create formatted statistical tables in Microsoft Word documents that meet clinical standards. There is a vignette for <a href="https://cran.r-project.org/web/packages/ClinReport/vignettes/clinreport_vignette_get_started.html">Getting Started</a>, a vignette for <a href="https://cran.r-project.org/web/packages/ClinReport/vignettes/clinreport_modify_outputs.html">Modifying Outputs</a>, and another for <a href="https://cran.r-project.org/web/packages/ClinReport/vignettes/clinreport_graphics.html">Graphic Outputs</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/ClinReport.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=safetyGraphics">safetyGraphics</a> v0.7.3: Implements a framework for evaluation of clinical trial safety through a <code>Shiny</code> application or standalone <code>htmlwidget</code> charts. See the <a href="https://cran.r-project.org/web/packages/safetyGraphics/vignettes/shinyUserGuide.html">User Guide</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/safetyGraphics.png" height = "200" width="400"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=dosearch">dosearch</a> v1.0.2: Implements a method to identify causal effects from arbitrary observational and experimental probability distributions via do-calculus and standard probability manipulations, using a search-based algorithm that handles selection bias (<a href="http://ftp.cs.ucla.edu/pub/stat_ser/r445.pdf">Bareinboim and Tian (2015)</a>), transportability (<a href="http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf">Bareinboim and Pearl (2014)</a>), missing data (<a href="http://ftp.cs.ucla.edu/pub/stat_ser/r410.pdf">Mohan et al. (2013)</a>), and arbitrary combinations of these. There is an informative <a href="https://cran.r-project.org/web/packages/dosearch/vignettes/dosearch.pdf">Introduction</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/dosearch.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=geosample">geosample</a> v0.2.1: Provides functions for constructing sampling designs. For details, see <a href="doi:10.1016/j.spasta.2015.12.004">Chipeta et al. (2016)</a> and the <a href="https://cran.r-project.org/web/packages/geosample/vignettes/geosample-vignette.pdf">vignette</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/geosample.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=interactions">interactions</a> v1.0.0: Provides a suite of functions for conducting and interpreting the analysis of statistical interaction in regression models, and includes visualization of two- and three-way interactions. There is a vignette for <a href="https://cran.r-project.org/web/packages/interactions/vignettes/interactions.html">Exploring Interactions</a> and another for <a href="https://cran.r-project.org/web/packages/interactions/vignettes/categorical.html">Plotting Interactions</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/interactions.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=IrregLong">IrregLong</a> v0.1.1: Provides functions to analyze longitudinal data for which the times of observation are random variables that are potentially associated with the outcome process, and includes inverse-intensity weighting methods (<a href="doi:10.1111/j.1467-9868.2004.b5543.x">Lin et al. (2004)</a>) and multiple outputation (<a href="doi:10.1002/sim.6829">Pullenayegum (2016)</a>). Look <a href="https://cran.r-project.org/web/packages/IrregLong/vignettes/Irreglong-vignette.html">here</a> for an overview.</p>
<p><a href="https://cran.r-project.org/package=missCompare">missCompare</a> v1.0.1: Implements a pipeline to test and compare various missing data imputation algorithms on simulated and real data. There is a <a href="https://cran.r-project.org/web/packages/missCompare/vignettes/misscompare.html">Tutorial</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/missCompare.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=OutlierDetection">OutlierDetection</a> v0.1.0: Implements various methods to detect outliers including: model-based (<a href="https://www.jstor.org/stable/2347159">Barnett (1978)</a>), distance-based (<a href="http://cs.uef.fi/~franti/papers.html">Hautamaki et al. (2004)</a>), dispersion-based (<a href="https://link.springer.com/chapter/10.1007/0-387-25465-X_7">Jin et al. (2001)</a>), depth-based (<a href="http://www.aaai.org/Library/KDD/1998/kdd98-038.php">Johnson et al. (1998)</a>), and density-based (<a href="https://dl.acm.org/citation.cfm?id=3001507">Ester et al. (1996)</a>).</p>
<p><a href="https://cran.r-project.org/package=plsr">plsr</a> v0.0.1: Provides functions for the partial least squares analysis of the relation between two high-dimensional data sets. See <a href="doi:10.1016/j.neuroimage.2004.07.020">McIntosh & Lobaugh (2004)</a> and the <a href="https://cran.r-project.org/web/packages/plsr/vignettes/introduction.html">vignette</a> for more information.</p>
<p><a href="https://cran.r-project.org/package=pliable">pliable</a> v1.1: Fits a pliable lasso model. For details see <a href="arXiv:1712.00484">Tibshirani and Friedman (2018)</a> and the package <a href="https://cran.r-project.org/web/packages/pliable/vignettes/pliable.html">vignette</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/pliable.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=PointFore">PointFore</a> v0.2.0: Provides functions to estimate specification models for the state-dependent level of an optimal quantile/expectile forecast along with Wald Tests and a test of overidentifying restrictions. There is a <a href="https://cran.r-project.org/web/packages/PointFore/vignettes/Tutorial.html">Tutorial</a> and vignettes on the <a href="https://cran.r-project.org/web/packages/PointFore/vignettes/GDP.html">GDP Greenbook</a> and <a href="https://cran.r-project.org/web/packages/PointFore/vignettes/Precipitation.html">Precipitation</a> examples.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/PointFore.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=segmenTier">segmenTier</a> v0.1.2: Implements a dynamic programming solution to segmentation based on maximization of arbitrary similarity measures within segments, based on the theory described in <a href="doi:10.1038/s41598-017-12401-8">Machne et al. (2017)</a>. The vignette provides an <a href="https://cran.r-project.org/web/packages/segmenTier/vignettes/segmenTier.html">Introduction</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/segmenTier.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=TextForecast">TextForecast</a> v0.1.1: Provides functions for regression analysis and forecasting using textual data, which are based on <a href="doi:10.2139/ssrn.3312483">Lima (2018)</a>. The <a href="https://cran.r-project.org/web/packages/TextForecast/vignettes/textforecast.html">vignette</a> shows how to use the package.</p>
<h3 id="time-series">Time Series</h3>
<p><a href="https://cran.r-project.org/package=Rlgt">Rlgt</a> v0.1-2: Provides functions to use <code>rstan</code> to fit several Global Trend models for time series forecasting that are Bayesian generalizations and extensions of some Exponential Smoothing models. There is an <a href="https://cran.r-project.org/web/packages/Rlgt/vignettes/GT_models.html">Introduction to global trend time series forecasting</a> and an <a href="https://cran.r-project.org/web/packages/Rlgt/vignettes/gettingStarted.html">Introduction</a> to the package.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/Rlgt.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=tsfeatures">tsfeatures</a> v1.0.0: Implements methods for extracting various features from time series data as described in <a href="doi:10.1109/ICDMW.2015.104">Hyndman et al. (2013)</a>, <a href="doi:10.1016/j.ijforecast.2016.09.004">Kang et al. (2017)</a> and <a href="doi:10.1098/rsif.2013.0048">Fulcher et al. (2013)</a>. The <a href="https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html">vignette</a> contains examples.</p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=pak">pak</a> v0.1.2: Streamlines and improves package installation. See <a href="https://cran.r-project.org/web/packages/pak/readme/README.html">README</a>.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/pak.svg" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=qs">qs</a> v0.14.1: Provides functions for quickly writing and reading any R object to and from disk. See the <a href="https://cran.r-project.org/web/packages/qs/vignettes/vignette.html">vignette</a> for use and timings.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/qs.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=ropendata">ropendata</a> v0.1.0: Provides functions to collect cyber-security data and make it available via the <a href="http://opendata.rapid7.com">Open Data</a> portal. Look at <a href="https://cran.r-project.org/web/packages/ropendata/readme/README.html">README</a> for information on using the package.</p>
<p><a href="https://cran.r-project.org/package=rosr">rosr</a> v0.0.5: Provides methods to create reproducible academic projects with integrated academic elements, including datasets, references, codes, images, manuscripts, dissertations, slides and so on.</p>
<p><a href="https://cran.r-project.org/package=shinyEventLogger">shinyEventLogger</a> v0.1.1: Implements a logging framework for complex Shiny apps. The <a href="https://cran.r-project.org/web/packages/shinyEventLogger/vignettes/shinyEventLogger.html">vignette</a> shows how to start logging.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=gratia">gratia</a> v0.2-8: Provides graceful <code>ggplot</code>-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the <code>mgcv</code> package. Look <a href="https://gavinsimpson.github.io/gratia/">here</a> for examples.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/gratia.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=jskm">jskm</a> v0.3.1: Provides the function <code>jskm()</code> to create publication quality Kaplan-Meier plots with at-risk tables below, and <code>svyjskm()</code> to plot a weighted Kaplan-Meier estimator.</p>
<p><img src="/post/2019-03-18-Rickert-FebTop40_files/jskm.png" height = "200" width="400"></p>
How to Avoid Publishing Credentials in Your Code
https://rviews.rstudio.com/2019/03/21/how-to-avoid-publishing-credentials-in-your-code/
Thu, 21 Mar 2019 00:00:00 +0000https://rviews.rstudio.com/2019/03/21/how-to-avoid-publishing-credentials-in-your-code/
<p><em>Roland Stevenson is a data scientist and consultant who may be reached on <a href="https://www.linkedin.com/in/roland-stevenson/">LinkedIn</a>.</em></p>
<p>When accessing an API or database in R, it is often necessary to provide credentials such as a login name and password. You may find yourself being prompted with something like this:</p>
<p><img src="/post/2019-03-15-roland_files/roland.png" alt="Figure: Providing credentials via an interactive prompt" /></p>
<p>When writing an R script that requires a user to provide credentials, you will want a way to have the script prompt the user or, better yet, programmatically provide the credentials in the R script. Either way, be careful! You don’t want to put your credentials out there in the clear for all the world to see. Best practices<sup class="footnote-ref" id="fnref:3"><a href="#fn:3">1</a></sup> emphatically state:</p>
<blockquote>
<p>As with every programming language, it is important to <strong>avoid publishing code with your credentials in plain text</strong>.</p>
</blockquote>
<p>So, how can we provide credentials without putting them in the script itself? There are a variety of options described in RStudio’s <a href="https://db.rstudio.com/best-practices/managing-credentials/">“Databases using R”</a>.</p>
<p>I will focus on two cases:</p>
<ul>
<li>simply prompting for credentials via <a href="https://cran.r-project.org/package=rstudioapi"><code>rstudioapi</code></a>
<ul>
<li>suitable for simple credential management</li>
</ul></li>
<li>storing sets of encrypted credentials in a local file via the R <a href="https://cran.r-project.org/package=keyring"><code>keyring</code></a> package
<ul>
<li>suitable for more complicated credential management</li>
</ul></li>
</ul>
<h3 id="prompting-for-a-username-and-password">Prompting for a username and password</h3>
<p>If an R script requires only one set of credentials and those credentials are easy to remember, it may be easiest to prompt the user for them using <code>rstudioapi</code>. A typical example would be prompting users for their username and password to access a corporate database:</p>
<pre><code>username <- rstudioapi::askForPassword("Database username")
password <- rstudioapi::askForPassword("Database password")
</code></pre>
<p>This method may also be convenient if the user’s credentials tend to change over time.</p>
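<p>The two prompts can be wrapped in a small helper so that a cancelled or empty entry fails early. This is only a sketch: <code>prompt_credentials()</code> and its <code>ask</code> argument are hypothetical names, not part of <code>rstudioapi</code>, and treating a <code>NULL</code> or empty return value as "nothing entered" is an assumption.</p>
<pre><code># Hypothetical helper: collect a username/password pair through a prompt function.
# `ask` defaults to rstudioapi::askForPassword, as in the example above, but any
# function that takes a prompt string and returns a string can be supplied.
prompt_credentials <- function(service, ask = rstudioapi::askForPassword) {
  # Assumption: NULL or "" means the user entered nothing
  is_missing <- function(x) is.null(x) || !nzchar(x)
  username <- ask(paste(service, "username"))
  password <- ask(paste(service, "password"))
  if (is_missing(username) || is_missing(password)) {
    stop("Credentials were not provided for ", service)
  }
  list(username = username, password = password)
}
# Interactive use inside RStudio:
# creds <- prompt_credentials("Database")
</code></pre>
<p>Passing the prompt function as an argument also makes the helper easy to exercise outside RStudio, for example with a stub that returns fixed strings.</p>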
<h3 id="the-r-keyring-package">The R Keyring package</h3>
<p>A more sophisticated option is to use the R <code>keyring</code> package to store and access encrypted credentials locally. This might be more suitable if multiple credentials exist to access a variety of services (think multiple access tokens). With <code>keyring</code>, one password unlocks the keyring which then provides access to all the credentials.</p>
<p>To use the <code>keyring</code> package, a user only needs to install and load the package<sup class="footnote-ref" id="fnref:2"><a href="#fn:2">2</a></sup> and define three strings: the keyring name, a keyring service, and the username that we want to associate our secret credentials with.</p>
<p>The following example shows how to create a keyring named <code>my_keyring</code>, with credentials to access <code>my_database</code> as <code>my_username</code>. We first create a <code>backend_file</code> type of keyring which will store the encrypted credentials in the user’s home directory (<code>~/.config/r-keyring</code>). With <code>keyring_create</code>, we prompt for the password that will unlock the keyring. Finally, we store a credential in the keyring with <code>set</code> before locking it with <code>keyring_lock</code>.</p>
<pre><code>library(keyring)
# Set variables to be used in keyring.
kr_name <- "my_keyring"
kr_service <- "my_database"
kr_username <- "my_username"
# Create a keyring and add an entry using the variables above
kb <- keyring::backend_file$new()
# Prompt for the keyring password, used to unlock keyring
kb$keyring_create(kr_name)
# Prompt for the credential to be stored in the keyring
kb$set(kr_service, username=kr_username, keyring=kr_name)
# Lock the keyring
kb$keyring_lock(kr_name)
# The encrypted keyring file is now stored at ~/.config/r-keyring/ and can be
# accessed by any R program that provides the keyring password
</code></pre>
<p>We can store credentials for multiple usernames per service, and multiple services per keyring. This is ideal in the case of an application that must access a variety of services via access tokens. The encrypted credentials file can either be published with the code, or perhaps for extra security, distributed via a separate channel.</p>
<h2 id="retrieving-credentials">Retrieving credentials</h2>
<p>To retrieve credentials, set the same three variables and use the <code>keyring</code> <code>get()</code> function, which will prompt us for the keyring password that we set when we called <code>keyring_create</code>. A retrieval script might look like this:</p>
<pre><code>library(keyring)
library(DBI)
# Set variables to be used in keyring.
kr_name <- "my_keyring"
kr_service <- "my_database"
kr_username <- "my_username"
# Output the stored password: normally you would not want to do this
keyring::backend_file$new()$get(service = kr_service,
                                username = kr_username,
                                keyring = kr_name)
# Establish connection to Teradata retrieving the password from the keyring.
dbConnect(drv = odbc::odbc(),
          dsn = "my_dsn", # set DSN options in ~/.odbc.ini
          pwd = keyring::backend_file$new()$get(service = kr_service,
                                                username = kr_username,
                                                keyring = kr_name))
</code></pre>
<p>With this, we are able to retrieve arbitrary credentials for a particular username and service, allowing us to manage much more complicated sets of credentials with a single password.</p>
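<p>When several credentials are needed, the lookups can be driven from a small table. The sketch below assumes only that the backend object exposes <code>get(service, username, keyring)</code>, as the <code>keyring</code> backends do. The helper <code>get_credentials()</code> and the extra entry (<code>my_api</code>, <code>my_token_user</code>) are hypothetical and used purely for illustration.</p>
<pre><code># Hypothetical helper: fetch several credentials through one backend object.
# `entries` is a data frame with one row per credential (columns: service, username).
get_credentials <- function(backend, keyring, entries) {
  mapply(
    function(service, username) {
      backend$get(service = service, username = username, keyring = keyring)
    },
    entries$service, entries$username,
    USE.NAMES = FALSE
  )
}

entries <- data.frame(
  service  = c("my_database", "my_api"),            # "my_api" is made up
  username = c("my_username", "my_token_user"),     # so is "my_token_user"
  stringsAsFactors = FALSE
)
# With a real keyring (prompts for the keyring password):
# secrets <- get_credentials(keyring::backend_file$new(), "my_keyring", entries)
</code></pre>
<p>The result is a character vector in the same order as the rows of <code>entries</code>; as with any secret, avoid printing it or saving it to disk.</p>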
<p>So, which is the best way to ensure that plain text credentials are not published with code? If your code relies on a limited number of credentials, an interactive prompt may be the more suitable choice: code users know what their username and password are and can easily enter them interactively.</p>
<p>If the code requires multiple, hard-to-remember, or cumbersome-to-provide credentials, you might want to consider using keyrings. Users will only need to provide one password, which will unlock the keyring and provide access to all credentials.</p>
<div class="footnotes">
<hr />
<ol>
<li id="fn:3"><a href="https://db.rstudio.com/best-practices/managing-credentials/">“Databases using R”</a> from RStudio
<a class="footnote-return" href="#fnref:3">↩</a></li>
<li id="fn:2">The r-keyring package is automatically installed and available in <a href="https://github.com/ras44/rstudio-project">rstudio-project</a>.
<a class="footnote-return" href="#fnref:2">↩</a></li>
</ol>
</div>
The reticulate package solves the hardest problem in data science: people
https://rviews.rstudio.com/2019/03/18/the-reticulate-package-solves-the-hardest-problem-in-data-science-people/
Mon, 18 Mar 2019 00:00:00 +0000https://rviews.rstudio.com/2019/03/18/the-reticulate-package-solves-the-hardest-problem-in-data-science-people/
<p><em>Andrew Mangano is the Director of eCommerce Analytics at Albertsons Companies.</em></p>
<p>Part I - Modelling</p>
<p>The <code>reticulate</code> package integrates Python within R and, when used with RStudio 1.2, brings the two languages together like never before. Much more important than the technical details of how it all works is the impact that it has on both individuals and teams by enabling data scientists who speak different languages to collaborate seamlessly on a project.</p>
<p>A data scientist is first and foremost a problem solver. The ability to frame a problem and decide how it might be solved is what separates someone who merely knows code syntax from someone who is capable of discovering a novel solution to a hard problem. Despite all of the buzz around the field, however, there is a major skills gap: the available talent is limited.</p>
<p>When I meet someone who shares an analytic mindset and a passion for data science, it is exciting. Unfortunately, when it comes to collaboration, the Python/R language gap among practitioners yields an inefficient separation. Python and R did not easily mix previously. Teams would spend valuable analysis time translating and re-coding, or worse, dividing analysts into groups around one language. I recently heard a recruiter say: “if you program in Python, then you should apply to this team and if you program in R, then you should apply to that team.” How absurd it is to throw the problem solver out of the equation and limit team-building to a language!</p>
<p>In this example, I highlight how the <code>reticulate</code> package might be used for an integrated analysis. While simple, it highlights three different types of models: native R (<code>xgboost</code>), ‘native’ R with Python backend (<code>TensorFlow</code>), and a native Python model (<code>lightgbm</code>) run in-line with R code, in which data is passed seamlessly to and from Python.</p>
<p>In order to provide an open-source reproducible example, we’ll use the <code>BreastCancer</code> data set from the <code>mlbench</code> package. Our task is binary classification to predict the class as ‘benign’ or ‘malignant.’</p>
<pre class="r"><code>library(mlbench) #provides the data set
data("BreastCancer")</code></pre>
<p>Many machine learning models require the data frame to be represented as a numeric matrix. Using <code>sapply()</code>, we convert the data frame to a numeric matrix. To make the example simpler, we remove incomplete observations via <code>complete.cases()</code> and remove the Id column. Converting the ‘Class’ column to numeric yields the values 1 and 2 instead of 0 and 1, which must be corrected.</p>
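<p>The 1-and-2 coding arises because R stores a factor as integer codes that start at 1, in level order. A minimal illustration on a toy factor (not the actual data set):</p>
<pre class="r"><code># A factor is stored as integer codes starting at 1, in level order
class_factor <- factor(c("benign", "malignant", "benign"),
                       levels = c("benign", "malignant"))
codes <- as.numeric(class_factor)  # 1 2 1
# Subtracting 1 gives the 0/1 target, with 1 meaning 'malignant'
target <- codes - 1                # 0 1 0</code></pre>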
<p>The matrix is now ready for a 70% / 30% split for training and testing data sets. The initial training and testing sets are in their native dimensions. One of the model frameworks that we plan to use requires scaled data, which is achieved by using the mean and standard deviation in the scale function.</p>
<pre class="r"><code>#convert to numeric for models and remove na values for this example
model_set <- sapply(BreastCancer[complete.cases(BreastCancer),-1], as.numeric)
#format target variable as 0, 1 instead of 1,2
model_set[,10]<-model_set[,10]-1
#Split into test and train sets
indices <- sample(1:nrow(model_set), size = 0.7 * nrow(model_set))
#Target variables
target<-unlist(model_set[indices,10])
test_target<-unlist(model_set[-indices,10])
#create unscaled data set for boosted tree models
unscale_train<-as.matrix(model_set[indices,-10])
unscale_test<-as.matrix(model_set[-indices,-10 ])
#create normalized data set for neural network
mean <- apply(model_set[indices,-10], 2, mean)
std <- apply(model_set[indices,-10], 2, sd)
train <- scale(model_set[indices,-10], center = mean, scale = std)
test <- scale(model_set[-indices,-10], center = mean, scale = std)</code></pre>
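<p>The scaling step can be checked on toy data. The sketch below uses simulated values and an arbitrary seed, both assumptions made for illustration. It shows that the training set ends up with mean 0 and standard deviation 1 per column, while the test set, scaled with the <em>training</em> statistics, is only approximately centered.</p>
<pre class="r"><code>set.seed(42)  # arbitrary seed so the split is reproducible
x <- matrix(rnorm(200, mean = 5, sd = 2), ncol = 2)
idx <- sample(seq_len(nrow(x)), size = floor(0.7 * nrow(x)))
# Center and scale are estimated on the training rows only
mu <- colMeans(x[idx, ])
sigma <- apply(x[idx, ], 2, sd)
train_scaled <- scale(x[idx, ], center = mu, scale = sigma)
# The test rows reuse the training statistics, so no test information leaks in
test_scaled <- scale(x[-idx, ], center = mu, scale = sigma)
colMeans(train_scaled)      # zero, up to floating-point error
apply(train_scaled, 2, sd)  # one, up to floating-point error
colMeans(test_scaled)       # near zero, but not exactly</code></pre>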
<p>Now that we have a training and testing data set, we can train models.</p>
<p>The first model is a native R package, <code>xgboost</code>, short for ‘extreme gradient boosting’. This library can be installed via a simple call of <code>install.packages('xgboost')</code>, and does not require any additional software. The objective function for our classification problem is ‘binary:logistic’, and the evaluation metric is ‘auc’ for ‘area under the curve’ in an ROC framework.</p>
<pre class="r"><code>library(xgboost)
boost_model<-xgboost(data = unscale_train,label=target,booster="gbtree", nfold = 2,nrounds = 25, verbose = FALSE, objective = "binary:logistic", eval_metric = "auc", nthread = 4)</code></pre>
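<p>To make the ‘auc’ metric concrete, here is a base R sketch that computes the area under the ROC curve from ranks (the Mann-Whitney formulation). The function name <code>auc_rank()</code> and the toy scores are illustrative assumptions; this is not how <code>xgboost</code> computes the metric internally.</p>
<pre class="r"><code># AUC = probability that a random positive is scored above a random negative
auc_rank <- function(scores, labels) {
  # labels: 1 = positive, 0 = negative
  r <- rank(scores)  # average ranks handle tied scores
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- c(1, 1, 1, 0, 0, 0)
# 8 of the 9 positive/negative pairs are ordered correctly
auc_rank(scores, labels)  # 8/9</code></pre>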
<p>The next model is a “native” R package, <code>TensorFlow</code> in R using Keras. Keras is a common interface for TensorFlow, which makes it easier to build certain models. Unlike the previous package, there are extra installation steps for this package beyond <code>install.packages('keras')</code>. Once the library is installed, another step is required via <code>install_keras()</code>. TensorFlow in R uses a Python backend, which is why additional setup is needed. Whatever the underlying technical details, most users will likely not even notice them because the coding is done entirely in R.</p>
<p>The structure and details of this example model are similar to the MNIST example on the <a href="https://tensorflow.rstudio.com/keras/">RStudio Keras page</a>.</p>
<pre class="r"><code>library(keras)
y_target<-to_categorical(target,2)
tf_nn <- keras_model_sequential() %>%
  layer_dense(units = 12,
              activation = 'relu',
              input_shape = dim(train)[[2]]) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 12,
              activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 2,
              activation = 'softmax')
tf_nn %>% compile(
  optimizer = optimizer_rmsprop(),
  loss = "categorical_crossentropy",
  metrics = c("accuracy")
)
history <- tf_nn %>% fit(
  x = train,
  y = y_target,
  epochs = 7,
  batch_size = 12
)</code></pre>
<p>(<em>Note that the output has been truncated for publication.</em>) <img src="/post/2019-03-11-mangano_files/man1.png" height = "600" width= 100%></p>
<p>The last model to be tested is entirely outside of the R ecosystem. The goal is to take the data we have been using in R, pass it to Python, train a model, then pass the results back into R.</p>
<p>As an example, we will use the Python <code>LightGBM</code> package.</p>
<p>To begin, load the reticulate package.</p>
<pre class="r"><code>library(reticulate)</code></pre>
<p>The next code chunk is written entirely in Python. RStudio 1.2 allows chunks of Python code to be run in the same notebook as R code. Notice that the beginning of the chunk is not <code>{r}</code>, but instead <code>{python}</code>.</p>
<p>Data is passed to Python through <code>r.</code> commands. In this code chunk, the model tuning parameters are saved in <code>params</code> and passed to the <code>lgb.train</code> function. The data from R is passed in as <code>r.unscale_train</code>, <code>r.target</code>, and <code>r.unscale_test</code>. This is the same data used in the <code>xgboost</code> model.</p>
<pre class="python"><code>import pandas as pd
import lightgbm as lgb
params = {
'boosting_type': 'rf',#or can use 'gbdt',#'dart',
'objective': 'binary',
'metric': {'auc'},
'num_leaves': 10,
'max_depth': 8,
'feature_fraction': 0.9,
'bagging_fraction': 0.95,
'bagging_freq': 10,
'learning_rate': 0.003,
'nthreads': 1,
'nrounds':10,
'min_data': 10
}
lgtrain = lgb.Dataset(r.unscale_train, label=r.target)
model = lgb.train(params, lgtrain, 200)
light_gbm_test = model.predict(r.unscale_test)</code></pre>
<p>(<em>Note that the output has been truncated for publication.</em>) <img src="/post/2019-03-11-mangano_files/man2.png" height = "600" width= 100%></p>
<p>Once the model is trained in Python, it is possible to pass the data back to R using the <code>py$</code> command. In this chunk, which is back in R code, the test set predictions are passed to a data frame in R to compare the performance against the other models.</p>
<pre class="r"><code>lgb_pred<-py$light_gbm_test</code></pre>
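<p>The same round trip can be tried outside a notebook. This sketch assumes that <code>reticulate</code> and a Python installation are available; the variable names are illustrative. It assigns an R vector into the Python session through <code>py$</code>, runs a line of Python with <code>py_run_string()</code>, and reads the result back.</p>
<pre class="r"><code>library(reticulate)
# Assign an R vector into the Python main module (it arrives as a Python list)
py$x <- c(1, 2, 3)
# Run Python code that uses the value
py_run_string("doubled = [v * 2 for v in x]")
# Read the result back; the Python list of floats is converted to an R numeric vector
doubled <- py$doubled
doubled</code></pre>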
<p>This last code chunk creates probability and binary predictions for the <code>xgboost</code> and <code>TensorFlow</code> (neural net) models, and creates a binary prediction for the <code>LightGBM</code> model. Using the binary predictions, we then create basic confusion matrices to compare the model predictions on the test data set.</p>
<pre class="r"><code>xgbpred <- predict (boost_model,unscale_test)
xgbpred_binary <- ifelse (xgbpred > 0.5,1,0)
nn_pred <- tf_nn %>% predict_classes(x=test)
nnpred_prob <- tf_nn %>% predict(x=test)
lgb_pred_binary <- ifelse (lgb_pred > 0.5,1,0)
#XGBoost - Native R
data.frame(pred=xgbpred_binary,act=test_target)%>%table()</code></pre>
<p><img src="/post/2019-03-11-mangano_files/man3.png" height = "600" width= 100%></p>
<pre class="r"><code>#TensorFlow - 'Native' R
data.frame(pred=nn_pred,act=test_target)%>%table()</code></pre>
<p><img src="/post/2019-03-11-mangano_files/man4.png" height = "600" width= 100%></p>
<pre class="r"><code>#LightGBM - Python
data.frame(pred=lgb_pred_binary,act=test_target)%>%table()</code></pre>
<p><img src="/post/2019-03-11-mangano_files/man5.png" height = "600" width= 100%></p>
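<p>For readers who want a single number alongside each confusion matrix, accuracy can be read off the diagonal. The probabilities and labels below are made up for illustration; they are not the results from the models above.</p>
<pre class="r"><code>prob   <- c(0.92, 0.15, 0.60, 0.40, 0.08, 0.75)  # toy predicted probabilities
actual <- c(1, 0, 0, 1, 0, 1)                    # toy true classes
pred <- ifelse(prob > 0.5, 1, 0)
# Fixing the factor levels keeps the table 2 x 2 even if a class is never predicted
cm <- table(pred = factor(pred, levels = c(0, 1)),
            act  = factor(actual, levels = c(0, 1)))
# Correct predictions sit on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
cm
accuracy  # 4 of 6 correct</code></pre>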
<p>In the first part of this example, I showed how R and Python can be used together in a single notebook for a classification problem. The simplicity with which data can be passed allows for streamlined integration between the two languages.</p>
<p>In Part II, I will show visualization features of the <code>reticulate</code> package and RStudio 1.2.</p>
<p>For more information on the algorithms used in this post, please explore these resources.</p>
<p>XGBoost - <a href="https://xgboost.readthedocs.io" class="uri">https://xgboost.readthedocs.io</a></p>
<p>Python version 3.7 - <a href="https://www.anaconda.com/" class="uri">https://www.anaconda.com/</a></p>
<p>Keras TensorFlow for R - <a href="https://tensorflow.rstudio.com/keras/" class="uri">https://tensorflow.rstudio.com/keras/</a></p>
<p>Microsoft LightGBM - <a href="https://lightgbm.readthedocs.io/en/latest/" class="uri">https://lightgbm.readthedocs.io/en/latest/</a></p>
Parsnipping Fama French
https://rviews.rstudio.com/2019/03/14/parsnipping-fama-french/
Thu, 14 Mar 2019 00:00:00 +0000https://rviews.rstudio.com/2019/03/14/parsnipping-fama-french/
<script src="/rmarkdown-libs/htmlwidgets/htmlwidgets.js"></script>
<script src="/rmarkdown-libs/jquery/jquery.min.js"></script>
<script src="/rmarkdown-libs/proj4js/proj4.js"></script>
<link href="/rmarkdown-libs/highcharts/css/motion.css" rel="stylesheet" />
<script src="/rmarkdown-libs/highcharts/highcharts.js"></script>
<script src="/rmarkdown-libs/highcharts/highcharts-3d.js"></script>
<script src="/rmarkdown-libs/highcharts/highcharts-more.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/stock.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/heatmap.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/treemap.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/annotations.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/boost.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/data.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/drag-panes.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/drilldown.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/funnel.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/item-series.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/offline-exporting.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/overlapping-datalabels.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/parallel-coordinates.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/sankey.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/solid-gauge.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/streamgraph.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/sunburst.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/vector.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/wordcloud.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/xrange.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/exporting.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/export-data.js"></script>
<script src="/rmarkdown-libs/highcharts/maps/modules/map.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/grouped-categories.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/motion.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/multicolor_series.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/reset.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/symbols-extra.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/text-symbols.js"></script>
<script src="/rmarkdown-libs/highchart-binding/highchart.js"></script>
<p>Today, we will continue our exploration of developments in the world of <a href="https://github.com/tidymodels">tidy models</a>, and we will stick with our usual Fama French modeling flow to do so. For new readers who want to get familiar with Fama French before diving into this post, see <a href="https://rviews.rstudio.com/2018/04/11/introduction-to-fama-french/">here</a> where we covered importing and wrangling the data, <a href="https://rviews.rstudio.com/2018/05/10/rolling-fama-french/">here</a> where we covered rolling models and visualization, and <a href="https://rviews.rstudio.com/2018/11/19/many-factor-models/">here</a> where we covered managing many models. If you’re into Shiny, <a href="http://www.reproduciblefinance.com/shiny/fama-french-three-factor/">this flexdashboard</a> might be of interest, as well.</p>
<p>Let’s get to it.</p>
<p>First, we need our data. As usual, we’ll import daily prices for five ETFs, convert them to log returns (have a look <a href="http://www.reproduciblefinance.com/2017/09/25/asset-prices-to-log-returns/">here</a> for a refresher on that code flow), then import the Fama French five-factor data and join it to our ETF returns. Here’s the code to make that happen (it was covered in detail in <a href="http://www.reproduciblefinance.com/2018/06/07/fama-french-write-up-part-one/">this post</a>):</p>
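<pre class="r"><code># Packages used in this data import and wrangling flow
library(tidyverse)
library(lubridate)
library(quantmod)
library(timetk)
symbols <- c("SPY", "EFA", "IJS", "EEM", "AGG")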
# The prices object will hold our daily price data.
prices <-
getSymbols(symbols,
src = 'yahoo',
from = "2012-12-31",
to = "2017-12-31",
auto.assign = TRUE,
warnings = FALSE) %>%
map(~Ad(get(.))) %>%
reduce(merge) %>%
`colnames<-`(symbols)
asset_returns_long <-
prices %>%
tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
gather(asset, prices, -date) %>%
group_by(asset) %>%
mutate(daily_returns = (log(prices) - log(lag(prices)))) %>%
na.omit()
factors_data_address <-
"http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/Global_5_Factors_Daily_CSV.zip"
factors_csv_name <- "Global_5_Factors_Daily.csv"
temp <- tempfile()
download.file(
# location of file to be downloaded
factors_data_address,
# where we want R to store that file
temp,
quiet = TRUE)
Global_5_Factors <-
read_csv(unz(temp, factors_csv_name), skip = 6 ) %>%
rename(date = X1, MKT = `Mkt-RF`) %>%
mutate(date = ymd(parse_date_time(date, "%Y%m%d")))%>%
mutate_if(is.numeric, funs(. / 100)) %>%
select(-RF)
data_joined_tidy <-
asset_returns_long %>%
left_join(Global_5_Factors, by = "date") %>%
na.omit()</code></pre>
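<p>As a quick check on the returns calculation above, here is a minimal sketch with made-up prices (the vector <code>toy_prices</code> is hypothetical, not data from this post). It shows that <code>log(prices) - log(lag(prices))</code> is the same calculation as <code>diff(log(prices))</code> in base R, and that log returns add up over time.</p>
<pre class="r"><code># Hypothetical prices, for illustration only
toy_prices <- c(100, 102, 101, 105)

# Log return for each period: log(p_t) - log(p_{t-1})
toy_log_returns <- diff(log(toy_prices))

# Log returns are additive: their sum is the log return over the whole window
total_log_return <- sum(toy_log_returns)
all.equal(total_log_return, log(105 / 100))</code></pre>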
<p>For today, let’s work with just the <code>SPY</code> data by filtering our data set by asset.</p>
<pre class="r"><code>spy_2013_2017 <- data_joined_tidy %>%
filter(asset == "SPY")</code></pre>
<p>Next, we re-sample this five years’ worth of data into smaller subsets of training and testing sets. This is frequently done by k-fold cross validation (see <a href="http://www.reproduciblefinance.com/2019/03/13/rsampling-fama-french/">here</a> for an example), where random samples are taken from the data, but since we are working with time series, we will use a time-aware technique. The <a href="https://cran.r-project.org/package=rsample"><code>rsample</code></a> package has a function for exactly this purpose, the <code>rolling_origin()</code> function. We covered this process extensively in this <a href="http://www.reproduciblefinance.com/2019/03/14/rolling-origin-fama-french/">previous post</a>. Here’s the code to make it happen.</p>
<pre class="r"><code>rolling_origin_spy_2013_2017 <-
rolling_origin(
data = spy_2013_2017,
initial = 100,
assess = 1,
cumulative = FALSE
)
rolling_origin_spy_2013_2017 %>%
dim()</code></pre>
<pre><code>[1] 1159 2</code></pre>
<p>We now have a data object called <code>rolling_origin_spy_2013_2017</code> that holds 1159 <code>splits</code> of data. Each split consists of an analysis data set with 100 days of return and factor data, and an assessment data set with one day of return and factor data.</p>
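<p>To make the split structure concrete, here is a small sketch on a hypothetical ten-row data frame (<code>toy_data</code> and its columns are made up for illustration). With <code>initial = 3</code>, <code>assess = 1</code>, and <code>cumulative = FALSE</code>, each analysis window holds three consecutive rows, followed by a one-row assessment set, and the window slides forward one row at a time.</p>
<pre class="r"><code>library(rsample)

# Hypothetical data: ten consecutive "days"
toy_data <- data.frame(day = 1:10, value = seq(0.1, 1, by = 0.1))

toy_splits <- rolling_origin(
  data = toy_data,
  initial = 3,
  assess = 1,
  cumulative = FALSE
)

# Number of splits: 10 rows - 3 initial - 1 assess + 1
nrow(toy_splits)

# Rows used to fit and to test in the second split
analysis(toy_splits$splits[[2]])$day
assessment(toy_splits$splits[[2]])$day</code></pre>
<p>The second split fits on days 2 through 4 and assesses on day 5, which is the same sliding pattern our 100-day windows follow.</p>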
<p>Now, we can start using that collection of data splits to fit a model on the analysis data, and then test our model on the assessment data. That means it’s time to introduce a relatively new addition to the R tool chain, the <a href="https://cran.r-project.org/package=parsnip"><code>parsnip</code></a> package.</p>
<p><code>parsnip</code> is a unified model interface that allows us to create a model specification, set an analytic engine, and then fit a model. It’s a ‘unified’ interface in the sense that we can use the same scaffolding but insert different models, or different engines, or different modes. Let’s see how that works with linear regression.</p>
<p>Recall that <a href="http://www.reproduciblefinance.com/2019/03/14/rolling-origin-fama-french/">in the previous post</a>, we piped our data into a linear model like so:</p>
<pre class="r"><code>analysis(rolling_origin_spy_2013_2017$splits[[1]]) %>%
do(model = lm(daily_returns ~ MKT + SMB + HML + RMW + CMA,
data = .)) %>%
tidy(model)</code></pre>
<pre><code># A tibble: 6 x 6
# Groups: asset [1]
asset term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 SPY (Intercept) 0.000579 0.000338 1.71 8.98e- 2
2 SPY MKT 0.909 0.0739 12.3 2.79e-21
3 SPY SMB -0.495 0.112 -4.43 2.52e- 5
4 SPY HML -0.609 0.208 -2.92 4.38e- 3
5 SPY RMW -0.591 0.259 -2.28 2.47e- 2
6 SPY CMA -0.395 0.206 -1.92 5.81e- 2</code></pre>
<p>Now, we will pipe into the <code>parsnip</code> scaffolding, which will allow us to quickly change to a different model and specification further down in the code.</p>
<p>Since we are running a linear regression, we first create a specification with <code>linear_reg()</code>, then set the engine with <code>set_engine("lm")</code>, and finally fit the model with <code>fit()</code>, passing it the five-factor formula and the analysis data from one of our splits.</p>
<pre class="r"><code>lm_model <-
linear_reg() %>%
set_engine("lm") %>%
fit(daily_returns ~ MKT + SMB + HML + RMW + CMA,
data = analysis(rolling_origin_spy_2013_2017$splits[[1]]))
lm_model </code></pre>
<pre><code>parsnip model object
Call:
stats::lm(formula = formula, data = data)
Coefficients:
(Intercept) MKT SMB HML RMW
0.0005794 0.9086303 -0.4951297 -0.6085088 -0.5910375
CMA
-0.3954515 </code></pre>
<p>Now that we’ve fit the model on our analysis set, let’s see how well it predicts the assessment set. We can use the <code>predict()</code> function and pass it the results of our <code>parsnip</code> code flow, along with the <code>assessment</code> split.</p>
<pre class="r"><code>assessment(rolling_origin_spy_2013_2017$splits[[1]]) %>%
select(daily_returns) %>%
bind_cols(predict(lm_model,
new_data = assessment(rolling_origin_spy_2013_2017$splits[[1]])))</code></pre>
<pre><code># A tibble: 1 x 3
# Groups: asset [1]
asset returns .pred
<chr> <dbl> <dbl>
1 SPY 148. 0.00737</code></pre>
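<p>The same fit-then-predict pattern can be tried on a built-in data set. The sketch below is an illustration only: it uses <code>mtcars</code> rather than our Fama French data, and the 25/7 row split and the object names are arbitrary assumptions. <code>predict()</code> on a <code>parsnip</code> fit returns a tibble with a <code>.pred</code> column, which is why <code>bind_cols()</code> lines up cleanly with the observed values.</p>
<pre class="r"><code>library(dplyr)
library(parsnip)

# Arbitrary split of a built-in data set, for illustration only
toy_train <- mtcars[1:25, ]
toy_test <- mtcars[26:32, ]

# Specification, engine, and fit, as in the flow above
toy_fit <-
  linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt + hp, data = toy_train)

# Observed values side by side with predictions
toy_preds <-
  toy_test %>%
  select(mpg) %>%
  bind_cols(predict(toy_fit, new_data = toy_test))

toy_preds</code></pre>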
<p>That worked well, but now let’s head to a more complex model and use the <a href="https://cran.r-project.org/web/packages/ranger/ranger.pdf"><code>ranger</code></a> package as an engine for a random forest analysis.</p>
<p>To set up the ranger random forest model in <code>parsnip</code>, we first use <code>rand_forest(mode = "regression", mtry = 3, trees = 100)</code> to create the specification, <code>set_engine("ranger")</code> to set the engine as the <code>ranger</code> package, and <code>fit(daily_returns ~ MKT + SMB + HML + RMW + CMA, data = analysis(rolling_origin_spy_2013_2017$splits[[1]]))</code> to fit the five-factor Fama French model to the 100-day sample in our first split.</p>
<pre class="r"><code># Need to load the packages to be used as the random forest engine
library(ranger)
rand_forest(mode = "regression", mtry = 3, trees = 100) %>%
set_engine("ranger") %>%
fit(daily_returns ~ MKT + SMB + HML + RMW + CMA,
data = analysis(rolling_origin_spy_2013_2017$splits[[1]]))</code></pre>
<pre><code>parsnip model object
Ranger result
Call:
ranger::ranger(formula = formula, data = data, mtry = ~3, num.trees = ~100, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1))
Type: Regression
Number of trees: 100
Sample size: 100
Number of independent variables: 5
Mtry: 3
Target node size: 5
Variable importance mode: none
Splitrule: variance
OOB prediction error (MSE): 1.514654e-05
R squared (OOB): 0.6880896 </code></pre>
<p>Notice that <code>ranger</code> gives us an <code>OOB prediction error (MSE)</code> value as part of its return. <code>parsnip</code> returns to us what the underlying engine returns.</p>
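<p>Because the engine’s own object is stored in the <code>fit</code> element of a <code>parsnip</code> model, those values can be pulled out directly. Here is a minimal sketch, assuming <code>mtcars</code> as stand-in data and arbitrary <code>mtry</code> and <code>trees</code> values; <code>prediction.error</code> and <code>r.squared</code> are elements of the object that <code>ranger</code> returns.</p>
<pre class="r"><code>library(magrittr)
library(parsnip)
library(ranger)

# parsnip draws the ranger seed from R's random number generator,
# so setting a seed here makes the fit reproducible
set.seed(123)

toy_rf <-
  rand_forest(mode = "regression", mtry = 2, trees = 50) %>%
  set_engine("ranger") %>%
  fit(mpg ~ wt + hp + disp, data = mtcars)

# The underlying ranger object lives in the fit element
class(toy_rf$fit)

# Out-of-bag MSE and R squared, as printed by ranger
toy_rf$fit$prediction.error
toy_rf$fit$r.squared</code></pre>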
<p>Now, let’s apply that random forest regression to all 1159 of our splits (recall that each split consists of 100 days of training data and one day of test data), so we can get an average RMSE. Warning: this will consume some resources on your machine and some time in your day.</p>
<p>To apply that model to our entire data set, we create a function that takes one split, passes it to our <code>parsnip</code> enabled model, and then uses the <code>predict</code> function to attempt to predict our <code>assessment</code> split. The function also allows us to specify the number of trees and the number of variables randomly sampled at each tree split, which is set with the <code>mtry</code> argument.</p>
<pre class="r"><code>ranger_rf_regress <- function(mtry = 3, trees = 5, split){
analysis_set_rf <- analysis(split)
model <-
rand_forest(mode = "regression", mtry = mtry, trees = trees) %>%
set_engine("ranger") %>%
fit(daily_returns ~ MKT + SMB + HML + RMW + CMA, data = analysis_set_rf)
assessment_set_rf <- assessment(split)
assessment_set_rf %>%
select(date, daily_returns) %>%
mutate(.pred = unlist(predict(model, new_data = assessment_set_rf))) %>%
select(date, daily_returns, .pred)
}</code></pre>
<p>Now we want to pass it our object of 1159 splits, <code>rolling_origin_spy_2013_2017$splits</code>, and we want the function to iterate over each split. For that we turn to <code>map_df()</code> from the <code>purrr</code> package, which allows us to iterate over the data object and return a data frame. <code>map_df()</code> takes the data as an argument and our function as an argument.</p>
<pre class="r"><code>ranger_results <-
map_df(.x = rolling_origin_spy_2013_2017$splits,
~ranger_rf_regress(mtry = 3, trees = 100, split = .x))</code></pre>
<p>Here are the results. We now have 1159 predictions.</p>
<pre class="r"><code>ranger_results %>%
head()</code></pre>
<pre><code># A tibble: 6 x 4
# Groups: asset [1]
asset date daily_returns .pred
<chr> <date> <dbl> <dbl>
1 SPY 2013-05-28 0.00597 0.00583
2 SPY 2013-05-29 -0.00652 -0.00403
3 SPY 2013-05-30 0.00369 0.00658
4 SPY 2013-05-31 -0.0145 -0.0114
5 SPY 2013-06-03 0.00549 0.00119
6 SPY 2013-06-04 -0.00482 0.00202</code></pre>
<p>Notice how the date of each prediction is included since we included it in the <code>select()</code> call in our function. That will come in handy for charting later.</p>
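<p>The <code>map_df()</code> pattern used above can be seen in isolation with a toy function. In this sketch, <code>one_row()</code> is a hypothetical helper that returns a one-row tibble, much as <code>ranger_rf_regress()</code> returns one row per split; <code>map_df()</code> calls it once per element and row-binds the results into a single data frame.</p>
<pre class="r"><code>library(purrr)
library(tibble)

# Hypothetical helper: one input in, one row out
one_row <- function(x, power = 2) {
  tibble(input = x, output = x^power)
}

# Iterate over the inputs and stack the one-row results
toy_results <-
  map_df(.x = 1:3,
         ~ one_row(x = .x, power = 3))

toy_results</code></pre>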
<p>Now, we can use the <code>rmse()</code> function from <code>yardstick</code> to calculate the root mean-squared error for each of our predictions (our test sets had only one observation in them because we were testing on one day, so the RMSE is not a complex calculation here, but it would be the same code pattern if we had a larger test set). We can then find the average RMSE by calling <code>summarise(avg_rmse = mean(.estimate))</code>.</p>
<pre class="r"><code>library(yardstick)
ranger_results %>%
group_by(date) %>%
rmse(daily_returns, .pred) %>%
summarise(avg_rmse = mean(.estimate))</code></pre>
<pre><code># A tibble: 1 x 1
avg_rmse
<dbl>
1 0.00253</code></pre>
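<p>With a single observation in each assessment set, the RMSE reduces to the absolute prediction error. Here is a minimal base R sketch (the helper <code>rmse_manual()</code> is a hypothetical name, not a <code>yardstick</code> function) using the rounded values from the first row of <code>ranger_results</code> shown earlier.</p>
<pre class="r"><code># Root mean-squared error written out by hand
rmse_manual <- function(truth, estimate) {
  sqrt(mean((truth - estimate)^2))
}

# First prediction from the results: observed 0.00597, predicted 0.00583
single_day_rmse <- rmse_manual(truth = 0.00597, estimate = 0.00583)

# With one observation, RMSE is just the absolute error
all.equal(single_day_rmse, abs(0.00597 - 0.00583))</code></pre>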
<p>We have the average RMSE; let’s see if the RMSE was stable over time, first with <code>ggplot</code>.</p>
<pre class="r"><code>ranger_results %>%
group_by(date) %>%
rmse(daily_returns, .pred) %>%
ggplot(aes(x = date, y = .estimate)) +
geom_point(color = "cornflowerblue") +
labs(y = "rmse", x = "", title = "RMSE over time via Ranger RF")</code></pre>
<p><img src="/post/2019-03-13-parsnipping-fama-french_files/figure-html/unnamed-chunk-12-1.png" width="672" /></p>
<p>And with <code>highcharter</code>.</p>
<pre class="r"><code>ranger_results %>%
group_by(date) %>%
rmse(daily_returns, .pred) %>%
hchart(., hcaes(x = date, y = .estimate),
type = "point") %>%
hc_title(text = "RMSE over time via Ranger RF") %>%
hc_yAxis(title = list(text = "RMSE"))</code></pre>
<div id="htmlwidget-1" style="width:100%;height:500px;" class="highchart html-widget"></div>
<script type="application/json" data-for="htmlwidget-1">{"x":{"hc_opts":{"title":{"text":"RMSE over time via Ranger RF"},"yAxis":{"title":{"text":"RMSE"},"type":"linear"},"credits":{"enabled":false},"exporting":{"enabled":false},"plotOptions":{"series":{"label":{"enabled":false},"turboThreshold":0,"showInLegend":false},"treemap":{"layoutAlgorithm":"squarified"},"scatter":{"marker":{"symbol":"circle"}}},"series":[{"group":"group","data":[{"date":"2013-05-28",".metric":"rmse",".estimator":"standard",".estimate":0.000142752717804637,"x":1369699200000,"y":0.000142752717804637},{"date":"2013-05-29",".metric":"rmse",".estimator":"standard",".estimate":0.00248144590239636,"x":1369785600000,"y":0.00248144590239636},{"date":"2013-05-30",".metric":"rmse",".estimator":"standard",".estimate":0.00289139762321627,"x":1369872000000,"y":0.00289139762321627},{"date":"2013-05-31",".metric":"rmse",".estimator":"standard",".estimate":0.00309703642561825,"x":1369958400000,"y":0.00309703642561825},{"date":"2013-06-03",".metric":"rmse",".estimator":"standard",".estimate":0.00430144561084509,"x":1370217600000,"y":0.00430144561084509},{"date":"2013-06-04",".metric":"rmse",".estimator":"standard",".estimate":0.00684300611337766,"x":1370304000000,"y":0.00684300611337766},{"date":"2013-06-05",".metric":"rmse",".estimator":"standard",".estimate":0.00291214523819374,"x":1370390400000,"y":0.00291214523819374},{"date":"2013-06-06",".metric":"rmse",".estimator":"standard",".estimate":0.00279000747041504,"x":1370476800000,"y":0.00279000747041504},{"date":"2013-06-07",".metric":"rmse",".estimator":"standard",".estimate":0.0041295175155098,"x":1370563200000,"y":0.0041295175155098},{"date":"2013-06-10",".metric":"rmse",".estimator":"standard",".estimate":0.00123362056195843,"x":1370822400000,"y":0.00123362056195843},{"date":"2013-06-11",".metric":"rmse",".estimator":"standard",".estimate":0.000260231607059154,"x":1370908800000,"y":0.000260231607059154},{"date":"2013-06-12",".metric":"rmse",".estimator"
:"standard",".estimate":0.00439769098328566,"x":1370995200000,"y":0.00439769098328566},{"date":"2013-06-13",".metric":"rmse",".estimator":"standard",".estimate":0.00862505756614801,"x":1371081600000,"y":0.00862505756614801},{"date":"2013-06-14",".metric":"rmse",".estimator":"standard",".estimate":0.00627871135416806,"x":1371168000000,"y":0.00627871135416806},{"date":"2013-06-17",".metric":"rmse",".estimator":"standard",".estimate":0.00139361772830066,"x":1371427200000,"y":0.00139361772830066},{"date":"2013-06-18",".metric":"rmse",".estimator":"standard",".estimate":0.00129999403832123,"x":1371513600000,"y":0.00129999403832123},{"date":"2013-06-19",".metric":"rmse",".estimator":"standard",".estimate":0.00549970257328453,"x":1371600000000,"y":0.00549970257328453},{"date":"2013-06-20",".metric":"rmse",".estimator":"standard",".estimate":0.010067877171231,"x":1371686400000,"y":0.010067877171231},{"date":"2013-06-21",".metric":"rmse",".estimator":"standard",".estimate":0.00757847732215951,"x":1371772800000,"y":0.00757847732215951},{"date":"2013-06-24",".metric":"rmse",".estimator":"standard",".estimate":0.00344560021960934,"x":1372032000000,"y":0.00344560021960934},{"date":"2013-06-25",".metric":"rmse",".estimator":"standard",".estimate":0.00111555171287498,"x":1372118400000,"y":0.00111555171287498},{"date":"2013-06-26",".metric":"rmse",".estimator":"standard",".estimate":0.00210720426205369,"x":1372204800000,"y":0.00210720426205369},{"date":"2013-06-27",".metric":"rmse",".estimator":"standard",".estimate":0.00190601194984189,"x":1372291200000,"y":0.00190601194984189},{"date":"2013-06-28",".metric":"rmse",".estimator":"standard",".estimate":0.00323689404809277,"x":1372377600000,"y":0.00323689404809277},{"date":"2013-07-01",".metric":"rmse",".estimator":"standard",".estimate":0.0015246248388892,"x":1372636800000,"y":0.0015246248388892},{"date":"2013-07-02",".metric":"rmse",".estimator":"standard",".estimate":0.00027955683504102,"x":1372723200000,"y":0.00027955683504102},{
"date":"2013-07-03",".metric":"rmse",".estimator":"standard",".estimate":0.00238816133375573,"x":1372809600000,"y":0.00238816133375573},{"date":"2013-07-05",".metric":"rmse",".estimator":"standard",".estimate":0.00990653992577938,"x":1372982400000,"y":0.00990653992577938},{"date":"2013-07-08",".metric":"rmse",".estimator":"standard",".estimate":0.0027759555185396,"x":1373241600000,"y":0.0027759555185396},{"date":"2013-07-09",".metric":"rmse",".estimator":"standard",".estimate":0.00227547985103656,"x":1373328000000,"y":0.00227547985103656},{"date":"2013-07-10",".metric":"rmse",".estimator":"standard",".estimate":0.00340354669550348,"x":1373414400000,"y":0.00340354669550348},{"date":"2013-07-11",".metric":"rmse",".estimator":"standard",".estimate":0.00785244615911465,"x":1373500800000,"y":0.00785244615911465},{"date":"2013-07-12",".metric":"rmse",".estimator":"standard",".estimate":0.00172836514222019,"x":1373587200000,"y":0.00172836514222019},{"date":"2013-07-15",".metric":"rmse",".estimator":"standard",".estimate":0.000141612792237647,"x":1373846400000,"y":0.000141612792237647},{"date":"2013-07-16",".metric":"rmse",".estimator":"standard",".estimate":0.00327137485857389,"x":1373932800000,"y":0.00327137485857389},{"date":"2013-07-17",".metric":"rmse",".estimator":"standard",".estimate":0.00176641332984882,"x":1374019200000,"y":0.00176641332984882},{"date":"2013-07-18",".metric":"rmse",".estimator":"standard",".estimate":0.00154772618154754,"x":1374105600000,"y":0.00154772618154754},{"date":"2013-07-19",".metric":"rmse",".estimator":"standard",".estimate":0.00269690078023818,"x":1374192000000,"y":0.00269690078023818},{"date":"2013-07-22",".metric":"rmse",".estimator":"standard",".estimate":0.00136133061201296,"x":1374451200000,"y":0.00136133061201296},{"date":"2013-07-23",".metric":"rmse",".estimator":"standard",".estimate":0.00308872127573454,"x":1374537600000,"y":0.00308872127573454},{"date":"2013-07-24",".metric":"rmse",".estimator":"standard",".estimate":0.0019501
6821410643,"x":1374624000000,"y":0.00195016821410643},{"date":"2013-07-25",".metric":"rmse",".estimator":"standard",".estimate":0.000918117461552164,"x":1374710400000,"y":0.000918117461552164},{"date":"2013-07-26",".metric":"rmse",".estimator":"standard",".estimate":0.00392416647401538,"x":1374796800000,"y":0.00392416647401538},{"date":"2013-07-29",".metric":"rmse",".estimator":"standard",".estimate":0.00186499539974162,"x":1375056000000,"y":0.00186499539974162},{"date":"2013-07-30",".metric":"rmse",".estimator":"standard",".estimate":0.000704951304307538,"x":1375142400000,"y":0.000704951304307538},{"date":"2013-07-31",".metric":"rmse",".estimator":"standard",".estimate":0.000291087547639628,"x":1375228800000,"y":0.000291087547639628},{"date":"2013-08-01",".metric":"rmse",".estimator":"standard",".estimate":0.00156555762355721,"x":1375315200000,"y":0.00156555762355721},{"date":"2013-08-02",".metric":"rmse",".estimator":"standard",".estimate":0.00342466009642512,"x":1375401600000,"y":0.00342466009642512},{"date":"2013-08-05",".metric":"rmse",".estimator":"standard",".estimate":0.00141234538646521,"x":1375660800000,"y":0.00141234538646521},{"date":"2013-08-06",".metric":"rmse",".estimator":"standard",".estimate":0.00453039357456201,"x":1375747200000,"y":0.00453039357456201},{"date":"2013-08-07",".metric":"rmse",".estimator":"standard",".estimate":0.000453949337123345,"x":1375833600000,"y":0.000453949337123345},{"date":"2013-08-08",".metric":"rmse",".estimator":"standard",".estimate":0.00319715187637578,"x":1375920000000,"y":0.00319715187637578},{"date":"2013-08-09",".metric":"rmse",".estimator":"standard",".estimate":0.00421636388783291,"x":1376006400000,"y":0.00421636388783291},{"date":"2013-08-12",".metric":"rmse",".estimator":"standard",".estimate":0.000582989994890629,"x":1376265600000,"y":0.000582989994890629},{"date":"2013-08-13",".metric":"rmse",".estimator":"standard",".estimate":0.00248043179424258,"x":1376352000000,"y":0.00248043179424258},{"date":"2013-08-1
4",".metric":"rmse",".estimator":"standard",".estimate":0.00309464529048978,"x":1376438400000,"y":0.00309464529048978},{"date":"2013-08-15",".metric":"rmse",".estimator":"standard",".estimate":0.00409494503279893,"x":1376524800000,"y":0.00409494503279893},{"date":"2013-08-16",".metric":"rmse",".estimator":"standard",".estimate":0.00258176487593892,"x":1376611200000,"y":0.00258176487593892},{"date":"2013-08-19",".metric":"rmse",".estimator":"standard",".estimate":0.00117265718611906,"x":1376870400000,"y":0.00117265718611906},{"date":"2013-08-20",".metric":"rmse",".estimator":"standard",".estimate":0.00602758728915038,"x":1376956800000,"y":0.00602758728915038},{"date":"2013-08-21",".metric":"rmse",".estimator":"standard",".estimate":0.000830939445518749,"x":1377043200000,"y":0.000830939445518749},{"date":"2013-08-22",".metric":"rmse",".estimator":"standard",".estimate":0.00168966716232213,"x":1377129600000,"y":0.00168966716232213},{"date":"2013-08-23",".metric":"rmse",".estimator":"standard",".estimate":0.00281591810324628,"x":1377216000000,"y":0.00281591810324628},{"date":"2013-08-26",".metric":"rmse",".estimator":"standard",".estimate":0.00187026973058945,"x":1377475200000,"y":0.00187026973058945},{"date":"2013-08-27",".metric":"rmse",".estimator":"standard",".estimate":0.00291060928304423,"x":1377561600000,"y":0.00291060928304423},{"date":"2013-08-28",".metric":"rmse",".estimator":"standard",".estimate":0.00626027940262622,"x":1377648000000,"y":0.00626027940262622},{"date":"2013-08-29",".metric":"rmse",".estimator":"standard",".estimate":0.00117709798349316,"x":1377734400000,"y":0.00117709798349316},{"date":"2013-08-30",".metric":"rmse",".estimator":"standard",".estimate":0.00270900069061262,"x":1377820800000,"y":0.00270900069061262},{"date":"2013-09-03",".metric":"rmse",".estimator":"standard",".estimate":0.000892207820183886,"x":1378166400000,"y":0.000892207820183886},{"date":"2013-09-04",".metric":"rmse",".estimator":"standard",".estimate":0.00439848539962431,"x
":1378252800000,"y":0.00439848539962431},{"date":"2013-09-05",".metric":"rmse",".estimator":"standard",".estimate":0.000749332096425049,"x":1378339200000,"y":0.000749332096425049},{"date":"2013-09-06",".metric":"rmse",".estimator":"standard",".estimate":0.00404723911777309,"x":1378425600000,"y":0.00404723911777309},{"date":"2013-09-09",".metric":"rmse",".estimator":"standard",".estimate":0.00470142591028648,"x":1378684800000,"y":0.00470142591028648},{"date":"2013-09-10",".metric":"rmse",".estimator":"standard",".estimate":0.000234752371109743,"x":1378771200000,"y":0.000234752371109743},{"date":"2013-09-11",".metric":"rmse",".estimator":"standard",".estimate":0.00285950641404687,"x":1378857600000,"y":0.00285950641404687},{"date":"2013-09-12",".metric":"rmse",".estimator":"standard",".estimate":0.000247670846161255,"x":1378944000000,"y":0.000247670846161255},{"date":"2013-09-13",".metric":"rmse",".estimator":"standard",".estimate":0.0015043158443795,"x":1379030400000,"y":0.0015043158443795},{"date":"2013-09-16",".metric":"rmse",".estimator":"standard",".estimate":0.00115698479726509,"x":1379289600000,"y":0.00115698479726509},{"date":"2013-09-17",".metric":"rmse",".estimator":"standard",".estimate":0.00281728468879361,"x":1379376000000,"y":0.00281728468879361},{"date":"2013-09-18",".metric":"rmse",".estimator":"standard",".estimate":0.00340383993451582,"x":1379462400000,"y":0.00340383993451582},{"date":"2013-09-19",".metric":"rmse",".estimator":"standard",".estimate":0.00572689019624601,"x":1379548800000,"y":0.00572689019624601},{"date":"2013-09-20",".metric":"rmse",".estimator":"standard",".estimate":0.00253135671003318,"x":1379635200000,"y":0.00253135671003318},{"date":"2013-09-23",".metric":"rmse",".estimator":"standard",".estimate":0.000983885632568376,"x":1379894400000,"y":0.000983885632568376},{"date":"2013-09-24",".metric":"rmse",".estimator":"standard",".estimate":0.000394017946609534,"x":1379980800000,"y":0.000394017946609534},{"date":"2013-09-25",".metric":"r
mse",".estimator":"standard",".estimate":0.00276821422041246,"x":1380067200000,"y":0.00276821422041246},{"date":"2013-09-26",".metric":"rmse",".estimator":"standard",".estimate":0.00245223915591046,"x":1380153600000,"y":0.00245223915591046},{"date":"2013-09-27",".metric":"rmse",".estimator":"standard",".estimate":0.00167896835475828,"x":1380240000000,"y":0.00167896835475828},{"date":"2013-09-30",".metric":"rmse",".estimator":"standard",".estimate":0.00302581645549956,"x":1380499200000,"y":0.00302581645549956},{"date":"2013-10-01",".metric":"rmse",".estimator":"standard",".estimate":0.00133532966948982,"x":1380585600000,"y":0.00133532966948982},{"date":"2013-10-02",".metric":"rmse",".estimator":"standard",".estimate":0.000404804026220111,"x":1380672000000,"y":0.000404804026220111},{"date":"2013-10-03",".metric":"rmse",".estimator":"standard",".estimate":0.00614434885800635,"x":1380758400000,"y":0.00614434885800635},{"date":"2013-10-04",".metric":"rmse",".estimator":"standard",".estimate":0.00561509975346613,"x":1380844800000,"y":0.00561509975346613},{"date":"2013-10-07",".metric":"rmse",".estimator":"standard",".estimate":0.00423365708880972,"x":1381104000000,"y":0.00423365708880972},{"date":"2013-10-08",".metric":"rmse",".estimator":"standard",".estimate":0.000778471899940642,"x":1381190400000,"y":0.000778471899940642},{"date":"2013-10-09",".metric":"rmse",".estimator":"standard",".estimate":0.00320215175070333,"x":1381276800000,"y":0.00320215175070333},{"date":"2013-10-10",".metric":"rmse",".estimator":"standard",".estimate":0.0106384753612411,"x":1381363200000,"y":0.0106384753612411},{"date":"2013-10-11",".metric":"rmse",".estimator":"standard",".estimate":0.00171777076192885,"x":1381449600000,"y":0.00171777076192885},{"date":"2013-10-14",".metric":"rmse",".estimator":"standard",".estimate":0.0014617944493349,"x":1381708800000,"y":0.0014617944493349},{"date":"2013-10-15",".metric":"rmse",".estimator":"standard",".estimate":0.00437650246614441,"x":1381795200000,"y"
:0.00437650246614441},{"date":"2013-10-16",".metric":"rmse",".estimator":"standard",".estimate":0.00283682416435875,"x":1381881600000,"y":0.00283682416435875},{"date":"2013-10-17",".metric":"rmse",".estimator":"standard",".estimate":0.000458134362367081,"x":1381968000000,"y":0.000458134362367081},{"date":"2013-10-18",".metric":"rmse",".estimator":"standard",".estimate":0.000488303630816335,"x":1382054400000,"y":0.000488303630816335},{"date":"2013-10-21",".metric":"rmse",".estimator":"standard",".estimate":0.000887043567693242,"x":1382313600000,"y":0.000887043567693242},{"date":"2013-10-22",".metric":"rmse",".estimator":"standard",".estimate":0.00115608253109249,"x":1382400000000,"y":0.00115608253109249},{"date":"2013-10-23",".metric":"rmse",".estimator":"standard",".estimate":0.00235818360568705,"x":1382486400000,"y":0.00235818360568705},{"date":"2013-10-24",".metric":"rmse",".estimator":"standard",".estimate":0.00106066455567277,"x":1382572800000,"y":0.00106066455567277},{"date":"2013-10-25",".metric":"rmse",".estimator":"standard",".estimate":0.00660056619588438,"x":1382659200000,"y":0.00660056619588438},{"date":"2013-10-28",".metric":"rmse",".estimator":"standard",".estimate":0.00253966857103552,"x":1382918400000,"y":0.00253966857103552},{"date":"2013-10-29",".metric":"rmse",".estimator":"standard",".estimate":0.000708643111683541,"x":1383004800000,"y":0.000708643111683541},{"date":"2013-10-30",".metric":"rmse",".estimator":"standard",".estimate":0.00459062122339914,"x":1383091200000,"y":0.00459062122339914},{"date":"2013-10-31",".metric":"rmse",".estimator":"standard",".estimate":0.00242759731143618,"x":1383177600000,"y":0.00242759731143618},{"date":"2013-11-01",".metric":"rmse",".estimator":"standard",".estimate":0.00119101679630836,"x":1383264000000,"y":0.00119101679630836},{"date":"2013-11-04",".metric":"rmse",".estimator":"standard",".estimate":0.00158715544003218,"x":1383523200000,"y":0.00158715544003218},{"date":"2013-11-05",".metric":"rmse",".estimator":"
standard",".estimate":0.000707140242553759,"x":1383609600000,"y":0.000707140242553759},{"date":"2013-11-06",".metric":"rmse",".estimator":"standard",".estimate":0.000869667678226866,"x":1383696000000,"y":0.000869667678226866},{"date":"2013-11-07",".metric":"rmse",".estimator":"standard",".estimate":3.41997301053047e-05,"x":1383782400000,"y":3.41997301053047e-05},{"date":"2013-11-08",".metric":"rmse",".estimator":"standard",".estimate":0.00774212491252219,"x":1383868800000,"y":0.00774212491252219},{"date":"2013-11-11",".metric":"rmse",".estimator":"standard",".estimate":0.00383346982479243,"x":1384128000000,"y":0.00383346982479243},{"date":"2013-11-12",".metric":"rmse",".estimator":"standard",".estimate":0.000542045432367102,"x":1384214400000,"y":0.000542045432367102},{"date":"2013-11-13",".metric":"rmse",".estimator":"standard",".estimate":0.00477472845551608,"x":1384300800000,"y":0.00477472845551608},{"date":"2013-11-14",".metric":"rmse",".estimator":"standard",".estimate":7.97778727704276e-05,"x":1384387200000,"y":7.97778727704276e-05},{"date":"2013-11-15",".metric":"rmse",".estimator":"standard",".estimate":0.000100208301392964,"x":1384473600000,"y":0.000100208301392964},{"date":"2013-11-18",".metric":"rmse",".estimator":"standard",".estimate":0.00305915006399908,"x":1384732800000,"y":0.00305915006399908},{"date":"2013-11-19",".metric":"rmse",".estimator":"standard",".estimate":0.00178020971267786,"x":1384819200000,"y":0.00178020971267786},{"date":"2013-11-20",".metric":"rmse",".estimator":"standard",".estimate":0.00019925260780281,"x":1384905600000,"y":0.00019925260780281},{"date":"2013-11-21",".metric":"rmse",".estimator":"standard",".estimate":0.00406160223566766,"x":1384992000000,"y":0.00406160223566766},{"date":"2013-11-22",".metric":"rmse",".estimator":"standard",".estimate":0.000723442153834573,"x":1385078400000,"y":0.000723442153834573},{"date":"2013-11-25",".metric":"rmse",".estimator":"standard",".estimate":0.000328744423454753,"x":1385337600000,"y":0.0
00328744423454753},{"date":"2013-11-26",".metric":"rmse",".estimator":"standard",".estimate":0.00265982636503166,"x":1385424000000,"y":0.00265982636503166},{"date":"2013-11-27",".metric":"rmse",".estimator":"standard",".estimate":0.00128358907179459,"x":1385510400000,"y":0.00128358907179459},{"date":"2013-11-29",".metric":"rmse",".estimator":"standard",".estimate":0.00199764651921265,"x":1385683200000,"y":0.00199764651921265},{"date":"2013-12-02",".metric":"rmse",".estimator":"standard",".estimate":0.00206032785364643,"x":1385942400000,"y":0.00206032785364643},{"date":"2013-12-03",".metric":"rmse",".estimator":"standard",".estimate":0.000479576111824037,"x":1386028800000,"y":0.000479576111824037},{"date":"2013-12-04",".metric":"rmse",".estimator":"standard",".estimate":0.00218374080985527,"x":1386115200000,"y":0.00218374080985527},{"date":"2013-12-05",".metric":"rmse",".estimator":"standard",".estimate":0.000704660423616735,"x":1386201600000,"y":0.000704660423616735},{"date":"2013-12-06",".metric":"rmse",".estimator":"standard",".estimate":0.000797892320329585,"x":1386288000000,"y":0.000797892320329585},{"date":"2013-12-09",".metric":"rmse",".estimator":"standard",".estimate":0.000848881324378698,"x":1386547200000,"y":0.000848881324378698},{"date":"2013-12-10",".metric":"rmse",".estimator":"standard",".estimate":0.000121244794397763,"x":1386633600000,"y":0.000121244794397763},{"date":"2013-12-11",".metric":"rmse",".estimator":"standard",".estimate":0.00157056881431764,"x":1386720000000,"y":0.00157056881431764},{"date":"2013-12-12",".metric":"rmse",".estimator":"standard",".estimate":0.000762878609040502,"x":1386806400000,"y":0.000762878609040502},{"date":"2013-12-13",".metric":"rmse",".estimator":"standard",".estimate":0.000262529678308959,"x":1386892800000,"y":0.000262529678308959},{"date":"2013-12-16",".metric":"rmse",".estimator":"standard",".estimate":0.000656239523109479,"x":1387152000000,"y":0.000656239523109479},{"date":"2013-12-17",".metric":"rmse",".estimat
or":"standard",".estimate":0.000472490575181525,"x":1387238400000,"y":0.000472490575181525},{"date":"2013-12-18",".metric":"rmse",".estimator":"standard",".estimate":0.00585553726209109,"x":1387324800000,"y":0.00585553726209109},{"date":"2013-12-19",".metric":"rmse",".estimator":"standard",".estimate":0.00640125999372666,"x":1387411200000,"y":0.00640125999372666},{"date":"2013-12-20",".metric":"rmse",".estimator":"standard",".estimate":0.000492841717857647,"x":1387497600000,"y":0.000492841717857647},{"date":"2013-12-23",".metric":"rmse",".estimator":"standard",".estimate":0.00010882949525809,"x":1387756800000,"y":0.00010882949525809},{"date":"2013-12-24",".metric":"rmse",".estimator":"standard",".estimate":0.000850285446368634,"x":1387843200000,"y":0.000850285446368634},{"date":"2013-12-26",".metric":"rmse",".estimator":"standard",".estimate":0.00229071411562669,"x":1388016000000,"y":0.00229071411562669},{"date":"2013-12-27",".metric":"rmse",".estimator":"standard",".estimate":0.00361979126537698,"x":1388102400000,"y":0.00361979126537698},{"date":"2013-12-30",".metric":"rmse",".estimator":"standard",".estimate":0.00330600930550152,"x":1388361600000,"y":0.00330600930550152},{"date":"2013-12-31",".metric":"rmse",".estimator":"standard",".estimate":0.00151186957064735,"x":1388448000000,"y":0.00151186957064735},{"date":"2014-01-02",".metric":"rmse",".estimator":"standard",".estimate":0.00148733526212147,"x":1388620800000,"y":0.00148733526212147},{"date":"2014-01-03",".metric":"rmse",".estimator":"standard",".estimate":0.000818930753694061,"x":1388707200000,"y":0.000818930753694061},{"date":"2014-01-06",".metric":"rmse",".estimator":"standard",".estimate":0.00125900224553879,"x":1388966400000,"y":0.00125900224553879},{"date":"2014-01-07",".metric":"rmse",".estimator":"standard",".estimate":0.00253206351383783,"x":1389052800000,"y":0.00253206351383783},{"date":"2014-01-08",".metric":"rmse",".estimator":"standard",".estimate":0.00446864942376773,"x":1389139200000,"y":0.004
46864942376773},{"date":"2014-01-09",".metric":"rmse",".estimator":"standard",".estimate":0.000571434050674663,"x":1389225600000,"y":0.000571434050674663},{"date":"2014-01-10",".metric":"rmse",".estimator":"standard",".estimate":0.0011876363523322,"x":1389312000000,"y":0.0011876363523322},{"date":"2014-01-13",".metric":"rmse",".estimator":"standard",".estimate":0.00920096136928017,"x":1389571200000,"y":0.00920096136928017},{"date":"2014-01-14",".metric":"rmse",".estimator":"standard",".estimate":0.00415825572747359,"x":1389657600000,"y":0.00415825572747359},{"date":"2014-01-15",".metric":"rmse",".estimator":"standard",".estimate":0.00151620167562148,"x":1389744000000,"y":0.00151620167562148},{"date":"2014-01-16",".metric":"rmse",".estimator":"standard",".estimate":0.000422788504755945,"x":1389830400000,"y":0.000422788504755945},{"date":"2014-01-17",".metric":"rmse",".estimator":"standard",".estimate":0.00169462330330027,"x":1389916800000,"y":0.00169462330330027},{"date":"2014-01-21",".metric":"rmse",".estimator":"standard",".estimate":0.000125594486346043,"x":1390262400000,"y":0.000125594486346043},{"date":"2014-01-22",".metric":"rmse",".estimator":"standard",".estimate":0.00333538903158534,"x":1390348800000,"y":0.00333538903158534},{"date":"2014-01-23",".metric":"rmse",".estimator":"standard",".estimate":0.00280448815225807,"x":1390435200000,"y":0.00280448815225807},{"date":"2014-01-24",".metric":"rmse",".estimator":"standard",".estimate":0.0114439499820774,"x":1390521600000,"y":0.0114439499820774},{"date":"2014-01-27",".metric":"rmse",".estimator":"standard",".estimate":0.000493179949841391,"x":1390780800000,"y":0.000493179949841391},{"date":"2014-01-28",".metric":"rmse",".estimator":"standard",".estimate":0.00126244587670463,"x":1390867200000,"y":0.00126244587670463},{"date":"2014-01-29",".metric":"rmse",".estimator":"standard",".estimate":0.00510795794284275,"x":1390953600000,"y":0.00510795794284275},{"date":"2014-01-30",".metric":"rmse",".estimator":"standard",
".estimate":0.00201308629063307,"x":1391040000000,"y":0.00201308629063307},{"date":"2014-01-31",".metric":"rmse",".estimator":"standard",".estimate":0.000231244217711399,"x":1391126400000,"y":0.000231244217711399},{"date":"2014-02-03",".metric":"rmse",".estimator":"standard",".estimate":0.0115634039089308,"x":1391385600000,"y":0.0115634039089308},{"date":"2014-02-04",".metric":"rmse",".estimator":"standard",".estimate":0.00406486240510587,"x":1391472000000,"y":0.00406486240510587},{"date":"2014-02-05",".metric":"rmse",".estimator":"standard",".estimate":0.0047391731187252,"x":1391558400000,"y":0.0047391731187252},{"date":"2014-02-06",".metric":"rmse",".estimator":"standard",".estimate":0.00546190892666902,"x":1391644800000,"y":0.00546190892666902},{"date":"2014-02-07",".metric":"rmse",".estimator":"standard",".estimate":0.00260291637432031,"x":1391731200000,"y":0.00260291637432031},{"date":"2014-02-10",".metric":"rmse",".estimator":"standard",".estimate":0.00129593547982691,"x":1391990400000,"y":0.00129593547982691},{"date":"2014-02-11",".metric":"rmse",".estimator":"standard",".estimate":0.000580525462518246,"x":1392076800000,"y":0.000580525462518246},{"date":"2014-02-12",".metric":"rmse",".estimator":"standard",".estimate":0.00246574780588963,"x":1392163200000,"y":0.00246574780588963},{"date":"2014-02-13",".metric":"rmse",".estimator":"standard",".estimate":0.00105519704792631,"x":1392249600000,"y":0.00105519704792631},{"date":"2014-02-14",".metric":"rmse",".estimator":"standard",".estimate":0.00236393646847481,"x":1392336000000,"y":0.00236393646847481},{"date":"2014-02-18",".metric":"rmse",".estimator":"standard",".estimate":0.00361555311954361,"x":1392681600000,"y":0.00361555311954361},{"date":"2014-02-19",".metric":"rmse",".estimator":"standard",".estimate":0.00202325532716078,"x":1392768000000,"y":0.00202325532716078},{"date":"2014-02-20",".metric":"rmse",".estimator":"standard",".estimate":0.00381074086602401,"x":1392854400000,"y":0.00381074086602401},{"date"
:"2014-02-21",".metric":"rmse",".estimator":"standard",".estimate":0.00231301823550689,"x":1392940800000,"y":0.00231301823550689},{"date":"2014-02-24",".metric":"rmse",".estimator":"standard",".estimate":3.89640725640683e-05,"x":1393200000000,"y":3.89640725640683e-05},{"date":"2014-02-25",".metric":"rmse",".estimator":"standard",".estimate":0.000476087731678889,"x":1393286400000,"y":0.000476087731678889},{"date":"2014-02-26",".metric":"rmse",".estimator":"standard",".estimate":0.00243442029091147,"x":1393372800000,"y":0.00243442029091147},{"date":"2014-02-27",".metric":"rmse",".estimator":"standard",".estimate":0.00178820676495497,"x":1393459200000,"y":0.00178820676495497},{"date":"2014-02-28",".metric":"rmse",".estimator":"standard",".estimate":1.08820158253868e-05,"x":1393545600000,"y":1.08820158253868e-05},{"date":"2014-03-03",".metric":"rmse",".estimator":"standard",".estimate":0.0013276177154797,"x":1393804800000,"y":0.0013276177154797},{"date":"2014-03-04",".metric":"rmse",".estimator":"standard",".estimate":0.00643113277773242,"x":1393891200000,"y":0.00643113277773242},{"date":"2014-03-05",".metric":"rmse",".estimator":"standard",".estimate":0.00016254202236877,"x":1393977600000,"y":0.00016254202236877},{"date":"2014-03-06",".metric":"rmse",".estimator":"standard",".estimate":0.00230278019254107,"x":1394064000000,"y":0.00230278019254107},{"date":"2014-03-07",".metric":"rmse",".estimator":"standard",".estimate":0.00350509104329834,"x":1394150400000,"y":0.00350509104329834},{"date":"2014-03-10",".metric":"rmse",".estimator":"standard",".estimate":0.00237167859360072,"x":1394409600000,"y":0.00237167859360072},{"date":"2014-03-11",".metric":"rmse",".estimator":"standard",".estimate":0.000504901343541518,"x":1394496000000,"y":0.000504901343541518},{"date":"2014-03-12",".metric":"rmse",".estimator":"standard",".estimate":0.00233872525911163,"x":1394582400000,"y":0.00233872525911163},{"date":"2014-03-13",".metric":"rmse",".estimator":"standard",".estimate":0.0012885
0236022677,"x":1394668800000,"y":0.00128850236022677},{"date":"2014-03-14",".metric":"rmse",".estimator":"standard",".estimate":0.00138860484329246,"x":1394755200000,"y":0.00138860484329246},{"date":"2014-03-17",".metric":"rmse",".estimator":"standard",".estimate":0.000348630426874252,"x":1395014400000,"y":0.000348630426874252},{"date":"2014-03-18",".metric":"rmse",".estimator":"standard",".estimate":0.00171305593536281,"x":1395100800000,"y":0.00171305593536281},{"date":"2014-03-19",".metric":"rmse",".estimator":"standard",".estimate":0.00103422560544747,"x":1395187200000,"y":0.00103422560544747},{"date":"2014-03-20",".metric":"rmse",".estimator":"standard",".estimate":0.00244948500910164,"x":1395273600000,"y":0.00244948500910164},{"date":"2014-03-21",".metric":"rmse",".estimator":"standard",".estimate":0.00339788499753144,"x":1395360000000,"y":0.00339788499753144},{"date":"2014-03-24",".metric":"rmse",".estimator":"standard",".estimate":0.00130675471112893,"x":1395619200000,"y":0.00130675471112893},{"date":"2014-03-25",".metric":"rmse",".estimator":"standard",".estimate":0.00213675190692013,"x":1395705600000,"y":0.00213675190692013},{"date":"2014-03-26",".metric":"rmse",".estimator":"standard",".estimate":0.00466893619476399,"x":1395792000000,"y":0.00466893619476399},{"date":"2014-03-27",".metric":"rmse",".estimator":"standard",".estimate":0.00165591295288144,"x":1395878400000,"y":0.00165591295288144},{"date":"2014-03-28",".metric":"rmse",".estimator":"standard",".estimate":0.0022503861609131,"x":1395964800000,"y":0.0022503861609131},{"date":"2014-03-31",".metric":"rmse",".estimator":"standard",".estimate":0.00144421796473591,"x":1396224000000,"y":0.00144421796473591},{"date":"2014-04-01",".metric":"rmse",".estimator":"standard",".estimate":0.000293571879672249,"x":1396310400000,"y":0.000293571879672249},{"date":"2014-04-02",".metric":"rmse",".estimator":"standard",".estimate":0.000921277632247967,"x":1396396800000,"y":0.000921277632247967},{"date":"2014-04-03",".m
etric":"rmse",".estimator":"standard",".estimate":0.0010656539825164,"x":1396483200000,"y":0.0010656539825164},{"date":"2014-04-04",".metric":"rmse",".estimator":"standard",".estimate":0.00618316301989382,"x":1396569600000,"y":0.00618316301989382},{"date":"2014-04-07",".metric":"rmse",".estimator":"standard",".estimate":0.00410145020592395,"x":1396828800000,"y":0.00410145020592395},{"date":"2014-04-08",".metric":"rmse",".estimator":"standard",".estimate":0.000744359375515803,"x":1396915200000,"y":0.000744359375515803},{"date":"2014-04-09",".metric":"rmse",".estimator":"standard",".estimate":0.00371180214554768,"x":1397001600000,"y":0.00371180214554768},{"date":"2014-04-10",".metric":"rmse",".estimator":"standard",".estimate":0.00955218910528874,"x":1397088000000,"y":0.00955218910528874},{"date":"2014-04-11",".metric":"rmse",".estimator":"standard",".estimate":0.000757652870459674,"x":1397174400000,"y":0.000757652870459674},{"date":"2014-04-14",".metric":"rmse",".estimator":"standard",".estimate":0.00231523494562687,"x":1397433600000,"y":0.00231523494562687},{"date":"2014-04-15",".metric":"rmse",".estimator":"standard",".estimate":0.00502091291754025,"x":1397520000000,"y":0.00502091291754025},{"date":"2014-04-16",".metric":"rmse",".estimator":"standard",".estimate":0.00027389125520713,"x":1397606400000,"y":0.00027389125520713},{"date":"2014-04-17",".metric":"rmse",".estimator":"standard",".estimate":0.00043409550268547,"x":1397692800000,"y":0.00043409550268547},{"date":"2014-04-21",".metric":"rmse",".estimator":"standard",".estimate":0.000808809667989553,"x":1398038400000,"y":0.000808809667989553},{"date":"2014-04-22",".metric":"rmse",".estimator":"standard",".estimate":0.0013365075844831,"x":1398124800000,"y":0.0013365075844831},{"date":"2014-04-23",".metric":"rmse",".estimator":"standard",".estimate":0.000424163460882393,"x":1398211200000,"y":0.000424163460882393},{"date":"2014-04-24",".metric":"rmse",".estimator":"standard",".estimate":0.00150007335746622,"x":1398
297600000,"y":0.00150007335746622},{"date":"2014-04-25",".metric":"rmse",".estimator":"standard",".estimate":0.00156559003191409,"x":1398384000000,"y":0.00156559003191409},{"date":"2014-04-28",".metric":"rmse",".estimator":"standard",".estimate":0.0028638148816032,"x":1398643200000,"y":0.0028638148816032},{"date":"2014-04-29",".metric":"rmse",".estimator":"standard",".estimate":0.00170800478467133,"x":1398729600000,"y":0.00170800478467133},{"date":"2014-04-30",".metric":"rmse",".estimator":"standard",".estimate":0.00105875020917615,"x":1398816000000,"y":0.00105875020917615},{"date":"2014-05-01",".metric":"rmse",".estimator":"standard",".estimate":0.00389061926590377,"x":1398902400000,"y":0.00389061926590377},{"date":"2014-05-02",".metric":"rmse",".estimator":"standard",".estimate":0.000875771905077209,"x":1398988800000,"y":0.000875771905077209},{"date":"2014-05-05",".metric":"rmse",".estimator":"standard",".estimate":0.000302082593616106,"x":1399248000000,"y":0.000302082593616106},{"date":"2014-05-06",".metric":"rmse",".estimator":"standard",".estimate":0.00377581232614828,"x":1399334400000,"y":0.00377581232614828},{"date":"2014-05-07",".metric":"rmse",".estimator":"standard",".estimate":0.00432694148901447,"x":1399420800000,"y":0.00432694148901447},{"date":"2014-05-08",".metric":"rmse",".estimator":"standard",".estimate":0.00541476252741813,"x":1399507200000,"y":0.00541476252741813},{"date":"2014-05-09",".metric":"rmse",".estimator":"standard",".estimate":0.00130666361160008,"x":1399593600000,"y":0.00130666361160008},{"date":"2014-05-12",".metric":"rmse",".estimator":"standard",".estimate":0.00106690615564849,"x":1399852800000,"y":0.00106690615564849},{"date":"2014-05-13",".metric":"rmse",".estimator":"standard",".estimate":0.00352556126676091,"x":1399939200000,"y":0.00352556126676091},{"date":"2014-05-14",".metric":"rmse",".estimator":"standard",".estimate":0.00139230203117092,"x":1400025600000,"y":0.00139230203117092},{"date":"2014-05-15",".metric":"rmse",".estim
ator":"standard",".estimate":0.00263900248643558,"x":1400112000000,"y":0.00263900248643558},{"date":"2014-05-16",".metric":"rmse",".estimator":"standard",".estimate":0.00282314343433784,"x":1400198400000,"y":0.00282314343433784},{"date":"2014-05-19",".metric":"rmse",".estimator":"standard",".estimate":0.000290963538759818,"x":1400457600000,"y":0.000290963538759818},{"date":"2014-05-20",".metric":"rmse",".estimator":"standard",".estimate":0.00121348207373204,"x":1400544000000,"y":0.00121348207373204},{"date":"2014-05-21",".metric":"rmse",".estimator":"standard",".estimate":0.00356217827640926,"x":1400630400000,"y":0.00356217827640926},{"date":"2014-05-22",".metric":"rmse",".estimator":"standard",".estimate":0.000775913446580488,"x":1400716800000,"y":0.000775913446580488},{"date":"2014-05-23",".metric":"rmse",".estimator":"standard",".estimate":0.00240507902895481,"x":1400803200000,"y":0.00240507902895481},{"date":"2014-05-27",".metric":"rmse",".estimator":"standard",".estimate":0.00359690871245207,"x":1401148800000,"y":0.00359690871245207},{"date":"2014-05-28",".metric":"rmse",".estimator":"standard",".estimate":0.000446458811152914,"x":1401235200000,"y":0.000446458811152914},{"date":"2014-05-29",".metric":"rmse",".estimator":"standard",".estimate":0.00134881889143732,"x":1401321600000,"y":0.00134881889143732},{"date":"2014-05-30",".metric":"rmse",".estimator":"standard",".estimate":4.37586477533163e-05,"x":1401408000000,"y":4.37586477533163e-05},{"date":"2014-06-02",".metric":"rmse",".estimator":"standard",".estimate":6.62895970695884e-05,"x":1401667200000,"y":6.62895970695884e-05},{"date":"2014-06-03",".metric":"rmse",".estimator":"standard",".estimate":0.00124667880122449,"x":1401753600000,"y":0.00124667880122449},{"date":"2014-06-04",".metric":"rmse",".estimator":"standard",".estimate":0.000486938615441381,"x":1401840000000,"y":0.000486938615441381},{"date":"2014-06-05",".metric":"rmse",".estimator":"standard",".estimate":0.000855713483865092,"x":1401926400000,"y
":0.000855713483865092},{"date":"2014-06-06",".metric":"rmse",".estimator":"standard",".estimate":0.00196057510424582,"x":1402012800000,"y":0.00196057510424582},{"date":"2014-06-09",".metric":"rmse",".estimator":"standard",".estimate":0.000917943445872296,"x":1402272000000,"y":0.000917943445872296},{"date":"2014-06-10",".metric":"rmse",".estimator":"standard",".estimate":0.00176967951524932,"x":1402358400000,"y":0.00176967951524932},{"date":"2014-06-11",".metric":"rmse",".estimator":"standard",".estimate":0.00282423404314993,"x":1402444800000,"y":0.00282423404314993},{"date":"2014-06-12",".metric":"rmse",".estimator":"standard",".estimate":0.00483068423246947,"x":1402531200000,"y":0.00483068423246947},{"date":"2014-06-13",".metric":"rmse",".estimator":"standard",".estimate":0.0019474720610407,"x":1402617600000,"y":0.0019474720610407},{"date":"2014-06-16",".metric":"rmse",".estimator":"standard",".estimate":0.000259047311752497,"x":1402876800000,"y":0.000259047311752497},{"date":"2014-06-17",".metric":"rmse",".estimator":"standard",".estimate":0.000874624755127248,"x":1402963200000,"y":0.000874624755127248},{"date":"2014-06-18",".metric":"rmse",".estimator":"standard",".estimate":0.00162835712315407,"x":1403049600000,"y":0.00162835712315407},{"date":"2014-06-19",".metric":"rmse",".estimator":"standard",".estimate":0.00283043809754556,"x":1403136000000,"y":0.00283043809754556},{"date":"2014-06-20",".metric":"rmse",".estimator":"standard",".estimate":0.000730178279525006,"x":1403222400000,"y":0.000730178279525006},{"date":"2014-06-23",".metric":"rmse",".estimator":"standard",".estimate":0.000137424624938349,"x":1403481600000,"y":0.000137424624938349},{"date":"2014-06-24",".metric":"rmse",".estimator":"standard",".estimate":0.000547485209905004,"x":1403568000000,"y":0.000547485209905004},{"date":"2014-06-25",".metric":"rmse",".estimator":"standard",".estimate":0.00327809240197013,"x":1403654400000,"y":0.00327809240197013},{"date":"2014-06-26",".metric":"rmse",".estimato
r":"standard",".estimate":0.00152343341502415,"x":1403740800000,"y":0.00152343341502415},{"date":"2014-06-27",".metric":"rmse",".estimator":"standard",".estimate":0.00135515261171915,"x":1403827200000,"y":0.00135515261171915},{"date":"2014-06-30",".metric":"rmse",".estimator":"standard",".estimate":0.00160195578294905,"x":1404086400000,"y":0.00160195578294905},{"date":"2014-07-01",".metric":"rmse",".estimator":"standard",".estimate":0.000233602083369974,"x":1404172800000,"y":0.000233602083369974},{"date":"2014-07-02",".metric":"rmse",".estimator":"standard",".estimate":0.00174962461912607,"x":1404259200000,"y":0.00174962461912607},{"date":"2014-07-03",".metric":"rmse",".estimator":"standard",".estimate":0.000319944598172364,"x":1404345600000,"y":0.000319944598172364},{"date":"2014-07-07",".metric":"rmse",".estimator":"standard",".estimate":0.00365706027220958,"x":1404691200000,"y":0.00365706027220958},{"date":"2014-07-08",".metric":"rmse",".estimator":"standard",".estimate":0.0061451999312466,"x":1404777600000,"y":0.0061451999312466},{"date":"2014-07-09",".metric":"rmse",".estimator":"standard",".estimate":0.0008333097331233,"x":1404864000000,"y":0.0008333097331233},{"date":"2014-07-10",".metric":"rmse",".estimator":"standard",".estimate":0.000942594891875229,"x":1404950400000,"y":0.000942594891875229},{"date":"2014-07-11",".metric":"rmse",".estimator":"standard",".estimate":0.000882323784889272,"x":1405036800000,"y":0.000882323784889272},{"date":"2014-07-14",".metric":"rmse",".estimator":"standard",".estimate":0.00115369579766836,"x":1405296000000,"y":0.00115369579766836},{"date":"2014-07-15",".metric":"rmse",".estimator":"standard",".estimate":0.00258036734617897,"x":1405382400000,"y":0.00258036734617897},{"date":"2014-07-16",".metric":"rmse",".estimator":"standard",".estimate":0.000210557689839008,"x":1405468800000,"y":0.000210557689839008},{"date":"2014-07-17",".metric":"rmse",".estimator":"standard",".estimate":0.00307865459624111,"x":1405555200000,"y":0.003078
65459624111},{"date":"2014-07-18",".metric":"rmse",".estimator":"standard",".estimate":0.00600957581655225,"x":1405641600000,"y":0.00600957581655225},{"date":"2014-07-21",".metric":"rmse",".estimator":"standard",".estimate":0.000141034509552021,"x":1405900800000,"y":0.000141034509552021},{"date":"2014-07-22",".metric":"rmse",".estimator":"standard",".estimate":0.00157316429465122,"x":1405987200000,"y":0.00157316429465122},{"date":"2014-07-23",".metric":"rmse",".estimator":"standard",".estimate":5.79389393633141e-05,"x":1406073600000,"y":5.79389393633141e-05},{"date":"2014-07-24",".metric":"rmse",".estimator":"standard",".estimate":0.00298640090271562,"x":1406160000000,"y":0.00298640090271562},{"date":"2014-07-25",".metric":"rmse",".estimator":"standard",".estimate":0.000846890598854637,"x":1406246400000,"y":0.000846890598854637},{"date":"2014-07-28",".metric":"rmse",".estimator":"standard",".estimate":0.000486268231837349,"x":1406505600000,"y":0.000486268231837349},{"date":"2014-07-29",".metric":"rmse",".estimator":"standard",".estimate":0.00108729136628571,"x":1406592000000,"y":0.00108729136628571},{"date":"2014-07-30",".metric":"rmse",".estimator":"standard",".estimate":0.00240110003602132,"x":1406678400000,"y":0.00240110003602132},{"date":"2014-07-31",".metric":"rmse",".estimator":"standard",".estimate":0.00886385990974978,"x":1406764800000,"y":0.00886385990974978},{"date":"2014-08-01",".metric":"rmse",".estimator":"standard",".estimate":0.00130950662628627,"x":1406851200000,"y":0.00130950662628627},{"date":"2014-08-04",".metric":"rmse",".estimator":"standard",".estimate":0.00414144612909378,"x":1407110400000,"y":0.00414144612909378},{"date":"2014-08-05",".metric":"rmse",".estimator":"standard",".estimate":0.00371691255412742,"x":1407196800000,"y":0.00371691255412742},{"date":"2014-08-06",".metric":"rmse",".estimator":"standard",".estimate":0.00349694714381284,"x":1407283200000,"y":0.00349694714381284},{"date":"2014-08-07",".metric":"rmse",".estimator":"standard"
,".estimate":0.00103639681438946,"x":1407369600000,"y":0.00103639681438946},{"date":"2014-08-08",".metric":"rmse",".estimator":"standard",".estimate":0.00819731088788718,"x":1407456000000,"y":0.00819731088788718},{"date":"2014-08-11",".metric":"rmse",".estimator":"standard",".estimate":0.00320991418289113,"x":1407715200000,"y":0.00320991418289113},{"date":"2014-08-12",".metric":"rmse",".estimator":"standard",".estimate":0.00183064761761298,"x":1407801600000,"y":0.00183064761761298},{"date":"2014-08-13",".metric":"rmse",".estimator":"standard",".estimate":0.000796963386906422,"x":1407888000000,"y":0.000796963386906422},{"date":"2014-08-14",".metric":"rmse",".estimator":"standard",".estimate":0.000597107731675361,"x":1407974400000,"y":0.000597107731675361},{"date":"2014-08-15",".metric":"rmse",".estimator":"standard",".estimate":0.000147625995064866,"x":1408060800000,"y":0.000147625995064866},{"date":"2014-08-18",".metric":"rmse",".estimator":"standard",".estimate":0.00206935067639825,"x":1408320000000,"y":0.00206935067639825},{"date":"2014-08-19",".metric":"rmse",".estimator":"standard",".estimate":0.000960516267774979,"x":1408406400000,"y":0.000960516267774979},{"date":"2014-08-20",".metric":"rmse",".estimator":"standard",".estimate":0.00164247712046299,"x":1408492800000,"y":0.00164247712046299},{"date":"2014-08-21",".metric":"rmse",".estimator":"standard",".estimate":0.00137361399566395,"x":1408579200000,"y":0.00137361399566395},{"date":"2014-08-22",".metric":"rmse",".estimator":"standard",".estimate":0.00203472779931931,"x":1408665600000,"y":0.00203472779931931},{"date":"2014-08-25",".metric":"rmse",".estimator":"standard",".estimate":0.00119471810723019,"x":1408924800000,"y":0.00119471810723019},{"date":"2014-08-26",".metric":"rmse",".estimator":"standard",".estimate":0.00177017943673564,"x":1409011200000,"y":0.00177017943673564},{"date":"2014-08-27",".metric":"rmse",".estimator":"standard",".estimate":0.00167361871840368,"x":1409097600000,"y":0.00167361871840368
},{"date":"2014-08-28",".metric":"rmse",".estimator":"standard",".estimate":0.00415390822110768,"x":1409184000000,"y":0.00415390822110768},{"date":"2014-08-29",".metric":"rmse",".estimator":"standard",".estimate":0.000553509892759291,"x":1409270400000,"y":0.000553509892759291},{"date":"2014-09-02",".metric":"rmse",".estimator":"standard",".estimate":0.0028580303501736,"x":1409616000000,"y":0.0028580303501736},{"date":"2014-09-03",".metric":"rmse",".estimator":"standard",".estimate":0.00441100910006302,"x":1409702400000,"y":0.00441100910006302},{"date":"2014-09-04",".metric":"rmse",".estimator":"standard",".estimate":0.000231429680361579,"x":1409788800000,"y":0.000231429680361579},{"date":"2014-09-05",".metric":"rmse",".estimator":"standard",".estimate":0.00276346914628592,"x":1409875200000,"y":0.00276346914628592},{"date":"2014-09-08",".metric":"rmse",".estimator":"standard",".estimate":0.00034802932604793,"x":1410134400000,"y":0.00034802932604793},{"date":"2014-09-09",".metric":"rmse",".estimator":"standard",".estimate":0.00233804117778962,"x":1410220800000,"y":0.00233804117778962},{"date":"2014-09-10",".metric":"rmse",".estimator":"standard",".estimate":0.00178648151267642,"x":1410307200000,"y":0.00178648151267642},{"date":"2014-09-11",".metric":"rmse",".estimator":"standard",".estimate":0.000721554836218572,"x":1410393600000,"y":0.000721554836218572},{"date":"2014-09-12",".metric":"rmse",".estimator":"standard",".estimate":0.00455760000316874,"x":1410480000000,"y":0.00455760000316874},{"date":"2014-09-15",".metric":"rmse",".estimator":"standard",".estimate":0.00350688868934713,"x":1410739200000,"y":0.00350688868934713},{"date":"2014-09-16",".metric":"rmse",".estimator":"standard",".estimate":0.00422725297543705,"x":1410825600000,"y":0.00422725297543705},{"date":"2014-09-17",".metric":"rmse",".estimator":"standard",".estimate":0.00281860847561002,"x":1410912000000,"y":0.00281860847561002},{"date":"2014-09-18",".metric":"rmse",".estimator":"standard",".estimate":0.
00147408592681244,"x":1410998400000,"y":0.00147408592681244},{"date":"2014-09-19",".metric":"rmse",".estimator":"standard",".estimate":0.00215655991310362,"x":1411084800000,"y":0.00215655991310362},{"date":"2014-09-22",".metric":"rmse",".estimator":"standard",".estimate":0.00218257063330211,"x":1411344000000,"y":0.00218257063330211},{"date":"2014-09-23",".metric":"rmse",".estimator":"standard",".estimate":0.00060948260244335,"x":1411430400000,"y":0.00060948260244335},{"date":"2014-09-24",".metric":"rmse",".estimator":"standard",".estimate":0.00167606922803859,"x":1411516800000,"y":0.00167606922803859},{"date":"2014-09-25",".metric":"rmse",".estimator":"standard",".estimate":0.00536744976052472,"x":1411603200000,"y":0.00536744976052472},{"date":"2014-09-26",".metric":"rmse",".estimator":"standard",".estimate":0.000564206344653154,"x":1411689600000,"y":0.000564206344653154},{"date":"2014-09-29",".metric":"rmse",".estimator":"standard",".estimate":0.00229672644043357,"x":1411948800000,"y":0.00229672644043357},{"date":"2014-09-30",".metric":"rmse",".estimator":"standard",".estimate":0.00120139847585075,"x":1412035200000,"y":0.00120139847585075},{"date":"2014-10-01",".metric":"rmse",".estimator":"standard",".estimate":0.00214985386840889,"x":1412121600000,"y":0.00214985386840889},{"date":"2014-10-02",".metric":"rmse",".estimator":"standard",".estimate":0.0070625174291384,"x":1412208000000,"y":0.0070625174291384},{"date":"2014-10-03",".metric":"rmse",".estimator":"standard",".estimate":0.0039993124787582,"x":1412294400000,"y":0.0039993124787582},{"date":"2014-10-06",".metric":"rmse",".estimator":"standard",".estimate":0.00406570964175735,"x":1412553600000,"y":0.00406570964175735},{"date":"2014-10-07",".metric":"rmse",".estimator":"standard",".estimate":0.00608770260280821,"x":1412640000000,"y":0.00608770260280821},{"date":"2014-10-08",".metric":"rmse",".estimator":"standard",".estimate":0.0100893179621323,"x":1412726400000,"y":0.0100893179621323},{"date":"2014-10-09",".me
tric":"rmse",".estimator":"standard",".estimate":0.0093044512515634,"x":1412812800000,"y":0.0093044512515634},{"date":"2014-10-10",".metric":"rmse",".estimator":"standard",".estimate":0.0012772359736668,"x":1412899200000,"y":0.0012772359736668},{"date":"2014-10-13",".metric":"rmse",".estimator":"standard",".estimate":0.0104442600151737,"x":1413158400000,"y":0.0104442600151737},{"date":"2014-10-14",".metric":"rmse",".estimator":"standard",".estimate":0.00466686830959002,"x":1413244800000,"y":0.00466686830959002},{"date":"2014-10-15",".metric":"rmse",".estimator":"standard",".estimate":0.00125758742696745,"x":1413331200000,"y":0.00125758742696745},{"date":"2014-10-16",".metric":"rmse",".estimator":"standard",".estimate":0.000746949848383797,"x":1413417600000,"y":0.000746949848383797},{"date":"2014-10-17",".metric":"rmse",".estimator":"standard",".estimate":0.000446330061162016,"x":1413504000000,"y":0.000446330061162016},{"date":"2014-10-20",".metric":"rmse",".estimator":"standard",".estimate":0.00724374124158148,"x":1413763200000,"y":0.00724374124158148},{"date":"2014-10-21",".metric":"rmse",".estimator":"standard",".estimate":0.00945699685916905,"x":1413849600000,"y":0.00945699685916905},{"date":"2014-10-22",".metric":"rmse",".estimator":"standard",".estimate":0.00226758765277367,"x":1413936000000,"y":0.00226758765277367},{"date":"2014-10-23",".metric":"rmse",".estimator":"standard",".estimate":0.00162200882440421,"x":1414022400000,"y":0.00162200882440421},{"date":"2014-10-24",".metric":"rmse",".estimator":"standard",".estimate":0.00245971519926177,"x":1414108800000,"y":0.00245971519926177},{"date":"2014-10-27",".metric":"rmse",".estimator":"standard",".estimate":0.000111437078117051,"x":1414368000000,"y":0.000111437078117051},{"date":"2014-10-28",".metric":"rmse",".estimator":"standard",".estimate":0.00464060315517396,"x":1414454400000,"y":0.00464060315517396},{"date":"2014-10-29",".metric":"rmse",".estimator":"standard",".estimate":0.000994038213760407,"x":14145408
00000,"y":0.000994038213760407},{"date":"2014-10-30",".metric":"rmse",".estimator":"standard",".estimate":8.24153890278114e-05,"x":1414627200000,"y":8.24153890278114e-05},{"date":"2014-10-31",".metric":"rmse",".estimator":"standard",".estimate":0.00136078990878973,"x":1414713600000,"y":0.00136078990878973},{"date":"2014-11-03",".metric":"rmse",".estimator":"standard",".estimate":0.00476872564334884,"x":1414972800000,"y":0.00476872564334884},{"date":"2014-11-04",".metric":"rmse",".estimator":"standard",".estimate":0.00194937552489031,"x":1415059200000,"y":0.00194937552489031},{"date":"2014-11-05",".metric":"rmse",".estimator":"standard",".estimate":0.000904004054702433,"x":1415145600000,"y":0.000904004054702433},{"date":"2014-11-06",".metric":"rmse",".estimator":"standard",".estimate":0.00408408385962174,"x":1415232000000,"y":0.00408408385962174},{"date":"2014-11-07",".metric":"rmse",".estimator":"standard",".estimate":0.000264872917642302,"x":1415318400000,"y":0.000264872917642302},{"date":"2014-11-10",".metric":"rmse",".estimator":"standard",".estimate":0.00185908943461171,"x":1415577600000,"y":0.00185908943461171},{"date":"2014-11-11",".metric":"rmse",".estimator":"standard",".estimate":0.00501504118188602,"x":1415664000000,"y":0.00501504118188602},{"date":"2014-11-12",".metric":"rmse",".estimator":"standard",".estimate":0.00265893779930789,"x":1415750400000,"y":0.00265893779930789},{"date":"2014-11-13",".metric":"rmse",".estimator":"standard",".estimate":0.00216546413051731,"x":1415836800000,"y":0.00216546413051731},{"date":"2014-11-14",".metric":"rmse",".estimator":"standard",".estimate":0.00246891237286566,"x":1415923200000,"y":0.00246891237286566},{"date":"2014-11-17",".metric":"rmse",".estimator":"standard",".estimate":9.41372941566312e-05,"x":1416182400000,"y":9.41372941566312e-05},{"date":"2014-11-18",".metric":"rmse",".estimator":"standard",".estimate":0.00254598014396055,"x":1416268800000,"y":0.00254598014396055},{"date":"2014-11-19",".metric":"rmse",".es
timator":"standard",".estimate":2.7174938674092e-05,"x":1416355200000,"y":2.7174938674092e-05},{"date":"2014-11-20",".metric":"rmse",".estimator":"standard",".estimate":0.00229655959266466,"x":1416441600000,"y":0.00229655959266466},{"date":"2014-11-21",".metric":"rmse",".estimator":"standard",".estimate":0.00189549887810964,"x":1416528000000,"y":0.00189549887810964},{"date":"2014-11-24",".metric":"rmse",".estimator":"standard",".estimate":0.000260583677516683,"x":1416787200000,"y":0.000260583677516683},{"date":"2014-11-25",".metric":"rmse",".estimator":"standard",".estimate":0.00203921965649436,"x":1416873600000,"y":0.00203921965649436},{"date":"2014-11-26",".metric":"rmse",".estimator":"standard",".estimate":3.96736941195731e-06,"x":1416960000000,"y":3.96736941195731e-06},{"date":"2014-11-28",".metric":"rmse",".estimator":"standard",".estimate":0.000991448203063747,"x":1417132800000,"y":0.000991448203063747},{"date":"2014-12-01",".metric":"rmse",".estimator":"standard",".estimate":0.00315909482979975,"x":1417392000000,"y":0.00315909482979975},{"date":"2014-12-02",".metric":"rmse",".estimator":"standard",".estimate":0.00360565837034548,"x":1417478400000,"y":0.00360565837034548},{"date":"2014-12-03",".metric":"rmse",".estimator":"standard",".estimate":0.000671187015248425,"x":1417564800000,"y":0.000671187015248425},{"date":"2014-12-04",".metric":"rmse",".estimator":"standard",".estimate":0.000529501904159875,"x":1417651200000,"y":0.000529501904159875},{"date":"2014-12-05",".metric":"rmse",".estimator":"standard",".estimate":0.00175912866840544,"x":1417737600000,"y":0.00175912866840544},{"date":"2014-12-08",".metric":"rmse",".estimator":"standard",".estimate":0.000873646129684066,"x":1417996800000,"y":0.000873646129684066},{"date":"2014-12-09",".metric":"rmse",".estimator":"standard",".estimate":0.0023342024229448,"x":1418083200000,"y":0.0023342024229448},{"date":"2014-12-10",".metric":"rmse",".estimator":"standard",".estimate":0.00317063922796318,"x":1418169600000,"y
":0.00317063922796318},{"date":"2014-12-11",".metric":"rmse",".estimator":"standard",".estimate":0.00423979642772988,"x":1418256000000,"y":0.00423979642772988},{"date":"2014-12-12",".metric":"rmse",".estimator":"standard",".estimate":0.0040882751945936,"x":1418342400000,"y":0.0040882751945936},{"date":"2014-12-15",".metric":"rmse",".estimator":"standard",".estimate":0.00777943296248288,"x":1418601600000,"y":0.00777943296248288},{"date":"2014-12-16",".metric":"rmse",".estimator":"standard",".estimate":0.00768636089900666,"x":1418688000000,"y":0.00768636089900666},{"date":"2014-12-17",".metric":"rmse",".estimator":"standard",".estimate":0.0124775738986566,"x":1418774400000,"y":0.0124775738986566},{"date":"2014-12-18",".metric":"rmse",".estimator":"standard",".estimate":0.0107834377209642,"x":1418860800000,"y":0.0107834377209642},{"date":"2014-12-19",".metric":"rmse",".estimator":"standard",".estimate":0.00153144229868774,"x":1418947200000,"y":0.00153144229868774},{"date":"2014-12-22",".metric":"rmse",".estimator":"standard",".estimate":0.000261767913493968,"x":1419206400000,"y":0.000261767913493968},{"date":"2014-12-23",".metric":"rmse",".estimator":"standard",".estimate":0.00281002510430981,"x":1419292800000,"y":0.00281002510430981},{"date":"2014-12-24",".metric":"rmse",".estimator":"standard",".estimate":0.000985711761268521,"x":1419379200000,"y":0.000985711761268521},{"date":"2014-12-26",".metric":"rmse",".estimator":"standard",".estimate":0.00225208514274718,"x":1419552000000,"y":0.00225208514274718},{"date":"2014-12-29",".metric":"rmse",".estimator":"standard",".estimate":0.00115263765734909,"x":1419811200000,"y":0.00115263765734909},{"date":"2014-12-30",".metric":"rmse",".estimator":"standard",".estimate":0.00179371322443149,"x":1419897600000,"y":0.00179371322443149},{"date":"2014-12-31",".metric":"rmse",".estimator":"standard",".estimate":0.00668050594834836,"x":1419984000000,"y":0.00668050594834836},{"date":"2015-01-02",".metric":"rmse",".estimator":"standard"
,".estimate":0.00395677051677915,"x":1420156800000,"y":0.00395677051677915},{"date":"2015-01-05",".metric":"rmse",".estimator":"standard",".estimate":0.00362346472773006,"x":1420416000000,"y":0.00362346472773006},{"date":"2015-01-06",".metric":"rmse",".estimator":"standard",".estimate":5.35915047082565e-05,"x":1420502400000,"y":5.35915047082565e-05},{"date":"2015-01-07",".metric":"rmse",".estimator":"standard",".estimate":0.00447041821256274,"x":1420588800000,"y":0.00447041821256274},{"date":"2015-01-08",".metric":"rmse",".estimator":"standard",".estimate":0.000670507797887281,"x":1420675200000,"y":0.000670507797887281},{"date":"2015-01-09",".metric":"rmse",".estimator":"standard",".estimate":0.00232322268857359,"x":1420761600000,"y":0.00232322268857359},{"date":"2015-01-12",".metric":"rmse",".estimator":"standard",".estimate":0.0055324501544945,"x":1421020800000,"y":0.0055324501544945},{"date":"2015-01-13",".metric":"rmse",".estimator":"standard",".estimate":0.00411443172418601,"x":1421107200000,"y":0.00411443172418601},{"date":"2015-01-14",".metric":"rmse",".estimator":"standard",".estimate":0.00182992367814795,"x":1421193600000,"y":0.00182992367814795},{"date":"2015-01-15",".metric":"rmse",".estimator":"standard",".estimate":0.00684786734873696,"x":1421280000000,"y":0.00684786734873696},{"date":"2015-01-16",".metric":"rmse",".estimator":"standard",".estimate":0.00517728592452819,"x":1421366400000,"y":0.00517728592452819},{"date":"2015-01-20",".metric":"rmse",".estimator":"standard",".estimate":0.00322047343366349,"x":1421712000000,"y":0.00322047343366349},{"date":"2015-01-21",".metric":"rmse",".estimator":"standard",".estimate":0.00474144338964312,"x":1421798400000,"y":0.00474144338964312},{"date":"2015-01-22",".metric":"rmse",".estimator":"standard",".estimate":0.00536853999227699,"x":1421884800000,"y":0.00536853999227699},{"date":"2015-01-23",".metric":"rmse",".estimator":"standard",".estimate":0.00431737006355696,"x":1421971200000,"y":0.00431737006355696},{"da
te":"2015-01-26",".metric":"rmse",".estimator":"standard",".estimate":0.00127446112543595,"x":1422230400000,"y":0.00127446112543595},{"date":"2015-01-27",".metric":"rmse",".estimator":"standard",".estimate":0.0069018612083931,"x":1422316800000,"y":0.0069018612083931},{"date":"2015-01-28",".metric":"rmse",".estimator":"standard",".estimate":0.00249742664827329,"x":1422403200000,"y":0.00249742664827329},{"date":"2015-01-29",".metric":"rmse",".estimator":"standard",".estimate":0.00233421320349214,"x":1422489600000,"y":0.00233421320349214},{"date":"2015-01-30",".metric":"rmse",".estimator":"standard",".estimate":0.00480908772662906,"x":1422576000000,"y":0.00480908772662906},{"date":"2015-02-02",".metric":"rmse",".estimator":"standard",".estimate":0.00242542796840435,"x":1422835200000,"y":0.00242542796840435},{"date":"2015-02-03",".metric":"rmse",".estimator":"standard",".estimate":0.00261338329411186,"x":1422921600000,"y":0.00261338329411186},{"date":"2015-02-04",".metric":"rmse",".estimator":"standard",".estimate":0.00198146309507166,"x":1423008000000,"y":0.00198146309507166},{"date":"2015-02-05",".metric":"rmse",".estimator":"standard",".estimate":0.00013164875324184,"x":1423094400000,"y":0.00013164875324184},{"date":"2015-02-06",".metric":"rmse",".estimator":"standard",".estimate":0.000544066084412995,"x":1423180800000,"y":0.000544066084412995},{"date":"2015-02-09",".metric":"rmse",".estimator":"standard",".estimate":0.000947089446149543,"x":1423440000000,"y":0.000947089446149543},{"date":"2015-02-10",".metric":"rmse",".estimator":"standard",".estimate":0.000175757503755772,"x":1423526400000,"y":0.000175757503755772},{"date":"2015-02-11",".metric":"rmse",".estimator":"standard",".estimate":0.000338261761013436,"x":1423612800000,"y":0.000338261761013436},{"date":"2015-02-12",".metric":"rmse",".estimator":"standard",".estimate":0.00100328702089229,"x":1423699200000,"y":0.00100328702089229},{"date":"2015-02-13",".metric":"rmse",".estimator":"standard",".estimate":0.0017
9271505344779,"x":1423785600000,"y":0.00179271505344779},{"date":"2015-02-17",".metric":"rmse",".estimator":"standard",".estimate":0.00230582211235819,"x":1424131200000,"y":0.00230582211235819},{"date":"2015-02-18",".metric":"rmse",".estimator":"standard",".estimate":0.00327919423348622,"x":1424217600000,"y":0.00327919423348622},{"date":"2015-02-19",".metric":"rmse",".estimator":"standard",".estimate":0.00242318352913266,"x":1424304000000,"y":0.00242318352913266},{"date":"2015-02-20",".metric":"rmse",".estimator":"standard",".estimate":0.000964435546642736,"x":1424390400000,"y":0.000964435546642736},{"date":"2015-02-23",".metric":"rmse",".estimator":"standard",".estimate":0.00117093313511718,"x":1424649600000,"y":0.00117093313511718},{"date":"2015-02-24",".metric":"rmse",".estimator":"standard",".estimate":0.000347530115280883,"x":1424736000000,"y":0.000347530115280883},{"date":"2015-02-25",".metric":"rmse",".estimator":"standard",".estimate":0.00156198286358544,"x":1424822400000,"y":0.00156198286358544},{"date":"2015-02-26",".metric":"rmse",".estimator":"standard",".estimate":0.000101984604233385,"x":1424908800000,"y":0.000101984604233385},{"date":"2015-02-27",".metric":"rmse",".estimator":"standard",".estimate":0.00182582223293412,"x":1424995200000,"y":0.00182582223293412},{"date":"2015-03-02",".metric":"rmse",".estimator":"standard",".estimate":0.00288086391998691,"x":1425254400000,"y":0.00288086391998691},{"date":"2015-03-03",".metric":"rmse",".estimator":"standard",".estimate":0.00169682218549016,"x":1425340800000,"y":0.00169682218549016},{"date":"2015-03-04",".metric":"rmse",".estimator":"standard",".estimate":0.00299797546408927,"x":1425427200000,"y":0.00299797546408927},{"date":"2015-03-05",".metric":"rmse",".estimator":"standard",".estimate":0.000705709812426563,"x":1425513600000,"y":0.000705709812426563},{"date":"2015-03-06",".metric":"rmse",".estimator":"standard",".estimate":0.00674386588619811,"x":1425600000000,"y":0.00674386588619811},{"date":"2015-03-
09",".metric":"rmse",".estimator":"standard",".estimate":0.00442197188878407,"x":1425859200000,"y":0.00442197188878407},{"date":"2015-03-10",".metric":"rmse",".estimator":"standard",".estimate":0.00315569341433053,"x":1425945600000,"y":0.00315569341433053},{"date":"2015-03-11",".metric":"rmse",".estimator":"standard",".estimate":0.000474564201651918,"x":1426032000000,"y":0.000474564201651918},{"date":"2015-03-12",".metric":"rmse",".estimator":"standard",".estimate":0.0031755201986522,"x":1426118400000,"y":0.0031755201986522},{"date":"2015-03-13",".metric":"rmse",".estimator":"standard",".estimate":0.000442805972450956,"x":1426204800000,"y":0.000442805972450956},{"date":"2015-03-16",".metric":"rmse",".estimator":"standard",".estimate":0.00145573918699198,"x":1426464000000,"y":0.00145573918699198},{"date":"2015-03-17",".metric":"rmse",".estimator":"standard",".estimate":0.0013937081708804,"x":1426550400000,"y":0.0013937081708804},{"date":"2015-03-18",".metric":"rmse",".estimator":"standard",".estimate":0.000270695042425027,"x":1426636800000,"y":0.000270695042425027},{"date":"2015-03-19",".metric":"rmse",".estimator":"standard",".estimate":0.00270669488153355,"x":1426723200000,"y":0.00270669488153355},{"date":"2015-03-20",".metric":"rmse",".estimator":"standard",".estimate":0.000153771958484089,"x":1426809600000,"y":0.000153771958484089},{"date":"2015-03-23",".metric":"rmse",".estimator":"standard",".estimate":0.00096664476767556,"x":1427068800000,"y":0.00096664476767556},{"date":"2015-03-24",".metric":"rmse",".estimator":"standard",".estimate":0.00280287339960634,"x":1427155200000,"y":0.00280287339960634},{"date":"2015-03-25",".metric":"rmse",".estimator":"standard",".estimate":0.00574731265276986,"x":1427241600000,"y":0.00574731265276986},{"date":"2015-03-26",".metric":"rmse",".estimator":"standard",".estimate":2.69665922843373e-05,"x":1427328000000,"y":2.69665922843373e-05},{"date":"2015-03-27",".metric":"rmse",".estimator":"standard",".estimate":0.00124996438604877
,"x":1427414400000,"y":0.00124996438604877},{"date":"2015-03-30",".metric":"rmse",".estimator":"standard",".estimate":0.000369263023341724,"x":1427673600000,"y":0.000369263023341724},{"date":"2015-03-31",".metric":"rmse",".estimator":"standard",".estimate":0.00333011690004239,"x":1427760000000,"y":0.00333011690004239},{"date":"2015-04-01",".metric":"rmse",".estimator":"standard",".estimate":0.000800734527692391,"x":1427846400000,"y":0.000800734527692391},{"date":"2015-04-02",".metric":"rmse",".estimator":"standard",".estimate":0.00211113388134838,"x":1427932800000,"y":0.00211113388134838},{"date":"2015-04-06",".metric":"rmse",".estimator":"standard",".estimate":0.00420722444446628,"x":1428278400000,"y":0.00420722444446628},{"date":"2015-04-07",".metric":"rmse",".estimator":"standard",".estimate":0.00468779581382743,"x":1428364800000,"y":0.00468779581382743},{"date":"2015-04-08",".metric":"rmse",".estimator":"standard",".estimate":0.00202548638905039,"x":1428451200000,"y":0.00202548638905039},{"date":"2015-04-09",".metric":"rmse",".estimator":"standard",".estimate":0.00271888967659028,"x":1428537600000,"y":0.00271888967659028},{"date":"2015-04-10",".metric":"rmse",".estimator":"standard",".estimate":0.00254943812031821,"x":1428624000000,"y":0.00254943812031821},{"date":"2015-04-13",".metric":"rmse",".estimator":"standard",".estimate":0.000723113907213286,"x":1428883200000,"y":0.000723113907213286},{"date":"2015-04-14",".metric":"rmse",".estimator":"standard",".estimate":0.000824328027385011,"x":1428969600000,"y":0.000824328027385011},{"date":"2015-04-15",".metric":"rmse",".estimator":"standard",".estimate":0.000319171035078047,"x":1429056000000,"y":0.000319171035078047},{"date":"2015-04-16",".metric":"rmse",".estimator":"standard",".estimate":0.000237303743723273,"x":1429142400000,"y":0.000237303743723273},{"date":"2015-04-17",".metric":"rmse",".estimator":"standard",".estimate":0.00142556356502989,"x":1429228800000,"y":0.00142556356502989},{"date":"2015-04-20",".met
ric":"rmse",".estimator":"standard",".estimate":0.00120494144999053,"x":1429488000000,"y":0.00120494144999053},{"date":"2015-04-21",".metric":"rmse",".estimator":"standard",".estimate":0.00243306536885689,"x":1429574400000,"y":0.00243306536885689},{"date":"2015-04-22",".metric":"rmse",".estimator":"standard",".estimate":0.000642918523424196,"x":1429660800000,"y":0.000642918523424196},{"date":"2015-04-23",".metric":"rmse",".estimator":"standard",".estimate":0.000847782712402617,"x":1429747200000,"y":0.000847782712402617},{"date":"2015-04-24",".metric":"rmse",".estimator":"standard",".estimate":0.0013466046179942,"x":1429833600000,"y":0.0013466046179942},{"date":"2015-04-27",".metric":"rmse",".estimator":"standard",".estimate":0.00654832948070044,"x":1430092800000,"y":0.00654832948070044},{"date":"2015-04-28",".metric":"rmse",".estimator":"standard",".estimate":0.00678489140048155,"x":1430179200000,"y":0.00678489140048155},{"date":"2015-04-29",".metric":"rmse",".estimator":"standard",".estimate":0.000777663114918571,"x":1430265600000,"y":0.000777663114918571},{"date":"2015-04-30",".metric":"rmse",".estimator":"standard",".estimate":0.00371967268152809,"x":1430352000000,"y":0.00371967268152809},{"date":"2015-05-01",".metric":"rmse",".estimator":"standard",".estimate":0.00346225141118423,"x":1430438400000,"y":0.00346225141118423},{"date":"2015-05-04",".metric":"rmse",".estimator":"standard",".estimate":0.000107364674304658,"x":1430697600000,"y":0.000107364674304658},{"date":"2015-05-05",".metric":"rmse",".estimator":"standard",".estimate":3.08138475900062e-05,"x":1430784000000,"y":3.08138475900062e-05},{"date":"2015-05-06",".metric":"rmse",".estimator":"standard",".estimate":0.00293495531587135,"x":1430870400000,"y":0.00293495531587135},{"date":"2015-05-07",".metric":"rmse",".estimator":"standard",".estimate":0.00373552826403472,"x":1430956800000,"y":0.00373552826403472},{"date":"2015-05-08",".metric":"rmse",".estimator":"standard",".estimate":0.000568415536384584,"x":1
431043200000,"y":0.000568415536384584},{"date":"2015-05-11",".metric":"rmse",".estimator":"standard",".estimate":6.1954636984678e-05,"x":1431302400000,"y":6.1954636984678e-05},{"date":"2015-05-12",".metric":"rmse",".estimator":"standard",".estimate":0.00197403080935961,"x":1431388800000,"y":0.00197403080935961},{"date":"2015-05-13",".metric":"rmse",".estimator":"standard",".estimate":0.00216174101677157,"x":1431475200000,"y":0.00216174101677157},{"date":"2015-05-14",".metric":"rmse",".estimator":"standard",".estimate":0.00137483653164649,"x":1431561600000,"y":0.00137483653164649},{"date":"2015-05-15",".metric":"rmse",".estimator":"standard",".estimate":0.000270038671279347,"x":1431648000000,"y":0.000270038671279347},{"date":"2015-05-18",".metric":"rmse",".estimator":"standard",".estimate":0.00382533238668263,"x":1431907200000,"y":0.00382533238668263},{"date":"2015-05-19",".metric":"rmse",".estimator":"standard",".estimate":6.50218287763947e-05,"x":1431993600000,"y":6.50218287763947e-05},{"date":"2015-05-20",".metric":"rmse",".estimator":"standard",".estimate":0.000243958145563688,"x":1432080000000,"y":0.000243958145563688},{"date":"2015-05-21",".metric":"rmse",".estimator":"standard",".estimate":0.00213951116460373,"x":1432166400000,"y":0.00213951116460373},{"date":"2015-05-22",".metric":"rmse",".estimator":"standard",".estimate":0.00162751922109553,"x":1432252800000,"y":0.00162751922109553},{"date":"2015-05-26",".metric":"rmse",".estimator":"standard",".estimate":2.74205103022677e-05,"x":1432598400000,"y":2.74205103022677e-05},{"date":"2015-05-27",".metric":"rmse",".estimator":"standard",".estimate":0.00146570580680937,"x":1432684800000,"y":0.00146570580680937},{"date":"2015-05-28",".metric":"rmse",".estimator":"standard",".estimate":0.00114926777927846,"x":1432771200000,"y":0.00114926777927846},{"date":"2015-05-29",".metric":"rmse",".estimator":"standard",".estimate":0.000488777570115981,"x":1432857600000,"y":0.000488777570115981},{"date":"2015-06-01",".metric":"r
mse",".estimator":"standard",".estimate":0.00407131652498733,"x":1433116800000,"y":0.00407131652498733},{"date":"2015-06-02",".metric":"rmse",".estimator":"standard",".estimate":0.00204226604847051,"x":1433203200000,"y":0.00204226604847051},{"date":"2015-06-03",".metric":"rmse",".estimator":"standard",".estimate":0.000700663281221754,"x":1433289600000,"y":0.000700663281221754},{"date":"2015-06-04",".metric":"rmse",".estimator":"standard",".estimate":0.00205612474736232,"x":1433376000000,"y":0.00205612474736232},{"date":"2015-06-05",".metric":"rmse",".estimator":"standard",".estimate":0.00295680947839454,"x":1433462400000,"y":0.00295680947839454},{"date":"2015-06-08",".metric":"rmse",".estimator":"standard",".estimate":0.00157716240117475,"x":1433721600000,"y":0.00157716240117475},{"date":"2015-06-09",".metric":"rmse",".estimator":"standard",".estimate":0.000509103297587919,"x":1433808000000,"y":0.000509103297587919},{"date":"2015-06-10",".metric":"rmse",".estimator":"standard",".estimate":0.00144361579136164,"x":1433894400000,"y":0.00144361579136164},{"date":"2015-06-11",".metric":"rmse",".estimator":"standard",".estimate":0.000743861394920468,"x":1433980800000,"y":0.000743861394920468},{"date":"2015-06-12",".metric":"rmse",".estimator":"standard",".estimate":0.00270145927928372,"x":1434067200000,"y":0.00270145927928372},{"date":"2015-06-15",".metric":"rmse",".estimator":"standard",".estimate":0.000926839353652406,"x":1434326400000,"y":0.000926839353652406},{"date":"2015-06-16",".metric":"rmse",".estimator":"standard",".estimate":0.000378560473414271,"x":1434412800000,"y":0.000378560473414271},{"date":"2015-06-17",".metric":"rmse",".estimator":"standard",".estimate":0.000606422038276323,"x":1434499200000,"y":0.000606422038276323},{"date":"2015-06-18",".metric":"rmse",".estimator":"standard",".estimate":0.00235093666152922,"x":1434585600000,"y":0.00235093666152922},{"date":"2015-06-19",".metric":"rmse",".estimator":"standard",".estimate":0.0011336978173662,"x":143467
2000000,"y":0.0011336978173662},{"date":"2015-06-22",".metric":"rmse",".estimator":"standard",".estimate":0.0068783144157268,"x":1434931200000,"y":0.0068783144157268},{"date":"2015-06-23",".metric":"rmse",".estimator":"standard",".estimate":0.00123664431681601,"x":1435017600000,"y":0.00123664431681601},{"date":"2015-06-24",".metric":"rmse",".estimator":"standard",".estimate":0.00284828561798717,"x":1435104000000,"y":0.00284828561798717},{"date":"2015-06-25",".metric":"rmse",".estimator":"standard",".estimate":0.000491312260733792,"x":1435190400000,"y":0.000491312260733792},{"date":"2015-06-26",".metric":"rmse",".estimator":"standard",".estimate":0.00280120919641011,"x":1435276800000,"y":0.00280120919641011},{"date":"2015-06-29",".metric":"rmse",".estimator":"standard",".estimate":0.00963998417169128,"x":1435536000000,"y":0.00963998417169128},{"date":"2015-06-30",".metric":"rmse",".estimator":"standard",".estimate":0.00614788463974238,"x":1435622400000,"y":0.00614788463974238},{"date":"2015-07-01",".metric":"rmse",".estimator":"standard",".estimate":0.000406467434416981,"x":1435708800000,"y":0.000406467434416981},{"date":"2015-07-02",".metric":"rmse",".estimator":"standard",".estimate":0.000811571940198679,"x":1435795200000,"y":0.000811571940198679},{"date":"2015-07-06",".metric":"rmse",".estimator":"standard",".estimate":0.00509080325444627,"x":1436140800000,"y":0.00509080325444627},{"date":"2015-07-07",".metric":"rmse",".estimator":"standard",".estimate":0.00807239426729424,"x":1436227200000,"y":0.00807239426729424},{"date":"2015-07-08",".metric":"rmse",".estimator":"standard",".estimate":0.00774965792847944,"x":1436313600000,"y":0.00774965792847944},{"date":"2015-07-09",".metric":"rmse",".estimator":"standard",".estimate":0.00304060787414298,"x":1436400000000,"y":0.00304060787414298},{"date":"2015-07-10",".metric":"rmse",".estimator":"standard",".estimate":0.00179563015071285,"x":1436486400000,"y":0.00179563015071285},{"date":"2015-07-13",".metric":"rmse",".estima
tor":"standard",".estimate":0.00465201680262755,"x":1436745600000,"y":0.00465201680262755},{"date":"2015-07-14",".metric":"rmse",".estimator":"standard",".estimate":0.00164811486400728,"x":1436832000000,"y":0.00164811486400728},{"date":"2015-07-15",".metric":"rmse",".estimator":"standard",".estimate":0.00156774204135862,"x":1436918400000,"y":0.00156774204135862},{"date":"2015-07-16",".metric":"rmse",".estimator":"standard",".estimate":0.000354375284437844,"x":1437004800000,"y":0.000354375284437844},{"date":"2015-07-17",".metric":"rmse",".estimator":"standard",".estimate":0.00131872471606728,"x":1437091200000,"y":0.00131872471606728},{"date":"2015-07-20",".metric":"rmse",".estimator":"standard",".estimate":0.0015726815682986,"x":1437350400000,"y":0.0015726815682986},{"date":"2015-07-21",".metric":"rmse",".estimator":"standard",".estimate":0.000525162727281741,"x":1437436800000,"y":0.000525162727281741},{"date":"2015-07-22",".metric":"rmse",".estimator":"standard",".estimate":0.00139037651660623,"x":1437523200000,"y":0.00139037651660623},{"date":"2015-07-23",".metric":"rmse",".estimator":"standard",".estimate":0.00581183024543379,"x":1437609600000,"y":0.00581183024543379},{"date":"2015-07-24",".metric":"rmse",".estimator":"standard",".estimate":0.00241219761518981,"x":1437696000000,"y":0.00241219761518981},{"date":"2015-07-27",".metric":"rmse",".estimator":"standard",".estimate":0.000763883990557287,"x":1437955200000,"y":0.000763883990557287},{"date":"2015-07-28",".metric":"rmse",".estimator":"standard",".estimate":0.00299993049672719,"x":1438041600000,"y":0.00299993049672719},{"date":"2015-07-29",".metric":"rmse",".estimator":"standard",".estimate":0.00220400241253419,"x":1438128000000,"y":0.00220400241253419},{"date":"2015-07-30",".metric":"rmse",".estimator":"standard",".estimate":0.000885402065001293,"x":1438214400000,"y":0.000885402065001293},{"date":"2015-07-31",".metric":"rmse",".estimator":"standard",".estimate":0.00447994683444991,"x":1438300800000,"y":0.0044
7994683444991},{"date":"2015-08-03",".metric":"rmse",".estimator":"standard",".estimate":0.00304606040265679,"x":1438560000000,"y":0.00304606040265679},{"date":"2015-08-04",".metric":"rmse",".estimator":"standard",".estimate":8.54531067407526e-05,"x":1438646400000,"y":8.54531067407526e-05},{"date":"2015-08-05",".metric":"rmse",".estimator":"standard",".estimate":0.003746877821574,"x":1438732800000,"y":0.003746877821574},{"date":"2015-08-06",".metric":"rmse",".estimator":"standard",".estimate":0.00206132552144418,"x":1438819200000,"y":0.00206132552144418},{"date":"2015-08-07",".metric":"rmse",".estimator":"standard",".estimate":0.00078137333722023,"x":1438905600000,"y":0.00078137333722023},{"date":"2015-08-10",".metric":"rmse",".estimator":"standard",".estimate":0.00629187034087075,"x":1439164800000,"y":0.00629187034087075},{"date":"2015-08-11",".metric":"rmse",".estimator":"standard",".estimate":0.00137798053706101,"x":1439251200000,"y":0.00137798053706101},{"date":"2015-08-12",".metric":"rmse",".estimator":"standard",".estimate":0.00576804908886543,"x":1439337600000,"y":0.00576804908886543},{"date":"2015-08-13",".metric":"rmse",".estimator":"standard",".estimate":0.00311120713321267,"x":1439424000000,"y":0.00311120713321267},{"date":"2015-08-14",".metric":"rmse",".estimator":"standard",".estimate":0.00355661003054919,"x":1439510400000,"y":0.00355661003054919},{"date":"2015-08-17",".metric":"rmse",".estimator":"standard",".estimate":0.00330562134101764,"x":1439769600000,"y":0.00330562134101764},{"date":"2015-08-18",".metric":"rmse",".estimator":"standard",".estimate":0.00200251053844045,"x":1439856000000,"y":0.00200251053844045},{"date":"2015-08-19",".metric":"rmse",".estimator":"standard",".estimate":0.00276658191142795,"x":1439942400000,"y":0.00276658191142795},{"date":"2015-08-20",".metric":"rmse",".estimator":"standard",".estimate":0.00556411265509235,"x":1440028800000,"y":0.00556411265509235},{"date":"2015-08-21",".metric":"rmse",".estimator":"standard",".estim
ate":0.0171847059316796,"x":1440115200000,"y":0.0171847059316796},{"date":"2015-08-24",".metric":"rmse",".estimator":"standard",".estimate":0.0246145137540276,"x":1440374400000,"y":0.0246145137540276},{"date":"2015-08-25",".metric":"rmse",".estimator":"standard",".estimate":0.0145503163878593,"x":1440460800000,"y":0.0145503163878593},{"date":"2015-08-26",".metric":"rmse",".estimator":"standard",".estimate":0.0297846625793239,"x":1440547200000,"y":0.0297846625793239},{"date":"2015-08-27",".metric":"rmse",".estimator":"standard",".estimate":0.0112745785635245,"x":1440633600000,"y":0.0112745785635245},{"date":"2015-08-28",".metric":"rmse",".estimator":"standard",".estimate":0.000652500025708383,"x":1440720000000,"y":0.000652500025708383},{"date":"2015-08-31",".metric":"rmse",".estimator":"standard",".estimate":0.00412055162031694,"x":1440979200000,"y":0.00412055162031694},{"date":"2015-09-01",".metric":"rmse",".estimator":"standard",".estimate":0.00604123138552932,"x":1441065600000,"y":0.00604123138552932},{"date":"2015-09-02",".metric":"rmse",".estimator":"standard",".estimate":0.00311910901748417,"x":1441152000000,"y":0.00311910901748417},{"date":"2015-09-03",".metric":"rmse",".estimator":"standard",".estimate":0.00217445455632374,"x":1441238400000,"y":0.00217445455632374},{"date":"2015-09-04",".metric":"rmse",".estimator":"standard",".estimate":0.00041940110598717,"x":1441324800000,"y":0.00041940110598717},{"date":"2015-09-08",".metric":"rmse",".estimator":"standard",".estimate":0.00639249519682666,"x":1441670400000,"y":0.00639249519682666},{"date":"2015-09-09",".metric":"rmse",".estimator":"standard",".estimate":0.0140990892360165,"x":1441756800000,"y":0.0140990892360165},{"date":"2015-09-10",".metric":"rmse",".estimator":"standard",".estimate":0.00623370926371633,"x":1441843200000,"y":0.00623370926371633},{"date":"2015-09-11",".metric":"rmse",".estimator":"standard",".estimate":0.00602317572435243,"x":1441929600000,"y":0.00602317572435243},{"date":"2015-09-14",".m
etric":"rmse",".estimator":"standard",".estimate":0.00139710374113922,"x":1442188800000,"y":0.00139710374113922},{"date":"2015-09-15",".metric":"rmse",".estimator":"standard",".estimate":0.00587095100687109,"x":1442275200000,"y":0.00587095100687109},{"date":"2015-09-16",".metric":"rmse",".estimator":"standard",".estimate":0.00238667462773552,"x":1442361600000,"y":0.00238667462773552},{"date":"2015-09-17",".metric":"rmse",".estimator":"standard",".estimate":0.00286420083260528,"x":1442448000000,"y":0.00286420083260528},{"date":"2015-09-18",".metric":"rmse",".estimator":"standard",".estimate":0.00040238550366976,"x":1442534400000,"y":0.00040238550366976},{"date":"2015-09-21",".metric":"rmse",".estimator":"standard",".estimate":0.00312927932426035,"x":1442793600000,"y":0.00312927932426035},{"date":"2015-09-22",".metric":"rmse",".estimator":"standard",".estimate":0.00535824984915497,"x":1442880000000,"y":0.00535824984915497},{"date":"2015-09-23",".metric":"rmse",".estimator":"standard",".estimate":0.000822491154298188,"x":1442966400000,"y":0.000822491154298188},{"date":"2015-09-24",".metric":"rmse",".estimator":"standard",".estimate":0.00224123623473526,"x":1443052800000,"y":0.00224123623473526},{"date":"2015-09-25",".metric":"rmse",".estimator":"standard",".estimate":0.0138344691619113,"x":1443139200000,"y":0.0138344691619113},{"date":"2015-09-28",".metric":"rmse",".estimator":"standard",".estimate":0.00357444326627458,"x":1443398400000,"y":0.00357444326627458},{"date":"2015-09-29",".metric":"rmse",".estimator":"standard",".estimate":0.00524808800446163,"x":1443484800000,"y":0.00524808800446163},{"date":"2015-09-30",".metric":"rmse",".estimator":"standard",".estimate":0.00191601260547396,"x":1443571200000,"y":0.00191601260547396},{"date":"2015-10-01",".metric":"rmse",".estimator":"standard",".estimate":0.00161335278422444,"x":1443657600000,"y":0.00161335278422444},{"date":"2015-10-02",".metric":"rmse",".estimator":"standard",".estimate":0.00401563431263661,"x":14437440
00000,"y":0.00401563431263661},{"date":"2015-10-05",".metric":"rmse",".estimator":"standard",".estimate":0.00368765685413026,"x":1444003200000,"y":0.00368765685413026},{"date":"2015-10-06",".metric":"rmse",".estimator":"standard",".estimate":0.00296462790166401,"x":1444089600000,"y":0.00296462790166401},{"date":"2015-10-07",".metric":"rmse",".estimator":"standard",".estimate":0.00356094316941773,"x":1444176000000,"y":0.00356094316941773},{"date":"2015-10-08",".metric":"rmse",".estimator":"standard",".estimate":0.00349268333631326,"x":1444262400000,"y":0.00349268333631326},{"date":"2015-10-09",".metric":"rmse",".estimator":"standard",".estimate":0.00411732838803702,"x":1444348800000,"y":0.00411732838803702},{"date":"2015-10-12",".metric":"rmse",".estimator":"standard",".estimate":0.000457143179134392,"x":1444608000000,"y":0.000457143179134392},{"date":"2015-10-13",".metric":"rmse",".estimator":"standard",".estimate":0.00175946600828844,"x":1444694400000,"y":0.00175946600828844},{"date":"2015-10-14",".metric":"rmse",".estimator":"standard",".estimate":0.000761371757338223,"x":1444780800000,"y":0.000761371757338223},{"date":"2015-10-15",".metric":"rmse",".estimator":"standard",".estimate":0.00483450335566792,"x":1444867200000,"y":0.00483450335566792},{"date":"2015-10-16",".metric":"rmse",".estimator":"standard",".estimate":0.0009009931278971,"x":1444953600000,"y":0.0009009931278971},{"date":"2015-10-19",".metric":"rmse",".estimator":"standard",".estimate":0.00161812225629585,"x":1445212800000,"y":0.00161812225629585},{"date":"2015-10-20",".metric":"rmse",".estimator":"standard",".estimate":0.000847963004616699,"x":1445299200000,"y":0.000847963004616699},{"date":"2015-10-21",".metric":"rmse",".estimator":"standard",".estimate":0.00449553817830094,"x":1445385600000,"y":0.00449553817830094},{"date":"2015-10-22",".metric":"rmse",".estimator":"standard",".estimate":0.00412720015707421,"x":1445472000000,"y":0.00412720015707421},{"date":"2015-10-23",".metric":"rmse",".estimat
or":"standard",".estimate":0.0075915547846696,"x":1445558400000,"y":0.0075915547846696},{"date":"2015-10-26",".metric":"rmse",".estimator":"standard",".estimate":0.00413753895445339,"x":1445817600000,"y":0.00413753895445339},{"date":"2015-10-27",".metric":"rmse",".estimator":"standard",".estimate":0.00186023814882855,"x":1445904000000,"y":0.00186023814882855},{"date":"2015-10-28",".metric":"rmse",".estimator":"standard",".estimate":0.0027230691101985,"x":1445990400000,"y":0.0027230691101985},{"date":"2015-10-29",".metric":"rmse",".estimator":"standard",".estimate":0.00127307043453564,"x":1446076800000,"y":0.00127307043453564},{"date":"2015-10-30",".metric":"rmse",".estimator":"standard",".estimate":0.00208697975205613,"x":1446163200000,"y":0.00208697975205613},{"date":"2015-11-02",".metric":"rmse",".estimator":"standard",".estimate":0.0094658082655116,"x":1446422400000,"y":0.0094658082655116},{"date":"2015-11-03",".metric":"rmse",".estimator":"standard",".estimate":0.00177553400072022,"x":1446508800000,"y":0.00177553400072022},{"date":"2015-11-04",".metric":"rmse",".estimator":"standard",".estimate":0.00151539269324002,"x":1446595200000,"y":0.00151539269324002},{"date":"2015-11-05",".metric":"rmse",".estimator":"standard",".estimate":0.000186268602158622,"x":1446681600000,"y":0.000186268602158622},{"date":"2015-11-06",".metric":"rmse",".estimator":"standard",".estimate":0.00348294025067781,"x":1446768000000,"y":0.00348294025067781},{"date":"2015-11-09",".metric":"rmse",".estimator":"standard",".estimate":0.00392325383365242,"x":1447027200000,"y":0.00392325383365242},{"date":"2015-11-10",".metric":"rmse",".estimator":"standard",".estimate":0.00302523584044596,"x":1447113600000,"y":0.00302523584044596},{"date":"2015-11-11",".metric":"rmse",".estimator":"standard",".estimate":0.00407952531120488,"x":1447200000000,"y":0.00407952531120488},{"date":"2015-11-12",".metric":"rmse",".estimator":"standard",".estimate":0.00640934431579107,"x":1447286400000,"y":0.006409344315791
07},{"date":"2015-11-13",".metric":"rmse",".estimator":"standard",".estimate":0.00061421375038837,"x":1447372800000,"y":0.00061421375038837},{"date":"2015-11-16",".metric":"rmse",".estimator":"standard",".estimate":0.00881911536493924,"x":1447632000000,"y":0.00881911536493924},{"date":"2015-11-17",".metric":"rmse",".estimator":"standard",".estimate":0.00409082751160914,"x":1447718400000,"y":0.00409082751160914},{"date":"2015-11-18",".metric":"rmse",".estimator":"standard",".estimate":0.00373578110429996,"x":1447804800000,"y":0.00373578110429996},{"date":"2015-11-19",".metric":"rmse",".estimator":"standard",".estimate":0.000940209145386004,"x":1447891200000,"y":0.000940209145386004},{"date":"2015-11-20",".metric":"rmse",".estimator":"standard",".estimate":0.00409924729863848,"x":1447977600000,"y":0.00409924729863848},{"date":"2015-11-23",".metric":"rmse",".estimator":"standard",".estimate":0.00077736955312196,"x":1448236800000,"y":0.00077736955312196},{"date":"2015-11-24",".metric":"rmse",".estimator":"standard",".estimate":0.00725924885664988,"x":1448323200000,"y":0.00725924885664988},{"date":"2015-11-25",".metric":"rmse",".estimator":"standard",".estimate":0.0023727074013492,"x":1448409600000,"y":0.0023727074013492},{"date":"2015-11-27",".metric":"rmse",".estimator":"standard",".estimate":0.00285667605279557,"x":1448582400000,"y":0.00285667605279557},{"date":"2015-11-30",".metric":"rmse",".estimator":"standard",".estimate":0.000540441757668088,"x":1448841600000,"y":0.000540441757668088},{"date":"2015-12-01",".metric":"rmse",".estimator":"standard",".estimate":0.00281707396194064,"x":1448928000000,"y":0.00281707396194064},{"date":"2015-12-02",".metric":"rmse",".estimator":"standard",".estimate":0.00380555182760584,"x":1449014400000,"y":0.00380555182760584},{"date":"2015-12-03",".metric":"rmse",".estimator":"standard",".estimate":0.00368604681807069,"x":1449100800000,"y":0.00368604681807069},{"date":"2015-12-04",".metric":"rmse",".estimator":"standard",".estimate":0.
0157260417998821,"x":1449187200000,"y":0.0157260417998821},{"date":"2015-12-07",".metric":"rmse",".estimator":"standard",".estimate":0.00259219326445448,"x":1449446400000,"y":0.00259219326445448},{"date":"2015-12-08",".metric":"rmse",".estimator":"standard",".estimate":0.000502732151715991,"x":1449532800000,"y":0.000502732151715991},{"date":"2015-12-09",".metric":"rmse",".estimator":"standard",".estimate":0.00507356066681386,"x":1449619200000,"y":0.00507356066681386},{"date":"2015-12-10",".metric":"rmse",".estimator":"standard",".estimate":0.00478650841022692,"x":1449705600000,"y":0.00478650841022692},{"date":"2015-12-11",".metric":"rmse",".estimator":"standard",".estimate":0.00378942098803433,"x":1449792000000,"y":0.00378942098803433},{"date":"2015-12-14",".metric":"rmse",".estimator":"standard",".estimate":0.0051286889839488,"x":1450051200000,"y":0.0051286889839488},{"date":"2015-12-15",".metric":"rmse",".estimator":"standard",".estimate":0.00572619680990926,"x":1450137600000,"y":0.00572619680990926},{"date":"2015-12-16",".metric":"rmse",".estimator":"standard",".estimate":0.000756598335299976,"x":1450224000000,"y":0.000756598335299976},{"date":"2015-12-17",".metric":"rmse",".estimator":"standard",".estimate":0.00643824801210792,"x":1450310400000,"y":0.00643824801210792},{"date":"2015-12-18",".metric":"rmse",".estimator":"standard",".estimate":0.00495687121937044,"x":1450396800000,"y":0.00495687121937044},{"date":"2015-12-21",".metric":"rmse",".estimator":"standard",".estimate":0.00887843066546218,"x":1450656000000,"y":0.00887843066546218},{"date":"2015-12-22",".metric":"rmse",".estimator":"standard",".estimate":0.00236204083051885,"x":1450742400000,"y":0.00236204083051885},{"date":"2015-12-23",".metric":"rmse",".estimator":"standard",".estimate":0.000914800671941883,"x":1450828800000,"y":0.000914800671941883},{"date":"2015-12-24",".metric":"rmse",".estimator":"standard",".estimate":0.0026669781862874,"x":1450915200000,"y":0.0026669781862874},{"date":"2015-12-28",
".metric":"rmse",".estimator":"standard",".estimate":0.00239132223435226,"x":1451260800000,"y":0.00239132223435226},{"date":"2015-12-29",".metric":"rmse",".estimator":"standard",".estimate":0.000306643195085988,"x":1451347200000,"y":0.000306643195085988},{"date":"2015-12-30",".metric":"rmse",".estimator":"standard",".estimate":0.000523259974480772,"x":1451433600000,"y":0.000523259974480772},{"date":"2015-12-31",".metric":"rmse",".estimator":"standard",".estimate":0.000265027554290816,"x":1451520000000,"y":0.000265027554290816},{"date":"2016-01-04",".metric":"rmse",".estimator":"standard",".estimate":0.00435473108535608,"x":1451865600000,"y":0.00435473108535608},{"date":"2016-01-05",".metric":"rmse",".estimator":"standard",".estimate":4.99137262643245e-05,"x":1451952000000,"y":4.99137262643245e-05},{"date":"2016-01-06",".metric":"rmse",".estimator":"standard",".estimate":4.8622227364398e-05,"x":1452038400000,"y":4.8622227364398e-05},{"date":"2016-01-07",".metric":"rmse",".estimator":"standard",".estimate":0.0076402684003439,"x":1452124800000,"y":0.0076402684003439},{"date":"2016-01-08",".metric":"rmse",".estimator":"standard",".estimate":9.21799517826765e-05,"x":1452211200000,"y":9.21799517826765e-05},{"date":"2016-01-11",".metric":"rmse",".estimator":"standard",".estimate":0.0016457241875072,"x":1452470400000,"y":0.0016457241875072},{"date":"2016-01-12",".metric":"rmse",".estimator":"standard",".estimate":0.000877015684916291,"x":1452556800000,"y":0.000877015684916291},{"date":"2016-01-13",".metric":"rmse",".estimator":"standard",".estimate":0.00943443643821319,"x":1452643200000,"y":0.00943443643821319},{"date":"2016-01-14",".metric":"rmse",".estimator":"standard",".estimate":0.009204029031713,"x":1452729600000,"y":0.009204029031713},{"date":"2016-01-15",".metric":"rmse",".estimator":"standard",".estimate":0.000353175248814303,"x":1452816000000,"y":0.000353175248814303},{"date":"2016-01-19",".metric":"rmse",".estimator":"standard",".estimate":0.00629909289057516,"x"
:1453161600000,"y":0.00629909289057516},{"date":"2016-01-20",".metric":"rmse",".estimator":"standard",".estimate":0.00191760098925365,"x":1453248000000,"y":0.00191760098925365},{"date":"2016-01-21",".metric":"rmse",".estimator":"standard",".estimate":0.00439061618074424,"x":1453334400000,"y":0.00439061618074424},{"date":"2016-01-22",".metric":"rmse",".estimator":"standard",".estimate":0.010413951735192,"x":1453420800000,"y":0.010413951735192},{"date":"2016-01-25",".metric":"rmse",".estimator":"standard",".estimate":0.00251449135223171,"x":1453680000000,"y":0.00251449135223171},{"date":"2016-01-26",".metric":"rmse",".estimator":"standard",".estimate":0.0028650925539264,"x":1453766400000,"y":0.0028650925539264},{"date":"2016-01-27",".metric":"rmse",".estimator":"standard",".estimate":0.00461631957115544,"x":1453852800000,"y":0.00461631957115544},{"date":"2016-01-28",".metric":"rmse",".estimator":"standard",".estimate":0.00577257442460362,"x":1453939200000,"y":0.00577257442460362},{"date":"2016-01-29",".metric":"rmse",".estimator":"standard",".estimate":0.00670028948029786,"x":1454025600000,"y":0.00670028948029786},{"date":"2016-02-01",".metric":"rmse",".estimator":"standard",".estimate":0.00394007287995396,"x":1454284800000,"y":0.00394007287995396},{"date":"2016-02-02",".metric":"rmse",".estimator":"standard",".estimate":0.00150439975495101,"x":1454371200000,"y":0.00150439975495101},{"date":"2016-02-03",".metric":"rmse",".estimator":"standard",".estimate":0.00606677272423137,"x":1454457600000,"y":0.00606677272423137},{"date":"2016-02-04",".metric":"rmse",".estimator":"standard",".estimate":0.00635159670254089,"x":1454544000000,"y":0.00635159670254089},{"date":"2016-02-05",".metric":"rmse",".estimator":"standard",".estimate":0.00992725209016555,"x":1454630400000,"y":0.00992725209016555},{"date":"2016-02-08",".metric":"rmse",".estimator":"standard",".estimate":0.00758043240968828,"x":1454889600000,"y":0.00758043240968828},{"date":"2016-02-09",".metric":"rmse",".estimato
r":"standard",".estimate":0.00466694271667664,"x":1454976000000,"y":0.00466694271667664},{"date":"2016-02-10",".metric":"rmse",".estimator":"standard",".estimate":0.00654589752122079,"x":1455062400000,"y":0.00654589752122079},{"date":"2016-02-11",".metric":"rmse",".estimator":"standard",".estimate":0.00322855895854392,"x":1455148800000,"y":0.00322855895854392},{"date":"2016-02-12",".metric":"rmse",".estimator":"standard",".estimate":0.00958575719258743,"x":1455235200000,"y":0.00958575719258743},{"date":"2016-02-16",".metric":"rmse",".estimator":"standard",".estimate":0.00389361272050569,"x":1455580800000,"y":0.00389361272050569},{"date":"2016-02-17",".metric":"rmse",".estimator":"standard",".estimate":0.000787449185594025,"x":1455667200000,"y":0.000787449185594025},{"date":"2016-02-18",".metric":"rmse",".estimator":"standard",".estimate":0.000357318694135214,"x":1455753600000,"y":0.000357318694135214},{"date":"2016-02-19",".metric":"rmse",".estimator":"standard",".estimate":0.000768228836225628,"x":1455840000000,"y":0.000768228836225628},{"date":"2016-02-22",".metric":"rmse",".estimator":"standard",".estimate":0.000887784879040737,"x":1456099200000,"y":0.000887784879040737},{"date":"2016-02-23",".metric":"rmse",".estimator":"standard",".estimate":0.00282139674783032,"x":1456185600000,"y":0.00282139674783032},{"date":"2016-02-24",".metric":"rmse",".estimator":"standard",".estimate":0.0107872487574005,"x":1456272000000,"y":0.0107872487574005},{"date":"2016-02-25",".metric":"rmse",".estimator":"standard",".estimate":0.00135768760736578,"x":1456358400000,"y":0.00135768760736578},{"date":"2016-02-26",".metric":"rmse",".estimator":"standard",".estimate":0.00315134214895253,"x":1456444800000,"y":0.00315134214895253},{"date":"2016-02-29",".metric":"rmse",".estimator":"standard",".estimate":0.00309953861889532,"x":1456704000000,"y":0.00309953861889532},{"date":"2016-03-01",".metric":"rmse",".estimator":"standard",".estimate":0.00596851063581625,"x":1456790400000,"y":0.005968
51063581625},{"date":"2016-03-02",".metric":"rmse",".estimator":"standard",".estimate":0.002590271725395,"x":1456876800000,"y":0.002590271725395},{"date":"2016-03-03",".metric":"rmse",".estimator":"standard",".estimate":0.000895392770388821,"x":1456963200000,"y":0.000895392770388821},{"date":"2016-03-04",".metric":"rmse",".estimator":"standard",".estimate":0.000695556331745784,"x":1457049600000,"y":0.000695556331745784},{"date":"2016-03-07",".metric":"rmse",".estimator":"standard",".estimate":0.00419991571814754,"x":1457308800000,"y":0.00419991571814754},{"date":"2016-03-08",".metric":"rmse",".estimator":"standard",".estimate":0.00218963388792789,"x":1457395200000,"y":0.00218963388792789},{"date":"2016-03-09",".metric":"rmse",".estimator":"standard",".estimate":0.00173384458370314,"x":1457481600000,"y":0.00173384458370314},{"date":"2016-03-10",".metric":"rmse",".estimator":"standard",".estimate":0.00205103302636883,"x":1457568000000,"y":0.00205103302636883},{"date":"2016-03-11",".metric":"rmse",".estimator":"standard",".estimate":0.000199092245679135,"x":1457654400000,"y":0.000199092245679135},{"date":"2016-03-14",".metric":"rmse",".estimator":"standard",".estimate":0.00103645659262195,"x":1457913600000,"y":0.00103645659262195},{"date":"2016-03-15",".metric":"rmse",".estimator":"standard",".estimate":0.00200455073617329,"x":1458000000000,"y":0.00200455073617329},{"date":"2016-03-16",".metric":"rmse",".estimator":"standard",".estimate":0.000996575530414657,"x":1458086400000,"y":0.000996575530414657},{"date":"2016-03-17",".metric":"rmse",".estimator":"standard",".estimate":0.00316782341795077,"x":1458172800000,"y":0.00316782341795077},{"date":"2016-03-18",".metric":"rmse",".estimator":"standard",".estimate":0.000948794057591311,"x":1458259200000,"y":0.000948794057591311},{"date":"2016-03-21",".metric":"rmse",".estimator":"standard",".estimate":0.00146085276874177,"x":1458518400000,"y":0.00146085276874177},{"date":"2016-03-22",".metric":"rmse",".estimator":"standard","
.estimate":0.000253473046446184,"x":1458604800000,"y":0.000253473046446184},{"date":"2016-03-23",".metric":"rmse",".estimator":"standard",".estimate":0.000394696354946654,"x":1458691200000,"y":0.000394696354946654},{"date":"2016-03-24",".metric":"rmse",".estimator":"standard",".estimate":0.00434327783930099,"x":1458777600000,"y":0.00434327783930099},{"date":"2016-03-28",".metric":"rmse",".estimator":"standard",".estimate":0.00254485496312226,"x":1459123200000,"y":0.00254485496312226},{"date":"2016-03-29",".metric":"rmse",".estimator":"standard",".estimate":0.00293945789754692,"x":1459209600000,"y":0.00293945789754692},{"date":"2016-03-30",".metric":"rmse",".estimator":"standard",".estimate":0.00292839427870702,"x":1459296000000,"y":0.00292839427870702},{"date":"2016-03-31",".metric":"rmse",".estimator":"standard",".estimate":0.000726623417067855,"x":1459382400000,"y":0.000726623417067855},{"date":"2016-04-01",".metric":"rmse",".estimator":"standard",".estimate":0.00599720890133593,"x":1459468800000,"y":0.00599720890133593},{"date":"2016-04-04",".metric":"rmse",".estimator":"standard",".estimate":0.00460142348446367,"x":1459728000000,"y":0.00460142348446367},{"date":"2016-04-05",".metric":"rmse",".estimator":"standard",".estimate":0.00368981274482765,"x":1459814400000,"y":0.00368981274482765},{"date":"2016-04-06",".metric":"rmse",".estimator":"standard",".estimate":0.00259717641563854,"x":1459900800000,"y":0.00259717641563854},{"date":"2016-04-07",".metric":"rmse",".estimator":"standard",".estimate":0.00154438576727709,"x":1459987200000,"y":0.00154438576727709},{"date":"2016-04-08",".metric":"rmse",".estimator":"standard",".estimate":0.00391666299607268,"x":1460073600000,"y":0.00391666299607268},{"date":"2016-04-11",".metric":"rmse",".estimator":"standard",".estimate":0.00134671830652272,"x":1460332800000,"y":0.00134671830652272},{"date":"2016-04-12",".metric":"rmse",".estimator":"standard",".estimate":0.00141967032713428,"x":1460419200000,"y":0.00141967032713428},{"
date":"2016-04-13",".metric":"rmse",".estimator":"standard",".estimate":0.00357924958125467,"x":1460505600000,"y":0.00357924958125467},{"date":"2016-04-14",".metric":"rmse",".estimator":"standard",".estimate":0.000998301985721335,"x":1460592000000,"y":0.000998301985721335},{"date":"2016-04-15",".metric":"rmse",".estimator":"standard",".estimate":0.00205469281685017,"x":1460678400000,"y":0.00205469281685017},{"date":"2016-04-18",".metric":"rmse",".estimator":"standard",".estimate":0.00301784509946262,"x":1460937600000,"y":0.00301784509946262},{"date":"2016-04-19",".metric":"rmse",".estimator":"standard",".estimate":0.00718699191439488,"x":1461024000000,"y":0.00718699191439488},{"date":"2016-04-20",".metric":"rmse",".estimator":"standard",".estimate":0.00106201157201088,"x":1461110400000,"y":0.00106201157201088},{"date":"2016-04-21",".metric":"rmse",".estimator":"standard",".estimate":0.00592184849459788,"x":1461196800000,"y":0.00592184849459788},{"date":"2016-04-22",".metric":"rmse",".estimator":"standard",".estimate":0.00177747332129503,"x":1461283200000,"y":0.00177747332129503},{"date":"2016-04-25",".metric":"rmse",".estimator":"standard",".estimate":0.000890597896476351,"x":1461542400000,"y":0.000890597896476351},{"date":"2016-04-26",".metric":"rmse",".estimator":"standard",".estimate":0.00109709822713204,"x":1461628800000,"y":0.00109709822713204},{"date":"2016-04-27",".metric":"rmse",".estimator":"standard",".estimate":0.00118245146927174,"x":1461715200000,"y":0.00118245146927174},{"date":"2016-04-28",".metric":"rmse",".estimator":"standard",".estimate":0.00745966954471345,"x":1461801600000,"y":0.00745966954471345},{"date":"2016-04-29",".metric":"rmse",".estimator":"standard",".estimate":0.000640248171194056,"x":1461888000000,"y":0.000640248171194056},{"date":"2016-05-02",".metric":"rmse",".estimator":"standard",".estimate":0.00329397997581203,"x":1462147200000,"y":0.00329397997581203},{"date":"2016-05-03",".metric":"rmse",".estimator":"standard",".estimate":0.00
165645482482795,"x":1462233600000,"y":0.00165645482482795},{"date":"2016-05-04",".metric":"rmse",".estimator":"standard",".estimate":0.00402444273380057,"x":1462320000000,"y":0.00402444273380057},{"date":"2016-05-05",".metric":"rmse",".estimator":"standard",".estimate":0.00185719119331311,"x":1462406400000,"y":0.00185719119331311},{"date":"2016-05-06",".metric":"rmse",".estimator":"standard",".estimate":0.0037362581696451,"x":1462492800000,"y":0.0037362581696451},{"date":"2016-05-09",".metric":"rmse",".estimator":"standard",".estimate":0.00215024100261312,"x":1462752000000,"y":0.00215024100261312},{"date":"2016-05-10",".metric":"rmse",".estimator":"standard",".estimate":0.000153526860089999,"x":1462838400000,"y":0.000153526860089999},{"date":"2016-05-11",".metric":"rmse",".estimator":"standard",".estimate":0.00572802100333414,"x":1462924800000,"y":0.00572802100333414},{"date":"2016-05-12",".metric":"rmse",".estimator":"standard",".estimate":0.000890313377164864,"x":1463011200000,"y":0.000890313377164864},{"date":"2016-05-13",".metric":"rmse",".estimator":"standard",".estimate":0.00170487435991207,"x":1463097600000,"y":0.00170487435991207},{"date":"2016-05-16",".metric":"rmse",".estimator":"standard",".estimate":0.00298076176335253,"x":1463356800000,"y":0.00298076176335253},{"date":"2016-05-17",".metric":"rmse",".estimator":"standard",".estimate":0.00418253200971225,"x":1463443200000,"y":0.00418253200971225},{"date":"2016-05-18",".metric":"rmse",".estimator":"standard",".estimate":0.00309503739952598,"x":1463529600000,"y":0.00309503739952598},{"date":"2016-05-19",".metric":"rmse",".estimator":"standard",".estimate":0.00038459289575377,"x":1463616000000,"y":0.00038459289575377},{"date":"2016-05-20",".metric":"rmse",".estimator":"standard",".estimate":0.0013834793689818,"x":1463702400000,"y":0.0013834793689818},{"date":"2016-05-23",".metric":"rmse",".estimator":"standard",".estimate":0.00208638939394112,"x":1463961600000,"y":0.00208638939394112},{"date":"2016-05-24",".
metric":"rmse",".estimator":"standard",".estimate":0.00205138137682025,"x":1464048000000,"y":0.00205138137682025},{"date":"2016-05-25",".metric":"rmse",".estimator":"standard",".estimate":0.00208963123522787,"x":1464134400000,"y":0.00208963123522787},{"date":"2016-05-26",".metric":"rmse",".estimator":"standard",".estimate":0.00251970710245937,"x":1464220800000,"y":0.00251970710245937},{"date":"2016-05-27",".metric":"rmse",".estimator":"standard",".estimate":0.00326808685249009,"x":1464307200000,"y":0.00326808685249009},{"date":"2016-05-31",".metric":"rmse",".estimator":"standard",".estimate":0.0014940017301349,"x":1464652800000,"y":0.0014940017301349},{"date":"2016-06-01",".metric":"rmse",".estimator":"standard",".estimate":0.00406429020421808,"x":1464739200000,"y":0.00406429020421808},{"date":"2016-06-02",".metric":"rmse",".estimator":"standard",".estimate":0.00315603697795239,"x":1464825600000,"y":0.00315603697795239},{"date":"2016-06-03",".metric":"rmse",".estimator":"standard",".estimate":0.00225870371101233,"x":1464912000000,"y":0.00225870371101233},{"date":"2016-06-06",".metric":"rmse",".estimator":"standard",".estimate":0.00174967185587845,"x":1465171200000,"y":0.00174967185587845},{"date":"2016-06-07",".metric":"rmse",".estimator":"standard",".estimate":0.00126028131051133,"x":1465257600000,"y":0.00126028131051133},{"date":"2016-06-08",".metric":"rmse",".estimator":"standard",".estimate":0.00171757609224733,"x":1465344000000,"y":0.00171757609224733},{"date":"2016-06-09",".metric":"rmse",".estimator":"standard",".estimate":0.00417535879558533,"x":1465430400000,"y":0.00417535879558533},{"date":"2016-06-10",".metric":"rmse",".estimator":"standard",".estimate":0.000213703812772883,"x":1465516800000,"y":0.000213703812772883},{"date":"2016-06-13",".metric":"rmse",".estimator":"standard",".estimate":0.00050605259871416,"x":1465776000000,"y":0.00050605259871416},{"date":"2016-06-14",".metric":"rmse",".estimator":"standard",".estimate":0.00456610427707012,"x":1465862
400000,"y":0.00456610427707012},{"date":"2016-06-15",".metric":"rmse",".estimator":"standard",".estimate":0.00420693469354562,"x":1465948800000,"y":0.00420693469354562},{"date":"2016-06-16",".metric":"rmse",".estimator":"standard",".estimate":0.000952615275277162,"x":1466035200000,"y":0.000952615275277162},{"date":"2016-06-17",".metric":"rmse",".estimator":"standard",".estimate":0.00545205205103526,"x":1466121600000,"y":0.00545205205103526},{"date":"2016-06-20",".metric":"rmse",".estimator":"standard",".estimate":0.00264430798564621,"x":1466380800000,"y":0.00264430798564621},{"date":"2016-06-21",".metric":"rmse",".estimator":"standard",".estimate":0.00162799812729609,"x":1466467200000,"y":0.00162799812729609},{"date":"2016-06-22",".metric":"rmse",".estimator":"standard",".estimate":0.00459564363340417,"x":1466553600000,"y":0.00459564363340417},{"date":"2016-06-23",".metric":"rmse",".estimator":"standard",".estimate":0.00126125717424231,"x":1466640000000,"y":0.00126125717424231},{"date":"2016-06-24",".metric":"rmse",".estimator":"standard",".estimate":0.0218964449621362,"x":1466726400000,"y":0.0218964449621362},{"date":"2016-06-27",".metric":"rmse",".estimator":"standard",".estimate":0.00521052872799053,"x":1466985600000,"y":0.00521052872799053},{"date":"2016-06-28",".metric":"rmse",".estimator":"standard",".estimate":0.00316642968791388,"x":1467072000000,"y":0.00316642968791388},{"date":"2016-06-29",".metric":"rmse",".estimator":"standard",".estimate":0.00620234930872215,"x":1467158400000,"y":0.00620234930872215},{"date":"2016-06-30",".metric":"rmse",".estimator":"standard",".estimate":0.00408141724618029,"x":1467244800000,"y":0.00408141724618029},{"date":"2016-07-01",".metric":"rmse",".estimator":"standard",".estimate":0.00160529542707455,"x":1467331200000,"y":0.00160529542707455},{"date":"2016-07-05",".metric":"rmse",".estimator":"standard",".estimate":0.000420495638959605,"x":1467676800000,"y":0.000420495638959605},{"date":"2016-07-06",".metric":"rmse",".estimato
r":"standard",".estimate":0.00569225319450741,"x":1467763200000,"y":0.00569225319450741},{"date":"2016-07-07",".metric":"rmse",".estimator":"standard",".estimate":0.000777195035299231,"x":1467849600000,"y":0.000777195035299231},{"date":"2016-07-08",".metric":"rmse",".estimator":"standard",".estimate":0.00773284183498263,"x":1467936000000,"y":0.00773284183498263},{"date":"2016-07-11",".metric":"rmse",".estimator":"standard",".estimate":0.00225238666352208,"x":1468195200000,"y":0.00225238666352208},{"date":"2016-07-12",".metric":"rmse",".estimator":"standard",".estimate":0.00202290859257446,"x":1468281600000,"y":0.00202290859257446},{"date":"2016-07-13",".metric":"rmse",".estimator":"standard",".estimate":0.000245693389992942,"x":1468368000000,"y":0.000245693389992942},{"date":"2016-07-14",".metric":"rmse",".estimator":"standard",".estimate":0.00152066896876954,"x":1468454400000,"y":0.00152066896876954},{"date":"2016-07-15",".metric":"rmse",".estimator":"standard",".estimate":0.00078254577677485,"x":1468540800000,"y":0.00078254577677485},{"date":"2016-07-18",".metric":"rmse",".estimator":"standard",".estimate":0.00020540843385228,"x":1468800000000,"y":0.00020540843385228},{"date":"2016-07-19",".metric":"rmse",".estimator":"standard",".estimate":0.000109756098568486,"x":1468886400000,"y":0.000109756098568486},{"date":"2016-07-20",".metric":"rmse",".estimator":"standard",".estimate":0.000459558969443395,"x":1468972800000,"y":0.000459558969443395},{"date":"2016-07-21",".metric":"rmse",".estimator":"standard",".estimate":0.0025596046571624,"x":1469059200000,"y":0.0025596046571624},{"date":"2016-07-22",".metric":"rmse",".estimator":"standard",".estimate":0.00159646668902795,"x":1469145600000,"y":0.00159646668902795},{"date":"2016-07-25",".metric":"rmse",".estimator":"standard",".estimate":0.000752919943341165,"x":1469404800000,"y":0.000752919943341165},{"date":"2016-07-26",".metric":"rmse",".estimator":"standard",".estimate":0.000342669890539401,"x":1469491200000,"y":0.000
342669890539401},{"date":"2016-07-27",".metric":"rmse",".estimator":"standard",".estimate":0.00160685797048586,"x":1469577600000,"y":0.00160685797048586},{"date":"2016-07-28",".metric":"rmse",".estimator":"standard",".estimate":1.32340893194056e-05,"x":1469664000000,"y":1.32340893194056e-05},{"date":"2016-07-29",".metric":"rmse",".estimator":"standard",".estimate":0.0053179992417029,"x":1469750400000,"y":0.0053179992417029},{"date":"2016-08-01",".metric":"rmse",".estimator":"standard",".estimate":0.00251111239187018,"x":1470009600000,"y":0.00251111239187018},{"date":"2016-08-02",".metric":"rmse",".estimator":"standard",".estimate":0.00120298550294066,"x":1470096000000,"y":0.00120298550294066},{"date":"2016-08-03",".metric":"rmse",".estimator":"standard",".estimate":0.00215962728514518,"x":1470182400000,"y":0.00215962728514518},{"date":"2016-08-04",".metric":"rmse",".estimator":"standard",".estimate":0.000265387922839076,"x":1470268800000,"y":0.000265387922839076},{"date":"2016-08-05",".metric":"rmse",".estimator":"standard",".estimate":0.00330781285050998,"x":1470355200000,"y":0.00330781285050998},{"date":"2016-08-08",".metric":"rmse",".estimator":"standard",".estimate":0.00111069887988652,"x":1470614400000,"y":0.00111069887988652},{"date":"2016-08-09",".metric":"rmse",".estimator":"standard",".estimate":0.00205034315014647,"x":1470700800000,"y":0.00205034315014647},{"date":"2016-08-10",".metric":"rmse",".estimator":"standard",".estimate":0.00268895250401861,"x":1470787200000,"y":0.00268895250401861},{"date":"2016-08-11",".metric":"rmse",".estimator":"standard",".estimate":0.000442165938877304,"x":1470873600000,"y":0.000442165938877304},{"date":"2016-08-12",".metric":"rmse",".estimator":"standard",".estimate":0.00125028667314022,"x":1470960000000,"y":0.00125028667314022},{"date":"2016-08-15",".metric":"rmse",".estimator":"standard",".estimate":0.00276551418131063,"x":1471219200000,"y":0.00276551418131063},{"date":"2016-08-16",".metric":"rmse",".estimator":"standard"
,".estimate":0.000826884946007651,"x":1471305600000,"y":0.000826884946007651},{"date":"2016-08-17",".metric":"rmse",".estimator":"standard",".estimate":0.000134187926020832,"x":1471392000000,"y":0.000134187926020832},{"date":"2016-08-18",".metric":"rmse",".estimator":"standard",".estimate":0.00232007960670281,"x":1471478400000,"y":0.00232007960670281},{"date":"2016-08-19",".metric":"rmse",".estimator":"standard",".estimate":0.00296264890988167,"x":1471564800000,"y":0.00296264890988167},{"date":"2016-08-22",".metric":"rmse",".estimator":"standard",".estimate":0.000720129925662719,"x":1471824000000,"y":0.000720129925662719},{"date":"2016-08-23",".metric":"rmse",".estimator":"standard",".estimate":0.000507613820433002,"x":1471910400000,"y":0.000507613820433002},{"date":"2016-08-24",".metric":"rmse",".estimator":"standard",".estimate":0.00217257880341865,"x":1471996800000,"y":0.00217257880341865},{"date":"2016-08-25",".metric":"rmse",".estimator":"standard",".estimate":0.000613887044415466,"x":1472083200000,"y":0.000613887044415466},{"date":"2016-08-26",".metric":"rmse",".estimator":"standard",".estimate":0.00178061008027771,"x":1472169600000,"y":0.00178061008027771},{"date":"2016-08-29",".metric":"rmse",".estimator":"standard",".estimate":0.00116055813386945,"x":1472428800000,"y":0.00116055813386945},{"date":"2016-08-30",".metric":"rmse",".estimator":"standard",".estimate":0.00292582593184027,"x":1472515200000,"y":0.00292582593184027},{"date":"2016-08-31",".metric":"rmse",".estimator":"standard",".estimate":0.00197813491112788,"x":1472601600000,"y":0.00197813491112788},{"date":"2016-09-01",".metric":"rmse",".estimator":"standard",".estimate":0.00104498304856213,"x":1472688000000,"y":0.00104498304856213},{"date":"2016-09-02",".metric":"rmse",".estimator":"standard",".estimate":0.00263560444221595,"x":1472774400000,"y":0.00263560444221595},{"date":"2016-09-06",".metric":"rmse",".estimator":"standard",".estimate":0.000738604008634933,"x":1473120000000,"y":0.00073860400863
4933},{"date":"2016-09-07",".metric":"rmse",".estimator":"standard",".estimate":0.000751260854430958,"x":1473206400000,"y":0.000751260854430958},{"date":"2016-09-08",".metric":"rmse",".estimator":"standard",".estimate":0.00136297845744431,"x":1473292800000,"y":0.00136297845744431},{"date":"2016-09-09",".metric":"rmse",".estimator":"standard",".estimate":0.0171051525717296,"x":1473379200000,"y":0.0171051525717296},{"date":"2016-09-12",".metric":"rmse",".estimator":"standard",".estimate":0.00876376658570854,"x":1473638400000,"y":0.00876376658570854},{"date":"2016-09-13",".metric":"rmse",".estimator":"standard",".estimate":0.00740929065252387,"x":1473724800000,"y":0.00740929065252387},{"date":"2016-09-14",".metric":"rmse",".estimator":"standard",".estimate":0.000285211158969392,"x":1473811200000,"y":0.000285211158969392},{"date":"2016-09-15",".metric":"rmse",".estimator":"standard",".estimate":0.0041449808882293,"x":1473897600000,"y":0.0041449808882293},{"date":"2016-09-16",".metric":"rmse",".estimator":"standard",".estimate":0.00137146529713595,"x":1473984000000,"y":0.00137146529713595},{"date":"2016-09-19",".metric":"rmse",".estimator":"standard",".estimate":0.00151473219216688,"x":1474243200000,"y":0.00151473219216688},{"date":"2016-09-20",".metric":"rmse",".estimator":"standard",".estimate":0.00279413362661776,"x":1474329600000,"y":0.00279413362661776},{"date":"2016-09-21",".metric":"rmse",".estimator":"standard",".estimate":0.0041009394467729,"x":1474416000000,"y":0.0041009394467729},{"date":"2016-09-22",".metric":"rmse",".estimator":"standard",".estimate":0.000115329292514692,"x":1474502400000,"y":0.000115329292514692},{"date":"2016-09-23",".metric":"rmse",".estimator":"standard",".estimate":0.00122090649588156,"x":1474588800000,"y":0.00122090649588156},{"date":"2016-09-26",".metric":"rmse",".estimator":"standard",".estimate":0.00314035666753001,"x":1474848000000,"y":0.00314035666753001},{"date":"2016-09-27",".metric":"rmse",".estimator":"standard",".estimate":0.
00317780922371701,"x":1474934400000,"y":0.00317780922371701},{"date":"2016-09-28",".metric":"rmse",".estimator":"standard",".estimate":0.00482269019241134,"x":1475020800000,"y":0.00482269019241134},{"date":"2016-09-29",".metric":"rmse",".estimator":"standard",".estimate":0.000353841625704364,"x":1475107200000,"y":0.000353841625704364},{"date":"2016-09-30",".metric":"rmse",".estimator":"standard",".estimate":0.00314513811300813,"x":1475193600000,"y":0.00314513811300813},{"date":"2016-10-03",".metric":"rmse",".estimator":"standard",".estimate":0.00135476344691113,"x":1475452800000,"y":0.00135476344691113},{"date":"2016-10-04",".metric":"rmse",".estimator":"standard",".estimate":0.00550845521949076,"x":1475539200000,"y":0.00550845521949076},{"date":"2016-10-05",".metric":"rmse",".estimator":"standard",".estimate":0.00354310101259846,"x":1475625600000,"y":0.00354310101259846},{"date":"2016-10-06",".metric":"rmse",".estimator":"standard",".estimate":0.00113420274953901,"x":1475712000000,"y":0.00113420274953901},{"date":"2016-10-07",".metric":"rmse",".estimator":"standard",".estimate":0.000533430818523165,"x":1475798400000,"y":0.000533430818523165},{"date":"2016-10-10",".metric":"rmse",".estimator":"standard",".estimate":0.000994234577704675,"x":1476057600000,"y":0.000994234577704675},{"date":"2016-10-11",".metric":"rmse",".estimator":"standard",".estimate":0.00544168235562282,"x":1476144000000,"y":0.00544168235562282},{"date":"2016-10-12",".metric":"rmse",".estimator":"standard",".estimate":0.000794636237756211,"x":1476230400000,"y":0.000794636237756211},{"date":"2016-10-13",".metric":"rmse",".estimator":"standard",".estimate":0.00141791000631537,"x":1476316800000,"y":0.00141791000631537},{"date":"2016-10-14",".metric":"rmse",".estimator":"standard",".estimate":0.00167771170140043,"x":1476403200000,"y":0.00167771170140043},{"date":"2016-10-17",".metric":"rmse",".estimator":"standard",".estimate":0.00128606775744151,"x":1476662400000,"y":0.00128606775744151},{"date":"2016
-10-18",".metric":"rmse",".estimator":"standard",".estimate":0.000696321255851035,"x":1476748800000,"y":0.000696321255851035},{"date":"2016-10-19",".metric":"rmse",".estimator":"standard",".estimate":0.00106988912544378,"x":1476835200000,"y":0.00106988912544378},{"date":"2016-10-20",".metric":"rmse",".estimator":"standard",".estimate":0.00195601674382502,"x":1476921600000,"y":0.00195601674382502},{"date":"2016-10-21",".metric":"rmse",".estimator":"standard",".estimate":0.00151595275158728,"x":1477008000000,"y":0.00151595275158728},{"date":"2016-10-24",".metric":"rmse",".estimator":"standard",".estimate":0.000254057182805282,"x":1477267200000,"y":0.000254057182805282},{"date":"2016-10-25",".metric":"rmse",".estimator":"standard",".estimate":0.00269233868707742,"x":1477353600000,"y":0.00269233868707742},{"date":"2016-10-26",".metric":"rmse",".estimator":"standard",".estimate":0.00250671120842745,"x":1477440000000,"y":0.00250671120842745},{"date":"2016-10-27",".metric":"rmse",".estimator":"standard",".estimate":0.00295175598622102,"x":1477526400000,"y":0.00295175598622102},{"date":"2016-10-28",".metric":"rmse",".estimator":"standard",".estimate":0.00292179672571274,"x":1477612800000,"y":0.00292179672571274},{"date":"2016-10-31",".metric":"rmse",".estimator":"standard",".estimate":0.0021935037024321,"x":1477872000000,"y":0.0021935037024321},{"date":"2016-11-01",".metric":"rmse",".estimator":"standard",".estimate":0.00434584529735663,"x":1477958400000,"y":0.00434584529735663},{"date":"2016-11-02",".metric":"rmse",".estimator":"standard",".estimate":0.00188345490020019,"x":1478044800000,"y":0.00188345490020019},{"date":"2016-11-03",".metric":"rmse",".estimator":"standard",".estimate":0.00288298366891027,"x":1478131200000,"y":0.00288298366891027},{"date":"2016-11-04",".metric":"rmse",".estimator":"standard",".estimate":0.00432576156597594,"x":1478217600000,"y":0.00432576156597594},{"date":"2016-11-07",".metric":"rmse",".estimator":"standard",".estimate":0.00750457832153818
,"x":1478476800000,"y":0.00750457832153818},{"date":"2016-11-08",".metric":"rmse",".estimator":"standard",".estimate":0.00143570899145871,"x":1478563200000,"y":0.00143570899145871},{"date":"2016-11-09",".metric":"rmse",".estimator":"standard",".estimate":0.00872335368942581,"x":1478649600000,"y":0.00872335368942581},{"date":"2016-11-10",".metric":"rmse",".estimator":"standard",".estimate":0.00130036512165486,"x":1478736000000,"y":0.00130036512165486},{"date":"2016-11-11",".metric":"rmse",".estimator":"standard",".estimate":0.000188960934723348,"x":1478822400000,"y":0.000188960934723348},{"date":"2016-11-14",".metric":"rmse",".estimator":"standard",".estimate":0.00374913684516115,"x":1479081600000,"y":0.00374913684516115},{"date":"2016-11-15",".metric":"rmse",".estimator":"standard",".estimate":0.00218357763184909,"x":1479168000000,"y":0.00218357763184909},{"date":"2016-11-16",".metric":"rmse",".estimator":"standard",".estimate":0.00238196498526625,"x":1479254400000,"y":0.00238196498526625},{"date":"2016-11-17",".metric":"rmse",".estimator":"standard",".estimate":0.000938810312724679,"x":1479340800000,"y":0.000938810312724679},{"date":"2016-11-18",".metric":"rmse",".estimator":"standard",".estimate":0.00130263392187033,"x":1479427200000,"y":0.00130263392187033},{"date":"2016-11-21",".metric":"rmse",".estimator":"standard",".estimate":0.00114916938760177,"x":1479686400000,"y":0.00114916938760177},{"date":"2016-11-22",".metric":"rmse",".estimator":"standard",".estimate":0.000501342327058653,"x":1479772800000,"y":0.000501342327058653},{"date":"2016-11-23",".metric":"rmse",".estimator":"standard",".estimate":0.00190004576015157,"x":1479859200000,"y":0.00190004576015157},{"date":"2016-11-25",".metric":"rmse",".estimator":"standard",".estimate":0.0012117898747769,"x":1480032000000,"y":0.0012117898747769},{"date":"2016-11-28",".metric":"rmse",".estimator":"standard",".estimate":0.00142795890448438,"x":1480291200000,"y":0.00142795890448438},{"date":"2016-11-29",".metric":"rm
se",".estimator":"standard",".estimate":0.00268304157491472,"x":1480377600000,"y":0.00268304157491472},{"date":"2016-11-30",".metric":"rmse",".estimator":"standard",".estimate":0.00148940303325588,"x":1480464000000,"y":0.00148940303325588},{"date":"2016-12-01",".metric":"rmse",".estimator":"standard",".estimate":0.000437561464797262,"x":1480550400000,"y":0.000437561464797262},{"date":"2016-12-02",".metric":"rmse",".estimator":"standard",".estimate":0.00181584516009029,"x":1480636800000,"y":0.00181584516009029},{"date":"2016-12-05",".metric":"rmse",".estimator":"standard",".estimate":0.00350600101325258,"x":1480896000000,"y":0.00350600101325258},{"date":"2016-12-06",".metric":"rmse",".estimator":"standard",".estimate":0.000751234918225076,"x":1480982400000,"y":0.000751234918225076},{"date":"2016-12-07",".metric":"rmse",".estimator":"standard",".estimate":0.00288827459285574,"x":1481068800000,"y":0.00288827459285574},{"date":"2016-12-08",".metric":"rmse",".estimator":"standard",".estimate":0.000465255667712638,"x":1481155200000,"y":0.000465255667712638},{"date":"2016-12-09",".metric":"rmse",".estimator":"standard",".estimate":0.0041919487983297,"x":1481241600000,"y":0.0041919487983297},{"date":"2016-12-12",".metric":"rmse",".estimator":"standard",".estimate":0.00207827225774742,"x":1481500800000,"y":0.00207827225774742},{"date":"2016-12-13",".metric":"rmse",".estimator":"standard",".estimate":0.00397264757248993,"x":1481587200000,"y":0.00397264757248993},{"date":"2016-12-14",".metric":"rmse",".estimator":"standard",".estimate":0.0022687501699622,"x":1481673600000,"y":0.0022687501699622},{"date":"2016-12-15",".metric":"rmse",".estimator":"standard",".estimate":0.00397321228206072,"x":1481760000000,"y":0.00397321228206072},{"date":"2016-12-16",".metric":"rmse",".estimator":"standard",".estimate":0.00106538653048893,"x":1481846400000,"y":0.00106538653048893},{"date":"2016-12-19",".metric":"rmse",".estimator":"standard",".estimate":0.00352588010170244,"x":1482105600000,"y
":0.00352588010170244},{"date":"2016-12-20",".metric":"rmse",".estimator":"standard",".estimate":0.00167888204009601,"x":1482192000000,"y":0.00167888204009601},{"date":"2016-12-21",".metric":"rmse",".estimator":"standard",".estimate":0.00175087061937348,"x":1482278400000,"y":0.00175087061937348},{"date":"2016-12-22",".metric":"rmse",".estimator":"standard",".estimate":0.00185851169646374,"x":1482364800000,"y":0.00185851169646374},{"date":"2016-12-23",".metric":"rmse",".estimator":"standard",".estimate":0.00175558003853289,"x":1482451200000,"y":0.00175558003853289},{"date":"2016-12-27",".metric":"rmse",".estimator":"standard",".estimate":0.00195891830748184,"x":1482796800000,"y":0.00195891830748184},{"date":"2016-12-28",".metric":"rmse",".estimator":"standard",".estimate":0.00396659321091997,"x":1482883200000,"y":0.00396659321091997},{"date":"2016-12-29",".metric":"rmse",".estimator":"standard",".estimate":0.000725011466779303,"x":1482969600000,"y":0.000725011466779303},{"date":"2016-12-30",".metric":"rmse",".estimator":"standard",".estimate":0.00176939826781894,"x":1483056000000,"y":0.00176939826781894},{"date":"2017-01-03",".metric":"rmse",".estimator":"standard",".estimate":0.00220068192788672,"x":1483401600000,"y":0.00220068192788672},{"date":"2017-01-04",".metric":"rmse",".estimator":"standard",".estimate":0.000642315768222612,"x":1483488000000,"y":0.000642315768222612},{"date":"2017-01-05",".metric":"rmse",".estimator":"standard",".estimate":0.00561006432182169,"x":1483574400000,"y":0.00561006432182169},{"date":"2017-01-06",".metric":"rmse",".estimator":"standard",".estimate":0.00157459366482963,"x":1483660800000,"y":0.00157459366482963},{"date":"2017-01-09",".metric":"rmse",".estimator":"standard",".estimate":0.00192663683645288,"x":1483920000000,"y":0.00192663683645288},{"date":"2017-01-10",".metric":"rmse",".estimator":"standard",".estimate":0.000578662280016658,"x":1484006400000,"y":0.000578662280016658},{"date":"2017-01-11",".metric":"rmse",".estimator":"s
tandard",".estimate":0.000800513789868231,"x":1484092800000,"y":0.000800513789868231},{"date":"2017-01-12",".metric":"rmse",".estimator":"standard",".estimate":0.000500465543224869,"x":1484179200000,"y":0.000500465543224869},{"date":"2017-01-13",".metric":"rmse",".estimator":"standard",".estimate":0.00103834851978848,"x":1484265600000,"y":0.00103834851978848},{"date":"2017-01-17",".metric":"rmse",".estimator":"standard",".estimate":0.00383031801301423,"x":1484611200000,"y":0.00383031801301423},{"date":"2017-01-18",".metric":"rmse",".estimator":"standard",".estimate":0.00286167031512328,"x":1484697600000,"y":0.00286167031512328},{"date":"2017-01-19",".metric":"rmse",".estimator":"standard",".estimate":0.00328813540835988,"x":1484784000000,"y":0.00328813540835988},{"date":"2017-01-20",".metric":"rmse",".estimator":"standard",".estimate":0.000876948327263606,"x":1484870400000,"y":0.000876948327263606},{"date":"2017-01-23",".metric":"rmse",".estimator":"standard",".estimate":0.00156888230108249,"x":1485129600000,"y":0.00156888230108249},{"date":"2017-01-24",".metric":"rmse",".estimator":"standard",".estimate":0.0032166393991214,"x":1485216000000,"y":0.0032166393991214},{"date":"2017-01-25",".metric":"rmse",".estimator":"standard",".estimate":0.00217879048330384,"x":1485302400000,"y":0.00217879048330384},{"date":"2017-01-26",".metric":"rmse",".estimator":"standard",".estimate":0.00208204732072695,"x":1485388800000,"y":0.00208204732072695},{"date":"2017-01-27",".metric":"rmse",".estimator":"standard",".estimate":0.000599960726891122,"x":1485475200000,"y":0.000599960726891122},{"date":"2017-01-30",".metric":"rmse",".estimator":"standard",".estimate":0.000950789978650351,"x":1485734400000,"y":0.000950789978650351},{"date":"2017-01-31",".metric":"rmse",".estimator":"standard",".estimate":0.000507809258160204,"x":1485820800000,"y":0.000507809258160204},{"date":"2017-02-01",".metric":"rmse",".estimator":"standard",".estimate":0.00110790893937926,"x":1485907200000,"y":0.0011079
0893937926},{"date":"2017-02-02",".metric":"rmse",".estimator":"standard",".estimate":0.00220632295079328,"x":1485993600000,"y":0.00220632295079328},{"date":"2017-02-03",".metric":"rmse",".estimator":"standard",".estimate":0.00190173837291212,"x":1486080000000,"y":0.00190173837291212},{"date":"2017-02-06",".metric":"rmse",".estimator":"standard",".estimate":0.000283413885439519,"x":1486339200000,"y":0.000283413885439519},{"date":"2017-02-07",".metric":"rmse",".estimator":"standard",".estimate":0.000110672502792694,"x":1486425600000,"y":0.000110672502792694},{"date":"2017-02-08",".metric":"rmse",".estimator":"standard",".estimate":0.00105880448341185,"x":1486512000000,"y":0.00105880448341185},{"date":"2017-02-09",".metric":"rmse",".estimator":"standard",".estimate":0.00166694677845329,"x":1486598400000,"y":0.00166694677845329},{"date":"2017-02-10",".metric":"rmse",".estimator":"standard",".estimate":0.000220772382538859,"x":1486684800000,"y":0.000220772382538859},{"date":"2017-02-13",".metric":"rmse",".estimator":"standard",".estimate":0.000249209801303179,"x":1486944000000,"y":0.000249209801303179},{"date":"2017-02-14",".metric":"rmse",".estimator":"standard",".estimate":0.00431022211578958,"x":1487030400000,"y":0.00431022211578958},{"date":"2017-02-15",".metric":"rmse",".estimator":"standard",".estimate":0.00103493868767278,"x":1487116800000,"y":0.00103493868767278},{"date":"2017-02-16",".metric":"rmse",".estimator":"standard",".estimate":0.00205249723613406,"x":1487203200000,"y":0.00205249723613406},{"date":"2017-02-17",".metric":"rmse",".estimator":"standard",".estimate":0.00350548252229491,"x":1487289600000,"y":0.00350548252229491},{"date":"2017-02-21",".metric":"rmse",".estimator":"standard",".estimate":0.00119927435630245,"x":1487635200000,"y":0.00119927435630245},{"date":"2017-02-22",".metric":"rmse",".estimator":"standard",".estimate":0.00130648950199573,"x":1487721600000,"y":0.00130648950199573},{"date":"2017-02-23",".metric":"rmse",".estimator":"standard",
".estimate":0.000390169797896131,"x":1487808000000,"y":0.000390169797896131},{"date":"2017-02-24",".metric":"rmse",".estimator":"standard",".estimate":0.00299172731009862,"x":1487894400000,"y":0.00299172731009862},{"date":"2017-02-27",".metric":"rmse",".estimator":"standard",".estimate":0.00296701477459117,"x":1488153600000,"y":0.00296701477459117},{"date":"2017-02-28",".metric":"rmse",".estimator":"standard",".estimate":0.00207270660961641,"x":1488240000000,"y":0.00207270660961641},{"date":"2017-03-01",".metric":"rmse",".estimator":"standard",".estimate":0.00365719805722696,"x":1488326400000,"y":0.00365719805722696},{"date":"2017-03-02",".metric":"rmse",".estimator":"standard",".estimate":0.00223678026754732,"x":1488412800000,"y":0.00223678026754732},{"date":"2017-03-03",".metric":"rmse",".estimator":"standard",".estimate":0.00435131557483537,"x":1488499200000,"y":0.00435131557483537},{"date":"2017-03-06",".metric":"rmse",".estimator":"standard",".estimate":0.00028169879948786,"x":1488758400000,"y":0.00028169879948786},{"date":"2017-03-07",".metric":"rmse",".estimator":"standard",".estimate":0.00113239477279367,"x":1488844800000,"y":0.00113239477279367},{"date":"2017-03-08",".metric":"rmse",".estimator":"standard",".estimate":0.00086881789580495,"x":1488931200000,"y":0.00086881789580495},{"date":"2017-03-09",".metric":"rmse",".estimator":"standard",".estimate":0.000789334630958708,"x":1489017600000,"y":0.000789334630958708},{"date":"2017-03-10",".metric":"rmse",".estimator":"standard",".estimate":0.00118177999223406,"x":1489104000000,"y":0.00118177999223406},{"date":"2017-03-13",".metric":"rmse",".estimator":"standard",".estimate":0.000374107274594283,"x":1489363200000,"y":0.000374107274594283},{"date":"2017-03-14",".metric":"rmse",".estimator":"standard",".estimate":0.000978798772025025,"x":1489449600000,"y":0.000978798772025025},{"date":"2017-03-15",".metric":"rmse",".estimator":"standard",".estimate":0.00267634984239615,"x":1489536000000,"y":0.00267634984239615}
,{"date":"2017-03-16",".metric":"rmse",".estimator":"standard",".estimate":0.00548569083226419,"x":1489622400000,"y":0.00548569083226419},{"date":"2017-03-17",".metric":"rmse",".estimator":"standard",".estimate":0.000924198492296146,"x":1489708800000,"y":0.000924198492296146},{"date":"2017-03-20",".metric":"rmse",".estimator":"standard",".estimate":0.00049963095661782,"x":1489968000000,"y":0.00049963095661782},{"date":"2017-03-21",".metric":"rmse",".estimator":"standard",".estimate":0.00733028421635421,"x":1490054400000,"y":0.00733028421635421},{"date":"2017-03-22",".metric":"rmse",".estimator":"standard",".estimate":0.00131465365401076,"x":1490140800000,"y":0.00131465365401076},{"date":"2017-03-23",".metric":"rmse",".estimator":"standard",".estimate":0.00131953505018048,"x":1490227200000,"y":0.00131953505018048},{"date":"2017-03-24",".metric":"rmse",".estimator":"standard",".estimate":0.00138218647588729,"x":1490313600000,"y":0.00138218647588729},{"date":"2017-03-27",".metric":"rmse",".estimator":"standard",".estimate":0.00191462925483187,"x":1490572800000,"y":0.00191462925483187},{"date":"2017-03-28",".metric":"rmse",".estimator":"standard",".estimate":0.00273949165867805,"x":1490659200000,"y":0.00273949165867805},{"date":"2017-03-29",".metric":"rmse",".estimator":"standard",".estimate":0.000822054788003299,"x":1490745600000,"y":0.000822054788003299},{"date":"2017-03-30",".metric":"rmse",".estimator":"standard",".estimate":0.00322639817543668,"x":1490832000000,"y":0.00322639817543668},{"date":"2017-03-31",".metric":"rmse",".estimator":"standard",".estimate":0.00068188422327266,"x":1490918400000,"y":0.00068188422327266},{"date":"2017-04-03",".metric":"rmse",".estimator":"standard",".estimate":0.000544672589532932,"x":1491177600000,"y":0.000544672589532932},{"date":"2017-04-04",".metric":"rmse",".estimator":"standard",".estimate":0.000854233252933708,"x":1491264000000,"y":0.000854233252933708},{"date":"2017-04-05",".metric":"rmse",".estimator":"standard",".estimate"
:0.00100477045596371,"x":1491350400000,"y":0.00100477045596371},{"date":"2017-04-06",".metric":"rmse",".estimator":"standard",".estimate":0.00215866438382077,"x":1491436800000,"y":0.00215866438382077},{"date":"2017-04-07",".metric":"rmse",".estimator":"standard",".estimate":0.000865894966884134,"x":1491523200000,"y":0.000865894966884134},{"date":"2017-04-10",".metric":"rmse",".estimator":"standard",".estimate":0.00118363053690387,"x":1491782400000,"y":0.00118363053690387},{"date":"2017-04-11",".metric":"rmse",".estimator":"standard",".estimate":0.000494117441836051,"x":1491868800000,"y":0.000494117441836051},{"date":"2017-04-12",".metric":"rmse",".estimator":"standard",".estimate":0.00380490458823101,"x":1491955200000,"y":0.00380490458823101},{"date":"2017-04-13",".metric":"rmse",".estimator":"standard",".estimate":0.000117661808128848,"x":1492041600000,"y":0.000117661808128848},{"date":"2017-04-17",".metric":"rmse",".estimator":"standard",".estimate":0.0049480019472521,"x":1492387200000,"y":0.0049480019472521},{"date":"2017-04-18",".metric":"rmse",".estimator":"standard",".estimate":0.00106821779106287,"x":1492473600000,"y":0.00106821779106287},{"date":"2017-04-19",".metric":"rmse",".estimator":"standard",".estimate":0.000438994283468146,"x":1492560000000,"y":0.000438994283468146},{"date":"2017-04-20",".metric":"rmse",".estimator":"standard",".estimate":0.00253151193669525,"x":1492646400000,"y":0.00253151193669525},{"date":"2017-04-21",".metric":"rmse",".estimator":"standard",".estimate":0.00166868348772587,"x":1492732800000,"y":0.00166868348772587},{"date":"2017-04-24",".metric":"rmse",".estimator":"standard",".estimate":9.66476184357384e-06,"x":1492992000000,"y":9.66476184357384e-06},{"date":"2017-04-25",".metric":"rmse",".estimator":"standard",".estimate":0.000600998687300889,"x":1493078400000,"y":0.000600998687300889},{"date":"2017-04-26",".metric":"rmse",".estimator":"standard",".estimate":0.000771834275634839,"x":1493164800000,"y":0.000771834275634839},{"date
":"2017-04-27",".metric":"rmse",".estimator":"standard",".estimate":0.0032970150530321,"x":1493251200000,"y":0.0032970150530321},{"date":"2017-04-28",".metric":"rmse",".estimator":"standard",".estimate":0.000474000982886702,"x":1493337600000,"y":0.000474000982886702},{"date":"2017-05-01",".metric":"rmse",".estimator":"standard",".estimate":0.00209574437391007,"x":1493596800000,"y":0.00209574437391007},{"date":"2017-05-02",".metric":"rmse",".estimator":"standard",".estimate":0.00316517891181679,"x":1493683200000,"y":0.00316517891181679},{"date":"2017-05-03",".metric":"rmse",".estimator":"standard",".estimate":0.000206302215708535,"x":1493769600000,"y":0.000206302215708535},{"date":"2017-05-04",".metric":"rmse",".estimator":"standard",".estimate":0.00210164345440232,"x":1493856000000,"y":0.00210164345440232},{"date":"2017-05-05",".metric":"rmse",".estimator":"standard",".estimate":0.00154657108301496,"x":1493942400000,"y":0.00154657108301496},{"date":"2017-05-08",".metric":"rmse",".estimator":"standard",".estimate":0.0017768859329387,"x":1494201600000,"y":0.0017768859329387},{"date":"2017-05-09",".metric":"rmse",".estimator":"standard",".estimate":0.000473959293859324,"x":1494288000000,"y":0.000473959293859324},{"date":"2017-05-10",".metric":"rmse",".estimator":"standard",".estimate":0.000750032628056844,"x":1494374400000,"y":0.000750032628056844},{"date":"2017-05-11",".metric":"rmse",".estimator":"standard",".estimate":0.00156257968148124,"x":1494460800000,"y":0.00156257968148124},{"date":"2017-05-12",".metric":"rmse",".estimator":"standard",".estimate":0.00211340142295421,"x":1494547200000,"y":0.00211340142295421},{"date":"2017-05-15",".metric":"rmse",".estimator":"standard",".estimate":0.000190435030948505,"x":1494806400000,"y":0.000190435030948505},{"date":"2017-05-16",".metric":"rmse",".estimator":"standard",".estimate":0.00220134627792209,"x":1494892800000,"y":0.00220134627792209},{"date":"2017-05-17",".metric":"rmse",".estimator":"standard",".estimate":0.009205
13582953843,"x":1494979200000,"y":0.00920513582953843},{"date":"2017-05-18",".metric":"rmse",".estimator":"standard",".estimate":0.00406630081956308,"x":1495065600000,"y":0.00406630081956308},{"date":"2017-05-19",".metric":"rmse",".estimator":"standard",".estimate":6.73730041831069e-05,"x":1495152000000,"y":6.73730041831069e-05},{"date":"2017-05-22",".metric":"rmse",".estimator":"standard",".estimate":0.00395531164381844,"x":1495411200000,"y":0.00395531164381844},{"date":"2017-05-23",".metric":"rmse",".estimator":"standard",".estimate":0.00194058527039616,"x":1495497600000,"y":0.00194058527039616},{"date":"2017-05-24",".metric":"rmse",".estimator":"standard",".estimate":0.00164017589020188,"x":1495584000000,"y":0.00164017589020188},{"date":"2017-05-25",".metric":"rmse",".estimator":"standard",".estimate":0.00473730155666312,"x":1495670400000,"y":0.00473730155666312},{"date":"2017-05-26",".metric":"rmse",".estimator":"standard",".estimate":0.00184065102288655,"x":1495756800000,"y":0.00184065102288655},{"date":"2017-05-30",".metric":"rmse",".estimator":"standard",".estimate":0.00250601739829797,"x":1496102400000,"y":0.00250601739829797},{"date":"2017-05-31",".metric":"rmse",".estimator":"standard",".estimate":0.000531071324415682,"x":1496188800000,"y":0.000531071324415682},{"date":"2017-06-01",".metric":"rmse",".estimator":"standard",".estimate":0.00315499368568156,"x":1496275200000,"y":0.00315499368568156},{"date":"2017-06-02",".metric":"rmse",".estimator":"standard",".estimate":0.00149827419496544,"x":1496361600000,"y":0.00149827419496544},{"date":"2017-06-05",".metric":"rmse",".estimator":"standard",".estimate":0.000224562299631777,"x":1496620800000,"y":0.000224562299631777},{"date":"2017-06-06",".metric":"rmse",".estimator":"standard",".estimate":0.00490647578662081,"x":1496707200000,"y":0.00490647578662081},{"date":"2017-06-07",".metric":"rmse",".estimator":"standard",".estimate":0.00181778717567682,"x":1496793600000,"y":0.00181778717567682},{"date":"2017-06-08",
".metric":"rmse",".estimator":"standard",".estimate":0.00150359110204811,"x":1496880000000,"y":0.00150359110204811},{"date":"2017-06-09",".metric":"rmse",".estimator":"standard",".estimate":0.000589126233841853,"x":1496966400000,"y":0.000589126233841853},{"date":"2017-06-12",".metric":"rmse",".estimator":"standard",".estimate":0.000785267452530611,"x":1497225600000,"y":0.000785267452530611},{"date":"2017-06-13",".metric":"rmse",".estimator":"standard",".estimate":0.00032663222694752,"x":1497312000000,"y":0.00032663222694752},{"date":"2017-06-14",".metric":"rmse",".estimator":"standard",".estimate":0.00193962443631849,"x":1497398400000,"y":0.00193962443631849},{"date":"2017-06-15",".metric":"rmse",".estimator":"standard",".estimate":0.00178480116194402,"x":1497484800000,"y":0.00178480116194402},{"date":"2017-06-16",".metric":"rmse",".estimator":"standard",".estimate":0.00227838045911216,"x":1497571200000,"y":0.00227838045911216},{"date":"2017-06-19",".metric":"rmse",".estimator":"standard",".estimate":0.00281397691092402,"x":1497830400000,"y":0.00281397691092402},{"date":"2017-06-20",".metric":"rmse",".estimator":"standard",".estimate":0.00276198862616263,"x":1497916800000,"y":0.00276198862616263},{"date":"2017-06-21",".metric":"rmse",".estimator":"standard",".estimate":0.000192884361855274,"x":1498003200000,"y":0.000192884361855274},{"date":"2017-06-22",".metric":"rmse",".estimator":"standard",".estimate":0.000259720774694152,"x":1498089600000,"y":0.000259720774694152},{"date":"2017-06-23",".metric":"rmse",".estimator":"standard",".estimate":0.000557558234526353,"x":1498176000000,"y":0.000557558234526353},{"date":"2017-06-26",".metric":"rmse",".estimator":"standard",".estimate":0.000126300272981984,"x":1498435200000,"y":0.000126300272981984},{"date":"2017-06-27",".metric":"rmse",".estimator":"standard",".estimate":0.00561618579486439,"x":1498521600000,"y":0.00561618579486439},{"date":"2017-06-28",".metric":"rmse",".estimator":"standard",".estimate":0.002023681813682
3,"x":1498608000000,"y":0.0020236818136823},{"date":"2017-06-29",".metric":"rmse",".estimator":"standard",".estimate":0.00479603402518207,"x":1498694400000,"y":0.00479603402518207},{"date":"2017-06-30",".metric":"rmse",".estimator":"standard",".estimate":0.0039170890474173,"x":1498780800000,"y":0.0039170890474173},{"date":"2017-07-03",".metric":"rmse",".estimator":"standard",".estimate":0.00330775801804971,"x":1499040000000,"y":0.00330775801804971},{"date":"2017-07-05",".metric":"rmse",".estimator":"standard",".estimate":0.000923877973580159,"x":1499212800000,"y":0.000923877973580159},{"date":"2017-07-06",".metric":"rmse",".estimator":"standard",".estimate":0.00455581805067294,"x":1499299200000,"y":0.00455581805067294},{"date":"2017-07-07",".metric":"rmse",".estimator":"standard",".estimate":0.00548688562762336,"x":1499385600000,"y":0.00548688562762336},{"date":"2017-07-10",".metric":"rmse",".estimator":"standard",".estimate":0.000429896269046332,"x":1499644800000,"y":0.000429896269046332},{"date":"2017-07-11",".metric":"rmse",".estimator":"standard",".estimate":0.00160472256570825,"x":1499731200000,"y":0.00160472256570825},{"date":"2017-07-12",".metric":"rmse",".estimator":"standard",".estimate":0.00036455733410835,"x":1499817600000,"y":0.00036455733410835},{"date":"2017-07-13",".metric":"rmse",".estimator":"standard",".estimate":0.00122078887115348,"x":1499904000000,"y":0.00122078887115348},{"date":"2017-07-14",".metric":"rmse",".estimator":"standard",".estimate":0.00027744181369957,"x":1499990400000,"y":0.00027744181369957},{"date":"2017-07-17",".metric":"rmse",".estimator":"standard",".estimate":0.000130179465291443,"x":1500249600000,"y":0.000130179465291443},{"date":"2017-07-18",".metric":"rmse",".estimator":"standard",".estimate":0.000304545761030702,"x":1500336000000,"y":0.000304545761030702},{"date":"2017-07-19",".metric":"rmse",".estimator":"standard",".estimate":0.000460089510473261,"x":1500422400000,"y":0.000460089510473261},{"date":"2017-07-20",".metric"
:"rmse",".estimator":"standard",".estimate":0.000350007543751075,"x":1500508800000,"y":0.000350007543751075},{"date":"2017-07-21",".metric":"rmse",".estimator":"standard",".estimate":0.00110942433359531,"x":1500595200000,"y":0.00110942433359531},{"date":"2017-07-24",".metric":"rmse",".estimator":"standard",".estimate":0.000574869765522386,"x":1500854400000,"y":0.000574869765522386},{"date":"2017-07-25",".metric":"rmse",".estimator":"standard",".estimate":0.00222648739942126,"x":1500940800000,"y":0.00222648739942126},{"date":"2017-07-26",".metric":"rmse",".estimator":"standard",".estimate":0.000724593735381417,"x":1501027200000,"y":0.000724593735381417},{"date":"2017-07-27",".metric":"rmse",".estimator":"standard",".estimate":0.000590437231668881,"x":1501113600000,"y":0.000590437231668881},{"date":"2017-07-28",".metric":"rmse",".estimator":"standard",".estimate":0.000418764452709116,"x":1501200000000,"y":0.000418764452709116},{"date":"2017-07-31",".metric":"rmse",".estimator":"standard",".estimate":0.00112373658911328,"x":1501459200000,"y":0.00112373658911328},{"date":"2017-08-01",".metric":"rmse",".estimator":"standard",".estimate":0.00102780149433163,"x":1501545600000,"y":0.00102780149433163},{"date":"2017-08-02",".metric":"rmse",".estimator":"standard",".estimate":0.0002863885935313,"x":1501632000000,"y":0.0002863885935313},{"date":"2017-08-03",".metric":"rmse",".estimator":"standard",".estimate":0.00216380496530155,"x":1501718400000,"y":0.00216380496530155},{"date":"2017-08-04",".metric":"rmse",".estimator":"standard",".estimate":0.0020390862850439,"x":1501804800000,"y":0.0020390862850439},{"date":"2017-08-07",".metric":"rmse",".estimator":"standard",".estimate":0.00144022400144353,"x":1502064000000,"y":0.00144022400144353},{"date":"2017-08-08",".metric":"rmse",".estimator":"standard",".estimate":0.00178102609248742,"x":1502150400000,"y":0.00178102609248742},{"date":"2017-08-09",".metric":"rmse",".estimator":"standard",".estimate":0.00173423072868356,"x":15022368
00000,"y":0.00173423072868356},{"date":"2017-08-10",".metric":"rmse",".estimator":"standard",".estimate":0.00585601391382766,"x":1502323200000,"y":0.00585601391382766},{"date":"2017-08-11",".metric":"rmse",".estimator":"standard",".estimate":0.0025707399242942,"x":1502409600000,"y":0.0025707399242942},{"date":"2017-08-14",".metric":"rmse",".estimator":"standard",".estimate":0.00278565297348008,"x":1502668800000,"y":0.00278565297348008},{"date":"2017-08-15",".metric":"rmse",".estimator":"standard",".estimate":0.000598661458612812,"x":1502755200000,"y":0.000598661458612812},{"date":"2017-08-16",".metric":"rmse",".estimator":"standard",".estimate":0.00305110556814486,"x":1502841600000,"y":0.00305110556814486},{"date":"2017-08-17",".metric":"rmse",".estimator":"standard",".estimate":0.00575431425787176,"x":1502928000000,"y":0.00575431425787176},{"date":"2017-08-18",".metric":"rmse",".estimator":"standard",".estimate":0.000562189503631827,"x":1503014400000,"y":0.000562189503631827},{"date":"2017-08-21",".metric":"rmse",".estimator":"standard",".estimate":0.00136023893386477,"x":1503273600000,"y":0.00136023893386477},{"date":"2017-08-22",".metric":"rmse",".estimator":"standard",".estimate":0.00249350862483187,"x":1503360000000,"y":0.00249350862483187},{"date":"2017-08-23",".metric":"rmse",".estimator":"standard",".estimate":0.00208495474240162,"x":1503446400000,"y":0.00208495474240162},{"date":"2017-08-24",".metric":"rmse",".estimator":"standard",".estimate":0.00147841572396225,"x":1503532800000,"y":0.00147841572396225},{"date":"2017-08-25",".metric":"rmse",".estimator":"standard",".estimate":0.00202125503968285,"x":1503619200000,"y":0.00202125503968285},{"date":"2017-08-28",".metric":"rmse",".estimator":"standard",".estimate":0.000131985749159282,"x":1503878400000,"y":0.000131985749159282},{"date":"2017-08-29",".metric":"rmse",".estimator":"standard",".estimate":0.00391999455458316,"x":1503964800000,"y":0.00391999455458316},{"date":"2017-08-30",".metric":"rmse",".estimat
or":"standard",".estimate":0.00178521604081493,"x":1504051200000,"y":0.00178521604081493},{"date":"2017-08-31",".metric":"rmse",".estimator":"standard",".estimate":0.000244439546468598,"x":1504137600000,"y":0.000244439546468598},{"date":"2017-09-01",".metric":"rmse",".estimator":"standard",".estimate":0.000955465062412176,"x":1504224000000,"y":0.000955465062412176},{"date":"2017-09-05",".metric":"rmse",".estimator":"standard",".estimate":0.00506874671983177,"x":1504569600000,"y":0.00506874671983177},{"date":"2017-09-06",".metric":"rmse",".estimator":"standard",".estimate":0.00244980141200023,"x":1504656000000,"y":0.00244980141200023},{"date":"2017-09-07",".metric":"rmse",".estimator":"standard",".estimate":0.00299945537967095,"x":1504742400000,"y":0.00299945537967095},{"date":"2017-09-08",".metric":"rmse",".estimator":"standard",".estimate":0.00124741212445197,"x":1504828800000,"y":0.00124741212445197},{"date":"2017-09-11",".metric":"rmse",".estimator":"standard",".estimate":0.00173984561777132,"x":1505088000000,"y":0.00173984561777132},{"date":"2017-09-12",".metric":"rmse",".estimator":"standard",".estimate":0.00208563767344366,"x":1505174400000,"y":0.00208563767344366},{"date":"2017-09-13",".metric":"rmse",".estimator":"standard",".estimate":0.00208755873836036,"x":1505260800000,"y":0.00208755873836036},{"date":"2017-09-14",".metric":"rmse",".estimator":"standard",".estimate":5.35492512113534e-05,"x":1505347200000,"y":5.35492512113534e-05},{"date":"2017-09-15",".metric":"rmse",".estimator":"standard",".estimate":0.000942207369800087,"x":1505433600000,"y":0.000942207369800087},{"date":"2017-09-18",".metric":"rmse",".estimator":"standard",".estimate":0.00143626119319263,"x":1505692800000,"y":0.00143626119319263},{"date":"2017-09-19",".metric":"rmse",".estimator":"standard",".estimate":0.000323798894153753,"x":1505779200000,"y":0.000323798894153753},{"date":"2017-09-20",".metric":"rmse",".estimator":"standard",".estimate":0.00155049944364725,"x":1505865600000,"y":0.0
0155049944364725},{"date":"2017-09-21",".metric":"rmse",".estimator":"standard",".estimate":0.00176376363798185,"x":1505952000000,"y":0.00176376363798185},{"date":"2017-09-22",".metric":"rmse",".estimator":"standard",".estimate":0.000415792730767708,"x":1506038400000,"y":0.000415792730767708},{"date":"2017-09-25",".metric":"rmse",".estimator":"standard",".estimate":0.00254524556426304,"x":1506297600000,"y":0.00254524556426304},{"date":"2017-09-26",".metric":"rmse",".estimator":"standard",".estimate":0.00111846677080901,"x":1506384000000,"y":0.00111846677080901},{"date":"2017-09-27",".metric":"rmse",".estimator":"standard",".estimate":0.00236947032196308,"x":1506470400000,"y":0.00236947032196308},{"date":"2017-09-28",".metric":"rmse",".estimator":"standard",".estimate":0.0013698886857053,"x":1506556800000,"y":0.0013698886857053},{"date":"2017-09-29",".metric":"rmse",".estimator":"standard",".estimate":0.00129241622887146,"x":1506643200000,"y":0.00129241622887146},{"date":"2017-10-02",".metric":"rmse",".estimator":"standard",".estimate":0.00351452183004398,"x":1506902400000,"y":0.00351452183004398},{"date":"2017-10-03",".metric":"rmse",".estimator":"standard",".estimate":0.00021945426661678,"x":1506988800000,"y":0.00021945426661678},{"date":"2017-10-04",".metric":"rmse",".estimator":"standard",".estimate":0.000979514740696812,"x":1507075200000,"y":0.000979514740696812},{"date":"2017-10-05",".metric":"rmse",".estimator":"standard",".estimate":0.00169205866343378,"x":1507161600000,"y":0.00169205866343378},{"date":"2017-10-06",".metric":"rmse",".estimator":"standard",".estimate":0.00103678310109957,"x":1507248000000,"y":0.00103678310109957},{"date":"2017-10-09",".metric":"rmse",".estimator":"standard",".estimate":0.00264705975475339,"x":1507507200000,"y":0.00264705975475339},{"date":"2017-10-10",".metric":"rmse",".estimator":"standard",".estimate":0.00119099648284331,"x":1507593600000,"y":0.00119099648284331},{"date":"2017-10-11",".metric":"rmse",".estimator":"standard",
".estimate":0.000569358577152982,"x":1507680000000,"y":0.000569358577152982},{"date":"2017-10-12",".metric":"rmse",".estimator":"standard",".estimate":0.00132402101517672,"x":1507766400000,"y":0.00132402101517672},{"date":"2017-10-13",".metric":"rmse",".estimator":"standard",".estimate":0.000432152619367148,"x":1507852800000,"y":0.000432152619367148},{"date":"2017-10-16",".metric":"rmse",".estimator":"standard",".estimate":0.00121290820282521,"x":1508112000000,"y":0.00121290820282521},{"date":"2017-10-17",".metric":"rmse",".estimator":"standard",".estimate":0.000636049496482721,"x":1508198400000,"y":0.000636049496482721},{"date":"2017-10-18",".metric":"rmse",".estimator":"standard",".estimate":0.000316197071631185,"x":1508284800000,"y":0.000316197071631185},{"date":"2017-10-19",".metric":"rmse",".estimator":"standard",".estimate":0.000301573819444778,"x":1508371200000,"y":0.000301573819444778},{"date":"2017-10-20",".metric":"rmse",".estimator":"standard",".estimate":0.00406596387415966,"x":1508457600000,"y":0.00406596387415966},{"date":"2017-10-23",".metric":"rmse",".estimator":"standard",".estimate":0.00355686147299521,"x":1508716800000,"y":0.00355686147299521},{"date":"2017-10-24",".metric":"rmse",".estimator":"standard",".estimate":0.00136140929244226,"x":1508803200000,"y":0.00136140929244226},{"date":"2017-10-25",".metric":"rmse",".estimator":"standard",".estimate":0.00165174014030913,"x":1508889600000,"y":0.00165174014030913},{"date":"2017-10-26",".metric":"rmse",".estimator":"standard",".estimate":0.00100238682889144,"x":1508976000000,"y":0.00100238682889144},{"date":"2017-10-27",".metric":"rmse",".estimator":"standard",".estimate":0.00141794327172736,"x":1509062400000,"y":0.00141794327172736},{"date":"2017-10-30",".metric":"rmse",".estimator":"standard",".estimate":0.00324552277266412,"x":1509321600000,"y":0.00324552277266412},{"date":"2017-10-31",".metric":"rmse",".estimator":"standard",".estimate":0.00087199540983986,"x":1509408000000,"y":0.0008719954098398
6},{"date":"2017-11-01",".metric":"rmse",".estimator":"standard",".estimate":0.00187456571156308,"x":1509494400000,"y":0.00187456571156308},{"date":"2017-11-02",".metric":"rmse",".estimator":"standard",".estimate":0.000879943303785382,"x":1509580800000,"y":0.000879943303785382},{"date":"2017-11-03",".metric":"rmse",".estimator":"standard",".estimate":0.000436694115392702,"x":1509667200000,"y":0.000436694115392702},{"date":"2017-11-06",".metric":"rmse",".estimator":"standard",".estimate":0.00136881117908498,"x":1509926400000,"y":0.00136881117908498},{"date":"2017-11-07",".metric":"rmse",".estimator":"standard",".estimate":0.00241278001359499,"x":1510012800000,"y":0.00241278001359499},{"date":"2017-11-08",".metric":"rmse",".estimator":"standard",".estimate":0.00046799600464614,"x":1510099200000,"y":0.00046799600464614},{"date":"2017-11-09",".metric":"rmse",".estimator":"standard",".estimate":0.000464882341365728,"x":1510185600000,"y":0.000464882341365728},{"date":"2017-11-10",".metric":"rmse",".estimator":"standard",".estimate":0.000435266335629412,"x":1510272000000,"y":0.000435266335629412},{"date":"2017-11-13",".metric":"rmse",".estimator":"standard",".estimate":0.000779059815034458,"x":1510531200000,"y":0.000779059815034458},{"date":"2017-11-14",".metric":"rmse",".estimator":"standard",".estimate":0.00242723716408704,"x":1510617600000,"y":0.00242723716408704},{"date":"2017-11-15",".metric":"rmse",".estimator":"standard",".estimate":0.000942307511846099,"x":1510704000000,"y":0.000942307511846099},{"date":"2017-11-16",".metric":"rmse",".estimator":"standard",".estimate":0.00322323632467401,"x":1510790400000,"y":0.00322323632467401},{"date":"2017-11-17",".metric":"rmse",".estimator":"standard",".estimate":0.00143858562566698,"x":1510876800000,"y":0.00143858562566698},{"date":"2017-11-20",".metric":"rmse",".estimator":"standard",".estimate":0.00148140095216353,"x":1511136000000,"y":0.00148140095216353},{"date":"2017-11-21",".metric":"rmse",".estimator":"standard",".est
imate":0.000342712528964379,"x":1511222400000,"y":0.000342712528964379},{"date":"2017-11-22",".metric":"rmse",".estimator":"standard",".estimate":0.00129852636681049,"x":1511308800000,"y":0.00129852636681049},{"date":"2017-11-24",".metric":"rmse",".estimator":"standard",".estimate":0.000523981551319308,"x":1511481600000,"y":0.000523981551319308},{"date":"2017-11-27",".metric":"rmse",".estimator":"standard",".estimate":0.000109118405659181,"x":1511740800000,"y":0.000109118405659181},{"date":"2017-11-28",".metric":"rmse",".estimator":"standard",".estimate":0.00425595935550045,"x":1511827200000,"y":0.00425595935550045},{"date":"2017-11-29",".metric":"rmse",".estimator":"standard",".estimate":0.000812053876236767,"x":1511913600000,"y":0.000812053876236767},{"date":"2017-11-30",".metric":"rmse",".estimator":"standard",".estimate":0.00567167634956679,"x":1512000000000,"y":0.00567167634956679},{"date":"2017-12-01",".metric":"rmse",".estimator":"standard",".estimate":0.00131648503522855,"x":1512086400000,"y":0.00131648503522855},{"date":"2017-12-04",".metric":"rmse",".estimator":"standard",".estimate":0.000202800762239206,"x":1512345600000,"y":0.000202800762239206},{"date":"2017-12-05",".metric":"rmse",".estimator":"standard",".estimate":0.000220439983437353,"x":1512432000000,"y":0.000220439983437353},{"date":"2017-12-06",".metric":"rmse",".estimator":"standard",".estimate":0.000334522204095748,"x":1512518400000,"y":0.000334522204095748},{"date":"2017-12-07",".metric":"rmse",".estimator":"standard",".estimate":0.00118911941047771,"x":1512604800000,"y":0.00118911941047771},{"date":"2017-12-08",".metric":"rmse",".estimator":"standard",".estimate":0.00239770292034793,"x":1512691200000,"y":0.00239770292034793},{"date":"2017-12-11",".metric":"rmse",".estimator":"standard",".estimate":0.000404106695356842,"x":1512950400000,"y":0.000404106695356842},{"date":"2017-12-12",".metric":"rmse",".estimator":"standard",".estimate":0.000860339700663732,"x":1513036800000,"y":0.00086033970066
3732},{"date":"2017-12-13",".metric":"rmse",".estimator":"standard",".estimate":0.000816992792503522,"x":1513123200000,"y":0.000816992792503522},{"date":"2017-12-14",".metric":"rmse",".estimator":"standard",".estimate":0.000475246264804703,"x":1513209600000,"y":0.000475246264804703},{"date":"2017-12-15",".metric":"rmse",".estimator":"standard",".estimate":0.00510855365225753,"x":1513296000000,"y":0.00510855365225753},{"date":"2017-12-18",".metric":"rmse",".estimator":"standard",".estimate":0.000761697975464818,"x":1513555200000,"y":0.000761697975464818},{"date":"2017-12-19",".metric":"rmse",".estimator":"standard",".estimate":0.00324328997442093,"x":1513641600000,"y":0.00324328997442093},{"date":"2017-12-20",".metric":"rmse",".estimator":"standard",".estimate":0.00139695394569652,"x":1513728000000,"y":0.00139695394569652},{"date":"2017-12-21",".metric":"rmse",".estimator":"standard",".estimate":5.61810734120722e-05,"x":1513814400000,"y":5.61810734120722e-05},{"date":"2017-12-22",".metric":"rmse",".estimator":"standard",".estimate":0.00135185904279269,"x":1513900800000,"y":0.00135185904279269},{"date":"2017-12-26",".metric":"rmse",".estimator":"standard",".estimate":0.000284135320262809,"x":1514246400000,"y":0.000284135320262809},{"date":"2017-12-27",".metric":"rmse",".estimator":"standard",".estimate":0.000798608648818393,"x":1514332800000,"y":0.000798608648818393},{"date":"2017-12-28",".metric":"rmse",".estimator":"standard",".estimate":0.000531259849385336,"x":1514419200000,"y":0.000531259849385336},{"date":"2017-12-29",".metric":"rmse",".estimator":"standard",".estimate":0.00227136567770183,"x":1514505600000,"y":0.00227136567770183}],"type":"scatter"}],"xAxis":{"type":"datetime","title":{"text":"date"},"categories":null}},"theme":{"chart":{"backgroundColor":"transparent"}},"conf_opts":{"global":{"Date":null,"VMLRadialGradientURL":"http =//code.highcharts.com/list(version)/gfx/vml-radial-gradient.png","canvasToolsURL":"http 
=//code.highcharts.com/list(version)/modules/canvas-tools.js","getTimezoneOffset":null,"timezoneOffset":0,"useUTC":true},"lang":{"contextButtonTitle":"Chart context menu","decimalPoint":".","downloadJPEG":"Download JPEG image","downloadPDF":"Download PDF document","downloadPNG":"Download PNG image","downloadSVG":"Download SVG vector image","drillUpText":"Back to {series.name}","invalidDate":null,"loading":"Loading...","months":["January","February","March","April","May","June","July","August","September","October","November","December"],"noData":"No data to display","numericSymbols":["k","M","G","T","P","E"],"printChart":"Print chart","resetZoom":"Reset zoom","resetZoomTitle":"Reset zoom level 1:1","shortMonths":["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"thousandsSep":" ","weekdays":["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]}},"type":"chart","fonts":[],"debug":false},"evals":[],"jsHooks":[]}</script>
<p>It looks like our RMSE is relatively stable, except for a period in mid to late 2015.</p>
<p>The amazing power of <code>parsnip</code> is how efficiently we can toggle to another random forest engine. Let’s suppose we wished to use the <a href="https://cran.r-project.org/web/packages/randomForest/randomForest.pdf"><code>randomForest</code></a> package instead of <code>ranger</code>. Here’s how we could reconfigure our previous work to use a different engine.</p>
<p>First, we’ll load up the <code>randomForest</code> package, because we need to load the package in order to use it as our engine. Then, we make one tweak to the original <code>ranger_rf_regress</code> function, by changing <code>set_engine("ranger")</code> to <code>set_engine("randomForest")</code>. That’s all, and we’re now running a random forest model using a different package.</p>
<pre class="r"><code>library(randomForest)

randomForest_rf_regress <- function(mtry = 3, trees = 5, split){
  # Training data for this split
  analysis_set_rf <- analysis(split)

  # Same model specification as before; only the engine changes
  model <-
    rand_forest(mtry = mtry, trees = trees) %>%
    set_engine("randomForest") %>%
    fit(daily_returns ~ MKT + SMB + HML + RMW + CMA, data = analysis_set_rf)

  # Held-out data for this split
  assessment_set_rf <- assessment(split)

  # Attach the predictions to the actual returns
  assessment_set_rf %>%
    select(date, daily_returns) %>%
    mutate(.pred = unlist(predict(model, new_data = assessment_set_rf))) %>%
    select(date, daily_returns, .pred)
}</code></pre>
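<p>To see the engine swap in isolation, here is a minimal sketch on the built-in <code>mtcars</code> data rather than the Fama French data. The helper name <code>fit_rf()</code>, the formula, and the parameter values are assumptions chosen for illustration only, and the sketch assumes that both the <code>ranger</code> and <code>randomForest</code> packages are installed.</p>
<pre class="r"><code>library(parsnip)

# Hypothetical helper (not from the original post): fit one random forest
# specification with whichever engine is passed in.
fit_rf <- function(engine, data = mtcars) {
  spec <- rand_forest(mode = "regression", mtry = 3, trees = 50)
  spec <- set_engine(spec, engine)
  fit(spec, mpg ~ wt + hp + disp, data = data)
}

rf_ranger       <- fit_rf("ranger")
rf_randomForest <- fit_rf("randomForest")

# Either fit returns predictions in the same shape:
# a tibble with a single .pred column.
pred_ranger       <- predict(rf_ranger, new_data = head(mtcars))
pred_randomForest <- predict(rf_randomForest, new_data = head(mtcars))</code></pre>
<p>Because the prediction output has the same shape for both engines, any downstream code that consumes <code>.pred</code> should not need to change.</p>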
<p>We now have a new function called <code>randomForest_rf_regress()</code> that uses <code>randomForest</code> as the engine for our model and can use the same code scaffolding to run that model on our 1159 splits.</p>
<pre class="r"><code>randomForest_results <-
map_df(.x = rolling_origin_spy_2013_2017$splits,
~randomForest_rf_regress(mtry = 3, trees = 100, split = .x))
randomForest_results %>%
head()</code></pre>
<pre><code># A tibble: 6 x 4
# Groups: asset [1]
asset date daily_returns .pred
<chr> <date> <dbl> <dbl>
1 SPY 2013-05-28 0.00597 0.00609
2 SPY 2013-05-29 -0.00652 -0.00438
3 SPY 2013-05-30 0.00369 0.00597
4 SPY 2013-05-31 -0.0145 -0.00987
5 SPY 2013-06-03 0.00549 0.00134
6 SPY 2013-06-04 -0.00482 0.00118</code></pre>
<p>And we can use the same <code>yardstick</code> code to extract the <code>RMSE</code>.</p>
<pre class="r"><code>randomForest_results %>%
group_by(date) %>%
rmse(daily_returns, .pred) %>%
summarise(avg_rmse = mean(.estimate))</code></pre>
<pre><code># A tibble: 1 x 1
avg_rmse
<dbl>
1 0.00252</code></pre>
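<p>For readers who want to see what that pipeline computes, the following base R sketch reproduces the same two steps by hand: an RMSE per date, then the mean of those values. The toy data frame and the helper <code>rmse_one()</code> are made up for illustration and are not part of <code>yardstick</code>.</p>
<pre class="r"><code># Toy data, invented for illustration: two dates, two predictions each.
toy <- data.frame(
  date          = as.Date(c("2017-01-03", "2017-01-03", "2017-01-04", "2017-01-04")),
  daily_returns = c(0.010, -0.020, 0.005, 0.000),
  .pred         = c(0.008, -0.015, 0.001, 0.003)
)

# Hypothetical helper: root mean squared error of one group.
rmse_one <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Step 1: one RMSE per date (the group_by() + rmse() step).
rmse_by_date <- sapply(
  split(toy, toy$date),
  function(d) rmse_one(d$daily_returns, d$.pred)
)

# Step 2: average the per-date values (the summarise() step).
avg_rmse <- mean(rmse_by_date)</code></pre>
<p>Note that when a group holds a single observation, its RMSE reduces to the absolute error of that one prediction.</p>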
<p>There’s a lot more to explore in the <code>parsnip</code> package and the <code>tidymodels</code> collection. See you next time when we’ll get into some classification!</p>
<p>Wait: shameless book plug for those who read to the end: if you like this sort of thing, check out my new book <a href="https://www.amazon.com/Reproducible-Finance-Portfolio-Analysis-Chapman/dp/1138484032">Reproducible Finance with R</a>!</p>
Paid in Books: An Interview with Christian Westergaard
https://rviews.rstudio.com/2019/03/07/treasured-books-an-interview-with-christian-westergaard/
Thu, 07 Mar 2019 00:00:00 +0000https://rviews.rstudio.com/2019/03/07/treasured-books-an-interview-with-christian-westergaard/
<p>R is greatly benefiting from new users coming from disciplines that traditionally did not provoke much serious computation. Journalists<sup>1</sup> and humanist scholars<sup>2</sup>, for example, are embracing R. But, does the avenue from the Humanities go both ways? In a recent conversation with Christian Westergaard, proprietor of <a href="https://www.sophiararebooks.com/">Sophia Rare Books</a> in Copenhagen, I was delighted to learn that it does.</p>
<hr />
<p><em>JBR: I was very pleased to learn when I spoke with you recently at the California Antiquarian Book Fair that you were an S and S+ user in graduate school. What were you studying and how was S and S+ helpful?</em></p>
<p>CW: I did a Master’s in mathematics and a Bachelor’s in statistics at the University of Copenhagen in Denmark, graduating in 2005. During the first year of my courses in statistics, we were quickly introduced to S+ in order to do monthly assignments. In these assignments, we were to apply the theory we had learned in the lectures on some concrete data. I still remember how difficult I initially found applying the right statistical tools to real-world problems, rather than just understanding the math in theoretical statistics. I developed a deep respect for applied statistics. Our minds can easily be deceived and we need proper statistics to make the right decisions.</p>
<p><em>JBR: How did you move from technical studies to dealing in rare scientific books and manuscripts? On the surface it seems that these might be two completely unrelated activities. How did you find a path between these two worlds?</em></p>
<p>CW: It was a gradual shift. When I first began studying, I went into an old antiquarian book shop to acquire some second-hand math and statistics books to supplement my ordinary course texts. One of them was Statistical Methods by <a href="https://en.wikipedia.org/wiki/Anders_Hald">Anders Hald</a>. Hald was no longer working at the university, but his text had become a classic. I was fascinated by this book shop. The owner was an old, grey-haired man sitting behind a huge stack of books smoking a pipe, and writing his sales catalogues on an old IBM typing machine. He allowed me to go down into his cellar where there were books everywhere from floor to ceiling. There were many books which I wanted to acquire down there, but I hardly had any money. It was a mess in the cellar and I offered to tidy up if he could pay me in the books I wanted, and he agreed. I loved coming to work there, and I continued to do so throughout my entire studies. My boss gave me more and more responsibility and put me in charge of the mathematics, physics, statistics and science books in general. When I finished my master’s, I was considering doing a PhD. I loved mathematics and still do to this day. But I also found that when I woke up in the morning, I was thinking of antiquarian books and in the evening I couldn’t get to bed because I was thinking of books. It gave me energy and happiness. So I thought, why not try and be a rare book dealer for a year or two and see how it works out? It’s been 14 years since I made that decision, and I have really enjoyed it. In 2009, I decided to start my own company and specialize in important books and manuscripts in science.</p>
<p><em>JBR: What is it like to be immersed in these rare artifacts that were so important for the transmission of scientific knowledge? What kinds of scholars do you consult to establish the authenticity of works like Euler’s <a href="https://www.sophiararebooks.com/pages/books/4420/leonhard-euler/opuscula-varii-argumenti-tomus-i-conjectura-physica-circa-propagationem-soni-ac-luminis-tomus-ii">Opuscula Varii Argumenti</a> or Cauchy’s <a href="https://www.sophiararebooks.com/pages/books/3696/augustin-louis-cauchy/lecons-sur-le-calcul-diff-rentiel">Leçons sur le calcul différentiel</a>?</em></p>
<p><img src="/post/2019-03-05-Sophia_files/Cauchy.png" height = "200" width="400"></p>
<p>CW: I feel privileged to handle some of these objects on a daily basis. One day I am sitting with an original autograph manuscript by Einstein doing research on relativity, and the next day I have a presentation copy of Darwin’s Origin of Species in my hands. These are objects which have changed the world and the way we think about ourselves. In addition to the books and manuscripts, I find the people who I meet extremely interesting. A few years before Anders Hald (whose book had originally brought me into my old boss’ shop) passed away, I went to buy his books. He was 92 and completely fresh in his mind. We spoke about the history of statistics – a subject about which he authored several books.</p>
<p><em>JBR: I have noticed that collectors seem to be very interested in the works of twentieth-century mathematicians and physicists. You have works by <a href="https://www.sophiararebooks.com/pages/books/4543/alonzo-church/an-unsolvable-problem-in-elementary-number-theory">Alonzo Church</a>, <a href="https://www.sophiararebooks.com/pages/books/4559/kurt-godel/uber-formal-unentscheidbare-satze-der-principia-mathematica-undver-wandter-systeme-i-offprint">Kurt Gödel</a>, <a href="https://www.sophiararebooks.com/pages/books/4459/richard-phillips-feynman/surely-you-re-joking-mr-feynman-adventures-of-a-curious-character-as-told-to-ralph-leighton">Richard Feynman</a>, and others in your catalogue. But your roster of statisticians seems to focus on the old masters such as <a href="https://www.sophiararebooks.com/pages/books/4159/pierre-simon-laplace-marquis-de/theorie-analytique-des-probabilites-paris-courcier-1812-with-supplement-a-la-theorie-analytique">Laplace</a> and <a href="https://www.sophiararebooks.com/pages/books/4637/abraham-de-moivre/the-doctrine-of-chances-or-a-method-of-calculating-the-probability-of-events-in-play">de Moivre</a>. Are collectors also interested in Karl Pearson, Udny Yule, and R. A. Fisher?</em></p>
<p><img src="/post/2019-03-05-Sophia_files/deMoivre.png" height = "200" width="400"></p>
<p>CW: Certainly. Maybe so much so that every time I get one of Pearson or Fisher’s main papers they sell immediately. That’s why you don’t see them in my stock at the moment.</p>
<p><em>JBR: I noticed that you had two works by <a href="https://www.sophiararebooks.com/pages/books/4276/sonya-kowalevsky-sofya-vasilyevna-or-kovalevskaya/sur-une-propriete-du-systeme-dequations-differentielles-qui-definit-la-rotation-dun-corps-solide">Sofya Vasilyevna Kovalevskaya</a> on display in California. Do you see a renewed interest in the works of women scientists and mathematicians, or is this remarkable and brilliant woman an exception?</em></p>
<p>CW: There has definitely been a renewed interest in exceptional women scientists. A few years ago the New York-based Grolier Club hosted an exhibition called ‘Extraordinary Women in Science and Medicine’, and several institutions are focusing on the subject. These women who broke through the social constraints against them are exceptional and fascinating people.</p>
<p><em>JBR: Although there are notable exceptions (Donald Knuth’s typesetting comes to mind), I think most data scientists, computer scientists, and statisticians work in a digital world of ebooks and poorly printed texts. Do you think that the technical book as a collectable artifact will survive the twenty-first century? What advice would you give to working data scientists and statisticians who are interested in collecting?</em></p>
<p>CW: Good question. Many important papers nowadays are not even printed, and the only physical material a researcher might have left from some landmark work might be some scribbles he or she did on a piece of paper. There are examples of people who collect digital art. They use various ways of signing or otherwise authenticating the artist’s work even if it’s on a USB stick. Maybe that’s how some research papers might be collected in the future?</p>
<p>My advice for anyone wanting to start collecting would be to first focus on some of the classics in their field or some other field that fascinates them. The classics will have been collected by many others in the past and there will be good descriptions, bibliographies, and catalogues describing them and why they are collectible. That way one will gradually get a feeling about which mechanisms are important when collecting and what to focus on, e.g., condition, provenance, etc. And then I’d say it’s important to build a good relationship with at least one dealer with a good reputation in the trade. Any great collection is built on a collaboration where collectors and dealers work together.</p>
<p><em>JBR: Excellent advice! Thank you Christian.</em></p>
<p><sup>1</sup> For example, have a look at some of the R training at this year’s <a href="https://www.ire.org/conferences/nicar-2019/">IRE-CAR Conference</a>.</p>
<p><sup>2</sup> See, for example, these Washington University in St. Louis <a href="https://libguides.wustl.edu/c.php?g=385216&p=3561786">resources</a> for the digital humanities.</p>
<p><em><a href="https://www.sophiararebooks.com/">Sophia Rare Books</a> (Copenhagen) specializes in rare and important books and manuscripts in the History of Science and Medicine fields.</em></p>
Graph analysis using the tidyverse
https://rviews.rstudio.com/2019/03/06/intro-to-graph-analysis/
Wed, 06 Mar 2019 00:00:00 +0000https://rviews.rstudio.com/2019/03/06/intro-to-graph-analysis/
<p>It is because I am not a graph analysis expert that I thought it important to write this article. For someone who thinks in terms of single rectangular data sets, it is a bit of a mental leap to understand how to apply <em>tidy</em> principles to a more robust object, such as a graph table. Thankfully, there are two packages that make this work much easier:</p>
<ul>
<li><p><a href="https://github.com/thomasp85/tidygraph"><code>tidygraph</code></a> - Provides a way for <code>dplyr</code> to interact with graphs</p></li>
<li><p><a href="https://github.com/thomasp85/ggraph"><code>ggraph</code></a> - Extension to <code>ggplot2</code> for graph analysis</p></li>
</ul>
<div id="quick-intro" class="section level3">
<h3>Quick intro</h3>
<p>Simply put, graph theory studies relationships between objects in a group. Visually, we can think of a graph as a series of interconnected circles, each representing a member of a group, such as people in a Social Network. Lines drawn between the circles represent a relationship between the members, such as friendships in a Social Network. Graph analysis helps with figuring out things such as the influence of a certain member, or how many friends are in between two members. A more formal definition and detailed explanation of Graph Theory can be found in <a href="https://en.wikipedia.org/wiki/Graph_theory">Wikipedia here</a>.</p>
</div>
<div id="example" class="section level2">
<h2>Example</h2>
<p>Using an example, this article will introduce some concepts of graph analysis, and show how <code>tidyverse</code> and <code>tidyverse</code>-adjacent tools can be used for such analysis.</p>
<div id="data-source" class="section level3">
<h3>Data source</h3>
<p>The <a href="https://github.com/rfordatascience/tidytuesday">tidytuesday</a> weekly project encourages new and experienced users to use the <code>tidyverse</code> tools to analyze data sets that change every week. I have been using that opportunity to learn new tools and techniques. One of the most recent data sets relates to French trains; it contains aggregate daily total trips per connecting stations.</p>
<pre class="r"><code>library(readr)
url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-26/small_trains.csv"
small_trains <- read_csv(url)</code></pre>
<pre class="r"><code>head(small_trains)</code></pre>
<pre><code>## # A tibble: 6 x 13
## year month service departure_stati… arrival_station journey_time_avg
## <int> <int> <chr> <chr> <chr> <dbl>
## 1 2017 9 Nation… PARIS EST METZ 85.1
## 2 2017 9 Nation… REIMS PARIS EST 47.1
## 3 2017 9 Nation… PARIS EST STRASBOURG 116.
## 4 2017 9 Nation… PARIS LYON AVIGNON TGV 161.
## 5 2017 9 Nation… PARIS LYON BELLEGARDE (AI… 164.
## 6 2017 9 Nation… PARIS LYON BESANCON FRANC… 129.
## # … with 7 more variables: total_num_trips <int>,
## # avg_delay_all_departing <dbl>, avg_delay_all_arriving <dbl>,
## # num_late_at_departure <int>, num_arriving_late <int>,
## # delay_cause <chr>, delayed_number <dbl></code></pre>
</div>
<div id="data-preparation" class="section level3">
<h3>Data Preparation</h3>
<p>Even though the data set was meant for analyzing delays, I thought it would be interesting to use it to understand how stations connect with each other. A new summarized data set is created, called <em>routes</em>, which contains a single entry for each pair of departure and arrival stations. It also includes the average journey time between the two stations.</p>
<pre class="r"><code>library(dplyr)
routes <- small_trains %>%
group_by(departure_station, arrival_station) %>%
summarise(journey_time = mean(journey_time_avg)) %>%
ungroup() %>%
mutate(from = departure_station,
to = arrival_station) %>%
select(from, to, journey_time)
routes</code></pre>
<pre><code>## # A tibble: 130 x 3
## from to journey_time
## <chr> <chr> <dbl>
## 1 AIX EN PROVENCE TGV PARIS LYON 186.
## 2 ANGERS SAINT LAUD PARIS MONTPARNASSE 97.5
## 3 ANGOULEME PARIS MONTPARNASSE 146.
## 4 ANNECY PARIS LYON 225.
## 5 ARRAS PARIS NORD 52.8
## 6 AVIGNON TGV PARIS LYON 161.
## 7 BARCELONA PARIS LYON 358.
## 8 BELLEGARDE (AIN) PARIS LYON 163.
## 9 BESANCON FRANCHE COMTE TGV PARIS LYON 131.
## 10 BORDEAUX ST JEAN PARIS MONTPARNASSE 186.
## # … with 120 more rows</code></pre>
<p>The next step is to transform the tidy data set into a graph table. In order to prepare <em>routes</em> for this transformation, it has to contain two variables specifically named <em>from</em> and <em>to</em>, which are the names that <code>tidygraph</code> expects to see. Those variables should contain the name of each member (e.g., “AIX EN PROVENCE TGV”), and each row defines a relationship (“AIX EN PROVENCE TGV” -> “PARIS LYON”).</p>
<p>In graph terminology, a member of the group is called a <strong>node</strong> (or vertex) in the graph, and a relationship between nodes is called an <strong>edge</strong>.</p>
<pre class="r"><code>library(tidygraph)
graph_routes <- as_tbl_graph(routes)
graph_routes</code></pre>
<pre><code>## # A tbl_graph: 59 nodes and 130 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 59 x 1 (active)
## name
## <chr>
## 1 AIX EN PROVENCE TGV
## 2 ANGERS SAINT LAUD
## 3 ANGOULEME
## 4 ANNECY
## 5 ARRAS
## 6 AVIGNON TGV
## # … with 53 more rows
## #
## # Edge Data: 130 x 3
## from to journey_time
## <int> <int> <dbl>
## 1 1 39 186.
## 2 2 40 97.5
## 3 3 40 146.
## # … with 127 more rows</code></pre>
<p>The <code>as_tbl_graph()</code> function splits the <em>routes</em> table into two:</p>
<ol style="list-style-type: decimal">
<li><p>Node Data - Contains all of the unique values found in the <em>from</em> and <em>to</em> variables. In this case, it is a table with a single column containing the names of all of the stations.</p></li>
<li><p>Edge Data - Is a table of all relationships between <em>from</em> and <em>to</em>. A peculiarity of <code>tidygraph</code> is that it uses the row position of the node as the identifier for <em>from</em> and <em>to</em>, instead of its original name.</p></li>
</ol>
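<p>As a small illustration of this split, the sketch below builds a graph table from a hypothetical three-station data frame and then pulls each of the two tables out as a regular tibble. The names <em>toy_routes</em>, <em>toy_graph</em>, and the station values are made up for this example and are not part of the train data.</p>

```r
library(dplyr)
library(tidygraph)

# Hypothetical toy routes, not taken from the small_trains data
toy_routes <- tibble(
  from = c("A", "B", "C"),
  to = c("B", "C", "A"),
  journey_time = c(10, 20, 30)
)

toy_graph <- as_tbl_graph(toy_routes)

# Node table: one row per unique station name found in from and to
toy_nodes <- toy_graph %>%
  activate(nodes) %>%
  as_tibble()

# Edge table: from and to now hold row positions in the node table,
# and the extra journey_time column is carried along
toy_edges <- toy_graph %>%
  activate(edges) %>%
  as_tibble()

toy_nodes
toy_edges
```

<p>In this toy graph, the edge from “A” to “B” should appear as 1 -> 2, because “A” and “B” sit in the first and second rows of the node table.</p>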
<p>Another interesting thing about <code>tidygraph</code> is that it allows us to attach more information about the node or edge in an additional column. In this case, <em>journey_time</em> is not really needed to create the graph table, but it may be needed for the analysis we plan to perform. The <code>as_tbl_graph()</code> function automatically created the column for us.</p>
<p>Thinking about <em>graph_routes</em> as two <code>tibbles</code> inside a larger graph table was one of the two major mental breakthroughs I had during this exercise. At that point, it became evident that <code>dplyr</code> needs a way to know which of the two tables (nodes or edges) to perform the transformations on. In <code>tidygraph</code>, this is done using the <code>activate()</code> function. To showcase this, the nodes table will be “activated” in order to add two new string variables derived from <em>name</em>.</p>
<pre class="r"><code>library(stringr)
graph_routes <- graph_routes %>%
activate(nodes) %>%
mutate(
title = str_to_title(name),
label = str_replace_all(title, " ", "\n")
)
graph_routes</code></pre>
<pre><code>## # A tbl_graph: 59 nodes and 130 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 59 x 3 (active)
## name title label
## <chr> <chr> <chr>
## 1 AIX EN PROVENCE TGV Aix En Provence Tgv "Aix\nEn\nProvence\nTgv"
## 2 ANGERS SAINT LAUD Angers Saint Laud "Angers\nSaint\nLaud"
## 3 ANGOULEME Angouleme Angouleme
## 4 ANNECY Annecy Annecy
## 5 ARRAS Arras Arras
## 6 AVIGNON TGV Avignon Tgv "Avignon\nTgv"
## # … with 53 more rows
## #
## # Edge Data: 130 x 3
## from to journey_time
## <int> <int> <dbl>
## 1 1 39 186.
## 2 2 40 97.5
## 3 3 40 146.
## # … with 127 more rows</code></pre>
<p>It was really impressive how easy it was to manipulate the graph table, because once one of the two tables is activated, all of the changes can be made using <code>tidyverse</code> tools. The same approach can be used to extract data from the graph table. In this case, a list of all the stations is pulled into a single character vector.</p>
<pre class="r"><code>stations <- graph_routes %>%
activate(nodes) %>%
pull(title)
stations</code></pre>
<pre><code>## [1] "Aix En Provence Tgv" "Angers Saint Laud"
## [3] "Angouleme" "Annecy"
## [5] "Arras" "Avignon Tgv"
## [7] "Barcelona" "Bellegarde (Ain)"
## [9] "Besancon Franche Comte Tgv" "Bordeaux St Jean"
## [11] "Brest" "Chambery Challes Les Eaux"
## [13] "Dijon Ville" "Douai"
## [15] "Dunkerque" "Francfort"
## [17] "Geneve" "Grenoble"
## [19] "Italie" "La Rochelle Ville"
## [21] "Lausanne" "Laval"
## [23] "Le Creusot Montceau Montchanin" "Le Mans"
## [25] "Lille" "Lyon Part Dieu"
## [27] "Macon Loche" "Madrid"
## [29] "Marne La Vallee" "Marseille St Charles"
## [31] "Metz" "Montpellier"
## [33] "Mulhouse Ville" "Nancy"
## [35] "Nantes" "Nice Ville"
## [37] "Nimes" "Paris Est"
## [39] "Paris Lyon" "Paris Montparnasse"
## [41] "Paris Nord" "Paris Vaugirard"
## [43] "Perpignan" "Poitiers"
## [45] "Quimper" "Reims"
## [47] "Rennes" "Saint Etienne Chateaucreux"
## [49] "St Malo" "St Pierre Des Corps"
## [51] "Strasbourg" "Stuttgart"
## [53] "Toulon" "Toulouse Matabiau"
## [55] "Tourcoing" "Tours"
## [57] "Valence Alixan Tgv" "Vannes"
## [59] "Zurich"</code></pre>
</div>
</div>
<div id="visualizing" class="section level2">
<h2>Visualizing</h2>
<p>In graphs, the absolute position of each node is not as relevant as it is with other kinds of visualizations. A very minimal <code>ggplot2</code> theme is set to make it easier to view the plotted graph.</p>
<pre class="r"><code>library(ggplot2)
thm <- theme_minimal() +
theme(
legend.position = "none",
axis.title = element_blank(),
axis.text = element_blank(),
panel.grid = element_blank(),
panel.grid.major = element_blank()
)
theme_set(thm)</code></pre>
<p>To create the plot, start with <code>ggraph()</code> instead of <code>ggplot()</code>. The <code>ggraph</code> package contains <code>geoms</code> that are unique to graph analysis. The package contains <code>geoms</code> to specifically plot nodes, and other <code>geoms</code> for edges.</p>
<p>As a first basic test, the <em>point</em> <code>geom</code> will be used, but instead of calling <code>geom_point()</code>, we call <code>geom_node_point()</code>. The edges are plotted using <code>geom_edge_diagonal()</code>.</p>
<pre class="r"><code>library(ggraph)
graph_routes %>%
ggraph(layout = "kk") +
geom_node_point() +
geom_edge_diagonal() </code></pre>
<p><img src="/post/2019-02-28-intro-to-graph-analysis_files/figure-html/unnamed-chunk-9-1.png" width="672" /></p>
<p>To make it easier to see where each station is placed in this plot, the <code>geom_node_text()</code> is used. Just as with regular <code>geoms</code> in <code>ggplot2</code>, other attributes such as <code>size</code>, <code>color</code>, and <code>alpha</code> can be modified.</p>
<pre class="r"><code>graph_routes %>%
ggraph(layout = "kk") +
geom_node_text(aes(label = label, color = name), size = 3) +
geom_edge_diagonal(color = "gray", alpha = 0.4) </code></pre>
<p><img src="/post/2019-02-28-intro-to-graph-analysis_files/figure-html/unnamed-chunk-10-1.png" width="672" /></p>
</div>
<div id="morphing-time" class="section level2">
<h2>Morphing time!</h2>
<p>The second mental leap was understanding how a graph algorithm is applied. Typically, the output of a model function is a model object, not a data object. With <code>tidygraph</code>, the process begins and ends with a graph table. The steps are these:</p>
<ol style="list-style-type: decimal">
<li>Start with a graph table</li>
<li>Temporarily transform the graph to comply with the model that is requested (<code>morph()</code>)</li>
<li>Add additional transformations to the morphed data using <code>dplyr</code> (optional)</li>
<li>Restore the original graph table, but modified to keep the changes made during the morph</li>
</ol>
<p>By default, the shortest path algorithm defines the “length” of a path as the number of edges between two nodes. There may be multiple routes to get from point A to point B, but the algorithm chooses the one with the fewest “hops”. When <code>weights</code> are supplied, the length becomes the sum of the weights along the path instead. The way to call the algorithm is inside the <code>morph()</code> function. Even though <code>to_shortest_path()</code> is a function in itself, and it is possible to run it without <code>morph()</code>, it is not meant to be used that way. In the example, <em>journey_time</em> is used as <code>weights</code>, so the algorithm looks for the route with the lowest total journey time between the <em>Arras</em> and <em>Nancy</em> stations. The print output of the morphed graph will not be like the original graph table.</p>
<pre class="r"><code>from <- which(stations == "Arras")
to <- which(stations == "Nancy")
shortest <- graph_routes %>%
morph(to_shortest_path, from, to, weights = journey_time)
shortest</code></pre>
<pre><code>## # A tbl_graph temporarily morphed to a shortest path representation
## #
## # Original graph is a directed simple graph with 1 component
## # consisting of 59 nodes and 130 edges</code></pre>
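<p>To see what the <code>weights</code> argument changes, here is a minimal sketch on a hypothetical three-station graph in which the direct edge is slow and the detour is fast. The graph <em>toy</em>, the helper <code>count_path_edges()</code>, and the <em>on_path</em> column are assumptions made for this illustration. They are not part of <code>tidygraph</code> or of the train data.</p>

```r
library(dplyr)
library(tidygraph)

# Hypothetical graph: A -> C directly takes 10, A -> B -> C takes 1 + 1
toy <- as_tbl_graph(tibble(
  from = c("A", "B", "A"),
  to = c("B", "C", "C"),
  journey_time = c(1, 1, 10)
))

# Hypothetical helper: tag the edges of a morphed shortest path,
# restore the full graph, and count how many edges were tagged
count_path_edges <- function(morphed) {
  morphed %>%
    activate(edges) %>%
    mutate(on_path = TRUE) %>%
    unmorph() %>%
    activate(edges) %>%
    as_tibble() %>%
    filter(!is.na(on_path)) %>%
    nrow()
}

# Nodes are numbered in order of first appearance: A = 1, B = 2, C = 3
hops_unweighted <- toy %>%
  morph(to_shortest_path, from = 1, to = 3) %>%
  count_path_edges()

hops_weighted <- toy %>%
  morph(to_shortest_path, from = 1, to = 3, weights = journey_time) %>%
  count_path_edges()

c(unweighted = hops_unweighted, weighted = hops_weighted)
```

<p>Without weights, the path should be the single direct edge. With <em>journey_time</em> as the weight, it should be the two-edge detour through “B”, since its total time is lower.</p>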
<p>It is possible to make more transformations with the use of <code>activate()</code> and <code>dplyr</code> functions. The results can be previewed, or committed back to the original R variable using <code>unmorph()</code>. By default, nodes are active in a morphed graph, so there is no need to set that explicitly.</p>
<pre class="r"><code>shortest %>%
mutate(selected_node = TRUE) %>%
unmorph()</code></pre>
<pre><code>## # A tbl_graph: 59 nodes and 130 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 59 x 4 (active)
## name title label selected_node
## <chr> <chr> <chr> <lgl>
## 1 AIX EN PROVENCE T… Aix En Provence T… "Aix\nEn\nProvence\n… NA
## 2 ANGERS SAINT LAUD Angers Saint Laud "Angers\nSaint\nLaud" NA
## 3 ANGOULEME Angouleme Angouleme NA
## 4 ANNECY Annecy Annecy NA
## 5 ARRAS Arras Arras TRUE
## 6 AVIGNON TGV Avignon Tgv "Avignon\nTgv" NA
## # … with 53 more rows
## #
## # Edge Data: 130 x 3
## from to journey_time
## <int> <int> <dbl>
## 1 1 39 186.
## 2 2 40 97.5
## 3 3 40 146.
## # … with 127 more rows</code></pre>
<p>While it was morphed, only the few nodes that make up the connections between the Arras and Nancy stations were selected. A simple <code>mutate()</code> adds a new variable called <em>selected_node</em>, which tags those nodes with TRUE. The new variable and value are retained once the rest of the nodes are restored via the <code>unmorph()</code> command.</p>
<p>To keep the change, the <em>shortest</em> variable is updated with the changes made to both edges and nodes.</p>
<pre class="r"><code>shortest <- shortest %>%
mutate(selected_node = TRUE) %>%
activate(edges) %>%
mutate(selected_edge = TRUE) %>%
unmorph() </code></pre>
<p>The next step is to coerce each NA into a 1, and the shortest route into a 2. This will allow us to easily re-arrange the order in which the edges are drawn in the plot, ensuring that the route will be drawn on top.</p>
<pre class="r"><code>shortest <- shortest %>%
activate(nodes) %>%
mutate(selected_node = ifelse(is.na(selected_node), 1, 2)) %>%
activate(edges) %>%
mutate(selected_edge = ifelse(is.na(selected_edge), 1, 2)) %>%
arrange(selected_edge)
shortest</code></pre>
<pre><code>## # A tbl_graph: 59 nodes and 130 edges
## #
## # A directed simple graph with 1 component
## #
## # Edge Data: 130 x 4 (active)
## from to journey_time selected_edge
## <int> <int> <dbl> <dbl>
## 1 1 39 186. 1
## 2 2 40 97.5 1
## 3 3 40 146. 1
## 4 4 39 225. 1
## 5 6 39 161. 1
## 6 7 39 358. 1
## # … with 124 more rows
## #
## # Node Data: 59 x 4
## name title label selected_node
## <chr> <chr> <chr> <dbl>
## 1 AIX EN PROVENCE T… Aix En Provence T… "Aix\nEn\nProvence\n… 1
## 2 ANGERS SAINT LAUD Angers Saint Laud "Angers\nSaint\nLaud" 1
## 3 ANGOULEME Angouleme Angouleme 1
## # … with 56 more rows</code></pre>
<p>A simple way to plot the route is to use the <em>selected_</em> variables to modify the <code>alpha</code>. This will highlight the shortest path, without completely removing the other stations. This is a personal design choice, so experimenting with different ways of highlighting the results is always recommended.</p>
<pre class="r"><code>shortest %>%
ggraph(layout = "kk") +
geom_edge_diagonal(aes(alpha = selected_edge), color = "gray") +
geom_node_text(aes(label = label, color = name, alpha = selected_node), size = 3)</code></pre>
<p><img src="/post/2019-02-28-intro-to-graph-analysis_files/figure-html/unnamed-chunk-15-1.png" width="672" /></p>
<p>The <em>selected_</em> fields can also be used in other <code>dplyr</code> functions to analyze the results. For example, to get aggregate information about the trip, <em>selected_edge</em> is used to filter the edges, and then the totals can be calculated. There is no <code>summarise()</code> function for graph tables; this makes sense because a summarized result would no longer be a graph table. Since the end result we seek is a total rather than another graph table, a simple <code>as_tibble()</code> command will coerce the edges into a tibble, which will then allow us to finish the calculation.</p>
<pre class="r"><code>shortest %>%
activate(edges) %>%
filter(selected_edge == 2) %>%
as_tibble() %>%
summarise(
total_stops = n() - 1,
total_time = round(sum(journey_time) / 60)
)</code></pre>
<pre><code>## # A tibble: 1 x 2
## total_stops total_time
## <dbl> <dbl>
## 1 8 23</code></pre>
</div>
<div id="re-using-the-code" class="section level2">
<h2>Re-using the code</h2>
<p>To compile most of the code in a single chunk, here is an example of how to re-run the shortest path for a different pair of stations: from Montpellier to Laval.</p>
<pre class="r"><code>from <- which(stations == "Montpellier")
to <- which(stations == "Laval")
shortest <- graph_routes %>%
morph(to_shortest_path, from, to, weights = journey_time) %>%
mutate(selected_node = TRUE) %>%
activate(edges) %>%
mutate(selected_edge = TRUE) %>%
unmorph() %>%
activate(nodes) %>%
mutate(selected_node = ifelse(is.na(selected_node), 1, 2)) %>%
activate(edges) %>%
mutate(selected_edge = ifelse(is.na(selected_edge), 1, 2)) %>%
arrange(selected_edge)
shortest %>%
ggraph(layout = "kk") +
geom_edge_diagonal(aes(alpha = selected_edge), color = "gray") +
geom_node_text(aes(label = label, color = name, alpha = selected_node), size = 3)</code></pre>
<p><img src="/post/2019-02-28-intro-to-graph-analysis_files/figure-html/unnamed-chunk-17-1.png" width="672" /></p>
<p>Additionally, the same code can be recycled to obtain the summarized trip data.</p>
<pre class="r"><code>shortest %>%
activate(edges) %>%
filter(selected_edge == 2) %>%
as_tibble() %>%
summarise(
total_stops = n() - 1,
total_time = round(sum(journey_time) / 60)
)</code></pre>
<pre><code>## # A tibble: 1 x 2
## total_stops total_time
## <dbl> <dbl>
## 1 3 10</code></pre>
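<p>Since the same steps are repeated for every pair of stations, one option is to wrap them in a function. The sketch below is only one possible way to do this: <code>route_summary()</code> is a hypothetical helper, and <em>toy_graph</em> is a made-up four-station graph used so that the example runs on its own. It assumes the node table has a <em>name</em> column and the edge table has a <em>journey_time</em> column, as in <em>graph_routes</em>.</p>

```r
library(dplyr)
library(tidygraph)

# Hypothetical helper (not part of tidygraph): summarise the quickest
# route between two stations, identified by their node names
route_summary <- function(graph, from_name, to_name) {
  node_names <- graph %>%
    activate(nodes) %>%
    pull(name)
  from_id <- which(node_names == from_name)
  to_id <- which(node_names == to_name)

  graph %>%
    morph(to_shortest_path, from_id, to_id, weights = journey_time) %>%
    activate(edges) %>%
    mutate(selected_edge = TRUE) %>%
    unmorph() %>%
    activate(edges) %>%
    filter(!is.na(selected_edge)) %>%
    as_tibble() %>%
    summarise(
      total_stops = n() - 1,
      total_time = sum(journey_time) # same units as journey_time
    )
}

# Hypothetical toy graph: A -> B -> C -> D (90) beats A -> C -> D (135)
toy_graph <- as_tbl_graph(tibble(
  from = c("A", "B", "A", "C"),
  to = c("B", "C", "C", "D"),
  journey_time = c(30, 45, 120, 15)
))

toy_trip <- route_summary(toy_graph, "A", "D")
toy_trip
```

<p>With the real data, the call would presumably look like <code>route_summary(graph_routes, "MONTPELLIER", "LAVAL")</code>, since the <em>name</em> column holds the upper-case station names.</p>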
</div>
<div id="shiny-app" class="section level2">
<h2>Shiny app</h2>
<p>To see how to use this kind of analysis inside Shiny, please refer to <a href="https://beta.rstudioconnect.com/content/4606/">this application</a>. It lets the user select two stations, and it returns the route, plus the summarized data. The source code is embedded in the app.</p>
</div>
Some R Packages for ROC Curves
https://rviews.rstudio.com/2019/03/01/some-r-packages-for-roc-curves/
Fri, 01 Mar 2019 00:00:00 +0000https://rviews.rstudio.com/2019/03/01/some-r-packages-for-roc-curves/
<p>In a recent <a href="https://rviews.rstudio.com/2019/01/17/roc-curves/">post</a>, I presented some of the theory underlying ROC curves, and outlined the history leading up to their present popularity for characterizing the performance of machine learning models. In this post, I describe how to search CRAN for packages to plot ROC curves, and highlight six useful packages.</p>
<p>Although I began with a few ideas about packages that I wanted to talk about, like <a href="https://cran.r-project.org/package=ROCR">ROCR</a> and <a href="https://cran.r-project.org/package=pROC">pROC</a>, which I have found useful in the past, I decided to use Gábor Csárdi’s relatively new package <a href="https://cran.r-project.org/package=pkgsearch">pkgsearch</a> to search through CRAN and see what’s out there. The <code>pkg_search()</code> function takes a text string as input and uses basic text mining techniques to search all of CRAN. The algorithm searches through package text fields, and produces a score for each package it finds that is weighted by the number of reverse dependencies and downloads.</p>
<pre class="r"><code>library(tidyverse) # for data manipulation
library(dlstats) # for package download stats
library(pkgsearch) # for searching packages</code></pre>
<p>After some trial and error, I settled on the following query, which includes a number of interesting ROC-related packages.</p>
<pre class="r"><code>rocPkg <- pkg_search(query="ROC",size=200)</code></pre>
<p>Then, I narrowed down the field to 46 packages by filtering out orphaned packages and packages with a score less than 190.</p>
<pre class="r"><code>rocPkgShort <- rocPkg %>%
filter(maintainer_name != "ORPHANED", score > 190) %>%
select(score, package, downloads_last_month) %>%
arrange(desc(downloads_last_month))
head(rocPkgShort)</code></pre>
<pre><code>## # A tibble: 6 x 3
## score package downloads_last_month
## <dbl> <chr> <int>
## 1 690. ROCR 56356
## 2 7938. pROC 39584
## 3 1328. PRROC 9058
## 4 833. sROC 4236
## 5 266. hmeasure 1946
## 6 1021. plotROC 1672</code></pre>
<p>To complete the selection process, I did the hard work of browsing the documentation for the packages to pick out what I thought would be generally useful to most data scientists. The following plot uses Guangchuang Yu’s <code>dlstats</code> package to look at the download history for the six packages I selected to profile.</p>
<pre class="r"><code>library(dlstats)
shortList <- c("pROC","precrec","ROCit", "PRROC","ROCR","plotROC")
downloads <- cran_stats(shortList)
ggplot(downloads, aes(end, downloads, group=package, color=package)) +
geom_line() + geom_point(aes(shape=package)) +
scale_y_continuous(trans = 'log2')</code></pre>
<p><img src="/post/2019-02-08-some-r-packages-for-roc-curves_files/figure-html/unnamed-chunk-5-1.png" width="672" /></p>
<div id="rocr---2005" class="section level3">
<h3><a href="https://cran.r-project.org/package=ROCR">ROCR</a> - 2005</h3>
<p>ROCR has been around for almost 14 years, and has been a rock-solid workhorse for drawing ROC curves. I particularly like the way the <code>performance()</code> function has you set up calculation of the curve by entering the true positive rate, <code>tpr</code>, and false positive rate, <code>fpr</code>, parameters. Not only is this reassuringly transparent, it shows the flexibility to calculate nearly every performance measure for a <a href="https://en.wikipedia.org/wiki/Binary_classification">binary classifier</a> by entering the appropriate parameter. For example, to produce a precision-recall curve, you would enter <code>prec</code> and <code>rec</code>. Although there is no vignette, the documentation of the package is very good.</p>
<p>The following code sets up and plots the default <code>ROCR</code> ROC curve using a synthetic data set that comes with the package. I will use this same data set throughout this post.</p>
<pre class="r"><code>library(ROCR)</code></pre>
<pre><code>## Loading required package: gplots</code></pre>
<pre><code>##
## Attaching package: 'gplots'</code></pre>
<pre><code>## The following object is masked from 'package:stats':
##
## lowess</code></pre>
<pre class="r"><code># plot a ROC curve for a single prediction run
# and color the curve according to cutoff.
data(ROCR.simple)
df <- data.frame(ROCR.simple)
pred <- prediction(df$predictions, df$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)</code></pre>
<p><img src="/post/2019-02-08-some-r-packages-for-roc-curves_files/figure-html/unnamed-chunk-6-1.png" width="672" /></p>
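<p>To illustrate the flexibility of <code>performance()</code> mentioned above, the following sketch reuses the same synthetic data to draw a precision-recall curve and to extract the area under the ROC curve. The object names <em>pr_perf</em>, <em>auc_perf</em>, and <em>auc_value</em> are chosen for this example only.</p>

```r
library(ROCR)

data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)

# Precision-recall curve: precision on the y-axis, recall on the x-axis
pr_perf <- performance(pred, measure = "prec", x.measure = "rec")
plot(pr_perf)

# Area under the ROC curve; the value is stored in the y.values slot
auc_perf <- performance(pred, measure = "auc")
auc_value <- auc_perf@y.values[[1]]
auc_value
```

<p>Only the measure names change between the ROC curve and the precision-recall curve. The prediction object stays the same.</p>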
</div>
<div id="proc---2010" class="section level3">
<h3><a href="https://CRAN.R-project.org/package=pROC">pROC</a> - 2010</h3>
<p>It is clear from the downloads curve that <code>pROC</code> is also popular with data scientists. I like that it is pretty easy to get confidence intervals for the Area Under the Curve, <code>AUC</code>, on the plot.</p>
<pre class="r"><code>library(pROC)</code></pre>
<pre><code>## Type 'citation("pROC")' for a citation.</code></pre>
<pre><code>##
## Attaching package: 'pROC'</code></pre>
<pre><code>## The following objects are masked from 'package:stats':
##
## cov, smooth, var</code></pre>
<pre class="r"><code>pROC_obj <- roc(df$labels,df$predictions,
smoothed = TRUE,
# arguments for ci
ci=TRUE, ci.alpha=0.9, stratified=FALSE,
# arguments for plot
plot=TRUE, auc.polygon=TRUE, max.auc.polygon=TRUE, grid=TRUE,
print.auc=TRUE, show.thres=TRUE)
sens.ci <- ci.se(pROC_obj)
plot(sens.ci, type="shape", col="lightblue")</code></pre>
<pre><code>## Warning in plot.ci.se(sens.ci, type = "shape", col = "lightblue"): Low
## definition shape.</code></pre>
<pre class="r"><code>plot(sens.ci, type="bars")</code></pre>
<p><img src="/post/2019-02-08-some-r-packages-for-roc-curves_files/figure-html/unnamed-chunk-7-1.png" width="672" /></p>
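<p>If only the numbers are needed rather than the plot, the confidence interval for the AUC can also be requested directly. This is a minimal sketch that assumes the same synthetic data shipped with <code>ROCR</code>. The <code>levels</code> and <code>direction</code> arguments are set explicitly here so that <code>roc()</code> does not have to guess them, and the object names are chosen for this example.</p>

```r
library(pROC)

# Synthetic scores and labels from the ROCR package
data(ROCR.simple, package = "ROCR")

roc_obj <- roc(
  response = ROCR.simple$labels,
  predictor = ROCR.simple$predictions,
  levels = c(0, 1), # 0 = control, 1 = case
  direction = "<"   # controls are expected to score lower than cases
)

# DeLong confidence interval (the default method) for the AUC:
# lower bound, AUC estimate, upper bound
auc_ci <- ci.auc(roc_obj, conf.level = 0.95)
auc_ci
```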
</div>
<div id="prroc---2014" class="section level3">
<h3><a href="https://cran.r-project.org/package=PRROC">PRROC</a> - 2014</h3>
<p>Although not nearly as popular as <code>ROCR</code> and <code>pROC</code>, <code>PRROC</code> seems to be making a bit of a comeback lately. The terminology for the inputs is a bit eclectic, but once you figure that out the <code>roc.curve()</code> function plots a clean ROC curve with minimal fuss. <code>PRROC</code> is really set up to do precision-recall curves as the <a href="https://cran.r-project.org/web/packages/PRROC/vignettes/PRROC.pdf">vignette</a> indicates.</p>
<pre class="r"><code>library(PRROC)
PRROC_obj <- roc.curve(scores.class0 = df$predictions, weights.class0=df$labels,
curve=TRUE)
plot(PRROC_obj)</code></pre>
<p><img src="/post/2019-02-08-some-r-packages-for-roc-curves_files/figure-html/unnamed-chunk-8-1.png" width="672" /></p>
</div>
<div id="plotroc---2014" class="section level3">
<h3><a href="https://CRAN.R-project.org/package=plotROC">plotROC</a> - 2014</h3>
<p><code>plotROC</code> is an excellent choice for drawing ROC curves with <code>ggplot()</code>. My guess is that it appears to enjoy only limited popularity because the documentation uses medical terminology like “disease status” and “markers”. Nevertheless, the documentation, which includes both a <a href="https://cran.r-project.org/web/packages/plotROC/vignettes/examples.html">vignette</a> and a <a href="https://sachsmc.shinyapps.io/plotROC/">Shiny application</a>, is very good.</p>
<p>The package offers a number of feature-rich <code>ggplot()</code> geoms that enable the production of elaborate plots. The following plot contains some styling, and includes <a href="https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Clopper%E2%80%93Pearson_interval">Clopper and Pearson (1934) exact method</a> confidence intervals.</p>
<pre class="r"><code>library(plotROC)
rocplot <- ggplot(df, aes(m = predictions, d = labels))+ geom_roc(n.cuts=20,labels=FALSE)
rocplot + style_roc(theme = theme_grey) + geom_rocci(fill="pink") </code></pre>
<p><img src="/post/2019-02-08-some-r-packages-for-roc-curves_files/figure-html/unnamed-chunk-9-1.png" width="672" /></p>
</div>
<div id="precrec---2015" class="section level3">
<h3><a href="https://cran.r-project.org/package=precrec">precrec</a> - 2015</h3>
<p><code>precrec</code> is another library for plotting ROC and precision-recall curves.</p>
<pre class="r"><code>library(precrec)</code></pre>
<pre><code>##
## Attaching package: 'precrec'</code></pre>
<pre><code>## The following object is masked from 'package:pROC':
##
## auc</code></pre>
<pre class="r"><code>precrec_obj <- evalmod(scores = df$predictions, labels = df$labels)
autoplot(precrec_obj)</code></pre>
<p><img src="/post/2019-02-08-some-r-packages-for-roc-curves_files/figure-html/unnamed-chunk-10-1.png" width="672" /></p>
<p>Parameter options for the <code>evalmod()</code> function make it easy to produce basic plots of various model features.</p>
<pre class="r"><code>precrec_obj2 <- evalmod(scores = df$predictions, labels = df$labels, mode="basic")
autoplot(precrec_obj2) </code></pre>
<p><img src="/post/2019-02-08-some-r-packages-for-roc-curves_files/figure-html/unnamed-chunk-11-1.png" width="672" /></p>
</div>
<div id="rocit---2019" class="section level3">
<h3><a href="https://cran.r-project.org/package=ROCit">ROCit</a> - 2019</h3>
<p><code>ROCit</code> is a new package for plotting ROC curves and other binary classification visualizations that rocketed onto the scene in January, and is climbing quickly in popularity. I would never have discovered it if I had automatically filtered my original search by downloads. The default plot includes the location of <a href="https://en.wikipedia.org/wiki/Youden%27s_J_statistic">Youden’s J statistic</a>.</p>
<pre class="r"><code>library(ROCit)</code></pre>
<pre><code>## Warning: package 'ROCit' was built under R version 3.5.2</code></pre>
<pre class="r"><code>ROCit_obj <- rocit(score=df$predictions,class=df$labels)
plot(ROCit_obj)</code></pre>
<p><img src="/post/2019-02-08-some-r-packages-for-roc-curves_files/figure-html/unnamed-chunk-12-1.png" width="672" /></p>
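<p>Youden’s J statistic is sensitivity + specificity - 1, which is the same as the true positive rate minus the false positive rate, and the point marked on the plot is the cutoff where this difference is largest. The sketch below recomputes it by hand. It assumes that the object returned by <code>rocit()</code> exposes <code>TPR</code>, <code>FPR</code>, and <code>Cutoff</code> components, as described in the package documentation. The other names are chosen for this example.</p>

```r
library(ROCit)

# Synthetic scores and labels from the ROCR package
data(ROCR.simple, package = "ROCR")

roc_obj <- rocit(
  score = ROCR.simple$predictions,
  class = ROCR.simple$labels
)

# Youden's J at every cutoff: TPR - FPR
j_values <- roc_obj$TPR - roc_obj$FPR

# Largest J and the cutoff at which it occurs
best <- which.max(j_values)
youden_j <- j_values[best]
best_cutoff <- roc_obj$Cutoff[best]

c(youden_j = youden_j, cutoff = best_cutoff)
```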
<p>Several other visualizations are possible. The following plot shows the cumulative densities of the positive and negative responses. The KS statistic shows the maximum distance between the two curves.</p>
<pre class="r"><code>ksplot(ROCit_obj)</code></pre>
<p><img src="/post/2019-02-08-some-r-packages-for-roc-curves_files/figure-html/unnamed-chunk-13-1.png" width="672" /></p>
<p>In this attempt to dig into CRAN and uncover some of the resources R contains for plotting ROC curves and other binary classifier visualizations, I have only scratched the surface. Moreover, I have deliberately ignored the many packages available for specialized applications, such as <a href="https://cran.r-project.org/package=survivalROC">survivalROC</a> for computing time-dependent ROC curves from censored survival data, and <a href="https://cran.r-project.org/web/packages/cvAUC/index.html">cvAUC</a>, which contains functions for evaluating cross-validated AUC measures. Nevertheless, I hope that this little exercise will help you find what you are looking for.</p>
</div>
January 2019: “Top 40” New CRAN Packages
https://rviews.rstudio.com/2019/02/25/january-2019-top-40-new-cran-packages/
Mon, 25 Feb 2019 00:00:00 +0000https://rviews.rstudio.com/2019/02/25/january-2019-top-40-new-cran-packages/
<p>One hundred and fifty-three new packages made it to CRAN in January. Here are my “Top 40” picks in eight categories: Computational Methods, Data, Machine Learning, Medicine, Science, Statistics, Utilities, and Visualization.</p>
<h3 id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=cPCG">cPCG</a> v1.0: Provides a function to solve systems of linear equations using a (preconditioned) conjugate gradient algorithm. The <a href="https://cran.r-project.org/web/packages/cPCG/vignettes/cpcg-intro.html">vignette</a> shows how to use the package.</p>
<p><a href="https://cran.r-project.org/package=RcppDynProg">RcppDynProg</a> v0.1.1: Implements dynamic programming using <code>Rcpp</code>. Look <a href="https://winvector.github.io/RcppDynProg/">here</a> for examples.</p>
<p><img src="/post/2019-02-22-JanTop40_files/RcppDynProg.png" height = "400" width="600"></p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=cimir">cimir</a> v0.1-0: Provides functions to connect to the California Irrigation Management Information System (CIMIS) <a href="https://cimis.water.ca.gov/">Web API</a>. See the <a href="https://cran.r-project.org/web/packages/cimir/vignettes/quickstart.html">Quick Start</a> for details.</p>
<p><a href="https://cran.r-project.org/package=ecmwfr">ecmwfr</a> v1.1.0: Provides a programmatic interface to the European Centre for Medium-Range Weather Forecasts’ public and restricted dataset web services <a href="https://www.ecmwf.int/">ECMWF</a>, as well as Copernicus’s Climate Data Store <a href="https://cds.climate.copernicus.eu">CDS</a>, allowing users to download weather forecasts and climate data. There are vignettes for both <a href="https://cran.r-project.org/web/packages/ecmwfr/vignettes/cds_vignette.html">CDS</a> and <a href="https://cran.r-project.org/web/packages/ecmwfr/vignettes/webapi_vignette.html">ECMWFR</a>.</p>
<p><img src="/post/2019-02-22-JanTop40_files/ecmwfr.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=germanpolls">germanpolls</a> v0.2: Provides functions to extract data from <a href="http://www.wahlrecht.de/">Wahlrecht.de</a>.</p>
<p><a href="https://cran.r-project.org/package=nhdR">nhdR</a> v0.5.1: Provides tools for working with the National Hydrography Dataset, with functions for querying, downloading, and networking both the <a href="https://www.usgs.gov/core-science-systems/ngp/national-hydrography">NHD</a> and <a href="http://www.horizon-systems.com/nhdplus">NHDPlus</a> datasets. There are vignettes for <a href="https://cran.r-project.org/web/packages/nhdR/vignettes/demo.html">Creating Simple Maps</a> and <a href="https://cran.r-project.org/web/packages/nhdR/vignettes/flow.html">Querying Flow Information</a>, as well as an <a href="https://cran.r-project.org/web/packages/nhdR/vignettes/network.html">example</a>.</p>
<p><img src="/post/2019-02-22-JanTop40_files/nhdr.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=snotelr">snotelr</a> v1.0.1: Provides a programmatic interface to the <a href="https://www.wcc.nrcs.usda.gov/snow/">SNOTEL</a> snow data. See the <a href="https://cran.r-project.org/web/packages/snotelr/vignettes/snotelr-vignette.html">vignette</a> for information.</p>
<p><a href="https://cran.r-project.org/package=wdpar">wdpar</a> v0.0.2: Provides an interface to the World Database on Protected Areas (WDPA). Data is obtained from <a href="http://protectedplanet.net">Protected Planet</a>. See the <a href="https://cran.r-project.org/web/packages/wdpar/readme/README.html">README</a> for information.</p>
<p><img src="/post/2019-02-22-JanTop40_files/wdpar.png" height = "200" width="400"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=analysisPipelines">analysisPipelines</a> v1.0.0: Implements an R interface that enables data scientists to compose inter-operable pipelines between R, Spark, and Python for data manipulation, exploratory analysis, modeling, and reporting. There are vignettes for <a href="https://cran.r-project.org/web/packages/analysisPipelines/vignettes/Analysis_pipelines_for_working_with_Python_functions.html">Python Functions</a>, <a href="https://cran.r-project.org/web/packages/analysisPipelines/vignettes/Analysis_pipelines_for_working_with_R_dataframes.html">R data frames</a>, <a href="https://cran.r-project.org/web/packages/analysisPipelines/vignettes/Analysis_pipelines_for_working_with_sparkR.html">Spark data frames</a>, <a href="https://cran.r-project.org/web/packages/analysisPipelines/vignettes/Interoperable_Pipelines.html">Interoperable Pipelines</a>, <a href="https://cran.r-project.org/web/packages/analysisPipelines/vignettes/Meta_Pipelines.html">Meta-pipelines</a>, <a href="https://cran.r-project.org/web/packages/analysisPipelines/vignettes/Streaming_pipelines_for_working_Apache_Spark_Structured_Streaming.html">Streaming Analysis Pipelines</a>, and <a href="https://cran.r-project.org/web/packages/analysisPipelines/vignettes/Using_pipelines_inside_shiny_widgets.html">Using Pipelines inside Shiny Widgets</a>.</p>
<p><a href="https://cran.r-project.org/package=bender">bender</a> v0.1.1: Implements an R client for <a href="https://bender.dreem.com">Bender Hyperparameters optimizer</a>.</p>
<p><a href="https://cran.r-project.org/package=FiRE">FiRE</a> v1.0: Implements an algorithm to find outliers and rare entities in voluminous datasets. Look <a href="https://github.com/princethewinner/FiRE">here</a> for information.</p>
<p><img src="/post/2019-02-22-JanTop40_files/FiRE.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=foto">foto</a> v1.0.0: Implements the Fourier Transform Textural Ordination method, which uses a principal component analysis on radially averaged, two-dimensional Fourier spectra to characterize image texture. See the <a href="https://cran.r-project.org/web/packages/foto/vignettes/foto-vignette.html">vignette</a> for details.</p>
<p><img src="/post/2019-02-22-JanTop40_files/foto.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=RcppHNSW">RcppHNSW</a> v0.1.0: Provides bindings to the <a href="https://github.com/nmslib/hnswlib">Hnswlib</a> C++ library for Approximate Nearest Neighbors.</p>
<p><a href="https://cran.r-project.org/package=ruimtehol">ruimtehol</a> v0.1.2: Wraps the <a href="https://github.com/facebookresearch/StarSpace">StarSpace library</a>, allowing users to calculate word, sentence, article, document, webpage, link, and entity embeddings. The techniques are explained in detail in <a href="arXiv:1709.03856">Wu et al. (2017)</a>. See the <a href="https://cran.r-project.org/web/packages/ruimtehol/vignettes/ground-control-to-ruimtehol.pdf">vignette</a> for more information.</p>
<p><img src="/post/2019-02-22-JanTop40_files/ruimtehol.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=zoomgrid">zoomgrid</a> v1.0.0: Implements a grid search algorithm with zoom to help solve difficult optimization problems where there are many local optima inside the domain of the target function. Look <a href="https://github.com/yukai-yang/zoomgrid">here</a> for information.</p>
<p><img src="/post/2019-02-22-JanTop40_files/zoomgrid.png" height = "400" width="600"></p>
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=bayesCT">bayesCT</a> v0.99.0: Provides functions to simulate and analyze Bayesian adaptive clinical trials, incorporating historical data and allowing for early stopping. There is an <a href="https://cran.r-project.org/web/packages/bayesCT/vignettes/bayesCT.html">Introduction</a>, and vignettes for <a href="https://cran.r-project.org/web/packages/bayesCT/vignettes/binomial.html">Binomial</a> and <a href="https://cran.r-project.org/web/packages/bayesCT/vignettes/normal.html">Normal</a> outcomes.</p>
<p><a href="https://cran.r-project.org/package=BioMedR">BioMedR</a> v1.1.1: Provides tools for calculating 293 chemical descriptors and 14 kinds of chemical fingerprints, 9920 protein descriptors based on protein sequences, more than 6000 DNA/RNA descriptors from nucleotide sequences, and six types of interaction descriptors. There is a very informative <a href="https://cran.r-project.org/web/packages/BioMedR/vignettes/BioMedR.pdf">vignette</a>.</p>
<p><img src="/post/2019-02-22-JanTop40_files/BioMedR.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=dr4pl">dr4pl</a> v1.1.8: Models the relationship between dose levels and responses in a pharmacological experiment using the 4 Parameter Logistic model, and provides bounds that prevent parameter estimates from diverging. See <a href="doi:10.1016/j.vascn.2014.08.006">Gadagkar and Call (2015)</a> and <a href="doi:10.1371/journal.pone.0146021">Ritz et al. (2015)</a> for background information, and the <a href="https://cran.r-project.org/web/packages/dr4pl/vignettes/walk_through_in_R.html">vignette</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=GMMAT">GMMAT</a> v1.0.3: Provides functions to perform association tests using generalized linear mixed models (GLMMs) in genome-wide association studies (GWAS) and sequencing association studies. See <a href="https://doi.org/10.1016/j.ajhg.2016.02.012">Chen et al. (2016)</a> and <a href="https://doi.org/10.1016/j.ajhg.2018.12.012">Chen et al. (2019)</a> for background information, and the <a href="https://cran.r-project.org/web/packages/GMMAT/vignettes/GMMAT.pdf">vignette</a> for an introduction to the package.</p>
<h3 id="science">Science</h3>
<p><a href="https://cran.r-project.org/package=ethnobotanyR">ethnobotanyR</a> v0.1.4: Implements functions to calculate common quantitative ethnobotany indices to assess the cultural significance of plant species. See <a href="doi:10.1007/s12231-007-9004-5">Tardio and Pardo-de-Santayana (2008)</a> for background information, and the <a href="https://cran.r-project.org/web/packages/ethnobotanyR/vignettes/ethnobotanyr_vignette.html">vignette</a> for information on the package.</p>
<p><img src="/post/2019-02-22-JanTop40_files/ethnobotanyR.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=wsyn">wsyn</a> v1.0.0: Implements tools for a wavelet-based approach to analyzing spatial synchrony, principally in ecological data. The <a href="https://cran.r-project.org/web/packages/wsyn/vignettes/wsynvignette.pdf">vignette</a> gives the details.</p>
<p><img src="/post/2019-02-22-JanTop40_files/wsyn.png" height = "400" width="600"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=apcf">apcf</a> v0.1.2: Implements the adapted pair correlation function, which transfers the concept of the pair correlation function from point patterns to patterns of objects of finite size and irregular shape. This is a re-implementation of the method suggested by <a href="doi:10.1016/j.foreco.2009.09.050">Nuske et al. (2009)</a>. See the <a href="https://cran.r-project.org/web/packages/apcf/vignettes/intro.html">vignette</a> for details.</p>
<p><img src="/post/2019-02-22-JanTop40_files/apcf.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=concurve">concurve</a> v1.0.1: Provides functions to compute confidence (compatibility/consonance) intervals for various statistical tests, along with their corresponding P-values and S-values. Consonance functions allow modelers to determine what effect sizes are compatible with the test model at various compatibility levels. For details, see <a href="doi:10.2105/AJPH.77.2.195">Poole (1987)</a>, <a href="doi:10.1111/1467-9469.00285">Schweder and Hjort (2002)</a>, <a href="arXiv:0708.0976">Singh, Xie, and Strawderman (2007)</a>, and <a href="doi:10.7287/peerj.preprints.26857v4">Amrhein, Trafimow and Greenland (2018)</a>. See the <a href="https://cran.r-project.org/web/packages/concurve/vignettes/examples.html">vignette</a> for examples.</p>
<p><img src="/post/2019-02-22-JanTop40_files/concurve.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=IMaGES">IMaGES</a> v0.1: Provides functions to implement Independent Multiple-sample Greedy Equivalence Search (IMaGES), a causal inference algorithm for creating aggregate graphs and structural equation modeling data for one or more datasets. See <a href="doi:10.1016/j.neuroimage.2009.08.065">Ramsey et al. (2010)</a> for background information. There is a <a href="https://cran.r-project.org/web/packages/IMaGES/vignettes/IMaGES.html">vignette</a>.</p>
<p><img src="/post/2019-02-22-JanTop40_files/IMaGES.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=metamer">metamer</a> v0.1.0: Provides functions to create data with identical statistics (metamers) using an iterative algorithm proposed by <a href="doi:10.1145/3025453.3025912">Matejka & Fitzmaurice (2017)</a>. See <a href="https://cran.r-project.org/web/packages/metamer/readme/README.html">README</a> for help with the package.</p>
<p><img src="/post/2019-02-22-JanTop40_files/metamer.gif" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=mimi">mimi</a> v0.1.0: Implements functions to estimate main effects and interactions in mixed data sets with missing values. Estimation is done through a convex program where main effects are assumed sparse and the interactions low-rank. See <a href="arXiv:1806.09734">Robin et al. (2018)</a>.</p>
<p><a href="https://cran.r-project.org/package=pcLasso">pcLasso</a> v1.1: Implements a method for fitting the entire regularization path of the principal components lasso for linear and logistic regression models. See <a href="arXiv:1810.04651">Tay, Friedman, and Tibshirani (2018)</a> for details and the vignette for an <a href="https://cran.r-project.org/web/packages/pcLasso/vignettes/pcLasso.html">Introduction</a>.</p>
<p><a href="https://cran.r-project.org/package=qrandom">qrandom</a> v1.1: Implements an API to the ANU Quantum Random Number Generator, provided by the Australian National University, that generates true random numbers in real time by measuring the quantum fluctuations of the vacuum. The Quantum Random Number Generator is based on the papers by <a href="doi:10.1063/1.3597793">Symul et al. (2011)</a> and <a href="doi:10.1103/PhysRevApplied.3.054004">Haw et al. (2015)</a>. Look <a href="https://qrng.anu.edu.au/index.php">here</a> for live random numbers.</p>
<p><img src="/post/2019-02-22-JanTop40_files/qrandom.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=rstap">rstap</a> v1.0.3: Provides tools for estimating spatial temporal aggregated predictor models with <code>stan</code>. See the <a href="https://cran.r-project.org/web/packages/rstap/vignettes/Introduction.html">vignette</a> for an introduction.</p>
<p><a href="https://cran.r-project.org/package=ROCit">ROCit</a> v1.1.1: Provides functions to calculate and visualize performance measures for binary classifiers. The <a href="https://cran.r-project.org/web/packages/ROCit/vignettes/my-vignette.html">vignette</a> describes the details.</p>
<p><img src="/post/2019-02-22-JanTop40_files/ROCit.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=surveysd">surveysd</a> v1.0.0: Provides functions to calculate point estimates and their standard errors in complex household surveys using bootstrap replicates. A comprehensive description of the methodology can be found <a href="https://statistikat.github.io/surveysd/articles/methodology.html">here</a>.</p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=askpass">askpass</a> v1.1: Provides safe password entry for R, Git, and SSH. Look <a href="https://github.com/jeroen/askpass#readme">here</a> for help.</p>
<p><a href="https://cran.r-project.org/package=logger">logger</a> v0.1: Provides a flexible and extensible way of formatting and delivering log messages with low overhead. There is an <a href="https://cran.r-project.org/web/packages/logger/vignettes/Intro.html">Introduction</a> and vignettes on <a href="https://cran.r-project.org/web/packages/logger/vignettes/anatomy.html">The Anatomy of a Log Request</a>, <a href="https://cran.r-project.org/web/packages/logger/vignettes/customize_logger.html">Format Customization</a>, <a href="https://cran.r-project.org/web/packages/logger/vignettes/migration.html">Migration</a>, <a href="https://cran.r-project.org/web/packages/logger/vignettes/performance.html">Benchmarks</a>, <a href="https://cran.r-project.org/web/packages/logger/vignettes/r_packages.html">Logging</a>, and <a href="https://cran.r-project.org/web/packages/logger/vignettes/write_custom_extensions.html">Extensions</a>.</p>
<p><img src="/post/2019-02-22-JanTop40_files/logger.svg" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=pagedown">pagedown</a> v0.1: Implements tools to use the paged media properties in CSS and the JavaScript library <code>paged.js</code> to split the content of an HTML document into discrete pages. See the <a href="https://cran.r-project.org/web/packages/pagedown/readme/README.html">README</a> for details.</p>
<p><a href="https://cran.r-project.org/package=rmd">rmd</a> v0.1.4: Provides functions to manage multiple R Markdown packages. Look <a href="https://github.com/pzhaonet/rmd">here</a> for information.</p>
<p><a href="https://cran.r-project.org/package=tor">tor</a> v1.1.1: Provides functions to enable users to import multiple files at the same time. See the <a href="https://cran.r-project.org/web/packages/tor/readme/README.html">README</a> for details.</p>
<p><a href="https://cran.r-project.org/package=vitae">vitae</a> v0.1.0: Provides templates and functions to simplify the production and maintenance of curricula vitae. There is an <a href="https://cran.r-project.org/web/packages/vitae/vignettes/vitae.html">Introduction</a> and a <a href="https://cran.r-project.org/web/packages/vitae/vignettes/extending.html">vignette</a> for creating templates.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=gganimate">gganimate</a> v1.0.1: Implements a <code>ggplot2</code>-compatible grammar for creating animations. The <a href="https://cran.r-project.org/web/packages/gganimate/vignettes/gganimate.html">vignette</a> will get you started.</p>
<p><img src="/post/2019-02-22-JanTop40_files/gganimate.gif" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=RIdeogram">RIdeogram</a> v0.1.1: Implements tools to draw SVG (Scalable Vector Graphics) images that visualize and map genome-wide data in ideograms. See the <a href="https://cran.r-project.org/web/packages/RIdeogram/vignettes/RIdeogram.html">vignette</a> for information.</p>
<p><img src="/post/2019-02-22-JanTop40_files/RIdeogram.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=voronoiTreemap">voronoiTreemap</a> v0.2.0: Provides functions to create Voronoi tree maps using the <code>d3.js</code> framework. Look <a href="https://github.com/uRosConf/voronoiTreemap">here</a> for examples.</p>
<p><img src="/post/2019-02-22-JanTop40_files/voronniTreeMap.png" height = "400" width="600"></p>
A Few New R Books
https://rviews.rstudio.com/2019/02/20/a-few-new-books/
Wed, 20 Feb 2019 00:00:00 +0000https://rviews.rstudio.com/2019/02/20/a-few-new-books/
<p><em>Greg Wilson is a data scientist and professional educator at RStudio.</em></p>
<p>As a newcomer to R who prefers to read paper rather than pixels, I’ve been working my way through a more-or-less random selection of relevant books over the past few months. Some have discussed topics that I’m already familiar with in the context of R, while others have introduced me to entirely new subjects. This post describes four of them in brief; I hope to follow up with a second post in a few months as I work through the backlog on my desk.</p>
<p>First up is Sharon Machlis’ <a href="https://www.amazon.ca/dp/1138726915/"><em>Practical R for Mass Communication and Journalism</em></a>, which is based on the author’s workshops for journalists. This book dives straight into doing the kinds of things a busy reporter or news analyst needs to do to meet a 5:00 pm deadline: data cleaning, presentation-quality graphics, and maps take precedence over control flow or the niceties of variable scope. I particularly enjoyed the way each chapter starts with a realistic project and works through what’s needed to build it. People who’ve never programmed before will be a little intimidated by how many packages they need to download if they try to work through the material on their own, but the instructions are clear, and the author’s enthusiasm for her material shines through in every example. (If anyone is working on a similar tutorial for sports data, please let me know - I have more than a few friends it would make very happy.)</p>
<p>In contrast, Chris Beeley and Shitalkumar Sukhdeve’s <a href="https://www.amazon.ca/Web-Application-Development-Using-Shiny/dp/1788993128/"><em>Web Application Development with R Using Shiny</em></a> focuses on a particular tool rather than an industry vertical. It covers exactly what its title promises, step by step, from the basics through custom JavaScript functions and animations to persistent storage. Every example I ran was cleanly written and clearly explained, and it’s clear that the authors have tested their material with real audiences. I particularly appreciated the chapter on code patterns - while I’m still not sure I fully understand when and how to use <code>isolate()</code> and <code>req()</code>, I’m much less confused than I was.</p>
<p>Functional programming has been the next big thing in computing since I was a graduate student in the 1980s. It does finally seem to be getting some traction outside the craft-beer-and-Emacs community, and <a href="https://www.amazon.ca/dp/148422745X/"><em>Functional Programming in R</em></a> by Thomas Mailund looks at how these ideas can be used in R. Mailund writes clearly, and readers who don’t have a background in computer science may find this a gentle way into a complex subject. However, despite the subtitle “Advanced Statistical Programming for Data Science, Analysis and Finance”, there’s nothing particularly statistical or financial about the book’s content. Some parts felt rushed, such as the lightning coverage of point-free programming (which should have had either a detailed exposition or no mention at all), but my biggest complaint about the book is its price: I think $34 for 100 pages is more than most people will want to pay.</p>
<p>Finally, we have Stefano Allesina and Madlen Wilmes’ <a href="https://www.amazon.ca/dp/0691182752/"><em>Computing Skills for Biologists</em></a>. As the subtitle says, this book presents a toolbox that includes Python, Git, LaTeX, and SQL as well as R, and is aimed at graduate students in biology who have just realized that a few hundred megabytes of messy data are standing between them and their thesis. The authors present the basics of each subject clearly and concisely using real-world data analysis examples at every turn. They freely admit in the introduction that coverage will be broad and shallow, but that’s exactly what books like this should aim for, and they hit the bull’s-eye. The book’s only weakness - unfortunately, a significant one - is an almost complete lack of diagrams. There are only six figures in its 400 pages, and none in the material on visualization. I realize that readers who are coding along with the examples will be able to view some plots and charts as they go, but I would urge the authors to include these in a second edition.</p>
<p>R is growing by leaps and bounds, and so is the literature about it. If you have written or read a book on R recently that you think others would be interested in, please <a href="mailto:greg.wilson@rstudio.com">let us know</a> - we’d enjoy checking it out.</p>
<p>Stefano Allesina and Madlen Wilmes: <em><a href="https://www.amazon.ca/dp/0691182752/">Computing Skills for Biologists: A Toolbox</a></em>. Princeton University Press, 978-0691182759.</p>
<p>Chris Beeley and Shitalkumar Sukhdeve: <em><a href="https://www.amazon.ca/Web-Application-Development-Using-Shiny/dp/1788993128/">Web Application Development with R Using Shiny</a></em> (3rd ed.). Packt, 2018, 978-1788993128.</p>
<p>Sharon Machlis: <em><a href="https://www.amazon.ca/dp/1138726915/">Practical R for Mass Communication and Journalism</a></em>. Chapman & Hall/CRC, 2018, 978-1138726918.</p>
<p>Thomas Mailund: <em><a href="https://www.amazon.ca/dp/148422745X/">Functional Programming in R: Advanced Statistical Programming for Data Science, Analysis and Finance</a></em>. Apress, 2017, 978-1484227459.</p>
A Look Back on 2018: Part 2
https://rviews.rstudio.com/2019/02/12/a-look-back-on-2018-part-2/
Tue, 12 Feb 2019 00:00:00 +0000https://rviews.rstudio.com/2019/02/12/a-look-back-on-2018-part-2/
<script src="/rmarkdown-libs/htmlwidgets/htmlwidgets.js"></script>
<script src="/rmarkdown-libs/jquery/jquery.min.js"></script>
<script src="/rmarkdown-libs/proj4js/proj4.js"></script>
<link href="/rmarkdown-libs/highcharts/css/motion.css" rel="stylesheet" />
<script src="/rmarkdown-libs/highcharts/highcharts.js"></script>
<script src="/rmarkdown-libs/highcharts/highcharts-3d.js"></script>
<script src="/rmarkdown-libs/highcharts/highcharts-more.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/stock.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/heatmap.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/treemap.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/annotations.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/boost.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/data.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/drag-panes.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/drilldown.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/funnel.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/item-series.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/offline-exporting.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/overlapping-datalabels.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/parallel-coordinates.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/sankey.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/solid-gauge.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/streamgraph.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/sunburst.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/vector.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/wordcloud.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/xrange.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/exporting.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/export-data.js"></script>
<script src="/rmarkdown-libs/highcharts/maps/modules/map.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/grouped-categories.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/motion.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/multicolor_series.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/reset.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/symbols-extra.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/text-symbols.js"></script>
<script src="/rmarkdown-libs/highchart-binding/highchart.js"></script>
<p>Welcome to the second installment of Reproducible Finance 2019!</p>
<p>In the <a href="http://www.reproduciblefinance.com/2019/01/14/looking-back-on-last-year/">previous post</a>, we looked back on the daily returns for several market sectors in 2018. Today, we’ll continue that theme and look at some summary statistics for 2018, and then extend out to previous years and different ways of visualizing our data. There’s not much heavy computation or even modeling today, but the goal is to lay some foundational code that we could use for different years or buckets of stocks, and to create some exploratory visualizations.</p>
<p>First, let’s load up our packages for the day.</p>
<pre class="r"><code>library(tidyverse)
library(tidyquant)
library(riingo)
library(highcharter)</code></pre>
<p>Next let’s grab our prices and returns for market sectors. We covered this in detail in the <a href="http://www.reproduciblefinance.com/2019/01/14/looking-back-on-last-year/">previous post</a> and I won’t walk through it again, but here is the full code.</p>
<p>Note one change: last time, we imported data and calculated returns for just 2018. Today, I’ll set the start date to <code>start_date = "2007-12-29"</code> and import data for the 11 calendar years from 2008 through 2018. That’s because, in addition to looking at summary statistics in just 2018, we will also look at some stats on a yearly basis from 2008 through 2018.</p>
<p>Here’s the code to import prices and calculate daily returns for our sector ETFs.</p>
<pre class="r"><code>etf_ticker_sector <- tibble(
ticker = c("XLY", "XLP", "XLE",
"XLF", "XLV", "XLI", "XLB",
"XLK", "XLU", "XLRE",
"SPY"),
sector = c("Consumer Discretionary", "Consumer Staples", "Energy",
"Financials", "Health Care", "Industrials", "Materials",
"Information Technology", "Utilities", "Real Estate",
"Market")
)
#riingo_set_token("your API key here")
sector_returns_2008_2018 <-
etf_ticker_sector %>%
pull(ticker) %>%
riingo_prices(.,
start_date = "2007-12-29",
end_date = "2018-12-31") %>%
mutate(date = ymd(date)) %>%
left_join(etf_ticker_sector, by = "ticker") %>%
select(sector, date, adjClose) %>%
group_by(sector) %>%
mutate(daily_return = log(adjClose) - log(lag(adjClose))) %>%
na.omit() </code></pre>
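<p>If the <code>log(adjClose) - log(lag(adjClose))</code> step is new to you, here is a minimal sketch on made-up prices (the two sectors and their values are hypothetical, not pulled from the API). It shows why the first row of each group comes back as <code>NA</code>, which is what <code>na.omit()</code> removes above, and that log returns sum to the log of the total price change.</p>

```r
library(dplyr)

# Hypothetical toy prices for two made-up sectors (not real market data).
toy_prices <- tibble(
  sector   = c("A", "A", "A", "B", "B", "B"),
  adjClose = c(100, 110, 99, 50, 50, 55)
)

toy_returns <- toy_prices %>%
  group_by(sector) %>%
  # lag() works within each group, so the first row of every sector
  # has no previous price and its return is NA.
  mutate(daily_return = log(adjClose) - log(lag(adjClose))) %>%
  ungroup()

# Log returns are additive: summed within a sector they equal
# log(last price / first price).
toy_returns %>%
  group_by(sector) %>%
  summarise(total_log_return = sum(daily_return, na.rm = TRUE))
```

<p>Because the returns are grouped by sector before calling <code>lag()</code>, the last price of one sector never leaks into the first return of the next.</p>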
<p>Let’s take a quick peek at the first observation for each sector by using <code>slice(1)</code>, which will respect our <code>group_by()</code>.</p>
<pre class="r"><code>sector_returns_2008_2018 %>%
group_by(sector) %>%
slice(1)</code></pre>
<pre><code># A tibble: 11 x 4
# Groups: sector [11]
sector date adjClose daily_return
<chr> <date> <dbl> <dbl>
1 Consumer Discretionary 2008-01-02 27.3 -0.0154
2 Consumer Staples 2008-01-02 21.1 -0.0143
3 Energy 2008-01-02 62.5 0.00189
4 Financials 2008-01-02 18.6 -0.0199
5 Health Care 2008-01-02 28.8 -0.0105
6 Industrials 2008-01-02 30.4 -0.0167
7 Information Technology 2008-01-02 21.8 -0.0205
8 Market 2008-01-02 115. -0.00879
9 Materials 2008-01-02 32.2 -0.00964
10 Real Estate 2015-10-09 26.7 -0.00166
11 Utilities 2008-01-02 27.8 -0.00569</code></pre>
<p>This looks good, but I’d like to confirm that we successfully imported prices and calculated returns for each year and for each sector, meaning I want <code>group_by(year, sector)</code> and then <code>slice(1)</code>. Problem is: there’s not currently a column called <code>year</code>.</p>
<p>We can fix that by separating the date column into <code>year</code> and <code>month</code> with the incredibly useful <code>separate()</code> function. We will run <code>separate(date, c("year", "month"), sep = "-", remove = FALSE)</code>. I use <code>remove = FALSE</code> because I want to keep the <code>date</code> column.</p>
<p>It’s not necessary, but for ease of viewing in this post, I’ll peek at just sectors that contain the word “Consumer”, by calling <code>filter(str_detect(sector, "Consumer"))</code>.</p>
<pre class="r"><code>sector_returns_2008_2018 %>%
separate(date, c("year", "month"), sep = "-", remove = FALSE) %>%
group_by(year, sector) %>%
slice(1) %>%
filter(str_detect(sector, "Consumer"))</code></pre>
<pre><code># A tibble: 22 x 6
# Groups: year, sector [22]
sector date year month adjClose daily_return
<chr> <date> <chr> <chr> <dbl> <dbl>
1 Consumer Discretionary 2008-01-02 2008 01 27.3 -0.0154
2 Consumer Staples 2008-01-02 2008 01 21.1 -0.0143
3 Consumer Discretionary 2009-01-02 2009 01 19.5 0.0489
4 Consumer Staples 2009-01-02 2009 01 18.5 0.0141
5 Consumer Discretionary 2010-01-04 2010 01 26.3 0.00770
6 Consumer Staples 2010-01-04 2010 01 20.9 0.00753
7 Consumer Discretionary 2011-01-03 2011 01 33.7 0.0117
8 Consumer Staples 2011-01-03 2011 01 23.6 0.00136
9 Consumer Discretionary 2012-01-03 2012 01 35.6 0.00842
10 Consumer Staples 2012-01-03 2012 01 26.9 -0.000924
# … with 12 more rows</code></pre>
<p>OK, we’ve confirmed that we have prices and returns for our sectors for each year. Those new <code>month</code> and <code>year</code> columns will come in handy later, so let’s go ahead and save them.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon <-
sector_returns_2008_2018 %>%
separate(date, c("year", "month"), sep = "-", remove = FALSE) %>%
group_by(year, sector)</code></pre>
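<p>One detail worth seeing in isolation: a date such as <code>2018-01-02</code> has three pieces, but we only name two new columns, so <code>separate()</code> discards the day (with a warning by default). Here is a small sketch on a made-up data frame; the <code>extra = "drop"</code> argument is my addition to silence that warning and is not part of the code above.</p>

```r
library(tidyr)

# Hypothetical two-row data frame, only to show what separate() returns.
toy_dates <- data.frame(
  date  = as.Date(c("2018-01-02", "2018-12-31")),
  value = c(1, 2)
)

# Three "-"-separated pieces but only two names: extra = "drop" discards
# the day piece quietly instead of warning about it.
toy_split <- separate(toy_dates, date, c("year", "month"),
                      sep = "-", remove = FALSE, extra = "drop")

toy_split

# The new columns are character, not numeric.
class(toy_split$year)
```

<p>Since <code>year</code> comes back as a character column, later filters compare against strings, as in <code>year == "2018"</code>.</p>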
<p>We’re going to look back on several summary statistics for 2018 first: mean daily return, standard deviation, skewness, and kurtosis of daily returns. We will use the <code>summarise()</code> function and then <code>filter(year == "2018")</code> to get our stats for just 2018.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
summarise(avg = mean(daily_return),
stddev = sd(daily_return),
skew = skewness(daily_return),
kurt = kurtosis(daily_return)) %>%
filter(year == "2018")</code></pre>
<pre><code># A tibble: 11 x 6
# Groups: year [1]
year sector avg stddev skew kurt
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 2018 Consumer Discretionary 0.0000629 0.0122 -0.199 2.61
2 2018 Consumer Staples -0.000335 0.00915 -0.664 1.54
3 2018 Energy -0.000801 0.0140 -0.294 1.58
4 2018 Financials -0.000557 0.0124 -0.675 2.47
5 2018 Health Care 0.000243 0.0111 -0.604 2.31
6 2018 Industrials -0.000566 0.0120 -0.717 2.08
7 2018 Information Technology -0.0000667 0.0147 -0.336 1.82
8 2018 Market -0.000186 0.0108 -0.479 3.18
9 2018 Materials -0.000641 0.0121 -0.210 0.861
10 2018 Real Estate -0.0000957 0.0103 -0.548 1.41
11 2018 Utilities 0.000153 0.00956 -0.621 1.82 </code></pre>
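<p>For intuition about the <code>skew</code> and <code>kurt</code> columns, here is a rough sketch of moment-based skewness and excess kurtosis in base R. The function names are hypothetical, and I am assuming the simple population-moment definitions; the <code>skewness()</code> and <code>kurtosis()</code> functions used above may apply different defaults or small-sample corrections, so don’t expect identical digits.</p>

```r
# Hypothetical helper: third central moment scaled by the
# 3/2 power of the second central moment.
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Hypothetical helper: fourth central moment scaled by the squared
# second central moment, minus 3 (the value for a normal distribution).
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

symmetric   <- c(-2, -1, 0, 1, 2)   # made-up, perfectly symmetric
left_tailed <- c(-10, 1, 2, 3, 4)   # made-up, one large negative value

moment_skewness(symmetric)     # zero for symmetric data
moment_skewness(left_tailed)   # negative: the long tail is on the left
excess_kurtosis(symmetric)     # below zero: thinner tails than a normal
```

<p>Read this way, the uniformly negative 2018 skewness values say that large down days outweighed large up days in every sector.</p>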
<p>We can build off that code flow to select just the years 2014 and 2015 with <code>filter(year %in% c("2014", "2015"))</code> and, say, the energy sector with <code>str_detect(sector, "Energy")</code>.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
summarise(avg = mean(daily_return),
stddev = sd(daily_return),
skew = skewness(daily_return),
kurt = kurtosis(daily_return)) %>%
filter(year %in% c("2014", "2015") &
str_detect(sector, "Energy"))</code></pre>
<pre><code># A tibble: 2 x 6
# Groups: year [2]
year sector avg stddev skew kurt
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 2014 Energy -0.000361 0.0117 -0.891 4.43
2 2015 Energy -0.000959 0.0157 0.0157 0.813</code></pre>
<p>Think about how that code flow might be useful in a Shiny application, where we let the end user choose a sector, a year, and possibly which summary stats to calculate and display.</p>
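<p>As a starting point for that idea, here is one way the flow could be wrapped in a function that a Shiny server might call with the user’s inputs. This is only a sketch: <code>sector_stats()</code> and its arguments are hypothetical names, the toy returns are made up, and I’ve limited it to mean and standard deviation to keep it free of extra packages.</p>

```r
library(dplyr)
library(stringr)

# Hypothetical helper: `returns` needs the columns year (character),
# sector, and daily_return, like the data frame built earlier in the post.
sector_stats <- function(returns, years, sector_pattern) {
  returns %>%
    ungroup() %>%
    # Filter first so we only summarise the groups the user asked for.
    filter(year %in% years, str_detect(sector, sector_pattern)) %>%
    group_by(year, sector) %>%
    summarise(avg     = mean(daily_return),
              stddev  = sd(daily_return),
              .groups = "drop")
}

# Made-up returns, only to show the call.
toy <- tibble(
  year         = rep(c("2014", "2015"), each = 4),
  sector       = rep(c("Energy", "Utilities"), times = 4),
  daily_return = c(0.01, 0.00, -0.02, 0.01, 0.03, -0.01, -0.01, 0.02)
)

energy_stats <- sector_stats(toy,
                             years = c("2014", "2015"),
                             sector_pattern = "Energy")
energy_stats
```

<p>In an app, <code>years</code> and <code>sector_pattern</code> would come from input widgets, and the returned tibble could feed either a table or one of the charts below.</p>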
<p>Now let’s do some visualizing.</p>
<p>We’ll start with a column chart, where the height is equal to the sector skewness for the chosen year.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
summarise(avg = mean(daily_return),
stddev = sd(daily_return),
skew = skewness(daily_return),
kurt = kurtosis(daily_return)) %>%
filter(year == "2018") %>%
ggplot(aes(x = sector, y = skew, fill = sector)) +
geom_col(width = .3) +
ylim(-1,1) +
theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1))</code></pre>
<p><img src="/post/2019-02-06-a-look-back-on-2018-part-2_files/figure-html/unnamed-chunk-8-1.png" width="672" /></p>
<p>Here’s the same exact data, except we’ll use a scatter plot where the height of each point is the skewness.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
summarise(avg = mean(daily_return),
stddev = sd(daily_return),
skew = skewness(daily_return),
kurt = kurtosis(daily_return)) %>%
filter(year == "2018") %>%
ggplot(aes(x = sector, y = skew, color = sector)) +
geom_point(size = .8) +
ylim(-1, 1) +
theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1))</code></pre>
<p><img src="/post/2019-02-06-a-look-back-on-2018-part-2_files/figure-html/unnamed-chunk-9-1.png" width="672" /></p>
<p>For both of the charts above, we could change our <code>filter(year == ...)</code> to choose a different year and build a new chart, but instead let’s comment out the year filter altogether, meaning we will chart all years, and then call <code>facet_wrap(~year)</code>.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
summarise(avg = mean(daily_return),
stddev = sd(daily_return),
skew = skewness(daily_return),
kurt = kurtosis(daily_return)) %>%
# filter(year == "2018") %>%
ggplot(aes(x = sector, y = skew, fill = sector)) +
geom_col(width = .5) +
ylim(-1, 1) +
theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1)) +
facet_wrap(~year)</code></pre>
<p><img src="/post/2019-02-06-a-look-back-on-2018-part-2_files/figure-html/unnamed-chunk-10-1.png" width="672" /></p>
<p>This post was originally going to be focused on standard deviation, and not skewness, but there was recently an <a href="https://blog.thinknewfound.com/2019/02/no-pain-no-premium/">excellent piece</a> on the Think New Found blog that discusses skewness and its importance as a risk measure. Definitely worth a read for the risk-return obsessed amongst us. For an R code reference, we covered skewness extensively in this <a href="https://rviews.rstudio.com/2017/12/13/introduction-to-skewness/">previous blog post</a>, and there’s bare code for the calculations on the Reproducible Finance site <a href="http://www.reproduciblefinance.com/code/skewness/">here</a>.</p>
<p>Those ggplots are nice, but let’s take a quick look at how we might do this with <code>highcharter</code>.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
summarise(avg = mean(daily_return),
stddev = sd(daily_return),
skew = skewness(daily_return),
kurt = kurtosis(daily_return)) %>%
filter(year == "2018") %>%
hchart(.,
type = 'column',
hcaes(y = skew,
x = sector,
group = sector)) %>%
hc_title(text = "2018 Sector Skew") %>%
hc_subtitle(text = "by sector") %>%
hc_xAxis(title = list(text = "")) %>%
hc_tooltip(headerFormat = "",
pointFormat = "skewness: {point.y: .4f}% <br>
mean return: {point.avg: .4f}") %>%
hc_yAxis(labels = list(format = "{value}%")) %>%
hc_add_theme(hc_theme_flat()) %>%
hc_exporting(enabled = TRUE) %>%
hc_legend(enabled = FALSE)</code></pre>
<div id="htmlwidget-1" style="width:100%;height:500px;" class="highchart html-widget"></div>
<script type="application/json" data-for="htmlwidget-1">{"x":{"hc_opts":{"title":{"text":"2018 Sector Skew"},"yAxis":{"title":{"text":"skew"},"type":"linear","labels":{"format":"{value}%"}},"credits":{"enabled":false},"exporting":{"enabled":true},"plotOptions":{"series":{"label":{"enabled":false},"turboThreshold":0,"showInLegend":true},"treemap":{"layoutAlgorithm":"squarified"},"scatter":{"marker":{"symbol":"circle"}}},"series":[{"name":"Consumer Discretionary","data":[{"year":"2018","sector":"Consumer Discretionary","avg":6.29212024387683e-05,"stddev":0.0121861445829787,"skew":-0.19944429120813,"kurt":2.60792743137546,"y":-0.19944429120813,"name":"Consumer Discretionary"}],"type":"column"},{"name":"Consumer Staples","data":[{"year":"2018","sector":"Consumer Staples","avg":-0.000335354442936463,"stddev":0.00914691288490999,"skew":-0.663911209508658,"kurt":1.53966192765663,"y":-0.663911209508658,"name":"Consumer Staples"}],"type":"column"},{"name":"Energy","data":[{"year":"2018","sector":"Energy","avg":-0.00080068717401515,"stddev":0.0140431552521789,"skew":-0.293792546325906,"kurt":1.58286282271065,"y":-0.293792546325906,"name":"Energy"}],"type":"column"},{"name":"Financials","data":[{"year":"2018","sector":"Financials","avg":-0.000556801288397995,"stddev":0.0123573811788945,"skew":-0.674658259310695,"kurt":2.47308484717972,"y":-0.674658259310695,"name":"Financials"}],"type":"column"},{"name":"Health Care","data":[{"year":"2018","sector":"Health Care","avg":0.000242798452412498,"stddev":0.0110593533992154,"skew":-0.603718531887702,"kurt":2.30880571887281,"y":-0.603718531887702,"name":"Health Care"}],"type":"column"},{"name":"Industrials","data":[{"year":"2018","sector":"Industrials","avg":-0.000565863327991203,"stddev":0.0120301635082814,"skew":-0.716625623353101,"kurt":2.07901897858249,"y":-0.716625623353101,"name":"Industrials"}],"type":"column"},{"name":"Information Technology","data":[{"year":"2018","sector":"Information 
Technology","avg":-6.66981199800315e-05,"stddev":0.0147197714609053,"skew":-0.336220712047966,"kurt":1.82085736162563,"y":-0.336220712047966,"name":"Information Technology"}],"type":"column"},{"name":"Market","data":[{"year":"2018","sector":"Market","avg":-0.000185823433694747,"stddev":0.0107588740626304,"skew":-0.478926291405256,"kurt":3.17915173743908,"y":-0.478926291405256,"name":"Market"}],"type":"column"},{"name":"Materials","data":[{"year":"2018","sector":"Materials","avg":-0.000641400233181771,"stddev":0.0120768281084307,"skew":-0.209666863078795,"kurt":0.86141220498136,"y":-0.209666863078795,"name":"Materials"}],"type":"column"},{"name":"Real Estate","data":[{"year":"2018","sector":"Real Estate","avg":-9.56835569738823e-05,"stddev":0.0102621792018636,"skew":-0.548410701832081,"kurt":1.41422002727793,"y":-0.548410701832081,"name":"Real Estate"}],"type":"column"},{"name":"Utilities","data":[{"year":"2018","sector":"Utilities","avg":0.000153255979575135,"stddev":0.00956326094648331,"skew":-0.620587742749803,"kurt":1.81712393049461,"y":-0.620587742749803,"name":"Utilities"}],"type":"column"}],"xAxis":{"type":"category","title":{"text":""}},"subtitle":{"text":"by sector"},"tooltip":{"headerFormat":"","pointFormat":"skewness: {point.y: .4f}% <br>\n mean return: {point.avg: .4f}"},"legend":{"enabled":false}},"theme":{"colors":["#f1c40f","#2ecc71","#9b59b6","#e74c3c","#34495e","#3498db","#1abc9c","#f39c12","#d35400"],"chart":{"backgroundColor":"#ECF0F1"},"xAxis":{"gridLineDashStyle":"Dash","gridLineWidth":1,"gridLineColor":"#BDC3C7","lineColor":"#BDC3C7","minorGridLineColor":"#BDC3C7","tickColor":"#BDC3C7","tickWidth":1},"yAxis":{"gridLineDashStyle":"Dash","gridLineColor":"#BDC3C7","lineColor":"#BDC3C7","minorGridLineColor":"#BDC3C7","tickColor":"#BDC3C7","tickWidth":1},"legendBackgroundColor":"rgba(0, 0, 0, 
0.5)","background2":"#505053","dataLabelsColor":"#B0B0B3","textColor":"#34495e","contrastTextColor":"#F0F0F3","maskColor":"rgba(255,255,255,0.3)"},"conf_opts":{"global":{"Date":null,"VMLRadialGradientURL":"http =//code.highcharts.com/list(version)/gfx/vml-radial-gradient.png","canvasToolsURL":"http =//code.highcharts.com/list(version)/modules/canvas-tools.js","getTimezoneOffset":null,"timezoneOffset":0,"useUTC":true},"lang":{"contextButtonTitle":"Chart context menu","decimalPoint":".","downloadJPEG":"Download JPEG image","downloadPDF":"Download PDF document","downloadPNG":"Download PNG image","downloadSVG":"Download SVG vector image","drillUpText":"Back to {series.name}","invalidDate":null,"loading":"Loading...","months":["January","February","March","April","May","June","July","August","September","October","November","December"],"noData":"No data to display","numericSymbols":["k","M","G","T","P","E"],"printChart":"Print chart","resetZoom":"Reset zoom","resetZoomTitle":"Reset zoom level 1:1","shortMonths":["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"thousandsSep":" ","weekdays":["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]}},"type":"chart","fonts":[],"debug":false},"evals":[],"jsHooks":[]}</script>
<p>Hover on the bars and notice that we included the mean return for each sector as well. That’s the beauty of <code>highcharter</code>: we can easily add more data in the tooltip using the <code>hc_tooltip()</code> function. Those skews look pretty daunting, but that’s down to the scale of the y-axis of this chart, which defaults to a max of 0 and a minimum of -.8%. Let’s coerce it to a max of 1 and a min of -1, which is a rough boundary for where we are comfortable with skewness.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
summarise(avg = mean(daily_return),
stddev = sd(daily_return),
skew = skewness(daily_return),
kurt = kurtosis(daily_return)) %>%
filter(year == "2018") %>%
hchart(.,
type = 'column',
hcaes(y = skew,
x = sector,
group = sector)) %>%
hc_title(text = "2018 Skew by Sector") %>%
hc_xAxis(title = list(text = "")) %>%
hc_tooltip(headerFormat = "",
pointFormat = "skewness: {point.y: .4f}% <br>
mean return: {point.avg: .4f}") %>%
hc_yAxis(labels = list(format = "{value}%"),
min = -1,
max =1) %>%
hc_add_theme(hc_theme_flat()) %>%
hc_exporting(enabled = TRUE) %>%
hc_legend(enabled = FALSE)</code></pre>
<div id="htmlwidget-2" style="width:100%;height:500px;" class="highchart html-widget"></div>
<script type="application/json" data-for="htmlwidget-2">{"x":{"hc_opts":{"title":{"text":"2018 Skew by Sector"},"yAxis":{"title":{"text":"skew"},"type":"linear","labels":{"format":"{value}%"},"min":-1,"max":1},"credits":{"enabled":false},"exporting":{"enabled":true},"plotOptions":{"series":{"label":{"enabled":false},"turboThreshold":0,"showInLegend":true},"treemap":{"layoutAlgorithm":"squarified"},"scatter":{"marker":{"symbol":"circle"}}},"series":[{"name":"Consumer Discretionary","data":[{"year":"2018","sector":"Consumer Discretionary","avg":6.29212024387683e-05,"stddev":0.0121861445829787,"skew":-0.19944429120813,"kurt":2.60792743137546,"y":-0.19944429120813,"name":"Consumer Discretionary"}],"type":"column"},{"name":"Consumer Staples","data":[{"year":"2018","sector":"Consumer Staples","avg":-0.000335354442936463,"stddev":0.00914691288490999,"skew":-0.663911209508658,"kurt":1.53966192765663,"y":-0.663911209508658,"name":"Consumer Staples"}],"type":"column"},{"name":"Energy","data":[{"year":"2018","sector":"Energy","avg":-0.00080068717401515,"stddev":0.0140431552521789,"skew":-0.293792546325906,"kurt":1.58286282271065,"y":-0.293792546325906,"name":"Energy"}],"type":"column"},{"name":"Financials","data":[{"year":"2018","sector":"Financials","avg":-0.000556801288397995,"stddev":0.0123573811788945,"skew":-0.674658259310695,"kurt":2.47308484717972,"y":-0.674658259310695,"name":"Financials"}],"type":"column"},{"name":"Health Care","data":[{"year":"2018","sector":"Health Care","avg":0.000242798452412498,"stddev":0.0110593533992154,"skew":-0.603718531887702,"kurt":2.30880571887281,"y":-0.603718531887702,"name":"Health Care"}],"type":"column"},{"name":"Industrials","data":[{"year":"2018","sector":"Industrials","avg":-0.000565863327991203,"stddev":0.0120301635082814,"skew":-0.716625623353101,"kurt":2.07901897858249,"y":-0.716625623353101,"name":"Industrials"}],"type":"column"},{"name":"Information Technology","data":[{"year":"2018","sector":"Information 
Technology","avg":-6.66981199800315e-05,"stddev":0.0147197714609053,"skew":-0.336220712047966,"kurt":1.82085736162563,"y":-0.336220712047966,"name":"Information Technology"}],"type":"column"},{"name":"Market","data":[{"year":"2018","sector":"Market","avg":-0.000185823433694747,"stddev":0.0107588740626304,"skew":-0.478926291405256,"kurt":3.17915173743908,"y":-0.478926291405256,"name":"Market"}],"type":"column"},{"name":"Materials","data":[{"year":"2018","sector":"Materials","avg":-0.000641400233181771,"stddev":0.0120768281084307,"skew":-0.209666863078795,"kurt":0.86141220498136,"y":-0.209666863078795,"name":"Materials"}],"type":"column"},{"name":"Real Estate","data":[{"year":"2018","sector":"Real Estate","avg":-9.56835569738823e-05,"stddev":0.0102621792018636,"skew":-0.548410701832081,"kurt":1.41422002727793,"y":-0.548410701832081,"name":"Real Estate"}],"type":"column"},{"name":"Utilities","data":[{"year":"2018","sector":"Utilities","avg":0.000153255979575135,"stddev":0.00956326094648331,"skew":-0.620587742749803,"kurt":1.81712393049461,"y":-0.620587742749803,"name":"Utilities"}],"type":"column"}],"xAxis":{"type":"category","title":{"text":""}},"tooltip":{"headerFormat":"","pointFormat":"skewness: {point.y: .4f}% <br>\n mean return: {point.avg: .4f}"},"legend":{"enabled":false}},"theme":{"colors":["#f1c40f","#2ecc71","#9b59b6","#e74c3c","#34495e","#3498db","#1abc9c","#f39c12","#d35400"],"chart":{"backgroundColor":"#ECF0F1"},"xAxis":{"gridLineDashStyle":"Dash","gridLineWidth":1,"gridLineColor":"#BDC3C7","lineColor":"#BDC3C7","minorGridLineColor":"#BDC3C7","tickColor":"#BDC3C7","tickWidth":1},"yAxis":{"gridLineDashStyle":"Dash","gridLineColor":"#BDC3C7","lineColor":"#BDC3C7","minorGridLineColor":"#BDC3C7","tickColor":"#BDC3C7","tickWidth":1},"legendBackgroundColor":"rgba(0, 0, 0, 
0.5)","background2":"#505053","dataLabelsColor":"#B0B0B3","textColor":"#34495e","contrastTextColor":"#F0F0F3","maskColor":"rgba(255,255,255,0.3)"},"conf_opts":{"global":{"Date":null,"VMLRadialGradientURL":"http =//code.highcharts.com/list(version)/gfx/vml-radial-gradient.png","canvasToolsURL":"http =//code.highcharts.com/list(version)/modules/canvas-tools.js","getTimezoneOffset":null,"timezoneOffset":0,"useUTC":true},"lang":{"contextButtonTitle":"Chart context menu","decimalPoint":".","downloadJPEG":"Download JPEG image","downloadPDF":"Download PDF document","downloadPNG":"Download PNG image","downloadSVG":"Download SVG vector image","drillUpText":"Back to {series.name}","invalidDate":null,"loading":"Loading...","months":["January","February","March","April","May","June","July","August","September","October","November","December"],"noData":"No data to display","numericSymbols":["k","M","G","T","P","E"],"printChart":"Print chart","resetZoom":"Reset zoom","resetZoomTitle":"Reset zoom level 1:1","shortMonths":["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"thousandsSep":" ","weekdays":["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]}},"type":"chart","fonts":[],"debug":false},"evals":[],"jsHooks":[]}</script>
<p>Let’s explore one more piece of data. After breaking up the date into year and month and looking at the daily returns and skewness, I got to wondering if the minimum daily return for each sector tended to fall in a certain month. There’s no reason it should, but it seems like a trend we might want to parse, or at least have thought about in case we need it.</p>
<p>My first instinct was to use <code>summarise()</code> and get the minimum daily return for each year-sector pair.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
summarise(min_ret = min(daily_return)) %>%
head()</code></pre>
<pre><code># A tibble: 6 x 3
# Groups: year [1]
year sector min_ret
<chr> <chr> <dbl>
1 2008 Consumer Discretionary -0.116
2 2008 Consumer Staples -0.0568
3 2008 Energy -0.160
4 2008 Financials -0.182
5 2008 Health Care -0.0687
6 2008 Industrials -0.0947</code></pre>
<p>The problem with that flow is that <code>summarise()</code> drops our <code>month</code> column, and we would like to preserve it for charting. We’re better off using <code>filter()</code> to keep the rows where <code>daily_return</code> equals <code>min(daily_return)</code>.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
select(-adjClose, -date) %>%
filter(daily_return == min(daily_return)) %>%
group_by(sector) %>%
filter(year == "2008") %>%
head()</code></pre>
<pre><code># A tibble: 6 x 4
# Groups: sector [6]
sector year month daily_return
<chr> <chr> <chr> <dbl>
1 Consumer Discretionary 2008 10 -0.116
2 Consumer Staples 2008 12 -0.0568
3 Energy 2008 10 -0.160
4 Financials 2008 12 -0.182
5 Health Care 2008 10 -0.0687
6 Industrials 2008 10 -0.0947</code></pre>
<p>That’s giving us the same end data for the minimum daily return, but it’s also preserving the <code>month</code> column.</p>
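<p>To make the difference concrete, here is a small sketch on hypothetical toy data (the <code>toy</code> tibble and its values are made up, and it assumes the real tibble is grouped by <code>year</code> and <code>sector</code>, as the output above suggests). It also shows <code>slice_min()</code>, available in <code>dplyr</code> 1.0.0 and later, as another way to keep the whole row at the minimum.</p>
<pre class="r"><code>library(dplyr)

# Hypothetical toy data standing in for the sector returns tibble
toy <- tibble(
  year         = c("2017", "2017", "2017", "2018", "2018", "2018"),
  month        = c("01", "06", "11", "02", "08", "12"),
  sector       = "Energy",
  daily_return = c(-0.01, -0.04, 0.02, -0.03, 0.01, -0.05)
) %>%
  group_by(year, sector)

# summarise() collapses each group to one row and drops month
toy %>% summarise(min_ret = min(daily_return))

# filter() keeps the whole row where the minimum occurs, so month survives
by_filter <- toy %>% filter(daily_return == min(daily_return))

# slice_min() expresses the same idea directly (ties are kept by default)
by_slice <- toy %>% slice_min(daily_return, n = 1)
by_slice</code></pre>
<p>Both <code>by_filter</code> and <code>by_slice</code> hold one row per year-sector group, with the month of the worst day intact.</p>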
<p>Let’s take a quick look to see if any months jump out as frequent holders of the minimum daily return. Note that we’ll need to <code>ungroup()</code> the data before running <code>count(month)</code>.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
filter(daily_return == min(daily_return)) %>%
ungroup() %>%
count(month) </code></pre>
<pre><code># A tibble: 11 x 2
month n
<chr> <int>
1 01 6
2 02 18
3 03 4
4 04 4
5 05 14
6 06 22
7 08 23
8 09 3
9 10 9
10 11 3
11 12 8</code></pre>
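<p>Why the <code>ungroup()</code>? A short sketch on hypothetical toy data shows that <code>count()</code> respects existing groups, so on a tibble still grouped by year and sector it tallies months within each group rather than overall.</p>
<pre class="r"><code>library(dplyr)

# Hypothetical toy data: one minimum-return row per year-sector pair
toy <- tibble(
  year   = c("2017", "2017", "2018", "2018"),
  sector = c("Energy", "Utilities", "Energy", "Utilities"),
  month  = c("06", "06", "06", "08")
) %>%
  group_by(year, sector)

# Still grouped: count() tallies month within each year-sector pair
grouped_counts <- toy %>% count(month)

# Ungrouped: one row per month across all years and sectors
overall_counts <- toy %>% ungroup() %>% count(month)
overall_counts</code></pre>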
<p>Hmmm, months 2, 5, 6 and 8 jump out a bit. Let’s translate those to their actual names using <code>mutate(month = month(date, label = TRUE, abbr = FALSE))</code>.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
filter(daily_return == min(daily_return)) %>%
mutate(month = month(date, label = TRUE, abbr = FALSE)) %>%
ungroup() %>%
count(month)</code></pre>
<pre><code># A tibble: 11 x 2
month n
<ord> <int>
1 January 6
2 February 18
3 March 4
4 April 4
5 May 14
6 June 22
7 August 23
8 September 3
9 October 9
10 November 3
11 December 8</code></pre>
<p>Visualizing these monthly tendencies was a bit more involved than I had anticipated, and that usually means I’ve missed a simpler solution somewhere, but I’ll post my brute force insanity in case it’s helpful to others.</p>
<p>I want to create a chart that looks like this, with months on the x-axis and the minimum daily return for each sector on the y-axis, almost as if we’re trying to see if the minimum daily returns tend to cluster in any months.</p>
<p><img src="/post/2019-02-06-a-look-back-on-2018-part-2_files/figure-html/unnamed-chunk-17-1.png" width="672" /></p>
<p>To create that chart, I want the names of the months on the x-axis, but also in the correct order. If we coerce the numbers to month names ahead of charting, <code>ggplot</code> will put them in alphabetical order, which is not what we want.</p>
<p>To solve that problem, I first created a vector of months.</p>
<pre class="r"><code>months <-
sector_returns_2008_2018_year_mon %>%
mutate(months = month(date, label = TRUE, abbr = FALSE)) %>%
pull(months) %>%
levels() %>%
as.character()
months</code></pre>
<pre><code> [1] "January" "February" "March" "April" "May"
[6] "June" "July" "August" "September" "October"
[11] "November" "December" </code></pre>
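<p>One possibly simpler route, sketched here in base R: the built-in constant <code>month.name</code> holds the same twelve English names, and an ordered factor built on it carries the calendar order with it. The <code>month_chr</code> values are hypothetical stand-ins for the character month codes in our data. Discrete scales in <code>ggplot</code> follow factor levels rather than the alphabet, although unused months are dropped from a discrete axis unless <code>scale_x_discrete(drop = FALSE)</code> is set.</p>
<pre class="r"><code># month.name is a built-in base R constant holding the English month names
month.name

# Hypothetical month codes like those produced by separate()
month_chr <- c("10", "02", "08", "06", "02")

# Ordered factor whose levels follow the calendar, not the alphabet
month_fct <- factor(month.name[as.integer(month_chr)],
                    levels = month.name, ordered = TRUE)

sort(as.character(month_fct))  # alphabetical order
sort(month_fct)                # calendar order
table(month_fct)               # months with no rows are kept as zero counts</code></pre>
<p>In the same spirit, <code>1:12</code> is equivalent to typing out <code>c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)</code> for axis breaks.</p>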
<p>Next comes our usual flow from the sector returns to <code>ggplot</code>, but first we coerce the <code>month</code> column with <code>as.numeric()</code> (when we used <code>separate()</code> before, it created a character column). Then we put month on the x-axis with <code>ggplot(aes(x = month...))</code>. To create the proper labels, we use <code>scale_x_continuous(breaks = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), labels = months)</code> to add 12 breaks and label them with the <code>months</code> vector that we created above.</p>
<pre class="r"><code> sector_returns_2008_2018_year_mon %>%
filter(daily_return == min(daily_return)) %>%
mutate(month = as.numeric(month)) %>%
ggplot(aes(x = month, y = daily_return, color = sector)) +
geom_point() +
scale_x_continuous(breaks = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
labels = months) +
labs(y = "min return", title = "2008 - 2018 Min Returns by Month") +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(hjust = 0.5)) </code></pre>
<p><img src="/post/2019-02-06-a-look-back-on-2018-part-2_files/figure-html/unnamed-chunk-19-1.png" width="672" /></p>
<p>We can facet by sector if we want to break this into pieces.</p>
<pre class="r"><code> sector_returns_2008_2018_year_mon %>%
filter(daily_return == min(daily_return)) %>%
mutate(month = as.numeric(month)) %>%
ggplot(aes(x = month, y = daily_return, color = sector)) +
geom_point() +
scale_x_continuous(breaks = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
labels = months) +
labs(y = "min return", title = "2008 - 2018 Min Returns by Month") +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~sector)</code></pre>
<p><img src="/post/2019-02-06-a-look-back-on-2018-part-2_files/figure-html/unnamed-chunk-20-1.png" width="672" /></p>
<p>Or we can facet by year.</p>
<pre class="r"><code>sector_returns_2008_2018_year_mon %>%
filter(daily_return == min(daily_return)) %>%
mutate(month = as.numeric(month)) %>%
ggplot(aes(x = month, y = daily_return, color = sector)) +
geom_point() +
scale_x_continuous(breaks = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
labels = months) +
labs(y = "min return", title = "2008 - 2018 Min Returns by Month") +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
plot.title = element_text(hjust = 0.5)) +
facet_wrap(~year)</code></pre>
<p><img src="/post/2019-02-06-a-look-back-on-2018-part-2_files/figure-html/unnamed-chunk-21-1.png" width="672" /></p>
<p>Interesting to see that in 2011, each of our sectors had their minimum daily return in the same month.</p>
<p>That’s all for today. Thanks for reading and see you next time!</p>
R for Quantitative Health Sciences: An Interview with Jarrod Dalton
https://rviews.rstudio.com/2019/02/06/r-for-quantitative-health-sciences-an-interview-with-jarrod-dalton/
Wed, 06 Feb 2019 00:00:00 +0000https://rviews.rstudio.com/2019/02/06/r-for-quantitative-health-sciences-an-interview-with-jarrod-dalton/
<p>This interview came about through researching R-based medical applications in preparation for the upcoming <a href="https://r-medicine.com/">R/Medicine</a> conference. When we discovered the impressive number of Shiny-based <a href="http://riskcalc.org:3838/">Risk Calculators</a> developed by the <a href="https://my.clevelandclinic.org/">Cleveland Clinic</a> and implemented in public-facing sites, we wanted to learn more about the influence of R Language in the development of statistical science at this prominent institution. We were fortunate to have Jarrod Dalton of the <a href="https://www.lerner.ccf.org/qhs/">Quantitative Health Sciences</a> Department grant this interview.</p>
<p>Jarrod Dalton, PhD is an assistant staff scientist in the Department of Quantitative Health Sciences and an assistant professor of medicine in the Cleveland Clinic Lerner College of Medicine at Case Western Reserve University in Cleveland, Ohio. (Twitter: @daltonjarrod)</p>
<p><em>JBR: QHS has been a leader in medical statistical research for quite some time. Have recent developments in big data, data science and machine learning changed the nature of your work? Are these trends bringing new tools and new challenges to QHS projects?</em></p>
<p>JD: Yes and no. On one hand, the health care sector has been at least as dynamic over the past 10-20 years as the fields of data science and machine learning. Clinical and biological research is quite diverse, and these trends have only added to its diversity. We work on studies with p>>n, n>>p, and everything in between. The complexity of problems in biomedical research has spurred new methodological innovations by our department, such as random survival forests. Modern machine learning algorithms are amenable to certain types of problems; I’d say that the most impactful manifestations to date of machine learning in medicine are in the fields of radiology and genomics, where diagnostic and prognostic problems are well-defined.</p>
<p>On the other hand, there are unique challenges relating to the application of machine learning algorithms in medicine. Doctors and patients are averse to trusting a model if they don’t understand how and why the prediction is being made. Doctors often have justifiable and either unquantified or unquantifiable reasons for disbelieving predictions on the basis of other clinical information they obtain during the process of care. Predictions inform treatment decisions (or decisions not to administer treatments), and many other issues are involved with optimal clinical decision-making (e.g., physician judgment, patient preferences, cost effectiveness of different therapies, discounting rate for health events that are distant in the future, trade-offs between quality of life and longevity, issues relating to health literacy, numeracy and the communication of risk, and desired degree of participation in the decision-making process on behalf of the patient).</p>
<p>Much of our work has been, and will continue to be, in the clinical trials space, as well as good old biostatistical consulting. We have a department of over 100 people and publish over 400 academic research studies every year, the vast majority of which arise from traditional statistical collaboration.</p>
<p><em>JBR: Ten years ago, a typical medical statistics department might consist of a number of Ph.D. statisticians who did almost no coding supported by a legion of SAS programmers. Have open-source languages such as R and Python changed the way work gets done? Do more statisticians now do their own coding? Do you see a movement away from SAS towards open source tools? Do you see clinicians doing their own analyses with R?</em></p>
<p>JD: We have dedicated consulting teams that are embedded within some of the more research-intensive clinical specialties at our institution, such as heart and vascular, oncology, urology, anesthesiology, neurology, and orthopaedics. Another consulting team, which we call the “alpha-beta team”, is composed of statisticians who allocate their time to the smaller sub-specialties. More recently, we have been successful at establishing externally funded research labs, headed by QHS principal investigators. Each of these teams has their own way of doing things. We are supported by a number of dedicated RStudio servers, as well as a high-performance computing cluster with R and Python capabilities. On the SAS front, our institution has a high-performance Enterprise Miner environment to support both research and business intelligence. All this having been said, roughly speaking, about half of our department uses R and half uses SAS. Some of our researchers in genomics and image analytics use Python, or complex pipelines that incorporate Python, R, and other tools.</p>
<p>I personally have used R since 2002. I have seen the power of open-source software, with R constantly reinventing itself in a variety of ways. I can’t believe that <code>ggplot</code> and <code>plyr</code> are 10 years old. The tidyverse has changed the way I think. This is especially so for the <code>dplyr</code> and <code>purrr</code> packages, which have enabled much greater efficiency and transparency. My team has recently taken advantage of distributed database computing via a <code>dbplyr</code>/Teradata Warehouse stack, using electronic health data from 2.7 million patients.</p>
<p>More and more physician scientists are training with R. Our partner university, Case Western Reserve, has a two-course sequence on data science based in R, and that sequence is a component of several Master’s programs that many of our clinicians pursue. Some of them can code up a storm! Others know just enough to be dangerous.</p>
<p><em>JBR: The number of Shiny-based <a href="http://riskcalc.org:3838/">Risk Calculators</a> implemented on your website is astounding, both in their level of sophistication and in the number of topics covered. What is your goal for this project, and how would you like the calculators to be used? Can you say something about the challenges (both medical and technical) you faced in building these calculators?</em></p>
<p>JD: The goal of the project is to inform clinicians as to our best estimates of predicted outcomes. These predictions have been shown in several studies to be more accurate than clinical judgment or crude decision trees. Ultimately, these more accurate predictions should translate into better medical decision-making, especially with regard to treatment selection. The major challenge occurs up front: working with the clinician to clearly articulate the prediction that is needed – that which would be most helpful for prospective decision-making. Modeling usually goes very well except when the outcome of interest is very rare: those models often turn out to be not very useful clinically because they never predict a high probability for the rare outcome.</p>
<p>From a technical perspective, the challenge is how to make our data insights and predictive models available online. Before R Shiny, our RiskCalc team had tried several web platforms and was not satisfied. We have some very sophisticated models, and those platforms either did not support complex computing algorithms or required a lot of programming effort. Using R Shiny makes the process of converting our models into web applications quick and easy. The next steps for our RiskCalc team are to improve the user interface and collect feedback from the clinicians.</p>
<p><em>JBR: How important is reproducible research to QHS, and what role does R play in building reproducible workflows?</em></p>
<p>JD: We have seen a steady progression toward reproducible research practices in medicine. All clinical trial protocols must now be pre-registered, and proof of adherence to the pre-registered protocol is now a standard requirement of many of the top journals. Somewhat controversially, there has recently been a lot of discussion about a “replication crisis”, particularly in the psychological sciences (but perhaps unfairly so). In any case, the increased focus on replicability has led to an increased need for reproducible research practices.</p>
<p>Nik Krieger and I have recently made an R package, called <a href="https://github.com/NikKrieger/projects"><code>projects</code></a>, that is specifically designed for reproducible academic manuscript development workflows. While the projects package has other features - like the ability to develop and maintain a coauthor database, complete with institutional affiliations, or the ability to automatically generate title pages for manuscripts using the authors’ institutional affiliations - its core functionality is establishing project directories with Markdown templates corresponding to each phase of the academic research pipeline (protocol, data wrangling, analysis, and reporting). Project metadata are stored in a tibble, so that teams can prioritize and strategize directly from the R console.</p>
<p><em>JBR: Do you have any additional thoughts about the use of R in Medicine that you would like to share with our readers?</em></p>
<p>JD: What comes to mind are current challenges in integrating R into production environments in medicine, such as the electronic health record (EHR). EHR systems are not open-source, and there are many vendors. Even for a single vendor, implementations at different institutions may look wildly different from a data perspective. Our EHR system has over 10,000 tables. There are so many challenges to implementing anything in the healthcare space. That may sound pessimistic, but I actually intend to communicate the significant number of opportunities for using R to positively influence the health of populations. We have fantastic clinical partners and champions. We’re always getting better. The work is important and rewarding.</p>
December 2018: “Top 40” New CRAN Packages
https://rviews.rstudio.com/2019/01/30/december-2108-top-40-new-cran-packages/
Wed, 30 Jan 2019 00:00:00 +0000https://rviews.rstudio.com/2019/01/30/december-2108-top-40-new-cran-packages/
<p>By my count, 157 new packages stuck to CRAN in December. Below are my “Top 40” picks in ten categories: Computational Methods, Data, Finance, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities and Visualization. This is the first time I have used the Medicine category. I am pleased that a few packages that appear to have clinical use made the cut. Also noteworthy in this month’s selection is the inclusion of four packages from the Microsoft Azure team (stuffing 41 packages into the “Top 40”), and some eclectic, but fascinating packages in the Science section.</p>
<h3 id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=ar.matrix">ar.matrix</a> v0.1.0: Provides functions that use precision matrices and Choleski factorization to simulate auto-regressive data. The <a href="https://cran.r-project.org/web/packages/ar.matrix/readme/README.html">README</a> offers examples.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/ar.png" height = "400" width="600"></p>
<p><a href="https://CRAN.R-project.org/package=mvp">mvp</a> v1.0-2: Provides functions for the fast symbolic manipulation of polynomials. See the <a href="https://cran.r-project.org/web/packages/mvp/vignettes/mvp.html">vignette</a> and this R Journal <a href="https://journal.r-project.org/archive/2013-1/kahle.pdf">paper</a> for details on how to create this image of the Rosenbrock function.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/mvp.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=pomdp">pomdp</a> v0.9.1: Provides an interface to <a href="http://www.pomdp.org/code/index.html"><code>pomdp-solve</code></a>, a solver for Partially Observable Markov Decision Processes (POMDP). See the <a href="https://cran.r-project.org/web/packages/pomdp/vignettes/POMDP.pdf">vignette</a> for examples.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/pomdp.png" height = "400" width="600"></p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=dbparser">dbparser</a> v1.0.0: Provides a tool for parsing the <a href="http://drugbank.ca">DrugBank</a> XML database. The <a href="https://cran.r-project.org/web/packages/dbparser/vignettes/dbparser.html">vignette</a> shows how to get started.</p>
<p><a href="https://cran.r-project.org/package=rdhs">rdhs</a> v0.6.1: Implements a client querying the <a href="https://api.dhsprogram.com/#/index.html">DHS API</a> to download and manipulate survey datasets and metadata. There are introductions to using <a href="https://cran.r-project.org/web/packages/rdhs/vignettes/introduction.html">rdhs</a> and the <a href="https://cran.r-project.org/web/packages/rdhs/vignettes/client.html">rdhs client</a>, an extended example about <a href="https://cran.r-project.org/web/packages/rdhs/vignettes/anemia.html">Anemia prevalence</a>, and vignettes on <a href="https://cran.r-project.org/web/packages/rdhs/vignettes/country_codes.html">Country Codes</a>, <a href="https://cran.r-project.org/web/packages/rdhs/vignettes/geojson.html">Interacting with the geojson API results</a>, and <a href="https://cran.r-project.org/web/packages/rdhs/vignettes/testing.html">Testing</a>.</p>
<h3 id="finance">Finance</h3>
<p><a href="https://cran.r-project.org/package=optionstrat">optionstrat</a> v1.0.0: Implements the Black-Scholes-Merton option pricing model to calculate key option analytics and graphical analysis of various option strategies. See the <a href="https://cran.r-project.org/web/packages/optionstrat/vignettes/optionstrat_vignette.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=riskParityPortfolio">riskParityPortfolio</a> v0.1.1: Provides functions to design risk parity portfolios for financial investment. In addition to the vanilla formulation, where the risk contributions are perfectly equalized, many other formulations are considered that allow for box constraints and short selling. The package is based on the papers: <a href="doi:10.1109/TSP.2015.2452219">Feng and Palomar (2015)</a>, <a href="doi:10.2139/ssrn.2297383">Spinu (2013)</a>, and <a href="arXiv:1311.4057">Griveau-Billion et al.(2013)</a>. See the <a href="https://cran.r-project.org/web/packages/riskParityPortfolio/vignettes/RiskParityPortfolio.html">vignette</a> for an example.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/riskParityPortfolio.png" height = "400" width="600"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=BTM">BTM</a> v0.2: Provides functions to find <a href="https://github.com/xiaohuiyan/xiaohuiyan.github.io/blob/master/paper/BTM-WWW13.pdf"><code>Biterm</code></a> topics in collections of short texts. In contrast to topic models, which analyze word-document co-occurrence, biterms consist of two words co-occurring in the same short text window.</p>
<p><a href="https://cran.r-project.org/package=ParBayesianOptimization">ParBayesianOptimization</a> v0.0.1: Provides a framework for optimizing Bayesian hyperparameters according to the methods described in <a href="https://arxiv.org/abs/1206.2944">Snoek et al. (2012)</a>. There are vignettes on <a href="https://cran.r-project.org/web/packages/ParBayesianOptimization/vignettes/standardFeatures.html">standard</a> and <a href="https://cran.r-project.org/web/packages/ParBayesianOptimization/vignettes/advancedFeatures.html">advanced</a> features.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/ParB.png" height = "400" width="600"></p>
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=LUCIDus">LUCIDus</a> v0.9.0: Implements the <code>LUCID</code> method to jointly estimate latent unknown clusters/subgroups with integrated data. See the <a href="https://cran.r-project.org/web/packages/LUCIDus/vignettes/LUCIDus-vignette.html">vignette</a> for details.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/LUCID.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=metaRMST">metaRMST</a> v1.0.0: Provides functions that use individual patient-level data to produce a multivariate meta-analysis of randomized controlled trials with the difference in restricted mean survival times ( <a href="https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-13-152">RMSTD</a> ).</p>
<p><a href="https://cran.r-project.org/package=webddx">webddx</a> v0.1.0: Implements a differential-diagnosis generating tool. Given a list of symptoms, the function <code>query_fz</code> queries the <a href="http://www.findzebra.com/">FindZebra</a> website and returns a differential-diagnosis list.</p>
<h3 id="science">Science</h3>
<p><a href="https://cran.r-project.org/package=bioRad">bioRad</a> v0.4.0: Provides functions to extract, visualize, and summarize aerial movements of birds and insects from weather radar data. There is an <a href="https://cran.r-project.org/web/packages/bioRad/vignettes/bioRad.html">Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/bioRad/vignettes/rad_aero_18.html">Exercises</a>.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/bioRad.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=pmd">pmd</a> v0.1.1: Implements the paired mass distance analysis proposed in <a href="doi:10.1016/j.aca.2018.10.062">Yu, Olkowicz and Pawliszyn (2018)</a> for gas/liquid chromatography–mass spectrometry. See the <a href="https://cran.r-project.org/web/packages/pmd/vignettes/globalstd.html">vignette</a> for an introduction.</p>
<p><a href="https://cran.r-project.org/package=tabula">tabula</a> v1.0.0: Provides functions to examine archaeological count data and includes several measures of diversity. There are vignettes on <a href="https://cran.r-project.org/web/packages/tabula/vignettes/diversity.html">Diversity Measures</a>, <a href="https://cran.r-project.org/web/packages/tabula/vignettes/matrix.html">Matrix Classes</a>, and <a href="https://cran.r-project.org/web/packages/tabula/vignettes/seriation.html">Matrix Seriation</a>. This last vignette includes an example reproducing the results of <a href="https://doi.org/10.1016/j.jas.2012.04.040">Peeples and Schachner (2012)</a>.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/tabula.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=traitdataform">traitdataform</a> v0.5.2: Provides functions to assist with handling ecological trait data and applying the Ecological Trait-Data Standard terminology described in <a href="doi:10.1101/328302">Schneider et al. (2018)</a>.</p>
<p><a href="https://cran.r-project.org/package=waterquality">waterquality</a> v0.2.2: Implements over 45 algorithms to develop water quality indices from satellite reflectance imagery. The <a href="https://cran.r-project.org/web/packages/waterquality/vignettes/waterquality_vignette.html">vignette</a> introduces the package.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/waterquality.png" height = "400" width="600"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=areal">areal</a> v0.1.2: Implements areal weighted interpolation with support for multiple variables in a workflow that is compatible with the <code>tidyverse</code> and <code>sf</code> frameworks. There are vignettes on <a href="https://cran.r-project.org/web/packages/areal/vignettes/areal.html">Areal Interpolation</a>, <a href="https://cran.r-project.org/web/packages/areal/vignettes/areal-weighted-interpolation.html">Weighted Areal Interpolation</a>, and <a href="https://cran.r-project.org/web/packages/areal/vignettes/data-preparation.html">Preparing Data for Interpolation</a>.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/areal.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=FLAME">FLAME</a> v1.0.0: Implements the Fast Large-scale Almost Matching Exactly algorithm of <a href="arXiv:1707.06315">Roy et al. (2017)</a> for causal inference. Look at the <a href="https://cran.r-project.org/web/packages/FLAME/readme/README.html">README</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=mistr">mistr</a> v0.0.1: Offers a computational framework for mixture distributions with a focus on composite models. There is an <a href="https://cran.r-project.org/web/packages/mistr/vignettes/mistr-introduction.pdf">Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/mistr/vignettes/mistr-extensions.pdf">Extensions</a>.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/mistr.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=mlergm">mlergm</a> v0.1: Provides functions to estimate exponential-family random graph models for multilevel network data, assuming the multilevel structure is observed. There is a <a href="https://cran.r-project.org/web/packages/mlergm/vignettes/mlergm_tutorial.html">Tutorial</a>.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/mlergm.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=MTLR">MTLR</a> v0.1.0: Implements the Multi-Task Logistic Regression (MTLR) proposed by <a href="https://papers.nips.cc/paper/4210-learning-patient-specific-cancer-survival-distributions-as-a-sequence-of-dependent-regressors">Yu et al. (2011)</a>. See the <a href="https://cran.r-project.org/web/packages/MTLR/vignettes/workflow.html">vignette</a>.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/MTLR.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=multiRDPG">multiRDPG</a> v1.0.1: Provides functions to fit the Multiple Random Dot Product Graph Model and to test whether two networks come from the same distribution. See <a href="arXiv:1811.12172">Nielsen and Witten (2018)</a> for details.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/multiRDPG.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=ocp">ocp</a> v0.1.0: Implements the Bayesian online changepoint detection method of <a href="arXiv:0710.3742">Adams and MacKay (2007)</a> for univariate or multivariate data. Gaussian and Poisson probability models are implemented. The <a href="https://cran.r-project.org/web/packages/ocp/vignettes/introduction.html">vignette</a> provides an introduction.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/ocp.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=probably">probably</a> v0.0.1: Provides tools for post-processing class probability estimates. See the vignettes <a href="https://cran.r-project.org/web/packages/probably/vignettes/where-to-use.html">Where does probability fit in?</a> and <a href="https://cran.r-project.org/web/packages/probably/vignettes/equivocal-zones.html">Equivocal Zones</a>.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/probably.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=smurf">smurf</a> v1.0.0: Implements the SMuRF algorithm of <a href="arXiv:1810.03136">Devriendt et al. (2018)</a> to fit generalized linear models (GLMs) with multiple types of predictors via regularized maximum likelihood. See the package <a href="https://cran.r-project.org/web/packages/smurf/vignettes/smurf.html">Introduction</a>.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/smurf.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=subtee">subtee</a> v0.3-4: Provides functions for naive and adjusted treatment effect estimation for subgroups. Proposes model averaging <a href="doi:10.1002/pst.1796">Bornkamp et al. (2016)</a> and bagging <a href="doi:10.1002/bimj.201500147">Rosenkranz (2016)</a> to address the problem of selection bias in treatment effect estimation for subgroups. There is an <a href="https://cran.r-project.org/web/packages/subtee/vignettes/subtee_package.html">Introduction</a> and vignettes for the <a href="https://cran.r-project.org/web/packages/subtee/vignettes/plotting_functions.html">plot</a> and <a href="https://cran.r-project.org/web/packages/subtee/vignettes/subbuild_function.html">subbuild</a> functions.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/subtee.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=xspliner">xspliner</a> v0.0.2: Provides functions to assist model building using surrogate black-box models to train interpretable spline based, additive models. There are vignettes on <a href="https://cran.r-project.org/web/packages/xspliner/vignettes/xspliner.html">Basic Theory and Usage</a>, <a href="https://cran.r-project.org/web/packages/xspliner/vignettes/automation.html">Automation</a>, <a href="https://cran.r-project.org/web/packages/xspliner/vignettes/discrete.html">Classification</a>, <a href="https://cran.r-project.org/web/packages/xspliner/vignettes/cases.html">Use Cases</a>, <a href="https://cran.r-project.org/web/packages/xspliner/vignettes/graphics.html">Graphics</a>, <a href="https://cran.r-project.org/web/packages/xspliner/vignettes/extras.html">Extra Information</a>, and the <a href="https://cran.r-project.org/web/packages/xspliner/vignettes/methods.html">xspliner Environment</a>.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/xspliner.png" height = "400" width="600"></p>
<h3 id="time-series">Time Series</h3>
<p><a href="https://cran.r-project.org/package=mfbvar">mfbvar</a> v0.4.0: Provides functions for estimating mixed-frequency Bayesian vector autoregressive (VAR) models with Minnesota or steady-state priors as those used by <a href="doi:10.1080/07350015.2014.954707">Schorfheide and Song (2015)</a>, or by <a href="http://uu.diva-portal.org/smash/get/diva2:1260262/FULLTEXT01.pdf">Ankargren, Unosson and Yang (2018)</a>. Look at the <a href="https://github.com/ankargren/mfbvar">GitHub page</a> for an example.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/mfbvar.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=NTS">NTS</a> v1.0.0: Provides functions to simulate, estimate, predict, and identify models for nonlinear time series.</p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=AzureContainers">AzureContainers</a> v1.0.0: Implements an interface to container functionality in Microsoft’s <a href="https://azure.microsoft.com/en-us/overview/containers/"><code>Azure</code></a> cloud that enables users to manage the <code>Azure Container Instance</code>, <code>Azure Container Registry</code>, and <code>Azure Kubernetes Service</code>. There are vignettes on <a href="https://cran.r-project.org/web/packages/AzureContainers/vignettes/vig01_plumber_deploy.html">Plumber model deployment</a> and <a href="https://cran.r-project.org/web/packages/AzureContainers/vignettes/vig02_mmls_deploy.html">Machine Learning server model deployment</a>.</p>
<p><a href="https://cran.r-project.org/package=AzureRMR">AzureRMR</a> v1.0.0: Implements a lightweight interface to the <a href="https://docs.microsoft.com/en-us/rest/api/resources/">Azure Resource Manager</a> REST API. The package exposes classes and methods for <a href="https://searchmicroservices.techtarget.com/definition/OAuth"><code>OAuth</code> authentication</a> and working with subscriptions and resource groups. There is an <a href="https://cran.r-project.org/web/packages/AzureRMR/vignettes/intro.html">Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/AzureRMR/vignettes/extend.html">Extending AzureRMR</a>.</p>
<p><a href="https://cran.r-project.org/package=AzureStor">AzureStor</a> v1.0.0: Provides tools to manage storage in Microsoft’s <a href="https://azure.microsoft.com/services/storage"><code>Azure</code></a> cloud. See the <a href="https://cran.r-project.org/web/packages/AzureStor/vignettes/intro.html">Introduction</a>.</p>
<p><a href="https://cran.r-project.org/package=AzureVM">AzureVM</a> v1.0.0: Implements tools for working with virtual machines and clusters of virtual machines in Microsoft’s <a href="https://azure.microsoft.com/en-us/services/virtual-machines/"><code>Azure</code></a> cloud. See the <a href="https://cran.r-project.org/web/packages/AzureVM/vignettes/intro.html">Introduction</a>.</p>
<p><a href="https://cran.r-project.org/package=cliapp">cliapp</a> v0.1.0: Provides functions that facilitate creating rich command line applications with colors, headings, lists, alerts, progress bars, and custom CSS-based themes. See the <a href="https://cran.r-project.org/web/packages/cliapp/readme/README.html">README</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=projects">projects</a> v0.1.0: Provides a project infrastructure with a focus on manuscript creation. See the <a href="https://cran.r-project.org/web/packages/projects/readme/README.html">README</a> for the conceptual framework and an introduction to the package.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/projects.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=remedy">remedy</a> v0.1.0: Implements an RStudio Addin offering shortcuts for writing in <code>Markdown</code>.</p>
<p><a href="https://cran.r-project.org/package=solartime">solartime</a> v0.0.1: Provides functions for computing sun position and times of sunrise and sunset. The <a href="https://cran.r-project.org/web/packages/solartime/vignettes/overview.html">vignette</a> offers an overview.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://CRAN.R-project.org/package=easyalluvial">easyalluvial</a> v0.1.8: Provides functions to simplify Alluvial plots for visualizing categorical data over multiple dimensions as flows. See <a href="doi:10.1371/journal.pone.0008694">Rosvall and Bergstrom (2010)</a>. See the <a href="https://cran.r-project.org/web/packages/easyalluvial/readme/README.html">README</a> for details.</p>
<p><img src="/post/2019-01-24-Dec2018-NewPkgs_files/easyalluvial.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=spatialwidget">spatialwidget</a> v0.2: Provides functions for converting R objects, such as simple features, into structures suitable for use in <a href="https://cran.r-project.org/package=htmlwidgets"><code>htmlwidgets</code></a> mapping libraries. See the <a href="https://cran.r-project.org/web/packages/spatialwidget/vignettes/spatialwidget.html">vignette</a> for details.</p>
<p><a href="https://cran.r-project.org/package=transformr">transformr</a> v0.1.1: Provides an extensive framework for manipulating the shapes of polygons and paths and can be seen as the spatial brother to the <a href="https://CRAN.R-project.org/package=tweenr">tweenr</a> package. See the <a href="https://cran.r-project.org/web/packages/transformr/readme/README.html">README</a> for details.</p>
<p><img src="https://cran.r-project.org/web/packages/transformr/readme/man/figures/README-unnamed-chunk-5.gif" height = "400" width="600"></p>
<script>window.location.href='https://rviews.rstudio.com/2019/01/30/december-2108-top-40-new-cran-packages/';</script>
Onboard and Offboard Data Manipulation in Flexdashboard
https://rviews.rstudio.com/2019/01/23/onboard-and-offboard-data-manipulation-in-flexdashboard/
Wed, 23 Jan 2019 00:00:00 +0000https://rviews.rstudio.com/2019/01/23/onboard-and-offboard-data-manipulation-in-flexdashboard/
<p><a href="https://csbaonline.org/about/people/staff/harrison-schramm">Harrison Schramm</a> is a Professional Statistician and Non-Resident Senior Fellow at the Center for Strategic and Budgetary Assessments.</p>
<p>The Shiny set of tools, and, by extension, Flexdashboard, give professional analysts tools to rapidly put interactive versions of their work in the hands of clients. Frequently, an end user will interact with data by either uploading or downloading a new set in its entirety (typically from a .csv or other similarly structured source), or do so ‘on the fly’ interactively, using tools like RHandsonTable. What if you want to do both at the same time? That is – what if you want to be able to change data interactively, or completely upload a new database <em>without stopping the current instance</em> or other ‘clunky’ things?</p>
<p>It turns out that you can, and the implementation is relatively straightforward. In this post, our aims are to:
a. Make the reader more familiar with reactivity, particularly with respect to initializing and containerizing variables
b. Show how to manage a single object that can be ‘touched’ by the uploadHandler and rHandsonTable
c. Provide a worked example in Flexdashboard</p>
<p>We need to create an environment where a data object that may be manipulated by functions inside an app is initialized to a pre-loaded set. At runtime, the object may be downloaded to a .csv, manipulated directly in a spreadsheet-like interface, or replaced completely with an upload.</p>
<p>In the following sections, we highlight the working parts of the minimal example (also provided) based on the attributes of different ships in Star Wars.</p>
<h2 id="step-1-initialization">Step 1: Initialization</h2>
<pre><code class="language-r">library(flexdashboard)
library(dplyr)
library(magrittr)
library(rhandsontable)
</code></pre>
<p>After calling the applicable libraries (above), we load a starter dataset so that the application isn’t empty when launched. We also create a <code>reactiveValues</code> object called <code>values</code> that sets the handsontable <code>hot</code> to NULL. This is necessary because the handsontable object is self-referential in the sense that it is both an output and input device.</p>
<pre><code class="language-r">BFL = read.csv("StarterExampleData.csv")
BFL$Exclude = FALSE
values <- reactiveValues(hot = NULL)
</code></pre>
<p>The <strong>main</strong> trick is to hold the R dataframe in a reactive environment that responds to both the handsontable object and the upload handler. This is done by a series of nested <code>if</code> statements, that, in order, make the object equal:</p>
<ul>
<li>the initialization file, if the object is empty (from the previous code chunk),</li>
<li>the uploaded file (from the upload handler, below),</li>
<li>and finally the handsontable object.</li>
</ul>
<p>Note the unglamorous statement <code>read.csv(input$InputBFL$datapath)</code>. This is necessary when reading a file provided by an upload handler. Simply reading the handle attached to the csv will point to the ‘box’, not what is ‘in’ the box.</p>
<p>This is different than the approaches advocated on many forums where the initialization is done in the <code>rhandsontable</code> code. That approach is perfectly valid, but does not offer the flexibility of an uploaded file.</p>
<pre><code class="language-r">BFLD = reactive({
if(is.null(input$hot)){BFL}
else if(!is.null(input$InputBFL)){read.csv(input$InputBFL$datapath)}
else{
hot_to_r(input$hot)
}
})
</code></pre>
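<p>To see the order of precedence without launching an app, the same three-way choice can be sketched as an ordinary function. This is only an illustration: <code>choose_data</code>, its arguments, and the toy ship values are hypothetical and not part of the example app, and <code>hot</code> stands in for a table that has already been converted with <code>hot_to_r</code>.</p>
<pre><code class="language-r"># Hypothetical helper (not part of the app): the same precedence rules as BFLD,
# written as a plain function so they can be tried outside Shiny.
choose_data <- function(hot, upload_path, starter) {
  if (is.null(hot)) {
    starter                  # table not rendered yet: use the starter data
  } else if (!is.null(upload_path)) {
    read.csv(upload_path)    # a file has been uploaded: read it from disk
  } else {
    hot                      # otherwise use the (possibly edited) table
  }
}

# Made-up values, for illustration only
starter <- data.frame(Ship = c("X-wing", "TIE fighter"), Crew = c(1, 1))
edited  <- transform(starter, Crew = c(1, 2))
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(Ship = "Millennium Falcon", Crew = 4), tmp, row.names = FALSE)

choose_data(NULL, NULL, starter)    # returns the starter data
choose_data(edited, NULL, starter)  # returns the edited table
choose_data(edited, tmp, starter)   # returns the uploaded file's contents
</code></pre>
<p>With this ordering, a non-null upload is checked before the table contents, so in this sketch the uploaded file wins whenever both are present.</p>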
<h2 id="step-2-file-handlers">Step 2: File Handlers</h2>
<p>Now that we have set up the reactive structure for the object <code>BFLD</code>, which can be accessed by other functions inside our code, we add the functionality for the upload and download handlers.</p>
<p>The download function consists of two separate items: the <code>downloadButton</code> that triggers the action, and the <code>downloadHandler</code> that writes the file. In Shiny apps, the linkage is explicit. In Flexdashboard, the linkage is made by putting the handler immediately after the button. <em>Note: The handlers do not always work in the RStudio App window; it may be necessary to ‘open in browser’.</em></p>
<pre><code class="language-r,">downloadButton("StarWarsDownload", label = "Star Wars Download")
downloadHandler(
filename = function(){"Star_Wars_Download.csv"},
content = function(file){
write.csv(BFLD(), file)
}
)
</code></pre>
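<p>One detail that may be worth checking in your own app is what happens when a downloaded file is uploaded again. The following small sketch, which uses made-up data and temporary files rather than the app itself, compares the column names after a round trip through <code>write.csv</code> with and without <code>row.names = FALSE</code>.</p>
<pre><code class="language-r"># Made-up data, for illustration only
df <- data.frame(Ship = c("X-wing", "TIE fighter"), Crew = c(1, 1))

f_default <- tempfile(fileext = ".csv")
f_norows  <- tempfile(fileext = ".csv")

write.csv(df, f_default)                    # default: row names are written as a first column
write.csv(df, f_norows, row.names = FALSE)  # row names are left out

names(read.csv(f_default))  # "X" "Ship" "Crew"
names(read.csv(f_norows))   # "Ship" "Crew"
</code></pre>
<p>If the extra <code>X</code> column is unwanted after a download-then-upload cycle, passing <code>row.names = FALSE</code> in the handler is one possible adjustment.</p>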
<p>The upload function is a single block. Here the input File is linked to the reactive variable built in Step 1 by the handle <code>input$InputBFL</code>.</p>
<pre><code class="language-r,">fileInput("InputBFL", "Choose CSV File",
multiple = FALSE,
accept = c("text/csv",
"text/comma-separated-values,text/plain",
".csv"))
</code></pre>
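<p>The ‘box’ that <code>fileInput</code> hands back is a small data frame describing the upload, with the columns <code>name</code>, <code>size</code>, <code>type</code>, and <code>datapath</code>. The sketch below builds a stand-in for that object by hand (the name <code>fake_input</code> and the file contents are invented for illustration) to show why the data has to be read from <code>datapath</code>.</p>
<pre><code class="language-r"># Write a small CSV to a temporary location, as Shiny would do for an upload
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(Ship = "X-wing", Crew = 1), tmp, row.names = FALSE)

# Hand-built stand-in for input$InputBFL (hypothetical values)
fake_input <- data.frame(
  name = "ships.csv",       # file name on the user's machine
  size = file.size(tmp),    # size in bytes
  type = "text/csv",        # MIME type reported by the browser
  datapath = tmp,           # temporary copy that can actually be read
  stringsAsFactors = FALSE
)

# The data frame above only describes the file; the contents come from datapath
uploaded <- read.csv(fake_input$datapath)
uploaded
</code></pre>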
<h2 id="step-3-hands-on-table">Step 3: Hands On Table</h2>
<p>In the final step, we transform the stored reactive object <code>BFLD</code> into a handsontable object, and output. Note that because it is a reactive object, we call it as <code>BFLD()</code> with parentheses, and inside a ‘render’ context.</p>
<pre><code class="language-r,">output$hot = renderRHandsontable({
rhandsontable(BFLD(), height = 550) %>% hot_rows()
})
rHandsontableOutput("hot")
</code></pre>
<h2 id="conclusion">Conclusion</h2>
<p>We hope that this minimal example will help you use both <code>rhandsontable</code> and upload/download handlers in flexdashboard. There should be enough code and explanations here for simpler cases. These code chunks should help you worry less about IO handling in your applications and spend more time on graphics and analysis. Look <a href="https://hschramm.shinyapps.io/UploadDownloadMinEx/">here</a> to see the example working.</p>
<p>While we did not develop it here, it seems that this same construct could be used to add a web-based data source.</p>
<script>window.location.href='https://rviews.rstudio.com/2019/01/23/onboard-and-offboard-data-manipulation-in-flexdashboard/';</script>
ROC Curves
https://rviews.rstudio.com/2019/01/17/roc-curves/
Thu, 17 Jan 2019 00:00:00 +0000https://rviews.rstudio.com/2019/01/17/roc-curves/
<p>I have been thinking about writing a short post on R resources for working with Receiver Operating Characteristic (<a href="https://en.wikipedia.org/wiki/Receiver_operating_characteristic">ROC</a>) curves, but first I thought it would be nice to review the basics. In contrast to the usual (usual for data scientists anyway) machine learning point of view, I’ll frame the topic closer to its historical origins as a portrait of practical decision theory.</p>
<p>ROC curves were invented during WWII to help radar operators decide whether the signal they were getting indicated the presence of an enemy aircraft or was just noise. (<a href="https://web.stanford.edu/~yesavage/ROC%20Slides%20OHara.ppt">O’Hara et al.</a> specifically refer to the Battle of Britain, but I haven’t been able to track that down.)</p>
<p>I am relying on James Egan’s classic text <a href="https://amzn.to/2FgC3BH"><em>Signal Detection Theory and ROC Analysis</em></a> for the basic setup of the problem. It goes something like this: suppose there is an observed quantity (maybe the amplitude of the radar blip), X, that could indicate either the presence of a meaningful signal (e.g. from a <a href="https://en.wikipedia.org/wiki/Messerschmitt_Bf_109">Messerschmitt</a>) embedded in noise, or just noise alone (geese). When viewing X in some small interval of time, we would like to establish a threshold or cutoff value, c, such that if X > c we can be pretty sure we are observing a signal and not just noise. The situation is illustrated in the little animation below.</p>
<pre class="r"><code>library(tidyverse)
library(gganimate) #for animation
library(magick) # to put animations side by side</code></pre>
<p>We model the noise alone as random draws from a N(0,1) distribution and signal plus noise as draws from N(s_mean, s_sd), and we compute two conditional probabilities: the probability of a “Hit”, P(X > c | a signal is present), and the probability of a “False Alarm”, P(X > c | noise only).</p>
<pre class="r"><code>s_mean <- 2 # signal mean
s_sd <- 1.1 # signal standard deviation
x <- seq(-5,5,by=0.01) # range of signal
signal <- rnorm(100000,s_mean,s_sd)
noise <- rnorm(100000,0,1)
PX_n <- 1 - pnorm(x, mean = 0, sd = 1) # P(X > c | noise only) = False alarm rate
PX_sn <- 1 - pnorm(x, mean = s_mean, sd = s_sd) # P(X > c | signal plus noise) = Hit rate</code></pre>
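<p>As a quick check on what these two quantities mean, here is a small sketch for a single cutoff. The cutoff value of 1 and the random seed are arbitrary choices made for this illustration; the exact rates from <code>pnorm</code> are compared with the fraction of simulated draws that exceed the cutoff.</p>
<pre class="r"><code>s_mean <- 2     # signal mean, as above
s_sd <- 1.1     # signal standard deviation, as above
cutoff <- 1     # arbitrary cutoff chosen for illustration

# Exact rates from the normal distribution function
false_alarm <- 1 - pnorm(cutoff, mean = 0, sd = 1)     # P(X > c | noise only)
hit <- 1 - pnorm(cutoff, mean = s_mean, sd = s_sd)     # P(X > c | signal plus noise)

# The same rates estimated from simulated draws
set.seed(1)
signal <- rnorm(100000, s_mean, s_sd)
noise <- rnorm(100000, 0, 1)

c(exact = false_alarm, simulated = mean(noise > cutoff))
c(exact = hit, simulated = mean(signal > cutoff))</code></pre>
<p>With these settings the false alarm rate comes out at about 0.16 and the hit rate at about 0.82, which is one point on the ROC curve.</p>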
<p>We plot these two distributions in the left panel of the animation for different values of the cutoff threshold.</p>
<pre class="r"><code>threshold <- data.frame(val = seq(from = .5, to = s_mean, by = .2))
dist <-
data.frame(signal = signal, noise = noise) %>%
gather(data, value) %>%
ggplot(aes(x = value, fill = data)) +
geom_density(trim = TRUE, alpha = .5) +
ggtitle("Conditional Distributions") +
xlab("observed signal") +
scale_fill_manual(values = c("pink", "blue"))
p1 <- dist + geom_vline(data = threshold, xintercept = threshold$val, color = "red") +
transition_manual(threshold$val)
p1 <- animate(p1)</code></pre>
<p>And, we plot the ROC curve for our detection system in the right panel. Each point in this plot corresponds to one of the cutoff thresholds in the left panel.</p>
<pre class="r"><code>df2 <- data.frame(x, PX_n, PX_sn)
roc <- ggplot(df2) +
xlab("P(X | n)") + ylab("P(X | sn)") +
geom_line(aes(PX_n, PX_sn)) +
geom_abline(slope = 1) +
ggtitle("ROC Curve") +
coord_equal()
q1 <- roc +
geom_point(data = threshold, aes(1-pnorm(val),
1- pnorm(val, mean = s_mean, sd = s_sd)),
color = "red") +
transition_manual(val)
q1 <- animate(q1)</code></pre>
<p>(The slick trick of getting these two animation panels to line up in the same frame is due to a helper function from Thomas Pedersen and Patrick Touche that can be found <a href="https://github.com/thomasp85/gganimate/issues/226">here</a>)</p>
<pre class="r"><code>combine_gifs(p1,q1)</code></pre>
<p><img src="/post/2019-01-06-roc-curves_files/figure-html/unnamed-chunk-6-1.gif" /><!-- --></p>
<p>Notice that as the cutoff line moves further to the right, giving the decision maker a better chance of making a correct decision, the corresponding point moves down the ROC curve towards a lower Hit rate. This illustrates the fundamental tradeoff between hit rate and false alarm rate in the underlying decision problem. For any given problem, a decision algorithm or classifier will live on some ROC curve in false alarm / hit rate space. Improving the hit rate usually comes at the cost of increasing the probability of more false alarms.</p>
<p>The simulation code also lets you vary s_mean, the mean of the signal. Setting this to a large value (maybe 5) will sufficiently separate the signal from the noise, and you will get the kind of perfect looking ROC curve you may be accustomed to seeing produced by your best classification models.</p>
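<p>A rough way to see the effect of s_mean without re-running the animation is to fix the false alarm rate and look at the hit rate it buys. In the sketch below, the 5% false alarm rate and the three signal means are arbitrary values picked for illustration.</p>
<pre class="r"><code>s_sd <- 1.1                   # signal standard deviation, as above
cutoff <- qnorm(0.95)         # cutoff that gives a 5% false alarm rate under N(0,1)
s_means <- c(1, 2, 5)         # illustrative signal means

# Hit rate at that cutoff for each signal mean
hit_rate <- 1 - pnorm(cutoff, mean = s_means, sd = s_sd)
data.frame(s_mean = s_means, hit_rate = round(hit_rate, 3))</code></pre>
<p>The hit rate at a fixed false alarm rate rises with s_mean, and for s_mean = 5 it is very close to one, which is what pushes the ROC curve into the upper left corner.</p>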
<p>The usual practice in machine learning applications is to compute the area under the ROC curve, <a href="https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve">AUC</a>. This has become the “gold standard” for evaluating classifiers. Given a choice between different classification algorithms, data scientists routinely select the classifier with the highest AUC. The intuition behind this is compelling: given that the ROC is always a monotone increasing, concave downward curve, the best possible curve will have an inflection point in the upper left hand corner and an AUC approaching one (All of the area in ROC space).</p>
<p>Unfortunately, the automatic calculation and model selection of the AUC discourages analysis of how the properties and weaknesses of ROC curves may pertain to the problem at hand. Keeping sight of the decision theory point of view may help to protect against the spell of mechanistic thinking encouraged by powerful algorithms. Although automatically selecting a classifier based on the value of the AUC may make good sense most of the time, things can go wrong. For example, it is not uncommon for analysts to interpret AUC as a measure of the accuracy of the classifier. But the AUC is not a measure of accuracy, as a little thought about the decision problem would make clear. The irony here is that there was a time, not too long ago, when people thought it was necessary to argue that the AUC is a better measure than accuracy for evaluating machine learning algorithms. For example, have a look at the <a href="https://www.cse.ust.hk/nevinZhangGroup/readings/yi/Bradley_PR97.pdf">1997 paper</a> by Andrew Bradley where he concludes that <em>“…AUC be used in preference to overall accuracy for ‘single number’ evaluation of machine learning algorithms”.</em></p>
<p>What does the AUC measure? For the binary classification problem of our simple signal processing example, a little calculus will show that the AUC is the probability that a randomly drawn interval with a signal present will produce a higher X value than a randomly drawn interval containing noise alone. See <a href="https://link.springer.com/content/pdf/10.1007%2Fs10994-009-5119-5.pdf"><em>Hand (2009)</em></a>, and the very informative <a href="https://stats.stackexchange.com/questions/180638/how-to-derive-the-probabilistic-interpretation-of-the-auc"><em>StackExchange</em></a> discussion for the math.</p>
<p>Also note that in the paper just cited, Hand examines some of the deficiencies of the AUC. His discussion provides an additional incentive for keeping the decision theory tradeoff in mind when working with ROC curves. Hand concludes:</p>
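<p>One way to make the distinction concrete: accuracy depends on the cutoff and on how often a signal is actually present, while the ROC curve (and so its area) is the same in every case. The sketch below uses the signal model from the simulation; the function name <code>accuracy</code>, the cutoffs, and the prevalence values are assumptions made for this illustration.</p>
<pre class="r"><code>s_mean <- 2
s_sd <- 1.1

# Accuracy of the rule "declare a signal when X > cutoff",
# when a signal is present with probability `prevalence` (illustrative helper)
accuracy <- function(cutoff, prevalence) {
  hit <- 1 - pnorm(cutoff, mean = s_mean, sd = s_sd)
  false_alarm <- 1 - pnorm(cutoff, mean = 0, sd = 1)
  prevalence * hit + (1 - prevalence) * (1 - false_alarm)
}

# Same detector, same ROC curve, three different accuracies
accuracy(cutoff = 1, prevalence = 0.5)
accuracy(cutoff = 1, prevalence = 0.01)
accuracy(cutoff = 3, prevalence = 0.01)</code></pre>
<p>When signals are rare, a high cutoff that misses most of them still scores a high accuracy, simply because nearly every interval contains noise alone.</p>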
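<p>For the normal model used in the simulation, this interpretation can be checked numerically. The sketch below computes the AUC three ways: from a closed form, which assumes the signal and noise draws are independent so that their difference is N(s_mean, 1 + s_sd^2); by simulation; and by trapezoidal integration of the ROC curve. The seed and the grid limits are arbitrary choices for this illustration.</p>
<pre class="r"><code>s_mean <- 2
s_sd <- 1.1

# 1. Closed form: P(signal > noise) = P(signal - noise > 0)
auc_exact <- pnorm(s_mean / sqrt(1 + s_sd^2))

# 2. Monte Carlo: fraction of independent pairs in which the signal draw is larger
set.seed(42)
signal <- rnorm(100000, s_mean, s_sd)
noise <- rnorm(100000, 0, 1)
auc_mc <- mean(signal > noise)

# 3. Trapezoid rule on the ROC curve traced out by a grid of cutoffs
x <- seq(-8, 10, by = 0.01)
false_alarm <- 1 - pnorm(x, mean = 0, sd = 1)
hit <- 1 - pnorm(x, mean = s_mean, sd = s_sd)
# false_alarm decreases as the cutoff increases, so -diff() gives positive widths
auc_trap <- sum(-diff(false_alarm) * (head(hit, -1) + tail(hit, -1)) / 2)

c(exact = auc_exact, monte_carlo = auc_mc, trapezoid = auc_trap)</code></pre>
<p>All three values agree at about 0.91 for these parameters.</p>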
<blockquote>
<p>…it [AUC] is fundamentally incoherent in terms of misclassification costs: the AUC uses different misclassification cost distributions for different classifiers. This means that using the AUC is equivalent to using different metrics to evaluate different classification rules.</p>
</blockquote>
<p>and goes on to propose the <strong>H measure</strong> for ranking classifiers. (See the R package <a href="https://cran.r-project.org/package=hmeasure">hmeasure</a>.) Following up on this will have to be an investigation for another day.</p>
<p>Our discussion in this post has taken us part way along just one path through the enormous literature on ROC curves, which could not be fully explored in a hundred posts. I will just mention that not long after its inception, ROC analysis was used to establish a conceptual framework for problems relating to sensation and perception in the field of psychophysics (<a href="https://psych.nyu.edu/pelli/pubs/pelli1995methods.pdf"><em>Pelli and Farell (1995)</em></a>) and thereafter applied to decision problems in Medical Diagnostics (<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3755824/#B26"><em>Hajian-Tilaki (2013)</em></a>), National Intelligence (<a href="https://www.nap.edu/read/13062/chapter/7"><em>McClelland (2011)</em></a>), and just about any field that collects data to support decision making.</p>
<p>If you are interested in delving deeper into ROC curves, the references in papers mentioned above may help to guide further exploration.</p>
<script>window.location.href='https://rviews.rstudio.com/2019/01/17/roc-curves/';</script>
A Look Back on 2018: Part 1
https://rviews.rstudio.com/2019/01/10/a-look-back-on-2018-part-1/
Thu, 10 Jan 2019 00:00:00 +0000https://rviews.rstudio.com/2019/01/10/a-look-back-on-2018-part-1/
<script src="/rmarkdown-libs/htmlwidgets/htmlwidgets.js"></script>
<script src="/rmarkdown-libs/jquery/jquery.min.js"></script>
<script src="/rmarkdown-libs/proj4js/proj4.js"></script>
<link href="/rmarkdown-libs/highcharts/css/motion.css" rel="stylesheet" />
<script src="/rmarkdown-libs/highcharts/highcharts.js"></script>
<script src="/rmarkdown-libs/highcharts/highcharts-3d.js"></script>
<script src="/rmarkdown-libs/highcharts/highcharts-more.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/stock.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/heatmap.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/treemap.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/annotations.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/boost.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/data.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/drag-panes.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/drilldown.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/funnel.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/item-series.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/offline-exporting.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/overlapping-datalabels.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/parallel-coordinates.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/sankey.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/solid-gauge.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/streamgraph.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/sunburst.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/vector.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/wordcloud.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/xrange.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/exporting.js"></script>
<script src="/rmarkdown-libs/highcharts/modules/export-data.js"></script>
<script src="/rmarkdown-libs/highcharts/maps/modules/map.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/grouped-categories.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/motion.js"></script>
<script src="/rmarkdown-libs/highcharts/plugins/multicolor_series.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/reset.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/symbols-extra.js"></script>
<script src="/rmarkdown-libs/highcharts/custom/text-symbols.js"></script>
<script src="/rmarkdown-libs/highchart-binding/highchart.js"></script>
<p>Welcome to Reproducible Finance 2019! It’s a new year, a new beginning, the Earth has completed one more trip around the sun, and that means it’s time to look back on the previous January to December cycle.</p>
<p>Today and next time, we’ll explore the returns and volatilities of various market sectors in 2018. We might also get into fund flows and explore a new data source because <a href="https://www.ft.com/content/fdc1c064-1142-11e9-a581-4ff78404524e">this fantastic piece</a> from the FT has the wheels turning. So much data, so little time.</p>
<p>Back to the task at hand, today we will grab data on the daily returns of various stock market sector ETFs and build exploratory data visualizations around that data.</p>
<p>From an R code perspective, we will get familiar with a new source for market data (tiingo, which has come up in several conversations recently and seems to be gaining nice traction in the R world), build some ggplots, and dive into <code>highcharter</code> a bit. In that sense, it’s also somewhat of a look back to our previous work because we’ll be stepping through some good ol’ data import, wrangling, and visualization. Some of the code flows might look familiar to long-time readers, but if you’ve joined us recently and haven’t gone back to read the oh-so-invigorating previous posts, this should give a good sense of how we think about working with financial data.</p>
<p>Let’s get to it. We want to import data on 10 sector ETFs and also on SPY, the market ETF. We’ll first need the tickers of each sector ETF:</p>
<pre class="r"><code>ticker = c("XLY", "XLP", "XLE",
"XLF", "XLV", "XLI", "XLB",
"XLK", "XLU", "XLRE",
"SPY")</code></pre>
<p>And our sector labels are:</p>
<pre class="r"><code>sector = c("Consumer Discretionary", "Consumer Staples", "Energy",
"Financials", "Health Care", "Industrials", "Materials",
"Information Technology", "Utilities", "Real Estate",
"Market")</code></pre>
<p>We can use the <code>tibble()</code> function to save those as columns of a new <code>tibble</code>.</p>
<p>First, let’s load up our packages for the day, because we’ll need the <code>tibble</code> package via <code>tidyverse</code>.</p>
<pre class="r"><code>library(tidyverse)
library(tidyquant)
library(riingo)
library(timetk)
library(tibbletime)
library(highcharter)
library(htmltools)</code></pre>
<p>And on to creating a tibble:</p>
<pre class="r"><code>etf_ticker_sector <- tibble(
ticker = c("XLY", "XLP", "XLE",
"XLF", "XLV", "XLI", "XLB",
"XLK", "XLU", "XLRE",
"SPY"),
sector = c("Consumer Discretionary", "Consumer Staples", "Energy",
"Financials", "Health Care", "Industrials", "Materials",
"Information Technology", "Utilities", "Real Estate",
"Market")
)
etf_ticker_sector</code></pre>
<pre><code># A tibble: 11 x 2
ticker sector
<chr> <chr>
1 XLY Consumer Discretionary
2 XLP Consumer Staples
3 XLE Energy
4 XLF Financials
5 XLV Health Care
6 XLI Industrials
7 XLB Materials
8 XLK Information Technology
9 XLU Utilities
10 XLRE Real Estate
11 SPY Market </code></pre>
<p>Now, we want to import the daily prices for 2018 for these tickers. We could use <code>getSymbols()</code> to access Yahoo! Finance as we have done for the last three years, but let’s do something crazy and explore a new data source, the excellent <a href="https://tiingo.com">tiingo</a>, which we access via the <a href="https://cran.r-project.org/web/packages/riingo/riingo.pdf">riingo</a> package. The workhorse function to grab price data is <code>riingo_prices()</code>, to which we need to supply our tickers and a <code>start_date</code>/<code>end_date</code> pair.</p>
<p>Let’s start with the tickers, which we have already saved in the <code>ticker</code> column of <code>etf_ticker_sector</code>. That wasn’t really necessary. We could have just created a vector called <code>tickers_vector</code> by calling <code>tickers_vector = c("ticker1", "ticker2", ...)</code> and then passed that vector straight to <code>riingo_prices</code>. But I didn’t want to do that because I prefer to get my data to a tibble first and, as we’ll see, it will make it easier to add back in our sector labels, since they are aligned with our tickers in one object.</p>
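<p>To make the point about sector labels concrete, here is a minimal sketch of how a ticker/sector table can be joined back onto price data by the shared <code>ticker</code> column. The object names <code>ticker_sector</code>, <code>toy_prices</code>, and <code>toy_labeled</code> are hypothetical, and the prices are made-up placeholder values, not market data.</p>
<pre class="r"><code>library(dplyr)
library(tibble)

# Two-row lookup table in the same shape as etf_ticker_sector
ticker_sector <- tibble(
  ticker = c("XLE", "SPY"),
  sector = c("Energy", "Market")
)

# Placeholder prices: invented values for illustration only
toy_prices <- tibble(
  ticker = c("XLE", "XLE", "SPY", "SPY"),
  close  = c(100, 101, 200, 198)
)

# Match on the shared ticker column to attach each sector label
toy_labeled <- toy_prices %>%
  left_join(ticker_sector, by = "ticker")

toy_labeled</code></pre>
<p>Because the labels live in the same object as the tickers, one <code>left_join()</code> is enough to carry them onto every price row.</p>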
<p>To pass our <code>ticker</code> column to <code>riingo_prices()</code>, we start with our tibble <code>etf_ticker_sector</code> and then pipe it to <code>pull(ticker)</code>. That will create a vector from the <code>ticker</code> column. The <code>pull()</code> function is very useful in these situations where we want to pipe or extract a column as a vector.</p>
<p>Here’s the result of pulling the tickers:</p>
<pre class="r"><code> etf_ticker_sector %>%
pull(ticker)</code></pre>
<pre><code> [1] "XLY" "XLP" "XLE" "XLF" "XLV" "XLI" "XLB" "XLK" "XLU" "XLRE"
[11] "SPY" </code></pre>
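<p>For contrast, here is a small sketch of the difference between <code>pull()</code> and <code>select()</code>, using a hypothetical two-row tibble called <code>toy_tickers</code>. The first returns a plain character vector, which is the form we want here, while the second keeps the column wrapped in a tibble.</p>
<pre class="r"><code>library(dplyr)
library(tibble)

# Hypothetical two-row tibble, for illustration only
toy_tickers <- tibble(ticker = c("XLE", "SPY"))

# pull() extracts the column as a bare character vector
ticker_vector <- toy_tickers %>% pull(ticker)

# select() keeps the column inside a one-column tibble
ticker_tibble <- toy_tickers %>% select(ticker)

class(ticker_vector)
class(ticker_tibble)</code></pre>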
<p>Now we want to pass those tickers to <code>riingo_prices()</code>, but first we need to create an API key. <code>riingo</code> makes that quite convenient:</p>
<pre class="r"><code>riingo_browse_signup()
# This requires that you are signed in on the site once you sign up
riingo_browse_token() </code></pre>
<p>Then we set our key for use in this session with:</p>
<pre class="r"><code># Need an API key for tiingo
riingo_set_token("your API key here")</code></pre>
<p>Next, we can pipe straight to <code>riingo_prices()</code>. We will set <code>start_date = "2017-12-29"</code> and <code>end_date = "2018-12-31"</code> to get prices for 2018 plus the last trading day of 2017. I wanted that extra day because eventually we’ll calculate daily returns for 2018, and the first return of the year is computed from the previous close.</p>
<pre class="r"><code> etf_ticker_sector %>%
pull(ticker) %>%
riingo_prices(.,
start_date = "2017-12-29",
end_date = "2018-12-31") %>%
head()</code></pre>
<pre><code># A tibble: 6 x 14
t