July 2017 New Package Picks

2017-08-28

by Joseph Rickert

Two hundred and twenty-four new packages were added to CRAN in July. Below are my picks for the “Top 40” packages arranged in seven categories: Machine Learning, Numerical Methods, Science, Statistics, Time Series, Utilities and Visualization. Science and Numerical Methods are categories that I have not used before. The idea behind the Science category is to find a place for packages that appear to have been created with some particular scientific investigation or problem in mind. The Numerical Methods category is reserved for packages that, while they may be targeted to some general form of statistical analysis, emphasize numerical considerations and carefully constructed algorithms.

As always, my selections are heavily weighted by the availability of documentation beyond what is included in the package PDF. I rarely select packages that do not have a vignette or some other source of documentation about how the package can be used, for example, README files or a referenced URL. I almost never select “professional” packages, which I define as packages that are devoted to esoteric topics that either include no documentation beyond the PDF, or exclusively refer to papers that are protected by a paywall. While these packages usually comprise serious, valuable contributions to R, they also appear to have been written for very small audiences.

Finally, before listing this month’s Top 40, I would like to call attention to an awesome display of productivity by Kevin R. Coombes, who had fourteen packages on various topics accepted by CRAN in July: BimodalIndex, ClassComparison, ClassDiscovery, CrossValidate, GeneAlgo, IntegIRTy, Modeler, NameNeedle, oompaBase, oompaData, PreProcess, SIBERG, TailRank and Umpire.

The July 2017 Top 40

Machine Learning

autoBagging v0.1.0: Implements a framework for automated machine learning that focuses on the optimization of bagging workflows. See the paper by Pinto et al. for details.

grf v0.9.3: Provides methods for non-parametric least-squares regression, quantile regression, and treatment effect estimation (optionally using instrumental variables).

iRF v2.0.0: Provides functions to iteratively grow feature-weighted random forests and find high-order feature interactions in a stable fashion. Look here for the details.

keras v2.0.5: Implements an interface to Keras, a high-level neural networks API that runs on top of TensorFlow. There is an Overview of the Keras backend, and a number of vignettes including Keras Layers, Writing Custom Keras Layers, Keras Models, Using Pre-Trained Models, Sequential Models and more.

randomForestExplainer v0.9: Provides a set of tools to help explain which variables are most important in a random forest. The vignette provides examples of visualizing multiple performance measures.

sgmcmc v0.1.0: Provides functions to implement stochastic gradient Markov chain Monte Carlo (SGMCMC) methods for user-specified models. TensorFlow is used to calculate the gradients. There is a Getting Started Guide and vignettes for Simulating from a Gaussian Mixture, a Multivariate Gaussian Mixture and Logistic Regression.

Numerical Methods

mcMST v1.0.0: Provides algorithms to approximate the Pareto-front of multi-criteria minimum spanning tree problems, along with a toolbox for generating multi-objective benchmark graph problems. There is an Introduction and a vignette on benchmarking optimization problems.

mize v0.1.1: Provides optimization algorithms, including conjugate gradient (CG), Broyden-Fletcher-Goldfarb-Shanno (BFGS), and the limited memory BFGS (L-BFGS) methods. There is an introduction and vignettes on Convergence, Metric MDS, and Stateful Optimization.
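
To make the methods named above concrete, here is a minimal sketch that compares them on the Rosenbrock test function. It assumes base R's optim() as a stand-in and does not use mize's own interface. Note that the limited-memory method available in optim() is L-BFGS-B, the box-constrained variant. The helper names rosenbrock and rosenbrock_grad are illustrative.

```r
# Rosenbrock test function and its analytic gradient (illustrative helpers).
rosenbrock <- function(p) {
  x <- p[1]; y <- p[2]
  (1 - x)^2 + 100 * (y - x^2)^2
}
rosenbrock_grad <- function(p) {
  x <- p[1]; y <- p[2]
  c(-2 * (1 - x) - 400 * x * (y - x^2),
    200 * (y - x^2))
}

start   <- c(-1.2, 1)
methods <- c("CG", "BFGS", "L-BFGS-B")

# Run each optimizer from the same starting point.
fits <- lapply(methods, function(m) {
  optim(start, rosenbrock, rosenbrock_grad, method = m,
        control = list(maxit = 5000))
})
names(fits) <- methods

# Compare the final objective value and the number of function evaluations.
results <- data.frame(
  method   = methods,
  value    = sapply(fits, function(f) f$value),
  fn_evals = sapply(fits, function(f) f$counts[["function"]])
)
print(results)
```

Running all three from the same start makes it easy to see how much work each method needs to reach the minimum at (1, 1).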

SuperGauss v1.0: Provides a fast C++-based algorithm for the evaluation of Gaussian time series, along with efficient implementations of the score and Hessian functions. The vignette shows an example of inference for the Hurst parameter.

Science

GROAN v1.0.0: Is a workbench for testing genomic regression accuracy on noisy phenotypes. It contains a noise-generator function. There is a vignette on Genomic Regression in Noisy Scenarios.

noaastormevents v0.1.0: Allows users to explore and plot data from the National Oceanic and Atmospheric Administration (NOAA) Storm Events database for United States counties through R. There is an Overview and a vignette providing details.

swmmr v0.7.0: Provides functions to connect the Storm Water Management Model (SWMM) of the United States Environmental Protection Agency (US EPA) to R.

Statistics

blandr v0.4.3: Contains functions to carry out Bland-Altman analyses (also known as Tukey mean-difference plots) as described by JM Bland and DG Altman. See the vignette for details.
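
For readers who have not met the technique, the following is a minimal sketch of the core Bland-Altman calculation in base R. It does not use blandr's functions, and the data are simulated, so the names method_a and method_b are purely hypothetical. The 1.96 multiplier gives the conventional approximate 95% limits of agreement.

```r
set.seed(42)

# Simulated paired measurements from two hypothetical measurement methods.
truth    <- rnorm(50, mean = 100, sd = 15)
method_a <- truth + rnorm(50, sd = 3)
method_b <- truth + 2 + rnorm(50, sd = 3)  # method B is built to read about 2 units higher

# A Bland-Altman analysis plots the pairwise difference against the pairwise mean.
pair_mean <- (method_a + method_b) / 2
pair_diff <- method_a - method_b

# Bias is the mean difference; the limits of agreement are bias +/- 1.96 SD.
bias <- mean(pair_diff)
loa  <- bias + c(-1.96, 1.96) * sd(pair_diff)

plot(pair_mean, pair_diff,
     xlab = "Mean of the two methods",
     ylab = "Difference (A - B)",
     main = "Bland-Altman plot (simulated data)")
abline(h = c(bias, loa), lty = c(1, 2, 2))
```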

cnbdistr v1.0.1: Provides the distribution functions for the Conditional Negative Binomial distribution. The vignette shows the math.

diffpriv v0.4.2: Provides an implementation of major general-purpose mechanisms for privatizing statistics, models, and machine learners, within the framework of differential privacy of Dwork et al. (2006). There is a vignette on The Bernstein Mechanism and an Introduction.
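
As a rough illustration of what "privatizing a statistic" means, here is a base R sketch of the classic Laplace mechanism applied to a sample mean. This is an assumed example chosen for its simplicity and is not taken from diffpriv's interface. The function names rlaplace and private_mean are hypothetical, and the sketch assumes each record is known to lie in [lower, upper].

```r
# Draw n samples from a Laplace(0, scale) distribution via the inverse CDF.
rlaplace <- function(n, scale) {
  u <- runif(n, min = -0.5, max = 0.5)
  -scale * sign(u) * log(1 - 2 * abs(u))
}

# Release an epsilon-differentially-private mean of bounded data.
private_mean <- function(x, lower, upper, epsilon) {
  x <- pmin(pmax(x, lower), upper)            # clamp so the assumed bounds really hold
  sensitivity <- (upper - lower) / length(x)  # largest change from replacing one record
  mean(x) + rlaplace(1, scale = sensitivity / epsilon)
}

set.seed(1)
ages <- sample(18:90, 500, replace = TRUE)
private_mean(ages, lower = 18, upper = 90, epsilon = 0.5)
```

Smaller values of epsilon add more noise and give stronger privacy. Larger samples shrink the sensitivity, so less noise is needed for the same guarantee.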

fence v1.0: Implements a new class of model-selection strategies for mixed-model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). The package points to several references in the literature including papers by Jiang et al. 2008, Jiang et al. 2010, Jiang et al. 2011, Nguyen et al. 2012, and Jiang 2014.

llogistic v1.0.0: Provides density, distribution, quantile and random generation functions for the L-Logistic distribution with parameters m (median) and b.

metaBMA v0.3.9: Provides functions to compute the posterior model probabilities for meta-analysis models assuming either fixed or random effects. See the paper by Gronau et al. and the vignette for details.

MFKnockoffs v0.9: Provides functions to create model-free knockoffs, a general procedure for controlling the false discovery rate (FDR) when performing variable selection. There are vignettes on using the Model-Free Knockoff Filter (Basic and Advanced) and on Using the Filter with a Fixed Design Matrix.

Modeler v3.4.2: Provides tools to define classes and methods to learn models and use them to predict binary outcomes. There is a vignette on Learning and Predicting.

msde v1.0: Implements an MCMC sampler for the posterior distribution of arbitrary, time-homogeneous, multivariate stochastic differential equation (SDE) models with possibly latent components. There is a vignette with Sample Models and another for Inference.

RBesT v1.2-3: Provides a tool set to support Bayesian evidence synthesis, including meta-analysis, prior derivation from historical data, operating characteristics, and analysis. There is an Introduction and vignettes on Customizing Plots, Normal Endpoints, and Robust Priors.

RcppTN v0.2-1: Provides R and C++ functions to generate random deviates from and calculate moments of a Truncated Normal distribution using the algorithm of Robert (1995). There is a vignette showing how to use the package, and one for Performance.
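
To show what sampling from a truncated normal involves, here is a minimal base R sketch using the inverse-CDF method. This is a deliberately simpler technique than the algorithm of Robert (1995) that the package implements, and the function name rtn_inverse_cdf is hypothetical. The inverse-CDF approach loses accuracy when the truncation interval sits far out in a tail, which is one reason dedicated algorithms exist.

```r
# Sample from a Normal(mean, sd) distribution truncated to [low, high]
# by drawing uniforms between the CDF values at the two bounds.
rtn_inverse_cdf <- function(n, mean = 0, sd = 1, low = -Inf, high = Inf) {
  stopifnot(low < high, sd > 0)
  p_low  <- pnorm(low,  mean, sd)
  p_high <- pnorm(high, mean, sd)
  u <- runif(n, min = p_low, max = p_high)
  qnorm(u, mean, sd)
}

set.seed(1)
x <- rtn_inverse_cdf(10000, mean = 0, sd = 1, low = 1, high = 2)

# Compare the sample mean with the closed-form mean of a truncated standard normal.
theoretical_mean <- (dnorm(1) - dnorm(2)) / (pnorm(2) - pnorm(1))
c(sample = mean(x), theory = theoretical_mean)
```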

rsample v0.0.1: Provides classes and functions to create and summarize different kinds of resampling objects. There is a vignette on the Basics, and another for Working with rsets.

SMM v1.0: Provides functions to simulate and estimate Multi-State Discrete-Time Semi-Markov and Markov Models. The implementation details are described in two papers by Barbu and Limnios (one and two) and one paper by Trevezas and Limnios. The vignette also provides considerable detail.

treeDA v0.02: Provides functions to perform sparse discriminant analysis on a combination of node and leaf predictors, when the predictor variables are structured according to a tree. There is a vignette.

vennLasso v0.1: Provides variable selection and estimation routines for models stratified by binary factors. There is a vignette.

Time Series

sweep v0.2.0: Provides tools for bringing tidyverse organization to time series forecasting. There is an Introduction, as well as vignettes on Forecasting and Forecasting with Multiple Models.

timetk v0.1.0: Implements a toolkit for working with time series, including functions to interrogate time series objects and tibbles, and coerce between time-based tibbles (‘tbl’) and ‘xts’, ‘zoo’, and ‘ts’. There is an Introduction and vignettes on Working with time series index, Making a Future Index, and Forecasting.

Utilities

dataCompareR v0.1.0: Contains functions to compare two tabular data objects and present the differences in a way that makes them easy to understand. The vignette shows how to use the package.

dataPreparation v0.2: Provides functions for data preparation that take advantage of data.table efficiencies. There is a tutorial.

datastructures v0.2.0: Provides implementations of advanced data structures such as hashmaps, heaps, and queues. There is a Tutorial.

docker v0.0.2: Provides access to the Docker SDK from R via Python, using the reticulate package. There is a vignette to get you started.

seplyr v0.1.4: Supplies standard evaluation adapter methods for important common dplyr methods that currently have a non-standard programming interface. There is an Introduction, as well as vignettes for Using seplyr with dplyr, the operator named map builder, and the operator rename_se.

vetr v0.1.0: Provides a declarative template-based framework for verifying that objects meet structural requirements, and auto-composing error messages when they do not. There is a vignette on Alikeness and one on Trust, but Verify.

Visualization

ggjoy v0.3.0: Joyplots provide a convenient way of visualizing changes in distributions over time or space. ggjoy enables the creation of such plots in ggplot2. There is an Introduction and a Gallery of examples.

ggplotgui v1.0.0: Implements a shiny app for creating and exploring ggplot2 graphs that also generates the required R code. Look here for examples.

loon v1.1.0: Is an extensible toolkit for interactive data visualization and exploration. There are two vignettes containing examples: Visible minorities in Canadian cities and Smoothers and Bone Mineral Density.

sugrrants v0.1.0: Provides ggplot2 graphics for analyzing time series data with the goal of fitting into the tidyverse and grammar of graphics framework. The vignette provides examples.

tidygraph v1.0.0: A graph, while not “tidy” in itself, can be thought of as two tidy data frames describing node and edge data respectively. tidygraph provides functions to manipulate these virtual data frames using the dplyr package. Look here for some details.
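
The "two tidy data frames" idea can be sketched in base R without tidygraph. The example below is only an illustration of the data layout, not the package's API. The small graph and the column names (id, name, from, to) are made up for the purpose.

```r
# Node table: one row per node.
nodes <- data.frame(id = 1:4,
                    name = c("a", "b", "c", "d"),
                    stringsAsFactors = FALSE)

# Edge table: one row per undirected edge, referring to nodes by id.
edges <- data.frame(from = c(1, 1, 2, 3),
                    to   = c(2, 3, 3, 4))

# A node-level computation: degree is the number of edge endpoints at each node.
endpoint_counts <- table(factor(c(edges$from, edges$to), levels = nodes$id))
nodes$degree <- as.integer(endpoint_counts)

# An edge-level computation: look up node attributes for each endpoint.
edges$from_name <- nodes$name[match(edges$from, nodes$id)]
edges$to_name   <- nodes$name[match(edges$to, nodes$id)]

print(nodes)
print(edges)
```

Keeping the two tables in sync by hand, as done here with match(), is the kind of bookkeeping that a graph-aware wrapper around the two data frames is meant to take over.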

visdat v0.1.0: Provides functions to create preliminary exploratory data visualizations of an entire dataset using ggplot2. There is a vignette to get you started.