June 2018: Top 40 New Packages

by Joseph Rickert

Approximately 144 new packages stuck to CRAN in June. The fact that 31 of these are specialized to particular scientific disciplines or analyses lends some support to my hypothesis that working scientists are actively adopting R. Below are my Top 40 picks for June, organized into the categories of Computational Methods, Data, Data Science, Economics, Science, Statistics, Time Series, Utilities, and Visualizations. The Data packages, especially rtrek and opensensmapr, look like they have some interesting new data to explore.

Computational Methods

nnTensor v0.99.1: Provides methods for non-negative matrix factorization and tensor decomposition. See Cichocki et al. (2009) for details.

RcppEigenAD v1.0.0: Provides functions to compile C++ code using Rcpp, Eigen, and CppAD to produce first- and second-order partial derivatives, and also provides an implementation of Faà di Bruno’s formula to combine the partial derivatives of composed functions. See Hardy (2006).
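
For orientation, the single-variable form of Faà di Bruno’s formula is shown below. This is only the textbook univariate case, given as an illustration; the package itself deals with partial derivatives of multivariate compositions, which generalize this expression.

```latex
\frac{d^n}{dx^n} f\bigl(g(x)\bigr)
  = \sum \frac{n!}{m_1!\, m_2! \cdots m_n!}\;
    f^{(m_1 + \cdots + m_n)}\bigl(g(x)\bigr)
    \prod_{j=1}^{n} \left( \frac{g^{(j)}(x)}{j!} \right)^{m_j}
```

The sum runs over all tuples of non-negative integers (m_1, ..., m_n) satisfying 1·m_1 + 2·m_2 + ... + n·m_n = n.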

rcrane v1.0: Provides optimization algorithms to estimate coefficients in models such as linear regression and neural networks. Includes batch gradient descent, stochastic gradient descent, minibatch gradient descent, and coordinate descent. See Kiwiel (2001, doi:10.1007/PL00011414), Nesterov (2004), Ferguson (1982), Zeiler (2012), and Wright (2015). The vignette introduces the package.
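
To make the idea of batch gradient descent concrete, here is a minimal base R sketch for least-squares linear regression. It does not use rcrane; the helper name `gd_linreg()`, the learning rate, and the iteration count are all assumptions chosen for illustration.

```r
# Batch gradient descent for least-squares linear regression (base R only).
# gd_linreg() is a hypothetical helper name, not a function from rcrane.
gd_linreg <- function(X, y, rate = 0.1, iters = 5000) {
  X <- cbind(Intercept = 1, X)   # add an intercept column
  beta <- rep(0, ncol(X))        # start from all-zero coefficients
  n <- nrow(X)
  for (i in seq_len(iters)) {
    # Gradient of (1 / (2n)) * ||X beta - y||^2 with respect to beta
    grad <- crossprod(X, X %*% beta - y) / n
    beta <- beta - rate * as.vector(grad)
  }
  setNames(beta, colnames(X))
}

# Simulated data with known coefficients (1, 2, -3)
set.seed(1)
x <- matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- 1 + 2 * x[, 1] - 3 * x[, 2] + rnorm(100, sd = 0.1)

fit_gd <- gd_linreg(x, y)   # gradient descent estimate
fit_lm <- coef(lm(y ~ x))   # closed-form least squares for comparison
print(rbind(gd = unname(fit_gd), lm = unname(fit_lm)))
```

On well-scaled predictors like these, the two sets of coefficients should agree closely; stochastic and minibatch variants differ only in computing the gradient on a random subset of rows at each step.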


Data

bjscrapeR v0.1.0: Scrapes crime data from the National Crime Victimization Survey, which tracks personal and household crime in the USA.

genesysr v0.9.1: Implements an API to access data on plant genetic resources from genebanks around the world published on Genesys. The vignette offers a short tutorial.

opensensmapr v0.4.1: Allows users to download real-time environmental measurements and sensor station metadata from the OpenSenseMap API. There are vignettes for Visualization, Exploration, and Caching Data for Reproducibility.

readabs v0.2.1: Provides functions to read Excel files from the Australian Bureau of Statistics into Tidy Data Sets. See the vignette.

rppo v1.0: Implements an interface to the Global Plant Phenology Data Portal. See the vignette.

rtrek v0.1.0: Provides datasets related to the Star Trek fictional universe, functions for working with the data, and access to real-world datasets based on the televised series and other related licensed media productions. It interfaces with Wikipedia, the Star Trek API (STAPI), Memory Alpha, and Memory Beta to retrieve data, metadata, and other information relating to Star Trek. See the README for usage information.

skynet v1.2.2: Implements methods for generating air transport statistics based on publicly available data from the U.S. Bureau of Transportation Statistics (BTS). See the vignette.

Data Science

AdaSampling v1.1: Implements the adaptive sampling procedure, a framework for both positive unlabeled learning and learning with class label noise. See Yang et al. (2018) and the vignette.

AROC v1.0: Provides functions to estimate the covariate-adjusted Receiver Operating Characteristic (AROC) curve and pooled (unadjusted) ROC curve. See de Carvalho and Rodriguez-Alvarez (2018).

cloudml v0.5.1: Provides an interface to the Google Cloud Machine Learning Platform. There is a Getting Started Guide and vignettes on Deploying Models, Cloud Storage, Training, and Hyperparameter Tuning.

reclin v0.1.0: Provides functions to assist in performing probabilistic record linkage and deduplication: generating pairs, comparing records, estimating m- and u-probabilities with the EM algorithm, and forcing one-to-one matching. There is an Introduction and a vignette on Deduplication.

vip v0.1.0: Provides a general framework for constructing variable importance plots from various types of machine learning models, based on a novel approach using partial dependence plots and individual conditional expectation curves as described in Greenwell et al. (2018). See the README for details and examples.

wevid v0.4.2: Provides functions to quantify the performance of a binary classifier through weight of evidence. These can be used with any test dataset on which you have observed case-control status, and have computed prior and posterior probabilities of case status using a model learned on a training dataset. Look at this website for details and examples.
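
As a rough illustration of the quantity involved, the weight of evidence for a single observation can be written as the log of the ratio of posterior odds to prior odds. The base R sketch below is not wevid's API; the helper name `weight_of_evidence()` and the choice of base-2 logarithms (bits) are assumptions made for this example.

```r
# Weight of evidence favouring case over control status for each observation:
# log2 of the ratio of posterior odds to prior odds.
# weight_of_evidence() is a hypothetical helper, not part of wevid's API.
weight_of_evidence <- function(posterior_p, prior_p) {
  log2(posterior_p / (1 - posterior_p)) - log2(prior_p / (1 - prior_p))
}

prior <- 0.1                      # prior probability of case status
posterior <- c(0.5, 0.1, 0.02)    # model-based posterior probabilities
w <- weight_of_evidence(posterior, prior)
print(w)
```

A positive value means the model moved the odds toward case status, zero means the posterior equals the prior, and a negative value means the odds moved toward control status.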


Economics

trade v0.5.3: Provides tools for working with trade models, including the ability to calibrate different consumer-demand systems and simulate the effects of tariffs and quotas under different competitive regimes. The vignette provides details.


Science

linpk v1.0: Provides functions and a shiny application to generate concentration-time profiles from linear pharmacokinetic (PK) systems. Single or multiple doses may be specified. The vignette offers details and examples.

ratematrix v1.0: Provides functions to estimate the evolutionary rate matrix (R) using Markov chain Monte Carlo (MCMC), as described in Caetano and Harmon (2017). There is a vignette on Setting a custom starting point and another on Using prior distributions.

spectralAnalysis v3.12.0: Provides a toolkit for spectral analysis, enabling users to pre-process, visualize, and analyze process analytical data from spectral measurements made during a chemical process.


Statistics

betaboost v1.0.1: Implements boosting beta regression for potentially high-dimensional data (Mayr et al., 2018) using the same parametrization as betareg (Cribari-Neto and Zeileis, 2010). The underlying algorithms are implemented via the R add-on packages mboost (Hofner et al., 2014) and gamboostLSS (Mayr et al., 2012). The vignette offers examples.

bfw v0.1.0: Provides a framework for conducting Bayesian analysis using Markov chain Monte Carlo with the JAGS sampler. There are vignettes on Fitting Latent Data, Fitting Observed Data, the Predict Metric, Plotting, and Regression.

CaseBasedReasoning v0.1: Given a large set of problems and their individual solutions, case-based reasoning seeks to solve a new problem by referring to the solution of the known problem that is “most similar” to it. See Dippon et al. (2002), the vignette on Motivation, and examples of case-based reasoning with a Cox-Beta Model and a Random Forest Model.

coxed v0.1.1: Provides functions for generating, simulating, and visualizing expected durations and marginal changes in duration from the Cox proportional hazards model. There is a vignette on using the coxed() function and another on simulating survival data.

GLMMadaptive v0.2-0: Provides functions to fit generalized linear mixed models for a single grouping factor under maximum likelihood, approximating the integrals over the random effects with an adaptive Gaussian quadrature rule. See Pinheiro and Bates (1995) and the vignettes on Custom Models, GLMMadaptive Basics, Methods for MixMod Objects, and Zero-Inflated and Two-Part Mixed Effects Models.

glmmfields v0.1.0: Implements generalized linear mixed models with robust random fields for spatiotemporal modeling. The vignette provides examples.

kendallRandomWalks v0.9.3: Provides functions for simulating Kendall random walks, continuous-space Markov chains generated by the Kendall generalized convolution. See Jasiulis-Gołdyn (2014) for details and the vignettes Kendall Random Walks and Studying the Behavior of Kendall Random Walks.

netSEM v0.5.0: Provides functions for structural equation modeling. There is an Introduction and vignettes on Backsheet Degradation, Backsheet Cracking, Current Voltage Features, and Modeling of the Weathering Driven Degradation of Poly(ethylene-terephthalate) Films.

umap v0.1.0.3: Implements the uniform manifold approximation and projection technique for dimension reduction as described in McInnes and Healy (2018). The vignette shows how to use the package.

vimp v1.0.0: Provides functions to calculate point estimates of, and valid confidence intervals for, non-parametric variable importance measures in high and low dimensions. For information about the methods, see Williamson et al. (2017). The vignette contains an introduction to the package.

vsgoftest v0.3-2: Implements Vasicek and Song goodness-of-fit tests (based on Kullback-Leibler divergence) for a family of distributions that includes uniform, Gaussian, log-normal, exponential, gamma, Weibull, Pareto, Fisher, Laplace, and beta distributions. See Lequesne and Regnault (2018) for details and the Tutorial.

Time Series

anomaly v1.0.0: Implements the CAPA (Collective And Point Anomaly) algorithm of Fisch, Eckley and Fearnhead (2018) for the detection of anomalies in time series data.

exuber v0.1.0: Provides functions for testing and dating periods of explosive dynamics (exuberance) in time series using recursive unit root tests as proposed by Phillips et al. (2015). The package also simulates a variety of periodically collapsing bubble models. The estimation and simulation utilize the matrix inversion lemma from the recursive least squares algorithm, which results in a significant speed improvement. See the README to get started.


Utilities

BiocManager v1.30.1: Implements a tool to install and update Bioconductor packages. The vignette shows how to use the package.

IntervalSurgeon v1.0: Provides functions for manipulating integer-bounded intervals including finding overlaps, piling, and merging. The vignette shows how to use the package.
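
To show what merging integer-bounded intervals involves, here is a small base R sketch. It is not the IntervalSurgeon API; the helper `merge_intervals()` is hypothetical, and treating intervals that merely share an endpoint as mergeable is an assumption of this example.

```r
# Merge overlapping (or endpoint-sharing) integer intervals given as a
# two-column matrix of (start, end) rows.
# merge_intervals() is a hypothetical base R helper, not the IntervalSurgeon API.
merge_intervals <- function(x) {
  if (nrow(x) == 0) return(x)
  x <- x[order(x[, 1], x[, 2]), , drop = FALSE]   # sort by start, then end
  out <- x[1, , drop = FALSE]
  for (i in seq_len(nrow(x))[-1]) {
    last <- nrow(out)
    if (x[i, 1] <= out[last, 2]) {
      # Current interval starts inside the last merged one: extend its end
      out[last, 2] <- max(out[last, 2], x[i, 2])
    } else {
      # Gap before the current interval: start a new merged interval
      out <- rbind(out, x[i, , drop = FALSE])
    }
  }
  out
}

ivs <- rbind(c(1L, 5L), c(10L, 12L), c(4L, 8L), c(12L, 15L))
merged <- merge_intervals(ivs)
print(merged)
```

Sorting first means each interval only needs to be compared with the most recently merged one, so the pass over the data is linear after the sort.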

pkgbuild v1.0.0: Provides functions used to build R packages. Locates compilers needed to build R packages on various platforms and ensures the PATH is configured appropriately.

rqdatatable v0.1.2: Implements the rquery piped query algebra using data.table. There is a vignette on Grouped Sampling and a Logistic Example.

ssh v0.2: Provides functions to connect to a remote server over SSH to transfer files via SCP, set up a secure tunnel, or run a command or script on the host while streaming stdout and stderr directly to the client. There is a vignette.


Visualizations

mgcViz v0.1.1: An extension of the mgcv package, providing visual tools for Generalized Additive Models (GAMs) that exploit the additive structure of GAMs, scale to large data sets, and can be used in conjunction with a wide range of response distributions. See the vignette for examples.

tiler v0.2.0: Provides functions to create geographic map tiles from geospatial map files or non-geographic map tiles from simple image files. The vignette provides an introduction.
