Jan 2018: "Top 40" New Package Picks

by Joseph Rickert

Here are my “Top 40” picks from the two hundred or so new packages that stuck to CRAN in January, listed under seven categories: Data, Data Science, Science, Statistics, Time Series, Utilities and Visualizations. (I say “stuck to” because I counted at least six packages that were accepted onto CRAN in January but removed within the month. Having packages quickly removed from CRAN is a phenomenon I have observed in recent months.)

While looking over the packages that I have listed under Data and Science, it struck me that in addition to being the world’s largest repository of statistical knowledge, CRAN is becoming a repository for practical, hard-won scientific knowledge.


Data

cancensus v0.1.7: Provides an interface to Canadian census and geographic data using the CensusMapper API. There is an Introduction and a vignette for Making maps.

elevatr v0.1.4: Provides access to several services offering elevation data, and returns the data either as a SpatialPointsDataFrame from point elevation services or as a raster object from raster elevation services. Currently, the package supports access to the Mapzen Elevation Service, Mapzen Terrain Service, Amazon Web Services Terrain Tiles, and the USGS Elevation Point Query Service. The vignette shows how to use the package.

fabricatr v0.2.0: Provides functions to simulate hierarchical and correlated data. There are several vignettes including a Getting Started guide and an Advanced Features guide, as well as introductions to Resampling and Generating Discrete Random Variables.

getTBinR v0.5.2: Facilitates easy import of analysis-ready World Health Organisation Tuberculosis data, and provides plotting functions for exploratory data analysis. There is a vignette.

homologene v1.1.68: Provides a wrapper for the homologene database by the National Center for Biotechnology Information, which allows searching for gene homologs across species.

photobiologyFilters v0.4.4: Is a data-only package with spectral ‘transmittance’ data for frequently used filters and materials, including plastic sheets and films, optical glass and ordinary glass, and some labware. It complements the photobiology package. See this website and the vignette for details.

tfdatasets v1.5: Provides an interface to TensorFlow Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces.

washdata v0.1.2: Provides access to the urban water and sanitation survey data set collected by Water and Sanitation for the Urban Poor (WSUP), with technical support from Valid International. There is a vignette.

Data Science

CRPClustering v1.0: Provides a clustering method based on the Chinese restaurant process (Pitman (1995)) that does not require the number of clusters to be fixed in advance. Also provides functions to calculate the ambiguity of clusters as entropy (Yngvason (1999)). The vignette shows how to use the package.
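For readers who have not met the Chinese restaurant process, here is a minimal Python sketch of the idea, not the package's own code. The function names `crp_partition` and `partition_entropy` and the concentration parameter `alpha` are my own illustrative choices, and the entropy shown is plain Shannon entropy of the cluster proportions, which may differ from what CRPClustering actually computes.

```python
import math
import random

def crp_partition(n_customers, alpha, seed=0):
    """Seat customers one at a time and return the list of table sizes."""
    rng = random.Random(seed)
    tables = []
    for i in range(n_customers):
        # i customers are already seated. A new table is opened with
        # probability alpha / (i + alpha); otherwise an existing table is
        # chosen with probability proportional to its current size.
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[k] += 1
                break
        else:
            tables.append(1)
    return tables

def partition_entropy(table_sizes):
    """Shannon entropy (in nats) of the cluster proportions."""
    total = sum(table_sizes)
    return -sum((s / total) * math.log(s / total) for s in table_sizes)

sizes = crp_partition(100, alpha=1.0, seed=42)
print(sizes, partition_entropy(sizes))
```

The number of tables is not fixed beforehand; it grows with the data, and larger values of `alpha` tend to produce more clusters.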

kerasformula v0.1.1: Provides a high-level interface for keras neural nets. See the vignette for details.

multiROC v1.0.0: Provides tools to solve problems with multiple classes by computing the areas under the ROC curve via micro- and macro-averaging. The methodology is described in Van Asch (2013) and Pedregosa et al. (2011). See the vignette for a quick tour.
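The difference between the two averaging schemes is easy to lose in a one-line description, so here is a small Python sketch under my own assumptions (the helper names `auc` and `macro_micro_auc` are hypothetical and are not multiROC functions). Macro-averaging computes a one-vs-rest AUC per class and averages them; micro-averaging pools every (sample, class) indicator and score into one binary problem.

```python
def auc(labels, scores):
    """Binary AUC by pairwise comparison of positives and negatives (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_micro_auc(y_true, y_score, classes):
    """Return (per-class AUCs, macro-averaged AUC, micro-averaged AUC)."""
    # One-vs-rest indicator vector for each class.
    indicators = {c: [int(y == c) for y in y_true] for c in classes}
    per_class = {
        c: auc(indicators[c], [row[j] for row in y_score])
        for j, c in enumerate(classes)
    }
    macro = sum(per_class.values()) / len(classes)
    # Pool all (sample, class) pairs into a single binary problem.
    pooled_labels = [indicators[c][i] for i in range(len(y_true)) for c in classes]
    pooled_scores = [row[j] for row in y_score for j in range(len(classes))]
    micro = auc(pooled_labels, pooled_scores)
    return per_class, macro, micro

# Toy data: one row of class scores per observation, columns ordered as `classes`.
classes = ["a", "b", "c"]
y_true = ["a", "b", "c", "a"]
y_score = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
print(macro_micro_auc(y_true, y_score, classes))
```

The pairwise AUC is quadratic in the number of observations, which is fine for illustration but not for large data sets.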

reinforcelearn v0.1.0: Implements reinforcement learning environments and algorithms, as described in Sutton & Barto (1998). There are vignettes for Agents and Environments.

stranger v0.3.2: Provides a framework for unsupervised anomaly detection. There is a Vignette for the Impatient, and vignettes for Methods and Anomalies manual selection.

tidypredict v0.1.0: Provides functions to parse a fitted ‘R’ model object, and return a SQL query. There are vignettes for GLM and Random Forest models.


Science

annovarR v1.0.0: Provides functions and database resources to offer an integrated framework to annotate genetic variants from genome and transcriptome data. The wrapper functions unify the interface of many published annotation tools, such as VEP, ANNOVAR, vcfanno, and AnnotationDbi. There is an Introduction and a vignette on Databases.

pubh v0.1.7: Offers a toolbox for making R functions and capabilities more accessible to students and professionals from Epidemiology and Public Health related disciplines. There is an Introduction and a Regression Example.

trajr v1.0.0: Provides a toolbox to assist with statistical analysis of two-dimensional animal trajectories. The vignette provides several examples.


Statistics

dalmatian v0.3.0: Automates fitting a double GLM in JAGS. There is a vignette on weighted regression and a two-part example using the Pied Flycatcher Data: Part 1 and Part 2.

dirichletprocess v0.2.0: Enables the creation of Dirichlet process objects that can be used as infinite mixture models. Examples include density estimation, Poisson process intensity inference, hierarchical modelling, and clustering. See Teh, Y. W. (2011) and the vignette for details.

detpack v1.0.1: Enables density estimation for possibly large data sets and conditional/unconditional random number generation with distribution element trees. For details on distribution element trees, see Meyer (2016), Meyer (2017), and Meyer (2017).

gnorm v1.0.0: Provides functions for obtaining generalized normal/exponential power distribution probabilities, quantiles, densities, and random deviates. See the vignette for details.
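As a reminder of what the generalized normal (exponential power) family looks like, here is a short Python sketch of its density, assuming the common location/scale/shape parameterization with `mu`, `alpha` and `beta`. The function name `gnorm_pdf` is my own and is not taken from the package. The density is f(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x - mu| / alpha)^beta).

```python
import math

def gnorm_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized normal density with location mu, scale alpha, shape beta."""
    z = abs(x - mu) / alpha
    return beta / (2.0 * alpha * math.gamma(1.0 / beta)) * math.exp(-(z ** beta))

# beta = 2 with alpha = sqrt(2) * sigma recovers the normal density;
# beta = 1 gives the Laplace density with scale alpha.
print(gnorm_pdf(0.7, alpha=math.sqrt(2.0), beta=2.0))
print(gnorm_pdf(0.7, alpha=1.0, beta=1.0))
```

Under this parameterization, larger values of `beta` flatten the peak and thin the tails, approaching a uniform density on (mu - alpha, mu + alpha).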

IROmiss v1.0.1: Provides a general algorithm, the Imputation Regularized Optimization (IRO) algorithm, for high-dimensional missing data problems. See Liang et al. (2018) for details.

KRIG v0.1.0: Provides functions for Kriging models and various methods for spatial statistics, including multivariate sensitivity analysis using reproducing kernel Hilbert spaces and computation of Sobol indexes. There are vignettes on Ordinary Kriging, Simple Kriging, Universal Kriging, and a worked example.

natural v0.9.0: Implements two error variance estimation methods in high-dimensional linear models. See Yu and Bien (2017) and the vignette.

OpVar v1.0: Provides functions for modeling operational (value-at-)risk, including loss frequencies and loss severities with plain, mixed (Frigessi et al. (2012)) or spliced distributions using Maximum Likelihood estimation and Bayesian approaches (Ergashev et al. (2013)). The vignette shows some examples.

netrankr v0.2.0: Implements methods for centrality-related analyses of networks, focusing on index-free assessment of centrality via partial rankings obtained by neighborhood-inclusion or positional dominance. See Schoch (2018). There are vignettes for benchmarks, centrality indices, indirect relations, neighborhood inclusion, partial centrality, positional dominance, probabilistic centrality, uniquely ranked graphs, and a use case.

palmtree v0.9.0: Implements the PALM tree algorithm, an extension to the MOB algorithm (implemented in the partykit package), where some parameters are fixed across all groups. See Seibold et al. (2016) for details.

PMCMRplus v1.0.0: Provides functions to calculate many different types of pairwise multiple comparisons tests. See the vignette for charts listing the tests covered.

seminr v0.4.0: Implements a domain-specific language for building PLS structural equation models, allowing for the latest estimation methods for Consistent PLS as per Dijkstra & Henseler (2015), adjusted interactions as per Henseler & Chin (2010), and bootstrapping utilizing parallel processing as per Hair et al. (2017). There is a vignette.

Time Series

santaR v1.0: Provides a graphical, automated pipeline for the analysis of short time series that has been designed to accommodate asynchronous time sampling, inter-individual variability, noisy measurements and large numbers of variables. There is a Getting Started Guide and vignettes on advanced command line functions, automated command line functions, plotting options, preparing input, selecting degrees of freedom, the theoretical background, and the GUI.

TSrepr v1.0.0: Provides methods for representations (e.g., dimensionality reduction, preprocessing, feature extraction) of time series. There is an Introduction to the Framework, a vignette on representations, and a Use Case.

TSstudio v0.1.1: Provides a set of interactive visualization tools for time series analysis that support ts, mts, zoo and xts objects, including visualization functions for forecasting model performance (forecasted vs. actual), interactive time series plots (single and multiple series), and seasonality plots. The vignette shows the features available.


Utilities

arrangements v1.0.2: Provides fast generators and iterators for permutations, combinations and partitions, allowing users to generate arrangements in a memory-efficient manner. Benchmarks may be found here.

fs v1.1.0: Implements a cross-platform interface to file system operations, built on top of the libuv C library. See README for details.

googlePolylines v0.4.0: Provides functions to encode simple feature (sf) objects and coordinates using the Google polyline encoding algorithm. The vignette introduces the package.
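The polyline encoding itself is compact enough to sketch. The following Python version follows the algorithm as Google documents it (round to five decimals, delta-encode, zig-zag the sign, emit 5-bit chunks offset by 63); it is only an illustration and not the package's implementation, and the names `encode_polyline` and `_encode_signed` are my own.

```python
def _encode_signed(value):
    """Encode one signed integer delta as a run of printable ASCII characters."""
    # Shift left one bit and invert if negative, so the sign lands in the lowest bit.
    value = ~(value << 1) if value < 0 else (value << 1)
    chars = []
    # Emit 5-bit chunks, least significant first; bit 0x20 marks "more follows".
    while value >= 0x20:
        chars.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    chars.append(chr(value + 63))
    return "".join(chars)

def encode_polyline(points, precision=5):
    """Encode a sequence of (lat, lng) pairs as an encoded polyline string."""
    factor = 10 ** precision
    prev_lat = prev_lng = 0
    pieces = []
    for lat, lng in points:
        lat_i, lng_i = round(lat * factor), round(lng * factor)
        # Each coordinate is stored as the difference from the previous point.
        pieces.append(_encode_signed(lat_i - prev_lat))
        pieces.append(_encode_signed(lng_i - prev_lng))
        prev_lat, prev_lng = lat_i, lng_i
    return "".join(pieces)

print(encode_polyline([(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]))
```

The three points used here are the worked example from Google's own description of the format, so the printed string can be checked against it.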

prrd v0.2.0: Provides functions to queue the reverse dependencies of a given package, so that multiple workers can run the tests in parallel. Look here or in the README for functionality details.

rquery v0.3.1: Implements a query generator based on Edgar F. Codd’s relational algebra and operator names, which is aimed at enhancing the experience of using ‘SQL’ at big-data scale. There is a vignette on the Assignment Partitioner and one on Query Generation.

tsibble v0.1.3: Provides a tbl_ts class, the tsibble, to store and manage temporal-context data in a data-centric format. There is an Introduction.


Visualizations

breakDown v0.1.3: Implements break-down plots which show the contribution of every variable present in the model. Vignettes cover linear models, GLMs, and ranger models.

sigmaNet v1.0.3: Offers functions to create interactive graph visualizations using Sigma.js. The vignette shows how to get started.
