May 2022: "Top 40" New CRAN Packages

by Joseph Rickert

One hundred seventy-nine new packages made it to CRAN in May. Here are my “Top 40” picks in thirteen categories: Computational Methods, Data, Ecology, Epidemiology, Finance, Genomics, Machine Learning, Networks, Science, Statistics, Time Series, Utilities, and Visualization.

Computational Methods

graDiEnt v1.0.1: Implements the derivative-free, optim-style Stochastic Quasi-Gradient Differential Evolution optimization algorithm published in Sala, Baldanzini, and Pierini (2018) that uses population members to build stochastic gradient estimates. See README for an example.

rxode2 v2.0.7: Provides facilities for running simulations from ordinary differential equation models, such as pharmacometrics and other compartmental models. Note that it requires both C and Fortran compilers. See the vignette for an example.

ScaleSpikeSlab v1.0: Provides a scalable Gibbs sampling implementation for high dimensional Bayesian regression with the continuous spike-and-slab prior described in Biswas, Mackey & Meng (2022). See README for an example.

Data

bluebike v0.0.3: Provides functions that facilitate importing and working with the Boston Blue Bike Data Set including functions to compute trip distances and map the locations of stations within a given radius. See the vignette for examples.

Map showing radius from selected station

eurodata v1.4.2: Implements an interface to Eurostat’s Bulk Download Facility with fast data.table based import of data, labels, and metadata along with data search and data description and comparison functions. See README to get started.

gbifdb v0.1.2: Implements a high performance interface to the Global Biodiversity Information Facility that supports large-scale analyses using SQL or dplyr operations on complete tables. See the vignette for examples.

GBIF observations of vertebrates by class

getwiki v0.9.0: Implements a simple wrapper for Wikipedia data to retrieve text in a tidy format that can be used for Natural Language Processing. See the vignette.

Ecology

FIESTA v3.4.1: Implements an estimation tool for analysts that work with sample-based inventory data from the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis Program. There are nine vignettes including manuals for Module Estimates and Population Data, and Small Area Estimators and Spatial Tools.

Example of soil test plot from Cate & Nelson (1971)

soiltestcorr v2.1.2: Provides functions designed to assist users with the correlation analysis of crop yield and soil test values, including functions to estimate crop response patterns to soil nutrient availability and critical soil test values using various approaches. See Correndo et al. (2017), Cate & Nelson (1971), Anderson & Nelson (1975), Bullock & Bullock (1994) and Melsted & Peck (1977) for background. There are seven vignettes including an Introduction and Quadratic-plateau response.

sspm v0.9.1: Implements a GAM-based spatial surplus production model, aimed at modeling the northern shrimp population in Atlantic Canada but potentially applicable to any stock in any location. There is a vignette on Package and Workflow design and another that provides an example with simulated data.

Epidemiology

EpiInvert v0.1.1: Inverts a renewal equation to estimate time-varying reproduction numbers and restored incidence curves with festive days and weekly biases corrected as described in Alvarez et al. (2021) and Alvarez et al. (2022). See README for examples.

Plots of incidence curves
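For readers unfamiliar with the term, a renewal equation ties current incidence to past incidence through the reproduction number. A commonly used discrete form is sketched below. The symbols are illustrative choices rather than the package's notation: $i_t$ is incidence at time $t$, $R_t$ is the time-varying reproduction number, and $w_s$ is the serial interval distribution. The exact formulation that EpiInvert inverts is given in the cited papers and may differ from this one.

```latex
i_t = R_t \sum_{s \ge 1} i_{t-s}\, w_s, \qquad \sum_{s \ge 1} w_s = 1
```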

linelist v0.0.1: Provides tools to help store and handle case line list data by adding a tagging system to classical data.frame objects to identify key epidemiological data. See the vignette for examples.

Finance

markets v1.0.3: Provides functions to estimate markets in equilibrium and disequilibrium based on full information maximum likelihood techniques given in Maddala and Nelson (1974) and implemented using the analytic derivative expressions calculated in Karapanagiotis (2020). There is an Overview providing theory and code and vignettes on Model initialization details, Market-clearing assessment, and Use cases.

portvine v1.01: Provides portfolio level risk estimates including Value at Risk and Expected Shortfall following the approach described in Sommer (2022) by modeling each asset with an ARMA-GARCH model and then modeling their cross dependency via a Vine Copula in a rolling window fashion. See the vignette to get started.

Comparison of unconditional risk measurements of assets over a trading day
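To make the two risk measures concrete, here is a minimal base R sketch of the empirical Value at Risk and Expected Shortfall of a single return series. It does not use portvine or its ARMA-GARCH and vine copula machinery. The simulated returns, the 5% level, and all variable names are illustrative assumptions.

```r
# Illustrative only: empirical VaR and ES from simulated daily returns.
# This is NOT the portvine API; returns, alpha and names are assumptions.
set.seed(42)
returns <- rnorm(1000, mean = 0, sd = 0.01)  # hypothetical daily returns

alpha <- 0.05  # tail probability (assumed level)

# Value at Risk: the alpha-quantile of the return distribution
var_alpha <- quantile(returns, probs = alpha, names = FALSE)

# Expected Shortfall: mean return in the tail at or below the VaR
es_alpha <- mean(returns[returns <= var_alpha])

cat("VaR:", var_alpha, " ES:", es_alpha, "\n")
```

Because the Expected Shortfall averages only the returns at or below the VaR, it is never larger than the VaR itself.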

usincometaxes v0.4.0: Implements a wrapper to the NBER’s TAXSIM 35 tax simulator to calculate federal and state income taxes. There are vignettes on Uploading Data, Input Columns, Output Columns, and Calculating Taxes.

Plot of the relationship between wages and income taxes paid

Genomics

MixviR v3.3.5: Implements tools for exploring DNA and amino acid variation and inferring the presence of target lineages from microbial high-throughput genomic DNA samples that potentially contain mixtures of variants/lineages. MixviR was originally created to help analyze SARS-CoV-2/Covid-19 samples from environmental sources such as waste water or dust, but can be applied to any microbial group. See DePristo et al. (2011) and Danecek et al. (2021) for background, and the vignette for examples.

Example of a lineage associated mutation file

simer v0.9.0.0: Implements a data simulator including genotype, phenotype, pedigree, selection and reproduction for animals and plants and provides data for genomic selection, genome-wide association, and breeding studies. See Kao and Zeng (2002) and Ripley (1987) for background. Look here for extensive documentation.

Machine Learning

fastTopics v0.6-135: Implements fast, scalable optimization algorithms for fitting topic models and non-negative matrix factorization for count data. The methods exploit the special relationship between the multinomial topic model (probabilistic latent semantic indexing) and Poisson non-negative matrix factorization. See the vignettes: Relationship between NMF and topic modeling and Topic modeling vs. clustering.

metrica v1.2.3: Provides functions to evaluate prediction performance of point-forecast models accounting for different aspects of the agreement between predicted and observed values including error metrics, model efficiencies, indices of agreement, goodness of fit, concordance correlation, and error decomposition, and plots the visualized agreement. See the vignettes on Available Metrics and Model Assessment.

Plot model fit with metrics table.

SparseVFC v0.1.0: Implements the sparse vector field consensus algorithm described in Ma et al. (2013) for robust vector field learning. See the vignette.

Plot of vector field.

Networks

Rwclust v0.0.1: Implements the random walk clustering algorithm for weighted graphs as found in Harel and Koren (2001). See the vignette.

Plot of network with edge weights.

Science

EvoPhylo v0.1: Provides functions to support automated morphological character partitioning for phylogenetic analyses, and analyses of macroevolutionary parameter outputs. See Simões and Pierce (2021) for background. There is a vignette on Theoretical Background, and there are others on Character Partitioning, FBD parameters, and Evolutionary Rates.

Plots of Clade distributions by clock

sits v1.0.0: Provides an end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al. (2021). Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, and Digital Earth Africa using the STAC protocol and the gdalcubes package. An eBook provides extensive documentation.

Plot of CBERS-4 image covering an area in the Brazilian Cerrado.

Statistics

biosensors.usc v1.0: Provides a framework for using distributional representations of biosensor data such as ECG, medical imaging or fMRI data in various statistical modeling tasks: regression models, hypothesis testing, cluster analysis, visualization, and descriptive analysis. See Matabuena et al. (2021) for background and the vignette for an introduction.

Conditional mean, quantile, and residual curves for Wasserstein regression

CopSens v0.1.0: Implements the copula-based sensitivity analysis method for observational causal inference discussed in Zheng et al. (2022). See README for examples.

Estimated causal effect of covariates

GeoModels v1.0.1: Provides functions to analyze Gaussian and non-Gaussian (bivariate) spatial and spatio-temporal data and simulate random fields using likelihood methods. See Bevilacqua and Gaetan (2015) and Vallejos et al. (2020) for background, and look here for examples.

World map with MSE contours

nlmixr2 v2.0.6: Fits and compares nonlinear mixed-effects models with flexible dosing information commonly seen in pharmacokinetics and pharmacodynamics using differential equations solved by compiled C code provided in the rxode2 package. See Almquist et al. (2015) and Wang et al. (2015) for background, and the vignette for examples.

Default nlmixr2 plots

stdmod v0.1.7.1: Provides functions for computing a standardized moderation effect in moderated regression and forming its confidence interval by nonparametric bootstrapping as proposed in Cheung et al. (2002). There are six vignettes including a Quick Start Guide and a vignette on Standardized Moderation.

Plot of moderation effects on two variables

Time Series

forceR v1.0.15: Initially written and optimized to deal with insect bite force measurements, the functions in this package can be used to clean and analyze any time series. They provide a workflow to load, plot and crop data, correct amplifier and baseline drifts, identify individual peak shapes, rescale (normalize) peak curves, and find best polynomial fits to describe and analyze force curve shapes. See the vignette for examples.

Plot of time series corrected with spline fit

ZINARp v0.1.0: Provides functions for simulation, exploratory data analysis and Bayesian analysis of p-order integer-valued autoregressive, INAR(p), and zero-inflated p-order integer-valued autoregressive, ZINAR(p), processes, as described in Garay et al. (2020).

Utilities

async v0.2.1: Provides functions for writing sequential-looking code that pauses and resumes similarly to generator and async constructs from Python or JavaScript. Objects produced are compatible with the iterators and promises packages. See the vignette for an example.

Musical score showing a loop

chronicler v0.2.0: Provides tools to decorate a function so that it returns its result along with a log detailing when the function was run, what its inputs were, what errors occurred (if the function failed to run), and other useful information. See the vignettes: A non-mathematician’s introduction to monads, The Maybe monad, and A real world example.

jpgrid v0.2.0: Provides functions to generate Japanese grid square codes from longitude, latitude and geometries. See README for examples.

Plot of grid codes
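For readers who have not met Japanese grid square codes, the base R sketch below shows how the coarsest level (roughly 80 km cells) is derived from a coordinate pair: latitude bands are 40 minutes tall and longitude bands are one degree wide. It does not use the jpgrid API. The helper name `grid_80km()` and the sample coordinates are hypothetical.

```r
# Illustrative only: first-level (about 80 km) grid square code in base R.
# This is NOT the jpgrid API; grid_80km() is a hypothetical helper.
grid_80km <- function(lon, lat) {
  lat_part <- floor(lat * 1.5)  # latitude bands are 40 minutes (2/3 degree) tall
  lon_part <- floor(lon) - 100  # longitude bands are 1 degree wide, offset by 100
  sprintf("%02d%02d", as.integer(lat_part), as.integer(lon_part))
}

code <- grid_80km(lon = 139.767, lat = 35.681)  # a point in central Tokyo
print(code)
```

Finer grid levels subdivide each of these cells further and append extra digits to the code.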

partialised v0.1.0: Provides a partialised class that extends the partialising function of purrr by making it easier to change the arguments. This is similar to the function-like object in Julia. See README for an example.

shinybrowser v1.0.0: Provides information about shiny app users including browser name and version, device type (mobile or desktop), operating system and version, and browser dimensions. See README for more information.

webshot2 v0.1.0: Takes screenshots of web pages, including Shiny applications and R Markdown documents using a headless Chrome or Chromium browser as the browser back-end. See README for examples.

Visualization

accrualPlot v1.0.1: Implements accrual plots for clinical trials. See the vignette.

Plots showing recruited patients by site over time

ggbraid v0.2.2: Implements stat_braid(), which extends the functionality of geom_ribbon() to correctly fill the area between two alternating lines (or steps) with two different colors, and geom_braid(). Three vignettes, US Supreme Court, NBA Finals Game, and Average Daily Temperatures, provide examples.

Fills the area between two lines with two colors: one color when the solid line is above the dashed line, and a different color when the solid line is below the dashed line.

ggisotonic v0.1.2: Provides stat_isotonic() to add weighted univariate isotonic regression curves. See README for an example.

Scatter plot with isotonic regression curve
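As a rough illustration of what an isotonic regression curve is, the sketch below fits an unweighted, non-decreasing step function with `isoreg()` from base R's stats package. It does not call ggisotonic's stat_isotonic(), which also supports weights. The simulated data and variable names are assumptions.

```r
# Illustrative only: unweighted isotonic regression with stats::isoreg().
# This is NOT ggisotonic's stat_isotonic(); the data are simulated.
set.seed(1)
x <- 1:50
y <- log(x) + rnorm(50, sd = 0.3)  # noisy but increasing trend

fit <- isoreg(x, y)  # fits the best non-decreasing step function

# Fitted values (in order of x) never decrease
print(head(fit$yf))
```

Calling `plot(fit)` draws the points together with the fitted step function, which is the base graphics counterpart of the ggplot2 layer described above.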

UpSetVP v1.0.0: Uses the ideas of variance partitioning and hierarchical partitioning as described in Lai et al. (2022) to visualize the unique, common, or individual contribution of each predictor (or matrix of predictors) towards explaining variation. Look here for examples.

Bar plot and corresponding `upset_vp()` plot

