December 2021: "Top 40" New CRAN Packages

2022-01-24

by Joseph Rickert

One hundred thirty-four new packages made it to CRAN last December. Here are my “Top 40” picks in eight categories: Data, Genomics, Machine Learning, Medicine, Science, Statistics, Utilities, and Visualization.

Data

aurin v0.5.1: Implements an API for AURIN, Australia’s largest resource for accessing clean, integrated, spatially enabled and research-ready data on issues surrounding health and well-being, socio-economic metrics, transportation, and land use. See README for examples.

Map locating public toilets in Australia

fastRhockey v0.1.0: Implements a utility to scrape and load play-by-play data and statistics from the Premier Hockey Federation (PHF), formerly known as the National Women’s Hockey League (NWHL), and access the National Hockey League’s stats API. See README.

pedalfast.data v1.0.0: Provides data files and documentation for PEDiatric vALidation oF vAriableS in TBI (PEDALFAST) used in Bennett et al. (2016). There is a vignette describing the PEDALFAST Data and another on Functional Status Scale.

Plots of Hemorrhage and Hematoma distributions

rasterbc v1.0.1: Provides access to a large data set hosted at the Federated Research Data Repository relevant to forest ecology in British Columbia, Canada. The collection includes: elevation, biogeoclimatic zone, wildfire, and cutblocks forest attributes from Hansen et al. (2013) and Beaudoin et al. (2017). See the vignette.

Elevation map of Central Okanagan

spectator v0.1.0: Provides an interface to the Spectator Earth API, mainly for obtaining the acquisition plans and satellite overpasses for Sentinel-1, Sentinel-2 and Landsat-8 satellites. See the vignette.

Map of SPOT satellite trajectory & position

Genomics

CAMML v0.1.1: Provides functions to create multi-label cell-types for single-cell RNA-sequencing data based on weighted VAM scoring of cell-type specific gene sets. See Schiebout (2022). There is a Quick Start guide, a Figure vignette and an extended example with melanoma data.

Scatter plot of CAMML vs. CITE-seq

hacksig v0.1.1: Provides a collection of cancer transcriptomics gene signatures as well as a simple, tidy interface to compute single sample enrichment scores either with the original procedure or with three alternatives: the combined z-score of Lee et al. (2008), the single sample GSEA of Barbie et al. (2009), and the singscore of Foroutan et al. (2018). See README for examples.

mispitools v0.1.5: Provides functions for computing likelihood ratio thresholds and error rates in DNA kinship testing. See Marsico et al. (2021) for background and README for examples.

Plot of False positive rate vs. false negative rate

paleobuddy v1.0.0: Provides functions to simulate species diversification, fossil records, and phylogenies along with environmental data from Morlon et al. (2016). See the vignette for an overview.

Plot of simulated lineages

toolStability v0.1.1: Provides a collection of functions for describing the stability of a trait in terms of genotype and environment, and includes a data set from Casadebaig et al. (2016). Computed indices are from Döring & Reckling (2018), Eberhart & Russell (1966), Finlay & Wilkinson (1963), Hanson (1970), Lin & Binns (1988), Pinthus (1973) and others. The vignette provides an overview of the theory and examples.

twosigma v1.0.2: Implements the TWO-Component Single Cell Model-Based Association Method for gene-level differential expression analysis and DE-based gene set testing of single-cell RNA-sequencing datasets. See Van Buren et al. (2020) and Van Buren et al. (2021) for the theory and README for examples.

Machine Learning

brulee v0.0.1: Provides high-level modeling functions to define and train models using the torch R package. Models include linear, logistic, and multinomial regression as well as multilayer perceptrons. See README for an example.

imageseg v0.4.0: Implements a general-purpose workflow for image segmentation using TensorFlow models based on the U-Net architecture by Ronneberger et al. (2015) and provides pre-trained models for assessing canopy density and understory vegetation density from vegetation photos. See the vignette.

Plots of algorithm performance

TrueSkillThroughTime v0.1.0: Provides methods to model the entire history of game activities using a single Bayesian network allowing the information to propagate correctly throughout the system. The core ideas implemented in this project were developed by Dangauthier et al. (2007). Look here for examples.

Comparison of top tennis player skills over time

Medicine

aba v0.0.9: Offers a tool to fit clinical prediction models and plan clinical trials using biomarker data across multiple analysis factors (groups, outcomes, predictors). There is a Package Overview and an Introduction to aba models.

Plot of ROC curves

expirest v0.1.2: Provides functions to estimate the release limit and the associated shelf life for chemically derived medicines in accordance with the recommendations of The Australian Regulatory Guidelines for Prescription Medicines guidance on Stability testing for prescription medicines and the International Council for Harmonisation’s guidance Q1E Evaluation of stability data. See README for examples.

Plot of common slope, common intercept model of moisture over time

fca v0.1.0: Provides functions to perform various floating catchment area methods to calculate a spatial accessibility index for demand point data. See Bauer & Groneberg (2016) for background and the vignette for an example.
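For readers new to floating catchment area methods, here is a minimal sketch of the basic two-step variant (2SFCA), written in Python for illustration only. It is not the fca package's API: the function name, the hard distance threshold, and the toy numbers are all assumptions, and the package may weight distances differently.

```python
def two_step_fca(demand, supply, dist, threshold):
    """Basic two-step floating catchment area (2SFCA) accessibility index.

    demand[i]  : population at demand point i
    supply[j]  : capacity at supply point j
    dist[i][j] : travel cost between demand point i and supply point j
    threshold  : catchment size; pairs farther apart are ignored
    """
    n_demand, n_supply = len(demand), len(supply)

    # Step 1: supply-to-demand ratio for each supply point.
    ratios = []
    for j in range(n_supply):
        reachable_pop = sum(
            demand[i] for i in range(n_demand) if dist[i][j] <= threshold
        )
        ratios.append(supply[j] / reachable_pop if reachable_pop > 0 else 0.0)

    # Step 2: each demand point sums the ratios of the supply points it can reach.
    return [
        sum(ratios[j] for j in range(n_supply) if dist[i][j] <= threshold)
        for i in range(n_demand)
    ]


# Toy example (made-up numbers): three demand points, two supply points.
demand = [100, 200, 100]
supply = [10, 20]
dist = [
    [5, 30],
    [10, 10],
    [40, 5],
]
access = two_step_fca(demand, supply, dist, threshold=15)
```

In this toy setup every supply point has at least one demand point in its catchment, so the population-weighted sum of the index equals total supply, which is a handy sanity check.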

grpseq v1.0: Provides functions to help with the design of group sequential trials, including non-binding futility analysis at multiple time points. See Gallo, Mao, & Shih (2014) for background and the vignette for an example.

Power curve for trial design

lemna v0.9.0: Implements the model equations and default parameters for the toxicokinetic-toxicodynamic model of the Lemna (duckweed) aquatic plant. Lemna is a standard test macrophyte used in ecotox effect studies. See Schmitt et al. (2013) for background. There is an introduction and a vignette on verification.

Plots of toxicant concentration, internal toxicant mass and population size

Science

AeroSampleR v0.1.12: Provides functions to estimate ideal efficiencies of aerosol sampling through sample lines. See Hogue et al. (2014) for background and README for examples.

Density plot of ambient and sampled activity

crestr v1.0.0: Implements the CREST climate reconstruction method to reconstruct past climates using biological proxies. See Chevalier (2021) for background. There is a Get started guide and vignettes on Theory, Formatting, and Calibration.

Multiple plots showing the distribution of climate data

renz v0.1.1: Provides utilities to analyze Michaelis-Menten Equation models of enzyme kinetics. See Aledo (2021) for background along with the vignettes: Enzyme Kinetic Parameters, M-M and the Lambert W function, Linearized M-M Equations, Fitting the M-M Model, and Integrated M-M Equation.
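As a reminder of what is being fitted, the Michaelis-Menten rate is v = Vmax * S / (Km + S), and the classic double-reciprocal (Lineweaver-Burk) linearization is 1/v = (Km/Vmax) * (1/S) + 1/Vmax. The Python sketch below recovers Vmax and Km from noise-free synthetic data with an ordinary least-squares line. The function names and numbers are hypothetical and do not reflect renz's interface; with noisy data, a nonlinear fit is generally preferable to this linearization.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten initial rate at substrate concentration s."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(substrate, rates):
    """Estimate (Vmax, Km) from the double-reciprocal line.

    Fits 1/v = slope * (1/S) + intercept by least squares, where
    slope = Km / Vmax and intercept = 1 / Vmax.
    """
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in rates]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Synthetic, noise-free data generated with Vmax = 10 and Km = 2 (made-up values).
substrate = [0.5, 1.0, 2.0, 4.0, 8.0]
rates = [mm_rate(s, vmax=10.0, km=2.0) for s in substrate]
vmax_hat, km_hat = lineweaver_burk_fit(substrate, rates)
```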

Block diagram summary of methods to estimate the kinetic properties of an enzyme

Statistics

brmsmargins v0.1.1: Provides functions to calculate Bayesian marginal effects and average marginal effects for models fit using the brms package including fixed effects, mixed effects, and location scale models. See Pavlou et al. (2015) for background and the vignettes on marginal effects for Fixed Effects, Location Scale, and Mixed models.

changepoints v1.0.0: Implements a series of offline and/or online change-point detection algorithms for univariate mean, univariate polynomials, univariate and multivariate nonparametric settings, high-dimensional covariances, high-dimensional networks with and without missing values, high-dimensional linear regression models, high-dimensional vector autoregressive models, high-dimensional self-exciting point processes, dependent dynamic nonparametric random dot product graphs, and univariate mean against adversarial attacks. See README for references for all of these methods and the vignettes VAR and Univariate means for examples.

cmtest v0.1-1: Provides functions to perform conditional moments tests, as proposed by Newey (1985) and Tauchen (1985), which are useful for detecting specification violations in models estimated by maximum likelihood. See the vignette.

gplsim v0.9.1: Provides functions that employ penalized splines to estimate generalized partially linear single index models, which extend generalized linear models to include nonlinear effects for some predictors. See Yu et al. (2017) and Yu & Ruppert (2002) for the details and README for an example.

gremes v0.1.0: Provides tools for estimation of the tail dependence parameters in graphical models parameterized by a family of Huesler-Reiss distributions. See Asenova et al. (2021) for background. There is an Introduction and fifteen small vignettes described here.

kfa v0.1.0: Provides functions to explore possible factor structures for a set of variables and helps identify plausible and replicable structures via k-fold cross validation. See Flora & Flake (2017) and MacCallum et al. (1996) for background and README for examples.

lrstat v0.1.1: Provides functions to perform power and sample size calculations for non-proportional hazards models using the Fleming-Harrington family of weighted log-rank tests. See Tsiatis (1982) for background. There are six vignettes including Sample Size Calculation with Fixed Follow-up and Power Calculation Using Max-Combo Tests.
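The Fleming-Harrington family weights each event time by w(t) = S(t-)^rho * (1 - S(t-))^gamma, where S(t-) is the pooled survival estimate just before t. The short Python sketch below only evaluates that weight; the function name and the sample survival values are assumptions for illustration and are not part of lrstat.

```python
def fh_weight(surv_before, rho, gamma):
    """Fleming-Harrington weight for one event time.

    surv_before : pooled survival estimate just before the event time, in [0, 1]
    rho, gamma  : non-negative tuning parameters of the FH(rho, gamma) family
    """
    return surv_before ** rho * (1.0 - surv_before) ** gamma


# Made-up pooled survival values at four successive event times.
survival = [0.95, 0.80, 0.50, 0.20]

logrank_weights = [fh_weight(s, 0, 0) for s in survival]  # FH(0, 0): all ones
early_weights = [fh_weight(s, 1, 0) for s in survival]    # FH(1, 0): emphasizes early events
late_weights = [fh_weight(s, 0, 1) for s in survival]     # FH(0, 1): emphasizes late events
```

FH(0, 0) reproduces the ordinary log-rank test, while FH(0, 1) puts more weight on late differences, which is why this family is a common choice under delayed treatment effects.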

nphPower v1.0.0: Provides tools to perform combination tests and sample size calculations for fixed designs with survival endpoints under either proportional or non-proportional hazards. See Brendel et al. (2014) and Cheng & He (2021) for background on projection and maximum weighted logrank tests, Schoenfeld (1981) and Freedman (1982) on sample size calculation under the proportional hazards assumption, Lakatos (1988) for calculation under non-proportional hazards, and README for an example.

Plot of weights for a logistic regression model

optedr v1.0.0: Provides functions to calculate D-optimal, Ds-optimal, A-optimal, and I-optimal designs for non-linear models, via an implementation of the cocktail algorithm described in Yu (2011). See README for examples.

Efficiency curve for D-optimal design

quid v0.0.1: Provides functions to test whether equality and order constraints hold for all individuals simultaneously by comparing Bayesian mixed models through Bayes factors. See Haaf & Rouder (2017), Haaf, Klaassen & Rouder (2019), and Rouder & Haaf (2021) for background and the Quick Start guide and Manual for examples.

Plot comparing observed effects with model estimates

Utilities

abbreviate v0.1: Provides functions to abbreviate strings to at least minlength characters, such that they remain unique (if they were) and are recognizable. See README for examples.
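To make the uniqueness requirement concrete, here is a deliberately simplified Python sketch that abbreviates by prefix and lengthens each prefix until it no longer collides with another name. This is an assumed stand-in for illustration, not the algorithm used by the abbreviate package, and the function name is hypothetical.

```python
def unique_prefix_abbreviations(names, minlength=3):
    """Map each name to the shortest prefix of at least `minlength`
    characters that no other distinct name also starts with."""
    result = {}
    for name in names:
        length = min(minlength, len(name))
        # Grow the prefix while some other name shares it.
        while length < len(name) and any(
            other != name and other.startswith(name[:length]) for other in names
        ):
            length += 1
        result[name] = name[:length]
    return result


states = ["Alabama", "Alaska", "Arizona"]
abbrev = unique_prefix_abbreviations(states, minlength=3)
```

A prefix-only scheme keeps the result recognizable but can produce long abbreviations when names share long stems; smarter schemes drop characters from the middle instead.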

fledge v0.1.0: Offers functions to streamline the process of updating changelogs (NEWS.md) and versioning R packages developed in git repositories. There is a Quick Start guide, a demo, and a short vignette on internals.

qlcal v0.0.1: Provides QuantLib bindings using Rcpp via an evolved version of the initial header-only Quantuccia project offering a subset of QuantLib. See README for examples.

jagstargets: Builds on targets and JAGS to implement a pipeline toolkit tailored to Bayesian statistics, making it easy to set up scalable JAGS pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. There is an Introduction and a vignette that uses simulation to validate a Bayesian model.

Network diagram of a pipeline

RCLabels v0.1.0: Provides functions to assist manipulation of matrix row and column labels for all types of matrix mathematics where row and column labels are to be respected. See the vignette.

trampoline v0.1.1: Implements a trampoline algorithm based on the Python trampoline module that enables users to write theoretically infinite recursive functions. See the vignette and README for examples.
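The general idea of trampolining can be sketched in a few lines of Python. This is one common thunk-based variant, shown only to illustrate the technique; it is an assumption, not the interface of the R package or of the Python module it is based on. Instead of calling itself directly, the recursive function returns a zero-argument function describing the next step, and a loop keeps invoking those steps so the call stack never grows.

```python
def trampoline(func, *args):
    """Run `func`, then keep calling whatever it returns until the
    result is no longer callable. Stack depth stays constant."""
    result = func(*args)
    while callable(result):
        result = result()
    return result


def sum_to(n, acc=0):
    """Sum the integers 1..n, written recursively but returning a thunk
    (a deferred call) instead of recursing directly."""
    if n == 0:
        return acc
    return lambda: sum_to(n - 1, acc + n)


# Far deeper than Python's default recursion limit would allow directly.
total = trampoline(sum_to, 100_000)
```

The trade-off is that the final answer must not itself be callable, since the loop uses callability to decide whether more work remains.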

Visualization

recolorize v0.1.0: Offers automatic, semi-automatic, and manual functions for generating color maps from images. The idea is to simplify the colors of an image according to a metric that is meaningful to the user, using deterministic methods whenever possible. There is an Introduction and a vignette for each of the six steps involved in the process: Acquisition & Preparation, Loading & Processing, Clustering, Refinement, Tweaks & Edits, and Export & Visualization.

Original, bit mapped, and vector mapped images of insect

scImmuneGraph v1.1.3: Provides functions to compute statistics and visualize multiple distributions including diversity, composition of clonotypes, abundance and length of CDR3, the abundance of V and J genes, and the abundance of V-J gene pairs; all of which are basic for single-cell immune group analysis. See the vignette.

Plots of CDR3 length distribution