August 2022: "Top 40" New CRAN Packages

2022-09-26

by Joseph Rickert

One hundred ninety-four new packages made it to CRAN in August. Here are my “Top 40” picks in thirteen categories: Computational Methods, Data, Epidemiology, Genomics, Insurance, Machine Learning, Mathematics, Medicine, Pharmaceutical Applications, Statistics, Time Series, Utilities, and Visualization.

Computational Methods

brassica v1.0.1: Executes BASIC programs from the 1970s for historical and educational purposes. This enables famous examples of early machine learning, artificial intelligence, natural language processing, cellular automata, and so on, to be run in their original form. See BASIC.

MultistatM v1.1.0: Provides algorithms to build set partitions and commutator matrices used in the construction of multivariate d-variate Hermite polynomials. See the vignette.
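For readers new to set partitions, here is a small Python sketch of the concept. It is not MultistatM's API or algorithm. It is a generic recursive generator, and the function name `set_partitions` is hypothetical. It lists every way to split a small set into non-empty, unordered blocks. The number of partitions of an n-element set is the Bell number, so three elements give 5 partitions.

```python
from typing import Iterator, List


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `items` into non-empty, unordered blocks.

    Hypothetical helper for illustration only (not part of MultistatM).
    """
    if not items:
        yield []  # the empty set has exactly one partition: no blocks
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Option 1: add `first` to each existing block in turn.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # Option 2: give `first` a block of its own.
        yield [[first]] + partition


if __name__ == "__main__":
    parts = list(set_partitions([1, 2, 3]))
    for p in parts:
        print(p)
    print(len(parts))  # 5, the Bell number B_3
```

Each partition of the remaining elements is extended by either joining an existing block or starting a new one. This recursion mirrors the standard Bell-number recurrence and generates every partition exactly once.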

Data

hubeau v0.3.1: Provides functions to retrieve data from Hub’Eau, the free and public French national APIs on water. See the vignette.

tinytiger v0.0.4: Downloads geographic shapes from the United States Census Bureau TIGER/Line Shapefiles. See the vignette.

Epidemiology

disbayes v1.0.0: Provides functions to estimate incidence and case fatality for a chronic disease, given partial information, using a multi-state model. See Jackson et al. (2021) for a description of the methods and the vignette for examples.

insectDiseases v1.2.1: Provides the Ecological Database of the World's Insect Pathogens created by David Onstad. See the vignette.

nncc v1.0.0: Provides nearest-neighbors matching and analysis of case-control data. See Cui et al. (2022) for background and the vignette for examples.

Distance density plot

Genomics

epitopR v0.1.2: Offers a suite of tools to predict peptide MHC (major histocompatibility complex) presentation for both human and mouse. Results are based on half maximal inhibitory concentration (IC50) values queried through the Immune Epitope Database (IEDB) API. See Vita et al. (2018) for background and the vignette for an example.

geneHapR v1.1.1: Provides functions to import genome variant data and perform gene haplotype statistics, visualization, and phenotype association analysis. There is an Introduction, and there are vignettes on Data and Workflow.

Workflow diagram

geniePBC v1.0.0: Linked to the American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) BioPharma Collaborative, this package provides a way to obtain the data corresponding to each release from Synapse and to prepare datasets for analysis. There are several vignettes, including drug regimen sunburst and pull data synapse.

Sunburst plot showing 3 lines of treatment

slendr v0.3.0: Implements a framework for simulating spatially explicit genomic data which leverages real cartographic information for programmatic and visual encoding of spatiotemporal population dynamics on real geographic landscapes. Population genetic models are then automatically executed using a custom built-in simulation ‘SLiM’ script as described in Haller et al. (2019). There are several vignettes including A Basic Tutorial, Programming Dispersion Dynamics, and Tree-sequence processing.

Network of ancestral links

Insurance

actxps v0.2.0: Supports actuarial experience studies with functions to prepare data, create studies, and perform exposure calculations as described in Atkinson & McGarry (2016). Look here for an example.

Shiny app for surrender experience study

Machine Learning

aorsf v0.0.2: Provides functions to fit, interpret, and predict with oblique random survival forests. See Jaeger et al. (2022) for methods to accelerate and interpret oblique random survival forest models. There is an Introduction and there are vignettes on Predictions and PD and ICE Curves.

Risk ratio curve over time

calibrationband v0.2.1: Provides functions to assess the calibration of probabilistic classifiers using confidence bands for monotonic functions, and also facilitate constructing inverted goodness-of-fit tests whose rejection allows for a sought-after conclusion of a sufficiently well-calibrated model. See Dimitriadis et al. (2022) for details and README for an example.

Plot of calibration curve vs. predicted probability

kernelshap v0.2.0: Implements a multidimensional version of the iterative Kernel SHAP algorithm, which is based on Shapley values from cooperative game theory and has become popular for interpreting machine learning models. See Covert & Lee (2021) for details and README for an example.

SHAP value plot for the iris data set

mlim v0.0.9: Uses automated machine learning techniques to fine-tune an Elastic Net, Gradient Boosting, Random Forest, Deep Learning, Extreme Gradient Boosting, or Stacked Ensemble machine learning model for imputing the missing observations of each variable. See README for examples.

mlim workflow

tidytags v1.0.2: Facilitates the analysis of Twitter data by coordinating the simplicity of collecting tweets over time with a Twitter Archiving Google Sheet and the utility of the rtweet package for processing and preparing Twitter metadata. There is a Getting Started Guide and a vignette on using conference hashtags.

Network plot showing popularity of tweets

Mathematics

ambit v0.1.2: Provides tools to simulate and estimate various types of ambit processes, including trawl processes and weighted trawl processes. There is a vignette on Simulating trawl processes and another on estimating the trawl function.

Plot of trawl process

rTensor2 v0.1.1: Provides a set of tools for basic tensor operations, including eigenvalue, QR, and LU decompositions, the inverse of a tensor, and the transpose of a symmetric tensor. (A tensor in this context is a multidimensional array.) See Kernfeld et al. (2015) for background.

Medicine

RIbench v1.0.1: Provides a benchmark suite of tools for indirect methods of reference interval estimation. See the vignette.

Boxplots split by pathological fraction or sample size for a certain distribution type

spiro v0.1.1: Provides functions to import, process, summarize and visualize raw data from metabolic carts. See Robergs, Dwyer, and Astorino (2010) for details and the vignettes Importing & Processing and Summarizing & Plotting.

Multiple plots from cardiopulmonary exercise testing

Pharmaceutical Applications

gtreg v0.1.1: Provides functions that leverage the gtsummary package to create tables suitable for regulatory agency submissions. See the vignette on adverse event counting methods.

tidyCDISC v0.1.0: Implements a drag-and-drop Shiny application that allows users to construct CDISC-compliant tables and plots from ADaM (Analysis Data Model) data for studying population- and patient-level data. See the vignette for more information and look here to run a simulation.

Data entry page for Shiny app

Statistics

cassowaryr v2.0.0: Computes a range of scatterplot diagnostics (scagnostics) on pairs of numerical variables in a data set, including the graph and association-based scagnostics described in Wilkinson & Wills (2008) doi:10.1198/106186008X320465 and the association-based scagnostics described by Grimm (2016). See the vignette.

Scagnostic plots

contsurvplot v0.1.1: Provides tools to visualize the causal effect of a continuous variable on a time-to-event outcome including survival area plots, survival contour plots, survival quantile plots and 3D surface plots. See Denz & Timmesfeld (2022) for details and the vignette for examples.

Plot of survival area curves

copre v0.1.0: Provides functions for Bayesian nonparametric density estimation using martingale posterior distributions and includes the Copula Resampling (CopRe) algorithm, a Gibbs sampler for the marginal Mixture of Dirichlet Process (MDP) model, and an extension for full uncertainty quantification via a new Polya completion algorithm. See Moya & Walker (2022), Fong et al. (2021), and Escobar & West (1995) for background and README for an example.

Plot of mixture distributions

InteractionPoweR v0.1.1: Provides functions for power analysis of regression models that test the interaction of two independent variables on a single dependent variable. See the tutorial and README to get started.

singR v0.1.1: Implements the SING algorithm to extract joint and individual non-Gaussian components from two datasets. See Risk & Gaynanova (2021) for the theory and the vignette for examples.

Multiple plots showing joint loadings

SIRthresholded v1.0.0: Implements a version of the Sliced Inverse Regression method which may be used for variable selection. See the vignette.

Plots showing thresholded regularization paths

sparseR v0.2.0: Implements ranked sparsity methods, including penalized regression methods such as the sparsity-ranked lasso, its non-convex alternatives, and elastic net, as well as the sparsity-ranked Bayesian Information Criterion. See Peterson and Cavanaugh (2022) for background and the vignette for an overview.

Plot of parameter estimate vs log of penalty

spmodel v0.1.0: Provides functions to fit, summarize, and predict a variety of spatial statistical models. Modeling features include anisotropy, random effects, partition factors and big data approaches. There is an Overview, a Detailed Guide and a vignette on Technical Details.

TRexSelector v0.0.1: Provides functions to perform fast variable selection in high-dimensional settings while controlling the false discovery rate at a user-defined target level. See Machkour et al. (2021) for background and the vignette for an overview.

Diagram of TRexSelector Framework

VAJointSurv v0.1.0: Provides functions to estimate joint marker (longitudinal) and survival (time-to-event) outcomes using variational approximations which allow for correlated error terms and multiple types of survival outcomes which may be left-truncated, right-censored, and recurrent. See the vignette for some theory and examples.

Plots of estimated population means with pointwise confidence intervals

Time Series

gmwmx v1.0.2: Implements the Generalized Method of Wavelet Moments with Exogenous Inputs estimator (GMWMX) presented in Cucci et al. (2022). There are vignettes on Estimating and comparing models, Removing extreme values in time series, Generating data, and Simulation.

Plot of functional model of a time series

Utilities

countdown v0.4.0: Provides a simple countdown timer for slides and HTML documents written in R Markdown or Quarto, and for Shiny apps. See README for an example.

Countdown clock

D4TAlink.light v2.2.9: Provides workload management tools to facilitate structuring R&D activities to comply with FAIR principles as discussed by Jacobsen et al. (2017) and with ALCOA+ principles as proposed by the FDA. See the vignette.

tinkr v0.1.0: Provides functions to convert Markdown and R Markdown files to XML and back, so they can be edited with xml2 (XPath) rather than with numerous complicated regular expressions. See the vignette.

typetracer v0.1.1: The R language includes a set of defined types, but it has been described as being “absurdly dynamic” (Turcotte & Vitek, 2019) and lacks tools to specify which types are expected by an expression. typetracer provides functions to extract detailed information on the properties of parameters passed to R functions. See the vignette.

Visualization

atime v2022.9.16: Provides functions for computing and visualizing comparative asymptotic timings of different algorithms and code versions. It also includes functionality for comparing empirical timings with expected reference curves such as linear or quadratic. There are several vignettes, including Comparing git versions, Optimal segmentation examples, and Regular Expression examples.

Time comparison plots

ggsurvfit v0.1.0: Extends ggplot2 to ease the creation of publication-ready survival plots. See the vignettes Gallery and Themes.

Annotated Survival Plot

oblicubes v0.1.2: Extends the isocubes package to provide three-dimensional rendering for grid and ggplot2 graphics using cubes and cuboids drawn with an oblique projection. See the vignette.

Textured rendering of 3D object with multiple holes

ordr v0.1.0: Provides a tidyverse extension for ordinations and biplots. Ordination comprises several multivariate techniques, including principal components analysis, that rely on eigen-decompositions or singular value decompositions of pre-processed numeric matrix data. The overlay of the resulting shared coordinates on a scatterplot is called a biplot. See Le Roux & Rouanet (2005) for background, and see the vignettes on Covariance Data and Ordination.

Example of a biplot
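To make the ordination idea behind ordr concrete, here is a minimal NumPy sketch of an SVD-based biplot. It does not use ordr's tidyverse interface. The function name `svd_biplot_coords` and the `alpha` parameter for splitting singular values between rows and columns are illustrative assumptions. The sketch column-centers the data, takes its singular value decomposition X = U S Vᵀ, and returns row (observation) and column (variable) coordinates whose inner products reproduce the centered matrix. A 2-D biplot would overlay the first two columns of each.

```python
import numpy as np


def svd_biplot_coords(x: np.ndarray, alpha: float = 1.0):
    """Return (centered data, row coords, column coords) for a biplot.

    Hypothetical helper for illustration only (not ordr's API).
    alpha=1 puts the singular values on the rows (row-principal, so the
    rows are principal component scores); alpha=0 puts them on the columns.
    In every case rows @ cols.T reproduces the centered data.
    """
    xc = x - x.mean(axis=0)                       # pre-process: center each column
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    rows = u * s**alpha                           # observation coordinates
    cols = vt.T * s**(1 - alpha)                  # variable coordinates
    return xc, rows, cols


rng = np.random.default_rng(0)
data = rng.normal(size=(20, 4))
xc, rows, cols = svd_biplot_coords(data)
# The first two columns of each are what a 2-D biplot would plot.
print(rows[:, :2].shape, cols[:, :2].shape)       # (20, 2) (4, 2)
# Using all components, the inner products recover the centered matrix.
print(np.allclose(rows @ cols.T, xc))             # True
```

The choice of `alpha` only changes how the singular values are shared between the two sets of points. The product of row and column coordinates stays the same, which is why both sets can be drawn in one shared coordinate system.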