An R Users Guide to JSM 2019

by Joseph Rickert

If you are like me, and rather last minute about making a plan to get the most out of a large conference, you are just starting to think about JSM 2019, which begins in just a few days. My plans always begin with an attempt to sleuth out the R-related sessions. While in the past it took quite a bit of work to identify talks that were likely backed by R-based calculations, this is clearly no longer the case. In fact, because Stanford Professor Trevor Hastie will be delivering the prestigious Wald Lectures this year, R-backed work will be front and center.

Professor Hastie has made numerous important contributions to statistical learning, machine learning, data science and statistical computing. Among the latter is the glmnet package, which he co-authored with Jerome Friedman, Rob Tibshirani, Noah Simon, Balasubramanian Narasimhan and Junyang Qian, and which has become a fundamental resource.

The Wald Lectures will be delivered over three days in room CC Four Seasons 1 according to the following schedule:
* Lecture 1: Mon, 7/29/2019, 10:30 AM - 12:20 PM
* Lecture 2: Tue, 7/30/2019, 2:00 PM - 3:50 PM
* Lecture 3: Wed, 7/31/2019, 10:30 AM - 12:20 PM

If you want to do some preparation for the lectures, you might have a look at the book Statistical Learning with Sparsity: The Lasso and Generalizations by Hastie, Tibshirani and Wainwright.

The rest of this post lists some R-related talks that can help you fill your days at JSM! I am sure my list is not complete. Please feel free to add anything I may have missed to the comments section following this post.

Sunday, July 28, 2019

Findings from Analysis and Visualization of the New York City Housing and Vacancy Survey Data - CC 501 - 3:20 PM - Nels Grevstad, Metropolitan State University of Denver; Rachel Rosebrook, Metropolitan State University of Denver; Lance Barto, Metropolitan State University of Denver; Gil Leibovich, Metropolitan State University of Denver; Elizabeth Foster, Metropolitan State University of Denver; ThienNgo Le, Metropolitan State University of Denver; Kelsey Smith, Metropolitan State University of Denver; Nathanael Whitney, Metropolitan State University of Denver; Zoe Girkin, Metropolitan State University of Denver; Ahern Nelson, Metropolitan State University of Denver; Karan Bhargava, Metropolitan State University of Denver; Alex Whalen-Wagner, Metropolitan State University of Denver; Gemma Hoeppner, Metropolitan State University of Denver; Larry Breeden, Metropolitan State University of Denver; Ayako Zrust, Metropolitan State University of Denver; Travis Rebhan, Metropolitan State University of Denver; Anayeli Ochoa, Metropolitan State University of Denver

Bayesian Uncertainty Estimation Under Complex Sampling - Speed: CC 502 - 3:00 PM - Matthew Williams, National Science Foundation; Terrance Savitsky, Bureau of Labor Statistics

Measuring Gentrification Over Time with the NYCHVS - Poster: CC Hall C - 4:00 PM - 4:45 PM - Robert Montgomery, NORC; Quentin Brummet, NORC; Nola du Toit, NORC at the University of Chicago; Peter Herman, NORC at the University of Chicago; Edward Mulrow, NORC at the University of Chicago

A SHINY Markov Machine for Decision-Making in Major League Baseball - Part 1: CC 105 - 2:45 PM and Part 2: CC Hall C - 4:00 PM to 4:45 PM - Jason Osborne, North Carolina State University

Measuring Gentrification Over Time with the NYCHVS - CC 501 - 2:55 PM - Robert Montgomery, NORC; Quentin Brummet, NORC; Nola du Toit, NORC at the University of Chicago; Peter Herman, NORC at the University of Chicago; Edward Mulrow, NORC at the University of Chicago

A New Tidy Data Structure to Support Exploration and Modeling of Temporal Data - CC 301 - 3:25 PM - Earo Wang, Monash University; Dianne Cook, Monash University; Rob J Hyndman, Monash University

TensorFlow Versus H2O, Predicting the S&P 500 - CC 504 - 4:50 PM - Kenneth Davis

Model-Based Clustering Using Adjacent-Categories Logit Models via Finite Mixture Model - CC 504 - 5:05 PM - Lingyu Li, Victoria University of Wellington; Ivy Liu, Victoria University of Wellington; Richard Arnold, Victoria University of Wellington

The Estimable Luke Tierney – and Estimability in R - CC 501 - 5:20 PM - Russell V. Lenth, University of Iowa

Monday, July 29, 2019

Training Students Concurrently in Data Science and Team Science: Results and Lessons Learned from Multi-Institutional Interdisciplinary Student-Led Research Teams 2012-2018 - Poster: CC Hall C - 2:00 PM to 3:50 PM - Brent Ladd, Purdue University; Mark Ward, Purdue University

A Natural Language Processing Algorithm for Medication Extraction from Electronic Health Records Using the R Programming Language: MedExtractR - Poster: CC Hall C - 2:00 PM to 3:50 PM - Hannah L Weeks, Vanderbilt University; Cole Beck, Vanderbilt University Medical Center; Elizabeth McNeer, Vanderbilt University; Joshua C Denny, Vanderbilt University; Cosmin A Bejan, Vanderbilt University; Leena Choi, Vanderbilt University Medical Center

Conditional Probability and SQL for Data Science - Poster: CC Hall C - 10:30 AM to 12:20 PM - Eric Suess, CSU East Bay

R Markdown: a Software Ecosystem for Reproducible Publications - CC 107 - 11:55 AM - Yihui Xie, RStudio, Inc.

Infusing Bayesian Strategies for Pharmaceutical Manufacturing and Development - CC 109 - 12:05 PM - Bill Pikounis, Johnson & Johnson; Dwaine Banton, Janssen R&D; John Oleynick, Johnson & Johnson; Jyh-Ming Shoung, Janssen R&D

Tuesday, July 30, 2019

Controlling the False Discovery Proportion: a Simulation Study - Poster: CC Hall C - 10:30 AM to 12:20 PM - Harlan McCaffery, University of Michigan; Chi Chang, Michigan State University

Give Your Statistician Colleague Iris Bulbs for Their House Warming! - CC 605 - 11:05 AM - Dianne Cook, Monash University

From Prediction Models to Shiny App: Creating a Tool for Contaminated Food Source Prediction in Salmonella and STEC Outbreaks - CC Hall C - 11:35 AM to 12:20 PM - Caroline Ledbetter, University of Colorado; Alice White, Colorado School of Public Health; Elaine Scallan Walter, Colorado School of Public Health; David Weitzenkamp, Colorado School of Public Health

Stats for Data Science - H-Centennial Ballroom G-H - Round Table: 12:30 PM to 1:50 PM - Daniel Kaplan, Macalester College

Experiences with Incorporating R into a Second-Level Biostatistics Course for MPH Students - CC Hall C - 2:00 PM to 2:45 PM - Christine Mauro, Columbia University; Nicholas Williams, Columbia University; Anjile An, Columbia University

From Prediction Models to Shiny App: Creating a Tool for Contaminated Food Source Prediction in Salmonella and STEC Outbreaks - CC 501 - 8:40 AM - Caroline Ledbetter, University of Colorado; Alice White, Colorado School of Public Health; Elaine Scallan Walter, Colorado School of Public Health; David Weitzenkamp, Colorado School of Public Health

Tools for Evaluating Quality of State and Local Administrative Data - CC 708 - 9:15 AM - Zachary H Seeskin, NORC at the University of Chicago; Gabriel Ugarte, NORC at the University of Chicago; Rupa Datta, NORC at the University of Chicago

Wednesday, July 31, 2019

Ggvoronoi: Voronoi Tessellations in R - CC 105 - 11:20 AM - Thomas J Fisher, Miami University; Robert C Garrett, Miami University; Karsten Maurer, Miami University

Using R to Conduct Retrospective Analyses of EHR and Imaging Data: a Case Study in MS - Poster: CC Hall C - 10:30 AM - 12:20 PM - Melissa Martin, University of Pennsylvania; Russell Shinohara, University of Pennsylvania

Generalized Causal Mediation and Path Analysis and Its R Package gmediation - Talk: CC 501 - 8:45 AM and Poster: CC Hall C - 11:35 AM - 12:20 PM - Jang Ik Cho, Eli Lilly and Company; Jeffrey M Albert, Case Western Reserve University

Tidi_MIBI: a Tidy Pipeline for Microbiome Analysis and Visualization in R - Speed Talk: CC 501 - 10:15 AM and Poster: CC Hall C - 11:35 AM - 12:20 PM - Charlie Carpenter, University of Colorado-Biostatistics

Incorporating Spatial Statistics into Routine Analysis of Agricultural Field Trials - CC Hall C - 11:35 AM - 12:20 PM - Julia Piaskowski, University of Idaho; Chad Jackson, University of Idaho; Juliet Marshall, University of Idaho; William J Price, University of Idaho

Incorporating Spatial Statistics into Routine Analysis of Agricultural Field Trials - CC 501 - 10:05 AM - Julia Piaskowski, University of Idaho; Chad Jackson, University of Idaho; Juliet Marshall, University of Idaho; William J Price, University of Idaho

DemoR: Tools for Teaching and Presenting R Code - CC 302 - 10:35 AM - Kelly Bodwin, California Polytechnic State University; Hunter Glanz, California Polytechnic State University

Ghclass: An R Package for Managing Classes with GitHub - CC 302 - 10:50 AM - Colin Rundel, Duke University

Using and Building Shiny Apps for Teaching Introductory Biostatistics - CC 504 - 11:05 AM - Adam Ciarleglio, The George Washington University

Using GitHub and RStudio to Facilitate Authentic Learning Experiences in a Regression Analysis Course - CC 302 - 11:05 AM - Maria Tackett, Duke University

A Generalized Additive Cox Model with L1-Penalty for Heart Failure Time-To-Event Outcomes and Comparison to Other Machine Learning Approaches - CC 712 - 3:20 PM - Matthias Kormaksson

Supplementary Code

In case you are wondering how I produced the plot above, here is the code, which uses the cranly and dlstats packages to investigate CRAN.

library(tidyverse)
library(cranly)
library(dlstats)
# Get clean copy of CRAN
p_db <- tools::CRAN_package_db()
package_db <- clean_CRAN_db(p_db)
# Build package network
package_network <- build_network(package_db)

# Find Hastie packages
pkgs <- package_by(package_network, "Trevor Hastie")
# Find most downloaded Hastie packages
dstats <- cran_stats(pkgs)
topdown <- group_by(dstats,package) %>% 
           summarize(n=sum(downloads)) %>% 
           arrange(desc(n)) %>% filter(n > 100000)

# Plot the monthly downloads for Hastie's top 5 packages
shortlist <- select(topdown,package) %>% slice(1:5) 
toppkgs <- cran_stats(as.vector(shortlist$package))

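# x is the month-end date and y is the download count, so label each axis accordingly
ggplot(toppkgs, aes(end, downloads, group=package, color=package)) +
  geom_line() + geom_point(aes(shape=package)) +
  xlab("Month") + ylab("Monthly Downloads") + ggtitle("Trevor Hastie Packages")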
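
If you would like to try the ranking step without querying CRAN, here is a minimal sketch. It assumes a small, made-up data frame (`toy_stats`, with invented package names and download counts) that has the same `package`, `end` and `downloads` columns the code above uses. It is only meant to illustrate the group, sum, sort and filter logic, not to reproduce the actual numbers.

```r
library(dplyr)

# Hypothetical monthly download counts; package names and numbers are invented
toy_stats <- data.frame(
  package   = rep(c("pkgA", "pkgB", "pkgC"), each = 2),
  end       = rep(as.Date(c("2019-05-31", "2019-06-30")), times = 3),
  downloads = c(60000, 70000, 20000, 30000, 90000, 95000)
)

# Total downloads per package, largest first, keeping only totals above 100,000
toy_top <- toy_stats %>%
  group_by(package) %>%
  summarize(n = sum(downloads)) %>%
  arrange(desc(n)) %>%
  filter(n > 100000)

# Names that would be passed back to cran_stats() for the monthly plot
toy_shortlist <- as.character(toy_top$package)
print(toy_top)
```

With these invented counts, pkgB falls below the threshold and only pkgC and pkgA remain, in that order.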
