TokyoR #71

Last month, I was delighted to be invited to speak, along with Hadley Wickham, at the seventy-first meeting of the TokyoR user group in Tokyo, Japan. This day-long mini-conference attracted more than 200 attendees and featured 16 talks that covered a wide range of topics, including two near-real-time analyses of World Cup soccer games (here and here) and an analysis of wind direction with circular data and autoregressive processes (here). The tone of the talks ranged from light-hearted to business-serious.

Highcharting Jobs Friday

Today, in honor of last week’s jobs report from the Bureau of Labor Statistics (BLS), we will visualize jobs data with ggplot2 and then, more extensively, with highcharter. Our aim is to explore highcharter and its similarities to ggplot2, and to create some nice interactive visualizations.

Two Big Ideas from JSM 2018

The Joint Statistical Meetings offer an astounding number of talks. It is impossible for an individual to see more than a small portion of what is going on. Even so, a diligent attendee ought to come away with more than a few good ideas. The following are two big ideas that I got from the conference. Session 149, an invited panel on Theory versus Practice, featured an all-star team of panelists (Edward George, Trevor Hastie, Elizaveta Levina, John Petkau, Nancy Reid, Richard J Samworth, Robert Tibshirani, Larry Wasserman and Bin Yu). It covered a lot of ground and wove a rich tapestry of ideas.

June 2018: Top 40 New Packages

Approximately 144 new packages stuck to CRAN in June. The fact that 31 of these are specialized to particular scientific disciplines or analyses provides some evidence for my hypothesis that working scientists are actively adopting R. Below are my Top 40 picks for June, organized into the categories of Computational Methods, Data, Data Science, Economics, Science, Statistics, Time Series, Utilities and Visualizations. The Data packages, especially rtrek and opensensmapr, look like they have some interesting new data to explore.

JSM 2018 Itinerary

JSM 2018 is almost here! Usually around this time, I comb through the entire program manually, making an itinerary for myself. But this year I decided to try something new: going through the program programmatically, and then building a Shiny app that helps me better navigate the online program. The end result of the app is below. (I might tweak it a bit further after this post goes live, depending on feedback I receive.)

REST APIs and Plumber

Moving R resources from development to production can be a challenge, especially when the resource isn’t something like a shiny application or rmarkdown document that can be easily published and consumed. Consider, as an example, a customer success model created in R. This model is responsible for taking customer data and returning a predicted outcome, like the likelihood the customer will churn. Once this model is developed and validated, there needs to be some way for the model output to be leveraged by other systems and individuals within the company.

CVXR: A Direct Standardization Example

In our first blog post, we introduced CVXR, an R package for disciplined convex optimization, and showed how to model and solve a non-negative least squares problem using its interface. This time, we will tackle a non-parametric estimation example, which features new atoms as well as more complex constraints.

Direct Standardization

Consider a set of observations \((x_i, y_i)\) drawn non-uniformly from an unknown distribution. We know the expected value of the columns of \(X\), denoted by \(b \in {\mathbf R}^n\), and want to estimate the true distribution of \(y\).
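
A sketch of how this estimation problem is commonly posed, under stated assumptions: there are m observations, the rows of X are the x_i, and each observation is assigned a weight w_i. The symbols m and w do not appear in the excerpt above and are introduced here only for illustration. One standard choice is to pick the maximum-entropy weights that reproduce the known column means:

```latex
\[
\begin{array}{ll}
\underset{w \in {\mathbf R}^m}{\text{maximize}} & \displaystyle \sum_{i=1}^m -w_i \log w_i \\[2ex]
\text{subject to} & w \geq 0, \quad \displaystyle \sum_{i=1}^m w_i = 1, \quad X^T w = b.
\end{array}
\]
```

Under this formulation, the distribution of y would be estimated by the weighted empirical distribution that places mass w_i on y_i. The objective is concave and the constraints are linear, so the problem is convex.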

Monte Carlo Shiny: Part Three

In previous posts, we covered how to run a Monte Carlo simulation and how to visualize the results. Today, we will wrap that work into a Shiny app wherein a user can build a custom portfolio, and then choose a number of simulations to run and a number of months to simulate into the future. A link to that final Shiny app is here and here is a snapshot:

Solver Interfaces in CVXR

Introduction

In our previous blog post, we introduced CVXR, an R package for disciplined convex optimization. The package allows one to describe an optimization problem with Disciplined Convex Programming rules using high level mathematical syntax. Passing this problem definition along (with a list of constraints, if any) to the solve function transforms it into a form that can be handed off to a solver. The default installation of CVXR comes with two (imported) open source solvers:

A First Look at NIMBLE

Writing a domain-specific language (DSL) is a powerful and fairly common method for extending the R language. Both ggplot2 and dplyr, for example, are DSLs. (See Hadley’s chapter in Advanced R for some elaboration.) In this post, I take a first look at NIMBLE (Numerical Inference for Statistical Models using Bayesian and Likelihood Estimation), a DSL for formulating and efficiently solving statistical models in general, and Bayesian hierarchical models in particular.
