Calculating Always-Valid p-values in R

In this post, we will develop a framework for always-valid inference based on the paper Always Valid Inference: Continuous Monitoring of A/B Tests (Johari, Pekelis, and Walsh, 2019). Using an always-valid p-value allows us to continuously monitor A/B tests and potentially stop a test early without invalidating its error guarantees. In section 5 of the paper, the authors propose their method for calculating always-valid p-values: the mixture sequential probability ratio test (mSPRT), first introduced by Robbins (1970).
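As a rough illustration of the idea (not the post's or the paper's own code), here is a minimal sketch of an mSPRT-based always-valid p-value for the simplest textbook case: a single stream of normal observations with known variance, testing a mean of theta0 with a normal mixing distribution centered at theta0. The function name msprt_p_values and the parameters sigma2 (known data variance) and tau2 (mixing variance) are assumptions made for this example. The p-value after n observations is the running minimum of 1 / Lambda_n, where Lambda_n is the mixture likelihood ratio.

```r
# Always-valid p-values from a one-sample normal mSPRT (illustrative sketch).
# Assumes y_i ~ N(theta, sigma2) with sigma2 known, H0: theta = theta0,
# and a N(theta0, tau2) mixing distribution over the alternative.
msprt_p_values <- function(y, theta0 = 0, sigma2 = 1, tau2 = 1) {
  n <- seq_along(y)
  s <- cumsum(y - theta0)  # running sum of deviations from the null mean

  # Log of the mixture likelihood ratio Lambda_n, computed on the log scale
  log_lambda <- 0.5 * log(sigma2 / (sigma2 + n * tau2)) +
    (tau2 * s^2) / (2 * sigma2 * (sigma2 + n * tau2))

  # p_n = min(1, min over k <= n of 1 / Lambda_k): never increases with n
  pmin(1, cummin(exp(-log_lambda)))
}

set.seed(1)
p_null   <- msprt_p_values(rnorm(500, mean = 0))  # data generated under H0
p_effect <- msprt_p_values(rnorm(500, mean = 1))  # data with a large true effect
```

Because the sequence is non-increasing, the test can be stopped the first time the p-value drops below the chosen significance level. A two-sample A/B comparison needs a variance adjustment that this one-sample sketch does not cover.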

Tech Dividends, Part 2

In a previous post, we explored the dividend history of stocks included in the S&P 500, and we followed that by exploring the dividend history of some NASDAQ tickers. Today’s post is a short continuation of that tech dividend theme, with the aim of demonstrating how we can take our previous work and use it to quickly visualize research from the real world. In this case, the inspiration is the July 27th edition of Barron’s, which has an article called 8 Tech Stocks That Yield Steady Payouts.

Plumber Logging

The plumber R package is used to expose R functions as API endpoints. Due to plumber’s incredible flexibility, most major API design decisions are left up to the developer. One important consideration to be made when developing APIs is how to log information about API requests and responses. This information can be used to determine how plumber APIs are performing and how they are being utilized. An example of logging API requests in plumber is included in the package documentation.

Tech Dividends, Part 1

In a previous post, we explored the dividend history of stocks included in the S&P 500. Today, we’ll extend that analysis to cover the NASDAQ because, well, because in the previous post I said I would do that. We’ll also explore a different source for dividend data, do some string cleaning, and check out ways to customize a tooltip in plotly. Bonus feature: we’ll get into some animation too.

Validating Type I and II Errors in A/B Tests in R

In this post, we seek to develop an intuitive sense of what type I (false-positive) and type II (false-negative) errors represent when comparing metrics in A/B tests, in order to gain an appreciation for “peeking”, one of the major problems plaguing the analysis of A/B tests today. To better understand what “peeking” is, it helps to first understand how to properly run a test. We will focus on the case of testing whether there is a difference between the conversion rates cr_a and cr_b for groups A and B.
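To make the peeking problem concrete, here is a small simulation sketch (an illustration under stated assumptions, not the post's own code). It runs many A/A tests in which cr_a and cr_b are truly equal, applies a pooled two-proportion z-test at several interim looks, and compares how often "stop at the first significant look" rejects against a single test at the final sample size. The function name peeking_fpr, the look schedule, and the default parameter values are all hypothetical choices for this example.

```r
# Simulate A/A tests (no true difference) to compare the false-positive rate
# of repeated peeking against a single fixed-horizon test.
peeking_fpr <- function(n_sims = 2000, n_per_group = 1000,
                        looks = seq(100, 1000, by = 100),
                        cr = 0.1, alpha = 0.05) {
  rejected <- replicate(n_sims, {
    a <- rbinom(n_per_group, 1, cr)  # conversions in group A
    b <- rbinom(n_per_group, 1, cr)  # conversions in group B, same true rate

    # Cumulative conversions at each interim look
    sa <- cumsum(a)[looks]
    sb <- cumsum(b)[looks]

    # Pooled two-proportion z-test at each look
    p_pool <- (sa + sb) / (2 * looks)
    se <- sqrt(p_pool * (1 - p_pool) * 2 / looks)
    z <- (sa - sb) / looks / se
    p <- 2 * pnorm(-abs(z))
    p[is.na(p)] <- 1  # degenerate case: zero standard error

    c(peek  = any(p < alpha),            # reject if ANY look is significant
      fixed = p[length(looks)] < alpha)  # reject only at the final look
  })
  rowMeans(rejected)  # share of simulations that falsely rejected
}

set.seed(42)
res <- peeking_fpr()
print(res)
```

The fixed-horizon rate should sit near alpha, while the peeking rate is inflated because every extra look is another chance to cross the threshold by luck alone.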

June 2019 "Top 40" R Packages

Approximately 136 new packages stuck to CRAN in June. (This number is difficult to nail down with certainty because packages may be removed from CRAN after sitting there for a few days.) Here are my picks for the June “Top 40” in ten categories: Computational Methods, Data, Finance, Genomics, Machine Learning, Science and Medicine, Statistics, Time Series, Utilities, and Visualization.

Computational Methods

cppRouting v1.1: Provides functions to calculate distances, shortest paths, and isochrones on weighted graphs using several variants of Dijkstra’s algorithm.

An R Users Guide to JSM 2019

If you are like me, and rather last minute about making a plan to get the most out of a large conference, you are just starting to think about JSM 2019, which will begin in just a few days. My plans always begin with an attempt to sleuth out the R-related sessions. While in the past it took quite a bit of work to identify talks that were likely backed by R-based calculations, this is clearly no longer the case.

Three Strategies for Working with Big Data in R

For many R users, it’s obvious why you’d want to use R with big data, but not so obvious how. In fact, many people (wrongly) believe that R just doesn’t work very well for big data. In this article, I’ll share three strategies for thinking about how to use big data in R, as well as some examples of how to execute each of them. By default, R runs only on data that can fit into your computer’s memory.

Dividend Sleuthing with R

Welcome to a mid-summer edition of Reproducible Finance with R. Today, we’ll explore the dividend histories of some stocks in the S&P 500. By way of history for all you young tech IPO and crypto investors out there: way back, a long time ago in the dark ages, companies used to take pains to generate free cash flow and then return some of that free cash to investors in the form of dividends.

Imagine your Data Before You Collect It

This post introduces the fabricatr package, whose role in the DeclareDesign suite of packages is to simulate data structure and variables. fabricatr helps you to think about your data before you start analysis or even collect it.
