October 2019: "Top 40" New R Packages

Two hundred twenty-three new packages made it to CRAN in October. Here are my “Top 40” picks in ten categories: Computational Methods, Data, Genomics, Machine Learning, Mathematics, Medicine, Pharmacology, Statistics, Utilities, and Visualization. Computational Methods admmDensestSubmatrix v0.1.0: Implements a method to identify the densest sub-matrix in a given or sampled binary matrix. See Bombina et al. (2019) for the technical details and the vignette for examples. mbend v1.2.3: Provides functions to “bend” non-positive-definite (symmetric) matrices to positive-definite matrices using weighted and unweighted methods.

Read more

IPO Exploration Part Two

In a previous post, we explored IPOs and IPO returns by sector and year since 2004. Today, let’s investigate how portfolios formed with those IPOs have performed. We will need to grab the price histories of the tickers, then form portfolios, then calculate their performance, and then rank those performances in some way. Since there are several hundred IPOs for which we need to pull returns data, today’s post will be a bit data intensive.

Read more

A comparison of methods for predicting clothing classes using the Fashion MNIST dataset in RStudio and Python (Part 1)

Florianne Verkroost is a PhD candidate at Nuffield College at the University of Oxford. With a passion for data science and a background in mathematics and econometrics, she applies her interdisciplinary knowledge to computationally address societal problems of inequality. In this series of blog posts, I will compare different machine and deep learning methods to predict clothing categories from images using the Fashion MNIST data. In this first blog of the series, we will explore and prepare the data for analysis.

Read more

A First Look at Confidence Distributions

Using a probability distribution to characterize uncertainty is at the core of statistical inference. So, it seems natural to try to summarize the information about the parameters in statistical models with probability distributions.

Read more

Sept 2019: "Top 40" New R Packages

One hundred and thirteen new packages made it to CRAN in September. Here are my “Top 40” picks in eight categories: Computational Methods, Data, Economics, Machine Learning, Statistics, Time Series, Utilities, and Visualization. Computational Methods eRTG3D v0.6.2: Provides functions to create realistic random trajectories in a 3-D space between two given fixed points (conditional empirical random walks), based on empirical distribution functions extracted from observed trajectories (training data), and thus reflect the geometrical movement characteristics of the mover.

Read more

IPO Exploration

Inspired by recent headlines like “Fear Overtakes Greed in IPO Market after WeWork Debacle” and “This Year’s IPO Class is Least Profitable since the Tech Bubble”, today we’ll explore historical IPO data, and next time we’ll look at the performance of IPO-driven portfolios constructed during the ten-year period from 2004 to 2014. I’ll admit, I’ve often wondered how a portfolio that allocated money to new IPOs each year might perform since this has to be an ultimate example of a few headline-gobbling whales dominating the collective consciousness.

Read more

Productionizing Shiny and Plumber with Pins

Producing an API that serves model results or a Shiny app that displays the results of an analysis requires a collection of intermediate datasets and model objects, all of which need to be saved. Depending on the project, they might need to be reused in another project later, shared with a colleague, used to shortcut computationally intensive steps, or safely stored for QA and auditing. Some of these should be saved in a data warehouse, data lake, or database, but write access to an appropriate database isn’t always available.

Read more

Building Interactive World Maps in Shiny

Florianne Verkroost is a PhD candidate at Nuffield College at the University of Oxford. With a passion for data science and a background in mathematics and econometrics, she applies her interdisciplinary knowledge to computationally address societal problems of inequality. In this post, I will show you how to create interactive world maps and how to present them in an R Shiny app. As the Shiny app cannot be embedded into this blog, I will direct you to the live app and show you, in this post on my GitHub, how to embed a Shiny app in your R Markdown files, which is a really cool and innovative way of preparing interactive documents.

Read more

Multiple Hypothesis Testing in R

In the first article of this series, we looked at understanding type I and type II errors in the context of an A/B test, and highlighted the issue of “peeking”. In the second, we illustrated a way to calculate always-valid p-values that were immune to peeking. We will now explore multiple hypothesis testing, or what happens when multiple tests are conducted on the same family of data. We will set things up as before, with the false positive rate (\alpha = 0.

Read more

August 2019: "Top 40" R packages

Two hundred and twenty-seven new packages made it to CRAN in August. Quite a few were devoted to medical or genomic applications, and this is reflected in my “Top 40” selections, listed below in nine categories: Computational Methods, Data, Genomics, Machine Learning, Medicine and Pharma, Statistics, Time Series, Utilities, and Visualization. Computational Methods fmcmc v0.2-0: Provides a flexible Markov Chain Monte Carlo (MCMC) framework for implementing Metropolis-Hastings algorithms. There is a vignette on user-defined kernels and another on workflows.

Read more
