CVXR: A Direct Standardization Example

In our first blog post, we introduced CVXR, an R package for disciplined convex optimization, and showed how to model and solve a non-negative least squares problem using its interface. This time, we will tackle a non-parametric estimation example, which features new atoms as well as more complex constraints. Direct Standardization: Consider a set of observations \((x_i, y_i)\) drawn non-uniformly from an unknown distribution. We know the expected value of the columns of \(X\), denoted by \(b \in \mathbf{R}^n\), and want to estimate the true distribution of \(y\).
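One common way to pose this kind of problem, given here only as a sketch and not necessarily the exact formulation used in the full post, is to choose observation weights with maximum entropy whose weighted column means match \(b\). The symbols \(m\) (number of observations) and \(w\) (the weight vector) are assumptions introduced for illustration.

```latex
\begin{array}{ll}
\underset{w \in \mathbf{R}^m}{\text{maximize}} & \displaystyle -\sum_{i=1}^m w_i \log w_i \\
\text{subject to} & w \geq 0, \quad \displaystyle \sum_{i=1}^m w_i = 1, \quad X^T w = b
\end{array}
```

Under this sketch, the estimated distribution of \(y\) places probability \(w_i\) on each observed \(y_i\).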

Monte Carlo Shiny: Part Three

In previous posts, we covered how to run a Monte Carlo simulation and how to visualize the results. Today, we will wrap that work into a Shiny app wherein a user can build a custom portfolio, and then choose a number of simulations to run and a number of months to simulate into the future. A link to that final Shiny app is here and here is a snapshot:

Solver Interfaces in CVXR

Introduction: In our previous blog post, we introduced CVXR, an R package for disciplined convex optimization. The package allows one to describe an optimization problem with Disciplined Convex Programming rules using high-level mathematical syntax. Passing this problem definition along (with a list of constraints, if any) to the solve function transforms it into a form that can be handed off to a solver. The default installation of CVXR comes with two (imported) open-source solvers:

A First Look at NIMBLE

Writing a domain-specific language (DSL) is a powerful and fairly common method for extending the R language. Both ggplot2 and dplyr, for example, are DSLs. (See Hadley’s chapter in Advanced R for some elaboration.) In this post, I take a first look at NIMBLE (Numerical Inference for Statistical Models using Bayesian and Likelihood Estimation), a DSL for formulating and efficiently solving statistical models in general, and Bayesian hierarchical models in particular.

May 2018: “Top 40” New Packages

While looking over the 215 or so new packages that made it to CRAN in May, I was delighted to find several packages devoted to subjects a little bit out of the ordinary; for instance, bioacoustics analyzes audio recordings, freegroup looks at some abstract mathematics, RQEntangle computes quantum entanglement, stemmatology analyzes textual musical traditions, and treedater estimates clock rates for evolutionary models. I take this as evidence that R is expanding beyond its traditional strongholds of statistics and finance as people in other fields with serious analytic and computational requirements become familiar with the language.

Reading and analysing log files in the RRD database format

I have frequent conversations with R champions and Systems Administrators responsible for R, in which they ask how they can measure and analyze the usage of their servers. Among the many solutions to this problem, one of my favourites is to use an RRD database and RRDtool. From Wikipedia: RRDtool (round-robin database tool) aims to handle time series data such as network bandwidth, temperatures or CPU load. The data is stored in a circular buffer based database, thus the system storage footprint remains constant over time.
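To make the circular-buffer idea concrete, here is a minimal sketch in base R. It is not how RRDtool itself is implemented; the helper names (`new_ring_buffer`, `ring_push`, `ring_contents`) and the sample CPU readings are hypothetical and exist only for illustration.

```r
# Illustrative ring buffer (hypothetical helpers, not part of RRDtool).
# Capacity is fixed, so storage stays constant however many values arrive.
new_ring_buffer <- function(capacity) {
  list(values = rep(NA_real_, capacity), next_slot = 1L, count = 0L)
}

ring_push <- function(buffer, value) {
  # Write into the current slot, overwriting the oldest value once full.
  buffer$values[buffer$next_slot] <- value
  # Advance the write position, wrapping back to slot 1 after the last slot.
  buffer$next_slot <- buffer$next_slot %% length(buffer$values) + 1L
  buffer$count <- min(buffer$count + 1L, length(buffer$values))
  buffer
}

ring_contents <- function(buffer) {
  # Return the stored values ordered from oldest to newest.
  capacity <- length(buffer$values)
  if (buffer$count < capacity) {
    return(buffer$values[seq_len(buffer$count)])
  }
  oldest <- buffer$next_slot
  buffer$values[c(oldest:capacity, seq_len(oldest - 1L))]
}

# Made-up CPU load readings pushed into a buffer that holds three values.
cpu_load <- new_ring_buffer(3)
for (reading in c(0.2, 0.5, 0.9, 0.4, 0.7)) {
  cpu_load <- ring_push(cpu_load, reading)
}
ring_contents(cpu_load)
```

After five pushes into a three-slot buffer, only the three most recent readings (0.9, 0.4, 0.7) remain, which is why the storage footprint does not grow over time.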

Player Data for the 2018 FIFA World Cup

The World Cup starts today! The tournament, which runs from June 14 through July 15, is probably the most popular sporting event in the world. If you are a soccer fan, you know that learning about the players and their teams and talking about it all with your friends greatly enhances the experience. In this post, I will show you how to gather and explore data for the 736 players from the 32 teams at the 2018 FIFA World Cup.

Monte Carlo Part Two

In a previous post, we reviewed how to set up and run a Monte Carlo (MC) simulation of future portfolio returns and growth of a dollar. Today, we will run that simulation many, many times and then visualize the results.

Monte Carlo

Today, we change gears from our previous work on Fama French and run a Monte Carlo (MC) simulation of future portfolio returns. Monte Carlo relies on repeated, random sampling. We will sample based on two parameters: mean and standard deviation of portfolio returns. Our long-term goal (long-term == over the next two or three blog posts) is to build a Shiny app that allows an end user to build a custom portfolio, simulate returns and visualize the results.
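As a rough sketch of the sampling idea, the base R snippet below simulates one path of monthly returns and the resulting growth of a dollar. It assumes returns are drawn from a normal distribution, which the excerpt does not state; the parameter values and the `simulate_growth` helper are hypothetical placeholders, not figures or code from the post.

```r
# Hypothetical inputs: assumed monthly mean and standard deviation of returns.
mean_return <- 0.006
sd_return   <- 0.03
n_months    <- 120

simulate_growth <- function(n_months, mean_return, sd_return) {
  # Assumption: monthly returns are independent draws from a normal distribution.
  monthly_returns <- rnorm(n_months, mean = mean_return, sd = sd_return)
  # Growth of a dollar: start at 1 and compound each simulated return.
  c(1, cumprod(1 + monthly_returns))
}

set.seed(123)  # fix the seed so the simulated path is reproducible
growth_path <- simulate_growth(n_months, mean_return, sd_return)
length(growth_path)
```

The path has one entry for the starting dollar plus one per simulated month. Repeating the call many times, for example with `replicate()`, would give a distribution of outcomes instead of a single path.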

Exploring R Packages with cranly

In a previous post, I showed a very simple example of using the R function tools::CRAN_package_db() to analyze information about CRAN packages. CRAN_package_db() extracts the metadata CRAN stores on all of its 12,000-plus packages and arranges it into a “database”, actually a complicated data frame in which some columns have vectors or lists as entries. It’s simple to run the function, and it doesn’t take very long on my MacBook Air.
