Statistics in Glaucoma: Part II

This blog post is the second installment of a three-part series that introduces the role of statistical methods in glaucoma disease management and the importance of software in glaucoma research. Part I provides an introduction to glaucoma and the use of visual fields for diagnostic purposes. Part II presents a case study that applies a novel Bayesian method to learn about glaucoma progression, and discusses its clinical use. Finally, Part III discusses future directions for statistics in glaucoma and the importance of accessible software for use in clinical practice.

Read more

Statistics in Glaucoma: Part I

This blog post is the first installment of a three-part series that introduces the role of statistical methods in glaucoma disease management and the importance of software in glaucoma research. Part I provides an introduction to glaucoma and the use of visual fields for diagnostic purposes. Part II will present a case study that applies a novel Bayesian method to learn about glaucoma progression, and will discuss its clinical use. Finally, Part III will discuss future directions for statistics in glaucoma and the importance of accessible software for use in clinical practice.

Read more

October 2018: “Top 40” New Packages

One hundred eighty-five new packages made it to CRAN in October. Here are my picks for the “Top 40” in eight categories: Computational Methods, Data, Machine Learning, Medicine, Science, Statistics, Utilities, and Visualization.

Computational Methods

compboost v0.1.0: Provides a C++ implementation of component-wise boosting, written to obtain high run-time performance and full memory control. The vignette shows how to use the package.

RcppEnsmallen v0.1.10.0.1: Implements an interface to the C++-based Ensmallen mathematical optimization library, which provides a simple set of abstractions for writing an objective function to optimize.
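For readers new to component-wise boosting, the idea behind compboost can be sketched in a few lines. This is a minimal Python sketch of component-wise L2 boosting with one simple linear base learner per feature, not compboost's API. It assumes centered feature columns with no all-zero column, and the function name `componentwise_l2_boost` is hypothetical. At each iteration every feature is fit to the current residuals on its own, only the best one is updated, and the update is damped by a learning rate:

```python
import numpy as np


def componentwise_l2_boost(X, y, n_iter=1000, learning_rate=0.1):
    """Component-wise L2 boosting with one simple linear base learner per feature.

    Assumes the columns of X are centered and none is all zeros.
    Returns (offset, coef) so that predictions are offset + X @ coef.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = y.mean()              # start from the constant (mean) model
    coef = np.zeros(X.shape[1])
    resid = y - offset
    sq_norms = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        # Least-squares slope of each single feature on the current residuals
        betas = X.T @ resid / sq_norms
        # Fitting feature j removes beta_j^2 * ||x_j||^2 from the residual sum of squares
        j = np.argmax(betas ** 2 * sq_norms)
        # Take a damped step on the best feature only
        step = learning_rate * betas[j]
        coef[j] += step
        resid -= step * X[:, j]
    return offset, coef


# Hypothetical example: only features 0 and 2 actually matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2]
offset, coef = componentwise_l2_boost(X, y, n_iter=3000)
print(round(offset, 3), np.round(coef, 3))
```

Because only one coefficient moves per iteration, stopping early leaves uninformative features at or near zero, which is why the number of iterations acts as the main tuning parameter in this family of methods.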

Read more

Slack and Plumber, Part Two

This is the final entry in a three-part series about the plumber package. The first post introduces plumber as an R package for building REST API endpoints in R. The second post builds a working example of a plumber API that powers a Slack slash command. In this final entry, we will secure the API created in the previous post so that it only responds to authenticated requests, and deploy it using RStudio Connect.
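The excerpt does not say how the API authenticates requests, and the post itself works in R with plumber. As one hedged illustration, Slack documents a request-signing scheme for slash commands: the app computes an HMAC-SHA256 over `v0:<timestamp>:<raw body>` with its signing secret and compares the result with the `X-Slack-Signature` header. The Python sketch below shows that check. The function name `verify_slack_signature` and the five-minute freshness window are assumptions, and the post may use a different mechanism:

```python
import hashlib
import hmac
import time


def verify_slack_signature(signing_secret, timestamp, body, signature, now=None, max_age=300):
    """Check a request against Slack's v0 request-signing scheme.

    timestamp: value of the X-Slack-Request-Timestamp header (string)
    body:      raw request body exactly as received (string)
    signature: value of the X-Slack-Signature header, e.g. "v0=ab12..."
    """
    now = time.time() if now is None else now
    try:
        ts = int(timestamp)
    except (TypeError, ValueError):
        return False
    # Reject stale requests to limit replay attacks (five minutes here)
    if abs(now - ts) > max_age:
        return False
    base = f"v0:{timestamp}:{body}".encode("utf-8")
    digest = hmac.new(signing_secret.encode("utf-8"), base, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest("v0=" + digest, signature)


# Hypothetical example values; a real secret comes from the Slack app settings
secret = "hypothetical-signing-secret"
ts = "1540000000"
body = "command=%2Fhello&text=world"
sig = "v0=" + hmac.new(secret.encode("utf-8"), f"v0:{ts}:{body}".encode("utf-8"),
                       hashlib.sha256).hexdigest()
print(verify_slack_signature(secret, ts, body, sig, now=1540000010))
```

In a plumber API the same logic would typically live in a filter that runs before the slash-command endpoint, so unsigned or tampered requests never reach it.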

Read more

Many Factor Models

Today, we will return to the Fama French (FF) model of asset returns and use it as a proxy for fitting and evaluating multiple linear models. In a previous post, we reviewed how to run the FF three-factor model on the returns of a portfolio. That is, we ran one model on one set of returns. Today, we will run multiple models on multiple streams of returns, which will allow us to compare those models and, hopefully, build a code scaffold that can be used when we wish to explore other factor models.
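The "many models on many return streams" idea reduces to a loop over (stream, model) pairs, each of which is an ordinary least-squares regression of excess returns on an intercept plus that model's factors. The post works in R; what follows is only a rough Python sketch, and the function name `fit_factor_models` and the synthetic data are hypothetical. With real data, the Fama French factors are usually published in percent, and portfolio returns should be taken in excess of the risk-free rate before fitting:

```python
import numpy as np


def fit_factor_models(returns, factors, models):
    """Fit every factor model to every return stream by ordinary least squares.

    returns: dict mapping stream name -> 1-D array of excess returns
    factors: dict mapping factor name -> 1-D array (same length as returns)
    models:  dict mapping model name -> list of factor names to include
    Returns a dict keyed by (stream, model) with the coefficients
    (intercept first, then factors in the listed order) and the R-squared.
    """
    results = {}
    for stream, y in returns.items():
        y = np.asarray(y, dtype=float)
        for model, names in models.items():
            # Design matrix: a column of ones for the intercept (alpha), then the factors
            X = np.column_stack(
                [np.ones_like(y)] + [np.asarray(factors[n], dtype=float) for n in names]
            )
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ coef
            r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
            results[(stream, model)] = {"coef": coef, "r2": r2}
    return results


# Hypothetical example: synthetic factors and one noise-free portfolio built from them
rng = np.random.default_rng(42)
n = 120  # e.g. ten years of monthly observations
factors = {name: rng.normal(0.0, 0.04, n) for name in ("Mkt-RF", "SMB", "HML")}
portfolio = 0.001 + 1.2 * factors["Mkt-RF"] + 0.3 * factors["SMB"] - 0.2 * factors["HML"]
models = {"CAPM": ["Mkt-RF"], "FF3": ["Mkt-RF", "SMB", "HML"]}

results = fit_factor_models({"portfolio": portfolio}, factors, models)
for (stream, model), fit in results.items():
    print(stream, model, np.round(fit["coef"], 4), round(fit["r2"], 4))
```

Keying results by (stream, model) keeps the comparison step simple: adding a new factor model or another portfolio only means adding a dictionary entry, not new fitting code.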

Read more

A Mathematician's Perspective on Topological Data Analysis and R

A few years ago, when I first became aware of Topological Data Analysis (TDA), I was really excited by the possibility that the elegant theorems of Algebraic Topology could provide some new insights into the practical problems of data analysis. But time has passed, and the sober assessment of Larry Wasserman seems to describe where things stand. TDA is an exciting area and is full of interesting ideas. But so far, it has had little impact on data analysis.

Read more

In-database xgboost predictions with R

Moving predictive machine learning algorithms into large-scale production environments can present many challenges. For example, problems arise when attempting to calculate prediction probabilities (“scores”) for many thousands of subjects using many thousands of features located on remote databases. xgboost, a popular algorithm for classification and regression and the model of choice in many winning Kaggle competitions, is no exception: to score with xgboost, the subject-features matrix must first be loaded into memory, a cumbersome and expensive process.

Read more

Communicating results with R Markdown

In my training as a consultant, I learned that long hours of analysis were typically followed by equally long hours of preparing for presentations. I had to turn my complex analyses into recommendations, and my success as a consultant depended on my ability to influence decision makers. I used a variety of tools to convey my insights, but over time I increasingly came to rely on R Markdown as my tool of choice.

Read more

Reproducible Finance, the book! And a discount for our readers

I’m thrilled to announce the release of my new book Reproducible Finance with R: Code Flows and Shiny Apps for Portfolio Analysis, which originated as a series of R Views posts in this space. The first post was written way back in November 2016. Thanks to all the readers who have supported us along the way! If you are familiar with the R Views posts, then you probably have a pretty good sense of the book’s style, prose, and code approach, but I’d like to add a few quick words of background.

Read more

CRAN’s New Missing Data Task View

It is a relatively rare event, and cause for celebration, when CRAN gets a new Task View. This week the r-miss-tastic team (Julie Josse, Nicholas Tierney, and Nathalie Vialaneix) launched the Missing Data Task View. Even though I did some research on R packages for a post on missing values a couple of years ago, I was dumbfounded by the number of packages included in the new Task View. This single page not only describes what R has to offer for coping with missing data; it is probably the world’s most complete index of statistical knowledge on the subject.

Read more