Deep learning at rstudio::conf 2018

Two weeks ago, rstudio::conf 2018 was held in San Diego. We had 1,100 people attend the sold-out event. In this post, I summarize my experience of the talks on the topic of deep learning with R, including the keynote by J.J. Allaire.

Keynote

The keynote on the second day was J.J. Allaire discussing “Machine Learning with TensorFlow and R”. In this talk, J.J. took us on a tour of how to use TensorFlow with R.

Calculating Beta in the Capital Asset Pricing Model

Today we will continue our portfolio fun by calculating the CAPM beta of our portfolio returns. That will entail fitting a linear model and, when we get to visualization next time, considering the meaning of our results from the perspective of asset returns. By way of brief background, the Capital Asset Pricing Model (CAPM) is a model, created by William Sharpe, that estimates the return of an asset based on the return of the market and the asset’s linear relationship to the return of the market.
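As a rough illustration of the linear relationship described above, here is a minimal Python sketch that computes beta as the covariance of portfolio and market returns divided by the variance of market returns, which is the slope of a simple linear regression of one on the other. The function name `capm_beta` and the return series are hypothetical, made up for this example; they are not data or code from the post.

```python
from statistics import mean


def capm_beta(asset_returns, market_returns):
    """Slope of asset returns regressed on market returns (cov / var)."""
    if len(asset_returns) != len(market_returns):
        raise ValueError("return series must have the same length")
    mean_asset = mean(asset_returns)
    mean_market = mean(market_returns)
    # Sums of cross-products and squares; the 1/n factors cancel in the ratio.
    covariance = sum(
        (a - mean_asset) * (m - mean_market)
        for a, m in zip(asset_returns, market_returns)
    )
    market_variance = sum((m - mean_market) ** 2 for m in market_returns)
    if market_variance == 0:
        raise ValueError("market returns have zero variance")
    return covariance / market_variance


# Hypothetical monthly returns, constructed so the asset moves 1.5x the market.
market = [0.01, -0.02, 0.03, 0.015, -0.005]
asset = [0.001 + 1.5 * m for m in market]

beta = capm_beta(asset, market)
print(round(beta, 4))
```

A beta above 1 means the portfolio has tended to move more than the market; a beta below 1 means it has tended to move less.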

Cost-Effective BigQuery with R

Introduction

Companies using Google BigQuery for production analytics often run into the following problem: the company has a large user hit table that spans many years. Since queries are billed based on the fields accessed, and not on the date ranges queried, queries on the table are billed for all available days and are increasingly wasteful. A solution is to partition the table by date, so that users can query a particular range of dates, saving costs and decreasing query duration.

Dec 2017: "Top 40" New Package Picks

Sometimes it appears to me that the invisible hand economists speak of guides the market for new R packages. Eight of the 129 new packages that stuck to CRAN in December fall under Computational Methods, a category I have only recently begun using. All of them made it into the list below of my “Top 40” picks. One day, I would like to go back and reexamine the categories I have been using to see if package developers really do respond to some idea that is “in the air” or whether the variation in categories is just one more of my many hidden biases.

Package Management for Reproducible R Code

Any programming environment should be optimized for its task, and not all tasks are alike. For example, if you are exploring uncharted mountain ranges, the portability of a tent is essential. However, when building a house to weather hurricanes, investing in a strong foundation is important. Similarly, when beginning a new data science programming project, it is prudent to assess how much effort should be put into ensuring the code is reproducible.

Fitting a TensorFlow Linear Classifier with tfestimators

In a recent post, I mentioned three avenues for working with TensorFlow from R:

* The keras package, which uses the Keras API for building scalable deep learning models
* The tfestimators package, which wraps Google’s Estimators API for fitting models with pre-built estimators
* The tensorflow package, which provides an interface to Google’s low-level TensorFlow API

In this post, Edgar and I use the linear_classifier() function, one of six pre-built models currently in the tfestimators package, to train a linear classifier using data from the titanic package.

Introduction to Kurtosis

Happy 2018 and welcome to our first reproducible finance post of the year! What better way to ring in a new beginning than pondering, calculating, and visualizing returns distributions? We ended 2017 by tackling skewness, and we will begin 2018 by tackling kurtosis.
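For readers who want a concrete picture of what is being calculated, here is a minimal Python sketch of excess kurtosis using population moments: the fourth central moment divided by the squared variance, minus 3. This is one common definition and an assumption on my part; the post itself may use a different estimator. The function name `excess_kurtosis` and the sample values are hypothetical.

```python
def excess_kurtosis(returns):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")
    mu = sum(returns) / n
    # Second and fourth central moments, each divided by n (population form).
    m2 = sum((r - mu) ** 2 for r in returns) / n
    m4 = sum((r - mu) ** 4 for r in returns) / n
    if m2 == 0:
        raise ValueError("returns have zero variance")
    return m4 / m2 ** 2 - 3.0


# Hypothetical returns: a two-point distribution has very thin tails,
# so its excess kurtosis is negative.
sample = [-1.0, 1.0, -1.0, 1.0]
print(excess_kurtosis(sample))
```

Positive values indicate fatter tails than a normal distribution, and negative values indicate thinner tails.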

Downtime Reading

Not everyone has the luxury of taking some downtime at the end of the year, but if you do have some free time, you may enjoy something on my short list of downtime reading. The books and articles here are not exactly “light reading”, nor are they literature for cuddling by the fire. Nevertheless, you may find something that catches your eye. The Syncfusion series of free eBooks contains more than a few gems on a variety of programming subjects, including James McCaffrey’s R Programming Succinctly and Barton Poulson’s R Succinctly.

Nov 2017: New Package Picks

Two hundred thirty-seven new packages made it to CRAN in November. Here are my picks for the “Top 40” organized into the categories: Computational Methods, Data, Data Science, Science, Social Science, Utilities and Visualizations.

Computational Methods

CVXR v0.94-4: Implements an object-oriented modeling language for disciplined convex programming (DCP) which allows users to formulate and solve convex optimization problems. The vignette introduces the package. Look here for examples and theory.

PreciseSums v0.

A Data Science Lab for R

In a previous post, I described the role of analytic administrator as a data scientist who onboards new tools, deploys solutions, supports existing standards, and trains other data scientists. In this post, I will describe how someone in that role might set up a data science lab for R.

Architecture

A data science lab is an environment for developing code and creating content. It should enhance the productivity of your data scientists and integrate with your existing systems.
