How to Avoid Publishing Credentials in Your Code

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When accessing an API or database in R, it is often necessary to provide credentials such as a login name and password, and you may find yourself being prompted for them. When writing an R script that requires a user to provide credentials, you will want a way to have the script prompt the user or, better yet, to provide the credentials programmatically.
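
As a minimal base-R sketch of the programmatic option (not necessarily the approach the full post recommends), a script can read each credential from an environment variable, which R can load at startup from a file such as ~/.Renviron, and fall back to a prompt only in interactive sessions. The helper get_credential() and the variable names DB_USER and DB_PASSWORD are hypothetical.

```r
# Minimal sketch: read a credential from an environment variable instead of
# hard-coding it in the script. get_credential() is a hypothetical helper.
get_credential <- function(var, prompt = paste0(var, ": ")) {
  value <- Sys.getenv(var, unset = "")
  if (nzchar(value)) {
    return(value)
  }
  if (interactive()) {
    # Fallback: ask the user. Note that readline() echoes what is typed.
    value <- readline(prompt)
    if (nzchar(value)) return(value)
  }
  stop("Credential '", var, "' is not set; define it in ~/.Renviron ",
       "or in the environment.", call. = FALSE)
}

# Usage (hypothetical variable names):
# user     <- get_credential("DB_USER")
# password <- get_credential("DB_PASSWORD")
```

Keeping the file that defines these variables out of version control keeps the password out of the published code; because readline() does not mask input, the prompt is only a fallback.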

The reticulate package solves the hardest problem in data science: people

Andrew Mangano is the Director of eCommerce Analytics at Albertsons Companies.

Part I - Modelling

The reticulate package integrates Python within R and, when used with RStudio 1.2, brings the two languages together like never before. Much more important than the technical details of how it all works is the impact it has on both individuals and teams: it enables data scientists who speak different languages to collaborate seamlessly on a project.

Parsnipping Fama French

Today, we will continue our exploration of developments in the world of tidy models, and we will stick with our usual Fama French modeling flow to do so. New readers who want to get familiar with Fama French before diving into this post can see here, where we covered importing and wrangling the data; here, where we covered rolling models and visualization; and here, where we covered managing many models.
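
For readers new to the model, the textbook Fama French three-factor regression for an asset or portfolio i is written out below. We assume this standard specification here; the series' own modeling flow may use a different set of factors. R_{i,t} is the return in period t, R_{f,t} the risk-free rate, R_{M,t} - R_{f,t} the market excess return, SMB the size factor (small minus big), and HML the value factor (high minus low book-to-market).

```latex
R_{i,t} - R_{f,t} = \alpha_i
  + \beta_i \,(R_{M,t} - R_{f,t})
  + s_i \,\mathrm{SMB}_t
  + h_i \,\mathrm{HML}_t
  + \varepsilon_{i,t}
```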

Paid in Books: An Interview with Christian Westergaard

R is greatly benefiting from new users coming from disciplines that have not traditionally demanded much serious computation. Journalists [1] and humanist scholars [2], for example, are embracing R. But does the avenue from the Humanities go both ways? In a recent conversation with Christian Westergaard, proprietor of Sophia Rare Books in Copenhagen, I was delighted to learn that it does.

JBR: I was very pleased to learn, when I spoke with you recently at the California Antiquarian Book Fair, that you were an S and S+ user in graduate school.

Graph analysis using the tidyverse

A walk-through of how to use the tidyverse, along with tidygraph and ggraph, to easily analyze graph data.

Some R Packages for ROC Curves

In a recent post, I presented some of the theory underlying ROC curves, and outlined the history leading up to their present popularity for characterizing the performance of machine learning models. In this post, I describe how to search CRAN for packages to plot ROC curves, and highlight six useful packages. Although I began with a few ideas about packages that I wanted to talk about, like ROCR and pROC, which I have found useful in the past, I decided to use Gábor Csárdi’s relatively new package pkgsearch to search through CRAN and see what’s out there.
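
To make concrete what these packages compute, here is a minimal base-R sketch, not taken from ROCR, pROC, or any other package in the post, that builds ROC points by sweeping a threshold down through the sorted scores and estimates the area under the curve with the trapezoidal rule. The function names roc_points() and auc_trapezoid() are hypothetical, and the scores in the example are made up.

```r
# Build ROC curve points (false positive rate, true positive rate).
# labels: 1 = positive class, 0 = negative class; higher scores = more positive.
roc_points <- function(scores, labels) {
  ord <- order(scores, decreasing = TRUE)
  s <- scores[ord]
  y <- labels[ord]
  # Lowering the threshold past each observation adds it to the predicted positives.
  tpr <- cumsum(y == 1) / sum(y == 1)
  fpr <- cumsum(y == 0) / sum(y == 0)
  # With tied scores, only the point after the last member of each tie is reachable.
  keep <- c(diff(s) != 0, TRUE)
  data.frame(fpr = c(0, fpr[keep]), tpr = c(0, tpr[keep]))
}

# Area under the ROC curve via the trapezoidal rule.
auc_trapezoid <- function(roc) {
  n <- nrow(roc)
  sum(diff(roc$fpr) * (roc$tpr[-1] + roc$tpr[-n]) / 2)
}

# Example with made-up scores:
roc <- roc_points(scores = c(0.9, 0.8, 0.7, 0.6), labels = c(1, 0, 1, 0))
auc_trapezoid(roc)  # 0.75
```

The packages highlighted in the post perform this basic computation for you and add plotting and further options on top of it.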

January 2019: “Top 40” New CRAN Packages

One hundred and fifty-three new packages made it to CRAN in January. Here are my “Top 40” picks in eight categories: Computational Methods, Data, Machine Learning, Medicine, Science, Statistics, Utilities, and Visualization.

Computational Methods

cPCG v1.0: Provides a function to solve systems of linear equations using a (preconditioned) conjugate gradient algorithm. The vignette shows how to use the package.

RcppDynProg v0.1.1: Implements dynamic programming using Rcpp. Look here for examples.
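
For readers unfamiliar with the method behind cPCG, here is a minimal base-R sketch of the plain (unpreconditioned) conjugate gradient algorithm for a symmetric positive definite system Ax = b. It illustrates the technique only and is not cPCG's interface; conjugate_gradient() and its arguments are hypothetical.

```r
# Minimal sketch of the conjugate gradient method for A x = b,
# assuming A is symmetric positive definite.
conjugate_gradient <- function(A, b, x0 = rep(0, length(b)),
                               tol = 1e-10, max_iter = length(b) * 10) {
  x <- x0
  r <- b - as.vector(A %*% x)   # initial residual
  p <- r                        # initial search direction
  rs_old <- sum(r * r)
  for (k in seq_len(max_iter)) {
    if (sqrt(rs_old) < tol) break
    Ap <- as.vector(A %*% p)
    alpha <- rs_old / sum(p * Ap)   # step length along p
    x <- x + alpha * p
    r <- r - alpha * Ap
    rs_new <- sum(r * r)
    p <- r + (rs_new / rs_old) * p  # next A-conjugate search direction
    rs_old <- rs_new
  }
  x
}

A <- matrix(c(4, 1, 1, 3), nrow = 2)
b <- c(1, 2)
conjugate_gradient(A, b)  # same as solve(A, b): 1/11 and 7/11
```

A preconditioned variant, as the package description mentions, applies an approximate inverse of A to the residual at each step to speed up convergence.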

A Few New R Books

Greg Wilson is a data scientist and professional educator at RStudio. As a newcomer to R who prefers to read paper rather than pixels, I’ve been working my way through a more-or-less random selection of relevant books over the past few months. Some have discussed topics that I’m already familiar with in the context of R, while others have introduced me to entirely new subjects. This post describes four of them in brief; I hope to follow up with a second post in a few months as I work through the backlog on my desk.

A Look Back on 2018: Part 2

Welcome to the second installment of Reproducible Finance 2019! In the previous post, we looked back on the daily returns for several market sectors in 2018. Today, we’ll continue that theme and look at some summary statistics for 2018, and then extend out to previous years and different ways of visualizing our data.

R for Quantitative Health Sciences: An Interview with Jarrod Dalton

This interview came about through researching R-based medical applications in preparation for the upcoming R/Medicine conference. When we discovered the impressive number of Shiny-based Risk Calculators developed by the Cleveland Clinic and deployed on public-facing sites, we wanted to learn more about the influence of the R language on the development of statistical science at this prominent institution. We were fortunate to have Jarrod Dalton of the Quantitative Health Sciences Department grant this interview.