Solving Data Science's First Mile Problem

At data.world, we are out to solve the “first mile problem of data science”: helping people obtain and understand the data sets they need. Everybody starts here, whether they are analyzing their fantasy football league or working on the Zika pandemic. Unfortunately, as Professor Eric Schwartz at the University of Michigan said to me recently, “for so many people, the first mile of data science is the last mile, because people just quit when they see the data.”

Organizing DataFest the tidy way

Organizing an event can be a full-time task in and of itself. I have been organizing ASA DataFest at Duke for six years, and over this time the number of participants has grown from 23 students, all from Duke, to 360 students from seven area schools this year! First, a bit about ASA DataFest: it is a data “hackathon” for students around the U.S., Canada, and Germany (for now; this list has been growing each year).

March '17 Tips and Tricks

This month’s Tips and Tricks focus is on file navigation. Many of these tips are straightforward (more tip than trick), but they can save quite a bit of time and frustration.

Open Recent

Please don’t spend time navigating through your folder structure to find a recent project or file. Instead, open it from the list of recent files or projects.

Go to File/Function

RStudio automatically indexes all open files as well as files in the current project. The index includes function calls and objects within the files.

R and Singularity

R (https://www.r-project.org) is a premier system for statistical and scientific computing and data science. At its core, R is a very carefully curated high-level interface to low-level numerical libraries. True to this principle, R packages have greatly expanded the scope and number of these interfaces over the years, among them interfaces to a large number of distributed and parallel computing tools. Despite this impressive breadth of sophisticated high-performance computing (HPC) tools, R is not widely used for “big” problems.

Some Random Weekend Reading

Few of us have enough time to read, and most of us already have depressingly deep stacks of material that we would like to get through. However, sometimes a random encounter with something interesting is all it takes to rekindle enthusiasm. Just in case you are not going to get to a bookstore with a good technical section this weekend, here are a few not-quite-random reads.

February 2017 New Package Picks

One hundred and forty-five new packages were added to CRAN in February. Here are 47 interesting packages organized into five categories: Biostatistics, Data, Data Science, Statistics, and Utilities.

Biostatistics

BaTFLED3D v0.1.7: Implements a machine learning algorithm to make predictions and determine interactions in data that vary along three independent modes. It was developed to predict the growth of cell lines when treated with drugs at different doses. The vignette shows an example with simulated data.

Quandl and Forecasting

A Notebook for importing oil prices from Quandl, making a 6-month forecast, and visualizing the results.

Why I love R Notebooks

Note: R Notebooks require RStudio Version 1.0 or later.

I’m a big fan of the R console. During my early years with R, that’s all I had, so I got very comfortable with pasting my code into the console. Since then I’ve used many code editors for R, but they all followed the same paradigm: script in one window and output in another. Notebooks, on the other hand, combine code, output, and narrative into a single document.

Madrid R User Group, A Brief History

(Editor’s note: A Spanish version of the post follows the English text.) At our first meeting there were five of us; now we consistently have more than 60. It was not difficult for us to start the Madrid R User Group. Gregorio Serrano, Carlos Gil Bellosta, Pedro Concejero and I started our own help list, R-help-es, to regularly answer questions within this new community. It was in March 2012 that we first got together, in a classroom of the Faculty of Economics at one of Madrid’s public universities.

Interactive Maps and ETF Analysis

In this post, I’ll describe a Shiny app to support the Emerging Markets ETF Country Exposure analysis developed in a previous post. I have done some additional work and updated the analysis to include five ETFs in the app, whereas we originally imported data on one ETF. The new notebook is available here. As we start to build our Shiny app, we will assume that our underlying Notebook has been used to build the desired ETF data objects and spatial data frame, and that those have been saved in an appropriately named .
