In-Database Logistic Regression with R

Roland Stevenson is a data scientist and consultant who may be reached on Linkedin. In a previous article we illustrated how to calculate xgboost model predictions in-database. This was referenced and incorporated into tidypredict. After learning more about what the tidypredict team is up to, I discovered another tidyverse package called modeldb that fits models in-database. It currently supports linear regression and k-means clustering, so I thought I would provide an example of how to do in-database logistic regression.

Read more

Share Comments · · ·

Introducing sortable to add drag-and-drop to your shiny apps

You can use the sortable package to add drag-and-drop behaviour to shiny apps.

Read more

Share Comments · · · · · ·

October 2019: "Top 40" New R Packages

Two Hundred twenty-three new packages made it to CRAN in October. Here are my “Top 40” picks in ten categories: Computational Methods, Data, Genomics, Machine Learning, Mathematics, Medicine, Pharmacology, Statistics, Utilities, and Visualization. Computational Methods admmDensestSubmatrix v0.1.0: Implements a method to identify the densest sub-matrix in a given or sampled binary matrix. See Bombina et al. (2019) for the technical details and the vignette for examples. mbend v1.2.3: Provides functions to “bend”” non-positive-definite (symmetric) matrices to positive-definite matrices using weighted and unweighted methods.

Read more

Share Comments · · ·

IPO Exploration Part Two

In a previous post, we explored IPOs and IPO returns by sector and year since 2004. Today, let’s investigate how portfolios formed with those IPOs have performed. We will need to grab the price histories of the tickers, then form portfolios, then calculate their performance, and then rank those performances in some way. Since there are several hundred IPOs for which we need to pull returns data, today’s post will be a bit data intensive.

Read more

Share Comments · · · ·

A comparison of methods for predicting clothing classes using the Fashion MNIST dataset in RStudio and Python (Part 1)

Florianne Verkroost is a PhD candidate at Nuffield College at the University of Oxford. With a passion for data science and a background in mathematics and econometrics. She applies her interdisciplinary knowledge to computationally address societal problems of inequality. In this series of blog posts, I will compare different machine and deep learning methods to predict clothing categories from images using the Fashion MNIST data. In this first blog of the series, we will explore and prepare the data for analysis.

Read more

Share Comments · · · · · ·

A First Look at Confidence Distributions

Using a probability distribution to characterize uncertainty is at the core of statistical inference. So, it seems natural to try to summarize the information about the parameters in statistical models with probability distributions.

Read more

Share Comments · · · · · ·

Sept 2019: "Top 40" New R Packages

One hundred and thirteen new packages made it to CRAN in September. Here are my “Top 40” picks in eight categories: Computational Methods, Data, Economics, Machine Learning, Statistics, Time Series, Utilities, and Visualization. Computational Methods eRTG3D v0.6.2: Provides functions to create realistic random trajectories in a 3-D space between two given fixed points (conditional empirical random walks), based on empirical distribution functions extracted from observed trajectories (training data), and thus reflect the geometrical movement characteristics of the mover.

Read more

Share Comments · · ·

IPO Exploration

Inspired by recent headlines like “Fear Overtakes Greed in IPO Market after WeWork Debacle” and “This Year’s IPO Class is Least Profitable since the Tech Bubble”, today we’ll explore historical IPO data, and next time we’ll look at the the performance of IPO-driven portfolios constructed during the ten-year period from 2004 to 2014. I’ll admit, I’ve often wondered how a portfolio that allocated money to new IPOs each year might perform since this has to be an ultimate example of a few headline-gobbling whales dominating the collective consciousness.

Read more

Share Comments · · · ·

Productionizing Shiny and Plumber with Pins

Producing an API that serves model results or a Shiny app that displays the results of an analysis requires a collection of intermediate datasets and model objects, all of which need to be saved. Depending on the project, they might need to be reused in another project later, shared with a colleague, used to shortcut computationally intensive steps, or safely stored for QA and auditing. Some of these should be saved in a data warehouse, data lake, or database, but write access to an appropriate database isn’t always available.

Read more

Share Comments · · · · · · ·

Building Interactive World Maps in Shiny

Florianne Verkroost is a PhD candidate at Nuffield College at the University of Oxford. With a passion for data science and a background in mathematics and econometrics. She applies her interdisciplinary knowledge to computationally address societal problems of inequality. In this post, I will show you how to create interactive world maps and how to show these in the form of an R Shiny app. As the Shiny app cannot be embedded into this blog, I will direct you to the live app and show you in this post on my GitHub how to embed a Shiny app in your R Markdown files, which is a really cool and innovative way of preparing interactive documents.

Read more

Share Comments · · · · ·