Exploratory Functional PCA with Sparse Data

In this post, I pick up where my June 10th post left off and look at how one might explore a sparse, longitudinal data set with the FPCA tools provided in the fdapace package. I begin by highlighting some of the really nice tools available in the brolgar package for doing exploratory longitudinal data analysis, and then explore what FPCA might contribute to an exploratory analysis. Instead of using artificial data as I did in my previous three posts, I take advantage of the work done by the brolgar authors and use the wages data set that they feature in several of their vignettes.


May 2021: "Top 40" New CRAN Packages

Two hundred five packages made it to CRAN in May, but at least seven were removed before this post went to print. Here are my “Top 40” picks in ten categories: Computational Methods, Data, Genomics, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities, and Visualization.


Summer Conferences!

Summer is here, but it is not too late to sign up for some summer conferences. The following short list promises interesting speakers, a wide range of topics, and plenty of R content.


Functional PCA with R

In two previous posts, Introduction to Functional Data Analysis with R and Basic FDA Descriptive Statistics with R, I began looking into FDA from a beginner's perspective. In this post, I would like to continue where I left off and investigate Functional Principal Components Analysis (FPCA), the analog of ordinary Principal Components Analysis in multivariate statistics. I begin with the math, and then show how to compute FPCs with R.


R for Public Health

The COVID-19 pandemic has raised the profile of public health workers at all levels, from the nurses and doctors working on the front lines at our hospitals to high-level state and federal public health officials. I think it's a good bet that eighteen months ago few of us had any clear idea about how the public health care system works, or thought much about the people charged with the awesome responsibility of keeping us safe.


April 2021: "Top 40" New CRAN Packages

One hundred seventy-nine new packages made it to CRAN in April. Here are my “Top 40” picks in twelve categories: Computational Methods, Data, Genomics, Machine Learning, Mathematics, Medicine, Networks, Operations Research, Statistics, Time Series, Utilities, and Visualization.


Basic FDA Descriptive Statistics with R

In a previous post, I introduced the topic of Functional Data Analysis (FDA). In that post, I provided some background on Functional Analysis, the mathematical theory that makes FDA possible, identified FDA resources that might be of interest to R users, and showed how to turn a series of data points into an FDA object. In this post, I will pick up where I left off and move on to doing some very basic FDA descriptive statistics.


Introduction to Functional Data Analysis with R

This post is meant to be a “gentle” introduction to doing Functional Data Analysis (FDA) with R for someone who is totally new to the subject. I will show some “first steps” code, but most of the post will be about providing background and motivation for looking into FDA. I will also point out some of the available resources that a newcomer to FDA should find helpful.


March 2021: "Top 40" New CRAN Packages

Here are my Top 40 new CRAN packages for March 2021 in twelve categories: Computational Methods, Data, Engineering, Genomics, Machine Learning, Medicine, Music, Networks, Science, Statistics, Utility, and Visualization. Two of these categories, Engineering and Music, have only one entry each. However, I decided to give them their own categories in order to draw attention to the use of R outside of the mainstream, and because I have always lamented the fate of the Miscellaneous. In the same spirit, note that the complete works of the Bard appear among the packages in the Data category, and that, due to tidypaleo, Paleoenvironmental is now a thing in R.


An Alternative to the Correlation Coefficient That Works For Numeric and Categorical Variables

When starting to work with a new dataset, it is useful to quickly pinpoint which pairs of variables appear to be strongly related. Doing so helps you spot data issues, make better modeling decisions, and ultimately arrive at better answers. The correlation coefficient is widely used for this purpose, but it is well known that it cannot detect non-linear relationships. In this post, I suggest an alternative statistic, based on the idea of mutual information, that works for both continuous and categorical variables and can detect both linear and non-linear relationships.
