Functional PCA with R

In two previous posts, Introduction to Functional Data Analysis with R and Basic FDA Descriptive Statistics with R, I began looking into FDA from a beginner's perspective. In this post, I would like to continue where I left off and investigate Functional Principal Components Analysis (FPCA), the analog of ordinary Principal Components Analysis in multivariate statistics. I begin with the math, and then show how to compute FPCs with R.

R for Public Health

The COVID-19 pandemic has raised the profile of public health workers at all levels, from the nurses and doctors working on the front lines at our hospitals to high-level state and federal public health officials. I think it's a good bet that eighteen months ago few of us had any clear idea about how the public health care system works, or thought much about the people charged with the awesome responsibility of keeping us safe.

April 2021: "Top 40" New CRAN Packages

One hundred seventy-nine new packages made it to CRAN in April. Here are my “Top 40” picks in twelve categories: Computational Methods, Data, Genomics, Machine Learning, Mathematics, Medicine, Networks, Operations Research, Statistics, Time Series, Utilities, and Visualization.

Basic FDA Descriptive Statistics with R

In a previous post, I introduced the topic of Functional Data Analysis (FDA). In that post, I provided some background on Functional Analysis, the mathematical theory that makes FDA possible, identified FDA resources that might be of interest to R users, and showed how to turn a series of data points into an FDA object. In this post, I will pick up where I left off and move on to doing some very basic FDA descriptive statistics.

Introduction to Functional Data Analysis with R

This post is meant to be a “gentle” introduction to doing Functional Data Analysis (FDA) with R for someone who is totally new to the subject. I will show some “first steps” code, but most of the post will be about providing background and motivation for looking into FDA. I will also point out some of the available resources that a newcomer to FDA should find helpful.

March 2021: "Top 40" New CRAN Packages

Here are my Top 40 new CRAN packages for March 2021 in twelve categories: Computational Methods, Data, Engineering, Genomics, Machine Learning, Medicine, Music, Networks, Science, Statistics, Utility, and Visualization. Two of these categories, Engineering and Music, have only one entry each. However, I decided to give them their own categories in order to draw attention to the use of R outside of the mainstream, and because I have always lamented the fate of the Miscellaneous. In the same spirit, note that the complete works of the Bard appear among the packages in the Data category, and that, due to tidypaleo, Paleoenvironmental is now a thing in R.

An Alternative to the Correlation Coefficient That Works For Numeric and Categorical Variables

When starting to work with a new dataset, it is useful to quickly pinpoint which pairs of variables appear to be strongly related. It helps you spot data issues, make better modeling decisions, and ultimately arrive at better answers. The correlation coefficient is used widely for this purpose, but it is well known that it cannot detect nonlinear relationships. In this post, I suggest an alternative statistic based on the idea of mutual information that works for both continuous and categorical variables and can detect both linear and nonlinear relationships.

COVID-19 Data Forum: Data Journalism

The COVID-19 Data Forum, a joint project of the Stanford Data Science Institute and the R Consortium, is an ongoing series of multidisciplinary webinars open to the public where topic experts discuss data-related aspects of the scientific response to the pandemic. This post walks through the video recording of the most recent event, held on March 18, 2021, which explored the role of data journalism in the pandemic. The comments and time stamps should be helpful in viewing the video.

What does it take to do a t-test?

In this post, I examine the fundamental assumption of independence underlying the basic independent two-sample t-test for comparing the means of two random samples. In addition to independence, we assume that both samples are draws from normal distributions where the population means and common variance are unknown. I am going to assume that you are familiar with this kind of test, but even if you are not, you are still in the right place.

February 2021: "Top 40" New CRAN Packages

In February, two hundred forty-three new packages made it to CRAN, many of them very interesting and at least one entertaining. It was exceptionally difficult to pick the “Top 40”, but here they are, more or less, in twelve categories: Computational Methods, Data, Finance, Games, Genomics, Machine Learning, Mathematics, Medicine, Networks and Graphs, Statistics, Utilities, and Visualization. iconr in the Networks and Graphs section is a package for doing computational archaeology, a relatively new field that I hope will dig R. I also hope that sassy in the Statistics section helps some statisticians find their way to R.
