Introduction to Functional Data Analysis with R

This post is meant to be a “gentle” introduction to doing Functional Data Analysis (FDA) with R for someone who is totally new to the subject. I will show some “first steps” code, but most of the post will be about providing background and motivation for looking into FDA. I will also point out some of the available resources that a newcomer to FDA should find helpful.

March 2021: "Top 40" New CRAN Packages

Here are my Top 40 new CRAN packages for March 2021 in twelve categories: Computational Methods, Data, Engineering, Genomics, Machine Learning, Medicine, Music, Networks, Science, Statistics, Utility, and Visualization. Two of these categories, Engineering and Music, have only one entry each. However, I decided to give them their own categories in order to draw attention to the use of R outside of the mainstream, and because I have always lamented the fate of the Miscellaneous. In the same spirit, note that the complete works of the Bard appear among the packages in the Data category, and that, due to tidypaleo, Paleoenvironmental is now a thing in R.

An Alternative to the Correlation Coefficient That Works For Numeric and Categorical Variables

When starting to work with a new dataset, it is useful to quickly pinpoint which pairs of variables appear to be strongly related. Doing so helps you spot data issues, make better modeling decisions, and ultimately arrive at better answers. The correlation coefficient is widely used for this purpose, but it is well known that it cannot detect non-linear relationships. In this post, I suggest an alternative statistic, based on the idea of mutual information, that works for both continuous and categorical variables and can detect both linear and non-linear relationships.
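
To make the idea concrete, here is a minimal sketch in base R of a binned mutual information estimate. It is not the implementation from the post: the helper name `mutual_info`, the equal-width binning, the default of 10 bins, and the simulated data are all assumptions chosen for illustration.

```r
# Hypothetical helper (not from the post): mutual information, in nats,
# between two variables. Numeric inputs are cut into equal-width bins so
# that numeric and categorical variables can be handled the same way.
# Assumes x and y have the same length and contain no missing values.
mutual_info <- function(x, y, bins = 10) {
  to_factor <- function(v) if (is.numeric(v)) cut(v, breaks = bins) else factor(v)
  joint <- table(to_factor(x), to_factor(y)) / length(x)  # joint proportions
  indep <- outer(rowSums(joint), colSums(joint))          # product of marginals
  nz <- joint > 0                                         # skip empty cells (0 * log 0 = 0)
  sum(joint[nz] * log(joint[nz] / indep[nz]))
}

# Simulated data (an assumption): a strong but non-linear relationship,
# plus an unrelated variable for comparison.
set.seed(1)
x <- runif(2000, -1, 1)
y_quad <- x^2 + rnorm(2000, sd = 0.05)
y_noise <- rnorm(2000)

cor(x, y_quad)           # close to zero despite the strong dependence
mutual_info(x, y_quad)   # clearly larger than for the unrelated pair
mutual_info(x, y_noise)  # close to zero
```

The estimate depends on the number of bins and is biased upward in small samples, so it is best read as a rough screening statistic under these assumptions.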

COVID-19 Data Forum: Data Journalism

The COVID-19 Data Forum, a joint project of the Stanford Data Science Institute and the R Consortium, is an ongoing series of multidisciplinary webinars, open to the public, in which topic experts discuss data-related aspects of the scientific response to the pandemic. This post walks through the video recording of the most recent event, held on March 18, 2021, which explored the role of data journalism in the pandemic. The comments and time stamps should be helpful in viewing the video.

What does it take to do a t-test?

In this post, I examine the fundamental assumption of independence underlying the basic independent two-sample t-test for comparing the means of two random samples. In addition to independence, we assume that both samples are draws from normal distributions where the population means and common variance are unknown. I am going to assume that you are familiar with this kind of test, but even if you are not, you are still in the right place.
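
For readers who want to see the test in action, the following is a small sketch using base R's `t.test()` with `var.equal = TRUE`, which requests the pooled-variance (common variance) form described above. The simulated samples, their sizes, and the variable names are assumptions made only for illustration and do not come from the post.

```r
# Simulated data (an assumption): two independent normal samples with
# different means and a common standard deviation.
set.seed(42)
a <- rnorm(30, mean = 0, sd = 1)
b <- rnorm(30, mean = 0.5, sd = 1)

# Independent two-sample t-test assuming a common variance.
res <- t.test(a, b, var.equal = TRUE)
res$statistic  # t statistic
res$parameter  # degrees of freedom: n1 + n2 - 2
res$p.value

# The same statistic computed by hand from the pooled variance.
n1 <- length(a)
n2 <- length(b)
sp2 <- ((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2)
t_manual <- (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
all.equal(unname(res$statistic), t_manual)
```

Without `var.equal = TRUE`, `t.test()` defaults to the Welch test, which does not assume a common variance.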

February 2021: "Top 40" New CRAN Packages

In February, two hundred forty-three new packages made it to CRAN, many of them very interesting and at least one entertaining. It was exceptionally difficult to pick the “Top 40”, but here they are, more or less, in twelve categories: Computational Methods, Data, Finance, Games, Genomics, Machine Learning, Mathematics, Medicine, Networks and Graphs, Statistics, Utilities, and Visualization. iconr, in the Networks and Graphs section, is a package for doing computational archaeology, a relatively new field that I hope will dig R. I also hope that sassy, in the Statistics section, helps some statisticians find their way to R.

Cheat Sheets

In a previous post, I described how I was captivated by the virtual landscape imagined by the RStudio education team while looking for resources on the RStudio website. In this post, I’ll take a look at Cheatsheets, another amazing resource hiding in plain sight.

2021 R Conferences

It is not yet clear what lasting impact the COVID-19 pandemic will ultimately have on R conferences. We are still adapting to our inability to attend large events, and trying to make the best of the “silver lining” of virtual events, which permit worldwide participation. The following is an attempt to list 2021 conferences that are likely to have interesting R content. I suspect that it is incomplete. If you know of an R conference that is not mentioned, please add it to the comments section for this post.

January 2021: "Top 40" New CRAN Packages

Two hundred thirty new packages made it to CRAN in January. Below are my “Top 40” selections (AlleleShift, autoharp, autoMrP, autostsm, aweSOM, bayesforecast, cachem, circularEV, cmprskcoxmsm, coder, dataquieR, eList, GenomeAdmixR, ggmulti, ggOceanMaps, ghcm, gplite, igoR, LPDynR, LSMRealOptions, Microsoft365R, MOSS, multibridge, NHSDataDictionaRy, OTrecod, pacviz, parallelPlot, partR2, pwt10, RandomForestsGLS, rgee, rtables, SAMtool, spNetwork, targets, thematic, torchaudio, trainR, ubms, and vimpclust) in ten categories: Data, Finance, Genomics, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities, and Visualization.

R Interface for MiniZinc

Constraint programming is a paradigm for solving combinatorial problems that draws on a wide range of techniques from artificial intelligence, computer science, and operations research. MiniZinc is a free and open-source constraint modeling language designed for formulating constraint satisfaction and discrete optimization problems. Models are compiled into an intermediate representation that is understood by a wide range of solvers.
