When starting to work with a new dataset, it is useful to quickly pinpoint which pairs of variables appear to be strongly related. It helps you spot data issues, make better modeling decisions, and ultimately arrive at better answers. The correlation coefficient is used widely for this purpose, but it is well-known that it cannot detect non-linear relationships. In this post, I suggest an alternative statistic based on the idea of mutual information that works for both continuous and categorical variables and which can detect linear and nonlinear relationships.

Home | About | Contributors |