We were recently presented with a problem where the decision maker wanted to understand how their data would naturally group together. The classic technique of k-means clustering was a natural choice; it’s well known, computationally efficient, and implemented in base R via the kmeans() function.
Our problem has a slight wrinkle: the decision maker wished to see the data grouped with (nearly) equal sizes. Now, a ‘true’ statistician would tell the client that the right thing to do from a theoretical perspective was to use native k-means results because some centers can simply have more nearby points than other centers.
Home | About | Contributors |