In this post, I will walk you through some examples that show off the major features of the
ggforce package. The main goal is to share a few ideas about customizing visualizations that you may find useful in your everyday work.
The ggforce package is an extension to
ggplot2 developed by Thomas Pedersen. Thanks to
ggforce, you can enhance almost any
ggplot by highlighting data groupings and focusing attention on interesting features of the plot. The package contains
geoms, stats, facets, and other
ggplot functions. Among such functions, there are some for marking the convex hull of a set of points, jittering data, and creating Voronoi plots.
The examples in this article will use data from the
nycflights13 package. Most of the examples will build on the same basic
ggplot that visualizes airports by geographical location. I am using this data set because it makes it easy to plot
x and y coordinates without having to remember what they “mean”. This basic plot will be saved to a variable, and that variable will then serve as the base for the examples of enhancing the visualization using ggforce.
library(tidyverse)
library(ggforce)
library(nycflights13)

p <- airports %>%
  filter(lon < 0, tzone != "\\N") %>%
  ggplot(aes(lon, lat, color = tzone)) +
  geom_point(show.legend = FALSE)

p
I have long been waiting for an easy way to draw an outline around groups of data. The
geom_mark_...() family of functions does exactly that. There are four
mark functions in
ggforce, which differ in the shape they draw around the group: geom_mark_rect(), geom_mark_circle(), geom_mark_ellipse(), and geom_mark_hull().
Let’s start with
geom_mark_rect(); it will draw a rounded rectangle around each time zone group.
p + geom_mark_rect()
Like magic! The rectangles look amazing, even without modifying any arguments. Of course, more customization is possible via setting arguments. In this post, I will review some of the many great arguments available in
ggforce functions, but I don’t want to rob you of the fun of trying it yourself and discovering all of the different options.
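The other members of the mark family can be swapped in the same way. The following is a minimal sketch, assuming the same base plot as above (rebuilt here so that the snippet runs on its own); the names p_ellipse and p_circle are only illustrative.

```r
library(dplyr)
library(ggplot2)
library(ggforce)
library(nycflights13)

# Rebuild the base plot from earlier so this snippet is self-contained
p <- airports %>%
  filter(lon < 0, tzone != "\\N") %>%
  ggplot(aes(lon, lat, color = tzone)) +
  geom_point(show.legend = FALSE)

# Same grouping as before, only the outline shape changes
p_ellipse <- p + geom_mark_ellipse()
p_circle  <- p + geom_mark_circle()
```

Printing p_ellipse or p_circle renders the corresponding plot.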
Label, and an arrow!
This next addition to our plot deserves its own subheading. Adding a label and an arrow pointing to a group would typically be a major undertaking. Without
ggforce, this would require manually adding both the text and the arrow to the
ggplot. But, with
geom_mark it is as simple as setting the
label argument. So, without further ado, here is the
label argument in action:
p + geom_mark_rect(aes(label = tzone))
The labels and arrows are not only drawn, but they are also placed in an optimized location. In addition, the position will recalculate if the plot is re-sized! There are too many little details about this
label argument to mention. The backdrop is automatically white, and the indicator is not really an arrow: it is a simple line that also underlines the text, so it is easy for the eye to tell which label belongs to which group.
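Labels do not have to be drawn for every group. As a hedged sketch, the mark geoms also accept a filter aesthetic to restrict the mark to some rows, and a description aesthetic for a second line of text under the label. The description string below is made up for illustration, and the base plot is rebuilt so the snippet runs on its own.

```r
library(dplyr)
library(ggplot2)
library(ggforce)
library(nycflights13)

# Rebuild the base plot from earlier so this snippet is self-contained
p <- airports %>%
  filter(lon < 0, tzone != "\\N") %>%
  ggplot(aes(lon, lat, color = tzone)) +
  geom_point(show.legend = FALSE)

# Mark and label a single time zone; the description text is illustrative
p_one <- p +
  geom_mark_rect(
    aes(
      filter = tzone == "Pacific/Honolulu",
      label = tzone,
      description = "Airports in the Hawaii time zone"
    ),
    show.legend = FALSE
  )
```

Printing p_one should draw the rectangle and label around the Honolulu group only.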
It is now easy to finalize the plot by resetting the theme, and again suppressing the legend using show.legend = FALSE:
p + geom_mark_rect(aes(label = tzone), show.legend = FALSE) + theme_void()
There are many cases where drawing a rectangle or circle around the groups is not ideal.
geom_mark_hull() instead traces a more complex polygon that follows the outline of the group.
p + geom_mark_hull(aes(label = tzone)) + theme_void()
Again, without adding any arguments to the function, the traced outline already looks wonderful. Another option to add now is
fill. And since the legend table is now redundant, it can be suppressed by setting show.legend = FALSE:
p + geom_mark_hull(aes(label = tzone, fill = tzone), show.legend = FALSE) + theme_void()
Notice that the fill color is not totally opaque; by default,
ggforce lowers the opacity of the fill so that the dots stay visible. This is something that I would have done anyway, usually by adding the
alpha argument. In this case, it saves having to remember to add that argument.
Another adjustment that I thought was important for this plot was to modify the size of the hull, to change the padding around the outline of the group. The
expand argument controls this padding; it can be changed using the unit() function:
p +
  geom_mark_hull(
    aes(label = tzone, fill = tzone),
    show.legend = FALSE,
    expand = unit(3, "mm")
  ) +
  theme_void()
To finalize plots such as this one, it is necessary to remove most components from the default theme. Usually,
theme_void() does the trick. But in printed or online articles with white backgrounds, which is essentially all of them, it then becomes hard to tell where the margins of the plot are.
theme_no_axes() provides a great compromise: it removes the axes and grid lines but keeps the panel border.
p +
  geom_mark_hull(
    aes(label = tzone, fill = tzone),
    show.legend = FALSE,
    expand = unit(3, "mm")
  ) +
  theme_no_axes()
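The border comes from the theme that theme_no_axes() builds on. As a small sketch, assuming the base.theme argument defaults to theme_bw(), passing a different base theme changes what is left once the axes are stripped. The base plot is rebuilt here so the snippet runs on its own.

```r
library(dplyr)
library(ggplot2)
library(ggforce)
library(nycflights13)

# Rebuild the base plot from earlier so this snippet is self-contained
p <- airports %>%
  filter(lon < 0, tzone != "\\N") %>%
  ggplot(aes(lon, lat, color = tzone)) +
  geom_point(show.legend = FALSE)

# Default base theme: axes and grid removed, panel border kept
p_border <- p + theme_no_axes()

# A base theme without a border leaves a borderless plot
p_minimal <- p + theme_no_axes(base.theme = theme_minimal())
```

Comparing p_border and p_minimal side by side shows which elements come from the base theme.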
Another facet of ggforce, and it’s magnify-cent
It is common to produce two plots, one to show the full picture, and another to magnify or focus on a specific area. With
facet_zoom(), it is incredibly easy to show “macro” and “micro” in one plot by using the xlim and ylim arguments to focus on an area of the plot.
p + facet_zoom(xlim = c(-155, -160.5), ylim = c(19, 22.3))
Skip the coordinates
Another cool feature of
facet_zoom() is the ability to set the zoom region based on a row selection. To do this, simply pass an expression that you would use in a function such as
filter() to the facet. So instead of using coordinates, I just tell the facet to zoom on anything that has a Pacific/Honolulu time zone.
p + facet_zoom(xy = tzone == "Pacific/Honolulu")
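The layout of the two panels can also be adjusted. The sketch below assumes the zoom.size and horizontal arguments of facet_zoom() behave as documented (relative size of the zoom panel, and side-by-side versus stacked placement); the base plot is rebuilt so the snippet runs on its own.

```r
library(dplyr)
library(ggplot2)
library(ggforce)
library(nycflights13)

# Rebuild the base plot from earlier so this snippet is self-contained
p <- airports %>%
  filter(lon < 0, tzone != "\\N") %>%
  ggplot(aes(lon, lat, color = tzone)) +
  geom_point(show.legend = FALSE)

p_zoom <- p +
  facet_zoom(
    xy = tzone == "Pacific/Honolulu",
    zoom.size = 1,      # zoom panel as large as the overview panel
    horizontal = FALSE  # stack the panels instead of placing them side by side
  )
```

Printing p_zoom renders the overview and the zoomed panel in the chosen layout.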
Putting it all together, with three lines of code
Using what has been covered so far, it is easy to go from a very simple point plot to a sophisticated and nice-looking visualization with just three lines of code, thanks to ggforce:
p +
  geom_mark_hull(
    aes(label = tzone, fill = tzone),
    show.legend = FALSE,
    expand = unit(3, "mm")
  ) +
  theme_no_axes() +
  facet_zoom(x = tzone == "America/Los_Angeles")
“What is a Voronoi?”
This section title is based on my first reaction when I heard the word “Voronoi”. I have since learned about it, and can see why the Voronoi diagram can be useful for very specific use cases. The good news is that if you encounter one of those use cases, you now know that it is easy to draw one with ggforce.
The idea behind a Voronoi diagram is to split the area of the plot into as many sections as there are points. Unlike a grid or heat map, a Voronoi diagram draws a custom shape for each point based on its proximity to other points: every location inside a shape is closer to that shape’s point than to any other point. The result looks like stained glass. For example, a retailer can use it to see the area that each store location covers, and the size of each Voronoi shape can help them decide how to optimize their locations.
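The "closest point" rule can be checked without any plotting. The following is a minimal base R sketch with made-up store coordinates; nearest_store() is a hypothetical helper written for this illustration, not a ggforce function.

```r
# Hypothetical store locations (made-up coordinates, for illustration only)
stores <- data.frame(
  name = c("A", "B", "C"),
  x = c(0, 4, 2),
  y = c(0, 0, 3)
)

# Return the name of the store closest to a location (Euclidean distance).
# All locations that return the same store make up that store's Voronoi cell.
nearest_store <- function(px, py, sites = stores) {
  d <- sqrt((sites$x - px)^2 + (sites$y - py)^2)
  sites$name[which.min(d)]
}

nearest_store(1, 0.5)  # "A"
nearest_store(3.5, 1)  # "B"
nearest_store(2, 2.5)  # "C"
```

A Voronoi layer draws the boundaries between those cells, so the answer to "which store is closest?" can be read directly off the plot.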
The following example will focus on airports in Alaska. The
ggplot will zoom into that state’s general location, and then trace a hull shape. The hull will provide a quasi-map overlay. The final step is to add the Voronoi diagram layer by calling geom_voronoi_segment():
p +
  geom_mark_hull(aes(fill = tzone), expand = unit(3, "mm")) +
  coord_cartesian(xlim = c(-130, -180), ylim = c(50, 75)) +
  geom_voronoi_segment()
Parallel to alluvial
geom_parallel... functions allow visualizing interactions between categorical variables. The implementation is generic enough to create Sankey or alluvial charts.
For this, I will use the Manufacturer and Engine data from the
planes table inside
nycflights13. In this case, some simple data preparation is needed first.
prep_planes <- planes %>%
  filter(year > 1998, year < 2005) %>%
  filter(engine != "Turbo-shaft") %>%
  select(manufacturer, engine) %>%
  head(500)

prep_planes
## # A tibble: 500 x 2
##    manufacturer     engine
##    <chr>            <chr>
##  1 EMBRAER          Turbo-fan
##  2 AIRBUS INDUSTRIE Turbo-fan
##  3 AIRBUS INDUSTRIE Turbo-fan
##  4 EMBRAER          Turbo-fan
##  5 AIRBUS INDUSTRIE Turbo-fan
##  6 AIRBUS INDUSTRIE Turbo-fan
##  7 AIRBUS INDUSTRIE Turbo-fan
##  8 AIRBUS INDUSTRIE Turbo-fan
##  9 AIRBUS INDUSTRIE Turbo-fan
## 10 EMBRAER          Turbo-fan
## # … with 490 more rows
Prep for plotting with one line
gather_set_data() is a convenience function that, just like
gather(), creates a single row for each combination of categorical variables. The resulting table contains three new columns,
id, x, and y, which hold the row ID number from the original table, the name of the source column, and that column’s value.
prep_planes %>% gather_set_data(1:2)
## # A tibble: 1,000 x 5
##    manufacturer     engine       id x            y
##    <chr>            <chr>     <int> <chr>        <chr>
##  1 EMBRAER          Turbo-fan     1 manufacturer EMBRAER
##  2 AIRBUS INDUSTRIE Turbo-fan     2 manufacturer AIRBUS INDUSTRIE
##  3 AIRBUS INDUSTRIE Turbo-fan     3 manufacturer AIRBUS INDUSTRIE
##  4 EMBRAER          Turbo-fan     4 manufacturer EMBRAER
##  5 AIRBUS INDUSTRIE Turbo-fan     5 manufacturer AIRBUS INDUSTRIE
##  6 AIRBUS INDUSTRIE Turbo-fan     6 manufacturer AIRBUS INDUSTRIE
##  7 AIRBUS INDUSTRIE Turbo-fan     7 manufacturer AIRBUS INDUSTRIE
##  8 AIRBUS INDUSTRIE Turbo-fan     8 manufacturer AIRBUS INDUSTRIE
##  9 AIRBUS INDUSTRIE Turbo-fan     9 manufacturer AIRBUS INDUSTRIE
## 10 EMBRAER          Turbo-fan    10 manufacturer EMBRAER
## # … with 990 more rows
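The reshaping is easier to follow on a tiny table. This is a sketch on a hypothetical three-row data frame (the rows are made up for illustration): each original row appears once per gathered column, so three rows and two columns give six rows.

```r
library(ggforce)

# Hypothetical three-row table, for illustration only
toy <- data.frame(
  manufacturer = c("EMBRAER", "AIRBUS INDUSTRIE", "EMBRAER"),
  engine = c("Turbo-fan", "Turbo-fan", "Turbo-jet"),
  stringsAsFactors = FALSE
)

# One output row per original row and gathered column
long <- gather_set_data(toy, 1:2)

nrow(long)  # 6
long
```

The id column ties the two rows that came from the same original row back together, which is what the id aesthetic relies on later.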
ggplot is primed with x, and then with three new aesthetics: id, split, and value. We pass the id column to id and y to split, and finally, value is fixed to 1. The value is used to express the amount of “thickness” to add to that particular relationship; using 1 means that all combinations are weighted the same. At this point, the only argument to pass to geom_parallel_sets() will be the color fill; in this case we will use engine.
Plotting with parallel
prep_planes %>%
  gather_set_data(1:2) %>%
  ggplot(aes(x, id = id, split = y, value = 1)) +
  geom_parallel_sets(aes(fill = engine))
The plot shows how a specific plane’s engine relates to each of the manufacturers. Next,
geom_parallel_sets_axes() provides a terminal box; the
axis.width argument is the only one necessary to use at this stage, and we will set it to 0.1. The labels are added by using
geom_parallel_sets_labels(), and they are automatically rotated.
prep_planes %>%
  gather_set_data(1:2) %>%
  ggplot(aes(x, id = id, split = y, value = 1)) +
  geom_parallel_sets(aes(fill = engine)) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels()
The following is done to finalize the plot:
geom_parallel_sets() - Hide the legend and lower the alpha
geom_parallel_sets_axes() - Change the fill color and the outline color
geom_parallel_sets_labels() - Remove the rotation of the labels
prep_planes %>%
  gather_set_data(1:2) %>%
  ggplot(aes(x, id = id, split = y, value = 1)) +
  geom_parallel_sets(aes(fill = engine), show.legend = FALSE, alpha = 0.3) +
  geom_parallel_sets_axes(axis.width = 0.1, color = "lightgrey", fill = "white") +
  geom_parallel_sets_labels(angle = 0) +
  theme_no_axes()
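The value aesthetic does not have to be fixed to 1. As a hedged sketch, when the data is already aggregated, the count column can be passed as the weight so that each band's thickness reflects the count. The counts table below is made up for illustration.

```r
library(dplyr)
library(ggplot2)
library(ggforce)

# Hypothetical pre-aggregated counts, for illustration only
counts <- data.frame(
  manufacturer = c("EMBRAER", "AIRBUS INDUSTRIE", "AIRBUS INDUSTRIE"),
  engine = c("Turbo-fan", "Turbo-fan", "Turbo-jet"),
  n = c(30, 50, 5),
  stringsAsFactors = FALSE
)

# One row per combination; value = n weights each band by its count
p_weighted <- counts %>%
  gather_set_data(1:2) %>%
  ggplot(aes(x, id = id, split = y, value = n)) +
  geom_parallel_sets(aes(fill = engine), show.legend = FALSE, alpha = 0.3) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(angle = 0)
```

Printing p_weighted draws three bands whose widths follow the n column, instead of one thin band per plane.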
When visualizing the combination of a continuous and a categorical variable, it is common practice to resort to a bar or column plot. Cases that require representing this in a single circle shape usually involve modifying a polar bar in
ggplot. But, this is much easier now with
ggforce. I start with the total number of planes by engine:
planes %>% count(engine)
## # A tibble: 6 x 2
##   engine            n
##   <chr>         <int>
## 1 4 Cycle           2
## 2 Reciprocating    28
## 3 Turbo-fan      2750
## 4 Turbo-jet       535
## 5 Turbo-prop        2
## 6 Turbo-shaft       5
and then pipe those results into
geom_arc_bar() to create the circle-shaped plot. The new aesthetics employed here are:
x0, y0, r0, r, amount, and explode. The x0, y0, r0, and r aesthetics refer to the position and the radius of the circle. Since only one plot is needed, I fix x0 and y0 to 0. For the radius,
r0 refers to the inside of the circle, and
r to the outside. Setting
r0 to 0.7 and
r to 1 will create a sort of doughnut with a 0.3 thickness. Finally, I use “pie” as the stat:
planes %>%
  count(engine) %>%
  ggplot() +
  geom_arc_bar(
    aes(x0 = 0, y0 = 0, r0 = 0.7, r = 1, amount = n, fill = engine),
    alpha = 0.3,
    stat = "pie"
  )
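To get a feel for what the “pie” stat takes care of, the angles can be computed by hand. This is only a sketch of the idea, not the package's internal code: each count is turned into a share of the full circle, and the resulting start and end angles are passed to geom_arc_bar() directly. The counts are copied from the table above.

```r
library(ggplot2)
library(ggforce)

# Engine counts copied from the count(engine) output above
engines <- data.frame(
  engine = c("4 Cycle", "Reciprocating", "Turbo-fan",
             "Turbo-jet", "Turbo-prop", "Turbo-shaft"),
  n = c(2, 28, 2750, 535, 2, 5)
)

# Each slice spans its share of the full circle (2 * pi radians)
engines$end   <- 2 * pi * cumsum(engines$n) / sum(engines$n)
engines$start <- c(0, head(engines$end, -1))

# With explicit start and end angles, no stat needs to be specified
p_manual <- ggplot(engines) +
  geom_arc_bar(
    aes(x0 = 0, y0 = 0, r0 = 0.7, r = 1,
        start = start, end = end, fill = engine),
    alpha = 0.3
  ) +
  coord_fixed()
```

Printing p_manual should give a doughnut much like the one above, which shows that the amount aesthetic is mostly a convenience for this bookkeeping.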
Another cool thing this
geom does is to make it easy to “break away” one or several segments of the plot. The
explode aesthetic controls that. To break away the “Turbo-jet” results, I create a new column called
focus, setting it to 0.2 if it is part of that engine group, and to 0 if it is not, then finish up with theme_no_axes():
planes %>%
  count(engine) %>%
  mutate(focus = ifelse(engine == "Turbo-jet", 0.2, 0)) %>%
  ggplot() +
  geom_arc_bar(
    aes(x0 = 0, y0 = 0, r0 = 0.7, r = 1, amount = n, fill = engine, explode = focus),
    alpha = 0.3,
    stat = "pie"
  ) +
  theme_no_axes()
This section is titled “Danger Zone” because changing the arguments of
geom_arc_bar() (for example, setting r0 to 0) may change the look of the plot to one that has fallen out of favor. That plot type happens to share its name with the
stat that we are using.
ggforce is a great package that does a lot more than what I covered today. My hope is to have shared one or two things that will encourage you to try
ggforce in your everyday work.
Special thanks to Thomas Pedersen, the author of the package and a co-worker of mine. His contributions to the R community also include the
tidygraph and ggraph packages, which I wrote about in this blog post a few months back.