Some thoughts on rstudio::global talks

2021-02-04

by Joseph Rickert

The fifty-five videos from last month’s rstudio::global conference are now available online. You will find them on the conference website, arranged in ten categories (Keynotes, Data for Good, Language Interop, Learning, Modeling, Organizational Tooling, Package Dev, Programming, Teaching, and Visualization) that reflect fundamental areas of technical and community infrastructure and R applications. These talks were selected from hundreds of submissions, many of which were very good. I participated in the first selection round and found some choices impossible to make, so I am certain that it must have been agonizingly difficult for the program committee to pare the submissions down to the final selections.

I believe that you will find most of these talks nothing less than compelling. Their themes and moods range from informative and deeply technical R issues to data science, journalism, art, education, and public service. A few talks transcend the parochial concerns of the R community and address issues that are important to society at large. It is gratifying to see that, in the hands of committed people, R is helping to make the world just a little bit better. The videos themselves are high quality and a pleasure to watch. Unlike typical conference videos recorded in real time, all of these were rehearsed, pre-recorded, and edited, with excellent lighting and good audio. Except for the keynotes, the talks are all shorter than twenty minutes.

In the remainder of this post, I will highlight just five talks that I personally found compelling. I have arranged them in the order in which I think it makes sense to view them. But you might do just as well to sample talks from the categories listed above, which organize the talks on the conference page. My selections do not cover the whole range of topics submitted, and they certainly do not include all of the good stuff. I do think, however, that they reflect the quality of the talks, and I hope that if you watch these five you will be motivated to watch the rest too.

My first three selections are by data journalists who are out to make the world a better place. The Opioid Files: Turning big pharmacy data over to the public, by Washington Post data journalist Andrew Ba Tran, demonstrates the scale of the opioid scandal. I had the opportunity to meet Andrew at the 2019 NICAR conference of Investigative Reporters and Editors, where he was speaking and teaching R. NICAR opened my eyes to the discipline and tradition of data journalism and to the efforts of data journalists to harness technology for the public good. Andrew’s talk represents this tradition and illustrates the data-crunching skills, persistence, and unvarnished storytelling necessary to illuminate a dark topic.

Next, I recommend watching Trial and Error in Data Viz at the ACLU. Sophie Beiers is a data journalist whose work for the ACLU involves discovering and visualizing data with sufficient rigor and clarity to support arguments that will hold up in court. Sophie describes the messy work of iterating through visualizations in a process built around candid feedback from colleagues and stakeholders. Driving the process is a determination to make charts that effectively communicate key points to the intended audience. Sophie’s talk hints at the emotional toll caused by striving to see the people behind the data, and the satisfaction that comes from making a difference. The ACLU analytics team coined a word for expressing the excitement at being able to prove terrible news with quantitative evidence.

The third talk on my list, John Burn-Murdoch’s Reporting on and visualizing the pandemic, continues the theme of polishing visualizations until they work for the target audience. John is a data journalist with the Financial Times who has garnered quite a following for producing data visualizations that command attention. His talk dives deeply into his process of evolving a visualization until it not only illustrates what he wants to show but also demonstrably resonates with his mass audience.

In thinking about Sophie and John’s work, the Japanese word kaizen (improvement; here, making something better for the good of other people) comes to mind. There are many R visualization experts who can lay down the basic principles of making a good data visualization, but few, I think, who have Sophie and John’s empathy and capacity to listen to and process criticism.

The final two talks on my list are by R developers who are concerned with the big picture of sustaining the R package ecosystem. Hadley Wickham’s keynote, Maintaining the house the tidyverse built, addresses a fundamental challenge encountered by all complex software projects that develop over time: how do you maintain functional stability while coping with growth and change? Hadley’s solution for the tidyverse, summarized in a figure in his talk, may not be the right solution for all subsystems within the R ecosystem, but surely something like it must evolve to reach into all of the corners of the R universe. For example, consider the CRAN Task Views. When they were assembled, each of these curated lists of R packages represented the cutting edge of software for a particular functional area. The Task View maintainers provide an essential and mostly unacknowledged service by keeping these lists up to date. Nevertheless, it is not difficult to find Task View packages that have not changed significantly in five, ten, or more years. To my knowledge, there are no standards for retiring packages and integrating new work. The future of R depends on systems thinking and on the development of new ideas and tools for open source development.

These considerations lead to Jeroen Ooms’ talk, Monitoring health and impact of open-source projects, which describes how rOpenSci is taking on the immense challenge of measuring the quality of R packages according to technical, social, and scientific indicators while building out the infrastructure to improve the entire R package ecosystem. Some of the tools for monitoring the status and “health” of open source software are already in place. rOpenSci is offering R-universe, a git-based platform for managing personal R repositories. Once a “universe” of packages is registered with R-universe, the platform automatically builds binaries and documentation every time an author pushes an update.

Enjoy the videos!!