In my training as a consultant, I learned that long hours of analysis were typically followed by equally long hours of preparing for presentations. I had to turn my complex analyses into recommendations, and my success as a consultant depended on my ability to influence decision makers. I used a variety of tools to convey my insights, but over time I increasingly came to rely on R Markdown as my tool of choice. R Markdown is easy to use, allows others to reproduce my work, and has powerful features such as parameterized inputs and multiple output formats. With R Markdown, I can share more work with less effort than I did with previous tools, making me a more effective data scientist. In this post, I want to examine three commonly used communication tools and show how R Markdown is often the better choice.
The de facto tools for communication in the enterprise are still Microsoft Word, PowerPoint, and Excel. These tools, born in the 1980s and rising to prominence in the 1990s, are used everywhere for sharing reports, presentations, and dashboards. Although Microsoft Office documents are easy to share, they can be cumbersome for data scientists to write because they are not designed to be written with code. Additionally:
- They are not reproducible.
- They are separate from the code you used to create your analysis.
- They can be time-consuming to create and difficult to maintain.
In data science, your code, not your report or presentation, is the source of your results. Therefore, your documents should also be based on code. You can accomplish this with R Markdown, which produces documents that are generated by code, reproducible, and easy to maintain. Moreover, R Markdown documents can be rendered to Word, PowerPoint, and many other output formats. So, even if your client insists on Microsoft documents, generating them with R Markdown lets you spend more time working on your code and less time maintaining reports.
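As a minimal sketch of that workflow, the snippet below writes a small R Markdown file and renders it to both Word and PowerPoint with `rmarkdown::render()`. The file name `client-report.Rmd` and its contents are hypothetical and exist only for illustration. The sketch assumes the rmarkdown package and a recent Pandoc are installed, and it skips rendering when they are not.

```r
library(rmarkdown)

# Hypothetical report source, written to a temporary directory for illustration
rmd <- file.path(tempdir(), "client-report.Rmd")
writeLines(c(
  "---",
  "title: \"Client report\"",
  "---",
  "",
  "## Summary",
  "",
  "The mtcars data set has `r nrow(mtcars)` rows."
), rmd)

# The same source file is rendered once per output format
formats <- c("word_document", "powerpoint_presentation")

# powerpoint_presentation needs Pandoc 2.0.5 or later, so check before rendering
if (pandoc_available("2.0.5")) {
  outputs <- vapply(formats, function(fmt) {
    # A fresh environment keeps each render isolated from the calling session
    render(rmd, output_format = fmt, envir = new.env(), quiet = TRUE)
  }, character(1))
  print(basename(outputs))
}
```

Because both files come from one source, a change to the analysis only has to be made once before re-rendering.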
Data science often involves interactive analyses with code, but code by itself is usually not enough to communicate results in an enterprise setting. In a previous post, I explained the benefits of using R Notebooks over R scripts for doing data science. An R Notebook is a special execution mode of R Markdown with two characteristics that make it very useful for communicating results:
- Rendering a preview of an R Notebook does not execute R code, making it computationally convenient to create reports during or after interactive analyses.
- R Notebooks have an embedded copy of the source code, making it convenient for others to examine your work.
These two characteristics of R Notebooks combine the advantages of R scripts with the advantages of R Markdown. As with R scripts, you can do interactive data analyses and see all your code. Unlike with R scripts, you can also easily create reports that explain why your code is important.
Shiny and R Markdown are both used to communicate results. They both depend on R, generate high-quality output, and can be designed to accept user inputs. In previous posts, we discussed Dashboards with Shiny and Dashboards with R Markdown. Knowing when to use Shiny and when to use R Markdown will increase your ability to influence decision makers.
| Shiny Apps | R Markdown Documents |
|---|---|
| Have an interactive and responsive user experience. | Are snapshots in time, rendered in batch. |
| Are hosted on a web server that runs R. | Have multiple output types such as HTML, Word, PDF, and many more. |
| Are not portable (i.e., users must visit the app). | Are files that can be sent via email or otherwise shared. |
Shiny is great – even “magical” – when you want your end users to have an interactive experience, but R Markdown documents are often simpler to program, easier to maintain, and can reach a wider audience. I use Shiny when I need an interactive user experience, but for everything else, I use R Markdown.
If you need to accept user input, but you don’t require the reactive framework of Shiny, you can add parameters to your R Markdown code. This process is easy and powerful, yet remains underutilized by most R users. It is a feature that would benefit a wide range of use cases, especially where the full power of Shiny is not required. Additionally, adding parameters to your document makes it easy to generate multiple versions of that document. If you host a document on RStudio Connect, then users can select inputs and generate new versions on demand. Many Shiny applications today would be better suited as parameterized R Markdown documents.
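To make this concrete, here is a minimal sketch of a parameterized report. It declares a parameter in the YAML header under `params:`, reads it in the document as `params$region`, and overrides it at render time through the `params` argument of `rmarkdown::render()`. The file name `sales-report.Rmd`, the `region` parameter, and its values are hypothetical. The sketch assumes rmarkdown and Pandoc are installed, and it skips rendering otherwise.

```r
library(rmarkdown)

# Hypothetical parameterized document; `region` is declared with a default value
rmd <- file.path(tempdir(), "sales-report.Rmd")
writeLines(c(
  "---",
  "title: \"Sales report\"",
  "output: html_document",
  "params:",
  "  region: \"East\"",
  "---",
  "",
  "This report covers the `r params$region` region."
), rmd)

# One rendered version of the document per parameter value
regions <- c("East", "West")

if (pandoc_available()) {
  outputs <- vapply(regions, function(region) {
    render(
      rmd,
      params = list(region = region),  # overrides the default in the YAML header
      output_file = paste0("sales-report-", region, ".html"),
      envir = new.env(),               # avoids clashes with an existing `params` object
      quiet = TRUE
    )
  }, character(1))
  print(basename(outputs))
}
```

The loop is what makes multiple versions cheap: adding another region to the vector produces another report without touching the document itself.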
Finally, Shiny and R Markdown are not mutually exclusive. You can include Shiny elements in an R Markdown document, which enables you to create a report that responds interactively to user inputs. These Shiny documents are created with the simplicity of R Markdown, but they have the same hosting requirements as a Shiny app and are not portable.
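A minimal sketch of such a Shiny document follows. Adding `runtime: shiny` to the YAML header lets code chunks contain Shiny inputs and render functions, and `rmarkdown::run()` serves the document locally. The file name `interactive-report.Rmd` and the slider example are hypothetical, and the sketch assumes the rmarkdown and shiny packages are installed.

```r
library(rmarkdown)

# Chunk delimiter (three backticks), built here to avoid nesting code fences
fence <- strrep("`", 3)

# Hypothetical Shiny document, written to a temporary directory for illustration
rmd <- file.path(tempdir(), "interactive-report.Rmd")
writeLines(c(
  "---",
  "title: \"Interactive report\"",
  "output: html_document",
  "runtime: shiny",
  "---",
  "",
  "Choose the number of histogram bins:",
  "",
  paste0(fence, "{r, echo=FALSE}"),
  "shiny::sliderInput(\"bins\", \"Bins\", min = 5, max = 50, value = 20)",
  "shiny::renderPlot(hist(faithful$eruptions, breaks = input$bins))",
  fence
), rmd)

# run() starts a local Shiny server and blocks, so call it only interactively
if (interactive()) {
  run(rmd)
}
```

Because the document needs a running R process to respond to the slider, it has to be served rather than emailed, which is the portability trade-off described above.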
Using the right tools for communication matters. R Markdown is a better solution than conventional tools for the following problems:
| Problem | Common tool | Better tool |
|---|---|---|
| Share reports and presentations | Microsoft Office | R Markdown |
| Summarize and share your interactive analyses | R Scripts | R Notebooks |
| Update results (in batch) based on new inputs | Shiny | Parameterized reports |
R for Data Science explains, “It doesn’t matter how great your analysis is unless you can explain it to others: you need to communicate your results.” I highly recommend reading Part V of this book, which has chapters on using R Markdown as a unified authoring framework for data science, using R Markdown formats for effective communication, and using R Markdown workflows to create analysis notebooks. The references at the end of these chapters describe where to learn more about communication.