<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>R for the Enterprise on R Views</title>
    <link>https://rviews.rstudio.com/categories/r-for-the-enterprise/</link>
    <description>Recent content in R for the Enterprise on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 17 Oct 2019 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/categories/r-for-the-enterprise/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>Productionizing Shiny and Plumber with Pins</title>
      <link>https://rviews.rstudio.com/2019/10/17/deploying-data-with-pins/</link>
      <pubDate>Thu, 17 Oct 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/10/17/deploying-data-with-pins/</guid>
      <description>
        


&lt;p&gt;Producing an API that serves model results or a Shiny app that displays the results of an analysis requires a collection of intermediate datasets and model objects, all of which need to be saved. Depending on the project, they might need to be reused in another project later, shared with a colleague, used to shortcut computationally intensive steps, or safely stored for QA and auditing.&lt;/p&gt;
&lt;p&gt;Some of these &lt;em&gt;should&lt;/em&gt; be saved in a data warehouse, data lake, or database, but write access to an appropriate database isn’t always available. In other cases, especially with models, it may not be clear where they should be saved at all.&lt;/p&gt;
&lt;p&gt;Enter &lt;a href=&#34;https://rstudio.github.io/pins/&#34;&gt;&lt;code&gt;pins&lt;/code&gt;&lt;/a&gt;, a new R package written by &lt;a href=&#34;https://github.com/javierluraschi&#34;&gt;Javier Luraschi&lt;/a&gt;. &lt;code&gt;pins&lt;/code&gt; makes it easy to save (pin) R objects including datasets, models, and plots to a central location (board), and access them easily from both R and Python. Pins make it much easier to create production-ready R assets by simplifying the storage and updating of intermediate data artifacts.&lt;/p&gt;
&lt;div id=&#34;problems-you-can-put-a-pin-in&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Problems you can put a pin in&lt;/h2&gt;
&lt;p&gt;In general, pins are a good substitute for saving objects alongside analysis code as &lt;code&gt;.csv&lt;/code&gt; or &lt;code&gt;.rds&lt;/code&gt; files. Especially when an object is reused several times or updated independently of the rest of the analysis, a pin is probably a better solution than saving a file with your code.&lt;/p&gt;
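&lt;p&gt;As a minimal sketch (the object and pin names here are hypothetical), the pattern replaces writing a file next to your code with pinning the object to a board and retrieving it from any session:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# instead of writing a file alongside the analysis code...
saveRDS(results, &amp;quot;results.rds&amp;quot;)

# ...pin the object once and retrieve it wherever it&amp;#39;s needed
pins::pin(results, name = &amp;quot;analysis_results&amp;quot;)
results &amp;lt;- pins::pin_get(&amp;quot;analysis_results&amp;quot;)&lt;/code&gt;&lt;/pre&gt;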
&lt;p&gt;In this article, I’ll create a predictive model, programmatically serve predictions via a &lt;a href=&#34;https://www.rplumber.io/&#34;&gt;Plumber API&lt;/a&gt;, and visualize those predictions in a Shiny app. Along the way, I’ll make extensive use of pins for important parts of my workflow.&lt;/p&gt;
&lt;p&gt;The model will predict future availability of bicycles at &lt;a href=&#34;https://www.capitalbikeshare.com/&#34;&gt;Capital Bikeshare&lt;/a&gt; docks. Capital Bikeshare provides short-term bicycle rentals in and around Washington, DC, and makes data on the current availability of bikes at each station available via a public API.&lt;/p&gt;
&lt;p&gt;I’m going to make model predictions available in production by providing programmatic access to the model via an API and to humans via a Shiny app. All of the code for this demo is available on &lt;a href=&#34;https://github.com/rstudio/bike_predict/&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To get there, I’m going to follow this analysis workflow:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Ingest metadata about the stations, like name and location, from the bike data API.&lt;/li&gt;
&lt;li&gt;Combine the station metadata with raw data on bike availability from the data lake to create an analysis dataset.&lt;/li&gt;
&lt;li&gt;Train and deploy a model of future bike availability.&lt;/li&gt;
&lt;li&gt;Serve model predictions via a Plumber API.&lt;/li&gt;
&lt;li&gt;Visualize model predictions via a Shiny app.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Along the way, here are three specific times that a pin is going to come in handy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maintaining the metadata table of station IDs and details. Since I’m reusing this table in multiple assets in this project, keeping it in a pin ensures it’s always up to date.&lt;/li&gt;
&lt;li&gt;Saving the final analysis dataset. In this case, the raw Capital Bikeshare data is imported with a completely separate ETL script, and I don’t want to write my analysis dataset into the data lake. Without a separate database for analysis data, a pin is my best option.&lt;/li&gt;
&lt;li&gt;Deploying the model to serve the predictions. Saving the model separately from the API makes it easy to decouple API and model versions and to retrain the model and redeploy seamlessly when needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In all of these cases, pins drastically simplify my workflow, improve the discoverability of the objects my analysis creates, and make me more confident that I’m always using the newest version.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;where-to-pin&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Where to pin?&lt;/h2&gt;
&lt;p&gt;Before getting started describing exactly how this analysis project works, let’s dive a little deeper into the &lt;code&gt;pins&lt;/code&gt; package itself.&lt;/p&gt;
&lt;p&gt;Pins live on &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-understanding.html&#34;&gt;boards&lt;/a&gt;. A board is a set of content names and their associated files. The magic of the &lt;code&gt;pins&lt;/code&gt; package is that with only two commands and the name of some content, you can upload and download your R objects without having to worry about how the content is stored.&lt;/p&gt;
&lt;p&gt;By default, there are two boards you can use immediately: the &lt;code&gt;packages&lt;/code&gt; board, which exposes the datasets from installed R packages, and the &lt;code&gt;local&lt;/code&gt; board, which caches datasets for quick loading later.&lt;/p&gt;
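&lt;p&gt;For example, both default boards work without any registration at all:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# cache an object on the local board for quick loading later
pins::pin(mtcars, name = &amp;quot;mtcars&amp;quot;)

# search the packages board for datasets shipped with installed packages
pins::pin_find(&amp;quot;baby names&amp;quot;, board = &amp;quot;packages&amp;quot;)&lt;/code&gt;&lt;/pre&gt;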
&lt;p&gt;The real power of &lt;code&gt;pins&lt;/code&gt; is unlocked with remote boards. &lt;code&gt;pins&lt;/code&gt; supports &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-kaggle.html&#34;&gt;Kaggle&lt;/a&gt;, &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-github.html&#34;&gt;GitHub&lt;/a&gt;, &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-websites.html&#34;&gt;website&lt;/a&gt;, and &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-rsconnect.html&#34;&gt;RStudio Connect&lt;/a&gt; boards, and also supports building &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-extending.html&#34;&gt;custom extensions&lt;/a&gt;. By using a remote board, you can use &lt;code&gt;pins&lt;/code&gt; to make your R objects accessible to others on your team in a central location.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;how-it-works&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;How it works&lt;/h2&gt;
&lt;p&gt;Using a pin works like this:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Register the board with the &lt;code&gt;pins::board_register&lt;/code&gt; function. If you are using a remote board, you’ll need to provide the proper authentication mechanism, like a &lt;a href=&#34;https://www.kaggle.com/docs/api&#34;&gt;Kaggle token&lt;/a&gt;, &lt;a href=&#34;https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line&#34;&gt;GitHub Personal Access Token (PAT)&lt;/a&gt;, or &lt;a href=&#34;https://docs.rstudio.com/connect/1.5.4/user/api-keys.html&#34;&gt;RStudio Connect API key&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For GitHub, you need a repo that you have write access to, as well as a token:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pins::board_register(board = &amp;quot;github&amp;quot;, 
                     repo = &amp;quot;akgold/pins_demo&amp;quot;, 
                     branch = &amp;quot;master&amp;quot;,
                     token = Sys.getenv(&amp;quot;GITHUB_PAT&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For an RStudio Connect board, you need the server URL and an API key:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pins::board_register(board = &amp;quot;rsconnect&amp;quot;, 
                     server = &amp;quot;https://colorado.rstudio.com/rsc&amp;quot;, 
                     key = Sys.getenv(&amp;quot;RSTUDIOCONNECT_API_KEY&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At that point, your connections pane in RStudio will show the content available in the board.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;pins-connection-pane&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/pins_connection.png&#34; alt=&#34;Pins Connection Pane&#34; /&gt;&lt;/h1&gt;
&lt;p&gt;Once you’ve registered the board, your interactions are exactly the same no matter which board type you’re using.&lt;/p&gt;
&lt;ol start=&#34;2&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Pin an object to the board.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pins::pin(
  x = mtcars, 
  name = &amp;quot;mtcars_pin&amp;quot;, 
  description = &amp;quot;A pin of the mtcars dataset.&amp;quot;, 
  board = &amp;quot;rsconnect&amp;quot;
)&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;3&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Download the object later.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cars_data &amp;lt;- pins::pin_get(
  name = &amp;quot;mtcars_pin&amp;quot;,
  board = &amp;quot;rsconnect&amp;quot;
)&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;production-apps-with-pins&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Production Apps with Pins&lt;/h2&gt;
&lt;p&gt;In order to create, serve, and visualize my bike-availability predictions, I’m going to use RStudio’s publishing and scheduling platform, &lt;a href=&#34;https://rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As of RStudio Connect 1.7.8, you can publish pins to RStudio Connect, and pinned datasets get a nice preview, along with code to retrieve the pin in both R and Python.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;a-pin-on-rstudio-connect&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/rsc_pin.png&#34; alt=&#34;A pin on RStudio Connect&#34; /&gt;&lt;/h1&gt;
&lt;p&gt;The advantage of using RStudio Connect is that I can deploy R Markdown documents, Shiny apps, and Plumber APIs that create, use, and update the pins in addition to storing the pins themselves. I can also use the permissions and security of RStudio Connect to make sure that my pins are viewable only by those with the proper permissions.&lt;/p&gt;
&lt;p&gt;Here’s how the process works:&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;system-schematic&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/system_schematic.png&#34; alt=&#34;System Schematic&#34; /&gt;&lt;/h1&gt;
&lt;div id=&#34;section&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;1.&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_station_info/&#34;&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/bike_station_data.png&#34; alt=&#34;The bike station metadata, pinned on RStudio Connect, is updated every week by a scheduled RMarkdown document&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_station_data_ingest/&#34;&gt;RMarkdown here&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;section-1&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;2.&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_model_data/&#34;&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/bike_model_data.png&#34; alt=&#34;The analysis dataset is pinned to RStudio Connect by another RMarkdown job.&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_data_ingest/&#34;&gt;RMarkdown here&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;section-2&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;3.&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_available_model/&#34;&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/bike_model_train.png&#34; alt=&#34;An XGBoost model is trained and pinned to RStudio Connect on demand by a deployed RMarkdown script&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_model_build/&#34;&gt;RMarkdown here&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;section-3&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;4.&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_predict/&#34;&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/bike_api.png&#34; alt=&#34;A Plumber API is deployed on RStudio Connect, which calls the pinned model and serves model predictions.&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;section-4&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;5.&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_predict-app/&#34;&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/bike_app.png&#34; alt=&#34;A Shiny app is deployed, which consumes the prediction API and visualizes the number of bikes available.&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The three times that a pin was useful here turn out to represent three of the most compelling reasons to use a pin.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A small dataset that gets reused&lt;/strong&gt;. By accessing the station metadata dataset in a pin, I know I’m always getting the latest version regardless of which asset is using it, and it’s also accessible for other analyses in the future.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;An analysis dataset when you can’t write back to the database&lt;/strong&gt;. In this case, I don’t want to write an analysis dataset back to the raw data lake, so it’s easier to store it as a pin.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A model in production&lt;/strong&gt;. By using a pin to store my model, it’s easy to update the version that’s in production by running the R Markdown document that trains the model. It’s also conceptually simple to update the model independently from the API that serves predictions or the Shiny app that visualizes the predictions.&lt;/p&gt;
&lt;p&gt;Pins can be a fantastic way to enable Shiny and Plumber in production. By giving data scientists a place to save and deploy the output of their projects, pins make it easier to create, deploy, and update models, datasets, and other production-ready R objects.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/10/17/deploying-data-with-pins/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>How to Send Custom E-mails with R</title>
      <link>https://rviews.rstudio.com/2019/09/04/how-to-send-custom-e-mails-with-rstudio-connect/</link>
      <pubDate>Wed, 04 Sep 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/09/04/how-to-send-custom-e-mails-with-rstudio-connect/</guid>
      <description>
        


&lt;p&gt;A common business-oriented data science task is to programmatically craft and send custom e-mails. In this post, I will show how to accomplish this with R on the &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt; platform (a paid product built for the enterprise) using the &lt;a href=&#34;https://cran.r-project.org/package=blastula&#34;&gt;&lt;code&gt;blastula&lt;/code&gt;&lt;/a&gt; package. &lt;code&gt;blastula&lt;/code&gt; provides a set of functions for composing high-quality HTML e-mails that render consistently across e-mail clients, such as Gmail and Outlook, and also includes tooling for sending out those e-mails via SMTP, the standard protocol for transmitting electronic mail between e-mail providers. At the bottom of the post you can find a link to documentation showing how to send e-mail with &lt;code&gt;blastula&lt;/code&gt; via an SMTP server without using RStudio Connect.&lt;/p&gt;
&lt;p&gt;As an example, we’ll pretend that I work in a marketing analytics department at an insurance company. I’m responsible for a marketing report, created with the &lt;code&gt;rmarkdown&lt;/code&gt; package, that tracks the number of bound policies from different marketing activities:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;Mktg_Activity&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;Policies&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;Target&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Partnerships&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;345&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;320&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;E-mail Mktg&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;434&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;410&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Direct Mail&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;240&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;235&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Radio&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;128&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Having written the report with R Markdown, I will publish the script to RStudio Connect and have Connect create and send an e-mail for me. Once this is done, I’ll turn on both the &lt;a href=&#34;https://docs.rstudio.com/connect/user/settings-panel.html#content-schedule&#34; target=&#34;_blank&#34;&gt;&lt;em&gt;scheduler&lt;/em&gt;&lt;/a&gt; and &lt;em&gt;Send email after update&lt;/em&gt; options to have Connect re-run the report on a set schedule. By default, the e-mail generated by RStudio Connect looks something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;screenshot.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;Because we haven’t done anything to customize the e-mail notification yet, Connect generated a standard out-of-the-box e-mail. It used the published document name for the e-mail subject and included a link to the report, as well as a time stamp of when it was executed. The e-mail also contains the actual report as an attachment, which can be downloaded and viewed. This is already useful, but I’d like to customize the e-mail to better fit my team’s needs. This is where the &lt;code&gt;blastula&lt;/code&gt; package comes in.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;blastula&lt;/code&gt; allows you to create and send e-mails using R. It works similarly to the Shiny package, but instead of writing R code to create an interactive application, you write R code to create an HTML e-mail that renders across a wide variety of e-mail providers. Once you’ve programmatically created an HTML e-mail, &lt;code&gt;blastula&lt;/code&gt; can also send that e-mail programmatically.&lt;/p&gt;
&lt;p&gt;To create your custom e-mail, simply add a new R code chunk at the bottom of your R Markdown script. You can use the &lt;a href=&#34;https://bookdown.org/yihui/rmarkdown/r-code.html&#34;&gt;code chunk option&lt;/a&gt; &lt;code&gt;include = FALSE&lt;/code&gt; so that your R code isn’t printed in your actual R Markdown report, since that wouldn’t be very helpful to whoever is reading the report:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# load the blastula package
library(&amp;quot;blastula&amp;quot;)

# create a simple e-mail
email &amp;lt;- compose_email(body = &amp;quot;Insert your e-mail body here&amp;quot;,
                       footer = &amp;quot;Insert your e-mail footer here&amp;quot;)

# preview e-mail in Viewer pane
preview_email(email)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;email1_screenshot.png&#34; /&gt;&lt;/p&gt;
&lt;p&gt; &lt;/p&gt;
&lt;p&gt;&lt;code&gt;blastula&lt;/code&gt; supports string interpolation, meaning it can display the value of an R variable rather than simply printing your R code as plain text. The way to tell &lt;code&gt;blastula&lt;/code&gt; what is R code and what is plain text is to add curly braces around anything you want interpreted as R:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# create an e-mail with R code

email_body &amp;lt;- 
&amp;quot;
Hi! This new report was generated at {Sys.time()}
&amp;quot;

email_footer &amp;lt;- 
&amp;quot;
Please contact *support@acme.com* with any questions
&amp;quot;

email &amp;lt;- compose_email(body = email_body,
                       footer = email_footer)

# preview e-mail in Viewer pane
preview_email(email)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;email2_screenshot.png&#34; /&gt;
 &lt;/p&gt;
&lt;p&gt;You’ll notice that not only can you include R code, but you can also supply Markdown syntax (notice how we italicized some of the footer text). In addition to all this, you can use helper functions included in the &lt;code&gt;blastula&lt;/code&gt; package to add other elements to your e-mail. For example, you can add a &lt;code&gt;ggplot2&lt;/code&gt; plot as an image using the &lt;code&gt;add_ggplot&lt;/code&gt; function:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# create an e-mail with a plot
library(ggplot2)

# `tb` holds the marketing-activity table shown earlier in the report
tb &amp;lt;- tibble::tribble(
  ~Mktg_Activity, ~Policies, ~Target,
  &amp;quot;Partnerships&amp;quot;,       345,     320,
  &amp;quot;E-mail Mktg&amp;quot;,        434,     410,
  &amp;quot;Direct Mail&amp;quot;,        240,     235,
  &amp;quot;Radio&amp;quot;,              128,     100
)

plot &amp;lt;- ggplot(tb, aes(Mktg_Activity, Policies)) + geom_bar(stat = &amp;quot;identity&amp;quot;)


email_body &amp;lt;- 
&amp;quot;
Hi! This new report was generated at {Sys.time()} \\


{add_ggplot(plot, width = 5, height = 3)}

&amp;quot;

email_footer &amp;lt;- 
&amp;quot;
Please contact *support@acme.com* with any questions
&amp;quot;

email &amp;lt;- compose_email(body = email_body,
                       footer = email_footer)

# preview e-mail in Viewer pane
preview_email(email)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;email3_screenshot.png&#34; /&gt;
 &lt;/p&gt;
&lt;p&gt;Now that we’ve created the e-mail programmatically, the next step is to send it out. Because my company uses RStudio Connect to host reports and send e-mail notifications, I need to add the following two lines of code to the bottom of my report so that RStudio Connect knows what to use as the e-mail body:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Use Blastula&amp;#39;s message as the email body in RStudio Connect.
rmarkdown::output_metadata$set(rsc_email_body_html = email$html_str)
rmarkdown::output_metadata$set(rsc_email_images = email$images)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Before this point, the blastula code we’d written generated a nice HTML e-mail, which in this case was saved to a variable I called &lt;code&gt;email&lt;/code&gt;. However, Connect didn’t know that. To remedy this, we saved the e-mail we created to the &lt;code&gt;output_metadata&lt;/code&gt; object, which holds metadata about the rendered report, some of which Connect uses. Two of those items are &lt;code&gt;rsc_email_body_html&lt;/code&gt; and &lt;code&gt;rsc_email_images&lt;/code&gt;, which Connect uses to build the HTML notification e-mail it sends out. For consistency, you can always assign both of these items at the end of your R Markdown report, even if the e-mail does not initially contain embedded images.&lt;/p&gt;
&lt;p&gt;If you do not wish to use RStudio Connect to send messages, you can also use the &lt;code&gt;smtp_send()&lt;/code&gt; function to send your e-mail via an SMTP server. For instructions, check out the package’s “Sending Email Using SMTP” vignette on &lt;a href=&#34;https://github.com/rich-iannone/blastula/blob/master/vignettes/sending_using_smtp.Rmd&#34;&gt;GitHub&lt;/a&gt;. To learn more about crafting custom e-mails, check out the &lt;a href=&#34;https://cran.r-project.org/web/packages/blastula/blastula.pdf&#34; target=&#34;_blank&#34;&gt;blastula documentation&lt;/a&gt; and the &lt;a href=&#34;https://docs.rstudio.com/connect/user/r-markdown.html#r-markdown-email-customization&#34; target=&#34;_blank&#34;&gt;RStudio Connect User Guide&lt;/a&gt;.&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/09/04/how-to-send-custom-e-mails-with-rstudio-connect/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Plumber Logging</title>
      <link>https://rviews.rstudio.com/2019/08/13/plumber-logging/</link>
      <pubDate>Tue, 13 Aug 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/08/13/plumber-logging/</guid>
      <description>
        


&lt;p&gt;The &lt;a href=&#34;https://www.rplumber.io/docs/&#34;&gt;plumber R package&lt;/a&gt; is used to expose R functions as API endpoints. Because plumber is so flexible, most major API design decisions are left to the developer. One important consideration when developing APIs is how to log information about API requests and responses. This information can be used to determine how plumber APIs are performing and how they are being used.&lt;/p&gt;
&lt;p&gt;An example of logging API requests in plumber is included in the &lt;a href=&#34;https://www.rplumber.io/docs/routing-and-input.html#filters&#34;&gt;package documentation&lt;/a&gt;. That example uses a filter to log information about incoming requests before a response has been generated. This is certainly a valid approach, but it means that the log cannot contain details about the response since the response hasn’t been created yet. In this post we will look at an alternative approach to logging plumber APIs that uses &lt;a href=&#34;https://www.rplumber.io/docs/programmatic-usage.html#router-hooks&#34;&gt;preroute and postroute hooks&lt;/a&gt; to log information about each API request and its associated response.&lt;/p&gt;
&lt;div id=&#34;logging&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Logging&lt;/h2&gt;
&lt;p&gt;In this example, we will use the &lt;a href=&#34;https://daroczig.github.io/logger/&#34;&gt;logger package&lt;/a&gt; to generate the actual log entries. Using this package isn’t required, but it does provide some convenient functionality that we will take advantage of.&lt;/p&gt;
&lt;p&gt;Since we will be registering hooks for our API, we will need both a &lt;code&gt;plumber.R&lt;/code&gt; file and an &lt;code&gt;entrypoint.R&lt;/code&gt; file. The &lt;code&gt;plumber.R&lt;/code&gt; file contains the following:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# plumber.R
# A simple API to illustrate logging with Plumber

library(plumber)

#* @apiTitle Logging Example

#* @apiDescription Simple example API for implementing logging with Plumber

#* Echo back the input
#* @param msg The message to echo
#* @get /echo
function(msg = &amp;quot;&amp;quot;) {
  list(msg = paste0(&amp;quot;The message is: &amp;#39;&amp;quot;, msg, &amp;quot;&amp;#39;&amp;quot;))
}

#* Plot a histogram
#* @png
#* @get /plot
function() {
  rand &amp;lt;- rnorm(100)
  hist(rand)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that we’ve defined two endpoints (&lt;code&gt;/echo&lt;/code&gt; and &lt;code&gt;/plot&lt;/code&gt;), we can use &lt;code&gt;entrypoint.R&lt;/code&gt; to set up logging using preroute and postroute hooks. First, we need to configure the logger package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# entrypoint.R
library(plumber)

# logging
library(logger)

# Specify how logs are written
log_dir &amp;lt;- &amp;quot;logs&amp;quot;
if (!fs::dir_exists(log_dir)) fs::dir_create(log_dir)
log_appender(appender_tee(tempfile(&amp;quot;plumber_&amp;quot;, log_dir, &amp;quot;.log&amp;quot;)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;log_appender()&lt;/code&gt; function is used to specify which appender method is used for logging. Here we use &lt;code&gt;appender_tee()&lt;/code&gt; so that logs will be written to &lt;code&gt;stdout&lt;/code&gt; and to a specific file path. We create a directory called &lt;code&gt;logs/&lt;/code&gt; in the current working directory to store the resulting logs. Every log file is assigned a unique name using &lt;code&gt;tempfile()&lt;/code&gt;. This prevents errors that can occur if concurrent processes try to write to the same file.&lt;/p&gt;
&lt;p&gt;Now, we need to create a helper function that we will use when creating log entries:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;convert_empty &amp;lt;- function(string) {
  if (string == &amp;quot;&amp;quot;) {
    &amp;quot;-&amp;quot;
  } else {
    string
  }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This function converts empty strings into a dash (&lt;code&gt;&amp;quot;-&amp;quot;&lt;/code&gt;) and returns non-empty strings unchanged. We will use it to ensure that empty log values still get recorded, so the log files remain easy to read and parse. We’re now ready to create our plumber router and register the hooks necessary for logging:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pr &amp;lt;- plumb(&amp;quot;plumber.R&amp;quot;)

pr$registerHooks(
  list(
    preroute = function() {
      # Start timer for log info
      tictoc::tic()
    },
    postroute = function(req, res) {
      end &amp;lt;- tictoc::toc(quiet = TRUE)
      # Log details about the request and the response
      log_info(&amp;#39;{convert_empty(req$REMOTE_ADDR)} &amp;quot;{convert_empty(req$HTTP_USER_AGENT)}&amp;quot; {convert_empty(req$HTTP_HOST)} {convert_empty(req$REQUEST_METHOD)} {convert_empty(req$PATH_INFO)} {convert_empty(res$status)} {round(end$toc - end$tic, digits = getOption(&amp;quot;digits&amp;quot;, 5))}&amp;#39;)
    }
  )
)

pr&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We use the &lt;code&gt;$registerHooks()&lt;/code&gt; method to register both preroute and postroute hooks. The preroute hook uses the &lt;a href=&#34;http://collectivemedia.github.io/tictoc/&#34;&gt;tictoc package&lt;/a&gt; to start a timer. The postroute hook stops the timer and then writes a log entry using the &lt;code&gt;log_info()&lt;/code&gt; function from the logger package. Each log entry contains the following information:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Log level: This is a distinction made by the logger package, and in this
example the value is always INFO&lt;/li&gt;
&lt;li&gt;Timestamp: The timestamp for when the response was generated and sent back to
the client&lt;/li&gt;
&lt;li&gt;Remote Address: The address of the client making the request&lt;/li&gt;
&lt;li&gt;User Agent: The user agent making the request&lt;/li&gt;
&lt;li&gt;Http Host: The host of the API&lt;/li&gt;
&lt;li&gt;Method: The HTTP method attached to the request&lt;/li&gt;
&lt;li&gt;Path: The specific API endpoint requested&lt;/li&gt;
&lt;li&gt;Status: The HTTP status of the response&lt;/li&gt;
&lt;li&gt;Execution Time: The amount of time from when the request was received until the response was generated&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This log format is loosely inspired by the &lt;a href=&#34;https://en.wikipedia.org/wiki/Common_Log_Format&#34;&gt;NCSA Common log format&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;testing&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Testing&lt;/h2&gt;
&lt;p&gt;Now that our API is all set up, it’s time to test that logging works as expected. First, we need to start the API. The easiest way to do this is to click the Run API button that appears at the top of the &lt;code&gt;plumber.R&lt;/code&gt; file in the RStudio IDE. Once the API is running, you’ll see a message in the console similar to the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Running plumber API at http://127.0.0.1:5762&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that we know the API is running, we need to make a request. One of the easiest ways to make a request in this case is to open a web browser (like Google Chrome) and type the API address in the address bar followed by &lt;code&gt;/plot&lt;/code&gt;. In this example, I would type &lt;code&gt;http://127.0.0.1:5762/plot&lt;/code&gt; into the address bar of my browser. If all goes well, you should see a plot rendered in the browser. The RStudio console will display the log output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;INFO [2019-08-09 12:30:23] 127.0.0.1 &amp;quot;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36&amp;quot; localhost:5762 GET /plot 200 0.158&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A new &lt;code&gt;logs/&lt;/code&gt; directory will have been created in the current working directory and it will contain a file with the log entry. You can generate more log entries by refreshing your browser window.&lt;/p&gt;
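&lt;p&gt;You can also generate requests from R itself, which is handy for scripting many test requests at once. A minimal sketch using the &lt;code&gt;httr&lt;/code&gt; package (assuming the same address and port printed to your console; yours will likely differ):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(httr)

# The port below is just the one from the example above; use your own
resp &amp;lt;- GET(&amp;quot;http://127.0.0.1:5762/plot&amp;quot;)
status_code(resp)  # 200 if the request succeeded&lt;/code&gt;&lt;/pre&gt;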
&lt;/div&gt;
&lt;div id=&#34;analyzing&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Analyzing&lt;/h2&gt;
&lt;p&gt;Let’s say that we refreshed the browser window 1,000 times. The log file generated will contain an entry for each request. We can analyze this log file to find helpful information about the API. For example, we could plot a histogram of execution time:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggplot2)

plumber_log &amp;lt;- readr::read_log(&amp;quot;logs/plumber_fe3daed895d.log&amp;quot;,
                               col_names = c(&amp;quot;log_level&amp;quot;,
                                             &amp;quot;timestamp&amp;quot;,
                                             &amp;quot;remote_address&amp;quot;,
                                             &amp;quot;user_agent&amp;quot;,
                                             &amp;quot;http_host&amp;quot;,
                                             &amp;quot;method&amp;quot;,
                                             &amp;quot;path&amp;quot;,
                                             &amp;quot;status&amp;quot;,
                                             &amp;quot;execution_time&amp;quot;))

ggplot(plumber_log, aes(x = execution_time)) +
  geom_histogram() +
  theme_bw() +
  labs(title = &amp;quot;Execution Times&amp;quot;,
       x = &amp;quot;Execution Time&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019-08-10-plumber-logging/index_files/figure-html/unnamed-chunk-1-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We could even build a &lt;a href=&#34;http://shiny.rstudio.com&#34;&gt;Shiny application&lt;/a&gt; to monitor the &lt;code&gt;logs/&lt;/code&gt; directory and provide real-time visibility into API metrics!&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;log-monitoring.gif&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The details of this Shiny application go beyond the scope of this post, but the source code is available &lt;a href=&#34;https://github.com/sol-eng/plumber-logging/blob/master/R/shiny/app.R&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Plumber APIs published to &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt; can use this pattern to log and monitor API requests. Details on this use case can be found in &lt;a href=&#34;https://github.com/sol-eng/plumber-logging#deployment&#34;&gt;this repository&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Plumber is an incredibly flexible package for exposing R functions as API endpoints. Logging information about API requests and responses provides visibility into API usage and performance. These log files can be manually inspected or used in connection with other tools (like Shiny) to provide real-time metrics around API use. The code used in this example along with additional information is available in &lt;a href=&#34;https://github.com/sol-eng/plumber-logging&#34;&gt;this GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you are interested in learning more about using plumber, logging, Shiny, and RStudio Connect, please visit &lt;a href=&#34;https://community.rstudio.com/&#34;&gt;community.rstudio.com&lt;/a&gt; and let us know!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;James Blair is a solutions engineer at RStudio who focuses on tools,
technologies, and best practices for using R in the enterprise.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/08/13/plumber-logging/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Three Strategies for Working with Big Data in R</title>
      <link>https://rviews.rstudio.com/2019/07/17/3-big-data-strategies-for-r/</link>
      <pubDate>Wed, 17 Jul 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/07/17/3-big-data-strategies-for-r/</guid>
      <description>
        


&lt;p&gt;For many R users, it’s obvious &lt;em&gt;why&lt;/em&gt; you’d want to use R with big data, but not so obvious how. In fact, many people (wrongly) believe that R just doesn’t work very well for big data.&lt;/p&gt;
&lt;p&gt;In this article, I’ll share three strategies for thinking about how to use big data in R, as well as some examples of how to execute each of them.&lt;/p&gt;
&lt;p&gt;By default, R runs only on data that can fit into your computer’s memory. Hardware advances have made this less of a problem for many users, since these days most laptops come with at least 4-8 GB of memory, and you can get instances on any major cloud provider with terabytes of RAM. But this is still a real problem for almost any data set that could really be called &lt;em&gt;big data&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The fact that R runs on in-memory data is the biggest issue that you face when trying to use Big Data in R. The data has to fit into the RAM on your machine, and it’s not even 1:1. Because you’re actually &lt;em&gt;doing&lt;/em&gt; something with the data, a good rule of thumb is that your machine needs 2-3x the RAM of the size of your data.&lt;/p&gt;
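&lt;p&gt;A quick way to sanity-check this rule of thumb is to measure how much RAM a data set actually occupies once loaded. A minimal sketch using base R (the million-row data frame here is purely illustrative):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# How much memory does a 1-million-row data frame occupy?
df &amp;lt;- data.frame(x = rnorm(1e6), y = rnorm(1e6))
print(object.size(df), units = &amp;quot;MB&amp;quot;)

# Compare against your available RAM, remembering the 2-3x rule of thumb&lt;/code&gt;&lt;/pre&gt;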
&lt;p&gt;Another big issue for doing Big Data work in R is that data transfer speeds are extremely slow relative to the time it takes to actually process the data once it has transferred. For example, a round trip over the internet from San Francisco to New York City takes over 4 times longer than reading from a standard hard drive and over 200 times longer than reading from a solid-state drive.&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; This is an especially big problem early in a modeling or analytical project, when data might have to be pulled repeatedly.&lt;/p&gt;
&lt;p&gt;Nevertheless, there are effective methods for working with big data in R. In this post, I’ll share three strategies. It’s important to note that these strategies aren’t mutually exclusive – they can be combined as you see fit!&lt;/p&gt;
&lt;div id=&#34;strategy-1-sample-and-model&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Strategy 1: Sample and Model&lt;/h2&gt;
&lt;p&gt;To sample and model, you downsample your data to a size that can be easily downloaded in its entirety and create a model on the sample. Downsampling to thousands – or even hundreds of thousands – of data points can make model runtimes feasible while also maintaining statistical validity.&lt;a href=&#34;#fn2&#34; class=&#34;footnote-ref&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If maintaining class balance is necessary (or one class needs to be over/under-sampled), it’s reasonably simple to stratify the data set during sampling.&lt;/p&gt;
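&lt;p&gt;For an in-memory data frame, a stratified downsample is a one-liner with &lt;code&gt;dplyr&lt;/code&gt;. A hedged sketch (the data frame &lt;code&gt;df&lt;/code&gt; and its &lt;code&gt;class&lt;/code&gt; column are hypothetical):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

# Draw 1,000 rows from each class so the sample stays balanced
df_sample &amp;lt;- df %&amp;gt;%
  group_by(class) %&amp;gt;%
  slice_sample(n = 1000) %&amp;gt;%
  ungroup()&lt;/code&gt;&lt;/pre&gt;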
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2019-07-01-3-big-data-paradigms-for-r_files/sample_model.png&#34; alt=&#34;Illustration of Sample and Model&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Illustration of Sample and Model&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;advantages&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Advantages&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Speed&lt;/strong&gt; Relative to working on your entire data set, working on just a sample can drastically decrease run times and increase iteration speed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prototyping&lt;/strong&gt; Even if you’ll eventually have to run your model on the entire data set, this can be a good way to refine hyperparameters and do feature engineering for your model.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Packages&lt;/strong&gt; Since you’re working on a normal in-memory data set, you can use all your favorite R packages.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;disadvantages&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Disadvantages&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Sampling&lt;/strong&gt; Downsampling isn’t terribly difficult, but does need to be done with care to ensure that the sample is valid and that you’ve pulled enough points from the original data set.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scaling&lt;/strong&gt; If you’re using sample and model to prototype something that will later be run on the full data set, you’ll need to have a strategy (such as &lt;a href=&#34;#push-compute&#34;&gt;pushing compute to the data&lt;/a&gt;) for scaling your prototype version back to the full data set.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Totals&lt;/strong&gt; Business Intelligence (BI) tasks frequently answer questions about totals, like the count of all sales in a month. One of the other strategies is usually a better fit in this case.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;strategy-2-chunk-and-pull&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Strategy 2: Chunk and Pull&lt;/h2&gt;
&lt;p&gt;In this strategy, the data is chunked into separable units and each chunk is pulled separately and operated on serially, in parallel, or after recombining. This strategy is conceptually similar to the &lt;a href=&#34;https://en.wikipedia.org/wiki/MapReduce&#34;&gt;MapReduce&lt;/a&gt; algorithm. Depending on the task at hand, the chunks might be time periods, geographic units, or logical units like separate businesses, departments, products, or customer segments.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2019-07-01-3-big-data-paradigms-for-r_files/chunk_pull.png&#34; alt=&#34;Chunk and Pull Illustration&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Chunk and Pull Illustration&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;advantages-1&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Advantages&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Full data set&lt;/strong&gt; The entire data set gets used.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Parallelization&lt;/strong&gt; If the chunks are run separately, the problem is easy to treat as &lt;a href=&#34;https://en.wikipedia.org/wiki/Embarrassingly_parallel&#34;&gt;embarrassingly parallel&lt;/a&gt;, making it possible to use parallelization to speed up runtimes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;disadvantages-1&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Disadvantages&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Need Chunks&lt;/strong&gt; Your data needs to have separable chunks for chunk and pull to be appropriate.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pull All Data&lt;/strong&gt; You eventually have to pull in all of the data, which may still be very time- and memory-intensive.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stale Data&lt;/strong&gt; The data may require periodic refreshes from the database to stay up-to-date since you’re saving a version on your local machine.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;push-compute&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Strategy 3: Push Compute to Data&lt;/h2&gt;
&lt;p&gt;In this strategy, the data is compressed in the database, and only the much smaller compressed result set is moved out of the database into R. It is often possible to obtain significant speedups simply by doing summarization or filtering in the database before pulling the data into R.&lt;/p&gt;
&lt;p&gt;Sometimes, more complex operations are also possible, including computing histogram and raster maps with &lt;a href=&#34;https://db.rstudio.com/dbplot/&#34;&gt;&lt;code&gt;dbplot&lt;/code&gt;&lt;/a&gt;, building a model with &lt;a href=&#34;https://cran.r-project.org/web/packages/modeldb/index.html&#34;&gt;&lt;code&gt;modeldb&lt;/code&gt;&lt;/a&gt;, and generating predictions from machine learning models with &lt;a href=&#34;https://db.rstudio.com/tidypredict/&#34;&gt;&lt;code&gt;tidypredict&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
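&lt;p&gt;As a small, hedged illustration of the idea (assuming a remote table like the &lt;code&gt;flights&lt;/code&gt; table used later in this post), &lt;code&gt;dbplot&lt;/code&gt; can compute histogram bins inside the database and return only the bin counts to R:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dbplot)

# The binning happens in the database; only the counts travel to R
df %&amp;gt;%
  dbplot_histogram(arr_delay, binwidth = 15)&lt;/code&gt;&lt;/pre&gt;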
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2019-07-01-3-big-data-paradigms-for-r_files/push_data.png&#34; alt=&#34;Push Compute to Data Illustration&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Push Compute to Data Illustration&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;advantages-2&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Advantages&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Use the Database&lt;/strong&gt; Takes advantage of what databases are often best at: quickly summarizing and filtering data based on a query.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;More Info, Less Transfer&lt;/strong&gt; By compressing before pulling data back to R, the entire data set gets used, but transfer times are far less than moving the entire data set.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;disadvantages-2&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Disadvantages&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Database Operations&lt;/strong&gt; Depending on what database you’re using, some operations might not be supported.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Database Speed&lt;/strong&gt; In some contexts, the limiting factor for data analysis is the speed of the database itself, and so pushing more work onto the database is the last thing analysts want to do.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;an-example&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;An Example&lt;/h2&gt;
&lt;p&gt;I’ve preloaded the &lt;code&gt;flights&lt;/code&gt; data set from the &lt;a href=&#34;https://cran.r-project.org/web/packages/nycflights13/index.html&#34;&gt;&lt;code&gt;nycflights13&lt;/code&gt;&lt;/a&gt; package into a PostgreSQL database, which I’ll use for these examples.&lt;/p&gt;
&lt;p&gt;Let’s start by connecting to the database. I’m using a config file here to connect to the database, one of RStudio’s &lt;a href=&#34;https://db.rstudio.com/best-practices/managing-credentials/&#34;&gt;recommended database connection methods&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(DBI)
library(dplyr)
library(ggplot2)

# Read connection details (driver, server, credentials) from a config.yml
config &amp;lt;- config::get()

db &amp;lt;- DBI::dbConnect(
  odbc::odbc(),
  Driver = config$driver,
  Server = config$server,
  Port = config$port,
  Database = config$database,
  UID = config$uid,
  PWD = config$pwd,
  BoolsAsChar = &amp;quot;&amp;quot;
)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;a href=&#34;https://dplyr.tidyverse.org/&#34;&gt;&lt;code&gt;dplyr&lt;/code&gt;&lt;/a&gt; package is a great tool for interacting with databases, since I can write normal R code that is translated into SQL on the backend. I could also use the &lt;a href=&#34;https://db.rstudio.com/dbi/&#34;&gt;&lt;code&gt;DBI&lt;/code&gt;&lt;/a&gt; package to send queries directly, or a &lt;a href=&#34;https://bookdown.org/yihui/rmarkdown/language-engines.html#sql&#34;&gt;SQL chunk&lt;/a&gt; in the R Markdown document.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;df &amp;lt;- dplyr::tbl(db, &amp;quot;flights&amp;quot;)
tally(df)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 1 x 1
##        n
##    &amp;lt;int&amp;gt;
## 1 336776&lt;/code&gt;&lt;/pre&gt;
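&lt;p&gt;Because &lt;code&gt;df&lt;/code&gt; is a lazy remote table, you can inspect the SQL that &lt;code&gt;dplyr&lt;/code&gt; generates at any point with &lt;code&gt;show_query()&lt;/code&gt; – a quick sketch:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;df %&amp;gt;%
  count(carrier) %&amp;gt;%
  show_query()&lt;/code&gt;&lt;/pre&gt;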
&lt;p&gt;With only a few hundred thousand rows, this example isn’t close to the kind of big data that really requires a Big Data strategy, but it’s rich enough to demonstrate on.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;sample-and-model&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Sample and Model&lt;/h2&gt;
&lt;p&gt;Let’s say I want to model whether flights will be delayed or not. This is a great problem to sample and model.&lt;/p&gt;
&lt;p&gt;Let’s start with some minor cleaning of the data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Create is_delayed column in database
df &amp;lt;- df %&amp;gt;%
  mutate(
    # Create is_delayed column
    is_delayed = arr_delay &amp;gt; 0,
    # Get just hour (currently formatted so 6 pm = 1800)
    hour = sched_dep_time / 100
  ) %&amp;gt;%
  # Remove small carriers that make modeling difficult
  filter(!is.na(is_delayed) &amp;amp; !carrier %in% c(&amp;quot;OO&amp;quot;, &amp;quot;HA&amp;quot;))


df %&amp;gt;% count(is_delayed)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2 x 2
##   is_delayed      n
##   &amp;lt;lgl&amp;gt;       &amp;lt;int&amp;gt;
## 1 FALSE      194078
## 2 TRUE       132897&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These classes are reasonably well balanced, but since I’m going to be using logistic regression, I’m going to load a perfectly balanced sample of 40,000 data points.&lt;/p&gt;
&lt;p&gt;For most databases, random sampling methods don’t work super smoothly with R, so I can’t use &lt;code&gt;dplyr::sample_n&lt;/code&gt; or &lt;code&gt;dplyr::sample_frac&lt;/code&gt;. I’ll have to be a little more manual.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(1028)

# Create a modeling dataset 
df_mod &amp;lt;- df %&amp;gt;%
  # Within each class
  group_by(is_delayed) %&amp;gt;%
  # Assign random rank (using random and row_number from postgres)
  mutate(x = random() %&amp;gt;% row_number()) %&amp;gt;%
  ungroup()

# Take first 20K for each class for training set
df_train &amp;lt;- df_mod %&amp;gt;%
  filter(x &amp;lt;= 20000) %&amp;gt;%
  collect()

# Take next 5K for test set
df_test &amp;lt;- df_mod %&amp;gt;%
  filter(x &amp;gt; 20000 &amp;amp; x &amp;lt;= 25000) %&amp;gt;%
  collect()

# Double check I sampled right
count(df_train, is_delayed)
count(df_test, is_delayed)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2 x 2
##   is_delayed     n
##   &amp;lt;lgl&amp;gt;      &amp;lt;int&amp;gt;
## 1 FALSE      20000
## 2 TRUE       20000&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2 x 2
##   is_delayed     n
##   &amp;lt;lgl&amp;gt;      &amp;lt;int&amp;gt;
## 1 FALSE       5000
## 2 TRUE        5000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let’s build a model – let’s see if we can predict whether there will be a delay or not by the combination of the carrier, the month of the flight, and the time of day of the flight.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mod &amp;lt;- glm(is_delayed ~ carrier + 
             as.character(month) + 
             poly(sched_dep_time, 3),
           family = &amp;quot;binomial&amp;quot;, 
           data = df_train)

# Out-of-Sample AUROC
df_test$pred &amp;lt;- predict(mod, newdata = df_test)
auc &amp;lt;- suppressMessages(pROC::auc(df_test$is_delayed, df_test$pred))
auc&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Area under the curve: 0.6425&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, this is not a great model and any modelers reading this will have many ideas of how to improve what I’ve done. But that wasn’t the point!&lt;/p&gt;
&lt;p&gt;I built a model on a small subset of a big data set. Including sampling time, this took my laptop less than 10 seconds to run, making it easy to iterate quickly as I want to improve the model. After I’m happy with this model, I could pull down a larger sample or even the entire data set if it’s feasible, or do something with the model from the sample.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;chunk-and-pull&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Chunk and Pull&lt;/h2&gt;
&lt;p&gt;In this case, I want to build another model of on-time arrival, but I want to do it per-carrier. This is exactly the kind of use case that’s ideal for chunk and pull. I’m going to separately pull the data in by carrier and run the model on each carrier’s data.&lt;/p&gt;
&lt;p&gt;I’m going to start by just getting the complete list of the carriers.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Get all unique carriers
carriers &amp;lt;- df %&amp;gt;% 
  select(carrier) %&amp;gt;% 
  distinct() %&amp;gt;% 
  pull(carrier)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, I’ll write a function that&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;takes the name of a carrier as input&lt;/li&gt;
&lt;li&gt;pulls the data for that carrier into R&lt;/li&gt;
&lt;li&gt;splits the data into training and test&lt;/li&gt;
&lt;li&gt;trains the model&lt;/li&gt;
&lt;li&gt;outputs the out-of-sample AUROC (a common measure of model quality)&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;carrier_model &amp;lt;- function(carrier_name) {
  # Pull a chunk of data
  df_mod &amp;lt;- df %&amp;gt;%
    dplyr::filter(carrier == carrier_name) %&amp;gt;%
    collect()
  
  # Split into training and test
  split &amp;lt;- df_mod %&amp;gt;%
    rsample::initial_split(prop = 0.9, strata = &amp;quot;is_delayed&amp;quot;) %&amp;gt;% 
    suppressMessages()
  
  # Get training data
  df_train &amp;lt;- split %&amp;gt;% rsample::training()
  
  # Train model
  mod &amp;lt;- glm(is_delayed ~ as.character(month) + poly(sched_dep_time, 3),
             family = &amp;quot;binomial&amp;quot;,
             data = df_train)
  
  # Get out-of-sample AUROC
  df_test &amp;lt;- split %&amp;gt;% rsample::testing()
  df_test$pred &amp;lt;- predict(mod, newdata = df_test)
  suppressMessages(auc &amp;lt;- pROC::auc(df_test$is_delayed ~ df_test$pred))
  
  auc
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, I’m going to actually run the carrier model function across each of the carriers. This code runs pretty quickly, and so I don’t think the overhead of parallelization would be worth it. But if I wanted to, I would replace the &lt;code&gt;lapply&lt;/code&gt; call below with a parallel backend.&lt;a href=&#34;#fn3&#34; class=&#34;footnote-ref&#34; id=&#34;fnref3&#34;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(98765)
mods &amp;lt;- lapply(carriers, carrier_model) %&amp;gt;%
  suppressMessages()

names(mods) &amp;lt;- carriers&lt;/code&gt;&lt;/pre&gt;
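&lt;p&gt;If the per-carrier models were heavier, a hedged sketch of a parallel drop-in replacement might use the &lt;code&gt;future.apply&lt;/code&gt; package, with &lt;code&gt;future.seed&lt;/code&gt; handling the random-number-generation concern mentioned in the footnote:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(future.apply)
plan(multisession)

# future.seed = TRUE gives each worker a statistically sound,
# reproducible RNG stream
mods &amp;lt;- future_lapply(carriers, carrier_model, future.seed = TRUE) %&amp;gt;%
  suppressMessages()

names(mods) &amp;lt;- carriers&lt;/code&gt;&lt;/pre&gt;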
&lt;p&gt;Let’s look at the results.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mods&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## $UA
## Area under the curve: 0.6408
## 
## $AA
## Area under the curve: 0.6041
## 
## $B6
## Area under the curve: 0.6475
## 
## $DL
## Area under the curve: 0.6162
## 
## $EV
## Area under the curve: 0.6419
## 
## $MQ
## Area under the curve: 0.5973
## 
## $US
## Area under the curve: 0.6096
## 
## $WN
## Area under the curve: 0.6968
## 
## $VX
## Area under the curve: 0.6969
## 
## $FL
## Area under the curve: 0.6347
## 
## $AS
## Area under the curve: 0.6906
## 
## $`9E`
## Area under the curve: 0.6071
## 
## $F9
## Area under the curve: 0.625
## 
## $YV
## Area under the curve: 0.7029&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So these models (again) are only a little better than random chance. The point, though, was to use the chunk and pull strategy: pulling the data separately by logical units and building a model on each chunk.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;push-compute-to-the-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Push Compute to the Data&lt;/h2&gt;
&lt;p&gt;In this case, I’m doing a pretty simple BI task - plotting the proportion of flights that are late by the hour of departure and the airline.&lt;/p&gt;
&lt;p&gt;Just by way of comparison, let’s run this first the naive way – pulling all the data to my system and then doing my data manipulation to plot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;system.time(
  df_plot &amp;lt;- df %&amp;gt;%
    collect() %&amp;gt;%
    # Change is_delayed to numeric
    mutate(is_delayed = ifelse(is_delayed, 1, 0)) %&amp;gt;%
    group_by(carrier, sched_dep_time) %&amp;gt;%
    # Get proportion per carrier-time
    summarize(delay_pct = mean(is_delayed, na.rm = TRUE)) %&amp;gt;%
    ungroup() %&amp;gt;%
    # Change string times into actual times
    mutate(sched_dep_time = stringr::str_pad(sched_dep_time, 4, &amp;quot;left&amp;quot;, &amp;quot;0&amp;quot;) %&amp;gt;% 
             strptime(&amp;quot;%H%M&amp;quot;) %&amp;gt;% 
             as.POSIXct())) -&amp;gt; timing1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that wasn’t too bad, just &lt;code&gt;2.366&lt;/code&gt; seconds on my laptop.&lt;/p&gt;
&lt;p&gt;But let’s see how much of a speedup we can get from pushing the compute to the data. The conceptual change here is significant - I’m doing as much work as possible on the Postgres server now instead of locally. But using &lt;code&gt;dplyr&lt;/code&gt; means that the code change is minimal. The only difference in the code is that the &lt;code&gt;collect&lt;/code&gt; call got moved down by a few lines (to below &lt;code&gt;ungroup()&lt;/code&gt;).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;system.time(
  df_plot &amp;lt;- df %&amp;gt;%
    # Change is_delayed to numeric
    mutate(is_delayed = ifelse(is_delayed, 1, 0)) %&amp;gt;%
    group_by(carrier, sched_dep_time) %&amp;gt;%
    # Get proportion per carrier-time
    summarize(delay_pct = mean(is_delayed, na.rm = TRUE)) %&amp;gt;%
    ungroup() %&amp;gt;%
    collect() %&amp;gt;%
    # Change string times into actual times
    mutate(sched_dep_time = stringr::str_pad(sched_dep_time, 4, &amp;quot;left&amp;quot;, &amp;quot;0&amp;quot;) %&amp;gt;% 
             strptime(&amp;quot;%H%M&amp;quot;) %&amp;gt;% 
             as.POSIXct())) -&amp;gt; timing2&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It might have taken you the same time to read this code as the last chunk, but this took only &lt;code&gt;0.269&lt;/code&gt; seconds to run, almost an order of magnitude faster!&lt;a href=&#34;#fn4&#34; class=&#34;footnote-ref&#34; id=&#34;fnref4&#34;&gt;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; That’s pretty good for just moving one line of code.&lt;/p&gt;
&lt;p&gt;Now that we’ve done a speed comparison, we can create the nice plot we all came for.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;df_plot %&amp;gt;%
  mutate(carrier = paste0(&amp;quot;Carrier: &amp;quot;, carrier)) %&amp;gt;%
  ggplot(aes(x = sched_dep_time, y = delay_pct)) +
  geom_line() +
  facet_wrap(&amp;quot;carrier&amp;quot;) +
  ylab(&amp;quot;Proportion of Flights Delayed&amp;quot;) +
  xlab(&amp;quot;Time of Day&amp;quot;) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_datetime(date_breaks = &amp;quot;4 hours&amp;quot;, 
                   date_labels = &amp;quot;%H&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019-07-01-3-big-data-paradigms-for-r_files/figure-html/unnamed-chunk-17-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;It looks to me like flights later in the day might be a little more likely to experience delays, but that’s a question for another blog post.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;&lt;a href=&#34;https://blog.codinghorror.com/the-infinite-space-between-words/&#34; class=&#34;uri&#34;&gt;https://blog.codinghorror.com/the-infinite-space-between-words/&lt;/a&gt;&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;This isn’t just a general heuristic. You’ll probably remember that the error in many statistical processes shrinks by a factor of &lt;span class=&#34;math inline&#34;&gt;\(\frac{1}{\sqrt{n}}\)&lt;/span&gt; for sample size &lt;span class=&#34;math inline&#34;&gt;\(n\)&lt;/span&gt;, so a lot of the statistical power in your model is driven by adding the first few thousand observations compared to the final millions.&lt;a href=&#34;#fnref2&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn3&#34;&gt;&lt;p&gt;One of the biggest problems when parallelizing is dealing with random number generation, which you use here to make sure that your test/training splits are reproducible. It’s not an insurmountable problem, but requires some careful thought.&lt;a href=&#34;#fnref3&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn4&#34;&gt;&lt;p&gt;And lest you think the real difference here is offloading computation to a more powerful database, this Postgres instance is running on a container on my laptop, so it’s got exactly the same horsepower behind it.&lt;a href=&#34;#fnref4&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/07/17/3-big-data-strategies-for-r/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Reproducible Environments</title>
      <link>https://rviews.rstudio.com/2019/04/22/reproducible-environments/</link>
      <pubDate>Mon, 22 Apr 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/04/22/reproducible-environments/</guid>
      <description>
        
&lt;script src=&#34;/rmarkdown-libs/htmlwidgets/htmlwidgets.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/plotly-binding/plotly.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/typedarray/typedarray.min.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/jquery/jquery.min.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;/rmarkdown-libs/crosstalk/css/crosstalk.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;/rmarkdown-libs/crosstalk/js/crosstalk.min.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;/rmarkdown-libs/plotly-htmlwidgets-css/plotly-htmlwidgets.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;/rmarkdown-libs/plotly-main/plotly-latest.min.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;Great data science work should be reproducible. The ability to repeat
experiments is part of the foundation for all science, and reproducible work is
also critical for business applications. Team collaboration, project validation,
and sustainable products presuppose the ability to reproduce work over time.&lt;/p&gt;
&lt;p&gt;In my opinion, mastering just a handful of important tools will make
reproducible work in R much easier for data scientists. R users should be
familiar with version control, RStudio projects, and literate programming
through R Markdown. Once these tools are mastered, the major remaining challenge
is creating a reproducible environment.&lt;/p&gt;
&lt;p&gt;An environment consists of all the dependencies required to enable your code to
run correctly. This includes R itself, R packages, and system dependencies. As
with many programming languages, it can be challenging to manage reproducible R
environments. Common issues include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code that used to run no longer runs, even though the code has not changed.&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Being afraid to upgrade or install a new package, because it might break your code or someone else’s.&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Typing &lt;code&gt;install.packages&lt;/code&gt; in your environment doesn’t do anything, or doesn’t do the &lt;em&gt;right&lt;/em&gt; thing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These challenges can be addressed through a careful combination of tools and
strategies. This post describes two use cases for reproducible environments:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Safely upgrading packages&lt;/li&gt;
&lt;li&gt;Collaborating on a team&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The sections below each cover a strategy to address the use case, and the necessary
tools to implement each strategy. Additional use cases, strategies, and tools are
presented at &lt;a href=&#34;https://environments.rstudio.com&#34; class=&#34;uri&#34;&gt;https://environments.rstudio.com&lt;/a&gt;. This website is a work in
progress, but we look forward to your feedback.&lt;/p&gt;
&lt;div id=&#34;safely-upgrading-packages&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Safely Upgrading Packages&lt;/h2&gt;
&lt;p&gt;Upgrading packages can be a risky affair. It is not difficult to find serious R
users who have felt the unintended consequences of an upgrade: the new version
broke parts of their current code, or upgrading a
package for one project accidentally broke the code in another project. A
strategy for safely upgrading packages consists of three steps:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Isolate a project&lt;/li&gt;
&lt;li&gt;Record the current dependencies&lt;/li&gt;
&lt;li&gt;Upgrade packages&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first step in this strategy ensures one project’s packages and upgrades
won’t interfere with any other projects. Isolating projects is accomplished by
creating per-project libraries. A tool that makes this easy is the new &lt;a href=&#34;https://github.com/rstudio/renv&#34;&gt;&lt;code&gt;renv&lt;/code&gt;
package&lt;/a&gt;. Inside of your R project, simply use:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# inside the project directory
renv::init()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second step is to record the current dependencies. This step is critical
because it creates a safety net. If the package upgrade goes poorly, you’ll be
able to revert the changes and return to the record of the working state. Again,
the &lt;code&gt;renv&lt;/code&gt; package makes this process easy.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# record the current dependencies in a file called renv.lock
renv::snapshot()

# commit the lockfile alongside your code in version control
# and use this function to view the history of your lockfile
renv::history()

# if an upgrade goes astray, revert the lockfile
renv::revert(commit = &amp;quot;abc123&amp;quot;)

# and restore the previous environment
renv::restore()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With an isolated project and a safety net in place, you can now proceed to
upgrade or add new packages, while remaining certain the current functional
environment is still reproducible. The &lt;a href=&#34;https://github.com/r-lib/pak&#34;&gt;&lt;code&gt;pak&lt;/code&gt;
package&lt;/a&gt; can be used to install and upgrade
packages in an interactive environment:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# upgrade packages quickly and safely
pak::pkg_install(&amp;quot;ggplot2&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The safety net provided by the &lt;code&gt;renv&lt;/code&gt; package relies on access to older versions
of R packages. For public packages, CRAN provides these older versions in the
&lt;a href=&#34;https://cran.rstudio.com/src/contrib/Archive&#34;&gt;CRAN archive&lt;/a&gt;. Organizations can
use tools like &lt;a href=&#34;https://rstudio.com/products/package-manager&#34;&gt;RStudio Package
Manager&lt;/a&gt; to make multiple versions
of private packages available. The &lt;a href=&#34;https://environments.rstudio.com/snapshot&#34;&gt;“snapshot and
restore”&lt;/a&gt; approach can also be used
to &lt;a href=&#34;https://environments.rstudio.com/deploy&#34;&gt;promote content to production&lt;/a&gt;. In
fact, this approach is exactly how &lt;a href=&#34;https://rstudio.com/products/connect&#34;&gt;RStudio
Connect&lt;/a&gt; and
&lt;a href=&#34;https://shinyapps.io&#34;&gt;shinyapps.io&lt;/a&gt; deploy thousands of R applications to
production each day!&lt;/p&gt;
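&lt;p&gt;As a concrete illustration of the safety net the archive provides, a specific older release of a package can be installed directly with the &lt;code&gt;remotes&lt;/code&gt; package (the version number below is only an example):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# install an older release from the CRAN archive
remotes::install_version(&amp;quot;ggplot2&amp;quot;, version = &amp;quot;3.1.1&amp;quot;)&lt;/code&gt;&lt;/pre&gt;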
&lt;/div&gt;
&lt;div id=&#34;team-collaboration&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Team Collaboration&lt;/h2&gt;
&lt;p&gt;A common challenge for teams is sharing and running code. One strategy that
administrators and R users can adopt to facilitate collaboration is the use of
shared baselines. The basics of the strategy are simple:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Administrators set up a common environment for R users by installing RStudio Server.&lt;/li&gt;
&lt;li&gt;On the server, administrators &lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/215488098&#34;&gt;install multiple versions of R&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Each version of R is tied to a frozen repository using an &lt;code&gt;Rprofile.site&lt;/code&gt; file.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By using a frozen repository, either administrators or users can install
packages while still being sure that everyone will get the same set of packages.
A frozen repository also ensures that adding new packages won’t upgrade other
shared packages as a side-effect. New packages and upgrades are offered to users
over time through the addition of new versions of R.&lt;/p&gt;
&lt;p&gt;Frozen repositories can be created by manually cloning CRAN, accessing a service
like MRAN, or utilizing a supported product like &lt;a href=&#34;https://rstudio.com/products/package-manager&#34;&gt;RStudio Package
Manager&lt;/a&gt;.&lt;/p&gt;
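&lt;p&gt;As a minimal sketch, tying a version of R to a frozen repository only requires setting the &lt;code&gt;repos&lt;/code&gt; option in that installation’s &lt;code&gt;Rprofile.site&lt;/code&gt; file; the repository URL below is a placeholder for your frozen CRAN mirror or Package Manager snapshot:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# in R_HOME/etc/Rprofile.site
# point this version of R at a repository frozen on a specific date
options(repos = c(CRAN = &amp;quot;https://my-package-server/cran/2019-04-15&amp;quot;))&lt;/code&gt;&lt;/pre&gt;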
&lt;p&gt;&lt;img src=&#34;/post/2019-04-15-repro-envs_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;adaptable-strategies&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Adaptable Strategies&lt;/h2&gt;
&lt;p&gt;The prior sections presented specific strategies for creating reproducible
environments in two common cases. The same strategy may not be appropriate for
every organization, R user, or situation. If you’re a student reporting an
error to your professor, capturing your &lt;code&gt;sessionInfo()&lt;/code&gt; may be all you need. In
contrast, a statistician working on a clinical trial will need a robust
framework for recreating their environment. &lt;strong&gt;Reproducibility is not binary!&lt;/strong&gt;&lt;/p&gt;
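&lt;p&gt;At the lightweight end of that spectrum, capturing the session state is a one-liner in base R:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# record the R version, platform, and loaded packages alongside a bug report
writeLines(capture.output(sessionInfo()), &amp;quot;session-info.txt&amp;quot;)&lt;/code&gt;&lt;/pre&gt;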
&lt;p&gt;&lt;img src=&#34;/post/2019-04-15-repro-envs_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;To help pick between strategies, we’ve developed a &lt;a href=&#34;https://environments.rstudio.com/reproduce&#34;&gt;strategy
map&lt;/a&gt;. By answering two questions,
you can quickly identify where your team falls on this map and identify the
nearest successful strategy. The two questions are represented on the x and
y-axis of the map:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Do I have any restrictions on what packages can be used?&lt;/li&gt;
&lt;li&gt;Who is responsible for managing installed packages?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;em&gt;Figure: Reproducing Environments: Strategies and Danger Zones. The map plots who is responsible for reproducing the environment (admins to users) against package access (locked down to open). The successful strategies (Validated, Shared Baseline, and Snapshot) lie along the diagonal, while Wild West, Ticket System, and Blocked sit in the danger zones on either side of it.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;For more information on picking and using these strategies, please visit
&lt;a href=&#34;https://environments.rstudio.com&#34; class=&#34;uri&#34;&gt;https://environments.rstudio.com&lt;/a&gt;. By adopting a strategy for reproducible
environments, R users, administrators, and teams can solve a number of important
challenges. Ultimately, reproducible work adds credibility, creating a solid
foundation for research, business applications, and production systems. We are
excited to be working on tools to make reproducible work in R easy and fun. We
look forward to your feedback, community discussions, and future posts.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/04/22/reproducible-environments/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Slack and Plumber, Part Two</title>
      <link>https://rviews.rstudio.com/2018/11/27/slack-and-plumber-part-two/</link>
      <pubDate>Tue, 27 Nov 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/11/27/slack-and-plumber-part-two/</guid>
      <description>
        


&lt;p&gt;This is the final entry in a three-part series about the &lt;a href=&#34;https://www.rplumber.io/&#34;&gt;&lt;code&gt;plumber&lt;/code&gt;&lt;/a&gt; package. &lt;a href=&#34;https://rviews.rstudio.com/2018/08/30/slack-and-plumber-part-one/&#34;&gt;The first post&lt;/a&gt; introduces &lt;code&gt;plumber&lt;/code&gt; as an R package for building REST API endpoints in R. &lt;a href=&#34;https://rviews.rstudio.com/2018/08/30/slack-and-plumber-part-one/&#34;&gt;The second post&lt;/a&gt; builds a working example of a &lt;code&gt;plumber&lt;/code&gt; API that powers a &lt;a href=&#34;https://api.slack.com/slash-commands&#34;&gt;Slack slash command&lt;/a&gt;. In this final entry, we will secure the API created in the previous post so that it only responds to authenticated requests, and deploy it using &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-11-20-blair-plumber-slack-part-two-files/plumber-slack-demo.gif&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;As a reminder, this API is built on top of simulated customer call data. The slash command we create will allow users to view a customer status report within Slack. This status report contains customer name, total calls, date of birth, and a plot of call history for the past 20 weeks. The simulated data, along with the script used to create it, can be found in the &lt;a href=&#34;https://github.com/sol-eng/plumber-slack&#34;&gt;GitHub repository&lt;/a&gt; for this example.&lt;/p&gt;
&lt;div id=&#34;setup&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Setup&lt;/h2&gt;
&lt;p&gt;Successfully following this example assumes you have created a &lt;a href=&#34;https://slack.com&#34;&gt;Slack&lt;/a&gt; account and you have &lt;a href=&#34;https://api.slack.com/slack-apps&#34;&gt;followed the instructions for creating an app&lt;/a&gt;. The Plumber API as it currently exists is described in detail in the &lt;a href=&#34;https://rviews.rstudio.com/2018/08/30/slack-and-plumber-part-one/&#34;&gt;previous post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This API can be run through the UI as previously described, or by running &lt;code&gt;plumber::plumb(&amp;quot;plumber.R&amp;quot;)$run(port = 5762)&lt;/code&gt; from the directory containing the API defined in &lt;code&gt;plumber.R&lt;/code&gt;. As it stands now, this API could be deployed and used by Slack. However, it’s important to remember that we have no control over the request that Slack makes to the API. Because of this, we can’t rely on RStudio Connect’s &lt;a href=&#34;http://docs.rstudio.com/connect/admin/content-management.html#api-keys&#34;&gt;built-in API authentication mechanism&lt;/a&gt; to secure the API because there is no way to submit a key with the request. Our options are either to expose the API with no security, meaning anyone can access the endpoints we’ve defined, or to find some other mechanism for securing the API so that it only responds to authorized requests.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;api-security-patterns&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;API Security Patterns&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.rplumber.io/docs/security.html&#34;&gt;&lt;code&gt;plumber&lt;/code&gt; documentation&lt;/a&gt; provides a good introduction to API security for the R user:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The majority of R programmers have not been trained to give much attention to the security of the code that they write. This is for good reason since running R code on your own machine with no external input gives little opportunity for attackers to leverage your R code to do anything malicious. However, as soon as you expose an API on a network, your concerns and thought process must adapt accordingly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;API security can be challenging to address. As it stands today, it is the developer’s responsibility to provide proper security on API endpoints, though in the future, there may be additional security features added to &lt;code&gt;plumber&lt;/code&gt; or available via other R packages.&lt;/p&gt;
&lt;p&gt;As mentioned in the &lt;a href=&#34;https://www.rplumber.io/docs/security.html&#34;&gt;&lt;code&gt;plumber&lt;/code&gt; documentation&lt;/a&gt;, there are a number of things to consider when designing API security. For example, if the API is deployed on an internal network, securing the API may not be as important as it would be if the API was publicly exposed on the internet. When an API needs to be secured, there are several potential attack vectors that need to be handled. In this specific example, we are exposing a public endpoint that provides access to sensitive customer data. If we are unable to authenticate incoming requests, then we risk exposing sensitive data. To prevent this data from falling into the wrong hands, we will focus on verifying incoming requests so that the API only responds to requests made from Slack.&lt;/p&gt;
&lt;p&gt;There are several different methods for authenticating requests made to API endpoints. One common method is the use of API keys, which are cryptographically secure values sent with the request to verify the identity of the client. However, in this case, we have no control over the request Slack sends, so we cannot include such a key in the request. Thankfully, Slack has provided an alternative authentication method using &lt;a href=&#34;https://api.slack.com/docs/verifying-requests-from-slack&#34;&gt;signed secrets&lt;/a&gt;. Full details can be read in the Slack documentation, but in essence, each Slack application is assigned a unique secret value that, when used in connection with other request details, can be used to verify that an incoming request is indeed coming from Slack and not an unknown third party.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;securing-the-api&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Securing the API&lt;/h2&gt;
&lt;p&gt;In order to secure our API so that only requests from Slack are honored, we first need to obtain the signing secret for our application. This value can be found in the Basic Information section of the Slack application settings. It is important to remember that this is called a signing secret for a reason: it should not be shared with anyone. To avoid exposing this secret, we can save it as an environment variable. We can set it in the current R session by using &lt;code&gt;Sys.setenv(SLACK_SIGNING_SECRET = &amp;lt;our signing secret&amp;gt;)&lt;/code&gt;, or we can add it to our &lt;a href=&#34;https://csgillespie.github.io/efficientR/set-up.html#renviron&#34;&gt;&lt;code&gt;.Renviron&lt;/code&gt;&lt;/a&gt; file so that it is set for every R session. Once this is done, we can access the value in R with &lt;code&gt;Sys.getenv(&amp;quot;SLACK_SIGNING_SECRET&amp;quot;)&lt;/code&gt;. Now we are ready to create a function that verifies incoming requests are from Slack.&lt;/p&gt;
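&lt;p&gt;For example, a single line in &lt;code&gt;.Renviron&lt;/code&gt; makes the secret available to every R session without hard-coding it in scripts that might end up in version control (the value shown is a placeholder):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# ~/.Renviron
SLACK_SIGNING_SECRET=your-signing-secret-here&lt;/code&gt;&lt;/pre&gt;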
&lt;p&gt;Slack provides the following three-step process for verifying requests:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your app receives a request from Slack&lt;/li&gt;
&lt;li&gt;Your app computes a signature based on the request&lt;/li&gt;
&lt;li&gt;You make sure the computed signature matches the signature on the request&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In order to verify all incoming requests, we can define an additional filter for our API that follows the above recipe.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#* Verify incoming requests
#* @filter verify
function(req, res) {
  # Forward requests coming to swagger endpoints
  if (grepl(&amp;quot;swagger&amp;quot;, tolower(req$PATH_INFO))) return(forward())

  # Reject requests that are missing the X_SLACK_REQUEST_TIMESTAMP header
  if (is.null(req$HTTP_X_SLACK_REQUEST_TIMESTAMP)) {
    res$status &amp;lt;- 401
    return(list(text = &amp;quot;Error: Invalid request&amp;quot;))
  }

  # Build base string
  base_string &amp;lt;- paste(
    &amp;quot;v0&amp;quot;,
    req$HTTP_X_SLACK_REQUEST_TIMESTAMP,
    req$postBody,
    sep = &amp;quot;:&amp;quot;
  )

  # Slack Signing secret is available as environment variable
  # SLACK_SIGNING_SECRET
  computed_request_signature &amp;lt;- paste0(
    &amp;quot;v0=&amp;quot;,
    openssl::sha256(base_string, Sys.getenv(&amp;quot;SLACK_SIGNING_SECRET&amp;quot;))
  )

  # If the computed request signature doesn&amp;#39;t match the signature provided in the
  # request, set status of response to 401
  if (!identical(req$HTTP_X_SLACK_SIGNATURE, computed_request_signature)) {
    res$status &amp;lt;- 401
  } else {
    res$status &amp;lt;- 200
  }

  if (res$status == 401) {
    list(
      text = &amp;quot;Error: Invalid request&amp;quot;
    )
  } else {
    forward()
  }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are a lot of moving pieces to this filter, but essentially we are following the process outlined by Slack for verifying requests. We also allow Swagger endpoints to be served without verification so that the Swagger UI can still be generated for our API.&lt;/p&gt;
&lt;p&gt;Once this filter is in place, all incoming requests will be verified. However, this will create issues with our &lt;code&gt;/plot/history/&lt;/code&gt; endpoint since it is called using a standard GET request without any Slack authentication. To keep this endpoint working as intended, we’ll make some small updates to it and add &lt;code&gt;#* @preempt verify&lt;/code&gt; to the &lt;code&gt;plumber&lt;/code&gt; comments before the function. This prevents the &lt;code&gt;verify&lt;/code&gt; filter from applying to the endpoint.&lt;/p&gt;
&lt;p&gt;Now, this prevents the Slack authentication process from applying to our plot endpoint. However, this endpoint, if left unsecured, provides unfiltered access to sensitive customer data. We need an effective way to secure this endpoint so that it only responds to requests generated from Slack.&lt;/p&gt;
&lt;p&gt;Since the only thing we control in the request to this endpoint is the URL, we can update our endpoint so that an encrypted parameter is passed as part of the URL. This parameter is a combination of the current datetime and the customer ID that is then encrypted using our Slack signing secret. We can use the &lt;code&gt;encrypt_string()&lt;/code&gt; function from the &lt;a href=&#34;https://talegari.github.io/safer/&#34;&gt;&lt;code&gt;safer&lt;/code&gt;&lt;/a&gt; package to securely encrypt this string. The following example illustrates this process.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;current_time &amp;lt;- Sys.time()
customer_id &amp;lt;- 89
parameter_string &amp;lt;- paste(current_time, customer_id, sep = &amp;quot;;&amp;quot;)
safer::encrypt_string(parameter_string, Sys.getenv(&amp;quot;SLACK_SIGNING_SECRET&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;m7NfMZfpY1n5EuivjuiFQsyKopT68HiX+NIgk5S+VBlDHrVqzRM=&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once we have created this encrypted value, we pass it to the URL of our plot endpoint. Then, within the plot endpoint, we decrypt the string, extract the customer ID, and check to see if the current time is within five seconds of the time encoded in the string. If more than five seconds have passed, we consider the request to be unauthorized. To help with this process, we define two helper functions:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;encrypt_string &amp;lt;- function(string) {
  urltools::url_encode(safer::encrypt_string(paste(Sys.time(), string, sep = &amp;quot;;&amp;quot;),
                                             key = Sys.getenv(&amp;quot;SLACK_SIGNING_SECRET&amp;quot;)))
}

plot_auth &amp;lt;- function(endpoint, time_limit = 5) {
  # Save current time to compare against endpoint time value
  current_time &amp;lt;- Sys.time()

  # Try to decrypt endpoint and extract user id
  tryCatch({
    # Decrypt endpoint using SLACK_SIGNING_SECRET
    decrypted_endpoint &amp;lt;- safer::decrypt_string(endpoint,
                                                key = Sys.getenv(&amp;quot;SLACK_SIGNING_SECRET&amp;quot;))
    # Split endpoint on ;
    endpoint_split &amp;lt;- unlist(strsplit(decrypted_endpoint, split = &amp;quot;;&amp;quot;))
    # Convert time
    endpoint_time &amp;lt;- as.POSIXct(endpoint_split[1])
    # Calculate time difference
    time_diff &amp;lt;- difftime(current_time, endpoint_time, units = &amp;quot;secs&amp;quot;)

    # If more than time_limit seconds have passed since the request was
    # generated, then error
    if (time_diff &amp;gt; time_limit) {
      &amp;quot;Unauthorized&amp;quot;
    } else {
      endpoint_split[2]
    }
  },
  error = function(e) &amp;quot;Unauthorized&amp;quot;
  )
}&lt;/code&gt;&lt;/pre&gt;
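&lt;p&gt;To see how the expiry logic inside &lt;code&gt;plot_auth()&lt;/code&gt; behaves, we can exercise the timestamp check on its own. The sketch below skips the encryption step entirely (it only illustrates the split-and-compare portion), building the &lt;code&gt;datetime;id&lt;/code&gt; string and validating it against the five-second window:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Build the datetime;id parameter string (encryption omitted for illustration)
param &amp;lt;- paste(Sys.time(), 89, sep = &amp;quot;;&amp;quot;)

# Decryption would recover this same string; split it and check the timestamp
parts &amp;lt;- unlist(strsplit(param, split = &amp;quot;;&amp;quot;))
stamp &amp;lt;- as.POSIXct(parts[1])

# Within the five-second window, the customer ID comes back
if (difftime(Sys.time(), stamp, units = &amp;quot;secs&amp;quot;) &amp;gt; 5) &amp;quot;Unauthorized&amp;quot; else parts[2]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;89&amp;quot;&lt;/code&gt;&lt;/pre&gt;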
&lt;p&gt;Once these helper functions are in place, we can update our &lt;code&gt;/plot/history&lt;/code&gt; endpoint as follows:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#* Plot customer weekly calls
#* @png
#* @param cust_secret encrypted value calculated in /status endpoint
#* @response 400 No customer with the given ID was found.
#* @preempt verify
#* @get /plot/history
function(res, cust_secret) {
  # Authenticate that request came from /status
  cust_id &amp;lt;- plot_auth(cust_secret)

  # Return unauthorized error if cust_id is &amp;quot;Unauthorized&amp;quot;
  if (cust_id == &amp;quot;Unauthorized&amp;quot;) {
    res$status &amp;lt;- 401
    stop(&amp;quot;Unauthorized request&amp;quot;)
  } else if (!cust_id %in% sim_data$id) {
    res$status &amp;lt;- 400
    stop(&amp;quot;Customer id &amp;quot;, cust_id, &amp;quot; not found.&amp;quot;)
  }

  # Filter data to customer id provided
  plot_data &amp;lt;- dplyr::filter(sim_data, id == cust_id)

  # Customer name (id)
  customer_name &amp;lt;- paste0(unique(plot_data$name), &amp;quot; (&amp;quot;, unique(plot_data$id), &amp;quot;)&amp;quot;)

  # Create plot
  history_plot &amp;lt;- plot_data %&amp;gt;%
    ggplot(aes(x = time, y = calls, col = calls)) +
    ggalt::geom_lollipop(show.legend = FALSE) +
    theme_light() +
    labs(
      title = paste(&amp;quot;Weekly calls for&amp;quot;, customer_name),
      x = &amp;quot;Week&amp;quot;,
      y = &amp;quot;Calls&amp;quot;
    )

  # print() is necessary to render plot properly
  print(history_plot)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we need to make one small change to our &lt;code&gt;/status&lt;/code&gt; endpoint so that it builds the appropriate URL for our image. We construct the list response returned by the &lt;code&gt;/status&lt;/code&gt; endpoint as follows, where &lt;code&gt;image_url&lt;/code&gt; has been updated.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;    attachments = list(
      list(
        color = customer_status,
        title = paste0(&amp;quot;Status update for &amp;quot;, customer_name, &amp;quot; (&amp;quot;, customer_id, &amp;quot;)&amp;quot;),
        fallback = paste0(&amp;quot;Status update for &amp;quot;, customer_name, &amp;quot; (&amp;quot;, customer_id, &amp;quot;)&amp;quot;),
        # History plot

        image_url = paste0(base_url,
                           &amp;quot;/plot/history?cust_secret=&amp;quot;,
                           encrypt_string(customer_id)),
        # Fields provide a way of communicating semi-tabular data in Slack
        fields = list(
          list(
            title = &amp;quot;Total Calls&amp;quot;,
            value = sum(customer_data$calls),
            short = TRUE
          ),
          list(
            title = &amp;quot;DoB&amp;quot;,
            value = unique(customer_data$dob),
            short = TRUE
          )
        )
      )
    )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Just like that, we have a secure API!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;all-together-now&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;All Together Now&lt;/h2&gt;
&lt;p&gt;Now, given the authorization pieces we have implemented, it is a bit more difficult to test our API since our endpoints will only respond to authorized requests. However, we can use the free version of &lt;a href=&#34;https://www.getpostman.com&#34;&gt;Postman&lt;/a&gt; to test our API. An in-depth look at the capabilities of Postman is beyond the scope of this post, so hopefully a gif will suffice. Further details about using Postman in this context can be found in the &lt;a href=&#34;https://github.com/sol-eng/plumber-slack#running-locally&#34;&gt;GitHub repository&lt;/a&gt; for this example.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-11-20-blair-plumber-slack-part-two-files/postman-demo.gif&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;It appears that everything is working as expected! Our endpoints fail when the authorization criteria are not met, and otherwise they succeed. Notice that the plot endpoint works when initially called, but when a subsequent call is made it fails since more than five seconds have passed since the &lt;code&gt;/status&lt;/code&gt; endpoint was invoked.&lt;/p&gt;
&lt;p&gt;Now, the final step in this process is publishing this API so that Slack can properly interact with it. The easiest way to do this is to publish the API to &lt;a href=&#34;http://docs.rstudio.com/connect/user/publishing.html#publishing-apis&#34;&gt;RStudio Connect&lt;/a&gt;. Once published, Slack can be updated to point the Slash command to our nice, newly secured API.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This brings us to the conclusion of this series. We’ve discovered the power of &lt;code&gt;plumber&lt;/code&gt; in exposing R to downstream consumers via RESTful API endpoints. We built a Slack app powered entirely by R and &lt;code&gt;plumber&lt;/code&gt;, and now we have secured the underlying API so that it only responds to authorized requests. As we have seen, &lt;code&gt;plumber&lt;/code&gt; provides a powerful and flexible framework for exposing R functions as APIs that can be safely secured.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;James Blair is a solutions engineer at RStudio who focuses on tools, technologies, and best practices for using R in the enterprise.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/11/27/slack-and-plumber-part-two/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Communicating results with R Markdown</title>
      <link>https://rviews.rstudio.com/2018/11/01/r-markdown-a-better-approach/</link>
      <pubDate>Thu, 01 Nov 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/11/01/r-markdown-a-better-approach/</guid>
      <description>
        


&lt;p&gt;&lt;img src=&#34;/post/2018-10-31-Stephens-Communicate_files/r4ds-com.png&#34; height = &#34;400&#34; width=&#34;600&#34;&gt;&lt;/p&gt;
&lt;p&gt;In my training as a consultant, I learned that long hours of analysis were typically followed by equally long hours of preparing for presentations. I had to turn my complex analyses into recommendations, and my success as a consultant depended on my ability to influence decision makers. I used a variety of tools to convey my insights, but over time I increasingly came to rely on &lt;a href=&#34;https://rmarkdown.rstudio.com/&#34;&gt;R Markdown&lt;/a&gt; as my tool of choice. R Markdown is easy to use, allows others to reproduce my work, and has powerful features such as parameterized inputs and multiple output formats. With R Markdown, I can share more work with less effort than I did with previous tools, making me a more effective data scientist. In this post, I want to examine three commonly used communication tools and show how R Markdown is often the better choice.&lt;/p&gt;
&lt;div id=&#34;microsoft-office&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Microsoft Office&lt;/h3&gt;
&lt;p&gt;The de facto tools for communication in the enterprise are still Microsoft Word, PowerPoint, and Excel. These tools, born in the ’80s and rising to prominence in the ’90s, are used everywhere for sharing reports, presentations, and dashboards. Although Microsoft Office documents are easy to share, they can be cumbersome for data scientists to write because they cannot be written with code. Additionally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;They are not reproducible.&lt;/li&gt;
&lt;li&gt;They are separate from the code you used to create your analysis.&lt;/li&gt;
&lt;li&gt;They can be time-consuming to create and difficult to maintain.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In data science, your code - not your report or presentation - is the source of your results. Therefore, your documents should also be based on code! You can accomplish this with R Markdown, which produces documents that are generated by code, reproducible, and easy to maintain. Moreover, R Markdown documents can be rendered in &lt;a href=&#34;https://bookdown.org/yihui/rmarkdown/word-document.html&#34;&gt;Word&lt;/a&gt;, &lt;a href=&#34;https://bookdown.org/yihui/rmarkdown/powerpoint-presentation.html&#34;&gt;PowerPoint&lt;/a&gt;, and many other output formats. So, even if your client insists on having Microsoft documents, by generating them with R Markdown, you can spend more time working on your code and less time maintaining reports.&lt;/p&gt;
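&lt;p&gt;Switching output formats is a one-line change in the document’s YAML header. As a minimal sketch (the title here is made up), the same source file can target Word:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---
title: &amp;quot;Quarterly Analysis&amp;quot;
output: word_document
---&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Replacing &lt;code&gt;word_document&lt;/code&gt; with &lt;code&gt;powerpoint_presentation&lt;/code&gt; renders the same content as slides instead.&lt;/p&gt;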
&lt;/div&gt;
&lt;div id=&#34;r-scripts&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;R Scripts&lt;/h3&gt;
&lt;p&gt;Data science often involves interactive analyses with code, but code by itself is usually not enough to communicate results in an enterprise setting. In a &lt;a href=&#34;https://rviews.rstudio.com/2017/03/15/why-i-love-r-notebooks/&#34;&gt;previous post&lt;/a&gt;, I explained the benefits of using &lt;a href=&#34;https://bookdown.org/yihui/rmarkdown/notebook.html&#34;&gt;R Notebooks&lt;/a&gt; over R scripts for doing data science. An R Notebook is a special execution mode of R Markdown with two characteristics that make it very useful for communicating results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rendering a preview of an R Notebook does not execute R code, making it computationally convenient to create reports during or after interactive analyses.&lt;/li&gt;
&lt;li&gt;R Notebooks have an embedded copy of the source code, making it convenient for others to examine your work.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These two characteristics of R Notebooks combine the advantages of R scripts with the advantages of R Markdown. Like R scripts, you can do interactive data analyses and see all your code, but unlike R scripts you can easily create reports that explain why your code is important.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;shiny&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Shiny&lt;/h3&gt;
&lt;p&gt;Shiny and R Markdown are both used to communicate results. They both depend on R, generate high-quality output, and can be designed to accept user inputs. In previous posts, we discussed &lt;a href=&#34;https://rviews.rstudio.com/2017/09/20/dashboards-with-r-and-databases/&#34;&gt;Dashboards with Shiny&lt;/a&gt; and &lt;a href=&#34;https://rviews.rstudio.com/2018/05/16/replacing-excel-reports-with-r-markdown-and-shiny/&#34;&gt;Dashboards with R Markdown&lt;/a&gt;. Knowing when to use Shiny and when to use R Markdown will increase your ability to influence decision makers.&lt;/p&gt;
&lt;table style=&#34;width:44%;&#34;&gt;
&lt;colgroup&gt;
&lt;col width=&#34;20%&#34; /&gt;
&lt;col width=&#34;23%&#34; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;Shiny Apps&lt;/th&gt;
&lt;th&gt;R Markdown Documents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;Have an interactive and responsive user experience.&lt;/td&gt;
&lt;td&gt;Are snapshots in time, rendered in batch.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;Are hosted on a web server that runs R.&lt;/td&gt;
&lt;td&gt;Have multiple output types such as HTML, Word, PDF, and many more.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;Are not portable (i.e., users must visit the app).&lt;/td&gt;
&lt;td&gt;Are files that can be sent via email or otherwise shared.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Shiny is great – even “magical” – when you want your end users to have an interactive experience, but R Markdown documents are often simpler to program, easier to maintain, and can reach a wider audience. I use Shiny when I need an interactive user experience, but for everything else, I use R Markdown.&lt;/p&gt;
&lt;p&gt;If you need to accept user input, but you don’t require the reactive framework of Shiny, you can &lt;a href=&#34;https://bookdown.org/yihui/rmarkdown/parameterized-reports.html&#34;&gt;add parameters&lt;/a&gt; to your R Markdown code. This &lt;a href=&#34;https://resources.rstudio.com/rstudio-connect-2/parameterized-r-markdown-reports-with-rstudio-connect-aron-atkins&#34;&gt;process is easy and powerful&lt;/a&gt;, yet remains underutilized by most R users. It is a feature that would benefit a wide range of use cases, especially where the full power of Shiny is not required. Additionally, adding parameters to your document makes it easy to generate multiple versions of that document. If you host a document on &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt;, then users can select inputs and generate new versions on demand. Many Shiny applications today would be better suited as parameterized R Markdown documents.&lt;/p&gt;
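&lt;p&gt;As an illustration (the parameter and file names are made up), a parameterized report declares its inputs in the YAML header:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---
title: &amp;quot;Regional Sales&amp;quot;
output: html_document
params:
  region: &amp;quot;East&amp;quot;
---&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Code chunks can then refer to &lt;code&gt;params$region&lt;/code&gt;, and a call such as &lt;code&gt;rmarkdown::render(&amp;quot;report.Rmd&amp;quot;, params = list(region = &amp;quot;West&amp;quot;))&lt;/code&gt; generates a new version of the document in batch.&lt;/p&gt;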
&lt;p&gt;Finally, Shiny and R Markdown are not mutually exclusive. You can include Shiny elements in an R Markdown document, which enables you to create a report that responds interactively to user inputs. These &lt;a href=&#34;https://bookdown.org/yihui/rmarkdown/shiny-documents.html&#34;&gt;Shiny documents&lt;/a&gt; are created with the simplicity of R Markdown, but have the same hosting requirements as a Shiny app and are not portable.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;summary&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;Using the right tools for communication matters. R Markdown is a better solution than conventional tools for the following problems:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Common tool&lt;/th&gt;
&lt;th&gt;Better tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;Share reports and presentations&lt;/td&gt;
&lt;td&gt;Microsoft Office&lt;/td&gt;
&lt;td&gt;R Markdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;Summarize and share your interactive analyses&lt;/td&gt;
&lt;td&gt;R Scripts&lt;/td&gt;
&lt;td&gt;R Notebooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;Update results (in batch) based on new inputs&lt;/td&gt;
&lt;td&gt;Shiny&lt;/td&gt;
&lt;td&gt;Parameterized reports&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;a href=&#34;http://r4ds.had.co.nz/index.html&#34;&gt;R For Data Science&lt;/a&gt; explains that, &lt;em&gt;“It doesn’t matter how great your analysis is unless you can explain it to others: you need to communicate your results.”&lt;/em&gt; I highly recommend reading &lt;a href=&#34;https://r4ds.had.co.nz/communicate-intro.html&#34;&gt;Part V&lt;/a&gt; of this book, which has chapters on using &lt;a href=&#34;https://r4ds.had.co.nz/r-markdown.html&#34;&gt;R Markdown&lt;/a&gt; as a unified authoring framework for data science, using &lt;a href=&#34;https://r4ds.had.co.nz/r-markdown-formats.html&#34;&gt;R Markdown formats&lt;/a&gt; for effective communication, and using &lt;a href=&#34;https://r4ds.had.co.nz/r-markdown-workflow.html&#34;&gt;R Markdown workflows&lt;/a&gt; to create analysis notebooks. There are references at the end of these chapters that describe where to learn more about communication.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/11/01/r-markdown-a-better-approach/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Interactive plots in Shiny</title>
      <link>https://rviews.rstudio.com/2018/09/20/shiny-r2d3/</link>
      <pubDate>Thu, 20 Sep 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/09/20/shiny-r2d3/</guid>
      <description>
        


&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-09-17-shiny-r2d3/header.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;I wish this post existed when I was struggling to add interactive plots to my Shiny app. I was mainly focused on recreating functionality found in other “dashboarding” applications. When looking for options, I found that &lt;a href=&#34;https://www.htmlwidgets.org/&#34;&gt;htmlwidgets&lt;/a&gt; were the closest to what companies usually expect. However, while they are great for client-side interactivity, I often hit walls when I try to add click-through interactivity because the functionality is either missing, very limited, or bloated. With &lt;code&gt;r2d3&lt;/code&gt; there is more work, but the gains in customization and interactivity make it by far the best choice, in my opinion.&lt;/p&gt;
&lt;p&gt;I asked a good friend at work to help me test the sample app provided in this post. She was able to run it easily, but then told me that she didn’t know that she was supposed to click on things. Adding interactive plots is one of the most important capabilities to include in a Shiny app. Sadly though, it seems that very few do it. If we wish to offer an alternative to enterprise reporting and BI tools by using Shiny, we need to do our best to match the interactivity those other tools seem to offer out of the box.&lt;/p&gt;
&lt;div id=&#34;the-sample-app&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The sample app&lt;/h2&gt;
&lt;p&gt;I put together a sample app that should run in your R session by simply copying the code. This will allow us to focus on the details of the approach, and not on the setup.&lt;/p&gt;
&lt;p&gt;A working version of the app is available here: &lt;a href=&#34;https://beta.rstudioconnect.com/content/3940/&#34;&gt;Shiny-r2d3-app&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In this app, we can click on the bars and see the &lt;code&gt;DT&lt;/code&gt; object update based on the value of the bar. When the drop-down changes, the plot will update with a nice transition, as well.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;d3-is-hard&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;“D3 is hard”&lt;/h2&gt;
&lt;p&gt;The title is a quote from a luminary in the R community. A few months ago, I told him that I wanted to start using &lt;code&gt;r2d3&lt;/code&gt; but was struggling with making heads or tails of D3. This person has forgotten more than I will ever learn about pretty much any subject. If he says it’s hard, then I’m in for a world of hurt. Nevertheless, my naivete and stubbornness prevailed.&lt;/p&gt;
&lt;p&gt;I’ve since discovered that D3 is a JavaScript library with which the desired result can be obtained through one of several coding approaches. The more I learn to use it, the more I like its flexibility as a stand-alone visualization tool.&lt;/p&gt;
&lt;p&gt;One thing that helped was realizing that D3 and &lt;code&gt;ggplot2&lt;/code&gt; offer a similar amount of flexibility. Picture the bars of a bar plot as the actual rectangles you draw, almost as if you were using &lt;code&gt;geom_rect()&lt;/code&gt;. Except that in D3, the 0,0 coordinate is the top-left corner, as opposed to the bottom-left, so we have to flip our thinking upside down when we create a visualization with D3. In addition, the vertical and horizontal positions and sizes are expressed as fractions of the canvas size, so there are no absolute positions.&lt;/p&gt;
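&lt;p&gt;As a quick illustration of the flipped coordinate system (the numbers are made up and unrelated to the app below): to draw a vertical bar of height 30 on a 100-pixel-tall canvas, the rectangle’s &lt;code&gt;y&lt;/code&gt; attribute is its distance from the &lt;em&gt;top&lt;/em&gt; of the canvas, so we subtract from the canvas height:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;svg_height &amp;lt;- 100  # canvas height in pixels (hypothetical)
bar_height &amp;lt;- 30

# In ggplot2 terms the bar rises from y = 0; in D3, y is measured from the top
y_attr &amp;lt;- svg_height - bar_height
y_attr&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 70&lt;/code&gt;&lt;/pre&gt;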
&lt;/div&gt;
&lt;div id=&#34;a-good-way-to-start&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;A good way to start&lt;/h2&gt;
&lt;p&gt;After trying out several approaches, I think that a good way to start is by having a few “primer” D3 scripts that can be modified to suit a particular app.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;r2d3&lt;/code&gt; calls a D3 script with a &lt;code&gt;.js&lt;/code&gt; extension. As a result, the D3 code sits outside the R script, away from view. With &lt;code&gt;r2d3&lt;/code&gt;, a &lt;code&gt;data.frame&lt;/code&gt; can be used to pass all sorts of attributes (x/y coordinates, colors, etc.) to D3.&lt;/p&gt;
&lt;p&gt;A good way of thinking about these “primers” is that you are building your own &lt;code&gt;geom&lt;/code&gt;s as &lt;code&gt;.js&lt;/code&gt; scripts. So, once it’s done, you can pass the regular “right-side-up” coordinate data to &lt;code&gt;r2d3&lt;/code&gt; and it will know how to calculate the proper offsets to place the shapes in the correct spot.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;a-first-primer&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;A first primer&lt;/h2&gt;
&lt;p&gt;The idea in this section is to provide the smallest possible example that covers what I feel are the most important pieces that make up a presentable and functional product. My hope is that, if you find this interesting and useful for your line of work, you will take your time to dissect what each code section does, to learn the principles of this approach. This way, you can customize and even expand on the primer.&lt;/p&gt;
&lt;p&gt;The first example below is not the full primer. Instead, it is the section where most of the nuances of how the primer works exist. I’ll use that to explain some of the mechanics.&lt;/p&gt;
&lt;p&gt;You can copy-paste the following code in your R session and run it without worrying about file dependencies. I know how important that is when learning new things, so instead of providing &lt;code&gt;r2d3&lt;/code&gt; a separate &lt;code&gt;.js&lt;/code&gt; file, I use a small workaround: the D3 script is kept in a character variable whose contents are saved to a temporary file. This is probably not something that you’ll do in a final Shiny app, but it works well for this example. Based on how the R Views code highlighter is set up, all of the D3 code will be in red, and the R code mostly in black:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(shiny)
library(dplyr)
library(r2d3)
library(forcats)

# D3 code inside an R character variable
r2d3_script &amp;lt;- &amp;quot;
// !preview r2d3 data= data.frame(y = 0.1, ylabel = &amp;#39;1%&amp;#39;, fill = &amp;#39;#E69F00&amp;#39;, mouseover = &amp;#39;green&amp;#39;, label = &amp;#39;one&amp;#39;, id = 1)
function svg_height() {return parseInt(svg.style(&amp;#39;height&amp;#39;))}
function svg_width()  {return parseInt(svg.style(&amp;#39;width&amp;#39;))}
function col_top()  {return svg_height() * 0.05; }
function col_left() {return svg_width()  * 0.20; }
function actual_max() {return d3.max(data, function (d) {return d.y; }); }
function col_width()  {return (svg_width() / actual_max()) * 0.55; }
function col_height() {return svg_height() / data.length * 0.95; }
var bars = svg.selectAll(&amp;#39;rect&amp;#39;).data(data);
bars.enter().append(&amp;#39;rect&amp;#39;)
    .attr(&amp;#39;x&amp;#39;,      col_left())
    .attr(&amp;#39;y&amp;#39;,      function(d, i) { return i * col_height() + col_top(); })
    .attr(&amp;#39;width&amp;#39;,  function(d) { return d.y * col_width(); })
    .attr(&amp;#39;height&amp;#39;, col_height() * 0.9)
    .attr(&amp;#39;fill&amp;#39;,   function(d) {return d.fill; })
    .attr(&amp;#39;id&amp;#39;,     function(d) {return (d.label); })
    .on(&amp;#39;click&amp;#39;, function(){
      Shiny.setInputValue(&amp;#39;bar_clicked&amp;#39;, d3.select(this).attr(&amp;#39;id&amp;#39;), {priority: &amp;#39;event&amp;#39;});
    })
    .on(&amp;#39;mouseover&amp;#39;, function(){
      d3.select(this).attr(&amp;#39;fill&amp;#39;, function(d) {return d.mouseover; });
    })
    .on(&amp;#39;mouseout&amp;#39;, function(){
      d3.select(this).attr(&amp;#39;fill&amp;#39;, function(d) {return d.fill; });
    });
&amp;quot;
# Save D3 code into a tempfile
r2d3_file &amp;lt;- tempfile()
writeLines(r2d3_script, r2d3_file)

# Shiny app starts here
ui &amp;lt;- fluidPage(
    d3Output(&amp;quot;d3&amp;quot;)
)

server &amp;lt;- function(input, output, session) {
    output$d3 &amp;lt;- renderD3({
        gss_cat %&amp;gt;%
            group_by(marital) %&amp;gt;%
            tally() %&amp;gt;%
            arrange(desc(n)) %&amp;gt;%
            mutate(
                y = n,
                ylabel = prettyNum(n, big.mark = &amp;quot;,&amp;quot;),
                fill = &amp;quot;#E69F00&amp;quot;,
                mouseover = &amp;quot;#0072B2&amp;quot;
            ) %&amp;gt;%
            r2d3(r2d3_file)
            # ^^ Use the temp file containing the D3 code
    })}

shinyApp(ui = ui, server = server)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result should look like the screenshot below. In your R session, hovering over a bar will change its color. Also notice that the bars do not cover the entire window. This is because of the size ratios built into the functions defined at the top of the script.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-09-17-shiny-r2d3/first.png&#34; /&gt;

&lt;/div&gt;
&lt;div id=&#34;code-breakdown&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Code breakdown&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;First, is the D3 code:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I start by defining some canvas-sizing functions, beginning with: &lt;code&gt;function svg_height() {return parseInt(svg.style(&#39;height&#39;))}&lt;/code&gt;. These allow for the correct relative placement and size, as well as adapting to a window resize. For example: &lt;code&gt;function actual_max() {return d3.max(data, function (d) {return d.y; }); }&lt;/code&gt; obtains the value of the longest bar, and then: &lt;code&gt;function col_width()  {return (svg_width() / actual_max()) * 0.55; }&lt;/code&gt; makes sure that the largest rectangle (representing a bar) drawn is 55% of the window’s width. I used to define these as regular D3 variables, but found that as functions, they worked more consistently when running with Shiny.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;With &lt;code&gt;var bars = svg.selectAll(&#39;rect&#39;).data(data);&lt;/code&gt;, we create a new rectangle - better said, a new rectangle set. Just like with &lt;code&gt;geom_rect()&lt;/code&gt;, if you pass a vector with multiple values, it will create multiple rectangles. The last function, &lt;code&gt;data()&lt;/code&gt;, tells D3 to use the &lt;code&gt;data&lt;/code&gt; data set, which is the default name that &lt;code&gt;r2d3&lt;/code&gt; is using when it translates our &lt;code&gt;data.frame&lt;/code&gt; to a D3-friendly format. This is the “secret sauce” that allows us to use that data as attributes of the plot.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The rectangles are initially drawn with: &lt;code&gt;bars.enter().append(&#39;rect&#39;)&lt;/code&gt;. This will work fine as long as nothing changes. But with Shiny, we want change, so in a later section, I will introduce the &lt;code&gt;bars.transition()&lt;/code&gt; function.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next are the attributes (&lt;code&gt;.attr&lt;/code&gt;). Attributes are interesting in these kinds of objects. They are all named with a character value (&lt;code&gt;x&lt;/code&gt;, &lt;code&gt;fill&lt;/code&gt;, etc.), so naming is essentially free-form. Each type of D3 shape has its own set of expected attributes, such as &lt;code&gt;x&lt;/code&gt;, &lt;code&gt;y&lt;/code&gt;, and &lt;code&gt;width&lt;/code&gt;, but I can also pass a “made-up” attribute and the script will not fail. If you pass an attribute with a “reserved” name for the shape, it will be used; for example, &lt;code&gt;r&lt;/code&gt; is the attribute for the radius of a D3 circle. But if the attribute does not exist, it just becomes metadata that we can use later on if we want. This comes in handy if we want an ID field to be passed to Shiny without displaying it in the plot. The downside is that a misspelled attribute fails silently, which makes debugging a bit difficult, so make sure that your attributes are spelled correctly! In our example, defining &lt;code&gt;x&lt;/code&gt; is easy because we want it to be as far to the left as possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Most attributes are set based on data passed via &lt;code&gt;r2d3&lt;/code&gt;. We do that by wrapping the value of the attribute inside a function. We already told D3 where the data comes from, so it is implied that in &lt;code&gt;function(d)&lt;/code&gt; the data object will be represented by &lt;code&gt;d&lt;/code&gt;. Another interesting thing about these functions is the second argument, usually represented by &lt;code&gt;i&lt;/code&gt;. It represents the “row number” of the observation. This means that a function like &lt;code&gt;function(d, i) { return d.x * i}&lt;/code&gt; will give the attribute the value of the &lt;code&gt;x&lt;/code&gt; variable of the &lt;code&gt;data.frame&lt;/code&gt; we passed to &lt;code&gt;r2d3&lt;/code&gt;, times the row number. So &lt;code&gt;.attr(&#39;fill&#39;,   function(d) {return d.fill; })&lt;/code&gt; simply passes the &lt;code&gt;fill&lt;/code&gt; value of our &lt;code&gt;data.frame&lt;/code&gt; to D3. Notice that we can name these fields whatever we want; we just need to map them appropriately. With a primer, I found that it’s better to keep either matching (or at the very least, generic) names so we can use them for other plots.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;on()&lt;/code&gt; functions track named events, such as &lt;code&gt;click&lt;/code&gt;, &lt;code&gt;mouseover&lt;/code&gt;, and &lt;code&gt;mouseout&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;click&lt;/code&gt; function will use a Shiny JavaScript function that makes the interaction possible. In &lt;code&gt;Shiny.setInputValue(&#39;bar_clicked&#39;, d3.select(this).attr(&#39;id&#39;), {priority: &#39;event&#39;});&lt;/code&gt;, I specify the name of the input inside Shiny, so &lt;code&gt;bar_clicked&lt;/code&gt; becomes &lt;code&gt;input$bar_clicked&lt;/code&gt; in R. The attribute &lt;code&gt;id&lt;/code&gt; is the value passed to R via that input. This is only a brief introduction to the topic; a much more detailed explanation with illustrations can be found in the &lt;a href=&#34;https://rstudio.github.io/r2d3/articles/shiny.html#d3-to-shiny&#34;&gt;&lt;code&gt;r2d3&lt;/code&gt; site&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;mouseover&lt;/code&gt; and &lt;code&gt;mouseout&lt;/code&gt; events are used to get the color-changing, hover-over effect. On &lt;code&gt;mouseover&lt;/code&gt;, the &lt;code&gt;fill&lt;/code&gt; attribute is updated to use the highlighting color and then restore it to the original color when the pointer leaves with &lt;code&gt;mouseout&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;For the R/Shiny code:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;As mentioned above, using &lt;code&gt;r2d3_file &amp;lt;- tempfile()&lt;/code&gt; and then &lt;code&gt;writeLines(r2d3_script, r2d3_file)&lt;/code&gt; is done to keep the D3 and R code in one location. This allows you to copy and run the script without worrying about dependencies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;r2d3&lt;/code&gt; includes functions to interact with Shiny. The &lt;code&gt;d3Output()&lt;/code&gt; function is used in the &lt;code&gt;ui&lt;/code&gt; section of the app, and &lt;code&gt;renderD3()&lt;/code&gt; is used in the &lt;code&gt;server&lt;/code&gt; section of the app.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using &lt;code&gt;dplyr&lt;/code&gt;, the &lt;code&gt;forcats::gss_cat&lt;/code&gt; data is transformed to fit what the primer expects. In other words, the variable containing the total count obtained with &lt;code&gt;tally()&lt;/code&gt; is renamed to &lt;code&gt;y&lt;/code&gt;. Additionally, new fields are added to specify the colors. A note about colors in D3: you can pass color names (“red”) or hex codes (“#E69F00”). Some additional tips for hex color selection can be found in the &lt;a href=&#34;http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#a-colorblind-friendly-palette&#34;&gt;&lt;code&gt;ggplot2&lt;/code&gt; cookbook&lt;/a&gt;. A very nice application for testing different color schemes and exploring contrast with different color deficiencies is available &lt;a href=&#34;http://projects.susielu.com/viz-palette&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Thanks to the fact that the &lt;code&gt;r2d3()&lt;/code&gt; function uses the data as its first argument, we can simply pipe (&lt;code&gt;%&amp;gt;%&lt;/code&gt;) the &lt;code&gt;dplyr&lt;/code&gt; transformations directly to it. The only argument to pass to &lt;code&gt;r2d3()&lt;/code&gt; is the location of the new temporary file.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;the-full-example&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The full example&lt;/h2&gt;
&lt;p&gt;Here is the full code for the sample app linked above. The D3 script is what I would consider a more complete “primer” that you can use in other apps. Copy and run the code to try out the Shiny app; as mentioned before, it should run without having to worry about any other file dependencies. More explanation and code breakdown is available after this code section:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(shiny)
library(dplyr)
library(r2d3)
library(forcats)
library(DT)
library(rlang)

r2d3_script &amp;lt;- &amp;quot;
// !preview r2d3 data= data.frame(y = 0.1, ylabel = &amp;#39;1%&amp;#39;, fill = &amp;#39;#E69F00&amp;#39;, mouseover = &amp;#39;green&amp;#39;, label = &amp;#39;one&amp;#39;, id = 1)
function svg_height() {return parseInt(svg.style(&amp;#39;height&amp;#39;))}
function svg_width()  {return parseInt(svg.style(&amp;#39;width&amp;#39;))}
function col_top()  {return svg_height() * 0.05; }
function col_left() {return svg_width()  * 0.20; }
function actual_max() {return d3.max(data, function (d) {return d.y; }); }
function col_width()  {return (svg_width() / actual_max()) * 0.55; }
function col_heigth() {return svg_height() / data.length * 0.95; }

var bars = svg.selectAll(&amp;#39;rect&amp;#39;).data(data);
bars.enter().append(&amp;#39;rect&amp;#39;)
    .attr(&amp;#39;x&amp;#39;,      col_left())
    .attr(&amp;#39;y&amp;#39;,      function(d, i) { return i * col_heigth() + col_top(); })
    .attr(&amp;#39;width&amp;#39;,  function(d) { return d.y * col_width(); })
    .attr(&amp;#39;height&amp;#39;, col_heigth() * 0.9)
    .attr(&amp;#39;fill&amp;#39;,   function(d) {return d.fill; })
    .attr(&amp;#39;id&amp;#39;,     function(d) {return (d.label); })
    .on(&amp;#39;click&amp;#39;, function(){
      Shiny.setInputValue(&amp;#39;bar_clicked&amp;#39;, d3.select(this).attr(&amp;#39;id&amp;#39;), {priority: &amp;#39;event&amp;#39;});
    })
    .on(&amp;#39;mouseover&amp;#39;, function(){
      d3.select(this).attr(&amp;#39;fill&amp;#39;, function(d) {return d.mouseover; });
    })
    .on(&amp;#39;mouseout&amp;#39;, function(){
      d3.select(this).attr(&amp;#39;fill&amp;#39;, function(d) {return d.fill; });
    });
bars.transition()
  .duration(500)
    .attr(&amp;#39;x&amp;#39;,      col_left())
    .attr(&amp;#39;y&amp;#39;,      function(d, i) { return i * col_heigth() + col_top(); })
    .attr(&amp;#39;width&amp;#39;,  function(d) { return d.y * col_width(); })
    .attr(&amp;#39;height&amp;#39;, col_heigth() * 0.9)
    .attr(&amp;#39;fill&amp;#39;,   function(d) {return d.fill; })
    .attr(&amp;#39;id&amp;#39;,     function(d) {return d.label; });
bars.exit().remove();

// Identity labels
var txt = svg.selectAll(&amp;#39;text&amp;#39;).data(data);
txt.enter().append(&amp;#39;text&amp;#39;)
    .attr(&amp;#39;x&amp;#39;, width * 0.01)
    .attr(&amp;#39;y&amp;#39;, function(d, i) { return i * col_heigth() + (col_heigth() / 2) + col_top(); })
    .text(function(d) {return d.label; })
    .style(&amp;#39;font-family&amp;#39;, &amp;#39;sans-serif&amp;#39;);
txt.transition()
    .duration(1000)
    .attr(&amp;#39;x&amp;#39;, width * 0.01)
    .attr(&amp;#39;y&amp;#39;, function(d, i) { return i * col_heigth() + (col_heigth() / 2) + col_top(); })
    .text(function(d) {return d.label; });
txt.exit().remove();

// Numeric labels
var totals = svg.selectAll().data(data);
totals.enter().append(&amp;#39;text&amp;#39;)
    .attr(&amp;#39;x&amp;#39;, function(d) { return ((d.y * col_width()) + col_left()) * 1.01; })
    .attr(&amp;#39;y&amp;#39;, function(d, i) { return i * col_heigth() + (col_heigth() / 2) + col_top(); })
    .style(&amp;#39;font-family&amp;#39;, &amp;#39;sans-serif&amp;#39;)
    .text(function(d) {return d.ylabel; });
totals.transition()
    .duration(1000)
    .attr(&amp;#39;x&amp;#39;, function(d) { return ((d.y * col_width()) + col_left()) * 1.01; })
    .attr(&amp;#39;y&amp;#39;, function(d, i) { return i * col_heigth() + (col_heigth() / 2) + col_top(); })
    .attr(&amp;#39;d&amp;#39;, function(d) { return d.x; })
    .text(function(d) {return d.ylabel; });
totals.exit().remove();
&amp;quot;
r2d3_file &amp;lt;- tempfile()
writeLines(r2d3_script, r2d3_file)

ui &amp;lt;- fluidPage(
  selectInput(&amp;quot;var&amp;quot;, &amp;quot;Variable&amp;quot;,
              list(&amp;quot;marital&amp;quot;, &amp;quot;rincome&amp;quot;, &amp;quot;partyid&amp;quot;, &amp;quot;relig&amp;quot;, &amp;quot;denom&amp;quot;),
              selected = &amp;quot;marital&amp;quot;),
  d3Output(&amp;quot;d3&amp;quot;),
  DT::dataTableOutput(&amp;quot;table&amp;quot;),
  textInput(&amp;quot;val&amp;quot;, &amp;quot;Value&amp;quot;, &amp;quot;Married&amp;quot;)
)

server &amp;lt;- function(input, output, session) {
  output$d3 &amp;lt;- renderD3({
    gss_cat %&amp;gt;%
      mutate(label = !!sym(input$var)) %&amp;gt;%
      group_by(label) %&amp;gt;%
      tally() %&amp;gt;%
      arrange(desc(n)) %&amp;gt;%
      mutate(
        y = n,
        ylabel = prettyNum(n, big.mark = &amp;quot;,&amp;quot;),
        fill = ifelse(label != input$val, &amp;quot;#E69F00&amp;quot;, &amp;quot;red&amp;quot;),
        mouseover = &amp;quot;#0072B2&amp;quot;
      ) %&amp;gt;%
      r2d3(r2d3_file)
  })
  observeEvent(input$bar_clicked, {
      updateTextInput(session, &amp;quot;val&amp;quot;, value = input$bar_clicked)
  })
  output$table &amp;lt;- renderDataTable({
    gss_cat %&amp;gt;%
      filter(!!sym(input$var) == input$val) %&amp;gt;%
      datatable()
  })
}

shinyApp(ui = ui, server = server)&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;additions-to-d3-code&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Additions to D3 code&lt;/h3&gt;
&lt;p&gt;Hopefully, you can see a coding pattern emerging in the more lengthy example above. Here are some explanations for items that are new or outside the pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;bars.transition()&lt;/code&gt; function “re-draws” the shape or text when the underlying data changes, such as when we make a change within the Shiny app. The &lt;code&gt;duration()&lt;/code&gt; function defines how long the changes take. Be sure to copy all of the attributes from the &lt;code&gt;enter()&lt;/code&gt; function; this is needed when adding D3 plots to a Shiny app.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;var txt = svg.selectAll(&#39;text&#39;).data(data);&lt;/code&gt; code adds a new text object, similar to &lt;code&gt;geom_text()&lt;/code&gt;. The same coding pattern as the &lt;code&gt;rect&lt;/code&gt; shape applies. The additions are: a &lt;code&gt;text()&lt;/code&gt; function that defines what is displayed on screen (note that there’s no &lt;code&gt;attr(&#39;text&#39;,...&lt;/code&gt;), and the &lt;code&gt;style()&lt;/code&gt; function that allows setting the font type and size.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;setting-up-the-shiny-interactivity&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Setting up the Shiny interactivity&lt;/h3&gt;
&lt;p&gt;There are three options to integrate the Shiny input created inside the D3 script:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;Have a given Shiny &lt;code&gt;output&lt;/code&gt; react to the D3/Shiny input. An example would be to use it as a value to filter data in &lt;code&gt;filter(id_field == input$bar_clicked)&lt;/code&gt;. This works OK when there are not too many plots to integrate, but for a large dashboard, the second option would be better. An example of this approach can be found &lt;a href=&#34;https://rstudio.github.io/r2d3/articles/shiny.html#shiny-code&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Shiny’s &lt;code&gt;observeEvent()&lt;/code&gt; to monitor the D3/Shiny input and have it run a specific action based on the value of the input. I usually use this approach to update another Shiny input in the app, and that is the approach used in this app.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use the &lt;code&gt;reactive()&lt;/code&gt; function to wrap all of the data transformations that are common across all of the plots inside the dashboard. Then have each plot use that function as the base of further &lt;code&gt;dplyr&lt;/code&gt; transformations. That approach can be found in the &lt;a href=&#34;http://db.rstudio.com/best-practices/dashboards/&#34;&gt;Enterprise Dashboards article&lt;/a&gt; on db.rstudio.com; here is &lt;a href=&#34;https://github.com/sol-eng/db-dashboard/blob/f3f42eabe722207510e6670ad81e36722e0b3d44/local_app.R#L91-L116&#34;&gt;a direct link to the code&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
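&lt;p&gt;As a brief sketch of the third option (not part of the app above; the &lt;code&gt;base_data&lt;/code&gt; name is hypothetical), the shared transformation is wrapped in &lt;code&gt;reactive()&lt;/code&gt; and each output builds on it:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;server &amp;lt;- function(input, output, session) {
  # Shared base transformation, recomputed only when input$var changes
  base_data &amp;lt;- reactive({
    gss_cat %&amp;gt;%
      group_by(label = !!sym(input$var)) %&amp;gt;%
      tally()
  })
  # Each output starts from the shared reactive
  output$d3 &amp;lt;- renderD3({
    base_data() %&amp;gt;%
      mutate(y = n) %&amp;gt;%
      r2d3(r2d3_file)
  })
  output$table &amp;lt;- renderDataTable({
    base_data() %&amp;gt;% datatable()
  })
}&lt;/code&gt;&lt;/pre&gt;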
&lt;/div&gt;
&lt;div id=&#34;other-r-additions&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Other R additions&lt;/h3&gt;
&lt;p&gt;A few additional tips that are helpful, but not mandatory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;To get the effect of keeping the selected bar with a different color than the others, I used an &lt;code&gt;ifelse()&lt;/code&gt; inside the &lt;code&gt;mutate()&lt;/code&gt; that checks if a particular row matches to the selected &lt;code&gt;input&lt;/code&gt;: &lt;code&gt;fill = ifelse(label != input$val, &amp;quot;#E69F00&amp;quot;, &amp;quot;red&amp;quot;)&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In this line: &lt;code&gt;mutate(label = !!sym(input$var))&lt;/code&gt;, I am using &lt;code&gt;rlang&lt;/code&gt;’s tidy evaluation to allow the plot to change the field it displays. This is a very rare requirement in an app, so I hope that it doesn’t throw anyone off; it is an advanced R programming concept that is not necessary for D3/Shiny.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I decided to use separate fields for the total count (&lt;code&gt;y&lt;/code&gt;) and for the label shown on that bar (&lt;code&gt;ylabel&lt;/code&gt;). It was easier for me to handle the formatting in R than in D3; some may decide to do that in the D3 script instead.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;rstudio-1.2&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;RStudio 1.2&lt;/h2&gt;
&lt;p&gt;If you have the RStudio IDE Preview Release installed, you can easily preview the D3 visualization right in the Viewer pane. Information on how to do this is &lt;a href=&#34;https://rstudio.github.io/r2d3/#d3-preview&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the first line of the script above, there is a D3 comment containing metadata that RStudio passes to &lt;code&gt;r2d3&lt;/code&gt; so that you do not have to run R code in the console to see a preview. This integration also lets us use the IDE to edit the D3 file, which accelerates learning D3.&lt;/p&gt;
&lt;p&gt;To try this out with the visualization above, copy and paste the contents of the &lt;code&gt;r2d3_script&lt;/code&gt; variable to a new D3 file inside the RStudio IDE.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;closing-words&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Closing words&lt;/h2&gt;
&lt;p&gt;Thank you for making it this far! Even if you were just skimming, I hope one or two things I’ve shown were interesting enough to consider trying out the exercise.&lt;/p&gt;
&lt;p&gt;Sometimes, we forget how far we have progressed on a subject and forget what it feels like to begin the learning process. Hopefully these explanations avoid this pitfall and will simplify your learning experience. Please feel free to ask questions or start a topic of discussion at &lt;a href=&#34;https://community.rstudio.com/&#34;&gt;community.rstudio.com&lt;/a&gt;, where many are happy to help!&lt;/p&gt;
&lt;p&gt;Here are some additional links to resources that you may want to check out. The first two I wrote for RStudio documentation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://rstudio.github.io/r2d3/articles/shiny.html&#34;&gt;Using r2d3 with Shiny&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://db.rstudio.com/best-practices/dashboards/#using-r2d3-for-interactivity-and-drill-down&#34;&gt;Enterprise-ready dashboards&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a href=&#34;https://github.com/d3/d3/blob/master/API.md&#34;&gt;D3 API reference&lt;/a&gt; is really good; I use it often.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a href=&#34;https://rstudio.github.io/r2d3/&#34;&gt;&lt;code&gt;r2d3&lt;/code&gt; site&lt;/a&gt; has a great Gallery and articles to review. It has a section about &lt;a href=&#34;https://rstudio.github.io/r2d3/articles/learning_d3.html&#34;&gt;learning D3&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/09/20/shiny-r2d3/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Slack and Plumber, Part One</title>
      <link>https://rviews.rstudio.com/2018/08/30/slack-and-plumber-part-one/</link>
      <pubDate>Thu, 30 Aug 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/08/30/slack-and-plumber-part-one/</guid>
      <description>
        


&lt;p&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2018/07/23/rest-apis-and-plumber/&#34;&gt;the previous post&lt;/a&gt;, we introduced &lt;a href=&#34;https://www.rplumber.io&#34;&gt;&lt;code&gt;plumber&lt;/code&gt;&lt;/a&gt; as a way to expose R processes and programs to external systems via REST API endpoints. In this post, we’ll go further by building out an API that powers a &lt;a href=&#34;https://api.slack.com/slash-commands&#34;&gt;Slack slash command&lt;/a&gt;, all from within R using &lt;code&gt;plumber&lt;/code&gt;. A subsequent post will outline deploying and securing the API.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-08-15-blair-plumber-slack-files/slash-command-preview.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;We will create an API built on top of simulated customer call data that powers a slash command. This command allows users to view a customer status report within Slack. As shown, this status report contains the customer name, total calls, date of birth, and a plot of call history for the past 20 weeks. The simulated data, along with the script used to create it, can be found in the &lt;a href=&#34;https://github.com/sol-eng/plumber-slack&#34;&gt;GitHub repository&lt;/a&gt; for this example.&lt;/p&gt;
&lt;div id=&#34;setup&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Setup&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://slack.com&#34;&gt;Slack&lt;/a&gt; is a commonly used communication tool that’s highly customizable through various integrations. It’s even possible to build your own integrations, which is what we’ll be doing here. In order to build a Slack app, you need to have a Slack account and follow &lt;a href=&#34;https://api.slack.com/slack-apps&#34;&gt;the instructions for creating an app&lt;/a&gt;. In this example, we will build an app that includes a slash command that users can access by typing &lt;code&gt;/&amp;lt;command-name&amp;gt;&lt;/code&gt; into a Slack message.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;the-slack-request&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The Slack request&lt;/h2&gt;
&lt;p&gt;In this scenario, we’re building an API that will interact with a known request. This means that we need to understand the nature of the incoming request so that we can appropriately handle it within the API. Slack provides &lt;a href=&#34;https://api.slack.com/slash-commands#app_command_handling&#34;&gt;some documentation&lt;/a&gt; about the request that is sent when a slash command is invoked. In short, an HTTP POST request is made that contains a URL-encoded data payload. An example data payload looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;token=gIkuvaNzQIHg97ATvDxqgjtO
&amp;amp;team_id=T0001
&amp;amp;team_domain=example
&amp;amp;enterprise_id=E0001
&amp;amp;enterprise_name=Globular%20Construct%20Inc
&amp;amp;channel_id=C2147483705
&amp;amp;channel_name=test
&amp;amp;user_id=U2147483697
&amp;amp;user_name=Steve
&amp;amp;command=/weather
&amp;amp;text=94070
&amp;amp;response_url=https://hooks.slack.com/commands/1234/5678
&amp;amp;trigger_id=13345224609.738474920.8088930838d88f008e0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There’s a lot of detail included in the Slack request, and the &lt;a href=&#34;https://api.slack.com/slash-commands#app_command_handling&#34;&gt;Slack documentation&lt;/a&gt; provides details about each field. We’re mainly interested in the &lt;code&gt;text&lt;/code&gt; field, which contains the text entered into Slack after the slash command. In the above example, the user entered &lt;code&gt;/weather 94070&lt;/code&gt; into Slack, so the request indicates that the command was &lt;code&gt;/weather&lt;/code&gt; and the text was &lt;code&gt;94070&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Note that this approach differs from building an API that is not designed around a known request or specification. In those cases, we are free to expose endpoints and return data in whatever way seems most beneficial to downstream consumers, and we would provide those consumers with an understanding of how the API handles requests and what types of responses it generates so that they can interact with it appropriately. In this example, however, the design of our API is, in part, dictated by the specification Slack provides.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;building-the-api&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Building the API&lt;/h2&gt;
&lt;p&gt;Now that we have an understanding of what is included in the incoming request, we can begin to build out the API using &lt;code&gt;plumber&lt;/code&gt;. First, we need to set up the global environment for the API by loading necessary packages and global objects, including the simulated data this API is built on. In reality, this data would likely come from an external database accessed via an ODBC connection.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Packages ----
library(plumber)
library(magrittr)
library(ggplot2)

# Data ----
# Load sample customer data
sim_data &amp;lt;- readr::read_rds(&amp;quot;data/sim-data.rds&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following diagram outlines what we want to build.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-08-15-blair-plumber-slack-files/plumber-slack-architecture.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;In essence, an incoming request passes through two &lt;a href=&#34;https://www.rplumber.io/docs/routing-and-input.html#filters&#34;&gt;filters&lt;/a&gt; before reaching an endpoint. The first filter is responsible for routing incoming requests to the correct endpoint, so that a single slash command can serve multiple endpoints without the need to create a separate command for each service. The second filter simply logs details about the request for future review.&lt;/p&gt;
&lt;div id=&#34;filters&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Filters&lt;/h3&gt;
&lt;p&gt;The first filter is responsible for parsing the incoming request and ensuring it is assigned to the appropriate endpoint. This is done because when a slash command is created in Slack, there is only one endpoint defined for requests made from the command. This filter enables several endpoints to be utilized by the same slash command by parsing the incoming &lt;code&gt;text&lt;/code&gt; of the command and treating the first value of that command as the endpoint to which the request should be routed.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#* Parse the incoming request and route it to the appropriate endpoint
#* @filter route-endpoint
function(req, text = &amp;quot;&amp;quot;) {
  # Identify endpoint
  split_text &amp;lt;- urltools::url_decode(text) %&amp;gt;%
    strsplit(&amp;quot; &amp;quot;) %&amp;gt;%
    unlist()
  
  if (length(split_text) &amp;gt;= 1) {
    endpoint &amp;lt;- split_text[[1]]
    
    # Modify request with updated endpoint
    req$PATH_INFO &amp;lt;- paste0(&amp;quot;/&amp;quot;, endpoint)
    
    # Modify request with remaining commands from text
    req$ARGS &amp;lt;- split_text[-1] %&amp;gt;% 
      paste0(collapse = &amp;quot; &amp;quot;)
  }
  
  # Forward request 
  forward()
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This filter requires an understanding of the &lt;a href=&#34;https://www.rplumber.io/docs/routing-and-input.html#the-request-object&#34;&gt;&lt;code&gt;req&lt;/code&gt; object&lt;/a&gt;. It’s important to note that a few things happen in this filter. First, we parse the &lt;code&gt;text&lt;/code&gt; argument and use the first part of &lt;code&gt;text&lt;/code&gt; as the &lt;code&gt;req$PATH_INFO&lt;/code&gt;, which tells &lt;code&gt;plumber&lt;/code&gt; where to route the request. Second, we take anything remaining from &lt;code&gt;text&lt;/code&gt; and attach it to the request in &lt;code&gt;req$ARGS&lt;/code&gt;. This means that any downstream filters or endpoints will have access to &lt;code&gt;req$ARGS&lt;/code&gt;. The second filter is taken straight from the &lt;a href=&#34;https://www.rplumber.io/docs/routing-and-input.html#forward-to-another-handler&#34;&gt;&lt;code&gt;plumber&lt;/code&gt; documentation&lt;/a&gt; and simply logs details about the incoming request.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#* Log information about the incoming request
#* @filter logger
function(req){
  cat(as.character(Sys.time()), &amp;quot;-&amp;quot;, 
      req$REQUEST_METHOD, req$PATH_INFO, &amp;quot;-&amp;quot;, 
      req$HTTP_USER_AGENT, &amp;quot;@&amp;quot;, req$REMOTE_ADDR, &amp;quot;\n&amp;quot;)
  
  # Forward request
  forward()
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;endpoints&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Endpoints&lt;/h3&gt;
&lt;p&gt;There are a few endpoints we need to define. First, we need to define an endpoint that provides a response Slack can understand and interpret into a message. In this case, we’re going to return a &lt;a href=&#34;https://www.json.org&#34;&gt;JSON object&lt;/a&gt; that Slack interprets into a message with attachments. Slack provides &lt;a href=&#34;https://api.slack.com/docs/message-attachments&#34;&gt;detailed documentation&lt;/a&gt; on what fields it accepts in a response. Also note that Slack expects unboxed JSON, while the &lt;a href=&#34;https://www.rplumber.io/docs/rendering-and-output.html#boxed-vs-unboxed-json&#34;&gt;&lt;code&gt;plumber&lt;/code&gt; default&lt;/a&gt; is to return boxed JSON. In order to ensure that Slack understands the response, we set the serializer for this response to be &lt;code&gt;unboxedJSON&lt;/code&gt;.&lt;/p&gt;
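&lt;p&gt;The difference is easy to see with &lt;code&gt;jsonlite&lt;/code&gt; (the serializer &lt;code&gt;plumber&lt;/code&gt; uses); length-one vectors are boxed into JSON arrays unless &lt;code&gt;auto_unbox = TRUE&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(jsonlite)

toJSON(list(response_type = &amp;quot;ephemeral&amp;quot;))
# {&amp;quot;response_type&amp;quot;:[&amp;quot;ephemeral&amp;quot;]}    boxed: not what Slack expects

toJSON(list(response_type = &amp;quot;ephemeral&amp;quot;), auto_unbox = TRUE)
# {&amp;quot;response_type&amp;quot;:&amp;quot;ephemeral&amp;quot;}      unboxed: what Slack expects&lt;/code&gt;&lt;/pre&gt;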
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#* Return a message containing status details about the customer
#* @serializer unboxedJSON
#* @post /status
function(req, res) {
  # Check req$ARGS and match to customer - if no customer match is found, return
  # an error
  
  customer_ids &amp;lt;- unique(sim_data$id)
  customer_names &amp;lt;- unique(sim_data$name)
  
  if (!as.numeric(req$ARGS) %in% customer_ids &amp;amp; !req$ARGS %in% customer_names) {
    res$status &amp;lt;- 400
    return(
      list(
        response_type = &amp;quot;ephemeral&amp;quot;,
        text = paste(&amp;quot;Error: No customer found matching&amp;quot;, req$ARGS)
      )
    )
  }
  
  # Filter data to customer data based on provided id / name
  if (as.numeric(req$ARGS) %in% customer_ids) {
    customer_id &amp;lt;- as.numeric(req$ARGS)
    customer_data &amp;lt;- dplyr::filter(sim_data, id == customer_id)
    customer_name &amp;lt;- unique(customer_data$name)
  } else {
    customer_name &amp;lt;- req$ARGS
    customer_data &amp;lt;- dplyr::filter(sim_data, name == customer_name)
    customer_id &amp;lt;- unique(customer_data$id)
  }
  
  # Simple heuristics for customer status
  total_customer_calls &amp;lt;- sum(customer_data$calls)
  
  customer_status &amp;lt;- dplyr::case_when(total_customer_calls &amp;gt; 250 ~ &amp;quot;danger&amp;quot;,
                                      total_customer_calls &amp;gt; 130 ~ &amp;quot;warning&amp;quot;,
                                      TRUE ~ &amp;quot;good&amp;quot;)
  
  # Build response
  list(
    # response type - ephemeral indicates the response will only be seen by the
    # user who invoked the slash command as opposed to the entire channel
    response_type = &amp;quot;ephemeral&amp;quot;,
    # attachments is expected to be an array, hence the list within a list
    attachments = list(
      list(
        color = customer_status,
        title = paste0(&amp;quot;Status update for &amp;quot;, customer_name, &amp;quot; (&amp;quot;, customer_id, &amp;quot;)&amp;quot;),
        fallback = paste0(&amp;quot;Status update for &amp;quot;, customer_name, &amp;quot; (&amp;quot;, customer_id, &amp;quot;)&amp;quot;),
        # History plot
        image_url = paste0(&amp;quot;localhost:5762/plot/history/&amp;quot;, customer_id),
        # Fields provide a way of communicating semi-tabular data in Slack
        fields = list(
          list(
            title = &amp;quot;Total Calls&amp;quot;,
            value = sum(customer_data$calls),
            short = TRUE
          ),
          list(
            title = &amp;quot;DoB&amp;quot;,
            value = unique(customer_data$dob),
            short = TRUE
          )
        )
      )
    )
  )
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are three main things that happen in this endpoint. First, we check to ensure that the provided customer name or ID appear in the dataset. Next, we create a subset of the data for only the identified customer and use a simple heuristic to determine the customer’s status. Finally, we put a list together that will be serialized into JSON in response to requests made to this endpoint. This list conforms to the standards outlined by Slack.&lt;/p&gt;
&lt;p&gt;The second endpoint is used to provide the history plot that is referenced in the first endpoint. When an &lt;code&gt;image_url&lt;/code&gt; is provided in a Slack attachment, Slack uses a GET request to fetch the image from the URL. So, this endpoint responds to incoming GET requests with an image.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#* Plot customer weekly calls
#* @png
#* @param cust_id ID of the customer
#* @get /plot/history/&amp;lt;cust_id:int&amp;gt;
function(cust_id, res) {
  # Throw error if cust_id doesn&amp;#39;t exist in data
  if (!cust_id %in% sim_data$id) {
    res$status &amp;lt;- 400
    stop(&amp;quot;Customer id &amp;quot;, cust_id, &amp;quot; not found.&amp;quot;)
  }
  
  # Filter data to customer id provided
  plot_data &amp;lt;- dplyr::filter(sim_data, id == cust_id)
  
  # Customer name (id)
  customer_name &amp;lt;- paste0(unique(plot_data$name), &amp;quot; (&amp;quot;, unique(plot_data$id), &amp;quot;)&amp;quot;)
  
  # Create plot
  history_plot &amp;lt;- plot_data %&amp;gt;%
    ggplot(aes(x = time, y = calls, col = calls)) +
    ggalt::geom_lollipop(show.legend = FALSE) +
    theme_light() +
    labs(
      title = paste(&amp;quot;Weekly calls for&amp;quot;, customer_name),
      x = &amp;quot;Week&amp;quot;,
      y = &amp;quot;Calls&amp;quot;
    )
  
  # print() is necessary to render plot properly
  print(history_plot)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once these pieces are together, you can run the API either through the UI as described in the previous post, or by running &lt;code&gt;plumber::plumb(&amp;quot;plumber.R&amp;quot;)$run(port = 5762)&lt;/code&gt; from the directory containing the API.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;testing-the-api&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Testing the API&lt;/h2&gt;
&lt;p&gt;Once the API is up and running, we can test it to make sure it’s behaving as we expect. Since the main point of contact is making a POST request to the &lt;code&gt;/status&lt;/code&gt; endpoint, it’s easiest to interact with the API through &lt;code&gt;curl&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;$ curl -X POST --data &amp;#39;{&amp;quot;text&amp;quot;:&amp;quot;status 1&amp;quot;}&amp;#39; localhost:5762 | jq &amp;#39;.&amp;#39;
{
  &amp;quot;response_type&amp;quot;: &amp;quot;ephemeral&amp;quot;,
  &amp;quot;attachments&amp;quot;: [
    {
      &amp;quot;color&amp;quot;: &amp;quot;good&amp;quot;,
      &amp;quot;title&amp;quot;: &amp;quot;Status update for Rahul Wilderman IV (001)&amp;quot;,
      &amp;quot;fallback&amp;quot;: &amp;quot;Status update for Rahul Wilderman IV (001)&amp;quot;,
      &amp;quot;image_url&amp;quot;: &amp;quot;localhost:5762/plot/history/001&amp;quot;,
      &amp;quot;fields&amp;quot;: [
        {
          &amp;quot;title&amp;quot;: &amp;quot;Total Calls&amp;quot;,
          &amp;quot;value&amp;quot;: 27,
          &amp;quot;short&amp;quot;: true
        },
        {
          &amp;quot;title&amp;quot;: &amp;quot;DoB&amp;quot;,
          &amp;quot;value&amp;quot;: &amp;quot;2004-04-01&amp;quot;,
          &amp;quot;short&amp;quot;: true
        }
      ]
    }
  ]
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Success! Our API successfully routed our request to the appropriate endpoint and returned a valid JSON response. As a final check, we can visit the &lt;code&gt;image_url&lt;/code&gt; in our browser to see if the plot is properly rendered.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-08-15-blair-plumber-slack-files/plot-screen-shot.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;Everything appears to be running as expected!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this post, we used &lt;code&gt;plumber&lt;/code&gt; to create an API that can properly interact with the Slack slash command interface. In the next post, we will explore API security and deployment. Continuing with this example, we will secure our API using &lt;a href=&#34;https://api.slack.com/docs/verifying-requests-from-slack&#34;&gt;Slack’s guidelines&lt;/a&gt;, deploy the API, and finally connect Slack so that we can use our new slash command. At the conclusion of the next post, we will have a fully functioning Slack slash command, all built using R.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;James Blair is a solutions engineer at RStudio who focusses on tools, technologies, and best practices for using R in the enterprise.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/08/30/slack-and-plumber-part-one/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Learning Analytic Administration through a Sandbox</title>
      <link>https://rviews.rstudio.com/2018/08/23/learning-analytic-administration-through-a-sandbox/</link>
      <pubDate>Thu, 23 Aug 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/08/23/learning-analytic-administration-through-a-sandbox/</guid>
      <description>
        

&lt;p&gt;It all starts with sandboxes. Development sandboxes are dedicated safe spaces for experimentation and creativity. A sandbox is a place where you can go to test and break things, without the ramifications of breaking the real, important things. If you&amp;rsquo;re an analytic administrator who doesn&amp;rsquo;t have access to a sandbox or the means to get one, I recommend advocating to change that. Here are some arguments you may find helpful for why sandboxes are a powerful tool for the R admin.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sandbox experimentation develops valuable experience and promotes exposure to best practices.&lt;/li&gt;
&lt;li&gt;Sandboxes can be used to demonstrate quick wins or establish grounds for future investments.&lt;/li&gt;
&lt;li&gt;Sandboxes can increase engagement with the IT group through communicating from a more informed position.&lt;/li&gt;
&lt;li&gt;They can be instrumental in creating installation and configuration recipes for the administration of R in production.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To be an effective R admin, I have to learn through doing. In my case, this often means standing up small server instances through Amazon Web Services so that I can test out different configurations or architectures. I like to follow a fairly regimented crawl-walk-run strategy for acquiring R administration knowledge, but things still slip through the cracks.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-08-20-learning-analytic-administration-through-a-sandbox_files/crawl-walk.jpg&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;For example, I wish I had taken time to explore the very basic &lt;strong&gt;Run As :HOME_USER:&lt;/strong&gt; configuration pattern when I was first learning the ropes of Shiny Server. This solves a very interesting problem: even with Shiny Server and RStudio Server installed on the same machine, Shiny applications developed in a user&amp;rsquo;s home directory within the RStudio IDE still need to be &amp;ldquo;deployed&amp;rdquo; to the Shiny Server directory in order to be made accessible there.&lt;/p&gt;

&lt;p&gt;The Shiny Server documentation lays out a simple and elegant way to run applications as the user in whose home directory the app exists, thus circumventing the need to deploy from one location on the server to another. While this solution may not be desirable in many situations, it has great merits as a sandbox:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The single-server infrastructure can be installed and configured in minutes.&lt;/li&gt;
&lt;li&gt;It can give you and your team a quick win if you&amp;rsquo;re looking to create a proof of concept.&lt;/li&gt;
&lt;li&gt;You&amp;rsquo;ll gain exposure to the Shiny Server documentation and learn how to make edits to the default shiny-server configuration file.&lt;/li&gt;
&lt;li&gt;You can create a recipe for installation and configuration that could potentially be reused by you or others, including IT.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this post, I&amp;rsquo;ll go through the high-level steps it takes to implement this configuration as a sandbox server running on a single Amazon Web Services Elastic Compute Cloud (AWS EC2) instance. I&amp;rsquo;m going to assume you have very little experience with the technologies involved, but that you&amp;rsquo;re a tenacious R admin-in-training, hungry to learn and read whatever is necessary.&lt;/p&gt;

&lt;p&gt;Note: sandboxes can be created on all sorts of different servers. I&amp;rsquo;ve chosen an AWS EC2 instance because it is an easily accessible and commonly used cloud platform, but you could create a sandbox on your local machine with a Virtual Machine, using something like VirtualBox; use another cloud provider; or find a different solution entirely. If you already have a fresh sandbox server to play with, skip the first section and proceed straight to &lt;em&gt;Setting up the Sandbox&lt;/em&gt;.&lt;/p&gt;

&lt;h3 id=&#34;getting-started-with-amazon-web-services-and-elastic-cloud-compute&#34;&gt;Getting Started with Amazon Web Services and Elastic Compute Cloud&lt;/h3&gt;

&lt;p&gt;There are a few things you&amp;rsquo;ll need to do to get started with AWS EC2. First, you need an AWS account. That will require some initial setup and a credit card. Once you have all of that, you&amp;rsquo;ll have access to the Amazon Web Services console. This is the view of all the web services Amazon has to offer - it can be quite overwhelming to ponder. The service we&amp;rsquo;re interested in today is Elastic Compute Cloud (EC2). If you&amp;rsquo;re looking at the &lt;em&gt;All Services&lt;/em&gt; view, it should be listed under &lt;em&gt;Compute&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;On the EC2 console page, you&amp;rsquo;ll need to do a couple of things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html&#34;&gt;Create a key pair and download it&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Launch an Instance (click the blue button under “Create Instance” to go to the launch wizard)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stepping through the launch wizard, you&amp;rsquo;ll have many options. Here were my selections:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-08-20-learning-analytic-administration-through-a-sandbox_files/instance-details.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Take special note of the security group settings. I added two custom TCP rules to open port 8787 (RStudio Server&amp;rsquo;s default port) and port 3838 (Shiny Server&amp;rsquo;s default port).&lt;/p&gt;

&lt;p&gt;At this point you&amp;rsquo;re ready to launch.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;Instances&lt;/em&gt; view under &lt;em&gt;Resources&lt;/em&gt; on the main EC2 console page will show you a list of all the running EC2 instances you have in this region. Once the instance you launched is listed as &lt;em&gt;running&lt;/em&gt;, you&amp;rsquo;ll want to connect to it. Click on your instance to select it in the list; the &lt;em&gt;Connect&lt;/em&gt; button should become enabled once you do.&lt;/p&gt;

&lt;p&gt;Click the Connect button and follow the steps listed there to SSH into your EC2 instance. Congrats - you now have a fresh CentOS-flavored Linux machine to learn on and configure!&lt;/p&gt;

&lt;h3 id=&#34;setting-up-the-sandbox-installation&#34;&gt;Setting up the Sandbox: Installation&lt;/h3&gt;

&lt;p&gt;Now that you have a clean sandbox, it&amp;rsquo;s time to bring in the toys.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://aws.amazon.com/premiumsupport/knowledge-center/ec2-enable-epel/&#34;&gt;Install and enable the Extra Packages for Enterprise Linux (EPEL) repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Follow the guidelines for installing R and the shiny package library listed in the &lt;a href=&#34;https://www.rstudio.com/products/shiny/download-server/&#34;&gt;instructions for Shiny Server open source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Continue using the same instructions to download and install Shiny Server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this point, the Shiny Server service will start running with all default configurations in place. Go back to the EC2 console and your Connection dialog pane to grab the public DNS address. Navigate to that address in a web browser, using port 3838 (e.g. &lt;a href=&#34;http://ec2-public-dns:3838&#34;&gt;http://ec2-public-dns:3838&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;You should see the welcome page!&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-08-20-learning-analytic-administration-through-a-sandbox_files/welcome-shiny-server.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;The Shiny Server welcome page has two panels on the right-hand side. The top frame should feature a functional shiny application. The bottom frame is meant to show an R Markdown document, but because you haven&amp;rsquo;t yet configured the server to host those documents, it should show an error message. If hosting R Markdown documents is important to the success of your sandbox, &lt;a href=&#34;http://docs.rstudio.com/shiny-server/#r-markdown&#34;&gt;learn how to set that up&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now that Shiny Server is up and running, you&amp;rsquo;ll need to go through a similar &lt;a href=&#34;https://www.rstudio.com/products/rstudio/download-server/&#34;&gt;installation process for RStudio Server&lt;/a&gt;. Remember that our plan from the beginning was to install both these services on the same machine - don&amp;rsquo;t create a new EC2 instance for RStudio Server.&lt;/p&gt;

&lt;p&gt;Once installed, open a separate web browser window or tab and navigate to the public DNS again, but this time at port 8787 (e.g. &lt;a href=&#34;http://ec2-public-dns:8787&#34;&gt;http://ec2-public-dns:8787&lt;/a&gt;). Here you should see the RStudio Server sign-in landing page. To sign in to RStudio Server you&amp;rsquo;ll need a user and password. As the sandbox administrator, it&amp;rsquo;s your job to create this first user. Take a look at the &lt;a href=&#34;https://github.com/sol-eng/data-science-lab/blob/master/redhat/01-instance-setup.Rmd&#34;&gt;RStudio Data Science Lab manual&lt;/a&gt; for instructions on how to do this. After you create a user, verify that you can sign into the RStudio Server IDE. This is where you&amp;rsquo;ll be able to build new Shiny applications.&lt;/p&gt;

&lt;h3 id=&#34;configuring-the-sandbox&#34;&gt;Configuring the Sandbox&lt;/h3&gt;

&lt;p&gt;Shiny Server and RStudio Server should now be installed and running on your machine. The installation step is usually the easy part. Configuration tends to be harder. This is the stage where you&amp;rsquo;ll start adapting the default product so that it can perform in your particular environment. Configuration changes should be made based on your goals, architecture, and ultimately the type of experience you would like the end user to have with this software. In some cases, you may end up testing and combining configuration options from very different sources in the documentation. It can be easy to lose track of what changes were made, which is why keeping notes, or making step-by-step recipes is important.&lt;/p&gt;

&lt;p&gt;Your goal is to change the default configuration of Shiny Server so that users (Shiny developers) in the RStudio IDE can save applications to a folder in their home directory, and have those applications be run as the home user and served from that home location.&lt;/p&gt;

&lt;p&gt;To accomplish this change, &lt;a href=&#34;http://docs.rstudio.com/shiny-server/#default-configuration&#34;&gt;find and edit the shiny-server.conf file&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are two sections of the Shiny Server documentation that I found helpful in crafting my changes to the configuration file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://docs.rstudio.com/shiny-server/#run-as&#34;&gt;2.3.1 :HOME_USER:&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://docs.rstudio.com/shiny-server/#host-per-user-application-directories&#34;&gt;2.7.3 Host Per-User Application Directories&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
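&lt;p&gt;Putting those two documentation sections together, the relevant parts of &lt;code&gt;shiny-server.conf&lt;/code&gt; end up looking something like the sketch below. Treat it as a starting point rather than a recipe; confirm the exact directives against the documentation for your installed version of Shiny Server.&lt;/p&gt;

```
# Run each application as the owner of the home directory it lives in,
# falling back to the shiny user elsewhere
run_as :HOME_USER: shiny;

server {
  listen 3838;

  location / {
    # Serve applications out of each user's ~/ShinyApps directory,
    # at URLs of the form /username/appname/
    user_dirs;
  }
}
```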

&lt;p&gt;When you finish making changes to the configuration file, &lt;a href=&#34;http://docs.rstudio.com/shiny-server/#stopping-and-starting&#34;&gt;restart the shiny-server service&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Test your changes! The template &lt;em&gt;new&lt;/em&gt; Shiny application should make it easy to test your deployment configuration. My user is named &lt;em&gt;rstudio&lt;/em&gt;, and this is what the tree structure of my home directory looks like for the deployment of a Shiny application, &lt;em&gt;app1&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-08-20-learning-analytic-administration-through-a-sandbox_files/dir-tree.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;From the Shiny Server side, &lt;em&gt;app1&lt;/em&gt; is available at: &lt;a href=&#34;http://ec2-public-dns:3838/rstudio/app1/&#34;&gt;http://ec2-public-dns:3838/rstudio/app1/&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&#34;write-a-recipe-and-retire-the-sandbox&#34;&gt;Write a Recipe and Retire the Sandbox&lt;/h3&gt;

&lt;p&gt;Remember to summarize your notes &lt;a href=&#34;https://www.shellscript.sh/first.html&#34;&gt;into scripts&lt;/a&gt; that you can reuse or just save as a reference. I like to keep my installation and configuration scripts in a version control system like git so that I have a lasting record of all the changes I make over time. Don&amp;rsquo;t worry if your script isn&amp;rsquo;t perfect right now. We will cover techniques for writing recipes to meet IT standards in a later post.&lt;/p&gt;

&lt;p&gt;The final step of this process is to shut everything down. Once I declare success, make the notes I want to keep, and share any lessons learned, it&amp;rsquo;s time to terminate. If you invest in writing out a recipe script now, it shouldn&amp;rsquo;t take much time to recreate this sandbox. There&amp;rsquo;s no reason to spend money keeping it running for longer than you need it.&lt;/p&gt;

&lt;p&gt;Use the &lt;em&gt;Actions&lt;/em&gt; button in the Instances view of the EC2 console to terminate any running instances that you&amp;rsquo;re finished using.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-08-20-learning-analytic-administration-through-a-sandbox_files/terminate-instance.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;h3 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;As an analytic administrator, the job of legitimizing R and advocating for the best, cutting-edge software falls on you. This is challenging, potentially frustrating, but hopefully ultimately rewarding work. There are an infinite number of sandboxes to create and learn from; hopefully, this post will inspire you to pursue the creation and design of some of your own. Remember that sandboxes are a great tool for demonstrating the value of R as a proof-of-concept, or teaching yourself a new set of skills, but they generally aren&amp;rsquo;t meant to be taken into production.&lt;/p&gt;

&lt;p&gt;For more information on running Shiny in production in an enterprise environment, I would recommend starting with an evaluation of &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt; and &lt;a href=&#34;https://www.rstudio.com/products/rstudio-server-pro/&#34;&gt;RStudio Server Pro&lt;/a&gt;. There will also be a &lt;a href=&#34;http://www.cvent.com/events/rstudio-conf-austin/event-summary-dd6d75526f3c4554b67c4de32aeffb47.aspx&#34;&gt;workshop at RStudio conf 2019&lt;/a&gt; called &lt;em&gt;Shiny in Production | Data Products at Scale&lt;/em&gt; taught by me and my colleague, Sean Lopp, which may be of interest to you. In the meantime, if you build a cool sandbox or learn something worth sharing, we hope you&amp;rsquo;ll post about it on the &lt;a href=&#34;https://community.rstudio.com/c/r-admin&#34;&gt;RStudio community forum for R admins&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Kelly O&amp;rsquo;Briant is a solutions engineer at RStudio interested in configuration and workflow management with a passion for R administration.&lt;/em&gt;&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/08/23/learning-analytic-administration-through-a-sandbox/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>REST APIs and Plumber</title>
      <link>https://rviews.rstudio.com/2018/07/23/rest-apis-and-plumber/</link>
      <pubDate>Mon, 23 Jul 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/07/23/rest-apis-and-plumber/</guid>
      <description>
        


&lt;p&gt;Moving R resources from development to production can be a challenge, especially when the resource isn’t something like a &lt;a href=&#34;http://shiny.rstudio.com&#34;&gt;&lt;code&gt;shiny&lt;/code&gt; application&lt;/a&gt; or &lt;a href=&#34;https://rmarkdown.rstudio.com&#34;&gt;&lt;code&gt;rmarkdown&lt;/code&gt; document&lt;/a&gt; that can be easily published and consumed. Consider, as an example, a customer success model created in R. This model is responsible for taking customer data and returning a predicted outcome, like the likelihood the customer will churn. Once this model is developed and validated, there needs to be some way for the model output to be leveraged by other systems and individuals within the company.&lt;/p&gt;
&lt;p&gt;Traditionally, moving this model into production has involved one of two approaches: either running customer data through the model on a batch basis and caching the results in a database, or handing the model definition off to a development team to translate the work done in R into another language, such as Java or Scala. Both approaches have significant downsides. Batch processing works, but it misses real-time updates. For example, if the batch job runs every night and a customer calls in the next morning and has a heated conversation with support, the model output will have no record of that exchange when the customer calls the customer loyalty department later the same day to cancel their service. In essence, model output is served on a lag, which can sometimes lead to critical information loss. However, the other option requires a large investment of time and resources to convert an existing model into another language just for the purpose of exposing that model as a real-time service. Neither of these approaches is ideal; to solve this problem, the optimal solution is to expose the existing R model as a service that can be easily accessed by other parts of the organization.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.rplumber.io&#34;&gt;&lt;code&gt;plumber&lt;/code&gt;&lt;/a&gt; is an R package that allows existing R code to be exposed as a web service through special decorator comments. With minimal overhead, R programmers and analysts can use &lt;code&gt;plumber&lt;/code&gt; to create REST APIs that expose their work to any number of internal and external systems. This solution provides real-time access to processes and services created entirely in R, and can effectively eliminate the need to perform batch operations or technical hand-offs in order to move R code into production.&lt;/p&gt;
&lt;p&gt;This post will focus on a brief introduction to RESTful APIs, then an introduction to the &lt;code&gt;plumber&lt;/code&gt; package and how it can be used to expose R services as API endpoints. In subsequent posts, we’ll build a functioning web API using &lt;code&gt;plumber&lt;/code&gt; that integrates with &lt;a href=&#34;https://slack.com&#34;&gt;Slack&lt;/a&gt; and provides real-time customer status reports.&lt;/p&gt;
&lt;div id=&#34;web-apis&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Web APIs&lt;/h2&gt;
&lt;p&gt;For some, &lt;a href=&#34;https://en.wikipedia.org/wiki/Application_programming_interface&#34;&gt;APIs (application programming interfaces)&lt;/a&gt; are things heard of but seldom seen. However, whether seen or unseen, APIs are part of everyday digital life. In fact, you’ve likely used a web API from within R, even if you didn’t recognize it at the time! Several R packages are simply wrappers around popular web APIs, such as &lt;a href=&#34;https://walkerke.github.io/tidycensus/&#34;&gt;&lt;code&gt;tidycensus&lt;/code&gt;&lt;/a&gt; and &lt;a href=&#34;https://github.com/r-lib/gh&#34;&gt;&lt;code&gt;gh&lt;/code&gt;&lt;/a&gt;. Web APIs are a common framework for sharing information across a network, most commonly through &lt;a href=&#34;https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol&#34;&gt;HTTP&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;http&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;HTTP&lt;/h3&gt;
&lt;p&gt;To understand how HTTP requests work, it’s helpful to know the players involved. A &lt;em&gt;client&lt;/em&gt; makes a request to a &lt;em&gt;server&lt;/em&gt;, which interprets the request and provides a response. An HTTP request can be thought of simply as a packet of information sent to the server, which the server attempts to interpret and respond to. Every time you visit a URL in a web browser, an HTTP request is made and the response is rendered by the browser as the website you see. It is possible to inspect this interaction using the development tools in a browser.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-07-17-blair-plumber-intro-files/devtools-screenshot-request.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;As seen above, this request is composed of a URL and a request method, which, in the case of a web browser accessing a website, is GET.&lt;/p&gt;
&lt;div id=&#34;request&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Request&lt;/h4&gt;
&lt;p&gt;There are several components of an &lt;a href=&#34;https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html&#34;&gt;HTTP request&lt;/a&gt;, but here we’ll mention only a few.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;URL: the address or endpoint for the request&lt;/li&gt;
&lt;li&gt;Verb / method: a specific method invoked on the endpoint (GET, POST, DELETE, PUT)&lt;/li&gt;
&lt;li&gt;Headers: additional data sent to the server, such as who is making the request and what type of response is expected&lt;/li&gt;
&lt;li&gt;Body: data sent to the server outside of the headers, common for POST and PUT requests&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the browser example above, a GET request was made by the web browser to www.rstudio.com.&lt;/p&gt;
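&lt;p&gt;These components map directly onto the request objects that HTTP client libraries expose. As a quick illustration (sketched here with Python’s standard library, simply because it makes the anatomy easy to inspect), we can assemble a request and examine its parts without ever sending it:&lt;/p&gt;

```python
from urllib.request import Request

# Assemble (but do not send) a GET request to illustrate its components.
req = Request(
    "http://httpbin.org/get",                # URL: the endpoint for the request
    headers={"Accept": "application/json"},  # header: the response type we expect
    method="GET",                            # verb / method
)

print(req.full_url)      # the endpoint
print(req.get_method())  # the verb
print(req.headers)       # the headers
```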
&lt;/div&gt;
&lt;div id=&#34;response&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Response&lt;/h4&gt;
&lt;p&gt;The API response mirrors the request to some extent. It includes headers that contain information about the response and a body that contains any data returned by the API. The headers include the HTTP status code that informs the client how the request was received, along with details about the content that’s being delivered. In the example of a web browser accessing www.rstudio.com, we can see below that the response headers include the status code (200) along with details about the response content, including the fact that the content returned is HTML. This HTML content is what the browser renders into a webpage.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-07-17-blair-plumber-intro-files/devtools-screenshot-response.png&#34; /&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;httr&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;httr&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;http://httr.r-lib.org/index.html&#34;&gt;&lt;code&gt;httr&lt;/code&gt;&lt;/a&gt; package provides a nice framework for working with HTTP requests in R. The following basic example demonstrates some of what we’ve already learned by using &lt;code&gt;httr&lt;/code&gt; and &lt;a href=&#34;http://httpbin.org/&#34;&gt;httpbin.org&lt;/a&gt;, which provides a playground of sorts for HTTP requests.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(httr)
# A simple GET request
response &amp;lt;- GET(&amp;quot;http://httpbin.org/get&amp;quot;)
response&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Response [http://httpbin.org/get]
##   Date: 2018-07-23 14:57
##   Status: 200
##   Content-Type: application/json
##   Size: 266 B
## {&amp;quot;args&amp;quot;:{},&amp;quot;headers&amp;quot;:{&amp;quot;Accept&amp;quot;:&amp;quot;application/json, text/xml, application/...&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this example we’ve made a GET request to httpbin.org/get and received a response. We know our request was successful because we see that the status is 200. We also see that the response contains data in JSON format. The &lt;a href=&#34;http://httr.r-lib.org/articles/quickstart.html&#34;&gt;&lt;em&gt;Getting started with httr&lt;/em&gt;&lt;/a&gt; page provides additional examples of working with HTTP requests and responses.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;rest&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;REST&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Representational_state_transfer&#34;&gt;Representational State Transfer (REST)&lt;/a&gt; is an architectural style for APIs that includes specific constraints for building APIs to ensure that they are consistent, performant, and scalable. In order to be considered truly RESTful, an API must meet the following six constraints (the last of which is optional):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Uniform interface: clearly defined interface between client and server&lt;/li&gt;
&lt;li&gt;Stateless: state is managed via the requests themselves, not through reliance on an external service&lt;/li&gt;
&lt;li&gt;Cacheable: responses should be cacheable in order to improve scalability&lt;/li&gt;
&lt;li&gt;Client-Server: clear separation of client and server, each with its own distinct responsibilities in the exchange&lt;/li&gt;
&lt;li&gt;Layered System: there may be intermediaries between the client and the server, but the client should be unaware of them&lt;/li&gt;
&lt;li&gt;Code on Demand (optional): the response can include logic executable by the client&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We could spend a lot of time diving further into each of these specifications, but that is beyond the scope of this post. More detail about REST can be found &lt;a href=&#34;https://www.restapitutorial.com&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;plumber&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Plumber&lt;/h2&gt;
&lt;p&gt;Creating RESTful APIs using R is straightforward using the &lt;code&gt;plumber&lt;/code&gt; package. Even if you have never written an API, &lt;code&gt;plumber&lt;/code&gt; makes it easy to turn existing R functions into API endpoints. Developing &lt;code&gt;plumber&lt;/code&gt; endpoints is simply a matter of providing specialized R comments before R functions. &lt;code&gt;plumber&lt;/code&gt; recognizes both &lt;code&gt;#&#39;&lt;/code&gt; and &lt;code&gt;#*&lt;/code&gt; comments, although the latter is recommended in order to avoid potential conflicts with &lt;a href=&#34;https://github.com/yihui/roxygen2&#34;&gt;&lt;code&gt;roxygen2&lt;/code&gt;&lt;/a&gt;. The following defines a &lt;code&gt;plumber&lt;/code&gt; endpoint that simply returns the data provided in the request query string.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(plumber)

#* @apiTitle Simple API

#* Echo provided text
#* @param text The text to be echoed in the response
#* @get /echo
function(text = &amp;quot;&amp;quot;) {
  list(
    message_echo = paste(&amp;quot;The text is:&amp;quot;, text)
  )
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we’ve defined a simple function that takes a parameter, &lt;code&gt;text&lt;/code&gt;, and returns it, embedded in a short message, as part of a list. By default, &lt;code&gt;plumber&lt;/code&gt; will serialize the object returned from a function into JSON using the &lt;a href=&#34;https://github.com/jeroen/jsonlite&#34;&gt;&lt;code&gt;jsonlite&lt;/code&gt;&lt;/a&gt; package. We’ve provided specialized comments to inform &lt;code&gt;plumber&lt;/code&gt; that this endpoint is available at &lt;code&gt;api-url/echo&lt;/code&gt; and will respond to GET requests.&lt;/p&gt;
&lt;p&gt;There are a few ways this &lt;code&gt;plumber&lt;/code&gt; script can be run locally. First, assuming the file is saved as &lt;code&gt;plumber.R&lt;/code&gt;, the following code would start a local web server hosting the API.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plumber::plumb(&amp;quot;plumber.R&amp;quot;)$run(port = 5762)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once the web server has started, the API can be interacted with using any set of HTTP tools. We could even interact with it using &lt;code&gt;httr&lt;/code&gt; as demonstrated earlier, although we would need to open a separate R session to do so since the current R session is busy serving the API.&lt;/p&gt;
&lt;p&gt;The other method for running the API requires a recent &lt;a href=&#34;https://www.rstudio.com/products/rstudio/download/preview/&#34;&gt;preview build&lt;/a&gt; of the RStudio IDE, which includes features that make it easier to work with &lt;code&gt;plumber&lt;/code&gt;. When editing a &lt;code&gt;plumber&lt;/code&gt; script in one of these builds, a “Run API” icon will appear in the top right-hand corner of the source editor. Clicking this button will automatically run a line of code similar to the one we ran above to start a web server hosting the API. A &lt;a href=&#34;https://swagger.io&#34;&gt;Swagger&lt;/a&gt;-generated UI will be rendered in the Viewer pane, and the API can be interacted with directly from within this UI.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-07-17-blair-plumber-intro-files/swagger-screenshot.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;Now that we have a running &lt;code&gt;plumber&lt;/code&gt; API, we can query it using &lt;code&gt;curl&lt;/code&gt; from the command line to investigate its behavior.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;$ curl &amp;quot;localhost:5762/echo&amp;quot; | jq &amp;#39;.&amp;#39;
{
  &amp;quot;message_echo&amp;quot;: [
    &amp;quot;The text is: &amp;quot;
  ]
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, we queried the API without providing any additional data or parameters. As a result, the &lt;code&gt;text&lt;/code&gt; parameter is the default empty string, as seen in the response. In order to pass a value to our underlying function, we can define a query string in the request as follows:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;$ curl &amp;quot;localhost:5762/echo?text=Hi%20there&amp;quot; | jq &amp;#39;.&amp;#39;
{
  &amp;quot;message_echo&amp;quot;: [
    &amp;quot;The text is: Hi there&amp;quot;
  ]
}&lt;/code&gt;&lt;/pre&gt;
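&lt;p&gt;Note the &lt;code&gt;%20&lt;/code&gt; in the URL above: a space is not a legal URL character, so it must be percent-encoded. You rarely need to do this encoding by hand; any HTTP tooling can build an encoded query string for you, as in this short sketch (Python’s standard library, for illustration):&lt;/p&gt;

```python
from urllib.parse import quote, urlencode

# Percent-encode a single value for use in a query string.
print(quote("Hi there"))  # spaces become %20

# Build a full query string from key-value pairs, joined by the usual
# separator; quote_via=quote keeps %20 rather than the form-encoding
# default of plus signs.
print(urlencode({"text": "Hi there", "number": 42}, quote_via=quote))
```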
&lt;p&gt;In this case, the &lt;code&gt;text&lt;/code&gt; parameter is defined as part of the query string, which is appended to the end of the URL. Additional parameters could be defined by separating each key-value pair with &lt;code&gt;&amp;amp;&lt;/code&gt;. It’s also possible to pass the parameter as part of the request body. However, to leverage this method of data delivery, we need to update our API definition so that the &lt;code&gt;/echo&lt;/code&gt; endpoint also accepts POST requests. We’ll also update our API to consider multiple parameters, and return the parsed parameters along with the entire request body.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(plumber)

#* @apiTitle Simple API

#* Echo provided text
#* @param text The text to be echoed in the response
#* @param number A number to be echoed in the response
#* @get /echo
#* @post /echo
function(req, text = &amp;quot;&amp;quot;, number = 0) {
  list(
    message_echo = paste(&amp;quot;The text is:&amp;quot;, text),
    number_echo = paste(&amp;quot;The number is:&amp;quot;, number),
    raw_body = req$postBody
  )
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this new API definition, the following &lt;code&gt;curl&lt;/code&gt; request can be made to pass parameters to the API via the request body.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;$ curl --data &amp;quot;text=Hi%20there&amp;amp;number=42&amp;amp;other_param=something%20else&amp;quot; &amp;quot;localhost:5762/echo&amp;quot; | jq &amp;#39;.&amp;#39;
{
  &amp;quot;message_echo&amp;quot;: [
    &amp;quot;The text is: Hi there&amp;quot;
  ],
  &amp;quot;number_echo&amp;quot;: [
    &amp;quot;The number is: 42&amp;quot;
  ],
  &amp;quot;raw_body&amp;quot;: [
    &amp;quot;text=Hi%20there&amp;amp;number=42&amp;amp;other_param=something%20else&amp;quot;
  ]
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice that we passed more than just &lt;code&gt;text&lt;/code&gt; and &lt;code&gt;number&lt;/code&gt; in the request body. &lt;code&gt;plumber&lt;/code&gt; parses the request body and matches any arguments found in the R function definition. Additional arguments, like &lt;code&gt;other_param&lt;/code&gt; in this case, are ignored. &lt;code&gt;plumber&lt;/code&gt; can parse the request body if it is URL-encoded or JSON. The following example shows the same request, but with the request body encoded as JSON.&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;$ curl --data &amp;#39;{&amp;quot;text&amp;quot;:&amp;quot;Hi there&amp;quot;, &amp;quot;number&amp;quot;:&amp;quot;42&amp;quot;, &amp;quot;other_param&amp;quot;:&amp;quot;something else&amp;quot;}&amp;#39; &amp;quot;localhost:5762/echo&amp;quot; | jq &amp;#39;.&amp;#39;
{
  &amp;quot;message_echo&amp;quot;: [
    &amp;quot;The text is: Hi there&amp;quot;
  ],
  &amp;quot;number_echo&amp;quot;: [
    &amp;quot;The number is: 42&amp;quot;
  ],
  &amp;quot;raw_body&amp;quot;: [
    &amp;quot;{\&amp;quot;text\&amp;quot;:\&amp;quot;Hi there\&amp;quot;, \&amp;quot;number\&amp;quot;:\&amp;quot;42\&amp;quot;, \&amp;quot;other_param\&amp;quot;:\&amp;quot;something else\&amp;quot;}&amp;quot;
  ]
}&lt;/code&gt;&lt;/pre&gt;
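&lt;p&gt;Since the updated endpoint still accepts GET requests, the same parameters can also be supplied in a query string, with key-value pairs joined by &lt;code&gt;&amp;amp;&lt;/code&gt;. As a small sketch, the percent-encoded query string can be built in R with the base &lt;code&gt;URLencode()&lt;/code&gt; function (the host and port are the ones assumed throughout this post):&lt;/p&gt;

```r
# Build a percent-encoded query string with multiple key-value pairs
url <- paste0(
  "http://localhost:5762/echo?",
  "text=", URLencode("Hi there", reserved = TRUE),
  "&number=", 42
)
url
# [1] "http://localhost:5762/echo?text=Hi%20there&number=42"
```

Requesting this URL with `curl` or a browser exercises the same endpoint as the POST examples above.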
&lt;p&gt;While these examples are fairly simple, they demonstrate the flexibility of &lt;code&gt;plumber&lt;/code&gt;. Thanks to &lt;code&gt;plumber&lt;/code&gt;, it is now straightforward to expose R functions so they can be consumed and leveraged by any number of systems and processes. We’ve only scratched the surface of its capabilities and, as mentioned, future posts will walk through the creation of a Slack app using &lt;code&gt;plumber&lt;/code&gt;. Comprehensive documentation for &lt;code&gt;plumber&lt;/code&gt; can be found &lt;a href=&#34;https://www.rplumber.io/docs/&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;deploying&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Deploying&lt;/h2&gt;
&lt;p&gt;Up until now, we’ve just been interacting with our APIs in our local development environment. That’s great for development and testing, but when it comes time to expose an API to external services, we don’t want our laptop held responsible (at least, I don’t!). There are several &lt;a href=&#34;https://www.rplumber.io/docs/hosting.html&#34;&gt;deployment methods&lt;/a&gt; for &lt;code&gt;plumber&lt;/code&gt; outlined in the documentation. The most straightforward method of deployment is to use &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt;. When editing a &lt;code&gt;plumber&lt;/code&gt; script in recent versions of the RStudio IDE, a blue publish button will appear in the top right-hand corner of the source editor. Clicking this button brings up a menu that enables the user to publish the API to an instance of RStudio Connect. Once published, API access and performance can be configured through RStudio Connect and the API can be leveraged by external systems and processes.&lt;/p&gt;
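&lt;p&gt;For self-hosted deployments, the same API definition can also be served from any R process; a minimal sketch, assuming the script above is saved as &lt;code&gt;plumber.R&lt;/code&gt; (the file name and port are assumptions for illustration):&lt;/p&gt;

```r
library(plumber)

# Load the API definition and serve it; this call blocks while the API runs
# ("plumber.R" and port 5762 are assumptions for illustration)
api <- plumb("plumber.R")
api$run(host = "0.0.0.0", port = 5762)
```

Wrapping this in a process supervisor (or a Docker container) is the usual next step for production use outside of RStudio Connect.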
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Web APIs are a powerful mechanism for providing systematic access to computational processes. Writing APIs with &lt;code&gt;plumber&lt;/code&gt; makes it easy for others to take advantage of the work you’ve created in R without the need to rely on batch processing or code rewriting. &lt;code&gt;plumber&lt;/code&gt; is exceptionally flexible and can be used to define a wide variety of endpoints. These endpoints can be used to integrate R with other systems. As an added bonus, downstream consumers of these APIs require no knowledge of R. They only need to know how to properly interact with the API via HTTP. &lt;code&gt;plumber&lt;/code&gt; provides a convenient and reliable bridge between R and other systems and/or languages used within an organization.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/07/23/rest-apis-and-plumber/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Reading and analysing log files in the RRD database format</title>
      <link>https://rviews.rstudio.com/2018/06/20/reading-rrd-files/</link>
      <pubDate>Wed, 20 Jun 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/06/20/reading-rrd-files/</guid>
      <description>
        

&lt;p&gt;I have frequent conversations with R champions and Systems Administrators responsible for R, in which they ask how they can measure and analyze the usage of their servers.  Among the many solutions to this problem, one of my favourites is to use an &lt;strong&gt;RRD&lt;/strong&gt; database and &lt;strong&gt;RRDtool&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;From &lt;a href=&#34;https://en.wikipedia.org/wiki/RRDtool&#34;&gt;Wikipedia&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;RRDtool&lt;/strong&gt; (&lt;em&gt;round-robin database tool&lt;/em&gt;) aims to handle time series data such as network bandwidth, temperatures or CPU load. The data is stored in a &lt;a href=&#34;https://en.wikipedia.org/wiki/Circular_buffer&#34;&gt;circular buffer&lt;/a&gt; based &lt;a href=&#34;https://en.wikipedia.org/wiki/Database&#34;&gt;database&lt;/a&gt;, thus the system storage footprint remains constant over time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&#34;https://oss.oetiker.ch/rrdtool/index.en.html&#34;&gt;RRDtool&lt;/a&gt; is a library written in C, with command-line tools that can also be used directly on Linux. This makes it convenient for systems work, but difficult for R users to extract and analyze the data.&lt;/p&gt;

&lt;p&gt;I am pleased to announce that I&amp;rsquo;ve been working on the &lt;code&gt;rrd&lt;/code&gt; &lt;a href=&#34;https://github.com/andrie/rrd&#34;&gt;R package&lt;/a&gt; to import RRD files directly into &lt;code&gt;tibble&lt;/code&gt; objects, thus making it easy to analyze your metrics.&lt;/p&gt;

&lt;p&gt;As an aside, the RStudio Pro products (specifically &lt;a href=&#34;https://www.rstudio.com/products/rstudio-server-pro/&#34;&gt;RStudio Server Pro&lt;/a&gt; and &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt;) also make use of RRD to store metrics &amp;ndash; more about this later.&lt;/p&gt;

&lt;h2 id=&#34;understanding-the-rrd-format-as-an-r-user&#34;&gt;Understanding the RRD format as an R user&lt;/h2&gt;

&lt;p&gt;The name RRD is an initialism of &lt;strong&gt;R&lt;/strong&gt;ound &lt;strong&gt;R&lt;/strong&gt;obin &lt;strong&gt;D&lt;/strong&gt;atabase.  The &amp;ldquo;round robin&amp;rdquo; refers to the fact that the database is always fixed in size, and as a new entry enters the database, the oldest entry is discarded. In practical terms, the database collects data for a fixed period of time, and information that is older than the threshold gets removed.&lt;/p&gt;
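&lt;p&gt;The round-robin principle can be illustrated with a toy circular buffer in R (a sketch of the idea only, not how RRD stores data internally):&lt;/p&gt;

```r
# A fixed-size buffer of 5 slots; new values overwrite the oldest entry
buf <- rep(NA_real_, 5)
for (i in 1:8) {
  buf[(i - 1) %% 5 + 1] <- i
}
buf
# [1] 6 7 8 4 5   (only the 5 most recent values survive)
```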

&lt;p&gt;&lt;img src=&#34;/post/2018-06-21-analysing-rrd-files_files/rra.png&#34; alt=&#34;Image from loriotpro(https://bit.ly/2tk2MFa)&#34; /&gt;&lt;/p&gt;

&lt;p&gt;A second quality of RRD databases is that data is stored as &amp;ldquo;consolidated data points&amp;rdquo;, where every data point is an aggregation over time. For example, a data point can represent the average value for the time period, or the maximum over the period. Typical consolidation functions include &lt;code&gt;average&lt;/code&gt;, &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt;.&lt;/p&gt;
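&lt;p&gt;Consolidation can be mimicked in plain R; a toy sketch using simulated per-minute readings (not an actual RRD operation):&lt;/p&gt;

```r
set.seed(42)
# One hour of simulated per-minute CPU readings
per_minute <- runif(60)

# Consolidate into twelve 5-minute data points with different functions
groups <- rep(1:12, each = 5)
avg_5min <- tapply(per_minute, groups, mean)
max_5min <- tapply(per_minute, groups, max)
length(avg_5min)
# [1] 12
```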

&lt;p&gt;The third quality is that every RRD database file typically consists of multiple archives. Each archive measures data for a different time period. For instance, the archives can capture data for intervals of 10 seconds, 30 seconds, 1 minute or 5 minutes.&lt;/p&gt;

&lt;p&gt;As an example, here is a description of an RRD file that originated in RStudio Connect:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;describe_rrd(&amp;quot;rrd_cpu_0&amp;quot;)
#&amp;gt; A RRD file with 10 RRA arrays and step size 60
#&amp;gt; [1] AVERAGE_60 (43200 rows)
#&amp;gt; [2] AVERAGE_300 (25920 rows)
#&amp;gt; [3] MIN_300 (25920 rows)
#&amp;gt; [4] MAX_300 (25920 rows)
#&amp;gt; [5] AVERAGE_3600 (8760 rows)
#&amp;gt; [6] MIN_3600 (8760 rows)
#&amp;gt; [7] MAX_3600 (8760 rows)
#&amp;gt; [8] AVERAGE_86400 (1825 rows)
#&amp;gt; [9] MIN_86400 (1825 rows)
#&amp;gt; [10] MAX_86400 (1825 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This &lt;code&gt;RRD&lt;/code&gt; file contains data for the properties of CPU 0 of the system.  In this example, the first &lt;code&gt;RRA&lt;/code&gt; archive contains averaged metrics for one minute (60s) intervals, while the second &lt;code&gt;RRA&lt;/code&gt; measures the same metric, but averaged over five minutes.  The same metrics are also available for intervals of one hour and one day.&lt;/p&gt;

&lt;p&gt;Notice also that every archive has a different number of rows, representing a different historical period where the data is kept.  For example, the &lt;em&gt;per minute&lt;/em&gt; data &lt;code&gt;AVERAGE_60&lt;/code&gt; is retained for 43,200 periods (30 days) while the &lt;em&gt;daily&lt;/em&gt; data &lt;code&gt;MAX_86400&lt;/code&gt; is retained for 1,825 periods (5 years).&lt;/p&gt;
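&lt;p&gt;The retention period of each archive is simply its number of rows multiplied by its step size:&lt;/p&gt;

```r
# rows * step (seconds) / seconds-per-day = retention in days
43200 * 60    / 86400  # AVERAGE_60:   30 days
25920 * 300   / 86400  # *_300:        90 days
8760  * 3600  / 86400  # *_3600:      365 days (1 year)
1825  * 86400 / 86400  # *_86400:    1825 days (5 years)
```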

&lt;p&gt;If you want to know more, please read the excellent &lt;a href=&#34;https://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html&#34;&gt;introductory tutorial&lt;/a&gt; to RRD databases.&lt;/p&gt;

&lt;h2 id=&#34;introducing-the-rrd-package&#34;&gt;Introducing the &lt;code&gt;rrd&lt;/code&gt; package&lt;/h2&gt;

&lt;p&gt;Until recently, it wasn&amp;rsquo;t easy to import RRD files into R.  But I was pleased to discover that a &lt;a href=&#34;https://www.google-melange.com/archive/gsoc/2014&#34;&gt;Google Summer of Code 2014&lt;/a&gt; project created a proof-of-concept R package to read these files.  The author of this package is &lt;a href=&#34;http://plamendimitrov.net/&#34;&gt;Plamen Dimitrov&lt;/a&gt;, who published the code on &lt;a href=&#34;https://github.com/pldimitrov/Rrd&#34;&gt;GitHub&lt;/a&gt; and also wrote an &lt;a href=&#34;http://plamendimitrov.net/blog/2014/08/09/r-package-for-working-with-rrd-files/&#34;&gt;explanatory blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Because I needed to provide suggestions to our customers, I decided to update the package, add some example code, and generally improve its reliability.&lt;/p&gt;

&lt;p&gt;The result is not yet on CRAN, but you can install the development version of the package from &lt;a href=&#34;https://github.com/andrie/rrd&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&#34;installing-the-package&#34;&gt;Installing the package&lt;/h3&gt;

&lt;p&gt;To build the package from source, you first need to install &lt;a href=&#34;http://oss.oetiker.ch/rrdtool/doc/librrd.en.html&#34;&gt;librrd&lt;/a&gt;. Installing &lt;a href=&#34;http://oss.oetiker.ch/rrdtool/&#34;&gt;RRDtool&lt;/a&gt; from your Linux package manager will usually also install this library.&lt;/p&gt;

&lt;p&gt;Using Ubuntu:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-sh&#34;&gt;sudo apt-get install rrdtool librrd-dev
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Using RHEL / CentOS:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-sh&#34;&gt;sudo yum install rrdtool rrdtool-devel
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Once you have the system requirements in place, you can install the development version of the R package from &lt;a href=&#34;https://github.com/andrie/rrd&#34;&gt;GitHub&lt;/a&gt; using:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# install.packages(&amp;quot;devtools&amp;quot;)
devtools::install_github(&amp;quot;andrie/rrd&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&#34;limitations&#34;&gt;Limitations&lt;/h3&gt;

&lt;p&gt;The package is not yet available for Windows.&lt;/p&gt;

&lt;h3 id=&#34;using-the-package&#34;&gt;Using the package&lt;/h3&gt;

&lt;p&gt;Once you&amp;rsquo;ve installed the package, you can start to use it. The package itself contains some built-in RRD files, so you should be able to run the following code directly.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(rrd)
&lt;/code&gt;&lt;/pre&gt;

&lt;h4 id=&#34;describing-the-contents-of-a-rrd&#34;&gt;Describing the contents of an RRD&lt;/h4&gt;

&lt;p&gt;To describe the contents of an RRD file, use &lt;code&gt;describe_rrd()&lt;/code&gt;. This function reports the name of each archive (RRA) in the file, its consolidation function, and its number of observations:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;rrd_cpu_0 &amp;lt;- system.file(&amp;quot;extdata/cpu-0.rrd&amp;quot;, package = &amp;quot;rrd&amp;quot;)

describe_rrd(rrd_cpu_0)
#&amp;gt; A RRD file with 10 RRA arrays and step size 60
#&amp;gt; [1] AVERAGE_60 (43200 rows)
#&amp;gt; [2] AVERAGE_300 (25920 rows)
#&amp;gt; [3] MIN_300 (25920 rows)
#&amp;gt; [4] MAX_300 (25920 rows)
#&amp;gt; [5] AVERAGE_3600 (8760 rows)
#&amp;gt; [6] MIN_3600 (8760 rows)
#&amp;gt; [7] MAX_3600 (8760 rows)
#&amp;gt; [8] AVERAGE_86400 (1825 rows)
#&amp;gt; [9] MIN_86400 (1825 rows)
#&amp;gt; [10] MAX_86400 (1825 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;h4 id=&#34;reading-an-entire-rrd-file&#34;&gt;Reading an entire RRD file&lt;/h4&gt;

&lt;p&gt;To read an entire RRD file, i.e. all of the RRA archives, use &lt;code&gt;read_rrd()&lt;/code&gt;. This returns a list of &lt;code&gt;tibble&lt;/code&gt; objects:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;cpu &amp;lt;- read_rrd(rrd_cpu_0)

str(cpu, max.level = 1)
#&amp;gt; List of 10
#&amp;gt;  $ AVERAGE60   :Classes &#39;tbl_df&#39;, &#39;tbl&#39; and &#39;data.frame&#39;:    43199 obs. of  9 variables:
#&amp;gt;  $ AVERAGE300  :Classes &#39;tbl_df&#39;, &#39;tbl&#39; and &#39;data.frame&#39;:    25919 obs. of  9 variables:
#&amp;gt;  $ MIN300      :Classes &#39;tbl_df&#39;, &#39;tbl&#39; and &#39;data.frame&#39;:    25919 obs. of  9 variables:
#&amp;gt;  $ MAX300      :Classes &#39;tbl_df&#39;, &#39;tbl&#39; and &#39;data.frame&#39;:    25919 obs. of  9 variables:
#&amp;gt;  $ AVERAGE3600 :Classes &#39;tbl_df&#39;, &#39;tbl&#39; and &#39;data.frame&#39;:    8759 obs. of  9 variables:
#&amp;gt;  $ MIN3600     :Classes &#39;tbl_df&#39;, &#39;tbl&#39; and &#39;data.frame&#39;:    8759 obs. of  9 variables:
#&amp;gt;  $ MAX3600     :Classes &#39;tbl_df&#39;, &#39;tbl&#39; and &#39;data.frame&#39;:    8759 obs. of  9 variables:
#&amp;gt;  $ AVERAGE86400:Classes &#39;tbl_df&#39;, &#39;tbl&#39; and &#39;data.frame&#39;:    1824 obs. of  9 variables:
#&amp;gt;  $ MIN86400    :Classes &#39;tbl_df&#39;, &#39;tbl&#39; and &#39;data.frame&#39;:    1824 obs. of  9 variables:
#&amp;gt;  $ MAX86400    :Classes &#39;tbl_df&#39;, &#39;tbl&#39; and &#39;data.frame&#39;:    1824 obs. of  9 variables:
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Since the resulting object is a list of &lt;code&gt;tibble&lt;/code&gt; objects, you can easily use R functions to work with an individual archive:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;names(cpu)
#&amp;gt;  [1] &amp;quot;AVERAGE60&amp;quot;    &amp;quot;AVERAGE300&amp;quot;   &amp;quot;MIN300&amp;quot;       &amp;quot;MAX300&amp;quot;      
#&amp;gt;  [5] &amp;quot;AVERAGE3600&amp;quot;  &amp;quot;MIN3600&amp;quot;      &amp;quot;MAX3600&amp;quot;      &amp;quot;AVERAGE86400&amp;quot;
#&amp;gt;  [9] &amp;quot;MIN86400&amp;quot;     &amp;quot;MAX86400&amp;quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To inspect the contents of the first archive (&lt;code&gt;AVERAGE60&lt;/code&gt;), simply print the object; since it&amp;rsquo;s a &lt;code&gt;tibble&lt;/code&gt;, you get the first 10 rows of output.&lt;/p&gt;

&lt;p&gt;For example, the CPU data contains a time stamp and metrics for average &lt;em&gt;user&lt;/em&gt; and &lt;em&gt;sys&lt;/em&gt; usage, as well as the &lt;a href=&#34;https://en.wikipedia.org/wiki/Nice_(Unix)&#34;&gt;&lt;em&gt;nice&lt;/em&gt;&lt;/a&gt; value, &lt;em&gt;idle&lt;/em&gt; time, &lt;a href=&#34;https://en.wikipedia.org/wiki/Interrupt_request_(PC_architecture)&#34;&gt;&lt;em&gt;interrupt requests&lt;/em&gt;&lt;/a&gt; and &lt;em&gt;soft interrupt requests&lt;/em&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;cpu[[1]]
#&amp;gt; # A tibble: 43,199 x 9
#&amp;gt;    timestamp              user     sys  nice  idle  wait   irq softirq
#&amp;gt;  * &amp;lt;dttm&amp;gt;                &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
#&amp;gt;  1 2018-04-02 12:24:00 0.0104  0.00811     0 0.981     0     0       0
#&amp;gt;  2 2018-04-02 12:25:00 0.0126  0.00630     0 0.979     0     0       0
#&amp;gt;  3 2018-04-02 12:26:00 0.0159  0.00808     0 0.976     0     0       0
#&amp;gt;  4 2018-04-02 12:27:00 0.00853 0.00647     0 0.985     0     0       0
#&amp;gt;  5 2018-04-02 12:28:00 0.0122  0.00999     0 0.978     0     0       0
#&amp;gt;  6 2018-04-02 12:29:00 0.0106  0.00604     0 0.983     0     0       0
#&amp;gt;  7 2018-04-02 12:30:00 0.0147  0.00427     0 0.981     0     0       0
#&amp;gt;  8 2018-04-02 12:31:00 0.0193  0.00767     0 0.971     0     0       0
#&amp;gt;  9 2018-04-02 12:32:00 0.0300  0.0274      0 0.943     0     0       0
#&amp;gt; 10 2018-04-02 12:33:00 0.0162  0.00617     0 0.978     0     0       0
#&amp;gt; # ... with 43,189 more rows, and 1 more variable: stolen &amp;lt;dbl&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Since the data is in &lt;code&gt;tibble&lt;/code&gt; format, you can easily extract specific data, e.g., the last values of the system usage:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;tail(cpu$AVERAGE60$sys)
#&amp;gt; [1] 0.0014390667 0.0020080000 0.0005689333 0.0000000000 0.0014390667
#&amp;gt; [6] 0.0005689333
&lt;/code&gt;&lt;/pre&gt;

&lt;h4 id=&#34;reading-only-a-single-archive&#34;&gt;Reading only a single archive&lt;/h4&gt;

&lt;p&gt;The underlying code in the &lt;code&gt;rrd&lt;/code&gt; package is written in C, and is therefore blazingly fast. Reading an entire RRD file takes a fraction of a second, but sometimes you may want to extract only a specific RRA archive directly.&lt;/p&gt;

&lt;p&gt;To read a single RRA archive from an RRD file, use &lt;code&gt;read_rra()&lt;/code&gt;. To use this function, you must specify several arguments that define the specific data to retrieve. This includes the consolidation function (e.g., &lt;code&gt;&amp;quot;AVERAGE&amp;quot;&lt;/code&gt;) and time step (e.g., &lt;code&gt;60&lt;/code&gt;). You must also specify either the &lt;code&gt;start&lt;/code&gt; time or the number of steps, &lt;code&gt;n_steps&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In this example, I extract the average for one-minute periods (&lt;code&gt;step = 60&lt;/code&gt;) for one day (&lt;code&gt;n_steps = 24 * 60&lt;/code&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;end_time &amp;lt;- as.POSIXct(&amp;quot;2018-05-02&amp;quot;) # timestamp with data in example
avg_60 &amp;lt;- read_rra(rrd_cpu_0, cf = &amp;quot;AVERAGE&amp;quot;, step = 60, n_steps = 24 * 60,
                     end = end_time)

avg_60
#&amp;gt; # A tibble: 1,440 x 9
#&amp;gt;    timestamp              user      sys  nice  idle     wait   irq softirq
#&amp;gt;  * &amp;lt;dttm&amp;gt;                &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
#&amp;gt;  1 2018-05-01 00:01:00 0.00458 0.00201      0 0.992 0            0       0
#&amp;gt;  2 2018-05-01 00:02:00 0.00258 0.000570     0 0.996 0            0       0
#&amp;gt;  3 2018-05-01 00:03:00 0.00633 0.00144      0 0.992 0            0       0
#&amp;gt;  4 2018-05-01 00:04:00 0.00515 0.00201      0 0.991 0            0       0
#&amp;gt;  5 2018-05-01 00:05:00 0.00402 0.000569     0 0.995 0            0       0
#&amp;gt;  6 2018-05-01 00:06:00 0.00689 0.00144      0 0.992 0            0       0
#&amp;gt;  7 2018-05-01 00:07:00 0.00371 0.00201      0 0.993 0.00144      0       0
#&amp;gt;  8 2018-05-01 00:08:00 0.00488 0.00201      0 0.993 0.000569     0       0
#&amp;gt;  9 2018-05-01 00:09:00 0.00748 0.000568     0 0.992 0            0       0
#&amp;gt; 10 2018-05-01 00:10:00 0.00516 0            0 0.995 0            0       0
#&amp;gt; # ... with 1,430 more rows, and 1 more variable: stolen &amp;lt;dbl&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;h4 id=&#34;plotting-the-results&#34;&gt;Plotting the results&lt;/h4&gt;

&lt;p&gt;The original &lt;code&gt;RRDtool&lt;/code&gt; library for Linux contains some functions to &lt;a href=&#34;https://oss.oetiker.ch/rrdtool/gallery/index.en.html&#34;&gt;easily plot&lt;/a&gt; the RRD data, a feature that distinguishes RRD from many other databases.&lt;/p&gt;

&lt;p&gt;However, R already has very rich plotting capability, so the &lt;code&gt;rrd&lt;/code&gt; R package doesn&amp;rsquo;t expose any specific plotting functions.&lt;/p&gt;

&lt;p&gt;For example, you can easily plot these data using your favourite packages, like &lt;code&gt;ggplot2&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;library(ggplot2)
ggplot(avg_60, aes(x = timestamp, y = user)) + 
  geom_line() +
  stat_smooth(method = &amp;quot;loess&amp;quot;, span = 0.125, se = FALSE) +
  ggtitle(&amp;quot;CPU0 usage, data read from RRD file&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-06-21-analysing-rrd-files_files/ggplot.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;getting-the-rrd-files-from-rstudio-server-pro-and-rstudio-connect&#34;&gt;Getting the RRD files from RStudio Server Pro and RStudio Connect&lt;/h2&gt;

&lt;p&gt;As I mentioned in the introduction, both &lt;a href=&#34;https://www.rstudio.com/products/rstudio-server-pro/&#34;&gt;RStudio Server Pro&lt;/a&gt; and &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt; use RRD to store metrics. In fact, these metrics are used to power the administration dashboard of these products.&lt;/p&gt;

&lt;p&gt;This means that often the easiest solution is simply to enable the admin dashboard and view the information there.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-06-21-analysing-rrd-files_files/rsp_admin_dashboard.png&#34; alt=&#34;RStudio Server Pro admin dashboard&#34; /&gt;&lt;/p&gt;

&lt;p&gt;However, sometimes R users and system administrators have a need to analyze the metrics in more detail, so in this section, I discuss where you can find the files for analysis.&lt;/p&gt;

&lt;p&gt;The administration guides for these products explain where to find the metrics files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The admin guide for &lt;strong&gt;RStudio Server Pro&lt;/strong&gt; discusses metrics in section &lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/auditing-and-monitoring.html#monitoring-configuration&#34;&gt;8.2 Monitoring Configuration&lt;/a&gt;.

&lt;ul&gt;
&lt;li&gt;By default, the metrics are stored at &lt;code&gt;/var/lib/rstudio-server/monitor/rrd&lt;/code&gt;, although this path is configurable by the server administrator&lt;/li&gt;
&lt;li&gt;RStudio Server Pro stores system metrics as well as user metrics&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;The admin guide for &lt;strong&gt;RStudio Connect&lt;/strong&gt; discusses metrics in section &lt;a href=&#34;http://docs.rstudio.com/connect/admin/historical-information.html#metrics&#34;&gt;16.1 Historical Metrics&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;The default path for metrics logs is &lt;code&gt;/var/lib/rstudio-connect/metrics&lt;/code&gt;, though again, this is configurable by the server administrator.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;rsc &amp;lt;- &amp;quot;/var/lib/rstudio-connect/metrics/rrd&amp;quot;
rsp &amp;lt;- &amp;quot;/var/lib/rstudio-server/monitor/rrd&amp;quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you want to analyze these files, it is best to copy the files to a different location.  The security and permissions on both products are configured in such a way that it&amp;rsquo;s not possible to read the files while they are in the original folder. Therefore, we recommend that you copy the files to a different location and do the analysis there.&lt;/p&gt;
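&lt;p&gt;A minimal sketch of such a copy from R (the source path is the RStudio Server Pro default mentioned above; reading it will typically require elevated privileges):&lt;/p&gt;

```r
# Copy the RRD files to a scratch location where they can be analysed
# (the source path is the default discussed above and may not exist on
#  your machine; adjust both paths and permissions as needed)
src <- "/var/lib/rstudio-server/monitor/rrd"
dst <- file.path(tempdir(), "rrd-copy")
dir.create(dst, showWarnings = FALSE)
file.copy(list.files(src, full.names = TRUE), dst, overwrite = TRUE)
```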

&lt;h3 id=&#34;warning-about-using-the-rstudio-connect-rrd-files&#34;&gt;Warning about using the RStudio Connect RRD files&lt;/h3&gt;

&lt;p&gt;The RStudio Connect team is actively planning to change the way content-level metrics are stored, so data related to shiny apps, markdown reports, etc. will likely look different in a future release.&lt;/p&gt;

&lt;p&gt;To be clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The schemas might change&lt;/li&gt;
&lt;li&gt;RStudio Connect may stop tracking some metrics&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s also possible that the entire mechanism might change&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only guarantees that we make in RStudio Connect are around the data that we actually surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;server-wide user counts&lt;/li&gt;
&lt;li&gt;RAM&lt;/li&gt;
&lt;li&gt;CPU data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means that if you analyze RRD files, you should be aware that &lt;strong&gt;the entire mechanism for storing metrics might change in future&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 id=&#34;additional-caveat&#34;&gt;Additional caveat&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The metrics collection process runs in a sandboxed environment, and it is not possible to publish a report to RStudio Connect that reads the metrics directly. If you want to automate a process to read the Connect metrics, you will have to set up a &lt;a href=&#34;https://en.wikipedia.org/wiki/Cron&#34;&gt;cron&lt;/a&gt; job to copy the files to a different location, and run the analysis against the copied files. (Also, re-read the warning that everything might change!)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;example&#34;&gt;Example&lt;/h3&gt;

&lt;p&gt;In the following worked example, I copied some RRD files that originated in RStudio Connect to a different location on disk, and stored that location in a &lt;a href=&#34;https://github.com/rstudio/config&#34;&gt;&lt;code&gt;config&lt;/code&gt;&lt;/a&gt; file.&lt;/p&gt;

&lt;p&gt;First, list the file names:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;config &amp;lt;- config::get()
rrd_location &amp;lt;- config$rrd_location
rrd_location %&amp;gt;% 
  list.files() %&amp;gt;% 
  tail(20)
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;##  [1] &amp;quot;content-978.rrd&amp;quot;      &amp;quot;content-986.rrd&amp;quot;      &amp;quot;content-98.rrd&amp;quot;      
##  [4] &amp;quot;content-990.rrd&amp;quot;      &amp;quot;content-995.rrd&amp;quot;      &amp;quot;content-998.rrd&amp;quot;     
##  [7] &amp;quot;cpu-0.rrd&amp;quot;            &amp;quot;cpu-1.rrd&amp;quot;            &amp;quot;cpu-2.rrd&amp;quot;           
## [10] &amp;quot;cpu-3.rrd&amp;quot;            &amp;quot;license-users.rrd&amp;quot;    &amp;quot;network-eth0.rrd&amp;quot;    
## [13] &amp;quot;network-lo.rrd&amp;quot;       &amp;quot;system-CPU.rrd&amp;quot;       &amp;quot;system.cpu.usage.rrd&amp;quot;
## [16] &amp;quot;system.load.rrd&amp;quot;      &amp;quot;system.memory.rrd&amp;quot;    &amp;quot;system-RAM.rrd&amp;quot;      
## [19] &amp;quot;system.swap.rrd&amp;quot;      &amp;quot;system-SWAP.rrd&amp;quot;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The file names indicate that RStudio Connect collects metrics for the system (CPU, RAM, etc.), as well as for every piece of published content.&lt;/p&gt;

&lt;p&gt;To look at the system load, first describe the contents of the &lt;code&gt;&amp;quot;system.load.rrd&amp;quot;&lt;/code&gt; file:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;sys_load &amp;lt;- file.path(rrd_location, &amp;quot;system.load.rrd&amp;quot;)
describe_rrd(sys_load)
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## A RRD file with 10 RRA arrays and step size 60
## [1] AVERAGE_60 (43200 rows)
## [2] AVERAGE_300 (25920 rows)
## [3] MIN_300 (25920 rows)
## [4] MAX_300 (25920 rows)
## [5] AVERAGE_3600 (8760 rows)
## [6] MIN_3600 (8760 rows)
## [7] MAX_3600 (8760 rows)
## [8] AVERAGE_86400 (1825 rows)
## [9] MIN_86400 (1825 rows)
## [10] MAX_86400 (1825 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This output tells you that metrics are collected every 60 seconds (one minute), and then consolidated at selected intervals (1 minute, 5 minutes, 1 hour and 1 day). You can also tell that the consolidation functions are &lt;code&gt;average&lt;/code&gt;, &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To extract one month of data, averaged at 5-minute intervals use &lt;code&gt;step = 300&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;dat &amp;lt;- read_rra(sys_load, cf = &amp;quot;AVERAGE&amp;quot;, step = 300L, n_steps = (3600 / 300) * 24 * 30)
dat
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;## # A tibble: 8,640 x 4
##    timestamp            `1min` `5min` `15min`
##  * &amp;lt;dttm&amp;gt;                &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
##  1 2018-05-10 19:10:00 0.0254  0.0214  0.05  
##  2 2018-05-10 19:15:00 0.263   0.153   0.0920
##  3 2018-05-10 19:20:00 0.0510  0.117   0.101 
##  4 2018-05-10 19:25:00 0.00137 0.0509  0.0781
##  5 2018-05-10 19:30:00 0       0.0168  0.0534
##  6 2018-05-10 19:35:00 0       0.01    0.05  
##  7 2018-05-10 19:40:00 0.0146  0.0166  0.05  
##  8 2018-05-10 19:45:00 0.00147 0.0115  0.05  
##  9 2018-05-10 19:50:00 0.0381  0.0306  0.05  
## 10 2018-05-10 19:55:00 0.0105  0.018   0.05  
## # ... with 8,630 more rows
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It is very easy to plot this using your preferred plotting package, e.g., &lt;code&gt;ggplot2&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;ggplot(dat, aes(x = timestamp, y = `5min`)) + 
  geom_line() + 
  stat_smooth(method = &amp;quot;loess&amp;quot;, span = 0.125)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-06-21-analysing-rrd-files_files/ggplot.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;rrd&lt;/code&gt; package, available from &lt;a href=&#34;https://github.com/andrie/rrd&#34;&gt;GitHub&lt;/a&gt;, makes it very easy to read metrics stored in the RRD database format. Reading an archive is very quick, and your resulting data is a &lt;code&gt;tibble&lt;/code&gt; for an individual archive, or a list of &lt;code&gt;tibble&lt;/code&gt;s for the entire file.&lt;/p&gt;

&lt;p&gt;This makes it easy to analyze your data using the &lt;code&gt;tidyverse&lt;/code&gt; packages, and to plot the information.&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/06/20/reading-rrd-files/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Enterprise Dashboards with R Markdown</title>
      <link>https://rviews.rstudio.com/2018/05/16/replacing-excel-reports-with-r-markdown-and-shiny/</link>
      <pubDate>Wed, 16 May 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/05/16/replacing-excel-reports-with-r-markdown-and-shiny/</guid>
      <description>
        


&lt;p&gt;&lt;em&gt;This is a second post in a series on enterprise dashboards. See our previous post, &lt;a href=&#34;https://rviews.rstudio.com/2017/09/20/dashboards-with-r-and-databases/&#34;&gt;Enterprise-ready dashboards with Shiny Databases&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We have been living with spreadsheets for so long that most office workers think it is obvious that spreadsheets generated with programs like &lt;a href=&#34;https://products.office.com/en-us/excel&#34;&gt;Microsoft Excel&lt;/a&gt; make it easy to understand data and communicate insights. Everyone in a business, from the newest intern to the CEO, has had some experience with spreadsheets. But using Excel as the de facto analytic standard is problematic. Relying exclusively on Excel produces environments where it is almost impossible to organize and maintain efficient operational workflows. In addition to fostering low productivity, organizations risk profits and reputations in an age where insightful analyses and process control translate to a competitive advantage. Most organizations want better control over accessing, distributing, and processing data. You can use the R programming language, along with R Markdown reports and RStudio Connect, to build enterprise dashboards that are robust, secure, and manageable.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-05-16-replacing-excel-with-r-markdown-and-shiny/tracker-excel.png&#34; width=&#34;400&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;This Excel dashboard attempts to function as a real application by allowing its users to filter and visualize key metrics about customers. It took dozens of hours to build. The intent was to hand off maintenance to someone else, but the dashboard was so complex that the author was forced to maintain it. Every week, the author copied data from an ETL tool and pasted it into the workbook, spot checked a few cells, and then emailed the entire workbook to a distribution list. Everyone on the distribution list got a new copy in their inbox every week. There were no security controls around data management or data access. Anyone with the report could modify its contents. The update process often broke the brittle cell dependencies; or worse, discrepancies between weeks passed unnoticed. It was almost impossible to guarantee the integrity of each weekly report.&lt;/p&gt;
&lt;div id=&#34;why-coding-is-important&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Why coding is important&lt;/h3&gt;
&lt;p&gt;Excel workbooks are hard to maintain, collaborate on, and debug because they are not reproducible. The content of every cell and the design of every chart is set without ever recording the author’s actions. There is no simple way to recreate an Excel workbook because there is no recipe (i.e., set of instructions) that describes how it was made. Because Excel workbooks lack a recipe, they tend to be hard to maintain and prone to errors. It takes care, vigilance, and subject-matter knowledge to maintain a complex Excel workbook. Even then, human errors abound and changes require a lot of effort.&lt;/p&gt;
&lt;p&gt;A better approach is to write code. There are many &lt;a href=&#34;https://twitter.com/MaartenvSmeden/status/995791001825431552&#34;&gt;reasons to start programming&lt;/a&gt;. When you create a recipe with code, anyone can reproduce your work (including your future self). The act of coding implicitly invites others to collaborate with you. You can systematically validate and debug your code. All of these things lead to better code over time. Coding in R has particular advantages given its vast ecosystem of packages, its vibrant community, and its powerful tool chain.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;using-r-markdown&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Using R Markdown&lt;/h3&gt;
&lt;p&gt;There are many tools for replacing complex Excel dashboards with R code. One of these tools is &lt;a href=&#34;https://rmarkdown.rstudio.com/&#34;&gt;R Markdown&lt;/a&gt;, an open-source R package that turns your analyses into high quality documents, reports, presentations and dashboards. R Markdown documents are fully reproducible and support dozens of output formats including HTML, PDF, and Microsoft Word documents.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-05-16-replacing-excel-with-r-markdown-and-shiny/tracker-rmd.png&#34; width=&#34;400&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;&lt;a href=&#34;http://colorado.rstudio.com/rsc/tracker-report/tracker-report.html&#34;&gt;Here&lt;/a&gt; is the same Excel dashboard translated to an R Markdown report. Because this report is written in code, it is vastly simpler and easier to maintain. Like the Excel dashboard above, this R Markdown report is designed to take user inputs so that it can render custom report versions.&lt;/p&gt;
&lt;p&gt;Many people are already aware that R Markdown reports combine narrative, code, and output in a single document. What is less commonly known is that you can generalize any R Markdown report by declaring parameters in the document header. R Markdown documents with parameters are known as &lt;a href=&#34;https://rmarkdown.rstudio.com/developer_parameterized_reports.html&#34;&gt;parameterized reports&lt;/a&gt;. In the Excel dashboard users can select &lt;code&gt;segment&lt;/code&gt;, &lt;code&gt;group&lt;/code&gt;, and &lt;code&gt;period&lt;/code&gt;. In a parameterized R Markdown document, you would specify these inputs with the following YAML header:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---
title: Customer Tracker Report
output: html_notebook
params:
  seg: 
    label: &amp;quot;Segment:&amp;quot;
    value: Total
    input: select
    choices: [Total, Heavy, Mainstream, Focus1, Focus2, 
              Specialty, Diverse1, Diverse2, Other, New]
  grp: 
    label: &amp;quot;Group:&amp;quot;
    value: Total
    input: select
    choices: [Total, Core, Extra]
  per: 
    label: &amp;quot;Period:&amp;quot;
    value: Week
    input: radio
    choices: [Week, YTD]
---&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can then access the parameters declared in the YAML header from your R code chunks:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;```{r}
params$seg
params$grp
params$per
```&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can render the document with different inputs by selecting &lt;a href=&#34;https://rmarkdown.rstudio.com/developer_parameterized_reports.html#parameter_user_interfaces&#34;&gt;knit with parameters&lt;/a&gt; in RStudio. This option will open a user interface that allows you to select the parameters you want.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://media.giphy.com/media/vwicMYfRPL6YuRQGfo/giphy.gif&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;If you want to automate the process of creating custom report versions, you can render these documents programmatically with the &lt;code&gt;rmarkdown::render()&lt;/code&gt; function.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;rmarkdown::render(
  input = &amp;quot;tracker-report.Rmd&amp;quot;, 
  params = list(seg = &amp;quot;Focus1&amp;quot;, grp = &amp;quot;Core&amp;quot;, per = &amp;quot;Weekly&amp;quot;)
)&lt;/code&gt;&lt;/pre&gt;
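&lt;p&gt;Going one step further, you can generate a whole batch of custom report versions by looping over parameter values. A minimal sketch, assuming &lt;code&gt;tracker-report.Rmd&lt;/code&gt; and its data are available in the working directory:&lt;/p&gt;

```r
library(rmarkdown)

# Render one report version per customer segment; the segment names come
# from the choices declared in the YAML header above
segments <- c("Total", "Heavy", "Mainstream", "Focus1", "Focus2")

for (seg in segments) {
  render(
    input       = "tracker-report.Rmd",
    params      = list(seg = seg, grp = "Total", per = "Week"),
    output_file = paste0("tracker-", seg, ".html")
  )
}
```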
&lt;/div&gt;
&lt;div id=&#34;publishing-to-rstudio-connect&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Publishing to RStudio Connect&lt;/h3&gt;
&lt;p&gt;Managing access and permissions for an ocean of Excel files is painful. Data in Excel spreads through an organization without controls, like a virus through a body with no immune defense. There are better ways to secure the operation, access, and distribution of information.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;https://media.giphy.com/media/9M52kMrLHrDfxI3nrq/giphy.gif&#34; /&gt;

&lt;/div&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2018-05-16-replacing-excel-with-r-markdown-and-shiny/pb-publishing.png&#34; width=&#34;50&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;RStudio Connect is a server product from RStudio that is designed for secure sharing of R content. It is on-premises software you run behind your firewall. You keep control of your data and of who has access. With RStudio Connect, you can see all your content, decide who should be able to view and collaborate on it, tune performance, schedule updates, and view logs. You can schedule your R Markdown reports to run automatically or even distribute the latest version by email.&lt;/p&gt;
&lt;p&gt;When you publish a parameterized R Markdown report to RStudio Connect, an interface appears for selecting inputs. Viewers can create new report versions, then email themselves a copy. Collaborators can save and schedule new report versions, then email others a copy. You can even attach &lt;a href=&#34;http://docs.rstudio.com/connect/1.6.2/user/r-markdown.html#r-markdown-output-files&#34;&gt;output files&lt;/a&gt; to these versions. Using parameterized R Markdown documents in RStudio Connect is a powerful way to communicate information.&lt;/p&gt;
&lt;p&gt;You can publish content from the RStudio IDE by clicking the &lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/228270928-Push-button-publishing-to-RStudio-Connect&#34;&gt;Publish button&lt;/a&gt; that looks like a blue Eye of Horus. Pressing this button will begin the publishing process. First, it creates a set of instructions for recreating your content. Second, it deploys your content bundle to the server. Third, it recreates your content on RStudio Connect. Push-button publishing has a long history of being used with RStudio. In 2012, RStudio enabled push-button publishing of R Markdown documents to &lt;a href=&#34;https://rpubs.com/&#34;&gt;RPubs&lt;/a&gt;. In 2014, RStudio enabled push-button publishing of Shiny apps to &lt;a href=&#34;http://www.shinyapps.io/&#34;&gt;shinyapps.io&lt;/a&gt;. In 2016, RStudio enabled push-button publishing to &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;adding-shiny&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Adding Shiny&lt;/h3&gt;
&lt;p&gt;R Markdown documents are rendered with batch processing. That makes them ideal for automation, long running workflows, and custom report versions. However, if you want your documents to be immediately reactive to user input, then you can add a Shiny runtime. These &lt;a href=&#34;https://rmarkdown.rstudio.com/authoring_shiny.html&#34;&gt;interactive documents&lt;/a&gt; behave like a Shiny application in that they must be hosted. You can host &lt;a href=&#34;https://rmarkdown.rstudio.com/authoring_shiny.html&#34;&gt;interactive documents&lt;/a&gt; and &lt;a href=&#34;http://shiny.rstudio.com/&#34;&gt;Shiny applications&lt;/a&gt; with RStudio Connect. Deciding when to choose between R Markdown, interactive documents, and Shiny applications is a subject for a later post.&lt;/p&gt;
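&lt;p&gt;Adding the Shiny runtime is a one-line change to the document&amp;rsquo;s YAML header. For example:&lt;/p&gt;

```yaml
---
title: Customer Tracker
output: html_document
runtime: shiny
---
```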
&lt;/div&gt;
&lt;div id=&#34;summary&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;Reproducible code in R leads to better analysis and collaboration. You can use parameterized R Markdown reports to create complex, interactive dashboards. Hosting these dashboards securely in RStudio Connect gives you control over accessing, distributing, and processing data. You can use the R programming language, along with R Markdown reports and RStudio Connect, to build enterprise dashboards that are robust, secure, and manageable.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Click &lt;a href=&#34;https://github.com/sol-eng/customer-tracker&#34;&gt;here&lt;/a&gt; for source code.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/05/16/replacing-excel-reports-with-r-markdown-and-shiny/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Multiple Versions of R</title>
      <link>https://rviews.rstudio.com/2018/03/21/multiple-versions-of-r/</link>
      <pubDate>Wed, 21 Mar 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/03/21/multiple-versions-of-r/</guid>
      <description>
        


&lt;p&gt;&lt;img src=&#34;/post/2018-03-21-multiple-versions-r/pyramids.jpg&#34; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Data scientists prefer using the latest R packages to analyze their data. To ensure a good user experience, you will need a recent version of R running on a modern operating system. If you run R on a production server – and especially if you use RStudio Connect – plan to support multiple versions of R side by side so that your code, reports, and apps remain stable over time. You can support multiple versions of R concurrently by building R from source. Plan to install a new version of R at least once per year on your servers.&lt;/em&gt;&lt;/p&gt;

&lt;div id=&#34;a-solid-foundation-for-r&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;A solid foundation for R&lt;/h3&gt;
&lt;p&gt;Administering R on the desktop is relatively easy, because desktops are designed for a single user at a specific time. Desktop users upgrade R versions and R packages as new software becomes available, leaving old versions and packages behind. Servers, on the other hand, are designed to support multiple people who want to access content across time. Servers are increasingly used for building &lt;a href=&#34;https://rviews.rstudio.com/2017/12/20/rstudio-server-quick-start/&#34;&gt;data science labs in R&lt;/a&gt;, deploying R in production, and running R in the cloud. You may find that the same strategies you use to administer R on your desktop do not work as well on a server. In particular, upgrading your version of R must be handled differently.&lt;/p&gt;
&lt;p&gt;If you upgrade R on your server as you do on your desktop, you could easily break some apps and disrupt your teams. Administrators should exercise caution when &lt;a href=&#34;https://shiny.rstudio.com/articles/upgrade-R.html&#34;&gt;upgrading to a new version of R&lt;/a&gt; on a Linux server. Consider the following situations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You are hosting apps on RStudio Connect and Shiny Server for more than a year. When you upgrade R, you break many of your older apps.&lt;/li&gt;
&lt;li&gt;Your team is developing code on a shared instance of RStudio Server. When you upgrade R, you disrupt people’s work and break their code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Instead of upgrading your existing version of R, a better solution to these problems is to run multiple versions of R side by side. This strategy preserves past versions of R so you can &lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/r-versions.html#managing-upgrades-of-r&#34;&gt;manage upgrades&lt;/a&gt; and keep your code, apps, and reports stable over time.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;building-r-from-source&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Building R from source&lt;/h3&gt;
&lt;p&gt;The best way to run multiple versions of R side by side is to build R from source. If you are running R on a Linux server – and particularly in the enterprise – you should always build R from source, because it will help you:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Run multiple versions of R side by side&lt;/li&gt;
&lt;li&gt;Guarantee that R will work on your unique server configuration&lt;/li&gt;
&lt;li&gt;Potentially speed up certain low-level computations used by R&lt;/li&gt;
&lt;li&gt;Build technical expertise that will help you administer R at scale&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Most enterprise IT departments will be comfortable building software from source. If you have never built R from source, it is very straightforward. First, you need the build dependencies for R. If you’ve already installed R from a binary source like CRAN or EPEL, you may already have these dependencies installed; otherwise, you can run &lt;code&gt;sudo yum-builddep R&lt;/code&gt; on RedHat or &lt;code&gt;sudo apt-get build-dep r-base&lt;/code&gt; on Ubuntu. Second, you should obtain and unpack the &lt;a href=&#34;https://cran.rstudio.com/src/base/&#34;&gt;source tarball&lt;/a&gt; for the version of R you want to install from CRAN. Third, from within the extracted source directory, build R from source using &lt;code&gt;configure&lt;/code&gt;, &lt;code&gt;make&lt;/code&gt;, and &lt;code&gt;make install&lt;/code&gt; commands. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# BUILD R FROM SOURCE ON REDHAT LINUX
# R-3.4.3

# Install Linux dependencies
$ sudo yum-builddep R

# Download and extract source code
$ wget https://cran.r-project.org/src/base/R-3/R-3.4.3.tar.gz
$ tar -xzvf R-3.4.3.tar.gz
$ cd R-3.4.3

# Build R from source
$ ./configure --prefix=/opt/R/$(cat VERSION) --enable-R-shlib --with-blas --with-lapack
$ make
$ sudo make install&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This script installs R version 3.4.3 into &lt;code&gt;/opt/R/3.4.3&lt;/code&gt;, but you can install R into any of the &lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/r-versions.html#recommended-installation-directories&#34;&gt;recommended directories&lt;/a&gt;. The &lt;code&gt;--enable-R-shlib&lt;/code&gt; option is required to make the shared libraries known to RStudio. The &lt;code&gt;--with-blas&lt;/code&gt; and &lt;code&gt;--with-lapack&lt;/code&gt; options are not required, but are commonly included. These options link R against the system &lt;a href=&#34;http://www.netlib.org/blas/#_presentation&#34;&gt;BLAS&lt;/a&gt; and &lt;a href=&#34;http://www.netlib.org/lapack/#_presentation&#34;&gt;LAPACK&lt;/a&gt; libraries, which are used to speed up certain low-level math computations (e.g., multiplying and inverting matrices). These libraries will not speed up R itself, but can significantly speed up the linear-algebra routines that much R code ultimately calls.&lt;/p&gt;
&lt;p&gt;If you run into problems installing R from source, you can always remove the installation directory and start over. However, once the installation succeeds, you should never move the installation directory – in other words, always install into the final destination directory. If you run into problems with dependencies, make sure you are able to identify and install all of the required Linux libraries (e.g., the X11 library is commonly overlooked). Building R from source will be much easier with a modern operating system that is connected to the Internet.&lt;/p&gt;
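&lt;p&gt;Once installed, each version lives under its own prefix and can be invoked explicitly, which is what allows versions to coexist. A sketch, assuming the &lt;code&gt;/opt/R&lt;/code&gt; convention used above:&lt;/p&gt;

```shell
# Each version is self-contained under its own prefix
ls /opt/R

# Invoke a specific version explicitly
/opt/R/3.4.3/bin/R --version

# Optionally expose a versioned alias on the PATH
sudo ln -s /opt/R/3.4.3/bin/R /usr/local/bin/R-3.4.3
```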
&lt;p&gt;For further details about building R from source, see the &lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/r-versions.html#building-additional-versions-from-source&#34;&gt;RStudio Server Admin Guide&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;rstudio-professional-products&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;RStudio professional products&lt;/h3&gt;
&lt;p&gt;RStudio professional products automatically support multiple versions of R and provide &lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/r-versions.html#overview-3&#34;&gt;additional features&lt;/a&gt;, such as having administrators control access to multiple versions, or allowing users to choose for themselves. RStudio Connect automatically provides &lt;a href=&#34;http://docs.rstudio.com/connect/admin/r.html#r-version-matching&#34;&gt;R version matching&lt;/a&gt;. Running multiple versions of R side by side with RStudio Connect will ensure that your content persists over time.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/215488098-Installing-multiple-versions-of-R-on-Linux&#34;&gt;Installing multiple versions of R on Linux&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://shiny.rstudio.com/articles/upgrade-R.html&#34;&gt;Upgrading to a new version of R&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/212364537-Multiple-Versions-of-R-in-RStudio-Server-Pro&#34;&gt;Multiple versions of R with RStudio Server Pro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://docs.rstudio.com/shiny-server/#r_path&#34;&gt;Multiple versions of R with Shiny Server Pro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://docs.rstudio.com/connect/admin/r.html#upgrading-r&#34;&gt;Multiple versions of R with RStudio Connect&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/03/21/multiple-versions-of-r/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Deep learning at rstudio::conf 2018</title>
      <link>https://rviews.rstudio.com/2018/02/14/deep-learning-rstudio-conf-2018/</link>
      <pubDate>Wed, 14 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/02/14/deep-learning-rstudio-conf-2018/</guid>
      <description>
        

&lt;p&gt;Two weeks ago, &lt;a href=&#34;https://www.rstudio.com/conference/&#34;&gt;rstudio::conf 2018&lt;/a&gt; was held in San Diego. We had 1,100 people attend the sold-out event.  In this post, I summarize my experience of the talks on the topic of deep learning with R, including the keynote by &lt;a href=&#34;https://www.linkedin.com/profile/view?id=10843566/&#34;&gt;J.J. Allaire&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-02-13_de_Vries_deep_learning_at_rstudio_conf_files/J.J._video.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;h1 id=&#34;keynote&#34;&gt;Keynote&lt;/h1&gt;

&lt;p&gt;The keynote on the second day was J.J. Allaire discussing &amp;ldquo;Machine Learning with Tensorflow and R&amp;rdquo;. In this talk, J.J. took us on a tour of how to use TensorFlow with R.  He started with the basics, e.g., what is a tensor (it&amp;rsquo;s an array), and explained how the tensors &amp;ldquo;flow&amp;rdquo; in a computation graph in the &lt;code&gt;TensorFlow&lt;/code&gt; library. The &lt;code&gt;tensorflow&lt;/code&gt; package in R is an interface to the &lt;code&gt;TensorFlow&lt;/code&gt; library, meaning you can access the full power of TensorFlow directly from R.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-02-13_de_Vries_deep_learning_at_rstudio_conf_files/tensors_flowing.gif&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;For several years, there has been a great deal of hype about deep learning, with multiple libraries (primarily written in Python and C++). Of these libraries, TensorFlow seems to get the dominant share of interest. R has always been a language that excels in its ability to interact with other languages, including Fortran, C++, and now Python. With the release of the &lt;code&gt;tensorflow&lt;/code&gt; package, R users can make full use of &lt;em&gt;all&lt;/em&gt; of the functions in TensorFlow.&lt;/p&gt;

&lt;p&gt;Advances in deep learning, including algorithms, GPU computing, and availability of large data sets, have combined for the enormous success of deep learning in many fields. This includes near-human-level performance in the fields of image classification, speech recognition, and machine translation, to name a few.&lt;/p&gt;

&lt;p&gt;However, J.J. points out that TensorFlow is quite a low-level mathematical library, and that most practitioners would benefit from writing their neural network code using &lt;code&gt;keras&lt;/code&gt;, a package that exposes a high-level API. Keras supports multiple back ends, including TensorFlow, CNTK and Theano. You can find out more at the &lt;a href=&#34;https://keras.rstudio.com/&#34;&gt;&lt;code&gt;keras&lt;/code&gt; package page&lt;/a&gt;.&lt;/p&gt;
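&lt;p&gt;To give a flavor of that high-level API, here is a minimal &lt;code&gt;keras&lt;/code&gt; model definition in R (a sketch, assuming the &lt;code&gt;keras&lt;/code&gt; package and a TensorFlow backend are installed; the input shape of 784 assumes flattened 28x28 images, as in MNIST):&lt;/p&gt;

```r
library(keras)

# Define a small feed-forward network for 10-class classification
model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(784)) %>%
  layer_dense(units = 10, activation = "softmax")

# Compile with an optimizer, loss, and metric before training
model %>% compile(
  optimizer = "rmsprop",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
```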

&lt;p&gt;J.J. concluded his talk by demonstrating several ways to deploy a &lt;code&gt;keras&lt;/code&gt; or &lt;code&gt;tensorflow&lt;/code&gt; model, including publishing to RStudio Connect.&lt;/p&gt;

&lt;p&gt;To find out more about J.J.&amp;rsquo;s talk, you can watch the &lt;a href=&#34;https://www.youtube.com/watch?v=atiYXm7JZv0&#34;&gt;keynote video&lt;/a&gt; or view the &lt;a href=&#34;https://rstd.io/ml-with-tensorflow-and-r/&#34;&gt;slides&lt;/a&gt;. You can also download the &lt;a href=&#34;https://github.com/rstudio/cheatsheets/raw/master/keras.pdf&#34;&gt;&lt;code&gt;keras&lt;/code&gt; cheat sheet&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&#34;other-talks&#34;&gt;Other talks&lt;/h1&gt;

&lt;p&gt;Following the keynote, the conference split into several tracks. I attended Session 1, &amp;ldquo;interop&amp;rdquo;, which focused on interoperability between R and several deep-learning frameworks, including deployment options.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The first talk in this session was by &lt;a href=&#34;https://www.linkedin.com/in/michaelquinn32?lipi=urn%3Ali%3Apage%3Ad_flagship3_profile_view_base%3B87KZ5Uq%2FQxSpdhxC0jwFkg%3D%3D&#34;&gt;Michael Quinn&lt;/a&gt; from Google. Michael discussed &amp;ldquo;large-scale machine learning using TensorFlow, BigQuery and Cloud ML&amp;rdquo;. Once you have a &lt;code&gt;keras&lt;/code&gt; or &lt;code&gt;tensorflow&lt;/code&gt; model, you can deploy it to &lt;a href=&#34;https://cloud.google.com/ml-engine/&#34;&gt;Google Cloud Machine Learning&lt;/a&gt; (Cloud ML). What I find interesting about this is that Cloud ML is a service designed for machine learning. Using this service, you can deploy models without having to stand up a virtual machine first.  You can do this deployment using R code with the &lt;a href=&#34;https://tensorflow.rstudio.com/tools/cloudml/articles/getting_started.html&#34;&gt;&lt;code&gt;cloudml&lt;/code&gt; package&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The next talk was by Javier Luraschi from RStudio, who spoke about &amp;ldquo;Deploying TensorFlow models with &lt;code&gt;tfdeploy&lt;/code&gt;&amp;rdquo;. The &lt;a href=&#34;https://tensorflow.rstudio.com/tools/tfdeploy/articles/introduction.html&#34;&gt;&lt;code&gt;tfdeploy&lt;/code&gt; package&lt;/a&gt; exposes a unified way to deploy models to several platforms, including &lt;a href=&#34;https://www.tensorflow.org/serving/&#34;&gt;TensorFlow Serving&lt;/a&gt;, &lt;a href=&#34;https://tensorflow.rstudio.com/tools/cloudml/&#34;&gt;Cloud ML&lt;/a&gt;, and &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt;. Javier made his talk available as &lt;a href=&#34;http://rpubs.com/jluraschi/deploying-tensorflow-rstudio-conf&#34;&gt;slides&lt;/a&gt; as well as &lt;a href=&#34;https://github.com/rstudio/rstudio-conf/tree/master/2018/Deploying_TensorFlow_Models--Javier%20Luraschi&#34;&gt;code&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The final presentation was by &lt;a href=&#34;https://www.linkedin.com/in/alikzaidi/&#34;&gt;Ali Zaidi&lt;/a&gt; from Microsoft, who talked about &amp;ldquo;Reinforcement learning in Minecraft with CNTK-R&amp;rdquo;. Ali showed how he trained a deep-learning model to control an agent in &lt;a href=&#34;https://minecraft.net/en-us/&#34;&gt;Minecraft&lt;/a&gt;, the popular online game. In his experiment, he taught the agent to navigate a maze, as well as understand some natural language, e.g., &amp;ldquo;Pick up the red flowers&amp;rdquo;. He used the &lt;a href=&#34;https://github.com/Microsoft/CNTK-R&#34;&gt;&lt;code&gt;CNTK-R&lt;/code&gt; package&lt;/a&gt;, which wraps the &lt;a href=&#34;https://github.com/microsoft/cntk&#34;&gt;Microsoft Cognitive Toolkit (CNTK)&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;In conclusion, I&amp;rsquo;ll quote directly from J.J. Allaire&amp;rsquo;s keynote, in which he describes the key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;TensorFlow is a new general-purpose numerical-computing library with lots to offer the R community.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Deep learning has made great progress and will likely increase in importance in various fields in the coming years.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;R now has a great set of APIs and supporting tools for using TensorFlow and doing deep learning.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/02/14/deep-learning-rstudio-conf-2018/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Package Management for Reproducible R Code</title>
      <link>https://rviews.rstudio.com/2018/01/18/package-management-for-reproducible-r-code/</link>
      <pubDate>Thu, 18 Jan 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/01/18/package-management-for-reproducible-r-code/</guid>
      <description>
        

&lt;p&gt;Any programming environment should be optimized for its task, and not all tasks are alike.  For example, if you are exploring uncharted mountain ranges, the portability of a tent is essential.  However, when building a house to weather hurricanes, investing in a strong foundation is important. Similarly, when beginning a new data science programming project, it is prudent to assess how much effort should be put into ensuring the code is reproducible.&lt;/p&gt;

&lt;p&gt;Note that it is certainly possible to go back later and &amp;ldquo;shore up&amp;rdquo; the reproducibility of a project where it is weak. This is often the case when an &amp;ldquo;ad-hoc&amp;rdquo; project becomes an important production analysis. However, the first step in starting a project is to make a decision regarding the trade-off between the amount of time to set up the project and the probability that the project will need to be reproducible in arbitrary environments.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-01-17-package-management-for-reproducible-r-code_files/spectrum-notext.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;challenges&#34;&gt;Challenges&lt;/h2&gt;

&lt;p&gt;It is important to understand the reasons that reproducible programming is challenging. Once programming practices and external data are taken into account, the primary difficulty is dependency management over time.  Dependency management is important because dependencies are so essential to R development.  R has a fast-moving community and many extremely valuable packages to make your work more effective and efficient.&lt;/p&gt;

&lt;p&gt;You will typically want to ensure that you are using recent versions of packages for a new project.  By extension, this will require a recent operating system and a recent version of R.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The best place to start is with a recent operating system and a recent version of R&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Typically, this equates to upgrading R to the latest version once or twice per year, and upgrading your operating system to a new major version every two to three years.&lt;/p&gt;

&lt;p&gt;Despite the upsides of a vibrant package ecosystem, R programmers are familiar with the pain that can come with the many (very useful) packages that change, break, and are deprecated over time.  Good dependency management ensures your project can be recomputed again in another time or another place.&lt;/p&gt;

&lt;h2 id=&#34;solutions&#34;&gt;Solutions&lt;/h2&gt;

&lt;p&gt;R package management is where most reproducibility decision-making needs to happen, although we will mention system dependencies shortly.  CRAN archives source code for all versions of R packages, past and present.  As a result, it is always possible to rebuild from source for package versions that you used to build an analysis (even on different operating systems).  &lt;em&gt;How&lt;/em&gt; you keep track of the dependencies that you used will establish how reproducible your analysis is.  As we indicated before, there is a spectrum along which you might fall.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-01-17-package-management-for-reproducible-r-code_files/spectrum-ex.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;h3 id=&#34;ignoring-reproducibility&#34;&gt;Ignoring Reproducibility&lt;/h3&gt;

&lt;p&gt;There are occasionally times of rapid exploration where the simplest solution is to ignore reproducibility.&lt;/p&gt;

&lt;p&gt;Many R developers opt for a single massive system library of R packages and no record of what packages they used for an analysis.  Even then, it is recommended to use &amp;ldquo;&lt;a href=&#34;https://www.tidyverse.org/articles/2017/12/workflow-vs-script/&#34;&gt;RStudio Projects&lt;/a&gt;&amp;rdquo; (if you are using the RStudio IDE) and to keep your code under version control in git or another version-control system.  This approach is optimal for exploring because it involves almost no setup, and gets the programmer into the problem immediately.&lt;/p&gt;

&lt;p&gt;However, even with code version control, it can be very challenging to reproduce a result without documentation of the package versions that were in use when the code was checked in.  Further, if one project updates a package that another project was using, it is possible to have the two projects conflict on version dependencies, and one or both can break.&lt;/p&gt;

&lt;p&gt;When exploration begins to stabilize, it is best to establish a reproducible environment.  You can always capture dependencies at a given time with &lt;code&gt;sessionInfo()&lt;/code&gt; or &lt;code&gt;devtools::session_info()&lt;/code&gt;, but this does not make it easy to rebuild your dependency tree.&lt;/p&gt;
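&lt;p&gt;For example, a lightweight way to document your environment is to write the session information out alongside the analysis, using only base R:&lt;/p&gt;

```r
# Write a human-readable record of the R version, platform, and loaded
# package versions next to the analysis. This is useful for audits, but it
# is a log, not a lockfile you can rebuild an environment from.
writeLines(capture.output(sessionInfo()), "session-info.txt")
```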

&lt;h3 id=&#34;tracking-package-dependencies-per-project&#34;&gt;Tracking Package Dependencies per Project&lt;/h3&gt;

&lt;p&gt;Tracking dependencies per project isolates package versions at a project level and avoids using the system library.  &lt;a href=&#34;https://rstudio.github.io/packrat/&#34;&gt;&lt;code&gt;packrat&lt;/code&gt;&lt;/a&gt; and &lt;a href=&#34;https://cran.r-project.org/web/packages/checkpoint/vignettes/checkpoint.html&#34;&gt;&lt;code&gt;checkpoint&lt;/code&gt;&lt;/a&gt;/&lt;a href=&#34;https://mran.microsoft.com/timemachine&#34;&gt;&lt;code&gt;MRAN&lt;/code&gt;&lt;/a&gt; both take this approach, so we will discuss each separately.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Programmers in other languages will be familiar with &lt;a href=&#34;https://rstudio.github.io/packrat/&#34;&gt;&lt;code&gt;packrat&lt;/code&gt;&lt;/a&gt;&amp;rsquo;s approach to storing the exact versions of packages that the project uses in a text file (&lt;code&gt;packrat.lock&lt;/code&gt;).  It works for CRAN, GitHub, and local packages, and provides a high level of reproducibility.  However, a fair amount of time is spent building packages from source, re-installing packages into the local project&amp;rsquo;s folder, and downloading the source code for packages.  Fortunately, &lt;code&gt;packrat&lt;/code&gt; has a &amp;ldquo;global cache&amp;rdquo; that can speed things up by symlinking package versions that have been installed elsewhere on the system.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://mran.microsoft.com/timemachine&#34;&gt;&lt;code&gt;MRAN&lt;/code&gt;&lt;/a&gt; and &lt;a href=&#34;https://cran.r-project.org/web/packages/checkpoint/vignettes/checkpoint.html&#34;&gt;&lt;code&gt;checkpoint&lt;/code&gt;&lt;/a&gt; also take the library-per-project approach, but focus on CRAN packages and determine dependencies based on the &amp;ldquo;snapshot&amp;rdquo; of CRAN that Microsoft stored on a given day.  The programmer need only record the &amp;ldquo;checkpoint&amp;rdquo; date they are referencing to keep track of package versions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
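
&lt;p&gt;To make the &lt;code&gt;packrat&lt;/code&gt; approach concrete, a typical workflow looks roughly like the sketch below (it assumes the &lt;code&gt;packrat&lt;/code&gt; package is installed, and the project path is a placeholder):&lt;/p&gt;

```r
# Sketch of a typical packrat workflow (requires the packrat package).
library(packrat)
packrat::init("~/my-project")   # create a private, per-project package library
install.packages("dplyr")       # now installs into the project library
packrat::snapshot()             # record exact versions in packrat.lock
# Later, on another machine or after cloning the project:
packrat::restore()              # rebuild the library from packrat.lock
```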

&lt;p&gt;Both packages leverage up-front work to make reproducing an analysis quite straightforward later, but it is worth noting the differences between them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;packrat&lt;/code&gt; keeps tabs on the packages installed in your project folder and presumes that they form a complete, working, and self-consistent whole.  It also downloads package sources to your computer for future re-compiling, if necessary.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;checkpoint&lt;/code&gt; chooses package versions based on a given day in MRAN history.  This presumes that all of the package versions you need were available on CRAN on that day, and that CRAN was in a self-consistent state that day.  If you want to update a package, you will need to choose a more recent date and re-install all of your packages from that snapshot to be sure that none of them break.&lt;/li&gt;
&lt;/ul&gt;
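
&lt;p&gt;With &lt;code&gt;checkpoint&lt;/code&gt;, the equivalent setup is a single call near the top of a script naming the MRAN snapshot date (this sketch assumes the &lt;code&gt;checkpoint&lt;/code&gt; package is installed; the date is an example):&lt;/p&gt;

```r
# Sketch of using checkpoint (requires the checkpoint package).
library(checkpoint)
# Scan the project for packages in use and install the versions that
# MRAN's snapshot of CRAN held on this date.
checkpoint("2018-01-17")
```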

&lt;h3 id=&#34;tracking-all-dependencies-per-project&#34;&gt;Tracking All Dependencies per Project&lt;/h3&gt;

&lt;p&gt;When it comes to other system libraries or dependencies, containers are one of the most popular solutions for reproducibility.  Containers behave like virtual machines but are far more lightweight, which makes them a good fit for reproducible data science.  To give containers a shot, you can &lt;a href=&#34;https://docs.docker.com/engine/installation/&#34;&gt;install docker&lt;/a&gt; and then take a look at the &lt;a href=&#34;https://www.rocker-project.org/&#34;&gt;rocker project&lt;/a&gt; (R on docker).&lt;/p&gt;

&lt;p&gt;At a high level, Docker saves a snapshot called an &amp;ldquo;image&amp;rdquo; that includes all of the software necessary to complete a task.  A running &amp;ldquo;image&amp;rdquo; is called a &amp;ldquo;container.&amp;rdquo;  These images are extensible, so that you can more easily build an image that has the dependencies you need for a given project.  For instance, to use the tidyverse, you might execute the following:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;docker pull rocker/tidyverse
docker run -d --name=my-r-container -p 8787:8787 rocker/tidyverse
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can then get an interactive terminal with &lt;code&gt;docker exec -it my-r-container bash&lt;/code&gt;, or open RStudio in the browser by going to &lt;code&gt;localhost:8787&lt;/code&gt; and authenticating with user:pass &lt;code&gt;rstudio:rstudio&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It is important to consider the difficulty of maintaining package dependencies within the image.  If your &lt;code&gt;Dockerfile&lt;/code&gt; installs packages from CRAN or GitHub, the regeneration of your image will still be susceptible to changes in the published version of a package.  As a result, it is advisable to pair up &lt;code&gt;packrat&lt;/code&gt; with Docker for complete dependency management.&lt;/p&gt;

&lt;p&gt;A simple &lt;code&gt;Dockerfile&lt;/code&gt; like the following will copy the current project folder into the &lt;code&gt;rstudio&lt;/code&gt; user&amp;rsquo;s home (within the container) and install the necessary dependencies using &lt;code&gt;packrat&lt;/code&gt;.  It requires using &lt;code&gt;packrat&lt;/code&gt; for the project.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;FROM rocker/rstudio

# install packrat
RUN R -e &#39;install.packages(&amp;quot;packrat&amp;quot;, repos=&amp;quot;http://cran.rstudio.com&amp;quot;, dependencies=TRUE, lib=&amp;quot;/usr/local/lib/R/site-library&amp;quot;);&#39;

USER rstudio

# copy lock file &amp;amp; install deps
COPY --chown=rstudio:rstudio packrat/packrat.* /home/rstudio/project/packrat/
RUN R -e &#39;packrat::restore(project=&amp;quot;/home/rstudio/project&amp;quot;);&#39;

# copy the rest of the directory
# .dockerignore can ignore some files/folders if desirable
COPY --chown=rstudio:rstudio . /home/rstudio/project

USER root
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then the following will get your image started, much like the tidyverse example above.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;docker build --tag=my-test-image .
docker run --rm -d --name=my-test-container -p 8787:8787 my-test-image
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that doing more complex work typically involves a bit of foresight, familiarity with design conventions, and the creation of a custom &lt;code&gt;Dockerfile&lt;/code&gt;.  However, this up-front work is rewarded by a full operating-system snapshot, including all system and package dependencies.  As a result, Docker provides optimal reproducibility for an analysis.&lt;/p&gt;

&lt;h2 id=&#34;how-certain-do-you-need-to-be-that-your-code-is-reproducible&#34;&gt;How certain do you need to be that your code is reproducible?&lt;/h2&gt;

&lt;p&gt;Notebooks are a necessary and increasingly popular starting point for discussions of reproducibility.  However, if the aim is to recompute results at another time or place, we cannot stop there.&lt;/p&gt;

&lt;p&gt;When it comes to the management of packages and other system dependencies, you will need to decide whether you want to spend more time setting up a reproducible environment, or if you want to start exploring immediately.  Whether you are putting up a tent for the night or building a house that future generations will enjoy, there are plenty of tools to help you on your way and assist you if you ever need to change course.&lt;/p&gt;

&lt;p&gt;In future posts, I hope to explore additional aspects of reproducibility.&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/01/18/package-management-for-reproducible-r-code/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>A Data Science Lab for R</title>
      <link>https://rviews.rstudio.com/2017/12/20/rstudio-server-quick-start/</link>
      <pubDate>Wed, 20 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/12/20/rstudio-server-quick-start/</guid>
      <description>
        


&lt;p&gt;In a &lt;a href=&#34;https://rviews.rstudio.com/2017/06/21/analytics-administration-for-r/&#34;&gt;previous post&lt;/a&gt; I described the role of analytic administrator as a data scientist who: onboards new tools, deploys solutions, supports existing standards, and trains other data scientists. In this post I will describe how someone in that role might set up a data science lab for R.&lt;/p&gt;
&lt;div id=&#34;architecture&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Architecture&lt;/h3&gt;
&lt;p&gt;A data science lab is an environment for developing code and creating content. It should enhance the productivity of your data scientists and integrate with your existing systems. Your data science lab might live on your premises or in the cloud. It might be built with hardware, virtual machines, or containers. You may use it to support a single data scientist or hundreds of R developers. Here is one reference architecture of a data science lab based on server instances.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-12-20-DS-lab_files/rsp-setup-small.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;Key components of this setup include: authentication; load balancing; a testing environment; data connectivity; and a publishing platform. In this server-based architecture, data scientists use a web browser to access the data science lab. High performance compute and live data reside securely behind a firewall.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;instance-sizing&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Instance Sizing&lt;/h3&gt;
&lt;p&gt;The size of your server instance depends on how many concurrent sessions you run and how large your sessions are. Keep in mind that R is single threaded by default and holds data in memory. Here is a list of example server sizes:&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col width=&#34;27%&#34; /&gt;
&lt;col width=&#34;9%&#34; /&gt;
&lt;col width=&#34;9%&#34; /&gt;
&lt;col width=&#34;54%&#34; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;Instance Size&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;Cores&lt;/th&gt;
&lt;th align=&#34;center&#34;&gt;RAM&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;&lt;p&gt;Minimum recommended&lt;/p&gt;&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;&lt;p&gt;2&lt;/p&gt;&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;&lt;p&gt;4G&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;This server will be for lightweight jobs, testing, and sandboxing.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;&lt;p&gt;Small&lt;/p&gt;&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;&lt;p&gt;4&lt;/p&gt;&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;&lt;p&gt;8G&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;This server will support one or two analysts with small data.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;&lt;p&gt;Large&lt;/p&gt;&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;&lt;p&gt;16&lt;/p&gt;&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;&lt;p&gt;256G&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;This server will support 15 analysts with a blend of large and small sessions. Alternatively, it will support dozens of analysts with small sessions.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;&lt;p&gt;Jumbo&lt;/p&gt;&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;&lt;p&gt;32+&lt;/p&gt;&lt;/td&gt;
&lt;td align=&#34;center&#34;&gt;&lt;p&gt;1T+&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;p&gt;May be useful for heavier workloads.&lt;/p&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;div id=&#34;open-source-r&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Open-Source R&lt;/h3&gt;
&lt;p&gt;If you haven’t done so already, I recommend you &lt;a href=&#34;https://rviews.rstudio.com/2016/11/16/make-r-a-legitimate-part-of-your-organization/&#34;&gt;make R a legitimate part of your organization&lt;/a&gt; by officially recognizing it as an analytic standard. You should be familiar with installing and managing R and its packages.&lt;/p&gt;
&lt;p&gt;You can install R as a pre-compiled binary from a repository, or you can install R from source. Installing R from source allows you to install &lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/r-versions.html#installing-multiple-versions-of-r&#34;&gt;multiple versions of R side by side&lt;/a&gt;. If you compile R from source, I recommend you link to the &lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/r-versions.html#building-additional-versions-from-source&#34;&gt;BLAS libraries&lt;/a&gt; so that you can speed up certain low-level math computations.&lt;/p&gt;
&lt;p&gt;Data science labs tend to require a modern toolkit. You should expect to upgrade R at least once a year. You should also keep your operating system up to date. New and improved R packages tend to work better when you use them with recent versions of R and updated system libraries.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;rstudio-server-pro&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;RStudio Server Pro&lt;/h3&gt;
&lt;p&gt;Building a data science lab involves installing, configuring, and managing tools. In this section I will describe how to administer RStudio Server Pro, which has features for authentication, security, and admin controls.&lt;/p&gt;
&lt;div id=&#34;installation&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;1. Installation&lt;/h4&gt;
&lt;p&gt;Once you have installed R, you can install &lt;a href=&#34;https://www.rstudio.com/products/rstudio-server-pro/&#34;&gt;RStudio Server Pro&lt;/a&gt; by downloading the binaries and following the instructions. You will need root privileges to install and run the software. You will also need to create local system accounts for all of your R developers.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;configuration&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;2. Configuration&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/authenticating-users.html&#34;&gt;Authentication&lt;/a&gt;.&lt;/strong&gt; The first thing you will want to do after you install RStudio Server Pro is to configure it with your authentication system. RStudio Server Pro supports LDAP via PAM sessions. If you use single sign on or another system, you can configure RStudio Server Pro to work in proxied auth mode. You can also authenticate via Google accounts and local system accounts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/data-connectivity.html&#34;&gt;Data Connectivity&lt;/a&gt;.&lt;/strong&gt; Most data scientists use R with databases. The &lt;a href=&#34;https://www.rstudio.com/products/drivers/&#34;&gt;RStudio Pro Drivers&lt;/a&gt; are ODBC drivers that will connect R to some of the most popular databases today. These drivers are a free add-on for RStudio Server Pro. If you are using a data source that is not supported, or if you are using the open source version of RStudio Server, you can bring your own ODBC driver.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/load-balancing.html&#34;&gt;Load Balancing&lt;/a&gt;.&lt;/strong&gt; If you want to load balance your server instances, you can use the load balancer that is built into RStudio Server Pro or you can bring your own load balancer. Load balancing is designed to balance user sessions seamlessly across the cluster and provide high availability. It requires a shared home drive that is mounted to each one of the instances.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;https://www.rstudio.com/products/rstudio-server-pro/&#34;&gt;More Features&lt;/a&gt;.&lt;/strong&gt; RStudio Server Pro has a list of features that you can configure. You should decide which features you want to enable or disable. For more information on configuring each of these features, see the &lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/&#34;&gt;admin guide&lt;/a&gt;.&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col width=&#34;38%&#34; /&gt;
&lt;col width=&#34;61%&#34; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;&lt;p&gt;Authentication&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;LDAP, Active Directory, Google Accounts and system accounts&lt;/li&gt;
&lt;li&gt;Full support for Pluggable Authentication Modules, Kerberos via PAM, and custom authentication via proxied HTTP header&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;&lt;p&gt;Data Connectivity&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;RStudio Professional Drivers are ODBC data connectors that help you connect to some of the most popular databases.&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;&lt;p&gt;Load Balancing&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Load balance R sessions across two or more servers&lt;/li&gt;
&lt;li&gt;Ensure high availability using multiple masters&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;&lt;p&gt;Enhanced security&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Encrypt traffic using SSL and restrict client IP addresses&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;&lt;p&gt;Administrative dashboard&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Monitor active sessions and their CPU and memory utilization&lt;/li&gt;
&lt;li&gt;Suspend, forcibly terminate, or assume control of any active session&lt;/li&gt;
&lt;li&gt;Review historical usage and server logs&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;&lt;p&gt;Auditing and monitoring&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Monitor server resources (CPU, memory, etc.) on both a per-user and system-wide basis&lt;/li&gt;
&lt;li&gt;Send metrics to external systems with the Graphite/Carbon plaintext protocol&lt;/li&gt;
&lt;li&gt;Health check with configurable output (custom XML, JSON)&lt;/li&gt;
&lt;li&gt;Audit all R console activity by writing input and output to a central location&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;&lt;p&gt;Advanced R session management&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Tailor the version of R, reserve CPU, prioritize scheduling and limit resources by User and Group&lt;/li&gt;
&lt;li&gt;Provision accounts and mount home directories dynamically via the PAM Session API&lt;/li&gt;
&lt;li&gt;Automatically execute per-user profile scripts for database and cluster connectivity&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;&lt;p&gt;Project sharing&lt;/p&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Share projects &amp;amp; edit code files simultaneously with others&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;div id=&#34;management&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;3. Management&lt;/h4&gt;
&lt;p&gt;Once RStudio Server Pro is installed and configured, you’ll need to manage it over time. RStudio Server Pro comes with a variety of tools for workspace and server management that will help keep your environment organized. For example, you can kill sessions, set session timeouts, and broadcast notifications to user sessions in real-time. You can manage product &lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/license-management.html&#34;&gt;licenses&lt;/a&gt; for both online and offline environments. If your instances start and stop frequently you can opt for using a &lt;a href=&#34;http://docs.rstudio.com/ide/server-pro/license-management.html#floating-licensing&#34;&gt;floating license&lt;/a&gt; manager.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;next-steps&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Next Steps&lt;/h3&gt;
&lt;p&gt;Your data science lab for R should be designed to scale. That might mean adding more people, more systems, or more tools. It also might mean creating more content. &lt;a href=&#34;http://shiny.rstudio.com/&#34;&gt;Shiny&lt;/a&gt; is an R package that makes it easy to build interactive web apps straight from R. &lt;a href=&#34;http://rmarkdown.rstudio.com/&#34;&gt;R Markdown&lt;/a&gt; is an R package that makes it easy to author reports and build dashboards. You can publish your Shiny apps or R Markdown reports with the push of a button to &lt;a href=&#34;https://www.rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt;. RStudio Connect lets you share and manage content in one convenient place. You can also publish Shiny apps to &lt;a href=&#34;http://shinyapps.io&#34;&gt;shinyapps.io&lt;/a&gt;, which allows you to share your Shiny apps online.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/12/20/rstudio-server-quick-start/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Using Shiny with Scheduled and Streaming Data</title>
      <link>https://rviews.rstudio.com/2017/11/15/shiny-and-scheduled-data-r/</link>
      <pubDate>Wed, 15 Nov 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/11/15/shiny-and-scheduled-data-r/</guid>
      <description>
        

&lt;p&gt;&lt;em&gt;Note: This article is now several years old. If you have RStudio Connect, there are more &lt;a href=&#34;https://medium.com/@kelly.obriant/basic-builds-how-to-update-data-in-a-shiny-app-on-rstudio-connect-48593902b1e2&#34;&gt;modern ways of updating data in a Shiny app&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Shiny applications are often backed by fluid, changing data. Data updates can occur at different time scales: from scheduled daily updates to live streaming data and ad-hoc user inputs. This article describes best practices for handling data updates in Shiny, and discusses deployment strategies for automating data updates.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2017-11-15-shiny-and-scheduled-data/rviews_scheduled_shiny.002.jpeg&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;This post builds off of a 2017 rstudio::conf talk. The recording of the &lt;a href=&#34;https://www.rstudio.com/resources/videos/dashboards-made-easy/&#34;&gt;original talk&lt;/a&gt; and the &lt;a href=&#34;https://github.com/slopp/scheduledsnow&#34;&gt;sample code&lt;/a&gt; for this post are available.&lt;/p&gt;

&lt;p&gt;The end goal of this example is a dashboard to help skiers in Colorado select a resort to visit. Recommendations are based on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Snow reports that provide useful metrics like number of runs open and amount of new snow. Snow reports are updated &lt;strong&gt;daily&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Weather data, updated in &lt;strong&gt;near real-time from a live stream&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;User preferences, entered in the dashboard.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The backend for the dashboard looks like:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2017-11-15-shiny-and-scheduled-data/rviews_scheduled_shiny.003.jpeg&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;automate-scheduled-data-updates&#34;&gt;Automate Scheduled Data Updates&lt;/h2&gt;

&lt;p&gt;The first challenge is preparing the daily data. In this case, the data preparation requires a series of API requests and then basic data cleansing. The code for this process is written &lt;strong&gt;into an R Markdown document&lt;/strong&gt;, alongside process documentation and a few simple graphs that help validate the new data. The R Markdown document ends by saving the cleansed data into a shared data directory. The entire R Markdown document is scheduled for execution.&lt;/p&gt;
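
&lt;p&gt;The final chunk of such a document might look like the following sketch; the data frame and output path here are placeholders, not the real snow-report pipeline:&lt;/p&gt;

```r
# Sketch of the last chunk of the scheduled R Markdown document:
# save the cleansed data where the Shiny app can read it.
snow_report = data.frame(resort    = c("Vail", "Keystone"),
                         runs_open = c(112, 74))
out_path = file.path(tempdir(), "snow_report.csv")  # stand-in for shared storage
write.csv(snow_report, out_path, row.names = FALSE)
```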

&lt;p&gt;It may seem odd at first to use an R Markdown document as the scheduled task. However, our team has found it incredibly useful to be able to look back through historical renderings of the &amp;ldquo;report&amp;rdquo; to gut-check the process. Using R Markdown also forces us to properly document the scheduled process.&lt;/p&gt;

&lt;p&gt;We use RStudio Connect to easily schedule the document, view past historical renderings, and ultimately to host the application. If the job fails, Connect also sends us an email containing &lt;code&gt;stdout&lt;/code&gt; from the render, which helps us stay on top of errors. (Connect can optionally send the successfully rendered report, as well.) However, the same scheduling could be accomplished with a workflow tool or even CRON.&lt;/p&gt;

&lt;p&gt;Make sure the data, written to shared storage, is readable by the user running the Shiny application - typically a service account like &lt;code&gt;rstudio-connect&lt;/code&gt; or &lt;code&gt;shiny&lt;/code&gt; can be set as the run-as user to ensure consistent behavior.&lt;/p&gt;

&lt;p&gt;Alternatively, instead of writing results to the file system, prepped data can be saved to a view in a database.&lt;/p&gt;

&lt;h2 id=&#34;using-scheduled-data-in-shiny&#34;&gt;Using Scheduled Data in Shiny&lt;/h2&gt;

&lt;p&gt;The dashboard needs to look for updates to the underlying shared data and automatically update when the data changes. (It wouldn&amp;rsquo;t be a very good dashboard if users had to refresh a page to see new data.) In Shiny, this behavior is accomplished with the &lt;code&gt;reactiveFileReader&lt;/code&gt; function:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;daily_data &amp;lt;- reactiveFileReader(
  intervalMillis = 100,
  filePath       = &#39;path/to/shared/data&#39;,
  readFunc       = readr::read_csv
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The function checks the shared data file&amp;rsquo;s update timestamp every &lt;code&gt;intervalMillis&lt;/code&gt; milliseconds to see if the data has changed. If the data has changed, the file is re-read using &lt;code&gt;readFunc&lt;/code&gt;. The resulting data object, &lt;code&gt;daily_data&lt;/code&gt;, is reactive and can be used in downstream &lt;code&gt;render*&lt;/code&gt; functions such as &lt;code&gt;renderPlot&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If the cleansed data is stored in a database instead of written to a file in shared storage, use &lt;code&gt;reactivePoll&lt;/code&gt;. &lt;code&gt;reactivePoll&lt;/code&gt; is similar to &lt;code&gt;reactiveFileReader&lt;/code&gt;, but instead of checking the file&amp;rsquo;s update timestamp, a second function needs to be supplied that identifies when the database is updated. The function&amp;rsquo;s &lt;a href=&#34;https://shiny.rstudio.com/reference/shiny/latest/reactivePoll.html&#34;&gt;help documentation&lt;/a&gt; includes an example.&lt;/p&gt;
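
&lt;p&gt;For orientation, a &lt;code&gt;reactivePoll&lt;/code&gt; version might look like the following sketch; the connection object, table, and column names are hypothetical stand-ins for your own database:&lt;/p&gt;

```r
# Sketch of reactivePoll against a database (names are hypothetical).
# checkFunc runs every intervalMillis; when its return value changes,
# valueFunc re-reads the full dataset.
daily_data = reactivePoll(
  intervalMillis = 10000,
  session        = session,
  checkFunc      = function() {
    DBI::dbGetQuery(con, "SELECT MAX(updated_at) FROM snow_report")
  },
  valueFunc      = function() {
    DBI::dbGetQuery(con, "SELECT * FROM snow_report")
  }
)
```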

&lt;h2 id=&#34;streaming-data&#34;&gt;Streaming Data&lt;/h2&gt;

&lt;p&gt;The second challenge is updating the dashboard with live streaming weather data. One way for Shiny to ingest a stream of data is by turning the stream into &amp;ldquo;micro-batches&amp;rdquo;. The &lt;code&gt;invalidateLater&lt;/code&gt; function can be used for this purpose:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;liveish_data &amp;lt;- reactive({
  invalidateLater(100)
  httr::GET(...)
})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This causes Shiny to poll the streaming API every 100 milliseconds for new data. The results are available in the reactive data object &lt;code&gt;liveish_data&lt;/code&gt;. Picking how often to poll for data depends on a few factors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Does the upstream API enforce rate limits?&lt;/li&gt;
&lt;li&gt;How long does a data update take? The application will be blocked while it polls data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal is to pick a polling time that balances the user&amp;rsquo;s desire for &amp;ldquo;live&amp;rdquo; data with these two concerns.&lt;/p&gt;

&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;To summarize, this architecture provides a number of benefits: no more painful, manual running of R code every day; dashboard code is isolated from data-prep code; and there is enough flexibility to meet user requirements for live and daily data while preventing unnecessary number crunching on the backend.&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/11/15/shiny-and-scheduled-data-r/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Database Queries With R</title>
      <link>https://rviews.rstudio.com/2017/10/18/database-queries-with-r/</link>
      <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/10/18/database-queries-with-r/</guid>
      <description>
        


&lt;p&gt;There are many ways to query data with R. This post shows you three of the most common ways:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Using &lt;code&gt;DBI&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;dplyr&lt;/code&gt; syntax&lt;/li&gt;
&lt;li&gt;Using R Notebooks&lt;/li&gt;
&lt;/ol&gt;
&lt;div id=&#34;background&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Background&lt;/h3&gt;
&lt;p&gt;Several recent package improvements make it easier for you to use databases with R. The query examples below demonstrate some of the capabilities of these R packages.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://rstats-db.github.io/DBI//index.html&#34;&gt;DBI&lt;/a&gt;. The &lt;code&gt;DBI&lt;/code&gt; specification has gone through many &lt;a href=&#34;https://www.r-consortium.org/blog/2017/05/15/improving-dbi-a-retrospect&#34;&gt;recent improvements&lt;/a&gt;. When working with databases, you should always use packages that are &lt;code&gt;DBI&lt;/code&gt;-compliant.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://dplyr.tidyverse.org/&#34;&gt;dplyr&lt;/a&gt; &amp;amp; &lt;a href=&#34;http://dbplyr.tidyverse.org/&#34;&gt;dbplyr&lt;/a&gt;. The &lt;code&gt;dplyr&lt;/code&gt; package now has a generalized SQL backend for talking to databases, and the new &lt;code&gt;dbplyr&lt;/code&gt; package translates R code into database-specific variants. As of this writing, SQL variants are supported for the following databases: Oracle, Microsoft SQL Server, PostgreSQL, Amazon Redshift, Apache Hive, and Apache Impala. More will follow over time.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/rstats-db/odbc&#34;&gt;odbc&lt;/a&gt;. The &lt;code&gt;odbc&lt;/code&gt; R package provides a standard way for you to connect to any database as long as you have an ODBC driver installed. The &lt;code&gt;odbc&lt;/code&gt; R package is &lt;code&gt;DBI&lt;/code&gt;-compliant, and is recommended for ODBC connections.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;RStudio also made recent improvements to its products so they work better with databases.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://blog.rstudio.com/2017/10/09/rstudio-v1.1-released/&#34;&gt;RStudio IDE (v1.1)&lt;/a&gt;. With the latest version of the RStudio IDE, you can connect to, explore, and view data in a variety of databases. The IDE has a wizard for setting up new connections, and a tab for exploring established connections. These new features are extensible and will work with any R package that has a &lt;a href=&#34;https://rstudio.github.io/rstudio-extensions/connections-contract.html&#34;&gt;connections contract&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rstudio.com/products/drivers/&#34;&gt;RStudio Professional Drivers&lt;/a&gt;. If you are using RStudio professional products, you can download RStudio Professional Drivers for no additional cost. The examples below use the Oracle ODBC driver. If you are using open-source tools, you can bring your own driver or use community packages – many open-source drivers and community packages exist for connecting to a variety of databases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Using databases with R is a broad subject and there is more work to be done. An earlier blog post discussed &lt;a href=&#34;https://blog.rstudio.com/2017/06/27/dbplyr-1-1-0/&#34;&gt;our vision&lt;/a&gt;. Part of that vision was to create a website where you can find everything about databases and R in one place. To learn more, visit our site at &lt;a href=&#34;http://db.rstudio.com/best-practices/drivers&#34;&gt;db.rstudio.com&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;example-query-bank-data-in-an-oracle-database&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Example: Query bank data in an Oracle database&lt;/h3&gt;
&lt;p&gt;In this example, we will query bank data in an Oracle database. We connect to the database by using the &lt;code&gt;DBI&lt;/code&gt; and &lt;code&gt;odbc&lt;/code&gt; packages. This specific connection requires a database driver and a data source name (DSN) that have both been configured by the system administrator. Your connection might use another method.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(DBI)
library(dplyr)
library(dbplyr)
library(odbc)
con &amp;lt;- dbConnect(odbc::odbc(), &amp;quot;Oracle DB&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;query-using-dbi&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;1. Query using &lt;code&gt;DBI&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;You can query your data with &lt;code&gt;DBI&lt;/code&gt; by using the &lt;code&gt;dbGetQuery()&lt;/code&gt; function. Simply paste your SQL code into the R function as a quoted string. This method is sometimes referred to as &lt;em&gt;pass-through SQL code&lt;/em&gt;, and is probably the simplest way to query your data. Take care to escape your quotes as needed. For example, &lt;code&gt;&#39;yes&#39;&lt;/code&gt; is written as &lt;code&gt;\&#39;yes\&#39;&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dbGetQuery(con,&amp;#39;
  select &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;,
  sum(case when &amp;quot;term_deposit&amp;quot; = \&amp;#39;yes\&amp;#39; then 1.0 else 0.0 end) as subscribe,
  count(*) as total
  from &amp;quot;bank&amp;quot;
  group by &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;
&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;query-using-dplyr-syntax&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;2. Query using dplyr syntax&lt;/h3&gt;
&lt;p&gt;You can write your code in &lt;code&gt;dplyr&lt;/code&gt; syntax, and &lt;code&gt;dplyr&lt;/code&gt; will translate your code into SQL. There are several benefits to writing queries in &lt;code&gt;dplyr&lt;/code&gt; syntax: you can keep the same consistent language both for R objects and database tables, no knowledge of SQL or the specific SQL variant is required, and you can take advantage of the fact that &lt;code&gt;dplyr&lt;/code&gt; uses &lt;a href=&#34;http://dbplyr.tidyverse.org/articles/dbplyr.html&#34;&gt;lazy evaluation&lt;/a&gt;. &lt;code&gt;dplyr&lt;/code&gt; syntax is easy to read, but you can always inspect the SQL translation with the &lt;code&gt;show_query()&lt;/code&gt; function.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;q1 &amp;lt;- tbl(con, &amp;quot;bank&amp;quot;) %&amp;gt;%
  group_by(month_idx, year, month) %&amp;gt;%
  summarise(
    subscribe = sum(ifelse(term_deposit == &amp;quot;yes&amp;quot;, 1, 0)),
    total = n())
show_query(q1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;SQL&amp;gt;
SELECT &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;, SUM(CASE WHEN (&amp;quot;term_deposit&amp;quot; = &amp;#39;yes&amp;#39;) THEN (1.0) ELSE (0.0) END) AS &amp;quot;subscribe&amp;quot;, COUNT(*) AS &amp;quot;total&amp;quot;
FROM (&amp;quot;bank&amp;quot;) 
GROUP BY &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;query-using-an-r-notebooks&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;3. Query using an R Notebook&lt;/h3&gt;
&lt;p&gt;Did you know that you can run SQL code in an &lt;a href=&#34;http://rmarkdown.rstudio.com/r_notebooks.html&#34;&gt;R Notebook&lt;/a&gt; code chunk? To use SQL, open an &lt;a href=&#34;http://rmarkdown.rstudio.com/r_notebooks.html&#34;&gt;R Notebook&lt;/a&gt; in the RStudio IDE under the &lt;strong&gt;File &amp;gt; New File&lt;/strong&gt; menu. Start a new code chunk with &lt;code&gt;{sql}&lt;/code&gt;, and specify your connection with the &lt;code&gt;connection=con&lt;/code&gt; code chunk option. If you want to send the query output to an R dataframe, use &lt;code&gt;output.var = &amp;quot;mydataframe&amp;quot;&lt;/code&gt; in the code chunk options. When you specify &lt;code&gt;output.var&lt;/code&gt;, you will be able to use the output in subsequent R code chunks. In this example, we use the output in &lt;code&gt;ggplot2&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;```{sql, connection=con, output.var = &amp;quot;mydataframe&amp;quot;}
SELECT &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;, SUM(CASE WHEN (&amp;quot;term_deposit&amp;quot; = &amp;#39;yes&amp;#39;) THEN (1.0) ELSE (0.0) END) AS &amp;quot;subscribe&amp;quot;,
COUNT(*) AS &amp;quot;total&amp;quot;
FROM (&amp;quot;bank&amp;quot;) 
GROUP BY &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;
```&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;```{r}
library(ggplot2)
ggplot(mydataframe, aes(total, subscribe, color = year)) +
  geom_point() +
  xlab(&amp;quot;Total contacts&amp;quot;) +
  ylab(&amp;quot;Term Deposit Subscriptions&amp;quot;) +
  ggtitle(&amp;quot;Contact volume&amp;quot;)
```&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-10-18-database-queries-with-R/bankggplot.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;One benefit of using SQL in a code chunk is that you can paste your SQL code without any modification. For example, you do not have to escape quotes. If you are working with the proverbial &lt;em&gt;spaghetti code&lt;/em&gt; that is hundreds of lines long, then a SQL code chunk might be a good option. Another benefit is that the SQL code in a code chunk is syntax highlighted, making it much easier to read. For more information on SQL engines, see this page on &lt;a href=&#34;http://rmarkdown.rstudio.com/authoring_knitr_engines.html&#34;&gt;knitr language engines&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;summary&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;There is no single best way to query data with R. You have many methods to choose from, and each has its advantages. Here are some of the advantages of the methods described in this article.&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col width=&#34;34%&#34; /&gt;
&lt;col width=&#34;65%&#34; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Advantages&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;DBI::dbGetQuery&lt;/li&gt;
&lt;/ol&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Fewer dependencies required&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;&lt;ol start=&#34;2&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;dplyr syntax&lt;/li&gt;
&lt;/ol&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Use the same syntax for R and database objects&lt;/li&gt;
&lt;li&gt;No knowledge of SQL required&lt;/li&gt;
&lt;li&gt;Code is standard across SQL variants&lt;/li&gt;
&lt;li&gt;Lazy evaluation&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;&lt;ol start=&#34;3&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;R Notebook SQL engine&lt;/li&gt;
&lt;/ol&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Copy and paste SQL – no formatting required&lt;/li&gt;
&lt;li&gt;SQL syntax is highlighted&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;em&gt;You can download the R Notebook for these examples &lt;a href=&#34;http://rpubs.com/nwstephens/318586&#34;&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/10/18/database-queries-with-r/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Enterprise-ready dashboards with Shiny and databases</title>
      <link>https://rviews.rstudio.com/2017/09/20/dashboards-with-r-and-databases/</link>
      <pubDate>Wed, 20 Sep 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/09/20/dashboards-with-r-and-databases/</guid>
      <description>
        


&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-09-12-dashboards-with-r-and-databases/main.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;Inside the enterprise, a dashboard is expected to have up-to-the-minute information, to have a fast response time despite the large amount of data that supports it, and to be available on any device. An end user may expect that clicking on a bar or column inside a plot will result in either a more detailed report, or a list of the actual records that make up that number. This article will cover how to use a set of R packages, along with Shiny, to meet those requirements.&lt;/p&gt;
&lt;div id=&#34;the-code&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The code&lt;/h2&gt;
&lt;p&gt;A working example for the dashboard pictured above is available here: &lt;a href=&#34;https://edgarruiz.shinyapps.io/flights-dashboard/&#34;&gt;Flights Dashboard&lt;/a&gt;. The example has all of the functionality that is discussed in this article, except the database connectivity. The code for the dashboard is available in this Gist: &lt;a href=&#34;https://gist.github.com/edgararuiz/89e771b5d1b82adaa0033c0928d1846d&#34;&gt;app.R&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The code for the dashboard that actually connects to a database is available in this Gist: &lt;a href=&#34;https://gist.github.com/edgararuiz/876ba4718e56af66c3e1181482b6cb99&#34;&gt;app.R&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;begin-with-shinydashboard&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Begin with &lt;code&gt;shinydashboard&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://rstudio.github.io/shinydashboard/&#34;&gt;shinydashboard&lt;/a&gt; package has three important advantages:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Provides an out-of-the-box framework to create dashboards in Shiny.&lt;/strong&gt; This saves a lot of time, because the developer does not have to create the dashboard features manually using “base” Shiny.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Has a dashboard-friendly tag structure.&lt;/strong&gt; This allows the developer to get started quickly. Inside the &lt;code&gt;dashboardPage()&lt;/code&gt; tag, the &lt;code&gt;dashboardHeader()&lt;/code&gt;, &lt;code&gt;dashboardSidebar()&lt;/code&gt; and &lt;code&gt;dashboardBody()&lt;/code&gt; can be added to easily lay out a new dashboard.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;It is mobile-ready.&lt;/strong&gt; Without any additional code, the dashboard layout will adapt to a smaller screen automatically.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div id=&#34;quick-example&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Quick example&lt;/h3&gt;
&lt;p&gt;If you are new to &lt;code&gt;shinydashboard&lt;/code&gt;, please feel free to copy and paste the following code to see a very simple dashboard in your environment:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(shinydashboard)
library(shiny)
ui &amp;lt;- dashboardPage(
  dashboardHeader(title = &amp;quot;Quick Example&amp;quot;),
  dashboardSidebar(textInput(&amp;quot;text&amp;quot;, &amp;quot;Text&amp;quot;)),
  dashboardBody(
    valueBox(100, &amp;quot;Basic example&amp;quot;),
    tableOutput(&amp;quot;mtcars&amp;quot;)
  )
)
server &amp;lt;- function(input, output) {
  output$mtcars &amp;lt;- renderTable(head(mtcars))
}
shinyApp(ui, server)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;deploy-using-config&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Deploy using &lt;code&gt;config&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;It is very common that credentials used during development will not be the same ones used for publishing. For databases, the best way to accommodate this is to have a Data Source Name (DSN) with the same alias name set up in both environments. If it is not possible to set up DSNs, then the &lt;code&gt;config&lt;/code&gt; package can be used to make the switch between credentials used in the different environments invisible. The &lt;a href=&#34;http://docs.rstudio.com/connect/admin/process-management.html#using-the-config-package&#34;&gt;RStudio Connect&lt;/a&gt; product supports the use of the &lt;code&gt;config&lt;/code&gt; package out of the box. Another advantage of using &lt;code&gt;config&lt;/code&gt;, in lieu of Kerberos or DSN, is that the credentials used will not appear in the plain text of the R code. A more detailed write-up is available in the &lt;a href=&#34;http://db.rstudio.com/best-practices/portable-code&#34;&gt;Make scripts portable&lt;/a&gt; article.&lt;/p&gt;
&lt;p&gt;This code snippet is an example YAML file that &lt;code&gt;config&lt;/code&gt; is able to read. It has one driver name for local development, and a different name for use during deployment:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;default:
  mssql:
      Driver: &amp;quot;SQL Server&amp;quot;
      Server: &amp;quot;[server&amp;#39;s path]&amp;quot;
      Database: &amp;quot;[database name]&amp;quot;
      UID: &amp;quot;[user id]&amp;quot;
      PWD: &amp;quot;[password]&amp;quot;
      Port: 1433
rsconnect:
  mssql:
      Driver: &amp;quot;SQLServer&amp;quot;
      Server: &amp;quot;[server&amp;#39;s path]&amp;quot;
      Database: &amp;quot;[database name]&amp;quot;
      UID: &amp;quot;[user id]&amp;quot;
      PWD: &amp;quot;[password]&amp;quot;
      Port: 1433&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;default&lt;/code&gt; setting will be used automatically during development, and RStudio Connect will use the &lt;code&gt;rsconnect&lt;/code&gt; values when executing this code:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;dw &amp;lt;- config::get(&amp;quot;mssql&amp;quot;)
con &amp;lt;- DBI::dbConnect(odbc::odbc(),
                      Driver = dw$Driver,
                      Server = dw$Server,
                      UID    = dw$UID,
                      PWD    = dw$PWD,
                      Port   = dw$Port,
                      Database = dw$Database)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;populate-shiny-inputs-using-purrr&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Populate Shiny inputs using &lt;code&gt;purrr&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;It is very common for Shiny inputs to retrieve their values from a table or a query. Because other queries in the dashboard will use the selected input to filter accordingly, the value that needs to be passed to those queries is normally an identification code, not the label displayed in the drop-down. To separate the keys from the values, the &lt;code&gt;map()&lt;/code&gt; function in the &lt;code&gt;purrr&lt;/code&gt; package can be used. In the example below, all of the records in the airlines table are collected and a list of names is created; &lt;code&gt;map()&lt;/code&gt; is then used to insert the carrier codes into each name node.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# This code runs in ui
airline_list &amp;lt;- tbl(con, &amp;quot;airlines&amp;quot;) %&amp;gt;%
  collect  %&amp;gt;%
  split(.$name) %&amp;gt;%    # Place here the field that will be used for the labels
  map(~.$carrier)      # Place here the field that will be used for keys&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;selectInput()&lt;/code&gt; drop-down menu is able to read the resulting &lt;code&gt;airline_list&lt;/code&gt; list variable.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# This code runs in ui
 selectInput(
    inputId = &amp;quot;airline&amp;quot;,
    label = &amp;quot;Airline:&amp;quot;, 
    choices = airline_list) # Use airline_list as the choices argument value&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;take-advantage-of-dplyrs-laziness&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Take advantage of &lt;code&gt;dplyr&lt;/code&gt;’s “laziness”&lt;/h2&gt;
&lt;p&gt;Dashboards normally have a common data theme, which is sourced with a common data set. A base query can be built because &lt;code&gt;dplyr&lt;/code&gt; translates into SQL under the covers and, due to “laziness”, doesn’t evaluate the query until something is requested from it.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;db_flights &amp;lt;- tbl(con, &amp;quot;flights&amp;quot;) %&amp;gt;%
  left_join(tbl(con, &amp;quot;airlines&amp;quot;), by = &amp;quot;carrier&amp;quot;) %&amp;gt;%
  rename(airline = name) %&amp;gt;%
  left_join(tbl(con, &amp;quot;airports&amp;quot;), by = c(&amp;quot;origin&amp;quot; = &amp;quot;faa&amp;quot;)) %&amp;gt;%
  rename(origin_name = name) %&amp;gt;%
  select(-lat, -lon, -alt, -tz, -dst) %&amp;gt;%
  left_join(tbl(con, &amp;quot;airports&amp;quot;), by = c(&amp;quot;dest&amp;quot; = &amp;quot;faa&amp;quot;)) %&amp;gt;%
  rename(dest_name = name) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;dplyr&lt;/code&gt; variable can then be used in more than one Shiny output. A second example is in the code used to build the &lt;code&gt;highcharter&lt;/code&gt; plot below.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;output$total_flights &amp;lt;- renderValueBox({

  result &amp;lt;- db_flights %&amp;gt;%           # Use the db_flights variable
    filter(carrier == input$airline)
  if(input$month != 99) result &amp;lt;- filter(result, month == input$month)
  
  result &amp;lt;- result %&amp;gt;%
    tally %&amp;gt;%
    pull %&amp;gt;%                        # Use pull to get the total count as a vector
    as.integer()
  
  valueBox(value = prettyNum(result, big.mark = &amp;quot;,&amp;quot;),
           subtitle = &amp;quot;Number of Flights&amp;quot;)
})&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;drill-down&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Drill down&lt;/h2&gt;
&lt;p&gt;The idea of a “drill-down” action is that the end user is able to see part or all of the data that makes up the aggregate result displayed in the dashboard. A “drill-down” action has two parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A dashboard element that displays a result is clicked. &lt;/strong&gt; The result is usually aggregate data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A new screen is displayed with another report.&lt;/strong&gt; The new report could show a lower-level aggregation, or it could list the rows that make up the result.&lt;/li&gt;
&lt;/ul&gt;
&lt;div id=&#34;a-dashboard-element-is-clicked&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;A dashboard element is clicked&lt;/h3&gt;
&lt;p&gt;The following is one way to capture a click event. The idea is to display the top airport destinations for a given airline in a bar plot. When a bar is clicked, the desired result is for the plot to activate a drill-down. The &lt;code&gt;highcharter&lt;/code&gt; package will be used in this example.&lt;/p&gt;
&lt;p&gt;To capture a bar-click event in &lt;code&gt;highcharter&lt;/code&gt;, a small piece of JavaScript needs to be written. The example below could be used in most cases, so you can copy and paste it as-is into your code. The variable name and the input name (&lt;code&gt;bar_clicked&lt;/code&gt;) are the only two values that would have to be changed to match your chart.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt; js_bar_clicked &amp;lt;- JS(&amp;quot;function(event) {Shiny.onInputChange(&amp;#39;bar_clicked&amp;#39;, [event.point.category]);}&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The command above creates a new JavaScript function inside R that makes it possible to track when a bar is clicked. Here is a breakdown of the code:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;JS&lt;/strong&gt; - Indicates that the following function is JavaScript&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;function(event)&lt;/strong&gt; - Creates a new function that expects an &lt;code&gt;event&lt;/code&gt; variable. Highcharts passes the event when a bar is clicked, so &lt;code&gt;event&lt;/code&gt; will contain information about that bar.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shiny.onInputChange&lt;/strong&gt; - Is the function that JavaScript will use to interact with Shiny&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;bar_clicked&lt;/strong&gt; - Is the name of a new Shiny input; its value will be set to the argument that follows&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[event.point.category]&lt;/strong&gt; - Passes the &lt;strong&gt;category&lt;/strong&gt; value of the &lt;strong&gt;point&lt;/strong&gt; where the click was made&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The next section will illustrate how to capture the change of the new &lt;code&gt;input$bar_clicked&lt;/code&gt;, and perform the second part of the “drill down”.&lt;/p&gt;
&lt;p&gt;In the &lt;code&gt;renderHighchart()&lt;/code&gt; output function, the variable that contains the JavaScript is passed as part of a list of events: &lt;code&gt;events = list(click = js_bar_clicked)&lt;/code&gt;. Because the event is registered inside the &lt;code&gt;hc_add_series()&lt;/code&gt; call that creates the bar plot, the click event is tied to the bars themselves.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;output$top_airports &amp;lt;- renderHighchart({
  # Reuse the dplyr db_flights variable as the base query
  result &amp;lt;- db_flights %&amp;gt;%
    filter(carrier == input$airline) 
  if(input$month != 99) result &amp;lt;- filter(result, month == input$month) 
  result &amp;lt;- result %&amp;gt;%
    group_by(dest_name) %&amp;gt;%
    tally() %&amp;gt;%
    arrange(desc(n)) %&amp;gt;%                          
    collect %&amp;gt;%
    head(10)                                      
  highchart() %&amp;gt;%
    hc_add_series(
      data = result$n, 
      type = &amp;quot;bar&amp;quot;,
      name = paste(&amp;quot;No. of Flights&amp;quot;),
      events = list(click = js_bar_clicked)) %&amp;gt;%   # The JavaScript variable is called here
    hc_xAxis(
      categories = result$dest_name,               # Value in event.point.category
        tickmarkPlacement=&amp;quot;on&amp;quot;)})&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;using-appendtab-to-create-the-drill-down-report&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Using &lt;code&gt;appendTab()&lt;/code&gt; to create the drill-down report&lt;/h3&gt;
&lt;p&gt;The plan is to display a new drill-down report every time the end user clicks on a bar. To prevent pulling the same data unnecessarily, the code will be smart enough to simply switch the focus to an existing tab if the same bar has been clicked on before.&lt;/p&gt;
&lt;p&gt;The new, and really cool, &lt;code&gt;appendTab()&lt;/code&gt; function is used to dynamically create a new Shiny tab with a &lt;strong&gt;DataTable&lt;/strong&gt; that contains the first 100 rows of the selection. A simple vector, called &lt;code&gt;tab_list&lt;/code&gt;, is used to track all existing detail tabs. The &lt;code&gt;updateTabsetPanel()&lt;/code&gt; function is used to switch to the newly or previously created tab.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;observeEvent()&lt;/code&gt; function is the one that “catches” the event executed by the JavaScript, because it monitors the &lt;code&gt;bar_clicked&lt;/code&gt; Shiny input. Comments are added to the code below to cover more aspects of how to use these features.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tab_list &amp;lt;- NULL

observeEvent(input$bar_clicked,{  
       airport &amp;lt;- input$bar_clicked[1]              # Selects the first value sent in [event.point.category]
       tab_title &amp;lt;- paste(input$airline,            # tab_title is the tab&amp;#39;s name and unique identifier
                          &amp;quot;-&amp;quot;, airport ,            
                          if(input$month != 99)     
                            paste(&amp;quot;-&amp;quot; , month.name[as.integer(input$month)]))
       
       if(tab_title %in% tab_list == FALSE){        # Checks to see if the title already exists
         details &amp;lt;- db_flights %&amp;gt;%                  # Reuses the db_flights dbplyr variable
           filter(dest_name == airport,             # Uses the [event.point.category] value for the filter
                  carrier == input$airline)         # Matches the current airline filter
         
         if(input$month != 99)                      # Matches the current month selection
            details &amp;lt;- filter(details, month == input$month) 
         details &amp;lt;- details %&amp;gt;%
           head(100) %&amp;gt;%                            # Select only the first 100 records
           collect()                                # Brings the 100 records into the R environment 
           
         appendTab(inputId = &amp;quot;tabs&amp;quot;,                # Starts a new Shiny tab inside the tabsetPanel named &amp;quot;tabs&amp;quot;
                   tabPanel(
                     tab_title,                     # Sets the name &amp;amp; ID
                     DT::renderDataTable(details)   # Renders the DataTable with the 100 newly collected rows
                   ))
         tab_list &amp;lt;&amp;lt;- c(tab_list, tab_title)        # Adds the new tab to the list, important to use &amp;lt;&amp;lt;- 
         }
         
       # Switches over to a panel that matched the name in tab_title.  
       # Notice that this function sits outside the if statement because
       # it still needs to run to select a previously created tab
       updateTabsetPanel(session, &amp;quot;tabs&amp;quot;, selected = tab_title)  
     })&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;remove-all-tabs-using-removetab-and-purrr&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Remove all tabs using &lt;code&gt;removeTab()&lt;/code&gt; and &lt;code&gt;purrr&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Creating new tabs dynamically can clutter the dashboard. So a simple &lt;code&gt;actionLink()&lt;/code&gt; button can be added to the &lt;code&gt;dashboardSidebar()&lt;/code&gt; in order to remove all tabs except the main dashboard tab.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# This code runs in ui
  dashboardSidebar(
       actionLink(&amp;quot;remove&amp;quot;, &amp;quot;Remove detail tabs&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;observeEvent()&lt;/code&gt; function is used once more to catch when the link is clicked. The &lt;code&gt;walk()&lt;/code&gt; command from &lt;code&gt;purrr&lt;/code&gt; is then used to iterate through each tab title in the &lt;code&gt;tab_list&lt;/code&gt; vector, executing the Shiny &lt;code&gt;removeTab()&lt;/code&gt; command for each name. After that, the tab list variable is reset. Because of environment scoping, make sure to use the superassignment operator (&lt;code&gt;&amp;lt;&amp;lt;-&lt;/code&gt;) when resetting the variable, so that R updates the &lt;code&gt;tab_list&lt;/code&gt; defined outside of the &lt;code&gt;observeEvent()&lt;/code&gt; function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# This code runs in server
  observeEvent(input$remove,{
    # Use purrr&amp;#39;s walk command to cycle through each
    # panel tabs and remove them
    tab_list %&amp;gt;%
      walk(~removeTab(&amp;quot;tabs&amp;quot;, .x))
    tab_list &amp;lt;&amp;lt;- NULL
  })
  &lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This example uses &lt;code&gt;shinydashboard&lt;/code&gt; to create enterprise dashboards, but there are other technologies as well. &lt;code&gt;flexdashboard&lt;/code&gt; is a great way to build similar enterprise dashboards in R Markdown. We used SQL Server to populate this dashboard, but you can use any database. For more information on using databases with R, see &lt;a href=&#34;http://db.rstudio.com/&#34; class=&#34;uri&#34;&gt;http://db.rstudio.com/&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/09/20/dashboards-with-r-and-databases/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Visualizations with R and Databases</title>
      <link>https://rviews.rstudio.com/2017/08/16/visualizations-with-r-and-databases/</link>
      <pubDate>Wed, 16 Aug 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/08/16/visualizations-with-r-and-databases/</guid>
      <description>
        


&lt;div id=&#34;the-challenge&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The Challenge&lt;/h2&gt;
&lt;p&gt;Visualizations are one of R’s strengths. There are many functions and packages that create complex plots, often with one simple command. These plotting functions do two things: first, they take the raw data and run the calculations needed for a given visualization, and second, they draw the plot. If the source of the data resides within a database, the usual approach is to import all of the data and then create the plot. This is a problem, especially if the data is large.&lt;/p&gt;
&lt;p&gt;A strategy to address this problem is found in the new &lt;a href=&#34;http://db.rstudio.com/&#34;&gt;Database with RStudio&lt;/a&gt; website. The &lt;a href=&#34;http://db.rstudio.com/visualization/&#34;&gt;Creating Visualizations&lt;/a&gt; page outlines a solution that introduces the &lt;em&gt;“Transform in Database, plot in R”&lt;/em&gt; concept, and demonstrates its practical implementation. The article focused on knowledge sharing, rather than on providing a tool.&lt;/p&gt;
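&lt;p&gt;As a minimal sketch of the pattern (assuming a hypothetical &lt;code&gt;con&lt;/code&gt; connection and a &lt;code&gt;flights&lt;/code&gt; table), the aggregation runs inside the database and only the summarized rows come back to R for plotting:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)
library(ggplot2)

# Transform in database: the count per carrier is computed by the database engine
per_carrier &amp;lt;- tbl(con, &amp;quot;flights&amp;quot;) %&amp;gt;%
  count(carrier) %&amp;gt;%
  collect()      # Only the small summary table is pulled into R

# Plot in R: ggplot2 draws from the pre-aggregated data
ggplot(per_carrier, aes(carrier, n)) +
  geom_col()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The raw table never leaves the database; only one row per carrier crosses the wire, which is what makes this approach practical for large data.&lt;/p&gt;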
&lt;/div&gt;
&lt;div id=&#34;introducing-dbplot&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introducing &lt;code&gt;dbplot&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;The new &lt;code&gt;dbplot&lt;/code&gt; package collects multiple functions for in-database visualizations. It implements the principles laid out in the &lt;a href=&#34;http://db.rstudio.com/visualization/&#34;&gt;Creating Visualizations&lt;/a&gt; page, and it provides three types of functions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Helper functions that return a &lt;code&gt;ggplot2&lt;/code&gt; visualization&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Helper functions that return the results of the plot’s calculations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;db_bin()&lt;/code&gt; function introduced in the &lt;strong&gt;Creating Visualizations&lt;/strong&gt; page&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The package provides calculations or “base” &lt;code&gt;ggplot2&lt;/code&gt; visualizations for the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bar plot&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Line plot&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Histogram&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Raster&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;installation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Installation&lt;/h2&gt;
&lt;p&gt;Install &lt;code&gt;dbplot&lt;/code&gt; from GitHub using the &lt;code&gt;devtools&lt;/code&gt; package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;devtools::install_github(&amp;quot;edgararuiz/dbplot&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;example&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Example&lt;/h2&gt;
&lt;p&gt;This example will use a Microsoft SQL Server database connection to provide a quick glance of how the package works. For more examples, please visit the &lt;a href=&#34;https://github.com/edgararuiz/dbplot&#34;&gt;package’s GitHub repository&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;dbplot-functions&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;&lt;strong&gt;dbplot&lt;/strong&gt; functions&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;dbplot_histogram()&lt;/code&gt; function creates a 30-bin histogram by default. Because it uses &lt;code&gt;dplyr&lt;/code&gt; commands to perform the bin calculations, the function will work with any database that has &lt;code&gt;dplyr&lt;/code&gt; support, including &lt;code&gt;sparklyr&lt;/code&gt;. The only caveat is that the database must support basic functions such as &lt;code&gt;max()&lt;/code&gt; and &lt;code&gt;min()&lt;/code&gt;, which a few database types lack.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)
library(dbplot)

tbl(con, &amp;quot;airports&amp;quot;) %&amp;gt;% 
  dbplot_histogram(alt)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-08-14-database-visualize_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;500&#34; height=&#34;400&#34;&gt;&lt;/p&gt;
&lt;p&gt;Because &lt;code&gt;dbplot_histogram()&lt;/code&gt; returns a &lt;code&gt;ggplot&lt;/code&gt; object, the result can be refined further with additional layers:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tbl(con, &amp;quot;airports&amp;quot;) %&amp;gt;% 
  dbplot_histogram(alt, binwidth = 700) + 
  labs(title = &amp;quot;Airports Altitude&amp;quot;) +
  theme_minimal()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-08-14-database-visualize_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;500&#34; height=&#34;400&#34;&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;db_compute-functions&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;&lt;strong&gt;db_compute&lt;/strong&gt; functions&lt;/h3&gt;
&lt;p&gt;If more control over the plot is needed, then the &lt;code&gt;db_compute_bins()&lt;/code&gt; function returns a data frame with the lowest value of each bin and the record count per bin:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tbl(con, &amp;quot;airports&amp;quot;) %&amp;gt;% 
  db_compute_bins(alt)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## # A tibble: 28 x 2
##       alt count
##     &amp;lt;dbl&amp;gt; &amp;lt;int&amp;gt;
##  1  -54.0   559
##  2  250.4   176
##  3  554.8   203
##  4  859.2   131
##  5 1163.6    82
##  6 1468.0    40
##  7 1772.4    20
##  8 2076.8    18
##  9 2381.2    16
## 10 2685.6    12
## # ... with 18 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The results of the compute command can then be piped into a plot:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tbl(con, &amp;quot;airports&amp;quot;) %&amp;gt;% 
  db_compute_bins(alt) %&amp;gt;%
  ggplot() +
  geom_col(aes(alt, count, fill = count))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-08-14-database-visualize_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;500&#34; height=&#34;400&#34;&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;db_bin&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;&lt;strong&gt;db_bin()&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;dbplot&lt;/code&gt; package includes the &lt;code&gt;db_bin()&lt;/code&gt; function, first introduced in the &lt;strong&gt;Creating Visualizations&lt;/strong&gt; page. For more information, please read the &lt;a href=&#34;http://db.rstudio.com/visualization/#histogram&#34;&gt;Histogram&lt;/a&gt; section.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;db_bin(any_field)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## (((max(any_field) - min(any_field))/(30)) * ifelse((as.integer(floor(((any_field) - 
##     min(any_field))/((max(any_field) - min(any_field))/(30))))) == 
##     (30), (as.integer(floor(((any_field) - min(any_field))/((max(any_field) - 
##     min(any_field))/(30))))) - 1, (as.integer(floor(((any_field) - 
##     min(any_field))/((max(any_field) - min(any_field))/(30))))))) + 
##     min(any_field)&lt;/code&gt;&lt;/pre&gt;
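&lt;p&gt;As a sketch of how this expression can be used directly: because &lt;code&gt;db_bin()&lt;/code&gt; returns an unevaluated expression, it can be spliced into a &lt;code&gt;dplyr&lt;/code&gt; pipeline with &lt;code&gt;!!&lt;/code&gt; so the bins are computed inside the database (assuming the same &lt;code&gt;con&lt;/code&gt; connection and &lt;code&gt;airports&lt;/code&gt; table as above):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Bin the alt column in-database and count records per bin
tbl(con, &amp;quot;airports&amp;quot;) %&amp;gt;%
  group_by(alt_bin = !! db_bin(alt)) %&amp;gt;%
  tally()&lt;/code&gt;&lt;/pre&gt;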
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;next-steps&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Next steps&lt;/h2&gt;
&lt;p&gt;More plots will be possible as &lt;code&gt;dplyr&lt;/code&gt;-to-SQL translations are fine-tuned and enhanced. The &lt;code&gt;dbplot&lt;/code&gt; package will be the place where new calculations and plots will be implemented.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/08/16/visualizations-with-r-and-databases/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Some Ideas for your Internal R Package</title>
      <link>https://rviews.rstudio.com/2017/07/19/supporting-corporate-r-user-groups/</link>
      <pubDate>Wed, 19 Jul 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/07/19/supporting-corporate-r-user-groups/</guid>
      <description>
        
&lt;p&gt;At RStudio, I have the pleasure of interacting with data science teams around the world. Many of these teams are led by R users stepping into the role of &lt;a href=&#34;https://rviews.rstudio.com/2017/06/21/analytics-administration-for-r/&#34;&gt;analytic admins&lt;/a&gt;. These users are responsible for supporting and growing the R user base in their organization and often lead internal R user groups.&lt;/p&gt;
&lt;p&gt;One of the most successful strategies to support a corporate R user group is the creation of an internal R package. This article outlines some common features and functions shared in internal packages. Creating an R package is easier than you might expect. A good place to start is this &lt;a href=&#34;https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-programming-part-3/&#34;&gt;webinar on package creation&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;logos-and-custom-css&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Logos and Custom CSS&lt;/h2&gt;
&lt;p&gt;Interestingly, one powerful way to increase the adoption of data science outputs - plots, reports, and even slides - is to stick to consistent branding. Having a common look and feel makes it easier for management to recognize the work of the data science team, especially as the team grows. Consistent branding also saves the R user time that would normally be spent picking fonts and color schemes.&lt;/p&gt;
&lt;p&gt;It is easy to include logos and custom CSS inside of an R package, and to write wrapper functions that copy the assets from the package to a user’s local working directory. For example, this wrapper function adds a logo from the &lt;code&gt;RStudioInternal&lt;/code&gt; package to the working directory:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;getLogo &amp;lt;- function(copy_to = getwd()) {
  copy_to &amp;lt;- normalizePath(copy_to)
  file.copy(system.file(&amp;quot;logo.png&amp;quot;, package = &amp;quot;RStudioInternal&amp;quot;), copy_to)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once available, logos and CSS can be added to &lt;a href=&#34;https://shiny.rstudio.com/articles/css.html&#34;&gt;Shiny apps&lt;/a&gt; and &lt;a href=&#34;http://rmarkdown.rstudio.com/html_document_format.html#custom_css&#34;&gt;R Markdown documents&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ggplot2-themes&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;ggplot2 Themes&lt;/h2&gt;
&lt;p&gt;Similar to logos and custom CSS, many internal R packages include a custom ggplot2 theme. These themes ensure consistency across plots in an organization, making data science outputs easier to recognize and read.&lt;/p&gt;
&lt;p&gt;ggplot2 themes are shared as functions. To get started writing a ggplot2 theme, see the &lt;a href=&#34;http://ggplot2.tidyverse.org/reference/theme.html&#34;&gt;ggplot2 &lt;code&gt;theme()&lt;/code&gt; documentation&lt;/a&gt;. For inspiration, take a look at the &lt;a href=&#34;https://cran.r-project.org/web/packages/ggthemes/vignettes/ggthemes.html&#34;&gt;ggthemes package&lt;/a&gt;.&lt;/p&gt;
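&lt;p&gt;As a minimal sketch, a shared theme is just a function that layers &lt;code&gt;theme()&lt;/code&gt; settings on top of a built-in theme; the settings below are placeholders for your organization’s branding:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggplot2)

# A minimal corporate theme built on top of theme_minimal()
theme_corporate &amp;lt;- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = &amp;quot;bold&amp;quot;),
      panel.grid.minor = element_blank(),
      legend.position = &amp;quot;bottom&amp;quot;
    )
}

# Usage: ggplot(mtcars, aes(wt, mpg)) + geom_point() + theme_corporate()&lt;/code&gt;&lt;/pre&gt;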
&lt;/div&gt;
&lt;div id=&#34;data-connections&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Data Connections&lt;/h2&gt;
&lt;p&gt;Internal R packages are also an effective way to share functions that make it easy for analysts to connect to internal data sources. Nothing is more frustrating for a first-time R user than trying to navigate the world of ODBC connections and complex database schemas before they can get started with data relevant to their day-to-day job.&lt;/p&gt;
&lt;p&gt;If you’re not sure where to begin, look through your own scripts for common database connection strings or configurations. &lt;a href=&#34;https://db.rstudio.com&#34;&gt;db.rstudio.com&lt;/a&gt; can provide more information on how to handle credentials, drivers, and config files.&lt;/p&gt;
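&lt;p&gt;As a sketch, a connection helper can wrap &lt;code&gt;DBI&lt;/code&gt; and &lt;code&gt;odbc&lt;/code&gt; so analysts never see the connection string; the server, database, and credential names below are placeholders for your own environment:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;connectWarehouse &amp;lt;- function() {
  DBI::dbConnect(
    odbc::odbc(),
    Driver   = &amp;quot;SQL Server&amp;quot;,
    Server   = &amp;quot;warehouse.example.com&amp;quot;,  # placeholder host
    Database = &amp;quot;sales&amp;quot;,                  # placeholder database
    UID      = Sys.getenv(&amp;quot;WAREHOUSE_UID&amp;quot;),
    PWD      = Sys.getenv(&amp;quot;WAREHOUSE_PWD&amp;quot;)
  )
}&lt;/code&gt;&lt;/pre&gt;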
&lt;/div&gt;
&lt;div id=&#34;learnr-tutorials&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;learnr Tutorials&lt;/h2&gt;
&lt;p&gt;RStudio recently released &lt;a href=&#34;https://rstudio.github.io/learnr/&#34;&gt;learnr&lt;/a&gt;, a new package for creating interactive tutorials in R Markdown. There are many great resources online for getting started with R, but it can be useful to create tutorials specific to your internal data and domain. learnr tutorials can serve as training wheels for the other components of the internal R package, or teach broader concepts and standards accepted across the organization. For example, you might provide a primer that teaches new users your organization’s R style guide.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-07-17-corporate-r-user-groups/learnr_example.gif&#34; /&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;sharing-an-internal-r-package&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Sharing an Internal R Package&lt;/h2&gt;
&lt;p&gt;Internal packages can be built in the RStudio IDE and distributed as tar files. Alternatively, many organizations use RStudio Server or RStudio Server Pro to standardize the R environment in their organization. In addition to making it easy to share an internal package, a standard compute environment keeps new R users from having to spend time installing R, RStudio, and packages. While these are necessary skills, the first interactions with R should get new users to a data insight as fast as possible. RStudio Server Pro also includes IT functions for monitoring and restricting resources.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;wrap-up&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Wrap Up&lt;/h2&gt;
&lt;p&gt;If you are leading an R group, an internal R package is a powerful way to support your users and the adoption of R. Imagine how easy it would be to introduce R to co-workers if they could connect to real, internal data and create a useful, beautiful plot in under 10 minutes. Investing in an internal R package makes that onboarding experience possible.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/07/19/supporting-corporate-r-user-groups/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Analytics Administration for R</title>
      <link>https://rviews.rstudio.com/2017/06/21/analytics-administration-for-r/</link>
      <pubDate>Wed, 21 Jun 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/06/21/analytics-administration-for-r/</guid>
      <description>
        
&lt;p&gt;Analytic administrator is a role that data scientists assume when they onboard new tools, deploy solutions, support existing standards, or train other data scientists. It is a role that works closely with IT to maintain, upgrade, and scale analytic environments. Analytic admins have a multiplier effect - as they go about their work, they influence others in the organization to be more effective. If you are a data scientist using R, you might consider filling the role of analytic admin for your organization.&lt;/p&gt;
&lt;p&gt;Consider the data scientist who wants to make R a legitimate part of their organization. This person has to introduce a new technology and help IT build the architecture around it. In this role, the data scientist – acting as an analytic admin – influences their entire organization.&lt;/p&gt;
&lt;div id=&#34;the-need-for-analytic-admins&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;The need for analytic admins&lt;/h1&gt;
&lt;p&gt;What organizations need analytic admins? Analytic admins are important for any organization that wants to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Modernize their analytic tools&lt;/li&gt;
&lt;li&gt;Take advantage of all their data&lt;/li&gt;
&lt;li&gt;Build analytic products and applications&lt;/li&gt;
&lt;li&gt;Develop a best-in-class data science team&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Although the need for analytic admins is pervasive in industry, companies rarely list it as a dedicated role. Instead, they require teamwork between data science and IT operations, or they may require data scientists to function as their own admins. But the need is real. Most organizations need help bridging the gap between data science and IT. If you see an opportunity to serve as an analytic admin, I suggest you take it.&lt;/p&gt;
&lt;p&gt;Analytic admins typically have to train themselves and carve out their own career. It is common for data scientists who operate as analytic admins to feel as though they are in no-man’s land. It is natural to feel lost between the worlds of data science and information technology. As someone who has been there, I can say the feeling is disorienting. However, I can also say the value of that position is tremendous. If you feel like you are operating in no-man’s land as you function in this role, just know you are exactly where you need to be.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;r-tooling-and-integration&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;R tooling and integration&lt;/h1&gt;
&lt;p&gt;At RStudio, we think about doing data science as a development process that begins with accessing and understanding your data and ends with communicating your results. This process is thoroughly explained in the book &lt;a href=&#34;http://r4ds.had.co.nz/explore-intro.html&#34;&gt;R for Data Science&lt;/a&gt;, by Wickham and Grolemund.&lt;/p&gt;
&lt;p&gt;RStudio builds open-source and enterprise-ready products to help you do data science in R. These products include the RStudio IDE, RStudio Connect, and Shiny Server. These are designed to work with open-source R packages like Shiny, R Markdown, and the Tidyverse.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-06-21-analytics-administration-for-r_files/rstudio-toolchain.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;Most of the software that RStudio makes is open source, but enterprises often require additional professional features. Common professional features include security, authentication, high availability, administration, and load balancing.&lt;/p&gt;
&lt;p&gt;R is also used in production environments for hosting web applications, exposing APIs, and automating workflows. R is sometimes integrated into other systems such as data warehouses, Hadoop, and Spark. The role of the analytic admin is to provide tooling for data scientists, as well as to integrate R into production systems.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;linux-and-r&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Linux and R&lt;/h1&gt;
&lt;p&gt;RStudio products run on Linux, so understanding Linux will help you become self-sufficient, use R with other systems, and build better solutions. We will talk more about what you can do with Linux commands in an upcoming blog post.&lt;/p&gt;
&lt;p&gt;There are many resources for learning Linux online. Here is just &lt;a href=&#34;https://training.linuxfoundation.org/free-linux-training&#34;&gt;one offered by the Linux Foundation&lt;/a&gt;. Analytics admins need to know how to navigate (e.g., &lt;code&gt;cd&lt;/code&gt;, &lt;code&gt;pwd&lt;/code&gt;, &lt;code&gt;ls&lt;/code&gt;), install Linux packages (e.g., &lt;code&gt;apt-get install&lt;/code&gt;), and execute commands as root (e.g., &lt;code&gt;sudo&lt;/code&gt;). Also important are tab completion, keyboard shortcuts, and text editors (e.g., vim, nano).&lt;/p&gt;
&lt;p&gt;Did you know you can execute basic Linux commands from inside RStudio Server using the Tools &amp;gt; Shell option? You can also execute Linux commands inside the R console with the &lt;code&gt;system&lt;/code&gt; function.&lt;/p&gt;
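&lt;p&gt;For example, a quick sketch from the R console:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Run a Linux command and capture its output as a character vector
system(&amp;quot;whoami&amp;quot;, intern = TRUE)&lt;/code&gt;&lt;/pre&gt;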
&lt;p&gt;Another major benefit of learning Linux is the ability to administer production systems that run with Shiny Server, and the ability to deploy Shiny web applications into production.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;running-shiny-in-production&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Running Shiny in production&lt;/h1&gt;
&lt;p&gt;There is a growing trend in using Shiny web apps in production analytic workflows. The vibrant Shiny community now spans all verticals including pharmaceuticals, high technology, and finance. For many organizations, adopting Shiny is their first experience in running R in production.&lt;/p&gt;
&lt;p&gt;Production environments that depend on Shiny also need analytic admins who can deploy and support these applications. For example, some organizations now have complex Shiny applications that serve hundreds of end users over a cluster of load-balanced Shiny Servers. These applications often go through a standard development &amp;gt; test &amp;gt; production deployment process. New tools are being built for &lt;a href=&#34;https://github.com/rstudio/shinytest&#34;&gt;correctness testing&lt;/a&gt; and &lt;a href=&#34;https://github.com/rstudio/shinyloadtest&#34;&gt;load testing&lt;/a&gt; in Shiny. RStudio and other platform vendors are making significant investments in building architectures - like Shiny Server and RStudio Connect - that will help Shiny grow over the long term.&lt;/p&gt;
&lt;p&gt;The growth of Shiny creates an opportunity for analytic admins who want to make analytic content available to a wide audience. Shiny apps allow end users who know nothing about R to take advantage of the power of the R programming language. They have the potential to influence decision-makers who can take actions and see results based on the work data scientists share with them. There is an immediate need for analytic admins who understand Shiny and can help support environments that depend on it.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;getting-started-installing-rstudio-server&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Getting started: Installing RStudio Server&lt;/h1&gt;
&lt;p&gt;A great way to get started learning analytics administration is to build your own open-source RStudio Server on Linux. Building an RStudio Server by hand is the analytic admin equivalent of a Jedi building their own lightsaber. It’s a core skill, so you should be able to do it yourself no matter what.&lt;/p&gt;
&lt;p&gt;An easy way to get started with RStudio Server is to set it up on Ubuntu with Amazon Web Services. AWS even has an instruction guide for &lt;a href=&#34;https://aws.amazon.com/blogs/big-data/running-r-on-aws/&#34;&gt;running R on AWS&lt;/a&gt;. The core commands of the install are the following four lines of code (note: this installs RStudio Server version 1.0.143).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ sudo apt-get install r-base
$ sudo apt-get install gdebi-core
$ wget https://download2.rstudio.org/rstudio-server-1.0.143-amd64.deb
$ sudo gdebi rstudio-server-1.0.143-amd64.deb&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, your installation is going to require more than just installing RStudio Server. You will probably want to use the CRAN repository, install Linux dependencies, add users, and manage R packages. Here is a complete script I used to set up RStudio Server on a simple AWS AMI (ami-efd0428f) using a T2-medium instance. I included instructions from &lt;a href=&#34;https://www.digitalocean.com/community/tutorials/how-to-install-r-on-ubuntu-16-04-2&#34;&gt;this document&lt;/a&gt; on how to install R from CRAN. I also opened port 8787 in my AWS security group so I could log into RStudio Server via my web browser.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;### Simple RStudio Server Install
### Based on AWS image: ami-efd0428f
 
## Install R from CRAN repository
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
$ sudo add-apt-repository &amp;#39;deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/&amp;#39;
$ sudo apt-get update
$ sudo apt-get -y install r-base
 
## Install RStudio Server version 1.0.143
$ sudo apt-get install gdebi-core
$ wget https://download2.rstudio.org/rstudio-server-1.0.143-amd64.deb
$ sudo gdebi rstudio-server-1.0.143-amd64.deb
 
## Add a new user
$ sudo useradd -m myuser
$ sudo passwd myuser
 
## (Optional - may take time) Install common Linux dependencies
$ sudo apt-get -y install libcurl4-openssl-dev openssl libssl-dev
$ sudo apt-get -y install texlive texlive-latex-extra libxml2-dev
 
## (Optional - may take time) Install common R packages
$ sudo Rscript -e &amp;#39;install.packages(&amp;quot;shiny&amp;quot;, repos = &amp;quot;http://cran.rstudio.com/&amp;quot;)&amp;#39;
$ sudo Rscript -e &amp;#39;install.packages(&amp;quot;tidyverse&amp;quot;, repos = &amp;quot;http://cran.rstudio.com/&amp;quot;)&amp;#39;
 
## Point your browser to &amp;lt;AWS-instance-IP&amp;gt;:8787&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you don’t want to install RStudio Server from scratch, there are other ways to get started. One is to use a community AMI like &lt;a href=&#34;http://www.louisaslett.com/RStudio_AMI/&#34;&gt;this one&lt;/a&gt;. Another is to use the &lt;a href=&#34;https://aws.amazon.com/marketplace/pp/B06W2G9PRY?qid=1497719355342&amp;amp;sr=0-1&amp;amp;ref_=srh_res_product_title&#34;&gt;AWS Marketplace&lt;/a&gt; to install RStudio Server Pro with 1-Click Launch.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;Installation is just the first step to administering R. You should also consider the topics of authentication, security, scale, integration, hardware sizing, and configuration. Systems administrators have to do a lot of their own training, and analytic admins are no different. Fortunately, there are plenty of references to help you get started. Here are a few useful references for learning analytic administration for R, RStudio, and Shiny.&lt;/p&gt;
&lt;div id=&#34;rstudio-products&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;RStudio Products&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://docs.rstudio.com/&#34;&gt;RStudio documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us&#34;&gt;RStudio Support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rstudio.com/resources/webinars/administration-of-rstudio-connect-in-production/&#34;&gt;Administering RStudio Server Pro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rstudio.com/resources/webinars/administering-shiny-server-pro/&#34;&gt;Administering Shiny Server Pro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rstudio.com/resources/webinars/administration-of-rstudio-connect-in-production/&#34;&gt;Administration of RStudio Connect in Production&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;authentication-and-security&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Authentication and security&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/226865027-Authentication-in-RStudio-Connect&#34;&gt;Authentication in RStudio Connect&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/115000782547-Security-FAQ&#34;&gt;Security FAQ&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/221682007-Security-features-in-RStudio-Server-Pro&#34;&gt;Security features in RStudio Server Pro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/217801438-Can-I-load-balance-across-multiple-nodes-running-Shiny-Server-Pro-&#34;&gt;Can I load balance across multiple nodes running Shiny Server Pro?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;managing-r-packages&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Managing R Packages&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/215733837-Managing-libraries-for-RStudio-Server&#34;&gt;Managing libraries for RStudio Server&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/226871467-Package-management-in-RStudio-Connect&#34;&gt;Package management in RStudio Connect&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/206827897-Secure-Package-Downloads-for-R&#34;&gt;Secure package downloads for R&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/115006298728-Package-Management-for-Offline-RStudio-Connect-Installations&#34;&gt;Package Management for Offline RStudio Connect Installations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;shiny-server&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Shiny Server&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/221319028-How-do-I-deploy-Shiny-applications-to-Shiny-Server-&#34;&gt;How do I deploy Shiny applications to Shiny Server?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/220546267-Scaling-and-Performance-Tuning-Applications-in-Shiny-Server-Pro&#34;&gt;Scaling and Performance - Tuning Applications in Shiny Server Pro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/219482057-Shiny-Server-Pro-Authentication-Examples&#34;&gt;Shiny Server Pro: Authentication examples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rstudio.com/products/shiny/download-server/&#34;&gt;Shiny Server download&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;other-useful-links&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Other useful links&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rstudio.com/products/rstudio/download-server/&#34;&gt;RStudio Server Download&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/236226087-Scaling-R-and-RStudio&#34;&gt;Scaling R and RStudio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/115002344588-Configuration-and-sizing-recommendations&#34;&gt;Configuration and sizing recommendations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/06/21/analytics-administration-for-r/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Databases using R</title>
      <link>https://rviews.rstudio.com/2017/05/17/databases-using-r/</link>
      <pubDate>Wed, 17 May 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/05/17/databases-using-r/</guid>
      <description>
        


&lt;div id=&#34;current-state&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Current State&lt;/h1&gt;
&lt;p&gt;Using databases is unavoidable for those who analyze data as part of their jobs. As R developers, our first instinct may be to approach databases the same way we do regular files. We may attempt to read the data either all at once or as few times as possible. The aim is to reduce the number of times we go back to the data ‘well’, so our queries extract as much data as possible. After that, we spend cycles analyzing the data in memory. Here is what this model looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-05-11-databases-using-r_files/today.png&#34;  height=&#34;400&#34; width=&#34;400&#34;&gt;&lt;/p&gt;
&lt;p&gt;Because the volume of data is significant with this approach, we usually attempt to come up with strategies to minimize the resources and time it takes to analyze the data. We may try to retrieve all rows of a few columns, or a few rows of several columns. Another tactic is to save the query results into individual files for later analysis.&lt;/p&gt;
&lt;p&gt;An improvement to the current approach would be to use the database’s SQL engine to perform as much of the data exploration as possible. An enterprise-grade SQL server will have more power, and will be better tuned, to execute transformations of large amounts of data. Our goal would then be to bring into R a more targeted data set that will be used for visualization and modeling.&lt;/p&gt;
&lt;p&gt;This improvement comes at a cost: we will need to know how to write SQL queries, and will have to switch between both languages. We may also end up using an external querying tool that provides a list of tables and inline SQL code helpers. Of course, this involves switching between tools. On a personal note, I used to switch from R to Microsoft SQL Server Management Studio. After that, I would bring the finalized query back into my R code.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;a-better-way&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;A better way&lt;/h1&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-05-11-databases-using-r_files/better.png&#34;  height=&#34;400&#34; width=&#34;400&#34;&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;dplyr&lt;/code&gt; package simplifies data transformation. It provides a consistent set of functions, called verbs, that can be used in succession and interchangeably to gain understanding of the data iteratively. The first time I re-wrote R code using &lt;code&gt;dplyr&lt;/code&gt;, the new script was at least half as long and much easier to understand.&lt;/p&gt;
&lt;p&gt;Another nice thing about &lt;code&gt;dplyr&lt;/code&gt; is that it can interact with databases directly. It accomplishes this by translating the &lt;code&gt;dplyr&lt;/code&gt; verbs into SQL queries. This incredibly convenient feature allows us to ‘speak’ directly with the database from R, thus resolving the issues brought up in the previous section:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run data exploration over all of the data&lt;/strong&gt; - Instead of coming up with a plan to decide what data to import, we can focus on analyzing the data inside the database, which in turn should yield faster insights.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use the SQL Engine to run the data transformations&lt;/strong&gt; - We are, in effect, pushing the computation to the database because &lt;code&gt;dplyr&lt;/code&gt; is sending SQL queries to the database.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Collect a targeted dataset&lt;/strong&gt; - After becoming familiar with the data and choosing the data points that will either be shared or modeled, a final query can then be used to bring only that data into memory in R.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;All your code is in R!&lt;/strong&gt; - Because we are using &lt;code&gt;dplyr&lt;/code&gt; to communicate with the database, there is no need to change language, or tools, to perform the data exploration.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
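&lt;p&gt;A small sketch of the translation at work: adding &lt;code&gt;show_query()&lt;/code&gt; at the end of a pipeline prints the SQL that &lt;code&gt;dplyr&lt;/code&gt; would send to the database, which is a handy way to verify that the computation stays in the database (this assumes an open connection &lt;code&gt;con&lt;/code&gt; and an &lt;code&gt;airports&lt;/code&gt; table):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

tbl(con, &amp;quot;airports&amp;quot;) %&amp;gt;%
  filter(alt &amp;gt; 1000) %&amp;gt;%
  tally() %&amp;gt;%
  show_query()&lt;/code&gt;&lt;/pre&gt;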
&lt;/div&gt;
&lt;div id=&#34;example&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Example&lt;/h1&gt;
&lt;p&gt;There are three things that we will need to get started:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A database we can access&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A database driver installed in either our workstation or RStudio Server&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All of the required packages installed in R&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this section, we will demonstrate how to access a Microsoft SQL Server database from a workstation that is running on Microsoft Windows.&lt;/p&gt;
&lt;div id=&#34;database-driver&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Database Driver&lt;/h2&gt;
&lt;p&gt;A &lt;strong&gt;database driver&lt;/strong&gt; is a program that allows the workstation and the database to communicate. In Microsoft Windows, the drivers that connect to MS SQL databases are installed by default. We need the &lt;strong&gt;name&lt;/strong&gt; of the driver to use inside our R code. The easiest way to find it is to open the ODBC Data Source Administrator. To locate it on your system, please refer to this article: &lt;a href=&#34;https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/check-the-odbc-sql-server-driver-version-windows&#34;&gt;Check the ODBC SQL Server Driver Version (Windows)&lt;/a&gt;. Once the administrator program is open, click on the &lt;strong&gt;Drivers&lt;/strong&gt; tab. On my laptop, these are the drivers available. I will use &lt;strong&gt;SQL Server&lt;/strong&gt; for the Driver argument in my connection in R.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-05-11-databases-using-r_files/odbc.png&#34;  height=&#34;400&#34; width=&#34;400&#34;&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;r-packages&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;R packages&lt;/h2&gt;
&lt;p&gt;Besides &lt;code&gt;dplyr&lt;/code&gt;, the following packages are required:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;odbc&lt;/code&gt; - This is the interface between the database driver and R&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DBI&lt;/code&gt; - Standardizes the functions related to database operations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dbplyr&lt;/code&gt; - Enables &lt;code&gt;dplyr&lt;/code&gt; to interact with databases. It also contains the vendor-specific SQL translations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The database accessibility feature is still being developed, so we will use the development versions of &lt;code&gt;dbplyr&lt;/code&gt; and &lt;code&gt;dplyr&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;devtools::install_github(&amp;quot;tidyverse/dplyr&amp;quot;)
devtools::install_github(&amp;quot;tidyverse/dbplyr&amp;quot;)
devtools::install_github(&amp;quot;rstats-db/odbc&amp;quot;)
install.packages(&amp;quot;DBI&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;connect-to-the-database&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Connect to the database&lt;/h2&gt;
&lt;p&gt;We will use the &lt;code&gt;dbConnect()&lt;/code&gt; function from the &lt;code&gt;DBI&lt;/code&gt; package to connect to the database. The value for the &lt;code&gt;Driver&lt;/code&gt; argument is the name we determined in the &lt;em&gt;Database Driver&lt;/em&gt; section above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(DBI)

con &amp;lt;- dbConnect(odbc::odbc(),
                   Driver    = &amp;quot;SQL Server&amp;quot;, 
                   Server    = &amp;quot;localhost&amp;quot;,
                   Database  = &amp;quot;airontime&amp;quot;,
                   UID       = [My User ID],
                   PWD       = [My Password],
                   Port      = 1433)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A very useful function in &lt;code&gt;DBI&lt;/code&gt; is &lt;code&gt;dbListTables()&lt;/code&gt;, which retrieves the names of available tables.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dbListTables(con)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;[1] &amp;quot;airlines&amp;quot; &amp;quot;airport&amp;quot;  &amp;quot;airports&amp;quot; &amp;quot;faithful&amp;quot; &amp;quot;flights&amp;quot;  &amp;quot;iris&amp;quot;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Another useful function is &lt;code&gt;dbListFields()&lt;/code&gt;, which returns a vector with all of the column names in a table.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dbListFields(con, &amp;quot;flights&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt; [1] &amp;quot;year&amp;quot;           &amp;quot;month&amp;quot;          &amp;quot;day&amp;quot;            &amp;quot;dep_time&amp;quot;       &amp;quot;sched_dep_time&amp;quot;
 [6] &amp;quot;dep_delay&amp;quot;      &amp;quot;arr_time&amp;quot;       &amp;quot;sched_arr_time&amp;quot; &amp;quot;arr_delay&amp;quot;      &amp;quot;carrier&amp;quot;
[11] &amp;quot;flight&amp;quot;         &amp;quot;tailnum&amp;quot;        &amp;quot;origin&amp;quot;         &amp;quot;dest&amp;quot;           &amp;quot;air_time&amp;quot;
[16] &amp;quot;distance&amp;quot;       &amp;quot;hour&amp;quot;           &amp;quot;minute&amp;quot;         &amp;quot;time_hour&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;interacting-with-the-data-using-dplyr&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Interacting with the data using dplyr&lt;/h2&gt;
&lt;p&gt;Using &lt;code&gt;dplyr&lt;/code&gt;, we can easily preview a table in the database. The &lt;code&gt;tbl()&lt;/code&gt; command creates a reference to the table.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(dplyr)
tbl(con, &amp;quot;flights&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Source:     table&amp;lt;flights&amp;gt; [?? x 19]
Database:   Microsoft SQL Server 12.00.4422[username@localhost/airontime]

    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin  dest
   &amp;lt;int&amp;gt; &amp;lt;int&amp;gt; &amp;lt;int&amp;gt;    &amp;lt;int&amp;gt;          &amp;lt;int&amp;gt;     &amp;lt;dbl&amp;gt;    &amp;lt;int&amp;gt;          &amp;lt;int&amp;gt;     &amp;lt;dbl&amp;gt;   &amp;lt;chr&amp;gt;  &amp;lt;int&amp;gt;   &amp;lt;chr&amp;gt;  &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;
1   2013     1     1      517            515         2      830            819        11      UA   1545  N14228    EWR   IAH
2   2013     1     1      533            529         4      850            830        20      UA   1714  N24211    LGA   IAH
3   2013     1     1      542            540         2      923            850        33      AA   1141  N619AA    JFK   MIA
4   2013     1     1      544            545        -1     1004           1022       -18      B6    725  N804JB    JFK   BQN
5   2013     1     1      554            600        -6      812            837       -25      DL    461  N668DN    LGA   ATL
6   2013     1     1      554            558        -4      740            728        12      UA   1696  N39463    EWR   ORD
7   2013     1     1      555            600        -5      913            854        19      B6    507  N516JB    EWR   FLL
8   2013     1     1      557            600        -3      709            723       -14      EV   5708  N829AS    LGA   IAD
9   2013     1     1      557            600        -3      838            846        -8      B6     79  N593JB    JFK   MCO
10  2013     1     1      558            600        -2      753            745         8      AA    301  N3ALAA    LGA   ORD
# ... with more rows, and 5 more variables: air_time &amp;lt;dbl&amp;gt;, distance &amp;lt;dbl&amp;gt;, hour &amp;lt;dbl&amp;gt;, minute &amp;lt;dbl&amp;gt;, time_hour &amp;lt;dttm&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt; The &lt;code&gt;tally()&lt;/code&gt; verb in &lt;code&gt;dplyr&lt;/code&gt; returns the row count.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tally(tbl(con, &amp;quot;flights&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Source:     lazy query [?? x 1]
Database:   Microsoft SQL Server 12.00.4422[username@localhost/airontime]

       n
   &amp;lt;int&amp;gt;
1 336776&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt; When used against a database, the previous function is converted to a SQL query that works with MS SQL Server. The &lt;code&gt;show_query()&lt;/code&gt; function displays the translation.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;show_query(tally(tbl(con, &amp;quot;flights&amp;quot;)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;&amp;lt;SQL&amp;gt; SELECT COUNT(*) AS &amp;quot;n&amp;quot; FROM &amp;quot;flights&amp;quot;&lt;/code&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;bringing-it-all-together&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Bringing it all together&lt;/h2&gt;
&lt;p&gt;The following code sample shows how easy it is to find the top airlines by number of flights. Additionally, we wish to see the names of the airlines and not their codes. The steps taken are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Start with the &lt;code&gt;flights&lt;/code&gt; table and join it to the &lt;code&gt;airlines&lt;/code&gt; table to obtain the airline name&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Group the data by the airline &lt;code&gt;name&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tally the total rows by airline &lt;code&gt;name&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Order the data by the resulting tallies in a descending order&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of these steps are translated into a SQL statement and processed inside the database. We do not need to import the tables into R memory at any time; we just use &lt;code&gt;dplyr&lt;/code&gt; to get the results quickly.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tbl(con, &amp;quot;flights&amp;quot;) %&amp;gt;%
  left_join(tbl(con, &amp;quot;airlines&amp;quot;), by = &amp;quot;carrier&amp;quot;) %&amp;gt;%
  group_by(name) %&amp;gt;%
  tally %&amp;gt;%
  arrange(desc(n))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Source:     lazy query [?? x 2]
Database:   Microsoft SQL Server 12.00.4422[username@localhost/airontime]
Ordered by: desc(n)

# S3: tbl_dbi
                       name     n
                      &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt;
 1    United Air Lines Inc. 58665
 2          JetBlue Airways 54635
 3 ExpressJet Airlines Inc. 54173
 4     Delta Air Lines Inc. 48110
 5   American Airlines Inc. 32729
 6                Envoy Air 26397
 7          US Airways Inc. 20536
 8        Endeavor Air Inc. 18460
 9   Southwest Airlines Co. 12275
10           Virgin America  5162
# ... with more rows&lt;/code&gt;&lt;/pre&gt;
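&lt;p&gt;When we are satisfied with the result, adding &lt;code&gt;collect()&lt;/code&gt; as a final step runs the same query and imports only this small summary table into R’s memory. A sketch of the same pipeline with that step added:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;top_airlines &amp;lt;- tbl(con, &amp;quot;flights&amp;quot;) %&amp;gt;%
  left_join(tbl(con, &amp;quot;airlines&amp;quot;), by = &amp;quot;carrier&amp;quot;) %&amp;gt;%
  group_by(name) %&amp;gt;%
  tally() %&amp;gt;%
  arrange(desc(n)) %&amp;gt;%
  collect()  # now a regular in-memory tibble, ready for modeling or plotting&lt;/code&gt;&lt;/pre&gt;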
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;additional-resources&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Additional Resources&lt;/h1&gt;
&lt;p&gt;Here are links that will provide a deeper look into their respective subjects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://dplyr.tidyverse.org/&#34;&gt;dplyr’s Official Site&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/web/packages/DBI/vignettes/DBI-1.html&#34;&gt;Vignette of the DBI package&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;http://r4ds.had.co.nz/&#34;&gt;R for Data Science&lt;/a&gt; - An online book that covers how to use &lt;code&gt;dplyr&lt;/code&gt; and the other packages that together make up the &lt;code&gt;tidyverse&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;When we have only one method available to us, it is sometimes hard to see its inherent flaws. The method does what we need, so we do our best to overcome its shortfalls.&lt;/p&gt;
&lt;p&gt;Our hope is that highlighting the issues related to importing large amounts of data into R, and the advantages of using &lt;code&gt;dplyr&lt;/code&gt; to interact with databases, will be the encouragement needed to learn more about &lt;code&gt;dplyr&lt;/code&gt; and to give it a try.&lt;/p&gt;
&lt;p&gt;We plan to continue writing about the subject of databases using R in future posts. We will cover different aspects and techniques to get the most out of working with these two great technologies.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/05/17/databases-using-r/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>R for Enterprise: Understanding R’s Startup</title>
      <link>https://rviews.rstudio.com/2017/04/19/r-for-enterprise-understanding-r-s-startup/</link>
      <pubDate>Wed, 19 Apr 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/04/19/r-for-enterprise-understanding-r-s-startup/</guid>
      <description>
        
&lt;!-- BLOGDOWN-HEAD --&gt;
&lt;!-- /BLOGDOWN-HEAD --&gt;

&lt;!-- BLOGDOWN-BODY-BEFORE --&gt;
&lt;!-- /BLOGDOWN-BODY-BEFORE --&gt;
&lt;p&gt;R’s startup behavior is incredibly powerful. R sets environment variables, loads base packages, and understands whether you’re running a script, an interactive session, or even a build command.&lt;/p&gt;
&lt;p&gt;Most R users will never have to worry about changing R’s startup process. In fact, for portability and reproducibility of code, we recommend that users do not modify R’s startup profile. But, for system administrators, package developers, and R enthusiasts, customizing the launch process can provide a powerful tool and help avoid common gotchas. R’s behavior is thoroughly documented in &lt;a href=&#34;https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html&#34;&gt;R’s base documentation: “Initialization at Start of an R Session”&lt;/a&gt;. This post will elaborate on the official documentation and provide some examples. Read on if you’ve ever wondered how to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tell R about a &lt;a href=&#34;https://rstudio.github.io/packrat/custom-repos.html&#34;&gt;local CRAN-like repository&lt;/a&gt; to host and share R packages internally&lt;/li&gt;
&lt;li&gt;Use a different version of Python, e.g., to support a &lt;a href=&#34;http://rstudio.github.io/tensorflow&#34;&gt;Tensorflow&lt;/a&gt; project&lt;/li&gt;
&lt;li&gt;Define a proxy so R can reach the internet in locked-down environments&lt;/li&gt;
&lt;li&gt;Understand why &lt;a href=&#34;https://rstudio.github.io/packrat/&#34;&gt;Packrat&lt;/a&gt; creates a .Rprofile&lt;/li&gt;
&lt;li&gt;Automatically run code at the end of a session to capture and log &lt;code&gt;sessionInfo()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We’ll also discuss how RStudio starts R. Spoiler: it’s a bit different than you might expect!&lt;/p&gt;
&lt;div id=&#34;rprofile-.renviron-and-r.site-oh-my&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;.Rprofile, .Renviron, and R*.site oh my!&lt;/h2&gt;
&lt;p&gt;R’s startup process follows three steps: starting R, setting environment variables, and sourcing profile scripts. In the last two steps, R looks for site-wide files and user- or project-specific files. The R documentation explains this process in detail.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-04-04-rs-quirky-and-powerful-startup_files/R_STARTUP.jpeg&#34; /&gt;

&lt;/div&gt;
&lt;div id=&#34;common-gotchas-and-tricks&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Common Gotchas and Tricks:&lt;/h3&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;The Renviron file located at &lt;code&gt;R_HOME/etc&lt;/code&gt; is unique and different from Renviron.site and the user-specific .Renviron files. &lt;em&gt;Do not edit the Renviron file!&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A site-wide file, and either a project file &lt;em&gt;or&lt;/em&gt; a user file, can be loaded at the same time. It is not possible to use both a user file and a project file. If the project file exists, it will be used instead of the user file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The environment files are plain-text files in the form &lt;code&gt;name=value&lt;/code&gt;. The profile files contain R code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To double check what environment variables are defined in the R environment, run &lt;code&gt;Sys.getenv()&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Do not place things in a profile that limit the reproducibility or portability of your code. For example, setting &lt;code&gt;options(stringsAsFactors = FALSE)&lt;/code&gt; is discouraged because it will cause your code to break in mysterious ways in other environments. Other bad ideas include: reading in data, loading packages, and defining functions.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
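&lt;p&gt;To make the difference between the two file types concrete, here is a minimal sketch; the variable names, values, and options shown are only illustrative examples. An environment file holds plain &lt;code&gt;name=value&lt;/code&gt; pairs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# .Renviron -- plain text, one name=value pair per line (example values)
MY_API_KEY=abc123
R_LIBS_USER=~/R/library&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;while a profile file holds R code that is sourced at startup:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# .Rprofile -- R code, sourced at startup (example settings)
options(repos = c(CRAN = &amp;quot;https://cran.rstudio.com&amp;quot;))
message(&amp;quot;Library paths: &amp;quot;, paste(.libPaths(), collapse = &amp;quot;; &amp;quot;))&lt;/code&gt;&lt;/pre&gt;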
&lt;/div&gt;
&lt;div id=&#34;where-to-put-what&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Where to put what?&lt;/h3&gt;
&lt;p&gt;The R Startup process is very flexible, which means there are different ways to achieve the same results. For example, you may be wondering which environment variables to set in .Renviron versus Renviron.site. (Don’t even think about calling &lt;code&gt;Sys.setenv()&lt;/code&gt; in an Rprofile…)&lt;/p&gt;
&lt;p&gt;A simple rule of thumb is to answer the question: “When else do I want this variable to be set?”&lt;/p&gt;
&lt;p&gt;For example, if you’re on a shared server and you want the settings every time you run R, place .Renviron or .Rprofile in your home directory. If you’re a system admin and you want the settings to take effect for every user, modify Renviron.site or Rprofile.site.&lt;/p&gt;
&lt;p&gt;The best practice is to scope these settings as narrowly as possible. That means if you can place code in .Rprofile instead of Rprofile.site, you should! This practice complements the previous warnings about modifying R’s startup. The narrowest scope is to set up the environment within the code, not the profile.&lt;/p&gt;
&lt;div id=&#34;quiz&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Quiz&lt;/h4&gt;
&lt;p&gt;What is the best way to modify the path? The answer depends on the desired scope for the change.&lt;/p&gt;
&lt;p&gt;For example, in an R project using the &lt;a href=&#34;http://rstudio.github.io/tensorflow&#34;&gt;Tensorflow&lt;/a&gt; package, I might want R to use the version of Python installed in &lt;code&gt;/usr/local/bin&lt;/code&gt; instead of &lt;code&gt;/usr/bin&lt;/code&gt;. This change is best implemented by reordering the &lt;code&gt;PATH&lt;/code&gt; using &lt;code&gt;PATH=/usr/local/bin:${PATH}&lt;/code&gt;. This is a change I only want for this project, so I’d place the line in a .Renviron file in the project directory.&lt;/p&gt;
&lt;p&gt;On the other hand, I may want to add the JAVA SDK to the path so that any R session can use the &lt;code&gt;rJava&lt;/code&gt; package. To do so, I’d add a line like &lt;code&gt;PATH=${PATH}:/opt/jdk1.7.0_75/bin:/opt/jdk1.7.0_75/jre/bin&lt;/code&gt; to Renviron.site.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;r-startup-in-rstudio&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;R Startup in RStudio&lt;/h2&gt;
&lt;p&gt;A common misconception is that R and RStudio are one and the same. RStudio runs on top of R and requires R to be installed separately. If you look at the process list while running RStudio, you’ll see at least two different processes: usually one called RStudio and one called rsession.&lt;/p&gt;
&lt;p&gt;RStudio starts R a bit differently than running R from the terminal. Technically, RStudio doesn’t “start” R, it uses R as a library, either as a DLL on Windows or as a shared object on Mac and Linux.&lt;/p&gt;
&lt;p&gt;The main difference is that the script wrapped around R’s binary is not run, and any customization to the script will not take effect. To see the script, try:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;cat $(which R)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For most people, this difference won’t be noticeable. Any settings in the startup files will still take effect. For users who build R from source, it is important to include the &lt;code&gt;--enable-R-shlib&lt;/code&gt; flag to ensure R also builds the shared libraries used by RStudio.&lt;/p&gt;
&lt;div id=&#34;r-startup-in-rstudio-server-pro&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;R Startup in RStudio Server Pro&lt;/h3&gt;
&lt;p&gt;RStudio Server Pro acts differently from R and the open-source version of RStudio. Prior to starting R, RStudio Server Pro uses PAM to create a session, and sources the &lt;code&gt;rsession-profile&lt;/code&gt;. In addition, RStudio Server Pro launches R from bash, which means settings defined in the user’s bash profile are available.&lt;/p&gt;
&lt;p&gt;In short, RStudio Server Pro provides more ways to customize the environment used by R. You might ask why you’d ever want &lt;em&gt;more&lt;/em&gt; options. Recall our rule of thumb: “When else do I want this variable to be set?”&lt;/p&gt;
&lt;p&gt;In server environments, there are often environment variables set every time a user interacts with the server. These environment variables are placed in a user’s bash profile by a system admin. Normally R wouldn’t pick up these settings. RStudio Server Pro allows R to make use of the work the system admin has already done by picking up these profiles.&lt;/p&gt;
&lt;p&gt;Likewise, there may be some actions that take place on the server when a user logs in that have to happen before R starts. For example, a Kerberos ticket used by the R session to access a data source must exist before R is started. RStudio Server Pro uses PAM sessions to enable these actions.&lt;/p&gt;
&lt;p&gt;There may also be actions or variables that should only be defined for RStudio, and not any other time R is run. To facilitate this use case, RStudio Server Pro provides the &lt;code&gt;rsession-profile&lt;/code&gt;. For example, if your environment makes use of RStudio Server Pro’s support for multiple versions of R, you’d place any environment variables that should be defined for all versions of R inside of &lt;code&gt;rsession-profile&lt;/code&gt;.&lt;/p&gt;
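&lt;p&gt;Since &lt;code&gt;rsession-profile&lt;/code&gt; is a shell script sourced before R launches, an entry might look like the following sketch (the path and variable are hypothetical examples, not a prescribed configuration):&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;# rsession-profile -- sourced by RStudio Server Pro before R starts
# (example variable for illustration only)
export DATA_WAREHOUSE_HOST=warehouse.internal.example.com&lt;/code&gt;&lt;/pre&gt;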
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;examples&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Examples:&lt;/h2&gt;
&lt;div id=&#34;define-proxy-settings-in-renviron.site&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Define proxy settings in Renviron.site&lt;/h3&gt;
&lt;p&gt;Renviron.site is commonly used to tell R how to access the internet in environments with restricted network access. It is used so that the settings take effect for all R sessions and users. For example,&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;http_proxy=http://proxy.mycompany.com&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This article contains more details on how to &lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/200488488-Configuring-R-to-Use-an-HTTP-or-HTTPS-Proxy&#34;&gt;configure RStudio to use a proxy&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;add-a-local-cran-repository-for-all-users&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Add a local CRAN repository for all users&lt;/h3&gt;
&lt;p&gt;Organizations with offline environments often use local CRAN repositories instead of installing packages directly from a CRAN mirror. Local CRAN repositories are also useful for sharing internally developed R packages among colleagues.&lt;/p&gt;
&lt;p&gt;To use a local CRAN repository, it is necessary to add the repository to R’s list of repos. This setting is important for all sessions and users, so &lt;code&gt;Rprofile.site&lt;/code&gt; is used.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;old_repos &amp;lt;- getOption(&amp;quot;repos&amp;quot;)
local_CRAN_URI &amp;lt;- paste0(&amp;quot;file://&amp;quot;, normalizePath(&amp;quot;path_to_local_CRAN_repo&amp;quot;))
options(repos = c(old_repos, my_repo = local_CRAN_URI))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;More information on setting up a local CRAN repository is available &lt;a href=&#34;https://rstudio.github.io/packrat/custom-repos.html&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;record-sessioninfo-automatically&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Record sessionInfo automatically&lt;/h3&gt;
&lt;p&gt;Reproducibility is a critical part of any analysis done in R. One challenge for reproducible scripts and documents is tracking the version of R packages used during an analysis.&lt;/p&gt;
&lt;p&gt;The following code can be added to a .Rprofile file within an RStudio project to automatically log the &lt;code&gt;sessionInfo()&lt;/code&gt; after every RStudio session.&lt;/p&gt;
&lt;p&gt;This log could be referenced if an analysis needs to be run at a later date and fails due to a package discrepancy.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;.Last &amp;lt;- function(){
  if (interactive()) {
    
    ## check to see if we&amp;#39;re in an RStudio project (requires the rstudioapi package)
    if (!requireNamespace(&amp;quot;rstudioapi&amp;quot;))
      return(NULL)
    pth &amp;lt;- rstudioapi::getActiveProject()
    if (is.null(pth))
      return(NULL)
    
    ## append date + sessionInfo to a file called sessionInfoLog
    cat(&amp;quot;Recording session info into the project&amp;#39;s sessionInfoLog file...&amp;quot;)
    info &amp;lt;-  capture.output(sessionInfo())
    info &amp;lt;- paste(&amp;quot;\n----------------------------------------------&amp;quot;,
                  paste0(&amp;#39;Session Info for &amp;#39;, Sys.time()),
                  paste(info, collapse = &amp;quot;\n&amp;quot;),
                  sep  = &amp;quot;\n&amp;quot;)
    f &amp;lt;- file.path(pth, &amp;quot;sessionInfoLog&amp;quot;)
    cat(info, file = f, append = TRUE)
  }
}&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;automatically-turn-on-packrat&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Automatically turn on packrat&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;http://rstudio.github.io/packrat&#34;&gt;Packrat&lt;/a&gt; is an automated tool for package management and reproducible research. Packrat acts as a super-set of the previous example. When a user opts in to using packrat with an RStudio project, one of the things packrat automatically does is create (or modify) a project-specific .Rprofile. Packrat uses the .Rprofile to ensure that each time the project opens, Packrat mode is turned on.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;to-wrap-up&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;To Wrap Up&lt;/h2&gt;
&lt;p&gt;R’s startup behavior can be complex, sometimes quirky, but always powerful. At RStudio, we’ve worked hard to ensure that R starts and stops correctly whether you’re running RStudio Desktop, serving a Shiny app on shinyapps.io, rendering a report in RStudio Connect, or supporting hundreds of users and thousands of sessions in a load balanced configuration of RStudio Server Pro.&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/04/19/r-for-enterprise-understanding-r-s-startup/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>R Markdown for the Enterprise</title>
      <link>https://rviews.rstudio.com/2017/01/25/r-markdown-for-the-enterprise/</link>
      <pubDate>Wed, 25 Jan 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/01/25/r-markdown-for-the-enterprise/</guid>
      <description>
        

&lt;p&gt;In the corporate world, spreadsheets and PowerPoint presentations still dominate as the tools used for analyzing and sharing information. So, it is not at all surprising that even when business analysts use R for the analytical heavy lifting, they frequently revert to using spreadsheets and slide decks to share their results. This may seem like the easiest way to communicate with colleagues, but any modestly complicated project is likely to be error-prone and generate hours of unnecessary rework.&lt;/p&gt;

&lt;p&gt;An R-savvy analyst can harness R Markdown to develop reproducible business reporting and information sharing workflows in any business organization; all it takes is a little effort to master some basic R document preparation tools.&lt;/p&gt;

&lt;p&gt;In this post, I would like to examine a scenario that represents some experiences I had as an analytics professional.&lt;/p&gt;

&lt;h2 id=&#34;the-report-is-great-but-scenario&#34;&gt;“The report is great but…” Scenario&lt;/h2&gt;

&lt;p&gt;&lt;br/&gt;
&lt;img src=&#34;/post/2017-01-23-r-markdown-for-the-enterprise_files/new_analysis.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;A new R analysis is delivered in a PowerPoint presentation, and everyone thinks that the insights are very valuable. They all want more associates to see it, so almost immediately, the following three requests are made:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;“&amp;hellip;we need it broken out by”&lt;/strong&gt; - The presentation needs to be split by a specific segment. The segment is normally geographical or managerial in nature.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;“&amp;hellip;they shouldn’t see each other’s data”&lt;/strong&gt; - Since the results are not published in a central publishing platform, it is necessary to create multiple versions of the same report in order to secure the contents.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;“&amp;hellip;we need it every”&lt;/strong&gt; - Satisfying requests 1 and 2 may not be too overwhelming if this were meant as a one-time analysis, but usually the analysis and its distribution need to be repeated on a regular interval.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because we exported the findings into a presentation, sharing the results becomes more complex and time-consuming if we wish to satisfy the new requirements.&lt;/p&gt;

&lt;h2 id=&#34;how-can-r-markdown-help&#34;&gt;How can R Markdown help?&lt;/h2&gt;

&lt;p&gt;&lt;img src=&#34;/post/2017-01-23-r-markdown-for-the-enterprise_files/rmarkdown.png&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;R Markdown combines the creation and sharing steps. The three requests can be satisfied using the following features of R Markdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Break out the reports&lt;/strong&gt; - Using R Markdown&amp;rsquo;s &lt;a href=&#34;http://rmarkdown.rstudio.com/developer_parameterized_reports.html&#34;&gt;Parameterized Reports&lt;/a&gt; feature, we can easily create documents for each required segment.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automate the file creation&lt;/strong&gt; - R Markdown can be run from code, so a separate R script can iteratively run the R Markdown and pass a different parameter for each iteration.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create the slides inside R&lt;/strong&gt; - Take advantage of &lt;a href=&#34;http://rmarkdown.rstudio.com/ioslides_presentation_format.html&#34;&gt;R Markdown Presentation&lt;/a&gt; output to create a slide deck.  Without having to learn a new scripting language, we can code the slide deck and use the same Parameter feature to automate its creation.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Keep the interactivity&lt;/strong&gt; - In many cases, the end user needs a level of interactivity with the report. This interactivity can be achieved by using &lt;a href=&#34;http://www.htmlwidgets.org/&#34;&gt;htmlwidgets&lt;/a&gt; inside the R Markdown document.  For example, the &lt;a href=&#34;http://www.htmlwidgets.org/showcase_leaflet.html&#34;&gt;Leaflet&lt;/a&gt; widget can be used for interactive maps, the &lt;a href=&#34;http://www.htmlwidgets.org/showcase_datatables.html&#34;&gt;Data Table&lt;/a&gt; widget for interactive tables, and the &lt;a href=&#34;http://www.htmlwidgets.org/showcase_dygraphs.html&#34;&gt;dygraphs&lt;/a&gt; widget for interactive time series charting.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
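&lt;p&gt;The first two points above can be sketched with a short driver script that loops over the segments and renders one document per segment via &lt;code&gt;rmarkdown::render()&lt;/code&gt;. The file name, parameter name, and segment values below are hypothetical, assuming the .Rmd declares a &lt;code&gt;region&lt;/code&gt; parameter in its YAML header:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;library(rmarkdown)

# Example segments -- one report will be created per value
regions &amp;lt;- c(&amp;quot;North&amp;quot;, &amp;quot;South&amp;quot;, &amp;quot;East&amp;quot;, &amp;quot;West&amp;quot;)

for (region in regions) {
  # Pass the segment as a parameter and write a separate output file
  render(&amp;quot;sales_report.Rmd&amp;quot;,
         params = list(region = region),
         output_file = paste0(&amp;quot;sales_report_&amp;quot;, region, &amp;quot;.html&amp;quot;))
}&lt;/code&gt;&lt;/pre&gt;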

&lt;h3 id=&#34;additional-benefits&#34;&gt;Additional benefits&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Accessible and easy to open&lt;/strong&gt; - Any alternative tool needs to be as accessible as the current spreadsheet and presentation tool. R Markdown can output results in &lt;a href=&#34;http://rmarkdown.rstudio.com/formats.html&#34;&gt;HTML, PDF, and Word&lt;/a&gt;. Additionally, the Presentation output uses the highly accessible HTML5 format.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt; - Copying-and-pasting files, text, or images inevitably introduces human error. In R, data import, wrangling and modeling are already automated, so why not take it to its natural conclusion by using R Markdown to automate the presentation end of the process, as well?&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Creating a dashboard is easy&lt;/strong&gt; - In a spreadsheet, this is normally accomplished with a combination of pivot tables and graphs. R Markdown uses &lt;a href=&#34;http://rmarkdown.rstudio.com/flexdashboard/&#34;&gt;flexdashboard&lt;/a&gt; to create visually striking, self-contained dashboards. Combined with htmlwidgets, this gives the audience access to a very powerful tool.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is an example of a &lt;strong&gt;live&lt;/strong&gt; parameterized R Markdown flexdashboard based on stock data:
&lt;br&gt;
&lt;br&gt;
&lt;center&gt;&lt;embed src=&#34;http://colorado.rstudio.com:3939/content/239/parameterized-flexdashboard-stock.html&#34; width=&#34;800&#34; height=&#34;400&#34;&gt;&lt;/embed&gt;&lt;/center&gt;&lt;/p&gt;

&lt;h2 id=&#34;how-to-get-started&#34;&gt;How to get started&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;R Markdown is a free package&lt;/strong&gt;, so if you have R (and ideally RStudio), you can start using it today. Also, there are a lot of resources available for learning how to use R Markdown; the package’s &lt;a href=&#34;http://rmarkdown.rstudio.com/lesson-1.html&#34;&gt;official website&lt;/a&gt; is a good place to start.&lt;/p&gt;

&lt;p&gt;Here is a sample script that uses parameterized R Markdown to create a slide deck based on a selected stock. In this case, we used Google:&lt;/p&gt;

&lt;script src=&#34;https://gist.github.com/edgararuiz/0ad9a1cc3586b99d2ac57186d90e1aa7.js&#34;&gt;&lt;/script&gt;

&lt;p&gt;And here is the resulting deck. Press the right arrow key to advance to the next slide:
&lt;br&gt;
&lt;center&gt;&lt;embed src=&#34;http://colorado.rstudio.com:3939/content/250/Sample_Presentation.html&#34; width=&#34;800&#34; height=&#34;400&#34; frameborder=&#34;1&#34;&gt;&lt;/embed&gt;&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;This simple script creates a nice-looking, interactive deck that requires no manual intervention when the data needs to be refreshed, and only a single parameter change to select a different stock.&lt;/p&gt;

&lt;h2 id=&#34;final-thought&#34;&gt;Final thought&lt;/h2&gt;

&lt;p&gt;We encourage you to try R Markdown yourself. The “start small and then build big” strategy rarely fails, so you could begin by automating a simple report first, and then start taking advantage of more advanced features as you grow comfortable with the tool.&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2017/01/25/r-markdown-for-the-enterprise/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>R for Enterprise: How to Scale Your Analytics Using R</title>
      <link>https://rviews.rstudio.com/2016/12/21/r-for-enterprise-how-to-scale-your-analytics-using-r/</link>
      <pubDate>Wed, 21 Dec 2016 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2016/12/21/r-for-enterprise-how-to-scale-your-analytics-using-r/</guid>
      <description>
        

&lt;p&gt;At RStudio, we work with many companies interested in scaling R. They typically want to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How can R scale for big data or big computation?&lt;/li&gt;
&lt;li&gt;How can R scale for a growing team of data scientists?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post provides a framework for answering both questions.&lt;/p&gt;

&lt;h1 id=&#34;scaling-r-for-big-data-or-big-computation&#34;&gt;Scaling R for Big Data or Big Computation&lt;/h1&gt;

&lt;p&gt;The first step to scaling R is understanding what class of problems your organization faces. At RStudio, we think of three use cases: data extraction, embarrassingly parallel problems, and analysis on the whole. Garrett Grolemund hosted an excellent webinar on &lt;a href=&#34;https://www.rstudio.com/resources/webinars/working-with-big-data-in-r/&#34;&gt;Big Data in R&lt;/a&gt;, in which he outlined the differences in these three cases.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://www.rstudio.com/wp-content/uploads/2016/12/scalingR.001.jpeg&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;DISCLAIMER: These three cases are not exhaustive, nor are most problems easily categorized into one of the three classes. But, when scoping a scaled R environment, it is imperative to understand which class needs to be enabled. Your organization might have all three cases, or it might have only one or two.&lt;/p&gt;

&lt;h2 id=&#34;case-1-compute-on-the-data-extract&#34;&gt;Case 1: Compute on the data extract&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Example: I want to build a predictive model. I only need a few dozen features and a three-month window to build a good model. I can also aggregate my data from the transaction level to the user level. The result is a much smaller data set that I can use to train my model in R.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Computing on data extracts is arguably the most common use case; an analyst will run a query to pull a subset of data from an external source into R. If your data extracts are large, you can run R on a server. At RStudio, we recommend using the server version of the IDE (either open-source or professional), but there are many ways to use R interactively on a server.&lt;/p&gt;

&lt;h2 id=&#34;case-2-compute-on-the-parts&#34;&gt;Case 2: Compute on the parts&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Example: When I worked at a national lab (NREL), we validated fuel economy models against real-world datasets. Each dataset had hundreds of recorded trips from individual vehicles. While the total dataset was TBs, each individual trip was a few hundred MBs. We ran independent models in parallel against each trip. Each of these jobs added a single line to a results file. Then we aggregated the results with a reduction step (taking a weighted mean). By using an HPC system, a task that would take weeks to run sequentially was completed in a few hours.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Compute on the parts happens when the analyst needs to run the same analysis over many subsets of data, or needs to run the same analysis many times, and each model is independent of the others.&lt;/p&gt;

&lt;p&gt;Examples include cross-validation, sensitivity analysis, and model scoring. These problems are called &amp;ldquo;embarrassingly parallel&amp;rdquo; (often a misnomer, since scaling embarrassingly parallel problems is rarely embarrassingly simple).&lt;/p&gt;

&lt;h3 id=&#34;compute-on-the-parts-with-a-single-machine&#34;&gt;&lt;strong&gt;Compute on the parts with a single machine&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;By default, R is single-threaded; however, you can use R packages to do parallel processing on a multicore server or desktop. Local parallelization is facilitated by packages such as parallel, snow, and foreach, which run your R commands on independent worker processes across multiple cores. Alternatively, low-level parallelization can be achieved with packages like Rcpp and RcppParallel, which interface R with C++.&lt;/p&gt;
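
&lt;p&gt;As a minimal sketch of this pattern with the base &lt;code&gt;parallel&lt;/code&gt; package (the model and data below are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;library(parallel)

# One independent task per data subset.
fit_one &lt;- function(df) coef(lm(mpg ~ wt, data = df))

# Split the data into independent parts.
parts &lt;- split(mtcars, mtcars$cyl)

# Fork one worker per part (Unix-alikes); on Windows,
# use makeCluster() with parLapply() instead.
results &lt;- mclapply(parts, fit_one, mc.cores = 2)
&lt;/code&gt;&lt;/pre&gt;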

&lt;h3 id=&#34;compute-on-the-parts-with-a-high-performance-cluster-hpc&#34;&gt;&lt;strong&gt;Compute on the parts with a high performance cluster (HPC)&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;In some cases, R users have access to High Performance Computing environments. These environments are becoming more readily available with technologies like Docker Swarm. An R user will test R code interactively (on an edge node or their local machine), and then submit the R code to the cluster as a series of batch jobs. Each batch job will call R on a slave node.&lt;/p&gt;

&lt;p&gt;Note that RStudio, as an interactive IDE, may run on an edge node of the cluster or on a local machine. RStudio does not run on the slave nodes. Only R is run on the slave nodes and is executed in batch (not interactively).&lt;/p&gt;

&lt;p&gt;One challenge faced by R users is knowing how to submit batch jobs to the cluster, tracking their progress, and re-running jobs that fail. One solution is the batchtools package. This package abstracts the details of job submission and tracking into a series of R function calls. The R functions, in turn, use generic templates provided by system administrators. Parallel R with Batch Jobs provides a nice overview. Some analysts have created Shiny applications that leverage these functions to provide an interactive Job Management interface from within RStudio!&lt;/p&gt;
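
&lt;p&gt;A sketch of the batchtools workflow, assuming a scheduler template has already been configured by an administrator (the toy function below stands in for a real per-trip model):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;library(batchtools)

# The registry persists job state on shared storage.
reg &lt;- makeRegistry(file.dir = &#34;trip_registry&#34;)

# Define one job per input, then submit them to the cluster.
batchMap(fun = function(trip) mean(trip), trip = list(1:10, 21:30), reg = reg)
submitJobs(reg = reg)
waitForJobs(reg = reg)

# Re-run only the jobs that failed, then collect the results.
submitJobs(findErrors(reg = reg), reg = reg)
results &lt;- reduceResultsList(reg = reg)
&lt;/code&gt;&lt;/pre&gt;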

&lt;p&gt;One challenge faced by system administrators is ensuring the dependencies for the batch R script are available on all the slave nodes. Dependencies include: data access, the correct version of R, and correct versions of R packages. One solution is to store the R binaries and package libraries on shared storage (accessible by every slave node), alongside shared data and the project&amp;rsquo;s read/write scratch space.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://www.rstudio.com/wp-content/uploads/2016/12/scalingR.003.jpeg&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Case 2: Compute on the parts. Technologies: parallel, snow, RcppParallel, LSF, &lt;a href=&#34;https://slurm.schedmd.com/&#34;&gt;SLURM&lt;/a&gt;, &lt;a href=&#34;http://www.adaptivecomputing.com/products/open-source/torque/&#34;&gt;Torque&lt;/a&gt;, Docker Swarm&lt;/p&gt;

&lt;h2 id=&#34;case-3-compute-on-the-whole&#34;&gt;Case 3: Compute on the whole&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Example: A recommendation engine for movies that is robust to &amp;ldquo;unique&amp;rdquo; tastes. The entire domain space needs to be considered all at once. Image classification falls into this class; the weights for a complex neural network need to be fit against the entire training set. This class of problem is the most difficult to solve, and has generated the most hype. Sometimes analysts will purchase, use, and modify ready-made implementations of these algorithms.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Computing on the whole happens when the analyst needs to run a model against an entire dataset, and the model is not embarrassingly parallel or the data does not fit on a single machine. Typically, the analyst will leverage specialized tools such as MapReduce, SQL, Spark, H2O.ai, and others. R is used as an orchestration layer: orchestration means using R to direct jobs in other languages. R has a long history of orchestrating other languages to accomplish computationally intensive tasks. See &lt;a href=&#34;https://www.amazon.com/Extending-Chapman-Hall-John-Chambers-ebook/dp/B01GRHCLG0/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1481307605&amp;amp;sr=1-1&amp;amp;keywords=extending+R+john+chambers&#34;&gt;Extending R&lt;/a&gt; by John Chambers.&lt;/p&gt;

&lt;p&gt;When orchestrating a case 3 problem, the R analyst will use R to direct an external computation engine that does the heavy lifting. This approach is very similar to case 1. For example, Oracle&amp;rsquo;s Big Data Appliance and Microsoft SQL Server 2016 with R Server both include routines for fitting models in the database. These routines are accessible as specialized R functions, which are used in addition to case 1 extracts created with traditional SQL queries through RODBC or dplyr.&lt;/p&gt;

&lt;p&gt;Another example is Apache Spark. The R analyst will work from an edge node running R. (The open-source or professional RStudio Server can facilitate this interactive use.) In R, the user will call functions from a specialized R package, which in turn accesses Spark&amp;rsquo;s data processing and machine learning routines. One available R package is sparklyr.&lt;/p&gt;
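
&lt;p&gt;A minimal sparklyr sketch; the connection master and the data below are illustrative and depend on your cluster:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;library(sparklyr)
library(dplyr)

sc &lt;- spark_connect(master = &#34;local&#34;)  # or e.g. &#34;yarn-client&#34; on a cluster

# The data lives in Spark; dplyr verbs are translated to Spark SQL.
mtcars_tbl &lt;- copy_to(sc, mtcars)

# The model is fit by Spark MLlib, not by R.
fit &lt;- mtcars_tbl %&gt;%
  filter(wt &gt; 2) %&gt;%
  ml_linear_regression(mpg ~ wt + cyl)

spark_disconnect(sc)
&lt;/code&gt;&lt;/pre&gt;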

&lt;p&gt;Note that the machine learning routines are not running in R. The analyst uses these routines as black boxes that can be pieced together into pipelines, but not modified directly.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://www.rstudio.com/wp-content/uploads/2016/12/scalingR.004.jpeg&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Case 3: Compute on the whole. Technologies: Hadoop, Spark, Tensorflow, In-DB computing (RevoScaleR, OracleR, Aster, etc)&lt;/p&gt;

&lt;h1 id=&#34;multiple-users-scaling-r-for-teams&#34;&gt;Multiple Users: Scaling R for Teams&lt;/h1&gt;

&lt;p&gt;As organizations grow, another concern is how to scale R for a team of data scientists. This type of scale is orthogonal to the previous topic. Scaling for a team addresses questions like: How can analysts share their work? How can compute resources be shared? How does R integrate with the IT landscape? In many cases, these questions need to be answered even if the R environment doesn&amp;rsquo;t need to scale for big data.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://www.rstudio.com/wp-content/uploads/2016/12/scalingR.002.jpeg&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Scaling R for teams. Technologies: Version control (Git, SVN), miniCRAN, RStudio Server Pro&lt;/p&gt;

&lt;p&gt;Open-source packages can address many of these concerns. For example, many organizations use packrat and miniCRAN to manage R&amp;rsquo;s package ecosystem. The use of version control becomes increasingly important as teams grow and work together. Many companies will create internal R packages to facilitate sharing things like data access scripts, ggplot2 themes, and R Markdown templates. Airbnb provides a &lt;a href=&#34;https://medium.com/airbnb-engineering/using-r-packages-and-education-to-scale-data-science-at-airbnb-906faa58e12d#.ftpmn6tpn&#34;&gt;detailed example&lt;/a&gt;. For more information on version control, packrat, and packages, see the webinar series &lt;a href=&#34;https://www.rstudio.com/resources/webinars/rstudio-essentials-webinar-series-part-1/&#34;&gt;RStudio Essentials&lt;/a&gt;. At RStudio, we recommend using RStudio Server Pro because its features, such as load balancing, multi-session support, collaborative editing, and auditing, are designed specifically to support large numbers of user sessions.&lt;/p&gt;
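
&lt;p&gt;As one hedged sketch, miniCRAN can build an internal, CRAN-like repository on shared storage; the package list and path below are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;library(miniCRAN)

# Resolve the full dependency tree for the packages the team needs.
pkgs &lt;- pkgDep(c(&#34;ggplot2&#34;, &#34;data.table&#34;), suggests = FALSE)

# Mirror them into a local repository.
makeRepo(pkgs, path = &#34;/shared/miniCRAN&#34;, type = &#34;source&#34;)

# Analysts then install from the internal repository:
# install.packages(&#34;ggplot2&#34;, repos = &#34;file:///shared/miniCRAN&#34;)
&lt;/code&gt;&lt;/pre&gt;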

&lt;h1 id=&#34;wrap-up&#34;&gt;Wrap Up&lt;/h1&gt;

&lt;p&gt;Whether you need to compute on big data, grow your analytics team, or both, R has tools to help you succeed. As more companies look to data to drive business decisions, creating a scalable R environment will be a critical step toward success. Many of the topics in this post deserve their own posts; however, understanding and discussing these different types of scale can help create the correct roadmap. If you&amp;rsquo;ve created an R environment at scale, we&amp;rsquo;d love to hear from you. In a later post, we&amp;rsquo;ll address another outstanding question: after I scale the R platform, how do I scale the distribution of results and insights to non-R users?&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2016/12/21/r-for-enterprise-how-to-scale-your-analytics-using-r/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Make R a Legitimate Part of Your Organization</title>
      <link>https://rviews.rstudio.com/2016/11/16/make-r-a-legitimate-part-of-your-organization/</link>
      <pubDate>Wed, 16 Nov 2016 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2016/11/16/make-r-a-legitimate-part-of-your-organization/</guid>
      <description>
        

&lt;h1 id=&#34;how-r-enters-through-the-back-door&#34;&gt;How R Enters Through the Back Door&lt;/h1&gt;

&lt;p&gt;In many organizations, R enters through the back door when analysts download the free software and install it on their local workstations.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Jamie has been an avid R programmer since college. When she takes a new job at a large corporation, she finds that she is the only analyst in the company who knows and uses R. In addition to the other tools her company gives her, Jamie decides to download R onto her laptop. She installs R without consulting her manager or IT. With R she can pull data, build models, and create nice reports. Her manager knows nothing about R, but goes along with it because Jamie is happy and doing quality work. Her co-workers, ever curious about analytics, also download R and learn from Jamie. Before long, R becomes an important part of the day-to-day operations of her team. When Jamie starts hiring new analysts, she lists R as a required skill. Now Jamie wants to &amp;ldquo;go big&amp;rdquo; by putting R on the company servers so she can scale her analyses, socialize her results, and integrate her apps. Unfortunately, she finds she is unable to get the resources she needs because R is not officially recognized in the company.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Whether you are an analyst wanting to do more, a stakeholder wanting a competitive analytic platform, or an IT professional wanting a controlled and secured environment, you should make R a legitimate part of your organization and get the resources needed to support it.&lt;/p&gt;

&lt;h1 id=&#34;bringing-r-through-the-front-door&#34;&gt;Bringing R Through the Front Door&lt;/h1&gt;

&lt;p&gt;All organizations have a process for onboarding software through official channels. If you are part of a large organization, your IT department probably has a review board whose purpose is to review and make decisions about new tools.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The last time I introduced R into an organization, I created a presentation explaining why R should be supported by IT. My proposal was presented to the IT review board. It included slides on cost savings, strategic advantages, hardware and software requirements, and more. It took a few iterations to get through all of the requirements, but in the end, the board approved R as an additional analytic standard, paving the way for future growth.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;iframe src=&#34;//www.slideshare.net/slideshow/embed_code/key/slRGUZHuzIA5ld&#34; width=&#34;100%&#34; height=&#34;485&#34; frameborder=&#34;0&#34; marginwidth=&#34;0&#34; marginheight=&#34;0&#34; scrolling=&#34;no&#34; style=&#34;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&#34; allowfullscreen=&#34;&#34;&gt;&lt;/iframe&gt;

&lt;p&gt;The review board is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reviewing new software initiatives and approving expenditures.&lt;/strong&gt; Does this tool increase or decrease costs? What line items will it go under? What is the long-term cost projected to be? What is the cost of support?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Supporting the organization&amp;rsquo;s strategic vision.&lt;/strong&gt; Does the tool help satisfy a customer need? Does it help us remain competitive? Can it help us attract better talent? Does it make existing systems more efficient and agile?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Complying with existing systems architectures.&lt;/strong&gt; Does the tool integrate with other supported tools? Will it be used in development and/or production? Does it duplicate the capabilities of other supported tools?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Managing risk and ensuring security.&lt;/strong&gt; Does the tool comply with our formal security policies? Do the software licenses meet our legal requirements?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Defining roles and responsibilities for support.&lt;/strong&gt; What groups own the tool? What support is offered with the tool? What internal resources will be required to maintain it? Who will provide training?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of R&amp;rsquo;s popularity and explosive growth, many organizations are friendly and even eager to bring R through the front door. If your organization is friendly toward R but has not made it an official part of the organization, a formal review process is still valuable. The review process gives IT a formal stake in the ground when it comes to supporting R for the long term. It also makes future decisions about growth and investment much easier.&lt;/p&gt;

&lt;h1 id=&#34;the-ubiquity-of-open-source-software&#34;&gt;The Ubiquity of Open Source Software&lt;/h1&gt;

&lt;p&gt;Here at RStudio, we work with customers every day who want to bring R through the front door. One complaint we sometimes hear is that IT does not want to support open-source software (OSS). The reality is that most organizations are already supporting OSS. The &lt;a href=&#34;https://www.blackducksoftware.com/2016-future-of-open-source&#34;&gt;2016 Future of Open Source survey&lt;/a&gt; estimated that &lt;a href=&#34;http://www.zdnet.com/article/its-an-open-source-world-78-percent-of-companies-run-open-source-software/&#34;&gt;78% of companies run part or all of their operations on OSS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Most organizations know about R by now. IEEE Spectrum ranked R fifth in the &lt;a href=&#34;http://spectrum.ieee.org/computing/software/the-2016-top-programming-languages&#34;&gt;top programming languages of 2016&lt;/a&gt;, making it one of the most commonly used analytic tools in industry today.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;https://www.rstudio.com/wp-content/uploads/2016/11/top_languages2016.png&#34; alt=&#34;Top Languages 2016&#34; /&gt;&lt;/p&gt;

&lt;p&gt;Some organizations struggle to standardize on R due to a lack of management and governance around OSS. At the same time, organizations may neglect R on user workstations, thereby increasing security, legal, and operational risks. It is riskier to leave R unmanaged than it is to bring it through the front door.&lt;/p&gt;

&lt;h1 id=&#34;getting-the-resources-you-need&#34;&gt;Getting the Resources You Need&lt;/h1&gt;

&lt;p&gt;Passing the review board should secure the resources you need: both physical and human resources to build, scale, and maintain an R environment.&lt;/p&gt;

&lt;h2 id=&#34;physical-resources&#34;&gt;Physical Resources&lt;/h2&gt;

&lt;p&gt;Investing resources in R is a great way to legitimize it: an organization that allocates budget and people to R will also expect to see value from that investment. Some resources you might need include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A budget or line items in a budget&lt;/li&gt;
&lt;li&gt;Physical or virtual hardware&lt;/li&gt;
&lt;li&gt;Software tools and licenses&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;human-resources&#34;&gt;Human Resources&lt;/h2&gt;

&lt;p&gt;The type of IT support you get will depend on how your IT organization is structured. You might have a single admin designated to support R, or you might have an entire support team. You will probably want to define the following roles and responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An R advocate who promotes R&lt;/li&gt;
&lt;li&gt;An executive sponsor who supports the R users&lt;/li&gt;
&lt;li&gt;A designated R admin or R support team&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your IT support will manage your environment, so getting the right people and policies in place is critical. Generally speaking, having a point of contact in IT — a name and a face — is a good thing. Your admin support should be familiar with the Linux operating system. Training admins on R-related issues is also helpful.&lt;/p&gt;

&lt;h1 id=&#34;adopting-r&#34;&gt;Adopting R&lt;/h1&gt;

&lt;p&gt;After you bring R through the front door and it becomes part of the organization, you should have a vision and path for growth. You should also have resources to support that growth. So what are you going to do with your newfound resources?&lt;/p&gt;

&lt;p&gt;The next step is adoption. Adoption means R is self-sustaining: the goal is for your organization to fully embrace R as an integral part of your business. The survival of R should not depend on one or two R advocates any more than SQL depends on one or two DBAs. Instead, there should be systems, resources, and people in place that will sustain the growth of R.&lt;/p&gt;

&lt;p&gt;Making R a legitimate part of your organization and getting the resources you need to support it is the foundation for future growth and adoption.&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2016/11/16/make-r-a-legitimate-part-of-your-organization/&#39;;&lt;/script&gt;
      </description>
    </item>
    
  </channel>
</rss>
