<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>pins on R Views</title>
    <link>https://rviews.rstudio.com/tags/pins/</link>
    <description>Recent content in pins on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 17 Oct 2019 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/tags/pins/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>Productionizing Shiny and Plumber with Pins</title>
      <link>https://rviews.rstudio.com/2019/10/17/deploying-data-with-pins/</link>
      <pubDate>Thu, 17 Oct 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/10/17/deploying-data-with-pins/</guid>
      <description>
        


&lt;p&gt;Producing an API that serves model results or a Shiny app that displays the results of an analysis requires a collection of intermediate datasets and model objects, all of which need to be saved. Depending on the project, they might need to be reused in another project later, shared with a colleague, used to shortcut computationally intensive steps, or safely stored for QA and auditing.&lt;/p&gt;
&lt;p&gt;Some of these &lt;em&gt;should&lt;/em&gt; be saved in a data warehouse, data lake, or database, but write access to an appropriate database isn’t always available. In other cases, especially with models, it may not be clear where they should be saved at all.&lt;/p&gt;
&lt;p&gt;Enter &lt;a href=&#34;https://rstudio.github.io/pins/&#34;&gt;&lt;code&gt;pins&lt;/code&gt;&lt;/a&gt;, a new R package written by &lt;a href=&#34;https://github.com/javierluraschi&#34;&gt;Javier Luraschi&lt;/a&gt;. &lt;code&gt;pins&lt;/code&gt; makes it easy to save (pin) R objects including datasets, models, and plots to a central location (board), and access them easily from both R and Python. Pins make it much easier to create production-ready R assets by simplifying the storage and updating of intermediate data artifacts.&lt;/p&gt;
&lt;div id=&#34;problems-you-can-put-a-pin-in&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Problems you can put a pin in&lt;/h2&gt;
&lt;p&gt;In general, pins are a good substitute for saving objects alongside analysis code as &lt;code&gt;.csv&lt;/code&gt; or &lt;code&gt;.rds&lt;/code&gt; files. Especially when the object is reused several times or updated independently from the rest of the analysis, a pin is probably a better solution than saving a file with your code.&lt;/p&gt;
&lt;p&gt;In this article, I’ll create a predictive model, programmatically serve predictions via a &lt;a href=&#34;https://www.rplumber.io/&#34;&gt;Plumber API&lt;/a&gt;, and visualize those predictions in a Shiny app. Along the way, I’ll make extensive use of pins for important parts of my workflow.&lt;/p&gt;
&lt;p&gt;The model will predict future availability of bicycles at &lt;a href=&#34;https://www.capitalbikeshare.com/&#34;&gt;Capital Bikeshare&lt;/a&gt; docks, which provide short-term bicycle rentals in and around Washington DC. Capital Bikeshare makes data on the current availability of bikes at each station available via a public API.&lt;/p&gt;
&lt;p&gt;I’m going to make the model’s predictions available in production in two ways: programmatically via an API, and to humans via a Shiny app. All of the code for this demo is available on &lt;a href=&#34;https://github.com/rstudio/bike_predict/&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To get there, I’m going to follow this analysis workflow:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Ingest metadata about the stations, like name and location, from the bike data API.&lt;/li&gt;
&lt;li&gt;Combine the station metadata with raw data on bike availability from the data lake to create an analysis dataset.&lt;/li&gt;
&lt;li&gt;Train and deploy a model of future bike availability.&lt;/li&gt;
&lt;li&gt;Serve model predictions via a Plumber API.&lt;/li&gt;
&lt;li&gt;Visualize model predictions via a Shiny app.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Along the way, here are three specific times that a pin is going to come in handy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maintaining the metadata table of station IDs and details. Especially since I’m reusing this table in multiple assets in this project, having it in a pin is a sure way to know it’s up-to-date.&lt;/li&gt;
&lt;li&gt;Saving the final analysis dataset. In this case, the raw Capital Bikeshare data is being imported with a completely separate ETL script, and I don’t want to write my analysis dataset into a data lake. Without a separate database for analysis data, a pin is my best option.&lt;/li&gt;
&lt;li&gt;Deploying the model to serve the predictions. Saving the model separately from the API makes it easy to decouple API and model versions and to retrain the model and redeploy seamlessly when needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In all of these cases, pins drastically simplify my workflow, improve discoverability of the objects my analysis has created, and make me more confident that I’m always using the newest version.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;where-to-pin&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Where to pin?&lt;/h2&gt;
&lt;p&gt;Before describing exactly how this analysis project works, let’s dive a little deeper into the &lt;code&gt;pins&lt;/code&gt; package itself.&lt;/p&gt;
&lt;p&gt;Pins live on &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-understanding.html&#34;&gt;boards&lt;/a&gt;. A board is a set of content names and the associated files. The magic of the &lt;code&gt;pins&lt;/code&gt; package is that with only two commands and the name of some content, you can upload and download your R objects without having to worry about how the content is stored.&lt;/p&gt;
&lt;p&gt;By default, there are two boards you can use immediately: the &lt;code&gt;packages&lt;/code&gt; board of the datasets from R packages that are installed, and the &lt;code&gt;local&lt;/code&gt; board, which caches datasets for quick loading later.&lt;/p&gt;
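&lt;p&gt;As a minimal sketch of the &lt;code&gt;local&lt;/code&gt; board, assuming the same version of &lt;code&gt;pins&lt;/code&gt; used in this article, the following pins a dataset locally, reads it back, and searches the board, all without any credentials. The pin name &lt;code&gt;mtcars_local&lt;/code&gt; is only an illustration:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Pin a dataset to the default local board (no registration or token needed)
pins::pin(
  x = mtcars,
  name = &amp;quot;mtcars_local&amp;quot;,  # hypothetical pin name
  description = &amp;quot;A local pin of the mtcars dataset.&amp;quot;,
  board = &amp;quot;local&amp;quot;
)

# Read the pinned dataset back from the local board
cars_local &amp;lt;- pins::pin_get(name = &amp;quot;mtcars_local&amp;quot;, board = &amp;quot;local&amp;quot;)

# Search the local board for pins whose name or description matches the text
pins::pin_find(text = &amp;quot;mtcars&amp;quot;, board = &amp;quot;local&amp;quot;)&lt;/code&gt;&lt;/pre&gt;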
&lt;p&gt;The real power of &lt;code&gt;pins&lt;/code&gt; is unlocked with remote boards. &lt;code&gt;pins&lt;/code&gt; supports &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-kaggle.html&#34;&gt;Kaggle&lt;/a&gt;, &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-github.html&#34;&gt;GitHub&lt;/a&gt;, &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-websites.html&#34;&gt;website&lt;/a&gt;, and &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-rsconnect.html&#34;&gt;RStudio Connect&lt;/a&gt; boards, and also supports building &lt;a href=&#34;https://rstudio.github.io/pins/articles/boards-extending.html&#34;&gt;custom extensions&lt;/a&gt;. By using a remote board, you can use &lt;code&gt;pins&lt;/code&gt; to make your R objects accessible to others on your team in a central location.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;how-it-works&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;How it works&lt;/h2&gt;
&lt;p&gt;Using a pin works like this:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Register the board with the &lt;code&gt;pins::board_register&lt;/code&gt; function. If you are using a remote board, you’ll need to provide the proper authentication mechanism, such as a &lt;a href=&#34;https://www.kaggle.com/docs/api&#34;&gt;Kaggle token&lt;/a&gt;, &lt;a href=&#34;https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line&#34;&gt;GitHub Personal Access Token (PAT)&lt;/a&gt;, or &lt;a href=&#34;https://docs.rstudio.com/connect/1.5.4/user/api-keys.html&#34;&gt;RStudio Connect API key&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For GitHub, you need a repo that you have write access to, as well as a token:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pins::board_register(board = &amp;quot;github&amp;quot;, 
                     repo = &amp;quot;akgold/pins_demo&amp;quot;, 
                     branch = &amp;quot;master&amp;quot;,
                     token = Sys.getenv(&amp;quot;GITHUB_PAT&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For an RStudio Connect board, you need the server URL and an API key:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pins::board_register(board = &amp;quot;rsconnect&amp;quot;, 
                     server = &amp;quot;https://colorado.rstudio.com/rsc&amp;quot;, 
                     key = Sys.getenv(&amp;quot;RSTUDIOCONNECT_API_KEY&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At that point, your connections pane in RStudio will show the content available in the board.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;pins-connection-pane&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/pins_connection.png&#34; alt=&#34;Pins Connection Pane&#34; /&gt;&lt;/h1&gt;
&lt;p&gt;Once you’ve registered the board, your interactions are exactly the same no matter which board type you’re using.&lt;/p&gt;
&lt;ol start=&#34;2&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Pin an object to the board.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pins::pin(
  x = mtcars, 
  name = &amp;quot;mtcars_pin&amp;quot;, 
  description = &amp;quot;A pin of the mtcars dataset.&amp;quot;, 
  board = &amp;quot;rsconnect&amp;quot;
)&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&#34;3&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Download the object later.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cars_data &amp;lt;- pins::pin_get(
  name = &amp;quot;mtcars_pin&amp;quot;,
  board = &amp;quot;rsconnect&amp;quot;
)&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;production-apps-with-pins&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Production Apps with Pins&lt;/h2&gt;
&lt;p&gt;In order to create, serve, and visualize my bike-availability predictions, I’m going to use RStudio’s publishing and scheduling platform, &lt;a href=&#34;https://rstudio.com/products/connect/&#34;&gt;RStudio Connect&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As of RStudio Connect 1.7.8, you can publish pins to RStudio Connect, and pins of datasets provide a nice preview of the pin, as well as code to retrieve the pin in both R and Python.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;a-pin-on-rstudio-connect&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/rsc_pin.png&#34; alt=&#34;A pin on RStudio Connect&#34; /&gt;&lt;/h1&gt;
&lt;p&gt;The advantage of using RStudio Connect is that I can deploy R Markdown documents, Shiny apps, and Plumber APIs that create, use, and update the pins in addition to storing the pins themselves. I can also use the permissions and security of RStudio Connect to make sure that my pins are viewable only by those with the proper permissions.&lt;/p&gt;
&lt;p&gt;Here’s how the process works:&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;system-schematic&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/system_schematic.png&#34; alt=&#34;System Schematic&#34; /&gt;&lt;/h1&gt;
&lt;div id=&#34;section&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;1.&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_station_info/&#34;&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/bike_station_data.png&#34; alt=&#34;The bike station metadata, pinned on RStudio Connect, is updated every week by a scheduled RMarkdown document&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_station_data_ingest/&#34;&gt;RMarkdown here&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;section-1&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;2.&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_model_data/&#34;&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/bike_model_data.png&#34; alt=&#34;The analysis dataset is pinned to RStudio Connect by another RMarkdown job.&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_data_ingest/&#34;&gt;RMarkdown here&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;section-2&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;3.&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_available_model/&#34;&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/bike_model_train.png&#34; alt=&#34;An XGBoost model is trained and pinned to RStudio Connect on demand by a deployed RMarkdown script&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_model_build/&#34;&gt;RMarkdown here&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;section-3&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;4.&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_predict/&#34;&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/bike_api.png&#34; alt=&#34;A Plumber API is deployed on RStudio Connect, which calls the pinned model and serves model predictions.&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;section-4&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;5.&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://colorado.rstudio.com/rsc/bike_predict-app/&#34;&gt;&lt;img src=&#34;/post/2019-10-11-deploying-data-with-pins/index_files/bike_app.png&#34; alt=&#34;A Shiny app is deployed, which consumes the prediction API and visualizes the number of bikes available.&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The three times that a pin was useful here turn out to represent three of the most compelling reasons to use a pin.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A small dataset that gets reused&lt;/strong&gt;. By accessing the station metadata dataset in a pin, I know I’m always getting the latest version regardless of which asset is using it, and it’s also accessible for other analyses in the future.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;An analysis dataset when you can’t write back to the database&lt;/strong&gt;. In this case, I don’t want to write an analysis dataset back to the raw data lake, so it’s easier to store it as a pin.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A model in production&lt;/strong&gt;. By using a pin to store my model, it’s easy to update the version that’s in production by running the R Markdown document that trains the model. It’s also conceptually simple to update the model independently from the API that serves predictions or the Shiny app that visualizes the predictions.&lt;/p&gt;
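&lt;p&gt;To make that decoupling concrete, here is a minimal sketch of a Plumber file that serves predictions from a pinned model. It is not the code from the demo repository: the pin name &lt;code&gt;bike_model&lt;/code&gt;, the &lt;code&gt;/predict&lt;/code&gt; route, and the &lt;code&gt;hour&lt;/code&gt; predictor are hypothetical, and it assumes a model whose &lt;code&gt;predict()&lt;/code&gt; method accepts a data frame (an XGBoost model may need its input converted to a matrix first).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# plumber.R: illustrative sketch only; names below are assumptions
library(plumber)

# Register the RStudio Connect board, as shown earlier in this article
pins::board_register(board = &amp;quot;rsconnect&amp;quot;,
                     server = &amp;quot;https://colorado.rstudio.com/rsc&amp;quot;,
                     key = Sys.getenv(&amp;quot;RSTUDIOCONNECT_API_KEY&amp;quot;))

# Read the pinned model once, when the API process starts
model &amp;lt;- pins::pin_get(name = &amp;quot;bike_model&amp;quot;,  # hypothetical pin name
                       board = &amp;quot;rsconnect&amp;quot;)

#* Predict the number of bikes available
#* @param hour Hour of the day (hypothetical predictor)
#* @get /predict
function(hour) {
  # Query parameters arrive as strings, so convert before predicting
  new_data &amp;lt;- data.frame(hour = as.numeric(hour))
  predict(model, newdata = new_data)
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because this sketch reads the model from the pin at startup, re-running the training document and restarting the API would be enough to serve the new model; the API code itself would not change.&lt;/p&gt;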
&lt;p&gt;Pins can be a fantastic way to enable Shiny and Plumber in production. By giving data scientists a place to save and deploy the output of their projects, pins make it easier to create, deploy, and update models, datasets, and other production-ready R objects.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

      </description>
    </item>
    
  </channel>
</rss>
