<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Streaming Data on R Views</title>
    <link>https://rviews.rstudio.com/tags/streaming-data/</link>
    <description>Recent content in Streaming Data on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 15 Nov 2017 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/tags/streaming-data/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>Using Shiny with Scheduled and Streaming Data</title>
      <link>https://rviews.rstudio.com/2017/11/15/shiny-and-scheduled-data-r/</link>
      <pubDate>Wed, 15 Nov 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/11/15/shiny-and-scheduled-data-r/</guid>
      <description>
        

&lt;p&gt;&lt;em&gt;Note: This article is now several years old. If you have RStudio Connect, there are more &lt;a href=&#34;https://medium.com/@kelly.obriant/basic-builds-how-to-update-data-in-a-shiny-app-on-rstudio-connect-48593902b1e2&#34;&gt;modern ways of updating data in a Shiny app&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Shiny applications are often backed by fluid, changing data. Data updates can occur at different time scales: from scheduled daily updates to live streaming data and ad-hoc user inputs. This article describes best practices for handling data updates in Shiny, and discusses deployment strategies for automating data updates.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2017-11-15-shiny-and-scheduled-data/rviews_scheduled_shiny.002.jpeg&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;This post builds on a 2017 rstudio::conf talk. The recording of the &lt;a href=&#34;https://www.rstudio.com/resources/videos/dashboards-made-easy/&#34;&gt;original talk&lt;/a&gt; and the &lt;a href=&#34;https://github.com/slopp/scheduledsnow&#34;&gt;sample code&lt;/a&gt; for this post are available.&lt;/p&gt;

&lt;p&gt;The end goal of this example is a dashboard to help skiers in Colorado select a resort to visit. Recommendations are based on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Snow reports that provide useful metrics like number of runs open and amount of new snow. Snow reports are updated &lt;strong&gt;daily&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Weather data, updated in &lt;strong&gt;near real-time from a live stream&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;User preferences, entered in the dashboard.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The backend for the dashboard looks like:&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2017-11-15-shiny-and-scheduled-data/rviews_scheduled_shiny.003.jpeg&#34; alt=&#34;&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;automate-scheduled-data-updates&#34;&gt;Automate Scheduled Data Updates&lt;/h2&gt;

&lt;p&gt;The first challenge is preparing the daily data. In this case, the data preparation requires a series of API requests and then basic data cleansing. The code for this process is written &lt;strong&gt;into an R Markdown document&lt;/strong&gt;, alongside process documentation and a few simple graphs that help validate the new data. The R Markdown document ends by saving the cleansed data into a shared data directory. The entire R Markdown document is scheduled for execution.&lt;/p&gt;

&lt;p&gt;It may seem odd at first to use an R Markdown document as the scheduled task. However, our team has found it incredibly useful to be able to look back through historical renderings of the &amp;ldquo;report&amp;rdquo; to gut-check the process. Using R Markdown also forces us to properly document the scheduled process.&lt;/p&gt;

&lt;p&gt;We use RStudio Connect to schedule the document, view past renderings, and ultimately host the application. If the job fails, Connect also sends us an email containing &lt;code&gt;stdout&lt;/code&gt; from the render, which helps us stay on top of errors. (Connect can optionally send the successfully rendered report, as well.) However, the same scheduling could be accomplished with a workflow tool or even cron.&lt;/p&gt;

&lt;p&gt;Make sure the data written to shared storage is readable by the user running the Shiny application. Typically, a service account like &lt;code&gt;rstudio-connect&lt;/code&gt; or &lt;code&gt;shiny&lt;/code&gt; can be set as the run-as user to ensure consistent behavior.&lt;/p&gt;
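
&lt;p&gt;As a minimal sketch of that final step, the last chunk of the scheduled document could write the cleansed data to a temporary file and then rename it into place, so the dashboard is unlikely to read a half-written file. The helper name &lt;code&gt;save_shared_csv&lt;/code&gt;, the file mode, and the use of base R CSV functions are assumptions for illustration and are not taken from the sample code.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Hypothetical helper for the last chunk of the scheduled R Markdown document.
save_shared_csv &amp;lt;- function(data, path) {
  # Write next to the target first; a rename within one directory replaces
  # the old file in a single step on most file systems.
  tmp &amp;lt;- tempfile(tmpdir = dirname(path), fileext = &#34;.csv&#34;)
  utils::write.csv(data, tmp, row.names = FALSE)
  # Assumed mode: the owner can write, every other user can read.
  Sys.chmod(tmp, mode = &#34;0644&#34;)
  file.rename(tmp, path)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The function returns &lt;code&gt;TRUE&lt;/code&gt; when the rename succeeds, which the document can check before finishing the render.&lt;/p&gt;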

&lt;p&gt;Alternatively, instead of writing results to the file system, prepped data can be saved to a view in a database.&lt;/p&gt;

&lt;h2 id=&#34;using-scheduled-data-in-shiny&#34;&gt;Using Scheduled Data in Shiny&lt;/h2&gt;

&lt;p&gt;The dashboard needs to look for updates to the underlying shared data and automatically update when the data changes. (It wouldn&amp;rsquo;t be a very good dashboard if users had to refresh a page to see new data.) In Shiny, this behavior is accomplished with the &lt;code&gt;reactiveFileReader&lt;/code&gt; function:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;daily_data &amp;lt;- reactiveFileReader(
  intervalMillis = 100,
  filePath       = &#39;path/to/shared/data&#39;,
  readFunc       = readr::read_csv
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The function checks the shared data file&amp;rsquo;s update timestamp every &lt;code&gt;intervalMillis&lt;/code&gt; milliseconds to see if the data has changed. If the data has changed, the file is re-read using &lt;code&gt;readFunc&lt;/code&gt;. The resulting data object, &lt;code&gt;daily_data&lt;/code&gt;, is reactive and can be used in downstream functions like &lt;code&gt;render***&lt;/code&gt;.&lt;/p&gt;
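
&lt;p&gt;To illustrate the downstream use, the sketch below places the reader inside a server function and feeds it to a render function. The output name &lt;code&gt;snow_table&lt;/code&gt; is hypothetical and assumes a matching &lt;code&gt;tableOutput&lt;/code&gt; in the UI.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;server &amp;lt;- function(input, output, session) {
  daily_data &amp;lt;- shiny::reactiveFileReader(
    intervalMillis = 100,
    session        = session,
    filePath       = &#39;path/to/shared/data&#39;,
    readFunc       = readr::read_csv
  )

  # Calling daily_data() creates the reactive dependency, so the table
  # re-renders whenever the file is re-read.
  output$snow_table &amp;lt;- shiny::renderTable({
    head(daily_data())
  })
}
&lt;/code&gt;&lt;/pre&gt;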

&lt;p&gt;If the cleansed data is stored in a database instead of written to a file in shared storage, use &lt;code&gt;reactivePoll&lt;/code&gt;. &lt;code&gt;reactivePoll&lt;/code&gt; is similar to &lt;code&gt;reactiveFileReader&lt;/code&gt;, but instead of checking the file&amp;rsquo;s update timestamp, a second function needs to be supplied that identifies when the database is updated. The function&amp;rsquo;s &lt;a href=&#34;https://shiny.rstudio.com/reference/shiny/latest/reactivePoll.html&#34;&gt;help documentation&lt;/a&gt; includes an example.&lt;/p&gt;
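
&lt;p&gt;A rough sketch of the database variant follows. It assumes an open DBI connection &lt;code&gt;con&lt;/code&gt; and a hypothetical table &lt;code&gt;snow_reports&lt;/code&gt; with an &lt;code&gt;updated_at&lt;/code&gt; column; the wrapper &lt;code&gt;make_server&lt;/code&gt; and the output name &lt;code&gt;snow_table&lt;/code&gt; are also illustrative.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Hypothetical schema: a snow_reports table with an updated_at column.
make_server &amp;lt;- function(con) {
  function(input, output, session) {
    daily_data &amp;lt;- shiny::reactivePoll(
      intervalMillis = 100,
      session        = session,
      # Cheap query: only the latest update timestamp is fetched on each check.
      checkFunc = function() {
        DBI::dbGetQuery(con, &#34;SELECT MAX(updated_at) AS ts FROM snow_reports&#34;)$ts
      },
      # Expensive query: runs only when the value from checkFunc changes.
      valueFunc = function() {
        DBI::dbGetQuery(con, &#34;SELECT * FROM snow_reports&#34;)
      }
    )

    output$snow_table &amp;lt;- shiny::renderTable(head(daily_data()))
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The returned function can be passed to &lt;code&gt;shinyApp&lt;/code&gt; as the server. Keeping the check query small matters because it runs on every polling interval, while the full read runs only after a change is detected.&lt;/p&gt;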

&lt;h2 id=&#34;streaming-data&#34;&gt;Streaming Data&lt;/h2&gt;

&lt;p&gt;The second challenge is updating the dashboard with live streaming weather data. One way for Shiny to ingest a stream of data is by turning the stream into &amp;ldquo;micro-batches&amp;rdquo;. The &lt;code&gt;invalidateLater&lt;/code&gt; function can be used for this purpose:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;liveish_data &amp;lt;- reactive({
  invalidateLater(100)
  httr::GET(...)
})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This causes Shiny to poll the streaming API every 100 milliseconds for new data. The results are available in the reactive data object &lt;code&gt;liveish_data&lt;/code&gt;. Picking how often to poll for data depends on a few factors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Does the upstream API enforce rate limits?&lt;/li&gt;
&lt;li&gt;How long does a data update take? The application will be blocked while it polls data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal is to pick a polling time that balances the user&amp;rsquo;s desire for &amp;ldquo;live&amp;rdquo; data with these two concerns.&lt;/p&gt;
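
&lt;p&gt;One way to act on both concerns is sketched below: the polling interval is a single named value, and the request carries a timeout shorter than that interval so a slow API cannot block the application indefinitely. The URL, the interval, the timeout, and the helper &lt;code&gt;fetch_weather&lt;/code&gt; are all assumptions for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-r&#34;&gt;# Hypothetical endpoint and interval; substitute the real streaming API.
weather_url &amp;lt;- &#34;https://example.com/weather/latest&#34;
poll_ms     &amp;lt;- 5000

# Returns the parsed response body, or NULL if the request fails or times out.
fetch_weather &amp;lt;- function(url, timeout_s = 2) {
  tryCatch({
    resp &amp;lt;- httr::GET(url, httr::timeout(timeout_s))
    httr::stop_for_status(resp)
    httr::content(resp, as = &#34;parsed&#34;)
  }, error = function(e) NULL)
}

server &amp;lt;- function(input, output, session) {
  liveish_data &amp;lt;- shiny::reactive({
    # Schedule the next micro-batch, then fetch the current one.
    shiny::invalidateLater(poll_ms, session)
    fetch_weather(weather_url)
  })
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Downstream code should handle a &lt;code&gt;NULL&lt;/code&gt; result, for example by keeping the previous display until the next successful poll.&lt;/p&gt;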

&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;To summarize, this architecture provides a number of benefits: No more painful, manual running of R code every day! Dashboard code is isolated from data prep code. There is enough flexibility to meet user requirements for live and daily data, while preventing unnecessary number crunching on the backend.&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
