<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Reproducibility on R Views</title>
    <link>https://rviews.rstudio.com/categories/reproducibility/</link>
    <description>Recent content in Reproducibility on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Fri, 17 Jun 2022 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/categories/reproducibility/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>Frank&#39;s R Workflow</title>
      <link>https://rviews.rstudio.com/2022/06/17/frank-s-workflow/</link>
      <pubDate>Fri, 17 Jun 2022 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2022/06/17/frank-s-workflow/</guid>
      <description>
        &lt;p&gt;&lt;a href=&#34;https://www.fharrell.com/&#34;&gt;Frank Harrell&amp;rsquo;s&lt;/a&gt; new eBook, &lt;a href=&#34;http://hbiostat.org/rflow/&#34;&gt;&lt;em&gt;R Workflow&lt;/em&gt;&lt;/a&gt;, which aims &amp;ldquo;to foster best practices in reproducible data documentation and manipulation, statistical analysis, graphics, and reporting,&amp;rdquo; is an ambitious document that is notable on multiple levels.&lt;/p&gt;

&lt;p&gt;To begin with, the workflow itself is much more than a simple progression of logical steps.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;workflow.png&#34; height = &#34;500&#34; width=&#34;100%&#34; alt=&#34;Diagram of Reproducible Research Workflow&#34;&gt;&lt;/p&gt;

&lt;p&gt;This workflow is clearly the result of a process forged through trial and error by a master statistician over many years. As the diagram indicates, the document takes a holistic view of a statistical analysis, covering document preparation, data manipulation, statistical practice, computational concerns, and more.&lt;/p&gt;

&lt;p&gt;Then, there is the synthesis of a wide range of content into a succinct, very readable exposition that dips into some very deep topics. Frank&amp;rsquo;s examples are streamlined presentations of analyses and code that are both sophisticated and practical. The missing value section suggests a whole array of analyses through a careful presentation of plots, and the section on data checking introduces a level of automation beyond what is commonly done.&lt;/p&gt;

&lt;p&gt;Frank&amp;rsquo;s writing style is clear, informal and from the perspective of a teacher who wants to show you some cool things along with the basics. For example, don&amp;rsquo;t miss the &lt;em&gt;if Trick&lt;/em&gt; in section 2.4.3.&lt;/p&gt;

&lt;p&gt;I should mention that Frank&amp;rsquo;s eBook is not a &lt;em&gt;tidyverse&lt;/em&gt; presentation. The code examples are built around base R, Frank&amp;rsquo;s &lt;code&gt;Hmisc&lt;/code&gt; and &lt;code&gt;rms&lt;/code&gt; packages, and an eclectic mix of packages that includes &lt;code&gt;data.table&lt;/code&gt;, &lt;code&gt;plotly&lt;/code&gt;, and the &lt;em&gt;tidyverse&lt;/em&gt; packages &lt;code&gt;haven&lt;/code&gt; and &lt;code&gt;ggplot2&lt;/code&gt;. In a way, this selection of packages reflects the evolution of R itself. For example, as with many popular R packages, &lt;code&gt;Hmisc&lt;/code&gt; most likely started out as Frank&amp;rsquo;s personal tool kit. However, after many years of Frank&amp;rsquo;s deep commitment to using R and contributing R tools, which includes seventy versions of &lt;code&gt;Hmisc&lt;/code&gt; in nineteen years, the package has become a fundamental resource. (Have a look at the reverse depends, imports, and suggests.) Also, the mix of packages with different design philosophies underlying &lt;em&gt;R Workflow&lt;/em&gt; reflects the flexibility of the R language and the organic growth of the R ecosystem.&lt;/p&gt;

&lt;p&gt;Perhaps the most striking aspect of the eBook is the way Frank uses &lt;a href=&#34;https://quarto.org/&#34;&gt;&lt;code&gt;Quarto&lt;/code&gt;&lt;/a&gt;, &lt;code&gt;knitr&lt;/code&gt; and &lt;code&gt;Hmisc&lt;/code&gt; to build an elegant reproducible document about building reproducible documents. For example, &lt;code&gt;Quarto&lt;/code&gt; permits the effective placement of plots in the right margins of the document, and the &lt;code&gt;Quarto&lt;/code&gt; &lt;em&gt;callouts&lt;/em&gt; in Section 3.4 enable the mini tutorials that include &lt;em&gt;Special Considerations for Latex/pdf&lt;/em&gt; and &lt;em&gt;Using Tooltips with Mermaid&lt;/em&gt; to be embedded in the document without interrupting its flow. Moreover, along with functions like &lt;code&gt;Hmisc::getHdata()&lt;/code&gt; and &lt;code&gt;Hmisc::getRs()&lt;/code&gt;, &lt;code&gt;Quarto&lt;/code&gt; enables the document to achieve a high level of reproducibility by pulling data and code directly from GitHub repositories.&lt;/p&gt;

&lt;p&gt;Not only can Frank&amp;rsquo;s &lt;em&gt;R Workflow&lt;/em&gt; teach you some serious statistics, but studying its construction will take you a long way towards building aesthetically pleasing reproducible documents.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Frank Harrell will be delivering a keynote address on August 26th at the upcoming &lt;a href=&#34;https://events.linuxfoundation.org/r-medicine/&#34;&gt;R/Medicine&lt;/a&gt; conference.&lt;/em&gt;&lt;/p&gt;

      </description>
    </item>
    
    <item>
      <title>Reproducible Environments</title>
      <link>https://rviews.rstudio.com/2019/04/22/reproducible-environments/</link>
      <pubDate>Mon, 22 Apr 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/04/22/reproducible-environments/</guid>
      <description>
        


&lt;p&gt;Great data science work should be reproducible. The ability to repeat
experiments is part of the foundation for all science, and reproducible work is
also critical for business applications. Team collaboration, project validation,
and sustainable products presuppose the ability to reproduce work over time.&lt;/p&gt;
&lt;p&gt;In my opinion, mastering just a handful of important tools will make
reproducible work in R much easier for data scientists. R users should be
familiar with version control, RStudio projects, and literate programming
through R Markdown. Once these tools are mastered, the major remaining challenge
is creating a reproducible environment.&lt;/p&gt;
&lt;p&gt;An environment consists of all the dependencies required to enable your code to
run correctly. This includes R itself, R packages, and system dependencies. As
with many programming languages, it can be challenging to manage reproducible R
environments. Common issues include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code that used to run no longer runs, even though the code has not changed.&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Being afraid to upgrade or install a new package, because it might break your code or someone else’s.&lt;br /&gt;
&lt;/li&gt;
&lt;li&gt;Typing &lt;code&gt;install.packages&lt;/code&gt; in your environment doesn’t do anything, or doesn’t do the &lt;em&gt;right&lt;/em&gt; thing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These challenges can be addressed through a careful combination of tools and
strategies. This post describes two use cases for reproducible environments:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Safely upgrading packages&lt;/li&gt;
&lt;li&gt;Collaborating on a team&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The sections below each cover a strategy to address the use case, and the necessary
tools to implement each strategy. Additional use cases, strategies, and tools are
presented at &lt;a href=&#34;https://environments.rstudio.com&#34; class=&#34;uri&#34;&gt;https://environments.rstudio.com&lt;/a&gt;. This website is a work in
progress, but we look forward to your feedback.&lt;/p&gt;
&lt;div id=&#34;safely-upgrading-packages&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Safely Upgrading Packages&lt;/h2&gt;
&lt;p&gt;Upgrading packages can be a risky affair. It is not difficult to find serious R
users who have been in a situation where upgrading a package had unintended
consequences. For example, the upgrade may have broken parts of their current code, or upgrading a
package for one project accidentally broke the code in another project. A
strategy for safely upgrading packages consists of three steps:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Isolate a project&lt;/li&gt;
&lt;li&gt;Record the current dependencies&lt;/li&gt;
&lt;li&gt;Upgrade packages&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first step in this strategy ensures one project’s packages and upgrades
won’t interfere with any other projects. Isolating projects is accomplished by
creating per-project libraries. A tool that makes this easy is the new &lt;a href=&#34;https://github.com/rstudio/renv&#34;&gt;&lt;code&gt;renv&lt;/code&gt;
package&lt;/a&gt;. Inside of your R project, simply use:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# inside the project directory
renv::init()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second step is to record the current dependencies. This step is critical
because it creates a safety net. If the package upgrade goes poorly, you’ll be
able to revert the changes and return to the record of the working state. Again,
the &lt;code&gt;renv&lt;/code&gt; package makes this process easy.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# record the current dependencies in a file called renv.lock
renv::snapshot()

# commit the lockfile alongside your code in version control
# and use this function to view the history of your lockfile
renv::history()

# if an upgrade goes astray, revert the lockfile
renv::revert(commit = &amp;quot;abc123&amp;quot;)

# and restore the previous environment
renv::restore()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With an isolated project and a safety net in place, you can now proceed to
upgrade or add new packages, while remaining certain the current functional
environment is still reproducible. The &lt;a href=&#34;https://github.com/r-lib/pak&#34;&gt;&lt;code&gt;pak&lt;/code&gt;
package&lt;/a&gt; can be used to install and upgrade
packages in an interactive environment:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# upgrade packages quickly and safely
pak::pkg_install(&amp;quot;ggplot2&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
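&lt;p&gt;After an upgrade, it can help to compare the project library against the lockfile before recording the new state. The following is a minimal sketch, assuming the project was initialized with &lt;code&gt;renv::init()&lt;/code&gt; as above and that you re-run your own checks between the two calls:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# report differences between the project library and renv.lock
renv::status()

# once the upgraded code has been tested, record the new state
renv::snapshot()&lt;/code&gt;&lt;/pre&gt;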
&lt;p&gt;The safety net provided by the &lt;code&gt;renv&lt;/code&gt; package relies on access to older versions
of R packages. For public packages, CRAN provides these older versions in the
&lt;a href=&#34;https://cran.rstudio.com/src/contrib/Archive&#34;&gt;CRAN archive&lt;/a&gt;. Organizations can
use tools like &lt;a href=&#34;https://rstudio.com/products/package-manager&#34;&gt;RStudio Package
Manager&lt;/a&gt; to make multiple versions
of private packages available. The &lt;a href=&#34;https://environments.rstudio.com/snapshot&#34;&gt;“snapshot and
restore”&lt;/a&gt; approach can also be used
to &lt;a href=&#34;https://environments.rstudio.com/deploy&#34;&gt;promote content to production&lt;/a&gt;. In
fact, this approach is exactly how &lt;a href=&#34;https://rstudio.com/products/connect&#34;&gt;RStudio
Connect&lt;/a&gt; and
&lt;a href=&#34;https://shinyapps.io&#34;&gt;shinyapps.io&lt;/a&gt; deploy thousands of R applications to
production each day!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;team-collaboration&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Team Collaboration&lt;/h2&gt;
&lt;p&gt;A common challenge on teams is sharing and running code. One strategy that
administrators and R users can adopt to facilitate collaboration is
shared baselines. The basics of the strategy are simple:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Administrators set up a common environment for R users by installing RStudio Server.&lt;/li&gt;
&lt;li&gt;On the server, administrators &lt;a href=&#34;https://support.rstudio.com/hc/en-us/articles/215488098&#34;&gt;install multiple versions of R&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Each version of R is tied to a frozen repository using an &lt;code&gt;Rprofile.site&lt;/code&gt; file.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By using a frozen repository, either administrators or users can install
packages while still being sure that everyone will get the same set of packages.
A frozen repository also ensures that adding new packages won’t upgrade other
shared packages as a side effect. New packages and upgrades are offered to users
over time through the addition of new versions of R.&lt;/p&gt;
&lt;p&gt;Frozen repositories can be created by manually cloning CRAN, accessing a service
like MRAN, or utilizing a supported product like &lt;a href=&#34;https://rstudio.com/products/package-manager&#34;&gt;RStudio Package
Manager&lt;/a&gt;.&lt;/p&gt;
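&lt;p&gt;Once a frozen repository exists, tying an R version to it (step 3 above) might look like the sketch below. The repository URL is a hypothetical placeholder, not a real service; substitute the dated snapshot your organization actually uses.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Rprofile.site for one R installation (illustrative sketch)
# the URL below is a made-up placeholder for a frozen, dated repository
local({
  frozen_repo &amp;lt;- &amp;quot;https://packages.example.com/cran/2019-04-22&amp;quot;
  options(repos = c(CRAN = frozen_repo))
})&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this in place, a plain &lt;code&gt;install.packages()&lt;/code&gt; call in that R version resolves packages against the frozen repository by default.&lt;/p&gt;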
&lt;p&gt;&lt;img src=&#34;/post/2019-04-15-repro-envs_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;adaptable-strategies&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Adaptable Strategies&lt;/h2&gt;
&lt;p&gt;The prior sections presented specific strategies for creating reproducible
environments in two common cases. The same strategy may not be appropriate for
every organization, R user, or situation. If you’re a student reporting an
error to your professor, capturing your &lt;code&gt;sessionInfo()&lt;/code&gt; may be all you need. In
contrast, a statistician working on a clinical trial will need a robust
framework for recreating their environment. &lt;strong&gt;Reproducibility is not binary!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019-04-15-repro-envs_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
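&lt;p&gt;At the lightweight end of that spectrum, a minimal sketch for capturing &lt;code&gt;sessionInfo()&lt;/code&gt; in a file you can attach to a bug report is shown below. The file name is an arbitrary choice for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# capture the printed session details (R version, platform, loaded packages)
info &amp;lt;- capture.output(sessionInfo())

# write them to a text file; the file name here is only an example
writeLines(info, &amp;quot;session-info.txt&amp;quot;)&lt;/code&gt;&lt;/pre&gt;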
&lt;p&gt;To help pick between strategies, we’ve developed a &lt;a href=&#34;https://environments.rstudio.com/reproduce&#34;&gt;strategy
map&lt;/a&gt;. By answering two questions,
you can quickly identify where your team falls on this map and identify the
nearest successful strategy. The two questions are represented on the x- and
y-axes of the map:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Do I have any restrictions on what packages can be used?&lt;/li&gt;
&lt;li&gt;Who is responsible for managing installed packages?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;em&gt;[Interactive figure: “Reproducing Environments: Strategies and Danger Zones.” The x-axis, “Who is Responsible for Reproducing the Environment?”, runs from Admins to Users; the y-axis, “Package Access”, runs from Locked Down to Open. Strategies: Validated (admins test and approve a subset of CRAN), Shared Baseline (all or most of CRAN, updated with R versions, tied to a system library), and Snapshot (open access, user or system records per-project dependencies). Danger zones: Ticket System (admins involved, no testing, slow updates, high risk of breakage), Wild West (open access, not reproducible, how we learn), and Blocked (backdoor package access, offline systems without a strategy).]&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;For more information on picking and using these strategies, please visit
&lt;a href=&#34;https://environments.rstudio.com&#34; class=&#34;uri&#34;&gt;https://environments.rstudio.com&lt;/a&gt;. By adopting a strategy for reproducible
environments, R users, administrators, and teams can solve a number of important
challenges. Ultimately, reproducible work adds credibility, creating a solid
foundation for research, business applications, and production systems. We are
excited to be working on tools to make reproducible work in R easy and fun. We
look forward to your feedback, community discussions, and future posts.&lt;/p&gt;
&lt;/div&gt;

      </description>
    </item>
    
    <item>
      <title>Organizing DataFest the tidy way</title>
      <link>https://rviews.rstudio.com/2017/04/05/datafestorg/</link>
      <pubDate>Wed, 05 Apr 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/04/05/datafestorg/</guid>
      <description>
        

&lt;p&gt;Organizing an event can be a full-time task in and of itself. I have been organizing ASA DataFest for six years at Duke, and over this time, the number of participants has grown from 23 students from Duke only, to 360 students from seven area schools this year!&lt;/p&gt;
&lt;p&gt;First, a bit about ASA DataFest: ASA DataFest is a data “hackathon” for students around the U.S., Canada, and Germany (for now; this list has been growing each year). Students spend a weekend working in small teams, around the clock, to find insight and meaning in a large, messy, and rich data set. For almost all students, it is the most complex data they have encountered, and they push themselves to master new skills, resurrect forgotten knowledge, and bring everything they’ve got to compete for the honor of being declared the best by a panel of expert judges.&lt;/p&gt;
&lt;p&gt;As an educator, statistician, and data scientist, growth in student interest in this event sounds fantastic to me. However, as the person responsible for running the event at Duke, it has also meant that for the couple of months leading up to DataFest, I have almost an additional full-time job dealing with everything from student registrations to promoting the event to putting in food orders. While I have not found an R-based solution for ordering food (yet!), this year I incorporated R and R Markdown in my organization workflow for grabbing, processing, and reporting registration information.&lt;/p&gt;
&lt;p&gt;This post highlights using Google Forms for data collection (e.g., registration), the &lt;code&gt;googlesheets&lt;/code&gt; package to pull that data into R, and packages from the tidyverse to manipulate, summarise, and visualize that data. Then, we use &lt;a href=&#34;http://rpubs.com/&#34;&gt;RPubs&lt;/a&gt; for publishing documents to be shared with participants and other constituents.&lt;/p&gt;
&lt;p&gt;Here is a list of all packages used in this article:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(googlesheets)
library(tidyverse)
library(stringr)
library(DT)
library(knitr)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In an effort to make it easier for others organizing DataFests to replicate my workflow, I have created a Google Drive containing all forms needed for registering participants and collecting information from consultants (mentors), and judges. I have also populated these forms with randomly generated names to showcase how these data are processed to yield the rosters and reports that are useful for organizing the event and disseminating registration information. All Google Forms mentioned can be found in the &lt;a href=&#34;https://drive.google.com/drive/u/1/folders/0B0Y2lFgS9uiDaEZvXzNGZ2xKNmM&#34;&gt;DataFest Organization Google Drive&lt;/a&gt;, which is available for public viewing. You can make a copy for your own use.&lt;/p&gt;
&lt;p&gt;Additionally, all R scripts and R Markdown documents used to process these data are available on the &lt;a href=&#34;https://github.com/mine-cetinkaya-rundel/datafest&#34;&gt;datafest GitHub repo&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;team-sign-ups&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Team sign ups&lt;/h3&gt;
&lt;p&gt;If a group of students has already formed a team, it makes sense for them to sign up as a team to ensure that they use the same team name and that everyone registers at once. &lt;a href=&#34;https://goo.gl/forms/0hXPw0Bj1zYhsfNP2&#34;&gt;This Google Form&lt;/a&gt; is used to sign such students up.&lt;/p&gt;
&lt;p&gt;One issue with registering each team as a single entry is that we end up with what we call “wide” data: each row represents a team, and within that row we have information on all students in that team. However, for most practical purposes (counting participants, plotting distributions of years and majors, figuring out how many t-shirts of each size to order, etc.) we need the data to be in “long” format, where each row represents a student.&lt;/p&gt;
&lt;p&gt;To accomplish this transformation, we first read the data in using the &lt;code&gt;googlesheets&lt;/code&gt; package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;part_wide &amp;lt;- gs_title(&amp;quot;DataFest [YEAR] @ [HOST] - Team Sign up (Responses)&amp;quot;) %&amp;gt;%
  gs_read()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, we realize that the variable names are a mess since they come directly from the questions on the Google Form!&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;names(part_wide)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using &lt;code&gt;stringr&lt;/code&gt;, we can get these variable names in concise snake_case shape:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;names(part_wide) &amp;lt;- names(part_wide) %&amp;gt;%
  str_replace(&amp;quot; of team member&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;%
  str_replace(&amp;quot; in DataFest before&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;%
  str_replace(&amp;quot; Check all that apply.&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;%
  str_replace(&amp;quot;Email address&amp;quot;, &amp;quot;email&amp;quot;) %&amp;gt;%
  str_replace(&amp;quot;Dietary restrictions&amp;quot;, &amp;quot;diet&amp;quot;) %&amp;gt;%
  str_replace(&amp;quot;Check if you agree&amp;quot;, &amp;quot;photo&amp;quot;) %&amp;gt;%
  str_replace(&amp;quot;\\:&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;%
  str_replace(&amp;quot;-&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;%
  str_replace_all(&amp;quot; &amp;quot;, &amp;quot;_&amp;quot;) %&amp;gt;%
  tolower()&lt;/code&gt;&lt;/pre&gt;
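&lt;p&gt;To see what these replacements do, here is a minimal sketch on a few made-up headers. The raw question wordings below are assumptions written to match the patterns above; they are not copied from the actual form. The sketch uses &lt;code&gt;fixed()&lt;/code&gt; for patterns containing punctuation so that they are matched literally rather than as regular expressions.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(stringr)
library(magrittr)  # provides the %&amp;gt;% pipe

# Hypothetical raw headers in the style of Google Form questions
raw_names &amp;lt;- c(
  &amp;quot;Timestamp&amp;quot;,
  &amp;quot;Team name&amp;quot;,
  &amp;quot;Last name of team member 1&amp;quot;,
  &amp;quot;T-shirt size of team member 1&amp;quot;,
  &amp;quot;Dietary restrictions of team member 1: Check all that apply.&amp;quot;
)

clean_names &amp;lt;- raw_names %&amp;gt;%
  str_replace(&amp;quot; of team member&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;%               # drop the repeated phrase
  str_replace(fixed(&amp;quot; Check all that apply.&amp;quot;), &amp;quot;&amp;quot;) %&amp;gt;%  # drop the instruction text
  str_replace(&amp;quot;Dietary restrictions&amp;quot;, &amp;quot;diet&amp;quot;) %&amp;gt;%       # shorten long labels
  str_replace(fixed(&amp;quot;:&amp;quot;), &amp;quot;&amp;quot;) %&amp;gt;%                      # remove punctuation
  str_replace(fixed(&amp;quot;-&amp;quot;), &amp;quot;&amp;quot;) %&amp;gt;%
  str_replace_all(&amp;quot; &amp;quot;, &amp;quot;_&amp;quot;) %&amp;gt;%                        # spaces to underscores
  tolower()

clean_names&lt;/code&gt;&lt;/pre&gt;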
&lt;p&gt;We can see that things look a lot better now:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;names(part_wide)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] &amp;quot;timestamp&amp;quot;       &amp;quot;team_name&amp;quot;       &amp;quot;last_name_1&amp;quot;    
##  [4] &amp;quot;first_name_1&amp;quot;    &amp;quot;school_1&amp;quot;        &amp;quot;tshirt_size_1&amp;quot;  
##  [7] &amp;quot;class_year_1&amp;quot;    &amp;quot;major_1&amp;quot;         &amp;quot;email_1&amp;quot;        
## [10] &amp;quot;participation_1&amp;quot; &amp;quot;diet_1&amp;quot;          &amp;quot;last_name_2&amp;quot;    
## [13] &amp;quot;first_name_2&amp;quot;    &amp;quot;school_2&amp;quot;        &amp;quot;tshirt_size_2&amp;quot;  
## [16] &amp;quot;class_year_2&amp;quot;    &amp;quot;major_2&amp;quot;         &amp;quot;email_2&amp;quot;        
## [19] &amp;quot;participation_2&amp;quot; &amp;quot;diet_2&amp;quot;          &amp;quot;last_name_3&amp;quot;    
## [22] &amp;quot;first_name_3&amp;quot;    &amp;quot;school_3&amp;quot;        &amp;quot;tshirt_size_3&amp;quot;  
## [25] &amp;quot;class_year_3&amp;quot;    &amp;quot;major_3&amp;quot;         &amp;quot;email_3&amp;quot;        
## [28] &amp;quot;participation_3&amp;quot; &amp;quot;diet_3&amp;quot;          &amp;quot;last_name_4&amp;quot;    
## [31] &amp;quot;first_name_4&amp;quot;    &amp;quot;school_4&amp;quot;        &amp;quot;tshirt_size_4&amp;quot;  
## [34] &amp;quot;class_year_4&amp;quot;    &amp;quot;major_4&amp;quot;         &amp;quot;email_4&amp;quot;        
## [37] &amp;quot;participation_4&amp;quot; &amp;quot;diet_4&amp;quot;          &amp;quot;last_name_5&amp;quot;    
## [40] &amp;quot;first_name_5&amp;quot;    &amp;quot;school_5&amp;quot;        &amp;quot;tshirt_size_5&amp;quot;  
## [43] &amp;quot;class_year_5&amp;quot;    &amp;quot;major_5&amp;quot;         &amp;quot;email_5&amp;quot;        
## [46] &amp;quot;participation_5&amp;quot; &amp;quot;diet_5&amp;quot;          &amp;quot;photo&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, we use &lt;code&gt;dplyr&lt;/code&gt; and &lt;code&gt;tidyr&lt;/code&gt; to transform the data from wide to long:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;participants &amp;lt;- part_wide %&amp;gt;%
  select(-photo) %&amp;gt;%
  # stack every per-member column into (column, entry) pairs
  gather(column, entry, last_name_1:diet_5) %&amp;gt;%
  # split e.g. &amp;quot;last_name_1&amp;quot; into the member number and the field name
  mutate(person_in_team = str_extract(column, &amp;quot;[0-9]&amp;quot;)) %&amp;gt;%
  mutate(column = str_replace(column, &amp;quot;_[0-9]&amp;quot;, &amp;quot;&amp;quot;)) %&amp;gt;%
  # one column per field, one row per team member
  spread(column, entry) %&amp;gt;%
  # drop the empty slots of teams with fewer than five members
  filter(!is.na(last_name)) %&amp;gt;%
  arrange(team_name, last_name, first_name) %&amp;gt;%
  # reorder; this also drops the helper column person_in_team
  select(timestamp, team_name, first_name, last_name, email, school, 
         class_year, major, participation, diet, tshirt_size)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s take a peek:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;participants&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 16 x 11
##             timestamp          team_name first_name last_name
##                 &amp;lt;chr&amp;gt;              &amp;lt;chr&amp;gt;      &amp;lt;chr&amp;gt;     &amp;lt;chr&amp;gt;
## 1   4/2/2017 22:20:05      Bae&amp;#39;s Theorem   Adrienne    Fuller
## 2   4/2/2017 22:20:05      Bae&amp;#39;s Theorem     Sylvia     Hicks
## 3   4/2/2017 22:20:05      Bae&amp;#39;s Theorem       Toni   Simpson
## 4   4/2/2017 22:20:05      Bae&amp;#39;s Theorem      Vicky     Water
## 5    4/4/2017 1:03:26      Bayes Anatomy   Meredith      Gray
## 6    4/4/2017 1:03:26      Bayes Anatomy      Derek  Shepherd
## 7   4/3/2017 16:14:00         Fake iid&amp;#39;s    Carolyn      Byrd
## 8   4/3/2017 16:14:00         Fake iid&amp;#39;s     Gordon   Hawkins
## 9   4/3/2017 16:14:00         Fake iid&amp;#39;s    Cecilia   Pittman
## 10  4/3/2017 16:14:00         Fake iid&amp;#39;s       Paul      Rios
## 11  4/3/2017 16:14:00         Fake iid&amp;#39;s       Ryan      Rose
## 12 3/31/2017 23:55:00 Passive Regression     Amanda      Boyd
## 13 3/31/2017 23:55:00 Passive Regression       Rosa       Fox
## 14 3/31/2017 23:55:00 Passive Regression      Lucas  Gonzales
## 15  4/1/2017 20:14:05            The Pit      James   Andrews
## 16  4/1/2017 20:14:05            The Pit        Tom  Lawrence
## # ... with 7 more variables: email &amp;lt;chr&amp;gt;, school &amp;lt;chr&amp;gt;, class_year &amp;lt;chr&amp;gt;,
## #   major &amp;lt;chr&amp;gt;, participation &amp;lt;chr&amp;gt;, diet &amp;lt;chr&amp;gt;, tshirt_size &amp;lt;chr&amp;gt;&lt;/code&gt;&lt;/pre&gt;
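&lt;p&gt;With &lt;code&gt;tidyr&lt;/code&gt; 1.0.0 or later, the same wide-to-long reshaping can also be sketched with &lt;code&gt;pivot_longer()&lt;/code&gt;, which handles the gather and spread steps in one call. The toy data frame and the names in it are made up for illustration; treat this as one possible alternative under those assumptions, not as the code used for the event.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)
library(tidyr)

# Toy wide data: one row per team, two member slots (all values are made up)
toy_wide &amp;lt;- tibble(
  team_name    = c(&amp;quot;Team A&amp;quot;, &amp;quot;Team B&amp;quot;),
  last_name_1  = c(&amp;quot;Lee&amp;quot;, &amp;quot;Kim&amp;quot;),
  first_name_1 = c(&amp;quot;Ann&amp;quot;, &amp;quot;Bo&amp;quot;),
  last_name_2  = c(&amp;quot;Diaz&amp;quot;, NA),
  first_name_2 = c(&amp;quot;Cy&amp;quot;, NA)
)

toy_long &amp;lt;- toy_wide %&amp;gt;%
  pivot_longer(
    cols = -team_name,
    # &amp;quot;.value&amp;quot; keeps the field name as a column; the trailing digits
    # become the member number
    names_to = c(&amp;quot;.value&amp;quot;, &amp;quot;person_in_team&amp;quot;),
    names_pattern = &amp;quot;(.*)_([0-9]+)$&amp;quot;
  ) %&amp;gt;%
  filter(!is.na(last_name)) %&amp;gt;%   # drop empty member slots
  select(-person_in_team)

toy_long&lt;/code&gt;&lt;/pre&gt;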
&lt;p&gt;We can now easily look for duplicates (sometimes students sign up two or more times) or use these data to explore the various features of participants.&lt;/p&gt;
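&lt;p&gt;A duplicate check might look like the following sketch. It assumes that the email address identifies a student, as the join on &lt;code&gt;email&lt;/code&gt; used elsewhere in this workflow does, and that differences in letter case should be ignored. The toy data and the column names &lt;code&gt;email_key&lt;/code&gt; and &lt;code&gt;n_signups&lt;/code&gt; are hypothetical.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

# Toy long-format data with one student registered twice (values are made up)
toy_participants &amp;lt;- tibble(
  first_name = c(&amp;quot;Ann&amp;quot;, &amp;quot;Bo&amp;quot;, &amp;quot;Ann&amp;quot;),
  last_name  = c(&amp;quot;Lee&amp;quot;, &amp;quot;Kim&amp;quot;, &amp;quot;Lee&amp;quot;),
  email      = c(&amp;quot;ann@example.edu&amp;quot;, &amp;quot;bo@example.edu&amp;quot;, &amp;quot;ANN@example.edu&amp;quot;)
)

duplicates &amp;lt;- toy_participants %&amp;gt;%
  mutate(email_key = tolower(email)) %&amp;gt;%          # compare emails case-insensitively
  add_count(email_key, name = &amp;quot;n_signups&amp;quot;) %&amp;gt;%   # sign-ups per email
  filter(n_signups &amp;gt; 1)                           # keep only repeated sign-ups

duplicates&lt;/code&gt;&lt;/pre&gt;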
&lt;/div&gt;
&lt;div id=&#34;individual-sign-ups&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Individual sign-ups&lt;/h3&gt;
&lt;p&gt;If a student wants to participate in DataFest but doesn’t have a team in mind, we ask them to fill out a brief survey about their background and how much time they want to commit to DataFest, ranging from &lt;em&gt;“I’m in it to win it”&lt;/em&gt; to &lt;em&gt;“I’m more interested in the experience, and am not really sure if I’ll submit a final presentation.”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Sometimes students find a team and register with that team after having filled out this survey. These students should be removed from the list of those looking for teammates, but Google Forms gives them no easy way to do this themselves (they can’t go back and remove their response).&lt;/p&gt;
&lt;p&gt;However, we can easily do this with an &lt;code&gt;anti_join&lt;/code&gt;. Suppose the data frame of survey responses is called &lt;code&gt;looking&lt;/code&gt;, and recall that the earlier data frame of students registering with teams is called &lt;code&gt;participants&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;looking &amp;lt;- anti_join(looking, participants, by = &amp;quot;email&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
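&lt;p&gt;As a small illustration of what the &lt;code&gt;anti_join&lt;/code&gt; keeps, the sketch below uses two made-up data frames. The names &lt;code&gt;looking_toy&lt;/code&gt;, &lt;code&gt;participants_toy&lt;/code&gt;, and &lt;code&gt;still_looking&lt;/code&gt; are hypothetical; only the join on &lt;code&gt;email&lt;/code&gt; mirrors the call above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

# Students who filled out the individual survey (values are made up)
looking_toy &amp;lt;- tibble(
  first_name = c(&amp;quot;Dana&amp;quot;, &amp;quot;Eli&amp;quot;, &amp;quot;Fay&amp;quot;),
  email      = c(&amp;quot;dana@example.edu&amp;quot;, &amp;quot;eli@example.edu&amp;quot;, &amp;quot;fay@example.edu&amp;quot;)
)

# Students who have since registered with a team
participants_toy &amp;lt;- tibble(
  team_name = &amp;quot;Team A&amp;quot;,
  email     = &amp;quot;eli@example.edu&amp;quot;
)

# Keep only the rows of looking_toy with no matching email in participants_toy
still_looking &amp;lt;- anti_join(looking_toy, participants_toy, by = &amp;quot;email&amp;quot;)

still_looking&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the match is exact, an email typed differently on the two forms would not be caught, so it may be worth normalizing the addresses first.&lt;/p&gt;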
&lt;p&gt;Then, the survey results are made available to the same students who are looking for teammates so that they can match up with others and form a team.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;looking %&amp;gt;%
  select(first_name, last_name, participation_level, class_year, major, school, participation_before, email) %&amp;gt;%
  arrange(participation_level, class_year, major, school) %&amp;gt;%
  datatable()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we make use of the &lt;code&gt;datatable&lt;/code&gt; function in the &lt;code&gt;DT&lt;/code&gt; package to display the list of students in a pretty and easily sortable and searchable format.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;consultants-and-judges&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Consultants and judges&lt;/h3&gt;
&lt;p&gt;Using a similar approach we can also grab, organize, and report lists of consultants and judges. All relevant code for this can be found in the &lt;a href=&#34;https://github.com/mine-cetinkaya-rundel/datafest&#34;&gt;datafest GitHub repo&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;participant-summary&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Participant summary&lt;/h3&gt;
&lt;p&gt;Now that we have our participant data in a tidy format, we can visualize distributions of majors, years, previous participation, etc.&lt;/p&gt;
&lt;p&gt;For example, we can count how many teams are participating from each school:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;participants %&amp;gt;%
  distinct(team_name, .keep_all = TRUE) %&amp;gt;%
  count(school) %&amp;gt;%
  arrange(desc(n))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 3 x 2
##                    school     n
##                     &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt;
## 1           Faber College     2
## 2 Port Chester University     2
## 3     Harrison University     1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or visualize the distribution of class years per school:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(data = participants, aes(x = school, fill = class_year)) +
  geom_bar(position = &amp;quot;fill&amp;quot;) +
  labs(title = &amp;quot;Schools and class years&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-04-04-organizing-datafest-the-tidy-way_files/figure-html/unnamed-chunk-2-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
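&lt;p&gt;The same counting pattern covers logistics such as the t-shirt order. Here is a minimal sketch on made-up data; the column &lt;code&gt;tshirt_size&lt;/code&gt; matches the tidy data above, while the toy values and the name &lt;code&gt;n_shirts&lt;/code&gt; are assumptions for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

# Toy long-format data: one row per student (values are made up)
toy_participants &amp;lt;- tibble(
  first_name  = c(&amp;quot;Ann&amp;quot;, &amp;quot;Bo&amp;quot;, &amp;quot;Cy&amp;quot;, &amp;quot;Dana&amp;quot;),
  tshirt_size = c(&amp;quot;M&amp;quot;, &amp;quot;L&amp;quot;, &amp;quot;M&amp;quot;, &amp;quot;S&amp;quot;)
)

tshirt_order &amp;lt;- toy_participants %&amp;gt;%
  count(tshirt_size, name = &amp;quot;n_shirts&amp;quot;) %&amp;gt;%   # shirts needed per size
  arrange(desc(n_shirts))                      # most requested size first

tshirt_order&lt;/code&gt;&lt;/pre&gt;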
&lt;/div&gt;
&lt;div id=&#34;information-guides&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Information guides&lt;/h3&gt;
&lt;p&gt;We can also use R Markdown to create mostly-text documents that introduce the event to the participants, consultants, and judges. Summary statistics and visualizations of the participants can then easily be included in these guides.&lt;/p&gt;
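&lt;p&gt;For instance, a code chunk in such a guide could render a small summary table with &lt;code&gt;kable&lt;/code&gt; from the &lt;code&gt;knitr&lt;/code&gt; package. This is only a sketch on made-up data; the toy values and the table labels are assumptions, and the actual guides may present the summaries differently.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)
library(knitr)

# Toy long-format data: one row per student (values are made up)
toy_participants &amp;lt;- tibble(
  team_name = c(&amp;quot;Team A&amp;quot;, &amp;quot;Team A&amp;quot;, &amp;quot;Team B&amp;quot;),
  school    = c(&amp;quot;School X&amp;quot;, &amp;quot;School X&amp;quot;, &amp;quot;School Y&amp;quot;)
)

# Number of registered students per school
school_summary &amp;lt;- toy_participants %&amp;gt;%
  count(school, name = &amp;quot;participants&amp;quot;)

# Render the summary as a table in the knitted document
kable(school_summary, col.names = c(&amp;quot;School&amp;quot;, &amp;quot;Participants&amp;quot;))&lt;/code&gt;&lt;/pre&gt;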
&lt;p&gt;Sample guides for participants and consultants/judges can also be found on the &lt;a href=&#34;https://github.com/mine-cetinkaya-rundel/datafest&#34;&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And finally, all of these can be published on RPubs. However, note that these documents will be publicly available.&lt;/p&gt;
&lt;/div&gt;

      </description>
    </item>
    
  </channel>
</rss>
