<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>dbplyr on R Views</title>
    <link>https://rviews.rstudio.com/tags/dbplyr/</link>
    <description>Recent content in dbplyr on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 18 Oct 2017 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/tags/dbplyr/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>Database Queries With R</title>
      <link>https://rviews.rstudio.com/2017/10/18/database-queries-with-r/</link>
      <pubDate>Wed, 18 Oct 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/10/18/database-queries-with-r/</guid>
      <description>
        


&lt;p&gt;There are many ways to query data with R. This post shows you three of the most common ways:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Using &lt;code&gt;DBI&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Using &lt;code&gt;dplyr&lt;/code&gt; syntax&lt;/li&gt;
&lt;li&gt;Using R Notebooks&lt;/li&gt;
&lt;/ol&gt;
&lt;div id=&#34;background&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Background&lt;/h3&gt;
&lt;p&gt;Several recent package improvements make it easier for you to use databases with R. The query examples below demonstrate some of the capabilities of these R packages.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://rstats-db.github.io/DBI//index.html&#34;&gt;DBI&lt;/a&gt;. The &lt;code&gt;DBI&lt;/code&gt; specification has gone through many &lt;a href=&#34;https://www.r-consortium.org/blog/2017/05/15/improving-dbi-a-retrospect&#34;&gt;recent improvements&lt;/a&gt;. When working with databases, you should always use packages that are &lt;code&gt;DBI&lt;/code&gt;-compliant.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://dplyr.tidyverse.org/&#34;&gt;dplyr&lt;/a&gt; &amp;amp; &lt;a href=&#34;http://dbplyr.tidyverse.org/&#34;&gt;dbplyr&lt;/a&gt;. The &lt;code&gt;dplyr&lt;/code&gt; package now has a generalized SQL backend for talking to databases, and the new &lt;code&gt;dbplyr&lt;/code&gt; package translates R code into database-specific variants. As of this writing, SQL variants are supported for the following databases: Oracle, Microsoft SQL Server, PostgreSQL, Amazon Redshift, Apache Hive, and Apache Impala. More will follow over time.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/rstats-db/odbc&#34;&gt;odbc&lt;/a&gt;. The &lt;code&gt;odbc&lt;/code&gt; R package provides a standard way for you to connect to any database as long as you have an ODBC driver installed. The &lt;code&gt;odbc&lt;/code&gt; R package is &lt;code&gt;DBI&lt;/code&gt;-compliant, and is recommended for ODBC connections.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;RStudio also made recent improvements to its products so they work better with databases.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://blog.rstudio.com/2017/10/09/rstudio-v1.1-released/&#34;&gt;RStudio IDE (v1.1)&lt;/a&gt;. With the latest version of the RStudio IDE, you can connect to, explore, and view data in a variety of databases. The IDE has a wizard for setting up new connections, and a tab for exploring established connections. These new features are extensible and will work with any R package that has a &lt;a href=&#34;https://rstudio.github.io/rstudio-extensions/connections-contract.html&#34;&gt;connections contract&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.rstudio.com/products/drivers/&#34;&gt;RStudio Professional Drivers&lt;/a&gt;. If you are using RStudio professional products, you can download RStudio Professional Drivers at no additional cost. The examples below use the Oracle ODBC driver. If you are using open-source tools, you can bring your own driver or use community packages – many open-source drivers and community packages exist for connecting to a variety of databases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Using databases with R is a broad subject and there is more work to be done. An earlier blog post discussed &lt;a href=&#34;https://blog.rstudio.com/2017/06/27/dbplyr-1-1-0/&#34;&gt;our vision&lt;/a&gt;. Part of that vision was to create a website where you can find everything about databases and R in one place. To learn more, visit our site at &lt;a href=&#34;http://db.rstudio.com/best-practices/drivers&#34;&gt;db.rstudio.com&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;example-query-bank-data-in-an-oracle-database&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Example: Query bank data in an Oracle database&lt;/h3&gt;
&lt;p&gt;In this example, we will query bank data in an Oracle database. We connect to the database by using the &lt;code&gt;DBI&lt;/code&gt; and &lt;code&gt;odbc&lt;/code&gt; packages. This specific connection requires a database driver and a data source name (DSN) that have both been configured by the system administrator. Your connection might use another method.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(DBI)
library(dplyr)
library(dbplyr)
library(odbc)
con &amp;lt;- dbConnect(odbc::odbc(), &amp;quot;Oracle DB&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;query-using-dbi&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;1. Query using &lt;code&gt;DBI&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;You can query your data with &lt;code&gt;DBI&lt;/code&gt; by using the &lt;code&gt;dbGetQuery()&lt;/code&gt; function. Simply paste your SQL code into the R function as a quoted string. This method is sometimes referred to as &lt;em&gt;pass-through SQL&lt;/em&gt;, and is probably the simplest way to query your data. Take care to escape any quotes that match the ones delimiting the R string. For example, inside a single-quoted string, &lt;code&gt;&#39;yes&#39;&lt;/code&gt; is written as &lt;code&gt;\&#39;yes\&#39;&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dbGetQuery(con,&amp;#39;
  select &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;,
  sum(case when &amp;quot;term_deposit&amp;quot; = \&amp;#39;yes\&amp;#39; then 1.0 else 0.0 end) as subscribe,
  count(*) as total
  from &amp;quot;bank&amp;quot;
  group by &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;
&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
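&lt;p&gt;If you are on R 4.0.0 or later, raw strings are one way to avoid the escaping described above. The sketch below only compares two ways of writing the same query text in R and does not touch the database. The shortened query and the variable names &lt;code&gt;escaped&lt;/code&gt; and &lt;code&gt;raw&lt;/code&gt; are illustrative assumptions, not part of the original example.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Assumes R 4.0.0 or later, which introduced the r&amp;quot;(...)&amp;quot; raw string syntax.

# Single-quoted string: the inner single quotes must be escaped.
escaped &amp;lt;- &amp;#39;select count(*) from &amp;quot;bank&amp;quot; where &amp;quot;term_deposit&amp;quot; = \&amp;#39;yes\&amp;#39;&amp;#39;

# Raw string: both kinds of quotes can be pasted unchanged.
raw &amp;lt;- r&amp;quot;(select count(*) from &amp;quot;bank&amp;quot; where &amp;quot;term_deposit&amp;quot; = &amp;#39;yes&amp;#39;)&amp;quot;

# Both spellings produce exactly the same SQL text.
identical(escaped, raw)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Either string can then be passed to &lt;code&gt;dbGetQuery()&lt;/code&gt; as the SQL argument.&lt;/p&gt;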
&lt;/div&gt;
&lt;div id=&#34;query-using-dplyr-syntax&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;2. Query using dplyr syntax&lt;/h3&gt;
&lt;p&gt;You can write your code in &lt;code&gt;dplyr&lt;/code&gt; syntax, and &lt;code&gt;dbplyr&lt;/code&gt; will translate it into SQL. There are several benefits to writing queries in &lt;code&gt;dplyr&lt;/code&gt; syntax: you can use the same language for both R objects and database tables, no knowledge of SQL or the specific SQL variant is required, and you can take advantage of the fact that &lt;code&gt;dplyr&lt;/code&gt; uses &lt;a href=&#34;http://dbplyr.tidyverse.org/articles/dbplyr.html&#34;&gt;lazy evaluation&lt;/a&gt;. &lt;code&gt;dplyr&lt;/code&gt; syntax is easy to read, but you can always inspect the SQL translation with the &lt;code&gt;show_query()&lt;/code&gt; function.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;q1 &amp;lt;- tbl(con, &amp;quot;bank&amp;quot;) %&amp;gt;%
  group_by(month_idx, year, month) %&amp;gt;%
  summarise(
    subscribe = sum(ifelse(term_deposit == &amp;quot;yes&amp;quot;, 1, 0)),
    total = n())
show_query(q1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;SQL&amp;gt;
SELECT &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;, SUM(CASE WHEN (&amp;quot;term_deposit&amp;quot; = &amp;#39;yes&amp;#39;) THEN (1.0) ELSE (0.0) END) AS &amp;quot;subscribe&amp;quot;, COUNT(*) AS &amp;quot;total&amp;quot;
FROM (&amp;quot;bank&amp;quot;) 
GROUP BY &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;&lt;/code&gt;&lt;/pre&gt;
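&lt;p&gt;Because of lazy evaluation, &lt;code&gt;q1&lt;/code&gt; is only a query definition, and no rows are retrieved until you ask for results. A minimal sketch, assuming the &lt;code&gt;con&lt;/code&gt; and &lt;code&gt;q1&lt;/code&gt; objects defined above are still available and using the hypothetical name &lt;code&gt;monthly&lt;/code&gt; for the result, shows how &lt;code&gt;collect()&lt;/code&gt; runs the query and brings the result into R:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(dplyr)

# q1 is still a lazy query at this point. collect() sends the
# generated SQL to the database and returns a local tibble.
monthly &amp;lt;- collect(q1)

# monthly now lives in R memory and can be used like any data frame.
head(monthly)&lt;/code&gt;&lt;/pre&gt;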
&lt;/div&gt;
&lt;div id=&#34;query-using-an-r-notebooks&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;3. Query using R Notebooks&lt;/h3&gt;
&lt;p&gt;Did you know that you can run SQL code in an &lt;a href=&#34;http://rmarkdown.rstudio.com/r_notebooks.html&#34;&gt;R Notebook&lt;/a&gt; code chunk? To use SQL, open an &lt;a href=&#34;http://rmarkdown.rstudio.com/r_notebooks.html&#34;&gt;R Notebook&lt;/a&gt; in the RStudio IDE under the &lt;strong&gt;File &amp;gt; New File&lt;/strong&gt; menu. Start a new code chunk with &lt;code&gt;{sql}&lt;/code&gt;, and specify your connection with the &lt;code&gt;connection=con&lt;/code&gt; code chunk option. If you want to send the query output to an R data frame, use &lt;code&gt;output.var = &amp;quot;mydataframe&amp;quot;&lt;/code&gt; in the code chunk options. When you specify &lt;code&gt;output.var&lt;/code&gt;, you will be able to use the output in subsequent R code chunks. In this example, we use the output in &lt;code&gt;ggplot2&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;```{sql, connection=con, output.var = &amp;quot;mydataframe&amp;quot;}
SELECT &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;, SUM(CASE WHEN (&amp;quot;term_deposit&amp;quot; = &amp;#39;yes&amp;#39;) THEN (1.0) ELSE (0.0) END) AS &amp;quot;subscribe&amp;quot;,
COUNT(*) AS &amp;quot;total&amp;quot;
FROM (&amp;quot;bank&amp;quot;) 
GROUP BY &amp;quot;month_idx&amp;quot;, &amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;
```&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;```{r}
library(ggplot2)
ggplot(mydataframe, aes(total, subscribe, color = year)) +
  geom_point() +
  xlab(&amp;quot;Total contacts&amp;quot;) +
  ylab(&amp;quot;Term Deposit Subscriptions&amp;quot;) +
  ggtitle(&amp;quot;Contact volume&amp;quot;)
```&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-10-18-database-queries-with-R/bankggplot.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;One benefit of using SQL in a code chunk is that you can paste your SQL code without any modification. For example, you do not have to escape quotes. If you are using the proverbial &lt;em&gt;spaghetti code&lt;/em&gt; that is hundreds of lines long, then a SQL code chunk might be a good option. Another benefit is that the SQL code in a code chunk is highlighted, making it very easy to read. For more information on SQL engines, see this page on &lt;a href=&#34;http://rmarkdown.rstudio.com/authoring_knitr_engines.html&#34;&gt;knitr language engines&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;summary&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;There is no single best way to query data with R. You have many methods to choose from, and each has its advantages. Here are some of the advantages of using the methods described in this article.&lt;/p&gt;
&lt;table&gt;
&lt;colgroup&gt;
&lt;col width=&#34;34%&#34; /&gt;
&lt;col width=&#34;65%&#34; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Advantages&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;DBI::dbGetQuery&lt;/li&gt;
&lt;/ol&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Fewer dependencies required&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;&lt;ol start=&#34;2&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;dplyr syntax&lt;/li&gt;
&lt;/ol&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Use the same syntax for R and database objects&lt;/li&gt;
&lt;li&gt;No knowledge of SQL required&lt;/li&gt;
&lt;li&gt;Code is standard across SQL variants&lt;/li&gt;
&lt;li&gt;Lazy evaluation&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;&lt;ol start=&#34;3&#34; style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;R Notebook SQL engine&lt;/li&gt;
&lt;/ol&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;Copy and paste SQL – no formatting required&lt;/li&gt;
&lt;li&gt;SQL syntax is highlighted&lt;/li&gt;
&lt;/ul&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;em&gt;You can download the R Notebook for these examples &lt;a href=&#34;http://rpubs.com/nwstephens/318586&#34;&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

      </description>
    </item>
    
  </channel>
</rss>
