Calling R From Python With rpy2

2022-05-25

by Isabella Velásquez

This post on R Views is about… Python! Surprising, I know. Python has several well-written packages for statistics and data science, but CRAN, R’s central repository, contains thousands of packages implementing sophisticated statistical algorithms that have been field-tested over many years. Thanks to the rpy2 package, Pythonistas can take advantage of the great work already done by the R community. rpy2 provides an interface that allows you to run R in Python processes. Users can move between languages and use the best of both programming languages.

Below, I walk you through how to call three powerful R packages from Python: stats, lme4, and ggplot2. Each section contains detailed steps, and you can find the complete script in the appendix.

Getting started with rpy2

Installing rpy2

First up, install the necessary packages. You must have Python >=3.7 and R >= 4.0 installed to use rpy2 3.5.2. Once R is installed, install the rpy2 package by running pip install rpy2. If you’d like to see where you installed rpy2 on your machine, you can run python -m rpy2.situation.

If you are working in a Jupyter notebook, you may want to see your ggplot2 plots within your notebook. Run conda install r-ggplot2 in your Jupyter environment so that they show up.

Installation

pip install rpy2
conda install r-ggplot2

Importing rpy2 packages and subpackages

Then, you’ll import the packages and subpackages. Import the top-level rpy2 package by running import rpy2.

Import the top-level subpackage robjects with import rpy2.robjects as robjects. Running robjects also initializes R in the current Python process.

There are a couple of other steps to make working in a notebook a bit easier:

rpy2 customizes the display of R objects, such as data frames in a notebook. Run rpy2.ipython.html.init_printing() to enable this customization.
You may want to see ggplot2 objects in an output cell of a notebook. Enable this with from rpy2.ipython.ggplot import image_png.

Import rpy2 packages and subpackages

import rpy2
import rpy2.robjects as robjects

## To aid in printing HTML in notebooks
import rpy2.ipython.html
rpy2.ipython.html.init_printing()

## To see plots in an output cell
from rpy2.ipython.ggplot import image_png

Installing and loading R packages with rpy2

Installing and loading R packages are often the first steps in R scripts. The rpy2 package provides a function rpy2.robjects.packages.importr() that mimics these steps. Run the below, where you’ll also import the function data() for later.

from rpy2.robjects.packages import importr, data

Use importr() to load the utils and base packages, which normally come preinstalled with R.

utils = importr('utils')
base = importr('base')

You can also download and install packages from a package repository (such as CRAN) with rpy2.

First, select your desired mirror with utils.chooseCRANmirror() and install your packages with utils.install_packages(). Below, you will install stats and lme4.

utils.install_packages('stats')
utils.install_packages('lme4')

Once installed, load the packages with importr().

stats = importr('stats')
lme4 = importr('lme4')

Installing and loading ggplot2 with rpy2

The rpy2 documentation recommends a different method for installing ggplot2:

import rpy2.robjects.lib.ggplot2 as ggplot2

Installation of R packages

from rpy2.robjects.packages import importr, data

utils = importr('utils')
base = importr('base')

utils.chooseCRANmirror(ind=1)
utils.install_packages('lme4')
utils.install_packages('stats')

stats = importr('stats')
lme4 = importr('lme4')

import rpy2.robjects.lib.ggplot2 as ggplot2

Using the Python interface for R

With everything set up, you can begin using the Python interface! However, this is not as simple as writing R code in your Jupyter notebook. You have to ‘translate’ R functions to call them from Python.

Below, I analyze the sleepstudy data from lme4. The data represents the average reaction time per day (in milliseconds) for subjects in a sleep deprivation study. On days 0-1, the subjects had their regular amount of sleep. Day 2 was baseline, after which sleep deprivation started.

Working with data in rpy2

You can ‘fetch’ data from R packages with rpy2. The code below is the equivalent to lme4::sleepstudy in R. Notice you use the data() function imported earlier:

sleepstudy = data(lme4).fetch('sleepstudy')['sleepstudy']
sleepstudy

DataFrame with 180 rows and 3 columns:

		Reaction	Days	Subject
0	1	249.56	0.0	308
1	2	258.7047	1.0	308
2	3	250.8006	2.0	308
3	4	321.4398	3.0	308
4	5	356.8519	4.0	308
5	6	414.6901	5.0	308
6	7	382.2038	6.0	308
7	8	290.1486	7.0	308
...	...	...	...	...
178	179	369.1417	8.0	372
179	180	364.1236	9.0	372

Visualizing data with ggplot2 from Python

The ggplot2 package provides flexible and robust functionality for creating graphics. We can plot our data in Python with a few tweaks to the usual ggplot2 syntax.

Visualize the relationship between days and reaction time from the sleep study:

gp = ggplot2.ggplot(sleepstudy)

p1 = (gp +
      ggplot2.aes_string(x = 'Days', y = 'Reaction') +
      ggplot2.geom_point(color = '#648FFF', alpha = 0.7) +
      ggplot2.geom_smooth(method = 'lm', color = 'black') +
      ggplot2.theme_minimal())

display(image_png(p1))

R[write to console]: `geom_smooth()` using formula 'y ~ x'

Scatterplot of sleep trial days vs reaction time with a positive correlation

Running models and viewing results with rpy2

You can leverage R’s statistical capabilities in Python. Call the lm() function from the stats package to write a linear model with rpy2:

lm1 = stats.lm('Reaction ~ Days', data = sleepstudy)
# print(base.summary(lm1))

Printing the fit object with print(base.summary(lm1)) displays both the code used to perform the fit and the results. For brevity, limit the output to the coefficients:

print(stats.coef(lm1))

(Intercept)        Days 
  251.40510    10.46729

Print the 95% confidence intervals as well:

print(stats.confint(lm1))

                 2.5 %    97.5 %
(Intercept) 238.360753 264.44946
Days          8.023855  12.91072

The results show that the more days of sleep deprivation, the slower the reaction time. However, you may realize that the linear model disguises the variation across the study’s subjects. To illustrate these differences, you can plot Reaction vs. Days for each subject separately with ggplot2’s facet_wrap():

p2 = (p1 +
      ggplot2.facet_wrap(robjects.Formula('. ~ Subject'), **{'ncol':6}))

display(image_png(p2))

R[write to console]: `geom_smooth()` using formula 'y ~ x'

Scatterplot of sleep trial days vs reaction time by the 18 subjects in the study showing varying correlations for each one

The plots show that the average reaction time increases approximately linearly with the number of sleep-deprived days. However, the slope and intercept vary by subject.

You can use the lme4 package to create a model that calculates the fixed effects of days of sleep deprivation on reaction time while considering starting reaction time and differences for each individual.

fm1 = lme4.lmer('Reaction ~ Days + (Days | Subject)', data = sleepstudy)
print(base.summary(fm1))

Here is the output from summary, abbreviated to the results:

REML criterion at convergence: 1743.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9536 -0.4634  0.0231  0.4634  5.1793 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.10   24.741       
          Days         35.07    5.922   0.07
 Residual             654.94   25.592       
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.825  36.838
Days          10.467      1.546   6.771

Correlation of Fixed Effects:
     (Intr)
Days -0.138

The estimates of the standard deviations of the random effects for the intercept and the slope are 24.74 ms and 5.92 ms/day. The fixed-effects coefficients are 251.4 ms and 10.47 ms/day for the intercept and slope. These results match the ones output by R.¹

Learning how to use rpy2

Working with rpy2 took investigative work. There were R functions that I assumed would be straightforward to translate to Python but they turned out to be more complicated.

For example, I wanted to confirm that each subject had different linear model results. The subset() function is part of base R, so I thought I could run:

lm1 = stats.lm('Reaction ~ Days', data =  base.subset(sleepstudy, Subject == 308))

Unfortunately, this is incorrect. I found the answer reading the Extracting items section of the documentation. After learning how rpy2 works with R data frames, I was able to find the actual way to subset the data:

lm308 = stats.lm('Reaction ~ Days', data = sleepstudy.rx(sleepstudy.rx('Subject').ro == 308, True))
print(stats.confint(lm308))

                 2.5 %    97.5 %
(Intercept) 179.433863 308.95148
Days          9.634267  33.89514

You will need knowledge of Python for more flexibility with the rpy2 interface. For instance, the ability to define the number of columns with **{'ncol':6} in facet_wrap() above was not in the rpy2 documentation. Unpacking values with keyword arguments, denoted by **{}, is part of the general Python syntax.

Learn more

There are many ways that Python users can leverage R packages and functions in their scripts. The rpy2 is a helpful tool for those who want to use the R community’s impactful work. As stated in the rpy2 documentation, “Reuse. Get things done. Don’t reimplement.”

Read the official documentation.
- The documentation includes help on using rpy2 in notebooks.
Check out the lme4 vignette.

Appendix

# Installation
pip install rpy2
conda install r-ggplot2

# Import rpy2 packages and subpackages
import rpy2
import rpy2.robjects as robjects

## To aid in printing HTML in notebooks
import rpy2.ipython.html
rpy2.ipython.html.init_printing()

## To see plots in an output cell
from rpy2.ipython.ggplot import image_png

# Installation of R packages
from rpy2.robjects.packages import importr, data

utils = importr('utils')
base = importr('base')

utils.chooseCRANmirror(ind=1)
utils.install_packages('lme4')
utils.install_packages('stats')

stats = importr('stats')
lme4 = importr('lme4')

import rpy2.robjects.lib.ggplot2 as ggplot2

# Working with data in rpy2
sleepstudy = data(lme4).fetch('sleepstudy')['sleepstudy']
sleepstudy

# Visualizing with ggplot2 in Python
gp = ggplot2.ggplot(sleepstudy)

p1 = (gp +
      ggplot2.aes_string(x = 'Days', y = 'Reaction') +
      ggplot2.geom_point(color = '#648FFF', alpha = 0.7) +
      ggplot2.geom_smooth(method = 'lm', color = 'black') +
      ggplot2.theme_minimal())

display(image_png(p1))

p2 = (p1 +
      ggplot2.facet_wrap(robjects.Formula('. ~ Subject'), **{'ncol':6}))

display(image_png(p2))

# Run models and view results
lm1 = stats.lm('Reaction ~ Days', data = sleepstudy)
print(stats.coef(lm1))
print(stats.confint(lm1))

fm1 = lme4.lmer('Reaction ~ Days + (Days | Subject)', data = sleepstudy)
print(base.summary(fm1))

Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01. ↩