In a recent post, I presented some of the theory underlying ROC curves, and outlined the history leading up to their present popularity for characterizing the performance of machine learning models. In this post, I describe how to search CRAN for packages to plot ROC curves, and highlight six useful packages.

Although I began with a few ideas about packages that I wanted to talk about, like ROCR and pROC, which I have found useful in the past, I decided to use Gábor Csárdi’s relatively new package pkgsearch to search through CRAN and see what’s out there. The `package_search()`

function takes a text string as input and uses basic text mining techniques to search all of CRAN. The algorithm searches through package text fields, and produces a score for each package it finds that is weighted by the number of reverse dependencies and downloads.

```
library(tidyverse) # for data manipulation
library(dlstats) # for package download stats
library(pkgsearch) # for searching packages
```

After some trial and error, I settled on the following query, which includes a number of interesting ROC-related packages.

`rocPkg <- pkg_search(query="ROC",size=200)`

Then, I narrowed down the field to 46 packages by filtering out orphaned packages and packages with a score less than 190.

```
rocPkgShort <- rocPkg %>%
filter(maintainer_name != "ORPHANED", score > 190) %>%
select(score, package, downloads_last_month) %>%
arrange(desc(downloads_last_month))
head(rocPkgShort)
```

```
## # A tibble: 6 x 3
## score package downloads_last_month
## <dbl> <chr> <int>
## 1 690. ROCR 56356
## 2 7938. pROC 39584
## 3 1328. PRROC 9058
## 4 833. sROC 4236
## 5 266. hmeasure 1946
## 6 1021. plotROC 1672
```

To complete the selection process, I did the hard work of browsing the documentation for the packages to pick out what I thought would be generally useful to most data scientists. The following plot uses Guangchuang Yu’s `dlstats`

package to look at the download history for the six packages I selected to profile.

```
library(dlstats)
shortList <- c("pROC","precrec","ROCit", "PRROC","ROCR","plotROC")
downloads <- cran_stats(shortList)
ggplot(downloads, aes(end, downloads, group=package, color=package)) +
geom_line() + geom_point(aes(shape=package)) +
scale_y_continuous(trans = 'log2')
```

### ROCR - 2005

ROCR has been around for almost 14 years, and has be a rock-solid workhorse for drawing ROC curves. I particularly like the way the `performance()`

function has you set up calculation of the curve by entering the true positive rate, `tpr`

, and false positive rate, `fpr`

, parameters. Not only is this reassuringly transparent, it shows the flexibility to calculate nearly every performance measure for a binary classifier by entering the appropriate parameter. For example, to produce a precision-recall curve, you would enter `prec`

and `rec`

. Although there is no vignette, the documentation of the package is very good.

The following code sets up and plots the default `ROCR`

ROC curve using a synthetic data set that comes with the package. I will use this same data set throughout this post.

`library(ROCR)`

`## Loading required package: gplots`

```
##
## Attaching package: 'gplots'
```

```
## The following object is masked from 'package:stats':
##
## lowess
```

```
# plot a ROC curve for a single prediction run
# and color the curve according to cutoff.
data(ROCR.simple)
df <- data.frame(ROCR.simple)
pred <- prediction(df$predictions, df$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf,colorize=TRUE)
```

### pROC - 2010

It is clear from the downloads curve that `pROC`

is also popular with data scientists. I like that it is pretty easy to get confidence intervals for the Area Under the Curve, `AUC`

, on the plot.

`library(pROC)`

`## Type 'citation("pROC")' for a citation.`

```
##
## Attaching package: 'pROC'
```

```
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
```

```
pROC_obj <- roc(df$labels,df$predictions,
smoothed = TRUE,
# arguments for ci
ci=TRUE, ci.alpha=0.9, stratified=FALSE,
# arguments for plot
plot=TRUE, auc.polygon=TRUE, max.auc.polygon=TRUE, grid=TRUE,
print.auc=TRUE, show.thres=TRUE)
sens.ci <- ci.se(pROC_obj)
plot(sens.ci, type="shape", col="lightblue")
```

```
## Warning in plot.ci.se(sens.ci, type = "shape", col = "lightblue"): Low
## definition shape.
```

`plot(sens.ci, type="bars")`

### PRROC - 2014

Although not nearly as popular as `ROCR`

and `pROC`

, `PRROC`

seems to be making a bit of a comeback lately. The terminology for the inputs is a bit eclectic, but once you figure that out the `roc.curve()`

function plots a clean ROC curve with minimal fuss. `PRROC`

is really set up to do precision-recall curves as the vignette indicates.

```
library(PRROC)
PRROC_obj <- roc.curve(scores.class0 = df$predictions, weights.class0=df$labels,
curve=TRUE)
plot(PRROC_obj)
```

### plotROC - 2014

`plotROC`

is an excellent choice for drawing ROC curves with `ggplot()`

. My guess is that it appears to enjoy only limited popularity because the documentation uses medical terminology like “disease status” and “markers”. Nevertheless, the documentation, which includes both a vignette and a Shiny application, is very good.

The package offers a number of feature-rich `ggplot()`

geoms that enable the production of elaborate plots. The following plot contains some styling, and includes Clopper and Pearson (1934) exact method confidence intervals.

```
library(plotROC)
rocplot <- ggplot(df, aes(m = predictions, d = labels))+ geom_roc(n.cuts=20,labels=FALSE)
rocplot + style_roc(theme = theme_grey) + geom_rocci(fill="pink")
```

### precrec - 2015

`precrec`

is another library for plotting ROC and precision-recall curves.

`library(precrec)`

```
##
## Attaching package: 'precrec'
```

```
## The following object is masked from 'package:pROC':
##
## auc
```

```
precrec_obj <- evalmod(scores = df$predictions, labels = df$labels)
autoplot(precrec_obj)
```

Parameter options for the `evalmod()`

function make it easy to produce basic plots of various model features.

```
precrec_obj2 <- evalmod(scores = df$predictions, labels = df$labels, mode="basic")
autoplot(precrec_obj2)
```

### ROCit - 2019

`ROCit`

is a new package for plotting ROC curves and other binary classification visualizations that rocketed onto the scene in January, and is climbing quickly in popularity. I would never have discovered it if I had automatically filtered my original search by downloads. The default plot includes the location of the Yourden’s J Statistic.

`library(ROCit)`

`## Warning: package 'ROCit' was built under R version 3.5.2`

```
ROCit_obj <- rocit(score=df$predictions,class=df$labels)
plot(ROCit_obj)
```

Several other visualizations are possible. The following plot shows the cumulative densities of the positive and negative responses. The KS statistic shows the maximum distance between the two curves.

`ksplot(ROCit_obj)`

In this attempt to dig into CRAN and uncover some of the resources R contains for plotting ROC curves and other binary classifier visualizations, I have only scratched the surface. Moreover, I have deliberately ignored the many packages available for specialized applications, such as survivalROC for computing time-dependent ROC curves from censored survival data, and cvAUC, which contains functions for evaluating cross-validated AUC measures. Nevertheless, I hope that this little exercise will help you find what you are looking for.

You may leave a comment below or discuss the post in the forum community.rstudio.com.