Interview with Oscar Baruffa, Creator of the Big Book of R

by Isabella Velásquez

Welcome to the new year! If you’re itching to improve your R skills in 2022, we have the resource for you.

We’re excited to share the Big Book of R. Billed as “your last-ever bookmark”, the Big Book of R is an impressive collection of R-related books on a variety of subjects. Creator Oscar Baruffa first published the book in August 2020. Since then, it has grown from a list of 80 books to over 200, garnering 73,000 unique visitors and 195,000 pageviews from readers around the globe.

The book is organized by subject, with sections ranging from introductory R programming to big data to archeology. Its organization and search functionality make it easy for newcomers to find books related to their topic of interest.

Bar chart of the books by section in the Big Book of R

The Big Book of R is a wonderful example of collaboration in the R Community. Oscar wrote the book using the bookdown package. Contributors can file an issue or create a pull request on GitHub. And of course, the book would not be possible without the authors who have written books to guide others on their R journey.

While you are preparing for 2022, we encourage you to cozy up to one of the great books in the Big Book of R.

Interview with Oscar

Hello! Could you tell us a bit about yourself, please?

I’m a South African now living in the Netherlands, working as a Data Specialist in an international development non-profit focused on sustainable trade systems. My role is basically that of a senior analytics manager and I’m the first one, so I get to have all the fun directing the development of our data pipeline! No really, it’s a lot of fun :).

I studied Mechanical Engineering for my undergraduate and master’s degrees and have been dabbling in tech-related side projects and hobbies for many years.

How did you get started with the R Community?

I think it was sometime in late 2018 when I was busy learning a bit of Python and Jesse Mostipak started popping up on my Twitter feed. She made R sound quite fun so I thought I’d give it a try. After I got to the exercise in R for Data Science by Hadley Wickham and Garrett Grolemund where you’re introduced to faceting a plot, my mind was blown and I was hooked.

I then started participating in the #TidyTuesday challenge and recorded some screencasts on my “Other People’s RStats” YouTube channel. I took a bit of a leap of faith submitting a lightning talk for satRday Johannesburg in April 2019, which was being organised by Andrew Collier and Megan Beckett. I was selected, which was a welcome surprise for my little topic. They were so helpful when I was trying to figure out the pull request flow for submitting my presentation. I also attended my first-ever R workshop the day before (which Andrew presented), which was the first time I’d ever sat in a room with other people who love R. It was awesome!

While doing all of this, I was also following more and more people on Twitter who were tweeting about R, collecting bookmarks of packages, tutorials, and, of course, books. I was having a lot of fun. By early 2020, I wrote my first book with Veerle van Son called Twitter for R Programmers as a way to introduce others to the R community on the social platform. So by that point, you could say I’d become heavily invested in the community :).

What inspired you to start the Big Book of R?

I had been diligently collecting bookmarks of books as I was finding them. After about 2 years of doing so, I had an inkling that I must have a large collection. One day, I counted about 80. I compared it to other lists of books that I’d seen published and I had way more, so I figured this might be quite unique to have so many.

Having written my own short book, I also appreciated how much effort it took and I felt there must be better usability for readers and discoverability for authors if books were all listed in one place and grouped by topic (I invented a library - haha!). I also hoped (still do) that it might encourage more writing too. I put all the books together, spent some time categorising them, and in August 2020 published it. It made quite a splash!

Tell us about the design choices you made to make the book inclusive and open to the community.

I opted to use bookdown and git as I already had a bit of experience using them for Twitter for R Programmers. It felt very “meta” and fitting to use bookdown, which had in turn been used for almost all of the books in the collection. I was hoping this format would allow others to submit books as well — and they’ve generously done so. I kept the collection as a plain text format in the hope that it would slightly lower the barrier to people submitting books, but in the end, many people just tag me in a tweet which is welcome :).

What has surprised you the most about this project?

I knew people would like this but I didn’t expect how much it would be appreciated. Every now and then, I get a message of thanks for creating and maintaining it that really makes me feel warm and fuzzy inside. I hope people also reach out to the authors and do the same. Their effort in writing these books is immense and a little bit of appreciation will make their day — guaranteed!

What also surprises me is how there are very consistent spikes in views and a steady growth in daily visitors whenever someone else shares the Big Book of R. If you’re interested in seeing the analytics, I’ve made them publicly accessible.

What’s in store for the Big Book of R in 2022?

I’m sure the collection will keep growing :).

I’ve just remodeled how the content is generated. Instead of capturing the book entries directly into markdown, it’s now generated by reading the data from a Google sheet. This new setup gives me the flexibility to do things like alphabetize the books more easily, add additional fields and tags more easily, set up Twitter/LinkedIn bots to automatically post about books in the collection, set up scripts to detect book updates, etc. Basically, I want to open up more possibilities for further automation and discoverability. If anyone is building anything using this data, I’d love to hear about it.

If anyone has ideas for how to improve the book, please get in touch with me or submit an issue in the repo.
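Editor's aside: for readers curious what "generating the content from a table" can look like, here is a minimal sketch in base R. It is not Oscar's actual code. The column names (`title`, `authors`, `section`), the section labels, and the "Example Book" row are all assumptions made for illustration, and an in-memory data frame stands in for the Google sheet.

```r
# Hypothetical stand-in for the sheet. In a real setup this table would be
# read from a spreadsheet; the columns and section labels are assumptions.
books <- data.frame(
  title   = c("R for Data Science", "Twitter for R Programmers", "Example Book"),
  authors = c("Hadley Wickham and Garrett Grolemund",
              "Oscar Baruffa and Veerle van Son",
              "A. N. Author"),
  section = c("Getting started", "Community", "Getting started"),
  stringsAsFactors = FALSE
)

# Alphabetize by title once, so every section lists its books in order.
books <- books[order(books$title), ]

# Render one section: a heading followed by one bullet per book.
section_to_markdown <- function(section_name, entries) {
  lines <- c(
    paste("##", section_name),
    "",
    sprintf("- **%s** by %s", entries$title, entries$authors),
    ""
  )
  paste(lines, collapse = "\n")
}

# split() groups the rows by section; vapply() renders each group.
sections <- split(books, books$section)
markdown <- vapply(
  names(sections),
  function(name) section_to_markdown(name, sections[[name]]),
  character(1)
)

cat(markdown, sep = "\n")
```

Because the entries live in a table rather than in hand-written markdown, reordering the books or adding a field becomes a change to the data or to one function, not an edit to every entry.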

Do you have any ongoing or upcoming projects you’d like the R Community to know about?

If things work out, there’s a chance I’ll be a technical reviewer on two R books being worked on in 2022, so that’ll be a new experience for me that I’m looking forward to. I’m going to keep writing useful articles about R, data and data careers over on my blog (and releasing fun R-related products here and there). The best way to be notified of those is to sign up to my newsletter. I’m also looking to write some more on the topic of Project Management to build upon my other bit of work that I’m really proud of, Project Management Fundamentals for Data Analysts.

Closing question: what is your favorite R package right now?

I think I’d give that accolade to {dplyr}! I’m pretty sure that filter() and group_by() are my most-used functions. Nothing gives me greater pleasure than a good anti_join().
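Editor's aside: if you have not met these functions yet, the short sketch below shows what each one does. The data is invented for illustration (placeholder titles and a hypothetical "already posted" list) and it assumes the dplyr package is installed.

```r
library(dplyr)

# Made-up data: a small book collection and the titles already shared.
collection <- data.frame(
  title   = c("Book A", "Book B", "Book C", "Book D"),
  section = c("Intro", "Intro", "Big data", "Big data"),
  stringsAsFactors = FALSE
)
already_posted <- data.frame(title = c("Book A", "Book C"),
                             stringsAsFactors = FALSE)

# filter(): keep only the rows that satisfy a condition.
intro_books <- filter(collection, section == "Intro")

# group_by() + summarise(): compute one summary row per group.
books_per_section <- collection %>%
  group_by(section) %>%
  summarise(n_books = n(), .groups = "drop")

# anti_join(): keep rows of the first table with NO match in the second,
# here the books that have not been posted yet.
not_yet_posted <- anti_join(collection, already_posted, by = "title")

print(intro_books)
print(books_per_section)
print(not_yet_posted)
```

An anti-join is handy whenever the question is "what is in this table but missing from that one", which is why it turns up so often in data cleaning.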

We at RStudio would like to thank Oscar for his contribution to RViews and the creation of a great resource for the R Community. Happy reading in 2022!
