Analysing the HIV pandemic, Part 3: Genetic diversity

This is part 3 of a four-part series about the HIV epidemic in Africa. In a recent publication in PLoS ONE, the authors described how they used affordable hardware to create a phylogenetic pipeline, tailored for the HIV drug-resistance testing facility. In this part, we discuss genetic diversity and how it can be analysed using Markov chains and heatmaps.

Virtual Morel Foraging with R

Enjoy a virtual mushroom hunt with R and RSelenium, which allows R to use a web browser as a human would, including clicking on buttons, etc.

Analysing the HIV pandemic, Part 2: Drug resistance testing

This is part 2 of a four-part series about the HIV epidemic in Africa. In a recent publication in PLoS ONE, the authors described how they used affordable hardware to create a phylogenetic pipeline, tailored for the HIV drug-resistance testing facility. Part 2 discusses drug-resistance testing of HIV isolates in sub-Saharan Africa.

Analysing the HIV pandemic, Part 1: HIV in sub-Sahara Africa

The Human Immunodeficiency Virus (HIV) is the virus that causes acquired immunodeficiency syndrome (AIDS). The virus invades various immune cells, causing loss of immunity, and thus increases susceptibility to infections, including tuberculosis, and to cancer. In a recent publication in PLoS ONE, the authors described how they used affordable hardware to create a phylogenetic pipeline, tailored for the HIV drug-resistance testing facility. In this first of a series of four posts, we highlight the serious problem of HIV infection in sub-Saharan Africa, with special analysis of the situation in South Africa. The subsequent posts will describe the phylogenetics pipeline (running on a Raspberry Pi), and the analysis of viral sequences using R.

March 2019: "Top 40" New CRAN Packages

By my count, two hundred and thirty-three packages stuck to CRAN last month. I have tried to capture something of the diversity of the offerings by selecting packages in ten categories: Computational Methods, Data, Machine Learning, Medicine, Science, Shiny, Statistics, Time Series, Utilities, and Visualization. The Shiny category contains packages that expand on Shiny capabilities, not just packages that implement a Shiny application. It is not clear whether this is going to be a new cottage industry or not.

A Few Old Books

Greg Wilson is a data scientist and professional educator at RStudio. My previous column looked at a few new books about R. In this one, I’d like to explore a few books about programming that people coming from data science backgrounds may not have stumbled upon. The first is Michael Nygard’s Release It!, which more than lives up to its subtitle, “Design and Deploy Production-Ready Software”. Most of us can write programs that work for us on our machines; this book explores what it takes to create software that will work reliably for other people, on machines you’ve never met, long after you’ve moved on to your next project.

Reproducible Environments

Great data science work should be reproducible. The ability to repeat experiments is part of the foundation for all science, and reproducible work is also critical for business applications. Team collaboration, project validation, and sustainable products presuppose the ability to reproduce work over time. In my opinion, mastering just a handful of important tools will make reproducible work in R much easier for data scientists.

Setting up RStudio Server on a Cloud for Collaboration and Reproducibility

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When setting up R and RStudio Server on a cloud Linux instance, some thought should be given to implementing a workflow that facilitates collaboration and ensures R project reproducibility. There are many possible workflows to accomplish this. In this post, we offer an “opinionated” solution based on what we have found to work in a production environment.

On Meeting Data Journalists

“I’d rather do data than date”. I overheard this while eavesdropping on a conversation among three female data journalists while waiting for an elevator at the IRE-CAR (Investigative Reporters and Editors - Computer-Assisted Reporting) conference last month. I would like to think the remark was overloaded with hyperbole, but maybe not. Most of the attendees at this conference were motivated, tenacious, and highly skilled data hounds, the kind of investigative journalists who pry information from government databases through persistent requests, legal leverage, and SQL expertise.

How to share R visualizations in Microsoft PowerPoint

Hadrien Dykiel is an RStudio Customer Success Engineer. Microsoft PowerPoint is often the de facto choice for creating presentation slides, especially at larger companies. In many organizations, it comes pre-installed on workstations and pretty much everybody knows how to use it. This can make it an effective medium for sharing information, since most folks are comfortable with it. Unfortunately, valuable time is often lost manually creating slides. R developers often find themselves copying and pasting their results into presentation decks.
