<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>R/Medicine on R Views</title>
    <link>https://rviews.rstudio.com/tags/r/medicine/</link>
    <description>Recent content in R/Medicine on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 09 Sep 2021 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/tags/r/medicine/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>A Guide to Binge Watching R / Medicine 2021</title>
      <link>https://rviews.rstudio.com/2021/09/09/a-guide-to-binge-watching-r-medicine/</link>
      <pubDate>Thu, 09 Sep 2021 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2021/09/09/a-guide-to-binge-watching-r-medicine/</guid>
      <description>
        

&lt;p&gt;&lt;a href=&#34;https://r-medicine.org/&#34;&gt;R / Medicine&lt;/a&gt; is a big deal. This year, the conference grew by 13%, with 665 people from over 60 countries signing up for the virtual event, which was held last month. 34% of the registrants were from outside of the United States and 17% identified as physicians.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;rmed.png&#34; height = &#34;300&#34; width=&#34;500&#34; alt=&#34;Global map with locations of R Medicine registrants indicated&#34;&gt;&lt;/p&gt;

&lt;p&gt;The conference is now an established international event where experts report on the advanced use of the R language, Machine Learning, and statistical analysis, and discuss the successes and challenges associated with bringing these technologies to day-to-day medical practice.&lt;/p&gt;

&lt;p&gt;Almost all of the talks, including keynotes, regular talks, lightning talks, pre-conference workshops and poster sessions, are available online. &lt;a href=&#34;https://r-medicine.org/schedule/&#34;&gt;Find the links&lt;/a&gt; on the R / Medicine site or look through the &lt;a href=&#34;https://www.youtube.com/playlist?list=PL4IzsxWztPdmHxCpS_c2l_jbMfrywWciZ&#34;&gt;playlist&lt;/a&gt; on the &lt;a href=&#34;https://www.r-consortium.org/&#34;&gt;R Consortium YouTube&lt;/a&gt; channel. Note that the posters can be viewed by going to the &lt;a href=&#34;https://spatial.chat/s/R-Medicine2021?room=231308&#34;&gt;conference spatial.chat site&lt;/a&gt;. (If you and a friend visit at the same time, you should be able to &amp;ldquo;walk around&amp;rdquo; the posters and chat about what you see.)&lt;/p&gt;

&lt;p&gt;To kick off an evening of binge watching the conference, I would begin with the keynotes.&lt;/p&gt;

&lt;h3 id=&#34;the-keynotes&#34;&gt;The Keynotes&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://medicine.umich.edu/dept/lhs/karandeep-singh-md-mmsc&#34;&gt;Dr. Karandeep Singh&lt;/a&gt; sets the hook for his talk, &lt;a href=&#34;https://www.youtube.com/watch?v=l71wLKUr26E&amp;amp;list=PL4IzsxWztPdmHxCpS_c2l_jbMfrywWciZ&amp;amp;index=7&#34;&gt;Bringing Machine Learning Models to the Bedside at Scale&lt;/a&gt;, two minutes into the video when he asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Who are the twenty sickest patients in the hospital right now who are not in the ICU?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This straightforward question immediately gets to the promise and the problems of introducing large scale machine learning algorithms into the hospital, and indicates how medical practice interacts with big money questions about allocating resources. Both physicians and administrators would like to identify high risk patients and treat them proactively while being able to confidently spend less on unnecessary tests for low risk patients. At about 5:10 into the talk, Karandeep begins discussing the challenges associated with introducing machine learning models.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;chal.png&#34; height = &#34;300&#34; width=&#34;500&#34; alt=&#34;Slide with list of challenges discussed. Is there infrastructure to support models? Should we implement a model? Once implemented, how do we measure model performance? Is a model “good enough” to use? Do users agree on how to use the model? Is the model effective when used? What does governance look like for machine learning models?&#34;&gt;&lt;/p&gt;

&lt;p&gt;In the remainder of the talk he describes the technical infrastructure and then the governance or &amp;ldquo;social infrastructure&amp;rdquo; needed for success.&lt;/p&gt;

&lt;p&gt;If you enjoy a good detective story, and take pride in your ability to interpret a well-done statistical plot, you are certainly going to want to watch &lt;a href=&#34;http://ziadobermeyer.com/&#34;&gt;Ziad Obermeyer&amp;rsquo;s&lt;/a&gt; keynote &lt;a href=&#34;https://www.youtube.com/watch?v=JfKYO1W4uuA&amp;amp;list=PL4IzsxWztPdmHxCpS_c2l_jbMfrywWciZ&amp;amp;index=27&#34;&gt;Dissecting Algorithmic Bias&lt;/a&gt;. About two minutes into the video, Professor Obermeyer sets the stage with the warning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The single greatest threat to all of the gains that we can make in using algorithms in medicine is letting them go wrong in increasingly well known ways.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and the observation that, due to the focus of US health care management on &amp;ldquo;high risk care management&amp;rdquo;, an estimated 150 to 200 million Americans are sorted by algorithms every year. He goes on to work through a case study that illustrates how an algorithm built with good intentions had the effect of scaling up racial bias.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;bias.png&#34; height = &#34;300&#34; width=&#34;600&#34; alt=&#34;Dot plot with regression line of algorithm risk score versus realized cost to show the racial bias in high risk care management&#34;&gt;&lt;/p&gt;

&lt;p&gt;A second case study features an algorithm that &amp;ldquo;fights against&amp;rdquo; racial bias. Along the way, Ziad weaves two common themes into his presentation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;So many of the ways that algorithms can go wrong come from training algorithms with the wrong target variables, often &amp;ldquo;convenient and tempting proxies&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;The necessity of follow-up work to fix underlying problems.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the remainder of this post, I have organized the talks into six categories that you may find helpful for setting your viewing program: Clinical Practice, Clinical Trials, Medical Data, R in Production, R Tools, and Short Courses. The majority of the talks have a machine learning angle. There is quite a bit of Shiny, and several R packages, not all of them on CRAN, are featured. I have provided links when I could find them. I don&amp;rsquo;t want to spoil anyone&amp;rsquo;s fun in searching through the videos for &amp;ldquo;Easter Eggs&amp;rdquo;, but the &lt;em&gt;Reproducible Research with R&lt;/em&gt; short course contains the first preview of the &lt;a href=&#34;https://quarto.org/&#34;&gt;Quarto&lt;/a&gt; publishing system in a talk from anyone at RStudio. (Note that the video needs some editing. Start watching at 9 minutes.)&lt;/p&gt;

&lt;h3 id=&#34;clinical-practice&#34;&gt;Clinical Practice&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Building an Interpretable ML Model API for Interpretation of CNVs in Patients with Rare Diseases - Francisco Requena&lt;/li&gt;
&lt;li&gt;Subgroup Identification and Precision Medicine with the personalized R Package - Jared Huling&lt;/li&gt;
&lt;li&gt;R and Shiny Dashboards to Facilitate Quality Improvement in Anesthesiology and Perioperative Care - Robert Lobato&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tidytof&lt;/code&gt;: Predicting Patient Outcomes from Single-cell Data using Tidy Data Principles - Timothy Keyes&lt;/li&gt;
&lt;li&gt;Assessing ML Model Performance in Diverse Populations and Across Time - Victor Castro, Roy Perlis&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;clinical-trials&#34;&gt;Clinical Trials&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Designing Early Phase Clinical Trials with &lt;a href=&#34;https://github.com/zabore/ppseq&#34;&gt;&lt;code&gt;ppseq&lt;/code&gt;&lt;/a&gt; -   Emily Zabor&lt;/li&gt;
&lt;li&gt;Collaborative, Reproducible Exploration of Clinical Trial Data -  Michael Kane&lt;/li&gt;
&lt;li&gt;Graphical Displays in R for Clinical Trials - Steven Schwager&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.gitmemory.com/presagia-analytics/ctrialsgov&#34;&gt;&lt;code&gt;ctrialsgov&lt;/code&gt;&lt;/a&gt;: Access, Visualization, and Discovery of the ClinicalTrials.gov Database - Taylor Arnold&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;medical-data&#34;&gt;Medical Data&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scaling Up and Deploying Shiny and Text Mining for National Health Decisions - Andreas Soteriades, Chris Beeley&lt;/li&gt;
&lt;li&gt;Mapping African Health Data with &lt;a href=&#34;https://afrimapr.github.io/afrimapr.website/&#34;&gt;&lt;code&gt;afrimapr&lt;/code&gt;&lt;/a&gt; Package, Training &amp;amp; Community -   Andy South&lt;/li&gt;
&lt;li&gt;You R What You Measure: Digital Biomarkers for Insights in Personalized Health - Irene van den Broek&lt;/li&gt;
&lt;li&gt;Shiny and REDCap for a Global Research Consortium - Judith Lewis, Stephany Duda&lt;/li&gt;
&lt;li&gt;Diving into Registry Data: Using R for Large Norwegian Health Registries -    Julia Romanowska&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/web/packages/ReviewR/index.html&#34;&gt;&lt;code&gt;ReviewR&lt;/code&gt;&lt;/a&gt;: A Shiny App for Reviewing Clinical Records   - Laura Wiley,  David Mayer&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=DOPE&#34;&gt;&lt;code&gt;DOPE&lt;/code&gt;&lt;/a&gt;: An R package for Processing and Classifying Drug Names -   Layla Bouzoubaa&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cran.r-project.org/package=medicaldata&#34;&gt;&lt;code&gt;medicaldata&lt;/code&gt;&lt;/a&gt; for Teaching #Rstats -    Peter Higgins&lt;/li&gt;
&lt;li&gt;Stem Cell Transplant Outcomes Reporting using R/Shiny -   Richard Hanna,  Stephan Kadauke&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;r-in-production&#34;&gt;R in Production&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Second Server to the Right and Straight On &amp;lsquo;til Production: Deploying a GxP Shiny Application -   Marcus Adams&lt;/li&gt;
&lt;li&gt;Target Markdown and &lt;a href=&#34;https://docs.ropensci.org/stantargets/&#34;&gt;&lt;code&gt;stantargets&lt;/code&gt;&lt;/a&gt; for Bayesian model validation pipelines - Will Landau&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.themillerlab.io/publication/genetex/&#34;&gt;&lt;code&gt;GENETEX&lt;/code&gt;&lt;/a&gt;: A Genomics Report Text Mining R Package to Capture Real-world Clinico-genomic Data - David Miller, Sophia Shalhout&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;r-tools&#34;&gt;R Tools&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Generalized Additive Models for Longitudinal Biomedical Data - Ariel Mundo&lt;/li&gt;
&lt;li&gt;Multistate Data Using the &lt;a href=&#34;https://cran.r-project.org/package=survival&#34;&gt;&lt;code&gt;survival&lt;/code&gt;&lt;/a&gt; Package - Beth Atkinson&lt;/li&gt;
&lt;li&gt;Bayesian Random-Effects Meta-analysis using &lt;a href=&#34;https://cran.r-project.org/package=bayesmeta&#34;&gt;&lt;code&gt;bayesmeta&lt;/code&gt;&lt;/a&gt; - Christian Röver&lt;/li&gt;
&lt;li&gt;An &lt;a href=&#34;https://cran.r-project.org/package=arsenal&#34;&gt;&lt;code&gt;arsenal&lt;/code&gt;&lt;/a&gt; of R Functions for Statistical Summaries - Ethan Heinzen,  Beth Atkinson,  Jason Sinnwell&lt;/li&gt;
&lt;li&gt;R Markdown and &lt;a href=&#34;https://cran.r-project.org/package=officedown&#34;&gt;&lt;code&gt;officedown&lt;/code&gt;&lt;/a&gt; to Automate Clinical Trial Reporting -   Damian Rodziewicz&lt;/li&gt;
&lt;li&gt;Creating and Styling PPTX Slides with &lt;a href=&#34;https://cran.r-project.org/package=rmarkdown&#34;&gt;&lt;code&gt;rmarkdown&lt;/code&gt;&lt;/a&gt; -   Emil Hvitfeldt&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/ML4LHS/runway&#34;&gt;&lt;code&gt;runway&lt;/code&gt;&lt;/a&gt;: an R Package to Visualize Prediction Model Performance -    Jie Cao,    Karandeep Singh&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=clinspacy&#34;&gt;&lt;code&gt;clinspacy&lt;/code&gt;&lt;/a&gt;: An R package for Clinical Natural Language Processing -  Jie Cao,    Karandeep Singh&lt;/li&gt;
&lt;li&gt;Data Visualization for Machine Learning Practitioners - Julia Silge&lt;/li&gt;
&lt;li&gt;Animated Data Visualizations with &lt;a href=&#34;https://CRAN.R-project.org/package=gganimate&#34;&gt;&lt;code&gt;gganimate&lt;/code&gt;&lt;/a&gt; for Science Communication during the Pandemic - Kristen Panthagani&lt;/li&gt;
&lt;li&gt;Incorporating Risk-of-Bias Assessments into Evidence Syntheses with &lt;a href=&#34;https://cran.r-project.org/package=robvis&#34;&gt;&lt;code&gt;robvis&lt;/code&gt;&lt;/a&gt; -   Luke McGuinness,    Randall Boyes,  Alex Fowler&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gpmodels&lt;/code&gt;: A Grammar of Prediction Models - Sean Meyer, Karandeep Singh&lt;/li&gt;
&lt;li&gt;CONSORT Diagrams in R with &lt;a href=&#34;https://github.com/tgerke/ggconsort&#34;&gt;&lt;code&gt;ggconsort&lt;/code&gt;&lt;/a&gt; - Travis Gerke&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&#34;short-courses&#34;&gt;Short Courses&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Secure Medical Data Collection: Best Practices with Excel, and Leveling Up to REDCap and &lt;a href=&#34;https://github.com/kamclean/collaborator&#34;&gt;&lt;code&gt;CollaboratoR&lt;/code&gt;&lt;/a&gt; - Peter Higgins, Will Beasley, Kenneth McLean, Amanda Miller&lt;/li&gt;
&lt;li&gt;Introduction to R for Medical Data -  Ted Laderas, Daniel Chen,   Mara Alexeev&lt;/li&gt;
&lt;li&gt;An Introductory R Guide for Targeted Maximum Likelihood Estimation in Medical Research - Ehsan Karim, Hanna Frank&lt;/li&gt;
&lt;li&gt;Mapping Spatial Health Data   - Marynia Kolak,    Susan Paykin&lt;/li&gt;
&lt;li&gt;From SAS to R - Joe Korszun&lt;/li&gt;
&lt;li&gt;Reproducible Research with R - Alison Hill, Stephan Kadauke, Paul Villanueva&lt;/li&gt;
&lt;/ul&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2021/09/09/a-guide-to-binge-watching-r-medicine/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Analysing the HIV pandemic, Part 4: Classification of lab samples</title>
      <link>https://rviews.rstudio.com/2019/05/23/pipeline-for-analysing-hiv-part-4/</link>
      <pubDate>Thu, 23 May 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/05/23/pipeline-for-analysing-hiv-part-4/</guid>
      <description>
        


&lt;p&gt;&lt;em&gt;Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this post we complete our series on analysing the HIV pandemic in Africa. Previously we covered the bigger picture of &lt;a href=&#34;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#34;&gt;HIV infection in Africa&lt;/a&gt;, and a &lt;a href=&#34;https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/&#34;&gt;pipeline for drug resistance testing&lt;/a&gt; of samples in the lab.&lt;/p&gt;
&lt;p&gt;Then, in &lt;a href=&#34;https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/&#34;&gt;part 3&lt;/a&gt; we saw that sometimes the same patient’s genotype must be repeatedly analysed in the lab, from samples taken years apart.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Let’s say we have genotyped a patient five years ago and we have a current genotype sequence. It should be possible to retrieve the previous sequence from a database of sequences without relying on identifiers only or at all. Sometimes when someone remarries they may change their surname or transcription errors can be made, which makes finding previous samples tedious and error-prone. So instead of using patient information to look for previous samples to include, we can rather use the sequence data itself and then confirm the sequences belong to the same patient or investigate any irregularities. If we suspect mother-to-child transmission from our analysis, we confirm this with the healthcare worker who sent the sample.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this final part, we discuss how the inter- and intra-patient HIV genetic distances were analyzed using logistic regression to gain insights into the probability distribution of these two classes. In other words, the goal is to find a way to tell whether two genetic samples are from the same person or from two different people.&lt;/p&gt;
&lt;p&gt;Samples from the same person can have slightly different genetic sequences, due to mutations and other errors, so an exact match cannot be expected. Telling the two cases apart is especially useful when comparing samples of genetic material from retroviruses.&lt;/p&gt;
&lt;div id=&#34;preliminary-analysis&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Preliminary analysis&lt;/h2&gt;
&lt;p&gt;To help answer this question, we downloaded data from the &lt;a href=&#34;https://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html&#34;&gt;Los Alamos HIV sequence database&lt;/a&gt; (specifically, &lt;em&gt;Virus HIV-1, subtype C, genetic region POL CDS&lt;/em&gt;).&lt;/p&gt;
&lt;p&gt;Each observation is the genetic distance (a measure of dissimilarity) between a pair of samples.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(readr)
library(dplyr)
library(ggplot2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: package &amp;#39;ggplot2&amp;#39; was built under R version 3.5.2&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pt_distance &amp;lt;- 
  read_csv(&amp;quot;dist_sample_10.csv.zip&amp;quot;, col_types = &amp;quot;ccdccf&amp;quot;)

head(pt_distance)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 6
##   sample1                sample2                 distance sub   area  type 
##   &amp;lt;chr&amp;gt;                  &amp;lt;chr&amp;gt;                      &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;fct&amp;gt;
## 1 KI_797.67744.AB874124… KI_481.67593.AB873933.…   0.0644 B     INT   Inter
## 2 502-2794.39696.JF3202… WC3.27170.EF175209.B.U…   0.0418 B     INT   Inter
## 3 KI_882.67653.AB874186… KI_813.67589.AB874131.…   0.0347 B     INT   Inter
## 4 HTM360.13332.DQ322231… C11-2069070.63977.AB87…   0.0487 B     INT   Inter
## 5 O5598.34737.GQ372062.… LM49.4011.AF086817.B.T…   0.0360 B     INT   Inter
## 6 GKN.45901.HQ026515.B.… C11-2069083.65198.AB87…   0.0699 B     INT   Inter&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, plot a histogram of the distance between samples. This clearly shows that the distance between samples of the same subject (intra-patient) is smaller than the distance between different subjects (inter-patient). This is not surprising.&lt;/p&gt;
&lt;p&gt;However, the histogram also shows that there is no sharp demarcation between these types. Simply eye-balling the data seems to indicate that one could use an arbitrary threshold of around 0.025 to indicate whether the samples are from the same person or from different people.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pt_distance %&amp;gt;% 
  mutate(
    type = forcats::fct_rev(type)
  ) %&amp;gt;% 
  ggplot(aes(x = distance, fill = type)) +
  geom_histogram(binwidth = 0.001) +
  facet_grid(rows = vars(type), scales = &amp;quot;free_y&amp;quot;) +
  scale_fill_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +
  coord_cartesian(xlim = c(0, 0.1)) +
  ggtitle(&amp;quot;Histogram of phylogenetic distance by type&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-21_analysing-hiv-pandemic-part-4/2019-05-21-analysing-hiv-pandemic-part-4_files/figure-html/histogram-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;modeling&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Modeling&lt;/h2&gt;
&lt;p&gt;Since we have &lt;strong&gt;two&lt;/strong&gt; sample types (intra-patient vs inter-patient), this is a binary classification problem.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Logistic_regression&#34;&gt;Logistic regression&lt;/a&gt; is a simple algorithm for binary classification, and a special case of a &lt;a href=&#34;https://en.wikipedia.org/wiki/Generalized_linear_model&#34;&gt;generalized linear model&lt;/a&gt; (&lt;strong&gt;GLM&lt;/strong&gt;). In &lt;strong&gt;R&lt;/strong&gt;, you can use the &lt;code&gt;glm()&lt;/code&gt; function to fit a GLM, and to specify a logistic regression, use the &lt;code&gt;family = binomial&lt;/code&gt; argument.&lt;/p&gt;
&lt;p&gt;In this case we want to train a model with &lt;code&gt;distance&lt;/code&gt; as the independent variable and &lt;code&gt;type&lt;/code&gt; as the dependent variable, i.e. &lt;code&gt;type ~ distance&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We train on 100,000 (&lt;code&gt;n = 1e5&lt;/code&gt;) observations purely to reduce computation time:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pt_sample &amp;lt;- 
  pt_distance %&amp;gt;% 
  sample_n(1e5)
model &amp;lt;- glm(type ~ distance, data = pt_sample, family = binomial)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Note that sometimes the model throws a warning indicating numerical problems. This happens because the overlap between intra and inter is very small. If there is a very sharp dividing line between classes, the logistic regression algorithm has trouble converging.)&lt;/p&gt;
&lt;p&gt;However, in this case the numerical problems don’t actually cause a practical problem with the model itself.&lt;/p&gt;
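&lt;p&gt;The warning is easy to reproduce on toy data. The sketch below is not part of the original analysis: it assumes a small synthetic data set (the hypothetical &lt;code&gt;toy&lt;/code&gt; data frame) in which the two classes are perfectly separated, and it collects the warnings that &lt;code&gt;glm()&lt;/code&gt; raises while fitting.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical toy data: every small distance is class 1, every large one class 0
toy &amp;lt;- data.frame(
  distance = c(0.001, 0.002, 0.003, 0.004, 0.005,
               0.030, 0.035, 0.040, 0.045, 0.050),
  intra    = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
)

# Collect warning messages instead of printing them
msgs &amp;lt;- character(0)
toy_model &amp;lt;- withCallingHandlers(
  glm(intra ~ distance, data = toy, family = binomial),
  warning = function(w) {
    msgs &amp;lt;&amp;lt;- c(msgs, conditionMessage(w))
    invokeRestart(&amp;quot;muffleWarning&amp;quot;)
  }
)

# Inspect the collected warnings
msgs&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With perfectly separated classes the likelihood keeps improving as the slope grows, so the fit never settles on finite coefficients. When the classes overlap, even slightly, the estimates stay finite, although the warning can still appear for observations far from the boundary.&lt;/p&gt;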
&lt;p&gt;The model summary tells us that the &lt;code&gt;distance&lt;/code&gt; variable is highly significant (indicated by the ***):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary(model)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Call:
## glm(formula = type ~ distance, family = binomial, data = pt_sample)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.4035  -0.0050  -0.0010  -0.0002   8.4904  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(&amp;gt;|z|)    
## (Intercept)    5.7887     0.1796   32.23   &amp;lt;2e-16 ***
## distance    -355.1454     9.3247  -38.09   &amp;lt;2e-16 ***
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 23659.2  on 99999  degrees of freedom
## Residual deviance:  1440.5  on 99998  degrees of freedom
## AIC: 1444.5
## 
## Number of Fisher Scoring iterations: 12&lt;/code&gt;&lt;/pre&gt;
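&lt;p&gt;The negative sign on &lt;code&gt;distance&lt;/code&gt; is easier to read once you know which class the model treats as “success”. For a factor response, &lt;code&gt;glm()&lt;/code&gt; with &lt;code&gt;family = binomial&lt;/code&gt; takes the first factor level as failure and models the probability of the other level. Assuming &lt;code&gt;Inter&lt;/code&gt; is the first level of &lt;code&gt;type&lt;/code&gt; (as the row order in the &lt;code&gt;head()&lt;/code&gt; output suggests), the model estimates the probability of &lt;code&gt;Intra&lt;/code&gt;. A small sketch on made-up data (the &lt;code&gt;toy&lt;/code&gt; data frame and its values are assumptions, not the Los Alamos data) shows the effect:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(1)

# Hypothetical data: intra-patient pairs tend to have smaller distances
toy &amp;lt;- data.frame(
  distance = c(abs(rnorm(200, 0.010, 0.005)), abs(rnorm(200, 0.030, 0.008))),
  type     = factor(rep(c(&amp;quot;Intra&amp;quot;, &amp;quot;Inter&amp;quot;), each = 200),
                    levels = c(&amp;quot;Inter&amp;quot;, &amp;quot;Intra&amp;quot;))
)

toy_model &amp;lt;- glm(type ~ distance, data = toy, family = binomial)

# The first level is the reference class, treated as failure
reference_level &amp;lt;- levels(toy$type)[1]
reference_level

# The slope describes how the log-odds of the other level change with distance
distance_slope &amp;lt;- coef(toy_model)[[&amp;quot;distance&amp;quot;]]
distance_slope&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here the slope is negative because the probability of &lt;code&gt;Intra&lt;/code&gt; falls as the distance grows. Reordering the levels, for example with &lt;code&gt;relevel()&lt;/code&gt;, would flip the sign without changing the fit.&lt;/p&gt;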
&lt;p&gt;Now we can use the model to compute a prediction for a range of genetic distances (from 0 to 0.05) and create a plot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;newdata &amp;lt;-  data.frame(distance = seq(0, 0.05, by = 0.001))
pred &amp;lt;- predict(model, newdata, type = &amp;quot;response&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot_inter &amp;lt;- 
  pt_sample %&amp;gt;% 
  filter(distance &amp;lt;= 0.05, type == &amp;quot;Inter&amp;quot;) %&amp;gt;% 
  sample_n(2000)
  
plot_intra &amp;lt;- 
  pt_sample %&amp;gt;% 
  filter(distance &amp;lt;= 0.05, type == &amp;quot;Intra&amp;quot;) %&amp;gt;% 
  sample_n(2000)

threshold &amp;lt;-  with(newdata, approx(pred, distance, xout = 0.5))$y

ggplot() +
  geom_point(data = plot_inter, aes(x = distance, y = 0), alpha = 0.05, col = &amp;quot;blue&amp;quot;) +
  geom_point(data = plot_intra, aes(x = distance, y = 1), alpha = 0.05, col = &amp;quot;red&amp;quot;) +
  geom_rug(data = plot_inter, aes(x = distance, y = 0), col = &amp;quot;blue&amp;quot;) +
  geom_rug(data = plot_intra, aes(x = distance, y = 0), col = &amp;quot;red&amp;quot;) +
  geom_line(data = newdata, aes(x = distance, y = pred)) +
  annotate(x = 0.005, y = 0.9, label = &amp;quot;Type == intra&amp;quot;, geom = &amp;quot;text&amp;quot;, col = &amp;quot;red&amp;quot;) +
  annotate(x = 0.04, y = 0.1, label = &amp;quot;Type == inter&amp;quot;, geom = &amp;quot;text&amp;quot;, col = &amp;quot;blue&amp;quot;) +
  geom_vline(xintercept = threshold, col = &amp;quot;grey50&amp;quot;) +
  ggtitle(&amp;quot;Model results&amp;quot;, subtitle = &amp;quot;Predicted probability that Type == &amp;#39;Intra&amp;#39;&amp;quot;) +
  xlab(&amp;quot;Phylogenetic distance&amp;quot;) +
  ylab(&amp;quot;Probability&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-21_analysing-hiv-pandemic-part-4/2019-05-21-analysing-hiv-pandemic-part-4_files/figure-html/predictionplot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Logistic regression essentially fits an s-curve that indicates the probability. In this case, for small distances (lower than ~0.01) the probability of being the same person (i.e., type is intra) is almost 100%. For distances greater than 0.03 the probability of being type intra is almost zero (i.e., the model predicts type inter).&lt;/p&gt;
&lt;p&gt;The model puts the distance threshold at approximately 0.016.&lt;/p&gt;
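&lt;p&gt;The threshold can also be read directly off the coefficients. A logistic model predicts a probability of 0.5 where the linear predictor is zero, that is, at a distance of minus the intercept divided by the slope. The sketch below plugs in the rounded estimates printed by &lt;code&gt;summary(model)&lt;/code&gt; above; the variable names are illustrative.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Estimates copied (rounded) from the summary(model) output
intercept &amp;lt;- 5.7887
slope     &amp;lt;- -355.1454

# The predicted probability is 0.5 where intercept + slope * distance = 0
threshold_closed_form &amp;lt;- -intercept / slope
round(threshold_closed_form, 4)

# Check: the inverse logit of the linear predictor at the threshold is 0.5
prob_at_threshold &amp;lt;- plogis(intercept + slope * threshold_closed_form)
prob_at_threshold&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This gives roughly 0.0163, in line with the value of approximately 0.016 obtained by interpolating the predictions with &lt;code&gt;approx()&lt;/code&gt;.&lt;/p&gt;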
&lt;/div&gt;
&lt;div id=&#34;the-practical-value-of-this-work&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The practical value of this work&lt;/h2&gt;
&lt;p&gt;In part 2, we discussed how &lt;a href=&#34;https://journals.plos.org/plosone/article/authors?id=10.1371/journal.pone.0213241&#34;&gt;researchers&lt;/a&gt; developed an automated pipeline of phylogenetic analysis. The project was designed to run on the Raspberry Pi, a very low-cost computing device. This means that the cost of implementing the project is low, and the project has been implemented at the &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt; in South Africa.&lt;/p&gt;
&lt;p&gt;In this part, we described the very simple logistic regression model that runs as part of the pipeline. In addition to the descriptive analysis, e.g., heat maps and trees (as described in part 3), this logistic regression predicts whether two samples were obtained from the same person or from two different people. This prediction helps laboratory staff identify potential contamination of samples, or indeed match samples from people who weren’t matched properly by their name and other identifying information (e.g., through spelling mistakes or name changes).&lt;/p&gt;
&lt;p&gt;Finally, it’s interesting to note that traditionally the decision whether two samples were intra-patient or inter-patient was based on heuristics, instead of modelling. For example, a heuristic might say that if the genetic distance between two samples is less than 0.01, they should be considered a match from a single person.&lt;/p&gt;
&lt;p&gt;Heuristics are easy to implement in the lab, but the origin of a heuristic can get lost over time. This means that the heuristic may no longer be applicable to the sample population.&lt;/p&gt;
&lt;p&gt;This modelling gave the researchers a tool to establish confidence intervals around predictions. In addition, it is now possible to repeat the model for many different local sample populations of interest, and thus have a tool that is better able to discriminate given the most recent data.&lt;/p&gt;
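&lt;p&gt;One common way to attach an interval to a predicted probability is to ask &lt;code&gt;predict()&lt;/code&gt; for standard errors on the link scale, build a normal-approximation interval there, and transform the limits with the inverse logit. The post does not show how the researchers computed their intervals, so the sketch below is only one plausible approach, run on simulated data (&lt;code&gt;toy&lt;/code&gt;, &lt;code&gt;toy_model&lt;/code&gt; and &lt;code&gt;grid&lt;/code&gt; are hypothetical names).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(42)

# Simulated stand-in for the real distances (an assumption, not the study data)
toy &amp;lt;- data.frame(
  distance = c(abs(rnorm(500, 0.010, 0.005)), abs(rnorm(500, 0.030, 0.008))),
  intra    = rep(c(1, 0), each = 500)
)
toy_model &amp;lt;- glm(intra ~ distance, data = toy, family = binomial)

# Predict on the link (log-odds) scale, with standard errors
grid &amp;lt;- data.frame(distance = seq(0, 0.05, by = 0.005))
link &amp;lt;- predict(toy_model, grid, type = &amp;quot;link&amp;quot;, se.fit = TRUE)

# Approximate 95% interval on the link scale, mapped back to probabilities
grid$prob  &amp;lt;- plogis(link$fit)
grid$lower &amp;lt;- plogis(link$fit - 1.96 * link$se.fit)
grid$upper &amp;lt;- plogis(link$fit + 1.96 * link$se.fit)
grid&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Building the interval on the link scale keeps both limits between 0 and 1, which a symmetric interval computed directly on the probability scale would not guarantee.&lt;/p&gt;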
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this multi-part series on HIV in Africa we covered four topics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#34;&gt;part 1&lt;/a&gt;, we analysed the incidence of HIV in sub-Saharan Africa, with special mention of the effect of the widespread availability of anti-retroviral (ARV) drugs during 2004. Since then, there has been a rapid decline in HIV infection rates in South Africa.&lt;/li&gt;
&lt;li&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/&#34;&gt;part 2&lt;/a&gt;, we described the PhyloPi project - a phylogenetic pipeline to analyse HIV in the lab, available for the low-cost Raspberry Pi. This work was published in the &lt;a href=&#34;https://journals.plos.org/plosone/&#34;&gt;PLoS ONE journal&lt;/a&gt;: “&lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility&lt;/a&gt;”.&lt;/li&gt;
&lt;li&gt;Then, &lt;a href=&#34;https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/&#34;&gt;part 3&lt;/a&gt; described the biological mechanism by which HIV mutates, and how this can be modeled using a Markov chain and visualized as heat maps and phylogenetic trees.&lt;/li&gt;
&lt;li&gt;This final part covered how we used a very simple logistic regression model to identify if two samples in the lab came from the same person or two different people.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;closing-thoughts&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Closing thoughts&lt;/h2&gt;
&lt;p&gt;Dear readers,&lt;/p&gt;
&lt;p&gt;I hope that you enjoyed this series on ‘Analysing the HIV pandemic’ using R and some of the tools available as part of the &lt;a href=&#34;https://www.tidyverse.org/&#34;&gt;&lt;code&gt;tidyverse&lt;/code&gt;&lt;/a&gt; packages. Learning R provided me not only with a tool set to analyse data problems, but also a &lt;a href=&#34;https://stackoverflow.com/questions/tagged/r&#34;&gt;community&lt;/a&gt;. Being a biologist, I was not sure of the best approach for solving the problem of inter- and intra-patient genetic distances. I contacted Andrie from &lt;a href=&#34;https://resources.rstudio.com/authors/andrie-de-vries&#34;&gt;RStudio&lt;/a&gt;, and not only did he help us with this, but he was also excited about it. It was a pleasure telling you about our journey on this blog site, and a privilege doing this with experts.&lt;/p&gt;
&lt;p&gt;Armand&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/05/23/pipeline-for-analysing-hiv-part-4/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Analysing the HIV pandemic, Part 3: Genetic diversity</title>
      <link>https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/</link>
      <pubDate>Thu, 16 May 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/</guid>
      <description>
        
&lt;script src=&#34;/rmarkdown-libs/htmlwidgets/htmlwidgets.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/plotly-binding/plotly.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/typedarray/typedarray.min.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/jquery/jquery.min.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;/rmarkdown-libs/crosstalk/css/crosstalk.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;/rmarkdown-libs/crosstalk/js/crosstalk.min.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;/rmarkdown-libs/plotly-htmlwidgets-css/plotly-htmlwidgets.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;/rmarkdown-libs/plotly-main/plotly-latest.min.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;&lt;em&gt;Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio&lt;/em&gt;&lt;/p&gt;
&lt;div id=&#34;recap&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Recap&lt;/h2&gt;
&lt;p&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/&#34;&gt;part 2 of this series&lt;/a&gt;, we discussed the &lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;PhyloPi&lt;/a&gt; pipeline for conducting routine HIV phylogenetics in the drug-resistance testing laboratory as a part of quality control. As mentioned, during HIV replication the error-prone viral reverse transcriptase (RT) converts the viral RNA genome into DNA before it can be integrated into the host cell genome. During this conversion, the enzyme makes random mistakes in the copying process. These mistakes, or mutations, can be deleterious or beneficial, or may have no measurable impact on the replicative fitness of the virus. However, the fast rate of mutation provides enough divergence to be useful for phylogenetic analysis.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;As infections spread from person to person, the virus continues to mutate and become more and more divergent. This allows us to use the genetic information we obtain while doing the drug resistance test and analyse the sequences for abnormalities.&lt;/p&gt;
&lt;p&gt;We showed how DNA sequences can be aligned and how, based on the composition of the ‘columns’ of the alignment, a distance matrix can be calculated by comparing each sequence against every other. In the example we discussed in part 2, we scored matches very simply, using either a one or a zero. We can get closer to the truth by using substitution models, as we will explain below. Many machine learning algorithms require that one first calculate the distance between every pair of observations, and the choice of distance measure is up to the analyst. Phylogenetic inference is very similar in that a distance matrix must be constructed, from which the tree can then be calculated.&lt;/p&gt;
&lt;p&gt;If the sequence targeted for phylogenetic inference is very stable with little or no evolution, the distances calculated will be zero or very close to it. This will not allow for differentiation. However, as we mentioned, HIV has a very fast rate of evolution due to its error-prone reverse transcriptase.&lt;/p&gt;
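&lt;p&gt;As a minimal sketch of the one-or-zero scoring recalled above, the base R code below counts the alignment columns at which two sequences differ and fills a pairwise distance matrix. The three short sequences and the helper name &lt;code&gt;mismatch_distance&lt;/code&gt; are made up for illustration and are not part of PhyloPi.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy aligned sequences (hypothetical, all the same length)
seqs &amp;lt;- c(s1 = &amp;quot;ACGTAC&amp;quot;, s2 = &amp;quot;ACGTTC&amp;quot;, s3 = &amp;quot;GCGTTA&amp;quot;)

# Proportion of alignment columns at which two sequences differ
mismatch_distance &amp;lt;- function(a, b) {
  a &amp;lt;- strsplit(a, &amp;quot;&amp;quot;)[[1]]
  b &amp;lt;- strsplit(b, &amp;quot;&amp;quot;)[[1]]
  mean(a != b)
}

# Pairwise distance matrix: every sequence against every other
n &amp;lt;- length(seqs)
d &amp;lt;- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] &amp;lt;- mismatch_distance(seqs[[i]], seqs[[j]])
  }
}
d&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Sequences that never change would give a matrix of zeros, which is why a slowly evolving target offers nothing to differentiate samples on.&lt;/p&gt;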
&lt;p&gt;&lt;a href=&#34;https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002251&#34;&gt;Cuevas&lt;/a&gt; &lt;em&gt;et al.&lt;/em&gt; (2015) published work on the &lt;em&gt;in vivo&lt;/em&gt; rate of HIV evolution. Their analysis revealed a mutation rate of &lt;span class=&#34;math inline&#34;&gt;\(4.1 \cdot 10^{-3}\)&lt;/span&gt; (&lt;span class=&#34;math inline&#34;&gt;\(sd=1.7 \cdot 10^{-3}\)&lt;/span&gt;) per base per cell, the highest reported for any biological entity. However, the error-prone reverse transcriptase is not the only mechanism of mutation. One defence against HIV infection is a family of enzymes called apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like, or &lt;strong&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/APOBEC3G&#34;&gt;APOBEC&lt;/a&gt;&lt;/strong&gt;. These enzymes act on RNA and convert or mutate cytidine to uridine (uridine is the RNA counterpart of thymidine in DNA). This results in a G to A mutation on the cDNA.&lt;/p&gt;
&lt;p&gt;As Cuevas &lt;em&gt;et al.&lt;/em&gt; also showed, these enzymes are not equally active in all people. On the other hand, the viral Vif protein inhibits this hypermutation by ‘tagging’ the APOBEC protein with ubiquitin for degradation by the cytoplasmic ubiquitin-dependent proteasome machinery.&lt;/p&gt;
&lt;p&gt;But how does this virus-driven mutation, or APOBEC-driven hypermutation, affect the virus in a negative (or positive) way?&lt;/p&gt;
&lt;p&gt;We first need to understand how RNA is translated into proteins. Below is a table showing the codon combinations for each of the 20 amino acids.&lt;/p&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span id=&#34;fig:codons&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;/post/2019-05-14-analysis-hiv-pandemic-part-3_files/codon-table-by-sabal-edu.jpg&#34; alt=&#34;Amino acid encoding. Available at https://www.biologyjunction.com/protein-synthesis-worksheet/&#34; width=&#34;80%&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Amino acid encoding. Available at &lt;a href=&#34;https://www.biologyjunction.com/protein-synthesis-worksheet/&#34; class=&#34;uri&#34;&gt;https://www.biologyjunction.com/protein-synthesis-worksheet/&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;As can be seen from the table above, some amino acids are encoded by more than one codon. For example, if we change the codon CGU to AGA, the resulting amino acid stays Arginine, or R. This is referred to as a silent (synonymous) mutation, since the resulting protein is unchanged. On the other hand, if we mutate AGU to CGU, the amino acid changes from Serine to Arginine, or in single-letter notation, &lt;strong&gt;S to R&lt;/strong&gt;. A change in the amino acid is referred to as a non-synonymous mutation.&lt;/p&gt;
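&lt;p&gt;The two cases can be checked with a small lookup, sketched here in base R. The vector &lt;code&gt;codon_to_aa&lt;/code&gt; holds only the three codons mentioned in the text, and &lt;code&gt;classify_mutation&lt;/code&gt; is a hypothetical helper written for this illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Partial codon table: only the codons mentioned in the text
codon_to_aa &amp;lt;- c(CGU = &amp;quot;R&amp;quot;, AGA = &amp;quot;R&amp;quot;, AGU = &amp;quot;S&amp;quot;)

# Classify a single codon change as synonymous (silent) or non-synonymous
classify_mutation &amp;lt;- function(from, to) {
  if (codon_to_aa[[from]] == codon_to_aa[[to]]) &amp;quot;synonymous&amp;quot; else &amp;quot;non-synonymous&amp;quot;
}

classify_mutation(&amp;quot;CGU&amp;quot;, &amp;quot;AGA&amp;quot;)  # Arginine stays Arginine
classify_mutation(&amp;quot;AGU&amp;quot;, &amp;quot;CGU&amp;quot;)  # Serine becomes Arginine&lt;/code&gt;&lt;/pre&gt;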
&lt;/div&gt;
&lt;div id=&#34;example&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Example&lt;/h2&gt;
&lt;p&gt;In reality, the APOBEC enzyme recognizes specific RNA sequence motifs, but just to give an idea of how this works, let’s look at an example.&lt;/p&gt;
&lt;p&gt;Load some packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ape)
library(Biostrings)
library(tibble)
library(tidyr)
library(dplyr)
library(knitr)
library(plotly)
library(RColorBrewer)
library(diagram)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Create an RNA sequence (remember that RNA uses &lt;code&gt;U&lt;/code&gt; where DNA uses &lt;code&gt;T&lt;/code&gt;):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;WT &amp;lt;- c(&amp;quot;CGA&amp;quot;, &amp;quot;GUU&amp;quot;, &amp;quot;AUA&amp;quot;, &amp;quot;GAG&amp;quot;, &amp;quot;UGG&amp;quot;, &amp;quot;AGU&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code block above created the sequence CGAGUUAUAGAGUGGAGU, split into codons for clarity. We can now translate this sequence using the codon table or a function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Translate a vector of RNA codons into single-letter amino acids
translate_dna_sequence &amp;lt;- function(x){
  x %&amp;gt;% 
    paste0(collapse = &amp;quot;&amp;quot;) %&amp;gt;%   # join the codons into one string
    gsub(&amp;quot;U&amp;quot;, &amp;quot;T&amp;quot;, .) %&amp;gt;%       # switch from the RNA to the DNA alphabet
    DNAString() %&amp;gt;% 
    as.DNAbin() %&amp;gt;% 
    trans() %&amp;gt;%                 # translate the DNA into amino acids
    .[[1]] %&amp;gt;% 
    as.character.AAbin()
}

AA &amp;lt;- WT %&amp;gt;% translate_dna_sequence()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code block above translated our RNA sequence into a protein sequence: R, V, I, E, W, S.&lt;/p&gt;
&lt;p&gt;Now let’s mutate all occurrences of &lt;code&gt;C&lt;/code&gt; to &lt;code&gt;U/T&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;MUT &amp;lt;- gsub(&amp;quot;C&amp;quot;, &amp;quot;U&amp;quot;, WT)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting mutant sequence is: UGA, GUU, AUA, GAG, UGG, AGU, and if we now translate that, we get …&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;AA &amp;lt;- MUT %&amp;gt;% translate_dna_sequence()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;… the protein sequence: *, V, I, E, W, S.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;*&lt;/code&gt; means a &lt;em&gt;stop codon&lt;/em&gt; was introduced. Stop codons are responsible for terminating translation from RNA to protein. If one of the viral genes acquires a premature stop codon, the protein will be truncated and will most likely be dysfunctional. Mutations other than stop codons can also have a negative effect on the virus, or they can confer resistance to an antiretroviral (ARV) drug.&lt;/p&gt;
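&lt;p&gt;To make the effect of a premature stop codon concrete, the base R sketch below keeps only the residues that would be translated before the first &lt;code&gt;*&lt;/code&gt;. The two amino acid vectors are copied from the translations in the text; &lt;code&gt;truncate_at_stop&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Amino acids from the two translations in the text (&amp;quot;*&amp;quot; marks a stop codon)
wild_aa   &amp;lt;- c(&amp;quot;R&amp;quot;, &amp;quot;V&amp;quot;, &amp;quot;I&amp;quot;, &amp;quot;E&amp;quot;, &amp;quot;W&amp;quot;, &amp;quot;S&amp;quot;)
mutant_aa &amp;lt;- c(&amp;quot;*&amp;quot;, &amp;quot;V&amp;quot;, &amp;quot;I&amp;quot;, &amp;quot;E&amp;quot;, &amp;quot;W&amp;quot;, &amp;quot;S&amp;quot;)

# Keep only the residues translated before the first stop codon
truncate_at_stop &amp;lt;- function(aa) {
  stop_pos &amp;lt;- match(&amp;quot;*&amp;quot;, aa)
  if (is.na(stop_pos)) aa else aa[seq_len(stop_pos - 1)]
}

length(truncate_at_stop(wild_aa))    # 6: the full-length peptide
length(truncate_at_stop(mutant_aa))  # 0: translation stops immediately&lt;/code&gt;&lt;/pre&gt;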
&lt;/div&gt;
&lt;div id=&#34;calculating-genetic-distances-from-a-multiple-sequence-alignment-msa&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Calculating genetic distances from a multiple sequence alignment (MSA)&lt;/h2&gt;
&lt;p&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/&#34;&gt;part 2&lt;/a&gt;, we showed the general principle of an MSA. In biology, sequence alignments are used to look at similarities between DNA or protein sequences. For most phylogenetic analyses, a multiple sequence alignment is a requirement, and the more accurate the MSA, the more accurate the phylogenetic inference.&lt;/p&gt;
&lt;p&gt;First, we read in the multiple sequence alignment file.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Read in the alignment file
aln &amp;lt;- read.dna(&amp;#39;example.aln&amp;#39;, format = &amp;#39;fasta&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we can calculate the distance matrix using the Kimura two-parameter (K80) model. Various DNA substitution models exist; the one we will use is based on &lt;a href=&#34;https://en.wikipedia.org/wiki/Markov_chain&#34;&gt;Markov chains&lt;/a&gt;. Remember:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“All models are wrong, but some are useful” - &lt;a href=&#34;https://en.wikipedia.org/wiki/George_E._P._Box&#34;&gt;George Box&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is &lt;strong&gt;very&lt;/strong&gt; true when it comes to estimating genetic distances and phylogenetic inference. Consider the image below:&lt;/p&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span id=&#34;fig:sumbstetutions&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;/post/2019-05-14-analysis-hiv-pandemic-part-3_files/1024px-All_transitions_and_transversions.svg.png&#34; alt=&#34;transversions vs transitions. Available at https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/All_transitions_and_transversions.svg/1024px-All_transitions_and_transversions.svg.png&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: transversions vs transitions. Available at &lt;a href=&#34;https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/All_transitions_and_transversions.svg/1024px-All_transitions_and_transversions.svg.png&#34; class=&#34;uri&#34;&gt;https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/All_transitions_and_transversions.svg/1024px-All_transitions_and_transversions.svg.png&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The figure above shows transition and transversion events. &lt;strong&gt;Transitions&lt;/strong&gt; between &lt;strong&gt;A&lt;/strong&gt; and &lt;strong&gt;G&lt;/strong&gt; (the purines) and between &lt;strong&gt;C&lt;/strong&gt; and &lt;strong&gt;T&lt;/strong&gt; (the pyrimidines) are more likely than &lt;strong&gt;transversions&lt;/strong&gt; (indicated by the red arrows). The K80 model takes this into account by giving transitions and transversions separate rate parameters, and these rates, or probabilities, are calculated or estimated by maximum likelihood.&lt;/p&gt;
&lt;p&gt;Let’s see what that looks like:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tmDNA &amp;lt;- matrix(c(0.8,0.05,0.1,0.05,
                  0.05,0.8,0.05,0.1,
                  0.1,0.05,0.8,0.05,
                  0.05,0.1,0.05,0.8),
                nrow = 4, byrow = TRUE)
stateNames &amp;lt;- c(&amp;quot;A&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;G&amp;quot;, &amp;quot;T&amp;quot;)
row.names(tmDNA) &amp;lt;- stateNames; colnames(tmDNA) &amp;lt;- stateNames

tmDNA %&amp;gt;% 
  kable(
    caption = &amp;quot;Example K80 probabilities of transitions or transversions&amp;quot;
  )&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-6&#34;&gt;Table 1: &lt;/span&gt;Example K80 probabilities of transitions or transversions&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;A&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;C&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;G&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;T&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.80&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.80&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;G&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.80&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.80&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plotmat(tmDNA,pos = c(2,2), 
        lwd = 1, box.lwd = 2, 
        cex.txt = 0.8, 
        box.size = 0.1, 
        box.type = &amp;quot;circle&amp;quot;, 
        box.prop = 0.5,
        box.col = &amp;quot;light blue&amp;quot;,
        arr.length=.1,
        arr.width=.1,
        self.cex = .6,
        self.shifty = -.01,
        self.shiftx = .14,
        main = &amp;quot;Markov Chain&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-14_analysing-hiv-pandemic-part-3/2019-05-14-analysing-hiv-pandemic-part-3_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This example is contrived, but should explain the concept of a substitution model. The viral reverse transcriptase is not a random sequence generator, but it does make mistakes. Most of the time when it is copying the RNA into DNA, the base (state) stays the same. In addition, a transition and a transversion do not have the same probability. If you look at the figure above where we introduced transitions and transversions, you will notice that A is chemically more similar to G, and T more similar to C.&lt;/p&gt;
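&lt;p&gt;Two properties of such a chain are easy to check in base R, as sketched below: every row of the matrix sums to one, and multiplying the matrix by itself gives the probabilities after two steps. The matrix is rebuilt under the name &lt;code&gt;P&lt;/code&gt; (our choice) so that the block runs on its own, and the values remain the contrived ones from the example.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Same contrived transition matrix as in the example, rebuilt for a standalone run
states &amp;lt;- c(&amp;quot;A&amp;quot;, &amp;quot;C&amp;quot;, &amp;quot;G&amp;quot;, &amp;quot;T&amp;quot;)
P &amp;lt;- matrix(c(0.80, 0.05, 0.10, 0.05,
              0.05, 0.80, 0.05, 0.10,
              0.10, 0.05, 0.80, 0.05,
              0.05, 0.10, 0.05, 0.80),
            nrow = 4, byrow = TRUE, dimnames = list(states, states))

# Each row is a probability distribution over the next base
rowSums(P)

# Probabilities after two steps of the chain
P2 &amp;lt;- P %*% P
round(P2, 4)

# Starting from A, ending on G (a transition) is still more likely than ending on C
P2[&amp;quot;A&amp;quot;, &amp;quot;G&amp;quot;] &amp;gt; P2[&amp;quot;A&amp;quot;, &amp;quot;C&amp;quot;]&lt;/code&gt;&lt;/pre&gt;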
&lt;p&gt;There are many other &lt;a href=&#34;http://www.iqtree.org/doc/Substitution-Models&#34;&gt;substitution models&lt;/a&gt;. It is not always trivial to select the best model for phylogenetic inference. One technique is to run multiple maximum likelihood phylogenetic calculations using different models, and then pick the model with the lowest AIC (Akaike Information Criterion). For our pipeline, we selected the rather simple K80 model. Since we are looking at different sets of sequences at each submission, a simple model is probably better in order to avoid the problems caused by overfitting.&lt;/p&gt;
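&lt;p&gt;The AIC comparison itself is simple arithmetic, sketched below in base R. The log-likelihood values are invented purely for illustration and do not come from our data; the parameter counts are the usual numbers of free substitution parameters for each model, with branch lengths left out because they are shared by all models on the same tree.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical maximised log-likelihoods; the numbers are invented for illustration
fits &amp;lt;- data.frame(
  model  = c(&amp;quot;JC69&amp;quot;, &amp;quot;K80&amp;quot;, &amp;quot;HKY85&amp;quot;, &amp;quot;GTR&amp;quot;),
  k      = c(0, 1, 4, 8),  # free substitution parameters (branch lengths excluded)
  logLik = c(-5230.4, -5101.7, -5098.9, -5096.2)
)

# AIC = 2k - 2 * log-likelihood; the lowest value wins
fits$AIC &amp;lt;- 2 * fits$k - 2 * fits$logLik
fits[order(fits$AIC), ]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these made-up numbers the richer models fit slightly better, but not by enough to pay for their extra parameters, so the simpler model is preferred.&lt;/p&gt;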
&lt;p&gt;We can use the &lt;code&gt;ape&lt;/code&gt; package and calculate distances using the &lt;code&gt;K80&lt;/code&gt; model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Calculate the genetic distances between sequences using the K80 model; as.matrix = TRUE makes the rest easier
alnDist &amp;lt;- dist.dna(aln, model = &amp;quot;K80&amp;quot;, as.matrix = TRUE)
alnDist[1:5, 1:5] %&amp;gt;% 
  kable(caption = &amp;quot;First few rows of our distance matrix&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-8&#34;&gt;Table 2: &lt;/span&gt;First few rows of our distance matrix&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;01_AE.JP.AB253686_INT&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;B.US.HM450245_INT&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;B.AU.AF407664_INT&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;B.CN.KJ820110_INT&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;B.RU.HM466986_INT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;01_AE.JP.AB253686_INT&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0935626&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0961965&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0962887&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0962887&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;B.US.HM450245_INT&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0935626&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378446&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378167&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378748&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;B.AU.AF407664_INT&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0961965&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378446&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0454602&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0494138&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;B.CN.KJ820110_INT&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0962887&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378167&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0454602&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0479955&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;B.RU.HM466986_INT&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0962887&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378748&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0494138&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0479955&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The matrix has a shape of 47 by 47, so we just preview the first 5 rows and columns.&lt;/p&gt;
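&lt;p&gt;To show what a K80 distance is made of, the base R sketch below applies Kimura’s two-parameter formula by hand to a single pair of sequences. The two short sequences are hypothetical, and the code is an illustration of the formula rather than a replacement for &lt;code&gt;dist.dna()&lt;/code&gt;. Here &lt;code&gt;P&lt;/code&gt; and &lt;code&gt;Q&lt;/code&gt; are the proportions of sites that differ by a transition and by a transversion, respectively.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Two short aligned sequences (hypothetical)
x &amp;lt;- strsplit(&amp;quot;ACGTACGTAC&amp;quot;, &amp;quot;&amp;quot;)[[1]]
y &amp;lt;- strsplit(&amp;quot;GCGTACTTAT&amp;quot;, &amp;quot;&amp;quot;)[[1]]

purines    &amp;lt;- c(&amp;quot;A&amp;quot;, &amp;quot;G&amp;quot;)
differs    &amp;lt;- x != y
# TRUE where both bases are purines or both are pyrimidines
same_class &amp;lt;- (x %in% purines) == (y %in% purines)

P &amp;lt;- sum(differs[same_class]) / length(x)   # proportion of transitions
Q &amp;lt;- sum(differs[!same_class]) / length(x)  # proportion of transversions

# Kimura (1980) two-parameter distance
k80 &amp;lt;- -0.5 * log((1 - 2 * P - Q) * sqrt(1 - 2 * Q))
c(P = P, Q = Q, p_distance = mean(differs), K80 = k80)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The K80 distance comes out larger than the raw proportion of differing sites because the model corrects for substitutions that hit the same site more than once.&lt;/p&gt;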
&lt;/div&gt;
&lt;div id=&#34;reduction-of-the-heatmap-to-focus-on-the-important-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Reduction of the heatmap to focus on the important data&lt;/h2&gt;
&lt;p&gt;The PhyloPi pipeline uses the &lt;strong&gt;Basic Local Alignment Search Tool&lt;/strong&gt; (BLAST) to retrieve previously sampled sequences, and adds these retrieved sequences to the analysis. &lt;a href=&#34;https://blast.ncbi.nlm.nih.gov/Blast.cgi&#34;&gt;BLAST&lt;/a&gt; is like a search engine you use on the web, but for protein or DNA sequences. By doing this, important sequences from retrospective samples are included, which makes PhyloPi aware of past sequences rather than only of the current batch. Have a look at the &lt;a href=&#34;https://journals.plos.org/plosone/article/comments?id=10.1371/journal.pone.0213241&#34;&gt;paper&lt;/a&gt; for some examples.&lt;/p&gt;
&lt;p&gt;The data we have is ready to use for heatmap plotting purposes, but since the data also contains previously sampled sequences, comparing those sequences amongst themselves would be a distraction. We are interested in those samples, but only compared to the current batch of samples analysed. The figures below should explain this a bit better.&lt;/p&gt;
&lt;hr /&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;
&lt;img src=&#34;/post/2019-05-14-analysis-hiv-pandemic-part-3_files/heatmap_full.png&#34; alt=&#34;A diagram of a heatmap with lots of redundant and distracting data. &#34; width=&#34;50%&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 3: A diagram of a heatmap with lots of redundant and distracting data.
&lt;/p&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;p&gt;From the image above you can see that the heatmap, as is typical for a distance matrix, is symmetrical about the diagonal. Submitted and retrieved samples appear on both the horizontal and the vertical axis. Notice also that, in the region annotated as “Distraction”, the previous samples are compared amongst themselves. We are not interested in those comparisons now, as we would already have acted on any issues when those samples were first analysed. What we want instead is a heatmap like the one depicted in the image below.&lt;/p&gt;
&lt;hr /&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;
&lt;img src=&#34;/post/2019-05-14-analysis-hiv-pandemic-part-3_files/heatmap_focused.png&#34; alt=&#34;A diagram of a more focussed heatmap with the redundant and distracting data removed.&#34; width=&#34;50%&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 4: A diagram of a more focussed heatmap with the redundant and distracting data removed.
&lt;/p&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;p&gt;Fortunately, we have a very powerful tool, &lt;strong&gt;R&lt;/strong&gt;, at our disposal, and plenty of really useful and convenient packages like &lt;code&gt;dplyr&lt;/code&gt; to fix this.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;alnDistLong &amp;lt;- 
  alnDist %&amp;gt;% 
  as.data.frame(stringsAsFactors = FALSE) %&amp;gt;% 
  rownames_to_column(var = &amp;quot;sample_1&amp;quot;) %&amp;gt;% 
  gather(key = &amp;quot;sample_2&amp;quot;, value = &amp;quot;distance&amp;quot;, -sample_1, na.rm = TRUE) %&amp;gt;% 
  arrange(distance)

alnDistLong %&amp;gt;% head()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                sample_1              sample_2 distance
## 1 01_AE.JP.AB253686_INT 01_AE.JP.AB253686_INT        0
## 2     B.US.HM450245_INT     B.US.HM450245_INT        0
## 3     B.AU.AF407664_INT     B.AU.AF407664_INT        0
## 4     B.CN.KJ820110_INT     B.CN.KJ820110_INT        0
## 5     B.RU.HM466986_INT     B.RU.HM466986_INT        0
## 6     B.US.DQ127546_INT     B.US.DQ127546_INT        0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Final cleanup and removal of distracting data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# get the names of samples originally in the fasta file used for submission
qSample &amp;lt;- names(read.dna(&amp;quot;example.fasta&amp;quot;, format = &amp;quot;fasta&amp;quot;))

# compute new order of samples, so the new alignment is in the order of the heatmap example
sample_1 &amp;lt;- unique(alnDistLong$sample_1)
new_order &amp;lt;- c(sort(qSample), setdiff(sample_1, qSample))
new_order&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] &amp;quot;01_AE.JP.AB253686_INT&amp;quot;  &amp;quot;01_AE.TH.JX448243_INT&amp;quot; 
##  [3] &amp;quot;01_AE.VN.LC100946_INT&amp;quot;  &amp;quot;38_BF1.UY.FJ213783_INT&amp;quot;
##  [5] &amp;quot;B.AU.AF407664_INT&amp;quot;      &amp;quot;B.CN.KJ820110_INT&amp;quot;     
##  [7] &amp;quot;B.KR.JN417106_INT&amp;quot;      &amp;quot;B.RU.HM466986_INT&amp;quot;     
##  [9] &amp;quot;B.US.DQ127546_INT&amp;quot;      &amp;quot;B.US.GU076504_INT&amp;quot;     
## [11] &amp;quot;B.US.HM450245_INT&amp;quot;      &amp;quot;BC.CN.JQ898256_INT&amp;quot;    
## [13] &amp;quot;C.ZA.KT183056_INT&amp;quot;      &amp;quot;C.ZM.KM049918_INT&amp;quot;     
## [15] &amp;quot;C.ZM.KM050042_INT&amp;quot;      &amp;quot;01_AE.TH.JX448252_INT&amp;quot; 
## [17] &amp;quot;01_AE.TH.JX448250_INT&amp;quot;  &amp;quot;01_AE.TH.JX448249_INT&amp;quot; 
## [19] &amp;quot;C.ZA.KT183058_INT&amp;quot;      &amp;quot;C.ZM.KM049913_INT&amp;quot;     
## [21] &amp;quot;B.KR.JN417120_INT&amp;quot;      &amp;quot;B.KR.JN417117_INT&amp;quot;     
## [23] &amp;quot;B.KR.JN417116_INT&amp;quot;      &amp;quot;57_BC.CN.JX679207_INT&amp;quot; 
## [25] &amp;quot;C.ZM.KM050043_INT&amp;quot;      &amp;quot;C.ZM.KM050041_INT&amp;quot;     
## [27] &amp;quot;01_AE.JP.AB253682_INT&amp;quot;  &amp;quot;01_AE.JP.AB253689_INT&amp;quot; 
## [29] &amp;quot;B.US.KJ704790_INT&amp;quot;      &amp;quot;B.ES.KC238594_INT&amp;quot;     
## [31] &amp;quot;B.AU.AF407665_INT&amp;quot;      &amp;quot;B.AU.AF407667_INT&amp;quot;     
## [33] &amp;quot;B.CN.KC987976_INT&amp;quot;      &amp;quot;B.CN.KT192001_INT&amp;quot;     
## [35] &amp;quot;B.US.AF040369_INT&amp;quot;      &amp;quot;B.US.M38429_INT&amp;quot;       
## [37] &amp;quot;B.US.DQ127547_INT&amp;quot;      &amp;quot;B.US.DQ127543_INT&amp;quot;     
## [39] &amp;quot;C.ZA.KT183062_INT&amp;quot;      &amp;quot;B.US.GU076505_INT&amp;quot;     
## [41] &amp;quot;B.US.GU076507_INT&amp;quot;      &amp;quot;C.ZM.KM049917_INT&amp;quot;     
## [43] &amp;quot;01_AE.CN.JQ302565_INT&amp;quot;  &amp;quot;01_AE.VN.FJ185234_INT&amp;quot; 
## [45] &amp;quot;F1.BR.FJ771006_INT&amp;quot;     &amp;quot;BF.AR.AF408631_INT&amp;quot;    
## [47] &amp;quot;BC.CN.KC898983_INT&amp;quot;&lt;/code&gt;&lt;/pre&gt;
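&lt;p&gt;The idea behind this reordering and the filtering that follows can be seen on a toy example, sketched here in base R. The sample names &lt;code&gt;q1&lt;/code&gt;, &lt;code&gt;q2&lt;/code&gt;, &lt;code&gt;r1&lt;/code&gt; and &lt;code&gt;r2&lt;/code&gt; and all distances are hypothetical: the two &lt;code&gt;q&lt;/code&gt; samples stand for the submitted batch and the two &lt;code&gt;r&lt;/code&gt; samples for sequences retrieved by BLAST.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy symmetric distance matrix with made-up values
ids &amp;lt;- c(&amp;quot;r1&amp;quot;, &amp;quot;q2&amp;quot;, &amp;quot;r2&amp;quot;, &amp;quot;q1&amp;quot;)
d &amp;lt;- matrix(c(0.00, 0.02, 0.05, 0.01,
              0.02, 0.00, 0.04, 0.03,
              0.05, 0.04, 0.00, 0.06,
              0.01, 0.03, 0.06, 0.00),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))
submitted &amp;lt;- c(&amp;quot;q1&amp;quot;, &amp;quot;q2&amp;quot;)

# Submitted samples first (sorted), then everything else in its original order
ordered_ids &amp;lt;- c(sort(submitted), setdiff(ids, submitted))

# Rows: submitted samples only; columns: all samples in the new order
focused &amp;lt;- d[sort(submitted), ordered_ids]
focused&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result has one row per submitted sample and one column per sample, so the comparisons of retrieved samples amongst themselves are gone.&lt;/p&gt;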
&lt;p&gt;Plot the heatmap using &lt;code&gt;plotly&lt;/code&gt; for interactivity:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;alnDistLong %&amp;gt;% 
  filter(
    sample_1 %in% qSample,
    sample_1 != sample_2
    ) %&amp;gt;% 
  mutate(
    sample_2 = factor(sample_2, levels = new_order)
  ) %&amp;gt;% 
  plot_ly(
    x = ~sample_2,
    y = ~sample_1,
    z = ~distance,
    type = &amp;quot;heatmap&amp;quot;, colors = brewer.pal(11, &amp;quot;RdYlBu&amp;quot;), 
    zmin = 0.0, zmax = 0.03,  xgap = 2, ygap = 1
) %&amp;gt;% 
  layout(
    margin = list(l = 100, r = 10, b = 100, t = 10, pad = 4), 
    yaxis = list(tickfont = list(size = 10), showspikes = TRUE),
    xaxis = list(tickfont = list(size = 10), showspikes = TRUE)
  )&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;htmlwidget-1&#34; style=&#34;width:672px;height:480px;&#34; class=&#34;plotly html-widget&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-1&#34;&gt;{&#34;x&#34;:{&#34;visdat&#34;:{&#34;538659887c83&#34;:[&#34;function () &#34;,&#34;plotlyVisDat&#34;]},&#34;cur_data&#34;:&#34;538659887c83&#34;,&#34;attrs&#34;:{&#34;538659887c83&#34;:{&#34;x&#34;:{},&#34;y&#34;:{},&#34;z&#34;:{},&#34;zmin&#34;:0,&#34;zmax&#34;:0.03,&#34;xgap&#34;:2,&#34;ygap&#34;:1,&#34;colors&#34;:[&#34;#A50026&#34;,&#34;#D73027&#34;,&#34;#F46D43&#34;,&#34;#FDAE61&#34;,&#34;#FEE090&#34;,&#34;#FFFFBF&#34;,&#34;#E0F3F8&#34;,&#34;#ABD9E9&#34;,&#34;#74ADD1&#34;,&#34;#4575B4&#34;,&#34;#313695&#34;],&#34;alpha_stroke&#34;:1,&#34;sizes&#34;:[10,100],&#34;spans&#34;:[1,20],&#34;type&#34;:&#34;heatmap&#34;}},&#34;layout&#34;:{&#34;margin&#34;:{&#34;b&#34;:100,&#34;l&#34;:100,&#34;t&#34;:10,&#34;r&#34;:10,&#34;pad&#34;:4},&#34;yaxis&#34;:{&#34;domain&#34;:[0,1],&#34;automargin&#34;:true,&#34;tickfont&#34;:{&#34;size&#34;:10},&#34;showspikes&#34;:true,&#34;title&#34;:&#34;sample_1&#34;,&#34;type&#34;:&#34;category&#34;,&#34;categoryorder&#34;:&#34;array&#34;,&#34;categoryarray&#34;:[&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;]},&#34;xaxis&#34;:{&#34;domain&#34;:[0,1],&#34;automargin&#34;:true,&#34;tickfont&#34;:{&#34;size&#34;:10},&#34;showspikes&#34;:true,&#34;title&#34;:&#34;sample_2&#34;,&#34;type&#34;:&#34;category&#34;,&#34;categoryorder&#34;:&#34;array&#34;,&#34;categoryarray&#34;:[&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#
34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;BC.CN.KC898983_INT&#34;]},&#34;scene&#34;:{&#34;zaxis&#34;:{&#34;title&#34;:&#34;distance&#34;}},&#34;hovermode&#34;:&#34;closest&#34;,&#34;showlegend&#34;:false,&#34;legend&#34;:{&#34;yanchor&#34;:&#34;top&#34;,&#34;y&#34;:0.5}},&#34;source&#34;:&#34;A&#34;,&#34;config&#34;:{&#34;showSendToCloud&#34;:false},&#34;data&#34;:[{&#34;colorbar&#34;:{&#34;title&#34;:&#34;distance&#34;,&#34;ticklen&#34;:2,&#34;len&#34;:0.5,&#34;lenmode&#34;:&#34;fraction&#34;,&#34;y&#34;:1,&#34;yanchor&#34;:&#34;top&#34;},&#34;colorscale&#34;:[[&#34;0&#34;,&#34;rgba(165,0,38,1)&#34;],[&#34;0.0416666666666667&#34;,&#34;rgba(186,25,39,1)&#34;],[&#34;0.0833333333333333&#34;,&#34;rgba(207,42,39,1)&#34;],[&#34;0.125&#34;,&#34;rgba(222,66,46,1)&#34;],[&#34;0.166666666666667&#34;,&#34;rgba(235,91,57,1)&#34;],[&#34;0.208333333333333&#34;,&#34;rgba(245,115,69,1)&#34;],[&#34;0.25&#34;,&#
34;rgba(249,143,82,1)&#34;],[&#34;0.291666666666667&#34;,&#34;rgba(253,169,94,1)&#34;],[&#34;0.333333333333333&#34;,&#34;rgba(254,191,112,1)&#34;],[&#34;0.375&#34;,&#34;rgba(254,212,132,1)&#34;],[&#34;0.416666666666667&#34;,&#34;rgba(254,229,152,1)&#34;],[&#34;0.458333333333333&#34;,&#34;rgba(255,242,171,1)&#34;],[&#34;0.5&#34;,&#34;rgba(255,255,191,1)&#34;],[&#34;0.541666666666667&#34;,&#34;rgba(244,250,215,1)&#34;],[&#34;0.583333333333333&#34;,&#34;rgba(230,245,239,1)&#34;],[&#34;0.625&#34;,&#34;rgba(211,236,244,1)&#34;],[&#34;0.666666666666667&#34;,&#34;rgba(189,226,238,1)&#34;],[&#34;0.708333333333333&#34;,&#34;rgba(167,213,231,1)&#34;],[&#34;0.75&#34;,&#34;rgba(144,195,221,1)&#34;],[&#34;0.791666666666667&#34;,&#34;rgba(121,177,211,1)&#34;],[&#34;0.833333333333333&#34;,&#34;rgba(101,154,199,1)&#34;],[&#34;0.875&#34;,&#34;rgba(82,131,187,1)&#34;],[&#34;0.916666666666667&#34;,&#34;rgba(67,106,175,1)&#34;],[&#34;0.958333333333333&#34;,&#34;rgba(59,80,162,1)&#34;],[&#34;1&#34;,&#34;rgba(49,54,149,1)&#34;]],&#34;showscale&#34;:true,&#34;x&#34;:[&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.
AB253689_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.K
R.JN417106_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.AU.AF407665_INT
&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_I
NT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01
_AE.TH.JX448249_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&
#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#3
4;01_AE.JP.AB253682_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34
;B.US.DQ127543_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;
,&#34;B.KR.JN417116_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B
.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.US.GU076507_INT&#34;],&#34;y&#34;:[&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,
&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT
&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B
.RU.HM466986_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZM.KM050
042_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ8982
56_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.KR.JN417106_INT&#
34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#3
4;B.CN.KJ820110_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM049918_
INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.VN.LC100946_INT&#
34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&
#34;C.ZM.KM050042_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM049918_INT&#34;],&#34;z&#34;:[0,0,0,0,0,0,0,0,0,0,0,0.00118588838545844,0.00118624017681518,0.00118624017681518,0.00237530137929893,0.00356295206894832,0.00356719705693705,0.00356719705693705,0.00595350496095845,0.00595954611860516,0.00715254630296478,0.00715254630296478,0.0144078126601721,0.0144078126601721,0.0155870201477025,0.0156273254598515,0.0156273254598515,0.0168178038999123,0.0168178038999123,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.019250040193739,0.019250040193739,0.0204770346745223,0.0241565618280353,0.0241761884152448,0.0241761884152448,0.0241761884152448,0.0241761884152448,0.0241761884152448,
0.0242220267343138,0.0253584872200611,0.0253941862223067,0.0254642633981448,0.0254642633981448,0.0254642633981448,0.0266143894883516,0.0266348816479994,0.0278192110973387,0.0278192110973387,0.0278786633839003,0.0290614477611697,0.0291019568084084,0.0291255468231015,0.0291255468231015,0.0302883205933425,0.030350396907702,0.030350396907702,0.0316019620167614,0.0328067836257485,0.0340614894284614,0.0340614894284614,0.0341145301180979,0.0352946202395574,0.035319351727085,0.0353755648344528,0.0354070364734542,0.0364714474011735,0.0365300078460321,0.0365300078460321,0.0365540667492993,0.0365803864434399,0.0365803864434399,0.0377910425623869,0.0377910425623869,0.0377910425623869,0.0377910425623869,0.0377910425623869,0.0378166936828964,0.0378166936828964,0.0378166936828964,0.0378166936828964,0.0378166936828964,0.0378446096201193,0.0378446096201193,0.0378747853263698,0.0378747853263698,0.0378747853263698,0.0391120374217133,0.0391120374217133,0.0391120374217133,0.0391120374217133,0.0391120374217133,0.0391778645270624,0.0392141614660769,0.0392141614660769,0.0392141614660769,0.0392141614660769,0.0392141614660769,0.0402300862987281,0.0403226935406603,0.0403515533580218,0.0415651479742212,0.0416238186755191,0.0417288904690102,0.0428098896677898,0.0428993295713217,0.0429337120779973,0.0429337120779973,0.0440569279050769,0.0440837757071892,0.0440837757071892,0.0441443681969443,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0442141225088342,0.0442141225088342,0.0453324388008796,0.0453324388008796,0.0453324388008796,0.0454247786277811,0.0454247786277811,0.0454247786277811,0.0454247786277811,0.0454601546313492,0.0454601546313492,0.0454601546313492,0.0455377705654831,0.0466112118764381,0.0466112118764381,0.0466413260405744,0.0467084763804041,0.0467084763804041,0.0478367045264898,0.0478367045264898,0.0479250237931973,0.0479590969990992,0.0479590969990992,0.0479590969990992,0.047959
0969990992,0.0479590969990992,0.0479590969990992,0.0479954783779905,0.0479954783779905,0.0479954783779905,0.0479954783779905,0.0479954783779905,0.0480751455557894,0.0480751455557894,0.0480751455557894,0.0480751455557894,0.0480751455557894,0.0481184214555959,0.0491187565109586,0.0491187565109586,0.0491786116124573,0.0492477575062494,0.0494137589676422,0.0494137589676422,0.0504345221143317,0.0504672721196076,0.0505397478892119,0.0505397478892119,0.0517592625025701,0.0517592625025701,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0519201928139014,0.0519201928139014,0.0519201928139014,0.0519201928139014,0.0530546000146163,0.0531768718957638,0.0543155247820763,0.0543533020434591,0.0543533020434591,0.055615915632105,0.055615915632105,0.055655386112653,0.055655386112653,0.055655386112653,0.055655386112653,0.0556972010120974,0.055741355269655,0.055741355269655,0.055741355269655,0.055741355269655,0.055741355269655,0.0568808787627047,0.0569196973356454,0.0569608698830132,0.0570043912991329,0.0570043912991329,0.0582697711540526,0.05831500803907,0.05831500803907,0.05831500803907,0.05831500803907,0.05836259325208,0.0596783834851079,0.0596783834851079,0.0596783834851079,0.0596783834851079,0.0596783834851079,0.0727187684900024,0.0740077866714672,0.0752992347330038,0.0753554963823033,0.0766487632175863,0.0780618629356785,0.0781890298399144,0.0781890298399144,0.0783259358529553,0.0783259358529553,0.07924264135199,0.07924264135199,0.0794205594239284,0.0794205594239284,0.079551428503007,0.0805994942877881,0.0806581857138925,0.0807193420094362,0.0808490289258925,0.0809885151347227,0.0809885151347227,0.0809885151347227,0.0809885151347227,0.0812967310572018,0.0813798501393859,0.0813798501393859,0.0819600393449685,0.082020584376985,0.082020584376985,0.082020584376985,0.082020584376985,0.0820835993360382,0.0820835993360382,0.0820835993360382,0.0821490791554575,0.0821490791554575,0.0821490791554575,0.0821490791554575,0.082
1490791554575,0.0821490791554575,0.0823602581075981,0.0825934447555855,0.0825934447555855,0.0826760422767417,0.082761066487341,0.082761066487341,0.082761066487341,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0834515892454756,0.0835887618082053,0.083735774793543,0.0841461089242488,0.0841461089242488,0.0841461089242488,0.0846922865810079,0.0847565693159816,0.0847565693159816,0.0848925702635871,0.0854461324645494,0.0855349987062,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0861301965535511,0.0862699816030094,0.0862699816030094,0.0862699816030094,0.0864196683669504,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.0865792181538065,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0871923074146537,0.0872503555801866,0.0874395462398282,0.0875076078929734,0.0875076078929734,0.0875076078929734,0.0875076078929734,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0878047108038582,0.0878047108038582,0.0878851762008905,0.0878851762008905,0.0879681079357578,0.0880535013582811,0.0880535013582811,0.0880535013582811,0.0885640503875291,0.0885640503875291,0.0885640503875291,0.0886864316175879,0.0886864316175879,0.0886864316175879,0.0886864316175879,0.0886864316175879,0.088818857459368,0.0888888242409284,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0890362403878722,0.0890362403878722,0.0890362403878722,0.0891136799014978,0.0891136799014978,0.0891136799014978,0.089275997681829,0.089275997681829,0.089275997681829,0.089275997681829,0.0894482024020703,0.0894482024020703,0.08944820
24020703,0.089538000835919,0.0900657428371278,0.0900657428371278,0.0901326079009568,0.0901326079009568,0.0901326079009568,0.0902019842058131,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0904251301698234,0.0904251301698234,0.0905045013824362,0.0905863591465226,0.0905863591465226,0.0905863591465226,0.0906706987256182,0.0906706987256182,0.0907575154492772,0.0907575154492772,0.090846804712601,0.090846804712601,0.090846804712601,0.0913826105051333,0.0914488695835729,0.0915889476471757,0.0915889476471757,0.0917390717925819,0.0917390717925819,0.0917390717925819,0.0917390717925819,0.0917390717925819,0.0918992024262255,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0920693010361488,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0927676529420411,0.0927676529420411,0.0927676529420411,0.0927676529420411,0.0928358330249355,0.0928358330249355,0.0929065401198158,0.0929797691281142,0.0930555150205005,0.0930555150205005,0.0930555150205005,0.0930555150205005,0.093214537683343,0.0932978047367561,0.0932978047367561,0.0933835692391848,0.0933835692391848,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0936558008647126,0.0936558008647126,0
.0942266545058739,0.094299298680529,0.094299298680529,0.094299298680529,0.094299298680529,0.094299298680529,0.0943744701719034,0.0943744701719034,0.0943744701719034,0.0943744701719034,0.0945323751469019,0.0945323751469019,0.0946150988231988,0.0946150988231988,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0949710235302174,0.0949710235302174,0.0949710235302174,0.0950662390355332,0.0950662390355332,0.0950662390355332,0.0956213555496632,0.0956213555496632,0.095773072482434,0.095773072482434,0.095773072482434,0.095773072482434,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.096196516180269,0.096196516180269,0.096196516180269,0.0962887353590812,0.0962887353590812,0.0962887353590812,0.0962887353590812,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0966754557211777,0.0967392384692493,0.0968055853353737,0.0968055853353737,0.0969459502375118,0.0969459502375118,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0971755979462384,0.0971755979462384,0.0971755979462384,0.0971755979462384,0.0971755979462384,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0973413712990135,0.097428045764283,0.097428045764283,0.0975172390596885,0.0976089464980463,0.0976089464980463,0.0976089464980463,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.097799885390146
7,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0980682142818908,0.0980682142818908,0.0983465113773677,0.0984224833239981,0.0984224833239981,0.0984224833239981,0.0984224833239981,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0987518094698342,0.0987518094698342,0.0987518094698342,0.0989316671599273,0.0990253780401984,0.0990253780401984,0.0990253780401984,0.0991216041269198,0.0991216041269198,0.0991216041269198,0.0991216041269198,0.0996756187798842,0.0998289540123055,0.0998289540123055,0.0998289540123055,0.0998289540123055,0.0998289540123055,0.0999094557142755,0.0999094557142755,0.0999925068053664,0.0999925068053664,0.100166237570073,0.100166237570073,0.100166237570073,0.100256907624212,0.100350107827527,0.100544080193036,0.100544080193036,0.100644843278986,0.100644843278986,0.101321886032053,0.101321886032053,0.101321886032053,0.101321886032053,0.101321886032053,0.101406934905605,0.101406934905605,0.101406934905605,0.101584678237673,0.101677363125014,0.101677363125014,0.101677363125014,0.101772583893643,0.101870335929038,0.101870335929038,0.101870335929038,0.101870335929038,0.102738317574205,0.102738317574205,0.102738317574205,0.102738317574205,0.102738317574205,0.103007154303788,0.103298908329428,0.104158773075007,0.104247851639321,0.104247851639321,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104942864013885,0.104942864013885,0.104942864013885,0.105674386130524,0.105674386130524,0.105674386130524,0.105674386130524,0.105864304935508,0.105963103650085,0.106274801952539,0.107011847863434,0.107011847863434,0.107011847863434,0.107399886247951,0.1083518876488,0.108539723485016,0.108539723485016,0.108539723485016,0.108637512537915,0.108637512537915,0.108637512537915,0.109881296197943,0.109978573398824,0.11113129880212,0.11113129880212,0.11113129880212,0
.11113129880212,0.111421575843391,0.11247859423247,0.112668461221151],&#34;zmin&#34;:0,&#34;zmax&#34;:0.03,&#34;xgap&#34;:2,&#34;ygap&#34;:1,&#34;type&#34;:&#34;heatmap&#34;,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;frame&#34;:null}],&#34;highlight&#34;:{&#34;on&#34;:&#34;plotly_click&#34;,&#34;persistent&#34;:false,&#34;dynamic&#34;:false,&#34;selectize&#34;:false,&#34;opacityDim&#34;:0.2,&#34;selected&#34;:{&#34;opacity&#34;:1},&#34;debounce&#34;:0},&#34;shinyEvents&#34;:[&#34;plotly_hover&#34;,&#34;plotly_click&#34;,&#34;plotly_selected&#34;,&#34;plotly_relayout&#34;,&#34;plotly_brushed&#34;,&#34;plotly_brushing&#34;,&#34;plotly_clickannotation&#34;,&#34;plotly_doubleclick&#34;,&#34;plotly_deselect&#34;,&#34;plotly_afterplot&#34;],&#34;base_url&#34;:&#34;https://plot.ly&#34;},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;/div&gt;
&lt;div id=&#34;phylogenetic-tree&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Phylogenetic tree&lt;/h2&gt;
&lt;p&gt;Above we used the package &lt;a href=&#34;http://ape-package.ird.fr/&#34;&gt;ape&lt;/a&gt; to calculate the genetic distances for the heatmap.&lt;/p&gt;
&lt;p&gt;Another way of looking at our alignment data is to use phylogenetic inference. The PhyloPi pipeline saves each step of phylogenetic inference to allow the user to intercept at any step. We can use the &lt;a href=&#34;https://en.wikipedia.org/wiki/Newick_format&#34;&gt;newick tree file&lt;/a&gt; (a text file formatted as newick) and draw our own tree:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tree &amp;lt;- read.tree(&amp;quot;example-tree.txt&amp;quot;)
plot.phylo(
  tree, cex = 0.8, 
  use.edge.length = TRUE, 
  tip.color = &amp;#39;blue&amp;#39;, 
  align.tip.label = FALSE, 
  show.node.label = TRUE
)
nodelabels(&amp;quot;This one&amp;quot;, 9, frame = &amp;quot;r&amp;quot;, bg = &amp;quot;red&amp;quot;, adj = c(-8.2,-46))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-14_analysing-hiv-pandemic-part-3/2019-05-14-analysing-hiv-pandemic-part-3_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;1152&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We have highlighted a node with a red block labelled “This one”, which we can now discuss. This node has three leaves (KM050043, KM050042 and KM050041), and if you look up these accession numbers at &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/nuccore/KM050041.1/&#34;&gt;NCBI&lt;/a&gt;, you will notice the publication they are tied to:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“HIV transmission. Selection bias at the heterosexual HIV-1 transmission bottleneck”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this paper, the authors looked at selection bias when the infection is transmitted. They found that in a pool of viral quasi-species, transmission is biased to benefit the fittest viral quasi-species. The node highlighted above shows the kind of clustering one would expect with a study like the one mentioned above. You will also notice plenty of other nodes, which you can explore using the accession number and searching for it &lt;a href=&#34;https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The tree above is much like a &lt;a href=&#34;https://en.wikipedia.org/wiki/Dendrogram&#34;&gt;dendrogram&lt;/a&gt; used when displaying &lt;a href=&#34;https://en.wikipedia.org/wiki/Hierarchical_clustering#Agglomerative_clustering_example&#34;&gt;agglomerative&lt;/a&gt; or &lt;a href=&#34;https://en.wikipedia.org/wiki/Hierarchical_clustering&#34;&gt;hierarchical clustering&lt;/a&gt;. The numbers on the tree indicate the probability that the corresponding clusters are correct. The branch lengths indicate the distances between samples. In conjunction with a properly coloured heatmap, this is very useful for finding relevant clusters to investigate. If the reason for close clustering cannot be explained, the tests are repeated.&lt;/p&gt;
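&lt;p&gt;To make the dendrogram comparison concrete, here is a minimal sketch in base R. It uses plain hierarchical clustering rather than phylogenetic inference, and the distance matrix and sample names (&lt;code&gt;s1&lt;/code&gt; to &lt;code&gt;s4&lt;/code&gt;) are invented for illustration; they are not PhyloPi output.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy pairwise distances between four hypothetical samples (made-up values)
ids &amp;lt;- paste0(&amp;quot;s&amp;quot;, 1:4)
d &amp;lt;- matrix(
  c(0.00, 0.01, 0.09, 0.10,
    0.01, 0.00, 0.08, 0.09,
    0.09, 0.08, 0.00, 0.02,
    0.10, 0.09, 0.02, 0.00),
  nrow = 4,
  dimnames = list(ids, ids)
)

# Agglomerative (bottom-up) clustering with average linkage
hc &amp;lt;- hclust(as.dist(d), method = &amp;quot;average&amp;quot;)

# Draw the dendrogram; the height of each join is the distance at which clusters merge
plot(hc, main = &amp;quot;Toy dendrogram&amp;quot;, xlab = &amp;quot;&amp;quot;, sub = &amp;quot;&amp;quot;)

# Cut the tree into two clusters and inspect the membership
clusters &amp;lt;- cutree(hc, k = 2)
clusters&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this toy data, &lt;code&gt;s1&lt;/code&gt; and &lt;code&gt;s2&lt;/code&gt; fall into one cluster and &lt;code&gt;s3&lt;/code&gt; and &lt;code&gt;s4&lt;/code&gt; into the other, because the distances within each pair are much smaller than the distances between the pairs.&lt;/p&gt;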
&lt;/div&gt;
&lt;div id=&#34;the-importance-of-phylogenetics&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The importance of phylogenetics&lt;/h2&gt;
&lt;p&gt;Phylogenetics, and thus genetic distance calculation, is used in many branches of biology. For us it is one of the quality-control measures at our disposal, but it has also been used to reconstruct the origin of HIV. You may find the research papers listed below interesting; in them, the authors used phylogenetics to infer the zoonotic origins of HIV.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3234451/&#34;&gt;Paul M. Sharp and Beatrice H. Hahn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://science.sciencemag.org/content/287/5453/607.long&#34;&gt;Beatrice H. Hahn et al.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As another example, in 1998, six foreign medical workers were accused of deliberately infecting hospitalized children with HIV and were &lt;a href=&#34;https://en.wikipedia.org/wiki/HIV_trial_in_Libya&#34;&gt;sentenced to death in Libya&lt;/a&gt;. In 2006, &lt;a href=&#34;https://www.nature.com/articles/444836a&#34;&gt;de Oliveira, et al.&lt;/a&gt; used phylogenetics to provide evidence that the HIV strains that infected the children had an evolutionary history dating back to the mid-90s, which was before the health care workers arrived in 1998. The six medics were released in 2007. There is also a very good writeup on the case by &lt;a href=&#34;https://www.nature.com/articles/444658b&#34;&gt;Declan Butler&lt;/a&gt;. Although it would probably be very emotional, this story would make a great movie.&lt;/p&gt;
&lt;p&gt;These techniques are also used as evidence in criminal trials. However, the interpretation of this kind of evidence in court can be unsafe, as the insights of &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1971185/&#34;&gt;Pillay, et al.&lt;/a&gt; make clear.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;summary&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this post we discussed how, as infections spread from person to person, the virus continues to mutate and becomes more and more divergent. This allows us to use the genetic information we obtain during drug resistance testing to analyse the sequences for abnormalities.&lt;/p&gt;
&lt;p&gt;We then showed how to compute genetic distance using multiple sequence alignment (MSA) and that it’s possible to model this process as a Markov chain. You can then view the resulting model as a heatmap or a phylogenetic tree.&lt;/p&gt;
&lt;p&gt;This finds practical application in diverse situations, for example shedding light on the origin of HIV, as well as providing evidence in legal trials.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;whats-next&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;What’s next&lt;/h2&gt;
&lt;p&gt;In the fourth and final part of this series, we will show how we analysed the inter- and intra-patient genetic distances of HIV sequences by logistic regression. This was useful in properly colouring our heatmap explained in this series. See you there!&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Analysing the HIV pandemic, Part 2: Drug resistance testing</title>
      <link>https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/</link>
      <pubDate>Tue, 07 May 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/</guid>
      <description>
        


&lt;p&gt;&lt;em&gt;Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Dominique Goedhals is a pathologist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Andrie de Vries is the author of “R for Dummies”, and a Solutions Engineer at RStudio&lt;/em&gt;&lt;/p&gt;
&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#34;&gt;part 1&lt;/a&gt; of this four-part series about HIV/AIDS, we discussed the &lt;a href=&#34;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#34;&gt;HIV pandemic in Sub-Saharan Africa&lt;/a&gt;. In this second installment, we cover a recent publication in the &lt;a href=&#34;https://journals.plos.org/plosone/&#34;&gt;PLoS ONE journal&lt;/a&gt;: “&lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility&lt;/a&gt;”.&lt;/p&gt;
&lt;p&gt;The authors described how they used affordable hardware to create a &lt;a href=&#34;https://en.wikipedia.org/wiki/Phylogenetics&#34;&gt;phylogenetic&lt;/a&gt; pipeline, tailored for the HIV drug-resistance testing facility.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;hiv-drug-resistance&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;HIV drug resistance&lt;/h2&gt;
&lt;p&gt;Natural selection is the process by which some form of selective pressure favours a &lt;strong&gt;phenotypic&lt;/strong&gt; trait or change. These phenotypic traits can be the blood group of a person, whether a pea is wrinkly or not, or whether an infectious organism is susceptible or resistant to a drug. Many times these phenotypic traits, or physical attributes, are caused by genetics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Genotyping&lt;/strong&gt; is the process by which one can infer this phenotypic trait from a genotype, and this is used more and more frequently in medicine. For example, in breast cancer treatment, the BRCA (BReast CAncer) genes are genotyped to determine whether these cancer-suppressing genes are intact. If there is a deleterious or damaging mutation in one of these genes, it can increase the risk of developing breast cancer, thus a phenotype of increased risk of breast cancer.&lt;/p&gt;
&lt;p&gt;For most organisms, the copying of genetic material happens by very precise enzymes or pathways, but occasionally mutations do occur. If a mutation occurs and is sufficiently damaging, it gets removed from the gene pool. However, if the mutation is sufficiently beneficial, it increases the survival of this genetic variant, which may then be preferentially selected.&lt;/p&gt;
&lt;p&gt;In the &lt;a href=&#34;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#34;&gt;previous post&lt;/a&gt;, we discussed &lt;strong&gt;ARVs&lt;/strong&gt; (antiretrovirals) and how these drugs changed the landscape of HIV infection by preventing the development of AIDS. We mentioned that ARVs suppress viral replication. One of the steps in HIV replication is the conversion of its single-stranded RNA to DNA, which can then be incorporated into the DNA of infected cells. The enzyme responsible for this conversion is reverse transcriptase, and it has a high error rate when doing this conversion. One can thus say that HIV has a high evolutionary rate, or mutation rate. The viral genes are translated into viral proteins, which are required to make more virions (viral particles). Proteins are strings or polymers of amino acid residues with an alphabet of 20 choices of amino acids or letters. The sequence of the DNA or RNA influences the sequence of the protein; thus, mutations in the DNA or RNA can result in changes in the protein, and our targets for stopping HIV replication are proteins/enzymes.&lt;/p&gt;
&lt;p&gt;There are various classes of ARVs which interfere with viral replication by inhibition of viral enzymes. If the DNA or RNA sequence encoding this enzyme is changed, the result might be an unfit virus not capable of further infection or replication. On the other hand, if this mutation results in an ARV-resistant virus, replication and infection can still continue in the presence of the ARV in question, possibly causing the ARV to become ineffective in stopping replication.&lt;/p&gt;
&lt;p&gt;The question remains, why do people develop resistance? The short answer: it’s a numbers game.&lt;/p&gt;
&lt;p&gt;If the patient receives the correct regimen of ARVs (known as &lt;strong&gt;HAART&lt;/strong&gt;, or highly active antiretroviral treatment) and is taking the doses correctly, the viral load will be suppressed. Suppression is caused by stopping viral replication, and if the virus is not replicating, the error-prone reverse transcriptase can’t cause mutations, which in turn cannot be favoured by selective pressure. If the patient is not taking any treatment, the virus is replicating and thus inevitably mutating, but there is no selective pressure to select for these variants. Lastly, if the patient is adhering poorly to the treatment, there are times when the levels of the treatment are too low to suppress viral replication completely. In this scenario, mutants with a mutation which makes them less susceptible to the treatment will replicate more than their wild-type counterparts - these are called escape mutants.&lt;/p&gt;
&lt;p&gt;The reason why this is a numbers game is that the virus is mutating randomly and one resulting amino acid residue could be replaced by any of 19 other amino acid residues. It is only when this change causes an increase in replicative fitness while there is some form of selective pressure that this mutant can become a dominant quasi-species and the patient develops resistance.&lt;/p&gt;
&lt;p&gt;Mutations are expressed using the notation &lt;code&gt;[WT AA][POS][Mutant AA]&lt;/code&gt;, where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;WT denotes wild type (the typical genotype)&lt;/li&gt;
&lt;li&gt;AA denotes amino acid residue&lt;/li&gt;
&lt;li&gt;POS denotes the position on the protein&lt;/li&gt;
&lt;li&gt;Mutant means the changed genotype&lt;/li&gt;
&lt;/ul&gt;
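&lt;p&gt;As a minimal sketch of this notation, the base R snippet below splits a label such as M184V (a methionine, M, at position 184 replaced by valine, V) into its three parts. The helper &lt;code&gt;parse_mutation()&lt;/code&gt; and its column names are hypothetical and only for illustration; it assumes simple single-residue substitutions written with one-letter amino acid codes.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper: split labels of the form [WT AA][POS][Mutant AA].
# Assumes one-letter amino acid codes and a single substitution per label.
parse_mutation &amp;lt;- function(x) {
  # Each match holds the full label followed by the three captured groups
  m &amp;lt;- regmatches(x, regexec(&amp;quot;^([A-Z])([0-9]+)([A-Z])$&amp;quot;, x))
  data.frame(
    mutation  = x,
    wild_type = vapply(m, function(p) p[2], character(1)),
    position  = as.integer(vapply(m, function(p) p[3], character(1))),
    mutant    = vapply(m, function(p) p[4], character(1))
  )
}

parse_mutation(c(&amp;quot;M184V&amp;quot;, &amp;quot;K65R&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Labels that do not follow the pattern come back with missing values rather than raising an error, which makes unexpected entries easy to spot.&lt;/p&gt;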
&lt;p&gt;We mentioned some classes of ARVs in part 1. To the viral reverse transcriptase, &lt;strong&gt;NRTIs&lt;/strong&gt; (Nucleoside/Nucleotide Reverse Transcriptase Inhibitors) look like the building blocks of DNA called nucleotides. If the reverse transcriptase incorporates one of these ‘fake’ nucleotides, it is not able to further extend the DNA strand, leaving it incomplete, thus interfering with replication. Not all mutations cause the same level of resistance. These levels are:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Total score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;Susceptible&lt;/td&gt;
&lt;td&gt;0 to 9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;Potential low-level resistance&lt;/td&gt;
&lt;td&gt;10 to 14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;Low-level resistance&lt;/td&gt;
&lt;td&gt;15 to 29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;Intermediate resistance&lt;/td&gt;
&lt;td&gt;30 to 59&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;High-level resistance&lt;/td&gt;
&lt;td&gt;&amp;gt;= 60&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;a href=&#34;https://hivdb.stanford.edu/page/release-notes/&#34;&gt;Source&lt;/a&gt;&lt;/p&gt;
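&lt;p&gt;The table can be turned into a small lookup. The sketch below uses base R’s &lt;code&gt;cut()&lt;/code&gt; to map a total score to its level; the function name &lt;code&gt;resistance_level()&lt;/code&gt; is hypothetical, scores are assumed to be whole numbers, and treating negative totals as susceptible is our assumption rather than something stated in the table.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper: map a total score to the levels in the table above.
# Assumptions: whole-number scores; negative totals count as susceptible.
resistance_level &amp;lt;- function(total_score) {
  cut(
    total_score,
    breaks = c(-Inf, 9, 14, 29, 59, Inf),  # upper bound of each level
    labels = c(
      &amp;quot;Susceptible&amp;quot;,
      &amp;quot;Potential low-level resistance&amp;quot;,
      &amp;quot;Low-level resistance&amp;quot;,
      &amp;quot;Intermediate resistance&amp;quot;,
      &amp;quot;High-level resistance&amp;quot;
    )
  )
}

resistance_level(c(0, 12, 15, 45, 60))&lt;/code&gt;&lt;/pre&gt;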
&lt;p&gt;We can plot resistance scores for five commonly used NRTIs.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;suppressPackageStartupMessages({
  library(dplyr)
  library(readr)
  library(stringr)
  library(tidyr)
  library(ggplot2)
  library(knitr)
  library(broom)
})&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nrti_dr_scores &amp;lt;- read_tsv(&amp;quot;ScoresNRTI_1555579653110.tsv&amp;quot;, col_types = &amp;quot;cdcddddddd&amp;quot;)

nrti_dr_scores %&amp;gt;% 
  select(Rule, ABC:AZT, FTC:TDF) %&amp;gt;% 
  gather(arv, score, 2:6) %&amp;gt;% 
  filter(!grepl(&amp;quot; &amp;quot;, Rule)) %&amp;gt;% 
  mutate(effect = ifelse(score &amp;gt; 0, &amp;quot;resistance&amp;quot;, &amp;quot;hyper-susceptible&amp;quot;)) %&amp;gt;% 
  
  ggplot(aes(x = Rule, y = score, fill = effect)) +
  geom_col() +
  coord_flip() +
  theme_bw() +
  facet_grid(. ~ arv)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-1-1.png&#34; width=&#34;960&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We can see that 3TC and FTC have the exact same profiles, and they are chemically also very similar, as shown in the figure below.&lt;/p&gt;
&lt;hr /&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;
&lt;img src=&#34;/post/2019-05-07-analysis-hiv-pandemic-part-2_files/lamivu10.gif&#34; alt=&#34;The chemical structures of 3TC (left) and FTC (right). Available at http://aras.ab.ca/articles/HAART-Nukes-AIDS-Umber&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
The chemical structures of 3TC (left) and FTC (right). Available at &lt;a href=&#34;http://aras.ab.ca/articles/HAART-Nukes-AIDS-Umber&#34; class=&#34;uri&#34;&gt;http://aras.ab.ca/articles/HAART-Nukes-AIDS-Umber&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;p&gt;Also, note that some of the mutations increase susceptibility to AZT and TDF, indicated by a negative resistance score. This is called &lt;strong&gt;hyper-susceptibility&lt;/strong&gt;, and clinicians exploit it when treating patients.&lt;/p&gt;
&lt;p&gt;For example, the mutation &lt;strong&gt;M184V&lt;/strong&gt; means that the wild type AA at position 184 is a methionine (M) and it has been mutated to valine (V). Although this mutation makes the virus highly resistant to 3TC, it has a crippling effect on viral replication, i.e., the virus can still replicate in the presence of 3TC, but more slowly. This mutation also makes the virus hyper-susceptible to AZT and TDF. Clinicians use this knowledge by keeping patients on 3TC to maintain the selective pressure for M184V, and using AZT or TDF as the other NRTI. It is typical to have a patient on two NRTIs, sometimes referred to as the “backbone”, plus one drug from another drug class to which the patient is fully susceptible. Knowing the genotype of the virus allows us to infer the phenotype, which in this case is the drug-resistance profile.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;phylopi-an-affordable-purpose-built-phylogenetic-pipeline-for-the-hiv-drug-resistance-testing-facility&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility&lt;/h2&gt;
&lt;p&gt;The goal of HIV drug resistance genotyping is to determine which drugs will produce the best response in the patient, and, as mentioned earlier, we use the viral sequence information for this. Because HIV evolves so rapidly, we can also use the sequence information for quality assurance. &lt;strong&gt;PCR&lt;/strong&gt; (polymerase chain reaction) is very sensitive to contamination, and if gross cross-contamination occurred during this process, the sequences of, say, two unrelated individuals might be very similar. Also, the viral sequences of one patient over time will be more similar to each other than to the sequences from different people.&lt;/p&gt;
&lt;p&gt;Let’s say we genotyped a patient five years ago and we have a current genotype sequence. It should be possible to retrieve the previous sequence from a database of sequences without relying on identifiers only, or at all. Someone who remarries may change their surname, and transcription errors can be made; both make finding previous samples tedious and error-prone. So instead of using patient information to look for previous samples to include, we can use the sequence data itself, and then confirm that the sequences belong to the same patient, or investigate any irregularities. If we suspect mother-to-child transmission from our analysis, we confirm this with the health care worker who sent the sample.&lt;/p&gt;
&lt;p&gt;We recently published an automated pipeline that maintains a sequence database, retrieves the most similar sequences from previously genotyped viral isolates, calculates genetic distances, and performs phylogenetic inference. Let’s look at each of these steps.&lt;/p&gt;
&lt;p&gt;Firstly, we cannot conduct phylogenetic analysis on all past and present sequences; this would be very computationally expensive and time-consuming, and the result will be very difficult to interpret. Rather, we want to focus on the current batch of sequences the laboratory generated, but also the most similar sequences from previous batches stored in our rolling database:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We used a tool called &lt;a href=&#34;https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&amp;amp;PAGE_TYPE=BlastDocs&amp;amp;DOC_TYPE=Download&#34;&gt;&lt;code&gt;BLAST&lt;/code&gt;&lt;/a&gt; (Basic Local Alignment Search Tool) for this. This tool is used to add our new submissions to the current rolling database and then also retrieve the most similar previous sequences.&lt;/li&gt;
&lt;li&gt;These sequences are aligned using &lt;a href=&#34;https://mafft.cbrc.jp/alignment/software/&#34;&gt;&lt;code&gt;MAFFT&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The resulting multiple sequence alignment is automatically curated with &lt;a href=&#34;http://trimal.cgenomics.org/&#34;&gt;&lt;code&gt;trimAl&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, the sequences are ready for phylogenetic inference.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For this, we used &lt;a href=&#34;http://www.microbesonline.org/fasttree/&#34;&gt;&lt;code&gt;FastTree&lt;/code&gt;&lt;/a&gt;. As its name implies, it is fast, and it can handle large datasets while requiring minimal resources.&lt;/li&gt;
&lt;li&gt;The resulting tree is rendered using the &lt;a href=&#34;http://etetoolkit.org/&#34;&gt;&lt;code&gt;ETE3&lt;/code&gt;&lt;/a&gt; Python API.&lt;/li&gt;
&lt;li&gt;R is used to calculate a distance matrix from the multiple sequence alignment using the &lt;a href=&#34;https://cran.r-project.org/web/packages/ape/index.html&#34;&gt;&lt;code&gt;ape&lt;/code&gt;&lt;/a&gt; library and &lt;a href=&#34;https://plot.ly/r/&#34;&gt;&lt;code&gt;plotly&lt;/code&gt;&lt;/a&gt; for visualization.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In part 3 of this series, we will talk more about the distance matrix calculation and how logistic regression was used to look at inter- and intra-patient genetic distances of HIV sequences by mining a large public database at the &lt;a href=&#34;https://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html&#34;&gt;Los Alamos HIV sequence database&lt;/a&gt;. This was important, as the insights gained here were used to colour the distance matrix so that the user’s attention is drawn to relevant samples.&lt;/p&gt;
&lt;p&gt;This is an R for medicine blog post, but there is a lot of jargon in the paragraphs above. We will clear things up a bit in the next section; for the full details, please check out our &lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;publication&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;how-does-it-work&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;How does it work?&lt;/h2&gt;
&lt;p&gt;Firstly, our DNA sequences are strings consisting of an alphabet: A, C, G, and T. Also, genetic distances are much like &lt;a href=&#34;https://en.wikipedia.org/wiki/Levenshtein_distance&#34;&gt;Levenshtein&lt;/a&gt; or &lt;a href=&#34;https://en.wikipedia.org/wiki/Hamming_distance&#34;&gt;Hamming&lt;/a&gt; distances, or other &lt;a href=&#34;https://en.wikipedia.org/wiki/Edit_distance&#34;&gt;edit distance&lt;/a&gt; algorithms.&lt;/p&gt;
&lt;div id=&#34;raw-strings&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Raw strings&lt;/h3&gt;
&lt;p&gt;Consider the following strings, A, B and C:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;A: peter kicked the ball really far
B: i think it was yesterday when peter kicked the ball really far
C: pieter kicked the round ball really hard&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can see that there are obvious similarities between these three sentences, but comparing them would be much easier if they were aligned.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;aligned-strings&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Aligned strings&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;A: ______________________________p eter kicked the _____ ball really far
B: i think it was yesterday when p eter kicked the _____ ball really far
C: ______________________________pieter kicked the round ball really hard&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once the strings are aligned, it is much easier to calculate their similarities or differences.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;curated-strings&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Curated strings&lt;/h3&gt;
&lt;p&gt;Next, we remove the overhangs, since it is possible that in reality strings A and C also had more text on the left-hand side that was simply not sampled. Depending on the situation, one could also remove the internal ‘gaps’, like the word ‘round’. For our pipeline, insertions and deletions, like the letter ‘i’ and the word ‘round’ in our example, are real features we would like to include. We also have a substitution in C, where the ‘f’ in A and B was changed to an ‘h’.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;A: p eter kicked the _____ ball really far
B: p eter kicked the _____ ball really far
C: pieter kicked the round ball really har&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;calculation&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Calculation&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;A: p eter kicked the _____ ball really far
B: p eter kicked the _____ ball really far
M: 111111 111111 111 11111 1111 111111 111&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can see for A and B we have matches for all of the features. If we sum up all the ones, we get 33, so the distance between them:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[ d = \frac{33 - 33}{33} = 0\]&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;B: p eter kicked the _____ ball really far
C: pieter kicked the round ball really har
M: 101111 111111 111 00000 1111 111111 011&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[ d = \frac{33 - 26}{33} = 0.212\]&lt;/span&gt;&lt;/p&gt;
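&lt;p&gt;The same arithmetic can be sketched in a few lines of base R. This is a toy illustration, not the distance calculation the pipeline performs: the function &lt;code&gt;aligned_distance()&lt;/code&gt; is hypothetical, and it assumes that gaps are written as &lt;code&gt;_&lt;/code&gt; (including the gap in ‘p_eter’) so that a plain space can be treated as a word separator and left out of the count, as in the match strings above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy distance between two aligned strings of equal length.
# Assumption: &amp;quot;_&amp;quot; marks a gap; a space is only a word separator.
aligned_distance &amp;lt;- function(a, b) {
  x &amp;lt;- strsplit(a, &amp;quot;&amp;quot;)[[1]]
  y &amp;lt;- strsplit(b, &amp;quot;&amp;quot;)[[1]]
  stopifnot(length(x) == length(y))
  keep &amp;lt;- x != &amp;quot; &amp;quot; | y != &amp;quot; &amp;quot;          # drop the word separators
  sum(x[keep] != y[keep]) / sum(keep)  # mismatches / compared positions
}

A &amp;lt;- &amp;quot;p_eter kicked the _____ ball really far&amp;quot;
B &amp;lt;- &amp;quot;p_eter kicked the _____ ball really far&amp;quot;
C &amp;lt;- &amp;quot;pieter kicked the round ball really har&amp;quot;

aligned_distance(A, B)
round(aligned_distance(B, C), 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these inputs, 33 positions are compared in each pair; A and B differ at none of them and B and C differ at 7, which gives the distances of 0 and 0.212 worked out above.&lt;/p&gt;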
&lt;p&gt;After the multiple sequence alignment and curation, each sequence is compared to every other sequence in order to calculate a distance matrix. This matrix can then be used to create a phylogenetic tree, which resembles the dendrogram produced by hierarchical clustering. The above is very simplified, but should give enough background to understand the rest of the post. The resource at &lt;a href=&#34;https://www.ebi.ac.uk/training/online/course/introduction-phylogenetics/what-phylogenetics&#34;&gt;EMBL-EBI Train Online&lt;/a&gt; is a good place to get started if you want to know more.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;the-pipeline-on-a-raspberry-pi&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The pipeline on a Raspberry Pi&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.raspberrypi.org/&#34;&gt;Raspberry Pi&lt;/a&gt; is a small and cheap single-board computer. Many hobbyists use it for all kinds of projects, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://pyvideo.org/pycon-us-2012/militarizing-your-backyard-with-python-computer.html&#34;&gt;Militarizing Your Backyard with Python: Computer Vision and the Squirrel Hordes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://hackaday.com/2013/01/20/raspberry-pi-and-r/&#34;&gt;Brewing beer with the help of R&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://retropie.org.uk/&#34;&gt;Retro gaming machines&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One of the motivations behind developing this computer was to teach kids to &lt;a href=&#34;http://blog.sparkfuneducation.com/teaching-coding-to-kids-using-raspberry-pi-3-and-scratch&#34;&gt;code or engage in electronics&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;All of the above are very important, but the Raspberry Pi has made its way into &lt;strong&gt;science and medicine&lt;/strong&gt; as well. For example, a group developed a cheap &lt;a href=&#34;https://pubs.rsc.org/en/content/articlehtml/2017/sc/c7sc03281a&#34;&gt;instrument&lt;/a&gt; to diagnose Ebola virus infection in the field. Researchers can attach various sensors to the Raspberry Pi and use it for data collection.&lt;/p&gt;
&lt;div id=&#34;benchmarking&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Benchmarking&lt;/h3&gt;
&lt;p&gt;For our application, we needed to show that the Pi can handle the problem we wanted it to solve, so we did some benchmarking.&lt;/p&gt;
&lt;p&gt;We used &lt;a href=&#34;https://www.seleniumhq.org/&#34;&gt;Selenium WebDriver&lt;/a&gt; to operate the pipeline as a human would, by actually browsing for an input file and submitting it through the button. Time stamps were taken for each step, and the number of blast hits that were included in the phylogenetic inference was also recorded. For this exercise, we set the number of closest sequences to retrieve for each sample to 5, which means the submitted sample and 4 of the genetically closest samples. However, it is possible that different submitted sequences have retrieved a sequence in common; these will be included in the analysis only once. When we start analyzing this data, we will see this.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Read csv with time data
time_dat &amp;lt;- read_csv(
  &amp;quot;timeFile.csv&amp;quot;, 
  col_types = &amp;quot;ccd&amp;quot;,
  col_names = c(&amp;quot;Run&amp;quot;, &amp;quot;Description&amp;quot;, &amp;quot;Measure&amp;quot;)
)

head(time_dat) %&amp;gt;% 
  kable(caption = &amp;quot;First few lines of the benchmarking data.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:import&#34;&gt;Table 1: &lt;/span&gt;First few lines of the benchmarking data.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;Run&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Description&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;Measure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;blastHits&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;5.000000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;blast&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;11.219230&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;mafftTime&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;13.404623&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;trimalTime&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.111737&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;fasttreeTime&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.986582&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;heatmapTime&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.354820&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The &lt;code&gt;Run&lt;/code&gt; column shows some info regarding the benchmarking experiment. We know we asked for the five best hits to be included; the sequences were pseudo-randomly selected. We started with one sequence for submission and then incremented this by one up to 50. The table above again shows that data do not always arrive in a convenient format. We need to extract the digits at the end of the &lt;code&gt;Run&lt;/code&gt; variable. Previously we used the &lt;code&gt;tidyr::gather()&lt;/code&gt; function to pivot data from wide to long. This time we will use the &lt;code&gt;spread()&lt;/code&gt; function to make long data wide.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;time_dat &amp;lt;- time_dat %&amp;gt;% 
  mutate(nSubmitted = str_extract(Run, &amp;quot;\\d+$&amp;quot;) %&amp;gt;% as.numeric) %&amp;gt;% 
  select(-Run ) %&amp;gt;% 
  spread(Description, Measure)

head(time_dat) %&amp;gt;% 
  kable(caption = &amp;quot;First few lines of the benchmarking data after some cleaning.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-2&#34;&gt;Table 2: &lt;/span&gt;First few lines of the benchmarking data after some cleaning.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;right&#34;&gt;nSubmitted&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;blast&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;blastHits&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;fasttreeTime&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;heatmapTime&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;mafftTime&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;renderTime&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;trimalTime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;11.21923&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;5&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.986582&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.354820&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;13.40462&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.686239&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.1117370&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;right&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;22.08694&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3.129514&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.369152&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;30.26920&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.890183&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.2699649&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;right&#34;&gt;3&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;33.67705&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;15&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;5.480334&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.400223&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;47.42213&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.107776&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.4849610&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;43.58782&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;21&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4.627502&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.437273&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;76.47209&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.243336&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.7980120&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;right&#34;&gt;5&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;55.43246&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;25&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;10.753521&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.476636&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;105.21836&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.494058&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.0820050&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;right&#34;&gt;6&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;65.18629&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;30&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;9.688977&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.516058&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;128.93219&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.653201&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.4656579&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We got rid of the useless data in the &lt;code&gt;Run&lt;/code&gt; variable and extracted the useful information into the &lt;code&gt;nSubmitted&lt;/code&gt; variable.&lt;/p&gt;
&lt;p&gt;Below are the explanations for the variables.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;nSubmitted&lt;/code&gt;: Number of sequences submitted or uploaded to the pipeline&lt;/li&gt;
&lt;li&gt;&lt;code&gt;blast&lt;/code&gt;: time in seconds for blast to find most similar previously sequenced samples&lt;/li&gt;
&lt;li&gt;&lt;code&gt;blastHits&lt;/code&gt;: the number of sequences retrieved&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mafftTime&lt;/code&gt;: the time it took to create a multiple-sequence alignment&lt;/li&gt;
&lt;li&gt;&lt;code&gt;trimalTime&lt;/code&gt;: the time it took to clean the multiple-sequence alignment&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fasttreeTime&lt;/code&gt;: the time it took for phylogenetic inference&lt;/li&gt;
&lt;li&gt;&lt;code&gt;heatmapTime&lt;/code&gt;: the time it took to produce the heatmap&lt;/li&gt;
&lt;li&gt;&lt;code&gt;renderTime&lt;/code&gt;: the time it took to render the tree&lt;/li&gt;
&lt;/ul&gt;
&lt;div id=&#34;number-of-sequences-submitted-vs.-most-similar-sequences-retrieved&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Number of sequences submitted &lt;em&gt;vs.&lt;/em&gt; most similar sequences retrieved&lt;/h4&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;time_dat %&amp;gt;%
  ggplot(aes(x = nSubmitted, y = blastHits)) +
  geom_smooth(method = lm, se = FALSE, colour = &amp;quot;black&amp;quot;, formula = y ~ x - 1, size = 0.25) +
  geom_point() +
  theme_bw() +
  xlab(&amp;quot;Number of sequences submitted&amp;quot;) +
  ylab(&amp;quot;Number of sequences retrieved using blastn&amp;quot;) +
  annotate(&amp;quot;text&amp;quot;, x = 41, y = 72, label = &amp;quot;y == 4.628 * x&amp;quot;, parse = TRUE) +
  annotate(&amp;quot;text&amp;quot;, x = 40, y = 60, label = &amp;quot;R^2 == 0.998&amp;quot;, parse = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;- lm(blastHits ~ nSubmitted - 1, data = time_dat)
tidy(fit) %&amp;gt;% 
  kable(caption = &amp;quot;Regression analysis of the number of blast hits retrieved.&amp;quot;) &lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-4&#34;&gt;Table 3: &lt;/span&gt;Regression analysis of the number of blast hits retrieved.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;term&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;estimate&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;std.error&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;statistic&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p.value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;nSubmitted&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4.628026&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0280312&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;165.1026&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;A straight line fits the data really well. We mentioned that if different sequences retrieve the same sequence from the database, it is used only once. The slope of this line will depend on the genetic diversity of the database. A more diverse database will have a steeper slope, whereas a less diverse database will have a shallower slope. Also, theoretically, at some point the line will reach an asymptote as the number of requested sequences starts to saturate the number of available sequences. Practically, one would not have to submit more than 16 to 24 samples at a time; thus, we are in the linear part of the rarefaction curve. We can see from this that, for the Los Alamos data used in the analysis, roughly 4.5 to 4.6 sequences are retrieved for every sequence submitted (the fitted slope is 4.628).&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;blast-time-vs.-number-of-sequences-submitted&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;BLAST time &lt;em&gt;vs.&lt;/em&gt; number of sequences submitted&lt;/h4&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;time_dat %&amp;gt;%
  ggplot(aes(x = nSubmitted, y = blast)) +
  geom_smooth(method = lm, se = FALSE, colour = &amp;quot;black&amp;quot;, formula = y ~ x, size = 0.25) +
  geom_point(colour = &amp;quot;blue&amp;quot;) +
  theme_bw() +
  xlab(&amp;quot;Number of input sequences&amp;quot;) + ylab(&amp;quot;Time in seconds (blastn)&amp;quot;) +
  annotate(&amp;quot;text&amp;quot;, x = 41, y = 90, label = &amp;quot;y == 11.0453 * x&amp;quot;, parse = TRUE) +
  annotate(&amp;quot;text&amp;quot;, x = 40, y = 60, label = &amp;quot;R^2 == 0.9999&amp;quot;, parse = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;- lm(time_dat$blast ~ time_dat$nSubmitted)
tidy(fit) %&amp;gt;% 
  kable(caption = &amp;quot;Regression analysis of blastn time vs. number of sequences.&amp;quot;) &lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-6&#34;&gt;Table 4: &lt;/span&gt;Regression analysis of blastn time vs. number of sequences.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;term&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;estimate&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;std.error&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;statistic&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p.value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;(Intercept)&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.8176139&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.5185500&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-1.576731&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.121426&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;time_dat$nSubmitted&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;11.0453236&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0176978&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;624.105409&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.000000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Again, we see a linear relationship between the number of sequences submitted to &lt;code&gt;blastn&lt;/code&gt; and the time it takes to complete. For every sequence submitted, it takes about 11 seconds to search a database of about 11,000 sequence entries. We can say that &lt;code&gt;blastn&lt;/code&gt; displays linear time complexity, or &lt;span class=&#34;math inline&#34;&gt;\(O(n)\)&lt;/span&gt; time, in the number of submitted sequences. We did not discover anything new here. Remember, the purpose of this is to show off the Pi flexing its muscles. (You can read about the BLAST algorithm &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pubmed/2231712&#34;&gt;here&lt;/a&gt;.)&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;multiple-sequence-alignment-time-vs.-number-of-total-sequences-submitted-and-retrieved&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Multiple sequence alignment time &lt;em&gt;vs.&lt;/em&gt; number of total sequences, submitted and retrieved&lt;/h4&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;- lm(mafftTime ~ I(blastHits^2) - 1, data = time_dat)

time_dat %&amp;gt;%
  ggplot(aes(x = blastHits, y = mafftTime)) +
  geom_point(colour = &amp;quot;blue&amp;quot;) +
  geom_smooth(method = &amp;quot;lm&amp;quot;,formula = y ~ I(x^2) - 1, colour = &amp;quot;black&amp;quot;, size = 0.25) +
  annotate(&amp;quot;text&amp;quot;, x = 190, y = 1800, label = &amp;quot;y == 0.09997 * x^2&amp;quot;, parse = TRUE) +
  theme_bw() +
  xlab(&amp;quot;Number of sequences in alignment&amp;quot;) + 
  ylab(&amp;quot;Time in seconds (MAFFT)&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tidy(fit) %&amp;gt;% 
  kable(caption = &amp;quot;Regression analysis of multiple sequence alignment.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-8&#34;&gt;Table 5: &lt;/span&gt;Regression analysis of multiple sequence alignment.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;term&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;estimate&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;std.error&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;statistic&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p.value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;I(blastHits^2)&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.099974&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0004048&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;246.9813&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Since in multiple sequence alignment each sequence is aligned with every other sequence, we would expect &lt;span class=&#34;math inline&#34;&gt;\(O(N^2)\)&lt;/span&gt; time complexity. Our regression result is very close to what we expect, and the fitted coefficient is about a tenth of a second. Thus, if we analysed 16 sequences, the alignment would contain about &lt;span class=&#34;math inline&#34;&gt;\(16 * 4.5 = 72\)&lt;/span&gt; sequences, and the multiple-sequence alignment would take &lt;span class=&#34;math inline&#34;&gt;\(0.09997 * 72^2 = 518\)&lt;/span&gt; seconds or ~8.6 minutes, which is not bad. Also consider that you can submit your samples and walk away.&lt;/p&gt;
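&lt;p&gt;To repeat this back-of-the-envelope estimate for other batch sizes, the fitted coefficients from Tables 3 to 5 can be wrapped in a small function. This is only a sketch: &lt;code&gt;estimate_runtime()&lt;/code&gt; and its argument names are hypothetical, the coefficients are copied from the tables above, and the estimates only make sense within the benchmarked range of 1 to 50 submitted sequences on this particular database.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper built from the regression estimates in Tables 3 to 5.
estimate_runtime &amp;lt;- function(n_submitted, hits_per_seq = 4.628) {
  n_aligned &amp;lt;- hits_per_seq * n_submitted         # Table 3 slope by default
  data.frame(
    n_submitted = n_submitted,
    n_aligned   = n_aligned,
    blast_sec   = -0.8176 + 11.0453 * n_submitted,  # Table 4
    mafft_sec   = 0.099974 * n_aligned^2            # Table 5
  )
}

# The rounder figure of 4.5 hits per sequence reproduces the estimate above
estimate_runtime(16, hits_per_seq = 4.5)

# Fitted slope, for a typical and a large batch
estimate_runtime(c(16, 24))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With 4.5 hits per sequence, the MAFFT estimate for 16 submitted sequences comes out at about 518 seconds, matching the calculation above; the fitted slope of 4.628 gives a slightly larger alignment and therefore a somewhat longer estimate.&lt;/p&gt;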
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;impact&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Impact&lt;/h3&gt;
&lt;p&gt;It is important to mention that PhyloPi is not used for tracking or detecting transmission clusters, but rather offers a way of automating phylogenetic analysis. Some patients will be genotyped more than once, and these sequences will cluster very closely on a phylogenetic tree. This offers a spot check into the quality of the results. Sometimes we find that the patient has two different first names, which they interchangeably use depending on the health care worker and patient language preference. We have also detected sample swaps which otherwise would have gone unnoticed.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;what-next&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;What next?&lt;/h2&gt;
&lt;p&gt;In part 3, we will discuss how the inter- and intrapatient HIV genetic distances were analyzed using logistic regression to gain insights into the probability distribution of these two classes. This is also where we asked Andrie from RStudio for help. It was useful for us biologists and virologists to have someone not just to oversee the analysis we did, but also to implement the correct analysis to get the job done. Hope to see you in the next section!&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Analysing the HIV pandemic, Part 1: HIV in sub-Sahara Africa</title>
      <link>https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/</link>
      <pubDate>Tue, 30 Apr 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/</guid>
      <description>
        


&lt;p&gt;&lt;em&gt;Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Sabeehah Vawda is a pathologist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio&lt;/em&gt;&lt;/p&gt;
&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.immunology.org/public-information/bitesized-immunology/pathogens-and-disease/human-immunodeficiency-virus-hiv&#34;&gt;Human Immunodeficiency Virus&lt;/a&gt; (&lt;strong&gt;HIV&lt;/strong&gt;) is the virus that causes acquired immunodeficiency syndrome (&lt;strong&gt;AIDS&lt;/strong&gt;). The virus invades various immune cells, causing loss of immunity and thus increased susceptibility to infections such as tuberculosis, and to cancer. In a recent publication in &lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;PLoS ONE&lt;/a&gt;, the authors described how they used affordable hardware to create a &lt;a href=&#34;https://en.wikipedia.org/wiki/Phylogenetics&#34;&gt;phylogenetic&lt;/a&gt; pipeline, tailored for the HIV drug resistance testing facility. In this series of blog posts, we highlight the serious problem of HIV infection in sub-Saharan Africa, with a closer look at the situation in South Africa.&lt;/p&gt;
&lt;div id=&#34;stages-of-hiv-infection&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Stages of HIV infection&lt;/h3&gt;
&lt;p&gt;HIV infection can be divided into three consecutive stages: acute primary infection, the asymptomatic stage, and the symptomatic stage.&lt;/p&gt;
&lt;p&gt;The first stage, &lt;strong&gt;acute primary infection&lt;/strong&gt;, has symptoms very much like flu and may last for a week or two. The body reacts with an immune response, which results in the production of antibodies to fight the HIV infection. This process is called seroconversion and can last a couple of months. During this stage, although the patient is infected and the virus is spreading through the body, the patient might not test positive. This initial period of seroconversion is called ‘the window period’ and depends on the type of test used. Rapid tests are done at the point of care. This means that the test can be done at the clinic with a finger prick and the result is ready in 20 minutes. The drawback of this test is a window period of three months and a small false positive rate. The rapid test detects HIV antibodies, and because the immune system needs some time to produce sufficient antibodies to be detected, there is this window period. Most laboratories these days use fourth-generation &lt;a href=&#34;https://www.immunology.org/public-information/bitesized-immunology/experimental-techniques/enzyme-linked-immunosorbent-assay&#34;&gt;ELISA&lt;/a&gt; (Enzyme-Linked Immunosorbent Assay) for HIV diagnosis and confirmation. This technique detects both HIV antibodies and antigens. Antigens are the foreign objects that the immune system recognizes as ‘non-self’; in this case, it is the viral protein p24. The advantage of this technique is a window period of only one month.&lt;/p&gt;
&lt;p&gt;This first stage, including the window period, is then followed by the &lt;strong&gt;asymptomatic stage&lt;/strong&gt;, which may last for as long as ten years. During this stage, the infected person does not experience symptoms and feels healthy. However, the virus is still replicating and destroying immune cells, especially CD4 cells. This damages the immune system and ultimately leads to stage 3 if not treated. This does not mean that people at stage 3 are doomed, but the earlier treatment starts, the better the outcome.&lt;/p&gt;
&lt;p&gt;Stage 3 is referred to as &lt;strong&gt;symptomatic HIV infection or AIDS&lt;/strong&gt; (Acquired Immune Deficiency Syndrome). At this stage, the immune system is so weak that it cannot fight off bacterial or fungal pathogens that typically do not cause disease in immunocompetent people. These serious infections are called opportunistic infections, and have high morbidity and mortality rates.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;transmission-and-epidemiology&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Transmission and epidemiology&lt;/h3&gt;
&lt;p&gt;Worldwide, approximately 36.9 million people are living with HIV (UNAIDS).&lt;/p&gt;
&lt;p&gt;HIV is transmitted mainly by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Having unprotected sex&lt;/li&gt;
&lt;li&gt;Using non-sterile needles or sharing needles in drug use&lt;/li&gt;
&lt;li&gt;Mother-to-child transmission during birth or breastfeeding&lt;/li&gt;
&lt;li&gt;Infected blood transfusions, transplants or other medical procedures (very unlikely)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We mentioned the window period of the HIV infection as well as the asymptomatic stage. During any of the stages, it is possible to transmit the infection. The problem with the window period is an unknown HIV status or falsely assumed negative status, and during the asymptomatic stage, there is no reason for the infected person to seek medical attention. There are obviously behavioural issues in HIV transmission, and due to the long asymptomatic phase, HIV-positive status can be unknown for a long period. For these reasons, it is important that high-risk individuals do frequent HIV tests to determine their status.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;treatment-for-hiv-infection&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Treatment for HIV infection&lt;/h3&gt;
&lt;p&gt;HIV is treatable but not (yet) curable. The good news, however, is that if a person receives &lt;strong&gt;antiretroviral (ARV) treatment&lt;/strong&gt;, their viral load is suppressed (viral replication stops) and the chance of transmitting HIV decreases drastically.&lt;/p&gt;
&lt;p&gt;So 30 years into this pandemic, the big question is, why is HIV still a problem?&lt;/p&gt;
&lt;p&gt;Not all countries adopted the use of ARVs in an equal manner. Although AZT (Zidovudine) was the first drug to be approved by the &lt;a href=&#34;https://www.fda.gov/forpatients/illness/hivaids/history/ucm151074.htm&#34;&gt;FDA&lt;/a&gt; in March 1987, it was soon discovered that monotherapy with only AZT was not effective for very long, as the virus developed resistance to the medicine quickly. Since then, ARVs have come a long way, and patients are placed on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;HAART&lt;/strong&gt; (Highly Active Antiretroviral Treatment), or&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;cART&lt;/strong&gt; (combination Antiretroviral Treatment), which typically consists of 3 drugs of different classes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;hiv-in-africa&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;HIV in Africa&lt;/h2&gt;
&lt;p&gt;Let’s look at the rates of HIV infection in different African countries. The CIA World Factbook provides HIV infection rate &lt;a href=&#34;https://www.cia.gov/LIBRARY/publications/the-world-factbook/rankorder/rawdata_2155.txt&#34;&gt;data&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;suppressPackageStartupMessages({
  library(dplyr)
  library(readr)
  library(stringr)
  library(tidyr)
  library(ggplot2)
  library(forcats)
  library(knitr)
  library(maptools)
  library(viridis)
  library(RColorBrewer)
  library(mapproj)
  library(broom)
  library(ggrepel)
  library(sf)
})&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# read the HIV data
# (file_path is assumed to hold the path to the folder with the downloaded data files)
HIV_rate_2016 &amp;lt;- read_csv(
  file.path(file_path, &amp;quot;HIV rates.csv&amp;quot;), col_names = TRUE, col_types = &amp;quot;cd&amp;quot;
  )

# read the Africa shape file
africa &amp;lt;-
  sf::st_read(
    file.path(file_path, &amp;quot;Africa_SHP/Africa.shp&amp;quot;), 
    stringsAsFactors = FALSE, quiet = TRUE
    ) %&amp;gt;%
  rename(Country = &amp;quot;COUNTRY&amp;quot;) %&amp;gt;%
  left_join(HIV_rate_2016, by = &amp;quot;Country&amp;quot;)

africa %&amp;gt;%
  ggplot(aes(fill = Rate)) +
  geom_sf() +
  coord_sf() +
  scale_fill_viridis(option = &amp;quot;plasma&amp;quot;) +
  theme_minimal()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/plot_map-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;In the choropleth above, we see that South Africa, Botswana, Lesotho, and Swaziland seem to have the highest rates of infection. The rate is presented as the percentage of the population infected, which takes population size into account. It is important to understand that the level of denial is inversely related to the reported rate of infection. Even in this day and age, denial of stigmatized diseases is an issue.&lt;/p&gt;
&lt;div id=&#34;cleaning-the-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Cleaning the data&lt;/h3&gt;
&lt;p&gt;We can also look at the burden of HIV as the number of people infected, and we might get a different picture from what we saw from the choropleth.&lt;/p&gt;
&lt;p&gt;Here, we read in the &lt;a href=&#34;http://apps.who.int/gho/data/node.main.626&#34;&gt;data&lt;/a&gt;, and rename the columns to &lt;code&gt;Country&lt;/code&gt;, &lt;code&gt;PersCov&lt;/code&gt; (percentage ARV coverage), &lt;code&gt;NumberOnARV&lt;/code&gt; (Number of patients on ARVs), and &lt;code&gt;NumberInfected&lt;/code&gt; (Number of patients infected).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Read the CSV with the ARV coverage and infection data
arv_dat &amp;lt;- read_csv(file.path(file_path, &amp;quot;ARV cov 2017.csv&amp;quot;), 
  col_types = &amp;quot;cccc&amp;quot;,
  col_names = c(&amp;quot;Country&amp;quot;, &amp;quot;PersCov&amp;quot;, &amp;quot;NumberOnARV&amp;quot;, &amp;quot;NumberInfected&amp;quot;),
  skip = 1
)

head(arv_dat)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 4
##   Country             PersCov    NumberOnARV NumberInfected           
##   &amp;lt;chr&amp;gt;               &amp;lt;chr&amp;gt;      &amp;lt;chr&amp;gt;       &amp;lt;chr&amp;gt;                    
## 1 Afghanistan         No data    790         No data                  
## 2 Albania             42 [40-44] 570         1400 [1300-1400]         
## 3 Algeria             80 [75-87] 11000       14 000 [13 000-15 000]   
## 4 Andorra             No data    No data     No data                  
## 5 Angola              26 [22-30] 78700       310 000 [260 000-360 000]
## 6 Antigua and Barbuda No data    No data     No data&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This data is messy in several ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The original variable names are very long; they are descriptive but difficult to work with, so we changed them during import&lt;/li&gt;
&lt;li&gt;The values contain confidence intervals in brackets, which are difficult to work with as-is&lt;/li&gt;
&lt;li&gt;We might want to transform “No data” to &lt;code&gt;NA&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;We are interested in Sub-Saharan Africa, but the data is for the whole world&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# A list of Sub-Saharan countries
sub_sahara &amp;lt;- readLines(file.path(file_path, &amp;quot;Sub-Saharan.txt&amp;quot;))

clean_column &amp;lt;- function(x){
  # Remove the ranges in brackets and convert the values to numeric
  x %&amp;gt;% 
    str_replace_all(&amp;quot;\\[.*?\\]&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;% 
    str_replace_all(&amp;quot;&amp;lt;&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;%
    str_replace_all(&amp;quot; &amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;% 
    as.numeric()
}

arv_dat &amp;lt;- 
  arv_dat %&amp;gt;% 
  filter(Country %in% sub_sahara) %&amp;gt;% 
  na_if(&amp;quot;No data&amp;quot;) %&amp;gt;% 
  mutate_at(2:4, clean_column)

head(arv_dat)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 4
##   Country      PersCov NumberOnARV NumberInfected
##   &amp;lt;chr&amp;gt;          &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;          &amp;lt;dbl&amp;gt;
## 1 Angola            26       78700         310000
## 2 Benin             55       38400          70000
## 3 Botswana          84      318000         380000
## 4 Burkina Faso      65       61400          94000
## 5 Burundi           77       60100          78000
## 6 Cameroon          49      254000         510000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We first keep only the Sub-Saharan countries and change “No data” to &lt;code&gt;NA&lt;/code&gt;. We then use a regular expression to remove the square-bracket ranges, strip the “&amp;lt;” sign and the spaces within numbers, and convert the characters to numbers. (Note that some countries are not available in the ARV data, e.g., Swaziland and Reunion.)&lt;/p&gt;
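&lt;p&gt;To see what each cleaning step does, here is a small self-contained sketch. The helper name &lt;code&gt;clean_column_demo()&lt;/code&gt; is hypothetical; the first three input values appear in the raw data printed earlier, and the fourth is a made-up value that illustrates the “&amp;lt;” sign.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(stringr)

# Hypothetical helper applying the same steps as clean_column(), one at a time
clean_column_demo &amp;lt;- function(x) {
  x &amp;lt;- str_replace_all(x, &amp;quot;\\[.*?\\]&amp;quot;, &amp;quot;&amp;quot;)  # drop the bracketed range
  x &amp;lt;- str_replace_all(x, &amp;quot;&amp;lt;&amp;quot;, &amp;quot;&amp;quot;)           # drop the less-than sign
  x &amp;lt;- str_replace_all(x, &amp;quot; &amp;quot;, &amp;quot;&amp;quot;)           # drop spaces inside numbers
  as.numeric(x)
}

raw &amp;lt;- c(&amp;quot;42 [40-44]&amp;quot;, &amp;quot;14 000 [13 000-15 000]&amp;quot;, &amp;quot;No data&amp;quot;,
         &amp;quot;&amp;lt;100 [&amp;lt;100-&amp;lt;200]&amp;quot;)
raw[raw == &amp;quot;No data&amp;quot;] &amp;lt;- NA  # same effect as na_if() on a character vector

clean_column_demo(raw)  # 42, 14000, NA, 100&lt;/code&gt;&lt;/pre&gt;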
&lt;/div&gt;
&lt;div id=&#34;highest-infected-countries&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Highest infected countries&lt;/h3&gt;
&lt;p&gt;Now let’s look at the countries with the highest number of infected people of all ages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;arv_dat %&amp;gt;% 
  top_n(4, wt = NumberInfected) %&amp;gt;% 
  arrange(-NumberInfected) %&amp;gt;% 
  kable(
    caption = &amp;quot;Countries with the highest number of HIV infections&amp;quot;
  )&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-1&#34;&gt;Table 1: &lt;/span&gt;Countries with the highest number of HIV infections&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;Country&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;PersCov&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;NumberOnARV&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;NumberInfected&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;South Africa&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;61&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4359000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;7200000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Nigeria&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;33&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1040000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3100000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Mozambique&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;54&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1156000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2100000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Kenya&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;75&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1122000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1500000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We can see that South Africa has the highest number of HIV-infected people in Sub-Saharan Africa.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;hiv-in-southern-africa&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;HIV in Southern Africa&lt;/h2&gt;
&lt;p&gt;In South Africa, the first AIDS-related death occurred in 1985. It was only in 2004 that ARVs became available in the public sector, and even then eligibility restrictions applied, so not all HIV-infected patients received treatment.&lt;/p&gt;
&lt;p&gt;Ideally, a country would have all its HIV-infected people on treatment, but due to financial constraints, this is not always possible. In South Africa, patients were only initiated on ARVs when their CD4 counts dropped below a certain level. This threshold was initially 200 cells/µL in 2004, and was later raised to 350 cells/µL and then 500 cells/µL. These recommendations were a compromise between the availability of funds and getting ARVs to the people needing them the most. CD4 cells are a major component of the immune system; the lower the CD4 cell count, the higher the chance of opportunistic infections. Thus, the idea was to support the patients most likely to contract an opportunistic infection.&lt;/p&gt;
&lt;p&gt;The problem with this approach was that only about a third of the HIV-infected people in South Africa were receiving HAART. In 2017, the guidelines changed to test and treat; i.e., any newly diagnosed patient will receive HAART. This is a big improvement for many reasons, most notably a lower transmission rate. If a patient is taking HAART and it is effective in suppressing viral replication, the chances of the patient transmitting the virus are very close to zero.&lt;/p&gt;
&lt;p&gt;However, these treatments are not without side effects, which in some cases lead to very poor adherence to the treatment. Numerous other factors contribute to poor adherence as well, specifically socio-economic factors and depression. There is also ignorance and the “fear of knowing”, which causes people not to know their status. Finally, human nature brings with it various other complexities, such as conspiracy theories, and religious and personal beliefs. This would be a very long post if we delved into all the issues, but the take-home message is: the situation is complicated.&lt;/p&gt;
&lt;div id=&#34;arv-coverage-by-country&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;ARV coverage by country&lt;/h3&gt;
&lt;p&gt;We looked at the rate of HIV infections, and also the number of people infected, in the most endemic countries. We have talked about treatment. It would be interesting to look at ARV coverage by country.&lt;/p&gt;
&lt;p&gt;Let’s see how these countries rank by ARV coverage:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;arv_dat %&amp;gt;%
  drop_na(PersCov) %&amp;gt;%  # tidyr: drop only the rows with missing coverage
  ggplot(aes(x = reorder(Country, PersCov), y = PersCov)) +
  geom_point(aes(colour = NumberInfected), size = 3) +
  scale_colour_viridis(
    name = &amp;quot;Number of people infected&amp;quot;, 
    trans = &amp;quot;log10&amp;quot;,
    option = &amp;quot;plasma&amp;quot;
  ) +
  coord_flip() +
  ylab(&amp;quot;% ARV coverage&amp;quot;) + xlab(&amp;quot;Country&amp;quot;) +
  theme_bw()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/plot_rank-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This shows that Zimbabwe, Namibia, Botswana, and Rwanda have the highest ARV coverage (above 80%). South Africa has the highest number of infections (as we saw before), and coverage of just above 60%.&lt;/p&gt;
&lt;p&gt;Botswana rolled out their treatment program in 2002, and by mid-2005, about half of the eligible population received ARV treatment. South Africa, on the other hand, only started treatment in 2004, as discussed above.&lt;/p&gt;
&lt;p&gt;When talking about treatment, we should also look at the changes in mortality.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;hiv-related-deaths&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;HIV-related deaths&lt;/h3&gt;
&lt;p&gt;Read in the &lt;a href=&#34;http://apps.who.int/gho/data/node.main.623?lang=en&#34;&gt;data&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;hiv_mort &amp;lt;- 
  read_csv(file.path(file_path, &amp;quot;HIV deaths.csv&amp;quot;), col_types = &amp;quot;ccccc&amp;quot;) %&amp;gt;% 
  na_if(&amp;quot;No data&amp;quot;) %&amp;gt;% 
  mutate_at(vars(starts_with(&amp;quot;Deaths&amp;quot;)), clean_column) %&amp;gt;% 
  filter(Country %in% sub_sahara)

head(hiv_mort)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 5
##   Country      Deaths_2017 Deaths_2010 Deaths_2005 Deaths_2000
##   &amp;lt;chr&amp;gt;              &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;
## 1 Angola             13000       10000        7900        3900
## 2 Benin               2500        2600        4300        2600
## 3 Botswana            4100        5900       13000       15000
## 4 Burkina Faso        2900        5400       12000       15000
## 5 Burundi             1700        5400        8600        8500
## 6 Cameroon           24000       25000       26000       17000&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary(hiv_mort)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    Country           Deaths_2017      Deaths_2010      Deaths_2005    
##  Length:43          Min.   :   100   Min.   :   100   Min.   :   100  
##  Class :character   1st Qu.:  1900   1st Qu.:  1975   1st Qu.:  2050  
##  Mode  :character   Median :  4400   Median :  5400   Median :  8250  
##                     Mean   : 15442   Mean   : 23483   Mean   : 33227  
##                     3rd Qu.: 16250   3rd Qu.: 27250   3rd Qu.: 48250  
##                     Max.   :150000   Max.   :200000   Max.   :260000  
##                     NA&amp;#39;s   :3        NA&amp;#39;s   :3        NA&amp;#39;s   :3       
##   Deaths_2000    
##  Min.   :   100  
##  1st Qu.:  1150  
##  Median :  6500  
##  Mean   : 26496  
##  3rd Qu.: 41500  
##  Max.   :130000  
##  NA&amp;#39;s   :3&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The 2017 mean for the dataset as a whole (15,442 deaths per country) is about half of the 2005 mean (33,227). It would be interesting to plot this data, but the plot would probably be too busy with every country included. Instead, we can look at the countries that changed the most.&lt;/p&gt;
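&lt;p&gt;The comparison of yearly means can be checked directly. The sketch below copies the four means from the &lt;code&gt;summary(hiv_mort)&lt;/code&gt; output; the vector name &lt;code&gt;mean_deaths&lt;/code&gt; is hypothetical.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Mean deaths per country, copied from the summary() output above
mean_deaths &amp;lt;- c(`2000` = 26496, `2005` = 33227, `2010` = 23483, `2017` = 15442)

# The 2017 mean as a fraction of the mean in each year
round(mean_deaths[[&amp;quot;2017&amp;quot;]] / mean_deaths, 2)
# 2000: 0.58, 2005: 0.46, 2010: 0.66, 2017: 1.00&lt;/code&gt;&lt;/pre&gt;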
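&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;hiv_mort &amp;lt;- hiv_mort %&amp;gt;% 
  mutate(
    # Row-wise range over columns 2 to 4, i.e. 2017, 2010 and 2005
    # (Deaths_2000 is not part of this range)
    min = pmin(Deaths_2017, Deaths_2010, Deaths_2005),
    max = pmax(Deaths_2017, Deaths_2010, Deaths_2005),
    Change = max - min
  )&lt;/code&gt;&lt;/pre&gt;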
&lt;p&gt;Next, we can create a plot of the data, and look at the top five countries with the biggest change in HIV-related mortality.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;hiv_mort %&amp;gt;%
  top_n(5, wt = Change) %&amp;gt;%
  gather(Year, Deaths, Deaths_2017:Deaths_2000) %&amp;gt;% 
  na.omit() %&amp;gt;%
  mutate(
    Year = str_replace(Year, &amp;quot;Deaths_&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;% as.numeric(),
    Country = fct_reorder(Country, Deaths)
  ) %&amp;gt;% 
  ggplot(aes(x = Year, y = Deaths, color = Country)) +
  geom_line(size = 1) +
  geom_vline(xintercept = 2004, color = &amp;quot;black&amp;quot;, linetype = &amp;quot;dotted&amp;quot;, size = 1.5) +
  scale_color_viridis(option = &amp;quot;D&amp;quot;, discrete = TRUE) +
  theme_bw() +
  theme(legend.position = &amp;quot;bottom&amp;quot;)  &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/plot_hiv_mort-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Remember, we mentioned that &lt;strong&gt;HAART&lt;/strong&gt; (Highly Active Antiretroviral Treatment) was introduced in 2004 in South Africa, depicted here by the black dotted line. It is easy to appreciate the dramatic effect the introduction of ARVs had in South Africa.&lt;/p&gt;
&lt;p&gt;Although the picture above is positive, the fight is not over. The target is to get at least 90% of HIV-infected patients on treatment. Adherence to ARV regimens remains crucial, not only to suppress viral replication but also to minimize the development of drug resistance.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;infection-rates&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Infection rates&lt;/h3&gt;
&lt;p&gt;As mentioned earlier, if a patient is taking and responding to treatment, the viral load is suppressed and the chances of transmitting the infection become very close to zero. Thus, the more patients with an undetectable viral load, the lower the transmission rate.&lt;/p&gt;
&lt;p&gt;Read the &lt;a href=&#34;http://aidsinfo.unaids.org/?did=5b4eaa7cdddb54192bb39714&amp;amp;r=world&amp;amp;t=null&amp;amp;tb=d&amp;amp;bt=dnli&amp;amp;ts=null&amp;amp;tr=world&amp;amp;tl=2&#34;&gt;data&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;new_infections &amp;lt;- 
  read_csv(file.path(file_path, 
    &amp;quot;Epidemic transition metrics_Trend of new HIV infections.csv&amp;quot;), 
    na = &amp;quot;...&amp;quot;, 
    col_types = cols(
      .default = col_character(),
      `2017_1` = col_double()
    )
  ) %&amp;gt;% 
  select(
    -ends_with(&amp;quot;_upper&amp;quot;), 
    -ends_with(&amp;quot;lower&amp;quot;), 
    -ends_with(&amp;quot;_1&amp;quot;)
  ) %&amp;gt;% 
  mutate_at(-1, clean_column) %&amp;gt;%
  na.omit()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Duplicated column names deduplicated: &amp;#39;2017&amp;#39; =&amp;gt; &amp;#39;2017_1&amp;#39; [26]&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;new_infections %&amp;gt;% 
  gather(Year, NewInfections, 2:9) %&amp;gt;% 
  ggplot(aes(x = Year, y = NewInfections, color = Country)) +
  geom_point() +
  theme_classic() +
  theme(legend.position = &amp;quot;none&amp;quot;) +
  xlab(&amp;quot;Year&amp;quot;) + 
  ylab(&amp;quot;Number of new infections&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/new_infections-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This is a bit busy. Highly endemic countries with good ARV coverage and infection prevention programs should show a steeper decline in the number of newly infected people. At first glance, the data points for some countries look fairly linear. Let’s go with that assumption, and apply linear regression to each country.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rates_modeled &amp;lt;- 
  new_infections %&amp;gt;% 
  filter(Country %in% sub_sahara) %&amp;gt;% 
  na.omit() %&amp;gt;% 
  gather(Year, NewInfections, 2:9) %&amp;gt;% 
  mutate(Year = as.numeric(Year)) %&amp;gt;% 
  group_by(Country) %&amp;gt;% 
  do(tidy(lm(NewInfections ~ Year, data = .))) %&amp;gt;% 
  filter(term == &amp;quot;Year&amp;quot;) %&amp;gt;% 
  ungroup() %&amp;gt;% 
  mutate(
    Country = fct_reorder(Country, estimate, .desc = TRUE)
  ) %&amp;gt;% 
  arrange(desc(estimate)) %&amp;gt;% 
  select(-one_of(&amp;quot;term&amp;quot;, &amp;quot;statistic&amp;quot;))

rates_modeled %&amp;gt;% 
  head() %&amp;gt;% 
  kable(
    caption = &amp;quot;Results of linear regression: Rate of new infections per year&amp;quot;
  )&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-3&#34;&gt;Table 2: &lt;/span&gt;Results of linear regression: Rate of new infections per year&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;Country&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;estimate&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;std.error&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p.value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Madagascar&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;469.04762&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;12.56126&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Côte d’Ivoire&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;190.47619&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;153.99689&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.2623441&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Botswana&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;130.95238&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;92.46968&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.2064860&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Mali&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;108.33333&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;23.21683&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0034452&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Congo&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;103.57143&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;16.45271&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0007486&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Eritrea&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;89.28571&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;23.05347&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0082374&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rates_modeled %&amp;gt;%
  na.omit() %&amp;gt;% 
  ggplot(aes(x = Country, y = estimate, fill = p.value &amp;gt;= 0.05)) +
  geom_col() +
  coord_flip() +
  theme_bw() +
  ylab(&amp;quot;Estimated change in HIV infection (people/year)&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/rates_model_plot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;A quick look at the plot above shows that, for most countries, the slope of the linear model is statistically significant at the 0.05 level.&lt;/p&gt;
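&lt;p&gt;As a reminder of how the estimate and p-value are read for a single country, here is a minimal sketch on made-up numbers. The data frame &lt;code&gt;toy&lt;/code&gt; is hypothetical and does not correspond to any real country; it describes a decline of about 50 new infections per year with a small wobble.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(broom)

# Hypothetical series for one imaginary country
toy &amp;lt;- data.frame(
  Year = 2010:2017,
  NewInfections = 1000 - 50 * (0:7) + rep(c(5, -5), times = 4)
)

fit &amp;lt;- lm(NewInfections ~ Year, data = toy)
coefs &amp;lt;- tidy(fit)
slope &amp;lt;- coefs[coefs$term == &amp;quot;Year&amp;quot;, ]

slope$estimate           # negative: fewer new infections each year
slope$p.value &amp;lt; 0.05  # TRUE here, so the trend counts as significant&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A negative estimate means new infections are falling, and the p-value indicates whether that trend is distinguishable from no change at all.&lt;/p&gt;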
&lt;p&gt;It is important to note here that the data we have at hand is from 2010 to 2017. This shows that some countries - notably, South Africa - are on a good trajectory. Botswana, being the “Poster Child” of a good HIV treatment and prevention program, seems to have stabilized in terms of rate of infection, with a positive but insignificant estimate of the rate of infection. This could be explained by the following reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;First African country to introduce HAART, 2002&lt;/li&gt;
&lt;li&gt;Progressive in terms of prevention programs&lt;/li&gt;
&lt;li&gt;Looking only from 2010, we are missing the dramatic decline in infection&lt;/li&gt;
&lt;li&gt;The &lt;a href=&#34;https://www.who.int/&#34;&gt;WHO&lt;/a&gt; goal is to get 90% of a country’s infected people on HAART, but the last 5-7% might be the hardest to convince&lt;/li&gt;
&lt;/ul&gt;
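&lt;p&gt;The estimates and p-values in the plot come from per-country linear fits. As a minimal sketch of what one such fit involves, the example below uses a made-up series for a single hypothetical country (the numbers are illustrative, not taken from the data set) and labels the slope with the same 0.05 cutoff.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical yearly counts of new infections for one country (illustrative only)
toy &amp;lt;- data.frame(
  year = 2010:2017,
  new_infections = c(5200, 5050, 4900, 4700, 4650, 4400, 4300, 4100)
)

# Fit a straight line through the yearly counts
fit &amp;lt;- lm(new_infections ~ year, data = toy)
coefs &amp;lt;- summary(fit)$coefficients

estimate &amp;lt;- coefs[&amp;quot;year&amp;quot;, &amp;quot;Estimate&amp;quot;]   # change in new infections per year
p_value &amp;lt;- coefs[&amp;quot;year&amp;quot;, &amp;quot;Pr(&amp;gt;|t|)&amp;quot;]    # p-value for the slope

# A slope counts as significant when its p-value is below the cutoff
label &amp;lt;- if (p_value &amp;lt; 0.05) &amp;quot;Significant&amp;quot; else &amp;quot;Insignificant&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A negative &lt;code&gt;estimate&lt;/code&gt; means fewer new infections each year, and a p-value of 0.05 or more means the trend cannot be distinguished from a flat line at this cutoff.&lt;/p&gt;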
&lt;p&gt;We can combine the ARV and estimated rates of infection data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;arv_on_infection &amp;lt;- 
  arv_dat %&amp;gt;% 
  left_join(rates_modeled, by = &amp;quot;Country&amp;quot;) %&amp;gt;% 
  mutate(p_interpretation = if_else(p.value &amp;lt; 0.05, &amp;quot;Significant&amp;quot;, &amp;quot;Insignificant&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Column `Country` joining character vector and factor, coercing
## into character vector&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;arv_on_infection %&amp;gt;% 
  na.omit() %&amp;gt;% 
  ggplot(aes(x = PersCov, y = estimate, 
             shape = p_interpretation)) +
  geom_point(aes(color = NumberInfected), size = 2) +
  geom_text_repel(aes(label = Country), size = 3) +
  scale_color_gradient(high = &amp;quot;red&amp;quot;, low = &amp;quot;blue&amp;quot;) +
  theme_grey() +
  xlab(&amp;quot;% ARV coverage&amp;quot;) + 
  ylab(&amp;quot;Estimated change in HIV infection\n(people/year)&amp;quot;) +
  ggtitle(&amp;quot;Antiretroviral (ARV) coverage&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/arv_infection-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;South Africa has the highest number of infected people but, on the positive side, is on a downward trajectory, with about 15,000 fewer people newly infected each year. Although ARVs play a crucial role in controlling this epidemic, they are not the only factor involved. Prevention of mother-to-child transmission has been very successful in South Africa, and awareness campaigns and education are playing a big role as well. The plot above shows our linearly modeled rates.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;the-laboratory-hiv-diagnosis-and-monitoring&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The laboratory, HIV diagnosis and monitoring&lt;/h2&gt;
&lt;p&gt;HIV-related laboratory tests are not the only diagnostics done in a Virology department, but in endemic countries, they account for the majority of tests performed. The first HIV-related test is for diagnosis, which is done differently in adults than in infants. As we discussed earlier, after HIV infection, the immune system develops antibodies. We can use a field of study called &lt;strong&gt;serology&lt;/strong&gt; to detect antibodies and antigens, and in most cases, an ELISA test is performed to confirm HIV seroconversion or status. Since the mother’s antibodies will be present in the infant, an ELISA can report the baby as positive even when the baby is not infected. Infants are therefore diagnosed by detecting viral RNA or DNA in their blood. This is done by PCR (Polymerase Chain Reaction).&lt;/p&gt;
&lt;p&gt;Once a patient is diagnosed as HIV-positive, the patient will be initiated on HAART, and in most cases, the viral load will be suppressed. In the South African public sector treatment program, after HAART initiation, the patient gets two six-monthly viral load tests to make sure viral replication is suppressed. To keep an eye out for trouble, a yearly viral load is done to confirm adherence and effectiveness of the treatment.&lt;/p&gt;
&lt;p&gt;When an unsuppressed viral load is detected, action is taken and adherence counselling is performed. If this does not solve the problem, drug-resistance testing is performed to assess the resistance profile of the infection in order to adjust the ARV regimen accordingly. This is done by isolating the viral RNA, converting it to DNA, and amplifying the DNA to sufficient quantities to enable sequencing. In our laboratory, we use &lt;a href=&#34;https://en.wikipedia.org/wiki/Sanger_sequencing&#34;&gt;Sanger sequencing&lt;/a&gt;, but other sequencing technologies also exist.&lt;/p&gt;
&lt;hr /&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span id=&#34;fig:unnamed-chunk-4&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;/post/2019-05-01-analysis-hiv-pandemic-part-1_files/hxb2genome.gif&#34; alt=&#34;HIV Genome as depicted by the Los Alamos HIV sequence database. Available at https://www.hiv.lanl.gov/content/sequence/HIV/MAP/landmark.html&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: HIV Genome as depicted by the Los Alamos HIV sequence database. Available at &lt;a href=&#34;https://www.hiv.lanl.gov/content/sequence/HIV/MAP/landmark.html&#34; class=&#34;uri&#34;&gt;https://www.hiv.lanl.gov/content/sequence/HIV/MAP/landmark.html&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;p&gt;This diagram depicts the genome of HIV. The most common targets for interfering with viral replication are located in the &lt;em&gt;pol&lt;/em&gt; gene. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;prot&lt;/strong&gt;: The viral protease. Many of the viral proteins are translated as longer polypeptides, which are then cleaved into mature proteins by the protease.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;p51 RT&lt;/strong&gt;: The viral reverse transcriptase: Each virion contains two copies of viral RNA. The reverse transcriptase converts the RNA to DNA.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;p31 int&lt;/strong&gt;: The viral integrase: This enzyme integrates the reverse transcribed viral DNA into host genomes of the infected cells, and establishes chronic infection.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Essentially, ARVs interfere with these viral enzymes by inhibiting their action:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Protease inhibitors&lt;/strong&gt; prevent the maturation of viral proteins.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reverse transcriptase inhibitors&lt;/strong&gt; prevent the formation of a DNA copy of the viral genome, which then gives the integrase nothing to work with.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrase inhibitors&lt;/strong&gt; prevent the integration of viral DNA into the host genome, which is a crucial part of replication and infection.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Combining these ARVs in clever ways results in HAART or cART. By sequencing the viral RNA, we can detect mutations that cause resistance to specific ARVs. This information is then used to adjust the ARV regimen to once again effectively suppress viral replication.&lt;/p&gt;
&lt;p&gt;The viral reverse transcriptase has a high error rate when converting RNA to DNA, and introduces random mutations in the viral genome. In the presence of selective pressure like ARVs, these random mutations might give advantageous phenotypic traits to the replicating virus, like drug resistance. On the other hand, if the patient is properly adhering to the treatment, viral replication is suppressed, and without replication, new mutations cannot arise.&lt;/p&gt;
&lt;p&gt;This high rate of mutation can be used in the laboratory as one of the quality-control tools. Because the virus mutates so quickly, sequences from different patients are expected to differ, so unexpectedly similar sequences can point to a problem. The polymerase chain reaction is prone to contamination, so it is possible when doing these reactions that one sample might contaminate another. This will give rise to false mutations in the contaminated sample and an erroneous result for the treating clinician, and thus has a direct negative impact on the patient.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;what-next&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;What next?&lt;/h2&gt;
&lt;p&gt;In a recent publication in &lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;PLoS ONE&lt;/a&gt;, the authors described how they used affordable hardware to create a &lt;a href=&#34;https://en.wikipedia.org/wiki/Phylogenetics&#34;&gt;phylogenetic&lt;/a&gt; pipeline, tailored for the HIV drug resistance testing facility.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In &lt;strong&gt;Part 2&lt;/strong&gt; of this four-part series, we will discuss this pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In &lt;strong&gt;Part 3&lt;/strong&gt;, we will discuss genetic distances and phylogenetics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, in &lt;strong&gt;Part 4&lt;/strong&gt;, we will look at the application of logistic regression in analyzing inter- and intra-patient genetic distance of viral sequences.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See you in the next section!&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Statistics in Glaucoma: Part II</title>
      <link>https://rviews.rstudio.com/2018/12/07/statistics-in-glaucoma-part-ii/</link>
      <pubDate>Fri, 07 Dec 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/12/07/statistics-in-glaucoma-part-ii/</guid>
      <description>
        


&lt;p&gt;&lt;em&gt;Samuel Berchuck is a Postdoctoral Associate in Duke University’s Department of Statistical Science and Forge-Duke’s Center for Actionable Health Data Science.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Joshua L. Warren is an Assistant Professor of Biostatistics at Yale University.&lt;/em&gt;&lt;/p&gt;
&lt;div id=&#34;analyzing-visual-field-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Analyzing Visual Field Data&lt;/h2&gt;
&lt;p&gt;In Part I of this series on statistics in glaucoma, we detailed the use of visual fields for understanding functional vision loss in glaucoma patients. Before discussing a new method for modeling visual field data that accounts for the anatomy of the eye, we discuss how visual field data is typically analyzed by introducing a common diagnostic metric, point-wise linear regression (PLR). PLR is a trend-based diagnostic that uses slope p-values from the location-specific linear regressions to discriminate progression status. The motivation for PLR is straightforward: it assumes that large negative slopes at numerous visual field locations are indicative of progression. This is characteristic of a large class of methods for analyzing visual field data that attempt to discriminate progression based on changes in the DLS across time. This technique is simple, intuitive, and effective; however, it is often limited due to the naivete of modeling assumptions, including the independence of visual field locations.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ocular-anatomy-in-the-neighborhood-structure-of-the-visual-field&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Ocular Anatomy in the Neighborhood Structure of the Visual Field&lt;/h2&gt;
&lt;p&gt;To properly account for the spatial dependencies on the visual field, Berchuck et al. 2018 introduce a neighborhood model that incorporates anatomical information through a dissimilarity metric. Details of the method can be found in Berchuck et al. 2018, but we provide a quick introduction. The key development is the specification of the neighborhood structure through a new definition of adjacency weights. Typically in areal data, the adjacency for two locations &lt;span class=&#34;math inline&#34;&gt;\(i\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(j\)&lt;/span&gt; is defined as &lt;span class=&#34;math inline&#34;&gt;\(w_{ij} = 1(i \sim j)\)&lt;/span&gt;, where &lt;span class=&#34;math inline&#34;&gt;\(i \sim j\)&lt;/span&gt; is the event that locations &lt;span class=&#34;math inline&#34;&gt;\(i\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(j\)&lt;/span&gt; are neighbors. As discussed in Part I, this assumption is not sufficient due to the complex anatomy of the eye. To account for this additional structure, a more general adjacency is introduced that is a function of a dissimilarity metric, &lt;span class=&#34;math inline&#34;&gt;\(w_{ij}(\alpha_t) = 1(i \sim j)\exp\{-z_{ij}\alpha_t\}\)&lt;/span&gt;. Here, &lt;span class=&#34;math inline&#34;&gt;\(z_{ij}\)&lt;/span&gt; is a dissimilarity metric that represents the absolute difference between the Garway-Heath angles of locations &lt;span class=&#34;math inline&#34;&gt;\(i\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(j\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;The parameter &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; dictates the importance of the dissimilarity metric at each visual field exam &lt;span class=&#34;math inline&#34;&gt;\(t\)&lt;/span&gt;. When &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; becomes large, the model reduces to an independent process, and as &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; goes to zero, the process becomes the standard spatial model for areal data. Based on the specification of the adjacency weights, &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; has a useful interpretation with respect to deterioration of visual ability. In particular, &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; changing over exams indicates that the neighborhood structure on the visual field is changing, which in turn implies damage to the underlying retinal ganglion cell structure. This observation motivates a diagnostic of progression that quantifies variability in &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; across time. We choose the coefficient of variation (CV) and demonstrate that it is a highly significant predictor of progression and, furthermore, independent of trend-based methods such as PLR.&lt;/p&gt;
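&lt;p&gt;To make the two limiting cases concrete, here is a minimal sketch of the adjacency weights &lt;span class=&#34;math inline&#34;&gt;\(w_{ij}(\alpha_t) = 1(i \sim j)\exp\{-z_{ij}\alpha_t\}\)&lt;/span&gt; on a toy example. The three locations, their angles, the rescaling by 100, and the helper name &lt;code&gt;adjacency_weights&lt;/code&gt; are all assumptions made for illustration; they are not part of &lt;code&gt;womblR&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy example: 3 locations on a line, so only 1-2 and 2-3 are neighbors
adj &amp;lt;- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0), nrow = 3, byrow = TRUE)

# Hypothetical stand-ins for Garway-Heath angles (degrees)
angles &amp;lt;- c(10, 40, 100)

# Dissimilarity z_ij: absolute difference in angles, rescaled (assumed scaling)
z &amp;lt;- abs(outer(angles, angles, &amp;quot;-&amp;quot;)) / 100

# Hypothetical helper implementing w_ij(alpha) = 1(i ~ j) * exp(-z_ij * alpha)
adjacency_weights &amp;lt;- function(alpha) adj * exp(-z * alpha)

w_zero &amp;lt;- adjacency_weights(0)      # equals adj: the standard areal model
w_mid &amp;lt;- adjacency_weights(1)       # neighbors down-weighted by dissimilarity
w_large &amp;lt;- adjacency_weights(1000)  # all weights effectively zero: independence&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Non-neighbors always keep a weight of zero; the dissimilarity only shrinks the weights of locations that are already adjacent.&lt;/p&gt;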
&lt;/div&gt;
&lt;div id=&#34;navigating-the-womblr-package&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Navigating the &lt;code&gt;womblR&lt;/code&gt; Package&lt;/h2&gt;
&lt;p&gt;To make the method available to clinicians, the R package &lt;code&gt;womblR&lt;/code&gt; was developed. The package provides a suite of functions that walk a user through the full process of analyzing a series of visual fields from beginning to end. The user interface was modeled after other impactful R packages for Bayesian spatial analysis, including &lt;code&gt;spBayes&lt;/code&gt; and &lt;code&gt;CARBayes&lt;/code&gt;. The package name combines Hadley’s naming convention for R packages (i.e., ending a package with the letter R) with the name of the author of the seminal paper on boundary detection, originally referred to as areal wombling (Womble 1951).&lt;/p&gt;
&lt;p&gt;We will now walk through the process of analyzing visual field data, estimating the &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; parameters, and assessing progression status. The main function in &lt;code&gt;womblR&lt;/code&gt; is the Spatiotemporal Boundary Detection with Dissimilarity Metric model function (&lt;code&gt;STBDwDM&lt;/code&gt;). Inference for the method is obtained through Markov chain Monte Carlo (MCMC), which is a computationally intensive method that iterates between updating individual model parameters until enough posterior samples have been collected post-convergence for making accurate posterior inference. Because of the iterative nature of MCMC, the majority of computation is performed within a &lt;code&gt;for&lt;/code&gt; loop, so the package is built on C++ through the packages &lt;code&gt;Rcpp&lt;/code&gt; and &lt;code&gt;RcppArmadillo&lt;/code&gt;. Because of the increased complexity of writing in C++, the pre- and post-processing of the model are done in &lt;code&gt;R&lt;/code&gt; with the &lt;code&gt;for&lt;/code&gt; loop implemented in C++. The MCMC method employed in &lt;code&gt;womblR&lt;/code&gt; is a Metropolis-Hastings within Gibbs algorithm.&lt;/p&gt;
&lt;p&gt;Just as a quick aside, with the more recent advent of probabilistic programming, this model could have been implemented using the Hamiltonian Monte Carlo methods used in software like Stan or PyMC3. These programs do not require the derivation of full conditionals, and push the MCMC algorithm to the background. There is undoubtedly a huge market for this type of software, and it is clearly playing a significant role in the popularization of Bayesian modeling. At the same time, implementing MCMC samplers using &lt;code&gt;Rcpp&lt;/code&gt; with traditional MCMC algorithms can be instructive, and for those with experience, nearly as quick of a coding experience.&lt;/p&gt;
&lt;p&gt;We now begin by formatting the visual field data for analysis. According to the manual, the observed data &lt;code&gt;Y&lt;/code&gt; must first be ordered spatially and then temporally. Furthermore, we will remove all locations that correspond to the natural blind spot (which, in the Humphrey Field Analyzer-II, correspond to locations 26 and 35).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;###Load package
library(womblR)

###Format data
blind_spot &amp;lt;- c(26, 35) # define blind spot
VFSeries &amp;lt;- VFSeries[order(VFSeries$Location), ] # sort by location
VFSeries &amp;lt;- VFSeries[order(VFSeries$Visit), ] # sort by visit
VFSeries &amp;lt;- VFSeries[!VFSeries$Location %in% blind_spot, ] # remove blind spot locations
Y &amp;lt;- VFSeries$DLS # define observed outcome data&lt;/code&gt;&lt;/pre&gt;
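&lt;p&gt;The two successive sorts work because &lt;code&gt;order&lt;/code&gt; leaves ties in their original order, so the location ordering survives within each visit. As a small sketch on a made-up data frame (the values are hypothetical), the same result can be obtained with a single call that sorts by visit and breaks ties by location.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical toy series: 2 visits x 3 locations, deliberately shuffled
toy &amp;lt;- data.frame(Visit = c(2, 1, 2, 1, 1, 2),
                  Location = c(3, 2, 1, 3, 1, 2),
                  DLS = c(23, 25, 21, 27, 30, 22))

# Two-step approach, as in the block above
two_step &amp;lt;- toy[order(toy$Location), ]
two_step &amp;lt;- two_step[order(two_step$Visit), ]

# Single-call alternative: sort by visit, breaking ties by location
one_step &amp;lt;- toy[order(toy$Visit, toy$Location), ]

# Both give all locations of visit 1, then all locations of visit 2
same_order &amp;lt;- identical(two_step$DLS, one_step$DLS)

# With this ordering, each column of the matrix holds one visit
toy_matrix &amp;lt;- matrix(one_step$DLS, ncol = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This layout is what later allows the outcome vector to be reshaped with &lt;code&gt;matrix(VFSeries$DLS, ncol = Nu)&lt;/code&gt;, one column per visit.&lt;/p&gt;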
&lt;p&gt;Now that we have assigned the observed outcomes to &lt;code&gt;Y&lt;/code&gt;, we move on to the temporal variable &lt;code&gt;Time&lt;/code&gt;. For visual field data, we define this to be the time from the baseline visit. We obtain the unique days from the baseline visit and scale them to be on the year scale.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Time &amp;lt;- unique(VFSeries$Time) / 365 # years since baseline visit
print(Time)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.0000000 0.3452055 0.6520548 1.1123288 1.3808219 1.6109589 2.0712329
## [8] 2.3780822 2.5698630&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we assign the adjacency matrix and dissimilarity metric (both discussed in Part I).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;W &amp;lt;- HFAII_Queen[-blind_spot, -blind_spot] # visual field adjacency matrix
DM &amp;lt;- GarwayHeath[-blind_spot] # Garway-Heath angles&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that we have specified the data objects &lt;code&gt;Y&lt;/code&gt;, &lt;code&gt;DM&lt;/code&gt;, &lt;code&gt;W&lt;/code&gt;, and &lt;code&gt;Time&lt;/code&gt;, we will customize the objects that characterize Bayesian MCMC methods, in particular, hyperparameters, starting values, Metropolis tuning values, and MCMC inputs. These objects have been detailed previously in the &lt;code&gt;womblR&lt;/code&gt; package &lt;a href=&#34;https://cran.r-project.org/web/packages/womblR/vignettes/womblR-example.html&#34;&gt;vignette&lt;/a&gt;, so we will not spend time going over their definitions. We will only note that they are each &lt;code&gt;list&lt;/code&gt; objects similar to the &lt;code&gt;spBayes&lt;/code&gt; package. We begin by specifying the hyperparameters.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;###Bounds for temporal tuning parameter phi
TimeDist &amp;lt;- abs(outer(Time, Time, &amp;quot;-&amp;quot;))
TimeDistVec &amp;lt;- TimeDist[lower.tri(TimeDist)]
minDiff &amp;lt;- min(TimeDistVec)
maxDiff &amp;lt;- max(TimeDistVec)
PhiUpper &amp;lt;- -log(0.01) / minDiff # shortest diff goes down to 1%
PhiLower &amp;lt;- -log(0.95) / maxDiff # longest diff goes up to 95%

###Hyperparameter object
Hypers &amp;lt;- list(Delta = list(MuDelta = c(3, 0, 0), OmegaDelta = diag(c(1000, 1000, 1))),
               T = list(Xi = 4, Psi = diag(3)),
               Phi = list(APhi = PhiLower, BPhi = PhiUpper))&lt;/code&gt;&lt;/pre&gt;
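&lt;p&gt;The comments on &lt;code&gt;PhiUpper&lt;/code&gt; and &lt;code&gt;PhiLower&lt;/code&gt; can be checked numerically. The sketch below assumes an exponential temporal correlation of the form &lt;span class=&#34;math inline&#34;&gt;\(\exp\{-\phi d\}\)&lt;/span&gt; for two visits separated by &lt;span class=&#34;math inline&#34;&gt;\(d\)&lt;/span&gt; years (an assumption suggested by the comments, not confirmed here), and uses the visit times copied from the printed output above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Visit times in years, copied from the printed output above
Time &amp;lt;- c(0.0000000, 0.3452055, 0.6520548, 1.1123288, 1.3808219,
          1.6109589, 2.0712329, 2.3780822, 2.5698630)

# Pairwise gaps between visits, as in the hyperparameter block
TimeDist &amp;lt;- abs(outer(Time, Time, &amp;quot;-&amp;quot;))
TimeDistVec &amp;lt;- TimeDist[lower.tri(TimeDist)]
minDiff &amp;lt;- min(TimeDistVec)
maxDiff &amp;lt;- max(TimeDistVec)
PhiUpper &amp;lt;- -log(0.01) / minDiff
PhiLower &amp;lt;- -log(0.95) / maxDiff

# Assumed correlation function: corr(d) = exp(-phi * d)
corr_closest_at_upper &amp;lt;- exp(-PhiUpper * minDiff)   # closest visits, largest phi
corr_farthest_at_lower &amp;lt;- exp(-PhiLower * maxDiff)  # farthest visits, smallest phi&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under this assumption, the bounds keep the temporal correlation between 0.01 (for the two closest visits at the largest &lt;code&gt;Phi&lt;/code&gt;) and 0.95 (for the two most distant visits at the smallest &lt;code&gt;Phi&lt;/code&gt;).&lt;/p&gt;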
&lt;p&gt;Then we specify the starting values for the parameters, Metropolis tuning variances, and MCMC details.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;###Starting values
Starting &amp;lt;- list(Delta = c(3, 0, 0), T = diag(3), Phi = 0.5)

###Metropolis tuning variances
Nu &amp;lt;- length(Time) # calculate number of visits
Tuning &amp;lt;- list(Theta2 = rep(1, Nu), Theta3 = rep(1, Nu), Phi = 1)

###MCMC inputs
MCMC &amp;lt;- list(NBurn = 10000, NSims = 250000, NThin = 25, NPilot = 20)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We specify that our model will run for a burn-in period of 10,000 scans, followed by 250,000 scans post burn-in. In the burn-in period there will be 20 iterations of pilot adaptation evenly spaced out over the period. The final number of samples to be used for inference will be thinned down to 10,000 based on the thinning number of 25. We can now run the MCMC sampler. Details of the various options available in the sampler can be found in the documentation, &lt;code&gt;help(STBDwDM)&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;reg.STBDwDM &amp;lt;- STBDwDM(Y = Y, DM = DM, W = W, Time = Time,
                       Starting = Starting, Hypers = Hypers, Tuning = Tuning, MCMC = MCMC,
                       Family = &amp;quot;tobit&amp;quot;, 
                       TemporalStructure = &amp;quot;exponential&amp;quot;,
                       Distance = &amp;quot;circumference&amp;quot;,
                       Weights = &amp;quot;continuous&amp;quot;,
                       Rho = 0.99,
                       ScaleY = 10, 
                       ScaleDM = 100,
                       Seed = 54)
## Burn-in progress: |*************************************************|
## Sampler progress: 0%.. 10%.. 20%.. 30%.. 40%.. 50%.. 60%.. 70%.. 80%.. 90%.. 100%..  &lt;/code&gt;&lt;/pre&gt;
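&lt;p&gt;The scan counts quoted before the sampler call follow directly from the &lt;code&gt;MCMC&lt;/code&gt; list. A small arithmetic check, assuming the pilot adaptations are evenly spaced over the burn-in as described:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;MCMC &amp;lt;- list(NBurn = 10000, NSims = 250000, NThin = 25, NPilot = 20)

# Samples retained for inference after thinning
kept_samples &amp;lt;- MCMC$NSims / MCMC$NThin

# Burn-in scans between pilot adaptations, if they are evenly spaced
pilot_spacing &amp;lt;- MCMC$NBurn / MCMC$NPilot

# Total number of scans the sampler performs
total_scans &amp;lt;- MCMC$NBurn + MCMC$NSims&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these settings, one would expect the stored posterior samples to contain &lt;code&gt;kept_samples&lt;/code&gt; draws per parameter.&lt;/p&gt;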
&lt;p&gt;We quickly assess convergence by checking the traceplots of &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; (note that further MCMC convergence diagnostics should be used in practice).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;###Load coda package
library(coda)

###Convert alpha to an MCMC object
Alpha &amp;lt;- as.mcmc(reg.STBDwDM$alpha)

###Create traceplot
par(mfrow = c(3, 3))
for (t in 1:Nu) traceplot(Alpha[, t], ylab = bquote(alpha[.(t)]), main = bquote(paste(&amp;quot;Posterior of &amp;quot; ~ alpha[.(t)])))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2018-12-03-statistics-in-glaucoma-part-ii_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;689.28&#34; /&gt;&lt;/p&gt;
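&lt;p&gt;Traceplots are only a first look. As a hedged sketch of the further diagnostics mentioned above, the example below applies two functions from &lt;code&gt;coda&lt;/code&gt;, &lt;code&gt;effectiveSize&lt;/code&gt; and &lt;code&gt;geweke.diag&lt;/code&gt;, to simulated draws. The object &lt;code&gt;AlphaToy&lt;/code&gt; is a made-up stand-in, not output from the fitted model; with a real fit, &lt;code&gt;Alpha&lt;/code&gt; would take its place.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(coda)

# Simulated stand-in for the posterior draws: 2,000 draws for 9 visits
set.seed(1)
AlphaToy &amp;lt;- as.mcmc(matrix(rnorm(2000 * 9, mean = 1, sd = 0.2), ncol = 9))

# Effective sample size for each column (each visit)
ess &amp;lt;- effectiveSize(AlphaToy)

# Geweke z-scores comparing the early and late parts of each chain;
# large absolute values suggest the chain has not converged
gew &amp;lt;- geweke.diag(AlphaToy)$z&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An effective sample size far below the number of stored draws, or Geweke z-scores well outside roughly plus or minus 2, would be a reason to run the sampler longer or retune it.&lt;/p&gt;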
&lt;/div&gt;
&lt;div id=&#34;converting-mcmc-samples-into-clinical-statements&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Converting MCMC Samples into Clinical Statements&lt;/h2&gt;
&lt;p&gt;Now we calculate the posterior distribution of the CV of &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; and print its moments.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;CVAlpha &amp;lt;- apply(Alpha, 1, function(x) sd(x) / mean(x))
plot(density(CVAlpha, adjust = 2), main = expression(&amp;quot;Posterior of CV&amp;quot;~(alpha[t])), xlab = expression(&amp;quot;CV&amp;quot;~(alpha[t])))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2018-12-03-statistics-in-glaucoma-part-ii_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;50%&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;STCV &amp;lt;- c(mean(CVAlpha), sd(CVAlpha), quantile(CVAlpha, probs = c(0.025, 0.975)))
names(STCV)[1:2] &amp;lt;- c(&amp;quot;Mean&amp;quot;, &amp;quot;SD&amp;quot;)
print(STCV)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##       Mean         SD       2.5%      97.5% 
## 0.19121622 0.10205826 0.04636219 0.42744656&lt;/code&gt;&lt;/pre&gt;
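&lt;p&gt;The CV is computed row by row because each row of &lt;code&gt;Alpha&lt;/code&gt; is one posterior draw of &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; across all visits, so each row yields one draw of the CV. A tiny sketch with two hypothetical draws over three visits:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Two hypothetical posterior draws of alpha_t over three visits
AlphaToy &amp;lt;- rbind(c(1, 1, 1),   # alpha_t constant across visits
                  c(1, 2, 3))   # alpha_t changing across visits

# One CV per draw: standard deviation across visits divided by the mean
CVToy &amp;lt;- apply(AlphaToy, 1, function(x) sd(x) / mean(x))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first draw has a CV of zero because nothing changes across visits, while the second has a positive CV, which is the kind of variability the diagnostic is designed to pick up.&lt;/p&gt;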
&lt;p&gt;For this information to be useful clinically, we convert it into a probability of progression based on a model trained on a large cohort of glaucoma patients (Berchuck et al. 2019). Because the information from &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; is independent of trend-based methods, we show that the optimal use of &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt; is combining it with a basic global metric that includes the slope and p-value (and their interaction) of the overall mean at each visual field exam. The trained model coefficients are publicly available and are used below. Furthermore, the mean and standard deviation of the CV of &lt;span class=&#34;math inline&#34;&gt;\(\alpha_t\)&lt;/span&gt;, along with their interaction, are included. The probability of progression can be calculated as follows.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;###Calculate the global metric slope and p-value
MeanSens &amp;lt;- apply(t(matrix(VFSeries$DLS, ncol = Nu)) / 10, 1, mean) # scaled mean DLS
reg.global &amp;lt;- lm(MeanSens ~ Time) # global regression
GlobalS &amp;lt;- summary(reg.global)$coef[2, 1] # global slope
GlobalP &amp;lt;- summary(reg.global)$coef[2, 4] # global p-value

###Obtain probability of progression using estimated parameters from Berchuck et al. 2019
input &amp;lt;- c(1, GlobalP, GlobalS, STCV[1], STCV[2], GlobalS * GlobalP, STCV[1] * STCV[2])
coef &amp;lt;- c(-1.7471655, -0.2502131, -13.7317622, 7.4746348, -8.9152523, 18.6964153, -13.3706058)
fit &amp;lt;- input %*% coef
exp(fit) / (1 + exp(fit))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##           [,1]
## [1,] 0.4355997&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The probability of progression is calculated to be 0.44, which can be compared to the threshold cutoff for the trained model of 0.325. This cutoff for the probability of progression was determined using operating characteristics, so that the specificity was forced to be in the clinically meaningful range of 85%. Based on this derived threshold, the probability of progression is high enough to indicate that this patient’s disease shows evidence of visual field progression (which is reassuring, because we know this patient has progression as determined by clinicians).&lt;/p&gt;
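&lt;p&gt;The last line of the calculation is the inverse logit, which base R provides as &lt;code&gt;plogis&lt;/code&gt;. The sketch below starts from the printed probability rather than the fitted model, recovers the implied linear predictor with &lt;code&gt;qlogis&lt;/code&gt;, and applies the 0.325 cutoff. The variable names are chosen for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Linear predictor implied by the printed probability of 0.4355997
fit_toy &amp;lt;- qlogis(0.4355997)

# Inverse logit: equivalent to exp(fit) / (1 + exp(fit))
prob &amp;lt;- plogis(fit_toy)
prob_manual &amp;lt;- exp(fit_toy) / (1 + exp(fit_toy))

# Compare against the cutoff quoted for the trained model
threshold &amp;lt;- 0.325
progressing &amp;lt;- prob &amp;gt; threshold&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using &lt;code&gt;plogis&lt;/code&gt; avoids overflow in &lt;code&gt;exp&lt;/code&gt; when the linear predictor is very large, which the manual formula does not guard against.&lt;/p&gt;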
&lt;p&gt;&lt;code&gt;Looking ahead:&lt;/code&gt; The third installment will wrap up the discussion on the &lt;code&gt;womblR&lt;/code&gt; package and ponder future directions for the role of statistics in glaucoma research. Furthermore, the role of open-source software in medicine will be discussed.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Berchuck, S.I., Mwanza, J.C., &amp;amp; Warren, J.L. (2018). &lt;a href=&#34;https://arxiv.org/abs/1805.11636&#34;&gt;&lt;em&gt;Diagnosing Glaucoma Progression with Visual Field Data Using a Spatiotemporal Boundary Detection Method&lt;/em&gt;&lt;/a&gt;, In press at &lt;em&gt;Journal of the American Statistical Association&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Womble, W. H. (1951). &lt;a href=&#34;http://science.sciencemag.org/content/114/2961/315&#34;&gt;&lt;em&gt;Differential Systematics&lt;/em&gt;&lt;/a&gt;. &lt;em&gt;Science&lt;/em&gt;, 114(2961), 315-322.&lt;/li&gt;
&lt;li&gt;Berchuck, S.I., Mwanza, J.C., Tanna, A.P., Budenz, D.L., Warren, J.L. (2019). &lt;em&gt;Improved Detection of Visual Field Progression Using a Spatiotemporal Boundary Detection Method&lt;/em&gt;. In press at &lt;em&gt;Scientific Reports&lt;/em&gt; (Available upon request).&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2018/12/07/statistics-in-glaucoma-part-ii/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Statistics in Glaucoma: Part I</title>
      <link>https://rviews.rstudio.com/2018/12/03/statistics-in-glaucoma-part-i/</link>
      <pubDate>Mon, 03 Dec 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/12/03/statistics-in-glaucoma-part-i/</guid>
      <description>
        


&lt;p&gt;&lt;em&gt;Samuel Berchuck is a Postdoctoral Associate in Duke University’s Department of Statistical Science and Forge-Duke’s Center for Actionable Health Data Science.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Joshua L. Warren is an Assistant Professor of Biostatistics at Yale University.&lt;/em&gt;&lt;/p&gt;
&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Glaucoma is a leading cause of blindness worldwide, with a prevalence of 4% in the population aged 40-80. The disease is characterized by retinal ganglion cell death and corresponding damage to the optic nerve head. Since visual impairment caused by glaucoma is irreversible and efficient treatments exist, early detection of the disease is essential. Determining if the disease is progressing remains one of the most challenging aspects of glaucoma management, since it is difficult to distinguish true progression from variability due to natural degradation or noise. In practice, clinicians monitor progression using a multifactorial approach that relies on various measurements of the disease. In this series of blog posts, we focus on the use of visual fields. Visual field examinations obtain levels of a patient’s actual vision, and the practice is thus referred to as a functional measurement. As such, visual fields are a proxy for a patient’s quality of life, and therefore are typically prioritized in practice.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;visual-field-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Visual Field Data&lt;/h2&gt;
&lt;p&gt;Visual fields are complex spatiotemporal data generated from an intricate anatomical system, which is important to understand for modeling purposes. To illustrate visual field data, we load an example data set from the &lt;code&gt;womblR&lt;/code&gt; package on CRAN. The package &lt;code&gt;womblR&lt;/code&gt; was developed specifically for analyzing visual field data, and uses a Bayesian hierarchical model that accounts for the complex nature of the data (more details will be provided in Part II). The specific data set comes from the Vein Pulsation Study Trial in Glaucoma and the Lions Eye Institute trial registry, Perth, Western Australia. We begin by loading the package.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(womblR)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The data set of interest is loaded lazily and can be accessed as follows; we also view the first six rows for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data(VFSeries)
head(VFSeries)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   Visit DLS Time Location
## 1     1  25    0        1
## 2     2  23  126        1
## 3     3  23  238        1
## 4     4  23  406        1
## 5     5  24  504        1
## 6     6  21  588        1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The data object &lt;code&gt;VFSeries&lt;/code&gt; contains a longitudinal series of visual fields for a glaucoma patient that we will use throughout the three blog posts to exemplify the study of visual fields. This patient has been determined to be progressing, based on the expertise of two clinicians. &lt;code&gt;VFSeries&lt;/code&gt; has four variables: &lt;code&gt;Visit&lt;/code&gt;, &lt;code&gt;DLS&lt;/code&gt;, &lt;code&gt;Time&lt;/code&gt;, and &lt;code&gt;Location&lt;/code&gt;. The variable &lt;code&gt;Visit&lt;/code&gt; represents the visual field test visit number, &lt;code&gt;DLS&lt;/code&gt; the observed measure, &lt;code&gt;Time&lt;/code&gt; the time of the visual field test (in days from baseline visit), and &lt;code&gt;Location&lt;/code&gt; the spatial location on the visual field where the observation occurred. There are 9 visual field exams contained in this data set, and on average 117.25 days between visits.&lt;/p&gt;
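&lt;p&gt;The summary figures in the last sentence can be reproduced from the visit days alone. In the sketch below the days are typed in by hand: the first six appear in the &lt;code&gt;head(VFSeries)&lt;/code&gt; output, and the last three are recovered from the yearly visit times printed in Part II (years multiplied by 365). With the package loaded, &lt;code&gt;unique(VFSeries$Time)&lt;/code&gt; should give the same vector, though that is an assumption about the packaged data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Days from baseline for each visit (entered by hand, see note above)
days &amp;lt;- c(0, 126, 238, 406, 504, 588, 756, 868, 938)

# Number of visual field exams
n_exams &amp;lt;- length(days)

# Average number of days between consecutive visits
mean_gap &amp;lt;- mean(diff(days))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The average gap is the total follow-up of 938 days divided by the 8 intervals between the 9 visits.&lt;/p&gt;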
&lt;p&gt;To help visualize the data frame, we can use the &lt;code&gt;PlotVfTimeSeries&lt;/code&gt; function. &lt;code&gt;PlotVfTimeSeries&lt;/code&gt; is a function that plots a patient’s observed visual acuity over time at each location on the visual field.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;PlotVfTimeSeries(Y = VFSeries$DLS,
                 Location = VFSeries$Location,
                 Time = VFSeries$Time,
                 main = &amp;quot;Visual field sensitivity time series \n at each location&amp;quot;,
                 xlab = &amp;quot;Days from baseline visit&amp;quot;,
                 ylab = &amp;quot;Differential light sensitivity (dB)&amp;quot;,
                 line.reg = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2018-11-19-statistics-in-glaucoma-part-i_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;528&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The above figure shows the visual field from a Humphrey Field Analyzer-II (HFA-II) testing machine, which generates 54 spatial locations (only 52 informative locations; note the 2 blank spots corresponding to the blind spot). The visual field map is constructed by assessing a patient’s response to varying levels of light. Patients are instructed to focus on a central fixation point as light is introduced randomly in a preceding manner over a grid on the visual field. As light is observed, the patient presses a button and the current light intensity is recorded. The process is repeated until the entire visual field is tested. The light intensity is measured in differential light sensitivity (DLS), which quantifies the difference between the HFA-II background and the observed light intensity. Smaller values indicate worsening vision.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;spatial-anatomy-on-the-visual-field&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Spatial Anatomy on the Visual Field&lt;/h2&gt;
&lt;p&gt;The spatial surface of the visual field is observed on a lattice (i.e., uniform areal data); however, it is a complex projection of the underlying optic nerve head and exhibits anatomically induced spatial dependencies. In particular, localized damage to the optic disc can result in clinically deterministic deterioration across the visual field. Incorporating this non-standard spatial dependence structure into our methodology is a priority for properly analyzing these data, although it is commonly ignored. Translating this into math lingo, this means that a naive modeling of the spatial surface of the visual field would be inappropriate (i.e., neighbors defined through adjacent locations). Instead, the definition of a neighbor when considering vision loss on the visual field must depend on the underlying anatomical proximities.&lt;/p&gt;
&lt;p&gt;To illustrate this concept, we begin by displaying the visual field neighborhood structure. The adjacency matrix for the HFA-II is available in the &lt;code&gt;womblR&lt;/code&gt; package. In this analysis, we use a queen specification, meaning that an adjacency is defined as any location that shares an edge or corner on the lattice. We now load this adjacency matrix and remove the two locations that correspond to the blind spot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;blind_spot &amp;lt;- c(26, 35) # define blind spot
W &amp;lt;- HFAII_Queen[-blind_spot, -blind_spot] # HFA-II visual field adjacency matrix&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This adjacency structure can be displayed using the &lt;code&gt;graph_from_adjacency_matrix&lt;/code&gt; function in the &lt;code&gt;igraph&lt;/code&gt; package (the older &lt;code&gt;graph.adjacency&lt;/code&gt; name is deprecated in recent releases).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(igraph)
# build an undirected graph from the adjacency matrix
adj.graph &amp;lt;- graph_from_adjacency_matrix(W, mode = &amp;quot;undirected&amp;quot;)
plot(adj.graph)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2018-11-19-statistics-in-glaucoma-part-i_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;528&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
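&lt;p&gt;To make the queen specification concrete, here is a minimal sketch on a hypothetical 3 x 3 lattice (not the HFA-II grid). It uses only base R; the names &lt;code&gt;grid&lt;/code&gt;, &lt;code&gt;cheb&lt;/code&gt;, and &lt;code&gt;W_toy&lt;/code&gt; are illustrative and do not come from &lt;code&gt;womblR&lt;/code&gt;. Two cells are treated as neighbors when they share an edge or a corner, which is the same as saying their row and column indices each differ by at most one.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy 3 x 3 lattice (hypothetical; not the HFA-II grid)
grid &amp;lt;- expand.grid(row = 1:3, col = 1:3)

# Chebyshev distance between every pair of cells
row_gap &amp;lt;- abs(outer(grid$row, grid$row, &amp;quot;-&amp;quot;))
col_gap &amp;lt;- abs(outer(grid$col, grid$col, &amp;quot;-&amp;quot;))
cheb &amp;lt;- pmax(row_gap, col_gap)

# Queen rule: neighbors share an edge or a corner (Chebyshev distance of 1)
W_toy &amp;lt;- (cheb == 1) * 1
rowSums(W_toy) # number of neighbors per cell&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On this toy lattice the corner cells have three neighbors, the edge cells five, and the center cell eight.&lt;/p&gt;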
&lt;p&gt;As mentioned above, naively assuming that all of these adjacencies are equal ignores the important underlying anatomy that enforces these dependencies. This anatomical relationship between the visual field test points and the underlying optic nerve head was studied by Garway-Heath et al. (2000), who estimated the angle at which each test location’s underlying retinal ganglion cells enter the optic disc, measured in degrees. These angles are the missing link that will allow the visual field adjacency structure to be dictated by the underlying anatomy. These angles can be visualized using the function &lt;code&gt;PlotAdjacency&lt;/code&gt; from &lt;code&gt;womblR&lt;/code&gt;, which displays neighborhood structures across the visual field. Before using this function, we need to load the angles measured in Garway-Heath et al. (2000). These are available from &lt;code&gt;womblR&lt;/code&gt;; again, we remove the blind spot before using them.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Angles &amp;lt;- GarwayHeath[-blind_spot] # Garway-Heath angles
summary(Angles)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.00   80.75  192.50  177.35  275.75  329.00&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We are now ready to visualize the neighborhood structure of the visual field using the &lt;code&gt;PlotAdjacency&lt;/code&gt; function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Plot the angles on the visual field
PlotAdjacency(W = W, 
              DM = Angles, 
              zlim = c(0, 180), 
              Visit = NA, 
              edgewidth = 3.75,
              cornerwidth = 0.33,
              lwd.border = 3.75,
              main = &amp;quot;Garway-Heath angles\n across the visual field&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2018-11-19-statistics-in-glaucoma-part-i_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;528&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The angles measured by Garway-Heath et al. are presented at each location on the visual field. More interestingly, the distances between these angles are presented for each of the neighbor pairs. This figure is equivalent to the adjacency plot displayed above, but allows the adjacencies to vary as a function of the anatomy. In particular, if two visual field locations are anatomically similar, the dependency is stronger (shown closer to white), and if the locations are close to anatomically independent, the dependency is weaker (shown closer to black). Here the edge adjacencies are represented by lines, while the diagonal adjacencies are represented as two triangles. This view of the visual field shows why anatomy matters when modeling visual field data: neighboring locations can have underlying retinal ganglion cells that enter the optic nerve head with a large degree of separation. In particular, locations on either side of the equator, although adjacent on the lattice, are close to anatomically independent.&lt;/p&gt;
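&lt;p&gt;The idea of a distance between angles for each neighbor pair can be sketched in a few lines of base R. This is an illustration under an assumption: the dissimilarity is taken to be the shorter arc between two angles, which is consistent with the &lt;code&gt;zlim = c(0, 180)&lt;/code&gt; used in the call above, but the exact metric computed inside &lt;code&gt;PlotAdjacency&lt;/code&gt; is not stated in this post. The function &lt;code&gt;angle_dist&lt;/code&gt; and the toy inputs are hypothetical.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Assumed dissimilarity: shorter arc (in degrees) between two angles
angle_dist &amp;lt;- function(a, b) {
  d &amp;lt;- abs(a - b) %% 360
  pmin(d, 360 - d)
}

# Hypothetical angles for three locations; pairs 1-2 and 2-3 are neighbors
toy_angles &amp;lt;- c(20, 35, 340)
toy_W &amp;lt;- matrix(c(0, 1, 0,
                  1, 0, 1,
                  0, 1, 0), nrow = 3, byrow = TRUE)

# Pairwise angular distances, kept only for neighbor pairs
D &amp;lt;- outer(toy_angles, toy_angles, angle_dist) * toy_W
D&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under this assumption, a neighbor pair with a small entry in &lt;code&gt;D&lt;/code&gt; would be treated as anatomically similar, while a pair with an entry near 180 would be treated as close to independent.&lt;/p&gt;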
&lt;/div&gt;
&lt;div id=&#34;how-to-model-visual-field-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;How to Model Visual Field Data?&lt;/h2&gt;
&lt;p&gt;If you have gotten this far in the post, hopefully you have the sense that the study of visual field data is statistically interesting and clinically important for properly assessing a glaucoma patient’s risk of vision loss. In the next two blog posts, we will explore how visual field data are currently analyzed and new methods that account for the anatomical structure detailed above. To accomplish this, we will break down the algorithm and software used to build the &lt;code&gt;womblR&lt;/code&gt; package, and will attempt to illustrate the importance of R packages for open-source clinical research.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;reference&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Reference&lt;/h2&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Garway-Heath, David F., Darmalingum Poinoosawmy, Frederick W. Fitzke, and Roger A. Hitchings. “Mapping the visual field to the optic disc in normal tension glaucoma eyes.” &lt;em&gt;Ophthalmology&lt;/em&gt; 107, no. 10 (2000): 1809-1815.&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

      </description>
    </item>
    
    <item>
      <title>Serendipity at R / Medicine</title>
      <link>https://rviews.rstudio.com/2018/10/16/serendipity-at-r-medicine/</link>
      <pubDate>Tue, 16 Oct 2018 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2018/10/16/serendipity-at-r-medicine/</guid>
      <description>
        &lt;p&gt;We knew we were on to something important early on in the process of organizing &lt;a href=&#34;www.r-medicine.com&#34;&gt;R / Medicine 2018&lt;/a&gt;. Even during our initial attempts to articulate the differences between this conference and &lt;a href=&#34;www.rinpharma.com&#34;&gt;R / Pharma 2018&lt;/a&gt;, it became clear that the focus on the use of R and statistics in clinical settings was going to be a richer topic than just the design of clinical trials. However, it wasn&amp;rsquo;t until the conference got underway that we realized there was magic in the mix of attendees. R / Medicine attracted quite a few clinicians who were themselves using R in their work, or were in the process of teaching themselves R. This group catalyzed the discussions that continued throughout the conference, enabling high-bandwidth exchanges that would have otherwise suffered from the effort to translate between the two cultures. The small, single-track nature of the conference helped to keep the conversations going, with the questions and answers at the end of a given talk helping to enrich the quality of successive discussions.&lt;/p&gt;

&lt;p&gt;Rob Tibshirani set the collaborative tone for the conference with his opening &lt;a href=&#34;https://r-medicine.netlify.com/talks/talk7.pdf&#34;&gt;keynote talk&lt;/a&gt; describing the clinical forecasting system he and his collaborators have built to predict platelet usage for the Stanford hospitals. Big-league and big-impact, the system shows the promise of delivering real clinical and financial benefits. Tibshirani&amp;rsquo;s presentation of the modeling process also set the bar for clarity.&lt;/p&gt;

&lt;p&gt;The other keynotes were also &amp;ldquo;top shelf&amp;rdquo;. Michael Lawrence spoke about &lt;a href=&#34;https://r-medicine.netlify.com/talks/michael-lawrence-keynote.pdf&#34;&gt;Scientific Software In-the-Large&lt;/a&gt;. He laid out three challenges for scientific programming at this scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integration of independently developed modules&lt;/li&gt;
&lt;li&gt;Translation of analyses and prototypes into software&lt;/li&gt;
&lt;li&gt;Scalability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;He addressed these issues using examples from the &lt;a href=&#34;https://www.bioconductor.org/&#34;&gt;Bioconductor&lt;/a&gt; project.&lt;/p&gt;

&lt;p&gt;Victoria Stodden&amp;rsquo;s keynote, &lt;a href=&#34;http://web.stanford.edu/~vcs/talks/Yale-Sept-2018-STODDEN.pdf&#34;&gt;Computational Reproducibility in Medical Research: Toward Open Code and Data&lt;/a&gt;, was a meditation on the need to reassess scientific transparency in an age where big data and computational power are driving medical research, and deep intellectual contributions are encoded in software. I was particularly struck by the idea that progress towards computational reproducibility depends on the coordination of stakeholders.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-10-10-RMedicine_files/progress.png&#34; height = &#34;500&#34; width=&#34;700&#34;&gt;&lt;/p&gt;

&lt;p&gt;Perhaps the highest-energy talk of the conference (and maybe all of the conferences I have attended this year) was given by Yale&amp;rsquo;s &lt;a href=&#34;https://medicine.yale.edu/intmed/people/harlan_krumholz.profile&#34;&gt;Dr. Harlan Krumholz&lt;/a&gt;. Unfortunately, we have neither video nor slides from this keynote, but to give you some idea of Dr. Krumholz&amp;rsquo;s iconoclastic work, look at the 2010 &lt;a href=&#34;https://www.forbes.com/forbes/2010/0927/opinions-harlan-krumholz-yale-medicine-ideas-opinions.html#311fdfca6db3&#34;&gt;Forbes article&lt;/a&gt; and this more recent article published in &lt;a href=&#34;https://www.healthaffairs.org/doi/10.1377/hlthaff.2014.0053&#34;&gt;Health Affairs&lt;/a&gt;. The following are some notes I managed to take at the talk between moments of mesmerization. With respect to medicine in general, Dr. Krumholz said that:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There could not be a more exciting era in medicine. Medicine is emerging as an information science and the clinician&amp;rsquo;s role is changing to be a guide or interpreter, not a shaman.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Commenting on evidence-based medicine:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;More than half of the guidelines in cardiology are not based on evidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With respect to medical data, he said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The goal should be to take high-dimensional data and make it low-dimensional. Instead of thinking that everyone should have the same data, we should move towards thinking: How do we use the data that we do have? There should be no missing data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I took these statements to mean that teams of clinicians, statisticians, and data scientists should be working towards building predictive models for individual patients based on whatever data is available for them and whatever big data is relevant. This was clearly the music the crowd wanted to dance to.&lt;/p&gt;

&lt;p&gt;The slides for most of the rest of the talks are available on the website. One talk I would like to highlight here is Nathaniel Phillips&amp;rsquo; talk on &lt;a href=&#34;https://ndphillips.github.io/RMedicine_2018/#1&#34;&gt;Fast and Frugal Trees&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;/post/2018-10-10-RMedicine_files/fft.png&#34; height = &#34;500&#34; width=&#34;700&#34;&gt;&lt;/p&gt;

&lt;p&gt;This talk addressed a recurring theme throughout the conference: the difference in decision making between the two cultures of statisticians and physicians. Probabilistic estimates to characterize risk and to inform decision making are central to a statistician&amp;rsquo;s worldview. Physicians, on the other hand, are in general not comfortable with probabilities, and when push comes to shove, prefer unambiguous guidelines and thresholds, such as blood pressure ranges, to inform treatment decisions. A vexing cultural problem is to identify effective decision models that have a chance of actually being used by clinicians.&lt;/p&gt;

&lt;p&gt;The conference finished with a roundtable discussion with the theme &lt;em&gt;Bridging the Two Cultures&lt;/em&gt;, with panelists Beth Atkinson, Joseph Chou, Peter Higgins, Stephan Kadauke, Chinonyerem Madu, and Jack Wasey representing both the statistical and clinical points of view. The moderator (me) began by asking three questions:&lt;/p&gt;

&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;How do clinicians engage with statisticians and data scientists?&lt;/li&gt;
&lt;li&gt;What are some key ideas you should know about collaborating?&lt;/li&gt;
&lt;li&gt;In your experience, what kinds of engagements have been the most successful?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Panelists were free to respond as they felt inclined to any or all of the questions. As I recall, a consensus emerged around three key ideas: make an effort to empathize with colleagues, meet frequently and go out of your way to interact with colleagues, and carefully select projects and then cultivate them.&lt;/p&gt;

&lt;p&gt;Planning is already underway for R / Medicine 2019. Mark the week of September 23rd, and stay tuned!&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
