<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Medicine on R Views</title>
    <link>https://rviews.rstudio.com/tags/medicine/</link>
    <description>Recent content in Medicine on R Views</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 22 Apr 2021 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://rviews.rstudio.com/tags/medicine/" rel="self" type="application/rss+xml" />
    
    
    
    
    <item>
      <title>March 2021: &#34;Top 40&#34; New CRAN Packages</title>
      <link>https://rviews.rstudio.com/2021/04/22/march-2021-top-40-new-cran-packages/</link>
      <pubDate>Thu, 22 Apr 2021 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2021/04/22/march-2021-top-40-new-cran-packages/</guid>
      <description>
        

&lt;p&gt;By my count, two hundred twenty-one new packages &lt;em&gt;stuck&lt;/em&gt; to CRAN in March 2021.&lt;sup&gt;1&lt;/sup&gt; Here are my &amp;ldquo;Top 40&amp;rdquo; selections in twelve categories: Computational Methods, Data, Engineering, Genomics, Machine Learning, Medicine, Music, Networks, Science, Statistics, Utilities, and Visualization. Two of these categories, Engineering and Music, have only one entry each. However, I decided to give them their own categories to draw attention to the use of R outside the mainstream, and because I have always lamented the fate of packages relegated to &lt;em&gt;Miscellaneous&lt;/em&gt;. In the same spirit, note that the complete works of &lt;em&gt;the Bard&lt;/em&gt; appear in the Data category and that, thanks to &lt;code&gt;tidypaleo&lt;/code&gt;, &lt;em&gt;Paleoenvironmental&lt;/em&gt; is now &lt;em&gt;a thing&lt;/em&gt; in R.&lt;/p&gt;

&lt;h3 id=&#34;computational-methods&#34;&gt;Computational Methods&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=gamlss.foreach&#34;&gt;gamlss.foreach&lt;/a&gt; v1.0-5: Implements computationally intensive calculations for Generalized Additive Models for Location, Scale and Shape (GAMLSS), as described in &lt;a href=&#34;https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9876.2005.00510.x&#34;&gt;Rigby &amp;amp; Stasinopoulos (2005)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=waydown&#34;&gt;waydown&lt;/a&gt; v1.1.0: Implements an algorithm based on the classical Helmholtz decomposition to obtain an approximate potential function for non-gradient fields. See &lt;a href=&#34;https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007788&#34;&gt;Rodríguez-Sánchez (2020)&lt;/a&gt; for background and the &lt;a href=&#34;https://cran.r-project.org/web/packages/waydown/vignettes/examples.pdf&#34;&gt;vignette&lt;/a&gt; for examples.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;waydown.png&#34; height = &#34;400&#34; width=&#34;600&#34;&gt;&lt;/p&gt;

&lt;h3 id=&#34;data&#34;&gt;Data&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=aopdata&#34;&gt;aopdata&lt;/a&gt; v0.2.1: Provides functions to download data from the &lt;a href=&#34;https://www.ipea.gov.br/acessooportunidades/en/&#34;&gt;Access to Opportunities Project&lt;/a&gt; (AOP), which includes annual estimates of access to employment, health, and education services by transport mode, as well as data on the spatial distribution of population, schools, and health-care facilities at a fine spatial resolution for all cities included in the study. There is an &lt;a href=&#34;https://cran.r-project.org/web/packages/aopdata/vignettes/intro_to_aopdata.html&#34;&gt;Introduction&lt;/a&gt; to the package, and there are vignettes on &lt;a href=&#34;https://cran.r-project.org/web/packages/aopdata/vignettes/access_inequality.html&#34;&gt;Analyzing Inequality&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/aopdata/vignettes/access_maps.html&#34;&gt;Mapping Urban Accessibility&lt;/a&gt;, and &lt;a href=&#34;https://cran.r-project.org/web/packages/aopdata/vignettes/landuse_maps.html&#34;&gt;Mapping Population and Land Use&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;aopdata.png&#34; height = &#34;400&#34; width=&#34;400&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=bardr&#34;&gt;bardr&lt;/a&gt; v0.0.9: Provides R data structures for Shakespeare&amp;rsquo;s complete works, as provided by &lt;a href=&#34;https://www.gutenberg.org/ebooks/100&#34;&gt;Project Gutenberg&lt;/a&gt;. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/bardr/readme/README.html&#34;&gt;README&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=metro&#34;&gt;metro&lt;/a&gt; v0.9.1: Provides access to the &lt;a href=&#34;https://developer.wmata.com/&#34;&gt;Metro Transparent Data Sets API&lt;/a&gt; published by the Washington Metropolitan Area Transit Authority, the government agency that operates the Metrorail rapid transit system and Metrobus services in the Washington, D.C. area. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/metro/readme/README.html&#34;&gt;README&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=RAQSAPI&#34;&gt;RAQSAPI&lt;/a&gt; v2.0.1: Provides functions to retrieve air monitoring data and associated metadata from the US Environmental Protection Agency&amp;rsquo;s &lt;a href=&#34;https://aqs.epa.gov/aqsweb/documents/data_api.html&#34;&gt;Air Quality System Service&lt;/a&gt;. There are several short vignettes including an &lt;a href=&#34;https://cran.r-project.org/web/packages/RAQSAPI/vignettes/Intro.html&#34;&gt;Introduction&lt;/a&gt; and a vignette on &lt;a href=&#34;https://cran.r-project.org/web/packages/RAQSAPI/vignettes/RAQSAPIusagetipsandprecautions.html&#34;&gt;Usage tips and precautions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=troopdata&#34;&gt;troopdata&lt;/a&gt; v0.1.3: Provides access to U.S. Department of Defense data on overseas military deployments and includes functions for pulling country-year troop deployment and basing data. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/troopdata/readme/README.html&#34;&gt;README&lt;/a&gt; to get started.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;troopdata.png&#34; height = &#34;400&#34; width=&#34;600&#34;&gt;&lt;/p&gt;

&lt;h3 id=&#34;engineering&#34;&gt;Engineering&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=pipenostics&#34;&gt;pipenostics&lt;/a&gt; v0.1.7: Implements empirical and data-driven models of heat losses, corrosion diagnostics, reliability, and predictive maintenance of pipeline systems, which should interest the engineering departments of heat-generating and heat-transferring companies. See &lt;a href=&#34;https://link.springer.com/book/10.1007%2F978-3-319-25307-7&#34;&gt;Timashev et al. (2016)&lt;/a&gt; and &lt;a href=&#34;https://www.sciencedirect.com/science/article/pii/S2214785317313755?via%3Dihub&#34;&gt;Reddy (2017)&lt;/a&gt; for the methods used and the &lt;a href=&#34;https://cran.r-project.org/web/packages/pipenostics/readme/README.html&#34;&gt;README&lt;/a&gt; to get started.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;pipenostics.svg&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;h3 id=&#34;genomics&#34;&gt;Genomics&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=glmmSeq&#34;&gt;glmmSeq&lt;/a&gt; v0.1.0: Provides functions to fit negative binomial mixed effects models to gene expression data from matched samples. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/glmmSeq/vignettes/glmmSeq.html&#34;&gt;vignette&lt;/a&gt; for examples.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;glmmSeq.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=ondisc&#34;&gt;ondisc&lt;/a&gt; v1.0.0: Implements a method that allows researchers to analyze large-scale single-cell data as an R object stored on disk. There is a tutorial on the &lt;a href=&#34;https://cran.r-project.org/web/packages/ondisc/vignettes/tutorial_odm_class.html&#34;&gt;ondisc matrix class&lt;/a&gt; and another on &lt;a href=&#34;https://cran.r-project.org/web/packages/ondisc/vignettes/tutorial_other_classes.html&#34;&gt;Metadata&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=SignacX&#34;&gt;SignacX&lt;/a&gt; v2.2.0: Implements a neural network trained with flow-sorted gene expression data to classify cellular phenotypes in single-cell RNA-sequencing data. See &lt;a href=&#34;https://www.biorxiv.org/content/10.1101/2021.02.01.429207v3&#34;&gt;Chamberlain et al. (2021)&lt;/a&gt; for background. There are seven vignettes, including an &lt;a href=&#34;https://cran.r-project.org/web/packages/SignacX/vignettes/signac-Seurat_AMP.html&#34;&gt;Analysis of Kidney Lupus Data&lt;/a&gt; and an &lt;a href=&#34;https://cran.r-project.org/web/packages/SignacX/vignettes/signac-Seurat_pbmcs.html&#34;&gt;Analysis of PBMCs from 10X Genomics&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;SignacX.png&#34; height = &#34;200&#34; width=&#34;400&#34;&gt;&lt;/p&gt;

&lt;h3 id=&#34;machine-learning&#34;&gt;Machine Learning&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=opitools&#34;&gt;opitools&lt;/a&gt; v1.0.3: Implements a tool to analyze the opinions expressed in a text document about a specific subject (A) and to assess how opinions expressed about another subject (B) may affect the opinions on subject A. The package is designed specifically for social media datasets, such as those from Twitter and Facebook. See &lt;a href=&#34;https://osf.io/preprints/socarxiv/c32qh/&#34;&gt;Adepeju and Jimoh (2021)&lt;/a&gt; for an extended example that demonstrates the utility of the approach and the &lt;a href=&#34;https://cran.r-project.org/web/packages/opitools/vignettes/opitools-vignette.html&#34;&gt;vignette&lt;/a&gt; to get started.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;opitools.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=poems&#34;&gt;poems&lt;/a&gt; v1.0.1: Provides a framework of interoperable R6 classes for building ensembles of viable models via the &lt;a href=&#34;https://en.wikipedia.org/wiki/Pattern-oriented_modeling&#34;&gt;pattern-oriented modeling&lt;/a&gt; (POM) approach. The package includes classes for encapsulating and generating model parameters and for managing the POM workflow, which includes: model setup; generating model parameters via Latin hypercube sampling; running multiple sampled model simulations; collating summary results; and validating and selecting an ensemble of models that best match known patterns. There are two vignettes: &lt;a href=&#34;https://cran.r-project.org/web/packages/poems/vignettes/simple_example.pdf&#34;&gt;Simple Example&lt;/a&gt; and &lt;a href=&#34;https://cran.r-project.org/web/packages/poems/vignettes/thylacine_example.pdf&#34;&gt;Thylacine Example&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;poems.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;h3 id=&#34;medicine&#34;&gt;Medicine&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=dampack&#34;&gt;dampack&lt;/a&gt; v1.0.0: Implements a suite of functions for analyzing and visualizing the health economic outputs of mathematical models. See &lt;a href=&#34;https://www.cambridge.org/core/books/decision-making-in-health-and-medicine/31FD197195DAE2A6321409568BEFA2DD&#34;&gt;Hunink et al. (2014)&lt;/a&gt; for the theoretical underpinnings. There are five vignettes including &lt;a href=&#34;https://cran.r-project.org/web/packages/dampack/vignettes/basic_cea.html&#34;&gt;Basic Cost Effectiveness Analysis&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/dampack/vignettes/psa_analysis.html&#34;&gt;Probabilistic Sensitivity Analysis: Analysis&lt;/a&gt; and &lt;a href=&#34;https://cran.r-project.org/web/packages/dampack/vignettes/voi.html&#34;&gt;Value of Information Analysis&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;dampack.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=rdecision&#34;&gt;rdecision&lt;/a&gt; v1.0.3: Provides classes and functions for modeling health care interventions with decision trees and cohort models. See &lt;a href=&#34;https://www.amazon.com/Decision-Modelling-Economic-Evaluation-Handbooks/dp/0198526628&#34;&gt;Briggs et al. (2006)&lt;/a&gt; for theory and terminology. There are five vignettes, including &lt;a href=&#34;https://cran.r-project.org/web/packages/rdecision/vignettes/DT01-Sumatriptan.html&#34;&gt;Elementary decision tree (Evans 1997)&lt;/a&gt; and &lt;a href=&#34;https://cran.r-project.org/web/packages/rdecision/vignettes/DT02-Tegaderm.html&#34;&gt;Decision tree with PSA&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;rdecision.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;h3 id=&#34;music&#34;&gt;Music&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=gm&#34;&gt;gm&lt;/a&gt; v1.0.2: Implements a high-level language for creating music, including converting your music to musical scores and audio files. It works with &lt;a href=&#34;https://rmarkdown.rstudio.com/&#34;&gt;R Markdown&lt;/a&gt;, R &lt;a href=&#34;https://jupyter.org/&#34;&gt;Jupyter Notebooks&lt;/a&gt;, and RStudio. The vignette is available in &lt;a href=&#34;https://cran.r-project.org/web/packages/gm/vignettes/gm.html&#34;&gt;English&lt;/a&gt; and in &lt;a href=&#34;https://cran.r-project.org/web/packages/gm/vignettes/cn.html&#34;&gt;Chinese&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;gm.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;h3 id=&#34;networks&#34;&gt;Networks&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=sfnetworks&#34;&gt;sfnetworks&lt;/a&gt; v0.5.1: Provides a tidy approach to spatial network analysis in the form of classes and functions that enable a seamless interaction between the network analysis package &lt;code&gt;tidygraph&lt;/code&gt; and the spatial analysis package &lt;code&gt;sf&lt;/code&gt;. There are vignettes on &lt;a href=&#34;https://cran.r-project.org/web/packages/sfnetworks/vignettes/structure.html&#34;&gt;sf network structure&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/sfnetworks/vignettes/preprocess_and_clean.html&#34;&gt;Preprocessing&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/sfnetworks/vignettes/join_filter.html&#34;&gt;Spatial joins and filters&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/sfnetworks/vignettes/routing.html&#34;&gt;Routing&lt;/a&gt;, and &lt;a href=&#34;https://cran.r-project.org/web/packages/sfnetworks/vignettes/morphers.html&#34;&gt;Spatial morphers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;sfnetworks.png&#34; height = &#34;400&#34; width=&#34;400&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=valhallr&#34;&gt;valhallr&lt;/a&gt; v0.1.0: Implements an interface to the &lt;a href=&#34;https://github.com/valhalla/valhalla&#34;&gt;Valhalla&lt;/a&gt; routing engine’s API for turn-by-turn routing, isochrones, and origin-destination analyses. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/valhallr/vignettes/valhallr.html&#34;&gt;vignette&lt;/a&gt; for examples.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;valhallr.jpeg&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;h3 id=&#34;science&#34;&gt;Science&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=asteRisk&#34;&gt;asteRisk&lt;/a&gt; v0.99.4: Provides functions to calculate the positions of satellites given a known state vector. It includes implementations of the SGP4 and SDP4 simplified perturbation models to propagate orbital state vectors. See &lt;a href=&#34;https://celestrak.com/NORAD/documentation/spacetrk.pdf&#34;&gt;Hoots et al. (1988)&lt;/a&gt;, &lt;a href=&#34;https://arc.aiaa.org/doi/10.2514/6.2006-6753&#34;&gt;Vallado et al. (2006)&lt;/a&gt;, and &lt;a href=&#34;https://arc.aiaa.org/doi/10.2514/1.9161&#34;&gt;Hoots et al. (2004)&lt;/a&gt; for background and the &lt;a href=&#34;https://cran.r-project.org/web/packages/asteRisk/vignettes/asteRisk.html&#34;&gt;vignette&lt;/a&gt; for examples.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;asteRisk.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=forImage&#34;&gt;forImage&lt;/a&gt; v0.1.0: Implements a tool to measure the size of foraminifera and other unicellular organisms, and includes functions to guide foraminiferal test biovolume calculations and cell biomass estimations. The volume function includes geometric adaptations of several microalgae models based on &lt;a href=&#34;https://onlinelibrary.wiley.com/doi/abs/10.1046/j.1529-8817.1999.3520403.x&#34;&gt;Hillebrand et al. (1999)&lt;/a&gt;, &lt;a href=&#34;https://academic.oup.com/plankt/article/25/11/1331/1490055&#34;&gt;Sun &amp;amp; Liu (2003)&lt;/a&gt;, and &lt;a href=&#34;http://siba-ese.unisalento.it/index.php/twb/article/view/106&#34;&gt;Vadrucci et al. (2007)&lt;/a&gt;. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/forImage/vignettes/forImage_vignette.html&#34;&gt;vignette&lt;/a&gt; to get started.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;forImage.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=OpenSpecy&#34;&gt;OpenSpecy&lt;/a&gt; v0.9.1: Provides functions to analyze, process, identify, and share Raman and (FT)IR spectra, including functions to apply Savitzky-Golay smoothing in accordance with &lt;a href=&#34;https://journals.sagepub.com/doi/10.1366/000370207782597003&#34;&gt;Zhao et al. (2007)&lt;/a&gt; and to identify spectra using an onboard reference library (see &lt;a href=&#34;https://journals.sagepub.com/doi/10.1177/0003702820929064&#34;&gt;Cowger et al. (2020)&lt;/a&gt;). Analyzed spectra can be shared via a &lt;a href=&#34;https://wincowger.shinyapps.io/OpenSpecy/&#34;&gt;Shiny app&lt;/a&gt;. There is a &lt;a href=&#34;https://cran.r-project.org/web/packages/OpenSpecy/vignettes/sop.html&#34;&gt;vignette&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;OpenSpecy.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=tidypaleo&#34;&gt;tidypaleo&lt;/a&gt; v0.1.1: Provides a common framework for age-depth model management, stratigraphic visualization, and common statistical transformations, with a focus on stratigraphic diagrams built with &lt;code&gt;ggplot2&lt;/code&gt;. There are vignettes on &lt;a href=&#34;https://cran.r-project.org/web/packages/tidypaleo/vignettes/age_depth.html&#34;&gt;Age-depth Models&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/tidypaleo/vignettes/nested_analysis.html&#34;&gt;Nested Analyses&lt;/a&gt;, and &lt;a href=&#34;https://cran.r-project.org/web/packages/tidypaleo/vignettes/strat_diagrams.html&#34;&gt;Stratigraphic Diagrams&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;tidypaleo.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=VulnToolkit&#34;&gt;VulnToolkit&lt;/a&gt; v1.1.2: Provides functions to analyze and summarize tidal data sets and to access NOAA mean sea level data. See &lt;a href=&#34;https://www.sciencedirect.com/science/article/abs/pii/S0272771415002139?via%3Dihub&#34;&gt;Hill &amp;amp; Anisfeld (2015)&lt;/a&gt; for background and the &lt;a href=&#34;https://cran.r-project.org/web/packages/VulnToolkit/vignettes/Tidal_data.html&#34;&gt;vignette&lt;/a&gt; for examples.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;VulnToolkit.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;h3 id=&#34;statistics&#34;&gt;Statistics&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=corncob&#34;&gt;corncob&lt;/a&gt; v0.2.0: Implements functions for modeling correlated count data using the beta-binomial distribution, described in &lt;a href=&#34;https://projecteuclid.org/journals/annals-of-applied-statistics/volume-14/issue-1/Modeling-microbial-abundances-and-dysbiosis-with-beta-binomial-regression/10.1214/19-AOAS1283.short&#34;&gt;Martin et al. (2020)&lt;/a&gt;. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/corncob/vignettes/corncob-intro.pdf&#34;&gt;vignette&lt;/a&gt; for an introduction.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;corncob.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=hawkesbow&#34;&gt;hawkesbow&lt;/a&gt; v1.0.2: Implements an estimation method for &lt;a href=&#34;https://arxiv.org/pdf/1507.02822.pdf#:~:text=The%20Hawkes%20process%20(HP)%20is,trade%20orders%2C%20or%20bank%20defaults.&#34;&gt;Hawkes processes&lt;/a&gt; when count data are only observed in discrete time, using a spectral approach derived from the Bartlett spectrum. See &lt;a href=&#34;https://arxiv.org/abs/2003.04314&#34;&gt;Cheysson and Lang (2020)&lt;/a&gt; for background and the &lt;a href=&#34;https://cran.r-project.org/web/packages/hawkesbow/vignettes/hawkesbow.pdf&#34;&gt;vignette&lt;/a&gt; for examples.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=LMMELSM&#34;&gt;LMMELSM&lt;/a&gt; v0.1.0: Implements two-level mixed effects location scale models on multiple observed or latent outcomes, and between-group variance modeling. See &lt;a href=&#34;https://econtent.hogrefe.com/doi/10.1027/1015-5759/a000624&#34;&gt;Williams et al. (2020)&lt;/a&gt; and &lt;a href=&#34;https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1541-0420.2007.00924.x&#34;&gt;Hedeker et al. (2008)&lt;/a&gt; for background and &lt;a href=&#34;https://cran.r-project.org/web/packages/LMMELSM/readme/README.html&#34;&gt;README&lt;/a&gt; for an example.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=mixpoissonreg&#34;&gt;mixpoissonreg&lt;/a&gt; v1.0.0: Provides functions to fit mixed Poisson regression models (Poisson-Inverse Gaussian or Negative-Binomial) with count data response variables. See &lt;a href=&#34;https://link.springer.com/article/10.1007%2Fs11222-015-9601-6&#34;&gt;Barreto-Souza and Simas (2016)&lt;/a&gt; for background. There are five vignettes on &lt;a href=&#34;https://cran.r-project.org/web/packages/mixpoissonreg/vignettes/influence-mixpoissonreg.html&#34;&gt;Global and Local Influence&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/mixpoissonreg/vignettes/intervals-mixpoissonreg.html&#34;&gt;Confidence and Prediction Intervals&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/mixpoissonreg/vignettes/ml-mixpoissonreg.html&#34;&gt;MLE&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/mixpoissonreg/vignettes/tidyverse-mixpoissonreg.html&#34;&gt;Tidy Methods&lt;/a&gt;, and &lt;a href=&#34;https://cran.r-project.org/web/packages/mixpoissonreg/vignettes/tutorial-mixpoissonreg.html&#34;&gt;Overdispersed Count Data&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;mixpoissinreg.png&#34; height = &#34;400&#34; width=&#34;400&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=ppdiag&#34;&gt;ppdiag&lt;/a&gt; v0.1.0: Provides a suite of diagnostic tools for univariate point processes including tools for simulating and fitting both common and more complex temporal point processes and the diagnostic tools described in &lt;a href=&#34;https://direct.mit.edu/neco/article/14/2/325/6578/The-Time-Rescaling-Theorem-and-Its-Application-to&#34;&gt;Brown et al. (2002)&lt;/a&gt; and &lt;a href=&#34;https://arxiv.org/abs/2001.09359&#34;&gt;Wu et al. (2020)&lt;/a&gt;. There is a vignette on &lt;a href=&#34;https://cran.r-project.org/web/packages/ppdiag/vignettes/fitting_markov_modulated.html&#34;&gt;Markov Modulated Point Processes&lt;/a&gt; and another on &lt;a href=&#34;https://cran.r-project.org/web/packages/ppdiag/vignettes/ppdiag.html&#34;&gt;Diagnostic Tools&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=robustlm&#34;&gt;robustlm&lt;/a&gt; v0.1.0: Implements a computationally efficient exponential squared loss algorithm for variable selection proposed by &lt;a href=&#34;https://www.tandfonline.com/doi/abs/10.1080/01621459.2013.766613&#34;&gt;Wang et al. (2013)&lt;/a&gt;. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/robustlm/vignettes/vignette.html&#34;&gt;vignette&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;robustlm.png&#34; height = &#34;200&#34; width=&#34;300&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://CRAN.R-project.org/package=smmR&#34;&gt;smmR&lt;/a&gt; v1.0.2: Provides functions to estimate and simulate multi-state semi-Markov models. The methods implemented are described in &lt;a href=&#34;https://www.tandfonline.com/doi/abs/10.1080/10485250701261913&#34;&gt;Barbu &amp;amp; Limnios (2008)&lt;/a&gt; and &lt;a href=&#34;https://www.tandfonline.com/doi/abs/10.1080/10485252.2011.555543&#34;&gt;Trevezas &amp;amp; Limnios (2011)&lt;/a&gt;. The &lt;a href=&#34;https://cran.r-project.org/web/packages/smmR/vignettes/Textile-Factory.html&#34;&gt;vignette&lt;/a&gt; contains an extended example.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;smmR.png&#34; height = &#34;400&#34; width=&#34;400&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=spotoroo&#34;&gt;spotoroo&lt;/a&gt; v0.1.1: Implements an algorithm to cluster satellite hot spot data spatially and temporally. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/spotoroo/vignettes/Clustering-hot-spots.html&#34;&gt;vignette&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;spotoroo.png&#34; height = &#34;400&#34; width=&#34;400&#34;&gt;&lt;/p&gt;

&lt;h3 id=&#34;utilities&#34;&gt;Utilities&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=clock&#34;&gt;clock&lt;/a&gt; v0.2.0: Provides a comprehensive library for date-time manipulation using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) that partition responsibilities so that the complexities of time zones are only considered when they are really needed. There is a &lt;a href=&#34;https://cran.r-project.org/web/packages/clock/vignettes/clock.html&#34;&gt;Getting Started&lt;/a&gt; guide, as well as vignettes with an &lt;a href=&#34;https://cran.r-project.org/web/packages/clock/vignettes/faq.html&#34;&gt;FAQ&lt;/a&gt; and &lt;a href=&#34;https://cran.r-project.org/web/packages/clock/vignettes/recipes.html&#34;&gt;Examples and Recipes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=crosstable&#34;&gt;crosstable&lt;/a&gt; v0.2.1: Provides functions to create descriptive tables for continuous and categorical variables, apply summary statistics, and create reports using &lt;code&gt;rmarkdown&lt;/code&gt; or &lt;code&gt;officer&lt;/code&gt;. There is an &lt;a href=&#34;https://cran.r-project.org/web/packages/crosstable/vignettes/crosstable.html&#34;&gt;Introduction&lt;/a&gt;, and vignettes on &lt;a href=&#34;https://cran.r-project.org/web/packages/crosstable/vignettes/crosstable-install.html&#34;&gt;Troubleshooting&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/crosstable/vignettes/crosstable-report.html&#34;&gt;Making Automatic Reports&lt;/a&gt;, and &lt;a href=&#34;https://cran.r-project.org/web/packages/crosstable/vignettes/crosstable-selection.html&#34;&gt;Selecting Variables&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=pkgdepends&#34;&gt;pkgdepends&lt;/a&gt; v0.1.0: Provides functions to find recursive dependencies for R packages from various sources including CRAN, Bioconductor, and GitHub enabling users to obtain a consistent set of packages to install. See &lt;a href=&#34;https://cran.r-project.org/web/packages/pkgdepends/readme/README.html&#34;&gt;README&lt;/a&gt; to get started.&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=pkglite&#34;&gt;pkglite&lt;/a&gt; v0.1.1: Implements a tool, grammar, and standard to represent and exchange R package source code as text files. Converts one or more source packages to a text file and restores the package structures from the file. There are vignettes on &lt;a href=&#34;https://cran.r-project.org/web/packages/pkglite/vignettes/filespec.html&#34;&gt;Generating File Specifications&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/pkglite/vignettes/format.html&#34;&gt;Representing Packages&lt;/a&gt;, and &lt;a href=&#34;https://cran.r-project.org/web/packages/pkglite/index.html&#34;&gt;Compact Package Representation&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&#34;visualization&#34;&gt;Visualization&lt;/h3&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=datplot&#34;&gt;datplot&lt;/a&gt; v1.0.0: Provides tools to process and prepare data for visualization and employs the concept of &lt;a href=&#34;https://www.jratcliffe.net/aoristic-analysis&#34;&gt;aoristic analysis&lt;/a&gt;. See &lt;a href=&#34;https://bit.ly/3svhbdV&#34;&gt;aorist&lt;/a&gt; and the vignettes &lt;a href=&#34;https://cran.r-project.org/web/packages/datplot/vignettes/data_preparation.html&#34;&gt;Data Preparation and Visualization&lt;/a&gt; and &lt;a href=&#34;https://cran.r-project.org/web/packages/datplot/vignettes/how-to.html&#34;&gt;Visualizing Chronological Distribution&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;datplot.png&#34; height = &#34;400&#34; width=&#34;600&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=ferrn&#34;&gt;ferrn&lt;/a&gt; v0.0.1: Implements diagnostic plots for optimization, with a focus on projection pursuit, that show the paths the optimizer takes in high-dimensional space. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/ferrn/readme/README.html&#34;&gt;README&lt;/a&gt; for examples.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;ferrn.gif&#34; height = &#34;400&#34; width=&#34;400&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=funcharts&#34;&gt;funcharts&lt;/a&gt; v1.0.0: Provides functional control charts for statistical process monitoring of functional data, using the methods of &lt;a href=&#34;https://onlinelibrary.wiley.com/doi/abs/10.1002/asmb.2507&#34;&gt;Capezza et al. (2020)&lt;/a&gt; and &lt;a href=&#34;https://www.tandfonline.com/doi/abs/10.1080/00401706.2020.1753581?journalCode=utch20&#34;&gt;Centofanti et al. (2020)&lt;/a&gt;. There are vignettes on &lt;a href=&#34;https://cran.r-project.org/web/packages/funcharts/vignettes/capezza2020.html&#34;&gt;Capezza 2020&lt;/a&gt;, &lt;a href=&#34;https://cran.r-project.org/web/packages/funcharts/vignettes/centofanti2020.html&#34;&gt;Centofanti 2020&lt;/a&gt;, and the &lt;a href=&#34;https://cran.r-project.org/web/packages/funcharts/vignettes/mfd.html&#34;&gt;mfd class&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;funcharts.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=gghilbertstrings&#34;&gt;gghilbertstrings&lt;/a&gt; v0.3.3: Provides functions to plot Hilbert curves, which map one-dimensional data into the 2D plane. A specific use case maps a character column in a data frame into 2D space, allowing visual comparison of long lists of URLs, words, genes, or other data with a fixed order and position. See the &lt;a href=&#34;https://cran.r-project.org/web/packages/gghilbertstrings/readme/README.html&#34;&gt;README&lt;/a&gt; for examples.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;gghilbertstrings.png&#34; height = &#34;300&#34; width=&#34;500&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&#34;https://cran.r-project.org/package=mapsf&#34;&gt;mapsf&lt;/a&gt; v0.1.1: Provides functions to create and integrate thematic maps, including functions to design various cartographic representations such as proportional symbols, choropleth maps, and typology maps. See the &lt;a href=&#34;https://riatelab.github.io/mapsf&#34;&gt;package website&lt;/a&gt; for examples.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;mapsf.png&#34; height = &#34;400&#34; width=&#34;400&#34;&gt;&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; I have used phrases like &lt;em&gt;By my count&lt;/em&gt; and &lt;em&gt;stuck to CRAN&lt;/em&gt; in the past, but I do not believe that I have ever explained what I mean. For some time now, and I believe more frequently in recent months, packages have appeared as new on CRAN only to be removed within a relatively short period for failing to resolve check problems. If you happen to know about one of these packages and search for it by name on CRAN, you will receive the message:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Package XXXX was removed from the CRAN repository.
Formerly available versions can be obtained from the archive.
Archived on 2021-04-17 as check problems remained after update.
A summary of the most recent check results can be obtained from the check results archive.
Please use the canonical form &lt;a href=&#34;https://CRAN.R-project.org/package=XXXX&#34;&gt;https://CRAN.R-project.org/package=XXXX&lt;/a&gt; to link to this page.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My total count of new CRAN packages does not include the ten packages that were identified as new for March when I created my list of March packages on April 10, 2021, but that had been removed by the time I finalized the list for this post a week later. So, there is some instability in the notion of counting new packages in a given month.&lt;/p&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2021/04/22/march-2021-top-40-new-cran-packages/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Analysing the HIV pandemic, Part 4: Classification of lab samples</title>
      <link>https://rviews.rstudio.com/2019/05/23/pipeline-for-analysing-hiv-part-4/</link>
      <pubDate>Thu, 23 May 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/05/23/pipeline-for-analysing-hiv-part-4/</guid>
      <description>
        


&lt;p&gt;&lt;em&gt;Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this post, we complete our series on analysing the HIV pandemic in Africa. Previously, we covered the bigger picture of &lt;a href=&#34;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#34;&gt;HIV infection in Africa&lt;/a&gt; and a &lt;a href=&#34;https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/&#34;&gt;pipeline for drug resistance testing&lt;/a&gt; of samples in the lab.&lt;/p&gt;
&lt;p&gt;Then, in &lt;a href=&#34;https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/&#34;&gt;part 3&lt;/a&gt; we saw that sometimes the same patient’s genotype must be repeatedly analysed in the lab, from samples taken years apart.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Let’s say we have genotyped a patient five years ago and we have a current genotype sequence. It should be possible to retrieve the previous sequence from a database of sequences without relying on identifiers only or at all. Sometimes when someone remarries they may change their surname or transcription errors can be made, which makes finding previous samples tedious and error-prone. So instead of using patient information to look for previous samples to include, we can rather use the sequence data itself and then confirm the sequences belong to the same patient or investigate any irregularities. If we suspect mother-to-child transmission from our analysis, we confirm this with the healthcare worker who sent the sample.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this final part, we discuss how the inter- and intra-patient HIV genetic distances were analyzed using logistic regression to gain insights into the probability distribution of these two classes. In other words, the goal is to find a way to tell whether two genetic samples are from the same person or from two different people.&lt;/p&gt;
&lt;p&gt;Samples from the same person can have slightly different genetic sequences due to mutations and other errors. This variability is especially pronounced in retroviruses such as HIV, so any method for comparing their genetic material has to allow for small differences even within one patient.&lt;/p&gt;
&lt;div id=&#34;preliminary-analysis&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Preliminary analysis&lt;/h2&gt;
&lt;p&gt;To help answer this question, we downloaded data from the &lt;a href=&#34;https://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html&#34;&gt;Los Alamos HIV sequence database&lt;/a&gt; (specifically, &lt;em&gt;Virus HIV-1, subtype C, genetic region POL CDS&lt;/em&gt;).&lt;/p&gt;
&lt;p&gt;Each observation is the (dis)similarity distance between different samples.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(readr)
library(dplyr)
library(ggplot2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: package &amp;#39;ggplot2&amp;#39; was built under R version 3.5.2&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pt_distance &amp;lt;- 
  read_csv(&amp;quot;dist_sample_10.csv.zip&amp;quot;, col_types = &amp;quot;ccdccf&amp;quot;)

head(pt_distance)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 6
##   sample1                sample2                 distance sub   area  type 
##   &amp;lt;chr&amp;gt;                  &amp;lt;chr&amp;gt;                      &amp;lt;dbl&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;fct&amp;gt;
## 1 KI_797.67744.AB874124… KI_481.67593.AB873933.…   0.0644 B     INT   Inter
## 2 502-2794.39696.JF3202… WC3.27170.EF175209.B.U…   0.0418 B     INT   Inter
## 3 KI_882.67653.AB874186… KI_813.67589.AB874131.…   0.0347 B     INT   Inter
## 4 HTM360.13332.DQ322231… C11-2069070.63977.AB87…   0.0487 B     INT   Inter
## 5 O5598.34737.GQ372062.… LM49.4011.AF086817.B.T…   0.0360 B     INT   Inter
## 6 GKN.45901.HQ026515.B.… C11-2069083.65198.AB87…   0.0699 B     INT   Inter&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we plot a histogram of the distances between samples. It clearly shows that the distance between samples from the same subject (intra-patient) is smaller than the distance between samples from different subjects (inter-patient). This is not surprising.&lt;/p&gt;
&lt;p&gt;However, the histogram also shows that there is no sharp demarcation between these types. Simply eyeballing the data suggests that one could use an arbitrary threshold of around 0.025 to decide whether two samples are from the same person or from different people.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pt_distance %&amp;gt;% 
  mutate(
    type = forcats::fct_rev(type)
  ) %&amp;gt;% 
  ggplot(aes(x = distance, fill = type)) +
  geom_histogram(binwidth = 0.001) +
  facet_grid(rows = vars(type), scales = &amp;quot;free_y&amp;quot;) +
  scale_fill_manual(values = c(&amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;)) +
  coord_cartesian(xlim = c(0, 0.1)) +
  ggtitle(&amp;quot;Histogram of phylogenetic distance by type&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-21_analysing-hiv-pandemic-part-4/2019-05-21-analysing-hiv-pandemic-part-4_files/figure-html/histogram-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;modeling&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Modeling&lt;/h2&gt;
&lt;p&gt;Since we have &lt;strong&gt;two&lt;/strong&gt; sample types (intra-patient vs inter-patient), this is a binary classification problem.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Logistic_regression&#34;&gt;Logistic regression&lt;/a&gt; is a simple algorithm for binary classification, and a special case of a &lt;a href=&#34;https://en.wikipedia.org/wiki/Generalized_linear_model&#34;&gt;generalized linear model&lt;/a&gt; (&lt;strong&gt;GLM&lt;/strong&gt;). In &lt;strong&gt;R&lt;/strong&gt;, you can use the &lt;code&gt;glm()&lt;/code&gt; function to fit a GLM, and to specify a logistic regression, use the &lt;code&gt;family = binomial&lt;/code&gt; argument.&lt;/p&gt;
&lt;p&gt;In this case we want to train a model with &lt;code&gt;distance&lt;/code&gt; as the independent variable and &lt;code&gt;type&lt;/code&gt; as the dependent variable, i.e. &lt;code&gt;type ~ distance&lt;/code&gt;.&lt;/p&gt;
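&lt;p&gt;Because &lt;code&gt;type&lt;/code&gt; is a factor, &lt;code&gt;glm()&lt;/code&gt; treats its first level as failure and models the probability of the other level. The fitted probabilities are the logistic function of the linear predictor, &lt;code&gt;p = 1 / (1 + exp(-(b0 + b1 * distance)))&lt;/code&gt;. The sketch below uses simulated distances to show both points. It does not use the Los Alamos data; the means and standard deviations are arbitrary assumptions.&lt;/p&gt;

```r
set.seed(42)

# Simulated, illustrative distances (NOT the study data): intra-patient
# pairs are drawn to be closer on average than inter-patient pairs
n = 500
sim = data.frame(
  distance = c(abs(rnorm(n, mean = 0.01, sd = 0.006)),
               abs(rnorm(n, mean = 0.04, sd = 0.015))),
  type = factor(rep(c("Intra", "Inter"), each = n),
                levels = c("Inter", "Intra"))
)

# First level ("Inter") = failure, so glm() models P(type == "Intra")
fit = glm(type ~ distance, data = sim, family = binomial)
b = coef(fit)

# Fitted probabilities are the logistic function of the linear predictor
manual = 1 / (1 + exp(-(b[["(Intercept)"]] + b[["distance"]] * sim$distance)))
all.equal(unname(fitted(fit)), manual)  # TRUE
```

&lt;p&gt;This is why the plot further down is labelled as the predicted probability that type is intra, and why the &lt;code&gt;distance&lt;/code&gt; coefficient is negative.&lt;/p&gt;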
&lt;p&gt;We train on a random sample of 100,000 (&lt;code&gt;n = 1e5&lt;/code&gt;) observations, purely to reduce computation time:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pt_sample &amp;lt;- 
  pt_distance %&amp;gt;% 
  sample_n(1e5)
model &amp;lt;- glm(type ~ distance, data = pt_sample, family = binomial)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Note that the model sometimes throws a warning indicating numerical problems. This happens because the overlap between intra and inter is very small. When the classes are almost perfectly separated, many fitted probabilities are numerically indistinguishable from 0 or 1, and the logistic regression algorithm has trouble converging.)&lt;/p&gt;
&lt;p&gt;However, in this case the numerical problems don’t cause a practical problem with the model itself.&lt;/p&gt;
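&lt;p&gt;The warning is easy to reproduce with made-up data (not the study data). In the toy example below, every small value belongs to one class and every large value to the other. Under such perfect separation the maximum-likelihood slope is infinitely steep, so the fitting algorithm pushes the coefficients to very large values and the fitted probabilities hit 0 or 1 to machine precision. The warnings are collected and printed rather than raised.&lt;/p&gt;

```r
# Toy data with perfect separation: x up to 5 is always 0, from 6 on always 1
toy = data.frame(x = 1:10, y = rep(c(0, 1), each = 5))

# Collect warnings instead of letting them print, so we can inspect them
caught = new.env()
caught$msgs = character(0)
fit = withCallingHandlers(
  glm(y ~ x, data = toy, family = binomial),
  warning = function(w) {
    caught$msgs = c(caught$msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(caught$msgs)  # includes "fitted probabilities numerically 0 or 1 occurred"
print(coef(fit))    # a very steep slope
```

&lt;p&gt;In the HIV data the overlap is small but not zero. The estimates therefore stay finite, with the standard errors shown in the summary.&lt;/p&gt;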
&lt;p&gt;The model summary tells us that the &lt;code&gt;distance&lt;/code&gt; variable is highly significant (indicated by the ***):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary(model)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Call:
## glm(formula = type ~ distance, family = binomial, data = pt_sample)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.4035  -0.0050  -0.0010  -0.0002   8.4904  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(&amp;gt;|z|)    
## (Intercept)    5.7887     0.1796   32.23   &amp;lt;2e-16 ***
## distance    -355.1454     9.3247  -38.09   &amp;lt;2e-16 ***
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 23659.2  on 99999  degrees of freedom
## Residual deviance:  1440.5  on 99998  degrees of freedom
## AIC: 1444.5
## 
## Number of Fisher Scoring iterations: 12&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can use the model to compute a prediction for a range of genetic distances (from 0 to 0.05) and create a plot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;newdata &amp;lt;-  data.frame(distance = seq(0, 0.05, by = 0.001))
pred &amp;lt;- predict(model, newdata, type = &amp;quot;response&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot_inter &amp;lt;- 
  pt_sample %&amp;gt;% 
  filter(distance &amp;lt;= 0.05, type == &amp;quot;Inter&amp;quot;) %&amp;gt;% 
  sample_n(2000)
  
plot_intra &amp;lt;- 
  pt_sample %&amp;gt;% 
  filter(distance &amp;lt;= 0.05, type == &amp;quot;Intra&amp;quot;) %&amp;gt;% 
  sample_n(2000)

threshold &amp;lt;-  with(newdata, approx(pred, distance, xout = 0.5))$y

ggplot() +
  geom_point(data = plot_inter, aes(x = distance, y = 0), alpha = 0.05, col = &amp;quot;blue&amp;quot;) +
  geom_point(data = plot_intra, aes(x = distance, y = 1), alpha = 0.05, col = &amp;quot;red&amp;quot;) +
  geom_rug(data = plot_inter, aes(x = distance, y = 0), col = &amp;quot;blue&amp;quot;) +
  geom_rug(data = plot_intra, aes(x = distance, y = 0), col = &amp;quot;red&amp;quot;) +
  geom_line(data = newdata, aes(x = distance, y = pred)) +
  annotate(x = 0.005, y = 0.9, label = &amp;quot;Type == intra&amp;quot;, geom = &amp;quot;text&amp;quot;, col = &amp;quot;red&amp;quot;) +
  annotate(x = 0.04, y = 0.1, label = &amp;quot;Type == inter&amp;quot;, geom = &amp;quot;text&amp;quot;, col = &amp;quot;blue&amp;quot;) +
  geom_vline(xintercept = threshold, col = &amp;quot;grey50&amp;quot;) +
  ggtitle(&amp;quot;Model results&amp;quot;, subtitle = &amp;quot;Predicted probability that Type == &amp;#39;Intra&amp;#39;&amp;quot;) +
  xlab(&amp;quot;Phylogenetic distance&amp;quot;) +
  ylab(&amp;quot;Probability&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-21_analysing-hiv-pandemic-part-4/2019-05-21-analysing-hiv-pandemic-part-4_files/figure-html/predictionplot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Logistic regression essentially fits an S-shaped (sigmoid) curve that gives the probability of each class. In this case, for small distances (lower than ~0.01) the probability of being the same person (i.e., type is intra) is high, approaching 100% as the distance goes to zero. For distances greater than 0.03 the probability of being type intra is almost zero (i.e., the model predicts type inter).&lt;/p&gt;
&lt;p&gt;The model puts the distance threshold at approximately 0.016.&lt;/p&gt;
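&lt;p&gt;The threshold can also be read directly from the coefficients. The predicted probability is 0.5 exactly where the linear predictor is zero, i.e. at &lt;code&gt;distance = -b0 / b1&lt;/code&gt;. The quick check below hard-codes the rounded estimates printed by &lt;code&gt;summary(model)&lt;/code&gt; above, so it is a back-of-the-envelope sketch rather than a replacement for &lt;code&gt;predict()&lt;/code&gt;.&lt;/p&gt;

```r
# Rounded estimates copied from summary(model) above
b0 = 5.7887      # (Intercept)
b1 = -355.1454   # distance

# p = 0.5 where b0 + b1 * distance == 0
threshold = -b0 / b1
round(threshold, 4)  # about 0.0163

# Predicted probability of "Intra" at a few distances (logistic function)
distances = c(0, 0.01, 0.016, 0.03)
round(plogis(b0 + b1 * distances), 3)
```

&lt;p&gt;This agrees with the value of approximately 0.016 obtained by interpolating the prediction curve with &lt;code&gt;approx()&lt;/code&gt;. It sits well below the 0.025 suggested by eyeballing the histogram. The probabilities are above 0.9 at a distance of 0.01 and below 0.01 at 0.03.&lt;/p&gt;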
&lt;/div&gt;
&lt;div id=&#34;the-practical-value-of-this-work&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The practical value of this work&lt;/h2&gt;
&lt;p&gt;In part 2, we discussed how &lt;a href=&#34;https://journals.plos.org/plosone/article/authors?id=10.1371/journal.pone.0213241&#34;&gt;researchers&lt;/a&gt; developed an automated pipeline for phylogenetic analysis. The project was designed to run on the Raspberry Pi, a very low-cost computing device. This keeps the cost of implementation low, and the pipeline has been implemented at the &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt; in South Africa.&lt;/p&gt;
&lt;p&gt;In this part, we described the very simple logistic regression model that runs as part of the pipeline. In addition to the descriptive analysis, e.g., heat maps and trees (as described in part 3), this logistic regression predicts whether two samples were obtained from the same person or from two different people. This prediction helps laboratory staff identify potential contamination of samples. It can also match samples from people who weren’t matched properly by their name and other identifying information (e.g., because of spelling mistakes or name changes).&lt;/p&gt;
&lt;p&gt;Finally, it’s interesting to note that traditionally the decision whether two samples were intra-patient or inter-patient was based on heuristics rather than modelling. For example, a heuristic might say that if the genetic distance between two samples is less than 0.01, they should be considered a match from a single person.&lt;/p&gt;
&lt;p&gt;Heuristics are easy to implement in the lab, but over time the origin of a heuristic can get lost. It may then no longer be applicable to the current sample population.&lt;/p&gt;
&lt;p&gt;This modelling gave the researchers a tool to establish confidence intervals around predictions. In addition, it is now possible to refit the model for many different local sample populations of interest, and thus have a tool that is better able to discriminate given the most recent data.&lt;/p&gt;
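&lt;p&gt;One common way to put a confidence interval around a predicted probability from a logistic regression works in two steps. First, compute the interval on the link (log-odds) scale with &lt;code&gt;predict(..., se.fit = TRUE)&lt;/code&gt;. Then transform the limits with the logistic function, which keeps them between 0 and 1. The sketch below uses simulated data (the distributions are arbitrary assumptions) and a normal-approximation 95% interval. We do not claim that the pipeline uses exactly this approach.&lt;/p&gt;

```r
set.seed(1)

# Simulated, illustrative distances for the two classes (NOT the study data)
n = 500
sim = data.frame(
  distance = c(abs(rnorm(n, 0.01, 0.006)), abs(rnorm(n, 0.04, 0.015))),
  type = factor(rep(c("Intra", "Inter"), each = n), levels = c("Inter", "Intra"))
)
fit = glm(type ~ distance, data = sim, family = binomial)

# Predictions and standard errors on the log-odds scale
new_d = data.frame(distance = c(0.005, 0.015, 0.025))
link = predict(fit, newdata = new_d, type = "link", se.fit = TRUE)

# 95% interval on the link scale, mapped back to probabilities
z = qnorm(0.975)
ci = data.frame(
  distance = new_d$distance,
  prob  = plogis(link$fit),
  lower = plogis(link$fit - z * link$se.fit),
  upper = plogis(link$fit + z * link$se.fit)
)
ci
```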
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this multi-part series of HIV in Africa we covered four topics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#34;&gt;part 1&lt;/a&gt;, we analysed the incidence of HIV in sub-Saharan Africa, with special mention of the effect of the widespread availability of antiretroviral (ARV) drugs during 2004. Since then, HIV infection rates in South Africa have declined rapidly.&lt;/li&gt;
&lt;li&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/&#34;&gt;part 2&lt;/a&gt;, we described the PhyloPi project, a phylogenetic pipeline to analyse HIV in the lab that runs on the low-cost Raspberry Pi. This work was published in the &lt;a href=&#34;https://journals.plos.org/plosone/&#34;&gt;PLoS ONE journal&lt;/a&gt;: “&lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility&lt;/a&gt;”&lt;/li&gt;
&lt;li&gt;Then, &lt;a href=&#34;https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/&#34;&gt;part 3&lt;/a&gt; described the biological mechanism by which HIV mutates, how this can be modeled using a Markov chain, and how it can be visualized as heat maps and phylogenetic trees.&lt;/li&gt;
&lt;li&gt;This final part covered how we used a very simple logistic regression model to identify if two samples in the lab came from the same person or two different people.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;closing-thoughts&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Closing thoughts&lt;/h2&gt;
&lt;p&gt;Dear readers,&lt;/p&gt;
&lt;p&gt;I hope that you enjoyed this series on ‘Analysing the HIV pandemic’ using R and some of the tools available as part of the &lt;a href=&#34;https://www.tidyverse.org/&#34;&gt;&lt;code&gt;tidyverse&lt;/code&gt;&lt;/a&gt; packages. Learning R provided me not only with a tool set to analyse data problems, but also with a &lt;a href=&#34;https://stackoverflow.com/questions/tagged/r&#34;&gt;community&lt;/a&gt;. Being a biologist, I was not sure of the best approach for solving the problem of inter- and intra-patient genetic distances. I contacted Andrie from &lt;a href=&#34;https://resources.rstudio.com/authors/andrie-de-vries&#34;&gt;RStudio&lt;/a&gt;, and not only did he help us with this, but he was also excited about it. It was a pleasure telling you about our journey on this blog site, and a privilege doing this with experts.&lt;/p&gt;
&lt;p&gt;Armand&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/05/23/pipeline-for-analysing-hiv-part-4/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Analysing the HIV pandemic, Part 3: Genetic diversity</title>
      <link>https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/</link>
      <pubDate>Thu, 16 May 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/</guid>
      <description>
        
&lt;script src=&#34;/rmarkdown-libs/htmlwidgets/htmlwidgets.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/plotly-binding/plotly.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/typedarray/typedarray.min.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/jquery/jquery.min.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;/rmarkdown-libs/crosstalk/css/crosstalk.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;/rmarkdown-libs/crosstalk/js/crosstalk.min.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;/rmarkdown-libs/plotly-htmlwidgets-css/plotly-htmlwidgets.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;/rmarkdown-libs/plotly-main/plotly-latest.min.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;&lt;em&gt;Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio&lt;/em&gt;&lt;/p&gt;
&lt;div id=&#34;recap&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Recap&lt;/h2&gt;
&lt;p&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/&#34;&gt;part 2 of this series&lt;/a&gt;, we discussed the &lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;PhyloPi&lt;/a&gt; pipeline for conducting routine HIV phylogenetics in the drug-resistance testing laboratory as a part of quality control. As mentioned, during HIV replication the error-prone viral reverse transcriptase (RT) converts its RNA genome into DNA before it can be integrated into the host cell genome. During this conversion, the enzyme makes random mistakes in the copying process. These mistakes, or mutations, can be deleterious, beneficial or may have no measurable impact on the replicative fitness of the virus. However, the fast rate of mutation provides enough divergence to be useful for phylogenetic analysis.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;As infections spread from person to person, the virus continues to mutate and become more and more divergent. This allows us to take the genetic information obtained during drug resistance testing and analyse the sequences for abnormalities.&lt;/p&gt;
&lt;p&gt;We showed how DNA sequences can be aligned. We also showed how, based on the composition of the ‘columns’ in these strings, a matrix of distances between every pair of strings can be calculated. In the example we discussed in part 2, we used a very simple method for scoring matches: each position scored either a one or a zero. We can get closer to the truth by using substitution models, as we will explain below. Many machine learning algorithms likewise require that one first calculate the distances between all pairs of observations, and the choice of distance measure is up to the analyst. Distance-based phylogenetic inference is very similar: a distance matrix is constructed first, and the tree is calculated from it.&lt;/p&gt;
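&lt;p&gt;One way to turn those ones and zeros into a distance is to take the proportion of aligned columns at which two sequences differ, often called the p-distance. Computing it for every pair gives a distance matrix. The base-R sketch below does this for three made-up sequences. Gaps and ambiguity codes are ignored for simplicity, and the helper &lt;code&gt;p_distance()&lt;/code&gt; is hypothetical.&lt;/p&gt;

```r
# Made-up aligned sequences of equal length (no gaps)
seqs = c(s1 = "ACGTACGTAC",
         s2 = "ACGTACGTAT",
         s3 = "GCGTTCGAAT")

# Proportion of aligned positions at which two sequences differ
p_distance = function(a, b) {
  x = strsplit(a, "")[[1]]
  y = strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  mean(x != y)
}

# Pairwise distance matrix
d = sapply(seqs, function(a) sapply(seqs, function(b) p_distance(a, b)))
round(d, 2)
```

&lt;p&gt;Substitution models refine this raw proportion. They correct for multiple substitutions at the same site and for the fact that some kinds of substitution are more likely than others.&lt;/p&gt;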
&lt;p&gt;If the sequence targeted for phylogenetic inference is very stable with little or no evolution, the distances calculated will be zero or very close to it. This will not allow for differentiation. However, as we mentioned, HIV has a very fast rate of evolution due to its error-prone reverse transcriptase.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002251&#34;&gt;Cuevas&lt;/a&gt; &lt;em&gt;et al.&lt;/em&gt; (2015) published work on the &lt;em&gt;in vivo&lt;/em&gt; rate of HIV evolution. Their analysis revealed a mutation rate of &lt;span class=&#34;math inline&#34;&gt;\(4.1 \cdot 10^{-3}\)&lt;/span&gt; per base per cell (&lt;span class=&#34;math inline&#34;&gt;\(sd=1.7 \cdot 10^{-3}\)&lt;/span&gt;), the highest reported for any biological entity. However, the error-prone reverse transcriptase is not the only mechanism of mutation. One defence against HIV infection is a family of enzymes called apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like, or &lt;strong&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/APOBEC3G&#34;&gt;APOBEC&lt;/a&gt;&lt;/strong&gt;. These enzymes convert (deaminate) cytidine to uridine (uridine is the RNA counterpart of thymidine in DNA). In HIV, this editing takes place on the single-stranded viral minus-strand DNA during reverse transcription. The result is G-to-A mutations in the plus-strand (coding) sequence.&lt;/p&gt;
&lt;p&gt;As also shown by Cuevas &lt;em&gt;et al.&lt;/em&gt;, these enzymes are not equally active in all people. The viral Vif protein, on the other hand, counteracts this hypermutation by ‘tagging’ the APOBEC protein with ubiquitin, which marks it for degradation by the cytoplasmic ubiquitin-dependent proteasome machinery.&lt;/p&gt;
&lt;p&gt;But how does this virus-driven mutation, or APOBEC-driven hypermutation, affect the virus in a negative (or positive) way?&lt;/p&gt;
&lt;p&gt;We first need to understand how RNA is translated into proteins. Below is a table showing the codon combinations for each of the 20 amino acids.&lt;/p&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span id=&#34;fig:codons&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;/post/2019-05-14-analysis-hiv-pandemic-part-3_files/codon-table-by-sabal-edu.jpg&#34; alt=&#34;Amino acid encoding. Available at https://www.biologyjunction.com/protein-synthesis-worksheet/&#34; width=&#34;80%&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Amino acid encoding. Available at &lt;a href=&#34;https://www.biologyjunction.com/protein-synthesis-worksheet/&#34; class=&#34;uri&#34;&gt;https://www.biologyjunction.com/protein-synthesis-worksheet/&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;As can be seen from the table above, some amino acids are encoded by more than one codon. For example, if we change the codon CGU to AGA, the resulting amino acid stays Arginine or R. This is referred to as a silent mutation, since the resulting protein will look the same. On the other hand, if we mutate AGU to CGU, the resulting mutation is from Serine to Arginine, or in single-letter notation, &lt;strong&gt;S to R&lt;/strong&gt;. A change in the amino acid is referred to as a non-synonymous mutation.&lt;/p&gt;
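&lt;p&gt;The same lookup can be done in a few lines of base R. The sketch below builds the standard genetic code from the usual compact 64-letter string, with codons ordered U, C, A, G at each position and * marking a stop. It then classifies the two example mutations from the paragraph above. The helper names are hypothetical; in practice a package such as Biostrings ships the genetic code ready-made.&lt;/p&gt;

```r
bases = c("U", "C", "A", "G")

# Standard genetic code: first base varies slowest, third base fastest
codons = paste0(rep(bases, each = 16),
                rep(rep(bases, each = 4), times = 4),
                rep(bases, times = 16))
amino_acids = strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
genetic_code = setNames(amino_acids, codons)

# Hypothetical helper: label a codon change as silent or non-synonymous
classify_mutation = function(from, to) {
  aa_from = genetic_code[[from]]
  aa_to = genetic_code[[to]]
  type = if (aa_from == aa_to) "silent" else "non-synonymous"
  paste0(from, " (", aa_from, ") -> ", to, " (", aa_to, "): ", type)
}

classify_mutation("CGU", "AGA")  # arginine stays arginine
classify_mutation("AGU", "CGU")  # serine becomes arginine
```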
&lt;/div&gt;
&lt;div id=&#34;example&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Example&lt;/h2&gt;
&lt;p&gt;In reality, the APOBEC enzyme recognizes specific sequence motifs, but just to give an idea of how this works, let’s look at an example.&lt;/p&gt;
&lt;p&gt;Load some packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ape)
library(Biostrings)
library(tibble)
library(tidyr)
library(dplyr)
library(knitr)
library(plotly)
library(RColorBrewer)
library(diagram)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Create an RNA sequence (remember that RNA uses &lt;code&gt;U&lt;/code&gt; where DNA uses &lt;code&gt;T&lt;/code&gt;):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;WT &amp;lt;- c(&amp;quot;CGA&amp;quot;, &amp;quot;GUU&amp;quot;, &amp;quot;AUA&amp;quot;, &amp;quot;GAG&amp;quot;, &amp;quot;UGG&amp;quot;, &amp;quot;AGU&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code above creates the sequence CGAGUUAUAGAGUGGAGU, split into codons for clarity. We can now translate this sequence, either by hand with the codon table or with a function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;translate_dna_sequence &amp;lt;- function(x){
  x %&amp;gt;% 
    paste0(collapse = &amp;quot;&amp;quot;) %&amp;gt;% 
    gsub(&amp;quot;U&amp;quot;, &amp;quot;T&amp;quot;, .) %&amp;gt;% 
    DNAString() %&amp;gt;% 
    as.DNAbin() %&amp;gt;% 
    trans() %&amp;gt;% 
    .[[1]] %&amp;gt;% 
    as.character.AAbin()
}

AA &amp;lt;- WT %&amp;gt;% translate_dna_sequence()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code block above translated our RNA sequence into a protein sequence: R, V, I, E, W, S.&lt;/p&gt;
&lt;p&gt;Now let’s mutate all occurrences of &lt;code&gt;C&lt;/code&gt; to &lt;code&gt;U/T&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;MUT &amp;lt;- gsub(&amp;quot;C&amp;quot;, &amp;quot;U&amp;quot;, WT)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting mutant sequence is: UGA, GUU, AUA, GAG, UGG, AGU, and if we now translate that, we get …&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;AA &amp;lt;- MUT %&amp;gt;% translate_dna_sequence()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;… the protein sequence: *, V, I, E, W, S.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;*&lt;/code&gt; means a &lt;em&gt;stop codon&lt;/em&gt; was introduced. Stop codons are responsible for terminating translation from RNA to protein. If one of the viral genes has a premature stop codon in it, the protein will be truncated and will most likely be dysfunctional. Mutations that do not introduce stop codons can also have a negative effect on the virus, or they can cause resistance to an ARV.&lt;/p&gt;
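&lt;p&gt;Premature termination is easy to express in code: translation simply ends at the first stop codon. A hypothetical helper that truncates a translated sequence, using &lt;code&gt;*&lt;/code&gt; for stop as above, might look like this:&lt;/p&gt;

```r
# Hypothetical helper: keep amino acids up to (not including) the first stop
truncate_at_stop = function(aa) {
  stop_pos = match("*", aa)
  if (is.na(stop_pos)) aa else head(aa, stop_pos - 1)
}

truncate_at_stop(c("R", "V", "I", "E", "W", "S"))  # wild type: unchanged
truncate_at_stop(c("*", "V", "I", "E", "W", "S"))  # mutant: empty, nothing is made
```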
&lt;/div&gt;
&lt;div id=&#34;calculating-genetic-distances-from-a-multiple-sequence-alignment-msa&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Calculating genetic distances from a multiple sequence alignment (MSA)&lt;/h2&gt;
&lt;p&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/&#34;&gt;part 2&lt;/a&gt;, we showed the general principle of an MSA. In biology, sequence alignments are used to look at similarities between DNA or protein sequences. For most phylogenetic analyses, a multiple sequence alignment is a requirement, and the more accurate the MSA, the more accurate the phylogenetic inference.&lt;/p&gt;
&lt;p&gt;First, we read in the multiple sequence alignment file.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Read in the alignment file
aln &amp;lt;- read.dna(&amp;#39;example.aln&amp;#39;, format = &amp;#39;fasta&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we can calculate the distance matrix using the Kimura two-parameter (K80) model. Many DNA substitution models are available, and K80, like most of them, is based on &lt;a href=&#34;https://en.wikipedia.org/wiki/Markov_chain&#34;&gt;Markov chains&lt;/a&gt;. Remember:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“All models are wrong, but some are useful” - &lt;a href=&#34;https://en.wikipedia.org/wiki/George_E._P._Box&#34;&gt;George Box&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is &lt;strong&gt;very&lt;/strong&gt; true when it comes to estimating genetic distances and phylogenetic inference. Consider the image below:&lt;/p&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span id=&#34;fig:sumbstetutions&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;/post/2019-05-14-analysis-hiv-pandemic-part-3_files/1024px-All_transitions_and_transversions.svg.png&#34; alt=&#34;transversions vs transitions. Available at https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/All_transitions_and_transversions.svg/1024px-All_transitions_and_transversions.svg.png&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: transversions vs transitions. Available at &lt;a href=&#34;https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/All_transitions_and_transversions.svg/1024px-All_transitions_and_transversions.svg.png&#34; class=&#34;uri&#34;&gt;https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/All_transitions_and_transversions.svg/1024px-All_transitions_and_transversions.svg.png&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The figure above shows transition and transversion events. &lt;strong&gt;Transitions&lt;/strong&gt; between &lt;strong&gt;A&lt;/strong&gt; and &lt;strong&gt;G&lt;/strong&gt; (the purines) and between &lt;strong&gt;C&lt;/strong&gt; and &lt;strong&gt;T&lt;/strong&gt; (the pyrimidines) are more likely than &lt;strong&gt;transversions&lt;/strong&gt; (indicated by the red arrows). The K80 model accounts for this with separate parameters for transitions and transversions, and these rates, or probabilities, can be calculated or estimated by maximum likelihood.&lt;/p&gt;
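&lt;p&gt;To see how K80 uses the transition/transversion distinction, here is a minimal base-R sketch of the standard K80 distance, &lt;span class=&#34;math inline&#34;&gt;\(d = -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)\)&lt;/span&gt;. Here P and Q are the proportions of aligned sites that differ by a transition and by a transversion, respectively. The sequences are made up, gaps and ambiguous bases are not handled, and &lt;code&gt;k80_distance()&lt;/code&gt; is a hypothetical helper. For real alignments, ape’s &lt;code&gt;dist.dna()&lt;/code&gt; with &lt;code&gt;model = &amp;quot;K80&amp;quot;&lt;/code&gt; does this work.&lt;/p&gt;

```r
# Hypothetical helper: Kimura two-parameter (K80) distance between two
# aligned DNA sequences of equal length (no gaps or ambiguity codes)
k80_distance = function(a, b) {
  x = strsplit(toupper(a), "")[[1]]
  y = strsplit(toupper(b), "")[[1]]
  stopifnot(length(x) == length(y))

  # Purines (A, G) vs pyrimidines (C, T)
  class_x = ifelse(x %in% c("A", "G"), "purine", "pyrimidine")
  class_y = ifelse(y %in% c("A", "G"), "purine", "pyrimidine")
  differs = x != y

  P = mean(differs * (class_x == class_y))  # transitions
  Q = mean(differs * (class_x != class_y))  # transversions

  # Returns NaN if the sequences are too divergent (saturated)
  -0.5 * log(1 - 2 * P - Q) - 0.25 * log(1 - 2 * Q)
}

k80_distance("ACGTACGTAC", "GCGTACGTAT")  # two transitions
k80_distance("ACGTACGTAC", "CCGTACGTAA")  # two transversions
```

&lt;p&gt;Both pairs differ at two of ten sites, yet they get different distances. That difference is exactly what the two parameters capture.&lt;/p&gt;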
&lt;p&gt;Let’s see what that looks like:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tmDNA &amp;lt;- matrix(c(0.8,0.05,0.1,0.05,
                  0.05,0.8,0.05,0.1,
                  0.1,0.05,0.8,0.05,
                  0.05,0.1,0.05,0.8),
                nrow = 4, byrow = TRUE)
stateNames &amp;lt;- c(&amp;quot;A&amp;quot;,&amp;quot;C&amp;quot;,&amp;quot;G&amp;quot;, &amp;quot;T&amp;quot;)
dimnames(tmDNA) &amp;lt;- list(stateNames, stateNames)

tmDNA %&amp;gt;% 
  kable(
    caption = &amp;quot;Example K80 probabilities of transitions or transversions&amp;quot;
  )&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-6&#34;&gt;Table 1: &lt;/span&gt;Example K80 probabilities of transitions or transversions&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;A&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;C&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;G&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;T&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.80&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.80&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;G&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.80&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;T&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.05&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.80&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plotmat(tmDNA,pos = c(2,2), 
        lwd = 1, box.lwd = 2, 
        cex.txt = 0.8, 
        box.size = 0.1, 
        box.type = &amp;quot;circle&amp;quot;, 
        box.prop = 0.5,
        box.col = &amp;quot;light blue&amp;quot;,
        arr.length=.1,
        arr.width=.1,
        self.cex = .6,
        self.shifty = -.01,
        self.shiftx = .14,
        main = &amp;quot;Markov Chain&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-14_analysing-hiv-pandemic-part-3/2019-05-14-analysing-hiv-pandemic-part-3_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
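&lt;p&gt;To see the chain in action, we can simulate a single round of copying. This is a minimal sketch under simplifying assumptions, not part of the pipeline: every site is copied independently, and each new base is drawn from the row of the contrived &lt;code&gt;tmDNA&lt;/code&gt; matrix that matches the old base. The helper &lt;code&gt;copy_sequence()&lt;/code&gt; and the random 1000-base template are hypothetical.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical sketch: copy a sequence with one Markov chain step per site
set.seed(42)

stateNames = c("A", "C", "G", "T")
tmDNA = matrix(c(0.80, 0.05, 0.10, 0.05,
                 0.05, 0.80, 0.05, 0.10,
                 0.10, 0.05, 0.80, 0.05,
                 0.05, 0.10, 0.05, 0.80),
               nrow = 4, byrow = TRUE,
               dimnames = list(stateNames, stateNames))

# For each old base, draw the new base using that base's row of probabilities
copy_sequence = function(template, P) {
  vapply(template, function(base) {
    sample(colnames(P), size = 1, prob = P[base, ])
  }, character(1), USE.NAMES = FALSE)
}

template = sample(stateNames, size = 1000, replace = TRUE)
copied = copy_sequence(template, tmDNA)

# Row proportions: how often each old base (row) became each new base (column)
observed = prop.table(table(old = template, new = copied), margin = 1)
round(observed, 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With 1000 sites, the row proportions in &lt;code&gt;observed&lt;/code&gt; should land close to the probabilities in &lt;code&gt;tmDNA&lt;/code&gt;, apart from random sampling noise.&lt;/p&gt;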
&lt;p&gt;This example is contrived, but it illustrates the concept of a substitution model. The viral reverse transcriptase is not a random sequence generator, but it does make mistakes. Most of the time, when it copies the RNA into DNA, the base (state) stays the same. In addition, transitions and transversions occur with different probabilities. If you look at the figure above where we introduced transitions and transversions, you will notice that A is more similar to G, and T is more similar to C, in chemical structure.&lt;/p&gt;
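&lt;p&gt;The actual K80 model is specified through instantaneous rates rather than a fixed probability table: each transition occurs at kappa times the rate of each transversion, and the substitution probabilities after some time are given by the matrix exponential of the rate matrix. The sketch below assumes illustrative values (&lt;code&gt;kappa = 4&lt;/code&gt;, &lt;code&gt;time = 0.1&lt;/code&gt; in arbitrary units, transversion rate fixed at 1) that are not estimates from the HIV data; &lt;code&gt;k80_rate_matrix()&lt;/code&gt; and &lt;code&gt;k80_transition_probs()&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical sketch of the K80 (Kimura 1980) substitution model.
# Transversion rate fixed at 1; transitions (A-G, C-T) occur at kappa times that rate.
k80_rate_matrix = function(kappa) {
  bases = c("A", "C", "G", "T")
  Q = matrix(1, nrow = 4, ncol = 4, dimnames = list(bases, bases))
  Q["A", "G"] = kappa
  Q["G", "A"] = kappa
  Q["C", "T"] = kappa
  Q["T", "C"] = kappa
  diag(Q) = 0
  diag(Q) = -rowSums(Q)  # each row of a rate matrix sums to zero
  Q
}

# P(time) = exp(Q * time); Q is symmetric, so its eigendecomposition suffices
k80_transition_probs = function(kappa, time) {
  Q = k80_rate_matrix(kappa)
  e = eigen(Q, symmetric = TRUE)
  P = e$vectors %*% diag(exp(e$values * time)) %*% t(e$vectors)
  dimnames(P) = dimnames(Q)
  P
}

# Illustrative values only, not estimates from the HIV data
P = k80_transition_probs(kappa = 4, time = 0.1)
round(P, 3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the rate matrix is symmetric, base R is enough to compute the exponential. Each row of &lt;code&gt;P&lt;/code&gt; sums to one, and the A-to-G and C-to-T entries exceed the transversion entries, mirroring the pattern of the contrived table above.&lt;/p&gt;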
&lt;p&gt;There are many other &lt;a href=&#34;http://www.iqtree.org/doc/Substitution-Models&#34;&gt;substitution models&lt;/a&gt;. Selecting the best model for phylogenetic inference is not always trivial. One technique is to run several maximum likelihood phylogenetic analyses, each with a different model, and then pick the model with the lowest AIC (Akaike Information Criterion), which rewards goodness of fit but penalises extra parameters. For our pipeline, we selected the rather simple K80 model. Since each submission contains a different set of sequences, a simple model is probably the safer choice, as it is less prone to overfitting.&lt;/p&gt;
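&lt;p&gt;As a rough sketch of AIC-based model selection, the &lt;code&gt;modelTest()&lt;/code&gt; function from the &lt;code&gt;phangorn&lt;/code&gt; package fits several models by maximum likelihood and reports their AIC. This illustrates the general technique only: it uses the &lt;code&gt;woodmouse&lt;/code&gt; alignment bundled with &lt;code&gt;ape&lt;/code&gt; as a stand-in for the HIV sequences, and the four candidate models are an arbitrary choice.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Sketch only: compare a few substitution models by AIC with phangorn.
# ape's bundled woodmouse alignment stands in for the HIV sequences.
library(ape)
library(phangorn)

data(woodmouse)
dna = as.phyDat(woodmouse)

# Fit each model by maximum likelihood; G = FALSE and I = FALSE skip the
# gamma-rate and invariant-site variants to keep the run short
mt = modelTest(dna, model = c("JC", "K80", "HKY", "GTR"), G = FALSE, I = FALSE)

mt[, c("Model", "df", "logLik", "AIC")]

# The preferred model is the one with the lowest AIC
best = mt$Model[which.min(mt$AIC)]
best&lt;/code&gt;&lt;/pre&gt;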
&lt;p&gt;We can use the &lt;code&gt;dist.dna()&lt;/code&gt; function from the &lt;code&gt;ape&lt;/code&gt; package to calculate pairwise distances under the &lt;code&gt;K80&lt;/code&gt; model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Calculate pairwise genetic distances under the K80 model; as.matrix = TRUE returns a full square matrix, which makes the later steps easier
alnDist &amp;lt;- dist.dna(aln, model = &amp;quot;K80&amp;quot;, as.matrix = TRUE)
alnDist[1:5, 1:5] %&amp;gt;% 
  kable(caption = &amp;quot;First few rows of our distance matrix&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-8&#34;&gt;Table 2: &lt;/span&gt;First few rows of our distance matrix&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;01_AE.JP.AB253686_INT&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;B.US.HM450245_INT&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;B.AU.AF407664_INT&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;B.CN.KJ820110_INT&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;B.RU.HM466986_INT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;01_AE.JP.AB253686_INT&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0935626&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0961965&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0962887&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0962887&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;B.US.HM450245_INT&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0935626&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378446&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378167&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378748&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;B.AU.AF407664_INT&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0961965&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378446&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0454602&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0494138&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;B.CN.KJ820110_INT&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0962887&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378167&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0454602&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0479955&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;B.RU.HM466986_INT&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0962887&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0378748&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0494138&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0479955&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The full matrix is 47 by 47, so we only preview the first five rows and columns.&lt;/p&gt;
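&lt;p&gt;To make the K80 distance less of a black box, here is a minimal hand-rolled version for two aligned sequences. With &lt;em&gt;P&lt;/em&gt; the proportion of sites that differ by a transition and &lt;em&gt;Q&lt;/em&gt; the proportion that differ by a transversion, the K80 distance is d = -0.5 ln(1 - 2P - Q) - 0.25 ln(1 - 2Q). The sketch assumes gap-free, unambiguous lowercase sequences; the helper &lt;code&gt;k80_distance()&lt;/code&gt; and the two 20-base toy sequences are hypothetical, and the result is checked against &lt;code&gt;dist.dna()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ape)

# Hypothetical helper: K80 distance between two aligned, gap-free sequences
# given as vectors of lowercase bases
k80_distance = function(x, y) {
  n = length(x)
  differs = x != y
  # TRUE where both bases are purines (a, g) or both are pyrimidines (c, t)
  same_class = (x %in% c("a", "g")) == (y %in% c("a", "g"))
  P = sum(differs[same_class]) / n   # proportion of transitions
  Q = sum(differs[!same_class]) / n  # proportion of transversions
  -0.5 * log(1 - 2 * P - Q) - 0.25 * log(1 - 2 * Q)
}

# Two toy sequences: two transitions (a-g, g-a) and one transversion (g-c)
s1 = strsplit("acgtacgtacgtacgtacgt", "")[[1]]
s2 = strsplit("acgtgcgtacatacgtacct", "")[[1]]

d_hand = k80_distance(s1, s2)

# Same pair through ape for comparison
aln2 = as.matrix(as.DNAbin(list(seq1 = s1, seq2 = s2)))
d_ape = as.numeric(dist.dna(aln2, model = "K80"))

c(by_hand = d_hand, ape = d_ape)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For these toy sequences there are two transitions and one transversion in 20 sites, so &lt;em&gt;P&lt;/em&gt; = 0.1, &lt;em&gt;Q&lt;/em&gt; = 0.05 and the distance is about 0.170.&lt;/p&gt;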
&lt;/div&gt;
&lt;div id=&#34;reduction-of-the-heatmap-to-focus-on-the-important-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Reduction of the heatmap to focus on the important data&lt;/h2&gt;
&lt;p&gt;The pipeline uses the &lt;strong&gt;Basic Local Alignment Search Tool&lt;/strong&gt; (BLAST) to retrieve previously sampled sequences and adds them to the analysis. &lt;a href=&#34;https://blast.ncbi.nlm.nih.gov/Blast.cgi&#34;&gt;BLAST&lt;/a&gt; is like a web search engine, but for protein or DNA sequences. This way, relevant sequences from earlier samples are included, which lets PhyloPi take past sequences into account instead of treating each batch in isolation. Have a look at the &lt;a href=&#34;https://journals.plos.org/plosone/article/comments?id=10.1371/journal.pone.0213241&#34;&gt;paper&lt;/a&gt; for some examples.&lt;/p&gt;
&lt;p&gt;The data is now ready for plotting as a heatmap. However, since it also contains previously sampled sequences, comparing those sequences amongst themselves would be a distraction. We are interested in the previous samples only as they compare to the current batch being analysed. The figures below illustrate this.&lt;/p&gt;
&lt;hr /&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;
&lt;img src=&#34;/post/2019-05-14-analysis-hiv-pandemic-part-3_files/heatmap_full.png&#34; alt=&#34;A diagram of a heatmap with lots of redundant and distracting data. &#34; width=&#34;50%&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
A diagram of a heatmap with lots of redundant and distracting data.
&lt;/p&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;p&gt;As the image above shows, the heatmap is symmetrical about the diagonal, as is typical of a distance matrix. Submitted &lt;em&gt;vs&lt;/em&gt; retrieved samples appear in both the horizontal and the vertical direction. Notice also the region annotated as “Distraction”, where the previous samples are compared amongst themselves. We are not interested in those comparisons now, as we would already have acted on any issues when those samples were analysed. What we want instead is the heatmap depicted in the image below.&lt;/p&gt;
&lt;hr /&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;
&lt;img src=&#34;/post/2019-05-14-analysis-hiv-pandemic-part-3_files/heatmap_focused.png&#34; alt=&#34;A diagram of a more focussed heatmap with the redundant and distracting data removed.&#34; width=&#34;50%&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
A diagram of a more focussed heatmap with the redundant and distracting data removed.
&lt;/p&gt;
&lt;/div&gt;
&lt;hr /&gt;
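&lt;p&gt;Conceptually, the reduction keeps only the rows of the distance matrix that belong to submitted samples and blanks out each sample compared with itself. A minimal base-R sketch on a hypothetical 4 by 4 matrix (names &lt;code&gt;q1&lt;/code&gt;, &lt;code&gt;q2&lt;/code&gt; for submitted and &lt;code&gt;r1&lt;/code&gt;, &lt;code&gt;r2&lt;/code&gt; for retrieved samples, with made-up distances) could look like this:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical 4 x 4 distance matrix: q1, q2 were submitted, r1, r2 retrieved
ids = c("q1", "q2", "r1", "r2")
d = matrix(c(0.00, 0.01, 0.05, 0.04,
             0.01, 0.00, 0.03, 0.06,
             0.05, 0.03, 0.00, 0.02,
             0.04, 0.06, 0.02, 0.00),
           nrow = 4, byrow = TRUE, dimnames = list(ids, ids))
submitted = c("q1", "q2")

# Keep only rows for submitted samples: retrieved-vs-retrieved pairs disappear
focused = d[submitted, , drop = FALSE]

# Blank out self-comparisons (index by a character matrix of row, column names)
focused[cbind(submitted, submitted)] = NA
focused&lt;/code&gt;&lt;/pre&gt;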
&lt;p&gt;Fortunately, &lt;strong&gt;R&lt;/strong&gt; and convenient packages like &lt;code&gt;dplyr&lt;/code&gt; and &lt;code&gt;tidyr&lt;/code&gt; make this easy to fix. First, we reshape the square distance matrix into a long table with one row per pair of samples, sorted by distance.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;alnDistLong &amp;lt;- 
  alnDist %&amp;gt;% 
  as.data.frame(stringsAsFactors = FALSE) %&amp;gt;% 
  rownames_to_column(var = &amp;quot;sample_1&amp;quot;) %&amp;gt;% 
  gather(key = &amp;quot;sample_2&amp;quot;, value = &amp;quot;distance&amp;quot;, -sample_1, na.rm = TRUE) %&amp;gt;% 
  arrange(distance)

alnDistLong %&amp;gt;% head()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                sample_1              sample_2 distance
## 1 01_AE.JP.AB253686_INT 01_AE.JP.AB253686_INT        0
## 2     B.US.HM450245_INT     B.US.HM450245_INT        0
## 3     B.AU.AF407664_INT     B.AU.AF407664_INT        0
## 4     B.CN.KJ820110_INT     B.CN.KJ820110_INT        0
## 5     B.RU.HM466986_INT     B.RU.HM466986_INT        0
## 6     B.US.DQ127546_INT     B.US.DQ127546_INT        0&lt;/code&gt;&lt;/pre&gt;
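&lt;p&gt;Since tidyr 1.0.0, &lt;code&gt;gather()&lt;/code&gt; has been superseded by &lt;code&gt;pivot_longer()&lt;/code&gt;. A sketch of the same reshaping step with the newer function, on a hypothetical 3 by 3 matrix standing in for &lt;code&gt;alnDist&lt;/code&gt;, could look like this:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tibble)
library(tidyr)

# Hypothetical 3 x 3 distance matrix standing in for alnDist
ids = c("s1", "s2", "s3")
m = matrix(c(0.00, 0.02, 0.05,
             0.02, 0.00, 0.04,
             0.05, 0.04, 0.00),
           nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# One row per (sample_1, sample_2) pair; values_drop_na mirrors na.rm = TRUE
long = pivot_longer(
  rownames_to_column(as.data.frame(m), var = "sample_1"),
  cols = -sample_1,
  names_to = "sample_2",
  values_to = "distance",
  values_drop_na = TRUE
)
long&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The rows come out grouped by &lt;code&gt;sample_1&lt;/code&gt; rather than column by column as with &lt;code&gt;gather()&lt;/code&gt;, so sort explicitly if the order matters.&lt;/p&gt;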
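&lt;p&gt;Next, we read the names of the submitted samples from the original FASTA file and compute an axis order that lists them first, followed by the retrieved samples.&lt;/p&gt;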
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# get the names of samples originally in the fasta file used for submission
qSample &amp;lt;- names(read.dna(&amp;quot;example.fasta&amp;quot;, format = &amp;quot;fasta&amp;quot;))

# order the samples: submitted samples first (sorted), then the retrieved ones, matching the focused heatmap layout
sample_1 &amp;lt;- unique(alnDistLong$sample_1)
new_order &amp;lt;- c(sort(qSample), setdiff(sample_1, qSample))
new_order&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1] &amp;quot;01_AE.JP.AB253686_INT&amp;quot;  &amp;quot;01_AE.TH.JX448243_INT&amp;quot; 
##  [3] &amp;quot;01_AE.VN.LC100946_INT&amp;quot;  &amp;quot;38_BF1.UY.FJ213783_INT&amp;quot;
##  [5] &amp;quot;B.AU.AF407664_INT&amp;quot;      &amp;quot;B.CN.KJ820110_INT&amp;quot;     
##  [7] &amp;quot;B.KR.JN417106_INT&amp;quot;      &amp;quot;B.RU.HM466986_INT&amp;quot;     
##  [9] &amp;quot;B.US.DQ127546_INT&amp;quot;      &amp;quot;B.US.GU076504_INT&amp;quot;     
## [11] &amp;quot;B.US.HM450245_INT&amp;quot;      &amp;quot;BC.CN.JQ898256_INT&amp;quot;    
## [13] &amp;quot;C.ZA.KT183056_INT&amp;quot;      &amp;quot;C.ZM.KM049918_INT&amp;quot;     
## [15] &amp;quot;C.ZM.KM050042_INT&amp;quot;      &amp;quot;01_AE.TH.JX448252_INT&amp;quot; 
## [17] &amp;quot;01_AE.TH.JX448250_INT&amp;quot;  &amp;quot;01_AE.TH.JX448249_INT&amp;quot; 
## [19] &amp;quot;C.ZA.KT183058_INT&amp;quot;      &amp;quot;C.ZM.KM049913_INT&amp;quot;     
## [21] &amp;quot;B.KR.JN417120_INT&amp;quot;      &amp;quot;B.KR.JN417117_INT&amp;quot;     
## [23] &amp;quot;B.KR.JN417116_INT&amp;quot;      &amp;quot;57_BC.CN.JX679207_INT&amp;quot; 
## [25] &amp;quot;C.ZM.KM050043_INT&amp;quot;      &amp;quot;C.ZM.KM050041_INT&amp;quot;     
## [27] &amp;quot;01_AE.JP.AB253682_INT&amp;quot;  &amp;quot;01_AE.JP.AB253689_INT&amp;quot; 
## [29] &amp;quot;B.US.KJ704790_INT&amp;quot;      &amp;quot;B.ES.KC238594_INT&amp;quot;     
## [31] &amp;quot;B.AU.AF407665_INT&amp;quot;      &amp;quot;B.AU.AF407667_INT&amp;quot;     
## [33] &amp;quot;B.CN.KC987976_INT&amp;quot;      &amp;quot;B.CN.KT192001_INT&amp;quot;     
## [35] &amp;quot;B.US.AF040369_INT&amp;quot;      &amp;quot;B.US.M38429_INT&amp;quot;       
## [37] &amp;quot;B.US.DQ127547_INT&amp;quot;      &amp;quot;B.US.DQ127543_INT&amp;quot;     
## [39] &amp;quot;C.ZA.KT183062_INT&amp;quot;      &amp;quot;B.US.GU076505_INT&amp;quot;     
## [41] &amp;quot;B.US.GU076507_INT&amp;quot;      &amp;quot;C.ZM.KM049917_INT&amp;quot;     
## [43] &amp;quot;01_AE.CN.JQ302565_INT&amp;quot;  &amp;quot;01_AE.VN.FJ185234_INT&amp;quot; 
## [45] &amp;quot;F1.BR.FJ771006_INT&amp;quot;     &amp;quot;BF.AR.AF408631_INT&amp;quot;    
## [47] &amp;quot;BC.CN.KC898983_INT&amp;quot;&lt;/code&gt;&lt;/pre&gt;
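&lt;p&gt;Finally, we remove the distracting comparisons by keeping only the rows for submitted samples and dropping self-comparisons, and then plot the heatmap with &lt;code&gt;plotly&lt;/code&gt; for interactivity.&lt;/p&gt;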
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;alnDistLong %&amp;gt;% 
  filter(
    sample_1 %in% qSample,
    sample_1 != sample_2
    ) %&amp;gt;% 
  mutate(
    sample_2 = factor(sample_2, levels = new_order)
  ) %&amp;gt;% 
  plot_ly(
    x = ~sample_2,
    y = ~sample_1,
    z = ~distance,
    type = &amp;quot;heatmap&amp;quot;, colors = brewer.pal(11, &amp;quot;RdYlBu&amp;quot;), 
    zmin = 0.0, zmax = 0.03, xgap = 2, ygap = 1
  ) %&amp;gt;% 
  layout(
    margin = list(l = 100, r = 10, b = 100, t = 10, pad = 4), 
    yaxis = list(tickfont = list(size = 10), showspikes = TRUE),
    xaxis = list(tickfont = list(size = 10), showspikes = TRUE)
  )&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;htmlwidget-1&#34; style=&#34;width:672px;height:480px;&#34; class=&#34;plotly html-widget&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;htmlwidget-1&#34;&gt;{&#34;x&#34;:{&#34;visdat&#34;:{&#34;538659887c83&#34;:[&#34;function () &#34;,&#34;plotlyVisDat&#34;]},&#34;cur_data&#34;:&#34;538659887c83&#34;,&#34;attrs&#34;:{&#34;538659887c83&#34;:{&#34;x&#34;:{},&#34;y&#34;:{},&#34;z&#34;:{},&#34;zmin&#34;:0,&#34;zmax&#34;:0.03,&#34;xgap&#34;:2,&#34;ygap&#34;:1,&#34;colors&#34;:[&#34;#A50026&#34;,&#34;#D73027&#34;,&#34;#F46D43&#34;,&#34;#FDAE61&#34;,&#34;#FEE090&#34;,&#34;#FFFFBF&#34;,&#34;#E0F3F8&#34;,&#34;#ABD9E9&#34;,&#34;#74ADD1&#34;,&#34;#4575B4&#34;,&#34;#313695&#34;],&#34;alpha_stroke&#34;:1,&#34;sizes&#34;:[10,100],&#34;spans&#34;:[1,20],&#34;type&#34;:&#34;heatmap&#34;}},&#34;layout&#34;:{&#34;margin&#34;:{&#34;b&#34;:100,&#34;l&#34;:100,&#34;t&#34;:10,&#34;r&#34;:10,&#34;pad&#34;:4},&#34;yaxis&#34;:{&#34;domain&#34;:[0,1],&#34;automargin&#34;:true,&#34;tickfont&#34;:{&#34;size&#34;:10},&#34;showspikes&#34;:true,&#34;title&#34;:&#34;sample_1&#34;,&#34;type&#34;:&#34;category&#34;,&#34;categoryorder&#34;:&#34;array&#34;,&#34;categoryarray&#34;:[&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;]},&#34;xaxis&#34;:{&#34;domain&#34;:[0,1],&#34;automargin&#34;:true,&#34;tickfont&#34;:{&#34;size&#34;:10},&#34;showspikes&#34;:true,&#34;title&#34;:&#34;sample_2&#34;,&#34;type&#34;:&#34;category&#34;,&#34;categoryorder&#34;:&#34;array&#34;,&#34;categoryarray&#34;:[&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#
34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;BC.CN.KC898983_INT&#34;]},&#34;scene&#34;:{&#34;zaxis&#34;:{&#34;title&#34;:&#34;distance&#34;}},&#34;hovermode&#34;:&#34;closest&#34;,&#34;showlegend&#34;:false,&#34;legend&#34;:{&#34;yanchor&#34;:&#34;top&#34;,&#34;y&#34;:0.5}},&#34;source&#34;:&#34;A&#34;,&#34;config&#34;:{&#34;showSendToCloud&#34;:false},&#34;data&#34;:[{&#34;colorbar&#34;:{&#34;title&#34;:&#34;distance&#34;,&#34;ticklen&#34;:2,&#34;len&#34;:0.5,&#34;lenmode&#34;:&#34;fraction&#34;,&#34;y&#34;:1,&#34;yanchor&#34;:&#34;top&#34;},&#34;colorscale&#34;:[[&#34;0&#34;,&#34;rgba(165,0,38,1)&#34;],[&#34;0.0416666666666667&#34;,&#34;rgba(186,25,39,1)&#34;],[&#34;0.0833333333333333&#34;,&#34;rgba(207,42,39,1)&#34;],[&#34;0.125&#34;,&#34;rgba(222,66,46,1)&#34;],[&#34;0.166666666666667&#34;,&#34;rgba(235,91,57,1)&#34;],[&#34;0.208333333333333&#34;,&#34;rgba(245,115,69,1)&#34;],[&#34;0.25&#34;,&#
34;rgba(249,143,82,1)&#34;],[&#34;0.291666666666667&#34;,&#34;rgba(253,169,94,1)&#34;],[&#34;0.333333333333333&#34;,&#34;rgba(254,191,112,1)&#34;],[&#34;0.375&#34;,&#34;rgba(254,212,132,1)&#34;],[&#34;0.416666666666667&#34;,&#34;rgba(254,229,152,1)&#34;],[&#34;0.458333333333333&#34;,&#34;rgba(255,242,171,1)&#34;],[&#34;0.5&#34;,&#34;rgba(255,255,191,1)&#34;],[&#34;0.541666666666667&#34;,&#34;rgba(244,250,215,1)&#34;],[&#34;0.583333333333333&#34;,&#34;rgba(230,245,239,1)&#34;],[&#34;0.625&#34;,&#34;rgba(211,236,244,1)&#34;],[&#34;0.666666666666667&#34;,&#34;rgba(189,226,238,1)&#34;],[&#34;0.708333333333333&#34;,&#34;rgba(167,213,231,1)&#34;],[&#34;0.75&#34;,&#34;rgba(144,195,221,1)&#34;],[&#34;0.791666666666667&#34;,&#34;rgba(121,177,211,1)&#34;],[&#34;0.833333333333333&#34;,&#34;rgba(101,154,199,1)&#34;],[&#34;0.875&#34;,&#34;rgba(82,131,187,1)&#34;],[&#34;0.916666666666667&#34;,&#34;rgba(67,106,175,1)&#34;],[&#34;0.958333333333333&#34;,&#34;rgba(59,80,162,1)&#34;],[&#34;1&#34;,&#34;rgba(49,54,149,1)&#34;]],&#34;showscale&#34;:true,&#34;x&#34;:[&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.
AB253689_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.K
R.JN417106_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.AU.AF407665_INT
&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_I
NT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01
_AE.TH.JX448249_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&
#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;57_BC.CN.JX679207_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;BC.CN.KC898983_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.US.M38429_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#3
4;01_AE.JP.AB253682_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417117_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;B.KR.JN417116_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.US.DQ127543_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;01_AE.CN.JQ302565_INT&#34;,&#34;01_AE.VN.FJ185234_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;F1.BR.FJ771006_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34
;B.US.DQ127543_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.AU.AF407665_INT&#34;,&#34;B.US.DQ127547_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.KR.JN417120_INT&#34;,&#34;B.KR.JN417117_INT&#34;
,&#34;B.KR.JN417116_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;B.US.AF040369_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.US.KJ704790_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;BF.AR.AF408631_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;B.ES.KC238594_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183058_INT&#34;,&#34;B.US.GU076507_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;C.ZA.KT183062_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;B
.AU.AF407665_INT&#34;,&#34;B.AU.AF407667_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;01_AE.TH.JX448252_INT&#34;,&#34;01_AE.TH.JX448250_INT&#34;,&#34;01_AE.TH.JX448249_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050043_INT&#34;,&#34;C.ZM.KM050041_INT&#34;,&#34;B.CN.KT192001_INT&#34;,&#34;01_AE.JP.AB253682_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;01_AE.JP.AB253689_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076505_INT&#34;,&#34;C.ZM.KM049913_INT&#34;,&#34;B.CN.KC987976_INT&#34;,&#34;C.ZM.KM049917_INT&#34;,&#34;B.US.GU076507_INT&#34;],&#34;y&#34;:[&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,
&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT
&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B
.RU.HM466986_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZM.KM050
042_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ8982
56_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.KR.JN417106_INT&#
34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#3
4;B.CN.KJ820110_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM049918_
INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;BC.CN.JQ898256_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;B.KR.JN417106_INT&#34;,&#34;01_AE.VN.LC100946_INT&#
34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.US.DQ127546_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZA.KT183056_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&
#34;C.ZM.KM050042_INT&#34;,&#34;B.US.HM450245_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;01_AE.JP.AB253686_INT&#34;,&#34;01_AE.TH.JX448243_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM050042_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;01_AE.VN.LC100946_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;38_BF1.UY.FJ213783_INT&#34;,&#34;B.RU.HM466986_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;B.AU.AF407664_INT&#34;,&#34;B.CN.KJ820110_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM049918_INT&#34;,&#34;B.US.GU076504_INT&#34;,&#34;C.ZM.KM049918_INT&#34;],&#34;z&#34;:[0,0,0,0,0,0,0,0,0,0,0,0.00118588838545844,0.00118624017681518,0.00118624017681518,0.00237530137929893,0.00356295206894832,0.00356719705693705,0.00356719705693705,0.00595350496095845,0.00595954611860516,0.00715254630296478,0.00715254630296478,0.0144078126601721,0.0144078126601721,0.0155870201477025,0.0156273254598515,0.0156273254598515,0.0168178038999123,0.0168178038999123,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.0180260493744189,0.019250040193739,0.019250040193739,0.0204770346745223,0.0241565618280353,0.0241761884152448,0.0241761884152448,0.0241761884152448,0.0241761884152448,0.0241761884152448,
0.0242220267343138,0.0253584872200611,0.0253941862223067,0.0254642633981448,0.0254642633981448,0.0254642633981448,0.0266143894883516,0.0266348816479994,0.0278192110973387,0.0278192110973387,0.0278786633839003,0.0290614477611697,0.0291019568084084,0.0291255468231015,0.0291255468231015,0.0302883205933425,0.030350396907702,0.030350396907702,0.0316019620167614,0.0328067836257485,0.0340614894284614,0.0340614894284614,0.0341145301180979,0.0352946202395574,0.035319351727085,0.0353755648344528,0.0354070364734542,0.0364714474011735,0.0365300078460321,0.0365300078460321,0.0365540667492993,0.0365803864434399,0.0365803864434399,0.0377910425623869,0.0377910425623869,0.0377910425623869,0.0377910425623869,0.0377910425623869,0.0378166936828964,0.0378166936828964,0.0378166936828964,0.0378166936828964,0.0378166936828964,0.0378446096201193,0.0378446096201193,0.0378747853263698,0.0378747853263698,0.0378747853263698,0.0391120374217133,0.0391120374217133,0.0391120374217133,0.0391120374217133,0.0391120374217133,0.0391778645270624,0.0392141614660769,0.0392141614660769,0.0392141614660769,0.0392141614660769,0.0392141614660769,0.0402300862987281,0.0403226935406603,0.0403515533580218,0.0415651479742212,0.0416238186755191,0.0417288904690102,0.0428098896677898,0.0428993295713217,0.0429337120779973,0.0429337120779973,0.0440569279050769,0.0440837757071892,0.0440837757071892,0.0441443681969443,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0441781026468803,0.0442141225088342,0.0442141225088342,0.0453324388008796,0.0453324388008796,0.0453324388008796,0.0454247786277811,0.0454247786277811,0.0454247786277811,0.0454247786277811,0.0454601546313492,0.0454601546313492,0.0454601546313492,0.0455377705654831,0.0466112118764381,0.0466112118764381,0.0466413260405744,0.0467084763804041,0.0467084763804041,0.0478367045264898,0.0478367045264898,0.0479250237931973,0.0479590969990992,0.0479590969990992,0.0479590969990992,0.047959
0969990992,0.0479590969990992,0.0479590969990992,0.0479954783779905,0.0479954783779905,0.0479954783779905,0.0479954783779905,0.0479954783779905,0.0480751455557894,0.0480751455557894,0.0480751455557894,0.0480751455557894,0.0480751455557894,0.0481184214555959,0.0491187565109586,0.0491187565109586,0.0491786116124573,0.0492477575062494,0.0494137589676422,0.0494137589676422,0.0504345221143317,0.0504672721196076,0.0505397478892119,0.0505397478892119,0.0517592625025701,0.0517592625025701,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0517960108700051,0.0519201928139014,0.0519201928139014,0.0519201928139014,0.0519201928139014,0.0530546000146163,0.0531768718957638,0.0543155247820763,0.0543533020434591,0.0543533020434591,0.055615915632105,0.055615915632105,0.055655386112653,0.055655386112653,0.055655386112653,0.055655386112653,0.0556972010120974,0.055741355269655,0.055741355269655,0.055741355269655,0.055741355269655,0.055741355269655,0.0568808787627047,0.0569196973356454,0.0569608698830132,0.0570043912991329,0.0570043912991329,0.0582697711540526,0.05831500803907,0.05831500803907,0.05831500803907,0.05831500803907,0.05836259325208,0.0596783834851079,0.0596783834851079,0.0596783834851079,0.0596783834851079,0.0596783834851079,0.0727187684900024,0.0740077866714672,0.0752992347330038,0.0753554963823033,0.0766487632175863,0.0780618629356785,0.0781890298399144,0.0781890298399144,0.0783259358529553,0.0783259358529553,0.07924264135199,0.07924264135199,0.0794205594239284,0.0794205594239284,0.079551428503007,0.0805994942877881,0.0806581857138925,0.0807193420094362,0.0808490289258925,0.0809885151347227,0.0809885151347227,0.0809885151347227,0.0809885151347227,0.0812967310572018,0.0813798501393859,0.0813798501393859,0.0819600393449685,0.082020584376985,0.082020584376985,0.082020584376985,0.082020584376985,0.0820835993360382,0.0820835993360382,0.0820835993360382,0.0821490791554575,0.0821490791554575,0.0821490791554575,0.0821490791554575,0.082
1490791554575,0.0821490791554575,0.0823602581075981,0.0825934447555855,0.0825934447555855,0.0826760422767417,0.082761066487341,0.082761066487341,0.082761066487341,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0833867054454215,0.0834515892454756,0.0835887618082053,0.083735774793543,0.0841461089242488,0.0841461089242488,0.0841461089242488,0.0846922865810079,0.0847565693159816,0.0847565693159816,0.0848925702635871,0.0854461324645494,0.0855349987062,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0860640295538832,0.0861301965535511,0.0862699816030094,0.0862699816030094,0.0862699816030094,0.0864196683669504,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.086498212759528,0.0865792181538065,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0866626798773427,0.0871923074146537,0.0872503555801866,0.0874395462398282,0.0875076078929734,0.0875076078929734,0.0875076078929734,0.0875076078929734,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0876511979509644,0.0878047108038582,0.0878047108038582,0.0878851762008905,0.0878851762008905,0.0879681079357578,0.0880535013582811,0.0880535013582811,0.0880535013582811,0.0885640503875291,0.0885640503875291,0.0885640503875291,0.0886864316175879,0.0886864316175879,0.0886864316175879,0.0886864316175879,0.0886864316175879,0.088818857459368,0.0888888242409284,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0889612868702809,0.0890362403878722,0.0890362403878722,0.0890362403878722,0.0891136799014978,0.0891136799014978,0.0891136799014978,0.089275997681829,0.089275997681829,0.089275997681829,0.089275997681829,0.0894482024020703,0.0894482024020703,0.08944820
24020703,0.089538000835919,0.0900657428371278,0.0900657428371278,0.0901326079009568,0.0901326079009568,0.0901326079009568,0.0902019842058131,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0902738666778362,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0903482503116435,0.0904251301698234,0.0904251301698234,0.0905045013824362,0.0905863591465226,0.0905863591465226,0.0905863591465226,0.0906706987256182,0.0906706987256182,0.0907575154492772,0.0907575154492772,0.090846804712601,0.090846804712601,0.090846804712601,0.0913826105051333,0.0914488695835729,0.0915889476471757,0.0915889476471757,0.0917390717925819,0.0917390717925819,0.0917390717925819,0.0917390717925819,0.0917390717925819,0.0918992024262255,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0919830080993289,0.0920693010361488,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0921580765891331,0.0927676529420411,0.0927676529420411,0.0927676529420411,0.0927676529420411,0.0928358330249355,0.0928358330249355,0.0929065401198158,0.0929797691281142,0.0930555150205005,0.0930555150205005,0.0930555150205005,0.0930555150205005,0.093214537683343,0.0932978047367561,0.0932978047367561,0.0933835692391848,0.0933835692391848,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0935625718946606,0.0936558008647126,0.0936558008647126,0
.0942266545058739,0.094299298680529,0.094299298680529,0.094299298680529,0.094299298680529,0.094299298680529,0.0943744701719034,0.0943744701719034,0.0943744701719034,0.0943744701719034,0.0945323751469019,0.0945323751469019,0.0946150988231988,0.0946150988231988,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0947880645447123,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0948782971882605,0.0949710235302174,0.0949710235302174,0.0949710235302174,0.0950662390355332,0.0950662390355332,0.0950662390355332,0.0956213555496632,0.0956213555496632,0.095773072482434,0.095773072482434,0.095773072482434,0.095773072482434,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.0959349006107061,0.096196516180269,0.096196516180269,0.096196516180269,0.0962887353590812,0.0962887353590812,0.0962887353590812,0.0962887353590812,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0963834538479947,0.0966754557211777,0.0967392384692493,0.0968055853353737,0.0968055853353737,0.0969459502375118,0.0969459502375118,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0970965087731912,0.0971755979462384,0.0971755979462384,0.0971755979462384,0.0971755979462384,0.0971755979462384,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0972572204186903,0.0973413712990135,0.097428045764283,0.097428045764283,0.0975172390596885,0.0976089464980463,0.0976089464980463,0.0976089464980463,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.09770316345932,0.097799885390146
7,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0977998853901467,0.0980682142818908,0.0980682142818908,0.0983465113773677,0.0984224833239981,0.0984224833239981,0.0984224833239981,0.0984224833239981,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0985820686345457,0.0987518094698342,0.0987518094698342,0.0987518094698342,0.0989316671599273,0.0990253780401984,0.0990253780401984,0.0990253780401984,0.0991216041269198,0.0991216041269198,0.0991216041269198,0.0991216041269198,0.0996756187798842,0.0998289540123055,0.0998289540123055,0.0998289540123055,0.0998289540123055,0.0998289540123055,0.0999094557142755,0.0999094557142755,0.0999925068053664,0.0999925068053664,0.100166237570073,0.100166237570073,0.100166237570073,0.100256907624212,0.100350107827527,0.100544080193036,0.100544080193036,0.100644843278986,0.100644843278986,0.101321886032053,0.101321886032053,0.101321886032053,0.101321886032053,0.101321886032053,0.101406934905605,0.101406934905605,0.101406934905605,0.101584678237673,0.101677363125014,0.101677363125014,0.101677363125014,0.101772583893643,0.101870335929038,0.101870335929038,0.101870335929038,0.101870335929038,0.102738317574205,0.102738317574205,0.102738317574205,0.102738317574205,0.102738317574205,0.103007154303788,0.103298908329428,0.104158773075007,0.104247851639321,0.104247851639321,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104530437913442,0.104942864013885,0.104942864013885,0.104942864013885,0.105674386130524,0.105674386130524,0.105674386130524,0.105674386130524,0.105864304935508,0.105963103650085,0.106274801952539,0.107011847863434,0.107011847863434,0.107011847863434,0.107399886247951,0.1083518876488,0.108539723485016,0.108539723485016,0.108539723485016,0.108637512537915,0.108637512537915,0.108637512537915,0.109881296197943,0.109978573398824,0.11113129880212,0.11113129880212,0.11113129880212,0
.11113129880212,0.111421575843391,0.11247859423247,0.112668461221151],&#34;zmin&#34;:0,&#34;zmax&#34;:0.03,&#34;xgap&#34;:2,&#34;ygap&#34;:1,&#34;type&#34;:&#34;heatmap&#34;,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;frame&#34;:null}],&#34;highlight&#34;:{&#34;on&#34;:&#34;plotly_click&#34;,&#34;persistent&#34;:false,&#34;dynamic&#34;:false,&#34;selectize&#34;:false,&#34;opacityDim&#34;:0.2,&#34;selected&#34;:{&#34;opacity&#34;:1},&#34;debounce&#34;:0},&#34;shinyEvents&#34;:[&#34;plotly_hover&#34;,&#34;plotly_click&#34;,&#34;plotly_selected&#34;,&#34;plotly_relayout&#34;,&#34;plotly_brushed&#34;,&#34;plotly_brushing&#34;,&#34;plotly_clickannotation&#34;,&#34;plotly_doubleclick&#34;,&#34;plotly_deselect&#34;,&#34;plotly_afterplot&#34;],&#34;base_url&#34;:&#34;https://plot.ly&#34;},&#34;evals&#34;:[],&#34;jsHooks&#34;:[]}&lt;/script&gt;
&lt;/div&gt;
&lt;div id=&#34;phylogenetic-tree&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Phylogenetic tree&lt;/h2&gt;
&lt;p&gt;Above we used the package &lt;a href=&#34;http://ape-package.ird.fr/&#34;&gt;ape&lt;/a&gt; to calculate the genetic distances for the heatmap.&lt;/p&gt;
&lt;p&gt;Another way of looking at our alignment data is phylogenetic inference. The PhyloPi pipeline saves the output of each step of phylogenetic inference, so the user can intercept the process at any step. For example, we can take the &lt;a href=&#34;https://en.wikipedia.org/wiki/Newick_format&#34;&gt;Newick tree file&lt;/a&gt; (a plain text file in Newick format) and draw our own tree:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ape)

# Read the Newick tree file written by the pipeline
tree &amp;lt;- read.tree(&amp;quot;example-tree.txt&amp;quot;)

plot.phylo(
  tree, cex = 0.8,
  use.edge.length = TRUE,
  tip.color = &amp;#39;blue&amp;#39;,
  align.tip.label = FALSE,
  show.node.label = TRUE
)

# Mark node 9 with a red box; adj shifts the label away from its default position
nodelabels(&amp;quot;This one&amp;quot;, 9, frame = &amp;quot;r&amp;quot;, bg = &amp;quot;red&amp;quot;, adj = c(-8.2, -46))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-14_analysing-hiv-pandemic-part-3/2019-05-14-analysing-hiv-pandemic-part-3_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;1152&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We highlighted a node with a red box labelled “This one”, which we can now discuss. This node has three leaves (KM050043, KM050042, and KM050041). If you look up these accession numbers at &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/nuccore/KM050041.1/&#34;&gt;NCBI&lt;/a&gt;, you will see the publication they are tied to:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“HIV transmission. Selection bias at the heterosexual HIV-1 transmission bottleneck”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this paper, the authors looked at selection bias during transmission of the infection. They found that, from a pool of viral quasi-species, transmission is biased in favour of the fittest quasi-species. The highlighted node shows the kind of clustering one would expect from such a study. The tree contains plenty of other nodes, which you can explore by searching for their accession numbers in the &lt;a href=&#34;https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html&#34;&gt;Los Alamos HIV sequence database&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The tree above is much like a &lt;a href=&#34;https://en.wikipedia.org/wiki/Dendrogram&#34;&gt;dendrogram&lt;/a&gt; used to display &lt;a href=&#34;https://en.wikipedia.org/wiki/Hierarchical_clustering&#34;&gt;hierarchical clustering&lt;/a&gt;, such as &lt;a href=&#34;https://en.wikipedia.org/wiki/Hierarchical_clustering#Agglomerative_clustering_example&#34;&gt;agglomerative&lt;/a&gt; clustering. The numbers on the tree are support values, which indicate how confident we can be that the corresponding clusters are correct. The branch lengths indicate the genetic distances between samples. Together with a properly coloured heatmap, the tree is very useful for finding relevant clusters to investigate. If the reason for close clustering cannot be explained, the tests are repeated.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;the-importance-of-phylogenetics&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The importance of phylogenetics&lt;/h2&gt;
&lt;p&gt;Phylogenetics, and thus genetic distance calculation, is used in many branches of biology. For us, it is one of the quality-control measures at our disposal, but it has also been used to reconstruct the origin of HIV. You may find the research papers listed below interesting; in them, the authors used phylogenetics to infer the zoonotic origins of HIV.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3234451/&#34;&gt;Paul M. Sharp and Beatrice H. Hahn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href=&#34;https://science.sciencemag.org/content/287/5453/607.long&#34;&gt;Beatrice H. Hahn et al.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As another example, six foreign medical workers were accused of deliberately infecting hospitalized children with HIV in 1998 and were &lt;a href=&#34;https://en.wikipedia.org/wiki/HIV_trial_in_Libya&#34;&gt;sentenced to death in Libya&lt;/a&gt;. In 2006, &lt;a href=&#34;https://www.nature.com/articles/444836a&#34;&gt;de Oliveira, et al.&lt;/a&gt; used phylogenetics to provide evidence that the HIV strains that infected the children had an evolutionary history dating back to the mid-1990s, before the health care workers arrived in 1998. The six medics were released in 2007. There is also a very good write-up on the case by &lt;a href=&#34;https://www.nature.com/articles/444658b&#34;&gt;Declan Butler&lt;/a&gt;. Although it would probably be very emotional, this story would make a great movie.&lt;/p&gt;
&lt;p&gt;These techniques have also been used in criminal convictions. However, interpreting this kind of evidence in court cases can be unsafe, as the insights of &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1971185/&#34;&gt;Pillay, et al.&lt;/a&gt; make clear.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;summary&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;In this post, we discussed how HIV keeps mutating and becomes more and more divergent as the infection spreads from person to person. This allows us to take the genetic information obtained during drug resistance testing and analyse the sequences for abnormalities.&lt;/p&gt;
&lt;p&gt;We then showed how to compute genetic distances from a multiple sequence alignment (MSA). We also showed that the underlying mutation process can be modelled as a Markov chain. You can then view the resulting distances as a heatmap or as a phylogenetic tree.&lt;/p&gt;
&lt;p&gt;This has practical applications in diverse situations. Examples include shedding light on the origin of HIV and providing evidence in legal trials.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;whats-next&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;What’s next&lt;/h2&gt;
&lt;p&gt;In the fourth and final part of this series, we will show how we analysed the inter- and intra-patient genetic distances of HIV sequences using logistic regression. This analysis guided the colouring of the heatmap shown in this post. See you there!&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/05/16/pipeline-for-analysing-hiv-part-3/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Analysing the HIV pandemic, Part 2: Drug resistance testing</title>
      <link>https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/</link>
      <pubDate>Tue, 07 May 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/</guid>
      <description>
        


&lt;p&gt;&lt;em&gt;Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Dominique Goedhals is a pathologist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Andrie de Vries is the author of “R for Dummies”, and a Solutions Engineer at RStudio&lt;/em&gt;&lt;/p&gt;
&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In &lt;a href=&#34;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#34;&gt;part 1&lt;/a&gt; of this four-part series about HIV/AIDS, we discussed the &lt;a href=&#34;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#34;&gt;HIV pandemic in Sub-Saharan Africa&lt;/a&gt;. In this second installment, we cover a recent publication in &lt;a href=&#34;https://journals.plos.org/plosone/&#34;&gt;PLoS ONE&lt;/a&gt;: “&lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility&lt;/a&gt;”.&lt;/p&gt;
&lt;p&gt;The authors described how they used affordable hardware to create a &lt;a href=&#34;https://en.wikipedia.org/wiki/Phylogenetics&#34;&gt;phylogenetic&lt;/a&gt; pipeline, tailored for the HIV drug-resistance testing facility.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;hiv-drug-resistance&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;HIV drug resistance&lt;/h2&gt;
&lt;p&gt;Natural selection is the process by which some form of selective pressure favours a &lt;strong&gt;phenotypic&lt;/strong&gt; trait or change. Phenotypic traits include a person’s blood group, whether a pea is wrinkled or not, and whether an infectious organism is susceptible or resistant to a drug. These phenotypic traits, or physical attributes, are often caused by genetics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Genotyping&lt;/strong&gt; is the process of determining a genotype, from which one can infer a phenotypic trait. It is used more and more frequently in medicine. For example, in breast cancer, the BRCA (BReast CAncer) genes are genotyped to determine whether these tumour suppressor genes are intact. A deleterious (damaging) mutation in one of these genes can increase the risk of developing breast cancer. The phenotype is thus an increased risk of breast cancer.&lt;/p&gt;
&lt;p&gt;For most organisms, genetic material is copied by very precise enzymes or pathways, but mutations occasionally occur. If a mutation is sufficiently damaging, it is removed from the gene pool. However, if a mutation is sufficiently beneficial, it improves the survival of the genetic variant carrying it, and selection may favour that variant.&lt;/p&gt;
&lt;p&gt;In the &lt;a href=&#34;https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/&#34;&gt;previous post&lt;/a&gt;, we discussed &lt;strong&gt;ARVs&lt;/strong&gt; (antiretrovirals) and how these drugs changed the landscape of HIV infection by preventing the development of AIDS. We mentioned that ARVs suppress viral replication. One of the steps in HIV replication is the conversion of its single-stranded RNA to DNA, which can then be incorporated into the DNA of infected cells. The enzyme responsible for this conversion, reverse transcriptase, has a high error rate. HIV therefore has a high mutation rate, and thus a high evolutionary rate. The viral genes are translated into viral proteins, which are required to make more virions (viral particles). Proteins are strings, or polymers, of amino acid residues drawn from an alphabet of 20 amino acids (letters). The sequence of the DNA or RNA determines the sequence of the protein, so mutations in the DNA or RNA can change the protein. This matters because our targets for stopping HIV replication are proteins (enzymes).&lt;/p&gt;
&lt;p&gt;There are various classes of ARVs which interfere with viral replication by inhibition of viral enzymes. If the DNA or RNA sequence encoding this enzyme is changed, the result might be an unfit virus not capable of further infection or replication. On the other hand, if this mutation results in an ARV-resistant virus, replication and infection can still continue in the presence of the ARV in question, possibly causing the ARV to become ineffective in stopping replication.&lt;/p&gt;
&lt;p&gt;The question remains, why do people develop resistance? The short answer: it’s a numbers game.&lt;/p&gt;
&lt;p&gt;If the patient receives the correct regimen of ARVs (known as &lt;strong&gt;HAART&lt;/strong&gt;, or highly active antiretroviral treatment) and takes the doses correctly, the viral load will be suppressed. Suppression works by stopping viral replication. If the virus is not replicating, the error-prone reverse transcriptase can’t introduce mutations, so there is nothing for selective pressure to favour. If the patient is not taking any treatment, the virus is replicating and thus inevitably mutating, but there is no selective pressure to select for these variants. Lastly, if the patient adheres poorly to the treatment, there are times when drug levels are too low to suppress viral replication completely. In this scenario, mutants that are less susceptible to the treatment replicate more than their wild type counterparts; these are called escape mutants.&lt;/p&gt;
&lt;p&gt;This is a numbers game because the virus mutates randomly, and any given amino acid residue could be replaced by any of the 19 other amino acids. Only when such a change increases replicative fitness under some form of selective pressure can the mutant become a dominant quasi-species. At that point, the patient has developed resistance.&lt;/p&gt;
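&lt;p&gt;Not all of those 19 alternatives are equally close, though. Amino acids are encoded by three-letter codons, and a single mutation changes only one letter. The base R sketch below builds the standard genetic code and lists the amino acids reachable from ATG, the only codon for methionine, by a single nucleotide substitution. The names &lt;code&gt;genetic_code&lt;/code&gt; and &lt;code&gt;single_nt_neighbours&lt;/code&gt; are our own. The sketch ignores mutation biases and fitness, so treat it as an illustration of the combinatorics only.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
bases &amp;lt;- c(&amp;quot;T&amp;quot;, &amp;quot;C&amp;quot;, &amp;quot;A&amp;quot;, &amp;quot;G&amp;quot;)
codons &amp;lt;- paste0(rep(bases, each = 16),
                 rep(rep(bases, each = 4), times = 4),
                 rep(bases, times = 16))
amino_acids &amp;lt;- strsplit(
  &amp;quot;FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG&amp;quot;, &amp;quot;&amp;quot;)[[1]]
genetic_code &amp;lt;- setNames(amino_acids, codons)

# All codons that differ from the given codon at exactly one position
single_nt_neighbours &amp;lt;- function(codon) {
  nts &amp;lt;- strsplit(codon, &amp;quot;&amp;quot;)[[1]]
  out &amp;lt;- character(0)
  for (i in seq_along(nts)) {
    for (b in setdiff(bases, nts[i])) {
      mutant &amp;lt;- nts
      mutant[i] &amp;lt;- b
      out &amp;lt;- c(out, paste(mutant, collapse = &amp;quot;&amp;quot;))
    }
  }
  out
}

neighbours &amp;lt;- single_nt_neighbours(&amp;quot;ATG&amp;quot;)
# Amino acids one nucleotide change away from methionine
reachable &amp;lt;- sort(unique(unname(genetic_code[neighbours])))
reachable&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Only six amino acids (I, K, L, R, T and V) can be reached in one step. Valine (GTG) is among them, which is relevant to the M184V mutation discussed below. Reaching most of the other amino acids would require two or three nucleotide changes in the same codon.&lt;/p&gt;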
&lt;p&gt;Mutations are expressed using the notation &lt;code&gt;[WT AA][POS][Mutant AA]&lt;/code&gt;, where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;WT denotes wild type (the typical genotype)&lt;/li&gt;
&lt;li&gt;AA denotes amino acid residue&lt;/li&gt;
&lt;li&gt;POS denotes the position on the protein&lt;/li&gt;
&lt;li&gt;Mutant means the changed genotype&lt;/li&gt;
&lt;/ul&gt;
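&lt;p&gt;A mutation written in this notation can be split into its parts with a regular expression. The helper below, &lt;code&gt;parse_mutation()&lt;/code&gt;, is our own minimal base R sketch rather than a function from any package. It assumes single-letter amino acid codes and exactly one mutant residue.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Split a mutation such as M184V into wild type residue, position and mutant residue
parse_mutation &amp;lt;- function(mutation) {
  parts &amp;lt;- regmatches(mutation, regexec(&amp;quot;^([A-Z])([0-9]+)([A-Z])$&amp;quot;, mutation))[[1]]
  if (length(parts) == 0) stop(&amp;quot;not a valid mutation: &amp;quot;, mutation)
  list(wild_type = parts[2], position = as.integer(parts[3]), mutant = parts[4])
}

str(parse_mutation(&amp;quot;M184V&amp;quot;))&lt;/code&gt;&lt;/pre&gt;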
&lt;p&gt;We mentioned some classes of ARVs in part 1. To the viral reverse transcriptase, &lt;strong&gt;NRTIs&lt;/strong&gt; (Nucleoside/Nucleotide Reverse Transcriptase Inhibitors) look like the building blocks of DNA, called nucleotides. If the reverse transcriptase incorporates one of these ‘fake’ nucleotides, it cannot extend the DNA strand any further. This leaves the strand incomplete and interferes with replication. Not all mutations cause the same level of resistance. In the Stanford HIV Drug Resistance Database (HIVdb) scoring system, each mutation carries a penalty score for each drug. The total score for a drug maps to one of these levels:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Total score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;Susceptible&lt;/td&gt;
&lt;td&gt;0 to 9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;Potential low-level resistance&lt;/td&gt;
&lt;td&gt;10 to 14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;Low-level resistance&lt;/td&gt;
&lt;td&gt;15 to 29&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td&gt;Intermediate resistance&lt;/td&gt;
&lt;td&gt;30 to 59&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td&gt;High-level resistance&lt;/td&gt;
&lt;td&gt;&amp;gt;= 60&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;a href=&#34;https://hivdb.stanford.edu/page/release-notes/&#34;&gt;Source&lt;/a&gt;&lt;/p&gt;
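&lt;p&gt;Mapping a total score to its level is a simple binning step. Below is a minimal sketch using base R’s &lt;code&gt;cut()&lt;/code&gt;, assuming integer total scores as in the table. &lt;code&gt;resistance_level()&lt;/code&gt; is our own helper name. Negative totals, which are possible with hyper-susceptibility mutations, fall into the susceptible bin.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Map total scores to the resistance levels in the table above
resistance_level &amp;lt;- function(total_score) {
  cut(total_score,
      breaks = c(-Inf, 9, 14, 29, 59, Inf),  # right-closed bins: (-Inf, 9], (9, 14], ...
      labels = c(&amp;quot;Susceptible&amp;quot;, &amp;quot;Potential low-level resistance&amp;quot;,
                 &amp;quot;Low-level resistance&amp;quot;, &amp;quot;Intermediate resistance&amp;quot;,
                 &amp;quot;High-level resistance&amp;quot;))
}

resistance_level(c(-10, 10, 25, 45, 60))&lt;/code&gt;&lt;/pre&gt;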
&lt;p&gt;We can plot resistance scores for five commonly used NRTIs.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;suppressPackageStartupMessages({
  library(dplyr)
  library(readr)
  library(stringr)
  library(tidyr)
  library(ggplot2)
  library(knitr)
  library(broom)
})&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nrti_dr_scores &amp;lt;- read_tsv(&amp;quot;ScoresNRTI_1555579653110.tsv&amp;quot;, col_types = &amp;quot;cdcddddddd&amp;quot;)

nrti_dr_scores %&amp;gt;%
  select(Rule, ABC:AZT, FTC:TDF) %&amp;gt;%
  # reshape the five drug columns to long format: one row per rule and drug
  gather(arv, score, 2:6) %&amp;gt;%
  # drop rules whose names contain a space
  filter(!grepl(&amp;quot; &amp;quot;, Rule)) %&amp;gt;%
  # positive scores confer resistance; everything else is labelled hyper-susceptible
  mutate(effect = ifelse(score &amp;gt; 0, &amp;quot;resistance&amp;quot;, &amp;quot;hyper-susceptible&amp;quot;)) %&amp;gt;%

  ggplot(aes(x = Rule, y = score, fill = effect)) +
  geom_col() +
  coord_flip() +
  theme_bw() +
  facet_grid(. ~ arv)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-1-1.png&#34; width=&#34;960&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We can see that 3TC and FTC have exactly the same profiles. They are also chemically very similar, as shown in the figure below.&lt;/p&gt;
&lt;hr /&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;
&lt;img src=&#34;/post/2019-05-07-analysis-hiv-pandemic-part-2_files/lamivu10.gif&#34; alt=&#34;The chemical structures of 3TC (left) and FTC (right). Available at http://aras.ab.ca/articles/HAART-Nukes-AIDS-Umber&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
The chemical structures of 3TC (left) and FTC (right). Available at &lt;a href=&#34;http://aras.ab.ca/articles/HAART-Nukes-AIDS-Umber&#34; class=&#34;uri&#34;&gt;http://aras.ab.ca/articles/HAART-Nukes-AIDS-Umber&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;p&gt;Also note that some of the mutations increase susceptibility to AZT and TDF, indicated by a negative resistance score. This is called &lt;strong&gt;hyper-susceptibility&lt;/strong&gt;, and clinicians make use of it when treating patients.&lt;/p&gt;
&lt;p&gt;For example, the mutation &lt;strong&gt;M184V&lt;/strong&gt; means that the wild type AA at position 184, a methionine (M), has been mutated to valine (V). This mutation makes the virus highly resistant to 3TC, but it also has a crippling effect on viral replication. In other words, the virus can still replicate in the presence of 3TC, but more slowly. M184V also makes the virus hyper-susceptible to AZT and TDF. Clinicians use this knowledge by keeping patients on 3TC, which maintains the selective pressure for M184V, and using AZT or TDF as the other NRTI. Patients are typically on two NRTIs, sometimes referred to as the “backbone”, plus one drug from another drug class to which the patient is fully susceptible. Knowing the genotype of the virus allows us to infer the phenotype, which in this case is the drug-resistance profile.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;phylopi-an-affordable-purpose-built-phylogenetic-pipeline-for-the-hiv-drug-resistance-testing-facility&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;PhyloPi: An affordable, purpose built phylogenetic pipeline for the HIV drug resistance testing facility&lt;/h2&gt;
&lt;p&gt;The goal of HIV drug resistance genotyping is to determine which drugs will produce the best response in the patient. As mentioned earlier, we use the viral sequence information for this. Because HIV evolves rapidly, we can also use these sequences for quality assurance. &lt;strong&gt;PCR&lt;/strong&gt; (polymerase chain reaction) is very sensitive to contamination. If gross cross-contamination occurred during this process, the sequences of, say, two unrelated individuals might be very similar. Conversely, the viral sequences of one patient over time will be more similar to each other than sequences from different people.&lt;/p&gt;
&lt;p&gt;Let’s say we genotyped a patient five years ago and we have a current genotype sequence. It should be possible to retrieve the previous sequence from a database of sequences without relying only, or at all, on identifiers. Patients may change their surname, for example when they remarry, and transcription errors happen. Both make finding previous samples tedious and error-prone. So instead of using patient information to look for previous samples to include, we can use the sequence data itself. We then confirm that the sequences belong to the same patient, or investigate any irregularities. If our analysis suggests mother-to-child transmission, we confirm this with the health care worker who sent the sample.&lt;/p&gt;
&lt;p&gt;We recently published an automated pipeline that maintains a sequence database and automatically retrieves the most similar sequences from previously genotyped viral isolates. It then calculates genetic distances and performs phylogenetic inference. Let’s look at each of these steps.&lt;/p&gt;
&lt;p&gt;Firstly, we cannot conduct phylogenetic analysis on all past and present sequences. This would be very computationally expensive and time-consuming, and the result would be very difficult to interpret. Instead, we focus on the current batch of sequences generated by the laboratory, together with the most similar sequences from previous batches stored in our rolling database:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We used a tool called &lt;a href=&#34;https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&amp;amp;PAGE_TYPE=BlastDocs&amp;amp;DOC_TYPE=Download&#34;&gt;&lt;code&gt;BLAST&lt;/code&gt;&lt;/a&gt; (Basic Local Alignment Search Tool) for this. This tool is used to add our new submissions to the current rolling database and then also retrieve the most similar previous sequences.&lt;/li&gt;
&lt;li&gt;These sequences are aligned using &lt;a href=&#34;https://mafft.cbrc.jp/alignment/software/&#34;&gt;&lt;code&gt;MAFFT&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The resulting multiple sequence alignment is automatically curated with &lt;a href=&#34;http://trimal.cgenomics.org/&#34;&gt;&lt;code&gt;trimAl&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, the sequences are ready for phylogenetic inference.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For this, we used &lt;a href=&#34;http://www.microbesonline.org/fasttree/&#34;&gt;&lt;code&gt;FastTree&lt;/code&gt;&lt;/a&gt;. As its name implies, it is fast, and it can handle large datasets with minimal resources.&lt;/li&gt;
&lt;li&gt;The resulting tree is rendered using the &lt;a href=&#34;http://etetoolkit.org/&#34;&gt;&lt;code&gt;ETE3&lt;/code&gt;&lt;/a&gt; Python API.&lt;/li&gt;
&lt;li&gt;R is used to calculate a distance matrix from the multiple sequence alignment using the &lt;a href=&#34;https://cran.r-project.org/web/packages/ape/index.html&#34;&gt;&lt;code&gt;ape&lt;/code&gt;&lt;/a&gt; package, and &lt;a href=&#34;https://plot.ly/r/&#34;&gt;&lt;code&gt;plotly&lt;/code&gt;&lt;/a&gt; is used for visualization.&lt;/li&gt;
&lt;/ul&gt;
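&lt;p&gt;To make the order of the steps concrete, the sketch below only assembles the shell command for each stage as a string; it does not run anything. The helper &lt;code&gt;pipeline_commands()&lt;/code&gt; and all file and database names are hypothetical. The flags are standard options of each tool, not the settings PhyloPi uses. We also assume that the BLAST hits have already been extracted, together with the new sequences, into one FASTA file.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Build (but do not run) one shell command per pipeline stage
pipeline_commands &amp;lt;- function(query, db, combined, n_hits = 10L) {
  c(
    # find the most similar previous sequences in the rolling database
    search = sprintf(&amp;quot;blastn -query %s -db %s -outfmt 6 -max_target_seqs %d -out hits.tsv&amp;quot;,
                     query, db, n_hits),
    # multiple sequence alignment; MAFFT writes the alignment to standard output
    align = sprintf(&amp;quot;mafft --auto %s &amp;gt; aligned.fasta&amp;quot;, combined),
    # automated curation of the alignment
    trim = &amp;quot;trimal -in aligned.fasta -out trimmed.fasta -automated1&amp;quot;,
    # nucleotide tree under the GTR model, written in Newick format
    tree = &amp;quot;FastTree -nt -gtr trimmed.fasta &amp;gt; tree.nwk&amp;quot;
  )
}

cmds &amp;lt;- pipeline_commands(&amp;quot;new_batch.fasta&amp;quot;, &amp;quot;rolling_db&amp;quot;, &amp;quot;combined.fasta&amp;quot;)
cmds
# Each stage could then be run in order, e.g. system(cmds[[&amp;quot;align&amp;quot;]])&lt;/code&gt;&lt;/pre&gt;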
&lt;p&gt;In the next parts of this series, we will talk more about the distance matrix calculation. We will also cover how logistic regression was used to look at inter- and intra-patient genetic distances of HIV sequences, by mining the large public &lt;a href=&#34;https://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html&#34;&gt;Los Alamos HIV sequence database&lt;/a&gt;. This was important, because the insights gained were used to colour the distance matrix so that the user’s attention is drawn to relevant samples.&lt;/p&gt;
&lt;p&gt;This is an R for medicine blog post, but there is a lot of jargon in the paragraphs above. We can clear things up a bit below, but please check out our &lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;publication&lt;/a&gt; for the details.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;how-does-it-work&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;How does it work?&lt;/h2&gt;
&lt;p&gt;Firstly, our DNA sequences are strings over a four-letter alphabet: A, C, G, and T. Genetic distances are much like &lt;a href=&#34;https://en.wikipedia.org/wiki/Levenshtein_distance&#34;&gt;Levenshtein&lt;/a&gt; or &lt;a href=&#34;https://en.wikipedia.org/wiki/Hamming_distance&#34;&gt;Hamming&lt;/a&gt; distances, or other &lt;a href=&#34;https://en.wikipedia.org/wiki/Edit_distance&#34;&gt;edit distance&lt;/a&gt; measures.&lt;/p&gt;
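&lt;p&gt;Base R already ships an edit distance: &lt;code&gt;adist()&lt;/code&gt; in the utils package computes the generalized Levenshtein distance. A Hamming distance, which only counts substitutions between strings of equal length, takes a few lines. Here is a minimal sketch, where &lt;code&gt;hamming()&lt;/code&gt; is our own helper:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Levenshtein distance: insertions, deletions and substitutions each cost 1
adist(&amp;quot;peter&amp;quot;, &amp;quot;pieter&amp;quot;)

# Hamming distance: number of mismatching positions in equal-length strings
hamming &amp;lt;- function(a, b) {
  x &amp;lt;- strsplit(a, &amp;quot;&amp;quot;)[[1]]
  y &amp;lt;- strsplit(b, &amp;quot;&amp;quot;)[[1]]
  stopifnot(length(x) == length(y))
  sum(x != y)
}

hamming(&amp;quot;really far&amp;quot;, &amp;quot;really har&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both calls give a distance of 1. The Hamming distance cannot compare “peter” with “pieter” at all, because their lengths differ. This is one reason to align sequences before comparing them.&lt;/p&gt;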
&lt;div id=&#34;raw-strings&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Raw strings&lt;/h3&gt;
&lt;p&gt;Consider the following strings, A, B and C:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;A: peter kicked the ball really far
B: i think it was yesterday when peter kicked the ball really far
C: pieter kicked the round ball really hard&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can see that there are obvious similarities between these three sentences, but they would be much easier to compare if they were aligned.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;aligned-strings&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Aligned strings&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;A: ______________________________p_eter kicked the _____ ball really far
B: i think it was yesterday when p_eter kicked the _____ ball really far
C: ______________________________pieter kicked the round ball really hard&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once the strings are aligned, it is much easier to calculate their similarities or differences.&lt;/p&gt;
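&lt;p&gt;Aligners such as MAFFT use far more elaborate methods. The core idea of a global pairwise alignment, however, can be sketched with the classic Needleman-Wunsch dynamic programming algorithm. The function name &lt;code&gt;nw_align()&lt;/code&gt; and the scores (match +1, mismatch -1, gap -1) are our own choices for illustration. Gaps are written as underscores.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Global pairwise alignment (Needleman-Wunsch) with a linear gap penalty
nw_align &amp;lt;- function(a, b, match = 1, mismatch = -1, gap = -1) {
  x &amp;lt;- strsplit(a, &amp;quot;&amp;quot;)[[1]]
  y &amp;lt;- strsplit(b, &amp;quot;&amp;quot;)[[1]]
  n &amp;lt;- length(x)
  m &amp;lt;- length(y)
  s &amp;lt;- function(p, q) if (p == q) match else mismatch

  # score[i + 1, j + 1] holds the best score for x[1..i] aligned with y[1..j]
  score &amp;lt;- matrix(0, n + 1, m + 1)
  score[, 1] &amp;lt;- gap * (0:n)
  score[1, ] &amp;lt;- gap * (0:m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      score[i + 1, j + 1] &amp;lt;- max(score[i, j] + s(x[i], y[j]),  # match or mismatch
                                 score[i, j + 1] + gap,        # gap in b
                                 score[i + 1, j] + gap)        # gap in a
    }
  }

  # Trace back from the bottom-right cell to recover one optimal alignment
  top &amp;lt;- character(0)
  bottom &amp;lt;- character(0)
  i &amp;lt;- n
  j &amp;lt;- m
  while (i &amp;gt; 0 || j &amp;gt; 0) {
    if (i &amp;gt; 0 &amp;amp;&amp;amp; j &amp;gt; 0 &amp;amp;&amp;amp; score[i + 1, j + 1] == score[i, j] + s(x[i], y[j])) {
      top &amp;lt;- c(x[i], top)
      bottom &amp;lt;- c(y[j], bottom)
      i &amp;lt;- i - 1
      j &amp;lt;- j - 1
    } else if (i &amp;gt; 0 &amp;amp;&amp;amp; score[i + 1, j + 1] == score[i, j + 1] + gap) {
      top &amp;lt;- c(x[i], top)
      bottom &amp;lt;- c(&amp;quot;_&amp;quot;, bottom)
      i &amp;lt;- i - 1
    } else {
      top &amp;lt;- c(&amp;quot;_&amp;quot;, top)
      bottom &amp;lt;- c(y[j], bottom)
      j &amp;lt;- j - 1
    }
  }
  list(score = score[n + 1, m + 1],
       a = paste(top, collapse = &amp;quot;&amp;quot;),
       b = paste(bottom, collapse = &amp;quot;&amp;quot;))
}

nw_align(&amp;quot;peter&amp;quot;, &amp;quot;pieter&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This returns a score of 4, with &lt;code&gt;p_eter&lt;/code&gt; aligned against &lt;code&gt;pieter&lt;/code&gt;. The missing ‘i’ becomes a single gap rather than a run of mismatches.&lt;/p&gt;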
&lt;/div&gt;
&lt;div id=&#34;curated-strings&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Curated strings&lt;/h3&gt;
&lt;p&gt;Next, we remove the overhangs, meaning the unmatched text at either end, including the trailing ‘d’ in C. Strings A and C may well have had more text at their ends in reality that was simply not sampled. Depending on your situation, you could also remove internal gaps, such as the one opposite the word ‘round’. For our pipeline, however, insertions and deletions (like the letter ‘i’ and the word ‘round’ in our example) are real features we would like to include. We also have a substitution in C, where the ‘f’ in A and B was changed to an ‘h’.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;A: p eter kicked the _____ ball really far
B: p eter kicked the _____ ball really far
C: pieter kicked the round ball really har&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;calculation&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Calculation&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;A: p eter kicked the _____ ball really far
B: p eter kicked the _____ ball really far
M: 111111 111111 111 11111 1111 111111 111&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For A and B, every position matches (a gap aligned with a gap counts as a match). Summing the ones gives 33 matches out of 33 positions. The distance is the proportion of mismatched positions, so for A and B it is:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[ d = \frac{33 - 33}{33} = 0\]&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;B: p eter kicked the _____ ball really far
C: pieter kicked the round ball really har
M: 101111 111111 111 00000 1111 111111 011&lt;/code&gt;&lt;/pre&gt;
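&lt;p&gt;For B and C, 26 of the 33 positions match. The 7 mismatches are the gap opposite ‘i’, the five gaps opposite ‘round’, and the ‘f’/‘h’ substitution:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[ d = \frac{33 - 26}{33} \approx 0.212\]&lt;/span&gt;&lt;/p&gt;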
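&lt;p&gt;The calculation above can be sketched in a few lines of base R. This is a minimal illustration, not the pipeline’s code. It assumes the strings are already aligned and curated to the same length, and that a gap aligned with a gap counts as a match. To keep word separators apart from real gaps, the internal gap in ‘p eter’ is written as ‘_’ here, and all remaining spaces are dropped before comparing. &lt;code&gt;aligned_distance()&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```r
# Hypothetical helper: proportion of mismatched positions between two aligned,
# equal-length strings. Spaces are treated as word separators and dropped;
# gaps must be written as "_" so they are compared like any other character.
aligned_distance = function(a, b) {
  chars_a = strsplit(gsub(" ", "", a, fixed = TRUE), "")[[1]]
  chars_b = strsplit(gsub(" ", "", b, fixed = TRUE), "")[[1]]
  if (length(chars_a) != length(chars_b)) {
    stop("sequences must be aligned to the same length")
  }
  n_match = sum(chars_a == chars_b)  # agreeing positions (gap vs gap counts)
  (length(chars_a) - n_match) / length(chars_a)
}

# Curated strings from the example, with the internal gap written as "_"
A = "p_eter kicked the _____ ball really far"
B = "p_eter kicked the _____ ball really far"
C = "pieter kicked the round ball really har"

aligned_distance(A, B)            # 0
round(aligned_distance(B, C), 3)  # 0.212
```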
&lt;p&gt;After multiple sequence alignment and curation, every sequence is compared with every other sequence to build a distance matrix. This matrix can then be used to infer a phylogenetic tree, which is a kind of dendrogram similar to what hierarchical clustering produces. The description above is very simplified, but it should give you enough background to follow the rest of the post. If you want to know more, &lt;a href=&#34;https://www.ebi.ac.uk/training/online/course/introduction-phylogenetics/what-phylogenetics&#34;&gt;EMBL-EBI Train Online&lt;/a&gt; is a good place to start.&lt;/p&gt;
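&lt;p&gt;The path from pairwise distances to a tree can be sketched in base R. This sketch skips the alignment step entirely. It scores the three raw strings with &lt;code&gt;adist()&lt;/code&gt;, which computes a generalized Levenshtein edit distance (one of the edit distances mentioned at the start of this section). It then converts the matrix with &lt;code&gt;as.dist()&lt;/code&gt; and clusters it with &lt;code&gt;hclust()&lt;/code&gt;. This only illustrates the idea. It is not how the pipeline computes genetic distances, and using average linkage (UPGMA) is our assumption, chosen for simplicity.&lt;/p&gt;

```r
# The three raw (unaligned) strings from the example
sentences = c(
  A = "peter kicked the ball really far",
  B = "i think it was yesterday when peter kicked the ball really far",
  C = "pieter kicked the round ball really hard"
)

# Pairwise Levenshtein distances (insertions, deletions, substitutions cost 1)
d_mat = adist(sentences)
dimnames(d_mat) = list(names(sentences), names(sentences))
d_mat  # A-B is 30 (the inserted prefix), A-C is 9

# Convert to a "dist" object and build a dendrogram with average linkage (UPGMA)
tree = hclust(as.dist(d_mat), method = "average")
plot(tree, main = "Edit-distance dendrogram of the example strings")
```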
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;the-pipeline-on-a-raspberry-pi&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The pipeline on a Raspberry Pi&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.raspberrypi.org/&#34;&gt;Raspberry Pi&lt;/a&gt; is a small, cheap single-board computer. It is popular among hobbyists for all kinds of projects, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://pyvideo.org/pycon-us-2012/militarizing-your-backyard-with-python-computer.html&#34;&gt;Militarizing Your Backyard with Python: Computer Vision and the Squirrel Hordes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://hackaday.com/2013/01/20/raspberry-pi-and-r/&#34;&gt;Brewing beer with the help of R&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://retropie.org.uk/&#34;&gt;Retro gaming machines&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One of the motivations behind developing this computer was to teach kids to &lt;a href=&#34;http://blog.sparkfuneducation.com/teaching-coding-to-kids-using-raspberry-pi-3-and-scratch&#34;&gt;code or engage in electronics&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;These are all worthwhile uses, but the Raspberry Pi has also made its way into &lt;strong&gt;science and medicine&lt;/strong&gt;. For example, one group developed a cheap &lt;a href=&#34;https://pubs.rsc.org/en/content/articlehtml/2017/sc/c7sc03281a&#34;&gt;instrument&lt;/a&gt; to diagnose Ebola virus infection in the field. Researchers can also attach various sensors to the Raspberry Pi and use it for data collection.&lt;/p&gt;
&lt;div id=&#34;benchmarking&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Benchmarking&lt;/h3&gt;
&lt;p&gt;For our application, we needed to show that the Pi can handle the problem we wanted it to solve, so we did some benchmarking.&lt;/p&gt;
&lt;p&gt;We used &lt;a href=&#34;https://www.seleniumhq.org/&#34;&gt;Selenium WebDriver&lt;/a&gt; to operate the pipeline as a human would, by browsing for an input file and submitting it with the button. Time stamps were recorded for each step, along with the number of BLAST hits included in the phylogenetic inference. For this exercise, we set the number of closest sequences to retrieve for each sample to 5. That means the submitted sample plus 4 of its genetically closest samples. However, different submitted sequences may retrieve some of the same sequences. Each such shared sequence is included in the analysis only once, which we will see when we analyze the data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Read csv with time data
time_dat &amp;lt;- read_csv(
  &amp;quot;timeFile.csv&amp;quot;, 
  col_types = &amp;quot;ccd&amp;quot;,
  col_names = c(&amp;quot;Run&amp;quot;, &amp;quot;Description&amp;quot;, &amp;quot;Measure&amp;quot;)
)

head(time_dat) %&amp;gt;% 
  kable(caption = &amp;quot;First few lines of the benchmarking data.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:import&#34;&gt;Table 1: &lt;/span&gt;First few lines of the benchmarking data.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;Run&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Description&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;Measure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;blastHits&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;5.000000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;blast&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;11.219230&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;mafftTime&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;13.404623&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;trimalTime&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.111737&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;fasttreeTime&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.986582&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;final5best_random_1&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;heatmapTime&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.354820&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The &lt;code&gt;Run&lt;/code&gt; column encodes the settings of each benchmarking run. We asked for the five best hits to be included, and the sequences were pseudo-randomly selected. We started by submitting one sequence and then increased this by one per run, up to 50. This again shows that data does not always arrive in a convenient format. We need to extract the digits at the end of the &lt;code&gt;Run&lt;/code&gt; variable. Previously, we used the &lt;code&gt;tidyr::gather()&lt;/code&gt; function to pivot data from wide to long; this time we use the &lt;code&gt;spread()&lt;/code&gt; function to make long data wide.&lt;/p&gt;
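&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;time_dat &amp;lt;- time_dat %&amp;gt;%
  # the digits at the end of Run give the number of sequences submitted
  mutate(nSubmitted = str_extract(Run, &amp;quot;\\d+$&amp;quot;) %&amp;gt;% as.numeric()) %&amp;gt;%
  select(-Run) %&amp;gt;%
  # one column per Description, holding the corresponding Measure
  spread(Description, Measure)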

head(time_dat) %&amp;gt;% 
  kable(caption = &amp;quot;First few lines of the benchmarking data after some cleaning.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-2&#34;&gt;Table 2: &lt;/span&gt;First few lines of the benchmarking data after some cleaning.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;right&#34;&gt;nSubmitted&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;blast&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;blastHits&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;fasttreeTime&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;heatmapTime&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;mafftTime&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;renderTime&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;trimalTime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;right&#34;&gt;1&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;11.21923&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;5&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.986582&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.354820&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;13.40462&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.686239&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.1117370&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;right&#34;&gt;2&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;22.08694&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;10&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3.129514&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.369152&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;30.26920&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.890183&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.2699649&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;right&#34;&gt;3&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;33.67705&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;15&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;5.480334&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.400223&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;47.42213&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.107776&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.4849610&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;right&#34;&gt;4&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;43.58782&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;21&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4.627502&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.437273&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;76.47209&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.243336&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.7980120&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;right&#34;&gt;5&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;55.43246&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;25&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;10.753521&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.476636&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;105.21836&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.494058&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.0820050&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;right&#34;&gt;6&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;65.18629&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;30&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;9.688977&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.516058&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;128.93219&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2.653201&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1.4656579&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We dropped the redundant text in the &lt;code&gt;Run&lt;/code&gt; variable and kept its useful part, the run number, in the new &lt;code&gt;nSubmitted&lt;/code&gt; variable.&lt;/p&gt;
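&lt;p&gt;For comparison, the same two steps can also be sketched in base R with &lt;code&gt;sub()&lt;/code&gt; and &lt;code&gt;reshape()&lt;/code&gt;, with no tidyverse dependencies. The first step pulls the trailing run number out of &lt;code&gt;Run&lt;/code&gt;, and the second turns long data wide. This is a minimal sketch on three rows copied from Table 1, not the code used for this post. The regular expression is our assumed equivalent of &lt;code&gt;str_extract(Run, &amp;quot;\\d+$&amp;quot;)&lt;/code&gt;. Also note that since tidyr 1.0.0, &lt;code&gt;pivot_wider()&lt;/code&gt; and &lt;code&gt;pivot_longer()&lt;/code&gt; supersede &lt;code&gt;spread()&lt;/code&gt; and &lt;code&gt;gather()&lt;/code&gt;.&lt;/p&gt;

```r
# Three long-format rows copied from Table 1
long = data.frame(
  Run = rep("final5best_random_1", 3),
  Description = c("blastHits", "blast", "mafftTime"),
  Measure = c(5, 11.219230, 13.404623),
  stringsAsFactors = FALSE
)

# Keep only the trailing digits of Run; the lazy ".*?" lets "[0-9]+"
# capture multi-digit run numbers such as 12 or 50
long$nSubmitted = as.numeric(sub("^.*?([0-9]+)$", "\\1", long$Run, perl = TRUE))
long$Run = NULL

# Long to wide: one row per nSubmitted, one Measure column per Description
wide = reshape(long, idvar = "nSubmitted", timevar = "Description",
               direction = "wide")
names(wide) = sub("^Measure\\.", "", names(wide))  # "Measure.blast" -> "blast"
wide
```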
&lt;p&gt;Below are the explanations for the variables.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;nSubmitted&lt;/code&gt;: the number of sequences submitted (uploaded) to the pipeline&lt;/li&gt;
&lt;li&gt;&lt;code&gt;blast&lt;/code&gt;: the time in seconds for BLAST to find the most similar previously sequenced samples&lt;/li&gt;
&lt;li&gt;&lt;code&gt;blastHits&lt;/code&gt;: the number of sequences retrieved&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mafftTime&lt;/code&gt;: the time it took to create the multiple sequence alignment&lt;/li&gt;
&lt;li&gt;&lt;code&gt;trimalTime&lt;/code&gt;: the time it took to clean the multiple sequence alignment&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fasttreeTime&lt;/code&gt;: the time it took for phylogenetic inference&lt;/li&gt;
&lt;li&gt;&lt;code&gt;heatmapTime&lt;/code&gt;: the time it took to produce the heatmap&lt;/li&gt;
&lt;li&gt;&lt;code&gt;renderTime&lt;/code&gt;: the time it took to render the tree&lt;/li&gt;
&lt;/ul&gt;
&lt;div id=&#34;number-of-sequences-submitted-vs.-most-similar-sequences-retrieved&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Number of sequences submitted &lt;em&gt;vs.&lt;/em&gt; most similar sequences retrieved&lt;/h4&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;time_dat %&amp;gt;%
  ggplot(aes(x = nSubmitted, y = blastHits)) +
  geom_smooth(method = lm, se = FALSE, colour = &amp;quot;black&amp;quot;, formula = y ~ x - 1, size = 0.25) +
  geom_point() +
  theme_bw() +
  xlab(&amp;quot;Number of sequences submitted&amp;quot;) +
  ylab(&amp;quot;Number of sequences retrieved using blastn&amp;quot;) +
  annotate(&amp;quot;text&amp;quot;, x = 41, y = 72, label = &amp;quot;y == 4.628 * x&amp;quot;, parse = TRUE) +
  annotate(&amp;quot;text&amp;quot;, x = 40, y = 60, label = &amp;quot;R^2 == 0.998&amp;quot;, parse = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;- lm(blastHits ~ nSubmitted - 1, data = time_dat)
tidy(fit) %&amp;gt;% 
  kable(caption = &amp;quot;Regression analysis of the number of blast hits retrieved.&amp;quot;) &lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-4&#34;&gt;Table 3: &lt;/span&gt;Regression analysis of the number of blast hits retrieved.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;term&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;estimate&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;std.error&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;statistic&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p.value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;nSubmitted&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4.628026&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0280312&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;165.1026&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;A straight line fits the data very well. Recall that if different submitted sequences retrieve the same sequence from the database, that sequence is used only once. The slope of this line therefore depends on the genetic diversity of the database. A more diverse database will give a steeper slope, whereas a less diverse database will give a shallower one. In theory, the line will eventually level off as the number of requested sequences starts to saturate the number of available sequences. In practice, one would not need to submit more than 16 to 24 samples at a time, so we are in the linear part of this rarefaction curve. For the Los Alamos data used in this analysis, about 4.6 sequences are retrieved for every sequence submitted (the fitted slope in Table 3 is 4.628).&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;blast-time-vs.-number-of-sequences-submitted&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;BLAST time &lt;em&gt;vs.&lt;/em&gt; number of sequences submitted&lt;/h4&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;time_dat %&amp;gt;%
  ggplot(aes(x = nSubmitted, y = blast)) +
  geom_smooth(method = lm, se = FALSE, colour = &amp;quot;black&amp;quot;, formula = y ~ x, size = 0.25) +
  geom_point(colour = &amp;quot;blue&amp;quot;) +
  theme_bw() +
  xlab(&amp;quot;Number of input sequences&amp;quot;) + ylab(&amp;quot;Time in seconds (blastn)&amp;quot;) +
  annotate(&amp;quot;text&amp;quot;, x = 41, y = 90, label = &amp;quot;y == 11.0453 * x&amp;quot;, parse = TRUE) +
  annotate(&amp;quot;text&amp;quot;, x = 40, y = 60, label = &amp;quot;R^2 == 0.9999&amp;quot;, parse = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;- lm(time_dat$blast ~ time_dat$nSubmitted)
tidy(fit) %&amp;gt;% 
  kable(caption = &amp;quot;Regression analysis of blastn time vs. number of sequences.&amp;quot;) &lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-6&#34;&gt;Table 4: &lt;/span&gt;Regression analysis of blastn time vs. number of sequences.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;term&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;estimate&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;std.error&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;statistic&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p.value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;(Intercept)&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-0.8176139&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.5185500&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;-1.576731&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.121426&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;time_dat$nSubmitted&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;11.0453236&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0176978&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;624.105409&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.000000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Again, we see a linear relationship, this time between the number of sequences submitted and the time &lt;code&gt;blastn&lt;/code&gt; takes to complete. Each submitted sequence adds about 11 seconds when searching a database of about 11,000 sequence entries. In other words, &lt;code&gt;blastn&lt;/code&gt; run time scales linearly with the number of query sequences: it runs in &lt;span class=&#34;math inline&#34;&gt;\(O(n)\)&lt;/span&gt; time. This is nothing new; remember that the point of this exercise is to show off the Pi flexing its muscles. (You can read about the BLAST algorithm &lt;a href=&#34;https://www.ncbi.nlm.nih.gov/pubmed/2231712&#34;&gt;here&lt;/a&gt;.)&lt;/p&gt;
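&lt;p&gt;A regression of log time on log input size is one quick way to check a claimed scaling empirically. This is a technique we are adding here; it was not used in the original analysis. If run time grows like &lt;span class=&#34;math inline&#34;&gt;\(t = c\,n^k\)&lt;/span&gt;, the slope of the log-log fit estimates the exponent &lt;span class=&#34;math inline&#34;&gt;\(k\)&lt;/span&gt;. The sketch below uses only the six rows printed in Table 2, so it is illustrative rather than conclusive. The full set of 50 runs would give a more reliable estimate.&lt;/p&gt;

```r
# The six rows printed in Table 2 (a small subset of the benchmark runs)
n_submitted = 1:6
blast_sec = c(11.21923, 22.08694, 33.67705, 43.58782, 55.43246, 65.18629)

# If time grows like c * n^k, then log(time) = log(c) + k * log(n),
# so the slope of a log-log regression estimates the exponent k
loglog_fit = lm(log(blast_sec) ~ log(n_submitted))
k_hat = unname(coef(loglog_fit)[2])
round(k_hat, 2)  # close to 1, consistent with linear (O(n)) scaling
```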
&lt;/div&gt;
&lt;div id=&#34;multiple-sequence-alignment-time-vs.-number-of-total-sequences-submitted-and-retrieved&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Multiple sequence alignment time &lt;em&gt;vs.&lt;/em&gt; number of total sequences, submitted and retrieved&lt;/h4&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit &amp;lt;- lm(mafftTime ~ I(blastHits^2) - 1, data = time_dat)

time_dat %&amp;gt;%
  ggplot(aes(x = blastHits, y = mafftTime)) +
  geom_point(colour = &amp;quot;blue&amp;quot;) +
  geom_smooth(method = &amp;quot;lm&amp;quot;,formula = y ~ I(x^2) - 1, colour = &amp;quot;black&amp;quot;, size = 0.25) +
  annotate(&amp;quot;text&amp;quot;, x = 190, y = 1800, label = &amp;quot;y == 0.09997 * x^2&amp;quot;, parse = TRUE) +
  theme_bw() +
  xlab(&amp;quot;Number of sequences in alignment&amp;quot;) + 
  ylab(&amp;quot;Time in seconds (MAFFT)&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-07-analysing-hiv-pandemic-part-2/2019-05-07-analysing-hiv-pandemic-part-2_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tidy(fit) %&amp;gt;% 
  kable(caption = &amp;quot;Regression analysis of multiple sequence alignment.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-8&#34;&gt;Table 5: &lt;/span&gt;Regression analysis of multiple sequence alignment.&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;term&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;estimate&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;std.error&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;statistic&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p.value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;I(blastHits^2)&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.099974&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0004048&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;246.9813&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Multiple sequence alignment aligns each sequence with every other sequence, so we would expect &lt;span class=&#34;math inline&#34;&gt;\(O(N^2)\)&lt;/span&gt; time complexity, and the quadratic model fits our results closely. The fitted coefficient is just under a tenth of a second (0.09997 seconds per squared number of sequences). Suppose we analysed 16 sequences. Using the slope in Table 3, we would retrieve about &lt;span class=&#34;math inline&#34;&gt;\(16 \times 4.628 \approx 74\)&lt;/span&gt; sequences. The multiple sequence alignment would then take about &lt;span class=&#34;math inline&#34;&gt;\(0.09997 \times 74^2 \approx 547\)&lt;/span&gt; seconds, or roughly 9 minutes, which is not bad. Also consider that you can submit your samples and walk away.&lt;/p&gt;
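&lt;p&gt;The runtime projection for the alignment step can be wrapped in a small function. &lt;code&gt;projected_mafft_seconds()&lt;/code&gt; is a hypothetical helper; it is our name and not part of PhyloPi. It plugs in the two fitted coefficients from Tables 3 and 5, and it does not round the intermediate alignment size, so its result is slightly higher than a hand calculation that rounds to 74 sequences. It assumes that both fitted relationships hold at the chosen input size. The benchmark covered 1 to 50 submitted sequences, so 16 to 24 samples lie within that range.&lt;/p&gt;

```r
# Coefficients reported in this post: about 4.628 retrieved sequences per
# submitted sequence (Table 3) and 0.09997 seconds per squared alignment
# size for MAFFT (Table 5)
hits_per_sample = 4.628
mafft_coef = 0.09997

# Hypothetical helper: projected MAFFT time in seconds for n submitted samples
projected_mafft_seconds = function(n_submitted) {
  n_aligned = hits_per_sample * n_submitted  # sequences in the alignment
  mafft_coef * n_aligned^2                   # quadratic model without intercept
}

t16 = projected_mafft_seconds(16)
round(t16)  # about 548 seconds

# Vectorised over several input sizes, in minutes
round(projected_mafft_seconds(c(16, 24)) / 60, 1)  # about 9.1 and 20.6
```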
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;impact&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Impact&lt;/h3&gt;
&lt;p&gt;It is important to mention that PhyloPi is not used for tracking or detecting transmission clusters; rather, it offers a way of automating phylogenetic analysis. Some patients are genotyped more than once, and their sequences cluster very closely on a phylogenetic tree. This provides a spot check of the quality of the results. Sometimes we find that a patient has two different first names, which they use interchangeably depending on the health care worker and the patient’s language preference. We have also detected sample swaps that would otherwise have gone unnoticed.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;what-next&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;What next?&lt;/h2&gt;
&lt;p&gt;In part 3, we will discuss how the inter- and intrapatient HIV genetic distances were analyzed using logistic regression to gain insight into the probability distributions of these two classes. This is also where we asked Andrie from RStudio for help. As biologists and virologists, we found it useful to have someone not only oversee the analysis we did, but also implement the correct analysis to get the job done. We hope to see you in the next part!&lt;/p&gt;
&lt;/div&gt;

        &lt;script&gt;window.location.href=&#39;https://rviews.rstudio.com/2019/05/07/pipeline-for-analysing-hiv-part-2/&#39;;&lt;/script&gt;
      </description>
    </item>
    
    <item>
      <title>Analysing the HIV pandemic, Part 1: HIV in sub-Sahara Africa</title>
      <link>https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/</link>
      <pubDate>Tue, 30 Apr 2019 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2019/04/30/analysing-hiv-pandemic-part-1/</guid>
      <description>
        


&lt;p&gt;&lt;em&gt;Phillip (Armand) Bester is a medical scientist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Sabeehah Vawda is a pathologist, researcher, and lecturer at the &lt;a href=&#34;https://www.ufs.ac.za/health/departments-and-divisions/virology-home&#34;&gt;Division of Virology&lt;/a&gt;, &lt;a href=&#34;https://www.ufs.ac.za&#34;&gt;University of the Free State&lt;/a&gt;, and &lt;a href=&#34;http://www.nhls.ac.za/&#34;&gt;National Health Laboratory Service (NHLS)&lt;/a&gt;, Bloemfontein, South Africa&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Andrie de Vries is the author of “R for Dummies” and a Solutions Engineer at RStudio&lt;/em&gt;&lt;/p&gt;
&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.immunology.org/public-information/bitesized-immunology/pathogens-and-disease/human-immunodeficiency-virus-hiv&#34;&gt;Human Immunodeficiency Virus&lt;/a&gt; (&lt;strong&gt;HIV&lt;/strong&gt;) is the virus that causes acquired immunodeficiency syndrome (&lt;strong&gt;AIDS&lt;/strong&gt;). The virus invades various immune cells and causes loss of immunity. This increases susceptibility to infections, such as tuberculosis, and to some cancers. In a recent publication in &lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;PLoS ONE&lt;/a&gt;, the authors described how they used affordable hardware to create a &lt;a href=&#34;https://en.wikipedia.org/wiki/Phylogenetics&#34;&gt;phylogenetic&lt;/a&gt; pipeline tailored to an HIV drug resistance testing facility. In this series of blog posts, we highlight the serious problem of HIV infection in sub-Saharan Africa, with a special focus on the situation in South Africa.&lt;/p&gt;
&lt;div id=&#34;stages-of-hiv-infection&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Stages of HIV infection&lt;/h3&gt;
&lt;p&gt;HIV infection can be divided into three consecutive stages: acute primary infection, the asymptomatic stage, and the symptomatic stage.&lt;/p&gt;
&lt;p&gt;The first stage, &lt;strong&gt;acute primary infection&lt;/strong&gt;, causes flu-like symptoms that may last for a week or two. The body mounts an immune response, which produces antibodies to fight the HIV infection. This process is called seroconversion and can take a couple of months. During this time, the patient might not test positive, even though they are infected and the virus is spreading through the body. This period is called ‘the window period’, and its length depends on the type of test used.&lt;/p&gt;
&lt;p&gt;Rapid tests are done at the point of care: the test can be done at the clinic with a finger prick, and the result is ready in 20 minutes. The drawbacks of this test are a window period of three months and a small false positive rate. The rapid test detects HIV antibodies, and the window period exists because the immune system needs time to produce enough antibodies to be detected. Most laboratories these days use fourth-generation &lt;a href=&#34;https://www.immunology.org/public-information/bitesized-immunology/experimental-techniques/enzyme-linked-immunosorbent-assay&#34;&gt;ELISA&lt;/a&gt; (Enzyme-Linked Immunosorbent Assay) for HIV diagnosis and confirmation. This technique detects both HIV antibodies and antigens. Antigens are the foreign objects that the immune system recognizes as ‘non-self’; in this case, the antigen is the viral protein p24. The advantage of this technique is a window period of only one month.&lt;/p&gt;
&lt;p&gt;This first stage, including the window period, is then followed by the &lt;strong&gt;asymptomatic stage&lt;/strong&gt;, which may last for as long as ten years. During this stage, the infected person does not experience symptoms and feels healthy. However, the virus is still replicating and destroying immune cells, especially CD4 cells. This damages the immune system and ultimately leads to stage 3 if not treated. This does not mean that people at stage 3 are doomed, but the earlier treatment starts, the better the outcome.&lt;/p&gt;
&lt;p&gt;Stage 3 is referred to as &lt;strong&gt;symptomatic HIV infection or AIDS&lt;/strong&gt; (Acquired Immune Deficiency Syndrome). At this stage, the immune system is so weak that it cannot fight off bacterial or fungal infections that rarely cause disease in immunocompetent people. These serious infections are called opportunistic infections, and they carry high morbidity and mortality.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;transmission-and-epidemiology&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Transmission and epidemiology&lt;/h3&gt;
&lt;p&gt;Worldwide, approximately 36.9 million people are living with HIV (UNAIDS).&lt;/p&gt;
&lt;p&gt;HIV is transmitted mainly by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Having unprotected sex&lt;/li&gt;
&lt;li&gt;Sharing or reusing non-sterile needles during drug use&lt;/li&gt;
&lt;li&gt;Mother-to-child transmission during birth or breastfeeding&lt;/li&gt;
&lt;li&gt;Infected blood transfusions, transplants or other medical procedures (very unlikely)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We mentioned the window period of HIV infection and the asymptomatic stage. The infection can be transmitted during any of the stages. During the window period, a person’s HIV status may be unknown or falsely assumed to be negative. During the asymptomatic stage, the infected person has no reason to seek medical attention. There are obviously behavioural factors in HIV transmission as well, and because the asymptomatic phase is so long, a positive HIV status can remain unknown for a long time. For these reasons, it is important that high-risk individuals test for HIV frequently to determine their status.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;treatment-for-hiv-infection&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Treatment for HIV infection&lt;/h3&gt;
&lt;p&gt;HIV is treatable but not (yet) curable. The good news, however, is that if a person receives &lt;strong&gt;antiretroviral (ARV) treatment&lt;/strong&gt;, their viral load becomes suppressed (viral replication is brought under control), and the chance of transmitting HIV drops drastically.&lt;/p&gt;
&lt;p&gt;So 30 years into this pandemic, the big question is, why is HIV still a problem?&lt;/p&gt;
&lt;p&gt;Not all countries have adopted ARVs to the same extent. AZT (zidovudine) was the first drug approved by the &lt;a href=&#34;https://www.fda.gov/forpatients/illness/hivaids/history/ucm151074.htm&#34;&gt;FDA&lt;/a&gt;, in March 1987. However, it was soon discovered that monotherapy with AZT alone did not stay effective for long, because the virus quickly developed resistance to the drug. Since then, ARVs have come a long way, and patients are now placed on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;HAART&lt;/strong&gt; (Highly Active Antiretroviral Treatment), or&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;cART&lt;/strong&gt; (combination Antiretroviral Treatment). Both terms describe a regimen that typically combines 3 drugs from at least two different drug classes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;hiv-in-africa&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;HIV in Africa&lt;/h2&gt;
&lt;p&gt;Let’s look at the rates of HIV infection in different African countries. The CIA World Factbook has some HIV infection rate &lt;a href=&#34;https://www.cia.gov/LIBRARY/publications/the-world-factbook/rankorder/rawdata_2155.txt&#34;&gt;data&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;suppressPackageStartupMessages({
  library(dplyr)
  library(readr)
  library(stringr)
  library(tidyr)
  library(ggplot2)
  library(forcats)
  library(knitr)
  library(maptools)
  library(viridis)
  library(RColorBrewer)
  library(mapproj)
  library(broom)
  library(ggrepel)
  library(sf)
})&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# read the HIV data
HIV_rate_2016 &amp;lt;- read_csv(
  file.path(file_path, &amp;quot;HIV rates.csv&amp;quot;), col_names = TRUE, col_types = &amp;quot;cd&amp;quot;
  )

# read the Africa shape file
africa &amp;lt;-
  sf::st_read(
    file.path(file_path, &amp;quot;Africa_SHP/Africa.shp&amp;quot;), 
    stringsAsFactors = FALSE, quiet = TRUE
    ) %&amp;gt;%
  rename(Country = &amp;quot;COUNTRY&amp;quot;) %&amp;gt;%
  left_join(HIV_rate_2016, by = &amp;quot;Country&amp;quot;)

africa %&amp;gt;%
  ggplot(aes(fill = Rate)) +
  geom_sf() +
  coord_sf() +
  scale_fill_viridis(option = &amp;quot;plasma&amp;quot;) +
  theme_minimal()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/plot_map-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;In the choropleth above, South Africa, Botswana, Lesotho, and Swaziland appear to have the highest rates of infection. The rates are presented as the percentage of the population infected, which takes differences in population size into account. It is important to understand that reported infection rates are inversely related to the level of denial: the more a disease is denied, the lower its reported rate. Even in this day and age, denial of stigmatized diseases is an issue.&lt;/p&gt;
&lt;div id=&#34;cleaning-the-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Cleaning the data&lt;/h3&gt;
&lt;p&gt;We can also look at the burden of HIV as the number of people infected, which might give a different picture from the choropleth.&lt;/p&gt;
&lt;p&gt;Here, we read in the &lt;a href=&#34;http://apps.who.int/gho/data/node.main.626&#34;&gt;data&lt;/a&gt;, and rename the columns to &lt;code&gt;Country&lt;/code&gt;, &lt;code&gt;PersCov&lt;/code&gt; (percentage ARV coverage), &lt;code&gt;NumberOnARV&lt;/code&gt; (Number of patients on ARVs), and &lt;code&gt;NumberInfected&lt;/code&gt; (Number of patients infected).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Read the CSV with ARV coverage data
arv_dat &amp;lt;- read_csv(file.path(file_path, &amp;quot;ARV cov 2017.csv&amp;quot;), 
  col_types = &amp;quot;cccc&amp;quot;,
  col_names = c(&amp;quot;Country&amp;quot;, &amp;quot;PersCov&amp;quot;, &amp;quot;NumberOnARV&amp;quot;, &amp;quot;NumberInfected&amp;quot;),
  skip = 1
)

head(arv_dat)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 4
##   Country             PersCov    NumberOnARV NumberInfected           
##   &amp;lt;chr&amp;gt;               &amp;lt;chr&amp;gt;      &amp;lt;chr&amp;gt;       &amp;lt;chr&amp;gt;                    
## 1 Afghanistan         No data    790         No data                  
## 2 Albania             42 [40-44] 570         1400 [1300-1400]         
## 3 Algeria             80 [75-87] 11000       14 000 [13 000-15 000]   
## 4 Andorra             No data    No data     No data                  
## 5 Angola              26 [22-30] 78700       310 000 [260 000-360 000]
## 6 Antigua and Barbuda No data    No data     No data&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These data show several signs of being very messy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Very long variable names: descriptive, but difficult to work with (we renamed them during import)&lt;/li&gt;
&lt;li&gt;The values contain confidence intervals in square brackets, which are difficult to work with as-is&lt;/li&gt;
&lt;li&gt;Missing values are coded as “No data” and should be converted to &lt;code&gt;NA&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;We are interested in Sub-Saharan Africa, but the data cover the whole world&lt;/li&gt;
&lt;/ul&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# A list of Sub-Saharan countries
sub_sahara &amp;lt;- readLines(file.path(file_path, &amp;quot;Sub-Saharan.txt&amp;quot;))

clean_column &amp;lt;- function(x){
  # Remove the ranges in brackets and convert the values to numeric
  x %&amp;gt;% 
    str_replace_all(&amp;quot;\\[.*?\\]&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;% 
    str_replace_all(&amp;quot;&amp;lt;&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;%
    str_replace_all(&amp;quot; &amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;% 
    as.numeric()
}

arv_dat &amp;lt;- 
  arv_dat %&amp;gt;% 
  filter(Country %in% sub_sahara) %&amp;gt;% 
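  # na_if() expects a vector (dplyr &amp;gt;= 1.1.0), so apply it to each column
  mutate(across(2:4, ~ na_if(.x, &amp;quot;No data&amp;quot;))) %&amp;gt;% 
  mutate(across(2:4, clean_column))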

head(arv_dat)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 4
##   Country      PersCov NumberOnARV NumberInfected
##   &amp;lt;chr&amp;gt;          &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;          &amp;lt;dbl&amp;gt;
## 1 Angola            26       78700         310000
## 2 Benin             55       38400          70000
## 3 Botswana          84      318000         380000
## 4 Burkina Faso      65       61400          94000
## 5 Burundi           77       60100          78000
## 6 Cameroon          49      254000         510000&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We use a regular expression to remove the square-bracketed ranges. We also remove the “&amp;lt;” sign and the spaces within numbers, change “No data” to &lt;code&gt;NA&lt;/code&gt;, convert the characters to numbers, and keep only the Sub-Saharan countries. (Note that some countries, e.g., Swaziland and Reunion, are not available in the ARV data.)&lt;/p&gt;
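&lt;p&gt;To make the cleaning steps concrete, here is a minimal base-R sketch of the same idea, applied to a few values in the formats printed by &lt;code&gt;head(arv_dat)&lt;/code&gt; above. The function name &lt;code&gt;clean_value()&lt;/code&gt; is hypothetical and this is an illustration, not the code used for this post: instead of removing the less-than sign and spaces one at a time, it drops every character that is not a digit or a decimal point.&lt;/p&gt;

```r
# Hypothetical base-R sketch of the cleaning logic (not the post's clean_column())
clean_value = function(x) {
  x = ifelse(x == "No data", NA, x)          # explicit missing-value marker
  x = gsub("\\[.*?\\]", "", x, perl = TRUE)  # drop the bracketed range
  x = gsub("[^0-9.]", "", x)                 # drop spaces, less-than signs, etc.
  x = ifelse(x == "", NA, x)                 # nothing numeric left
  as.numeric(x)
}

raw_values = c("14 000 [13 000-15 000]", "42 [40-44]", "No data", "790")
clean_value(raw_values)
#> [1] 14000    42    NA   790
```

&lt;p&gt;Stripping every non-numeric character is blunter than the post’s approach, so it assumes the bracketed range has already been removed; otherwise the digits of the interval would be glued onto the estimate.&lt;/p&gt;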
&lt;/div&gt;
&lt;div id=&#34;highest-infected-countries&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Highest infected countries&lt;/h3&gt;
&lt;p&gt;Now let’s look at the countries with the highest number of infected people of all ages.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;arv_dat %&amp;gt;% 
  top_n(4, wt = NumberInfected) %&amp;gt;% 
  arrange(-NumberInfected) %&amp;gt;% 
  kable(
    caption = &amp;quot;Countries with the highest number of HIV infections&amp;quot;
  )&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-1&#34;&gt;Table 1: &lt;/span&gt;Countries with the highest number of HIV infections&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;Country&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;PersCov&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;NumberOnARV&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;NumberInfected&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;South Africa&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;61&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;4359000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;7200000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Nigeria&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;33&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1040000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;3100000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Mozambique&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;54&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1156000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;2100000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Kenya&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;75&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1122000&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;1500000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;South Africa has by far the highest number of HIV-infected people in Sub-Saharan Africa: about 7.2 million, more than twice as many as Nigeria.&lt;/p&gt;
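&lt;p&gt;Table 1 also lets us estimate how many infected people were not on ARVs. As a small sketch, the counts can be copied from the table into a base-R data frame and the difference computed directly; the column name &lt;code&gt;NotOnARV&lt;/code&gt; is our own addition, not part of the WHO data, and the published counts are rounded estimates.&lt;/p&gt;

```r
# Counts copied from Table 1; NotOnARV is a derived, hypothetical column name
top4 = data.frame(
  Country        = c("South Africa", "Nigeria", "Mozambique", "Kenya"),
  NumberOnARV    = c(4359000, 1040000, 1156000, 1122000),
  NumberInfected = c(7200000, 3100000, 2100000, 1500000)
)

# Infected people who were not receiving ARVs
top4$NotOnARV = top4$NumberInfected - top4$NumberOnARV

# Rank countries by the size of the untreated group
top4[order(-top4$NotOnARV), c("Country", "NotOnARV")]
```

&lt;p&gt;By this rough measure, about 2.84 million infected people in South Africa and 2.06 million in Nigeria were not on ARVs.&lt;/p&gt;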
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;hiv-in-southern-africa&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;HIV in Southern Africa&lt;/h2&gt;
&lt;p&gt;In South Africa, the first AIDS-related death occurred in 1985. ARVs only became available in the South African public sector in 2004, and even then eligibility restrictions applied, so not all HIV-infected patients received treatment.&lt;/p&gt;
&lt;p&gt;Ideally, a country would have all its HIV-infected people on treatment, but financial constraints do not always allow this. In South Africa, patients were only initiated on ARVs once their CD4 count dropped below a set threshold. This threshold was 200 cells/µL in 2004 and was later raised to 350 cells/µL and then 500 cells/µL. These recommendations were a compromise between the available funds and getting ARVs to the people who needed them most. CD4 cells are a major component of the immune system; the lower the CD4 count, the higher the chance of opportunistic infections. The thresholds therefore prioritized the patients most likely to develop an opportunistic infection.&lt;/p&gt;
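&lt;p&gt;The effect of raising the threshold can be sketched in a few lines of R. The CD4 counts below are made-up values for four hypothetical patients, and the rule “eligible if the CD4 count is below the threshold” is a simplification of the actual guidelines.&lt;/p&gt;

```r
# Made-up CD4 counts (cells/µL) for four hypothetical patients
cd4 = c(150, 280, 420, 610)

# Successive South African initiation thresholds (cells/µL)
thresholds = c(200, 350, 500)

# Simplified rule: a patient is eligible when the CD4 count is below the threshold
n_eligible = sapply(thresholds, function(t) sum(t > cd4))
names(n_eligible) = thresholds
n_eligible
```

&lt;p&gt;With these made-up values, one patient qualifies at the 200 cells/µL threshold, two at 350 and three at 500, whereas under test and treat all four would be treated.&lt;/p&gt;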
&lt;p&gt;The problem was that only about a third of the HIV-infected people in South Africa were receiving HAART. In 2017, the guidelines changed to “test and treat”; i.e., every newly diagnosed patient receives HAART. This is a big improvement for many reasons, notably because it lowers the rate of new infections: if a patient is taking HAART and it effectively suppresses viral replication, the chance of the patient transmitting the virus is very close to zero.&lt;/p&gt;
&lt;p&gt;However, these treatments are not without side effects, which in some cases lead to very poor adherence. Numerous other factors contribute, notably socio-economic factors and depression. There is also ignorance and the “fear of knowing”, which keeps people from learning their status. Finally, human nature brings various other complexities, such as conspiracy theories and religious and personal beliefs. This post would become very long if we delved into all of these issues, but the take-home message is that the situation is complicated.&lt;/p&gt;
&lt;div id=&#34;arv-coverage-by-country&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;ARV coverage by country&lt;/h3&gt;
&lt;p&gt;So far, we have looked at the rate of HIV infection and at the number of people infected in the most endemic countries, and we have talked about treatment. Next, it would be interesting to look at ARV coverage by country.&lt;/p&gt;
&lt;p&gt;Let’s see how these countries rank by ARV coverage:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;arv_dat %&amp;gt;%
  # keep only countries with a known ARV coverage value
  drop_na(PersCov) %&amp;gt;%
  ggplot(aes(x = reorder(Country, PersCov), y = PersCov)) +
  geom_point(aes(colour = NumberInfected), size = 3) +
  scale_colour_viridis(
    name = &amp;quot;Number of people infected&amp;quot;, 
    trans = &amp;quot;log10&amp;quot;,
    option = &amp;quot;plasma&amp;quot;
  ) +
  coord_flip() +
  ylab(&amp;quot;% ARV coverage&amp;quot;) + xlab(&amp;quot;Country&amp;quot;) +
  theme_bw()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/plot_rank-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This shows that Zimbabwe, Namibia, Botswana, and Rwanda have the highest ARV coverage (above 80%). South Africa has the highest number of infections (as we saw before), and coverage of just above 60%.&lt;/p&gt;
&lt;p&gt;Botswana rolled out its treatment program in 2002, and by mid-2005, about half of its eligible population was receiving ARV treatment. South Africa, on the other hand, only started public-sector treatment in 2004, which we return to below.&lt;/p&gt;
&lt;p&gt;When talking about treatment, we should also look at the changes in mortality.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;hiv-related-deaths&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;HIV-related deaths&lt;/h3&gt;
&lt;p&gt;Read in the &lt;a href=&#34;http://apps.who.int/gho/data/node.main.623?lang=en&#34;&gt;data&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;hiv_mort &amp;lt;- 
  read_csv(file.path(file_path, &amp;quot;HIV deaths.csv&amp;quot;), col_types = &amp;quot;ccccc&amp;quot;) %&amp;gt;% 
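  # na_if() expects a vector, so apply it to each Deaths column
  mutate(across(starts_with(&amp;quot;Deaths&amp;quot;), ~ na_if(.x, &amp;quot;No data&amp;quot;))) %&amp;gt;% 
  mutate(across(starts_with(&amp;quot;Deaths&amp;quot;), clean_column)) %&amp;gt;% 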
  filter(Country %in% sub_sahara)

head(hiv_mort)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 5
##   Country      Deaths_2017 Deaths_2010 Deaths_2005 Deaths_2000
##   &amp;lt;chr&amp;gt;              &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;
## 1 Angola             13000       10000        7900        3900
## 2 Benin               2500        2600        4300        2600
## 3 Botswana            4100        5900       13000       15000
## 4 Burkina Faso        2900        5400       12000       15000
## 5 Burundi             1700        5400        8600        8500
## 6 Cameroon           24000       25000       26000       17000&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary(hiv_mort)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    Country           Deaths_2017      Deaths_2010      Deaths_2005    
##  Length:43          Min.   :   100   Min.   :   100   Min.   :   100  
##  Class :character   1st Qu.:  1900   1st Qu.:  1975   1st Qu.:  2050  
##  Mode  :character   Median :  4400   Median :  5400   Median :  8250  
##                     Mean   : 15442   Mean   : 23483   Mean   : 33227  
##                     3rd Qu.: 16250   3rd Qu.: 27250   3rd Qu.: 48250  
##                     Max.   :150000   Max.   :200000   Max.   :260000  
##                     NA&amp;#39;s   :3        NA&amp;#39;s   :3        NA&amp;#39;s   :3       
##   Deaths_2000    
##  Min.   :   100  
##  1st Qu.:  1150  
##  Median :  6500  
##  Mean   : 26496  
##  3rd Qu.: 41500  
##  Max.   :130000  
##  NA&amp;#39;s   :3&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The 2017 mean (about 15,400 deaths per country) is less than half the 2005 mean (about 33,200). Plotting all the countries would probably be too busy, so instead we look at the countries with the most change.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;hiv_mort &amp;lt;- hiv_mort %&amp;gt;% 
  mutate(
    # columns 2:4 are Deaths_2017, Deaths_2010 and Deaths_2005
    min = apply(hiv_mort[, 2:4], 1, FUN = min),
    max = apply(hiv_mort[, 2:4], 1, FUN = max),
    Change = max - min
  )&lt;/code&gt;&lt;/pre&gt;
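&lt;p&gt;Row-wise &lt;code&gt;apply()&lt;/code&gt; works, but base R also offers the vectorized &lt;code&gt;pmin()&lt;/code&gt; and &lt;code&gt;pmax()&lt;/code&gt;. The sketch below applies them to the six rows printed by &lt;code&gt;head(hiv_mort)&lt;/code&gt;, using the same three year columns as &lt;code&gt;hiv_mort[, 2:4]&lt;/code&gt;. One difference to be aware of: with &lt;code&gt;na.rm = TRUE&lt;/code&gt; a missing year is skipped, whereas &lt;code&gt;apply(..., FUN = min)&lt;/code&gt; returns &lt;code&gt;NA&lt;/code&gt; for that row.&lt;/p&gt;

```r
# First six rows of hiv_mort, copied from the printed head() output
deaths = data.frame(
  Country     = c("Angola", "Benin", "Botswana", "Burkina Faso", "Burundi", "Cameroon"),
  Deaths_2017 = c(13000, 2500, 4100, 2900, 1700, 24000),
  Deaths_2010 = c(10000, 2600, 5900, 5400, 5400, 25000),
  Deaths_2005 = c(7900, 4300, 13000, 12000, 8600, 26000)
)

# Same columns as hiv_mort[, 2:4]
yrs = c("Deaths_2017", "Deaths_2010", "Deaths_2005")

# Element-wise minimum and maximum across the year columns, skipping NAs
deaths$min = do.call(pmin, c(deaths[yrs], na.rm = TRUE))
deaths$max = do.call(pmax, c(deaths[yrs], na.rm = TRUE))
deaths$Change = deaths$max - deaths$min

deaths[order(-deaths$Change), c("Country", "Change")]
```

&lt;p&gt;Among these six rows, Burkina Faso (9,100) and Botswana (8,900) show the largest change.&lt;/p&gt;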
&lt;p&gt;Next, we plot the data for the five countries with the biggest change in HIV-related mortality.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;hiv_mort %&amp;gt;%
  top_n(5, wt = Change) %&amp;gt;%
  gather(Year, Deaths, Deaths_2017:Deaths_2000) %&amp;gt;% 
  na.omit() %&amp;gt;%
  mutate(
    Year = str_replace(Year, &amp;quot;Deaths_&amp;quot;, &amp;quot;&amp;quot;) %&amp;gt;% as.numeric(),
    Country = fct_reorder(Country, Deaths)
  ) %&amp;gt;% 
  ggplot(aes(x = Year, y = Deaths, color = Country)) +
  geom_line(linewidth = 1) +  # ggplot2 &amp;gt;= 3.4.0 uses linewidth instead of size for lines
  geom_vline(xintercept = 2004, color = &amp;quot;black&amp;quot;, linetype = &amp;quot;dotted&amp;quot;, linewidth = 1.5) +
  scale_color_viridis(option = &amp;quot;D&amp;quot;, discrete = TRUE) +
  theme_bw() +
  theme(legend.position = &amp;quot;bottom&amp;quot;)  &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/plot_hiv_mort-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;As mentioned earlier, &lt;strong&gt;HAART&lt;/strong&gt; (Highly Active Antiretroviral Treatment) was introduced in the South African public sector in 2004, marked here by the black dotted line. The plot makes the dramatic effect of introducing ARVs in South Africa easy to appreciate.&lt;/p&gt;
&lt;p&gt;Although the picture above is positive, the fight is not over. The target is to get at least 90% of HIV-infected patients on treatment. Adherence to ARV regimens remains crucial, not only to suppress viral replication but also to minimize the development of drug resistance.&lt;/p&gt;
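&lt;p&gt;Before plotting, the code above uses &lt;code&gt;gather()&lt;/code&gt; to turn the one-column-per-year layout into one row per country and year (newer tidyr code would typically use &lt;code&gt;pivot_longer()&lt;/code&gt;, which supersedes &lt;code&gt;gather()&lt;/code&gt;). For readers without tidyr, a rough base-R equivalent is &lt;code&gt;stats::reshape()&lt;/code&gt;; the two-country data frame below reuses values from &lt;code&gt;head(hiv_mort)&lt;/code&gt; and is only an illustration.&lt;/p&gt;

```r
# Two rows copied from head(hiv_mort): one column per year (wide format)
wide = data.frame(
  Country     = c("Angola", "Botswana"),
  Deaths_2017 = c(13000, 4100),
  Deaths_2005 = c(7900, 13000)
)

# Reshape to long format: one row per country and year
long = reshape(
  wide,
  direction = "long",
  varying   = c("Deaths_2017", "Deaths_2005"),
  v.names   = "Deaths",
  timevar   = "Year",
  times     = c(2017, 2005),
  idvar     = "Country"
)

long[order(long$Country, long$Year), ]
```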
&lt;/div&gt;
&lt;div id=&#34;infection-rates&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Infection rates&lt;/h3&gt;
&lt;p&gt;As mentioned earlier, if a patient is taking and responding to treatment, the viral load is suppressed and the chance of transmitting the infection becomes very close to zero. Thus, the more patients with an undetectable viral load, the lower the transmission rate.&lt;/p&gt;
&lt;p&gt;Read the &lt;a href=&#34;http://aidsinfo.unaids.org/?did=5b4eaa7cdddb54192bb39714&amp;amp;r=world&amp;amp;t=null&amp;amp;tb=d&amp;amp;bt=dnli&amp;amp;ts=null&amp;amp;tr=world&amp;amp;tl=2&#34;&gt;data&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;new_infections &amp;lt;- 
  read_csv(file.path(file_path, 
    &amp;quot;Epidemic transition metrics_Trend of new HIV infections.csv&amp;quot;), 
    na = &amp;quot;...&amp;quot;, 
    col_types = cols(
      .default = col_character(),
      `2017_1` = col_double()
    )
  ) %&amp;gt;% 
  select(
    -ends_with(&amp;quot;_upper&amp;quot;), 
    -ends_with(&amp;quot;lower&amp;quot;), 
    -ends_with(&amp;quot;_1&amp;quot;)
  ) %&amp;gt;% 
  mutate_at(-1, clean_column) %&amp;gt;%
  na.omit()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Duplicated column names deduplicated: &amp;#39;2017&amp;#39; =&amp;gt; &amp;#39;2017_1&amp;#39; [26]&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;new_infections %&amp;gt;% 
  gather(Year, NewInfections, 2:9) %&amp;gt;% 
  ggplot(aes(x = Year, y = NewInfections, color = Country)) +
  geom_point() +
  theme_classic() +
  theme(legend.position = &amp;quot;none&amp;quot;) +
  xlab(&amp;quot;Year&amp;quot;) + 
  ylab(&amp;quot;Number of new infections&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/new_infections-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This plot is a bit busy. Highly endemic countries with good ARV coverage and infection-prevention programs should show a steeper decline in the number of newly infected people. At first glance, the trend for some countries looks fairly linear. Let’s go with that assumption and fit a linear regression to each country.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rates_modeled &amp;lt;- 
  new_infections %&amp;gt;% 
  filter(Country %in% sub_sahara) %&amp;gt;% 
  na.omit() %&amp;gt;% 
  gather(Year, NewInfections, 2:9) %&amp;gt;% 
  mutate(Year = as.numeric(Year)) %&amp;gt;% 
  group_by(Country) %&amp;gt;% 
  do(tidy(lm(NewInfections ~ Year, data = .))) %&amp;gt;% 
  filter(term == &amp;quot;Year&amp;quot;) %&amp;gt;% 
  ungroup() %&amp;gt;% 
  mutate(
    Country = fct_reorder(Country, estimate, .desc = TRUE)
  ) %&amp;gt;% 
  arrange(desc(estimate)) %&amp;gt;% 
  select(-one_of(&amp;quot;term&amp;quot;, &amp;quot;statistic&amp;quot;))

rates_modeled %&amp;gt;% 
  head() %&amp;gt;% 
  kable(
    caption = &amp;quot;Results of linear regression: Rate of new infections per year&amp;quot;
  )&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;caption&gt;&lt;span id=&#34;tab:unnamed-chunk-3&#34;&gt;Table 2: &lt;/span&gt;Results of linear regression: Rate of new infections per year&lt;/caption&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;Country&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;estimate&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;std.error&lt;/th&gt;
&lt;th align=&#34;right&#34;&gt;p.value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Madagascar&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;469.04762&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;12.56126&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0000000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Côte d’Ivoire&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;190.47619&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;153.99689&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.2623441&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Botswana&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;130.95238&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;92.46968&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.2064860&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Mali&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;108.33333&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;23.21683&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0034452&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Congo&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;103.57143&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;16.45271&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0007486&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;Eritrea&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;89.28571&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;23.05347&lt;/td&gt;
&lt;td align=&#34;right&#34;&gt;0.0082374&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
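&lt;p&gt;The &lt;code&gt;group_by()&lt;/code&gt; + &lt;code&gt;do(tidy(lm()))&lt;/code&gt; pipeline fits one straight line per country and keeps the slope for &lt;code&gt;Year&lt;/code&gt;. The same idea can be sketched in base R with &lt;code&gt;split()&lt;/code&gt; and &lt;code&gt;lm()&lt;/code&gt;. The two countries &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; below are synthetic: A declines by 50 new infections per year plus fixed, made-up noise, and B is flat apart from the same noise.&lt;/p&gt;

```r
years = 2010:2017
noise = c(10, -10, 5, -5, 0, 3, -3, 0)  # fixed, made-up perturbations

# Synthetic data for two hypothetical countries
toy = data.frame(
  Country       = rep(c("A", "B"), each = length(years)),
  Year          = rep(years, times = 2),
  NewInfections = c(1000 - 50 * (years - 2010) + noise, 300 + noise)
)

# Fit one linear model per country and keep the Year slope and its p-value
fits = lapply(split(toy, toy$Country), function(d) {
  s = summary(lm(NewInfections ~ Year, data = d))$coefficients
  c(estimate = s["Year", "Estimate"], p.value = s["Year", "Pr(>|t|)"])
})

slopes = do.call(rbind, fits)
slopes
```

&lt;p&gt;The slope for A comes out close to -50 with a very small p-value, while B’s slope is near zero with a p-value well above 0.05, which is the distinction the 0.05 cutoff is meant to capture.&lt;/p&gt;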
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rates_modeled %&amp;gt;%
  na.omit() %&amp;gt;% 
  ggplot(aes(x = Country, y = estimate, fill = p.value &amp;gt;= 0.05)) +
  geom_col() +
  coord_flip() +
  theme_bw() +
  ylab(&amp;quot;Estimated change in HIV infection (people/year)&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/rates_model_plot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;A quick look at the plot shows that, for most countries, the slope of the linear model is significant at a p-value cutoff of 0.05.&lt;/p&gt;
&lt;p&gt;It is important to note that our data cover only 2010 to 2017. Over this period, some countries, notably South Africa, are on a good trajectory. Botswana, the “Poster Child” of HIV treatment and prevention programs, seems to have stabilized in terms of new infections, with a positive but non-significant slope. Possible explanations include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It was the first African country to introduce HAART (2002)&lt;/li&gt;
&lt;li&gt;It has been progressive in terms of prevention programs&lt;/li&gt;
&lt;li&gt;Because our data start in 2010, they miss the earlier, dramatic decline in infections&lt;/li&gt;
&lt;li&gt;The &lt;a href=&#34;https://www.who.int/&#34;&gt;WHO&lt;/a&gt; goal is to get 90% of a country’s infected people on HAART, but the last 5-7% might be the hardest to convince&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can combine the ARV and estimated rates of infection data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;arv_on_infection &amp;lt;- 
  arv_dat %&amp;gt;% 
  # rates_modeled stores Country as a factor; convert it to character before joining
  left_join(
    rates_modeled %&amp;gt;% mutate(Country = as.character(Country)),
    by = &amp;quot;Country&amp;quot;
  ) %&amp;gt;% 
  # the slope is significant when its p-value is below 0.05
  mutate(p_interpretation = if_else(p.value &amp;lt; 0.05, &amp;quot;Significant&amp;quot;, &amp;quot;Not significant&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;arv_on_infection %&amp;gt;% 
  na.omit() %&amp;gt;% 
  ggplot(aes(x = PersCov, y = estimate, 
             shape = p_interpretation)) +
  geom_point(aes(color = NumberInfected), size = 2) +
  geom_text_repel(aes(label = Country), size = 3) +
  scale_color_gradient(high = &amp;quot;red&amp;quot;, low = &amp;quot;blue&amp;quot;) +
  theme_grey() +
  xlab(&amp;quot;% ARV coverage&amp;quot;) + 
  ylab(&amp;quot;Estimated change in HIV infection\n(people/year)&amp;quot;) +
  ggtitle(&amp;quot;Antiretroviral (ARV) coverage&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2019/2019-05-01-analysing-hiv-pandemic-part-1/2019-05-01-analysing-hiv-pandemic-part-1_files/figure-html/arv_infection-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;South Africa has the highest number of infected people but, on the positive side, is on a downward trajectory of about 15,000 fewer newly infected people each year. Although ARVs play a crucial role in controlling this epidemic, they are not the only factor involved: prevention of mother-to-child transmission has been very successful in South Africa, and awareness campaigns and education play a big role as well. Keep in mind that the plot above shows rates from our simple linear models.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;the-laboratory-hiv-diagnosis-and-monitoring&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The laboratory, HIV diagnosis and monitoring&lt;/h2&gt;
&lt;p&gt;HIV-related laboratory tests are not the only diagnostics done in a Virology department, but in endemic countries, they account for the majority of tests. The first HIV-related test is for diagnosis, which is done differently in adults than in infants. As we discussed earlier, the immune system develops antibodies after HIV infection. The field of &lt;strong&gt;serology&lt;/strong&gt; gives us tests that detect antibodies and antigens, and in most cases, an ELISA test is performed to confirm HIV seroconversion or status. Because the mother’s antibodies are present in the infant, an ELISA can show a baby as positive even when the baby is not infected. Infants are therefore diagnosed by detecting viral RNA or DNA in their blood using PCR (Polymerase Chain Reaction).&lt;/p&gt;
&lt;p&gt;Once a patient is diagnosed as HIV-positive, the patient is initiated on HAART, and in most cases, the viral load becomes suppressed. In the South African public-sector treatment program, the patient gets two viral load tests, six months apart, after HAART initiation to make sure viral replication is suppressed. After that, a yearly viral load test keeps an eye out for trouble by confirming adherence and treatment effectiveness.&lt;/p&gt;
&lt;p&gt;When an unsuppressed viral load is detected, action is taken and adherence counselling is performed. If this does not solve the problem, drug-resistance testing is performed to assess the resistance profile of the infection so that the ARV regimen can be adjusted accordingly. This is done by isolating the viral RNA, converting it to DNA, and amplifying the DNA to quantities sufficient for sequencing. In our laboratory, we use &lt;a href=&#34;https://en.wikipedia.org/wiki/Sanger_sequencing&#34;&gt;Sanger sequencing&lt;/a&gt;, but other sequencing technologies also exist.&lt;/p&gt;
&lt;hr /&gt;
&lt;div class=&#34;figure&#34; style=&#34;text-align: center&#34;&gt;&lt;span id=&#34;fig:unnamed-chunk-4&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;/post/2019-05-01-analysis-hiv-pandemic-part-1_files/hxb2genome.gif&#34; alt=&#34;HIV Genome as depicted by the Los Alamos HIV sequence database. Available at https://www.hiv.lanl.gov/content/sequence/HIV/MAP/landmark.html&#34; style=&#34;margin:50px 10px&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: HIV Genome as depicted by the Los Alamos HIV sequence database. Available at &lt;a href=&#34;https://www.hiv.lanl.gov/content/sequence/HIV/MAP/landmark.html&#34; class=&#34;uri&#34;&gt;https://www.hiv.lanl.gov/content/sequence/HIV/MAP/landmark.html&lt;/a&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;p&gt;This diagram depicts the genome of HIV. The most common targets for interfering with viral replication are located in the &lt;em&gt;pol&lt;/em&gt; gene. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;prot&lt;/strong&gt;: The viral protease. Many of the viral proteins are translated as longer polypeptides, which are then cleaved into mature proteins by the protease.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;p51 RT&lt;/strong&gt;: The viral reverse transcriptase. Each virion contains two copies of viral RNA, and the reverse transcriptase converts the RNA to DNA.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;p31 int&lt;/strong&gt;: The viral integrase. This enzyme integrates the reverse-transcribed viral DNA into the host genome of infected cells and establishes chronic infection.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Essentially, ARVs interfere with these viral enzymes by inhibiting their action:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Protease inhibitors&lt;/strong&gt; prevent the maturation of viral proteins.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reverse transcriptase inhibitors&lt;/strong&gt; prevent the formation of a DNA copy of the viral genome, which then gives the integrase nothing to work with.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrase inhibitors&lt;/strong&gt; prevent the integration of viral DNA into the host genome, which is a crucial part of replication and infection.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Combining these ARVs in clever ways results in HAART, also called cART (combination antiretroviral therapy). By sequencing the viral RNA, we can detect mutations that cause resistance to specific ARVs. This information is then used to adjust the ARV regimen so that it once again effectively suppresses viral replication.&lt;/p&gt;
&lt;p&gt;The viral reverse transcriptase has a high error rate when converting RNA to DNA, so it introduces random mutations into the viral genome. Under selective pressure such as ARVs, some of these random mutations may give the replicating virus an advantageous phenotypic trait, like drug resistance. On the other hand, if the patient adheres properly to the treatment, viral replication is suppressed, and without replication, new mutations cannot arise.&lt;/p&gt;
&lt;p&gt;This high rate of mutation can also serve as one of the laboratory’s quality-control tools. The polymerase chain reaction is prone to contamination, so one sample can contaminate another during these reactions. The contaminated sample would then show mutations that do not belong to that patient’s virus, giving the treating clinician an erroneous result with a direct negative impact on the patient. Because the virus mutates so rapidly, each patient’s viral sequences are genetically distinct, so sequences from two patients that are unexpectedly similar point to possible contamination.&lt;/p&gt;
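&lt;p&gt;One simple way to compare two sequences is the p-distance: the proportion of aligned positions at which they differ. A pair of samples from different patients with a near-zero distance would be suspicious. The sketch below is an illustration under stated assumptions, not the pipeline used in our laboratory: the toy sequences and the 1% flagging cutoff are hypothetical, the sequences are assumed to be aligned already, and gaps or ambiguity codes are not treated specially. Parts 3 and 4 of this series discuss genetic distances properly.&lt;/p&gt;

```r
# p-distance: proportion of aligned positions at which two sequences differ
p_distance = function(a, b) {
  x = strsplit(a, "")[[1]]
  y = strsplit(b, "")[[1]]
  if (length(x) != length(y)) stop("sequences must be aligned to equal length")
  mean(x != y)
}

# Hypothetical 10-base toy sequences
patient_1 = "ACGTACGTAC"
patient_2 = "ACGTTCGTAA"  # differs from patient_1 at positions 5 and 10
suspect   = "ACGTACGTAC"  # identical to patient_1

cutoff = 0.01             # hypothetical flagging threshold
d = c(p2      = p_distance(patient_1, patient_2),
      suspect = p_distance(patient_1, suspect))
d

# Pairs that are too similar to come from unrelated patients
flagged = names(d)[cutoff >= d]
flagged
```

&lt;p&gt;Here only the identical pair falls under the cutoff and is flagged; in practice, a flagged pair would prompt a check of the laboratory workflow rather than an automatic conclusion.&lt;/p&gt;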
&lt;/div&gt;
&lt;div id=&#34;what-next&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;What next?&lt;/h2&gt;
&lt;p&gt;In a recent publication in &lt;a href=&#34;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0213241&#34;&gt;PLoS ONE&lt;/a&gt;, the authors described how they used affordable hardware to create a &lt;a href=&#34;https://en.wikipedia.org/wiki/Phylogenetics&#34;&gt;phylogenetic&lt;/a&gt; pipeline tailored to an HIV drug resistance testing facility.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In &lt;strong&gt;Part 2&lt;/strong&gt; of this four-part series, we will discuss this pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In &lt;strong&gt;Part 3&lt;/strong&gt;, we will discuss genetic distances and phylogenetics.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, in &lt;strong&gt;Part 4&lt;/strong&gt;, we will look at the application of logistic regression in analyzing inter- and intra-patient genetic distance of viral sequences.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;See you in the next part!&lt;/p&gt;
&lt;/div&gt;

      </description>
    </item>
    
    <item>
      <title>Shiny in Medicine</title>
      <link>https://rviews.rstudio.com/2017/05/03/shiny-in-medicine/</link>
      <pubDate>Wed, 03 May 2017 00:00:00 +0000</pubDate>
      
      <guid>https://rviews.rstudio.com/2017/05/03/shiny-in-medicine/</guid>
      <description>
        

&lt;p&gt;Shiny Apps are becoming ubiquitous as a way for data scientists to present the results of an analysis, and also to engage with information consumers who may not be coders. The trend I see is that the greater the variety of skills and interests among the information consumers for a particular project, the more they value interactive visualizations that can be integrated into enterprise-wide communication workflows. So, it is not surprising to see Shiny apps popping up in all manner of healthcare and medical applications. If data scientists are going to bring predictive analytics into clinical workflows where doctors, nurses, scientists, technicians and administrators are all part of near real-time decision processes, they will have to be even more inventive in providing these multi-skilled teams with low-friction tools to ingest and manipulate information. Below are a few interesting Shiny apps that are broadly related to Health Care and Medicine. My guess is that interactive visualizations like these will improve research and clinical workflows, and eventually change how all of us look at Health Care and Medicine.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://gallery.shinyapps.io/genome_browser/&#34;&gt;Genome viewer&lt;/a&gt; for ICGC cancer, built by the folks at &lt;a href=&#34;http://www.aridhia.com/&#34;&gt;Aridhia&lt;/a&gt;, is geared towards researchers. You can learn how to interpret the plot, and the story behind its creation, &lt;a href=&#34;http://www.aridhia.com/blog/beauty-in-simplicity-visualising-large-scale-genomic-data/&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-04-26-shiny-in-medicine_files/app1.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;The &lt;a href=&#34;https://gallery.shinyapps.io/EDsimulation/&#34;&gt;Emergency Department Simulation&lt;/a&gt;, built by a group of mathematicians and physicians, models patient flow information under different assumptions about emergency department case loads, and illustrates how predictive analytics and statistical analysis can be integrated into operational clinical workflows.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-04-26-shiny-in-medicine_files/app2.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;The &lt;a href=&#34;http://riskcalc.org/ColorectalCancer/&#34;&gt;Colorectal Cancer risk calculator&lt;/a&gt; from the Cleveland Clinic lets physicians and the general public estimate a personalized risk of this disease. I found working through different “what if” scenarios of great help in thinking about the risk factors that are under my control.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-04-26-shiny-in-medicine_files/app3.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;Finally, two Shiny Apps that provide information about US hospitals should interest public health planners as well as the general public. The &lt;a href=&#34;http://datascience-enthusiast.com/R/Hospital_Rankings.html&#34;&gt;Hospital Ranking App&lt;/a&gt; compares hospital outcomes for heart attack, heart failure, and pneumonia against national statistics.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-04-26-shiny-in-medicine_files/app4.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;The &lt;a href=&#34;http://colorado.rstudio.com:3939/content/188/&#34;&gt;Access to Hospital Care Dashboard&lt;/a&gt; plots the density of hospitals throughout the United States, and indicates under-served areas.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;/post/2017-04-26-shiny-in-medicine_files/app5.png&#34; /&gt;

&lt;/div&gt;
&lt;p&gt;If you or your team are doing this kind of work, we here at R Views would love to hear about it.&lt;/p&gt;

      </description>
    </item>
    
  </channel>
</rss>
