R Views
https://rviews.rstudio.com/
Exploratory Functional PCA with Sparse Data
https://rviews.rstudio.com/2021/07/08/exploratory-fda-with-sparse-data/
Thu, 08 Jul 2021 00:00:00 +0000
<script src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/header-attrs/header-attrs.js"></script>
<script src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/htmlwidgets/htmlwidgets.js"></script>
<script src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/plotly-binding/plotly.js"></script>
<script src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/typedarray/typedarray.min.js"></script>
<script src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/jquery/jquery.min.js"></script>
<link href="/2021/07/08/exploratory-fda-with-sparse-data/index_files/crosstalk/css/crosstalk.css" rel="stylesheet" />
<script src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/crosstalk/js/crosstalk.min.js"></script>
<link href="/2021/07/08/exploratory-fda-with-sparse-data/index_files/plotly-htmlwidgets-css/plotly-htmlwidgets.css" rel="stylesheet" />
<script src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/plotly-main/plotly-latest.min.js"></script>
<p>I have written about the basics of Functional Data Analysis in three prior posts. In <a href="https://rviews.rstudio.com/2021/05/04/functional-data-analysis-in-r/">Post 1</a>, I used the <a href="https://cran.r-project.org/package=fda"><code>fda</code></a> package to introduce the fundamental concept of using basis vectors to represent longitudinal or time series data as a curve in an abstract vector space. In <a href="https://rviews.rstudio.com/2021/05/14/basic-fda-descriptive-statistics-with-r/">Post 2</a>, I continued to rely on the <code>fda</code> package to show basic FDA descriptive statistics. In <a href="https://rviews.rstudio.com/2021/06/10/functional-pca-with-r/">Post 3</a>, I introduced Functional PCA with the help of the <a href="https://cran.r-project.org/package=fdapace"><code>fdapace</code></a> package. In this post, I pick up where that last post left off and look at how one might explore a sparse, longitudinal data set with the FPCA tools provided in the <code>fdapace</code> package. I will begin by highlighting some of the really nice tools available in the <a href="https://cran.r-project.org/package=brolgar"><code>brolgar</code></a> package for doing exploratory longitudinal data analysis. While the first three posts made do with artificial data constructed with an algorithm for generating data from a Wiener Process, in this post I’ll use the <code>wages</code> data set available in <code>brolgar</code>.</p>
<p><code>brolgar</code> is a beautiful, tidyverse-inspired package built on the <a href="https://tsibble.tidyverts.org/"><code>tsibble</code></a> data structure. It offers a number of super helpful functions for visualizing and manipulating longitudinal data. See the arXiv paper <a href="https://arxiv.org/abs/2012.01619">Tierney, Cook and Prvan (2021)</a> for an overview, and the seven package vignettes for examples. Collectively, these vignettes offer a pretty thorough exploration of the <code>wages</code> data set. Using <code>wages</code> for this post should provide a feeling for what additional insight FDA may have to offer.</p>
<div id="longitudinal-data" class="section level3">
<h3>Longitudinal Data</h3>
<p>Let’s look at the data. <code>wages</code> contains measurements on hourly wages associated with years of experience in the workforce along with several covariates for male high school dropouts who were between 14 and 17 years old when first measured. In what follows, I will use only <code>ln_wages</code>, the natural log of wages in 1990 dollars, and <code>xp</code>, the number of years of work experience.</p>
<pre class="r"><code>library(brolgar)
library(fdapace)
library(tidyverse)
library(plotly)
dim(wages)</code></pre>
<pre><code>## [1] 6402 9</code></pre>
<pre class="r"><code>head(wages)</code></pre>
<pre><code>## # A tsibble: 6 x 9 [!]
## # Key: id [1]
## id ln_wages xp ged xp_since_ged black hispanic high_grade
## <int> <dbl> <dbl> <int> <dbl> <int> <int> <int>
## 1 31 1.49 0.015 1 0.015 0 1 8
## 2 31 1.43 0.715 1 0.715 0 1 8
## 3 31 1.47 1.73 1 1.73 0 1 8
## 4 31 1.75 2.77 1 2.77 0 1 8
## 5 31 1.93 3.93 1 3.93 0 1 8
## 6 31 1.71 4.95 1 4.95 0 1 8
## # … with 1 more variable: unemploy_rate <dbl></code></pre>
<p>A summary of the variables shows that log wages range from 0.708 to 4.304, which corresponds to roughly 2 to 74 dollars per hour. Experience ranges from just above 0 to almost 13 years.</p>
<pre class="r"><code>wages %>% select(ln_wages, xp) %>% summary()</code></pre>
<pre><code>## ln_wages xp id
## Min. :0.708 Min. : 0.001 Min. : 31
## 1st Qu.:1.591 1st Qu.: 1.609 1st Qu.: 3194
## Median :1.842 Median : 3.451 Median : 6582
## Mean :1.897 Mean : 3.957 Mean : 6301
## 3rd Qu.:2.140 3rd Qu.: 5.949 3rd Qu.: 9300
## Max. :4.304 Max. :12.700 Max. :12543</code></pre>
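<p>Because <code>ln_wages</code> is on the natural log scale, dollar amounts come from exponentiating the summary values. Here is a minimal sketch that uses only the minimum and maximum printed above, not the data itself; the variable names are mine.</p>
<pre class="r"><code># Minimum and maximum of ln_wages, copied from the summary output above
ln_wage_range <- c(min = 0.708, max = 4.304)

# Back-transform from log dollars to dollars per hour
wage_range <- exp(ln_wage_range)
round(wage_range, 2)</code></pre>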
<p>Next, we construct the <code>tsibble</code> data structure to make use of some of the very convenient <code>brolgar</code> sampling functions, and count the number of observations for each subject.</p>
<pre class="r"><code>wages_t <- as_tsibble(wages,
key = id,
index = xp,
regular = FALSE)
num_obs <- wages_t %>% features(ln_wages,n_obs)
summary(num_obs$n_obs)</code></pre>
<pre><code>## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 5.00 8.00 7.21 9.00 13.00</code></pre>
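<p>To see what the <code>n_obs</code> feature is counting, here is a base R sketch on a small made-up data frame. <code>toy</code> and its values are hypothetical and only stand in for the real <code>tsibble</code>.</p>
<pre class="r"><code># Hypothetical toy data: three subjects measured at irregular times
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  xp = c(0.1, 1.2, 2.5, 0.4, 3.1, 0.2, 0.9, 1.7, 4.0),
  y  = c(1.5, 1.6, 1.8, 1.4, 1.9, 1.3, 1.5, 1.6, 2.0)
)

# Number of observations per subject, analogous to the n_obs feature
n_obs <- table(toy$id)
summary(as.vector(n_obs))</code></pre>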
<p>Following the <code>brolgar</code> package vignettes, we filter out subjects with few observations, here those with three or fewer. Then we use the <code>sample_n_keys()</code> function to generate a random sample of 10 curves of wages versus years of experience and plot them.</p>
<pre class="r"><code>df <- wages_t %>% add_n_obs() %>% filter(n_obs > 3)
set.seed(123)
df %>%
sample_n_keys(size = 10) %>%
ggplot(aes(x = xp, y = ln_wages, group = id, color = id)) +
geom_line()</code></pre>
<p><img src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/figure-html/unnamed-chunk-4-1.png" width="672" /></p>
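<p>The important detail in the two steps above is that both the filter and the sample act on whole subjects rather than on individual rows. The following base R sketch shows the same idea on made-up data. It is only an illustration of the logic, not how <code>brolgar</code> implements it, and all names in it are hypothetical.</p>
<pre class="r"><code>set.seed(123)

# Hypothetical toy data: subject k has k + 1 observations (2, 3, ..., 7)
toy <- data.frame(id = rep(1:6, times = 2:7))
toy$xp <- ave(toy$id, toy$id, FUN = seq_along)  # observation index within subject

# Keep only subjects with more than 3 observations, as in filter(n_obs > 3)
n_obs <- ave(toy$id, toy$id, FUN = length)
kept <- toy[n_obs > 3, ]

# Sample 2 subjects and keep every row that belongs to them
ids <- sample(unique(kept$id), size = 2)
sampled <- kept[kept$id %in% ids, ]
table(sampled$id)</code></pre>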
<p><code>brolgar</code> makes it easy to generate lots of panels with small numbers of curves in order to get a feel for the data.</p>
<pre class="r"><code>df %>% ggplot(aes(x = xp, y = ln_wages, group = id, colour = id)) +
geom_line() +
facet_sample(n_per_facet = 3, n_facets = 20)</code></pre>
<p><img src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/figure-html/unnamed-chunk-5-1.png" width="672" /></p>
<p>Finally, for comparison with the plots produced by <code>fdapace</code> we plot the curves for the first two subjects in the <code>tsibble</code>.</p>
<pre class="r"><code>df %>% filter(id == 31 | id == 36) %>%
ggplot(aes(x = xp, y = ln_wages, group = id, color = id)) +
geom_line()</code></pre>
<p><img src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/figure-html/unnamed-chunk-6-1.png" width="672" /></p>
<p>We finish our initial look at the data by noting that we are really dealing with sparse data here. Some curves have only 4 measurements, no curve has more than 13 measurements, and all subjects were measured at different times. This is a classic longitudinal data set.</p>
</div>
<div id="functional-pca" class="section level3">
<h3>Functional PCA</h3>
<p>As I mentioned in my previous post, Principal Components by Conditional Expectation (PACE), described in <a href="https://anson.ucdavis.edu/~mueller/jasa03-190final.pdf">Yao, Müller & Wang (2005)</a>, was designed for sparse data. The method works by pooling the data. Curves are not individually smoothed. Instead, estimates of the FPC scores are obtained from the entire ensemble of data. (See equation (5) in the reference above.)</p>
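<p>To make the conditional expectation idea concrete, here is a minimal sketch for a model with a single component. As I read equation (5), the score estimate is the best linear predictor of the score given the subject's sparse observations: the eigenvalue times the eigenfunction evaluated at the subject's time points, multiplied by the inverse covariance of the observed vector and the centered observations. In the sketch the mean, eigenfunction, eigenvalue and error variance are simply assumed; in PACE they are estimated from the pooled data. None of the names below come from <code>fdapace</code>.</p>
<pre class="r"><code># Assumed (not estimated) model components on [0, 1], chosen for illustration
mu     <- function(t) 0 * t                  # mean function
phi1   <- function(t) sqrt(2) * sin(pi * t)  # first eigenfunction
lambda <- 1                                  # first eigenvalue
sigma2 <- 0.1                                # measurement error variance

# One sparsely observed subject: three time points, noise-free for clarity
t_i  <- c(0.2, 0.5, 0.9)
xi_i <- 1.5                                  # true score, unknown in practice
y_i  <- mu(t_i) + xi_i * phi1(t_i)

# Covariance of the observed vector: lambda * phi phi' + sigma2 * I
phi_i   <- phi1(t_i)
Sigma_i <- lambda * tcrossprod(phi_i) + sigma2 * diag(length(t_i))

# Conditional expectation of the score given the sparse observations
xi_hat <- lambda * drop(crossprod(phi_i, solve(Sigma_i, y_i - mu(t_i))))
xi_hat</code></pre>
<p>The estimate is shrunk towards zero relative to the true score, and the shrinkage is stronger when a subject contributes fewer or less informative time points.</p>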
<p>The first step towards using the FPCA functions in the <a href="https://cran.r-project.org/package=fdapace"><code>fdapace</code></a> package is to reshape the data so that the time and wages data for each subject are stored as lists in separate columns of a <code>tibble</code>, where each row contains all of the data for a single id. (Standard <code>dplyr</code> operations might not work as expected on a <code>tsibble</code>.) The following code pulls just the required data into a <code>tibble</code> before the <code>dplyr</code> code in the somewhat untidy ‘for loop’ builds the data structure we will use for the analysis.</p>
<pre class="r"><code># Pull the three required columns out of the tsibble into a plain tibble
df2 <- df %>% select(id, xp, ln_wages) %>% as_tibble()
uid <- unique(df2$id)
N <- length(uid)
# Pre-allocate one list slot per subject for wages and for times
Wages <- vector("list", N)
Exp <- vector("list", N)
for (k in 1:N){
Wages[[k]] <- df2 %>% filter(id == uid[k]) %>% pull(ln_wages)
Exp[[k]] <- df2 %>% filter(id == uid[k]) %>% pull(xp)
}
df3 <- tibble( uid, Wages, Exp )
glimpse(df3)</code></pre>
<pre><code>## Rows: 764
## Columns: 3
## $ uid <int> 31, 36, 53, 122, 134, 145, 155, 173, 207, 222, 223, 226, 234, 24…
## $ Wages <list> <1.491, 1.433, 1.469, 1.749, 1.931, 1.709, 2.086, 2.129>, <1.98…
## $ Exp <list> <0.015, 0.715, 1.734, 2.773, 3.927, 4.946, 5.965, 6.984>, <0.31…</code></pre>
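<p>A shorter way to build the same kind of lists is base R's <code>split()</code>. The sketch below runs on a small made-up data frame standing in for <code>df2</code>; the values are hypothetical. Note that <code>split()</code> orders the groups by sorted id, whereas the loop above follows the order in which ids first appear, so the two only agree when the ids are already sorted.</p>
<pre class="r"><code># Hypothetical long-format data standing in for df2
toy <- data.frame(
  id       = c(1, 1, 1, 2, 2),
  xp       = c(0.1, 0.8, 1.7, 0.3, 1.1),
  ln_wages = c(1.5, 1.4, 1.6, 2.0, 1.8)
)

# One vector per subject, in sorted id order
Ly <- split(toy$ln_wages, toy$id)   # responses
Lt <- split(toy$xp, toy$id)         # observation times

str(Ly)
str(Lt)</code></pre>
<p><code>fdapace</code> also ships a helper, <code>MakeFPCAInputs()</code>, that is intended for this kind of reshaping. Check its documentation for the exact arguments before relying on it.</p>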
<p>The <code>FPCA()</code> function computes the functional principal components. Note that in the function call the input data are the two columns of lists we created above. The <code>dataType</code> parameter specifies the data as being sparse. <code>error = FALSE</code> means we are using a simple model that does not account for measurement error. <code>kernel = 'epan'</code> means that we are using the <a href="https://www.gabormelli.com/RKB/Epanechnikov_Kernel">Epanechnikov</a> kernel to smooth the pooled data when computing the mean and covariance. (For this data set, Epanechnikov seems to yield better results than the default Gaussian kernel.)</p>
<pre class="r"><code>res_wages <- FPCA(df3$Wages,df3$Exp, list(dataType='Sparse', error=FALSE, kernel='epan', verbose=TRUE))</code></pre>
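<p>For readers who have not met the Epanechnikov kernel, the sketch below defines it and uses it in a simple kernel-weighted average of pooled observations. This is a local-constant smoother written for illustration only. The smoother inside <code>fdapace</code> is more elaborate, and the function names, the bandwidth <code>h</code> and the simulated data here are all my own assumptions.</p>
<pre class="r"><code># Epanechnikov kernel: K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Kernel-weighted average of pooled observations at a target time t0
# (a local-constant smoother; h is the bandwidth)
kernel_mean <- function(t0, t, y, h) {
  w <- epan((t - t0) / h)
  sum(w * y) / sum(w)
}

# Hypothetical pooled data from many subjects on a common time axis
set.seed(1)
t_pool <- runif(200, 0, 10)
y_pool <- 1.5 + 0.1 * t_pool + rnorm(200, sd = 0.2)

# Estimated mean at t = 5 using only observations within one unit of time
kernel_mean(5, t_pool, y_pool, h = 1)</code></pre>
<p>Unlike the Gaussian kernel, the Epanechnikov kernel has compact support, so observations farther than one bandwidth from the target time get exactly zero weight.</p>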
<p>The plot method for the resulting <code>FPCA</code> object provides a visual summary of the results. Going clockwise from the upper left, the Design Plot shows that collectively the data are fairly dense over the support except at the upper range near twelve years. The computed mean function for the data shows a little dip in wages near the beginning and then a clear upward trend until it abruptly drops towards the end. The first three eigenfunctions are plotted at the bottom right, and the scree plot at the bottom left shows that the first eigenfunction accounts for about 60% of the total variation and that the first eight eigenfunctions account for roughly 97.5% of the variance, with eleven needed to reach 99%. Note that the default label for the <em>time</em> axis in all of the <code>fdapace</code> plots is <em>s</em> for support.</p>
<pre class="r"><code>plot(res_wages)</code></pre>
<p><img src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/figure-html/unnamed-chunk-9-1.png" width="672" />
We can obtain the exact estimates for the <em>cumulative Fraction of Variance Explained</em> by picking <code>cumFVE</code> out of the <code>FPCA</code> object.</p>
<pre class="r"><code>round(res_wages$cumFVE,3)</code></pre>
<pre><code>## [1] 0.591 0.739 0.806 0.860 0.908 0.936 0.957 0.975 0.983 0.989 0.993</code></pre>
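<p>With these values in hand it is easy to ask how many components are needed to reach a given threshold. The sketch below copies the numbers printed above, so it runs without the fitted object; <code>n_components()</code> is a hypothetical helper of my own, not an <code>fdapace</code> function.</p>
<pre class="r"><code># Cumulative FVE values copied from the output above
cum_fve <- c(0.591, 0.739, 0.806, 0.860, 0.908, 0.936,
             0.957, 0.975, 0.983, 0.989, 0.993)

# Smallest number of components whose cumulative FVE reaches a threshold
n_components <- function(cum_fve, threshold) which(cum_fve >= threshold)[1]

n_components(cum_fve, 0.95)
n_components(cum_fve, 0.99)

# Individual (non-cumulative) contribution of each component
round(diff(c(0, cum_fve)), 3)</code></pre>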
<p>The following plot shows the smoothed curves estimated for the first two subjects. These are the same subjects plotted in the third plot above. The circles indicate the input data. Everything looks pretty good here.</p>
<pre class="r"><code>CreatePathPlot(res_wages, subset=1:2, main = 'Estimated Paths for IDs 31 and 36'); grid()</code></pre>
<p><img src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/figure-html/unnamed-chunk-11-1.png" width="672" />
But looking at just one more subject shows how quickly things can apparently go off the rails. The green curve for subject id 53 after 1.77 years is pure algorithmic imagination. Although there are several data points in the first two years, there is nothing thereafter.</p>
<pre class="r"><code>CreatePathPlot(res_wages, subset= 1:3, showMean=TRUE)</code></pre>
<p><img src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/figure-html/unnamed-chunk-12-1.png" width="672" /></p>
<p>The value of the FPCA analysis lies in estimating the mean function and the covariance operator, which are constructed from the pooled data, and not in predicting individual paths.</p>
<p>The covariance surface is easily plotted with the help of the function <code>GetCovSurface()</code>, which takes the same two lists of wages and times and returns the estimated covariance surface together with its time grid. (As called here, it runs with the package default options rather than the options passed to <code>FPCA()</code> above.) These are in the right format for a three dimensional, interactive <code>plotly</code> visualization. Rotating the plot and changing the viewing angle reveals quite a bit about the details of the estimated covariance surface.</p>
<pre class="r"><code>covS <- GetCovSurface(df3$Wages,df3$Exp)
x <- covS$workGrid
Surf <- covS$cov
fig <- plot_ly(x = x, y = x, z = ~Surf) %>%
add_surface(contours = list(
z = list(show=TRUE,usecolormap=TRUE,
highlightcolor="#ff0000", project=list(z=TRUE))))
fig <- fig %>%
layout(scene = list(camera=list(eye = list(x=1.87, y=0.88, z=-0.64))))
fig</code></pre>
<div id="htmlwidget-1" style="width:672px;height:480px;" class="plotly html-widget"></div>
<script type="application/json" data-for="htmlwidget-1">{"x":{"visdat":{"3e4258572446":["function () ","plotlyVisDat"]},"cur_data":"3e4258572446","attrs":{"3e4258572446":{"x":[0.001,0.25498,0.50896,0.76294,1.01692,1.2709,1.52488,1.77886,2.03284,2.28682,2.5408,2.79478,3.04876,3.30274,3.55672,3.8107,4.06468,4.31866,4.57264,4.82662,5.0806,5.33458,5.58856,5.84254,6.09652,6.3505,6.60448,6.85846,7.11244,7.36642,7.6204,7.87438,8.12836,8.38234,8.63632,8.8903,9.14428,9.39826,9.65224,9.90622,10.1602,10.41418,10.66816,10.92214,11.17612,11.4301,11.68408,11.93806,12.19204,12.44602,12.7],"y":[0.001,0.25498,0.50896,0.76294,1.01692,1.2709,1.52488,1.77886,2.03284,2.28682,2.5408,2.79478,3.04876,3.30274,3.55672,3.8107,4.06468,4.31866,4.57264,4.82662,5.0806,5.33458,5.58856,5.84254,6.09652,6.3505,6.60448,6.85846,7.11244,7.36642,7.6204,7.87438,8.12836,8.38234,8.63632,8.8903,9.14428,9.39826,9.65224,9.90622,10.1602,10.41418,10.66816,10.92214,11.17612,11.4301,11.68408,11.93806,12.19204,12.44602,12.7],"z":{},"alpha_stroke":1,"sizes":[10,100],"spans":[1,20],"type":"surface","contours":{"z":{"show":true,"usecolormap":true,"highlightcolor":"#ff0000","project":{"z":true}}},"inherit":true}},"layout":{"margin":{"b":40,"l":60,"t":25,"r":10},"scene":{"camera":{"eye":{"x":1.87,"y":0.88,"z":-0.64}},"xaxis":{"title":[]},"yaxis":{"title":[]},"zaxis":{"title":"Surf"}},"hovermode":"closest","showlegend":false,"legend":{"yanchor":"top","y":0.5}},"source":"A","config":{"showSendToCloud":false},"data":[{"colorbar":{"title":"Surf","ticklen":2,"len":0.5,"lenmode":"fraction","y":1,"yanchor":"top"},"colorscale":[["0","rgba(68,1,84,1)"],["0.0416666666666667","rgba(70,19,97,1)"],["0.0833333333333334","rgba(72,32,111,1)"],["0.125","rgba(71,45,122,1)"],["0.166666666666667","rgba(68,58,128,1)"],["0.208333333333333","rgba(64,70,135,1)"],["0.25","rgba(60,82,138,1)"],["0.291666666666667","rgba(56,93,140,1)"],["0.333333333333333","rgba(49,104,142,1)"],["0.375","rgba(46,114,142,1)"],["0.416666666666667","rgba(42,123,142,1
)"],["0.458333333333333","rgba(38,133,141,1)"],["0.5","rgba(37,144,140,1)"],["0.541666666666667","rgba(33,154,138,1)"],["0.583333333333333","rgba(39,164,133,1)"],["0.625","rgba(47,174,127,1)"],["0.666666666666667","rgba(53,183,121,1)"],["0.708333333333333","rgba(79,191,110,1)"],["0.75","rgba(98,199,98,1)"],["0.791666666666667","rgba(119,207,85,1)"],["0.833333333333333","rgba(147,214,70,1)"],["0.875","rgba(172,220,52,1)"],["0.916666666666667","rgba(199,225,42,1)"],["0.958333333333333","rgba(226,228,40,1)"],["1","rgba(253,231,37,1)"]],"showscale":true,"x":[0.001,0.25498,0.50896,0.76294,1.01692,1.2709,1.52488,1.77886,2.03284,2.28682,2.5408,2.79478,3.04876,3.30274,3.55672,3.8107,4.06468,4.31866,4.57264,4.82662,5.0806,5.33458,5.58856,5.84254,6.09652,6.3505,6.60448,6.85846,7.11244,7.36642,7.6204,7.87438,8.12836,8.38234,8.63632,8.8903,9.14428,9.39826,9.65224,9.90622,10.1602,10.41418,10.66816,10.92214,11.17612,11.4301,11.68408,11.93806,12.19204,12.44602,12.7],"y":[0.001,0.25498,0.50896,0.76294,1.01692,1.2709,1.52488,1.77886,2.03284,2.28682,2.5408,2.79478,3.04876,3.30274,3.55672,3.8107,4.06468,4.31866,4.57264,4.82662,5.0806,5.33458,5.58856,5.84254,6.09652,6.3505,6.60448,6.85846,7.11244,7.36642,7.6204,7.87438,8.12836,8.38234,8.63632,8.8903,9.14428,9.39826,9.65224,9.90622,10.1602,10.41418,10.66816,10.92214,11.17612,11.4301,11.68408,11.93806,12.19204,12.44602,12.7],"z":[[0.0331322070546811,0.0318695963697909,0.0306084275995855,0.0293443616254352,0.028072629395297,0.0267906480987916,0.0255006756921525,0.0242120268543572,0.0229422198561734,0.0217163929495041,0.0205645744597996,0.0195170186693932,0.0185987116402154,0.0178248600414893,0.0171991275937116,0.0167153708708982,0.0163621251457203,0.016127955920449,0.0160055878036825,0.0159933918024623,0.0160938921540552,0.0163099523455379,0.0166398999476985,0.0170729652007625,0.0175861660276314,0.0181434002723871,0.018697160461305,0.0191929524421927,0.0195760566546932,0.0197996745792023,0.0198329265466436,0.019666951979124,0.019317755738
4191,0.0188253291304183,0.0182494927870589,0.0176634036179944,0.0171456377144425,0.0167714924650897,0.0166040332006384,0.0166856083812501,0.0170309009009039,0.0176227077845935,0.0184112786443383,0.0193172496887705,0.0202373545430986,0.0210516141248012,0.0216308185565193,0.0218436758463872,0.0215636271123677,0.0206756173712682,0.0190828850096431],[0.0318695963697909,0.031511328351703,0.0310844213553877,0.0305886167447588,0.0300221313024635,0.0293838087120821,0.0286753652366703,0.0279034036844076,0.0270807724091334,0.0262268026862442,0.0253660533977032,0.0245255219649181,0.0237308522495712,0.0230026783208871,0.0223544820291294,0.0217928829847496,0.0213202272841608,0.0209382638901337,0.0206512098636519,0.0204668224419947,0.0203949617226654,0.0204440719199803,0.0206166411527903,0.0209048771066281,0.0212876675483224,0.0217295972350372,0.0221825046990564,0.0225897571489521,0.0228929803263033,0.0230403620782594,0.0229950257762506,0.0227416870288337,0.0222901364657058,0.0216749595037037,0.0209518737204194,0.0201916473732674,0.0194725899040552,0.0188723116905523,0.0184592270070578,0.0182843430388284,0.0183741443774614,0.0187255391079382,0.0193036144352677,0.0200423581397086,0.0208478281555037,0.021602876289728,0.0221726533066038,0.0224106243518809,0.0221653460138802,0.0212884113617055,0.0196435736264059],[0.0306084275995855,0.0310844213553877,0.0314203443157761,0.0316211732112498,0.0316899997689659,0.0316296698778145,0.0314444829686355,0.031141735926594,0.0307328651729079,0.0302339329167516,0.0296652411030064,0.0290499875995216,0.0284121506968899,0.0277741613522838,0.0271552007357069,0.026570865378461,0.0260343645616434,0.0255586067552207,0.0251579795896333,0.0248486796578694,0.0246470601734909,0.024566269920893,0.0246120535026586,0.0247787839835087,0.0250466839220888,0.0253809574004011,0.0257333265645521,0.0260462056967514,0.0262593329966311,0.0263180703825426,0.0261819359281641,0.025831592457205,0.0252727742295653,0.0245364639953456,0.0236756287150415,0.0227594766668228,0.
0218662830175306,0.0210755270415093,0.0204597776663962,0.0200767218002543,0.019961924623402,0.0201230862453322,0.0205364563355298,0.0211456469793341,0.0218625706246616,0.0225699549560327,0.0231250360536958,0.0233645023153083,0.023111210667939,0.0221832362790041,0.0204052535287975],[0.0293443616254352,0.0305886167447588,0.0316211732112498,0.0324527742841928,0.0330926122589903,0.033549427345952,0.0338325744107203,0.0339529229691742,0.0339234846317267,0.0337597059483445,0.0334794012503232,0.0331023272155123,0.0326494526229137,0.0321420923387922,0.0316012214453457,0.031047322714557,0.0305008998002475,0.02998334029036,0.0295174074411659,0.0291265943169415,0.0288329729453783,0.0286537846443632,0.0285975020695321,0.0286602597534576,0.0288234546599859,0.0290531349281225,0.0293016258764072,0.0295116403872773,0.0296227592326892,0.0295795907918275,0.0293402725455583,0.028883589112671,0.0282131669338688,0.0273579821762354,0.0263694175927364,0.0253158112109128,0.0242755760487848,0.0233296661615941,0.022553794222206,0.0220106712168221,0.0217426696601811,0.0217654972684213,0.0220634583583041,0.0225866023071851,0.0232496861286319,0.0239327005742675,0.0244828996383591,0.0247187370588077,0.0244365136706508,0.0234204805254652,0.021456414903528],[0.028072629395297,0.0300221313024635,0.0316899997689659,0.0330926122589903,0.0342457668280061,0.035165226524543,0.035867100741192,0.036367996968833,0.0366849499816742,0.0368352230856204,0.0368361319229153,0.036705028868662,0.0364595123559421,0.0361178439234699,0.0356995164882531,0.0352259038660699,0.0347208562860593,0.0342109531388896,0.0337249828906253,0.0332922721567286,0.0329398044529304,0.0326885091584365,0.0325494155058391,0.0325204245478211,0.0325843242153551,0.0327085147322396,0.0328467995228084,0.032943463587786,0.0329395682686644,0.0327808689153244,0.0324261328975807,0.0318542158157446,0.0310683701757549,0.0300969734622301,0.0289908442452867,0.0278180542076627,0.0266573229710324,0.0255907851168261,0.0246965027479584,0.0240408889623787
,0.0236712848970544,0.023609124657479,0.0238441881948054,0.0243302842342403,0.0249824555037043,0.0256757120523867,0.0262455376714748,0.0264908882983998,0.0261807728004444,0.0250653643770753,0.0228917047983721],[0.0267906480987916,0.0293838087120821,0.0316296698778145,0.033549427345952,0.035165226524543,0.0365001589179228,0.0375779327997819,0.0384222209715625,0.0390557934481177,0.0394996564073907,0.0397724863651319,0.0398906214993328,0.039868742804271,0.0397211907255714,0.0394636825816669,0.0391150591393827,0.0386985955588714,0.0382423748739331,0.0377783067772102,0.0373396493141163,0.0369573124635487,0.0366556042743079,0.0364482293719683,0.0363352361293477,0.0363013778603103,0.0363161774963071,0.0363359156859208,0.0363077011780925,0.0361755685166264,0.0358880933379644,0.0354064210283272,0.03471116976732,0.0338067269710181,0.0327221039924163,0.0315084612945201,0.0302341663872877,0.0289784558548719,0.0278244848982743,0.0268521019279102,0.0261304237510619,0.0257103226997689,0.0256171332160939,0.0258440117422347,0.0263463244850819,0.0270372918356324,0.0277851146244285,0.0284120989309966,0.0286967926700521,0.028380506127086,0.0271793781075005,0.0248021151327367],[0.0255006756921525,0.0286753652366703,0.0314444829686355,0.0338325744107203,0.035867100741192,0.0375779327997819,0.0389963219307895,0.0401534282277709,0.0410786174610846,0.0417978607923005,0.0423326308575331,0.0426996401654115,0.0429116121443023,0.0429790556104094,0.0429127816213682,0.0427266795680036,0.0424400690487865,0.0420788384185495,0.0416747176013895,0.0412625019030786,0.0408757045452815,0.0405416265908022,0.0402769245879492,0.040084463293372,0.0399518369263898,0.0398516815407925,0.0397438379186396,0.039579424852938,0.0393067502032849,0.0388786081930561,0.0382599652836571,0.0374346055545078,0.0364093221534189,0.0352148242199604,0.0339034295561093,0.0325443539260933,0.0312176325491698,0.0300074319152829,0.0289950491331545,0.0282515992223047,0.0278303977209859,0.0277592460923794,0.0280330007461299,0.02860682
55988149,0.0293904679760304,0.0302439668339692,0.0309755461573968,0.0313429736802349,0.0310600244238471,0.0298094247004417,0.0272624847552774],[0.0242120268543572,0.0279034036844076,0.031141735926594,0.0339529229691742,0.036367996968833,0.0384222209715625,0.0401534282277709,0.0415997646370282,0.0427971577911487,0.0437769617221728,0.0445642600859563,0.0451772221872844,0.0456277189660057,0.04592318372548,0.0460694929360384,0.0460744279739409,0.0459510134802218,0.0457197707341108,0.0454089123023939,0.045051980609218,0.0446833151434125,0.0443325374328273,0.0440194809255232,0.0437506202194773,0.0435174524559551,0.0432968591513079,0.0430533531143708,0.0427431422914126,0.0423198800384787,0.041741670224981,0.0409784095422716,0.0400181407658664,0.0388710841047306,0.0375705432865608,0.0361707306601078,0.034742269401235,0.0333663571800869,0.0321283085190226,0.0311107307504811,0.0303862751998386,0.0300098953209992,0.0300107465165768,0.0303840689693109,0.0310834720514554,0.0320140457697783,0.0330268552853245,0.0339157681373235,0.0344181265782517,0.0342211476124548,0.0329756318229061,0.0303172817711952],[0.0229422198561734,0.0270807724091334,0.0307328651729079,0.0339234846317267,0.0366849499816742,0.0390557934481177,0.0410786174610846,0.0427971577911487,0.0442529887701707,0.0454824432758555,0.0465143235143619,0.0473688329525233,0.0480579198058665,0.048586988191066,0.0489577712307832,0.0491720297546636,0.0492355165620847,0.0491612917511296,0.0489712182000815,0.0486947136249981,0.0483647338695585,0.0480120695132449,0.0476596314143326,0.047318167274733,0.0469841324033539,0.0466397997562264,0.0462554189796303,0.04579321924752,0.045213025409274,0.0444790265391244,0.0435668192610594,0.042469482036325,0.0412014299600058,0.0397992897434228,0.0383198276984895,0.0368356380231976,0.035429515100045,0.0341881768360146,0.0331955548265636,0.0325255539276368,0.0322341671793313,0.032351035192074,0.0328707640014087,0.0337444298930393,0.0348717546066247,0.0360946142218224,0.0371929840285189,0.03788
50222502866,0.0378333851627186,0.0366595435632467,0.033966503058887],[0.0217163929495041,0.0262268026862442,0.0302339329167516,0.0337597059483445,0.0368352230856204,0.0394996564073907,0.0417978607923005,0.0437769617221728,0.0454824432758555,0.0469544296254086,0.0482248415510343,0.0493159068889525,0.0502401949657233,0.0510020717135082,0.051600337508263,0.0520317811264457,0.0522952875855835,0.0523958186937653,0.052347151470331,0.0521721525117859,0.0518999967310636,0.051560940778863,0.0511802713868282,0.0507732101226357,0.0503419353831115,0.0498750765127259,0.0493495297935976,0.0487342891617836,0.0479959315171738,0.0471052125440611,0.0460438847500474,0.0448105437488646,0.0434243246084322,0.0419257373646328,0.0403746734207767,0.0388462433236286,0.0374253059640954,0.0362003045305237,0.0352565956086472,0.0346691587521296,0.0344945578485685,0.0347622238454742,0.0354653549447156,0.0365518603713253,0.0379158575355582,0.0393904494187039,0.040742988478514,0.0416746727400368,0.0418267364762657,0.0407951715946082,0.0381544909245256],[0.0205645744597996,0.0253660533977032,0.0296652411030064,0.0334794012503232,0.0368361319229153,0.0397724863651319,0.0423326308575331,0.0445642600859563,0.0465143235143619,0.0482248415510343,0.0497295966756182,0.0510522508284168,0.0522060672925228,0.0531950813066915,0.0540164141691486,0.0546634615566294,0.0551297349076588,0.0554129627094953,0.055518616756928,0.0554616372084989,0.0552653191753962,0.0549572900391486,0.0545637429668059,0.0541037683390233,0.053585385320088,0.053004092967106,0.0523440333779606,0.051581464630932,0.0506900638309376,0.0496473902063883,0.0484415545052407,0.0470769076000458,0.0455776199433563,0.0439884858786634,0.0423729877906212,0.0408092384462426,0.0393846008910634,0.038189555773451,0.0373109890085399,0.0368247986457371,0.0367877087338195,0.0372283639033316,0.038137988574964,0.0394610180011997,0.0410862019494618,0.0428389243668108,0.0444759976719848,0.0456848692074931,0.0460896282272149,0.0452658894575249,0.0427651789303845]
,[0.0195170186693932,0.0245255219649181,0.0290499875995216,0.0331023272155123,0.036705028868662,0.0398906214993328,0.0426996401654115,0.0451772221872844,0.0473688329525233,0.0493159068889525,0.0510522508284168,0.0526018411772825,0.0539782445988262,0.0551855000657116,0.056220099760657,0.0570737433973107,0.0577366805901296,0.058201457900842,0.0584665898396193,0.0585392042675024,0.0584355150619547,0.0581784704229294,0.0577930229923928,0.0573005020972329,0.0567138801987863,0.0560352553140835,0.0552560839383807,0.054360049409957,0.053328049662346,0.0521445050236724,0.0508039225309999,0.0493164881911971,0.0477115747013433,0.0460385277666891,0.0443647670349988,0.04277178250366,0.0413497730546252,0.0401914639491365,0.0393852824208924,0.0390078286098411,0.0391155741302234,0.0397358852831468,0.0408576437369137,0.0424218328801814,0.0443125440979638,0.046349115163758,0.0482806604983484,0.0497849724381056,0.0504742642018614,0.0499099438517527,0.0476271646380002],[0.0185987116402154,0.0237308522495712,0.0284121506968899,0.0326494526229137,0.0364595123559421,0.039868742804271,0.0429116121443023,0.0456277189660057,0.0480579198058665,0.0502401949657233,0.0522060672925228,0.0539782445988262,0.0555697905486779,0.0569847184814562,0.0582196437939824,0.0592661127814069,0.0601133755308818,0.0607514927861801,0.0611745694532348,0.0613835618155174,0.061387750097617,0.0612040069372007,0.0608536489163157,0.0603576672737655,0.0597318805962031,0.058983602697914,0.0581108584551825,0.0571044154250173,0.0559522457302509,0.0546455605943084,0.0531852375642895,0.0515873300079298,0.0498865216933156,0.0481368933504082,0.0464100300839153,0.0447910174000128,0.043373031408999,0.0422510429655798,0.041514840834889,0.0412413717499742,0.0414863944493764,0.042275579018315,0.0435953084654497,0.0453834893143447,0.0475207426909342,0.0498226133977412,0.0520340121657551,0.0538278673371518,0.0548105000575356,0.0545360015284652,0.052530482460023],[0.0178248600414893,0.0230026783208871,0.0277741613522838,0.032142092338
7922,0.0361178439234699,0.0397211907255714,0.0429790556104094,0.04592318372548,0.048586988191066,0.0510020717135082,0.0531950813066915,0.0551855000657116,0.0569847184814562,0.0585963735587847,0.0600176639883751,0.061241256945453,0.062257490072733,0.0630567248468819,0.0636317684888794,0.0639801498475518,0.0641057577250964,0.0640191546452811,0.0637360515952995,0.063274057972641,0.062648625762394,0.0618696058873551,0.0609397352232774,0.0598557726821016,0.0586122210548296,0.0572068762776654,0.0556469736621194,0.0539545341813076,0.0521697166573959,0.0503515180409112,0.0485758316043286,0.0469313817098358,0.0455142129926303,0.0444212530086321,0.0437431948848057,0.0435567709287507,0.0439164860155415,0.0448459744950298,0.0463292085754549,0.0483017799036967,0.0506425125125869,0.0531659366943929,0.0556167609033422,0.0576682820873136,0.0589272621375833,0.0589476194755151,0.0572539355838214],[0.0171991275937116,0.0223544820291294,0.0271552007357069,0.0316012214453457,0.0356995164882531,0.0394636825816669,0.0429127816213682,0.0460694929360384,0.0489577712307832,0.051600337508263,0.0540164141691486,0.056220099760657,0.0582196437939824,0.0600176639883751,0.0616121307218852,0.0629978135304456,0.0641678843030938,0.0651154760881114,0.0658351243262892,0.0663240697163271,0.0665833174235592,0.0666181657746142,0.0664378041517355,0.0660537395296178,0.0654772992279524,0.0647170520509045,0.0637773114496324,0.0626586901024346,0.0613610332114099,0.0598882547154044,0.0582539468743659,0.0564863485347136,0.0546314284187607,0.0527533869635769,0.050932558636991,0.0492612081923167,0.0478378833053571,0.0467608548245097,0.0461209398623877,0.0459938501190088,0.0464321946920124,0.0474573139732284,0.0490511191524726,0.0511480492320706,0.0536272691498109,0.0563055137478691,0.0589316217540477,0.0611846483990364,0.062678081017518,0.0629725651941309,0.0615982752030172],[0.0167153708708982,0.0217928829847496,0.026570865378461,0.031047322714557,0.0352259038660699,0.0391150591393827,0.0427266795680036,0.0460744
279739409,0.0491720297546636,0.0520317811264457,0.0546634615566294,0.0570737433973107,0.0592661127814069,0.061241256945453,0.0629978135304456,0.0645333100972503,0.0658450782765156,0.0669309523068001,0.0677896675792781,0.068421012364542,0.0688258673468949,0.0690062249473209,0.0689651310582874,0.0687063639849809,0.0682337313746592,0.0675501867353708,0.0666573746184522,0.0655563892916451,0.0642502577019007,0.0627480047056168,0.0610694440316023,0.0592494193188867,0.0573402919545905,0.0554119769009098,0.0535494934426646,0.0518485076341788,0.0504095219009486,0.0493312595848625,0.0487035830371259,0.0486001403838893,0.0490708988766152,0.0501347147520288,0.051772022780174,0.0539176208888705,0.0564535281944212,0.0592021990009788,0.0619210517798117,0.0643001544683514,0.0659655850158709,0.0664909288536984,0.0654181840902741],[0.0163621251457203,0.0213202272841608,0.0260343645616434,0.0305008998002475,0.0347208562860593,0.0386985955588714,0.0424400690487865,0.0459510134802218,0.0492355165620847,0.0522952875855835,0.0551297349076588,0.0577366805901296,0.0601133755308818,0.062257490072733,0.0641678843030938,0.0658450782765156,0.0672913751291343,0.0685105823250321,0.0695073140434355,0.070285971913449,0.0708496397913598,0.0711992024309695,0.0713329478099026,0.0712467515840892,0.0709347669694883,0.070390491316937,0.069608214532828,0.0685850591094622,0.0673238537979696,0.0658367880766665,0.0641492850889093,0.0623031293376317,0.0603578749677877,0.0583899685967409,0.0564896130930861,0.0547558642479761,0.0532906154317949,0.0521920250661299,0.0515477453532323,0.0514281618677676,0.051879779055568,0.0529188175908524,0.0545249656843273,0.0566350945847835,0.0591367698553657,0.061861739126727,0.064580298867567,0.0669983577770905,0.068759722155404,0.0694561217415781,0.0686463861879824],[0.016127955920449,0.0209382638901337,0.0255586067552207,0.02998334029036,0.0342109531388896,0.0382423748739331,0.0420788384185495,0.0457197707341108,0.0491612917511296,0.0523958186937653,0.0554129627094953,0.058
201457900842,0.0607514927861801,0.0630567248468819,0.0651154760881114,0.0669309523068001,0.0685105823250321,0.069864660771131,0.0710044800140192,0.0719401604443124,0.0726784629233871,0.0732209509670114,0.0735629059229249,0.0736933452404863,0.0735963371236817,0.0732535753663923,0.0726479601160189,0.0717678054186407,0.0706112552626868,0.0691904418657728,0.0675348095869117,0.065692947907517,0.0637323921756593,0.0617372112051954,0.0598036597815166,0.0580345032357235,0.0565326927574597,0.0553949301746461,0.0547054523864568,0.0545301977218941,0.0549113968697175,0.0558625014438278,0.057363196237199,0.0593541128195264,0.0617309347674406,0.0643380045051197,0.0669623185873697,0.0693297398868033,0.0711059789698952,0.0719049280351492,0.0713059001829667],[0.0160055878036825,0.0206512098636519,0.0251579795896333,0.0295174074411659,0.0337249828906253,0.0377783067772102,0.0416747176013895,0.0454089123023939,0.0489712182000815,0.052347151470331,0.055518616756928,0.0584665898396193,0.0611745694532348,0.0636317684888794,0.0658351243262892,0.0677896675792781,0.0695073140434355,0.0710044800140192,0.0722990125927399,0.0734068810914066,0.0743390128301715,0.0750986315464315,0.0756794756670089,0.076065315001788,0.0762311856672073,0.0761466265710056,0.075780857900844,0.0751093385037901,0.0741206534827394,0.0728224358154073,0.0712451420777093,0.0694429483429996,0.067491651440042,0.0654840517630202,0.0635236807961097,0.0617178308359329,0.060170681698369,0.0589770306205514,0.0582168546282454,0.0579507298666269,0.0582159700752847,0.0590231699298102,0.0603526522165691,0.0621502240248153,0.0643218025811392,0.0667269834018504,0.0691724614632012,0.0714071769938292,0.0731217847031828,0.0739551043288354,0.0735092539496809],[0.0159933918024623,0.0204668224419947,0.0248486796578694,0.0291265943169415,0.0332922721567286,0.0373396493141163,0.0412625019030786,0.045051980609218,0.0486947136249981,0.0521721525117859,0.0554616372084989,0.0585392042675024,0.0613835618155174,0.0639801498475518,0.066324069716327
1,0.068421012364542,0.070285971913449,0.0719401604443124,0.0734068810914066,0.0747071348803353,0.0758555631684153,0.0768571240245939,0.0777047859678467,0.0783785506966801,0.0788462522396025,0.0790666738538563,0.0789953500438669,0.0785928135615148,0.0778340945249359,0.0767174138310883,0.0752698128492905,0.0735482048684632,0.0716357088710055,0.0696344276166984,0.0676564827666857,0.0658150176284293,0.0642163197260308,0.0629535863969765,0.0621023879018121,0.061717597256679,0.0618313585729791,0.0624514729788767,0.0635594121154091,0.0651071451363291,0.0670122289257222,0.0691512267404509,0.0713524169495847,0.0733897236751991,0.0749805227340939,0.0757900605957289,0.075444345101303],[0.0160938921540552,0.0203949617226654,0.0246470601734909,0.0288329729453783,0.0329398044529304,0.0369573124635487,0.0408757045452815,0.0446833151434125,0.0483647338695585,0.0518999967310636,0.0552653191753962,0.0584355150619547,0.061387750097617,0.0641057577250964,0.0665833174235592,0.0688258673468949,0.0708496397913598,0.0726784629233871,0.0743390128301715,0.0758555631684153,0.0772451452905528,0.0785136830202586,0.0796533474708173,0.0806412705413001,0.0814399083834682,0.0819996339966568,0.0822642710045249,0.0821798978394456,0.081706164761703,0.0808278827214659,0.0795636374776064,0.0779685686437122,0.0761303232849857,0.0741595405776082,0.0721777216884198,0.070305399846015,0.0686525563853742,0.0673120344566068,0.0663558236564226,0.0658336118704106,0.0657727555273533,0.0661786504479392,0.0670343867389329,0.0682986586983935,0.0699012934552287,0.0717364804592963,0.0736547254506154,0.0754555175425166,0.0768834080097065,0.0776303138573719,0.0773460695850045],[0.0163099523455379,0.0204440719199803,0.024566269920893,0.0286537846443632,0.0326885091584365,0.0366556042743079,0.0405416265908022,0.0443325374328273,0.0480120695132449,0.051560940778863,0.0549572900391486,0.0581784704229294,0.0612040069372007,0.0640191546452811,0.0666181657746142,0.0690062249473209,0.0711992024309695,0.0732209509670114,0.075098
6315464315,0.0768571240245939,0.0785136830202586,0.0800736658544262,0.081527673681703,0.0828501372891118,0.0839994229280531,0.084919882111281,0.0855466525199673,0.085813999688535,0.0856671466414864,0.0850758390493886,0.0840460870116161,0.0826259658085248,0.080902954094887,0.0789934166472068,0.0770276203127833,0.0751345617927553,0.0734298575278577,0.072008113536416,0.0709396406065027,0.0702704963000985,0.0700244602327785,0.0702054224577551,0.0707986936857298,0.0717699947986146,0.0730614264944729,0.0745845358079433,0.0762115637406577,0.0777669010031268,0.079021472046981,0.0796929205297642,0.0794537853683119],[0.0166398999476985,0.0206166411527903,0.0246120535026586,0.0285975020695321,0.0325494155058391,0.0364482293719683,0.0402769245879492,0.0440194809255232,0.0476596314143326,0.0511802713868282,0.0545637429668059,0.0577930229923928,0.0608536489163157,0.0637360515952995,0.0664378041517355,0.0689651310582874,0.0713329478099026,0.0735629059229249,0.0756794756670089,0.0777047859678467,0.0796533474708173,0.081527673681703,0.0833153353332167,0.0849875216533874,0.0864990551751751,0.0877900769011238,0.0887900878799529,0.0894252995595545,0.0896298066760721,0.0893596351395245,0.0886065868477193,0.0874072786352219,0.0858433748977925,0.0840320495417437,0.0821095463954886,0.0802129743792947,0.0784651138009087,0.0769648698729929,0.0757836661338225,0.0749665006787499,0.0745356841725799,0.0744951355422329,0.0748332970184869,0.0755231887018097,0.0765188450798398,0.0777482904039718,0.0791041891881894,0.0804342146468295,0.0815338496952199,0.0821445328592555,0.0819594966089163],[0.0170729652007625,0.0209048771066281,0.0247787839835087,0.0286602597534576,0.0325204245478211,0.0363352361293477,0.040084463293372,0.0437506202194773,0.047318167274733,0.0507732101226357,0.0541037683390233,0.0573005020972329,0.0603576672737655,0.063274057972641,0.0660537395296178,0.0687063639849809,0.0712467515840892,0.0736933452404863,0.076065315001788,0.0783785506966801,0.0806412705413001,0.0828501372891118,0
.0849875216533874,0.0870201393595641,0.0888990586348729,0.0905611958981878,0.0919328005582537,0.0929357782139176,0.0934975781889819,0.093564362644805,0.0931152728196424,0.0921736423997681,0.0908105091740811,0.08913783903358,0.0872928194407002,0.0854180751676755,0.0836436352676376,0.0820748008423383,0.0807872298262874,0.0798281487438137,0.0792212833346261,0.0789727505795509,0.0790754419614093,0.0795101149518328,0.0802423419962743,0.081215495753131,0.0823409428008623,0.0834874856536821,0.0844727420062443,0.0850593943383111,0.0849588020344645],[0.0175861660276314,0.0212876675483224,0.0250466839220888,0.0288234546599859,0.0325843242153551,0.0363013778603103,0.0399518369263898,0.0435174524559551,0.0469841324033539,0.0503419353831115,0.053585385320088,0.0567138801987863,0.0597318805962031,0.062648625762394,0.0654772992279524,0.0682337313746592,0.0709347669694883,0.0735963371236817,0.0762311856672073,0.0788462522396025,0.0814399083834682,0.0839994229280531,0.0864990551751751,0.0888990586348729,0.0911457744978129,0.0931730359723613,0.0949053043926594,0.096263180544068,0.0971719073718473,0.0975728659467336,0.097436692815716,0.0967748746785305,0.0956455762095412,0.0941502865248955,0.0924209006149407,0.0906006525186964,0.0888246222175502,0.0872050955894278,0.0858245059766227,0.0847356952841487,0.0839671130397174,0.0835297202332216,0.0834225616148196,0.0836348243437206,0.0841433537785074,0.0849057710511616,0.0858503692034513,0.0868648084228072,0.0877862633440928,0.0883959647032726,0.0884207553569314],[0.0181434002723871,0.0217295972350372,0.0253809574004011,0.0290531349281225,0.0327085147322396,0.0363161774963071,0.0398516815407925,0.0432968591513079,0.0466397997562264,0.0498750765127259,0.053004092967106,0.0560352553140835,0.058983602697914,0.0618696058873551,0.0647170520509045,0.0675501867353708,0.070390491316937,0.0732535753663923,0.0761466265710056,0.0790666738538563,0.0819996339966568,0.084919882111281,0.0877900769011238,0.0905611958981878,0.0931730359723613,0.095555645600
9848,0.0976322345247785,0.0993240829971134,0.100557818214925,0.101275000146099,0.101443110451299,0.101065832101195,0.100189455009278,0.0989022326285972,0.097325211198836,0.0955960861991019,0.0938504714530358,0.0922059136606917,0.0907525481483566,0.0895514382434187,0.0886389000624342,0.0880335342703454,0.0877424884744388,0.0877643064447572,0.0880870380626791,0.0886816214752877,0.0894916606878204,0.0904215758088382,0.0913257358299202,0.0920015175738187,0.0921890192445248],[0.018697160461305,0.0221825046990564,0.0257333265645521,0.0293016258764072,0.0328467995228084,0.0363359156859208,0.0397438379186396,0.0430533531143708,0.0462554189796303,0.0493495297935976,0.0523440333779606,0.0552560839383807,0.0581108584551825,0.0609397352232774,0.0637773114496324,0.0666573746184522,0.069608214532828,0.0726479601160189,0.075780857900844,0.0789953500438669,0.0822642710045249,0.0855466525199673,0.0887900878799529,0.0919328005582537,0.0949053043926594,0.0976322345247785,0.10003516025407,0.102036986984579,0.103568146491787,0.104574317256765,0.105024910606696,0.104920958426604,0.104300434195248,0.103238788654741,0.101843106124674,0.10024003194464,0.098559962311594,0.0969216927634622,0.0954216709187161,0.0941300950979937,0.093093363656403,0.0923401727443969,0.091887716123396,0.0917449588334424,0.0919112734279505,0.0923701939418145,0.0930792672556463,0.0939578908043491,0.0948756865613284,0.0956443435828908,0.0960157406076288],[0.0191929524421927,0.0225897571489521,0.0260462056967514,0.0295116403872773,0.032943463587786,0.0363077011780925,0.039579424852938,0.0427431422914126,0.04579321924752,0.0487342891617836,0.051581464630932,0.054360049409957,0.0571044154250173,0.0598557726821016,0.0626586901024346,0.0655563892916451,0.0685850591094622,0.0717678054186407,0.0751093385037901,0.0785928135615148,0.0821798978394456,0.085813999688535,0.0894252995595545,0.0929357782139176,0.096263180544068,0.0993240829971134,0.102036986984579,0.104326290372479,0.106127404614697,0.10739266324478,0.108097273867
193,0.108244393823937,0.10786831546042,0.107034659498267,0.105836563213491,0.104386463396965,0.102804375737443,0.101205133078558,0.0996879074202355,0.0983307132578995,0.0971905928021063,0.0963078613964445,0.0957113408970778,0.0954214646357943,0.0954491972577069,0.0957901778901475,0.0964148164781206,0.0972560718556443,0.0981973626594604,0.0990635044577494,0.0996175316874479],[0.0195760566546932,0.0228929803263033,0.0262593329966311,0.0296227592326892,0.0329395682686644,0.0361755685166264,0.0393067502032849,0.0423198800384787,0.045213025409274,0.0479959315171738,0.0506900638309376,0.053328049662346,0.0559522457302509,0.0586122210548296,0.0613610332114099,0.0642502577019007,0.0673238537979696,0.0706112552626868,0.0741206534827394,0.0778340945249359,0.081706164761703,0.0856671466414864,0.0896298066760721,0.0934975781889819,0.0971719073718473,0.100557818214925,0.103568146491787,0.106127404614697,0.108175829322502,0.109673444920823,0.110603525948291,0.110974840307782,0.110822298930238,0.110205812197299,0.109207117067596,0.107924291723597,0.106464007765531,0.104932444018865,0.103426784755773,0.102029527482395,0.100806902172034,0.0998109018765983,0.0990827757904326,0.0986552350119151,0.0985512045964603,0.0987782095222593,0.0993188002335625,0.10011850049598,0.101073565803754,0.102021357898257,0.102736190883306],[0.0197996745792023,0.0230403620782594,0.0263180703825426,0.0295795907918275,0.0327808689153244,0.0358880933379644,0.0388786081930561,0.041741670224981,0.0444790265391244,0.0471052125440611,0.0496473902063883,0.0521445050236724,0.0546455605943084,0.0572068762776654,0.0598882547154044,0.0627480047056168,0.0658367880766665,0.0691904418657728,0.0728224358154073,0.0767174138310883,0.0808278827214659,0.0850758390493886,0.0893596351395245,0.093564362644805,0.0975728659467336,0.101275000146099,0.104574317256765,0.10739266324478,0.109673444920823,0.111383857546067,0.112515877824602,0.113085776518446,0.11313222591449,0.112713393545381,0.111903411034972,0.110788305687717,0.1094
61229512625,0.108016993157218,0.106546529565121,0.105132493899998,0.103847111519071,0.102752419483958,0.101901732035687,0.101340326749818,0.101103449998262,0.101210604058146,0.101656224873726,0.102397931691148,0.103344400801675,0.104345513394756,0.105187564744777],[0.0198329265466436,0.0229950257762506,0.0261819359281641,0.0293402725455583,0.0324261328975807,0.0354064210283272,0.0382599652836571,0.0409784095422716,0.0435668192610594,0.0460438847500474,0.0484415545052407,0.0508039225309999,0.0531852375642895,0.0556469736621194,0.0582539468743659,0.0610694440316023,0.0641492850889093,0.0675348095869117,0.0712451420777093,0.0752698128492905,0.0795636374776064,0.0840460870116161,0.0886065868477193,0.0931152728196424,0.097436692815716,0.101443110451299,0.105024910606696,0.108097273867193,0.110603525948291,0.112515877824602,0.113834054290717,0.114582176500992,0.114804433460968,0.114560320775369,0.113920213195428,0.112961677971017,0.111766452052585,0.1104177482942,0.108997707407521,0.107585206582271,0.106254448586631,0.105074460818783,0.104108931896927,0.103415179632419,0.103040923420831,0.103018007293712,0.103353057139428,0.104015987465594,0.104928133739159,0.105952437353606,0.106888328103055],[0.019666951979124,0.0227416870288337,0.025831592457205,0.028883589112671,0.0318542158157446,0.03471116976732,0.0374346055545078,0.0400181407658664,0.042469482036325,0.0448105437488646,0.0470769076000458,0.0493164881911971,0.0515873300079298,0.0539545341813076,0.0564863485347136,0.0592494193188867,0.0623031293376317,0.065692947907517,0.0694429483429996,0.0735482048684632,0.0779685686437122,0.0826259658085248,0.0874072786352219,0.0921736423997681,0.0967748746785305,0.101065832101195,0.104920958426604,0.108244393823937,0.110974840307782,0.113085776518446,0.114582176500992,0.115494941123467,0.115874222037705,0.115782796235101,0.115290479775698,0.114470159706538,0.113395492262777,0.112139897763217,0.110776336371563,0.109377438541185,0.108015694584173,0.10676338545883,0.10569173692745,0.
104868571561808,0.104353732351723,0.104191849914758,0.104402571445975,0.10496906278508,0.105826333685744,0.106851565637909,0.107858882603496],[0.0193177557384191,0.0222901364657058,0.0252727742295653,0.0282131669338688,0.0310683701757549,0.0338067269710181,0.0364093221534189,0.0388710841047306,0.0412014299600058,0.0434243246084322,0.0455776199433563,0.0477115747013433,0.0498865216933156,0.0521697166573959,0.0546314284187607,0.0573402919545905,0.0603578749677877,0.0637323921756593,0.067491651440042,0.0716357088710055,0.0761303232849857,0.080902954094887,0.0858433748977925,0.0908105091740811,0.0956455762095412,0.100189455009278,0.104300434195248,0.10786831546042,0.110822298930238,0.11313222591449,0.114804433460968,0.115874222037705,0.116396969505733,0.116439605607111,0.116073681660397,0.115370703138586,0.114399832477675,0.113227633932602,0.111919278851983,0.11054051606788,0.109159649703249,0.107848711631506,0.106683010165943,0.105738359857973,0.105085579561552,0.104782242435146,0.104862115769381,0.105323231845509,0.106116085261734,0.107133962584367,0.108207633789301],[0.0188253291304183,0.0216749595037037,0.0245364639953456,0.0273579821762354,0.0300969734622301,0.0327221039924163,0.0352148242199604,0.0375705432865608,0.0397992897434228,0.0419257373646328,0.0439884858786634,0.0460385277666891,0.0481368933504082,0.0503515180409112,0.0527533869635769,0.0554119769009098,0.0583899685967409,0.0617372112051954,0.0654840517630202,0.0696344276166984,0.0741595405776082,0.0789934166472068,0.0840320495417437,0.08913783903358,0.0941502865248955,0.0989022326285972,0.103238788654741,0.107034659498267,0.110205812197299,0.112713393545381,0.114560320775369,0.115782796235101,0.116439605607111,0.116601695766022,0.116343710815491,0.115738304031513,0.114853358010791,0.113751798687446,0.1124934301592,0.111138036961544,0.109748815904429,0.108395027857676,0.107152714978651,0.106102529495525,0.105324178609466,0.104887590576054,0.104841501813262,0.105200685521992,0.105933497377517,0.10695177152
194,0.108105193171818],[0.0182494927870589,0.0209518737204194,0.0236756287150415,0.0263694175927364,0.0289908442452867,0.0315084612945201,0.0339034295561093,0.0361707306601078,0.0383198276984895,0.0403746734207767,0.0423729877906212,0.0443647670349988,0.0464100300839153,0.0485758316043286,0.050932558636991,0.0535494934426646,0.0564896130930861,0.0598036597815166,0.0635236807961097,0.0676564827666857,0.0721777216884198,0.0770276203127833,0.0821095463954886,0.0872928194407002,0.0924209006149407,0.097325211198836,0.101843106124674,0.105836563213491,0.109207117067596,0.111903411034972,0.113920213195428,0.115290479775698,0.116073681660397,0.116343710815491,0.11617874694784,0.115654261082425,0.114839384688616,0.113796324023148,0.112582235147372,0.111252805074603,0.109866583642804,0.108488883731186,0.107193929179962,0.106064024350804,0.105184925339355,0.104637267764746,0.10448470451514,0.104760160903368,0.105452201444788,0.106493830431803,0.107755996492083],[0.0176634036179944,0.0201916473732674,0.0227594766668228,0.0253158112109128,0.0278180542076627,0.0302341663872877,0.0325443539260933,0.034742269401235,0.0368356380231976,0.0388462433236286,0.0408092384462426,0.04277178250366,0.0447910174000128,0.0469313817098358,0.0492612081923167,0.0518485076341788,0.0547558642479761,0.0580345032357235,0.0617178308359329,0.0658150176284293,0.070305399846015,0.0751345617927553,0.0802129743792947,0.0854180751676755,0.0906006525186964,0.0955960861991019,0.10024003194464,0.104386463396965,0.107924291723597,0.110788305687717,0.112961677971017,0.114470159706538,0.115370703138586,0.115738304031513,0.115654261082425,0.115197657606072,0.114440579543786,0.113446797531765,0.112273291906639,0.110973839594665,0.109603724990492,0.108224442207695,0.106907088500316,0.105733113809483,0.104791298065786,0.10417035678817,0.103947420975635,0.104173656593322,0.104859223430166,0.105960334589026,0.107371136840097],[0.0171456377144425,0.0194725899040552,0.0218662830175306,0.0242755760487848,0.0266573229710324
,0.0289784558548719,0.0312176325491698,0.0333663571800869,0.035429515100045,0.0374253059640954,0.0393846008910634,0.0413497730546252,0.043373031408999,0.0455142129926303,0.0478378833053571,0.0504095219009486,0.0532906154317949,0.0565326927574597,0.060170681698369,0.0642163197260308,0.0686525563853742,0.0734298575278577,0.0784651138009087,0.0836436352676376,0.0888246222175502,0.0938504714530358,0.098559962311594,0.102804375737443,0.106464007765531,0.109461229512625,0.111766452052585,0.113395492262777,0.114399832477675,0.114853358010791,0.114839384688616,0.114440579543786,0.113732820598261,0.112782915020332,0.111649540448044,0.110386561338576,0.109047759328712,0.107691903485977,0.106386972828554,0.105212260134586,0.104257101912314,0.10361523764427,0.103374465910063,0.103602363239967,0.104330125041191,0.105537609543634,0.107142928208123],[0.0167714924650897,0.0188723116905523,0.0210755270415093,0.0233296661615941,0.0255907851168261,0.0278244848982743,0.0300074319152829,0.0321283085190226,0.0341881768360146,0.0362003045305237,0.038189555773451,0.0401914639491365,0.0422510429655798,0.0444212530086321,0.0467608548245097,0.0493312595848625,0.0521920250661299,0.0553949301746461,0.0589770306205514,0.0629535863969765,0.0673120344566068,0.072008113536416,0.0769648698729929,0.0820748008423383,0.0872050955894278,0.0922059136606917,0.0969216927634622,0.101205133078558,0.104932444018865,0.108016993157218,0.1104177482942,0.112139897763217,0.113227633932602,0.113751798687446,0.113796324023148,0.113446797531765,0.112782915020332,0.111875121276362,0.110784869490238,0.109567564507343,0.108277140617219,0.106971198287026,0.10571561697311,0.104587524853554,0.103675434780255,0.103075372474765,0.10288218522253,0.103176151056989,0.104006449881658,0.10537453878966,0.107221272697958],[0.0166040332006384,0.0184592270070578,0.0204597776663962,0.022553794222206,0.0246965027479584,0.0268521019279102,0.0289950491331545,0.0311107307504811,0.0331955548265636,0.0352565956086472,0.0373109890085399,0.03
93852824208924,0.041514840834889,0.0437431948848057,0.0461209398623877,0.0487035830371259,0.0515477453532323,0.0547054523864568,0.0582168546282454,0.0621023879018121,0.0663558236564226,0.0709396406065027,0.0757836661338225,0.0807872298262874,0.0858245059766227,0.0907525481483566,0.0954216709187161,0.0996879074202355,0.103426784755773,0.106546529565121,0.108997707407521,0.110776336371563,0.111919278851983,0.1124934301592,0.112582235147372,0.112273291906639,0.111649540448044,0.110784869490238,0.10974373936335,0.108583802309521,0.107360319768912,0.106131218338772,0.104961746361344,0.103927768433203,0.103116685438074,0.102624858350806,0.102550506302654,0.102981682926133,0.103980257251917,0.105564530119846,0.107694434575614],[0.0166856083812501,0.0182843430388284,0.0200767218002543,0.0220106712168221,0.0240408889623787,0.0261304237510619,0.0282515992223047,0.0303862751998386,0.0325255539276368,0.0346691587521296,0.0368247986457371,0.0390078286098411,0.0412413717499742,0.0435567709287507,0.0459938501190088,0.0486001403838893,0.0514281618677676,0.0545301977218941,0.0579507298666269,0.061717597256679,0.0658336118704106,0.0702704963000985,0.0749665006787499,0.0798281487438137,0.0847356952841487,0.0895514382434187,0.0941300950979937,0.0983307132578995,0.102029527482395,0.105132493899998,0.107585206582271,0.109377438541185,0.11054051606788,0.111138036961544,0.111252805074603,0.110973839594665,0.110386561338576,0.109567564507343,0.108583802309521,0.107495079258876,0.10635843647708,0.105233110398018,0.104185005564433,0.10328984449144,0.102634201910419,0.102313510365311,0.102426061550053,0.103062381590745,0.104290400275489,0.106138458121161,0.108579766828597],[0.0170309009009039,0.0183741443774614,0.019961924623402,0.0217426696601811,0.0236712848970544,0.0257103226997689,0.0278303977209859,0.0300098953209992,0.0322341671793313,0.0344945578485685,0.0367877087338195,0.0391155741302234,0.0414863944493764,0.0439164860155415,0.0464321946920124,0.0490708988766152,0.051879779055568,0.05
49113968697175,0.0582159700752847,0.0618313585729791,0.0657727555273533,0.0700244602327785,0.0745356841725799,0.0792212833346261,0.0839671130397174,0.0886389000624342,0.093093363656403,0.0971905928021063,0.100806902172034,0.103847111519071,0.106254448586631,0.108015694584173,0.109159649703249,0.109748815904429,0.109866583642804,0.109603724990492,0.109047759328712,0.108277140617219,0.107360319768912,0.10635843647708,0.105329925935416,0.104335476670581,0.103442203722576,0.102726300320326,0.102273609833579,0.102177504351418,0.102533352112481,0.103429024147376,0.104931622509005,0.10707193345879,0.10982959053582],[0.0176227077845935,0.0187255391079382,0.0201230862453322,0.0217654972684213,0.023609124657479,0.0256171332160939,0.0277592460923794,0.0300107465165768,0.032351035192074,0.0347622238454742,0.0372283639033316,0.0397358852831468,0.042275579018315,0.0448459744950298,0.0474573139732284,0.0501347147520288,0.0529188175908524,0.0558625014438278,0.0590231699298102,0.0624514729788767,0.0661786504479392,0.0702054224577551,0.0744951355422329,0.0789727505795509,0.0835297202332216,0.0880335342703454,0.0923401727443969,0.0963078613964445,0.0998109018765983,0.102752419483958,0.105074460818783,0.10676338545883,0.107848711631506,0.108395027857676,0.108488883731186,0.108224442207695,0.107691903485977,0.106971198287026,0.106131218338772,0.105233110398018,0.104335476670581,0.103499551655562,0.102793080777742,0.102292254661182,0.102081401508485,0.102250190761531,0.102888025057679,0.104075363901974,0.10587217683983,0.108304635908844,0.111352283115469],[0.0184112786443383,0.0193036144352677,0.0205364563355298,0.0220634583583041,0.0238441881948054,0.0258440117422347,0.0280330007461299,0.0303840689693109,0.0328707640014087,0.0354653549447156,0.038137988574964,0.0408576437369137,0.0435953084654497,0.0463292085754549,0.0490511191524726,0.051772022780174,0.0545249656843273,0.057363196237199,0.0603526522165691,0.0635594121154091,0.0670343867389329,0.0707986936857298,0.0748332970184869,0.079
0754419614093,0.0834225616148196,0.0877424884744388,0.091887716123396,0.0957113408970778,0.0990827757904326,0.101901732035687,0.104108931896927,0.10569173692745,0.106683010165943,0.107152714978651,0.107193929179962,0.106907088500316,0.106386972828554,0.10571561697311,0.104961746361344,0.104185005564433,0.103442203722576,0.102793080777742,0.102304063161057,0.102049436620413,0.102109946452767,0.10256902067649,0.103506783839688,0.104992022722983,0.107072464157825,0.109764198317764,0.113041726217119],[0.0193172496887705,0.0200423581397086,0.0211456469793341,0.0225866023071851,0.0243302842342403,0.0263463244850819,0.0286068255988149,0.0310834720514554,0.0337444298930393,0.0365518603713253,0.0394610180011997,0.0424218328801814,0.0453834893143447,0.0483017799036967,0.0511480492320706,0.0539176208888705,0.0566350945847835,0.0593541128195264,0.0621502240248153,0.0651071451363291,0.0682986586983935,0.0717699947986146,0.0755231887018097,0.0795101149518328,0.0836348243437206,0.0877643064447572,0.0917449588334424,0.0954214646357943,0.0986552350119151,0.101340326749818,0.103415179632419,0.104868571561808,0.105738359857973,0.106102529495525,0.106064024350804,0.105733113809483,0.105212260134586,0.104587524853554,0.103927768433203,0.10328984449144,0.102726300320326,0.102292254661182,0.102049436620413,0.102066807188804,0.102418123127492,0.10317716528555,0.104411357542059,0.106174391548663,0.10849840217679,0.11138628362159,0.114804908845702],[0.0202373545430986,0.0208478281555037,0.0218625706246616,0.0232496861286319,0.0249824555037043,0.0270372918356324,0.0293904679760304,0.0320140457697783,0.0348717546066247,0.0379158575355582,0.0410862019494618,0.0443125440979638,0.0475207426909342,0.0506425125125869,0.0536272691498109,0.0564535281944212,0.0591367698553657,0.0617309347674406,0.0643218025811392,0.0670122289257222,0.0699012934552287,0.0730614264944729,0.0765188450798398,0.0802423419962743,0.0841433537785074,0.0880870380626791,0.0919112734279505,0.0954491972577069,0.0985512045964603,0
.101103449998262,0.103040923420831,0.104353732351723,0.105085579561552,0.105324178609466,0.105184925339355,0.104791298065786,0.104257101912314,0.103675434780255,0.103116685438074,0.102634201910419,0.102273609833579,0.102081401508485,0.102109946452767,0.102418123127492,0.103068252153533,0.104120637527244,0.105627032920497,0.107624065572147,0.110127276876865,0.113126101434321,0.116579914741623],[0.0210516141248012,0.021602876289728,0.0225699549560327,0.0239327005742675,0.0256757120523867,0.0277851146244285,0.0302439668339692,0.0330268552853245,0.0360946142218224,0.0393904494187039,0.0428389243668108,0.046349115163758,0.0498226133977412,0.0531659366943929,0.0563055137478691,0.0592021990009788,0.061861739126727,0.0643380045051197,0.0667269834018504,0.0691512267404509,0.0717364804592963,0.0745845358079433,0.0777482904039718,0.081215495753131,0.0849057710511616,0.0886816214752877,0.0923701939418145,0.0957901778901475,0.0987782095222593,0.101210604058146,0.103018007293712,0.104191849914758,0.104782242435146,0.104887590576054,0.104637267764746,0.10417035678817,0.10361523764427,0.103075372474765,0.102624858350806,0.102313510365311,0.102177504351418,0.102250190761531,0.10256902067649,0.10317716528555,0.104120637527244,0.105442771078341,0.107177942564458,0.109345907566447,0.111947428967978,0.114961243178256,0.118342017148305],[0.0216308185565193,0.0221726533066038,0.0231250360536958,0.0244828996383591,0.0262455376714748,0.0284120989309966,0.0309755461573968,0.0339157681373235,0.0371929840285189,0.040742988478514,0.0444759976719848,0.0482806604983484,0.0520340121657551,0.0556167609033422,0.0589316217540477,0.0619210517798117,0.064580298867567,0.0669623185873697,0.0691724614632012,0.0713524169495847,0.0736547254506154,0.0762115637406577,0.0791041891881894,0.0823409428008623,0.0858503692034513,0.0894916606878204,0.0930792672556463,0.0964148164781206,0.0993188002335625,0.101656224873726,0.103353057139428,0.104402571445975,0.104862115769381,0.104841501813262,0.10448470451514,0.1039
47420975635,0.103374465910063,0.10288218522253,0.102550506302654,0.102426061550053,0.102533352112481,0.102888025057679,0.103506783839688,0.104411357542059,0.105627032920497,0.107177942564458,0.109081468818869,0.111343399732056,0.113954504237764,0.116888383585565,0.120100020384222],[0.0218436758463872,0.0224106243518809,0.0233645023153083,0.0247187370588077,0.0264908882983998,0.0286967926700521,0.0313429736802349,0.0344181265782517,0.0378850222502866,0.0416746727400368,0.0456848692074931,0.0497849724381056,0.0538278673371518,0.0576682820873136,0.0611846483990364,0.0643001544683514,0.0669983577770905,0.0693297398868033,0.0714071769938292,0.0733897236751991,0.0754555175425166,0.0777669010031268,0.0804342146468295,0.0834874856536821,0.0868648084228072,0.0904215758088382,0.0939578908043491,0.0972560718556443,0.10011850049598,0.102397931691148,0.104015987465594,0.10496906278508,0.105323231845509,0.105200685521992,0.104760160903368,0.104173656593322,0.103602363239967,0.103176151056989,0.102981682926133,0.103062381590745,0.103429024147376,0.104075363901974,0.104992022722983,0.106174391548663,0.107624065572147,0.109345907566447,0.111343399732056,0.113614179481031,0.11614652614003,0.118916692761225,0.121886586438237],[0.0215636271123677,0.0221653460138802,0.023111210667939,0.0244365136706508,0.0261807728004444,0.028380506127086,0.0310600244238471,0.0342211476124548,0.0378333851627186,0.0418267364762657,0.0460896282272149,0.0504742642018614,0.0548105000575356,0.0589272621375833,0.062678081017518,0.0659655850158709,0.068759722155404,0.0711059789698952,0.0731217847031828,0.0749805227340939,0.0768834080097065,0.079021472046981,0.0815338496952199,0.0844727420062443,0.0877862633440928,0.0913257358299202,0.0948756865613284,0.0981973626594604,0.101073565803754,0.103344400801675,0.104928133739159,0.105826333685744,0.106116085261734,0.105933497377517,0.105452201444788,0.104859223430166,0.104330125041191,0.104006449881658,0.103980257251917,0.104290400275489,0.104931622509005,0.105872176
83983,0.107072464157825,0.10849840217679,0.110127276876865,0.111947428967978,0.113954504237764,0.11614652614003,0.11851894361148,0.12105993339361,0.123745862555688],[0.0206756173712682,0.0212884113617055,0.0221832362790041,0.0234204805254652,0.0250653643770753,0.0271793781075005,0.0298094247004417,0.0329756318229061,0.0366595435632467,0.0407951715946082,0.0452658894575249,0.0499099438517527,0.0545360015284652,0.0589476194755151,0.0629725651941309,0.0664909288536984,0.0694561217415781,0.0719049280351492,0.0739551043288354,0.0757900605957289,0.0776303138573719,0.0796929205297642,0.0821445328592555,0.0850593943383111,0.0883959647032726,0.0920015175738187,0.0956443435828908,0.0990635044577494,0.102021357898257,0.104345513394756,0.105952437353606,0.106851565637909,0.107133962584367,0.10695177152194,0.106493830431803,0.105960334589026,0.105537609543634,0.10537453878966,0.105564530119846,0.106138458121161,0.10707193345879,0.108304635908844,0.109764198317764,0.11138628362159,0.113126101434321,0.114961243178256,0.116888383585565,0.118916692761225,0.12105993339361,0.123328316520917,0.125720682867589],[0.0190828850096431,0.0196435736264059,0.0204052535287975,0.021456414903528,0.0228917047983721,0.0248021151327367,0.0272624847552774,0.0303172817711952,0.033966503058887,0.0381544909245256,0.0427651789303845,0.0476271646380002,0.052530482460023,0.0572539355838214,0.0615982752030172,0.0654181840902741,0.0686463861879824,0.0713059001829667,0.0735092539496809,0.075444345101303,0.0773460695850045,0.0794537853683119,0.0819594966089163,0.0849588020344645,0.0884207553569314,0.0921890192445248,0.0960157406076288,0.0996175316874479,0.102736190883306,0.105187564744777,0.106888328103055,0.107858882603496,0.108207633789301,0.108105193171818,0.107755996492083,0.107371136840097,0.107142928208123,0.107221272697958,0.107694434575614,0.108579766828597,0.10982959053582,0.111352283115469,0.113041726217119,0.114804908845702,0.116579914741623,0.118342017148305,0.120100020384222,0.121886586438237,0.12
3745862555688,0.125720682867589,0.127840734674854]],"type":"surface","contours":{"z":{"show":true,"usecolormap":true,"highlightcolor":"#ff0000","project":{"z":true}}},"frame":null}],"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.2,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot","plotly_sunburstclick"],"base_url":"https://plot.ly"},"evals":[],"jsHooks":[]}</script>
<p>The next plot, the <a href="https://en.wikipedia.org/wiki/Modes_of_variation">Modes of Variation Plot</a>, shows the modes of variation about the mean for the first two eigenfunctions. The mean function is indicated by the red line. The dark gray shows the variation of the first eigenfunction, and the light gray shows the variation of the second. The plot indicates that all of the subjects begin their careers with similar entry-level wages, start to diverge by their second year, reach peak variation around year eight, and then begin to converge again by year eleven.</p>
<pre class="r"><code>CreateModeOfVarPlot(res_wages, main = "Modes of Variation of Eigenvectors")</code></pre>
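<p>To make the idea behind this kind of plot concrete, here is a minimal base R sketch of the usual definition of a mode of variation: the mean function plus or minus a multiple of the square root of an eigenvalue times the matching eigenfunction. The grid, mean function, eigenfunction and eigenvalue below are made-up stand-ins, not values from the <code>wages</code> fit, and the helper <code>mode_of_variation()</code> is hypothetical rather than part of <code>fdapace</code>.</p>

```r
# Hypothetical inputs: stand-ins for the mean function, the first
# eigenfunction and its eigenvalue that an FPCA fit would supply.
t_grid  <- seq(0, 1, length.out = 51)   # common time grid
mu      <- 1 + 0.5 * t_grid             # illustrative mean function
phi1    <- sqrt(2) * sin(pi * t_grid)   # illustrative eigenfunction (unit L2 norm on [0, 1])
lambda1 <- 0.04                         # illustrative eigenvalue (variance of the first score)

# Mode of variation: mu(t) + alpha * sqrt(lambda) * phi(t)
mode_of_variation <- function(mu, phi, lambda, alpha) {
  mu + alpha * sqrt(lambda) * phi
}

# Evaluate the curve at -2, -1, 0, 1 and 2 standard deviations of the score
alphas <- c(-2, -1, 0, 1, 2)
modes  <- sapply(alphas, function(a) mode_of_variation(mu, phi1, lambda1, a))
colnames(modes) <- paste0("alpha=", alphas)

# One line per value of alpha; the middle line is the mean function itself
matplot(t_grid, modes, type = "l", lty = 1, xlab = "t", ylab = "X(t)")
```

<p>The band between the outermost curves widens wherever the eigenfunction is large in absolute value, which is how a plot of this kind shows when trajectories diverge and when they converge.</p>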
<p><img src="/2021/07/08/exploratory-fda-with-sparse-data/index_files/figure-html/unnamed-chunk-14-1.png" width="672" /></p>
<h3>Some Conclusions</h3>
<p>This short post neither presents a comprehensive view of functional principal components analysis, nor does it provide the last word on the <code>wages</code> data set. Nevertheless, juxtaposing the highly visual but traditional exploratory analysis conducted by the authors of the <code>brolgar</code> package with a basic FPCA look does provide some insight into the promise and pitfalls of using FPCA to explore sparse, longitudinal data sets.</p>
<p>On the promise side:</p>
<ol style="list-style-type: decimal">
<li><p>FDA offers a global perspective that facilitates thinking about an individual subject’s <em>time path</em> as a whole. For the <code>wages</code> data set, we see individual trajectories developing in a wages/time space that can be parsimoniously represented, analyzed and compared.</p></li>
<li><p>FPCA works with very sparse data, does not require the same number of observations for each subject, and does not demand that the observations be taken at the same time points.</p></li>
<li><p>There is really no concept of missing data per se, and no need for data imputation. The amount of information required to represent a subject can vary over a wide range.</p></li>
</ol>
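<p>As a small illustration of points 2 and 3, the sketch below reshapes a long-format data frame into per-subject lists of unequal length using only base R. The data frame <code>obs</code> and its values are invented for the example. The resulting lists have the general shape that <code>fdapace::FPCA()</code> takes in its <code>Ly</code> and <code>Lt</code> arguments, but nothing here calls that function.</p>

```r
# Hypothetical long-format data: one row per observation, with an
# uneven number of rows per subject (names and values are made up).
obs <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  time = c(0.5, 2.1, 4.0, 1.2, 6.3, 0.3, 1.8, 3.9, 7.5),
  y    = c(1.4, 1.7, 2.0, 1.5, 2.2, 1.3, 1.6, 1.9, 2.4)
)

# One list element per subject; elements are allowed to differ in length,
# and the time points need not line up across subjects.
Ly <- split(obs$y, obs$id)      # observed values
Lt <- split(obs$time, obs$id)   # matching observation times

# Number of observations available for each subject
lengths(Ly)
```

<p>Because each subject is simply a pair of vectors, no padding or imputation is needed to bring subjects onto a common grid.</p>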
<p>As to the pitfalls:</p>
<ol style="list-style-type: decimal">
<li><p>As with plain old multivariate PCA, eigenvectors may not have any obvious meaning, and trajectories in an abstract space may be difficult to interpret.</p></li>
<li><p>There is really no way to avoid the missing data <em>daemon</em>. The <code>wages</code> data set shows the sensitivity of FPCA trajectories to both the number and the locations of the observed data points. It is not possible to reconstruct individual trajectories for which there are too few observations.</p></li>
</ol>
<p>That’s all for now. Thank you for following along.</p>
</div>
<script>window.location.href='https://rviews.rstudio.com/2021/07/08/exploratory-fda-with-sparse-data/';</script>
May 2021: "Top 40" New CRAN Packages
https://rviews.rstudio.com/2021/06/24/may-2021-top-40-new-cran-packages/
Thu, 24 Jun 2021 00:00:00 +0000https://rviews.rstudio.com/2021/06/24/may-2021-top-40-new-cran-packages/
<p>Two hundred five packages made it to CRAN in May, but seven were removed before this post went to print. Here are my “Top 40” picks in ten categories: Computational Methods, Data, Genomics, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities, and Visualization.</p>
<h3 id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=madgrad">madgrad</a> v0.1.0: Implements MADGRAD, a Momentumized, Adaptive Dual Averaged Gradient method for stochastic optimization. See <a href="https://arxiv.org/abs/2101.11075">Defazio & Jelassi (2021)</a> for details and <a href="https://cran.r-project.org/web/packages/madgrad/readme/README.html">README</a> to get started.</p>
<p><img src="madgrad.gif" height = "250" width="450"></p>
<p><a href="https://cran.r-project.org/package=TriDimRegression">TriDimRegression</a> v1.0.0.0: Provides functions to fit 2D and 3D transformations using <a href="https://mc-stan.org/">Stan</a>, which return posterior distributions for the fitted parameters. There are vignettes on <a href="https://cran.r-project.org/web/packages/TriDimRegression/vignettes/transformation_matrices.html">Transformation Matrices</a>, <a href="https://cran.r-project.org/web/packages/TriDimRegression/vignettes/calibration.html">Eye Gaze Mapping</a>, and <a href="https://cran.r-project.org/web/packages/TriDimRegression/vignettes/comparing_faces.html">Comparing Faces</a>. See <a href="https://cran.r-project.org/web/packages/TriDimRegression/readme/README.html">README</a> to get started.</p>
<p><img src="Tri.png" height = "300" width="500"></p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=AtmChile">AtmChile</a> v0.1.0: Provides access to air quality and meteorological information from Chile’s National Air Quality System <a href="https://sinca.mma.gob.cl/">(S.I.N.C.A.)</a>. See <a href="https://cran.r-project.org/web/packages/AtmChile/readme/README.html">README</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=basemaps">basemaps</a> v0.0.1: Provides a lightweight interface to access spatial basemaps from open sources such as <a href="https://www.openstreetmap.org/#map=5/38.007/-95.844">OpenStreetMap</a>, <a href="https://www.mapbox.com/">Mapbox</a> and others. See <a href="https://cran.r-project.org/web/packages/basemaps/readme/README.html">README</a> to get started.</p>
<p><img src="basemaps.png" height = "400" width="400"></p>
<p><a href="https://cran.r-project.org/package=causaldata">causaldata</a> v0.1.1: Contains the data sets to run the example problems in the online causal inference textbooks <a href="https://theeffectbook.net/"><em>The Effect</em></a> and <a href="https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/"><em>Causal Inference: What If</em></a> and more.</p>
<p><a href="https://cran.r-project.org/package=exoplanets">exoplanets</a> v0.2.1: Provides access to NASA’s <a href="https://exoplanetarchive.ipac.caltech.edu/index.html">Exoplanet Archive</a>. See the <a href="https://cran.r-project.org/web/packages/exoplanets/vignettes/exoplanets.html">vignette</a> to get started.</p>
<p><img src="exoplanets.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=frenchdata">frenchdata</a> v0.1.1: Provides access to Kenneth French’s <a href="http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html">finance data library</a>. See the <a href="https://cran.r-project.org/web/packages/frenchdata/vignettes/basic_usage.html">vignette</a> for basic usage.</p>
<p><img src="frenchdata.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=tradepolicy">tradepolicy</a> v0.5.0: Provides access to the data sets from <a href="https://zenodo.org/record/4277741#.YNIS_TZKj0o">Yotov et al. (2016)</a> along with an <a href="https://r.tiid.org/R_structural_gravity/">online book</a> containing commentary and the code to recreate the original analysis.</p>
<p><img src="trade.png" height = "300" width="500"></p>
<h3 id="genomics">Genomics</h3>
<p><a href="https://cran.r-project.org/package=artemis">artemis</a> v1.0.7: Provides a modeling framework for the design and analysis of experiments collecting environmental DNA. There is an <a href="https://cran.r-project.org/web/packages/artemis/vignettes/artemis-overview.html">Introduction</a> and also vignettes on <a href="https://cran.r-project.org/web/packages/artemis/vignettes/modeling.html">Modeling eDNA and qPCR Data</a> and <a href="https://cran.r-project.org/web/packages/artemis/vignettes/simulation.html">Simulating eDNA Data</a>.</p>
<p><img src="artemis.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=MAGEE">MAGEE</a> v1.0.0: Provides functions to perform variant set-based main effect tests, gene-environment interaction tests, and joint tests for association, as proposed in <a href="https://onlinelibrary.wiley.com/doi/10.1002/gepi.22351">Wang et al. (2020)</a>. See the <a href="https://cran.r-project.org/web/packages/MAGEE/vignettes/MAGEE.pdf">vignette</a> for details.</p>
<p><a href="https://cran.r-project.org/package=MultIS">MultIS</a> v0.5.1: Implements a bioinformatical approach to detect the multiple integration of viral vectors within the same clone. See the <a href="https://cran.r-project.org/web/packages/MultIS/vignettes/QuickStart.html">vignette</a> for how to use the package.</p>
<p><img src="MultIS.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=TopDom">TopDom</a> v0.10.0: Provides functions to identify topological domains in genomes from Hi-C sequence data as described in <a href="https://academic.oup.com/nar/article/44/7/e70/2467818">Shin et al. (2016)</a>. See <a href="https://cran.r-project.org/web/packages/TopDom/readme/README.html">README</a> to get started.</p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=cjbart">cjbart</a> v0.1.0: Implements a tool for analyzing conjoint experiments using Bayesian Additive Regression Trees (BART), a machine learning method developed by <a href="https://projecteuclid.org/journals/annals-of-applied-statistics/volume-4/issue-1/BART-Bayesian-additive-regression-trees/10.1214/09-AOAS285.full">Chipman & McCulloch (2010)</a>. See the <a href="https://cran.r-project.org/web/packages/cjbart/vignettes/cjbart-demo.html">vignette</a> for examples.</p>
<p><img src="cjbart.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=fastText">fastText</a> v1.0.1: Implements an interface to Facebook’s <a href="https://github.com/facebookresearch/fastText">fastText Library</a>. See <a href="https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00051/43387/Enriching-Word-Vectors-with-Subword-Information">Bojanowski et al. (2017)</a> for a description of the algorithm. There is a <a href="https://cran.r-project.org/web/packages/fastText/vignettes/language_identification.html">Benchmark</a> vignette and an <a href="https://cran.r-project.org/web/packages/fastText/vignettes/the_fastText_R_package.html">Introduction</a>.</p>
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=afdx">afdx</a> v1.1.1: Provides functions to estimate the diagnostic performance (sensitivity, specificity, positive predictive value, negative predictive value) of a diagnostic test when there is no gold standard by estimating the attributable fraction using either a <a href="https://cran.r-project.org/web/packages/afdx/vignettes/af_logit_exponential.html">logit exponential model</a> or a <a href="https://cran.r-project.org/web/packages/afdx/vignettes/latentclass.html">latent class model</a>.</p>
<p><img src="afdx.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=covidcast">covidcast</a> v0.4.2: Provides an interface to Delphi’s <a href="https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html">COVIDcast Epidata</a> including tools for data access, maps and time series plotting, and basic signal processing, and a collection of numerous indicators relevant to the COVID-19 pandemic in the United States. There is a <a href="https://cran.r-project.org/web/packages/covidcast/vignettes/covidcast.html">Getting Started Guide</a>, and vignettes on <a href="https://cran.r-project.org/web/packages/covidcast/vignettes/correlation-utils.html">Computing Signal Correlations</a>, <a href="https://cran.r-project.org/web/packages/covidcast/vignettes/external-data.html">Combining Data Sources</a>, <a href="https://cran.r-project.org/web/packages/covidcast/vignettes/multi-signals.html">Manipulating Multiple Signals</a>, and <a href="https://cran.r-project.org/web/packages/covidcast/vignettes/plotting-signals.html">Plotting and Mapping Signals</a>.</p>
<p><img src="covidcast.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=eventTrack">eventTrack</a> v1.0.0: Implements the hybrid framework for event prediction in clinical trials as described in <a href="https://www.sciencedirect.com/science/article/pii/S155171441100139X?via%3Dihub">Fang & Zheng (2011)</a>. See the <a href="https://cran.r-project.org/web/packages/eventTrack/vignettes/eventTrack.html">vignette</a> for an example.</p>
<p><img src="eventTrack.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=goldilocks">goldilocks</a> v0.3.0: Implements the Goldilocks adaptive trial design for a time to event outcome using a piecewise exponential model and conjugate Gamma prior distributions as described in <a href="https://www.tandfonline.com/doi/abs/10.1080/10543406.2014.888569?journalCode=lbps20">Broglio et al. (2014)</a>. See the <a href="https://cran.r-project.org/web/packages/goldilocks/vignettes/broglio.html">vignette</a> for an example.</p>
<h3 id="science">Science</h3>
<p><a href="https://cran.r-project.org/package=CopernicusDEM">CopernicusDEM</a> v1.0.1: Provides an interface to the <a href="https://spacedata.copernicus.eu/explore-more/news-archive/-/asset_publisher/Ye8egYeRPLEs/blog/id/434960">Copernicus DEM</a> Digital Elevation Model of the European Space Agency at 90 and 30 meter resolution using the <a href="https://aws.amazon.com/cli/">AWS CLI</a> command line tool. See the <a href="https://cran.r-project.org/web/packages/CopernicusDEM/vignettes/Copernicus_Digital_Elevation_Models.html">vignette</a> for an example.</p>
<p><img src="DEM.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=nimbleCarbon">nimbleCarbon</a> v0.1.2: Provides functions and a custom probability distribution for Bayesian analyses of radiocarbon dates within the <code>nimble</code> modeling framework, including a suite of functions for prior and posterior predictive checks for demographic inference as described in <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0251695">Crema & Shoda (2021)</a>. See the <a href="https://cran.r-project.org/web/packages/nimbleCarbon/vignettes/nimble_carbon_vignette.html">Introduction</a>.</p>
<p><img src="nimbleCarbon.png" height = "300" width="500"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=bayesmodels">bayesmodels</a> v0.1.0: Implements a framework to bring a number of Bayesian models into the <code>tidymodels</code> ecosystem. See the <a href="https://cran.r-project.org/web/packages/bayesmodels/vignettes/modeltime-integration.html">vignette</a> for an overview.</p>
<p><img src="bayesmodels.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=div">div</a> v0.3.1: Provides functions to facilitate the analysis of teams in a corporate setting, assess the diversity per grade and job, search for bias and also provides methods to simulate the effects of bias. See <a href="http://www.de-brouwer.com/assets/div/div-white-paper.pdf">De Brouwer (2021)</a> and <a href="https://onlinelibrary.wiley.com/doi/book/10.1002/9781119632757">De Brouwer (2020)</a> for background. Look <a href="http://www.de-brouwer.com/div/">here</a> to get started.</p>
<p><img src="div.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=HotellingEllipse">HotellingEllipse</a> v0.1.1: Provides functions to compute the semi-axes lengths and coordinate points of the Hotelling ellipse. See <a href="https://pubs.rsc.org/en/content/articlelanding/2014/AY/C3AY41907J#!divAbstract">Bro & Smilde (2014)</a> and <a href="https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/full/10.1002/cem.2763">Brereton (2016)</a> for background. Look <a href="https://github.com/ChristianGoueguel/HotellingEllipse">here</a> and at the <a href="https://cran.r-project.org/web/packages/HotellingEllipse/vignettes/HotellingEllipse.html">vignette</a> for examples.</p>
<p><img src="Hottelling.png" height = "300" width="500"></p>
<p><a href="https://CRAN.R-project.org/package=makemyprior">makemyprior</a> v1.0.0: Provides tools to construct and visualize joint priors for variance parameters. Vignettes provide examples for <a href="https://cran.r-project.org/web/packages/makemyprior/vignettes/latin_square.html">Latin Square</a>, <a href="https://cran.r-project.org/web/packages/makemyprior/vignettes/make_prior.html">i.i.d. models</a>, <a href="https://cran.r-project.org/web/packages/makemyprior/vignettes/neonatal_mortality.html">neonatal mortality</a>, and <a href="https://cran.r-project.org/web/packages/makemyprior/vignettes/wheat_breeding.html">wheat breeding</a>.</p>
<p><img src="makemyprior.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=Rage">Rage</a> v1.0.0: Provides functions for calculating life history metrics using matrix population models (MPMs) as described in <a href="https://www.biorxiv.org/content/10.1101/2021.04.26.441330v2">Jones et al. (2021)</a>. There is a <a href="https://cran.r-project.org/web/packages/Rage/vignettes/a01_GettingStarted.html">Getting Started Guide</a> and vignettes on <a href="https://cran.r-project.org/web/packages/Rage/vignettes/a02_VitalRates.html">Vital Rates</a>, <a href="https://cran.r-project.org/web/packages/Rage/vignettes/a03_LifeHistoryTraits.html">Life History Traits</a>, <a href="https://cran.r-project.org/web/packages/Rage/vignettes/a04_AgeFromStage.html">Deriving Age</a>, and <a href="https://cran.r-project.org/web/packages/Rage/vignettes/a05_TernaryPlots.html">Ternary Plots</a>.</p>
<p><img src="Rage.png" height = "300" width="300"></p>
<p><a href="https://cran.r-project.org/package=unusualprofile">unusualprofile</a> v0.1.0: Provides functions to calculate <a href="https://link.springer.com/article/10.1007%2Fs13171-019-00164-5">Mahalanobis distance</a> for every row of a set of outcome variables. There is an <a href="https://cran.r-project.org/web/packages/unusualprofile/vignettes/tutorial_unusualprofile.html">Introduction</a> and a vignette on the <a href="https://cran.r-project.org/web/packages/unusualprofile/vignettes/unusualprofile_calculations.html">calculations</a>.</p>
<p><img src="unusualprofile.png" height = "200" width="400"></p>
<h3 id="time-series">Time Series</h3>
<p><a href="https://cran.r-project.org/package=gsignal">gsignal</a> v0.3-2: Implements the <a href="https://octave.sourceforge.io/packages.php">Octave signal</a> package which provides a variety of signal processing tools, such as signal generation and measurement, correlation and convolution, filtering, filter design, filter analysis and conversion, power spectrum analysis, system identification, decimation and sample rate change, and windowing. See the <a href="https://cran.r-project.org/web/packages/gsignal/vignettes/gsignal.html">vignette</a> for an introduction.</p>
<p><img src="gsignal.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=legion">legion</a> v0.1.0: Provides functions for implementing multivariate state space models, such as Vector Exponential Smoothing and Vector Error-Trend-Seasonal models, for time series analysis and forecasting as described in <a href="https://journals.sagepub.com/doi/10.1177/1471082X0901000401">de Silva et al. (2010)</a>. There is a <a href="https://cran.r-project.org/web/packages/legion/vignettes/legion.html">Function Overview</a> and vignettes on <a href="https://cran.r-project.org/web/packages/legion/vignettes/ves.html">Vector ES</a> and <a href="https://cran.r-project.org/web/packages/legion/vignettes/vets.html">Vector ETS</a>.</p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=parsermd">parsermd</a> v0.1.2: Implements a formal grammar and parser for R Markdown documents using the <a href="https://www.boost.org/doc/libs/1_76_0/libs/spirit/doc/x3/html/index.html">Boost Spirit X3</a> library. It also includes a collection of high level functions for working with the resulting abstract syntax tree. There is a <a href="https://cran.r-project.org/web/packages/parsermd/vignettes/parsermd.html">Getting Started Guide</a> and a vignette on <a href="https://cran.r-project.org/web/packages/parsermd/vignettes/templates.html">Rmd Templates</a>.</p>
<p><a href="https://cran.r-project.org/package=riskmetric">riskmetric</a> v0.1.0: Provides facilities for assessing R packages against a number of metrics to help quantify their robustness. Look <a href="https://pharmar.github.io/riskmetric/">here</a> for background on the package and <a href="https://www.pharmar.org/about/">here</a> for background on the R Consortium’s R Validation Hub project. There is a <a href="https://cran.r-project.org/web/packages/riskmetric/riskmetric.pdf">Quick Start Guide</a> and a vignette on <a href="https://cran.r-project.org/web/packages/riskmetric/vignettes/extending-riskmetric.html">Extending riskmetric</a>.</p>
<p><a href="https://cran.r-project.org/package=shinyvalidate">shinyvalidate</a> v0.1.0: Provides functions to improve the user experience of Shiny apps by providing feedback when required inputs are missing, or input values are not valid. See <a href="https://cran.r-project.org/web/packages/shinyvalidate/readme/README.html">README</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=ttt">ttt</a> v1.0: Provides tools to create structured, formatted HTML tables. See the <a href="https://cran.r-project.org/web/packages/ttt/vignettes/ttt-intro.html">vignette</a>.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=fitbitViz">fitbitViz</a> v1.0.1: Implements a connection to the <a href="https://dev.fitbit.com/build/reference/web-api/">Fitbit Web API</a> to provide <code>ggplot2</code>, <code>Leaflet</code> and <code>Rayshader</code> visualizations. See the <a href="https://cran.r-project.org/web/packages/fitbitViz/vignettes/fitbit_viz.html">vignette</a> for examples.</p>
<p><img src="fitbitViz.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=ggbreak">ggbreak</a> v0.0.3: Implements scale functions for setting axis breaks for <code>ggplot2</code>. See the <a href="https://cran.r-project.org/web/packages/ggbreak/vignettes/ggbreak.html">vignette</a>.</p>
<p><img src="ggbreak.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=ggpp">ggpp</a> v0.4.0: Provides extensions to <code>ggplot2</code> to add inserts to plots using both <em>native</em> and <a href="https://www.christophenicault.com/post/npc_ggplot2/"><em>npc</em></a> data coordinates. See the <a href="https://cran.r-project.org/web/packages/ggpp/vignettes/grammar-extensions.html">vignette</a> for examples.</p>
<p><img src="ggpp.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=ggseg">ggseg</a> v1.6.3: Implements a <code>ggplot</code> geom for plotting brain atlases using simple features. The largest component of the package is the data for two built-in atlases. See <a href="https://journals.sagepub.com/doi/10.1177/2515245920928009">Mowinckel & Vidal-Piñero (2020)</a> for background. There is an <a href="https://cran.r-project.org/web/packages/ggseg/vignettes/ggseg.html">Introduction</a> along with vignettes on <a href="https://cran.r-project.org/web/packages/ggseg/vignettes/externalData.html">external data</a>, <a href="https://cran.r-project.org/web/packages/ggseg/vignettes/freesurfer_files.html">Freesurfer files</a>, and <a href="https://cran.r-project.org/web/packages/ggseg/vignettes/geom-sf.html">using atlases</a>.</p>
<p><img src="ggseg.png" height = "2500" width="550"></p>
<p><a href="https://cran.r-project.org/package=ichimoku">ichimoku</a> v0.2.0: Implements <a href="https://www.investopedia.com/terms/i/ichimokuchart.asp">Ichimoku Kinko Hyo</a>, also commonly known as <a href="https://www.amazon.com/Charts-Trading-Success-Ichimoku-Technique/dp/0956517102">cloud charts</a>, including static and interactive visualizations with tools for creating, backtesting and developing quantitative <em>ichimoku</em> strategies. There is a <a href="https://cran.r-project.org/web/packages/ichimoku/vignettes/reference.html">Reference</a> and a vignette on <a href="https://cran.r-project.org/web/packages/ichimoku/vignettes/strategies.html">Strategies</a>.</p>
<p><img src="ichimoku.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=liminal">liminal</a> v0.1.2: Provides functions for composing interactive visualizations and creating linked interactive graphics for exploratory high-dimensional data analysis. See <a href="https://arxiv.org/abs/2012.06077">Lee et al. (2020)</a> for background. There is a vignette on <a href="https://cran.r-project.org/web/packages/liminal/vignettes/liminal.html">Exploring Non-linear Embeddings</a> and another on the geometry of <a href="https://cran.r-project.org/web/packages/liminal/vignettes/geometry_parameter_space.html">Parameter Space</a>.</p>
<p><img src="liminal.png" height = "200" width="300"></p>
<p><a href="https://cran.r-project.org/package=mipplot">mipplot</a> v0.3.1: Provides generic functions to produce area, bar, box, and line plots of data following the Integrated Assessment Modeling Consortium <a href="https://www.iamconsortium.org/">(IAMC)</a> submission format in order to visualize climate mitigation scenarios. See the <a href="https://cran.r-project.org/web/packages/mipplot/vignettes/mipplot-first-steps.html">vignette</a> for first steps.</p>
<p><img src="mipplot.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=qqboxplot">qqboxplot</a> v0.1.0: Implements Q-Q boxplots as an extension to <code>ggplot2</code>. There is a vignette on <a href="https://cran.r-project.org/web/packages/qqboxplot/vignettes/qqboxplot-basic-usage.html">Basic Usage</a> and another that provides <a href="https://cran.r-project.org/web/packages/qqboxplot/vignettes/qqboxplot-paper-replication.html">Examples</a>.
<img src="qqboxplot.png" height = "300" width="500"></p>
<script>window.location.href='https://rviews.rstudio.com/2021/06/24/may-2021-top-40-new-cran-packages/';</script>
Summer Conferences!
https://rviews.rstudio.com/2021/06/17/summer-conferences/
Thu, 17 Jun 2021 00:00:00 +0000https://rviews.rstudio.com/2021/06/17/summer-conferences/
<p>Summer is here, but it is not too late to sign up for some summer conferences. The following short list promises interesting speakers, a wide range of topics and plenty of R content.</p>
<p><img src="sc.png" height = "300" width="100%"></p>
<p>June (21 - 23) - The <a href="https://psiweb.org/conferences">PSI 2021</a> conference is online and the <a href="https://psiweb.org/conferences/conference-registration">Registration Portal</a> is still open. Keynote speakers <a href="https://www.ft.com/alan-smith">Alan Smith</a> and <a href="https://www.ft.com/ian-bott">Ian Bott</a> from the Financial Times, and <a href="https://www.statcollab.com/people/janet-wittes/">Janet Wittes</a>, President of the WCG Statistics Collaborative, head the program.</p>
<p>June (21 - 24) - The <a href="https://community.amstat.org/biop/events/ncb/index">Nonclinical Biostatistics Conference 2021</a> is virtual. <a href="https://en.wikipedia.org/wiki/Wendy_L._Martinez">Wendy Martinez</a> of the Bureau of Labor Statistics will present <em>A Conversation About Data Ethics</em>, <a href="https://en.wikipedia.org/wiki/Nassim_Nicholas_Taleb">Nassim Taleb</a> of <em>Black Swan</em> fame will deliver the keynote on <em>Statistical Consequences of Fat Tails</em>, and <a href="http://dicook.org/">Di Cook</a> and RStudio’s Carson Sievert will both talk in the Statistical Computational & Visualization Session. <a href="https://community.amstat.org/biop/events/ncb/registration">Registration</a> is open.</p>
<p>July (1 to 2) - <a href="https://r-hta.org/events/workshop/2021/">R for HTA Annual Workshop</a>: This online workshop from the Health Technology Assessment Consortium will focus on R for trial and model-based cost-effectiveness analysis. <a href="https://onlinestore.ucl.ac.uk/conferences-and-events/faculty-of-mathematical-physical-sciences-c06/department-of-statistical-science-f61/f61-workshop-r-for-health-technology-assessment-2021">Registration</a> is open until June 30.</p>
<p>July (5 - 9) - <a href="https://user2021.r-project.org/">useR!2021</a> looks like it is going to be a blockbuster of a conference. The <a href="https://user2021.r-project.org/program/keynotes/">keynote talks</a> alone would be worth the price of admission. This exceptional lineup comprises a remarkably diverse, international group of long-time contributors, new faces, R developers, statisticians, journalists, and educators representing the global R community and speaking on a wide range of topics. I am very pleased to be presenting <em>A little bit about RStudio</em> on July 9 at UTC 9PM. <a href="https://user2021.r-project.org/participation/registration/">Registration</a> closes on June 25.</p>
<p>July (28 - 30) - <a href="https://juliacon.org/2021/">JuliaCon 2021</a> will be online and everywhere and <strong>Free</strong>! Long time R contributor <a href="http://janvitek.org/">Jan Vitek</a>, Xiaoye Li of the Lawrence Berkeley National Laboratory, and Soumith Chintala of Facebook AI Research will be the keynote speakers. It is free but you need to <a href="https://juliacon.org/2021/tickets/">Register</a>.</p>
<p>Aug (8 - 12) - The <a href="https://ww2.amstat.org/meetings/jsm/2021/">JSM</a> will be online. A keyword search using R and Shiny will turn up quite a few interesting talks. I am particularly looking forward to <a href="https://ww2.amstat.org/meetings/jsm/2021/onlineprogram/AbstractDetails.cfm?abstractid=318815">the talk</a> <em>Simulating Clinical Trials Data with Synthetic.Cdisc.Data and Respectables</em> by Gabe Becker and Adrian Waddell and <a href="https://ww2.amstat.org/meetings/jsm/2021/onlineprogram/ActivityDetails.cfm?sessionid=220593">the session</a> on <em>Tools to Enable the Use of R by the Biopharmaceutical Industry in a Regulatory Setting</em>, which contains five talks from members of the R Consortium’s <a href="https://www.pharmar.org/">R Validation Hub</a> working group.</p>
<p>August (24 - 27) - <a href="https://r-medicine.com/">R/Medicine 2021</a> is online and on track to repeat last year’s international success. <a href="https://bit.ly/3zuZPTj">Registration</a> is open. The deadline for submitting <a href="https://r-medicine.com/abstract">Abstracts</a> is June 25. Workshops being planned include:</p>
<ul>
<li>R/Med 101: Intro to R for Clinicians and Healthcare Professionals</li>
<li>R Markdown for Reproducible Research (R<sup>3</sup>)</li>
<li>SAS 2 R: Getting off the Island!</li>
<li>From Excel to R+REDcap</li>
<li>Spatial Analysis of Healthcare Data</li>
</ul>
<p>Sept (6 to 9) - <a href="https://rss.org.uk/training-events/conference2021/">RSS 2021 International Conference</a>: The Royal Statistical Society conference hopes to be in person in Manchester, UK. The <a href="https://rss.org.uk/training-events/conference2021/conference-programme/">keynote speakers</a> will be Tom Chivers and David Chivers, Melinda Mills, Jonty Rougier, Eric Tchetgen Tchetgen, Bin Yu, and <strong>Hadley Wickham</strong>. Submissions for poster presentations are currently open with a deadline of July 1. Registration is open with an early booking discount available until June 4.</p>
<script>window.location.href='https://rviews.rstudio.com/2021/06/17/summer-conferences/';</script>
Functional PCA with R
https://rviews.rstudio.com/2021/06/10/functional-pca-with-r/
Thu, 10 Jun 2021 00:00:00 +0000https://rviews.rstudio.com/2021/06/10/functional-pca-with-r/
<script src="/2021/06/10/functional-pca-with-r/index_files/header-attrs/header-attrs.js"></script>
<p>In two previous posts, <a href="https://rviews.rstudio.com/2021/05/04/functional-data-analysis-in-r/">Introduction to Functional Data Analysis with R</a> and <a href="https://rviews.rstudio.com/2021/05/14/basic-fda-descriptive-statistics-with-r/">Basic FDA Descriptive Statistics with R</a>, I began looking into FDA from a beginner’s perspective. In this post, I would like to continue where I left off and investigate Functional Principal Components Analysis (FPCA), the analog of ordinary Principal Components Analysis in multivariate statistics. I’ll begin with the math, and then show how to compute FPCs with R.</p>
<p>As I have discussed previously, although the theoretical foundations of FDA depend on some pretty advanced mathematics, it is not necessary to master this math to do basic analyses. The R functions in the various packages insulate the user from most of the underlying theory. Nevertheless, attaining a deep understanding of what the R functions are doing, or looking into any of the background references requires some level of comfort with the notation and fundamental mathematical ideas.</p>
<p>I will define some of the basic concepts and then provide a high-level roadmap of the mathematical argument required to develop FPCA from first principles. It is my hope that if you are a total newcomer to Functional Data Analysis you will find this roadmap useful in apprehending the big picture. This synopsis closely follows the presentation by Kokoszka and Reimherr (Reference 1. below).</p>
<p>We are working in <span class="math inline">\(\mathscr{H}\)</span>, a separable <a href="https://en.wikipedia.org/wiki/Hilbert_space#:~:text=A%20Hilbert%20space%20is%20a,of%20calculus%20to%20be%20used.">Hilbert space</a> of square integrable random functions <span class="math inline">\(X(\omega,t)\)</span>, where <span class="math inline">\(\omega \in \Omega\)</span>, the underlying space of probabilistic outcomes, and <span class="math inline">\(t \in [0,1]\)</span>. (After the definitions below, I will suppress the independent variables and in most equations assume <span class="math inline">\(EX = 0\)</span>.)</p>
<div id="definitions" class="section level3">
<h3>Definitions</h3>
<ul>
<li>A Hilbert Space <span class="math inline">\(\mathscr{H}\)</span> is an infinite dimensional vector space with an inner product denoted by <span class="math inline">\(<.,.>\)</span>. In our case, the <em>vectors</em> are functions.</li>
<li><span class="math inline">\(\mathscr{H}\)</span> is separable if there exists a countable orthonormal basis. That is, there is an orthogonal collection of functions <span class="math inline">\((e_i)\)</span> in <span class="math inline">\(\mathscr{H}\)</span> such that <span class="math inline">\(<e_i,e_j>\; = 1\)</span> if <span class="math inline">\(i = j\)</span> and 0 otherwise, and every function in <span class="math inline">\(\mathscr{H}\)</span> can be represented as a (possibly infinite) linear combination of these functions.</li>
<li>The inner product of two functions <span class="math inline">\(X\)</span> and <span class="math inline">\(Y\)</span> in <span class="math inline">\(\mathscr{H}\)</span> is defined as <span class="math inline">\(<X,Y>\; = \int X(\omega,t) Y(\omega,t)dt\)</span>.</li>
<li>The norm of <span class="math inline">\(X\)</span> is defined in terms of the inner product: <span class="math inline">\(\parallel X(\omega) \parallel ^2\; = \int X(\omega, t)^2 dt < \infty\)</span>.</li>
<li><span class="math inline">\(X\)</span> is said to be square integrable if <span class="math inline">\(E\parallel X(\omega) \parallel ^2 < \infty\)</span>.</li>
<li>The <a href="https://math.stackexchange.com/questions/1687111/understanding-the-definition-of-the-covariance-operator">covariance operator</a> <span class="math inline">\(C(y): \mathscr{H} \rightarrow \mathscr{H}\)</span> for any square integrable function <span class="math inline">\(X\)</span> is given by: <span class="math inline">\(C(y) = E[<X - EX,y>(X - EX)]\)</span>.</li>
</ul>
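<p>To make the inner product and orthonormality definitions concrete, here is a minimal base R sketch. It is an illustration only: the sine functions <span class="math inline">\(e_j(t) = \sqrt2 \sin((j - \frac{1}{2})\pi t)\)</span> are chosen as a convenient orthonormal family on <span class="math inline">\([0,1]\)</span>, and the helper names <code>e()</code> and <code>inner()</code> are hypothetical, not part of any package.</p>
<pre class="r"><code># e(j) returns the function e_j(t) = sqrt(2) * sin((j - 1/2) * pi * t)
e <- function(j) function(t) sqrt(2) * sin((j - 0.5) * pi * t)

# inner product <f, g> = integral of f(t) * g(t) over [0, 1], computed numerically
inner <- function(f, g) {
  integrate(function(t) f(t) * g(t), lower = 0, upper = 1)$value
}

# Gram matrix of the first three functions; it should be close to the identity
gram <- outer(1:3, 1:3, Vectorize(function(i, j) inner(e(i), e(j))))
round(gram, 6)</code></pre>
<p>The diagonal entries approximate the squared norms and the off-diagonal entries approximate the inner products between different basis functions.</p>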
</div>
<div id="the-road-to-functional-principal-components" class="section level3">
<h3>The Road to Functional Principal Components</h3>
<p>As we have seen, the fundamental idea of Functional Data Analysis is to represent a function <span class="math inline">\(X\)</span> by a linear combination of basis elements. In the previous posts we showed how to accomplish this using a basis constructed from more or less arbitrarily selected B-spline vectors. But is there an empirical, some would say <em>natural</em>, basis that can be estimated from the data? The answer is yes, and that is what FPCA is all about.</p>
<p>A good way to start is to look at the distance between a vector <span class="math inline">\(X\)</span> and its projection down into the space spanned by some finite, p-dimensional basis <span class="math inline">\((u_k)\)</span> which is expressed in the following equation,</p>
<p><span class="math inline">\(D = E\parallel X - \sum_{k=1}^{p}<X, u_k>u_k\parallel^2\)</span> <span class="math inline">\((*)\)</span></p>
<p>This expands out to:</p>
<p><span class="math inline">\(= E [< X - \sum_{k=1}^{p}<X, u_k>u_k,\; X - \sum_{k=1}^{p}<X, u_k>u_k >]\)</span></p>
<p>and with a little algebra this:</p>
<p><span class="math inline">\(= E\parallel X \parallel^2 - \sum_{k=1}^{p}E<X, u_k>^2\)</span></p>
<p>It should be clear that we would want to find a basis that makes <span class="math inline">\(D\)</span> as small as possible, and that minimizing <span class="math inline">\(D\)</span> is equivalent to maximizing the term to be subtracted in the line above.</p>
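<p>For readers who want the intermediate algebra, here is a sketch of the expansion. It assumes the <span class="math inline">\((u_k)\)</span> are orthonormal, and the shorthand <span class="math inline">\(P\)</span> for the projection is introduced here only for convenience. Write <span class="math inline">\(P = \sum_{k=1}^{p}<X, u_k>u_k\)</span>. Then</p>
<p><span class="math inline">\(\parallel X - P \parallel^2 \;=\; \parallel X \parallel^2 - 2<X, P> + \parallel P \parallel^2\)</span></p>
<p>By linearity of the inner product, <span class="math inline">\(<X, P> \;=\; \sum_{k=1}^{p}<X, u_k>^2\)</span>, and by orthonormality, <span class="math inline">\(\parallel P \parallel^2 \;=\; \sum_{k=1}^{p}\sum_{l=1}^{p}<X, u_k><X, u_l><u_k, u_l> \;=\; \sum_{k=1}^{p}<X, u_k>^2\)</span>. Substituting and taking expectations gives</p>
<p><span class="math inline">\(D = E\parallel X \parallel^2 - 2\sum_{k=1}^{p}E<X, u_k>^2 + \sum_{k=1}^{p}E<X, u_k>^2 \;=\; E\parallel X \parallel^2 - \sum_{k=1}^{p}E<X, u_k>^2\)</span></p>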
<p>A little algebra shows that, <span class="math inline">\(E<X, u_k>^2 \;=\; <C(u_k),u_k>\)</span> where <span class="math inline">\(C(u_k)\)</span> is the covariance operator defined above.</p>
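<p>A sketch of that step, under the standing assumption <span class="math inline">\(EX = 0\)</span> and assuming the expectation can be moved outside the inner product: by the definition of the covariance operator,</p>
<p><span class="math inline">\(<C(u_k), u_k> \;=\; <E[<X, u_k>X],\, u_k> \;=\; E[<X, u_k><X, u_k>] \;=\; E<X, u_k>^2\)</span></p>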
<p>Now, we are almost at our destination. There is a theorem (e.g. Theorem 11.4.1 in reference 1.) that says for any fixed number of basis elements p, the distance D above is minimized if <span class="math inline">\(u_j = v_j\)</span> where the <span class="math inline">\(v_j\)</span> are the eigenfunctions of <span class="math inline">\(C(y)\)</span>, normalized to have unit norm. From this it follows that <span class="math inline">\(E<X, v_j>^2 \;=\; <C(v_j),v_j>\; =\; <\lambda_j v_j, v_j>\; =\; \lambda_j\)</span>.</p>
<p>Going back to equation <span class="math inline">\((*)\)</span>, we can expand <span class="math inline">\(X\)</span> in terms of the basis <span class="math inline">\((v_j)\)</span> so <span class="math inline">\(D = 0\)</span> and we have what is called the <a href="https://en.wikipedia.org/wiki/Karhunen%E2%80%93Lo%C3%A8ve_theorem">Karhunen–Loève</a> expansion: <span class="math inline">\(X = \mu + \sum_{j=1}^{\infty}\xi_jv_j\)</span></p>
<p>where:</p>
<ul>
<li><span class="math inline">\(\mu = EX\)</span><br />
</li>
<li>The deterministic basis functions <span class="math inline">\((v_j)\)</span> are called the <em>functional principal components</em></li>
<li>The <span class="math inline">\((v_j)\)</span> have unit norm and are unique up to their signs. (You can work with <span class="math inline">\(v_j\)</span> or <span class="math inline">\(-v_j\)</span>.)</li>
<li>The eigenvalues are such that: <span class="math inline">\(\lambda_1 > \lambda_2 > \ldots > \lambda_p\)</span></li>
<li>The random variables <span class="math inline">\(\xi_j =\; <X - \mu,v_j>\)</span> are called the <em>scores</em>.</li>
<li><span class="math inline">\(E\xi_j = 0\)</span>, <span class="math inline">\(E\xi_j^2 = \lambda_j\)</span> and <span class="math inline">\(E[\xi_i\xi_j] = 0,\; if\; i\;\neq\;j\)</span>.</li>
</ul>
<p>And finally, with one more line:<br />
<span class="math inline">\(\sum_{j=1}^{\infty}\lambda_j \: = \: \sum_{j=1}^{\infty}E[<X,v_j>^2] = E\sum_{j=1}^{\infty}<X,v_j>^2 \; = \; E\parallel X \parallel^2 \; < \infty\)</span></p>
<p>we arrive at our destination, the variance decomposition:
<span class="math inline">\(E\parallel X - \mu \parallel^2 \;= \;\sum_{j=1}^{\infty}\lambda_j\)</span></p>
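<p>The finite-dimensional analog of this decomposition is easy to check in base R. The following sketch uses ordinary multivariate PCA on simulated data (the matrix <code>X</code> and its column scales are made up for illustration) and confirms that the sum of the eigenvalues of the sample covariance matrix equals the average squared distance of the observations from their mean.</p>
<pre class="r"><code>set.seed(1)
# 200 observations of a 5-dimensional vector with unequal column scales
X <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5) %*% diag(c(3, 2, 1, 0.5, 0.1))

# eigenvalues of the sample covariance matrix (the analog of the lambda_j)
lambda <- eigen(cov(X), symmetric = TRUE)$values

# sample version of E||X - mu||^2, using the same n - 1 divisor as cov()
Xc <- scale(X, center = TRUE, scale = FALSE)
total_var <- sum(Xc^2) / (nrow(X) - 1)

all.equal(sum(lambda), total_var)</code></pre>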
</div>
<div id="lets-calculate" class="section level3">
<h3>Let’s Calculate</h3>
<p>Now that we have enough math to set the context, let’s calculate. We will use the same simulated Brownian motion data that we used in the previous posts, and also construct the same B-spline basis that we used before and save it in the fda object <code>W.obj</code>. I won’t repeat the code here.</p>
<p>The following plot shows <strong>120</strong> simulated curves, each having <strong>1000</strong> points scattered over the interval <strong>[0, 100]</strong>. Each curve has unique observation times over that interval.</p>
<p><img src="/2021/06/10/functional-pca-with-r/index_files/figure-html/unnamed-chunk-1-1.png" width="672" />
For a first attempt at calculating functional principal components we’ll use the <code>pca.fd()</code> function from the <code>fda</code> package. So, setup-wise, we are picking up our calculations exactly where we left off in the previous post. As before, the basis representations of these curves are packed into the fda object <code>W.obj</code>. The function <code>pca.fd()</code> takes <code>W.obj</code> as input. It needs the non-orthogonal B-spline basis to seed its computations and to estimate the covariance matrix and the orthogonal eigenvector basis <span class="math inline">\(v_j\)</span>. The <code>nharm = 5</code> parameter requests the first 5 eigenfunctions (harmonics).</p>
<p>The method of calculation roughly follows the theory outlined above. It starts with a basis representation of the functions, computes the covariance matrix, and calculates the eigenfunctions.</p>
<pre class="r"><code>fun_pca <- pca.fd(W.obj, nharm = 5)</code></pre>
<p>The object produced by <code>pca.fd()</code> is fairly complicated. For example, <code>fun_pca$harmonics</code> does not contain the eigenvectors themselves, but rather coefficients that enable the eigenvectors to be computed from the original basis. However, because <code>fun_pca$harmonics</code> is itself a functional data object with its own <code>plot()</code> method, it is easy to plot the eigenvectors.</p>
<pre class="r"><code>plot(fun_pca$harmonics, lwd = 3)</code></pre>
<p><img src="/2021/06/10/functional-pca-with-r/index_files/figure-html/unnamed-chunk-3-1.png" width="672" /></p>
<pre><code>## [1] "done"</code></pre>
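<p>If you want the values of the eigenfunctions rather than a plot, <code>fda::eval.fd()</code> evaluates a functional data object on a grid. The sketch below is self-contained and therefore does not use <code>W.obj</code>: the simulated random-walk curves, the 15-function B-spline basis and all object names are assumptions made for illustration.</p>
<pre class="r"><code>library(fda)

set.seed(1)
n_curves <- 40
tt <- seq(0, 1, length.out = 101)

# random-walk approximation to Brownian motion; each column is one curve
incr <- matrix(rnorm(length(tt) * n_curves, sd = sqrt(tt[2] - tt[1])),
               nrow = length(tt), ncol = n_curves)
W <- apply(incr, 2, cumsum)

# represent the curves in a B-spline basis, then compute 3 harmonics
basis <- create.bspline.basis(rangeval = c(0, 1), nbasis = 15)
W_fd <- smooth.basis(argvals = tt, y = W, fdParobj = basis)$fd
pca <- pca.fd(W_fd, nharm = 3)

# evaluate the harmonics on the grid: one column per eigenfunction
v_hat <- eval.fd(tt, pca$harmonics)

# Riemann-sum approximation to the inner products between the eigenfunctions
G <- crossprod(v_hat) * (tt[2] - tt[1])
round(G, 2)</code></pre>
<p>If the harmonics are orthonormal, <code>G</code> should be close to the identity matrix, up to the error of the crude Riemann sum.</p>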
<p>It is also easy to obtain the eigenvalues <span class="math inline">\(\lambda_j\)</span>,</p>
<pre class="r"><code>fun_pca$values</code></pre>
<pre><code>## [1] 37.232207 3.724524 1.703604 0.763120 0.547976 0.389431 0.196101
## [8] 0.163289 0.144052 0.116587 0.089307 0.057999 0.054246 0.050683
## [15] 0.042738 0.035107 0.031283 0.024905 0.019079 0.016428 0.011657
## [22] 0.007392 0.002664</code></pre>
<p>and, the proportion of the variance explained by each eigenvalue.</p>
<pre class="r"><code>fun_pca$varprop</code></pre>
<pre><code>## [1] 0.81965 0.08199 0.03750 0.01680 0.01206</code></pre>
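<p>The two outputs above appear to be related in the obvious way: each proportion is an eigenvalue divided by the sum of all of the eigenvalues. The following base R sketch re-enters the printed eigenvalues by hand (so it carries only the six decimal places shown) and reproduces the first five proportions.</p>
<pre class="r"><code># eigenvalues copied from the printed output of fun_pca$values
values <- c(37.232207, 3.724524, 1.703604, 0.763120, 0.547976, 0.389431,
            0.196101, 0.163289, 0.144052, 0.116587, 0.089307, 0.057999,
            0.054246, 0.050683, 0.042738, 0.035107, 0.031283, 0.024905,
            0.019079, 0.016428, 0.011657, 0.007392, 0.002664)

# proportion of total variance carried by each eigenvalue
varprop <- values / sum(values)
round(varprop[1:5], 5)

# cumulative proportion explained by the first five components
round(cumsum(varprop)[1:5], 5)</code></pre>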
</div>
<div id="a-different-approach" class="section level3">
<h3>A Different Approach</h3>
<p>So far in this short series of FDA posts, I have been mostly using the <code>fda</code> package to calculate. In 2003 when it was released, it was groundbreaking work. It is still the package that you are most likely to find when doing internet searches, and is the foundation for many subsequent R packages. However, as the <a href="https://cran.r-project.org/web/views/FunctionalData.html">CRAN Task View</a> on Functional Data Analysis indicates, new work in FDA has resulted in several new R packages. The more recent <a href="https://cran.r-project.org/package=fdapace"><code>fdapace</code></a> takes a different approach to calculating principal components. The package takes its name from the <strong>(PACE)</strong> Principal Analysis by Conditional Expectation algorithm described in the paper by Yao, Müller and Wang (Reference 4. below). The package <a href="https://cran.r-project.org/web/packages/fdapace/vignettes/fdapaceVig.html">vignette</a> is exemplary. It describes the methods of calculation, develops clear examples and provides a list of references to guide your reading about PACE and FDA in general.</p>
<p>A very notable feature of the PACE algorithm is that it is designed specifically to work with sparse data. The vignette describes the two different methods of calculation that package functions employ for sparse and non-sparse data. In this post we are not working with sparse data, but I hope to do so in the future. See the vignette for examples of FPCA with sparse data.</p>
<pre class="r"><code>library(fdapace)</code></pre>
<p>The <code>fdapace</code> package requires data for the functions (curves) and associated times be organized in lists. We begin by using the <code>fdapace::CheckData()</code> function to check the data set up in the tibble <code>df</code>. (See previous post on descriptive statistics for the details on the data construction.)</p>
<pre class="r"><code>CheckData(df$Curve,df$Time)</code></pre>
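<p>If your data start out in long format rather than as lists, <code>fdapace::MakeFPCAInputs()</code> can build the required lists. The following is a minimal sketch, not the construction used for <code>df</code>: the simulated curves, the data frame <code>long</code> and its column names are assumptions made for illustration.</p>
<pre class="r"><code>library(fdapace)

set.seed(42)
pts <- seq(0, 1, length.out = 50)

# 30 simulated curves; each row of W_sim is one curve observed at pts
W_sim <- Wiener(n = 30, pts = pts)

# long format: one row per (curve id, time, value) triple
long <- data.frame(
  id = rep(1:30, each = length(pts)),
  t  = rep(pts, times = 30),
  y  = as.vector(t(W_sim))
)

# split into per-curve lists of values (Ly) and times (Lt)
inp <- MakeFPCAInputs(IDs = long$id, tVec = long$t, yVec = long$y)
CheckData(inp$Ly, inp$Lt)

# the curves share a regular grid, so we ask for the dense-data method
fit <- FPCA(inp$Ly, inp$Lt, optns = list(dataType = "Dense"))
fit$lambda</code></pre>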
<p>No error message is generated, so we move on to having the <code>FPCA()</code> function calculate the FPCA outputs, including:</p>
<pre class="r"><code>W_fpca <- FPCA(df$Curve,df$Time)</code></pre>
<p>the eigenvalues:</p>
<pre class="r"><code>W_fpca$lambda</code></pre>
<pre><code>## [1] 37.5386 3.2098 1.0210 0.3533</code></pre>
<p>the cumulative fraction of variance explained by the eigenvalues:</p>
<pre class="r"><code>W_fpca$cumFVE</code></pre>
<pre><code>## [1] 0.8875 0.9634 0.9875 0.9959</code></pre>
<p>and the scores:</p>
<pre class="r"><code>head(W_fpca$xiEst)</code></pre>
<pre><code>## [,1] [,2] [,3] [,4]
## [1,] 0.8187 1.1971 -1.77755 0.20087
## [2,] -8.7396 2.4165 -0.33768 -0.10565
## [3,] -2.7517 -0.4879 -0.06747 -0.13953
## [4,] 3.9218 0.9419 0.39098 -0.25449
## [5,] -1.4400 0.7691 2.11549 -0.59947
## [6,] 7.3952 0.5114 -2.16391 0.05199</code></pre>
<p>All of these are in fairly good agreement with what we computed above. I am, however, a little surprised by the discrepancy in the value of the second eigenvalue. The default plot method for <code>FPCA()</code> produces a plot indicating the density of the data, a plot of the mean of the functions reconstructed from the eigenfunction expansion, a scree plot of the eigenvalues and a plot of the first three eigenfunctions.</p>
<pre class="r"><code>plot(W_fpca)</code></pre>
<p><img src="/2021/06/10/functional-pca-with-r/index_files/figure-html/unnamed-chunk-12-1.png" width="672" /></p>
<p>Finally, it has probably already occurred to you that if you know the eigenvalues and scores, the Karhunen–Loève expansion can be used to simulate random functions. It can be shown that for the Wiener process:</p>
<p><span class="math inline">\(v_j(t) = \sqrt2 \sin(( j - \frac{1}{2})\pi t)\)</span> and <span class="math inline">\(\lambda_j = \frac{1}{(j - \frac{1}{2})^2\pi^2}\)</span></p>
<p>This gives us:</p>
<p><span class="math inline">\(W(t) = \sum_{j=1}^{\infty} \frac {\sqrt2}{(j - \frac{1}{2})\pi} N_j \sin(( j - \frac{1}{2})\pi t)\)</span></p>
<p>where the <span class="math inline">\(N_j \:are\; iid \;N(0,1)\)</span>.</p>
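<p>To see the expansion at work, here is a base R sketch that truncates the sum at <code>J</code> terms. The truncation level, grid and object names are arbitrary choices for illustration, and this is not necessarily how any package implements the simulation.</p>
<pre class="r"><code>set.seed(2021)
J  <- 200                              # number of terms kept in the expansion
tt <- seq(0, 1, length.out = 101)      # evaluation grid on [0, 1]
j  <- 1:J

# eigenvalues and eigenfunctions of the Wiener process covariance operator
lambda <- 1 / ((j - 0.5)^2 * pi^2)
V <- sqrt(2) * sin(outer(tt, (j - 0.5) * pi))   # column j holds v_j(tt)

# truncated Karhunen-Loeve sum with iid N(0, 1) coefficients N_j
N_j <- rnorm(J)
W_kl <- as.vector(V %*% (sqrt(lambda) * N_j))

# the eigenvalues sum to 1/2 in the limit, matching the integral of t over [0, 1]
sum(lambda)

plot(tt, W_kl, type = "l", xlab = "t", ylab = "W(t)")</code></pre>
<p>Because every <span class="math inline">\(v_j(0) = 0\)</span>, the simulated path starts at zero, and keeping more terms adds finer-scale roughness to the path.</p>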
<p>The <code>fdapace</code> function <code>fdapace::Wiener()</code> uses this information to simulate an alternative, smoothed version of the Brownian motion, Wiener process.</p>
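<pre class="r"><code>library(tibble)
library(ggplot2)
set.seed(123)
# simulate one smoothed Wiener path on 100 equally spaced points in [0, 1]
w <- Wiener(n = 1, pts = seq(0, 1, length = 100))
t <- 1:100
# name the columns explicitly so that aes() finds them in the data frame
df_w <- tibble(t = t, w = as.vector(w))
ggplot(df_w, aes(x = t, y = w)) + geom_line()</code></pre>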
<p><img src="/2021/06/10/functional-pca-with-r/index_files/figure-html/unnamed-chunk-13-1.png" width="576" /></p>
<p>In future posts, I hope to continue exploring the <code>fdapace</code> package, including its ability to work with sparse data.</p>
</div>
<div id="references" class="section level3">
<h3>References</h3>
<p>I found the following references particularly helpful.</p>
<ol style="list-style-type: decimal">
<li>Kokoszka, P. and Reimherr, M. (2017). <a href="https://www.amazon.com/Introduction-Functional-Analysis-Chapman-Statistical-ebook/dp/B075Z9QCV9/ref=sr_1_1?dchild=1&keywords=Introduction+to+functional+data+analysis&qid=1623276309&sr=8-1"><em>Introduction to Functional Data Analysis</em></a>. CRC.</li>
<li>Hsing, T. and Eubank, R. (2015). <a href="https://www.amazon.com/Theoretical-Foundations-Functional-Introduction-Probability/dp/0470016914/ref=sr_1_1?dchild=1&keywords=theoretical+foundations+of+functional+data+analysis&qid=1623276176&sr=8-1"><em>Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators</em></a>. Wiley.</li>
<li>Wang, J., Chiou, J. and Müller, H. (2015). <a href="https://arxiv.org/pdf/1507.05135.pdf"><em>Review of Functional Data Analysis</em></a>.</li>
<li>Yao, F., Müller, H. and Wang, J. (2005). <a href="https://anson.ucdavis.edu/~mueller/jasa03-190final.pdf"><em>Functional Data Analysis for Sparse Longitudinal Data</em></a>. JASA 100(470).</li>
</ol>
</div>
<script>window.location.href='https://rviews.rstudio.com/2021/06/10/functional-pca-with-r/';</script>
R for Public Health
https://rviews.rstudio.com/2021/06/02/r-for-public-health/
Wed, 02 Jun 2021 00:00:00 +0000https://rviews.rstudio.com/2021/06/02/r-for-public-health/
<p>The COVID19 pandemic has raised the profile of public health workers at all levels from the nurses and doctors working on the front lines at our hospitals, to high level state and federal public health officials. I think it’s a good bet that eighteen months ago few of us had any clear idea about how the public health care system works, or thought much about the people charged with the awesome responsibility to keep us safe. We are all a little bit wiser now. It strikes me as obvious that we will have a continuing need to improve our public health systems and that this need will create opportunities for data scientists to make significant contributions, in research, logistics, data management, reporting, public communication and many more areas. The <a href="https://blog.rstudio.com/2021/05/18/managing-covid-vaccine-distribution-with-a-little-help-from-shiny/">recent post</a> by my colleague Jesse Mostipak describes how R and Shiny made a big difference in the nitty gritty work required to roll out vaccine distribution in West Virginia. This four minute video about how the West Virginia Army National Guard built a COVID vaccine inventory management system is inspiring.</p>
<div style="text-align:center">
<iframe width="400" height="250" src="https://www.youtube.com/embed/CYilc-rEgjg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
<p>The following are some R resources that you may find helpful if you are seeking to increase your R skills with an eye toward public health applications. Some may be useful to public health professionals seeking to learn R, others may be interesting to R users who want to investigate data science in a public health context.</p>
<h3 id="courses">Courses</h3>
<ul>
<li><p>The Johns Hopkins Graduate Institute of Epidemiology and Biostatistics has a slew of <a href="https://www.jhsph.edu/departments/epidemiology/continuing-education/graduate-summer-institute-of-epidemiology-and-biostatistics/">summer courses</a> (many of them online), including <a href="https://www.jhsph.edu/departments/epidemiology/continuing-education/graduate-summer-institute-of-epidemiology-and-biostatistics/courses/introduction-to-r-for-public-health-researchers.html">Introduction to R For Public Health Researchers</a>.</p></li>
<li><p>If you are feeling optimistic about traveling and can make your way to Estonia this summer you might consider <a href="http://bendixcarstensen.com/SPE/">Statistical Practice in Epidemiology using R</a>.</p></li>
<li><p>If you hurry, you may be able to get into the Coursera course offered by Imperial College London <a href="https://www.coursera.org/specializations/statistical-analysis-r-public-health">Statistical Analysis with R for Public Health Specialization</a>.</p></li>
<li><p>There are several <a href="https://www.classcentral.com/course/introduction-statistics-data-analysis-pu-13079">Coursera Courses</a> for R in a public health context.</p></li>
<li><p>Frank Harrell’s free online course <a href="https://hbiostat.org/bbr/">Biostatistics for Biomedical Research</a>, available on YouTube: <a href="https://www.youtube.com/channel/UC-o_ZZ0tuFUYn8e8rf-QURA">BBRcourse</a>, is an excellent introduction to the basic statistical concepts underlying all medical applications.</p></li>
</ul>
<h3 id="books">Books</h3>
<p>If you can learn on your own with the help of a good book, here are some ideas.</p>
<p><img src="Handbook.png" height = "300" width="500"></p>
<ul>
<li><p><a href="https://epirhandbook.com/contact-tracing-1.html">The Epidemiologist R Handbook</a>: R for applied epidemiology and public health is a delightful, brief, modern introduction to R that covers the basics of the <a href="https://www.tidyverse.org/">Tidyverse</a>, <a href="https://rmarkdown.rstudio.com/">R Markdown</a>, <a href="https://shiny.rstudio.com/">Shiny</a> and <a href="https://rdatatable.gitlab.io/data.table/">data.table</a>.</p></li>
<li><p><a href="https://web.stanford.edu/class/bios221/book/">Modern Statistics for Modern Biology</a>: This book is focused more on genomics than public health applications, but it is probably the best introductory statistical text available. The modern statistics part in the title means statistics as a data driven, computational based science. Every topic is illustrated with well-crafted R code and visualizations. The book is a great read.</p></li>
<li><p><a href="https://bookdown.org/taragonmd/phds/">Population Health Data Science with R</a>: An introduction to R by authors who see population health as a systems framework for studying and improving the health of populations through collective action and learning.</p></li>
<li><p><a href="https://global.oup.com/academic/product/epidemiology-with-r-9780198841333?cc=us&lang=en&#">Epidemiology with R</a>: This is a new, reasonably priced book that covers the basics. It emphasizes reproducibility with R Markdown. Code examples are written in a base R style that matches the extensively used <a href="https://cran.r-project.org/package=Epi">Epi</a> package.</p></li>
</ul>
<h3 id="r-packages">R Packages</h3>
<p>If working through a book on your own seems too much of a stretch, then digest an R package or two. Here are a few examples of packages on public health themes with sufficient documentation to make interesting self-learning projects.</p>
<ul>
<li><p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5931789/">EpiModel</a>: An R Package for Mathematical Modeling of Infectious Disease over Networks</p></li>
<li><p><a href="https://cran.r-project.org/package=PHEindicatormethods">PHEindicatormethods</a>: Common Public Health Statistics and their Confidence Intervals</p></li>
<li><p><a href="https://www.jstatsoft.org/article/view/v091i12">SimInf</a>: An R Package for Data-Driven Stochastic Disease Simulations</p></li>
<li><p><a href="https://cran.r-project.org/package=SPARSEMODr">SPARSEMODr</a>: Construct spatial, stochastic disease models that show how parameter values fluctuate in response to public health interventions</p></li>
</ul>
<h3 id="shiny">Shiny</h3>
<p>Finally, if you find yourself in a situation similar to the West Virginia Army National Guard team featured in the video, you may just want to teach yourself Shiny. The RStudio <a href="https://shiny.rstudio.com/tutorial/">Shiny Tutorial</a>, along with Hadley Wickham’s book <a href="https://mastering-shiny.org/">Mastering Shiny</a>, is a very good place to start. If you need more structure, you might check out the <a href="https://www.udemy.com/topic/shiny/?utm_source=adwords&utm_medium=udemyads&utm_campaign=DSA_Catchall_la.EN_cc.US&utm_content=deal4584&utm_term=_._ag_95911180068_._ad_436653296108_._kw__._de_c_._dm__._pl__._ti_dsa-437115340933_._li_9032191_._pd__._&matchtype=b&gclid=Cj0KCQjw2NyFBhDoARIsAMtHtZ7tXUSdrjLIvbkpb2BdzwBYYCelutNMt6RUZDCaI7lST-fAXjwwaeQaAumnEALw_wcB">Udemy Courses</a>, or work through the online workshops from <a href="https://library.capture.duke.edu/Panopto/Pages/Viewer.aspx?id=7a59e23a-1f7f-4bd7-8ebc-a943014170b4">Duke University</a> or the <a href="https://uomresearchit.github.io/r-shiny-course/">University of Manchester</a>. And by all means, immerse yourself in the examples, posts and podcasts of the <a href="https://shinydevseries.com/">Shiny Developer Series</a>.</p>
<script>window.location.href='https://rviews.rstudio.com/2021/06/02/r-for-public-health/';</script>
April 2021: "Top 40" New CRAN Packages
https://rviews.rstudio.com/2021/05/25/april-2021-top-40-new-cran-packages/
Tue, 25 May 2021 00:00:00 +0000https://rviews.rstudio.com/2021/05/25/april-2021-top-40-new-cran-packages/
<p>One hundred seventy-nine new packages made it to CRAN in April. Here are my “Top 40” picks in twelve categories: Computational Methods, Data, Genomics, Machine Learning, Mathematics, Medicine, Networks, Operations Research, Statistics, Time Series, Utilities, and Visualization.</p>
<h3 id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=abess">abess</a> v0.1.0: Provides a toolkit for solving the best subset selection problem in linear regression, logistic regression, Poisson regression, Cox proportional hazard model, multiple-response Gaussian, and multinomial regression. It implements and generalizes algorithms described in <a href="https://www.pnas.org/content/117/52/33117">Zhu et al. (2020)</a> that exploit a novel sequencing-and-splicing technique to guarantee exact support recovery and a globally optimal solution in polynomial time. There is an <a href="https://cran.r-project.org/web/packages/abess/vignettes/abess-guide.html">Introduction</a>.</p>
<p><a href="https://cran.r-project.org/package=eat">eat</a> v0.1.0: Provides functions to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0957417420306072?via%3Dihub">Esteve et al. (2020)</a> for details. There is an <a href="https://cran.r-project.org/web/packages/eat/vignettes/EAT.html">Introduction</a>.</p>
<p><img src="eat.png" height = "300" width="500"></p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=childdevdata">childdevdata</a> v1.1.0: Bundles publicly available data sets with individual milestone data for children aged 0-5 years, with the aim of supporting the construction, evaluation, validation and interpretation of methodologies that aggregate milestone data into informative measures of child development. See <a href="https://cran.r-project.org/web/packages/childdevdata/readme/README.html">README</a>.</p>
<p><img src="child.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=datagovindia">datagovindia</a> v0.0.3: Allows users to search the <a href="https://data.gov.in/ogpl_apis">open data platform</a> of the government of India to communicate with the more than 80,000 available APIs. See the <a href="https://cran.r-project.org/web/packages/datagovindia/vignettes/datagovindia_vignette.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=lehdr">lehdr</a> v0.2.4: Provides functions to query the <a href="https://lehd.ces.census.gov/data/lodes/LODES7/">LODES FTP server</a> to obtain longitudinal <a href="https://lehd.ces.census.gov/">Employer-Household Dynamics</a> data and optionally aggregate Census block-level data. See the <a href="https://cran.r-project.org/web/packages/lehdr/vignettes/getting_started.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=rbioapi">rbioapi</a> v0.7.0: Provides a consistent R interface to the Biologic Web Services API and fully supports <a href="https://cran.r-project.org/web/packages/rbioapi/vignettes/rbioapi_mieaa.html">miEAA</a>, <a href="https://cran.r-project.org/web/packages/rbioapi/vignettes/rbioapi_panther.html">PANTHER</a>, <a href="https://cran.r-project.org/web/packages/rbioapi/vignettes/rbioapi_reactome.html">Reactome</a>, <a href="https://cran.r-project.org/web/packages/rbioapi/vignettes/rbioapi_string.html">String</a>, and <a href="https://cran.r-project.org/web/packages/rbioapi/vignettes/rbioapi_uniprot.html">UniProt</a>. See this <a href="https://cran.r-project.org/web/packages/rbioapi/vignettes/rbioapi.html">vignette</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=tidywikidatar">tidywikidatar</a> v0.2.0: Provides functions to query <a href="https://wikidata.org/">Wikidata</a>, get tidy data frames in response, and cache data in a local <code>SQLite</code> database. See <a href="https://cran.r-project.org/web/packages/tidywikidatar/readme/README.html">README</a>.</p>
<h3 id="genomics">Genomics</h3>
<p><a href="https://cran.r-project.org/package=protti">protti</a> v0.1.1: Provides functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry and regular bottom-up proteomics experiments. See <a href="https://www.nature.com/articles/nbt.2999">Feng et al. (2014)</a> for background. There are vignettes for various workflows: <a href="https://cran.r-project.org/web/packages/protti/vignettes/data_analysis_dose_response_workflow.html">Dose Response</a>, <a href="https://cran.r-project.org/web/packages/protti/vignettes/data_analysis_single_dose_treatment_workflow.html">Single Treatment Dose Response</a>, <a href="https://cran.r-project.org/web/packages/protti/vignettes/input_preparation_workflow.html">Input Preparation</a>, and <a href="https://cran.r-project.org/web/packages/protti/vignettes/quality_control_workflow.html">Quality Control</a>.</p>
<p><img src="protti.png" height = "400" width="400"></p>
<p><a href="https://cran.r-project.org/package=Rediscover">Rediscover</a> v0.1.0: Implements an optimized method for identifying mutually exclusive genomic events based on the Poisson-Binomial distribution that takes into account that some samples are more mutated than others. See <a href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1114-x">Canisius et al. (2016)</a>. The <a href="https://cran.r-project.org/web/packages/Rediscover/vignettes/Rediscover.html">vignette</a> provides an introduction.</p>
<p><img src="Rediscover.png" height = "300" width="500"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=geocmeans">geocmeans</a> v0.1.1: Provides functions to apply spatial fuzzy unsupervised classification, visualize and interpret results, as well as indices for estimating the spatial consistency and classification quality. See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0031320306003451?via%3Dihub">Cai et al. (2007)</a>, <a href="https://www.sciencedirect.com/science/article/abs/pii/S1051200412002357?via%3Dihub">Zhao et al. (2013)</a>, and <a href="https://journals.openedition.org/cybergeo/36414">Gelb & Apparicio (2021)</a> for background. There is an <a href="https://cran.r-project.org/web/packages/geocmeans/vignettes/introduction.html">Introduction</a> and an additional <a href="https://cran.r-project.org/web/packages/geocmeans/vignettes/adjustinconsistency.html">vignette</a>.</p>
<p><img src="geocmeans.png" height = "400" width="400"></p>
<p><a href="https://cran.r-project.org/package=Rforestry">Rforestry</a> v0.9.0.4: Provides fast implementations of Honest Random Forests, Gradient Boosting, and Linear Random Forests, with an emphasis on inference and interpretability. See <a href="https://arxiv.org/abs/1906.06463">Kunzel et al. (2019)</a>. See <a href="https://cran.r-project.org/web/packages/Rforestry/readme/README.html">README</a> to get started.</p>
<h3 id="mathematics">Mathematics</h3>
<p><a href="https://cran.r-project.org/package=elasdics">elasdics</a> v0.1.2: Provides functions to align curves and to compute mean curves based on the elastic distance defined in the square-root-velocity framework. For information on the framework, see <a href="https://link.springer.com/book/10.1007%2F978-1-4939-4020-2">Srivastava and Klassen (2016)</a>. For more theoretical details, see <a href="https://arxiv.org/abs/2104.11039">Steyer et al. (2021)</a>.</p>
<p><a href="https://cran.r-project.org/package=jordan">jordan</a> v1.0-1: Provides functions to manipulate Jordan algebras, commutative but non-associative algebraic structures that satisfy the Jordan identity: (xy)x<sup>2</sup> = x(yx<sup>2</sup>). See <a href="http://math.nsc.ru/LBRT/a1/files/mccrimmon.pdf">McCrimmon (2004)</a>.</p>
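<p>As a rough illustration of the Jordan identity, the sketch below checks it numerically in base R for matrices under the product (xy + yx)/2. It does not use the <code>jordan</code> package API; the helper names <code>jordan_prod()</code> and <code>rand_sym()</code> are hypothetical and exist only for this example.</p>
<pre class="r"><code># Jordan product on square matrices: x o y = (xy + yx) / 2
jordan_prod <- function(a, b) (a %*% b + b %*% a) / 2

# Hypothetical helper: a random symmetric n x n matrix
rand_sym <- function(n) {
  m <- matrix(rnorm(n * n), nrow = n)
  (m + t(m)) / 2
}

set.seed(42)
x <- rand_sym(3)
y <- rand_sym(3)
z <- rand_sym(3)
x2 <- jordan_prod(x, x)

# Commutative: x o y equals y o x
commutes <- isTRUE(all.equal(jordan_prod(x, y), jordan_prod(y, x)))

# Jordan identity: (x o y) o x^2 equals x o (y o x^2)
jordan_ok <- isTRUE(all.equal(jordan_prod(jordan_prod(x, y), x2),
                              jordan_prod(x, jordan_prod(y, x2))))

# Associativity generally fails: (x o y) o z versus x o (y o z)
associative <- isTRUE(all.equal(jordan_prod(jordan_prod(x, y), z),
                                jordan_prod(x, jordan_prod(y, z))))

c(commutes = commutes, jordan_ok = jordan_ok, associative = associative)</code></pre>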
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=ccoptimalmatch">ccoptimalmatch</a> v0.1.0: Uses sub-sampling to create pseudo-observations of controls to optimally match cases with controls. See <a href="https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01256-3">Mamouris (2021)</a> for the theory and the <a href="https://cran.r-project.org/web/packages/ccoptimalmatch/vignettes/ccoptimalmatching_vignette.html">vignette</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=nCov2019">nCov2019</a> v0.4.4: Implements an interface to <a href="https://disease.sh/">disease.sh - Open Disease Data API</a> to access real time and historical data of COVID-19 cases, vaccine and therapeutics data. There is a <a href="https://cran.r-project.org/web/packages/nCov2019/vignettes/nCov2019.html">vignette</a>.</p>
<p><img src="nCov2019.png" height = "400" width="400"></p>
<p><a href="https://cran.r-project.org/package=hlaR">hlaR</a> v0.1.0: Implements a tool for the eplet analysis of donor and recipient HLA (human leukocyte antigen) mismatches. There are vignettes on <a href="https://cran.r-project.org/web/packages/hlaR/vignettes/allele-haplotype.html">Imputation</a> and <a href="https://cran.r-project.org/web/packages/hlaR/vignettes/eplet-mm.html">Eplet Mismatch</a> and a <a href="https://emory-larsenlab.shinyapps.io/hlar_shiny/">Shiny App</a> as well.</p>
<p><a href="https://cran.r-project.org/package=ReviewR">ReviewR</a> v2.3.6: Implements a portable <code>Shiny</code> tool to explore patient-level electronic health record data and perform chart review in a single integrated framework. This tool supports the <a href="https://www.ohdsi.org/data-standardization/the-common-data-model/">OMOP</a> common data model as well as the <a href="https://mimic.physionet.org/">MIMIC-III</a> data model, and chart review through a <a href="https://www.project-redcap.org/">REDCap</a> API. See the <a href="https://reviewr.thewileylab.org/">ReviewR Website</a> for more information. There are several vignettes, including guides to <a href="https://cran.r-project.org/web/packages/ReviewR/vignettes/deploy_local.html">Local</a>, <a href="https://cran.r-project.org/web/packages/ReviewR/vignettes/deploy_docker.html">Docker</a>, <a href="https://cran.r-project.org/web/packages/ReviewR/vignettes/deploy_bigquery.html">BigQuery</a>, and <a href="https://cran.r-project.org/web/packages/ReviewR/vignettes/deploy_server.html">Shiny Server</a> deployment and to performing a <a href="https://cran.r-project.org/web/packages/ReviewR/vignettes/usage_perform_chart_review.html">Chart Review</a>.</p>
<p><img src="RevieweR.png" height = "300" width="500"></p>
<h3 id="networks">Networks</h3>
<p><a href="https://CRAN.R-project.org/package=greed">greed</a> v0.5.1: Provides an ensemble of algorithms to enable clustering of networks and data matrices with different types of generative models. Model selection and clustering are performed jointly by optimizing the Integrated Classification Likelihood. The optimization is performed with a combination of greedy local search and a genetic algorithm. See <a href="https://arxiv.org/abs/2002.11577">Côme et al. (2021)</a> for background and the vignettes on <a href="https://cran.r-project.org/web/packages/greed/vignettes/GMM.html">Gaussian Mixture Models</a> and <a href="https://cran.r-project.org/web/packages/greed/vignettes/graph-clustering-with-sbm.html">Clustering</a>.</p>
<p><img src="greed.png" height = "400" width="400"></p>
<h3 id="operations-research">Operations Research</h3>
<p><a href="https://cran.r-project.org/package=critpath">critpath</a> v0.1.2: Provides functions to compute critical paths, schedules, <a href="https://www.investopedia.com/terms/p/pert-chart.asp">PERT</a> charts and <a href="https://en.wikipedia.org/wiki/Gantt_chart">Gantt</a> charts. There is a vignette on <a href="https://cran.r-project.org/web/packages/critpath/vignettes/CPMandPERT.html">CPM and PERT</a> and another on the <a href="https://cran.r-project.org/web/packages/critpath/vignettes/LESS.html">LESS Method</a>.</p>
<p><img src="critpath.png" height = "350" width="350"></p>
<p><a href="https://cran.r-project.org/package=himach">himach</a> v0.1.2: Provides functions to compute the best routes between airports for supersonic aircraft flying subsonic over land. There is an <a href="https://cran.r-project.org/web/packages/himach/vignettes/Supersonic_Routes.html">Introduction to Supersonic Routing</a> and a vignette on <a href="https://cran.r-project.org/web/packages/himach/vignettes/Supersonic_Routes_in_depth.html">Advanced Supersonic Routing</a>.</p>
<p><img src="himach.png" height = "350" width="350"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=convdistr">convdistr</a> v1.5.3: Provides functions to compute convolutions of probability distributions via a method that creates a new random number function for individual random samples from the random generator function of each distribution. There is an <a href="https://cran.r-project.org/web/packages/convdistr/vignettes/using-convdistr.html">Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/convdistr/vignettes/sample_size.html">Sample Size</a>.</p>
<p><img src="convdistr.png" height = "350" width="500"></p>
<p><a href="https://cran.r-project.org/package=gamlss.lasso">gamlss.lasso</a> v1.0-0: Provides an interface for extra high-dimensional smooth functions for Generalized Additive Models for Location Scale and Shape (GAMLSS) including lasso, ridge, elastic net and least angle regression. The <a href="https://www.gamlss.com/">gamlss website</a> provides considerable information.</p>
<p><img src="gamlss.jpeg" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=GGMnonreg">GGMnonreg</a> v1.0.0: Provides functions to estimate non-regularized Gaussian graphical models, Ising models, and mixed graphical models. See <a href="https://www.tandfonline.com/doi/abs/10.1080/00273171.2019.1575716?journalCode=hmbr20">Williams et al. (2019)</a>, <a href="https://bpspsychub.onlinelibrary.wiley.com/doi/abs/10.1111/bmsp.12173">Williams & Rast (2019)</a>, and <a href="https://psyarxiv.com/fb4sa/">Williams (2020)</a> for details. <a href="https://cran.r-project.org/web/packages/GGMnonreg/readme/README.html">README</a> contains examples.</p>
<p><img src="GGMnonreg.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=relevance">relevance</a> v1.1: Implements the concepts of relevance and significance measures introduced in <a href="https://stat.ethz.ch/~stahel/relevance/stahel-relevance2103.pdf">Stahel (2021)</a> to augment inference with p-values. See the <a href="https://cran.r-project.org/web/packages/relevance/vignettes/relevance-descr.pdf">vignette</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=sasfunclust">sasfunclust</a> v1.0.0: Implements the sparse and smooth functional clustering method described in <a href="https://arxiv.org/abs/2103.15224">Centofanti et al. (2021)</a> that aims to classify a sample of curves into homogeneous groups while jointly detecting the most informative portions of domain. See <a href="https://cran.r-project.org/web/packages/sasfunclust/readme/README.html">README</a> to get started.</p>
<p><img src="sasfunclust.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=survMS">survMS</a> v0.0.1: Provides functions to simulate data from the <a href="https://www.sciencedirect.com/science/article/pii/S0169716103230248">Accelerated Hazard</a>, <a href="https://en.wikipedia.org/wiki/Accelerated_failure_time_model">Accelerated Failure Time</a>, and <a href="https://socialsciences.mcmaster.ca/jfox/Books/Companion-2E/appendix/Appendix-Cox-Regression.pdf">Cox</a> survival models. See <a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.2059">Bender et al. (2004)</a> for the methods used to implement the Cox model, and the <a href="https://cran.r-project.org/web/packages/survMS/vignettes/how-to-simulate-survival-models.html">vignette</a> and <a href="https://github.com/mathildesautreuil/survMS/">GitHub</a> for an introduction and examples.</p>
<p><img src="survMS.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=TestGardener">TestGardener</a> v0.1.4: Provides functions to develop, evaluate, and score multiple choice examinations, psychological scales, questionnaires, and similar types of data involving sequences of choices among one or more sets of answers. See <a href="https://www.mdpi.com/2624-8611/2/4/26">Ramsay et al. (2020)</a> and <a href="https://journals.sagepub.com/doi/10.3102/1076998619885636">Ramsay et al. (2019)</a> for the methodology and the vignettes <a href="https://cran.r-project.org/web/packages/TestGardener/vignettes/SDSAnalysis.html">Symptom Distress Analysis</a> and <a href="https://cran.r-project.org/web/packages/TestGardener/vignettes/SweSATQuantitativeAnalysis.html">SweSAT Quantitative Analysis</a>.</p>
<p><img src="TestG.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=wpa">wpa</a> v1.5.0: Provides opinionated functions to enable easier and faster analysis of Workplace Analytics data. See the <a href="https://cran.r-project.org/web/packages/wpa/vignettes/intro-to-wpa.html">vignette</a> for an introduction.</p>
<p><img src="wpa.png" height = "400" width="400"></p>
<h3 id="time-series">Time Series</h3>
<p><a href="https://cran.r-project.org/package=garchmodels">garchmodels</a> v0.1.1: Implements a framework for using <a href="https://www.investopedia.com/terms/g/garch.asp#:~:text=GARCH%20is%20a%20statistical%20modeling,an%20autoregressive%20moving%20average%20process.">GARCH</a> models with the <code>tidymodels</code> ecosystem. It includes both univariate and multivariate methods from the <code>rugarch</code> and <code>rmgarch</code> packages. There is a <a href="https://cran.r-project.org/web/packages/garchmodels/vignettes/getting-started.html">Getting Started Guide</a> and a <a href="https://cran.r-project.org/web/packages/garchmodels/vignettes/tuning_univariate_algorithms.html">vignette</a> on tuning univariate GARCH models.</p>
<p><img src="garchmodels.png" height = "600" width="400"></p>
<p><a href="https://cran.r-project.org/package=tensorTS">tensorTS</a> v0.1.1: Provides functions for estimating, simulating and predicting factor and autoregressive models for matrix and tensor valued time series. See <a href="https://arxiv.org/abs/1905.07530">Chen et al. (2020)</a>, <a href="https://www.sciencedirect.com/science/article/abs/pii/S0304407620302050?via%3Dihub">Chen et al. (2020)</a>, and <a href="https://arxiv.org/abs/2006.02611">Han et al. (2020)</a> for the math.</p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=diffmatchpatch">diffmatchpatch</a> v0.1.0: Implements a wrapper for Google’s <a href="https://github.com/google/diff-match-patch">diff-match-patch</a> library. It provides basic tools for computing diffs, finding fuzzy matches, and constructing / applying patches to strings. See <a href="https://github.com/google/diff-match-patch">README</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=erify">erify</a> v0.2.0: Provides several validator functions to check whether arguments passed by users have valid types, lengths, etc., and if not, to generate informative and well-formatted error messages in a consistent style. See the <a href="https://cran.r-project.org/web/packages/erify/vignettes/erify.html">vignette</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=juicr">juicr</a> v0.1: Provides a GUI for automating data extraction from multiple images containing scatter and bar plots, semi-automated tools to tinker with extraction attempts, and a fully-loaded point-and-click manual extractor with image zoom, calibrator, and classifier. See the <a href="https://cran.r-project.org/web/packages/juicr/vignettes/juicr_basic_vignette_v0.1.pdf">vignette</a> for examples, and the <a href="https://www.youtube.com/c/LajeunesseLab/">YouTube channel</a> for a course on meta-analysis.</p>
<p><a href="https://CRAN.R-project.org/package=mailmerge">mailmerge</a> v0.2.1: Allows users to mail merge using markdown documents and Gmail, parse markdown documents as the body of email, use the <code>yaml</code> header to specify the subject line of the email, preview the email in the RStudio viewer pane, and send (draft) email using <code>gmailr</code>. See the <a href="https://cran.r-project.org/web/packages/mailmerge/vignettes/introduction.html">vignette</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=m61r">m61r</a> v0.0.2: Provides <code>dplyr</code>- and <code>tidyr</code>-like data manipulation functions using only base R and no dependencies. See the <a href="https://cran.r-project.org/web/packages/m61r/vignettes/base_r.pdf">vignette</a> for examples.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=flametree">flametree</a> v0.1.2: Implements a generative art system for producing tree-like images using an L-system to create the structures. See <a href="https://cran.r-project.org/web/packages/flametree/readme/README.html">README</a> to get started.</p>
<p><img src="flametree.png" height = "500" width="500"></p>
<p><a href="https://cran.r-project.org/package=leafdown">leafdown</a> v1.0.0: Provides drill down functionality for <code>leaflet</code> choropleths in <code>shiny</code> apps. There is an <a href="https://cran.r-project.org/web/packages/leafdown/vignettes/Introduction.html">Introduction</a> and a <a href="https://cran.r-project.org/web/packages/leafdown/vignettes/Showcase_electionapp.html">Showcase</a> example.</p>
<p><img src="leafdown.png" height = "600" width="400"></p>
<p><a href="https://cran.r-project.org/package=mapping">mapping</a> v1.2: Provides coordinates, linking and mapping functions for mapping workflows of different geographical statistical units. Geographical coordinates automatically link with the input data to generate maps. See the <a href="https://cran.r-project.org/web/packages/mapping/vignettes/a-journey-into-mapping.html">vignette</a> to get started.</p>
<p><img src="mapping.png" height = "350" width="500"></p>
<p><a href="https://cran.r-project.org/package=materialmodifier">materialmodifier</a> v1.0.0: Provides functions to apply image processing effects to modify perceived material properties such as gloss, smoothness, and blemishes. Look <a href="https://github.com/tsuda16k/materialmodifier">here</a> for documentation and practical tips.</p>
<p><img src="material.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=svplots">svplots</a> v0.1.0: Implements two versions of sample variance plots illustrating the squared deviations from sample variance as described in <a href="https://www.tandfonline.com/doi/abs/10.1080/03610918.2020.1851716?journalCode=lssp20">Wijesuriya (2020)</a>. See the <a href="https://cran.r-project.org/web/packages/svplots/vignettes/Svplots_and_Testing_Hypothes.html">vignette</a>.</p>
<p><img src="svplots.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=vivid">vivid</a> v0.1.0: Provides a suite of plots for displaying variable importance and two-way variable interaction. Plots include partial dependence plots laid out in “pairs plot” or <a href="https://www.zenplot.com/en?gclid=CjwKCAjwy42FBhB2EiwAJY0yQi2mfla-DMC2uuDglAGzh1mUx4sYyT5p8uEmfWgeMv5gKBh_5V2RDxoC6jEQAvD_BwE">zenplots</a> style. There is an <a href="https://cran.r-project.org/web/packages/vivid/vignettes/vivid.html">Introduction</a> and a <a href="https://cran.r-project.org/web/packages/vivid/vignettes/vividQStart.html">Quick Start Guide</a>.</p>
<p><img src="vivid.png" height = "500" width="300"></p>
<script>window.location.href='https://rviews.rstudio.com/2021/05/25/april-2021-top-40-new-cran-packages/';</script>
Basic FDA Descriptive Statistics with R
https://rviews.rstudio.com/2021/05/14/basic-fda-descriptive-statistics-with-r/
Fri, 14 May 2021 00:00:00 +0000https://rviews.rstudio.com/2021/05/14/basic-fda-descriptive-statistics-with-r/
<script src="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/header-attrs/header-attrs.js"></script>
<script src="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/htmlwidgets/htmlwidgets.js"></script>
<script src="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/plotly-binding/plotly.js"></script>
<script src="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/typedarray/typedarray.min.js"></script>
<script src="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/jquery/jquery.min.js"></script>
<link href="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/crosstalk/css/crosstalk.css" rel="stylesheet" />
<script src="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/crosstalk/js/crosstalk.min.js"></script>
<link href="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/plotly-htmlwidgets-css/plotly-htmlwidgets.css" rel="stylesheet" />
<script src="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/plotly-main/plotly-latest.min.js"></script>
<p>In a previous post, I introduced the topic of Functional Data Analysis (FDA). In that post, I provided some background on Functional Analysis, the mathematical theory that makes FDA possible, identified FDA resources that might be of interest to R users, and showed how to turn a series of data points into an FDA object. In this post, I will pick up where I left off and move on to some very basic FDA descriptive statistics.</p>
<p>Let’s continue with the same motivating example from last time. We will use synthetic data generated by a Brownian motion process and pretend that it is observed longitudinal data. However, before getting to the statistics, I would like to take a tiny, tidy diversion. The functions in <code>fda</code> and other fundamental FDA R packages require data structured in matrices. Consequently, the examples in the basic FDA reference works (listed below) construct matrices using code that seems to be convenient for the occasion. I think this makes adapting sample code to user data a little harder than it needs to be. There ought to be standard data structures for working with FDA data. I propose tibbles or data frames with function values packed into lists.</p>
<p>The following function generates <code>n_points</code> data points for each of <code>n_curves</code> Brownian motion curves that represent the longitudinal data collected from <code>n_curves</code> subjects.</p>
<pre class="r"><code>library(fda)
library(tidyverse)
library(plotly)
# Function to simulate data
fake_curves <- function(n_curves = 100, n_points = 80, max_time = 100){
ID <- 1:n_curves
x <- vector(mode = "list", length = n_curves)
t <- vector(mode = "list", length = n_curves)
for (i in 1:n_curves){
t[i] <- list(sort(runif(n_points,0,max_time)))
x[i] <- list(cumsum(rnorm(n_points)) / sqrt(n_points))
}
df <- tibble(ID,t,x)
names(df) <- c("ID", "Time", "Curve")
return(df)
}</code></pre>
<p>Notice that each curve is associated with a unique set of random time points that lie in the interval [0, max_time]. Not being restricted to situations where data from all subjects must be observed at the same times is a big deal. However, in practice you may encounter problems that will require curve alignment procedures. We will ignore this for now. Note that the variables <code>Time</code> and <code>Curve</code> contain lists of data points in each cell.</p>
<pre class="r"><code>set.seed(123)
n_curves <- 40
n_points <- 80
max_time <- 100
df <- fake_curves(n_curves = n_curves,n_points = n_points, max_time = max_time)
head(df)</code></pre>
<pre><code>## # A tibble: 6 x 3
## ID Time Curve
## <int> <list> <list>
## 1 1 <dbl [80]> <dbl [80]>
## 2 2 <dbl [80]> <dbl [80]>
## 3 3 <dbl [80]> <dbl [80]>
## 4 4 <dbl [80]> <dbl [80]>
## 5 5 <dbl [80]> <dbl [80]>
## 6 6 <dbl [80]> <dbl [80]></code></pre>
<p>Later on, this kind of structure will be convenient for data sets that contain both FDA curves and other scalar covariates. Note that if you are using the RStudio IDE, running the function <code>View(df)</code> will show you an expanded view of the tibble under a tab labeled df that should look something like this:</p>
<p><img src="df.png" height = "400" width="600"></p>
<p>Next, we unpack the data into a long form tibble and plot.</p>
<pre class="r"><code>df_1 <- df %>% select(!c(ID,Curve)) %>% unnest_longer(Time)
df_2 <- df %>% select(!c(ID,Time)) %>% unnest_longer(Curve)
ID <- sort(rep(1:n_curves,n_points))
df_l <- cbind(ID,df_1,df_2)
p <- ggplot(df_l, aes(x = Time, y = Curve, group = ID, col = ID)) +
geom_line()
p</code></pre>
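<p>As a possible alternative to unpacking each list column separately and binding the pieces back together, the sketch below unnests both columns in one step with <code>tidyr::unnest()</code>. The toy tibble and the names <code>toy</code> and <code>toy_long</code> are assumptions made for illustration; they only mimic the <code>ID</code>, <code>Time</code>, <code>Curve</code> layout used above.</p>
<pre class="r"><code>library(tidyverse)

set.seed(123)
# Toy tibble with the same list-column layout: 3 curves, 5 points each
toy <- tibble(
  ID    = 1:3,
  Time  = map(1:3, ~ sort(runif(5, 0, 100))),
  Curve = map(1:3, ~ cumsum(rnorm(5)) / sqrt(5))
)

# Observation times for a single subject
toy$Time[[2]]

# Unnest both list columns in parallel; ID is repeated for each point
toy_long <- toy %>% unnest(cols = c(Time, Curve))
toy_long</code></pre>
<p>Because <code>ID</code> travels with each row, this route does not need a separately constructed ID vector.</p>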
<p><img src="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/figure-html/unnamed-chunk-3-1.png" width="672" />
Now that we have the data, remember that FDA treats each curve as a basic data element living in an infinite dimensional vector space. The vectors, <span class="math inline">\(X\)</span>, are random functions: <span class="math inline">\(X: \Omega \rightarrow \mathscr{H}\)</span>, where <span class="math inline">\(\Omega\)</span> is an underlying probability space and <span class="math inline">\(\mathscr{H}\)</span> is typically a complete Hilbert space of square integrable functions. That is, for each <span class="math inline">\(\omega \in \Omega\)</span>, <span class="math inline">\(\parallel X(\omega) \parallel ^2 = \int (X(\omega)(t))^2 \, dt < \infty\)</span>. In multivariate statistics we work with random variables that live in a Euclidean space; here we are dealing with random functions that live in a Hilbert space. In this context, square integrable means <span class="math inline">\(E \parallel X \parallel ^2 < \infty\)</span>. You will find lucid elaborations of all of this in the references listed below, which I have reproduced from the previous post.</p>
<p>The bridge from the theory to practice is the ability to represent the random functions as a linear combination of basis vectors. This was the topic of the previous post. Here is some compact code to construct the basis.</p>
<pre class="r"><code>knots = c(seq(0,max_time,5)) #Location of knots
n_knots = length(knots) #Number of knots
n_order = 4 # order of basis functions: for cubic b-splines: order = 3 + 1
n_basis = length(knots) + n_order - 2;
basis = create.bspline.basis(rangeval = c(0,max_time), n_basis)
plot(basis)</code></pre>
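<p>The relation between the number of knots, the order, and the number of basis functions can be checked directly. The following is a minimal sketch that assumes the same equally spaced knots as above but passes them explicitly through the <code>breaks</code> argument; the names <code>basis_chk</code>, <code>grid</code>, and <code>B</code> are hypothetical.</p>
<pre class="r"><code>library(fda)

max_time <- 100
knots    <- seq(0, max_time, by = 5)   # 21 equally spaced knots
n_order  <- 4                          # cubic B-splines

# Build the basis from the knots; nbasis is then derived by fda
basis_chk <- create.bspline.basis(rangeval = c(0, max_time),
                                  breaks = knots, norder = n_order)
basis_chk$nbasis                       # length(knots) + n_order - 2

# Evaluate every basis function on a grid of interior points
grid <- seq(0.5, max_time - 0.5, by = 1)
B <- eval.basis(grid, basis_chk)
dim(B)                                 # grid points by basis functions

# B-spline basis functions sum to one at each evaluation point
range(rowSums(B))</code></pre>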
<p><img src="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/figure-html/unnamed-chunk-4-1.png" width="672" />
This next bit of code formats the data in the long form tibble into the matrix input expected by the <code>fda</code> functions and creates an <code>fda</code> object that contains the coefficients and basis functions used to smooth the data. Note the smoothing parameter, lambda = 0.5.</p>
<pre class="r"><code>argvals <- matrix(df_l$Time, nrow = n_points, ncol = n_curves)
y_mat <- matrix(df_l$Curve, nrow = n_points, ncol = n_curves)
W.obj <- Data2fd(argvals = argvals, y = y_mat, basisobj = basis, lambda = 0.5)</code></pre>
<p>Next, somewhat anticlimactically after all of the preparation and theory, we use the <code>fda</code> functions <code>mean.fd()</code> and <code>std.fd()</code> to calculate the pointwise mean and standard deviation from the information contained in the <code>fda</code> object. In order to use these objects to calculate the pointwise confidence interval for the mean, it is necessary to construct a couple of new <code>fda</code> objects for the upper and lower curves. Then, we plot the smoothed curves for our data along with the pointwise mean and pointwise 95% confidence bands for the mean.</p>
<pre class="r"><code>W_mean <- mean.fd(W.obj)
W_sd <- std.fd(W.obj)
# Create fd objects for the upper and lower confidence limits
SE_u <- fd(basisobj = basis)
SE_l <- fd(basisobj = basis)
# Fill in the coefficients: mean +/- 1.96 standard errors
SE_u$coefs <- W_mean$coefs + 1.96 * W_sd$coefs/sqrt(n_curves)
SE_l$coefs <- W_mean$coefs - 1.96 * W_sd$coefs/sqrt(n_curves)
plot(W.obj, xlab="Time", ylab="", lty = 1)</code></pre>
<pre><code>## [1] "done"</code></pre>
<pre class="r"><code>title(main = "Smoothed Curves")
lines(SE_u, lwd = 3, lty = 3)
lines(SE_l, lwd = 3, lty = 3)
lines(W_mean, lwd = 3)</code></pre>
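<p>One way to sanity check pointwise bands of this kind is to evaluate the smoothed curves on a grid and compute the mean and standard error directly from the evaluated values. The sketch below does this under simplifying assumptions: a small simulated data set observed on a common time grid, smoothed with <code>smooth.basis()</code> and no roughness penalty. All object names are hypothetical and this is not the computation used above.</p>
<pre class="r"><code>library(fda)

set.seed(1)
n_curves <- 20
n_points <- 50
t_obs <- seq(0, 100, length.out = n_points)

# Simulated Brownian-motion-like curves, one per column
y_mat <- replicate(n_curves, cumsum(rnorm(n_points)) / sqrt(n_points))

basis  <- create.bspline.basis(rangeval = c(0, 100), nbasis = 23)
fd_obj <- smooth.basis(argvals = t_obs, y = y_mat, fdParobj = basis)$fd

# Evaluate every smoothed curve on a common grid: rows = times, cols = curves
grid <- seq(0, 100, by = 2)
vals <- eval.fd(grid, fd_obj)

# Pointwise mean, standard error, and 95% band for the mean
m  <- rowMeans(vals)
se <- apply(vals, 1, sd) / sqrt(n_curves)
band <- data.frame(t = grid, lower = m - 1.96 * se, mean = m, upper = m + 1.96 * se)

# The grid-based mean should agree with mean.fd() evaluated on the same grid
mean_fd_vals <- as.vector(eval.fd(grid, mean.fd(fd_obj)))
max(abs(unname(m) - mean_fd_vals))</code></pre>
<p>The agreement holds because the mean is linear in the basis coefficients; the standard deviation is not, so a band built from evaluated values may differ slightly from one built by combining coefficients.</p>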
<p><img src="/2021/05/14/basic-fda-descriptive-statistics-with-r/index_files/figure-html/unnamed-chunk-6-1.png" width="672" />
Finally, we compute the covariance surface for our sample of smoothed curves and use <a href="https://plotly.com/r/3d-surface-plots/"><code>plotly</code></a> to create an interactive plot of the three-dimensional covariance surface along with a contour map.</p>
<pre class="r"><code>days <- seq(0,100, by=2)
cov_W <- var.fd(W.obj)
var_mat <- eval.bifd(days,days,cov_W)</code></pre>
<pre class="r"><code>fig <- plot_ly(x = days, y = days, z = ~var_mat) %>%
add_surface(contours = list(
z = list(show=TRUE,usecolormap=TRUE, highlightcolor="#ff0000", project=list(z=TRUE))))
fig <- fig %>%
layout(scene = list(camera=list(eye = list(x=1.87, y=0.88, z=-0.64))))
fig</code></pre>
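<p>Note that <code>var.fd()</code> returns a covariance function. If a correlation surface is wanted instead, one option is to rescale the evaluated covariance matrix with base R's <code>cov2cor()</code>. The sketch below illustrates this on a small simulated data set; the simulation settings and object names are assumptions made for this example only.</p>
<pre class="r"><code>library(fda)

set.seed(1)
n_curves <- 20
n_points <- 50
t_obs <- seq(0, 100, length.out = n_points)

# Simulated curves on a common grid, smoothed without a roughness penalty
y_mat  <- replicate(n_curves, cumsum(rnorm(n_points)) / sqrt(n_points))
basis  <- create.bspline.basis(rangeval = c(0, 100), nbasis = 23)
fd_obj <- smooth.basis(argvals = t_obs, y = y_mat, fdParobj = basis)$fd

# Covariance function evaluated on a grid of time points
grid    <- seq(0, 100, by = 2)
cov_mat <- eval.bifd(grid, grid, var.fd(fd_obj))

# Rescale to correlations: divide by the pointwise standard deviations
cor_mat <- cov2cor(cov_mat)
range(diag(cor_mat))   # the diagonal of a correlation matrix is 1</code></pre>
<p>The resulting <code>cor_mat</code> can be passed to <code>plot_ly()</code> in place of the covariance matrix.</p>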
<p><div id="htmlwidget-1" style="width:672px;height:480px;" class="plotly html-widget"></div>
<script type="application/json" data-for="htmlwidget-1">{"x":{"visdat":{"10fc4fe7d04f":["function () ","plotlyVisDat"]},"cur_data":"10fc4fe7d04f","attrs":{"10fc4fe7d04f":{"x":[0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100],"y":[0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100],"z":{},"alpha_stroke":1,"sizes":[10,100],"spans":[1,20],"type":"surface","contours":{"z":{"show":true,"usecolormap":true,"highlightcolor":"#ff0000","project":{"z":true}}},"inherit":true}},"layout":{"margin":{"b":40,"l":60,"t":25,"r":10},"scene":{"camera":{"eye":{"x":1.87,"y":0.88,"z":-0.64}},"xaxis":{"title":[]},"yaxis":{"title":[]},"zaxis":{"title":"var_mat"}},"hovermode":"closest","showlegend":false,"legend":{"yanchor":"top","y":0.5}},"source":"A","config":{"showSendToCloud":false},"data":[{"colorbar":{"title":"var_mat","ticklen":2,"len":0.5,"lenmode":"fraction","y":1,"yanchor":"top"},"colorscale":[["0","rgba(68,1,84,1)"],["0.0416666666666667","rgba(70,19,97,1)"],["0.0833333333333333","rgba(72,32,111,1)"],["0.125","rgba(71,45,122,1)"],["0.166666666666667","rgba(68,58,128,1)"],["0.208333333333333","rgba(64,70,135,1)"],["0.25","rgba(60,82,138,1)"],["0.291666666666667","rgba(56,93,140,1)"],["0.333333333333333","rgba(49,104,142,1)"],["0.375","rgba(46,114,142,1)"],["0.416666666666667","rgba(42,123,142,1)"],["0.458333333333333","rgba(38,133,141,1)"],["0.5","rgba(37,144,140,1)"],["0.541666666666667","rgba(33,154,138,1)"],["0.583333333333333","rgba(39,164,133,1)"],["0.625","rgba(47,174,127,1)"],["0.666666666666667","rgba(53,183,121,1)"],["0.708333333333333","rgba(79,191,110,1)"],["0.75","rgba(98,199,98,1)"],["0.791666666666667","rgba(119,207,85,1)"],["0.833333333333333","rgba(147,214,70,1)"],["0.875","rgba(172,220,52,1)"],["0.916666666666667","rgba(199,225,42,1)"],["0.958333333333333","rgba(226,228,4
0,1)"],["1","rgba(253,231,37,1)"]],"showscale":true,"x":[0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100],"y":[0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100],"z":[[0.0300141214765612,0.015686905073093,0.00564093884922246,0.00447256728960355,0.00995490109738018,0.0130378171941865,0.00725655484167828,-0.00232554798509603,-0.00801467720093724,-0.00790263578918598,-0.00586684380172273,-0.00503976012652639,-0.00456780472318001,-0.00268331461816891,0.000485556784993114,0.00291483970576381,0.00310419846129833,0.00197754095554643,0.00105330118737587,0.00158461271469904,0.00455930865447297,0.0102627510155511,0.0162165565814951,0.0193760713693298,0.0195639488111623,0.0194701497541815,0.0210337796708257,0.0226918603753865,0.0221081111780568,0.0192972748291135,0.0166251175189173,0.0157786309103451,0.015320834914587,0.013141952079488,0.00962070316816562,0.00762430715900963,0.0091589006740241,0.0123535086113619,0.0144475321088493,0.01451241822894,0.0134516599587157,0.012184664428522,0.0113758786592171,0.0115914994526642,0.0126039892034018,0.0133920758986437,0.0132433983079006,0.0128192790963617,0.0130612176454187,0.0134560954089633,0.0120361759133872],[0.015686905073093,0.0226633175061417,0.0244188932917484,0.0250220663890372,0.0258019153407324,0.025348163273158,0.0226239217260867,0.0185622655307192,0.0145795755299542,0.0118809819100223,0.011460364200486,0.0136427970615,0.0161148749202248,0.0160327747314867,0.0135243329261398,0.0116890454110662,0.0127990751660581,0.0153004431188909,0.0167891143432511,0.0173100860678814,0.0193573876765801,0.0245339491722834,0.0304523843988157,0.0338539590040724,0.0343832480444693,0.0345881359849426,0.0363282575780583,0.0382053253515058,0.0380840687391034,0.0356116588086777,0.0322177082620627,0.0291246356267592,0.0264160928835393,0.0239139417559029
,0.0219667756937601,0.0214499198734321,0.0229121181584461,0.0255041836435166,0.0280467134870814,0.029803440233887,0.0304812318149881,0.0298735894418936,0.0280434804242242,0.0251388192376269,0.021711875112217,0.0187192720325778,0.0169049313606271,0.0160916414016255,0.0158522428499515,0.01530729642547,0.0131250828735328],[0.00564093884922246,0.0244188932917484,0.0404127823425623,0.0483733511242439,0.0480174477822059,0.0440280234846942,0.0403499317976119,0.0371119657865627,0.0335517590970443,0.0303526265888372,0.0296435643360046,0.0324526349389525,0.035152744283584,0.0331828317842279,0.0272677591173969,0.0234183102217875,0.026004021647384,0.0319101495611166,0.0363122500732657,0.0380973901937379,0.0398641478320654,0.0435528802177005,0.0478255817033191,0.0506029719293355,0.0516070386956147,0.0523610379614717,0.0538230640251236,0.0543773005126641,0.0518705141914604,0.0465888704428848,0.0412579332623144,0.0380833303825961,0.0367667012492371,0.0364074277070802,0.0366508994378863,0.0376885139603343,0.0396474612639257,0.0423031433367703,0.0453253497513456,0.0479775655285843,0.0491169711378746,0.0478109208618154,0.0440381258969864,0.0380544047895784,0.0312487428714465,0.0261432922601097,0.0245335471295971,0.0251112973955133,0.0258242178246622,0.025351634247369,0.0231045235574797],[0.00447256728960355,0.0250220663890372,0.0483733511242439,0.0658210619904763,0.0737960986504746,0.0738656199344664,0.0682277414378763,0.0607111433530042,0.0554453449883687,0.0541034188934748,0.055901990858814,0.059498497561025,0.0617408273533381,0.0592276422063684,0.0532303001501739,0.049692855274256,0.0528918336035986,0.0596210061121686,0.0649634904915878,0.0676832204828881,0.0702249458765112,0.0744621022049338,0.0793427260471337,0.0831343126365808,0.0852729502426609,0.0863633201706759,0.0866778286903351,0.0849565483575114,0.0796664880504596,0.0718051109167692,0.0649003343733642,0.0616710344942862,0.0611598429786833,0.0615355063179176,0.0620058163799042,0.0628176104091112,0.0643008086961966,0.066936
9601728294,0.0711766273093092,0.0758858192786237,0.0783459919564488,0.0764513991188953,0.0708230606303023,0.0627955601389712,0.0544365433170178,0.0485467178603732,0.0470326094185766,0.0480965264079801,0.0490696898156232,0.0485475588451775,0.0463898309169458],[0.00995490109738018,0.0258019153407324,0.0480174477822059,0.0737960986504746,0.0971085914615697,0.10870177301888,0.102995904418255,0.0896655781362864,0.081799993583024,0.0833119041988271,0.0889376174543645,0.0937835994937921,0.0960328548021177,0.0947403483065957,0.0913260396462496,0.0895748831718718,0.0921893839315523,0.0971309463344363,0.10129314750544,0.104270865706126,0.108360280334708,0.115145590152833,0.122893279873863,0.129099970422491,0.132632271361622,0.133726780892375,0.132535733834645,0.128635652589831,0.121538220159228,0.112574159748106,0.104893234765713,0.1008406212425,0.0992267908756421,0.098035543560513,0.0965617776673574,0.0954114900412902,0.095367918350293,0.097695246427747,0.103646726979766,0.111458756334261,0.11635087444094,0.114697704490176,0.108018873309467,0.099107724487911,0.0904693162179292,0.084320421295265,0.0821118946249729,0.0822810561400769,0.0825592190496153,0.0817673680001241,0.0798161590756366],[0.0130378171941865,0.025348163273158,0.0440280234846942,0.0738656199344664,0.10870177301888,0.131429901425073,0.130338371899877,0.117199239378325,0.109151554151168,0.111744052608463,0.118935157239576,0.125205625814209,0.129141263988743,0.130406449035972,0.129868680057772,0.129598577985106,0.131177622726623,0.134021491522753,0.137061062815098,0.140530208014422,0.145965791500655,0.154228736045967,0.163249590518587,0.170292721388971,0.174149949926465,0.175140552199302,0.173523514301647,0.169051018118542,0.161419129688686,0.151947374787411,0.143578738926672,0.138481517872149,0.135442907580638,0.132469174994164,0.129046543615972,0.12614119351053,0.124904427310549,0.126970654171923,0.133934823992636,0.143706461230417,0.150509664902736,0.149900457099137,0.143403495752272,0.134024472210158,0.12451
3190276249,0.117363566209445,0.114287591739835,0.113914062663951,0.114141378734128,0.113797207304149,0.112638483329245],[0.00725655484167828,0.0226239217260867,0.0403499317976119,0.0682277414378763,0.102995904418255,0.130338371899877,0.139688182032744,0.137397260506744,0.133823360055384,0.134174652061097,0.138509726555246,0.146085330009431,0.153846414021606,0.158293537808086,0.158999056330915,0.158607120297858,0.159403311653335,0.16170471194128,0.165336738878575,0.170135419462684,0.175947389971655,0.182424713480816,0.188439315162361,0.192711088947525,0.194949003168609,0.195851100558978,0.195717042964027,0.19308095573795,0.186158898282678,0.176003247922498,0.166502699904055,0.160623618434228,0.15714977087869,0.153889591090921,0.150268773980867,0.147330275514936,0.146229194592453,0.148287939051957,0.154740975922443,0.163838107212786,0.170844469911743,0.171861305927171,0.166853349020735,0.1568316981457,0.144658773502766,0.135048316540061,0.131476061460251,0.132143744571539,0.133992284753065,0.135289287908722,0.135629046967152],[-0.00232554798509603,0.0185622655307192,0.0371119657865627,0.0607111433530042,0.0896655781362864,0.117199239378325,0.137397260506744,0.149021050937968,0.152158909046904,0.150528992318004,0.151479317345255,0.160019309553339,0.17117274984456,0.177717176035795,0.178183990982668,0.176858462579558,0.177467053922628,0.180500336734544,0.185609272525172,0.191740163307645,0.197134651598366,0.200408010855296,0.201792587780331,0.201934709781931,0.201704089367035,0.202193824141059,0.203687257405181,0.203189865661295,0.197077963680589,0.186104880523814,0.175400959541291,0.169000153079858,0.165789629595956,0.1634337203065,0.161065507288953,0.159286823481326,0.158833987946666,0.160725829315896,0.165959553302808,0.173410608511967,0.179832686438706,0.182195069327086,0.178698403652472,0.168035338100033,0.153134227850431,0.141159134579828,0.137488888247712,0.139624747338963,0.143184656221075,0.145766370540687,0.146947457223577],[-0.00801467720093724,0.014579575529
9542,0.0335517590970443,0.0554453449883687,0.081799993583024,0.109151554151168,0.133823360055384,0.152158909046904,0.160759913976905,0.162051089760505,0.164280153376763,0.173284826371914,0.184250152549762,0.189965437082023,0.189370893621324,0.18755764430121,0.188800283197836,0.193037568943746,0.199106383544097,0.205431500374914,0.210025584183091,0.211435296519902,0.210414957220078,0.208268791568572,0.20625481507554,0.205584833476341,0.206553546822989,0.205795268916695,0.199235499638045,0.187544239635493,0.176135990325352,0.169354995508846,0.166439337958515,0.165382190838407,0.164904232232213,0.164453645143262,0.163936442039167,0.164963430649948,0.169454935750345,0.176647601384842,0.183094390867664,0.185548505510878,0.182030825701433,0.171074994035733,0.155719029247019,0.143505326205363,0.140040182937765,0.142402140335997,0.145642027044141,0.147207893536936,0.146945012119781],[-0.00790263578918598,0.0118809819100223,0.0303526265888372,0.0541034188934748,0.0833119041988271,0.111744052608463,0.134174652061097,0.150528992318004,0.162051089760505,0.170700287551338,0.179151255634655,0.188978321865683,0.197230041086307,0.200018353165535,0.197936248422864,0.196057767628283,0.198383944655527,0.203844472671486,0.210131452167584,0.215632561163899,0.219431055209156,0.220875817626191,0.220257272829271,0.218115273498233,0.215312661145223,0.21303526611469,0.211703065462037,0.20861645111564,0.200474760734835,0.188090698271463,0.176390333969868,0.169382638465077,0.166692815903987,0.166839231539204,0.168089613361411,0.168461052099373,0.166954905361007,0.166553187359872,0.171034412275775,0.179607818029667,0.186913366283649,0.188377265566261,0.183365368612479,0.172349353976725,0.158581939873738,0.148096884178574,0.14520462632401,0.146838672902345,0.148201202752072,0.147091220206974,0.143904555096121],[-0.00586684380172273,0.011460364200486,0.0296435643360046,0.055901990858814,0.0889376174543645,0.118935157239576,0.138509726555246,0.151479317345255,0.164280153376763,0.179151255634655,0.
194134442321539,0.207015416909719,0.215285372260236,0.216547915360939,0.212685614130958,0.20985999742071,0.212748494985508,0.219347973081767,0.226063519070894,0.231102545847802,0.234474791840914,0.236211400126787,0.236115684455659,0.233972826276096,0.230460982724484,0.227151286625036,0.224790732613606,0.220674462452998,0.211423082374906,0.197991035415441,0.185666601415132,0.178728193736514,0.176661049081538,0.177732671916507,0.179763491086094,0.180126859813348,0.177508440228303,0.175920882370973,0.180448181283924,0.19018508658607,0.198237102472678,0.198945893456674,0.192639373221394,0.181216192008275,0.168281008165233,0.159144488146665,0.157417469158103,0.159614774347633,0.1606122604985,0.158391638832914,0.154040475012052],[-0.00503976012652639,0.0136427970615,0.0324526349389525,0.059498497561025,0.0937835994937921,0.125205625814209,0.146085330009431,0.160019309553339,0.173284826371914,0.188978321865683,0.207015416909719,0.226109065876101,0.240714743303262,0.244563014165053,0.239192032493299,0.233947541377799,0.236123607806666,0.243449752785962,0.251354795250806,0.257347545901448,0.261016807203268,0.2621094067394,0.260642534865696,0.256757706272572,0.251901309402189,0.248824606448451,0.248855362704169,0.247400421292272,0.238677363741663,0.223596057395594,0.20975865541166,0.203180237592353,0.202363714525955,0.20396716548771,0.205412522534698,0.204885570505838,0.201735186368487,0.199829771641696,0.203924162419897,0.213173923859123,0.221135350177003,0.222354612483207,0.216311175705738,0.203857529124282,0.189059282651913,0.179195166835087,0.179292120516932,0.184811112138843,0.189007860808978,0.189421680781059,0.187875481454371],[-0.00456780472318001,0.0161148749202248,0.035152744283584,0.0617408273533381,0.0960328548021177,0.129141263988743,0.153846414021606,0.171172749844561,0.184250152549762,0.197230041086307,0.215285372260236,0.240714743303262,0.264141654392061,0.273778057962538,0.269502116479285,0.262858202435585,0.263266344191042,0.269620287478599,0.278215275521839
,0.28610711893243,0.29111219571012,0.291611942233209,0.288115755804973,0.281735341820846,0.275199716477354,0.272855206772115,0.276772608953709,0.279639324830909,0.272228571166917,0.255241826705762,0.239308832172306,0.232761966132105,0.23301750425598,0.234863600825815,0.23540920982273,0.234084084927079,0.231119034228133,0.229545466111812,0.23291543691702,0.240652454195423,0.248051476711448,0.25073177323223,0.246327861020472,0.233237750290338,0.215832110290119,0.204454269302239,0.206439741046498,0.216054057681938,0.224544341762145,0.228886688889647,0.231786167715914],[-0.00268331461816891,0.0160327747314867,0.0331828317842279,0.0592276422063684,0.0947403483065957,0.130406449035972,0.158293537808086,0.177717176035795,0.189965437082023,0.200018353165535,0.216547915360938,0.244563014165053,0.273778057962538,0.290572798312891,0.292573151941868,0.288651200741483,0.286737733825309,0.289042514381638,0.296222610813786,0.30611901054109,0.313756619998902,0.315175948630781,0.310969671127717,0.302955086674437,0.294940907638863,0.292727259572112,0.299164990782613,0.304961507880513,0.298318983747391,0.279644005272815,0.261547573354341,0.253979104530278,0.254119171117953,0.25607802348289,0.256769361369846,0.255910333902806,0.253830406873757,0.252820756332902,0.25554153363934,0.261859488272072,0.268847967829996,0.273294804653498,0.271331579071523,0.259269362595121,0.241320972989505,0.22960097427405,0.232798630393512,0.244527788254317,0.254931804696063,0.261016412991644,0.266649722847255],[0.000485556784993114,0.0135243329261398,0.0272677591173969,0.0532303001501739,0.0913260396462496,0.129868680057772,0.158999056330915,0.178183990982668,0.189370893621324,0.197936248422864,0.212685614130957,0.239192032493299,0.269502116479285,0.292573151941868,0.304084866530968,0.306439429693472,0.303057606677035,0.300265425619857,0.30480328575577,0.316236859897483,0.326957094436583,0.33064485798658,0.327388486556694,0.318988840468754,0.309791901709214,0.306688773929157,0.313271127157899,0.31949026708
2992,0.312455188069513,0.29233381903075,0.272353023428209,0.263086482662052,0.262224030835217,0.264331737959722,0.266211221915455,0.266899648450175,0.266091910995561,0.265725022785431,0.268205597825174,0.273848879004003,0.280878738094951,0.286898206164275,0.28739066816697,0.27767308912116,0.261422740636704,0.250677200915191,0.254094213698673,0.26535821852726,0.274730759985274,0.279843297835096,0.285697207017158],[0.00291483970576381,0.0116890454110662,0.0234183102217875,0.049692855274256,0.0895748831718718,0.129598577985106,0.158607120297858,0.176858462579558,0.18755764430121,0.196057767628283,0.20985999742071,0.233947541377799,0.262858202435585,0.288651200741483,0.306439429693472,0.314389455940099,0.312704592994684,0.309204068220854,0.313232618676246,0.325430199741382,0.337731985119672,0.343248592386973,0.341306289683202,0.332931048721084,0.322502032644519,0.317751596028588,0.322945736593217,0.327901860910561,0.319408078209404,0.29765805792069,0.276251029677522,0.266088035103276,0.26494596200812,0.26741130770582,0.270155472518723,0.271934759777765,0.272178336907421,0.27264423587875,0.275594391087112,0.281489521543979,0.288989130876933,0.296033547649337,0.297999655017049,0.289977868944833,0.275209671095333,0.265087608829077,0.267839041596672,0.277613000565924,0.285352127040801,0.289202483083125,0.294513551512573],[0.00310419846129833,0.0127990751660581,0.026004021647384,0.0528918336035986,0.0921893839315523,0.131177622726623,0.159403311653335,0.177467053922628,0.188800283197836,0.198383944655527,0.212748494985508,0.236123607806666,0.263266344191042,0.286737733825309,0.303057606677035,0.312704592994684,0.317105958549478,0.320795024198311,0.328855480815721,0.341169712939617,0.352418798771269,0.35791411030228,0.356392769700643,0.347634698529288,0.335706140920473,0.328959663575778,0.332178669150744,0.33512445671338,0.324411072215951,0.300642835576825,0.278414340680476,0.269212797906219,0.26966279248818,0.27282133348202,0.275151272423422,0.276521303328098,0.2772529847698
34,0.278887013123408,0.283196363791438,0.290286387461154,0.298594810104402,0.305911837133245,0.307727613410256,0.299305263018981,0.283911780284874,0.2728180297753,0.274256187773986,0.282911699996743,0.290363130038896,0.294624453490113,0.300145057934403],[0.00197754095554643,0.0153004431188909,0.0319101495611166,0.0596210061121686,0.0971309463344363,0.134021491522753,0.16170471194128,0.180500336734544,0.193037568943746,0.203844472671486,0.219347973081767,0.243449752785962,0.269620287478599,0.289042514381637,0.300265425619857,0.309204068220854,0.320795024198311,0.334779962483662,0.349582114419545,0.363171397029049,0.37306441301572,0.377071908776566,0.374260042755413,0.36420041359492,0.350871125241779,0.342656786946715,0.344349675078645,0.345606385730735,0.332936500117839,0.30749841883916,0.285099361878255,0.277894716559139,0.280884918805161,0.285014172433834,0.286574313382245,0.287204809707052,0.288632814369124,0.292006196968851,0.298266092384556,0.306929616209746,0.316089864753118,0.323293649031453,0.324150294421663,0.314137195281634,0.296837874960971,0.283941985800991,0.2841000136905,0.292396017144489,0.300764836732139,0.306716257668493,0.313335009814453],[0.00105330118737587,0.0167891143432511,0.0363122500732657,0.0649634904915878,0.10129314750544,0.137061062815098,0.165336738878575,0.185609272525172,0.199106383544097,0.210131452167584,0.226063519070894,0.251354795250806,0.278215275521839,0.296222610813786,0.30480328575577,0.313232618676246,0.328855480815721,0.349582114419545,0.370984151742953,0.389094311990697,0.400406401317044,0.402341684864012,0.395951074689911,0.383309110744288,0.369163304706409,0.360934139985263,0.362547165055812,0.36348332400828,0.350264985949383,0.324423770577444,0.302500548182388,0.297055887456202,0.3021188462484,0.307373832604049,0.309134496410136,0.310343729395571,0.313679062288349,0.319603670188119,0.327992114572648,0.337913388479374,0.347630916505408,0.354837546540918,0.35508391850748,0.3437550001667,0.324742995420216,0.310447344309692,
0.3101171483734,0.318920635807337,0.328733235184467,0.336661005466127,0.345040636002025],[0.00158461271469904,0.0173100860678814,0.0380973901937379,0.0676832204828881,0.104270865706126,0.140530208014422,0.170135419462684,0.191740163307645,0.205431500374914,0.215632561163899,0.231102545847802,0.257347545901448,0.28610711893243,0.30611901054109,0.316236859897483,0.325430199741382,0.341169712939617,0.363171397029049,0.389094311990697,0.413877394668567,0.429739458769652,0.431049389052446,0.421249419374843,0.406064538201852,0.39155133264059,0.384097988440287,0.386666813598334,0.388458935775705,0.375894171990615,0.350548981412123,0.329156465360131,0.324417648570495,0.330269311677518,0.336262693460866,0.338975973627527,0.342014272811809,0.348387788586863,0.357504949598187,0.367888852076066,0.378410118034793,0.388286895272678,0.39594582808551,0.396587107492231,0.385004366348378,0.365192506093068,0.350347696748992,0.350321470200624,0.359986585156321,0.370722880057251,0.379700972961929,0.389882261546216],[0.00455930865447297,0.0193573876765801,0.0398641478320654,0.0702249458765112,0.108360280334708,0.145965791500655,0.175947389971655,0.197134651598366,0.210025584183091,0.219431055209156,0.234474791840914,0.261016807203268,0.29111219571012,0.313756619998902,0.326957094436583,0.337731985119672,0.352418798771269,0.37306441301572,0.400406401317044,0.429739458769652,0.450915402098339,0.456227225263344,0.448679218709221,0.43399327559022,0.418806196598807,0.410669689965659,0.413411559059499,0.415804873446741,0.40353871876728,0.37793526971434,0.355949790034467,0.350500579454025,0.355639283522414,0.361021505897904,0.363677148260339,0.368010410311134,0.377389876800323,0.389759186059976,0.401796610629334,0.412270451715893,0.422039039195408,0.430764527316019,0.432960884568757,0.422295336741163,0.402562095543856,0.387682358610529,0.387994106900862,0.39774123973533,0.407466892609289,0.415137602465859,0.426143307695915],[0.0102627510155511,0.0245339491722834,0.0435528802177005,0.07446210220
49338,0.115145590152833,0.154228736045967,0.182424713480816,0.200408010855296,0.211435296519902,0.220875817626191,0.236211400126787,0.262109406739399,0.291611942233209,0.315175948630781,0.33064485798658,0.343248592386973,0.35791411030228,0.377071908776566,0.402341684864012,0.431049389052446,0.456227225263344,0.471713187952435,0.475315172245749,0.466116373399275,0.44970496080387,0.438174077985257,0.439084189309815,0.440733742656996,0.427329602525772,0.399728810463848,0.375438585067662,0.367819908526032,0.370753134432285,0.373607043309506,0.373997849489237,0.377789201111481,0.389187998900115,0.404339817600011,0.417561860121391,0.427471401556844,0.436985789181323,0.447303940848557,0.452003218086965,0.443239184389206,0.424289997295981,0.409556408396027,0.409624129776975,0.41788435133975,0.423866209365925,0.427140109465569,0.437317726577243],[0.0162165565814951,0.0304523843988157,0.0478255817033191,0.0793427260471338,0.122893279873863,0.163249590518587,0.188439315162361,0.201792587780331,0.210414957220078,0.220257272829271,0.236115684455659,0.260642534865696,0.288115755804973,0.310969671127718,0.327388486556694,0.341306289683202,0.356392769700643,0.374260042755413,0.395951074689911,0.421249419374843,0.448679218709221,0.475315172245749,0.492653238320889,0.491359229692647,0.475083372807843,0.460460307802116,0.458942148751755,0.458992746593247,0.444068798352341,0.414644974161663,0.388213917260719,0.378121765487047,0.378168987573464,0.377661321888484,0.374913305773777,0.377248275544171,0.389748533039457,0.406965494551783,0.421086008099859,0.430464443312404,0.439622691428143,0.450989028425408,0.457544656787058,0.450414823856973,0.432358481381503,0.417768285509471,0.417094601932727,0.42299538157567,0.424227190249541,0.422121121075806,0.430582794486151],[0.0193760713693298,0.0338539590040724,0.0506029719293355,0.0831343126365808,0.129099970422491,0.170292721388971,0.192711088947525,0.201934709781931,0.208268791568572,0.218115273498233,0.233972826276096,0.256757706272572,0.28173
5341820845,0.302955086674437,0.318988840468754,0.332931048721084,0.347634698529288,0.36420041359492,0.383309110744288,0.406064538201852,0.43399327559022,0.466116373399275,0.491359229692647,0.496771232159396,0.484303531324282,0.470809040547772,0.46829506710943,0.466794883037271,0.451513381679334,0.422953449440548,0.396915965782739,0.38544867037553,0.382926223188107,0.379686126428383,0.374831462072606,0.376230891865352,0.389241041266658,0.407645941943051,0.422630307690389,0.432136955591859,0.440866806017986,0.451495296269285,0.457420609055407,0.450202304850737,0.432455846616029,0.417852599798406,0.416199279096981,0.419921230569029,0.417696722356713,0.411964100245248,0.418921787662896],[0.0195639488111623,0.0343832480444693,0.0516070386956147,0.0852729502426609,0.132632271361622,0.174149949926465,0.194949003168609,0.201704089367035,0.20625481507554,0.215312661145223,0.230460982724484,0.251901309402189,0.275199716477354,0.294940907638863,0.309791901709214,0.322502032644519,0.335706140920473,0.350871125241779,0.369163304706409,0.39155133264059,0.418806196598807,0.44970496080387,0.475083372807843,0.484303531324282,0.4784961826378,0.470560720604065,0.469663561445729,0.467992489536655,0.454029492570107,0.428650367716082,0.40512472162131,0.393582438634855,0.389439068788739,0.384767773448362,0.379374196969909,0.38079646670035,0.394092004671593,0.413050632004732,0.428916568775937,0.439173916265262,0.447546656956644,0.456126506677243,0.459390917895988,0.450343846596568,0.431880843911928,0.416789056124265,0.414162394520702,0.416601552990779,0.413172633257409,0.406482117247161,0.41267686709026],[0.0194701497541815,0.0345881359849426,0.0523610379614717,0.0863633201706759,0.133726780892375,0.175140552199302,0.195851100558978,0.202193824141059,0.205584833476341,0.21303526611469,0.227151286625036,0.248824606448451,0.272855206772115,0.292727259572112,0.306688773929156,0.317751596028588,0.328959663575778,0.342656786946715,0.360934139985263,0.384097988440287,0.410669689965659,0.43817407
7985257,0.460460307802116,0.470809040547772,0.470560720604065,0.469115575603143,0.472779236162325,0.474077243404792,0.462592335719025,0.439458303186664,0.417359987582511,0.406044181113312,0.401496599158257,0.396577111193187,0.391377322675403,0.393220575043668,0.406984823385701,0.426508784170891,0.443127157996766,0.454057902463434,0.462402232173585,0.469814678020682,0.471139860930387,0.459943837840332,0.439542676278332,0.423002454362375,0.419638658976955,0.422068758056881,0.419303690994989,0.413417849679406,0.41954907849355],[0.0210337796708257,0.0363282575780583,0.0538230640251236,0.0866778286903351,0.132535733834645,0.173523514301647,0.195717042964027,0.203687257405181,0.206553546822989,0.211703065462037,0.224790732613606,0.248855362704169,0.276772608953709,0.299164990782613,0.313271127157899,0.322945736593217,0.332178669150744,0.344349675078645,0.362547165055812,0.386666813598334,0.413411559059499,0.439084189309815,0.458942148751755,0.46829506710943,0.469663561445729,0.472779236162324,0.48360904363182,0.491807243355981,0.483588029512162,0.459842077823593,0.436136545559384,0.424554952905488,0.420693840142059,0.4163763933165,0.411438785581337,0.413730177194226,0.428485347922813,0.449087992599913,0.466245834218447,0.4772629818291,0.486039930540316,0.494647711593813,0.49669230318913,0.484169386589381,0.460711868512451,0.44158988113099,0.437839440090394,0.441536229630547,0.440592365552886,0.436179795156215,0.442730297236699],[0.0226918603753865,0.0382053253515058,0.0543773005126641,0.0849565483575114,0.128635652589831,0.169051018118542,0.19308095573795,0.203189865661295,0.205795268916695,0.20861645111564,0.220674462452998,0.247400421292272,0.279639324830909,0.304961507880513,0.319490267082992,0.327901860910561,0.33512445671338,0.345606385730735,0.36348332400828,0.388458935775705,0.415804873446741,0.440733742656996,0.458992746593247,0.466794883037271,0.467992489536655,0.474077243404792,0.491807243355981,0.507677692879297,0.503896944557904,0.480587427684073,0.45578564825
9172,0.44368092602074,0.439958344049395,0.436039663502974,0.4316896020926,0.435015834082681,0.451340200927786,0.473390263877326,0.491077480554345,0.501960036355071,0.511242844448911,0.521749517586555,0.525448601144149,0.512201578850094,0.485824153938282,0.464086249148102,0.459953954192522,0.464751205895275,0.46499924937509,0.461199799648683,0.467835041630759],[0.0221081111780568,0.0380840687391034,0.0518705141914604,0.0796664880504596,0.121538220159228,0.161419129688686,0.186158898282678,0.197077963680589,0.199235499638045,0.200474760734835,0.211423082374906,0.238677363741663,0.272228571166917,0.298318983747391,0.312455188069513,0.319408078209404,0.324411072215951,0.332936500117839,0.350264985949383,0.375894171990615,0.40353871876728,0.427329602525772,0.444068798352341,0.451513381679334,0.454029492570107,0.462592335719025,0.483588029512162,0.503896944557905,0.506227456261832,0.48949010008663,0.46879757155209,0.456414732531753,0.450397343347703,0.445509862382192,0.442519137322143,0.448194405159151,0.466632497873868,0.490196733611043,0.508662304073023,0.5197451973432,0.529102197886016,0.539854500427431,0.543599932673212,0.529687626975937,0.502006082931013,0.478983167376673,0.474042569654484,0.478062683511896,0.476929702418906,0.471344663130109,0.476823445684704],[0.0192972748291135,0.0356116588086776,0.0465888704428848,0.0718051109167692,0.112574159748106,0.151947374787411,0.176003247922498,0.186104880523813,0.187544239635493,0.188090698271463,0.197991035415441,0.223596057395594,0.255241826705762,0.279644005272815,0.29233381903075,0.29765805792069,0.300642835576825,0.30749841883916,0.324423770577444,0.350548981412123,0.37793526971434,0.399728810463848,0.414644974161663,0.422953449440548,0.428650367716082,0.439458303186664,0.459842077823594,0.480587427684073,0.48949010008663,0.484230845366375,0.472375416874993,0.460698163707857,0.451088686975665,0.444296767638183,0.442943624095666,0.451521912188858,0.47214524953392,0.497085633467737,0.516481498387758,0.527988647690225,
0.53678025377018,0.54578138088254,0.547617292200202,0.532915275809014,0.505573049014237,0.48275875834055,0.476829521694732,0.478590441778143,0.474134533299673,0.465100874572452,0.468674607513855],[0.0166251175189173,0.0322177082620627,0.0412579332623144,0.0649003343733642,0.104893234765713,0.143578738926672,0.166502699904055,0.175400959541291,0.176135990325352,0.176390333969868,0.185666601415132,0.20975865541166,0.239308832172306,0.261547573354341,0.272353023428209,0.276251029677522,0.278414340680476,0.285099361878255,0.302500548182388,0.329156465360131,0.355949790034467,0.375438585067662,0.388213917260719,0.396915965782739,0.40512472162131,0.417359987582511,0.436136545559384,0.455785648259172,0.46879757155209,0.472375416874993,0.468435111231379,0.459422170015031,0.449080840343203,0.441476994009423,0.440606162951622,0.450393539252134,0.472361361980421,0.498432291172835,0.518457243656538,0.529834621887771,0.53751031395185,0.54443319605555,0.544251490695122,0.528812802680059,0.502029288859946,0.479877658124455,0.473702912379025,0.474225042628651,0.467709999290762,0.45657461194633,0.459386589339875],[0.0157786309103451,0.0291246356267592,0.0380833303825961,0.0616710344942862,0.1008406212425,0.138481517872149,0.160623618434228,0.169000153079858,0.169354995508846,0.169382638465077,0.178728193736514,0.203180237592353,0.232761966132105,0.253979104530278,0.263086482662052,0.266088035103276,0.269212797906219,0.277894716559139,0.297055887456202,0.324417648570495,0.350500579454025,0.367819908526032,0.378121765487047,0.38544867037553,0.393582438634855,0.406044181113312,0.424554952905488,0.44368092602074,0.456414732531752,0.460698163707857,0.459422170015031,0.455349888117668,0.449872476135279,0.444155568067038,0.442066619777388,0.450174908996043,0.47205200752875,0.498808781421859,0.518817411678937,0.529109381436008,0.535375475961557,0.541203508738676,0.540263440520645,0.524279252988556,0.497245062359597,0.475425119387053,0.470333511356097,0.472349735238402,0.467288600496787,0.45
7500998193171,0.461873900988569],[0.015320834914587,0.0264160928835393,0.0367667012492371,0.0611598429786833,0.0992267908756421,0.135442907580638,0.15714977087869,0.165789629595956,0.166439337958515,0.166692815903987,0.176661049081538,0.202363714525955,0.23301750425598,0.254119171117953,0.262224030835217,0.26494596200812,0.26966279248818,0.280884918805161,0.3021188462484,0.330269311677518,0.355639283522414,0.370753134432285,0.378168987573464,0.382926223188107,0.389439068788739,0.401496599158257,0.420693840142059,0.439958344049395,0.450397343347703,0.451088686975665,0.449080840343203,0.449872476135279,0.451376901968089,0.44988736596414,0.448051597036438,0.454871804888492,0.475976224283362,0.502392063128317,0.521892469247847,0.531363112983211,0.536802187192439,0.542042016419592,0.540666665428192,0.524246217019632,0.496956662604507,0.475579902202605,0.472015999154859,0.476445338154734,0.474326860754641,0.467414468485694,0.473757022857713],[0.013141952079488,0.0239139417559029,0.0364074277070802,0.0615355063179176,0.098035543560513,0.132469174994164,0.153889591090921,0.1634337203065,0.165382190838407,0.166839231539204,0.177732671916507,0.20396716548771,0.234863600825815,0.25607802348289,0.264331737959722,0.26741130770582,0.27282133348202,0.285014172433834,0.307373832604049,0.336262693460866,0.361021505897904,0.373607043309506,0.377661321888484,0.379686126428384,0.384767773448362,0.396577111193187,0.4163763933165,0.436039663502974,0.445509862382192,0.444296767638183,0.441476994009423,0.444155568067038,0.44988736596414,0.454087083196024,0.457858625031747,0.467995106514124,0.488749400593422,0.513223984062976,0.532025928598614,0.54248276607584,0.548642488569836,0.552915213701676,0.549990785589102,0.533117094808669,0.506768673133801,0.486646693534791,0.483833995147439,0.488987622430972,0.488309143092758,0.482972478937609,0.489123905867787],[0.00962070316816562,0.0219667756937601,0.0366508994378863,0.0620058163799043,0.0965617776673574,0.129046543615972,0.150268773980867,0.16
1065507288953,0.164904232232213,0.168089613361412,0.179763491086095,0.205412522534698,0.23540920982273,0.256769361369846,0.266211221915455,0.270155472518723,0.275151272423422,0.286574313382245,0.309134496410136,0.338975973627527,0.363677148260339,0.373997849489237,0.374913305773777,0.374831462072606,0.379374196969909,0.391377322675403,0.411438785581337,0.4316896020926,0.442519137322143,0.442943624095666,0.440606162951622,0.442066619777388,0.448051597036438,0.457858625031747,0.471457839062233,0.489491979422758,0.511752752149262,0.53451075146009,0.55327142593128,0.566188267317039,0.574062810549737,0.576978154350557,0.57168312334756,0.554459777554331,0.530026666886978,0.511538831164126,0.508179224538887,0.51177524286709,0.510371975484521,0.50481592197545,0.508754992172626],[0.00762430715900963,0.0214499198734321,0.0376885139603343,0.0628176104091112,0.0954114900412902,0.12614119351053,0.147330275514936,0.159286823481326,0.164453645143262,0.168461052099373,0.180126859813348,0.204885570505838,0.234084084927079,0.255910333902806,0.266899648450175,0.271934759777765,0.276521303328098,0.287204809707052,0.310343729395571,0.342014272811809,0.368010410311134,0.377789201111481,0.377248275544171,0.376230891865352,0.38079646670035,0.393220575043668,0.413730177194226,0.435015834082681,0.448194405159151,0.451521912188858,0.450393539252133,0.450174908996043,0.454871804888492,0.467995106514124,0.489491979422758,0.515745875129392,0.543133321982381,0.568622932017401,0.58939272141059,0.604032520366637,0.612543973118782,0.614574451294114,0.608108703220587,0.591006647618568,0.567817206203912,0.549778303687962,0.54475151836718,0.545929738009707,0.54330405084826,0.537569126338575,0.540123215159413],[0.0091589006740241,0.0229121181584461,0.0396474612639257,0.0643008086961966,0.095367918350293,0.124904427310549,0.146229194592453,0.158833987946666,0.163936442039167,0.166954905361007,0.177508440228303,0.201735186368487,0.231119034228132,0.253830406873757,0.266091910995561,0.272178336907421,0.277
252984769834,0.288632814369124,0.313679062288349,0.348387788586863,0.377389876800323,0.389187998900115,0.389748533039457,0.389241041266658,0.394092004671593,0.406984823385701,0.428485347922813,0.451340200927786,0.466632497873868,0.47214524953392,0.472361361980421,0.47205200752875,0.475976224283362,0.488749400593422,0.511752752149262,0.543133321982381,0.579891489816235,0.615003446952348,0.640693227552964,0.655024335003,0.661899741910047,0.66410144349512,0.658911965568111,0.642567491498452,0.618860277638933,0.599138653325696,0.5916170129157,0.590659911199867,0.587575328027475,0.58300659709459,0.586932405944075],[0.0123535086113619,0.0255041836435166,0.0423031433367703,0.0669369601728294,0.097695246427747,0.126970654171923,0.148287939051957,0.160725829315896,0.164963430649948,0.166553187359872,0.175920882370973,0.199829771641696,0.229545466111812,0.252820756332902,0.265725022785431,0.27264423587875,0.278887013123408,0.292006196968851,0.319603670188119,0.357504949598187,0.389759186059976,0.404339817600011,0.406965494551783,0.407645941943052,0.413050632004732,0.426508784170891,0.449087992599913,0.473390263877326,0.490196733611043,0.497085633467737,0.498432291172835,0.498808781421859,0.502392063128317,0.513223984062976,0.53451075146009,0.568622932017401,0.615003446952348,0.661529964350107,0.693685379505064,0.708373019034013,0.713916640876153,0.716660320288563,0.713043239830729,0.697407018116191,0.672802436012822,0.650989436642837,0.640775220956778,0.637641384304702,0.634093496532438,0.630810773091416,0.636646075038659],[0.0144475321088493,0.0280467134870814,0.0453253497513456,0.0711766273093092,0.103646726979766,0.133934823992636,0.154740975922443,0.165959553302808,0.169454935750345,0.171034412275775,0.180448181283924,0.203924162419897,0.23291543691702,0.25554153363934,0.268205597825174,0.275594391087112,0.283196363791438,0.298266092384556,0.327992114572648,0.367888852076066,0.401796610629334,0.417561860121391,0.421086008099859,0.422630307690389,0.428916568775937,0.443127
157996766,0.466245834218447,0.491077480554345,0.508662304073023,0.516481498387758,0.518457243656538,0.518817411678937,0.521892469247847,0.532025928598614,0.55327142593128,0.58939272141059,0.640693227552964,0.693685379505064,0.731958672093633,0.75117309266341,0.759059121077132,0.761790173772067,0.757212197393526,0.741394265942365,0.717420333900737,0.695389236232093,0.68298265091228,0.676993648752599,0.671773520528878,0.668122107397169,0.673287800893748],[0.01451241822894,0.029803440233887,0.0479775655285843,0.0758858192786237,0.111458756334261,0.143706461230417,0.163838107212786,0.173410608511967,0.176647601384842,0.179607818029667,0.19018508658607,0.213173923859123,0.240652454195423,0.261859488272072,0.273848879004003,0.281489521543979,0.290286387461154,0.306929616209746,0.337913388479374,0.378410118034794,0.412270451715893,0.427471401556844,0.430464443312404,0.432136955591859,0.439173916265262,0.454057902463434,0.4772629818291,0.501960036355071,0.5197451973432,0.527988647690225,0.529834621887771,0.529109381436008,0.531363112983211,0.54248276607584,0.566188267317039,0.604032520366637,0.655024335003,0.708373019034013,0.751173092663411,0.778226097529673,0.792040596706031,0.795054719943551,0.788084512150767,0.771682049115599,0.749655104696312,0.72906715082152,0.715289796331835,0.7063609893558,0.698680729923973,0.693144760907573,0.695144568018487],[0.0134516599587157,0.0304812318149881,0.0491169711378746,0.0783459919564488,0.11635087444094,0.150509664902736,0.170844469911743,0.179832686438706,0.183094390867664,0.186913366283649,0.198237102472678,0.221135350177003,0.248051476711448,0.268847967829996,0.280878738094951,0.288989130876933,0.298594810104402,0.316089864753118,0.347630916505408,0.388286895272678,0.422039039195408,0.436985789181323,0.439622691428143,0.440866806017986,0.447546656956644,0.462402232173585,0.486039930540316,0.511242844448911,0.529102197886016,0.53678025377018,0.53751031395185,0.535375475961557,0.536802187192439,0.548642488569836,0.574062810549737,0.
612543973118782,0.661899741910047,0.713916640876153,0.759059121077132,0.792040596706031,0.811823445088911,0.818042298701227,0.812281860505905,0.796706957442559,0.77561236261166,0.755422795274529,0.741041135892844,0.730912484310399,0.722030820366355,0.715146602345967,0.714766766980591],[0.012184664428522,0.0298735894418936,0.0478109208618154,0.0764513991188953,0.114697704490176,0.149900457099137,0.171861305927171,0.182195069327086,0.185548505510878,0.188377265566261,0.198945893456674,0.222354612483207,0.25073177323223,0.273294804653498,0.286898206164275,0.296033547649337,0.305911837133245,0.323293649031453,0.354837546540918,0.39594582808551,0.430764527316019,0.447303940848557,0.450989028425408,0.451495296269285,0.456126506677243,0.469814678020682,0.494647711593813,0.521749517586555,0.539854500427431,0.545781380882539,0.54443319605555,0.541203508738676,0.542042016419592,0.552915213701676,0.576978154350557,0.614574451294114,0.66410144349512,0.716660320288563,0.761790173772067,0.795054719943551,0.818042298701227,0.831782934255224,0.834025627384994,0.821986847722224,0.799500733226275,0.777019090184183,0.7626638177211,0.754086287475037,0.746539338907004,0.740315104458059,0.740745009549303],[0.0113758786592171,0.0280434804242242,0.0440381258969864,0.0708230606303023,0.108018873309467,0.143403495752272,0.166853349020735,0.178698403652472,0.182030825701433,0.183365368612479,0.192639373221394,0.216311175705738,0.246327861020472,0.271331579071523,0.28739066816697,0.297999655017049,0.307727613410256,0.324150294421663,0.35508391850748,0.396587107492231,0.432960884568757,0.452003218086965,0.457544656787058,0.457420609055407,0.459390917895988,0.471139860930387,0.49669230318913,0.525448601144149,0.543599932673212,0.547617292200202,0.544251490695122,0.540263440520645,0.540666665428192,0.549990785589103,0.57168312334756,0.608108703220587,0.65891196556811,0.713043239830729,0.757212197393526,0.788084512150767,0.812281860505905,0.834025627384994,0.846204553107233,0.839402248954447,0.816
347064838136,0.79191208929948,0.777821061626795,0.771088195639605,0.765329870251256,0.760592032978596,0.763350199941972],[0.0115914994526642,0.0251388192376269,0.0380544047895784,0.0627955601389712,0.099107724487911,0.134024472210158,0.1568316981457,0.168035338100033,0.171074994035733,0.172349353976725,0.181216192008275,0.203857529124282,0.233237750290338,0.259269362595121,0.27767308912116,0.289977868944833,0.299305263018981,0.314137195281635,0.3437550001667,0.385004366348379,0.422295336741163,0.443239184389206,0.450414823856973,0.450202304850737,0.450343846596568,0.459943837840332,0.484169386589381,0.512201578850094,0.529687626975937,0.532915275809014,0.528812802680059,0.524279252988556,0.524246217019632,0.533117094808669,0.554459777554332,0.591006647618568,0.642567491498452,0.697407018116191,0.741394265942365,0.771682049115599,0.79670695744256,0.821986847722224,0.839402248954446,0.837968384106799,0.81921307224918,0.79717672855382,0.783171392528298,0.775419497351596,0.769108763063164,0.764893743882469,0.768895828209003],[0.0126039892034018,0.021711875112217,0.0312487428714465,0.0544365433170178,0.0904693162179292,0.124513190276249,0.144658773502766,0.153134227850431,0.155719029247019,0.158581939873738,0.168281008165233,0.189059282651913,0.215832110290119,0.241320972989505,0.261422740636704,0.275209671095333,0.283911780284874,0.296837874960971,0.324742995420216,0.365192506093068,0.402562095543856,0.424289997295981,0.432358481381503,0.432455846616029,0.431880843911928,0.439542676278332,0.460711868512451,0.485824153938282,0.502006082931013,0.505573049014237,0.502029288859946,0.497245062359597,0.496956662604507,0.5067686731338,0.530026666886978,0.567817206203912,0.618860277638933,0.672802436012822,0.717420333900737,0.749655104696312,0.77561236261166,0.799500733226275,0.816347064838136,0.81921307224918,0.808757619221069,0.793236718475193,0.77965794064671,0.768713844728515,0.759639720544742,0.754464983981018,0.758013176984464],[0.0133920758986437,0.0187192720325778,0.026
1432922601097,0.0485467178603732,0.084320421295265,0.117363566209445,0.135048316540061,0.141159134579828,0.143505326205363,0.148096884178574,0.159144488146665,0.179195166835087,0.204454269302239,0.22960097427405,0.250677200915191,0.265087608829077,0.2728180297753,0.283941985800991,0.310447344309692,0.350347696748992,0.387682358610529,0.409556408396027,0.417768285509471,0.417852599798406,0.416789056124265,0.423002454362375,0.44158988113099,0.464086249148102,0.478983167376673,0.48275875834055,0.479877658124455,0.475425119387052,0.475579902202605,0.486646693534791,0.511538831164126,0.549778303687962,0.599138653325696,0.650989436642838,0.695389236232093,0.72906715082152,0.755422795274529,0.777019090184183,0.79191208929948,0.79717672855382,0.793236718475193,0.783864544186173,0.772508441505872,0.760737253457883,0.749743800435967,0.742878890922022,0.745651321486086],[0.0132433983079006,0.0169049313606271,0.0245335471295971,0.0470326094185766,0.0821118946249729,0.114287591739835,0.131476061460251,0.137488888247712,0.140040182937765,0.14520462632401,0.157417469158103,0.179292120516932,0.206439741046498,0.232798630393512,0.254094213698673,0.267839041596672,0.274256187773986,0.2841000136905,0.3101171483734,0.350321470200625,0.387994106900862,0.409624129776975,0.417094601932727,0.416199279096981,0.414162394520702,0.419638658976955,0.437839440090394,0.459953954192522,0.474042569654484,0.476829521694731,0.473702912379025,0.470333511356097,0.472015999154859,0.483833995147439,0.508179224538888,0.54475151836718,0.5916170129157,0.640775220956778,0.68298265091228,0.715289796331835,0.741041135892844,0.7626638177211,0.777821061626796,0.783171392528298,0.77965794064671,0.772508441505872,0.765873337377905,0.758848748779751,0.749517114130461,0.741708533689831,0.745000769558399],[0.0128192790963617,0.0160916414016255,0.0251112973955133,0.0480965264079801,0.0822810561400769,0.113914062663951,0.132143744571539,0.139624747338963,0.142402140335997,0.146838672902345,0.159614774347633,0.184811112
138843,0.216054057681938,0.244527788254317,0.26535821852726,0.277613000565924,0.282911699996743,0.292396017144489,0.318920635807337,0.359986585156321,0.39774123973533,0.41788435133975,0.42299538157567,0.419921230569029,0.416601552990779,0.422068758056881,0.441536229630547,0.464751205895275,0.478062683511897,0.478590441778143,0.474225042628651,0.472349735238402,0.476445338154734,0.488987622430972,0.51177524286709,0.545929738009707,0.590659911199867,0.637641384304702,0.676993648752599,0.7063609893558,0.730912484310399,0.754086287475036,0.771088195639605,0.775419497351596,0.768713844728515,0.760737253457883,0.758848748779751,0.75936507004516,0.756232969455926,0.752382638643508,0.759727708669011],[0.0130612176454187,0.0158522428499515,0.0258242178246622,0.0490696898156232,0.0825592190496153,0.114141378734128,0.133992284753065,0.143184656221075,0.145642027044141,0.148201202752072,0.1606122604985,0.189007860808978,0.224544341762145,0.254931804696063,0.274730759985274,0.285352127040801,0.290363130038896,0.300764836732139,0.328733235184467,0.370722880057251,0.407466892609289,0.423866209365925,0.424227190249541,0.417696722356713,0.413172633257409,0.419303690994989,0.440592365552886,0.46499924937509,0.476929702418906,0.474134533299673,0.467709999290762,0.467288600496787,0.474326860754641,0.488309143092758,0.510371975484521,0.54330405084826,0.587575328027474,0.634093496532438,0.671773520528878,0.698680729923973,0.722030820366355,0.746539338907004,0.765329870251256,0.769108763063164,0.759639720544742,0.749743800435966,0.749517114130462,0.756232969455926,0.764217043125277,0.773730616715519,0.790970576667741],[0.0134560954089633,0.01530729642547,0.025351634247369,0.0485475588451775,0.0817673680001241,0.113797207304149,0.135289287908722,0.145766370540687,0.147207893536936,0.147091220206974,0.158391638832914,0.189421680781059,0.228886688889647,0.261016412991644,0.279843297835096,0.289202483083125,0.294624453490113,0.306716257668493,0.336661005466127,0.379700972961929,0.415137602465
859,0.427140109465569,0.422121121075806,0.411964100245248,0.406482117247161,0.413417849679406,0.436179795156215,0.461199799648683,0.471344663130109,0.465100874572452,0.45657461194633,0.457500998193171,0.467414468485694,0.482972478937609,0.50481592197545,0.537569126338575,0.58300659709459,0.630810773091416,0.668122107397169,0.693144760907573,0.715146602345967,0.740315104458059,0.760592032978596,0.764893743882469,0.754464983981018,0.742878890922022,0.741708533689831,0.752382638643508,0.773730616715519,0.803127594121945,0.836494412362482],[0.0120361759133872,0.0131250828735328,0.0231045235574797,0.0463898309169458,0.0798161590756366,0.112638483329245,0.135629046967152,0.146947457223577,0.146945012119781,0.143904555096121,0.154040475012052,0.187875481454371,0.231786167715914,0.266649722847255,0.285697207017158,0.294513551512573,0.300145057934403,0.313335009814453,0.345040636002025,0.389882261546216,0.426143307695915,0.437317726577243,0.430582794486151,0.418921787662896,0.41267686709026,0.41954907849355,0.442730297236699,0.467835041630759,0.476823445684704,0.468674607513855,0.459386589339875,0.461873900988569,0.473757022857713,0.489123905867787,0.508754992172626,0.540123215159413,0.586932405944075,0.636646075038659,0.673287800893748,0.695144568018487,0.714766766980591,0.740745009549303,0.763350199941972,0.768895828209003,0.758013176984464,0.745651321486086,0.745000769558399,0.759727708669011,0.790970576667741,0.836494412362482,0.890690855519201]],"type":"surface","contours":{"z":{"show":true,"usecolormap":true,"highlightcolor":"#ff0000","project":{"z":true}}},"frame":null}],"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.2,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot","plotly_sunburstclick"],"base_url":"https://plot.ly"},"evals":[
],"jsHooks":[]}</script>
This covariance surface for the Brownian motion random walk we are using for data is interesting in its own right. A theoretical result says that the covariance function of this process is <span class="math inline">\(c(t,s) = \min(t,s)\)</span>, so the estimated surface should look roughly like <span class="math inline">\(\min(t,s)\)</span>. You can find the proof in the book by Kokoszka and Reimherr referenced below. Using the mouse to hover over some points in the graph makes this result seem plausible.</p>
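<p>The result can also be checked numerically. The following is a minimal sketch, not the code used to build the surface above: it assumes a standard Wiener process on a regular grid over [0, 1], and the names <code>n_paths</code>, <code>n_steps</code> and <code>t_grid</code> are illustrative choices.</p>
<pre class="r"><code>set.seed(123)
n_paths <- 5000   # number of simulated curves (assumed)
n_steps <- 50     # grid points per curve (assumed)
dt <- 1 / n_steps
t_grid <- (1:n_steps) * dt

# Independent N(0, dt) increments, one row per path
increments <- matrix(rnorm(n_paths * n_steps, sd = sqrt(dt)), nrow = n_paths)
# Cumulative sums along each row give the Brownian paths
paths <- t(apply(increments, 1, cumsum))

C_hat <- cov(paths)                      # sample covariance surface
C_theory <- outer(t_grid, t_grid, pmin)  # min(t, s) on the same grid
max_abs_err <- max(abs(C_hat - C_theory))
max_abs_err</code></pre>
<p>With more simulated paths, the maximum absolute error should shrink toward zero.</p>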
<div id="references" class="section level4">
<h4>References</h4>
<div id="books" class="section level5">
<h5>Books</h5>
<ul>
<li>Kokoszka, P. and Reimherr, M. (2017). <em>Introduction to Functional Data Analysis</em>. CRC.</li>
<li>Ramsay, J.O. and Silverman, B.W. (2005). <em>Functional Data Analysis</em>. Springer.</li>
<li>Ramsay, J.O., Hooker, G. and Graves, S. (2009). <em>Functional Data Analysis with R and MATLAB</em>. Springer.</li>
</ul>
</div>
<div id="online-resources" class="section level5">
<h5>Online Resources</h5>
<ul>
<li>Cao, J. (2019). <a href="https://www.youtube.com/watch?v=SUp_Nq8NwfE"><em>Functional Data Analysis Course</em></a></li>
<li>Staicu, A. and Park, Y. (2016) <a href="https://www4.stat.ncsu.edu/~staicu/FDAtutorial/"><em>Short Course on Applied Functional Data Analysis</em></a></li>
</ul>
</div>
<div id="recommended-papers" class="section level5">
<h5>Recommended Papers</h5>
<ul>
<li>Sørensen, H., Goldsmith, J. and Sangalli, L. (2013). <a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.5989"><em>An introduction with medical applications to functional data analysis</em></a>. Wiley.</li>
<li>Wang, J., Chiou, J. and Müller, H. (2015). <a href="https://arxiv.org/pdf/1507.05135.pdf"><em>Review of Functional Data Analysis</em></a></li>
<li>Yao, F., Müller, H. and Wang, J. (2005). <a href="https://anson.ucdavis.edu/~mueller/jasa03-190final.pdf"><em>Functional Data Analysis for Sparse Longitudinal Data</em></a>. JASA, 100(470).</li>
</ul>
</div>
</div>
Introduction to Functional Data Analysis with R
https://rviews.rstudio.com/2021/05/04/functional-data-analysis-in-r/
Tue, 04 May 2021 00:00:00 +0000https://rviews.rstudio.com/2021/05/04/functional-data-analysis-in-r/
<script src="/2021/05/04/functional-data-analysis-in-r/index_files/header-attrs/header-attrs.js"></script>
<p>Suppose you have data that looks something like this.</p>
<p><img src="/2021/05/04/functional-data-analysis-in-r/index_files/figure-html/unnamed-chunk-1-1.png" width="672" />
This plot might depict 80 measurements for a participant in a clinical trial where each data point represents the change in the level of some protein. Or it could represent any series of longitudinal data where the measurements are taken at irregular intervals. The curve looks like a time series with obvious correlations among the points, but there are not enough measurements to model the data with the usual time series methods. In a scenario like this, you might find <a href="https://en.wikipedia.org/wiki/Functional_data_analysis">Functional Data Analysis</a> (FDA) to be a viable alternative to the usual multi-level, mixed model approach.</p>
<p>This post is meant to be a “gentle” introduction to doing FDA with R for someone who is totally new to the subject. I’ll show some “first steps” code, but most of the post will be about providing background and motivation for looking into FDA. I will also point out some of the available resources that a newcomer to FDA should find helpful.</p>
<p>FDA is a branch of statistics that deals with data that can be conceptualized as a function of an underlying, continuous variable. The data in FDA are smooth curves (or surfaces) in time or space. To fix a mental model of this idea, first consider an ordinary time series. For example, you might think of the daily closing prices of your favorite stock. The data that make up a time series are the individual points which are considered to be random draws from an underlying stochastic process.</p>
<p>Now, go up a level of abstraction, and consider a space where the whole time series, or rather an imaginary continuous curve that runs through all of your data points, is the basic item of analysis. In this conceptual model, the curve comprises an infinite number of points, not just the few you observed. Moreover, unlike in basic time series analysis, the observed points do not need to be equally spaced, and the various curves that make up your data set do not need to be sampled at the same time points.</p>
<p>Mathematically, the curves are modeled as functions that live in an infinite dimensional vector space, what the mathematicians call a <a href="https://iopscience.iop.org/article/10.1088/1742-6596/839/1/012002/pdf">Hilbert Space</a>. One way to think of this is that you are dealing with the ultimate large p small n problem. Each curve has infinitely many points, not just the 3 or 30 or 3,000 you happen to have.</p>
<p>The theory of Hilbert Spaces is part of the area of mathematical analysis called <a href="https://en.wikipedia.org/wiki/Functional_analysis">Functional Analysis</a>, a subject usually introduced as part of a second or third course in mathematical analysis, or perhaps in a course on <a href="https://quantum.phys.cmu.edu/QCQI/qitd114.pdf">Quantum Mechanics</a>.</p>
<p>It would be a heavy lift to expect someone new to Functional Data Analysis to start with the mathematics. Fortunately, this is not really necessary. The practical applications of FDA and the necessary supporting software have been sufficiently developed so that anyone familiar with the basics of ordinary vector spaces should have sufficient background to get started. The salient points to remember are:</p>
<ul>
<li>Hilbert space is an infinite dimensional linear vector space</li>
<li>The vectors in Hilbert space are functions</li>
<li>The inner product of two functions in the Hilbert space is defined as the integral of the product of the two functions, and it behaves very much like the familiar dot product.</li>
</ul>
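<p>To make the last point concrete, here is a minimal sketch of the inner product <span class="math inline">\(\langle f, g \rangle = \int f(t) g(t) dt\)</span> computed with base R’s <code>integrate()</code>. The helper name <code>inner_product()</code>, the interval [0, 1], and the two example functions are assumptions chosen for illustration.</p>
<pre class="r"><code># Inner product of two functions on [lower, upper] via numerical integration
inner_product <- function(f, g, lower = 0, upper = 1) {
  integrate(function(t) f(t) * g(t), lower, upper)$value
}

f <- function(t) sin(2 * pi * t)
g <- function(t) cos(2 * pi * t)

ip_fg <- inner_product(f, g)  # close to 0: f and g are orthogonal on [0, 1]
ip_ff <- inner_product(f, f)  # squared norm of f, equal to 1/2
c(ip_fg, ip_ff)</code></pre>
<p>Just as with the dot product, a zero inner product means the two “vectors” are orthogonal, and the inner product of a function with itself gives its squared length.</p>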
<p>Moreover, for the last twenty years or so mathematical statisticians have been writing R packages to put FDA within the reach of anyone with motivation and minimal R skills. The CRAN Task View on <a href="https://cran.r-project.org/view=FunctionalData">Functional Data Analysis</a> categorizes and provides brief explanations for forty packages that collectively cover most of the established work on FDA. The following graph built with functions from the <a href="https://cran.r-project.org/package=cranly"><code>cranly</code></a> package shows part of the network for two core FDA packages.</p>
<p><img src="fda.png" height = "600" width="100%"></p>
<p>The <a href="https://cran.r-project.org/package=fda"><code>fda</code></a> package emphasized in the network plot above is the logical place for an R user to begin investigating FDA. With thirty-two reverse depends, thirty-eight reverse imports and thirteen reverse suggests, <code>fda</code> is at the root of Functional Data Analysis software for R. Moreover, in a very real sense, it is at the root of modern FDA itself. <code>fda</code> was written to explicate the theory developed in the 2005 book by Ramsay and Silverman<span class="math inline">\(^{1}\)</span>. Kokoszka and Reimherr state that the first edition of this book, published in 1997, “is largely credited with solidifying FDA as an official subbranch of statistics” (p xiv)<span class="math inline">\(^{2}\)</span>. The <a href="https://cran.r-project.org/package=refund"><code>refund</code></a> package is used extensively throughout the book by Kokoszka and Reimherr.</p>
<div id="first-steps" class="section level3">
<h3>First Steps</h3>
<p>The synthetic data in the figure above were generated by a Wiener (Brownian motion) process, which, for the purposes of this post, is just a convenient way to generate a variety of reasonable-looking curves. We suppose that the data points shown represent noisy observations generated by a smooth curve <span class="math inline">\(f(t)\)</span>. We estimate this curve with the model: <span class="math inline">\(y_{i} = f(t_{i}) + \epsilon_{i}\)</span> where the <span class="math inline">\(\epsilon_{i}\)</span> are normally distributed with mean 0 and variance <span class="math inline">\(\sigma^{2}\)</span>.</p>
<p>Notice that the measurement times are randomly selected within the 100 day window and not uniformly spaced.</p>
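<pre class="r"><code>library(fda)      # create.bspline.basis(), eval.basis(), smooth.basis()
library(MASS)     # ginv()
library(dplyr)    # mutate() and the %>% pipe
library(ggplot2)  # plotting
set.seed(999)
n_obs <- 80
time_span <- 100
time <- sort(runif(n_obs,0,time_span))
Wiener <- cumsum(rnorm(n_obs)) / sqrt(n_obs)
y_obs <- Wiener + rnorm(n_obs,0,.05)
# Data frame used by the plotting code later in the post
df <- data.frame(time, Wiener, y_obs)</code></pre>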
<p>Remember that the task ahead is to represent the entire curve of infinitely many points and not just the handful of observed values. Here is where the linear algebra comes in. The curve is treated as a vector in an infinite dimensional vector space, and what we want is something that will serve as a basis for this curve projected down into the subspace where the measurements live. The standard way to do this for non-periodic data is to construct a <a href="https://en.wikipedia.org/wiki/B-spline">B-spline</a> basis. (B-splines or basis splines are splines designed to have properties that make them suitable for representing vectors.) The code that follows is mostly <em>borrowed</em> from Jiguo Cao’s YouTube Video Course<span class="math inline">\(^{3}\)</span> which I very highly recommend for anyone just starting with FDA. In his first five videos, Cao explains B-splines and the placement of knots in great detail and derives the formula used in the code to calculate the number of basis elements from the number of knots and the order of the splines.</p>
<p>Note that we are placing the knots at times equally spaced over the 100 day time span.</p>
<pre class="r"><code>times_basis = seq(0,time_span,1)
knots = c(seq(0,time_span,5)) #Location of knots
n_knots = length(knots) #Number of knots
n_order = 4 # order of basis functions: cubic bspline: order = 3 + 1
n_basis = length(knots) + n_order - 2;
basis = create.bspline.basis(c(min(times_basis),max(times_basis)),n_basis,n_order,knots)
n_basis</code></pre>
<pre><code>## [1] 23</code></pre>
<p>and there are 23 basis vectors.</p>
<p>Next, we use the function <code>eval.basis()</code> to evaluate the basis functions at the times where our data curve was observed. The matrix <code>PHI</code> contains the values of the 23 basis functions <span class="math inline">\(\phi_j(t)\)</span> evaluated at 80 points.</p>
<pre class="r"><code>PHI = eval.basis(time, basis)
dim(PHI)</code></pre>
<pre><code>## [1] 80 23</code></pre>
<p>We plot the basis functions and locations of the knots.</p>
<pre class="r"><code>matplot(time,PHI,type='l',lwd=1,lty=1, xlab='time',ylab='basis',cex.lab=1,cex.axis=1)
for (i in 1:n_knots)
{
abline(v=knots[i], lty=2, lwd=1)
}</code></pre>
<p><img src="/2021/05/04/functional-data-analysis-in-r/index_files/figure-html/unnamed-chunk-5-1.png" width="672" /></p>
<p>The plot shows that for interior points, four basis functions contribute to computing the value of any point. The endpoints, however, are computed from a single basis function.</p>
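<p>Both the basis count and the “four functions per interior point” observation can be checked without <code>fda</code>. The sketch below is an assumption-laden stand-in: it uses <code>splineDesign()</code> from the <code>splines</code> package that ships with R rather than <code>create.bspline.basis()</code>, and it builds the clamped knot vector by hand in the illustrative variable <code>aug_knots</code>.</p>
<pre class="r"><code>library(splines)  # ships with base R

knots <- seq(0, 100, 5)
n_order <- 4
# Repeat each boundary knot so that it appears n_order times in total
aug_knots <- c(rep(min(knots), n_order - 1), knots, rep(max(knots), n_order - 1))

x <- seq(0.5, 99.5, by = 1)   # interior points that avoid the knots
B <- splineDesign(aug_knots, x, ord = n_order)

ncol(B)                  # length(knots) + n_order - 2 basis functions
range(rowSums(B))        # the basis functions sum to 1 at every point
table(rowSums(B > 0))    # count of non-zero basis functions at each point</code></pre>
<p>Each row of <code>B</code> plays the same role as a row of <code>PHI</code> above: only the few basis functions whose support covers that time point are non-zero.</p>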
</div>
<div id="estimating-the-basis-coefficients" class="section level3">
<h3>Estimating the Basis Coefficients</h3>
<p>As in ordinary regression, we express the function in terms of the coefficients <span class="math inline">\(c_j\)</span> and basis functions <span class="math inline">\(\phi_j\)</span> using the formula: <span class="math inline">\(f(t) = \sum c_j \phi_j(t)\)</span>. Later we will see how to use built-in <code>fda</code> functions to estimate the coefficients, but now we follow Cao’s lead and calculate everything from first principles.</p>
<p>The following code uses the matrix least squares equation <span class="math inline">\(\hat{c} = (\Phi^t\Phi)^{-1} \Phi^{t}y\)</span> to estimate the coefficients.</p>
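<pre class="r"><code># Least squares estimate of the basis coefficients
# ginv() is the Moore-Penrose generalized inverse from the MASS package
M = MASS::ginv(t(PHI) %*% PHI) %*% t(PHI)
# Note: Wiener is the noise-free curve; use y_obs here to fit the noisy observations
c_hat = M %*% Wiener</code></pre>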
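<p>For readers who want to see the normal equations in isolation, here is a small self-contained sketch on made-up data. The design matrix <code>X</code>, the “true” coefficients, and the noise level are all assumptions for illustration; the point is only that the explicit formula agrees with R’s QR-based least squares solver, which is usually the numerically safer route.</p>
<pre class="r"><code>set.seed(42)
t_pts <- seq(0, 1, length.out = 20)
X <- cbind(1, t_pts, t_pts^2)                    # toy design matrix (assumed)
y <- X %*% c(1, -2, 3) + rnorm(20, sd = 0.1)     # toy response (assumed)

# Normal equations: solve (X'X) c = X'y
c_normal <- solve(crossprod(X), crossprod(X, y))
# Same least squares problem solved through a QR decomposition
c_qr <- qr.solve(X, y)

max_diff <- max(abs(as.vector(c_normal) - as.vector(c_qr)))
max_diff</code></pre>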
<p>We compute <span class="math inline">\(\hat{y}\)</span>, the estimates of our observed values, and plot.</p>
<pre class="r"><code>y_hat = PHI %*% c_hat
# Augment data frame for plotting
df <- df %>% mutate(y_hat = y_hat)
p2 <- df %>% ggplot() +
geom_line(aes(x = time, y = Wiener), col = "grey") +
geom_point(aes(x = time, y = y_obs)) +
geom_line(aes(x = time, y = y_hat), col = "red")
p2 + ggtitle("Original curve and least squares estimate") +
xlab("time") + ylab("f(time)")</code></pre>
<p><img src="/2021/05/04/functional-data-analysis-in-r/index_files/figure-html/unnamed-chunk-7-1.png" width="672" />
The gray curve in the plot represents the underlying Brownian motion process, the dots are the observed values (the same as in the first plot), and the red curve represents the least squares “smoothed” estimates.</p>
<p>Now, we work through the matrix calculations to estimate the variance of the noise and the error bars for <span class="math inline">\(\hat{y}\)</span>.</p>
<pre class="r"><code># estimate the variance of noise
## SSE = (Y - Xb)'(Y - Xb)
SSE = t(y_hat-y_obs)%*%(y_hat-y_obs)
sigma2 = SSE/(n_obs-n_basis)
# estimate the variance of the fitted curve
# H is the hat matrix
# H = X*inv(X'X)*X'
H = PHI %*% M
varYhat = diag(H %*% H * matrix(sigma2,n_obs,n_obs))
# 95% confidence interval
y_hat025 = y_hat-1.96*sqrt(varYhat)
y_hat975 = y_hat+1.96*sqrt(varYhat)</code></pre>
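<p>The variance calculation rests on two properties of the hat matrix: it is symmetric and idempotent, so <span class="math inline">\(Var(\hat{y}) = \sigma^2 H H^t = \sigma^2 H\)</span>, and its trace equals the number of fitted coefficients. The sketch below checks these properties on a small made-up design matrix; <code>X</code> and its dimensions are assumptions for illustration.</p>
<pre class="r"><code>set.seed(7)
X <- cbind(1, runif(15), runif(15))        # toy design matrix (assumed)
H <- X %*% solve(crossprod(X)) %*% t(X)    # hat matrix

idempotent_err <- max(abs(H %*% H - H))    # H %*% H equals H
symmetric_err  <- max(abs(H - t(H)))       # H equals its transpose
trace_H        <- sum(diag(H))             # number of columns of X

c(idempotent_err, symmetric_err, trace_H)</code></pre>
<p>This is why <code>diag(H %*% H)</code> in the code above gives the same values as <code>diag(H)</code>, and why the residual degrees of freedom are <code>n_obs - n_basis</code>.</p>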
<p>And, we plot. We have a satisfying smoothed representation of our original curve that looks like it would be an adequate starting point for further study. Note that the process of using regression to produce a curve from the basis functions is often referred to as “regression smoothing”.</p>
<pre class="r"><code>df <- mutate(df, y_hat025 = y_hat025,
y_hat975 = y_hat975)
#names(df) <- c("time","Wiener","y_hat", "y_hat025", "y_hat975")
p3 <- df %>% ggplot() +
geom_line(aes(x = time, y = Wiener), col = "grey") +
geom_point(aes(x = time, y = y_obs)) +
geom_line(aes(x = time, y = y_hat), col = "red") +
geom_line(aes(x = time, y_hat025), col = "green") +
geom_line(aes(x = time, y_hat975), col = "green")
p3 + ggtitle("Estimated curve with error bars") +
xlab("time") + ylab("f(time)")</code></pre>
<p><img src="/2021/05/04/functional-data-analysis-in-r/index_files/figure-html/unnamed-chunk-9-1.png" width="672" /></p>
<p>We finish for today by showing how to do the hard work of estimating coefficients and function values with a single line of code using the <code>fda</code> function <code>smooth.basis()</code>. The function takes the arguments <code>argvals</code>, the times we want to use for evaluation as a vector (or matrix or array); <code>y</code>, the observed values; and <code>fdParobj</code>, an <code>fda</code> object containing the basis elements.</p>
<pre class="r"><code>Wiener_obj <- smooth.basis(argvals = time, y = y_obs, fdParobj = basis)</code></pre>
<p>Here we plot our “hand calculated” curve in red and show the <code>smooth.basis()</code> curve in blue. They are reasonably close, except at the end points, where there is not much data to construct the basis.</p>
<pre class="r"><code>plot(time, Wiener, type = "l", xlab = "time", ylab = "f(time)",
main = "Comparison of fda package and naive smoothing estimates", col = "grey")
lines(time,y_hat,type = "l",col="red")
lines(Wiener_obj, lwd = 1, col = "blue")</code></pre>
<p><img src="/2021/05/04/functional-data-analysis-in-r/index_files/figure-html/unnamed-chunk-11-1.png" width="672" /></p>
<p>Note that we have shown the simplest use of <code>smooth.basis()</code>, which is capable of computing penalized regression estimates and more. The <a href="https://www.rdocumentation.org/packages/fda/versions/5.1.9/topics/smooth.basis">examples</a> of using the <code>smooth.basis()</code> function in the <code>fda</code> pdf are extensive and worth multiple blog posts. In general, the pdf level documentation for <code>fda</code> is superb. However, the package lacks vignettes. For a price, the book <em>Functional Data Analysis with R and MATLAB</em><span class="math inline">\(^{4}\)</span> supplies the equivalent of several of the missing vignettes.</p>
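<p>To give a feel for what a penalized estimate does, here is a rough sketch that does not use <code>fda</code> at all. It swaps in a simpler technique: a second-order difference penalty on the coefficients (in the style of P-splines) instead of the derivative-based roughness penalty that <code>fda</code> supports. The data, the knots, the helper <code>penalized_fit()</code>, and the two values of <code>lambda</code> are all assumptions for illustration.</p>
<pre class="r"><code>library(splines)  # ships with base R

set.seed(2021)
x <- sort(runif(80))                          # irregular sampling times (assumed)
y <- sin(2 * pi * x) + rnorm(80, sd = 0.2)    # noisy toy curve (assumed)

knots <- seq(0, 1, by = 0.05)
aug_knots <- c(rep(0, 3), knots, rep(1, 3))   # clamped knots for cubic splines
B <- splineDesign(aug_knots, x, ord = 4)
D <- diff(diag(ncol(B)), differences = 2)     # second-order difference matrix

# Minimize ||y - B c||^2 + lambda * ||D c||^2
penalized_fit <- function(lambda) {
  c_hat <- solve(crossprod(B) + lambda * crossprod(D), crossprod(B, y))
  list(rss = sum((y - B %*% c_hat)^2), penalty = sum((D %*% c_hat)^2))
}

light <- penalized_fit(0.01)
heavy <- penalized_fit(10)
rbind(light = unlist(light), heavy = unlist(heavy))</code></pre>
<p>A larger <code>lambda</code> trades a worse fit to the observations (higher residual sum of squares) for smoother coefficients (lower penalty), which is the basic trade-off behind any roughness penalty.</p>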
</div>
<div id="next-steps" class="section level3">
<h3>Next Steps</h3>
<p>Once you have a basis representation, what’s next? You may be interested in the following:</p>
<ul>
<li>More exploratory work such as <a href="https://en.wikipedia.org/wiki/Functional_principal_component_analysis">Functional Principal Components Analysis</a>, the analog of principal components analysis.</li>
<li>Clustering curves. See the <a href="https://cran.r-project.org/package=funHDDC">funHDDC</a> package.</li>
<li>Setting up regression models where either the dependent variable, or some of the independent variables, or both are functional objects. See the <a href="https://CRAN.R-project.org/package=refund">refund</a> package and the book by Kokoszka and Reimherr<span class="math inline">\(^{2}\)</span>.</li>
<li>Studying the shape of the curves themselves. For example, the shape of a protein concentration curve may convey some clinical meaning. FDA permits studying the velocity and acceleration of curves, offering the possibility of obtaining more information than the standard practice of looking at the area under the curves. You can explore this with the <code>fda</code> package (but be sure to check that the order of your basis functions is adequate to compute derivatives). See the book by Ramsay, Hooker and Graves<span class="math inline">\(^{4}\)</span>.</li>
<li>Learning what to do when you have sparse data. See the paper by Yao et al. below<span class="math inline">\(^{5}\)</span> and look into the <a href="https://cran.r-project.org/package=fdapace">fdapace</a> package.</li>
<li>Working with two and three dimensional medical images<span class="math inline">\(^{6}\)</span>.</li>
</ul>
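<p>As a small illustration of the velocity and acceleration idea in the list above, the sketch below fits a cubic (order 4) B-spline to values of <span class="math inline">\(f(x) = x^3\)</span> and then evaluates the first and second derivatives of the fit. It is only a sketch under stated assumptions: it uses <code>splineDesign()</code> from the base <code>splines</code> package rather than <code>fda</code>, and the test function, grid, and knots are made up for the example.</p>
<pre class="r"><code>library(splines)  # ships with base R

knots <- seq(0, 1, by = 0.1)
aug_knots <- c(rep(0, 3), knots, rep(1, 3))   # clamped knots, order 4
x <- seq(0.005, 0.995, length.out = 100)
y <- x^3                                      # a cubic lies in the spline space

B <- splineDesign(aug_knots, x, ord = 4)
c_hat <- qr.solve(B, y)                       # least squares coefficients

# Derivative bases: same coefficients, differentiated basis functions
B1 <- splineDesign(aug_knots, x, ord = 4, derivs = rep(1L, length(x)))
B2 <- splineDesign(aug_knots, x, ord = 4, derivs = rep(2L, length(x)))
velocity     <- as.vector(B1 %*% c_hat)       # estimate of f'(x)  = 3 x^2
acceleration <- as.vector(B2 %*% c_hat)       # estimate of f''(x) = 6 x

vel_err <- max(abs(velocity - 3 * x^2))
acc_err <- max(abs(acceleration - 6 * x))
c(vel_err, acc_err)</code></pre>
<p>With order 4 splines the second derivative is only piecewise linear, so a higher order basis is the safer choice when a smooth acceleration curve matters.</p>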
<p>I would like to make Functional Data Analysis a regular feature on R Views. If you are working with FDA and would like to post, please let me (<a href="mailto:joseph.rickert@rstudio.com" class="email">joseph.rickert@rstudio.com</a>) know.</p>
</div>
<div id="references" class="section level3">
<h3>References</h3>
<div id="books" class="section level4">
<h4>Books</h4>
<ul>
<li><span class="math inline">\(^{2}\)</span>Kokoszka, P. and Reimherr, M. (2017). <em>Introduction to Functional Data Analysis</em>. CRC.</li>
<li><span class="math inline">\(^{1}\)</span>Ramsay, J.O. and Silverman, B.W. (2005). <em>Functional Data Analysis</em>. Springer.</li>
<li><span class="math inline">\(^{4}\)</span>Ramsay, J.O., Hooker, G. and Graves, S. (2009). <em>Functional Data Analysis with R and MATLAB</em>. Springer.</li>
</ul>
</div>
<div id="online-resources" class="section level4">
<h4>Online Resources</h4>
<ul>
<li><span class="math inline">\(^{3}\)</span>Cao, J. (2019). <a href="https://www.youtube.com/watch?v=SUp_Nq8NwfE"><em>Functional Data Analysis Course</em></a></li>
<li>Staicu, A. and Park, Y. (2016) <a href="https://www4.stat.ncsu.edu/~staicu/FDAtutorial/"><em>Short Course on Applied Functional Data Analysis</em></a></li>
</ul>
</div>
<div id="recommended-papers" class="section level4">
<h4>Recommended Papers</h4>
<ul>
<li><span class="math inline">\(^{6}\)</span>Sørensen, H., Goldsmith, J. and Sangalli, L. (2013). <a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.5989"><em>An introduction with medical applications to functional data analysis</em></a>. Wiley.</li>
<li>Wang, J., Chiou, J. and Müller, H. (2015). <a href="https://arxiv.org/pdf/1507.05135.pdf"><em>Review of Functional Data Analysis</em></a></li>
<li><span class="math inline">\(^{5}\)</span>Yao, F., Müller, H. and Wang, J. (2005). <a href="https://anson.ucdavis.edu/~mueller/jasa03-190final.pdf"><em>Functional Data Analysis for Sparse Longitudinal Data</em></a>. JASA, 100(470).</li>
</ul>
</div>
</div>
March 2021: "Top 40" New CRAN Packages
https://rviews.rstudio.com/2021/04/22/march-2021-top-40-new-cran-packages/
Thu, 22 Apr 2021 00:00:00 +0000https://rviews.rstudio.com/2021/04/22/march-2021-top-40-new-cran-packages/
<p>By my count, two hundred twenty-one new packages <em>stuck</em> to CRAN in March 2021.<sup>1</sup> Here are my “Top 40” selections in twelve categories: Computational Methods, Data, Engineering, Genomics, Machine Learning, Medicine, Music, Networks, Science, Statistics, Utilities, and Visualization. Two of these categories, Engineering and Music, have only one entry each. However, I decided to give them their own categories in order to draw attention to the use of R outside of the mainstream, and because I have always lamented the fate of the <em>Miscellaneous</em>. In the same spirit, note that the complete works of <em>the Bard</em> appear in the Data category and that, thanks to <code>tidypaleo</code>, <em>Paleoenvironmental</em> is now <em>a thing</em> in R.</p>
<h3 id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=gamlss.foreach">gamlss.foreach</a> v1.0-5: Implements computationally intensive calculations for Generalized Additive Models for location, scale, and shape as described in <a href="https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9876.2005.00510.x">Rigby & Stasinopoulos (2005)</a>.</p>
<p><a href="https://cran.r-project.org/package=waydown">waydown</a> v1.1.0: Implements an algorithm based on the classical Helmholtz decomposition to obtain an approximate potential function for non-gradient fields. See <a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007788">Rodríguez-Sánchez (2020)</a> for background and the <a href="https://cran.r-project.org/web/packages/waydown/vignettes/examples.pdf">vignette</a> for examples.</p>
<p><img src="waydown.png" height = "400" width="600"></p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=aopdata">aopdata</a> v0.2.1: Provides functions to download data from the <a href="https://www.ipea.gov.br/acessooportunidades/en/">Access to Opportunities Project</a> (AOP), which includes annual estimates of access to employment, health and education services by transport mode, as well as data on the spatial distribution of population, schools and health-care facilities at a fine spatial resolution for all cities included in the study. There is an <a href="https://cran.r-project.org/web/packages/aopdata/vignettes/intro_to_aopdata.html">Introduction</a> to the package, and there are vignettes on <a href="https://cran.r-project.org/web/packages/aopdata/vignettes/access_inequality.html">Analyzing Inequality</a>, <a href="https://cran.r-project.org/web/packages/aopdata/vignettes/access_maps.html">Mapping Urban Accessibility</a>, and <a href="https://cran.r-project.org/web/packages/aopdata/vignettes/landuse_maps.html">Mapping Population and Land Use</a>.</p>
<p><img src="aopdata.png" height = "400" width="400"></p>
<p><a href="https://cran.r-project.org/package=bardr">bardr</a> v0.0.9: Provides R data structures for Shakespeare’s complete works, as provided by <a href="https://www.gutenberg.org/ebooks/100">Project Gutenberg</a>. See <a href="https://cran.r-project.org/web/packages/bardr/readme/README.html">README</a>.</p>
<p><a href="https://cran.r-project.org/package=metro">metro</a> v0.9.1: Provides access to the <a href="https://developer.wmata.com/">Metro Transparent Data Sets API</a> published by the Washington Metropolitan Area Transit Authority, the government agency operating light rail and passenger buses in the Washington D.C. area. See <a href="https://cran.r-project.org/web/packages/metro/readme/README.html">README</a>.</p>
<p><a href="https://cran.r-project.org/package=RAQSAPI">RAQSAPI</a> v2.0.1: Provides functions to retrieve air monitoring data and associated metadata from the US Environmental Protection Agency’s <a href="https://aqs.epa.gov/aqsweb/documents/data_api.html">Air Quality System Service</a>. There are several short vignettes including an <a href="https://cran.r-project.org/web/packages/RAQSAPI/vignettes/Intro.html">Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/RAQSAPI/vignettes/RAQSAPIusagetipsandprecautions.html">Usage tips and precautions</a>.</p>
<p><a href="https://cran.r-project.org/package=troopdata">troopdata</a> v0.1.3: Provides access to U.S. Department of Defense data on overseas military deployments and includes functions for pulling country-year troop deployment and basing data. See <a href="https://cran.r-project.org/web/packages/troopdata/readme/README.html">README</a> to get started.</p>
<p><img src="troopdata.png" height = "400" width="600"></p>
<h3 id="engineering">Engineering</h3>
<p><a href="https://cran.r-project.org/package=pipenostics">pipenostics</a> v0.1.7: Implements empirical and data-driven models of heat losses, corrosion diagnostics, reliability and predictive maintenance of pipeline systems which should be of interest to the engineering departments of heat generating and heat transferring companies. See <a href="https://link.springer.com/book/10.1007%2F978-3-319-25307-7">Timashev et al. (2016)</a> and <a href="https://www.sciencedirect.com/science/article/pii/S2214785317313755?via%3Dihub">Reddy (2017)</a> for the methods used and <a href="https://cran.r-project.org/web/packages/pipenostics/readme/README.html">README</a> to get started.</p>
<p><img src="pipenostics.svg" height = "300" width="500"></p>
<h3 id="genomics">Genomics</h3>
<p><a href="https://cran.r-project.org/package=glmmSeq">glmmSeq</a> v0.1.0: Provides functions to fit negative binomial mixed effects models with matched samples to model expression data. See the <a href="https://cran.r-project.org/web/packages/glmmSeq/vignettes/glmmSeq.html">vignette</a> for examples.</p>
<p><img src="glmmSeq.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=ondisc">ondisc</a> v1.0.0: Implements a method to allow researchers to analyze large-scale single-cell data as an R object stored on disk. There is a tutorial on the <a href="https://cran.r-project.org/web/packages/ondisc/vignettes/tutorial_odm_class.html">ondisc matrix class</a> and another on <a href="https://cran.r-project.org/web/packages/ondisc/vignettes/tutorial_other_classes.html">Metadata</a>.</p>
<p><a href="https://cran.r-project.org/package=SignacX">SignacX</a> v2.2.0: Implements a neural network trained with flow-sorted gene expression data to classify cellular phenotypes in single cell RNA-sequencing data. See <a href="https://www.biorxiv.org/content/10.1101/2021.02.01.429207v3">Chamberlain et al. (2021)</a> for background. There are seven vignettes including an <a href="https://cran.r-project.org/web/packages/SignacX/vignettes/signac-Seurat_AMP.html">Analysis of Kidney Lupus Data</a> and an <a href="https://cran.r-project.org/web/packages/SignacX/vignettes/signac-Seurat_pbmcs.html">Analysis of PBMCs from 10X Genomics</a>.</p>
<p><img src="SignacX.png" height = "200" width="400"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=opitools">opitools</a> v1.0.3: Implements a tool to analyze opinions inherent in a text document relating to a specific subject (A) and assess how opinions expressed with respect to another subject (B) may affect the opinions on subject A. This package has been designed specifically for application to social media datasets, such as Twitter and Facebook. See <a href="https://osf.io/preprints/socarxiv/c32qh/">Adepeju and Jimoh (2021)</a> for an extended example that demonstrates the utility of the approach and the <a href="https://cran.r-project.org/web/packages/opitools/vignettes/opitools-vignette.html">vignette</a> to get started.</p>
<p><img src="opitools.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=poems">poems</a> v1.0.1: Provides a framework of interoperable R6 classes for building ensembles of viable models via the <a href="https://en.wikipedia.org/wiki/Pattern-oriented_modeling">pattern-oriented modeling</a> (POM) approach. The package includes classes for encapsulating and generating model parameters, and managing the POM workflow which includes: model setup; generating model parameters via Latin hyper-cube sampling; running multiple sampled model simulations; collating summary results; and validating and selecting an ensemble of models that best match known patterns. There are two vignettes: <a href="https://cran.r-project.org/web/packages/poems/vignettes/simple_example.pdf">Simple Example</a> and <a href="https://cran.r-project.org/web/packages/poems/vignettes/thylacine_example.pdf">Thylacine Example</a>.</p>
<p><img src="poems.png" height = "300" width="500"></p>
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=dampack">dampack</a> v1.0.0: Implements a suite of functions for analyzing and visualizing the health economic outputs of mathematical models. See <a href="https://www.cambridge.org/core/books/decision-making-in-health-and-medicine/31FD197195DAE2A6321409568BEFA2DD">Hunink et al. (2014)</a> for the theoretical underpinnings. There are five vignettes including <a href="https://cran.r-project.org/web/packages/dampack/vignettes/basic_cea.html">Basic Cost Effectiveness Analysis</a>, <a href="https://cran.r-project.org/web/packages/dampack/vignettes/psa_analysis.html">Probabilistic Sensitivity Analysis: Analysis</a> and <a href="https://cran.r-project.org/web/packages/dampack/vignettes/voi.html">Value of Information Analysis</a>.</p>
<p><img src="dampack.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=rdecision">rdecision</a> v1.0.3: Provides classes and functions for using decision trees to model health care interventions using cohort models. See <a href="https://www.amazon.com/Decision-Modelling-Economic-Evaluation-Handbooks/dp/0198526628">Briggs et al.</a> for theory and terminology. There are five vignettes including <a href="https://cran.r-project.org/web/packages/rdecision/vignettes/DT01-Sumatriptan.html">Elementary decision tree (Evans 1997)</a> and <a href="https://cran.r-project.org/web/packages/rdecision/vignettes/DT02-Tegaderm.html">Decision tree with PSA</a>.</p>
<p><img src="rdecision.png" height = "300" width="500"></p>
<h3 id="music">Music</h3>
<p><a href="https://cran.r-project.org/package=gm">gm</a> v1.0.2: Implements a high-level language to create music including converting your music to musical scores and audio files. It works with <a href="https://rmarkdown.rstudio.com/">R Markdown</a>, R <a href="https://jupyter.org/">Jupyter Notebooks</a>, and RStudio. The vignette is available in <a href="https://cran.r-project.org/web/packages/gm/vignettes/gm.html">English</a> and in <a href="https://cran.r-project.org/web/packages/gm/vignettes/cn.html">Chinese</a>.</p>
<p><img src="gm.png" height = "300" width="500"></p>
<h3 id="networks">Networks</h3>
<p><a href="https://cran.r-project.org/package=sfnetworks">sfnetworks</a> v0.5.1: Provides a tidy approach to spatial network analysis in the form of classes and functions that enable a seamless interaction between the network analysis package <code>tidygraph</code> and the spatial analysis package <code>sf</code>. There are vignettes on <a href="https://cran.r-project.org/web/packages/sfnetworks/vignettes/structure.html">sf network structure</a>, <a href="https://cran.r-project.org/web/packages/sfnetworks/vignettes/preprocess_and_clean.html">Preprocessing</a>, <a href="https://cran.r-project.org/web/packages/sfnetworks/vignettes/join_filter.html">Spatial joins and filters</a>, <a href="https://cran.r-project.org/web/packages/sfnetworks/vignettes/routing.html">Routing</a>, and <a href="https://cran.r-project.org/web/packages/sfnetworks/vignettes/morphers.html">Spatial morphers</a>.</p>
<p><img src="sfnetworks.png" height = "400" width="400"></p>
<p><a href="https://cran.r-project.org/package=valhallr">valhallr</a> v0.1.0: Implements an interface to the <a href="https://github.com/valhalla/valhalla">Valhalla</a> routing engine’s API for turn-by-turn routing, isochrones, and origin-destination analyses. See the <a href="https://cran.r-project.org/web/packages/valhallr/vignettes/valhallr.html">vignette</a> for examples.</p>
<p><img src="valhallr.jpeg" height = "300" width="500"></p>
<h3 id="science">Science</h3>
<p><a href="https://cran.r-project.org/package=asteRisk">asteRisk</a> v0.99.4: Provides functions to calculate the positions of satellites given a known state vector. It includes implementations of the SGP4 and SDP4 simplified perturbation models to propagate orbital state vectors. See <a href="https://celestrak.com/NORAD/documentation/spacetrk.pdf">Hoots et al. (1988)</a>, <a href="https://arc.aiaa.org/doi/10.2514/6.2006-6753">Vallado et al. (2012)</a>, and <a href="https://arc.aiaa.org/doi/10.2514/1.9161">Hoots et al. (2014)</a> for background and the <a href="https://cran.r-project.org/web/packages/asteRisk/vignettes/asteRisk.html">vignette</a> for examples.</p>
<p><img src="asteRisk.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=forImage">forImage</a> v0.1.0: Implements a tool to measure the size of foraminifera and other unicellulars and includes functions to guide foraminiferal test biovolume calculations and cell biomass estimations. The volume function includes geometric adaptations of several microalgae models based on <a href="https://onlinelibrary.wiley.com/doi/abs/10.1046/j.1529-8817.1999.3520403.x">Hillebrand et al. (1999)</a>, <a href="https://academic.oup.com/plankt/article/25/11/1331/1490055">Sun & Liu (2003)</a>, and <a href="http://siba-ese.unisalento.it/index.php/twb/article/view/106">Vadrucci et al. (2007)</a>. See the <a href="https://cran.r-project.org/web/packages/forImage/vignettes/forImage_vignette.html">vignette</a> to get started.</p>
<p><img src="forImage.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=OpenSpecy">OpenSpecy</a> v0.9.1: Provides functions to analyze, process, identify and share Raman and (FT)IR spectra, including functions to implement Savitzky-Golay smoothing in accordance with <a href="https://journals.sagepub.com/doi/10.1366/000370207782597003">Zhao et al. (2007)</a> and to identify spectra using an onboard reference library (see <a href="https://journals.sagepub.com/doi/10.1177/0003702820929064">Cowger et al. (2020)</a>). Analyzed spectra can be shared via a <a href="https://wincowger.shinyapps.io/OpenSpecy/">Shiny App</a>. There is a <a href="https://cran.r-project.org/web/packages/OpenSpecy/vignettes/sop.html">vignette</a>.</p>
<p><img src="OpenSpecy.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=tidypaleo">tidypaleo</a> v0.1.1: Provides functions with a common framework for age-depth model management, stratigraphic visualization, and common statistical transformations with a focus on stratigraphic visualization using <code>ggplot2</code>. There are vignettes on <a href="https://cran.r-project.org/web/packages/tidypaleo/vignettes/age_depth.html">Age-depth Models</a>, <a href="https://cran.r-project.org/web/packages/tidypaleo/vignettes/nested_analysis.html">Nested Analyses</a>, and <a href="https://cran.r-project.org/web/packages/tidypaleo/vignettes/strat_diagrams.html">Stratigraphic Diagrams</a>.</p>
<p><img src="tidypaleo.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=VulnToolkit">VulnToolkit</a> v1.1.2: Provides functions to analyze and summarize tidal data sets and to access NOAA mean sea level data. See <a href="https://www.sciencedirect.com/science/article/abs/pii/S0272771415002139?via%3Dihub">Hill & Anisfeld (2015)</a> for background and the <a href="https://cran.r-project.org/web/packages/VulnToolkit/vignettes/Tidal_data.html">vignette</a> for examples.</p>
<p><img src="VulnToolkit.png" height = "300" width="500"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=corncob">corncob</a> v0.2.0: Implements functions for modeling correlated count data using the beta-binomial distribution, described in <a href="https://projecteuclid.org/journals/annals-of-applied-statistics/volume-14/issue-1/Modeling-microbial-abundances-and-dysbiosis-with-beta-binomial-regression/10.1214/19-AOAS1283.short">Martin et al. (2020)</a>. See the <a href="https://cran.r-project.org/web/packages/corncob/vignettes/corncob-intro.pdf">vignette</a> for an introduction.</p>
<p><img src="corncob.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=hawkesbow">hawkesbow</a> v1.0.2: Implements an estimation method for <a href="https://arxiv.org/pdf/1507.02822.pdf#:~:text=The%20Hawkes%20process%20(HP)%20is,trade%20orders%2C%20or%20bank%20defaults.">Hawkes processes</a> when count data are only observed in discrete time, using a spectral approach derived from the Bartlett spectrum. See <a href="https://arxiv.org/abs/2003.04314">Cheysson and Lang (2020)</a> for background and the <a href="https://cran.r-project.org/web/packages/hawkesbow/vignettes/hawkesbow.pdf">vignette</a> for examples.</p>
<p><a href="https://cran.r-project.org/package=LMMELSM">LMMELSM</a> v0.1.0: Implements two-level mixed effects location scale models on multiple observed or latent outcomes, and between-group variance modeling. See <a href="https://econtent.hogrefe.com/doi/10.1027/1015-5759/a000624">Williams et al. (2020)</a> and <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1541-0420.2007.00924.x">Hedeker et al. (2008)</a> for background and <a href="https://cran.r-project.org/web/packages/LMMELSM/readme/README.html">README</a> for an example.</p>
<p><a href="https://cran.r-project.org/package=mixpoissonreg">mixpoissonreg</a> v1.0.0: Provides functions to fit mixed Poisson regression models (Poisson-Inverse Gaussian or Negative-Binomial) with count data response variables. See <a href="https://link.springer.com/article/10.1007%2Fs11222-015-9601-6">Barreto-Souza and Simas (2016)</a> for background. There are five vignettes on <a href="https://cran.r-project.org/web/packages/mixpoissonreg/vignettes/influence-mixpoissonreg.html">Global and Local Influence</a>, <a href="https://cran.r-project.org/web/packages/mixpoissonreg/vignettes/intervals-mixpoissonreg.html">Confidence and Prediction Intervals</a>, <a href="https://cran.r-project.org/web/packages/mixpoissonreg/vignettes/ml-mixpoissonreg.html">MLE</a>, <a href="https://cran.r-project.org/web/packages/mixpoissonreg/vignettes/tidyverse-mixpoissonreg.html">Tidy Methods</a>, and <a href="https://cran.r-project.org/web/packages/mixpoissonreg/vignettes/tutorial-mixpoissonreg.html">Overdispersed Count Data</a>.</p>
<p><img src="mixpoissinreg.png" height = "400" width="400"></p>
<p><a href="https://cran.r-project.org/package=ppdiag">ppdiag</a> v0.1.0: Provides a suite of diagnostic tools for univariate point processes including tools for simulating and fitting both common and more complex temporal point processes and the diagnostic tools described in <a href="https://direct.mit.edu/neco/article/14/2/325/6578/The-Time-Rescaling-Theorem-and-Its-Application-to">Brown et al. (2002)</a> and <a href="https://arxiv.org/abs/2001.09359">Wu et al. (2020)</a>. There is a vignette on <a href="https://cran.r-project.org/web/packages/ppdiag/vignettes/fitting_markov_modulated.html">Markov Modulated Point Processes</a> and another on <a href="https://cran.r-project.org/web/packages/ppdiag/vignettes/ppdiag.html">Diagnostic Tools</a>.</p>
<p><a href="https://cran.r-project.org/package=robustlm">robustlm</a> v0.1.0: Implements a computationally efficient exponential squared loss algorithm for variable selection proposed by <a href="https://www.tandfonline.com/doi/abs/10.1080/01621459.2013.766613">Wang et al. (2013)</a>. See the <a href="https://cran.r-project.org/web/packages/robustlm/vignettes/vignette.html">vignette</a>.</p>
<p><img src="robustlm.png" height = "200" width="300"></p>
<p><a href="https://CRAN.R-project.org/package=smmR">smmR</a> v1.0.2: Provides functions to estimate and simulate multi-state semi-Markov models. The methods implemented are described in <a href="https://www.tandfonline.com/doi/abs/10.1080/10485250701261913">Barbu & Limnios (2008)</a> and <a href="https://www.tandfonline.com/doi/abs/10.1080/10485252.2011.555543">Trevezas & Limnios (2011)</a>. The <a href="https://cran.r-project.org/web/packages/smmR/vignettes/Textile-Factory.html">vignette</a> contains an extended example.</p>
<p><img src="smmR.png" height = "400" width="400"></p>
<p><a href="https://cran.r-project.org/package=spotoroo">spotoroo</a> v0.1.1: Implements an algorithm to cluster satellite hot spot data spatially and temporally. See the <a href="https://cran.r-project.org/web/packages/spotoroo/vignettes/Clustering-hot-spots.html">vignette</a>.</p>
<p><img src="spotoroo.png" height = "400" width="400"></p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=clock">clock</a> v0.2.0: Provides a comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (duration, time points, zoned-times, and calendars) that partition responsibilities so that the complexities of time zones are only considered when they are really needed. There is a <a href="https://cran.r-project.org/web/packages/clock/vignettes/clock.html">Getting Started</a> guide, as well as vignettes on <a href="https://cran.r-project.org/web/packages/clock/vignettes/faq.html">FAQ</a> and <a href="https://cran.r-project.org/web/packages/clock/vignettes/recipes.html">Examples and Recipes</a>.</p>
<p><a href="https://cran.r-project.org/package=crosstable">crosstable</a> v0.2.1: Provides functions to create descriptive tables for continuous and categorical variables, apply summary statistics, and create reports using <code>rmarkdown</code> or <code>officer</code>. There is an <a href="https://cran.r-project.org/web/packages/crosstable/vignettes/crosstable.html">Introduction</a>, and vignettes on <a href="https://cran.r-project.org/web/packages/crosstable/vignettes/crosstable-install.html">Troubleshooting</a>, <a href="https://cran.r-project.org/web/packages/crosstable/vignettes/crosstable-report.html">Making Automatic Reports</a>, and <a href="https://cran.r-project.org/web/packages/crosstable/vignettes/crosstable-selection.html">Selecting Variables</a>.</p>
<p><a href="https://cran.r-project.org/package=pkgdepends">pkgdepends</a> v0.1.0: Provides functions to find recursive dependencies for R packages from various sources including CRAN, Bioconductor, and GitHub enabling users to obtain a consistent set of packages to install. See <a href="https://cran.r-project.org/web/packages/pkgdepends/readme/README.html">README</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=pkglite">pkglite</a> v0.1.1: Implements a tool, grammar, and standard to represent and exchange R package source code as text files. Converts one or more source packages to a text file and restores the package structures from the file. There are vignettes on <a href="https://cran.r-project.org/web/packages/pkglite/vignettes/filespec.html">Generating File Specifications</a>, <a href="https://cran.r-project.org/web/packages/pkglite/vignettes/format.html">Representing Packages</a>, and <a href="https://cran.r-project.org/web/packages/pkglite/index.html">Compact Package Representation</a>.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=datplot">datplot</a> v1.0.0: Provides tools to process and prepare data for visualization and employs the concept of <a href="https://www.jratcliffe.net/aoristic-analysis">aoristic analysis</a>. See <a href="https://bit.ly/3svhbdV">aorist</a> and the vignettes <a href="https://cran.r-project.org/web/packages/datplot/vignettes/data_preparation.html">Data Preparation and Visualization</a> and <a href="https://cran.r-project.org/web/packages/datplot/vignettes/how-to.html">Visualizing Chronological Distribution</a>.</p>
<p><img src="datplot.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=ferrn">ferrn</a> v0.0.1: Implements diagnostic plots for optimization, with a focus on projection pursuit, which show the paths the optimizer takes in the high-dimensional space. See <a href="https://cran.r-project.org/web/packages/ferrn/readme/README.html">README</a> for examples.</p>
<p><img src="ferrn.gif" height = "400" width="400"></p>
<p><a href="https://cran.r-project.org/package=funcharts">funcharts</a> v1.0.0: Provides functional control charts for statistical process monitoring of functional data, using the methods of <a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/asmb.2507">Capezza et al. (2020)</a> and <a href="https://www.tandfonline.com/doi/abs/10.1080/00401706.2020.1753581?journalCode=utch20">Centofanti et al. (2020)</a>. There are vignettes on <a href="https://cran.r-project.org/web/packages/funcharts/vignettes/capezza2020.html">Capezza 2020</a>, <a href="https://cran.r-project.org/web/packages/funcharts/vignettes/centofanti2020.html">Centofanti 2020</a> and on the <a href="https://cran.r-project.org/web/packages/funcharts/vignettes/mfd.html">mfd class</a>.</p>
<p><img src="funcharts.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=gghilbertstrings">gghilbertstrings</a> v0.3.3: Provides functions to plot Hilbert curves, which are used to map one-dimensional data into the 2D plane. A specific use case maps a character column in a data frame into 2D space, allowing visual comparison of long lists of URLs, words, genes or other data that have a fixed order and position. See <a href="https://cran.r-project.org/web/packages/gghilbertstrings/readme/README.html">README</a> for examples.</p>
<p><img src="gghilbertstrings.png" height = "300" width="500"></p>
<p><a href="https://cran.r-project.org/package=mapsf">mapsf</a> v0.1.1: Provides functions to create and integrate thematic maps including functions to design various cartographic representations such as proportional symbols, choropleth or typology maps. Look <a href="https://riatelab.github.io/mapsf">here</a> for examples.</p>
<p><img src="mapsf.png" height = "400" width="400"></p>
<p><sup>1</sup> I have used phrases like <em>By my count</em> and <em>stuck to CRAN</em> in the past, but I do not believe that I have explained what I mean. For some time now, but I believe more frequently in recent months, packages will appear as new on CRAN, only to be removed within a relatively short period of time for failing to resolve check problems. If you happen to know about these packages and search for them by name on CRAN you will receive the message:</p>
<blockquote>
<p>Package XXXX was removed from the CRAN repository.
Formerly available versions can be obtained from the archive.
Archived on 2021-04-17 as check problems remained after update.
A summary of the most recent check results can be obtained from the check results archive.
Please use the canonical form <a href="https://CRAN.R-project.org/package=XXXX">https://CRAN.R-project.org/package=XXXX</a> to link to this page.</p>
</blockquote>
<p>My total count of new CRAN packages excludes the ten packages that were identified as new for March when I created my list of March packages on April 10, 2021, but had been removed by the time I finalized the list for this post a week later. So, there is some instability in the notion of counting new packages in a given month.</p>
An Alternative to the Correlation Coefficient That Works For Numeric and Categorical Variables
https://rviews.rstudio.com/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/
Thu, 15 Apr 2021 00:00:00 +0000https://rviews.rstudio.com/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/
<script src="/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/index_files/header-attrs/header-attrs.js"></script>
<p><em>Dr. Rama Ramakrishnan is Professor of the Practice at MIT Sloan School of Management where he teaches courses in Data Science, Optimization and applied Machine Learning.</em></p>
<p>When starting to work with a new dataset, it is useful to quickly pinpoint which pairs of variables appear to be <em>strongly related</em>. It helps you spot data issues, make better modeling decisions, and ultimately arrive at better answers.</p>
<p>The <a href="https://en.wikipedia.org/wiki/Correlation_coefficient"><em>correlation coefficient</em></a> is used widely for this purpose, but it is well-known that it can’t detect non-linear relationships. Take a look at this scatterplot of two variables <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>.</p>
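<pre class="r"><code>library(ggplot2)

# Noisy upper half of the unit circle: y depends strongly, but not linearly, on x
set.seed(42)
x <- seq(-1, 1, 0.01)
y <- sqrt(1 - x^2) + rnorm(length(x), mean = 0, sd = 0.05)
ggplot(mapping = aes(x, y)) +
  geom_point()</code></pre>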
<p><img src="/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/index_files/figure-html/unnamed-chunk-1-1.png" width="672" /></p>
<p>It is obvious to the human eye that <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> have a strong relationship, but the correlation coefficient between <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> is only -0.01.</p>
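<p>As a quick check, the coefficient can be computed directly. The sketch below regenerates the data with the same seed as the plot above; the exact value printed depends on the random draw, but it should be close to zero.</p>
<pre class="r"><code># Recreate the noisy semicircle and compute Pearson's correlation
set.seed(42)
x <- seq(-1, 1, 0.01)
y <- sqrt(1 - x^2) + rnorm(length(x), mean = 0, sd = 0.05)
r <- cor(x, y)   # cor() uses the Pearson method by default
round(r, 2)</code></pre>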
<p>Further, if either variable of the pair is <em>categorical</em>, we can’t use the correlation coefficient. We will have to turn to other metrics. If <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> are <strong>both</strong> categorical, we can try <a href="https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V">Cramér’s V</a> or <a href="https://en.wikipedia.org/wiki/Phi_coefficient">the phi coefficient</a>. If <span class="math inline">\(x\)</span> is continuous and <span class="math inline">\(y\)</span> is binary, we can use the <a href="https://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient">point-biserial correlation coefficient</a>.</p>
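<p>For reference, Cramér’s V can be computed in base R from the chi-squared statistic of a contingency table. The sketch below uses the built-in <code>mtcars</code> data purely as an illustration, and <code>cramers_v</code> is a hypothetical helper name, not a function from any package.</p>
<pre class="r"><code># Cramér's V for two categorical variables:
# V = sqrt(chi^2 / (n * (min(rows, cols) - 1)))
cramers_v <- function(u, v) {
  tab  <- table(u, v)
  # suppressWarnings(): small cell counts trigger an approximation warning
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n    <- sum(tab)
  k    <- min(nrow(tab), ncol(tab))
  unname(sqrt(chi2 / (n * (k - 1))))
}

# Number of cylinders vs. number of gears, both treated as categories
cramers_v(mtcars$cyl, mtcars$gear)</code></pre>
<p>The result always lies between 0 and 1, but, as discussed next, it is not directly comparable with a correlation coefficient.</p>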
<p>But using different metrics is problematic. Since they are derived from different assumptions, we can’t <strong>compare the resulting numbers with one another</strong>. If the correlation coefficient between continuous variables <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> is 0.6 and the phi coefficient between categorical variables <span class="math inline">\(u\)</span> and <span class="math inline">\(v\)</span> is also 0.6, can we safely conclude that the relationships are equally strong? According to <a href="https://en.wikipedia.org/wiki/Phi_coefficient">Wikipedia</a>,</p>
<blockquote>
<p>The correlation coefficient ranges from −1 to +1, where ±1 indicates perfect agreement or disagreement, and 0 indicates no relationship. The phi coefficient has a maximum value that is determined by the distribution of the two variables if one or both variables can take on more than two values.</p>
</blockquote>
<p>A phi coefficient value of 0.6 between <span class="math inline">\(u\)</span> and <span class="math inline">\(v\)</span> may not mean much if its maximum possible value in this particular situation is much higher. Perhaps we can normalize the phi coefficient to map it to the 0-1 range? But what if that modification introduces biases?</p>
<p>Wouldn’t it be nice if we had <strong>one</strong> uniform approach that was easy to understand, worked for continuous <strong>and</strong> categorical variables alike, and could detect linear <strong>and</strong> nonlinear relationships?</p>
<p>(BTW, when <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> are continuous, looking at a scatterplot of <span class="math inline">\(x\)</span> vs <span class="math inline">\(y\)</span> can be very effective since the human brain can detect linear and non-linear patterns very quickly. But even if you are lucky and <em>all</em> your variables are continuous, looking at scatterplots of <em>all</em> pairs of variables is hard when you have lots of variables in your dataset; with just 100 predictors (say), you will need to look through 4950 scatterplots, and this obviously isn’t practical.)</p>
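<p>The count of scatterplots is simply the number of unordered pairs of variables, which can be checked in R:</p>
<pre class="r"><code># Number of distinct pairs among p variables: p * (p - 1) / 2
p <- 100
n_pairs <- choose(p, 2)
n_pairs
#> [1] 4950</code></pre>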
<p><br></p>
<div id="a-potential-solution" class="section level3">
<h3>A Potential Solution</h3>
<p>To devise a metric that satisfies the requirements we listed above, let’s <em>invert</em> the problem: What does it mean to say that <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> <strong>don’t</strong> have a strong relationship?</p>
<p>Intuitively, if there’s no relationship between <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>, we would expect to see no patterns in a scatterplot of <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> - no lines, curves, groups etc. It will be a cloud of points that appears to be randomly scattered, perhaps something like this:</p>
<pre class="r"><code>x <- seq(-1,1,0.01)
y <- runif(length(x),min = -1, max = 1)
ggplot(mapping = aes(x, y)) +
geom_point() </code></pre>
<p><img src="/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/index_files/figure-html/unnamed-chunk-2-1.png" width="672" /></p>
<p>In this situation, does knowing the value of <span class="math inline">\(x\)</span> give us any information on <span class="math inline">\(y\)</span>?</p>
<p>Clearly not. <span class="math inline">\(y\)</span> seems to be somewhere between -1 and 1 with no particular pattern, regardless of the value of <span class="math inline">\(x\)</span>. Knowing <span class="math inline">\(x\)</span> does not seem to help <em>reduce our uncertainty</em> about the value of <span class="math inline">\(y\)</span>.</p>
<p>In contrast, look at the first picture again.</p>
<p><img src="/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/index_files/figure-html/unnamed-chunk-3-1.png" width="672" /></p>
<p>Here, knowing the value of <span class="math inline">\(x\)</span> <em>does</em> help. If we know that <span class="math inline">\(x\)</span> is around 0.0, for example, from the graph we will guess that <span class="math inline">\(y\)</span> is likely near 1.0 (the red dots). We can be confident that <span class="math inline">\(y\)</span> is <strong>not</strong> between 0 and 0.8. Knowing <span class="math inline">\(x\)</span> helps us eliminate certain values of <span class="math inline">\(y\)</span>, <strong>reducing our uncertainty</strong> about the values <span class="math inline">\(y\)</span> might take.</p>
<p>This notion - that knowing something reduces our uncertainty about something else - is exactly the idea behind <a href="https://en.wikipedia.org/wiki/Mutual_information">mutual information</a> from <a href="https://en.wikipedia.org/wiki/Information_theory">Information Theory</a>.</p>
<p>According to <a href="https://en.wikipedia.org/wiki/Mutual_information">Wikipedia</a> (emphasis mine),</p>
<blockquote>
<p>Intuitively, mutual information measures the information that <span class="math inline">\(X\)</span> and <span class="math inline">\(Y\)</span> share: It measures <strong>how much knowing one of these variables reduces uncertainty about the other</strong>. For example, if <span class="math inline">\(X\)</span> and <span class="math inline">\(Y\)</span> are independent, then knowing <span class="math inline">\(X\)</span> does not give any information about <span class="math inline">\(Y\)</span> and vice versa, so their mutual information is zero.</p>
</blockquote>
<p>Furthermore,</p>
<blockquote>
<p><strong>Not limited to real-valued random variables and linear dependence like the correlation coefficient</strong>, MI is more general and determines how different the joint distribution of the pair <span class="math inline">\((X,Y)\)</span> is to the product of the marginal distributions of <span class="math inline">\(X\)</span> and <span class="math inline">\(Y\)</span>.</p>
</blockquote>
<p>This is very promising!</p>
<p>As it turns out, however, implementing mutual information is not so simple. We first need to estimate the joint probabilities (i.e., the joint probability density/mass function) of <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> before we can calculate their mutual information. If <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> are both categorical, this is easy: we can simply count how often each combination of values occurs. But if one or both of them are continuous, it is more involved, since we first have to bin the values or estimate a density.</p>
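<p>To make the easy, all-categorical case concrete, here is a minimal sketch of a plug-in estimate of mutual information computed from a contingency table. The helper name <code>mutual_info()</code> and the toy vectors are my own assumptions for illustration; this is not part of the <code>x2y</code> code.</p>
<pre class="r"><code># Hypothetical helper (not part of x2y.R): plug-in estimate of the mutual
# information, in bits, between two categorical vectors.
mutual_info <- function(u, v) {
  joint <- unclass(prop.table(table(u, v)))       # estimated joint probabilities
  indep <- outer(rowSums(joint), colSums(joint))  # product of the marginals
  nz <- joint > 0                                 # skip empty cells (0 * log 0 = 0)
  sum(joint[nz] * log2(joint[nz] / indep[nz]))
}

u <- c("a", "a", "b", "b")
mi_dependent   <- mutual_info(u, c("x", "x", "y", "y"))  # v is determined by u
mi_independent <- mutual_info(u, c("x", "y", "x", "y"))  # v is independent of u
c(dependent = mi_dependent, independent = mi_independent)</code></pre>
<p>In the first call, knowing <code>u</code> pins down <code>v</code> completely, so the estimate is one full bit; in the second, the joint distribution equals the product of the marginals and the estimate is zero.</p>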
<p>But we can use the basic insight behind mutual information – that knowing <span class="math inline">\(x\)</span> may reduce our uncertainty about <span class="math inline">\(y\)</span> – in a different way.</p>
<p><br></p>
</div>
<div id="the-x2y-metric" class="section level3">
<h3>The X2Y Metric</h3>
<p>Consider three variables <span class="math inline">\(x\)</span>, <span class="math inline">\(y\)</span> and <span class="math inline">\(z\)</span>. If knowing <span class="math inline">\(x\)</span> reduces our uncertainty about <span class="math inline">\(y\)</span> by 70% but knowing <span class="math inline">\(z\)</span> reduces our uncertainty about <span class="math inline">\(y\)</span> by only 40%, we will intuitively expect that the association between <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> will be stronger than the association between <span class="math inline">\(z\)</span> and <span class="math inline">\(y\)</span>.</p>
<p>So, if we can <em>quantify</em> the reduction in uncertainty, that can be used as a measure of the strength of the association. One way to do so is to measure <span class="math inline">\(x\)</span>’s ability to <em>predict</em> <span class="math inline">\(y\)</span> - after all, <strong>if <span class="math inline">\(x\)</span> reduces our uncertainty about <span class="math inline">\(y\)</span>, knowing <span class="math inline">\(x\)</span> should help us predict <span class="math inline">\(y\)</span> better than if we didn’t know <span class="math inline">\(x\)</span></strong>.</p>
<p>Stated another way, we can think of reduction in prediction error <span class="math inline">\(\approx\)</span> reduction in uncertainty <span class="math inline">\(\approx\)</span> strength of association.</p>
<p>This suggests the following approach:</p>
<ol style="list-style-type: decimal">
<li>Predict <span class="math inline">\(y\)</span> <em>without using</em> <span class="math inline">\(x\)</span>.
<ul>
<li>If <span class="math inline">\(y\)</span> is continuous, we can simply use the average value of <span class="math inline">\(y\)</span>.</li>
<li>If <span class="math inline">\(y\)</span> is categorical, we can use the most frequent value of <span class="math inline">\(y\)</span>.</li>
<li>These are sometimes referred to as a <em>baseline</em> model.</li>
</ul></li>
<li>Predict <span class="math inline">\(y\)</span> <em>using</em> <span class="math inline">\(x\)</span>
<ul>
<li>We can take any of the standard predictive models out there (Linear/Logistic Regression, CART, Random Forests, SVMs, Neural Networks, Gradient Boosting etc.), set <span class="math inline">\(x\)</span> as the independent variable and <span class="math inline">\(y\)</span> as the dependent variable, fit the model to the data, and make predictions. More on this below.</li>
</ul></li>
<li>Calculate the <strong>% decrease in prediction error</strong> when we go from (1) to (2)
<ul>
<li>If <span class="math inline">\(y\)</span> is continuous, we can use any of the familiar error metrics like RMSE, SSE, MAE etc. I prefer mean absolute error (MAE) since it is less susceptible to outliers and is in the same units as <span class="math inline">\(y\)</span> but this is a matter of personal preference.</li>
<li>If <span class="math inline">\(y\)</span> is categorical, we can use Misclassification Error (= 1 - Accuracy) as the error metric.</li>
</ul></li>
</ol>
<blockquote>
<p>In summary, the % reduction in error when we go from a baseline model to a predictive model measures the strength of the relationship between <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>. We will call this metric <code>x2y</code> since it measures the ability of <span class="math inline">\(x\)</span> to predict <span class="math inline">\(y\)</span>.</p>
</blockquote>
<p>(This definition is similar to <a href="https://en.wikipedia.org/wiki/Coefficient_of_determination"><em>R-Squared</em></a> from Linear Regression. In fact, if <span class="math inline">\(y\)</span> is continuous and we use the Sum of Squared Errors as our error metric, the <code>x2y</code> metric is equal to the R-Squared of the predictive model, expressed as a percentage.)</p>
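<p>The connection to R-Squared can be verified directly. The sketch below uses a simple linear regression on R’s built-in <code>cars</code> dataset (my choice of model and data, purely for illustration) and compares the fractional reduction in SSE, relative to predicting the mean, with the R-Squared reported by <code>lm()</code>.</p>
<pre class="r"><code># Baseline: predict every y with its mean. Model: a simple linear regression.
fit <- lm(dist ~ speed, data = cars)
sse_baseline <- sum((cars$dist - mean(cars$dist))^2)
sse_model    <- sum(residuals(fit)^2)

reduction <- 1 - sse_model / sse_baseline     # fractional reduction in SSE
all.equal(reduction, summary(fit)$r.squared)  # TRUE</code></pre>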
<p>To implement (2) above, we need to pick a predictive model to use. Let’s remind ourselves of what the requirements are:</p>
<ul>
<li>If there’s a non-linear relationship between <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>, the model should be able to detect it</li>
<li>It should be able to handle all possible <span class="math inline">\(x\)</span>-<span class="math inline">\(y\)</span> variable types: continuous-continuous, continuous-categorical, categorical-continuous and categorical-categorical</li>
<li>We may have hundreds (if not thousands) of pairs of variables we want to analyze so we want this to be quick</li>
</ul>
<p><a href="https://en.wikipedia.org/wiki/Decision_tree_learning">Classification and Regression Trees (CART)</a> satisfy these requirements very nicely and that’s the model I prefer to use. That said, you can certainly use other models if you like.</p>
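<p>As a rough illustration of the second requirement, the sketch below fits a CART model with <code>rpart</code> for each of the four type combinations, using columns of the built-in <code>iris</code> data. The helper <code>fit_cart()</code> is hypothetical and is not taken from the <code>x2y</code> script; it simply picks a regression tree when <span class="math inline">\(y\)</span> is numeric and a classification tree otherwise.</p>
<pre class="r"><code>library(rpart)

# Hypothetical helper (not the x2y.R implementation): regression tree for a
# numeric y, classification tree for anything else.
fit_cart <- function(x, y) {
  d <- data.frame(x = x, y = y, stringsAsFactors = TRUE)
  rpart(y ~ x, data = d, method = if (is.numeric(y)) "anova" else "class")
}

fits <- list(
  cont_cont = fit_cart(iris$Petal.Length, iris$Petal.Width),
  cont_cat  = fit_cart(iris$Petal.Length, iris$Species),
  cat_cont  = fit_cart(iris$Species, iris$Petal.Length),
  cat_cat   = fit_cart(iris$Species, cut(iris$Petal.Width, 3))  # binned width as a factor
)
methods_used <- sapply(fits, function(f) f$method)
methods_used</code></pre>
<p>The same one-line model call covers every combination; only the <code>method</code> argument changes with the type of <span class="math inline">\(y\)</span>.</p>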
<p>Let’s try this approach on the ‘semicircle’ dataset from above. We use CART to predict <span class="math inline">\(y\)</span> using <span class="math inline">\(x\)</span> and here’s how the fitted values look:</p>
<pre class="r"><code># Let's generate the data again
library(ggplot2)
library(rpart)
set.seed(42)
x <- seq(-1, 1, 0.01)
d <- data.frame(x = x,
                y = sqrt(1 - x^2) + rnorm(length(x), mean = 0, sd = 0.05))
# Fit a regression tree and get its in-sample predictions
preds <- predict(rpart(y ~ x, data = d, method = "anova"), type = "vector")
# Plot the data with the CART predictions overlaid
ggplot(data = d, mapping = aes(x = x)) +
  geom_point(aes(y = y), size = 0.5) +
  geom_line(aes(y = preds, color = '2')) +
  scale_color_brewer(name = "", labels = 'CART', palette = "Set1")</code></pre>
<p><img src="/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/index_files/figure-html/unnamed-chunk-4-1.png" width="672" /></p>
<p>Visually, the CART predictions seem to approximate the semi-circular relationship between <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>. To confirm, let’s calculate the <code>x2y</code> metric step by step.</p>
<ul>
<li>The MAE from using the average of <span class="math inline">\(y\)</span> to predict <span class="math inline">\(y\)</span> is 0.19.</li>
<li>The MAE from using the CART predictions to predict <span class="math inline">\(y\)</span> is 0.06.</li>
<li>The % reduction in MAE is 68.88%.</li>
</ul>
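<p>The three numbers above can be computed along the following lines. This is a minimal sketch that assumes the error is measured in-sample on the same data the tree was fitted to; the helper <code>mae()</code> and the variable names are mine, and the actual R script may differ in its details.</p>
<pre class="r"><code>library(rpart)

# Regenerate the semicircle data
set.seed(42)
x <- seq(-1, 1, 0.01)
d <- data.frame(x = x,
                y = sqrt(1 - x^2) + rnorm(length(x), mean = 0, sd = 0.05))

mae <- function(actual, predicted) mean(abs(actual - predicted))

baseline_mae  <- mae(d$y, mean(d$y))                           # step 1: ignore x
cart_preds    <- predict(rpart(y ~ x, data = d, method = "anova"))
cart_mae      <- mae(d$y, cart_preds)                          # step 2: use x
pct_reduction <- 100 * (baseline_mae - cart_mae) / baseline_mae  # step 3

round(c(baseline = baseline_mae, cart = cart_mae, pct_reduction = pct_reduction), 2)</code></pre>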
<p>Excellent!</p>
<p>If you are familiar with CART models, it is straightforward to implement the <code>x2y</code> metric in the Machine Learning environment of your choice. An R implementation is <a href="x2y.R">here</a> and details can be found in the <a href="#appendix">appendix</a> but, for now, I want to highlight two functions from the R script that we will use in the examples below:</p>
<ul>
<li><code>x2y(u, v)</code> calculates the <code>x2y</code> metric between two vectors <span class="math inline">\(u\)</span> and <span class="math inline">\(v\)</span></li>
<li><code>dx2y(d)</code> calculates the <code>x2y</code> metric between all pairs of variables in a dataframe <span class="math inline">\(d\)</span></li>
</ul>
<p><br></p>
</div>
<div id="two-caveats" class="section level3">
<h3>Two Caveats</h3>
<p>Before we demonstrate the <code>x2y</code> metric on a couple of datasets, I want to highlight two aspects of the <code>x2y</code> approach.</p>
<p>Unlike metrics like the correlation coefficient, the <code>x2y</code> metric is <strong>not</strong> symmetric with respect to <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>. The extent to which <span class="math inline">\(x\)</span> can predict <span class="math inline">\(y\)</span> can be different from the extent to which <span class="math inline">\(y\)</span> can predict <span class="math inline">\(x\)</span>. For the semi-circle dataset, <code>x2y(x,y)</code> is 68.88% but <code>x2y(y,x)</code> is only 10.2%.</p>
<p>This shouldn’t come as a surprise, however. Let’s look at the scatterplot again but with the axes reversed.</p>
<pre class="r"><code>ggplot(data = d, mapping = aes(x = y)) +
geom_point(aes(y = x), size = 0.5) +
geom_point(data = d[abs(d$x) < 0.05,], aes(x = y, y = x), color = "orange" ) +
geom_point(data = d[abs(d$y-0.6) < 0.05,], aes(x = y, y = x), color = "red" )</code></pre>
<p><img src="/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/index_files/figure-html/unnamed-chunk-5-1.png" width="672" /></p>
<p>When <span class="math inline">\(x\)</span> is around 0.0, for instance, <span class="math inline">\(y\)</span> is near 1.0 (the orange dots). But when <span class="math inline">\(y\)</span> is around 0.6, <span class="math inline">\(x\)</span> can be near -0.8 <em>or</em> near 0.8 (the red dots), since <span class="math inline">\(\sqrt{1 - 0.8^2} = 0.6\)</span>. Knowing <span class="math inline">\(x\)</span> reduces the uncertainty about the value of <span class="math inline">\(y\)</span> a lot more than knowing <span class="math inline">\(y\)</span> reduces the uncertainty about the value of <span class="math inline">\(x\)</span>.</p>
<p>But there’s an easy solution if you <em>must</em> have a symmetric metric for your application: just take the average of <code>x2y(x,y)</code> and <code>x2y(y,x)</code>.</p>
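<p>A rough sketch of the averaging idea is shown below. Since the <code>x2y()</code> function lives in the separate R script, the sketch uses a hypothetical stand-in, <code>pct_mae_reduction()</code>, that only handles continuous variables and measures the in-sample % reduction in MAE from a CART model.</p>
<pre class="r"><code>library(rpart)

# Hypothetical stand-in for x2y() with continuous inputs: % reduction in MAE
# when a CART model replaces the mean as the predictor.
pct_mae_reduction <- function(x, y) {
  d <- data.frame(x = x, y = y)
  baseline <- mean(abs(y - mean(y)))
  model    <- mean(abs(y - predict(rpart(y ~ x, data = d, method = "anova"))))
  100 * (baseline - model) / baseline
}

set.seed(42)
x <- seq(-1, 1, 0.01)
y <- sqrt(1 - x^2) + rnorm(length(x), mean = 0, sd = 0.05)

forward   <- pct_mae_reduction(x, y)   # how well x predicts y
backward  <- pct_mae_reduction(y, x)   # how well y predicts x
symmetric <- (forward + backward) / 2  # symmetric in x and y by construction
c(forward = forward, backward = backward, symmetric = symmetric)</code></pre>
<p>Keep in mind that averaging hides the direction of the relationship, so it is worth keeping the two one-directional values around as well.</p>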
<p>The second aspect worth highlighting is about the comparability of the <code>x2y</code> metric across variable pairs. All <code>x2y</code> values where the <span class="math inline">\(y\)</span> variable is continuous will be measuring a % reduction in MAE. All <code>x2y</code> values where the <span class="math inline">\(y\)</span> variable is categorical will be measuring a % reduction in Misclassification Error. Is a 30% reduction in MAE equal to a 30% reduction in Misclassification Error? It is problem-dependent; there’s no universal right answer.</p>
<p>On the other hand, since (1) <em>all</em> <code>x2y</code> values are on the same 0-100% scale, (2) they are conceptually measuring the same thing, i.e., reduction in prediction error, and (3) our objective is to quickly scan and identify strongly-related pairs (rather than conduct an in-depth investigation), the <code>x2y</code> approach may be adequate.</p>
<p><br></p>
</div>
<div id="application-to-the-iris-dataset" class="section level3">
<h3>Application to the Iris Dataset</h3>
<p>The <a href="https://en.wikipedia.org/wiki/Iris_flower_data_set">iris flower dataset</a> is iconic in the statistics/ML communities and is widely used to illustrate basic concepts. The dataset consists of 150 observations in total and each observation has four continuous variables - the length and the width of petals and sepals - and a categorical variable indicating the species of iris.</p>
<p>Let’s take a look at 10 randomly chosen rows.</p>
<pre class="r"><code>iris %>% sample_n(10) %>% pander</code></pre>
<table>
<colgroup>
<col width="20%" />
<col width="19%" />
<col width="20%" />
<col width="19%" />
<col width="19%" />
</colgroup>
<thead>
<tr class="header">
<th align="center">Sepal.Length</th>
<th align="center">Sepal.Width</th>
<th align="center">Petal.Length</th>
<th align="center">Petal.Width</th>
<th align="center">Species</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="center">5.9</td>
<td align="center">3</td>
<td align="center">5.1</td>
<td align="center">1.8</td>
<td align="center">virginica</td>
</tr>
<tr class="even">
<td align="center">5.5</td>
<td align="center">2.6</td>
<td align="center">4.4</td>
<td align="center">1.2</td>
<td align="center">versicolor</td>
</tr>
<tr class="odd">
<td align="center">6.1</td>
<td align="center">2.8</td>
<td align="center">4</td>
<td align="center">1.3</td>
<td align="center">versicolor</td>
</tr>
<tr class="even">
<td align="center">5.9</td>
<td align="center">3.2</td>
<td align="center">4.8</td>
<td align="center">1.8</td>
<td align="center">versicolor</td>
</tr>
<tr class="odd">
<td align="center">7.7</td>
<td align="center">2.6</td>
<td align="center">6.9</td>
<td align="center">2.3</td>
<td align="center">virginica</td>
</tr>
<tr class="even">
<td align="center">5.7</td>
<td align="center">4.4</td>
<td align="center">1.5</td>
<td align="center">0.4</td>
<td align="center">setosa</td>
</tr>
<tr class="odd">
<td align="center">6.5</td>
<td align="center">3</td>
<td align="center">5.2</td>
<td align="center">2</td>
<td align="center">virginica</td>
</tr>
<tr class="even">
<td align="center">5.2</td>
<td align="center">2.7</td>
<td align="center">3.9</td>
<td align="center">1.4</td>
<td align="center">versicolor</td>
</tr>
<tr class="odd">
<td align="center">5.6</td>
<td align="center">2.7</td>
<td align="center">4.2</td>
<td align="center">1.3</td>
<td align="center">versicolor</td>
</tr>
<tr class="even">
<td align="center">7.2</td>
<td align="center">3.2</td>
<td align="center">6</td>
<td align="center">1.8</td>
<td align="center">virginica</td>
</tr>
</tbody>
</table>
<p>We can calculate the <code>x2y</code> values for all pairs of variables in <code>iris</code> by running <code>dx2y(iris)</code> in R (details of how to use the <code>dx2y()</code> function are in the <a href="#appendix">appendix</a>).</p>
<pre class="r"><code>dx2y(iris) %>% pander</code></pre>
<table style="width:72%;">
<colgroup>
<col width="20%" />
<col width="20%" />
<col width="19%" />
<col width="11%" />
</colgroup>
<thead>
<tr class="header">
<th align="center">x</th>
<th align="center">y</th>
<th align="center">perc_of_obs</th>
<th align="center">x2y</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="center">Petal.Width</td>
<td align="center">Species</td>
<td align="center">100</td>
<td align="center">94</td>
</tr>
<tr class="even">
<td align="center">Petal.Length</td>
<td align="center">Species</td>
<td align="center">100</td>
<td align="center">93</td>
</tr>
<tr class="odd">
<td align="center">Petal.Width</td>
<td align="center">Petal.Length</td>
<td align="center">100</td>
<td align="center">80.73</td>
</tr>
<tr class="even">
<td align="center">Species</td>
<td align="center">Petal.Length</td>
<td align="center">100</td>
<td align="center">79.72</td>
</tr>
<tr class="odd">
<td align="center">Petal.Length</td>
<td align="center">Petal.Width</td>
<td align="center">100</td>
<td align="center">77.32</td>
</tr>
<tr class="even">
<td align="center">Species</td>
<td align="center">Petal.Width</td>
<td align="center">100</td>
<td align="center">76.31</td>
</tr>
<tr class="odd">
<td align="center">Sepal.Length</td>
<td align="center">Petal.Length</td>
<td align="center">100</td>
<td align="center">66.88</td>
</tr>
<tr class="even">
<td align="center">Sepal.Length</td>
<td align="center">Species</td>
<td align="center">100</td>
<td align="center">62</td>
</tr>
<tr class="odd">
<td align="center">Petal.Length</td>
<td align="center">Sepal.Length</td>
<td align="center">100</td>
<td align="center">60.98</td>
</tr>
<tr class="even">
<td align="center">Sepal.Length</td>
<td align="center">Petal.Width</td>
<td align="center">100</td>
<td align="center">54.36</td>
</tr>
<tr class="odd">
<td align="center">Petal.Width</td>
<td align="center">Sepal.Length</td>
<td align="center">100</td>
<td align="center">48.81</td>
</tr>
<tr class="even">
<td align="center">Species</td>
<td align="center">Sepal.Length</td>
<td align="center">100</td>
<td align="center">42.08</td>
</tr>
<tr class="odd">
<td align="center">Sepal.Width</td>
<td align="center">Species</td>
<td align="center">100</td>
<td align="center">39</td>
</tr>
<tr class="even">
<td align="center">Petal.Width</td>
<td align="center">Sepal.Width</td>
<td align="center">100</td>
<td align="center">31.75</td>
</tr>
<tr class="odd">
<td align="center">Petal.Length</td>
<td align="center">Sepal.Width</td>
<td align="center">100</td>
<td align="center">30</td>
</tr>
<tr class="even">
<td align="center">Sepal.Width</td>
<td align="center">Petal.Length</td>
<td align="center">100</td>
<td align="center">28.16</td>
</tr>
<tr class="odd">
<td align="center">Sepal.Width</td>
<td align="center">Petal.Width</td>
<td align="center">100</td>
<td align="center">23.02</td>
</tr>
<tr class="even">
<td align="center">Species</td>
<td align="center">Sepal.Width</td>
<td align="center">100</td>
<td align="center">22.37</td>
</tr>
<tr class="odd">
<td align="center">Sepal.Length</td>
<td align="center">Sepal.Width</td>
<td align="center">100</td>
<td align="center">18.22</td>
</tr>
<tr class="even">
<td align="center">Sepal.Width</td>
<td align="center">Sepal.Length</td>
<td align="center">100</td>
<td align="center">12.18</td>
</tr>
</tbody>
</table>
<p>The first two columns in the output are self-explanatory. The third column - <code>perc_of_obs</code> - is the % of observations in the dataset that were used to calculate that row’s <code>x2y</code> value. When a dataset has missing values, only observations that have values present for both <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> will be used to calculate the <code>x2y</code> metrics for that variable pair. The <code>iris</code> dataset has no missing values so this value is 100% for all rows. The fourth column is the value of the <code>x2y</code> metric and the results are sorted in descending order of this value.</p>
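<p>To illustrate what <code>perc_of_obs</code> measures, here is a small sketch on made-up data with missing values. The data frame <code>toy</code> is invented for this example, and the calculation is my reading of the description above rather than the code in the R script.</p>
<pre class="r"><code># Made-up data with a missing value in each column
toy <- data.frame(x = c(1.2, NA, 3.4, 4.1),
                  y = c("a", "b", NA, "d"))

keep <- complete.cases(toy$x, toy$y)  # TRUE where both x and y are present
perc_of_obs <- 100 * mean(keep)
perc_of_obs                           # 50: only rows 1 and 4 are complete</code></pre>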
<p>Looking at the numbers, both <code>Petal.Length</code> and <code>Petal.Width</code> seem to be highly associated with <code>Species</code> (and with each other). In contrast, it appears that <code>Sepal.Length</code> and <code>Sepal.Width</code> are very weakly associated with each other.</p>
<p>Note that even though <code>Species</code> is categorical and the other four variables are continuous, we could simply “drop” the <code>iris</code> dataframe into the <code>dx2y()</code> function and calculate the associations between all the variables.</p>
<p><br></p>
</div>
<div id="application-to-a-covid-19-dataset" class="section level3">
<h3>Application to a COVID-19 Dataset</h3>
<p>Next, we examine a <a href="https://github.com/rama100/x2y/blob/main/covid19.csv">COVID-19 dataset</a> that was downloaded from the <a href="https://github.com/mdcollab/covidclinicaldata/">COVID-19 Clinical Data Repository</a> in April 2020. This dataset contains clinical characteristics and COVID-19 test outcomes for 352 patients. Since it has a good mix of continuous and categorical variables, having something like the <code>x2y</code> metric that can work for any type of variable pair is convenient.</p>
<p>Let’s read in the data and take a quick look at the columns.</p>
<pre class="r"><code>df <- read.csv("covid19.csv", stringsAsFactors = FALSE)
str(df) </code></pre>
<pre><code>## 'data.frame': 352 obs. of 45 variables:
## $ date_published : chr "2020-04-14" "2020-04-14" "2020-04-14" "2020-04-14" ...
## $ clinic_state : chr "CA" "CA" "CA" "CA" ...
## $ test_name : chr "Rapid COVID-19 Test" "Rapid COVID-19 Test" "Rapid COVID-19 Test" "Rapid COVID-19 Test" ...
## $ swab_type : chr "" "Nasopharyngeal" "Nasal" "" ...
## $ covid_19_test_results : chr "Negative" "Negative" "Negative" "Negative" ...
## $ age : int 30 77 49 42 37 23 71 28 55 51 ...
## $ high_risk_exposure_occupation: logi TRUE NA NA FALSE TRUE FALSE ...
## $ high_risk_interactions : logi FALSE NA NA FALSE TRUE TRUE ...
## $ diabetes : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ chd : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ htn : logi FALSE TRUE FALSE TRUE FALSE FALSE ...
## $ cancer : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ asthma : logi TRUE TRUE FALSE TRUE FALSE FALSE ...
## $ copd : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ autoimmune_dis : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ temperature : num 37.1 36.8 37 36.9 37.3 ...
## $ pulse : int 84 96 79 108 74 110 78 NA 97 66 ...
## $ sys : int 117 128 120 156 126 134 144 NA 160 98 ...
## $ dia : int 69 73 80 89 67 79 85 NA 97 65 ...
## $ rr : int NA 16 18 14 16 16 15 NA 16 16 ...
## $ sats : int 99 97 100 NA 99 98 96 97 99 100 ...
## $ rapid_flu : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ rapid_flu_results : chr "" "" "" "" ...
## $ rapid_strep : logi FALSE TRUE FALSE FALSE FALSE TRUE ...
## $ rapid_strep_results : chr "" "Negative" "" "" ...
## $ ctab : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ labored_respiration : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ rhonchi : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ wheezes : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ cough : logi FALSE NA TRUE TRUE TRUE TRUE ...
## $ cough_severity : chr "" "" "" "Mild" ...
## $ fever : logi NA NA NA FALSE FALSE TRUE ...
## $ sob : logi FALSE NA FALSE FALSE TRUE TRUE ...
## $ sob_severity : chr "" "" "" "" ...
## $ diarrhea : logi NA NA NA TRUE NA NA ...
## $ fatigue : logi NA NA NA NA TRUE TRUE ...
## $ headache : logi NA NA NA NA TRUE TRUE ...
## $ loss_of_smell : logi NA NA NA NA NA NA ...
## $ loss_of_taste : logi NA NA NA NA NA NA ...
## $ runny_nose : logi NA NA NA NA NA TRUE ...
## $ muscle_sore : logi NA NA NA TRUE NA TRUE ...
## $ sore_throat : logi TRUE NA NA NA NA TRUE ...
## $ cxr_findings : chr "" "" "" "" ...
## $ cxr_impression : chr "" "" "" "" ...
## $ cxr_link : chr "" "" "" "" ...</code></pre>
<p>There are lots of missing values (denoted by ‘NA’) and lots of blanks as well - for example, see the first few values of the <code>rapid_flu_results</code> field above. We will convert the blanks to NAs so that all the missing values can be treated consistently. Also, the rightmost three columns are free-text fields so we will remove them from the dataframe.</p>
<pre class="r"><code>df <- read.csv("covid19.csv",
stringsAsFactors = FALSE,
na.strings=c("","NA") # read in blanks as NAs
)%>%
select(-starts_with("cxr")) # remove the chest x-ray note fields
str(df) </code></pre>
<pre><code>## 'data.frame': 352 obs. of 42 variables:
## $ date_published : chr "2020-04-14" "2020-04-14" "2020-04-14" "2020-04-14" ...
## $ clinic_state : chr "CA" "CA" "CA" "CA" ...
## $ test_name : chr "Rapid COVID-19 Test" "Rapid COVID-19 Test" "Rapid COVID-19 Test" "Rapid COVID-19 Test" ...
## $ swab_type : chr NA "Nasopharyngeal" "Nasal" NA ...
## $ covid_19_test_results : chr "Negative" "Negative" "Negative" "Negative" ...
## $ age : int 30 77 49 42 37 23 71 28 55 51 ...
## $ high_risk_exposure_occupation: logi TRUE NA NA FALSE TRUE FALSE ...
## $ high_risk_interactions : logi FALSE NA NA FALSE TRUE TRUE ...
## $ diabetes : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ chd : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ htn : logi FALSE TRUE FALSE TRUE FALSE FALSE ...
## $ cancer : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ asthma : logi TRUE TRUE FALSE TRUE FALSE FALSE ...
## $ copd : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ autoimmune_dis : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ temperature : num 37.1 36.8 37 36.9 37.3 ...
## $ pulse : int 84 96 79 108 74 110 78 NA 97 66 ...
## $ sys : int 117 128 120 156 126 134 144 NA 160 98 ...
## $ dia : int 69 73 80 89 67 79 85 NA 97 65 ...
## $ rr : int NA 16 18 14 16 16 15 NA 16 16 ...
## $ sats : int 99 97 100 NA 99 98 96 97 99 100 ...
## $ rapid_flu : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ rapid_flu_results : chr NA NA NA NA ...
## $ rapid_strep : logi FALSE TRUE FALSE FALSE FALSE TRUE ...
## $ rapid_strep_results : chr NA "Negative" NA NA ...
## $ ctab : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ labored_respiration : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ rhonchi : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ wheezes : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
## $ cough : logi FALSE NA TRUE TRUE TRUE TRUE ...
## $ cough_severity : chr NA NA NA "Mild" ...
## $ fever : logi NA NA NA FALSE FALSE TRUE ...
## $ sob : logi FALSE NA FALSE FALSE TRUE TRUE ...
## $ sob_severity : chr NA NA NA NA ...
## $ diarrhea : logi NA NA NA TRUE NA NA ...
## $ fatigue : logi NA NA NA NA TRUE TRUE ...
## $ headache : logi NA NA NA NA TRUE TRUE ...
## $ loss_of_smell : logi NA NA NA NA NA NA ...
## $ loss_of_taste : logi NA NA NA NA NA NA ...
## $ runny_nose : logi NA NA NA NA NA TRUE ...
## $ muscle_sore : logi NA NA NA TRUE NA TRUE ...
## $ sore_throat : logi TRUE NA NA NA NA TRUE ...</code></pre>
<p>Now, let’s run it through the <code>x2y</code> approach. We are particularly interested in non-zero associations between the <code>covid_19_test_results</code> field and the other fields so we zero in on those by running <code>dx2y(df, target = "covid_19_test_results")</code> in R (details in the <a href="#appendix">appendix</a>) and filtering out the zero associations.</p>
<pre class="r"><code>dx2y(df, target = "covid_19_test_results") %>%
filter(x2y >0) %>%
pander</code></pre>
<table style="width:86%;">
<colgroup>
<col width="33%" />
<col width="22%" />
<col width="19%" />
<col width="11%" />
</colgroup>
<thead>
<tr class="header">
<th align="center">x</th>
<th align="center">y</th>
<th align="center">perc_of_obs</th>
<th align="center">x2y</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="center">covid_19_test_results</td>
<td align="center">loss_of_smell</td>
<td align="center">21.88</td>
<td align="center">18.18</td>
</tr>
<tr class="even">
<td align="center">covid_19_test_results</td>
<td align="center">loss_of_taste</td>
<td align="center">22.73</td>
<td align="center">12.5</td>
</tr>
<tr class="odd">
<td align="center">covid_19_test_results</td>
<td align="center">sats</td>
<td align="center">92.9</td>
<td align="center">2.24</td>
</tr>
</tbody>
</table>
<p>Only <em>three</em> of the 41 variables have a non-zero association with <code>covid_19_test_results</code>. Disappointingly, the highest <code>x2y</code> value is an unimpressive 18%. It is based on just 22% of the observations (since the other 78% of observations had missing values) and makes one wonder if this modest association is real or if it is just due to chance.</p>
<p>If we were working with the correlation coefficient, we could easily calculate a <em>confidence interval</em> for it and gauge if what we are seeing is real or not. Can we do the same thing for the <code>x2y</code> metric?</p>
<p>We can, by using <a href="https://en.wikipedia.org/wiki/Bootstrapping_(statistics)">bootstrapping</a>. Given <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>, we can resample the observations with replacement 1000 times (say) and calculate the <code>x2y</code> metric each time. With these 1000 numbers, we can construct a confidence interval easily (this is available as an optional <code>confidence</code> argument in the R functions we have been using; please see the <a href="#appendix">appendix</a>).</p>
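<p>The bootstrap idea can be sketched as follows. I am assuming a simple percentile interval here, and the helpers <code>pct_mae_reduction()</code> and <code>boot_ci()</code> are hypothetical stand-ins that handle continuous variables only; the <code>confidence</code> argument in the R script may be implemented differently.</p>
<pre class="r"><code>library(rpart)

# Hypothetical stand-in for x2y() with continuous inputs: % reduction in MAE
# when a CART model replaces the mean as the predictor.
pct_mae_reduction <- function(x, y) {
  d <- data.frame(x = x, y = y)
  baseline <- mean(abs(y - mean(y)))
  model    <- mean(abs(y - predict(rpart(y ~ x, data = d, method = "anova"))))
  100 * (baseline - model) / baseline
}

# Percentile bootstrap (an assumption): resample (x, y) pairs with
# replacement, recompute the metric, and take the empirical quantiles.
boot_ci <- function(x, y, n_boot = 1000, level = 0.95) {
  stats <- replicate(n_boot, {
    idx <- sample(length(x), replace = TRUE)
    pct_mae_reduction(x[idx], y[idx])
  })
  alpha <- (1 - level) / 2
  quantile(stats, probs = c(alpha, 1 - alpha), names = FALSE)
}

set.seed(42)
x <- seq(-1, 1, 0.01)
y <- sqrt(1 - x^2) + rnorm(length(x), mean = 0, sd = 0.05)
ci <- boot_ci(x, y, n_boot = 200)  # fewer replicates to keep the example quick
ci</code></pre>
<p>Note that the pairs are resampled together, using the same indices for <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>, so that the relationship between them is preserved in every bootstrap sample.</p>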
<p>Let’s re-do the earlier calculation with “confidence intervals” turned on by running <code>dx2y(df, target = "covid_19_test_results", confidence = TRUE)</code> in R.</p>
<pre class="r"><code>dx2y(df, target = "covid_19_test_results", confidence = TRUE) %>%
filter(x2y >0) %>%
pander(split.tables = Inf)</code></pre>
<table>
<colgroup>
<col width="26%" />
<col width="17%" />
<col width="15%" />
<col width="8%" />
<col width="15%" />
<col width="15%" />
</colgroup>
<thead>
<tr class="header">
<th align="center">x</th>
<th align="center">y</th>
<th align="center">perc_of_obs</th>
<th align="center">x2y</th>
<th align="center">CI_95_Lower</th>
<th align="center">CI_95_Upper</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="center">covid_19_test_results</td>
<td align="center">loss_of_smell</td>
<td align="center">21.88</td>
<td align="center">18.18</td>
<td align="center">-8.08</td>
<td align="center">36.36</td>
</tr>
<tr class="even">
<td align="center">covid_19_test_results</td>
<td align="center">loss_of_taste</td>
<td align="center">22.73</td>
<td align="center">12.5</td>
<td align="center">-11.67</td>
<td align="center">25</td>
</tr>
<tr class="odd">
<td align="center">covid_19_test_results</td>
<td align="center">sats</td>
<td align="center">92.9</td>
<td align="center">2.24</td>
<td align="center">-1.85</td>
<td align="center">4.48</td>
</tr>
</tbody>
</table>
<p><em>The 95% confidence intervals all contain 0.0</em>, so we cannot conclude that any of these associations is real.</p>
<p>Let’s see what the top 10 associations are, between <em>any</em> pair of variables.</p>
<pre class="r"><code>dx2y(df) %>% head(10) %>% pander</code></pre>
<table style="width:75%;">
<colgroup>
<col width="22%" />
<col width="22%" />
<col width="19%" />
<col width="11%" />
</colgroup>
<thead>
<tr class="header">
<th align="center">x</th>
<th align="center">y</th>
<th align="center">perc_of_obs</th>
<th align="center">x2y</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="center">loss_of_smell</td>
<td align="center">loss_of_taste</td>
<td align="center">20.17</td>
<td align="center">100</td>
</tr>
<tr class="even">
<td align="center">loss_of_taste</td>
<td align="center">loss_of_smell</td>
<td align="center">20.17</td>
<td align="center">100</td>
</tr>
<tr class="odd">
<td align="center">fatigue</td>
<td align="center">headache</td>
<td align="center">40.06</td>
<td align="center">90.91</td>
</tr>
<tr class="even">
<td align="center">headache</td>
<td align="center">fatigue</td>
<td align="center">40.06</td>
<td align="center">90.91</td>
</tr>
<tr class="odd">
<td align="center">fatigue</td>
<td align="center">sore_throat</td>
<td align="center">27.84</td>
<td align="center">89.58</td>
</tr>
<tr class="even">
<td align="center">headache</td>
<td align="center">sore_throat</td>
<td align="center">30.4</td>
<td align="center">89.36</td>
</tr>
<tr class="odd">
<td align="center">sore_throat</td>
<td align="center">fatigue</td>
<td align="center">27.84</td>
<td align="center">88.89</td>
</tr>
<tr class="even">
<td align="center">sore_throat</td>
<td align="center">headache</td>
<td align="center">30.4</td>
<td align="center">88.64</td>
</tr>
<tr class="odd">
<td align="center">runny_nose</td>
<td align="center">fatigue</td>
<td align="center">25.57</td>
<td align="center">84.44</td>
</tr>
<tr class="even">
<td align="center">runny_nose</td>
<td align="center">headache</td>
<td align="center">25.57</td>
<td align="center">84.09</td>
</tr>
</tbody>
</table>
<p>Interesting. <code>loss_of_smell</code> and <code>loss_of_taste</code> are <em>perfectly</em> associated with each other. Let’s look at the raw data.</p>
<pre class="r"><code>with(df, table(loss_of_smell, loss_of_taste))</code></pre>
<pre><code>## loss_of_taste
## loss_of_smell FALSE TRUE
## FALSE 55 0
## TRUE 0 16</code></pre>
<p>They agree for <em>every</em> observation in the dataset and, as a result, their <code>x2y</code> is 100%.</p>
<p>Moving down the <code>x2y</code> ranking, we see a number of variables - <code>fatigue</code>, <code>headache</code>, <code>sore_throat</code>, and <code>runny_nose</code> - that are <em>all strongly associated with each other</em>, as if they are all connected by a common cause.</p>
<p>When the number of variable combinations is high and there are lots of missing values, it can be helpful to scatterplot <code>x2y</code> vs <code>perc_of_obs</code>.</p>
<pre class="r"><code>ggplot(data = dx2y(df), aes(y=x2y, x = perc_of_obs)) +
geom_point()</code></pre>
<pre><code>## Warning: Removed 364 rows containing missing values (geom_point).</code></pre>
<p><img src="/2021/04/15/an-alternative-to-the-correlation-coefficient-that-works-for-numeric-and-categorical-variables/index_files/figure-html/unnamed-chunk-14-1.png" width="672" /></p>
<p>Unfortunately, the top-right quadrant is empty: there are no strongly-related variable pairs that are based on at least 50% of the observations. There <em>are</em> some variable pairs with <code>x2y</code> values > 75% but none of them are based on more than 40% of the observations.</p>
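<p>The same check can be done numerically rather than by eye. The sketch below hand-copies five rows from the top-10 table above into a data frame (the name <code>top</code> and the 75% and 50% cut-offs are choices made for this illustration); with the real results you would start from the output of <code>dx2y(df)</code> instead.</p>
<pre class="r"><code># Five rows copied by hand from the top-10 table (illustration only).
top <- data.frame(
  x = c("loss_of_smell", "fatigue", "fatigue", "headache", "runny_nose"),
  y = c("loss_of_taste", "headache", "sore_throat", "sore_throat", "fatigue"),
  perc_of_obs = c(20.17, 40.06, 27.84, 30.4, 25.57),
  x2y = c(100, 90.91, 89.58, 89.36, 84.44)
)

strong <- subset(top, x2y > 75)                      # strongly related pairs
well_supported <- subset(strong, perc_of_obs >= 50)  # ... based on most of the data

nrow(strong)             # number of strong pairs in this extract
nrow(well_supported)     # number that are also well supported
max(strong$perc_of_obs)  # best coverage among the strong pairs</code></pre>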
<p><br></p>
</div>
<div id="conclusion" class="section level3">
<h3>Conclusion</h3>
<p>Using an insight from Information Theory, we devised a new metric - the <code>x2y</code> metric - that quantifies the strength of the association between pairs of variables.</p>
<p>The <code>x2y</code> metric has several advantages:</p>
<ul>
<li>It works for all types of variable pairs (continuous-continuous, continuous-categorical, categorical-continuous and categorical-categorical)</li>
<li>It captures linear and non-linear relationships</li>
<li>Perhaps best of all, it is easy to understand and use.</li>
</ul>
<p>I hope you give it a try in your work.</p>
<p>(If you found this note helpful, you may find <a href="https://rama100.github.io/lecture-notes/">these</a> of interest)</p>
<p><br></p>
</div>
<div id="acknowledgements" class="section level3">
<h3>Acknowledgements</h3>
<p>Thanks to <a href="https://mitsloan.mit.edu/faculty/directory/amr-farahat">Amr Farahat</a> for helpful feedback on an earlier draft.</p>
<p><br></p>
</div>
<div id="appendix" class="section level3">
<h3>Appendix: How to use the R script</h3>
<p>The <a href="https://github.com/rama100/x2y/blob/main/x2y.R">R script</a> depends on two R packages - <code>rpart</code> and <code>dplyr</code> - so please ensure that they are installed in your environment.</p>
<p>The script has two key functions: <code>x2y()</code> and <code>dx2y()</code>.</p>
<p><br></p>
<div id="using-the-x2y-function" class="section level4">
<h4>Using the <code>x2y()</code> function</h4>
<p><em>Usage</em>: <code>x2y(u, v, confidence = FALSE)</code></p>
<p><em>Arguments</em>:</p>
<ul>
<li><code>u</code>, <code>v</code>: two vectors of equal length</li>
<li><code>confidence</code>: (OPTIONAL) a boolean that indicates if a confidence interval is needed. Default is FALSE.</li>
</ul>
<p><em>Value</em>: A list with the following elements:</p>
<ul>
<li><code>perc_of_obs</code>: the % of total observations that were used to calculate <code>x2y</code>. If some observations are missing for either <span class="math inline">\(u\)</span> or <span class="math inline">\(v\)</span>, this will be less than 100%.</li>
<li><code>x2y</code>: the <code>x2y</code> metric for using <span class="math inline">\(u\)</span> to predict <span class="math inline">\(v\)</span></li>
</ul>
<p>Additionally, if <code>x2y()</code> was called with <code>confidence = TRUE</code>:</p>
<ul>
<li><code>CI_95_Lower</code>: the lower end of a 95% confidence interval for the <code>x2y</code> metric estimated by <a href="https://en.wikipedia.org/wiki/Bootstrapping_(statistics)">bootstrapping</a> 1000 samples</li>
<li><code>CI_95_Upper</code>: the upper end of a 95% confidence interval for the <code>x2y</code> metric estimated by bootstrapping 1000 samples</li>
</ul>
<p><br></p>
</div>
<div id="using-the-dx2y-function" class="section level4">
<h4>Using the <code>dx2y()</code> function</h4>
<p><em>Usage</em>: <code>dx2y(d, target = NA, confidence = FALSE)</code></p>
<p><em>Arguments</em>:</p>
<ul>
<li><code>d</code>: a dataframe</li>
<li><code>target</code>: (OPTIONAL) if you are only interested in the <code>x2y</code> values between a <em>particular variable</em> in <code>d</code> and all other variables, set <code>target</code> equal to the name of the variable you are interested in. Default is NA.</li>
<li><code>confidence</code>: (OPTIONAL) a boolean that indicates if a confidence interval is needed. Default is FALSE.</li>
</ul>
<p><em>Value</em>: A dataframe with each row containing the output of running <code>x2y(u, v, confidence)</code> for <code>u</code> and <code>v</code> chosen from the dataframe. Since this is just a standard R dataframe, it can be sliced, sorted, filtered, plotted etc.</p>
<p><strong>Update on April 16, 2021</strong>: I learned from a commenter that a <a href="https://paulvanderlaken.com/2020/05/04/predictive-power-score-finding-patterns-dataset/">similar approach</a> was proposed in April 2020, and that the R package <a href="https://cran.r-project.org/package=ppsr">ppsr</a> which implements that approach is now available on CRAN.</p>
</div>
</div>
COVID-19 Data Forum: Data Journalism
https://rviews.rstudio.com/2021/04/06/covid-19-data-forum-data-journalism/
Tue, 06 Apr 2021 00:00:00 +0000https://rviews.rstudio.com/2021/04/06/covid-19-data-forum-data-journalism/
<p>The <a href="https://covid19-data-forum.org/">COVID-19 Data Forum</a>, a joint project of the Stanford Data Science Institute and the R Consortium, is an ongoing series of multidisciplinary webinars where topic experts discuss data-related aspects of the scientific response to the pandemic. The most recent event, held on March 18, 2021, explored the role of data journalism in the pandemic. This was a bit of a departure from previous forum events<sup>1</sup> because it focused on issues relating to using and interpreting COVID-19 data, and not on the particular kinds of COVID-19 related data that are available.</p>
<p>I think you will find the <a href="https://www.youtube.com/watch?v=Wh-GynBeEsQ">webinar video</a> worth watching. If you are a statistician or epidemiologist working on COVID-19, you may find the data journalists’ accounts of difficulties they faced working with COVID data and statistical models instructive. But, even if you are not directly working on COVID, you may find that listening to the journalists fills in some gaps between what you know about statistics and data visualizations and what you see in the news.</p>
<p>The data journalism event was moderated by <a href="https://twitter.com/irenatfh?lang=en">Dr. Irena Hwang</a>, a data reporter at ProPublica. Speakers included
<a href="https://journalism.columbia.edu/faculty/mark-hansen">Dr. Mark Hansen</a>, David and Helen Gurley Brown Professor of Journalism and Innovation at Columbia University; <a href="https://twitter.com/anarina?lang=en">Ana Carolina Moreno</a>, a senior data journalist at TV Globo in São Paulo, Brazil; and
<a href="https://twitter.com/meghanhoyer?lang=en">Meghan Hoyer</a>, Director of Data Reporting at the Washington Post.</p>
<p>The video of the data journalism event is <a href="https://www.youtube.com/watch?v=Wh-GynBeEsQ">available here</a>. The following short time map and the times referenced in my comments below should be helpful for browsing the ninety-minute event.</p>
<ul>
<li><strong>2:37</strong> Irena Hwang introduces Mark Hansen</li>
<li><strong>3:50</strong> Start of Mark’s talk</li>
<li><strong>19:30</strong> Irena introduces Ana Carolina Moreno (Carol)</li>
<li><strong>21:10</strong> Start of Carol’s talk</li>
<li><strong>39:20</strong> Irena introduces Meghan Hoyer</li>
<li><strong>40:00</strong> Start of Meghan’s talk</li>
<li><strong>1:01:40</strong> Start of discussion</li>
</ul>
<h3 id="mark-hansen">Mark Hansen</h3>
<p>In his talk, Mark offers an overview of the profession of data journalism that provides some historical context and emphasizes the hybrid nature of the practice, which blends a hard-nosed detective’s drive to uncover facts with the empathy to tell stories “about who we are and how we live”.</p>
<p><strong>7:00</strong> Mark introduces Joseph Pulitzer’s 1904 paper <a href="https://www.jstor.org/stable/25119561?refreqid=excelsior%3A9216a1bfa7873dae49d35beff9b2b01d&seq=33#metadata_info_tab_contents">The College of Journalism</a> in which Pulitzer includes Statistics as a subject journalists should study. On page 673, Pulitzer writes:</p>
<blockquote>
<p>You want statistics to tell you the truth. You can find truth there if you know how to get at it, and romance, human interest, humor and fascinating revelations as well.</p>
</blockquote>
<p><strong>10:19</strong> Mark describes a piece, <a href="https://www.cjr.org/first_person/journalism-notebooks.php"><em>An ode to reporter’s notebooks</em></a>, published by Philip Eil in the <em>Columbia Journalism Review</em>, that offers a personal account of reporting. Eil writes:</p>
<blockquote>
<p>To report is to be alert and alive at a particular time and place.</p>
</blockquote>
<p><img src="mark.png" height = "400" width="600"></p>
<p><strong>11:00</strong> Mark remarks:</p>
<blockquote>
<p>when we’re thinking about bringing computation to journalism we are taking that basic curiosity that we are cultivating in our students minds … and adding computational lines of inquiry to that habit of mind, that questioning why things look the way they do…</p>
</blockquote>
<p><strong>12:08</strong> Mark calls attention to the report by Charles Berret and Cheryl Phillips <a href="https://journalism.columbia.edu/system/files/content/teaching_data_and_computational_journalism.pdf"><em>Teaching Data And Computational Journalism</em></a> and describes some recent activities of the <a href="https://brown.columbia.edu/">Brown Institute</a> at the Columbia School of Journalism.</p>
<h3 id="ana-carolina-moreno">Ana Carolina Moreno</h3>
<p><strong>22:32</strong> Carol introduces Brazil’s universal healthcare system and shows a schematic of the available official and unofficial COVID-19 data sources.</p>
<p><strong>26:00</strong> Carol notes that a platform originally built to track SARS data was adapted to track COVID.</p>
<p><strong>27:38</strong> Carol explains that, in practice, there are many obstacles making it difficult to obtain the data necessary to understand how the pandemic is developing. Some of these are called out in the following slide:</p>
<p><img src="c2.png" height = "400" width="600"></p>
<p><strong>30:37</strong> Carol remarks that hospital data seems to be the most reliable.</p>
<p><strong>31:06</strong> Carol describes how the government changed its policy for reporting deaths. The new scheme of only reporting deaths that have been confirmed in the past twenty-four hours vastly undercounts the current death rate.</p>
<p><strong>31:57</strong> In an effort to obtain more reliable data, a consortium of competing journalists at local news organizations began cooperating by sharing information directly obtained from hospitals every day.</p>
<p><strong>32:35</strong> Carol provides a view of day-to-day journalism at the local news organizations and describes how data journalists scrape data on a daily basis to populate dashboards showing rolling averages and daily indicators. By focusing on the more reliable hospitalization data, journalists are doing their best to track the spread of the pandemic and expose inequities in the health care system.</p>
<h3 id="meghan-hoyer">Meghan Hoyer</h3>
<p><strong>40:07</strong> Meghan begins her walk-through of what last year was like for data journalists who were trying to tell the story of the pandemic in real time as it was happening.</p>
<p><strong>41:22</strong> Meghan recounts her experiences trying to make sense of COVID-19 models and expresses the frustration she and other data journalists felt with the multitude of contradictory predictive models.</p>
<p><strong>44:03</strong> In a memorable quote, Meghan remarks:</p>
<blockquote>
<p>Models were inherently problematic and yet they were being forced upon us by society…</p>
</blockquote>
<p><img src="models.png" height = "400" width="600"></p>
<p>Consequently, journalists at the AP agreed that they would not base stories on models.</p>
<p>In the absence of reliable case data, and wanting nothing to do with the models, Meghan explains that data journalists turned to whatever data they could get their hands on to quantify the story of the pandemic.</p>
<p><strong>46:00</strong> Meghan recounts how journalists used garbage pickup data as a proxy for population density to estimate where people were living in NYC and correlate it with case data.</p>
<p><img src="garbage.png" height = "400" width="600"></p>
<p><strong>47:30</strong> Journalists struggled to find data to verify the anecdotal stories they were hearing about the disparities in who was being affected by the virus. Finding that one quarter to one third of the COVID case data was missing information on race, data journalists “hand collected” data by looking city by city to find the missing data.</p>
<p><strong>50:30</strong> Meghan recounts how they turned to age-adjusted data to determine the impact of the virus on communities of color.</p>
<p><strong>52:10</strong> Data journalists find that excess deaths is a reliable metric for determining the impact of what is happening on the ground.</p>
<p><strong>54:18</strong> Journalists developed a survey which was returned by seven hundred schools to investigate how going back to school might be affecting students. Among their findings was that districts serving students of color were more likely to start online.</p>
<p><strong>56:35</strong> Meghan discusses the <a href="https://covidtracking.com/">COVID Tracking Project</a> and the effort to sort out the impact of test positivity rates. She reports that because not all states measure the number of people who test in the same way, correctly comparing test positivity rates among states remains an unsolved problem.</p>
<p><strong>58:33</strong> Meghan shares the need to “flip the numbers” to help people understand the meaning of statistics stated in terms of very large numbers. For example, saying that “Since January of last year at least 1 in 15 people who live in Alexandria, Virginia have been infected by the virus” is easier for people to understand than something like: “On March 17th there were 14 cases per 100,000 in Alexandria”.</p>
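<p>The arithmetic behind “flipping the numbers” is simple enough to sketch in a few lines of R. The two helper functions below are hypothetical names introduced for this illustration. Note that the two figures quoted above measure different things (cumulative infections versus cases on a single day), so the conversions only restate each figure in the other format; they do not make the two comparable.</p>
<pre class="r"><code># Hypothetical helpers for restating a rate in a more readable form.
per_100k_to_one_in <- function(rate_per_100k) round(1e5 / rate_per_100k)
one_in_to_per_100k <- function(n) 1e5 / n

per_100k_to_one_in(14)  # 14 per 100,000 is roughly 1 in 7143
one_in_to_per_100k(15)  # 1 in 15 is roughly 6667 per 100,000</code></pre>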
<p><strong>59:53</strong> Vaccination tracking is another problematic data reporting area. Not only are vaccinations reported differently from state-to-state, but the data that is reported is changing from day-to-day. The CDC is apparently still adding new fields to the vaccination data sets.</p>
<h3 id="the-q-a-discussion">The Q & A Discussion</h3>
<p><strong>1:02</strong> The question and answer discussion begins.</p>
<p><strong>1:02:56</strong> Mark talks about how visualizations evolved over the course of the pandemic.</p>
<p><strong>1:06:08</strong> Carol and then Meghan talk about the lessons the pandemic taught data journalists about competition and collaboration.</p>
<p><strong>1:10:04</strong> Meghan describes how during the pandemic data journalists became advocates for public data.</p>
<p><strong>1:11:21</strong> Carol answers a question about the opportunities for data journalism in Brazil.</p>
<p><strong>1:15:50</strong> A panelist answers a question about how academia is supporting data journalism during the pandemic and mentions an effort to have statistical and scientific experts collaborate with data journalists.</p>
<p><strong>1:20:19</strong> Meghan responds to a question about technical and social challenges for data journalists during the pandemic.</p>
<p><strong>1:23:10</strong> Carol talks about the difference between reporting online news and television news.</p>
<p><strong>1:26:01</strong> Mark answers a question about communicating emotional impact in COVID reporting and ends by emphasizing the importance of communicating honestly about what we do and do not know.</p>
<p><sup>1</sup>The <a href="https://www.youtube.com/watch?v=6N1p99bLXjk">first forum</a> on May 14, 2020 focused on the data needs and challenges of modeling and controlling the spread of COVID-19. The <a href="https://www.youtube.com/watch?v=mEsDzwIMDz8">second forum</a> on August 13, 2020 explored what was being done to make clinical data available and useful. The <a href="https://www.youtube.com/watch?v=Blab8omzrb8">third forum</a> on December 10, 2020 discussed the role of mobility data.</p>
What does it take to do a t-test?
https://rviews.rstudio.com/2021/03/29/what-does-it-take-to-do-a-t-test/
Mon, 29 Mar 2021 00:00:00 +0000https://rviews.rstudio.com/2021/03/29/what-does-it-take-to-do-a-t-test/
<script src="/2021/03/29/what-does-it-take-to-do-a-t-test/index_files/header-attrs/header-attrs.js"></script>
<p>In this post, I examine the fundamental assumption of independence underlying the basic <a href="https://en.wikipedia.org/wiki/Student%27s_t-test">Independent two-sample t-test</a> for comparing the means of two random samples. In addition to independence, we assume that both samples are draws from normal distributions where the population means and common variance are unknown. I am going to assume that you are familiar with this kind of test, but even if you are not, you are still in the right place. The references at the end of the post all provide rigorous but gentle explanations that should be very helpful.</p>
<div id="the-two-sample-t-test" class="section level3">
<h3>The two sample t-test</h3>
<p>Typically, we have independent samples for some numeric variable of interest (say the concentration of a drug in the blood stream) from two different groups, and we would like to know whether it is likely that the two groups differ with respect to this variable. The formal test of the null hypothesis, <span class="math inline">\(H_0\)</span>, that the means of the underlying populations from which the samples are drawn are equal, proceeds by making some assumptions:</p>
<ol style="list-style-type: decimal">
<li><span class="math inline">\(H_0\)</span> is true</li>
<li>The samples are independent</li>
<li>The data are normally distributed</li>
<li>The variances of the two samples are equal (This is the simplest test.)</li>
</ol>
<p>Next, a test statistic that includes the difference between the two sample means is calculated, and a decision is made to establish a “rejection region” for the test statistic. This region depends on the particular circumstances of the test, and is selected to balance the error of rejecting <span class="math inline">\(H_0\)</span> when it is true against the error of not rejecting <span class="math inline">\(H_0\)</span> when it is false. If we compute the test statistic and its value does not fall in the rejection region, then we do not reject <span class="math inline">\(H_0\)</span> and we conclude that we have found nothing. On the other hand, if the test statistic does fall in the rejection region, then we reject <span class="math inline">\(H_0\)</span> and conclude that our data, along with the bundle of assumptions we made in setting up the test and the “steel trap” logic of the t-test itself, provide some evidence that the population means are different. (Page 6 of the MIT Open Courseware notes <a href="https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading18.pdf">Null Hypothesis Significance Testing II</a> contains an elegantly concise mathematical description of the t-test.)</p>
<p>All of the above assumptions must hold, or be pretty close to holding, for the test to give an accurate result. However, in my opinion, from the point of view of statistical practice, assumption 2. is fundamental. There are other tests and workarounds for the situations where 4. doesn’t hold. Assumption 3. is very important, but it is relatively easy to check, and the t-test is robust enough to deal with some deviation from normality. Of course, assumption 1. is important. The whole test depends on it, but this assumption is baked into the software that will run the test.</p>
</div>
<div id="independence" class="section level3">
<h3>Independence</h3>
<p>Independence, on the other hand, can be a show stopper. Checking for independence is the difference between doing statistics and carrying out a mathematical or maybe just a mechanical exercise. It often involves considerable creative thinking and tedious legwork.</p>
<p>So, what do we mean by independent samples or independent data, and how do we go about verifying it? Independence is a mathematical idea, an abstraction from probability theory. Two events A and B are said to be independent events if the probability of both A and B happening equals the product of the probabilities of A and B happening. That is: P(AB) = P(A)P(B).</p>
<p>A more intuitive way to think about it is in terms of conditional probability. In general, the probability of A happening given that B happens is defined to be:</p>
<blockquote>
<p><span class="math inline">\(P(A \mid B) = P(A \cap B) / P(B)\)</span></p>
</blockquote>
<p>If A and B are independent then P(A|B) = P(A). That is: B has no influence on whether A happens.</p>
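<p>Both characterizations can be checked by brute force on a small example. The sketch below is an illustration chosen for this purpose, not something from the references: it enumerates the 36 equally likely outcomes of two fair dice and takes A to be “the first die is even” and B to be “the sum is 7”.</p>
<pre class="r"><code># All 36 equally likely outcomes of rolling two fair dice.
outcomes <- expand.grid(d1 = 1:6, d2 = 1:6)

A <- outcomes$d1 %% 2 == 0               # event A: first die is even
B <- outcomes$d1 + outcomes$d2 == 7      # event B: the sum is 7

p_A  <- mean(A)
p_B  <- mean(B)
p_AB <- mean(A & B)                      # probability that both happen
p_A_given_B <- p_AB / p_B                # definition of conditional probability

isTRUE(all.equal(p_AB, p_A * p_B))       # product rule holds
isTRUE(all.equal(p_A_given_B, p_A))      # knowing B does not change P(A)</code></pre>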
<p>“Independent data” or “independent samples” are both shorthand for data sampled or otherwise resulting from independent probability distributions. Relating the mathematical concept to a real world situation requires a clear idea of the population of interest, considerable domain expertise, and a mental sleight of hand that is nicely exposed in the short article <a href="https://support.minitab.com/en-us/minitab/19/help-and-how-to/statistics/basic-statistics/supporting-topics/tests-of-means/what-are-independent-samples/">What are independent samples?</a>, by the Minitab® folks. They write:</p>
<blockquote>
<p>Independent samples are samples that are selected randomly so that its observations do not depend on the values other observations.</p>
</blockquote>
<p>Notice what is happening here: what started out as a property of probability distributions has now become a prescription for obtaining data in a way that makes it plausible that we can assume independence for the probability distributions that we imagine govern our data. This is a real magic trick. No procedure for selecting data is ever going to guarantee the mathematical properties of our models. Nevertheless, the statement does show the way to proceed. By systematically tracking down all possibilities for interaction within the sampling process and eliminating the possibilities for one sample to influence another, it may be possible to become reasonably confident that the samples can be treated as independent. Because the math says that <a href="http://athenasc.com/Bivariate-Normal.pdf">independent data are not correlated</a>, much of the exploratory data analysis involves looking for correlations that would signal dependent data. The Minitab® authors make this clear in the <a href="https://support.minitab.com/en-us/minitab/19/help-and-how-to/statistics/basic-statistics/supporting-topics/tests-of-means/what-are-independent-samples/">example</a> they offer to illustrate their definition.</p>
<blockquote>
<p>For example, suppose quality inspectors want to compare two laboratories to determine whether their blood tests give similar results. They send blood samples drawn from the same 10 children to both labs for analysis. Because both labs tested blood specimens from the same 10 children, the test results are not independent. To compare the average blood test results from the two labs, the inspectors would need to do a paired t-test, which is based on the assumption that samples are dependent.</p>
</blockquote>
<blockquote>
<p>To obtain independent samples, the inspectors would need to randomly select and test 10 children using Lab A and then randomly select and test a different group of 10 different children using Lab B. Then they could compare the average blood test results from the two labs using a 2-sample t-test, which is based on the assumption that samples are independent.</p>
</blockquote>
<p>Nicely said, and to further make their point, I am sure that the authors would agree that if it somehow turned out that the children from Lab B happened to be the identical twins of the children in Lab A, they still would not have independent samples.</p>
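<p>To make the two-labs example concrete, here is a small simulated sketch. All of the numbers (a true level with mean 100 and standard deviation 15, lab measurement noise with standard deviation 2) are made up for illustration. Because both labs measure the same 10 children, their results are strongly correlated, and the paired and two-sample tests treat the data very differently.</p>
<pre class="r"><code>set.seed(42)
n <- 10
true_level <- rnorm(n, mean = 100, sd = 15)  # each child's underlying value (made up)
lab_a <- true_level + rnorm(n, sd = 2)       # Lab A measurement of the same children
lab_b <- true_level + rnorm(n, sd = 2)       # Lab B measurement of the same children

r <- cor(lab_a, lab_b)                              # correlation signals dependence
paired   <- t.test(lab_a, lab_b, paired = TRUE)     # appropriate for this design
unpaired <- t.test(lab_a, lab_b, var.equal = TRUE)  # wrongly assumes independence

c(correlation   = r,
  df_paired     = unname(paired$parameter),
  df_two_sample = unname(unpaired$parameter))</code></pre>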
</div>
<div id="what-happens-when-samples-are-not-independent" class="section level3">
<h3>What happens when samples are not independent</h3>
<p>The following example illustrates the consequences of performing a t-test when the independence assumption does not hold. We adapt a method of <a href="https://blog.revolutionanalytics.com/2016/08/simulating-form-the-bivariate-normal-distribution-in-r-1.html">simulating a bivariate normal distribution</a> that produces two dependent samples with a specified correlation.</p>
<pre class="r"><code>library(tidyverse)
library(ggfortify)
set.seed(9999)</code></pre>
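<p>First, we simulate two uncorrelated samples with 20 observations each and run a two-sided t-test with equal variances. As you would expect, the test output shows 38 degrees of freedom. For this particular seed, however, the p-value is about 0.04: even when <span class="math inline">\(H_0\)</span> is true, roughly 5% of such tests will fall below the 0.05 threshold by chance.</p>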
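<pre class="r"><code># Draw two samples of size n from a bivariate normal distribution with
# correlation rho, then run a pooled-variance (Student) two-sample t-test.
rbvn_t <- function(n = 20, mu1 = 1, s1 = 4, mu2 = 1, s2 = 4, rho = 0) {
  X <- rnorm(n, mu1, s1)
  # Y is drawn from its conditional distribution given X
  Y <- rnorm(n, mu2 + (s2 / s1) * rho * (X - mu1), sqrt((1 - rho^2) * s2^2))
  t.test(X, Y, mu = 0, alternative = "two.sided", var.equal = TRUE)
}
rbvn_t()</code></pre>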
<pre><code>##
## Two Sample t-test
##
## data: X and Y
## t = 2.1, df = 38, p-value = 0.04
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.06266 5.06516
## sample estimates:
## mean of x mean of y
## 2.9333 0.3694</code></pre>
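<p>The simulation function never builds a covariance matrix explicitly. It relies on the fact that, for a bivariate normal pair, <span class="math inline">\(Y\)</span> given <span class="math inline">\(X = x\)</span> is normal with mean <span class="math inline">\(\mu_2 + (\sigma_2/\sigma_1)\rho(x - \mu_1)\)</span> and variance <span class="math inline">\((1 - \rho^2)\sigma_2^2\)</span>. The sketch below checks, on one large sample, that this construction produces roughly the requested correlation and marginal standard deviation. The helper name <code>rbvn_pair()</code> is introduced here only for illustration.</p>
<pre class="r"><code># Same construction as in rbvn_t(), but returning the samples themselves.
rbvn_pair <- function(n, mu1 = 1, s1 = 4, mu2 = 1, s2 = 4, rho = 0) {
  X <- rnorm(n, mu1, s1)
  Y <- rnorm(n, mu2 + (s2 / s1) * rho * (X - mu1), sqrt((1 - rho^2) * s2^2))
  data.frame(X = X, Y = Y)
}

set.seed(123)
d <- rbvn_pair(1e5, rho = 0.3)

# Sample correlation, standard deviation of Y, and mean of Y;
# these should be close to the requested 0.3, 4, and 1.
round(c(cor = cor(d$X, d$Y), sd_Y = sd(d$Y), mean_Y = mean(d$Y)), 2)</code></pre>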
<p>Now we simulate 10,000 two-sided t-tests with independent samples having 20 observations in each sample.</p>
<pre class="r"><code>ts <- replicate(10000,rbvn_t(n=20, mu1=1, s1=4, mu2=1, s2=4, rho=0)$statistic)</code></pre>
<p>Plotting the simulated test statistics shows that the empirical density curve nicely overlays the theoretical density for the t-distribution.</p>
<pre class="r"><code>p <- ggdistribution(dt, df = 38, seq(-4, 4, 0.1))
autoplot(density(ts), colour = 'blue', p = p, fill = 'blue') +
ggtitle("When variables are independent")</code></pre>
<p><img src="/2021/03/29/what-does-it-take-to-do-a-t-test/index_files/figure-html/unnamed-chunk-4-1.png" width="672" /></p>
<p>Moreover, the 0.975 quantile, the value that marks the upper boundary of the acceptance region for an <span class="math inline">\(\alpha\)</span> value of 0.05, is very close to the theoretical value of 2.024.</p>
<pre class="r"><code>quantile(ts,.975)</code></pre>
<pre><code>## 97.5%
## 1.996</code></pre>
<pre class="r"><code>qt(.975,38)</code></pre>
<pre><code>## [1] 2.024</code></pre>
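<p>Another way to read the same comparison is as a rejection rate: if the samples really are independent and <span class="math inline">\(H_0\)</span> is true, the test should reject about 5% of the time at <span class="math inline">\(\alpha = 0.05\)</span>. The following self-contained sketch uses a smaller simulation of 5,000 tests (the seed and the number of replications are arbitrary choices) to estimate that rate.</p>
<pre class="r"><code>set.seed(2021)
n <- 20

# TRUE when the pooled two-sample t statistic falls in the rejection region.
reject <- replicate(5000, {
  x <- rnorm(n, 1, 4)
  y <- rnorm(n, 1, 4)  # independent of x, same mean and variance
  abs(t.test(x, y, var.equal = TRUE)$statistic) > qt(0.975, 2 * n - 2)
})

mean(reject)  # estimated type I error rate; should be near 0.05</code></pre>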
<p>Next, we simulate 10,000 t-tests on pairs of samples of size 20 with a correlation of 0.3.</p>
<pre class="r"><code>ts_d <- replicate(10000,rbvn_t(n=20, mu1=1, s1=4, mu2=1, s2=4, rho=.3)$statistic)</code></pre>
<p>We see that now the fit is not so good. The simulated distribution has noticeably less probability in the tails.</p>
<pre class="r"><code>pd <- ggdistribution(dt, df = 38, seq(-4, 4, 0.1))
autoplot(density(ts_d), colour = 'blue', p = pd, fill = 'blue') +
ggtitle("When variables are NOT independent")</code></pre>
<p><img src="/2021/03/29/what-does-it-take-to-do-a-t-test/index_files/figure-html/unnamed-chunk-7-1.png" width="672" /></p>
<p>The .975 quantile is much lower than the theoretical value of 2.024, showing that dependent data would lead to very misleading p-values.</p>
<pre class="r"><code>quantile(ts_d,.975)</code></pre>
<pre><code>## 97.5%
## 1.73</code></pre>
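<p>A back-of-the-envelope argument suggests why the quantile shrinks. With equal variances <span class="math inline">\(\sigma^2\)</span> and correlation <span class="math inline">\(\rho\)</span> between paired observations, the variance of the difference in sample means is <span class="math inline">\(2\sigma^2(1-\rho)/n\)</span> rather than <span class="math inline">\(2\sigma^2/n\)</span>, while the pooled variance still estimates <span class="math inline">\(\sigma^2\)</span>. The pooled statistic is therefore scaled by roughly <span class="math inline">\(\sqrt{1-\rho}\)</span>. The sketch below computes this heuristic value and, as an assumption-laden illustration with an arbitrary seed and 5,000 replications, compares rejection rates for the pooled test and for a paired test on the same data.</p>
<pre class="r"><code>rho <- 0.3
# Heuristic 97.5% point of the pooled statistic under dependence (about 1.69).
approx_q <- qt(0.975, 38) * sqrt(1 - rho)

set.seed(7)
n <- 20
sim <- replicate(5000, {
  x <- rnorm(n, 1, 4)
  y <- rnorm(n, 1 + rho * (x - 1), sqrt(1 - rho^2) * 4)  # correlated with x
  c(pooled = t.test(x, y, var.equal = TRUE)$p.value < 0.05,
    paired = t.test(x, y, paired = TRUE)$p.value < 0.05)
})

approx_q
rowMeans(sim)  # rejection rates at the nominal 5% level for each test</code></pre>
<p>With positive correlation, the pooled test rejects less often than its nominal level, whereas the paired test, which models the dependence, should stay close to 5%.</p>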
</div>
<div id="summary" class="section level3">
<h3>Summary</h3>
<p>Properly performing a t-test on data obtained from an experiment could mean doing a whole lot of up-front work to design the experiment in a way that will make the assumptions plausible. One could argue that the real practice of statistics begins even before making exploratory plots. Doing statistics with found data is much more problematic. At a minimum, doing a simple t-test means acquiring more than a superficial understanding of how the data were generated.</p>
<p>Finally, when all is said and done, and you have a well constructed t-test that results in a sufficiently small p-value to reject the null hypothesis, you will have attained what most people call a statistically significant result. However, I think this language misleadingly emphasizes the mechanical grinding of the “steel trap” logic of the test that I mentioned above. Maybe instead we should emphasize the work that went into checking assumptions, and think about hypothesis tests as producing “plausibly significant” results.</p>
</div>
<div id="some-resources-for-doing-t-tests-in-r" class="section level3">
<h3>Some resources for doing t-tests in R</h3>
<ul>
<li><p>Holmes and Huber (2019) <a href="https://web.stanford.edu/class/bios221/book/">Modern Statistics for Modern Biology</a>, Chapter 6,</p></li>
<li><p>Orloff and Bloom (2014) <a href="https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading18.pdf">Null Hypothesis Significance Testing II</a></p></li>
<li><p>Poldrack (2018) <a href="https://web.stanford.edu/group/poldracklab/statsthinking21/">Statistical Thinking for the 21st century</a>, Chapter 9</p></li>
<li><p>Spector <a href="https://statistics.berkeley.edu/computing/r-t-tests">Using t-tests in R</a></p></li>
<li><p>Wetherill (2015) <a href="https://datascienceplus.com/t-tests/">How to Perform T-tests in R</a></p></li>
</ul>
</div>
February 2021: "Top 40" New CRAN Packages
https://rviews.rstudio.com/2021/03/19/february-2021-top-40-new-cran-packages/
Fri, 19 Mar 2021 00:00:00 +0000https://rviews.rstudio.com/2021/03/19/february-2021-top-40-new-cran-packages/
<p>In February, two hundred forty-three new packages made it to CRAN, many of them very interesting and at least one entertaining. It was exceptionally difficult to pick the “Top 40”, but here they are, more or less, in twelve categories: Computational Methods, Data, Finance, Games, Genomics, Machine Learning, Mathematics, Medicine, Networks and Graphs, Statistics, Utilities, and Visualization. <code>iconr</code> in the Networks and Graphs section is a package for doing computational archaeology, a relatively new field that I hope will dig R. I also hope that <code>sassy</code> in the Statistics section helps some statisticians find their way to R.</p>
<h3 id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=blaster">blaster</a> v1.0.3: Implements an efficient BLAST-like sequence comparison algorithm, written in C++11 and using native R data types. See <a href="https://www.biorxiv.org/content/10.1101/399782v1">Schmid et al. (2018)</a> for background and <a href="https://cran.r-project.org/web/packages/blaster/readme/README.html">README</a> for an example.</p>
<p><a href="https://cran.r-project.org/package=rando">rando</a> v0.2.0: Provides random number generating functions that are much more context aware than the built-in functions. The functions are also safer, as they check for incompatible values, and reproducible.</p>
<h3 id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=AWAPer">AWAPer</a> v0.1.46: Provides catchment area weighted climate data NetCDF files from the Bureau of Meteorology <a href="http://www.bom.gov.au/jsp/awap/">Australian Water Availability Project</a> for all of Australia. There is a vignette on <a href="https://cran.r-project.org/web/packages/AWAPer/vignettes/Catchment_avg_ET_rainfall.html">Daily Area Weighted PET and Precipitation</a> and another on <a href="https://cran.r-project.org/web/packages/AWAPer/vignettes/Point_rainfall.html">Daily Point Precipitation</a>.</p>
<p><img src="AWAPer.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=caRecall">caRecall</a> v0.1.0: Provides API access to the Government of Canada <a href="https://tc.api.canada.ca/en/detail?api=VRDB">Vehicle Recalls Database</a> used by the Defect Investigations and Recalls Division for vehicles, tires, and child car seats. See the <a href="https://cran.r-project.org/web/packages/caRecall/vignettes/vrd_vignette.html">vignette</a>.</p>
<p><img src="caRecall.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=geofi">geofi</a> v1.0.0: Provides tools for reading Finnish open geospatial data in R. There are vignettes on <a href="https://cran.r-project.org/web/packages/geofi/vignettes/geofi_datasets.html">Datasets</a>, <a href="https://cran.r-project.org/web/packages/geofi/vignettes/geofi_joining_attribute_data.html">Joining Attributes</a>, <a href="https://cran.r-project.org/web/packages/geofi/vignettes/geofi_making_maps.html">Making Maps</a>, <a href="https://cran.r-project.org/web/packages/geofi/vignettes/geofi_spatial_analysis.html">Data Manipulation</a>, and <a href="https://cran.r-project.org/web/packages/geofi/vignettes/tricolore_tutorial.html">Color-coded Maps</a>.</p>
<p><img src="geofi.png" height = "400" width="200"></p>
<p><a href="https://cran.r-project.org/package=hockeystick">hockeystick</a> v0.4.0: Provides easy access to essential climate change data sets for non-climate experts. Users can download the latest raw data from authoritative sources and view it via pre-defined <code>ggplot2</code> charts. Data sets include atmospheric CO2, instrumental and proxy temperature records, sea levels, Arctic/Antarctic sea-ice, and Paleoclimate data. Sources include: <a href="https://www.esrl.noaa.gov/gmd/ccgg/trends/data.html">NOAA Mauna Loa Laboratory</a>, <a href="https://data.giss.nasa.gov/gistemp/">NASA GISTEMP</a>, <a href="https://nsidc.org/data/seaice_index/archives">National Snow and Sea Ice Data Center</a>, <a href="http://www.cmar.csiro.au/sealevel/sl_data_cmar.htm">CSIRO</a>, <a href="https://www.star.nesdis.noaa.gov/socd/lsa/SeaLevelRise/">NOAA Laboratory for Satellite Altimetry</a>, and <a href="https://cdiac.ess-dive.lbl.gov/trends/co2/vostok.html">Vostok Paleo</a> carbon dioxide and temperature data. See <a href="https://cran.r-project.org/web/packages/hockeystick/readme/README.html">README</a> for examples.</p>
<p><img src="hockeystick.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=votesmart">votesmart</a> v0.1.0: Implements a wrapper to the <a href="https://justfacts.votesmart.org/">Project VoteSmart</a> API. See the <a href="https://cran.r-project.org/web/packages/votesmart/vignettes/votesmart.html">vignette</a>.</p>
<h3 id="finance">Finance</h3>
<p><a href="https://cran.r-project.org/package=PriceIndices">PriceIndices</a> v0.0.3: Provides functions to compute bilateral and multilateral indexes. For details, see: <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/roiw.12304">de Haan and Krsinich (2017)</a> and <a href="https://www.tandfonline.com/doi/abs/10.1080/07350015.2020.1816176?journalCode=ubes20">Diewert and Fox (2020)</a>. The <a href="https://cran.r-project.org/web/packages/PriceIndices/vignettes/PriceIndices.html">vignette</a> offers examples.</p>
<p><a href="https://cran.r-project.org/package=treasuryTR">treasuryTR</a> v0.1.1: Generates Total Returns (TR) from bond yield data with fixed maturity (e.g. reported treasury yields) which may provide an alternative to commercial products. See <a href="https://www.mdpi.com/2306-5729/4/3/91">Swinkels (2019)</a> for background and the <a href="https://cran.r-project.org/web/packages/treasuryTR/vignettes/treasuryTR.html">vignette</a> for examples.</p>
<p><img src="treasuryTR.png" height = "200" width="400"></p>
<h3 id="games">Games</h3>
<p><a href="https://cran.r-project.org/package=pixelpuzzle">pixelpuzzle</a> v1.0.0: Implements a puzzle game that can be played in the R console. Restore the pixel art by shifting rows. Learn how to play <a href="https://github.com/rolkra/pixelpuzzle">here</a>.</p>
<p><img src="pixelpuzzle.png" height = "200" width="400"></p>
<h3 id="genomics">Genomics</h3>
<p><a href="https://cran.r-project.org/package=CDSeq">CDSeq</a> v1.0.8: Provides functions to estimate cell-type-specific gene expression profiles and sample-specific cell-type proportions simultaneously using bulk sequencing data. See <a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007510">Kang et al. (2019)</a> for the theory and the <a href="https://cran.r-project.org/web/packages/CDSeq/vignettes/CDSeq-vignette.html">vignette</a> for examples.</p>
<p><img src="CDSeq.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=ClusTorus">ClusTorus</a> v0.0.1: Provides various tools for clustering multivariate angular data on the torus including angular adaptations of usual clustering methods such as the k-means clustering, pairwise angular distances. See the <a href="https://cran.r-project.org/web/packages/ClusTorus/vignettes/ClusTorus.html">vignette</a> for examples.</p>
<p><img src="ClusTorus.png" height = "200" width="400"></p>
<p><a href="https://CRAN.R-project.org/package=dsb">dsb</a> v0.1.0: Provides a method for normalizing and denoising protein expression data from droplet based single cell experiments. See the <a href="https://cran.r-project.org/web/packages/dsb/vignettes/dsb_normalizing_CITEseq_data.html">vignette</a> for tutorials on how to integrate <code>dsb</code> with Seurat, Bioconductor and the AnnData class in Python. The preprint <a href="https://www.biorxiv.org/content/10.1101/2020.02.24.963603v1">Mulè et al. (2020)</a> describes the details.</p>
<p><img src="dsb.png" height = "200" width="400"></p>
<h3 id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=bestridge">bestridge</a> v1.0.4: Provides functions to perform ridge regression in complex situations on high dimensional data using the primal dual active set algorithm proposed in <a href="https://www.jstatsoft.org/article/view/v094i04">Wen et al. (2020)</a>. Functions support regression, classification, count regression and censored regression, group variable selection and nuisance variable selection. See the <a href="https://cran.r-project.org/web/packages/bestridge/vignettes/An-introduction-to-bestridge.html">vignette</a> for examples.</p>
<p><img src="bestridge.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=ROCket">ROCket</a> v1.0.1: Provides functions for estimating receiver operating characteristic (ROC) curves and area under the curve (AUC) calculation which distinguish two types of ROC curve representations: 1) parametric curves - the true positive rate (TPR) and the false positive rate (FPR) are functions of a score parameter and 2) function curves - TPR is a function of FPR. See <a href="https://www.ine.pt/revstat/pdf/rs140101.pdf">Gonçalves et al. (2014)</a> and <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/j.0006-341X.2004.00200.x">Cai & Pepe (2004)</a> for background and <a href="https://cran.r-project.org/web/packages/ROCket/readme/README.html">README</a> to get started.</p>
<p><img src="ROCket.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=wordpiece">wordpiece</a> v1.0.2: Provides functions to apply <a href="https://arxiv.org/abs/1609.08144">Wordpiece</a> tokenization to input text, given an appropriate vocabulary. The <a href="https://arxiv.org/abs/1810.04805">BERT</a> tokenization conventions are used by default. See the <a href="https://cran.r-project.org/web/packages/wordpiece/vignettes/basic_usage.html">vignette</a> for an example.</p>
<h3 id="mathematics">Mathematics</h3>
<p><a href="https://cran.r-project.org/package=fractD">fractD</a> v0.1.0: Estimates the fractal dimension of a black area in 2D and 3D (slices) images using the box-counting method. See <a href="https://link.springer.com/article/10.1007%2FBF02065874">Klinkenberg (1994)</a> for background and the <a href="https://cran.r-project.org/web/packages/fractD/vignettes/Calculates_the_fractal_dimension_of_2D_and_3D_images.html">vignette</a> for examples.</p>
<p><img src="fractD.svg" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=spacefillr">spacefillr</a> v0.2.0: Generates random and quasi-random space-filling sequences including <a href="https://en.wikipedia.org/wiki/Halton_sequence">Halton</a>, <a href="https://en.wikipedia.org/wiki/Sobol_sequence">Sobol</a> and other sequences with errors distributed as various types of jittered blue noise. See <a href="https://epubs.siam.org/doi/10.1137/070709359">Joe and Kuo (2008)</a>, <a href="https://graphics.pixar.com/library/ProgressiveMultiJitteredSampling/paper.pdf">Christensen et al. (2018)</a> and <a href="https://dl.acm.org/doi/10.1145/3306307.3328191">Heitz et al. (2019)</a> for background and look <a href="https://github.com/tylermorganwall/spacefillr">here</a> for examples.</p>
<p><img src="spacefillr.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=tensorsign">tensorsign</a> v0.1.0: Provides an efficient algorithm for nonparametric tensor completion via sign series. The algorithm, which employs the alternating optimization approach to solve the weighted classification problem, is described in <a href="https://arxiv.org/abs/2102.00384">Lee and Wang (2021)</a>.</p>
<h3 id="medicine">Medicine</h3>
<p><a href="https://cran.r-project.org/package=bhmbasket">bhmbasket</a> v0.9.1: Provides functions to evaluate basket trial designs with binary endpoints using Bayesian hierarchical models and Bayesian decision rules. See <a href="https://journals.sagepub.com/doi/10.1177/1740774513497539">Berry et al. (2013)</a>, <a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/pst.1730">Neuenschwander et al. (2016)</a> and <a href="https://link.springer.com/article/10.1177%2F2168479014533970">Fisch et al. (2015)</a> for background and the <a href="https://cran.r-project.org/web/packages/bhmbasket/vignettes/reproduceExNex.html">vignette</a> for an example.</p>
<p><a href="https://cran.r-project.org/package=bp">bp</a> v1.0.1: Provides functions to aid in the analysis of blood pressure data of all forms by providing both descriptive and visualization tools for researchers. There is a <a href="https://cran.r-project.org/web/packages/bp/vignettes/bp.html">vignette</a>.</p>
<p><img src="blood.png" height = "400" width="600"></p>
<p><a href="https://cran.r-project.org/package=CHOIRBM">CHOIRBM</a> v0.0.2: Provides functions for visualizing body map data collected with the Collaborative Health Outcomes Information Registry (<a href="https://choir.stanford.edu/">CHOIR</a>). See the <a href="https://cran.r-project.org/web/packages/CHOIRBM/vignettes/plot-one-patient.html">vignette</a>.</p>
<p><img src="CHOIRBM.png" height = "300" width="300"></p>
<p><a href="https://cran.r-project.org/package=QDiabetes">QDiabetes</a> v1.0-2: Calculates the risk of developing type 2 diabetes using risk prediction algorithms derived by <a href="https://clinrisk.co.uk/ClinRisk/Welcome.html">ClinRisk</a>. Look <a href="https://github.com/Feakster/qdiabetes">here</a> for information and examples.</p>
<p><a href="https://cran.r-project.org/package=SteppedPower">SteppedPower</a> v0.1.0: Provides tools for power and sample size calculations and design diagnostics for longitudinal mixed models with a focus on stepped wedge designs using methods introduced in <a href="https://www.sciencedirect.com/science/article/pii/S1551714406000632?via%3Dihub">Hussey and Hughes (2007)</a> and extensions discussed in <a href="https://journals.sagepub.com/doi/10.1177/0962280220932962">Li et al. (2020)</a>. See the <a href="https://cran.r-project.org/web/packages/SteppedPower/vignettes/Getting_Started.html">vignette</a> to get started.</p>
<p><img src="SteppedPower.png" height = "200" width="400"></p>
<h3 id="networks-and-graphs">Networks and Graphs</h3>
<p><a href="https://cran.r-project.org/package=bnmonitor">bnmonitor</a> v0.1.0: Implements sensitivity and robustness methods for Bayesian networks including methods to perform parameter variations via a variety of co-variation schemes, to compute sensitivity functions and to quantify the dissimilarity of two Bayesian networks via distances and divergences. See <a href="https://www.jair.org/index.php/jair/article/view/10307">Chan and Darwiche (2002)</a>, <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1539-6975.2007.00235.x">Cowell et al. (2007)</a>, and <a href="https://arxiv.org/abs/1809.10794">Goergen and Leonelli (2020)</a> for background and <a href="https://cran.r-project.org/web/packages/bnmonitor/readme/README.html">README</a> for examples.</p>
<p><img src="bnmonitor.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=iconr">iconr</a> v0.1.0: Provides formal methods for studying archaeological iconographic data sets (rock-art, pottery decoration, stelae, etc.) using network and spatial analysis. See <a href="http://archiv.ub.uni-heidelberg.de/propylaeumdok/512/">Alexander (2008)</a> and <a href="https://hal.archives-ouvertes.fr/hal-02913656">Huet (2018)</a> for background and the <a href="https://cran.r-project.org/web/packages/iconr/vignettes/index.html">vignette</a> for examples.</p>
<p><img src="iconr.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=MLVSBM">MLVSBM</a> v0.2.1: Provides functions for simulation, inference and clustering of multilevel networks using a stochastic block model framework as described in <a href="https://www.sciencedirect.com/science/article/abs/pii/S016794732100013X?via%3Dihub">Chabert-Liddell et al. (2021)</a>. There is a <a href="https://cran.r-project.org/web/packages/MLVSBM/vignettes/vignette.html">tutorial</a>.</p>
<p><img src="MLVSBM.png" height = "300" width="300"></p>
<p><a href="https://cran.r-project.org/package=motifr">motifr</a> v1.0.0: Provides tools to analyze motifs (small configurations of nodes and edges) in multi-level networks (networks which combine multiple networks in one, e.g. social-ecological networks). See <a href="https://cran.r-project.org/web/packages/motifr/vignettes/motif_zoo.html">The motif zoo</a> and <a href="https://cran.r-project.org/web/packages/motifr/vignettes/random_baselines.html">Baseline model comparisons</a>.</p>
<p><img src="motifr.svg" height = "200" width="400"></p>
<h3 id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=cfda">cfda</a> v0.9.9: Provides functions to encode categorical data as functional data and perform basic statistical analysis. See <a href="https://hal.inria.fr/hal-02973094/document">Preda et al. (2020)</a> for background and the <a href="https://cran.r-project.org/web/packages/cfda/vignettes/cfda.html">vignette</a> to get started.</p>
<p><img src="cfda.png" height = "350" width="350"></p>
<p><a href="https://cran.r-project.org/package=cvCovEst">cvCovEst</a> v0.3.4: Implements an efficient cross-validated approach for covariance matrix estimation, particularly useful in high-dimensional settings. See the <a href="https://cran.r-project.org/web/packages/cvCovEst/vignettes/using_cvCovEst.html">vignette</a> for background and examples.</p>
<p><img src="cvCovEst.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=flipr">flipr</a> v0.2.1: Implements a permutation framework for point estimation, confidence intervals, or hypothesis testing for multiple data types. There is a <a href="https://cran.r-project.org/web/packages/flipr/vignettes/flipr.html">Tour of Permutation Inference</a>, and vignettes on <a href="https://cran.r-project.org/web/packages/flipr/vignettes/alternative.html">Alternative Hypothesis Testing</a>, the <a href="https://cran.r-project.org/web/packages/flipr/vignettes/exactness.html">Exactness of Permutation Tests</a>, and <a href="https://cran.r-project.org/web/packages/flipr/vignettes/pvalue-function.html">Calculating p-value Functions</a>.</p>
<p><img src="flipr.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=ipmr">ipmr</a> v0.0.1: Implements integral projection models using an expression based framework that handles density dependence and environmental stochasticity and provides tools for diagnostics, plotting, simulations, and analysis. See <a href="https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1890/0012-9658%282000%29081%5B0694%3ASSSAAN%5D2.0.CO%3B2">Easterling et al. (2000)</a> for an in-depth description of integral projection models. There is an <a href="https://cran.r-project.org/web/packages/ipmr/vignettes/ipmr-introduction.html">Introduction</a> and vignettes on <a href="https://cran.r-project.org/web/packages/ipmr/vignettes/age_x_size.html">Age-Size IPMS</a>, <a href="https://cran.r-project.org/web/packages/ipmr/vignettes/density-dependence.html">Density Dependent IPMS</a>, <a href="https://cran.r-project.org/web/packages/ipmr/vignettes/hierarchical-notation.html">Hierarchical Notation</a>, and <a href="https://cran.r-project.org/web/packages/ipmr/vignettes/proto-ipms.html">Data Structures</a>.</p>
<p><a href="https://cran.r-project.org/package=metapack">metapack</a> v0.1.1: Provides functions for performing Bayesian inference for meta-analytic and network meta-analytic models through Markov chain Monte Carlo algorithms. See <a href="https://www.tandfonline.com/doi/full/10.1080/01621459.2015.1006065">Yao et al. (2015)</a> for the theory, the <a href="https://cran.r-project.org/web/packages/metapack/vignettes/intro-to-metapack.html">vignette</a> for an introduction and the <a href="http://merlot.stat.uconn.edu/packages/metapack/">online documentation</a>.</p>
<p><a href="https://cran.r-project.org/package=sassy">sassy</a> v1.0.4: Loads a collection of packages that collectively aim to make R easier for SAS® programmers. Functions bring many familiar SAS® concepts to R, including data libraries, data dictionaries, formats and format catalogs, a data step, and a traceable log. There is an <a href="https://cran.r-project.org/web/packages/sassy/vignettes/sassy.html">Introduction</a>, and vignettes with example <a href="https://cran.r-project.org/web/packages/sassy/vignettes/sassy-figure.html">Figures</a>, <a href="https://cran.r-project.org/web/packages/sassy/vignettes/sassy-listing.html">Listings</a>, and <a href="https://cran.r-project.org/web/packages/sassy/vignettes/sassy-table.html">Tables</a>, as well as a few <a href="https://cran.r-project.org/web/packages/sassy/vignettes/sassy-disclaimers.html">Disclaimers</a> which include a statement indicating that the packages were developed in the context of the pharmaceutical industry but should be generally helpful.</p>
<h3 id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=gargoyle">gargoyle</a> v0.0.1: Implements an event-based framework for building <code>Shiny</code> apps. Instead of relying on standard <code>Shiny</code> reactive objects, this package allows relying on a lighter set of triggers, so that reactive contexts can be invalidated with more control. See the <a href="https://cran.r-project.org/web/packages/gargoyle/vignettes/gargoyle.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=multidplyr">multidplyr</a>: Provides simple multicore parallelism through functions that partition a data frame across multiple worker processes. See the <a href="https://cran.r-project.org/web/packages/multidplyr/vignettes/multidplyr.html">vignette</a>.</p>
<p><a href="https://cran.r-project.org/package=quarto">quarto</a> v0.1: Provides an interface to the <a href="https://github.com/avdi/quarto">Quarto</a> markdown publishing system and allows converting R Markdown documents and <a href="https://jupyter.org/">Jupyter Notebooks</a> to a variety of output formats.</p>
<p><a href="https://cran.r-project.org/package=vmr">vmr</a> v0.0.2: Provides functions to manage, provision and use virtual machines pre-configured for R, and to develop, test and build packages in a clean environment. <a href="https://www.vagrantup.com/intro">Vagrant</a> and a provider such as <a href="https://www.virtualbox.org/">Virtualbox</a> must be installed.</p>
<h3 id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=ggh4x">ggh4x</a> v0.1.2.1: Extends <code>ggplot2</code> facets by setting individual scales per panel, resizing panels, providing nested facets, and allowing multiple colour and fill scales per plot. See the <a href="https://cran.r-project.org/web/packages/ggh4x/vignettes/ggh4x.html">Introduction</a>, and the vignettes <a href="https://cran.r-project.org/web/packages/ggh4x/vignettes/Facets.html">Facets</a>, <a href="https://cran.r-project.org/web/packages/ggh4x/vignettes/Miscellaneous.html">Misc</a>, <a href="https://cran.r-project.org/web/packages/ggh4x/vignettes/PositionGuides.html">Position Guides</a>, and <a href="https://cran.r-project.org/web/packages/ggh4x/vignettes/Statistics.html">Statistics</a>.</p>
<p><img src="ggh4x.png" height = "200" width="400"></p>
<p><a href="https://cran.r-project.org/package=tastypie">tastypie</a> v0.0.3: Provides functions and templates for making pie charts even though you probably shouldn’t. See the vignettes <a href="https://cran.r-project.org/web/packages/tastypie/vignettes/available_templates.html">available templates</a> and <a href="https://cran.r-project.org/web/packages/tastypie/vignettes/your_favourite_template.html">Your favorite template</a>, and look <a href="https://paolodalena.github.io/tastypie/">here</a> for examples.</p>
<p><img src="tastypie.png" height = "350" width="350"></p>
<p><a href="https://cran.r-project.org/package=terrainr">terrainr</a> v0.3.1: Provides functions to retrieve, manipulate, and visualize geospatial data, with an aim towards producing ‘3D’ landscape visualizations in the <a href="https://unity.com/">Unity 3D</a> rendering engine. Functions are also provided for retrieving elevation data and base map tiles from the <a href="https://apps.nationalmap.gov/services/">USGS National Map</a>. There is an <a href="https://cran.r-project.org/web/packages/terrainr/vignettes/overview.html">Introduction</a> and a <a href="https://cran.r-project.org/web/packages/terrainr/vignettes/unity_instructions.html">vignette</a> on importing terrain tiles.</p>
<p><img src="terrainr.jpeg" height = "300" width="300"></p>
Cheat Sheets
https://rviews.rstudio.com/2021/03/10/rstudio-open-source-resorurces/
Wed, 10 Mar 2021 00:00:00 +0000https://rviews.rstudio.com/2021/03/10/rstudio-open-source-resorurces/
<p>In a <a href="https://rviews.rstudio.com/2020/12/02/learn-and-teach-r/">previous post</a>, I described how I was captivated by the virtual landscape imagined by the RStudio education team while looking for resources on the <a href="https://rstudio.com/">RStudio</a> website. In this post, I’ll take a look at
<a href="https://rstudio.com/resources/cheatsheets/"><em>Cheatsheets</em></a>, another amazing resource hiding in plain sight.</p>
<p><img src="cs.png" height = "400" width="100%"></p>
<p>Apparently, some time ago when I wasn’t paying much attention, cheat sheets evolved from the homemade study notes of students with highly refined visual cognitive skills, but a relatively poor grasp of algebra or history or whatever, to an essential software learning tool. I don’t know how this happened in general, but master cheat sheet artist Garrett Grolemund has passed along some of the lore of the cheat sheet at RStudio. Garrett writes:</p>
<blockquote>
<p>One day I put two and two together and realized that our Winston Chang, who I had known for a couple of years, was the same “W Chang” that made the LaTex cheatsheet that I’d used throughout grad school. It inspired me to do something similarly useful, so I tried my hand at making a cheatsheet for Winston and Joe’s Shiny package. The Shiny cheatsheet ended up being the first of many. A funny thing about the first cheatsheet is that I was working next to Hadley at a co-working space when I made it. In the time it took me to put together the cheatsheet, he wrote the entire first version of the tidyr package from scratch.</p>
</blockquote>
<p>It is now hard to imagine getting by without cheat sheets. It seems as if they are becoming an expected adjunct to the documentation. But, as Garrett explains in the <a href="https://github.com/rstudio/cheatsheets">README</a> for the cheat sheets GitHub repository, <strong>they are not documentation!</strong></p>
<blockquote>
<p>RStudio cheat sheets are not meant to be text or documentation! They are scannable visual aids that use layout and visual mnemonics to help people zoom to the functions they need. … Cheat sheets fall squarely on the human-facing side of software design.</p>
</blockquote>
<p>Cheat sheets live in the space where <a href="https://psnet.ahrq.gov/primer/human-factors-engineering">human factors</a> engineering gets a boost from artistic design. If R packages were airplanes then pilots would want cheat sheets to help them master the controls.</p>
<p>The RStudio site contains sixteen RStudio produced cheat sheets and nearly forty contributed efforts, some of which are displayed in the graphic above. The <a href="https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf"><em>Data Transformation cheat sheet</em></a> is a classic example of a straightforward mnemonic tool.
It is likely that even someone who is just beginning to work with <code>dplyr</code> will immediately grok that it organizes functions that manipulate tidy data. The cognitive load then is to remember how functions are grouped by task. The cheat sheet offers a canonical set of classes: “manipulate cases”, “manipulate variables”, etc. to facilitate the process. Users who work with <code>dplyr</code> on a regular basis will probably just need to glance at the cheat sheet after a relatively short time.</p>
<p>The <a href="https://github.com/rstudio/cheatsheets/raw/master/shiny.pdf"><em>Shiny cheat sheet</em></a> is a little more ambitious. It works on multiple levels and goes beyond categories to also suggest process and workflow.</p>
<p><img src="shiny.png" height = "400" width="100%"></p>
<p>The <a href="https://github.com/rstudio/cheatsheets/raw/master/purrr.pdf"><em>Apply functions cheat sheet</em></a> takes on an even more difficult task. For most of us, internally visualizing multi-level data structures is difficult enough; imagining how data elements flow under transformations is a serious cognitive load. I, for one, really appreciate the help.</p>
<p><img src="purrr.png" height = "400" width="100%"></p>
<p>Cheat sheets are immensely popular. And even in this ebook age where nearly everything you can look at is online, and conference attending digital natives travel light, the cheat sheets as artifacts retain considerable appeal. Not only are they useful tools and geek art (Take a look at <a href="https://github.com/rstudio/cheatsheets/raw/master/cartography.pdf"><em>cartography</em></a>) for decorating a workplace, my guess is that they are perceived as <em>runes of power</em> enabling the cognoscenti to grasp essential knowledge and project it in the world.</p>
<p>When in-person conferences resume, I fully expect the heavy paper copies to disappear soon after we put them out at the RStudio booth.</p>
2021 R Conferences
https://rviews.rstudio.com/2021/03/03/2021-r-conferences/
Wed, 03 Mar 2021 00:00:00 +0000https://rviews.rstudio.com/2021/03/03/2021-r-conferences/
<p><img src="conf2021.png" height = "400" width="100%"></p>
<p>It is not yet clear what lasting impact the Covid-19 pandemic will ultimately have on R conferences. We are still adapting to our inability to attend large events, and trying to make the best of the “silver lining” of virtual events which permit worldwide participation. The following is an attempt to list 2021 conferences that are likely to have interesting R content. I suspect that it is incomplete. If you know of an R Conference that is not mentioned, please add it to the comments section for this post.</p>
<h3 id="upcoming-events">Upcoming Events</h3>
<p><a href="https://www.ire.org/training/conferences/nicar-2021/">NICAR 2021</a> (March 3 - 5), the Investigative Reporters & Editors Conference on data journalism, should be well attended by data journalists using R for their everyday reporting.</p>
<p><a href="https://cascadiarconf.com/">CascadiaRConf 2021</a> (June 4 - 5), a jewel of a regional R conference for its first three years, was canceled in 2020. It is back this year as a virtual event. The <a href="https://cascadiarconf.com/speakers/">Call for Presentations</a> is open.</p>
<p><a href="https://www.phuse-events.org/attend/frontend/reg/thome.csp?pageID=2283&eventID=6&traceRedir=2">PHUSE US Connect 2021</a> (June 14 - 18) - PHUSE is a non-profit organization with the mission: “Sharing ideas, tools and standards around data, statistical and reporting technologies to advance the future of life sciences.” The conference, which is focused on clinical data science, is likely to have some interesting R content this year. The <a href="https://mail.google.com/mail/u/0/#inbox/FMfcgxwLsmclTmvczLGxMrVptgJlVrhW">Call for Papers</a> is open.</p>
<p><a href="https://psiweb.org/conferences/about-the-conference">PSI 2021 Online</a> (June 21 - 23) usually attracts six hundred or so statisticians from the pharmaceutical industry when the conference is held in person. <a href="https://www.psiweb.org/">PSI</a> statisticians bring you <a href="https://rviews.rstudio.com/2021/01/11/wonderful-wednesdays/">Wonderful Wednesdays</a>.</p>
<p><a href="https://user2021.r-project.org/">useR! 2021</a> (July 5 - 9) has an outstanding lineup of <a href="https://user2021.r-project.org/program/keynotes/">keynote speakers</a>. The <a href="https://user2021.r-project.org/program/overview/">program</a> is very likely to make US-based attendees night owls.</p>
<p><a href="https://bioc2021.bioconductor.org/">BioC 2021</a> (August 4 - 6) is the must-attend event for anyone doing computational biology. Peruse the <a href="https://bioc2021.bioconductor.org/conferences/">slides</a> of past events to get a “rear view preview” of what to expect.</p>
<p><a href="https://ww2.amstat.org/meetings/jsm/2021/">JSM 2021</a> Seattle (August 7 - 12), the mother of all statistics conferences, usually draws between 4,000 and 6,000 statisticians to in-person events. The organizers appear to be following some pretty optimistic Covid-19 vaccination rate models.</p>
<p><a href="https://events.linuxfoundation.org/r-medicine/">R/Medicine 2021</a> (August 27 - 29) has the dates, but no website yet. Don’t worry, the clinicians are big come-from-behind organizers. <a href="https://rviews.rstudio.com/2020/09/16/some-thoughts-on-r-medicine-2020/">Last year’s</a> conference was outstanding, and I expect an amazing event again this year.</p>
<p><a href="https://rinpharma.com/">R/Pharma 2021</a> organizers like to give R/Medicine organizers a head start, but a well-placed source tells me that the conference will take place in Q3 or Q4. For the past three years, <a href="https://rviews.rstudio.com/2018/10/03/some-thoughts-on-r-pharma-2018/">R/Pharma</a> has been a bright star among R conferences where some of the best Shiny developers in the world meet and discuss their work.</p>
<p><a href="https://info.mango-solutions.com/earl-2021#:~:text=EARL%202021%206%2D10th%20September,of%20the%20world%27s%20leading%20practitioners">EARL Conference 2021</a> (September 6 - 10), the premier R in industry event, will be online this year. The call for abstracts is already open.</p>
<p><a href="https://rstats.ai/">NY R Conference 2021</a> is usually the perfect way to spend a couple of Manhattan Spring days. This year, the organizers are hoping for an in-person event in August or September if things go really well, but planning to surpass their spectacular 2020 virtual event if things don’t.</p>
<p><a href="https://ww2.amstat.org/meetings/biop/2021/workshopinfo.cfm">BIOP 2021</a> Rockville, MD (September 21 - 23) may be an in-person event. This workshop was originally an event for FDA statisticians but is now open to all statisticians interested in statistical practices for all areas regulated by the FDA.</p>
<p><a href="https://rnorthconference.github.io/">noRth 2021</a> (September 29 - 30) is a regional conference out of the “Twin Cities” that is looking to virtually expand its reach within the R Community. Gabriela de Queiroz heads the list of confirmed speakers, which includes new faces from IBM, Google, and the Federal Reserve.</p>
<p><a href="https://2021.foss4g.org/">Foss4g for OSGEO</a> Buenos Aires (September 27 - October 2) is the annual conference of <a href="https://www.osgeo.org/">OSGeo</a>, the Open Source Geospatial Foundation. Given the prominence of R in geospatial analysis, this is sure to be an R-heavy event. The conference will be online.</p>
<p><a href="https://www.linkedin.com/in/gabrieladequeiroz/">PHUSE EU Connect 21</a> (November 15 - 19) See above.</p>
<p><a href="https://rstats.ai/">R Government</a> has a reasonable chance of pulling off an in-person event (at least for people in the DC area) sometime in December if the region gets a break from Covid.</p>
<h3 id="earlier-events">Earlier Events</h3>
<p><a href="https://rstudio.com/resources/rstudioglobal-2021/">rstudio::global</a> (January 21) - The <a href="https://rviews.rstudio.com/2021/02/04/some-thoughts-on-rstudio-global/">talks</a> from this unique 24-hour, worldwide event are online.</p>
<p><a href="https://www.eshackathon.org/events/2021-01-ESMAR.html">Evidence Synthesis and Meta-Analysis in R</a> - The talks from this conference and hackathon, which attracted 514 participants from 26 countries, are online <a href="https://www.youtube.com/channel/UCqoKd8CCBInvyDMqeqGs0YQ">here</a>.</p>
<script>window.location.href='https://rviews.rstudio.com/2021/03/03/2021-r-conferences/';</script>