A First Look at NIMBLE

2018-07-05

by Joseph Rickert

Writing a domain-specific language (DSL) is a powerful and fairly common method for extending the R language. Both ggplot2 and dplyr, for example, are DSLs. (See Hadley’s chapter in Advanced R for some elaboration.) In this post, I take a first look at NIMBLE (Numerical Inference for Statistical Models using Bayesian and Likelihood Estimation), a DSL for formulating and efficiently solving statistical models in general, and Bayesian hierarchical models in particular. The latter comprise a class of interpretable statistical models useful for both inference and prediction. (See Gelman’s 2006 Technographics paper for what these models can and cannot do.)

Most of what I describe here can be found in the comprehensive and the very readable paper by Valpine et al., or the extensive NIMBLE User Manual.

At the risk of oversimplification, it seems to me that the essence of the NIMBLE is that it is a system for developing models designed around two dichotomies. The first dichotomy is that NIMBLE separates the specification of a model from the implementation of the algorithms that express it. A user can formulate a model in the BUGS language, and then use different NIMBLE functions to solve the model. Or, looking at things the other way round, a user can write a NIMBLE function to implement some algorithm or another efficiently, and then use this function in different models.

The second dichotomy separates model setup from model execution. NIMBLE programming is done by writing nimbleFunctions (see Chapter 11 of the Manual), a subset of the R Language that is augmented with additional data structures and made compilable. nimbleFunctions come in two flavors. Setup functions run once for each model, node any model structure used to define the model. Run functions can be executed by R or compiled into C++ code.

NIMBLE is actually more complicated, or should I say “richer”, than this. There are many advanced programming concepts to wrap your head around. Nevertheless, it is pretty straightforward to start using NIMBLE for models that already have BUGS implementations. The best way to get started for anyone new to Bayesian statistics, or whose hierarchical model building skills may be a bit rusty, is to take the advice of the NIMBLE developers and work through the Pump Model example in Chapter 2 of the manual. In the rest of this post, I present an abbreviated and slightly reworked version of that model.

Before getting started, note that although NIMBLE is an R package listed on CRAN, and it installs like any other R package, you must have a C++ compiler and the standard make utility installed on your system before installing NIMBLE. (See Chapter 4 of the manual for platform-specific installation instructions.)

Pump Failure Model

The Pump Failure model is discussed by George et al. in their 1993 paper: Conjugate Likelihood Distributions. The paper examines Bayesian models that use conjugate priors, but which do not have closed form solutions when prior distributions are associated with the hyperparameters. The BUGS solution of the problem is given here.

The data driving the model are: x[i] the number of failures for pump, i in a time interval, t[i] where i goes from 1 to 10.

library(nimble)
library(igraph)
library(tidyverse)

pumpConsts <- list(N = 10,
t = c(94.3, 15.7, 62.9, 126, 5.24,31.4, 1.05, 1.05, 2.1, 10.5))

pumpData <- list(x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22))

Arrival times are a Poisson distribution with parameter lambda, where lambda is itself modeled as a Gamma distribution with hyperparameters alpha and beta.

The model is expressed in the BUGS language wrapped inside the NIMBLE function nimbleCode(), which turns the BUGS code into a object that can be operated on by nimbleModel()

pumpCode <- nimbleCode(
  {
for (i in 1:N){
  theta[i] ~ dgamma(alpha,beta)
  lambda[i] <- theta[i]*t[i]
  x[i] ~ dpois(lambda[i])
  }
alpha ~ dexp(1.0)
beta ~ dgamma(0.1,1.0)
})

pumpInits <- list(alpha = 1, beta = 1,
theta = rep(0.1, pumpConsts$N))

nimbleModel() produces the model object that can be executed by R or compiled.

pump <- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts,
                    data = pumpData, inits = pumpInits)

The following command lets us look at the nodes that comprise the model’s directed graph, and plot it.

pump$getNodeNames()

 [1] "alpha"               "beta"                "lifted_d1_over_beta"
 [4] "theta[1]"            "theta[2]"            "theta[3]"           
 [7] "theta[4]"            "theta[5]"            "theta[6]"           
[10] "theta[7]"            "theta[8]"            "theta[9]"           
[13] "theta[10]"           "lambda[1]"           "lambda[2]"          
[16] "lambda[3]"           "lambda[4]"           "lambda[5]"          
[19] "lambda[6]"           "lambda[7]"           "lambda[8]"          
[22] "lambda[9]"           "lambda[10]"          "x[1]"               
[25] "x[2]"                "x[3]"                "x[4]"               
[28] "x[5]"                "x[6]"                "x[7]"               
[31] "x[8]"                "x[9]"                "x[10]"

pump$plotGraph()

We can look at the values stored at each node. The node for x contains the initial values we entered into the model and the nodes for theta and lambda contain the initial calculated values

pump$x

 [1]  5  1  5 14  3 19  1  1  4 22

pump$theta

 [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

pump$lambda

 [1]  9.430  1.570  6.290 12.600  0.524  3.140  0.105  0.105  0.210  1.050

We can also look at the log probabilities of the likelihoods.

pump$logProb_x

 [1]  -2.998011  -1.118924  -1.882686  -2.319466  -4.254550 -20.739651
 [7]  -2.358795  -2.358795  -9.630645 -48.447798

Next, we use the model to simulate new values for theta and update the variables.

set.seed(1)
pump$simulate("theta")
pump$theta

 [1] 0.15514136 1.88240160 1.80451250 0.83617765 1.22254365 1.15835525
 [7] 0.99001994 0.30737332 0.09461909 0.15720154

These new values will, of course, lead to new values of lambda and the log probabilities.

pump$lambda

 [1]  9.430  1.570  6.290 12.600  0.524  3.140  0.105  0.105  0.210  1.050

pump$logProb_x

 [1]  -2.998011  -1.118924  -1.882686  -2.319466  -4.254550 -20.739651
 [7]  -2.358795  -2.358795  -9.630645 -48.447798

The model can also be compiled. The C++ code generated is loaded back into R with an object that can be examined like the uncompiled model.

Cpump <- compileNimble(pump)
Cpump$theta

 [1] 0.15514136 1.88240160 1.80451250 0.83617765 1.22254365 1.15835525
 [7] 0.99001994 0.30737332 0.09461909 0.15720154

Note that since a wide variety of NIMBLE models can be compiled, NIMBLE should be generally useful for developing production R code. Have a look at the presentation by Christopher Paciorek for a nice overview of NIMBLE’s compilation process.

Next, the default NIMBLE MCMC algorithm is used to generate posterior samples from the distributions for the model parameters alpha, beta, theta and lambda, along with summary statistics and the value of Wantanabi’s WAIC statistic.

mcmc.out <- nimbleMCMC(code = pumpCode, constants = pumpConsts,
                       data = pumpData, inits = pumpInits,
                       monitors=c("alpha","beta","theta","lambda"),
                       nchains = 2, niter = 10000,
                       summary = TRUE, WAIC = TRUE)

|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|

names(mcmc.out)

[1] "samples" "summary" "WAIC"

The model object contains a summary of the model,

mcmc.out$summary

$chain1
                  Mean      Median    St.Dev.    95%CI_low  95%CI_upp
alpha       0.69804352  0.65835063 0.27037676  0.287898244  1.3140461
beta        0.92862598  0.82156847 0.54969128  0.183699137  2.2872696
lambda[1]   5.67617535  5.35277649 2.39989346  1.986896247 11.3087435
lambda[2]   1.59476464  1.28802618 1.24109695  0.126649837  4.7635134
lambda[3]   5.58222072  5.28139948 2.36539331  1.961659802 11.1331869
lambda[4]  14.57540796 14.23984600 3.79587390  8.085538041 22.9890092
lambda[5]   3.16402849  2.87859865 1.63590766  0.836991007  7.1477641
lambda[6]  19.21831706 18.86685281 4.33423677 11.703168568 28.6847447
lambda[7]   0.94776605  0.74343559 0.77658191  0.077828283  2.8978174
lambda[8]   0.93472103  0.74313533 0.76301563  0.076199581  2.9598715
lambda[9]   3.31124086  3.03219017 1.61332897  0.955909813  7.2024472
lambda[10] 20.89018329 20.59798130 4.45302924 13.034529765 30.4628012
theta[1]    0.06019274  0.05676327 0.02544956  0.021069950  0.1199230
theta[2]    0.10157737  0.08203988 0.07905076  0.008066869  0.3034085
theta[3]    0.08874755  0.08396502 0.03760562  0.031186960  0.1769982
theta[4]    0.11567784  0.11301465 0.03012598  0.064170937  0.1824525
theta[5]    0.60382223  0.54935089 0.31219612  0.159731108  1.3640771
theta[6]    0.61204831  0.60085518 0.13803302  0.372712375  0.9135269
theta[7]    0.90263434  0.70803389 0.73960182  0.074122175  2.7598261
theta[8]    0.89021051  0.70774794 0.72668155  0.072571029  2.8189252
theta[9]    1.57678136  1.44390008 0.76825189  0.455195149  3.4297368
theta[10]   1.98954127  1.96171250 0.42409802  1.241383787  2.9012192

$chain2
                  Mean      Median    St.Dev.    95%CI_low  95%CI_upp
alpha       0.69101961  0.65803654 0.26548378  0.277195564  1.2858148
beta        0.91627273  0.81434426 0.53750825  0.185772263  2.2702428
lambda[1]   5.59893463  5.29143956 2.32153991  1.976164994 10.9564380
lambda[2]   1.57278293  1.27425268 1.23323878  0.129781580  4.7262566
lambda[3]   5.60321125  5.27780200 2.32992322  1.970709123 10.9249502
lambda[4]  14.60674094 14.30971897 3.86145332  8.013012004 23.0526313
lambda[5]   3.13119513  2.84685917 1.67006926  0.782262325  7.2043337
lambda[6]  19.17917926 18.82326155 4.33456380 11.661139736 28.5655803
lambda[7]   0.94397897  0.74446578 0.76576887  0.080055678  2.8813517
lambda[8]   0.94452324  0.74263433 0.77013200  0.074813473  2.9457364
lambda[9]   3.30813061  3.04512049 1.58008544  0.986914665  7.0355869
lambda[10] 20.88570471 20.60384141 4.44130984 13.092562597 30.5574424
theta[1]    0.05937364  0.05611283 0.02461866  0.020956151  0.1161870
theta[2]    0.10017726  0.08116259 0.07855024  0.008266343  0.3010355
theta[3]    0.08908126  0.08390782 0.03704170  0.031330829  0.1736876
theta[4]    0.11592652  0.11356920 0.03064645  0.063595333  0.1829574
theta[5]    0.59755632  0.54329373 0.31871551  0.149286703  1.3748728
theta[6]    0.61080189  0.59946693 0.13804343  0.371373877  0.9097319
theta[7]    0.89902759  0.70901502 0.72930369  0.076243503  2.7441445
theta[8]    0.89954594  0.70727079 0.73345905  0.071250926  2.8054633
theta[9]    1.57530029  1.45005738 0.75242164  0.469959364  3.3502795
theta[10]   1.98911473  1.96227061 0.42298189  1.246910723  2.9102326

$all.chains
                  Mean      Median    St.Dev.   95%CI_low  95%CI_upp
alpha       0.69453156  0.65803654 0.26795776  0.28329854  1.2999319
beta        0.92244935  0.81828160 0.54365539  0.18549077  2.2785444
lambda[1]   5.63755499  5.32462511 2.36129857  1.98294712 11.1597721
lambda[2]   1.58377379  1.28149065 1.23719199  0.12734396  4.7382079
lambda[3]   5.59271599  5.28024565 2.34769002  1.96451069 11.0072923
lambda[4]  14.59107445 14.27080924 3.82874035  8.04541916 23.0250158
lambda[5]   3.14761181  2.86460377 1.65311690  0.80506062  7.1718837
lambda[6]  19.19874816 18.84484055 4.33433610 11.68198222 28.6445471
lambda[7]   0.94587251  0.74395440 0.77117739  0.07927988  2.8911629
lambda[8]   0.93962214  0.74299160 0.76657858  0.07571751  2.9470742
lambda[9]   3.30968573  3.03910484 1.59675456  0.97386482  7.1120082
lambda[10] 20.88794400 20.60051118 4.44706278 13.05509616 30.5216406
theta[1]    0.05978319  0.05646474 0.02504028  0.02102807  0.1183433
theta[2]    0.10087731  0.08162361 0.07880204  0.00811108  0.3017967
theta[3]    0.08891440  0.08394667 0.03732417  0.03123228  0.1749967
theta[4]    0.11580218  0.11326039 0.03038683  0.06385253  0.1827382
theta[5]    0.60068928  0.54668011 0.31548032  0.15363752  1.3686801
theta[6]    0.61142510  0.60015416 0.13803618  0.37203765  0.9122467
theta[7]    0.90083096  0.70852800 0.73445465  0.07550465  2.7534885
theta[8]    0.89487822  0.70761105 0.73007484  0.07211191  2.8067373
theta[9]    1.57604083  1.44719278 0.76035931  0.46374515  3.3866706
theta[10]   1.98932800  1.96195345 0.42352979  1.24334249  2.9068229

and the value of the WAIC statistic.

mcmc.out$WAI

[1] 48.69896

Here, we select sample values for the parameters for pump 1 in the first chain, and put them into a data frame for plotting.

df <- data.frame(mcmc.out$samples$chain1)
df_l <- df %>% select(alpha, beta, theta.1., lambda.1.) %>% gather(key="parameter", value="value")

We plot the sample values.

ps <- df_l %>% ggplot(aes(x=seq_along(value), y = value)) + geom_line()
ps + facet_wrap(~parameter, scales = "free")

And, we plot histograms.

p <- ggplot(df_l,aes(value)) + geom_histogram(aes( y= ..density..),bins = 60) 
p + facet_wrap(~parameter, scales = "free")

Note that it is also possible to perform the MCMC simulation using the compiled model.

mcmc.out_c<- nimbleMCMC(model=Cpump,
                       monitors=c("alpha","beta","theta","lambda"),
                       nchains = 2, niter = 10000,
                       summary = TRUE, WAIC = TRUE)

|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|

mcmc.out_c$summary

$chain1
                  Mean      Median    St.Dev.    95%CI_low  95%CI_upp
alpha       0.70269965  0.65474666 0.27690796  0.288871669  1.3525877
beta        0.92892181  0.82320341 0.54874194  0.186215812  2.2756643
lambda[1]   5.65646492  5.32568604 2.38108302  1.991839453 11.1329833
lambda[2]   1.58848917  1.28392445 1.25676948  0.133001650  4.7163618
lambda[3]   5.62720720  5.30681963 2.36776285  1.986194548 11.1264464
lambda[4]  14.61256770 14.29447077 3.75584085  8.106535985 22.8405075
lambda[5]   3.16521167  2.88771869 1.65132178  0.785161558  7.0181185
lambda[6]  19.12824948 18.77541823 4.27045427 11.733960316 28.4815908
lambda[7]   0.94353548  0.75312906 0.75111813  0.079570796  2.8410369
lambda[8]   0.93661525  0.74764821 0.75397145  0.078424425  2.8536890
lambda[9]   3.33178422  3.05974419 1.63035789  1.019006182  7.3163945
lambda[10] 20.90388784 20.58355995 4.45456152 12.997984788 30.2815949
theta[1]    0.05998372  0.05647599 0.02525009  0.021122370  0.1180592
theta[2]    0.10117765  0.08177863 0.08004901  0.008471443  0.3004052
theta[3]    0.08946275  0.08436915 0.03764329  0.031577020  0.1768910
theta[4]    0.11597276  0.11344818 0.02980826  0.064337587  0.1812739
theta[5]    0.60404803  0.55109135 0.31513774  0.149839992  1.3393356
theta[6]    0.60917992  0.59794326 0.13600173  0.373693004  0.9070570
theta[7]    0.89860522  0.71726577 0.71535060  0.075781711  2.7057495
theta[8]    0.89201452  0.71204591 0.71806805  0.074689928  2.7177991
theta[9]    1.58656391  1.45702104 0.77636090  0.485241039  3.4839974
theta[10]   1.99084646  1.96033904 0.42424395  1.237903313  2.8839614

$chain2
                  Mean      Median    St.Dev.    95%CI_low  95%CI_upp
alpha       0.70184646  0.65773527 0.27237859  0.297108329  1.3420954
beta        0.92323539  0.82124257 0.53880496  0.194879601  2.2590201
lambda[1]   5.59702813  5.25201276 2.35832632  2.029382518 10.9327321
lambda[2]   1.62105397  1.31199418 1.26269004  0.137977536  4.8652461
lambda[3]   5.62874314  5.33611797 2.37576774  1.953756972 11.1296452
lambda[4]  14.53507135 14.22210300 3.84823087  8.042950272 22.9287741
lambda[5]   3.17647361  2.88816158 1.67257096  0.807468793  7.2500264
lambda[6]  19.13576117 18.82011366 4.34294395 11.646448559 28.4716352
lambda[7]   0.94656373  0.74705570 0.76192793  0.084445304  2.9245815
lambda[8]   0.94754678  0.75725106 0.75136514  0.083985118  2.8740656
lambda[9]   3.34514300  3.04989975 1.64818642  0.974288761  7.2961225
lambda[10] 20.97154230 20.61713159 4.45405753 13.260753885 30.5614115
theta[1]    0.05935343  0.05569473 0.02500876  0.021520493  0.1159357
theta[2]    0.10325185  0.08356651 0.08042612  0.008788378  0.3098883
theta[3]    0.08948717  0.08483494 0.03777055  0.031061319  0.1769419
theta[4]    0.11535771  0.11287383 0.03054151  0.063832939  0.1819744
theta[5]    0.60619725  0.55117587 0.31919293  0.154097098  1.3835928
theta[6]    0.60941915  0.59936668 0.13831032  0.370906005  0.9067400
theta[7]    0.90148927  0.71148162 0.72564565  0.080424099  2.7853158
theta[8]    0.90242550  0.72119149 0.71558585  0.079985826  2.7372053
theta[9]    1.59292524  1.45233321 0.78485068  0.463947029  3.4743440
theta[10]   1.99728974  1.96353634 0.42419596  1.262928941  2.9106106

$all.chains
                  Mean      Median    St.Dev.    95%CI_low  95%CI_upp
alpha       0.70227305  0.65580594 0.27464608  0.292312894  1.3489184
beta        0.92607860  0.82237274 0.54378998  0.191611561  2.2684771
lambda[1]   5.62674653  5.28611334 2.36985910  2.007150471 11.0415843
lambda[2]   1.60477157  1.29777398 1.25980697  0.135465213  4.8015059
lambda[3]   5.62797517  5.32327502 2.37170950  1.967656654 11.1291530
lambda[4]  14.57381952 14.26247905 3.80241886  8.071801955 22.9002186
lambda[5]   3.17084264  2.88801833 1.66194832  0.798927322  7.1492800
lambda[6]  19.13200533 18.78653361 4.30674559 11.682136284 28.4767857
lambda[7]   0.94504960  0.75003915 0.75652494  0.081865294  2.8859048
lambda[8]   0.94208101  0.75177975 0.75267045  0.080734142  2.8644982
lambda[9]   3.33846361  3.05488924 1.63926902  0.990402148  7.3041714
lambda[10] 20.93771507 20.60497943 4.45432662 13.116940079 30.4162660
theta[1]    0.05966857  0.05605635 0.02513106  0.021284735  0.1170900
theta[2]    0.10221475  0.08266076 0.08024248  0.008628358  0.3058284
theta[3]    0.08947496  0.08463076 0.03770603  0.031282300  0.1769341
theta[4]    0.11566523  0.11319428 0.03017793  0.064061920  0.1817478
theta[5]    0.60512264  0.55114854 0.31716571  0.152467046  1.3643664
theta[6]    0.60929953  0.59829725 0.13715750  0.372042557  0.9069040
theta[7]    0.90004724  0.71432300 0.72049994  0.077966947  2.7484808
theta[8]    0.89722001  0.71598072 0.71682900  0.076889659  2.7280935
theta[9]    1.58974458  1.45470916 0.78060429  0.471620071  3.4781768
theta[10]   1.99406810  1.96237899 0.42422158  1.249232389  2.8967872

Monte Carlo Expectation Analysis

To illustrate that NIMBLE is more than just an MCMC engine, the manual example uses NIMBLE’s built-in Monte Carlo Expectation algorithm to maximize the marginal likelihood for alpha and beta. The first step is to set up “box constraints” for the model. Then, the buildMCEM() function is used to construct an MCEM algorithm from a NIMBLE model.

pump2 <- pump$newModel()
box = list( list(c("alpha","beta"), c(0, Inf)))
pumpMCEM <- buildMCEM(model = pump2, latentNodes = "theta[1:10]", boxConstraints = box)

This is how to run the Monte Carlo Expectation model. Note that the authors of the NIMBLE manual point out that these results are within 0.01 of the values obtained by George et al.

pumpMLE <- pumpMCEM$run()

|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
Iteration Number: 1.
Current number of MCMC iterations: 1000.
Parameter Estimates: 
    alpha      beta 
0.8130146 1.1125783 
Convergence Criterion: 1.001.
|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
Iteration Number: 2.
Current number of MCMC iterations: 1000.
Parameter Estimates: 
    alpha      beta 
0.8148887 1.2323211 
Convergence Criterion: 0.02959269.
|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
Iteration Number: 3.
Current number of MCMC iterations: 1000.
Parameter Estimates: 
    alpha      beta 
0.8244363 1.2797908 
Convergence Criterion: 0.005418668.
|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
Monte Carlo error too big: increasing MCMC sample size.
|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
Iteration Number: 4.
Current number of MCMC iterations: 1250.
Parameter Estimates: 
    alpha      beta 
0.8280353 1.2700022 
Convergence Criterion: 0.001180319.
|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
Monte Carlo error too big: increasing MCMC sample size.
|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
Monte Carlo error too big: increasing MCMC sample size.
|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
Iteration Number: 5.
Current number of MCMC iterations: 2188.
Parameter Estimates: 
    alpha      beta 
0.8268224 1.2794244 
Convergence Criterion: 0.000745466.

Like the tidyverse, NIBMLE is an ecosystem of DSLs. The BUGS language is extended and used as a DSL for formulation models. nimbleFunctions is a language for writing algorithms that may be used with both BUGS and R. But, unlike the tidyverse, the NIMBLE DSLs are not distributed across multiple packages, but instead wrapped up into the NIMBLE package code.

For more on the design an use of DSLs with R, have a look at the Chapter in Advanced R, or Thomas Mailund’s new book, Domain-Specific Languages in R.