The jagstargets package makes it easy to run a single JAGS model and keep track of the results. R2jags fits the models, and targets manages the workflow and helps avoid unnecessary computation.
Consider the simple regression model below with response variable y and covariate x.
\[ \begin{aligned} y_i &\stackrel{\text{iid}}{\sim} \text{Normal}(x_i \beta, 1) \\ \beta &\sim \text{Normal}(0, 1) \end{aligned} \]
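Because the likelihood and the prior are both normal with known variance, this particular model is conjugate. The posterior of \(\beta\) therefore has a closed form, which can serve as a sanity check on the MCMC output (here the second argument of Normal is the variance):
\[ \beta \mid y \sim \text{Normal}\left( \frac{\sum_i x_i y_i}{1 + \sum_i x_i^2}, \; \frac{1}{1 + \sum_i x_i^2} \right) \]
The R function below is a minimal sketch of that calculation. The name exact_posterior() is hypothetical and is not part of jagstargets.

```r
# Exact posterior of beta for the conjugate model above (hypothetical helper).
# Likelihood: y_i ~ Normal(x_i * beta, variance 1); prior: beta ~ Normal(0, variance 1).
exact_posterior <- function(x, y) {
  precision <- 1 + sum(x^2)            # prior precision (1) plus data precision
  post_mean <- sum(x * y) / precision  # precision-weighted posterior mean
  c(mean = post_mean, sd = sqrt(1 / precision))
}

# Example with a small, made-up dataset: mean = 4/3, sd = sqrt(1/3).
exact_posterior(x = c(-1, 0, 1), y = c(-2, 0, 2))
```

For the same data, the mean and sd of beta reported by the MCMC summaries should land close to these values.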
We write this model in the JAGS model file below.
lines <- "model {
for (i in 1:n) {
y[i] ~ dnorm(x[i] * beta, 1)
}
beta ~ dnorm(0, 1)
}"
writeLines(lines, "x.jags")
A typical workflow proceeds as follows:
- Prepare a list of input data to JAGS, including vector elements x and y.
- Fit the JAGS model using the list of input data.
- Use the fitted model object to compute posterior summaries and convergence diagnostics.
- Use the fitted model object to extract posterior draws of parameters and store them in a tidy data frame.
- If there are other models to compare, use the fitted model object to compute the deviance information criterion (DIC).
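For comparison, the steps above can be sketched by hand with R2jags. This is a minimal sketch, not what tar_jags() runs internally. It assumes R2jags and JAGS are installed and that the model file is x.jags from above, and the helper name run_workflow_by_hand() is hypothetical. It reads results from the BUGSoutput component of the rjags object returned by R2jags::jags().

```r
# Hypothetical helper: the five workflow steps without targets.
# Assumes R2jags and a working JAGS installation are available.
run_workflow_by_hand <- function(data, model_file = "x.jags") {
  # Step 2: fit the JAGS model with R2jags' default chain and iteration settings.
  fit <- R2jags::jags(
    data = data,
    parameters.to.save = "beta",
    model.file = model_file
  )
  out <- fit$BUGSoutput
  list(
    # Step 3: posterior summaries plus convergence diagnostics (Rhat, n.eff).
    summary = out$summary,
    # Step 4: posterior draws, one column per saved node (beta and deviance).
    draws = as.data.frame(out$sims.matrix),
    # Step 5: DIC and its penalty pD, for comparing models.
    dic = data.frame(dic = out$DIC, penalty = out$pD)
  )
}
```

Calling run_workflow_by_hand() on a list with elements n, x, and y would perform one fit. The pipeline below does the same kind of work, but splits it into targets that targets can cache and skip.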
jagstargets encapsulates this workflow with the tar_jags() function. To use it in a targets pipeline, invoke it from the _targets.R script of the project.
# _targets.R
library(targets)
library(jagstargets)
generate_data <- function(n = 10) {
true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
x <- seq(from = -1, to = 1, length.out = n)
y <- stats::rnorm(n, x * true_beta, 1)
list(n = n, x = x, y = y)
}
# The _targets.R file ends with a list of target objects
# produced by jagstargets::tar_jags(), targets::tar_target(), or similar.
list(
tar_jags(
example,
jags_files = "x.jags",
parameters.to.save = "beta",
data = generate_data()
)
)
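JAGS matches the elements of the data list to the nodes of the model by name. generate_data() must therefore return a list containing n, x, and y, with x and y of length n, because the model loops over 1:n. A quick base-R check can catch a mismatch before the MCMC runs. The helper check_jags_data() is hypothetical and not part of jagstargets.

```r
# Hypothetical helper: confirm a data list supplies the nodes x.jags expects.
check_jags_data <- function(data, required = c("n", "x", "y")) {
  absent <- setdiff(required, names(data))
  if (length(absent) > 0) {
    stop("data is missing: ", paste(absent, collapse = ", "))
  }
  # The model loops over 1:n, so x and y need exactly n elements each.
  stopifnot(length(data$x) == data$n, length(data$y) == data$n)
  invisible(TRUE)
}

# Same data-generating function as in _targets.R.
generate_data <- function(n = 10) {
  true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, x * true_beta, 1)
  list(n = n, x = x, y = y)
}

check_jags_data(generate_data())
```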
tar_jags() only defines the pipeline. It does not actually run JAGS; it declares the targets that will eventually run JAGS. Run tar_manifest() to show details about the declared targets.
tar_manifest()
#> # A tibble: 7 × 3
#> name command description
#> <chr> <chr> <chr>
#> 1 example_data "tar_jags_example_data()" NA
#> 2 example_file_x "\"x.jags\"" x.jags
#> 3 example_lines_x "readLines(con = example_file_x)" x.jags
#> 4 example_mcmc_x "jagstargets::tar_jags_run(jags_lines = example… x.jags
#> 5 example_summary_x "jagstargets::tar_jags_df(example_mcmc_x, data … x.jags
#> 6 example_dic_x "jagstargets::tar_jags_df(fit = example_mcmc_x,… x.jags
#> 7 example_draws_x "jagstargets::tar_jags_df(fit = example_mcmc_x,… x.jags
Each target is responsible for a piece of the workflow.
- example_file_x: Reproducibly track changes to the JAGS model file.
- example_lines_x: Read the model file into a character vector of lines with readLines().
- example_data: Run the code you supplied to the data argument of tar_jags() and return a dataset compatible with JAGS.
- example_mcmc_x: Run the MCMC and return an object of class rjags from R2jags.
- example_draws_x: Return a friendly tibble of the posterior draws from example_mcmc_x.
- example_summary_x: Return a friendly tibble of the posterior summaries from example_mcmc_x. Uses posterior::summarize_draws().
- example_dic_x: Return a friendly tibble with each model's DIC and penalty.
The suffix _x comes from the base name of the model file, in this case x.jags. If you supply multiple model files to the jags_files argument, all the models share the same dataset, and the suffixes distinguish among the various targets.
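The naming pattern visible in the tar_manifest() output can be mimicked in a few lines of base R. This is an illustrative sketch of that pattern, not jagstargets' internal code, and the helper jags_target_names() is hypothetical.

```r
# Hypothetical helper: reproduce the target names tar_jags() declares,
# following the pattern shown by tar_manifest().
jags_target_names <- function(name, jags_files) {
  # "x.jags" -> "x", "models/y.jags" -> "y"
  stems <- tools::file_path_sans_ext(basename(jags_files))
  per_model <- c("file", "lines", "mcmc", "summary", "dic", "draws")
  kinds <- rep(per_model, times = length(stems))
  models <- rep(stems, each = length(per_model))
  # One shared data target, then one full set of targets per model file.
  c(paste0(name, "_data"), paste0(name, "_", kinds, "_", models))
}

jags_target_names("example", c("x.jags", "y.jags"))
```

With a single file the sketch yields the seven names listed by tar_manifest(). With two files the data target stays shared and only the per-model targets double.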
The targets depend on one another: for example, example_mcmc_x takes example_data as input. targets can visualize the dependency relationships in a dependency graph, which is helpful for understanding the pipeline and troubleshooting issues.
tar_visnetwork(targets_only = TRUE)
Run the computation with tar_make().
tar_make()
#> ▶ dispatched target example_data
#> ● completed target example_data [0 seconds]
#> ▶ dispatched target example_file_x
#> ● completed target example_file_x [0 seconds]
#> ▶ dispatched target example_lines_x
#> ● completed target example_lines_x [0 seconds]
#> ▶ dispatched target example_mcmc_x
#> ● completed target example_mcmc_x [0.037 seconds]
#> ▶ dispatched target example_summary_x
#> ● completed target example_summary_x [0.032 seconds]
#> ▶ dispatched target example_dic_x
#> ● completed target example_dic_x [0.002 seconds]
#> ▶ dispatched target example_draws_x
#> ● completed target example_draws_x [0.001 seconds]
#> ▶ ended pipeline [0.157 seconds]
#>
The output lives in a special folder called _targets/, and you can retrieve it with the functions tar_load() and tar_read() from targets.
tar_read(example_summary_x)
#> # A tibble: 2 × 11
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 beta -0.873 -0.875 0.438 0.434 -1.60 -0.156 1.00 2451. 2689.
#> 2 deviance 25.8 25.3 1.34 0.609 24.8 28.5 1.00 2925. 2946.
#> # ℹ 1 more variable: .join_data <dbl>
At this point, all our results are up to date because their dependencies did not change.
tar_make()
#> ✔ skipped target example_data
#> ✔ skipped target example_file_x
#> ✔ skipped target example_lines_x
#> ✔ skipped target example_mcmc_x
#> ✔ skipped target example_summary_x
#> ✔ skipped target example_dic_x
#> ✔ skipped target example_draws_x
#> ✔ skipped pipeline [0.08 seconds]
#>
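Roughly speaking, targets decides whether a target is up to date by comparing hashes of its command, its dependencies, and its stored output against the values recorded in the _targets/ metadata. The sketch below uses base R's tools::md5sum() only to illustrate that even a trivial edit, such as appending a blank line to a model file, changes a file's hash. targets itself uses a different hash function, and file_changed() is a hypothetical helper.

```r
# Hypothetical helper: has a file's content changed since a recorded hash?
file_changed <- function(path, recorded_hash) {
  unname(tools::md5sum(path)) != recorded_hash
}

path <- tempfile(fileext = ".jags")
writeLines("model {\n  beta ~ dnorm(0, 1)\n}", path)
recorded <- unname(tools::md5sum(path))  # hash remembered after the first run

file_changed(path, recorded)             # FALSE: nothing edited yet
write(" ", file = path, append = TRUE)   # a whitespace-only edit
file_changed(path, recorded)             # TRUE: content differs, so dependents rerun
```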
But if we change the underlying code or data, some of the targets will no longer be valid, and they will rerun during the next tar_make(). Below, we change the JAGS model file, so the MCMC reruns while the data target is skipped. This behavior saves time and enhances reproducibility.
write(" ", file = "x.jags", append = TRUE)
tar_outdated()
#> [1] "example_summary_x" "example_dic_x" "example_file_x"
#> [4] "example_draws_x" "example_mcmc_x" "example_lines_x"
tar_visnetwork(targets_only = TRUE)
tar_make()
#> ✔ skipped target example_data
#> ▶ dispatched target example_file_x
#> ● completed target example_file_x [0 seconds]
#> ▶ dispatched target example_lines_x
#> ● completed target example_lines_x [0 seconds]
#> ▶ dispatched target example_mcmc_x
#> ● completed target example_mcmc_x [0.035 seconds]
#> ▶ dispatched target example_summary_x
#> ● completed target example_summary_x [0.03 seconds]
#> ▶ dispatched target example_dic_x
#> ● completed target example_dic_x [0.002 seconds]
#> ▶ dispatched target example_draws_x
#> ● completed target example_draws_x [0.001 seconds]
#> ▶ ended pipeline [0.175 seconds]
#>
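The set of targets that reran follows from the dependency graph. Everything downstream of example_file_x is invalidated, while example_data, which does not depend on the model file, is skipped. Below is a minimal breadth-first sketch in base R. The edge list is an assumption read off the tar_manifest() and tar_make() output above, so the truncated commands may hide additional edges. The invalidated() helper is hypothetical.

```r
# Assumed edges: each element lists the targets that depend directly on it.
downstream_of <- list(
  example_file_x  = "example_lines_x",
  example_lines_x = "example_mcmc_x",
  example_data    = "example_mcmc_x",
  example_mcmc_x  = c("example_summary_x", "example_dic_x", "example_draws_x")
)

# Hypothetical helper: breadth-first search for everything a change invalidates.
invalidated <- function(changed, graph) {
  out <- character(0)
  queue <- changed
  while (length(queue) > 0) {
    node <- queue[[1]]
    queue <- queue[-1]
    if (node %in% out) next
    out <- c(out, node)
    # graph[[node]] is NULL for targets with no dependents.
    queue <- c(queue, graph[[node]])
  }
  out
}

invalidated("example_file_x", downstream_of)
```

The result contains the same six targets that tar_outdated() reported after the model file was edited.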
At this point, we can add more targets and custom functions for additional post-processing. See below for a custom summary target (which is equivalent to customizing the summaries argument of tar_jags()).
# _targets.R
library(targets)
library(jagstargets)
generate_data <- function(n = 10) {
true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
x <- seq(from = -1, to = 1, length.out = n)
y <- stats::rnorm(n, x * true_beta, 1)
list(n = n, x = x, y = y)
}
list(
tar_jags(
example,
jags_files = "x.jags",
parameters.to.save = "beta",
data = generate_data()
),
tar_target(
custom_summary,
posterior::summarize_draws(
dplyr::select(example_draws_x, -starts_with(".")),
~posterior::quantile2(.x, probs = c(0.25, 0.75))
)
)
)
In the graph, our new custom_summary target should be connected to the upstream example_draws_x target, and only custom_summary should be out of date.
tar_visnetwork(targets_only = TRUE)
In the next tar_make(), we skip the expensive MCMC and just run the custom summary.
tar_make()
#> ✔ skipped target example_data
#> ✔ skipped target example_file_x
#> ✔ skipped target example_lines_x
#> ✔ skipped target example_mcmc_x
#> ✔ skipped target example_summary_x
#> ✔ skipped target example_draws_x
#> ✔ skipped target example_dic_x
#> ▶ dispatched target custom_summary
#> ● completed target custom_summary [0.01 seconds]
#> ▶ ended pipeline [0.179 seconds]
#>
tar_read(custom_summary)
#> # A tibble: 2 × 3
#> variable q25 q75
#> <chr> <dbl> <dbl>
#> 1 beta -1.16 -0.578
#> 2 deviance 24.9 26.1
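To make explicit what the custom_summary target computes, here is a rough base-R equivalent. It is a sketch that assumes a draws data frame shaped like example_draws_x, whose bookkeeping columns start with ".". The helper custom_quantiles() is hypothetical. It uses stats::quantile() with its default method, which may differ in small details from posterior::quantile2(). The toy draws are made-up values for illustration only.

```r
# Hypothetical helper: 25% and 75% quantiles of every non-bookkeeping column.
custom_quantiles <- function(draws, probs = c(0.25, 0.75)) {
  # Drop bookkeeping columns such as .chain, .iteration, and .draw.
  params <- draws[, !startsWith(names(draws), "."), drop = FALSE]
  rows <- lapply(names(params), function(v) {
    q <- stats::quantile(params[[v]], probs = probs, names = FALSE)
    out <- data.frame(variable = v, t(q))
    names(out) <- c("variable", paste0("q", probs * 100))
    out
  })
  do.call(rbind, rows)
}

# Made-up draws, not output from the pipeline.
toy_draws <- data.frame(
  beta = c(1, 2, 3, 4, 5),
  deviance = c(10, 20, 30, 40, 50),
  .chain = 1
)
custom_quantiles(toy_draws)
```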
Multiple models
tar_jags() and related functions allow you to supply multiple models to jags_files. If you do, each model will run on the same dataset. Consider a new model, y.jags.
lines <- "model {
for (i in 1:n) {
y[i] ~ dnorm(x[i] * x[i] * beta, 1) # Regress on x^2 instead of x.
}
beta ~ dnorm(0, 1)
}"
writeLines(lines, "y.jags")
Below, we add y.jags to the jags_files argument of tar_jags().
# _targets.R
library(targets)
library(jagstargets)
generate_data <- function(n = 10) {
true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
x <- seq(from = -1, to = 1, length.out = n)
y <- stats::rnorm(n, x * true_beta, 1)
list(n = n, x = x, y = y)
}
list(
tar_jags(
example,
jags_files = c("x.jags", "y.jags"),
parameters.to.save = "beta",
data = generate_data()
),
tar_target(
custom_summary,
posterior::summarize_draws(
dplyr::select(example_draws_x, -starts_with(".")),
~posterior::quantile2(.x, probs = c(0.25, 0.75))
)
)
)
In the graph below, notice how the *_x targets and *_y targets are both connected to example_data upstream.
tar_visnetwork(targets_only = TRUE)
More information
For more on targets, please visit the reference website https://docs.ropensci.org/targets/ or the user manual https://books.ropensci.org/targets/. The manual walks through advanced features of targets such as high-performance computing and cloud storage support.