Skip to contents

The stantargets package makes it easy to run a single Stan model and keep track of the results. cmdstanr fits the models, and targets manages the workflow and helps avoid unnecessary computation.

First, write a Stan model file.

lines <- "data {
  int <lower = 1> n;
  vector[n] x;
  vector[n] y;
  real true_beta;
}
parameters {
  real beta;
}
model {
  y ~ normal(x * beta, 1);
  beta ~ normal(0, 1);
}"
writeLines(lines, "x.stan")

A typical workflow proceeds as follows:

  1. Prepare a list of input data to Stan, including vector elements x and y.
  2. Fit the Stan model using the list of input data.
  3. Use the fitted model object to compute posterior summaries and convergence diagnostics.
  4. Use the fitted model object to extract posterior draws of parameters and store them in a tidy data frame.
  5. Use the fitted model to compute Hamiltonian Monte Carlo (HMC) diagnostics.

stantargets expresses this workflow using the tar_stan_mcmc() function. To use it in a targets pipeline, invoke it from the _targets.R script of the project.

# _targets.R
library(targets)
library(stantargets)

generate_data <- function(n = 10) {
  true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, x * true_beta, 1)
  list(n = n, x = x, y = y, true_beta = true_beta)
}

# The _targets.R file ends with a list of target objects
# produced by stantargets::tar_stan_mcmc(), targets::tar_target(), or similar.
list(
  tar_stan_mcmc(
    example,
    "x.stan",
    generate_data(),
    stdout = R.utils::nullfile(),
    stderr = R.utils::nullfile()
  )
)

Above, tar_stan_mcmc(example, ...) only defines the pipeline. It does not actually run Stan, it declares the targets that will eventually run Stan. Run tar_manifest() to show specific details about the targets.

tar_manifest()
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 
#> # A tibble: 6 × 3
#>   name                  command                                      description
#>   <chr>                 <chr>                                        <chr>      
#> 1 example_data          "generate_data()"                            NA         
#> 2 example_file_x        "\"x.stan\""                                 x.stan     
#> 3 example_mcmc_x        "stantargets::tar_stan_mcmc_run(stan_file =… x.stan     
#> 4 example_diagnostics_x "tibble::as_tibble(posterior::as_draws_df(e… x.stan     
#> 5 example_summary_x     "stantargets::tar_stan_summary_join_data(su… x.stan     
#> 6 example_draws_x       "tibble::as_tibble(posterior::as_draws_df(e… x.stan

Each target listed above is responsible for a piece of the workflow.

  • example_file_x: Reproducibly track changes to the Stan model file.
  • example_data: Run the code you supplied to the data argument of tar_stan_mcmc() and return a dataset compatible with Stan.
  • example_mcmc_x: Run the MCMC and return an object of class CmdStanMCMC.
  • example_draws_X: Return a friendly tibble of the posterior draws from example. Uses the $draws() method. Suppress with draws = FALSE in tar_stan_mcmc().
  • example_summaries_x: Return a friendly tibble of the posterior summaries from example. Uses the $summary() method. Suppress with summary = FALSE in tar_stan_mcmc().
  • example_diagnostics_x: Return a friendly tibble of the sampler diagnostics from example. Uses the $sampler_diagnostics() method. Suppress with diagnostics = FALSE in tar_stan_mcmc().

The suffix _x comes from the base name of the model file, in this case x.stan. If you supply multiple model files to the stan_files argument, all the models share the same dataset, and the suffixes distinguish among the various targets.

targets produces a graph to show the dependency relationships among the targets. Below, the MCMC depends on the model file and the data, and the draws, summary, and diagnostics depend on the MCMC.

tar_visnetwork(targets_only = TRUE)
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 

Run the computation with tar_make().

tar_make()
#>  dispatched target example_data
#>  completed target example_data [0 seconds, 287 bytes]
#>  dispatched target example_file_x
#>  completed target example_file_x [0 seconds, 166 bytes]
#>  dispatched target example_mcmc_x
#>  completed target example_mcmc_x [9.036 seconds, 166.909 kilobytes]
#>  dispatched target example_diagnostics_x
#>  completed target example_diagnostics_x [0.001 seconds, 85.808 kilobytes]
#>  dispatched target example_summary_x
#>  completed target example_summary_x [0.041 seconds, 1.076 kilobytes]
#>  dispatched target example_draws_x
#>  completed target example_draws_x [0.002 seconds, 72.04 kilobytes]
#>  ended pipeline [9.687 seconds]
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 

The output lives in a special folder called _targets/ and you can retrieve it with functions tar_load() and tar_read() (from targets).

tar_read(example_summary_x)
#> # A tibble: 2 × 11
#>   variable   mean median    sd   mad    q5    q95  rhat ess_bulk ess_tail
#>   <chr>     <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 lp__     -4.20  -3.93  0.706 0.311 -5.64 -3.70   1.00    1882.    1839.
#> 2 beta     -0.885 -0.886 0.442 0.443 -1.60 -0.148  1.00    1654.    1614.
#> # ℹ 1 more variable: .join_data <dbl>

At this point, all our results are up to date because their dependencies did not change.

tar_make()
#>  skipping targets (1 so far)...
#>  skipped pipeline [0.064 seconds]
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 

But if we change the underlying code or data, some of the targets will no longer be valid, and they will rerun during the next tar_make(). Below, we change the Stan model file, so the MCMC reruns while the data is skipped. This behavior saves time and enhances reproducibility.

write(" ", file = "x.stan", append = TRUE)
tar_outdated()
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 
#> [1] "example_summary_x"     "example_diagnostics_x" "example_file_x"       
#> [4] "example_draws_x"       "example_mcmc_x"
tar_visnetwork(targets_only = TRUE)
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 
tar_make()
#>  skipping targets (1 so far)...
#>  dispatched target example_file_x
#>  completed target example_file_x [0.001 seconds, 168 bytes]
#>  dispatched target example_mcmc_x
#>  completed target example_mcmc_x [9.011 seconds, 166.817 kilobytes]
#>  dispatched target example_diagnostics_x
#>  completed target example_diagnostics_x [0.001 seconds, 85.808 kilobytes]
#>  dispatched target example_summary_x
#>  completed target example_summary_x [0.041 seconds, 1.076 kilobytes]
#>  dispatched target example_draws_x
#>  completed target example_draws_x [0.002 seconds, 72.04 kilobytes]
#>  ended pipeline [9.581 seconds]
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 

At this point, we can add more targets and custom functions for additional post-processing.

# _targets.R
library(targets)
library(stantargets)

generate_data <- function(n = 10) {
  true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, x * true_beta, 1)
  list(n = n, x = x, y = y, true_beta = true_beta)
}

list(
  tar_stan_mcmc(
    example,
    "x.stan",
    generate_data(),
    stdout = R.utils::nullfile(),
    stderr = R.utils::nullfile()
  ),
  tar_stan_summary(
    custom_summary,
    fit = example_mcmc_x,
    summaries = list(~posterior::quantile2(.x, probs = c(0.25, 0.75)))
  )
)

In the graph, our new custom_summary target should be connected to the upstream example target, and only custom_summary should be out of date.

tar_visnetwork(targets_only = TRUE)
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 

In the next tar_make(), we skip the expensive MCMC and just run the custom summary.

tar_make()
#>  skipping targets (1 so far)...
#>  dispatched target custom_summary
#>  completed target custom_summary [0.022 seconds, 580 bytes]
#>  ended pipeline [0.194 seconds]
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 
tar_read(custom_summary)
#> # A tibble: 2 × 4
#>   variable   q25    q75 .join_data
#>   <chr>    <dbl>  <dbl>      <dbl>
#> 1 lp__     -4.36 -3.75          NA
#> 2 beta     -1.18 -0.580         NA

Multiple models

tar_stan_mcmc() and related functions allow you to supply multiple models to stan_files. If you do, each model will run on the same dataset. Consider a new model y.stan.

lines <- "data {
  int <lower = 1> n;
  vector[n] x;
  vector[n] y;
  real true_beta;
}
parameters {
  real beta;
}
model {
  y ~ normal(x * x * beta, 1); // Regress on x^2 instead of x.
  beta ~ normal(0, 1);
}"
writeLines(lines, "y.stan")

To include this y.stan, we add it to the stan_files argument of tar_stan_mcmc().

#> [1] FALSE
# _targets.R
library(targets)
library(stantargets)

generate_data <- function(n = 10) {
  true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, x * true_beta, 1)
  list(n = n, x = x, y = y, true_beta = true_beta)
}

list(
  tar_stan_mcmc(
    example,
    c("x.stan", "y.stan"), # another model
    generate_data(),
    stdout = R.utils::nullfile(),
    stderr = R.utils::nullfile()
  ),
  tar_stan_summary(
    custom_summary,
    fit = example_mcmc_x,
    summaries = list(~posterior::quantile2(.x, probs = c(0.25, 0.75)))
  )
)

In the graph below, notice how the *_x targets and *_y targets are both connected to example_data upstream.

tar_visnetwork(targets_only = TRUE)
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 

Generated quantities

It is possible to use the CmdStanMCMC object from one run to simulate generated quantities downstream. For example, the tar_stan_gq_rep_summaries() function takes a single CmdStanMCMC object, produces multiple replications of generated quantities from multiple models, and aggregates the summaries from each. The following pipeline uses this technique to repeatedly draw from the posterior predictive distribution.

lines <- "data {
  int <lower = 1> n;
  vector[n] x;
  vector[n] y;
}
parameters {
  real beta;
}
model {
  y ~ normal(x * beta, 1);
  beta ~ normal(0, 1);
}
generated quantities {
  array[n] real y_rep = normal_rng(x * beta, 1); // posterior predictive draws
}"
writeLines(lines, "gen.stan")
# _targets.R
library(targets)
library(stantargets)

generate_data <- function(n = 10) {
  true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, x * true_beta, 1)
  list(n = n, x = x, y = y, true_beta = true_beta)
}

list(
  tar_stan_mcmc(
    example,
    "x.stan",
    generate_data(),
    stdout = R.utils::nullfile(),
    stderr = R.utils::nullfile()
  ),
  tar_stan_gq_rep_summary(
    postpred,
    stan_files = "gen.stan",
    fitted_params = example_mcmc_x, # one CmdStanFit object
    data = generate_data(), # Function runs once per rep.
    batches = 2, # 2 dynamic branches
    reps = 5, # 5 replications per branch
    quiet = TRUE,
    stdout = R.utils::nullfile(),
    stderr = R.utils::nullfile()
  )
)

Since we have defined many objects in the pipeline, it is extra important to check the graph to be sure everything is connected.

tar_visnetwork(targets_only = TRUE)
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 

Then, we run the computation. The original MCMC is already up to date, so we only run the targets relevant to the generated quantities.

tar_make()
#>  dispatched target postpred_batch
#>  completed target postpred_batch [0 seconds, 98 bytes]
#>  dispatched target postpred_file_gen
#>  completed target postpred_file_gen [8.707 seconds, 2.612 megabytes]
#>  skipping targets (1 so far)...
#>  dispatched branch postpred_data_3f82502bb81fc7e3
#>  completed branch postpred_data_3f82502bb81fc7e3 [0.005 seconds, 766 bytes]
#>  dispatched branch postpred_data_f7ab7117578bb2a5
#>  completed branch postpred_data_f7ab7117578bb2a5 [0.001 seconds, 766 bytes]
#>  completed pattern postpred_data 
#>  dispatched branch postpred_gen_7f380f9fd103c950
#>  completed branch postpred_gen_7f380f9fd103c950 [3.33 seconds, 8.241 kilobytes]
#>  dispatched branch postpred_gen_b23471e06f80f1f3
#>  completed branch postpred_gen_b23471e06f80f1f3 [3.15 seconds, 8.241 kilobytes]
#>  completed pattern postpred_gen 
#>  skipping targets (4 so far)...
#>  dispatched target postpred
#>  completed target postpred [0.001 seconds, 13.564 kilobytes]
#>  ended pipeline [16.209 seconds]
#> Warning message:
#> package ‘targets’ was built under R version 4.4.2 
#> 

Finally, we read the summaries of posterior predictive samples.

tar_read(postpred)
#> # A tibble: 100 × 16
#>    variable     mean  median    sd   mad     q5   q95  rhat ess_bulk ess_tail
#>    <chr>       <dbl>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
#>  1 y_rep[1]   0.893   0.893   1.06 1.05  -0.892 2.69   1.00    3476.    3890.
#>  2 y_rep[2]   0.689   0.702   1.08 1.06  -1.09  2.47   1.00    3500.    3555.
#>  3 y_rep[3]   0.493   0.492   1.08 1.07  -1.25  2.30   1.00    3746.    3813.
#>  4 y_rep[4]   0.253   0.224   1.01 0.985 -1.37  1.99   1.00    3765.    3373.
#>  5 y_rep[5]   0.0607  0.0389  1.02 0.983 -1.64  1.76   1.00    3933.    3630.
#>  6 y_rep[6]  -0.0506 -0.0680  1.01 1.00  -1.66  1.67   1.00    4232.    3429.
#>  7 y_rep[7]  -0.287  -0.324   1.01 1.01  -1.92  1.47   1.00    3702.    3548.
#>  8 y_rep[8]  -0.489  -0.485   1.02 1.03  -2.13  1.25   1.00    3812.    3644.
#>  9 y_rep[9]  -0.619  -0.621   1.06 1.09  -2.34  1.15   1.00    4186.    3853.
#> 10 y_rep[10] -0.833  -0.855   1.06 1.05  -2.52  0.963  1.00    3577.    3710.
#> # ℹ 90 more rows
#> # ℹ 6 more variables: .join_data <dbl>, .rep <chr>, .dataset_id <chr>,
#> #   .seed <int>, .file <chr>, .name <chr>

More information

For more on targets, please visit the reference website https://docs.ropensci.org/targets/ or the user manual https://books.ropensci.org/targets/. The manual walks though advanced features of targets such as high-performance computing and cloud storage support.