# Estimate a Bayesian Dynamic Multivariate Panel Model

Source:`R/dynamite.R`

, `R/print.R`

, `R/summary.R`

`dynamite.Rd`

Fit a Bayesian dynamic multivariate panel model (DMPM) using Stan for
Bayesian inference. The dynamite package supports a wide range of
distributions and allows the user to flexibly customize the priors for the
model parameters. The dynamite model is specified using standard R formula
syntax via `dynamiteformula()`

. For more information and examples,
see 'Details' and the package vignettes.

The `formula`

method returns the model definition as a quoted expression.

Information on the estimated `dynamite`

model can be obtained via
`print()`

including the following: The model formula, the data,
the smallest effective sample sizes, largest Rhat and summary statistics of
the time-invariant and group-invariant model parameters.

The `summary()`

method provides statistics of the posterior samples of the
model; this is an alias of `as.data.frame.dynamitefit()`

with
`summary = TRUE`

.

## Usage

```
dynamite(
dformula,
data,
time,
group = NULL,
priors = NULL,
backend = "rstan",
verbose = TRUE,
verbose_stan = FALSE,
stanc_options = list("O0"),
threads_per_chain = 1L,
grainsize = NULL,
custom_stan_model = NULL,
debug = NULL,
...
)
# S3 method for class 'dynamitefit'
formula(x, ...)
# S3 method for class 'dynamitefit'
print(x, full_diagnostics = FALSE, ...)
# S3 method for class 'dynamitefit'
summary(object, ...)
```

## Arguments

- dformula
[

`dynamiteformula`

]

The model formula. See`dynamiteformula()`

and 'Details'.- data
[

`data.frame`

,`tibble::tibble`

, or`data.table::data.table`

]

The data that contains the variables in the model in long format. Supported column types are`integer`

,`logical`

,`double`

, and`factor`

. Columns of type`character`

will be converted to factors. Unused factor levels will be dropped. The`data`

can contain missing values which will simply be ignored in the estimation in a case-wise fashion (per time-point and per channel). Input`data`

is converted to channel specific matrix representations via`stats::model.matrix.lm()`

.- time
[

`character(1)`

]

A column name of`data`

that denotes the time index of observations. If this variable is a factor, the integer representation of its levels are used internally for defining the time indexing.- group
[

`character(1)`

]

A column name of`data`

that denotes the unique groups or`NULL`

corresponding to a scenario without any groups. If`group`

is`NULL`

, a new column`.group`

is created with constant value`1L`

is created indicating that all observations belong to the same group. In case of name conflicts with`data`

, see the`group_var`

element of the return object to get the column name of the new variable.- priors
[

`data.frame`

]

An optional data frame with prior definitions. See`get_priors()`

and 'Details'.- backend
[

`character(1)`

]

Defines the backend interface to Stan, should be either`"rstan"`

(the default) or`"cmdstanr"`

. Note that`cmdstanr`

needs to be installed separately as it is not on CRAN. It also needs the actual`CmdStan`

software. See https://mc-stan.org/cmdstanr/ for details.- verbose
[

`logical(1)`

]

All warnings and messages are suppressed if set to`FALSE`

. Defaults to`TRUE`

. Setting this to`FALSE`

will also disable checks for perfect collinearity in the model matrix.- verbose_stan
[

`logical(1)`

]

This is the`verbose`

argument for`rstan::sampling()`

. Defaults to`FALSE`

.- stanc_options
[

`list()`

]

This is the`stanc_options`

argument passed to the compile method of a`CmdStanModel`

object via`cmdstan_model()`

when`backend = "cmdstanr"`

. Defaults to`list("O0")`

. To enable level one compiler optimizations, use`list("O1")`

. See https://mc-stan.org/cmdstanr/reference/cmdstan_model.html for details.- threads_per_chain
[

`integer(1)`

]

A Positive integer defining the number of parallel threads to use within each chain. Default is`1`

. See`rstan::rstan_options()`

and`cmdstanr::sample()`

for details.- grainsize
[

`integer(1)`

]

A positive integer defining the suggested size of the partial sums when using within-chain parallelization. Default is number of time points divided by`threads_per_chain`

. Setting this to`1`

leads the workload division entirely to the internal scheduler. The performance of the within-chain parallelization can be sensitive to the choice of`grainsize`

, see Stan manual on reduce-sum for details.- custom_stan_model
[

`character(1)`

]

An optional character string that either contains a customized Stan model code or a path to a`.stan`

file that contains the code. Using this will override the generated model code. For expert users only.- debug
[

`list()`

]

A named list of form`name = TRUE`

indicating additional objects in the environment of the`dynamite`

function which are added to the return object. Additionally, values`no_compile = TRUE`

and`no_sampling = TRUE`

can be used to skip the compilation of the Stan code and sampling steps respectively. This can be useful for debugging when combined with`model_code = TRUE`

, which adds the Stan model code to the return object.- ...
For

`dynamite()`

, additional arguments to`rstan::sampling()`

or the`$sample()`

method of the`CmdStanModel`

object (see https://mc-stan.org/cmdstanr/reference/model-method-sample.html), such as`chains`

and`cores`

(`chains`

and`parallel_chains`

in`cmdstanr`

). For`summary()`

, additional arguments to`as.data.frame.dynamitefit()`

. For`print()`

, further arguments to the print method for tibbles (see tibble::formatting). Not used for`formula()`

.- x
[

`dynamitefit`

]

The model fit object.- full_diagnostics
By default, the effective sample size (ESS) and Rhat are computed only for the time- and group-invariant parameters (

`full_diagnostics = FALSE`

). Setting this to`TRUE`

computes ESS and Rhat values for all model parameters, which can take some time for complex models.- object
[

`dynamitefit`

]

The model fit object.

## Value

`dynamite`

returns a `dynamitefit`

object which is a list containing
the following components:

`stanfit`

A`stanfit`

object, see`rstan::sampling()`

for details.`dformulas`

A list of`dynamiteformula`

objects for internal use.`data`

A processed version of the input`data`

.`data_name`

Name of the input data object.`stan`

A`list`

containing various elements related to Stan model construction and sampling.`group_var`

Name of the variable defining the groups.`time_var`

Name of the variable defining the time index.`priors`

Data frame containing the used priors.`backend`

Either`"rstan"`

or`"cmdstanr"`

indicating which package was used in sampling.`permutation`

Randomized permutation of the posterior draws.`call`

Original function call as an object of class`call`

.

`formula`

returns a quoted expression.

`print`

returns `x`

invisibly.

`summary`

returns a `data.frame`

.

## Details

The best-case scalability of `dynamite`

in terms of data size should be
approximately linear in terms of number of time points and and number of
groups, but as wall-clock time of the MCMC algorithms provided by Stan can
depend on the discrepancy of the data and the model (and the subsequent
shape of the posterior), this can vary greatly.

## References

Santtu Tikka and Jouni Helske (2023). dynamite: An R Package for Dynamic Multivariate Panel Models. arXiv preprint, https://arxiv.org/abs/2302.01607.

Jouni Helske and Santtu Tikka (2022). Estimating Causal Effects from Panel Data with Dynamic Multivariate Panel Models. SocArxiv preprint, https://osf.io/preprints/socarxiv/mdwu5/.

## See also

Model fitting
`dynamice()`

,
`get_priors()`

,
`update.dynamitefit()`

Model formula construction
`dynamiteformula()`

,
`lags()`

,
`lfactor()`

,
`random_spec()`

,
`splines()`

Model outputs
`as.data.frame.dynamitefit()`

,
`as.data.table.dynamitefit()`

,
`as_draws_df.dynamitefit()`

,
`coef.dynamitefit()`

,
`confint.dynamitefit()`

,
`get_code()`

,
`get_data()`

,
`get_parameter_dims()`

,
`get_parameter_names()`

,
`get_parameter_types()`

,
`ndraws.dynamitefit()`

,
`nobs.dynamitefit()`

## Examples

```
data.table::setDTthreads(1) # For CRAN
# \donttest{
# Please update your rstan and StanHeaders installation before running
# on Windows
if (!identical(.Platform$OS.type, "windows")) {
fit <- dynamite(
dformula = obs(y ~ -1 + varying(~x), family = "gaussian") +
lags(type = "varying") +
splines(df = 20),
gaussian_example,
"time",
"id",
chains = 1,
refresh = 0
)
}
# }
data.table::setDTthreads(1) # For CRAN
formula(gaussian_example_fit)
#> obs(y ~ -1 + z + varying(~x + lag(y)) + random(~1), family = "gaussian") +
#> splines(df = 20, degree = 3, lb_tau = 0, noncentered = FALSE,
#> override = FALSE) + random_spec(correlated = FALSE, noncentered = TRUE)
data.table::setDTthreads(1) # For CRAN
print(gaussian_example_fit)
#> Model:
#> Family Formula
#> y gaussian y ~ -1 + z + varying(~x + lag(y)) + random(~1)
#>
#> Correlated random effects added for response(s): y
#>
#> Data: gaussian_example (Number of observations: 1450)
#> Grouping variable: id (Number of groups: 50)
#> Time index variable: time (Number of time points: 30)
#>
#> NUTS sampler diagnostics:
#>
#> No divergences, saturated max treedepths or low E-BFMIs.
#>
#> Smallest bulk-ESS: 178 (sigma_y)
#> Smallest tail-ESS: 156 (tau_alpha_y)
#> Largest Rhat: 1.018 (beta_y_z)
#>
#> Elapsed time (seconds):
#> warmup sample
#> chain:1 6.313 3.086
#> chain:2 6.472 3.322
#>
#> Summary statistics of the time- and group-invariant parameters:
#> # A tibble: 6 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 beta_y_z 1.97 1.97 0.0112 0.0113 1.95 1.98 1.02 230. 175.
#> 2 sigma_nu_y… 0.0955 0.0940 0.0117 0.0103 0.0791 0.120 1.00 195. 157.
#> 3 sigma_y 0.198 0.198 0.00404 0.00401 0.192 0.205 0.995 178. 188.
#> 4 tau_alpha_y 0.200 0.193 0.0463 0.0440 0.139 0.283 0.999 211. 156.
#> 5 tau_y_x 0.364 0.348 0.0770 0.0746 0.253 0.497 1.01 198. 187.
#> 6 tau_y_y_la… 0.105 0.102 0.0208 0.0195 0.0764 0.140 1.01 231. 186.
data.table::setDTthreads(1) # For CRAN
summary(gaussian_example_fit,
types = "beta",
probs = c(0.05, 0.1, 0.9, 0.95)
)
#> # A tibble: 1 × 12
#> parameter mean sd q5 q10 q90 q95 time group category response
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <chr> <chr>
#> 1 beta_y_z 1.97 0.0112 1.95 1.95 1.98 1.98 NA NA NA y
#> # ℹ 1 more variable: type <chr>
```