# Estimate a Bayesian Dynamic Multivariate Panel Model

Source:`R/dynamite.R`

, `R/print.R`

, `R/summary.R`

`dynamite.Rd`

Fit a Bayesian dynamic multivariate panel model (DMPM) using Stan for
Bayesian inference. The dynamite package supports a wide range of
distributions and allows the user to flexibly customize the priors for the
model parameters. The dynamite model is specified using standard R formula
syntax via `dynamiteformula()`

. For more information and examples,
see 'Details' and the package vignettes.

The `formula`

method returns the model definition as a quoted expression.

Information on the estimated dynamite model can be obtained via
`print`

including the following: The model formula, the data, the smallest
effective sample sizes, largest Rhat and summary statistics of the
time-invariant model parameters.

The `summary`

method provides statistics of the posterior samples of the
model; this is an alias of `as.data.frame.dynamitefit()`

with
`summary = TRUE`

.

## Usage

```
dynamite(
dformula,
data,
time,
group = NULL,
priors = NULL,
backend = "rstan",
verbose = TRUE,
verbose_stan = FALSE,
stanc_options = list("O1"),
threads_per_chain = 1L,
grainsize = NULL,
debug = NULL,
...
)
# S3 method for dynamitefit
formula(x, ...)
# S3 method for dynamitefit
print(x, ...)
# S3 method for dynamitefit
summary(object, ...)
```

## Arguments

- dformula
[

`dynamiteformula`

]

The model formula. See`dynamiteformula()`

and 'Details'.- data
[

`data.frame`

,`tibble::tibble`

, or`data.table::data.table`

]

The data that contains the variables in the model in long format. Supported column types are`integer`

,`logical`

,`double`

, and`factor`

. Columns of type`character`

will be converted to factors. Unused factor levels will be dropped. The`data`

can contain missing values which will simply be ignored in the estimation in a case-wise fashion (per time-point and per channel). Input`data`

is converted to channel specific matrix representations via`stats::model.matrix.lm()`

.- time
[

`character(1)`

]

A column name of`data`

that denotes the time index of observations. If this variable is a factor, the integer representation of its levels are used internally for defining the time indexing.- group
[

`character(1)`

]

A column name of`data`

that denotes the unique groups or`NULL`

corresponding to a scenario without any groups. If`group`

is`NULL`

, a new column`.group`

is created with constant value`1L`

is created indicating that all observations belong to the same group. In case of name conflicts with`data`

, see the`group_var`

element of the return object to get the column name of the new variable.- priors
[

`data.frame`

]

An optional data frame with prior definitions. See`get_priors()`

and 'Details'.- backend
[

`character(1)`

]

Defines the backend interface to Stan, should be either`"rstan"`

(the default) or`"cmdstanr"`

. Note that`cmdstanr`

needs to be installed separately as it is not on CRAN. It also needs the actual`CmdStan`

software. See https://mc-stan.org/cmdstanr/ for details.- verbose
[

`logical(1)`

]

All warnings and messages are suppressed if set to`FALSE`

. Defaults to`TRUE`

.- verbose_stan
[

`logical(1)`

]

This is the`verbose`

argument for`rstan::sampling()`

. Defaults to`FALSE`

.- stanc_options
[

`list()`

]

This is the`stanc_options`

argument passed to the compile method of a`CmdStanModel`

object via`cmdstanr::cmdstan_model()`

when`backend = "cmdstanr"`

. Defaults to`list("O1")`

to enable level one compiler optimizations.- threads_per_chain
[

`integer(1)`

]

A Positive integer defining the number of parallel threads to use within each chain. Default is`1`

. See`rstan::rstan_options()`

and`cmdstanr::sample()`

for details.- grainsize
[

`integer(1)`

]

A Positive integer defining the suggested size of the partial sums when using within-chain parallelization. Default is number of time points divided by`threads_per_chain`

. Setting this to`1`

leads the workload division entirely to the internal scheduler. The performance of the within-chain parallelization can be sensitive to the choice of`grainsize`

, see Stan manual on reduce-sum for details.- debug
[

`list()`

]

A named list of form`name = TRUE`

indicating additional objects in the environment of the`dynamite`

function which are added to the return object. Additionally, values`no_compile = TRUE`

and`no_sampling = TRUE`

can be used to skip the compilation of the Stan code and sampling steps respectively. This can be useful for debugging when combined with`model_code = TRUE`

, which adds the Stan model code to the return object.- ...
For

`dynamite()`

, additional arguments to`rstan::sampling()`

or`cmdstanr::sample()`

, such as`chains`

and`cores`

(`chains`

and`parallel_chains`

in`cmdstanr`

). For`summary()`

, additional arguments to`as.data.frame.dynamitefit()`

. For`print()`

, further arguments to the print method for tibbles (see tibble::formatting). Not used for`formula()`

.- x
[

`dynamitefit`

]

The model fit object.- object
[

`dynamitefit`

]

The model fit object.

## Value

`dynamite`

returns a `dynamitefit`

object which is a list containing
the following components:

`stanfit`

A`stanfit`

object, see`rstan::sampling()`

for details.`dformulas`

A list of`dynamiteformula`

objects for internal use.`data`

A processed version of the input`data`

.`data_name`

Name of the input data object.`stan`

A`list`

containing various elements related to Stan model construction and sampling.`group_var`

Name of the variable defining the groups.`time_var`

Name of the variable defining the time index.`priors`

Data frame containing the used priors.`backend`

Either`"rstan"`

or`"cmdstanr"`

indicating which package was used in sampling.`call`

Original function call as an object of class`call`

.

`formula`

returns a quoted expression.

`print`

returns `x`

invisibly.

`summary`

returns a `data.frame`

.

## Details

The best-case scalability of `dynamite`

in terms of data size should be
approximately linear in terms of number of time points and and number of
groups, but as wall-clock time of the MCMC algorithms provided by Stan can
depend on the discrepancy of the data and the model (and the subsequent
shape of the posterior), this can vary greatly.

## References

Santtu Tikka and Jouni Helske (2023). `dynamite`

: An R Package for Dynamic
Multivariate Panel Models. arXiv preprint,
https://arxiv.org/abs/2302.01607.

Jouni Helske and Santtu Tikka (2022). Estimating Causal Effects from Panel Data with Dynamic Multivariate Panel Models. SocArxiv preprint, https://osf.io/preprints/socarxiv/mdwu5/.

## See also

Model fitting
`get_priors()`

,
`update.dynamitefit()`

Model formula construction
`dynamiteformula()`

,
`lags()`

,
`lfactor()`

,
`random_spec()`

,
`splines()`

Model outputs
`as.data.frame.dynamitefit()`

,
`as.data.table.dynamitefit()`

,
`as_draws_df.dynamitefit()`

,
`coef.dynamitefit()`

,
`confint.dynamitefit()`

,
`get_code()`

,
`get_data()`

,
`get_parameter_dims()`

,
`get_parameter_names()`

,
`get_parameter_types()`

,
`ndraws.dynamitefit()`

,
`nobs.dynamitefit()`

## Examples

```
# \donttest{
# Please update your rstan and StanHeaders installation before running
# on Windows
if (!identical(.Platform$OS.type, "windows")) {
fit <- dynamite(
dformula = obs(y ~ -1 + varying(~x), family = "gaussian") +
lags(type = "varying") +
splines(df = 20),
gaussian_example,
"time",
"id",
chains = 1,
refresh = 0
)
}
# }
formula(gaussian_example_fit)
#> obs(y ~ -1 + z + varying(~x + lag(y)) + random(~1), family = "gaussian") +
#> splines(df = 20, degree = 3, lb_tau = 0, noncentered = FALSE,
#> override = FALSE) + random_spec(correlated = FALSE, noncentered = TRUE)
print(gaussian_example_fit)
#> Model:
#> Family Formula
#> y gaussian y ~ -1 + z + varying(~x + lag(y)) + random(~1)
#>
#> Correlated random effects added for response(s): y
#>
#> Data: gaussian_example (Number of observations: 1450)
#> Grouping variable: id (Number of groups: 50)
#> Time index variable: time (Number of time points: 30)
#>
#> Smallest bulk-ESS: 40 (nu_y_alpha_id50)
#> Smallest tail-ESS: 54 (alpha_y[10])
#> Largest Rhat: 1.062 (nu_y_alpha_id50)
#>
#> Elapsed time (seconds):
#> warmup sample
#> chain:1 8.380 3.047
#> chain:2 8.248 4.767
#>
#> Summary statistics of the time- and group-invariant parameters:
#> # A tibble: 6 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1 beta_y_z 1.97 1.97 0.0118 0.0118 1.95 1.99 1.01 192. 156.
#> 2 sigma_nu_y… 0.0933 0.0941 0.0100 0.00973 0.0771 0.110 0.996 240. 231.
#> 3 sigma_y 0.197 0.198 0.00375 0.00365 0.191 0.203 1.01 223. 181.
#> 4 tau_alpha_y 0.213 0.208 0.0445 0.0443 0.154 0.297 0.997 199. 184.
#> 5 tau_y_x 0.358 0.349 0.0658 0.0646 0.267 0.469 0.999 209. 186.
#> 6 tau_y_y_la… 0.104 0.101 0.0180 0.0179 0.0780 0.134 1.00 189. 166.
summary(gaussian_example_fit,
types = "beta",
probs = c(0.05, 0.1, 0.9, 0.95)
)
#> # A tibble: 1 × 12
#> parameter mean sd q5 q10 q90 q95 time group category response
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <chr> <chr>
#> 1 beta_y_z 1.97 0.0118 1.95 1.95 1.98 1.99 NA NA NA y
#> # ℹ 1 more variable: type <chr>
```