Estimate a Bayesian Dynamic Multivariate Panel Model
Source:R/dynamite.R
, R/print.R
, R/summary.R
dynamite.Rd
Fit a Bayesian dynamic multivariate panel model (DMPM) using Stan for
Bayesian inference. The dynamite package supports a wide range of
distributions and allows the user to flexibly customize the priors for the
model parameters. The dynamite model is specified using standard R formula
syntax via dynamiteformula()
. For more information and examples,
see 'Details' and the package vignettes.
The formula
method returns the model definition as a quoted expression.
Information on the estimated dynamite
model can be obtained via
print()
including the following: The model formula, the data,
the smallest effective sample sizes, largest Rhat and summary statistics of
the time-invariant and group-invariant model parameters.
The summary()
method provides statistics of the posterior samples of the
model; this is an alias of as.data.frame.dynamitefit()
with
summary = TRUE
.
Usage
dynamite(
dformula,
data,
time,
group = NULL,
priors = NULL,
backend = "rstan",
verbose = TRUE,
verbose_stan = FALSE,
stanc_options = list("O0"),
threads_per_chain = 1L,
grainsize = NULL,
custom_stan_model = NULL,
debug = NULL,
...
)
# S3 method for class 'dynamitefit'
formula(x, ...)
# S3 method for class 'dynamitefit'
print(x, full_diagnostics = FALSE, ...)
# S3 method for class 'dynamitefit'
summary(object, ...)
Arguments
- dformula
[
dynamiteformula
]
The model formula. Seedynamiteformula()
and 'Details'.- data
[
data.frame
,tibble::tibble
, ordata.table::data.table
]
The data that contains the variables in the model in long format. Supported column types areinteger
,logical
,double
, andfactor
. Columns of typecharacter
will be converted to factors. Unused factor levels will be dropped. Thedata
can contain missing values which will simply be ignored in the estimation in a case-wise fashion (per time-point and per channel). Inputdata
is converted to channel specific matrix representations viastats::model.matrix.lm()
.- time
[
character(1)
]
A column name ofdata
that denotes the time index of observations. If this variable is a factor, the integer representation of its levels are used internally for defining the time indexing.- group
[
character(1)
]
A column name ofdata
that denotes the unique groups orNULL
corresponding to a scenario without any groups. Ifgroup
isNULL
, a new column.group
is created with constant value1L
is created indicating that all observations belong to the same group. In case of name conflicts withdata
, see thegroup_var
element of the return object to get the column name of the new variable.- priors
[
data.frame
]
An optional data frame with prior definitions. Seeget_priors()
and 'Details'.- backend
[
character(1)
]
Defines the backend interface to Stan, should be either"rstan"
(the default) or"cmdstanr"
. Note thatcmdstanr
needs to be installed separately as it is not on CRAN. It also needs the actualCmdStan
software. See https://mc-stan.org/cmdstanr/ for details.- verbose
[
logical(1)
]
All warnings and messages are suppressed if set toFALSE
. Defaults toTRUE
. Setting this toFALSE
will also disable checks for perfect collinearity in the model matrix.- verbose_stan
[
logical(1)
]
This is theverbose
argument forrstan::sampling()
. Defaults toFALSE
.- stanc_options
[
list()
]
This is thestanc_options
argument passed to the compile method of aCmdStanModel
object viacmdstan_model()
whenbackend = "cmdstanr"
. Defaults tolist("O0")
. To enable level one compiler optimizations, uselist("O1")
. See https://mc-stan.org/cmdstanr/reference/cmdstan_model.html for details.- threads_per_chain
[
integer(1)
]
A Positive integer defining the number of parallel threads to use within each chain. Default is1
. Seerstan::rstan_options()
andcmdstanr::sample()
for details.- grainsize
[
integer(1)
]
A positive integer defining the suggested size of the partial sums when using within-chain parallelization. Default is number of time points divided bythreads_per_chain
. Setting this to1
leads the workload division entirely to the internal scheduler. The performance of the within-chain parallelization can be sensitive to the choice ofgrainsize
, see Stan manual on reduce-sum for details.- custom_stan_model
[
character(1)
]
An optional character string that either contains a customized Stan model code or a path to a.stan
file that contains the code. Using this will override the generated model code. For expert users only.- debug
[
list()
]
A named list of formname = TRUE
indicating additional objects in the environment of thedynamite
function which are added to the return object. Additionally, valuesno_compile = TRUE
andno_sampling = TRUE
can be used to skip the compilation of the Stan code and sampling steps respectively. This can be useful for debugging when combined withmodel_code = TRUE
, which adds the Stan model code to the return object.- ...
For
dynamite()
, additional arguments torstan::sampling()
or the$sample()
method of theCmdStanModel
object (see https://mc-stan.org/cmdstanr/reference/model-method-sample.html), such aschains
andcores
(chains
andparallel_chains
incmdstanr
). Forsummary()
, additional arguments toas.data.frame.dynamitefit()
. Forprint()
, further arguments to the print method for tibbles (see tibble::formatting). Not used forformula()
.- x
[
dynamitefit
]
The model fit object.- full_diagnostics
By default, the effective sample size (ESS) and Rhat are computed only for the time- and group-invariant parameters (
full_diagnostics = FALSE
). Setting this toTRUE
computes ESS and Rhat values for all model parameters, which can take some time for complex models.- object
[
dynamitefit
]
The model fit object.
Value
dynamite
returns a dynamitefit
object which is a list containing
the following components:
stanfit
Astanfit
object, seerstan::sampling()
for details.dformulas
A list ofdynamiteformula
objects for internal use.data
A processed version of the inputdata
.data_name
Name of the input data object.stan
Alist
containing various elements related to Stan model construction and sampling.group_var
Name of the variable defining the groups.time_var
Name of the variable defining the time index.priors
Data frame containing the used priors.backend
Either"rstan"
or"cmdstanr"
indicating which package was used in sampling.permutation
Randomized permutation of the posterior draws.call
Original function call as an object of classcall
.
formula
returns a quoted expression.
print
returns x
invisibly.
summary
returns a data.frame
.
Details
The best-case scalability of dynamite
in terms of data size should be
approximately linear in terms of number of time points and and number of
groups, but as wall-clock time of the MCMC algorithms provided by Stan can
depend on the discrepancy of the data and the model (and the subsequent
shape of the posterior), this can vary greatly.
References
Santtu Tikka and Jouni Helske (2023). dynamite: An R Package for Dynamic Multivariate Panel Models. arXiv preprint, https://arxiv.org/abs/2302.01607.
Jouni Helske and Santtu Tikka (2022). Estimating Causal Effects from Panel Data with Dynamic Multivariate Panel Models. SocArxiv preprint, https://osf.io/preprints/socarxiv/mdwu5/.
See also
Model fitting
dynamice()
,
get_priors()
,
update.dynamitefit()
Model formula construction
dynamiteformula()
,
lags()
,
lfactor()
,
random_spec()
,
splines()
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
Examples
data.table::setDTthreads(1) # For CRAN
# \donttest{
# Please update your rstan and StanHeaders installation before running
# on Windows
if (!identical(.Platform$OS.type, "windows")) {
fit <- dynamite(
dformula = obs(y ~ -1 + varying(~x), family = "gaussian") +
lags(type = "varying") +
splines(df = 20),
gaussian_example,
"time",
"id",
chains = 1,
refresh = 0
)
}
# }
data.table::setDTthreads(1) # For CRAN
formula(gaussian_example_fit)
#> obs(y ~ -1 + z + varying(~x + lag(y)) + random(~1), family = "gaussian") +
#> splines(df = 20, degree = 3, lb_tau = 0, noncentered = FALSE,
#> override = FALSE) + random_spec(correlated = FALSE, noncentered = TRUE)
data.table::setDTthreads(1) # For CRAN
print(gaussian_example_fit)
#> Model:
#> Family Formula
#> y gaussian y ~ -1 + z + varying(~x + lag(y)) + random(~1)
#>
#> Correlated random effects added for response(s): y
#>
#> Data: gaussian_example (Number of observations: 1450)
#> Grouping variable: id (Number of groups: 50)
#> Time index variable: time (Number of time points: 30)
#>
#> NUTS sampler diagnostics:
#>
#> No divergences, saturated max treedepths or low E-BFMIs.
#>
#> Smallest bulk-ESS: 178 (sigma_y)
#> Smallest tail-ESS: 156 (tau_alpha_y)
#> Largest Rhat: 1.018 (beta_y_z)
#>
#> Elapsed time (seconds):
#> warmup sample
#> chain:1 6.313 3.086
#> chain:2 6.472 3.322
#>
#> Summary statistics of the time- and group-invariant parameters:
#> # A tibble: 6 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 beta_y_z 1.97 1.97 0.0112 0.0113 1.95 1.98 1.02 230. 175.
#> 2 sigma_nu_y… 0.0955 0.0940 0.0117 0.0103 0.0791 0.120 1.00 195. 157.
#> 3 sigma_y 0.198 0.198 0.00404 0.00401 0.192 0.205 0.995 178. 188.
#> 4 tau_alpha_y 0.200 0.193 0.0463 0.0440 0.139 0.283 0.999 211. 156.
#> 5 tau_y_x 0.364 0.348 0.0770 0.0746 0.253 0.497 1.01 198. 187.
#> 6 tau_y_y_la… 0.105 0.102 0.0208 0.0195 0.0764 0.140 1.01 231. 186.
data.table::setDTthreads(1) # For CRAN
summary(gaussian_example_fit,
types = "beta",
probs = c(0.05, 0.1, 0.9, 0.95)
)
#> # A tibble: 1 × 12
#> parameter mean sd q5 q10 q90 q95 time group category response
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <chr> <chr>
#> 1 beta_y_z 1.97 0.0112 1.95 1.95 1.98 1.98 NA NA NA y
#> # ℹ 1 more variable: type <chr>