
Estimate a Bayesian Dynamic Multivariate Panel Model
Source:R/dynamite.R
, R/print.R
, R/summary.R
dynamite.Rd
Fit a Bayesian dynamic multivariate panel model (DMPM) using Stan for
Bayesian inference. The dynamite package supports a wide range of
distributions and allows the user to flexibly customize the priors for the
model parameters. The dynamite model is specified using standard R formula
syntax via dynamiteformula()
. For more information and examples,
see 'Details' and the package vignettes.
The formula
method returns the model definition as a quoted expression.
Information on the estimated dynamite model can be obtained via
print
including the following: The model formula, the data, the smallest
effective sample sizes, largest Rhat and summary statistics of the
time-invariant model parameters.
The summary
method provides statistics of the posterior samples of the
model; this is an alias of as.data.frame.dynamitefit()
with
summary = TRUE
.
Usage
dynamite(
dformula,
data,
time,
group = NULL,
priors = NULL,
backend = "rstan",
verbose = TRUE,
verbose_stan = FALSE,
stanc_options = list("O1"),
threads_per_chain = 1L,
grainsize = NULL,
debug = NULL,
...
)
# S3 method for dynamitefit
formula(x, ...)
# S3 method for dynamitefit
print(x, ...)
# S3 method for dynamitefit
summary(object, ...)
Arguments
- dformula
[
dynamiteformula
]
The model formula. Seedynamiteformula()
and 'Details'.- data
[
data.frame
,tibble::tibble
, ordata.table::data.table
]
The data that contains the variables in the model in long format. Supported column types areinteger
,logical
,double
, andfactor
. Columns of typecharacter
will be converted to factors. Unused factor levels will be dropped. Thedata
can contain missing values which will simply be ignored in the estimation in a case-wise fashion (per time-point and per channel). Inputdata
is converted to channel specific matrix representations viastats::model.matrix.lm()
.- time
[
character(1)
]
A column name ofdata
that denotes the time index of observations. If this variable is a factor, the integer representation of its levels are used internally for defining the time indexing.- group
[
character(1)
]
A column name ofdata
that denotes the unique groups orNULL
corresponding to a scenario without any groups. Ifgroup
isNULL
, a new column.group
is created with constant value1L
is created indicating that all observations belong to the same group. In case of name conflicts withdata
, see thegroup_var
element of the return object to get the column name of the new variable.- priors
[
data.frame
]
An optional data frame with prior definitions. Seeget_priors()
and 'Details'.- backend
[
character(1)
]
Defines the backend interface to Stan, should be either"rstan"
(the default) or"cmdstanr"
. Note thatcmdstanr
needs to be installed separately as it is not on CRAN. It also needs the actualCmdStan
software. See https://mc-stan.org/cmdstanr/ for details.- verbose
[
logical(1)
]
All warnings and messages are suppressed if set toFALSE
. Defaults toTRUE
.- verbose_stan
[
logical(1)
]
This is theverbose
argument forrstan::sampling()
. Defaults toFALSE
.- stanc_options
[
list()
]
This is thestanc_options
argument passed to the compile method of aCmdStanModel
object viacmdstanr::cmdstan_model()
whenbackend = "cmdstanr"
. Defaults tolist("O1")
to enable level one compiler optimizations.- threads_per_chain
[
integer(1)
]
A Positive integer defining the number of parallel threads to use within each chain. Default is1
. Seerstan::rstan_options()
andcmdstanr::sample()
for details.- grainsize
[
integer(1)
]
A Positive integer defining the suggested size of the partial sums when using within-chain parallelization. Default is number of time points divided bythreads_per_chain
. Setting this to1
leads the workload division entirely to the internal scheduler. The performance of the within-chain parallelization can be sensitive to the choice ofgrainsize
, see Stan manual on reduce-sum for details.- debug
[
list()
]
A named list of formname = TRUE
indicating additional objects in the environment of thedynamite
function which are added to the return object. Additionally, valuesno_compile = TRUE
andno_sampling = TRUE
can be used to skip the compilation of the Stan code and sampling steps respectively. This can be useful for debugging when combined withmodel_code = TRUE
, which adds the Stan model code to the return object.- ...
For
dynamite()
, additional arguments torstan::sampling()
orcmdstanr::sample()
, such aschains
andcores
(chains
andparallel_chains
incmdstanr
). Forsummary()
, additional arguments toas.data.frame.dynamitefit()
. Forprint()
, further arguments to the print method for tibbles (see tibble::formatting). Not used forformula()
.- x
[
dynamitefit
]
The model fit object.- object
[
dynamitefit
]
The model fit object.
Value
dynamite
returns a dynamitefit
object which is a list containing
the following components:
stanfit
Astanfit
object, seerstan::sampling()
for details.dformulas
A list ofdynamiteformula
objects for internal use.data
A processed version of the inputdata
.data_name
Name of the input data object.stan
Alist
containing various elements related to Stan model construction and sampling.group_var
Name of the variable defining the groups.time_var
Name of the variable defining the time index.priors
Data frame containing the used priors.backend
Either"rstan"
or"cmdstanr"
indicating which package was used in sampling.call
Original function call as an object of classcall
.
formula
returns a quoted expression.
print
returns x
invisibly.
summary
returns a data.frame
.
Details
The best-case scalability of dynamite
in terms of data size should be
approximately linear in terms of number of time points and and number of
groups, but as wall-clock time of the MCMC algorithms provided by Stan can
depend on the discrepancy of the data and the model (and the subsequent
shape of the posterior), this can vary greatly.
References
Santtu Tikka and Jouni Helske (2023). dynamite
: An R Package for Dynamic
Multivariate Panel Models. arXiv preprint,
https://arxiv.org/abs/2302.01607.
Jouni Helske and Santtu Tikka (2022). Estimating Causal Effects from Panel Data with Dynamic Multivariate Panel Models. SocArxiv preprint, https://osf.io/preprints/socarxiv/mdwu5/.
See also
Model fitting
get_priors()
,
update.dynamitefit()
Model formula construction
dynamiteformula()
,
lags()
,
lfactor()
,
random_spec()
,
splines()
Model outputs
as.data.frame.dynamitefit()
,
as.data.table.dynamitefit()
,
as_draws_df.dynamitefit()
,
coef.dynamitefit()
,
confint.dynamitefit()
,
get_code()
,
get_data()
,
get_parameter_dims()
,
get_parameter_names()
,
get_parameter_types()
,
ndraws.dynamitefit()
,
nobs.dynamitefit()
Examples
# \donttest{
# Please update your rstan and StanHeaders installation before running
# on Windows
if (!identical(.Platform$OS.type, "windows")) {
fit <- dynamite(
dformula = obs(y ~ -1 + varying(~x), family = "gaussian") +
lags(type = "varying") +
splines(df = 20),
gaussian_example,
"time",
"id",
chains = 1,
refresh = 0
)
}
# }
formula(gaussian_example_fit)
#> obs(y ~ -1 + z + varying(~x + lag(y)) + random(~1), family = "gaussian") +
#> splines(df = 20, degree = 3, lb_tau = 0, noncentered = FALSE,
#> override = FALSE) + random_spec(correlated = FALSE, noncentered = TRUE)
print(gaussian_example_fit)
#> Model:
#> Family Formula
#> y gaussian y ~ -1 + z + varying(~x + lag(y)) + random(~1)
#>
#> Correlated random effects added for response(s): y
#>
#> Data: gaussian_example (Number of observations: 1450)
#> Grouping variable: id (Number of groups: 50)
#> Time index variable: time (Number of time points: 30)
#>
#> Smallest bulk-ESS: 40 (nu_y_alpha_id50)
#> Smallest tail-ESS: 54 (alpha_y[10])
#> Largest Rhat: 1.062 (nu_y_alpha_id50)
#>
#> Elapsed time (seconds):
#> warmup sample
#> chain:1 8.380 3.047
#> chain:2 8.248 4.767
#>
#> Summary statistics of the time- and group-invariant parameters:
#> # A tibble: 6 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1 beta_y_z 1.97 1.97 0.0118 0.0118 1.95 1.99 1.01 192. 156.
#> 2 sigma_nu_y… 0.0933 0.0941 0.0100 0.00973 0.0771 0.110 0.996 240. 231.
#> 3 sigma_y 0.197 0.198 0.00375 0.00365 0.191 0.203 1.01 223. 181.
#> 4 tau_alpha_y 0.213 0.208 0.0445 0.0443 0.154 0.297 0.997 199. 184.
#> 5 tau_y_x 0.358 0.349 0.0658 0.0646 0.267 0.469 0.999 209. 186.
#> 6 tau_y_y_la… 0.104 0.101 0.0180 0.0179 0.0780 0.134 1.00 189. 166.
summary(gaussian_example_fit,
types = "beta",
probs = c(0.05, 0.1, 0.9, 0.95)
)
#> # A tibble: 1 × 12
#> parameter mean sd q5 q10 q90 q95 time group category response
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <chr> <chr>
#> 1 beta_y_z 1.97 0.0118 1.95 1.95 1.98 1.99 NA NA NA y
#> # ℹ 1 more variable: type <chr>