
Estimate a Bayesian Dynamic Multivariate Panel Model
Source:R/dynamite.R, R/print.R, R/summary.R
dynamite.RdFit a Bayesian dynamic multivariate panel model (DMPM) using Stan for
Bayesian inference. The dynamite package supports a wide range of
distributions and allows the user to flexibly customize the priors for the
model parameters. The dynamite model is specified using standard R formula
syntax via dynamiteformula(). For more information and examples,
see 'Details' and the package vignettes.
The formula method returns the model definition as a quoted expression.
Information on the estimated dynamite model can be obtained via
print() including the following: The model formula, the data,
the smallest effective sample sizes, largest Rhat and summary statistics of
the time-invariant and group-invariant model parameters.
The summary() method provides statistics of the posterior samples of the
model; this is an alias of as.data.frame.dynamitefit() with
summary = TRUE.
Usage
dynamite(
dformula,
data,
time,
group = NULL,
priors = NULL,
backend = "cmdstanr",
verbose = TRUE,
verbose_stan = FALSE,
stanc_options = list("O0"),
threads_per_chain = 1L,
grainsize = NULL,
custom_stan_model = NULL,
debug = NULL,
interval = 1L,
...
)
# S3 method for class 'dynamitefit'
formula(x, ...)
# S3 method for class 'dynamitefit'
print(x, full_diagnostics = FALSE, ...)
# S3 method for class 'dynamitefit'
summary(object, ...)Arguments
- dformula
[
dynamiteformula]
The model formula. Seedynamiteformula()and 'Details'.- data
[
data.frame,tibble::tibble, ordata.table::data.table]
The data that contains the variables in the model in long format. Supported column types areinteger,logical,double, andfactor. Columns of typecharacterwill be converted to factors. Unused factor levels will be dropped. Thedatacan contain missing values which will simply be ignored in the estimation in a case-wise fashion (per time-point and per channel). Inputdatais converted to channel specific matrix representations viastats::model.matrix.lm().- time
[
character(1)]
A column name ofdatathat denotes the time index of observations. If this variable is a factor, the integer representation of its levels are used internally for defining the time indexing.- group
[
character(1)]
A column name ofdatathat denotes the unique groups orNULLcorresponding to a scenario without any groups. IfgroupisNULL, a new column.groupis created with constant value1Lis created indicating that all observations belong to the same group. In case of name conflicts withdata, see thegroup_varelement of the return object to get the column name of the new variable.- priors
[
data.frame]
An optional data frame with prior definitions. Seeget_priors()and 'Details'.- backend
[
character(1)]
Defines the backend interface to Stan, should be either"cmdstanr"(the default) or"rstan". Note thatcmdstanrneeds to be installed separately as it is not on CRAN. It also needs the actualCmdStansoftware. See https://mc-stan.org/cmdstanr/ for details. Defaults to"rstan"if"cmdstanr"cannot be used.- verbose
[
logical(1)]
All warnings and messages are suppressed if set toFALSE. Defaults toTRUE. Setting this toFALSEwill also disable checks for perfect collinearity in the model matrix.- verbose_stan
[
logical(1)]
This is theverboseargument forrstan::sampling(). Defaults toFALSE.- stanc_options
[
list()]
This is thestanc_optionsargument passed to the compile method of aCmdStanModelobject viacmdstan_model()whenbackend = "cmdstanr". Defaults tolist("O0"). To enable level one compiler optimizations, uselist("O1"). See https://mc-stan.org/cmdstanr/reference/cmdstan_model.html for details.- threads_per_chain
[
integer(1)]
A Positive integer defining the number of parallel threads to use within each chain. Default is1. Seerstan::rstan_options()and https://mc-stan.org/cmdstanr/reference/model-method-sample.html for details.- grainsize
[
integer(1)]
A positive integer defining the suggested size of the partial sums when using within-chain parallelization. Default is number of time points divided bythreads_per_chain. Setting this to1leads the workload division entirely to the internal scheduler. The performance of the within-chain parallelization can be sensitive to the choice ofgrainsize, see Stan manual on reduce-sum for details.- custom_stan_model
[
character(1)]
An optional character string that either contains a customized Stan model code or a path to a.stanfile that contains the code. Using this will override the generated model code. For expert users only.- debug
[
list()]
A named list of formname = TRUEindicating additional objects in the environment of thedynamitefunction which are added to the return object. Additionally, valuesno_compile = TRUEandno_sampling = TRUEcan be used to skip the compilation of the Stan code and sampling steps respectively. This can be useful for debugging when combined withmodel_code = TRUE, which adds the Stan model code to the return object.- interval
[
integer(1)]
This arguments acts as an offset for the evaluation of lagged observations when measurements are not available at every time point. For example, if measurements are only available at every second time point, settinginterval = 2means that a lag of orderkwill instead use the observation at2 * ktime units in the past. The default value is1meaning that there is a one-to-one correspondence between the lag order and the time scale. For expert users only.- ...
For
dynamite(), additional arguments torstan::sampling()or the$sample()method of theCmdStanModelobject (see https://mc-stan.org/cmdstanr/reference/model-method-sample.html), such aschainsandcores(chainsandparallel_chainsincmdstanr). Forsummary(), additional arguments toas.data.frame.dynamitefit(). Forprint(), further arguments to the print method for tibbles (see tibble::formatting). Not used forformula().- x
[
dynamitefit]
The model fit object.- full_diagnostics
By default, the effective sample size (ESS) and Rhat are computed only for the time- and group-invariant parameters (
full_diagnostics = FALSE). Setting this toTRUEcomputes ESS and Rhat values for all model parameters, which can take some time for complex models.- object
[
dynamitefit]
The model fit object.
Value
dynamite returns a dynamitefit object which is a list containing
the following components:
stanfit
Astanfitobject, seerstan::sampling()for details.dformulas
A list ofdynamiteformulaobjects for internal use.data
A processed version of the inputdata.data_name
Name of the input data object.stan
Alistcontaining various elements related to Stan model construction and sampling.group_var
Name of the variable defining the groups.time_var
Name of the variable defining the time index.priors
Data frame containing the used priors.backend
Either"rstan"or"cmdstanr"indicating which package was used in sampling.permutation
Randomized permutation of the posterior draws.call
Original function call as an object of classcall.
formula returns a quoted expression.
print returns x invisibly.
summary returns a data.frame.
Details
The best-case scalability of dynamite in terms of data size should be
approximately linear in terms of number of time points and and number of
groups, but as wall-clock time of the MCMC algorithms provided by Stan can
depend on the discrepancy of the data and the model (and the subsequent
shape of the posterior), this can vary greatly.
References
Santtu Tikka and Jouni Helske (2024). dynamite: An R Package for Dynamic Multivariate Panel Models. arXiv preprint, doi:10.48550/arXiv.2302.01607.
Jouni Helske and Santtu Tikka (2022). Estimating Causal Effects from Panel Data with Dynamic Multivariate Panel Models. Advances in Life Course Research, 60, 100617. doi:10.1016/j.alcr.2024.100617.
See also
Model fitting
dynamice(),
get_priors(),
update.dynamitefit()
Model formula construction
dynamiteformula(),
lags(),
lfactor(),
random_spec(),
splines()
Model outputs
as.data.frame.dynamitefit(),
as.data.table.dynamitefit(),
as_draws_df.dynamitefit(),
coef.dynamitefit(),
confint.dynamitefit(),
get_code(),
get_data(),
get_parameter_dims(),
get_parameter_names(),
get_parameter_types(),
ndraws.dynamitefit(),
nobs.dynamitefit()
Examples
data.table::setDTthreads(1) # For CRAN
# \donttest{
# Please update your rstan and StanHeaders installation before running
# on Windows
if (!identical(.Platform$OS.type, "windows")) {
fit <- dynamite(
dformula = obs(y ~ -1 + varying(~x), family = "gaussian") +
lags(type = "varying") +
splines(df = 20),
gaussian_example,
"time",
"id",
chains = 1,
refresh = 0
)
}
#> CmdStan path has not been set yet. See ?set_cmdstan_path.
#> ℹ Switching to rstan backend.
#> Error in rstan::stan_model(model_code = model_code): Boost not found; call install.packages('BH')
# }
data.table::setDTthreads(1) # For CRAN
formula(gaussian_example_fit)
#> obs(y ~ -1 + z + varying(~x + lag(y)) + random(~1), family = "gaussian") +
#> splines(df = 20, degree = 3, lb_tau = 0, noncentered = FALSE,
#> override = FALSE) + random_spec(correlated = FALSE, noncentered = TRUE)
data.table::setDTthreads(1) # For CRAN
print(gaussian_example_fit)
#> Model:
#> Family Formula
#> y gaussian y ~ -1 + z + varying(~x + lag(y)) + random(~1)
#>
#> Correlated random effects added for response(s): y
#>
#> Data: gaussian_example (Number of observations: 1450)
#> Grouping variable: id (Number of groups: 50)
#> Time index variable: time (Number of time points: 30)
#>
#> NUTS sampler diagnostics:
#>
#> No divergences, saturated max treedepths or low E-BFMIs.
#>
#> Smallest bulk-ESS: 72 (alpha_y[28])
#> Smallest tail-ESS: 81 (omega_alpha_y_d3)
#> Largest Rhat: 1.035 (delta_y_y_lag1[28])
#>
#> Elapsed time (seconds):
#> warmup sample
#> chain:1 6.996 4.193
#> chain:2 7.302 4.083
#>
#> Summary statistics of the time- and group-invariant parameters:
#> # A tibble: 113 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 alpha_y[2] 0.0579 0.0594 0.0301 0.0318 0.00740 0.102 1.01 144. 120.
#> 2 alpha_y[3] 0.0973 0.0955 0.0452 0.0478 0.0268 0.170 1.01 225. 191.
#> 3 alpha_y[4] 0.169 0.168 0.0407 0.0416 0.106 0.234 1.02 202. 188.
#> 4 alpha_y[5] 0.264 0.263 0.0410 0.0429 0.202 0.329 0.998 285. 218.
#> 5 alpha_y[6] 0.303 0.300 0.0392 0.0391 0.245 0.374 1.01 278. 154.
#> 6 alpha_y[7] 0.332 0.335 0.0384 0.0397 0.265 0.390 1.01 223. 114.
#> 7 alpha_y[8] 0.422 0.423 0.0348 0.0302 0.365 0.482 1.00 247. 163.
#> 8 alpha_y[9] 0.459 0.456 0.0382 0.0381 0.390 0.520 0.997 207. 220.
#> 9 alpha_y[10] 0.414 0.414 0.0433 0.0456 0.350 0.494 1.02 126. 165.
#> 10 alpha_y[11] 0.405 0.407 0.0412 0.0433 0.340 0.479 1.00 196. 231.
#> # ℹ 103 more rows
data.table::setDTthreads(1) # For CRAN
summary(gaussian_example_fit,
types = "beta",
probs = c(0.05, 0.1, 0.9, 0.95)
)
#> # A tibble: 1 × 12
#> parameter mean sd q5 q10 q90 q95 time group category response
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <chr> <chr>
#> 1 beta_y_z 1.97 0.0122 1.95 1.95 1.98 1.99 NA NA NA y
#> # ℹ 1 more variable: type <chr>