Go faster
Analyses can slow to a crawl when models need hours to run. In this
article you will find a few tricks to prevent this bottleneck when using
orsf().
Don’t specify a control
The default control for orsf() is
NULL because, if unspecified, orsf() will pick
the fastest possible control for you depending on the type
of forest being grown. The difference in run-time between the
default control and other approaches can be striking. For example:
time_fast <- system.time(
  expr = orsf(pbc_orsf,
              formula = time+status~. -id,
              n_tree = 5)
)
time_net <- system.time(
  expr = orsf(pbc_orsf,
              formula = time+status~. -id,
              control = orsf_control_survival(method = 'net'),
              n_tree = 5)
)
# unspecified control is much faster
time_net['elapsed'] / time_fast['elapsed']
#>  elapsed
#> 45.59091
Use n_thread
The n_thread argument uses multi-threading to run
aorsf functions in parallel when possible. If you know how
many threads you want, e.g. you want exactly 5, set
n_thread = 5. If you aren’t sure how many threads you have
available but want to use a feasible number,
n_thread = 0 (the default) tells aorsf to choose
one for you.
# automatically pick number of threads based on amount available
orsf(pbc_orsf,
     formula = time+status~. -id,
     n_tree = 5,
     n_thread = 0)
Note: sometimes multi-threading is not possible. For example, because
R is a single-threaded language, multi-threading cannot be applied when
orsf() needs to call R functions from C++, which occurs
when a customized R function is used to find linear combinations of
variables or to compute prediction accuracy.
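To see how much multi-threading helps on a given machine, one option is to time the same forest with a single thread and with the automatic setting. The sketch below is illustrative only: it assumes aorsf is installed, uses n_tree = 500 as an arbitrary choice, and the ratio it prints will depend on your hardware.

```r
library(aorsf)

# number of logical cores R can detect (may be NA on some platforms)
parallel::detectCores()

# grow the same forest with one thread ...
time_one <- system.time(
  orsf(pbc_orsf,
       formula = time + status ~ . - id,
       n_tree = 500,
       n_thread = 1)
)

# ... and with the number of threads chosen automatically
time_auto <- system.time(
  orsf(pbc_orsf,
       formula = time + status ~ . - id,
       n_tree = 500,
       n_thread = 0)
)

# values above 1 mean the automatic setting was faster
time_one['elapsed'] / time_auto['elapsed']
```

On a small data set such as pbc_orsf the gain may be modest, since each tree is already quick to grow.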
Do less
There are some inputs in orsf() that can be adjusted to
make it run faster:
- set n_retry to 0
- set oobag_pred_type to 'none'
- set importance to 'none'
- increase split_min_events, split_min_obs, leaf_min_events, or leaf_min_obs to make trees stop growing sooner
- increase split_min_stat to enforce more strict requirements for growing deeper trees.
Applying these tips:
orsf(pbc_orsf,
     formula = time+status~. -id,
     n_thread = 0,
     n_tree = 5,
     n_retry = 0,
     oobag_pred_type = 'none',
     importance = 'none',
     split_min_events = 20,
     leaf_min_events = 10,
     split_min_stat = 10)
While modifying these inputs can make orsf() run faster,
they can also impact prediction accuracy.
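One way to check whether the faster settings cost accuracy is to compare both specifications on held-out data. The sketch below is a rough illustration, not a recommended validation scheme: the 200-row training split, the seed, n_tree = 100, the prediction horizon of 1000 days, and the helper c_index() are all assumptions made for this example, and it relies on survival::concordance() to compute Harrell's C-statistic.

```r
library(aorsf)
library(survival)

set.seed(329)

# hold out part of pbc_orsf for testing (split size is arbitrary)
train_rows <- sample(nrow(pbc_orsf), size = 200)
pbc_train <- pbc_orsf[train_rows, ]
pbc_test  <- pbc_orsf[-train_rows, ]

# default settings
fit_default <- orsf(pbc_train,
                    formula = time + status ~ . - id,
                    n_tree = 100)

# settings from the tips above
fit_fast <- orsf(pbc_train,
                 formula = time + status ~ . - id,
                 n_tree = 100,
                 n_retry = 0,
                 oobag_pred_type = 'none',
                 importance = 'none',
                 split_min_events = 20,
                 leaf_min_events = 10,
                 split_min_stat = 10)

# hypothetical helper: C-statistic of predicted risk on the test rows
c_index <- function(fit) {
  risk <- predict(fit,
                  new_data = pbc_test,
                  pred_type = 'risk',
                  pred_horizon = 1000)
  # reverse = TRUE because higher risk should mean shorter survival
  concordance(Surv(time, status) ~ as.numeric(risk),
              data = pbc_test,
              reverse = TRUE)$concordance
}

c(default = c_index(fit_default), fast = c_index(fit_fast))
```

A single split like this is noisy, so small differences between the two values should not be over-interpreted.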
Show progress
Setting verbose_progress = TRUE doesn’t make anything
run faster, but it can make the wait feel shorter.
verbose_fit <- orsf(pbc_orsf,
                    formula = time+status~. -id,
                    n_tree = 5,
                    verbose_progress = TRUE)
#> Growing trees: 100%.
#> Computing predictions: 100%.
Don’t wait. Estimate!
Instead of running a model and hoping it will be fast, you can
estimate how long a specification of that model will take by using
no_fit = TRUE in the call to orsf().
fit_spec <- orsf(pbc_orsf,
                 formula = time+status~. -id,
                 control = orsf_control_survival(method = 'net'),
                 n_tree = 2000,
                 no_fit = TRUE)
# how much time it takes to estimate training time:
system.time(
  time_est <- orsf_time_to_train(fit_spec, n_tree_subset = 5)
)
#>    user  system elapsed
#>   0.283   0.000   0.282
# the estimated training time:
time_est
#> Time difference of 112.619 secs
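Once the estimate looks acceptable, the same specification can be passed to orsf_train() to grow the forest without restating the arguments. The sketch below assumes a hypothetical budget of 60 seconds (time_budget is a name made up for this example) and uses a smaller, default-control specification so that it runs quickly.

```r
library(aorsf)

# a specification only; no trees are grown yet
fit_spec <- orsf(pbc_orsf,
                 formula = time + status ~ . - id,
                 n_tree = 100,
                 no_fit = TRUE)

# estimated training time, returned as a difftime
time_est <- orsf_time_to_train(fit_spec, n_tree_subset = 5)

# hypothetical budget for this example
time_budget <- as.difftime(60, units = 'secs')

# train only if the estimate fits within the budget
if (time_est <= time_budget) {
  fit <- orsf_train(fit_spec)
} else {
  message('Estimated training time exceeds the budget: ',
          format(time_est))
}
```

If the estimate is too high, the specification can be revised (fewer trees, a faster control) and re-estimated before committing to a full run.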