Predicted risk, survival, hazard, or mortality from an ORSF model.
Usage
# S3 method for orsf_fit
predict(
  object,
  new_data,
  pred_horizon = NULL,
  pred_type = "risk",
  na_action = "fail",
  boundary_checks = TRUE,
  n_thread = 1,
  verbose_progress = FALSE,
  pred_aggregate = TRUE,
  ...
)
Arguments
- object
(orsf_fit) a trained oblique random survival forest (see orsf).
- new_data
a data.frame, tibble, or data.table to compute predictions in.
- pred_horizon
(double) a value or vector indicating the time(s) that predictions will be calibrated to. E.g., if you were predicting risk of incident heart failure within the next 10 years, then pred_horizon = 10. pred_horizon can be NULL if pred_type is 'mort', since mortality predictions are aggregated over all event times.
- pred_type
(character) the type of predictions to compute. Valid options are:
'risk': probability of having an event at or before pred_horizon.
'surv': 1 - risk.
'chf': cumulative hazard function.
'mort': mortality prediction.
- na_action
(character) what should happen when new_data contains missing values (i.e., NA values). Valid options are:
'fail': an error is thrown if new_data contains NA values.
'pass': the output will have NA in all rows where new_data has one or more NA values for the predictors used by object.
'omit': rows in new_data with incomplete data will be dropped.
'impute_meanmode': missing values for continuous and categorical variables in new_data will be imputed using the mean and mode, respectively. The mean and mode used for imputation come from the training data of object, not from new_data.
- boundary_checks
(logical) if TRUE, pred_horizon will be checked to make sure the requested values are less than the maximum observed time in object's training data. If FALSE, these checks are skipped.
- n_thread
(integer) number of threads to use while computing predictions. Default is one thread. To use the maximum number of threads that your system provides for concurrent execution, set n_thread = 0.
- verbose_progress
(logical) if TRUE, progress messages are printed in the console. If FALSE (the default), nothing is printed.
- pred_aggregate
(logical) if TRUE (the default), predictions will be aggregated over all trees by taking the mean. If FALSE, the returned output will contain one row per observation and one column for each tree. If the length of pred_horizon is two or more and pred_aggregate is FALSE, then the result will be a list of such matrices, with the i'th item in the list corresponding to the i'th value of pred_horizon.
- ...
Further arguments passed to or from other methods (not currently used).
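The na_action options described above can be compared side by side. The following is a minimal sketch, assuming the pbc_orsf data that ships with aorsf; the missing value placed in bili is an illustrative assumption, not part of the original data.

```r
library(aorsf)

set.seed(329730)
index_train <- sample(nrow(pbc_orsf), 150)

fit <- orsf(data = pbc_orsf[index_train, ],
            formula = Surv(time, status) ~ . - id,
            oobag_pred_horizon = 365.25 * 5)

# Assumption for illustration: blank out one predictor value so that
# row 2 of the new data is incomplete.
new_data <- pbc_orsf[-index_train, ][1:5, ]
new_data$bili[2] <- NA

# 'pass': all five rows are returned, with NA for the incomplete row
pred_pass <- predict(fit, new_data = new_data,
                     pred_horizon = 1000, na_action = 'pass')

# 'omit': the incomplete row is dropped before predicting
pred_omit <- predict(fit, new_data = new_data,
                     pred_horizon = 1000, na_action = 'omit')

# 'impute_meanmode': the gap is filled from the training data's mean/mode
pred_impute <- predict(fit, new_data = new_data,
                       pred_horizon = 1000, na_action = 'impute_meanmode')

nrow(pred_pass)
nrow(pred_omit)
anyNA(pred_impute)
```

With the default, na_action = 'fail', the same call would stop with an error instead of returning predictions.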
Value
a matrix of predictions. Column j of the matrix corresponds to value j in pred_horizon. Row i of the matrix corresponds to row i in new_data.
Details
new_data must have the same columns with equivalent types as the data used to train object. Also, factors in new_data must not have levels that were not in the data used to train object.
pred_horizon values should not exceed the maximum follow-up time in object's training data, but if you truly want to do this, set boundary_checks = FALSE and you can use a pred_horizon as large as you want. Note that predictions beyond the maximum follow-up time in the object's training data are equal to predictions at the maximum follow-up time, because aorsf does not estimate survival beyond its maximum observed time.
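The behavior beyond the maximum follow-up time can be checked directly. This is a sketch under the assumption that the model is fit to a random subset of pbc_orsf; the variable names t_max and pred are illustrative only.

```r
library(aorsf)

set.seed(329730)
index_train <- sample(nrow(pbc_orsf), 150)
pbc_orsf_train <- pbc_orsf[index_train, ]
pbc_orsf_test <- pbc_orsf[-index_train, ]

fit <- orsf(data = pbc_orsf_train,
            formula = Surv(time, status) ~ . - id,
            oobag_pred_horizon = 365.25 * 5)

# maximum follow-up time in the training data
t_max <- max(pbc_orsf_train$time)

# request predictions at, and well beyond, the maximum follow-up time;
# boundary_checks = FALSE skips the check on pred_horizon
pred <- predict(fit,
                new_data = pbc_orsf_test[1:5, ],
                pred_horizon = c(t_max, 2 * t_max),
                boundary_checks = FALSE)

# per the note above, the two columns are expected to agree
all.equal(pred[, 1], pred[, 2])
```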
If unspecified, pred_horizon may be automatically specified as the value used for oobag_pred_horizon when object was created (see orsf).
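To see the fallback in action, one can compare a call that omits pred_horizon with one that passes the oobag_pred_horizon value explicitly. A minimal sketch, assuming a fit on pbc_orsf with oobag_pred_horizon set to five years; pred_default and pred_explicit are illustrative names.

```r
library(aorsf)

set.seed(329730)
index_train <- sample(nrow(pbc_orsf), 150)
pbc_orsf_test <- pbc_orsf[-index_train, ]

fit <- orsf(data = pbc_orsf[index_train, ],
            formula = Surv(time, status) ~ . - id,
            oobag_pred_horizon = 365.25 * 5)

# pred_horizon left unspecified
pred_default <- predict(fit, new_data = pbc_orsf_test[1:5, ])

# pred_horizon set to the value given to oobag_pred_horizon above
pred_explicit <- predict(fit,
                         new_data = pbc_orsf_test[1:5, ],
                         pred_horizon = 365.25 * 5)

# if the fallback applies, the two results should match
all.equal(pred_default, pred_explicit)
```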
Examples
Begin by fitting an ORSF ensemble:
library(aorsf)
set.seed(329730)
index_train <- sample(nrow(pbc_orsf), 150)
pbc_orsf_train <- pbc_orsf[index_train, ]
pbc_orsf_test <- pbc_orsf[-index_train, ]
fit <- orsf(data = pbc_orsf_train,
            formula = Surv(time, status) ~ . - id,
            oobag_pred_horizon = 365.25 * 5)
Predict risk, survival, or cumulative hazard at one or several times:
# predicted risk, the default
predict(fit,
        new_data = pbc_orsf_test[1:5, ],
        pred_type = 'risk',
        pred_horizon = c(500, 1000, 1500))
##            [,1]       [,2]       [,3]
## [1,] 0.49679905 0.77309053 0.90830168
## [2,] 0.03363621 0.08527972 0.17061414
## [3,] 0.15129784 0.30402666 0.43747212
## [4,] 0.01152480 0.02950914 0.07068198
## [5,] 0.01035341 0.01942262 0.05117679
# predicted survival, i.e., 1 - risk
predict(fit,
        new_data = pbc_orsf_test[1:5, ],
        pred_type = 'surv',
        pred_horizon = c(500, 1000, 1500))
##           [,1]      [,2]       [,3]
## [1,] 0.5032009 0.2269095 0.09169832
## [2,] 0.9663638 0.9147203 0.82938586
## [3,] 0.8487022 0.6959733 0.56252788
## [4,] 0.9884752 0.9704909 0.92931802
## [5,] 0.9896466 0.9805774 0.94882321
# predicted cumulative hazard function
# (expected number of events for person i at time j)
predict(fit,
        new_data = pbc_orsf_test[1:5, ],
        pred_type = 'chf',
        pred_horizon = c(500, 1000, 1500))
##            [,1]       [,2]       [,3]
## [1,] 0.74442414 1.39538511 1.78344589
## [2,] 0.03473938 0.10418984 0.24047328
## [3,] 0.19732086 0.47015754 0.73629459
## [4,] 0.01169147 0.03223257 0.09564168
## [5,] 0.01072007 0.02240040 0.06464319
Predict mortality, defined as the number of events in the forest's population if all observations had characteristics like the current observation. This type of prediction does not require you to specify a prediction horizon:
predict(fit,
        new_data = pbc_orsf_test[1:5, ],
        pred_type = 'mort')
##          [,1]
## [1,] 83.08611
## [2,] 27.48146
## [3,] 43.52432
## [4,] 15.20281
## [5,] 10.56334
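Per-tree predictions can be requested with pred_aggregate = FALSE. The sketch below refits a smaller forest so that the shape of the result is easy to inspect; the choice of n_tree = 50 and the names fit_small and pred_trees are assumptions made for illustration.

```r
library(aorsf)

set.seed(329730)
index_train <- sample(nrow(pbc_orsf), 150)
pbc_orsf_test <- pbc_orsf[-index_train, ]

# a smaller forest (50 trees) keeps the per-tree output compact
fit_small <- orsf(data = pbc_orsf[index_train, ],
                  formula = Surv(time, status) ~ . - id,
                  n_tree = 50,
                  oobag_pred_horizon = 365.25 * 5)

# one row per observation, one column per tree
pred_trees <- predict(fit_small,
                      new_data = pbc_orsf_test[1:5, ],
                      pred_horizon = 1000,
                      pred_aggregate = FALSE)

dim(pred_trees)

# the aggregated prediction, for comparison with the per-tree values
pred_mean <- predict(fit_small,
                     new_data = pbc_orsf_test[1:5, ],
                     pred_horizon = 1000)

# spread of tree-level predictions for the first observation
range(pred_trees[1, ])
pred_mean[1, 1]
```

Looking at the spread across trees can give a rough sense of how much individual trees disagree for a given observation.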