Predicted risk, survival, hazard, or mortality from an ORSF model.

## Usage

```
# S3 method for orsf_fit
predict(
  object,
  new_data,
  pred_horizon = NULL,
  pred_type = "risk",
  na_action = "fail",
  boundary_checks = TRUE,
  n_thread = 1,
  verbose_progress = FALSE,
  pred_aggregate = TRUE,
  ...
)
```

## Arguments

- object
(*orsf_fit*) a trained oblique random survival forest (see orsf).

- new_data
a data.frame, tibble, or data.table to compute predictions in.

- pred_horizon
(*double*) a value or vector indicating the time(s) that predictions will be calibrated to. E.g., if you were predicting risk of incident heart failure within the next 10 years, then `pred_horizon = 10`. `pred_horizon` can be `NULL` if `pred_type` is `'mort'`, since mortality predictions are aggregated over all event times.

- pred_type
(*character*) the type of predictions to compute. Valid options are:

  - 'risk': probability of having an event at or before `pred_horizon`.
  - 'surv': 1 - risk.
  - 'chf': cumulative hazard function.
  - 'mort': mortality prediction.

- na_action
(*character*) what should happen when `new_data` contains missing values (i.e., `NA` values). Valid options are:

  - 'fail': an error is thrown if `new_data` contains `NA` values.
  - 'pass': the output will have `NA` in all rows where `new_data` has one or more `NA` values for the predictors used by `object`.
  - 'omit': rows in `new_data` with incomplete data will be dropped.
  - 'impute_meanmode': missing values for continuous and categorical variables in `new_data` will be imputed using the mean and mode, respectively. To clarify, the mean and mode used to impute missing values are from the training data of `object`, not from `new_data`.

- boundary_checks
(*logical*) if `TRUE`, `pred_horizon` will be checked to make sure the requested values are less than the maximum observed time in `object`'s training data. If `FALSE`, these checks are skipped.

- n_thread
(*integer*) number of threads to use while computing predictions. Default is one thread. To use the maximum number of threads that your system provides for concurrent execution, set `n_thread = 0`.

- verbose_progress
(*logical*) if `TRUE`, progress messages are printed in the console. If `FALSE` (the default), nothing is printed.

- pred_aggregate
(*logical*) If `TRUE` (the default), predictions will be aggregated over all trees by taking the mean. If `FALSE`, the returned output will contain one row per observation and one column for each tree. If the length of `pred_horizon` is two or more and `pred_aggregate` is `FALSE`, then the result will be a list of such matrices, with the i'th item in the list corresponding to the i'th value of `pred_horizon`.

- ...
Further arguments passed to or from other methods (not currently used).
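
The `'impute_meanmode'` option of `na_action` can be pictured with a small base R sketch. This is an illustration of the idea only, not aorsf's internal code: the helpers `stat_mode` and `impute_from_train` and both toy data frames are hypothetical. The point it shows is that the fill values are computed on the training data and then applied to `new_data`.

```r
# Most frequent non-missing value of a vector (ties go to the first one seen).
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Fill NAs in `new` using statistics computed on `train` only.
impute_from_train <- function(new, train) {
  for (col in names(new)) {
    is_na <- is.na(new[[col]])
    if (!any(is_na)) next
    fill <- if (is.numeric(train[[col]])) {
      mean(train[[col]], na.rm = TRUE)   # continuous: training mean
    } else {
      stat_mode(train[[col]])            # categorical: training mode
    }
    new[[col]][is_na] <- fill
  }
  new
}

# Toy data (hypothetical), standing in for training data and new_data.
train <- data.frame(
  age = c(40, 50, 60),
  sex = factor(c("f", "f", "m"), levels = c("f", "m"))
)
new <- data.frame(
  age = c(NA, 70),
  sex = factor(c("m", NA), levels = c("f", "m"))
)

imputed <- impute_from_train(new, train)
imputed$age  # the NA becomes the training mean, 50
imputed$sex  # the NA becomes the training mode, "f"
```

Note that the mean of `new$age` (70) plays no role: only `train` determines the imputed value.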

## Value

a `matrix` of predictions. Column `j` of the matrix corresponds to value `j` in `pred_horizon`. Row `i` of the matrix corresponds to row `i` in `new_data`.

## Details

`new_data` must have the same columns with equivalent types as the data used to train `object`. Also, factors in `new_data` must not have levels that were not in the data used to train `object`.

`pred_horizon` values should not exceed the maximum follow-up time in `object`'s training data, but if you truly want to do this, set `boundary_checks = FALSE` and you can use a `pred_horizon` as large as you want. Note that predictions beyond the maximum follow-up time in the `object`'s training data are equal to predictions at the maximum follow-up time, because `aorsf` does not estimate survival beyond its maximum observed time.
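
Why predictions stop changing past the maximum follow-up time can be seen with a toy step function. The sketch below uses base R only and does not reflect how `aorsf` stores its estimates; `event_times`, `surv_at_event`, and `surv_lookup` are hypothetical names with made-up values.

```r
# Toy survival curve: estimates exist only at observed event times.
event_times   <- c(100, 400, 900)     # 900 plays the role of maximum follow-up
surv_at_event <- c(0.95, 0.80, 0.60)

# Step-function lookup: survival is 1 before the first event time and
# holds its last estimated value after the final one (no extrapolation).
surv_lookup <- function(pred_horizon) {
  idx <- findInterval(pred_horizon, event_times)
  c(1, surv_at_event)[idx + 1]
}

# Horizons 900 and 5000 give the same value, 0.60.
surv_lookup(c(50, 400, 900, 5000))
```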

If unspecified, `pred_horizon` may be automatically specified as the value used for `oobag_pred_horizon` when `object` was created (see orsf).

## Examples

Begin by fitting an ORSF ensemble:

```
library(aorsf)
set.seed(329730)
index_train <- sample(nrow(pbc_orsf), 150)
pbc_orsf_train <- pbc_orsf[index_train, ]
pbc_orsf_test <- pbc_orsf[-index_train, ]
fit <- orsf(data = pbc_orsf_train,
            formula = Surv(time, status) ~ . - id,
            oobag_pred_horizon = 365.25 * 5)
```

Predict risk, survival, or cumulative hazard at one or several times:

```
# predicted risk, the default
predict(fit,
        new_data = pbc_orsf_test[1:5, ],
        pred_type = 'risk',
        pred_horizon = c(500, 1000, 1500))
```

```
##            [,1]       [,2]       [,3]
## [1,] 0.49679905 0.77309053 0.90830168
## [2,] 0.03363621 0.08527972 0.17061414
## [3,] 0.15129784 0.30402666 0.43747212
## [4,] 0.01152480 0.02950914 0.07068198
## [5,] 0.01035341 0.01942262 0.05117679
```

```
# predicted survival, i.e., 1 - risk
predict(fit,
        new_data = pbc_orsf_test[1:5, ],
        pred_type = 'surv',
        pred_horizon = c(500, 1000, 1500))
```

```
##           [,1]      [,2]       [,3]
## [1,] 0.5032009 0.2269095 0.09169832
## [2,] 0.9663638 0.9147203 0.82938586
## [3,] 0.8487022 0.6959733 0.56252788
## [4,] 0.9884752 0.9704909 0.92931802
## [5,] 0.9896466 0.9805774 0.94882321
```
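
The relationship `surv = 1 - risk` can be checked against the two printed outputs above. This sketch copies the first column (horizon 500) from each output rather than recomputing it; `risk_500` and `surv_500` are names introduced here for illustration.

```r
# First column of the 'risk' and 'surv' outputs, copied from the printed results.
risk_500 <- c(0.49679905, 0.03363621, 0.15129784, 0.01152480, 0.01035341)
surv_500 <- c(0.5032009, 0.9663638, 0.8487022, 0.9884752, 0.9896466)

# The sums equal 1 up to the rounding of the printed values.
all.equal(risk_500 + surv_500, rep(1, 5), tolerance = 1e-6)
```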

```
# predicted cumulative hazard function
# (expected number of events for person i at time j)
predict(fit,
        new_data = pbc_orsf_test[1:5, ],
        pred_type = 'chf',
        pred_horizon = c(500, 1000, 1500))
```

```
##            [,1]       [,2]       [,3]
## [1,] 0.74442414 1.39538511 1.78344589
## [2,] 0.03473938 0.10418984 0.24047328
## [3,] 0.19732086 0.47015754 0.73629459
## [4,] 0.01169147 0.03223257 0.09564168
## [5,] 0.01072007 0.02240040 0.06464319
```

Predict mortality, defined as the number of events in the forest’s population if all observations had characteristics like the current observation. This type of prediction does not require you to specify a prediction horizon:

```
predict(fit,
        new_data = pbc_orsf_test[1:5, ],
        pred_type = 'mort')
```

```
##          [,1]
## [1,] 83.08611
## [2,] 27.48146
## [3,] 43.52432
## [4,] 15.20281
## [5,] 10.56334
```
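
Tree-level predictions, as described for `pred_aggregate`, can be requested in the same way. The sketch below refits a smaller forest so it runs quickly; `fit_small`, `tree_preds`, and the choice of `n_tree = 50` are assumptions made for this illustration, and no output is shown because it was not run when this page was written.

```r
library(aorsf)

set.seed(329730)
index_train <- sample(nrow(pbc_orsf), 150)

# A small forest keeps the example fast; n_tree = 50 is an arbitrary choice.
fit_small <- orsf(data = pbc_orsf[index_train, ],
                  formula = Surv(time, status) ~ . - id,
                  n_tree = 50)

# One column per tree instead of the forest-level mean.
tree_preds <- predict(fit_small,
                      new_data = pbc_orsf[-index_train, ][1:5, ],
                      pred_type = 'risk',
                      pred_horizon = 1000,
                      pred_aggregate = FALSE)

# Per the argument description: one row per observation, one column per tree.
dim(tree_preds)

# Spread of the tree-level predictions for each observation.
apply(tree_preds, 1, sd)
```

Looking at the spread across trees gives a rough sense of how much the individual trees disagree for a given observation, which the aggregated mean hides.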