
This article covers core features of the aorsf package.

Background: ORSF

The oblique random survival forest (ORSF) is an extension of the axis-based RSF algorithm.

  • See orsf for more details on ORSFs.

  • See the arXiv paper for more details on algorithms used specifically by aorsf.

Accelerated ORSF

The purpose of aorsf (‘a’ is short for accelerated) is to provide routines to fit ORSFs that will scale adequately to large data sets. The fastest algorithm available in the package is the accelerated ORSF model, which is the default method used by orsf():


library(aorsf)

set.seed(329)

orsf_fit <- orsf(data = pbc_orsf, 
                 formula = Surv(time, status) ~ . - id)

orsf_fit
#> ---------- Oblique random survival forest
#> 
#>      Linear combinations: Accelerated
#>           N observations: 276
#>                 N events: 111
#>                  N trees: 500
#>       N predictors total: 17
#>    N predictors per node: 5
#>  Average leaves per tree: 25
#> Min observations in leaf: 5
#>       Min events in leaf: 1
#>           OOB stat value: 0.84
#>            OOB stat type: Harrell's C-statistic
#>      Variable importance: anova
#> 
#> -----------------------------------------

You may notice that the first input of orsf() is data. This design choice makes it easier to use orsf() with pipes (i.e., %>% or |>). For instance,

library(dplyr)

orsf_fit <- pbc_orsf |> 
 select(-id) |> 
 orsf(formula = Surv(time, status) ~ .)

Interpretation

aorsf includes several functions dedicated to interpretation of ORSFs, both through estimation of partial dependence and variable importance.

Variable importance

aorsf provides multiple ways to compute variable importance.

  • To compute negation importance, ORSF multiplies each coefficient of a given variable by -1 and then re-computes the out-of-sample (sometimes referred to as out-of-bag) accuracy of the ORSF model. A larger drop in accuracy indicates a more important variable.

    
    orsf_vi_negate(orsf_fit)
    #>         bili          age       copper      protime      albumin          ast 
    #>  0.076370077  0.027401542  0.025057304  0.013023547  0.010002084  0.006355491 
    #>          sex         chol      ascites      spiders     platelet        edema 
    #>  0.006042926  0.005782455  0.004584288  0.004323817  0.002396333  0.001638486 
    #>       hepato        stage          trt         trig 
    #>  0.000573036 -0.001041884 -0.002031673 -0.004167535
  • You can also compute variable importance using permutation, a more classical approach.

    
    orsf_vi_permute(orsf_fit)
    #>          bili           age       protime       albumin          chol 
    #>  0.0108355907  0.0089081059  0.0057824547  0.0032819337  0.0026568035 
    #>        copper      platelet       ascites         edema           ast 
    #>  0.0025005209  0.0019274849  0.0018753907  0.0013705732  0.0012502605 
    #>       spiders         stage           sex        hepato           trt 
    #>  0.0010939779  0.0007814128  0.0000000000 -0.0005209419 -0.0005730360 
    #>      alk.phos          trig 
    #> -0.0006772244 -0.0020316733
  • A faster alternative to permutation and negation importance is ANOVA importance, which computes the proportion of times each variable obtains a low p-value (p < 0.01) while the forest is grown.

    
    orsf_vi_anova(orsf_fit)
    #>    ascites       bili      edema     copper    albumin        age    protime 
    #> 0.38326586 0.27203454 0.23833229 0.20161087 0.17501252 0.16939891 0.15407334 
    #>       chol      stage        ast    spiders     hepato        sex   alk.phos 
    #> 0.14196607 0.13971368 0.12955466 0.12152358 0.11651962 0.11271975 0.09600998 
    #>       trig   platelet        trt 
    #> 0.09560853 0.07760141 0.07131324
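
The idea behind negation and permutation importance can be sketched outside of aorsf. The example below is a minimal illustration, not aorsf's implementation: it assumes a single linear model fit to the built-in mtcars data as a stand-in for the linear combinations stored in an oblique forest's nodes, uses R-squared as the accuracy measure, and evaluates accuracy in-sample rather than out-of-bag. The names accuracy, vi_negate, and vi_permute are hypothetical.

```r
# Illustrative sketch only: one linear model stands in for the linear
# combinations that an oblique forest stores in its nodes.
set.seed(329)

fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
vars <- c("wt", "hp", "qsec")

# Accuracy measure assumed for this sketch: R-squared of the predictions
accuracy <- function(pred) {
  y <- mtcars$mpg
  1 - sum((y - pred)^2) / sum((y - mean(y))^2)
}

baseline <- accuracy(predict(fit, newdata = mtcars))

# Negation importance: flip the sign of one variable's coefficient,
# re-compute predictions, and record the drop in accuracy
vi_negate <- sapply(vars, function(v) {
  beta <- coef(fit)
  beta[v] <- -beta[v]
  pred <- drop(model.matrix(fit) %*% beta)
  baseline - accuracy(pred)
})

# Permutation importance: shuffle one variable's values instead of
# negating its coefficient, then record the drop in accuracy
vi_permute <- sapply(vars, function(v) {
  shuffled <- mtcars
  shuffled[[v]] <- sample(shuffled[[v]])
  baseline - accuracy(predict(fit, newdata = shuffled))
})

sort(vi_negate, decreasing = TRUE)
sort(vi_permute, decreasing = TRUE)
```

Both measures answer the same question (how much worse does the model get when a variable's contribution is broken?), but negation does so without introducing randomness from shuffling.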

Partial dependence (PD)

Partial dependence (PD) shows the expected prediction from a model as a function of a single predictor or multiple predictors. The expectation is marginalized over the values of all other predictors, giving something like a multivariable-adjusted estimate of the model’s prediction.

For more on PD, see the vignette.
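
The marginalization described above can be written out by hand. The following is a minimal sketch that assumes a linear model on the built-in mtcars data in place of an ORSF; it does not use aorsf's own PD functions, and the names wt_grid and pd are hypothetical. For each value of the predictor of interest, that value is assigned to every observation, and the resulting predictions are averaged.

```r
# Illustrative sketch: a linear model stands in for an ORSF
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Values of the predictor of interest at which PD is evaluated
wt_grid <- c(2, 3, 4)

pd <- sapply(wt_grid, function(v) {
  data_v <- mtcars
  data_v$wt <- v                        # set wt to v for every observation
  mean(predict(fit, newdata = data_v))  # average over all other predictors
})

data.frame(wt = wt_grid, mean_prediction = pd)
```

Because this stand-in model is additive, the PD curve is a straight line whose slope equals the coefficient for wt. For a forest, the same procedure can reveal non-linear relationships.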

Individual conditional expectations (ICE)

Unlike partial dependence, which shows the expected prediction as a function of one or multiple predictors, individual conditional expectations (ICE) show the prediction for an individual observation as a function of a predictor.

For more on ICE, see the vignette.
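
A hand-rolled version of ICE makes the contrast with PD concrete. This sketch assumes a linear model with an interaction term on the built-in mtcars data as a stand-in for an ORSF, and it does not use aorsf's own ICE functions; the names wt_grid and ice are hypothetical. Each row of the result is one observation's curve, and averaging the rows gives partial dependence.

```r
# Illustrative sketch: the interaction lets curves differ by observation
fit <- lm(mpg ~ wt * hp, data = mtcars)

wt_grid <- c(2, 3, 4)

# One row per observation, one column per grid value
ice <- sapply(wt_grid, function(v) {
  data_v <- mtcars
  data_v$wt <- v                  # set wt to v, keep other predictors as observed
  predict(fit, newdata = data_v)  # no averaging: one prediction per observation
})
colnames(ice) <- paste0("wt_", wt_grid)

# ICE curves for the first three observations
head(ice, 3)

# Averaging each column recovers partial dependence
colMeans(ice)
```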

What about the original ORSF?

The original ORSF (i.e., obliqueRSF) used glmnet to find linear combinations of inputs. aorsf allows users to implement this approach using the orsf_control_net() function:


orsf_net <- orsf(data = pbc_orsf, 
                 formula = Surv(time, status) ~ . - id, 
                 control = orsf_control_net(),
                 n_tree = 50)

Forests grown with orsf_control_net() (net forests) fit much faster than the original ORSF function in obliqueRSF. However, net forests are still much slower than forests grown with orsf_control_cph() (cph forests):


# tracking how long it takes to fit 50 glmnet trees
print(
 t1 <- system.time(
  orsf(data = pbc_orsf, 
       formula = Surv(time, status) ~ . - id, 
       control = orsf_control_net(),
       n_tree = 50)
 )
)
#>    user  system elapsed 
#>   2.895   0.044   2.939

# and how long it takes to fit 50 cph trees
print(
 t2 <- system.time(
  orsf(data = pbc_orsf, 
       formula = Surv(time, status) ~ . - id, 
       control = orsf_control_cph(),
       n_tree = 50)
 )
)
#>    user  system elapsed 
#>   0.066   0.000   0.066

t1['elapsed'] / t2['elapsed']
#> elapsed 
#> 44.5303

aorsf and other machine learning software

The unique feature of aorsf is its fast algorithms to fit ORSF ensembles. RLT and obliqueRSF both fit oblique random survival forests, but aorsf does so faster. ranger and randomForestSRC fit survival forests, but neither package supports oblique splitting. obliqueRF fits oblique random forests for classification and regression, but not survival. PPforest fits oblique random forests for classification but not survival.

Note: By default, aorsf models predict risk at a specific prediction horizon, which is not the default for ranger or randomForestSRC. I think this will change in the future, as computing time-independent predictions with aorsf could be helpful.
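
As a rough illustration of predicting risk at a prediction horizon, the sketch below requests predictions for five observations at 1000 days. It assumes that predict() for aorsf models accepts new_data and pred_horizon arguments; argument names and defaults may differ across aorsf versions, so check the documentation for your installed version.

```r
library(aorsf)

set.seed(329)

# A small forest keeps the example fast
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 50)

# Predicted risk of the event by 1000 days for the first 5 observations
# (new_data and pred_horizon are assumed argument names)
risk <- predict(fit,
                new_data = pbc_orsf[1:5, ],
                pred_horizon = 1000)

risk
```

Each value is a predicted probability of experiencing the event by the chosen horizon, so it falls between 0 and 1.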