Variable selection
Arguments
- object
(ObliqueForest) a trained oblique random forest object (see orsf).
- n_predictor_min
(integer) the minimum number of predictors allowed
- verbose_progress
(logical) Should progress be printed to the console? Not implemented yet.
Value
a data.table with five columns:
n_predictors: the number of predictors used
stat_value: the out-of-bag statistic
variables_included: the names of the variables included
predictors_included: the names of the predictors included
predictor_dropped: the predictor selected to be dropped
Details
The difference between variables_included and predictors_included comes from referent coding. A variable is the name of a factor variable in the training data, while a predictor is the name of that same factor with one of its levels appended. For example, if the variable is diabetes with levels = c("no", "yes"), then the variable name is diabetes and the predictor name is diabetes_yes.
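The naming rule can be sketched in base R. predictor_names() is a hypothetical helper written only for illustration, not a function in aorsf; it assumes the first factor level is the referent and that the variable name and level are joined with an underscore, as in the diabetes_yes example.

```r
# Hypothetical helper, not part of aorsf: builds referent-coded predictor
# names by pasting the variable name to each non-referent level with "_".
predictor_names <- function(variable, levels) {
  # The first level is the referent, so it does not get its own predictor
  paste(variable, levels[-1], sep = "_")
}

predictor_names("diabetes", c("no", "yes"))
#> [1] "diabetes_yes"

# A factor with three levels yields two predictors
predictor_names("edema", c("0", "0.5", "1"))
#> [1] "edema_0.5" "edema_1"
```

The second call mirrors the edema_0.5 predictor that appears in the example output, where one variable (edema) can contribute several predictors.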
tree_seeds should be specified in object so that each successive run of orsf is evaluated on the same out-of-bag samples as the initial run.
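A minimal sketch of the call from the Examples section with tree_seeds set. It assumes tree_seeds takes one integer seed per tree, so 1:25 is used for n_tree = 25; the particular seed values are arbitrary.

```r
library(aorsf)

# Fix one random seed per tree so the out-of-bag samples are the same
# each time orsf_vs() refits the forest on a smaller set of predictors.
object <- orsf(formula = time + status ~ .,
               data = pbc_orsf,
               n_tree = 25,
               tree_seeds = 1:25,
               importance = 'anova')

vs <- orsf_vs(object, n_predictor_min = 15)
```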
Examples
library(aorsf)

object <- orsf(formula = time + status ~ .,
               data = pbc_orsf,
               n_tree = 25,
               importance = 'anova')
orsf_vs(object, n_predictor_min = 15)
#> n_predictors stat_value variables_included
#> <int> <num> <list>
#> 1: 15 0.8356685 age,albumin,ascites,ast,bili,chol,...
#> 2: 16 0.8351997 age,albumin,ascites,ast,bili,chol,...
#> 3: 17 0.8296786 age,albumin,ascites,ast,bili,chol,...
#> 4: 18 0.8185322 age,albumin,alk.phos,ascites,ast,bili,...
#> predictors_included predictor_dropped
#> <list> <char>
#> 1: id,age,sex_f,ascites_1,spiders_1,edema_0.5,... platelet
#> 2: id,age,sex_f,ascites_1,hepato_1,spiders_1,... hepato_1
#> 3: id,trt_placebo,age,sex_f,ascites_1,hepato_1,... trt_placebo
#> 4: id,trt_placebo,age,sex_f,ascites_1,hepato_1,... alk.phos
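One way to use the returned data.table is to pick the row with the best out-of-bag statistic and pull out its variables. This sketch assumes a statistic where larger values are better, such as a concordance statistic for survival forests; if a different out-of-bag function is used, which.min() may be the right choice instead.

```r
library(aorsf)

object <- orsf(formula = time + status ~ .,
               data = pbc_orsf,
               n_tree = 25,
               importance = 'anova')

vs <- orsf_vs(object, n_predictor_min = 15)

# Row with the largest out-of-bag statistic (assumes larger is better)
best <- vs[which.max(vs$stat_value), ]

best$n_predictors

# variables_included is a list column, so unlist() the single entry
# to get a plain character vector of variable names
unlist(best$variables_included)
```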