This is the main function of the package. It relies on a triad of objects: (1) data with unit ids and possible dates of interest; (2) codedata for the corresponding units, with optional dates of interest; and (3) a classification scheme (a classcodes object; cc) with regular expressions to identify and categorize relevant codes. The function combines the three underlying steps performed by codify(), classify() and index(). Relevant arguments are passed to those functions by codify_args and cc_args.
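As a rough illustration of what the three steps amount to conceptually, the base R sketch below links toy code data to toy unit data, flags each unit by regular expression, and sums weights into an index. All object names, codes, patterns and weights are made up for illustration; this is not the package's implementation, and the real codify(), classify() and index() do considerably more.

```r
# Toy data; every name and value here is hypothetical
people <- data.frame(name = c("Ann", "Bob", "Cid"), stringsAsFactors = FALSE)
codedata <- data.frame(
  name  = c("Ann", "Ann", "Bob"),
  icd10 = c("I50", "E10", "C80"),
  stringsAsFactors = FALSE
)

# Hypothetical classification scheme: group name, regular expression, weight
scheme <- data.frame(
  group  = c("heart_failure", "diabetes", "cancer"),
  regex  = c("^I50", "^E1[0-4]", "^C[0-9]{2}"),
  weight = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Step 1 (cf. codify): link each unit to its codes; units without codes are kept
codified <- merge(people, codedata, by = "name", all.x = TRUE)

# Step 2 (cf. classify): one logical column per group,
# TRUE if any of the unit's codes matches the group's regular expression
classified <- sapply(scheme$regex, function(rx) {
  tapply(grepl(rx, codified$icd10), codified$name, any)
})
colnames(classified) <- scheme$group

# Step 3 (cf. index): weighted sum of group memberships per unit
# (`+ 0` converts the logical matrix to numeric)
index_value <- as.vector((classified + 0) %*% scheme$weight)

result <- data.frame(
  name = rownames(classified),
  classified,
  index = index_value,
  row.names = NULL,
  stringsAsFactors = FALSE
)
result
```

The result has one row per unit, one logical column per group and one numeric index column, which mirrors the shape described under Value.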

categorize(x, ...)

# S3 method for data.frame
categorize(x, ...)

# S3 method for tbl_df
categorize(x, ...)

# S3 method for data.table
categorize(x, ..., codedata, id, code, codify_args = list())

# S3 method for codified
categorize(
  x,
  ...,
  cc,
  index = NULL,
  cc_args = list(),
  check.names = TRUE,
  .data_cols = NULL
)

Arguments

x

data set with a mandatory character id column (identified by argument id = "<col_name>") and an optional Date column of interest (identified by argument date = "<col_name>"). Alternatively, the output from codify().

...

arguments passed between methods

codedata

external code data with mandatory character id column (identified by id = "<col_name>"), code column (identified by argument code = "<col_name>") and optional Date column (identified by codify_args = list(code_date = "<col_name>")).

id

name of the character id column found in both x (where it must be unique) and codedata (where it need not be unique).

code

name of code column in codedata.

codify_args

List of named arguments passed to codify().

cc

classcodes object (or name of a default object from all_classcodes()).

index

Argument passed to index(). A character vector of names of columns with index weights from the corresponding classcodes object (as supplied by the cc argument). See attr(cc, "indices") for available options. Set to FALSE if no index should be calculated. If NULL (the default), all available indices (from attr(cc, "indices")) are provided.

cc_args

List with named arguments passed to set_classcodes()

check.names

Column names are based on cc$group, which might include spaces. Those names are changed to syntactically correct names by check.names = TRUE. Syntactically invalid but grammatically correct names might be preferred for presentation of the data, as achieved by check.names = FALSE. Alternatively, if categorize() is called repeatedly, longer informative names can be created by cc_args = list(tech_names = TRUE).

.data_cols

used internally
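The date-related entries passed through codify_args (date, days and code_date, as used in the Examples) restrict which codes are considered for each unit. The base R sketch below shows one way such a window can be read, assuming that days gives inclusive bounds in days relative to the date of interest. The toy data and that reading of the bounds are assumptions for illustration, not a description of how codify() is implemented.

```r
# Toy data; every name and value here is hypothetical
people <- data.frame(
  name    = c("Ann", "Bob"),
  surgery = as.Date(c("2020-03-01", "2020-06-15")),
  stringsAsFactors = FALSE
)
codedata <- data.frame(
  name      = c("Ann", "Ann", "Bob"),
  icd10     = c("I50", "E10", "C80"),
  admission = as.Date(c("2020-02-20", "2019-01-05", "2020-06-15")),
  stringsAsFactors = FALSE
)

# Window relative to the date of interest: 30 days before up to the day before
days <- c(-30, -1)

# Link codes to units, then compute the offset in days from surgery to admission
linked <- merge(people, codedata, by = "name")
offset <- as.numeric(difftime(linked$admission, linked$surgery, units = "days"))

# Keep only codes registered inside the window (bounds assumed inclusive)
in_window <- offset >= days[1] & offset <= days[2]
recent <- linked[in_window, c("name", "icd10", "admission")]
recent
```

With these toy values only Ann's admission ten days before surgery is retained; Bob's same-day admission falls outside a window that ends the day before surgery.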

Value

Object of the same class as x with additional logical columns indicating membership of groups identified by the classcodes object (the cc argument). Numeric indices are also included if requested by the index argument.
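The names of the added columns depend on check.names. Base R's make.names() shows what turning group names into syntactically correct names looks like; whether categorize() calls make.names() internally is an assumption here, and the group names below are illustrative rather than taken from a specific classcodes object.

```r
# Illustrative group names containing spaces and a slash
groups <- c("congestive heart failure", "AIDS/HIV", "diabetes without complication")

# Syntactically valid versions: invalid characters are replaced by "."
syntactic <- make.names(groups)
syntactic
# "congestive.heart.failure" "AIDS.HIV" "diabetes.without.complication"

# Non-syntactic names are fine for presentation but need [[ ]] or backticks
flags <- data.frame(TRUE, FALSE, TRUE)
names(flags) <- groups
flags[["AIDS/HIV"]]

# Syntactic names can be used directly with `$`
names(flags) <- syntactic
flags$AIDS.HIV
```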

See also

Other verbs: classify(), codify(), index_fun

Examples

# For some patient data (ex_people) and related hospital visit code data
# with ICD 10-codes (ex_icd10), add the Elixhauser comorbidity
# conditions based on all registered ICD10-codes
categorize(
  x        = ex_people,
  codedata = ex_icd10,
  cc       = "elixhauser",
  id       = "name",
  code     = "icd10"
)
#> Classification based on: icd10
#> Error in gsub(finish, start, ..., fixed = TRUE): zero-length pattern

# Add Charlson categories and two versions of a calculated index
# ("quan_original" and "quan_updated").
categorize(
  x        = ex_people,
  codedata = ex_icd10,
  cc       = "charlson",
  id       = "name",
  code     = "icd10",
  index    = c("quan_original", "quan_updated")
)
#> Classification based on: icd10
#> Error in gsub(finish, start, ..., fixed = TRUE): zero-length pattern

# Only include recent hospital visits within 30 days before surgery
categorize(
  x           = ex_people,
  codedata    = ex_icd10,
  cc          = "charlson",
  id          = "name",
  code        = "icd10",
  index       = c("quan_original", "quan_updated"),
  codify_args = list(
    date      = "surgery",
    days      = c(-30, -1),
    code_date = "admission"
  )
)
#> Classification based on: icd10
#> Error in gsub(finish, start, ..., fixed = TRUE): zero-length pattern

# Multiple versions -------------------------------------------------------

# We can compare categorization according to Quan et al. (2005); "icd10",
# and Armitage et al. (2010); "icd10_rcs" (see `?charlson`)
# Note the use of `tech_names = TRUE` to distinguish the column names from the
# two versions.

# We first specify some common settings ...
ind <- c("quan_original", "quan_updated")
cd  <- list(date = "surgery", days = c(-30, -1), code_date = "admission")

# ... we then categorize once with "icd10" as the default regular expression ...
categorize(
  x           = ex_people,
  codedata    = ex_icd10,
  cc          = "charlson",
  id          = "name",
  code        = "icd10",
  index       = ind,
  codify_args = cd,
  cc_args     = list(tech_names = TRUE)
) %>%

  # .. and once more with `regex = "icd10_rcs"`
  categorize(
    codedata    = ex_icd10,
    cc          = "charlson",
    id          = "name",
    code        = "icd10",
    index       = ind,
    codify_args = cd,
    cc_args     = list(regex = "icd10_rcs", tech_names = TRUE)
  )
#> Classification based on: icd10
#> Error in gsub(finish, start, ..., fixed = TRUE): zero-length pattern

# column names ------------------------------------------------------------

# Default column names are based on row names from corresponding classcodes
# object but are modified to be syntactically correct.
default <-
  categorize(ex_people, codedata = ex_icd10, cc = "elixhauser",
             id = "name", code = "icd10")
#> Classification based on: icd10

# Set `check.names = FALSE` to retain original names:
original <-
  categorize(
    ex_people, codedata = ex_icd10, cc = "elixhauser",
    id = "name", code = "icd10",
    check.names = FALSE
  )
#> Classification based on: icd10

# Or use `tech_names = TRUE` for informative but long names (use case above)
tech <-
  categorize(
    ex_people, codedata = ex_icd10, cc = "elixhauser",
    id = "name", code = "icd10",
    cc_args = list(tech_names = TRUE)
  )
#> Classification based on: icd10

# Compare
tibble::tibble(names(default), names(original), names(tech))
#> Error in gsub(finish, start, ..., fixed = TRUE): zero-length pattern