Source: R/codify.R, codify.Rd
This is the first step of codify() %>% classify() %>% index().
The function combines case data from one data set with related code data from
a second source, possibly limited to codes valid at certain time points
relative to case dates.
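The base R illustration below uses made-up data to make the combination step concrete. It uses merge() with all.x = TRUE as a stand-in for the join. Treating codify() as this kind of left join is an assumption based on the description of its output on this page. It is not a statement about how the package implements it.

```r
# Made-up case data: one row per person (the id column is "name")
x <- data.frame(name = c("A", "B", "C"))
# Made-up code data: A has two codes, B has one, C has none
codedata <- data.frame(
  name  = c("A", "A", "B"),
  icd10 = c("X123", "Y456", "Z789")
)

# Left join on the id column: each row of x is repeated once per matching
# code, and a case without codes keeps a single row with NA as its code
combined <- merge(x, codedata, by = "name", all.x = TRUE)
combined
```

Case A appears twice (once per code), B once, and C once with a missing code.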
codify(x, codedata, ..., id, code, date = NULL, code_date = NULL, days = NULL)

# S3 method for data.frame
codify(x, ..., id, date = NULL, days = NULL)

# S3 method for data.table
codify(
  x,
  codedata,
  ...,
  id,
  code,
  date = NULL,
  code_date = NULL,
  days = NULL,
  alnum = FALSE,
  .copy = NA
)

# S3 method for codified
print(x, ..., n = 10)
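Argument | Description
---|---
x | data set with mandatory character id column (identified by argument `id`)
codedata | additional data with columns including case id (`id`), code (`code`) and, optionally, code date (`code_date`)
... | arguments passed between methods
id, code, date, code_date | column names with case id (in both `x` and `codedata`), code (in `codedata`) and optional dates (`date` in `x`, `code_date` in `codedata`)
days | numeric vector of length two with lower and upper bound for the range of relevant days relative to `date`
alnum | Should codes be cleaned from all non-alphanumeric characters?
.copy | Should the object be copied internally? With `.copy = FALSE` no internal copies are made, and input data tables might be changed by reference.
n | number of rows to preview as a tibble. The output is technically a data.table::data.table, which might be an unusual format to look at. Use `n = NULL` to print the object as is.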
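The following is a rough base R approximation of the cleaning that alnum = TRUE describes. The helper clean_codes() is hypothetical. Using the POSIX class [:alnum:] is an assumption about the exact rule the package applies.

```r
# Hypothetical helper approximating `alnum = TRUE`: remove every character
# that is not a letter or a digit (POSIX class [:alnum:]; "_" is removed too)
clean_codes <- function(codes) gsub("[^[:alnum:]]", "", codes)

dirty <- c("-S_1_3,4.A", ")W/3(3&1)9", "X12.3")
clean_codes(dirty) # returns "S134A" "W3319" "X123"
```

The first two cleaned values match the cleaned "icd10" column shown in the Big data example.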
Object of class codified (inheriting from data.table::data.table). Essentially x with additional columns:

code, code_date: left joined from codedata, or NA if there is no match within the period.
in_period: Boolean indicator of whether the case had at least one code within the specified period.
The output has one row for each combination of "id" from x and "code" from codedata. Rows from x might therefore be repeated.
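The rules above can be sketched in base R. codify_sketch() is a hypothetical function and not the package's implementation. It makes the following assumptions:

- Date columns.
- Inclusive window bounds.
- For a case with at least one code inside the window, out-of-window rows are dropped. A case without any such code keeps a single row with NA codes.

It ignores alnum and .copy.

```r
# Hypothetical base R sketch of the codify() idea; NOT the package's code.
# Assumes `date` (in x) and `code_date` (in codedata) are Date columns.
codify_sketch <- function(x, codedata, id, code, date = NULL,
                          code_date = NULL, days = NULL) {
  # Keep only the columns needed from the code data
  codedata <- codedata[c(id, code, code_date)]
  # Left join: rows of x are repeated once per matching code
  out <- merge(x, codedata, by = id, all.x = TRUE)
  if (is.null(days)) {
    # No time limitation: any non-missing code counts
    ok <- !is.na(out[[code]])
  } else {
    # Days from case date to code date, compared to the (inclusive) window
    delta <- as.numeric(out[[code_date]] - out[[date]])
    ok <- !is.na(delta) & delta >= days[1] & delta <= days[2]
  }
  # Blank out codes (and code dates) outside the window
  out[[code]][!ok] <- NA
  if (!is.null(code_date)) out[[code_date]][!ok] <- NA
  # TRUE for every row of a case with at least one code in the window
  out$in_period <- ave(ok, out[[id]], FUN = any)
  # Drop blanked rows for cases that do have codes in the window,
  # then collapse duplicates (one NA row per case without codes)
  out <- unique(out[ok | !out$in_period, ])
  rownames(out) <- NULL
  out
}

people <- data.frame(name = c("A", "B", "C"),
                     surgery = as.Date("2021-01-10"))
codes <- data.frame(
  name      = c("A", "A", "B"),
  icd10     = c("X123", "Y456", "Z789"),
  admission = as.Date(c("2020-12-01", "2021-02-01", "2021-03-01"))
)

# Admissions during one year before surgery (cf. days = c(-365, 0))
res <- codify_sketch(people, codes, id = "name", code = "icd10",
                     date = "surgery", code_date = "admission",
                     days = c(-365, 0))
res
```

Only A has an admission inside the window (40 days before surgery), so A keeps code X123 with in_period TRUE. B and C each keep one row with NA and in_period FALSE. Without days, all four joined rows are kept.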
Some examples for argument days:

c(-365, -1): window of one year prior to the date column of x. Useful for patient comorbidity.
c(1, 30): window of 30 days after date. Useful for adverse events after a surgical procedure.
c(-Inf, Inf): no limitation on non-missing dates.
NULL: no time limitation at all.
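The four settings above can be compared with a small window predicate. in_window() is a hypothetical helper, not part of the package. It assumes Date inputs and inclusive bounds.

```r
# Hypothetical helper mirroring the documented meaning of `days`:
# TRUE if code_date falls within [date + days[1], date + days[2]]
in_window <- function(date, code_date, days) {
  if (is.null(days)) {
    # NULL: no time limitation at all (even missing dates pass)
    return(rep(TRUE, length(code_date)))
  }
  delta <- as.numeric(code_date - date)
  # With any window, even c(-Inf, Inf), missing dates never pass
  !is.na(delta) & delta >= days[1] & delta <= days[2]
}

surgery   <- as.Date("2021-01-10")
admission <- as.Date(c("2020-03-01", "2021-01-09", "2021-01-20", NA))

in_window(surgery, admission, c(-365, -1))  # TRUE  TRUE  FALSE FALSE
in_window(surgery, admission, c(1, 30))     # FALSE FALSE TRUE  FALSE
in_window(surgery, admission, c(-Inf, Inf)) # TRUE  TRUE  TRUE  FALSE
in_window(surgery, admission, NULL)         # TRUE  TRUE  TRUE  TRUE
```

The admissions lie 315 days before, 1 day before, 10 days after, and at an unknown time relative to surgery. The difference between c(-Inf, Inf) and NULL shows up only for the missing date.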
Other verbs: categorize(), classify(), index_fun
# Codify all patients from `ex_people` with their ICD-10 codes from `ex_icd10`
x <- codify(ex_people, ex_icd10, id = "name", code = "icd10")
x
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 700 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 

# Only consider codes if recorded at hospital admissions within one year prior
# to surgery
codify(
  ex_people,
  ex_icd10,
  id        = "name",
  code      = "icd10",
  date      = "surgery",
  code_date = "admission",
  days      = c(-365, 0) # admission during one year before surgery
)
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 378 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 

# Only consider codes if recorded after surgery
codify(
  ex_people,
  ex_icd10,
  id        = "name",
  code      = "icd10",
  date      = "surgery",
  code_date = "admission",
  days      = c(1, Inf) # admission any time after surgery
)
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 355 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 

# Dirty code data ---------------------------------------------------------

# Assume that codes contain unwanted "dirty" characters
# Those could for example be a dot used by ICD-10 (i.e. X12.3 instead of X123)
dirt <- c(strsplit(c("!#%&/()=?`,.-_"), split = ""), recursive = TRUE)
rdirt <- function(x) sample(x, nrow(ex_icd10), replace = TRUE)
sub <- function(i) substr(ex_icd10$icd10, i, i)
ex_icd10$icd10 <- paste0(
  rdirt(dirt), sub(1),
  rdirt(dirt), sub(2),
  rdirt(dirt), sub(3),
  rdirt(dirt), sub(4),
  rdirt(dirt), sub(5)
)
head(ex_icd10)

# Use `alnum = TRUE` to ignore non-alphanumeric characters
codify(ex_people, ex_icd10, id = "name", code = "icd10", alnum = TRUE)
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 700 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 

# Big data ----------------------------------------------------------------

# If `x` or `codedata` are large compared to available
# Random Access Memory (RAM), it might not be possible to make internal copies
# of those objects. Setting `.copy = FALSE` might help to overcome such problems.

# If no copies are made internally, however, the input objects (if data tables)
# would change in the global environment
x2 <- data.table::as.data.table(ex_icd10)
head(x2) # Look at the "icd10" column (with dirty data)
#>                    name  admission      icd10  hdia
#> 1:        Tran, Kenneth 2020-08-02 -S_1_3,4.A FALSE
#> 2:        Tran, Kenneth 2021-01-16 )W/3(3&1)9 FALSE
#> 3:        Tran, Kenneth 2020-12-26 .Y_0%2?6?2  TRUE
#> 4:        Tran, Kenneth 2020-11-18 /X-0/4%8-8 FALSE
#> 5: Sommerville, Dominic 2021-01-07 (V!8-1)0?4 FALSE
#> 6: Sommerville, Dominic 2020-08-18  &B=8_5%3- FALSE

# Use `alnum = TRUE` combined with `.copy = FALSE`
codify(ex_people, x2, id = "name", code = "icd10", alnum = TRUE, .copy = FALSE)
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 700 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 

# Even though no explicit assignment was specified
# (neither for the output of codify(), nor to explicitly alter `x2`),
# the `x2` object has changed (look at the "icd10" column!):
head(x2)
#>                    name  admission icd10  hdia
#> 1:        Tran, Kenneth 2020-08-02 S134A FALSE
#> 2:        Tran, Kenneth 2021-01-16 W3319 FALSE
#> 3:        Tran, Kenneth 2020-12-26 Y0262  TRUE
#> 4:        Tran, Kenneth 2020-11-18 X0488 FALSE
#> 5: Sommerville, Dominic 2021-01-07 V8104 FALSE
#> 6: Sommerville, Dominic 2020-08-18  B853 FALSE

# Hence, the `.copy` argument should only be used if necessary
# and, if so, with caution!

# print.codified() --------------------------------------------------------

x # Preview first 10 rows as a tibble
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 700 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 