This is the first step of codify() %>% classify() %>% index(). The function combines case data from one data set with related code data from a second source, possibly limited to codes valid at certain time points relative to case dates.

Usage

codify(x, codedata, ..., id, code, date = NULL, code_date = NULL, days = NULL)

# S3 method for class 'data.frame'
codify(x, ..., id, date = NULL, days = NULL)

# S3 method for class 'data.table'
codify(
  x,
  codedata,
  ...,
  id,
  code,
  date = NULL,
  code_date = NULL,
  days = NULL,
  alnum = FALSE,
  .copy = NA
)

# S3 method for class 'codified'
print(x, ..., n = 10)

Arguments

x

data set with mandatory character id column (identified by argument id = "<col_name>"), and optional Date of interest (identified by argument date = "<col_name>"). Alternatively, the output from codify()

codedata

additional data with columns including case id (character), code and an optional date (Date) for each code. An optional column condition might distinguish codes/dates with certain characteristics (see example).

...

arguments passed between methods

id, code, date, code_date

column names with case id (character, present in both x and codedata), code (from codedata), and optional date (Date from x) and code_date (Date from codedata).

days

numeric vector of length two with lower and upper bound for range of relevant days relative to date. See "Relevant period".

alnum

Should all non-alphanumeric characters be removed from the codes?

.copy

Should the object be copied internally by data.table::copy()? NA (by default) means that objects smaller than 1 GB are copied. If the size is larger, the argument must be set explicitly. Set TRUE to make copies regardless of object size. This is recommended if enough RAM is available. If set to FALSE, calculations might be carried out but the object will be changed by reference. IMPORTANT! This might lead to undesired consequences and should only be used if absolutely necessary!

n

number of rows to preview as tibble. The output is technically a data.table::data.table, which might be an unusual format to look at. Use n = NULL to print the object as is.
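
The difference between copying and changing by reference can be illustrated outside the package. The sketch below is not the internal code of codify(): clean_codes() is a hypothetical helper that strips non-alphanumeric characters (as alnum = TRUE is described to do) and either copies its input first or modifies it in place. It assumes that the data.table package is installed.

```r
# Hypothetical helper (not part of the package): removes all
# non-alphanumeric characters from the "code" column of a data.table.
clean_codes <- function(dt, copy = TRUE) {
  # With copy = TRUE the caller's object is left untouched
  if (copy) dt <- data.table::copy(dt)
  # set() assigns the column by reference, i.e. without copying `dt`
  data.table::set(dt, j = "code", value = gsub("[^[:alnum:]]", "", dt[["code"]]))
  dt
}

codes <- data.table::data.table(id = c("a", "b"), code = c("X12.3", "E-01"))

# Copying first: the result is cleaned, the input keeps its dirty codes
cleaned <- clean_codes(codes, copy = TRUE)
unchanged_after_copy <- identical(codes[["code"]], c("X12.3", "E-01"))

# No copy: the input itself is altered, although nothing was assigned
invisible(clean_codes(codes, copy = FALSE))
changed_by_reference <- identical(codes[["code"]], c("X123", "E01"))
```

Skipping the copy saves memory, at the price that the caller's object is silently altered.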

Value

Object of class codified (inheriting from data.table::data.table). Essentially x with additional columns:

  • code, code_date: left joined from codedata or NA if no match within period.

  • in_period: Boolean indicator if the case had at least one code within the specified period.

The output has one row for each combination of "id" from x and "code" from codedata. Rows from x might be repeated accordingly.
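
The row structure described above resembles an ordinary left join. A minimal sketch in base R, using made-up data and merge() rather than whatever join the package performs internally (the object names are illustrative only):

```r
# Made-up case data: three cases, one row each
people <- data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE)

# Made-up code data: two codes for "a", one for "b", none for "c"
codes <- data.frame(
  id   = c("a", "a", "b"),
  code = c("E012", "R900", "N608"),
  stringsAsFactors = FALSE
)

# Left join: keep every case, repeat it once per matching code,
# and fill `code` with NA where no code exists
out <- merge(people, codes, by = "id", all.x = TRUE)

n_rows      <- nrow(out)                  # one row per id/code combination
rows_for_a  <- sum(out$id == "a")         # case "a" is repeated
code_for_c  <- out$code[out$id == "c"]    # NA, since "c" has no codes
```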

Relevant period

Some examples for argument days:

  • c(-365, -1): window of one year prior to the date column of x. Useful for patient comorbidity.

  • c(1, 30): window of 30 days after date. Useful for adverse events after a surgical procedure.

  • c(-Inf, Inf): no limitation on non-missing dates.

  • NULL: no time limitation at all.
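
The window logic in the list above can be sketched in base R. The helper in_window() is hypothetical and not part of the package, and it assumes that both bounds are inclusive, which the descriptions above suggest but do not state explicitly.

```r
# Hypothetical helper: is each code date within `days` of the case date?
in_window <- function(date, code_date, days = NULL) {
  # NULL means no time limitation at all (missing dates also qualify)
  if (is.null(days)) return(rep(TRUE, length(code_date)))
  # Number of days from the case date to each code date
  diff_days <- as.numeric(code_date - date)
  # Missing code dates never fall within a window
  !is.na(diff_days) & diff_days >= days[1] & diff_days <= days[2]
}

surgery   <- as.Date("2022-10-24")
admission <- as.Date(c("2022-05-06", "2022-12-27", "2023-01-13", NA))

before_surgery <- in_window(surgery, admission, c(-365, 0))
after_surgery  <- in_window(surgery, admission, c(1, Inf))
any_known_date <- in_window(surgery, admission, c(-Inf, Inf))
no_limitation  <- in_window(surgery, admission, NULL)
```

Here only the first admission lies in the year before surgery, the second and third lie after it, and the missing date is accepted only when days = NULL.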

See also

Other verbs: categorize(), classify(), index_fun

Examples

# Codify all patients from `ex_people` with their ICD-10 codes from `ex_icd10`
x <- codify(ex_people, ex_icd10, id = "name", code = "icd10")
x
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 700 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 
#> # A tibble: 10 × 6
#>    name                admission  icd10 hdia  in_period surgery   
#>    <chr>               <date>     <chr> <lgl> <lgl>     <date>    
#>  1 Archer, Leon Hunter 2022-12-27 B469  FALSE TRUE      2022-10-24
#>  2 Archer, Leon Hunter 2022-05-06 E012  FALSE TRUE      2022-10-24
#>  3 Archer, Leon Hunter 2023-01-13 R900  FALSE TRUE      2022-10-24
#>  4 Archer, Leon Hunter 2022-07-12 V7413 FALSE TRUE      2022-10-24
#>  5 Archer, Leon Hunter 2022-05-28 V8698 FALSE TRUE      2022-10-24
#>  6 Archer, Leon Hunter 2022-06-18 X3403 FALSE TRUE      2022-10-24
#>  7 Archer, Leon Hunter 2022-06-17 X4128 FALSE TRUE      2022-10-24
#>  8 Archer, Leon Hunter 2022-07-04 Z752  FALSE TRUE      2022-10-24
#>  9 Awtrey, Antonio     2023-02-24 N608  FALSE TRUE      2023-02-19
#> 10 Awtrey, Antonio     2022-12-27 W0341 FALSE TRUE      2023-02-19

# Only consider codes if recorded at hospital admissions within one year prior
# to surgery
codify(
  ex_people,
  ex_icd10,
  id        = "name",
  code      = "icd10",
  date      = "surgery",
  code_date = "admission",
  days      = c(-365, 0)   # admission during one year before surgery
)
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 378 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 
#> # A tibble: 10 × 6
#>    name                surgery    admission  icd10 hdia  in_period
#>    <chr>               <date>     <date>     <chr> <lgl> <lgl>    
#>  1 Archer, Leon Hunter 2022-10-24 2022-05-06 E012  FALSE TRUE     
#>  2 Archer, Leon Hunter 2022-10-24 2022-05-28 V8698 FALSE TRUE     
#>  3 Archer, Leon Hunter 2022-10-24 2022-06-17 X4128 FALSE TRUE     
#>  4 Archer, Leon Hunter 2022-10-24 2022-06-18 X3403 FALSE TRUE     
#>  5 Archer, Leon Hunter 2022-10-24 2022-07-04 Z752  FALSE TRUE     
#>  6 Archer, Leon Hunter 2022-10-24 2022-07-12 V7413 FALSE TRUE     
#>  7 Awtrey, Antonio     2023-02-19 2022-06-14 X3322 FALSE TRUE     
#>  8 Awtrey, Antonio     2023-02-19 2022-09-04 Y1614 FALSE TRUE     
#>  9 Awtrey, Antonio     2023-02-19 2022-09-07 X7564 FALSE TRUE     
#> 10 Awtrey, Antonio     2023-02-19 2022-10-25 X6542 FALSE TRUE     

# Only consider codes if recorded after surgery
codify(
  ex_people,
  ex_icd10,
  id        = "name",
  code      = "icd10",
  date      = "surgery",
  code_date = "admission",
  days      = c(1, Inf)     # admission any time after surgery
)
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 355 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 
#> # A tibble: 10 × 6
#>    name                surgery    admission  icd10 hdia  in_period
#>    <chr>               <date>     <date>     <chr> <lgl> <lgl>    
#>  1 Archer, Leon Hunter 2022-10-24 2022-12-27 B469  FALSE TRUE     
#>  2 Archer, Leon Hunter 2022-10-24 2023-01-13 R900  FALSE TRUE     
#>  3 Awtrey, Antonio     2023-02-19 2023-02-24 N608  FALSE TRUE     
#>  4 Bammesberger, Jozi  2022-08-24 2022-09-27 V4931 FALSE TRUE     
#>  5 Bammesberger, Jozi  2022-08-24 2022-10-11 V4960 FALSE TRUE     
#>  6 Bammesberger, Jozi  2022-08-24 2022-12-01 X6414 FALSE TRUE     
#>  7 Bammesberger, Jozi  2022-08-24 2022-12-20 Y1513 FALSE TRUE     
#>  8 Bammesberger, Jozi  2022-08-24 2023-03-10 P293A FALSE TRUE     
#>  9 Banks, Silbret      2022-11-17 2022-12-05 D229D FALSE TRUE     
#> 10 Banks, Silbret      2022-11-17 2022-12-25 V7452 TRUE  TRUE     


# Dirty code data ---------------------------------------------------------

# Assume that codes contain unwanted "dirty" characters
# Those could for example be a dot used by ICD-10 (e.g. X12.3 instead of X123)
dirt <- c(strsplit(c("!#%&/()=?`,.-_"), split = ""), recursive = TRUE)
rdirt <- function(x) sample(x, nrow(ex_icd10), replace = TRUE)
sub <- function(i) substr(ex_icd10$icd10, i, i)
ex_icd10$icd10 <-
  paste0(
    rdirt(dirt), sub(1),
    rdirt(dirt), sub(2),
    rdirt(dirt), sub(3),
    rdirt(dirt), sub(4),
    rdirt(dirt), sub(5)
  )
head(ex_icd10)
#> # A tibble: 6 × 4
#>   name                 admission  icd10      hdia 
#>   <chr>                <date>     <chr>      <lgl>
#> 1 Tran, Kenneth        2022-09-11 -S_1_3,4.A FALSE
#> 2 Tran, Kenneth        2023-02-25 )W/3(3&1)9 FALSE
#> 3 Tran, Kenneth        2023-02-04 .Y_0%2?6?2 TRUE 
#> 4 Tran, Kenneth        2022-12-28 /X-0/4%8-8 FALSE
#> 5 Sommerville, Dominic 2023-02-16 (V!8-1)0?4 FALSE
#> 6 Sommerville, Dominic 2022-09-27 &B=8_5%3-  FALSE

# Use `alnum = TRUE` to ignore non-alphanumeric characters
codify(ex_people, ex_icd10, id = "name", code = "icd10", alnum = TRUE)
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 700 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 
#> # A tibble: 10 × 6
#>    name                admission  icd10 hdia  in_period surgery   
#>    <chr>               <date>     <chr> <lgl> <lgl>     <date>    
#>  1 Archer, Leon Hunter 2022-06-18 X3403 FALSE TRUE      2022-10-24
#>  2 Archer, Leon Hunter 2022-05-28 V8698 FALSE TRUE      2022-10-24
#>  3 Archer, Leon Hunter 2023-01-13 R900  FALSE TRUE      2022-10-24
#>  4 Archer, Leon Hunter 2022-07-04 Z752  FALSE TRUE      2022-10-24
#>  5 Archer, Leon Hunter 2022-12-27 B469  FALSE TRUE      2022-10-24
#>  6 Archer, Leon Hunter 2022-06-17 X4128 FALSE TRUE      2022-10-24
#>  7 Archer, Leon Hunter 2022-05-06 E012  FALSE TRUE      2022-10-24
#>  8 Archer, Leon Hunter 2022-07-12 V7413 FALSE TRUE      2022-10-24
#>  9 Awtrey, Antonio     2023-02-24 N608  FALSE TRUE      2023-02-19
#> 10 Awtrey, Antonio     2022-12-27 W0341 FALSE TRUE      2023-02-19



# Big data ----------------------------------------------------------------

# If `x` or `codedata` are large compared to available
# Random Access Memory (RAM) it might not be possible to make internal copies
# of those objects. Setting `.copy = FALSE` might help to overcome such problems

# If no copies are made internally, however, the input objects (if data tables)
# would change in the global environment
x2 <- data.table::as.data.table(ex_icd10)
head(x2) # Look at the "icd10" column (with dirty data)
#>                    name  admission      icd10   hdia
#>                  <char>     <Date>     <char> <lgcl>
#> 1:        Tran, Kenneth 2022-09-11 -S_1_3,4.A  FALSE
#> 2:        Tran, Kenneth 2023-02-25 )W/3(3&1)9  FALSE
#> 3:        Tran, Kenneth 2023-02-04 .Y_0%2?6?2   TRUE
#> 4:        Tran, Kenneth 2022-12-28 /X-0/4%8-8  FALSE
#> 5: Sommerville, Dominic 2023-02-16 (V!8-1)0?4  FALSE
#> 6: Sommerville, Dominic 2022-09-27  &B=8_5%3-  FALSE

# Use `alnum = TRUE` combined with `.copy = FALSE`
codify(ex_people, x2, id = "name", code = "icd10", alnum = TRUE, .copy = FALSE)
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 700 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 
#> # A tibble: 10 × 6
#>    name                admission  icd10 hdia  in_period surgery   
#>    <chr>               <date>     <chr> <lgl> <lgl>     <date>    
#>  1 Archer, Leon Hunter 2023-01-13 R900  FALSE TRUE      2022-10-24
#>  2 Archer, Leon Hunter 2022-06-18 X3403 FALSE TRUE      2022-10-24
#>  3 Archer, Leon Hunter 2022-06-17 X4128 FALSE TRUE      2022-10-24
#>  4 Archer, Leon Hunter 2022-07-04 Z752  FALSE TRUE      2022-10-24
#>  5 Archer, Leon Hunter 2022-05-06 E012  FALSE TRUE      2022-10-24
#>  6 Archer, Leon Hunter 2022-07-12 V7413 FALSE TRUE      2022-10-24
#>  7 Archer, Leon Hunter 2022-05-28 V8698 FALSE TRUE      2022-10-24
#>  8 Archer, Leon Hunter 2022-12-27 B469  FALSE TRUE      2022-10-24
#>  9 Awtrey, Antonio     2022-10-25 X6542 FALSE TRUE      2023-02-19
#> 10 Awtrey, Antonio     2023-01-27 X4078 FALSE TRUE      2023-02-19

# Even though no explicit assignment was specified
# (neither for the output of codify(), nor to explicitly alter `x2`),
# the `x2` object has changed (look at the "icd10" column!):
head(x2)
#>                    name  admission  icd10   hdia
#>                  <char>     <Date> <char> <lgcl>
#> 1:        Tran, Kenneth 2022-09-11  S134A  FALSE
#> 2:        Tran, Kenneth 2023-02-25  W3319  FALSE
#> 3:        Tran, Kenneth 2023-02-04  Y0262   TRUE
#> 4:        Tran, Kenneth 2022-12-28  X0488  FALSE
#> 5: Sommerville, Dominic 2023-02-16  V8104  FALSE
#> 6: Sommerville, Dominic 2022-09-27   B853  FALSE

# Hence, the `.copy` argument should only be used if necessary
# and if so, with caution!


# print.codified() --------------------------------------------------------

x # Preview first 10 rows as a tibble
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 700 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 
#> # A tibble: 10 × 6
#>    name                admission  icd10 hdia  in_period surgery   
#>    <chr>               <date>     <chr> <lgl> <lgl>     <date>    
#>  1 Archer, Leon Hunter 2022-12-27 B469  FALSE TRUE      2022-10-24
#>  2 Archer, Leon Hunter 2022-05-06 E012  FALSE TRUE      2022-10-24
#>  3 Archer, Leon Hunter 2023-01-13 R900  FALSE TRUE      2022-10-24
#>  4 Archer, Leon Hunter 2022-07-12 V7413 FALSE TRUE      2022-10-24
#>  5 Archer, Leon Hunter 2022-05-28 V8698 FALSE TRUE      2022-10-24
#>  6 Archer, Leon Hunter 2022-06-18 X3403 FALSE TRUE      2022-10-24
#>  7 Archer, Leon Hunter 2022-06-17 X4128 FALSE TRUE      2022-10-24
#>  8 Archer, Leon Hunter 2022-07-04 Z752  FALSE TRUE      2022-10-24
#>  9 Awtrey, Antonio     2023-02-24 N608  FALSE TRUE      2023-02-19
#> 10 Awtrey, Antonio     2022-12-27 W0341 FALSE TRUE      2023-02-19
print(x, n = 20) # Preview first 20 rows as a tibble
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 700 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 
#> # A tibble: 20 × 6
#>    name                admission  icd10 hdia  in_period surgery   
#>    <chr>               <date>     <chr> <lgl> <lgl>     <date>    
#>  1 Archer, Leon Hunter 2022-12-27 B469  FALSE TRUE      2022-10-24
#>  2 Archer, Leon Hunter 2022-05-06 E012  FALSE TRUE      2022-10-24
#>  3 Archer, Leon Hunter 2023-01-13 R900  FALSE TRUE      2022-10-24
#>  4 Archer, Leon Hunter 2022-07-12 V7413 FALSE TRUE      2022-10-24
#>  5 Archer, Leon Hunter 2022-05-28 V8698 FALSE TRUE      2022-10-24
#>  6 Archer, Leon Hunter 2022-06-18 X3403 FALSE TRUE      2022-10-24
#>  7 Archer, Leon Hunter 2022-06-17 X4128 FALSE TRUE      2022-10-24
#>  8 Archer, Leon Hunter 2022-07-04 Z752  FALSE TRUE      2022-10-24
#>  9 Awtrey, Antonio     2023-02-24 N608  FALSE TRUE      2023-02-19
#> 10 Awtrey, Antonio     2022-12-27 W0341 FALSE TRUE      2023-02-19
#> 11 Awtrey, Antonio     2022-06-14 X3322 FALSE TRUE      2023-02-19
#> 12 Awtrey, Antonio     2023-01-27 X4078 FALSE TRUE      2023-02-19
#> 13 Awtrey, Antonio     2022-10-25 X6542 FALSE TRUE      2023-02-19
#> 14 Awtrey, Antonio     2022-09-07 X7564 FALSE TRUE      2023-02-19
#> 15 Awtrey, Antonio     2022-12-21 Y0492 FALSE TRUE      2023-02-19
#> 16 Awtrey, Antonio     2022-09-04 Y1614 FALSE TRUE      2023-02-19
#> 17 Bammesberger, Jozi  2023-03-10 P293A FALSE TRUE      2022-08-24
#> 18 Bammesberger, Jozi  2022-04-21 V1051 FALSE TRUE      2022-08-24
#> 19 Bammesberger, Jozi  2022-06-28 V1392 FALSE TRUE      2022-08-24
#> 20 Bammesberger, Jozi  2022-09-27 V4931 FALSE TRUE      2022-08-24
print(x, n = NULL) # Print as data.table (ignoring the 'codified' class)
#> 
#> The printed data is of class: codified, data.table, data.frame.
#> It has 700 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#> 
#> # A tibble: 10 × 6
#>    name                admission  icd10 hdia  in_period surgery   
#>    <chr>               <date>     <chr> <lgl> <lgl>     <date>    
#>  1 Archer, Leon Hunter 2022-12-27 B469  FALSE TRUE      2022-10-24
#>  2 Archer, Leon Hunter 2022-05-06 E012  FALSE TRUE      2022-10-24
#>  3 Archer, Leon Hunter 2023-01-13 R900  FALSE TRUE      2022-10-24
#>  4 Archer, Leon Hunter 2022-07-12 V7413 FALSE TRUE      2022-10-24
#>  5 Archer, Leon Hunter 2022-05-28 V8698 FALSE TRUE      2022-10-24
#>  6 Archer, Leon Hunter 2022-06-18 X3403 FALSE TRUE      2022-10-24
#>  7 Archer, Leon Hunter 2022-06-17 X4128 FALSE TRUE      2022-10-24
#>  8 Archer, Leon Hunter 2022-07-04 Z752  FALSE TRUE      2022-10-24
#>  9 Awtrey, Antonio     2023-02-24 N608  FALSE TRUE      2023-02-19
#> 10 Awtrey, Antonio     2022-12-27 W0341 FALSE TRUE      2023-02-19