Skip to contents

This function redacts the columns specified in columns in the data given in data using dittodb's standard redactors.

Usage

redact_columns(data, columns, ignore.case = TRUE, ...)

Arguments

data

a dataframe to redact

columns

character, the columns to redact

ignore.case

should case be ignored? (default: TRUE)

...

additional options to pass on to grep() when matching the column names

Value

data, with the columns specified in columns duly redacted

Details

The column names given in the columns argument are treated as regular expressions, however they always have ^ and $ added to the beginning and end of the strings. So if you would like to match any column that starts with the string sensitive (e.g. sensitive_name, sensitive_date) you could use "sensitive.* and this would catch all of those columns (though it would not catch a column called most_sensitive_name).

The standard redactors replace all values in the column with the following values based on the columns type:

  • integer -- 9L

  • numeric -- 9

  • character -- "[redacted]"

  • POSIXct (date times) -- as.POSIXct("1988-10-11T17:00:00", tz = tzone)

Examples

if (check_for_pkg("nycflights13", message)) {
  small_flights <- head(nycflights13::flights)

  # with no columns specified, redacting does nothing
  redact_columns(small_flights, columns = NULL)

  # integer
  redact_columns(small_flights, columns = c("arr_time"))

  # numeric
  redact_columns(small_flights, columns = c("arr_delay"))

  # characters
  redact_columns(small_flights, columns = c("origin", "dest"))

  # datetiems
  redact_columns(small_flights, columns = c("time_hour"))
}
#> # A tibble: 6 × 19
#>    year month   day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
#>   <int> <int> <int>    <int>       <int>   <dbl>   <int>   <int>   <dbl> <chr>  
#> 1  2013     1     1      517         515       2     830     819      11 UA     
#> 2  2013     1     1      533         529       4     850     830      20 UA     
#> 3  2013     1     1      542         540       2     923     850      33 AA     
#> 4  2013     1     1      544         545      -1    1004    1022     -18 B6     
#> 5  2013     1     1      554         600      -6     812     837     -25 DL     
#> 6  2013     1     1      554         558      -4     740     728      12 UA     
#> # … with 9 more variables: flight <int>, tailnum <chr>, origin <chr>,
#> #   dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
#> #   time_hour <dttm>, and abbreviated variable names ¹​sched_dep_time,
#> #   ²​dep_delay, ³​arr_time, ⁴​sched_arr_time, ⁵​arr_delay