While skim is designed around having an opinionated set of defaults, you can use this function to change the summary statistics that it returns.
Usage
skim_with(
  ...,
  base = sfl(n_missing = n_missing, complete_rate = complete_rate),
  append = TRUE
)
Details
skim_with() is a closure: a function that returns a new function. This lets you have several skimming functions in a single R session, but it also means that you need to assign the return value of skim_with() before you can use it.
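As a minimal sketch of what the closure behavior means in practice, the snippet below creates two independent skimming functions and keeps both around in the same session. The names skim_median and skim_range are hypothetical and chosen only for illustration.

```r
library(skimr)

# Each call to skim_with() returns a new function; nothing is skimmed yet.
skim_median <- skim_with(numeric = sfl(median = median), append = FALSE)
skim_range <- skim_with(numeric = sfl(min = min, max = max), append = FALSE)

# Both objects are ordinary functions and can be used independently.
is.function(skim_median)
is.function(skim_range)

# Apply each one to the same data frame to get different summaries.
skim_median(faithful)
skim_range(faithful)
```

Calling skim_with() without assigning the result builds the function and then discards it, which is why the assignment step is needed.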
You assign values within skim_with() by using the sfl() helper (skimr function list). This helper behaves mostly like the now-deprecated dplyr::funs(), but also lets you identify which skimming functions you want to remove, by setting them to NULL. Assign an sfl to each column type that you wish to modify.
Functions that summarize all data types, and always return the same type of value, can be assigned to the base argument. The default base skimmers compute the number of missing values, n_missing(), and the rate of values being complete (i.e. not missing), complete_rate().
When append = TRUE and local skimmers have names matching the names of entries in the default skim_function_list, the values in the default list are overwritten. Similarly, if NULL values are passed within sfl(), these default skimmers are dropped. Otherwise, if append = FALSE, only the locally-provided skimming functions are used.
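The difference between the two append modes can be seen by comparing the column names of the results. This is a sketch only; the names appended and replaced are hypothetical, and it assumes the default numeric skimmers include mean and hist, as the example output on this page suggests.

```r
library(skimr)

# append = TRUE (the default): add iqr, drop hist, keep the other defaults.
appended <- skim_with(numeric = sfl(iqr = IQR, hist = NULL))

# append = FALSE: use only the locally-provided numeric skimmers.
replaced <- skim_with(numeric = sfl(iqr = IQR), append = FALSE)

# Summary columns for numeric variables are prefixed with "numeric.".
names(appended(faithful))
names(replaced(faithful))
```

In both cases the base skimmers (n_missing and complete_rate) remain, since append affects only the typed skimmers.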
Note that append only applies to the typed skimmers (i.e. non-base). See get_default_skimmer_names() for a list of defaults.
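Before overriding or removing a skimmer, it can help to check which names are in use, since matching names is what triggers an overwrite. The sketch below assumes that get_default_skimmer_names() returns a list named by column type; the variable defaults is hypothetical.

```r
library(skimr)

# Retrieve the names of the default skimmers for every supported type.
defaults <- get_default_skimmer_names()

# Column types that have default skimmers.
names(defaults)

# Names of the default skimmers for numeric columns; use these names
# inside sfl() to overwrite a skimmer or to set it to NULL.
defaults[["numeric"]]
```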
Examples
# Use new functions for numeric columns. If you don't provide a name,
# one will be automatically generated.
my_skim <- skim_with(numeric = sfl(median, mad), append = FALSE)
my_skim(faithful)
#> ── Data Summary ────────────────────────
#> Values
#> Name faithful
#> Number of rows 272
#> Number of columns 2
#> _______________________
#> Column type frequency:
#> numeric 2
#> ________________________
#> Group variables None
#>
#> ── Variable type: numeric ──────────────────────────────────────────────────────
#> skim_variable n_missing complete_rate median mad
#> 1 eruptions 0 1 4 0.951
#> 2 waiting 0 1 76 11.9
# If you want to remove a particular skimmer, set it to NULL
# This removes the inline histogram
my_skim <- skim_with(numeric = sfl(hist = NULL))
my_skim(faithful)
#> ── Data Summary ────────────────────────
#> Values
#> Name faithful
#> Number of rows 272
#> Number of columns 2
#> _______________________
#> Column type frequency:
#> numeric 2
#> ________________________
#> Group variables None
#>
#> ── Variable type: numeric ──────────────────────────────────────────────────────
#> skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
#> 1 eruptions 0 1 3.49 1.14 1.6 2.16 4 4.45 5.1
#> 2 waiting 0 1 70.9 13.6 43 58 76 82 96
# This works with multiple skimmers. Just match names to overwrite
my_skim <- skim_with(numeric = sfl(iqr = IQR, p25 = NULL, p75 = NULL))
my_skim(faithful)
#> ── Data Summary ────────────────────────
#> Values
#> Name faithful
#> Number of rows 272
#> Number of columns 2
#> _______________________
#> Column type frequency:
#> numeric 2
#> ________________________
#> Group variables None
#>
#> ── Variable type: numeric ──────────────────────────────────────────────────────
#> skim_variable n_missing complete_rate mean sd p0 p50 p100 hist iqr
#> 1 eruptions 0 1 3.49 1.14 1.6 4 5.1 ▇▂▂▇▇ 2.29
#> 2 waiting 0 1 70.9 13.6 43 76 96 ▃▃▂▇▂ 24
# Alternatively, set `append = FALSE` to replace the skimmers of a type.
my_skim <- skim_with(numeric = sfl(mean = mean, sd = sd), append = FALSE)
# Skimmers are unary functions. Partially apply arguments during assignment.
# For example, you might want to remove NA values.
my_skim <- skim_with(numeric = sfl(iqr = ~ IQR(., na.rm = TRUE)))
# Set multiple types of skimmers simultaneously.
my_skim <- skim_with(numeric = sfl(mean), character = sfl(length))
# Or pass the same as a list, unquoting the input.
my_skimmers <- list(numeric = sfl(mean), character = sfl(length))
my_skim <- skim_with(!!!my_skimmers)
# Use the v1 base skimmers instead.
my_skim <- skim_with(base = sfl(
  missing = n_missing,
  complete = n_complete,
  n = length
))
# Remove the base skimmers entirely
my_skim <- skim_with(base = NULL)