Each column in the source dataset must be assigned a field type (one of the ft_xx functions below), depending on the kind of data it contains. This is done through a field_types() specification.
Usage
ft_timepoint(includes_time = TRUE, format = "", na = NULL)
ft_uniqueidentifier(na = NULL)
ft_categorical(aggregate_by_each_category = FALSE, na = NULL)
ft_numeric(na = NULL)
ft_datetime(includes_time = TRUE, format = "", na = NULL)
ft_freetext(na = NULL)
ft_simple(na = NULL)
ft_strata(na = NULL)
ft_ignore()
Arguments
- includes_time
If TRUE, additional aggregated values will be generated using the time portion (if no time portion is present, midnight is assumed). If FALSE, aggregated values will ignore any time portion. Default = TRUE.
- format
Where datetime values are not in the format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS, an alternative format can be specified per field, using readr::col_datetime() format specifications, e.g. format = "%d/%m/%Y". When a format is supplied, it must match the complete string.
- na
Column-specific vector of strings that should be interpreted as missing values (in addition to those specified at dataset level).
- aggregate_by_each_category
If TRUE, aggregated values will be generated for each distinct subcategory as well as for the field overall. If FALSE, aggregated values will only be generated for the field overall. Default = FALSE.
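The arguments above can be combined in a single specification. The sketch below assumes the daiquiri package is installed; all column names are hypothetical, and only the argument usage follows the descriptions above.

```r
library(daiquiri)

# Hypothetical column names; argument usage follows the Arguments section
fts_custom <- field_types(
  # Timestamps stored as e.g. "25/12/2021 14:30": the readr-style format
  # must match the complete string
  AdmissionTime = ft_timepoint(format = "%d/%m/%Y %H:%M"),
  # Date-only values: ignore any time portion when aggregating
  DischargeDate = ft_datetime(includes_time = FALSE, format = "%d/%m/%Y"),
  # Generate aggregated values for each ward as well as for the field overall
  Ward = ft_categorical(aggregate_by_each_category = TRUE),
  # Treat the (hypothetical) sentinel "-999" as missing for this column only
  HeartRate = ft_numeric(na = c("-999"))
)
```

Column-specific na values are added to, rather than replacing, the missing-value strings specified at dataset level.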
Details
ft_timepoint() identifies the data field which should be used as the independent time variable. There should be one and only one of these specified.
ft_uniqueidentifier() identifies data fields which contain a (usually computer-generated) identifier for an entity, e.g. a patient. It does not need to be unique within the dataset.
ft_categorical() identifies data fields which should be treated as categorical.
ft_numeric() identifies data fields which contain numeric values that should be treated as continuous. Any values which contain non-numeric characters (including grouping marks) will be classed as non-conformant.
ft_datetime() identifies data fields which contain date values that should be treated as continuous.
ft_freetext() identifies data fields which contain free text values. Only presence/missingness will be evaluated.
ft_simple() identifies data fields where you only want presence/missingness to be evaluated (but which are not necessarily free text).
ft_strata() identifies a categorical data field which should be used to stratify the rest of the data.
ft_ignore() identifies data fields which should be ignored. These will not be loaded.
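As a sketch of how these types fit together, the specification below (assuming daiquiri is installed; the laboratory column names are hypothetical) uses exactly one ft_timepoint(), stratifies by site with ft_strata(), and distinguishes ft_simple() from ft_freetext() and ft_ignore().

```r
library(daiquiri)

# Hypothetical columns for a laboratory extract
fts_strat <- field_types(
  SampleID       = ft_uniqueidentifier(), # entity identifier, need not be unique
  CollectionDate = ft_timepoint(),        # the single independent time variable
  Site           = ft_strata(),           # categorical field used to stratify the data
  LabCode        = ft_simple(),           # only presence/missingness is evaluated
  Notes          = ft_freetext(),         # free text: only presence/missingness
  InternalRef    = ft_ignore()            # not loaded at all
)
```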
Examples
fts <- field_types(
PatientID = ft_uniqueidentifier(),
TestID = ft_ignore(),
TestDate = ft_timepoint(),
TestName = ft_categorical(aggregate_by_each_category = FALSE),
TestResult = ft_numeric(),
ResultDate = ft_datetime(),
ResultComment = ft_freetext(),
Location = ft_categorical()
)
ft_simple()
#> $type
#> [1] "simple"
#>
#> $collector
#> <collector_character>
#>
#> $data_class
#> [1] "character"
#>
#> $aggregation_functions
#> [1] "n" "missing_n" "missing_perc"
#>
#> $na
#> NULL
#>
#> $options
#> NULL
#>
#> attr(,"class")
#> [1] "daiquiri_field_type_simple" "daiquiri_field_type"