Popular file readers such as readr::read_delim() perform datatype conversion
by default, which can interfere with daiquiri's ability to detect
non-conformant values. Use this function instead to ensure optimal
compatibility with daiquiri's features.
Usage
read_data(
  file,
  delim = NULL,
  col_names = TRUE,
  quote = "\"",
  trim_ws = TRUE,
  comment = "",
  skip = 0,
  n_max = Inf,
  show_progress = TRUE
)
Arguments
- file
A string containing the path of the file containing the data to load, or a URL starting with http://, file://, etc. Compressed files with extension .gz, .bz2, .xz and .zip are supported.
- delim
Single character used to separate fields within a record. E.g. "," or "\t"
- col_names
Either TRUE, FALSE or a character vector of column names. If TRUE, the first row of the input will be used as the column names, and will not be included in the data frame. If FALSE, column names will be generated automatically. Default = TRUE
- quote
Single character used to quote strings.
- trim_ws
Should leading and trailing whitespace be trimmed from each field?
- comment
A string used to identify comments. Any text after the comment characters will be silently ignored.
- skip
Number of lines to skip before reading data. If comment is supplied, any commented lines are ignored after skipping.
- n_max
Maximum number of lines to read.
- show_progress
Display a progress bar? Default = TRUE
Details
This function is aimed at non-expert users of R, and operates as a restricted
implementation of readr::read_delim(). If you prefer to use read_delim()
directly, ensure you set the following parameters:
col_types = readr::cols(.default = "c") and na = character().
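As a rough illustration of those two settings, the sketch below calls readr::read_delim() directly on a small temporary file. The file contents and column names are made up for this example and are not part of daiquiri; the point is only that every column is read as character and that no string is treated as missing.

```r
# Hypothetical sample data, written to a temporary file for illustration only
tmp <- tempfile(fileext = ".csv")
writeLines(
  c(
    "id,dose,start_date",
    "001,500,2021-01-01",
    "002,NA,not a date",
    "003,1000,2021-01-03"
  ),
  tmp
)

# Read every column as character, and treat no strings as missing values
raw <- readr::read_delim(
  tmp,
  delim = ",",
  col_types = readr::cols(.default = "c"),
  na = character()
)

# Inspect the column types: all should be character
vapply(raw, class, character(1))
```

With these settings, values such as the leading zeros in id and the literal string "NA" in dose should reach daiquiri unchanged, rather than being converted to numbers or missing values by the reader.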
Examples
raw_data <- read_data(
  system.file("extdata", "example_prescriptions.csv", package = "daiquiri"),
  delim = ",",
  col_names = TRUE
)
head(raw_data)
#> # A tibble: 6 × 8
#> PrescriptionID PrescriptionDate AdmissionDate Drug Dose DoseUnit PatientID
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 6000 2021-01-01 00:00:… 2020-12-31 Ceft… 500 mg 4993679
#> 2 6001 NULL 2020-12-31 Fluc… 1000 mg 819452
#> 3 6002 NULL 2020-12-30 Teic… 400 mg 275597
#> 4 6003 2021-01-01 01:00:… 1800-01-01 Fluc… 1000 NULL 819452
#> 5 6004 2021-01-01 02:00:… 1800-01-01 Fluc… 1000 NULL 528071
#> 6 6005 2021-01-01 03:00:… 2020-12-30 Co-a… 1.2 g 1001434
#> # ℹ 1 more variable: Location <chr>