Popular file readers such as readr::read_delim() perform datatype
conversion by default, which can interfere with daiquiri's ability to detect
non-conformant values. Use this function instead to ensure optimal
compatibility with daiquiri's features.
Usage
read_data(
file,
delim = NULL,
col_names = TRUE,
quote = "\"",
trim_ws = TRUE,
comment = "",
skip = 0,
n_max = Inf,
show_progress = TRUE
)
Arguments
- file
A string containing the path of the file to load, or a URL starting http://, file://, etc. Compressed files with extension .gz, .bz2, .xz and .zip are supported.
- delim
Single character used to separate fields within a record, e.g. "," or "\t".
- col_names
Either TRUE, FALSE or a character vector of column names. If TRUE, the first row of the input will be used as the column names, and will not be included in the data frame. If FALSE, column names will be generated automatically. Default = TRUE.
- quote
Single character used to quote strings.
- trim_ws
Should leading and trailing whitespace be trimmed from each field?
- comment
A string used to identify comments. Any text after the comment characters will be silently ignored.
- skip
Number of lines to skip before reading data. If comment is supplied, any commented lines are ignored after skipping.
- n_max
Maximum number of lines to read.
- show_progress
Display a progress bar? Default = TRUE.
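As a rough sketch of how several of these arguments combine, the following writes a small pipe-delimited file to a temporary location and reads it back. The file contents and the tmp variable are made up for illustration, and the behaviour of skip, comment and n_max is assumed to follow the descriptions above.

```r
library(daiquiri)

# Hypothetical input: one preamble line, one comment line, a header row, three records.
tmp <- tempfile(fileext = ".txt")
writeLines(
  c(
    "exported from an example system",
    "# this line is a comment",
    "id|drug|dose",
    "1| Ceftriaxone |500",
    "2|Fluconazole|NULL",
    "3|Teicoplanin|400"
  ),
  tmp
)

raw_data <- read_data(
  tmp,
  delim = "|",
  col_names = TRUE,
  trim_ws = TRUE,        # strip the spaces around " Ceftriaxone "
  comment = "#",         # ignore the commented line
  skip = 1,              # skip the preamble line
  n_max = 2,             # read at most two records
  show_progress = FALSE
)

# Every column should come back as character, including the "NULL" dose.
str(raw_data)
```

Turning the progress bar off with show_progress = FALSE is a convenience for scripts and rendered documents; it should not change the data that is returned.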
Details
This function is aimed at non-expert users of R, and operates as a restricted
implementation of readr::read_delim(). If you prefer to use read_delim()
directly, ensure you set the following parameters: col_types = readr::cols(.default = "c") and na = character().
The first reads every column as character, and the second stops any values from being converted to NA.
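To illustrate why those two parameters matter, here is a minimal comparison using readr directly. It assumes readr is installed; the file contents and the variable names tmp, guessed and as_text are invented for this example.

```r
library(readr)

# Hypothetical input in which one dose is recorded as the literal text "NA".
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,dose", "1,500", "2,NA", "3,1000"), tmp)

# Default behaviour: column types are guessed and "NA" is treated as missing.
guessed <- readr::read_delim(tmp, delim = ",")

# Settings described above: every column stays character and no value becomes NA.
as_text <- readr::read_delim(
  tmp,
  delim = ",",
  col_types = readr::cols(.default = "c"),
  na = character()
)

str(guessed$dose)  # numeric, with the second value missing
str(as_text$dose)  # character, with the second value kept as the text "NA"
```

With the default settings the original text of the second value is lost, which is the kind of interference with non-conformant value detection that the description warns about.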
Examples
raw_data <- read_data(
  system.file("extdata", "example_prescriptions.csv", package = "daiquiri"),
  delim = ",",
  col_names = TRUE
)
head(raw_data)
#> # A tibble: 6 × 8
#>   PrescriptionID PrescriptionDate   AdmissionDate Drug  Dose  DoseUnit PatientID
#>   <chr>          <chr>              <chr>         <chr> <chr> <chr>    <chr>
#> 1 6000           2021-01-01 00:00:… 2020-12-31    Ceft… 500   mg       4993679
#> 2 6001           NULL               2020-12-31    Fluc… 1000  mg       819452
#> 3 6002           NULL               2020-12-30    Teic… 400   mg       275597
#> 4 6003           2021-01-01 01:00:… 1800-01-01    Fluc… 1000  NULL     819452
#> 5 6004           2021-01-01 02:00:… 1800-01-01    Fluc… 1000  NULL     528071
#> 6 6005           2021-01-01 03:00:… 2020-12-30    Co-a… 1.2   g        1001434
#> # ℹ 1 more variable: Location <chr>
