Reads data from a Data Resource (in a Data Package) into a tibble (a
Tidyverse data frame). The resource must be a Tabular Data Resource.
The function uses readr::read_delim() to read CSV files, passing the
resource properties path, CSV dialect, column names, data types, etc.
Column names are taken from the provided Table Schema (schema), not from
the header in the CSV file(s).
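The idea of taking column names from a schema rather than from the file header can be sketched with readr alone. This is a minimal illustration, not the function's actual implementation: the temporary file, the `schema_names` vector and the `schema_types` specification are hypothetical stand-ins for what a Table Schema would provide.

```r
library(readr)

# Hypothetical CSV whose header differs from the schema field names
csv_file <- tempfile(fileext = ".csv")
writeLines(c("ID,Lat", "a,50.8", "b,50.9"), csv_file)

# Field names and types as they might be derived from a Table Schema
# (assumed values, for illustration only)
schema_names <- c("deployment_id", "latitude")
schema_types <- cols(
  deployment_id = col_character(),
  latitude = col_double()
)

data <- read_delim(
  csv_file,
  delim = ",",
  col_names = schema_names, # names come from the schema, not the file
  col_types = schema_types,
  skip = 1                  # skip the header row present in the file
)

names(data)
```

Because the header row is skipped and the names are supplied explicitly, the resulting tibble carries the schema's field names regardless of what the file header says.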
Arguments
- package
Data Package object, as returned by read_package() or create_package().
- resource_name
Name of the Data Resource.
- col_select
Character vector of the columns to include in the result, in the order provided. Selecting columns can improve read speed.
Value
A tibble::tibble() with the Data Resource's tabular data.
If there are parsing problems, a warning will alert you.
You can retrieve the full details by calling problems() on your data
frame.
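As a rough sketch of that workflow, the snippet below reads a resource from the example Data Package shipped with frictionless and then inspects parsing problems. It assumes that `problems()` refers to `readr::problems()` and that the readr and frictionless packages are installed; the variable names are illustrative.

```r
library(frictionless)

# Load the example Data Package bundled with frictionless
package <- read_package(
  system.file("extdata", "v1", "datapackage.json", package = "frictionless")
)

# Read a resource; a warning is shown here if any values fail to parse
observations <- read_resource(package, "observations")

# Retrieve parsing issues (assumed to be readr::problems()):
# the result is a data frame with one row per reported issue
issues <- readr::problems(observations)
nrow(issues)
```

A row count of zero means no parsing issues were recorded for the data frame.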
Details
See vignette("data-resource"), vignette("table-dialect") and
vignette("table-schema") to learn how this function implements the
Data Package standard.
See also
Other read functions: read_package(), resources()
Examples
# Read a datapackage.json file
package <- read_package(
  system.file("extdata", "v1", "datapackage.json", package = "frictionless")
)
package
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> Use `unclass()` to print the Data Package as a list.
# Read data from the resource "observations"
read_resource(package, "observations")
#> # A tibble: 8 × 7
#> observation_id deployment_id timestamp scientific_name count
#> <chr> <chr> <dttm> <chr> <dbl>
#> 1 1-1 1 2020-09-28 00:13:07 Capreolus capreolus 1
#> 2 1-2 1 2020-09-28 15:59:17 Capreolus capreolus 1
#> 3 1-3 1 2020-09-28 16:35:23 Lepus europaeus 1
#> 4 1-4 1 2020-09-28 17:04:04 Lepus europaeus 1
#> 5 1-5 1 2020-09-28 19:19:54 Sus scrofa 2
#> 6 2-1 2 2021-10-01 01:25:06 Sus scrofa 1
#> 7 2-2 2 2021-10-01 01:25:06 Sus scrofa 1
#> 8 2-3 2 2021-10-01 04:47:30 Sus scrofa 1
#> # ℹ 2 more variables: life_stage <fct>, comments <chr>
# The above tibble is merged from 2 files listed in the resource path
package$resources[[2]]$path
#> [1] "observations_1.tsv" "observations_2.tsv"
# The column names and types are derived from the resource schema
purrr::map_chr(package$resources[[2]]$schema$fields, "name")
#> [1] "observation_id" "deployment_id" "timestamp" "scientific_name"
#> [5] "count" "life_stage" "comments"
purrr::map_chr(package$resources[[2]]$schema$fields, "type")
#> [1] "string" "string" "datetime" "string" "integer" "string" "string"
# Read data from the resource "deployments" with column selection
read_resource(package, "deployments", col_select = c("latitude", "longitude"))
#> # A tibble: 3 × 2
#> latitude longitude
#> <dbl> <dbl>
#> 1 50.8 4.62
#> 2 50.8 4.64
#> 3 50.8 4.65