Reads data from a Data Resource (in a Data Package) into a tibble (a Tidyverse data frame). The resource must be a Tabular Data Resource. The function uses readr::read_delim() to read CSV files, passing the resource properties path, CSV dialect, column names, data types, etc. Column names are taken from the provided Table Schema (schema), not from the header in the CSV file(s).

Usage

read_resource(package, resource_name, col_select = NULL)

Arguments

package

Data Package object, as returned by read_package() or create_package().

resource_name

Name of the Data Resource.

col_select

Character vector of the columns to include in the result, in the order provided. Selecting columns can improve read speed.

Value

A tibble::tibble() with the Data Resource's tabular data. If there are parsing problems, a warning will alert you. You can retrieve the full details by calling problems() on your data frame.
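As a minimal sketch of what inspecting parsing problems can look like, the snippet below uses readr directly on a small temporary file rather than a real Data Resource. The file contents and column names are hypothetical and chosen only to trigger one parsing problem.

```r
library(readr)

# Hypothetical delimited file: the second data row has a non-integer count
path <- tempfile(fileext = ".csv")
writeLines(c("id,count", "1,2", "2,not_a_number"), path)

# Read with explicit column types; the bad value becomes NA and raises a warning
# (suppressed here so the sketch runs quietly)
data <- suppressWarnings(
  read_delim(
    path,
    delim = ",",
    col_types = cols(id = col_character(), count = col_integer()),
    lazy = FALSE
  )
)

# Retrieve the recorded parsing problems as a tibble (row, col, expected, actual)
issues <- problems(data)
issues
```

The same problems() call should apply to a tibble returned by read_resource(), since that function reads data with readr.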

Details

See vignette("data-resource"), vignette("table-dialect") and vignette("table-schema") to learn how this function implements the Data Package standard.
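To illustrate how column names can come from a schema rather than from the file header, here is a rough sketch that calls readr::read_delim() directly. It is not the function's actual implementation. The file, the field names and the types are assumptions made up for this example.

```r
library(readr)

# Hypothetical file whose header differs from the schema field names
path <- tempfile(fileext = ".csv")
writeLines(c("ID,Lat", "a,50.8", "b,50.9"), path)

# Assumed schema field names and types (illustrative only)
schema_names <- c("deployment_id", "latitude")
schema_types <- cols(
  deployment_id = col_character(),
  latitude = col_double()
)

# Skip the header row in the file and apply the schema names and types instead
data <- read_delim(
  path,
  delim = ",",
  col_names = schema_names,
  col_types = schema_types,
  skip = 1
)

names(data)
```

In this sketch the header row "ID,Lat" is skipped, so the resulting tibble carries the names supplied in schema_names.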

See also

Other read functions: read_package(), resources()

Examples

# Read a datapackage.json file
package <- read_package(
  system.file("extdata", "v1", "datapackage.json", package = "frictionless")
)

package
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> Use `unclass()` to print the Data Package as a list.

# Read data from the resource "observations"
read_resource(package, "observations")
#> # A tibble: 8 × 7
#>   observation_id deployment_id timestamp           scientific_name     count
#>   <chr>          <chr>         <dttm>              <chr>               <dbl>
#> 1 1-1            1             2020-09-28 00:13:07 Capreolus capreolus     1
#> 2 1-2            1             2020-09-28 15:59:17 Capreolus capreolus     1
#> 3 1-3            1             2020-09-28 16:35:23 Lepus europaeus         1
#> 4 1-4            1             2020-09-28 17:04:04 Lepus europaeus         1
#> 5 1-5            1             2020-09-28 19:19:54 Sus scrofa              2
#> 6 2-1            2             2021-10-01 01:25:06 Sus scrofa              1
#> 7 2-2            2             2021-10-01 01:25:06 Sus scrofa              1
#> 8 2-3            2             2021-10-01 04:47:30 Sus scrofa              1
#> # ℹ 2 more variables: life_stage <fct>, comments <chr>

# The above tibble is merged from 2 files listed in the resource path
package$resources[[2]]$path
#> [1] "observations_1.tsv" "observations_2.tsv"

# The column names and types are derived from the resource schema
purrr::map_chr(package$resources[[2]]$schema$fields, "name")
#> [1] "observation_id"  "deployment_id"   "timestamp"       "scientific_name"
#> [5] "count"           "life_stage"      "comments"       
purrr::map_chr(package$resources[[2]]$schema$fields, "type")
#> [1] "string"   "string"   "datetime" "string"   "integer"  "string"   "string"  

# Read data from the resource "deployments" with column selection
read_resource(package, "deployments", col_select = c("latitude", "longitude"))
#> # A tibble: 3 × 2
#>   latitude longitude
#>      <dbl>     <dbl>
#> 1     50.8      4.62
#> 2     50.8      4.64
#> 3     50.8      4.65