Data Package is a simple container format to describe a coherent collection of data (a dataset), including its contributors, licenses, etc.
In this document we use the terms “package” for Data Package, “resource” for Data Resource, “dialect” for Table Dialect, and “schema” for Table Schema.
General implementation
Frictionless supports reading, manipulating and writing packages.
Much of its functionality is focused on manipulating resources (see vignette("data-resource")).
Read
read_package() reads a package from a datapackage.json file (path or URL):
library(frictionless)
file <- system.file("extdata", "v1", "datapackage.json", package = "frictionless")
package <- read_package(file)
print.datapackage() prints a human-readable summary of a package:
package
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> Use `unclass()` to print the Data Package as a list.
Manipulate
A package is a list, with all the properties that were present in the datapackage.json file (e.g. name, id, etc.). Frictionless adds the custom property "directory" to support reading data (which is removed when writing to disk) and extends the class with "datapackage" to support printing and checking:
attributes(package)
#> $names
#> [1] "name" "id" "licenses" "image" "version" "created"
#> [7] "temporal" "resources" "directory"
#>
#> $class
#> [1] "datapackage" "list"
create_package() creates a package from scratch or from an existing package. It adds the required properties and class if those are missing:
# From scratch
create_package()
#> A Data Package with 0 resources.
#> Use `unclass()` to print the Data Package as a list.
# From an existing package
create_package(package)
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> Use `unclass()` to print the Data Package as a list.
check_package() checks if a package contains the required properties and class:
invalid_package <- example_package()
invalid_package$resources <- NULL
check_package(invalid_package)
#> Error in `check_package()`:
#> ! `package` must be a Data Package object.
#> ✖ `package` is missing a resources property or it is not a list.
#> ℹ Create a valid Data Package object with `read_package()` or
#> `create_package()`.
You can manipulate the package list, but frictionless does not provide functions to do that. Use purrr or base R instead (see vignette("frictionless")).
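As a minimal sketch of such manipulation with base R, the example below sets a top-level property and collects the resource names from the example package. The title value is made up for illustration; the code only assumes example_package() and the list structure described above.

```r
library(frictionless)

package <- example_package()

# Set (or overwrite) a top-level property; the value is illustrative
package$title <- "Example package"

# Collect the name of each resource with base R
resource_names <- vapply(
  package$resources,
  function(resource) resource$name,
  character(1)
)
resource_names
```

Assigning with `$<-` keeps the list attributes, so the "datapackage" class is expected to survive this kind of edit.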
Some functions (e.g. unclass() or append()) remove the custom class, creating an invalid package. You can fix this by calling create_package() on your package.
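The loss and recovery of the class can be checked with inherits(). This is a small sketch using the example package; the variable names plain and restored are illustrative.

```r
library(frictionless)

package <- example_package()

# unclass() returns a plain list without the "datapackage" class
plain <- unclass(package)
inherits(plain, "datapackage")

# create_package() adds the class again
restored <- create_package(plain)
inherits(restored, "datapackage")
```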
Most functions have package as their first argument and return package. This allows you to pipe the functions:
library(dplyr) # Or library(magrittr)
my_package <-
create_package() %>%
add_resource(resource_name = "iris", data = iris) %>%
append(c("title" = "my_package"), after = 0) %>%
create_package() # To add the datapackage class again
my_package
#> A Data Package with 1 resource:
#> • iris
#> Use `unclass()` to print the Data Package as a list.
Write
write_package() writes a package to disk as a datapackage.json file. For some resources, it also writes the data files. See the function documentation and vignette("data-resource") for details.
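A hedged sketch of writing a package to a temporary directory follows. It builds a package with a single data frame resource, as in the pipe example above. The directory name out_dir is illustrative, and the exact set of files written depends on the resources in the package.

```r
library(frictionless)

# Build a small package with one data frame resource
package <- add_resource(create_package(), resource_name = "iris", data = iris)

# Write to a temporary directory (out_dir is an illustrative name)
out_dir <- file.path(tempdir(), "my_package")
dir.create(out_dir, showWarnings = FALSE)
write_package(package, out_dir)

# Inspect what was written
list.files(out_dir)
```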
Properties implementation
resources
resources is required. It is used by resources() and many other functions. check_package() returns an error if it is missing.
profile
profile is ignored by read_package() and not set (to e.g. "tabular-data-package") by create_package().
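Since create_package() leaves profile unset, you can assign it yourself like any other list element. This is a minimal sketch; whether "tabular-data-package" is the right value depends on your data.

```r
library(frictionless)

package <- create_package()

# profile is not set by create_package()
is.null(package$profile)

# Set it by hand, as with any other property
package$profile <- "tabular-data-package"
package$profile
```

The same pattern applies to the other properties listed in this section that create_package() does not set.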
name
name is ignored by read_package() and not set by create_package().
id
id is ignored by read_package() and not set by create_package(). print.datapackage() adds an extra sentence when id is a URL (like a DOI):
package <- example_package()
package$id <- "https://doi.org/10.5281/zenodo.10053702/"
package
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> For more information, see <https://doi.org/10.5281/zenodo.10053702/>.
#> Use `unclass()` to print the Data Package as a list.
licenses
licenses is ignored by read_package() and not set by create_package().
title
title is ignored by read_package() and not set by create_package().
description
description is ignored by read_package() and not set by create_package().
homepage
homepage is ignored by read_package() and not set by create_package().
image
image is ignored by read_package() and not set by create_package().
version
version is ignored by read_package() and not set by create_package().
created
created is ignored by read_package() and not set by create_package().
keywords
keywords is ignored by read_package() and not set by create_package().
contributors
contributors is ignored by read_package() and not set by create_package().
sources
sources is ignored by read_package() and not set by create_package().