The dataset_df() constructor creates semantically rich modern data frames.
These inherit from tibble::tibble and carry structured metadata using attributes.
Usage
dataset_df(
...,
identifier = c(obs = "http://example.com/dataset#obs"),
var_labels = NULL,
units = NULL,
concepts = NULL,
dataset_bibentry = NULL,
dataset_subject = NULL
)
as_dataset_df(
df,
identifier = c(obs = "http://example.com/dataset#obs"),
var_labels = NULL,
units = NULL,
concepts = NULL,
dataset_bibentry = NULL,
dataset_subject = NULL,
...
)
is.dataset_df(x)
# S3 method for class 'dataset_df'
print(x, ...)
is_dataset_df(x)
Arguments
- ...
Vectors (columns) that should be included in the dataset.
- identifier
A named vector of one or more URI prefixes for row IDs. Defaults to c(obs = "http://example.com/dataset#obs"). For example, if your dataset will be published under the DOI https://doi.org/1234, you may use c(obs = "https://doi.org/1234#"), which will generate row URIs such as https://doi.org/1234#1, ..., #n.
- var_labels
A named list of human-readable labels for each variable.
- units
A named list of measurement units for measured variables.
- concepts
A named list of linked concepts (URIs) for variables or dimensions.
- dataset_bibentry
A bibliographic metadata record for the dataset, created using datacite() or dublincore().
- dataset_subject
A subject descriptor created with subject() or subject_create().
- df
A data.frame to convert to a dataset_df.
- x
A dataset_df object (used in method dispatch).
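The way a named identifier prefix maps to row identifiers can be sketched in base R. This is only an illustration of the naming scheme described for the identifier argument, not the package's internal code; the variable names and the number of rows are assumptions made for the example.

```r
# Illustrative sketch in base R; this is not the dataset package's own implementation.
prefix <- c(obs = "https://doi.org/1234#")  # named URI prefix, as in the argument description
n_rows <- 3                                  # hypothetical number of observations

# Short row identifiers built from the prefix name (compare "obs1", "obs2" in the Examples)
row_ids <- paste0(names(prefix), seq_len(n_rows))

# Full row URIs built from the prefix value
row_uris <- paste0(unname(prefix), seq_len(n_rows))

print(row_ids)
print(row_uris)
```

The prefix name gives a compact row label, while the prefix value expands each label into a resolvable URI.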
Value
A dataset_df object: a tibble with attached metadata stored in attributes.
is.dataset_df() returns a logical value indicating whether the object is of class dataset_df.
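As a rough base R illustration of the general pattern (a data frame subclass that carries metadata in attributes and is recognised by a class test), consider the sketch below. The class name toy_dataset, the attribute name example_title, and the helper is_toy_dataset() are hypothetical and are not part of the dataset package.

```r
# Hypothetical illustration of the S3 pattern; not the dataset package's internals.
toy <- data.frame(country = c("AD", "LI"), gdp = c(3897, 7365))

# Attach metadata as an attribute (attribute name chosen for this example only)
attr(toy, "example_title") <- "GDP of Andorra and Liechtenstein"

# Prepend a subclass so that S3 methods could dispatch on it
class(toy) <- c("toy_dataset", class(toy))

# A class-membership test in the spirit of is.dataset_df()
is_toy_dataset <- function(x) inherits(x, "toy_dataset")

print(is_toy_dataset(toy))
print(is_toy_dataset(mtcars))
print(attr(toy, "example_title"))
```

Because the subclass is prepended to the existing class vector, the object still behaves as an ordinary data frame wherever no specialised method exists.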
Details
Use is.dataset_df() to check class membership.
S3 methods for dataset_df include:
For full details, see vignette("dataset_df", package = "dataset").
Note
A simple, serverless scaffolding for publishing dataset_df objects on the web (with HTML + RDF exports) is available at https://github.com/dataobservatory-eu/dataset-template.
Examples
my_dataset <- dataset_df(
  country_name = defined(
    c("AD", "LI"),
    concept = "http://data.europa.eu/bna/c_6c2bb82d",
    namespace = "https://www.geonames.org/countries/$1/"
  ),
  gdp = defined(
    c(3897, 7365),
    label = "Gross Domestic Product",
    unit = "million dollars",
    concept = "http://data.europa.eu/83i/aa/GDP"
  ),
  identifier = c(
    obs = "https://dataobservatory-eu.github.io/dataset-template#"
  ),
  dataset_bibentry = dublincore(
    title = "GDP of Andorra and Liechtenstein",
    description = "A small but semantically rich dataset example.",
    creator = person("Jane", "Doe", role = "cre"),
    publisher = "Open Data Institute",
    language = "en"
  )
)
# Basic usage
print(my_dataset)
#> Doe (2025): GDP of Andorra and Liechtenstein [dataset]
#> rowid country_name gdp
#> <defined> <defined> <defined>
#> 1 obs1 AD 3897
#> 2 obs2 LI 7365
head(my_dataset)
#> Doe (2025): GDP of Andorra and Liechtenstein [dataset]
#> rowid country_name gdp
#> <defined> <defined> <defined>
#> 1 obs1 AD 3897
#> 2 obs2 LI 7365
summary(my_dataset)
#> Doe (2025): Summary of GDP of Andorra and Liechtenstein [dataset]
#>
#> Gross Domestic Product (million dollars)
#> rowid country_name gdp
#> Length:2 Length:2 Min. :3897
#> Class :character Class :character 1st Qu.:4764
#> Mode :character Mode :character Median :5631
#> Mean :5631
#> 3rd Qu.:6498
#> Max. :7365
# Metadata access
as_dublincore(my_dataset)
#> Dublin Core Metadata Record
#> --------------------------
#> Title: GDP of Andorra and Liechtenstein
#> Creator(s): Jane Doe [cre]
#> Contributor(s): :unas
#> Publisher: Open Data Institute
#> Year: 2025
#> Language: en
#> Description: A small but semantically rich dataset example.
as_datacite(my_dataset)
#> DataCite Metadata Record
#> --------------------------
#> Title: GDP of Andorra and Liechtenstein
#> Creator(s): Jane Doe [cre]
#> Contributor(s): :unas
#> Identifier: :tba
#> Publisher: Open Data Institute
#> Year: 2025
#> Language: en
#> Description: A small but semantically rich dataset example.
# Export description as RDF triples
my_description <- describe(my_dataset, con = tempfile())
my_description
#> Doe (2025): GDP of Andorra and Liechtenstein [dataset]
#> rowid country_name gdp
#> <defined> <defined> <defined>
#> 1 obs1 AD 3897
#> 2 obs2 LI 7365