The dataset_df() constructor creates semantically rich data frames. They inherit from tibble::tibble and carry structured metadata in attributes.

Usage

dataset_df(
  ...,
  identifier = c(obs = "http://example.com/dataset#obs"),
  var_labels = NULL,
  units = NULL,
  concepts = NULL,
  dataset_bibentry = NULL,
  dataset_subject = NULL
)

as_dataset_df(
  df,
  identifier = c(obs = "http://example.com/dataset#obs"),
  var_labels = NULL,
  units = NULL,
  concepts = NULL,
  dataset_bibentry = NULL,
  dataset_subject = NULL,
  ...
)

is.dataset_df(x)

# S3 method for class 'dataset_df'
print(x, ...)

is_dataset_df(x)

Arguments

...

Vectors (columns) that should be included in the dataset.

identifier

A named vector of one or more URI prefixes for row IDs. Defaults to c(obs = "http://example.com/dataset#obs"), as shown in Usage. For example, if your dataset will be published under the DOI https://doi.org/1234, you may use c(obs = "https://doi.org/1234#"), which will generate row URIs such as https://doi.org/1234#1, ..., https://doi.org/1234#n.
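As a rough illustration of how a prefix expands into row identifiers, the base R sketch below pastes a prefix onto a row index. It only mimics the pattern described above and is not the package's internal code; make_row_uris() is a hypothetical helper introduced for this example.

```r
# Hypothetical helper: mimics the row-URI pattern described above.
# This is an illustration, not the package's own implementation.
make_row_uris <- function(prefix, n) {
  paste0(prefix, seq_len(n))
}

identifier <- c(obs = "https://doi.org/1234#")

# Full row URIs built from the prefix value
row_uris <- make_row_uris(identifier[["obs"]], 3)

# Short row IDs built from the prefix name, in the style of the
# rowid column (obs1, obs2) printed in the Examples section
short_ids <- make_row_uris(names(identifier), 3)

print(row_uris)
print(short_ids)
```

The name of the vector element (here obs) plays the role of a short prefix, while its value supplies the full URI base.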

var_labels

A named list of human-readable labels for each variable.

units

A named list of measurement units for measured variables.

concepts

A named list of linked concepts (URIs) for variables or dimensions.

dataset_bibentry

A bibliographic metadata record for the dataset, created using datacite() or dublincore().

dataset_subject

A subject descriptor created with subject() or subject_create().

df

A data.frame to convert to a dataset_df.

x

An object to test with is.dataset_df(), or a dataset_df object for method dispatch (for example, in print()).

Value

A dataset_df object: a tibble with attached metadata stored in attributes.
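The general mechanism, a data frame subclass that keeps metadata in attributes, can be sketched in base R. The attribute name example_title and the class name my_dataset below are hypothetical and chosen only for illustration; they are not the names the package uses internally.

```r
# A plain data frame with metadata attached as an attribute.
# "example_title" and "my_dataset" are hypothetical names for this sketch.
df <- data.frame(country = c("AD", "LI"), gdp = c(3897, 7365))
attr(df, "example_title") <- "GDP of Andorra and Liechtenstein"
class(df) <- c("my_dataset", class(df))

# The object still behaves as a data frame, and the metadata travels with it.
still_data_frame <- inherits(df, "data.frame")
stored_title <- attr(df, "example_title")
mean_gdp <- mean(df$gdp)

print(still_data_frame)
print(stored_title)
print(mean_gdp)
```

Because the metadata lives in attributes rather than in columns, ordinary column operations such as mean(df$gdp) are unaffected.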

is.dataset_df() returns a logical value: TRUE if the object is of class dataset_df, FALSE otherwise.

Details

Use is.dataset_df() to check class membership.
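A minimal sketch of the check, assuming the dataset package is installed and that as_dataset_df() accepts a plain data.frame with all other arguments left at their defaults, as the Usage section suggests. The example data is made up, and the code is guarded so that it does nothing when the package is unavailable.

```r
# Guarded: the body runs only if the dataset package is installed.
if (requireNamespace("dataset", quietly = TRUE)) {
  plain_df <- data.frame(country = c("AD", "LI"), gdp = c(3897, 7365))

  # Convert with default identifier and metadata (assumed to be sufficient)
  converted <- dataset::as_dataset_df(plain_df)

  # Class membership before and after conversion
  is_plain <- dataset::is.dataset_df(plain_df)
  is_converted <- dataset::is.dataset_df(converted)

  print(is_plain)
  print(is_converted)
}
```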

S3 methods for dataset_df include:

  • print() to display the dataset with metadata

  • summary() to summarize both data and metadata

For full details, see vignette("dataset_df", package = "dataset").

Note

A simple, serverless scaffolding for publishing dataset_df objects on the web (with HTML + RDF exports) is available at https://github.com/dataobservatory-eu/dataset-template.

Examples

my_dataset <- dataset_df(
  country_name = defined(
    c("AD", "LI"),
    concept = "http://data.europa.eu/bna/c_6c2bb82d",
    namespace = "https://www.geonames.org/countries/$1/"
  ),
  gdp = defined(
    c(3897, 7365),
    label = "Gross Domestic Product",
    unit = "million dollars",
    concept = "http://data.europa.eu/83i/aa/GDP"
  ),
  identifier = c(
    obs = "https://dataobservatory-eu.github.io/dataset-template#"
  ),
  dataset_bibentry = dublincore(
    title = "GDP of Andorra and Liechtenstein",
    description = "A small but semantically rich dataset example.",
    creator = person("Jane", "Doe", role = "cre"),
    publisher = "Open Data Institute",
    language = "en"
  )
)

# Basic usage
print(my_dataset)
#> Doe (2025): GDP of Andorra and Liechtenstein [dataset]
#>   rowid     country_name gdp       
#>   <defined> <defined>    <defined>
#> 1 obs1      AD           3897     
#> 2 obs2      LI           7365      
head(my_dataset)
#> Doe (2025): GDP of Andorra and Liechtenstein [dataset]
#>   rowid     country_name gdp       
#>   <defined> <defined>    <defined>
#> 1 obs1      AD           3897     
#> 2 obs2      LI           7365      
summary(my_dataset)
#> Doe (2025): Summary of GDP of Andorra and Liechtenstein [dataset]
#> 
#> Gross Domestic Product (million dollars)
#>     rowid           country_name            gdp      
#>  Length:2           Length:2           Min.   :3897  
#>  Class :character   Class :character   1st Qu.:4764  
#>  Mode  :character   Mode  :character   Median :5631  
#>                                        Mean   :5631  
#>                                        3rd Qu.:6498  
#>                                        Max.   :7365  

# Metadata access
as_dublincore(my_dataset)
#> Dublin Core Metadata Record
#> --------------------------
#> Title:       GDP of Andorra and Liechtenstein
#> Creator(s):  Jane Doe [cre]
#> Contributor(s): :unas
#> Publisher:   Open Data Institute
#> Year:        2025
#> Language:    en
#> Description: A small but semantically rich dataset example.
as_datacite(my_dataset)
#> DataCite Metadata Record
#> --------------------------
#> Title:       GDP of Andorra and Liechtenstein
#> Creator(s):  Jane Doe [cre]
#> Contributor(s): :unas
#> Identifier:  :tba
#> Publisher:   Open Data Institute
#> Year:        2025
#> Language:    en
#> Description: A small but semantically rich dataset example.

# Export description as RDF triples
my_description <- describe(my_dataset, con = tempfile())
my_description
#> Doe (2025): GDP of Andorra and Liechtenstein [dataset]
#>   rowid     country_name gdp       
#>   <defined> <defined>    <defined>
#> 1 obs1      AD           3897     
#> 2 obs2      LI           7365