
Modernising Citation Metadata in R: Introducing `bibrecord`
Descriptive metadata is often added as an afterthought or stored separately from the data it describes. This separation can lead to loss of context when datasets are shared, archived, or reused. To avoid this, the `dataset` package encourages metadata to be embedded at the time of dataset creation.

For a `dataset_df`, this means providing variable-level definitions, units, and namespaces. It also means including a complete, standards-aligned citation record for the dataset itself. Encoding citation information early ensures that it travels with the data, supports the FAIR principles (Findable, Accessible, Interoperable, Reusable), and is ready for export to modern metadata formats.
In “Design Principles & Future Work: Semantically Enriched, Standards-Aligned Datasets in R”, we identify three objectives for dataset-level citation metadata:
- Full compliance with standards such as Dublin Core Terms (DCTERMS) and DataCite
- Interoperability with the R ecosystem, including `dataset_df` and base R tools
- Preservation of meaning throughout the dataset’s lifecycle, from creation to publication and reuse
Purpose
The base R function `utils::bibentry()` offers a way to structure citation metadata and works well for simple references. However, it does not fully support DCTERMS or DataCite, which require:
- Clear separation of roles (e.g., creators vs. contributors)
- Richly typed relationships between resources
- Support for additional metadata fields such as identifiers, subjects, and funding information
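The following minimal base R sketch makes the first limitation concrete. It uses only `utils::bibentry()` and `person()`. The standard citation styles have no separate contributor slot, so the usual workaround is to list everyone in `author`. The MARC relator codes survive inside each `person()` object, but they do not appear in the formatted citation. The object names `jane`, `alice`, and `be` are illustrative only.

```r
# Base R only: utils is attached by default in an R session
jane  <- person("Jane", "Doe", role = "cre")    # cre = creator
alice <- person("Alice", "Smith", role = "dtm") # dtm = data manager

# With no separate contributor slot in the citation styles,
# both people have to be listed as authors
be <- bibentry(
  bibtype   = "Misc",
  title     = "GDP of Small States",
  author    = c(jane, alice),
  publisher = "Tinystat",
  year      = "2023"
)

# The roles are still stored in the person objects ...
unlist(be$author$role)

# ... but the formatted citation presents both people as authors
# and does not show their roles
print(be, style = "text")
```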
The `bibrecord` class builds on `bibentry` to bridge this gap while remaining fully compatible with base R. It adds:
- Multiple `person()` entries for contributors
- Metadata fields aligned with DCTERMS and DataCite
- Safe serialization and extended printing methods
Ideally, `bibrecord` should evolve in close coordination with `utils::bibentry()`, or be replaced by a modernised `bibentry` that supports these capabilities natively. Either route would achieve the three objectives described above.
What is a `bibrecord`?
A `bibrecord` is a standard `bibentry` object with additional fields stored as attributes. This means:
- It works with any function that accepts a `bibentry`
- It offers structured metadata fields such as `contributor`, `subject`, and `identifier`
- Extended methods display both the citation and the enriched metadata
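The attribute-based design can be illustrated with plain base R. This is a hypothetical sketch of the idea, not the `dataset` package’s own code. The attribute names `contributor`, `subject`, and `identifier` simply mirror the fields listed above. The object remains an ordinary `bibentry`, so base tools such as `toBibtex()` keep working. Those tools also ignore the extra attributes, which is why `bibrecord` needs its own extended methods.

```r
# Hypothetical sketch of the design, not the dataset package's implementation:
# start from an ordinary bibentry ...
be <- bibentry(
  bibtype = "Misc",
  title   = "GDP of Small States",
  author  = person("Jane", "Doe", role = "cre"),
  year    = "2023"
)

# ... and store the extra metadata as attributes on the same object
attr(be, "contributor") <- person("Alice", "Smith", role = "dtm")
attr(be, "subject")     <- "Economic indicators"
attr(be, "identifier")  <- "doi:10.1234/example"

inherits(be, "bibentry")   # still a bibentry
attr(be, "subject")        # the extra metadata is retrievable

# Base tools keep working but do not know about the attributes:
# the BibTeX output contains the title, not the subject
toBibtex(be)
```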
Creating a bibrecord
```r
library(dataset)  # provides bibrecord()

# Creator and contributor, identified by MARC relator roles
person_jane <- person("Jane", "Doe", role = "cre")    # creator
person_alice <- person("Alice", "Smith", role = "dtm") # data manager

rec <- bibrecord(
  title = "GDP of Small States",
  author = list(person_jane),
  contributor = list(person_alice),
  publisher = "Tinystat",
  identifier = "doi:10.1234/example",
  date = "2023-05-01",
  subject = "Economic indicators"
)
```
Printing a bibrecord
```r
print(rec)
#> Doe J (2023). "GDP of Small States."
#>
#> Contributors:
#> {Alice Smith [dtm]}
```
When printed, a `bibrecord` shows the standard citation followed by clearly labelled additional metadata, such as the contributors and their roles.
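A printing method of this kind can be sketched with S3 dispatch. In this hypothetical example, the class name `demo_record` and the method `print.demo_record()` are invented for illustration and are not part of the `dataset` package. The method prints the standard citation via `NextMethod()`. It then appends a labelled contributor block read from an attribute.

```r
# Hypothetical class and method names, for illustration only
print.demo_record <- function(x, ...) {
  NextMethod()  # prints the standard citation via print.bibentry()
  contributors <- attr(x, "contributor")
  if (!is.null(contributors)) {
    cat("\nContributors:\n")
    # format.person() renders each person as e.g. "Alice Smith [dtm]"
    writeLines(format(contributors))
  }
  invisible(x)
}

demo <- bibentry(
  bibtype = "Misc",
  title   = "GDP of Small States",
  author  = person("Jane", "Doe", role = "cre"),
  year    = "2023"
)
attr(demo, "contributor") <- person("Alice", "Smith", role = "dtm")
class(demo) <- c("demo_record", class(demo))

print(demo)
```

Because the method delegates the citation itself to `print.bibentry()`, it only has to add the enriched part.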
Compatibility with existing infrastructure
Because `bibrecord` inherits from `bibentry`:
- It works with base R citation tools, including the `bibentry` methods used to format and print `citation()` results
- It integrates into existing bibliographic workflows
- It can be converted to Dublin Core or DataCite metadata with `as_dublincore()` or `as_datacite()` without loss of information
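The compatibility argument rests on ordinary S3 inheritance. In the minimal sketch below, `demo_record` is again a hypothetical class name. Prepending it to the class vector of a `bibentry` leaves the `format()` and `toBibtex()` output unchanged, because dispatch falls back to the `bibentry` methods. Objects returned by `citation()` are themselves `bibentry` objects, which is why they share these methods.

```r
base_entry <- bibentry(
  bibtype = "Misc",
  title   = "GDP of Small States",
  author  = person("Jane", "Doe"),
  year    = "2023"
)

# Prepend a hypothetical subclass; no format() or toBibtex() methods
# exist for it, so S3 dispatch falls through to the bibentry methods
sub_entry <- base_entry
class(sub_entry) <- c("demo_record", class(sub_entry))

identical(format(sub_entry, style = "text"), format(base_entry, style = "text"))
identical(unclass(toBibtex(sub_entry)), unclass(toBibtex(base_entry)))

# citation() results are bibentry objects as well
class(citation("utils"))
```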
Future extensions
Planned enhancements to `bibrecord` include:
- Support for additional metadata fields such as `funder`, `geolocation`, and `relatedIdentifier`
- Export to JSON-LD or RDF formats
- Integration with APIs from services like Zenodo, Crossref, or Wikidata
In the broader context described in “Design Principles & Future Work: Semantically Enriched, Standards-Aligned Datasets in R”, the long-term goal is to ensure that dataset-level citation metadata in R meets three objectives:
- Full compliance with modern metadata standards such as Dublin Core Terms (DCTERMS) and DataCite
- Seamless interoperability with the R ecosystem, including `dataset_df` and base R tools
- Preservation of meaning across the entire data lifecycle, from dataset creation to long-term publication and reuse

To achieve this, `bibrecord` should either evolve in close coordination with `utils::bibentry()` or, ideally, be replaced entirely by a modernised version of `bibentry` that supports these capabilities natively.
Summary
The `bibrecord` class extends base R’s `bibentry` to provide structured, standards-aligned citation metadata that can be embedded directly into a `dataset_df`. It keeps full compatibility with existing R workflows while adding support for contributor roles, richer metadata fields, and export to standards like DCTERMS and DataCite.
Embedding a `bibrecord` in a `dataset_df` ensures that citation information is:
- Complete – all key metadata is included at dataset creation
- Portable – metadata travels with the dataset and can be exported to common formats
- Interoperable – it remains compatible with base R and external metadata consumers
By adopting `bibrecord`, you can create datasets that are ready for FAIR-compliant publishing, easier to share, and able to maintain their full descriptive context throughout their lifecycle.