Working with Units
Carl Boettiger
2024-10-30
Source:vignettes/working-with-units.Rmd
working-with-units.Rmd
Overview
One essential role of EML metadata is in precisely defining the units
in which data is measured. To make sure these units can be understood by
(and thus potentially converted to other units by) a computer, it’s
necessary to be rather precise about our choice of units. EML knows
about a lot of commonly used units, referred to as “standardUnits,”
already. If a unit is in EML’s standardUnit
dictionary, we
can refer to it without further explanation as long as we’re careful to
use the precise id
for that unit, as we will see below.
Sometimes data involves a unit that is not in
standardUnit
dictionary. In this case, the metadata must
provide additional information about the unit, including how to convert
the unit into the SI system. EML uses an existing standard, stmml, to represent
this information, which must be given in the
additionalMetadata
section of an EML file. The
stmml
standard is also used to specify EML’s own
standardUnit
definitions.
Add a custom unit to EML
custom_units <-
data.frame(id = "speciesPerSquareMeter",
unitType = "arealDensity",
parentSI = "numberPerSquareMeter",
multiplierToSI = 1,
description = "number of species per square meter")
unitList <- set_unitList(custom_units)
Start with a minimal EML document
me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))
my_eml <- list(dataset = list(
title = "A Minimal Valid EML Dataset",
creator = me,
contact = me),
additionalMetadata = list(metadata = list(
unitList = unitList
))
)
write_eml(my_eml, "eml-with-units.xml")
eml_validate("eml-with-units.xml")
## [1] TRUE
## attr(,"errors")
## character(0)
Note: Custom units are widely misunderstood and misused in EML. See examples from custom-units
Reading EML: parsing unit information, including custom unit types
Let us start by examining the numeric attributes in an example EML file. First we read in the file:
f <- system.file("tests", emld::eml_version(), "eml-datasetWithUnits.xml", package = "emld")
eml <- read_eml(f)
We extract the attributeList
, and examine the numeric
attributes (e.g. those which have units):