Specimen records constitute the core of the data served by the NBA. Museum specimens can represent a whole variety of different objects such as plants, animals or single parts thereof, DNA samples, fossils, rocks or meteorites. For detailed information on the data model, please refer to the official NBA documentation.
All specimen occurrence services are accessible using the methods within the SpecimenClient class. For a list of available endpoints, please refer to the class documentation (?Specimen). Below, we give details about all services, grouped by category.
Querying for specimens is accomplished with the query method in the class SpecimenClient. For simple queries, query parameters can be passed as a named list via the parameter queryParams. For example, we can query specimens of the family Ebenaceae that were collected in Europe:
library('nbaR')
# instantiate specimen client
sc <- SpecimenClient$new()
# specify query params in named list
qp <-
  list(
    identifications.defaultClassification.family = "Ebenaceae",
    gatheringEvent.continent = "Europe"
  )
# query
res <- sc$query(queryParams = qp)
If we now want to know all the countries that the specimens were collected in, we can access the Specimen objects in res$content$resultSet as follows:
sapply(res$content$resultSet, function(x)x$item$gatheringEvent$country)
## [1] "Italy" "Austria" "Netherlands" "Netherlands" "Netherlands"
## [6] "Netherlands" "Germany" "Netherlands" "Netherlands" "Netherlands"
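A result like this is often easier to read as a frequency table. The following is a minimal sketch in base R, assuming a small hand-made list that mimics the nesting of res$content$resultSet shown above; the mock entries are hypothetical and are not output from the NBA.

```r
# Mock of res$content$resultSet: each entry has an `item` holding a
# gatheringEvent (hypothetical data, for illustration only)
mock_result_set <- list(
  list(item = list(gatheringEvent = list(country = "Italy"))),
  list(item = list(gatheringEvent = list(country = "Netherlands"))),
  list(item = list(gatheringEvent = list(country = "Netherlands"))),
  list(item = list(gatheringEvent = list(country = NULL)))
)

# Extract the country of each entry. vapply always returns a character
# vector; a missing country becomes NA instead of a NULL list element.
countries <- vapply(
  mock_result_set,
  function(x) {
    country <- x$item$gatheringEvent$country
    if (is.null(country)) NA_character_ else country
  },
  character(1)
)

# Count specimens per country, most frequent first
country_counts <- sort(table(countries, useNA = "ifany"), decreasing = TRUE)
print(country_counts)
```

Using vapply rather than sapply is a design choice here: records without a country would otherwise turn the result into a list rather than a character vector.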
Note that passing query parameters as a named list only allows for limited queries; for example, the logical conjunction between parameters is always AND. More complex queries can be accomplished using a QuerySpec object:
# get all specimens with genus name starting with 'Hydro'
qc <-
  QueryCondition$new(field = "identifications.defaultClassification.genus",
                     operator = "STARTS_WITH",
                     value = "Hydro")
qs <- QuerySpec$new(conditions = list(qc))
res <- sc$query(qs)
The query function can retrieve at most 50000 specimens at once (this limit is set by the parameter index.max_result_window, whose value can be retrieved with the get_setting method described in the metadata section). To provide access to larger amounts of data, the download_query method takes the same arguments as query but downloads the data as a gzip stream under the hood. Unlike query, download_query returns a list of specimen objects instead of a ResultSet. Example:
## get the first 100000 specimen objects (not possible with query method)
res <- sc$download_query(QuerySpec$new(size=100000))
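When staying within the result window, results can also be fetched in batches. The sketch below only computes the batch offsets in base R. It assumes that QuerySpec accepts a from offset next to size and that offset plus size may not exceed index.max_result_window; both are assumptions that should be checked against the class documentation. The batch size is a hypothetical value.

```r
max_window <- 50000   # index.max_result_window as reported by get_setting
page_size  <- 10000   # hypothetical batch size chosen for illustration

# Start offsets of consecutive batches that still fit inside the window
offsets <- seq(from = 0, to = max_window - page_size, by = page_size)

# Sanity check: every batch ends at or before the window limit
stopifnot(all(offsets + page_size <= max_window))
print(offsets)

# Assumed usage (not run here, requires nbaR and network access):
# for (offset in offsets) {
#   res <- sc$query(QuerySpec$new(from = offset, size = page_size))
# }
```

For anything beyond the window, the download method described above remains the appropriate route.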
Several access methods offer the convenient retrieval of specimens matching a certain identifier or being part of a certain collection. Below we give examples of how to use the currently implemented data access services for specimen records:
Some of the specimens available via the NBA are categorised thematically into special collections, such as the Siebold, Dubois or Jongmans collections. The function get_named_collections lists all available special collections, and the identifiers of the specimens within a collection can be queried with get_ids_in_collection:
sc$get_named_collections()$content
## Warning in self$handleError(response): javax.ws.rs.NotFoundException:
## RESTEASY003210: Could not find resource for full path: http://
## api.biodiversitydata.nl/v2/specimen/getNamedCollections
## [1] "javax.ws.rs.NotFoundException: RESTEASY003210: Could not find resource for full path: http://api.biodiversitydata.nl/v2/specimen/getNamedCollections"
sc$get_ids_in_collection("siebold")$content
## Warning in self$handleError(response): javax.ws.rs.NotFoundException:
## RESTEASY003210: Could not find resource for full path: http://
## api.biodiversitydata.nl/v2/specimen/getIdsInCollection/siebold
## [1] "javax.ws.rs.NotFoundException: RESTEASY003210: Could not find resource for full path: http://api.biodiversitydata.nl/v2/specimen/getIdsInCollection/siebold"
For any given query (with QuerySpec or not), the count method returns the number of matches instead of specimen objects:
# Example with QuerySpec:
# how many specimens are there in the 'Botany' collection?
qc <- QueryCondition$new(field='collectionType', operator='EQUALS', value='Botany')
qs <- QuerySpec$new(conditions=list(qc))
# get the number of specimens
sc$count(qs)$content
## [1] 4765887
Note that the count of matches for a given query is also returned by the query function. However, count is more lightweight as it returns an integer instead of a ResultSet containing Specimen objects.
Check if a record exists, based on its unitID:
# use SpecimenClient instantiated above
res <- sc$exists('ZMA.INS.1255440')
# content is boolean
res$content
## [1] TRUE
Return a single specimen given its identifier (note: the identifier of a specimen is different from its unitID):
id <- "[email protected]"
res <- sc$find(id)
# content is single specimen object
res$content
## <Specimen>
## Fields:
## sourceSystem: object of class <SourceSystem>
## sourceSystemId: RMNH.MAM.17209.B
## recordURI:
## id: [email protected]
## unitID: RMNH.MAM.17209.B
## unitGUID: https://data.biodiversitydata.nl/naturalis/specimen/RMNH.MAM.17209.B
## collectorsFieldNumber:
## assemblageID:
## sourceInstitutionID: Naturalis Biodiversity Center
## sourceID: CRS
## previousSourceID:
## owner: Naturalis Biodiversity Center
## licenseType: Copyright
## license: CC0 1.0
## recordBasis: PreservedSpecimen
## kindOfUnit: skin
## collectionType: Mammalia
## sex: female
## phaseOrStage: juvenile
## title: RMNH.MAM.17209.b_RMNH_MAM_17209.b
## notes:
## preparationType: mounted skin
## previousUnitsText:
## numberOfSpecimen: 1
## fromCaptivity: FALSE
## objectPublic: TRUE
## multiMediaPublic: TRUE
## acquiredFrom: object of class <Agent>
## gatheringEvent: object of class <GatheringEvent>
## identifications: list of length 1
## associatedMultiMediaUris: list of length 0
## theme:
## Methods:
## fromJSONString
## toJSONString
## fromList
## toList
## print
The find_by_ids method works like find, but takes multiple IDs as a single comma-separated string:
ids <- "[email protected],[email protected]"
res <- sc$find_by_ids(ids)
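If the identifiers are held in a character vector, the comma-separated string can be built with base R. A minimal sketch, assuming hypothetical identifiers that are placeholders only and do not refer to real NBA records:

```r
# Hypothetical identifiers, for illustration only (note the duplicate)
id_vector <- c("EXAMPLE.1@CRS", "EXAMPLE.2@CRS", "EXAMPLE.1@CRS", "EXAMPLE.3@CRS")

# Drop duplicates, then collapse into one comma-separated string
ids <- paste(unique(id_vector), collapse = ",")
print(ids)

# The string can then be passed on (not run here, requires nbaR and network):
# res <- sc$find_by_ids(ids)
```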
Aggregation services group available data according to different criteria.
The get_distinct_values method takes a specific field as an argument and returns all possible values of that field in the data, together with their frequencies. Below we get all possible values for the country in which a specimen was collected.
sc$get_distinct_values("gatheringEvent.country")
Note: By default, get_distinct_values lists only the first 10 hits. The above query thus does not reflect the distinct values in the whole dataset. This number can be increased, e.g. by setting the size parameter in a QuerySpec object passed to the method.
sc$get_distinct_values("gatheringEvent.country",
                       querySpec = QuerySpec$new(size = 10000))
Instead of returning all different values for a given field, the count_distinct_values method merely counts them:
sc$count_distinct_values("gatheringEvent.country")
Specimen Metadata services include the same standard metadata services as for the other data types:
# get all paths for the Specimen datatype
sc$get_paths()$content
## [1] "sourceSystem.code"
## [2] "sourceSystem.name"
## [3] "sourceSystemId"
## [4] "recordURI"
## [5] "unitID"
## [6] "unitGUID"
## [7] "collectorsFieldNumber"
## [8] "assemblageID"
## [9] "sourceInstitutionID"
## [10] "sourceID"
## [11] "previousSourceID"
## [12] "owner"
## [13] "licenseType"
## [14] "license"
## [15] "recordBasis"
## [16] "kindOfUnit"
## [17] "collectionType"
## [18] "sex"
## [19] "phaseOrStage"
## [20] "title"
## [21] "notes"
## [22] "preparationType"
## [23] "previousUnitsText"
## [24] "numberOfSpecimen"
## [25] "fromCaptivity"
## [26] "objectPublic"
## [27] "multiMediaPublic"
## [28] "acquiredFrom.agentText"
## [29] "gatheringEvent.projectTitle"
## [30] "gatheringEvent.worldRegion"
## [31] "gatheringEvent.continent"
## [32] "gatheringEvent.country"
## [33] "gatheringEvent.iso3166Code"
## [34] "gatheringEvent.provinceState"
## [35] "gatheringEvent.island"
## [36] "gatheringEvent.locality"
## [37] "gatheringEvent.city"
## [38] "gatheringEvent.sublocality"
## [39] "gatheringEvent.localityText"
## [40] "gatheringEvent.dateTimeBegin"
## [41] "gatheringEvent.dateTimeEnd"
## [42] "gatheringEvent.dateText"
## [43] "gatheringEvent.method"
## [44] "gatheringEvent.altitude"
## [45] "gatheringEvent.altitudeUnifOfMeasurement"
## [46] "gatheringEvent.behavior"
## [47] "gatheringEvent.biotopeText"
## [48] "gatheringEvent.depth"
## [49] "gatheringEvent.depthUnitOfMeasurement"
## [50] "gatheringEvent.code"
## [51] "gatheringEvent.establishmentMeans"
## [52] "gatheringEvent.gatheringPersons.agentText"
## [53] "gatheringEvent.gatheringPersons.fullName"
## [54] "gatheringEvent.gatheringPersons.organization.agentText"
## [55] "gatheringEvent.gatheringPersons.organization.name"
## [56] "gatheringEvent.gatheringOrganizations.agentText"
## [57] "gatheringEvent.gatheringOrganizations.name"
## [58] "gatheringEvent.siteCoordinates.longitudeDecimal"
## [59] "gatheringEvent.siteCoordinates.latitudeDecimal"
## [60] "gatheringEvent.siteCoordinates.gridCellSystem"
## [61] "gatheringEvent.siteCoordinates.gridLatitudeDecimal"
## [62] "gatheringEvent.siteCoordinates.gridLongitudeDecimal"
## [63] "gatheringEvent.siteCoordinates.gridCellCode"
## [64] "gatheringEvent.siteCoordinates.gridQualifier"
## [65] "gatheringEvent.siteCoordinates.coordinateErrorDistanceInMeters"
## [66] "gatheringEvent.siteCoordinates.spatialDatum"
## [67] "gatheringEvent.siteCoordinates.geoShape"
## [68] "gatheringEvent.namedAreas.areaName"
## [69] "gatheringEvent.namedAreas.areaClass"
## [70] "gatheringEvent.associatedTaxa.name"
## [71] "gatheringEvent.associatedTaxa.relationType"
## [72] "gatheringEvent.chronoStratigraphy.youngRegionalSubstage"
## [73] "gatheringEvent.chronoStratigraphy.youngRegionalStage"
## [74] "gatheringEvent.chronoStratigraphy.youngRegionalSeries"
## [75] "gatheringEvent.chronoStratigraphy.youngDatingQualifier"
## [76] "gatheringEvent.chronoStratigraphy.youngInternSystem"
## [77] "gatheringEvent.chronoStratigraphy.youngInternSubstage"
## [78] "gatheringEvent.chronoStratigraphy.youngInternStage"
## [79] "gatheringEvent.chronoStratigraphy.youngInternSeries"
## [80] "gatheringEvent.chronoStratigraphy.youngInternErathem"
## [81] "gatheringEvent.chronoStratigraphy.youngInternEonothem"
## [82] "gatheringEvent.chronoStratigraphy.youngChronoName"
## [83] "gatheringEvent.chronoStratigraphy.youngCertainty"
## [84] "gatheringEvent.chronoStratigraphy.oldDatingQualifier"
## [85] "gatheringEvent.chronoStratigraphy.chronoPreferredFlag"
## [86] "gatheringEvent.chronoStratigraphy.oldRegionalSubstage"
## [87] "gatheringEvent.chronoStratigraphy.oldRegionalStage"
## [88] "gatheringEvent.chronoStratigraphy.oldRegionalSeries"
## [89] "gatheringEvent.chronoStratigraphy.oldInternSystem"
## [90] "gatheringEvent.chronoStratigraphy.oldInternSubstage"
## [91] "gatheringEvent.chronoStratigraphy.oldInternStage"
## [92] "gatheringEvent.chronoStratigraphy.oldInternSeries"
## [93] "gatheringEvent.chronoStratigraphy.oldInternErathem"
## [94] "gatheringEvent.chronoStratigraphy.oldInternEonothem"
## [95] "gatheringEvent.chronoStratigraphy.oldChronoName"
## [96] "gatheringEvent.chronoStratigraphy.chronoIdentifier"
## [97] "gatheringEvent.chronoStratigraphy.oldCertainty"
## [98] "gatheringEvent.bioStratigraphy.youngBioDatingQualifier"
## [99] "gatheringEvent.bioStratigraphy.youngBioName"
## [100] "gatheringEvent.bioStratigraphy.youngFossilZone"
## [101] "gatheringEvent.bioStratigraphy.youngFossilSubZone"
## [102] "gatheringEvent.bioStratigraphy.youngBioCertainty"
## [103] "gatheringEvent.bioStratigraphy.youngStratType"
## [104] "gatheringEvent.bioStratigraphy.bioDatingQualifier"
## [105] "gatheringEvent.bioStratigraphy.bioPreferredFlag"
## [106] "gatheringEvent.bioStratigraphy.rangePosition"
## [107] "gatheringEvent.bioStratigraphy.oldBioName"
## [108] "gatheringEvent.bioStratigraphy.bioIdentifier"
## [109] "gatheringEvent.bioStratigraphy.oldFossilzone"
## [110] "gatheringEvent.bioStratigraphy.oldFossilSubzone"
## [111] "gatheringEvent.bioStratigraphy.oldBioCertainty"
## [112] "gatheringEvent.bioStratigraphy.oldBioStratType"
## [113] "gatheringEvent.lithoStratigraphy.qualifier"
## [114] "gatheringEvent.lithoStratigraphy.preferredFlag"
## [115] "gatheringEvent.lithoStratigraphy.member2"
## [116] "gatheringEvent.lithoStratigraphy.member"
## [117] "gatheringEvent.lithoStratigraphy.informalName2"
## [118] "gatheringEvent.lithoStratigraphy.informalName"
## [119] "gatheringEvent.lithoStratigraphy.importedName2"
## [120] "gatheringEvent.lithoStratigraphy.importedName1"
## [121] "gatheringEvent.lithoStratigraphy.lithoIdentifier"
## [122] "gatheringEvent.lithoStratigraphy.formation2"
## [123] "gatheringEvent.lithoStratigraphy.formationGroup2"
## [124] "gatheringEvent.lithoStratigraphy.formationGroup"
## [125] "gatheringEvent.lithoStratigraphy.formation"
## [126] "gatheringEvent.lithoStratigraphy.certainty2"
## [127] "gatheringEvent.lithoStratigraphy.certainty"
## [128] "gatheringEvent.lithoStratigraphy.bed2"
## [129] "gatheringEvent.lithoStratigraphy.bed"
## [130] "informationWithheld"
## [131] "dataGeneralizations"
## [132] "modified"
## [133] "identifications.taxonRank"
## [134] "identifications.scientificName.fullScientificName"
## [135] "identifications.scientificName.taxonomicStatus"
## [136] "identifications.scientificName.genusOrMonomial"
## [137] "identifications.scientificName.subgenus"
## [138] "identifications.scientificName.specificEpithet"
## [139] "identifications.scientificName.infraspecificEpithet"
## [140] "identifications.scientificName.infraspecificMarker"
## [141] "identifications.scientificName.nameAddendum"
## [142] "identifications.scientificName.authorshipVerbatim"
## [143] "identifications.scientificName.author"
## [144] "identifications.scientificName.year"
## [145] "identifications.scientificName.scientificNameGroup"
## [146] "identifications.scientificName.references.titleCitation"
## [147] "identifications.scientificName.references.citationDetail"
## [148] "identifications.scientificName.references.uri"
## [149] "identifications.scientificName.references.author.agentText"
## [150] "identifications.scientificName.references.author.fullName"
## [151] "identifications.scientificName.references.author.organization.agentText"
## [152] "identifications.scientificName.references.author.organization.name"
## [153] "identifications.scientificName.references.publicationDate"
## [154] "identifications.scientificName.experts.agentText"
## [155] "identifications.scientificName.experts.fullName"
## [156] "identifications.scientificName.experts.organization.agentText"
## [157] "identifications.scientificName.experts.organization.name"
## [158] "identifications.typeStatus"
## [159] "identifications.dateIdentified"
## [160] "identifications.defaultClassification.domain"
## [161] "identifications.defaultClassification.subKingdom"
## [162] "identifications.defaultClassification.kingdom"
## [163] "identifications.defaultClassification.phylum"
## [164] "identifications.defaultClassification.subPhylum"
## [165] "identifications.defaultClassification.superClass"
## [166] "identifications.defaultClassification.className"
## [167] "identifications.defaultClassification.subClass"
## [168] "identifications.defaultClassification.superOrder"
## [169] "identifications.defaultClassification.order"
## [170] "identifications.defaultClassification.subOrder"
## [171] "identifications.defaultClassification.infraOrder"
## [172] "identifications.defaultClassification.superFamily"
## [173] "identifications.defaultClassification.family"
## [174] "identifications.defaultClassification.subFamily"
## [175] "identifications.defaultClassification.tribe"
## [176] "identifications.defaultClassification.subTribe"
## [177] "identifications.defaultClassification.genus"
## [178] "identifications.defaultClassification.subgenus"
## [179] "identifications.defaultClassification.specificEpithet"
## [180] "identifications.defaultClassification.infraspecificEpithet"
## [181] "identifications.defaultClassification.infraspecificRank"
## [182] "identifications.systemClassification.rank"
## [183] "identifications.systemClassification.name"
## [184] "identifications.vernacularNames.name"
## [185] "identifications.vernacularNames.language"
## [186] "identifications.vernacularNames.preferred"
## [187] "identifications.vernacularNames.references.titleCitation"
## [188] "identifications.vernacularNames.references.citationDetail"
## [189] "identifications.vernacularNames.references.uri"
## [190] "identifications.vernacularNames.references.author.agentText"
## [191] "identifications.vernacularNames.references.author.fullName"
## [192] "identifications.vernacularNames.references.author.organization.agentText"
## [193] "identifications.vernacularNames.references.author.organization.name"
## [194] "identifications.vernacularNames.references.publicationDate"
## [195] "identifications.vernacularNames.experts.agentText"
## [196] "identifications.vernacularNames.experts.fullName"
## [197] "identifications.vernacularNames.experts.organization.agentText"
## [198] "identifications.vernacularNames.experts.organization.name"
## [199] "identifications.identificationQualifiers"
## [200] "identifications.identifiers.agentText"
## [201] "identifications.taxonomicEnrichments.vernacularNames.name"
## [202] "identifications.taxonomicEnrichments.vernacularNames.language"
## [203] "identifications.taxonomicEnrichments.synonyms.fullScientificName"
## [204] "identifications.taxonomicEnrichments.synonyms.taxonomicStatus"
## [205] "identifications.taxonomicEnrichments.synonyms.genusOrMonomial"
## [206] "identifications.taxonomicEnrichments.synonyms.subgenus"
## [207] "identifications.taxonomicEnrichments.synonyms.specificEpithet"
## [208] "identifications.taxonomicEnrichments.synonyms.infraspecificEpithet"
## [209] "identifications.taxonomicEnrichments.synonyms.authorshipVerbatim"
## [210] "identifications.taxonomicEnrichments.defaultClassification.domain"
## [211] "identifications.taxonomicEnrichments.defaultClassification.subKingdom"
## [212] "identifications.taxonomicEnrichments.defaultClassification.kingdom"
## [213] "identifications.taxonomicEnrichments.defaultClassification.phylum"
## [214] "identifications.taxonomicEnrichments.defaultClassification.subPhylum"
## [215] "identifications.taxonomicEnrichments.defaultClassification.superClass"
## [216] "identifications.taxonomicEnrichments.defaultClassification.className"
## [217] "identifications.taxonomicEnrichments.defaultClassification.subClass"
## [218] "identifications.taxonomicEnrichments.defaultClassification.superOrder"
## [219] "identifications.taxonomicEnrichments.defaultClassification.order"
## [220] "identifications.taxonomicEnrichments.defaultClassification.subOrder"
## [221] "identifications.taxonomicEnrichments.defaultClassification.infraOrder"
## [222] "identifications.taxonomicEnrichments.defaultClassification.superFamily"
## [223] "identifications.taxonomicEnrichments.defaultClassification.family"
## [224] "identifications.taxonomicEnrichments.defaultClassification.subFamily"
## [225] "identifications.taxonomicEnrichments.defaultClassification.tribe"
## [226] "identifications.taxonomicEnrichments.defaultClassification.subTribe"
## [227] "identifications.taxonomicEnrichments.defaultClassification.genus"
## [228] "identifications.taxonomicEnrichments.defaultClassification.subgenus"
## [229] "identifications.taxonomicEnrichments.defaultClassification.specificEpithet"
## [230] "identifications.taxonomicEnrichments.defaultClassification.infraspecificEpithet"
## [231] "identifications.taxonomicEnrichments.defaultClassification.infraspecificRank"
## [232] "identifications.taxonomicEnrichments.sourceSystem.code"
## [233] "identifications.taxonomicEnrichments.taxonId"
## [234] "identifications.preferred"
## [235] "identifications.verificationStatus"
## [236] "identifications.rockType"
## [237] "identifications.associatedFossilAssemblage"
## [238] "identifications.rockMineralUsage"
## [239] "identifications.associatedMineralName"
## [240] "identifications.remarks"
## [241] "associatedMultiMediaUris.accessUri"
## [242] "associatedMultiMediaUris.format"
## [243] "associatedMultiMediaUris.variant"
## [244] "theme"
# get info e.g. for field collectionType
sc$get_field_info()$content$collectionType
## $indexed
## [1] TRUE
##
## $type
## [1] "keyword"
##
## $allowedOperators
## [1] "=" "!=" "EQUALS_IC"
## [4] "NOT_EQUALS_IC" "IN" "NOT_IN"
## [7] "MATCHES" "NOT_MATCHES" "STARTS_WITH"
## [10] "NOT_STARTS_WITH" "STARTS_WITH_IC" "NOT_STARTS_WITH_IC"
##
## $description
## [1] "The subsection of the main Naturalis collection. e.g. Mammals or Fish or Hymenoptera"
# get all settings
sc$get_settings()
## <Response>
## Fields:
## content: object of class <list>
## response: object of class <response> (httr package)
# get specific setting
sc$get_setting("index.max_result_window")$content
## [[1]]
## [1] 50000
# check if operator is allowed
sc$is_operator_allowed("gatheringEvent.continent", "STARTS_WITH")$content
## [1] TRUE
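Once the field information has been retrieved, operator checks for many fields can be done locally without further requests. The sketch below assumes a list shaped like the get_field_info output shown above (the values for collectionType are copied from that output); the helper operator_listed is hypothetical and not part of nbaR.

```r
# Field info for collectionType, copied from the output shown above
field_info <- list(
  collectionType = list(
    indexed = TRUE,
    type = "keyword",
    allowedOperators = c("=", "!=", "EQUALS_IC", "NOT_EQUALS_IC",
                         "IN", "NOT_IN", "MATCHES", "NOT_MATCHES",
                         "STARTS_WITH", "NOT_STARTS_WITH",
                         "STARTS_WITH_IC", "NOT_STARTS_WITH_IC")
  )
)

# Hypothetical helper: TRUE if `operator` is listed for `field`.
# Unknown fields yield FALSE because `%in%` on NULL returns FALSE.
operator_listed <- function(info, field, operator) {
  operator %in% info[[field]]$allowedOperators
}

starts_with_ok <- operator_listed(field_info, "collectionType", "STARTS_WITH")
contains_ok <- operator_listed(field_info, "collectionType", "CONTAINS")
print(c(STARTS_WITH = starts_with_ok, CONTAINS = contains_ok))
```

A local lookup like this compares strings literally, so it does not resolve operator aliases; is_operator_allowed remains the authoritative check.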
All fields can be retrieved with get_paths, and specific information on the fields, such as allowed operators, with get_field_info:
sc$get_paths()$content
sc$get_field_info()$content
Specimen-specific settings can be retrieved with get_settings and a specific setting with get_setting:
sc$get_settings()$content
## $index.max_result_window
## [1] 50000
##
## $specimen.group_by_scientific_name.max_num_buckets
## [1] 10000
sc$get_setting("index.max_result_window")$content
## [[1]]
## [1] 50000
In addition to query services that return JSON-formatted data, the NBA also offers the export of Darwin Core Archive (DwCA) files. These files are zip archives by default; please refer to our official API documentation for more information.
Static download services offer the download of predefined datasets. The sets that are available for download can be queried with dwca_get_data_set_names:
sc$dwca_get_data_set_names()$content
## [1] "amphibia-and-reptilia" "aves"
## [3] "birdsounds" "botany"
## [5] "brachiopoda" "cainozoic-mollusca"
## [7] "chelicerata-and-myriapoda" "cnidaria"
## [9] "coleoptera" "collembola"
## [11] "crustacea" "diptera"
## [13] "echinodermata" "foraminifera"
## [15] "hemiptera" "hymenoptera"
## [17] "lepidoptera" "mammalia"
## [19] "micropaleontology" "mollusca"
## [21] "observations" "odonata"
## [23] "orthopteroidea" "paleobotany"
## [25] "paleontology-invertebrates" "pisces"
## [27] "porifera" "protista"
## [29] "tunicata"
A dataset can then be downloaded using dwca_get_data_set. A filename can be given as an argument; if none is given, the DwCA archive is written to download-YYYY-MM-DDThh:mm.zip in the current working directory.
# download dataset 'porifera' to temporary file
filename <- tempfile(fileext=".zip")
sc$dwca_get_data_set('porifera', filename=filename)
## Query result written to /tmp/RtmpUimw33/file18b22ed3b5c.zip
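To control where archives end up, an explicit timestamped filename can be constructed before calling the download method. The sketch below reproduces the download-YYYY-MM-DDThh:mm.zip pattern mentioned above with base R; how the package builds its default name internally is not shown in this document, so the format string is an assumption. The variant without colons is a suggestion for file systems that do not allow colons in file names.

```r
# Timestamped name following the pattern download-YYYY-MM-DDThh:mm.zip
default_style_name <- format(Sys.time(), "download-%Y-%m-%dT%H:%M.zip")

# Variant without ':' for file systems that reject colons in names
safe_name <- gsub(":", "-", default_style_name, fixed = TRUE)

# Full path inside the session's temporary directory
target <- file.path(tempdir(), safe_name)
print(target)

# Assumed usage (not run here, requires nbaR and network access):
# sc$dwca_get_data_set('porifera', filename = target)
```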
The dynamic download function dwca_query allows for the download of arbitrary sets, defined by the user's query. The arguments to this method are similar to those of query, plus the filename:
# download all specimen of genus 'Hydrochoerus'
filename <- tempfile(fileext = ".zip")
qs <-
  QuerySpec$new(conditions = list(
    QueryCondition$new(
      field = "identifications.defaultClassification.genus",
      operator = "EQUALS",
      value = "Hydrochoerus"
    )
  ))
sc$dwca_query(querySpec = qs, filename = filename)
## Query result written to /tmp/RtmpUimw33/file18b5c69a32e.zip
Query for taxon documents matching given search criteria. Example:
# query for taxa of genus 'Sedum' that are in the Netherlands Soortenregister (NSR)
tc <- TaxonClient$new()
qc <-
  QueryCondition$new(field = "acceptedName.genusOrMonomial",
                     operator = "EQUALS",
                     value = "Sedum")
qc2 <-
  QueryCondition$new(field = "sourceSystem.code",
                     operator = "EQUALS",
                     value = "NSR")
qs <- QuerySpec$new(conditions = list(qc, qc2))
tc$query(qs)
The query function can retrieve at most 50000 taxa at once (this limit is set by the parameter index.max_result_window, whose value can be retrieved with the get_setting method described in the metadata section). To provide access to larger amounts of data, the download_query method takes the same arguments as query but downloads the data as a gzip stream under the hood. Unlike query, download_query returns a list of taxon objects instead of a ResultSet.
For a given query, the count method does not return Taxon objects but merely their count. Example:
# get counts for taxa of genus 'Sedum' that are in the
# Netherlands Soortenregister (NSR)
qc <-
  QueryCondition$new(field = "acceptedName.genusOrMonomial",
                     operator = "EQUALS",
                     value = "Sedum")
qc2 <-
  QueryCondition$new(field = "sourceSystem.code",
                     operator = "EQUALS",
                     value = "NSR")
qs <- QuerySpec$new(conditions = list(qc, qc2))
tc$count(qs)
## <Response>
## Fields:
## content: object of class <integer>
## response: object of class <response> (httr package)
Returns a taxon object given its identifier:
tc$find("[email protected]")$content
## Warning in self$handleError(response): 404 (NOT FOUND)
## No Specimen exists with ID [email protected]
## [1] "404 (NOT FOUND)\nNo Specimen exists with ID [email protected]"
Given a string with comma-separated identifiers, returns a list of taxon objects:
ids <- "2[email protected],[email protected],[email protected],[email protected],[email protected]"
res <- tc$find_by_ids(ids)
The get_distinct_values method takes a specific field as an argument and returns all possible values of that field in the data, together with their frequencies. Example: get all data source systems for taxon objects:
tc$get_distinct_values("sourceSystem.name")$content
## $`Species 2000 - Catalogue Of Life`
## [1] 1998431
##
## $`Naturalis - Dutch Species Register`
## [1] 50346
##
## $`Naturalis - Dutch Caribbean Species Register`
## [1] 8859
Taxon Metadata services include the same standard metadata services as for the other data types:
# get all paths for the Taxon datatype
tc$get_paths()$content
## [1] "sourceSystem.code"
## [2] "sourceSystem.name"
## [3] "sourceSystemId"
## [4] "recordURI"
## [5] "sourceSystemParentId"
## [6] "taxonRank"
## [7] "taxonRemarks"
## [8] "occurrenceStatusVerbatim"
## [9] "acceptedName.fullScientificName"
## [10] "acceptedName.taxonomicStatus"
## [11] "acceptedName.genusOrMonomial"
## [12] "acceptedName.subgenus"
## [13] "acceptedName.specificEpithet"
## [14] "acceptedName.infraspecificEpithet"
## [15] "acceptedName.infraspecificMarker"
## [16] "acceptedName.nameAddendum"
## [17] "acceptedName.authorshipVerbatim"
## [18] "acceptedName.author"
## [19] "acceptedName.year"
## [20] "acceptedName.scientificNameGroup"
## [21] "acceptedName.references.titleCitation"
## [22] "acceptedName.references.citationDetail"
## [23] "acceptedName.references.uri"
## [24] "acceptedName.references.author.agentText"
## [25] "acceptedName.references.author.fullName"
## [26] "acceptedName.references.author.organization.agentText"
## [27] "acceptedName.references.author.organization.name"
## [28] "acceptedName.references.publicationDate"
## [29] "acceptedName.experts.agentText"
## [30] "acceptedName.experts.fullName"
## [31] "acceptedName.experts.organization.agentText"
## [32] "acceptedName.experts.organization.name"
## [33] "defaultClassification.domain"
## [34] "defaultClassification.subKingdom"
## [35] "defaultClassification.kingdom"
## [36] "defaultClassification.phylum"
## [37] "defaultClassification.subPhylum"
## [38] "defaultClassification.superClass"
## [39] "defaultClassification.className"
## [40] "defaultClassification.subClass"
## [41] "defaultClassification.superOrder"
## [42] "defaultClassification.order"
## [43] "defaultClassification.subOrder"
## [44] "defaultClassification.infraOrder"
## [45] "defaultClassification.superFamily"
## [46] "defaultClassification.family"
## [47] "defaultClassification.subFamily"
## [48] "defaultClassification.tribe"
## [49] "defaultClassification.subTribe"
## [50] "defaultClassification.genus"
## [51] "defaultClassification.subgenus"
## [52] "defaultClassification.specificEpithet"
## [53] "defaultClassification.infraspecificEpithet"
## [54] "defaultClassification.infraspecificRank"
## [55] "systemClassification.rank"
## [56] "systemClassification.name"
## [57] "synonyms.fullScientificName"
## [58] "synonyms.taxonomicStatus"
## [59] "synonyms.genusOrMonomial"
## [60] "synonyms.subgenus"
## [61] "synonyms.specificEpithet"
## [62] "synonyms.infraspecificEpithet"
## [63] "synonyms.infraspecificMarker"
## [64] "synonyms.nameAddendum"
## [65] "synonyms.authorshipVerbatim"
## [66] "synonyms.author"
## [67] "synonyms.year"
## [68] "synonyms.scientificNameGroup"
## [69] "synonyms.references.titleCitation"
## [70] "synonyms.references.citationDetail"
## [71] "synonyms.references.uri"
## [72] "synonyms.references.author.agentText"
## [73] "synonyms.references.author.fullName"
## [74] "synonyms.references.author.organization.agentText"
## [75] "synonyms.references.author.organization.name"
## [76] "synonyms.references.publicationDate"
## [77] "synonyms.experts.agentText"
## [78] "synonyms.experts.fullName"
## [79] "synonyms.experts.organization.agentText"
## [80] "synonyms.experts.organization.name"
## [81] "vernacularNames.name"
## [82] "vernacularNames.language"
## [83] "vernacularNames.preferred"
## [84] "vernacularNames.references.titleCitation"
## [85] "vernacularNames.references.citationDetail"
## [86] "vernacularNames.references.uri"
## [87] "vernacularNames.references.author.agentText"
## [88] "vernacularNames.references.author.fullName"
## [89] "vernacularNames.references.author.organization.agentText"
## [90] "vernacularNames.references.author.organization.name"
## [91] "vernacularNames.references.publicationDate"
## [92] "vernacularNames.experts.agentText"
## [93] "vernacularNames.experts.fullName"
## [94] "vernacularNames.experts.organization.agentText"
## [95] "vernacularNames.experts.organization.name"
## [96] "descriptions.description"
## [97] "descriptions.category"
## [98] "descriptions.language"
## [99] "descriptions.author"
## [100] "descriptions.license"
## [101] "descriptions.publicationDate"
## [102] "references.titleCitation"
## [103] "references.citationDetail"
## [104] "references.uri"
## [105] "references.author.agentText"
## [106] "references.author.fullName"
## [107] "references.author.organization.agentText"
## [108] "references.author.organization.name"
## [109] "references.publicationDate"
## [110] "experts.agentText"
## [111] "experts.fullName"
## [112] "experts.organization.agentText"
## [113] "experts.organization.name"
# get info e.g. for field synonyms.author
tc$get_field_info()$content$synonyms.author
## $indexed
## [1] TRUE
##
## $type
## [1] "keyword"
##
## $allowedOperators
## [1] "=" "!=" "EQUALS_IC"
## [4] "NOT_EQUALS_IC" "CONTAINS" "NOT_CONTAINS"
## [7] "IN" "NOT_IN" "MATCHES"
## [10] "NOT_MATCHES" "STARTS_WITH" "NOT_STARTS_WITH"
## [13] "STARTS_WITH_IC" "NOT_STARTS_WITH_IC"
# get all settings
tc$get_settings()
## <Response>
## Fields:
## content: object of class <list>
## response: object of class <response> (httr package)
# get specific setting
tc$get_setting("index.max_result_window")$content
## [[1]]
## [1] 50000
# check if operator is allowed
tc$is_operator_allowed("synonyms.author", "EQUALS")$content
## [1] TRUE
All fields can be retrieved with get_paths, and specific information on the fields, such as allowed operators, with get_field_info:
tc$get_paths()$content
tc$get_field_info()$content
Taxon-specific settings can be retrieved with get_settings and a specific setting with get_setting:
tc$get_settings()$content
## $index.max_result_window
## [1] 50000
##
## $taxon.group_by_scientific_name.max_num_buckets
## [1] 10000
tc$get_setting("index.max_result_window")$content
## [[1]]
## [1] 50000
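To test whether a certain operator can be used in a taxon query (the field must exist in the Taxon data type; otherwise the server reports an error, as in the output below):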
tc$is_operator_allowed("identifications.defaultClassification.genus",
                       "STARTS_WITH")$content
## Warning in self$handleError(response): Status code:500
## Internal Server Error
## Exception: Invalid element "identifications" in path "identifications.defaultClassification.genus"
## Exception type: nl.naturalis.nba.api.NoSuchFieldException
## Full stack trace stored in response object
## [1] "Internal Server Error"
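The error above arises because the path does not exist for this document type. One way to guard against it is to check the path against the vector returned by get_paths before asking about an operator. The sketch below runs offline: is_known_path is a hypothetical helper, and known_paths is a short hand-copied excerpt of paths that appear earlier in this section, not the full list:

```r
# Excerpt of paths as returned by tc$get_paths()$content (copied by hand)
known_paths <- c("synonyms.author",
                 "references.uri",
                 "experts.fullName")

# Hypothetical helper: TRUE only for paths the document type knows about
is_known_path <- function(field, paths) {
  field %in% paths
}

ok_taxon_path <- is_known_path("synonyms.author", known_paths)
ok_specimen_path <- is_known_path("identifications.defaultClassification.genus",
                                  known_paths)
print(c(ok_taxon_path, ok_specimen_path))
```

With a live client, is_operator_allowed would only be called for paths that pass this check.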
The taxonomic information in the NBA is also available as Darwin Core Archive (DwCA) files.
Static download services offer the download of predefined datasets. The sets that are available for download can be queried with dwca_get_data_set_names
:
tc$dwca_get_data_set_names()$content
## [1] "dcsr" "nsr"
A dataset can then be downloaded using dwca_get_data_set
. A filename can be given as argument; if none is given, the archive is written to download-YYYY-MM-DDThh:mm.zip in the current working directory.
# download dataset 'nsr' to temporary file
filename <- tempfile(fileext=".zip")
tc$dwca_get_data_set('nsr', filename=filename)
## Query result written to /tmp/RtmpUimw33/file18b34c3b3ba.zip
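To control where the archive ends up, a timestamped file name can also be built explicitly. The sketch below imitates the default naming pattern described above using base R only; it illustrates the naming scheme and is not taken from the package's code. A dash replaces the colon in the time part, since colons are not allowed in file names on some systems:

```r
# Build a file name of the form "download-YYYY-MM-DDThh-mm.zip"
stamp <- format(Sys.time(), "%Y-%m-%dT%H-%M")
filename <- file.path(tempdir(), paste0("download-", stamp, ".zip"))
print(basename(filename))

# With a live client, the name would be passed on like this:
# tc$dwca_get_data_set("nsr", filename = filename)
```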
The dynamic download function dwca_query
allows for download of arbitrary sets, defined by the user’s query. The arguments to this method are similar to those of query
, plus the filename:
# download all taxa for genus 'Clematis'
filename <- tempfile(fileext = ".zip")
qs <-
QuerySpec$new(conditions = list(
QueryCondition$new(
field = "defaultClassification.genus",
operator = "EQUALS",
value = "Clematis"
)
))
tc$dwca_query(querySpec = qs, filename = filename)
## Query result written to /tmp/RtmpUimw33/file18b71cd969d.zip
The GeoArea query service allows for detailed search within the fields of a GeoArea object. As for the other data types, query parameters can be either given as a list
or as a QuerySpec
object. Below, we make a simple query to get the GeoArea
object for the Netherlands:
# instantiate client for geo areas
gc <- GeoClient$new()
# query for GeoArea of the Netherlands
qc <-
QueryCondition$new(field = "locality",
operator = "EQUALS",
value = "Netherlands")
qs <- QuerySpec$new(conditions = list(qc))
res <- gc$query(qs)
# get item
res$content$resultSet[[1]]$item
## <GeoArea>
## Fields:
## sourceSystem: object of class <SourceSystem>
## sourceSystemId: 1004050
## recordURI:
## id: [email protected]
## areaType: Country
## locality: Netherlands
## shape: list of length 2
## source: World Countries
## isoCode: NLD
## countryNL: Nederland
## Methods:
## fromJSONString
## toJSONString
## fromList
## toList
## print
The method get_geo_json_for_locality is a convenience function to directly extract the GeoJSON object for a specific locality. GeoJSON is a popular format for storing geographical point and polygon data. For example, to extract the GeoJSON polygon representation for the Netherlands:
loc <- "Netherlands"
res <- gc$get_geo_json_for_locality(loc)
Results are returned as a list
by default, but can be easily converted to a JSON string, e.g.
jsonlite::toJSON(res$content)
Geo Metadata services include the same standard metadata services as for the other data types:
# get all paths for the GeoArea datatype
gc$get_paths()$content
## [1] "sourceSystem.code" "sourceSystem.name" "sourceSystemId"
## [4] "recordURI" "areaType" "locality"
## [7] "shape" "source" "isoCode"
## [10] "countryNL"
# get info e.g. for field 'areaType'
gc$get_field_info()$content$areaType
## $indexed
## [1] TRUE
##
## $type
## [1] "keyword"
##
## $allowedOperators
## [1] "=" "!=" "EQUALS_IC"
## [4] "NOT_EQUALS_IC" "IN" "NOT_IN"
## [7] "MATCHES" "NOT_MATCHES" "STARTS_WITH"
## [10] "NOT_STARTS_WITH" "STARTS_WITH_IC" "NOT_STARTS_WITH_IC"
# get all settings
gc$get_settings()
## <Response>
## Fields:
## content: object of class <list>
## response: object of class <response> (httr package)
# get specific setting
gc$get_setting("index.max_result_window")$content
## [[1]]
## [1] 50000
# check if operator is allowed
gc$is_operator_allowed("locality", "STARTS_WITH")$content
## [1] TRUE
Multimedia services are accessible with a MultimediaClient
, instantiated as follows:
mc <- MultimediaClient$new()
As for the other data types, the query
method enables simple and complex queries using a list or a QuerySpec
object to specify query parameters.
# example of multimedia query passing parameters as a list
mc$query(queryParams = list(collectionType = 'Cnidaria'))$content
## <QueryResult>
## Fields:
## totalSize: 42432
## resultSet: list of length 10
## Methods:
## fromJSONString
## toJSONString
## fromList
## toList
## print
# example of multimedia query using QuerySpec: get the first 100
# multimedia items associated with a specimen with name starting with "Qu"
qc <-
QueryCondition$new(field =
"identifications.scientificName.fullScientificName",
operator =
"STARTS_WITH",
value =
"Qu")
qs <- QuerySpec$new(conditions = list(qc), size = 100)
res <- mc$query(qs)
# check if scientific names indeed start with 'Qu'
sapply(res$content$resultSet, function(x)
x$item$identifications[[1]]$scientificName$fullScientificName)
## [1] "Quercus poilanei Hickel & A.Camus"
## [2] "Quercus L."
## [3] "Quiinaceae"
## [4] "Quercus luzoniensis Merr."
## [5] "Qualea homosepala Ducke"
## [6] "Qualea rupicola Ducke"
## [7] "Quadrella incana (Kunth) Iltis & Cornejo"
## [8] "Quadrella incana (Kunth) Iltis & Cornejo"
## [9] "Quesnelia imbricata L.B.Sm."
## [10] "Quercus pubescens Willd."
## [11] "Quercus pubescens Willd."
## [12] "Quercus pubescens Willd."
## [13] "Quercus pubescens Willd."
## [14] "Quercus lanuginosa Lam."
## [15] "Quercus pubescens Willd. var. congesta Presl."
## [16] "Quercus petraea (Matt.) Liebl."
## [17] "Quercus pubescens Willd."
## [18] "Quercus pubescens Willd."
## [19] "Quercus pubescens Willd."
## [20] "Quercus pubescens Willd. subsp. palensis"
## [21] "Quercus macrolepis Kotschy"
## [22] "Quercus petraea (Matt.) Liebl. subsp. iberica"
## [23] "Quercus petraea (Matt.) Liebl."
## [24] "Quercus ilex L."
## [25] "Quercus ilex L."
## [26] "Quercus ilex L."
## [27] "Quercus ilex L."
## [28] "Quercus ilex L."
## [29] "Quercus gemelliflora Blume"
## [30] "Quercus gemelliflora Blume"
## [31] "Quercus harlandii Hance ex Walp."
## [32] "Quercus caudatifolia Merr."
## [33] "Quercus caudatifolia Merr."
## [34] "Quercus polystachya Wall. ex A.DC."
## [35] "Quercus placentaria Blume"
## [36] "Quercus tribuloides Sm."
## [37] "Quercus semecarpifolia Sm."
## [38] "Quercus baloot Griff."
## [39] "Quercus ballota Desf."
## [40] "Quercus incana Bartram"
## [41] "Quisqualis indica L."
## [42] "Quisqualis indica L."
## [43] "Quisqualis indica L."
## [44] "Quercus reticulata Humb. & Bonpl."
## [45] "Quercus oglethorpensis W.H.Duncan"
## [46] "Quercus L."
## [47] "Quercus L."
## [48] "Quercus L."
## [49] "Quercus coccifera L."
## [50] "Quercus coccinea Münchh."
## [51] "Quercus fissa Champ. ex Benth."
## [52] "Quercus L."
## [53] "Quercus ilex L."
## [54] "Quercus L."
## [55] "Quercus robur L."
## [56] "Quercus virginiana Mill."
## [57] "Quercus L."
## [58] "Quercus L."
## [59] "Quercus acuta Thunb."
## [60] "Quercus aliena Blume"
## [61] "Quercus castaneifolia C.A.Mey."
## [62] "Quercus sadleriana R.Br.ter"
## [63] "Quercus ×shirlingii Bush"
## [64] "Quercus stellata Wangenh."
## [65] "Quercus phellos L."
## [66] "Quercus nuttallii E.J.Palmer"
## [67] "Quercus pagodifolia (Elliott) Ashe"
## [68] "Quercus muehlenbergii Engelm."
## [69] "Quercus myrtifolia Willd."
## [70] "Quercus nigra L."
## [71] "Quercus muehlenbergii Engelm."
## [72] "Quercus muehlenbergii Engelm."
## [73] "Quercus macrocarpa Michx."
## [74] "Quercus marilandica Münchh."
## [75] "Quercus michauxii Nutt."
## [76] "Quercus reinwardtii Korth."
## [77] "Quintinia apoensis (Elmer) Schltr."
## [78] "Qualea densiflora Warm."
## [79] "Quassia africana (Baill.) Baill."
## [80] "Quassia africana (Baill.) Baill."
## [81] "Quercus spicata Sm."
## [82] "Quercus sundaica Blume"
## [83] "Quercus coccifera L."
## [84] "Quassia africana (Baill.) Baill."
## [85] "Quassia africana (Baill.) Baill."
## [86] "Quinchamalium procumbens Ruiz & Pav."
## [87] "Quiina florida Tul."
## [88] "Quiina negrensis A.C.Sm."
## [89] "Quiina guianensis Aubl."
## [90] "Quiina cruegeriana Griseb."
## [91] "Quapoya sipapoana Maguire"
## [92] "Quinchamalium majus Brongn."
## [93] "Quinchamalium raimondii Pilg."
## [94] "Quadrella indica (L.) Iltis & Cornejo"
## [95] "Quadrella ferruginea (L.) Iltis & Cornejo"
## [96] "Quadrella indica (L.) Iltis & Cornejo"
## [97] "Qualea schomburgkiana Warm."
## [98] "Quadrella odoratissima (Jacq.) Hutch."
## [99] "Quararibea lasiocalyx (K.Schum.) Vischer"
## [100] "Quararibea"
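Rather than checking such a list by eye, the names can be tested and summarised programmatically. The following base R sketch uses the first five names from the output above, copied by hand so that it runs without a connection; with a live result, sci_names would be the vector returned by the sapply call:

```r
# First five names from the output above, copied by hand
sci_names <- c("Quercus poilanei Hickel & A.Camus",
               "Quercus L.",
               "Quiinaceae",
               "Quercus luzoniensis Merr.",
               "Qualea homosepala Ducke")

# Programmatic check that every name starts with "Qu"
all_match <- all(startsWith(sci_names, "Qu"))
print(all_match)

# The first word of each name is the genus (or a higher taxon, e.g. a family)
first_word <- sub(" .*$", "", sci_names)
counts <- sort(table(first_word), decreasing = TRUE)
print(counts)
```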
The method get_distinct_values returns all values present for a certain field, together with their counts. Example: retrieve all distinct licenses and their counts:
mc$get_distinct_values("license")$content
## $`CC0 1.0`
## [1] 5241130
##
## $`CC BY-NC-ND 4.0`
## [1] 5052428
##
## $`All rights reserved`
## [1] 1772984
##
## $`CC BY-NC-SA 4.0`
## [1] 973577
##
## $`CC BY-NC 4.0`
## [1] 501427
##
## $`CC BY 4.0`
## [1] 269088
##
## $`CC BY-ND 4.0`
## [1] 126687
##
## $`CC BY-NC-SA 3.0`
## [1] 125758
##
## $`CC BY-NC-ND 2.5`
## [1] 97456
##
## $`CC BY-SA 4.0`
## [1] 57608
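The content is a named list, which is often easier to work with as a data frame. A minimal sketch in base R, using three of the counts shown above (copied by hand); the share column is relative to these three licenses only and is purely illustrative:

```r
# Three of the counts shown above, copied by hand into a named list
licenses <- list(`CC0 1.0` = 5241130,
                 `CC BY-NC-ND 4.0` = 5052428,
                 `All rights reserved` = 1772984)

# Flatten to a named numeric vector and build a data frame
counts <- unlist(licenses)
df <- data.frame(license = names(counts),
                 count = unname(counts),
                 stringsAsFactors = FALSE)

# Share in percent, relative to the licenses listed here only
df$share <- round(100 * df$count / sum(df$count), 1)
print(df[order(-df$count), ])
```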
All fields can be retrieved with get_paths
and specific information on the fields, such as the allowed operators, with get_field_info:
mc$get_paths()$content
mc$get_field_info()$content
Metadata services provide miscellaneous information about the data available via the NBA. Note that there is also type-specific metadata for each data type (e.g. Specimen
) which can be retrieved with the specific client of that class. Here we show the available methods for the MetadataClient
which gives general, non-type specific metadata. The client is instantiated in the standard way:
mc <- MetadataClient$new()
The vocabularies for some fields are controlled by dictionaries with allowed values. For the sex of a museum specimen, for instance, only a fixed set of terms such as male, female, mixed and hermaphrodite may be assigned to the specimen. The fields for which controlled lists are available can be retrieved as follows:
mc$get_controlled_lists()$content
## [1] "AreaClass" "License" "LicenseType"
## [4] "RelationType" "Sex" "SpatialDatum"
## [7] "SpecimenTypeStatus" "TaxonomicStatus"
For each field that has a controlled vocabulary, there is a separate function to retrieve the allowed values:
mc$get_controlled_list_taxonomic_status()$content
## [1] "accepted name" "alternative name"
## [3] "ambiguous synonym" "basionym"
## [5] "homonym" "invalid name"
## [7] "misapplied name" "misidentification"
## [9] "misspelled name" "nomen nudum"
## [11] "preferred name" "provisionally accepted name"
## [13] "synonym"
mc$get_controlled_list_specimen_type_status()$content
## [1] "allotype" "epitype" "hapantotype" "holotype"
## [5] "isoepitype" "isolectotype" "isoneotype" "isosyntype"
## [9] "isotype" "lectotype" "neotype" "paratype"
## [13] "paralectotype" "syntype" "topotype" "type"
mc$get_controlled_list_sex()$content
## [1] "male" "female" "mixed" "hermaphrodite"
## [5] "unknowable" "undetermined"
# 'PhaseOrStage' is not among the controlled lists returned above, so this call fails
mc$get_controlled_list_phase_or_stage()$content
## Warning in self$handleError(response): javax.ws.rs.NotFoundException:
## RESTEASY003210: Could not find resource for full path: http://
## api.biodiversitydata.nl/v2/metadata/getControlledList/PhaseOrStage
## [1] "javax.ws.rs.NotFoundException: RESTEASY003210: Could not find resource for full path: http://api.biodiversitydata.nl/v2/metadata/getControlledList/PhaseOrStage"
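The failing call above can be avoided by checking a list name against the result of get_controlled_lists first. The sketch below runs offline on the list names shown earlier. The helper controlled_list_method is hypothetical, and the mapping from list name to method name is inferred from the calls in this section rather than taken from the package source:

```r
# Controlled lists as returned by mc$get_controlled_lists()$content
controlled_lists <- c("AreaClass", "License", "LicenseType",
                      "RelationType", "Sex", "SpatialDatum",
                      "SpecimenTypeStatus", "TaxonomicStatus")

# Hypothetical helper: derive the method name from a list name, e.g.
# "SpecimenTypeStatus" becomes "get_controlled_list_specimen_type_status"
controlled_list_method <- function(list_name) {
  snake <- tolower(gsub("([a-z])([A-Z])", "\\1_\\2", list_name))
  paste0("get_controlled_list_", snake)
}

method_name <- controlled_list_method("SpecimenTypeStatus")
print(method_name)

# Only call the service for names the server actually reports
phase_available <- "PhaseOrStage" %in% controlled_lists
print(phase_available)
```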
To maintain data integrity, dates have to be coded in specific formats in our systems. For instance, yyyy-MM-dd is a valid format. Allowed formats can be retrieved as follows:
mc$get_allowed_date_formats()$content
## [1] "yyyy-MM-dd'T'HH:mm:ssZ" "yyyy-MM-dd'T'HH:mm[:ss][.SSS]Z"
## [3] "yyyy-MM-dd'T'HH:mm[:ss]" "yyyy-MM-dd HH:mm[:ss]"
## [5] "yyyy-MM-dd" "yyyy-MM"
## [7] "yyyy"
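The formats above use Java-style patterns, whereas R's date functions use strptime-style codes. As a rough sketch, a value can be checked against the yyyy-MM-dd format on the client side before it is used in a query. The helper is_iso_date is hypothetical and covers only this one format:

```r
# Hypothetical helper: TRUE if `x` is a real calendar date written as yyyy-MM-dd
is_iso_date <- function(x) {
  # the shape check rejects strings such as "15-03-2017"
  has_shape <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  # as.Date returns NA when the string cannot be parsed as a date
  parsed <- as.Date(x, format = "%Y-%m-%d")
  has_shape & !is.na(parsed)
}

valid <- is_iso_date(c("2017-03-15", "15-03-2017", "2017-13-01"))
print(valid)
```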
The method get_rest_services
returns a list of all services in the NBA as objects of type RestService
.
# get the first REST service in the services list
mc$get_rest_services()$content[[1]]
## <RestService>
## Fields:
## endPoint: /
## method: GET
## consumes:
## produces: list of length 1
## url: https://api.biodiversitydata.nl/v2/
## Methods:
## fromJSONString
## toJSONString
## fromList
## toList
## print
Similar to the document-specific metadata services, we can get general settings with the MetadataClient
:
# get all settings
mc$get_settings()$content
## $operator.contains.min_term_length
## [1] 3
##
## $operator.contains.max_term_length
## [1] 15
# get value for specific setting
mc$get_setting("operator.contains.min_term_length")$content
## [[1]]
## [1] 3
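These two settings suggest that search terms used with the CONTAINS operator must have a length within the given limits; that reading is an assumption based on the setting names. Under that assumption, a term could be checked before a query is sent. The helper contains_term_ok is hypothetical, and the settings list is copied by hand from the output above:

```r
# Values as returned by mc$get_settings()$content above
settings <- list(operator.contains.min_term_length = 3,
                 operator.contains.max_term_length = 15)

# Hypothetical helper: is `term` within the length limits for CONTAINS?
contains_term_ok <- function(term, settings) {
  n <- nchar(term)
  n >= settings$operator.contains.min_term_length &
    n <= settings$operator.contains.max_term_length
}

term_ok <- contains_term_ok(c("Qu", "Quercus", "Quercus pubescens Willd."),
                            settings)
print(term_ok)
```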