Main data types and classes

The data served by the NBA consists of four main data types:

  • Specimen
  • Taxon
  • Multimedia
  • Geo

Additionally, the data type Metadata stores miscellaneous information about NBA settings. Each data type is modelled as an R6 class and therefore has its own members, such as fields and methods. Documentation for a specific class can be retrieved in the standard manner, e.g. ?Specimen. Each class of the data model can be instantiated and has a toJSONString and a toList method, which return the object’s JSON representation and the object’s data as a list, respectively.

# load nbaR
library(nbaR)

# instantiate specimen object
spec <- Specimen$new()
# represent data as JSON or list
spec$toJSONString()
spec$toList()

API Client classes

The interaction with the API is accomplished by the API client classes:

  • SpecimenClient
  • TaxonClient
  • MultimediaClient
  • GeoClient
  • MetadataClient

# initialize client
client <- SpecimenClient$new()

The client class is by default initialized to connect to the base URL http://api.biodiversitydata.nl/v2. For testing purposes, this can be set to a different URL; see ?SpecimenClient for details.

Queries

Concept

With the SpecimenClient created above, the query endpoint for specimens can now be reached via the query function. Query parameters are specified as a named list. For instance, to query for specimen records that have the type status holotype and the sex female, pass a named list as the queryParams parameter to the query function:

# specify two query conditions
l <- list(identifications.typeStatus="holotype", sex="female")

# run query
res <- client$query(queryParams=l)

The query function then returns an object of class Response, which, in turn, has a field content of class QueryResult. From the QueryResult, the single result items can be accessed as follows:

# get first element of results
spec <- res$content$resultSet[[1]]$item

# object should be of type Specimen
class(spec)
## [1] "Specimen" "R6"

Note that by default, both conditions are connected by a logical AND. The queryParams passed as a list thus correspond to basic, human-readable queries. For more advanced queries that contain different logical operators or nested sub-queries, the user can specify the query in a QuerySpec object.
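The access pattern res$content$resultSet[[i]]$item shown above can be applied to every hit to collect one field into a vector. The following is a minimal sketch, not part of nbaR: extract_field is a hypothetical helper, and mock_res is an invented stand-in built from plain lists so that the example runs without network access. With a real Response object the same accessors should apply, assuming the requested field exists on the returned items.

```r
# Mock response mimicking the res$content$resultSet[[i]]$item layout
# (invented values, not real NBA data)
mock_res <- list(content = list(
  totalSize = 2,
  resultSet = list(
    list(item = list(unitID = "MOCK.1", sex = "female")),
    list(item = list(unitID = "MOCK.2", sex = "female"))
  )
))

# Hypothetical helper: pull one field out of every result item.
# Missing fields are returned as NA instead of raising an error.
extract_field <- function(res, field) {
  vapply(res$content$resultSet, function(hit) {
    val <- hit$item[[field]]
    if (is.null(val)) NA_character_ else as.character(val)[1]
  }, character(1))
}

ids <- extract_field(mock_res, "unitID")
print(ids)
```

Because R6 objects are environments, the [[field]] accessor used here works on them in the same way as on plain lists.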

Advanced queries

Using the QuerySpec object

Advanced queries that use operators other than AND, or that nest query conditions, cannot be expressed by simply passing the query parameters as a list. Instead, such a query is modelled as a QuerySpec object, which captures the relationships between multiple query terms. Please also refer to the NBA QuerySpec documentation for more information.

A QuerySpec object usually consists of one or more QueryCondition objects, each specifying a query term. A QueryCondition object usually contains the fields field, operator, and value (see also ?QueryCondition). These fields can be specified in the constructor. If, for example, we want to query for records with a unitID equal to L.4304195, the QueryCondition would look as follows:

# make specimen client instance
client <- SpecimenClient$new()

# specify search in QueryCondition object
qc <-
  QueryCondition$new(field = "unitID",
                     operator = "EQUALS",
                     value = "L.4304195")

Now, a QuerySpec object can be assembled with the QueryCondition(s) passed as a list:

# build QuerySpec using above conditions
qs <- QuerySpec$new(conditions=list(qc))

# do the query
res <- client$query(querySpec=qs)

Below, we show an example of how to nest multiple query conditions. The conditions below select specimens of sex female, family Equidae, and rank species.

# specify multiple conditions
q1 <- QueryCondition$new(field = "sex",
                         operator = "EQUALS",
                         value = "female")
q2 <-
  QueryCondition$new(field = "identifications.defaultClassification.family",
                     operator = "EQUALS",
                     value = "Equidae")
q3 <- QueryCondition$new(field = "identifications.taxonRank",
                         operator = "EQUALS",
                         value = "species")

To extend the constraint so that it also includes specimens of rank subspecies, we can combine the last condition with an additional one through its or field:

# logical disjunction using the 'or' field
q3$or <- list(QueryCondition$new(field = "identifications.taxonRank",
                                 operator="EQUALS",
                                 value="subspecies"))

# build QuerySpec from QueryConditions
qs <- QuerySpec$new(conditions=list(q1, q2, q3))

# call API
res <- client$query(querySpec=qs)
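To make the intended logic of this nested query explicit, the sketch below applies the equivalent filter to a small, invented data frame in base R. It assumes that the top-level conditions are joined by AND (the default mentioned earlier) and that the condition attached via or is an alternative to q3 only. It illustrates the semantics and is not how the NBA evaluates queries.

```r
# Mock records (invented for illustration, not NBA data)
records <- data.frame(
  sex    = c("female", "female", "male", "female"),
  family = c("Equidae", "Equidae", "Equidae", "Felidae"),
  rank   = c("species", "subspecies", "species", "species"),
  stringsAsFactors = FALSE
)

# q1 AND q2 AND (q3 OR the condition attached via q3$or)
matches <- records$sex == "female" &
  records$family == "Equidae" &
  (records$rank == "species" | records$rank == "subspecies")

# Row numbers of the mock records that satisfy the nested query
which(matches)
## [1] 1 2
```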

Size of the query result set

By default, the NBA returns the first 10 hits for a given query. A query without parameters, for instance, has many hits

res <- client$query()
res$content$totalSize
## [1] 48897478

but only the first 10 are returned in the resultSet:

length(res$content$resultSet)
## [1] 10

In order to increase the size of a resultSet, a size parameter can be passed to the constructor of a QuerySpec object. Below, we will get the first 1000 records of the query above:

qs <- QuerySpec$new(size=1000)
res <- client$query(querySpec=qs)
length(res$content$resultSet)
## [1] 1000
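Larger result sets are usually retrieved in pages rather than in one request. The sketch below only computes the start offsets for such pages in base R. page_offsets is a hypothetical helper, and the commented usage assumes that QuerySpec$new also accepts a from parameter for the start offset, which is not shown in this document; please check ?QuerySpec before relying on it.

```r
# Hypothetical helper: start offsets for fetching `total` records
# in pages of `size` records each
page_offsets <- function(total, size) {
  if (total <= 0) return(integer(0))
  seq.int(from = 0L, to = total - 1L, by = size)
}

# Offsets for 2500 records in pages of 1000: 0, 1000 and 2000
offsets <- page_offsets(total = 2500L, size = 1000L)
print(offsets)

# Usage sketch (requires nbaR and network access; `from` is an assumption):
# for (off in offsets) {
#   qs <- QuerySpec$new(from = off, size = 1000)
#   res <- client$query(querySpec = qs)
# }
```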

Operators

In the above examples, we searched for fields that exactly match a given string, using the operator EQUALS specified in the user-defined QueryCondition. For most fields, however, further matching operators are available, e.g. for partial matching or for ignoring case:

## search for specimens of genus 'musa'
qc <- QueryCondition$new(field='identifications.defaultClassification.genus', 
                         operator='EQUALS', 
                         value='musa')
qs <- QuerySpec$new(conditions=list(qc))
res <- client$query(querySpec=qs)

## how many hits?
res$content$totalSize
## [1] 0

## no hits! But: Genus names are capitalised, so
## we will ignore the case using operator EQUALS_IC
qc <- QueryCondition$new(field='identifications.defaultClassification.genus', 
                         operator='EQUALS_IC', 
                         value='musa')
qs <- QuerySpec$new(conditions=list(qc))
res <- client$query(querySpec=qs)

## do we have more hits now?
res$content$totalSize
## [1] 505

The function get_field_info lists, for each field of a given data type, which operators are allowed for that field. Let’s look at the operators that can be used for the field identifications.defaultClassification.genus:

client$get_field_info()$
content$identifications.defaultClassification.genus$
allowedOperators
##  [1] "EQUALS"             "NOT_EQUALS"         "EQUALS_IC"         
##  [4] "NOT_EQUALS_IC"      "CONTAINS"           "NOT_CONTAINS"      
##  [7] "IN"                 "NOT_IN"             "MATCHES"           
## [10] "NOT_MATCHES"        "STARTS_WITH"        "NOT_STARTS_WITH"   
## [13] "STARTS_WITH_IC"     "NOT_STARTS_WITH_IC"
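As a rough intuition for some of these operators, the sketch below shows base R analogues applied to an invented vector of genus names. The mapping is an assumption based on the operator names only; the server-side behaviour (in particular of CONTAINS and MATCHES, which are left out here) is described in the NBA documentation on operators.

```r
# Mock genus values (invented for illustration)
genera <- c("Musa", "Musella", "Ensete")

# EQUALS: exact, case-sensitive comparison
equals <- genera == "musa"

# EQUALS_IC: comparison ignoring case
equals_ic <- tolower(genera) == tolower("musa")

# STARTS_WITH and STARTS_WITH_IC: prefix match, with and without case
starts_with    <- startsWith(genera, "Mus")
starts_with_ic <- startsWith(tolower(genera), tolower("mus"))

# IN: value is one of several candidates
in_op <- genera %in% c("Musa", "Ensete")

# One row per mock value, one column per operator analogue
print(data.frame(genera, equals, equals_ic, starts_with, starts_with_ic, in_op))
```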

A particularly useful operator is IN, which allows matching against multiple values given as a vector:

qc <- QueryCondition$new(field='identifications.defaultClassification.genus', 
                         operator='IN', 
                         value=c("Phoenix", "Trachycarpus"))
qs <- QuerySpec$new(conditions=list(qc))

## print QuerySpec JSON representation
qs$toJSONString()
## {
##   "conditions": [
##     {
##       "field": "identifications.defaultClassification.genus",
##       "operator": "IN",
##       "value": ["Phoenix", "Trachycarpus"]
##     }
##   ]
## }

For numeric or date fields, common comparison operators such as LT (less than), GT (greater than), or BETWEEN are implemented:

## operators for gatheringEvent.dateTimeBegin
client$get_field_info()$
content$gatheringEvent.dateTimeBegin$
allowedOperators
##  [1] "EQUALS"        "NOT_EQUALS"    "EQUALS_IC"     "NOT_EQUALS_IC"
##  [5] "LT"            "LTE"           "GT"            "GTE"          
##  [9] "BETWEEN"       "NOT_BETWEEN"   "IN"            "NOT_IN"

## how many specimens were collected between 1600 and 1700?
qc <- QueryCondition$new(field='gatheringEvent.dateTimeBegin', 
                         operator='BETWEEN', 
                         value=c("1600", "1700"))
qs <- QuerySpec$new(conditions=list(qc))
client$count(querySpec=qs)$content
## [1] 563
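The BETWEEN count above can be repeated for several periods. The sketch below wraps the calls already shown in a hypothetical helper, count_between, and builds the period boundaries locally. The helper is only defined here, not called, so the block runs without nbaR or network access. Whether BETWEEN includes its boundaries is not stated in this document, so adjacent periods may count records on a boundary twice.

```r
# Hypothetical helper wrapping the calls shown above;
# `client` is expected to be a SpecimenClient instance
count_between <- function(client, begin, end) {
  qc <- nbaR::QueryCondition$new(field = "gatheringEvent.dateTimeBegin",
                                 operator = "BETWEEN",
                                 value = c(begin, end))
  qs <- nbaR::QuerySpec$new(conditions = list(qc))
  client$count(querySpec = qs)$content
}

# Century boundaries as pairs of character values, built locally
starts <- seq(1600, 1800, by = 100)
ranges <- lapply(starts, function(s) as.character(c(s, s + 100)))
str(ranges)

# Usage sketch (requires nbaR and network access):
# client <- nbaR::SpecimenClient$new()
# counts <- vapply(ranges,
#                  function(r) count_between(client, r[1], r[2]),
#                  numeric(1))
```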

For more information, please also refer to the NBA documentation on operators.