Getting Occurrence Counts From GBIF
This article requires rgbif >= v3.7.6
It can sometimes be useful to know the number of occurrences for a country, species, basis of record, or year. With
occ_count() it is possible to get simple occurrence counts for a wide variety of queries.
occ_count() with no arguments will give the total number of occurrences mediated by GBIF.
occ_count() # should be over 2 billion!
occ_count() uses the same interface as occ_search(), so almost any query that works for occ_search() will work for occ_count(). In fact, occ_count() is just a short version of occ_search(limit=0)$meta$count.
Get the total number of bird occurrences mediated by GBIF.
# should give the same result
occ_count(scientificName="Aves")
occ_search(scientificName="Aves",limit=0)$meta$count
It is usually better to use taxonKeys rather than scientific names. Note the use of ; for multiple values in the same query.
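As a minimal sketch of the ; convention: the helper join_values() below is hypothetical (it is not part of rgbif) and only collapses a vector into one semicolon-separated string. The taxonKeys 212 (birds) and 216 (insects) are the ones used elsewhere in this article. The occ_count() calls are left commented out because they need rgbif and a network connection.

```r
# Hypothetical helper (not part of rgbif): collapse several values into
# the single ";"-separated string used for multi-value queries.
join_values <- function(values) {
  paste(values, collapse = ";")
}

bird_and_insect_keys <- join_values(c(212, 216))
print(bird_and_insect_keys)

# With rgbif installed and a network connection:
# rgbif::occ_count(taxonKey = 212)                   # birds only
# rgbif::occ_count(taxonKey = bird_and_insect_keys)  # birds and insects combined
```

Building the string in one place makes it harder to pass a vector such as c(212, 216) by accident.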
It is possible to get counts by country or area using the appropriate 2-letter country code. See
# occurrences in Denmark
occ_count(country="DK")
# occurrences in Denmark and United States
occ_count(country="DK;US")
# occurrences in Denmark, United States, Mexico
occ_count(country="DK;US;MX")
# number of occurrences published by the United States
occ_count(publishingCountry="US")
# number of occurrences published by the United States and Japan
occ_count(publishingCountry="US;JP")
# number of repatriated records in India
occ_count(repatriated = TRUE,country="IN")
# number of insect occurrence records published by Japan
occ_count(taxonKey=216,publishingCountry="JP")
# number of specimen insect occurrence records published by Japan between the years 1900-2000
occ_count(publishingCountry="JP",basisOfRecord="PRESERVED_SPECIMEN",taxonKey=216,year="1900,2000")
Note that ‘year’ means the year when the occurrence was recorded or collected, not when it was published to GBIF.
# number of occurrences between the years 1800 and 1900
occ_count(year="1800,1900")
# recorded or collected in 2023
occ_count(year=2023)
# all occurrences published with a coordinate uncertainty less than 10m
occ_count(coordinateUncertaintyInMeters = "0,10")
# close to a known country (iso2) centroid
occ_count(distanceFromCentroidInMeters="0,2000")
# close to a known country (iso2) centroid in Sweden
occ_count(distanceFromCentroidInMeters="0,2000",country="SE")
# not close to a known country (iso2) centroid in Sweden
occ_count(distanceFromCentroidInMeters="2000,*",country="SE")
occ_count() ignores missing values: if a publisher has not filled in a field, that record is not included in counts filtered on that field. For example, it is common for occurrence publishers to leave coordinateUncertaintyInMeters blank, but uncommon to leave the coordinate fields empty.
Here are some other interesting occurrence counts:
# recorded by John Waller
occ_count(recordedBy="John Waller")
# exactly on 0,0
occ_count(decimalLatitude=0, decimalLongitude=0)
# published using DIGIR format
occ_count(protocol = "DIGIR")
# with images
occ_count(mediaType = 'StillImage')
# number of occurrences with IUCN status "critically endangered"
occ_count(iucnRedListCategory="CR")
# counts by verbatim name supplied by the occurrence publisher
occ_count(verbatimScientificName="Calopteryx splendens;Calopteryx virgo")
# counts by WKT geometries
occ_count(geometry="POLYGON((24.70938 48.9221,24.71056 48.92175,24.71107 48.92296,24.71002 48.92318,24.70938 48.9221))")
Some occ_search() queries do not work with occ_count(). It is not possible to give occ_count() multiple values in the form c("a","b"), since this would perform two separate requests and return two separate counts. occ_count() is designed to give back a single number, so querying with multiple values is not supported.
# will give ERROR
# occ_count(scientificName=c("Calopteryx splendens","Calopteryx virgo"))
# will work but will give the total count of both species.
occ_count(scientificName="Calopteryx splendens;Calopteryx virgo")
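When a separate count per value is wanted, one option is to call occ_count() once for each value. The sketch below assumes a hypothetical helper, count_each(), which is not part of rgbif. Its count_fun argument exists only so the helper can be tried offline with a stand-in function, and it defaults to rgbif::occ_count.

```r
# Hypothetical helper (not part of rgbif): one request per value, returning
# a named vector of counts. `count_fun` defaults to rgbif::occ_count and is
# an argument only so the helper can be tried without a network connection.
count_each <- function(values, param = "scientificName",
                       count_fun = rgbif::occ_count) {
  counts <- vapply(values, function(v) {
    args <- stats::setNames(list(v), param)  # e.g. list(scientificName = v)
    as.numeric(do.call(count_fun, args))
  }, numeric(1))
  stats::setNames(counts, values)
}

# Offline demonstration with a stand-in counter (returns the name length,
# not a real occurrence count).
fake_count <- function(scientificName) nchar(scientificName)
demo <- count_each(c("Calopteryx splendens", "Calopteryx virgo"),
                   count_fun = fake_count)
print(demo)

# With rgbif installed and a network connection:
# count_each(c("Calopteryx splendens", "Calopteryx virgo"))
```

Each value costs one request, so this is only practical for short lists of values.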
occ_count() also supports querying via the facets interface. Using occ_count(facet="x") will return a table of counts.
All below will get a table of occurrence counts by year.
Counts from facets are sorted by count, largest first. Use facetLimit to control the number of rows returned.
The facets interface uses occ_search() internally, so this table can also be fetched using occ_search(facet="year",occurrenceStatus="PRESENT",limit=0)$facets$year. This particular count is also available via a custom function, occ_count_year() (see below).
Almost any occ_search() parameter can be used via the facets interface. Facets can be combined with other search filters to produce a custom result.
# top scientificNames from Japan
occ_count(facet="scientificName",country="JP")
# top countries publishing specimen bird records between 1850 and 1880
occ_count(facet="publishingCountry",taxonKey=212,basisOfRecord="PRESERVED_SPECIMEN",year="1850,1880")
# number of present or absent records of elephants
occ_count(facet="occurrenceStatus",scientificName="Elephantidae")
# top 100 datasets publishing occurrences to GBIF
occ_count(facet="datasetKey",facetLimit=100)
# top datasets publishing country centroids on GBIF
occ_count(facet="datasetKey",distanceFromCentroidInMeters="0")
# common values for coordinateUncertaintyInMeters for museum specimens
occ_count(facet="coordinateUncertaintyInMeters",basisOfRecord="PRESERVED_SPECIMEN")
# number of IUCN-listed bird and insect occurrences in Mexico
occ_count(facet="iucnRedListCategory",taxonKey="212;216",country="MX")
# most common latitude values mediated by GBIF
occ_count(facet="decimalLatitude")
# top iNaturalist users publishing research-grade obs to GBIF
occ_count(facet="recordedBy",datasetKey="50c9509d-22c7-4a22-a47d-8c48425ef4a7")
# top 100 iNaturalist users from Ukraine
occ_count(facet="recordedBy",datasetKey="50c9509d-22c7-4a22-a47d-8c48425ef4a7",country="UA",facetLimit=100)
# top institutions publishing specimen occurrences to GBIF
occ_count(facet="institutionCode",basisOfRecord="PRESERVED_SPECIMEN")
Facets can also be a quick way to get unique counts for certain queries, such as species counts.
library(magrittr) # provides the %>% pipe
# unique number of species in Sweden
occ_count(facet="speciesKey",facetLimit=200000,country="SE") %>% nrow()
# unique number of IUCN critically endangered species recently in Sweden without common geospatial issues
occ_count(facet="speciesKey",iucnRedListCategory="CR",country="SE",hasGeospatialIssue = FALSE,year="2000,2023",facetLimit=200000) %>% nrow()
# unique number of species in this WKT polygon
occ_count(facet="speciesKey",geometry="POLYGON((24.70938 48.9221,24.71056 48.92175,24.71107 48.92296,24.71002 48.92318,24.70938 48.9221))",facetLimit=200000) %>% nrow()
# Will not work since the query returns too many results
# occ_count(facet="speciesKey",facetLimit=500000) %>% nrow()
Note that if your query returns many rows, you may exceed the facetLimit max value. I have tested with facetLimit=500000, but larger values may fail.
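Because facetLimit caps the number of rows returned, a row count equal to the limit may understate the true number of unique values. The sketch below shows one way to guard against that. The helper count_unique_facet() is hypothetical (not part of rgbif), and the toy table, including its column names, is made up for the offline demonstration.

```r
# Hypothetical helper (not part of rgbif): count rows in a facet table and
# warn when the table may have been cut off at facetLimit.
count_unique_facet <- function(facet_table, facet_limit) {
  n <- nrow(facet_table)
  if (n >= facet_limit) {
    warning("Row count reached facetLimit; the true number may be higher.")
  }
  n
}

# Offline demonstration with a made-up three-row table.
toy <- data.frame(speciesKey = c("1", "2", "3"), count = c(30, 20, 10))
n_species <- count_unique_facet(toy, facet_limit = 200000)
print(n_species)

# With rgbif installed and a network connection:
# tbl <- rgbif::occ_count(facet = "speciesKey", facetLimit = 200000, country = "SE")
# count_unique_facet(tbl, facet_limit = 200000)
```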
While the facets are powerful and quite useful, they can be slow. For this reason, GBIF also has a few custom API endpoints for getting a table of useful occurrence counts. These can be accessed via the occ_count_* family of functions.
occ_count_country() will give back a table of total occurrence counts for each country or area.
| United States of America | UNITED_STATES | US | USA | 840 | NORTH_AMERICA | 821428365 |
occ_count_country(publishingCountry="MX") will return a table of counts for the countries in which Mexico has published occurrence records.
| United States of America | UNITED_STATES | US | USA | 840 | NORTH_AMERICA | 124478 |
occ_count_pub_country(country="MX") will return a table of occurrence counts for each country that has published occurrences located in Mexico. Note that the value for Mexico in this table will be the same as the one above.
See the examples below for more clarification:
# the occurrences Mexico has published in other countries
occ_count_country("MX")
# the occurrences Denmark has published in other countries
occ_count_country("DK")
# the occurrences other countries have published in Denmark
occ_count_pub_country("DK")
# the occurrences other countries have published in Mexico
occ_count_pub_country("MX")
occ_count_year() will return a table of total occurrence counts for each year that an occurrence was recorded or collected (not when published to GBIF).
occ_count_basis_of_record() will return a table of occurrence counts for each basis of record type.