Getting Occurrence Counts From GBIF
John Waller
2023-03-02
Source: vignettes/occ_counts.Rmd
This article requires rgbif >= v3.7.6.
It can sometimes be useful to know the number of occurrences for a country, species, basis of record, or year. With occ_count() it is possible to get simple occurrence counts for a wide variety of queries.
Running occ_count() with no arguments gives the total number of occurrences mediated by GBIF.
occ_count() # should be over 2 billion!
occ_count() uses the same interface as occ_search(), so almost any query that works for occ_search() will work for occ_count(). In fact, occ_count() is just a short version of occ_search(limit=0)$meta$count.
Get the total number of bird occurrences mediated by GBIF.
# both calls should give the same result
occ_count(scientificName="Aves")
occ_search(scientificName="Aves",limit=0)$meta$count
It is usually better to use a taxonKey rather than a scientific name. Note the use of ; to supply multiple values in the same query.
# 212 is the key for Aves: name_backbone("Aves")$usageKey
occ_count(taxonKey=212)
# total count of birds and insects
occ_count(taxonKey="212;216")
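If your keys are already in an R vector, base R's paste() can build the ;-separated string. The sketch below uses a hypothetical helper, collapse_values(), which is not part of rgbif. It only builds the string, so it runs without network access.

```r
# Hypothetical helper (not part of rgbif): collapse several values into the
# single ";"-separated string that occ_count() accepts for one parameter.
collapse_values <- function(values) {
  if (length(values) == 0) stop("need at least one value")
  paste(values, collapse = ";")
}

taxon_keys <- collapse_values(c(212, 216))
taxon_keys  # "212;216"

# With rgbif and network access, this gives the combined count of birds and insects:
# occ_count(taxonKey = taxon_keys)
```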
It is possible to get counts by country or area using the appropriate 2-letter (ISO 3166-1 alpha-2) country code. See enumeration_country() for the available codes.
# occurrences in Denmark
occ_count(country="DK")
# occurrences in Denmark and United States
occ_count(country="DK;US")
# occurrences in Denmark, United States, Mexico
occ_count(country="DK;US;MX")
# number of occurrences published by the United States
occ_count(publishingCountry="US")
# number of occurrences published by the United States and Japan
occ_count(publishingCountry="US;JP")
# number of repatriated records in India
occ_count(repatriated = TRUE,country="IN")
# number of insect occurrence records published by Japan
occ_count(taxonKey=216,publishingCountry="JP")
# number of specimen insect occurrence records published by Japan between the years 1900-2000
occ_count(publishingCountry="JP",basisOfRecord="PRESERVED_SPECIMEN",taxonKey=216,year="1900,2000")
Some occ_search() parameters accept a range of values, and these will also work for occ_count(). A comma (,) is used to define a range, such as year="1900,2000", and * can stand for an open bound, as in "2000,*".
Note that ‘year’ means the year when the occurrence was recorded or collected, not when it was published to GBIF.
# number of occurrences recorded or collected between 1800 and 1900
occ_count(year="1800,1900")
# recorded or collected in 2023
occ_count(year=2023)
# occurrences with a coordinate uncertainty of 10 m or less
occ_count(coordinateUncertaintyInMeters = "0,10")
# within 2 km of a known country (iso2) centroid
occ_count(distanceFromCentroidInMeters="0,2000")
# within 2 km of a known country (iso2) centroid in Sweden
occ_count(distanceFromCentroidInMeters="0,2000",country="SE")
# 2 km or more from a known country (iso2) centroid in Sweden
occ_count(distanceFromCentroidInMeters="2000,*",country="SE")
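Range strings can also be built programmatically. The sketch below uses a hypothetical helper, make_range(), which is not part of rgbif. It defaults either bound to "*"; the "2000,*" form appears in the example above, while using "*" as the lower bound is an assumption about the GBIF API that you should check before relying on it.

```r
# Hypothetical helper (not part of rgbif): build a "lower,upper" range string
# for range parameters such as year or distanceFromCentroidInMeters.
# "*" marks an open bound (a "*" lower bound is assumed to be accepted by GBIF).
make_range <- function(lower = "*", upper = "*") {
  paste(lower, upper, sep = ",")
}

make_range(1900, 2000)    # "1900,2000"
make_range(lower = 2000)  # "2000,*"

# With rgbif and network access:
# occ_count(distanceFromCentroidInMeters = make_range(lower = 2000), country = "SE")
```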
Note that occ_count() ignores missing values: if a publisher has not filled in a field, that record will not be counted when you filter on that field. For example, it is common for occurrence publishers to leave coordinateUncertaintyInMeters blank, but not very common to leave the coordinate fields empty.
Here are some other interesting occurrence counts:
# recorded by John Waller
occ_count(recordedBy="John Waller")
# exactly on 0,0
occ_count(decimalLatitude=0, decimalLongitude=0)
# published using the DiGIR protocol
occ_count(protocol = "DIGIR")
# with images
occ_count(mediaType = 'StillImage')
# number of occurrences with IUCN Red List status "critically endangered"
occ_count(iucnRedListCategory="CR")
# counts by verbatim name supplied by the occurrence publisher
occ_count(verbatimScientificName="Calopteryx splendens;Calopteryx virgo")
# count within a WKT polygon
occ_count(geometry="POLYGON((24.70938 48.9221,24.71056 48.92175,24.71107 48.92296,24.71002 48.92318,24.70938 48.9221))")
Some occ_search() queries do not work with occ_count(). It is not possible to give occ_count() multiple values in the form c("a","b"), because in occ_search() this performs two separate requests and returns two separate counts. occ_count() is designed to return a single number, so querying with multiple values in this form is not supported.
# will give ERROR
# occ_count(scientificName=c("Calopteryx splendens","Calopteryx virgo"))
# will work but will give the total count of both species.
occ_count(scientificName="Calopteryx splendens;Calopteryx virgo")
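If you want a separate count for each value rather than one combined total, one option is to call occ_count() once per value. The sketch below uses a hypothetical helper, count_each(), which is not part of rgbif; it takes the counting function as an argument so that it can be checked offline with a stand-in function. Note that with occ_count() it makes one request per value.

```r
# Hypothetical helper (not part of rgbif): call a counting function once per
# value, so each value gets its own count instead of one combined total.
count_each <- function(values, count_fun) {
  counts <- vapply(values, function(v) as.numeric(count_fun(v)), numeric(1))
  names(counts) <- values
  counts
}

# With rgbif and network access (one request per species):
# count_each(c("Calopteryx splendens", "Calopteryx virgo"),
#            function(v) occ_count(scientificName = v))

# Offline check, using nchar() as a stand-in counting function:
count_each(c("a", "bb"), nchar)
```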
Getting counts using facets
occ_count() also supports querying via the facets interface. Using occ_count(facet="x") returns a data.frame of counts for each value of x.
Each of the calls below returns a table of occurrence counts by year.
occ_count(facet="year")
occ_count(facet="year",facetLimit=400)
occ_count_year()
year | count |
---|---|
2021 | 230817072 |
2020 | 206722953 |
2019 | 172476238 |
Counts from facets are sorted by count, largest first. Use facetLimit to control the number of rows returned.
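Because facet tables are sorted by count, you may want to reorder them, for example chronologically. The sketch below uses a hand-built data.frame holding the three rows printed above as a stand-in for occ_count(facet="year"); it assumes the real result has columns named year and count, as in the printed table.

```r
# Stand-in for occ_count(facet = "year"), built from the rows printed above.
# Assumption: the real result has columns named "year" and "count".
year_counts <- data.frame(
  year  = c("2021", "2020", "2019"),
  count = c(230817072, 206722953, 172476238)
)

# Reorder chronologically instead of by count.
# as.character() first, so this also works if year is stored as a factor.
by_year <- year_counts[order(as.integer(as.character(year_counts$year))), ]
rownames(by_year) <- NULL
by_year
```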
The facets interface uses occ_search() internally, so this table can also be fetched using occ_search(facet="year",occurrenceStatus="PRESENT",limit=0)$facets$year.
This particular count is also available via a custom function, occ_count_year() (see below).
Almost any occ_search() parameter can be used via the facets interface. Facets can be combined with other search filters to produce a custom result.
# top scientificNames from Japan
occ_count(facet="scientificName",country="JP")
# top countries publishing specimen bird records between 1850 and 1880
occ_count(facet="publishingCountry",taxonKey=212,basisOfRecord="PRESERVED_SPECIMEN",year="1850,1880")
# number of present and absent records of elephants
occ_count(facet="occurrenceStatus",scientificName="Elephantidae")
# top 100 datasets publishing occurrences to GBIF
occ_count(facet="datasetKey",facetLimit=100)
# top datasets publishing records located exactly on a country centroid
occ_count(facet="datasetKey",distanceFromCentroidInMeters="0")
# common values for coordinateUncertaintyInMeters for museum specimens
occ_count(facet="coordinateUncertaintyInMeters",basisOfRecord="PRESERVED_SPECIMEN")
# number of bird and insect occurrences in Mexico for each IUCN Red List category
occ_count(facet="iucnRedListCategory",taxonKey="212;216",country="MX")
# most common latitude values mediated by GBIF
occ_count(facet="decimalLatitude")
# top iNaturalist users publishing research-grade obs to GBIF
occ_count(facet="recordedBy",datasetKey="50c9509d-22c7-4a22-a47d-8c48425ef4a7")
# top 100 iNaturalist users from Ukraine
occ_count(facet="recordedBy",datasetKey="50c9509d-22c7-4a22-a47d-8c48425ef4a7",country="UA",facetLimit=100)
# top institutions publishing specimen occurrences to GBIF
occ_count(facet="institutionCode",basisOfRecord="PRESERVED_SPECIMEN")
The only parameters that cannot be faceted are “lastInterpreted”, “eventDate”, and “geometry”. Requesting multiple facets at once is not supported by occ_count(); use occ_search() instead.
Using facets to get species counts
Facets can also be a quick way of getting unique counts for certain queries, such as the number of species.
# unique number of species in Sweden
occ_count(facet="speciesKey",facetLimit=200000,country="SE") %>% nrow()
# unique number of critically endangered (IUCN "CR") species recorded in Sweden from 2000 to 2023, without common geospatial issues
occ_count(facet="speciesKey",iucnRedListCategory="CR",country="SE",hasGeospatialIssue = FALSE,year="2000,2023",facetLimit=200000) %>% nrow()
# unique number of species in this WKT polygon
occ_count(facet="speciesKey",geometry="POLYGON((24.70938 48.9221,24.71056 48.92175,24.71107 48.92296,24.71002 48.92318,24.70938 48.9221))",facetLimit=200000) %>% nrow()
# Will not work since the query returns too many results
# occ_count(facet="speciesKey",facetLimit=500000) %>% nrow()
Note that if your query returns many rows, you may exceed the maximum facetLimit value. I have tested with facetLimit=500000, but larger values may fail.
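When counting unique values with nrow(), a facet table that has as many rows as facetLimit may have been cut off, in which case nrow() is only a lower bound. The sketch below uses a hypothetical helper, facet_rows_complete(), which is not part of rgbif, to flag that case; the offline checks use small stand-in tables.

```r
# Hypothetical helper (not part of rgbif): TRUE if the facet table has fewer
# rows than facetLimit, i.e. it was not cut off by the limit.
facet_rows_complete <- function(facet_table, facet_limit) {
  nrow(facet_table) < facet_limit
}

# With rgbif and network access:
# sp <- occ_count(facet = "speciesKey", facetLimit = 200000, country = "SE")
# if (!facet_rows_complete(sp, 200000)) warning("species count may be truncated")

# Offline checks with a stand-in table of 3 rows:
facet_rows_complete(data.frame(speciesKey = 1:3), facet_limit = 3)   # FALSE
facet_rows_complete(data.frame(speciesKey = 1:3), facet_limit = 10)  # TRUE
```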
Using occ_count_* functions
While facets are powerful and quite useful, they can be slow. For this reason, GBIF also has a few custom API endpoints that return tables of commonly requested occurrence counts. These can be accessed via the occ_count_* family of functions.
occ_count_country() returns a table of total occurrence counts for each country or area.
title | enumName | iso2 | iso3 | isoNumerical | gbifRegion | count |
---|---|---|---|---|---|---|
United States of America | UNITED_STATES | US | USA | 840 | NORTH_AMERICA | 821428365 |
France | FRANCE | FR | FRA | 250 | EUROPE | 136246772 |
Canada | CANADA | CA | CAN | 124 | NORTH_AMERICA | 134162884 |
Sweden | SWEDEN | SE | SWE | 752 | EUROPE | 120223673 |
occ_count_country(publishingCountry="MX") returns a table of counts for each country or area where Mexico has published occurrence records.
title | enumName | iso2 | iso3 | isoNumerical | gbifRegion | count |
---|---|---|---|---|---|---|
Mexico | MEXICO | MX | MEX | 484 | LATIN_AMERICA | 21078158 |
United States of America | UNITED_STATES | US | USA | 840 | NORTH_AMERICA | 124478 |
Guatemala | GUATEMALA | GT | GTM | 320 | LATIN_AMERICA | 43327 |
occ_count_pub_country(country="MX") returns a table of occurrence counts, by publishing country, for occurrences located in Mexico. Note that the value for Mexico in this table will be the same as the one above, since both count records published by Mexico about Mexico.
The examples below clarify the difference:
# occurrences published by Mexico, counted by country or area of occurrence
occ_count_country("MX")
# occurrences published by Denmark, counted by country or area of occurrence
occ_count_country("DK")
# occurrences located in Denmark, counted by publishing country
occ_count_pub_country("DK")
# occurrences located in Mexico, counted by publishing country
occ_count_pub_country("MX")
occ_count_year() returns a table of total occurrence counts for each year in which an occurrence was recorded or collected (not when it was published to GBIF).
year | count |
---|---|
2023 | 2128333 |
2022 | 44899603 |
2021 | 230671677 |
occ_count_basis_of_record() returns a table of occurrence counts for each basis of record type.
basisOfRecord | count |
---|---|
HUMAN_OBSERVATION | 1967253952 |
PRESERVED_SPECIMEN | 212934368 |
MATERIAL_SAMPLE | 51946547 |
OBSERVATION | 23398982 |
OCCURRENCE | 20362535 |