Working With Taxonomic Names
To use GBIF-mediated data effectively, you will often need to match a scientific name to the GBIF Backbone Taxonomy.
The goal of name matching is to get back an unambiguous taxonomic key (a number) for the scientific name you are interested in. Having a key makes it easy for GBIF to know what you mean.
name_backbone and name_backbone_checklist are the best ways to go from a scientific name to a GBIF taxonkey.
library(rgbif)
name_backbone(name="Calopteryx splendens")
# name_backbone(name="Calopteryx splendens", verbose=TRUE)
This will return a data.frame of the single best match for the name you supplied.
The most interesting columns are:
- usageKey : Another name for the GBIF taxonkey.
name_backbone will always return only “ACCEPTED” names.
- matchType : “EXACT”, “HIGHERRANK”, “FUZZY”, or “NONE” (see below).
- verbatim_name : The name you supplied to GBIF. Useful for matching back to your original data.
A matchType of “HIGHERRANK” usually means the name is not in the GBIF backbone or it is not a species-level name (a genus, family, order …). A matchType of “FUZZY” means that the name you supplied may have been misspelled or is a variant not in the backbone. A matchType of “EXACT” means the binomial name appears in the GBIF backbone exactly as you spelled it (note that authorship information is ignored).
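As a rough illustration of how matchType might be used in practice, the sketch below splits a set of results into exact matches and names that need a manual look. The `matches` data.frame is a hypothetical stand-in for the output of name_backbone_checklist, and its usageKey values are made up.

```r
# Hypothetical stand-in for name_backbone_checklist() output;
# the usageKey values are placeholders, not real GBIF keys.
matches <- data.frame(
  verbatim_name = c("Calopteryx splendens", "Puma concuolor",
                    "Calopteryx", "Fake species"),
  usageKey      = c(101L, 102L, 103L, NA),
  matchType     = c("EXACT", "FUZZY", "HIGHERRANK", "NONE"),
  stringsAsFactors = FALSE
)

# Keep exact matches as-is; set everything else aside for review.
accepted     <- matches[matches$matchType == "EXACT", ]
needs_review <- matches[matches$matchType != "EXACT", ]

# Count how many names fell into each match type.
match_counts <- table(matches$matchType)
```

Whether “FUZZY” matches should be accepted automatically depends on your data; reviewing them by hand is the cautious choice assumed here.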
If you have multiple names to match, you can use name_backbone_checklist.
# This requires the newest version of rgbif
name_list <- c(
  "Cirsium arvense (L.) Scop.",
  "Calopteryx splendens",
  "Puma concolor (Linnaeus, 1771)",
  "Ceylonosticta alwisi",
  "Fake species (John Waller 2021)",
  "Calopteryx"
)
name_backbone_checklist(name_list)
name_backbone_checklist will also work with a data.frame of name information, also known as a checklist.
name_data <- data.frame(
  scientificName = c(
    "Cirsium arvense (L.) Scop.", # a plant
    "Calopteryx splendens (Harris, 1780)", # an insect
    "Puma concolor (Linnaeus, 1771)", # a big cat
    "Ceylonosticta alwisi (Priyadarshana & Wijewardhane, 2016)", # newly discovered insect
    "Puma concuolor (Linnaeus, 1771)", # a mis-spelled big cat
    "Fake species (John Waller 2021)", # a fake species
    "Calopteryx" # Just a Genus
  ),
  kingdom = c(
    "Plantae",
    "Animalia",
    "Animalia",
    "Animalia",
    "Animalia",
    "Johnlia",
    "Animalia"
  ))
name_backbone_checklist(name_data)
# To return more than just the 'best' results, run
# name_backbone_checklist(name_data,verbose=TRUE)
When using name_backbone_checklist with a data.frame, you can include higher taxonomic information (genus, family, order, phylum, kingdom, rank) as columns. The ‘name’ column can also be one of several commonly used aliases (scientificName, sci_name, names, species, species_name, sp_name).
name_data <- data.frame(
  species = c(
    "Cirsium arvense (L.) Scop.", # a plant
    "Calopteryx splendens (Harris, 1780)", # an insect
    "Puma concolor (Linnaeus, 1771)" # a big cat
  ),
  kingdom = c(
    "Plantae",
    "Animalia",
    "Animalia"
  ))
name_backbone_checklist(name_data)
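Because verbatim_name holds the name exactly as you supplied it, the results can be joined back to the original checklist. The following is a minimal sketch using base R's merge; the `site` column and the usageKey values are hypothetical, and `matches` stands in for what name_backbone_checklist would return.

```r
# Original checklist with an extra, hypothetical 'site' column.
name_data <- data.frame(
  scientificName = c("Cirsium arvense (L.) Scop.", "Calopteryx"),
  site           = c("A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for name_backbone_checklist(name_data);
# the usageKey values are placeholders, not real GBIF keys.
matches <- data.frame(
  verbatim_name = c("Cirsium arvense (L.) Scop.", "Calopteryx"),
  usageKey      = c(11L, 22L),
  matchType     = c("EXACT", "HIGHERRANK"),
  stringsAsFactors = FALSE
)

# Left join: keep every original row and attach its match, if any.
joined <- merge(name_data, matches,
                by.x = "scientificName", by.y = "verbatim_name",
                all.x = TRUE)
```

Using all.x = TRUE keeps names that found no match, so they remain visible for follow-up rather than silently dropping out.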
Too many choices problem
When two or more entries in the GBIF Backbone Taxonomy share the same name but differ in authorship (homonyms), supplying just the binomial name will result in matchType : "HIGHERRANK".
For example, the binomial name “Glocianus punctiger” has two entries in the backbone taxonomy. Using verbose=TRUE will return both names.
name_backbone("Glocianus punctiger",verbose=TRUE) # returns more names
# "Glocianus punctiger (C.R.Sahlberg, 1835)"
# "Glocianus punctiger (Gyllenhal, 1837)"
However, giving just the binomial name will return the genus Glocianus, since GBIF doesn’t know which one to choose.
name_backbone("Glocianus punctiger") # matchType : "HIGHERRANK"
Because name_backbone is designed to give back only the single best match, the response cannot choose between the two names.
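One way to resolve the ambiguity yourself is to request the verbose results and pick the entry whose authorship you want. The sketch below assumes the verbose output has a scientificName column that includes authorship; `candidates` is a hypothetical stand-in for that output, and `pick_by_author` is an illustrative helper, not part of rgbif.

```r
# Hypothetical stand-in for name_backbone("Glocianus punctiger", verbose=TRUE);
# the usageKey values are placeholders, not real GBIF keys.
candidates <- data.frame(
  usageKey       = c(1L, 2L),
  scientificName = c("Glocianus punctiger (C.R.Sahlberg, 1835)",
                     "Glocianus punctiger (Gyllenhal, 1837)"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not an rgbif function): keep the candidates
# whose full scientific name contains the given author string.
pick_by_author <- function(candidates, author) {
  hit <- grepl(author, candidates$scientificName, fixed = TRUE)
  candidates[hit, , drop = FALSE]
}

chosen <- pick_by_author(candidates, "Gyllenhal")
```

Supplying the full name with authorship directly to name_backbone should presumably avoid the problem as well, since the two entries differ only in their authorship.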
Other name_* functions
There are several functions for finding taxonomic information. Typically, the function you want to use is name_backbone_checklist, but these other functions can also be useful in certain situations.
name_suggest can be useful for looking up subspecies or partial names. It is the same service that lets gbif.org guess which name you are typing in the occurrence search.
name_lookup can sometimes be useful for seeing what is available in other checklists.
name_usage is a catch-all function that does a lot. See ?name_usage for more examples.
name_usage can be used for looking up all of the orders, families, or genera in a higher-rank group.
library(dplyr)
# all bird genera, families, and orders
name_usage(212,data="children",limit=200)$data %>%
  filter(!is.na(nubKey)) %>% # only things with a GBIF backbone nubKey
  glimpse()
name_usage can be used for looking up the common names or vernacular names.
name_usage(key=212, data="vernacularNames")$data # the common names for birds
Read more about how the GBIF backbone is made here.