
Working With Taxonomic Names
John Waller
2021-12-20
Source: vignettes/taxonomic_names.Rmd
In order to use GBIF-mediated data effectively, you will often need to match a scientific name to the GBIF Backbone Taxonomy.
The goal of name matching is to get back an unambiguous taxonomic key (a number) for the scientific name you are interested in. Having a key makes it easy for GBIF to know what you mean.
name_backbone or name_backbone_checklist are the best ways to go from a scientific name to a GBIF taxonkey.
name_backbone(name="Calopteryx splendens")
# name_backbone(name="Calopteryx splendens", verbose=TRUE)
This will return a data.frame of the single best match for the name you supplied.
The most interesting columns are:
- usageKey : Another name for the GBIF taxonkey.
- status : name_backbone will always return only “ACCEPTED” names.
- matchType : “EXACT”, “HIGHERRANK”, “FUZZY”, or “NONE” (see below).
- verbatim_name : The name you supplied to GBIF. Useful for matching back to your original data.
A matchType of “HIGHERRANK” usually means the name is not in the GBIF backbone or it is not a species-level name (a genus, family, order …). A matchType of “FUZZY” means that the name you supplied may have been misspelled or is a variant not in the backbone. A matchType of “EXACT” means the binomial name appears in the GBIF backbone exactly as you spelled it (note that the match ignores authorship info).
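The columns above can be used to join match results back to your own data and to set aside anything that did not match exactly. The following is a minimal sketch in base R that runs offline: `matches` is a hand-made stand-in for a match result, and its names, `usageKey` values, and the `my_data` table are invented placeholders, not real GBIF output.

```r
# Hypothetical stand-in for a name-matching result.
# The names and usageKey values are invented placeholders, not real GBIF output.
matches <- data.frame(
  verbatim_name = c("Name one", "Name too", "Name three", "Name four"),
  matchType     = c("EXACT", "FUZZY", "HIGHERRANK", "NONE"),
  usageKey      = c(1001L, 1002L, 1003L, NA),
  stringsAsFactors = FALSE
)

# Hypothetical original data set, keyed by the name as originally written.
my_data <- data.frame(
  my_name = c("Name one", "Name too", "Name three", "Name four"),
  site    = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Join the match results back to the original rows via verbatim_name.
joined <- merge(my_data, matches,
                by.x = "my_name", by.y = "verbatim_name", all.x = TRUE)

# Keep exact matches; set everything else aside for manual review.
accepted     <- joined[joined$matchType == "EXACT", ]
needs_review <- joined[joined$matchType != "EXACT", ]
```

Whether “FUZZY” matches should be accepted automatically depends on your data; routing them to review, as here, is only one possible choice.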
If you have multiple names to match, you can use
name_backbone_checklist.
# This requires the newest version of rgbif
name_list <- c(
"Cirsium arvense (L.) Scop.",
"Calopteryx splendens",
"Puma concolor (Linnaeus, 1771)",
"Ceylonosticta alwisi",
"Fake species (John Waller 2021)",
"Calopteryx")
name_backbone_checklist(name_list)
name_backbone_checklist will also work with a data.frame of name information, also known as a checklist.
name_data <- data.frame(
scientificName = c(
"Cirsium arvense (L.) Scop.", # a plant
"Calopteryx splendens (Harris, 1780)", # an insect
"Puma concolor (Linnaeus, 1771)", # a big cat
"Ceylonosticta alwisi (Priyadarshana & Wijewardhane, 2016)", # newly discovered insect
"Puma concuolor (Linnaeus, 1771)", # a mis-spelled big cat
"Fake species (John Waller 2021)", # a fake species
"Calopteryx" # Just a Genus
),
kingdom = c(
"Plantae",
"Animalia",
"Animalia",
"Animalia",
"Animalia",
"Johnlia",
"Animalia"
))
name_backbone_checklist(name_data)
# To return more than just the 'best' results, run
# name_backbone_checklist(name_data, verbose=TRUE)
When using name_backbone_checklist with a data.frame, you can include higher taxonomic information (genus, family, order, phylum, kingdom, rank) as columns. The ‘name’ column can also be one of several commonly used aliases (scientificName, sci_name, names, species, species_name, sp_name).
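If an existing table stores names under a column that is not one of these aliases, renaming the column before matching is one option. The sketch below uses base R only; the `field_records` table and its `taxon_label` column are hypothetical.

```r
# Hypothetical table whose name column ("taxon_label") is not one of the aliases.
field_records <- data.frame(
  taxon_label = c("Calopteryx splendens", "Puma concolor"),
  kingdom     = c("Animalia", "Animalia"),
  stringsAsFactors = FALSE
)

# Rename the column to one of the aliases listed in the text.
names(field_records)[names(field_records) == "taxon_label"] <- "scientificName"

# name_backbone_checklist(field_records)  # needs rgbif and a network connection
```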
name_data <- data.frame(
species = c(
"Cirsium arvense (L.) Scop.", # a plant
"Calopteryx splendens (Harris, 1780)", # an insect
"Puma concolor (Linnaeus, 1771)"
),
kingdom = c(
"Plantae",
"Animalia",
"Animalia"
))
name_backbone_checklist(name_data)
Too many choices problem
When two or more entries exist in the GBIF
Backbone Taxonomy that have the same name but
different authorship (homonyms), supplying
just the binomial name will result in
matchType : "HIGHERRANK".
For example, the binomial name “Glocianus punctiger” has two entries
in the backbone taxonomy. Using verbose=TRUE will
return both names.
name_backbone("Glocianus punctiger",verbose=TRUE) # returns more names
# "Glocianus punctiger (C.R.Sahlberg, 1835)"
# "Glocianus punctiger (Gyllenhal, 1837)"
However, giving just the binomial name will return the genus Glocianus, since GBIF doesn’t know which one to choose.
name_backbone("Glocianus punctiger") # matchType : "HIGHERRANK"
Since name_backbone is designed to give back only the single best match, and the binomial alone fits both entries equally well, the response cannot choose between the two names.
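Once the candidates are visible, one way to settle the choice is to select the entry by its author string. The following sketch runs offline on a hand-made stand-in for the verbose result. It assumes the full name with authorship is held in a `scientificName` column, and the `usageKey` values are invented placeholders.

```r
# Hypothetical stand-in for name_backbone("Glocianus punctiger", verbose = TRUE).
# The scientificName strings come from the text above;
# the usageKey values are invented placeholders.
candidates <- data.frame(
  usageKey = c(1L, 2L),
  scientificName = c(
    "Glocianus punctiger (C.R.Sahlberg, 1835)",
    "Glocianus punctiger (Gyllenhal, 1837)"
  ),
  stringsAsFactors = FALSE
)

# Pick the entry by author. fixed = TRUE treats the pattern as a literal string.
wanted <- candidates[grepl("Gyllenhal", candidates$scientificName, fixed = TRUE), ]

# Fail loudly if the author string matches no entry or more than one.
stopifnot(nrow(wanted) == 1)
```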
Other name_* functions
There are several functions for finding taxonomic information.
Typically, the function you want to use is name_backbone or
name_backbone_checklist, but these other functions can also
be useful in certain situations.
name_suggest can be useful for looking up subspecies or
partial names. It is the same service that lets gbif.org guess which name you are typing
in the occurrence
search.
name_suggest("Calopteryx splendens")
name_lookup can sometimes be useful for seeing what is
available in other checklists.
name_lookup("Calopteryx splendens")$data
name_usage is a catch-all function that does a lot.
See ?name_usage for more examples.
For example, name_usage can be used for looking up all
of the orders, families, or genera in a higher-rank group.
library(dplyr)
# all bird genera, families, and orders
name_usage(212,data="children",limit=200)$data %>%
filter(!is.na(nubKey)) %>% # only things with a GBIF backbone nubKey
glimpse()
name_usage can also be used for looking up common
(vernacular) names.
name_usage(key=212, data="vernacularNames")$data # the common names for birds
Further reading
Read more about how the GBIF backbone is made here.