Working With Taxonomic Names
John Waller
2021-12-20
Source:vignettes/taxonomic_names.Rmd
taxonomic_names.Rmd
In order to use GBIF mediated data effectively, you will often need to match a scientific name to the GBIF Backbone Taxonomy.
The goal of name matching is to get back an unambiguous taxonomic key (a number) of the scientific name you are interested in. Having a key makes it easy for GBIF to know what you mean.
name_backbone
or name_backbone_checklist
are the best ways to go from scientific
name to GBIF taxonkey.
name_backbone(name="Calopteryx splendens")
# name_backbone(name="Calopteryx splendens", verbose=TRUE)
This will return a data.frame
of the single best
match for the name you supplied.
The most interesting columns are:
- usageKey: Another name for the GBIF taxonkey.
-
status :
name_backbone
will always return only “ACCEPTED” names. -
matchType : “EXACT”, “HIGHERRANK”, “FUZZY”, or
“NONE” (see below).
- verbatim_name : The name you supplied to GBIF. Useful for matching back to your original data.
A matchType of “HIGHERRANK” usually means the name is not in the GBIF backbone or it is not a species-level name (a genus, family, order …). A matchType of “FUZZY” means that the name you supplied may have been misspelled or is a variant not in the backbone. A matchType of “Exact” means the binomial name appears exactly as spelled by you in the GBIF backbone (note that it ignores authorship info).
If you have multiple names to match, you can use
name_backbone_checklist
.
# This requires the newest version of rgbif
name_list <- c(
"Cirsium arvense (L.) Scop.",
"Calopteryx splendens",
"Puma concolor (Linnaeus, 1771)",
"Ceylonosticta alwisi",
"Fake species (John Waller 2021)",
"Calopteryx")
name_backbone_checklist(name_list)
name_backbone_checklist
will also work with a
data.frame
of name information also known as a
checklist.
name_data <- data.frame(
scientificName = c(
"Cirsium arvense (L.) Scop.", # a plant
"Calopteryx splendens (Harris, 1780)", # an insect
"Puma concolor (Linnaeus, 1771)", # a big cat
"Ceylonosticta alwisi (Priyadarshana & Wijewardhane, 2016)", # newly discovered insect
"Puma concuolor (Linnaeus, 1771)", # a mis-spelled big cat
"Fake species (John Waller 2021)", # a fake species
"Calopteryx" # Just a Genus
),
kingdom = c(
"Plantae",
"Animalia",
"Animalia",
"Animalia",
"Animalia",
"Johnlia",
"Animalia"
))
name_backbone_checklist(name_data)
# To return more than just the 'best' results, run
# name_backbone_checklist(name_data,verbose=TRUE)
When using name_backbone_checklist
with a
data.frame
, you can include higher taxonomic information
(genus, family, order, phylum, kingdom, rank) as columns. The
‘name’ column can also be one of several
commonly used aliases (scientificName, sci_name, names,
species, species_name, sp_name).
name_data <- data.frame(
species = c(
"Cirsium arvense (L.) Scop.", # a plant
"Calopteryx splendens (Harris, 1780)", # an insect
"Puma concolor (Linnaeus, 1771)"
),
kingdom = c(
"Plantae",
"Animalia",
"Animalia"
))
name_backbone_checklist(name_data)
Too many choices problem
When two or more names exist in the GBIF
Backbone Taxonomy that have the same name but
different authorship (homotypic synonyms), supplying
just the binomial name will result in
matchType : "HIGHERRANK"
.
For example, the binomial name “Glocianus punctiger” has two entries
in the backbone taxonomy. Using the verbose=TRUE
will
return both names.
name_backbone("Glocianus punctiger",verbose=TRUE) # returns more names
# "Glocianus punctiger (C.R.Sahlberg, 1835)"
# "Glocianus punctiger (Gyllenhal, 1837)"
However, giving just the binomial name will return the genus Glocianus, since GBIF doesn’t know which one to choose.
name_backbone("Glocianus punctiger") # matchType : "HIGHERRANK"
Since name_backbone
is designed to give back the best
match, it’s not possible for the response to choose between the two
names.
Other name_* functions
There are several functions for finding taxonomic information.
Typically, the function you want to use is name_backbone
or
name_backbone_checklist
, but these other functions can also
be useful in certain situations.
name_suggest
can be useful for looking up subspecies or
partial names. It is the same service that lets gbif.org guess which name you are typing
in the occurrence
search.
name_suggest("Calopteryx splendens")
name_lookup
can be sometimes useful for seeing what is
available in other checklists.
name_lookup("Calopteryx splendens")$data
name_usage
is a catch all function that does a lot.
?name_usage
for more examples.
For example, name_usage
can be used for looking up all
of the order, families, or genera in a higher-rank group.
library(dplyr)
# all bird genera, families, and orders
name_usage(212,data="children",limit=200)$data %>%
filter(!is.na(nubKey)) %>% # only things with a GBIF backbone nubKey
glimpse()
name_usage
can be used for looking up the common
names or vernacular names.
name_usage(key=212, data="vernacularNames")$data # the common names for birds
Further reading
Read more about how the GBIF backbone is made here.