Taxonomic ranks are “whole thing”, as I like to say when something is complicated/messy. Taxonomic ranks are the relative placement of a taxon in a taxonomic hierarchy. If there was only one taxonomic hierarchy with the same taxonomic rank names life would be so much easier for all of us. Unfortunately, this is not the case. Nearly every source of taxonomic information has slightly different taxonomic rank names they use, even if many are the same across sources. This article highlights some of the sticky messes in taxonomic ranks.
NCBI is weird
NCBI taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy) is an outlier among the taxonomic data sources. They use a lot of rank names that hold no information as to the placement of that taxon within a taxonomic hierarchy. The most common of these is no rank. If you’ve retrieved taxonomic information from NCBI you may have noticed this supposed rank name. When you get data from NCBI in taxize you will most certainly run into this rank; just know that it doesn’t tell you where it falls within a taxonomic hierarchy. You can however use
classification() to hopefully get other rank names that do hold actual information.
Another one that holds no information which has started to become more common with NCBI is clade. It’s started to replace no rank in at least some cases. As far as we can tell this rank name also holds no information about it’s placement within a taxonomic hierarchy.
downstream() and related functions in taxize require a rank to be given to the
downto parameter, which says you want taxonomic names down to a particular rank.
Some ranks are not useful for the
downto parameter. They include “no rank”, “unspecified”, “unranked”, and “clade”; there are possibly more. They can’t be used because they have no rank placement in a hierarchy - they can be anywhere in the hierarchy.
As a work around, try looking at the taxonomic hierarchy for your taxa of interest using
classification() or on the data providers website and see if there’s a taxonomic rank below the rank you want, and use that rank in combination with setting
intermediate=TRUE in your
downstream() function call. From the intermediate data you can fetch data you need.
Rank information within taxize
taxize maintains a taxonomic hierarchy reference within the package. There are two:
rank_ref: data.frame with two columns and around 45 rows (may increase or decrease through time as we incorporate or take away some ranks).
?rank_reffor more information
rank_ref_zoo: same as above but more specifically for zoological rank systems; as of this writing only used for WoRMS;
rank_refis used for all other data sources.
?rank_ref_zoofor more information
head(rank_ref) #> rankid ranks #> 1 01 domain #> 2 05 superkingdom #> 3 10 kingdom #> 4 20 subkingdom #> 5 25 infrakingdom,superphylum #> 6 30 phylum,division
We reference this rank information in
tax_name(), and all of the exported
downstream functions. We use the
rankid numeric column to be able to filter data using
We often reference rank information inside the helper functions
prune_too_low(), utility functions to help filter results from data providers by rank information.
We manage these two datasets using the script at
inst/ignore/rank_ref_script.R in the package, which ends with updating each dataset in the
data/ folder of the package.
Let us know
Let us know by opening an issue (https://github.com/ropensci/taxize/issues) if you think you have run into any issues with rank information. The most common issues we see with ranks is with the
downstream() function and the data source specific functions that support it.