This function converts a list of taxonomic hierarchies for individual species into a single species-by-taxonomic-level matrix, calculates a distance matrix based on taxonomy alone, and returns an object containing the resulting phylo object and the distance matrix. See details for more information.
Arguments
- input
List of classification data.frames from the function classification()
- varstep
Vary step lengths between successive levels relative to proportional loss of the number of distinct classes.
- check
If TRUE, remove all redundant levels which are different for all rows or constant for all rows and regard each row as a different basal taxon (species). If FALSE all levels are retained and basal taxa (species) also must be coded as variables (columns). You will get a warning if species are not coded, but you can ignore this if that was your intention.
If TRUE, remove any taxa that are coarser ranks present in other taxa, such as both a genus and a species in that genus in the same tree.
- ...
Further arguments passed on to hclust.
- x
Input object to print or plot - output from class2tree function.
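The effect of check and varstep can be illustrated with a small base R sketch. It loosely follows the idea described for vegan::taxa2dist() and is not the package's actual code; the matrix x and all variable names are hypothetical. Levels that are constant or unique for every row are dropped, and each step length is then made proportional to the share of distinct classes lost when moving up one level.

```r
# Hypothetical species-by-level matrix (rows = species, columns = levels).
x <- data.frame(
  genus   = c("Quercus", "Quercus", "Pinus", "Abies"),
  family  = c("Fagaceae", "Fagaceae", "Pinaceae", "Pinaceae"),
  kingdom = c("Plantae", "Plantae", "Plantae", "Plantae"),
  row.names = c("Q. robur", "Q. alba", "P. lambertiana", "A. alba")
)

S <- nrow(x)
# Number of distinct classes at each level.
rich <- vapply(x, function(col) length(unique(col)), integer(1))

# check = TRUE (sketch): drop levels that are constant for all rows
# (rich == 1) or different for all rows (rich == S).
keep <- rich > 1 & rich < S
rich_kept <- rich[keep]

# Each row is a basal taxon, so class counts run from S down to one root.
counts <- c(species = S, rich_kept, root = 1L)

# varstep = TRUE (sketch): step length proportional to the proportional
# loss of distinct classes between successive levels, scaled to sum to 100.
loss <- -diff(counts) / counts[-length(counts)]
step_var <- 100 * loss / sum(loss)

# varstep = FALSE (sketch): equal step lengths.
step_equal <- rep(100 / length(loss), length(loss))
```

In this toy case the kingdom column is dropped because it is constant, and the variable steps grow toward the root because a larger share of the remaining classes is merged at each level.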
Value
An object of class "classtree" with slots:
phylo - The resulting object, a phylo object
classification - The classification data.frame, with taxa as rows, and different classification levels as columns
distmat - Distance matrix
names - The names of the tips of the phylogeny
Note that when you print the resulting object, you only see the phylo object. You can access the other three slots directly, e.g. output$names.
Details
See vegan::taxa2dist(). Thanks to Jari Oksanen for making the taxa2dist
function and pointing it out, and to Clarke & Warwick (1998, 2001), on whose
work taxa2dist is based.
The taxonomy tree is based not only on the clustering of the taxonomy ranks
(e.g. strain, species, genus, ...), but also on the actual taxon clades
(e.g. mammals, plants, or reptiles). The function proceeds as follows.
First, all possible taxonomy ranks and their corresponding IDs are collected
from the input for each given taxon. Then, the rank vectors of all taxa are
aligned to form a matrix whose columns are the ordered taxonomy ranks across
all taxa and whose rows are the rank vectors of the individual taxa. Next,
the rank matrix is converted into a taxonomy ID matrix; any rank missing for
a taxon receives a pseudo ID derived from the previous rank. Finally, this
taxonomy ID matrix is used to cluster together taxa that have similar
taxonomy hierarchies.
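The steps above can be sketched in base R. This is a simplified illustration, not the function's actual implementation: the input list cls, its made-up IDs, the fixed rank order, and the helper align_ids() are all hypothetical, and the distance used here is a plain count of mismatching ranks rather than the step lengths computed by vegan::taxa2dist().

```r
# Hypothetical inputs shaped like classification() output (name, rank, id).
cls <- list(
  sp_a = data.frame(name = c("Plantae", "Fagales", "Quercus", "Quercus robur"),
                    rank = c("kingdom", "order", "genus", "species"),
                    id = c("1", "10", "100", "1000")),
  sp_b = data.frame(name = c("Plantae", "Quercus", "Quercus alba"),
                    rank = c("kingdom", "genus", "species"),
                    id = c("1", "100", "1001")),
  sp_c = data.frame(name = c("Plantae", "Pinales", "Pinus", "Pinus lambertiana"),
                    rank = c("kingdom", "order", "genus", "species"),
                    id = c("1", "20", "200", "2000"))
)

# Step 1: all ranks seen in the input, ordered coarse to fine
# (the ordering is assumed to be known here).
ranks <- c("kingdom", "order", "genus", "species")

# Steps 2-3: align each taxon's IDs to the common rank order and fill a
# missing rank with the ID of the previous (coarser) rank as a pseudo ID.
align_ids <- function(df) {
  ids <- as.character(df$id)[match(ranks, as.character(df$rank))]
  for (i in seq_along(ids)) {
    if (is.na(ids[i]) && i > 1) ids[i] <- ids[i - 1]
  }
  ids
}
id_mat <- t(vapply(cls, align_ids, character(length(ranks))))
colnames(id_mat) <- ranks

# Step 4: pairwise distance = number of ranks whose IDs differ,
# followed by hierarchical clustering.
n <- nrow(id_mat)
d <- matrix(0, n, n, dimnames = list(rownames(id_mat), rownames(id_mat)))
for (j in seq_len(ncol(id_mat))) {
  d <- d + outer(id_mat[, j], id_mat[, j], "!=")
}
hc <- hclust(as.dist(d))
```

In this sketch sp_b has no order rank, so its order column inherits the kingdom ID. The resulting hc object can be passed to plot() to inspect the clustering.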
Examples
if (FALSE) { # \dontrun{
library(taxize)

# Retrieve classifications from ITIS and build a taxonomy-based tree
spnames <- c('Quercus robur', 'Iris oratoria', 'Arachis paraguariensis',
             'Helianthus annuus', 'Madia elegans', 'Lupinus albicaulis',
             'Pinus lambertiana')
out <- classification(spnames, db = 'itis')
tr <- class2tree(out)
plot(tr)

# Same workflow with a larger set of species, using NCBI as the source
spnames <- c('Klattia flava', 'Trollius sibiricus',
             'Arachis paraguariensis',
             'Tanacetum boreale', 'Gentiana yakushimensis', 'Sesamum schinzianum',
             'Pilea verrucosa', 'Tibouchina striphnocalyx', 'Lycium dasystemum',
             'Berkheya echinacea', 'Androcymbium villosum',
             'Helianthus annuus', 'Madia elegans', 'Lupinus albicaulis',
             'Pinus lambertiana')
out <- classification(spnames, db = 'ncbi')
tr <- class2tree(out)
plot(tr)
} # }