Introduction
taxlist is a package designed to handle and assess taxonomic lists in R, providing an object class and functions implemented in S4. The homonymous object class taxlist was originally designed as a module for taxa recorded in vegetation-plot observations (see vegtable), but became an independent object able to contain not only lists of species but also synonymy, hierarchical taxonomy, and functional traits (attributes of taxa).
The main aims of this package are to keep taxonomic lists consistent (a set of rules is checked by the function validObject()), to enable the rearrangement of such data, and to statistically assess functional traits and other attributes, including taxonomy itself (the function tax2traits() sets taxonomic information as traits).
While this package only includes a function for importing taxonomic lists from Turboveg, almost any data source can be structured as a taxlist object, as long as the information is imported into data frames in an R session and the consistency rules (validity) are respected.
The use of taxlist is recommended for people cleaning raw data before importing it into relational databases, either in the context of taxonomic work or of biodiversity assessments. Conversely, people with relational databases or clean, structured taxonomic lists may use taxlist as a container for this information in R sessions in order to carry out further statistical assessments. Finally, the function print_name() makes taxlist suitable for use in documents produced with rmarkdown and knitr (e.g. reports, manuscripts and check-lists).
The structure of taxlist objects is inspired by the structure of data handled by Turboveg and by relational databases.
Figure: Relational model for taxlist objects (see Alvarez & Luebert 2018).
Installing taxlist
This package is available from the Comprehensive R Archive Network (CRAN) and can be directly installed within an R-session:
install.packages("taxlist", dependencies = TRUE)
Alternatively, the current development version is available from GitHub and can be installed using the package devtools:
A vignette introducing the work with taxlist is installed with this package and can be accessed with the following command in your R session:
vignette("taxlist-intro")
Building taxlist Objects
Objects can be built step by step, as in the following example, which uses as reference the “Ferns of Chile” (original in Spanish: “Helechos de Chile”) by Gunkel (1984). First, we create an empty taxlist object using the function new():
library(taxlist)
#>
#> Attaching package: 'taxlist'
#> The following objects are masked from 'package:base':
#>
#> levels, levels<-, print
Fern <- new("taxlist")
Fern
#> object size: 5.1 Kb
#> validation of 'taxlist' object: TRUE
#>
#> number of taxon usage names: 0
#> number of taxon concepts: 0
#> trait entries: 0
#> number of trait variables: 0
#> taxon views: 0
Then we have to set the respective taxonomic ranks. The levels have to be provided from the lowest to the highest hierarchical level:
levels(Fern) <- c("variety", "species", "genus")
For convenience, we insert the taxa with their respective names in a top-down direction, starting with the highest rank. We use the function add_concept() to add a new taxon. Note that the arguments TaxonName, AuthorName, and Level provide the name of the taxon, the authority of the name, and the taxonomic rank, respectively.
Fern <- add_concept(taxlist = Fern, TaxonName = "Asplenium", AuthorName = "L.", Level = "genus")
summary(Fern, "all")
#> ------------------------------
#> concept ID: 1
#> view ID: none
#> level: genus
#> parent: none
#>
#> # accepted name:
#> 1 Asplenium L.
#> ------------------------------
As you can see, the inserted genus received the concept ID 1 (see TaxonConceptID in the previous figure). To insert a species of this genus, we use the function add_concept() again, but this time we also provide the ID of the parent taxon in the argument Parent.
Fern <- add_concept(Fern,
TaxonName = "Asplenium obliquum", AuthorName = "Forster",
Level = "species", Parent = 1
)
summary(Fern, "Asplenium obliquum")
#> ------------------------------
#> concept ID: 2
#> view ID: none
#> level: species
#> parent: 1 Asplenium L.
#>
#> # accepted name:
#> 2 Asplenium obliquum Forster
#> ------------------------------
In the same way, we can add now two varieties of the inserted species:
Fern <- add_concept(Fern,
TaxonName = c(
"Asplenium obliquum var. sphenoides",
"Asplenium obliquum var. chondrophyllum"
),
AuthorName = c(
"(Kunze) Espinosa",
"(Bertero apud Colla) C. Christense & C. Skottsberg"
),
Level = "variety", Parent = c(2, 2)
)
You may have noticed that the function summary() serves two purposes: on the one hand, it displays meta-information about the whole taxlist object; on the other hand, it shows details of the taxa included in the object. In the latter case, passing the keyword "all" as second argument shows detailed information for every taxon in the object.
Fern
#> object size: 6.2 Kb
#> validation of 'taxlist' object: TRUE
#>
#> number of taxon usage names: 4
#> number of taxon concepts: 4
#> trait entries: 0
#> number of trait variables: 0
#> taxon views: 0
#>
#> concepts with parents: 3
#> concepts with children: 2
#>
#> hierarchical levels: variety < species < genus
#> number of concepts in level variety: 2
#> number of concepts in level species: 1
#> number of concepts in level genus: 1
summary(Fern, "all")
#> ------------------------------
#> concept ID: 1
#> view ID: none
#> level: genus
#> parent: none
#>
#> # accepted name:
#> 1 Asplenium L.
#> ------------------------------
#> concept ID: 2
#> view ID: none
#> level: species
#> parent: 1 Asplenium L.
#>
#> # accepted name:
#> 2 Asplenium obliquum Forster
#> ------------------------------
#> concept ID: 3
#> view ID: none
#> level: variety
#> parent: 2 Asplenium obliquum Forster
#>
#> # accepted name:
#> 3 Asplenium obliquum var. sphenoides (Kunze) Espinosa
#> ------------------------------
#> concept ID: 4
#> view ID: none
#> level: variety
#> parent: 2 Asplenium obliquum Forster
#>
#> # accepted name:
#> 4 Asplenium obliquum var. chondrophyllum (Bertero apud Colla) C. Christense & C. Skottsberg
#> ------------------------------
Indented lists
A feature implemented in version 0.2.1 is the function indented_list(), which provides a clearer display of the hierarchical structure of taxlist objects.
indented_list(Fern)
#> Asplenium L.
#> Asplenium obliquum Forster
#> Asplenium obliquum var. sphenoides (Kunze) Espinosa
#> Asplenium obliquum var. chondrophyllum (Bertero apud Colla) C. Christense & C. Skottsberg
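The hierarchy displayed by indented_list() is fully determined by the parent-child relationships among taxon concepts. The following base-R sketch, which is not the package's implementation, derives a similar indented display from a small data frame mirroring the Fern example; the helper indent_taxa() and the two-space indentation are assumptions made for illustration.

```r
# Base-R sketch (not the implementation of indented_list()): print taxa as an
# indented list by recursively visiting the children of each concept.
fern_tree <- data.frame(
  TaxonConceptID = 1:4,
  TaxonName = c(
    "Asplenium", "Asplenium obliquum",
    "Asplenium obliquum var. sphenoides",
    "Asplenium obliquum var. chondrophyllum"
  ),
  Parent = c(NA, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one line per taxon, indented by its depth
indent_taxa <- function(df, id = NA, depth = 0) {
  # top-level taxa have no parent (NA); otherwise select the children of 'id'
  is_child <- if (is.na(id)) is.na(df$Parent) else df$Parent %in% id
  children <- df[is_child, ]
  out <- character(0)
  for (i in seq_len(nrow(children))) {
    out <- c(
      out,
      paste0(strrep("  ", depth), children$TaxonName[i]),
      indent_taxa(df, children$TaxonConceptID[i], depth + 1)
    )
  }
  out
}

cat(indent_taxa(fern_tree), sep = "\n")
```

Visiting children recursively means each taxon is printed directly below its parent, which is what makes the nesting readable.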
From data frame to taxlist
A more convenient way is to create an object from a data frame that includes both the taxon concepts with their accepted names and the taxonomic ranks with their parent-child relationships. For the last example, the required data frame looks like this:
Fern_df <- data.frame(
TaxonConceptID = 1:4,
TaxonUsageID = 1:4,
TaxonName = c(
"Asplenium", "Asplenium obliquum",
"Asplenium obliquum var. sphenoides",
"Asplenium obliquum var. chondrophyllum"
),
AuthorName = c(
"L.", "Forster", "(Kunze) Espinosa",
"(Bertero apud Colla) C. Christense & C. Skottsberg"
),
Level = c("genus", "species", "variety", "variety"),
Parent = c(NA, 1, 2, 2),
stringsAsFactors = FALSE
)
Fern_df
#> TaxonConceptID TaxonUsageID TaxonName
#> 1 1 1 Asplenium
#> 2 2 2 Asplenium obliquum
#> 3 3 3 Asplenium obliquum var. sphenoides
#> 4 4 4 Asplenium obliquum var. chondrophyllum
#> AuthorName Level Parent
#> 1 L. genus NA
#> 2 Forster species 1
#> 3 (Kunze) Espinosa variety 2
#> 4 (Bertero apud Colla) C. Christense & C. Skottsberg variety 2
Such tables can be prepared in a spreadsheet application and imported into your R session. The first two columns contain the IDs of the taxon concepts and of their respective accepted names. Custom IDs can be used, but taxlist restricts them to integers. Both columns are mandatory for the function df2taxlist(). Note also that the column Parent points to the concept ID of the respective parent taxon. To get the object, we just call df2taxlist(), indicating the sequence of taxonomic ranks (from lowest to highest) in the argument levels.
Fern2 <- df2taxlist(Fern_df, levels = c("variety", "species", "genus"))
#> No values for 'AcceptedName' in 'x'. all names will be considered as accepted names.
Fern2
#> object size: 6.2 Kb
#> validation of 'taxlist' object: TRUE
#>
#> number of taxon usage names: 4
#> number of taxon concepts: 4
#> trait entries: 0
#> number of trait variables: 0
#> taxon views: 0
#>
#> concepts with parents: 3
#> concepts with children: 2
#>
#> hierarchical levels: variety < species < genus
#> number of concepts in level variety: 2
#> number of concepts in level species: 1
#> number of concepts in level genus: 1
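Since a taxlist object has to pass the consistency checks done by validObject(), it may help to check a data frame before converting it. The following sketch uses a hypothetical helper, check_taxa_df(), written in base R; it is not part of taxlist and covers only some of the rules: integer and unique concept IDs, parents pointing to existing concepts, and parents ranked higher than their children, with levels given from the lowest to the highest rank.

```r
# Hypothetical helper (not part of taxlist): checks some of the consistency
# rules on a data frame before it is passed to df2taxlist().
# 'levels' must be ordered from the lowest to the highest rank.
check_taxa_df <- function(df, levels) {
  problems <- character(0)
  ids <- df$TaxonConceptID
  if (!is.numeric(ids) || any(ids %% 1 != 0, na.rm = TRUE))
    problems <- c(problems, "concept IDs must be integers")
  if (anyDuplicated(ids) > 0)
    problems <- c(problems, "duplicated concept IDs")
  # every parent must be an existing concept
  parents <- df$Parent[!is.na(df$Parent)]
  if (!all(parents %in% ids))
    problems <- c(problems, "unknown parent IDs")
  # ranks as positions in 'levels' (a higher number means a higher rank)
  rank <- match(df$Level, levels)
  if (anyNA(rank))
    problems <- c(problems, "unknown taxonomic ranks")
  parent_rank <- rank[match(df$Parent, ids)]
  if (any(parent_rank <= rank, na.rm = TRUE))
    problems <- c(problems, "parent rank not higher than child rank")
  problems
}

# Columns taken from the Fern example
example_df <- data.frame(
  TaxonConceptID = 1:4,
  Level = c("genus", "species", "variety", "variety"),
  Parent = c(NA, 1, 2, 2)
)
check_taxa_df(example_df, levels = c("variety", "species", "genus"))
```

An empty result means none of these rules is violated. Pointing the species to one of the varieties as parent, for example, would be reported as a parent not ranked higher than its child.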
Similar Packages
The package taxlist shares similar objectives with the package taxa, but uses a different approach to object-oriented programming in R: taxlist applies S4 classes, while taxa uses R6. Additionally, taxa is rather developer-oriented, while taxlist is rather user-oriented.
In the following cases you may prefer to use taxlist:
- When you need an automatic check on the consistency of information regarding taxonomic ranks and parent-child relationships (parents have to be of a higher rank than their children), as well as on non-duplicated combinations of names and authors. Such checks are done by the function validObject().
- When you foresee statistical assessments of taxonomic diversity or taxon properties (chorology, conservation status, functional traits, etc.).
- When you seek to produce documents using rmarkdown, for instance guide books or check-lists. In article manuscripts, too, taxonomic names referring to a taxon concept can easily be formatted by the function print_name().
- When importing taxonomic lists from databases stored in Turboveg 2.
- When you intend to use the package vegtable for handling and assessing biodiversity records, especially vegetation-plot data. In that case, taxonomic lists will be formatted by taxlist as a slot within a vegtable object.
Rmarkdown Integration
As mentioned before, taxlist
objects can be also used for writing rmarkdown documents (see this poster). For instance you can insert your objects at the beginning of the document with a hidden chunk:
To mention a taxon, you can write in-line code such as `r print_name(Easplist, 206)`, which will insert Cyperus papyrus L. in your document (note that the number is the ID of the taxon concept in Easplist). For a second mention of the same species, you can then use `r print_name(Easplist, 206, second_mention=TRUE)`, which will insert C. papyrus L. in your text.
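The abbreviation used for a second mention can be sketched in base R as follows. The helper abbreviate_genus() is hypothetical and only reduces the genus to its initial; print_name() also takes care of formatting the name, which this sketch ignores.

```r
# Hypothetical helper: abbreviates the genus of a name to its initial,
# as in a second mention; the rest of the name (including the author) is kept.
abbreviate_genus <- function(name) {
  sub("^([A-Z])[a-z]+ ", "\\1. ", name)
}

abbreviate_genus("Cyperus papyrus L.")  # returns "C. papyrus L."
```

A name consisting of the genus alone is returned unchanged, since the pattern requires a following epithet.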
Descriptive Statistics
Information stored in the slot taxonTraits is suitable for statistical assessments. For instance, in the installed object Easplist, a column called life_form classifies macrophytes into different life forms. To get the frequency of these life forms in Easplist, we can use the function count_taxa():
# how many taxa in 'Easplist'
count_taxa(Easplist)
#> [1] 3887
# frequency of life forms
count_taxa(~life_form, Easplist)
#> life_form taxa_count
#> 1 acropleustophyte 8
#> 2 chamaephyte 25
#> 3 climbing_plant 25
#> 4 facultative_annual 20
#> 5 obligate_annual 114
#> 6 phanerophyte 26
#> 7 pleustohelophyte 8
#> 8 reed_plant 14
#> 9 reptant_plant 19
#> 10 tussock_plant 52
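To see what such a frequency count amounts to, here is a base-R sketch comparable to count_taxa(~life_form, Easplist). It uses only the six taxa listed in the head of taxonTraits further below, so the counts necessarily differ from the output above, and it is not the package's implementation.

```r
# Base-R sketch comparable to count_taxa(~life_form, Easplist), restricted to
# the six taxa shown in the head of taxonTraits (illustration only).
traits <- data.frame(
  TaxonConceptID = c(7, 9, 18, 20, 21, 22),
  life_form = c(
    "phanerophyte", "phanerophyte", "facultative_annual",
    "facultative_annual", "obligate_annual", "chamaephyte"
  ),
  stringsAsFactors = FALSE
)

# count distinct taxon concepts per life form
freq <- aggregate(TaxonConceptID ~ life_form, data = traits,
                  FUN = function(x) length(unique(x)))
names(freq)[2] <- "taxa_count"
freq
```

Counting distinct concept IDs rather than rows keeps the result a number of taxa even if a taxon appeared twice in the table.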
Furthermore, taxonomic information can also be transferred to this slot using the function tax2traits(). In this way, taxonomic ranks become suitable for frequency calculations.
Easplist <- tax2traits(Easplist, get_names = TRUE)
head(Easplist@taxonTraits)
#> TaxonConceptID life_form form variety subspecies
#> 1 7 phanerophyte <NA> <NA> <NA>
#> 2 9 phanerophyte <NA> <NA> <NA>
#> 3 18 facultative_annual <NA> <NA> <NA>
#> 4 20 facultative_annual <NA> <NA> <NA>
#> 5 21 obligate_annual <NA> <NA> <NA>
#> 6 22 chamaephyte <NA> <NA> <NA>
#> species complex genus family
#> 1 Acacia mearnsii <NA> Acacia Leguminosae
#> 2 Acacia polyacantha <NA> Acacia Leguminosae
#> 3 Achyranthes aspera <NA> Achyranthes Amaranthaceae
#> 4 Acmella caulirhiza <NA> Acmella Compositae
#> 5 Acmella uliginosa <NA> Acmella Compositae
#> 6 Aeschynomene schimperi <NA> Aeschynomene Leguminosae
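The idea behind tax2traits() can be sketched in base R: for each taxon, walk up the chain of parents and store the name found at each rank in a column named after that rank. The helper ranks_as_columns() is hypothetical, uses the Fern example data, and is not the package's implementation.

```r
# Sketch of the idea behind tax2traits() (not the package's implementation):
# for each taxon, walk up the parent chain and store the name found at each
# rank in a column named after that rank.
fern_taxa <- data.frame(
  TaxonConceptID = 1:4,
  TaxonName = c(
    "Asplenium", "Asplenium obliquum",
    "Asplenium obliquum var. sphenoides",
    "Asplenium obliquum var. chondrophyllum"
  ),
  Level = c("genus", "species", "variety", "variety"),
  Parent = c(NA, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper; 'levels' lists the ranks used as column names
ranks_as_columns <- function(df, levels) {
  out <- data.frame(TaxonConceptID = df$TaxonConceptID)
  for (lev in levels) out[[lev]] <- NA_character_
  for (i in seq_len(nrow(df))) {
    j <- i
    while (!is.na(j)) {
      out[i, df$Level[j]] <- df$TaxonName[j]
      # match() returns NA for a missing parent, which ends the loop
      j <- match(df$Parent[j], df$TaxonConceptID)
    }
  }
  out
}

ranks_as_columns(fern_taxa, c("variety", "species", "genus"))
```

Ranks below a taxon stay empty (NA), just as the columns form, variety and subspecies are empty for species in the output above.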
Note that the respective parent ranks are inserted into the table taxonTraits, which contains the attributes of the taxa. In the next two command lines, we produce a subset containing only members of the family Cyperaceae and then calculate the number of species per genus.
Cype <- subset(Easplist, family == "Cyperaceae", slot = "taxonTraits")
Cype_stat <- count_taxa(species ~ genus, Cype)
Now we can sort the genera by their number of species to produce a bar plot.
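One possible way to sort a frequency table and draw the bar plot in base R is sketched below. The helper plot_counts() is hypothetical and assumes, as in the count_taxa() output shown earlier for life forms, that the first column holds the categories and the last column the counts; the example reuses those life-form counts.

```r
# Hypothetical helper: sorts a frequency table in decreasing order of its last
# column (the counts) and draws a bar plot labelled with the first column.
plot_counts <- function(stat, ylab = "Number of taxa") {
  stat <- stat[order(stat[[ncol(stat)]], decreasing = TRUE), ]
  op <- par(mar = c(10, 4, 1, 1))  # room for vertical category labels
  on.exit(par(op))
  barplot(stat[[ncol(stat)]], names.arg = as.character(stat[[1]]),
          las = 2, ylab = ylab)
  invisible(stat)
}

# Life-form counts for Easplist as printed by count_taxa() earlier
life_forms <- data.frame(
  life_form = c(
    "acropleustophyte", "chamaephyte", "climbing_plant",
    "facultative_annual", "obligate_annual", "phanerophyte",
    "pleustohelophyte", "reed_plant", "reptant_plant", "tussock_plant"
  ),
  taxa_count = c(8, 25, 25, 20, 114, 26, 8, 14, 19, 52),
  stringsAsFactors = FALSE
)
sorted <- plot_counts(life_forms)
```

If Cype_stat has the same layout (genus first, counts last), which is an assumption here, it could be plotted the same way, e.g. plot_counts(Cype_stat, ylab = "Number of species").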
Acknowledgements
The author thanks Stephan Hennekens, developer of Turboveg, for his patience and great support in finding a common language between R and Turboveg, as well as for his advice on formatting our taxonomic list EA-Splist.
Thanks also to Federico Luebert for the fruitful discussions regarding the terminology used in this project.