The taxa package is intended to:

  • Provide a set of classes to store taxonomic data and any user-specific data associated with it

  • Provide functions to convert commonly used formats to these classes

  • Provide a common foundation for other packages to build on to enable an ecosystem of compatible packages dealing with taxonomic data

  • Provide generally useful functionality, such as filtering and mapping functions

Main classes

These are the classes users would typically interact with:

  • taxon: A class used to define a single taxon. Many other classes in the taxa package include one or more objects of this class.

  • taxa: Stores one or more taxon objects. This is just a thin wrapper for a list of taxon objects.

  • hierarchy: A class containing an ordered list of taxon objects that represent a hierarchical classification.

  • hierarchies: A list of taxonomic classifications. This is just a thin wrapper for a list of hierarchy objects.

  • taxonomy: A taxonomy composed of taxon objects organized in a tree structure. This differs from the hierarchies class in how the taxon objects are stored. Unlike a hierarchies object, each unique taxon is stored only once and the relationships between taxa are stored in an edgelist.

  • taxmap: A class designed to store a taxonomy and associated user-defined data. This class builds on the taxonomy class. User-defined data can be stored in the list obj$data, where obj is a taxmap object. Any number of user-defined lists, vectors, or tables mapped to taxa can be manipulated in a cohesive way such that relationships between taxa and data are preserved.
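
The difference between storing whole classifications (as in hierarchies) and storing unique taxa plus an edge list (as in taxonomy) can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the example data are made up, taxa are identified by name alone for simplicity, and the from/to column names are assumptions.

```r
# Three made-up classifications that share their upper ranks.
classifications <- list(
  c("Mammalia", "Carnivora", "Felidae", "Panthera"),
  c("Mammalia", "Carnivora", "Felidae", "Felis"),
  c("Mammalia", "Primates", "Hominidae", "Homo")
)

# Hierarchies-style storage: each classification keeps its own copy of every taxon.
n_stored_in_list <- sum(lengths(classifications))

# Taxonomy-style storage: each unique taxon once, plus a parent/child edge list.
unique_taxa <- unique(unlist(classifications))
edges <- unique(do.call(rbind, lapply(classifications, function(x) {
  # The parent of the first taxon in a classification is NA (it is a root).
  data.frame(from = c(NA, head(x, -1)), to = x, stringsAsFactors = FALSE)
})))
rownames(edges) <- NULL

n_stored_in_list     # 12 stored entries, with repeats
length(unique_taxa)  # 8 unique taxa
edges                # one row per taxon, pointing at its parent
```

Shared taxa such as Mammalia appear once in the edge list no matter how many classifications pass through them.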

Minor classes

These classes are mostly components for the larger classes above and would not typically be used on their own.

  • taxon_database: Used to store information about taxonomy databases.

  • taxon_id: Used to store taxon IDs, either arbitrary or from a particular taxonomy database.

  • taxon_name: Used to store taxon names, either arbitrary or from a particular taxonomy database.

  • taxon_rank: Used to store taxon ranks (e.g. species, family), either arbitrary or from a particular taxonomy database.

Major manipulation functions

These are some of the more important functions used to filter data in classes that store multiple taxa, like hierarchies, taxmap, and taxonomy.

  • filter_taxa: Filter taxa in a taxonomy or taxmap object with a series of conditions. Relationships between remaining taxa and user-defined data are preserved (many options control this).

  • filter_obs: Filter user-defined data in a taxmap object with a series of conditions. Relationships between remaining taxa and user-defined data are preserved (many options control this).

  • sample_n_taxa: Randomly sample taxa. Has the same abilities as filter_taxa.

  • sample_n_obs: Randomly sample observations. Has the same abilities as filter_obs.

  • mutate_obs: Add datasets or columns to datasets in the data list of taxmap objects.

  • pick: Keep specific taxa in hierarchy and hierarchies objects; all other taxa are dropped.

  • pop: Pop out taxa (drop them) in hierarchy and hierarchies objects.

  • span: Select a range of taxa in hierarchy and hierarchies objects, either by two names or by relational operators.

Mapping functions

There are lots of functions for getting information for each taxon.

The kind of classes used

Note, this is mostly of interest to developers and advanced users.

The classes in the taxa package are mostly R6 classes (R6Class). A few of the simpler ones (taxa and hierarchies) are S3 instead. R6 classes are different from most R objects because they are mutable (e.g. a function can change its input without returning it). In this, they are more similar to class systems in object-oriented languages like Python. As in other object-oriented class systems, functions are thought to "belong" to classes (i.e. the data), rather than existing independently of the data. For example, the function print in R exists apart from what it is printing, although it changes how it prints based on the class of the data passed to it. In fact, a user can make a custom print method for their own class by defining a function called print.myclassname. In contrast, the functions that operate on R6 objects are "packaged" with the data they operate on. For example, the print method of an R6 object might be called like my_data$print() instead of print(my_data).
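
The contrast described above can be sketched in base R without the R6 package. R6 objects are built on environments, so a plain environment is used here as a stand-in for a mutable object. The names increment_copy and new_counter are hypothetical and exist only for this illustration.

```r
# S3 style: the generic print() dispatches on the class attribute.
my_data <- structure(list(n = 3), class = "myclassname")
print.myclassname <- function(x, ...) {
  cat("<myclassname with n =", x$n, ">\n")
  invisible(x)
}
print(my_data)

# Ordinary R objects are copied when modified, so the caller's object is unchanged.
increment_copy <- function(x) {
  x$n <- x$n + 1
  x
}
increment_copy(my_data)$n  # 4
my_data$n                  # still 3

# Reference style: an environment carries its data and methods and is changed in place.
new_counter <- function(n = 0) {
  self <- new.env()
  self$n <- n
  self$print <- function() {
    cat("<counter with n =", self$n, ">\n")
    invisible(self)
  }
  self$add <- function(by = 1) {
    self$n <- self$n + by
    invisible(self)
  }
  self
}
counter <- new_counter(3)
counter$add()    # modifies counter without any reassignment
counter$print()  # the method is "packaged" with the data
counter$n        # 4
```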

The two ways to call functions

Note, you will need to read the previous section to fully understand this one.

Since the R6 function syntax (e.g. my_data$print()) might be confusing to many R users, all functions in taxa also have S3 versions. For example, the filter_taxa() function can be called on a taxmap object called my_obj like my_obj$filter_taxa(...) (the R6 syntax) or filter_taxa(my_obj, ...) (the S3 syntax). For some functions, these two ways of calling the function can have different effects. For functions that do not return a modified version of the input (e.g. subtaxa()), the two ways have identical behavior. However, functions like filter_taxa() that modify their input actually change the object itself when called with the R6 syntax, as well as returning that object, whereas the S3 syntax returns a modified object without changing the one passed as the first argument. For example,

my_obj <- filter_taxa(my_obj, ...)

and

my_obj$filter_taxa(...)

and

new_obj <- my_obj$filter_taxa(...)

all replace my_obj with the filtered result, but

new_obj <- filter_taxa(my_obj, ...)

will not modify my_obj.
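
The behavior of the two syntaxes can be imitated with a toy mutable object in base R. This is a sketch under assumptions, not taxa's code: new_taxon_set, filter_names, and the clone helper are hypothetical, and the copy-before-modify wrapper is just one way a function-style call could leave its input untouched.

```r
# Toy mutable object holding taxon names (hypothetical, not a taxa class).
new_taxon_set <- function(names) {
  self <- new.env()
  self$names <- names
  # Method syntax: modifies the object in place and returns it invisibly.
  self$filter_names <- function(keep) {
    self$names <- self$names[keep]
    invisible(self)
  }
  # Copy helper so the function syntax can leave the original alone.
  self$clone <- function() new_taxon_set(self$names)
  self
}

# Function syntax: works on a copy, so the input only changes if it is reassigned.
filter_names <- function(obj, keep) {
  obj$clone()$filter_names(keep)
}

my_obj <- new_taxon_set(c("Bacteria", "Archaea", "Eukaryota"))

new_obj <- filter_names(my_obj, c(TRUE, FALSE, TRUE))
names_after_function_call <- my_obj$names  # unchanged, still three names
new_obj$names                              # "Bacteria" "Eukaryota"

my_obj$filter_names(c(TRUE, FALSE, TRUE))  # method call, no assignment
my_obj$names                               # now "Bacteria" "Eukaryota"
```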

Non-standard evaluation

This is a rather advanced topic.

Like packages such as ggplot2 and dplyr, the taxa package uses non-standard evaluation to allow code to be more readable and shorter. In effect, there are variables that only "exist" inside a function call and depend on what is passed to that function as the first parameter (usually a class object). For example, in the dplyr function filter(), column names can be used as if they were independent variables. See ?dplyr::filter for examples of this. The taxa package builds on this idea.

In many functions that work on taxonomy or taxmap objects (e.g. filter_taxa()), some functions that return per-taxon information (e.g. taxon_names()) can be referred to by just the name of the function. When one of these functions is referred to by name, the function is run on the relevant object and its value replaces the function name. For example,

new_obj <- filter_taxa(my_obj, taxon_names == "Bacteria")

is identical to:

new_obj <- filter_taxa(my_obj, taxon_names(my_obj) == "Bacteria")

which is identical to:

new_obj <- filter_taxa(my_obj, my_obj$taxon_names() == "Bacteria")

which is identical to:

my_names <- taxon_names(my_obj)

new_obj <- filter_taxa(my_obj, my_names == "Bacteria")

For taxmap objects, you can also use the names of user-defined lists and vectors, and the names of columns in user-defined tables, that are stored in the obj$data list. See filter_taxa() for examples. You can even add your own functions that are called by name by adding them to the obj$funcs list. For any object with functions that use non-standard evaluation, you can see what values can be used with all_names(), like all_names(obj).
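
The general mechanism behind this kind of non-standard evaluation can be sketched in base R with substitute() and eval(). The code below is a toy under stated assumptions, not how taxa implements it: filter_toy, per_taxon_funcs, and the list-based object are hypothetical, and the functions named taxon_names and taxon_ranks here are stand-ins defined for the sketch.

```r
# Toy object: a plain list of taxon names and ranks (hypothetical structure).
my_obj <- list(
  names = c("Bacteria", "Proteobacteria", "Archaea"),
  ranks = c("kingdom", "phylum", "kingdom")
)

# Per-taxon functions that may be referred to by bare name inside filter_toy().
per_taxon_funcs <- list(
  taxon_names = function(obj) obj$names,
  taxon_ranks = function(obj) obj$ranks
)

filter_toy <- function(obj, condition) {
  # Capture the condition as an unevaluated expression.
  expr <- substitute(condition)
  # Find which of the known function names appear in the expression.
  used <- intersect(all.vars(expr), names(per_taxon_funcs))
  # Run only those functions on obj and expose the results as variables.
  values <- lapply(per_taxon_funcs[used], function(f) f(obj))
  # Other names are looked up in the caller's environment as usual.
  keep <- eval(expr, values, parent.frame())
  lapply(obj, function(x) x[keep])
}

filter_toy(my_obj, taxon_names == "Bacteria")$names  # "Bacteria"
filter_toy(my_obj, taxon_ranks == "kingdom")$names   # "Bacteria" "Archaea"

# An ordinary variable from the calling environment still works.
my_names <- my_obj$names
filter_toy(my_obj, my_names == "Archaea")$names      # "Archaea"
```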

Dependencies and inspiration

Various elements of the taxa package were inspired by the dplyr and taxize packages. This package started as parts of the metacoder and binomen packages. There are also many dependencies that make taxa possible.

Feedback and contributions

Find a problem? Have a suggestion? Have a question? Please submit an issue at our GitHub repository.

A GitHub account is free and easy to set up. We welcome feedback! If you don't want to use GitHub for some reason, feel free to email us. We do prefer posting to GitHub, since it allows others who might have the same issue to see our conversation. It also helps us keep track of what problems we need to address.

Want to contribute code or make a change to the code? Great, thank you! Please fork our GitHub repository and submit a pull request.

For more information

Check out the vignette (browseVignettes("taxa")) for a detailed introduction and examples.


Scott Chamberlain

Zachary Foster