Skip to contents

A function to pull in the phyologeny/phylogenies matching a search query


search_treebase(input, by, returns = c("tree", "matrix"),
  exact_match = FALSE, max_trees = Inf, branch_lengths = FALSE,
  curl = getCurlHandle(), verbose = TRUE, pause1 = 0, pause2 = 0,
  attempts = 3, only_metadata = FALSE)



a search query (character string)


the kind of search; author, taxon, subject, study, etc (see list of possible search terms, details)


should the fn return the tree or the character matrix?


force exact matching for author name, taxon, etc. Otherwise does partial matching


Upper bound for the number of trees returned, good for keeping possibly large initial queries fast


logical indicating whether should only return trees that have branch lengths.


the handle to the curl web utility for repeated calls, see the getCurlHandle() function in RCurl package for details.


logical indicating level of progress reporting


number of seconds to hesitate between requests


number of seconds to hesitate between individual files


number of attempts to access a particular resource


option to only return metadata about matching trees which lists,, kind (gene,species,barcode) type (single, consensus) number of taxa, and possible quality score.


either a list of trees (multiphylo) or a list of character matrices


Choose the search type. Options are:

  • abstract search terms in the publication abstract

  • author match authors in the publication

  • subject match subject

  • doi the unique object identifier for the publication

  • ncbi NCBI identifier number for the taxon

  • kind.tree Kind of tree (Gene tree, species tree, barcode tree)

  • type.tree type of tree (Consensus or Single)

  • ntax number of taxa in the matrix

  • quality A quality score for the tree, if it has been rated.

  • study match words in the title of the study or publication

  • taxon taxon scientific name

  • TreeBASE study ID

  • id.tree TreeBASE's unique tree identifier (

  • id.taxon taxon identifier number from TreeBase

  • tree The title for the tree

  • type.matrix Type of matrix

  • matrix Name given the the matrix

  • id.matrix TreeBASE's unique matrix identifier

  • nchar number of characters in the matrix

The package provides partial support for character matrices provided by TreeBASE. At the time of writing, TreeBASE permits ambiguous DNA characters in these matrices, such as `CG` indicating either a C or G, which is not supported by any R interpreter, and thus may lead to errors. for a description of all possible search options, see


if (FALSE) {
## defaults to return phylogeny
Huelsenbeck <- search_treebase("Huelsenbeck", by="author")

## can ask for character matrices:
wingless <- search_treebase("2907", by="id.matrix", returns="matrix")

## Some nexus matrices don't meet's strict requirements,
## these aren't returned
H_matrices <- search_treebase("Huelsenbeck", by="author", returns="matrix")

## Use Booleans in search: and, or, not
## Note that by must identify each entry type if a Boolean is given
HR_trees <- search_treebase("Ronquist or Hulesenbeck", by=c("author", "author"))

## We'll often use max_trees in the example so that they run quickly,
## notice the quotes for species.
dolphins <- search_treebase('"Delphinus"', by="taxon", max_trees=5)
## can do exact matches
humans <- search_treebase('"Homo sapiens"', by="taxon", exact_match=TRUE, max_trees=10)
## all trees with 5 taxa
five <- search_treebase(5, by="ntax", max_trees = 10)
## These are different, a tree id isn't a Study id.  we report both
studies <- search_treebase("2377", by="")
tree <- search_treebase("2377", by="id.tree")
c("TreeID" = tree$, "StudyID" = tree$
## Only results with branch lengths
## Has to grab all the trees first, then toss out ones without branch_lengths
Near <- search_treebase("Near", "author", branch_lengths=TRUE)