Assembling phylogenetic trees from taxonomic names

What is phruta?

The phruta R package is designed to simplify the basic phylogenetic pipeline. All the code runs within the same program, and data from intermediate steps can optionally be saved in separate folders. phruta retrieves gene sequences, combines newly downloaded and local gene sequences, performs sequence alignments, and runs basic phylogenetic inference.

Who should consider using phruta

The main functions in the phruta R package allow for quick mining and curation of GenBank sequences. This package is designed for students and researchers interested in generating species-level genetic datasets for particular sets of taxa. Specifically, if you have a clade or group of species in mind, phruta will help you assemble a molecular dataset from the information available in GenBank.

Why use phruta?

phruta simplifies the phylogenetic pipeline, increases reproducibility, and helps organize the information used to infer molecular phylogenies.

How is phruta different from other software?

phruta has two core sets of functions. Their main applications are briefly outlined below:

  • sq.retrieve.direct() and sq.retrieve.indirect(): These functions download sequences from GenBank (nucleotide database) for particular taxa (taxonomic groups or particular species) and a list of genes.

  • sq.curate(): After sequences are downloaded from GenBank, this function curates sequences within each of the examined genes by detecting sequence outliers and by using taxonomic information.

In addition to these core functions, users can align the downloaded sequences, infer phylogenetic trees, and calibrate phylogenies using additional functions in phruta.
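As a rough sketch of how the retrieval and curation steps fit together, the snippet below defines a small set of taxa and genes and passes them to sq.retrieve.direct() and sq.curate(). The taxa, the genes, the "0.Sequences" folder name, and the argument names (clades, species, genes, filterTaxonomicCriteria, kingdom, folder, removeOutliers) are assumptions used for illustration; please confirm them with ?sq.retrieve.direct and ?sq.curate. The calls sit behind a hypothetical run_download flag because they need phruta and an internet connection.

```r
# Illustrative inputs; these taxa and genes are arbitrary examples.
target_clades  <- c("Felis", "Vulpes", "Phoca")
target_species <- "Manis_pentadactyla"
target_genes   <- c("ADORA3", "CYTB")

# Regular expression used to keep only the target taxa during curation.
taxon_filter <- paste(c(target_clades, "Manis"), collapse = "|")

# Hypothetical switch: set to TRUE to actually query GenBank.
run_download <- FALSE

if (run_download && requireNamespace("phruta", quietly = TRUE)) {
  # Argument names below are assumptions; confirm them in the package help.
  phruta::sq.retrieve.direct(
    clades  = target_clades,
    species = target_species,
    genes   = target_genes
  )
  phruta::sq.curate(
    filterTaxonomicCriteria = taxon_filter,
    kingdom = "animals",
    folder = "0.Sequences",
    removeOutliers = FALSE
  )
}
```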

Installing phruta

phruta is currently only available through GitHub. It can be easily installed using the following code.

library(devtools) 
install_github("ropensci/phruta")

Alternatively, you can install phruta using:

install.packages("phruta", repos = "https://ropensci.r-universe.dev")

Please make sure that the R packages msa, DECIPHER, Biostrings, and odseq are correctly installed. If you are interested in using the development version of phruta, please install it using the following code:
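One way to check these dependencies before loading phruta is sketched below. It only reports which of the four packages cannot be found; the commented BiocManager lines assume you would install the missing ones from Bioconductor.

```r
# Packages named in the installation notes above.
required <- c("msa", "DECIPHER", "Biostrings", "odseq")

# TRUE for each package whose namespace can be loaded from the current library paths.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Assumes installation from Bioconductor via BiocManager:
  # install.packages("BiocManager")
  # BiocManager::install(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```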

library(devtools)
install_github("ropensci/phruta", ref = "dev")

Running phruta from shiny

I have constructed a shiny app that hosts phruta and enables users to run the basic functions in a less code-intensive environment. The app, salphycon, is currently available in the following GitHub repo. The shiny app will be live at some point in 2023.

Installing RAxML

On macOS, RAxML can be installed to the PATH with conda using one of the two lines below:

conda install -c bioconda/label/cf201901 raxml

conda install -c bioconda raxml

For other operating systems (Windows, Linux), please follow the instructions listed on the official RAxML website.

Once RAxML has been installed on your computer, open R and make sure that the following line doesn’t throw an error.

system("raxmlHPC")

Depending on how RAxML was installed, you may want to check whether RAxML is called from the terminal using raxmlHPC or a differently named executable (RAxML builds such as raxmlHPC-PTHREADS install under their own names). This string needs to be passed to tree.raxml using the argument raxml_exec. Please note that this argument corresponds to the exec argument in ips::raxml.
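A minimal sketch for finding out which executable name is visible from R, using base R's Sys.which(). The variant names other than raxmlHPC are assumptions about how RAxML may have been built on your machine; add or remove names as needed.

```r
# Executable names to try; variants other than raxmlHPC are assumptions
# about how RAxML may have been built.
candidates <- c("raxmlHPC", "raxmlHPC-PTHREADS", "raxmlHPC-SSE3", "raxmlHPC-AVX")

# Sys.which() returns the full path for each name, or "" when not found.
found <- Sys.which(candidates)
available <- names(found)[nzchar(found)]

if (length(available) > 0) {
  # Keep the first name that resolves on the PATH.
  raxml_exec <- available[1]
  message("Using RAxML executable: ", raxml_exec)
} else {
  raxml_exec <- NA_character_
  message("No RAxML executable was found on the PATH.")
}
```

If a name is found, the value stored in raxml_exec is the string to pass to tree.raxml through its raxml_exec argument.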

Finally, note that RStudio sometimes has trouble finding executables on the PATH when using system(). If you’re using macOS, try starting RStudio from the command line by running the following line:

open /Applications/RStudio.app

VS Code does not suffer from the same issue. On other operating systems, it might be better to simply avoid using RStudio if you’re interested in running the phylogenetic functions in phruta.
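To see whether the current R session has the same PATH as your terminal, you can print the directories R knows about and, if needed, prepend the folder holding the RAxML binary for that session. This is only a sketch: the raxml_dir value is a hypothetical location and should be replaced with the directory reported by your own installation.

```r
# Split the PATH seen by the current R session into its directories.
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
print(path_dirs)

# Hypothetical directory holding the RAxML binary; adjust to your system.
raxml_dir <- "/opt/miniconda3/bin"

# Prepend the directory for this session only, if it exists and is not already listed.
if (dir.exists(raxml_dir) && !(raxml_dir %in% path_dirs)) {
  Sys.setenv(PATH = paste(raxml_dir, Sys.getenv("PATH"), sep = .Platform$path.sep))
}
```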

Installing PATHd-8 and treePL

There are excellent guides for installing PATHd-8 and treePL. Here, I summarize two potentially relevant options.

First, you can use Brian O’Meara’s approach for installing PATHd-8 on macOS and Linux. I summarize the code in the following link. For Windows users, please use the compiled version of the software provided in the following link.

Second, you can use Homebrew to install treePL (Windows, macOS, and Linux), thanks to Jonathan Chang.

brew install brewsci/bio/treepl

Please check the following link if you’re interested in running brew from Windows and Linux.

Running phruta from RStudio while using macOS?

If you’re interested in running phylogenetic analyses, please make sure you open RStudio from the terminal using the following command:

open /Applications/RStudio.app

Dedication

My package is dedicated to my mom. I still have lots of things to learn from you. You will always have all my admiration. The logo features a Palenquera in Cartagena (Colombia). For many folks, Palenqueras are just the Black women who sell fruit in particular Colombian tourist areas. However, Palenqueras and Palenque are central to Black identity in Colombia, Latin America, and across the Americas: “Palenque was the first free African town in the Americas” (https://en.wikipedia.org/wiki/San_Basilio_de_Palenque).

Etymology

Fruta is the Spanish word for fruit. The English ph sounds the same as f in Spanish. In phruta, ph refers to phylogenetics. I pronounce phruta just like fruta in Spanish.

Additional resources

More details about the functions implemented in phruta can be found in the different vignettes associated with the package or on our website.

Alternatives to phruta

Similar functionality for assembling curated molecular datasets for phylogenetic analyses can be found in phylotaR and SuperCRUNCH. However, note that phylotaR is limited to downloading and curating sequences (e.g. it doesn’t align sequences). Similarly, SuperCRUNCH only curates local sequences. phruta is closer to SUPERSMART and its “new” associated R workflow, SUPERSMARTR. However, most of the applications in the different packages that are part of SUPERSMARTR are simplified in phruta.

Contributing

Please see our contributing guide.

Contact

Please see the package DESCRIPTION for package authors.

Code of conduct

Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.