
After sequences are downloaded from GenBank, this function curates them based on taxonomic information. It also produces two summary datasets: the accession numbers, and the taxonomic information for each species in the database. The curated taxonomy follows the GBIF taxonomic backbone, or that of the selected database. The resulting files are saved to "1.CuratedSequences".

Usage

sq.curate(
  filterTaxonomicCriteria = NULL,
  mergeGeneFiles = NULL,
  database = "gbif",
  kingdom = NULL,
  folder = "0.Sequences",
  sqs.object = NULL,
  removeOutliers = TRUE,
  minSeqs = 5,
  threshold = 0.05,
  ranks = c("kingdom", "phylum", "class", "order", "family", "genus", "species")
)

Arguments

filterTaxonomicCriteria

A single character string of terms, delimited by "|", listing every term that could be used to identify the species that should be in the dataset.

mergeGeneFiles

A named list, with each element being a character vector indicating the names of the files in "0.Sequences" that need to be combined into a single fasta file. For instance, you can use this argument to combine CO1 and COI.

database

The name of a database with taxonomic information (character). Although "gbif" is faster, it only has information for animals and plants. Other databases follow taxize::classification.

kingdom

Optional; only used when database = "gbif". Two options are possible: "animals" or "plants".

folder

The name of the folder where the original sequences are located (character).

sqs.object

A list of sequences generated by sq.retrieve.indirect. Only use this if you do not want to download sequences locally.

removeOutliers

Whether odseq::odseq should be used to remove outliers (logical).

minSeqs

Minimum number of sequences per locus.

threshold

The threshold used by odseq::odseq when detecting outliers. Only relevant if removeOutliers = TRUE.

ranks

The taxonomic ranks used to examine the taxonomy of the species in the 0.Sequences folder.
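
As a minimal sketch of how the filterTaxonomicCriteria and mergeGeneFiles arguments described above might be prepared, the base R snippet below builds a "|"-delimited string from a vector of genus names and a named list of files to merge. The object names (targets, criteria, candidates, merge_spec) and the example species are illustrative only. Two things are assumptions rather than documented behavior: that the criteria are matched as a regular expression, and that the list name labels the combined file while its entries are file names given without an extension.

```r
# Genus names taken from the example on this page
targets <- c("Felis", "Vulpes", "Phoca", "Manis")

# Collapse the terms into a single "|"-delimited string
criteria <- paste(targets, collapse = "|")

# Hypothetical species names, used only to illustrate what the string matches
candidates <- c("Felis_catus", "Vulpes_vulpes", "Homo_sapiens")

# Assumption: terms are matched as a regular expression
kept <- candidates[grepl(criteria, candidates)]
print(kept)

# Named list for mergeGeneFiles: one element per combined file, each holding
# the names of the files to merge (bare names are an assumption)
merge_spec <- list(COI = c("CO1", "COI"))
str(merge_spec)
```

Both objects could then be passed on, for example as sq.curate(filterTaxonomicCriteria = criteria, mergeGeneFiles = merge_spec).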

Value

This function returns a list with four elements, in this order: the curated sequences with original names, the curated sequences with species-level names, the accession numbers table, and a summary of taxonomic information for all the species sampled in the files.

Examples

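if (FALSE) {
# Retrieve sequences for the target clades, species, and genes
sq.retrieve.direct(
  clades = c("Felis", "Vulpes", "Phoca"),
  species = "Manis_pentadactyla",
  genes = c("ADORA3", "CYTB")
)
# Curate the sequences in "0.Sequences" using the GBIF backbone for animals
sq.curate(
  filterTaxonomicCriteria = "Felis|Vulpes|Phoca|Manis",
  database = "gbif", kingdom = "animals",
  folder = "0.Sequences"
)
}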