Retrieve the distribution of genes for given organism in genbank
Source:R/gene.sampling.retrieve.R
gene.sampling.retrieve.Rd
This function retrieves the gene names for given organism in genbank. This function is
useful to explore the genes that are widely sampled for a given taxon
in genbank. However, note that the performance of gene.sampling.retrieve
is entirely dependant on the quality of submissions to genbank. For instance, if
most of the sequences in genbank (nucleotide) don't have information on the
sequenced region, this function won't provide reliable estimates of the sampling
frequency of each gene in the database. We recommend using this function with caution,
and only when there is no additional information on the genes that are widely sampled
for the target group(s) of interest.
Arguments
- organism
A vector of taxonomic groups (character).
- speciesSampling
Whether the results should be at the species-level (logical).
- npar
Number of simultaneous searches (character; optional). The default 2 is generally fast enough and does not exceed the maximum number of connections to genbank.
- nSearchesBatch
Number of searches per batch
Value
This function returns a data.frame
that comprises the following
columns. First, Gene
, including the name of the relevant
gene sampled in the target taxonomic groups. Second,
Sampled in N species
includes the number of species where the
gene is sampled among the target taxa. Third, PercentOfSampledSpecies
indicates the percentage of species where the gene is sampled (assuming
GeneBank's taxonomic backbone).