Convert taxonomic information in a character vector into a taxmap() object.

The location and identity of important information in the input are specified using a regular expression with capture groups and a corresponding key. An object of type taxmap() is returned containing the specified information. See the key option for accepted sources of taxonomic information.
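The relationship between capture groups and key entries can be illustrated with base R alone. The sketch below does not call extract_tax_data() and is not how the package is implemented; it only shows how a regular expression of this shape splits one input line, and how a class_sep and class_regex pair could then split the classification string. The header line and the names my_seq, my_tid, org and tax are illustrative assumptions.

```r
# One input line in an assumed ">id:...-tid:...-...-tax:..." layout
header <- ">id:AB548412-tid:9689-Panthera leo-tax:K_Mammalia;P_Carnivora;C_Felidae;G_Panthera;S_leo"
regex <- "^>id:(.+)-tid:(.+)-(.+)-tax:(.+)$"

# regexec() reports the full match first, then one entry per capture group
m <- regmatches(header, regexec(regex, header))[[1]]
groups <- m[-1]

# Hypothetical key names: one per capture group, in the same order
names(groups) <- c("my_seq", "my_tid", "org", "tax")

# Split the classification on a literal separator (the role of class_sep)
ranks <- strsplit(groups[["tax"]], ";", fixed = TRUE)[[1]]

# Split each element into rank code and taxon name (the role of class_regex)
parts <- regmatches(ranks, regexec("^(.+)_(.+)$", ranks))
rank_codes <- vapply(parts, function(p) p[2], character(1))
taxon_names <- vapply(parts, function(p) p[3], character(1))

print(groups)
print(data.frame(rank = rank_codes, name = taxon_names))
```

Each capture group needs exactly one key entry, so a regex with four groups pairs with a key of length four.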
extract_tax_data(
  tax_data,
  key,
  regex,
  class_key = "taxon_name",
  class_regex = "(.*)",
  class_sep = NULL,
  sep_is_regex = FALSE,
  class_rev = FALSE,
  database = "ncbi",
  include_match = FALSE,
  include_tax_data = TRUE
)
Argument | Description |
---|---|
tax_data | A vector from which to extract taxonomy information. |
key | ( |
regex | ( |
class_key | ( |
class_regex | ( |
class_sep | ( |
sep_is_regex | ( |
class_rev | ( |
database | ( |
include_match | ( |
include_tax_data | ( |
Returns an object of type taxmap().
If you have invalid inputs or a download fails for another reason, an "unknown" taxon ID is created as a placeholder and the failed inputs are assigned to it. You can remove these using filter_taxa(), like so: filter_taxa(result, taxon_ids != "unknown"). Add drop_obs = FALSE if you want to keep the input data but remove the taxon.
Other parsers: lookup_tax_data(), parse_edge_list(), parse_tax_data()
if (FALSE) {

# For demonstration purposes, the following example dataset has all the
# types of data that can be used, but any one of them alone would work.
raw_data <- c(
  ">id:AB548412-tid:9689-Panthera leo-tax:K_Mammalia;P_Carnivora;C_Felidae;G_Panthera;S_leo",
  ">id:FJ358423-tid:9694-Panthera tigris-tax:K_Mammalia;P_Carnivora;C_Felidae;G_Panthera;S_tigris",
  ">id:DQ334818-tid:9643-Ursus americanus-tax:K_Mammalia;P_Carnivora;C_Felidae;G_Ursus;S_americanus"
)

# Build a taxmap object from classifications
extract_tax_data(raw_data,
                 key = c(my_seq = "info", my_tid = "info", org = "info", tax = "class"),
                 regex = "^>id:(.+)-tid:(.+)-(.+)-tax:(.+)$",
                 class_sep = ";",
                 class_regex = "^(.+)_(.+)$",
                 class_key = c(my_rank = "info", tax_name = "taxon_name"))

# Build a taxmap object from taxon ids
# Note: this requires an internet connection
extract_tax_data(raw_data,
                 key = c(my_seq = "info", my_tid = "taxon_id", org = "info", tax = "info"),
                 regex = "^>id:(.+)-tid:(.+)-(.+)-tax:(.+)$")

# Build a taxmap object from ncbi sequence accession numbers
# Note: this requires an internet connection
extract_tax_data(raw_data,
                 key = c(my_seq = "seq_id", my_tid = "info", org = "info", tax = "info"),
                 regex = "^>id:(.+)-tid:(.+)-(.+)-tax:(.+)$")

# Build a taxmap object from taxon names
# Note: this requires an internet connection
extract_tax_data(raw_data,
                 key = c(my_seq = "info", my_tid = "info", org = "taxon_name", tax = "info"),
                 regex = "^>id:(.+)-tid:(.+)-(.+)-tax:(.+)$")

}