Helper function for retrieving biological sequence files from ENSEMBL
Source: R/getENSEMBL.R
This function downloads biological sequence files (e.g. "dna", "cds") of query organisms from ENSEMBL.
Arguments
- organism
Organism selector id, there are three options to characterize an organism:
by scientific name: e.g. organism = "Homo sapiens"
by database specific accession identifier: e.g. organism = "GCF_000001405.37" (= NCBI RefSeq identifier for Homo sapiens)
by taxonomic identifier from NCBI Taxonomy: e.g. organism = "9606" (= taxid of Homo sapiens)
- type
character, biological sequence type (e.g. "dna", "cds")
- id.type
a character, default "toplevel". The id type of the assembly: either "toplevel" or "primary_assembly" for genomes. Other strings can be used for non-genome objects.
- release
a numeric, the database release version of ENSEMBL (db = "ensembl"). Default is release = NULL, meaning that the most recent database version is used. For human, release = 75 would give the stable GRCh37 release in ENSEMBL. Value must be > 46, since ENSEMBL did not structure their data in the standard format before that.
- path
location where file shall be stored.
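The argument conventions above can be illustrated with a minimal sketch in base R. The two helpers below, classify_organism() and check_release(), are hypothetical names introduced only for illustration and are not part of the package. The sketch assumes that accession identifiers follow the GCA_/GCF_ pattern shown in the example above, and it simply mirrors the documented rule that release must be NULL or a number greater than 46.

```r
# Hypothetical helper (not part of the package): guess which of the three
# documented organism selector styles a string uses.
classify_organism <- function(organism) {
  stopifnot(is.character(organism), length(organism) == 1)
  if (grepl("^[0-9]+$", organism)) {
    # digits only, e.g. "9606": treated as an NCBI Taxonomy identifier
    "taxid"
  } else if (grepl("^GC[AF]_[0-9]+(\\.[0-9]+)?$", organism)) {
    # assumption: accessions look like "GCF_000001405.37"
    "accession"
  } else {
    # everything else is treated as a scientific name, e.g. "Homo sapiens"
    "scientific name"
  }
}

# Hypothetical helper: enforce the documented constraint on `release`.
check_release <- function(release = NULL) {
  if (is.null(release)) {
    # NULL means "use the most recent database version"
    return(invisible(NULL))
  }
  if (!is.numeric(release) || length(release) != 1 || release <= 46) {
    stop("release must be NULL or a single number > 46")
  }
  invisible(release)
}
```

For instance, classify_organism("9606") falls into the "taxid" branch, and check_release(46) stops with an error because 46 is not greater than 46. How the function itself resolves these arguments internally may differ from this sketch.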