Main proteome retrieval function for an organism of interest. Given the scientific name of the organism, the corresponding FASTA file storing its proteome is downloaded and stored locally. Proteome files can be retrieved from several databases.
Usage
getProteome(
db = "refseq",
organism,
reference = TRUE,
release = NULL,
gunzip = FALSE,
path = file.path("_ncbi_downloads", "proteomes")
)
Arguments
- db
a character string specifying the database from which the proteome shall be retrieved:
db = "refseq"
db = "genbank"
db = "ensembl"
db = "uniprot"
- organism
there are three options to characterize an organism:
by scientific name: e.g. organism = "Homo sapiens"
by database specific accession identifier: e.g. organism = "GCF_000001405.37" (= NCBI RefSeq identifier for Homo sapiens)
by taxonomic identifier from NCBI Taxonomy: e.g. organism = "9606" (= taxid of Homo sapiens)
- reference
a logical value indicating whether or not a genome shall be downloaded if it isn't marked in the database as either a reference genome or a representative genome.
- release
the database release version of ENSEMBL (db = "ensembl"). Default is release = NULL, meaning that the most recent database version is used.
- gunzip
a logical value indicating whether or not files should be unzipped.
- path
a character string specifying the location (a folder) in which the corresponding proteome shall be stored. Default is path = file.path("_ncbi_downloads", "proteomes").
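The three forms accepted by the organism argument differ in shape, which the following minimal sketch illustrates. The helper classify_organism() is hypothetical and not part of the package. Its accession pattern only covers NCBI assembly accessions (GCF_ for RefSeq, GCA_ for GenBank), which is an assumption made for this example. The function's own detection logic may differ.

```r
# Hypothetical helper (not part of the package): guess which of the three
# documented forms a single 'organism' value takes.
classify_organism <- function(organism) {
  stopifnot(is.character(organism), length(organism) == 1L)
  if (grepl("^[0-9]+$", organism)) {
    # digits only, e.g. "9606"
    "taxid"
  } else if (grepl("^GC[AF]_[0-9]+(\\.[0-9]+)?$", organism)) {
    # assumption: NCBI assembly accession, e.g. "GCF_000001405.37"
    "accession"
  } else {
    # anything else is treated as a scientific name, e.g. "Homo sapiens"
    "scientific name"
  }
}

# Classify the three example values from the argument description.
examples <- c("Homo sapiens", "GCF_000001405.37", "9606")
vapply(examples, classify_organism, character(1))
```

Because the taxonomic identifier is passed as a character string (organism = "9606"), the sketch checks for digit-only strings and not for numeric input.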
Details
Internally this function loads the overview.txt file from NCBI:
refseq: ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/
genbank: ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/
and creates a directory '_ncbi_downloads/proteomes' to store the proteome of interest as fasta file for future processing.
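As a rough illustration of the storage location, the sketch below builds the default path and creates the folder by hand. It runs inside a temporary directory, which is a choice made for this example only. It does not contact NCBI or download anything, and the way the function itself creates the folder may differ.

```r
# Default value of 'path' as given in the Usage section.
proteome_dir <- file.path("_ncbi_downloads", "proteomes")

# Work inside a temporary directory so the sketch leaves no files behind
# in the current working directory (assumption made for this example).
target <- file.path(tempdir(), proteome_dir)

# recursive = TRUE also creates the parent folder '_ncbi_downloads'.
if (!dir.exists(target)) {
  dir.create(target, recursive = TRUE)
}

# The folder now exists and is still empty.
dir.exists(target)
list.files(target)
```

After a real call to getProteome(), list.files() on the chosen path is a quick way to check which proteome files are already stored locally.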
Examples
if (FALSE) {
# download the proteome of Arabidopsis thaliana from refseq
# and store the corresponding proteome file in '_ncbi_downloads/proteomes'
file_path <- getProteome(
  db       = "refseq",
  organism = "Arabidopsis thaliana",
  path     = file.path("_ncbi_downloads", "proteomes")
)
Ath_proteome <- read_proteome(file_path, format = "fasta")

# download the proteome of Arabidopsis thaliana from genbank
# and store the corresponding proteome file in '_ncbi_downloads/proteomes'
file_path <- getProteome(
  db       = "genbank",
  organism = "Arabidopsis thaliana",
  path     = file.path("_ncbi_downloads", "proteomes")
)
Ath_proteome <- read_proteome(file_path, format = "fasta")
}