
This function retrieves the names of all genomes available on the NCBI ftp:// server and stores the results in a file named 'overview.txt' inside the directory '_ncbi_downloads', which is created in the temporary directory of the current R session (tempdir()).

Usage

listGenomes(db = "refseq", type = "all", subset = NULL, details = FALSE)

Arguments

db

a character string specifying the database for which genome availability shall be checked. Available options are:

  • db = "refseq"

  • db = "genbank"

  • db = "ensembl"

type

a character string specifying an optional filter for the available genomes. Available options are:

  • type = "all"

  • type = "kingdom"

  • type = "group"

  • type = "subgroup"

subset

a character string or character vector specifying which values of type to keep. For example, users interested in retrieving all Eukaryota species can specify type = "kingdom" and subset = "Eukaryota".

details

a logical value specifying whether only the scientific names of stored genomes shall be returned (details = FALSE) or all available information (details = TRUE), such as:

  • organism_name

  • kingdoms

  • group

  • subgroup

  • file_size_MB, etc.
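To illustrate how type, subset and details interact, here is a minimal sketch that applies them to a small in-memory table. The helper `filter_overview()`, its column names and the example rows are hypothetical and chosen for illustration only. They are not taken from the actual layout of NCBI's overview.txt or from the internals of `listGenomes()`.

```r
# Hypothetical sketch: apply type/subset/details filtering to an overview table.
# Column names and rows are illustrative assumptions, not NCBI's real file layout.
overview <- data.frame(
  organism_name = c("Arabidopsis thaliana", "Homo sapiens", "Escherichia coli"),
  kingdom       = c("Eukaryota", "Eukaryota", "Bacteria"),
  group         = c("Plants", "Animals", "Proteobacteria"),
  subgroup      = c("Land Plants", "Mammals", "Gammaproteobacteria"),
  stringsAsFactors = FALSE
)

filter_overview <- function(overview, type = "all", subset = NULL, details = FALSE) {
  # restrict type to the options documented for listGenomes()
  type <- match.arg(type, c("all", "kingdom", "group", "subgroup"))
  res <- overview
  if (type != "all" && !is.null(subset)) {
    # keep only rows whose value in the `type` column is one of `subset`
    res <- res[res[[type]] %in% subset, , drop = FALSE]
  }
  # details = FALSE: organism names only; details = TRUE: the full table
  if (details) res else unique(res$organism_name)
}

filter_overview(overview, type = "kingdom", subset = "Eukaryota")
```

In this sketch, passing a type other than "all" without a subset leaves the table unfiltered. The real function may handle that case differently.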

Details

Internally, this function loads the overview.txt file from NCBI and creates a directory '_ncbi_downloads' in the tempdir() folder, where overview.txt is stored for subsequent processing. If overview.txt already exists in the '_ncbi_downloads' folder during the current R session, it is not downloaded again.
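The "download once, then reuse" behaviour described above can be sketched in base R as follows. The helper `get_overview_path()` and its `download` argument are hypothetical. The server URL is deliberately left to the caller because this page does not state it. The default downloader is base R's `utils::download.file()`.

```r
# Hypothetical sketch of caching overview.txt in tempdir()/_ncbi_downloads.
# `get_overview_path` and `download` are illustrative, not biomartr internals.
get_overview_path <- function(url, download = utils::download.file) {
  cache_dir <- file.path(tempdir(), "_ncbi_downloads")
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  dest <- file.path(cache_dir, "overview.txt")
  if (!file.exists(dest)) {
    # contact the server only when no cached copy exists in this R session
    download(url, dest)
  }
  dest
}
```

Because tempdir() is cleared when the R session ends, the cache only lasts for one session. Passing a stub function as `download` makes it easy to check that a second call reuses the cached file without contacting the server.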

Note

Please note that the ftp:// connection relies on the NCBI or ENSEMBL servers and cannot be reliably accessed via a proxy.

Author

Hajk-Georg Drost

Examples

if (FALSE) {
# list genomes available in refseq
listGenomes(db = "refseq")
# list refseq genomes using the kingdom filter
listGenomes(db = "refseq", type = "kingdom")
# list refseq genomes using the group filter
listGenomes(db = "refseq", type = "group")
# list refseq genomes using the subgroup filter
listGenomes(db = "refseq", type = "subgroup")
}