
Process the output of meta.retrieval by unzipping the downloaded files and renaming them for more convenient downstream data analysis.

Usage

clean.retrieval(x, gunzip = TRUE, gunzip_overwrite = TRUE)

Arguments

x

a vector containing file paths to the output files generated by meta.retrieval.

gunzip

a logical value indicating whether files should only be renamed (gunzip = FALSE) or renamed and unzipped (gunzip = TRUE, default). The original file will be removed.

gunzip_overwrite

a logical value indicating whether an existing uncompressed file may be overwritten (gunzip_overwrite = TRUE, default) or whether the function should abort if such a file already exists (gunzip_overwrite = FALSE). Ignored if gunzip = FALSE.
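
As a rough illustration of how the gunzip and gunzip_overwrite semantics described above could look in base R, here is a minimal sketch. The helper sketch_gunzip() is hypothetical and is not the package's implementation; it only assumes that the uncompressed file name is the input path with the trailing .gz removed.

```r
# Hypothetical helper (an assumption for illustration, not the package's code):
# decompress a .gz file next to the original, honouring an overwrite flag.
sketch_gunzip <- function(path, overwrite = TRUE) {
  dest <- sub("\\.gz$", "", path)            # target name without the .gz suffix
  if (file.exists(dest) && !overwrite) {
    stop("Uncompressed file already exists: ", dest)
  }
  src <- gzfile(path, open = "rb")           # reads decompressed bytes
  out <- file(dest, open = "wb")
  repeat {
    chunk <- readBin(src, what = "raw", n = 1e6)
    if (length(chunk) == 0) break            # end of input reached
    writeBin(chunk, out)
  }
  close(out)
  close(src)
  file.remove(path)                          # the original compressed file is removed
  dest
}
```

With overwrite = FALSE the sketch stops before touching either file, which mirrors the "abort if it exists" behaviour described for gunzip_overwrite = FALSE.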

Details

The output of meta.retrieval usually contains compressed sequence files and a naming convention based on the database the respective file was retrieved from (e.g. Saccharomyces_cerevisiae_cds_from_genomic_refseq.fna.gz). This function helps to format the meta.retrieval output files by

  1) Automatically uncompressing all sequence files in the meta.retrieval output folder.

  2) Automatically renaming files, e.g. from Saccharomyces_cerevisiae_cds_from_genomic_refseq.fna.gz to SaccharomycesCerevisiae.fna. This allows more convenient downstream analyses and visualizations.
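
The renaming convention above can be approximated with a few lines of base R. This is only a sketch: sketch_clean_name() is a hypothetical helper, not a function of the package, and it assumes file names of the form Genus_species_<suffix>.<ext>.gz, as in the example above.

```r
# Hypothetical helper (an assumption for illustration, not the package's code):
# derive a name like "SaccharomycesCerevisiae.fna" from a meta.retrieval-style name.
sketch_clean_name <- function(file_name) {
  base <- sub("\\.gz$", "", basename(file_name))         # drop the .gz suffix
  ext <- sub("^.*\\.", "", base)                          # remaining extension, e.g. "fna"
  parts <- strsplit(base, "_", fixed = TRUE)[[1]][1:2]    # genus and species tokens
  # Capitalise the first letter of each token
  parts <- paste0(toupper(substring(parts, 1, 1)), substring(parts, 2))
  paste0(paste(parts, collapse = ""), ".", ext)
}

sketch_clean_name("Saccharomyces_cerevisiae_cds_from_genomic_refseq.fna.gz")
#> [1] "SaccharomycesCerevisiae.fna"
```

The sketch only computes the new name; the actual function also moves or decompresses the file on disk.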

See also

Author

Hajk-Georg Drost

Examples

# Create a mock compressed file in a temporary directory, then clean it up
path <- tempfile(fileext = "/Saccharomyces_cerevisiae_cds_from_genomic_refseq.fna.gz")
dir.create(dirname(path))
saveRDS(1, path) # saveRDS() gzip-compresses its output by default
clean.retrieval(path)
#> Cleaning file names and unzipping files ...
#> Unzipping file Saccharomyces_cerevisiae_cds_from_genomic_refseq.fna.gz' ...
#> Finished formatting.
#> [1] "/tmp/RtmptuT7Mx/file58d296718a6/SaccharomycesCerevisiae.fna"
list.files(dirname(path)) # Original file is gone
#> [1] "SaccharomycesCerevisiae.fna"
if (FALSE) { # \dontrun{

# The easiest way to use 'clean.retrieval()' in combination with
# 'meta.retrieval()' is to use the pipe operator from the 'magrittr' package
library(magrittr)
meta.retrieval(kingdom = "vertebrate_mammalian",
               db = "refseq",
               type = "genome") %>% clean.retrieval()
} # }