Process the output of meta.retrieval by first
unzipping the downloaded files and then renaming them for more convenient downstream data analysis.
Arguments
- x
a vector containing file paths to the output files generated by meta.retrieval.
- gunzip
a logical value indicating whether files should only be renamed (gunzip = FALSE) or renamed AND unzipped (gunzip = TRUE, default). The original file will be removed.
- gunzip_overwrite
a logical value indicating whether to allow overwriting existing uncompressed files (gunzip_overwrite = TRUE, default) or to abort if one exists (gunzip_overwrite = FALSE). Ignored if argument 'gunzip' is FALSE.
Details
The output of meta.retrieval usually consists of compressed sequence files
whose names follow a convention based on the database the respective file was retrieved from (e.g. Saccharomyces_cerevisiae_cds_from_genomic_refseq.fna.gz).
This function helps to format the meta.retrieval output files by
1) Automatically uncompressing all sequence files in the meta.retrieval output folder
2) Automatically renaming files from e.g. Saccharomyces_cerevisiae_cds_from_genomic_refseq.fna.gz to SaccharomycesCerevisiae.fna.
This allows more convenient downstream analyses and visualizations.
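The renaming scheme described above can be sketched in base R. This is only an illustration of the documented input and output names, not the package's actual implementation: sketch_clean_name is a hypothetical helper, and it assumes that the first two underscore-separated tokens of the file name are genus and species and that the file name contains no dots other than those in its extension.

```r
# Hypothetical helper, NOT part of the package: it only mimics the
# documented mapping from the raw file name to the cleaned file name.
sketch_clean_name <- function(path) {
  file <- basename(path)
  # Drop a trailing ".gz", mirroring the name after unzipping.
  file <- sub("\\.gz$", "", file)
  # Assumption: everything from the first dot onwards is the extension.
  ext  <- sub("^[^.]*", "", file)
  stem <- sub("\\..*$", "", file)
  # Assumption: the first two "_"-separated tokens are genus and species.
  parts <- head(strsplit(stem, "_", fixed = TRUE)[[1]], 2)
  # Capitalise the first letter of each token and glue them together.
  parts <- paste0(toupper(substring(parts, 1, 1)), substring(parts, 2))
  paste0(paste0(parts, collapse = ""), ext)
}

sketch_clean_name("Saccharomyces_cerevisiae_cds_from_genomic_refseq.fna.gz")
#> [1] "SaccharomycesCerevisiae.fna"
```

The sketch only computes the new name and does not touch any files; clean.retrieval itself also uncompresses and renames the files on disk.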
Examples
# Make mock file and clean up
path <- tempfile(fileext = "/Saccharomyces_cerevisiae_cds_from_genomic_refseq.fna.gz")
dir.create(dirname(path))
saveRDS(1, path)
clean.retrieval(path)
#> Cleaning file names and unzipping files ...
#> Unzipping file Saccharomyces_cerevisiae_cds_from_genomic_refseq.fna.gz' ...
#> Finished formatting.
#> [1] "/tmp/RtmptuT7Mx/file58d296718a6/SaccharomycesCerevisiae.fna"
list.files(dirname(path)) # Original file is gone
#> [1] "SaccharomycesCerevisiae.fna"
if (FALSE) { # \dontrun{
# The easiest way to use 'clean.retrieval()' in combination with
# 'meta.retrieval()' is to use the pipe operator from the 'magrittr' package
library(magrittr)
meta.retrieval(kingdom = "vertebrate_mammalian",
               db = "refseq",
               type = "genome") %>% clean.retrieval()
} # }
