Many data science, statistical and academic projects require running models on external software and then performing subsequent analyses of the results in R e.g. for project specific testing and visualisation.
outsider aims to make this process simpler by first enabling non-R software to be run from within R and, second, by making it easier to install external program.
outsider package acts as an interface between the R environment and external programs that are hosted on virtual machines. A virtual machine is hosted on a user’s computer but acts like an external computer with its own operating system. These virtual machines are run through the program Docker. So long as a computer is running Docker, then any of these virtual machines can be downloaded and run without any installation process. Docker runs on multiple operating systems including Windows, OSX and Linux.
For every external program provided through
outsider, a virtual machine, or “Docker image”, needs to be described and specific R code – for launching and interacting with the program – is required. This Docker image and R code are provided through outsider modules that are hosted on GitHub.
Before you can make the most of
outsider you will need to install and start running Docker. Follow the installation instructions for your specific operating system, “Install Docker”.
For some operating systems, “Docker Desktop” is not available. If that is the case, try “Docker Toolbox”. This is a legacy Docker for older operating systems. It has similar functionality but requires a virtual machine and has greater computational overhead.
With Docker installed, you then can install
outsider via GitHub.
xcode-select --installin the terminal.
library(outsider) # repo = NULL will search for ALL available modules # (this may take a long time, depends on internet connection and remote server) print(module_details(repo = 'dombennett/om..mafft'))
## # A tibble: 1 x 7 ## repo program details versions updated_at watchers_count url ## <chr> <chr> <chr> <chr> <dttm> <int> <chr> ## 1 dombennett/… mafft Multiple alignment program for… latest 2020-01-16 10:36:00 0 https://github.co…
To install a module, all that is required is to provide the repo name to the function
repois the unique name for a GitHub repository that hosts an
outsidermodule. It consists of two parts: a GitHub username and a project name. Given its uniqueness, all modules are referred to by their
To confirm the module is installed on a computer, it might be useful to use
module_installed(). This function returns a table of all installed modules.
## # A tibble: 3 x 7 ## package image tag program url image_created image_id ## <fct> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 om..hello.world dombennett/om_hello_… latest hello world https://github.com/DomBennett/o… 8 months ago acdff0a24… ## 2 om..mafft dombennett/om_mafft latest mafft https://github.com/DomBennett/o… 13 months ago 97170a5f7… ## 3 om..partitionf… dombennett/om_partit… <NA> PartitionFi… https://github.com/DomBennett/o… <NA> <NA>
All modules contain functions for interacting with the external program that they host. To see these functions we can use
module_help() to look up the help documents.
library(outsider) # the whole module module_help(repo = 'dombennett/om..mafft') # specific function of a module (if known) module_help(repo = 'dombennett/om..mafft', fname = 'mafft')
Once a function name is known of a particular module, the function can be imported with
library(outsider) mafft <- module_import(fname = 'mafft', repo = 'dombennett/om..mafft') print(is(mafft))
##  "function" "OptionalFunction" "PossibleMethod"
mafft? “mafft” is a multiple alignment tool for for biological sequences. Note, a user can use any name they wish for the function when it is imported. For example,
mafftymcmafftface <- module_import( ...would work equally well.
The imported functions from modules act like portals to the external programs the modules host. To run a command, a user needs to use the function name and give arguments corresponding to the arguments of the external program. For example, on command-line to list the help information for
mafft, we would write
mafft --help. With
outsider we can do run
For a more complicated example, we could launch a small analysis with
mafft like so.
library(outsider) mafft <- module_import(fname = 'mafft', repo = 'dombennett/om..mafft') mafft(arglist = c('--auto', 'input_sequences.fasta', '>', 'output_alignment.fasta'))
Why the spaces between arguments? All the arguments of the external, command-line program must be provided as separated characters. This helps
outsiderparse the elements.
mafft(arglist = c('--auto', 'input_sequences.fasta', '>', 'output_alignment.fasta'))
describes in R how to call the MAFFT program via command-line/terminal. It is equivalent to
mafft --auto input_sequences.fasta > output_alignment.fasta if we were to call the program via command-line/terminal. How do we know how to structure the program arguments? In the case of MAFFT we can look-up the arguments on their website, mafft.cbrc.jp. But often for command-line programs we can call for help with
--help. For MAFFT at the command-line, we could run
mafft --help or with
outsider we can run:
mafft(arglist = '--help')
## ------------------------------------------------------------------------------ ## MAFFT v7.407 (2018/Jul/23) ## - https://mafft.cbrc.jp/alignment/software/ ## MBE 30:772-780 (2013), NAR 30:3059-3066 (2002) ## ------------------------------------------------------------------------------ ## High speed: ## % mafft in > out ## % mafft --retree 1 in > out (fast) ## ## High accuracy (for <~200 sequences x <~2,000 aa/nt): ## % mafft --maxiterate 1000 --localpair in > out (% linsi in > out is also ok) ## % mafft --maxiterate 1000 --genafpair in > out (% einsi in > out) ## % mafft --maxiterate 1000 --globalpair in > out (% ginsi in > out) ## ## If unsure which option to use: ## % mafft --auto in > out ## ## --op # : Gap opening penalty, default: 1.53 ## --ep # : Offset (works like gap extension penalty), default: 0.0 ## --maxiterate # : Maximum number of iterative refinement, default: 0 ## --clustalout : Output: clustal format, default: fasta ## --reorder : Outorder: aligned, default: input order ## --quiet : Do not report progress ## --thread # : Number of threads (if unsure, --thread -1) ##
The help page returned tells us how to structure the arguments:
[options] [input_file] > [output_file]
Where the options (e.g. alignment method, number of threads) are always indicated first with
-- and the input and output files are indicated second with the
Clean up your computer by removing unwanted modules with
outsider’s utility is limited by the number of available modules. Fortunately, it is very easy to create and upload your own module. The package comes with a range of helper functions for minimising the amount of coding for a module developer. If you know how to install an external program on your own computer which you would like would like to run it through
outsider and you have some experience with GitHub, then explore the “outsider.devtools” package.