In the intermediate walkthrough, we demonstrated how a command-line program with input and output files (one that we partly developed ourselves) can be wrapped as an outsider module.

On this page, we will develop a module for a typical command-line program with lots of arguments.

RAxML

For this walkthrough we will create a module for “RAxML”, a program that generates evolutionary trees using maximum likelihood. The exact functioning (or even the use!) of the program is not relevant, but it is a good example of the typical program that outsider aims to bring into the R environment: lots of arguments and a complex algorithm developed in another language.

To demonstrate the program’s complexity, here is the command’s syntax:

raxmlHPC[-SSE3|-AVX|-PTHREADS|-PTHREADS-SSE3|-PTHREADS-AVX|-HYBRID|-HYBRID-SSE3|HYBRID-AVX]
      -s sequenceFileName -n outputFileName -m substitutionModel
      [-a weightFileName] [-A secondaryStructureSubstModel]
      [-b bootstrapRandomNumberSeed] [-B wcCriterionThreshold]
      [-c numberOfCategories] [-C] [-d] [-D]
      [-e likelihoodEpsilon] [-E excludeFileName]
      [-f a|A|b|B|c|C|d|D|e|E|F|g|G|h|H|i|I|j|J|k|m|n|N|o|p|P|q|r|R|s|S|t|T|u|v|V|w|W|x|y] [-F]
      [-g groupingFileName] [-G placementThreshold] [-h] [-H]
      [-i initialRearrangementSetting] [-I autoFC|autoMR|autoMRE|autoMRE_IGN]
      [-j] [-J MR|MR_DROP|MRE|STRICT|STRICT_DROP|T_<PERCENT>] [-k] [-K] 
      [-L MR|MRE|T_<PERCENT>] [-M]
      [-o outGroupName1[,outGroupName2[,...]]][-O]
      [-p parsimonyRandomSeed] [-P proteinModel]
      [-q multipleModelFileName] [-r binaryConstraintTree]
      [-R binaryModelParamFile] [-S secondaryStructureFile] [-t userStartingTree]
      [-T numberOfThreads] [-u] [-U] [-v] [-V] [-w outputDirectory] [-W slidingWindowSize]
      [-x rapidBootstrapRandomNumberSeed] [-X] [-y] [-Y quartetGroupingFileName|ancestralSequenceCandidatesFileName]
      [-z multipleTreesFile] [-#|-N numberOfRuns|autoFC|autoMR|autoMRE|autoMRE_IGN]
      [--mesquite][--silent][--no-seq-check][--no-bfgs]
      [--asc-corr=stamatakis|felsenstein|lewis]
      [--flag-check][--auto-prot=ml|bic|aic|aicc]
      [--epa-keep-placements=number][--epa-accumulated-threshold=threshold]
      [--epa-prob-threshold=threshold]
      [--JC69][--K80][--HKY85]
      [--bootstop-perms=number]
      [--quartets-without-replacement]
      [---without-replacement]
      [--print-identical-sequences]

How are we going to code a module that is able to account for all these arguments?

Walkthrough

Building

As before, let’s just jump right in with the module skeleton.

library(outsider.devtools)
flpth <- module_skeleton(repo_user = 'dombennett', program_name = 'raxml',
                         docker_user = 'dombennett', flpth = tempdir(),
                         service = 'github')
# folder name where module is stored
print(basename(flpth))
## [1] "om..raxml"

Next, the Dockerfile. The installation is more involved this time. First we install wget, make and gcc via apt-get so that we can download and compile the program. Then we download the RAxML source, compile it and copy the resulting binary to /usr/bin. Most of these steps come from the installation instructions on the RAxML GitHub site.

# '\\' inside an R string writes a single backslash, which Docker needs for
# line continuation
dockerfile_text <- "
FROM ubuntu:latest
RUN apt-get update && apt-get install -y \\
    wget make gcc
# 8.2 for this demonstration
RUN wget https://github.com/stamatak/standard-RAxML/archive/v8.2.12.tar.gz && \\
    tar zxvf v8.2.12.tar.gz && \\
    rm v8.2.12.tar.gz && \\
    mv standard-RAxML-8.2.12 raxml
RUN cd /raxml && make -f Makefile.SSE3.PTHREADS.gcc && \\
    rm *.o && cp raxmlHPC-PTHREADS-SSE3 /usr/bin/.
RUN mkdir /working_dir
WORKDIR /working_dir
"
# write to latest Dockerfile
cat(dockerfile_text, file = file.path(flpth, 'inst', 'dockerfiles', 'latest',
                                      'Dockerfile'))
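
When a Dockerfile is embedded in an R string, it may be worth confirming that the line-continuation backslashes actually reach the file, since the backslash is also R's escape character. The following is a minimal sketch using a throwaway temporary file; the names `snippet` and `tmp_dockerfile` are assumptions made for illustration and are not part of outsider.devtools.

```r
# A two-line fragment of a Dockerfile; '\\' is one literal backslash in the string
snippet <- "RUN apt-get update && apt-get install -y \\
    wget make gcc
"

# write the fragment to a temporary file, as cat() does for the real Dockerfile
tmp_dockerfile <- tempfile(pattern = 'Dockerfile')
cat(snippet, file = tmp_dockerfile)

# read it back and check that the first line still ends with a backslash
written <- readLines(tmp_dockerfile)
ends_with_backslash <- grepl('\\\\$', written[1])
print(written)
print(ends_with_backslash)
```

The same readLines() check could be pointed at the Dockerfile under inst/dockerfiles/latest before running the build.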

Dockerfile defined. What should our R function be?

#' @name raxml
#' @title raxml
#' @description Run raxml
#' @param arglist Arguments to raxml provided as a character vector
#' @param outdir Filepath to where all output files should be returned.
#' @example examples/example.R
#' @export
raxml <- function(arglist = arglist_get(...), outdir = getwd()) {
  files_to_send <- filestosend_get(arglist)
  arglist <- arglist_parse(arglist)
  otsdr <- outsider_init(pkgnm = 'om..raxml', cmd = 'raxmlHPC-PTHREADS-SSE3',
                         wd = outdir, files_to_send = files_to_send,
                         arglist = arglist)
  run(otsdr)
}

In this function, input and output files are intermixed with the other arguments passed to RAxML. To tell a file that must be sent to the container apart from a normal argument, we use filestosend_get. This function goes through all the provided arguments, tests whether each one is a valid file path, and returns those that are as a character vector, which we assign to files_to_send. Then we have arglist_parse, which reduces all given arguments to their basename so that absolute file paths are dropped. All input files are passed to the container’s working_dir/, so no parent directory paths are required. arglist_parse can additionally drop arguments that are irrelevant in the context of an outsider module (e.g. there is no need to specify an output directory at the command line; this is done when the outsider object is initiated).
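
To make the two steps concrete, here is a simplified base R sketch of the idea. The helpers `find_files` and `strip_paths` are hypothetical stand-ins written for this illustration; they are not the outsider implementations of filestosend_get and arglist_parse, which may behave differently in detail. The file name and RAxML argument values are also placeholders.

```r
# Hypothetical stand-in: keep only the arguments that point at an existing file
find_files <- function(arglist) {
  arglist[file.exists(arglist) & !dir.exists(arglist)]
}

# Hypothetical stand-in: replace file paths with their basename,
# leaving every other argument untouched
strip_paths <- function(arglist) {
  is_file <- file.exists(arglist) & !dir.exists(arglist)
  arglist[is_file] <- basename(arglist[is_file])
  arglist
}

# a throwaway input file standing in for a sequence alignment
input_file <- file.path(tempdir(), 'outsider_demo_aln.phy')
writeLines('placeholder', input_file)

arglist <- c('-s', input_file, '-n', 'outsider_demo_run', '-m', 'GTRGAMMA',
             '-p', '12345')
files_to_send <- find_files(arglist)  # what would be copied to the container
parsed <- strip_paths(arglist)        # what the program would be called with
print(files_to_send)
print(parsed)
```

Inside the container, the program then only ever sees the bare file name, which matches where the file has been copied.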

What’s with this outdir-thingy? RAxML returns a series of files. To ensure they are all returned to a convenient location, we have given the raxml() function an additional argument called outdir. It can be good practice to break up the information that the module function needs into things like arglist and outdir. In some instances, you may be calling a program for which additional information must be provided outside the remit of the program itself. In our last example we had input_file and output_file; other examples include the number of threads, memory allocation for the program, etc.
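
The split between program arguments and wrapper-level settings can be sketched in base R as follows. `describe_call` is a hypothetical helper invented for this illustration; it runs nothing and is not part of outsider. The `threads` argument is likewise an assumption, mapped here onto RAxML's -T flag from the syntax shown earlier.

```r
# Hypothetical helper: assembles a description of the call so that the
# two kinds of information are kept visibly apart
describe_call <- function(arglist, outdir = getwd(), threads = 1L) {
  stopifnot(is.character(arglist), dir.exists(outdir))
  list(
    # what the command-line program itself would receive
    program_args = c(arglist, '-T', as.character(threads)),
    # what only the wrapper needs: where to return the output files
    return_to = normalizePath(outdir)
  )
}

call_info <- describe_call(
  arglist = c('-s', 'aln.phy', '-n', 'run1', '-m', 'GTRGAMMA'),
  outdir = tempdir(), threads = 2L
)
print(call_info$program_args)
print(call_info$return_to)
```

Keeping the wrapper-level settings as named arguments means they can be validated and documented separately from the free-form argument vector.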

OK! So that’s what the function looks like. Let’s write it to file.

function_text <- "
#' @name raxml
#' @title raxml
#' @description Run raxml
#' @param arglist Arguments to raxml provided as a character vector
#' @param outdir Filepath to where all output files should be returned.
#' @example examples/example.R
#' @export
raxml <- function(arglist = arglist_get(...), outdir = getwd()) {
  files_to_send <- filestosend_get(arglist)
  arglist <- arglist_parse(arglist)
  otsdr <- outsider_init(pkgnm = 'om..raxml', cmd = 'raxmlHPC-PTHREADS-SSE3',
                         wd = outdir, files_to_send = files_to_send,
                         arglist = arglist)
  run(otsdr)
}
"
# write to R/functions.R
cat(function_text, file = file.path(flpth, 'R', 'functions.R'))
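
Because the function is written to file as a plain string, a typo in it would only surface at build time. One inexpensive check, sketched here with a temporary file and a hypothetical placeholder function (`raxml_demo` is not part of the module), is to parse the file without evaluating it.

```r
# placeholder function text, standing in for the real function_text
candidate_text <- "
raxml_demo <- function(arglist, outdir = getwd()) {
  list(arglist = arglist, outdir = outdir)
}
"

# write it to a temporary .R file
tmp_r <- tempfile(fileext = '.R')
cat(candidate_text, file = tmp_r)

# parse() raises an error on a syntax mistake; nothing is evaluated
exprs <- parse(file = tmp_r)
print(length(exprs))
```

The same parse() call could be pointed at the module's R/functions.R before running the build.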

OK, let’s build, test and upload the module! (The test should pass because, by default, -h is used in the example and this is a valid argument for RAxML.)

module_build(flpth = flpth)
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running devtools::document() ...
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Updating om..raxml documentation
## First time using roxygen2. Upgrading automatically...
## Updating roxygen version in /private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..raxml/DESCRIPTION
## Writing NAMESPACE
## Loading om..raxml
## Writing NAMESPACE
## Writing raxml.Rd
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running devtools::install() ...
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## ✓  checking for file ‘/private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..raxml/DESCRIPTION’
## ─  preparing ‘om..raxml’:
## ✓  checking DESCRIPTION meta-information
## ─  checking for LF line-endings in source and make files and shell scripts
## ─  checking for empty or unneeded directories
## ─  building ‘om..raxml_0.0.1.tar.gz’
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL \
##   /var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T//RtmpquGDgt/om..raxml_0.0.1.tar.gz --install-tests 
## * installing to library ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’
## * installing *source* package ‘om..raxml’ ...
## ** using staged installation
## ** R
## ** inst
## ** byte-compile and prepare package for lazy loading
## ** help
## *** installing help indices
## ** building package indices
## ** testing if installed package can be loaded from temporary location
## ** testing if installed package can be loaded from final location
## ** testing if installed package keeps a record of temporary installation path
## * DONE (om..raxml)
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running docker_build()
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Command:
## docker build -t dombennett/om_raxml:latest /Library/Frameworks/R.framework/Versions/3.6/Resources/library/om..raxml/dockerfiles/latest
## .....................................................................................................................
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running devtools::build_readme() ...
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Building om..raxml readme
module_test(flpth = flpth)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL \
##   /private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..raxml --install-tests 
## * installing to library ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’
## * installing *source* package ‘om..raxml’ ...
## ** using staged installation
## ** R
## ** inst
## ** byte-compile and prepare package for lazy loading
## ** help
## *** installing help indices
## ** building package indices
## ** testing if installed package can be loaded from temporary location
## ** testing if installed package can be loaded from final location
## ** testing if installed package keeps a record of temporary installation path
## * DONE (om..raxml)
## No Docker image found for 'om..raxml' -- attempting to pull/build image with tag 'latest'
## Removing package from '/Library/Frameworks/R.framework/Versions/3.6/Resources/library'
## (as 'lib' is unspecified)
## The gloating goat got away! The module is not working....
## But keep on forming! You're doing correctly!

And then upload:

module_upload(flpth = flpth, code_sharing = TRUE, dockerhub = TRUE)

Continuous Integration (GitHub only)

OK. So you have a module that works and passes its tests. How do we inform the world that it works? The answer: Travis-CI.

Travis is a free service that monitors your GitHub repo for updates and then runs a series of tests on a remote server to determine whether the code passes them. The result of these tests can be added to a repo’s README in the form of an image badge, e.g. “Tests Passing”.

To set up Travis for your repo, visit the website https://travis-ci.org/, sign in with your GitHub account and then activate the service for your repo of choice, e.g. om..raxml.

Then we need to provide the test instructions for Travis in the form of a YAML file called .travis.yml and upload this to our GitHub repo. For outsider modules, the .travis.yml can be created like so:

module_travis(flpth = flpth)

Then simply update the repo on GitHub!

Taking the module forward

After you have built a module that passes its tests and is available for download, there are a few extra things to consider:

  • Update the om.yml: add a more detailed description.
  • Update the README: the default README is very bland. It provides a template for the sorts of things a good README should contain: installation, how to run, where to get more help. Remember to only edit the README.Rmd. To update the README.md for GitHub, you need to “knit” the .Rmd file by running devtools::build_readme(), which is the function that module_build() runs.
  • Ensure discoverability: outsider::module_search can find outsider module repos on GitHub if they have an om.yml, have their name in the form om.. and have the phrase outsider-module: in their description.
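
The three discoverability conditions above can be expressed as a small local check. This is only a sketch: `looks_discoverable` is a hypothetical helper written for this page, and the actual search is performed by outsider::module_search against GitHub. The repo names, descriptions and file lists below are made-up inputs.

```r
# Hypothetical helper: applies the three conditions to values supplied by hand
looks_discoverable <- function(repo_name, description, files) {
  has_yml <- 'om.yml' %in% files                   # repo contains an om.yml
  has_prefix <- grepl('^om\\.\\.', repo_name)      # name starts with 'om..'
  has_phrase <- grepl('outsider-module:', description, fixed = TRUE)
  has_yml && has_prefix && has_phrase
}

ok <- looks_discoverable(
  repo_name = 'om..raxml',
  description = 'outsider-module: raxml',
  files = c('om.yml', 'DESCRIPTION', 'README.md')
)
not_ok <- looks_discoverable(
  repo_name = 'raxml-wrapper',
  description = 'Run RAxML from R',
  files = c('DESCRIPTION')
)
print(c(ok = ok, not_ok = not_ok))
```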

Delete it all

module_uninstall(repo = 'om..raxml')
unlink(x = flpth, recursive = TRUE, force = TRUE)

Tips and tricks

  • The best way to learn how to build your own module is to look at how others have created modules for programs you are familiar with. Check out the list of already available modules at this page.
  • If you find issues with the docker image build step, you can try “logging into” a container and running the installation steps to test out what works. At the terminal, use