This function generates a pipeline.nix file from a list of derivation objects. Each derivation defines a build step; rxp_populate() chains these steps and handles the serialization of objects and their conversion between R and Python. Derivations are created with rxp_r(), rxp_py() and so on. By default (build = FALSE), the pipeline is only generated, not built; it can then be built at a later stage with rxp_make(). Set build to TRUE to build the pipeline immediately after it is generated.

Usage

rxp_populate(derivs, project_path = ".", build = FALSE, py_imports = NULL, ...)

Arguments

derivs

A list of derivation objects, where each object is a list of five elements:

  • $name, character, name of the derivation;

  • $snippet, character, the Nix code snippet to build this derivation;

  • $type, character, can be R, Python or Quarto;

  • $additional_files, character vector of paths to files to make available to the build sandbox;

  • $nix_env, character, path to the Nix environment used to build this derivation.

A single derivation is the output of the rxp_r(), rxp_qmd() or rxp_py() function.
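To make this structure concrete, the sketch below builds a list by hand with the five documented elements and checks that they are all present. The field values and the helper check_deriv() are hypothetical and only for illustration; real derivations come from rxp_r(), rxp_py() or rxp_qmd() and may carry a class or attributes not shown here.

```r
# Hypothetical stand-in for a derivation object; all values are made up.
fake_deriv <- list(
  name = "mtcars_am",
  snippet = "<nix code generated by rxp_r()>",
  type = "R",
  additional_files = character(0),
  nix_env = "default.nix"
)

# Hypothetical helper (not part of rixpress): TRUE if `x` is a list
# that contains the five documented elements.
check_deriv <- function(x) {
  required <- c("name", "snippet", "type", "additional_files", "nix_env")
  is.list(x) && all(required %in% names(x))
}

check_deriv(fake_deriv)
```

A check of this kind can be handy when a pipeline list is assembled programmatically, before it is passed to rxp_populate().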

project_path

Path to root of project, defaults to ".".

build

Logical, defaults to FALSE. Should the pipeline get built right after being generated? When FALSE, use rxp_make() to build the pipeline at a later stage.

py_imports

Named character vector of Python import rewrites. Names are the base modules that rixpress auto-imports as "import <name>", and values are the desired import lines. For example: c(numpy = "import numpy as np", xgboost = "from xgboost import XGBClassifier"). Each entry is applied by replacing "import <name>" with the provided string across the generated _rixpress Python library files.
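The effect of such a rewrite can be pictured with a few lines of base R. This is a minimal sketch, not the rixpress implementation: the helper rewrite_imports() and the contents of py_lines are hypothetical, and the sketch assumes each auto-generated import sits alone on its own line.

```r
# Hypothetical contents of a generated Python library file.
py_lines <- c("import numpy", "import pandas", "import os")

# Same shape as the py_imports argument: names are modules,
# values are the desired import lines.
py_imports <- c(
  numpy = "import numpy as np",
  pandas = "import pandas as pd"
)

# Hypothetical helper (not part of rixpress): swap each line that is
# exactly "import <name>" for the corresponding replacement.
rewrite_imports <- function(lines, imports) {
  for (module in names(imports)) {
    lines[lines == paste("import", module)] <- imports[[module]]
  }
  lines
}

rewrite_imports(py_lines, py_imports)
```

Lines with no matching entry, such as import os here, are left untouched.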

...

Further arguments passed down to methods. Use max-jobs and cores to set parallelism during build. See the documentation of rxp_make() for more details.

Value

Nothing. Writes a file called pipeline.nix with the Nix code to build the pipeline, as well as a folder called _rixpress with the required internal files.

Details

The generated pipeline.nix expression includes:

  • the required imports of environments, typically default.nix files generated by the rix package;

  • correct handling of interdependencies of the different derivations;

  • serialization and deserialization of both R and Python objects, and conversion between them when objects are passed from one language to another;

  • correct loading of R and Python packages, or extra functions needed to build specific targets.
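As an illustration of what handling interdependencies involves, the sketch below infers dependencies between two derivations by looking for derivation names inside each expression, then derives a valid build order. This is an assumption-laden toy, not how rixpress is documented to work: the helpers find_deps() and build_order() are hypothetical, and the expressions mirror those in the Examples section.

```r
# Expressions keyed by derivation name.
exprs <- list(
  mtcars_am = quote(filter(mtcars, am == 1)),
  mtcars_head = quote(head(mtcars_am))
)

# Hypothetical helper: a derivation depends on every other derivation
# whose name appears as a symbol in its expression.
find_deps <- function(exprs) {
  nms <- names(exprs)
  deps <- lapply(nms, function(n) {
    setdiff(intersect(all.names(exprs[[n]]), nms), n)
  })
  stats::setNames(deps, nms)
}

# Hypothetical helper: repeatedly schedule derivations whose
# dependencies have all been scheduled already.
build_order <- function(deps) {
  done <- character(0)
  while (length(done) < length(deps)) {
    ok <- vapply(deps, function(d) all(d %in% done), logical(1))
    ready <- setdiff(names(deps)[ok], done)
    if (length(ready) == 0) stop("cyclic dependency between derivations")
    done <- c(done, ready)
  }
  done
}

deps <- find_deps(exprs)
build_order(deps)
```

Here mtcars_head is scheduled after mtcars_am because its expression refers to mtcars_am by name.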

The _rixpress folder contains:

  • R, Python or Julia scripts to load the required packages that need to be available to the pipeline.

  • a JSON file with the DAG of the pipeline, used for visualisation, and to allow rxp_populate() to generate the right dependencies between derivations.

  • .rds files with build logs, required for rxp_inspect() and rxp_gc(). See vignette("debugging") for more details.

Inline Python import adjustments

Because Python packages are handled automatically, users might sometimes want to change the generated import statements. By default, if pandas is needed to build a derivation, it is imported with import pandas, whereas Python programmers typically write import pandas as pd. The py_imports argument rewrites such import statements when the pipeline is populated.

See also

Other pipeline functions: rxp_make()

Examples

if (FALSE) { # \dontrun{
# Create derivation objects
d1 <- rxp_r(mtcars_am, filter(mtcars, am == 1))
d2 <- rxp_r(mtcars_head, head(mtcars_am))
list_derivs <- list(d1, d2)

# Generate and build in one go
rxp_populate(derivs = list_derivs, project_path = ".", build = TRUE)

# Or only populate, with inline Python import adjustments
rxp_populate(
  derivs = list_derivs,
  project_path = ".",
  build = FALSE,
  py_imports = c(pandas = "import pandas as pd")
)
# Then later:
rxp_make()
} # }