Usage

tar_make_clustermq(
  names = NULL,
  shortcut = targets::tar_config_get("shortcut"),
  reporter = targets::tar_config_get("reporter_make"),
  seconds_meta_append = targets::tar_config_get("seconds_meta_append"),
  seconds_meta_upload = targets::tar_config_get("seconds_meta_upload"),
  seconds_reporter = targets::tar_config_get("seconds_reporter"),
  seconds_interval = targets::tar_config_get("seconds_interval"),
  workers = targets::tar_config_get("workers"),
  log_worker = FALSE,
  callr_function = callr::r,
  callr_arguments = targets::tar_callr_args_default(callr_function, reporter),
  envir = parent.frame(),
  script = targets::tar_config_get("script"),
  store = targets::tar_config_get("store"),
  garbage_collection = targets::tar_config_get("garbage_collection")
)

Arguments

names

Names of the targets to run or check. Set to NULL to check/run all the targets (default). Otherwise, you can supply tidyselect helpers like any_of() and starts_with(). Because tar_make() and friends run the pipeline in a new R session, if you pass a character vector to a tidyselect helper, you will need to evaluate that character vector early with !!, e.g. tar_make(names = any_of(!!your_vector)). Applies to ordinary targets (stem) and whole dynamic branching targets (patterns) but not to individual dynamic branches.
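A minimal sketch of the `!!` idiom described above. The vector name `keep` and the target names inside it are hypothetical placeholders for your own pipeline:

```r
library(targets)
keep <- c("data", "model")  # hypothetical target names
# Evaluate the character vector early with !! because the pipeline
# runs in a fresh R session where `keep` would not otherwise exist:
tar_make_clustermq(names = any_of(!!keep))
```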

shortcut

Logical of length 1, how to interpret the names argument. If shortcut is FALSE (default) then the function checks all targets upstream of names as far back as the dependency graph goes. shortcut = TRUE increases speed if there are a lot of up-to-date targets, but it assumes all the dependencies are up to date, so please use with caution. It relies on stored metadata for information about upstream dependencies. shortcut = TRUE only works if you set names.
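A hedged sketch of the shortcut behavior: rebuild only `model` (a hypothetical target name), trusting the stored metadata that its upstream dependencies are already up to date:

```r
library(targets)
# shortcut = TRUE skips checking everything upstream of `model`,
# so only use it when you know the dependencies are current:
tar_make_clustermq(names = any_of("model"), shortcut = TRUE)
```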

reporter

Character of length 1, name of the reporter to use. Controls how messages are printed as targets run in the pipeline. Defaults to tar_config_get("reporter_make"). Choices:

  • "silent": print nothing.

  • "summary": print a running total of the number of targets in each status category (queued, dispatched, skipped, completed, canceled, or errored). Also shows a timestamp ("%H:%M %OS2" strptime() format) of the last time progress changed and was printed to the screen.

  • "timestamp": same as the "verbose" reporter except that each message begins with a time stamp.

  • "timestamp_positives": same as the "timestamp" reporter except without messages for skipped targets.

  • "verbose": print messages for individual targets as they start, finish, or are skipped. Each individual target-specific time (e.g. "3.487 seconds") is strictly the elapsed runtime of the target and does not include steps like data retrieval and output storage.

  • "verbose_positives": same as the "verbose" reporter except without messages for skipped targets.

seconds_meta_append

Positive numeric of length 1 with the minimum number of seconds between saves to the local metadata and progress files in the data store. Higher values generally make the pipeline run faster, but unsaved work (in the event of a crash) is not up to date. When the pipeline ends, all the metadata and progress data is saved immediately, regardless of seconds_meta_append.

seconds_meta_upload

Positive numeric of length 1 with the minimum number of seconds between uploads of the metadata and progress data to the cloud (see https://books.ropensci.org/targets/cloud-storage.html). Higher values generally make the pipeline run faster, but unsaved work (in the event of a crash) may not be backed up to the cloud. When the pipeline ends, all the metadata and progress data is uploaded immediately, regardless of seconds_meta_upload.

seconds_reporter

Positive numeric of length 1 with the minimum number of seconds between times when the reporter prints progress messages to the R console.

seconds_interval

Deprecated on 2023-08-24 (version 1.2.2.9001). Use seconds_meta_append, seconds_meta_upload, and seconds_reporter instead.

workers

Positive integer, number of persistent clustermq workers to create.

log_worker

Logical, whether to write a log file for each worker. Same as the log_worker argument of clustermq::Q() and clustermq::workers().

callr_function

A function from callr to start a fresh clean R process to do the work. Set to NULL to run in the current session instead of an external process (but restart your R session just before you do in order to clear debris out of the global environment). callr_function needs to be NULL for interactive debugging, e.g. tar_option_set(debug = "your_target"). However, callr_function should not be NULL for serious reproducible work.
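A sketch of the interactive debugging workflow this enables. The target name "your_target" is a placeholder:

```r
library(targets)
tar_option_set(debug = "your_target")      # placeholder target name
# Run in the current session so interactive debugging is reachable:
tar_make_clustermq(callr_function = NULL)
# For reproducible production runs, leave callr_function at its
# default (callr::r) so the pipeline runs in a clean external process.
```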

callr_arguments

A list of arguments to callr_function.

envir

An environment, where to run the target R script (default: _targets.R) if callr_function is NULL. Ignored if callr_function is anything other than NULL. callr_function should only be NULL for debugging and testing purposes, not for serious runs of a pipeline.

The envir argument of tar_make() and related functions always overrides the current value of tar_option_get("envir") in the current R session just before running the target script file, so whenever you need to set an alternative envir, you should always set it with tar_option_set() from within the target script file. In other words, if you call tar_option_set(envir = envir1) in an interactive session and then tar_make(envir = envir2, callr_function = NULL), then envir2 will be used.
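A sketch of the precedence rule just described, with hypothetical environments envir1 and envir2:

```r
library(targets)
envir1 <- new.env()
envir2 <- new.env()
tar_option_set(envir = envir1)  # set interactively, outside the script
# envir2 wins: tar_make_clustermq() overrides tar_option_get("envir")
# just before running the target script:
tar_make_clustermq(envir = envir2, callr_function = NULL)
# To make envir1 take effect, call tar_option_set(envir = envir1)
# inside _targets.R itself instead.
```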

script

Character of length 1, path to the target script file. Defaults to tar_config_get("script"), which in turn defaults to _targets.R. When you set this argument, the value of tar_config_get("script") is temporarily changed for the current function call. See tar_script(), tar_config_get(), and tar_config_set() for details about the target script file and how to set it persistently for a project.

store

Character of length 1, path to the targets data store. Defaults to tar_config_get("store"), which in turn defaults to _targets/. When you set this argument, the value of tar_config_get("store") is temporarily changed for the current function call. See tar_config_get() and tar_config_set() for details about how to set the data store path persistently for a project.

garbage_collection

Logical of length 1, whether to run garbage collection on the main process before sending a target to a worker. Independent from the garbage_collection argument of tar_target(), which controls garbage collection on the worker.

Value

NULL except if callr_function = callr::r_bg(), in which case a handle to the callr background process is returned. Either way, the value is invisibly returned.

Details

tar_make_clustermq() is like tar_make() except that targets run in parallel on persistent workers. A persistent worker is an R process that runs for a long time and runs multiple targets during its lifecycle. Persistent workers launch as soon as the pipeline reaches an outdated target with deployment = "worker", and they keep running until the pipeline starts to wind down.

To configure tar_make_clustermq(), you must configure the clustermq package. To do this, set global options clustermq.scheduler and clustermq.template inside the target script file (default: _targets.R). To read more about configuring clustermq for your scheduler, visit https://mschubert.github.io/clustermq/articles/userguide.html#configuration # nolint or https://books.ropensci.org/targets/hpc.html. clustermq is not a strict dependency of targets, so you must install clustermq yourself.
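An example target script sketching this configuration for a SLURM cluster. The scheduler choice and the template file path are assumptions; adjust both for your own system (see the clustermq user guide linked above):

```r
# _targets.R
library(targets)
options(
  clustermq.scheduler = "slurm",
  clustermq.template = "slurm_clustermq.tmpl"  # hypothetical template file
)
tar_option_set(deployment = "worker")  # the default; targets run on workers
list(
  tar_target(data, runif(1e6)),
  tar_target(result, mean(data))
)
```

Then call tar_make_clustermq(workers = 2) from the project root to run the pipeline on two persistent workers.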

Storage access

Several functions like tar_make(), tar_read(), tar_load(), tar_meta(), and tar_progress() read or modify the local data store of the pipeline. The local data store is in flux while a pipeline is running, and depending on how distributed computing or cloud computing is set up, not all targets can even reach it. So please do not call these functions from inside a target as part of a running pipeline. The only exception is literate programming target factories in the tarchetypes package such as tar_render() and tar_quarto().

See also

Other pipeline: tar_make_future(), tar_make()

Examples

if (!identical(tolower(Sys.info()[["sysname"]]), "windows")) {
if (identical(Sys.getenv("TAR_EXAMPLES"), "true")) { # for CRAN
tar_dir({ # tar_dir() runs code from a temp dir for CRAN.
tar_script({
  options(clustermq.scheduler = "multiprocess") # Does not work on Windows.
  tar_option_set()
  list(tar_target(x, 1 + 1))
}, ask = FALSE)
tar_make_clustermq()
})
}
}