A drake plan is a data frame with columns
"target" and "command". Each target is an R object
produced in your workflow, and each command is the
R code to produce it.
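To make the data-frame structure concrete, here is a minimal sketch, assuming the drake package is installed. The target names `raw` and `fit` are illustrative only, and drake_plan() records the commands without running them.

```r
library(drake)

# Two illustrative targets: `raw` is a dataset and `fit` depends on `raw`.
plan <- drake_plan(
  raw = mtcars,
  fit = lm(mpg ~ wt, data = raw)
)

is.data.frame(plan) # the plan is an ordinary data frame
colnames(plan)      # the "target" and "command" columns
plan$target         # the target names declared above
```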
Usage
drake_plan(
  ...,
  list = NULL,
  file_targets = NULL,
  strings_in_dots = NULL,
  tidy_evaluation = NULL,
  transform = TRUE,
  trace = FALSE,
  envir = parent.frame(),
  tidy_eval = TRUE,
  max_expand = NULL
)
Arguments
- ...
A collection of symbols/targets with commands assigned to them. See the examples for details.
- list
Deprecated.
- file_targets
Deprecated.
- strings_in_dots
Deprecated.
- tidy_evaluation
Deprecated. Use tidy_eval instead.
- transform
Logical, whether to transform the plan into a larger plan with more targets. Requires the transform field in target(). See the examples for details.
- trace
Logical, whether to add columns to show what happens during target transformations.
- envir
Environment for tidy evaluation.
- tidy_eval
Logical, whether to use tidy evaluation (e.g. unquoting/!!) when resolving commands. Tidy evaluation in transformations is always turned on regardless of the value you supply to this argument.
- max_expand
Positive integer, optional. max_expand is the maximum number of targets to generate in each map(), split(), or cross() transform. Useful if you have a massive plan and you want to test and visualize a strategic subset of targets before scaling up. Note: the max_expand argument of drake_plan() and transform_plan() is for static branching only. The dynamic branching max_expand is an argument of make() and drake_config().
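As a rough illustration of the static max_expand argument, the sketch below builds the same plan with and without a cap. It assumes drake is installed; simulate_data() is a hypothetical function that is never called, because drake_plan() only records commands.

```r
library(drake)

# simulate_data() is hypothetical; the commands are recorded, not run.
full <- drake_plan(
  sim = target(
    simulate_data(n),
    transform = map(n = c(10, 20, 30, 40, 50, 60))
  )
)
small <- drake_plan(
  sim = target(
    simulate_data(n),
    transform = map(n = c(10, 20, 30, 40, 50, 60))
  ),
  max_expand = 2
)

nrow(full)  # one target per value of n
nrow(small) # capped by max_expand
```

The smaller plan is handy for a quick test run or a readable dependency graph before scaling up to the full set of targets.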
Details
Besides "target" and "command", drake_plan()
understands a special set of optional columns. For details, visit
https://books.ropensci.org/drake/plans.html#special-custom-columns-in-your-plan
Columns
drake_plan() creates a special data frame. At minimum, that data frame
must have columns target and command with the target names and the
R code chunks to build them, respectively.
You can add custom columns yourself, either with target() (e.g.
drake_plan(y = target(f(x), transform = map(x = c(1, 2)), format = "fst")))
or by appending columns post-hoc (e.g. plan$col <- vals).
Some of these custom columns are special. They are optional,
but drake looks for them at various points in the workflow.
- transform: a call to map(), split(), cross(), or combine() to create and manipulate large collections of targets. Details: https://books.ropensci.org/drake/plans.html#large-plans.
- format: set a storage format to save big targets more efficiently. See the "Formats" section of this help file for more details.
- trigger: rule to decide whether a target needs to run. It is recommended that you define this one with target(). Details: https://books.ropensci.org/drake/triggers.html.
- hpc: logical values (TRUE/FALSE/NA) whether to send each target to parallel workers. Visit https://books.ropensci.org/drake/hpc.html#selectivity to learn more.
- resources: target-specific lists of resources for a computing cluster. See https://books.ropensci.org/drake/hpc.html#advanced-options for details.
- caching: overrides the caching argument of make() for each target individually. Possible values:
  - "main": tell the main process to store the target in the cache.
  - "worker": tell the HPC worker to store the target in the cache.
  - NA: default to the caching argument of make().
- elapsed and cpu: number of seconds to wait for the target to build before timing out (elapsed for elapsed time and cpu for CPU time).
- retries: number of times to retry building a target in the event of an error.
- seed: an optional pseudo-random number generator (RNG) seed for each target. drake usually comes up with its own unique reproducible target-specific seeds using the global seed (the seed argument to make() and drake_config()) and the target names, but you can overwrite these automatic seeds. NA entries default back to drake's automatic seeds.
- max_expand: for dynamic branching only. Same as the max_expand argument of make(), but on a target-by-target basis. Limits the number of sub-targets created for a given target.
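The two ways of adding columns can be sketched as follows, assuming drake is installed. get_data() and fit_model() are hypothetical placeholders, and `owner` is an arbitrary non-special column invented for this illustration.

```r
library(drake)

# get_data() and fit_model() are hypothetical; commands are only recorded.
plan <- drake_plan(
  data = target(get_data(), retries = 2, seed = 123),
  model = fit_model(data)
)

# Append an ordinary (non-special) column after the plan exists.
plan$owner <- c("alice", "bob")

colnames(plan)
plan$retries # the target without an explicit value is filled with NA
```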
Formats
Specialized target formats increase efficiency and flexibility.
Some allow you to save specialized objects like keras models,
while others increase the speed while conserving storage and memory.
You can declare target-specific formats in the plan
(e.g. drake_plan(x = target(big_data_frame, format = "fst")))
or supply a global default format for all targets in make().
Either way, most formats have specialized installation requirements
(e.g. R packages) that are not installed with drake by default.
You will need to install them separately yourself.
Available formats:
- "file": Dynamic files. To use this format, simply create local files and directories yourself and then return a character vector of paths as the target's value. Then, drake will watch for changes to those files in subsequent calls to make(). This is a more flexible alternative to file_in() and file_out(), and it is compatible with dynamic branching. See https://github.com/ropensci/drake/pull/1178 for an example.
- "fst": save big data frames fast. Requires the fst package. Note: this format strips non-data-frame attributes.
- "fst_tbl": Like "fst", but for tibble objects. Requires the fst and tibble packages. Strips away non-data-frame non-tibble attributes.
- "fst_dt": Like "fst" format, but for data.table objects. Requires the fst and data.table packages. Strips away non-data-frame non-data-table attributes.
- "diskframe": Stores disk.frame objects, which could potentially be larger than memory. Requires the fst and disk.frame packages. Coerces objects to disk.frames. Note: disk.frame objects get moved to the drake cache (a subfolder of .drake/ for most workflows). To ensure this data transfer is fast, it is best to save your disk.frame objects to the same physical storage drive as the drake cache, e.g. as.disk.frame(your_dataset, outdir = drake_tempfile()).
- "keras": save Keras models as HDF5 files. Requires the keras package.
- "qs": save any R object that can be properly serialized with the qs package. Requires the qs package. Uses qsave() and qread(). Uses the default settings in qs version 0.20.2.
- "rds": save any R object that can be properly serialized. Requires R version >= 3.5.0 due to ALTREP. Note: the "rds" format uses gzip compression, which is slow. "qs" is a superior format.
Keywords
drake_plan() understands special keyword functions for your commands.
With the exception of target(), each one is a proper function
with its own help file.
- target(): give the target more than just a command. Using target(), you can apply a transformation (examples: https://books.ropensci.org/drake/plans.html#large-plans), supply a trigger (https://books.ropensci.org/drake/triggers.html), or set any number of custom columns.
- file_in(): declare an input file dependency.
- file_out(): declare an output file to be produced when the target is built.
- knitr_in(): declare a knitr file dependency such as an R Markdown (*.Rmd) or R LaTeX (*.Rnw) file.
- ignore(): force drake to entirely ignore a piece of code: do not track it for changes and do not analyze it for dependencies.
- no_deps(): tell drake to not track the dependencies of a piece of code. drake still tracks the code itself for changes.
- id_chr(): get the name of the current target.
- drake_envir(): get the environment where drake builds targets. Intended for advanced custom memory management.
Transformations
drake has special syntax for generating large plans.
Your code will look something like
drake_plan(y = target(f(x), transform = map(x = c(1, 2, 3))))
You can read about this interface at
https://books.ropensci.org/drake/plans.html#large-plans.
Static branching
In static branching, you define batches of targets
based on information you know in advance.
Overall usage looks like
drake_plan(<x> = target(<...>, transform = <call>)),
where
- <x> is the name of the target or group of targets.
- <...> is optional arguments to target().
- <call> is a call to one of the transformation functions.
Transformation function usage:
- map(..., .data, .names, .id, .tag_in, .tag_out)
- split(..., slices, margin = 1L, drop = FALSE, .names, .tag_in, .tag_out)
- cross(..., .data, .names, .id, .tag_in, .tag_out)
- combine(..., .by, .names, .id, .tag_in, .tag_out)
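The difference between map() and cross() in static branching can be sketched with two small plans. This assumes drake is installed; f() is a hypothetical function that is never evaluated here.

```r
library(drake)

# f() is hypothetical; drake_plan() only records the generated calls.
mapped <- drake_plan(
  y = target(f(a, b), transform = map(a = c(1, 2), b = c(3, 4)))
)
crossed <- drake_plan(
  y = target(f(a, b), transform = cross(a = c(1, 2), b = c(3, 4)))
)

mapped$target  # map() pairs the values of a and b elementwise
crossed$target # cross() uses every combination of a and b
```

Because the expansion happens inside drake_plan(), you can inspect the generated targets and commands before running make().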
Dynamic branching
- map(..., .trace)
- cross(..., .trace)
- group(..., .by, .trace)
map() and cross() create dynamic sub-targets from the variables
supplied to the dots. As with static branching, the variables
supplied to map() must all have equal length.
group(f(data), .by = x) makes new dynamic
sub-targets from data. Here, data can be either static or dynamic.
If data is dynamic, group() aggregates existing sub-targets.
If data is static, group() splits data into multiple
subsets based on the groupings from .by.
Differences from static branching:
- ... must contain unnamed symbols with no values supplied, and they must be the names of targets.
- Arguments .id, .tag_in, and .tag_out no longer apply.
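One visible consequence of these differences is that a dynamic plan is not expanded when it is created. The sketch below, which assumes drake is installed, shows that the plan keeps one row per declared target and stores the branching call in a `dynamic` column; the sub-targets only come into existence when make() runs.

```r
library(drake)

plan <- drake_plan(
  x = seq_len(4),
  y = target(x^2, dynamic = map(x)) # one sub-target per element of x
)

nrow(plan)     # still one row per declared target
colnames(plan) # includes a "dynamic" column
```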
Examples
if (FALSE) { # \dontrun{
isolate_example("contain side effects", {
# For more examples, visit
# https://books.ropensci.org/drake/plans.html.
# Create drake plans:
mtcars_plan <- drake_plan(
  write.csv(mtcars[, c("mpg", "cyl")], file_out("mtcars.csv")),
  value = read.csv(file_in("mtcars.csv"))
)
if (requireNamespace("visNetwork", quietly = TRUE)) {
  plot(mtcars_plan) # fast simplified call to vis_drake_graph()
}
mtcars_plan
make(mtcars_plan) # Makes `mtcars.csv` and then `value`
head(readd(value))
# You can use knitr inputs too. See the top command below.
load_mtcars_example()
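head(my_plan)
if (requireNamespace("knitr", quietly = TRUE) &&
    requireNamespace("visNetwork", quietly = TRUE)) {
  plot(my_plan) # plot() calls vis_drake_graph(), which needs visNetwork
}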
# The `knitr_in("report.Rmd")` tells `drake` to dive into the active
# code chunks to find dependencies.
# There, `drake` sees that `small`, `large`, and `coef_regression2_small`
# are loaded in with calls to `loadd()` and `readd()`.
deps_code("report.Rmd")
# Formats are great for big data: https://github.com/ropensci/drake/pull/977
# Below, each target is 1.6 GB in memory.
# Run make() on this plan to see how much faster fst is!
n <- 1e8
plan <- drake_plan(
  data_fst = target(
    data.frame(x = runif(n), y = runif(n)),
    format = "fst"
  ),
  data_old = data.frame(x = runif(n), y = runif(n))
)
# Use transformations to generate large plans.
# Read more at
# `https://books.ropensci.org/drake/plans.html#create-large-plans-the-easy-way`. # nolint
drake_plan(
  data = target(
    simulate(nrows),
    transform = map(nrows = c(48, 64)),
    custom_column = 123
  ),
  reg = target(
    reg_fun(data),
    transform = cross(reg_fun = c(reg1, reg2), data)
  ),
  summ = target(
    sum_fun(data, reg),
    transform = cross(sum_fun = c(coef, residuals), reg)
  ),
  winners = target(
    min(summ),
    transform = combine(summ, .by = c(data, sum_fun))
  )
)
# Split data among multiple targets.
drake_plan(
  large_data = get_data(),
  slice_analysis = target(
    analyze(large_data),
    transform = split(large_data, slices = 4)
  ),
  results = target(
    rbind(slice_analysis),
    transform = combine(slice_analysis)
  )
)
# Set trace = TRUE to show what happened during the transformation process.
drake_plan(
  data = target(
    simulate(nrows),
    transform = map(nrows = c(48, 64)),
    custom_column = 123
  ),
  reg = target(
    reg_fun(data),
    transform = cross(reg_fun = c(reg1, reg2), data)
  ),
  summ = target(
    sum_fun(data, reg),
    transform = cross(sum_fun = c(coef, residuals), reg)
  ),
  winners = target(
    min(summ),
    transform = combine(summ, .by = c(data, sum_fun))
  ),
  trace = TRUE
)
# You can create your own custom columns too.
# See ?trigger for more on triggers.
drake_plan(
  website_data = target(
    command = download_data("www.your_url.com"),
    trigger = "always",
    custom_column = 5
  ),
  analysis = analyze(website_data)
)
# Tidy evaluation can help generate super large plans.
sms <- rlang::syms(letters) # To sub in character args, skip this.
drake_plan(x = target(f(char), transform = map(char = !!sms)))
# Dynamic branching
# Get the mean mpg for each cyl in the mtcars dataset.
plan <- drake_plan(
  raw = mtcars,
  group_index = raw$cyl,
  munged = target(raw[, c("mpg", "cyl")], dynamic = map(raw)),
  mean_mpg_by_cyl = target(
    data.frame(mpg = mean(munged$mpg), cyl = munged$cyl[1]),
    dynamic = group(munged, .by = group_index)
  )
)
make(plan)
readd(mean_mpg_by_cyl)
})
} # }