This function takes a bowerbird configuration object and synchronizes each of the data sources defined within it. Data files will be downloaded if they are not present on the local machine, or if the configuration has been set to update local files.
Usage
bb_sync(
  config,
  create_root = FALSE,
  verbose = FALSE,
  catch_errors = TRUE,
  confirm_downloads_larger_than = 0.1,
  dry_run = FALSE
)
Arguments
- config
bb_config: configuration as returned by bb_config
- create_root
logical: should the data root directory be created if it does not exist? If this is FALSE (default) and the data root directory does not exist, an error will be generated
- verbose
logical: if TRUE, provide additional progress output
- catch_errors
logical: if TRUE, catch errors and continue the synchronization process. The sync process works through data sources sequentially, and so if catch_errors is FALSE, then an error during the synchronization of one data source will prevent all subsequent data sources from synchronizing
- confirm_downloads_larger_than
numeric or NULL: if non-negative, bb_sync will ask the user for confirmation to download any data source of size greater than this number (in GB). A value of zero will trigger confirmation on every data source. A negative or NULL value will not prompt for confirmation. Note that this only applies when R is being used interactively. The expected download size is taken from the collection_size parameter of the data source, and so its accuracy is dependent on the accuracy of the data source definition
- dry_run
logical: if TRUE, bb_sync will do a dry run of the synchronization process without actually downloading files
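As a rough illustration of how these arguments combine, the sketch below wraps two calls to bb_sync in a helper. The helper name preview_then_sync is hypothetical (it is not part of bowerbird); the arguments are those listed above. It assumes the bowerbird package is installed when the helper is actually called.

```r
## Hypothetical helper, not part of bowerbird: preview a sync, then run it.
## `config` is a configuration as returned by bb_config (with sources added).
preview_then_sync <- function(config) {
  ## Dry run: go through the sync process without downloading any files
  preview <- bowerbird::bb_sync(config, dry_run = TRUE, verbose = TRUE)
  ## Real run: never prompt for confirmation (NULL), and stop at the first
  ## error rather than continuing on to the remaining data sources
  result <- bowerbird::bb_sync(config,
                               confirm_downloads_larger_than = NULL,
                               catch_errors = FALSE)
  list(preview = preview, result = result)
}
```

Setting catch_errors = FALSE here is a design choice for scripted use, where an early failure is often preferable to a partially synchronized set of data sources; the default (TRUE) is usually more convenient for interactive work.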
Value
a tibble with the name, id, source_url, sync success status, and files of each data source. Data sources that contain multiple source URLs will appear as multiple rows in the returned tibble, one per source_url. files is a tibble with columns url (the URL the file was downloaded from), file (the path to the file), and note (either "downloaded" for a file that was downloaded, "local copy" for a file that was not downloaded because there was already a local copy, or "decompressed" for files that were extracted from a downloaded (or already-locally-present) compressed file). url will be NA for "decompressed" files.
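The sketch below shows one way the returned object might be inspected. It does not call bb_sync: it builds a mock object in base R with the columns described above, using made-up names, URLs and paths purely for illustration. It also assumes that the sync success column is named status, and it uses a plain data frame rather than a tibble so that no extra packages are needed.

```r
## Mock per-source file tables (made-up values, for illustration only)
files_a <- data.frame(
  url  = c("https://example.org/a.zip", NA),  # NA url for the decompressed file
  file = c("/data/example.org/a.zip", "/data/example.org/a.csv"),
  note = c("downloaded", "decompressed"),
  stringsAsFactors = FALSE
)
files_b <- data.frame(
  url  = "https://example.org/b.csv",
  file = "/data/example.org/b.csv",
  note = "local copy",
  stringsAsFactors = FALSE
)
files_c <- files_a[0, ]  # a source that failed and produced no files

## Mock sync result: one row per source_url, with a list column of file tables
status <- data.frame(
  name       = c("Source A", "Source B", "Source C"),
  id         = c("src-a", "src-b", "src-c"),
  source_url = c("https://example.org/a/", "https://example.org/b/",
                 "https://example.org/c/"),
  status     = c(TRUE, TRUE, FALSE),  # assumed column name for sync success
  stringsAsFactors = FALSE
)
status$files <- list(files_a, files_b, files_c)

## Combine the per-source file tables into one table
all_files <- do.call(rbind, status$files)

## Count files by how they were obtained
note_counts <- table(all_files$note)

## Paths of files extracted from compressed archives
decompressed <- all_files$file[all_files$note == "decompressed"]

## Names of data sources that did not synchronize successfully
failed <- status$name[!status$status]
```

With a real result from bb_sync, only the last four statements would be needed.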
Details
Note that when bb_sync is run, the local_file_root directory must exist or create_root=TRUE must be specified (i.e. bb_sync(...,create_root=TRUE)). If create_root=FALSE and the directory does not exist, bb_sync will fail with an error.
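The rule above can be checked before calling bb_sync. The function below is a hypothetical pre-flight check written in base R (it is not a bowerbird function, and bb_sync performs its own check regardless); root is assumed to be the same path that was passed as local_file_root to bb_config.

```r
## Hypothetical pre-flight check, not part of bowerbird.
## Returns TRUE (invisibly) if a sync could proceed, otherwise stops.
check_root <- function(root, create_root = FALSE) {
  ## The directory already exists: nothing more to check
  if (dir.exists(root)) return(invisible(TRUE))
  ## The directory is missing, but the sync will be asked to create it
  if (create_root) return(invisible(TRUE))
  stop("data root directory does not exist: ", root,
       "; create it or use create_root = TRUE")
}

## An existing directory passes; a missing one passes only with create_root = TRUE
check_root(tempdir())
check_root(file.path(tempdir(), "not_yet_created"), create_root = TRUE)
```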
Examples
if (FALSE) { # \dontrun{
## Choose a location to store files on the local file system.
## Normally this would be an explicit choice by the user, but here
## we just use a temporary directory for example purposes.
td <- tempdir()
cf <- bb_config(local_file_root = td)
## Bowerbird must then be told which data sources to synchronize.
## Let's use data from the Australian 2016 federal election, which is provided as one
## of the example data sources:
my_source <- bb_example_sources("Australian Election 2016 House of Representatives data")
## Add this data source to the configuration:
cf <- bb_add(cf, my_source)
## Once the configuration has been defined and the data source added to it,
## we can run the sync process.
## We set verbose = TRUE so that we see additional progress output:
status <- bb_sync(cf, verbose = TRUE)
## The files in this data set have been stored in a data-source specific
## subdirectory of our local file root:
status$files[[1]]
## We can run this at any later time and our repository will update if the source has changed:
status2 <- bb_sync(cf, verbose = TRUE)
} # }