The bb_fingerprint function, given a data repository configuration, returns the download timestamp and hashes of all files associated with its data sources. This is intended as a general helper for tracking data provenance: for each of these files, we have information on where it came from (the data source ID), when it was downloaded, and a hash so that later versions of the file can be compared to detect changes. See also vignette("data_provenance").
Arguments
- config
bb_config: configuration as returned by bb_config
- hash
string: algorithm used to calculate file hashes: "md5", "sha1", or "none". Note that file hashing can be slow for large file collections.
Value
a tibble with columns:
- filename - the full path and filename of the file
- data_source_id - the identifier of the associated data source (as per the id argument to bb_source)
- size - the file size
- last_modified - last modified date of the file
- hash - the hash of the file (unless hash="none" was specified)
Examples
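if (FALSE) { # \dontrun{
library(bowerbird)
# build a configuration rooted at a local directory and add the example sources
cf <- bb_config("/my/file/root") %>%
  bb_add(bb_example_sources())
# fingerprint all files associated with the configured data sources
bb_fingerprint(cf)
} # }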
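
The description above notes that the hashes allow later versions of files to be compared to detect changes. The following is a minimal sketch of one way such a comparison might be done, using base R only. The two data frames are hypothetical stand-ins for the results of two separate bb_fingerprint calls (the file names, source IDs, and hash values are invented for illustration), and the status column is an assumption of this sketch rather than anything produced by the package.

```r
# Hypothetical fingerprints taken at two points in time. In practice these
# would be the tibbles returned by two separate bb_fingerprint(cf) calls.
old_fp <- data.frame(
  filename = c("/my/file/root/a.nc", "/my/file/root/b.nc"),
  data_source_id = c("src1", "src1"),
  hash = c("aaa111", "bbb222"),
  stringsAsFactors = FALSE
)
new_fp <- data.frame(
  filename = c("/my/file/root/a.nc", "/my/file/root/b.nc", "/my/file/root/c.nc"),
  data_source_id = c("src1", "src1", "src2"),
  hash = c("aaa111", "ccc333", "ddd444"),
  stringsAsFactors = FALSE
)

# Full join on filename, keeping files that appear in either fingerprint
cmp <- merge(old_fp[, c("filename", "hash")],
             new_fp[, c("filename", "hash")],
             by = "filename", all = TRUE, suffixes = c("_old", "_new"))

# Classify each file by comparing its old and new hash
cmp$status <- ifelse(is.na(cmp$hash_old), "added",
              ifelse(is.na(cmp$hash_new), "removed",
              ifelse(cmp$hash_old != cmp$hash_new, "changed", "unchanged")))

# Show only the files that differ between the two fingerprints
cmp[cmp$status != "unchanged", c("filename", "status")]
```

Joining on filename rather than comparing rows by position means that files added or removed between the two fingerprints are reported instead of misaligning the comparison.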