
The bb_fingerprint function, given a data repository configuration, returns the size, download timestamp, and hash of every file associated with the configuration's data sources. It is intended as a general helper for tracking data provenance: for each of these files, it records where the file came from (the data source ID), when it was downloaded, and a hash, so that later versions of the file can be compared against it to detect changes. See also vignette("data_provenance").
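As a rough illustration of why a stored hash supports change detection, the following sketch hashes a small temporary file, rewrites it, and hashes it again. It is not bowerbird's internal code: it uses base R's tools::md5sum (MD5) purely because that needs no extra packages, whereas bb_fingerprint defaults to "sha1".

```r
## Illustrative sketch only, not bowerbird's implementation.
## tools::md5sum ships with base R, so MD5 is used here;
## bb_fingerprint itself defaults to hash = "sha1".
f <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), f)
hash_before <- unname(tools::md5sum(f))  # 32-character hex digest

## simulate a later, modified download of the same file
writeLines(c("x,y", "1,3"), f)
hash_after <- unname(tools::md5sum(f))

## differing digests flag that the file contents have changed
changed <- !identical(hash_before, hash_after)
print(changed)
```

The same idea scales to a whole collection: keep the hashes from one fingerprint and compare them with the hashes from a later one.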

Usage

bb_fingerprint(config, hash = "sha1")

Arguments

config

bb_config: configuration as returned by bb_config

hash

string: algorithm to use to calculate file hashes: "md5", "sha1" (the default), or "none". Note that file hashing can be slow for large file collections.

Value

a tibble with columns:

  • filename - the full path and filename of the file

  • data_source_id - the identifier of the associated data source (as per the id argument to bb_source)

  • size - the file size

  • last_modified - last modified date of the file

  • hash - the hash of the file (unless hash="none" was specified)
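Because the returned table has one row per file with filename and hash columns, two fingerprints taken at different times can be compared to see which files were added, removed, or changed. The following is a minimal sketch under that assumption: compare_fingerprints is a hypothetical helper (not part of bowerbird), and the toy old/new tables stand in for two saved bb_fingerprint results (for example, written with saveRDS and read back later). It uses base R merge to avoid extra dependencies; it needs the hash column, so it does not apply to fingerprints taken with hash = "none".

```r
## Hypothetical helper, not part of bowerbird: classify each file as
## added, removed, changed or unchanged between two fingerprint tables
## that have (at least) the documented filename and hash columns.
compare_fingerprints <- function(old, new) {
  m <- merge(old[, c("filename", "hash")], new[, c("filename", "hash")],
             by = "filename", all = TRUE, suffixes = c("_old", "_new"))
  m$status <- ifelse(is.na(m$hash_old), "added",
              ifelse(is.na(m$hash_new), "removed",
              ifelse(m$hash_old == m$hash_new, "unchanged", "changed")))
  m
}

## toy stand-ins for two bb_fingerprint() results (placeholder hashes)
old <- data.frame(
  filename = c("/data/a.nc", "/data/b.nc", "/data/c.nc"),
  hash     = c("h1", "h2", "h3"),
  stringsAsFactors = FALSE
)
new <- data.frame(
  filename = c("/data/a.nc", "/data/b.nc", "/data/d.nc"),
  hash     = c("h1", "h2-modified", "h4"),
  stringsAsFactors = FALSE
)

result <- compare_fingerprints(old, new)
print(result[, c("filename", "status")])
```

The helper works the same on tibbles, since merge accepts any data frame; a dplyr::full_join on filename would be an equivalent tidyverse approach.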

Examples

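if (FALSE) { # \dontrun{
  library(bowerbird)
  ## configuration rooted at a local directory, with the example data sources added
  cf <- bb_config("/my/file/root") %>%
    bb_add(bb_example_sources())
  ## size, last-modified time and sha1 hash (the default) of every file from those sources
  bb_fingerprint(cf)
} # }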