Add information about the relationships among DataObject members
in a DataPackage, retrospectively describing the way in which derived data were
created from source data using a processing program such as an R script. These provenance
relationships allow the derived data to be understood sufficiently for users
to be able to reproduce the computations that created the derived data, and to
trace the lineage of the derived data objects. The method describeWorkflow
adds provenance relationships between the script that was executed, the files
that it used as sources, and the derived files that it generated.
Arguments
- x
The DataPackage to add provenance relationships to.
- ...
Additional parameters
- sources
A list of DataObjects for files that were read by the program. Alternatively, a list of DataObject identifiers can be specified as a list of character strings.
- program
The DataObject created for the program such as an R script. Alternatively the DataObject identifier can be specified.
- derivations
A list of DataObjects for files that were generated by the program. Alternatively, a list of DataObject identifiers can be specified as a list of character strings.
- insertDerivations
A logical value. If TRUE then the provenance relationship prov:wasDerivedFrom will be used to connect every source and derivation. The default value is TRUE.
Details
This method operates on a DataPackage that has had DataObjects for the script, data sources (inputs), and data derivations (outputs) previously added to it, or can reference identifiers for objects that exist in other DataPackage instances. This allows a user to create a standalone package that contains all of its source, script, and derived data, or a set of data packages that are chained together via a set of derivation relationships between the members of those packages.
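The chained-package case described above can be sketched as follows. This is a minimal illustration, not the package's prescribed workflow: it assumes that a source can be passed as a character identifier obtained with getIdentifier(), and the names sourcePkg, derivedPkg, and sourceId are hypothetical names chosen for this example.

```r
library(datapack)

# First package: holds only the source data (hypothetical name 'sourcePkg')
inFile <- system.file("extdata/pkg-example/binary.csv", package="datapack")
inObj <- new("DataObject", format="text/csv", filename=inFile)
sourcePkg <- new("DataPackage")
sourcePkg <- addMember(sourcePkg, inObj)

# Record the identifier of the source so another package can refer to it
sourceId <- getIdentifier(inObj)

# Second package: holds the script and its output, but not the source data
progFile <- system.file("extdata/pkg-example/logit-regression-example.R", package="datapack")
progObj <- new("DataObject", format="application/R", filename=progFile)
outFile <- system.file("extdata/pkg-example/gre-predicted.png", package="datapack")
outObj <- new("DataObject", format="image/png", filename=outFile)
derivedPkg <- new("DataPackage")
derivedPkg <- addMember(derivedPkg, progObj)
derivedPkg <- addMember(derivedPkg, outObj)

# Assumption: the source is referenced by identifier only, as a list of character strings
derivedPkg <- describeWorkflow(derivedPkg, sources = list(sourceId),
                               program = progObj, derivations = outObj)

# The identifier from the first package should now appear in the second package's relationships
rels <- getRelationships(derivedPkg)
rels[rels$object == sourceId, c("subject", "predicate", "object")]
```

Keeping the source in its own package lets several derived packages point at the same input without duplicating it.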
Provenance relationships are described following the ProvONE data model, which can be viewed at https://purl.dataone.org/provone-v1-dev. In particular, the following relationships are inserted (among others):
- prov:used
indicates which source data was used by a program execution
- prov:wasGeneratedBy
indicates which derived data was created by a program execution
- prov:wasDerivedFrom
indicates the source data from which derived data were created using the program
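One way to check which of these relationships were actually recorded is to filter the output of getRelationships() by predicate. The sketch below is illustrative only; the helper exampleObject and the variable provRels are hypothetical names introduced here, and the example files are the ones shipped with datapack.

```r
library(datapack)

# Hypothetical helper for this example: build a DataObject from a datapack example file
exampleObject <- function(name, format) {
  path <- system.file(file.path("extdata", "pkg-example", name), package="datapack")
  new("DataObject", format=format, filename=path)
}

progObj <- exampleObject("logit-regression-example.R", "application/R")
inObj <- exampleObject("binary.csv", "text/csv")
outObj <- exampleObject("gre-predicted.png", "image/png")

dp <- new("DataPackage")
dp <- addMember(dp, progObj)
dp <- addMember(dp, inObj)
dp <- addMember(dp, outObj)
dp <- describeWorkflow(dp, sources = inObj, program = progObj, derivations = outObj)

# Keep only statements whose predicate is in the W3C PROV namespace,
# dropping rdf:type and any other non-PROV statements
rels <- getRelationships(dp)
isProv <- grepl("^http://www.w3.org/ns/prov#", as.character(rels$predicate))
provRels <- rels[isProv, c("subject", "predicate", "object")]

# Count how many statements were recorded for each PROV predicate
print(table(as.character(provRels$predicate)))
```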
Examples
library(datapack)
dp <- new("DataPackage")
# Add the script to the DataPackage
progFile <- system.file("./extdata/pkg-example/logit-regression-example.R", package="datapack")
progObj <- new("DataObject", format="application/R", filename=progFile)
dp <- addMember(dp, progObj)
# Add a script input to the DataPackage
inFile <- system.file("./extdata/pkg-example/binary.csv", package="datapack")
inObj <- new("DataObject", format="text/csv", filename=inFile)
dp <- addMember(dp, inObj)
# Add a script output to the DataPackage
outFile <- system.file("./extdata/pkg-example/gre-predicted.png", package="datapack")
outObj <- new("DataObject", format="image/png", filename=outFile)
dp <- addMember(dp, outObj)
# Add the provenance relationships, linking the input and output to the script execution
# Note: 'sources' and 'derivations' can also be lists of DataObjects or DataObject identifiers
dp <- describeWorkflow(dp, sources = inObj, program = progObj, derivations = outObj)
# View the results
utils::head(getRelationships(dp))
#> subject
#> 6 _02624868-f821-4258-86ea-b15d6e311a75
#> 5 _02624868-f821-4258-86ea-b15d6e311a75
#> 7 urn:uuid:240596b1-77c9-4f86-9f07-ddff94e2e978
#> 1 urn:uuid:64df373d-349b-49e0-a908-213c2b1cd53d
#> 2 urn:uuid:859c97f0-fd83-4005-9faa-760d2cae170c
#> 11 urn:uuid:859c97f0-fd83-4005-9faa-760d2cae170c
#> predicate
#> 6 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
#> 5 http://www.w3.org/ns/prov#hadPlan
#> 7 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
#> 1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
#> 2 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
#> 11 http://www.w3.org/ns/prov#wasDerivedFrom
#> object subjectType
#> 6 http://www.w3.org/ns/prov#Association blank
#> 5 urn:uuid:240596b1-77c9-4f86-9f07-ddff94e2e978 blank
#> 7 http://purl.dataone.org/provone/2015/01/15/ontology#Program <NA>
#> 1 http://purl.dataone.org/provone/2015/01/15/ontology#Data <NA>
#> 2 http://purl.dataone.org/provone/2015/01/15/ontology#Data <NA>
#> 11 urn:uuid:64df373d-349b-49e0-a908-213c2b1cd53d <NA>
#> objectType dataTypeURI
#> 6 uri <NA>
#> 5 uri <NA>
#> 7 uri <NA>
#> 1 uri <NA>
#> 2 uri <NA>
#> 11 <NA> <NA>