Add data derivation information to a DataPackage
Add information about the relationships among DataObject members
in a DataPackage, retrospectively describing the way in which derived data were
created from source data using a processing program such as an R script. These provenance
relationships allow the derived data to be understood sufficiently for users
to be able to reproduce the computations that created the derived data, and to
trace lineage of the derived data objects. The method
will add provenance relationships between a script that was executed, the files
that it used as sources, and the derived files that it generated.
The DataPackage to add provenance relationships to.
A list of DataObjects for files that were read by the program. Alternatively, a list of DataObject identifiers can be specified as a list of character strings.
The DataObject created for the program such as an R script. Alternatively the DataObject identifier can be specified.
A list of DataObjects for files that were generated by the program. Alternatively, a list of DataObject identifiers can be specified as a list of character strings.
A logical value. If TRUE, the provenance relationship
prov:wasDerivedFrom will be used to connect every source and derivation. The default value is TRUE.
This method operates on a DataPackage that has had DataObjects for the script, data sources (inputs), and data derivations (outputs) previously added to it, or can reference identifiers for objects that exist in other DataPackage instances. This allows a user to create a standalone package that contains all of its source, script, and derived data, or a set of data packages that are chained together via a set of derivation relationships between the members of those packages.
Provenance relationships are described following the ProvONE data model, which can be viewed at https://purl.dataone.org/provone-v1-dev. In particular, the following relationships are inserted (among others):
prov:used indicates which source data was used by a program execution
prov:generatedBy indicates which derived data was created by a program execution
prov:wasDerivedFrom indicates the source data from which derived data were created using the program
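Because prov:wasDerivedFrom links each derived object to its source, lineage can be traced by following those links backwards, including across chained packages. The following is a minimal base R sketch of that idea, not part of datapack. It assumes a relationship table with subject, predicate, and object columns like the one returned by getRelationships; the identifiers (pkgA:..., pkgB:...) and the helper trace_lineage are hypothetical and exist only for illustration.

```r
# Full URI of the PROV derivation predicate.
derived_from <- "http://www.w3.org/ns/prov#wasDerivedFrom"

# Hypothetical relationship table; the identifiers are invented for
# illustration and do not come from a real DataPackage.
rels <- data.frame(
  subject   = c("pkgB:plot.png", "pkgA:model.rds", "pkgB:plot.png"),
  predicate = c(derived_from, derived_from,
                "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
  object    = c("pkgA:model.rds", "pkgA:raw.csv",
                "http://purl.dataone.org/provone/2015/01/15/ontology#Data"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk wasDerivedFrom edges backwards from `id`
# and return every ancestor identifier that is reachable.
trace_lineage <- function(rels, id, predicate = derived_from) {
  edges <- rels[rels$predicate == predicate, c("subject", "object")]
  seen <- character(0)
  queue <- id
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    parents <- edges$object[edges$subject == current]
    new_parents <- setdiff(parents, seen)  # skip already-visited nodes
    seen <- c(seen, new_parents)
    queue <- c(queue, new_parents)
  }
  seen
}

lineage <- trace_lineage(rels, "pkgB:plot.png")
print(lineage)
```

With a real package, the table passed to the helper would be the result of getRelationships(dp) rather than a hand-built data frame.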
library(datapack)
dp <- new("DataPackage")

# Add the script to the DataPackage
progFile <- system.file("./extdata/pkg-example/logit-regression-example.R", package="datapack")
progObj <- new("DataObject", format="application/R", filename=progFile)
dp <- addMember(dp, progObj)

# Add a script input to the DataPackage
inFile <- system.file("./extdata/pkg-example/binary.csv", package="datapack")
inObj <- new("DataObject", format="text/csv", filename=inFile)
dp <- addMember(dp, inObj)

# Add a script output to the DataPackage
outFile <- system.file("./extdata/pkg-example/gre-predicted.png", package="datapack")
outObj <- new("DataObject", format="image/png", filename=outFile)
dp <- addMember(dp, outObj)

# Add the provenance relationships, linking the input and output to the script execution
# Note: 'sources' and 'derivations' can also be lists of "DataObjects" or "DataObject" identifiers
dp <- describeWorkflow(dp, sources = inObj, program = progObj, derivations = outObj)

# View the results
utils::head(getRelationships(dp))
#>                                         subject
#> 6         _34b909f3-aead-414a-89f5-776813b3048c
#> 5         _34b909f3-aead-414a-89f5-776813b3048c
#> 1 urn:uuid:258ec740-51f6-4ce5-870d-ce195535fd05
#> 7 urn:uuid:4ec9ca39-02f7-418d-96ae-b1cac5af6ace
#> 8 urn:uuid:9f9960e7-4bdf-4101-804c-55cd4c436c9c
#> 4 urn:uuid:9f9960e7-4bdf-4101-804c-55cd4c436c9c
#>                                         predicate
#> 6 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
#> 5               http://www.w3.org/ns/prov#hadPlan
#> 1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
#> 7 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
#> 8             http://purl.org/dc/terms/identifier
#> 4 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
#>                                                          object subjectType
#> 6                         http://www.w3.org/ns/prov#Association       blank
#> 5                 urn:uuid:4ec9ca39-02f7-418d-96ae-b1cac5af6ace       blank
#> 1      http://purl.dataone.org/provone/2015/01/15/ontology#Data        <NA>
#> 7   http://purl.dataone.org/provone/2015/01/15/ontology#Program        <NA>
#> 8                 urn:uuid:9f9960e7-4bdf-4101-804c-55cd4c436c9c        <NA>
#> 4 http://purl.dataone.org/provone/2015/01/15/ontology#Execution        <NA>
#>   objectType                             dataTypeURI
#> 6        uri                                    <NA>
#> 5        uri                                    <NA>
#> 1        uri                                    <NA>
#> 7        uri                                    <NA>
#> 8    literal http://www.w3.org/2001/XMLSchema#string
#> 4        uri                                    <NA>
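The relationship table mixes provenance statements with RDF typing and identifier statements, so it can help to narrow it to predicates in the PROV namespace. The sketch below uses base R only and rebuilds three rows of the output above by hand (subject, predicate, and object columns only) so that it runs without datapack; the variable names are illustrative.

```r
# Three rows copied from the output above. With a real package this
# table would come from getRelationships(dp) instead.
rels <- data.frame(
  subject = c("_34b909f3-aead-414a-89f5-776813b3048c",
              "_34b909f3-aead-414a-89f5-776813b3048c",
              "urn:uuid:9f9960e7-4bdf-4101-804c-55cd4c436c9c"),
  predicate = c("http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
                "http://www.w3.org/ns/prov#hadPlan",
                "http://purl.org/dc/terms/identifier"),
  object = c("http://www.w3.org/ns/prov#Association",
             "urn:uuid:4ec9ca39-02f7-418d-96ae-b1cac5af6ace",
             "urn:uuid:9f9960e7-4bdf-4101-804c-55cd4c436c9c"),
  stringsAsFactors = FALSE
)

# Keep only rows whose predicate lies in the PROV namespace.
prov_ns <- "http://www.w3.org/ns/prov#"
prov_rels <- rels[startsWith(rels$predicate, prov_ns), ]

# Shorten the predicate to its prefixed form for readability.
prov_rels$predicate <- sub(prov_ns, "prov:", prov_rels$predicate, fixed = TRUE)
print(prov_rels)
```

Only the prov:hadPlan row survives the filter here, because the other two rows use the RDF and Dublin Core namespaces.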