The deposits package is a universal client for depositing and accessing research data anywhere. Currently supported services are zenodo and figshare. These two systems have fundamentally different interfaces (APIs, or Application Programming Interfaces), and access to these, and indeed to all deposition services, has traditionally required a separate software client for each service. The deposits package aims to be a universal client offering access to a variety of deposition services, without users having to know any specific details of the APIs for each service. This vignette demonstrates the primary functionality of the deposits package.
The deposits client
The deposits package uses an R6 client to interface with the individual deposition services. The following sub-section explains the properties of a deposits client for those unfamiliar with R6 objects.
R6 methods
The R6 package used to construct deposits clients allows for structured class objects in R. These objects include elements (such as variables) and methods, which in R are generally functions. A new client is constructed with the new() method, which for deposits requires specifying the service for which the client is to be constructed:
cli <- depositsClient$new (service = "figshare")
Additional functions are called in a similar way, using the notation cli$deposit_function(). The deposits package is constructed so that function calls made in this way will “automatically” update the object itself, and so generally do not need to be assigned to a return value. For example, the function deposits_list() updates the list of deposits on the associated service. In conventional R packages, calling this function would require assigning a return value like this:
cli_updated <- cli$deposits_list ()
R6 objects are, however, always updated in place, so the client itself, cli, will already include the updated list of deposits without any need for assigning the return value to cli_updated. That is, rather than the above line, all deposits functions may be called simply as:
cli$deposits_list ()
(The single exception to this is the deposit_download_file() function, which returns the path to the locally downloaded file, and so should always be assigned to a return value.)
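The in-place updating described above can be seen with a small stand-alone R6 class. This is a minimal sketch only: the ToyClient class and the add_item() function are hypothetical names invented for illustration and are not part of the deposits package.

```r
library (R6)

# Hypothetical toy class, for illustration only; not part of deposits.
ToyClient <- R6Class ("ToyClient",
    public = list (
        items = character (0),
        # The method modifies `self` in place and returns it invisibly,
        # so calls do not need to be assigned to a return value.
        add = function (x) {
            self$items <- c (self$items, x)
            invisible (self)
        }
    )
)

toy <- ToyClient$new ()
toy$add ("first") # no assignment needed
toy$add ("second")
length (toy$items) # both items are stored in `toy`

# A conventional R list is copied when modified inside a function, so the
# original is unchanged unless the result is assigned back.
add_item <- function (lst, x) {
    lst$items <- c (lst$items, x)
    lst
}
plain <- list (items = character (0))
invisible (add_item (plain, "first")) # result discarded
length (plain$items) # `plain` still has no items
```

The contrast between the two halves is the point: the R6 object keeps every update, while the plain list loses any update that is not assigned back.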
Initialising a deposits client
An empty client can be constructed by naming the desired service. An additional sandbox parameter, when set to TRUE, connects the client to the zenodo sandbox environment, which is intended for testing the API. The production zenodo API is used with the default sandbox = FALSE.
cli <- depositsClient$new ("zenodo", sandbox = TRUE)
cli
#> <deposits client>
#> deposits service : zenodo
#> sandbox: TRUE
#> url_base : https://sandbox.zenodo.org/api/
#> Current deposits : <none>
#>
#> hostdata : <none>
#> metadata : <none>
Client construction requires personal access or authentication tokens for deposits services to be stored as local environment variables, as described in the main README document. Authentication tokens are checked when new clients are constructed, so the $new() function will only succeed with valid tokens.
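Before constructing a client, it can help to confirm that a token is actually visible to the R session. The following is a rough sketch in base R. The has_token() helper, the rule of matching the service name anywhere in the variable name, and the EXAMPLESERVICE_TOKEN variable are all assumptions made for illustration; the README remains the reference for the variable names the package expects.

```r
# Hypothetical helper: TRUE if any non-empty environment variable has a name
# containing the service name. The matching rule is an assumption.
has_token <- function (service) {
    env <- Sys.getenv ()
    hits <- grepl (tolower (service), tolower (names (env)), fixed = TRUE)
    any (hits & nzchar (env))
}

# Placeholder variable name and value, for demonstration only.
Sys.setenv (EXAMPLESERVICE_TOKEN = "not-a-real-token")

has_token ("exampleservice")
has_token ("no-such-service-xyz")
```

A check like this only shows that a variable is set; whether the token is valid is still decided by the service when $new() is called.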
As also described in that README, all methods of a deposits client can be seen with the deposits_methods() method:
cli$deposits_methods ()
#> List of methods for a deposits client:
#>
#> - deposit_delete
#> - deposit_download_file
#> - deposit_fill_metadata
#> - deposit_new
#> - deposit_retrieve
#> - deposit_service
#> - deposit_update
#> - deposit_upload_file
#> - deposits_list
#> - deposits_methods
#> - deposits_search
#>
#> see `?depositsClient` for full details of all methods.
The client constructed above is mostly empty, but nevertheless demonstrates the two primary fields or elements of a deposits client:
- hostdata, holding all metadata from a “host” or external deposits service for a particular deposit; and
- metadata, holding a consistently structured representation of the key components of the hostdata.
The hostdata structures are generally lists, but differ between services. The metadata structures remain consistent between services, which allows data to be transformed from one format to another and, in future functionality, to be transferred between different services.
Both of these elements represent the “metadata” of a deposit, with the data itself referred to as “files”, which can be uploaded and downloaded. Thus all deposits begin with metadata, with the actual data upload only possible once the initial metadata has been specified and uploaded.
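The difference between one consistent metadata structure and several service-specific ones can be sketched with plain lists. The two translator functions below are hypothetical and are not the package's internal code. The target field names (description, creators, authors) follow the public zenodo and figshare API documentation as best understood here, and should be checked against those documents.

```r
# Service-neutral metadata, using the sample values from this vignette.
metadata <- list (
    title = "New Title",
    abstract = "This is the abstract",
    creator = list ("A. Person", "B. Person")
)

# Hypothetical translators, for illustration only.
to_zenodo <- function (m) {
    list (
        title = m$title,
        description = m$abstract,
        creators = lapply (m$creator, function (i) list (name = i))
    )
}
to_figshare <- function (m) {
    list (
        title = m$title,
        description = m$abstract,
        authors = lapply (m$creator, function (i) list (name = i))
    )
}

# One input structure, two differently named output structures.
names (to_zenodo (metadata))
names (to_figshare (metadata))
```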
Metadata
A new deposit is initially constructed by filling the metadata field with a local representation of metadata. The hostdata field is filled only after this initial deposit metadata has been uploaded to the external service. The best way to understand the distinction between metadata and hostdata is through a practical demonstration.
Metadata as a list
There are several ways of defining metadata for a deposits entity, perhaps the easiest of which is as a simple list:
metadata <- list (
title = "New Title",
abstract = "This is the abstract",
creator = list ("A. Person", "B. Person")
)
A new deposits client can be filled with this metadata by passing it as the metadata parameter:
cli <- depositsClient$new (service = "zenodo", sandbox = TRUE, metadata = metadata)
print (cli)
#> <deposits client>
#> deposits service : zenodo
#> sandbox: TRUE
#> url_base : https://sandbox.zenodo.org/api/
#> Current deposits : <none>
#>
#> hostdata : <none>
#> metadata : 4 terms (see 'metadata' element for details)
The summary produced by calling print() (or, equivalently, just typing cli in the console) says that the object now includes four metadata terms: the three supplied above plus a created date. They can be seen by viewing cli$metadata:
cli$metadata
#> $abstract
#> [1] "This is the abstract"
#>
#> $created
#> [1] "2023-01-01"
#>
#> $creator
#> $creator[[1]]
#> [1] "A. Person"
#>
#> $creator[[2]]
#> [1] "B. Person"
#>
#>
#> $title
#> [1] "New Title"
Metadata in deposits objects are stored as named lists. These metadata are primarily intended for internal use within the deposits package, and shouldn’t generally need to be manipulated directly by users of this package (although they certainly can be, as illustrated below).
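Because the metadata are ordinary named lists, the usual base R list tools apply to them. The sketch below works on a stand-alone list rather than on a client, so it shows only the list operations themselves; how a client reacts to a modified list is not covered here.

```r
metadata <- list (
    title = "New Title",
    abstract = "This is the abstract",
    creator = list ("A. Person", "B. Person")
)

# Change one term, and append an item to a nested list.
metadata$title <- "Revised Title"
metadata$creator <- c (metadata$creator, list ("C. Person"))

# modifyList() merges updates by name; setting a term to NULL removes it.
metadata <- utils::modifyList (metadata, list (abstract = NULL))

names (metadata)
length (metadata$creator)
```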
Metadata from a local file
Another convenient way to specify metadata is to use the deposits_metadata_template() function to write a local “.json” representation of metadata. This local file includes all metadata fields recognised by a deposits client. The function also accepts an optional metadata parameter, a named list of values used to pre-populate entries in the resultant file.
meta_file <- tempfile (pattern = "meta-", fileext = ".json")
deposits_metadata_template (filename = meta_file, metadata = metadata)
head (readLines (meta_file))
#> [1] "{"
#> [2] " \"_comment1\": \"Fields starting with underscores will be ignored (and can safely be deleted)\","
#> [3] " \"abstract\": \"This is the abstract\","
#> [4] " \"accessRights\": \"\","
#> [5] " \"accrualMethod\": \"\","
#> [6] " \"accrualPeriodicity\": \"\","
Those metadata can then be directly edited using any text file editor. The name of the file can then also be passed as the metadata parameter of a new deposits client. The following code thus produces the same results as above:
cli <- depositsClient$new (service = "zenodo", sandbox = TRUE, metadata = meta_file)
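An edited metadata file can also be inspected in R before it is passed to a client. The following sketch uses the jsonlite package on a small stand-in file with only four fields, since the real template is much longer. The filtering rule (drop fields whose names start with an underscore, and fields left empty) is an assumption made for illustration, and may not match how the package itself reads the file.

```r
library (jsonlite)

# Stand-in for a template file; the real template has many more fields.
meta_file <- tempfile (pattern = "meta-", fileext = ".json")
template <- list (
    `_comment1` = "Fields starting with underscores will be ignored",
    abstract = "This is the abstract",
    accessRights = "",
    title = "New Title"
)
write_json (template, meta_file, auto_unbox = TRUE, pretty = TRUE)

# Read back as a plain list, then drop underscore-prefixed and empty fields.
meta <- fromJSON (meta_file, simplifyVector = FALSE)
keep <- !startsWith (names (meta), "_") &
    vapply (meta, function (i) !identical (i, ""), logical (1))
meta <- meta [keep]

names (meta)
```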
Creating a new deposit
Once filled with metadata, a deposits client can be used to initiate a new deposit on the associated external service with the $deposit_new() function. This is not to be confused with the $new() function, which creates a new client, whereas $deposit_new() uses an existing client to create a new deposit on the external service. Using the client constructed above with our sample metadata gives the following result:
cli$deposit_new ()
#> ID of new deposit: 1064327
print (cli)
#> <deposits client>
#> deposits service : zenodo
#> sandbox: TRUE
#> url_base : https://sandbox.zenodo.org/api/
#> Current deposits : 1 (see 'deposits' element for details)
#>
#> url_deposit : https://sandbox.zenodo.org/deposit/1064327
#> deposit id : 1064327
#> hostdata : list with 14 elements
#> metadata : 4 terms (see 'metadata' element for details)
The client now lists one current deposit, additional fields for the URL and “id” of the deposit, and has a hostdata field with 14 elements. Importantly, the id field holds a unique integer value used to identify particular deposits, both on the external service and as the deposit_id parameter of deposits client functions.
If we now construct a new, empty client, we see the following result:
cli <- depositsClient$new (service = "zenodo", sandbox = TRUE)
print (cli)
#> <deposits client>
#> deposits service : zenodo
#> sandbox: TRUE
#> url_base : https://sandbox.zenodo.org/api/
#> Current deposits : 1 (see 'deposits' element for details)
#>
#> hostdata : <none>
#> metadata : <none>
This differs from our initial client in that it now lists one “current deposit”. We can examine that to get the associated “id” value:
cli$deposits$id
#> [1] 1064327
We can then retrieve the metadata we previously uploaded with the deposit_retrieve() function:
cli$deposit_retrieve (deposit_id = cli$deposits$id)
The local client then holds identical information to the previous client immediately after calling deposit_new(). That is, deposit_retrieve() has filled the local client with all of the metadata from the previously-created deposit.
Uploading files to deposits
The deposits clients thus far have only been used to construct and upload metadata. The main point of a deposit is of course to store actual data in any arbitrary format alongside these structured metadata. This is achieved with the deposit_upload_file() function, demonstrated in the following code which uses our deposit retrieved directly above.
path <- file.path (tempdir (), "data.csv")
write.csv (datasets::Orange, path, row.names = FALSE)
cli$deposit_upload_file (path = path)
Although the print output of our cli object does not change after uploading, the details of the files are contained in the hostdata$files element:
cli$hostdata$files
#> checksum filename filesize id
#> 1 cc624d72ede85ef061afa494d9951f6f data.csv 625 56c44dd6-5f84-4212-9a65-d37f64ca886f
#> 2 eaeb7c4f8a931c99e662172299a0b17f datapackage.json 812 32d556ef-5b65-4b9d-a8a8-2e7bed11da5d
#> links.download
#> 1 https://sandbox.zenodo.org/api/files/561f4971-9e86-4235-b574-f5662f6088e3/data.csv
#> 2 https://sandbox.zenodo.org/api/files/561f4971-9e86-4235-b574-f5662f6088e3/datapackage.json
#> links.self
#> 1 https://sandbox.zenodo.org/api/deposit/depositions/1161632/files/56c44dd6-5f84-4212-9a65-d37f64ca886f
#> 2 https://sandbox.zenodo.org/api/deposit/depositions/1161632/files/32d556ef-5b65-4b9d-a8a8-2e7bed11da5d
The list of files includes a “datapackage.json” file generated by the frictionless package, as described in the main README. Files can be downloaded with the converse deposit_download_file() function, demonstrated here by first removing the local copy, and then downloading it from the deposits service:
file.remove (path)
file <- cli$deposit_download_file (filename = "data.csv", path = tempdir ())
file
#> [1] /tmp/RtmpcO59N8/data.csv
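A downloaded file can be checked against the checksum recorded by the service. The sketch below assumes that the checksum column shown in hostdata$files holds MD5 hashes, which its 32-character values suggest but the output above does not confirm. The checksum_matches() helper is hypothetical, and the file is recreated locally so that the code runs without a network connection.

```r
path <- file.path (tempdir (), "data.csv")
write.csv (datasets::Orange, path, row.names = FALSE)

# md5sum() returns a named character vector; drop the names to compare values.
local_md5 <- unname (tools::md5sum (path))

# Hypothetical helper comparing a local file against an expected checksum.
checksum_matches <- function (file, expected) {
    identical (unname (tools::md5sum (file)), expected)
}

checksum_matches (path, local_md5)
checksum_matches (path, "00000000000000000000000000000000")
```

In real use, the expected value would be taken from the row of cli$hostdata$files matching the downloaded file name, rather than computed locally as it is here.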