Skip to contents

The deposits package is a universal client for depositing and accessing research data anywhere. Currently supported services are zenodo and figshare. These two systems have fundamentally different interfaces (“API”s, or Application Programming Interfaces), and access to these and indeed all deposition services has traditionally been enabled through individual software clients. The deposits package aims to be a universal client offering access to a variety of deposition services, without users having to know any specific details of the APIs for each service. This vignette provides a demonstration of the primary functionality of the deposits package.

The deposits client

The deposits package uses an R6 client to interface with the individual deposition services. The following sub-section explains the properties of a deposits client for those unfamiliar with R6 objects.

R6 methods

The R6 package used to construct deposits clients here allows for structured class objects in R. The objects include elements (such as variables) and methods, which for R are generally functions. A new client can be constructed with the new operator, which for deposits requires specifying the service for which the client is to be constructed:

cli <- depositsClient$new (service = "figshare")

Additional functions are called in a similar way, using the notation, cli$deposit_function(). The deposits package is constructed so that function calls constructed is this way will “automatically” update the object itself, and so generally do not need to be assigned to a return value. For example, the function deposits_list() updates the list of deposits on the associated service. In conventional R packages, calling this function would require assigning a return value like this:

cli_updated <- cli$deposits_list ()

R6 objects are, however, always updated internally, so the client itself, cli, will already include the updated list of deposits without any need for assigning the return value to cli_updated. That is, rather than the above line, all deposits functions may be called simply as,

cli$deposits_list ()

(The single exception to this is the deposit_download_file() function, which returns the path to the locally downloaded file, and so should always be assigned to a return value.)

Initialising a deposits client

An empty client can be constructed by naming the desired service. An additional sandbox parameter constructs a client to the zenodo sandbox environment intended for testing their API. Actual use of the zenodo API can then be enabled with the default sandbox = FALSE.

cli <- depositsClient$new ("zenodo", sandbox = TRUE)
cli
#> <deposits client>
#> deposits service : zenodo
#>           sandbox: TRUE
#>         url_base : https://sandbox.zenodo.org/api/
#> Current deposits : <none>
#>
#>   hostdata : <none>
#>   metadata : <none>

Client construction requires personal access or authentication tokens for deposits services to be stored as local environment variables, as described in the main README document. Authentication tokens are checked when new clients are constructed, so the $new() function will only succeed with valid tokens.

As also described in that README, the main functions of a deposits client are all prefixed with either deposit_ or deposits_, as seen from the result of ls(cli):

ls (cli)
#>  [1] "deposit_delete"        "deposit_download_file" "deposit_fill_metadata"
#>  [4] "deposit_new"           "deposit_retrieve"      "deposit_update"
#>  [7] "deposit_upload_file"   "deposits"              "deposits_list"
#> [10] "headers"               "hostdata"              "id"
#> [13] "initialize"            "metadata"              "print"
#> [16] "sandbox"               "service"               "term_map"
#> [19] "url_base"              "url_service"

The client constructed above is mostly empty, but nevertheless demonstrates the two primary fields or elements of a deposits client:

  1. hostdata holding all metadata from a “host” or external deposits service for a particular deposit; and
  2. metadata holding a consistently structured representation of the key components of the hostdata.

The hostdata structures are generally lists, but differ for different services, whereas the metadata structures remain consistent between services, and allow data to be transformed from one format to another, and, in future functionality, will allow data to be transferred between different services.

Both of these elements represent the “metadata” of a deposit, with the data itself referred to as “files”, which can be uploaded and downloaded. Thus all deposits begin with metadata, with the actual data upload only possible once the initial metadata has been specified and uploaded.

Metadata

A new deposit is initially constructed by filling the metadata field with a local representation of metadata. The hostdata field is filled only after this initial deposit metadata has been uploaded to the external service. These two fields thus further differ through metadata being a local, and locally-defined, representation of metadata for a deposit, whereas hostdata are always provided and defined by the external service. The best way to understand this distinction is through a practical demonstration.

Metadata as a list

There are several ways of defining metadata for a deposits entity, perhaps the easiest of which is as a simple list:

metadata <- list (
    title = "New Title",
    abstract = "This is the abstract",
    creator = list ("A. Person", "B. Person")
)

A new deposits client can be filled with this metadata by passing it as the metadata parameter:

cli <- depositsClient$new (service = "zenodo", sandbox = TRUE, metadata = metadata)
print (cli)
#> <deposits client>
#>  deposits service : zenodo
#>            sandbox: TRUE
#>          url_base : https://sandbox.zenodo.org/api/
#>  Current deposits : <none>
#>
#>     hostdata : <none>
#>     metadata : 3 terms (see 'metadata' element for details)

The summary produced by calling print() (or, equivalently, just typing cli in the console) says that the object now includes three metadata terms. They can be seen by viewing cli$metadata:

cli$metadata
#> <DCEntry>
#> ....|-- updated: 2022-05-23 14:13:42
#> ....|-- title <DCTitle>
#> ........|-- value: New Title
#> ....|-- abstract <DCAbstract>
#> ........|-- value: This is the abstract
#> ....|-- creator <DCCreator>
#> ........|-- value: A. Person
#> ....|-- creator <DCCreator>
#> ........|-- value: B. Person>

Metadata in deposits objects are stored as DCEntry objects in a format provided by the atom4R package. These metadata are primarily intended for internal use within the deposits package, and shouldn’t generally need to be manipulated directly by users of this package (although they certainly can be, as illustrated below).

Metadata from a local file

Another convenient way to specify metadata is to use the deposits_metadata_template() funciton to write a local YAML representation of metadata. This function also accepts the metadata parameter of data specified as a list:

meta_file <- tempfile (pattern = "meta-", fileext = ".yaml")
deposits_metadata_template (filename = meta_file, metadata = metadata)
head (readLines (meta_file))
#> [1] "{"                                                                                                 
#> [2] "  \"_comment1\": \"Fields starting with underscores will be ignored (and can safely be deleted)\","
#> [3] "  \"Abstract\": \"This is the abstract\","                                                         
#> [4] "  \"AccessRights\": \"\","                                                                         
#> [5] "  \"AccrualMethod\": \"\","                                                                        
#> [6] "  \"AccrualPeriodicity\": \"\","

Those metadata can then be directly edited using any text file editor, and the modified results saved to a local file. The name of that file can then also be passed as the metadata parameter of a new deposits client. The following code thus produces the same results as above:

cli <- depositsClient$new (service = "zenodo", sandbox = TRUE, metadata = meta_file)

Creating a new deposit

A deposits client with metadata can be used to initiate a new deposit on the associated external service with the $deposit_new() function. This is not to be confused with the $new() function which creates a new client. The $deposit_new() function uses an existing client to create a new deposit on the external service. Using the client constructed above with our sample metadata gives the following result:

cli$deposit_new ()
print (cli)
#> <deposits client>
#>  deposits service : zenodo
#>            sandbox: TRUE
#>          url_base : https://sandbox.zenodo.org/api/
#>  Current deposits : 1 (see 'deposits' element for details)
#>
#>  url_deposit : https://sandbox.zenodo.org/deposit/1064327
#>   deposit id : 1064327
#>     hostdata : list with 14  elements
#>     metadata : 3 terms (see 'metadata' element for details)

The client now lists one current deposit, additional fields for the URL and “id” of the deposit, and has a hostdata field with 14 elements. Importantly, the id field holds a unique integer value used to identify particular deposits both on all external services, and as the deposit_id parameter of deposits client functions. The following code demonstrates by constructing a new client with no metadata.

cli <- depositsClient$new (service = "zenodo", sandbox = TRUE)
print (cli)
#> <deposits client>
#>  deposits service : zenodo
#>            sandbox: TRUE
#>          url_base : https://sandbox.zenodo.org/api/
#>  Current deposits : 1 (see 'deposits' element for details)
#>
#>   hostdata : <none>
#>   metadata : <none>

This differs from our initial client in that it now lists one “current deposit”. We can examine that to get the associated “id” value:

cli$deposits$id
#> [1] 1064327
deposit_id <- cli$deposits$id

We can then retrieve the metadata we previously uploaded with the deposit_retrieve() function:

cli$deposit_retrieve (deposit_id = deposit_id)

The local client then holds identical information to the previous client immediately after calling deposit_new() - that is, retrieve_deposit() has filled the local client with all of the metadata from that deposit.

Uploading files to deposits

The deposits clients thus far have only been used to construct and upload metadata. The main point of a deposit is of course to store actual data in any arbitrary format alongside these structured metadata. This is achieved with the deposit_upload_file() function, demonstrated in the following code which uses our deposit retrieved directly above.

path <- file.path (tempdir (), "data.Rds")
saveRDS (datasets::Orange, path)
cli$deposit_upload_file (path = path)

Although the print output of our cli object does not change after uploading, the details of the files are contained in the hostdata$files element:

cli$hostdata$files
#>                           checksum               filename filesize
#> 1 22c61006458c44ed979bf3461e330693 data-18e115882cf14.Rds      506
#>                                     id
#> 1 00cad89d-4084-4337-b96a-2f1e232b2b6c
#>                                                                                     links.download
#> 1 https://sandbox.zenodo.org/api/files/e3f727dc-1dfd-4849-88cf-994f630105cf/data-18e115882cf14.Rds
#>                                                                                              links.self
#> 1 https://sandbox.zenodo.org/api/deposit/depositions/1064327/files/00cad89d-4084-4337-b96a-2f1e232b2b6c

Files can be downloaded with the converse download_file function, demonstrated here by first removing the local copy, and then downloading it from the deposits service:

file.remove (path)
file <- cli$deposit_download_file (filename = "data.Rds", path = tempdir ())
identical (readRDS (file), datasets::Orange)
#> [1] TRUE