The deposits package
“deposits” is an R package which provides a universal client for depositing and accessing research data in a variety of online deposition services. Currently supported services are zenodo and figshare. These two systems have fundamentally different interfaces (“API”s, or Application Programming Interfaces), and access to these and indeed all deposition services has traditionally been enabled through individual software clients. The deposits package aims to be a universal client offering access to a variety of deposition services, without users having to know any specific details of the APIs for each service. This vignette demonstrates how the deposits package can be used to manage the processes of uploading and publishing research data, using the methods summarised in Figure 1.
An empty client can be constructed by naming the desired service. An additional sandbox parameter constructs a client to the zenodo sandbox environment intended for testing their API. Actual use of the zenodo API can then be enabled with the default sandbox = FALSE.
cli <- depositsClient$new ("zenodo", sandbox = TRUE)
cli
#> <deposits client>
#> deposits service : zenodo
#> sandbox: TRUE
#> url_base : https://sandbox.zenodo.org/api/
#> Current deposits : <none>
#>
#> hostdata : <none>
#> metadata : <none>
Client construction requires personal access or authentication tokens for deposits services to be stored as local environment variables, as described in the installation and setup document. Authentication tokens are checked when new clients are constructed, so the $new() method will only succeed with valid tokens.
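As a quick check before constructing a client, the names of locally defined environment variables can be inspected. The following is a minimal sketch in base R. The assumption that token variables contain the service name is made here for illustration only; the installation and setup document remains the reference for the names actually required.

```r
# List names of environment variables which mention a supported service.
# ASSUMPTION: token variables include the service name somewhere in their
# name; see the installation and setup document for the actual convention.
env_names <- names (Sys.getenv ())
token_vars <- grep ("zenodo|figshare", env_names,
    ignore.case = TRUE, value = TRUE
)

# Print the names only, never the token values themselves
print (token_vars)
```

An empty result suggests that no tokens are visible to the current R session, in which case client construction would be expected to fail.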
cli$deposits_methods ()
#> List of methods for a deposits client:
#>
#> - deposit_add_resource
#> - deposit_delete
#> - deposit_delete_file
#> - deposit_download_file
#> - deposit_embargo
#> - deposit_fill_metadata
#> - deposit_new
#> - deposit_prereserve_doi
#> - deposit_publish
#> - deposit_retrieve
#> - deposit_service
#> - deposit_update
#> - deposit_upload_file
#> - deposit_version
#> - deposits_list
#> - deposits_methods
#> - deposits_search
#>
#> see `?depositsClient` for full details of all methods.
All methods are described in detail in the documentation entry for the deposits client. All methods starting with the singular “deposit_” prefix operate on individual deposits. The final three methods, starting with the plural “deposits_” prefix, apply either to a service as a whole (“list” and “search”) or to the deposits client itself (“methods”). The main methods, and relationships between them, are also illustrated in Figure 1.
The client constructed above is mostly empty, but nevertheless demonstrates the two primary fields or elements of a deposits client, the “hostdata” and “metadata”. Both of these elements represent the “metadata” of a deposit, with the data itself referred to as “files”, which can be uploaded and downloaded. These files also have accompanying metadata, according to the “frictionless” workflow as described in the separate “frictionless” vignette.
There are thus three types of metadata used in a deposits workflow:
- “metadata” which describe a deposit and associated properties, such as author names and affiliations, deposit titles and descriptions, dates, keywords, links to other deposits or publications, and many other terms. These kinds of metadata are described in the metadata vignette.
- “frictionless metadata” which describe the actual contents of the data to be deposited. These kinds of metadata are (optionally) generated and (always) handled here by the frictionless package. These kinds of metadata are described in the frictionless vignette.
- “hostdata” which are provided in different formats by the various deposits services, and are intended as read-only data used to examine the remote records of a deposit.
The term “metadata” refers throughout this and all other deposits documentation to the first of these three kinds, with the second always explicitly referred to as “frictionless metadata”. The “metadata” and “frictionless metadata” structures remain consistent between services, and allow data to be transformed from one format to another, and between local clients and remote services. In contrast, the hostdata structures are directly provided by the deposits host services, generally as lists, and with different structures for different services. These structures are read-only fields which are automatically filled by the deposits client, and are intended to provide insight into metadata records stored on host sites.
A new deposit is initially constructed by filling the metadata field with a local representation of metadata. There are several ways of doing this, as described in the separate metadata vignette. One of the easiest approaches is to define metadata as a simple list:
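The list definition itself is not reproduced in this text. The following sketch reconstructs it from the client output shown further below, which has three terms (“title”, “abstract”, and “creator”). The order in which the terms are given here is an assumption.

```r
# Metadata as a plain named list. The "creator" item is a list-of-lists,
# with one inner list per creator.
metadata <- list (
    title = "New Title",
    abstract = "This is the abstract",
    creator = list (
        list (name = "A. Person"),
        list (name = "B. Person")
    )
)

# Inspect the nested structure
str (metadata)
```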
Note that the “creator” item has to be a list-of-lists, because aspects other than name may also be included, and the second list is required to distinguish different creators, as described in detail in the metadata vignette. A new deposits client can be filled with this metadata by passing it as the metadata argument:
cli <- depositsClient$new (
    service = "zenodo",
    sandbox = TRUE,
    metadata = metadata
)
print (cli)
#> <deposits client>
#> deposits service : zenodo
#> sandbox: TRUE
#> url_base : https://sandbox.zenodo.org/api/
#> Current deposits : <none>
#>
#> hostdata : <none>
#> metadata : 3 terms (see 'metadata' element for details)
The summary produced by calling print() (or, equivalently, just typing cli in the console) says that the object now includes three metadata terms. They can be seen by viewing cli$metadata, confirming that the client metadata are precisely what we specified:
#> $abstract
#> [1] "This is the abstract"
#>
#> $creator
#> $creator[[1]]
#> $creator[[1]]$name
#> [1] "A. Person"
#>
#>
#> $creator[[2]]
#> $creator[[2]]$name
#> [1] "B. Person"
#>
#>
#>
#> $title
#> [1] "New Title"
Alternative ways of specifying and entering metadata are described in the metadata vignette, along with detailed descriptions of the kinds of metadata accepted by a deposits client.
Once filled with metadata, a deposits client can be used to initiate a new deposit on the associated external service with the $deposit_new() method. The $deposit_new() method uses an existing client to create a new deposit on the nominated service, whereas the $new() method creates a new client. Calling deposit_new() from the client constructed above with our sample metadata gives the following result:
cli$deposit_new ()
#> ID of new deposit: 1064327
print (cli)
#> <deposits client>
#> deposits service : zenodo
#> sandbox: TRUE
#> url_base : https://sandbox.zenodo.org/api/
#> Current deposits : 1 (see 'deposits' element for details)
#>
#> url_deposit : https://sandbox.zenodo.org/deposit/1064327
#> deposit id : 1064327
#> hostdata : list with 14 elements
#> metadata : 4 terms (see 'metadata' element for details)
The client now lists one current deposit, additional fields for the URL and “id” of the deposit, and has a “hostdata” field with 14 elements. The “ID” value printed by the call to deposit_new() is listed in the client as its “deposit id”. This is a unique integer value used to identify particular deposits on external services. The value can be accessed any time as cli$id. The “metadata” item also includes an additional “identifier” element containing a pre-reserved DOI provided by the deposits service.
From that point on, a client will always show (at least) one deposit. For example, if we return at some later time to a new R session and initiate a new, empty client, we would see the following result:
cli <- depositsClient$new (service = "zenodo", sandbox = TRUE)
print (cli)
#> <deposits client>
#> deposits service : zenodo
#> sandbox: TRUE
#> url_base : https://sandbox.zenodo.org/api/
#> Current deposits : 1 (see 'deposits' element for details)
#>
#> hostdata : <none>
#> metadata : <none>
This differs from our initial client in that it now lists one “current deposit”.
We can examine a deposits client to get the “id” values of all current deposits. Extending from the previous example, the “id” can be accessed as:
cli$deposits$id
#> [1] 1064327
More generally, information on all deposits currently associated with a user’s account (as identified by the token described in the installation vignette) can be accessed as cli$deposits. With the single deposit shown in the previous steps, the first few fields of the result look like this:
cli$deposits [, 1:5]
#>   conceptrecid             created                    doi
#> 1      1200932 2023-00-01T00:00:00 10.5072/zenodo.1064327
#>                                  doi_url      id
#> 1 https://doi.org/10.5072/zenodo.1064327 1064327
We can retrieve the metadata from this or any previously uploaded deposit with the deposit_retrieve() method:
cli$deposit_retrieve (deposit_id = cli$deposits$id )
The local client then holds identical information to the previous client immediately after calling deposit_new(). That is, deposit_retrieve() has filled the local client with all of the metadata from the previously created deposit.
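The call above passes cli$deposits$id directly, which works because the account holds a single deposit. With several deposits, that element would be a vector, and one value would need to be selected first. The following sketch uses a mock data frame in place of cli$deposits. Its values are hypothetical, and only the column names “created” and “id” are taken from the output shown above.

```r
# Mock stand-in for cli$deposits, with hypothetical values
deposits <- data.frame (
    created = c ("2023-01-01T00:00:00", "2023-02-01T00:00:00"),
    id = c (111L, 222L)
)

# ISO 8601 timestamps in a common format sort correctly as plain strings,
# so the most recently created deposit comes first after ordering.
newest <- order (deposits$created, decreasing = TRUE) [1]
deposit_id <- deposits$id [newest]
deposit_id

# The selected value could then be passed on as:
# cli$deposit_retrieve (deposit_id = deposit_id)
```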
The previous sections of this document describe how to initiate a deposits client, and how to use that client to create deposits on, and retrieve metadata from, remote deposits services. The main point of a deposit is of course to store actual data in any arbitrary format alongside these structured metadata. This is achieved with the deposit_upload_file() method, demonstrated in the following code which uses our deposit retrieved directly above. It is recommended to store all data for a single deposit within a single directory, which the following code also creates.
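The code itself is not reproduced in this text. The following sketch shows one way the step could look: it creates a “data” sub-directory of the session's temporary directory, writes one of R's built-in datasets there as “data.csv”, and uploads that file. The directory and file names follow the paths shown in the outputs elsewhere in this document, while the choice of dataset is an assumption. The upload itself needs a live client, so it is only attempted when a client object named cli exists.

```r
# Create a dedicated directory for all data of this deposit
data_dir <- file.path (tempdir (), "data")
dir.create (data_dir, showWarnings = FALSE)

# Write an example dataset (ASSUMPTION: any tabular data would do)
path <- file.path (data_dir, "data.csv")
write.csv (datasets::Orange, path, row.names = FALSE)

# Upload only if a deposits client called `cli` is available; this step
# requires network access and a valid token.
if (exists ("cli") && inherits (cli, "R6")) {
    cli$deposit_upload_file (path = path)
}
```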
The client then holds additional information which appears after typing print(cli), or just cli:
print (cli)
#> <deposits client>
#> deposits service : zenodo
#> sandbox: TRUE
#> url_base : https://sandbox.zenodo.org/api/
#> Current deposits : 1 (see 'deposits' element for details)
#>
#> url_deposit : https://sandbox.zenodo.org/deposit/1064327
#> deposit id : 1064327
#> hostdata : list with 14 elements
#> metadata : 4 terms (see 'metadata' element for details)
#> local_path : /tmp/RtmpxSiYhW/data
#> resources : 1 local, 1 remote
The client now holds a local_path field identifying the directory of the active deposit, and lists numbers of both local and remote resources. The details of the remote resources are contained in the hostdata$files element (which was previously empty):
cli$hostdata$files
#>                           checksum         filename filesize                                   id
#> 1 cc624d72ede85ef061afa494d9951f6f         data.csv      625 56c44dd6-5f84-4212-9a65-d37f64ca886f
#> 2 eaeb7c4f8a931c99e662172299a0b17f datapackage.json      812 32d556ef-5b65-4b9d-a8a8-2e7bed11da5d
#>                                                                               links.download
#> 1         https://sandbox.zenodo.org/api/files/561f4971-9e86-4235-b574-f5662f6088e3/data.csv
#> 2 https://sandbox.zenodo.org/api/files/561f4971-9e86-4235-b574-f5662f6088e3/datapackage.json
#>                                                                                              links.self
#> 1 https://sandbox.zenodo.org/api/deposit/depositions/1161632/files/56c44dd6-5f84-4212-9a65-d37f64ca886f
#> 2 https://sandbox.zenodo.org/api/deposit/depositions/1161632/files/32d556ef-5b65-4b9d-a8a8-2e7bed11da5d
The list of files includes a “datapackage.json” file generated by the frictionless package. This file is not counted in “resources”. As described in the main README, and at length in the separate “frictionless” vignette, the “datapackage.json” file contains both the metadata entered into the deposits client, and “frictionless metadata” describing the internal properties of the dataset itself.
Files can be downloaded with the deposit_download_file() method. To demonstrate how that works, the following code first removes the local version of the file, then downloads it from the remote service and confirms that a local version has been successfully re-created.
file.remove (path)
file <- cli$deposit_download_file (filename = "data.csv", path = data_dir)
file
#>  /tmp/RtmpcO59N8/data/data.csv
The workflow described in the preceding section results in a frictionless metadata file being simultaneously generated, filled with deposits metadata, and uploaded to the nominated service. As described in detail in the “frictionless” vignette, an alternative workflow allows frictionless metadata files to be generated locally prior to any uploading. This uses the deposit_add_resource() method, where a “resource” is a local data file or object.
After initiating a client with metadata, as demonstrated above:
cli <- depositsClient$new (service = "zenodo", sandbox = TRUE, metadata = metadata)
A frictionless metadata file which is only stored locally can then be generated by the following call, which specifies the path to a local data file.
cli$deposit_add_resource (path = path)
The client will then list an additional local_path, as demonstrated above, and in this case will list resources: 1 local, 0 remote, because the resource has not yet been uploaded to the remote service. The local_path directory containing the specified file will also have an additional “datapackage.json” file including the deposits metadata used in client construction. This file may be edited as desired prior to uploading. To update a deposits client with changes to external metadata files, simply pass the path to that file to the deposit_fill_metadata() method. When ready, a single call to the deposit_upload_file() method will upload the file specified in that call, along with the frictionless “datapackage.json” metadata file.
All deposits are initiated on the nominated services as “private” deposits, meaning:
- They can only be viewed by the deposit owner; and
- They can be freely edited, including complete deletion.
A deposit can only be publicly viewed once it has been published, as described in the final section of this vignette. The process of using deposits to prepare one or more datasets for publication will generally involve multiple stages of editing and updating.
Once a deposits client has been filled with metadata and connected to a local_path, as demonstrated above, any of the local files may be edited, including the frictionless “datapackage.json” file. The client and the deposit held on the remote server may then be updated by calling the deposit_update() method. Any changes to the “metadata” field of the “datapackage.json” file will be reflected in the “metadata” field of the deposits client, as well as in the metadata passed to the remote service. Any modified files, including “datapackage.json”, will also be uploaded to the remote service, over-writing previous versions.
Note that local files must first be individually uploaded with the deposit_upload_file() method before the deposit_update() method can be used to update them. Moreover, calling deposit_update() before all files held in the local_path directory have been uploaded will generally produce an error noting that all files must first be uploaded prior to calling deposit_update().
An example of a full workflow for creating and editing a deposits client and associated metadata would look something like the following five main steps:
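Initiate a local deposits client with metadata, by passing the metadata list to depositsClient$new() as demonstrated above.
Create a new deposit on the nominated service by calling the deposit_new() method of that client.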
Upload local data, which the following code simulates by creating a “dummy” dataset in the temporary directory of the current R session:
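The code for this step is not reproduced here. A minimal sketch, assuming the same directory layout and file name as in the outputs shown in this document, and an arbitrary built-in dataset as the “dummy” data:

```r
# Directory holding all files of the deposit, inside the session's tempdir
data_dir <- file.path (tempdir (), "data")
if (!dir.exists (data_dir)) dir.create (data_dir)

# "Dummy" dataset: the first rows of a built-in data frame (an assumption)
path <- file.path (data_dir, "data.csv")
write.csv (head (datasets::mtcars), path)

# Confirm what the directory now contains
list.files (data_dir)
```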
The following call then uploads that dataset to the newly-created deposit:
cli$deposit_upload_file (path = path)
Calling deposit_upload_file() the first time also creates local and remote versions of a frictionless “datapackage.json” file, holding all metadata, and the DOI of the new deposit. Uploading files also automatically generates the local_path field in the deposits client, enabling numbers of local and remote resources to be counted and shown when printing the client.
Modify metadata. The following code provides a proof-of-principle modification of metadata, by changing “New Title” to “Updated Title”:
fr <- file.path (data_dir, "datapackage.json")
dp <- frictionless::read_package (fr)
dp$metadata$title
#> [1] "New Title"
dp$metadata$title <- "Updated Title"
frictionless::write_package (dp, data_dir)
This is an indirect way of editing metadata, by using R code. The recommended way to update deposits metadata is to directly edit and modify the “datapackage.json” file.
Update both local client and remote deposit data, noting that the local_path variable is held in the client itself, so does not need to be passed to the update method.
cli$deposit_update ()
#> Local file at [/tmp/RtmpBM0VYr/data/data.csv] is identical on host and will not be uploaded.
#> Local file at [/tmp/RtmpBM0VYr/data/datapackage.json] has changed and will now be uploaded.
cli$metadata$title
#> [1] "Updated Title"
cli$hostdata$title
#> [1] "Updated Title"
Local modifications are reflected both in the updated “metadata” within the deposits client, and in the “hostdata” stored on the Zenodo service.
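The messages printed by deposit_update() distinguish files which are identical on the host from files which have changed. The “checksum” values listed in hostdata$files above are 32 hexadecimal characters long, which is the form of an MD5 hash. The following sketch shows how such a comparison could be made in base R. It illustrates the general idea only, and is not taken from the internals of the deposits package.

```r
# Write a small file to the temporary directory of the current session
f <- file.path (tempdir (), "checksum-demo.csv")
write.csv (datasets::Orange, f, row.names = FALSE)

# MD5 checksum of the local file (32 hexadecimal characters)
local_md5 <- unname (tools::md5sum (f))

# Hypothetical stand-in for a value from cli$hostdata$files$checksum,
# here simply set to the checksum of the unmodified file
remote_md5 <- local_md5

# Identical checksums: nothing would need to be uploaded again
unchanged <- identical (local_md5, remote_md5)

# After a local edit, the checksum no longer matches the remote value
cat ("extra line\n", file = f, append = TRUE)
changed <- !identical (unname (tools::md5sum (f)), remote_md5)

c (unchanged = unchanged, changed = changed)
```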
Once all metadata and data have been satisfactorily edited, updated, and uploaded, a deposit can be made publicly visible and permanently associated with a Digital Object Identifier (DOI) by publishing it. Prior to publishing, it is often desired to apply an “embargo” to the deposit, in the form of a date after which the deposit will become publicly visible. The two steps to publication are thus generally:
cli$deposit_embargo (embargo_date = "2030-03-30")
cli$deposit_publish ()
Calling the deposit_publish() method is irreversible, and can never be undone. (Publication is permanent even in the Zenodo sandbox environment.) The published deposit will be permanently associated with the account of the user who published it, as identified by the API token used to initiate the deposits client. Publication will also change many items of the client’s “hostdata”, notably involving a change of status or visibility from “private” to “public”. Once a deposit has been published, the associated DOI, or equivalently the URL given in the deposits client, may be shared as a permanent link to the deposit.