Citing GBIF Mediated Data
John Waller
2021-12-20
Source:vignettes/gbif_citations.Rmd
gbif_citations.Rmd
Data accessed through the GBIF network is free for all, but not free of obligations.
Under the terms of the GBIF data user agreement, users who download data agree to cite a DOI. Good citation also rewards data-publishing institutions and individuals by reinforcing the value of sharing open data and demonstrating its impact to their funders.
Please do read GBIF’s citation guidelines.
The Best Way to Cite
The best way to get
data from GBIF is with rgbif::occ_download()
.
occ_download(pred("taxonKey",7412043))
The newest version of rgbif will give you the DOI you need to make a good citation.
<<gbif download>>
Your download is being processed by GBIF:
https://www.gbif.org/occurrence/download/0056004-210914110416597
Most downloads finish within 15 min.
Check status with
occ_download_wait('0056004-210914110416597')
After it finishes, use
d <- occ_download_get('0056004-210914110416597') %>%
occ_download_import()
to retrieve your download.
Download Info:
Username: jwaller
E-mail: jwaller@gbif.org
Format: DWCA
Download key: 0056004-210914110416597
Created: 2021-11-17T09:17:21.828+00:00
Citation Info:
Please always cite the download DOI when using this data.
https://www.gbif.org/citation-guidelines
DOI: 10.15468/dl.9hqqbn
Citation:
GBIF Occurrence Download https://doi.org/10.15468/dl.9hqqbn Accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2021-11-17
For this download, you would use this DOI-citation:
GBIF Occurrence Download https://doi.org/10.15468/dl.9hqqbn Accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2021-11-17
You could also get this citation by using
rgbif::gbif_citation()
gbif_citation("0056004-210914110416597") # using the downloadkey
These would be the preferred and easiest ways to create a citation of GBIF mediated data. Below I will describe other special cases that you might want to consider.
Register a Derived Dataset
Derived datasets are a new citation feature on GBIF. Derived datasets are citable records of GBIF-mediated occurrence data. To register a derived dataset, you will need to create a simple text file with two columns:
- A GBIF datasetkey (uuid)
- A count of the number of occurrences from each dataset
This allows GBIF to give credit to each involved dataset. The file you register with GBIF should look like the table below.
datasetkey | n |
---|---|
4fa7b334-ce0d-4e88-aaae-2e0c138d049e | 213 |
906e6978-e292-4a8b-9c39-adf6bb0f3323 | 2 |
721a99a4-71f4-4466-b346-83c367889238 | 35 |
Remember that you should also upload your filtered GBIF dataset of occurrences to a public repository like Zenodo.
There are 3 main reasons to register a derived dataset:
- A GBIF download that has been filtered/reduced significantly (e.g. CoordinateCleaner).
- Data accessed through a cloud service.
- Occurrences obtained using
occ_search()
or similar.
Before using option 3, it is important to consider: could my
occ_search()
have been accomplished with an
occ_download()
? The answer is almost always
YES!.
Here is a simple example of using
rgbif::derived_dataset()
.
library(rgbif)
library(dplyr)
library(CoordinateCleaner)
gbif_download <- occ_download_get('0056004-210914110416597') %>%
occ_download_import()
gbif_download_cleaned <- gbif_download %>%
setNames(tolower(names(.))) %>%
filter(occurrencestatus == "PRESENT") %>%
filter(year >= 1900) %>%
cc_cen(buffer = 2000) %>% # remove country centroids within 2km
cc_inst(buffer = 2000) %>% # remove zoo and herbaria within 2km
cc_sea() # remove from ocean
readr::write_tsv(gbif_download_cleaned,"cleaned_data_for_zenodo.tsv")
At this point, you would have to stop and upload to public repository.
Once you are finished, you can run the following, with the source_url being the link to your publicly accessible modified data. You will need to setup your GBIF credentials for this to work.
# https://www.gbif.org/derived-dataset/about)
derived_data <- gbif_download_clean %>%
group_by(datasetkey) %>%
count()
derived_dataset_prep(
citation_data = derived_data,
title = "Test Derived Dataset",
description = "This data was filtered using CoordinateCleaner.",
source_url = "https://zenodo.org/record/4246090#.YPGS2OgzZPY"
)
# If output looks ok, run derived_dataset to register the dataset on GBIF
# derived_dataset(
# citation_data = data,
# title = "Test Derived Dataset",
# description = "This data was filtered using CoordinateCleaner.",
# source_url = "https://zenodo.org/record/4246090#.YPGS2OgzZPY"
# )
Check your derived-dataset user page to see if it worked.