About Unpaywall
Unpaywall, developed and maintained by the team of OurResearch, is a non-profit service that finds open access copies of scholarly literature by looking up a DOI (Digital Object Identifier). It not only returns open access full-text links, but also helpful metadata about the open access status of a publication such as licensing or provenance information.
Unpaywall uses different data sources to find open access full-texts including:
- Crossref: a DOI registration agency serving major scholarly publishers.
- Directory of Open Access Journals (DOAJ): a registry of open access journals
- Various OAI-PMH metadata sources. OAI-PMH is a protocol often used by open access journals and repositories such as arXiv and PubMed Central.
See Piwowar et al. (2018) for a comprehensive overview of Unpaywall.
Basic usage
There is one major function to talk with Unpaywall,
oadoi_fetch()
, taking a character vector of DOIs and your
email address as required arguments.
library(roadoi)
roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9",
"10.1103/physreve.88.012814"),
email = "najko.jahn@gmail.com")
## # A tibble: 2 × 21
## doi best_oa_location oa_locations oa_locations_embargoed data_standard is_oa
## <chr> <list> <list> <list> <int> <lgl>
## 1 10.1… <tibble> <tibble> <tibble [0 × 0]> 2 TRUE
## 2 10.1… <tibble [1 × 9]> <tibble> <tibble [0 × 0]> 2 TRUE
## # ℹ 15 more variables: is_paratext <lgl>, genre <chr>, oa_status <chr>,
## # has_repository_copy <lgl>, journal_is_oa <lgl>, journal_is_in_doaj <lgl>,
## # journal_issns <chr>, journal_issn_l <chr>, journal_name <chr>,
## # publisher <chr>, published_date <chr>, year <chr>, title <chr>,
## # updated_resource <chr>, authors <list>
What’s returned?
The client supports API version 2. According to the Unpaywall Data Format, the following variables with the following definitions are returned:
Column | Description |
---|---|
doi |
DOI (always in lowercase) |
best_oa_location |
list-column describing the best OA location. Algorithm prioritizes publisher hosted content (e.g. Hybrid or Gold) |
oa_locations |
list-column of all the OA locations. |
oa_locations_embargoed |
list-column of locations expected to be available in the future based on information like license metadata and journals’ delayed OA policies |
data_standard |
Indicates the data collection approaches used for this
resource. 1 mostly uses Crossref for hybrid detection.
2 uses more comprehensive hybrid detection methods. |
is_oa |
Is there an OA copy (logical)? |
is_paratext |
Is the item an ancillary part of a journal, like a table of contents? See here for more information https://support.unpaywall.org/support/solutions/articles/44001894783. |
genre |
Publication type |
oa_status |
Classifies OA resources by location and license terms as one of: gold, hybrid, bronze, green or closed. See here for more information https://support.unpaywall.org/support/solutions/articles/44001777288-what-do-the-types-of-oa-status-green-gold-hybrid-and-bronze-mean-. |
has_repository_copy |
Is a full-text available in a repository? |
journal_is_oa |
Is the article published in a fully OA journal? Uses the Directory of Open Access Journals (DOAJ) as source. |
journal_is_in_doaj |
Is the journal listed in the Directory of Open Access Journals (DOAJ). |
journal_issns |
ISSNs, i.e. unique code to identify journals. |
journal_issn_l |
Linking ISSN. |
journal_name |
Journal title |
publisher |
Publisher |
published_date |
Date published |
year |
Year published. |
title |
Publication title. |
updated_resource |
Time when the data for this resource was last updated. |
authors |
Lists authors (if available) |
The columns best_oa_location
and
oa_locations
are list-columns that contain useful metadata
about the OA sources found by Unpaywall These are
Column | Description |
---|---|
endpoint_id |
Unique repository identifier |
evidence |
How the OA location was found and is characterized by Unpaywall? |
host_type |
OA full-text provided by publisher or
repository . |
is_best |
Is this location the for its resource? |
license |
The license under which this copy is published |
oa_date |
When this document first became available at this location |
pmh_id |
OAI-PMH endpoint where we found this location |
repository_institution |
Hosting institution of the repository. |
updated |
Time when the data for this location was last updated |
url |
The URL where you can find this OA copy. |
url_for_landing_page |
The URL for a landing page describing this OA copy. |
url_for_pdf |
The URL with a PDF version of this OA copy. |
versions |
The content version accessible at this location following the DRIVER 2.0 Guidelines (https://wiki.surfnet.nl/display/DRIVERguidelines/DRIVER-VERSION+Mappings) |
The Unpaywall schema is also described here: https://unpaywall.org/data-format.
The columns best_oa_location
. oa_locations
and oa_locations_embargoed
are list-columns that contain
useful metadata about the OA sources found by Unpaywall.
If .flatten = TRUE
the list-column
oa_locations
will be restructured in a long format where
each OA fulltext is represented by one row, which allows to take into
account all OA locations found by Unpaywall in a data analysis.
library(dplyr)
roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9",
"10.1103/physreve.88.012814",
"10.1093/reseval/rvaa038",
"10.1101/2020.05.22.111294",
"10.1093/bioinformatics/btw541"),
email = "najko.jahn@gmail.com", .flatten = TRUE) %>%
dplyr::count(is_oa, evidence, is_best)
## # A tibble: 8 × 4
## is_oa evidence is_best n
## <lgl> <chr> <lgl> <int>
## 1 TRUE oa journal (via doaj) TRUE 2
## 2 TRUE oa repository (semantic scholar lookup) FALSE 1
## 3 TRUE oa repository (via OAI-PMH doi match) FALSE 5
## 4 TRUE oa repository (via OAI-PMH title and first author match) TRUE 1
## 5 TRUE oa repository (via crossref license) FALSE 1
## 6 TRUE oa repository (via pmcid lookup) FALSE 2
## 7 TRUE open (via crossref license) TRUE 1
## 8 TRUE open (via crossref license, author manuscript) TRUE 1
Any API restrictions?
There are no API restrictions. However, Unpaywall requires an email
address when using its API. If you are too tired to type in your email
address every time, you can store the email in the
.Renviron
file with the option
roadoi_email
roadoi_email = "najko.jahn@gmail.com"
You can open your .Renviron
file calling
Save the file and restart your R session. To stop sharing the email
when using roadoi, delete it from your .Renviron
file.
Keeping track of crawling
To follow your API call, and to estimate the time until completion,
use the .progress
parameter inherited from
plyr
to display a progress bar.
roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9",
"10.1103/physreve.88.012814"),
email = "najko.jahn@gmail.com",
.progress = "text")
##
|
| | 0%
|
|=================================== | 50%
|
|======================================================================| 100%
## # A tibble: 2 × 21
## doi best_oa_location oa_locations oa_locations_embargoed data_standard is_oa
## <chr> <list> <list> <list> <int> <lgl>
## 1 10.1… <tibble> <tibble> <tibble [0 × 0]> 2 TRUE
## 2 10.1… <tibble [1 × 9]> <tibble> <tibble [0 × 0]> 2 TRUE
## # ℹ 15 more variables: is_paratext <lgl>, genre <chr>, oa_status <chr>,
## # has_repository_copy <lgl>, journal_is_oa <lgl>, journal_is_in_doaj <lgl>,
## # journal_issns <chr>, journal_issn_l <chr>, journal_name <chr>,
## # publisher <chr>, published_date <chr>, year <chr>, title <chr>,
## # updated_resource <chr>, authors <list>
References
Piwowar, H., Priem, J., Larivière, V., Alperin, J. P., Matthias, L., Norlander, B., … Haustein, S. (2018). The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles. PeerJ, 6, e4375. https://doi.org/10.7717/peerj.4375