A Brief Introduction to openalexR
https://github.com/ropensci/openalexR
Latest version: 1.4.0, 2024-10-24
by Massimo Aria
Full Professor in Social Statistics
PhD in Computational Statistics
Laboratory and Research Group STAD Statistics, Technology, Data Analysis
Department of Economics and Statistics
University of Naples Federico II
email aria@unina.it
An R-package to gather bibliographic data from OpenAlex
openalexR helps you interface with the OpenAlex API to retrieve bibliographic information about publications, authors, institutions, sources, funders, publishers, topics and concepts with five main functions:
oa_query(): generates a valid query, written following the OpenAlex API syntax, from a set of arguments provided by the user.
oa_request(): downloads a collection of entities matching the query created by oa_query() or manually written by the user, and returns a JSON object in a list format.
oa2df(): converts the JSON object into a classical bibliographic tibble/data frame.
oa_fetch(): composes the three functions above so the user can execute everything in one step, i.e., oa_query |> oa_request |> oa2df.
oa_random(): gets a random entity, e.g., oa_random("works") gives a different work each time you run it.
Works (think papers, publications)
This paper:
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959-975.
is associated with the OpenAlex ID W2755950973. If you know your paper’s OpenAlex ID, all you need to do is pass identifier = <openalex id> as an argument to oa_fetch():
library(openalexR)
library(dplyr) # attaches the %>% pipe used in the examples below
paper_id <- oa_fetch(
identifier = "W2755950973",
entity = "works",
verbose = TRUE
)
## Requesting url: https://api.openalex.org/works/W2755950973
dplyr::glimpse(paper_id)
## Rows: 1
## Columns: 39
## $ id <chr> "https://openalex.org/W2755950973"
## $ title <chr> "bibliometrix : An R-tool for comprehensiv…
## $ display_name <chr> "bibliometrix : An R-tool for comprehensiv…
## $ author <list> [<data.frame[2 x 12]>]
## $ ab <chr> "The use of bibliometrics is gradually ext…
## $ publication_date <date> 2017-09-12
## $ so <chr> "Journal of Informetrics"
## $ so_id <chr> "https://openalex.org/S205292342"
## $ host_organization <chr> "Elsevier BV"
## $ issn_l <chr> "1751-1577"
## $ url <chr> "https://doi.org/10.1016/j.joi.2017.08.007"
## $ pdf_url <lgl> NA
## $ license <lgl> NA
## $ version <lgl> NA
## $ first_page <chr> "959"
## $ last_page <chr> "975"
## $ volume <chr> "11"
## $ issue <chr> "4"
## $ is_oa <lgl> FALSE
## $ is_oa_anywhere <lgl> FALSE
## $ oa_status <chr> "closed"
## $ oa_url <lgl> NA
## $ any_repository_has_fulltext <lgl> FALSE
## $ language <chr> "en"
## $ grants <lgl> NA
## $ cited_by_count <int> 7406
## $ counts_by_year <list> [<data.frame[10 x 2]>]
## $ publication_year <int> 2017
## $ cited_by_api_url <chr> "https://api.openalex.org/works?filter=ci…
## $ ids <list> <"https://openalex.org/W2755950973", "http…
## $ doi <chr> "https://doi.org/10.1016/j.joi.2017.08.007"
## $ type <chr> "article"
## $ referenced_works <list> <"https://openalex.org/W1497199863", "htt…
## $ related_works <list> <"https://openalex.org/W45233828", "https:…
## $ is_paratext <lgl> FALSE
## $ is_retracted <lgl> FALSE
## $ concepts <list> [<data.frame[10 x 5]>]
## $ topics <list> [<tbl_df[12 x 5]>]
## $ apc <list> [<data.frame[2 x 5]>]
oa_fetch() is a composition of functions: oa_query |> oa_request |> oa2df. In this pipeline, oa_query() returns the query string, including the address of the OpenAlex API endpoint (by default, https://api.openalex.org); oa_request() downloads the bibliographic records matching the query; and oa2df() converts the resulting list to a tibble. The final result is a complex tibble, but we can use show_works() to display a simplified version:
paper_id %>%
show_works() %>%
knitr::kable()
id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
---|---|---|---|---|---|---|---|
W2755950973 | bibliometrix : An R-tool for comprehensive science mapping analysis | Massimo Aria | Corrado Cuccurullo | Journal of Informetrics | https://doi.org/10.1016/j.joi.2017.08.007 | FALSE | Workflow, Bibliometrics, Software |
External ID formats
The OpenAlex API accepts OpenAlex IDs as well as external IDs (e.g., DOI, ISSN) in several formats, including the doi: prefix form and full persistent identifier (PID) URLs such as https://doi.org/...:
oa_fetch(
# identifier = "https://doi.org/10.1016/j.joi.2017.08.007", # would also work (PIDs)
identifier = "doi:10.1016/j.joi.2017.08.007",
entity = "works"
) %>%
show_works() %>%
knitr::kable()
id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
---|---|---|---|---|---|---|---|
W2755950973 | bibliometrix : An R-tool for comprehensive science mapping analysis | Massimo Aria | Corrado Cuccurullo | Journal of Informetrics | https://doi.org/10.1016/j.joi.2017.08.007 | FALSE | Workflow, Bibliometrics, Software |
More than one publication/author
If you know the OpenAlex IDs of these entities, you can also feed them into the identifier argument.
oa_fetch(
identifier = c("W2741809807", "W2755950973"),
entity = "works",
verbose = TRUE
) %>%
show_works() %>%
knitr::kable()
## Requesting url: https://api.openalex.org/works?filter=openalex%3AW2741809807%7CW2755950973
## Getting 1 page of results with a total of 2 records...
id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
---|---|---|---|---|---|---|---|
W2755950973 | bibliometrix : An R-tool for comprehensive science mapping analysis | Massimo Aria | Corrado Cuccurullo | Journal of Informetrics | https://doi.org/10.1016/j.joi.2017.08.007 | FALSE | Workflow, Bibliometrics, Software |
W2741809807 | The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles | Heather Piwowar | Stefanie Haustein | PeerJ | https://doi.org/10.7717/peerj.4375 | TRUE | Citation, License, Bibliometrics |
However, if you only know their external identifiers, say, DOIs, you need to use doi as a filter (both the canonical form with https://doi.org/ and the bare form should work):
oa_fetch(
doi = c("10.1016/j.joi.2017.08.007", "https://doi.org/10.1093/bioinformatics/btab727"),
entity = "works",
verbose = TRUE
) %>%
show_works() %>%
knitr::kable()
## Requesting url: https://api.openalex.org/works?filter=doi%3A10.1016%2Fj.joi.2017.08.007%7Chttps%3A%2F%2Fdoi.org%2F10.1093%2Fbioinformatics%2Fbtab727
## Getting 1 page of results with a total of 2 records...
id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
---|---|---|---|---|---|---|---|
W2755950973 | bibliometrix : An R-tool for comprehensive science mapping analysis | Massimo Aria | Corrado Cuccurullo | Journal of Informetrics | https://doi.org/10.1016/j.joi.2017.08.007 | FALSE | Workflow, Bibliometrics, Software |
W3206431085 | PMLB v1.0: an open-source dataset collection for benchmarking machine learning methods | Joseph D. Romano | Jason H. Moore | Bioinformatics | https://doi.org/10.1093/bioinformatics/btab727 | TRUE | Python (programming language), Benchmarking, Benchmark (surveying) |
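Since both DOI forms are accepted, it can help to normalize a mixed list of DOIs before passing it to doi =, for example to drop duplicates. The following is a minimal base-R sketch. normalize_doi() is a hypothetical helper, not part of openalexR, and it assumes the only prefixes you need to strip are doi: and the https://doi.org/ (or dx.doi.org) URL forms.

```r
# Hypothetical helper (not part of openalexR): reduce DOIs to their bare form,
# e.g. "10.1016/j.joi.2017.08.007"
normalize_doi <- function(x) {
  # Strip a leading "doi:" or "http(s)://(dx.)doi.org/" prefix
  bare <- sub("^(doi:|https?://(dx\\.)?doi\\.org/)", "", trimws(x),
              ignore.case = TRUE)
  # DOIs are case-insensitive, so compare them in lower case
  tolower(bare)
}

dois <- c(
  "doi:10.1016/j.joi.2017.08.007",
  "https://doi.org/10.1016/j.joi.2017.08.007",
  "https://doi.org/10.1093/bioinformatics/btab727"
)

# Drop duplicates, then rebuild the canonical URL form
unique_dois <- unique(normalize_doi(dois))
canonical_dois <- paste0("https://doi.org/", unique_dois)
canonical_dois
```

The resulting canonical_dois vector could then be passed as doi = canonical_dois in oa_fetch().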
Filters
In most cases, we are interested in downloading a collection of items that meet one or more inclusion/exclusion criteria (filters). The filters supported by each entity are listed in the OpenAlex API documentation.
Example: We want to download all works published by a set of authors. We can do this by filtering on the authorships.author.id (alias author.id) or authorships.author.orcid (alias author.orcid) attribute; the OpenAlex documentation on work attributes has more details:
oa_fetch(
entity = "works",
author.id = c("A5048491430", "A5023888391"),
verbose = TRUE
) %>%
show_works() %>%
knitr::kable()
## Requesting url: https://api.openalex.org/works?filter=author.id%3AA5048491430%7CA5023888391
## Getting 1 page of results with a total of 125 records...
## Warning in oa_request(oa_query(filter = filter_i, multiple_id = multiple_id, :
## The following work(s) have truncated lists of authors: W4230863633.
## Query each work separately by its identifier to get full list of authors.
## For example:
## lapply(c("W4230863633"), \(x) oa_fetch(identifier = x))
## Details at https://docs.openalex.org/api-entities/authors/limitations.
id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
---|---|---|---|---|---|---|---|
W2741809807 | The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles | Heather Piwowar | Stefanie Haustein | PeerJ | https://doi.org/10.7717/peerj.4375 | TRUE | Citation, License, Bibliometrics |
W2046766973 | Sharing Detailed Research Data Is Associated with Increased Citation Rate | Heather Piwowar | Douglas B. Fridsma | PLoS ONE | https://doi.org/10.1371/journal.pone.0000308 | TRUE | Citation, Clinical trial, Impact factor |
W2045657963 | Data reuse and the open data citation advantage | Heather Piwowar | Todd Vision | PeerJ | https://doi.org/10.7717/peerj.175 | TRUE | Citation, Reuse |
W1572136682 | Altmetrics: Value all research products | Heather Piwowar | NA | Nature | https://doi.org/10.1038/493159a | TRUE | Altmetrics, Value (mathematics) |
W2122130843 | Scientometrics 2.0: New metrics of scholarly impact on the social Web | Jason Priem | Bradely H. Hemminger | First Monday | https://doi.org/10.5210/fm.v15i7.2874 | FALSE | Bookmarking, Altmetrics, Social media |
W1553564559 | Altmetrics in the wild: Using social media to explore scholarly impact | Jason Priem | Bradley M. Hemminger | arXiv (Cornell University) | https://arxiv.org/abs/1203.4745 | TRUE | Altmetrics, Social media, Citation |
orcids <- c("0000-0003-3737-6565", "0000-0002-8517-9411")
canonical_orcids <- paste0("https://orcid.org/", orcids)
oa_fetch(
entity = "works",
author.orcid = canonical_orcids,
verbose = TRUE
) %>%
show_works() %>%
knitr::kable()
## Requesting url: https://api.openalex.org/works?filter=author.orcid%3Ahttps%3A%2F%2Forcid.org%2F0000-0003-3737-6565%7Chttps%3A%2F%2Forcid.org%2F0000-0002-8517-9411
## Getting 2 pages of results with a total of 320 records...
## Warning in oa_request(oa_query(filter = filter_i, multiple_id = multiple_id, :
## The following work(s) have truncated lists of authors: W3202287394, W3207775241.
## Query each work separately by its identifier to get full list of authors.
## For example:
## lapply(c("W3202287394", "W3207775241"), \(x) oa_fetch(identifier = x))
## Details at https://docs.openalex.org/api-entities/authors/limitations.
id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
---|---|---|---|---|---|---|---|
W2755950973 | bibliometrix : An R-tool for comprehensive science mapping analysis | Massimo Aria | Corrado Cuccurullo | Journal of Informetrics | https://doi.org/10.1016/j.joi.2017.08.007 | FALSE | Workflow, Bibliometrics, Software |
W2777772618 | Interoception and Mental Health: A Roadmap | Sahib S. Khalsa | Nancy Zucker | Biological Psychiatry Cognitive Neuroscience and Neuroimaging | https://doi.org/10.1016/j.bpsc.2017.12.004 | TRUE | Mental health, Allostasis, Anxiety |
W2955219525 | Scaling tree-based automated machine learning to biomedical big data with a feature set selector | Trang T. Le | Jason H. Moore | Bioinformatics | https://doi.org/10.1093/bioinformatics/btz470 | TRUE | Pipeline (software), Scalability, Feature (linguistics) |
W3005144120 | Mapping the Evolution of Social Research and Data Science on 30 Years of Social Indicators Research | Massimo Aria | Maria Spano | Social Indicators Research | https://doi.org/10.1007/s11205-020-02281-3 | FALSE | Human geography, Data collection, Position (finance) |
W2408216567 | Foundations and trends in performance management. A twenty-five years bibliometric analysis in business and public administration domains | Corrado Cuccurullo | Fabrizia Sarto | Scientometrics | https://doi.org/10.1007/s11192-016-1948-8 | FALSE | Domain (mathematical analysis), Content analysis, Public domain |
W2952824318 | A Nonlinear Simulation Framework Supports Adjusting for Age When Analyzing BrainAGE | Trang T. Le | Martin P. Paulus | Frontiers in Aging Neuroscience | https://doi.org/10.3389/fnagi.2018.00317 | TRUE | Correlation, Mood, Set (abstract data type) |
Example: We want to download all works that have been cited more than 50 times, were published between 2020 and 2021, and include the strings “bibliometric analysis” or “science mapping” in the title. We may also want the results sorted by total citations in descending order.
When count_only = TRUE, oa_request() (and therefore oa_fetch()) returns the number of items matching the query without downloading the collection.
oa_fetch(
entity = "works",
title.search = c("bibliometric analysis", "science mapping"),
cited_by_count = ">50",
from_publication_date = "2020-01-01",
to_publication_date = "2021-12-31",
options = list(sort = "cited_by_count:desc"),
count_only = TRUE,
verbose = TRUE
)
## Requesting url: https://api.openalex.org/works?filter=title.search%3Abibliometric%20analysis%7Cscience%20mapping%2Ccited_by_count%3A%3E50%2Cfrom_publication_date%3A2020-01-01%2Cto_publication_date%3A2021-12-31&sort=cited_by_count%3Adesc
## count db_response_time_ms page per_page
## [1,] 376 58 1 1
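The request URLs printed with verbose = TRUE reveal the filter syntax. Each filter is written as name:value. Alternative values of one filter are OR-ed with |, different filters are AND-ed with a comma, and the whole filter string is percent-encoded. The sketch below rebuilds such URLs in base R. build_oa_url() is a hypothetical helper written only to illustrate the syntax. It is not how oa_query() is implemented, and it ignores details such as paging, search and select parameters.

```r
# Hypothetical helper (not part of openalexR) that mimics the request URLs
# printed by oa_fetch(verbose = TRUE)
build_oa_url <- function(entity, filters, sort = NULL,
                         endpoint = "https://api.openalex.org") {
  # Within one filter, alternative values are OR-ed with "|"
  parts <- vapply(names(filters), function(name) {
    paste0(name, ":", paste(filters[[name]], collapse = "|"))
  }, character(1))
  # Different filters are AND-ed with ","; then percent-encode the result
  filter_str <- utils::URLencode(paste(parts, collapse = ","), reserved = TRUE)
  url <- paste0(endpoint, "/", entity, "?filter=", filter_str)
  if (!is.null(sort)) {
    url <- paste0(url, "&sort=", utils::URLencode(sort, reserved = TRUE))
  }
  url
}

build_oa_url(
  "works",
  list(
    title.search = c("bibliometric analysis", "science mapping"),
    cited_by_count = ">50",
    from_publication_date = "2020-01-01",
    to_publication_date = "2021-12-31"
  ),
  sort = "cited_by_count:desc"
)
```

The same helper reproduces the other logged URLs, e.g. build_oa_url("works", list(author.id = c("A5048491430", "A5023888391"))).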
We can now download the records and transform them into a tibble/data frame by setting count_only = FALSE (also the default value):
oa_fetch(
entity = "works",
title.search = c("bibliometric analysis", "science mapping"),
cited_by_count = ">50",
from_publication_date = "2020-01-01",
to_publication_date = "2021-12-31",
options = list(sort = "cited_by_count:desc"),
count_only = FALSE
) %>%
show_works() %>%
knitr::kable()
id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
---|---|---|---|---|---|---|---|
W3160856016 | How to conduct a bibliometric analysis: An overview and guidelines | Naveen Donthu | Weng Marc Lim | Journal of Business Research | https://doi.org/10.1016/j.jbusres.2021.04.070 | TRUE | Bibliometrics, Field (mathematics), Resource (disambiguation) |
W3001491100 | Software tools for conducting bibliometric analysis in science: An up-to-date review | José A. Moral-Muñoz | Manuel J. Cobo | El Profesional de la Informacion | https://doi.org/10.3145/epi.2020.ene.03 | TRUE | Bibliometrics, Visualization, Set (abstract data type) |
W3038273726 | Investigating the emerging COVID-19 research trends in the field of business and management: A bibliometric analysis approach | Surabhi Verma | Anders Gustafsson | Journal of Business Research | https://doi.org/10.1016/j.jbusres.2020.06.057 | TRUE | Bibliometrics, Field (mathematics), Empirical research |
W3044902155 | Financial literacy: A systematic review and bibliometric analysis | Kirti Goyal | Satish Kumar | International Journal of Consumer Studies | https://doi.org/10.1111/ijcs.12605 | FALSE | Financial literacy, Content analysis, Citation |
W3042215340 | A bibliometric analysis using VOSviewer of publications on COVID-19 | Yuetian Yu | Erzhen Chen | Annals of Translational Medicine | https://doi.org/10.21037/atm-20-4235 | TRUE | Citation, Bibliometrics, China |
W3198357836 | Artificial intelligence and machine learning in finance: Identifying foundations, themes, and research clusters from bibliometric analysis | John W. Goodell | Debidutta Pattnaik | Journal of Behavioral and Experimental Finance | https://doi.org/10.1016/j.jbef.2021.100577 | FALSE | Scholarship, Valuation (finance), Corporate finance |
Read on to see how we can shorten these two function calls.
Authors
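As with works, we can use identifier to pass in authors’ OpenAlex IDs. The entity argument can be omitted here: the request URL below shows that oa_fetch() recognizes these IDs as authors.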
Example: We want more information on authors with IDs A5069892096 and A5023888391.
oa_fetch(
identifier = c("A5069892096", "A5023888391"),
verbose = TRUE
) %>%
show_authors() %>%
knitr::kable()
## Requesting url: https://api.openalex.org/authors?filter=openalex%3AA5069892096%7CA5023888391
## Getting 1 page of results with a total of 2 records...
id | display_name | orcid | works_count | cited_by_count | affiliation_display_name | top_concepts |
---|---|---|---|---|---|---|
A5069892096 | Massimo Aria | 0000-0002-8517-9411 | 197 | 10955 | University of Naples Federico II | Physiology, Pathology and Forensic Medicine, Periodontics |
A5023888391 | Jason Priem | 0000-0001-6187-6610 | 62 | 3693 | OurResearch | Statistics, Probability and Uncertainty, Information Systems, Communication |
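The example below sketches how an entity type could be inferred from an OpenAlex ID. It assumes the one-letter prefixes OpenAlex uses: W works, A authors, S sources, I institutions, C concepts, P publishers, F funders, T topics. Several of these prefixes appear in the outputs in this article. guess_entity() is a hypothetical helper, not openalexR's internal code.

```r
# Hypothetical helper (not openalexR's own code): guess the OpenAlex entity
# from the first letter of an OpenAlex ID such as "W2755950973"
guess_entity <- function(id) {
  prefixes <- c(W = "works", A = "authors", S = "sources", I = "institutions",
                C = "concepts", P = "publishers", F = "funders", T = "topics")
  # Accept both short IDs and full URLs like "https://openalex.org/W2755950973"
  short <- sub("^https://openalex\\.org/", "", id)
  # Unknown prefixes map to NA
  unname(prefixes[toupper(substr(short, 1, 1))])
}

guess_entity(c("W2755950973", "A5069892096", "https://openalex.org/I71267560"))
```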
Example: We want to download the records of all authors whose last known institution is the University of Naples Federico II (OpenAlex ID: I71267560) and who have published more than 499 works.
Let’s first check how many records match the query, then set count_only = FALSE to download the entire collection. We can do this by first defining a list of arguments, then adding count_only (default FALSE) to this list:
my_arguments <- list(
entity = "authors",
last_known_institutions.id = "I71267560",
works_count = ">499"
)
do.call(oa_fetch, c(my_arguments, list(count_only = TRUE)))
## count db_response_time_ms page per_page
## [1,] 46 164 1 1
do.call(oa_fetch, my_arguments) %>%
show_authors() %>%
knitr::kable()
## Warning: Unknown or uninitialised column: `name`.
## Warning: Unknown or uninitialised column: `display_name`.
## Warning: Unknown or uninitialised column: `name`.
## Warning: Unknown or uninitialised column: `display_name`.
## Warning: Unknown or uninitialised column: `name`.
## Warning: Unknown or uninitialised column: `display_name`.
## Warning: Unknown or uninitialised column: `name`.
## Warning: Unknown or uninitialised column: `display_name`.
id | display_name | orcid | works_count | cited_by_count | affiliation_display_name | top_concepts |
---|---|---|---|---|---|---|
A5091797706 | L. Lista | 0000-0001-6471-5492 | 3438 | 155375 | Istituto Nazionale di Fisica Nucleare, Sezione di Napoli | Nuclear and High Energy Physics, Nuclear and High Energy Physics, Nuclear and High Energy Physics |
A5106552509 | C. Sciacca | 0000-0002-8412-4072 | 2710 | 94358 | University of Naples Federico II | Nuclear and High Energy Physics, Nuclear and High Energy Physics, Nuclear and High Energy Physics |
A5106315809 | M. Merola | 0000-0002-7082-8108 | 1326 | 70620 | Istituto Nazionale di Fisica Nucleare, Sezione di Napoli | Nuclear and High Energy Physics, Nuclear and High Energy Physics, Nuclear and High Energy Physics |
A5003544129 | Annamaria Colao | 0000-0001-6986-266X | 1310 | 44049 | University of Naples Federico II | Endocrinology, Diabetes and Metabolism, Endocrinology, Diabetes and Metabolism, Surgery |
A5037805233 | Micaela Morelli | 0000-0003-0394-5782 | 1164 | 12062 | University of Naples Federico II | Cellular and Molecular Neuroscience, Cellular and Molecular Neuroscience, Neurology |
A5076706548 | Salvatore Capozziello | 0000-0003-4886-2024 | 1024 | 34376 | University of Naples Federico II | Astronomy and Astrophysics, Nuclear and High Energy Physics, Astronomy and Astrophysics |
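The do.call() pattern above works because c() concatenates two lists into one argument list, so count_only = TRUE is simply appended to the shared filters. The base-R illustration below uses a hypothetical stand-in function, fake_fetch(), which only reports the arguments it receives and does not query OpenAlex.

```r
# Hypothetical stand-in for oa_fetch(): echoes its arguments, no API call
fake_fetch <- function(entity, ..., count_only = FALSE) {
  list(entity = entity, filters = list(...), count_only = count_only)
}

my_arguments <- list(
  entity = "authors",
  last_known_institutions.id = "I71267560",
  works_count = ">499"
)

# Same filters, two calls: one that only counts, one that downloads
counting_call <- do.call(fake_fetch, c(my_arguments, list(count_only = TRUE)))
download_call <- do.call(fake_fetch, my_arguments)

str(counting_call)
```

Keeping the filters in one list means the count and the download are guaranteed to use identical criteria.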
You can also use other filters such as display_name.search, has_orcid, and orcid:
oa_fetch(
entity = "authors",
display_name.search = "Massimo Aria",
has_orcid = "true"
) %>%
show_authors() %>%
knitr::kable()
id | display_name | orcid | works_count | cited_by_count | affiliation_display_name | top_concepts |
---|---|---|---|---|---|---|
A5069892096 | Massimo Aria | 0000-0002-8517-9411 | 197 | 10955 | University of Naples Federico II | Physiology, Pathology and Forensic Medicine, Periodontics |
oa_fetch(
entity = "authors",
orcid = "0000-0002-8517-9411"
) %>%
show_authors() %>%
knitr::kable()
id | display_name | orcid | works_count | cited_by_count | affiliation_display_name | top_concepts |
---|---|---|---|---|---|---|
A5069892096 | Massimo Aria | 0000-0002-8517-9411 | 197 | 10955 | University of Naples Federico II | Physiology, Pathology and Forensic Medicine, Periodontics |
Institutions
Example: We want to download all records of Italian institutions (country_code:it) that are classified as educational (type:education). Again, we first check how many records match the query, then download the collection:
italian_insts <- list(
entity = "institutions",
country_code = "it",
type = "education",
verbose = TRUE
)
do.call(oa_fetch, c(italian_insts, list(count_only = TRUE)))
## Requesting url: https://api.openalex.org/institutions?filter=country_code%3Ait%2Ctype%3Aeducation
## count db_response_time_ms page per_page
## [1,] 232 41 1 1
do.call(oa_fetch, italian_insts) %>%
dplyr::glimpse()
## Requesting url: https://api.openalex.org/institutions?filter=country_code%3Ait%2Ctype%3Aeducation
## Getting 2 pages of results with a total of 232 records...
## Rows: 232
## Columns: 21
## $ id <chr> "https://openalex.org/I861853513", "https:/…
## $ display_name <chr> "Sapienza University of Rome", "University …
## $ display_name_alternatives <list> <"Université La Sapienza de Rome", "Rimska…
## $ display_name_acronyms <list> NA, "UNIMI", "UNIBO", "UNIPD", NA, NA, "UN…
## $ display_name_international <list> <"Universiteit van Rome", "جامعة روما سابي…
## $ ror <chr> "https://ror.org/02be6w209", "https://ror.o…
## $ ids <list> <"https://openalex.org/I861853513", "https…
## $ country_code <chr> "IT", "IT", "IT", "IT", "IT", "IT", "IT", "…
## $ geo <list> [<data.frame[1 x 7]>], [<data.frame[1 x 7]…
## $ type <chr> "education", "education", "education", "edu…
## $ homepage_url <chr> "https://www.uniroma1.it", "https://www.uni…
## $ image_url <chr> "https://commons.wikimedia.org/w/index.php?…
## $ image_thumbnail_url <chr> "https://commons.wikimedia.org/w/index.php?…
## $ associated_institutions <list> [<data.frame[4 x 6]>], [<data.frame[2 x 6]…
## $ works_count <int> 209637, 185366, 176817, 173604, 122598, 117…
## $ cited_by_count <int> 5134159, 5470057, 4673112, 5019135, 3162965…
## $ counts_by_year <list> [<data.frame[13 x 3]>], [<data.frame[13 x …
## $ works_api_url <chr> "https://api.openalex.org/works?filter=inst…
## $ topics <list> [<tbl_df[100 x 5]>], [<tbl_df[100 x 5]>], …
## $ updated_date <chr> "2024-10-23T18:34:17.183337", "2024-10-24T1…
## $ created_date <chr> "2016-06-24", "2016-06-24", "2016-06-24", "…
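The log above reports 232 records retrieved in 2 pages. The sketch below assumes that openalexR requests the OpenAlex maximum of 200 records per page. This is an assumption, but it is consistent with every page count logged in this article. Under it, the number of requests grows as ceiling(records / 200), which is worth checking with count_only = TRUE before downloading a large collection.

```r
# Assumption: results are fetched in pages of up to 200 records (the OpenAlex
# maximum for per_page); the page counts logged in this article fit this value
pages_needed <- function(n_records, per_page = 200) {
  ceiling(n_records / per_page)
}

# Record totals taken from the logs in this article
totals <- c(works_by_author_id = 125, works_by_orcid = 320,
            italian_institutions = 232, popular_concepts = 273)
pages_needed(totals)
```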
Concepts (think theme, keywords)
Example: We want to download the records of all concepts associated with more than one million works:
popular_concepts <- list(
entity = "concepts",
works_count = ">1000000",
verbose = TRUE
)
do.call(oa_fetch, c(popular_concepts, list(count_only = TRUE)))
## Requesting url: https://api.openalex.org/concepts?filter=works_count%3A%3E1000000
## count db_response_time_ms page per_page
## [1,] 273 18 1 1
do.call(oa_fetch, popular_concepts) %>%
dplyr::glimpse()
## Requesting url: https://api.openalex.org/concepts?filter=works_count%3A%3E1000000
## Getting 2 pages of results with a total of 273 records...
## Rows: 273
## Columns: 16
## $ id <chr> "https://openalex.org/C41008148", "https://…
## $ display_name <chr> "Computer science", "Medicine", "Biology", …
## $ display_name_international <list> <"informatika", "የኮምፒውተር፡ጥናት", "Informatic…
## $ description <chr> "study of computation", "field of study for…
## $ description_international <list> <"studie van berekening en inligtingverwer…
## $ wikidata <chr> "https://www.wikidata.org/wiki/Q21198", "ht…
## $ level <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1…
## $ ids <list> <"https://openalex.org/C41008148", "https:…
## $ image_url <chr> "https://upload.wikimedia.org/wikipedia/com…
## $ image_thumbnail_url <chr> "https://upload.wikimedia.org/wikipedia/com…
## $ ancestors <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, [<…
## $ related_concepts <list> [<data.frame[93 x 5]>], [<data.frame[51 x …
## $ works_count <int> 91640517, 63165832, 46945702, 43366261, 376…
## $ cited_by_count <int> 677663491, 814030740, 884155767, 570502558,…
## $ counts_by_year <list> [<data.frame[13 x 3]>], [<data.frame[13 x …
## $ works_api_url <chr> "https://api.openalex.org/works?filter=conc…
Other examples
Get all works citing a particular work
We can download all publications citing another publication by using the cites filter. For example, to download all publications citing the article by Aria and Cuccurullo (2017), we just need to set cites = "W2755950973", where “W2755950973” is the OpenAlex ID of that article.
aria_count <- oa_fetch(
entity = "works",
cites = "W2755950973",
count_only = TRUE,
verbose = TRUE
)
## Requesting url: https://api.openalex.org/works?filter=cites%3AW2755950973
aria_count
## count db_response_time_ms page per_page
## [1,] 7509 109 1 1
This query returns a collection of 7,509 publications. Among these articles, let’s download the ones published in 2018, the year after the cited article appeared:
oa_fetch(
entity = "works",
cites = "W2755950973",
publication_year = 2018,
count_only = FALSE,
verbose = TRUE
) %>%
dplyr::glimpse()
## Requesting url: https://api.openalex.org/works?filter=cites%3AW2755950973%2Cpublication_year%3A2018
## Getting 1 page of results with a total of 31 records...
## Rows: 31
## Columns: 39
## $ id <chr> "https://openalex.org/W2896801517", "https…
## $ title <chr> "Global trends in infectious diseases of s…
## $ display_name <chr> "Global trends in infectious diseases of s…
## $ author <list> [<data.frame[2 x 12]>], [<data.frame[2 x …
## $ ab <chr> "Pork accounts for more than one-third of …
## $ publication_date <date> 2018-10-22, 2018-11-26, 2018-12-20, 2018-…
## $ so <chr> "Proceedings of the National Academy of Sc…
## $ so_id <chr> "https://openalex.org/S125754415", "https:…
## $ host_organization <chr> "National Academy of Sciences", "Wiley", "…
## $ issn_l <chr> "0027-8424", "0043-1397", "0169-5347", "00…
## $ url <chr> "https://doi.org/10.1073/pnas.1806068115",…
## $ pdf_url <chr> "https://www.pnas.org/content/pnas/115/45/…
## $ license <chr> NA, NA, NA, NA, "cc-by", NA, NA, "publishe…
## $ version <chr> "publishedVersion", "publishedVersion", NA…
## $ first_page <chr> "11495", "378", "224", "12", "e0207655", "…
## $ last_page <chr> "11500", "390", "238", "63", "e0207655", "…
## $ volume <chr> "115", "55", "34", "50", "13", "205", "45"…
## $ issue <chr> "45", "1", "3", "1", "11", NA, "3", "4-5",…
## $ is_oa <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALS…
## $ is_oa_anywhere <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,…
## $ oa_status <chr> "bronze", "bronze", "green", "bronze", "go…
## $ oa_url <chr> "https://www.pnas.org/content/pnas/115/45/…
## $ any_repository_has_fulltext <lgl> TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALS…
## $ language <chr> "en", "en", "en", "en", "en", "en", "en", …
## $ grants <list> <"https://openalex.org/F4320332299", "Nat…
## $ cited_by_count <int> 218, 199, 172, 150, 124, 114, 111, 84, 83,…
## $ counts_by_year <list> [<data.frame[7 x 2]>], [<data.frame[6 x 2…
## $ publication_year <int> 2018, 2018, 2018, 2018, 2018, 2018, 2018, …
## $ cited_by_api_url <chr> "https://api.openalex.org/works?filter=cit…
## $ ids <list> <"https://openalex.org/W2896801517", "htt…
## $ doi <chr> "https://doi.org/10.1073/pnas.1806068115",…
## $ type <chr> "review", "article", "article", "article",…
## $ referenced_works <list> <"https://openalex.org/W1530619192", "htt…
## $ related_works <list> <"https://openalex.org/W4384639906", "htt…
## $ is_paratext <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ is_retracted <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ concepts <list> [<data.frame[25 x 5]>], [<data.frame[22 x…
## $ topics <list> [<tbl_df[12 x 5]>], [<tbl_df[12 x 5]>], […
## $ apc <list> NA, [<data.frame[2 x 5]>], [<data.frame[2…
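The number of citing works quoted earlier can be taken from aria_count rather than typed by hand. With count_only = TRUE, the printed result appears to be a one-row matrix with columns named count, db_response_time_ms, page and per_page, so the count can be extracted by column name. The sketch below rebuilds a matrix with the same layout and values as the printed aria_count. In a live session you would index aria_count directly, e.g. aria_count[1, "count"].

```r
# A matrix with the same layout and values as the aria_count output above
aria_count_copy <- matrix(
  c(7509, 109, 1, 1), nrow = 1,
  dimnames = list(NULL, c("count", "db_response_time_ms", "page", "per_page"))
)

# Extract the number of matching records by column name
n_citing <- aria_count_copy[1, "count"]
sprintf("%d works cite Aria and Cuccurullo (2017)", as.integer(n_citing))
```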
Convert an OpenAlex data frame to a bibliometrix object
The bibliometrix R package (https://www.bibliometrix.org) provides a set of tools for quantitative research in bibliometrics and scientometrics. It is currently one of the most widely used science mapping tools in the world. In a recent survey on bibliometric analysis tools, Moral-Muñoz et al. (2020) wrote: “At this moment, maybe Bibliometrix and its Shiny platform contain the more extensive set of techniques implemented, and together with the easiness of its interface, could be a great software for practitioners”.
The function oa2bibliometrix() converts a bibliographic data frame of works into a bibliometrix object, which can serve as the input collection for a science mapping workflow.
bib_ls <- list(
identifier = NULL,
entity = "works",
cites = "W2755950973",
from_publication_date = "2022-01-01",
to_publication_date = "2022-03-31"
)
do.call(oa_fetch, c(bib_ls, list(count_only = TRUE)))
## count db_response_time_ms page per_page
## [1,] 402 33 1 1
do.call(oa_fetch, bib_ls) %>%
oa2bibliometrix() %>%
dplyr::glimpse()
## Rows: 402
## Columns: 53
## $ AU <chr> "YIXIA CHEN;MING‐WEI LIN;DAN ZHUANG", "WEN…
## $ RP <chr> "COLLEGE OF COMPUTER AND CYBER SECURITY, F…
## $ C1 <chr> "COLLEGE OF COMPUTER AND CYBER SECURITY, F…
## $ AU_UN <chr> "FUJIAN NORMAL UNIVERSITY;FUJIAN NORMAL UN…
## $ AU_CO <chr> "CHINA;CHINA;CHINA", "MALAYSIA;INDIA;INDIA…
## $ ID <chr> "WASTEWATER;ENVIRONMENTAL SCIENCE;CONTAMIN…
## $ id_url <chr> "https://openalex.org/W4210864411", "https…
## $ title <chr> "Wastewater treatment and emerging contami…
## $ author <list> [<data.frame[3 x 12]>], [<data.frame[4 x …
## $ publication_date <date> 2022-02-08, 2022-03-08, 2022-02-09, 2022-…
## $ so_id <chr> "https://openalex.org/S203465130", "https:…
## $ host_organization <chr> "Elsevier BV", "Wiley", "Taylor & Francis"…
## $ issn_l <chr> "0045-6535", "0742-6046", "0020-7543", "10…
## $ url <chr> "https://doi.org/10.1016/j.chemosphere.202…
## $ pdf_url <chr> NA, NA, NA, "https://link.springer.com/con…
## $ license <chr> NA, NA, NA, NA, "cc-by", NA, "cc-by-nc-nd"…
## $ version <chr> NA, NA, NA, "publishedVersion", "published…
## $ first_page <chr> "133932", "1129", "7527", "297", "104608",…
## $ last_page <chr> "133932", "1155", "7550", "338", "104608",…
## $ volume <chr> "297", "39", "60", "32", "136", "30", "159…
## $ issue <chr> NA, "6", "24", "1", NA, "2", NA, NA, "6", …
## $ is_oa <lgl> FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TR…
## $ is_oa_anywhere <lgl> FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TR…
## $ oa_status <chr> "closed", "closed", "closed", "bronze", "h…
## $ oa_url <chr> NA, NA, NA, "https://link.springer.com/con…
## $ any_repository_has_fulltext <lgl> FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FA…
## $ language <chr> "en", "en", "en", "en", "en", "en", "en", …
## $ grants <list> <"https://openalex.org/F4320321001", "Nat…
## $ counts_by_year <list> [<data.frame[3 x 2]>], [<data.frame[3 x 2…
## $ cited_by_api_url <chr> "https://api.openalex.org/works?filter=cit…
## $ ids <list> <"https://openalex.org/W4210864411", "htt…
## $ doi <chr> "https://doi.org/10.1016/j.chemosphere.202…
## $ referenced_works <list> <"https://openalex.org/W1854025783", "htt…
## $ related_works <list> <"https://openalex.org/W4388943160", "htt…
## $ is_paratext <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ is_retracted <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ concepts <list> [<data.frame[16 x 5]>], [<data.frame[24 x…
## $ topics <list> [<tbl_df[8 x 5]>], [<tbl_df[12 x 5]>], [<…
## $ apc <list> [<data.frame[2 x 5]>], [<data.frame[2 x 5…
## $ id_oa <chr> "W4210864411", "W4220991995", "W4210997151…
## $ CR <chr> "W1854025783;W1896090423;W1965064785;W1990…
## $ TI <chr> "WASTEWATER TREATMENT AND EMERGING CONTAMI…
## $ AB <chr> "IN RECENT YEARS, EMERGING CONTAMINANTS HA…
## $ SO <chr> "CHEMOSPHERE", "PSYCHOLOGY AND MARKETING",…
## $ DT <chr> "REVIEW", "ARTICLE", "ARTICLE", "ARTICLE",…
## $ DB <chr> "OPENALEX", "OPENALEX", "OPENALEX", "OPENA…
## $ JI <chr> "S203465130", "S102896891", "S65690446", "…
## $ J9 <chr> "S203465130", "S102896891", "S65690446", "…
## $ PY <int> 2022, 2022, 2022, 2022, 2022, 2022, 2022, …
## $ TC <int> 172, 140, 118, 116, 101, 98, 90, 90, 87, 8…
## $ DI <chr> "10.1016/j.chemosphere.2022.133932", "10.1…
## $ SR_FULL <chr> "YIXIA CHEN, 2022, CHEMOSPHERE", "WENG MAR…
## $ SR <chr> "YIXIA CHEN, 2022, CHEMOSPHERE", "WENG MAR…
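Comparing the columns in the output above suggests how some bibliometrix fields relate to the OpenAlex ones:

- id_oa is the work ID without the https://openalex.org/ prefix.
- CR joins the referenced work IDs, stripped of the same prefix, with semicolons.
- DI is the DOI without its https://doi.org/ prefix.

The sketch below reproduces that mapping for values taken from the first record. The full DOI URL is assumed to be the DI value prefixed with https://doi.org/. This is an inference from the printed output, not oa2bibliometrix()'s actual implementation.

```r
# Illustration inferred from the glimpse() output above, not the code of
# oa2bibliometrix(): derive a few bibliometrix-style fields from OpenAlex values
referenced_works <- c(
  "https://openalex.org/W1854025783",
  "https://openalex.org/W1896090423",
  "https://openalex.org/W1965064785"
)
id_url <- "https://openalex.org/W4210864411"
doi <- "https://doi.org/10.1016/j.chemosphere.2022.133932" # assumed full form

# Remove a literal prefix from each string when it is present
strip_prefix <- function(x, prefix) {
  ifelse(startsWith(x, prefix), substring(x, nchar(prefix) + 1), x)
}

id_oa <- strip_prefix(id_url, "https://openalex.org/")
CR <- paste(strip_prefix(referenced_works, "https://openalex.org/"), collapse = ";")
DI <- strip_prefix(doi, "https://doi.org/")
c(id_oa = id_oa, CR = CR, DI = DI)
```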
About OpenAlex
OpenAlex is a fully open catalog of the global research system. It’s named after the ancient Library of Alexandria. The OpenAlex dataset describes scholarly entities and how those entities are connected to each other. Its entity types include:
Works are papers, books, datasets, etc.; they cite other works
Authors are people who create works
Institutions are universities and other organizations that are affiliated with works (via authors)
Concepts tag Works with a topic
Acknowledgements
Package hex was made with Midjourney and thus inherits a CC BY-NC 4.0 license.