A Brief Introduction to openalexR

https://github.com/ropensci/openalexR

Latest version: 2.0.1, 2025-07-23

by Massimo Aria

Full Professor in Social Statistics

PhD in Computational Statistics

Laboratory and Research Group STAD Statistics, Technology, Data Analysis

Department of Economics and Statistics

University of Naples Federico II

email aria@unina.it

https://massimoaria.com

An R-package to gather bibliographic data from OpenAlex

openalexR helps you interface with the OpenAlex API to retrieve bibliographic information about publications, authors, institutions, sources, funders, publishers, topics and concepts with 5 main functions:

oa_query(): generates a valid query, written following the OpenAlex API syntax, from a set of arguments provided by the user.
oa_request(): downloads a collection of entities matching the query created by oa_query() or manually written by the user, and returns a JSON object in a list format.
oa2df(): converts the JSON object into a classical bibliographic tibble/data frame.
oa_fetch(): composes the three functions above so the user can execute everything in one step, i.e., oa_query |> oa_request |> oa2df
oa_random(): gets a random entity, e.g., oa_random("works") gives a different work each time you run it

library(openalexR)
library(dplyr)

Works (think papers, publications)

This paper:

Aria, M., & Cuccurullo, C. (2017). bibliometrix: 
An R-tool for comprehensive science mapping analysis. 
Journal of informetrics, 11(4), 959-975.

is associated with the OpenAlex ID W2755950973. If you know your paper’s OpenAlex ID, all you need to do is pass identifier = <openalex id> as an argument to oa_fetch():

paper_id <- oa_fetch(
  identifier = "W2755950973",
  entity = "works",
  verbose = TRUE
)

## Requesting url: https://api.openalex.org/works/W2755950973

dplyr::glimpse(paper_id)

## Rows: 1
## Columns: 39
## $ id                          <chr> "https://openalex.org/W2755950973"
## $ title                       <chr> "bibliometrix : An R-tool for comprehensiv…
## $ display_name                <chr> "bibliometrix : An R-tool for comprehensiv…
## $ authorships                 <list> [<tbl_df[2 x 7]>]
## $ doi                         <chr> "https://doi.org/10.1016/j.joi.2017.08.007"
## $ publication_date            <date> 2017-09-12
## $ publication_year            <int> 2017
## $ fwci                        <dbl> 105.654
## $ cited_by_count              <int> 9965
## $ counts_by_year              <list> [<data.frame[11 x 2]>]
## $ cited_by_api_url            <chr> "https://api.openalex.org/works?filter=cit…
## $ ids                         <list> <"https://openalex.org/W2755950973", "htt…
## $ type                        <chr> "article"
## $ is_oa                       <lgl> FALSE
## $ is_oa_anywhere              <lgl> FALSE
## $ oa_status                   <chr> "closed"
## $ oa_url                      <lgl> NA
## $ any_repository_has_fulltext <lgl> FALSE
## $ source_display_name         <chr> "Journal of Informetrics"
## $ source_id                   <chr> "https://openalex.org/S205292342"
## $ issn_l                      <chr> "1751-1577"
## $ host_organization           <chr> "https://openalex.org/P4310320990"
## $ host_organization_name      <chr> "Elsevier BV"
## $ landing_page_url            <chr> "https://doi.org/10.1016/j.joi.2017.08.007"
## $ referenced_works            <list> <"https://openalex.org/W1497199863", "http…
## $ referenced_works_count      <int> 70
## $ related_works               <list> <"https://openalex.org/W45233828", "https…
## $ concepts                    <list> [<data.frame[10 x 5]>]
## $ topics                      <list> [<tbl_df[12 x 5]>]
## $ keywords                    <list> [<data.frame[1 x 3]>]
## $ is_paratext                 <lgl> FALSE
## $ is_retracted                <lgl> FALSE
## $ language                    <chr> "en"
## $ grants                      <lgl> NA
## $ apc                         <list> [<data.frame[2 x 5]>]
## $ first_page                  <chr> "959"
## $ last_page                   <chr> "975"
## $ volume                      <chr> "11"
## $ issue                       <chr> "4"

oa_fetch() is a composition of functions: oa_query |> oa_request |> oa2df. oa_query() returns the query string, which by default includes the address of the OpenAlex API endpoint. oa_request() downloads the bibliographic records matching the query. Finally, oa2df() converts the resulting list to a tibble. The final result is a complicated tibble, but we can use show_works() to display a simplified version:

paper_id %>% 
  show_works() %>%
  knitr::kable()

id	display_name	first_author	last_author	is_oa	top_concepts
W2755950973	bibliometrix : An R-tool for comprehensive science mapping analysis	Massimo Aria	Corrado Cuccurullo	FALSE	Workflow, Bibliometrics, Software
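The composition can also be written out step by step, which helps when you want to inspect the query URL or the raw response. This is a minimal sketch, assuming default arguments and network access to the OpenAlex API; the object names query_url, res, and paper_df are illustrative only.

```r
library(openalexR)

# Step 1: build the query URL (this step only composes a string).
query_url <- oa_query(identifier = "W2755950973", entity = "works")

# Step 2: download the matching record as a nested list (needs network access).
res <- oa_request(query_url = query_url)

# Step 3: flatten the list into a tibble, as oa_fetch() does internally.
paper_df <- oa2df(res, entity = "works")

paper_df$display_name
```

Keeping the steps separate makes it easy to reuse the same query URL, for instance to compare the raw list with the flattened tibble.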

External id formats

OpenAlex endpoints accept OpenAlex IDs as well as external IDs (e.g., DOI, ISSN) in several formats, including the prefixed form (doi:...) and the full URL form used by persistent identifiers (PIDs).

oa_fetch(
  # identifier = "https://doi.org/10.1016/j.joi.2017.08.007", # would also work (PIDs)
  identifier = "doi:10.1016/j.joi.2017.08.007",
  entity = "works"
) %>% 
  show_works() %>%
  knitr::kable()

id	display_name	first_author	last_author	is_oa	top_concepts
W2755950973	bibliometrix : An R-tool for comprehensive science mapping analysis	Massimo Aria	Corrado Cuccurullo	FALSE	Workflow, Bibliometrics, Software
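When DOIs come from mixed sources, it can be convenient to bring them into one spelling before querying. The sketch below uses base R only; normalize_doi() is a hypothetical helper, not part of openalexR, and it assumes that DOIs can safely be lowercased because they are case-insensitive.

```r
# Hypothetical helper (not part of openalexR): bring common DOI spellings
# into the canonical https://doi.org/ form.
normalize_doi <- function(x) {
  # Drop a leading "doi:" prefix or a doi.org / dx.doi.org URL prefix.
  bare <- sub("^(https?://(dx\\.)?doi\\.org/|doi:)", "", trimws(x), ignore.case = TRUE)
  paste0("https://doi.org/", tolower(bare))
}

dois <- c(
  "doi:10.1016/j.joi.2017.08.007",
  "https://doi.org/10.1016/j.joi.2017.08.007",
  "10.1016/J.JOI.2017.08.007"
)

normalized <- normalize_doi(dois)
unique(normalized)
```

All three spellings collapse to the same canonical string, which can then be passed as identifier or as a doi filter.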

More than one publication/author


If you know the OpenAlex IDs of these entities, you can also feed them into the identifier argument.

oa_fetch(
  identifier = c("W2741809807", "W2755950973"),
  # identifier = c("https://doi.org/10.1016/j.joi.2017.08.007", "https://doi.org/10.1016/j.joi.2017.08.007"), # TODO
  entity = "works",
  verbose = TRUE
) %>% 
  show_works() %>%
  knitr::kable()

## Requesting url: https://api.openalex.org/works?filter=openalex%3AW2741809807%7CW2755950973

## Getting 1 page of results with a total of 2 records...

id	display_name	first_author	last_author	is_oa	top_concepts
W2755950973	bibliometrix : An R-tool for comprehensive science mapping analysis	Massimo Aria	Corrado Cuccurullo	FALSE	Workflow, Bibliometrics, Software
W2741809807	The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles	Heather Piwowar	Stefanie Haustein	TRUE	Citation, License, Bibliometrics

However, if you only know their external identifiers, say, DOIs, you need to use doi as a filter (both the canonical form with https://doi.org/ and the bare form should work):

oa_fetch(
  # identifier = c("W2741809807", "W2755950973"),
  doi = c("10.1016/j.joi.2017.08.007", "https://doi.org/10.1093/bioinformatics/btab727"),
  entity = "works",
  verbose = TRUE
) %>% 
  show_works() %>%
  knitr::kable()

## Requesting url: https://api.openalex.org/works?filter=doi%3A10.1016%2Fj.joi.2017.08.007%7Chttps%3A%2F%2Fdoi.org%2F10.1093%2Fbioinformatics%2Fbtab727

## Getting 1 page of results with a total of 2 records...

id	display_name	first_author	last_author	is_oa	top_concepts
W2755950973	bibliometrix : An R-tool for comprehensive science mapping analysis	Massimo Aria	Corrado Cuccurullo	FALSE	Workflow, Bibliometrics, Software
W3206431085	PMLB v1.0: an open-source dataset collection for benchmarking machine learning methods	Joseph D. Romano	Jason H. Moore	TRUE	Python (programming language), Benchmarking, Benchmark (surveying)

Filters

In most cases, we are interested in downloading a collection of items that meet one or more inclusion/exclusion criteria (filters). Supported filters for each entity are listed in the OpenAlex API documentation.

Example: We want to download all works published by a set of authors. We can do this by filtering on the authorships.author.id/author.id or authorships.author.orcid/author.orcid attribute (see more on works attributes):

oa_fetch(
  entity = "works",
  author.id = c("A5048491430", "A5023888391"),
  verbose = TRUE
) %>% 
  show_works() %>% 
  knitr::kable()

## Requesting url: https://api.openalex.org/works?filter=author.id%3AA5048491430%7CA5023888391

## Getting 1 page of results with a total of 125 records...

## Warning in oa_request(oa_query(filter = filter_i, multiple_id = multiple_id, : 
## The following work(s) have truncated lists of authors: W4230863633.
## Query each work separately by its identifier to get full list of authors.
## For example:
##   lapply(c("W4230863633"), \(x) oa_fetch(identifier = x))
## Details at https://docs.openalex.org/api-entities/authors/limitations.

id	display_name	first_author	last_author	is_oa	top_concepts
W2741809807	The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles	Heather Piwowar	Stefanie Haustein	TRUE	Citation, License, Bibliometrics
W2046766973	Sharing Detailed Research Data Is Associated with Increased Citation Rate	Heather Piwowar	Douglas B. Fridsma	TRUE	Citation, Clinical trial, Impact factor
W2045657963	Data reuse and the open data citation advantage	Heather Piwowar	Todd Vision	TRUE	Citation, Reuse
W1572136682	Altmetrics: Value all research products	Heather Piwowar	NA	TRUE	Altmetrics, Value (mathematics)
W2122130843	Scientometrics 2.0: New metrics of scholarly impact on the social Web	Jason Priem	Bradely H. Hemminger	FALSE	Bookmarking, Altmetrics, Social media
W1553564559	Altmetrics in the wild: Using social media to explore scholarly impact	Jason Priem	Bradley M. Hemminger	TRUE	Altmetrics, Social media, Citation
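The warning above suggests querying truncated works one by one to recover their full author lists. A minimal sketch of that advice, assuming network access; truncated_ids, full_records, and full_works are illustrative names, and dplyr::bind_rows() is used here only to stack the single-row results.

```r
library(openalexR)

# IDs reported by the warning as having truncated author lists.
truncated_ids <- c("W4230863633")

# Query each work by its identifier, as the warning message recommends.
full_records <- lapply(
  truncated_ids,
  function(x) oa_fetch(identifier = x, entity = "works")
)

# Stack the one-row tibbles into a single tibble.
full_works <- dplyr::bind_rows(full_records)

# Number of authors now recorded for each re-queried work.
vapply(full_works$authorships, nrow, integer(1))
```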

orcids <- c("0000-0003-3737-6565", "0000-0002-8517-9411")
canonical_orcids <- paste0("https://orcid.org/", orcids)
oa_fetch(
  entity = "works",
  author.orcid = canonical_orcids,
  verbose = TRUE
) %>% 
  show_works() %>% 
  knitr::kable()

## Requesting url: https://api.openalex.org/works?filter=author.orcid%3Ahttps%3A%2F%2Forcid.org%2F0000-0003-3737-6565%7Chttps%3A%2F%2Forcid.org%2F0000-0002-8517-9411

## Getting 2 pages of results with a total of 350 records...

## Warning in oa_request(oa_query(filter = filter_i, multiple_id = multiple_id, : 
## The following work(s) have truncated lists of authors: W3202287394, W3207775241.
## Query each work separately by its identifier to get full list of authors.
## For example:
##   lapply(c("W3202287394", "W3207775241"), \(x) oa_fetch(identifier = x))
## Details at https://docs.openalex.org/api-entities/authors/limitations.

id	display_name	first_author	last_author	is_oa	top_concepts
W2755950973	bibliometrix : An R-tool for comprehensive science mapping analysis	Massimo Aria	Corrado Cuccurullo	FALSE	Workflow, Bibliometrics, Software
W2777772618	Interoception and Mental Health: A Roadmap	Sahib S. Khalsa	Nancy Zucker	TRUE	Mental health, Perception
W2955219525	Scaling tree-based automated machine learning to biomedical big data with a feature set selector	Trang T. Le	Jason H. Moore	TRUE	Pipeline (software), Scalability, Feature (linguistics)
W3005144120	Mapping the Evolution of Social Research and Data Science on 30 Years of Social Indicators Research	Massimo Aria	Maria Spano	FALSE	Human geography, Data collection, Position (finance)
W2408216567	Foundations and trends in performance management. A twenty-five years bibliometric analysis in business and public administration domains	Corrado Cuccurullo	Fabrizia Sarto	FALSE	Domain (mathematical analysis), Content analysis, Public domain
W2952824318	A Nonlinear Simulation Framework Supports Adjusting for Age When Analyzing BrainAGE	Trang T. Le	Martin P. Paulus	TRUE	Nonlinear system

Example: We want to download all works that have been cited more than 50 times, published between 2020 and 2021, and include the strings “bibliometric analysis” or “science mapping” in the title. Maybe we also want the results to be sorted by total citations in a descending order.

With the argument count_only = TRUE, the function oa_request() returns only the number of items matching the query, without downloading the collection.

oa_fetch(
  entity = "works",
  title.search = c("bibliometric analysis", "science mapping"),
  cited_by_count = ">50", 
  from_publication_date = "2020-01-01",
  to_publication_date = "2021-12-31",
  options = list(sort = "cited_by_count:desc"),
  count_only = TRUE,
  verbose = TRUE
)

## Requesting url: https://api.openalex.org/works?filter=title.search%3Abibliometric%20analysis%7Cscience%20mapping%2Ccited_by_count%3A%3E50%2Cfrom_publication_date%3A2020-01-01%2Cto_publication_date%3A2021-12-31&sort=cited_by_count%3Adesc

## $count
## [1] 485
## 
## $db_response_time_ms
## [1] 28
## 
## $page
## [1] 1
## 
## $per_page
## [1] 1
## 
## $groups_count
## NULL
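The requested URL shows how the arguments are combined: in the OpenAlex filter syntax, a comma separates conditions that must all hold (AND), while a pipe separates alternative values for one attribute (OR). The following base R sketch decodes the URL printed above to make that structure visible; the variable names are illustrative.

```r
# The URL printed by oa_fetch() above, split over several lines for readability.
requested_url <- paste0(
  "https://api.openalex.org/works?filter=title.search%3Abibliometric%20analysis",
  "%7Cscience%20mapping%2Ccited_by_count%3A%3E50",
  "%2Cfrom_publication_date%3A2020-01-01%2Cto_publication_date%3A2021-12-31",
  "&sort=cited_by_count%3Adesc"
)

# Undo the percent-encoding.
decoded <- utils::URLdecode(requested_url)

# Keep only the part after "?", then split it into parameters.
query_string <- sub("^[^?]*\\?", "", decoded)
params <- strsplit(query_string, "&", fixed = TRUE)[[1]]

# Extract the filter parameter and split it into its AND-ed conditions.
filter_value <- sub("^filter=", "", params[startsWith(params, "filter=")])
filters <- strsplit(filter_value, ",", fixed = TRUE)[[1]]
filters
```

The first condition contains the two title strings joined by a pipe, so a work matches if its title contains either of them.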

We can now download the records and transform them into a tibble/data frame by setting count_only = FALSE (also the default value):

oa_fetch(
  entity = "works",
  title.search = c("bibliometric analysis", "science mapping"),
  cited_by_count = ">50", 
  from_publication_date = "2020-01-01",
  to_publication_date = "2021-12-31",
  options = list(sort = "cited_by_count:desc"),
  count_only = FALSE
) %>%
  show_works() %>%
  knitr::kable()

id	display_name	first_author	last_author	is_oa	top_concepts
W3160856016	How to conduct a bibliometric analysis: An overview and guidelines	Naveen Donthu	Weng Marc Lim	TRUE	Bibliometrics, Field (mathematics), Resource (disambiguation)
W3001491100	Software tools for conducting bibliometric analysis in science: An up-to-date review	José A. Moral-Muñoz	Manuel J. Cobo	TRUE	Bibliometrics, Visualization, Set (abstract data type)
W3038273726	Investigating the emerging COVID-19 research trends in the field of business and management: A bibliometric analysis approach	Surabhi Verma	Anders Gustafsson	TRUE	Bibliometrics, Field (mathematics), Disease
W3044902155	Financial literacy: A systematic review and bibliometric analysis	Kirti Goyal	Satish Kumar	FALSE	Financial literacy, Citation, Content analysis
W3198357836	Artificial intelligence and machine learning in finance: Identifying foundations, themes, and research clusters from bibliometric analysis	John W. Goodell	Debidutta Pattnaik	FALSE	Scholarship, Valuation (finance), Corporate finance
W3042215340	A bibliometric analysis using VOSviewer of publications on COVID-19	Yuetian Yu	Erzhen Chen	TRUE	Citation, Bibliometrics, China

Read on to see how we can shorten these two function calls.

Authors

As with works, we can use identifier to pass in authors’ OpenAlex IDs.

Example: We want more information on authors with IDs A5069892096 and A5023888391.

oa_fetch(
  identifier = c("A5069892096", "A5023888391"),
  verbose = TRUE
) %>%
  show_authors() %>%
  knitr::kable()

## Requesting url: https://api.openalex.org/authors?filter=openalex%3AA5069892096%7CA5023888391

## Getting 1 page of results with a total of 2 records...

id	display_name	orcid	works_count	cited_by_count	top_concepts
A5069892096	Massimo Aria	0000-0002-8517-9411	220	14191	Physiology, Pathology and Forensic Medicine, Sociology and Political Science
A5023888391	Jason Priem	0000-0001-6187-6610	62	3914	Statistics, Probability and Uncertainty, Information Systems, Communication

Example: We want to download the records of all authors who work at the University of Naples Federico II (OpenAlex ID: I71267560) and who have published more than 499 works.

Let’s first check how many records match the query, then set count_only = FALSE to download the entire collection. We can do this by first defining a list of arguments, then adding count_only (default FALSE) to this list:

my_arguments <- list(
  entity = "authors",
  last_known_institutions.id = "I71267560",
  works_count = ">499"
  )

do.call(oa_fetch, c(my_arguments, list(count_only = TRUE)))

## $count
## [1] 43
## 
## $db_response_time_ms
## [1] 102
## 
## $page
## [1] 1
## 
## $per_page
## [1] 1
## 
## $groups_count
## NULL

do.call(oa_fetch, my_arguments) %>% 
  show_authors() %>%
  knitr::kable()

id	display_name	orcid	works_count	cited_by_count	top_concepts
A5114377868	L. Lista	0000-0001-6471-5492	2952	136916	Nuclear and High Energy Physics, Nuclear and High Energy Physics, Nuclear and High Energy Physics
A5106552509	C. Sciacca	0000-0002-8412-4072	2763	101166	Nuclear and High Energy Physics, Nuclear and High Energy Physics, Nuclear and High Energy Physics
A5106315809	M. Merola	0000-0002-7082-8108	1360	73762	Nuclear and High Energy Physics, Nuclear and High Energy Physics, Nuclear and High Energy Physics
A5003544129	Annamaria Colao	0000-0001-6986-266X	1332	47405	Endocrinology, Diabetes and Metabolism, Endocrinology, Diabetes and Metabolism, Surgery
A5076706548	Salvatore Capozziello	0000-0003-4886-2024	1063	40201	Astronomy and Astrophysics, Nuclear and High Energy Physics, Astronomy and Astrophysics
A5026402548	Gabriella Fabbrocini	0000-0002-0064-1874	998	18401	Dermatology, Immunology, Dermatology
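The count-then-download pattern can be wrapped in a small function so that large collections are not fetched by accident. This is only a sketch: fetch_if_small() and its max_records threshold are hypothetical and not part of openalexR, and it assumes that the count_only result exposes a count element, as in the output shown above.

```r
library(openalexR)

# Hypothetical helper (not part of openalexR): check the size of the
# collection first, and download it only if it is small enough.
fetch_if_small <- function(args, max_records = 500) {
  n <- do.call(oa_fetch, c(args, list(count_only = TRUE)))[["count"]]
  if (n > max_records) {
    message("Query matches ", n, " records; not downloading.")
    return(invisible(NULL))
  }
  do.call(oa_fetch, args)
}

naples_authors <- fetch_if_small(list(
  entity = "authors",
  last_known_institutions.id = "I71267560",
  works_count = ">499"
))
```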

You can also use other filters such as display_name.search, has_orcid, and orcid:

oa_fetch(
  entity = "authors",
  display_name.search = "Massimo Aria",
  has_orcid = "true"
) %>%
  show_authors() %>%
  knitr::kable()

id	display_name	orcid	works_count	cited_by_count	top_concepts
A5069892096	Massimo Aria	0000-0002-8517-9411	220	14191	Physiology, Pathology and Forensic Medicine, Sociology and Political Science

oa_fetch(
  entity = "authors",
  orcid = "0000-0002-8517-9411"
) %>%
  show_authors() %>%
  knitr::kable()

id	display_name	orcid	works_count	cited_by_count	top_concepts
A5069892096	Massimo Aria	0000-0002-8517-9411	220	14191	Physiology, Pathology and Forensic Medicine, Sociology and Political Science

Institutions

Example: We want to download all records regarding Italian institutions (country_code:it) that are classified as educational (type:education). Again, we check how many records match the query, then download the collection:

italian_insts <- list(
  entity = "institutions",
  country_code = "it",
  type = "education",
  verbose = TRUE
)

do.call(oa_fetch, c(italian_insts, list(count_only = TRUE)))

## Requesting url: https://api.openalex.org/institutions?filter=country_code%3Ait%2Ctype%3Aeducation

## $count
## [1] 150
## 
## $db_response_time_ms
## [1] 82
## 
## $page
## [1] 1
## 
## $per_page
## [1] 1
## 
## $groups_count
## NULL

dplyr::glimpse(do.call(oa_fetch, italian_insts))

## Requesting url: https://api.openalex.org/institutions?filter=country_code%3Ait%2Ctype%3Aeducation

## Getting 1 page of results with a total of 150 records...

## Rows: 150
## Columns: 22
## $ id                         <chr> "https://openalex.org/I154387261", "https:/…
## $ display_name               <chr> "Vita-Salute San Raffaele University", "Mag…
## $ display_name_alternatives  <list> <"Università Vita-Salute San Raffaele", "U…
## $ display_name_acronyms      <list> NA, NA, NA, "UniSanRaffaele", NA, <"USGM",…
## $ international_display_name <list> <"Універсітэт Віта-Салютэ Сан-Рафаэле", "U…
## $ ror                        <chr> "https://ror.org/01gmqr298", "https://ror.o…
## $ ids                        <list> <"https://openalex.org/I154387261", "https…
## $ country_code               <chr> "IT", "IT", "IT", "IT", "IT", "IT", "IT", "…
## $ geo                        <list> [<data.frame[1 x 7]>], [<data.frame[1 x 7]…
## $ type                       <chr> "education", "education", "education", "edu…
## $ homepage_url               <chr> "https://www.unisr.it", "https://web.unicz.…
## $ image_url                  <chr> "https://commons.wikimedia.org/w/index.php?…
## $ image_thumbnail_url        <chr> "https://commons.wikimedia.org/w/index.php?…
## $ associated_institutions    <list> [<data.frame[1 x 6]>], [<data.frame[1 x 6]…
## $ works_count                <int> 34249, 16847, 15940, 12771, 9767, 4670, 393…
## $ cited_by_count             <int> 1135034, 421597, 410646, 571560, 291485, 91…
## $ counts_by_year             <list> [<data.frame[14 x 3]>], [<data.frame[14 x …
## $ summary_stats              <list> <4.819498, 371.000000, 15121.000000>, <4.1…
## $ works_api_url              <chr> "https://api.openalex.org/works?filter=inst…
## $ topics                     <list> [<tbl_df[100 x 5]>], [<tbl_df[100 x 5]>], …
## $ updated_date               <chr> "2025-07-16T02:01:41.989582", "2025-07-15T1…
## $ created_date               <chr> "2016-06-24", "2016-06-24", "2016-06-24", "…
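Once the institutions are in a tibble, ordinary dplyr verbs apply. As an illustration, the sketch below ranks the institutions by citations per work using the works_count and cited_by_count columns shown in the glimpse above; cites_per_work is a derived column introduced here, not an OpenAlex field, and the code assumes network access.

```r
library(openalexR)
library(dplyr)

italian_unis <- oa_fetch(
  entity = "institutions",
  country_code = "it",
  type = "education"
)

top_unis <- italian_unis %>%
  filter(works_count > 0) %>%                        # avoid division by zero
  mutate(cites_per_work = cited_by_count / works_count) %>%
  arrange(desc(cites_per_work)) %>%
  select(display_name, works_count, cited_by_count, cites_per_work) %>%
  head(10)

top_unis
```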

Keywords

Example: We want to download the records of all keywords that have been used to tag more than 1000 works:

popular_keywords <- list(
  entity = "keywords",
  works_count = ">1000",
  verbose = TRUE
)

do.call(oa_fetch, c(popular_keywords, list(count_only = TRUE)))

## Requesting url: https://api.openalex.org/keywords?filter=works_count%3A%3E1000

## $count
## [1] 101
## 
## $db_response_time_ms
## [1] 18
## 
## $page
## [1] 1
## 
## $per_page
## [1] 1
## 
## $groups_count
## NULL

dplyr::glimpse(do.call(oa_fetch, popular_keywords))

## Requesting url: https://api.openalex.org/keywords?filter=works_count%3A%3E1000

## Getting 1 page of results with a total of 101 records...

## Rows: 101
## Columns: 7
## $ id             <chr> "https://openalex.org/keywords/diagnosis", "https://ope…
## $ display_name   <chr> "Diagnosis", "Second Language Acquisition", "Audio-Visu…
## $ works_count    <int> 267464, 127763, 117782, 91651, 73541, 71513, 64572, 636…
## $ cited_by_count <int> 888915, 1203886, 1259369, 482101, 808723, 335022, 87885…
## $ works_api_url  <chr> "https://api.openalex.org/works?filter=keywords.id:keyw…
## $ updated_date   <chr> "2024-04-15T13:29:44.932572", "2024-05-13T10:01:10.6536…
## $ created_date   <chr> "2024-04-10", "2024-04-10", "2024-04-10", "2024-04-10",…

Other examples

Get all works citing a particular work

We can download all publications citing another publication by using the filter attribute cites.

For example, to download all publications citing the article by Aria and Cuccurullo (2017), we just need to set the filter cites = "W2755950973", where “W2755950973” is the OpenAlex ID of that article.

aria_count <- oa_fetch(
  entity = "works",
  cites = "W2755950973",
  count_only = TRUE,
  verbose = TRUE
)

## Requesting url: https://api.openalex.org/works?filter=cites%3AW2755950973

aria_count

## $count
## [1] 10032
## 
## $db_response_time_ms
## [1] 46
## 
## $page
## [1] 1
## 
## $per_page
## [1] 1
## 
## $groups_count
## NULL

This query will return a collection of 10032 publications. Among these, let’s download the ones published in 2018, the year after the article appeared:

oa_fetch(
  entity = "works",
  cites = "W2755950973",
  publication_year = 2018,
  count_only = FALSE,
  verbose = TRUE
) %>% 
  dplyr::glimpse()

## Requesting url: https://api.openalex.org/works?filter=cites%3AW2755950973%2Cpublication_year%3A2018

## Getting 1 page of results with a total of 32 records...

## Rows: 32
## Columns: 43
## $ id                          <chr> "https://openalex.org/W2896801517", "https…
## $ title                       <chr> "Global trends in infectious diseases of s…
## $ display_name                <chr> "Global trends in infectious diseases of s…
## $ authorships                 <list> [<tbl_df[2 x 7]>], [<tbl_df[2 x 7]>], [<t…
## $ abstract                    <chr> "Pork accounts for more than one-third of …
## $ doi                         <chr> "https://doi.org/10.1073/pnas.1806068115",…
## $ publication_date            <date> 2018-10-22, 2018-11-26, 2018-12-20, 2018-…
## $ publication_year            <int> 2018, 2018, 2018, 2018, 2018, 2018, 2018, …
## $ fwci                        <dbl> 4.076, 14.079, 7.875, 44.117, 12.804, 1.24…
## $ cited_by_count              <int> 260, 249, 196, 174, 129, 125, 118, 93, 87,…
## $ counts_by_year              <list> [<data.frame[8 x 2]>], [<data.frame[7 x 2…
## $ cited_by_api_url            <chr> "https://api.openalex.org/works?filter=cit…
## $ ids                         <list> <"https://openalex.org/W2896801517", "htt…
## $ type                        <chr> "review", "article", "article", "article",…
## $ is_oa                       <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALS…
## $ is_oa_anywhere              <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,…
## $ oa_status                   <chr> "bronze", "bronze", "green", "bronze", "go…
## $ oa_url                      <chr> "https://doi.org/10.1073/pnas.1806068115",…
## $ any_repository_has_fulltext <lgl> TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALS…
## $ source_display_name         <chr> "Proceedings of the National Academy of Sc…
## $ source_id                   <chr> "https://openalex.org/S125754415", "https:…
## $ issn_l                      <chr> "0027-8424", "0043-1397", "0169-5347", "00…
## $ host_organization           <chr> "https://openalex.org/P4310320052", "https…
## $ host_organization_name      <chr> "National Academy of Sciences", "Wiley", "…
## $ landing_page_url            <chr> "https://doi.org/10.1073/pnas.1806068115",…
## $ pdf_url                     <chr> NA, "https://agupubs.onlinelibrary.wiley.c…
## $ license                     <chr> NA, NA, NA, NA, "cc-by", NA, NA, "publishe…
## $ version                     <chr> "publishedVersion", "publishedVersion", NA…
## $ referenced_works            <list> <"https://openalex.org/W1530619192", "htt…
## $ referenced_works_count      <int> 23, 85, 97, 248, 84, 228, 75, 138, 78, 59,…
## $ related_works               <list> <"https://openalex.org/W4391486112", "htt…
## $ concepts                    <list> [<data.frame[26 x 5]>], [<data.frame[22 x…
## $ topics                      <list> [<tbl_df[12 x 5]>], [<tbl_df[12 x 5]>], […
## $ keywords                    <list> [<data.frame[3 x 3]>], [<data.frame[3 x 3…
## $ is_paratext                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ is_retracted                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ language                    <chr> "en", "en", "en", "en", "en", "en", "en", …
## $ grants                      <list> <"https://openalex.org/F4320332299", "Nat…
## $ apc                         <list> NA, [<data.frame[2 x 5]>], [<data.frame[2…
## $ first_page                  <chr> "11495", "378", "224", "12", "e0207655", "…
## $ last_page                   <chr> "11500", "390", "238", "63", "e0207655", "…
## $ volume                      <chr> "115", "55", "34", "50", "13", "205", "45"…
## $ issue                       <chr> "45", "1", "3", "1", "11", NA, "3", "4-5",…
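The downloaded tibble can be summarized directly. For instance, the following sketch tabulates the open-access status of the 2018 citing works using the oa_status column shown above; it assumes network access, and the names citing_2018 and oa_summary are illustrative.

```r
library(openalexR)
library(dplyr)

citing_2018 <- oa_fetch(
  entity = "works",
  cites = "W2755950973",
  publication_year = 2018
)

# Count citing works per open-access status, most frequent first.
oa_summary <- citing_2018 %>%
  count(oa_status, sort = TRUE)

oa_summary
```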

Convert an OpenAlex data frame to a bibliometrix object

The bibliometrix R-package (https://www.bibliometrix.org) provides a set of tools for quantitative research in bibliometrics and scientometrics. Today it is one of the most widely used science mapping tools in the world. In a recent survey on bibliometric analysis tools, Moral-Muñoz et al. (2020) wrote: “At this moment, maybe Bibliometrix and its Shiny platform contain the more extensive set of techniques implemented, and together with the easiness of its interface, could be a great software for practitioners”.

The function oa2bibliometrix() converts a bibliographic data frame of works into a bibliometrix object, which can serve as the input collection for a science mapping workflow. Note that, as the warning in the output below shows, oa2bibliometrix() is deprecated in favor of bibliometrix::convert2df().

bib_ls <- list(
  identifier = NULL,
  entity = "works",
  cites = "W2755950973",
  from_publication_date = "2022-01-01",
  to_publication_date = "2022-03-31"
)

do.call(oa_fetch, c(bib_ls, list(count_only = TRUE)))

## $count
## [1] 405
## 
## $db_response_time_ms
## [1] 44
## 
## $page
## [1] 1
## 
## $per_page
## [1] 1
## 
## $groups_count
## NULL

do.call(oa_fetch, bib_ls) %>% 
  oa2bibliometrix() %>% 
  dplyr::glimpse()

## Warning in oa2bibliometrix(.): oa2bibliometrix() is deprecated. Please use
## bibliometrix::convert2df() instead.

## Rows: 405
## Columns: 59
## $ AU                          <chr> "PRAKASH CHANDRA BAHUGUNA;RAJEEV SRIVASTAV…
## $ RP                          <chr> "SCHOOL OF BUSINESS, UNIVERSITY OF PETROLE…
## $ C1                          <chr> "SCHOOL OF BUSINESS, UNIVERSITY OF PETROLE…
## $ AU_UN                       <chr> "UNIVERSITY OF PETROLEUM AND ENERGY STUDIE…
## $ AU_CO                       <chr> "INDIA;INDIA;INDIA", "INDIA;INDIA;INDIA", …
## $ ID                          <chr> "WASTEWATER;ENVIRONMENTAL SCIENCE;WEB OF S…
## $ id_url                      <chr> "https://openalex.org/W4210864411", "https…
## $ title                       <chr> "Wastewater treatment and emerging contami…
## $ authorships                 <list> [<tbl_df[3 x 7]>], [<tbl_df[4 x 7]>], [<t…
## $ abstract                    <chr> NA, "Abstract Conversational agents are sy…
## $ doi                         <chr> "https://doi.org/10.1016/j.chemosphere.202…
## $ publication_date            <date> 2022-02-08, 2022-03-08, 2022-03-01, 2022-…
## $ fwci                        <dbl> 4.565, 27.467, 76.072, 34.370, 9.519, 8.55…
## $ counts_by_year              <list> [<data.frame[4 x 2]>], [<data.frame[4 x 2…
## $ cited_by_api_url            <chr> "https://api.openalex.org/works?filter=cit…
## $ ids                         <list> <"https://openalex.org/W4210864411", "htt…
## $ is_oa                       <lgl> FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FAL…
## $ is_oa_anywhere              <lgl> FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FAL…
## $ oa_status                   <chr> "closed", "closed", "bronze", "closed", "h…
## $ oa_url                      <chr> NA, NA, "https://link.springer.com/content…
## $ any_repository_has_fulltext <lgl> FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FA…
## $ source_display_name         <chr> "Chemosphere", "Psychology and Marketing",…
## $ source_id                   <chr> "https://openalex.org/S203465130", "https:…
## $ issn_l                      <chr> "0045-6535", "0742-6046", "1019-6781", "00…
## $ host_organization           <chr> "https://openalex.org/P4310320990", "https…
## $ host_organization_name      <chr> "Elsevier BV", "Wiley", "Springer Science+…
## $ landing_page_url            <chr> "https://doi.org/10.1016/j.chemosphere.202…
## $ pdf_url                     <chr> NA, NA, "https://link.springer.com/content…
## $ license                     <chr> NA, NA, NA, NA, "cc-by", NA, NA, "cc-by-nc…
## $ version                     <chr> NA, NA, "publishedVersion", NA, "published…
## $ referenced_works            <list> <"https://openalex.org/W1854025783", "htt…
## $ referenced_works_count      <int> 88, 182, 394, 154, 79, 40, 54, 160, 119, 5…
## $ related_works               <list> <"https://openalex.org/W4394593659", "htt…
## $ concepts                    <list> [<data.frame[16 x 5]>], [<data.frame[24 x…
## $ topics                      <list> [<tbl_df[12 x 5]>], [<tbl_df[12 x 5]>], […
## $ keywords                    <list> [<data.frame[1 x 3]>], [<data.frame[2 x 3…
## $ is_paratext                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ is_retracted                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ language                    <chr> "en", "en", "en", "en", "en", "en", "en", …
## $ grants                      <list> <"https://openalex.org/F4320321001", "Nat…
## $ apc                         <list> [<data.frame[2 x 5]>], [<data.frame[2 x 5…
## $ first_page                  <chr> "133932", "1129", "297", "7527", "104608",…
## $ last_page                   <chr> "133932", "1155", "338", "7550", "104608",…
## $ volume                      <chr> "297", "39", "32", "60", "136", "14", "30"…
## $ issue                       <chr> NA, "6", "1", "24", NA, "7", "2", NA, NA, …
## $ id_oa                       <chr> "W4210864411", "W4220991995", "W4220923293…
## $ CR                          <chr> "W1854025783;W1896090423;W1965064785;W1990…
## $ TI                          <chr> "WASTEWATER TREATMENT AND EMERGING CONTAMI…
## $ AB                          <chr> NA, "ABSTRACT CONVERSATIONAL AGENTS ARE SY…
## $ SO                          <chr> "CHEMOSPHERE", "PSYCHOLOGY AND MARKETING",…
## $ DT                          <chr> "REVIEW", "ARTICLE", "ARTICLE", "ARTICLE",…
## $ DB                          <chr> "OPENALEX", "OPENALEX", "OPENALEX", "OPENA…
## $ JI                          <chr> "S203465130", "S102896891", "S137519996", …
## $ J9                          <chr> "S203465130", "S102896891", "S137519996", …
## $ PY                          <int> 2022, 2022, 2022, 2022, 2022, 2022, 2022, …
## $ TC                          <int> 223, 212, 211, 180, 154, 143, 134, 131, 13…
## $ DI                          <chr> "10.1016/j.chemosphere.2022.133932", "10.1…
## $ SR_FULL                     <chr> "PRAKASH CHANDRA BAHUGUNA, 2022, CHEMOSPHE…
## $ SR                          <chr> "PRAKASH CHANDRA BAHUGUNA, 2022, CHEMOSPHE…
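Since the warning above points to bibliometrix::convert2df(), the same conversion might look as follows with that function. This is a hedged sketch: the argument values dbsource = "openalex_api" and format = "api" are assumptions about what recent bibliometrix versions expect for data frames obtained through openalexR, so check the convert2df() help page for your installed version before relying on them.

```r
library(openalexR)
library(bibliometrix)

citing_q1_2022 <- oa_fetch(
  entity = "works",
  cites = "W2755950973",
  from_publication_date = "2022-01-01",
  to_publication_date = "2022-03-31"
)

# Assumption: these dbsource/format values select the openalexR input path
# in convert2df(); verify against ?bibliometrix::convert2df.
M <- convert2df(
  file = citing_q1_2022,
  dbsource = "openalex_api",
  format = "api"
)

dplyr::glimpse(M)
```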

About OpenAlex

OpenAlex is a fully open catalog of the global research system. It’s named after the ancient Library of Alexandria. The OpenAlex dataset describes scholarly entities and how those entities are connected to each other. Its entity types include:

Works are papers, books, datasets, etc.; they cite other works
Authors are people who create works
Institutions are universities and other organizations that are affiliated with works (via authors)
Concepts tag works with a topic

Acknowledgements

Package hex was made with Midjourney and thus inherits a CC BY-NC 4.0 license.