
 

https://github.com/ropensci/openalexR

Latest version: 1.2.2.9999, 2024-04-20

 

by Massimo Aria

Full Professor in Social Statistics

PhD in Computational Statistics

Laboratory and Research Group STAD Statistics, Technology, Data Analysis

Department of Economics and Statistics

University of Naples Federico II

https://massimoaria.com

 

An R-package to gather bibliographic data from OpenAlex

openalexR helps you interface with the OpenAlex API to retrieve bibliographic information about publications, authors, institutions, sources, funders, publishers, topics and concepts with 5 main functions:

  • oa_query(): generates a valid query, written following the OpenAlex API syntax, from a set of arguments provided by the user.

  • oa_request(): downloads a collection of entities matching the query created by oa_query() or manually written by the user, and returns a JSON object in a list format.

  • oa2df(): converts the JSON object into a classical bibliographic tibble/data frame.

  • oa_fetch(): composes the three functions above so the user can execute everything in one step, i.e., oa_query |> oa_request |> oa2df

  • oa_random(): gets a random entity, e.g., oa_random("works") gives a different work each time you run it

Works (think papers, publications)

This paper:

Aria, M., & Cuccurullo, C. (2017). bibliometrix: 
An R-tool for comprehensive science mapping analysis. 
Journal of informetrics, 11(4), 959-975.

is associated with the OpenAlex ID W2755950973. If you know your paper’s OpenAlex ID, all you need to do is pass identifier = <openalex id> as an argument to oa_fetch():

paper_id <- oa_fetch(
  identifier = "W2755950973",
  entity = "works",
  verbose = TRUE
)
## Requesting url: https://api.openalex.org/works/W2755950973
dplyr::glimpse(paper_id)
## Rows: 1
## Columns: 38
## $ id                          <chr> "https://openalex.org/W2755950973"
## $ title                       <chr> "bibliometrix : An R-tool for comprehensiv…
## $ display_name                <chr> "bibliometrix : An R-tool for comprehensiv…
## $ author                      <list> [<data.frame[2 x 12]>]
## $ ab                          <chr> "The use of bibliometrics is gradually ext…
## $ publication_date            <chr> "2017-11-01"
## $ so                          <chr> "Journal of informetrics"
## $ so_id                       <chr> "https://openalex.org/S205292342"
## $ host_organization           <chr> "Elsevier BV"
## $ issn_l                      <chr> "1751-1577"
## $ url                         <chr> "https://doi.org/10.1016/j.joi.2017.08.007"
## $ pdf_url                     <lgl> NA
## $ license                     <lgl> NA
## $ version                     <lgl> NA
## $ first_page                  <chr> "959"
## $ last_page                   <chr> "975"
## $ volume                      <chr> "11"
## $ issue                       <chr> "4"
## $ is_oa                       <lgl> FALSE
## $ is_oa_anywhere              <lgl> FALSE
## $ oa_status                   <chr> "closed"
## $ oa_url                      <lgl> NA
## $ any_repository_has_fulltext <lgl> FALSE
## $ language                    <chr> "en"
## $ grants                      <lgl> NA
## $ cited_by_count              <int> 4702
## $ counts_by_year              <list> [<data.frame[10 x 2]>]
## $ publication_year            <int> 2017
## $ cited_by_api_url            <chr> "https://api.openalex.org/works?filter=ci…
## $ ids                         <list> <"https://openalex.org/W2755950973", "http…
## $ doi                         <chr> "https://doi.org/10.1016/j.joi.2017.08.007"
## $ type                        <chr> "article"
## $ referenced_works            <list> <"https://openalex.org/W767067438", "https…
## $ related_works               <list> <"https://openalex.org/W4240783740", "http…
## $ is_paratext                 <lgl> FALSE
## $ is_retracted                <lgl> FALSE
## $ concepts                    <list> [<data.frame[10 x 5]>]
## $ topics                      <list> [<tbl_df[12 x 5]>]

oa_fetch() is a composition of functions: oa_query |> oa_request |> oa2df. First, oa_query() returns the query string, which includes the OpenAlex API endpoint address (by default). Next, oa_request() downloads the bibliographic records matching the query. Finally, oa2df() converts the resulting list to a tibble. The final result is a complex tibble, but we can use show_works() to display a simplified version:

paper_id %>% 
  show_works() %>%
  knitr::kable()
| id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
|---|---|---|---|---|---|---|---|
| W2755950973 | bibliometrix : An R-tool for comprehensive science mapping analysis | Massimo Aria | Corrado Cuccurullo | Journal of informetrics | https://doi.org/10.1016/j.joi.2017.08.007 | FALSE | Workflow, Bibliometrics, Software |
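
To make the composition described above concrete, the three steps can also be run one at a time. The sketch below wraps them in a hypothetical helper, fetch_works_stepwise() (not part of openalexR), and assumes that oa_query() accepts filters such as doi as named arguments in the same way oa_fetch() does. Calling the helper requires the openalexR package and network access; defining it does not.

```r
# Hypothetical helper (not part of openalexR): runs the three steps that
# oa_fetch() composes and keeps every intermediate result for inspection.
fetch_works_stepwise <- function(dois, verbose = FALSE) {
  # Step 1: build the query URL; no request is sent at this point.
  # Assumption: filters are passed as named arguments, as in oa_fetch().
  query_url <- openalexR::oa_query(entity = "works", doi = dois)

  # Step 2: download the records matching the query as a nested list.
  records <- openalexR::oa_request(query_url = query_url, verbose = verbose)

  # Step 3: flatten the nested list into a tibble/data frame.
  works <- openalexR::oa2df(records, entity = "works")

  list(query_url = query_url, records = records, works = works)
}

# Example call (needs network access, so it is left commented out):
# res <- fetch_works_stepwise("10.1016/j.joi.2017.08.007", verbose = TRUE)
# res$query_url
```

Keeping the intermediate objects can help when debugging a query: the URL can be pasted into a browser, and the raw list shows fields that the data frame conversion may drop or rename.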

External id formats

The OpenAlex endpoint accepts OpenAlex IDs as well as external IDs (e.g., DOI, ISSN) in several formats, including the prefixed form (doi:...) and the persistent identifier (PID) URL form (https://doi.org/...).

oa_fetch(
  # identifier = "https://doi.org/10.1016/j.joi.2017.08.007", # would also work (PIDs)
  identifier = "doi:10.1016/j.joi.2017.08.007",
  entity = "works"
) %>% 
  show_works() %>%
  knitr::kable()
| id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
|---|---|---|---|---|---|---|---|
| W2755950973 | bibliometrix : An R-tool for comprehensive science mapping analysis | Massimo Aria | Corrado Cuccurullo | Journal of informetrics | https://doi.org/10.1016/j.joi.2017.08.007 | FALSE | Workflow, Bibliometrics, Software |

More than one publication/author

https://api.openalex.org/authors/https://orcid.org/

If you know the OpenAlex IDs of these entities, you can also feed them into the identifier argument.

oa_fetch(
  identifier = c("W2741809807", "W2755950973"),
  # identifier = c("https://doi.org/10.1016/j.joi.2017.08.007", "https://doi.org/10.1016/j.joi.2017.08.007"), # TODO
  entity = "works",
  verbose = TRUE
) %>% 
  show_works() %>%
  knitr::kable()
## Requesting url: https://api.openalex.org/works?filter=openalex%3AW2741809807%7CW2755950973
## Getting 1 page of results with a total of 2 records...
| id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
|---|---|---|---|---|---|---|---|
| W2755950973 | bibliometrix : An R-tool for comprehensive science mapping analysis | Massimo Aria | Corrado Cuccurullo | Journal of informetrics | https://doi.org/10.1016/j.joi.2017.08.007 | FALSE | Workflow, Bibliometrics, Software |
| W2741809807 | The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles | Heather Piwowar | Stefanie Haustein | PeerJ | https://doi.org/10.7717/peerj.4375 | TRUE | Citation, License, Bibliometrics |

However, if you only know their external identifiers, say, DOIs, you need to use doi as a filter (either the canonical form with https://doi.org/ or the bare DOI should work):

oa_fetch(
  # identifier = c("W2741809807", "W2755950973"),
  doi = c("10.1016/j.joi.2017.08.007", "https://doi.org/10.1093/bioinformatics/btab727"),
  entity = "works",
  verbose = TRUE
) %>% 
  show_works() %>%
  knitr::kable()
## Requesting url: https://api.openalex.org/works?filter=doi%3A10.1016%2Fj.joi.2017.08.007%7Chttps%3A%2F%2Fdoi.org%2F10.1093%2Fbioinformatics%2Fbtab727
## Getting 1 page of results with a total of 2 records...
| id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
|---|---|---|---|---|---|---|---|
| W2755950973 | bibliometrix : An R-tool for comprehensive science mapping analysis | Massimo Aria | Corrado Cuccurullo | Journal of informetrics | https://doi.org/10.1016/j.joi.2017.08.007 | FALSE | Workflow, Bibliometrics, Software |
| W3206431085 | PMLB v1.0: an open-source dataset collection for benchmarking machine learning methods | Joseph D. Romano | Jason H. Moore | Bioinformatics | https://doi.org/10.1093/bioinformatics/btab727 | TRUE | Python (programming language), Benchmarking, Benchmark (surveying) |
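
When DOIs come from mixed sources, it can be convenient to bring them to a single form before passing them to the doi filter, for example to spot duplicates. The helper below, normalize_doi(), is a hypothetical base R sketch and not part of openalexR; it only strips a leading resolver prefix.

```r
# Hypothetical helper (not part of openalexR): strip the resolver prefix so
# that every DOI is written in the bare "10.xxxx/..." form.
normalize_doi <- function(x) {
  sub("^https?://(dx\\.)?doi\\.org/", "", trimws(x), ignore.case = TRUE)
}

dois <- c(
  "10.1016/j.joi.2017.08.007",
  "https://doi.org/10.1093/bioinformatics/btab727"
)

bare_dois <- normalize_doi(dois)
print(bare_dois)

# Duplicates written in different forms become detectable after normalizing.
print(anyDuplicated(normalize_doi(c(dois[1], paste0("https://doi.org/", dois[1])))) > 0)
```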

Filters

In most cases, we are interested in downloading a collection of items that meet one or more inclusion/exclusion criteria (filters). Supported filters for each entity are listed here.

Example: We want to download all works published by a set of authors. We can do this by filtering on the authorships.author.id/author.id or authorships.author.orcid/author.orcid attribute (see more on works attributes):

oa_fetch(
  entity = "works",
  author.id = c("A5048491430", "A5023888391"),
  verbose = TRUE
) %>% 
  show_works() %>% 
  knitr::kable()
## Requesting url: https://api.openalex.org/works?filter=author.id%3AA5048491430%7CA5023888391
## Getting 1 page of results with a total of 124 records...
| id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
|---|---|---|---|---|---|---|---|
| W2046766973 | Sharing Detailed Research Data Is Associated with Increased Citation Rate | Heather Piwowar | Douglas B. Fridsma | PloS one | https://doi.org/10.1371/journal.pone.0000308 | TRUE | Citation, Clinical trial, Impact factor |
| W2741809807 | The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles | Heather Piwowar | Stefanie Haustein | PeerJ | https://doi.org/10.7717/peerj.4375 | TRUE | Citation, License, Bibliometrics |
| W2045657963 | Data reuse and the open data citation advantage | Heather Piwowar | NA | PeerJ | https://doi.org/10.7717/peerj.175 | TRUE | Citation, Reuse |
| W1572136682 | Altmetrics: Value all research products | Heather Piwowar | NA | Nature | https://doi.org/10.1038/493159a | TRUE | Altmetrics, Value (mathematics) |
| W2122130843 | Scientometrics 2.0: New metrics of scholarly impact on the social Web | Jason Priem | Bradely H. Hemminger | First Monday | https://doi.org/10.5210/fm.v15i7.2874 | FALSE | Bookmarking, Altmetrics, Social media |
| W2038196424 | Coverage and adoption of altmetrics sources in the bibliometric community | Stefanie Haustein | Jens Terliesner | Scientometrics | https://doi.org/10.1007/s11192-013-1221-3 | FALSE | Altmetrics, Bookmarking, Social media |

orcids <- c("0000-0003-3737-6565", "0000-0002-8517-9411")
canonical_orcids <- paste0("https://orcid.org/", orcids)
oa_fetch(
  entity = "works",
  author.orcid = canonical_orcids,
  verbose = TRUE
) %>% 
  show_works() %>% 
  knitr::kable()
## Requesting url: https://api.openalex.org/works?filter=author.orcid%3Ahttps%3A%2F%2Forcid.org%2F0000-0003-3737-6565%7Chttps%3A%2F%2Forcid.org%2F0000-0002-8517-9411
## Getting 2 pages of results with a total of 211 records...
| id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
|---|---|---|---|---|---|---|---|
| W2755950973 | bibliometrix : An R-tool for comprehensive science mapping analysis | Massimo Aria | Corrado Cuccurullo | Journal of informetrics | https://doi.org/10.1016/j.joi.2017.08.007 | FALSE | Workflow, Bibliometrics, Software |
| W2955219525 | Scaling tree-based automated machine learning to biomedical big data with a feature set selector | Trang T. Le | Jason H. Moore | Bioinformatics | https://doi.org/10.1093/bioinformatics/btz470 | TRUE | Pipeline (software), Scalability, Feature (linguistics) |
| W2408216567 | Foundations and trends in performance management. A twenty-five years bibliometric analysis in business and public administration domains | Corrado Cuccurullo | Fabrizia Sarto | Scientometrics | https://doi.org/10.1007/s11192-016-1948-8 | FALSE | Domain (mathematical analysis), Content analysis, Public domain |
| W3005144120 | Mapping the Evolution of Social Research and Data Science on 30 Years of Social Indicators Research | Massimo Aria | Maria Spano | Social indicators research | https://doi.org/10.1007/s11205-020-02281-3 | FALSE | Human geography, Data collection, Position (finance) |
| W4221118572 | Thematic Analysis as a New Culturomic Tool: The Social Media Coverage on COVID-19 Pandemic in Italy | Massimo Aria | Maria Spano | Sustainability | https://doi.org/10.3390/su14063643 | TRUE | Thematic map, Social media, Bibliometrics |
| W4319869569 | Systematic literature review of 10 years of cyclist safety research | Antonella Scarano | Alfonso Montella | Accident analysis and prevention | https://doi.org/10.1016/j.aap.2023.106996 | FALSE | Centrality, Crash, SAFER |
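
The request URLs printed with verbose = TRUE show how filters are encoded: the values of one filter are joined with | (logical OR), different filters are joined with , (logical AND), and the whole filter string is percent-encoded. The base R sketch below reproduces the URLs from the two author examples above. build_filter_url() is a hypothetical helper written for illustration only; it is not how openalexR builds queries internally.

```r
# Hypothetical helper (not part of openalexR): mimics how named filter
# arguments appear in the request URL printed by verbose = TRUE.
build_filter_url <- function(entity, ..., endpoint = "https://api.openalex.org") {
  filters <- list(...)

  # Values of a single filter are joined with "|" (logical OR).
  clauses <- vapply(
    names(filters),
    function(name) paste0(name, ":", paste(filters[[name]], collapse = "|")),
    character(1)
  )

  # Different filters are joined with "," (logical AND).
  filter_string <- paste(clauses, collapse = ",")

  # Percent-encode reserved characters such as ":", "|", "/" and ",".
  paste0(
    endpoint, "/", entity, "?filter=",
    utils::URLencode(filter_string, reserved = TRUE)
  )
}

id_url <- build_filter_url(
  "works",
  author.id = c("A5048491430", "A5023888391")
)
print(id_url)

orcid_url <- build_filter_url(
  "works",
  author.orcid = paste0(
    "https://orcid.org/",
    c("0000-0003-3737-6565", "0000-0002-8517-9411")
  )
)
print(orcid_url)
```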

Example: We want to download all works that have been cited more than 50 times, published between 2020 and 2021, and include the strings “bibliometric analysis” or “science mapping” in the title. Maybe we also want the results to be sorted by total citations in a descending order.

Setting the argument count_only = TRUE, the function oa_request() returns the number of items matching the query without downloading the collection.

oa_fetch(
  entity = "works",
  title.search = c("bibliometric analysis", "science mapping"),
  cited_by_count = ">50", 
  from_publication_date = "2020-01-01",
  to_publication_date = "2021-12-31",
  options = list(sort = "cited_by_count:desc"),
  count_only = TRUE,
  verbose = TRUE
)
## Requesting url: https://api.openalex.org/works?filter=title.search%3Abibliometric%20analysis%7Cscience%20mapping%2Ccited_by_count%3A%3E50%2Cfrom_publication_date%3A2020-01-01%2Cto_publication_date%3A2021-12-31&sort=cited_by_count%3Adesc
##      count db_response_time_ms page per_page
## [1,]   218                  99    1        1
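
The count can be used to estimate the size of a download before starting it. The sketch below does not call the API: it rebuilds the printed result as a one-row matrix with named columns (an assumption based on how the output is displayed) and assumes 200 records per page, which is consistent with the page counts reported in the log messages of this document.

```r
# Stand-in for the value returned with count_only = TRUE. The numbers are
# copied from the printed output above, not fetched live.
count_info <- matrix(
  c(218, 99, 1, 1),
  nrow = 1,
  dimnames = list(NULL, c("count", "db_response_time_ms", "page", "per_page"))
)

# Number of records matching the query.
n_records <- unname(count_info[1, "count"])

# Assumption: the full download retrieves 200 records per page.
records_per_page <- 200
n_pages <- ceiling(n_records / records_per_page)

cat("records:", n_records, "- estimated pages:", n_pages, "\n")
```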

We can now download the records and transform them into a tibble/data frame by setting count_only = FALSE (also the default value):

oa_fetch(
  entity = "works",
  title.search = c("bibliometric analysis", "science mapping"),
  cited_by_count = ">50", 
  from_publication_date = "2020-01-01",
  to_publication_date = "2021-12-31",
  options = list(sort = "cited_by_count:desc"),
  count_only = FALSE
) %>%
  show_works() %>%
  knitr::kable()
| id | display_name | first_author | last_author | so | url | is_oa | top_concepts |
|---|---|---|---|---|---|---|---|
| W3160856016 | How to conduct a bibliometric analysis: An overview and guidelines | Naveen Donthu | Weng Marc Lim | Journal of business research | https://doi.org/10.1016/j.jbusres.2021.04.070 | TRUE | Bibliometrics, Field (mathematics), Resource (disambiguation) |
| W3038273726 | Investigating the emerging COVID-19 research trends in the field of business and management: A bibliometric analysis approach | Surabhi Verma | Anders Gustafsson | Journal of business research | https://doi.org/10.1016/j.jbusres.2020.06.057 | TRUE | Bibliometrics, Field (mathematics), Empirical research |
| W2990450011 | Forty-five years of Journal of Business Research: A bibliometric analysis | Naveen Donthu | Debidutta Pattnaik | Journal of business research | https://doi.org/10.1016/j.jbusres.2019.10.039 | FALSE | Publishing, Bibliometrics, Empirical research |
| W3001491100 | Software tools for conducting bibliometric analysis in science: An up-to-date review | José A. Moral-Muñoz | Manuel J. Cobo | El Profesional de la información | https://doi.org/10.3145/epi.2020.ene.03 | TRUE | Bibliometrics, Visualization, Set (abstract data type) |
| W3044902155 | Financial literacy: A systematic review and bibliometric analysis | Kirti Goyal | Satish Kumar | International journal of consumer studies | https://doi.org/10.1111/ijcs.12605 | FALSE | Financial literacy, Content analysis, Citation |
| W2990688366 | A bibliometric analysis of board diversity: Current status, development, and future research directions | H. Kent Baker | Arunima Haldar | Journal of business research | https://doi.org/10.1016/j.jbusres.2019.11.025 | FALSE | Diversity (politics), Ethnic group, Bibliometrics |

Read on to see how we can shorten these two function calls.

Authors

As with works, we can use identifier to pass in authors’ OpenAlex IDs.

Example: We want more information on authors with IDs A5069892096 and A5023888391.

oa_fetch(
  identifier = c("A5069892096", "A5023888391"),
  verbose = TRUE
) %>%
  show_authors() %>%
  knitr::kable()
## Requesting url: https://api.openalex.org/authors?filter=openalex%3AA5069892096%7CA5023888391
## Getting 1 page of results with a total of 2 records...
| id | display_name | orcid | works_count | cited_by_count | affiliation_display_name | top_concepts |
|---|---|---|---|---|---|---|
| A5069892096 | Massimo Aria | 0000-0002-8517-9411 | 183 | 7375 | University of Naples Federico II | Statistics, Internal medicine, Pathology |
| A5023888391 | Jason Priem | 0000-0001-6187-6610 | 67 | 2397 | OurResearch | World Wide Web, Library science, Law |

Example: We want to download the records of all authors who work at the University of Naples Federico II (OpenAlex ID: I71267560) and who have published more than 499 works.

Let’s first check how many records match the query, then set count_only = FALSE to download the entire collection. We can do this by first defining a list of arguments, then adding count_only (default FALSE) to this list:

my_arguments <- list(
  entity = "authors",
  last_known_institution.id = "I71267560",
  works_count = ">499"
  )

do.call(oa_fetch, c(my_arguments, list(count_only = TRUE)))
##      count db_response_time_ms page per_page
## [1,]    25                 132    1        1
do.call(oa_fetch, my_arguments) %>% 
  show_authors() %>%
  knitr::kable()
| id | display_name | orcid | works_count | cited_by_count | affiliation_display_name | top_concepts |
|---|---|---|---|---|---|---|
| A5072318694 | G. Chiefari | NA | 878 | 46298 | University of Naples Federico II | Quantum mechanics, Particle physics, Nuclear physics |
| A5023058736 | F. Fienga | 0000-0001-5978-4952 | 855 | 16983 | University of Naples Federico II | Quantum mechanics, Nuclear physics, Particle physics |
| A5035636337 | S. Patricelli | NA | 793 | 43469 | University of Naples Federico II | Quantum mechanics, Particle physics, Nuclear physics |
| A5026402548 | Gabriella Fabbrocini | 0000-0002-0064-1874 | 729 | 11375 | University of Naples Federico II | Dermatology, Internal medicine, Pathology |
| A5057084037 | Fabrizio Pane | 0000-0003-2563-4125 | 724 | 19861 | University of Naples Federico II | Internal medicine, Immunology, Genetics |
| A5078562748 | Sabino De Placido | 0000-0001-5077-6286 | 697 | 25286 | University of Naples Federico II | Genetics, Internal medicine, Oncology |
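
The do.call() pattern used above may be easier to follow with a stub in place of the real function. In the base R sketch below, fake_fetch() is a hypothetical stand-in for oa_fetch() that only echoes the arguments it receives, so the example runs without network access.

```r
# Stub standing in for oa_fetch(); it only echoes the arguments it receives.
fake_fetch <- function(entity, ..., count_only = FALSE) {
  list(entity = entity, filters = list(...), count_only = count_only)
}

my_arguments <- list(
  entity = "authors",
  last_known_institution.id = "I71267560",
  works_count = ">499"
)

# c() appends count_only to the argument list, so this is equivalent to
# fake_fetch(entity = "authors", ..., count_only = TRUE).
counted <- do.call(fake_fetch, c(my_arguments, list(count_only = TRUE)))

# Without the extra element, count_only falls back to its default (FALSE).
full <- do.call(fake_fetch, my_arguments)

str(counted$count_only)
str(full$count_only)
str(full$filters)
```

Storing the arguments in a list keeps the counting call and the downloading call in sync: the filters are written once and only count_only differs between the two.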

You can also use other filters such as display_name, has_orcid, and orcid:

oa_fetch(
  entity = "authors",
  display_name.search = "Massimo Aria",
  has_orcid = "true"
) %>%
  show_authors() %>%
  knitr::kable()
| id | display_name | orcid | works_count | cited_by_count | affiliation_display_name | top_concepts |
|---|---|---|---|---|---|---|
| A5069892096 | Massimo Aria | 0000-0002-8517-9411 | 183 | 7375 | University of Naples Federico II | Statistics, Internal medicine, Pathology |

oa_fetch(
  entity = "authors",
  orcid = "0000-0002-8517-9411"
) %>%
  show_authors() %>%
  knitr::kable()
| id | display_name | orcid | works_count | cited_by_count | affiliation_display_name | top_concepts |
|---|---|---|---|---|---|---|
| A5069892096 | Massimo Aria | 0000-0002-8517-9411 | 183 | 7375 | University of Naples Federico II | Statistics, Internal medicine, Pathology |

Institutions

Example: We want to download all records regarding Italian institutions (country_code:it) that are classified as educational (type:education). Again, we check how many records match the query, then download the collection:

italian_insts <- list(
  entity = "institutions",
  country_code = "it",
  type = "education",
  verbose = TRUE
)

do.call(oa_fetch, c(italian_insts, list(count_only = TRUE)))
## Requesting url: https://api.openalex.org/institutions?filter=country_code%3Ait%2Ctype%3Aeducation
##      count db_response_time_ms page per_page
## [1,]   232                  27    1        1
dplyr::glimpse(do.call(oa_fetch, italian_insts))
## Requesting url: https://api.openalex.org/institutions?filter=country_code%3Ait%2Ctype%3Aeducation
## Getting 2 pages of results with a total of 232 records...
## Rows: 232
## Columns: 21
## $ id                         <chr> "https://openalex.org/I861853513", "https:/…
## $ display_name               <chr> "Sapienza University of Rome", "University …
## $ display_name_alternatives  <list> <"Universitat de Roma La Sapienza", "Rimsk…
## $ display_name_acronyms      <list> NA, "UNIMI", "UNIBO", "UNIPD", NA, NA, NA,…
## $ display_name_international <list> <"Universiteit van Rome", "جامعة روما سابي…
## $ ror                        <chr> "https://ror.org/02be6w209", "https://ror.o…
## $ ids                        <list> <"https://openalex.org/I861853513", "https…
## $ country_code               <chr> "IT", "IT", "IT", "IT", "IT", "IT", "IT", "…
## $ geo                        <list> [<data.frame[1 x 7]>], [<data.frame[1 x 7]…
## $ type                       <chr> "education", "education", "education", "edu…
## $ homepage_url               <chr> "http://www.uniroma1.it/", "http://www.unim…
## $ image_url                  <chr> "https://commons.wikimedia.org/w/index.php?…
## $ image_thumbnail_url        <chr> "https://commons.wikimedia.org/w/index.php?…
## $ associated_institutions    <list> [<data.frame[4 x 6]>], [<data.frame[2 x 6]…
## $ works_count                <int> 190517, 173483, 167505, 163655, 118246, 116…
## $ cited_by_count             <int> 3908596, 4324338, 3693077, 4023890, 2458548…
## $ counts_by_year             <list> [<data.frame[13 x 3]>], [<data.frame[13 x …
## $ works_api_url              <chr> "https://api.openalex.org/works?filter=inst…
## $ x_concepts                 <list> [<data.frame[13 x 5]>], [<data.frame[14 x …
## $ updated_date               <chr> "2024-04-19T08:00:13.084478", "2024-04-19T2…
## $ created_date               <chr> "2016-06-24", "2016-06-24", "2016-06-24", "…

Concepts (think theme, keywords)

Example: We want to download the records of all the concepts that are associated with more than one million works:

popular_concepts <- list(
  entity = "concepts",
  works_count = ">1000000",
  verbose = TRUE
)

do.call(oa_fetch, c(popular_concepts, list(count_only = TRUE)))
## Requesting url: https://api.openalex.org/concepts?filter=works_count%3A%3E1000000
##      count db_response_time_ms page per_page
## [1,]   263                  33    1        1
dplyr::glimpse(do.call(oa_fetch, popular_concepts))
## Requesting url: https://api.openalex.org/concepts?filter=works_count%3A%3E1000000
## Getting 2 pages of results with a total of 263 records...
## Rows: 263
## Columns: 16
## $ id                         <chr> "https://openalex.org/C41008148", "https://…
## $ display_name               <chr> "Computer science", "Medicine", "Biology", …
## $ display_name_international <list> <"informatika", "የኮምፒውተር፡ጥናት", "Informatic…
## $ description                <chr> "study of computation", "field of study for…
## $ description_international  <list> <"studie van berekening en inligtingverwer…
## $ wikidata                   <chr> "https://www.wikidata.org/wiki/Q21198", "ht…
## $ level                      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1…
## $ ids                        <list> <"https://openalex.org/C41008148", "https:…
## $ image_url                  <chr> "https://upload.wikimedia.org/wikipedia/com…
## $ image_thumbnail_url        <chr> "https://upload.wikimedia.org/wikipedia/com…
## $ ancestors                  <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, [<…
## $ related_concepts           <list> [<data.frame[93 x 5]>], [<data.frame[51 x …
## $ works_count                <int> 88004985, 61707722, 45777177, 42230377, 368…
## $ cited_by_count             <int> 504575855, 664080278, 734137838, 473703760,…
## $ counts_by_year             <list> [<data.frame[13 x 3]>], [<data.frame[13 x …
## $ works_api_url              <chr> "https://api.openalex.org/works?filter=conc…

Other examples

Get all works citing a particular work

We can download all publications citing another publication by using the filter attribute cites.

For example, to download all publications citing the article by Aria and Cuccurullo (2017), we just need to set the filter argument cites = "W2755950973", where “W2755950973” is the OpenAlex ID of that article.

aria_count <- oa_fetch(
  entity = "works",
  cites = "W2755950973",
  count_only = TRUE,
  verbose = TRUE
) 
## Requesting url: https://api.openalex.org/works?filter=cites%3AW2755950973
aria_count
##      count db_response_time_ms page per_page
## [1,]  4808                  74    1        1

This query returns a collection of 4808 publications. Among these articles, let’s download the ones published in the year after the original article (2018):

oa_fetch(
  entity = "works",
  cites = "W2755950973",
  publication_year = 2018,
  count_only = FALSE,
  verbose = TRUE
) %>% 
  dplyr::glimpse()
## Requesting url: https://api.openalex.org/works?filter=cites%3AW2755950973%2Cpublication_year%3A2018
## Getting 1 page of results with a total of 23 records...
## Rows: 23
## Columns: 38
## $ id                          <chr> "https://openalex.org/W2906775602", "https…
## $ title                       <chr> "Revisiting five decades of educational te…
## $ display_name                <chr> "Revisiting five decades of educational te…
## $ author                      <list> [<data.frame[3 x 12]>], [<data.frame[2 x …
## $ ab                          <chr> "Abstract Reflecting on 50 years of educat…
## $ publication_date            <chr> "2018-12-26", "2018-11-29", "2018-11-05", …
## $ so                          <chr> "British journal of educational technology…
## $ so_id                       <chr> "https://openalex.org/S110346167", "https:…
## $ host_organization           <chr> "Wiley-Blackwell", "Public Library of Scie…
## $ issn_l                      <chr> "0007-1013", "1932-6203", "0300-3930", "24…
## $ url                         <chr> "https://doi.org/10.1111/bjet.12730", "htt…
## $ pdf_url                     <chr> "https://onlinelibrary.wiley.com/doi/pdfdi…
## $ license                     <chr> NA, "cc-by", NA, "publisher-specific-oa", …
## $ version                     <chr> "publishedVersion", "publishedVersion", NA…
## $ first_page                  <chr> "12", "e0207655", "308", "162", "e0199706"…
## $ last_page                   <chr> "63", "e0207655", "327", "176", "e0199706"…
## $ volume                      <chr> "50", "13", "45", "329", "13", "101", NA, …
## $ issue                       <chr> "1", "11", "3", "4-5", "6", "12", NA, "3",…
## $ is_oa                       <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE…
## $ is_oa_anywhere              <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE…
## $ oa_status                   <chr> "bronze", "gold", "closed", "bronze", "gol…
## $ oa_url                      <chr> "https://onlinelibrary.wiley.com/doi/pdfdi…
## $ any_repository_has_fulltext <lgl> FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FA…
## $ language                    <chr> "en", "en", "en", "en", "en", "en", "en", …
## $ grants                      <list> <"https://openalex.org/F4320321114", "Bun…
## $ cited_by_count              <int> 111, 96, 88, 72, 67, 61, 42, 41, 23, 21, 2…
## $ counts_by_year              <list> [<data.frame[8 x 2]>], [<data.frame[6 x 2…
## $ publication_year            <int> 2018, 2018, 2018, 2018, 2018, 2018, 2018, …
## $ cited_by_api_url            <chr> "https://api.openalex.org/works?filter=cit…
## $ ids                         <list> <"https://openalex.org/W2906775602", "htt…
## $ doi                         <chr> "https://doi.org/10.1111/bjet.12730", "htt…
## $ type                        <chr> "article", "article", "article", "article"…
## $ referenced_works            <list> <"https://openalex.org/W137366450", "http…
## $ related_works               <list> <"https://openalex.org/W1982686870", "htt…
## $ is_paratext                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ is_retracted                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ concepts                    <list> [<data.frame[21 x 5]>], [<data.frame[8 x …
## $ topics                      <list> [<tbl_df[12 x 5]>], [<tbl_df[12 x 5]>], […

Convert an OpenAlex data frame to a bibliometrix object

The bibliometrix R-package (https://www.bibliometrix.org) provides a set of tools for quantitative research in bibliometrics and scientometrics. Today it is one of the most widely used science mapping tools in the world. In a recent survey on bibliometric analysis tools, Moral-Muñoz et al. (2020) wrote: “At this moment, maybe Bibliometrix and its Shiny platform contain the more extensive set of techniques implemented, and together with the easiness of its interface, could be a great software for practitioners”.

The function oa2bibliometrix() converts a bibliographic data frame of works into a bibliometrix object. This object can be used as the input collection for a science mapping workflow.

bib_ls <- list(
  identifier = NULL,
  entity = "works",
  cites = "W2755950973",
  from_publication_date = "2022-01-01",
  to_publication_date = "2022-03-31"
)

do.call(oa_fetch, c(bib_ls, list(count_only = TRUE)))
##      count db_response_time_ms page per_page
## [1,]   322                  27    1        1
do.call(oa_fetch, bib_ls) %>% 
  oa2bibliometrix() %>% 
  dplyr::glimpse()
## Rows: 322
## Columns: 52
## $ AU                          <chr> "WENG MARC LIM;SATISH KUMAR;SANJEEV VERMA;…
## $ RP                          <chr> "FACULTY OF BUSINESS, DESIGN AND ARTS SWIN…
## $ C1                          <chr> "FACULTY OF BUSINESS, DESIGN AND ARTS SWIN…
## $ AU_UN                       <chr> "SWINBURNE UNIVERSITY OF TECHNOLOGY SARAWA…
## $ AU_CO                       <chr> "MALAYSIA;INDIA;INDIA;INDIA", "CHINA;CHINA…
## $ ID                          <chr> "CITATION;FIELD (MATHEMATICS);SERVICE (BUS…
## $ id_url                      <chr> "https://openalex.org/W4220991995", "https…
## $ title                       <chr> "Alexa, what do we know about conversation…
## $ author                      <list> [<data.frame[4 x 12]>], [<data.frame[4 x …
## $ publication_date            <chr> "2022-03-08", "2022-01-01", "2022-02-09", …
## $ so_id                       <chr> "https://openalex.org/S102896891", "https:…
## $ host_organization           <chr> "Wiley-Blackwell", "Elsevier BV", "Taylor …
## $ issn_l                      <chr> "0742-6046", "1364-0321", "0020-7543", "00…
## $ url                         <chr> "https://doi.org/10.1002/mar.21654", "http…
## $ pdf_url                     <chr> NA, NA, NA, NA, "https://link.springer.com…
## $ license                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, "cc-by", N…
## $ version                     <chr> NA, NA, NA, NA, "publishedVersion", "publi…
## $ first_page                  <chr> "1129", "111780", "7527", "133146", "297",…
## $ last_page                   <chr> "1155", "111780", "7550", "133146", "338",…
## $ volume                      <chr> "39", "153", "60", "289", "32", "199", "30…
## $ issue                       <chr> "6", NA, "24", NA, "1", NA, "2", NA, "6", …
## $ is_oa                       <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FA…
## $ is_oa_anywhere              <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FA…
## $ oa_status                   <chr> "closed", "closed", "closed", "closed", "b…
## $ oa_url                      <chr> NA, NA, NA, NA, "https://link.springer.com…
## $ any_repository_has_fulltext <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, F…
## $ language                    <chr> "en", "en", "en", "en", "en", "en", "en", …
## $ grants                      <list> NA, <"https://openalex.org/F4320321001", …
## $ counts_by_year              <list> [<data.frame[3 x 2]>], [<data.frame[3 x 2…
## $ cited_by_api_url            <chr> "https://api.openalex.org/works?filter=cit…
## $ ids                         <list> <"https://openalex.org/W4220991995", "htt…
## $ doi                         <chr> "https://doi.org/10.1002/mar.21654", "http…
## $ referenced_works            <list> <"https://openalex.org/W1656810637", "htt…
## $ related_works               <list> <"https://openalex.org/W4213299913", "htt…
## $ is_paratext                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ is_retracted                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ concepts                    <list> [<data.frame[24 x 5]>], [<data.frame[16 x…
## $ topics                      <list> [<tbl_df[12 x 5]>], [<tbl_df[12 x 5]>], […
## $ id_oa                       <chr> "W4220991995", "W3208801174", "W4210997151…
## $ CR                          <chr> "W1656810637;W1659431435;W1791587663;W1965…
## $ TI                          <chr> "ALEXA, WHAT DO WE KNOW ABOUT CONVERSATION…
## $ AB                          <chr> "ABSTRACT CONVERSATIONAL AGENTS ARE SYSTEM…
## $ SO                          <chr> "PSYCHOLOGY & MARKETING", "RENEWABLE & SUS…
## $ DT                          <chr> "ARTICLE", "ARTICLE", "ARTICLE", "ARTICLE"…
## $ DB                          <chr> "OPENALEX", "OPENALEX", "OPENALEX", "OPENA…
## $ JI                          <chr> "S102896891", "S68497187", "S65690446", "S…
## $ J9                          <chr> "S102896891", "S68497187", "S65690446", "S…
## $ PY                          <int> 2022, 2022, 2022, 2022, 2022, 2022, 2022, …
## $ TC                          <int> 91, 64, 56, 52, 49, 47, 47, 44, 42, 40, 39…
## $ DI                          <chr> "10.1002/mar.21654", "10.1016/j.rser.2021.…
## $ SR_FULL                     <chr> "WENG MARC LIM, 2022, PSYCHOLOGY & MARKETI…
## $ SR                          <chr> "WENG MARC LIM, 2022, PSYCHOLOGY & MARKETI…

About OpenAlex

OpenAlex is a fully open catalog of the global research system. It’s named after the ancient Library of Alexandria. The OpenAlex dataset describes scholarly entities and how those entities are connected to each other. Entity types include:

  • Works are papers, books, datasets, etc; they cite other works

  • Authors are people who create works

  • Institutions are universities and other orgs that are affiliated with works (via authors)

  • Concepts tag Works with a topic

Acknowledgements

Package hex was made with Midjourney and thus inherits a CC BY-NC 4.0 license.