
While oa_fetch() offers a convenient and flexible way of retrieving results from queries to the OpenAlex API, you may want to set some of its arguments to optimize your API calls for particular use cases.

This vignette shows how to perform an efficient literature search and compares it with a similar search in PubMed using the rentrez package.

Motivating example

Suppose you’re interested in finding publications that explore the links between the BRAF gene and melanoma.

With the rentrez package, we can use the entrez_search() function to retrieve up to 10 records matching the search query from the PubMed database.

library(rentrez)

braf_pubmed <- entrez_search(db = "pubmed", term = "BRAF and melanoma", retmax = 10)
braf_pubmed
#> Entrez search result with 8270 hits (object contains 10 IDs and no web_history object)
#>  Search term (as translated):  "BRAF"[All Fields] AND ("melanoma"[MeSH Terms] OR  ...
braf_pubmed$ids |> 
  entrez_summary(db = "pubmed") |> 
  extract_from_esummary("title") |> 
  tibble::enframe("id", "title")
#> # A tibble: 10 × 2
#>    id       title                                                               
#>    <chr>    <chr>                                                               
#>  1 39553047 Metachronous Cutaneous Melanoma Metastases of the Left Anterior Par…
#>  2 39551678 Role of neoadjuvant pembrolizumab in advanced melanoma.             
#>  3 39542696 Precision Oncology in Melanoma: Changing Practices.                 
#>  4 39525757 Malignant metastatic melanoma in brain with unknown primary origin:…
#>  5 39534873 Exploring resistance to immune checkpoint inhibitors and targeted t…
#>  6 39529955 Expert consensus on the diagnosis and treatment of solid tumors wit…
#>  7 39528552 Elucidation of anti-human melanoma and anti-aging mechanisms of com…
#>  8 39519404 Ecto-NOX Disulfide-Thiol Exchanger 2 (ENOX2/tNOX) Is a Potential Pr…
#>  9 39519055 Amelanotic Melanoma-Biochemical and Molecular Induction Pathways.   
#> 10 39517999 Adjuvant Use of Pembrolizumab for Stage III Melanoma in a Real-Worl…

On the other hand, with openalexR, we can use the search argument of oa_fetch():

library(openalexR)
library(dplyr)  # for select()

braf_oa <- oa_fetch(
  search = "BRAF AND melanoma",
  pages = 1,
  per_page = 10,
  verbose = TRUE
)
#> Requesting url: https://api.openalex.org/works?search=BRAF%20AND%20melanoma
#> Using basic paging...
#> Getting 1 page of results with a total of 10 records...
#> Warning: Note: `oa_fetch` and `oa2df` now return new names for some columns in openalexR v2.0.0.
#>     See NEWS.md for the list of changes.
#>     Call `get_coverage()` to view the all updated columns and their original names in OpenAlex.
#> This warning is displayed once every 8 hours.
braf_oa |> 
  show_works(simp_func = identity) |> 
  select(1:2)
#> # A tibble: 10 × 2
#>    id          display_name                                                     
#>    <chr>       <chr>                                                            
#>  1 W2128542677 Improved Survival with Vemurafenib in Melanoma with BRAF V600E M…
#>  2 W2106543129 Inhibition of Mutated, Activated BRAF in Metastatic Melanoma     
#>  3 W2163188200 Mutations of the BRAF gene in human cancer                       
#>  4 W2168143310 Survival in BRAF V600–Mutant Advanced Melanoma Treated with Vemu…
#>  5 W2136474966 Improved Survival with MEK Inhibition in BRAF-Mutated Melanoma   
#>  6 W2121545342 Combined Vemurafenib and Cobimetinib in <i>BRAF</i>-Mutated Mela…
#>  7 W2096387850 Combined BRAF and MEK Inhibition versus BRAF Inhibition Alone in…
#>  8 W1819015028 BRAF and RAS mutations in human lung cancer and melanoma.        
#>  9 W2128035403 Nivolumab in Previously Untreated Melanoma without<i>BRAF</i>Mut…
#> 10 W1971947883 Clinical efficacy of a RAF inhibitor needs broad target blockade…

This call performs a search using the OpenAlex API, retrieving the 10 most relevant results for the query “BRAF AND melanoma”.

By default, oa_fetch() returns all records matching a search. For example, querying "BRAF AND melanoma" in OpenAlex may return over 54,000 records. Fetching all of them would be unnecessarily slow, especially since we are often interested only in the top 10 or so results (ranked by relevance or citation count; more on sorting below).

We can limit the number of results with the arguments per_page (number of records to return per page, between 1 and 200, default 200) and pages (range of pages to return, e.g., 1:3 for the first 3 pages, default NULL to return all pages). For example, to get the top 250 records, you can set either

  • per_page = 50, pages = 1:5 to get exactly 250 records; or
  • per_page = 200, pages = 1:2 to get 400 records, then keep only the first 250 rows of the resulting data frame.
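The page arithmetic behind these two options can be sketched in a few lines of R. `pages_needed()` is a hypothetical helper written for this illustration (it is not part of openalexR), and `fetched` is a stand-in data frame rather than real oa_fetch() output.

```r
# Hypothetical helper (not part of openalexR): how many pages are needed
# to cover the top `n` records at a given page size?
pages_needed <- function(n, per_page = 200) {
  stopifnot(per_page >= 1, per_page <= 200)  # page-size range described above
  ceiling(n / per_page)
}

pages_needed(250, per_page = 50)   # 5 pages, i.e. exactly 250 records
pages_needed(250, per_page = 200)  # 2 pages, i.e. 400 records fetched

# With per_page = 200 and pages = 1:2, trim the extra rows afterwards.
# `fetched` is a placeholder for the data frame returned by oa_fetch().
fetched <- data.frame(rank = seq_len(400))
top_250 <- head(fetched, 250)
nrow(top_250)
```

The result can be passed on as `pages = seq_len(pages_needed(250, per_page = 50))`, which is the same as `pages = 1:5`.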

Sorting results

By default, the results from oa_fetch() are sorted by relevance_score, a measure of how closely each result matches the query.1 If you want a different ordering, such as sorting by citation count, you can specify sort in the options argument.

Here are the commonly used sorting options (append :desc to any of them to sort in descending order):

  • relevance_score: The default when a search is specified; ranks results by how closely they match the query.
  • cited_by_count: Sorts results based on the number of times the work has been cited.
  • publication_date: Sorts by publication date.
results <- openalexR::oa_fetch(
  search = "BRAF AND melanoma", 
  pages = 1,
  per_page = 10,
  options = list(sort = "cited_by_count:desc"),
  verbose = TRUE
)
#> Requesting url: https://api.openalex.org/works?search=BRAF%20AND%20melanoma&sort=cited_by_count%3Adesc
#> Using basic paging...
#> Getting 1 page of results with a total of 10 records...
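If you switch between sort fields often, a small wrapper can build the options list for you. `sort_option()` below is a hypothetical convenience function written for this illustration and is not part of openalexR. The last line uses base R's URLencode() to show why the colon in "cited_by_count:desc" appears as %3A in the verbose request URL above.

```r
# Hypothetical helper (not part of openalexR): build the `options` list
# passed to oa_fetch() for sorting. Appending ":desc" gives descending order.
sort_option <- function(field = c("relevance_score", "cited_by_count", "publication_date"),
                        desc = TRUE) {
  field <- match.arg(field)
  list(sort = if (desc) paste0(field, ":desc") else field)
}

sort_option("publication_date")$sort  # "publication_date:desc"

# The colon is percent-encoded in the request URL, matching the
# "cited_by_count%3Adesc" seen in the verbose log.
utils::URLencode(sort_option("cited_by_count")$sort, reserved = TRUE)
```

It would then be used as `options = sort_option("publication_date")` in an oa_fetch() call, in place of writing the list by hand.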

Conclusion

The openalexR package provides a powerful and flexible interface for conducting academic literature searches using the OpenAlex API. By controlling the number of results and the sorting order, you can tailor your search to retrieve the most relevant or impactful publications. In cases where large datasets are involved, it’s useful to limit the number of results returned to ensure efficient and timely searches.

We encourage users to explore further options provided by openalexR to refine their search and retrieve the specific data they need for their research projects: