While oa_fetch() offers a convenient and flexible way of
retrieving results from queries to the OpenAlex API, you may want to
set some of its arguments to optimize your API calls for certain use
cases.
This vignette shows how to perform an efficient literature search and compares it with a similar search in PubMed using the rentrez package.
Motivating example
Suppose you’re interested in finding publications that explore the links between the BRAF gene and melanoma.
With the rentrez package, we can use the
entrez_search() function to retrieve up to 10 records matching
the search query from the PubMed database.
library(rentrez)

braf_pubmed <- entrez_search(db = "pubmed", term = "BRAF and melanoma", retmax = 10)
braf_pubmed
#> Entrez search result with 9100 hits (object contains 10 IDs and no web_history object)
#> Search term (as translated): "BRAF"[All Fields] AND ("melanoma"[MeSH Terms] OR ...
braf_pubmed$ids |>
  entrez_summary(db = "pubmed") |>
  extract_from_esummary("title") |>
  tibble::enframe("id", "title")
#> # A tibble: 10 × 2
#>    id       title
#>    <chr>    <chr>
#>  1 42199973 High-Risk Acute Myeloid Leukemia Developing in a Patient with Recur…
#>  2 42196460 Dynamic Time-Resolved Remodeling of the Immune Microenvironment Aft…
#>  3 42195328 Resveratrol as a Dual MAPK/STAT3 Inhibitor in Glioblastoma: Mutatio…
#>  4 42193039 Regulation and Implementation of Apoptosis in Melanoma Tumor Cells …
#>  5 42189441 The Role of Adjuvant Primary Site Radiotherapy for Cutaneous Melano…
#>  6 42187585 Overlapping Toxicities of Pembrolizumab and Lenvatinib: A Case of C…
#>  7 42183907 Clinicopathologic characterization of primary anal canal mucosal me…
#>  8 42182515 Genomic, Clinical, and Spatial Predictors of Durable Response to BR…
#>  9 42179835 Adjuvant and neoadjuvant treatment for melanoma: integrating immuno…
#> 10 42177185 Germline variants in cancer susceptibility genes among patients wit…

On the other hand, with openalexR, we can use the
search argument of oa_fetch():
library(openalexR)
library(dplyr)

braf_oa <- oa_fetch(
  search = "BRAF AND melanoma",
  pages = 1,
  per_page = 10,
  verbose = TRUE
)
#> Requesting url: <https://api.openalex.org/works?search=BRAF%20AND%20melanoma>
#> Using basic paging...
#> ℹ Getting 1 page of results with a total of 10 records...
braf_oa |>
  show_works(simp_func = identity) |>
  select(1:2)
#> # A tibble: 10 × 2
#>    id          display_name
#>    <chr>       <chr>
#>  1 W2128542677 Improved Survival with Vemurafenib in Melanoma with BRAF V600E M…
#>  2 W2128035403 Nivolumab in Previously Untreated Melanoma without <i>BRAF</i> M…
#>  3 W2106543129 Inhibition of Mutated, Activated BRAF in Metastatic Melanoma
#>  4 W2156078931 Combined BRAF and MEK Inhibition in Melanoma with BRAF V600 Muta…
#>  5 W2136474966 Improved Survival with MEK Inhibition in BRAF-Mutated Melanoma
#>  6 W2096387850 Combined BRAF and MEK Inhibition versus BRAF Inhibition Alone in…
#>  7 W2168143310 Survival in BRAF V600–Mutant Advanced Melanoma Treated with Vemu…
#>  8 W2121545342 Combined Vemurafenib and Cobimetinib in <i>BRAF</i> -Mutated Mel…
#>  9 W2166262263 Dabrafenib in BRAF-mutated metastatic melanoma: a multicentre, o…
#> 10 W1971947883 Clinical efficacy of a RAF inhibitor needs broad target blockade…

This call performs a search using the OpenAlex API, retrieving the 10 most relevant results for the query “BRAF AND melanoma”.
By default, an oa_fetch() call returns all records
associated with a search. For example, querying “BRAF AND melanoma” in
OpenAlex may return over 54,000 records. Fetching all of these records
would be unnecessarily slow, especially when we are often interested
only in the top, say, 10 results (based on citation count or
relevance; more on sorting below).
We can limit the number of results with the arguments
per_page (number of records to return per page, between 1
and 200, default 200) and pages (range of pages to return,
e.g., 1:3 for the first 3 pages, default NULL to
return all pages). For example, if you want the top 250 records, you can
set

- per_page = 50, pages = 1:5 to get exactly 250 records; or
- per_page = 200, pages = 1:2 to get 400 records, then slice the resulting data frame to keep the first 250.
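
The paging arithmetic above can be sketched as a small helper. Note that paging_for() is a hypothetical function written for this illustration, not part of openalexR; it assumes the 200-records-per-page limit described above and picks the fewest pages that cover the first n records.

```r
# Hypothetical helper (not part of openalexR): choose paging arguments
# that cover the first `n` records with as few requests as possible.
paging_for <- function(n, max_per_page = 200) {
  stopifnot(n >= 1, max_per_page >= 1, max_per_page <= 200)
  per_page <- min(n, max_per_page)
  list(per_page = per_page, pages = seq_len(ceiling(n / per_page)))
}

p <- paging_for(250)
str(p)

# The request itself needs network access, so it is left commented out:
# works <- openalexR::oa_fetch(
#   search = "BRAF AND melanoma",
#   per_page = p$per_page,
#   pages = p$pages
# )
# top_250 <- head(works, 250)  # drop the surplus rows from the last page
```

For 250 records this corresponds to the second option in the list (two pages of 200), so the final head() call is what trims the result down to the first 250 rows.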
Sorting results
By default, the results from oa_fetch() are sorted by
relevance_score, a measure of how closely each result
matches the query. To order them differently, for example by
citation count, specify sort in the options argument.
Append :desc to a sort key to sort in descending order.
Here are the commonly used sorting options:

- relevance_score: Default, ranks results based on query match relevance.
- cited_by_count: Sorts results based on the number of times the work has been cited.
- publication_date: Sorts by publication date.

results <- openalexR::oa_fetch(
  search = "BRAF AND melanoma",
  pages = 1,
  per_page = 10,
  options = list(sort = "cited_by_count:desc"),
  verbose = TRUE
)
#> Requesting url:
#> <https://api.openalex.org/works?search=BRAF%20AND%20melanoma&sort=cited_by_count%3Adesc>
#> Using basic paging...
#> ℹ Getting 1 page of results with a total of 10 records...

Conclusion
The openalexR package provides a powerful and flexible
interface for conducting academic literature searches using the OpenAlex
API. By controlling the number of results and the sorting order, you can
tailor your search to retrieve the most relevant or impactful
publications. In cases where large datasets are involved, it’s useful to
limit the number of results returned to ensure efficient and timely
searches.
We encourage users to explore the further options provided by
openalexR to refine their searches and retrieve the specific
data they need for their research projects.
