Building complex search strategies

Building your search with Boolean operators

First load the medrxivr package:

library(medrxivr)

To find records that contain any of many terms, pass the terms as a vector to the mx_search() function, as in the code chunk below. Query terms can include regular expression syntax - see the section at the end of this document on common regular expression that may be useful when searching.

myquery <- c("dementia","vascular","alzheimer's") # Combined with Boolean OR

mx_results <- mx_search(data = mx_snapshot(),     # Use daily snapshot for data
                        query = myquery)
#> Found 5785 record(s) matching your search.

To find records relevant to more than one topic domain, create a vector for each topic (note: there is no upper limit on the number of topics your can have) and combine these vectors into a list which is then passed to the mx_search() function:

topic1  <- c("dementia","vascular","alzheimer's")  # Combined with Boolean OR
topic2  <- c("lipids","statins","cholesterol")     # Combined with Boolean OR
myquery <- list(topic1, topic2)                    # Combined with Boolean AND

mx_results <- mx_search(data = mx_snapshot(),
                        query = myquery)
#> Found 371 record(s) matching your search.

Additional filters and options

Limit search by field

By default, a range of fields (title, abstract, first author, subject, link (which contains DOI)) are searched, but you can limit the search to a subset of these using the fields argument:


# Limit search to title/abstract
mx_results <- mx_search(data = mx_snapshot(),
                        query = "dementia",
                        fields = c("title","abstract"))
#> Found 1045 record(s) matching your search.

# Search by DOI
mx_results <- mx_search(data = mx_snapshot(),
                        query = "10.1101/2020.01.30.20019836",
                        fields = "link")
#> Found 1 record(s) matching your search.

Exclude records containing certain terms

Often it is useful to be able to exclude records that contain a certain term that is not relevant to your search. For example, in the search below, we are looking for records related to “dementia” alone by excluding those that mention “mild cognitive impairment”:

mx_results <- mx_search(data = mx_snapshot(),
                        query = "dementia",
                        NOT = "[Mm]ild cognitive impairment")
#> Found 896 record(s) matching your search.

Limit by date posted

You can define either/both of the earliest and latest date you wish to include records from. Note: the search is inclusive of both dates specified:

mx_results <- mx_search(data = mx_snapshot(),
                        query = "dementia",
                        from_date = "2020-01-01",      # 1st Jan 2020
                        to_date = "2020-01-08")        # 8th Jan 2020
#> Found 2 record(s) matching your search.

Return multiple versions of a record

medRxiv allows authors to upload a new version of their preprint as often as they like. By default, medrxivr only returns the most recent version of the preprint, but if you are interested in exploring how a record changed over time, you can retrieve all versions of the preprint by setting deduplicate = FALSE

mx_results <- mx_search(data = mx_snapshot(),
                        query = "10.1101/2020.01.30.20019836",
                        fields = "link",
                        deduplicate = FALSE)
#> Found 4 record(s) matching your search.
#> Note, there may be >1 version of the same record.

Useful syntax for the systematic reviewer

Capitalisation

Example regex: [Dd]ementia
Description: The search is case sensitive, so this syntax allows you to find both Dementia and dementia using a single term, rather than having to enter them separately. However, setting the autocaps argument of mx_search() to TRUE will automatically search for both capitalised and uncapitalised versions of your search terms (e.g. with auto_caps = TRUE you just need to search for “dementia” to find both Dementia and dementia - behind the scenes, “dementia” is converted to “[Dd]ementia”.

Wildcard

Example regex: randomi*ation
Description: The wildcard operator “*” defines any single alphanumeric character - in this case, the term will find both randomisation and randomization.

NEAR

Example regex: systematic NEAR4 review
Description: The “NEAR4” operator defines that up to 4 words can be between systematic and review and the search will still find it. To change how far apart the terms are allowed to be, simply change the number following NEAR (e.g. to find terms that are only one word apart, the syntax would be systematic NEAR1 review). Please note that the search is directional, in that the example term here will find “systematic methods for the review”, but will not find “the review was systematic”.

Word limits

Example regex: \\bNCOV\\b
Description: Sometimes it is useful to be able to define the start and end of terms. For example, if you were searching for NCOV-19, simply using ncov as your search term would also return records containing uncovered. Using \\b allows you to define where the term beings and ends, thus excluding false positive matches.

Example using these regexes

To find records that contain “Mendelian” within 4 words of “randomisation” (with varying capitalisation of “Mendelian” and UK/US spellings of “randomisation”), the following syntax is correct:

mx_results <- mx_search(data = mx_snapshot(),
                        query = "mendelian NEAR4 randomi*ation", 
                        auto_caps = TRUE)
#> Found 967 record(s) matching your search.

Regex tester

To check whether your search term will find what you expect it to, there is a useful regex tester, designed by Adam Spannbauer.

Yaoxiang Li

2025-07-28