Europe PMC is a repository of life science literature. Europe PMC ingests all PubMed content and extends its index with other literature and patent sources.
For more background on Europe PMC, see:
Levchenko, M., Gou, Y., Graef, F., Hamelers, A., Huang, Z., Ide-Smith, M., … McEntyre, J. (2017). Europe PMC in 2017. Nucleic Acids Research, 46(D1), D1254–D1260. https://doi.org/10.1093/nar/gkx1005
This client supports the Europe PMC search syntax. If you are unfamiliar with searching Europe PMC, try the Europe PMC query builder, a tool that helps you construct queries. To use a Europe PMC query in R, copy and paste the search string into the search functions of this package.
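As a minimal sketch of this copy-and-paste workflow, the snippet below assembles a search string from separate clauses before handing it to a search function. The helper combine_clauses() is hypothetical (it is not part of europepmc), and PUB_YEAR and OPEN_ACCESS are field names assumed from the Europe PMC search syntax. A string copied from the query builder can be passed in exactly the same way.

```r
# Hypothetical helper (not part of europepmc): joins query clauses with AND.
# Each clause is wrapped in parentheses so that operators inside one clause
# cannot change how the clauses combine.
combine_clauses <- function(...) {
  clauses <- c(...)
  stopifnot(is.character(clauses), length(clauses) > 0L, all(nzchar(clauses)))
  paste0("(", clauses, ")", collapse = " AND ")
}

# Field names PUB_YEAR and OPEN_ACCESS are assumed from the search syntax.
query <- combine_clauses("malaria", "PUB_YEAR:2019", "OPEN_ACCESS:y")
print(query)

# Live call; skipped when the package or a network connection is unavailable.
if (requireNamespace("europepmc", quietly = TRUE) &&
    requireNamespace("curl", quietly = TRUE) && curl::has_internet()) {
  print(europepmc::epmc_search(query, limit = 5))
}
```

Keeping the clauses separate makes it easier to reuse one part of a query (for example the year filter) across several searches.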
The following examples demonstrate how to search Europe PMC with R.
epmc_search() is the main function to query Europe PMC. It searches both metadata and full texts.
library(europepmc)
europepmc::epmc_search('malaria')
#> # A tibble: 100 x 29
#> id source pmid doi title authorString journalTitle journalVolume
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 3279… MED 3279… 10.1… Stra… Lu XM, Le R… Methods Mol… 2170
#> 2 3204… MED 3204… 10.1… Bala… Drewry LL, … Virulence 11
#> 3 3204… MED 3204… 10.1… Mode… Olaniyi S, … J Biol Dyn 14
#> 4 3206… MED 3206… 10.1… Pred… Patel H, Du… Virulence 11
#> 5 3190… MED 3190… 10.1… Sett… Bucşan AN, … Virulence 11
#> 6 3240… MED 3240… 10.1… B ce… Hahn WO, Pe… Virulence 11
#> 7 3286… MED 3286… 10.1… Misd… Iriart X, M… Emerg Micro… 9
#> 8 3246… MED 3246… 10.1… Back… Xing Y, Guo… J Biol Dyn 14
#> 9 3282… MED 3282… 10.1… Modi… Eleftheriou… Methods Mol… 2198
#> 10 3271… MED 3271… 10.1… Mamm… Sutherland … Virulence 11
#> # … with 90 more rows, and 21 more variables: pubYear <chr>, journalIssn <chr>,
#> #   pageInfo <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, hasSuppl <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>, issue <chr>, versionNumber <int>
Note that Europe PMC expands queries with MeSH synonyms by default. This behavior can be turned off with the synonym parameter.
europepmc::epmc_search('malaria', synonym = FALSE)
#> # A tibble: 100 x 29
#> id source pmid pmcid doi title authorString journalTitle issue
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 3204… MED 3204… PMC7… 10.1… Bala… Drewry LL, … Virulence 1
#> 2 3279… MED 3279… <NA> 10.1… Stra… Lu XM, Le R… Methods Mol… <NA>
#> 3 3204… MED 3204… <NA> 10.1… Mode… Olaniyi S, … J Biol Dyn 1
#> 4 3240… MED 3240… <NA> 10.1… B ce… Hahn WO, Pe… Virulence 1
#> 5 3286… MED 3286… <NA> 10.1… Misd… Iriart X, M… Emerg Micro… 1
#> 6 3246… MED 3246… <NA> 10.1… Back… Xing Y, Guo… J Biol Dyn 1
#> 7 3206… MED 3206… PMC7… 10.1… Pred… Patel H, Du… Virulence 1
#> 8 3190… MED 3190… PMC6… 10.1… Sett… Bucşan AN, … Virulence 1
#> 9 3294… MED 3294… <NA> 10.1… Dele… Saito M, Br… Lancet Chil… 10
#> 10 3271… MED 3271… <NA> 10.1… Mamm… Sutherland … Virulence 1
#> # … with 90 more rows, and 20 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, versionNumber <int>
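To see what synonym expansion changes in practice, one option is to compare the record IDs returned with and without it. The sketch below assumes only that both results carry an id column, as in the output above. The helper ids_added_by_synonyms() is hypothetical and the toy data frames are made up for illustration.

```r
# Hypothetical helper: IDs that appear in `with_syn` but not in `without_syn`.
# Both arguments are expected to have an `id` column.
ids_added_by_synonyms <- function(with_syn, without_syn) {
  setdiff(with_syn$id, without_syn$id)
}

# Toy data frames standing in for two search results (not real records).
toy_with    <- data.frame(id = c("A1", "B2", "C3"), stringsAsFactors = FALSE)
toy_without <- data.frame(id = c("A1", "C3"), stringsAsFactors = FALSE)
print(ids_added_by_synonyms(toy_with, toy_without))

# Live call; skipped when the package or a network connection is unavailable.
if (requireNamespace("europepmc", quietly = TRUE) &&
    requireNamespace("curl", quietly = TRUE) && curl::has_internet()) {
  with_syn    <- europepmc::epmc_search("malaria", limit = 25)
  without_syn <- europepmc::epmc_search("malaria", synonym = FALSE, limit = 25)
  print(ids_added_by_synonyms(with_syn, without_syn))
}
```

Because both searches are truncated by limit, a difference between the two ID sets can reflect a change in ranking as well as records matched only through synonyms.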
To get an exact match, use quotes as in the following example:
europepmc::epmc_search('"Human malaria parasites"')
#> # A tibble: 100 x 28
#> id source pmid doi title authorString journalTitle pubYear journalIssn
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 3247… MED 3247… 10.1… C-te… Kimata-Arig… J Biochem 2020 "0021-924x…
#> 2 PPR1… PPR <NA> 10.1… A dr… Cobb DW, Ku… <NA> 2020 <NA>
#> 3 3298… MED 3298… 10.1… Moni… Kattenberg … Mol Ecol 2020 "0962-1083…
#> 4 PPR2… PPR <NA> 10.1… A ma… Marcenac P,… <NA> 2020 <NA>
#> 5 3192… MED 3192… 10.1… Falc… Rosenthal P… Biochim Bio… 2020 "1570-9639…
#> 6 3249… MED 3249… 10.1… Deve… Lantero E, … J Biomed Na… 2020 "1550-7033…
#> 7 PPR9… PPR <NA> 10.1… Mala… Kwon H, Rey… <NA> 2019 <NA>
#> 8 PPR9… PPR <NA> 10.1… Disr… Subudhi AK,… <NA> 2019 <NA>
#> 9 PPR1… PPR <NA> 10.2… A ro… Jivapetthai… <NA> 2019 <NA>
#> 10 PPR6… PPR <NA> 10.1… Gene… McLean KJ, … <NA> 2018 <NA>
#> # … with 90 more rows, and 19 more variables: pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, issue <chr>, journalVolume <chr>,
#> #   pageInfo <chr>, pmcid <chr>
By default, 100 records are returned, but the number of results can be expanded or limited with the limit parameter.
europepmc::epmc_search('"Human malaria parasites"', limit = 10)
#> # A tibble: 10 x 27
#> id source pmid doi title authorString journalTitle pubYear journalIssn
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 3247… MED 3247… 10.1… C-te… Kimata-Arig… J Biochem 2020 "0021-924x…
#> 2 PPR1… PPR <NA> 10.1… A dr… Cobb DW, Ku… <NA> 2020 <NA>
#> 3 3298… MED 3298… 10.1… Moni… Kattenberg … Mol Ecol 2020 "0962-1083…
#> 4 PPR2… PPR <NA> 10.1… A ma… Marcenac P,… <NA> 2020 <NA>
#> 5 3192… MED 3192… 10.1… Falc… Rosenthal P… Biochim Bio… 2020 "1570-9639…
#> 6 3249… MED 3249… 10.1… Deve… Lantero E, … J Biomed Na… 2020 "1550-7033…
#> 7 PPR9… PPR <NA> 10.1… Mala… Kwon H, Rey… <NA> 2019 <NA>
#> 8 PPR9… PPR <NA> 10.1… Disr… Subudhi AK,… <NA> 2019 <NA>
#> 9 PPR1… PPR <NA> 10.2… A ro… Jivapetthai… <NA> 2019 <NA>
#> 10 PPR6… PPR <NA> 10.1… Gene… McLean KJ, … <NA> 2018 <NA>
#> # … with 18 more variables: pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, hasSuppl <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, issue <chr>, journalVolume <chr>,
#> #   pageInfo <chr>
Results are sorted by relevance by default. The sort parameter offers two alternatives:

- sort = 'cited' sorts by citation count, starting with the most cited publication
- sort = 'date' sorts by publication date, starting with the most recent publication

Sometimes you want to check whether articles are indexed in Europe PMC using DOI names, a widely used identifier for scholarly articles. Use epmc_search_by_doi() for this purpose.
my_dois <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077",
  "10.1007/s12017-017-8447-9"
)
europepmc::epmc_search_by_doi(doi = my_dois)
#> # A tibble: 4 x 28
#> id source pmid doi title authorString journalTitle issue journalVolume
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2895… MED 2895… 10.1… Clin… Schnieder M… Eur Neurol 5-6 78
#> 2 2894… MED 2894… 10.1… Conc… Doeppner TR… Stem Cells … 11 6
#> 3 2901… MED 2901… 10.1… One-… Psychogios … Stroke 11 48
#> 4 2862… MED 2862… 10.1… Defe… Carboni E, … Neuromolecu… 2-3 19
#> # … with 19 more variables: pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> #   pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>
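When the goal is to check which DOIs are not indexed, the requested DOIs can be compared with the doi column of the result. The following is a minimal sketch. The helper dois_not_found() is hypothetical, and the toy DOIs in the offline demonstration are made up and do not refer to real articles.

```r
# Hypothetical helper: DOIs that were requested but are absent from the result.
# DOI names are case-insensitive, so both sides are lower-cased before comparing.
dois_not_found <- function(requested, found) {
  requested[!tolower(requested) %in% tolower(found)]
}

# Offline demonstration with made-up DOIs (not real articles).
toy_requested <- c("10.1000/ABC", "10.1000/xyz")
toy_found     <- c("10.1000/abc", NA)
print(dois_not_found(toy_requested, toy_found))

# Live call; skipped when the package or a network connection is unavailable.
if (requireNamespace("europepmc", quietly = TRUE) &&
    requireNamespace("curl", quietly = TRUE) && curl::has_internet()) {
  my_dois <- c(
    "10.1159/000479962",
    "10.1002/sctm.17-0081",
    "10.1161/strokeaha.117.018077",
    "10.1007/s12017-017-8447-9"
  )
  indexed <- europepmc::epmc_search_by_doi(doi = my_dois)
  print(dois_not_found(my_dois, indexed$doi))
}
```

An empty character vector means every requested DOI was matched.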
By default, a non-nested data frame printed as a tibble is returned. Other formats are output = "id_list", which returns a list of IDs and sources, and output = "raw", which returns the full metadata as a list. Please be aware that these lists can become very large.
Europe PMC provides text-mined annotations contained in abstracts and open access full-text articles.
These automatically identified concepts and terms can be retrieved at the article level:
europepmc::epmc_annotations_by_id(c("MED:28585529", "PMC:PMC1664601"))
#> # A tibble: 774 x 13
#> source ext_id pmcid prefix exact postfix name uri id type section
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 MED 28585… PMC5… "tive… Beta… " allo… Beta… http… http… Orga… Title …
#> 2 MED 28585… PMC5… "at, … suga… " (Bet… suga… http… http… Orga… Abstra…
#> 3 MED 28585… PMC5… "d a … beet ". " beet http… http… Orga… Abstra…
#> 4 MED 28585… PMC5… "lati… beets " (B. … beets http… http… Orga… Abstra…
#> 5 MED 28585… PMC5… "of <… B. v… " ssp.… B. v… http… http… Orga… Abstra…
#> 6 MED 28585… PMC5… " bee… ssp ". mar… ssp http… http… Gene… Abstra…
#> 7 MED 28585… PMC5… "ify … Beta… " ssp.… Beta… http… http… Orga… Abstra…
#> 8 MED 28585… PMC5… "beet… ssp ". vul… ssp http… http… Gene… Abstra…
#> 9 MED 28585… PMC5… "ed v… MBS "). " MBS http… http… Gene… Abstra…
#> 10 MED 28585… PMC5… "2 wa… MBS " and … MBS http… http… Gene… Abstra…
#> # … with 764 more rows, and 2 more variables: provider <chr>, subType <chr>
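With several hundred annotations per request, a tally by annotation type gives a quick overview. The sketch below relies only on the type column shown in the output above. The helper count_annotation_types() is hypothetical and the toy type labels are placeholders, not real Europe PMC annotation types.

```r
# Hypothetical helper: number of annotations per type, most frequent first.
count_annotation_types <- function(annotations) {
  sort(table(annotations$type), decreasing = TRUE)
}

# Toy data with placeholder labels (not real Europe PMC annotation types).
toy_annotations <- data.frame(
  type = c("type_a", "type_b", "type_a"),
  stringsAsFactors = FALSE
)
toy_counts <- count_annotation_types(toy_annotations)
print(toy_counts)

# Live call; skipped when the package or a network connection is unavailable.
if (requireNamespace("europepmc", quietly = TRUE) &&
    requireNamespace("curl", quietly = TRUE) && curl::has_internet()) {
  annotations <- europepmc::epmc_annotations_by_id(
    c("MED:28585529", "PMC:PMC1664601")
  )
  print(count_annotation_types(annotations))
}
```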
To obtain a list of articles for which Europe PMC has text-mined annotations, either subset the resulting data.frame
tt <- epmc_search("malaria")
tt[tt$hasTextMinedTerms == "Y" | tt$hasTMAccessionNumbers == "Y",]
#> # A tibble: 71 x 29
#> id source pmid doi title authorString journalTitle journalVolume
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 3279… MED 3279… 10.1… Stra… Lu XM, Le R… Methods Mol… 2170
#> 2 3204… MED 3204… 10.1… Bala… Drewry LL, … Virulence 11
#> 3 3204… MED 3204… 10.1… Mode… Olaniyi S, … J Biol Dyn 14
#> 4 3206… MED 3206… 10.1… Pred… Patel H, Du… Virulence 11
#> 5 3190… MED 3190… 10.1… Sett… Bucşan AN, … Virulence 11
#> 6 3240… MED 3240… 10.1… B ce… Hahn WO, Pe… Virulence 11
#> 7 3286… MED 3286… 10.1… Misd… Iriart X, M… Emerg Micro… 9
#> 8 3246… MED 3246… 10.1… Back… Xing Y, Guo… J Biol Dyn 14
#> 9 3282… MED 3282… 10.1… Modi… Eleftheriou… Methods Mol… 2198
#> 10 3294… MED 3294… 10.1… Dele… Saito M, Br… Lancet Chil… 4
#> # … with 61 more rows, and 21 more variables: pubYear <chr>, journalIssn <chr>,
#> #   pageInfo <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, hasSuppl <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>, issue <chr>, versionNumber <int>
or expand the query choosing an annotation type or provider from the Europe PMC Advanced Search query builder.
epmc_search('malaria AND (ANNOTATION_TYPE:"Cell") AND (ANNOTATION_PROVIDER:"Europe PMC")')
#> # A tibble: 100 x 28
#> id source pmid pmcid doi title authorString journalTitle issue
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 3098… MED 3098… PMC7… 10.1… Clin… Enane LA, S… J Pediatric… 3
#> 2 3130… MED 3130… PMC7… 10.1… Blac… Opoka RO, W… Clin Infect… 11
#> 3 3215… MED 3215… PMC7… 10.1… In V… Yao X, Ye F… Clin Infect… 15
#> 4 3169… MED 3169… PMC7… 10.1… Redu… Kingston HW… J Infect Dis 9
#> 5 3150… MED 3150… <NA> 10.1… Acut… Oshomah-Bel… J Trop Pedi… 2
#> 6 3168… MED 3168… <NA> 10.1… Eval… Ferdinand D… Trans R Soc… 3
#> 7 3167… MED 3167… <NA> 10.1… A Sy… Thiengsusuk… Eur J Drug … 2
#> 8 3153… MED 3153… <NA> 10.1… Asso… Peitzmeier … AIDS Behav 3
#> 9 3104… MED 3104… PMC7… 10.1… Elev… Datta D, Co… Clin Infect… 6
#> 10 3085… MED 3085… <NA> 10.1… An E… Woodford J,… J Infect Dis 6
#> # … with 90 more rows, and 19 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>
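One caveat about the subsetting approach: in R, a logical index that contains NA produces rows filled with NA. If either flag column can be missing for some records (an assumption; the output above does not show such a case), wrapping the condition in which() keeps only the rows where it is TRUE. The sketch below uses a hypothetical helper, has_text_mining(), and a made-up toy data frame with the same two flag columns.

```r
# Hypothetical helper: keeps rows flagged "Y" in either text-mining column.
# which() drops rows where the comparison evaluates to NA.
has_text_mining <- function(records) {
  keep <- records$hasTextMinedTerms == "Y" | records$hasTMAccessionNumbers == "Y"
  records[which(keep), , drop = FALSE]
}

# Toy data standing in for a search result (not real records).
toy_records <- data.frame(
  id = c("1", "2", "3"),
  hasTextMinedTerms = c("Y", "N", NA),
  hasTMAccessionNumbers = c("N", "N", "N"),
  stringsAsFactors = FALSE
)

# Plain logical indexing returns an extra all-NA row for record "3" ...
print(nrow(toy_records[toy_records$hasTextMinedTerms == "Y" |
                         toy_records$hasTMAccessionNumbers == "Y", ]))
# ... whereas the which()-based helper returns only record "1".
print(has_text_mining(toy_records))
```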
Another useful feature of Europe PMC is searching for cross-references between Europe PMC and other databases. For instance, to get publications published in 2016 that are cited by entries in the Protein Data Bank in Europe:
europepmc::epmc_search('(HAS_PDB:y) AND FIRST_PDATE:2016')
#> # A tibble: 100 x 28
#> id source pmid pmcid doi title authorString journalTitle issue
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2798… MED 2798… PMC5… 10.1… Shor… Lin J, Pozh… Biochemistry 2
#> 2 2781… MED 2781… PMC5… 10.1… Stru… Wakamatsu T… Appl Enviro… 2
#> 3 2803… MED 2803… PMC5… 10.1… Stru… Waz S, Naka… J Biol Chem 7
#> 4 2803… MED 2803… PMC5… 10.1… Stru… Christensen… PLoS One 12
#> 5 2806… MED 2806… PMC5… 10.1… Stru… Gai Z, Wang… Cell Discov <NA>
#> 6 2802… MED 2802… PMC5… 10.1… Crys… Kuk AC, Mas… Nat Struct … 2
#> 7 2801… MED 2801… PMC5… 10.1… Stru… Levdikov VM… J Biol Chem 7
#> 8 2800… MED 2800… PMC5… 10.1… Stru… Zhao H, Wei… Sci Rep <NA>
#> 9 2800… MED 2800… PMC6… 10.1… Disc… Cheeseman M… J Med Chem 1
#> 10 2786… MED 2786… <NA> 10.1… Stru… Bhatt A, Ma… Chembiochem 2
#> # … with 90 more rows, and 19 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>
The following sources are supported
To retrieve metadata about these external database links, use europepmc::epmc_db().
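A call to epmc_db() might look like the sketch below. The wrapper fetch_db_links() is hypothetical. The argument names ext_id, data_src, db and limit, and the lower-case database name "uniprot", are assumptions about the function's interface that should be checked against its help page. The PMID is an illustrative choice only, and whether it has cross-references in a given database is not guaranteed, so the call is wrapped in tryCatch().

```r
# Hypothetical wrapper around europepmc::epmc_db(); argument names are
# assumed from the package documentation and should be verified with ?epmc_db.
fetch_db_links <- function(pmid, db, limit = 25) {
  stopifnot(is.character(pmid), length(pmid) == 1L, nzchar(pmid))
  stopifnot(is.character(db), length(db) == 1L, nzchar(db))
  europepmc::epmc_db(ext_id = pmid, data_src = "med", db = db, limit = limit)
}

# Live call; skipped when the package or a network connection is unavailable.
if (requireNamespace("europepmc", quietly = TRUE) &&
    requireNamespace("curl", quietly = TRUE) && curl::has_internet()) {
  links <- tryCatch(
    fetch_db_links("12368864", db = "uniprot"),
    error = function(e) {
      # Report the problem (e.g. no links available) instead of stopping.
      message("No links retrieved: ", conditionMessage(e))
      NULL
    }
  )
  print(links)
}
```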
Europe PMC also lets us obtain citation metadata and reference sections. To retrieve citation metadata for an article, use
europepmc::epmc_citations("9338777", limit = 500)
#> # A tibble: 233 x 11
#> id source citationType title authorString journalAbbrevia… pubYear volume
#> <chr> <chr> <chr> <chr> <chr> <chr> <int> <chr>
#> 1 3156… MED research-ar… Regu… Chung HC, N… J Vet Sci 2019 20
#> 2 3023… MED research su… Bioe… Legallais C… Adv Healthc Mat… 2018 7
#> 3 3026… MED research su… Porc… Fiebig U, F… Xenotransplanta… 2018 25
#> 4 2975… MED historical … Infe… Weiss RA. Xenotransplanta… 2018 25
#> 5 2964… MED research su… Trac… Kawasaki J,… Viruses 2018 10
#> 6 2876… MED research su… Pres… Kawasaki J,… J Virol 2017 91
#> 7 2843… MED research su… Thre… Colon-Moran… Virology 2017 507
#> 8 2805… MED research su… Anti… Inoue Y, Yo… Ann Biomed Eng 2017 45
#> 9 2783… MED research-ar… Tran… Kim N, Choi… PLoS One 2016 11
#> 10 2746… MED research su… Exis… Kuse K, Ito… J Virol 2016 90
#> # … with 223 more rows, and 3 more variables: issue <chr>, pageInfo <chr>,
#> #   citedByCount <int>
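Because the citation metadata includes a pubYear column, it can be summarised as citing publications per year. The following is a minimal sketch. The helper citations_per_year() is hypothetical and the toy data frame is made up for the offline demonstration.

```r
# Hypothetical helper: number of citing publications per publication year.
citations_per_year <- function(citations) {
  counts <- table(citations$pubYear)
  data.frame(
    year = as.integer(names(counts)),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Toy data standing in for an epmc_citations() result (not real records).
toy_citations <- data.frame(pubYear = c(2018L, 2018L, 2019L))
toy_summary <- citations_per_year(toy_citations)
print(toy_summary)

# Live call; skipped when the package or a network connection is unavailable.
if (requireNamespace("europepmc", quietly = TRUE) &&
    requireNamespace("curl", quietly = TRUE) && curl::has_internet()) {
  citations <- europepmc::epmc_citations("9338777", limit = 500)
  print(citations_per_year(citations))
}
```

Note that the summary only covers the records actually retrieved, so limit should be at least as large as the number of citing publications.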
To retrieve the reference section of an article:
europepmc::epmc_refs("28632490", limit = 200)
#> # A tibble: 169 x 19
#> id source citationType title authorString journalAbbrevia… issue pubYear
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int>
#> 1 1200… MED JOURNAL ART… Tric… Adolfsson-E… Chemosphere 9-10 2002
#> 2 1879… MED JOURNAL ART… In v… Ahn KC, Zha… Environ. Health… 9 2008
#> 3 1855… MED JOURNAL ART… Effe… Aiello AE, … Am J Public Hea… 8 2008
#> 4 1768… MED JOURNAL ART… Cons… Aiello AE, … Clin. Infect. D… <NA> 2007
#> 5 1527… MED JOURNAL ART… Rela… Aiello AE, … Antimicrob. Age… 8 2004
#> 6 1820… MED JOURNAL ART… The … Allmyr M, H… Sci. Total Envi… 1 2008
#> 7 1700… MED JOURNAL ART… Tric… Allmyr M, A… Sci. Total Envi… 1 2006
#> 8 2694… MED JOURNAL ART… Pres… Alvarez-Riv… J Chromatogr A <NA> 2016
#> 9 2319… MED JOURNAL ART… Expo… Anderson SE… Toxicol. Sci. 1 2012
#> 10 2583… MED JOURNAL ART… Obse… Vladar EK, … Methods Cell Bi… <NA> 2015
#> # … with 159 more rows, and 11 more variables: volume <chr>, pageInfo <chr>,
#> #   citedOrder <int>, match <chr>, essn <chr>, issn <chr>,
#> #   publicationTitle <chr>, publisherLoc <chr>, publisherName <chr>,
#> #   externalLink <chr>, doi <chr>
Europe PMC gives access not only to metadata but also to full texts. Adding AND (OPEN_ACCESS:y) to your search query returns only those articles for which Europe PMC also has the full text.
The full text can be accessed as an XML document via the PMID or the PubMed Central ID (PMCID):
europepmc::epmc_ftxt("PMC3257301")
#> {xml_document}
#> <article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
#> [1] <front>\n <journal-meta>\n <journal-id journal-id-type="nlm-ta">PLoS ...
#> [2] <body>\n <sec id="s1">\n <title>Introduction</title>\n <p>Atmosphe ...
#> [3] <back>\n <ack>\n <p>We would like to thank Dr. C. Gourlay and Dr. T. ...
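The returned object is an xml_document, so it can be queried with the xml2 package. As a minimal sketch, the snippet below extracts section titles with an XPath expression. The helper section_titles() is hypothetical, the XPath assumes the body/sec/title layout visible in the output above, and the toy document is a made-up stand-in for a real article.

```r
library(xml2)

# Hypothetical helper: titles of all sections inside the article body.
# Assumes sections are <sec> elements with a <title> child, as shown above.
section_titles <- function(doc) {
  xml_text(xml_find_all(doc, "//body//sec/title"))
}

# Toy document standing in for a full text (not a real article).
toy_doc <- read_xml(
  "<article><front/><body>
     <sec id='s1'><title>Introduction</title><p>Text.</p></sec>
     <sec id='s2'><title>Results</title><p>More text.</p></sec>
   </body><back/></article>"
)
print(section_titles(toy_doc))

# Live call; skipped when the package or a network connection is unavailable.
if (requireNamespace("europepmc", quietly = TRUE) &&
    requireNamespace("curl", quietly = TRUE) && curl::has_internet()) {
  doc <- europepmc::epmc_ftxt("PMC3257301")
  print(section_titles(doc))
}
```

The same pattern works for other parts of the document, for example paragraphs via "//body//p".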