The rplos package interacts with the API services of PLoS (Public Library of Science) Journals. You used to need an API key to work with this package - that is no longer needed!

This tutorial will go through three use cases to demonstrate the kinds of things possible in rplos.

  • Search across PLoS papers in various sections of papers
  • Search for terms and visualize results as a histogram OR as a plot through time
  • Text mining of scientific literature

Load package from CRAN

Search across PLoS papers in various sections of papers

searchplos is a general search, and in this case searches for the term Helianthus and returns the DOI’s of matching papers

searchplos(q = "Helianthus", fl = "id", limit = 5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1      646     0
#> 
#> $data
#> # A tibble: 5 x 1
#>   id                          
#>   <chr>                       
#> 1 10.1371/journal.pone.0198869
#> 2 10.1371/journal.pone.0213065
#> 3 10.1371/journal.pone.0148280
#> 4 10.1371/journal.pone.0111982
#> 5 10.1371/journal.pone.0212371

Get only full article DOIs

searchplos(q = "*:*", fl = 'id', fq = 'doc_type:full', start = 0, limit = 5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1   292789     0
#> 
#> $data
#> # A tibble: 5 x 1
#>   id                          
#>   <chr>                       
#> 1 10.1371/journal.pcbi.0020071
#> 2 10.1371/journal.pbio.1000152
#> 3 10.1371/journal.pbio.1000153
#> 4 10.1371/journal.pbio.1000159
#> 5 10.1371/journal.pbio.1000165

Get DOIs for only PLoS One articles

searchplos(q = "*:*", fl = 'id', fq = 'journal_key:PLoSONE',
           start = 0, limit = 5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1  2125942     0
#> 
#> $data
#> # A tibble: 5 x 1
#>   id                                       
#>   <chr>                                    
#> 1 10.1371/journal.pone.0002397/title       
#> 2 10.1371/journal.pone.0002397/abstract    
#> 3 10.1371/journal.pone.0002397/references  
#> 4 10.1371/journal.pone.0002397/body        
#> 5 10.1371/journal.pone.0002397/introduction

Get DOIs for full article in PLoS One

searchplos(q = "*:*", fl = 'id',
   fq = list('journal_key:PLoSONE', 'doc_type:full'),
   start = 0, limit = 5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1   246927     0
#> 
#> $data
#> # A tibble: 5 x 1
#>   id                          
#>   <chr>                       
#> 1 10.1371/journal.pone.0002399
#> 2 10.1371/journal.pone.0002401
#> 3 10.1371/journal.pone.0002403
#> 4 10.1371/journal.pone.0002405
#> 5 10.1371/journal.pone.0002407

Search for many terms

q <- c('ecology','evolution','science')
lapply(q, function(x) searchplos(x, limit = 2))
#> [[1]]
#> [[1]]$meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1    55873     0
#> 
#> [[1]]$data
#> # A tibble: 2 x 1
#>   id                          
#>   <chr>                       
#> 1 10.1371/journal.pone.0001248
#> 2 10.1371/journal.pone.0059813
#> 
#> 
#> [[2]]
#> [[2]]$meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1    82168     0
#> 
#> [[2]]$data
#> # A tibble: 2 x 1
#>   id                          
#>   <chr>                       
#> 1 10.1371/journal.pbio.2002255
#> 2 10.1371/journal.pone.0205798
#> 
#> 
#> [[3]]
#> [[3]]$meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1   260220     0
#> 
#> [[3]]$data
#> # A tibble: 2 x 1
#>   id                          
#>   <chr>                       
#> 1 10.1371/journal.pone.0229237
#> 2 10.1371/journal.pone.0202320

Search on specific sections

A suite of functions were created as light wrappers around searchplos as a shorthand to search specific sections of a paper.

  • plosauthor searchers in authors
  • plosabstract searches in abstracts
  • plostitle searches in titles
  • plosfigtabcaps searches in figure and table captions
  • plossubject searches in subject areas

plosauthor searches across authors, and in this case returns the authors of the matching papers. the fl parameter determines what is returned

plosauthor(q = "Eisen", fl = "author", limit = 5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1     1107     0
#> 
#> $data
#> # A tibble: 5 x 1
#>   author                                                                       
#>   <chr>                                                                        
#> 1 Myungsun Kang,Timothy J Eisen,Ellen A Eisen,Arup K Chakraborty,Herman N Eisen
#> 2 Myungsun Kang,Timothy J Eisen,Ellen A Eisen,Arup K Chakraborty,Herman N Eisen
#> 3 Myungsun Kang,Timothy J Eisen,Ellen A Eisen,Arup K Chakraborty,Herman N Eisen
#> 4 Myungsun Kang,Timothy J Eisen,Ellen A Eisen,Arup K Chakraborty,Herman N Eisen
#> 5 Myungsun Kang,Timothy J Eisen,Ellen A Eisen,Arup K Chakraborty,Herman N Eisen

plosabstract searches across abstracts, and in this case returns the id and title of the matching papers

plosabstract(q = 'drosophila', fl = 'id,title', limit = 5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1     3944     0
#> 
#> $data
#> # A tibble: 5 x 2
#>   id                     title                                                  
#>   <chr>                  <chr>                                                  
#> 1 10.1371/journal.pone.… Host Range and Specificity of the Drosophila C Virus   
#> 2 10.1371/journal.pgen.… Drosophila Myc restores immune homeostasis of Imd path…
#> 3 10.1371/journal.pone.… Exogenous expression of Drp1 plays neuroprotective rol…
#> 4 10.1371/journal.pone.… A Drosophila model for developmental nicotine exposure 
#> 5 10.1371/image.pbio.v0… PLoS Biology Issue Image | Vol. 6(5) May 2008

plostitle searches across titles, and in this case returns the title and journal of the matching papers

plostitle(q = 'drosophila', fl = 'title,journal', limit = 5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1     2480     0
#> 
#> $data
#> # A tibble: 5 x 2
#>   journal  title                                                                
#>   <chr>    <chr>                                                                
#> 1 PLOS ONE Tandem Duplications and the Limits of Natural Selection in Drosophil…
#> 2 PLoS ONE A DNA Virus of Drosophila                                            
#> 3 PLOS ONE Nematocytes: Discovery and characterization of a novel anculeate hem…
#> 4 PLOS ONE Peptidergic control in a fruit crop pest: The spotted-wing drosophil…
#> 5 PLoS ONE In Vivo RNAi Rescue in Drosophila melanogaster with Genomic Transgen…

Search terms & visualize results as a histogram OR as a plot through time

plosword allows you to search for 1 to K words and visualize the results as a histogram, comparing number of matching papers for each word

out <- plosword(list("monkey", "Helianthus", "sunflower", "protein", "whale"),
    vis = "TRUE")
out$table
#>   No_Articles       Term
#> 1       14528     monkey
#> 2         646 Helianthus
#> 3        1876  sunflower
#> 4      164537    protein
#> 5        2142      whale
out$plot
plot of chunk plosword1plot

plot of chunk plosword1plot

You can also pass in curl options, in this case get verbose information on the curl call.

plosword('Helianthus', callopts = list(verbose = TRUE))
#> Number of articles with search term 
#>                                 646

Visualize terms

plot_throughtime allows you to search for up to 2 words and visualize the results as a line plot through time, comparing number of articles matching through time. Visualize with the ggplot2 package, only up to two terms for now.

library("ggplot2")
plot_throughtime(terms = "phylogeny", limit = 200) +
  geom_line(size = 2, color = 'black')
plot of chunk throughtime1

plot of chunk throughtime1

More

See the Faceted and highlighted searches and Full text vignettes for more rplos help.