Allows for programmatic searching of the arXiv pre-print repository.
Arguments
- query
Search pattern as a string; a vector of such strings is also allowed, in which case the elements are combined with AND.
- id_list
arXiv doc IDs, as a comma-delimited string or a vector of such strings
- start
An offset for the start of search
- limit
Maximum number of records to return.
- sort_by
How to sort the results (ignored if id_list is provided)
- ascending
If TRUE, sort in ascending order; else descending (ignored if id_list is provided)
- batchsize
Maximum number of records to request at one time
- force
If TRUE, force search request even if it seems extreme
- output_format
Indicates whether output should be a data frame or a list.
- sep
String to use to separate multiple authors, affiliations, DOI links, and categories, in the case that
output_format="data.frame".
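As a small illustration of the `query` argument, the base-R sketch below builds by hand the single string that a vector of patterns would correspond to. That the elements are joined with exactly `" AND "` is an assumption made here for illustration, and `query_parts` and `combined` are illustrative names rather than part of the package.

```r
# Two search patterns supplied as a character vector
query_parts <- c('au:"Peter Hall"', 'ti:deconvolution')

# Assumed equivalent single-string form: elements joined with AND
combined <- paste(query_parts, collapse = " AND ")

# Show the combined pattern without R's quote escaping
cat(combined, "\n")
```

Under that assumption, passing either `query_parts` or `combined` as `query` should describe the same search.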
Value
If output_format="data.frame", the result is a data
frame with each row being a manuscript and columns being the
various fields.
If output_format="list", the result is a list parsed from
the XML output of the search, closer to the raw output from arXiv.
The data frame format has the following columns.
| Column | Name | Description |
| --- | --- | --- |
| [,1] | id | arXiv ID |
| [,2] | submitted | date first submitted |
| [,3] | updated | date last updated |
| [,4] | title | manuscript title |
| [,5] | summary | abstract |
| [,6] | authors | author names |
| [,7] | affiliations | author affiliations |
| [,8] | link_abstract | hyperlink to abstract |
| [,9] | link_pdf | hyperlink to PDF |
| [,10] | link_doi | hyperlink to DOI |
| [,11] | comment | authors' comment |
| [,12] | journal_ref | journal reference |
| [,13] | doi | published DOI |
| [,14] | primary_category | primary category |
| [,15] | categories | all categories |
The contents are all strings; missing values are empty strings ("").
The columns authors, affiliations, link_doi,
and categories may have multiple entries separated by
sep (by default, "|").
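Because multi-entry columns are packed into single strings, a common follow-up step is to split them back apart. The sketch below uses only base R on a stand-in data frame; `z_mock` and its values are placeholders invented for illustration, not real arXiv records.

```r
# Stand-in for a result with output_format = "data.frame";
# the values are placeholders, not real arXiv records
z_mock <- data.frame(
  authors    = c("A. Author|B. Author", "C. Author", ""),
  categories = c("math.ST|stat.TH", "stat.ME", ""),
  stringsAsFactors = FALSE
)

sep <- "|"

# fixed = TRUE is needed because "|" is a regex metacharacter
authors_list <- strsplit(z_mock$authors, sep, fixed = TRUE)

# Number of authors per record; an empty string yields zero entries
n_authors <- lengths(authors_list)
```

If a different `sep` was used in the search, the same value would need to be passed to `strsplit()`.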
The result includes an attribute "search_info" that includes
information about the details of the search parameters, including
the time at which it was completed. Another attribute
"total_results" is the total number of records that match
the query.
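One possible use of "total_results" is to check whether `limit` cut the result short. The sketch below sets the attribute by hand on a stand-in object so that it runs without a network request; `z_mock` and the attribute value are made up for illustration.

```r
# Stand-in for a search result; the ids and attribute value are made up
z_mock <- data.frame(id = c("id-1", "id-2"), stringsAsFactors = FALSE)
attr(z_mock, "total_results") <- 10L

# TRUE when more records match the query than were returned
truncated <- attr(z_mock, "total_results") > nrow(z_mock)
```

With a real result, the same comparison could suggest when to raise `limit` or page through results with `start`.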
Examples
# search for author Peter Hall with deconvolution in title
z <- arxiv_search(query = 'au:"Peter Hall" AND ti:deconvolution', limit=2)
attr(z, "total_results") # total no. records matching query
#> [1] 0
z$title
#> character(0)
# search for a set of documents by arxiv identifiers
z <- arxiv_search(id_list = c("0710.3491v1", "0804.0713v1", "1003.0315v1"))
# can also use a comma-separated string
z <- arxiv_search(id_list = "0710.3491v1,0804.0713v1,1003.0315v1")
# Journal references, if available
z$journal_ref
#> [1] "Annals of Statistics 2007, Vol. 35, No. 4, 1535-1558"
#> [2] "Annals of Statistics 2008, Vol. 36, No. 2, 665-685"
#> [3] ""
# search for a range of dates (in this case, one day)
z <- arxiv_search("submittedDate:[199701010000 TO 199701012400]", limit=2)
