Get summaries of objects in NCBI datasets from a unique ID

The NCBI offers two distinct formats for summary documents. Version 1.0 is a relatively limited summary of a database record based on a shared Document Type Definition. Version 1.0 summaries are only available as XML and are not available for some newer databases. Version 2.0 summaries generally contain more information about a given record, but each database has its own distinct format. Version 2.0 summaries are available for records in all databases, as either JSON or XML files. As of version 0.4, rentrez fetches version 2.0 summaries by default and uses JSON as the exchange format (as JSON objects can be more easily converted into native R types). Existing scripts which relied on the structure and naming of the "Version 1.0" summary files can be updated by setting the new version argument to "1.0".
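As a minimal sketch of the difference between the two formats, the same record can be requested in both versions and the resulting field names compared. The popset ID is borrowed from the Examples section of this page; the variable names are illustrative, and the calls need network access to the NCBI.

```r
library(rentrez)

# ID taken from the Examples section of this page.
pop_id <- "307082412"

# Default behaviour: version 2.0 summary, retrieved as JSON.
summ_v2 <- entrez_summary(db = "popset", id = pop_id)

# Legacy behaviour: version 1.0 summary, retrieved as XML.
summ_v1 <- entrez_summary(db = "popset", id = pop_id, version = "1.0")

# Compare the field names offered by each format.
names(summ_v2)
names(summ_v1)
```

A script written against version 1.0 field names should keep working once version = "1.0" is passed explicitly.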

Usage

entrez_summary(
  db,
  id = NULL,
  web_history = NULL,
  version = c("2.0", "1.0"),
  always_return_list = FALSE,
  retmode = NULL,
  config = NULL,
  ...
)

Arguments

db: character, name of the database to search
id: vector with unique ID(s) for records in database db. In the case of sequence databases these IDs can take the form of an NCBI accession followed by a version number (e.g. AF123456.1 or AF123456.2)
web_history: A web_history object
version: either "1.0" or "2.0"; see above for a description
always_return_list: logical, return a list of esummary objects even when only one ID is provided (see Details for a note about this option)
retmode: either "xml" or "json". By default, xml will be used for version 1.0 records, json for version 2.0.
config: vector, configuration options passed to httr::GET
...: character, additional terms to add to the request; see the NCBI documentation linked in References for a complete list

Value

A list of esummary records (if multiple IDs are passed or always_return_list is TRUE) or a single record.

file: XMLInternalDocument, XML file containing the entire record returned by the NCBI.

Details

By default, entrez_summary returns a single record when only one ID is passed and a list of such records when multiple IDs are passed. This can lead to unexpected behaviour when the results for a variable number of IDs (perhaps the result of entrez_search) are processed with an apply family function or in a for-loop. If you use this function as part of a function or script that generates a variably-sized vector of IDs, setting always_return_list to TRUE will avoid these problems. The function extract_from_esummary is provided for the specific case of extracting named elements from a list of esummary objects, and is designed to work on single objects as well as lists.
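The single-ID case can be sketched as follows. The popset ID is again taken from the Examples section, the variable names are illustrative, and the calls need network access to the NCBI.

```r
library(rentrez)

# A vector that happens to contain only one ID (taken from the Examples).
ids <- "307082412"

# Default: a single esummary record, so length() counts its fields
# and a loop over it would visit fields rather than records.
single <- entrez_summary(db = "popset", id = ids)
length(single)

# always_return_list = TRUE: a list holding one record, so code written
# for many IDs behaves the same way when only one is supplied.
summ_list <- entrez_summary(db = "popset", id = ids, always_return_list = TRUE)
length(summ_list)

# extract_from_esummary accepts either form.
extract_from_esummary(single, "title")
extract_from_esummary(summ_list, "title")
```

Setting always_return_list = TRUE suits scripts where the number of IDs is not known in advance, since downstream loops then never need a special case for a single record.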

References

https://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_ESummary_

Examples

if (FALSE) { # \dontrun{
 # popset example: summarise several records and pull out their titles
 pop_ids <- c("307082412", "307075396", "307075338", "307075274")
 pop_summ <- entrez_summary(db="popset", id=pop_ids)
 extract_from_esummary(pop_summ, "title")

 # clinvar example
 res <- entrez_search(db = "clinvar", term = "BRCA1", retmax=10)
 cv <- entrez_summary(db="clinvar", id=res$ids)
 cv
 extract_from_esummary(cv, "title", simplify=FALSE)
 extract_from_esummary(cv, "trait_set")[1:2]
 extract_from_esummary(cv, "gene_sort")
} # }
