Get summaries of objects in NCBI datasets from a unique ID
Source: R/entrez_summary.r (entrez_summary.Rd)
The NCBI offers two distinct formats for summary documents.
Version 1.0 is a relatively limited summary of a database record based on a
shared Document Type Definition. Version 1.0 summaries are only available as
XML and are not available for some newer databases.
Version 2.0 summaries generally contain more information about a given
record, but each database has its own distinct format. Version 2.0 summaries are
available for records in all databases, as either JSON or XML files.
As of version 0.4, rentrez fetches version 2.0 summaries by default and
uses JSON as the exchange format (as JSON objects can be more easily converted
into native R types). Existing scripts that relied on the structure and
naming of the "Version 1.0" summary files can be updated by setting the new
version argument to "1.0".
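As a rough illustration of the two formats, the sketch below requests the same record once with the default settings and once with version = "1.0". The wrapper compare_summary_versions is a hypothetical helper, not part of rentrez, and the sketch assumes that version 1.0 summaries are available for the database being queried.

```r
library(rentrez)

# Hypothetical helper (not part of rentrez): fetch one record in both
# summary formats so that their structure can be compared side by side.
compare_summary_versions <- function(db, id) {
  list(
    # Default call: version "2.0", exchanged as JSON
    v2 = entrez_summary(db = db, id = id),
    # Legacy format: version "1.0", exchanged as XML
    v1 = entrez_summary(db = db, id = id, version = "1.0")
  )
}

# Usage (needs network access to the NCBI); the ID is taken from the
# Examples section of this page:
# both <- compare_summary_versions("popset", "307082412")
# names(both$v2)
# names(both$v1)
```

Scripts written against the older field names would only need the version = "1.0" call.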
Usage
entrez_summary(
db,
id = NULL,
web_history = NULL,
version = c("2.0", "1.0"),
always_return_list = FALSE,
retmode = NULL,
config = NULL,
...
)
Arguments
- db
character, name of the database to search
- id
vector with unique ID(s) for records in database db. In the case of sequence databases these IDs can take the form of an NCBI accession followed by a version number (e.g. AF123456.1 or AF123456.2)
- web_history
A web_history object
- version
either "1.0" or "2.0"; see above for a description
- always_return_list
logical, return a list of esummary objects even when only one ID is provided (see Details for a note about this option)
- retmode
either "xml" or "json". By default, xml will be used for version 1.0 records, json for version 2.0.
- config
vector, configuration options passed to httr::GET
- ...
character Additional terms to add to the request, see NCBI documentation linked to in references for a complete list
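The id and web_history arguments are alternative ways of naming the records to summarise. A minimal sketch of the web_history route follows, assuming the search is run with use_history = TRUE so that the result carries a web_history object. The helper summarise_search is hypothetical, and passing retmax through ... is an assumption about what the NCBI ESummary service accepts for history-based requests.

```r
library(rentrez)

# Hypothetical helper (not part of rentrez): search a database, keep the
# results on the NCBI history server, and summarise them without passing
# the IDs back and forth.
summarise_search <- function(db, term, retmax = 10) {
  # use_history = TRUE stores the hits server-side and returns a
  # web_history object alongside the usual search result
  res <- entrez_search(db = db, term = term, use_history = TRUE)

  # retmax is forwarded via ... as an extra request term (assumed to
  # limit how many of the stored records are summarised)
  entrez_summary(db = db, web_history = res$web_history, retmax = retmax)
}

# Usage (needs network access to the NCBI):
# cv <- summarise_search("clinvar", "BRCA1")
```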
Value
A list of esummary records (if multiple IDs are passed or always_return_list is TRUE) or a single record.
file XMLInternalDocument xml file containing the entire record returned by the NCBI.
Details
By default, entrez_summary returns a single record when only one ID is
passed and a list of such records when multiple IDs are passed. This can lead
to unexpected behaviour when the results of a variable number of IDs (perhaps the
result of entrez_search) are processed with an apply family function
or in a for-loop. If you use this function as part of a function or script that
generates a variably-sized vector of IDs, setting always_return_list to TRUE
will avoid these problems. The function extract_from_esummary is provided for
the specific case of extracting named elements from a list of esummary objects,
and is designed to work on single objects as well as lists.
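The pattern described above can be sketched as follows. The helper titles_for_ids is hypothetical, and it assumes that every record has a title element, as the popset and clinvar records used in the Examples do.

```r
library(rentrez)

# Hypothetical helper (not part of rentrez): return one title per ID,
# whether the caller supplies one ID or many.
titles_for_ids <- function(db, ids) {
  # always_return_list = TRUE guarantees a list of records, so the
  # vapply below iterates over records even when length(ids) == 1
  summaries <- entrez_summary(db = db, id = ids, always_return_list = TRUE)

  # Assumes each record has a "title" element holding a single string
  vapply(summaries, function(rec) rec$title, character(1))
}

# Usage (needs network access to the NCBI):
# titles_for_ids("popset", "307082412")
# titles_for_ids("popset", c("307082412", "307075396"))
```

Without always_return_list = TRUE, the single-ID call would hand vapply one record, and the loop would run over that record's fields rather than over records. When only named elements are needed, extract_from_esummary covers both cases without this extra step.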
See also
config for available configs
extract_from_esummary which can be used to extract elements from a list of esummary records
Examples
if (FALSE) { # \dontrun{
# popset example: summarise several records and pull out their titles
pop_ids <- c("307082412", "307075396", "307075338", "307075274")
pop_summ <- entrez_summary(db="popset", id=pop_ids)
extract_from_esummary(pop_summ, "title")
# clinvar example: summarise the first ten hits of a search
res <- entrez_search(db = "clinvar", term = "BRCA1", retmax=10)
cv <- entrez_summary(db="clinvar", id=res$ids)
cv
# extract individual elements from every record in the list
extract_from_esummary(cv, "title", simplify=FALSE)
extract_from_esummary(cv, "trait_set")[1:2]
extract_from_esummary(cv, "gene_sort")
} # }