
Retrieve compound IDs (CIDs) from PubChem.


get_cid(
  query,
  from = "name",
  domain = c("compound", "substance", "assay"),
  match = c("all", "first", "ask", "na"),
  verbose = getOption("verbose"),
  arg = NULL,
  first = NULL,
  ...
)



query: character; search term, one or more compounds.


from: character; type of input. See Details for more information.


domain: character; query domain, one of "compound", "substance" or "assay".


match: character; how should multiple hits be handled? "all" returns all matches, "first" returns the first match, "ask" enters an interactive mode in which the user is asked to choose, and "na" returns NA if multiple hits are found.


verbose: logical; should verbose output be printed to the console?


arg: character; optional arguments such as "name_type=word" to match individual words.


first: deprecated. Use `match` instead.


...: currently unused.
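The `match` strategies can be illustrated with a small, self-contained sketch. `resolve_hits()` is a hypothetical helper, not part of webchem: it takes the CIDs found for one query and applies the "all", "first" or "na" rule. The interactive "ask" mode is left out, and returning NA when there are no hits at all is an assumption of this sketch.

```r
# Hypothetical helper (not part of webchem): applies a `match` strategy to
# the CIDs found for a single query. "ask" is omitted because it is interactive.
resolve_hits <- function(cids, match = c("all", "first", "na")) {
  match <- match.arg(match)
  # Assumption: no hits at all yields NA
  if (length(cids) == 0) return(NA_integer_)
  switch(match,
    all   = cids,                                       # keep every hit
    first = cids[1],                                    # keep only the first hit
    na    = if (length(cids) > 1) NA_integer_ else cids # NA when ambiguous
  )
}

# Made-up CIDs, for illustration only
resolve_hits(c(101L, 102L), "first")
```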


Returns a tibble.
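Because one query can yield several CIDs, it can help to regroup the result per query. The sketch below uses a plain data.frame as a stand-in for the returned tibble and assumes, without confirmation here, that it has one row per query/CID pair in columns `query` and `cid`; the queries and CIDs are made up.

```r
# Stand-in for a get_cid() result. Assumption: one row per query/CID pair,
# with columns `query` and `cid` (stored as character here). Values are made up.
res <- data.frame(
  query = c("a", "a", "b"),
  cid   = c("101", "102", "201"),
  stringsAsFactors = FALSE
)

# Group the CIDs by query, e.g. to spot queries that returned several hits
cids_by_query <- split(res$cid, res$query)
lengths(cids_by_query)
```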


Valid values for the from argument depend on the domain:

  • compound: "name", "smiles", "inchi", "inchikey", "formula", "sdf", "cas" (an alias for "xref/RN"), <xref>, <structure search>, <fast search>.

  • substance: "name", "sid", <xref>, "sourceid/<source id>" or "sourceall".

  • assay: "aid", <assay target>.
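As a rough sketch of the rules listed above, the valid `from` values per domain can be written as regular expressions. `from_patterns` and `valid_from()` are hypothetical names used only for illustration, and this is not how webchem itself validates `from`. The input is lower-cased on the assumption that matching is case-insensitive (the examples use both "xref/RN" and "xref/rn").

```r
# Hypothetical validator (not webchem's own code): the documented `from`
# values, one set of regular expressions per domain.
xref_types <- c("registryid", "rn", "pubmedid", "mmdbid", "proteingi",
                "nucleotidegi", "taxonomyid", "mimid", "geneid", "probeid",
                "patentid")
xref_re <- paste0("xref/(", paste(xref_types, collapse = "|"), ")")

from_patterns <- list(
  compound = c(
    "name", "smiles", "inchi", "inchikey", "formula", "sdf", "cas",
    xref_re,
    "(substructure|superstructure|similarity|identity)/(smiles|inchi|sdf|cid)",
    "fastformula",
    "(fastidentity|fastsimilarity_2d|fastsimilarity_3d|fastsubstructure|fastsuperstructure)/(smiles|smarts|inchi|sdf|cid)"
  ),
  substance = c("name", "sid", xref_re, "sourceid/.+", "sourceall"),
  assay = c("aid", "target/(gi|proteinname|geneid|genesymbol|accession)")
)

valid_from <- function(from, domain = c("compound", "substance", "assay")) {
  domain <- match.arg(domain)
  # Anchor the whole alternation so that e.g. "name" does not match "proteinname"
  pattern <- paste0("^(", paste(from_patterns[[domain]], collapse = "|"), ")$")
  # Assumption: `from` is matched case-insensitively
  grepl(pattern, tolower(from))
}

valid_from("xref/RN", "compound")
```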

<structure search> is assembled as "{substructure | superstructure | similarity | identity}/{smiles | inchi | sdf | cid}", e.g. from = "substructure/smiles".
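A hypothetical helper, `structure_from()`, can assemble a <structure search> value from its two parts, with base R's `match.arg()` rejecting anything outside the documented choices. It is a sketch for illustration, not a webchem function.

```r
# Hypothetical helper (not part of webchem): builds a <structure search>
# value for `from` from a search type and an input type.
structure_from <- function(search = c("substructure", "superstructure",
                                      "similarity", "identity"),
                           input = c("smiles", "inchi", "sdf", "cid")) {
  search <- match.arg(search)  # errors on values not listed above
  input <- match.arg(input)
  paste0(search, "/", input)
}

structure_from("similarity", "smiles")
#> [1] "similarity/smiles"
# e.g. get_cid("CCO", from = structure_from("similarity", "smiles"))
```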

<xref> is assembled as "xref/{RegistryID | RN | PubMedID | MMDBID | ProteinGI | NucleotideGI | TaxonomyID | MIMID | GeneID | ProbeID | PatentID}", e.g. from = "xref/RN" will query by CAS RN.
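The sketch below, using the hypothetical helper `xref_from()`, builds an <xref> value from one of the listed registry types and maps the "cas" alias (documented for the compound domain) to "xref/RN". The case-insensitive lookup is an assumption made for convenience, not documented behaviour.

```r
# Hypothetical helper (not part of webchem): builds an <xref> value for `from`.
xref_from <- function(type) {
  types <- c("RegistryID", "RN", "PubMedID", "MMDBID", "ProteinGI",
             "NucleotideGI", "TaxonomyID", "MIMID", "GeneID", "ProbeID",
             "PatentID")
  # "cas" is documented as an alias for "xref/RN"
  if (tolower(type) == "cas") type <- "RN"
  # Assumption: case-insensitive lookup, returning the canonical spelling
  hit <- types[tolower(types) == tolower(type)]
  if (length(hit) != 1) stop("unknown xref type: ", type)
  paste0("xref/", hit)
}

xref_from("cas")
#> [1] "xref/RN"
```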

<fast search> is either "fastformula" or is assembled as "{fastidentity | fastsimilarity_2d | fastsimilarity_3d | fastsubstructure | fastsuperstructure}/{smiles | smarts | inchi | sdf | cid}", e.g. from = "fastidentity/smiles".
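Going the other way, a <fast search> value can be split back into its parts. `parse_fast_from()` is a hypothetical parser written for illustration only; it treats "fastformula" as a value without an input part, as described above, and rejects combinations outside the documented lists.

```r
# Hypothetical parser (not part of webchem): splits a <fast search> value
# into its search type and input type.
parse_fast_from <- function(from) {
  parts <- strsplit(from, "/", fixed = TRUE)[[1]]
  searches <- c("fastidentity", "fastsimilarity_2d", "fastsimilarity_3d",
                "fastsubstructure", "fastsuperstructure")
  inputs <- c("smiles", "smarts", "inchi", "sdf", "cid")
  # "fastformula" stands on its own and has no input part
  if (identical(parts, "fastformula")) {
    return(list(search = "fastformula", input = NA_character_))
  }
  if (length(parts) != 2 || !parts[1] %in% searches || !parts[2] %in% inputs) {
    stop("not a valid <fast search> value: ", from)
  }
  list(search = parts[1], input = parts[2])
}

str(parse_fast_from("fastsimilarity_2d/smiles"))
```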

<source id> is any valid PubChem Data Source ID. When from = "sourceid/<source id>", the query is the ID of the substance in the depositor's database.

If from = "sourceall", the query is one or more valid PubChem depositor names. Depositor names are not case-sensitive.
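The two substance-source modes differ only in the `from` value, so a hypothetical helper, `substance_source_query()`, can pick between them: given a Data Source ID it produces "sourceid/<source id>", otherwise "sourceall". The returned list is meant to be passed to `get_cid()` with `do.call()`; that call is shown only as a comment because it needs network access.

```r
# Hypothetical helper (not part of webchem): builds get_cid() arguments for
# the two substance-source modes.
substance_source_query <- function(query, source_id = NULL) {
  from <- if (is.null(source_id)) {
    "sourceall"                      # query = depositor name(s)
  } else {
    paste0("sourceid/", source_id)   # query = ID in the depositor's database
  }
  list(query = query, from = from, domain = "substance")
}

args <- substance_source_query("VCC957895", source_id = 23706)
args$from
#> [1] "sourceid/23706"
# do.call(get_cid, args)
```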

Depositor names and Data Source IDs can be found at

<assay target> is assembled as "target/{gi | proteinname | geneid | genesymbol | accession}", e.g. from = "target/geneid" will query by GeneID.
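Each `from` value reads like a path segment, which mirrors the input part of PubChem's PUG REST URLs. The sketch below assembles such a URL for a CID lookup, assuming the layout `<base>/<domain>/<from>/<query>/cids/JSON`. `pug_cids_url()` is a hypothetical name, and webchem's actual requests (for example for SMILES, InChI or the "cas" alias) may be built differently.

```r
# Sketch, not webchem's code: assumes a PUG REST URL layout of
# <base>/<domain>/<from>/<query>/cids/JSON. Handles one query at a time.
pug_cids_url <- function(query, from = "name",
                         domain = c("compound", "substance", "assay")) {
  domain <- match.arg(domain)
  base <- "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
  # Percent-encode the query so spaces and reserved characters are safe
  q <- utils::URLencode(as.character(query), reserved = TRUE)
  paste(base, domain, from, q, "cids", "JSON", sep = "/")
}

pug_cids_url(25086, from = "target/geneid", domain = "assay")
#> [1] "https://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/target/geneid/25086/cids/JSON"
```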


Please respect the Terms and Conditions of the National Library of Medicine, the data usage policies of the National Center for Biotechnology Information, and the data usage policies of the individual data sources.


Wang, Y., Xiao, J., Suzek, T. O., et al. (2009). PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Research, 37, 623–633.

Kim, S., Thiessen, P. A., Bolton, E. E., et al. (2016). PubChem Substance and Compound databases. Nucleic Acids Research, 44(D1), D1202–D1213.

Kim, S., Thiessen, P. A., Bolton, E. E., & Bryant, S. H. (2015). PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem. Nucleic Acids Research, gkv396.

Szöcs, E., Stirling, T., Scott, E. R., Scharmüller, A., & Schäfer, R. B. (2020). webchem: An R package to retrieve chemical information from the web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13


if (FALSE) {
# might fail if API is not available
get_cid("Triclosan", arg = "name_type=word")
# from SMILES
get_cid("CCCC", from = "smiles")
# from InChI
get_cid("InChI=1S/CH5N/c1-2/h2H2,1H3", from = "inchi")
# from InChIKey
get_cid("BPGDAMSIGCZZLK-UHFFFAOYSA-N", from = "inchikey")
# from formula
get_cid("C26H52NO6P", from = "formula")
# from CAS RN
get_cid("56-40-6", from = "xref/rn")
# similarity
get_cid(5564, from = "similarity/cid")
get_cid("CCO", from = "similarity/smiles")
# from SID
get_cid("126534046", from = "sid", domain = "substance")
# sourceid
get_cid("VCC957895", from = "sourceid/23706", domain = "substance")
# sourceall
get_cid("Optopharma Ltd", from = "sourceall", domain = "substance")
# from AID (CIDs of substances tested in the assay)
get_cid(170004, from = "aid", domain = "assay")
# from GeneID (CIDs of substances tested on the gene)
get_cid(25086, from = "target/geneid", domain = "assay")

# multiple inputs
get_cid(c("Triclosan", "Aspirin"))
}