webchem is an R package to retrieve chemical information from the web.
It interacts with a suite of web APIs, listed below, to retrieve chemical information.
Source | Function(s) | API Docs | API key |
---|---|---|---|
Chemical Identifier Resolver (CIR) | cir_query() | link | none |
ChemSpider | get_csid(), csid_compinfo(), csid_extcompinfo() | link | required (link) |
PubChem | get_cid(), cid_compinfo() | link | none |
Chemical Translation Service (CTS) | cts_convert(), cts_compinfo() | link | none |
PAN Pesticide Database | pan() | link | none |
Allan Wood's Compendium of Pesticide Common Names | allanwood() | link | none |
ChemSpider functions require a security token. Please register at RSC (https://www.rsc.org/rsc-id/register) to retrieve a security token.
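Install the stable version from CRAN:
install.packages("webchem")
Or install the development version from GitHub:
install.packages("devtools")
library("devtools")
install_github("ropensci/webchem")
Then load the package:
library("webchem")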
CAS numbers and molecular weight for Triclosan.
Use first = TRUE to return only the first hit.
cir_query('Triclosan', 'cas')
#> [1] "3380-34-5" "112099-35-1" "88032-08-0"
cir_query('Triclosan', 'cas', first = TRUE)
#> [1] "3380-34-5"
cir_query('Triclosan', 'mw')
#> [1] "289.5451"
Query SMILES and InChIKey from CAS (Triclosan).
Inputs might be ambiguous, so we can specify where to search using resolver =.
cir_query('3380-34-5', 'smiles')
#> [1] "C1=CC(=CC(=C1OC2=CC=C(C=C2Cl)Cl)O)Cl"
cir_query('3380-34-5', 'stdinchikey', resolver = 'cas_number')
#> [1] "InChIKey=XEFQLINVKFYRCS-UHFFFAOYSA-N"
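The stdinchikey result comes back with an "InChIKey=" label in front, while the next query takes the bare key. A small sketch of stripping that label with base R, using the value printed above (typed in by hand, no web query):

```r
# Value printed above by cir_query(..., 'stdinchikey', ...), typed in by hand
key_prefixed <- "InChIKey=XEFQLINVKFYRCS-UHFFFAOYSA-N"

# Drop the leading "InChIKey=" label, leaving the bare 27-character key
key <- sub("^InChIKey=", "", key_prefixed)
key
```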
Convert the InChIKey (Triclosan) to a ChemSpider ID and retrieve the number of rings:
cir_query('XEFQLINVKFYRCS-UHFFFAOYSA-N', 'chemspider_id', first = TRUE)
#> [1] "<!DOCTYPE"
cir_query('XEFQLINVKFYRCS-UHFFFAOYSA-N', 'ring_count')
#> [1] "2"
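The "<!DOCTYPE" result for chemspider_id looks like the start of an HTML page rather than an ID. If you run many queries, it may help to turn such results into NA. A minimal sketch with a hypothetical helper drop_html() (not part of webchem), tested on the values printed above:

```r
# Hypothetical helper (not part of webchem): treat results that look like
# the start of an HTML page as missing values
drop_html <- function(x) {
  looks_like_html <- grepl("^[[:space:]]*<(!DOCTYPE|html)", x, ignore.case = TRUE)
  x[looks_like_html] <- NA_character_
  x
}

# The chemspider_id and ring_count results from above
drop_html(c("<!DOCTYPE", "2"))
```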
You'll need an API key:
token <- '<YOUR TOKEN HERE>'
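Rather than hard-coding the token in a script, one option is to read it from an environment variable. A minimal sketch, assuming you have set a variable named CHEMSPIDER_TOKEN yourself (the name is our own, hypothetical choice, e.g. defined in ~/.Renviron):

```r
# CHEMSPIDER_TOKEN is a hypothetical variable name of our own choosing;
# Sys.getenv() returns "" if it is not set
token <- Sys.getenv("CHEMSPIDER_TOKEN")

if (!nzchar(token)) {
  message("CHEMSPIDER_TOKEN is not set; ChemSpider queries are likely to fail.")
}
```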
Retrieve the ChemSpider ID of Triclosan
(id <- get_csid('Triclosan', token = token))
#> [1] "5363"
Use this ID to query information from ChemSpider
csid_extcompinfo(id, token = token)
#> CSID
#> "5363"
#> MF
#> "C_{12}H_{7}Cl_{3}O_{2}"
#> SMILES
#> "c1cc(c(cc1Cl)O)Oc2ccc(cc2Cl)Cl"
#> InChI
#> "InChI=1/C12H7Cl3O2/c13-7-1-3-11(9(15)5-7)17-12-4-2-8(14)6-10(12)16/h1-6,16H"
#> InChIKey
#> "XEFQLINVKFYRCS-UHFFFAOYAS"
#> AverageMass
#> "289.5418"
#> MolecularWeight
#> "289.5418"
#> MonoisotopicMass
#> "287.951172"
#> NominalMass
#> "288"
#> ALogP
#> "5.53"
#> XLogP
#> "5"
#> CommonName
#> "Triclosan"
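The InChIKey returned by ChemSpider is not identical to the standard key CIR returned for the same compound (its InChI also starts with InChI=1/ rather than the standard InChI=1S/). Because the first 14-character block of an InChIKey encodes the connectivity layer, comparing only that block is one rough way to check that two keys describe the same skeleton. A small sketch using the printed values:

```r
# Keys printed above: ChemSpider's and CIR's standard InChIKey
key_chemspider <- "XEFQLINVKFYRCS-UHFFFAOYAS"
key_standard   <- "XEFQLINVKFYRCS-UHFFFAOYSA-N"

# The first 14 characters hash the connectivity (molecular skeleton)
same_skeleton <- substr(key_chemspider, 1, 14) == substr(key_standard, 1, 14)
same_skeleton
```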
Retrieve PubChem CID
get_cid('Triclosan')
#> [1] "5564" "131203" "627458" "15942656" "16220126" "16220128"
#> [7] "16220129" "16220130" "18413505" "22947105" "23656593" "24848164"
#> [13] "25023954" "25023955" "25023956" "25023957" "25023958" "25023959"
#> [19] "25023960" "25023961" "25023962" "25023963" "25023964" "25023965"
#> [25] "25023966" "25023967" "25023968" "25023969" "25023970" "25023971"
#> [31] "25023972" "25023973" "45040608" "45040609" "67606151" "71752714"
cid <- get_cid('3380-34-5')
Use this CID to retrieve some chemical properties:
props <- cid_compinfo(cid)
props$InChIKey
#> [1] "XEFQLINVKFYRCS-UHFFFAOYSA-N"
props$MolecularWeight
#> [1] "289.541780"
props$IUPACName
#> [1] "5-chloro-2-(2,4-dichlorophenoxy)phenol"
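CIR, ChemSpider and PubChem return molecular weights as character strings, and the values differ slightly between sources. A small sketch that converts the values printed above (copied by hand, not re-queried) to numbers and measures the spread:

```r
# Molecular weights of Triclosan as printed above, copied by hand
mw_chr <- c(CIR = "289.5451", ChemSpider = "289.5418", PubChem = "289.541780")

# as.numeric() drops names, so restore them with setNames()
mw <- setNames(as.numeric(mw_chr), names(mw_chr))
mw

# Difference between the largest and the smallest value
spread <- diff(range(mw))
spread
```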
CTS can convert between nearly any pair of chemical identifiers:
cts_convert(query = '3380-34-5', from = 'CAS', to = 'PubChem CID')
#> [1] "5564"
cts_convert(query = '3380-34-5', from = 'CAS', to = 'ChemSpider')
#> [1] "5363"
(inchk <- cts_convert(query = 'Triclosan', from = 'Chemical Name', to = 'inchikey'))
#> [1] "XEFQLINVKFYRCS-UHFFFAOYSA-N"
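Before passing inchk on to other functions, you may want to check that it has the shape of a standard InChIKey: 14, 10 and 1 uppercase letters separated by hyphens. A minimal sketch with a hypothetical helper is_std_inchikey() (not part of webchem), tested on keys printed in this README:

```r
# Hypothetical helper (not part of webchem): does x look like a standard InChIKey?
is_std_inchikey <- function(x) {
  grepl("^[A-Z]{14}-[A-Z]{10}-[A-Z]$", x)
}

is_std_inchikey("XEFQLINVKFYRCS-UHFFFAOYSA-N")  # key returned by CTS and CIR
is_std_inchikey("XEFQLINVKFYRCS-UHFFFAOYAS")    # key printed by ChemSpider, no third block
```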
Moreover, we can retrieve much of the information stored in the CTS database using an InChIKey:
info <- cts_compinfo(inchikey = inchk)
info[1:5]
#> $inchikey
#> [1] "XEFQLINVKFYRCS-UHFFFAOYSA-N"
#>
#> $inchicode
#> [1] "InChI=1S/C12H7Cl3O2/c13-7-1-3-11(9(15)5-7)17-12-4-2-8(14)6-10(12)16/h1-6,16H"
#>
#> $molweight
#> [1] 289.5418
#>
#> $exactmass
#> [1] 287.9512
#>
#> $formula
#> [1] "C12H7Cl3O2"
pan() returns a list of 73 entries; here I extract only 4 of them:
pan_list <- pan('lambda-Cyhalothrin', first = TRUE)
pan_list[c("CAS Number", "Chemical Class", "Water Solubility (Avg, mg/L)", "Adsorption Coefficient (Koc)" )]
#> $`CAS Number`
#> [1] "91465-08-6"
#>
#> $`Chemical Class`
#> [1] "Pyrethroid"
#>
#> $`Water Solubility (Avg, mg/L)`
#> [1] "0.0050"
#>
#> $`Adsorption Coefficient (Koc)`
#> [1] "157000"
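pan() also returns its values as character strings (note the quotes above). A small sketch that converts the two numeric fields shown above to numbers, using the printed values rather than a live query:

```r
# Values printed above, typed in by hand (no web query)
pan_sub <- list(
  `Water Solubility (Avg, mg/L)` = "0.0050",
  `Adsorption Coefficient (Koc)` = "157000"
)

# vapply() keeps the field names and guarantees exactly one number per field
pan_num <- vapply(pan_sub, as.numeric, numeric(1))
pan_num
```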
allanwood() returns a list of 9 entries and can query both common names and CAS numbers:
allanwood('Fluazinam', type = 'commonname')
#> $cname
#> [1] "Fluazinam"
#>
#> $status
#> [1] "ISO 1750 (published)"
#>
#> $pref_iupac_name
#> [1] "3-chloro-N-[3-chloro-2,6-dinitro-4-(trifluoromethyl)phenyl]-5-(trifluoromethyl)pyridin-2-amine"
#>
#> $iupac_name
#> [1] "3-chloro-N-(3-chloro-5-trifluoromethyl-2-pyridyl)-α,α,α-trifluoro-2,6-dinitro-p-toluidine"
#>
#> $cas
#> [1] "79622-59-6"
#>
#> $formula
#> [1] "C13H4Cl2F6N4O4"
#>
#> $activity
#> [1] "fungicides (pyridine fungicides)"
#>
#> $inchikey
#> [1] "UZCGKGPEKUCDTF-UHFFFAOYSA-N"
#>
#> $inch
#> [1] "InChI=1S/C13H4Cl2F6N4O4/c14-6-1-4(12(16,17)18)3-22-11(6)23-9-7(24(26)27)2-5(13(19,20)21)8(15)10(9)25(28)29/h1-3H,(H,22,23)"
allanwood('79622-59-6', type = 'cas')$cname
#> [1] "fluazinam"
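The two queries return the common name with different capitalisation ("Fluazinam" vs. "fluazinam"), so an exact string comparison treats them as different. A small sketch of a case-insensitive comparison with the printed values:

```r
by_name <- "Fluazinam"  # cname from the common-name query above
by_cas  <- "fluazinam"  # cname from the CAS query above

identical(by_name, by_cas)           # FALSE: capitalisation differs
tolower(by_name) == tolower(by_cas)  # TRUE
```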
Without these fantastic web services, webchem wouldn't be here.
Therefore, kudos to the web service providers and developers!
If you're more familiar with Python, you should check out Matt Swain's repositories: ChemSpiPy, PubChemPy and CIRpy provide similar functionality to webchem.
- Please report any issues, bugs or feature requests.
- License: MIT