rdkit / rdkit

The official sources for the RDKit library

License: BSD 3-Clause "New" or "Revised" License

CMake 1.11% C++ 24.36% Python 5.70% C 2.46% JavaScript 0.22% HTML 63.98% Makefile 0.01% QMake 0.01% Smarty 0.01% LLVM 0.06% Shell 0.01% Java 0.58% C# 0.07% Fortran 0.01% Yacc 0.09% Lex 0.01% SMT 0.01% Jupyter Notebook 0.74% Dockerfile 0.01% SWIG 0.56%
cheminformatics c-plus-plus python rdkit

rdkit's Introduction

RDKit

What is it?

The RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python.

  • BSD license - a business friendly license for open source
  • Core data structures and algorithms in C++
  • Python 3.x wrapper generated using Boost.Python
  • Java and C# wrappers generated with SWIG
  • JavaScript (generated with emscripten) and CFFI wrappers around important functionality
  • 2D and 3D molecular operations
  • Descriptor and Fingerprint generation for machine learning
  • Molecular database cartridge for PostgreSQL supporting substructure and similarity searches as well as many descriptor calculators
  • Cheminformatics nodes for KNIME
  • Contrib folder with useful community-contributed software harnessing the power of the RDKit

Installation and getting started

If you are working in Python and using conda (our recommendation), installation is super easy:

$ conda install -c conda-forge rdkit

You can then take a look at our Getting Started in Python guide.

More detailed installation instructions are available in Docs/Book/Install.md.
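
Once the install finishes, a quick sanity check is to import the package and parse a molecule. This is only a minimal sketch: benzene is an arbitrary test input chosen for illustration, not something the README prescribes.

```python
from rdkit import Chem, rdBase

# Parse benzene from SMILES; MolFromSmiles returns None if parsing fails
mol = Chem.MolFromSmiles('c1ccccc1')

print('RDKit version:', rdBase.rdkitVersion)
print('Atoms parsed:', mol.GetNumAtoms())
```

If the import fails, the conda environment where RDKit was installed is probably not the active one.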

Documentation

Available on the RDKit page and in the Docs folder on GitHub

The RDKit blog often has useful tips and tricks.

Support and Community

If you have questions, comments, or suggestions, the best places for those are:

If you've found a bug or would like to request a feature, please create an issue

We also have a LinkedIn group

We have a yearly user group meeting (the UGM) where members of the community give presentations and lightning talks on things they've done with the RDKit. Materials from past UGMs, which can be quite useful, are also online.

License

Code released under the BSD license.

rdkit's People

Contributors

adalke, alexandersavelyev, apahl, avaucher, bjonnh-work, bp-kelley, coleb, d-b-w, daenuprobst, davidacosgrove, e-kwsm, gedeck, greglandrum, ichirutake, jlvarjo, jones-gareth, k-ujihara, mcs07, mwojcikowski, nadineschneider, ptosco, rachelnwalker, ricrogz, rvianello, samoturk, sriniker, tadhurst-cdd, thegodone, unixjunkie, vfscalfani

rdkit's Issues

Docs for Descriptors.MolWt are wrong

In [21]: Descriptors.MolWt?
Type:       function
String Form:<function <lambda> at 0x2f70320>
File:       /scratch/RDKit_trunk/rdkit/Chem/Descriptors.py
Definition: Descriptors.MolWt(*x, **y)
Docstring:
The average molecular weight of the molecule ignoring hydrogens

>>> MolWt(Chem.MolFromSmiles('CC'))
30.07...
>>> MolWt(Chem.MolFromSmiles('[NH4+].[Cl-]'))
53.49...
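
The report does not spell out what is wrong. One plausible reading is that the docstring says hydrogens are ignored while the doctest values only make sense if hydrogens are counted. The arithmetic below is a sketch of that reading. The atomic weights are rounded standard values assumed for illustration, not numbers read out of RDKit, and `weight` is a hypothetical helper.

```python
# Rounded standard atomic weights (assumed for illustration)
WEIGHTS = {'C': 12.011, 'H': 1.008, 'N': 14.007, 'Cl': 35.453}

def weight(counts, include_h=True):
    """Sum atomic weights for an element -> count mapping."""
    return sum(WEIGHTS[el] * n for el, n in counts.items()
               if include_h or el != 'H')

ethane = {'C': 2, 'H': 6}                      # SMILES 'CC'
ammonium_chloride = {'N': 1, 'H': 4, 'Cl': 1}  # SMILES '[NH4+].[Cl-]'

for name, counts in (('CC', ethane), ('[NH4+].[Cl-]', ammonium_chloride)):
    with_h = round(weight(counts), 3)
    without_h = round(weight(counts, include_h=False), 3)
    print(name, 'with H:', with_h, 'heavy atoms only:', without_h)
```

With hydrogens included the sums come to about 30.07 and 53.49, which is what the doctest shows, so under these assumptions the phrase "ignoring hydrogens" in the docstring would be the error.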

Stereochemistry lost for reacting atoms that don't change connectivity

Reported by Robert Feinstein in this thread: http://www.mail-archive.com/[email protected]/msg02908.html

# Demo of RDKit reaction transform nuking stereocenters
from rdkit import Chem
from rdkit.Chem import AllChem

# Define simple transform that includes possible stereocenter ([C:2])
rxn = AllChem.ReactionFromSmarts('[C:2][C:1]=O>>[C:2][C:1]=S')

# React achiral mol as test
ps = rxn.RunReactants((Chem.MolFromSmiles('CC=O'),))
Chem.MolToSmiles( ps[0][0], isomericSmiles=True )
# Output is 'CC=S'

# React mol with chiral center far removed
ps = rxn.RunReactants((Chem.MolFromSmiles('[Cl][C@H]([Br])CCCC=O'),))
Chem.MolToSmiles( ps[0][0], isomericSmiles=True )
# Output is 'S=CCCC[C@H](Cl)Br'

# React mol with chiral center included in transform component
ps = rxn.RunReactants((Chem.MolFromSmiles('[Cl][C@H](C=O)'),))
Chem.MolToSmiles( ps[0][0], isomericSmiles=True )
# Output is 'S=CCCl' - chirality has been lost.

Bad ring query matches for molecules from MolFromSmarts

In [5]: Chem.MolFromSmiles('c:1:c:c:c:c:c1').HasSubstructMatch(Chem.MolFromSmarts('[R2]~[R1]~[R2]'))
Out[5]: False

In [6]: Chem.MolFromSmarts('c:1:c:c:c:c:c1').HasSubstructMatch(Chem.MolFromSmarts('[R2]~[R1]~[R2]'))
Out[6]: True

In [7]: Chem.MolFromSmarts('ccc').HasSubstructMatch(Chem.MolFromSmarts('[R2]~[R1]~[R2]'))
Out[7]: True

MolFragmentToSmiles generating non-canonical results

In [3]: Chem.MolFragmentToSmiles(Chem.MolFromSmiles('c1c(C)cccc1'),(0,1,2))
Out[3]: 'Ccc'

In [4]: Chem.MolFragmentToSmiles(Chem.MolFromSmiles('c1c(C)cccc1'),(1,2,3))
Out[4]: 'ccC'

In [5]: Chem.MolFragmentToSmiles(Chem.MolFromSmiles('c1c(C)cccc1'),(1,3,2))
Out[5]: 'ccC'

Double bond stereochemistry not preserved in reactions.

Reported by Sabrina Syeda.
Thread here: http://www.mail-archive.com/[email protected]/msg03080.html

>>> rxn = AllChem.ReactionFromSmarts('[CX4:4][CH1:3]=[CH1:2][CX4:5].[Br:1]>>[C:5][C:2]=[C:3][C:4][Br:1]')
>>> rxn.Initialize()
>>> r = [Chem.MolFromSmiles(r'CCC\C=C\C(C)C'), Chem.MolFromSmiles('Br')]
>>> ps = rxn.RunReactants(tuple(r))
>>> for p in ps:
...     for m in p:
...         print(Chem.MolToSmiles(m, isomericSmiles=True))
...
CCC(Br)C=CC(C)C
CCCC=CC(C)(C)Br

SDWriter initialized on a file object can produce an unhandled C++ exception

Not calling the flush method of an SDWriter initialized on a file object may produce an unhandled exception and terminate the interpreter:

 In [7]: with open('xyz.sdf', 'w') as xyz:
    ...:     w = Chem.SDWriter(xyz)
    ...:     w.write(Chem.MolFromSmiles('c1ccccc1'))
    ...:     w.write(Chem.MolFromSmiles('c1ccccc1'))
    ...:     w.write(Chem.MolFromSmiles('c1ccccc1'))
    ...:     w.write(Chem.MolFromSmiles('c1ccccc1'))
    ...:     w.write(Chem.MolFromSmiles('c1ccccc1'))
    ...:     
 terminate called after throwing an instance of 'boost::python::error_already_set' 
 Aborted
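
Going by the description above, a possible workaround is to flush (or close) the writer explicitly while the underlying stream is still open. The sketch below writes to an in-memory `StringIO` buffer rather than a real file; that substitution is an assumption made so the example leaves nothing on disk.

```python
from io import StringIO
from rdkit import Chem

buf = StringIO()  # stands in for the open file object from the report
w = Chem.SDWriter(buf)
for _ in range(5):
    w.write(Chem.MolFromSmiles('c1ccccc1'))

# Flush explicitly while the underlying stream is still open,
# then close the writer so nothing is left pending at teardown
w.flush()
w.close()

print(buf.getvalue().count('$$$$'), 'records written')
```

With a real file, the same calls would go inside the `with open(...)` block, before it exits.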

Incorrect atom labels from BRICS

In [4]: m = Chem.MolFromSmiles('CCOC1(C)CCCCC1')

In [5]: Chem.MolToSmiles(BRICS.BreakBRICSBonds(m),True)
Out[5]: '[3*]O[3*].[4*]CC.[4*]C1(C)CCCCC1'

(dupe of sf.net issue 287 to experiment with github issue tracking)

Support for Pillow/PIL fork

Pillow is a "friendly" fork of PIL. Effectively it replaces PIL.

It's nearly completely backwards compatible, except that it places things under the "PIL" module. Things like "import Image" need to be "from PIL import Image".

Supporting it takes a couple of lines in rdkit/sping/PIL/pidPIL.py. Change:

import Image, ImageFont, ImageDraw

to:

try:
    from PIL import Image, ImageFont, ImageDraw
except ImportError:
    import Image, ImageFont, ImageDraw

Compatibility with sdf files served by the PDB

SDFs provided by the PDB (Protein Data Bank) have a slightly different format than what RDKit is expecting.

Example file that would fail with the old code
http://www.rcsb.org/pdb/download/downloadLigandFiles.do?ligandIdList=XK2&structIdList=1HVR&instanceType=all&excludeUnobserved=false&includeHydrogens=false

The old error was:


Post-condition Violation
Element '' not found
Violation occurred on line 91 in file /home/jandom/workspace/rdkit/Code/GraphMol/PeriodicTable.h
Failed Expression: anum>-1


logLevel bug in MolFromInchi

There is a typo in inchi.py: "logLogLevel" is used where it should be "logLevel". Here is a reproducible example:

>>> Chem.MolFromInchi("InChI=1S/CH2/h1H2", logLevel=100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "..../site-packages/rdkit/Chem/inchi.py", line 76, in MolFromInchi
    if logLogLevel not in logLevelToLogFunctionLookup:
NameError: global name 'logLogLevel' is not defined

Reported by Andrew Dalke
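
A misspelled name like this only fails when the line that uses it actually runs, which is how it can slip through casual testing. The functions below are hypothetical stand-ins, not the code in inchi.py: only the names `logLevel`, `logLogLevel` and `logLevelToLogFunctionLookup` come from the traceback, and the `if logLevel is not None` guard and the table contents are assumptions.

```python
# Hypothetical stand-in for the lookup table named in the traceback
logLevelToLogFunctionLookup = {20: print, 30: print}

def buggy(logLevel=None):
    if logLevel is not None:
        # Typo: 'logLogLevel' is never defined, so this raises NameError
        if logLogLevel not in logLevelToLogFunctionLookup:
            raise ValueError('Unsupported log level: %d' % logLevel)
    return 'ok'

def fixed(logLevel=None):
    if logLevel is not None:
        if logLevel not in logLevelToLogFunctionLookup:
            raise ValueError('Unsupported log level: %d' % logLevel)
    return 'ok'

print(buggy())  # the default path never reaches the typo

try:
    buggy(logLevel=100)
except NameError as exc:
    print('NameError:', exc)

try:
    fixed(logLevel=100)
except ValueError as exc:
    print('ValueError:', exc)
```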

Molecules from InChI have incorrect molecular weight.

In [24]: m =Chem.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')

In [25]: em = Chem.EditableMol(m)

In [26]: em.RemoveBond(8,7)

In [27]: nm = em.GetMol()

In [29]: frags = Chem.GetMolFrags(nm,asMols=True)

In [30]: [Descriptors.MolWt(x) for x in frags]
Out[30]: [5.04, 6.048]

It doesn't always happen though:

In [31]: m = Chem.MolFromSmiles('CO')
In [32]: em = Chem.EditableMol(m)

In [33]: em.RemoveBond(0,1)

In [34]: nm = em.GetMol()

In [35]: frags = Chem.GetMolFrags(nm,asMols=True)

In [36]: [Descriptors.MolWt(x) for x in frags]
Out[36]: [16.043, 18.015]

MolFromInchi doesn't work

I am using Python 2.7.3:

from rdkit import Chem
m2 = Chem.inchi.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')

I got:

Traceback (most recent call last):
File "", line 1, in
AttributeError: 'module' object has no attribute 'MolFromInchi'

But if I use MolFromSmiles:

from rdkit import Chem
m2 = Chem.MolFromSmiles('C1CCC1')

it works.

mol fails to transfer to inchi format

from rdkit import Chem
from rdkit.Chem import BRICS
m1 = Chem.inchi.MolFromInchi('InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)')
m2 = BRICS.BreakBRICSBonds(m1)
Chem.MolToSmiles(m2,True)

I got

'[14*]c1nnc(C)nc1O.[16*]c1ccccc1'.

But when I try to get inchi format

Chem.inchi.MolToInchi(m2)

I got

[23:56:23] ERROR: Unknown element(s): *
''

By the way,
res = list(BRICS.FindBRICSBonds(m1))
res

I got
[((8, 7), ('14', '16'))]

What are '14' and '16'?

Thanks.

InChI generation code not recognizing stereo

reported by Jan Holst Jensen

> For example: InChI strings generated for spiro.mol (spiro.mol - attached):
>
> IUPAC:
> InChI=1S/2C9H14Cl2/c2*1-7(10)3-9(4-7)5-8(2,11)6-9/h2*3-6H2,1-2H3/t2*7-,8-,9-/m10/s1
> RDKit: InChI=1S/2C9H14Cl2/c2*1-7(10)3-9(4-7)5-8(2,11)6-9/h2*3-6H2,1-2H3

This one still doesn't recognize the stereo. I'll file a bug for it:
In [2]: Chem.MolToInchi(Chem.MolFromMolFile('spiro.mol'))
[09:53:16] WARNING: Omitted undefined stereo
Out[2]: 'InChI=1S/2C9H14Cl2/c2*1-7(10)3-9(4-7)5-8(2,11)6-9/h2*3-6H2,1-2H3'

Here's the file:

spiro.mol
  ACD/Labs0709041010  

 22 24  0  0  0  0  0  0  0  0  1 V2000
    9.2912   -9.4308    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.8009   -6.9967    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    7.7063   -7.9840    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.8986   -9.1405    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.1531   -6.3991    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.5689   -7.8459    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.7408   -6.2812    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.8789   -9.3129    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.3257   -7.7280    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.1335   -6.5716    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
   15.2311   -8.7153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.6519  -14.7621    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.1616  -12.3280    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    8.0670  -13.3153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.2593  -14.4717    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.5138  -11.7304    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.9296  -13.1772    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.1015  -11.6125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.2396  -14.6442    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   13.6864  -13.0593    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.4941  -11.9029    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   15.5918  -14.0466    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
  3  2  1  0  0  0  0
  3  4  1  1  0  0  0
  1  3  1  0  0  0  0
  5  3  1  0  0  0  0
  6  1  1  0  0  0  0
  5  6  1  0  0  0  0
  7  6  1  0  0  0  0
  6  8  1  1  0  0  0
  9  7  1  0  0  0  0
  8  9  1  0  0  0  0
 10  9  1  0  0  0  0
  9 11  1  1  0  0  0
 14 13  1  0  0  0  0
 14 15  1  1  0  0  0
 12 14  1  0  0  0  0
 16 14  1  0  0  0  0
 17 12  1  0  0  0  0
 16 17  1  0  0  0  0
 18 17  1  0  0  0  0
 17 19  1  1  0  0  0
 20 18  1  0  0  0  0
 19 20  1  0  0  0  0
 21 20  1  0  0  0  0
 20 22  1  1  0  0  0
M  END
>  <NAME>
spiro 

$$$$

Cannot generate coordinates for output from DeleteSubstructs

As the example shows, this came from problems with salt stripping and is not helped by sanitization.

In [15]: m = Chem.MolFromSmiles('[I-].C[n+]1c(\\C=C\\2/C=CC=CN2CC=C)sc3ccccc13') 

In [16]: sr = SaltRemover.SaltRemover()

In [17]: nm =sr(m)

In [18]: AllChem.Compute2DCoords(nm)
[05:47:08] 

****
Pre-condition Violation

Violation occurred on line 656 in file /scratch/RDKit_trunk/Code/GraphMol/Depictor/EmbeddedFrag.cpp
Failed Expression: d_eatoms.find(aid) == d_eatoms.end()
****

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-fbe6edb91321> in <module>()
----> 1 AllChem.Compute2DCoords(nm)

RuntimeError: Pre-condition Violation

In [19]: Chem.SanitizeMol(nm)
Out[19]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

In [20]: AllChem.Compute2DCoords(nm)
[05:47:18] 

****
Pre-condition Violation

Violation occurred on line 656 in file /scratch/RDKit_trunk/Code/GraphMol/Depictor/EmbeddedFrag.cpp
Failed Expression: d_eatoms.find(aid) == d_eatoms.end()
****

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-fbe6edb91321> in <module>()
----> 1 AllChem.Compute2DCoords(nm)

RuntimeError: Pre-condition Violation

In [21]: p = Chem.MolFromSmiles('[I-]')

In [22]: Chem.DeleteSubstructs(m,p)
Out[22]: <rdkit.Chem.rdchem.Mol at 0x3630980>

In [23]: nm2=Chem.DeleteSubstructs(m,p)

In [24]: AllChem.Compute2DCoords(nm2)
[05:47:57] 

****
Pre-condition Violation

Violation occurred on line 656 in file /scratch/RDKit_trunk/Code/GraphMol/Depictor/EmbeddedFrag.cpp
Failed Expression: d_eatoms.find(aid) == d_eatoms.end()
****

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-24-c2a107eeb439> in <module>()
----> 1 AllChem.Compute2DCoords(nm2)

RuntimeError: Pre-condition Violation

In [25]: Chem.MolToSmiles(nm2,True)
Out[25]: 'C=CCN1C=CC=C/C1=C\\c1sc2ccccc2[n+]1C'

BaseFeatures_DIP2_NoMicroSpecies.fdef not parseable

In [7]: ffact = ChemicalFeatures.BuildFeatureFactory('./BaseFeatures_DIP2_NoMicrospecies.fdef')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-be7869918589> in <module>()
----> 1 ffact = ChemicalFeatures.BuildFeatureFactory('./BaseFeatures_DIP2_NoMicrospecies.fdef')

ValueError:  pattern->getNumAtoms() != len(feature weight vector)

improper behavior for empty SDMolSuppliers

This is reasonable:

[14]>>> s = Chem.SDMolSupplier()

[15]>>> s.SetData("")

[16]>>> s.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
/scratch/RDKit_sf/<ipython-input-16-5e5e6532ea26> in <module>()
----> 1 s.next()

StopIteration: End of supplier hit

But this is bad:

[11]>>> s = Chem.SDMolSupplier()

[12]>>> s.SetData("")

[13]>>> len(s)
  [13]: 1

as is this:

[17]>>> s = Chem.SDMolSupplier()

[18]>>> s.SetData("")

[19]>>> s[0]

[20]>>>

and this is incomprehensible:

[17]>>> s = Chem.SDMolSupplier()

[18]>>> s.SetData("")

[19]>>> s[0]

[20]>>> s[1]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/scratch/RDKit_sf/<ipython-input-20-88de191fe097> in <module>()
----> 1 s[1]

IndexError: invalid index

[21]>>> s.SetData("")

[22]>>> s[0]

[23]>>> s.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
/scratch/RDKit_sf/<ipython-input-23-5e5e6532ea26> in <module>()
----> 1 s.next()

StopIteration: End of supplier hit

[24]>>> len(s)
  [24]: 2

MCS code does not support stereochemistry

Thread here: http://www.mail-archive.com/[email protected]/msg02934.html

In [2]: mol1 = Chem.MolFromSmiles("Fc1ccc(cc1)[C@@]3(OCc2cc(C#N)ccc23)CCCN(C)C") 
In [3]: mol2 = Chem.MolFromSmiles("Fc1ccc(cc1)[C@]3(OCc2cc(C#N)ccc23)CCCN(C)C")

In [4]: from rdkit.Chem import MCS

In [6]: MCS.FindMCS((mol1,mol2))
Out[6]: MCSResult(numAtoms=24, numBonds=26, smarts='[F]-[#6]:1:[#6]:[#6]:[#6](-[#6]-2(-[#6]-[#6]-[#6]-[#7](-[#6])-[#6])-[#8]-[#6]-[#6]:3:[#6]:[#6](:[#6]:[#6]:[#6]:3-2)-[#6]#[#7]):[#6]:[#6]:1', completed=1)
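
For readers on newer RDKit releases, the `rdFMCS` module offers a `FindMCS` function with a `matchChiralTag` option. The sketch below reuses the two molecules from the report. It only shows how the option is passed and makes no claim about how the original `MCS` module behaves; the `timeout` value is an arbitrary safety limit.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

mol1 = Chem.MolFromSmiles('Fc1ccc(cc1)[C@@]3(OCc2cc(C#N)ccc23)CCCN(C)C')
mol2 = Chem.MolFromSmiles('Fc1ccc(cc1)[C@]3(OCc2cc(C#N)ccc23)CCCN(C)C')

# Default search: chirality is not part of the atom comparison
plain = rdFMCS.FindMCS([mol1, mol2], timeout=10)

# Ask the search to take atom chiral tags into account
chiral = rdFMCS.FindMCS([mol1, mol2], matchChiralTag=True, timeout=10)

print('default:', plain.numAtoms, 'atoms')
print('matchChiralTag=True:', chiral.numAtoms, 'atoms')
```

Comparing the two atom counts shows whether the chiral-aware search treats the enantiomers differently.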

RemoveAtoms with chiral centers causes problems in SMILES generation

In [2]: smiles = "CCN1CCN(c2cc3[nH]c(C(=O)[C@@]4(CC)CC[C@](C)(O)CC4)nc3cc2Cl)CC1"

In [3]: mol = Chem.MolFromSmiles(smiles)

In [4]: tmp = Chem.EditableMol(mol)

In [5]: for atom in [29, 28, 27, 26, 25, 24, 8, 7, 6, 5, 4, 3, 2, 1, 0]: tmp.RemoveAtom(atom)

In [6]: mol = tmp.GetMol()

In [8]: Chem.MolToSmiles(mol)
[05:00:49] 

****
Range Error
idx
Violation occurred on line 153 in file /scratch/RDKit_git/Code/GraphMol/ROMol.cpp
Failed Expression: 0 <= 18 <= 14
****

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-2ae14fafa41a> in <module>()
----> 1 Chem.MolToSmiles(mol)

RuntimeError: Range Error

Remove the chiral spec and things work fine:

In [13]: smiles2 = "CCN1CCN(c2cc3[nH]c(C(=O)[C]4(CC)CC[C](C)(O)CC4)nc3cc2Cl)CC1"

In [14]: mol = Chem.MolFromSmiles(smiles2)

In [15]: tmp = Chem.EditableMol(mol)

In [16]: for atom in [29, 28, 27, 26, 25, 24, 8, 7, 6, 5, 4, 3, 2, 1, 0]: tmp.RemoveAtom(atom)

In [17]: mol = tmp.GetMol()

In [18]: Chem.MolToSmiles(mol)
Out[18]: 'CCC1(C(=O)c(n)[nH])CCC(C)(O)CC1'

reported by Dan Warner

Incorrect InChIs after clearing computed properties

(reported by Francis Atkinson)

from __future__ import print_function

from rdkit import Chem

old_mol=Chem.MolFromMolBlock("""
  Marvin  02211109112D

 13 12  0  0  0  0            999 V2000
   -0.7607  -10.6459    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0457  -10.2343    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6692  -10.6459    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    1.3843  -10.2343    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    2.0993  -10.6459    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    2.8142  -10.2343    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   -1.4740  -10.2352    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6692  -11.4731    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.3843   -9.4072    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.0993  -11.4731    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.5317  -10.6451    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2440  -10.2326    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.8132   -9.4072    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  3  4  1  0  0  0  0
  4  9  1  1  0  0  0
  1  2  1  0  0  0  0
  5 10  1  1  0  0  0
  4  5  1  0  0  0  0
  6 11  1  0  0  0  0
 11 12  1  0  0  0  0
  5  6  1  0  0  0  0
  6 13  1  6  0  0  0
  2  3  1  0  0  0  0
  1  7  1  0  0  0  0
  3  8  1  1  0  0  0
M  END
""")



main_mol = Chem.DeleteSubstructs(old_mol,Chem.MolFromSmiles('O=[Sb](=O)O'))
main_mol.ClearComputedProps()
Chem.SanitizeMol(main_mol)
print(Chem.MolToSmiles(old_mol,True))
print(Chem.MolToSmiles(main_mol,True))

old_mol.Debug()
main_mol.Debug()

print(Chem.MolToInchi(old_mol))
print(Chem.MolToInchi(main_mol))


logging in inchi.py module

inchi.MolFromInchi and inchi.MolToInchiAndAuxInfo contain a 'log(log)' statement. If the 'logLevel' argument is not None an error is produced.

aromatic Si written in SMILES, but cannot be read

In [2]: Chem.MolFromSmiles('Cc1cc[si](-c2cccc3ccc4cc5ccccc5cc4c32)[si](C)n1')
[04:48:35] SMILES Parse Error: syntax error for input: Cc1cc[si](-c2cccc3ccc4cc5ccccc5cc4c32)[si](C)n1

In [3]: Chem.MolFromSmiles('Cc1cc[Si](-c2cccc3ccc4cc5ccccc5cc4c32)[Si](C)n1')
Out[3]: <rdkit.Chem.rdchem.Mol at 0x242d440>

In [5]: Chem.CanonSmiles('C1=CC=CC=[Si]1')
Out[5]: 'c1cc[si]cc1'

Added USR descriptor

  • Ultrafast Shape Descriptor,
  • access via rdkit.Chem.Descriptors.USR,
  • added unit tests (sanity and numeric),
  • also some docs.
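
For context, here is a plain-Python sketch of USR as it is usually described: four reference points (the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one), each summarized by three moments of the atom-distance distribution. This is not the implementation from the pull request. The choice of mean, standard deviation and cube root of the third central moment is one convention among several, and the coordinates are made up.

```python
import math

def _moments(dists):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    cube_root = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), cube_root]

def usr(coords):
    """Return a 12-element shape vector for a list of (x, y, z) positions."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor

# Toy coordinates (made up for illustration, not a real conformer)
atoms = [(0.0, 0.0, 0.0), (1.4, 0.2, 0.0), (2.1, 1.3, 0.4),
         (3.6, 1.1, -0.5), (4.0, 2.5, 0.3)]
print(len(usr(atoms)))
```

Because every entry is built from interatomic distances, the vector does not change when the whole molecule is translated or rotated, which is what makes comparing shapes this way cheap.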

Hashed topological torsion fingerprints not compatible with old version.

2012_12_1:

In [9]: AllChem.GetHashedTopologicalTorsionFingerprint(Chem.MolFromSmiles('CCCCO'),nBits=4192).GetNonzeroElements()
Out[9]: {544: 1, 1760: 1}

2013_03_1:

In [3]: AllChem.GetHashedTopologicalTorsionFingerprint(Chem.MolFromSmiles('CCCCO'),nBits=4192).GetNonzeroElements()
Out[3]: {1974: 1, 3516: 1}

There's no good reason for this to be the case.

SDWriter failing with bad boost::any_cast on windows

[reported by Paul C]

In [10]: from rdkit import Chem

In [11]: from rdkit.Chem import Descriptors

In [12]: from rdkit.ML.Descriptors import MoleculeDescriptors

In [13]: m = Chem.MolFromSmiles('CC')

In [15]: nms=[x[0] for x in Descriptors._descList]

In [16]: calc = MoleculeDescriptors.MolecularDescriptorCalculator(nms)

In [17]: ds= calc.CalcDescriptors(m)

In [18]: w=Chem.SDWriter('blah.sdf')

In [19]: w.write(m)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-19-4b04ce05d7ef> in <module>()
----> 1 w.write(m)

RuntimeError: boost::bad_any_cast: failed conversion using boost::any_cast

Initial take on the USR Descriptor (no tests)

Hi Greg,

Here is my take on the USR Descriptor. It's 2x faster than Adrian's implementation but a lot less clear. It can probably be improved quickly. The numerics agree with Adrian's, but I should probably add a test case.

Jan
