molinfo-vienna / cdpkit
The Chemical Data Processing Toolkit
Home Page: https://cdpkit.org
License: GNU Lesser General Public License v2.1
In the SDF output file, the property tags are styled inconsistently.
ShapeScreen version: V1.0.0 (C) Thomas Seidel, Build: 202307120555
Operating system and version: Ubuntu 22 (Linux master 5.15.0-43-generic)
When we compare a molecule to itself, the Tanimoto shape similarity should be 1.
ShapeScreen gives a shape similarity close to 0, which means no similarity.
The 3D structure of the test molecule is available here: ZINC12524304.sdf
Let's compare the shape of ZINC12524304 with itself:
shapescreen -d ZINC12524304.sdf -q ZINC12524304.sdf -o score_only.sdf -r score_only.rpt -t 1 --score-only true
From the report file, score_only_1.rpt, we find that:
Shape Tanimoto = 0.071 Color Tanimoto = 0.013 Tanimoto Combo = 0.084
All three scores are far from the values expected for a self-comparison.
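To make the expectation concrete, here is a minimal sketch of an overlap-based Tanimoto score. It assumes ShapeScreen follows the usual definition (overlap divided by the sum of the self-overlaps minus the overlap); the function name and the overlap value are hypothetical and only illustrate why a self-comparison should score 1.

```python
def overlap_tanimoto(self_overlap_a, self_overlap_b, overlap_ab):
    """Tanimoto score from overlap volumes: O_AB / (O_AA + O_BB - O_AB)."""
    denominator = self_overlap_a + self_overlap_b - overlap_ab
    if denominator <= 0:
        raise ValueError("overlap values are inconsistent")
    return overlap_ab / denominator


# Hypothetical self-overlap volume; for a molecule compared to itself
# the mutual overlap equals both self-overlaps.
volume = 412.5
self_score = overlap_tanimoto(volume, volume, volume)
print(self_score)

# The reported combo score is the sum of the two reported components.
reported_combo = 0.071 + 0.013
print(round(reported_combo, 3))
```

With identical inputs the numerator and denominator coincide, so any value well below 1 for a self-comparison points at the alignment or scoring step rather than at the molecule.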
Installing this toolkit with the following commands:
mkdir build
cd build
cmake ..
make
make install
yields an error at the end of make install indicating that the */build/Doc/C++-API/html and */build/Doc/Python-API/html folders are missing.
I would suggest either not requiring those folders or running doxygen
in */build/Doc/C++-API/ and */build/Doc/Python-API/ beforehand if available.
Hello Thomas,
Congratulations on the release of CDPKit!
I'm interested in enumerating conformers where some of the atoms have fixed, pre-defined 3D coordinates and only the positions of the remaining atoms are sampled. Is this possible in CDPKit from Python? It seems not, because I could not find anything relevant in the documentation, but maybe I missed something.
Kind regards,
Pavel
Hi developers,
I'm using the CDPKit confgen module, and I found a case where the conformations generated by confgen do not obey the pre-specified stereochemistry.
I'm using the latest master branch; I compiled and installed the latest code myself.
My usage is: confgen -i test.smi -o test.sdf -t 32 -n 100 -C LARGE_SET_DIVERSE -v DEBUG
The smi file contains only one molecule, whose SMILES is
[H]/C(=C1/S/C(=N\c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]
However, when I check the generated conformations, I find that the bond stereo has changed: none of the generated conformations preserves the cis/trans configuration specified in the SMILES.
The SDF file generated by confgen is ligand_confs_generated.sdf, and when I check the isomeric SMILES, all of the embedded conformations give the wrong bond stereo. (I'm mainly an RDKit user.)
The SMILES derived from the embedded conformation is
[H]/C(=C1\S/C(=N/c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]
And if we check the poses by eye, we can also see the difference:
This is the original structure, prepared from RCSB PDB.
This is the CDPKit confgen generated structure.
Hope this problem can be fixed soon, thanks for your effort!!
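The two SMILES above differ only in their bond direction markers, which can be made visible with a few lines of plain Python. This is a rough diagnostic sketch, not a stereo perception routine: the helper names are hypothetical, the comparison only makes sense when both strings share the same atom ordering, and a changed marker is a hint that should be confirmed with a toolkit's canonical isomeric SMILES.

```python
def bond_markers(smiles):
    """Return the '/' and '\\' direction markers in order of appearance."""
    return [ch for ch in smiles if ch in "/\\"]


def strip_markers(smiles):
    """Remove all direction markers so the remaining skeleton can be compared."""
    return smiles.replace("/", "").replace("\\", "")


def marker_differences(smiles_a, smiles_b):
    """List (index, marker_a, marker_b) for every direction marker that differs."""
    if strip_markers(smiles_a) != strip_markers(smiles_b):
        raise ValueError("SMILES differ in more than their direction markers")
    markers_a, markers_b = bond_markers(smiles_a), bond_markers(smiles_b)
    if len(markers_a) != len(markers_b):
        raise ValueError("SMILES carry a different number of direction markers")
    return [(i, a, b) for i, (a, b) in enumerate(zip(markers_a, markers_b)) if a != b]


# Input SMILES and the SMILES derived from the generated conformer (from the report).
input_smiles = r"[H]/C(=C1/S/C(=N\c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]"
output_smiles = r"[H]/C(=C1\S/C(=N/c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]"

differences = marker_differences(input_smiles, output_smiles)
print(differences)
```

For this pair, the second and fourth markers differ, one on the exocyclic C=C bond and one on the C=N bond.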
Could you please provide instructions for installing on an M1 Mac?
Following the guide, make gives me this error:
CDPKit/Libs/C++/Source/CDPL/Internal/StringDataIOUtilities.hpp:88:27:
error: no member named 'setlocale' in namespace 'std'; did you mean simply 'setlocale'?
A minor issue: the help message states that the default value of the argument n is 100. However, the default is not actually set in the script, which causes an error if the user does not supply it.
python gen_confs.py -i 1.sdf -o 1_conf.sdf
Traceback (most recent call last):
  File "/home/pavel/tmp/conforge/gen_confs.py", line 180, in <module>
    main()
  File "/home/pavel/tmp/conforge/gen_confs.py", line 64, in main
    conf_gen.settings.maxNumOutputConformers = args.max_confs # apply the -n argument
Boost.Python.ArgumentError: Python argument types in
    None.None(ConformerGeneratorSettings, NoneType)
did not match C++ signature:
    None(CDPL::ConfGen::ConformerGeneratorSettings {lvalue}, unsigned long)
python gen_confs.py -i 1.sdf -o 1_conf.sdf -n 100
- Generating conformers for molecule 'n1' (#1)...
-> Generated 12 conformer(s)
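One way to keep the help text and the behaviour in sync is to declare the default on the argument itself. The following is a minimal argparse sketch, not the actual gen_confs.py code: only the -i, -o and -n flags and the max_confs attribute come from the report above, while the other dest names and help strings are assumptions.

```python
import argparse


def build_parser():
    """Argument parser sketch where -n really defaults to 100."""
    parser = argparse.ArgumentParser(description="Conformer generation (argument handling only)")
    parser.add_argument("-i", dest="in_file", required=True, help="input molecule file")
    parser.add_argument("-o", dest="out_file", required=True, help="output conformer file")
    parser.add_argument("-n", dest="max_confs", type=int, default=100,
                        help="maximum number of output conformers (default: %(default)s)")
    return parser


# Same invocation as in the report, without -n.
args = build_parser().parse_args(["-i", "1.sdf", "-o", "1_conf.sdf"])
print(args.max_confs)
```

Using %(default)s in the help string means the documented default cannot drift away from the value that is actually applied.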
12
benzene
C -0.80396474 1.25550659 0.00000000
C 0.59119526 1.25550659 0.00000000
C 1.28873326 2.46325759 0.00000000
C 0.59107926 3.67176659 0.00000000
C -0.80374574 3.67168859 0.00000000
C -1.50134674 2.46348259 0.00000000
H -1.35372374 0.30318959 0.00000000
H 1.14070326 0.30299359 0.00000000
H 2.38841326 2.46333759 0.00000000
H 1.14127926 4.62390959 0.00000000
H -1.35386774 4.62396959 0.00000000
H -2.60095074 2.46366559 0.00000000
Hi Thomas,
In this issue I want to report some weird conformations generated from CDPKit confgen.
test.zip
In the uploaded zip file, there are a few molecules with problematic conformations.
In test.sdf, I show some cases where confgen cannot rotate one of the torsion angles, and the subsequent energy minimization then fails to adjust the orientation of the hydrogen, so that the molecule ends up in a very high energy state.
For another molecule, with SMILES [H]Oc1c([H])c([H])c([H])c(/C(=N\N([H])C(=O)c2nn([H])c(-c3c([H])c([H])c(F)c([H])c3[H])c2[H])C([H])([H])[H])c1[H], confgen failed with FORCEFIELD_SETUP_FAILED:
- Molecule 1/1:
Found 1 molecular graph component
Force field setup failed: MMFF94InteractionParameterizer: could not determine MMFF94 type of atom #12
Conformer generation finished with return code FORCEFIELD_SETUP_FAILED
Processing time: 0.002s
I have met several cases of this kind before and am wondering what causes them.
In ligand_confs_generated.sdf, the molecule contains a saturated 5-membered ring with one chiral carbon. The carboxyl group connected to this chiral carbon normally forms an equatorial bond, as shown in the PDB co-crystal structure:
However, none of the conformers generated by confgen has this equatorial bond; the bond is axial in all of them.
A subsequent MMFF94s energy minimization performed in RDKit resolves this. However, since CDPKit confgen has a built-in MMFF94 minimization step, I'm curious why the issue is not resolved inside confgen itself. Is it related to missing rules in the torsion library or the fragment library? Or is it caused by wrong records in these two libraries?
Hope this helps and hope these issues can be solved soon!!
Thank you.
Hi Thomas,
Recently we tested CDPKit confgen on the Astex and PoseBusters datasets. We curated the crystal structure of each ligand, applied a Glide-style fragmentation to the ligands, and then used confgen to regenerate conformations for these fragments.
Attached are several cases where confgen failed to recover the crystal conformer of the ligand. We categorize these cases into 4 classes:
1. confgen generates some cis-amide conformers, which we think should not happen and which are not found in the crystal structures.
2. We found cases with double-bond stereo problems, as mentioned in the last issue. I gave confgen a SMILES converted from the crystal ligand and got the wrong double-bond stereo back. The attached files contain two cases of this kind.
3. Some ligands contain saturated heterocyclic rings. We give 2 cases where none of the generated conformations reproduces the ring puckering state of the crystal conformer.
4. The most serious issue: in many cases, torsion rules are still missing for some very specific torsion environments. None of the generated conformations matches the correct dihedral angle for those torsions, and we cannot reproduce a similar 3D orientation after regenerating conformations. For this class we give 12 cases. Some of them are easy, while several are much more difficult because the molecules are large, with more than 10 torsions. For these large cases, it may be better to sample all torsion angles efficiently rather than adding torsion ranges to the library one by one.
As last time, for all cases, PDB_ID_ligand.sdf contains the crystal structure of the ligand, and aligned_root_conf.sdf contains the conformers regenerated by confgen.
Thanks for the help, and we hope these issues can be fixed soon!
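For the torsion comparisons described in this report, a dihedral angle can be computed directly from four atom positions. The sketch below uses the standard projection formula in plain Python; the function names are hypothetical and the coordinates are made-up test points, not data from the attached files.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def torsion_deviation_deg(angle_a, angle_b):
    """Smallest absolute difference between two angles, handling the 360 degree wrap."""
    return abs((angle_a - angle_b + 180.0) % 360.0 - 180.0)


# Made-up coordinates: a cis (0 degree) and a trans (180 degree) arrangement.
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans, torsion_deviation_deg(cis, trans))
```

Applied to the amide torsion, a value near 0 degrees flags a cis amide, and the deviation function gives a per-torsion measure of how far a generated conformer is from the crystal conformer.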
Excellent project!
I tried to use a Python script to compute the pharmacophore features of a ligand, but it reports an error (see the attachment):
error1.txt
Boost.Python.ArgumentError: Python argument types in
DefaultPharmacophoreGenerator.__init__(DefaultPharmacophoreGenerator, bool)
did not match C++ signature:
I don't know how to solve this problem and hope to get some help.
Here is the Python script in full:
import sys
import os.path as path
import CDPL.Pharm as Pharm
import CDPL.Math as Math
import CDPL.Chem as Chem
import CDPL.Base as Base
import numpy
from collections import Counter

def count_features():
    if len(sys.argv) < 2:
        print('Usage:', sys.argv[0], '[input.sdf]', file=sys.stderr)
        sys.exit(2)

    ifs = Base.FileIOStream(sys.argv[1], 'r')
    ligand = Chem.BasicMolecule()
    sdf_reader = Chem.SDFMoleculeReader(ifs)
    lig_pharm = Pharm.BasicPharmacophore()
    # this constructor call raises the reported Boost.Python.ArgumentError
    pharm_gen = Pharm.DefaultPharmacophoreGenerator(True)
    ftr_list = list()

    # read the molecules one by one and collect the feature types
    while sdf_reader.read(ligand):
        Chem.perceiveSSSR(ligand, True)
        Chem.setAromaticityFlags(ligand, False)
        Chem.setRingFlags(ligand, False)
        pharm_gen.generate(ligand, lig_pharm)
        ftr_list += [Pharm.getType(ftr) for ftr in lig_pharm]

    # write the per-type feature counts
    with open("feature_count.txt", "w+") as writer:
        writer.write(str(dict(Counter(ftr_list))))

if __name__ == '__main__':
    count_features()
Hello,
I am trying to run the example from the README to test my installation.
Case 1: CDPKit is installed in a conda environment (I cloned the repo and ran pip install -e .), and PYTHONPATH is empty. When trying to import CDPL, I get the following error:
>>> from CDPL import Chem
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/vagrant/CDPKit/Libs/Python/CDPL/__init__.py", line 37, in <module>
from ._cdpl import *
ModuleNotFoundError: No module named 'CDPL._cdpl'
Case 2: I built CDPKit from source into the /progs/cdpk folder and added /progs/cdpk/Include to PYTHONPATH. When trying the example, I get the following error:
>>> from CDPL import Chem
>>> from CDPL import Pharm
>>> mol = Chem.parseSMILES('Cc1ccccc1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'CDPL.Chem' has no attribute 'parseSMILES'
I would appreciate any help with this.
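When an import resolves to an unexpected location, it can help to ask Python where it would load each module from before importing anything else. The sketch below relies only on the standard library; the module names come from the tracebacks above, and the helper name is hypothetical.

```python
import importlib.util


def locate(module_name):
    """Return the file or search path a module would be loaded from, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when a parent package is missing or fails during its own import.
        return None
    if spec is None:
        return None
    return spec.origin or str(list(spec.submodule_search_locations or []))


# Names taken from the tracebacks above; None means "not importable from here".
for name in ("CDPL", "CDPL._cdpl", "CDPL.Chem"):
    print(name, "->", locate(name))
```

If CDPL resolves to the source checkout while CDPL._cdpl is reported as missing, the Python package was found but the compiled extension module was not built or installed next to it.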
The current version of confgen has two -p arguments:
-p [ --fixed-substr-min-atoms ] arg
-p [ --progress ] [=arg(=1)]
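Clashes like this can be caught automatically by scanning the help output for short flags that appear more than once. The following is a small sketch that assumes the help text uses the `-x [ --long-name ]` layout with ASCII double hyphens; the function name is hypothetical and the -i line is an illustrative extra entry, not confgen's actual option list.

```python
import re
from collections import defaultdict

# Matches lines such as "-p [ --progress ] [=arg(=1)]".
OPTION_RE = re.compile(r"^\s*-(\w) \[ --([\w-]+) \]")


def duplicate_short_flags(help_lines):
    """Map each short flag used more than once to the long option names sharing it."""
    by_flag = defaultdict(list)
    for line in help_lines:
        match = OPTION_RE.match(line)
        if match:
            by_flag[match.group(1)].append(match.group(2))
    return {flag: names for flag, names in by_flag.items() if len(names) > 1}


help_lines = [
    "-i [ --input ] arg",  # illustrative entry
    "-p [ --fixed-substr-min-atoms ] arg",
    "-p [ --progress ] [=arg(=1)]",
]
clashes = duplicate_short_flags(help_lines)
print(clashes)
```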