molinfo-vienna / cdpkit

The Chemical Data Processing Toolkit

Home Page: https://cdpkit.org

License: GNU Lesser General Public License v2.1

CMake 1.07% C++ 85.05% Python 0.87% Shell 0.06% Pawn 0.02% Mermaid 12.91% Promela 0.02% Dockerfile 0.01%

chemoinformatics

cdpkit's Issues

Inconsistent style in the tag of SDF output file

In the SDF output file, the property tags use inconsistent styles.

The score-only option makes shapescreen report wrong values

I believe this to be a bug with shapescreen
This is a feature request

Environment Information

ShapeScreen version: V1.0.0 (C) Thomas Seidel, Build: 202307120555
Operating system and version: Ubuntu 22 (Linux master 5.15.0-43-generic)

Expected Behavior

When a molecule is compared to itself, the Tanimoto shape similarity should be 1.

Actual Behavior

Shapescreen gives a shape similarity close to 0, which means no similarity.

Steps to Reproduce

Let me show you an example:

Its 3D representation is available here: ZINC12524304.sdf

Let's compare the shape of ZINC12524304 with itself:

shapescreen -d ZINC12524304.sdf -q ZINC12524304.sdf -o score_only.sdf  -r score_only.rpt -t 1 --score-only true

The report file, score_only_1.rpt, shows:

Shape Tanimoto = 0.071
Color Tanimoto = 0.013
Tanimoto Combo = 0.084

All of these scores are far from 1.
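
For reference, the expected value of 1 follows directly from the usual definition of the Tanimoto coefficient on overlap volumes. The sketch below only illustrates that generic formula; it is not shapescreen's implementation, and the function name and the volume value are made up for the example.

```python
def shape_tanimoto(self_overlap_a, self_overlap_b, cross_overlap):
    """Tanimoto coefficient from overlap volumes: O_ab / (O_aa + O_bb - O_ab)."""
    return cross_overlap / (self_overlap_a + self_overlap_b - cross_overlap)


# Self-comparison: the cross overlap equals the self overlap, so the score is 1.
volume = 412.5  # hypothetical self-overlap volume, not taken from the report
print(shape_tanimoto(volume, volume, volume))  # 1.0
```

The reported combo value matches the sum of the other two scores (0.071 + 0.013 = 0.084), so if the combo is that plain sum, a correct self-comparison would presumably give a combo close to 2.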

make install prompts the user with error (due to missing doc files)

Installing this toolkit with the following commands:

mkdir build
cd build
cmake ..
make
make install

yields an error at the end of make install indicating that the */build/Doc/C++-API/html and */build/Doc/Python-API/html folders are missing.

I would suggest either not requiring those folders or running doxygen in */build/Doc/C++-API/ and */build/Doc/Python-API/ beforehand if available.
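
Until this is changed, one workaround is to check for the two folders before running make install. The snippet below is a minimal sketch of such a check; the relative paths come from the error described above, while the helper name and the assumption that the build directory is called build are illustrative.

```python
from pathlib import Path

# Sub-directories named in the error message, relative to the build directory.
DOC_DIRS = ("Doc/C++-API/html", "Doc/Python-API/html")


def missing_doc_dirs(build_dir):
    """Return the documentation output folders that do not exist under build_dir."""
    root = Path(build_dir)
    return [d for d in DOC_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_doc_dirs("build")  # assumed build directory name
    if missing:
        print("make install will likely fail; missing:", ", ".join(missing))
    else:
        print("documentation folders present")
```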

Conformer generation with restrictions

Hello Thomas,
congratulations on the release of CDPKit!
I'm interested in enumerating conformers where a subset of the atoms has fixed, predefined 3D coordinates and only the positions of the remaining atoms are sampled. Is this possible in CDPKit from Python? It seems not, because I could not find anything relevant in the documentation, but maybe I missed something.
Kind regards,
Pavel

ConfGen does not preserve pre-specified bond stereo

Hi developers,
I'm using the CDPKit confgen module, and I found a case where the conformations generated by confgen do not obey the pre-specified stereochemistry.
I'm using the latest master branch, which I compiled and installed myself.
My command line is: confgen -i test.smi -o test.sdf -t 32 -n 100 -C LARGE_SET_DIVERSE -v DEBUG

The smi file contains only one molecule, with the SMILES
[H]/C(=C1/S/C(=N\c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]

However, when I check the generated conformations, the bond stereo has changed: none of them preserves the cis or trans form specified in the SMILES.

The SDF file generated by confgen is ligand_confs_generated.sdf. When I check the isomeric SMILES, all of the embedded conformations give the wrong bond stereo. (I'm mainly an RDKit user.)

The SMILES converted from an embedded conformation is
[H]/C(=C1\S/C(=N/c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]

And if we check the poses by eye, we can also see the difference:

This is the original structure, prepared from RCSB PDB.

This is the CDPKit confgen generated structure.

Hope this problem can be fixed soon, thanks for your effort!!
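
To make the difference between the two SMILES easier to see, here is a small sketch that lists the positions where the bond direction marks disagree. A plain character comparison is only meaningful here because both strings happen to list the atoms in the same order; in general one would compare canonical isomeric SMILES from a toolkit instead. The function name is made up for this example.

```python
# Input SMILES given to confgen and the SMILES read back from a generated conformer.
INPUT_SMILES = r"[H]/C(=C1/S/C(=N\c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]"
OUTPUT_SMILES = r"[H]/C(=C1\S/C(=N/c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]"

BOND_MARKS = "/\\"


def flipped_bond_marks(a, b):
    """Return indices where both strings have a bond direction mark but disagree."""
    if len(a) != len(b):
        raise ValueError("strings differ in length; compare canonical SMILES instead")
    return [i for i, (x, y) in enumerate(zip(a, b))
            if x != y and x in BOND_MARKS and y in BOND_MARKS]


print(flipped_bond_marks(INPUT_SMILES, OUTPUT_SMILES))  # [9, 16]
```

The two positions are the marks after the exocyclic C=C1 and after the C=N, i.e. both double bonds are affected.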

How to install on M1 mac

Can you please provide instructions for installing on an M1 Mac?

When I follow the guide, make gives me this error:

CDPKit/Libs/C++/Source/CDPL/Internal/StringDataIOUtilities.hpp:88:27:
error: no member named 'setlocale' in namespace 'std'; did you mean simply 'setlocale'?

Missing default value for an argument n in gen_confs.py

A minor issue: the help message says that the default value of the argument n is 100. However, the default is not set in the script, which causes an error if the user does not supply the argument.

python gen_confs.py -i 1.sdf -o 1_conf.sdf 
Traceback (most recent call last):
  File "/home/pavel/tmp/conforge/gen_confs.py", line 180, in <module>
    main()
  File "/home/pavel/tmp/conforge/gen_confs.py", line 64, in main
    conf_gen.settings.maxNumOutputConformers = args.max_confs # apply the -n argument
Boost.Python.ArgumentError: Python argument types in
    None.None(ConformerGeneratorSettings, NoneType)
did not match C++ signature:
    None(CDPL::ConfGen::ConformerGeneratorSettings {lvalue}, unsigned long)

python gen_confs.py -i 1.sdf -o 1_conf.sdf -n 100
- Generating conformers for molecule 'n1' (#1)...
 -> Generated 12 conformer(s)
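
A possible fix, sketched with argparse: give -n an explicit integer default so that args.max_confs is never None. The option letters and the max_confs name are taken from the commands and traceback above; the other destination names and help strings are assumptions, and the real script may define its parser differently.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Generate conformers (argument handling only).")
    parser.add_argument("-i", dest="in_file", required=True, help="input molecule file")
    parser.add_argument("-o", dest="out_file", required=True, help="output conformer file")
    # Without default=..., argparse stores None, which the C++ setter rejects.
    parser.add_argument("-n", dest="max_confs", type=int, default=100,
                        help="max. number of output conformers (default: 100)")
    return parser


args = build_parser().parse_args(["-i", "1.sdf", "-o", "1_conf.sdf"])
print(args.max_confs)  # 100
```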

Cannot open benzene.xyz format as follows

12
benzene
C -0.80396474 1.25550659 0.00000000
C 0.59119526 1.25550659 0.00000000
C 1.28873326 2.46325759 0.00000000
C 0.59107926 3.67176659 0.00000000
C -0.80374574 3.67168859 0.00000000
C -1.50134674 2.46348259 0.00000000
H -1.35372374 0.30318959 0.00000000
H 1.14070326 0.30299359 0.00000000
H 2.38841326 2.46333759 0.00000000
H 1.14127926 4.62390959 0.00000000
H -1.35386774 4.62396959 0.00000000
H -2.60095074 2.46366559 0.00000000
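
For readers unfamiliar with the layout, the file above follows the common XYZ convention: an atom count, a comment line, then one "element x y z" record per atom. The sketch below parses that layout in plain Python; it is not CDPKit's reader, and the function name is made up for this example.

```python
BENZENE_XYZ = """\
12
benzene
C -0.80396474 1.25550659 0.00000000
C 0.59119526 1.25550659 0.00000000
C 1.28873326 2.46325759 0.00000000
C 0.59107926 3.67176659 0.00000000
C -0.80374574 3.67168859 0.00000000
C -1.50134674 2.46348259 0.00000000
H -1.35372374 0.30318959 0.00000000
H 1.14070326 0.30299359 0.00000000
H 2.38841326 2.46333759 0.00000000
H 1.14127926 4.62390959 0.00000000
H -1.35386774 4.62396959 0.00000000
H -2.60095074 2.46366559 0.00000000
"""


def parse_xyz(text):
    """Parse XYZ text into (comment, [(element, x, y, z), ...])."""
    lines = text.strip().splitlines()
    count = int(lines[0])       # line 1: number of atoms
    comment = lines[1].strip()  # line 2: free-form comment or title
    atoms = []
    for line in lines[2:2 + count]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    if len(atoms) != count:
        raise ValueError(f"expected {count} atoms, found {len(atoms)}")
    return comment, atoms


comment, atoms = parse_xyz(BENZENE_XYZ)
print(comment, len(atoms))  # benzene 12
```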

Energetic issues of conformer generated by ConfGen

Hi Thomas,
In this issue I want to report some odd conformations generated by CDPKit confgen.
test.zip

In the uploaded zip file, there are a few molecules with problematic conformations.
test.sdf contains cases where confgen does not rotate one of the torsion angles and the subsequent energy minimization fails to adjust the orientation of the H atom, so the molecule ends up in a very high-energy state.

For another molecule with SMILES: [H]Oc1c([H])c([H])c([H])c(/C(=N\\N([H])C(=O)c2nn([H])c(-c3c([H])c([H])c(F)c([H])c3[H])c2[H])C([H])([H])[H])c1[H], confgen failed with FORCEFIELD_SETUP_FAILED:

- Molecule 1/1:
Found 1 molecular graph component
Force field setup failed: MMFF94InteractionParameterizer: could not determine MMFF94 type of atom #12
Conformer generation finished with return code FORCEFIELD_SETUP_FAILED
Processing time: 0.002s

I have met several cases of this kind before and wonder what causes them.

In ligand_confs_generated.sdf, the molecule contains a saturated 5-membered ring with one chiral carbon. The carboxyl group connected to this chiral carbon normally forms an equatorial bond, as shown in the PDB co-crystal structure:

However, none of the conformers generated by confgen has this equatorial bond; in all of them the bond is axial instead.

A subsequent MMFF94s energy minimization in RDKit resolves this. However, since CDPKit confgen has a built-in MMFF94 minimization step, I'm curious why the issue is not resolved inside confgen itself. Is this related to missing rules in the torsion library or the fragment library? Or is it caused by wrong records in these two libraries?

Hope this helps and hope these issues can be solved soon!!
Thank you.

Bad conformers generated for several cases in the Astex and PoseBusters virtual screening datasets

Hi Thomas,
We recently tested CDPKit confgen on the Astex and PoseBusters datasets. We curated the crystal structure of each ligand, applied a Glide-style fragmentation to the ligands, and then used confgen to regenerate conformations for the resulting fragments.

confgen_problems.zip

Attached are several cases where confgen failed to recover the crystal conformer of the ligand. We sort these cases into 4 classes:

1. confgen generates some cis-amide conformers, which we think should not happen and which are not found in the crystal structures.
2. Some cases show the double-bond stereo problem mentioned in the last issue. I gave confgen a SMILES converted from the crystal ligand and got the wrong double-bond stereo back. The attached files contain two such cases.
3. Some ligands contain saturated heterocycles. We give 2 cases where none of the generated conformations recovers the ring puckering state of the crystal conformer.
4. The most serious issue at the moment: in many cases the torsion rules needed to recover the crystal conformer are still missing for very specific torsion environments. None of the generated conformations matches the correct dihedral angle for those torsions, and we cannot reproduce a similar 3D orientation after regenerating conformations. We give 12 cases for this class. Some of them are easy, while several are much more difficult because the molecules are large and have more than 10 torsions. For these large cases it may be better to sample all torsion angles efficiently than to add torsion ranges to the library one by one.
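
For readers who want to check individual torsions (for example the amide bond in class 1 or the problematic torsions in class 4), the following is a minimal sketch of how a dihedral angle can be computed from four atom positions and compared between the crystal and a generated conformer. It is plain Python and independent of CDPKit; the function names and the example coordinates are illustrative only.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0 = [a - b for a, b in zip(p0, p1)]
    b1 = [a - b for a, b in zip(p2, p1)]
    b2 = [a - b for a, b in zip(p3, p2)]
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bond vectors onto the plane perpendicular to the central bond.
    v = [a - _dot(b0, b1) * b for a, b in zip(b0, b1)]
    w = [a - _dot(b2, b1) * b for a, b in zip(b2, b1)]
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


# Idealized planar arrangements: substituents on the same side (cis) or opposite sides (trans).
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(round(abs(cis)), round(abs(trans)))
```

A generated conformer could then be flagged when angle_difference between its torsion and the crystal torsion exceeds some tolerance chosen by the user.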

As before, for every case PDB_ID_ligand.sdf holds the crystal structure of the ligand and aligned_root_conf.sdf holds the conformers regenerated by confgen.

Thanks for the help, and I hope these can be fixed soon!

support xyz format or not?

Excellent project!

How to use CDPKit to calculate the number of ligand-based pharmacophore features

I tried to use a Python script to count the potential pharmacophore features of a ligand,
error1.txt
but an error is reported:

Boost.Python.ArgumentError: Python argument types in
    DefaultPharmacophoreGenerator.__init__(DefaultPharmacophoreGenerator, bool)
did not match C++ signature:

I don't know how to solve this problem and hope to get some help.

Here is the Python script:

import sys
import os.path as path
import CDPL.Pharm as Pharm
import CDPL.Math as Math
import CDPL.Chem as Chem
import CDPL.Base as Base
import numpy
from collections import Counter


def count_features():
    if len(sys.argv) < 2:
        print('Usage:', sys.argv[0], '[input.sdf]', file=sys.stderr)
        sys.exit(2)

    ifs = Base.FileIOStream(sys.argv[1], 'r')

    ligand = Chem.BasicMolecule()
    sdf_reader = Chem.SDFMoleculeReader(ifs)

    lig_pharm = Pharm.BasicPharmacophore()
    # This constructor call is the one that raises the ArgumentError shown above.
    pharm_gen = Pharm.DefaultPharmacophoreGenerator(True)
    ftr_list = list()

    # Collect the type of every feature perceived for each ligand in the file.
    while sdf_reader.read(ligand):
        Chem.perceiveSSSR(ligand, True)
        Chem.setAromaticityFlags(ligand, False)
        Chem.setRingFlags(ligand, False)
        pharm_gen.generate(ligand, lig_pharm)
        ftr_list += [Pharm.getType(ftr) for ftr in lig_pharm]

    # Write the per-type feature counts as a dictionary literal.
    with open("feature_count.txt", "w+") as writer:
        writer.write(str(dict(Counter(ftr_list))))


if __name__ == '__main__':
    count_features()

Errors when import CDPL

Hello,

I am trying to run the example from the README to test my installation.

Case 1: CDPKit is installed in a conda environment (I cloned the repo and ran pip install -e .), and PYTHONPATH is empty. When I try to import CDPL, I get the following error:

>>> from CDPL import Chem
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vagrant/CDPKit/Libs/Python/CDPL/__init__.py", line 37, in <module>
    from ._cdpl import *
ModuleNotFoundError: No module named 'CDPL._cdpl'

Case 2: I built CDPKit from source into the /progs/cdpk folder and added /progs/cdpk/Include to PYTHONPATH. When I try the example, I get the following error:

>>> from CDPL import Chem
>>> from CDPL import Pharm
>>> mol = Chem.parseSMILES('Cc1ccccc1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'CDPL.Chem' has no attribute 'parseSMILES'
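
When debugging cases like these, it can help to see which files Python would actually load. The snippet below is a generic diagnostic sketch using only the standard library; the helper name is made up, and the output depends entirely on the local installation.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised, for example, when a parent package is missing or fails to import.
        return None
    return spec.origin if spec is not None else None


print(module_origin("CDPL"))        # location of the CDPL package, or None
print(module_origin("CDPL._cdpl"))  # the compiled extension reported missing in case 1
```

If the first path points into the source checkout rather than an installed package, or the second call returns None, the compiled extension modules are probably not where the Python package expects them.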

I will appreciate any help with it.

Confgen has duplicated arg names

The current version of confgen has two -p arguments:

-p [ --fixed-substr-min-atoms ] arg
-p [ --progress ] [=arg(=1)]
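
Clashes like this can be spotted automatically by scanning the help output. The sketch below groups long option names by their short flag and reports any short flag used more than once. It assumes help lines of the form "-x [ --long-name ] ..." with a double hyphen before the long name; the function name and regular expression are illustrative, not part of confgen.

```python
import re
from collections import defaultdict

# Matches lines such as "-p [ --progress ] [=arg(=1)]".
OPTION_RE = re.compile(r"^\s*(-\w)\s*\[\s*(--[\w-]+)\s*\]")


def duplicate_short_flags(help_text):
    """Map each short flag that appears more than once to its long option names."""
    by_short = defaultdict(list)
    for line in help_text.splitlines():
        match = OPTION_RE.match(line)
        if match:
            by_short[match.group(1)].append(match.group(2))
    return {short: longs for short, longs in by_short.items() if len(longs) > 1}


HELP = """\
-p [ --fixed-substr-min-atoms ] arg
-p [ --progress ] [=arg(=1)]
"""

print(duplicate_short_flags(HELP))
# {'-p': ['--fixed-substr-min-atoms', '--progress']}
```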