molinfo-vienna / cdpkit

The Chemical Data Processing Toolkit

Home Page: https://cdpkit.org

License: GNU Lesser General Public License v2.1

CMake 1.07% C++ 85.05% Python 0.87% Shell 0.06% Pawn 0.02% Mermaid 12.91% Promela 0.02% Dockerfile 0.01%
chemoinformatics

cdpkit's Issues

The --score-only option causes shapescreen to report wrong similarity values

  • I believe this to be a bug with shapescreen
  • This is a feature request

Environment Information

ShapeScreen version: V1.0.0 (C) Thomas Seidel, Build: 202307120555
Operating system and version: Ubuntu 22 (Linux master 5.15.0-43-generic)

Expected Behavior

When a molecule is compared to itself, the Tanimoto shape similarity should be 1.

Actual Behavior

Shapescreen gives a shape similarity close to 0, which means no similarity.

Steps to Reproduce

Let me show you an example:

Its 3D representation is available here: ZINC12524304.sdf

Let's compare the shape of ZINC12524304 with itself:

shapescreen -d ZINC12524304.sdf -q ZINC12524304.sdf -o score_only.sdf  -r score_only.rpt -t 1 --score-only true

In the report file, score_only_1.rpt, we find:

Shape Tanimoto = 0.071
Color Tanimoto = 0.013
Tanimoto Combo = 0.084

These scores are far from the expected value of 1.
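To make the expectation concrete, here is a minimal sketch of an overlap-based Tanimoto score. It assumes the usual definition (mutual overlap divided by the sum of the two self-overlaps minus the mutual overlap); whether shapescreen computes its scores exactly this way is an assumption, and the volume value is made up for illustration.

```python
def tanimoto(overlap_ab: float, self_a: float, self_b: float) -> float:
    """Tanimoto coefficient from a mutual overlap and the two self-overlaps."""
    return overlap_ab / (self_a + self_b - overlap_ab)


# Hypothetical self-overlap volume; any positive number behaves the same way.
volume = 321.5

# For a molecule compared with itself, all three overlaps are identical,
# so the score is exactly 1 regardless of the volume.
self_score = tanimoto(volume, volume, volume)
print(self_score)

# Values quoted from the report file above: the combo score is consistent
# with the sum of the shape and color scores.
shape, color, combo = 0.071, 0.013, 0.084
print(abs((shape + color) - combo) < 1e-9)
```

Under this definition a self-comparison can only score below 1 if the mutual overlap is computed for a pose that differs from the pose used for the self-overlaps.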

make install fails with an error (due to missing doc files)

Installing this toolkit with the following commands:

mkdir build
cd build
cmake ..
make
make install

yields an error at the end of make install indicating that the */build/Doc/C++-API/html and */build/Doc/Python-API/html folders are missing.

I would suggest either not requiring those folders or running doxygen in */build/Doc/C++-API/ and */build/Doc/Python-API/ beforehand if available.
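As a stop-gap, a small pre-install check along these lines could report which of the two folders are missing. This is only a sketch: the relative paths are taken from the error description above, and the idea that creating empty placeholder folders is enough for make install to proceed is an untested assumption.

```python
from pathlib import Path

# Sub-directories named in the error report, relative to the build directory.
DOC_DIRS = ("Doc/C++-API/html", "Doc/Python-API/html")


def missing_doc_dirs(build_dir):
    """Return the documentation directories that do not exist under build_dir."""
    root = Path(build_dir)
    return [d for d in DOC_DIRS if not (root / d).is_dir()]


def ensure_doc_dirs(build_dir):
    """Create empty placeholders for missing doc directories (assumed workaround)."""
    created = missing_doc_dirs(build_dir)
    for d in created:
        (Path(build_dir) / d).mkdir(parents=True, exist_ok=True)
    return created
```

Calling missing_doc_dirs("build") before make install shows whether the documentation was generated; ensure_doc_dirs("build") is the hypothetical workaround.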

Conformer generation with restrictions

Hello Thomas,
congratulations on the release of CDPKit!
I'm interested in enumerating conformers where some atoms have fixed, pre-defined 3D coordinates and only the positions of the remaining atoms are sampled. Is this possible with CDPKit from Python? It seems not, because I could not find anything relevant in the documentation, but maybe I missed something.
Kind regards,
Pavel

ConfGen does not preserve pre-specified bond stereo

Hi developers,
I'm using the CDPKit confgen module, and I found a case where the conformations generated by confgen do not obey the pre-specified stereochemistry.
I'm using the latest master branch; I compiled the latest code and installed it myself.
My usage is: confgen -i test.smi -o test.sdf -t 32 -n 100 -C LARGE_SET_DIVERSE -v DEBUG

The smi file contains only one molecule; its SMILES is
[H]/C(=C1/S/C(=N\c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]

However, when I checked the generated conformations, I found that the bond stereo had changed: none of the generated conformations preserves the cis or trans configuration specified in the SMILES.

The SDF file generated by confgen is ligand_confs_generated.sdf. When I check the isomeric SMILES, I find that all of the embedded conformations have the wrong bond stereo. (I'm mainly an RDKit user.)

The SMILES derived from the embedded conformation is
[H]/C(=C1\S/C(=N/c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]
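A quick way to see where the two SMILES strings differ is a character-by-character comparison, sketched below. This is only a textual check: the same stereochemistry can be written in several ways, so a proper comparison would canonicalize both strings with a cheminformatics toolkit first. The function name is made up for illustration.

```python
# SMILES strings quoted from the report (raw strings keep the backslashes).
INPUT_SMILES = r"[H]/C(=C1/S/C(=N\c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]"
OUTPUT_SMILES = r"[H]/C(=C1\S/C(=N/c2c([H])c([H])c([H])c(C(=O)[O-])c2[H])N(C([H])([H])[H])C1=O)c1c([H])c([H])c([H])c([H])c1OC([H])([H])C(=O)[O-]"


def bond_direction_diffs(a, b):
    """List (index, char_in_a, char_in_b) wherever the two strings differ."""
    if len(a) != len(b):
        raise ValueError("strings differ in length; a positional diff is not meaningful")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


for index, before, after in bond_direction_diffs(INPUT_SMILES, OUTPUT_SMILES):
    print(index, before, after)
```

For these two strings the only differences are the two directional bond symbols, one next to the exocyclic C=C bond and one next to the C=N bond.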

And if we inspect the poses by eye, we can also see the difference:

[Image] This is the original structure, prepared from the RCSB PDB.

[Image] This is the structure generated by CDPKit confgen.

I hope this problem can be fixed soon. Thanks for your effort!

How to install on an M1 Mac

Can you please provide instructions for installing on an M1 Mac?

Following the guide, make gives me this error:

CDPKit/Libs/C++/Source/CDPL/Internal/StringDataIOUtilities.hpp:88:27:
error: no member named 'setlocale' in namespace 'std'; did you mean simply 'setlocale'?

Missing default value for an argument n in gen_confs.py

A minor issue: the help message states that the default value of the argument n is 100. However, the default is not set in the script, which causes an error if the user does not supply it.

python gen_confs.py -i 1.sdf -o 1_conf.sdf 
Traceback (most recent call last):
  File "/home/pavel/tmp/conforge/gen_confs.py", line 180, in <module>
    main()
  File "/home/pavel/tmp/conforge/gen_confs.py", line 64, in main
    conf_gen.settings.maxNumOutputConformers = args.max_confs # apply the -n argument
Boost.Python.ArgumentError: Python argument types in
    None.None(ConformerGeneratorSettings, NoneType)
did not match C++ signature:
    None(CDPL::ConfGen::ConformerGeneratorSettings {lvalue}, unsigned long)
python gen_confs.py -i 1.sdf -o 1_conf.sdf -n 100
- Generating conformers for molecule 'n1' (#1)...
 -> Generated 12 conformer(s)
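The usual fix for this kind of problem is to give the option an explicit default in the argument parser. The sketch below is not the actual gen_confs.py code; the attribute name max_confs comes from the traceback, while the other option names and help texts are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments (sketch; only -n mirrors the report)."""
    parser = argparse.ArgumentParser(description="Generate conformers (sketch).")
    parser.add_argument("-i", dest="in_file", required=True, help="input file")
    parser.add_argument("-o", dest="out_file", required=True, help="output file")
    # default=100 guarantees an int even when -n is omitted, so None is
    # never handed to a setting that expects an unsigned integer.
    parser.add_argument("-n", dest="max_confs", type=int, default=100,
                        help="max. number of output conformers (default: 100)")
    return parser.parse_args(argv)


args = parse_args(["-i", "1.sdf", "-o", "1_conf.sdf"])
print(args.max_confs)  # 100
```

With the default in place, the documented behavior and the actual behavior agree, and an explicit -n still overrides it.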

Cannot open benzene.xyz, which is formatted as follows

12
benzene
C -0.80396474 1.25550659 0.00000000
C 0.59119526 1.25550659 0.00000000
C 1.28873326 2.46325759 0.00000000
C 0.59107926 3.67176659 0.00000000
C -0.80374574 3.67168859 0.00000000
C -1.50134674 2.46348259 0.00000000
H -1.35372374 0.30318959 0.00000000
H 1.14070326 0.30299359 0.00000000
H 2.38841326 2.46333759 0.00000000
H 1.14127926 4.62390959 0.00000000
H -1.35386774 4.62396959 0.00000000
H -2.60095074 2.46366559 0.00000000
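To rule out a malformed file, the content can be checked against the plain XYZ layout (atom count, comment line, then one "element x y z" record per atom). The parser below is a minimal standalone sketch in pure Python; it is unrelated to how CDPKit reads XYZ files.

```python
BENZENE_XYZ = """12
benzene
C -0.80396474 1.25550659 0.00000000
C 0.59119526 1.25550659 0.00000000
C 1.28873326 2.46325759 0.00000000
C 0.59107926 3.67176659 0.00000000
C -0.80374574 3.67168859 0.00000000
C -1.50134674 2.46348259 0.00000000
H -1.35372374 0.30318959 0.00000000
H 1.14070326 0.30299359 0.00000000
H 2.38841326 2.46333759 0.00000000
H 1.14127926 4.62390959 0.00000000
H -1.35386774 4.62396959 0.00000000
H -2.60095074 2.46366559 0.00000000
"""


def parse_xyz(text):
    """Parse XYZ text into (comment, [(element, x, y, z), ...])."""
    lines = text.strip().splitlines()
    count = int(lines[0])          # line 1: number of atoms
    comment = lines[1].strip()     # line 2: free-form comment
    atoms = []
    for line in lines[2:2 + count]:
        element, x, y, z = line.split()
        atoms.append((element, float(x), float(y), float(z)))
    if len(atoms) != count:
        raise ValueError(f"expected {count} atoms, found {len(atoms)}")
    return comment, atoms


comment, atoms = parse_xyz(BENZENE_XYZ)
print(comment, len(atoms))
```

The file above parses cleanly under this layout, which suggests the content itself follows the plain XYZ convention.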

Energetic issues of conformer generated by ConfGen

Hi Thomas,
In this issue I want to report some weird conformations generated by CDPKit confgen.
test.zip

The uploaded zip file contains a few molecules with problematic conformations.
In test.sdf, I show some cases where confgen cannot rotate one of the torsion angles and the subsequent energy minimization fails to adjust the orientation of the hydrogen atoms, so the molecule ends up in a very high energy state.

For another molecule with SMILES: [H]Oc1c([H])c([H])c([H])c(/C(=N\\N([H])C(=O)c2nn([H])c(-c3c([H])c([H])c(F)c([H])c3[H])c2[H])C([H])([H])[H])c1[H], confgen failed with the return code FORCEFIELD_SETUP_FAILED:

- Molecule 1/1:
Found 1 molecular graph component
Force field setup failed: MMFF94InteractionParameterizer: could not determine MMFF94 type of atom #12
Conformer generation finished with return code FORCEFIELD_SETUP_FAILED
Processing time: 0.002s

I have encountered several cases like this before and wonder what causes them.
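When many molecules are processed, it can help to tally the return codes from the log output to see how often each failure occurs. The sketch below assumes the log line format shown above; the regular expression and function name are illustrative only.

```python
import re
from collections import Counter

# Matches the final status line of a confgen run as quoted above.
RETURN_CODE_RE = re.compile(r"Conformer generation finished with return code (\w+)")

LOG = """- Molecule 1/1:
Found 1 molecular graph component
Force field setup failed: MMFF94InteractionParameterizer: could not determine MMFF94 type of atom #12
Conformer generation finished with return code FORCEFIELD_SETUP_FAILED
Processing time: 0.002s
"""


def return_codes(log_text):
    """Extract every return code reported in the log text, in order."""
    return RETURN_CODE_RE.findall(log_text)


print(Counter(return_codes(LOG)))
```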

In ligand_confs_generated.sdf, the molecule contains a saturated five-membered ring with one chiral carbon. The carboxyl group connected to this chiral carbon normally forms an equatorial bond, as shown in the PDB cocrystal structure:
[Image: PDB cocrystal structure]

However, none of the conformers generated by confgen has this equatorial bond; they all have an axial bond instead.
[Image: conformer generated by confgen]

A subsequent MMFF94s energy minimization performed in RDKit resolves this. However, since CDPKit confgen has a built-in MMFF94 minimization step, I'm curious why this issue cannot be resolved within confgen itself. Is this related to missing rules in the torsion library or the fragment library? Or is it caused by wrong records in these two libraries?

I hope this helps and that these issues can be solved soon!
Thank you.

Bad conformers generated for several cases in Astex and Posebuster virtual screening datasets

Hi Thomas,
Recently we tested CDPKit confgen on the Astex and Posebuster datasets. We curated the crystal structures of the ligands, applied Glide-style fragmentation to the ligands, and then used confgen to regenerate conformations for these fragments.

confgen_problems.zip

Here I attach several cases where confgen failed to recover the crystal conformer of a ligand. We group these cases into 4 classes:

  1. Confgen generates some cis amide conformers, which we think should not happen and which are not found in the crystal structures.

  2. We found some cases with double bond stereo problems, as mentioned in the last issue. I gave confgen a SMILES converted from the crystal ligand and got wrong double bond stereo in return. The attached files contain two cases of this kind.

  3. Some ligands contain saturated heterocyclic rings. We give 2 cases where none of the generated conformations recovers the ring puckering state of the crystal conformer.

  4. The most serious issue at the moment: in many cases, torsion rules needed to recover the crystal conformer are still missing for some very specific torsion environments. None of the generated conformations matches the correct dihedral angle for those torsions, so we cannot reproduce a similar 3D orientation after regenerating conformations. For this class we give 12 cases. Some of them are easy, while several are much more difficult because the molecules are large and have more than 10 torsions. For these large cases, it may be better to sample all the torsion angles efficiently rather than to add torsion ranges to the library one by one.
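The torsion check described in class 4 can be sketched as follows: compute the dihedral angle from four atom positions and test whether any generated conformer comes within a tolerance of the crystal value. This is a standalone illustration in pure Python; the helper names and the 30 degree tolerance are assumptions, not the criteria actually used in the evaluation.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    u1, u2, u3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(u1, u2), _cross(u2, u3)
    y = math.sqrt(_dot(u2, u2)) * _dot(u1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, with wrap-around."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def torsion_recovered(crystal_deg, conformer_degs, tol=30.0):
    """True if any conformer torsion lies within tol degrees of the crystal value."""
    return any(angle_diff_deg(crystal_deg, c) <= tol for c in conformer_degs)
```

The wrap-around in angle_diff_deg matters because torsions of 175 and -178 degrees differ by only 7 degrees, not by 353.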

As last time, for all cases PDB_ID_ligand.sdf contains the crystal structure of the ligand and aligned_root_conf.sdf contains the conformers regenerated by confgen.

Thanks for the help; I hope these issues can be fixed soon!

How to use CDPKit to calculate the number of ligand-based pharmacophore features

I tried to use a Python script to count the potential pharmacophore features of a ligand (error1.txt), but an error is reported:

Boost.Python.ArgumentError: Python argument types in
DefaultPharmacophoreGenerator.__init__(DefaultPharmacophoreGenerator, bool)
did not match C++ signature:

I don't know how to solve this problem and hope to get help.

Here are the details of the Python script:
import sys
import os.path as path
import CDPL.Pharm as Pharm
import CDPL.Math as Math
import CDPL.Chem as Chem
import CDPL.Base as Base
import numpy
from collections import Counter

def count_features():
    if len(sys.argv) < 2:
        print('Usage:', sys.argv[0], '[input.sdf]', file=sys.stderr)
        sys.exit(2)

    ifs = Base.FileIOStream(sys.argv[1], 'r')

    ligand = Chem.BasicMolecule()
    sdf_reader = Chem.SDFMoleculeReader(ifs)

    lig_pharm = Pharm.BasicPharmacophore()
    # the constructor call below is the one that raises the reported ArgumentError
    pharm_gen = Pharm.DefaultPharmacophoreGenerator(True)
    ftr_list = list()

    # read the molecules one by one and collect the types of their features
    while sdf_reader.read(ligand):
        Chem.perceiveSSSR(ligand, True)
        Chem.setAromaticityFlags(ligand, False)
        Chem.setRingFlags(ligand, False)
        pharm_gen.generate(ligand, lig_pharm)
        ftr_list += [Pharm.getType(ftr) for ftr in lig_pharm]

    # write the number of features per feature type
    with open("feature_count.txt", "w+") as writer:
        writer.write(str(dict(Counter(ftr_list))))

if __name__ == '__main__':
    count_features()

Errors when importing CDPL

Hello,

I am trying to run the example from the README to test my installation.

Case 1: CDPKit is installed in a conda environment (cloned the repo and ran pip install -e .), and PYTHONPATH is empty. When trying to import CDPL, I get the following error:

>>> from CDPL import Chem
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vagrant/CDPKit/Libs/Python/CDPL/__init__.py", line 37, in <module>
    from ._cdpl import *
ModuleNotFoundError: No module named 'CDPL._cdpl'

Case 2: I built CDPKit from source into the /progs/cdpk folder and added /progs/cdpk/Include to PYTHONPATH. When trying the example, I get the following error:

>>> from CDPL import Chem
>>> from CDPL import Pharm
>>> mol = Chem.parseSMILES('Cc1ccccc1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'CDPL.Chem' has no attribute 'parseSMILES'

I would appreciate any help with this.
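In both cases it may help to check which files Python actually resolves for the package and its compiled extension. The sketch below uses only the standard library; the helper name locate is made up, and the module names are the ones that appear in the tracebacks above.

```python
import importlib.util


def locate(module_name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or fails to import.
        return None
    return None if spec is None else spec.origin


for name in ("CDPL", "CDPL._cdpl"):
    print(name, "->", locate(name))
```

If CDPL resolves to a source checkout or an include directory while CDPL._cdpl is not found, Python is picking up a directory without the compiled extension modules.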
