molwitch-cdk's Introduction

Which Internal Toolkit for CHemicals

MolWitch implementation using CDK

Available on Maven Central

Usually, two dependencies are needed. The first adds the MolWitch API:

<dependency>
  <groupId>gov.nih.ncats</groupId>
  <artifactId>molwitch</artifactId>
  <version>0.6.1</version>
</dependency>

A MolWitch implementation is also required. To add the CDK implementation (this version uses CDK 2.6):

<dependency>
  <groupId>gov.nih.ncats</groupId>
  <artifactId>molwitch-cdk</artifactId>
  <version>1.0.9</version>
</dependency>

Current API Contract Compliance

Results from running the API Contract against the latest molwitch-cdk code:

| Feature | Compliance Level | Comments |
| --- | --- | --- |
| Extended Tetrahedral | PARTIALLY | |
| Fingerprint | FULLY | |
| fullInchi | FULLY (998) | |
| fullInchi | NOT_COMPLIANT (2) | |
| Valence Error | FULLY (1) | |
| Valence Error | PARTIALLY (1) | |
| Valence Error | NOT_COMPLIANT (1) | Hypervalent Hydrogen Incorrect Valence |
| parse mol wierd parity | FULLY | |
| inchiKey | FULLY (998) | |
| inchiKey | NOT_COMPLIANT (2) | |
| Remove Non Descript Hydrogens | FULLY | |
| Inchi | FULLY | |
| Default Fingerprinter | FULLY | |
| Mol Parser | FULLY | |
| Create Chemical | FULLY | |
| MolSearcher | FULLY | |
| Write Mol | FULLY | |
| Problematic Smiles | FULLY | |
| Chemical Source | FULLY | |
| Clone Chemical | FULLY | |
| Atom Alias | FULLY | |
| mol parser unknown format | FULLY | |
| Hetero Atom Tetrahedral | FULLY | |
| Tetrahedral | FULLY | |
| Atom Path Traversal | FULLY | |
| Atom Coords | FULLY | |
| Isotopes | FULLY | |
| Cis/Trans | FULLY | |

molwitch-cdk's People

Contributors

chemmitch, dkatzel-ncats, tylerperyea

molwitch-cdk's Issues

CDK AtomContainer2 Survey/Feedback

Hi NCATS!

I am looking for feedback.

It looks like you create all your CDK IAtomContainers via the builder, which is the preferred way. This also means you've been using AtomContainer2 for some time (there is a system property to turn it off). Did you have any issues or pain points with this? Our plan was to remove the old one for v3.0, but obviously the smoother we can make this the better.

Some downstream projects (e.g. AMBIT) look to be creating their containers with new AtomContainer so they may have a bigger "shock".

Thanks,
John

Ring combinatorial explosion causes problems

AllRingsFinder arf = new AllRingsFinder();

This can throw an exception:

Caused by: org.openscience.cdk.exception.CDKException: Threshold exceeded for AllRingsFinder

We don't need to find all rings; we just need to mark the bonds that are in rings. If CDK has a detector for a more limited set of rings, that's certainly preferable. If there is no other option, it would also be totally acceptable to find only rings of fewer than 12 atoms. Again, this is only used to find the bonds that are in rings.

A good example of something that fails here if you call "isRingBond()" is this structure:


   JSDraw209182020322D

 78 90  0  0  1  0              0 V2000
   33.2167  -28.5376    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   34.3601  -29.5987    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   31.7264  -28.9972    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   31.3788  -30.5182    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   33.5637  -27.0168    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   35.0546  -26.5569    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   35.4014  -25.0361    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   36.9224  -24.6891    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   37.5990  -23.2836    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   36.9224  -21.8780    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   35.4014  -21.5310    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   34.1818  -22.5039    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   34.1818  -24.0638    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   37.8013  -20.5892    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   39.3567  -20.7060    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   40.2354  -19.4169    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   41.7908  -19.5337    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   42.4679  -18.1280    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   43.9889  -17.7811    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   43.9889  -16.2211    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   45.2080  -15.2485    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   46.7290  -15.5952    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   47.9958  -14.6850    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   48.3802  -13.1732    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   47.5016  -11.8846    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   47.9618  -10.3937    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   46.8178   -9.3325    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   49.4136   -9.8236    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   48.1248   -8.9449    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   49.4136   -8.2636    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   50.7648   -7.4839    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   50.3609   -5.9772    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   51.1407   -4.6260    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   50.1913   -3.3886    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   52.6473   -4.2221    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   53.9985   -5.0025    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   55.1015   -3.8992    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   56.6085   -4.3031    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   57.0118   -5.8097    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   55.9091   -6.9130    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   54.4024   -6.5091    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   53.6221   -7.8598    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   52.1154   -8.2636    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   52.1154   -9.8236    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   50.7648  -10.6040    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   50.9970  -12.1461    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   49.9362  -13.2895    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   51.0528  -14.3787    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   51.2082  -15.9311    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   50.3295  -17.2199    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   48.8278  -17.6425    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   47.4061  -17.0009    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   46.7290  -18.4066    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   45.2080  -18.7533    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   45.2080  -20.3135    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   43.9889  -21.2861    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   42.4679  -20.9388    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   41.5892  -22.2280    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   40.0338  -22.1111    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   39.1550  -23.4003    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   39.8315  -24.8054    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   58.4419   -6.4335    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   59.8211   -5.7046    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   61.0036   -6.7220    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   60.7135   -8.2551    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   62.4756   -6.2070    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   63.6584   -7.2244    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   62.7657   -4.6742    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   64.0479   -5.5627    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   65.2894   -4.6175    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   64.7738   -3.1453    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   63.2144   -3.1803    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   61.5831   -3.6568    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   60.1111   -4.1720    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   59.0936   -2.9893    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   59.7174   -1.5600    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   57.5346   -3.0477    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   56.8057   -1.6685    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  6  0  0  0
  1  3  1  0  0  0  0
  1  5  1  0  0  0  0
  3  4  1  0  0  0  0
  6  5  2  0  0  0  0
  7  6  1  1  0  0  0
  8  7  1  0  0  0  0
  7 13  1  0  0  0  0
  9  8  1  1  0  0  0
  9 10  1  0  0  0  0
 60  9  1  0  0  0  0
 10 11  1  0  0  0  0
 10 14  1  6  0  0  0
 11 12  1  0  0  0  0
 12 13  2  0  0  0  0
 15 14  1  6  0  0  0
 15 16  1  0  0  0  0
 15 59  1  0  0  0  0
 17 16  1  6  0  0  0
 17 18  1  0  0  0  0
 17 57  1  0  0  0  0
 19 18  1  6  0  0  0
 19 20  1  0  0  0  0
 19 54  1  0  0  0  0
 21 20  2  0  0  0  0
 22 21  1  0  0  0  0
 22 23  1  6  0  0  0
 22 52  1  0  0  0  0
 24 23  1  6  0  0  0
 24 25  1  0  0  0  0
 24 47  1  0  0  0  0
 26 25  1  0  0  0  0
 26 27  1  1  0  0  0
 28 26  1  0  0  0  0
 28 29  1  1  0  0  0
 30 28  1  0  0  0  0
 45 28  1  0  0  0  0
 31 30  1  6  0  0  0
 31 32  1  0  0  0  0
 31 43  1  0  0  0  0
 33 32  1  0  0  0  0
 33 34  1  6  0  0  0
 33 35  1  0  0  0  0
 36 35  1  6  0  0  0
 36 37  1  0  0  0  0
 36 41  1  0  0  0  0
 38 37  1  6  0  0  0
 38 39  1  0  0  0  0
 77 38  1  0  0  0  0
 39 40  1  1  0  0  0
 39 62  1  0  0  0  0
 41 40  1  1  0  0  0
 41 42  1  0  0  0  0
 43 42  1  1  0  0  0
 43 44  1  0  0  0  0
 45 44  1  1  0  0  0
 45 46  1  0  0  0  0
 47 46  1  1  0  0  0
 47 48  1  0  0  0  0
 48 49  1  0  0  0  0
 49 50  2  0  0  0  0
 50 51  1  0  0  0  0
 52 51  1  0  0  0  0
 52 53  1  1  0  0  0
 54 53  1  1  0  0  0
 54 55  1  0  0  0  0
 55 56  2  0  0  0  0
 57 56  1  0  0  0  0
 57 58  1  1  0  0  0
 59 58  1  1  0  0  0
 59 60  1  0  0  0  0
 60 61  1  6  0  0  0
 63 62  1  1  0  0  0
 63 64  1  0  0  0  0
 74 63  1  0  0  0  0
 64 65  1  6  0  0  0
 64 66  1  0  0  0  0
 66 67  1  1  0  0  0
 66 68  1  0  0  0  0
 68 69  1  0  0  0  0
 68 72  1  0  0  0  0
 68 73  1  6  0  0  0
 69 70  1  0  0  0  0
 70 71  1  0  0  0  0
 72 71  1  0  0  0  0
 74 73  1  6  0  0  0
 75 74  1  0  0  0  0
 75 76  1  6  0  0  0
 75 77  1  0  0  0  0
 77 78  1  1  0  0  0
M  END
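
The requirement described in this issue (flag the bonds that are in rings without enumerating every ring) can be sketched in plain Java. A bond lies in a ring exactly when it is not a bridge of the molecular graph, so a single depth-first search is enough and its cost does not grow with the number of rings. This is only a sketch under stated assumptions: RingBondMarker and markRingBonds are hypothetical names, not part of CDK or MolWitch, and atoms and bonds are reduced to integer indices. CDK may already provide an equivalent (its RingSearch class is one candidate worth checking).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: flags ring bonds without enumerating rings.
 * A bond is in a ring exactly when it is not a bridge of the graph.
 * Hypothetical helper, not CDK or MolWitch API.
 */
final class RingBondMarker {

    /**
     * @param atomCount number of atoms, indexed 0..atomCount-1
     * @param bonds     each entry is {beginAtom, endAtom}
     * @return result[i] is true when bonds[i] lies on at least one cycle
     */
    static boolean[] markRingBonds(int atomCount, int[][] bonds) {
        // Adjacency list; each entry is {neighbor atom, bond index}.
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            adj.add(new ArrayList<>());
        }
        for (int b = 0; b < bonds.length; b++) {
            adj.get(bonds[b][0]).add(new int[] {bonds[b][1], b});
            adj.get(bonds[b][1]).add(new int[] {bonds[b][0], b});
        }
        boolean[] inRing = new boolean[bonds.length];
        Arrays.fill(inRing, true);       // cleared below for every bridge
        int[] disc = new int[atomCount]; // discovery time, 0 = unvisited
        int[] low = new int[atomCount];  // earliest atom reachable from the subtree
        int[] clock = {0};
        for (int a = 0; a < atomCount; a++) {
            if (disc[a] == 0) {
                dfs(a, -1, adj, disc, low, clock, inRing);
            }
        }
        return inRing;
    }

    // Recursive DFS (Tarjan's bridge finding); fine for molecule-sized graphs.
    private static void dfs(int atom, int viaBond, List<List<int[]>> adj,
                            int[] disc, int[] low, int[] clock, boolean[] inRing) {
        disc[atom] = low[atom] = ++clock[0];
        for (int[] edge : adj.get(atom)) {
            int next = edge[0];
            int bond = edge[1];
            if (bond == viaBond) {
                continue; // do not walk back along the bond we arrived by
            }
            if (disc[next] == 0) {
                dfs(next, bond, adj, disc, low, clock, inRing);
                low[atom] = Math.min(low[atom], low[next]);
                if (low[next] > disc[atom]) {
                    inRing[bond] = false; // no way around this bond: a bridge
                }
            } else {
                low[atom] = Math.min(low[atom], disc[next]);
            }
        }
    }
}
```

For a three-membered ring with one substituent (bonds 0-1, 1-2, 2-0 and 0-3), the first three bonds are reported as ring bonds and the fourth is not.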

Stderr Stacktrace on stdinchikey sometimes

org.openscience.cdk.exception.CDKException: Cannot assign Kekulé structure, non-sigma bond order has already been assigned?
at org.openscience.cdk.aromaticity.Kekulization.kekulize(Kekulization.java:113)
at gov.nih.ncats.molwitch.cdk.CdkChemicalImpl.lambda$kekulize$15(CdkChemicalImpl.java:979)
at gov.nih.ncats.molwitch.cdk.CdkChemicalImpl.doWithQueryFixes(CdkChemicalImpl.java:772)
at gov.nih.ncats.molwitch.cdk.CdkChemicalImpl.kekulize(CdkChemicalImpl.java:977)
at gov.nih.ncats.molwitch.Chemical.kekulize(Chemical.java:540)
at gov.nih.ncats.molwitch.cdk.CdkChemicalInchiImplFactory.asStdInchi(CdkChemicalInchiImplFactory.java:94)
at gov.nih.ncats.molwitch.inchi.Inchi.asStdInchi(Inchi.java:48)
at gov.nih.ncats.molwitch.Chemical.toInchi(Chemical.java:579)
at gov.nih.ncats.molvec.internal.algo.experimental.InChIKeySetScorer.score(InChIKeySetScorer.java:98)
at gov.nih.ncats.molvec.internal.algo.experimental.ModifiedMolvecPipel

I don't know that it should actually throw the exception, but it should probably have a global flag for whether it swallows, throws, or prints. I don't think printing is the right default.
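
One possible shape for such a global flag is sketched below. Every name here (ErrorPolicy, Mode, attempt) is hypothetical and does not exist in MolWitch or CDK. The SWALLOW default is an assumption, chosen only because the issue argues that printing should not be the default.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of a global error policy; not MolWitch or CDK API. */
final class ErrorPolicy {

    enum Mode { SWALLOW, THROW, PRINT }

    // Global flag. SWALLOW as the default is an assumption of this sketch.
    private static volatile Mode mode = Mode.SWALLOW;

    private ErrorPolicy() { }

    static void setMode(Mode newMode) {
        mode = newMode;
    }

    static Mode getMode() {
        return mode;
    }

    /**
     * Runs the action and handles any checked or unchecked exception
     * according to the current mode.
     *
     * @param action   the operation that may fail (for example, kekulization)
     * @param fallback value returned when the failure is swallowed or printed
     */
    static <T> T attempt(Callable<T> action, T fallback) {
        try {
            return action.call();
        } catch (Exception e) {
            switch (mode) {
                case THROW:
                    // Wrap so callers are not forced to declare checked exceptions.
                    throw new IllegalStateException(e);
                case PRINT:
                    e.printStackTrace(); // goes to stderr, then falls back
                    return fallback;
                default:
                    return fallback;     // SWALLOW: fail silently
            }
        }
    }
}
```

A caller that wants strict behavior would call ErrorPolicy.setMode(ErrorPolicy.Mode.THROW) once at startup; library code would wrap the risky call in ErrorPolicy.attempt(...) instead of printing the stack trace unconditionally.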
