Comments (5)

pckroon avatar pckroon commented on September 23, 2024

Thanks for the report (and sorry for the slow reply, I was stuck without internet due to a move).
I'll need a bit more time to dig into the details here, although the main culprit is the detection/analysis of aromatic regions.

from pysmiles.

nbehrnd avatar nbehrnd commented on September 23, 2024

I don't know whether a systematic comparison of the output of pysmiles (regarding hybridization) with that of other programs like RDKit already is, or may become, part of the consistency checks beyond the existing pytest-based ones.

Thus, rather out of curiosity, I set up a doodle using OpenBabel. It shows two differences from your program: the atom-type labels differ systematically from those used by you and RDKit, and perhaps more importantly, the count of the atom indices starts by one.

Feel free to use the conceptual script as you like. The archive contains both an earlier doodle in a Jupyter notebook and a script just for the CLI.

check_openbabel.zip

pckroon avatar pckroon commented on September 23, 2024

Many thanks for the help!

For the SMILES "c1ccncc1CNO", OpenBabel and pysmiles produce the same result as far as I can see:

0 {'element': 'C', 'charge': 0, 'aromatic': True, 'hcount': 1}
1 {'element': 'C', 'charge': 0, 'aromatic': True, 'hcount': 1}
2 {'element': 'C', 'charge': 0, 'aromatic': True, 'hcount': 1}
3 {'element': 'N', 'charge': 0, 'aromatic': True, 'hcount': 0}
4 {'element': 'C', 'charge': 0, 'aromatic': True, 'hcount': 1}
5 {'element': 'C', 'charge': 0, 'aromatic': True, 'hcount': 0}
6 {'element': 'C', 'charge': 0, 'aromatic': False, 'hcount': 2}
7 {'element': 'N', 'charge': 0, 'aromatic': False, 'hcount': 1}
8 {'element': 'O', 'charge': 0, 'aromatic': False, 'hcount': 1}

  1 Car 2
  2 Car 2
  3 Car 2
  4 Nar 2
  5 Car 2
  6 Car 2
  7 C3  3
  8 Nox 3
  9 O3  3
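To make the comparison of the two listings above less error-prone, here is a minimal sketch that lines them up. It assumes that both programs list the atoms in SMILES order (true for this example, not guaranteed in general) and that an OpenBabel atom type ending in "ar" marks an aromatic atom. The helper name openbabel_to_aromatic is made up for this illustration.

```python
# Aromatic flags as printed by pysmiles above (0-based node index).
pysmiles_aromatic = {
    0: True, 1: True, 2: True, 3: True, 4: True,
    5: True, 6: False, 7: False, 8: False,
}

# Atom types as printed by OpenBabel above (1-based atom index).
openbabel_types = {
    1: "Car", 2: "Car", 3: "Car", 4: "Nar", 5: "Car",
    6: "Car", 7: "C3", 8: "Nox", 9: "O3",
}


def openbabel_to_aromatic(types):
    """Shift indices to 0-based; ASSUMPTION: a type ending in 'ar' is aromatic."""
    return {idx - 1: atom_type.endswith("ar") for idx, atom_type in types.items()}


openbabel_aromatic = openbabel_to_aromatic(openbabel_types)

# Indices where the two programs disagree on the aromatic flag.
mismatches = sorted(
    idx for idx, flag in pysmiles_aromatic.items()
    if flag != openbabel_aromatic.get(idx)
)
print(mismatches)  # [] for this molecule: no disagreement
```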

For "OCCn2c(=N)n(CCOc1ccc(Cl)cc1Cl)c3ccccc23" they indeed produce different results for the bicyclic/extracyclic aromatic moiety (as expected).
I would appreciate some help to come up with a (good/simple) algorithm to decide whether an atom is (anti)aromatic. The current implementation is (obviously) too simplistic.
Once that's implemented it should /absolutely/ be added to the test suite.
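As a possible starting point only: the textbook Hückel rule classifies a planar, fully conjugated, monocyclic ring by its number of pi electrons (4n+2 is aromatic, 4n is antiaromatic). The sketch below is not what pysmiles does, and it does not cover fused systems like the bicyclic one above. It also skips the hard part, which is deciding how many pi electrons each ring atom contributes. The function name huckel_class is made up for this illustration.

```python
def huckel_class(pi_electrons):
    """Classify a ring by Hückel's rule.

    ASSUMPTION: the ring is planar, monocyclic and fully conjugated, and the
    caller has already counted its pi electrons correctly.
    """
    if pi_electrons <= 0 or pi_electrons % 2 == 1:
        return "non-aromatic"
    if pi_electrons % 4 == 2:
        return "aromatic"      # 4n + 2 electrons
    return "antiaromatic"      # 4n electrons


print(huckel_class(6))  # benzene: aromatic
print(huckel_class(4))  # cyclobutadiene: antiaromatic
```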

... perhaps more importantly, the count of the atom indices starts by one.

I don't think this is an issue. I adhere to Python conventions, where counting starts at 0. Besides, the numbering is fully arbitrary. Pysmiles numbers the atoms in the order in which they appear in the SMILES, but other software does not have to do the same. If you want to be sure, or want to align the produced graphs/molecules, you need to solve the graph isomorphism problem. But that's out of scope here and for pysmiles (besides, networkx has good solutions for it).
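For anyone who does need to align two numberings, here is a rough sketch using the isomorphism tools in networkx. The two toy graphs and the build_graph helper are hypothetical. They only mimic a C-N-O fragment numbered from 0 in one direction and from 1 in the other, and matching on the element attribute alone is an assumption that may be too weak for real molecules.

```python
import networkx as nx
from networkx.algorithms import isomorphism


def build_graph(elements, bonds):
    """Build an undirected graph with an 'element' attribute on every node."""
    graph = nx.Graph()
    for idx, element in elements.items():
        graph.add_node(idx, element=element)
    graph.add_edges_from(bonds)
    return graph


# The same C-N-O chain, numbered two different ways (toy data).
zero_based = build_graph({0: "C", 1: "N", 2: "O"}, [(0, 1), (1, 2)])
one_based = build_graph({1: "O", 2: "N", 3: "C"}, [(1, 2), (2, 3)])

matcher = isomorphism.GraphMatcher(
    zero_based,
    one_based,
    # Only nodes with the same element may be matched onto each other.
    node_match=isomorphism.categorical_node_match("element", None),
)

# After a successful check, matcher.mapping maps nodes of the first graph
# onto nodes of the second graph.
mapping = dict(matcher.mapping) if matcher.is_isomorphic() else None
print(mapping)
```

With such a mapping in hand, per-atom attributes from both programs can be compared regardless of where the counting starts.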

nbehrnd avatar nbehrnd commented on September 23, 2024

pckroon avatar pckroon commented on September 23, 2024

Thanks for the links. It's interesting to note that in 2021 aromaticity in SMILES is still ill-defined :)

I need a bit more time to really digest this, and decide on what the desired behaviour is as well.
Looking at the molecule in your opening post, the assignment of implicit hydrogens is actually very suspicious, independent of aromaticity.

As for parser behaviour, I see three options.

  1. Take the provided SMILES as ground truth. If that says something is aromatic, it is. This would also leave kekulized input as-is.
  2. Re-assess aromaticity so that kekulized cycles may become aromatic.
  3. Re-assess all aromaticity. This means atoms that are aromatic in the input may be made non-aromatic.

Anything that's not option 1 is hard :'). Then again, some method of detecting aromaticity is desirable for the writer.
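To make the difference between the three options concrete, here is a small sketch of how the final per-atom flag could be derived under each of them. The names AromaticityPolicy and final_aromatic_flag are invented for this illustration and are not part of pysmiles. The detected_flag argument stands for the result of some aromaticity detection routine that would still have to be written.

```python
from enum import Enum


class AromaticityPolicy(Enum):
    TRUST_INPUT = 1        # option 1: the SMILES is ground truth
    PROMOTE_KEKULIZED = 2  # option 2: kekulized cycles may become aromatic
    REASSESS_ALL = 3       # option 3: input flags may also be revoked


def final_aromatic_flag(input_flag, detected_flag, policy):
    """Combine the flag from the SMILES with the detected one (hypothetical)."""
    if policy is AromaticityPolicy.TRUST_INPUT:
        return input_flag
    if policy is AromaticityPolicy.PROMOTE_KEKULIZED:
        # Never removes aromaticity, only adds it.
        return input_flag or detected_flag
    return detected_flag


# An atom written as aromatic that the detection routine rejects:
for policy in AromaticityPolicy:
    print(policy.name, final_aromatic_flag(True, False, policy))
```

Only option 3 changes the flag of such an atom, which is why it is the one that can contradict the input.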

To be continued...
