Comments (5)
Thanks for the report (and sorry for the slow reply, I was stuck without internet due to a move).
I'll need a bit more time to dig into the details here, but the main culprit is the detection/analysis of aromatic regions.
from pysmiles.
I don't know whether a systematic comparison of the output of pysmiles (regarding hybridization) with that of other programs such as RDKit already is, or may become, part of the consistency checks beyond the ones based on pytest.
So, mostly out of curiosity, I set up a doodle using OpenBabel. It shows two differences from your program: the atom-type labels differ systematically from those used by you and RDKit, and, perhaps more importantly, the count of the atom indices starts by one.
Feel free to use the conceptual script as you like. The archive contains both an earlier doodle in a Jupyter notebook and a script just for the CLI.
Many thanks for the help!
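For the SMILES "c1ccncc1CNO", OpenBabel and pysmiles produce the same result as far as I can see (pysmiles output first, with indices starting at 0, followed by the OpenBabel output, with indices starting at 1):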
0 {'element': 'C', 'charge': 0, 'aromatic': True, 'hcount': 1}
1 {'element': 'C', 'charge': 0, 'aromatic': True, 'hcount': 1}
2 {'element': 'C', 'charge': 0, 'aromatic': True, 'hcount': 1}
3 {'element': 'N', 'charge': 0, 'aromatic': True, 'hcount': 0}
4 {'element': 'C', 'charge': 0, 'aromatic': True, 'hcount': 1}
5 {'element': 'C', 'charge': 0, 'aromatic': True, 'hcount': 0}
6 {'element': 'C', 'charge': 0, 'aromatic': False, 'hcount': 2}
7 {'element': 'N', 'charge': 0, 'aromatic': False, 'hcount': 1}
8 {'element': 'O', 'charge': 0, 'aromatic': False, 'hcount': 1}
1 Car 2
2 Car 2
3 Car 2
4 Nar 2
5 Car 2
6 Car 2
7 C3 3
8 Nox 3
9 O3 3
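A minimal sketch of how the two listings above could be compared automatically. The data is copied from the output above. The helper name `compare_listings` is made up for illustration, and the code assumes that an OpenBabel label starts with a one-letter element symbol and ends in `ar` for aromatic atoms, which holds for this molecule but is not a general rule.

```python
# Node attributes as printed by pysmiles (0-based indices); only the two
# fields compared here are kept.
pysmiles_atoms = {
    0: {"element": "C", "aromatic": True},
    1: {"element": "C", "aromatic": True},
    2: {"element": "C", "aromatic": True},
    3: {"element": "N", "aromatic": True},
    4: {"element": "C", "aromatic": True},
    5: {"element": "C", "aromatic": True},
    6: {"element": "C", "aromatic": False},
    7: {"element": "N", "aromatic": False},
    8: {"element": "O", "aromatic": False},
}

# OpenBabel listing (1-based indices): label and third column as printed.
openbabel_atoms = {
    1: ("Car", 2), 2: ("Car", 2), 3: ("Car", 2),
    4: ("Nar", 2), 5: ("Car", 2), 6: ("Car", 2),
    7: ("C3", 3), 8: ("Nox", 3), 9: ("O3", 3),
}


def compare_listings(py_atoms, ob_atoms, offset=1):
    """Return the pysmiles indices whose element/aromaticity disagree.

    Assumption: both programs number the atoms in the same order, so the
    indices differ only by a constant offset.
    """
    mismatches = []
    for idx, attrs in py_atoms.items():
        label, _third_column = ob_atoms[idx + offset]  # third column unused
        ob_element = label[0]               # assumption: one-letter symbols
        ob_aromatic = label.endswith("ar")  # assumption: "ar" marks aromatic
        if (attrs["element"], attrs["aromatic"]) != (ob_element, ob_aromatic):
            mismatches.append(idx)
    return mismatches


print(compare_listings(pysmiles_atoms, openbabel_atoms))
```

An empty list means the two programs agree on element and aromaticity for every atom.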
For "OCCn2c(=N)n(CCOc1ccc(Cl)cc1Cl)c3ccccc23" they indeed produce different results for the bicyclic/extracyclic aromatic moiety (as expected).
I would appreciate some help coming up with a (good/simple) algorithm to decide whether an atom is (anti)aromatic. The current implementation is (obviously) too simplistic.
Once that's implemented it should /absolutely/ be added to the test suite.
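As a possible starting point, here is a sketch of Hückel's rule applied to a single ring. It is not what pysmiles does. It assumes the ring is planar and fully conjugated, and that the number of pi electrons each ring atom contributes is already known, which is the hard part. The function name `classify_ring` is made up for illustration. It does not address the fused bicyclic case with an exocyclic double bond discussed above.

```python
def classify_ring(pi_electrons):
    """Classify one planar, fully conjugated ring by Hückel's rule.

    pi_electrons: one integer per ring atom, the number of pi electrons
    that atom contributes (assumed to be known by the caller).
    """
    total = sum(pi_electrons)
    if total % 4 == 2:                # 4n + 2 pi electrons
        return "aromatic"
    if total > 0 and total % 4 == 0:  # 4n pi electrons, n >= 1
        return "antiaromatic"
    return "non-aromatic"             # odd counts or no pi electrons


print(classify_ring([1, 1, 1, 1, 1, 1]))  # benzene
print(classify_ring([1, 1, 1, 1]))        # cyclobutadiene
print(classify_ring([1, 1, 1, 1, 2]))     # pyrrole, N contributes its lone pair
```

Cases like these three would also be natural first entries for the test suite.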
> ... perhaps more importantly, the count of the atom indices starts by one.
I don't think this is an issue. I adhere to Python conventions, where counting starts at 0. Besides, the numbering is fully arbitrary. Pysmiles numbers the atoms in the order in which they appear in the SMILES, but other software does not have to do the same. If you want to be sure of, or align, the produced graphs/molecules, you need to solve the graph isomorphism problem. But that's out of scope here and for pysmiles (besides, networkx has good solutions for it).
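To illustrate what aligning two numberings involves, here is a small standard-library sketch that searches for an element-preserving atom mapping by backtracking. In practice the networkx isomorphism tools are the better choice. The function name `find_atom_mapping` and the second, reversed 1-based numbering are made up for illustration and are not the output of any particular program.

```python
def find_atom_mapping(elements_a, bonds_a, elements_b, bonds_b):
    """Return a dict mapping atoms of A onto atoms of B, or None.

    elements_*: dict of atom index -> element symbol
    bonds_*: list of (index, index) pairs; bond orders are ignored here.
    """
    if len(elements_a) != len(elements_b) or len(bonds_a) != len(bonds_b):
        return None

    def adjacency(elements, bonds):
        adj = {atom: set() for atom in elements}
        for u, v in bonds:
            adj[u].add(v)
            adj[v].add(u)
        return adj

    adj_a = adjacency(elements_a, bonds_a)
    adj_b = adjacency(elements_b, bonds_b)
    order = list(elements_a)

    def extend(mapping, used):
        if len(mapping) == len(order):
            return dict(mapping)
        a = order[len(mapping)]
        for b in elements_b:
            if b in used or elements_a[a] != elements_b[b]:
                continue
            if len(adj_a[a]) != len(adj_b[b]):
                continue
            # a is bonded to a mapped atom exactly when b is bonded to its image
            if all((p in adj_a[a]) == (q in adj_b[b]) for p, q in mapping.items()):
                mapping[a] = b
                used.add(b)
                result = extend(mapping, used)
                if result is not None:
                    return result
                del mapping[a]
                used.discard(b)
        return None

    return extend({}, set())


# "c1ccncc1CNO" numbered from 0 in SMILES order, as pysmiles does.
elements_a = dict(enumerate("CCCNCCCNO"))
bonds_a = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6), (6, 7), (7, 8)]

# Same molecule with a hypothetical 1-based numbering starting at the oxygen.
elements_b = dict(enumerate("ONCCCNCCC", start=1))
bonds_b = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, 4)]

print(find_atom_mapping(elements_a, bonds_a, elements_b, bonds_b))
```

The search is exponential in the worst case, which is one more reason to rely on an existing library for anything beyond small molecules.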
Thanks for the links. It's interesting to note that in 2021 aromaticity in SMILES is still ill-defined :)
I need a bit more time to really digest this, and decide on what the desired behaviour is as well.
Looking at the molecule in your opening post, the assignment of implicit hydrogens is actually very suspicious, independent of aromaticity.
As for parser behaviour, I see three options.
- Take the provided SMILES as ground truth. If that says something is aromatic, it is. This would also leave kekulized input as-is.
- Re-assess aromaticity so that kekulized cycles may become aromatic.
- Re-assess all aromaticity. This means atoms that are aromatic in the input may be made non-aromatic.
Anything that's not option 1 is hard :'). Then again, some method of detecting aromaticity is desirable for the writer.
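The three options above can be sketched as modes of one function. This is only an illustration of the intended semantics: the names `AromaticityMode` and `resolve_aromaticity` are hypothetical and not part of pysmiles, and the `detected` flags are assumed to come from some aromaticity detection routine that still has to be written.

```python
from enum import Enum


class AromaticityMode(Enum):
    TRUST_INPUT = 1   # option 1: the SMILES is ground truth
    PROMOTE_ONLY = 2  # option 2: kekulized cycles may become aromatic
    REASSESS_ALL = 3  # option 3: input flags may also be revoked


def resolve_aromaticity(from_smiles, detected, mode):
    """Combine per-atom aromatic flags according to the chosen mode.

    from_smiles: flags as written in the input SMILES
    detected: flags from a (hypothetical) detection routine
    """
    if len(from_smiles) != len(detected):
        raise ValueError("flag lists must have the same length")
    if mode is AromaticityMode.TRUST_INPUT:
        return list(from_smiles)
    if mode is AromaticityMode.PROMOTE_ONLY:
        return [given or found for given, found in zip(from_smiles, detected)]
    if mode is AromaticityMode.REASSESS_ALL:
        return list(detected)
    raise ValueError(f"unknown mode: {mode!r}")


from_smiles = [True, False, False]
detected = [False, True, False]
for mode in AromaticityMode:
    print(mode.name, resolve_aromaticity(from_smiles, detected, mode))
```

Only the first mode can be implemented without a detection routine, which matches the remark that anything beyond option 1 is hard.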
To be continued...