
smiles-to-iupac-translator's Introduction


V2.0

Smiles TO iUpac Translator: Advanced Chemical Nomenclature Translation




Key Features

  • 🧪 Translate SMILES to IUPAC names
  • 🔬 Convert IUPAC names back to valid SMILES strings
  • 🤖 Powered by advanced transformer models
  • 💻 Cross-platform support (Linux, macOS, Windows via Ubuntu shell)
  • 🚀 High-performance chemical nomenclature translation

Installation

Choose your preferred installation method:

📦 PyPI Installation
pip install STOUT-pypi
🐍 Conda Environment Setup
conda create --name STOUT python=3.10 
conda activate STOUT
conda install -c decimer stout-pypi
📥 Direct Repository Installation
pip install git+https://github.com/Kohulan/Smiles-TO-iUpac-Translator.git

How To Use

from STOUT import translate_forward, translate_reverse

# SMILES to IUPAC name translation
SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
IUPAC_name = translate_forward(SMILES)
print(f"🧪 IUPAC name of {SMILES} is: {IUPAC_name}")

# IUPAC name to SMILES translation
IUPAC_name = "1,3,7-trimethylpurine-2,6-dione"
SMILES = translate_reverse(IUPAC_name)
print(f"🔬 SMILES of {IUPAC_name} is: {SMILES}")

Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)

Part of the DECIMER Project


About Us

Cheminformatics and Computational Metabolomics Group

Citation

Rajan, K., Zielesny, A. & Steinbeck, C. STOUT: SMILES to IUPAC names using neural machine translation. J Cheminform 13, 34 (2021). https://doi.org/10.1186/s13321-021-00512-4

Model Card

Rajan, K., Steinbeck, C., & Zielesny, A. (2024). STOUT V2 - Model library. Zenodo. https://doi.org/10.5281/zenodo.13318286



Made with ❤️ by the Steinbeck Group

smiles-to-iupac-translator's People

Contributors

alt-shreya, dependabot[bot], egonw, kohulan, obrink


smiles-to-iupac-translator's Issues

Dockerfile

I have been trying to install the package in numerous environments but I keep running into dependency issues. Is there any possibility you would be willing to create a Dockerfile to help simplify the install process?

Thanks for your help and work!

PyPi - tensorflow 2.15.0

Hello, would it be possible to update the PyPI requirements to work with tensorflow 2.15.0? When I call pip install STOUT-pypi==2.0.5, I get ERROR: Could not find a version that satisfies the requirement tensorflow==2.10.1

I'm using Windows 10 and python 3.11.4.
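One possible explanation, offered as an assumption rather than a confirmed diagnosis: tensorflow 2.10.x wheels appear to be published only for Python 3.7 through 3.10, so pip running under Python 3.11 would find no matching distribution. The sketch below checks the running interpreter against that assumed range before attempting the install; the `ASSUMED_RANGE` constant and the `python_supported` helper are hypothetical names used for illustration.

```python
import sys

# Assumption: the pinned tensorflow==2.10.1 only ships wheels for
# CPython 3.7 through 3.10. Adjust the range if the pin changes.
ASSUMED_RANGE = ((3, 7), (3, 10))


def python_supported(version=None, supported=ASSUMED_RANGE):
    """Return True if the (major, minor) version lies in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    low, high = supported
    return low <= major_minor <= high


if __name__ == "__main__":
    if python_supported():
        print("This interpreter is inside the assumed supported range.")
    else:
        print("This interpreter is outside the assumed supported range; "
              "consider a Python 3.10 environment as in the conda instructions.")
```

This would be consistent with the conda instructions in the README, which create the environment with python=3.10.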

Requirements are a little strict? And IUPAC name of O=Cc1ccccc1 is: styrene

Hi,

Awesome package. Just a comment that I wanted to install stout into an existing environment I'm using for Chemoinformatics. This environment has rdkit installed through conda, and tensorflow 2.6.2 (not tensorflow-gpu). I installed pystow manually and then ignored the requirements for stout, and it works fine. Just wondering if the requirement for tensorflow-gpu and the pypi version of rdkit is a bit strict?

So anyway all working nicely. I ran a bunch of test molecules, which mostly look great.
However this one caught my eye.
IUPAC name of O=Cc1ccccc1 is: styrene. Should be benzaldehyde.
Just thought I'd flag this as a test case, as it's a relatively simple molecule. Not meant as a criticism or anything.

Keep up the great work.

Kind regards,
Will
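A report like this can be turned into a small regression check. The following is a minimal sketch, not part of STOUT: `find_mismatches` and `REFERENCE_NAMES` are hypothetical names, and the translator is passed in as a plain function so the check works with `translate_forward` or with any stand-in.

```python
from typing import Callable, Dict, List, Tuple


def find_mismatches(
    translate: Callable[[str], str],
    expected: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (smiles, expected_name, predicted_name) for every wrong prediction."""
    mismatches = []
    for smiles, expected_name in expected.items():
        predicted = translate(smiles)
        # Compare case-insensitively and ignore surrounding whitespace.
        if predicted.strip().lower() != expected_name.strip().lower():
            mismatches.append((smiles, expected_name, predicted))
    return mismatches


# Reference pairs: benzaldehyde is the case reported in this issue;
# the second pair is the example from the README.
REFERENCE_NAMES = {
    "O=Cc1ccccc1": "benzaldehyde",
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C": "1,3,7-trimethylpurine-2,6-dione",
}

# With STOUT installed, the check could be run as:
#   from STOUT import translate_forward
#   print(find_mismatches(translate_forward, REFERENCE_NAMES))
```

An exact string comparison is a strict test, since a molecule can have more than one acceptable name; treat a mismatch as a prompt for manual review rather than as proof of an error.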

Not working for the example of intended use on MacOS.

Hi,

After following the steps that you recommend to install the package, I got the following errors:

Downloading trained model to Trained_models/60/forward ...
... done downloading trained model!
Archive:  STOUT_trained_models_v2.1.zip
   creating: Trained_models/
   creating: Trained_models/60/
   creating: Trained_models/60/forward/
  inflating: Trained_models/60/forward/ckpt-1.data-00000-of-00001
  inflating: Trained_models/60/forward/ckpt-1.index
  inflating: Trained_models/60/forward/checkpoint
   creating: Trained_models/60/reverse/
  inflating: Trained_models/60/reverse/ckpt-1.data-00000-of-00001
  inflating: Trained_models/60/reverse/ckpt-1.index
  inflating: Trained_models/60/reverse/checkpoint
   creating: Trained_models/30/
   creating: Trained_models/30/forward/
  inflating: Trained_models/30/forward/ckpt-1.data-00000-of-00001
  inflating: Trained_models/30/forward/ckpt-1.index
  inflating: Trained_models/30/forward/checkpoint
   creating: Trained_models/30/reverse/
  inflating: Trained_models/30/reverse/ckpt-1.data-00000-of-00001
  inflating: Trained_models/30/reverse/ckpt-1.index
  inflating: Trained_models/30/reverse/checkpoint
Traceback (most recent call last):
  File "STOUT_V_2.1.py", line 303, in <module>
    main()
  File "STOUT_V_2.1.py", line 50, in main
    iupac_name = translate(selfies.encoder(canonical_smiles.decode('utf-8').strip()).replace("][","] ["))
  File "STOUT_V_2.1.py", line 167, in translate
    result, sentence = evaluate(sentence)
  File "STOUT_V_2.1.py", line 133, in evaluate
    inputs = [inp_lang.word_index[i] for i in sentence.split(' ')]
  File "STOUT_V_2.1.py", line 133, in <listcomp>
    inputs = [inp_lang.word_index[i] for i in sentence.split(' ')]
KeyError: '[Branch1]'
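The KeyError means the model's input vocabulary (`inp_lang.word_index` in the traceback) has no entry for the token `[Branch1]`. One plausible cause, stated here as an assumption, is that the installed `selfies` version emits different token names than the version used to build the vocabulary. The sketch below lists the tokens of a SELFIES string that are missing from a vocabulary, using the same `replace`/`split` tokenization as the traceback. The helper names and the toy vocabulary are hypothetical; with the real script, the keys of `inp_lang.word_index` would be passed instead.

```python
from typing import Iterable, List


def split_selfies(selfies_string: str) -> List[str]:
    """Split a SELFIES string into bracketed tokens, as the script in the traceback does."""
    tokens = selfies_string.replace("][", "] [").split(" ")
    return [token for token in tokens if token]  # drop empty strings


def unknown_tokens(selfies_string: str, vocabulary: Iterable[str]) -> List[str]:
    """Return tokens absent from the vocabulary, in order of first appearance."""
    known = set(vocabulary)
    missing: List[str] = []
    for token in split_selfies(selfies_string):
        if token not in known and token not in missing:
            missing.append(token)
    return missing


if __name__ == "__main__":
    # Toy vocabulary for illustration only; it is not STOUT's real vocabulary.
    toy_vocabulary = {"[C]", "[=O]", "[Branch1_1]"}
    print(unknown_tokens("[C][Branch1][C][=O]", toy_vocabulary))
```

Running such a check before the lookup would turn the KeyError into a readable list of unsupported tokens.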

Ammonia

Hi,

Today, when I tried to generate the SMILES string for 'ammonia', I got '[NH2+]' back, which is certainly wrong.
>>> STOUT.translate_reverse('ammonia')
'[NH2+]'

When I tried to convert 'Ammonia', I got back a mess of weird strings.

>>> STOUT.translate_reverse('Ammonia') '[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr].[Pr'

I also tried the systematic name.
>>> STOUT.translate_reverse('azane') 'N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.N.'

>>> STOUT.translate_reverse('Azane')
'[15NH3]'

I'm not sure if this is intended and I guess the error is on my side, but could you please have a look? :)

In the other direction, it works well:
>>> STOUT.translate_forward('N')
'azane'

>>> STOUT.translate_forward('[NH2+]')
'azanium'

Thank you
Philipp
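Runaway outputs like the repeated `[Pr].` and `N.` strings above can at least be flagged automatically. The following is a minimal sketch of such a sanity check, not something STOUT provides: the function names and the `max_repeats` threshold of 10 are arbitrary assumptions, and the check says nothing about whether a non-repetitive answer such as '[NH2+]' is chemically correct.

```python
from collections import Counter
from typing import Optional, Tuple


def dominant_fragment(smiles: str) -> Tuple[Optional[str], int]:
    """Return (fragment, count) for the most frequent dot-separated fragment."""
    fragments = [fragment for fragment in smiles.split(".") if fragment]
    if not fragments:
        return None, 0
    return Counter(fragments).most_common(1)[0]


def looks_degenerate(smiles: str, max_repeats: int = 10) -> bool:
    """Flag outputs in which one fragment repeats more than max_repeats times."""
    _, count = dominant_fragment(smiles)
    return count > max_repeats


# With STOUT installed, a guarded call could look like:
#   from STOUT import translate_reverse
#   smiles = translate_reverse("azane")
#   if looks_degenerate(smiles):
#       print("Suspicious output, please check manually:", smiles[:40])
```

Legitimate SMILES for salts or mixtures can repeat a fragment a few times, which is why the threshold is deliberately generous.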

Wrong comment sign?

In the line
python3 STOUT_V_2.0.py 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C' -> SMILES to IUPAC
I assume that everything from "->" onward is a comment.
If so, please mark it with a comment sign.

How to train?

Best regards. Could you please provide some instructions for re-training the model with my own data?
Thank you in advance.

Conda Package

Is it possible to get a conda package? This would make it easier to create the conda environment and then install STOUT into it.

Step by step guide not working

When I try to run
pip install tensorflow-gpu==2.3.0 selfies matplotlib unicodedata2
I am getting ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==2.3.0
ERROR: No matching distribution found for tensorflow-gpu==2.3.0

SELFIES version

Thanks for this great model! Is the SELFIES version mentioned in the paper also available?

Can we do a SMILES to Common Chemical Name Translator?

Hello,

Would it be possible to translate SMILES to common chemical names as well as to IUPAC names? I already have the data available as direct name-to-SMILES pairs.

I will definitely build a connector to this. I have been playing with it a bit as well. I really like what you have done; this is so awesome.

Possibility to change the input of models to accept a batch as a collection of SMILES/names?

Dear @Kohulan,

I'm wondering if it is possible to adjust the model so that it can accept multiple inputs. For example, input might be a batch of smiles / names, therefore increasing the performance of the model.

Right now the only way for multiple inputs is just passing them one by one in a for-loop:

# SMILES to IUPAC name translation
smiles_list = ['CC(=O)OC(CC(=O)O)C[N+](C)(C)C',
             'CC(CN)O',
             'C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl',
             'CCN1C=NC2=C(N=CN=C21)N',]

for smiles in smiles_list:
    IUPAC_names = translate_forward(smiles)
    print("IUPAC name of "+smiles+" is: "+IUPAC_names)

Do you think it would be possible to accept a batch (a collection of SMILES or names) as input, so that something like this works:

# SMILES to IUPAC name translation
smiles_list = ['CC(=O)OC(CC(=O)O)C[N+](C)(C)C',
             'CC(CN)O',
             'C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl',
             'CCN1C=NC2=C(N=CN=C21)N',]

IUPAC_names = translate_forward(smiles_list)

Is this feasible by changing the input shapes and doing the necessary preprocessing of the input data, or is re-training/fine-tuning the model the only way?
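Until the model itself accepts batches, a thin wrapper can at least offer the list-in, list-out interface asked for here. This is a minimal sketch under stated assumptions: `translate_many` is a hypothetical helper, not part of STOUT, and it still calls the translator once per distinct input, so it saves time only when the list contains duplicates. It does not perform batched inference inside the model.

```python
from typing import Callable, Dict, Iterable, List


def translate_many(translate: Callable[[str], str], inputs: Iterable[str]) -> List[str]:
    """Translate each input in order, calling the translator once per distinct input."""
    cache: Dict[str, str] = {}
    results: List[str] = []
    for item in inputs:
        if item not in cache:
            cache[item] = translate(item)
        results.append(cache[item])
    return results


# With STOUT installed, usage could look like:
#   from STOUT import translate_forward
#   smiles_list = ['CC(CN)O', 'CCN1C=NC2=C(N=CN=C21)N', 'CC(CN)O']
#   iupac_names = translate_many(translate_forward, smiles_list)
```

Passing the translator as an argument keeps the same wrapper usable for `translate_reverse` and makes it easy to test with a stand-in function.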
