LatentGAN

LatentGAN [1] uses a heteroencoder trained on ChEMBL 25 [2] to encode SMILES strings into latent vector representations of size 512. A Wasserstein Generative Adversarial Network with Gradient Penalty [3] is then trained to generate latent vectors resembling those of the training set, and the generated vectors are decoded back into SMILES by the heteroencoder. The code has been confirmed to work with the environment defined in the provided environment.yml, although not every package listed there may be necessary. Refactoring the code to follow the other baselines is a work in progress.
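For orientation, the critic objective of a Wasserstein GAN with gradient penalty, as defined in [3], can be written as follows. Here the "real" samples z are latent vectors encoded from the training SMILES and the "generated" samples are latent vectors produced by the generator. This is the general form from [3]; the penalty weight used by this code is not stated in this README (the paper [3] uses λ = 10), so treat the symbols as a reference sketch rather than a description of the exact implementation.

```latex
L_D = \mathbb{E}_{\tilde{z} \sim P_g}\big[D(\tilde{z})\big]
    - \mathbb{E}_{z \sim P_r}\big[D(z)\big]
    + \lambda \, \mathbb{E}_{\hat{z} \sim P_{\hat{z}}}
      \Big[\big(\lVert \nabla_{\hat{z}} D(\hat{z}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{z} = \epsilon z + (1 - \epsilon)\tilde{z}, \quad \epsilon \sim U[0, 1]
```

The generator is trained to minimize $-\mathbb{E}_{\tilde{z} \sim P_g}[D(\tilde{z})]$.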

Dependencies

This model uses the heteroencoder, which is available as a package from [4] (to be released soon). The heteroencoder further requires this package to run properly on the ChEMBL model.

General Usage Instructions

The runfile (run.py) runs the entire process in a single script using default hyperparameters: it encodes SMILES to create a training set, creates and trains a model, then samples latent vectors and decodes them back into SMILES.

Arguments:
-sf Input SMILES file name.
-st Output storage directory path [DEFAULT: "storage/example/"].
-lf Output latent file name [DEFAULT: "encoded_smiles.latent"]; this file is placed in the storage directory.
-ds Output decoded SMILES file name [DEFAULT: "decoded_smiles.csv"]; this file is placed in the storage directory.
--n-epochs Number of epochs to train the model for [DEFAULT: 2000].
--sample-n Number of latent vectors to sample after the last epoch has finished training [DEFAULT: 30000].
--encoder The data set the pre-trained heteroencoder was trained on [chembl|moses] [DEFAULT: chembl]. IMPORTANT: Currently only the moses model is compatible with recent versions of the heteroencoder.
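As a minimal sketch, the flags above can be assembled into a run.py invocation from Python. The helper name build_run_command and the input path are hypothetical; only the flags and default values come from the argument list above, and the sketch assumes run.py is launched from the repository root with the python executable on the PATH.

```python
import shlex


def build_run_command(smiles_file, storage_dir="storage/example/",
                      latent_file="encoded_smiles.latent",
                      decoded_file="decoded_smiles.csv",
                      n_epochs=2000, sample_n=30000, encoder="chembl"):
    """Assemble the argument list for run.py from the documented flags.

    This helper is illustrative and not part of the repository.
    """
    if encoder not in ("chembl", "moses"):
        raise ValueError("encoder must be 'chembl' or 'moses'")
    return [
        "python", "run.py",
        "-sf", smiles_file,
        "-st", storage_dir,
        "-lf", latent_file,
        "-ds", decoded_file,
        "--n-epochs", str(n_epochs),
        "--sample-n", str(sample_n),
        "--encoder", encoder,
    ]


# Hypothetical input file; the moses encoder is chosen because of the
# compatibility note in the argument list.
cmd = build_run_command("data/my_smiles.smi", encoder="moses")
print(shlex.join(cmd))
# To launch it for real: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if a path contains spaces.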

Alternatively, one can run the individual steps by using each script in succession. This is useful, for example, to sample a saved model checkpoint using sample.py.

  1. Encode SMILES (encode.py): Produces a .latent file of latent vectors from a given SMILES file. Currently only accepts SMILES with a token length smaller than 128.
Arguments:
-sf Input SMILES file name.
-o Output SMILES file name.
  2. Create Model (create_model.py): Creates blank model files generator.txt and discriminator.txt based on an input .latent file.
Arguments:
-i Input .latent file.
-o Path to the directory in which to place the models.
  3. Train Model (train_model.py): Trains the generator and discriminator with the specified parameters. Also creates .json log files of the generator and discriminator losses.
Arguments:
-i Input .latent file.
-o Model directory path.
--n-epochs Number of epochs to train for.
--starting-epoch Model checkpoint epoch to start training from, if checkpoints exist.
--batch-size Batch size of latent vectors. Default: 64.
--save-interval How often to save model checkpoints.
--sample-after-training Number of latent vectors to sample after the last epoch has finished training. Default: 0.
--decode-mols-save-path Output path for the SMILES file, if the sampled latent vectors should be decoded.
--n-critic-number Number of times the discriminator is trained between each generator training step. Default: 5.
--lr Learning rate. Default: 2e-4.
--b1, --b2 Adam optimizer constants. Default: 0.5 and 0.9, respectively.
-m Message to print into the logfile.
  4. Sample Model (sample.py): Samples a given number of latent vectors from an already trained model.
Arguments:
-l Input generator checkpoint file.
-olf Path to output .latent file.
-n Number of latent vectors to sample.
-d Option to also decode the latent vectors to SMILES.
-odsf Output path to SMILES file.
-m Message to print in logfile.
  5. Decode Model (decode.py): Decodes a .latent file to SMILES.
Arguments:
-l Input .latent file.
-o Output SMILES file path.
-m Message to print in logfile.
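The five scripts listed above can be chained from Python. The sketch below only builds the command lines, in the order the scripts are described; it does not run them. The helper build_pipeline and all file and directory names are hypothetical, and two points are assumptions: that the -o argument of encode.py names the .latent file the script produces, and that the generator checkpoint file name is known to the caller (its naming scheme is not documented here).

```python
def build_pipeline(smiles_file, latent_file, model_dir, checkpoint_file,
                   sampled_latent_file, decoded_file, n_epochs, n_samples):
    """Return one argument list per step: encode, create, train, sample, decode.

    Illustrative helper, not part of the repository.
    """
    return [
        # Assumption: -o of encode.py is the path of the produced .latent file.
        ["python", "encode.py", "-sf", smiles_file, "-o", latent_file],
        ["python", "create_model.py", "-i", latent_file, "-o", model_dir],
        ["python", "train_model.py", "-i", latent_file, "-o", model_dir,
         "--n-epochs", str(n_epochs)],
        # Assumption: checkpoint_file points at a saved generator checkpoint.
        ["python", "sample.py", "-l", checkpoint_file,
         "-olf", sampled_latent_file, "-n", str(n_samples)],
        ["python", "decode.py", "-l", sampled_latent_file, "-o", decoded_file],
    ]


# All paths below are placeholders chosen for illustration.
steps = build_pipeline(
    smiles_file="data/my_smiles.smi",
    latent_file="storage/example/encoded_smiles.latent",
    model_dir="storage/example/",
    checkpoint_file="storage/example/generator_checkpoint",
    sampled_latent_file="storage/example/sampled.latent",
    decoded_file="storage/example/decoded_smiles.csv",
    n_epochs=2000,
    n_samples=30000,
)
for step in steps:
    print(" ".join(step))
# To execute: run each entry with subprocess.run(step, check=True), in order.
```

Keeping each step as its own command makes it straightforward to rerun only the sampling and decoding steps against a different checkpoint.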

Links

[1] A De Novo Molecular Generation Method Using Latent Vector Based Generative Adversarial Network

[2] ChEMBL

[3] Improved training of Wasserstein GANs

[4] Deep-Drug-Coder

Contributors

seemonj, dierme