molecularai / reinvent


335.0 21.0 108.0 120.6 MB

License: Apache License 2.0

Python 99.73% Dockerfile 0.27%

cheminformatics neural-networks astrazeneca reinforcement-learning denovo-design transfer-learning

reinvent's People

msultan unixjunkie tblaschke catenate15 cthoyt-forks-and-packages dhristozov rct20140922 ulamaca jinchengneng sirimullalab chemical-project sailfish009 fujirock iirissundin yupliu oengkvist llzheng recherhe xianzeng-lab ajayarunachalam hyunp2 paul-goldsmith robertlizatovic flavda yingli2009 q20110911 ashar799 cloud-moym zzkdxn yoctocell kntkb biocheming andyj10224 icamps shouti-fredchang frankji prasannavd biocreator seihwan2021 shiska07 kong0706 ayato0507 lilleswing marco-foscato githubxin123 lsalases wangxr0526 j3mdamas menggf ifyoungnet rnaimehaom wenhao-gao fortweng tiger-tiger waseem-abbas05 cmargreitter bbgao joonseonyoon patronov daedalus-1337 moummj wittler-github ys-arch dot23 akitosok mhanson2019 highdxy zyh0608 woaiyong710 atlimited freeenergylab shunsunsun ardeat albertbou92 fiberleif mars-wei lyndonlens bwang-ecnu schwallergroup marcostenta m-hakmi yuki-nco ylyzz21 seahurt akhilmedvolt toluwajosh drowning-fish-sys vincenzo-palmacci mesfind yansonggu deargen xuelianl ftry bytetora mkatouda parrondo nonsensejoke jidushanbojue fl65inc halx

reinvent's Issues

Use standard format for readme

Currently the README is plain text. If it were converted to Markdown or reStructuredText, GitHub would render it nicely and it would be much easier for users to read.

Project workflows

Hello,

I had some questions regarding setting up my own project using the Reinforcement_Learning notebook:

The scoring function in the demo uses the Aurora kinase model. If I want to change the target, do I create my own model using the [Create_Model_Demo.ipynb] notebook?
Are the prior and agent always trained with the transfer learning notebook, using the same model created by the [Create_Model_Demo.ipynb] notebook? If not, what should I use?
The [Create_Model_Demo.ipynb] notebook uses a variety of SMILES to create the model. If I were to create a model for a specific target, should I use SMILES of known actives as input? And if the target has no known actives, how should I proceed?
What is the difference between the [Create_Model_Demo.ipynb] notebook and the [Model_Building_Demo.ipynb] notebook?
In general, which notebooks should be used, and in what order, to get the most out of REINVENT when starting a project for a target with or without known actives?
Roughly how large should the SMILES dataset be?

I hope these questions are clear and not too much of a problem!

Thank you for your time and warm regards,
Wout Van Eynde

Conda env create fails on Windows/Mac

Hey,

conda can't solve the dependencies on non-Linux platforms. This is because the packages in reinvent-shared.yml include build strings.
The easiest fix is to use the --no-builds flag when exporting the environment.
However, for long-term support, it might be beneficial to create a minimal reinvent-minimal.yml that contains only the explicitly needed dependencies, not the dependencies of our dependencies.

Thomas
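A minimal sketch of what `--no-builds` does to an exported file, assuming the usual `name=version=build` form that `conda env export` writes for conda packages. The helper `strip_build_strings` is hypothetical (not part of REINVENT or conda): it drops the trailing build string and leaves pip entries (`==`) untouched. Note that it does not remove Linux-only packages such as `ld_impl_linux-64`; those still have to be dropped separately.

```python
import re

# Matches a conda dependency line of the form "  - name=version=build".
_SPEC_WITH_BUILD = re.compile(r"^(\s*-\s*)([^=\s]+)=([^=\s]+)=([^=\s]+)\s*$")


def strip_build_strings(yml_text: str) -> str:
    """Return the YAML text with build strings removed from conda specs."""
    cleaned = []
    for line in yml_text.splitlines():
        match = _SPEC_WITH_BUILD.match(line)
        if match:
            # Keep the list prefix, the package name and the version only.
            cleaned.append(f"{match.group(1)}{match.group(2)}={match.group(3)}")
        else:
            # Pip entries ("pkg==1.0"), headers and other lines pass through unchanged.
            cleaned.append(line)
    return "\n".join(cleaned)


if __name__ == "__main__":
    # Illustrative input; the build string here is made up.
    example = "dependencies:\n  - numpy=1.17.0=py37h95a1406_0\n  - pip:\n    - reinvent-models==0.0.15rc1"
    print(strip_build_strings(example))
```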

Can not run reinvent 3.0 successfully on RTX3090

Hello,
The following errors occur when I run the notebook. I also tried PyTorch 1.10.1. Could you give me some advice on how to solve this problem?
Thank you!

/home/user/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/torch/cuda/__init__.py:143: UserWarning:
GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
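The warning means the installed PyTorch binary ships no kernels for the card's compute capability (sm_86 for the RTX 3090); sm_86 support arrived with CUDA 11.1, so a PyTorch build compiled against CUDA 11.1 or newer is needed. A small sketch for checking this before launching REINVENT, assuming `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are available (they are in recent PyTorch releases); the helper names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def capability_supported(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """True if the build ships native kernels ("sm_XY") for this compute capability.

    PTX entries such as "compute_37" are deliberately ignored here.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in set(arch_list)


def check_current_gpu() -> Optional[bool]:
    """Check GPU 0 against the running PyTorch build; None if CUDA is unavailable."""
    import torch  # imported lazily so capability_supported works without PyTorch

    if not torch.cuda.is_available():
        return None
    return capability_supported(torch.cuda.get_device_capability(0), torch.cuda.get_arch_list())
```

With the arch list from the warning above, `capability_supported((8, 6), ...)` is False, which matches the message.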

Can't run the reinforcement learning part in the notebook

Thank you for your great repo.
I have struggled with this problem for a while and don't know where I went wrong; I just followed the instructions you gave. Below I show in detail how I installed the libraries and how I ran the notebooks. It is a bit long, but please help me.

1. Installed the Reinvent library:
$ git clone https://github.com/MolecularAI/Reinvent.git
$ cd Reinvent
$ conda env create -f reinvent.yml
$ conda activate reinvent.v3.0
2. Then I opened another terminal to install ReinventCommunity:
$ git clone https://github.com/MolecularAI/ReinventCommunity.git
$ cd ReinventCommunity
$ conda env create -f environment.yml
$ conda activate ReinventCommunity
3. Ran one of the notebooks; here I chose the first example:
(ReinventCommunity) pharma1@pharma1:/mnt/SSD/projects/ReinventCommunity$ jupyter notebook
a. Opened the file "Complete_Use-Case-DRD2_Demo.ipynb".

b. In cell No. 1, changed only the following part to match my system:

from:
reinvent_dir = os.path.expanduser("~/Desktop/Projects/Publications/2020/2020-04_REINVENT_2.0/Reinvent")
reinvent_env = os.path.expanduser("~/miniconda3/envs/reinvent_shared.v2.1")
output_dir = os.path.expanduser("~/Desktop/REINVENT_Use-Case-DRD2_demo")

to:
reinvent_dir = os.path.expanduser("../../Reinvent")
reinvent_env = os.path.expanduser("~/miniconda3/envs/reinvent.v3.0")
output_dir = os.path.expanduser("~/Desktop/REINVENT_Use-Case-DRD2_demo")

c. Ran the following cells up to cell No. 12 without any errors.

d. Running cell No. 13 produced the following error messages:
Traceback (most recent call last):
  File "../../Reinvent/input.py", line 6, in <module>
    from running_modes.manager import Manager
  File "/mnt/SSD/projects/Reinvent/running_modes/manager.py", line 4, in <module>
    from running_modes.configurations import GeneralConfigurationEnvelope
  File "/mnt/SSD/projects/Reinvent/running_modes/configurations/__init__.py", line 2, in <module>
    from running_modes.configurations.scoring import ScoringRunnerComponents, ScoringRunnerConfiguration
  File "/mnt/SSD/projects/Reinvent/running_modes/configurations/scoring/__init__.py", line 1, in <module>
    from running_modes.configurations.scoring.scoring_runner_components import ScoringRunnerComponents
  File "/mnt/SSD/projects/Reinvent/running_modes/configurations/scoring/scoring_runner_components.py", line 3, in <module>
    from reinvent_scoring.scoring import ScoringFuncionParameters
ModuleNotFoundError: No module named 'reinvent_scoring'

e. I checked whether reinvent_scoring was installed:
(reinvent.v3.0) pharma1@pharma1:/mnt/SSD/projects/Reinvent$ python
Python 3.7.7 (default, Mar 26 2020, 15:48:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import reinvent_scoring

The import succeeded, so reinvent_scoring is definitely installed. I don't know how this error happened.

f. I tried to run REINVENT from the command line, just as you described:
$ conda activate reinvent.v3.0
(reinvent.v3.0) pharma1@pharma1:/mnt/SSD/projects/Reinvent$ python input.py /home/pharma1/Desktop/REINVENT_Use-Case-DRD2_demo/DRD2_config.json

I got the following error messages:

Traceback (most recent call last):
  File "input.py", line 20, in <module>
    manager = Manager(configuration)
  File "/mnt/SSD/projects/Reinvent/running_modes/manager.py", line 13, in __init__
    self._load_environmental_variables()
  File "/mnt/SSD/projects/Reinvent/running_modes/manager.py", line 22, in _load_environmental_variables
    with open(os.path.join(project_root, '../configs/config.json'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/SSD/projects/Reinvent/running_modes/../configs/config.json'

Do I need to rename "DRD2_config.json" to "config.json" and put it in the folder where "example.config.json" is?

g. I tried the other examples; everything was fine until I needed to run REINVENT, and then the same errors appeared.

I really don't know where I went wrong! Any help is highly appreciated; I have been very frustrated. Many thanks.
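Two checks that may help narrow this down, written as a hedged sketch; the helper names are hypothetical, not REINVENT functions. First, the ModuleNotFoundError in step d, while the import in step e works, suggests the interpreter that actually ran input.py is not the one tested by hand. If the notebook builds that interpreter path from `reinvent_env` (an assumption; check the cell that launches REINVENT), a wrong prefix such as `~/miniconda3` versus `~/anaconda3` would produce exactly this symptom. Second, the traceback in step f shows that `Manager._load_environmental_variables` opens `configs/config.json` inside the repository. Judging by the method name, that file holds environment settings rather than the run configuration, so `DRD2_config.json` would still be passed on the command line; one plausible fix is to create `config.json` from the example file, assumed here to sit in `configs/` as `example.config.json`.

```python
import os
import shutil
import subprocess
from pathlib import Path


def interpreter_can_import(python_path: str, module: str) -> bool:
    """Return True if `python_path` is an existing interpreter that can import `module`."""
    if not Path(python_path).is_file():
        return False
    result = subprocess.run([python_path, "-c", f"import {module}"], capture_output=True)
    return result.returncode == 0


def ensure_config_json(repo_root: str, example_name: str = "example.config.json") -> Path:
    """Copy configs/<example_name> to configs/config.json if the latter is missing.

    Assumes the example file lives in <repo_root>/configs; adjust if it does not.
    """
    configs = Path(repo_root) / "configs"
    target = configs / "config.json"
    if not target.exists():
        shutil.copyfile(configs / example_name, target)
    return target


if __name__ == "__main__":
    # Same prefix as in the notebook cell; change it if the env lives under ~/anaconda3.
    reinvent_env = os.path.expanduser("~/miniconda3/envs/reinvent.v3.0")
    print(interpreter_can_import(os.path.join(reinvent_env, "bin", "python"), "reinvent_scoring"))
```

Any values in the copied file (paths, keys) still need to be filled in for the local machine.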

unittest failure

I'm getting a failure when I run "python -m unittest":

======================================================================
FAIL: test_inception_model_1 (unittest_reinvent.inception_tests.test_add.Test_inception_model_add)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/software/Reinvent/unittest_reinvent/inception_tests/test_add.py", line 50, in test_inception_model_1
    self.assertEqual(len(self.inception_model.memory), 3)
AssertionError: 4 != 3

Any suggestions?

How to get the batches of the generated compounds?

In theory, the batches of compounds a generative model produces early on are unwanted, so I tried to keep only the last few batches and to loop several times with some changes to the configuration. After checking the results file, I found there was no batch-related information.
Of course, since the scores show an overall upward trend, splitting the compounds by score is also a reasonable approach. But I would still like to know whether there is a way to get batch information.

Looking forward to hearing from you, if it's no bother. Thanks!
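If no batch or step column is available, one workaround is to reconstruct batch membership from row order. This sketch rests on two assumptions that must hold for it to be correct: the rows are in generation order, and every step contributed exactly `batch_size` rows (deduplication or filtering before writing would break this). The helper names are hypothetical. The score-based alternative mentioned in the question is included for comparison.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, float]  # (SMILES, score)


def last_batches(rows: Sequence[Row], batch_size: int, n_last: int) -> List[Row]:
    """Rows belonging to the last `n_last` batches.

    Assumes rows are in generation order and each batch produced exactly
    `batch_size` rows (only the final batch may be shorter).
    """
    if batch_size <= 0 or n_last <= 0:
        return []
    n_batches = -(-len(rows) // batch_size)  # ceiling division
    first_row = max(0, n_batches - n_last) * batch_size
    return list(rows[first_row:])


def above_score(rows: Sequence[Row], threshold: float) -> List[Row]:
    """Score-based alternative: keep rows whose score reaches `threshold`."""
    return [row for row in rows if row[1] >= threshold]
```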

Reinvent 3.2 version inconsistencies

Hi,

I was trying to install Reinvent 3.2 and, while deciding which commit or tag to install from, I identified some inconsistencies.

Looking at the commit graph (https://github.com/MolecularAI/Reinvent/network), I see that there is a source code drift between master and the reinvent3.2 branch. Namely, there are two commits that seem to be associated with most of the 3.2 update, but they are different commits in each branch: 8abbb0c#diff-f05597e59d240c2c8e8430d508288b0bbaacaddf679e467815fb6c7c3110ac43 and b7324d2#diff-f05597e59d240c2c8e8430d508288b0bbaacaddf679e467815fb6c7c3110ac43

I cloned and diffed the commits directly, and there are many differences. For the installation, most notably:

-    - reinvent-models==0.0.15rc1
+    - reinvent-models==0.0.25

Since PyPI doesn't have version 0.0.25 available, I had no choice but to treat b7324d2 as version 3.2, but it would be good to clarify this point to make sure the source code is consistent and correct.

tagging @GuoJeff since you made most of these commits.

Thanks,
João

Tautomers

I am still using REINVENT 2.0 (I have not updated to 3.0 yet) and have observed that it generates incorrect tautomers (i.e. forms that are less abundant at a given pH). I used QED as a scoring function component in RL mode. Since these are still valid SMILES, the incorrect tautomers are not penalised and are present even in the last batches of generated molecules.
Examples of molecules with incorrect tautomers:
c1ccccc1-c1[nH]c(=N)nc(OC)c1 &
c1(=N)[nH]c(C)c(C#N)c(OC2CCCC2)n1
Is there some way to tailor scoring function components in REINVENT 2.0 to generate correct tautomers? Are there updates in REINVENT 3.0 that would help generate correct tautomers? Thank you.
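One way to at least flag such molecules is to compare each SMILES with RDKit's canonical tautomer and down-weight the ones that differ. This is a sketch only: the helper names are hypothetical, it is a post-processing or custom-component idea rather than an existing REINVENT 2.0 option, and RDKit picks its canonical tautomer with scoring rules, not by predicted abundance at a given pH, so it can disagree with the chemically preferred form. It assumes a recent RDKit release that provides `rdMolStandardize.TautomerEnumerator`.

```python
from typing import Callable, List


def rdkit_is_canonical_tautomer(smiles: str) -> bool:
    """True if RDKit's canonical tautomer has the same canonical SMILES as the input."""
    from rdkit import Chem  # imported lazily so tautomer_scores works without RDKit
    from rdkit.Chem.MolStandardize import rdMolStandardize

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False  # unparsable SMILES fail the check
    canonical = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)
    return Chem.MolToSmiles(mol) == Chem.MolToSmiles(canonical)


def tautomer_scores(
    smiles_list: List[str],
    is_preferred: Callable[[str], bool] = rdkit_is_canonical_tautomer,
    penalty: float = 0.0,
) -> List[float]:
    """Score 1.0 for molecules passing the tautomer check, `penalty` otherwise."""
    return [1.0 if is_preferred(smi) else penalty for smi in smiles_list]
```

Used as a multiplicative factor next to QED, a `penalty` of 0.0 removes flagged molecules from the reward entirely; a value between 0 and 1 only down-weights them.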

I tried to follow the steps to install Reinvent but can't create the environment; it shows "ResolvePackageNotFound:"

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

ld_impl_linux-64=2.33.1
secretstorage=3.2.0
libgcc-ng=9.1.0
gstreamer=1.14.0
libedit=3.1.20181209
libuuid=1.0.3
ncurses=6.2
glib=2.63.1
libstdcxx-ng=9.1.0
libgfortran-ng=7.3.0
gst-plugins-base=1.14.0
readline=8.0
dbus=1.13.12

I followed a suggestion from datitran/object_detector_app#41,
but it resulted in "Found conflicts! Looking for incompatible packages."
Thanks for your help.
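Several packages in that list (for example `ld_impl_linux-64`, `libgcc-ng` and `libstdcxx-ng`) are Linux-specific system libraries pinned in an exported environment. A common workaround is to remove the unresolved pins from a copy of the YAML and let conda choose versions; as the reporter found, this can still end in conflicts, so treat it as a starting point rather than a fix. A sketch with a hypothetical helper that removes the listed packages by name:

```python
from typing import Iterable


def drop_packages(yml_text: str, unresolved: Iterable[str]) -> str:
    """Remove dependency lines whose package name appears in `unresolved`.

    `unresolved` may hold bare names or "name=version" strings copied from
    conda's ResolvePackageNotFound output.
    """
    names = {spec.strip().lstrip("- ").split("=")[0] for spec in unresolved}
    kept = []
    for line in yml_text.splitlines():
        entry = line.strip()
        if entry.startswith("- "):
            package = entry[2:].split("=")[0].strip()
            if package in names:
                continue  # skip the pin that conda could not resolve
        kept.append(line)
    return "\n".join(kept)
```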

Smiles sample question

When I try to sample SMILES using the following file,

import os
from typing import List

import tqdm

from models.model import Model


class SamplefromModel:
    """
    Interface for molecule generators.
    """

    def __init__(self, file_path=r"~/Reinvent/data/augmented.prior", batch_size=128):
        # open()/torch.load do not expand "~", so expand it before loading the model
        self.file_path = os.path.expanduser(file_path)
        self.batch_size = batch_size
        self.RNN = Model.load_from_file(self.file_path, sampling_mode=True)

    def generate(self, number_samples: int) -> List[str]:
        """
        Samples SMILES strings from a molecule generator.

        Args:
            number_samples: number of molecules to generate

        Returns:
            A list of SMILES strings.
        """
        molecules_left = number_samples
        totalsmiles = []
        with tqdm.tqdm(total=number_samples) as progress_bar:
            while molecules_left > 0:
                # the last batch may be smaller than self.batch_size
                current_batch_size = min(self.batch_size, molecules_left)
                smiles, likelihoods = self.RNN.sample_smiles(current_batch_size, batch_size=self.batch_size)
                totalsmiles.extend(smiles)
                molecules_left -= current_batch_size
                progress_bar.update(current_batch_size)
        return totalsmiles

A strange error occurs: the script runs fine without CUDA, but with CUDA it fails with:
RuntimeError: Input, output and indices must be on the current device

Could you please help me out? Thanks a lot!

reinvent_shared.yml: ResolvePackageNotFound: intel-openmp=2019.5

The installation instructions currently fail at $ conda env create -n reinvent -f reinvent_shared.yml with the following output:

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound: 
  - intel-openmp=2019.5

Indeed, when I search the channels listed in the yml file, there are no 2019.5 variants:

$ conda search -c rdkit -c pytorch -c openeye -c omnia -c anaconda -c conda-forge intel-openmp
Loading channels: done
# Name                       Version           Build  Channel             
intel-openmp                2017.0.4      hf7c01fb_0  anaconda            
intel-openmp                2017.0.4      hf7c01fb_0  pkgs/main           
intel-openmp                2018.0.0               8  anaconda            
intel-openmp                2018.0.0               8  pkgs/main           
intel-openmp                2018.0.0      h15fc484_7  anaconda            
intel-openmp                2018.0.0      h15fc484_7  pkgs/main           
intel-openmp                2018.0.0      hc7b2577_8  anaconda            
intel-openmp                2018.0.0      hc7b2577_8  pkgs/main           
intel-openmp                2018.0.3               0  anaconda            
intel-openmp                2018.0.3               0  pkgs/main           
intel-openmp                  2019.0             117  anaconda            
intel-openmp                  2019.0             117  pkgs/main           
intel-openmp                  2019.0             118  anaconda            
intel-openmp                  2019.0             118  pkgs/main           
intel-openmp                  2019.1             144  anaconda            
intel-openmp                  2019.1             144  pkgs/main           
intel-openmp                  2019.3             199  anaconda            
intel-openmp                  2019.3             199  pkgs/main           
intel-openmp                  2019.4             243  anaconda            
intel-openmp                  2019.4             243  pkgs/main           
intel-openmp                  2020.0             166  anaconda            
intel-openmp                  2020.0             166  pkgs/main           
intel-openmp                  2020.1             217  anaconda            
intel-openmp                  2020.1             217  pkgs/main           
intel-openmp                  2020.2             254  anaconda            
intel-openmp                  2020.2             254  pkgs/main

Does the yml file need updating, or have I missed a step somewhere? Can I substitute 2019.4 for 2019.5, or would one of the 2020 versions be a better option (i.e. for backwards compatibility)?

Cheers
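Whether 2019.4 or a 2020 release is the better drop-in is something the maintainers would have to confirm. One conservative heuristic is to take the newest available version that is not newer than the pinned one. A sketch of that, parsing `conda search` output like the listing above (the helper names are hypothetical; non-numeric versions are skipped):

```python
from typing import Optional, Tuple


def _version_key(version: str) -> Optional[Tuple[int, ...]]:
    """Turn '2019.4' into (2019, 4); return None for non-numeric versions."""
    try:
        return tuple(int(part) for part in version.split("."))
    except ValueError:
        return None


def newest_not_newer(search_output: str, package: str, pinned: str) -> Optional[str]:
    """Newest version of `package` in `conda search` output that is <= `pinned`."""
    limit = _version_key(pinned)
    if limit is None:
        return None
    best = None
    for line in search_output.splitlines():
        fields = line.split()
        # Data rows look like: "<name> <version> <build> <channel>".
        if len(fields) < 2 or fields[0] != package:
            continue
        key = _version_key(fields[1])
        if key is not None and key <= limit and (best is None or key > _version_key(best)):
            best = fields[1]
    return best
```

On the listing above, `newest_not_newer(output, "intel-openmp", "2019.5")` picks 2019.4.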

python main_test.py failed

Thank you so much for your great repo.
I followed the instructions, created a conda environment, and ran
python main_test.py
I got the following error messages:

(reinvent.v3.0) xzhang@R1124G1:~/projects/Reinvent$ python main_test.py
Traceback (most recent call last):
  File "main_test.py", line 7, in <module>
    from unittest_reinvent.running_modes import *
  File "/home/xzhang/projects/Reinvent/unittest_reinvent/running_modes/__init__.py", line 1, in <module>
    from unittest_reinvent.running_modes.create_model_tests import *
  File "/home/xzhang/projects/Reinvent/unittest_reinvent/running_modes/create_model_tests/__init__.py", line 1, in <module>
    from unittest_reinvent.running_modes.create_model_tests.test_create_model import TestCreateModel
  File "/home/xzhang/projects/Reinvent/unittest_reinvent/running_modes/create_model_tests/test_create_model.py", line 9, in <module>
    from unittest_reinvent.fixtures.paths import MAIN_TEST_PATH, SMILES_SET_PATH
  File "/home/xzhang/projects/Reinvent/unittest_reinvent/fixtures/paths.py", line 5, in <module>
    with open(os.path.join(project_root, '../../configs/config.json'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/xzhang/projects/Reinvent/unittest_reinvent/fixtures/../../configs/config.json'

Could you please help me? Thanks

Which version of the ChEMBL dataset was used to train the prior generative model?

Hello,
I would like to know which version of the ChEMBL dataset was used to train the prior generative model.
Thanks,
xuelianl

Train new prior

Is there a tutorial for training a new prior? I have tried create_model.json.
However, I cannot see where to define the number of training epochs, nor can I find the code for the prior's training loop.

Thanks

model prediction with maccs/avalon descriptors

There was an error when using MACCS/Avalon descriptors. After changing the following code (in model_container.maccs_key / avalon), the problem was solved:

return fingerprints ==> return np.array(fingerprints)
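The fix plausibly works because scikit-learn estimators expect a 2-D numeric array of shape (n_molecules, n_bits), while a plain Python list returned from the container may not be handled as intended (the original error message is not shown, so this is an inference). A minimal sketch of the conversion with a shape check, assuming each fingerprint has already been turned into a sequence of 0/1 bits (for example with RDKit's `DataStructs.ConvertToNumpyArray`); the helper name is hypothetical:

```python
from typing import Sequence

import numpy as np


def to_feature_matrix(fingerprints: Sequence[Sequence[int]]) -> np.ndarray:
    """Stack per-molecule bit vectors into a 2-D array for model.predict()."""
    matrix = np.array(fingerprints, dtype=np.uint8)
    if matrix.ndim != 2:
        # A single flat vector (one molecule without nesting) ends up here;
        # ragged input already makes np.array raise a ValueError.
        raise ValueError(f"expected shape (n_molecules, n_bits), got {matrix.shape}")
    return matrix
```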

Tracing docking results from Dockstream

Hello,

It seems rather tedious to go from REINVENT results, with DockStream in the objective function, to the docked poses in the SDF files that DockStream wrote during the run. Essentially, I need to look up the step in REINVENT, open the SDF file for the step in which the molecule was generated, and identify the molecule by its SMILES string; only then do I know which docking pose to inspect in the SDF file.

Can you share how you handle this in practice? Is there an option to configure better lineage, e.g. consistent ligand names in DockStream and REINVENT output?

Lars
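Until such an option exists, the manual lookup can at least be scripted. This sketch assumes the poses for a step are in a single SDF file and that each record stores the ligand SMILES in a data field; the tag name `smiles` and all helper names are hypothetical, so check the actual field names in the DockStream output. Exact string matching also assumes both tools write the same SMILES for a molecule; if not, canonicalize both sides first (e.g. with RDKit).

```python
from typing import Iterator, List, Optional


def iter_sdf_records(sdf_text: str) -> Iterator[str]:
    """Yield the text of each record in an SDF file (records end with '$$$$')."""
    for block in sdf_text.split("$$$$"):
        record = block.strip("\n")
        if record.strip():
            yield record


def sdf_field(record: str, tag: str) -> Optional[str]:
    """Return the first value line of the data item '> <tag>' in a record."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(">") and f"<{tag}>" in line and i + 1 < len(lines):
            return lines[i + 1].strip()
    return None


def find_poses(sdf_text: str, smiles: str, tag: str = "smiles") -> List[str]:
    """All records whose `tag` field equals `smiles` (exact string match)."""
    return [r for r in iter_sdf_records(sdf_text) if sdf_field(r, tag) == smiles]
```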

DockStream component config defaulting to some unknown config?

Hi,

I noticed that if the DockStream configuration_path is incorrect, i.e. it points to a file that does not exist, REINVENT still runs and still performs docking, but with some unknown configuration. What is it running with in this case?

To reproduce, run the component with any string in configuration_path that is not an actual file.
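Whatever the fallback turns out to be, a pre-flight check on the REINVENT JSON before submitting the job avoids running with an unintended setup. A sketch that walks the whole configuration and reports every `configuration_path` value that does not point to an existing file, without assuming where in the JSON the DockStream component stores that key (the helper names are hypothetical):

```python
import json
import os
from typing import Any, List


def missing_paths(node: Any, key: str = "configuration_path") -> List[str]:
    """Recursively collect values of `key` that are not existing files."""
    missing = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, str) and not os.path.isfile(value):
                missing.append(value)
            else:
                missing.extend(missing_paths(value, key))
    elif isinstance(node, list):
        for item in node:
            missing.extend(missing_paths(item, key))
    return missing


def check_config_file(config_path: str) -> None:
    """Raise if any configuration_path in the REINVENT JSON file is missing."""
    with open(config_path) as handle:
        missing = missing_paths(json.load(handle))
    if missing:
        raise FileNotFoundError(f"configuration_path not found: {missing}")
```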

ResolvePackageNotFound

Hi,
Creating the environment does not work (Windows 10):

..\Reinvent>conda env create -f reinvent.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

libedit=3.1.20181209
ld_impl_linux-64=2.33.1
libgcc-ng=9.1.0
glib=2.63.1
gstreamer=1.14.0
libgfortran-ng=7.3.0
ncurses=6.2
secretstorage=3.2.0
dbus=1.13.12
libstdcxx-ng=9.1.0
libuuid=1.0.3
gst-plugins-base=1.14.0
readline=8.0

Thanks

TypeError: 'NoneType' object is not callable

Hi! I ran into a problem when running the transfer learning demo in the environment I built.

(only 1 epoch is set for training.)
python3 input.py ../../test/transfer_learning_config.json

/home/young/miniconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /opt/conda/conda-bld/pytorch_1607370128159/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
100%|######################################################################################################################| 1/1 [00:01<00:00, 1.14s/it]
19:11:46: base_transfer_learning_logger.log_message +26: INFO Collecting data for epoch 1
Exception ignored in: <function LocalTransferLearningLogger.__del__ at 0x7f62e2a863b0>
Traceback (most recent call last):
  File "/mnt/c/Users/YANG/Desktop/xtai/ai/Reinvent-master/running_modes/transfer_learning/logging/local_transfer_learning_logger.py", line 20, in __del__
  File "/home/young/miniconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 1033, in close
  File "/home/young/miniconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 133, in flush
  File "/home/young/miniconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/tensorboard/summary/writer/event_file_writer.py", line 106, in flush
  File "/home/young/miniconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/tensorboard/summary/writer/event_file_writer.py", line 155, in flush
  File "/home/young/miniconda3/envs/reinvent.v3.0/lib/python3.7/queue.py", line 89, in join
  File "/home/young/miniconda3/envs/reinvent.v3.0/lib/python3.7/threading.py", line 289, in wait
TypeError: 'NoneType' object is not callable

I don't know how to solve it. Could you please help me? Thanks a lot!
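The traceback starts with "Exception ignored in: ... __del__", which means the exception was raised inside a destructor and Python only printed it; it did not abort the run. Destructors that run during interpreter shutdown can find parts of the standard library (here apparently `threading`) already torn down, which would explain a `'NoneType' object is not callable` at that point. A common remedy is to close the TensorBoard writer explicitly instead of relying on `__del__`. The sketch shows that pattern with a hypothetical wrapper; any object with a `close()` method, such as `torch.utils.tensorboard.SummaryWriter`, can be passed in.

```python
from typing import Any


class ClosingWriter:
    """Wrap a writer so it is closed exactly once, at a well-defined point."""

    def __init__(self, writer: Any):
        self._writer = writer
        self._closed = False

    @property
    def writer(self) -> Any:
        return self._writer

    def close(self) -> None:
        if not self._closed:
            self._writer.close()  # flush and release while the interpreter is still intact
            self._closed = True

    def __enter__(self) -> "ClosingWriter":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # do not swallow exceptions raised inside the with-block
```

Logging would happen inside `with ClosingWriter(SummaryWriter(log_dir)) as logger: ...`, so the writer is flushed when the block ends rather than at interpreter exit.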

How to run Reinvent in parallel?

Hi,

Firstly, thank you for developing these excellent resources.

I wonder, would it be possible to run Reinvent on multiple CPUs or GPUs in parallel?

Thank you for your time
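Whether a single REINVENT run can span several GPUs is for the maintainers to answer. What can be done independently of REINVENT itself is to run several independent jobs side by side, each pinned to one GPU through the standard `CUDA_VISIBLE_DEVICES` environment variable and started with the same `python input.py <config>` command used elsewhere on this page. A sketch with hypothetical helper names; each config should write to its own output directory:

```python
import os
import subprocess
from typing import Dict, List, Sequence, Tuple

Job = Tuple[List[str], Dict[str, str]]


def plan_jobs(
    configs: Sequence[str],
    gpu_ids: Sequence[int],
    python: str = "python",
    entry_point: str = "input.py",
) -> List[Job]:
    """One command per config, assigning GPUs round-robin."""
    if not gpu_ids:
        raise ValueError("need at least one GPU id")
    jobs = []
    for index, config in enumerate(configs):
        env = dict(os.environ)
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_ids[index % len(gpu_ids)])
        jobs.append(([python, entry_point, config], env))
    return jobs


def run_jobs(jobs: Sequence[Job]) -> List[int]:
    """Start all jobs at once, wait for them, and return their exit codes."""
    processes = [subprocess.Popen(command, env=env) for command, env in jobs]
    return [process.wait() for process in processes]
```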

Does Reinvent implement reaction rules?

Thanks a lot for this amazing tool! I wonder whether Reinvent implements any kind of reaction rules in its computations; other tools such as RetroPath do. If reaction rules are not considered, how does Reinvent determine whether a SMILES string is valid?

Thanks in advance :)

Problem with RL molecular weight as scoring function component

Hi,
I'm trying to use Molecular weight as a scoring function component using this .py script to create the .json to submit:

# load dependencies
import os
import re
import json
import tempfile

# --------- change these path variables as required
reinvent_dir = os.getenv('REINVENT_DIR')
#module load 
#reinvent_env = os.path.expanduser("")
output_dir = os.path.expanduser("/home/s970675/reinvent_test/test_soglie/MW")

# --------- do not change
#get the notebook's root path
#try: ipynb_path
#except NameError: ipynb_path = os.getcwd()

# if required, generate a folder to store the results
try:
    os.mkdir(output_dir)
except FileExistsError:
    pass

# initialize the dictionary
configuration = {
    "version": 2,                          # we are going to use REINVENT's newest release
    "run_type": "reinforcement_learning"   # other run types: "sampling", "validation",
                                        #                  "transfer_learning",
                                        #                  "scoring" and "create_model"
}
# add block to specify whether to run locally or not and
# where to store the results and logging
configuration["logging"] = {
    "sender": "http://127.0.0.1",          # only relevant if "recipient" is set to "remote"
    "recipient": "local",                  # either to local logging or use a remote REST-interface
    "logging_frequency": 10,               # log every x-th steps
    "logging_path": os.path.join(output_dir, "progress.log"), # load this folder in tensorboard
    "resultdir": os.path.join(output_dir, "results"),         # will hold the compounds (SMILES) and summaries
    "job_name": "Reinforcement learning demo",                # set an arbitrary job name for identification
    "job_id": "demo"                       # only relevant if "recipient" is set to "remote"
}
# add the "parameters" block
configuration["parameters"] = {}

# add a "diversity_filter"
configuration["parameters"]["diversity_filter"] =  {
    "name": "IdenticalMurckoScaffold",     # other options are: "IdenticalTopologicalScaffold", 
                                        #                    "NoFilter" and "ScaffoldSimilarity"
                                        # -> use "NoFilter" to disable this feature
    "nbmax": 25,                           # the bin size; penalization will start once this is exceeded
    "minscore": 0.4,                       # the minimum total score to be considered for binning
    "minsimilarity": 0.4                   # the minimum similarity to be placed into the same bin
}

# prepare the inception (we do not use it in this example, so "smiles" is an empty list)
configuration["parameters"]["inception"] = {
    "smiles": [],                          # fill in a list of SMILES here that can be used (or leave empty)
    "memory_size": 100,                    # sets how many molecules are to be remembered
    "sample_size": 10                      # how many are to be sampled each epoch from the memory
}

# set all "reinforcement learning"-specific run parameters
configuration["parameters"]["reinforcement_learning"] = {
    "prior": os.path.join(reinvent_dir, "data/augmented.prior"), # path to the pre-trained model
    "agent": os.path.join(reinvent_dir, "data/augmented.prior"), # path to the pre-trained model
    "n_steps": 125,                        # the number of epochs (steps) to be performed; often 1000
    "sigma": 128,                          # used to calculate the "augmented likelihood", see publication
    "learning_rate": 0.0001,               # sets how strongly the agent is influenced by each epoch
    "batch_size": 128,                     # specifies how many molecules are generated per epoch
    "reset": 0,                            # if not '0', the reset the agent if threshold reached to get
                                        # more diverse solutions
    "reset_score_cutoff": 0.5,             # if resetting is enabled, this is the threshold
    "margin_threshold": 50                 # specify the (positive) margin between agent and prior
}

component_molecular_weight = {
    "component_type": "molecular_weight",  # was "mol_weight", which is not a registered component type (see the TypeError below)
    "name": "Molecular Weight",    # arbitrary name for the component
    "weight": 1,                           # the weight of the component (default: 1)
    "model_path": None,                    # not required; note, this is "null" in JSON
    "smiles": [],                          # not required
    "specific_parameters": {
        "transformation_type": "step",
        "high": 650,
        "low": 300,
        "transformation": True
        }      
}

scoring_function = {
    "name": "custom_product",              # this is our default one (alternative: "custom_sum")
    "parallel": False,                     # sets whether components are to be executed
                                        # in parallel; note, that python uses "False" / "True"
                                        # but the JSON "false" / "true"

    # the "parameters" list holds the individual components
    "parameters": [
        component_molecular_weight
    ]
}


configuration["parameters"]["scoring_function"] = scoring_function
# write the configuration file to the disc
configuration_JSON_path = os.path.join(output_dir, "RL_config.json")
with open(configuration_JSON_path, 'w') as f:
    json.dump(configuration, f, indent=4, sort_keys=True)

However, when I try to run REINVENT I get this error:
'''
/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/miniconda3/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
  File "/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/input.py", line 25, in <module>
    main()
  File "/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/input.py", line 21, in main
    manager.run()
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/running_modes/manager.py", line 95, in run
    job()
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/running_modes/manager.py", line 46, in _run_reinforcement_learning
    scoring_function = self._setup_scoring_function(rl_components.scoring_function)
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/running_modes/manager.py", line 62, in _setup_scoring_function
    scoring_function_instance = ScoringFunctionFactory(scoring_function_parameters)
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/scoring/scoring_function_factory.py", line 16, in __new__
    return cls.create_scoring_function_instance(sf_parameters, scoring_function_registry)
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/scoring/scoring_function_factory.py", line 25, in create_scoring_function_instance
    return scoring_function(parameters, sf_parameters.parallel)
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/scoring/function/custom_product.py", line 14, in __init__
    super().__init__(parameters, parallel)
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/scoring/function/base_scoring_function.py", line 39, in __init__
    self.scoring_components = factory.create_score_components()
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/scoring/score_components/score_component_factory.py", line 39, in create_score_components
    return [self._current_components.get(p.component_type)(p) for p in self._parameters]
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/scoring/score_components/score_component_factory.py", line 39, in <listcomp>
    return [self._current_components.get(p.component_type)(p) for p in self._parameters]
TypeError: 'NoneType' object is not callable
'''
I tried other physicochemical properties and everything runs smoothly, so I'm wondering whether this is a problem with the MW component.
Thanks in advance for your help.
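The last frame explains the symptom: `self._current_components.get(p.component_type)` returns `None` when the `component_type` string is not registered (that is how `dict.get` behaves for a missing key), and calling that `None` raises `TypeError: 'NoneType' object is not callable`. So `"mol_weight"` is most likely not a registered name; the name REINVENT uses for this component appears to be `"molecular_weight"`, which is worth checking against the component list of the installed version. A sketch of a lookup that fails with a readable message instead; the registry contents and helper name are illustrative, not REINVENT's:

```python
from typing import Any, Callable, Dict, List

ComponentFactory = Callable[[dict], Any]


def create_components(parameters: List[dict], registry: Dict[str, ComponentFactory]) -> List[Any]:
    """Instantiate scoring components, reporting unknown component types by name."""
    unknown = [p["component_type"] for p in parameters if p["component_type"] not in registry]
    if unknown:
        raise ValueError(f"unknown component_type(s) {unknown}; registered: {sorted(registry)}")
    return [registry[p["component_type"]](p) for p in parameters]
```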

How do you write your name?

Hi, I am wondering how you prefer to write the name of your project. So far I have seen the following variants in various places:

REINVENT
Reinvent
[RE]-Invent

Which one is preferred?

Thanks!

Could you please add more types of predictive property components?

Hi, I've found that the predictive property scoring components are restricted to scikit-learn models. However, other kinds of predictive models are used in practice, such as PyTorch models. Would it be possible to add support for more model types, such as PyTorch, to Reinvent?
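One way such support could be structured, sketched under assumptions: an adapter that gives any batch-scoring callable the scikit-learn-style `predict(X)` method. `CallableRegressor` is hypothetical, and whether Reinvent's predictive property component would accept such an object depends on its internals, which this sketch does not assume.

```python
from typing import Callable

import numpy as np


class CallableRegressor:
    """Hypothetical adapter exposing a scikit-learn-style `predict` method.

    `fn` maps a float32 array of shape (n_samples, n_features) to one value per
    sample. With PyTorch it could wrap a model, for example:
    lambda x: model(torch.as_tensor(x)).detach().cpu().numpy()
    """

    def __init__(self, fn: Callable[[np.ndarray], np.ndarray]):
        self._fn = fn

    def predict(self, X) -> np.ndarray:
        features = np.asarray(X, dtype=np.float32)
        if features.ndim != 2:
            raise ValueError("expected a 2-D array of shape (n_samples, n_features)")
        # Flatten to shape (n_samples,), as scikit-learn regressors return.
        return np.asarray(self._fn(features)).reshape(-1)
```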

Numpy 1.17 is too new, resulting in an error

Hello, I installed numpy 1.17 according to the environment versions you provided, but I encountered the following problems when executing the create_model mode:

/home/zh/sda3/Anaconda3/envs/reinvent_2.0/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/zh/sda3/Anaconda3/envs/reinvent_2.0/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/zh/sda3/Anaconda3/envs/reinvent_2.0/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/zh/sda3/Anaconda3/envs/reinvent_2.0/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/zh/sda3/Anaconda3/envs/reinvent_2.0/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/zh/sda3/Anaconda3/envs/reinvent_2.0/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

After reinstalling numpy==1.16.4, I ran the command again without any errors.

So do the provided environment dependencies need to be modified?
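The lines above are `FutureWarning`s raised from tensorboard's TensorFlow stub, and Python does not treat warnings as errors by default. If the run otherwise completes under numpy 1.17, a minimal sketch of hiding just these messages with the standard `warnings` module (the helper name is hypothetical) is:

```python
import warnings


def silence_tensorboard_numpy_futurewarnings() -> None:
    """Ignore FutureWarnings raised from tensorboard's TensorFlow stub dtypes module.

    Other warnings, including FutureWarnings from other modules, are unaffected.
    """
    warnings.filterwarnings(
        "ignore",
        category=FutureWarning,
        # Regex matched against the start of the module that issues the warning.
        module=r"tensorboard\.compat\.tensorflow_stub\.dtypes",
    )
```

The warnings are emitted when that module is first imported, so the filter has to be installed before anything imports tensorboard. Pinning numpy==1.16.4, as described above, avoids them without a filter.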

Some problems using fp16 to reduce memory consumption

Hi,

I used my own QSAR model, but it ran out of memory,
so I want to use apex for fp16 mixed-precision computation.

Apex requires the following statement: model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
I'm a little confused about which objects to pass in place of model and optimizer.

I look forward to your reply.
Sincerely

Suggestion: torch.cuda.is_available() checks for Reinvent Models PyPI Package

PyPI Package: Reinvent Models
Version: 0.0.14
Suggestion: Additional CUDA checks to make the package compatible with CPU

Hello everyone,

Firstly, allow me to thank you for building this amazing library. We are building an open-source library to accelerate hypothesis generation in the scientific discovery process, into which we have integrated, among others, reinvent-models so that users can generate molecules with your framework.

We have noticed that the reinvent-models package is missing CUDA checks in some parts of the code base, where it assumes that it will always run on a GPU.

To overcome this limitation, we mirrored the latest version of the reinvent-models PyPI package in a repo and added those changes to make it compatible with CPU.

We thought this might be a useful change, so you could consider including it in the next release of the package.

Please let us know your thoughts on this and thanks again for the amazing work :)

Kind regards,
Ashish Dave
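The kind of guard being suggested can be sketched as follows, assuming only the public PyTorch calls `torch.cuda.is_available()` and `torch.load(..., map_location=...)`; `pick_device` and `load_checkpoint` are hypothetical helpers, not functions from reinvent-models:

```python
def pick_device(prefer_cuda: bool = True) -> str:
    """Return "cuda" only when PyTorch is importable and reports a usable GPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if prefer_cuda and torch.cuda.is_available():
        return "cuda"
    return "cpu"


def load_checkpoint(path: str):
    """Load a saved model file onto whichever device is available.

    map_location remaps tensors that were saved on a GPU, so loading also works
    on CPU-only machines.
    """
    import torch

    return torch.load(path, map_location=torch.device(pick_device()))
```

Tensors and modules would then be moved with `.to(pick_device())` rather than a hard-coded `.cuda()`.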

Test error and type error

Hi,
Thank you for developing such an excellent tool!
I ran into some problems when running the code.
The first is a type error when I run the transfer_learning tutorial:
TypeError: 'NoneType' object is not callable. I was confused.
The error report is as follows:
Exception ignored in: <function LocalTransferLearningLogger.__del__ at 0x7f04ecea2830>
Traceback (most recent call last):
File "/home/ubuntu-61/code/Reinvent-1-master/Reinvent-1-master/running_modes/transfer_learning/logging/local_transfer_learning_logger.py", line 21, in __del__
File "/home/ubuntu-61/anaconda3/envs/pytorch4/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 1033, in close
File "/home/ubuntu-61/anaconda3/envs/pytorch4/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 133, in flush
File "/home/ubuntu-61/anaconda3/envs/pytorch4/lib/python3.7/site-packages/tensorboard/summary/writer/event_file_writer.py", line 106, in flush
File "/home/ubuntu-61/anaconda3/envs/pytorch4/lib/python3.7/site-packages/tensorboard/summary/writer/event_file_writer.py", line 155, in flush
File "/home/ubuntu-61/anaconda3/envs/pytorch4/lib/python3.7/queue.py", line 89, in join
File "/home/ubuntu-61/anaconda3/envs/pytorch4/lib/python3.7/threading.py", line 289, in wait
TypeError: 'NoneType' object is not callable
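"Exception ignored in" means the TypeError was raised inside the logger's `__del__` and swallowed. A likely cause is that the finalizer runs during interpreter shutdown, after modules such as `threading` have been partly torn down, so closing the tensorboard writer fails. A minimal sketch of closing the writer explicitly instead, assuming any writer object with `flush()` and `close()` (for example `torch.utils.tensorboard.SummaryWriter`); `ClosingLogger` is hypothetical:

```python
class ClosingLogger:
    """Hypothetical logger that closes its writer explicitly, not in __del__."""

    def __init__(self, writer):
        self._writer = writer
        self._closed = False

    def close(self) -> None:
        # Idempotent: flush and close the writer only once.
        if not self._closed:
            self._writer.flush()
            self._writer.close()
            self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Runs at the end of the with-block, while the interpreter is fully alive.
        self.close()
```

Used as `with ClosingLogger(writer) as logger: ...` around training, the writer is closed before shutdown begins.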

The second error is an assertion error.

When I run "python -m pytest unittest_reinvent", it reports 4 failed, 92 passed, 29 warnings.
The error report is as follows:
======================== FAILURES ===============================================================
______________________________________ Test_model_functions.test_likelihood_function_differences _______________________________________

self = <unittest_reinvent.model_function_tests.test_model_functions.Test_model_functions testMethod=test_likelihood_function_differences>

    def test_likelihood_function_differences(self):
        seq, sample, nll = self.model.sample_sequences_and_smiles(batch_size=128)
        nll2 = self.model.likelihood(seq)
        nll3 = self.model.likelihood_smiles(sample)

>       nt.assert_array_almost_equal(nll.detach().cpu().numpy(), nll2.detach().cpu().numpy(), 3)

E AssertionError:
E Arrays are not almost equal to 3 decimals
E
E Mismatch: 45.3%
E Max absolute difference: 0.00488281
E Max relative difference: 0.00026931
E x: array([19.316, 28.097, 33.41 , 18.136, 45.46 , 24.451, 43.118, 38.982,
E 32.112, 20.839, 21.984, 32.963, 21.883, 30.506, 37.408, 31.448,
E 24.063, 43.964, 38.294, 21.17 , 42.545, 34.434, 27.453, 27.187,...
E y: array([19.313, 28.096, 33.409, 18.131, 45.459, 24.447, 43.119, 38.986,
E 32.115, 20.837, 21.984, 32.965, 21.884, 30.507, 37.411, 31.451,
E 24.063, 43.965, 38.294, 21.171, 42.546, 34.435, 27.452, 27.185,...

unittest_reinvent/model_function_tests/test_model_functions.py:65: AssertionError
__________________________________________ Test_model_functions.test_likelihoods_from_model_1 __________________________________________

self = <unittest_reinvent.model_function_tests.test_model_functions.Test_model_functions testMethod=test_likelihoods_from_model_1>

    def test_likelihoods_from_model_1(self):
        likelihoods = self.model.likelihood_smiles(["CCC", "c1ccccc1"])

>       self.assertAlmostEqual(likelihoods[0].item(), 20.9116, 3)

E AssertionError: 20.9108943939209 != 20.9116 within 3 places (0.0007056060791015284 difference)

unittest_reinvent/model_function_tests/test_model_functions.py:28: AssertionError
_____________________________________________ Test_murcko_scaffold_filter.test_save_to_csv _____________________________________________

self = <unittest_reinvent.scaffoldfilter_tests.test_murcko_scaffold_filter.Test_murcko_scaffold_filter testMethod=test_save_to_csv>

    def test_save_to_csv(self):
        folder = self.workfolders[0]
        self.scaffold_filter.save_to_csv(folder)
        output_file = os.path.join(folder, "scaffold_memory.csv")

>       self.assertEqual(os.path.isfile(output_file), True)

E AssertionError: False != True

unittest_reinvent/scaffoldfilter_tests/test_murcko_scaffold_filter.py:56: AssertionError
_______________________________________________ Test_no_scaffold_filter.test_save_to_csv _______________________________________________

self = <unittest_reinvent.scaffoldfilter_tests.test_no_filter.Test_no_scaffold_filter testMethod=test_save_to_csv>

    def test_save_to_csv(self):
        folder = self.workfolders[0]
        self.scaffold_filter.save_to_csv(folder)
        output_file = os.path.join(folder, "scaffold_memory.csv")

>       self.assertEqual(os.path.isfile(output_file), True)

E AssertionError: False != True

unittest_reinvent/scaffoldfilter_tests/test_no_filter.py:53: AssertionError
My running platform: linux -- Python 3.7.8, pytest-5.3.0, py-1.9.0, pluggy-0.13.1
I don't know if this is related to the error above.

Thanks in advance for your help,

Sincerely
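The mismatches in the first two failures are tiny: the largest relative difference reported is 0.00026931, yet the 3-decimal checks fail on absolute differences of about 0.005 and 0.0007. Differences of this size plausibly come from GPU, driver or library versions rather than a modelling bug, though that is an assumption. A sketch of a relative-tolerance comparison, using the first eight values from each array above; the rtol=1e-3 threshold is an assumed choice:

```python
import numpy as np

# First eight values of the x (sampled) and y (recomputed) arrays printed above.
sampled_nll = np.array([19.316, 28.097, 33.41, 18.136, 45.46, 24.451, 43.118, 38.982])
recomputed_nll = np.array([19.313, 28.096, 33.409, 18.131, 45.459, 24.447, 43.119, 38.986])


def nll_close(actual, expected, rtol: float = 1e-3) -> bool:
    """True when every |actual - expected| <= rtol * |expected| (no absolute slack)."""
    return bool(np.allclose(actual, expected, rtol=rtol, atol=0.0))


# A fixed absolute threshold like the 3-decimal array check (1.5e-3) rejects these values...
strict_ok = bool(np.all(np.abs(sampled_nll - recomputed_nll) < 1.5e-3))
# ...while a 0.1 % relative tolerance accepts both the arrays and the scalar case.
relative_ok = nll_close(sampled_nll, recomputed_nll) and nll_close(20.9108943939209, 20.9116)
```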

Train a new model with a new .smi set

Hi,

Thanks again for developing this wonderful tool.
I'm trying to train a new model on a set of SMILES of interest. First I create a new model, and then I train it with a TL run.
I managed to do this with your original ChEMBL training set without any problems, but with other training sets, using the same protocol, I always get this type of error:

  File "/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/input.py", line 25, in <module>
    main()
  File "/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/input.py", line 21, in main
    manager.run()
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/running_modes/manager.py", line 95, in run
    job()
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/running_modes/manager.py", line 41, in _run_transfer_learning
    runner.run()
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/running_modes/transfer_learning/transfer_learning_runner.py", line 29, in run
    self._train_epoch(epoch, self._config.input_smiles_path)
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/running_modes/transfer_learning/transfer_learning_runner.py", line 38, in _train_epoch
    for _, batch in enumerate(self._progress_bar(data_loader, total=len(data_loader))):
  File "/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/miniconda3/lib/python3.7/site-packages/tqdm/std.py", line 1091, in __iter__
    for obj in iterable:
  File "/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/models/dataset.py", line 22, in __getitem__
    encoded = self._vocabulary.encode(tokens)
  File "/cluster/import/master/progs/all/opensource/reinvent/0.0_2020.09.14_9c57636/source/models/vocabulary.py", line 59, in encode
    vocab_index[i] = self._tokens[token]
KeyError: '7'
 69%|######8 | 7798/11350 [04:31<02:03, 28.77it/s]
It doesn't always stop at the same point, and the key error is not always the same either (sometimes it is KeyError: '%15' or KeyError: '%19'). I would assume it is related to the training set. I tried to preprocess all the SMILES with RDKit, retaining only the atom types you use in the original publication (['H','B','C','N','O','F','S','Cl','Br']) and also using 'isomericsmiles=False', but unfortunately this doesn't solve the problem.

Thanks in advance for your help,

Cheers
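The traceback ends in `vocabulary.encode`, where `self._tokens[token]` raises KeyError: the token (here the ring-closure label '7', elsewhere '%15' or '%19') is not in the vocabulary the model was created with. Ring-closure labels depend on how many rings are open at once, not on atom types, which may be why filtering atoms and disabling isomeric SMILES did not help. A sketch for finding the offending SMILES before training, assuming you can obtain the model's token set as a Python set; the regex tokenizer is an illustrative stand-in, not necessarily the one Reinvent uses:

```python
import re
from typing import Iterable, List, Set, Tuple

# Illustrative tokenizer: bracket atoms, two-digit ring closures (%NN),
# the two-letter halogens Br and Cl, then any single character.
_TOKEN_RE = re.compile(r"(\[[^\]]+\]|%\d{2}|Br|Cl|.)")


def tokenize(smiles: str) -> List[str]:
    return _TOKEN_RE.findall(smiles)


def split_by_vocabulary(
    smiles: Iterable[str], vocabulary: Set[str]
) -> Tuple[List[str], List[Tuple[str, Set[str]]]]:
    """Separate SMILES whose tokens are all known from those with unknown tokens."""
    kept: List[str] = []
    rejected: List[Tuple[str, Set[str]]] = []
    for smi in smiles:
        unknown = {tok for tok in tokenize(smi) if tok not in vocabulary}
        if unknown:
            rejected.append((smi, unknown))
        else:
            kept.append(smi)
    return kept, rejected
```

Dropping the rejected molecules, or creating the model from a vocabulary that contains these tokens, should avoid the KeyError, provided the tokenizer matches the one the model uses.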

from reinvent_scoring.scoring.scoring_function_parameters import ScoringFuncionParameters

This import statement fails all over the place because of the typo.

It should be ScoringFunctionParameters, not ScoringFuncionParameters, because that is how it is spelled in the reinvent_scoring package. I also had to manually install reinvent_chemistry, reinvent_models and reinvent_scoring, and create the json.config file.
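Until the spelling is consistent everywhere, a small compatibility helper can try the correct name first and fall back to the misspelled one. A sketch using only `importlib`; `import_first` is hypothetical, and which spelling a given reinvent_scoring release exposes should be checked against the installed package:

```python
import importlib
from typing import Any


def import_first(module_name: str, *names: str) -> Any:
    """Return the first attribute in `names` that `module_name` provides."""
    module = importlib.import_module(module_name)
    for name in names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{module_name} provides none of {names}")


# Example (requires reinvent_scoring to be installed):
# ScoringFunctionParameters = import_first(
#     "reinvent_scoring.scoring.scoring_function_parameters",
#     "ScoringFunctionParameters",
#     "ScoringFuncionParameters",
# )
```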

Use of individual config filenames prevented

One line in input.py prevents Reinvent from using config files in other directories or with other file names, which, among other things, makes tutorials like this one (https://github.com/MolecularAI/ReinventCommunity/blob/d9fa00faaccb5bf40913d21250aac1dde52cdad7/notebooks/Complete_Use-Case-DRD2_Demo.ipynb) impossible to run without changing the code by hand.

Reinvent/input.py, line 11 in b7324d2:

    DEFAULT_BASE_CONFIG_PATH = (Path(__file__).parent / 'configs/config.json').resolve()
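A sketch of how the hard-coded default could become a fallback rather than the only option: an explicit path is checked first, then an environment variable, then the default. The variable name REINVENT_BASE_CONFIG and the function are hypothetical; in input.py the default would still be built from `Path(__file__).parent`:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_base_config_path(
    default: Path,
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    env_var: str = "REINVENT_BASE_CONFIG",  # hypothetical variable name
) -> Path:
    """Choose the base config: explicit path, then environment variable, then default."""
    env = os.environ if env is None else env
    if cli_value:
        return Path(cli_value).expanduser().resolve()
    from_env = env.get(env_var)
    if from_env:
        return Path(from_env).expanduser().resolve()
    # e.g. (Path(__file__).parent / "configs/config.json").resolve() in input.py
    return default
```

A command-line flag such as a hypothetical `--base-config` could feed `cli_value`, so tutorials could point at their own config without editing the code.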

What's the difference between REINVENT 2.0 and 3.x?

Hello,

I saw that REINVENT 3.0 is under active development. What's the difference between 3.x and 2.0/2.1? Which should we use for an academic study?

Are there any groups or forums for communication?

Hello, I often run into unclear problems when reproducing your results. They are not bugs in the code, so I would like to know whether you have a group or forum for discussing such problems, where questions could be answered promptly.

If not, I think it would be a very good idea: it could attract more people to the project, and it would make promoting it very convenient for you. What do you think?

Complete "Matching substructure" could not be retained in generated molecules

Hello,

First of all, I would like to thank all authors for this great tool.

I was using the matching substructure component, but for some substructures I could not get the match in the generated molecules. I was wondering if this has to do with the type of compounds or the settings used.
I tried many different options: training for a large number of steps, excluding the custom alerts component, adding a Tanimoto similarity component, using SMARTS/SMILES, and also using inception with the set of molecules.

I get a partial substructure match in some of the generated molecules: part of the substructure is retained in some molecules and the other part in others. I would like the complete substructure to be present in the generated molecules. Has anyone faced similar issues? Any help would be much appreciated. Thank you.

Unpickling error

Hi, I am getting this error while trying to run the Complete_Use-Case-DRD2_Demo notebook in ReinventCommunity:

(reinvent.v3.0) ubuntu@ip-172-31-47-40:~/repos/Reinvent2/Reinvent$ python /home/ubuntu/repos/Reinvent/input.py /home/ubuntu/repos/OutputS/DRD2_config.json
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 31, in _load_model
activity_model = self._load_container(parameters)
File "/home/ubuntu/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 38, in _load_container
scikit_model = pickle.load(f)
_pickle.UnpicklingError: invalid load key, '\x00'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ubuntu/repos/Reinvent/input.py", line 21, in <module>
manager.run()
File "/home/ubuntu/repos/Reinvent/running_modes/manager.py", line 16, in run
runner = RunningMode(self.configuration)
File "/home/ubuntu/repos/Reinvent/running_modes/constructors/running_mode.py", line 22, in __new__
return ReinforcementLearningModeConstructor(configuration)
File "/home/ubuntu/repos/Reinvent/running_modes/constructors/reinforcement_learning_mode_constructor.py", line 20, in __new__
scoring_function = ScoringFunctionFactory(config.scoring_function)
File "/home/ubuntu/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/reinvent_scoring/scoring/scoring_function_factory.py", line 16, in __new__
return cls.create_scoring_function_instance(sf_parameters, scoring_function_registry)
File "/home/ubuntu/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/reinvent_scoring/scoring/scoring_function_factory.py", line 25, in create_scoring_function_instance
return scoring_function(parameters, sf_parameters.parallel)
File "/home/ubuntu/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/reinvent_scoring/scoring/function/custom_sum.py", line 13, in __init__
super().__init__(parameters, parallel)
File "/home/ubuntu/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/reinvent_scoring/scoring/function/base_scoring_function.py", line 62, in __init__
self.scoring_components = factory.create_score_components()
File "/home/ubuntu/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/score_component_factory.py", line 69, in create_score_components
return [self._current_components.get(p.component_type)(p) for p in self._parameters]
File "/home/ubuntu/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/score_component_factory.py", line 69, in <listcomp>
return [self._current_components.get(p.component_type)(p) for p in self._parameters]
File "/home/ubuntu/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 16, in __init__
self.activity_model = self._load_model(parameters)
File "/home/ubuntu/anaconda3/envs/reinvent.v3.0/lib/python3.7/site-packages/reinvent_scoring/scoring/score_components/standard/predictive_property_component.py", line 33, in _load_model
raise Exception(f"The loaded file {parameters.model_path} isn't a valid scikit-learn model")
Exception: The loaded file /home/ubuntu/repos/ReinventCommunity/notebooks/models/drd2.pkl isn't a valid scikit-learn model

I would really appreciate it if you could let me know what's causing this issue.
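"invalid load key, '\x00'" means pickle met a byte it cannot interpret as an opcode, so the file at model_path is probably not the pickle the loader expects (for example truncated, in a different format, or an incomplete download). A heuristic sketch for inspecting a file's first bytes; the categories are assumptions, not an exhaustive format detector, and `describe_model_file` is hypothetical:

```python
def describe_model_file(path) -> str:
    """Rough guess at what a supposed pickle file actually contains."""
    with open(path, "rb") as fh:
        head = fh.read(64)
    if not head:
        return "empty file"
    if head.startswith(b"version https://git-lfs"):
        return "Git LFS pointer text (the real file was not downloaded)"
    if head[0] == 0x80 and len(head) > 1:
        # Pickle protocol 2 and later start with the PROTO opcode and a version byte.
        return f"pickle, protocol {head[1]}"
    if head.startswith(b"\x00"):
        return "starts with a NUL byte, so not a pickle stream"
    if head.lstrip().startswith(b"<"):
        return "text starting with '<' (HTML/XML, e.g. a saved web page)"
    return "unrecognized header (possibly a protocol 0/1 pickle or another format)"
```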

configs/config.json absent from installation instructions

Hi,
When doing a run without having created configs/config.json, the following error occurs:

Traceback (most recent call last):
  File "/progs/all/opensource/reinvent/3.0/source/input.py", line 20, in <module>
    manager = Manager(configuration)
  File "/progs/all/opensource/reinvent/3.0/source/running_modes/manager.py", line 13, in __init__
    self._load_environmental_variables()
  File "/progs/all/opensource/reinvent/3.0/source/running_modes/manager.py", line 22, in _load_environmental_variables
    with open(os.path.join(project_root, '../configs/config.json'), 'r') as f:

It is mentioned in the README.md, but only in the Tests section, which can be misleading when installing Reinvent. Very minor, but I thought I should mention it.
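A minimal sketch of turning the bare traceback into an actionable message when the file is missing; `load_base_config` is hypothetical and only mirrors the `open(... 'configs/config.json' ...)` call shown above:

```python
import json
from pathlib import Path


def load_base_config(path) -> dict:
    """Load configs/config.json, failing with a setup hint if it is missing."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(
            f"{config_path} not found. Create configs/config.json before running; "
            "see the README."
        )
    with config_path.open("r", encoding="utf-8") as fh:
        return json.load(fh)
```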

Reorganize repository as python package

Hi, it would be nice to be able to pip install this code.

I would be happy to help with a PR or give advice if you're not sure how to do this.

error while running Lib-Invent

Hi, I got the following error while running the Jupyter notebook Lib-INVENT RL1 - QSAR.ipynb from https://github.com/MolecularAI/Lib-INVENT/tree/main/tutorial

I created the environment for Lib-INVENT and installed all the packages from https://github.com/MolecularAI/Lib-INVENT/blob/main/environment.yml

I have also attached the *.json file that I used.

Can you please help me out?

RL_input.json.zip

Traceback (most recent call last):
File "/media/medicina/ReinventCommunity/notebooks/Lib-INVENT/input.py", line 20, in <module>
manager.run()
File "/media/medicina/ReinventCommunity/notebooks/Lib-INVENT/running_modes/manager.py", line 81, in run
job()
File "/media/medicina/ReinventCommunity/notebooks/Lib-INVENT/running_modes/manager.py", line 51, in _reinforcement_learning
reinforcement_learning.run()
File "/media/medicina/ReinventCommunity/notebooks/Lib-INVENT/running_modes/reinforcement_learning/reinforcement_learning.py", line 43, in run
actor_nlls, critic_nlls, augmented_nlls = self._updating(sampled_sequences, score_summary.total_score)
File "/media/medicina/ReinventCommunity/notebooks/Lib-INVENT/running_modes/reinforcement_learning/reinforcement_learning.py", line 59, in _updating
actor_nlls, critic_nlls, augmented_nlls = self.learning_strategy.run(scaffold_batch, decorator_batch, score, actor_nlls)
AttributeError: 'NoneType' object has no attribute 'run'

Conda env create fails on Ubuntu 16.04

Hello, I am very happy to see your work. I tried to reproduce your results by running the conda env create -f reinvent_shared.yml command, following your environment installation instructions, but I got the following error:

Solving environment: failed

ResolvePackageNotFound: 
  - intel-openmp==2019.5=281

My computer system is Ubuntu 16.04, what is the problem and how should I solve it?
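intel-openmp==2019.5=281 pins an exact conda build (the part after the last `=`), and pinned builds are a frequent reason for ResolvePackageNotFound when the channel no longer offers that build for a platform. A sketch that strips build strings from the dependency lines of an environment file, assuming one `- name=version=build` spec per line; `strip_build_strings` is hypothetical, and loosening the pins may resolve to different builds than the authors tested:

```python
import re
from typing import List

# Matches spec lines such as "  - intel-openmp==2019.5=281" or
# "  - numpy=1.16.4=py37hde5b4d6_0" and captures everything before the build string.
_PINNED_BUILD = re.compile(r"^(\s*-\s*[A-Za-z0-9_.\-]+=?=[^=\s]+)=[^=\s]+\s*$")


def strip_build_strings(lines: List[str]) -> List[str]:
    """Drop the trailing build string from pinned conda specs; leave other lines as-is."""
    return [_PINNED_BUILD.sub(r"\1", line.rstrip("\n")) for line in lines]
```

Running `conda env export --no-builds` on a working environment produces a file without build strings in the first place.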

What is the meaning of data processing

I'm looking at your data processing tutorial, but I don't understand what the process is doing, and I think many people who use REINVENT need to understand it. Do you have documentation for the data processing, and could you explain why the data is processed this way?

A unit test is failing

unittest_reinvent/scoring_tests/scoring_components/test_predictive_property_component.py is failing. This is due to changes in the models/SMILES.
Also, the regression fixture functions are currently using the wrong models.

Reinforcement learning generates invalid molecules, causing an assertion error that makes the entire process fail

I have come across a frequent issue where the reinforcement learning process generates a molecule that RDKit considers invalid.
As a result, I sometimes get

an AssertionError from this place
or
a math domain error when I use the custom product, from this line.

Both issues terminate the entire process, even though they are caused by a single invalid molecule.

My questions are:

Is there any way to prevent generating molecules that RDKit considers invalid?
If an invalid molecule is generated, how can we remove it from the batch, or skip the step and move on to the next one, so that the entire process does not fail because of a single invalid molecule?
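On the math domain error: Python's `math.log` raises `ValueError: math domain error` for zero or negative input, so a product-style aggregate computed in log space fails as soon as one component score is 0 or undefined. A sketch of a weighted geometric mean that scores such molecules as 0.0 instead of aborting the batch; it is not Reinvent's custom product implementation, and representing an unparseable molecule as None or NaN is an assumption:

```python
import math
from typing import Optional, Sequence


def safe_weighted_product(scores: Sequence[Optional[float]], weights: Sequence[float]) -> float:
    """Weighted geometric mean that returns 0.0 instead of raising.

    Any missing (None/NaN) or non-positive component makes the result 0.0,
    so math.log is never called on a value <= 0.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    log_sum = 0.0
    for score, weight in zip(scores, weights):
        if score is None or math.isnan(score) or score <= 0.0:
            return 0.0
        log_sum += weight * math.log(score)
    return math.exp(log_sum / total_weight)
```

With a zero score the molecule simply gets no reward for that step, and the rest of the batch is scored normally.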

PyTorch 1.3.1 does not exist on the official website. How can I install PyTorch 1.3.1 separately?

Hello, I am running conda env create -f reinvent_shared.yml. Because of network problems, the PyTorch installation fails, so I wanted to run pip install pytorch==1.3.1 on its own, but pip reports that pytorch==1.3.1 cannot be found. I then checked the PyTorch official website and could not find version 1.3.1 there either, so I want to confirm whether the environment you provide contains a version error.

Complete Use Case in Reinvent Community

Hi, the Complete Use Case example appears to be missing the drd2.pkl file. When I used the old drd2.pkl file (from a download from a few days ago), the file raised an error, ModuleNotFoundError: No module named 'sklearn.ensemble.forest'. I believe this is because sklearn changed the name of this model to sklearn.ensemble._forest. Can you please provide an updated drd2.pkl file that works with the Use Case Example? Thanks.
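If the only problem is the renamed module, `pickle.Unpickler.find_class` can be overridden to redirect old module paths while loading. A sketch; the rename table contains just the mapping described in this issue, and since scikit-learn does not guarantee that pickles work across versions, an updated drd2.pkl remains the safer fix. `RenamingUnpickler` and `load_legacy_pickle` are hypothetical:

```python
import io
import pickle
from typing import Dict

# Module renames to apply while loading; this entry is the rename described above
# and should be checked against the installed scikit-learn.
LEGACY_MODULES: Dict[str, str] = {"sklearn.ensemble.forest": "sklearn.ensemble._forest"}


class RenamingUnpickler(pickle.Unpickler):
    def __init__(self, file, renames: Dict[str, str] = LEGACY_MODULES):
        super().__init__(file)
        self._renames = dict(renames)

    def find_class(self, module: str, name: str):
        # Redirect references to renamed modules before the normal lookup.
        return super().find_class(self._renames.get(module, module), name)


def load_legacy_pickle(path, renames: Dict[str, str] = LEGACY_MODULES):
    with open(path, "rb") as fh:
        return RenamingUnpickler(fh, renames).load()
```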

Add LICENSE

Technically, without a license nobody else can use this code, so it would be good to add one. If you're not familiar with licensing, this site is helpful: https://choosealicense.com/. The best ones for academic work, IMO, are MIT and Apache.

KeyError: '%10' when training a new model

Hi,

Thank you again for your code!
When I was training a new model, I encountered some errors.
I used the methods from ReinventCommunity to standardize the .smi file,
but the problem is not solved.
When training reached the fourth epoch, the problem occurred: KeyError: '%10'
The smi file is attached.(a txt file)

Thanks in advance for your help,

Sincerely
2.filtered.zip
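'%10' is the first two-digit ring-closure label. With SMILES writers that reuse the lowest free label, such as RDKit, it typically appears only when ten or more rings are open at the same point in the string. Since the KeyError shows that this label is not in the vocabulary, one option is to find and drop such molecules before training. A sketch; `max_ring_closure_label` is hypothetical and works on the raw SMILES text:

```python
import re

_BRACKET_ATOM = re.compile(r"\[[^\]]*\]")
_RING_LABEL = re.compile(r"%(\d{2})|(\d)")


def max_ring_closure_label(smiles: str) -> int:
    """Largest ring-closure label in a SMILES string (0 if it has none).

    Bracket atoms are removed first, so digits inside them (isotopes, H counts,
    charges such as [13CH3] or [NH3+]) are not mistaken for ring closures.
    """
    stripped = _BRACKET_ATOM.sub("", smiles)
    labels = [int(two or one) for two, one in _RING_LABEL.findall(stripped)]
    return max(labels, default=0)
```

For example, `[s for s in smiles_list if max_ring_closure_label(s) <= 9]` keeps only molecules whose ring closures stay within single digits.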

Will more scoring functions be added in the future?

Hi there, I've used Reinvent in recent days and generated some interesting molecules. Thanks a lot for your efforts!
However, I've found that the scoring functions are still limited and more scoring functions are desired, for example, logP and H-Acceptors. I would appreciate it if you could add more frequently used scoring functions in the near future.

Can't run on RTX 30 series GPU

This repo needs cudatoolkit=10.1.243, but RTX 30 series GPUs need CUDA >= 11.1.
If I create the env from reinvent_shared.v2.1, it just cannot run.
If I create an env with pytorch=1.8.0 and cudatoolkit=11.1.74, it throws the following error:

Traceback (most recent call last):
  File "/home/user/fyr/Reinvent//input.py", line 25, in <module>
    main()
  File "/home/user/fyr/Reinvent//input.py", line 21, in main
    manager.run()
  File "/data/user/Reinvent/running_modes/manager.py", line 95, in run
    job()
  File "/data/user/Reinvent/running_modes/manager.py", line 41, in _run_transfer_learning
    runner.run()
  File "/data/user/Reinvent/running_modes/transfer_learning/transfer_learning_runner.py", line 29, in run
    self._train_epoch(epoch, self._config.input_smiles_path)
  File "/data/user/Reinvent/running_modes/transfer_learning/transfer_learning_runner.py", line 43, in _train_epoch
    loss.backward()
  File "/home/user/anaconda3/envs/reinvent_shared.v2.1/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/user/anaconda3/envs/reinvent_shared.v2.1/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

How can I make this repo work on 30 series GPU?
Thank you very much!

Missing DockStream integration in reinvent_scoring

Dear Developer,

Thanks for sharing this really nice AI drug discovery package!
I think reinvent_scoring needs to be modified to integrate DockStream with Reinvent 3.0:

reinvent_scoring/scoring/score_components/score_component_factory.py

# import DockStream from reinvent_scoring.scoring.score_components
from reinvent_scoring.scoring.score_components import DockStream

# and the following entry should be added to _deafult_scoring_component_registry:
enum.DOCKSTREAM: DockStream

Thanks,

Taka
