
gflownet's People

Contributors

bengioe, mj10


gflownet's Issues

Generated Docking Candidates

Dear Authors,

I am developing a model that uses your dataset of docking energies. We realized that the docking energy from AutoDock Vina can fluctuate significantly depending on the optimized input molecule geometry, the random seed, and the hyperparameters chosen for the docking process, which makes it hard for us to reproduce the set of rewards/energies in your dataset.
Since this verification method can be unstable, we would instead like to further study your model by looking at some of the top candidates generated by the GFlowNet trained on this dataset.

In the paper, you described that $10^{6}$ candidates were generated and the top-1000 were analyzed in terms of their average reward.
Could you please provide the $10^{6}$ generated candidates, or just the top 1000, along with their scores?
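
In case it helps to reproduce the statistic on our side, this is roughly the computation we have in mind once the candidates are available (the file name and column names below are placeholders, not something from your repository):

import pandas as pd

# Hypothetical CSV with one row per generated candidate;
# 'smiles' and 'reward' are assumed column names.
df = pd.read_csv('candidates.csv')
top1000 = df.nlargest(1000, 'reward')
print('mean reward of top-1000:', top1000['reward'].mean())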

Thank you very much!
Haote

Difficult to reproduce results listed in the paper

Thank you for your fantastic work!

However, I am having trouble reproducing the results in your paper. Using your released code and the method described in the paper, I obtain much worse results.

Could you please share more details on your implementation for molecule generation? Thank you so much.

Error in Comment Line

Hello Authors,
There is a typo on line 269 of toy_grid_dag.py where 'the' has been used twice. I just stumbled upon it and thought I would point it out.

Error: Tensors used as indices must be long, byte or bool tensors

Dear authors, thanks for sharing the code for this wonderful work!

I am currently trying to run the naive GFlowNet training code in the molecular docking setting by running
python gflownet.py
under the mols directory. I have unzipped the datasets and installed all requirements, and I have successfully run the model in the toy grid environment.

However, I got this error when I run in the mols environment:

Exception while sampling:
tensors used as indices must be long, byte or bool tensors

Looking further, the problem seems to occur around line 70 in model_block.py. I tried printing out stem_block_batch_idx, but it does not look like something that can be converted to long type directly, which is what an index requires:

tensor([[-8.4156e-02, -4.2767e-02, -7.2483e-02, -3.3011e-02, -1.1865e-02,
         ...,
         4.5579e-02, -3.6638e-02,  8.1013e-03, -5.6014e-02,  1.5187e-02,
        -6.5561e-02]], device='cuda:0', dtype=torch.float64,
       grad_fn=<...>)

(output truncated; the full printout is a 1 x 256 tensor of small float values)

I wonder if I am running the code in the correct way. Is this index correct and if so, do you know what's happening?
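
For context, here is a minimal standalone reproduction of the error and the usual workaround when the offending tensor genuinely holds integer positions (this is a generic PyTorch sketch, not the fix for this repository; in my case the printed tensor looks like embedding values rather than indices, so the wrong variable may be reaching the indexing line):

import torch

x = torch.randn(5, 3)
idx = torch.tensor([0.0, 2.0, 4.0])  # float tensor used as an index
# x[idx]               # raises: tensors used as indices must be long, byte or bool tensors
print(x[idx.long()])   # casting to long works when the values are integral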

help with running gflownet.py

I've just started with GFlowNets and wanted to run the code with the default parameter values; however, I keep getting "Exception while sampling" and, ultimately, an AttributeError from torch_geometric. Where could I be going wrong?
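
In case it is a version mismatch, a quick way to check which packages are actually being imported (a generic sketch, not tied to this repository's pinned requirements):

import torch
import torch_geometric

print('torch:', torch.__version__)
print('torch_geometric:', torch_geometric.__version__)
print('CUDA available:', torch.cuda.is_available())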

Potential bug with `FlowNetAgent.sample_many`

Hi there!

Thanks for sharing the code and just wanted to say I've enjoyed your paper. I was reading your code and noticed that there might be a subtle bug in the grid-env dag script. I might also have read it wrong...

https://github.com/bengioe/gflownet/blob/dddfbc522255faa5d6a76249633c94a54962cbcb/grid/toy_grid_dag.py#L316-L320

On line 316, we zip two things: zip([e for d, e in zip(done, self.envs) if not d], acts)

Here done is a vector of bools of length batch-size, self.envs is a list of GridEnv of length n-envs or buffer-size, and acts is a vector of ints of length (n-envs or buffer-size,).

By default, all the lengths of the above objects should be 16.

I was reading through the code and noticed that if any of the elements in done are True, then on line 316 we filter them out with if not d. If env[0] were "done", we would have a list of 15 envs, basically self.envs[1:]. Then, when we zip the actions with this shorter list of envs, the actions are aligned incorrectly: we effectively end up with self.envs[1:] being paired with actions act[:-1]. As a result, step is now length 15, and on the next line we again pair the incorrectly aligned actions of length 16 with our step list of length 15.

Perhaps we need to filter act based on the done vector as well, e.g. act = act[~done] (keeping only the not-done entries) after line 316?
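
Here is a tiny standalone sketch of the misalignment I mean, with toy names rather than the repository's code, plus the kind of masking I have in mind:

done = [True, False, False]   # env 0 has finished
envs = ['env0', 'env1', 'env2']
acts = [10, 11, 12]           # acts[i] is intended for envs[i]

# Filtering envs but not acts shifts the pairing by one:
list(zip([e for d, e in zip(done, envs) if not d], acts))
# -> [('env1', 10), ('env2', 11)]

# Applying the same not-done mask to acts keeps the pairing aligned:
list(zip([e for d, e in zip(done, envs) if not d],
         [a for d, a in zip(done, acts) if not d]))
# -> [('env1', 11), ('env2', 12)]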

Maybe I've got this wrong, so apologies for the noise if that's the case, but thought I'd leave a note in case what I'm suggesting is the case.

All the best!

Clarification regarding the number of molecular building blocks: why is it different from JT-VAE?

Hello,

First, I really enjoyed reading the paper. Amazing work!

I have a question regarding the number of building blocks used for generating small molecules. Appendix A.3 of the paper states that there are a total of 105 unique building blocks (after accounting for different attachment points) and that they were obtained by the process suggested in the JT-VAE paper (Jin et al. (2020)). However, in the JT-VAE paper, the total vocabulary size is $|\chi| = 780$, obtained from the same ZINC dataset. My understanding is that these two vocabularies should be the same. If that is correct, why is the number of building blocks different here? What am I missing? If they are not the same, could you please explain the difference?

Thank you so much for your help

Want to know the detailed preparation process of dataset

Hi, I want to use GFlowNet for another protein pocket. I have a dataset of SMILES and docking scores, but I'm not sure about the rest of the dataset preparation process. For example, if you curate the results of the BRICS algorithm, how do you handle blocks that do not appear in the block dictionary? And do you have a script for generating "jbonds" and "stem_idx"? I'd appreciate it if you could provide one! Thanks!
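
For reference, this is the minimal RDKit BRICS step I have been starting from (my own sketch, not your preprocessing pipeline; mapping the fragments onto your block dictionary and building "jbonds"/"stem_idx" would still come on top of this):

from rdkit import Chem
from rdkit.Chem import BRICS

# Example molecule (aspirin); replace with your own SMILES.
mol = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')

# Decompose into BRICS fragments; dummy atoms mark the attachment points.
for frag in sorted(BRICS.BRICSDecompose(mol)):
    print(frag)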

About Reproducibility Issues

Hi there,

Thank you very much for sharing the source code.

For reproducibility, I modified the code as follows:

self.train_rng = np.random.RandomState(int(time.time()))

---> self.train_rng = np.random.RandomState(142857)

and added

torch.manual_seed(142857)
torch.cuda.manual_seed(142857)
torch.cuda.manual_seed_all(142857)

However, I encountered an issue. I ran it more than three times with the same random seed, but the results are never identical (although they are close). I didn't modify anything else, except for addressing package compatibility issues.

0 [1152.62, 112.939, 23.232]
100 [460.257, 44.253, 17.728]
200 [68.114, 6.007, 8.045]

0 [1151.024, 112.603, 24.993]
100 [471.219, 45.525, 15.964]
200 [66.349, 6.174, 4.607]

0 [1263.066, 124.094, 22.128]
100 [467.747, 44.899, 18.76]
200 [61.992, 5.715, 4.841]

I am wondering whether you encountered such an issue before.
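
For reference, these are the additional switches I am aware of for making PyTorch runs closer to deterministic (generic PyTorch settings, not something specific to this repository; if the training loop samples with multiple processes or threads, bit-exact reproducibility may not be achievable regardless):

import os
import torch

os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'  # needed for deterministic cuBLAS on CUDA >= 10.2
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
torch.use_deterministic_algorithms(True)  # errors out on nondeterministic ops (PyTorch >= 1.8)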

Best,

Dong

seh_chembl.csv

Does the seh_chembl.csv file contain the output of the GFlowNet model? I am just checking that I am interpreting the CSV correctly.

Clarification for similarity measures

Dear Authors,

It was exciting to read about your methods and results for molecule generation applications! I have a few questions below and hope that you can provide more details.

In the article, you mentioned that the top 1000 molecules' mean pairwise Tanimoto similarities were calculated.

  1. Could you please specify the type of fingerprint(s) that are used for the Tanimoto similarity calculation?

  2. In addition, does "mean pairwise similarity" mean taking each molecule in the set, calculating its similarity to every other molecule in the set, repeating this procedure for every molecule, and then taking the mean of all those means?

I wasn't able to locate any code or comments relating to this part specifically. I would greatly appreciate it if you could point me to the relevant code as well!
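
For concreteness, this is the kind of computation I am assuming; the fingerprint choice (Morgan, radius 2, 2048 bits) is my guess and not something stated in the paper:

from itertools import combinations
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

smiles = ['CCO', 'CCN', 'c1ccccc1O']  # placeholder molecules
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]

# Mean over all unordered pairs; this equals the mean of the per-molecule means
# when every molecule is compared with every other one.
sims = [DataStructs.TanimotoSimilarity(a, b) for a, b in combinations(fps, 2)]
print(sum(sims) / len(sims))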

Regards,

Haote

About handling parent molecules in samples list

Hello,

In mols/gflownet.py, I noticed that you include the parents of m in the samples list in _get_sample_model() (lines 197 and 203).

However, when we sample a null action (action == 0), we only add m and not its parents (line 182). Is this a mistake, or am I missing something? Why is this case treated differently?

Thank you

About the implementation of the advantage function in PPOAgent

I find that the implementation in PPOAgent at line 514 of grid/toy_grid_dag.py, adv = r + vsp * (1-d) - vs, is only the delta (TD-residual) term from the original PPO paper; it is not the full advantage estimate (e.g., generalized advantage estimation).

Am I misunderstanding your code, or PPO?
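
For comparison, this is the generic GAE computation I had in mind (my own sketch of the standard estimator, not code from this repository; gamma and lam are just typical defaults):

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    # values must contain one extra bootstrap entry: len(values) == len(rewards) + 1
    adv = [0.0] * len(rewards)
    last = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] * (1 - dones[t]) - values[t]
        last = delta + gamma * lam * (1 - dones[t]) * last
        adv[t] = last
    return adv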
