torchmd-exp's People

Contributors

11carlesnavarro, eloisanchez

torchmd-exp's Issues

Set up extensive logging system to file

Summary

Currently, we do not log anything beyond the losses and model checkpoints. I think it would be nice to set up a logger that could be configured to write everything to a file in the logdir. This includes, but is not limited to:

  • The epoch and batch information for each batch.
  • Which module or function the code is in at each moment.
  • Intermediate values of interest, or checks that indicate everything is working fine (e.g. gradient values, weight values, energies for the reweighting...)

TODO

The idea would be to have a logging object, set up once and accessible from all modules, that allows calls like log.info('some info') and records where each call is made from along with the corresponding information.
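
A minimal sketch of such a setup, using Python's standard logging module (the logger name "torchmdexp" and the file name train.log are placeholders, not existing code):

    import logging
    import os

    def setup_logger(log_dir, level=logging.INFO):
        """Configure a shared logger that writes to a file in log_dir.

        The format records where each call is made from (module, function
        and line number) along with the message itself.
        """
        logger = logging.getLogger("torchmdexp")
        logger.setLevel(level)
        handler = logging.FileHandler(os.path.join(log_dir, "train.log"))
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(module)s.%(funcName)s:%(lineno)d %(message)s"
        ))
        logger.addHandler(handler)
        return logger

Any module could then do log = logging.getLogger("torchmdexp") and call log.info('some info'); the module, function and line number of the call would be recorded automatically by the formatter.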

Embeddings not properly passed to compute gradients

Bug found when training with several molecules and sim_batch_size > 1. The first epoch works fine, but after that there is a chance that the states of one molecule are sent together with the embeddings of another in the call to self.local_we_worker.compute_gradients(...).

I am currently trying to find the error. My guess is that there is some random sampling at some point that affects the states but not the embeddings, so the embeddings are not updated to match the new ordering before being sent; see the sketch below.
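
If that is the cause, one fix would be to draw a single permutation and apply it to both tensors. A minimal sketch, where states and embeddings are stand-ins for whatever the worker actually batches:

    import torch

    states = torch.randn(4, 8)    # stand-in for per-molecule states
    embeddings = torch.arange(4)  # stand-in for per-molecule embeddings

    perm = torch.randperm(states.shape[0])  # ONE shared permutation
    states = states[perm]
    embeddings = embeddings[perm]  # same perm, so states[i] keeps its embeddings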

Struggle with replicating the training curve

Hi! Thank you for the amazing paper and for open-sourcing the code!

I was trying to reproduce the training process and encountered a few difficulties.
For some reason, the given datasets, when loaded, have all the bead masses equal to zero:
[Screenshots: the loaded datasets show all masses equal to zero]
This obviously breaks the program as soon as anything is divided by the masses.

I tried setting the masses to None for each molecule, since in that case torchmd guesses them during Parameters building. However, it sets them all to 12 (which, I guess, corresponds to C-alpha atoms). This is probably not how it is supposed to be, since it implies the wrong physics, and the training loss gets stuck around 2.8:
[Screenshot: training loss stuck around 2.8]

After that, I mapped the resnames to the known amino-acid masses. This indeed improved things: the training loss started from 2 and decreased to 1 (and was still slowly going down):
[Screenshot: training loss decreasing from about 2 to 1]
However, the training curve looks nothing like the one in the example notebook, where it starts from around 5 and drops to almost zero. The mass mapping I used is sketched below.
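
For reference, the mapping is essentially the standard average amino-acid residue masses; a sketch (values in Da; my exact numbers may have differed slightly):

    # Standard average residue masses in Da, keyed by resname.
    AA_MASSES = {
        "GLY": 57.05, "ALA": 71.08, "SER": 87.08, "PRO": 97.12, "VAL": 99.13,
        "THR": 101.10, "CYS": 103.14, "LEU": 113.16, "ILE": 113.16,
        "ASN": 114.10, "ASP": 115.09, "GLN": 128.13, "LYS": 128.17,
        "GLU": 129.12, "MET": 131.19, "HIS": 137.14, "PHE": 147.18,
        "ARG": 156.19, "TYR": 163.18, "TRP": 186.21,
    }

    def assign_masses(resnames):
        """One mass per CG bead, looked up from its residue name."""
        return [AA_MASSES[r] for r in resnames]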

I am using train_ff.yaml with only "log_dir" and "device" modified.

Do you have an idea of what might be wrong?
From what I can see, the input.yaml of the newly trained model differs from the one in data/models/fastfolders, particularly in fields such as max_num_neighbors and some others, so my next step would be to try using the same values, I guess.

I would much appreciate it if you could help me replicate the results. I am eager to use the trajectory reweighting method with other CG potentials and slightly extended CG systems, and I really hope your implementation will help me a lot with that.

P.S. In order to launch the training I also had to rebuild the environment (the provided environment.yaml does not work out of the box: it is missing certain packages, and some packages conflict with each other) and add a "timestep" key to the logger. I can make a PR with the environment.yaml that worked for me.

Edit: The run with the mapped AA masses eventually approached zero after 5k steps. It would still be great to make the optimisation faster, as in the example notebook.

Modify Learner/Logger so that keys that appear in results_dict but were not entered don't raise an error

Right now, creating a Learner instance looks something like this:

learner = Learner(scheme, steps, output_period, train_names=train_names, log_dir=args.log_dir,
                  keys=('epoch', 'level', 'steps', 'train_loss', 'val_loss',
                        'loss_1', 'loss_2', 'val_loss_1', 'val_loss_2'))

The problem is that the keys argument requires some of these keys to be present, otherwise there will be an error when the logger writes to file. Also, all the given keys will be written even though they may not be used in that specific training.

The idea would be to pass to the Learner only the keys that the user really wants to write, e.g. keys=('epoch', 'level', 'steps', 'train_loss', 'val_loss'), and, even though the results dict has other keys, the logger should only write the ones passed (plus the train_names losses, if given) without throwing an error; see the sketch below.
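
A minimal sketch of the desired behavior (the function and the per-name loss key format are assumptions, not the actual Learner internals):

    def filter_results(results_dict, keys, train_names=()):
        # Assumed naming scheme "<name>_loss" for the per-train_name losses.
        wanted = list(keys) + [f"{name}_loss" for name in train_names]
        # Write only the requested keys; silently skip anything missing
        # instead of raising.
        return {k: results_dict[k] for k in wanted if k in results_dict}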

Reproducibility of torchmdexp

Currently torchmdexp is not reproducible because several components use RNGs that we are not seeding.

Torch: Torch is seeded manually at the beginning of the train scripts, so it should not be a problem.

Python random: The standard random library produces the randomness in ProteinDataset.shuffle(), used in both folding and docking. At least in the case of docking, this was a source of non-reproducibility, since the batching order depended on it. Also, when the number of molecules is not divisible by the batch size, a random sample of the molecules is used to fill the last batch, which is also determined by the standard random library.

This is fixed by adding random.seed(args.seed) to the train scripts.

Numpy random: In the functions that add noise, both for folding and for docking, we use np.random.normal(...). This is another source of non-reproducibility, because the initial structures for the training will differ between runs.

This is fixed by adding np.random.seed(args.seed) to the train scripts.

TorchMD randomness: Simulations sample velocities randomly with torch.randn(...). Therefore, I think that if we add seeding for the standard Python random and for numpy, everything should already be reproducible, because we already seed torch.


The questions are:

  1. Are there other sources of randomness that I have not considered here?
  2. Do we want to allow for 100% reproducibility?
  3. If so, we could make something like torchmdexp.utils.init(args.seed) that initializes and seeds everything (sketched below). Or we could replace the Python and numpy randomness and use only torch.
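
A minimal sketch of what option 3 could look like (torchmdexp.utils.init does not exist yet; the function below is the proposal):

    import random

    import numpy as np
    import torch

    def init(seed: int) -> None:
        """Seed every RNG used during training from a single entry point."""
        random.seed(seed)        # ProteinDataset.shuffle() and last-batch filling
        np.random.seed(seed)     # noise functions using np.random.normal(...)
        torch.manual_seed(seed)  # model init and torch.randn velocity sampling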
