compsciencelab / torchmd-exp
Implementation of Differentiable Molecular Simulations with torchMD.
Currently, we do not log anything beyond the losses and model checkpoints. It would be nice to set up a logger that can print everything to a file in the logdir. This includes, but is not limited to:
The idea would be to have a logging object set up and accessible from all modules, easily allowing calls like log.info('some info'), which records where the call was made from along with the corresponding information.
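A minimal sketch of such a shared logger, built on the standard library's logging module (the function name and log-file name are illustrative, not part of torchmd-exp):

```python
import logging
import os

def get_logger(log_dir, name="torchmd-exp"):
    """Return a shared logger that writes to <log_dir>/train.log.

    The %(module)s and %(lineno)d fields record where each call was made
    from, so any module can simply do get_logger(d).info('some info').
    """
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        logger.setLevel(logging.INFO)
        os.makedirs(log_dir, exist_ok=True)
        handler = logging.FileHandler(os.path.join(log_dir, "train.log"))
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(module)s:%(lineno)d %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Because logging.getLogger returns the same object for the same name, every module that calls get_logger with the same name shares one logger and one file handler.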
Bug found when training with several molecules and sim_batch_size > 1. The first epoch works fine, but after that there is a chance that the states of one molecule are sent together with the embeddings of another in the self.local_we_worker.compute_gradients(...) call here.
I am currently trying to find the error. My guess is that there is some random sampling at some point that affects the states but not the embeddings, so the embeddings are not updated to match the new ordering before being sent.
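A hypothetical minimal reproduction of the suspected mismatch (the variable names are illustrative, not the actual torchmd-exp data structures): a random permutation is applied to the states but not to the embeddings, so the pairs sent to the worker no longer correspond.

```python
import random

# Paired data: states[i] and embeddings[i] belong to the same molecule.
states = ["state_A", "state_B", "state_C"]
embeddings = ["emb_A", "emb_B", "emb_C"]

# Suspected buggy pattern: a random permutation applied to only one list.
idx = random.sample(range(len(states)), len(states))
shuffled_states = [states[i] for i in idx]
# embeddings NOT reindexed here -> pairs may no longer match.

# Fix: apply the same permutation to both lists before sending them.
shuffled_embeddings = [embeddings[i] for i in idx]
assert all(s[-1] == e[-1] for s, e in zip(shuffled_states, shuffled_embeddings))
```

The fix is simply to carry the same index permutation through every per-molecule array before handing them to compute_gradients.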
Hi! Thank you for the amazing paper and for open-sourcing the code!
I was trying to reproduce the training process and encountered a few difficulties.
For some reason, the given datasets, when loaded, have all the grain masses equal to zero:
This, obviously, breaks the program when dividing by them.
I tried setting the masses to None for each molecule, since in that case torchmd guesses the masses during Parameters building. However, it sets them all to 12 (which, I guess, corresponds to C-alpha atoms). This is probably not the way it is supposed to work, since it implies the wrong physics, and the training loss gets stuck around 2.8:
After that, I mapped the resnames to the known amino-acid masses. This indeed helped: the train loss started from 2 and decreased to 1 (and is still slowly on the way down):
However, the training curve looks nothing like the one in the example notebook, where it starts from around 5 and drops to almost zero.
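For reference, the resname-to-mass workaround can be sketched as below, using average amino-acid residue masses in daltons (monomer minus water); the helper function is illustrative, not part of the torchmd-exp API:

```python
# Average amino-acid residue masses (Da). Standard reference values.
AA_MASSES = {
    "GLY": 57.05, "ALA": 71.08, "SER": 87.08, "PRO": 97.12, "VAL": 99.13,
    "THR": 101.10, "CYS": 103.14, "LEU": 113.16, "ILE": 113.16, "ASN": 114.10,
    "ASP": 115.09, "GLN": 128.13, "LYS": 128.17, "GLU": 129.12, "MET": 131.19,
    "HIS": 137.14, "PHE": 147.18, "ARG": 156.19, "TYR": 163.18, "TRP": 186.21,
}

def masses_from_resnames(resnames):
    """Map residue names to per-bead masses for a CA-based CG model."""
    return [AA_MASSES[r.upper()] for r in resnames]
```

Each coarse-grained bead then carries the mass of its whole residue rather than the uniform 12 Da of a single carbon atom.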
I am using train_ff.yaml with only "log_dir" and "device" modified.
Do you have an idea of what might be wrong?
From what I can see, the input.yaml of the newly trained model differs from the one in data/models/fastfolders, particularly in fields such as max_num_neighbors and some others, so my next step would be to try using the same values, I guess.
I would much appreciate it if you could help me with replicating the results. I am eager to use the trajectory reweighting method with other CG potentials and slightly extended CG systems, and I really hope your implementation will help me a lot with that.
P.S. In order to launch the training I also had to rebuild the environment (the provided environment.yaml doesn't work: it is missing certain packages and contains others that conflict with each other) and add a "timestep" key to the logger. I can make a PR with the environment.yaml that worked for me.
Edit: The run with the mapped AA masses eventually reached near-zero loss after 5k steps. Still, it would be great to make the optimisation faster, like in the example notebook.
Right now creating the learner instance is something like:
learner = Learner(scheme, steps, output_period, train_names=train_names, log_dir=args.log_dir,
keys = ('epoch', 'level', 'steps', 'train_loss', 'val_loss', 'loss_1', 'loss_2', 'val_loss_1', 'val_loss_2'))
The problem is that the keys argument requires some of these keys to be present, otherwise there will be an error when the logger writes to the file. Also, the given keys will be written even though they may not be used in that specific training.
The idea would be to pass to the Learner only the keys that the user really wants to write, e.g. keys=('epoch', 'level', 'steps', 'train_loss', 'val_loss'), and, although the results dict may contain other keys, the logger should only write the ones passed (plus the train_names losses, if given) without throwing an error.
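A sketch of the desired behaviour, assuming the logger writes CSV-style rows (the write_row helper and its signature are illustrative, not the actual Learner internals): only the requested keys are written, extras in the results dict are ignored, and missing keys produce an empty cell instead of an error.

```python
import csv
import io

def write_row(stream, keys, results):
    """Write one results row, restricted to `keys`.

    Extra keys in `results` are dropped; keys absent from `results`
    are written as empty cells rather than raising a KeyError.
    """
    writer = csv.DictWriter(stream, fieldnames=keys)
    if stream.tell() == 0:  # first row: emit the header
        writer.writeheader()
    writer.writerow({k: results.get(k, "") for k in keys})

buf = io.StringIO()
keys = ("epoch", "level", "steps", "train_loss", "val_loss")
results = {"epoch": 1, "steps": 100, "train_loss": 2.8, "loss_1": 0.5}
write_row(buf, keys, results)  # loss_1 is silently ignored
```

With this pattern the user controls exactly which columns appear, and the Learner never fails just because a key is unused in a particular training.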
The random indices used in the item_getter are sampled with replacement and can produce duplicates.
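Assuming item_getter draws integer indices into the dataset, the fix is to sample without replacement; a minimal sketch with the standard library (random.sample guarantees unique indices, unlike repeated randrange calls):

```python
import random

n_items, batch_size = 10, 4

# With replacement (current behaviour): duplicates are possible.
with_replacement = [random.randrange(n_items) for _ in range(batch_size)]

# Without replacement: random.sample guarantees unique indices.
without_replacement = random.sample(range(n_items), batch_size)
assert len(set(without_replacement)) == batch_size
```

In torch the equivalent would be torch.randperm(n_items)[:batch_size] instead of torch.randint.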
Currently torchmd-exp is not reproducible because there are various factors that use RNGs we are not seeding.
Torch: Torch is seeded manually at the beginning of the train scripts, so it should not be a problem.
Python random: The standard random library is used to produce the randomness in ProteinDataset.shuffle(), used in both folding and docking. At least in the case of docking, this was a source of non-reproducibility, since the order of the batching depended on it. Also, when the number of molecules is not divisible by the batch size, a random sample of the molecules is used to fill the last batch, which is also determined by the standard random library.
This is fixed by adding random.seed(args.seed) to the train scripts.
Numpy random: In the functions that add noise, both for folding and for docking, we use np.random.normal(...). This is another source of non-reproducibility, because the initial structures for the training will differ.
This is fixed by adding np.random.seed(args.seed) to the train scripts.
TorchMD randomness: Simulations use random sampling of velocities with torch.randn(...). Therefore, I think that if we add seeding for the standard Python random and for numpy, this should already be reproducible, because we already seed torch.
The questions are: do we want a torchmdexp.utils.init(args.seed) that initializes and seeds everything? Or we could replace the Python and numpy randomness and use only torch.