Comments (8)
Hi Florian,
Most of the slow parts should be parallelized (in particular, the main training loop should be run on a GPU with CUDA; it'll be very, very slow on a CPU). Is there a particular part that's being very slow for you?
Kevin
from chemprop.
OK, that is odd.
For a batch size of 256 I get around 1 it/s, and my 1080 Ti hovers around 10% utilization.
Because my dataset has around 4 million samples, you can imagine that I wait quite a while for an epoch to finish. :-)
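For reference, those figures do imply an epoch time of roughly four hours. A quick back-of-the-envelope check (dataset size and throughput taken from the comment above):

```python
# Back-of-the-envelope epoch time from the figures quoted above.
n_samples = 4_000_000   # approximate dataset size
batch_size = 256
it_per_sec = 1.0        # observed throughput (iterations per second)

iterations = n_samples / batch_size      # ~15625 batches per epoch
hours = iterations / it_per_sec / 3600   # ~4.3 hours per epoch
print(f"{iterations:.0f} iterations, ~{hours:.1f} h per epoch")
```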
If you say the slow parts are already parallelized, then the error has to be on my side.
Do you have an idea what the problem could be?
Ah, we've never run it on anything quite so big. I guess that would explain why it's taking forever. So it takes ~4 hr per epoch?
Our code caches a lot of the computation during the first epoch, though, so the first epoch is the slowest epoch; subsequent epochs should be roughly 4x faster. (Though the cache for a dataset of that size would use something like half a terabyte of RAM... so if you end up having trouble with memory you can chunk your dataset using the --num_chunks option, which also turns off the caching.)
We may also look into parallelizing some of the CPU computation that happens with each batch, if you're still running into trouble; just let us know. (We haven't done this parallelization yet because we usually just cache that computation during the first epoch.)
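The first-epoch caching described above can be sketched as a simple memo table keyed by the input molecule. This is a minimal illustration with a hypothetical `featurize` stand-in, not chemprop's actual code:

```python
# Minimal sketch of first-epoch caching: featurization runs once per
# unique molecule, and later epochs hit the cache instead of recomputing.
# `featurize` is a hypothetical stand-in for the real mol2graph step.

cache = {}
calls = 0

def featurize(smiles):
    global calls
    calls += 1
    return f"graph({smiles})"        # placeholder for the real graph object

def featurize_cached(smiles):
    if smiles not in cache:          # miss: compute and store (epoch 1)
        cache[smiles] = featurize(smiles)
    return cache[smiles]             # hit: nearly free on later epochs

dataset = ["CCO", "c1ccccc1", "CCO"]  # note the duplicate molecule
for epoch in range(3):
    batch = [featurize_cached(s) for s in dataset]
```

The memory cost Kevin mentions comes from `cache` holding one graph object per unique molecule, which is why a 4-million-sample dataset can need hundreds of gigabytes.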
> Ah, we've never run it on anything quite so big. I guess that would explain why it's taking forever. So it takes ~4 hr per epoch?
Yeah, roughly; it's more in the three-hour range.
> Our code caches a lot of the computation during the first epoch, though, so the first epoch is the slowest epoch; subsequent epochs should be roughly 4x faster. (Though the cache for a dataset of that size would use something like half a terabyte of RAM... so if you end up having trouble with memory you can chunk your dataset using the --num_chunks option, which also turns off the caching.)
I should have said that I already had to turn off the caching (unfortunately), because the dataset was using up all my precious memory.
> We may also look into parallelizing some of the CPU computation that happens with each batch, if you're still running into trouble; just let us know. (We haven't done this parallelization yet because we usually just cache that computation during the first epoch.)
I also started to look into it:
You call the featurization step (mol2graph) directly in the arguments of the encoder's forward step (e.g. for the MPN it's in mpn.py, line 335).
For parallelization, wouldn't it be better to call this somewhere earlier?
Do you know a better place to start looking for good parallelization opportunities?
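The restructuring suggested above, lifting featurization out of the forward pass so the CPU work can be batched and parallelized ahead of the model, might look roughly like this (function and variable names are illustrative, not chemprop's API):

```python
# Sketch: featurize batches ahead of the forward pass instead of inside it,
# so the CPU-bound work can run in parallel.  `mol2graph_stub` is a
# stand-in for the real mol2graph featurization.
from concurrent.futures import ThreadPoolExecutor

def mol2graph_stub(smiles_batch):
    return [f"graph({s})" for s in smiles_batch]

batches = [["CCO", "CCN"], ["c1ccccc1"], ["CC(=O)O"]]

# Before: the encoder's forward() calls mol2graph on each batch itself.
# After: featurize all batches up front (here with a thread pool), then
# feed the precomputed graphs to the model.
with ThreadPoolExecutor(max_workers=4) as pool:
    featurized = list(pool.map(mol2graph_stub, batches))

for graphs in featurized:
    pass  # model.forward(graphs) would consume the precomputed graphs here
```

`pool.map` preserves batch order, so the training loop sees batches in the same sequence as before.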
Yeah, I believe the mol2graph step is the slowest CPU-based step based on some profiling tests we've run in the past, so that's probably the best place to start. We can look into parallelizing it too.
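A quick way to reproduce that kind of profiling on your own run is Python's built-in cProfile; the sketch below profiles a stub, not chemprop itself, but the same pattern applied around the training loop would show where the CPU time goes:

```python
# Sketch: profile a training-step stub with cProfile to see which
# CPU-side function dominates.  `featurize_stub` plays the role of
# the mol2graph featurization step.
import cProfile
import io
import pstats

def featurize_stub(n):
    return [i * i for i in range(n)]

def train_step_stub():
    for _ in range(50):
        featurize_stub(10_000)

profiler = cProfile.Profile()
profiler.enable()
train_step_stub()
profiler.disable()

# Print the five most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
```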
Hi Florian,
Try pulling the master branch again. You can use the new option --no_cache to turn off the caching without having to hack it, and you can use the new option --parallel_featurization to do the CPU-based featurization asynchronously with the model (which will probably become the default in the near future). We observed a ~75% speedup compared to the previous version with the cache turned off, running with this option on a dataset of about 100k (this was rather surprising to us too; even though we typically cache from the second epoch onwards, it seems the featurization was still taking more time than we thought). If you find that it's using too much RAM, you can decrease the value of the flag --batches_per_queue_group, which should cause only a small performance hit. Hope this helps! And please let us know if anything goes wrong when using these new options.
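Conceptually, asynchronous featurization like this is a producer/consumer pipeline: a background thread featurizes batches into a bounded queue while the training loop consumes them, and the queue bound caps memory in the same spirit as --batches_per_queue_group. A minimal sketch with stand-in names, not chemprop's implementation:

```python
# Conceptual sketch of asynchronous featurization: a producer thread
# featurizes batches into a bounded queue while the consumer "trains".
# The queue bound limits how far ahead the producer runs, capping RAM use.
import queue
import threading

def featurize_stub(batch):
    return [f"graph({s})" for s in batch]

batches = [["CCO"], ["CCN"], ["c1ccccc1"]]
q = queue.Queue(maxsize=2)   # bounded queue: producer blocks when full
SENTINEL = None              # signals that no more batches are coming

def producer():
    for batch in batches:
        q.put(featurize_stub(batch))   # blocks while the queue is full
    q.put(SENTINEL)

threading.Thread(target=producer, daemon=True).start()

trained = []
while True:
    item = q.get()           # blocks until the next batch is ready
    if item is SENTINEL:
        break
    trained.append(item)     # the model training step would consume `item`
```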
(There are a lot of other code changes since we finally merged our dev branch, but the basic interface should still be the same; most of the new code is just our experimental options. We merged so that we could sync the branches before making some helpful engineering changes, like the one I described above. Please let us know if you encounter any problems, though.)
Hi Kevin,
that was fast!
Thanks a lot and I will try the new code as soon as possible, but probably not until the new year.
I'll give you my feedback, too.
I wish you happy holidays!
Best wishes,
Florian
Sounds good, happy holidays to you too!
Related Issues (15)
- code lacks a proper project structure
- features_only failing?
- prediction by default picking up all models in the directory
- Replace hard-wired features with feature factory
- Project is listed as "HTML" project?
- Issue with TensorboardX version 1.7
- 3D distance feature
- The benchmark should use the same random seed to split train, valid and test data
- About this model training file
- Error while running the code
- predict.py AssertionError
- BCELoss is unstable
- generalize FunctionalGroupFeaturizer
- adding partial charges