Comments (2)
So, first off, best practice when epochs take a long time is to test ALL of your code before launching an hours-long first epoch. I do this by setting the training loop to iterate over only the first example or two, then checkpointing and outputting any statistics (there are issues, for example, with some of my checkpoint scripts not playing well with Python 3, if I recall correctly).
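A minimal sketch of that smoke-test pattern (the `smoke_test` flag and the toy loss are illustrative, not the repo's actual training code): shrink the dataset and epoch count so the whole pipeline, including checkpointing, runs end to end in seconds.

```python
import os
import pickle
import tempfile

def train(dataset, n_epochs, smoke_test=False):
    # `smoke_test` is a hypothetical flag: it shrinks the run so the
    # whole pipeline (loop -> checkpoint -> stats) executes in seconds
    # instead of hours, surfacing bugs before the real launch.
    if smoke_test:
        dataset = dataset[:2]
        n_epochs = 1

    history = []
    for epoch in range(n_epochs):
        for sample in dataset:
            loss = float(sample) * 0.1   # stand-in for forward/backward
            history.append(loss)
        # Checkpoint every epoch so serialization bugs (e.g. Python 2
        # vs. 3 pickle quirks) show up on the first tiny run.
        path = os.path.join(tempfile.gettempdir(), 'smoke_ckpt.pkl')
        with open(path, 'wb') as f:
            pickle.dump({'epoch': epoch, 'history': history}, f)
    return history

# Exercise the full pipeline on two samples before the real run.
losses = train(list(range(100)), n_epochs=50, smoke_test=True)
```

Once this passes, relaunch with `smoke_test=False` and the real hyperparameters.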
My modelnet40 training times were significantly lower (a couple of hours per epoch? It's been over a year since I ran them, so I don't recall, and the logs are buried on an external hard drive somewhere), running on a single Maxwell Titan X. If you're working with DICOM data and you've got ~256x256x256 volumes, then you absolutely should expect your training time to scale with the dimensionality of your data (48 hours for a single epoch sounds mercifully fast, actually). Remember that I run all of my data at 32x32x32; if you're at 256x256x256, then you're already paying 512 TIMES as much memory/computational cost for a single sample, assuming you're using a similar style of network.
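The scaling factor is easy to check: voxel count (and hence memory and per-sample compute for a similar network) grows with the cube of the side length.

```python
small = 32 ** 3           # 32,768 voxels per sample
large = 256 ** 3          # 16,777,216 voxels per sample
ratio = large // small    # per-sample memory/compute multiplier
print(ratio)              # 512
```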
Running with a batch size of 1 is also going to slow you down significantly and may mess up batchnorm (in the experiments I run with batch size 1 on a localization/segmentation net, I actually don't run into batchnorm issues, but I've heard that some people have issues with batch sizes below 16). If you can run with a batch size closer to 10, you should definitely see significant speedups.
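A toy numpy sketch (not the repo's code) of why train-mode batchnorm can degenerate at batch size 1 for fully-connected features: with a single sample, the batch mean is the sample itself, so normalization zeroes out every activation.

```python
import numpy as np

def batchnorm_train(x, eps=1e-5):
    # Train-mode batchnorm for fully-connected features:
    # normalize each feature over the batch axis.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(1, 8)      # batch size 1, 8 features
out = batchnorm_train(x)
# With one sample, the batch mean IS the sample, so x - mean == 0
# and every normalized activation collapses to zero.
```

Note this is the fully-connected case; convolutional batchnorm also averages over spatial positions, which is part of why a batch size of 1 can still work on dense localization/segmentation tasks.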
Given, however, that memory constraints are likely preventing you from loading a larger batch onto your relatively small card, you might want to consider resampling your data to a more amenable size. If you're trying to look for fine-grained things (say, nodules in a CT scan, which only span a few voxels), then you might consider selecting random crops from your volumes, so that you keep the same resolution but don't have to deal with the entire spatial extent.
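A minimal sketch of that random-crop idea (function name and sizes are illustrative): take a random cube from the full volume so resolution is preserved while per-sample cost drops cubically.

```python
import numpy as np

def random_crop_3d(vol, size, rng=np.random):
    # Random cubic crop: keeps the native resolution while cutting
    # per-sample voxel count by (extent / size) ** 3.
    d, h, w = vol.shape
    z = rng.randint(0, d - size + 1)
    y = rng.randint(0, h - size + 1)
    x = rng.randint(0, w - size + 1)
    return vol[z:z + size, y:y + size, x:x + size]

vol = np.zeros((256, 256, 256), dtype=np.float32)
crop = random_crop_3d(vol, 64)   # 1/64th the voxels of the full 256^3 volume
```

Drawing a fresh crop each iteration also acts as a mild data augmentation.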
As to the number of epochs necessary, it's impossible to say with certainty, since it's problem-specific and depends on things like the size of your net (#parameters, depth, optimization difficulty) and the size and complexity of your dataset. It's also more a matter of the number of iterations than the number of epochs (5 epochs on a 100,000-sample dataset is many more gradient descent steps than 50 epochs on a 1,000-sample dataset!). A good rule of thumb is to start with a small number of epochs (say, 30, annealing the learning rate at epochs 15 and 25) and see how that does. Make sure to checkpoint before you anneal your learning rate!
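Both points above can be made concrete in a few lines (a sketch; the anneal epochs and base LR are just the example values from the rule of thumb, not tuned settings):

```python
def lr_at_epoch(epoch, base_lr=1e-3, anneal_at=(15, 25), factor=0.1):
    # Simple step schedule: shrink the LR by `factor` at each anneal point.
    lr = base_lr
    for a in anneal_at:
        if epoch >= a:
            lr *= factor
    return lr

# Iterations, not epochs, are the real training budget:
steps_big   = 5 * 100_000   # 5 epochs on 100k samples = 500,000 updates
steps_small = 50 * 1_000    # 50 epochs on 1k samples  =  50,000 updates
```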
You could also do the short-term training a couple of times and do something like cyclic learning rates between experiments (starting with the same net, then re-running the experiment, kicking the learning rate back up to the start and annealing it again), and then use a Snapshot Ensemble of the different checkpoints to eke out a few more points of performance. I don't think you'll need the full 250 epochs, but in general, if you can bake your net for closer to that many iterations, you'll tend to be better off. In my experience 300 tends to be "certain, but probably overkill," and if I can spare the time to find some good annealing points, I can do it closer to 100-150 epochs.
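A sketch of those two ingredients (all names and constants are illustrative): a cosine learning rate that restarts each cycle, and a snapshot ensemble that averages the class probabilities from the checkpoint saved at the end of each cycle.

```python
import math

def cyclic_lr(step, steps_per_cycle, max_lr=0.1, min_lr=0.0):
    # Cosine schedule restarted every cycle: the LR is kicked back up
    # to max_lr at the start of each cycle and annealed back down.
    t = (step % steps_per_cycle) / steps_per_cycle
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))

def snapshot_ensemble(prob_lists):
    # Average the per-class probabilities predicted by each
    # end-of-cycle checkpoint; the ensemble usually beats any
    # single snapshot.
    n = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n for c in range(n_classes)]
```

Saving a checkpoint at each cycle's LR minimum gives you the snapshots for free; at test time you just average their predictions.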
from generative-and-discriminative-voxel-modeling.
I think I should try a batch size > 16 and see the difference,
and I will start with cropped and resampled DICOM data.
Thank you, as always, for your replies.