
Comments (2)

ajbrock commented on June 16, 2024

So, first off, best practice when training with long epochs is to test ALL of your code before launching an hours-long first epoch. I do this by setting the training loop to run over only the first example or two, then checkpointing and outputting any statistics (for example, if I recall correctly, some of my checkpoint scripts don't play well with Python 3).
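For concreteness, here's a minimal smoke-test sketch of that workflow. The toy model, update rule, and file name are my own stand-ins, not anything from this repo; the point is just to exercise the full train -> log -> checkpoint path on a couple of samples before committing to a multi-hour epoch:

```python
# Hypothetical smoke test: run train/log/checkpoint end-to-end on two samples
# so serialization and logging bugs surface in seconds, not hours.
import pickle
import numpy as np

def train_step(weights, x, y, lr=0.01):
    """Toy linear-regression step standing in for your real update."""
    pred = x @ weights
    grad = x.T @ (pred - y) / len(x)
    weights -= lr * grad                       # in-place parameter update
    return float(np.mean((pred - y) ** 2))

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 1))
for i in range(2):                             # loop over just a sample or two
    x, y = rng.normal(size=(4, 8)), rng.normal(size=(4, 1))
    print("step", i, "loss:", train_step(weights, x, y))  # exercise logging
with open("smoke_test.ckpt", "wb") as f:       # exercise checkpointing too
    pickle.dump(weights, f)
```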

My modelnet40 training times were significantly lower (a couple of hours per epoch? It's been over a year since I ran them, so I don't recall exactly, and the logs are buried on an external hard drive somewhere), running on a single Maxwell Titan X. If you're working with DICOM data and you have ~256x256x256 volumes, then you should absolutely expect your training time to scale with the dimensionality of your data (48 hours for a single epoch actually sounds mercifully fast). Remember that I run all of my data at 32x32x32; at 256x256x256 a single sample already costs (256/32)^3 = 8^3 = 512 TIMES as much memory and computation, assuming you're using a similar style of network.

Running with a batch size of 1 is also going to slow you down significantly and may mess up batchnorm (in the experiments I run with batch size 1 on a localization/segmentation net I actually don't run into batchnorm issues, but I've heard that some people have issues with BS < 16). If you can run with a batch size closer to 10, you should definitely see significant speedups.
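As a toy illustration of why tiny batches can hurt batchnorm (my own example, not from this repo): the normalization statistics are computed per batch, so at BS=1 the per-batch mean is just a single noisy activation, while at BS=16 it is much more stable:

```python
# Per-batch means of the same fake activations, grouped at BS=1 vs BS=16.
# The spread shrinks roughly like 1/sqrt(batch_size).
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(loc=2.0, scale=1.0, size=(160,))   # fake layer activations
for bs in (1, 16):
    batch_means = acts.reshape(-1, bs).mean(axis=1)  # one mean per "batch"
    print(f"BS={bs:2d}: std of per-batch means = {batch_means.std():.3f}")
```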

Given, however, that memory constraints are likely preventing you from loading a larger batch onto your relatively small card, you might want to consider resampling your data to a more amenable size. If you're trying to look for fine-grained things (say, nodules in a CT scan, which span only a few voxels), then you might consider selecting random crops from your volumes, so that you keep the native resolution without having to process the entire spatial extent.
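A minimal sketch of that random-cropping idea, assuming your volumes are numpy arrays (the crop size and function name here are hypothetical, not from this repo):

```python
# Random 3D crop: keep native resolution, but train on sub-volumes so
# fine structures like nodules aren't destroyed by downsampling.
import numpy as np

def random_crop_3d(volume, crop=(64, 64, 64), rng=None):
    """Return a random sub-volume of shape `crop` from a 3D `volume`."""
    rng = rng or np.random.default_rng()
    starts = [rng.integers(0, dim - c + 1) for dim, c in zip(volume.shape, crop)]
    slices = tuple(slice(s, s + c) for s, c in zip(starts, crop))
    return volume[slices]

vol = np.zeros((256, 256, 256), dtype=np.float32)  # e.g. one CT volume
patch = random_crop_3d(vol)
print(patch.shape)  # (64, 64, 64)
```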

As to the number of epochs necessary, it's impossible to say with certainty since it's problem-specific, and it depends on things like the size of your net (number of parameters, depth, optimization difficulty) and the size and complexity of your dataset. It's also more a matter of the number of iterations than the number of epochs: 5 epochs on a 100,000-sample dataset is many more gradient descent steps than 50 epochs on a 1,000-sample dataset! A good rule of thumb is to start with a small number of epochs (say, 30, annealing the learning rate at epochs 15 and 25) and see how that does. Make sure to checkpoint before you anneal your learning rates!
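In sketch form, that rule of thumb looks something like this (the helper functions are stand-ins for your real training loop, and the 10x decay factor is my assumption, not a prescription):

```python
# 30 epochs, anneal the LR at epochs 15 and 25, checkpoint before each anneal
# so you can roll back if the anneal point turns out to be badly chosen.
def train_one_epoch(lr):
    print(f"training with lr={lr:g}")      # stand-in for the real epoch loop

def save_checkpoint(tag):
    print(f"checkpointed: {tag}")          # stand-in for real serialization

lr, anneal_at = 1e-3, {15, 25}
for epoch in range(30):
    if epoch in anneal_at:
        save_checkpoint(f"pre_anneal_epoch{epoch}")  # checkpoint FIRST
        lr *= 0.1                          # assumed 10x decay per anneal
    train_one_epoch(lr)
```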

You could also run the short training a couple of times and do something like cyclic learning rates across experiments (starting from the same net, re-running the experiment, kicking the learning rate back up to its starting value and then annealing it again), and then use a Snapshot Ensemble over the different checkpoints to eke out a few more points of performance. I don't think you'll need the full 250 epochs, but in general, if you can bake your net for closer to that many iterations, you'll tend to be better off. In my experience 300 tends to be "certain, but probably overkill," and if I can spare the time to find some good annealing points I can do it in closer to 100-150 epochs.
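A sketch of what that cyclic schedule could look like, assuming the cosine annealing typically used with Snapshot Ensembles (Huang et al.); the step counts and file names are illustrative only:

```python
# Cosine-cyclic LR: anneal to a minimum, snapshot the weights there,
# then kick the LR back up and repeat. Averaging the snapshots' predictions
# at test time gives the ensemble.
import math

def cyclic_lr(step, steps_per_cycle, lr_max=1e-3, lr_min=1e-5):
    """Cosine-anneal from lr_max to lr_min within each cycle."""
    t = (step % steps_per_cycle) / steps_per_cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

snapshots, steps_per_cycle = [], 1000
for step in range(3 * steps_per_cycle):        # three cycles, three snapshots
    lr = cyclic_lr(step, steps_per_cycle)
    # ... one training step at this lr ...
    if (step + 1) % steps_per_cycle == 0:      # end of cycle = LR minimum
        snapshots.append(f"snapshot_{step + 1}.ckpt")
print(snapshots)
```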


EJShim commented on June 16, 2024

I think I should try a batch size > 16 and see the difference.

I'll also start with cropped and resampled DICOM data.

Thank you as always for your replies.

