Comments (4)

j96w commented on August 15, 2024

Hi, please make sure you are running the correct code, because training the vanilla segmentation should be much faster than DenseFusion/Iterative-Refine. It can converge within 20 hours, after about 20 epochs, on a single GPU.

from densefusion.

zqsui commented on August 15, 2024

Thanks for your reply! I ran train.py under the vanilla_segmentation folder with the default parameters:

python train.py --dataset_root=/folder_to_ycb_video_dataset

And the output from line 46 of train.py in vanilla_segmentation is

(96189, 2949)

Here's the link to the training log file for the vanilla segmentation run:

https://drive.google.com/file/d/19cHHcNB22acT-JB8tji6-0MK7UGXt1Dg/view?usp=sharing

In this file, you can see that training epoch 1 took ~6 hours to reach batch 19448 and still has not finished.

What do you think might be the issue?

I'm using a Titan X, and the segmentation training takes ~7.4 GB of GPU memory.

j96w commented on August 15, 2024

OK, I see your point. Actually, there is no need to use such a large epoch size (96189/3 = 32063 batches in your case). You can reduce the epoch size to about 5000 batches and run testing after every 5000 batches. BTW, I also noticed that your training runs strangely slowly: it takes you about 10 minutes for 500 batches, while on my server with one 1080Ti it takes 3 min 44 sec. Maybe you can also double-check whether the data loader is taking too much time. My dataset is on an HDD, not an SSD.
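For readers who want to try the smaller epoch size, here is a minimal sketch of one way to cap an epoch at a fixed number of batches. It is not taken from the DenseFusion code: `batches_per_epoch` and `capped_epoch` are hypothetical helper names, and a plain `range` stands in for the real data loader.

```python
from itertools import islice


def batches_per_epoch(num_samples, batch_size):
    """Number of batches in one full pass over the dataset (ceiling division)."""
    return -(-num_samples // batch_size)


def capped_epoch(loader, max_batches):
    """Yield at most max_batches batches from any iterable loader, then stop."""
    return islice(loader, max_batches)


# 96189 training samples with batch_size = 3, as reported in this thread.
full = batches_per_epoch(96189, 3)

# Stand-in for a real data loader: any iterable of batches works here.
fake_loader = range(full)
seen = sum(1 for _ in capped_epoch(fake_loader, 5000))

print(full, seen)  # 32063 5000
```

With a loader that reshuffles on every pass, each capped epoch would see a different subset of the data, so the whole dataset is still covered over time.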

zqsui commented on August 15, 2024

Thanks for your reply! I think I figured out the issue. After some simple profiling, the average data loading time per batch (batch_size = 3) is around 0.005 seconds, and the average training time per batch, including data loading, is around 0.85 seconds. So 500 batches take about 500 * 0.85 = 425 seconds, which is around 7 minutes on my old Titan X (bought in 2016). My dataset is also on an HDD. So the data loader is not the bottleneck, and I think it is reasonable for your newer 1080Ti to be about twice as fast as my old Titan X. BTW, I also tested the segmentation training on a new 2070 with batch_size = 2, since batch_size = 3 doesn't fit on an 8 GB 2070. The average training time per batch there is around 0.57 seconds. I think I will stick with my Titan X for training the segmentation.
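A rough sketch of this kind of profiling, for anyone who wants to repeat it. This is not the script used above: `profile` and `seconds_for` are hypothetical helpers, and the loader and training step are dummies so the snippet runs on its own.

```python
import time


def profile(loader, train_step, num_batches):
    """Return (mean data-loading seconds, mean total seconds) per batch."""
    load_total = 0.0
    done = 0
    it = iter(loader)
    start = time.perf_counter()
    for _ in range(num_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting for the data loader
        except StopIteration:
            break
        load_total += time.perf_counter() - t0
        train_step(batch)  # forward/backward/update would go here
        done += 1
    total = time.perf_counter() - start
    if done == 0:
        return 0.0, 0.0
    return load_total / done, total / done


def seconds_for(num_batches, sec_per_batch):
    """Projected wall-clock time for a number of batches."""
    return num_batches * sec_per_batch


# Dummy loader and training step, only to show the call shape.
load_mean, total_mean = profile([1, 2, 3], lambda batch: None, 3)

# Projection from the numbers measured in this thread.
titan_x_500 = seconds_for(500, 0.85)  # about 425 s, roughly 7 minutes
print(titan_x_500 / 60)
```

On a GPU, CUDA work is launched asynchronously, so a real PyTorch loop would likely need `torch.cuda.synchronize()` before each clock read for the per-batch numbers to be meaningful. Dividing the measured times by batch size also gives about 0.85 / 3 ≈ 0.283 seconds per sample on the Titan X and 0.57 / 2 = 0.285 on the 2070, so the two cards come out nearly equal per sample.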

Thanks again for your help!
