Hi, Thanks for sharing the code! I'm also trying to use your code in

Is it possible to share the trained segmentation model on the YCB-Video dataset? (densefusion issue, closed)

j96w commented on August 15, 2024

Is it possible to share the trained segmentation model on the YCB-Video dataset?

from densefusion.

Comments (4)

j96w commented on August 15, 2024

Hi, please make sure you are running the correct code, because training the vanilla segmentation should be much faster than training DenseFusion/Iterative-Refine. It can converge within 20 hours, after about 20 epochs, on a single GPU.


zqsui commented on August 15, 2024

Thanks for your reply! I ran train.py under the vanilla_segmentation folder with the default parameters:

python train.py --dataset_root=/folder_to_ycb_video_dataset

And the output from line 46 of train.py in vanilla_segmentation is

(96189, 2949)

Here's the link to the training log file for the vanilla segmentation training:

https://drive.google.com/file/d/19cHHcNB22acT-JB8tji6-0MK7UGXt1Dg/view?usp=sharing

In this file, you can see that training epoch 1 took ~6 hours to reach batch 19448 and was still not finished.

What do you think might be the issue?

I'm using a Titan X, and the segmentation training took ~7.4 GB of GPU memory.
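To put the numbers above in perspective, here is a rough extrapolation of the full epoch time from the log. It is only a sketch: it assumes the first value of the printed tuple (96189) is the number of training samples, that the default batch size is 3, and that the time per batch stays constant. The function name `estimate_epoch_hours` is made up for this illustration.

```python
def estimate_epoch_hours(elapsed_hours, batches_done, total_batches):
    """Linearly extrapolate the wall-clock time of a full epoch."""
    if batches_done <= 0:
        raise ValueError("batches_done must be positive")
    return elapsed_hours * total_batches / batches_done


# Assumption: 96189 training samples with batch size 3.
total_batches = 96189 // 3  # 32063 batches per epoch

# About 6 hours were needed to reach batch 19448.
hours = estimate_epoch_hours(6.0, 19448, total_batches)
print(f"estimated time for one full epoch: {hours:.1f} h")
```

Under these assumptions a single pass over the whole training set would take close to 10 hours on this machine.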


j96w commented on August 15, 2024

OK, I see your point. Actually, there is no need to use such a large epoch size (96189/3 = 32063 batches in your case). You can reduce the epoch size to about 5000 batches and run testing after every 5000 batches. BTW, I also find that your training runs strangely slowly: it takes you about 10 minutes for 500 batches, while on my server with one 1080Ti it takes 3 min 44 sec. Maybe you can also double-check whether the data loader takes too much time. My dataset is on an HDD, not an SSD.
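One way to follow the suggestion above without touching the dataset is to run the test pass every fixed number of batches instead of once per full pass. The sketch below is not the repository's train.py; `train_step`, `evaluate`, and `train_with_periodic_eval` are hypothetical placeholders, and `batches` stands for any iterable of batches such as a data loader.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def train_with_periodic_eval(batches, train_step, evaluate, eval_every=5000):
    """Train on each batch and call evaluate() after every eval_every batches.

    Returns how many times evaluate() was called.
    """
    evals = 0
    for i, batch in enumerate(batches, start=1):
        train_step(batch)
        if i % eval_every == 0:
            evaluate()
            evals += 1
    return evals


if __name__ == "__main__":
    # 96189 samples with batch size 3, as in the thread.
    print(batches_per_epoch(96189, 3))

    # Toy run: 12 dummy batches, "testing" after every 5 of them.
    seen = []
    n_evals = train_with_periodic_eval(
        range(12), train_step=seen.append, evaluate=lambda: None, eval_every=5
    )
    print(len(seen), n_evals)
```

With `eval_every=5000`, a 32063-batch pass would trigger testing six times, so checkpoints and test losses show up within a reasonable time on slower GPUs.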


zqsui commented on August 15, 2024

Thanks for your reply! I think I figured out the issue. After some simple profiling, the average data loading time per batch (batch_size = 3) is around 0.005 seconds, and the average training time per batch, including data loading, is around 0.85 seconds. So 500 batches take about 500 * 0.85 = 425 seconds, which is around 7 minutes on my old Titan X (bought in 2016). My dataset is also on an HDD. So the data loader is not the bottleneck, and I think it is fair for your newer 1080Ti to compute about twice as fast as my old Titan X. BTW, I also tested the segmentation training on a new 2070 with batch_size = 2, since batch_size = 3 can't fit on an 8 GB 2070. The average training time per batch there is around 0.57 seconds. I think I will stick with my Titan X and train the segmentation.

Thanks again for your help!
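The kind of profiling described above can be sketched as follows. This is not the code that was actually used in the thread; `profile_loop`, `per_sample_time`, and `train_step` are hypothetical names, and `batches` is any iterable of batches. When training on a GPU, `train_step` should wait for the device to finish (for example by calling `torch.cuda.synchronize()`) before returning, because CUDA kernels run asynchronously and the timings would otherwise be too low.

```python
import time


def profile_loop(batches, train_step, max_batches=500):
    """Return (avg_load_time, avg_total_time) per batch, in seconds.

    avg_load_time  is the time spent waiting for the next batch.
    avg_total_time is loading plus the training step.
    """
    load_time = 0.0
    total_time = 0.0
    n = 0
    it = iter(batches)
    while n < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)       # data loading happens here
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)          # forward, backward, optimizer step
        t2 = time.perf_counter()
        load_time += t1 - t0
        total_time += t2 - t0
        n += 1
    if n == 0:
        return 0.0, 0.0
    return load_time / n, total_time / n


def per_sample_time(batch_time, batch_size):
    """Normalize a per-batch time so different batch sizes are comparable."""
    return batch_time / batch_size


if __name__ == "__main__":
    avg_load, avg_total = profile_loop(range(100), lambda b: time.sleep(0.001))
    print(f"load: {avg_load:.6f} s/batch, total: {avg_total:.6f} s/batch")

    # Numbers reported in the thread.
    print(f"Titan X:  {per_sample_time(0.85, 3):.3f} s/sample")
    print(f"RTX 2070: {per_sample_time(0.57, 2):.3f} s/sample")
```

Normalized per sample, the two cards come out at roughly 0.283 s and 0.285 s, which is consistent with the decision to stay on the Titan X with the larger batch size.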
