Comments (17)
Hey Claydon,
Given that the batch size I use is only 4, multiple GPUs may not be a good way to accelerate training. If I understand correctly, DataParallel is meant for models with a large batch size, and your error is likely that some GPUs end up with only one training image.
To accelerate training, I would suggest working on the meta-test stage, which is the main reason our model trains slowly. In every meta-test step we need to build new computation graphs for the meta-test model, which is extremely slow. Without the meta-test stage, one epoch may take only around 20 minutes; with it, an epoch takes over an hour.
Best wishes,
Xiao
from dgnet.
Hi Jiayi,
Thank you for your interest in our work.
The current code ships with un-tuned parameters. In my experiments I had to tune several hyperparameters to obtain better results for each case, and I only saved the pre-trained models, not the hyperparameters for each case.
I can offer several tips. You can change the resampling rate in the data loader for training and testing by varying the 1.1 in "resize_order = re / 1.1" between 1.0 and 1.3. I also ran experiments with input sizes 224x224 and 288x288; sometimes 288x288 gives better results, but GPU memory becomes an issue. I also tried tuning meta_step_size from 0.001 to 0.01. Finally, the three training parameters k_un = 1, k1 = 20, and k2 = 2 matter as well.
I will try to find the tuned code, at least for BCD->A, so that you can play with it.
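For intuition, here is a minimal sketch of how the divisor changes the effective input size. This is plain Python, not the repo's code: `resampled_size`, its defaults, and the assumption that `re` is a spacing-derived scale factor read in the data loader are all illustrative.

```python
# Hedged sketch: how varying the divisor in "resize_order = re / 1.1"
# changes the resampled image size. 're' is assumed to be a
# spacing-derived scale factor from the data loader; the function name
# and defaults here are illustrative, not the repo's actual API.
def resampled_size(height, width, re, divisor=1.1):
    resize_order = re / divisor
    return round(height * resize_order), round(width * resize_order)

print(resampled_size(224, 224, re=1.25))               # divisor 1.1
print(resampled_size(224, 224, re=1.25, divisor=1.3))  # coarser resampling
```

A larger divisor shrinks the image, so the anatomy of interest occupies relatively fewer pixels; that is the knob being tuned between 1.0 and 1.3.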
Best wishes,
Xiao
Hi Xiao,
Thanks for your reminder!
You may kindly send the tuned code to my email ([email protected]) or just make it public on GitHub if you can find it.
Jiayi
Hi Jiayi,
I just started re-training. I will release the tuned version for BCD->A soon.
Feel free to talk with me at MICCAI 2021.
Best wishes,
Xiao
Hi Xiao,
Thanks for your effort retraining the model. I attempted to tune the parameters and currently have several questions regarding the implementation details:
- For "resize_order = re / 1.1": you said it changes the resampling rate in the data loader, but after looking through the code, isn't it directly scaling the values of the raw data?
- For "k_un = 1, k1 = 20, k2 = 2": I take k_un to be the number of iterations for the meta-train and meta-test steps, but I wonder about the reason for the 'un' in names like 'k_un', 'un_imgs', and 'un_reco_loss'. I also wonder how changing k1 and k2 could affect inference results, since as far as I can see they are only used to record the training process.
I hope you can tell me where I went wrong. Thanks in advance!
Jiayi
Hi Jiayi,
The resampling rate is the order used to rescale the images, which affects how large the anatomy of interest appears.
"un" means unlabeled. k1 and k2 affect the learning rate decay.
Best wishes,
Xiao
Hi Xiao,
Thank you for your reminder.
I noticed that you call 'scheduler.step(val_score)' twice in train_meta.py. Is that meant to accelerate the learning rate decay? If so, why not tune the step_size parameter of lr_scheduler.StepLR instead?
Thanks.
Jiayi
Hi Jiayi,
Oh yes, good point. I think that is a mistake from copying another version into this public one.
There should be only one step() call. The duplicate makes the LR decay too quickly, so the model converges to a local minimum too early.
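To see the effect concretely: whichever scheduler is used, a duplicated step() doubles the decay rate. Here is a small pure-Python sketch of a StepLR-style schedule (not the repo's code; names and values are illustrative):

```python
# Hedged sketch (not the repo's code): StepLR-style decay, where the LR
# is multiplied by gamma every `step_size` scheduler steps.
def steplr_schedule(initial_lr, step_size, gamma, num_steps):
    lr = initial_lr
    history = []
    for step in range(1, num_steps + 1):
        if step % step_size == 0:
            lr *= gamma
        history.append(lr)
    return history

epochs = 20
# one scheduler.step() per epoch:
once = steplr_schedule(2e-5, step_size=10, gamma=0.1, num_steps=epochs)
# the duplicated call: two scheduler steps per epoch, same number of epochs
twice = steplr_schedule(2e-5, step_size=10, gamma=0.1, num_steps=2 * epochs)
print(once[-1])   # decayed twice over 20 epochs
print(twice[-1])  # decayed four times -- the LR collapses much faster
```

With the duplicate call, the final LR after 20 epochs is 100x smaller than intended, which matches the premature convergence described above.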
Best wishes,
Xiao
Hi Xiao,
I am currently tuning the model myself according to your tips. May I ask exactly how you tuned the parameters? Did you use grid search, random search, Bayesian optimization, or some other method? Is there a priority among the parameters, i.e., which should be tuned first?
Thanks in advance.
Jiayi
Hi Jiayi,
Tuning the model is a little tricky. I tune the hyperparameters by checking the losses and visuals during training. You may first change the resampling rate a bit to see how it affects the results, keeping k1 and k2 fixed if you are just experimenting. If you want to use our model as a baseline, I suggest you wait for the tuned version, or I can share the well-trained model weights with you.
I checked the results of the 5% cases with the current version. It seems that for the BCD->A 5% case the current parameters work well, and the results are even better than those I report in the paper. I am busy this week presenting this paper at MICCAI, but I will hopefully post more training details in the coming weeks.
Best wishes,
Xiao
Great! Thanks.
Hi Xiao,
Kindly note that citation 10 in this paper might be wrong. You presumably meant to cite the paper that proposes the Dice loss, but that is not what [10] actually is.
Best,
Jiayi
Hi Jiayi,
Thanks for that. I personally prefer to cite the very first Dice paper, but yes, to be accurate, the following paper should be cited:
Milletari, F., Navab, N., Ahmadi, S.A., 2016. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. 2016 Fourth International Conference on 3D Vision (3DV), 565–571. doi:10.1109/3DV.2016.79.
Best wishes,
Xiao
Hi Jiayi,
I updated the code and training details and fixed some issues, such as the duplicated LR-step call.
The current parameters are for the BCD->A cases; you can train the model to see how it performs. Overall, to tune the model you may want to change the initial learning rate (2e-5 to 5e-5), the number of training epochs (80 to 120 for the 100% cases), the training parameters k1 and k2, and the resampling rate. Our model has many parameters to tune, which is also a drawback of disentanglement. The results reported in the paper may not be the best our model can achieve, as I did not have much time to tune before submission; I actually found better results for the 5% case with the parameters in this public version.
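If you want something more systematic than manual tuning, a minimal random-search loop over the ranges above might look like the sketch below. Everything here is illustrative: `train_and_eval` is a hypothetical stub standing in for the real training loop, and the parameter names are not the repo's actual argument names.

```python
import random

# Hedged sketch: random search over the tuning ranges mentioned above.
# `train_and_eval` is a hypothetical stub; the real version would train
# the model with the given config and return a validation Dice score.
SEARCH_SPACE = {
    "initial_lr": lambda: random.uniform(2e-5, 5e-5),
    "epochs": lambda: random.randint(80, 120),
    "resize_divisor": lambda: random.uniform(1.0, 1.3),
    "meta_step_size": lambda: random.uniform(0.001, 0.01),
}

def train_and_eval(cfg):
    # Placeholder score so the sketch runs; NOT a real objective.
    return -abs(cfg["resize_divisor"] - 1.1)

def random_search(n_trials=20, seed=0):
    random.seed(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: sample() for name, sample in SEARCH_SPACE.items()}
        score = train_and_eval(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search()
print(best_cfg, best_score)
```

Given how long one training run takes with the meta-test stage, a small trial budget with the resampling divisor varied first (and k1/k2 fixed) is probably the practical way to use this.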
Best wishes,
Xiao
Hi Xiao,
Thank you very much for your work. I will try to tune the model to see how it performs. Many thanks!
Best,
Jiayi
Hi Yang,
I was wondering whether the model could be trained with DataParallel, since training on a single GPU is time-consuming.
I changed "model.to(device)" to "model = torch.nn.DataParallel(model).cuda()" and got the error below:
RuntimeError: chunk expects at least a 1-dimensional tensor
How can I solve this? Can you give me some advice? Thanks in advance!
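For reference, this message likely comes from DataParallel scattering each forward() argument along dim 0 with torch.chunk: a 0-dimensional (scalar) tensor among the inputs cannot be split. A minimal, CPU-only reproduction of the chunking behavior (the scalar here is only an example of what to look for in the forward() inputs):

```python
import torch

# DataParallel scatters forward() arguments along dim 0; a 4D image
# batch chunks fine across replicas:
batch = torch.randn(4, 3, 224, 224)
print(len(torch.chunk(batch, 2, dim=0)))  # 2 chunks of batch size 2

# ...but a 0-dimensional tensor (e.g. a scalar weight passed into
# forward()) cannot be chunked, which raises exactly this error:
scalar = torch.tensor(1.0)
try:
    torch.chunk(scalar, 2, dim=0)
except RuntimeError as err:
    print(err)  # chunk expects at least a 1-dimensional tensor
```

So it is worth checking whether any tensor passed to the model's forward() is 0-dimensional; unsqueezing it to shape (1,) or passing it as a plain Python number are common workarounds.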
Hi Liu,
Thank you for your kind reply; I will try it.
Best regards,
Wang