
rcps's Issues

Error when running the code

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train.py --mixed --benchmark --task la --exp_name running --wandb --entity xxx
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (4.0.0) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
| distributed init (rank 0): env://

Semi-Supervised Medical Image Segmentation Training
Mixed Precision - True; CUDNN Benchmark - True; Num GPU - 1; Num Worker - 8
successfully loaded config file: {'MODEL': {'PROJECT_DIM': 64, 'LEAKY': True, 'NORM': 'BATCH'}, 'TRAIN': {'LR': 0.01, 'MOMENTUM': 0.9, 'DECAY': 0.0001, 'BURN_IN': 5, 'BURN': 0, 'RAMPUP': 100, 'EPOCHS': 100, 'BATCHSIZE': 1, 'SEED': 42, 'RATIO': 0.1, 'LOSS_TYPE': 1, 'SAMPLE_NUM': 400, 'BUFFER_SIZE': 1, 'CPS_RATIO': 0.1, 'CON_RATIO': 0.1}, 'TEST': {'BATCHSIZE': 4}}
Traceback (most recent call last):
File "/home/chaijingwen/RCPS-main/train.py", line 184, in <module>
main()
File "/home/chaijingwen/RCPS-main/train.py", line 74, in main
AddChanneld(keys=['image', 'label'], allow_missing_keys=True),
NameError: name 'AddChanneld' is not defined
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1208857) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/home/ccj/.local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-04-22_10:17:25
host : mvp-C621-WD12-IPMI
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1208857)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Hello, I got the error above when running train.py. Could you please help me figure out what the problem is?
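For reference: AddChanneld was deprecated in MONAI and, as far as I know, removed in MONAI 1.3, which would likely explain this NameError with a recent MONAI install. Two probable fixes are installing an older release (for example pip install "monai<1.3") or replacing the line with EnsureChannelFirstd(keys=['image', 'label'], channel_dim='no_channel', allow_missing_keys=True). If neither is convenient, the sketch below is a minimal stand-in under stated assumptions: AddChannelDict is a hypothetical name, each value is assumed to be array-like without a channel axis, and values come back as plain NumPy arrays, so any MONAI metadata (such as the affine) is dropped. Prefer EnsureChannelFirstd if later transforms rely on that metadata.

```python
import numpy as np


class AddChannelDict:
    """Hypothetical stand-in for MONAI's removed AddChanneld.

    Prepends a length-1 channel axis to each selected value, so an
    (H, W, D) volume becomes (1, H, W, D).
    """

    def __init__(self, keys, allow_missing_keys=False):
        self.keys = list(keys)
        self.allow_missing_keys = allow_missing_keys

    def __call__(self, data):
        d = dict(data)  # shallow copy so the caller's dict is left untouched
        for key in self.keys:
            if key not in d:
                if self.allow_missing_keys:
                    continue  # e.g. unlabeled samples that have no 'label' entry
                raise KeyError(f"key '{key}' not found in data")
            # np.asarray drops MONAI metadata; see the caveat above
            d[key] = np.expand_dims(np.asarray(d[key]), axis=0)
        return d


sample = {"image": np.zeros((4, 4, 2)), "label": np.ones((4, 4, 2))}
out = AddChannelDict(keys=["image", "label"], allow_missing_keys=True)(sample)
print(out["image"].shape)  # (1, 4, 4, 2)
```

Because MONAI's Compose accepts plain callables, an instance like this could sit where AddChanneld was in the transform list.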

Ratio change problem

Hello, I'm new to medical image segmentation and I want to change the ratio in la.cfg. What do cps_ratio and con_ratio mean in la.cfg? Or do I just need to set the ratio to 0.2?

LA dataset

At present, I am unable to obtain the LA dataset. Could you share it with me? I will use it only for academic research. My student email address is [email protected]

Training error in the real semi-supervised scenario

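My data directory is laid out as follows:
[screenshot]

Following the instructions in your README, I substituted the code as described to train in the real semi-supervised scenario (with image_root set as image_root = './data/LAA'):
[screenshot]

The error message is as follows:
[screenshot]

Apart from the code that defines the data, is there anything else that needs to be modified?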
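In a real semi-supervised setting, some images simply have no label file, so the data lists have to be built from what exists on disk. Below is a minimal sketch of that idea, not the repository's own loader: the function name build_data_lists, the images/ and labels/ subfolder names, the .nii.gz suffix, and the convention that a label shares its image's filename are all hypothetical and need adapting to the actual layout under image_root.

```python
from pathlib import Path


def build_data_lists(image_root):
    """Split images under image_root into labeled and unlabeled sample dicts.

    Assumed (hypothetical) layout:
        image_root/images/<case>.nii.gz
        image_root/labels/<case>.nii.gz   # present only for labeled cases
    """
    root = Path(image_root)
    labeled, unlabeled = [], []
    for image_path in sorted((root / "images").glob("*.nii.gz")):
        label_path = root / "labels" / image_path.name
        if label_path.exists():
            labeled.append({"image": str(image_path), "label": str(label_path)})
        else:
            # no 'label' key at all, so no supervised loss can use this case
            unlabeled.append({"image": str(image_path)})
    return labeled, unlabeled


labeled, unlabeled = build_data_lists("./data/LAA")
print(f"Num labeled: {len(labeled)}; Num unlabeled: {len(unlabeled)}")
```

Omitting the 'label' key, rather than pointing it at a dummy file, means any transform that expects a label needs allow_missing_keys=True.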

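About requirements

Hello, thank you for your work and for sharing it. To reproduce your work I need to request a server with the corresponding environment, but I could not find the code's requirements in the documentation. Could you let me know what they are?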

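About semi-supervised training

Hello, during the first 10 epochs of training I got the following intermediate results. Is this fully supervised or semi-supervised training?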
wandb: Run summary:
wandb: train/train_contrastive_l_loss_mean 0.98026
wandb: train/train_contrastive_u_loss_mean 1.18749
wandb: train/train_cosine_l_loss_mean 0.36995
wandb: train/train_cosine_u_loss_mean 0.49217
wandb: train/train_cps_l_loss_mean 1.59464
wandb: train/train_cps_u_loss_mean 2.29451
wandb: train/train_seg_loss_mean 5.00324
wandb: val/val_loss_mean 1.86493
wandb: val/val_metric_mean 0.6811

Part of the terminal output is as follows:
Semi-Supervised Medical Image Segmentation Training
Mixed Precision - True; CUDNN Benchmark - True; Num GPU - 1; Num Worker - 8
successfully loaded config file: {'MODEL': {'PROJECT_DIM': 64, 'LEAKY': True, 'NORM': 'BATCH'}, 'TRAIN': {'LR': 0.01, 'MOMENTUM': 0.9, 'DECAY': 0.0001, 'BURN_IN': 5, 'BURN': 0, 'RAMPUP': 100, 'EPOCHS': 100, 'BATCHSIZE': 1, 'SEED': 42, 'RATIO': 0.1, 'LOSS_TYPE': 1, 'SAMPLE_NUM': 400, 'BUFFER_SIZE': 1, 'CPS_RATIO': 0.1, 'CON_RATIO': 0.1}, 'TEST': {'BATCHSIZE': 4}}
Task la prepared. Num labeled subjects: 8; Num unlabeled subjects: 72; Num validation subjects: 20
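Here I set the ratio to 0.1. My data folder is split into four subfolders (training and validation, each with images and labels), and I did not put the code that the README says must be substituted for semi-supervised training into train.py, so in theory training should be fully supervised. However, the output still shows Num unlabeled subjects: 72. In fully supervised training, how do you ensure that the labeled images beyond the ratio are not used in training?

Looking forward to your reply when you have time. Best wishes!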
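The printed counts are consistent with RATIO being applied to an 80-subject training set: round(80 × 0.1) = 8 labeled and 80 − 8 = 72 unlabeled. Below is a minimal sketch of that kind of label-ratio split. It assumes, without confirmation from the RCPS source, that subjects are shuffled with the configured SEED and that the labels of the unlabeled subset are simply dropped, so only the 8 labeled subjects can contribute to the supervised loss. split_by_ratio and the case names are hypothetical.

```python
import random


def split_by_ratio(subjects, ratio, seed=42):
    """Hypothetical label-ratio split: keep labels for round(len * ratio) subjects.

    subjects: list of dicts with 'image' and 'label' entries.
    Returns (labeled, unlabeled); unlabeled copies have their 'label' removed.
    """
    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)  # reproducible order for a fixed seed
    num_labeled = round(len(shuffled) * ratio)
    labeled = shuffled[:num_labeled]
    # Drop the ground truth so it cannot leak into a supervised loss term.
    unlabeled = [
        {k: v for k, v in s.items() if k != "label"} for s in shuffled[num_labeled:]
    ]
    return labeled, unlabeled


subjects = [{"image": f"case{i:03d}_image", "label": f"case{i:03d}_label"} for i in range(80)]
labeled, unlabeled = split_by_ratio(subjects, ratio=0.1, seed=42)
print(len(labeled), len(unlabeled))  # 8 72
```

Under this assumption, the unlabeled subjects are still loaded (hence the count of 72), but only their images reach the unsupervised loss terms.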

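The LA dataset link does not open

Hello, thank you for your work and for sharing it. The LA dataset link you provided cannot be opened. Could you send me a copy by email or through a cloud drive? Thank you very much!
My email address is [email protected]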

Test the model and make predictions

Hello hsiangyuzhao! In Figure 2 of your paper, the blue lines denote the predictions. How can I get those contour lines after training my own model? In other words, I want to test the trained model and make some predictions. Is that what is marked in red in train_visualization?
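One common way to draw a prediction outline like the blue lines in Figure 2 is to keep only the boundary pixels of the predicted binary mask on each 2D slice and overlay them on the image. The sketch below shows that idea in pure Python with 4-connectivity; mask_boundary is a hypothetical helper and is not the authors' plotting code.

```python
def mask_boundary(mask):
    """Return a same-sized 0/1 grid marking the boundary of a binary 2D mask.

    A foreground pixel is on the boundary if any 4-neighbour is background
    or lies outside the image.
    """
    rows, cols = len(mask), len(mask[0])
    boundary = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or not mask[nr][nc]:
                    boundary[r][c] = 1
                    break
    return boundary


prediction = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in mask_boundary(prediction):
    print(row)  # the centre pixel (2, 2) is interior, so it stays 0
```

The marked pixels can then be drawn in any colour over the corresponding image slice with a plotting library.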

Code

Hi, thanks for your excellent work. When will you release the code?

CUDA out of memory

Hello, thank you very much for sharing. When I run train.py, the error always occurs no matter how many GPUs I use. Strangely, different GPU counts require different amounts of memory. The most powerful setup I tried was 4 NVIDIA A100 GPUs. Training works fine, but the error always arises when the evaluation loop starts at the fifth iteration. Do you know how to fix it?
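Running out of memory only once evaluation starts often points to whole volumes being pushed through the network at once, or to gradients being kept during evaluation. Two common mitigations, offered as general suggestions rather than a confirmed diagnosis of this repository, are wrapping evaluation in torch.no_grad() and using patch-based sliding-window inference (for example MONAI's sliding_window_inference with a small sw_batch_size). The sketch below only illustrates how overlapping window start positions along one axis can be computed; window_starts is a hypothetical helper, and the patch size and overlap are placeholder values.

```python
import math


def window_starts(length, window, overlap=0.5):
    """Start indices of sliding windows covering [0, length) along one axis.

    Consecutive windows overlap by roughly `overlap` of the window size,
    and the last window is shifted back so it ends exactly at `length`.
    """
    if window >= length:
        return [0]  # a single (possibly padded) window covers the axis
    stride = max(1, int(window * (1 - overlap)))
    num = math.ceil((length - window) / stride) + 1
    return [min(i * stride, length - window) for i in range(num)]


# e.g. a 200-voxel axis split into 112-voxel patches with 50% overlap
print(window_starts(200, 112, overlap=0.5))  # [0, 56, 88]
```

Only one batch of patches needs to be in GPU memory at a time, which is why smaller patch batches trade speed for a lower peak memory.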

About real semi-supervised scene

Hi, thank you for the README update and congratulations on the acceptance!

What is the difference between the real semi-supervised scene and changing the label ratio?
Just to make sure: when changing the label ratio, the model doesn't use the labels of the unlabeled part to compute the segmentation loss, so isn't this the same as the real semi-supervised scene?

A follow-up question:
Will there be a performance change (drop) when switching to the real semi-supervised scene compared to the results reported in the paper?

Many Thanks.

Code result

Thank you for your work. When I reproduced the code, I set the number of negative samples to N=100, but there is still a small gap between my results and those in your paper.
LA dataset:
bg_dice: 0.9896 ± 0.0049; la_dice: 0.8712 ± 0.0577; bg_hd95: 2.2933 ± 1.3194; la_hd95: 15.1779 ± 16.0119; bg_asd: 0.4185 ± 0.2263; la_asd: 3.6897 ± 3.3981;
Pancreas dataset:
bg_dice: 0.996 ± 0.0015; pancreas_dice: 0.7719 ± 0.0702; bg_hd95: 1.2585 ± 0.4075; pancreas_hd95: 12.1445 ± 10.8312; bg_asd: 0.2605 ± 0.1133; pancreas_asd: 2.8957 ± 1.3297;
Here is my training command:
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 train.py --mixed --benchmark --task pancreas --exp_name pancreas --wandb
Paper results:
[screenshot: 2023-04-11_085119]
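When comparing reproduced numbers with the paper, it can help to be explicit about how each entry is computed. Below is a minimal sketch assuming per-subject Dice on binary masks, Dice = 2|A ∩ B| / (|A| + |B|), aggregated as mean ± standard deviation over subjects. Whether the repository reports the population or the sample standard deviation is not stated here, so the population version is an assumption; HD95 and ASD are distance-based metrics and are not covered.

```python
import statistics


def dice(pred, target):
    """Dice coefficient for two flat binary masks: 2|A ∩ B| / (|A| + |B|)."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * intersection / total  # both empty: perfect match


def summarize(scores):
    """Format per-subject scores as 'mean ± std' (population std, an assumption)."""
    return f"{statistics.mean(scores):.4f} ± {statistics.pstdev(scores):.4f}"


per_subject = [
    dice([1, 1, 0, 0], [1, 0, 0, 0]),  # 2*1 / (2+1) = 0.6667
    dice([1, 1, 1, 0], [1, 1, 1, 0]),  # identical masks -> 1.0
]
print(summarize(per_subject))  # 0.8333 ± 0.1667
```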

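About training on my own dataset

Hello, first of all, thank you very much for your work! If I want to try training on my own dataset, which files do I need to modify? I noticed that you have not yet provided support for custom datasets, but I would really like to try running experiments with your model to see how well it segments.
Thanks :)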
