hsiangyuzhao / rcps


official implementation of rectified contrastive pseudo supervision

License: MIT License

Python 100.00%

rcps's Issues

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train.py --mixed --benchmark --task la --exp_name running --wandb --entity xxx
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (4.0.0) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
| distributed init (rank 0): env://

Semi-Supervised Medical Image Segmentation Training
Mixed Precision - True; CUDNN Benchmark - True; Num GPU - 1; Num Worker - 8
successfully loaded config file: {'MODEL': {'PROJECT_DIM': 64, 'LEAKY': True, 'NORM': 'BATCH'}, 'TRAIN': {'LR': 0.01, 'MOMENTUM': 0.9, 'DECAY': 0.0001, 'BURN_IN': 5, 'BURN': 0, 'RAMPUP': 100, 'EPOCHS': 100, 'BATCHSIZE': 1, 'SEED': 42, 'RATIO': 0.1, 'LOSS_TYPE': 1, 'SAMPLE_NUM': 400, 'BUFFER_SIZE': 1, 'CPS_RATIO': 0.1, 'CON_RATIO': 0.1}, 'TEST': {'BATCHSIZE': 4}}
Traceback (most recent call last):
File "/home/chaijingwen/RCPS-main/train.py", line 184, in <module>
main()
File "/home/chaijingwen/RCPS-main/train.py", line 74, in main
AddChanneld(keys=['image', 'label'], allow_missing_keys=True),
NameError: name 'AddChanneld' is not defined
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1208857) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/home/ccj/.local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ccj/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-04-22_10:17:25
host : mvp-C621-WD12-IPMI
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1208857)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Hello, I got the error above when running train.py. Could you help me figure out what the problem is?
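
The NameError suggests that `AddChanneld` is not exported by the installed MONAI version. To my knowledge the transform was deprecated in MONAI 0.8 and removed in 1.3. A NameError rather than an ImportError would be consistent with train.py using a wildcard import, but that is an assumption, since the import lines are not shown. One workaround is to install an older MONAI release; another is to define a small compatible stand-in before the transform list is built. The sketch below is such a stand-in, not the original MONAI class, and it only assumes that each value supports `[None]` indexing, as NumPy arrays and torch tensors do.

```python
class AddChanneld:
    """Hypothetical stand-in for the removed MONAI dictionary transform.

    Prepends a channel axis to every listed key, so an array of shape
    (H, W, D) becomes (1, H, W, D).
    """

    def __init__(self, keys, allow_missing_keys=False):
        self.keys = list(keys)
        self.allow_missing_keys = allow_missing_keys

    def __call__(self, data):
        out = dict(data)  # shallow copy so the caller's dict is untouched
        for key in self.keys:
            if key not in out:
                if self.allow_missing_keys:
                    continue
                raise KeyError(f"key '{key}' not found in data")
            # Indexing with None adds a leading axis for NumPy arrays
            # and torch tensors alike.
            out[key] = out[key][None]
        return out
```

With this class in scope, the call from the traceback, `AddChanneld(keys=['image', 'label'], allow_missing_keys=True)`, should run unchanged. On recent MONAI releases, `EnsureChannelFirstd(keys=..., channel_dim='no_channel')` may serve the same purpose, though I have not checked it against this repository.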

ratio change problem

Hello, I'm new to medical image segmentation and I want to change the ratio in la.cfg. What do cps_ratio and con_ratio mean in la.cfg? Or do I only need to change the ratio to 0.2?
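
For what it's worth, a training log quoted in another issue on this page reports 8 labeled and 72 unlabeled subjects with 'RATIO': 0.1, which is consistent with RATIO being the fraction of the 80 training subjects that keep their labels. Below is a minimal sketch of such a split. The function name, the rounding, and the seeded shuffle are my assumptions, not the repository's code, and the meaning of CPS_RATIO and CON_RATIO cannot be determined from the log alone.

```python
import random


def split_by_ratio(subjects, ratio, seed=42):
    """Split subjects into (labeled, unlabeled) by a label ratio.

    Assumption: the labeled count is len(subjects) * ratio, rounded.
    """
    subjects = list(subjects)
    random.Random(seed).shuffle(subjects)  # seeded, so the split is repeatable
    num_labeled = round(len(subjects) * ratio)
    return subjects[:num_labeled], subjects[num_labeled:]


labeled, unlabeled = split_by_ratio(range(80), ratio=0.1)
print(len(labeled), len(unlabeled))  # 8 72
```

Under the same assumption, a ratio of 0.2 would give 16 labeled and 64 unlabeled subjects.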

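Error at the 10th epoch (fully supervised, RATIO=0.1)

After following the prompt and running pip install moviepy imageio, pip reports that both are already installed.

But running train.py again still gives the same error, as shown in the first screenshot.
What else do I need to change?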

LA dataset

At present, I am unable to obtain the LA dataset. Could you share it with me? I will use it only for academic research, and this is my student email address: [email protected]

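Training error in the real semi-supervised scenario

My data paths are as follows:

Following the instructions in your README, I replaced the code as described in order to train in the real semi-supervised scenario (I set image_root = './data/LAA').

The error message is as follows:

Apart from the data-definition code, does anything else need to be modified?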

about requirements

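Hello, thank you for your work and for sharing it. To reproduce your work I need to request a server with a matching environment, but I could not find the code's requirements in the documentation. Could you tell me what they are?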

How can I resolve this question? Thank you

Hello, during the first 10 epochs of training I got the following intermediate results. Is this fully supervised or semi-supervised training?
wandb: Run summary:
wandb: train/train_contrastive_l_loss_mean 0.98026
wandb: train/train_contrastive_u_loss_mean 1.18749
wandb: train/train_cosine_l_loss_mean 0.36995
wandb: train/train_cosine_u_loss_mean 0.49217
wandb: train/train_cps_l_loss_mean 1.59464
wandb: train/train_cps_u_loss_mean 2.29451
wandb: train/train_seg_loss_mean 5.00324
wandb: val/val_loss_mean 1.86493
wandb: val/val_metric_mean 0.6811

Part of the terminal output is as follows:
Semi-Supervised Medical Image Segmentation Training
Mixed Precision - True; CUDNN Benchmark - True; Num GPU - 1; Num Worker - 8
successfully loaded config file: {'MODEL': {'PROJECT_DIM': 64, 'LEAKY': True, 'NORM': 'BATCH'}, 'TRAIN': {'LR': 0.01, 'MOMENTUM': 0.9, 'DECAY': 0.0001, 'BURN_IN': 5, 'BURN': 0, 'RAMPUP': 100, 'EPOCHS': 100, 'BATCHSIZE': 1, 'SEED': 42, 'RATIO': 0.1, 'LOSS_TYPE': 1, 'SAMPLE_NUM': 400, 'BUFFER_SIZE': 1, 'CPS_RATIO': 0.1, 'CON_RATIO': 0.1}, 'TEST': {'BATCHSIZE': 4}}
Task la prepared. Num labeled subjects: 8; Num unlabeled subjects: 72; Num validation subjects: 20
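Here I set ratio to 0.1, but my data folder is split into four subfolders (training and validation, images and labels), and I did not put the replacement code from the semi-supervised section of your README into train.py, so in theory training should be fully supervised. However, the output still shows Num unlabeled subjects: 72. In fully supervised training, how do you ensure that labeled images outside the ratio do not take part in training?

I look forward to your reply when you have time. Best wishes for your work and life.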

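LA dataset link cannot be opened

Hello, thank you for your work and for sharing it. The LA dataset link you provided cannot be opened. Could you send me a copy by email or via a cloud drive? Thank you very much!
My email address is [email protected]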

Test model and make predictions

Hello hsiangyuzhao! I saw that Figure 2 in your paper uses blue lines to denote the predictions. How can I get those outlines after training my own model? In other words, I want to test the trained model I have and make some predictions. Is that what is marked in red in train_visualization?
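
One common way to obtain such outlines, independent of this repository, is to take the predicted binary mask for a slice and keep only the foreground pixels that touch the background. Those boundary pixels can then be drawn over the image in any colour. The sketch below does this for a 2D mask stored as nested lists. The function name and the 4-neighbourhood choice are assumptions for illustration, not the authors' visualization code.

```python
def mask_boundary(mask):
    """Return (row, col) of foreground pixels with a background 4-neighbour.

    Foreground pixels on the image border also count as boundary pixels.
    """
    rows, cols = len(mask), len(mask[0])
    boundary = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue  # background pixels are never part of the outline
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    boundary.append((r, c))
                    break
    return boundary


# Toy prediction: a 3x3 foreground square inside a 5x5 slice.
pred = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_boundary(pred))
```

For the toy mask, every foreground pixel except the centre (2, 2) is reported, which is the one-pixel outline of the square.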

Code

Hi, thanks for your excellent work. When will you release the code?

Dataset preprocessing

How was the dataset processed?

About the data

Thank you for your If you encounter issues downloading the data, you may find the same data at : https://academictorrents.com/details/80ecfefcabede760cdbdf63e38986501f7becd49
Please note that the orientation of the data downloaded from this link is not correct, please correct them manually.
What exactly does "correct them manually" mean here?

CUDA out of memory

Hello, thank you very much for sharing. When I try to run train.py, the error always occurs no matter how many GPUs I use. Strangely, different GPUs require different amounts of memory. The best setup I have used is 4 NVIDIA A100s. Training runs fine, but when the evaluation loop starts on the fifth iteration, the error always arises. Do you know how to fix it?

single card for training

Hello, how do I train on a single GPU, and what is the command?
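
Judging from the commands quoted in other issues on this page, training is launched with torchrun, with CUDA_VISIBLE_DEVICES selecting the GPUs and --nproc_per_node matching their count. Below is a small sketch that assembles the command for any number of GPUs. The helper name is hypothetical, and the flags are copied from those quoted commands rather than checked against train.py.

```python
import shlex


def build_train_command(gpu_ids, task="la", exp_name="running"):
    """Return (env, argv) for a torchrun launch on the given GPU ids."""
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids)}
    argv = [
        "torchrun",
        f"--nproc_per_node={len(gpu_ids)}",  # one process per visible GPU
        "train.py",
        "--mixed",
        "--benchmark",
        "--task", task,
        "--exp_name", exp_name,
    ]
    return env, argv


env, argv = build_train_command([0])
print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(argv))
# CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 train.py --mixed --benchmark --task la --exp_name running
```

The printed line can be pasted into a shell. The quoted commands also pass --wandb (and, in one case, --entity), which can be appended to argv if Weights & Biases logging is wanted.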

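NaN loss during fully supervised training (ratio = 1)

Hello, I ran into NaN values during fully supervised training (ratio = 1), as shown in the figure below:

Which part of the code needs to be changed? My experimental settings are the same as in the paper.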

About real semi-supervised scene

Hi, thank you for the README update and congratulations on the acceptance!

What is the difference between the real semi-supervised scene and changing the label ratio?
To make sure: when the label ratio is changed, the model doesn't use the labels to compute the segmentation loss, so isn't this the same as the real semi-supervised scene?

A follow-up question:
Will there be a change (drop) in accuracy when switching to the real semi-supervised scene, compared to the results reported in the paper?

Many Thanks.

Code result

Author, thank you for your work. When I reproduced the results with your code, with the number of negative samples set to N=100, there was still a small gap between my results and those in your paper.
LA dataset:
bg_dice: 0.9896 ± 0.0049; la_dice: 0.8712 ± 0.0577; bg_hd95: 2.2933 ± 1.3194; la_hd95: 15.1779 ± 16.0119; bg_asd: 0.4185 ± 0.2263; la_asd: 3.6897 ± 3.3981;
Pancreas dataset:
bg_dice: 0.996 ± 0.0015; pancreas_dice: 0.7719 ± 0.0702; bg_hd95: 1.2585 ± 0.4075; pancreas_hd95: 12.1445 ± 10.8312; bg_asd: 0.2605 ± 0.1133; pancreas_asd: 2.8957 ± 1.3297;
Here is my training command:
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 train.py --mixed --benchmark --task pancreas --exp_name pancreas --wandb
paper result:

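About training on my own dataset

Hello, first of all, thank you very much for your work! If I want to try training on my own dataset, which files do I need to change? I noticed that you do not yet provide support for custom datasets, but I would really like to run some experiments with your model to see the segmentation results.
Thanks :)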
