
dvlab-research / context-aware-consistency

155 stars, 19 forks, 1.05 MB

Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CVPR 2021)

License: MIT License

Python 99.90% Shell 0.10%
cvpr2021 semantic-segmentation semi-supervised-learning semi-supervised-segmentation

context-aware-consistency's People

Contributors

x-lai

context-aware-consistency's Issues

CPU RAM explodes after ~60 iterations.

Hello, I was training your model using:

python3 train.py --config configs/voc_cac_deeplabv3+_resnet50_1over8_datalist0.json

The training starts, but within about 60 iterations of the first epoch the CPU RAM fills up and the system crashes.

GPU: P100 16GB
CPU RAM: 25 GB
Batch size: 2
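
Not a confirmed diagnosis, but steady host-RAM growth during training often traces back to the DataLoader workers or to logged tensors that still hold the computation graph. A minimal monitoring sketch to see whether resident memory grows per iteration (psutil is an extra dependency, not part of this repo; the loop is indicative only):

    import os
    import psutil  # extra dependency, assumed installed

    def rss_mb():
        """Resident set size of this process, in MB."""
        return psutil.Process(os.getpid()).memory_info().rss / 2**20

    # Inside the training loop (sketch):
    # for batch_idx, batch in enumerate(dataloader):
    #     if batch_idx % 10 == 0:
    #         print(f"iter {batch_idx}: host RSS {rss_mb():.0f} MB")

If RSS climbs roughly linearly, common first things to try are lowering num_workers in the config and making sure logged values are detached (e.g. loss.item() rather than keeping the loss tensor).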

RuntimeError during validation

Thanks for your work, but I hit an error when trying to test the code on VOC:

Checkpoint <E:\Context-Aware-Consistency-master\pretrained\voc_1over8_datalist0_deeplabv3+_resnet101.pth> (epoch 63) was loaded

EVALUATION

0%| | 0/724 [00:17<?, ?it/s]
Traceback (most recent call last):
File "D:\ProgramData\Anaconda3\envs\cv\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "D:\ProgramData\Anaconda3\envs\cv\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\Users\Administrator\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
cli.main()
File "c:\Users\Administrator\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
run()
File "c:\Users\Administrator\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 267, in run_file
runpy.run_path(options.target, run_name=compat.force_str("__main__"))
File "D:\ProgramData\Anaconda3\envs\cv\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "D:\ProgramData\Anaconda3\envs\cv\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "D:\ProgramData\Anaconda3\envs\cv\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "e:\Context-Aware-Consistency-master\train.py", line 128, in <module>
main(config['n_gpu'], config['n_gpu'], config, args.resume, args.test)
File "e:\Context-Aware-Consistency-master\train.py", line 99, in main
trainer.train()
File "e:\Context-Aware-Consistency-master\base\base_trainer.py", line 105, in train
results = self._valid_epoch(0)
File "e:\Context-Aware-Consistency-master\trainer.py", line 145, in _valid_epoch
for batch_idx, (data, target) in enumerate(tbar):
File "D:\ProgramData\Anaconda3\envs\cv\lib\site-packages\tqdm\std.py", line 1185, in __iter__
for obj in iterable:
File "D:\ProgramData\Anaconda3\envs\cv\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
data = self._next_data()
File "D:\ProgramData\Anaconda3\envs\cv\lib\site-packages\torch\utils\data\dataloader.py", line 475, in _next_data
data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
File "D:\ProgramData\Anaconda3\envs\cv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
return self.collate_fn(data)
File "D:\ProgramData\Anaconda3\envs\cv\lib\site-packages\torch\utils\data\_utils\collate.py", line 83, in default_collate
return [default_collate(samples) for samples in transposed]
File "D:\ProgramData\Anaconda3\envs\cv\lib\site-packages\torch\utils\data\_utils\collate.py", line 83, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "D:\ProgramData\Anaconda3\envs\cv\lib\site-packages\torch\utils\data\_utils\collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 375, 500] at entry 0 and [3, 396, 500] at entry 1
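
For context, this RuntimeError comes from default_collate trying to stack validation images of different sizes into one batch. One workaround (an assumption, not the authors' fix) is to validate with batch_size 1, or to pad each batch to a common size with a custom collate_fn; a minimal sketch:

    import torch
    import torch.nn.functional as F

    def pad_collate(batch):
        """Hypothetical collate_fn: zero-pad images (and pad labels with the
        ignore index 255) to the largest H/W so torch.stack succeeds."""
        images, targets = zip(*batch)
        h = max(img.shape[-2] for img in images)
        w = max(img.shape[-1] for img in images)
        images = [F.pad(img, (0, w - img.shape[-1], 0, h - img.shape[-2])) for img in images]
        targets = [F.pad(t, (0, w - t.shape[-1], 0, h - t.shape[-2]), value=255) for t in targets]
        return torch.stack(images, 0), torch.stack(targets, 0)

    # val_loader = DataLoader(val_set, batch_size=4, collate_fn=pad_collate)

Note that a real fix would also need to mask the padded pixels out of the evaluation metric.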

Why are the iters fixed for the semi experiments (e.g., 1157)?

Hi,
as in the title: why is the number of iterations fixed?
If it is fixed, and the iterations needed to cover one pass over the data are fewer than the fixed number, it may not be accurate to say the whole training process runs for 80 epochs.

I also want to ask how much time it takes to train the models (80 epochs, fixed iters, VOC/Cityscapes datasets).

Thanks!!

Question about the table

Hi,

Thanks for your work!

I'm wondering whether your results in this table are based on DeepLabv3+ or not.
[table screenshot]

Cheers,

My reproduced results are slightly lower

Hello, I used the config file you provide to reproduce results on the Pascal VOC dataset, but I got slightly lower results across multiple dataset split settings. My reproduced results are as follows.
[screenshot of reproduced results, 2021-12-17 10:03]
The config file used in my experiment is as follows.

{
    "name": "CAC",
    "experim_name": "cac_datalist0_1of8_3",
    "dataset": "voc",
    "data_dir": ###,
    "datalist": 3,
    "n_gpu": 4,
    "n_labeled_examples": 10582,
    "diff_lrs": true,
    "ramp_up": 0.1,
    "unsupervised_w": 30,
    "ignore_index": 255,
    "lr_scheduler": "Poly",
    "use_weak_lables":false,
    "weakly_loss_w": 0.4,
    "pretrained": true,
    "random_seed": 42,

    "model":{
        "supervised": false,
        "semi": true,
        "supervised_w": 1,

        "sup_loss": "CE",

        "layers": 101,
        "downsample": true,
        "proj_final_dim": 128,
        "out_dim": 256,
        "backbone": "deeplab_v3+",
        "pos_thresh_value": 0.75,
        "weight_unsup": 0.1,
        "epoch_start_unsup": 5,
        "selected_num": 3200,
        "temp": 0.1,
        "step_save": 2,
        "stride": 8
    },


    "optimizer": {
        "type": "SGD",
        "args":{
            "lr": 0.01,
            "weight_decay": 1e-4,
            "momentum": 0.9
        }
    },

    "train_supervised": {
        "batch_size": 8,
        "crop_size": 320,
        "shuffle": true,
        "base_size": 400,
        "scale": true,
        "augment": true,
        "flip": true,
        "rotate": false,
        "blur": false,
        "split": "train_supervised",
        "num_workers": 8
    },

    "train_unsupervised": {
        "batch_size": 8,
        "crop_size": 320,
        "shuffle": true,
        "base_size": 400,
        "scale": true,
        "augment": true,
        "flip": true,
        "rotate": false,
        "blur": false,
        "split": "train_unsupervised",
        "num_workers": 8,
        "iou_bound": [0.1, 1.0],
        "stride": 8
    },

    "val_loader": {
        "batch_size": 4,
        "val": true,
        "split": "val",
        "shuffle": false,
        "num_workers": 4
    },

    "trainer": {
        "epochs": 80,
        "save_dir": "saved/",
        "save_period": 1,
  
        "monitor": "max Mean_IoU",
        "early_stop": 100,
        
        "tensorboardX": true,
        "log_dir": "saved/",
        "log_per_iter": 20,

        "val": true,
        "val_per_epochs": 1
    }
}

Could you give me some advice about how to correctly reproduce your results?
Thanks a lot.

How is the overlapping region selected?

Hello, how do you guarantee that the randomly selected overlapping region in the paper covers the region where the target lies (the pseudo-label values)? If the selected overlapping region does not contain the target, is the contrastive loss still meaningful?

How to train in fully supervised mode?

Thank you for your nice work!
I was wondering about:

  1. the setting for training the model with all 10,582 labeled samples;
  2. when employing "Full labeled data", the differences between the "SupOnly" and "CAC" settings (see the sketch after this list).
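
For what it's worth, a minimal sketch of how the fully supervised (SupOnly) setting might look in the config, assuming the same schema as the file posted in the reproduction issue above; the exact flags for "Full labeled data" are for the authors to confirm:

    "n_labeled_examples": 10582,
    ...
    "model": {
        "supervised": true,
        "semi": false,
        ...
    }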

Question about parameter "selected_num"

I get a ValueError when I try to reproduce your paper's results on the Pascal VOC dataset. Concretely, it raises "ValueError: Cannot take a larger sample than population when 'replace=False'" when using the default selected_num setting of 6400. The error occurs at the following line.

selected_idx1 = np.random.choice(range(b*h*w), selected_num, replace=False)
I don't know how to solve this bug. Hoping for your solution.
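
For what it's worth, the error means selected_num exceeds the b*h*w population being sampled, which shrinks with the batch size and the feature-map resolution. A minimal guard sketch (an assumption, not the authors' fix; the example values are hypothetical):

    import numpy as np

    # Hypothetical guard before the sampling call in models/model.py:
    # never request more samples than the population.
    b, h, w = 2, 40, 40      # example sizes; in the repo these come from the feature map
    selected_num = 6400      # the default that triggers the ValueError
    population = b * h * w   # 3200 here, smaller than selected_num
    selected_num = min(selected_num, population)
    selected_idx1 = np.random.choice(population, selected_num, replace=False)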

Training problems

[error screenshot]
This error is reported at about 20% of the first epoch of training. Can you tell me the reason and how to solve it? My configuration is two 2080 Ti GPUs and 64 GB of RAM.

Bug in the learning rate scheduler?

[learning rate curve screenshot]

Could it be that the learning rate actually used is not 'poly'?

def get_lr(self):
    T = self.last_epoch * self.iters_per_epoch + self.cur_iter
    factor = pow((1 - 1.0 * T / self.N), 0.9)
    if self.warmup_iters > 0 and T < self.warmup_iters:
        factor = 1.0 * T / self.warmup_iters
    self.cur_iter %= self.iters_per_epoch
    self.cur_iter += 1
    assert factor >= 0, 'error in lr_scheduler'
    return [base_lr * factor for base_lr in self.base_lrs]

However, the lr is updated as follows:
self.lr_scheduler.step(epoch=epoch-1)

Is that a bug or a specific design?
https://github.com/pytorch/pytorch/blob/41054f2ab5bb39d28a3eb8497f1a65b42385a996/torch/optim/lr_scheduler.py#L155
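
To make the discussion concrete, here is a minimal standalone sketch of the poly factor computed by get_lr above (my reading of the snippet, not an authoritative restatement of the training loop):

    # Poly decay factor as a function of global step T out of N total steps.
    def poly_factor(T, N, power=0.9):
        return (1.0 - T / N) ** power

    N = 80 * 1157  # epochs * fixed iters, per the config discussed on this page
    for T in (0, N // 4, N // 2, 3 * N // 4):
        print(T, round(poly_factor(T, N), 4))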

Question about training

I try to train the model with only one GPU, but the process ends with the following error:

Traceback (most recent call last):
File "train.py", line 127, in
mp.spawn(main, nprocs=config['n_gpu'], args=(config['n_gpu'], config, args.resume, args.test))
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/media/ders/sundingpeng/paper_code/Context-Aware-Consistency-master-2/train.py", line 98, in main
trainer.train()
File "/media/ders/sundingpeng/paper_code/Context-Aware-Consistency-master-2/base/base_trainer.py", line 115, in train
results = self._valid_epoch(epoch)
File "/media/ders/sundingpeng/paper_code/Context-Aware-Consistency-master-2/trainer.py", line 145, in _valid_epoch
for batch_idx, (data, target) in enumerate(tbar):
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/tqdm/std.py", line 1178, in iter
for obj in iterable:
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in next
data = self._next_data()
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 83, in default_collate
return [default_collate(samples) for samples in transposed]
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 83, in
return [default_collate(samples) for samples in transposed]
File "/home/ders/anaconda3/envs/sdp/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)

RuntimeError: stack expects each tensor to be equal size, but got [3, 366, 500] at entry 0 and [3, 335, 500] at entry 1

Questions about _train_epoch in trainer.py

if self.mode == 'supervised':
    # dataloader = iter(self.supervised_loader)
    # tbar = tqdm(range(len(self.supervised_loader)), ncols=135)
    dataloader = iter(cycle(self.supervised_loader))
    tbar = tqdm(range(self.iter_per_epoch), ncols=135)
else:
    dataloader = iter(zip(cycle(self.supervised_loader), cycle(self.unsupervised_loader)))
    tbar = tqdm(range(self.iter_per_epoch), ncols=135)

The commented-out part is your original code.
In the semi-supervised method, 'cycle' is used to extend the number of iterations over the labeled images, so the number of iterations in the fully supervised setting is clearly much smaller. I think this comparison may be unfair. What is your opinion, or do you have a modification plan? Looking forward to your answer, thanks!
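
For readers unfamiliar with the pattern, cycle() simply repeats a loader indefinitely, so the zip over loaders of unequal length never stops early; the epoch is bounded by the tqdm range instead. A minimal self-contained illustration (not the repo's code):

    from itertools import cycle

    sup = [1, 2, 3]          # stands in for the shorter labeled loader
    unsup = list(range(9))   # stands in for the larger unlabeled loader

    # zip(cycle(sup), cycle(unsup)) yields pairs forever; the tqdm range()
    # in the snippet above is what bounds the epoch at iter_per_epoch steps.
    it = iter(zip(cycle(sup), cycle(unsup)))
    pairs = [next(it) for _ in range(9)]
    print(pairs)  # the 3 labeled items repeat 3 times across one pass over unsup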

Understanding the UnSupervised Dataloader

Hey @X-Lai ,

Thank you for sharing your work!

I was able to set up the repository and run the experiments following the steps provided. However, I am finding it difficult to understand some parts of the code related to the unsupervised data loader. Please find my queries below:

  1. Why have we chosen 320x320 for the VOC dataset and 720x720 for Cityscapes? I just wanted to understand the rationale behind it.
  2. In code, in file dataloaders/voc.py:
        overlap1_ul = [max(0, y2-y1), max(0, x2-x1)]
        overlap1_br = [min(self.crop_size, self.crop_size+y2-y1, h//self.stride * self.stride), min(self.crop_size, self.crop_size+x2-x1, w//self.stride * self.stride)]
        overlap2_ul = [max(0, y1-y2), max(0, x1-x2)]
        overlap2_br = [min(self.crop_size, self.crop_size+y1-y2, h//self.stride * self.stride), min(self.crop_size, self.crop_size+x1-x2, w//self.stride * self.stride)]

I am not quite able to understand the utility of self.stride: why is it necessary, and what exactly do overlap1_ul and overlap1_br represent? (My own reading is sketched below.)

Regards
Nitin
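
Not the authors' explanation, but my reading of the snippet: (x1, y1) and (x2, y2) are the upper-left corners of the two random crops, overlap1_ul/overlap1_br give the shared rectangle in crop 1's local coordinates (overlap2_* the same rectangle in crop 2's frame), and flooring to h//stride*stride keeps the rectangle aligned with the stride-8 feature map. A small numeric sketch under those assumptions:

    # Hypothetical numbers: crop_size=320, stride=8, two crops of one image.
    crop_size, stride = 320, 8
    y1, x1 = 40, 60     # upper-left of crop 1
    y2, x2 = 120, 20    # upper-left of crop 2
    h = w = crop_size   # size of the cropped region fed to the network

    overlap1_ul = [max(0, y2 - y1), max(0, x2 - x1)]                           # [80, 0]
    overlap1_br = [min(crop_size, crop_size + y2 - y1, h // stride * stride),
                   min(crop_size, crop_size + x2 - x1, w // stride * stride)]  # [320, 280]
    overlap2_ul = [max(0, y1 - y2), max(0, x1 - x2)]                           # [0, 40]
    overlap2_br = [min(crop_size, crop_size + y1 - y2, h // stride * stride),
                   min(crop_size, crop_size + x1 - x2, w // stride * stride)]  # [240, 320]

    # Both rectangles have the same shape (240 x 280 here): the shared
    # context region seen from each crop's own coordinate frame.
    print(overlap1_ul, overlap1_br, overlap2_ul, overlap2_br)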

About using weak labels

Hi, thank you for your great work!

Could you please give me some advice on using weak labels? Is the classification model first pre-trained on ImageNet and then fine-tuned on specific weakly annotated datasets like Cityscapes?

The problem of image shape

when use "python3 train.py --config configs/voc_cac_deeplabv3+_resnet50_1over8_datalist0.json" with two GPUS to run , i find that one GPUS imput image size is (1,3,335,500) while another is (1,3,366,500) . in this case ,i can run to end.
but ,when i run with only one GPU,the problem of diffrent size is occr.
how is go ???

The CAM visualization

Dear Author,

Thanks so much for your excellent work and the code. I'm just getting into the semi-supervised segmentation field, and I'm really interested in the Grad-CAM visualization shown on the first page of your paper.

Could you give me some advice on how to visualize it? Any advice or pseudo-code would be really helpful!

Thanks again!
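
While waiting for the authors: Grad-CAM itself is standard, so here is a minimal generic sketch (the layer choice and the way the target score is formed are assumptions, not the paper's recipe; register_full_backward_hook needs PyTorch >= 1.8):

    import torch
    import torch.nn.functional as F

    def grad_cam(model, image, target_mask, conv_layer):
        """Generic Grad-CAM sketch: weight the chosen conv layer's activations
        by the spatially averaged gradients of a target score."""
        acts, grads = [], []
        h1 = conv_layer.register_forward_hook(lambda m, i, o: acts.append(o))
        h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
        # target_mask picks which logits form the score, e.g. one class's map
        score = (model(image) * target_mask).sum()
        model.zero_grad()
        score.backward()
        h1.remove(); h2.remove()
        weights = grads[0].mean(dim=(2, 3), keepdim=True)  # GAP over space
        cam = F.relu((weights * acts[0]).sum(dim=1))       # weighted channel sum
        return cam / (cam.max() + 1e-8)                    # normalize to [0, 1]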

Baselines compared ...

Hi, thanks for your nice work!

You have listed results of other methods (such as CCT); did you re-implement those methods under your settings?

As we know, the dataset splits differ across semi-supervised papers.

Thanks.

Directional Contrastive Loss calculation

Hi X-Lai,

Thank you very much for sharing your well-organized code; I'm trying to use your Directional Contrastive Loss in my work, and I have a question about its calculation.
Equation 1 in your conference paper has two terms in the denominator, while the calculation in your code is as follows:
logits1 = torch.exp(pos1 - neg_max1).squeeze(-1) / (logits1_down + eps) # in model.py, line 231
Should it be 'torch.exp(pos1 - neg_max1).squeeze(-1) / (torch.exp(pos1 - neg_max1).squeeze(-1) + logits1_down + eps)'? Or is 'torch.exp(pos1 - neg_max1).squeeze(-1)' already included in logits1_down?

P.S.: Is 'neg_max1' there for normalization?

I'm looking forward to your reply!
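
While waiting for the authors: my understanding (an assumption about their code, not a confirmed reading) is that subtracting neg_max1 is the usual max-subtraction trick for numerical stability, and whether the positive term belongs in the denominator depends on how logits1_down is accumulated. A minimal self-contained sketch of the stabilized InfoNCE-style ratio with the positive term included:

    import torch

    # Hypothetical tensors: pos is the similarity to the positive sample,
    # negs are similarities to a set of negatives.
    pos = torch.randn(())
    negs = torch.randn(64)

    neg_max = negs.max()                  # subtract the max so exp() cannot overflow
    numerator = torch.exp(pos - neg_max)
    denominator = numerator + torch.exp(negs - neg_max).sum()  # positive term included
    loss = -torch.log(numerator / denominator)
    print(loss)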

About training the model.

I tried to train the model following your guide (python3 train.py --config configs/voc_cac_deeplabv3+_resnet50_1over8_datalist0.json), but the program ends with an error. Can you give me some suggestions?

The error :
Traceback (most recent call last):
File "/home/anaconda3/envs/sdp36/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/sundingpeng/paper_code/Context-Aware-Consistency-master/train.py", line 98, in main
trainer.train()
File "/home/sundingpeng/paper_code/Context-Aware-Consistency-master/base/base_trainer.py", line 113, in train
self._train_epoch(epoch)
File "/home/sundingpeng/paper_code/Context-Aware-Consistency-master/trainer.py", line 81, in _train_epoch
curr_iter=batch_idx, target_ul=target_ul, epoch=epoch-1, **kargs)
File "/home/anaconda3/envs/sdp36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/anaconda3/envs/sdp36/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
output = self.module(inputs[0], **kwargs[0])
File "/home/anaconda3/envs/sdp36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sundingpeng/paper_code/Context-Aware-Consistency-master/models/model.py", line 174, in forward
selected_idx1 = np.random.choice(range(b*h*w), selected_num, replace=False)
File "mtrand.pyx", line 954, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'

About the ablation study

Thank you for your amazing work.

I have a question about your ablation results. As shown in Table 3,
[ablation table screenshot]
you applied Proj+Context+L2 regularization in Exp. 1, yet the performance is lower than SupOnly.

Could you please give me some more explanation about this phenomenon?
