fcdl94 / mib
Official code for Modeling the Background for Incremental Learning in Semantic Segmentation https://arxiv.org/abs/2002.00718
License: MIT License
What is run_inc.py in the command line? It is not in the folder.
Which pretrained model do you use from https://github.com/mapillary/inplace_abn#training-on-imagenet-1k?
I tried several but I got errors:
model = make_model(opts, classes=tasks.get_per_task_classes(opts.dataset, opts.task, opts.step))
File "/home/xialei/MiB/segmentation_module.py", line 29, in make_model
del pre_dict['state_dict']['classifier.fc.weight']
KeyError: 'classifier.fc.weight'
Thanks!
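The traceback shows `del` failing because the loaded checkpoint has no `classifier.fc.weight` entry. A minimal sketch of a more tolerant key removal, assuming the checkpoint is a dict holding a `state_dict` mapping as in the traceback. The helper name `drop_classifier_keys` and the `classifier.fc.bias` key are assumptions for illustration, not code from the repository:

```python
def drop_classifier_keys(pre_dict,
                         keys=("classifier.fc.weight", "classifier.fc.bias")):
    """Delete classifier entries when present; return the keys that were absent.

    `pre_dict` is assumed to look like {"state_dict": {...}}, as in the traceback.
    """
    state = pre_dict["state_dict"]
    missing = []
    for key in keys:
        if key in state:
            del state[key]
        else:
            # Record the miss instead of raising KeyError.
            missing.append(key)
    return missing


# Toy checkpoint whose keys carry a "module." prefix, so nothing matches.
ckpt = {"state_dict": {"module.classifier.fc.weight": 1, "body.conv1.weight": 2}}
print(drop_classifier_keys(ckpt))
print(sorted(ckpt["state_dict"]))
```

Printing the checkpoint's key names, as in the last line, may help tell whether the downloaded file simply uses different names (for example a `module.` prefix) than the code expects.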
Hi @fcdl94, sorry for bothering you again. Since these questions are related to VOC, I am opening a new issue. I have re-run the disjoint and overlapped experiments for all the methods included in the paper's Table 1. The ranking of the results is the same, which means the conclusions and claims of the paper hold. However, when I list the MiB results in a sheet, something odd shows up.
First, here are my command lines for replication. Both settings are trained from scratch, which means they should be using different pretrained models for steps 1-5:
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 0 --lr 0.01 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 2 --lr 0.001 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 3 --lr 0.001 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 4 --lr 0.001 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 5 --lr 0.001 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 0 --lr 0.01 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 1 --lr 0.001 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 2 --lr 0.001 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 3 --lr 0.001 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 4 --lr 0.001 --epochs 30 --method MiB
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 5 --lr 0.001 --epochs 30 --method MiB
I get the results after all the steps are completed, as shown below:
Method (setting) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | all except 0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MiB (Disjoint) | 84.61 | 16.49 | 31.68 | 58.00 | 27.89 | 43.96 | 1.02 | 15.31 | 73.63 | 1.05 | 35.17 | 22.15 | 65.75 | 54.63 | 28.12 | 80.94 | 0.09 | 24.38 | 13.66 | 15.79 | 17.38 | 31.35 |
MiB (Overlapped) | 84.73 | 15.17 | 27.21 | 45.64 | 21.93 | 42.31 | 3.06 | 46.96 | 77.94 | 4.06 | 36.41 | 37.46 | 64.53 | 43.26 | 28.14 | 80.49 | 0.02 | 25.04 | 17.06 | 14.06 | 15.09 | 32.29 |
Results Summary:
For the Disjoint setting, the 1-15 class mIoU is 37.05 and the 16-20 class mIoU is 14.26. The mIoU for all should be 31.35.
For the Overlapped setting, the 1-15 class mIoU is 38.30 and the 16-20 class mIoU is 14.25. The mIoU for all should be 32.29.
Compared with the Table 2 results: for the Disjoint setting, you get 46.2 mIoU for 1-15 and 12.9 mIoU for 16-20, then 37.9 mIoU for all; for the Overlapped setting, you get 35.1 mIoU for 1-15 and 13.5 for 16-20, then 29.7 mIoU for all.
Therefore, is there anything I am missing to reproduce the VOC 15-1 results with MiB?
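The group averages quoted in the summary can be recomputed from the per-class row of the results table. A minimal sketch, assuming the row lists classes 0 to 20 in order with class 0 as background; `group_miou` is a hypothetical helper name:

```python
def group_miou(per_class_iou, first, last):
    """Mean IoU over classes first..last (inclusive), in the unit of the input."""
    selected = per_class_iou[first:last + 1]
    return sum(selected) / len(selected)


# Per-class IoU (%) for classes 0..20, copied from the MiB (Disjoint) table row.
disjoint = [84.61, 16.49, 31.68, 58.00, 27.89, 43.96, 1.02, 15.31, 73.63, 1.05,
            35.17, 22.15, 65.75, 54.63, 28.12, 80.94, 0.09, 24.38, 13.66, 15.79,
            17.38]

old_classes = group_miou(disjoint, 1, 15)    # classes learned in step 0
new_classes = group_miou(disjoint, 16, 20)   # classes added in steps 1-5
all_but_bg = group_miou(disjoint, 1, 20)     # "all except 0"

print(f"1-15: {old_classes:.2f}  16-20: {new_classes:.2f}  all: {all_but_bg:.2f}")
```

Averaging this way excludes the background class, which matches the "all except 0" column of the table.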
FYI, I attach the raw data of each step for each setting.
Disjoint:
Step 0:
T-IoU | 0.9428957387543162 | 0.9093635384910007 | 0.407387220528084 | 0.9114261479178447 | 0.6930536991709606 | 0.8098990488060078 | 0.951383562635301 | 0.9017522187102619 | 0.9226235898934364 | 0.460997208347546 | 0.8647273405405083 | 0.5378474228922152 | 0.8973456596703172 | 0.867327442844769 | 0.8703716247790984 | 0.8676089249667962 |
---|
Step 1:
T-IoU | 0.9167589144636081 | 0.8873648605178784 | 0.4092252459852222 | 0.869479481025851 | 0.6806551633049537 | 0.7638765341517527 | 0.9205706473134169 | 0.8486735934421107 | 0.915373926952166 | 0.43939618475723113 | 0.8347598489209404 | 0.503665661560724 | 0.8803754274877195 | 0.8143450733616353 | 0.8425866785715388 | 0.8478188317980148 | 0.17931164855409873 |
---|
Step 2:
T-IoU | 0.9124010280617076 | 0.7157312195506308 | 0.3758244658997688 | 0.8048439599053759 | 0.5128759972974389 | 0.7772130550282125 | 0.8140185464753737 | 0.7540073558194968 | 0.8067052172080738 | 0.22098235839790872 | 0.24187419224161838 | 0.41607016629238047 | 0.7928031218145428 | 0.7074281456336383 | 0.8009249524418639 | 0.8291462394258059 | 0.08864073567935006 | 0.234592279843589 |
---|
Step 3:
T-IoU | 0.8605716951061348 | 0.7427626686495384 | 0.3754501232161909 | 0.809016676048648 | 0.5147889505056727 | 0.6270363019643613 | 0.7184538662827296 | 0.7334033091838555 | 0.8304927366076473 | 0.1250355660764828 | 0.41305395553844215 | 0.44932921473209253 | 0.787892298275096 | 0.7264193954508246 | 0.7572718800408372 | 0.8392311512765394 | 0.04363873022939195 | 0.15701272132864186 | 0.15189688631990797 |
---|
Step 4:
T-IoU | 0.8617972763749628 | 0.20872328663156442 | 0.3631064199006128 | 0.5886765010309403 | 0.34041861509825144 | 0.5474293754913939 | 0.13570653927230805 | 0.5574966269754023 | 0.64431626024196 | 0.031194197377586146 | 0.4463714388961355 | 0.24921212066929457 | 0.6278902980821934 | 0.5785498297789483 | 0.504660152102315 | 0.7900153928512044 | 0.011786844749089675 | 0.20088916659028505 | 0.12481700285889741 | 0.19506359566450185 |
---|
Step 5:
T-IoU | 0.8460856346827669 | 0.16490433429984022 | 0.31678143548785276 | 0.5799742617085539 | 0.2789076000769832 | 0.4395905332124107 | 0.01023589863212125 | 0.1530806668255949 | 0.7362783346863634 | 0.010541665752975061 | 0.35172514843737246 | 0.2214938353272161 | 0.6574922122682622 | 0.5462862022895337 | 0.28115039555445903 | 0.8093942994334622 | 0.0009336352682833694 | 0.24378756157086445 | 0.1365743221933673 | 0.15787678203324385 | 0.17377045020350113 |
---|
Overlapped:
Step 0:
T-IoU | 0.942351454969199 | 0.9088923380558995 | 0.41445877002346937 | 0.8943720292308379 | 0.7216543435352611 | 0.8245976268597867 | 0.9404349006553606 | 0.9077450313931694 | 0.9266660651840857 | 0.4700834557332964 | 0.857335799102383 | 0.5793174660444275 | 0.8987260795649801 | 0.8612262856604168 | 0.8698300613490206 | 0.8655979116146679 |
---|
Step 1:
T-IoU | 0.9182269199439186 | 0.8792879987006258 | 0.4017180006248381 | 0.8441055742619886 | 0.6994568677193088 | 0.7797706674159932 | 0.9149230512624791 | 0.8669237437597723 | 0.925295289580988 | 0.4621438312184923 | 0.8278570724749236 | 0.5046357276965907 | 0.8827558057428029 | 0.8319480561554519 | 0.8351552396152283 | 0.8531293756555369 | 0.20081410045276016 |
---|
Step 2:
T-IoU | 0.9102123086486242 | 0.6716115648361257 | 0.3323608949874176 | 0.7409822233592194 | 0.4959355955312635 | 0.7102449043004382 | 0.8138243088151255 | 0.7864668388252009 | 0.8396605082671436 | 0.19756995217299297 | 0.47104696504611043 | 0.3864640364884321 | 0.7830237173154856 | 0.6534901834317308 | 0.8130206998712594 | 0.8267346598625166 | 0.2135801990714857 | 0.2739649761053714 |
---|
Step 3:
T-IoU | 0.8632306048394159 | 0.705974539166515 | 0.36605442930053417 | 0.7172314140174203 | 0.47740362529902175 | 0.6360963911559759 | 0.7891308674752905 | 0.8167691950759172 | 0.8518501315621179 | 0.1859484531773666 | 0.5689216234786797 | 0.4579828893103606 | 0.7713658863930264 | 0.6804223882563137 | 0.7407504318523732 | 0.8474278492117775 | 0.14708674311949124 | 0.2741278598946114 | 0.18034994422937553 |
---|
Step 4:
T-IoU | 0.8647615150869634 | 0.12248332643577793 | 0.29494393825587 | 0.5665025942424852 | 0.25189086565151464 | 0.504964955569028 | 0.2607169020464405 | 0.6373994602886512 | 0.6749896193770941 | 0.038352339739995286 | 0.3740172433656488 | 0.33634831372874185 | 0.584688752624233 | 0.40671331551812523 | 0.32672648567942325 | 0.7970850159150706 | 0.009704602104938941 | 0.20900667311557677 | 0.14287384004333556 | 0.18019787539515425 |
---|
Step 5:
T-IoU | 0.847341876904206 | 0.1516912277406037 | 0.27214761938518534 | 0.4564398366880403 | 0.21925902115097656 | 0.4230704428405192 | 0.03062049153551857 | 0.4696080316553439 | 0.7793639447531278 | 0.04059292675086336 | 0.3641082518756912 | 0.3746037758239177 | 0.6453468608880033 | 0.43260491849114996 | 0.28142230199939966 | 0.8048656345565484 | 0.0001993869309182823 | 0.2504386187169609 | 0.17055907785579358 | 0.14061286493441055 | 0.1509222886875183 |
---|
When I train MiB from scratch on the VOC dataset, the loss stays NaN throughout training. Any idea about this issue?
python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5_lr_0.01_no_pretrained --task 15-5 --lr 0.01 --epochs 30 --method MiB --no_pretrained
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
INFO:rank1: Device: cuda:1
INFO:rank0: [!] starting logging at directory ./logs/15-5-voc/test_MIB_voc_15_5_lr_0.01_no_pretrained/
INFO:rank0: Device: cuda:0
INFO:rank0: Dataset: voc, Train set: 8437, Val set: 1240, Test set: 1240, n_classes 16
INFO:rank0: Total batch size is 24
INFO:rank0: Backbone: resnet101
INFO:rank0: [!] Model made without pre-trained
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
Selected optimization level O0: Pure FP32 training.
Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
INFO:rank0: [!] Train from scratch
INFO:rank1: tensor([[79]])
INFO:rank0: tensor([[79]])
INFO:rank0: Epoch 0, lr = 0.010000
INFO:rank0: Epoch 0, Batch 10/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 20/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 30/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 40/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 50/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 60/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 70/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 80/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 90/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 100/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 110/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 120/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 130/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 140/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 150/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 160/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 170/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 180/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 190/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 200/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 210/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 220/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 230/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 240/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 250/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 260/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 270/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 280/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 290/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 300/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 310/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 320/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 330/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 340/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 350/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Class Loss=nan, Reg Loss=0.0
INFO:rank0: End of Epoch 0/30, Average Loss=nan, Class Loss=nan, Reg Loss=0.0
Warning: NaN or Inf found in input tensor.
Warning: NaN or Inf found in input tensor.
INFO:rank0: validate on val set...
INFO:rank0: Validation, Class Loss=nan, Reg Loss=0.0 (without scaling)
INFO:rank1: Done validation
INFO:rank0: Done validation
INFO:rank0: End of Validation 0/30, Validation Loss=nan, Class Loss=nan, Reg Loss=0.0
INFO:rank0:
Total samples: 1240.000000
Overall Acc: 0.694367
Mean Acc: 0.062500
FreqW Acc: 0.482146
Mean IoU: 0.043398
Class IoU:
class 0: 0.6943672262601215
class 1: 0.0
class 2: 0.0
class 3: 0.0
class 4: 0.0
class 5: 0.0
class 6: 0.0
class 7: 0.0
class 8: 0.0
class 9: 0.0
class 10: 0.0
class 11: 0.0
class 12: 0.0
class 13: 0.0
class 14: 0.0
class 15: 0.0
Class Acc:
class 0: 0.9999999999999951
class 1: 0.0
class 2: 0.0
class 3: 0.0
class 4: 0.0
class 5: 0.0
class 6: 0.0
class 7: 0.0
class 8: 0.0
class 9: 0.0
class 10: 0.0
class 11: 0.0
class 12: 0.0
class 13: 0.0
class 14: 0.0
class 15: 0.0
INFO:rank0: [!] Checkpoint saved.
Warning: NaN or Inf found in input tensor.
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, lr = 0.009699
INFO:rank0: Epoch 1, Batch 10/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 20/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 30/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 40/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 50/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 60/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 70/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 80/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 90/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 100/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 110/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 120/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 130/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 140/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 150/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 160/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 170/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 180/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 190/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 200/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 210/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 220/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 230/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 240/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 250/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 260/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 270/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 280/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 290/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 300/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 310/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 320/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 330/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 340/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 350/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Class Loss=nan, Reg Loss=0.0
INFO:rank0: End of Epoch 1/30, Average Loss=nan, Class Loss=nan, Reg Loss=0.0
Warning: NaN or Inf found in input tensor.
Warning: NaN or Inf found in input tensor.
INFO:rank0: validate on val set...
INFO:rank1: Done validation
INFO:rank0: Validation, Class Loss=nan, Reg Loss=0.0 (without scaling)
INFO:rank0: Done validation
INFO:rank0: End of Validation 1/30, Validation Loss=nan, Class Loss=nan, Reg Loss=0.0
INFO:rank0:
Total samples: 1240.000000
Overall Acc: 0.694367
Mean Acc: 0.062500
FreqW Acc: 0.482146
Mean IoU: 0.043398
Class IoU:
class 0: 0.6943672262601215
class 1: 0.0
class 2: 0.0
class 3: 0.0
class 4: 0.0
class 5: 0.0
class 6: 0.0
class 7: 0.0
class 8: 0.0
class 9: 0.0
class 10: 0.0
class 11: 0.0
class 12: 0.0
class 13: 0.0
class 14: 0.0
class 15: 0.0
Class Acc:
class 0: 0.9999999999999951
class 1: 0.0
class 2: 0.0
class 3: 0.0
class 4: 0.0
class 5: 0.0
class 6: 0.0
class 7: 0.0
class 8: 0.0
class 9: 0.0
class 10: 0.0
class 11: 0.0
class 12: 0.0
class 13: 0.0
class 14: 0.0
class 15: 0.0
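The log shows the loss already NaN at the first reported batch, and training still runs on through the following epochs. A minimal sketch of a guard that stops at the first non-finite value, assuming the loss is available as a Python float (for a PyTorch tensor, `loss.item()` gives one). `check_finite` is a hypothetical helper, not part of run.py or train.py:

```python
import math


def check_finite(loss_value, epoch, batch):
    """Return the loss unchanged, or raise as soon as it is NaN or infinite."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(
            f"non-finite loss {loss_value} at epoch {epoch}, batch {batch}"
        )
    return loss_value


# Toy loss sequence: the third value reproduces the failure seen in the log.
for batch, value in enumerate([2.3, 1.9, float("nan")], start=1):
    try:
        check_finite(value, epoch=0, batch=batch)
    except FloatingPointError as err:
        print(err)
        break
```

Stopping early keeps the last finite state around for inspection, which may make it easier to tell whether the problem comes from the data, the learning rate, or the untrained backbone.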
I think the training results for classes 1-15 should be the same for single-step addition of five classes (15-5) and multi-step addition of five classes (15-1).
Shouldn't they differ only for the last five classes?
You referenced the following paper: Incremental Learning of Object Detectors without Catastrophic Forgetting.
I verified my idea in that article. Did I understand it wrong, or is it a mistake in the article?
Thanks for your reply, the idea of the article is great!
Hello,
Thanks for your interesting work. I am running your method on other datasets, and I ran into the problem of how to partition the data. Should all test data be used for each task of incremental learning? How should the test set be divided? Is it possible to split the test set using ade-split.ipynb?
Hi, I ran into a problem when reproducing the 15-5s task.
The last model (21 classes) gives the following performance:
Class IoU:
class 0: 0.8489214
class 1: 0.16607599
class 2: 0.287244
class 3: 0.33080766
class 4: 0.08697148
class 5: 0.37604603
class 6: 0.0072344807
class 7: 0.2163677
class 8: 0.6778649
class 9: 0.010483054
class 10: 0.28229013
class 11: 0.08285885
class 12: 0.54856247
class 13: 0.5060702
class 14: 0.26826012
class 15: 0.79161125
class 16: 0.00043867234
class 17: 0.22526477
class 18: 0.13574767
class 19: 0.14906816
class 20: 0.2198876
Class Acc:
class 0: 0.9055752
class 1: 0.1661803
class 2: 0.45422554
class 3: 0.33121628
class 4: 0.08705099
class 5: 0.38315275
class 6: 0.0072452705
class 7: 0.21683867
class 8: 0.6815347
class 9: 0.010607969
class 10: 0.301625
class 11: 0.083156176
class 12: 0.59717894
class 13: 0.5174191
class 14: 0.27290416
class 15: 0.86344343
class 16: 0.0004472701
class 17: 0.70130014
class 18: 0.41352695
class 19: 0.9158315
class 20: 0.64600176
I used the following commands:
CUDA_VISIBLE_DEVICES=1 python -m torch.distributed.launch --nproc_per_node=1 run.py --batch_size 10 --dataset voc --name MIB_lr0-01_for_step0_lr0-001_for_step1 --task 15-5s --step 0 --lr 0.01 --epochs 30 --method MiB --opt_level O1
CUDA_VISIBLE_DEVICES=1 python -m torch.distributed.launch --nproc_per_node=1 run.py --batch_size 10 --dataset voc --name MIB_lr0-01_for_step0_lr0-001_for_step1 --task 15-5s --step 1 --lr 0.001 --epochs 30 --method MiB --opt_level O1
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run.py --batch_size 10 --dataset voc --name MIB_lr0-01_for_step0_lr0-001_for_step1 --task 15-5s --step 2 --lr 0.001 --epochs 30 --method MiB --opt_level O1
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run.py --batch_size 10 --dataset voc --name MIB_lr0-01_for_step0_lr0-001_for_step1 --task 15-5s --step 3 --lr 0.001 --epochs 30 --method MiB --opt_level O1
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run.py --batch_size 10 --dataset voc --name MIB_lr0-01_for_step0_lr0-001_for_step1 --task 15-5s --step 4 --lr 0.001 --epochs 30 --method MiB --opt_level O1
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run.py --batch_size 10 --dataset voc --name MIB_lr0-01_for_step0_lr0-001_for_step1 --task 15-5s --step 5 --lr 0.001 --epochs 30 --method MiB --opt_level O1
Hi, I have run the code on the 50-50-50 setting on ADE20K.
I would like to visualize the predictions on the validation set. If you have the corresponding code, could you provide it?
Thank you for your work; I look forward to your reply.
To Reproduce
Dataset: ADE20K
Setting: 50-50-50
Hi, this is really interesting work, but when reading your manuscript I am confused about the two setups, overlapped and disjoint. Can you describe them in more detail?
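One reading of the paper's description, offered as an assumption rather than the authors' answer: in both setups an image is used at a step only if it contains at least one class learned at that step; the disjoint setup additionally requires that the image contains no class from a future step, while the overlapped setup allows such pixels (they would be treated as background). A minimal sketch of that selection rule, with a hypothetical function name and toy data:

```python
def select_images(image_labels, current, seen_so_far, overlap):
    """Pick the training images for one incremental step.

    image_labels: dict image_id -> set of class ids in the image (background excluded)
    current:      set of class ids learned at this step
    seen_so_far:  set of class ids from previous steps plus `current`
    overlap:      True for the overlapped setup, False for the disjoint one
    """
    chosen = []
    for image_id, labels in image_labels.items():
        if not (labels & current):
            continue  # no pixel of a class learned at this step
        if overlap or labels <= seen_so_far:
            chosen.append(image_id)  # disjoint: no future class may appear
    return chosen


# Toy example: step that adds class 16 after classes 1-15.
images = {"a": {1}, "b": {1, 16}, "c": {16}, "d": {16, 17}}
seen = set(range(1, 17))
print(select_images(images, {16}, seen, overlap=False))
print(select_images(images, {16}, seen, overlap=True))
```

In the toy data, image `d` contains class 17, which is learned later, so it is the only image on which the two setups disagree.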
Hello @fcdl94, I really liked your research work, and I was trying to reproduce the results.
I tried for 19-1, 15-5, and 15-5s split on PASCAL VOC. The command shown below is for the 19-1 split.
Could you also please confirm whether these default settings are enough to reproduce the result in Table 1: Mean IoU on the Pascal-VOC 2012 dataset for different incremental class learning scenarios of your research paper?
#disjoint
python -m torch.distributed.launch --nproc_per_node=4 run.py --data_root /ssd_scratch/cvit/dksingh/ --batch_size 6 --dataset voc --name MiB --task 19-1 --step 0 --lr 0.01 --epochs 30 --method MiB --logdir /ssd_scratch/cvit/dksingh/mib_logs/
python -m torch.distributed.launch --nproc_per_node=4 run.py --data_root /ssd_scratch/cvit/dksingh/ --batch_size 6 --dataset voc --name MiB --task 19-1 --step 1 --lr 0.001 --epochs 30 --method MiB --logdir /ssd_scratch/cvit/dksingh/mib_logs/
#overlap
python -m torch.distributed.launch --nproc_per_node=4 run.py --data_root /ssd_scratch/cvit/dksingh/ --batch_size 6 --dataset voc --name MiB --task 19-1 --step 0 --lr 0.01 --epochs 30 --method MiB --logdir /ssd_scratch/cvit/dksingh/mib_logs/ --overlap
python -m torch.distributed.launch --nproc_per_node=4 run.py --data_root /ssd_scratch/cvit/dksingh/ --batch_size 6 --dataset voc --name MiB --task 19-1 --step 1 --lr 0.001 --epochs 30 --method MiB --logdir /ssd_scratch/cvit/dksingh/mib_logs/ --overlap
I was wondering whether the results should be inferred from the TensorBoard logs or from the general log file.
First, I want to thank you for your work! When I try to reproduce the ADE20K results, I run out of memory on two RTX 2080 Ti GPUs. Could you provide the batch size and the number of GPUs used for ADE20K training?
I did not find the batch size and number of GPUs used for the experiments in the arXiv version of your paper, which I think are critical for reproducing the results and validating the paper's conclusions.
Questions Highlight:
Exp Command:
command: CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name test_MIB_ade_100_50_lr_0.01_no_pretrained --task 100-50 --lr 0.01 --epochs 30 --method MiB --no_pretrained
Error Log:
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name test_MIB_ade_100_50_lr_0.01_no_pretrained --task 100-50 --lr 0.01 --epochs 30 --method MiB --no_pretrained
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
INFO:rank1: Device: cuda:1
Filtering images...
INFO:rank0: [!] starting logging at directory ./logs/100-50-ade/test_MIB_ade_100_50_lr_0.01_no_pretrained/
INFO:rank0: Device: cuda:0
0/2000 ...
Filtering images...
0/2000 ...
1000/2000 ...
1000/2000 ...
Filtering images...
0/2000 ...
Filtering images...
0/2000 ...
1000/2000 ...
1000/2000 ...
INFO:rank0: Dataset: ade, Train set: 13452, Val set: 2000, Test set: 2000, n_classes 101
INFO:rank0: Total batch size is 24
INFO:rank0: Backbone: resnet101
INFO:rank0: [!] Model made without pre-trained
Selected optimization level O0: Pure FP32 training.
Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
INFO:rank0: [!] Train from scratch
INFO:rank0: tensor([[50]])
INFO:rank1: tensor([[50]])
INFO:rank0: Epoch 0, lr = 0.010000
Traceback (most recent call last):
File "run.py", line 390, in <module>
main(opts)
File "run.py", line 277, in main
train_loader=train_loader, scheduler=scheduler, logger=logger)
File "/home/jovyan/MiB/train.py", line 128, in train
scaled_loss.backward()
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.18 GiB (GPU 1; 10.73 GiB total capacity; 9.31 GiB already allocated; 371.56 MiB free; 243.46 MiB cached)
Traceback (most recent call last):
File "run.py", line 390, in <module>
main(opts)
File "run.py", line 277, in main
train_loader=train_loader, scheduler=scheduler, logger=logger)
File "/home/jovyan/MiB/train.py", line 128, in train
scaled_loss.backward()
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.18 GiB (GPU 0; 10.73 GiB total capacity; 9.31 GiB already allocated; 373.56 MiB free; 241.46 MiB cached)
Traceback (most recent call last):
File "/opt/conda/envs/MiB/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/envs/MiB/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/distributed/launch.py", line 246, in <module>
main()
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/distributed/launch.py", line 242, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/envs/MiB/bin/python', '-u', 'run.py', '--local_rank=1', '--data_root', 'data', '--batch_size', '12', '--dataset', 'ade', '--name', 'test_MIB_ade_100_50_lr_0.01_no_pretrained', '--task', '100-50', '--lr', '0.01', '--epochs', '30', '--method', 'MiB', '--no_pretrained']' returned non-zero exit status 1.
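The logs in this thread print `Total batch size is 24` for `--batch_size 12` with `--nproc_per_node=2`, and the VOC log reports 351 batches per epoch for a train set of 8437 images. Both numbers are consistent with the arithmetic below, assuming the per-process batch size is multiplied by the number of processes and the last incomplete batch is dropped. The function names are hypothetical:

```python
def total_batch_size(per_process_batch, num_processes):
    """Effective batch size when every process handles its own batch."""
    return per_process_batch * num_processes


def batches_per_epoch(num_train_images, total_batch, drop_last=True):
    """Number of optimizer steps per epoch for a given effective batch size."""
    if drop_last:
        return num_train_images // total_batch
    return -(-num_train_images // total_batch)  # ceiling division


total = total_batch_size(per_process_batch=12, num_processes=2)
print(total, batches_per_epoch(8437, total))
```

Under that reading, lowering `--batch_size` to fit in GPU memory, or running on a single GPU with `--batch_size 12`, changes the effective batch size, which may matter when comparing against the numbers in the paper.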
Hello,
Thanks for the good work.
I have a few questions to ask.
Is the step "1-15" under task "15-5" and step "1-15" under task "15-5s" the same? If yes, why is there a difference in the performance or mean IoU values in the paper?
I performed the FT and MiB experiments, and the mean IoU differs considerably from the one in the paper. The difference between the command given by you and the one I am using is that I use a single CUDA device and a batch size reduced to 12.
Thanks and regards,
Sreeni...
In Subset.py, I saw that normalization is applied to both the input and the label. Why would the label be normalized?
train_transform = transform.Compose([
transform.RandomResizedCrop(opts.crop_size, (0.5, 2.0)),
transform.RandomHorizontalFlip(),
transform.ToTensor(),
transform.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
])
sample, target = self.transform(sample, target)
According to the official PyTorch documentation for the cross-entropy loss function, the label is only used as an index.
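One common pattern for paired segmentation transforms is that Normalize accepts both the image and the label but rescales only the image, returning the label unchanged. Whether this repository's transform does exactly that should be checked in its transform code. The class below is a hypothetical, dependency-free sketch: the name PairedNormalize and the nested-list image layout are assumptions, not the repository's API.

```python
class PairedNormalize:
    """Hypothetical paired transform: rescales the image, passes the label through."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, image, label=None):
        # image: one flat list of pixel values per channel (assumed layout)
        normalized = [
            [(pixel - m) / s for pixel in channel]
            for channel, m, s in zip(image, self.mean, self.std)
        ]
        # The label holds class indices, so it is returned untouched.
        return normalized, label


normalize = PairedNormalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
image = [[0.485, 0.714], [0.456, 0.680], [0.406, 0.631]]
label = [0, 15]
normalized_image, same_label = normalize(image, label)
```

With a transform of this shape, calling it as transform(sample, target) normalizes only the sample, so the target can still be used as an index by the loss.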
same as title~
Thanks!
In the file loss.py, under class UnbiasedCrossEntropy(nn.Module) line 102
outputs[:, old_cl:] = inputs[:, old_cl:] - den.unsqueeze(dim=1)
inputs holds the raw logits of the network, and den is the logsumexp of those logits. To my understanding, inputs should first go through log and exp before being combined with den.
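A point that may help with the question above: the log of a softmax probability equals the raw logit minus the logsumexp of all logits, so subtracting den from the raw logits already yields log-probabilities. The sketch below checks this identity numerically with the standard library only; the function names are illustrative and are not taken from loss.py.

```python
import math


def logsumexp(values):
    # Numerically stable log(sum(exp(v))) using the max-shift trick.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax_direct(logits):
    # Textbook definition: log(exp(x_i) / sum_j exp(x_j)).
    total = sum(math.exp(v) for v in logits)
    return [math.log(math.exp(v) / total) for v in logits]


def log_softmax_via_den(logits):
    # Same quantity written as x_i - logsumexp(x).
    den = logsumexp(logits)
    return [v - den for v in logits]


logits = [2.0, -1.0, 0.5, 3.0]
direct = log_softmax_direct(logits)
via_den = log_softmax_via_den(logits)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(direct, via_den)))
```

Since log(exp(x)) is x, applying log and exp to the logits before subtracting den would not change the result.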
Hello, I would like to ask about the logic used to split the data: how is an image assigned to an incremental learning step? Sorry to bother you. Thanks.
# This is the code that actually makes the split.
all_added = set()
imgs={}
others = {}
every=set()
added = [0 for i in range(151)]
for i in range(sets+1):
    others[i]=set()
    imgs[i]=set()
# This is the most important function.
# It computes the score of the image to be assigned to a class
def score(ratio,imgs,expected,minval=False,w1=120,w2=1000,w3=1 ):
    return (1.-ratio)*w1+w3*minval
for i in imgs_to_cls.keys(): # loop over every image in the dataset
    # img_counts = number of images per class, for the classes in the current image
    img_counts = [class_members[j] for j in imgs_to_cls[i]]
    # ratio = images assigned to the class / total number of images for the class
    ratios = [(added[j]+0.0)/class_members[j] for j in imgs_to_cls[i]]
    added_counts = [added[j] for j in imgs_to_cls[i]]
    # assignments = the set (step) of each class in the image
    assignments = [map_class_to_set[j] for j in imgs_to_cls[i]]
    scores = [score(ratios[c],added_counts[c],class_members[j],1./(img_counts[c]/sum(img_counts))) for c,j in enumerate(imgs_to_cls[i])]
    # take the highest-scoring class
    cl = scores.index(max(scores))
    # take the set of the highest-scoring class
    a=assignments[cl]
    # add the image to the images of that step
    imgs[a].update([i])
    for j,ac in enumerate(assignments):
        if ac==a:
            # increment the number of images assigned to the classes of the current image that belong to the same step.
            added[imgs_to_cls[i][j]] += 1
for i in range(1,len(added)):
    assignment = map_class_to_set[i]
    ratios = [len(set(idxs[i]).intersection(imgs[j]))/class_members[i] for j in range(0,sets+1)]
    if ratios[assignment]<0.5 or (ratios[assignment]<1. and added[i]<100):
        print(i,ratios[assignment], sum(ratios[1:]), class_members[i], len(set(idxs[i]).intersection(imgs[assignment])))
s=0
for i in range(sets+1):
    print(len(imgs[i]))
    s+=len(imgs[i])
print(s)
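The listing above depends on variables that are not shown (imgs_to_cls, class_members, map_class_to_set, idxs, sets). As a rough illustration of the greedy logic, here is a simplified, self-contained toy version. The function split_images, the four-image dataset and the two-step class mapping are all made up for this example; unused parameters of score and the diagnostic loop are dropped.

```python
from collections import Counter


def score(ratio, inverse_share, w1=120, w3=1):
    # Favors classes that have received few of their images so far.
    return (1.0 - ratio) * w1 + w3 * inverse_share


def split_images(imgs_to_cls, map_class_to_set, n_sets):
    # Number of images containing each class.
    class_members = Counter(c for classes in imgs_to_cls.values() for c in classes)
    added = Counter()  # images already assigned, per class
    imgs = {s: set() for s in range(n_sets)}
    for img, classes in imgs_to_cls.items():
        total = sum(class_members[c] for c in classes)
        scores = [
            score(added[c] / class_members[c], total / class_members[c])
            for c in classes
        ]
        # The image goes to the step of its highest-scoring class.
        step = map_class_to_set[classes[scores.index(max(scores))]]
        imgs[step].add(img)
        # Count the image only for its classes that belong to that step.
        for c in classes:
            if map_class_to_set[c] == step:
                added[c] += 1
    return imgs


# Toy data (hypothetical): class 1 is learned in step 0, class 2 in step 1.
toy_imgs_to_cls = {"a": [1], "b": [1, 2], "c": [2], "d": [1, 2]}
toy_split = split_images(toy_imgs_to_cls, {1: 0, 2: 1}, n_sets=2)
print(toy_split)
```

In this toy run every image lands in exactly one step, and images containing classes from both steps are sent to whichever class is currently further from having all of its images assigned.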
Hi, thank you for creating such an innovative and wonderful incremental semantic segmentation method.
In the paper, the mean IoU of MiB on the disjoint Pascal VOC (19-1) task is 69.6 (1-19), 25.6 (20), 67.4 (all). When I try to reproduce it, the final mean IoU of MiB (step 1) on the same task is 62.7 (1-19), 17.3 (20), 60.5 (all), so I want to know whether there is a problem with my MiB (step 0). Could you please share the mean IoU of your MiB (step 0) on the disjoint Pascal VOC (19-1) task?
Here is the test result of my MiB (step 0) after 30 epochs of training:
Train set: 10034, Val set: 1421, Test set: 1421, n_classes 20
INFO:rank0: *** Test the model on all seen classes...
INFO:rank0: *** Model restored from checkpoints/step/19-1-voc_MIB_original_0.pth
Total samples: 1422.000000
Overall Acc: 0.937206
Mean Acc: 0.850237
FreqW Acc: 0.888375
Mean IoU: 0.753881
Class IoU:
class 0: 0.9282959437197884
class 1: 0.8882853094595112
class 2: 0.3797590436288108
class 3: 0.8807409395997082
class 4: 0.6648712821388697
class 5: 0.7976358668707264
class 6: 0.9164809865812081
class 7: 0.8817972918369146
class 8: 0.9234520147664763
class 9: 0.32377577234982474
class 10: 0.8422507677142984
class 11: 0.5292598617495181
class 12: 0.8956875294271032
class 13: 0.8407520660676776
class 14: 0.848592612779058
class 15: 0.8356916493750883
class 16: 0.5890588805790536
class 17: 0.8099400266733182
class 18: 0.4580267122561858
class 19: 0.8432569294818675
Class Acc:
class 0: 0.966766716830572
class 1: 0.960997648920115
class 2: 0.8587795396354625
class 3: 0.9328072965429103
class 4: 0.8608227242951523
class 5: 0.9055918146656152
class 6: 0.9412850562654307
class 7: 0.9158580847049574
class 8: 0.9614393245122251
class 9: 0.47560739230891674
class 10: 0.9135487146581773
class 11: 0.5875200166947372
class 12: 0.9519971945124238
class 13: 0.9047293200139962
class 14: 0.9272278961100133
class 15: 0.8997701656461145
class 16: 0.7169984272644204
class 17: 0.867543489475999
class 18: 0.5452584289443756
class 19: 0.9101945156506736
And here is the test result of my MiB (step 1) after 30 epochs of training:
Train set: 548, Val set: 74, Test set: 1449, n_classes 21
INFO:rank0: *** Test the model on all seen classes...
INFO:rank0: *** Model restored from checkpoints/step/19-1-voc_MIB_original_1.pth
Total samples: 1450.000000
Overall Acc: 0.890334
Mean Acc: 0.725162
FreqW Acc: 0.820754
Mean IoU: 0.618920 # 0.6411918301227864 0-19 20 0.1734
Class IoU:
class 0: 0.8938328066648968
class 1: 0.7782195086558429
class 2: 0.37033196721590445
class 3: 0.8413915510994053
class 4: 0.525415475349433
class 5: 0.6729141125705496
class 6: 0.20963013074814107
class 7: 0.7363044189891342
class 8: 0.8777653860912165
class 9: 0.29350207126893696
class 10: 0.8145631162334964
class 11: 0.49090401230842257
class 12: 0.8398493348474195
class 13: 0.7777190817231865
class 14: 0.683588348834814
class 15: 0.7999431095131695
class 16: 0.555339335033059
class 17: 0.7869103135756841
class 18: 0.4080286080402448
class 19: 0.4676839136927719
class 20: 0.17347901289607445
Class Acc:
class 0: 0.9573831275998825
class 1: 0.7941349223562392
class 2: 0.7405488472008162
class 3: 0.939724801161348
class 4: 0.5830666957769701
class 5: 0.7291240593393206
class 6: 0.21006324595180248
class 7: 0.7776654179327952
class 8: 0.9586466681798896
class 9: 0.40889428771720887
class 10: 0.8555168166200787
class 11: 0.5290773778328225
class 12: 0.9181100424143835
class 13: 0.8976137666936727
class 14: 0.7100771395812419
class 15: 0.8855347468407907
class 16: 0.702358914737102
class 17: 0.8942199044152891
class 18: 0.4998755740006467
class 19: 0.47593462075060633
class 20: 0.7608375943271062
I ran the following commands to train MiB on the disjoint Pascal VOC (19-1) task.
python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name MIB_original --task 19-1 --step 0 --lr 0.001 --epochs 30 --method MiB
python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name MIB_original --task 19-1 --step 1 --lr 0.001 --epochs 30 --method MiB
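The "Mean IoU" line in the step-1 log and the grouped figures quoted in this issue (1-19, 20, all) are averages over different sets of classes. The sketch below recomputes each group from the per-class values in the step-1 log, rounded to four decimals. The grouping is inferred from the numbers in this issue and is an assumption about how they were obtained, not something confirmed by the repository.

```python
# Per-class IoU from the step-1 log above, rounded to four decimals.
class_iou = [
    0.8938, 0.7782, 0.3703, 0.8414, 0.5254, 0.6729, 0.2096,
    0.7363, 0.8778, 0.2935, 0.8146, 0.4909, 0.8398, 0.7777,
    0.6836, 0.7999, 0.5553, 0.7869, 0.4080, 0.4677, 0.1735,
]


def mean(values):
    return sum(values) / len(values)


with_background = mean(class_iou)    # classes 0-20, as in the log's "Mean IoU"
old_classes = mean(class_iou[1:20])  # classes 1-19
new_class = class_iou[20]            # class 20
foreground = mean(class_iou[1:])     # classes 1-20, background excluded

print(f"0-20: {with_background:.4f}  1-19: {old_classes:.4f}  "
      f"20: {new_class:.4f}  1-20: {foreground:.4f}")
```

The 1-19 and 1-20 means come out close to the 62.7 and 60.5 reported in the issue, which suggests that those figures exclude the background class while the logged Mean IoU includes it.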
Hello, in your paper you mentioned that you implemented the paper "Weakly supervised scene parsing with point-based distance metric learning" . Could you please post this part of the code? thank you very much!
Hello, I want to use batch normalization instead of In-Place ABN. Which settings should I change? I only changed norm_act to std in argparser.py, but I got the error torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute 'activation'.
Thanks for your help.
Hi, how are the things going on with you?
I have a question about this code. When I run run.py, I get the error: ValueError: Wrong image_set entered! Please use image_set="train" or image_set="trainval" or image_set="val". The dataset is Pascal VOC 2012.
I think this error is caused by the dataset_root setting or another data-loading setting. I do not understand the README's instructions for downloading the dataset. Could you explain the dataset_root setting in more detail, or show the directory layout expected after the dataset is downloaded?
The attachment shows the directory layout where I put the dataset.
I would be very grateful if you could tell me the solution. Looking forward to your reply.
Thank you for your great work, MiB!
Following your instructions, I configured the environment and ran the command:
python -m torch.distributed.launch --nproc_per_node=0 run.py --data_root data --batch_size 12 --dataset voc --name MIB --task 15-5 --step 1 --lr 0.001 --epochs 30 --method MIB
But the experiment did not run, and no error was shown. The following is the output of my command:
Could you please tell me whether any data path needs to be set in advance, or whether other variables need to be set before running the experiment?
Thank you for your feedback.
Hi, I am facing the problem below:
python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name LWF --task 100-50 --step 0 --lr 0.01 --epochs 60 --method LWF
/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank
argument to be set, please
change it to read from os.environ['LOCAL_RANK']
instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Root Cause (first observed failure):
[0]:
time : 2022-06-15_09:57:39
host : aaa-Z490-AORUS-MASTER
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1651456)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
I think it is related to a version conflict. My GPU is an RTX 3090, so I must use CUDA 11.3.
Please help me solve the problem. Thank you!
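The FutureWarning in this log suggests reading the local rank from os.environ['LOCAL_RANK'] when launching with torchrun. A minimal sketch of an argument parser that accepts both conventions follows. parse_local_rank is a hypothetical helper, not part of run.py, and it may not address the actual failure here, since the traceback is not shown.

```python
import argparse
import os


def parse_local_rank(argv=None, environ=None):
    """Return the local rank from --local_rank if given, else from LOCAL_RANK."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--local_rank", "--local-rank",
        type=int,
        # torchrun exports LOCAL_RANK; fall back to 0 for single-process runs.
        default=int(environ.get("LOCAL_RANK", 0)),
    )
    # Ignore the script's other options in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank


# Old launcher style: the rank arrives as a command-line flag.
print(parse_local_rank(["--local_rank=1", "--dataset", "ade"], environ={}))
# torchrun style: the rank arrives through the environment.
print(parse_local_rank(["--dataset", "ade"], environ={"LOCAL_RANK": "3"}))
```

Reading the environment variable only as a default keeps the script compatible with both launchers, since an explicit flag still takes precedence.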
Hi,
I want to reproduce your work,
but I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'pretrained/resnet101_iabn_sync.pth.tar'
can you tell me how to get it?
Hi, I am trying to reproduce the 15-5s results with the voc dataset. However, the results (see the table below) I got are significantly (5 points) higher than those reported in your paper.
step | mIoU | background | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor | base | novel |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 80.37% | 94.36% | 90.62% | 41.06% | 89.66% | 70.50% | 82.79% | 94.69% | 89.50% | 93.99% | 47.16% | 86.11% | 54.31% | 90.43% | 87.54% | 86.42% | 86.84% | | | | | | 80.37% | |
1 | 74.93% | 92.30% | 89.18% | 40.96% | 85.45% | 70.35% | 79.40% | 93.12% | 86.16% | 91.80% | 44.55% | 84.41% | 52.69% | 89.12% | 82.97% | 83.02% | 84.83% | 23.53% | | | | | 78.14% | 23.53% |
2 | 58.70% | 90.79% | 63.33% | 33.44% | 73.98% | 56.07% | 77.29% | 76.04% | 75.58% | 79.69% | 11.87% | 35.73% | 37.31% | 80.49% | 62.46% | 78.60% | 82.57% | 15.51% | 25.89% | | | | 63.45% | 20.70% |
3 | 55.12% | 84.07% | 61.58% | 35.32% | 66.56% | 45.26% | 68.80% | 75.12% | 74.11% | 81.62% | 14.12% | 47.14% | 47.08% | 83.13% | 70.48% | 62.91% | 83.62% | 3.38% | 27.31% | 15.69% | | | 62.56% | 15.46% |
4 | 39.80% | 84.85% | 29.00% | 26.06% | 51.45% | 28.52% | 57.93% | 50.61% | 58.18% | 62.76% | 2.69% | 34.24% | 22.81% | 66.81% | 53.43% | 29.02% | 78.51% | 0.40% | 23.19% | 13.31% | 22.23% | | 46.05% | 14.78% |
5 | 34.29% | 83.12% | 22.23% | 14.02% | 40.29% | 21.00% | 53.62% | 10.21% | 37.06% | 72.87% | 0.74% | 31.99% | 34.20% | 72.61% | 52.94% | 18.75% | 76.92% | 0.08% | 28.18% | 12.10% | 16.76% | 20.31% | 40.16% | 15.49% |
For your reference, I ran step 0 and steps 1-5 with the commands below.
CUDA_VISIBLE_DEVICES=1,2,3 python -m torch.distributed.launch --nproc_per_node=3 run.py \
    --data_root path/to/data --method MiB --dataset voc --task 15-5s \
    --step 0 --overlap --lr 0.01 --batch_size 8 --epochs 30 --name MiB
for step in {1..5};
do
    CUDA_VISIBLE_DEVICES=1,2,3 python -m torch.distributed.launch --nproc_per_node=3 run.py \
        --data_root path/to/data --method MiB --dataset voc --task 15-5s \
        --step ${step} --overlap --lr 0.001 --batch_size 8 --epochs 30 --name MiB
done
Note that I made no modifications to your code. I hope you can help check whether my results are correct.
Hi, In MiB, inappropriate dependency versioning constraints can cause risks.
Below are the dependencies and version constraints that the project is using
absl-py==0.8.0
apex==0.1
apturl==0.5.2
asn1crypto==0.24.0
astor==0.8.0
attrs==19.1.0
Automat==0.6.0
backcall==0.1.0
bleach==3.1.0
Brlapi==0.6.6
certifi==2018.1.18
chardet==3.0.4
click==6.7
colorama==0.3.7
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==2.1.4
cupshelpers==1.0
cvxpy==1.0.25
cycler==0.10.0
decorator==4.4.0
defer==1.0.6
defusedxml==0.6.0
dill==0.3.1.1
distro-info===0.18ubuntu0.18.04.1
ecos==2.0.7.post1
entrypoints==0.3
future==0.17.1
gast==0.3.1
google-pasta==0.1.7
grpcio==1.23.0
h5py==2.10.0
httplib2==0.9.2
hyperlink==17.3.1
idna==2.6
incremental==16.10.1
inplace-abn==1.0.7
ipykernel==5.1.2
ipython==7.8.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.15.1
Jinja2==2.10.1
joblib==0.11
jsonschema==3.0.2
jupyter==1.0.0
jupyter-client==5.3.3
jupyter-console==6.0.0
jupyter-core==4.5.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
keyring==10.6.0
keyrings.alt==3.0
kiwisolver==1.1.0
language-selector==0.1
launchpadlib==1.10.6
lazr.restfulclient==0.13.5
lazr.uri==1.0.3
louis==3.5.0
macaroonbakery==1.1.3
Mako==1.0.7
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.1.1
mistune==0.8.4
multiprocess==0.70.9
nbconvert==5.6.0
nbformat==4.4.0
netifaces==0.10.4
nose==1.3.7
notebook==6.0.1
numpy==1.17.2
oauth==1.0.1
olefile==0.45.1
osqp==0.6.1
PAM==0.4.2
pandocfilters==1.4.2
parso==0.5.1
pexpect==4.7.0
pickleshare==0.7.5
Pillow==6.1.0
pluggy==0.6.0
prometheus-client==0.7.1
prompt-toolkit==2.0.9
protobuf==3.9.1
ptyprocess==0.6.0
py==1.5.2
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycairo==1.16.2
pycrypto==2.6.1
pycups==1.9.73
Pygments==2.4.2
pygobject==3.26.1
pymacaroons==0.13.0
PyNaCl==1.1.2
pyOpenSSL==17.5.0
pyparsing==2.4.2
pyRFC3339==1.0
pyrsistent==0.15.4
pyserial==3.4
pytest==3.3.2
python-apt==1.6.4
python-dateutil==2.8.0
python-debian==0.1.32
pytz==2018.3
pyxdg==0.25
PyYAML==3.12
pyzmq==18.1.0
qtconsole==4.5.5
reportlab==3.4.0
requests==2.18.4
requests-unixsocket==0.1.5
scikit-learn==0.19.1
scipy==1.3.1
screen-resolution-extra==0.0.0
scs==2.1.1.post2
SecretStorage==2.3.1
Send2Trash==1.5.0
service-identity==16.0.0
simplegeneric==0.8.1
simplejson==3.13.2
six==1.12.0
ssh-import-id==5.7
system-service==0.3
systemd-python==234
tensorboard==1.14.0
tensorboardX==1.8
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0
termcolor==1.1.0
terminado==0.8.2
testpath==0.4.2
torch==1.2.0
torchvision==0.4.0
tornado==6.0.3
traitlets==4.3.2
Twisted==17.9.0
ubuntu-drivers-common==0.0.0
ufw==0.36
urllib3==1.22
usb-creator==0.3.3
wadllib==1.3.2
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.6
widgetsnbextension==3.5.1
wrapt==1.11.2
xkit==0.0.0
zope.interface==4.3.2
The version constraint == will introduce the risk of dependency conflicts because the scope of dependencies is too strict.
The version constraint No Upper Bound and * will introduce the risk of the missing API Error because the latest version of the dependencies may remove some APIs.
After further analysis, in this project,
The version constraint of dependency multiprocess can be changed to ==0.70.4.
The version constraint of dependency multiprocess can be changed to >=0.70.4,<=0.70.4.
The version constraint of dependency Pillow can be changed to ==9.2.0.
The version constraint of dependency Pillow can be changed to >=2.0.0,<=9.1.1.
The version constraint of dependency pyasn1 can be changed to >=0.4.1,<=0.4.8.
The above modification suggestions can reduce the dependency conflicts as much as possible,
and introduce the latest version as much as possible without calling Error in the projects.
The invocation of the current project includes all the following methods.
logger.debug logger.info
PIL.Image.open
open
torch.log_softmax utils.logger.Logger.info self._mean.reshape self.n_classes.mask.label_pred.int.mask.label_true.astype.self.n_classes.np.bincount.reshape.sum dict dataset.transform.Resize self.mod2 any metrics.StreamSegMetrics slice range.keys self.licarl.item x.split join argparser.modify_command_options model.named_parameters torchvision.transforms.functional.adjust_contrast self.book.clear utils.loss.KnowledgeDistillationLoss os.makedirs self.lambd torch.nn.Linear p.grad.detach torch.cat metrics.update p.grad.detach.pow torch.nn.functional.binary_cross_entropy_with_logits mask.float torch.no_grad utils.logger.Logger.add_image self.mod5 torch.exp NotImplementedError VOCSegmentation torch.randint os.path.join os.path.isdir numpy.load n.self.model_old_dict.p.detach.pow model.eval tbl.items matplotlib.pyplot.subplots type torch.utils.data.DataLoader targets.sum.loss.torch.masked_select.sum RW mask.label_true.astype self.regularizer.load_state_dict inputs.shape.labels_new.F.one_hot.float train.Trainer torch.nn.functional.pad tasks.get_task_labels self.fisher_old.items n.self.score.to random.random numpy.random.choice list.append warnings.warn random.uniform Compose images.detach.cpu.numpy self.IncrementalSegmentationModule.super.__init__ key.self.fisher_old.to denorm self.convs targets.sum.loss.torch.masked_select.mean os.path.isfile class_loss.torch.tensor.to torch.nn.CrossEntropyLoss torch.log module.named_children matplotlib.use self.PolyLR.super.__init__ torch.optim.lr_scheduler.StepLR.state_dict self.GlobalAvgPool2d.super.__init__ torch.distributed.get_rank utils.Denormalize self._fast_hist p.clone numpy.concatenate utils.PolyLR model.module.init_new_classifier torch.nn.functional.pad.view self.classes.torch.FloatTensor.torch.log.to target.label2color.transpose.astype loss.mean.mean torch.nn.functional.one_hot numpy.diag.sum torchvision.transforms.functional.resized_crop torch.nn.Conv2d.append labels.cpu.numpy.cpu self._global_pooling self.lde_loss 
utils.filter_images torch.logsumexp.unsqueeze apex.amp.scale_loss modules.GlobalAvgPool2d n.self.model_temp.to math.sqrt optim.zero_grad torchvision.transforms.functional.adjust_saturation idxs_path.np.load.tolist logging.basicConfig self.get_score utils.loss.UnbiasedKnowledgeDistillationLoss self.head fig.tight_layout torch.nn.functional.pad.repeat p.clone.detach.cpu self.regularizer.update numpy.bincount scaled_loss.backward p.torch.clone.detach argparse.ArgumentParser t torch.nn.functional.avg_pool2d self.red_bn torch.optim.lr_scheduler.StepLR.load_state_dict torch.distributed.reduce label2color self.score.items convert_bn2gn train.Trainer.load_state_dict transform mod epoch_loss.torch.tensor.to cls.bias.data.copy_ images.to.detach torch.nn.GroupNorm.add_module self.DeeplabV3.super.__init__ self.model.named_parameters m samples.cpu.numpy torch.nn.functional.nll_loss self._transform_tag torch.optim.SGD.load_state_dict score.items f.readlines modules.DeeplabV3 logger.debug torch.nn.functional.leaky_relu utils.logger.Logger idxs.append opts.backbone.models.__dict__ self.convs.add_ apex.parallel.DistributedDataParallel.state_dict inputs.size norm_act m.eval torch.tensor lt.flatten numpy.random.seed torch.softmax x.dim vars int AdeSegmentation metrics.synch apex.parallel.DistributedDataParallel.cuda metrics.reset apex.parallel.DistributedDataParallel.parameters torchvision.transforms.functional.crop inputs.shape.targets.shape.range.x.x.torch.tensor.to __all__.append apex.amp.initialize n.self.model_old_dict.p.pow.n.self.score_actual.sum sorted utils.logger.Logger.add_results torch.zeros_like mat.max n.self.model_old_dict.p.n.self.fisher_old.sum tasks.get_task_list logger.info PIL.Image.open p.clone.detach self.ResNet.super.__init__ torch.from_numpy torch.nn.functional.interpolate torchvision.transforms.functional.rotate self.body math.log dataset.transform.RandomResizedCrop numpy.mean lbl.label2color.transpose self.logger.add_image sample.apply_ TypeError random.seed 
round fil inputs.shape.labels_new.F.one_hot.float.permute float logging.info modules.ResidualBlock model_old.state_dict tensorboardX.SummaryWriter utils.logger.Logger.add_table torch.nn.MSELoss train_loader.sampler.set_epoch prediction.cpu.numpy math.exp labels.cpu.numpy.to lp.flatten functools.partial torch.nn.init.calculate_gain voc_cmap self.get_score.items self.ResidualBlock.super.__init__ segmentation_module.make_model inputs.narrow.narrow x.idxs.append self.order.index self.lkd_loss torch.nn.BCEWithLogitsLoss labels.outputs.mean train.Trainer.validate torchvision.transforms.functional.pad os.path.expanduser labels.cpu.numpy os.path.exists self.get_params setattr torch.FloatTensor get_dataset IncrementalSegmentationModule torch.masked_select optim.step repr freq.iu.freq.freq.sum v.to torchvision.transforms.functional.adjust_hue zip torch.load par.to torch.arange apex.parallel.DistributedDataParallel.fix_bn cityscapes_cmap self.fisher.items torch.sum images.to.to dataset argparser.get_argparser.parse_args dataset.transform.Compose utils.loss.BCEWithLogitsLossWithIgnoreIndex torch.optim.lr_scheduler.StepLR torch.nn.init.constant_ apex.parallel.DistributedDataParallel.load_state_dict utils.logger.Logger.print utils.logger.Logger.add_figure dataset.transform.CenterCrop random.randint min self.confusion_matrix_to_fig ax.figure.colorbar torch.nn.MaxPool2d torchvision.transforms.functional.adjust_brightness SegmentationModule focal_loss.mean FocalLoss self._check_input self.red_conv util.try_index images.detach.cpu torchvision.transforms.functional.to_tensor self.classifier torchvision.transforms.functional.center_crop self.map_bn self.regularizer.penalty.item self.cls.bias.data.copy_ torch.cuda.manual_seed self.confusion_matrix.astype p.detach argparse.ArgumentParser.add_argument str utils.logger.Logger.debug self.__strip_zero list x.view torch.distributed.get_world_size train.Trainer.train outputs.max prediction.cpu.numpy.cpu transforms.append 
self.global_pooling_conv torch.isinf self.logger.add_text model.modules torch.utils.data.random_split self.total_samples.torch.tensor.to.cpu mat.min filter train.Trainer.state_dict ax.imshow apex.parallel.DistributedDataParallel.to tasks.get_per_task_classes len self.proj_bn utils.get_regularizer n.self.model_old_dict.p.pow reg_loss.torch.tensor.to torchvision.transforms.functional.hflip utils.Label2Color self.bn1 self.device.n.self.model_temp.to.p.detach.pow torch.nn.GroupNorm ade_cmap Lambda numpy.array outputs.narrow lbl.label2color.transpose.astype self._std.reshape self.model_old torch.nn.functional.cross_entropy RuntimeError criterion model self.delta.items self.mod1 model.state_dict normalize_fn params.append par.torch.clone.to self.logger.add_figure self.get models.util.try_index ValueError enumerate torch.nn.Conv2d self.mod3 self.info model.head.parameters img.denorm.astype numpy.unique model.train n.self.score_plus_fisher.mean random.shuffle all labels.remove self.pool_red_conv results.items opts.backbone.models.__dict__.load_state_dict p.to fisher.items inputs.shape.labels_new.F.one_hot.float.permute.clone main self.transform self.book.get format _NETS.items utils.logger.Logger.close torch.sigmoid range dataset.transform.RandomHorizontalFlip task_dict.keys ax.set self.logger.close self.convs.clone PI callable loss.mean.item n.self.model_old_dict.p.pow.n.self.score_plus_fisher.sum isinstance torch.save EWC copy.deepcopy torch.nn.Sequential torch.nn.init.xavier_normal_ self.lde_loss.item dropout utils.Subset self.total_samples.torch.tensor.to self.modules logger.add_scalar apex.parallel.DistributedDataParallel.eval outputs_no_bgk.labels.sum open self.FocalLoss.super.__init__ self.confusion_matrix.torch.tensor.to.cpu torchvision.transforms.functional.resize p.torch.clone.detach.cpu utils.loss.IcarlLoss blocks.append self.confusion_matrix.sum mask.float.mean torch.manual_seed utils.color_map torch.nn.ModuleList image_set.rstrip 
self.IdentityResidualBlock.super.__init__ numpy.zeros self.mod4 t.apply_ dataset.transform.ToTensor self.global_pooling_bn x.size.x.size.x.view.mean self._network.append utils.loss.UnbiasedCrossEntropy torch.index_select argparser.get_argparser model.cls.parameters torch.isnan self._stride_dilation cls.weight.data.copy_ torch.tensor.to self.licarl torchvision.transforms.functional.vflip x.size numpy.save torch.cuda.set_device utils.logger.Logger.add_scalar numpy.diag res.values self.n_classes.mask.label_pred.int.mask.label_true.astype.self.n_classes.np.bincount.reshape inputs.shape.torch.tensor.to torch.logsumexp collections.OrderedDict torch.utils.data.distributed.DistributedSampler self._network super.__init__ torch.clone os.listdir FileNotFoundError torch.where torch.nn.functional.elu torchvision.transforms.Lambda bitget new_bias.squeeze self.confusion_matrix.torch.tensor.to self.proj_conv self.regularizer.state_dict self.regularizer.penalty metrics.StreamSegMetrics.to_str metrics.get_results save_ckpt apex.parallel.DistributedDataParallel torch.optim.SGD.state_dict torch.device inputs.shape.labels_new.F.one_hot.float.permute.sum self.model_old.state_dict mask.float.sum max in_size.in_size.inputs.view.mean hasattr logging.error torch.mean numpy.zeros.astype self.add_module ret_samples.append focal_loss.sum scheduler.step apex.parallel.DistributedDataParallel.train self.lkd_loss.item tasks_voc.keys dataset.transform.Normalize n.self.score_old.to torch.distributed.barrier torch.distributed.init_process_group index.self.images.Image.open.convert functools.reduce torch.optim.SGD self.target_transform torchvision.transforms.functional.normalize confusion_matrix.cpu.numpy target.label2color.transpose torch.ones_like inputs.view model.body.parameters self.logger.add_scalar p.torch.clone.detach.to super self.reset_parameters print tuple
@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.
Hello. I am running 15-5s with the VOC dataset.
During validation in step 1 (after step 0), my result looks like this:
Class IoU:
class 0: 0.8864838886464639
class 1: X
class 2: X
class 3: X
class 4: X
class 5: X
class 6: X
class 7: X
class 8: X
class 9: X
class 10: X
class 11: X
class 12: X
class 13: X
class 14: X
class 15: X
class 16: 0.5655430208625937
Class Acc:
class 0: 0.9604431435003239
class 1: X
class 2: X
class 3: X
class 4: X
class 5: X
class 6: X
class 7: X
class 8: X
class 9: X
class 10: X
class 11: X
class 12: X
class 13: X
class 14: X
class 15: X
class 16: 0.6672904492468744
Is this the expected result?
All of the old classes are X.
If it is not, could you give me some advice?
Thank you :)
Hello,
I am trying to reproduce the disjoint 15-5s setting,
but my results are very different from yours.
My command is:
/home/nayoung/nayoung/MiB/run.py --data_root '/home/nayoung/nayoung/' --batch_size 10 --dataset voc --name MIB --task 15-5s --step 0 --lr 0.01 --epochs 30 --method MiB
For steps 1-5:
/home/nayoung/nayoung/MiB/run.py --data_root '/home/nayoung/nayoung/' --batch_size 10 --dataset voc --name MIB --task 15-5s --step 5 --lr 0.001 --epochs 30 --method MiB
I used a batch size of 10 because of CUDA memory limits, and I didn't use the pretrained model.
I also set loss_kd=100.
background | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.857241 | 0.596404 | 0.249615 | 0.489829 | 0.336007 | 0.254114 | 0.694971 | 0.631736 | 0.539938 | 0.124421 | 0.380302 | 0.230107 | 0.470491 | 0.438303 | 0.5446194 | 0.615698 | |||||
0.822046 | 0.571647 | 0.246822 | 0.475479 | 0.322084 | 0.202237 | 0.607911 | 0.599167 | 0.515914 | 0.123029 | 0.300344 | 0.230606 | 0.447299 | 0.413728 | 0.5464546 | 0.603189 | 0.06315 | ||||
0.812532 | 0.53895 | 0.238265 | 0.418296 | 0.236745 | 0.17652 | 0.540683 | 0.536196 | 0.477998 | 0.089279 | 0.28096 | 0.100062 | 0.383524 | 0.36603 | 0.5146568 | 0.589685 | 0.056601 | 0.065537 | |||
0.523217 | 0.503853 | 0.216371 | 0.287688 | 0.198159 | 0.151194 | 0.494373 | 0.503627 | 0.455402 | 0.093359 | 0.119011 | 0.123516 | 0.33748 | 0.289346 | 0.5154741 | 0.565243 | 0.049754 | 0.061803 | 0.035291 | ||
0.424163 | 0.464728 | 0.215501 | 0.285088 | 0.162308 | 0.139302 | 0.465628 | 0.475487 | 0.407798 | 0.062629 | 0.131808 | 0.035045 | 0.331196 | 0.272611 | 0.4768657 | 0.551702 | 0.04413 | 0.06458 | 0.030589 | 0.110248 | |
0.303423 | 0.404756 | 0.196714 | 0.210973 | 0.101944 | 0.115709 | 0.366374 | 0.38747 | 0.39362 | 0.044943 | 0.073729 | 0.031481 | 0.310951 | 0.23618 | 0.4594278 | 0.545644 | 0.04088 | 0.061531 | 0.026771 | 0.092094 | 0.020551 |
class mIoU | 0.51339 | 0.227215 | 0.361226 | 0.226208 | 0.173179 | 0.528324 | 0.522281 | 0.465112 | 0.08961 | 0.214359 | 0.125136 | 0.380157 | 0.336033 | 0.5095831 | 0.578527 | 0.050903 | 0.063363 | 0.030884 | 0.101171 | 0.020551 |
1-15 : 0.350022
16-20 : 0.053374
all : 0.27586
Hi,
Could you please share the code for generating the data splits under the data/ folder? The default splits (19-1, 15-5, 15-5s, 100-50) are convenient, but it would be good to have the code, because I am interested in some different splits.
By the way, the VOC 15-5s step-0 split seems wrong, but I can use the one from 15-5, since they should be the same.
Thanks.
Thanks for your great work!
How do I install the corresponding version of Inplace-ABN on Windows?
'pip install inplace-abn' doesn't work. Neither does cloning the package with 'git clone' and then running 'python setup.py install'.
I look forward to your reply.