fcdl94 / mib

Official code for Modeling the Background for Incremental Learning in Semantic Segmentation https://arxiv.org/abs/2002.00718

License: MIT License

Python 89.13% Shell 0.45% Jupyter Notebook 10.43%


mib's Issues

minor issues

What is run_inc.py in the command line? It is not in the repository folder.
Which pretrained model from https://github.com/mapillary/inplace_abn#training-on-imagenet-1k do you use?

I tried several, but I got this error:

model = make_model(opts, classes=tasks.get_per_task_classes(opts.dataset, opts.task, opts.step))
File "/home/xialei/MiB/segmentation_module.py", line 29, in make_model
del pre_dict['state_dict']['classifier.fc.weight']
KeyError: 'classifier.fc.weight'

Thanks!
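
The `KeyError` in the traceback above is raised because the loaded checkpoint has no `classifier.fc.weight` entry to delete. A minimal sketch of a more tolerant clean-up follows, assuming the checkpoint is a plain dict with a `state_dict` entry, as the traceback suggests. `drop_classifier_keys` is a hypothetical helper, and both the `classifier.fc.bias` key and the toy checkpoint contents are assumptions for illustration, not taken from the repository.

```python
def drop_classifier_keys(pre_dict, keys=("classifier.fc.weight", "classifier.fc.bias")):
    """Remove classifier entries from a checkpoint dict if they are present.

    Returns the list of keys that were actually removed.
    """
    state = pre_dict["state_dict"]
    removed = []
    for key in keys:
        # dict.pop with a default does not raise KeyError for a missing key
        if state.pop(key, None) is not None:
            removed.append(key)
    return removed


# Toy checkpoint without classifier weights (stand-in for a real file)
checkpoint = {"state_dict": {"backbone.weight": [0.0]}}
removed = drop_classifier_keys(checkpoint)
print("removed keys:", removed)
```

This only avoids the crash. Whether a checkpoint without those keys is the one the authors intended still needs to be confirmed by them.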

Questions about the VOC 15-1 Disjoint and Overlapped results with MiB?

Hi @fcdl94, sorry to bother you again. Since these questions are about VOC, I am opening a new issue. I have re-run the disjoint and overlapped experiments for all the methods included in Table 1 of the paper. The ranking of the methods is the same, so the conclusions and claims of the paper hold. However, when I listed the MiB results in a sheet, I noticed something odd.

First, here are the commands I used for replication. Both settings are trained from scratch, so steps 1-5 should start from a different pretrained model in each setting:

  • overlapped:

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 0 --lr 0.01 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 1 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 2 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 3 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 4 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --master_port 2001 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913_overlapped --task 15-5s --overlap --step 5 --lr 0.001 --epochs 30 --method MiB

  • disjoint:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 0 --lr 0.01 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 1 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 2 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 3 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 4 --lr 0.001 --epochs 30 --method MiB

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_port 2000 --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5s_lr_0.01_with_pretrained_0913 --task 15-5s --step 5 --lr 0.001 --epochs 30 --method MiB

I get the results after all the steps are completed, as shown below:

Class           MiB (Disjoint)   MiB (Overlapped)
0               84.61            84.73
1               16.49            15.17
2               31.68            27.21
3               58.00            45.64
4               27.89            21.93
5               43.96            42.31
6                1.02             3.06
7               15.31            46.96
8               73.63            77.94
9                1.05             4.06
10              35.17            36.41
11              22.15            37.46
12              65.75            64.53
13              54.63            43.26
14              28.12            28.14
15              80.94            80.49
16               0.09             0.02
17              24.38            25.04
18              13.66            17.06
19              15.79            14.06
20              17.38            15.09
all except 0    31.35            32.29

Results Summary:

  • For the Disjoint setting, the mIoU over classes 1-15 is 37.05 and over classes 16-20 is 14.26. The overall mIoU (classes 1-20) is 31.35;

  • For the Overlapped setting, the mIoU over classes 1-15 is 38.30 and over classes 16-20 is 14.25. The overall mIoU (classes 1-20) is 32.29.
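
The group averages in this summary can be recomputed from the per-class values in the table above. The sketch below does that; the numbers are copied from the table, and `group_miou` is a hypothetical helper name.

```python
# Per-class IoU (%) for classes 0..20, copied from the results table
disjoint = [84.61, 16.49, 31.68, 58.00, 27.89, 43.96, 1.02, 15.31, 73.63, 1.05,
            35.17, 22.15, 65.75, 54.63, 28.12, 80.94, 0.09, 24.38, 13.66, 15.79, 17.38]
overlapped = [84.73, 15.17, 27.21, 45.64, 21.93, 42.31, 3.06, 46.96, 77.94, 4.06,
              36.41, 37.46, 64.53, 43.26, 28.14, 80.49, 0.02, 25.04, 17.06, 14.06, 15.09]


def mean(values):
    return sum(values) / len(values)


def group_miou(per_class):
    """Return (mIoU over classes 1-15, over 16-20, over 1-20); class 0 is background."""
    return mean(per_class[1:16]), mean(per_class[16:21]), mean(per_class[1:21])


for name, values in (("Disjoint", disjoint), ("Overlapped", overlapped)):
    old, new, overall = group_miou(values)
    print(f"{name}: 1-15 = {old:.2f}, 16-20 = {new:.2f}, all except 0 = {overall:.2f}")
```

Run on these values, the script gives 37.05 / 14.26 / 31.35 for Disjoint and 38.30 / 14.25 / 32.29 for Overlapped, matching the figures in the summary.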

For comparison, Table 2 reports, for the Disjoint setting, 46.2 mIoU on classes 1-15, 12.9 on classes 16-20, and 37.9 overall; for the Overlapped setting, it reports 35.1 on classes 1-15, 13.5 on classes 16-20, and 29.7 overall.

Therefore,

  • The first problem is that the largest gap is in the Disjoint setting: for classes 1-15, I get 37.05 while you report 46.2;
  • The second problem is that in my results the Overlapped setting outperforms the Disjoint setting, which is the opposite of the Table 2 results.

Is there anything I am missing to reproduce the VOC 15-1 results with MiB?

FYI, here is the raw data for each step of each setting:

  • Disjoint:

Step 0:

T-IoU 0.9428957387543162 0.9093635384910007 0.407387220528084 0.9114261479178447 0.6930536991709606 0.8098990488060078 0.951383562635301 0.9017522187102619 0.9226235898934364 0.460997208347546 0.8647273405405083 0.5378474228922152 0.8973456596703172 0.867327442844769 0.8703716247790984 0.8676089249667962

Step 1:

T-IoU 0.9167589144636081 0.8873648605178784 0.4092252459852222 0.869479481025851 0.6806551633049537 0.7638765341517527 0.9205706473134169 0.8486735934421107 0.915373926952166 0.43939618475723113 0.8347598489209404 0.503665661560724 0.8803754274877195 0.8143450733616353 0.8425866785715388 0.8478188317980148 0.17931164855409873

Step 2:

T-IoU 0.9124010280617076 0.7157312195506308 0.3758244658997688 0.8048439599053759 0.5128759972974389 0.7772130550282125 0.8140185464753737 0.7540073558194968 0.8067052172080738 0.22098235839790872 0.24187419224161838 0.41607016629238047 0.7928031218145428 0.7074281456336383 0.8009249524418639 0.8291462394258059 0.08864073567935006 0.234592279843589

Step 3:

T-IoU 0.8605716951061348 0.7427626686495384 0.3754501232161909 0.809016676048648 0.5147889505056727 0.6270363019643613 0.7184538662827296 0.7334033091838555 0.8304927366076473 0.1250355660764828 0.41305395553844215 0.44932921473209253 0.787892298275096 0.7264193954508246 0.7572718800408372 0.8392311512765394 0.04363873022939195 0.15701272132864186 0.15189688631990797

Step 4:

T-IoU 0.8617972763749628 0.20872328663156442 0.3631064199006128 0.5886765010309403 0.34041861509825144 0.5474293754913939 0.13570653927230805 0.5574966269754023 0.64431626024196 0.031194197377586146 0.4463714388961355 0.24921212066929457 0.6278902980821934 0.5785498297789483 0.504660152102315 0.7900153928512044 0.011786844749089675 0.20088916659028505 0.12481700285889741 0.19506359566450185

Step 5:

T-IoU 0.8460856346827669 0.16490433429984022 0.31678143548785276 0.5799742617085539 0.2789076000769832 0.4395905332124107 0.01023589863212125 0.1530806668255949 0.7362783346863634 0.010541665752975061 0.35172514843737246 0.2214938353272161 0.6574922122682622 0.5462862022895337 0.28115039555445903 0.8093942994334622 0.0009336352682833694 0.24378756157086445 0.1365743221933673 0.15787678203324385 0.17377045020350113

  • Overlapped:

Step 0:

T-IoU 0.942351454969199 0.9088923380558995 0.41445877002346937 0.8943720292308379 0.7216543435352611 0.8245976268597867 0.9404349006553606 0.9077450313931694 0.9266660651840857 0.4700834557332964 0.857335799102383 0.5793174660444275 0.8987260795649801 0.8612262856604168 0.8698300613490206 0.8655979116146679

Step 1:

T-IoU 0.9182269199439186 0.8792879987006258 0.4017180006248381 0.8441055742619886 0.6994568677193088 0.7797706674159932 0.9149230512624791 0.8669237437597723 0.925295289580988 0.4621438312184923 0.8278570724749236 0.5046357276965907 0.8827558057428029 0.8319480561554519 0.8351552396152283 0.8531293756555369 0.20081410045276016

Step 2:

T-IoU 0.9102123086486242 0.6716115648361257 0.3323608949874176 0.7409822233592194 0.4959355955312635 0.7102449043004382 0.8138243088151255 0.7864668388252009 0.8396605082671436 0.19756995217299297 0.47104696504611043 0.3864640364884321 0.7830237173154856 0.6534901834317308 0.8130206998712594 0.8267346598625166 0.2135801990714857 0.2739649761053714

Step 3:

T-IoU 0.8632306048394159 0.705974539166515 0.36605442930053417 0.7172314140174203 0.47740362529902175 0.6360963911559759 0.7891308674752905 0.8167691950759172 0.8518501315621179 0.1859484531773666 0.5689216234786797 0.4579828893103606 0.7713658863930264 0.6804223882563137 0.7407504318523732 0.8474278492117775 0.14708674311949124 0.2741278598946114 0.18034994422937553

Step 4:

T-IoU 0.8647615150869634 0.12248332643577793 0.29494393825587 0.5665025942424852 0.25189086565151464 0.504964955569028 0.2607169020464405 0.6373994602886512 0.6749896193770941 0.038352339739995286 0.3740172433656488 0.33634831372874185 0.584688752624233 0.40671331551812523 0.32672648567942325 0.7970850159150706 0.009704602104938941 0.20900667311557677 0.14287384004333556 0.18019787539515425

Step 5:

T-IoU 0.847341876904206 0.1516912277406037 0.27214761938518534 0.4564398366880403 0.21925902115097656 0.4230704428405192 0.03062049153551857 0.4696080316553439 0.7793639447531278 0.04059292675086336 0.3641082518756912 0.3746037758239177 0.6453468608880033 0.43260491849114996 0.28142230199939966 0.8048656345565484 0.0001993869309182823 0.2504386187169609 0.17055907785579358 0.14061286493441055 0.1509222886875183

NaN loss and nothing learned for any class when training from scratch without a pretrained model

When I train MiB from scratch on the VOC dataset, the loss is NaN throughout training. Any idea what causes this?

  • command:
    python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name test_MIB_voc_15_5_lr_0.01_no_pretrained --task 15-5 --lr 0.01 --epochs 30 --method MiB --no_pretrained

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************

INFO:rank1: Device: cuda:1
INFO:rank0: [!] starting logging at directory ./logs/15-5-voc/test_MIB_voc_15_5_lr_0.01_no_pretrained/
INFO:rank0: Device: cuda:0
INFO:rank0: Dataset: voc, Train set: 8437, Val set: 1240, Test set: 1240, n_classes 16
INFO:rank0: Total batch size is 24
INFO:rank0: Backbone: resnet101
INFO:rank0: [!] Model made without pre-trained
Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
Selected optimization level O0:  Pure FP32 training.

Defaults for this optimization level are:
enabled                : True
opt_level              : O0
cast_model_type        : torch.float32
patch_torch_functions  : False
keep_batchnorm_fp32    : None
master_weights         : False
loss_scale             : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O0
cast_model_type        : torch.float32
patch_torch_functions  : False
keep_batchnorm_fp32    : None
master_weights         : False
loss_scale             : 1.0
Warning:  multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
INFO:rank0: [!] Train from scratch
INFO:rank1: tensor([[79]])
INFO:rank0: tensor([[79]])
INFO:rank0: Epoch 0, lr = 0.010000
INFO:rank0: Epoch 0, Batch 10/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 20/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 30/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 40/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 50/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 60/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 70/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 80/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 90/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 100/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 110/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 120/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 130/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 140/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 150/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 160/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 170/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 180/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 190/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 200/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 210/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 220/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 230/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 240/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 250/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 260/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 270/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 280/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 290/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 300/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 310/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 320/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 330/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 340/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Batch 350/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 0, Class Loss=nan, Reg Loss=0.0
INFO:rank0: End of Epoch 0/30, Average Loss=nan, Class Loss=nan, Reg Loss=0.0
Warning: NaN or Inf found in input tensor.
Warning: NaN or Inf found in input tensor.
INFO:rank0: validate on val set...
INFO:rank0: Validation, Class Loss=nan, Reg Loss=0.0 (without scaling)
INFO:rank1: Done validation
INFO:rank0: Done validation
INFO:rank0: End of Validation 0/30, Validation Loss=nan, Class Loss=nan, Reg Loss=0.0
INFO:rank0:
Total samples: 1240.000000
Overall Acc: 0.694367
Mean Acc: 0.062500
FreqW Acc: 0.482146
Mean IoU: 0.043398
Class IoU:
        class 0: 0.6943672262601215
        class 1: 0.0
        class 2: 0.0
        class 3: 0.0
        class 4: 0.0
        class 5: 0.0
        class 6: 0.0
        class 7: 0.0
        class 8: 0.0
        class 9: 0.0
        class 10: 0.0
        class 11: 0.0
        class 12: 0.0
        class 13: 0.0
        class 14: 0.0
        class 15: 0.0
Class Acc:
        class 0: 0.9999999999999951
        class 1: 0.0
        class 2: 0.0
        class 3: 0.0
        class 4: 0.0
        class 5: 0.0
        class 6: 0.0
        class 7: 0.0
        class 8: 0.0
        class 9: 0.0
        class 10: 0.0
        class 11: 0.0
        class 12: 0.0
        class 13: 0.0
        class 14: 0.0
        class 15: 0.0

INFO:rank0: [!] Checkpoint saved.
Warning: NaN or Inf found in input tensor.
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, lr = 0.009699
INFO:rank0: Epoch 1, Batch 10/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 20/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 30/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 40/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 50/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 60/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 70/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 80/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 90/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 100/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 110/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 120/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 130/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 140/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 150/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 160/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 170/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 180/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 190/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 200/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 210/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 220/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 230/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 240/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 250/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 260/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 270/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 280/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 290/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 300/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 310/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 320/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 330/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 340/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Batch 350/351, Loss=nan
Warning: NaN or Inf found in input tensor.
INFO:rank0: Epoch 1, Class Loss=nan, Reg Loss=0.0
INFO:rank0: End of Epoch 1/30, Average Loss=nan, Class Loss=nan, Reg Loss=0.0
Warning: NaN or Inf found in input tensor.
Warning: NaN or Inf found in input tensor.
INFO:rank0: validate on val set...
INFO:rank1: Done validation
INFO:rank0: Validation, Class Loss=nan, Reg Loss=0.0 (without scaling)
INFO:rank0: Done validation
INFO:rank0: End of Validation 1/30, Validation Loss=nan, Class Loss=nan, Reg Loss=0.0
INFO:rank0:
Total samples: 1240.000000
Overall Acc: 0.694367
Mean Acc: 0.062500
FreqW Acc: 0.482146
Mean IoU: 0.043398
Class IoU:
        class 0: 0.6943672262601215
        class 1: 0.0
        class 2: 0.0
        class 3: 0.0
        class 4: 0.0
        class 5: 0.0
        class 6: 0.0
        class 7: 0.0
        class 8: 0.0
        class 9: 0.0
        class 10: 0.0
        class 11: 0.0
        class 12: 0.0
        class 13: 0.0
        class 14: 0.0
        class 15: 0.0
Class Acc:
        class 0: 0.9999999999999951
        class 1: 0.0
        class 2: 0.0
        class 3: 0.0
        class 4: 0.0
        class 5: 0.0
        class 6: 0.0
        class 7: 0.0
        class 8: 0.0
        class 9: 0.0
        class 10: 0.0
        class 11: 0.0
        class 12: 0.0
        class 13: 0.0
        class 14: 0.0
        class 15: 0.0

A small question about the data in Table 1

image

image

I think the results on classes 1-15 should be the same for the single-step addition of five classes (15-5) and the multi-step addition of five classes (15-1).
Shouldn't the two settings differ only in the last five classes?
You cite the paper "Incremental Learning of Object Detectors without Catastrophic Forgetting".
I checked my understanding against that article. Did I misunderstand it, or is this a mistake in the article?
image
Thanks in advance for your reply. The idea in the article is great!

The division of the test data set.

Hello,
Thanks for your interesting work. I am applying your method to other datasets, and I ran into a problem with dataset partitioning. Should all test data be used for each task of incremental learning? How should the test set be divided? Is it possible to split the test set using ade-split.ipynb?

About reproducing "Disjoint 15-5s" setting

Hi, I ran into a problem when reproducing the 15-5s task.
The final model (21 classes) gives the following performance:

Class IoU:
        class 0: 0.8489214
        class 1: 0.16607599
        class 2: 0.287244
        class 3: 0.33080766
        class 4: 0.08697148
        class 5: 0.37604603
        class 6: 0.0072344807
        class 7: 0.2163677
        class 8: 0.6778649
        class 9: 0.010483054
        class 10: 0.28229013
        class 11: 0.08285885
        class 12: 0.54856247
        class 13: 0.5060702
        class 14: 0.26826012
        class 15: 0.79161125
        class 16: 0.00043867234
        class 17: 0.22526477
        class 18: 0.13574767
        class 19: 0.14906816
        class 20: 0.2198876
Class Acc:
        class 0: 0.9055752
        class 1: 0.1661803
        class 2: 0.45422554
        class 3: 0.33121628
        class 4: 0.08705099
        class 5: 0.38315275
        class 6: 0.0072452705
        class 7: 0.21683867
        class 8: 0.6815347
        class 9: 0.010607969
        class 10: 0.301625
        class 11: 0.083156176
        class 12: 0.59717894
        class 13: 0.5174191
        class 14: 0.27290416
        class 15: 0.86344343
        class 16: 0.0004472701
        class 17: 0.70130014
        class 18: 0.41352695
        class 19: 0.9158315
        class 20: 0.64600176

I used the following commands:

CUDA_VISIBLE_DEVICES=1 python -m torch.distributed.launch --nproc_per_node=1 run.py \
  --batch_size 10 \
  --dataset voc \
  --name MIB_lr0-01_for_step0_lr0-001_for_step1 \
  --task 15-5s \
  --step 0 \
  --lr 0.01 \
  --epochs 30 \
  --method MiB \
  --opt_level O1

CUDA_VISIBLE_DEVICES=1 python -m torch.distributed.launch --nproc_per_node=1 run.py \
  --batch_size 10 \
  --dataset voc \
  --name MIB_lr0-01_for_step0_lr0-001_for_step1 \
  --task 15-5s \
  --step 1 \
  --lr 0.001 \
  --epochs 30 \
  --method MiB \
  --opt_level O1

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run.py \
  --batch_size 10 \
  --dataset voc \
  --name MIB_lr0-01_for_step0_lr0-001_for_step1 \
  --task 15-5s \
  --step 2 \
  --lr 0.001 \
  --epochs 30 \
  --method MiB \
  --opt_level O1

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run.py \
  --batch_size 10 \
  --dataset voc \
  --name MIB_lr0-01_for_step0_lr0-001_for_step1 \
  --task 15-5s \
  --step 3 \
  --lr 0.001 \
  --epochs 30 \
  --method MiB \
  --opt_level O1

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run.py \
  --batch_size 10 \
  --dataset voc \
  --name MIB_lr0-01_for_step0_lr0-001_for_step1 \
  --task 15-5s \
  --step 4 \
  --lr 0.001 \
  --epochs 30 \
  --method MiB \
  --opt_level O1

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 run.py \
  --batch_size 10 \
  --dataset voc \
  --name MIB_lr0-01_for_step0_lr0-001_for_step1 \
  --task 15-5s \
  --step 5 \
  --lr 0.001 \
  --epochs 30 \
  --method MiB \
  --opt_level O1

Visualizing predictions on the val set (ADE20K)

Hi, I have run the code with the 50-50-50 setting on ADE20K.
I would like to know how to visualize the predictions on the validation set. If you have code for this, could you share it?

Thank you for your work. I look forward to your reply.

To Reproduce

Dataset: ADE20K
Setting: 50-50-50

How do I interpret the logs or TensorBoard output?

Hello @fcdl94, I really liked your research work, and I am trying to reproduce the results.
I tried the 19-1, 15-5, and 15-5s splits on PASCAL VOC. The commands shown below are for the 19-1 split.
Could you also please confirm whether these default settings are enough to reproduce the results in Table 1 of your paper (Mean IoU on the Pascal-VOC 2012 dataset for different incremental class learning scenarios)?

#disjoint
python -m torch.distributed.launch --nproc_per_node=4 run.py --data_root /ssd_scratch/cvit/dksingh/ --batch_size 6 --dataset voc --name MiB --task 19-1 --step 0 --lr 0.01 --epochs 30 --method MiB --logdir /ssd_scratch/cvit/dksingh/mib_logs/
python -m torch.distributed.launch --nproc_per_node=4 run.py --data_root /ssd_scratch/cvit/dksingh/ --batch_size 6 --dataset voc --name MiB --task 19-1 --step 1 --lr 0.001 --epochs 30 --method MiB --logdir /ssd_scratch/cvit/dksingh/mib_logs/

#overlap
python -m torch.distributed.launch --nproc_per_node=4 run.py --data_root /ssd_scratch/cvit/dksingh/ --batch_size 6 --dataset voc --name MiB --task 19-1 --step 0 --lr 0.01 --epochs 30 --method MiB --logdir /ssd_scratch/cvit/dksingh/mib_logs/ --overlap
python -m torch.distributed.launch --nproc_per_node=4 run.py --data_root /ssd_scratch/cvit/dksingh/ --batch_size 6 --dataset voc --name MiB --task 19-1 --step 1 --lr 0.001 --epochs 30 --method MiB --logdir /ssd_scratch/cvit/dksingh/mib_logs/ --overlap

I was wondering how to interpret the TensorBoard logs or the general log file.
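
For the general log file, one option is to pull the per-batch loss values out with a regular expression. The sketch below assumes the line format seen in the training logs pasted in the NaN-loss issue on this page (`Epoch E, Batch B/N, Loss=x`). `parse_losses` and `first_non_finite` are hypothetical helpers, the second sample line is made up for illustration, and TensorBoard event files are not covered.

```python
import math
import re

# Matches e.g. "INFO:rank0: Epoch 0, Batch 10/351, Loss=nan";
# only the numeric token (or nan/inf) after "Loss=" is captured.
LOSS_LINE = re.compile(
    r"Epoch (\d+), Batch (\d+)/(\d+), Loss=(nan|inf|-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def parse_losses(lines):
    """Return a list of (epoch, batch, loss) tuples found in the log lines."""
    records = []
    for line in lines:
        match = LOSS_LINE.search(line)
        if match:
            epoch, batch, _total, loss = match.groups()
            records.append((int(epoch), int(batch), float(loss)))
    return records


def first_non_finite(records):
    """Return (epoch, batch) of the first NaN/Inf loss, or None if all are finite."""
    for epoch, batch, loss in records:
        if not math.isfinite(loss):
            return epoch, batch
    return None


sample = [
    "INFO:rank0: Epoch 0, lr = 0.010000",
    "INFO:rank0: Epoch 0, Batch 10/351, Loss=1.2345",  # made-up value
    "INFO:rank0: Epoch 0, Batch 20/351, Loss=nan",
    "Warning: NaN or Inf found in input tensor.",
]
records = parse_losses(sample)
print(len(records), "loss lines; first non-finite at", first_non_finite(records))
```

The resulting list can be plotted or scanned to see when the loss stops being finite.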

Batch size, GPU number, performance without pretrained model for ADE20k training?

First, I want to thank you for your work! When I try to reproduce the ADE20k results, I run out of memory on two RTX 2080 Ti GPUs. Could you provide the batch size and the number of GPUs used for ADE20k training?

I could not find the batch size or the number of GPUs used for the experiments in the arXiv version of the paper. I think both are critical for reproducing the results and validating the paper's conclusions.

  • Key questions:

    1. What batch size did you use for ADE20k and VOC training?
    2. Which GPUs did you use for training and testing?
    3. Could you provide the training time for your experiments?
    4. Could you provide the final performance without the pretrained model from the Inplace-ABN repo?
    5. Could you provide the step-0 performance of MiB on VOC and ADE when training with the pretrained model?
  • Exp Command:
    command: CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name test_MIB_ade_100_50_lr_0.01_no_pretrained --task 100-50 --lr 0.01 --epochs 30 --method MiB --no_pretrained

  • Error Log:

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name test_MIB_ade_100_50_lr_0.01_no_pretrained --task 100-50 --lr 0.01 --epochs 30 --method MiB --no_pretrained

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
INFO:rank1: Device: cuda:1
Filtering images...
INFO:rank0: [!] starting logging at directory ./logs/100-50-ade/test_MIB_ade_100_50_lr_0.01_no_pretrained/
INFO:rank0: Device: cuda:0
        0/2000 ...
Filtering images...
        0/2000 ...
        1000/2000 ...
        1000/2000 ...
Filtering images...
        0/2000 ...
Filtering images...
        0/2000 ...
        1000/2000 ...
        1000/2000 ...
INFO:rank0: Dataset: ade, Train set: 13452, Val set: 2000, Test set: 2000, n_classes 101
INFO:rank0: Total batch size is 24
INFO:rank0: Backbone: resnet101
INFO:rank0: [!] Model made without pre-trained
Selected optimization level O0:  Pure FP32 training.

Defaults for this optimization level are:
enabled                : True
opt_level              : O0
cast_model_type        : torch.float32
patch_torch_functions  : False
keep_batchnorm_fp32    : None
master_weights         : False
loss_scale             : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O0
cast_model_type        : torch.float32
patch_torch_functions  : False
keep_batchnorm_fp32    : None
master_weights         : False
loss_scale             : 1.0
INFO:rank0: [!] Train from scratch
INFO:rank0: tensor([[50]])
INFO:rank1: tensor([[50]])
INFO:rank0: Epoch 0, lr = 0.010000
Traceback (most recent call last):
  File "run.py", line 390, in <module>
    main(opts)
  File "run.py", line 277, in main
    train_loader=train_loader, scheduler=scheduler, logger=logger)
  File "/home/jovyan/MiB/train.py", line 128, in train
    scaled_loss.backward()
  File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.18 GiB (GPU 1; 10.73 GiB total capacity; 9.31 GiB already allocated; 371.56 MiB free; 243.46 MiB cached)
Traceback (most recent call last):
  File "run.py", line 390, in <module>
    main(opts)
  File "run.py", line 277, in main
    train_loader=train_loader, scheduler=scheduler, logger=logger)
  File "/home/jovyan/MiB/train.py", line 128, in train
    scaled_loss.backward()
  File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.18 GiB (GPU 0; 10.73 GiB total capacity; 9.31 GiB already allocated; 373.56 MiB free; 241.46 MiB cached)
Traceback (most recent call last):
  File "/opt/conda/envs/MiB/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/envs/MiB/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/distributed/launch.py", line 246, in <module>
    main()
  File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/distributed/launch.py", line 242, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/envs/MiB/bin/python', '-u', 'run.py', '--local_rank=1', '--data_root', 'data', '--batch_size', '12', '--dataset', 'ade', '--name', 'test_MIB_ade_100_50_lr_0.01_no_pretrained', '--task', '100-50', '--lr', '0.01', '--epochs', '30', '--method', 'MiB', '--no_pretrained']' returned non-zero exit status 1.

Difference in the mean IoU on test dataset

Hello,

Thanks for the good work.

I have a few questions to ask.

  1. Is the step "1-15" under task "15-5" and step "1-15" under task "15-5s" the same? If yes, why is there a difference in the performance or mean IoU values in the paper?

  2. I performed the FT and MiB experiments, and the mean IoU is differing too much from the one in the paper. The difference between the command given by you and the one I am using is that I have a single CUDA device and reduced the batch size to 12.

Thanks and regards,
Sreeni...

Confused about data augmentation

In Subset.py, I saw that normalization is performed on both the input and the label. Why would normalization be applied to the label?

train_transform = transform.Compose([
        transform.RandomResizedCrop(opts.crop_size, (0.5, 2.0)),
        transform.RandomHorizontalFlip(),
        transform.ToTensor(),
        transform.Normalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225]),
    ])
sample, target = self.transform(sample, target)

From the official PyTorch documentation for the cross-entropy loss function, we know that the label is only used as an index.
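
A note on the question above: segmentation pipelines often pass the image and the label through the same `Compose` so that geometric transforms (crop, flip) stay aligned, while value transforms such as normalization touch only the image. The sketch below illustrates that pattern. `PairedCompose` and `PairedNormalize` are hypothetical names and the list-based image layout is made up for the example; whether the repository's `transform.Normalize` behaves this way is an assumption that should be checked against its source.

```python
class PairedNormalize:
    """Normalize the image channels and pass the label through untouched."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, image, label):
        # image: one flat list of floats per channel (hypothetical layout)
        normalized = [
            [(value - m) / s for value in channel]
            for channel, m, s in zip(image, self.mean, self.std)
        ]
        # The label is returned unchanged, so it can still be used as a class index.
        return normalized, label


class PairedCompose:
    """Apply each transform to the (image, label) pair in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, label):
        for transform in self.transforms:
            image, label = transform(image, label)
        return image, label


pipeline = PairedCompose([
    PairedNormalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image = [[0.485, 1.0], [0.456, 0.0], [0.406, 0.5]]
label = [3, 15]
out_image, out_label = pipeline(image, label)
```

With this design the call site looks like `sample, target = self.transform(sample, target)`, even though only `sample` is rescaled.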

loss calculation

In the file loss.py, under class UnbiasedCrossEntropy(nn.Module) line 102
outputs[:, old_cl:] = inputs[:, old_cl:] - den.unsqueeze(dim=1)
inputs holds the raw logits of the network, and den is the logsumexp of those logits. To my understanding, inputs should first go through log and exp before being combined with den.
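
For reference, subtracting the logsumexp from a raw logit already gives a log-probability, because log(exp(x_i) / sum_j exp(x_j)) = x_i - logsumexp(x). The sketch below checks this identity numerically with the standard library only. The helper names and the example logits are made up for illustration, and it does not reproduce the rest of `UnbiasedCrossEntropy`.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax(values):
    """Log-probabilities computed as logit minus logsumexp."""
    den = logsumexp(values)
    return [v - den for v in values]


logits = [2.0, -1.0, 0.5]

# Route 1: subtract the logsumexp from the raw logits.
via_identity = log_softmax(logits)

# Route 2: exponentiate, normalize, then take the log (the textbook definition).
total = sum(math.exp(v) for v in logits)
via_definition = [math.log(math.exp(v) / total) for v in logits]
```

Both routes give the same values, so no extra log or exp is needed before the subtraction.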

ADE-split.ipynb question

Hello, I want to ask about the logic used to split the data: how is each image assigned to an incremental learning step? Sorry to bother you. Thanks.

# This is the code that actually makes the split.
all_added = set()

imgs={}
others = {}
every=set()
added = [0 for i in range(151)]
for i in range(sets+1):
    others[i]=set()
    imgs[i]=set()

# This is the most important function.
# It computes the score of the image for being assigned to a class.
def score(ratio,imgs,expected,minval=False,w1=120,w2=1000,w3=1 ):
    return (1.-ratio)*w1+w3*minval

for i in imgs_to_cls.keys(): # loop for every image in the dataset
    # img_counts = num of images per class for the class in the actual image  
    img_counts = [class_members[j] for j in imgs_to_cls[i]] 
    # ratio = images assigned to the class / total number of images for the class  
    ratios = [(added[j]+0.0)/class_members[j] for j in imgs_to_cls[i]]
    added_counts = [added[j] for j in imgs_to_cls[i]]
    # assignments = set of each class in the image
    assignments = [map_class_to_set[j] for j in imgs_to_cls[i]]

    scores = [score(ratios[c],added_counts[c],class_members[j],1./(img_counts[c]/sum(img_counts))) for c,j in enumerate(imgs_to_cls[i])]

    # take the highest scorer class
    cl = scores.index(max(scores))
    # take the set of the higher scorer class
    a=assignments[cl]

    # add the image to the step images
    imgs[a].update([i])
    for j,ac in enumerate(assignments):
        if ac==a:
            # increment the number of images assigned to the classes in the current image contained in the same step.
            added[imgs_to_cls[i][j]] += 1


for i in range(1,len(added)):        
        assignment = map_class_to_set[i]
        ratios = [len(set(idxs[i]).intersection(imgs[j]))/class_members[i] for j in range(0,sets+1)]
        if ratios[assignment]<0.5 or (ratios[assignment]<1. and added[i]<100):
            print(i,ratios[assignment], sum(ratios[1:]), class_members[i], len(set(idxs[i]).intersection(imgs[assignment])))
s=0
for i in range(sets+1):
    print(len(imgs[i]))
    s+=len(imgs[i])

print(s)
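
The assignment rule in the loop above can be condensed into a small standalone sketch. It assumes the same scoring as the code above (`(1 - ratio) * w1 + w3 * rarity`, where rarity is the inverse share of the class among the image's classes). `pick_step` and the toy dictionaries are hypothetical and only meant to show which step an image lands in.

```python
def pick_step(classes_in_image, class_members, added, map_class_to_set, w1=120, w3=1):
    """Return the step of the highest-scoring class present in the image."""
    total = sum(class_members[c] for c in classes_in_image)
    scores = []
    for c in classes_in_image:
        # Fraction of this class's images that have already been assigned.
        ratio = added[c] / class_members[c]
        # Rare classes get a large value: 1 / (count / total).
        rarity = total / class_members[c]
        scores.append((1.0 - ratio) * w1 + w3 * rarity)
    best_class = classes_in_image[scores.index(max(scores))]
    return map_class_to_set[best_class]


# Toy data: class 1 is frequent and belongs to step 0, class 2 is rare and belongs to step 1.
class_members = {1: 1000, 2: 10}
map_class_to_set = {1: 0, 2: 1}

# Nothing assigned yet: the rare class wins, so the image goes to step 1.
step_when_empty = pick_step([1, 2], class_members, {1: 0, 2: 0}, map_class_to_set)

# The rare class already has all its images: the frequent class wins instead.
step_when_rare_full = pick_step([1, 2], class_members, {1: 0, 2: 10}, map_class_to_set)
```

In words: an image goes to the step of whichever of its classes is rarest and still least covered, and each image is placed in exactly one step.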

About the performance in disjoint Pascal-Voc (19-1) task.

Hi, thank you for creating such an innovative and wonderful incremental semantic segmentation method.
In the paper, the mean IoU of MiB in the disjoint Pascal-VOC (19-1) task is 69.6 (1-19), 25.6 (20), 67.4 (all). When I try to reproduce it, the final mean IoU of MiB (step 1) in the same task is 62.7 (1-19), 17.3 (20), 60.5 (all), so I want to know whether there is a problem with my MiB (step 0) model. Could you please show me the mean IoU of your MiB (step 0) model in the disjoint Pascal-VOC (19-1) task?

Here is the test result of my MiB (step 0) model after 30 epochs of training:

Train set: 10034, Val set: 1421, Test set: 1421, n_classes 20
INFO:rank0: *** Test the model on all seen classes...
INFO:rank0: *** Model restored from checkpoints/step/19-1-voc_MIB_original_0.pth
Total samples: 1422.000000
Overall Acc: 0.937206
Mean Acc: 0.850237
FreqW Acc: 0.888375
Mean IoU: 0.753881
Class IoU:
class 0: 0.9282959437197884
class 1: 0.8882853094595112
class 2: 0.3797590436288108
class 3: 0.8807409395997082
class 4: 0.6648712821388697
class 5: 0.7976358668707264
class 6: 0.9164809865812081
class 7: 0.8817972918369146
class 8: 0.9234520147664763
class 9: 0.32377577234982474
class 10: 0.8422507677142984
class 11: 0.5292598617495181
class 12: 0.8956875294271032
class 13: 0.8407520660676776
class 14: 0.848592612779058
class 15: 0.8356916493750883
class 16: 0.5890588805790536
class 17: 0.8099400266733182
class 18: 0.4580267122561858
class 19: 0.8432569294818675
Class Acc:
class 0: 0.966766716830572
class 1: 0.960997648920115
class 2: 0.8587795396354625
class 3: 0.9328072965429103
class 4: 0.8608227242951523
class 5: 0.9055918146656152
class 6: 0.9412850562654307
class 7: 0.9158580847049574
class 8: 0.9614393245122251
class 9: 0.47560739230891674
class 10: 0.9135487146581773
class 11: 0.5875200166947372
class 12: 0.9519971945124238
class 13: 0.9047293200139962
class 14: 0.9272278961100133
class 15: 0.8997701656461145
class 16: 0.7169984272644204
class 17: 0.867543489475999
class 18: 0.5452584289443756
class 19: 0.9101945156506736

And the test result when my MiB(step1) was trained after 30 epochs:

Train set: 548, Val set: 74, Test set: 1449, n_classes 21
INFO:rank0: *** Test the model on all seen classes...
INFO:rank0: *** Model restored from checkpoints/step/19-1-voc_MIB_original_1.pth
Total samples: 1450.000000
Overall Acc: 0.890334
Mean Acc: 0.725162
FreqW Acc: 0.820754
Mean IoU: 0.618920 # 0.6411918301227864 0-19 20 0.1734
Class IoU:
class 0: 0.8938328066648968
class 1: 0.7782195086558429
class 2: 0.37033196721590445
class 3: 0.8413915510994053
class 4: 0.525415475349433
class 5: 0.6729141125705496
class 6: 0.20963013074814107
class 7: 0.7363044189891342
class 8: 0.8777653860912165
class 9: 0.29350207126893696
class 10: 0.8145631162334964
class 11: 0.49090401230842257
class 12: 0.8398493348474195
class 13: 0.7777190817231865
class 14: 0.683588348834814
class 15: 0.7999431095131695
class 16: 0.555339335033059
class 17: 0.7869103135756841
class 18: 0.4080286080402448
class 19: 0.4676839136927719
class 20: 0.17347901289607445
Class Acc:
class 0: 0.9573831275998825
class 1: 0.7941349223562392
class 2: 0.7405488472008162
class 3: 0.939724801161348
class 4: 0.5830666957769701
class 5: 0.7291240593393206
class 6: 0.21006324595180248
class 7: 0.7776654179327952
class 8: 0.9586466681798896
class 9: 0.40889428771720887
class 10: 0.8555168166200787
class 11: 0.5290773778328225
class 12: 0.9181100424143835
class 13: 0.8976137666936727
class 14: 0.7100771395812419
class 15: 0.8855347468407907
class 16: 0.702358914737102
class 17: 0.8942199044152891
class 18: 0.4998755740006467
class 19: 0.47593462075060633
class 20: 0.7608375943271062

I run the following commands to train the MiB in disjoint Pascal-Voc (19-1) task.

python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name MIB_original --task 19-1 --step 0 --lr 0.001 --epochs 30 --method MiB
python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset voc --name MIB_original --task 19-1 --step 1 --lr 0.001 --epochs 30 --method MiB
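
The per-group figures quoted above (1-19, 20, all) can be derived from a "Class IoU" listing with a small helper like the sketch below. It assumes class 0 is the background and is left out of the group means unless requested, which is a guess about the reporting convention and not something confirmed in this thread. `group_mean_iou` and the example values are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def group_mean_iou(class_iou, first_new_class, include_background=False):
    """Return (old, new, all) mean IoU from a per-class IoU list.

    class_iou[0] is assumed to be the background class.
    Classes with index >= first_new_class count as newly added.
    """
    start = 0 if include_background else 1
    old = class_iou[start:first_new_class]
    new = class_iou[first_new_class:]
    everything = class_iou[start:]
    return mean(old), mean(new), mean(everything)


# Toy example: background, two old classes, one new class.
example_iou = [0.9, 0.5, 0.7, 0.1]
old_miou, new_miou, all_miou = group_mean_iou(example_iou, first_new_class=3)
```

For the 19-1 listing above, the call would be `group_mean_iou(values, first_new_class=20)` with the 21 per-class values in order.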

I want to use standard BN

Hello, I want to use standard batch normalization instead of In-place ABN. Which settings should I change? So far I have only changed norm_act to std in argparser.py, but I get the error torch.nn.modules.module.ModuleAttributeError: 'BatchNorm2d' object has no attribute 'activation'.
Thanks for your help.

about dataset

Hi, how are the things going on with you?
I have a question about an error I get when running run.py: ValueError: Wrong image_set entered! Please use image_set="train" or image_set="trainval" or image_set="val". The dataset is Pascal VOC 2012.
I think this error is caused by the dataset_root setting or another data-loading setting. I do not understand the README instructions for downloading the dataset in detail. Could you explain the dataset_root setting, or show the expected directory layout after the dataset is downloaded?
The attached image shows the directory layout where I put the dataset.
I would be very grateful if you could tell me the solution. Looking forward to your reply.
[Attached image: fig1]

About running command related issues

Thank you for your great work, MiB!

Following your instructions, I have configured the environment and run the command:
python -m torch.distributed.launch --nproc_per_node=0 run.py --data_root data --batch_size 12 --dataset voc --name MIB --task 15-5 --step 1 --lr 0.001 --epochs 30 --method MIB

But the experiment could not run and there was no error. The following is the result of my command run:
run_command

The following is my dataset:
File_path

Could you please tell me is there any data path that needs to be set in advance or are there other variables that need to be set before experimenting?
Thank you for your feedback.

Can not implement run.py

Hi, I am facing the problem below:

python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name LWF --task 100-50 --step 0 --lr 0.01 --epochs 60 --method LWF
/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

FutureWarning,
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


INFO:rank1: Device: cuda:1
Traceback (most recent call last):
  File "run.py", line 390, in <module>
    main(opts)
  File "run.py", line 116, in main
    logger = Logger(logdir_full, rank=rank, debug=opts.debug, summary=opts.visualize, step=opts.step)
  File "/home/cuong69/Desktop/MiB-master/utils/logger.py", line 15, in __init__
    import tensorboardX
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/tensorboardX/__init__.py", line 5, in <module>
    from .torchvis import TorchVis
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/tensorboardX/torchvis.py", line 11, in <module>
    from .writer import SummaryWriter
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/tensorboardX/writer.py", line 15, in <module>
    from .event_file_writer import EventFileWriter
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/tensorboardX/event_file_writer.py", line 28, in <module>
    from .proto import event_pb2
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/tensorboardX/proto/event_pb2.py", line 15, in <module>
    from tensorboardX.proto import summary_pb2 as tensorboardX_dot_proto_dot_summary__pb2
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/tensorboardX/proto/summary_pb2.py", line 15, in <module>
    from tensorboardX.proto import tensor_pb2 as tensorboardX_dot_proto_dot_tensor__pb2
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/tensorboardX/proto/tensor_pb2.py", line 15, in <module>
    from tensorboardX.proto import resource_handle_pb2 as tensorboardX_dot_proto_dot_resource__handle__pb2
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/tensorboardX/proto/resource_handle_pb2.py", line 22, in <module>
    serialized_pb=_b('\n(tensorboardX/proto/resource_handle.proto\x12\x0ctensorboardX"r\n\x13ResourceHandleProto\x12\x0e\n\x06\x64\x65vice\x18\x01 \x01(\t\x12\x11\n\tcontainer\x18\x02 \x01(\t\x12\x0c\n\x04name\x18\x03 \x01(\t\x12\x11\n\thash_code\x18\x04 \x01(\x04\x12\x17\n\x0fmaybe_type_name\x18\x05 \x01(\tB/\n\x18org.tensorflow.frameworkB\x0eResourceHandleP\x01\xf8\x01\x01\x62\x06proto3')
TypeError: __new__() got an unexpected keyword argument 'serialized_options'
Filtering images...
0/2000 ...
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1651457 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1651456) of binary: /home/cuong69/anaconda3/envs/plop/bin/python
Traceback (most recent call last):
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/cuong69/anaconda3/envs/plop/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

run.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-06-15_09:57:39
host : aaa-Z490-AORUS-MASTER
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1651456)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

I think it is related to a version conflict. My GPU is an RTX 3090, so I must use CUDA 11.3.
Please help me solve the problem. Thank you!
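
Regarding the launcher warning quoted above ("change it to read from os.environ['LOCAL_RANK']"), a script can accept both conventions. The sketch below is one way to do that. `resolve_local_rank` is a hypothetical helper, not part of run.py, and it only concerns the FutureWarning. It does not address the tensorboardX TypeError in the traceback.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return the local rank from --local_rank if given, else from LOCAL_RANK, else 0."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", type=int, default=None)
    # parse_known_args ignores the script's other options instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is not None:
        # Old style: torch.distributed.launch passes --local_rank=N on the command line.
        return args.local_rank
    # New style: torchrun exports LOCAL_RANK in the environment.
    return int(environ.get("LOCAL_RANK", 0))


# Old-style invocation, as in the failing command's arguments.
rank_from_flag = resolve_local_rank(["--local_rank=1", "--data_root", "data"], {})

# New-style invocation, where only the environment variable is set.
rank_from_env = resolve_local_rank(["--data_root", "data"], {"LOCAL_RANK": "3"})
```

Calling `resolve_local_rank()` with no arguments reads the real command line and environment.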

about pretrain model

Hi,
I want to reproduce your work,
but I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'pretrained/resnet101_iabn_sync.pth.tar'
Can you tell me how to get this file?

Inconsistent 15-5s VOC results with the reported ones

Hi, I am trying to reproduce the 15-5s results with the voc dataset. However, the results (see the table below) I got are significantly (5 points) higher than those reported in your paper.

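| step | mIoU | background | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor | base | novel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 80.37% | 94.36% | 90.62% | 41.06% | 89.66% | 70.50% | 82.79% | 94.69% | 89.50% | 93.99% | 47.16% | 86.11% | 54.31% | 90.43% | 87.54% | 86.42% | 86.84% | - | - | - | - | - | 80.37% | - |
| 1 | 74.93% | 92.30% | 89.18% | 40.96% | 85.45% | 70.35% | 79.40% | 93.12% | 86.16% | 91.80% | 44.55% | 84.41% | 52.69% | 89.12% | 82.97% | 83.02% | 84.83% | 23.53% | - | - | - | - | 78.14% | 23.53% |
| 2 | 58.70% | 90.79% | 63.33% | 33.44% | 73.98% | 56.07% | 77.29% | 76.04% | 75.58% | 79.69% | 11.87% | 35.73% | 37.31% | 80.49% | 62.46% | 78.60% | 82.57% | 15.51% | 25.89% | - | - | - | 63.45% | 20.70% |
| 3 | 55.12% | 84.07% | 61.58% | 35.32% | 66.56% | 45.26% | 68.80% | 75.12% | 74.11% | 81.62% | 14.12% | 47.14% | 47.08% | 83.13% | 70.48% | 62.91% | 83.62% | 3.38% | 27.31% | 15.69% | - | - | 62.56% | 15.46% |
| 4 | 39.80% | 84.85% | 29.00% | 26.06% | 51.45% | 28.52% | 57.93% | 50.61% | 58.18% | 62.76% | 2.69% | 34.24% | 22.81% | 66.81% | 53.43% | 29.02% | 78.51% | 0.40% | 23.19% | 13.31% | 22.23% | - | 46.05% | 14.78% |
| 5 | 34.29% | 83.12% | 22.23% | 14.02% | 40.29% | 21.00% | 53.62% | 10.21% | 37.06% | 72.87% | 0.74% | 31.99% | 34.20% | 72.61% | 52.94% | 18.75% | 76.92% | 0.08% | 28.18% | 12.10% | 16.76% | 20.31% | 40.16% | 15.49% |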

For your reference, I run step 0 and step [1-5] with the commands below.

step 0

CUDA_VISIBLE_DEVICES=1,2,3 python -m torch.distributed.launch --nproc_per_node=3 run.py \
                --data_root path/to/data --method MiB --dataset voc --task 15-5s \
                --step 0 --overlap --lr 0.01 --batch_size 8 --epochs 30 --name MiB

step 1-5

for step in {1..5};
do
CUDA_VISIBLE_DEVICES=1,2,3 python -m torch.distributed.launch --nproc_per_node=3 run.py \
                --data_root path/to/data --method MiB --dataset voc --task 15-5s \
                --step ${step} --overlap --lr 0.001 --batch_size 8 --epochs 30 --name MiB
done

Note that I made no modifications to your code. I hope you can help check whether my results are correct.

Project dependencies may have API risk issues

Hi, In MiB, inappropriate dependency versioning constraints can cause risks.

Below are the dependencies and version constraints that the project is using

absl-py==0.8.0
apex==0.1
apturl==0.5.2
asn1crypto==0.24.0
astor==0.8.0
attrs==19.1.0
Automat==0.6.0
backcall==0.1.0
bleach==3.1.0
Brlapi==0.6.6
certifi==2018.1.18
chardet==3.0.4
click==6.7
colorama==0.3.7
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==2.1.4
cupshelpers==1.0
cvxpy==1.0.25
cycler==0.10.0
decorator==4.4.0
defer==1.0.6
defusedxml==0.6.0
dill==0.3.1.1
distro-info===0.18ubuntu0.18.04.1
ecos==2.0.7.post1
entrypoints==0.3
future==0.17.1
gast==0.3.1
google-pasta==0.1.7
grpcio==1.23.0
h5py==2.10.0
httplib2==0.9.2
hyperlink==17.3.1
idna==2.6
incremental==16.10.1
inplace-abn==1.0.7
ipykernel==5.1.2
ipython==7.8.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.15.1
Jinja2==2.10.1
joblib==0.11
jsonschema==3.0.2
jupyter==1.0.0
jupyter-client==5.3.3
jupyter-console==6.0.0
jupyter-core==4.5.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
keyring==10.6.0
keyrings.alt==3.0
kiwisolver==1.1.0
language-selector==0.1
launchpadlib==1.10.6
lazr.restfulclient==0.13.5
lazr.uri==1.0.3
louis==3.5.0
macaroonbakery==1.1.3
Mako==1.0.7
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.1.1
mistune==0.8.4
multiprocess==0.70.9
nbconvert==5.6.0
nbformat==4.4.0
netifaces==0.10.4
nose==1.3.7
notebook==6.0.1
numpy==1.17.2
oauth==1.0.1
olefile==0.45.1
osqp==0.6.1
PAM==0.4.2
pandocfilters==1.4.2
parso==0.5.1
pexpect==4.7.0
pickleshare==0.7.5
Pillow==6.1.0
pluggy==0.6.0
prometheus-client==0.7.1
prompt-toolkit==2.0.9
protobuf==3.9.1
ptyprocess==0.6.0
py==1.5.2
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycairo==1.16.2
pycrypto==2.6.1
pycups==1.9.73
Pygments==2.4.2
pygobject==3.26.1
pymacaroons==0.13.0
PyNaCl==1.1.2
pyOpenSSL==17.5.0
pyparsing==2.4.2
pyRFC3339==1.0
pyrsistent==0.15.4
pyserial==3.4
pytest==3.3.2
python-apt==1.6.4
python-dateutil==2.8.0
python-debian==0.1.32
pytz==2018.3
pyxdg==0.25
PyYAML==3.12
pyzmq==18.1.0
qtconsole==4.5.5
reportlab==3.4.0
requests==2.18.4
requests-unixsocket==0.1.5
scikit-learn==0.19.1
scipy==1.3.1
screen-resolution-extra==0.0.0
scs==2.1.1.post2
SecretStorage==2.3.1
Send2Trash==1.5.0
service-identity==16.0.0
simplegeneric==0.8.1
simplejson==3.13.2
six==1.12.0
ssh-import-id==5.7
system-service==0.3
systemd-python==234
tensorboard==1.14.0
tensorboardX==1.8
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0
termcolor==1.1.0
terminado==0.8.2
testpath==0.4.2
torch==1.2.0
torchvision==0.4.0
tornado==6.0.3
traitlets==4.3.2
Twisted==17.9.0
ubuntu-drivers-common==0.0.0
ufw==0.36
urllib3==1.22
usb-creator==0.3.3
wadllib==1.3.2
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.6
widgetsnbextension==3.5.1
wrapt==1.11.2
xkit==0.0.0
zope.interface==4.3.2

The version constraint == will introduce the risk of dependency conflicts because the scope of dependencies is too strict.
The version constraint No Upper Bound and * will introduce the risk of the missing API Error because the latest version of the dependencies may remove some APIs.

After further analysis, in this project,
The version constraint of dependency multiprocess can be changed to ==0.70.4.
The version constraint of dependency multiprocess can be changed to >=0.70.4,<=0.70.4.
The version constraint of dependency Pillow can be changed to ==9.2.0.
The version constraint of dependency Pillow can be changed to >=2.0.0,<=9.1.1.
The version constraint of dependency pyasn1 can be changed to >=0.4.1,<=0.4.8.

The above modification suggestions can reduce the dependency conflicts as much as possible,
and introduce the latest version as much as possible without calling Error in the projects.

The invocation of the current project includes all the following methods.

The calling methods from the multiprocess
logger.debug
logger.info
The calling methods from the Pillow
PIL.Image.open
The calling methods from the pyasn1
open
The calling methods from the all methods
torch.log_softmax
utils.logger.Logger.info
self._mean.reshape
self.n_classes.mask.label_pred.int.mask.label_true.astype.self.n_classes.np.bincount.reshape.sum
dict
dataset.transform.Resize
self.mod2
any
metrics.StreamSegMetrics
slice
range.keys
self.licarl.item
x.split
join
argparser.modify_command_options
model.named_parameters
torchvision.transforms.functional.adjust_contrast
self.book.clear
utils.loss.KnowledgeDistillationLoss
os.makedirs
self.lambd
torch.nn.Linear
p.grad.detach
torch.cat
metrics.update
p.grad.detach.pow
torch.nn.functional.binary_cross_entropy_with_logits
mask.float
torch.no_grad
utils.logger.Logger.add_image
self.mod5
torch.exp
NotImplementedError
VOCSegmentation
torch.randint
os.path.join
os.path.isdir
numpy.load
n.self.model_old_dict.p.detach.pow
model.eval
tbl.items
matplotlib.pyplot.subplots
type
torch.utils.data.DataLoader
targets.sum.loss.torch.masked_select.sum
RW
mask.label_true.astype
self.regularizer.load_state_dict
inputs.shape.labels_new.F.one_hot.float
train.Trainer
torch.nn.functional.pad
tasks.get_task_labels
self.fisher_old.items
n.self.score.to
random.random
numpy.random.choice
list.append
warnings.warn
random.uniform
Compose
images.detach.cpu.numpy
self.IncrementalSegmentationModule.super.__init__
key.self.fisher_old.to
denorm
self.convs
targets.sum.loss.torch.masked_select.mean
os.path.isfile
class_loss.torch.tensor.to
torch.nn.CrossEntropyLoss
torch.log
module.named_children
matplotlib.use
self.PolyLR.super.__init__
torch.optim.lr_scheduler.StepLR.state_dict
self.GlobalAvgPool2d.super.__init__
torch.distributed.get_rank
utils.Denormalize
self._fast_hist
p.clone
numpy.concatenate
utils.PolyLR
model.module.init_new_classifier
torch.nn.functional.pad.view
self.classes.torch.FloatTensor.torch.log.to
target.label2color.transpose.astype
loss.mean.mean
torch.nn.functional.one_hot
numpy.diag.sum
torchvision.transforms.functional.resized_crop
torch.nn.Conv2d.append
labels.cpu.numpy.cpu
self._global_pooling
self.lde_loss
utils.filter_images
torch.logsumexp.unsqueeze
apex.amp.scale_loss
modules.GlobalAvgPool2d
n.self.model_temp.to
math.sqrt
optim.zero_grad
torchvision.transforms.functional.adjust_saturation
idxs_path.np.load.tolist
logging.basicConfig
self.get_score
utils.loss.UnbiasedKnowledgeDistillationLoss
self.head
fig.tight_layout
torch.nn.functional.pad.repeat
p.clone.detach.cpu
self.regularizer.update
numpy.bincount
scaled_loss.backward
p.torch.clone.detach
argparse.ArgumentParser
t
torch.nn.functional.avg_pool2d
self.red_bn
torch.optim.lr_scheduler.StepLR.load_state_dict
torch.distributed.reduce
label2color
self.score.items
convert_bn2gn
train.Trainer.load_state_dict
transform
mod
epoch_loss.torch.tensor.to
cls.bias.data.copy_
images.to.detach
torch.nn.GroupNorm.add_module
self.DeeplabV3.super.__init__
self.model.named_parameters
m
samples.cpu.numpy
torch.nn.functional.nll_loss
self._transform_tag
torch.optim.SGD.load_state_dict
score.items
f.readlines
modules.DeeplabV3
logger.debug
torch.nn.functional.leaky_relu
utils.logger.Logger
idxs.append
opts.backbone.models.__dict__
self.convs.add_
apex.parallel.DistributedDataParallel.state_dict
inputs.size
norm_act
m.eval
torch.tensor
lt.flatten
numpy.random.seed
torch.softmax
x.dim
vars
int
AdeSegmentation
metrics.synch
apex.parallel.DistributedDataParallel.cuda
metrics.reset
apex.parallel.DistributedDataParallel.parameters
torchvision.transforms.functional.crop
inputs.shape.targets.shape.range.x.x.torch.tensor.to
__all__.append
apex.amp.initialize
n.self.model_old_dict.p.pow.n.self.score_actual.sum
sorted
utils.logger.Logger.add_results
torch.zeros_like
mat.max
n.self.model_old_dict.p.n.self.fisher_old.sum
tasks.get_task_list
logger.info
PIL.Image.open
p.clone.detach
self.ResNet.super.__init__
torch.from_numpy
torch.nn.functional.interpolate
torchvision.transforms.functional.rotate
self.body
math.log
dataset.transform.RandomResizedCrop
numpy.mean
lbl.label2color.transpose
self.logger.add_image
sample.apply_
TypeError
random.seed
round
fil
inputs.shape.labels_new.F.one_hot.float.permute
float
logging.info
modules.ResidualBlock
model_old.state_dict
tensorboardX.SummaryWriter
utils.logger.Logger.add_table
torch.nn.MSELoss
train_loader.sampler.set_epoch
prediction.cpu.numpy
math.exp
labels.cpu.numpy.to
lp.flatten
functools.partial
torch.nn.init.calculate_gain
voc_cmap
self.get_score.items
self.ResidualBlock.super.__init__
segmentation_module.make_model
inputs.narrow.narrow
x.idxs.append
self.order.index
self.lkd_loss
torch.nn.BCEWithLogitsLoss
labels.outputs.mean
train.Trainer.validate
torchvision.transforms.functional.pad
os.path.expanduser
labels.cpu.numpy
os.path.exists
self.get_params
setattr
torch.FloatTensor
get_dataset
IncrementalSegmentationModule
torch.masked_select
optim.step
repr
freq.iu.freq.freq.sum
v.to
torchvision.transforms.functional.adjust_hue
zip
torch.load
par.to
torch.arange
apex.parallel.DistributedDataParallel.fix_bn
cityscapes_cmap
self.fisher.items
torch.sum
images.to.to
dataset
argparser.get_argparser.parse_args
dataset.transform.Compose
utils.loss.BCEWithLogitsLossWithIgnoreIndex
torch.optim.lr_scheduler.StepLR
torch.nn.init.constant_
apex.parallel.DistributedDataParallel.load_state_dict
utils.logger.Logger.print
utils.logger.Logger.add_figure
dataset.transform.CenterCrop
random.randint
min
self.confusion_matrix_to_fig
ax.figure.colorbar
torch.nn.MaxPool2d
torchvision.transforms.functional.adjust_brightness
SegmentationModule
focal_loss.mean
FocalLoss
self._check_input
self.red_conv
util.try_index
images.detach.cpu
torchvision.transforms.functional.to_tensor
self.classifier
torchvision.transforms.functional.center_crop
self.map_bn
self.regularizer.penalty.item
self.cls.bias.data.copy_
torch.cuda.manual_seed
self.confusion_matrix.astype
p.detach
argparse.ArgumentParser.add_argument
str
utils.logger.Logger.debug
self.__strip_zero
list
x.view
torch.distributed.get_world_size
train.Trainer.train
outputs.max
prediction.cpu.numpy.cpu
transforms.append
self.global_pooling_conv
torch.isinf
self.logger.add_text
model.modules
torch.utils.data.random_split
self.total_samples.torch.tensor.to.cpu
mat.min
filter
train.Trainer.state_dict
ax.imshow
apex.parallel.DistributedDataParallel.to
tasks.get_per_task_classes
len
self.proj_bn
utils.get_regularizer
n.self.model_old_dict.p.pow
reg_loss.torch.tensor.to
torchvision.transforms.functional.hflip
utils.Label2Color
self.bn1
self.device.n.self.model_temp.to.p.detach.pow
torch.nn.GroupNorm
ade_cmap
Lambda
numpy.array
outputs.narrow
lbl.label2color.transpose.astype
self._std.reshape
self.model_old
torch.nn.functional.cross_entropy
RuntimeError
criterion
model
self.delta.items
self.mod1
model.state_dict
normalize_fn
params.append
par.torch.clone.to
self.logger.add_figure
self.get
models.util.try_index
ValueError
enumerate
torch.nn.Conv2d
self.mod3
self.info
model.head.parameters
img.denorm.astype
numpy.unique
model.train
n.self.score_plus_fisher.mean
random.shuffle
all
labels.remove
self.pool_red_conv
results.items
opts.backbone.models.__dict__.load_state_dict
p.to
fisher.items
inputs.shape.labels_new.F.one_hot.float.permute.clone
main
self.transform
self.book.get
format
_NETS.items
utils.logger.Logger.close
torch.sigmoid
range
dataset.transform.RandomHorizontalFlip
task_dict.keys
ax.set
self.logger.close
self.convs.clone
PI
callable
loss.mean.item
n.self.model_old_dict.p.pow.n.self.score_plus_fisher.sum
isinstance
torch.save
EWC
copy.deepcopy
torch.nn.Sequential
torch.nn.init.xavier_normal_
self.lde_loss.item
dropout
utils.Subset
self.total_samples.torch.tensor.to
self.modules
logger.add_scalar
apex.parallel.DistributedDataParallel.eval
outputs_no_bgk.labels.sum
open
self.FocalLoss.super.__init__
self.confusion_matrix.torch.tensor.to.cpu
torchvision.transforms.functional.resize
p.torch.clone.detach.cpu
utils.loss.IcarlLoss
blocks.append
self.confusion_matrix.sum
mask.float.mean
torch.manual_seed
utils.color_map
torch.nn.ModuleList
image_set.rstrip
self.IdentityResidualBlock.super.__init__
numpy.zeros
self.mod4
t.apply_
dataset.transform.ToTensor
self.global_pooling_bn
x.size.x.size.x.view.mean
self._network.append
utils.loss.UnbiasedCrossEntropy
torch.index_select
argparser.get_argparser
model.cls.parameters
torch.isnan
self._stride_dilation
cls.weight.data.copy_
torch.tensor.to
self.licarl
torchvision.transforms.functional.vflip
x.size
numpy.save
torch.cuda.set_device
utils.logger.Logger.add_scalar
numpy.diag
res.values
self.n_classes.mask.label_pred.int.mask.label_true.astype.self.n_classes.np.bincount.reshape
inputs.shape.torch.tensor.to
torch.logsumexp
collections.OrderedDict
torch.utils.data.distributed.DistributedSampler
self._network
super.__init__
torch.clone
os.listdir
FileNotFoundError
torch.where
torch.nn.functional.elu
torchvision.transforms.Lambda
bitget
new_bias.squeeze
self.confusion_matrix.torch.tensor.to
self.proj_conv
self.regularizer.state_dict
self.regularizer.penalty
metrics.StreamSegMetrics.to_str
metrics.get_results
save_ckpt
apex.parallel.DistributedDataParallel
torch.optim.SGD.state_dict
torch.device
inputs.shape.labels_new.F.one_hot.float.permute.sum
self.model_old.state_dict
mask.float.sum
max
in_size.in_size.inputs.view.mean
hasattr
logging.error
torch.mean
numpy.zeros.astype
self.add_module
ret_samples.append
focal_loss.sum
scheduler.step
apex.parallel.DistributedDataParallel.train
self.lkd_loss.item
tasks_voc.keys
dataset.transform.Normalize
n.self.score_old.to
torch.distributed.barrier
torch.distributed.init_process_group
index.self.images.Image.open.convert
functools.reduce
torch.optim.SGD
self.target_transform
torchvision.transforms.functional.normalize
confusion_matrix.cpu.numpy
target.label2color.transpose
torch.ones_like
inputs.view
model.body.parameters
self.logger.add_scalar
p.torch.clone.detach.to
super
self.reset_parameters
print
tuple

@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.

Question about the validation result in step 1, 15-5s

Hello. I am running 15-5s on the VOC dataset.
During validation in step 1 (after step 0), my result looks like this:

Class IoU:
class 0: 0.8864838886464639
class 1: X
class 2: X
class 3: X
class 4: X
class 5: X
class 6: X
class 7: X
class 8: X
class 9: X
class 10: X
class 11: X
class 12: X
class 13: X
class 14: X
class 15: X
class 16: 0.5655430208625937
Class Acc:
class 0: 0.9604431435003239
class 1: X
class 2: X
class 3: X
class 4: X
class 5: X
class 6: X
class 7: X
class 8: X
class 9: X
class 10: X
class 11: X
class 12: X
class 13: X
class 14: X
class 15: X
class 16: 0.6672904492468744

Is this an appropriate result?
All of the old classes are X.
If it is not, could you give me some advice?
Thank you :)
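
One possible reading of the X entries, offered as an assumption and not as a confirmed description of this repository's metric code: some segmentation metrics print a placeholder for any class that has no ground-truth pixels in the evaluated set, since IoU and accuracy are undefined there. The sketch below shows that convention on a toy confusion matrix; the helper `per_class_iou` and the matrix values are hypothetical.

```python
import numpy as np

def per_class_iou(confusion):
    """Per-class IoU; classes with no ground-truth pixels are reported as "X".

    confusion[i, j] counts pixels whose true class is i and predicted class is j.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    gt_pixels = confusion.sum(axis=1)    # ground-truth pixels per class
    pred_pixels = confusion.sum(axis=0)  # predicted pixels per class
    true_pos = np.diag(confusion)
    union = gt_pixels + pred_pixels - true_pos
    result = {}
    for cls in range(confusion.shape[0]):
        if gt_pixels[cls] == 0:
            result[cls] = "X"            # class absent from the labels: undefined
        else:
            result[cls] = true_pos[cls] / union[cls]
    return result

# Toy 3-class example (hypothetical values): class 1 never appears in the labels.
toy = [[8, 0, 2],
       [0, 0, 0],
       [1, 0, 9]]
iou = per_class_iou(toy)
for cls, value in iou.items():
    print(f"class {cls}: {value}")
```

Under this convention, an X would say that the class is missing from the validation labels, not that the model scored zero on it. Whether that applies here depends on how the validation set for the step is built.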

Different result in disjoint 15-5s setting.

Hello,
I am trying to reproduce the disjoint 15-5s setting, but my result is very different from yours.

My command is :
/home/nayoung/nayoung/MiB/run.py --data_root '/home/nayoung/nayoung/' --batch_size 10 --dataset voc --name MIB --task 15-5s --step 0 --lr 0.01 --epochs 30 --method MiB
For steps 1 to 5:
/home/nayoung/nayoung/MiB/run.py --data_root '/home/nayoung/nayoung/' --batch_size 10 --dataset voc --name MIB --task 15-5s --step 5 --lr 0.001 --epochs 30 --method MiB

I used batch size 10 because of CUDA memory limits, and I didn't use the pretrained model.
I also set loss_kd=100.

class         step 0     step 1     step 2     step 3     step 4     step 5     class mIoU
background    0.857241   0.822046   0.812532   0.523217   0.424163   0.303423   -
aeroplane     0.596404   0.571647   0.53895    0.503853   0.464728   0.404756   0.51339
bicycle       0.249615   0.246822   0.238265   0.216371   0.215501   0.196714   0.227215
bird          0.489829   0.475479   0.418296   0.287688   0.285088   0.210973   0.361226
boat          0.336007   0.322084   0.236745   0.198159   0.162308   0.101944   0.226208
bottle        0.254114   0.202237   0.17652    0.151194   0.139302   0.115709   0.173179
bus           0.694971   0.607911   0.540683   0.494373   0.465628   0.366374   0.528324
car           0.631736   0.599167   0.536196   0.503627   0.475487   0.38747    0.522281
cat           0.539938   0.515914   0.477998   0.455402   0.407798   0.39362    0.465112
chair         0.124421   0.123029   0.089279   0.093359   0.062629   0.044943   0.08961
cow           0.380302   0.300344   0.28096    0.119011   0.131808   0.073729   0.214359
diningtable   0.230107   0.230606   0.100062   0.123516   0.035045   0.031481   0.125136
dog           0.470491   0.447299   0.383524   0.33748    0.331196   0.310951   0.380157
horse         0.438303   0.413728   0.36603    0.289346   0.272611   0.23618    0.336033
motorbike     0.5446194  0.5464546  0.5146568  0.5154741  0.4768657  0.4594278  0.5095831
person        0.615698   0.603189   0.589685   0.565243   0.551702   0.545644   0.578527
pottedplant   -          0.06315    0.056601   0.049754   0.04413    0.04088    0.050903
sheep         -          -          0.065537   0.061803   0.06458    0.061531   0.063363
sofa          -          -          -          0.035291   0.030589   0.026771   0.030884
train         -          -          -          -          0.110248   0.092094   0.101171
tvmonitor     -          -          -          -          -          0.020551   0.020551

1-15 : 0.350022
16-20 : 0.053374
all : 0.27586
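
The three summary figures appear to be plain averages of the per-class mIoU values posted above, with background excluded. The sketch below recomputes them under that assumption; the variable names are illustrative and the numbers are copied from the post.

```python
# Per-class mIoU values from the post, in class order
# aeroplane ... tvmonitor (background is not included).
class_miou = [
    0.51339, 0.227215, 0.361226, 0.226208, 0.173179,
    0.528324, 0.522281, 0.465112, 0.08961, 0.214359,
    0.125136, 0.380157, 0.336033, 0.5095831, 0.578527,  # classes 1-15
    0.050903, 0.063363, 0.030884, 0.101171, 0.020551,   # classes 16-20
]

old_classes = class_miou[:15]  # learned in step 0
new_classes = class_miou[15:]  # added one per step in steps 1-5

mean_old = sum(old_classes) / len(old_classes)
mean_new = sum(new_classes) / len(new_classes)
mean_all = sum(class_miou) / len(class_miou)

print(f"1-15 : {mean_old:.6f}")
print(f"16-20: {mean_new:.6f}")
print(f"all  : {mean_all:.6f}")
```

The recomputed means agree with the reported 1-15, 16-20 and all figures to within rounding in the last digit, so the summary is consistent with the per-class values.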

Code for generating data splits

Hi,

Could you please share the code for generating the data splits under the data/ folder? The default splits (19-1, 15-5, 15-5s, 100-50) are convenient, but it would be good to have the code because I am interested in some different splits.

BTW, the VOC 15-5s step-0 split seems wrong. I can use the one from 15-5 instead, since the two should be the same.

Thanks.
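
As a rough illustration of what such a generator could look like, here is a minimal sketch. It is not the repository's code. It assumes that the overlapped protocol keeps every image with at least one pixel of a class learned in the current step, and that the disjoint protocol additionally drops images containing classes that have not been seen yet. The function `build_split` and the `image_labels` mapping are hypothetical names.

```python
def build_split(image_labels, old_classes, new_classes, overlapped):
    """Select image ids for one incremental step.

    image_labels: dict mapping image id -> set of class ids present in the
                  image (background and ignore labels already removed).
    old_classes:  class ids learned in previous steps.
    new_classes:  class ids learned in the current step.
    overlapped:   True for the overlapped protocol, False for disjoint.
    """
    old_classes, new_classes = set(old_classes), set(new_classes)
    seen = old_classes | new_classes
    selected = []
    for image_id, labels in image_labels.items():
        if not labels & new_classes:
            continue  # no pixel of a class learned in this step
        if overlapped or labels <= seen:
            selected.append(image_id)
    return sorted(selected)

# Toy example with hypothetical image ids: step 1 of a 15-1 style task.
image_labels = {
    "a": {1, 16},   # old class + new class
    "b": {16, 17},  # new class + a class from a future step
    "c": {2},       # old class only
    "d": {16},      # new class only
}
old = range(1, 16)
new = [16]
overlapped_ids = build_split(image_labels, old, new, overlapped=True)
disjoint_ids = build_split(image_labels, old, new, overlapped=False)
print(overlapped_ids, disjoint_ids)
```

This sketch does not enforce that each image is used in only one step, and it will not necessarily reproduce the split files shipped under data/.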

Question about how to install inplace-abn

Thanks for your great work!
How do I install the corresponding version of Inplace-ABN on Windows?
'pip install inplace-abn' doesn't work. Neither does cloning the package with 'git clone' and then running 'python setup.py install'.
I look forward to your reply.
