isl-org / lang-seg
Language-Driven Semantic Segmentation
License: MIT License
I am trying to run the zero-shot demo. I compiled and installed torch-encoding with gcc 7.5.
(lang-seg) [zhongzm@ai_gpu28 lang-seg]$ python -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset fss \
> --widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 \
> --weights checkpoints/fss_l16.ckpt
Traceback (most recent call last):
File "test_lseg_zs.py", line 8, in <module>
from modules.lseg_module_zs import LSegModuleZS
File "/public/home/zhongzm/project/lang-seg/modules/lseg_module_zs.py", line 7, in <module>
from .lsegmentation_module_zs import LSegmentationModuleZS
File "/public/home/zhongzm/project/lang-seg/modules/lsegmentation_module_zs.py", line 13, in <module>
from encoding.models import get_segmentation_model
File "/public/home/zhongzm/anaconda3/envs/lang-seg/lib/python3.8/site-packages/encoding/__init__.py", line 13, in <module>
from . import nn, functions, parallel, utils, models, datasets, transforms
File "/public/home/zhongzm/anaconda3/envs/lang-seg/lib/python3.8/site-packages/encoding/nn/__init__.py", line 12, in <module>
from .encoding import *
File "/public/home/zhongzm/anaconda3/envs/lang-seg/lib/python3.8/site-packages/encoding/nn/encoding.py", line 18, in <module>
from ..functions import scaled_l2, aggregate, pairwise_cosine
File "/public/home/zhongzm/anaconda3/envs/lang-seg/lib/python3.8/site-packages/encoding/functions/__init__.py", line 2, in <module>
from .encoding import *
File "/public/home/zhongzm/anaconda3/envs/lang-seg/lib/python3.8/site-packages/encoding/functions/encoding.py", line 15, in <module>
from encoding import cpu
ImportError: /public/home/zhongzm/anaconda3/envs/lang-seg/lib/python3.8/site-packages/encoding/cpu.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jS2_
After reinstalling the environment, I then get this error instead:
File "/public/home/zhongzm/anaconda3/envs/lang-seg/lib/python3.8/site-packages/encoding/functions/encoding.py", line 17, in <module>
from encoding import gpu
ImportError: cannot import name 'gpu' from partially initialized module 'encoding' (most likely due to a circular import) (/public/home/zhongzm/anaconda3/envs/lang-seg/lib/python3.8/site-packages/encoding/__init__.py)
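For reference, the undefined-symbol and circular-import errors above typically mean the compiled torch-encoding extension was built against a different PyTorch/ABI than the one currently installed. A minimal diagnostic sketch (this is an assumption about the cause, not an official fix):

```python
import torch

# runtime PyTorch / CUDA versions the compiled extension has to match
print(torch.__version__, torch.version.cuda)

try:
    # these are the compiled extensions that fail to load in the tracebacks above
    from encoding import cpu, gpu  # noqa: F401
    print("torch-encoding extensions load fine")
except ImportError as err:
    print("ABI mismatch -- rebuild torch-encoding against this PyTorch:", err)
```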
Hi,
I have a question on the difference of settings between the demo in your README and the experiment in your paper.
In the README, you published the pre-trained weights for the demo, and it says the backbones for both image and text are ViT-L/16 during training.
Section 5.1 of your paper says: "We used LSeg with DPT and a smaller ViT-B/32 backbone together with the CLIP ViT-B/32 text encoder ..."
When reproducing your results in 5.1, does that require training from scratch with a ViT-B/32 image backbone?
Also, are there any other differences, such as batch size? More specifically, how do I change the arguments in train.sh?
Finally, is it possible to share with us (or me) the weight used for your results?
Thank you in advance.
model = _load_state(cls, checkpoint, strict=strict, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 158, in _load_state
obj = cls(**_cls_kwargs)
File "/teamspace/studios/this_studio/lang-seg/modules/lseg_module.py", line 55,
There is a flag --finetune_weights in modules/lseg_module.py, but I do not see where it is used. Can it be used directly to fine-tune a pre-trained model on a new dataset?
Hi, thanks for open-sourcing the code.
I have a quick question:
What's the reason for choosing DPT as the image encoder?
What should I note if I want to use other encoders (e.g., HR-Net)?
I am running windows and I have issues with installing this project, specifically for the torch encoding package.
Some of the primary errors include:
error: ninja: error: loading 'build.ninja': The system cannot find the file specified.
and
Error building extension 'enclib_cpu'
These errors usually come together with a long list of other errors, presumably in dependencies. When I tried to build the package via Docker, similar issues arose as well.
Things I have tried:
I am running on a Windows 10 machine.
Are there any fixes or guides to get lang-seg to work under these circumstances?
I cannot download the torch-encoding library. When running the lseg_app.py file, I encounter the following error:
File "/jinx/language-drive-seg/lang-seg-main/data/__init__.py", line 17, in <module>
import encoding.datasets as enc_ds
ModuleNotFoundError: No module named 'encoding'
I found that it is likely due to the torch-encoding library missing from the dependencies. After attempting to install it with the command, I encountered an error as shown in the screenshot. Could you please advise on how to resolve this issue?
Hi, I saw in the paper that the text feature dimension is N, but N is not fixed across images/label sets, so how can the spatial regularization structure at the end be designed to handle a variable N?
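One common, N-agnostic way to build such a block (a minimal sketch, not necessarily how this repo implements its depthwise/bottleneck blocks) is to fold the N class channels into the batch dimension, so the same small convolutional block applies to any label-set size:

```python
import torch
import torch.nn as nn

class ClassAgnosticSpatialBlock(nn.Module):
    """Sketch: a small conv block applied independently to each of the N
    class-score channels, so it works for any label-set size N."""
    def __init__(self, hidden=8):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(1, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):                 # x: (B, N, H, W) pixel-text scores
        b, n, h, w = x.shape
        x = x.reshape(b * n, 1, h, w)     # fold classes into the batch dim
        x = self.block(x)
        return x.reshape(b, n, h, w)

scores = torch.randn(2, 5, 32, 32)                   # e.g. N = 5 labels
print(ClassAgnosticSpatialBlock()(scores).shape)     # torch.Size([2, 5, 32, 32])
```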
I want to use a ResNet-based LSeg and I did the following:
Generally, I added an elif branch in _make_encoder() that returns a resnet101, and modified the dimensions in _make_scratch to [256, 512, 1024, 2048]. I also replaced forward_vit in lseg_net.py with a vanilla ResNet forward (returning the 4-stage outputs). With these changes I could start training, but could not get the expected performance.
I might have plugged ResNet in wrongly or missed some points. Are there any demos of a ResNet-based LSeg, and are there any ResNet pre-trained weights for LSeg? Thanks!
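For reference, a 4-stage extractor along the lines described above (a sketch with a plain torchvision ResNet-101; names here are illustrative, and note that the repo's clip_resnet101 is CLIP's modified ResNet with a different stem and attention pooling, which may partly explain a performance gap):

```python
import torch
import torchvision

def make_resnet101_backbone():
    """Sketch of a 4-stage ResNet-101 feature extractor that could stand in
    for forward_vit; stage channel dims are [256, 512, 1024, 2048]."""
    net = torchvision.models.resnet101()   # load ImageNet/CLIP weights as needed

    def forward_stages(x):
        x = net.relu(net.bn1(net.conv1(x)))
        x = net.maxpool(x)
        l1 = net.layer1(x)    # (B,  256, H/4,  W/4)
        l2 = net.layer2(l1)   # (B,  512, H/8,  W/8)
        l3 = net.layer3(l2)   # (B, 1024, H/16, W/16)
        l4 = net.layer4(l3)   # (B, 2048, H/32, W/32)
        return l1, l2, l3, l4

    return net, forward_stages

net, forward_stages = make_resnet101_backbone()
feats = forward_stages(torch.randn(1, 3, 224, 224))
print([f.shape[1] for f in feats])   # [256, 512, 1024, 2048]
```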
If I use eight GPUs, in addition to setting the batch size to 8, do I need to modify other parameters such as --num_nodes and the learning rate?
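Not repo-specific, but the usual heuristic when moving to more GPUs is the linear scaling rule: track the effective batch size (per-GPU batch × GPUs × accumulate_grad_batches) and scale the base learning rate proportionally. A toy sketch, where the reference batch size is an assumption rather than a value taken from this repo:

```python
def scaled_lr(base_lr, per_gpu_batch, num_gpus, accumulate_grad_batches=1, ref_batch=16):
    """Linear LR scaling heuristic; ref_batch is an assumed reference setting."""
    effective_batch = per_gpu_batch * num_gpus * accumulate_grad_batches
    return base_lr * effective_batch / ref_batch

print(scaled_lr(0.004, per_gpu_batch=8, num_gpus=8, accumulate_grad_batches=2))  # 0.032
```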
Hi, I would like to try your code, but an error shows up when I try to install pytorch-encoding. Could you give us your environment info (CUDA, GPU, Python, g++, and so on)? Do you have any advice about installing it? I see a lot of people hitting this error.
My env:
OS: Ubuntu 18.04
gcc: 7.5.0
GPU:3090
driver:515
CUDA: tried 11.7 and 10.2
pytorch: tried 1.12 and 1.7.1
Thanks for your excellent research.
I have a question about LSeg's zero-shot setting.
Was LSeg trained in an inductive zero-shot setting?
What is the difference between the inductive zero-shot setting and the language-driven semantic segmentation setting during training?
The settings for zero-shot semantic segmentation are confusing. Please help me.
Thank you for reading.
Hi, I have a reproduction issue for Table 5. In the paper, the LSeg with ViT-B/32 backbone achieves 79.7 pixAcc and 37.8 mIoU. However, I only get 78.9 pixAcc and 33.7 mIoU by using the released code. The reproduced pixAcc/mIoU are not as expected.
Our reproduction command is as follows on 8 GPU cards.
python -u train_lseg.py --dataset ade20k --data_path datasets --batch_size 4 --exp_name lseg_ade20k_b32_240e --base_lr 0.004 --weight_decay 1e-4 --no-scaleinv --max_epochs 240 --widehead --accumulate_grad_batches 2 --backbone clip_vitb32_384
So what is the reason for the performance gap? I may have missed some detailed settings.
By the way, I encountered a warning when running the released code.
[W reducer.cpp:283] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance. grad.sizes() = [512, 768, 1, 1], strides() = [768, 1, 768, 768]
Have you ever seen this warning? Could the performance gap be caused by it?
Look forward to your reply.
Hi! Thanks for the good work!
I met a problem when I tried to run test.sh. The error says, "cannot import name 'Resize' from 'utils'". I have checked utils.py, and there is no function or class called 'Resize'. Is the code missing this part?
Looking forward to your reply.
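For context, the missing import is presumably a joint image/mask resize helper; a minimal placeholder (the name and behavior are assumptions, not the repo's actual implementation) could look like:

```python
import torch.nn.functional as F

class Resize:
    """Hypothetical stand-in for utils.Resize: jointly resizes an image tensor
    (bilinear) and its label mask (nearest) to a fixed (height, width)."""
    def __init__(self, size):
        self.size = size

    def __call__(self, image, mask):
        image = F.interpolate(image.unsqueeze(0), size=self.size,
                              mode="bilinear", align_corners=False).squeeze(0)
        mask = F.interpolate(mask[None, None].float(), size=self.size,
                             mode="nearest").squeeze().long()
        return image, mask
```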
Hi, could you please provide the range of the learning rate, or other hyper-parameter settings for the zero-shot experiments on the COCO-20i dataset? It is difficult to reproduce the results shown in the paper.
I use ViT-L/16 as backbone, and the results are 10 points lower than yours.
Hi! Thanks for the great work
I can't seem to import encoding after following the installation steps. The error I got is "cannot import name 'gpu' from partially initialized module 'encoding' (most likely due to a circular import)". Can you please let me know whether you know the cause of this issue? Thanks!
When I train the code, the multi-GPU allocation stops at the following location:
"
Resuming checkpoint None, exp_version=None
initializing ddp: GLOBAL_RANK: 1, MEMBER: 2/5
Resuming checkpoint None, exp_version=None
initializing ddp: GLOBAL_RANK: 4, MEMBER: 5/5
"
Is training actually running, or is something wrong?
Hi Boyi,
Thanks a lot for releasing the code for LSeg!
I’ve been having a play around using the demo code + model in a zero-shot setting and just have a few (hopefully quick) questions about some of the model parameters.
Could you please give a brief overview (description, where the default values originate, what the optimum values might be) of the following parameters used in the LSeg_MultiEvalModule:
1. ‘scales’ - e.g., lseg_app.py Line 315
2. (‘base_size’ - e.g., additional_utils/models.py Line 28)
3. ‘crop_size’ - e.g., additional_utils/models.py Line 29
And this parameter used in the LSegNet class:
5. scale_factor - e.g., module/models/lseg_net.py Line 216 (this has a default value of 0.5 and is different to the scale_factor parameter that is passed to 'Interpolate')
Thanks!
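For context, scales, base_size, and crop_size follow the usual PyTorch-Encoding multi-scale evaluation recipe: the image is resized by each scale relative to base_size, run through the network (optionally as sliding crop_size windows), and the class scores are averaged at the original resolution. A stripped-down sketch of that idea (not the repo's exact LSeg_MultiEvalModule, which also handles flipping and sliding-window crops):

```python
import torch
import torch.nn.functional as F

def multiscale_logits(model, image, scales=(0.75, 1.0, 1.25, 1.5), base_size=520):
    """Sketch: average class scores over several resized copies of the image."""
    _, _, h, w = image.shape
    total = 0.0
    for s in scales:
        long_side = int(base_size * s)                 # long side scaled by s
        ratio = long_side / max(h, w)
        size = (int(h * ratio), int(w * ratio))
        x = F.interpolate(image, size=size, mode="bilinear", align_corners=False)
        logits = model(x)                              # (B, N, *size)
        total = total + F.interpolate(logits, size=(h, w),
                                      mode="bilinear", align_corners=False)
    return total / len(scales)
```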
Hi! Can you please let me know what is the correct training configuration to reproduce the performance reported in the paper?
In the paper you mentioned that 6 GPUs were used and batch size was set to 6. Does this mean that I should just launch train.sh with 6 GPUs available? And can you please let me know what is the approximate time for training? Thanks!
Hi,
Thanks for open-sourcing such great work. I have some questions when using this code:
1. Does the test_lseg.py script support multi-GPU inference? When using a single GPU, it takes about 2~3 hours for inference on ADE20K.
2. I evaluated demo_e200.ckpt on ADE20K and got (pixAcc: 0.8078, mIoU: 0.3207); is that correct? It seems lower than the values in the paper.
3. I ran train.sh (backbone vit_l16_384) with 8*V100 but found it needs ~90 hours for training 240 epochs. Is that reasonable (it seems much longer than you said in #7)?
4. Regarding get_labels() in lseg_module.py: have you evaluated the mIoU on Cityscapes?
Thanks in advance.
Where should I download the pretrained model lseg_ade20k_l16.ckpt referenced in test.sh?
Hi! Thanks for your interesting work!
I am trying to reproduce the zero-shot experiments in the paper, but like #19 (comment), I get an mIoU much lower than yours.
Here are my scripts:
train_lseg_zs.py:
from modules.lseg_module_zs import LSegModuleZS
from utils import do_training, get_default_argument_parser
if __name__ == "__main__":
parser = LSegModuleZS.add_model_specific_args(get_default_argument_parser())
args = parser.parse_args()
do_training(args, LSegModuleZS)
command:
python -u train_lseg_zs.py --backbone clip_resnet101 --exp_name lsegzs_pascal_f0 --dataset pascal \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 --batch_size 8 \
Default arguments: base_lr=0.004, weight_decay=1e-4, momentum=0.9
I wonder where the problem is. And could you please share your training scripts for the zero-shot experiment?
Thanks for your great work. I would like to know how many GPUs are needed to train this model.
Hi,
Thanks for your great work. I believe the link to the zero-shot COCO fold 1 model is identical to fold 2, or did I miss anything?
Could you please take a look?
Many thanks,
Good day,
I was wondering whether the demo model available from the repo (demo_e200.ckpt) was solely trained on ADE20K as specified in the repo, or whether it was trained on all 7 datasets presented in section 5.2. This is unclear to me since the demo model works well with classes that are not covered by ADE20K, such as animals, which are covered by other datasets such as COCO.
In case it was trained on multiple datasets, I would like to know how to do so myself.
Thank you in advance.
System: 4xRTX3090.
Training script (the default):
python -u train_lseg.py --dataset ade20k --data_path ./datasets --batch_size 1 --exp_name lseg_ade20k_l16
--base_lr 0.004 --weight_decay 1e-4 --no-scaleinv --max_epochs 200 --widehead --accumulate_grad_batches 2 --backbone clip_vitl16_384
My system shows that one epoch takes about 45 minutes, which is quite long for 200 epochs. Is that normal, or do we perhaps not need that many epochs?
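For a quick sanity check of those numbers (plain arithmetic, not a statement about the intended schedule):

```python
minutes_per_epoch, epochs = 45, 200
total_hours = minutes_per_epoch * epochs / 60
print(total_hours, total_hours / 24)   # 150.0 hours, i.e. ~6.25 days on this setup
```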
Thank you very much for your innovative work.
I want to find the code that updates the image encoder in your model, but I cannot locate it, and this is very important to me. Like you, I also want to use CLIP's text encoder, but I get the error 'Trying to backward through the graph a second time'. There are two networks that need backpropagation; please tell me how you handled this.
Hi, @Boyiliee ,
Great work. I just followed your instructions to run the demo and it failed. The issue occurred when loading the model from the released checkpoint. I attach the errors below:
"super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted"
looking forward to your feedback.
Hi, the paper points out that the text encoder of CLIP is frozen during training. I wonder how this is achieved and where the corresponding code is. Thanks!
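For what it's worth, a common way to keep the CLIP text encoder frozen (a sketch using the standard OpenAI clip package, not necessarily how this repo does it) is to disable gradients on the text-side parameters and/or encode the labels under torch.no_grad():

```python
import clip
import torch

model, _ = clip.load("ViT-B/32", device="cpu")

# freeze the text-side parameters so the optimizer never updates them
for module in (model.transformer, model.token_embedding, model.ln_final):
    for p in module.parameters():
        p.requires_grad = False
model.positional_embedding.requires_grad = False
model.text_projection.requires_grad = False

# label embeddings can also simply be computed without tracking gradients
with torch.no_grad():
    text_features = model.encode_text(clip.tokenize(["dog", "grass", "other"]))
print(text_features.shape)   # torch.Size([3, 512])
```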
Congrats on your paper accepted to ICLR 2022!
Do you have your pretrained models on 4 folds of Pascal-5i and COCO-20i? Can you share them?
I really appreciate your response.
Hi thanks for the interesting work and demo!
I wrote train_lseg_zs.py based on train_lseg.py to train a zero-shot model; it gets mIoU = 28.36% (PASCAL fold 0, best val mIoU = 27.51% at epoch 0), versus the 52.8% reported in the paper. I have tested the pretrained model pascal_fold0.ckpt and get mIoU = 52.8%.
So I wonder how the model is trained? And could you please provide training scripts for the zero-shot experiment?
Here is my script:
train_lseg_zs.py
from modules.lseg_module_zs import LSegModuleZS
from utils import do_training, get_default_argument_parser
if __name__ == "__main__":
parser = LSegModuleZS.add_model_specific_args(get_default_argument_parser())
args = parser.parse_args()
do_training(args, LSegModuleZS)
command:
python -u train_lseg_zs.py --backbone clip_resnet101 --exp_name lsegzs_pascal_f0 --dataset pascal \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 --batch_size 8 \
Default arguments: base_lr=0.004, weight_decay=1e-4, momentum=0.9
Here is the log (Epoch01-05):
:=========== Few-shot Seg. with HSNet ===========
| logpath:
| benchmark: pascal
| bsz: 8
| fold: 0
| nshot: 0
| finetune_mode: False
:================================================
[Epoch: 00] [Batch: 0001/0125] L: 0.76545 Avg L: 0.76545 mIoU: 8.32 | FB-IoU: 13.67
*** Validation [@Epoch 00] Avg L: 0.77806 mIoU: 8.66 FB-IoU: 12.32 ***
[Epoch: 00] [Batch: 0001/1425] L: 0.70466 Avg L: 0.70466 mIoU: 5.47 | FB-IoU: 30.35
[Epoch: 00] [Batch: 0051/1425] L: 0.47214 Avg L: 0.54077 mIoU: 38.52 | FB-IoU: 58.87
[Epoch: 00] [Batch: 0101/1425] L: 0.51257 Avg L: 0.52989 mIoU: 41.43 | FB-IoU: 62.78
[Epoch: 00] [Batch: 0151/1425] L: 0.47782 Avg L: 0.51624 mIoU: 44.89 | FB-IoU: 65.67
[Epoch: 00] [Batch: 0201/1425] L: 0.44150 Avg L: 0.50688 mIoU: 47.24 | FB-IoU: 67.38
[Epoch: 00] [Batch: 0251/1425] L: 0.49717 Avg L: 0.49714 mIoU: 48.49 | FB-IoU: 68.80
[Epoch: 00] [Batch: 0301/1425] L: 0.45244 Avg L: 0.49022 mIoU: 49.80 | FB-IoU: 69.67
[Epoch: 00] [Batch: 0351/1425] L: 0.44908 Avg L: 0.48395 mIoU: 50.82 | FB-IoU: 70.51
[Epoch: 00] [Batch: 0401/1425] L: 0.47295 Avg L: 0.47964 mIoU: 51.95 | FB-IoU: 71.10
[Epoch: 00] [Batch: 0451/1425] L: 0.51765 Avg L: 0.47718 mIoU: 52.74 | FB-IoU: 71.66
[Epoch: 00] [Batch: 0501/1425] L: 0.46239 Avg L: 0.47528 mIoU: 53.60 | FB-IoU: 72.06
[Epoch: 00] [Batch: 0551/1425] L: 0.45335 Avg L: 0.47240 mIoU: 54.53 | FB-IoU: 72.64
[Epoch: 00] [Batch: 0601/1425] L: 0.49251 Avg L: 0.47043 mIoU: 54.87 | FB-IoU: 72.97
[Epoch: 00] [Batch: 0651/1425] L: 0.45890 Avg L: 0.46813 mIoU: 55.26 | FB-IoU: 73.36
[Epoch: 00] [Batch: 0701/1425] L: 0.41210 Avg L: 0.46570 mIoU: 55.87 | FB-IoU: 73.77
[Epoch: 00] [Batch: 0751/1425] L: 0.48672 Avg L: 0.46475 mIoU: 56.46 | FB-IoU: 74.08
[Epoch: 00] [Batch: 0801/1425] L: 0.43756 Avg L: 0.46335 mIoU: 57.12 | FB-IoU: 74.42
[Epoch: 00] [Batch: 0851/1425] L: 0.50632 Avg L: 0.46193 mIoU: 57.38 | FB-IoU: 74.69
[Epoch: 00] [Batch: 0901/1425] L: 0.42512 Avg L: 0.46019 mIoU: 57.81 | FB-IoU: 74.98
[Epoch: 00] [Batch: 0951/1425] L: 0.43964 Avg L: 0.45871 mIoU: 58.42 | FB-IoU: 75.25
[Epoch: 00] [Batch: 1001/1425] L: 0.45453 Avg L: 0.45688 mIoU: 58.87 | FB-IoU: 75.46
[Epoch: 00] [Batch: 1051/1425] L: 0.46058 Avg L: 0.45650 mIoU: 59.39 | FB-IoU: 75.68
[Epoch: 00] [Batch: 1101/1425] L: 0.44513 Avg L: 0.45552 mIoU: 59.83 | FB-IoU: 75.89
[Epoch: 00] [Batch: 1151/1425] L: 0.48968 Avg L: 0.45476 mIoU: 60.06 | FB-IoU: 76.05
[Epoch: 00] [Batch: 1201/1425] L: 0.37759 Avg L: 0.45366 mIoU: 60.34 | FB-IoU: 76.26
[Epoch: 00] [Batch: 1251/1425] L: 0.33930 Avg L: 0.45289 mIoU: 60.50 | FB-IoU: 76.39
[Epoch: 00] [Batch: 1301/1425] L: 0.39120 Avg L: 0.45196 mIoU: 60.78 | FB-IoU: 76.57
[Epoch: 00] [Batch: 1351/1425] L: 0.41460 Avg L: 0.45114 mIoU: 61.08 | FB-IoU: 76.73
[Epoch: 00] [Batch: 1401/1425] L: 0.44821 Avg L: 0.45055 mIoU: 61.29 | FB-IoU: 76.87
*** Training [@Epoch 00] Avg L: 0.44989 mIoU: 61.41 FB-IoU: 76.94 ***
[Epoch: 00] [Batch: 0001/0125] L: 0.44296 Avg L: 0.66636 mIoU: 8.40 | FB-IoU: 24.95
[Epoch: 00] [Batch: 0051/0125] L: 0.40932 Avg L: 0.43308 mIoU: 26.50 | FB-IoU: 57.01
[Epoch: 00] [Batch: 0101/0125] L: 0.43034 Avg L: 0.42625 mIoU: 27.23 | FB-IoU: 58.17
*** Validation [@Epoch 00] Avg L: 0.42385 mIoU: 27.51 FB-IoU: 58.78 ***
[Epoch: 01] [Batch: 0001/1425] L: 0.40329 Avg L: 0.44986 mIoU: 61.41 | FB-IoU: 76.94
[Epoch: 01] [Batch: 0051/1425] L: 0.43004 Avg L: 0.44835 mIoU: 61.71 | FB-IoU: 77.12
[Epoch: 01] [Batch: 0101/1425] L: 0.48569 Avg L: 0.44722 mIoU: 62.09 | FB-IoU: 77.29
[Epoch: 01] [Batch: 0151/1425] L: 0.34780 Avg L: 0.44597 mIoU: 62.34 | FB-IoU: 77.47
[Epoch: 01] [Batch: 0201/1425] L: 0.46758 Avg L: 0.44531 mIoU: 62.66 | FB-IoU: 77.64
[Epoch: 01] [Batch: 0251/1425] L: 0.44974 Avg L: 0.44444 mIoU: 62.86 | FB-IoU: 77.78
[Epoch: 01] [Batch: 0301/1425] L: 0.44508 Avg L: 0.44375 mIoU: 63.05 | FB-IoU: 77.90
[Epoch: 01] [Batch: 0351/1425] L: 0.44855 Avg L: 0.44306 mIoU: 63.22 | FB-IoU: 78.04
[Epoch: 01] [Batch: 0401/1425] L: 0.43203 Avg L: 0.44248 mIoU: 63.46 | FB-IoU: 78.18
[Epoch: 01] [Batch: 0451/1425] L: 0.45815 Avg L: 0.44203 mIoU: 63.72 | FB-IoU: 78.32
[Epoch: 01] [Batch: 0501/1425] L: 0.40439 Avg L: 0.44159 mIoU: 63.93 | FB-IoU: 78.45
[Epoch: 01] [Batch: 0551/1425] L: 0.38655 Avg L: 0.44125 mIoU: 64.11 | FB-IoU: 78.54
[Epoch: 01] [Batch: 0601/1425] L: 0.44867 Avg L: 0.44079 mIoU: 64.27 | FB-IoU: 78.62
[Epoch: 01] [Batch: 0651/1425] L: 0.39662 Avg L: 0.44018 mIoU: 64.45 | FB-IoU: 78.73
[Epoch: 01] [Batch: 0701/1425] L: 0.40490 Avg L: 0.43969 mIoU: 64.63 | FB-IoU: 78.82
[Epoch: 01] [Batch: 0751/1425] L: 0.49497 Avg L: 0.43916 mIoU: 64.80 | FB-IoU: 78.91
[Epoch: 01] [Batch: 0801/1425] L: 0.41001 Avg L: 0.43860 mIoU: 65.06 | FB-IoU: 79.03
[Epoch: 01] [Batch: 0851/1425] L: 0.47044 Avg L: 0.43825 mIoU: 65.24 | FB-IoU: 79.13
[Epoch: 01] [Batch: 0901/1425] L: 0.42717 Avg L: 0.43781 mIoU: 65.47 | FB-IoU: 79.21
[Epoch: 01] [Batch: 0951/1425] L: 0.46506 Avg L: 0.43753 mIoU: 65.62 | FB-IoU: 79.30
[Epoch: 01] [Batch: 1001/1425] L: 0.39906 Avg L: 0.43723 mIoU: 65.72 | FB-IoU: 79.37
[Epoch: 01] [Batch: 1051/1425] L: 0.47572 Avg L: 0.43697 mIoU: 65.85 | FB-IoU: 79.45
[Epoch: 01] [Batch: 1101/1425] L: 0.42398 Avg L: 0.43653 mIoU: 65.94 | FB-IoU: 79.50
[Epoch: 01] [Batch: 1151/1425] L: 0.46927 Avg L: 0.43618 mIoU: 66.06 | FB-IoU: 79.56
[Epoch: 01] [Batch: 1201/1425] L: 0.44691 Avg L: 0.43587 mIoU: 66.17 | FB-IoU: 79.64
[Epoch: 01] [Batch: 1251/1425] L: 0.41890 Avg L: 0.43541 mIoU: 66.29 | FB-IoU: 79.71
[Epoch: 01] [Batch: 1301/1425] L: 0.48377 Avg L: 0.43498 mIoU: 66.46 | FB-IoU: 79.78
[Epoch: 01] [Batch: 1351/1425] L: 0.32745 Avg L: 0.43464 mIoU: 66.52 | FB-IoU: 79.85
[Epoch: 01] [Batch: 1401/1425] L: 0.44876 Avg L: 0.43403 mIoU: 66.63 | FB-IoU: 79.93
*** Training [@Epoch 01] Avg L: 0.43380 mIoU: 66.73 FB-IoU: 79.97 ***
[Epoch: 01] [Batch: 0001/0125] L: 0.44064 Avg L: 0.42399 mIoU: 27.44 | FB-IoU: 58.71
[Epoch: 01] [Batch: 0051/0125] L: 0.39828 Avg L: 0.42200 mIoU: 23.29 | FB-IoU: 56.57
[Epoch: 01] [Batch: 0101/0125] L: 0.41144 Avg L: 0.42079 mIoU: 20.82 | FB-IoU: 55.18
*** Validation [@Epoch 01] Avg L: 0.42004 mIoU: 20.03 FB-IoU: 54.81 ***
[Epoch: 02] [Batch: 0001/1425] L: 0.43414 Avg L: 0.43380 mIoU: 66.73 | FB-IoU: 79.97
[Epoch: 02] [Batch: 0051/1425] L: 0.47006 Avg L: 0.43274 mIoU: 66.83 | FB-IoU: 80.04
[Epoch: 02] [Batch: 0101/1425] L: 0.39890 Avg L: 0.43218 mIoU: 66.90 | FB-IoU: 80.11
[Epoch: 02] [Batch: 0151/1425] L: 0.43031 Avg L: 0.43159 mIoU: 67.06 | FB-IoU: 80.19
[Epoch: 02] [Batch: 0201/1425] L: 0.41378 Avg L: 0.43115 mIoU: 67.17 | FB-IoU: 80.25
[Epoch: 02] [Batch: 0251/1425] L: 0.41220 Avg L: 0.43086 mIoU: 67.30 | FB-IoU: 80.33
[Epoch: 02] [Batch: 0301/1425] L: 0.37929 Avg L: 0.43035 mIoU: 67.39 | FB-IoU: 80.40
[Epoch: 02] [Batch: 0351/1425] L: 0.44048 Avg L: 0.42986 mIoU: 67.48 | FB-IoU: 80.46
[Epoch: 02] [Batch: 0401/1425] L: 0.37508 Avg L: 0.42955 mIoU: 67.67 | FB-IoU: 80.53
[Epoch: 02] [Batch: 0451/1425] L: 0.43737 Avg L: 0.42913 mIoU: 67.79 | FB-IoU: 80.61
[Epoch: 02] [Batch: 0501/1425] L: 0.38389 Avg L: 0.42876 mIoU: 67.88 | FB-IoU: 80.67
[Epoch: 02] [Batch: 0551/1425] L: 0.36958 Avg L: 0.42827 mIoU: 68.02 | FB-IoU: 80.75
[Epoch: 02] [Batch: 0601/1425] L: 0.39566 Avg L: 0.42797 mIoU: 68.12 | FB-IoU: 80.82
[Epoch: 02] [Batch: 0651/1425] L: 0.36679 Avg L: 0.42770 mIoU: 68.26 | FB-IoU: 80.88
[Epoch: 02] [Batch: 0701/1425] L: 0.38809 Avg L: 0.42742 mIoU: 68.35 | FB-IoU: 80.93
[Epoch: 02] [Batch: 0751/1425] L: 0.32842 Avg L: 0.42722 mIoU: 68.43 | FB-IoU: 81.00
[Epoch: 02] [Batch: 0801/1425] L: 0.26225 Avg L: 0.42675 mIoU: 68.53 | FB-IoU: 81.07
[Epoch: 02] [Batch: 0851/1425] L: 0.33936 Avg L: 0.42639 mIoU: 68.67 | FB-IoU: 81.14
[Epoch: 02] [Batch: 0901/1425] L: 0.38384 Avg L: 0.42603 mIoU: 68.79 | FB-IoU: 81.20
[Epoch: 02] [Batch: 0951/1425] L: 0.39195 Avg L: 0.42583 mIoU: 68.87 | FB-IoU: 81.25
[Epoch: 02] [Batch: 1001/1425] L: 0.45193 Avg L: 0.42544 mIoU: 68.97 | FB-IoU: 81.31
[Epoch: 02] [Batch: 1051/1425] L: 0.34169 Avg L: 0.42516 mIoU: 69.07 | FB-IoU: 81.37
[Epoch: 02] [Batch: 1101/1425] L: 0.38230 Avg L: 0.42485 mIoU: 69.17 | FB-IoU: 81.42
[Epoch: 02] [Batch: 1151/1425] L: 0.34792 Avg L: 0.42452 mIoU: 69.24 | FB-IoU: 81.47
[Epoch: 02] [Batch: 1201/1425] L: 0.37395 Avg L: 0.42417 mIoU: 69.37 | FB-IoU: 81.54
[Epoch: 02] [Batch: 1251/1425] L: 0.36396 Avg L: 0.42388 mIoU: 69.45 | FB-IoU: 81.59
[Epoch: 02] [Batch: 1301/1425] L: 0.45701 Avg L: 0.42368 mIoU: 69.51 | FB-IoU: 81.63
[Epoch: 02] [Batch: 1351/1425] L: 0.35544 Avg L: 0.42358 mIoU: 69.61 | FB-IoU: 81.68
[Epoch: 02] [Batch: 1401/1425] L: 0.40611 Avg L: 0.42350 mIoU: 69.67 | FB-IoU: 81.72
*** Training [@Epoch 02] Avg L: 0.42335 mIoU: 69.69 FB-IoU: 81.74 ***
[Epoch: 02] [Batch: 0001/0125] L: 0.48114 Avg L: 0.42028 mIoU: 19.98 | FB-IoU: 54.76
[Epoch: 02] [Batch: 0051/0125] L: 0.45244 Avg L: 0.42095 mIoU: 20.91 | FB-IoU: 55.19
[Epoch: 02] [Batch: 0101/0125] L: 0.42347 Avg L: 0.42128 mIoU: 21.42 | FB-IoU: 55.35
*** Validation [@Epoch 02] Avg L: 0.42098 mIoU: 21.69 FB-IoU: 55.55 ***
[Epoch: 03] [Batch: 0001/1425] L: 0.41656 Avg L: 0.42335 mIoU: 69.69 | FB-IoU: 81.74
[Epoch: 03] [Batch: 0051/1425] L: 0.38363 Avg L: 0.42257 mIoU: 69.78 | FB-IoU: 81.80
[Epoch: 03] [Batch: 0101/1425] L: 0.36494 Avg L: 0.42210 mIoU: 69.87 | FB-IoU: 81.86
[Epoch: 03] [Batch: 0151/1425] L: 0.31996 Avg L: 0.42191 mIoU: 69.94 | FB-IoU: 81.92
[Epoch: 03] [Batch: 0201/1425] L: 0.28822 Avg L: 0.42155 mIoU: 70.03 | FB-IoU: 81.97
[Epoch: 03] [Batch: 0251/1425] L: 0.41492 Avg L: 0.42117 mIoU: 70.13 | FB-IoU: 82.03
[Epoch: 03] [Batch: 0301/1425] L: 0.37413 Avg L: 0.42083 mIoU: 70.22 | FB-IoU: 82.08
[Epoch: 03] [Batch: 0351/1425] L: 0.44080 Avg L: 0.42038 mIoU: 70.33 | FB-IoU: 82.14
[Epoch: 03] [Batch: 0401/1425] L: 0.44819 Avg L: 0.42010 mIoU: 70.41 | FB-IoU: 82.20
[Epoch: 03] [Batch: 0451/1425] L: 0.35568 Avg L: 0.41966 mIoU: 70.49 | FB-IoU: 82.25
[Epoch: 03] [Batch: 0501/1425] L: 0.39758 Avg L: 0.41940 mIoU: 70.54 | FB-IoU: 82.30
[Epoch: 03] [Batch: 0551/1425] L: 0.44330 Avg L: 0.41916 mIoU: 70.60 | FB-IoU: 82.34
[Epoch: 03] [Batch: 0601/1425] L: 0.34404 Avg L: 0.41895 mIoU: 70.68 | FB-IoU: 82.38
[Epoch: 03] [Batch: 0651/1425] L: 0.39861 Avg L: 0.41872 mIoU: 70.75 | FB-IoU: 82.42
[Epoch: 03] [Batch: 0701/1425] L: 0.37759 Avg L: 0.41848 mIoU: 70.84 | FB-IoU: 82.47
[Epoch: 03] [Batch: 0751/1425] L: 0.38684 Avg L: 0.41823 mIoU: 70.92 | FB-IoU: 82.52
[Epoch: 03] [Batch: 0801/1425] L: 0.37498 Avg L: 0.41805 mIoU: 71.00 | FB-IoU: 82.56
[Epoch: 03] [Batch: 0851/1425] L: 0.39698 Avg L: 0.41779 mIoU: 71.10 | FB-IoU: 82.61
[Epoch: 03] [Batch: 0901/1425] L: 0.38732 Avg L: 0.41745 mIoU: 71.18 | FB-IoU: 82.66
[Epoch: 03] [Batch: 0951/1425] L: 0.39495 Avg L: 0.41708 mIoU: 71.26 | FB-IoU: 82.71
[Epoch: 03] [Batch: 1001/1425] L: 0.37841 Avg L: 0.41702 mIoU: 71.33 | FB-IoU: 82.74
[Epoch: 03] [Batch: 1051/1425] L: 0.36131 Avg L: 0.41684 mIoU: 71.37 | FB-IoU: 82.77
[Epoch: 03] [Batch: 1101/1425] L: 0.38658 Avg L: 0.41641 mIoU: 71.43 | FB-IoU: 82.81
[Epoch: 03] [Batch: 1151/1425] L: 0.28449 Avg L: 0.41623 mIoU: 71.49 | FB-IoU: 82.85
[Epoch: 03] [Batch: 1201/1425] L: 0.37200 Avg L: 0.41598 mIoU: 71.55 | FB-IoU: 82.88
[Epoch: 03] [Batch: 1251/1425] L: 0.34831 Avg L: 0.41563 mIoU: 71.61 | FB-IoU: 82.92
[Epoch: 03] [Batch: 1301/1425] L: 0.43214 Avg L: 0.41548 mIoU: 71.69 | FB-IoU: 82.97
[Epoch: 03] [Batch: 1351/1425] L: 0.39302 Avg L: 0.41528 mIoU: 71.77 | FB-IoU: 83.01
[Epoch: 03] [Batch: 1401/1425] L: 0.39410 Avg L: 0.41494 mIoU: 71.84 | FB-IoU: 83.05
*** Training [@Epoch 03] Avg L: 0.41487 mIoU: 71.87 FB-IoU: 83.07 ***
[Epoch: 03] [Batch: 0001/0125] L: 0.43704 Avg L: 0.42103 mIoU: 21.67 | FB-IoU: 55.53
[Epoch: 03] [Batch: 0051/0125] L: 0.41049 Avg L: 0.42024 mIoU: 21.10 | FB-IoU: 55.31
[Epoch: 03] [Batch: 0101/0125] L: 0.42498 Avg L: 0.41957 mIoU: 20.52 | FB-IoU: 55.03
*** Validation [@Epoch 03] Avg L: 0.41898 mIoU: 20.37 FB-IoU: 55.04 ***
[Epoch: 04] [Batch: 0001/1425] L: 0.36217 Avg L: 0.41486 mIoU: 71.87 | FB-IoU: 83.07
[Epoch: 04] [Batch: 0051/1425] L: 0.40869 Avg L: 0.41446 mIoU: 71.92 | FB-IoU: 83.10
[Epoch: 04] [Batch: 0101/1425] L: 0.39136 Avg L: 0.41423 mIoU: 71.98 | FB-IoU: 83.14
[Epoch: 04] [Batch: 0151/1425] L: 0.35325 Avg L: 0.41399 mIoU: 72.03 | FB-IoU: 83.18
[Epoch: 04] [Batch: 0201/1425] L: 0.40706 Avg L: 0.41368 mIoU: 72.11 | FB-IoU: 83.22
[Epoch: 04] [Batch: 0251/1425] L: 0.37324 Avg L: 0.41343 mIoU: 72.17 | FB-IoU: 83.26
[Epoch: 04] [Batch: 0301/1425] L: 0.35059 Avg L: 0.41320 mIoU: 72.24 | FB-IoU: 83.29
[Epoch: 04] [Batch: 0351/1425] L: 0.38189 Avg L: 0.41303 mIoU: 72.30 | FB-IoU: 83.33
[Epoch: 04] [Batch: 0401/1425] L: 0.41410 Avg L: 0.41285 mIoU: 72.34 | FB-IoU: 83.36
[Epoch: 04] [Batch: 0451/1425] L: 0.40258 Avg L: 0.41262 mIoU: 72.42 | FB-IoU: 83.40
[Epoch: 04] [Batch: 0501/1425] L: 0.39372 Avg L: 0.41237 mIoU: 72.49 | FB-IoU: 83.44
[Epoch: 04] [Batch: 0551/1425] L: 0.37758 Avg L: 0.41220 mIoU: 72.56 | FB-IoU: 83.48
[Epoch: 04] [Batch: 0601/1425] L: 0.34296 Avg L: 0.41200 mIoU: 72.62 | FB-IoU: 83.52
[Epoch: 04] [Batch: 0651/1425] L: 0.37940 Avg L: 0.41178 mIoU: 72.69 | FB-IoU: 83.55
[Epoch: 04] [Batch: 0701/1425] L: 0.38315 Avg L: 0.41164 mIoU: 72.74 | FB-IoU: 83.59
[Epoch: 04] [Batch: 0751/1425] L: 0.36820 Avg L: 0.41140 mIoU: 72.79 | FB-IoU: 83.63
[Epoch: 04] [Batch: 0801/1425] L: 0.45394 Avg L: 0.41122 mIoU: 72.84 | FB-IoU: 83.66
[Epoch: 04] [Batch: 0851/1425] L: 0.41756 Avg L: 0.41101 mIoU: 72.87 | FB-IoU: 83.68
[Epoch: 04] [Batch: 0901/1425] L: 0.41762 Avg L: 0.41072 mIoU: 72.93 | FB-IoU: 83.72
[Epoch: 04] [Batch: 0951/1425] L: 0.37698 Avg L: 0.41059 mIoU: 72.99 | FB-IoU: 83.75
[Epoch: 04] [Batch: 1001/1425] L: 0.34747 Avg L: 0.41038 mIoU: 73.04 | FB-IoU: 83.78
[Epoch: 04] [Batch: 1051/1425] L: 0.42113 Avg L: 0.41022 mIoU: 73.09 | FB-IoU: 83.82
[Epoch: 04] [Batch: 1101/1425] L: 0.31263 Avg L: 0.40999 mIoU: 73.15 | FB-IoU: 83.85
[Epoch: 04] [Batch: 1151/1425] L: 0.39397 Avg L: 0.40979 mIoU: 73.22 | FB-IoU: 83.89
[Epoch: 04] [Batch: 1201/1425] L: 0.33008 Avg L: 0.40968 mIoU: 73.28 | FB-IoU: 83.92
[Epoch: 04] [Batch: 1251/1425] L: 0.43431 Avg L: 0.40958 mIoU: 73.34 | FB-IoU: 83.95
[Epoch: 04] [Batch: 1301/1425] L: 0.38524 Avg L: 0.40942 mIoU: 73.39 | FB-IoU: 83.99
[Epoch: 04] [Batch: 1351/1425] L: 0.39327 Avg L: 0.40932 mIoU: 73.43 | FB-IoU: 84.01
[Epoch: 04] [Batch: 1401/1425] L: 0.36319 Avg L: 0.40910 mIoU: 73.48 | FB-IoU: 84.04
*** Training [@Epoch 04] Avg L: 0.40901 mIoU: 73.51 FB-IoU: 84.05 ***
[Epoch: 04] [Batch: 0001/0125] L: 0.45650 Avg L: 0.41905 mIoU: 20.35 | FB-IoU: 55.02
[Epoch: 04] [Batch: 0051/0125] L: 0.42925 Avg L: 0.41770 mIoU: 20.47 | FB-IoU: 55.09
[Epoch: 04] [Batch: 0101/0125] L: 0.40337 Avg L: 0.41644 mIoU: 20.52 | FB-IoU: 55.09
*** Validation [@Epoch 04] Avg L: 0.41569 mIoU: 20.62 FB-IoU: 55.19 ***
[Epoch: 05] [Batch: 0001/1425] L: 0.38914 Avg L: 0.40901 mIoU: 73.51 | FB-IoU: 84.05
[Epoch: 05] [Batch: 0051/1425] L: 0.38582 Avg L: 0.40863 mIoU: 73.56 | FB-IoU: 84.08
[Epoch: 05] [Batch: 0101/1425] L: 0.39982 Avg L: 0.40837 mIoU: 73.63 | FB-IoU: 84.12
[Epoch: 05] [Batch: 0151/1425] L: 0.36958 Avg L: 0.40814 mIoU: 73.70 | FB-IoU: 84.16
[Epoch: 05] [Batch: 0201/1425] L: 0.44301 Avg L: 0.40796 mIoU: 73.77 | FB-IoU: 84.19
[Epoch: 05] [Batch: 0251/1425] L: 0.37197 Avg L: 0.40778 mIoU: 73.84 | FB-IoU: 84.22
[Epoch: 05] [Batch: 0301/1425] L: 0.34206 Avg L: 0.40756 mIoU: 73.89 | FB-IoU: 84.26
[Epoch: 05] [Batch: 0351/1425] L: 0.31855 Avg L: 0.40728 mIoU: 73.94 | FB-IoU: 84.29
[Epoch: 05] [Batch: 0401/1425] L: 0.36472 Avg L: 0.40715 mIoU: 74.00 | FB-IoU: 84.33
[Epoch: 05] [Batch: 0451/1425] L: 0.33310 Avg L: 0.40697 mIoU: 74.04 | FB-IoU: 84.36
[Epoch: 05] [Batch: 0501/1425] L: 0.38936 Avg L: 0.40689 mIoU: 74.09 | FB-IoU: 84.38
[Epoch: 05] [Batch: 0551/1425] L: 0.28624 Avg L: 0.40672 mIoU: 74.15 | FB-IoU: 84.41
[Epoch: 05] [Batch: 0601/1425] L: 0.35422 Avg L: 0.40659 mIoU: 74.19 | FB-IoU: 84.44
[Epoch: 05] [Batch: 0651/1425] L: 0.38933 Avg L: 0.40646 mIoU: 74.24 | FB-IoU: 84.47
[Epoch: 05] [Batch: 0701/1425] L: 0.36583 Avg L: 0.40621 mIoU: 74.30 | FB-IoU: 84.50
[Epoch: 05] [Batch: 0751/1425] L: 0.35363 Avg L: 0.40603 mIoU: 74.36 | FB-IoU: 84.53
[Epoch: 05] [Batch: 0801/1425] L: 0.33130 Avg L: 0.40575 mIoU: 74.42 | FB-IoU: 84.56
[Epoch: 05] [Batch: 0851/1425] L: 0.35086 Avg L: 0.40560 mIoU: 74.46 | FB-IoU: 84.59
[Epoch: 05] [Batch: 0901/1425] L: 0.32383 Avg L: 0.40543 mIoU: 74.50 | FB-IoU: 84.61
[Epoch: 05] [Batch: 0951/1425] L: 0.39691 Avg L: 0.40528 mIoU: 74.54 | FB-IoU: 84.64
[Epoch: 05] [Batch: 1001/1425] L: 0.37308 Avg L: 0.40502 mIoU: 74.58 | FB-IoU: 84.67
[Epoch: 05] [Batch: 1051/1425] L: 0.37183 Avg L: 0.40492 mIoU: 74.63 | FB-IoU: 84.70
[Epoch: 05] [Batch: 1101/1425] L: 0.37630 Avg L: 0.40475 mIoU: 74.67 | FB-IoU: 84.73
[Epoch: 05] [Batch: 1151/1425] L: 0.41443 Avg L: 0.40453 mIoU: 74.71 | FB-IoU: 84.76
[Epoch: 05] [Batch: 1201/1425] L: 0.36623 Avg L: 0.40434 mIoU: 74.76 | FB-IoU: 84.79
[Epoch: 05] [Batch: 1251/1425] L: 0.31162 Avg L: 0.40414 mIoU: 74.81 | FB-IoU: 84.82
[Epoch: 05] [Batch: 1301/1425] L: 0.39881 Avg L: 0.40392 mIoU: 74.84 | FB-IoU: 84.84
[Epoch: 05] [Batch: 1351/1425] L: 0.39712 Avg L: 0.40369 mIoU: 74.89 | FB-IoU: 84.87
[Epoch: 05] [Batch: 1401/1425] L: 0.42172 Avg L: 0.40356 mIoU: 74.94 | FB-IoU: 84.90
*** Training [@Epoch 05] Avg L: 0.40350 mIoU: 74.96 FB-IoU: 84.91 ***
[Epoch: 05] [Batch: 0001/0125] L: 0.47616 Avg L: 0.41578 mIoU: 20.61 | FB-IoU: 55.18
[Epoch: 05] [Batch: 0051/0125] L: 0.45201 Avg L: 0.41625 mIoU: 20.61 | FB-IoU: 55.11
[Epoch: 05] [Batch: 0101/0125] L: 0.41850 Avg L: 0.41652 mIoU: 20.59 | FB-IoU: 55.01
*** Validation [@Epoch 05] Avg L: 0.41648 mIoU: 20.59 FB-IoU: 55.01 ***
This is a question on an interesting report in the paper.
The paper reported
We also evaluated on a model initialized with the CLIP image encoder with the same setup and hyperparameters, but observed worse performance than using the ViT initialization.
It seems surprising that the CLIP image encoder, which is already well aligned with the text encoder, is not helpful for the task. Do the authors have any guesses about the reason? And was the performance much worse, or only a little worse?
Thank you for your great paper.
I tried to train a zero-shot model (vitl16_384) and tested on PASCAL fold 0 and got the following problems:
This is my training script:
python train_lseg_zs.py \
--exp_name train_vitl16_pascal_fold0 --project_name lightseg \
--backbone clip_vitl16_384 \
--dataset pascal --data_path data/Dataset_HSN \
--fold 0 --nshot 0 \
--batch_size 4 --base_lr 0.0001 --max_epochs 200 \
--weight_decay 1e-5 --no-scaleinv --widehead
How does the max_epochs argument take part in the training process, since only 4 epochs are logged?
Apart from changing the model from vitl16_384 to vitb32_384, is there anything wrong with my training script?
[W reducer.cpp:283] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [512, 256, 1, 1], strides() = [256, 1, 256, 256]
bucket_view.sizes() = [512, 256, 1, 1], strides() = [256, 1, 1, 1] (function operator())
While training, DDP is enabled and I only used 1 GPU with batch_size = 4. I am not sure if this damages training. Could the accumulate_grad_batches argument be causing this?
I want to run
python -u test_lseg_zs.py --backbone clip_resnet101 --module clipseg_DPT_test_v2 --dataset fss \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 \
--weights checkpoints/fss_l16.ckpt
loading it into <class 'modules.lseg_module_zs.LSegModuleZS'>, and I get the following error:
size mismatch for net.scratch.layer4_rn.weight: copying a param with shape torch.Size([256, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 2048, 3, 3]).
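The mismatch (1024 vs 2048 input channels in net.scratch.layer4_rn) suggests the checkpoint was trained with a different backbone than the one passed via --backbone; judging by its name, fss_l16.ckpt looks like a ViT-L/16 model rather than clip_resnet101. A small check (the key name is taken from the error above):

```python
import torch

ckpt = torch.load("checkpoints/fss_l16.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)   # Lightning nests weights under 'state_dict'
# 1024 input channels would point to a ViT-L/16-style encoder, 2048 to a ResNet-101
print(state_dict["net.scratch.layer4_rn.weight"].shape)
```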
Hello,
This is great work @Boyiliee ! I'm excited to try this out.
I have a quick question: what kind of system requirements are necessary to train and run inference on this model? Specifically I am wondering about the type of GPU(s) needed to train LSeg.
Getting the following error after running the streamlit command, i.e. streamlit run lseg_app.py:
Namespace(model='encnet', backbone='clip_vitl16_384', dataset='ade20k', workers=16, base_size=520, crop_size=480, train_split='train', aux=False, se_loss=False, se_weight=0.2, batch_size=16, test_batch_size=16, no_cuda=False, seed=1, weights='', eval=False, export=None, acc_bn=False, test_val=False, no_val=False, module='lseg', data_path='../datasets/', scale_inv=True, widehead=False, widehead_hr=False, ignore_index=-1, label_src='default', arch_option=0, block_depth=0, activation='lrelu', cuda=True)
** Use norm [0.5, 0.5, 0.5], [0.5, 0.5, 0.5] as the mean and std **
{'base_size': 520, 'crop_size': 480}
train
BaseDataset: base_size 520, crop_size 480
2022-01-13 13:34:01.885 Traceback (most recent call last):
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/streamlit/legacy_caching/caching.py", line 540, in get_or_create_cached_value
return_value = _read_from_cache(
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/streamlit/legacy_caching/caching.py", line 339, in _read_from_cache
raise e
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/streamlit/legacy_caching/caching.py", line 324, in _read_from_cache
return _read_from_mem_cache(
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/streamlit/legacy_caching/caching.py", line 242, in _read_from_mem_cache
raise CacheKeyNotFoundError("Key not found in mem cache")
streamlit.legacy_caching.caching.CacheKeyNotFoundError: Key not found in mem cache
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/streamlit/script_runner.py", line 354, in _run_script
exec(code, module.__dict__)
File "/home/resham/lang-seg/lseg_app.py", line 341, in
lseg_model, lseg_transform = load_model()
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/streamlit/legacy_caching/caching.py", line 574, in wrapped_func
return get_or_create_cached_value()
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/streamlit/legacy_caching/caching.py", line 558, in get_or_create_cached_value
return_value = func(*args, **kwargs)
File "/home/resham/lang-seg/lseg_app.py", line 274, in load_model
module = LSegModule.load_from_checkpoint(
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 157, in load_from_checkpoint
model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 199, in _load_model_state
model = cls(**_cls_kwargs)
File "/home/resham/lang-seg/modules/lseg_module.py", line 55, in __init__
self.trainset = self.get_trainset(
File "/home/resham/lang-seg/modules/lsegmentation_module.py", line 202, in get_trainset
dset = get_dataset(
File "/home/resham/lang-seg/data/init.py", line 19, in get_dataset
return encoding_datasetsname.lower()
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/encoding/datasets/init.py", line 39, in get_dataset
return datasetsname.lower()
File "/home/resham/anaconda3/envs/lang-seg/lib/python3.9/site-packages/encoding/datasets/ade20k.py", line 29, in __init__
assert os.path.exists(root), "Please setup the dataset using" +
AssertionError: Please setup the dataset usingencoding/scripts/prepare_ade20k.py
Hello, can you tell me how many GPUs you used for training the model? This is important for reproducing your results. Thank you!
Hi, would you be interested in sharing a web demo on Huggingface Spaces for lang-seg?
It would make this model more accessible as it would allow people to try out the model directly from the browser. Some other recent machine learning model repos have set up Spaces for easy access:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/akhaliq/BLIP
github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore
Spaces is completely free, and I can help setup a Gradio Space. Here are some getting started instructions if you'd prefer to do it yourself: https://huggingface.co/blog/gradio-spaces
Hi,
Thanks for open-sourcing this awesome work. While going through your code I could not find the zero-shot or few-shot splits for the dataset; I could only find the ADE20K supervised label split. Does this mean this code is for the fully supervised version?
Hi, thanks for providing this great work.
I have a question about the implementation detail of the label set vectors (T). As pointed out in the paper, the text encoder embeds the set of N potential labels into a continuous vector space. As far as I can see, the code below seems to be that part, but it appears that only the feature of the EOS token is selected after tokenizing the label set.
lang-seg/modules/models/lseg_net.py
Line 183 in 9d063b1
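For reference, that behavior mirrors CLIP's own encode_text: after the text transformer, only the feature at the end-of-text (EOT) token position is projected and used as the embedding of the whole label string. A sketch with the standard clip package:

```python
import clip
import torch

model, _ = clip.load("ViT-B/32", device="cpu")
tokens = clip.tokenize(["dog", "grass", "other"])             # (N, 77)
with torch.no_grad():
    x = model.token_embedding(tokens).type(model.dtype)       # (N, 77, d)
    x = x + model.positional_embedding.type(model.dtype)
    x = model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
    x = model.ln_final(x).type(model.dtype)
    # the EOT token has the highest token id in each sequence, hence argmax
    eot = tokens.argmax(dim=-1)
    text_features = x[torch.arange(x.shape[0]), eot] @ model.text_projection
print(text_features.shape)   # one vector per label, e.g. torch.Size([3, 512])
```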
Hi, I am trying to use your model for research purposes on Explainable AI.
After struggling for longer than I'd like to admit, I finally managed to get it up and running; however, I can't find an easy way to get the pixel-level embeddings from your framework, since the interfaces are quite convoluted.
Right now I've been able to do so with evaluator._modules['module'].net.get_image_features(image), starting from your notebook. I had to write get_image_features as a modified version of forward that stops at the image features, so I don't think this is the best way.
Do you have any suggestion on how to proceed? Maybe some general instructions on how to try to do so?
Thank you in advance!
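One lighter-weight option (a sketch that assumes net and image as set up in the notebook; the module path below is a guess, so inspect print(net) to pick the layer that emits the per-pixel features) is to register a forward hook instead of modifying forward():

```python
import torch

features = {}

def grab(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# module path is hypothetical -- choose the layer right before the pixel-text
# dot product after inspecting the model
handle = net.scratch.head1.register_forward_hook(grab("pixel_embeddings"))
with torch.no_grad():
    _ = net(image)                                    # ordinary forward pass
pixel_embeddings = features["pixel_embeddings"]       # (B, C, H, W)
handle.remove()
```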
I wonder if this is suitable for segmentation of grayscale 2D medical images. How should I do the data preparation? It looks like I need to prepare the medical dataset using exactly the same file structure as the ADE20K dataset?
For training, do I still have to use the "--dataset ade20k" argument if I prepare my own customized training dataset?
Any other suggestions? Many thanks!
Hi
Thanks for your great work! When I tried to add LSegNet into my own framework, I got a RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. My train function (run on ADE20K) is:
def train(self, cur_epoch, optim, train_loader, scheduler=None, print_int=10, logger=None):
device = self.device
model = self.model
criterion = nn.CrossEntropyLoss(ignore_index=-1)
model.train()
for cur_step, (images, labels) in enumerate(train_loader):
images = images.to(device, dtype=torch.float32)
labels = labels.to(device, dtype=torch.long)
optim.zero_grad()
outputs = model(images, labelset='')
loss = criterion(outputs, labels)
self.scaler.scale(loss)
loss.backward()
optim.step()
if scheduler is not None:
scheduler.step()
The model is LSegNet and I didn't modify lseg_net.py. I think maybe some optimizations have been made by Pytorch-lighting. Could you give me some suggestions? Thank you!
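As a generic illustration (not a diagnosis of LSegNet specifically), this error appears whenever a tensor that still carries graph history from a previous iteration is reused; the usual fixes are to recompute it inside the loop or to detach anything cached across iterations:

```python
import torch

w = torch.nn.Parameter(torch.randn(4))
opt = torch.optim.SGD([w], lr=0.1)

# BAD: building this once outside the loop makes the second backward()
# walk an already-freed graph:
# cached = (w ** 2).sum()

for step in range(3):
    opt.zero_grad()
    cached = (w ** 2).sum()   # GOOD: recompute (or .detach()) every iteration
    loss = cached + 1.0
    loss.backward()
    opt.step()
```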
Traceback (most recent call last):
File "/home/airs/Clip_Seg/lang-seg/prepare_ade20k.py", line 9, in <module>
from encoding.utils import download, mkdir
File "/home/airs/anaconda3/lib/python3.9/site-packages/encoding/__init__.py", line 13, in <module>
from . import nn, functions, parallel, utils, models, datasets, transforms
File "/home/airs/anaconda3/lib/python3.9/site-packages/encoding/nn/__init__.py", line 12, in <module>
from .encoding import *
File "/home/airs/anaconda3/lib/python3.9/site-packages/encoding/nn/encoding.py", line 18, in <module>
from ..functions import scaled_l2, aggregate, pairwise_cosine
File "/home/airs/anaconda3/lib/python3.9/site-packages/encoding/functions/__init__.py", line 2, in <module>
from .encoding import *
File "/home/airs/anaconda3/lib/python3.9/site-packages/encoding/functions/encoding.py", line 17, in <module>
from encoding import gpu
ImportError: cannot import name 'gpu' from partially initialized module 'encoding' (most likely due to a circular import) (/home/airs/anaconda3/lib/python3.9/site-packages/encoding/__init__.py)
Hi, thanks for your great work.
I noticed that you use an "other" class to refer to background or unknown classes in training. May I ask where the corresponding processing code is? Besides, do you encode the unseen (novel) classes as "other" during training?