nvlabs / odise

Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]

Home Page: https://arxiv.org/abs/2303.04803

License: Other

Python 99.51% Dockerfile 0.49%
deep-learning instance-segmentation panoptic-segmentation pytorch semantic-segmentation diffusion-models text-image-retrieval zero-shot-learning open-vocabulary open-vocabulary-segmentation

odise's Introduction

ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation exploits pre-trained text-to-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. It leverages the frozen representations of both these models to perform panoptic segmentation of any category in the wild.

This repository is the official implementation of ODISE introduced in the paper:

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Jiarui Xu, Sifei Liu*, Arash Vahdat*, Wonmin Byeon, Xiaolong Wang, Shalini De Mello
CVPR 2023 Highlight. (*equal contribution)

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

[teaser figure]

Visual Results

Links

Citation

If you find our work useful in your research, please cite:

@article{xu2023odise,
  title={{Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models}},
  author={Xu, Jiarui and Liu, Sifei and Vahdat, Arash and Byeon, Wonmin and Wang, Xiaolong and De Mello, Shalini},
  journal={arXiv preprint arXiv:2303.04803},
  year={2023}
}

Environment Setup

Install dependencies by running:

conda create -n odise python=3.9
conda activate odise
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone git@github.com:NVlabs/ODISE.git
cd ODISE
pip install -e .
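
After pip install -e . finishes, a quick sanity check of the new environment can save debugging time later (a minimal sketch, not part of the repo; the file name verify_env.py is just an example):

# verify_env.py -- hypothetical helper, a minimal environment sanity check
import torch

print(torch.__version__)           # expect 1.13.1
print(torch.version.cuda)          # expect 11.6
print(torch.cuda.is_available())   # should be True on a GPU machine

# both packages below are installed by the steps above
import detectron2  # noqa: F401
import odise       # noqa: F401
print("detectron2 and odise import cleanly")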

(Optional) Install xformers for a more efficient transformer implementation. You can either install the pre-built version:

pip install xformers==0.0.16

or build it from the latest source:

# (Optional) Makes the build much faster
pip install ninja
# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)
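
Either way, it is worth confirming that the installed build is importable before training (a minimal check, not from the repo):

import xformers
import xformers.ops  # the memory-efficient attention ops used at runtime

print(xformers.__version__)  # e.g. 0.0.16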

Model Zoo

We provide two pre-trained ODISE models, trained on COCO's entire training set with label and caption supervision, respectively. ODISE's pre-trained models are subject to the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license terms. Each model contains 28.1M trainable parameters. The download links for these models are provided in the table below. When you run demo/demo.py or an inference script for the very first time, it will also automatically download ODISE's pre-trained model to your local folder $HOME/.torch/iopath_cache/NVlabs/ODISE/releases/download/v1.0.0/.

A-847 = ADE20K-Full, PC-59 = Pascal Context 59, PC-459 = Pascal Context 459, PAS-21 = Pascal VOC 21.

                 ADE20K (A-150)     COCO               A-847  PC-59  PC-459  PAS-21
                 PQ    mAP   mIoU   PQ    mAP   mIoU   mIoU   mIoU   mIoU    mIoU    download
ODISE (label)    22.6  14.4  29.9   55.4  46.0  65.2   11.1   57.3   14.5    84.6    checkpoint
ODISE (caption)  23.4  13.9  28.7   45.6  38.4  52.4   11.0   55.3   13.8    82.7    checkpoint
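
As a rough illustration of how one of these checkpoints could be loaded programmatically, here is a sketch pieced together from the config path and the instantiate_odise call that appear in the issues below; the odise.config import path and the cfg.train.init_checkpoint field are assumptions, and demo/demo.py remains the authoritative example:

from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import LazyConfig

from odise.config import instantiate_odise  # import path assumed, see the Colab issue below

cfg = LazyConfig.load("configs/Panoptic/odise_label_coco_50e.py")
model = instantiate_odise(cfg.model)
model.to(cfg.train.device)

# first use downloads the checkpoint to $HOME/.torch/iopath_cache/... as noted above
DetectionCheckpointer(model).load(cfg.train.init_checkpoint)  # field name assumed
model.eval()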

Get Started

See Preparing Datasets for ODISE.

See Getting Started with ODISE for detailed instructions on training and inference with ODISE.

Demo

Important Note: When you run the demo/demo.py script for the very first time, besides ODISE's pre-trained models, it will also automatically download the pre-trained models for Stable Diffusion v1.3 and CLIP, from their original sources, to your local directories $HOME/.torch/ and $HOME/.cache/clip, respectively. The pre-trained models for Stable Diffusion and CLIP are subject to their original license terms from Stable Diffusion and CLIP, respectively.

  • To run ODISE's demo from the command line:

    python demo/demo.py --input demo/examples/coco.jpg --output demo/coco_pred.jpg --vocab "black pickup truck, pickup truck; blue sky, sky"

    The output is saved in demo/coco_pred.jpg. For more detailed options for demo/demo.py see Getting Started with ODISE.

  • To run the Gradio demo locally:

    python demo/app.py

Acknowledgement

Code is largely based on Detectron2, Stable Diffusion, Mask2Former, OpenCLIP and GLIDE.

Thank you, all, for the great open-source projects!

odise's People

Contributors

shalinidemello, xvjiarui


odise's Issues

Training Time

Thanks for releasing the code. How long does your method take to train?

Cityscapes evaluation

Could you please indicate the command for evaluating on Cityscapes?
Thanks in advance.

About a difference between the code and the paper

The paper states that ODISE freezes the denoising UNet. However, upon inspecting the code in ODISE/odise/modeling/meta_arch/ldm.py (around line 974), I encountered some aspects that left me uncertain about whether the UNet is actually frozen. [screenshot attached]

Which GPUs were used?

Dear authors,

thank you for your brilliant work!
I have one question concerning the GPUs used. According to NVIDIA, V100s are available with both 16 GB and 32 GB. Which ones did you use?

BR
Thanos

fatal error: cusparse.h: No such file or directory

when run "pip install -e .",
the error happen:
lude/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
#include <cusparse.h>
^~~~~~~~~~~~
compilation terminated.
error: command '/home/cheng/ws/miniconda3/envs/odise/bin/nvcc' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> detectron2

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Environment installation problem

I followed your install.md step by step, but I still ran into this problem when running pip install -e .:

Failed to build mask2former
ERROR: Could not build wheels for mask2former, which is required to install pyproject.toml-based projects

I'm really confused by all of this and have been trying for a long time. Could you kindly suggest some solutions, or some links where I could download these beautiful datasets?

Much appreciated!!

RuntimeError: CUDA error: invalid argument with 3090 GPU

Thanks for your great work. When I try to train the model with eight 3090 GPUs using the following command,
./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 8 --amp --ref 32

The following errors are encountered.

Starting training from iteration 0
 Exception during training:
Traceback (most recent call last):
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/zoloz/8T-1/zitong/code/ODISE/odise/engine/train_loop.py", line 297, in run_step
    grad_norm = self.grad_scaler(
  File "/home/zoloz/8T-1/zitong/code/ODISE/odise/engine/train_loop.py", line 207, in __call__
    self._scaler.scale(loss).backward(create_graph=create_graph)
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/function.py", line 267, in apply
    return user_fn(self, *args)
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/ldm/modules/diffusionmodules/util.py", line 142, in backward
    input_grads = torch.autograd.grad(
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/__init__.py", line 300, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/function.py", line 267, in apply
    return user_fn(self, *args)
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/autogr
[06/17 03:38:05 d2.engine.hooks]: Total training time: 0:00:25 (0:00:00 on hooks)
[06/17 03:38:05 d2.utils.events]: odise_label_coco_50e_bs16x8/default  iter: 0/368752    lr: N/A  max_mem: 19297M
Traceback (most recent call last):
  File "/home/zoloz/8T-1/zitong/code/ODISE/./tools/train_net.py", line 392, in <module>
    launch(
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/detectron2/engine/launch.py", line 67, in launch
    mp.spawn(
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/home/zoloz/8T-1/zitong/code/ODISE/tools/train_net.py", line 363, in main
    do_train(args, cfg)
  File "/home/zoloz/8T-1/zitong/code/ODISE/tools/train_net.py", line 309, in do_train
    trainer.train(start_iter, cfg.train.max_iter)
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/zoloz/8T-1/zitong/code/ODISE/odise/engine/train_loop.py", line 297, in run_step
    grad_norm = self.grad_scaler(
  File "/home/zoloz/8T-1/zitong/code/ODISE/odise/engine/train_loop.py", line 207, in __call__
    self._scaler.scale(loss).backward(create_graph=create_graph)
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/function.py", line 267, in apply
    return user_fn(self, *args)
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/ldm/modules/diffusionmodules/util.py", line 142, in backward
    input_grads = torch.autograd.grad(
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/__init__.py", line 300, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/function.py", line 267, in apply
    return user_fn(self, *args)
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/function.py", line 414, in wrapper
    outputs = fn(ctx, *args)
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/xformers/ops/fmha/__init__.py", line 111, in backward
    grads = _memory_efficient_attention_backward(
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/xformers/ops/fmha/__init__.py", line 382, in _memory_efficient_attention_backward
    grads = op.apply(ctx, inp, grad)
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/xformers/ops/fmha/cutlass.py", line 184, in apply
    (grad_q, grad_k, grad_v,) = cls.OPERATOR(
  File "/home/dazhi/miniconda3/envs/odise/lib/python3.9/site-packages/torch/_ops.py", line 442, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Performance of the demo varies dramatically with the CUDA device

Hi, and thank you for your fantastic work! I have encountered a minor issue that I'd like to bring to your attention. I found that when I change the line model.to(cfg.train.device) to model.to("cuda:1") (or any other device) in the demo.ipynb, there is a significant difference in the generated segmentation map compared to the original (please see the attached image below).

[attached image: comparison of segmentation outputs]

The original code runs perfectly fine and produces results consistent with those in the paper. However, when I make this modification, I don't encounter any specific warnings or errors, so I'm uncertain where the issue lies (I suspect that perhaps some modules are not loaded correctly). I'd greatly appreciate your help with this issue. Thank you!
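
One way to narrow such a problem down (a debugging sketch under the assumption that some submodule silently stayed on another device, not a confirmed fix) is to list every parameter or buffer that did not follow the .to() call:

import torch
from torch import nn

def find_stragglers(model: nn.Module, device: str = "cuda:1") -> list:
    """Return names of parameters/buffers that did not end up on `device`."""
    target = torch.device(device)
    names = [n for n, p in model.named_parameters() if p.device != target]
    names += [n for n, b in model.named_buffers() if b.device != target]
    return names

# usage with the demo's model object (assumed in scope):
#   model.to("cuda:1")
#   print(find_stragglers(model, "cuda:1") or "everything is on cuda:1")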

error in gradio demo app.py

I can run demo.py and demo.ipynb, but when I run app.py with python demo/app.py, the following error occurs:

Traceback (most recent call last):
  File "/home/luoc/workspace/ODISE/demo/app.py", line 294, in <module>
    examples_handler = gr.Examples(
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/gradio/helpers.py", line 71, in create_examples
    client_utils.synchronize_async(examples_obj.create)
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/gradio_client/utils.py", line 359, in synchronize_async
    return fsspec.asyn.sync(fsspec.asyn.get_loop(), func, *args, **kwargs)  # type: ignore
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/fsspec/asyn.py", line 100, in sync
    raise return_result
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/fsspec/asyn.py", line 55, in _runner
    result[0] = await coro
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/gradio/helpers.py", line 278, in create
    await self.cache()
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/gradio/helpers.py", line 312, in cache
    prediction = await Context.root_block.process_api(
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/gradio/blocks.py", line 1108, in process_api
    result = await self.call_function(
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/gradio/blocks.py", line 915, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/luoc/miniconda3/envs/odise/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/luoc/workspace/ODISE/demo/app.py", line 253, in inference
    model=models[model_name],
KeyError: None

A weird bug

Thanks for the nice work!
I was playing with some images using the Hugging Face demo, and I found that the model can detect the coffee maker in the scene when I use the LVIS categories. However, if I use just the single category "coffee maker,coffee machine", the model fails to detect the coffee maker in the image. Do you know what the problem might be? BTW, I can provide the image if you want.

Should the input image be in RGB or BGR?

Thanks for the excellent open-source work.
The run_on_image function says the input image should be in BGR order, but in the demo code the input image is in RGB mode. So I'm unsure which mode yields better results.

Additionally, I found that the ODISE (label) model doesn't recognize "poles". What could be the reason? Is the prompt "poles" incorrect?
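
For reference, a minimal conversion sketch (assuming OpenCV-style loading; which ordering ODISE actually expects is exactly the open question here):

import cv2

bgr = cv2.imread("demo/examples/coco.jpg")  # cv2.imread returns BGR
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # flip channel order for RGB pipelines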

The performance obtained is not ideal

Hello, thank you for your excellent work. Can you provide the detailed environment configuration for running your code? The results I achieved locally differ significantly from the expected ones you report. [attached screenshot]

Error while installing ODISE

I am running into many errors while installing ODISE. Mainly with compiling Mask2Former.

Errors include:

 fatal error: 'crypt.h' file not found
 fatal error: 'cusparse.h' file not found

Here is my workaround (or at least an attempt).

Install Mask2Former from its repo (https://github.com/facebookresearch/Mask2Former/blob/main/INSTALL.md) -- this is the main issue. However, make sure you use Python 3.9.

Once you are able to install Detectron2 and Mask2Former, you should be set for ODISE.

I had to prepend CUDA_HOME="/usr/local/cuda-11.3" to pip install -e . to install detectron2 and mask2former inside my conda environment.

512x512 configuration as in ablation studies

Hello, could you share the $512\times512$ configuration used in the ablation study? Is there any change other than the resolution?

I've just modified every 1024 to 512 in configs/common/data/coco_panoptic_semseg.py. The diff looks like this:

--- a/configs/common/data/coco_panoptic_semseg.py
+++ b/configs/common/data/coco_panoptic_semseg.py
@@ -49,10 +49,10 @@ dataloader.train = L(build_d2_train_dataloader)(
             L(T.ResizeScale)(
                 min_scale=0.1,
                 max_scale=2.0,
-                target_height=1024,
-                target_width=1024,
+                target_height=512,
+                target_width=512,
             ),
-            L(T.FixedSizeCrop)(crop_size=(1024, 1024)),
+            L(T.FixedSizeCrop)(crop_size=(512, 512)),
         ],
         image_format="RGB",
     ),
@@ -68,7 +68,7 @@ dataloader.test = L(build_d2_test_dataloader)(
     mapper=L(DatasetMapper)(
         is_train=False,
         augmentations=[
-            L(T.ResizeShortestEdge)(short_edge_length=1024, sample_style="choice", max_size=2560),
+            L(T.ResizeShortestEdge)(short_edge_length=512, sample_style="choice", max_size=1280),
diff --git a/configs/common/models/odise_with_caption.py b/configs/common/models/odise_with_caption.py
index e2862cb..03a2bf8 100644
--- a/configs/common/models/odise_with_caption.py
+++ b/configs/common/models/odise_with_caption.py
@@ -25,7 +25,7 @@ model.backbone = L(FeatureExtractorBackbone)(
     ),
     out_features=["s2", "s3", "s4", "s5"],
     use_checkpoint=True,
-    slide_training=True,
+    slide_training=False,

I suppose $512\times512$ does not require sliding windows, so I turned slide_training off as well. I wonder whether these changes are consistent with your configuration.

out of memory

CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 23.69 GiB total capacity; 21.31 GiB already allocated; 12.06 MiB free; 21.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting

I am working on a GPU with 24 GB, and I run out of memory even though I am using 512x512 crops with batch size 1.

Is this a memory-leak problem? I am also invoking the garbage collector, but that only delays the 'out of memory' error.
Help is appreciated :)

Conda error while downloading the specified PyTorch CUDA versions

When I try to install these versions with conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia, I get the following error:

**"Downloading and Extracting Packages
CondaError: Downloaded bytes did not match Content-Length
url: https://conda.anaconda.org/nvidia/linux-64/libcufft-dev-10.7.1.112-ha5ce4c0_0.tar.bz2
target_path: /home/aub/anaconda3/pkgs/libcufft-dev-10.7.1.112-ha5ce4c0_0.tar.bz2
Content-Length: 206803679
downloaded bytes: 102857120

CancelledError()
CancelledError()
CancelledError()
CancelledError() "**

I have tried updating conda, but it did not work.

Error in the demo

Hello, great job! It looks like your demo shows an error during inference. Do you have any plan to fix it? Looking forward to playing with it :D

Some questions about the code

Thank you for your outstanding work.

I have thoroughly reviewed the paper and the code. Most of it is clear and understandable. However, I find the following sections rather perplexing: self.alpha_cond and self.alpha_cond_time_embed

self.alpha_cond = nn.Parameter(torch.zeros_like(self.ldm_extractor.ldm.uncond_inputs))
self.alpha_cond_time_embed = nn.Parameter(torch.zeros(self.ldm_extractor.ldm.unet.time_embed[-1].out_features))

It appears that self.alpha_cond and self.alpha_cond_time_embed are used to interact with prefixes (as referenced here), which are generated by the Implicit Captioner. Subsequently, the results of this interaction are fed into the Latent Diffusion Model.

I'm curious about the necessity of the following operation (as mentioned here):

batched_inputs["cond_inputs"] = (self.ldm_extractor.ldm.uncond_inputs + torch.tanh(self.alpha_cond) * prefix_embed).

It seems that we could directly feed prefix_embed into the Latent Diffusion Model. I would like to understand the purpose and rationale behind introducing self.alpha_cond and self.alpha_cond_time_embed. Has any previous work employed such an operation?

I eagerly anticipate your response. Thank you very much.
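
For context, a minimal sketch of the gating pattern in question, with hypothetical tensor shapes (this only restates the quoted lines; it is not the authors' answer): because the gate is zero-initialized, tanh(alpha_cond) is 0 at the start of training, so cond_inputs initially equals the frozen model's uncond_inputs, and the image-conditioned prefix is blended in only as the gate is learned.

import torch
import torch.nn as nn

# hypothetical shapes: (batch, tokens, channels) as in Stable Diffusion text conditioning
uncond_inputs = torch.randn(1, 77, 768)                     # frozen unconditional embedding
prefix_embed = torch.randn(1, 77, 768)                      # output of the implicit captioner

alpha_cond = nn.Parameter(torch.zeros_like(uncond_inputs))  # zero-initialized gate

# tanh(0) == 0, so at initialization cond_inputs equals uncond_inputs exactly
cond_inputs = uncond_inputs + torch.tanh(alpha_cond) * prefix_embed
assert torch.equal(cond_inputs, uncond_inputs)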

Can't reduce the batch size

My setup has 8 Titan X GPUs. When I tried to set --ref 32, it gives this error:

/var/spool/slurm/slurmd/job86812/slurm_script: line 50: $benchmarch_logs: ambiguous redirect
Traceback (most recent call last):
  File "/home/mu480317/ODISE/./tools/train_net.py", line 392, in <module>
    launch(
  File "/home/mu480317/.conda/envs/ODISE/lib/python3.9/site-packages/detectron2/engine/launch.py", line 67, in launch
    mp.spawn(
  File "/home/mu480317/.conda/envs/ODISE/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/mu480317/.conda/envs/ODISE/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/mu480317/.conda/envs/ODISE/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 5 terminated with the following error:
Traceback (most recent call last):
  File "/home/mu480317/.conda/envs/ODISE/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/mu480317/.conda/envs/ODISE/lib/python3.9/site-packages/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/home/mu480317/ODISE/tools/train_net.py", line 319, in main
    cfg = auto_scale_workers(cfg, comm.get_world_size())
  File "/home/mu480317/ODISE/odise/config/utils.py", line 65, in auto_scale_workers
    assert cfg.dataloader.train.total_batch_size % old_world_size == 0, (
AssertionError: Invalid reference_world_size in config! 8 % 32 != 0

When --ref 8 is used instead, the GPU memory overflows.

Please help me solve this. Thank you.
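
For context, a paraphrase of the divisibility check that fires here (a sketch of the quoted assertion from odise/config/utils.py, not the full auto_scale_workers function): the total batch size must be a multiple of the reference world size, which is why a --ref value larger than the configured batch size cannot work.

def check_reference_world_size(total_batch_size: int, reference_world_size: int) -> None:
    # the config's total batch size is defined relative to reference_world_size GPUs,
    # so it must divide evenly before it can be rescaled; in the report above the two
    # values are 8 and 32, hence "8 % 32 != 0"
    assert total_batch_size % reference_world_size == 0, (
        f"Invalid reference_world_size in config! "
        f"{total_batch_size} % {reference_world_size} != 0"
    )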

Setup issue with mask2former

  running build_ext
  building 'MultiScaleDeformableAttention' extension
  Emitting ninja build file //ODISE/third_party/Mask2Former/build/temp.linux-x86_64-cpython-39/build.ninja...
  error: [Errno 2] No such file or directory: '//ODISE/third_party/Mask2Former/build/temp.linux-x86_64-cpython-39/build.ninja'
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> mask2former

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Is this error related to the ninja build?

Detail about 'background' class

Hi, thanks for your great work.

I have a question about the classifier in this paper: I want to know whether you used a 'background' class in $C_{train}$.

"We encode the names of all the categories in $C_{train}$ with the frozen text encoder and define the set of embeddings of all the training categories' names as:" [Equation 4 from the paper]

If you used a background class, is it learnable or fixed?

System RAM crashes while loading model in Google Colab

Thanks for the great Colab!

I have a problem.

  1. System RAM out of memory
    When executing the code below, it overflows the system RAM and crashes.
    I installed xformers, but that didn't prevent it. Is there any solution?

model = instantiate_odise(cfg.model)

Minimum GPU requirements

I get a CUDA out of memory error when I run python demo/demo.py --input demo/examples/coco.jpg --output demo/coco_pred.jpg --vocab "black pickup truck, pickup truck; blue sky, sky" on an RTX 3060 GPU with 12 GB of VRAM.

The last lines of the error are as follows:

output_features[k] = torch.zeros(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 176.00 MiB (GPU 0; 11.73 GiB total capacity; 8.91 GiB already allocated; 136.75 MiB free; 9.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

What are the minimum requirements for running the inference code? Is there a way to prevent these errors on less powerful systems? Is it possible to perform inference on a CPU?

Thanks!
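
Not an answer to the minimum-requirements question, but following the hint in the error message itself, one generic PyTorch knob to try first (an assumption, not an ODISE-specific fix):

import os

# must be set before CUDA is initialized, i.e. before importing the model code;
# equivalently, from the shell:
#   PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python demo/demo.py ...
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"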

Installation pulls in a too-new version of numpy

When installing following the instructions in the README, we end up with numpy version 1.25.2.

This breaks the detectron2 visualizer, which throws the error: module 'numpy' has no attribute 'bool'.

Therefore, I needed to downgrade numpy to v1.23.*: conda install numpy==1.23.*

Then it works as expected

Installation question about detectron2

Running pip install -e . returns:

ERROR: Could not build wheels for detectron2, which is required to install pyproject.toml-based projects

Details:
39\detectron2\model_zoo\configs\new_baselines
copying detectron2\model_zoo\configs\new_baselines\mask_rcnn_R_50_FPN_50ep_LSJ.py -> build\lib.win-amd64-cpython-39\detectron2\model_zoo\configs\new_baselines
running build_ext
D:\anaconda\envs\nerf\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
building 'detectron2._C' extension
creating C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39
creating C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39\Release
creating C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39\Release\Users
creating C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39\Release\Users\PaXini_035
creating C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39\Release\Users\PaXini_035\AppData
creating C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39\Release\Users\PaXini_035\AppData\Local
creating C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39\Release\Users\PaXini_035\AppData\Local\Temp
creating C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39\Release\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z
creating C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39\Release\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092
creating C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39\Release\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\detectron2
error: could not create 'C:\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\build\temp.win-amd64-cpython-39\Release\Users\PaXini_035\AppData\Local\Temp\pip-install-a7zcsi7z\detectron2_c46d9e951c9e4544ac7db943756ba092\detectron2': The filename or extension is too long.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for detectron2
Running setup.py clean for detectron2
Building wheel for lvis (setup.py) ... done
Created wheel for lvis: filename=lvis-0.5.3-py3-none-any.whl size=14020 sha256=d9272abdad25f5a6bfe26b3f5bf00c215e352eb030e512cdfc85a1dc3100997e
Stored in directory: C:\Users\PaXini_035\AppData\Local\Temp\pip-ephem-wheel-cache-polfx76g\wheels\56\46\42\dc63fcf42b15c084a2d44b6d6854d3dd27d0f3886363ce582b
Building wheel for panopticapi (setup.py) ... done
Created wheel for panopticapi: filename=panopticapi-0.1-py3-none-any.whl size=9302 sha256=17b9b66051da4a373f6fceff0b62c995b4a3173b8ba6d5a8560729a47abed543
Stored in directory: C:\Users\PaXini_035\AppData\Local\Temp\pip-ephem-wheel-cache-polfx76g\wheels\52\9a\3e\b664fb2d7b0016a15b505840f9d97ece85bbc203b74debcde0
Building wheel for pathtools (setup.py) ... done
Created wheel for pathtools: filename=pathtools-0.1.2-py3-none-any.whl size=8801 sha256=bd9d445360da0cdea47b35a8206cac311954640517ba449e1678f28c5e10878b
Stored in directory: c:\users\paxini_035\appdata\local\pip\cache\wheels\ac\67\0c\7406f4ff2becf8690a173e4ad09fad416c31dd5ddcb23b7f9d
Building wheel for future (setup.py) ... done
Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492055 sha256=d9023b0844c47de4abc7637c72575b36e901f949ed769d05226bd5478252fbdc
Stored in directory: c:\users\paxini_035\appdata\local\pip\cache\wheels\56\e1\4e\6ceef740e8a6cd23736ece789be212141ec1a451067edcb87f
Successfully built diffdist antlr4-python3-runtime mask2former test-tube lvis panopticapi pathtools future
Failed to build detectron2
ERROR: Could not build wheels for detectron2, which is required to install pyproject.toml-based projects

RuntimeError: expected scalar type Half but found Float

Thanks for your great work!

When I run tools/train_net.py with 2 V100 GPUs, I encounter the following error:

File "/mnt/cap/caijh/app/src/detectron2/detectron2/engine/train_loop.py", line 155, in train
    self.run_step()
  File "/mnt/workspace/code/ODISE/odise/engine/train_loop.py", line 297, in run_step
    grad_norm = self.grad_scaler(
  File "/mnt/workspace/code/ODISE/odise/engine/train_loop.py", line 207, in __call__
    self._scaler.scale(loss).backward(create_graph=create_graph)
  File "/mnt/cap/caijh/anaconda3/envs/odise/lib/python3.9/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/mnt/cap/caijh/anaconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/mnt/cap/caijh/anaconda3/envs/odise/lib/python3.9/site-packages/torch/autograd/function.py", line 267, in apply
    return user_fn(self, *args)
  File "/mnt/workspace/code/ODISE/third_party/stable-diffusion/ldm/modules/diffusionmodules/util.py", line 138, in backward
    output_tensors = ctx.run_function(*shallow_copies)
  File "/mnt/workspace/code/ODISE/third_party/stable-diffusion/ldm/modules/attention.py", line 212, in _forward
    x = self.attn1(self.norm1(x)) + x
  File "/mnt/cap/caijh/anaconda3/envs/odise/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/cap/caijh/anaconda3/envs/odise/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/mnt/cap/caijh/anaconda3/envs/odise/lib/python3.9/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Half but found Float

The arguments are:

./tools/train_net.py --config-file configs/Panoptic/odise_label_coco_50e.py --num-gpus 2 --amp

I'd appreciate any ideas for solving this issue, thank you.
