shi-labs / oneformer

OneFormer: One Transformer to Rule Universal Image Segmentation, arxiv 2022 / CVPR 2023

Home Page: https://praeclarumjj3.github.io/oneformer

License: MIT License

Python 21.07% Shell 0.03% C++ 0.26% Cuda 2.38% Jupyter Notebook 76.26%
ade20k cityscapes coco image-segmentation instance-segmentation panoptic-segmentation semantic-segmentation transformer oneformer universal-segmentation

oneformer's People

Contributors

alihassanijr, honghuis, praeclarumjj3, rbavery, skalskip


oneformer's Issues

A question during distributed training.

Excellent work! I have a question about the loss_contrastive. I find that when using distributed training, the loss_contrastive function collects text and image features from the whole batch. I think a situation like this can arise: for a semantic segmentation task, two images A and B have the same classes, which means they have exactly the same $Q_{\text{text}}$. An object query from A should be close to a text query from A and far away from the other queries. But there is also a text query from B that is exactly the same as the text query from A. I think this is a conflict. Hope for your reply.
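To make the described conflict concrete, here is a small, hypothetical sketch of an InfoNCE-style object-text contrastive loss (not the repository's actual implementation): with two images that share an identical text query, the cross-entropy target still treats the other image's copy as a negative.

import torch
import torch.nn.functional as F

# Hypothetical illustration: 2 images, 1 object/text query pair each, gathered across GPUs.
# Image A and image B contain the same classes, so their text queries are identical.
q_obj = torch.randn(2, 256)                       # object queries from A and B
q_txt_a = torch.randn(1, 256)
q_txt = torch.cat([q_txt_a, q_txt_a.clone()])     # text query of B duplicates that of A

logits = F.normalize(q_obj, dim=-1) @ F.normalize(q_txt, dim=-1).t()   # (2, 2) similarities
targets = torch.arange(2)                         # each object query is paired with "its own" text query

# The cross-entropy pushes q_obj[0] away from q_txt[1] even though q_txt[1] == q_txt[0]:
loss = F.cross_entropy(logits, targets)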

How to train Binary Semantic Segmentation

Thank you for sharing good resources.
I am trying to do binary semantic segmentation with only one class. The data I currently have are images and masks, and the masks are labeled with 0 and 1 (background 0, foreground 1).

I've looked at the code provided by OneFormer on GitHub, but all the examples are for instance (panoptic) segmentation, and I couldn't find any examples of semantic segmentation using just images and masks, without the JSON file used for instances. So I'm asking you this question.
(I've tried training semantic segmentation with the ADE20K dataset as an example, but even then, instance annotations were essential.)

Is it possible to do semantic segmentation without panoptic (or instance) annotation files? If so, what resources should I refer to? It's a bit complicated, so it's a bit difficult to understand.
Thank you.
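For reference, a minimal sketch of registering a mask-only semantic dataset in Detectron2 without any panoptic/instance JSON; the paths and class names below are placeholders, and this bypasses OneFormer's unified dataset mapper, so treat it only as a starting point:

from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data.datasets import load_sem_seg

# Placeholder layout: images/*.jpg and masks/*.png, mask pixels are 0 (background) or 1 (foreground).
image_root = "datasets/my_binary/images"
gt_root = "datasets/my_binary/masks"

DatasetCatalog.register(
    "my_binary_sem_seg_train",
    lambda: load_sem_seg(gt_root, image_root, gt_ext="png", image_ext="jpg"),
)
MetadataCatalog.get("my_binary_sem_seg_train").set(
    stuff_classes=["background", "foreground"],
    ignore_label=255,
    evaluator_type="sem_seg",
)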

Cuda Out Of Memory

When doing inference on my own data I get this warning: Attempting to copy inputs of <function sem_seg_postprocess at 0x7fc057578a70> to CPU due to CUDA OOM

Does this affect the final performance, and is there a way to fix it?

I am using one RTX 2070 for inference.

GPU Memory requirements

Hi

Firstly thank you for releasing this amazing work. Not only is the model amazing but the code quality is excellent. Very easy to follow.

I have a question regarding GPU memory requirements for training. In the readme there's a bit of conflicting information.

We train all our models using 8 A6000 (48 GB each) GPUs. We use 8 A100 (80 GB each) for training Swin-L† OneFormer and DiNAT-L† OneFormer on COCO and all

Is it 8xA6000 (384GB) or 8xA100 (640GB)?
Additionally would it be possible to achieve good results with less, say 2xA6000 (96GB), with it just taking longer?

Many Thanks
Tom
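Not an official answer, but the usual Detectron2 practice when fewer GPUs are available is to shrink the per-step batch and scale the learning rate and schedule with it; a hedged sketch with made-up numbers (the reference schedule of 16 images/batch, LR 1e-4, 160k iterations is an assumption, not the authors' exact recipe):

from detectron2.config import get_cfg

cfg = get_cfg()
# Assumed reference: 16 images/batch on 8 GPUs, base LR 1e-4, 160k iterations.
# On 2x A6000, one common (unofficial) adjustment is to scale everything together:
cfg.SOLVER.IMS_PER_BATCH = 4        # e.g. 2 images per GPU on 2 GPUs
cfg.SOLVER.BASE_LR = 2.5e-5         # LR scaled linearly with the batch size (16 -> 4)
cfg.SOLVER.MAX_ITER = 640000        # 4x the iterations so the model still sees the same number of images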

Questions about the input of the class_transformer

Thanks for your great work!! But I found something that confused me.

To make things easier, let's first look at the logic of the Transformer in the code.

The self.class_transformer is an instance of Transformer, and its forward is

def forward(self, src, mask, query_embed, pos_embed, task_token=None):

Here, src is fed into the transformer encoder layers (instances of TransformerEncoderLayer):
output = src
for layer in self.layers:
    output = layer(
        output, src_mask=mask, src_key_padding_mask=src_key_padding_mask, pos=pos
    )

which is then passed through self.with_pos_embed:

q = k = self.with_pos_embed(src, pos)

And the function with_pos_embed is:
def with_pos_embed(self, tensor, pos: Optional[Tensor]):
    return tensor if pos is None else tensor + pos

Here, in my understanding, tensor denotes the input features of the transformer encoder, and pos denotes the positional embeddings.

However, it seems that the tensor feats is actually the positional embeddings but is treated as the input features, while self.class_input_proj(mask_features) is actually the input features but is treated as the positional embeddings:

feats = self.pe_layer(mask_features, None)
out_t, _ = self.class_transformer(feats, None,
                                   self.query_embed.weight[:-1],
                                   self.class_input_proj(mask_features),
                                   tasks if self.use_task_norm else None)

Am I misunderstanding here?

RuntimeError: Not implemented on the CPU

I have read that similar issue, but the CUDA version on my machine is 11.6 and my PyTorch is installed with CUDA 11.3, so it doesn't feel like it's caused by the same bug. At the same time, I can run another repo normally in the same environment.

Colab training example

Hello 👋🏻! As requested in #9 I created a dedicated issue. I hope you will get email notifications now.

I started to build a Google Colab but ran into some problems:

ImportError: /usr/local/lib/python3.7/dist-packages/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8data_ptrIdEEPT_v

Here is my current version of the notebook: https://colab.research.google.com/drive/1ugQqod5zZLTh9bibOEaflI6QHwkcAM41#scrollTo=Rerjwwk_ZEY_

Do you have any idea how to solve that?

Outdated .pth and .yaml files for Di

Thanks for your incredible work team. Getting this error on inference:

Weight format of OneFormerHead have changed! Please upgrade your models. Applying automatic conversion now ... WARNING [11/26 13:32:01 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:

Using this model for config & checkpoint: OneFormer | DiNAT-L† | 896×896

I am running inference on cpu with:
cfg.MODEL.DEVICE = 'cpu'

Any idea where I'm going wrong would be greatly appreciated. Thanks !

Custom dataset training clarification

Hi team 👋!

First of all, great project! I'm super excited to see that you used Detectron2 as the framework of choice.

I'm trying to train my own model using a custom dataset in COCO format. For now I have 2 questions:

  1. Do I need to provide annotations for all tasks? That's how I understand those guidelines. I'm mostly interested in the instance segmentation task:
coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
    caption_{train,val}2017.json
    # evaluate on instance labels derived from panoptic annotations
    panoptic2instances_val2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/  # png annotations
  panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
  2. Do I need to train on 8x A100? I understand that you needed that much power when training from scratch, but if I use your checkpoint, will 1x A100 be sufficient?

ERROR [04/29 21:09:11 d2.checkpoint.c2_model_loading]: Ambiguity found for res5.0.conv1.norm.bias in checkpoint!It matches at least two keys in the model (roi_heads.res5.0.conv1.norm.bias and backbone.res5.0.conv1.norm.bias).

I was stuck at this error, any help would be great ! @praeclarumjj3

[04/29 21:09:11 oneformer.data.dataset_mappers.oneformer_unified_dataset_mapper]: [OneFormerUnifiedDatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=..., max_size=2048, sample_style='choice'), RandomCrop_CategoryAreaConstraint(crop_type='absolute', crop_size=[1024, 1024], single_category_max_area=1.0, ignored_category=255), <detectron2.projects.point_rend.color_augmentation.ColorAugSSDTransform object at 0x7fa072dcfe20>, RandomFlip()]
[04/29 21:09:11 d2.data.build]: Using training sampler TrainingSampler
[04/29 21:09:11 d2.data.common]: Serializing 6000 elements to byte tensors and concatenating them all ...
[04/29 21:09:11 d2.data.common]: Serialized dataset takes 7.08 MiB
[04/29 21:09:11 fvcore.common.checkpoint]: [Checkpointer] Loading from detectron2://ImageNetPretrained/torchvision/R-50.pkl ...
[04/29 21:09:11 fvcore.common.checkpoint]: Reading a file from 'torchvision'
ERROR [04/29 21:09:11 d2.checkpoint.c2_model_loading]: Ambiguity found for res5.0.conv1.norm.bias in checkpoint!It matches at least two keys in the model (roi_heads.res5.0.conv1.norm.bias and backbone.res5.0.conv1.norm.bias).
Traceback (most recent call last):
  File "train_net.py", line 435, in <module>
    launch(
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 424, in main
    trainer.resume_or_load(resume=args.resume)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 412, in resume_or_load
    self.checkpointer.resume_or_load(self.cfg.MODEL.WEIGHTS, resume=resume)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/fvcore/common/checkpoint.py", line 227, in resume_or_load
    return self.load(path, checkpointables=[])
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/checkpoint/detection_checkpoint.py", line 52, in load
    ret = super().load(path, *args, **kwargs)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/fvcore/common/checkpoint.py", line 156, in load
    incompatible = self._load_model(checkpoint)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/checkpoint/detection_checkpoint.py", line 97, in _load_model
    checkpoint["model"] = align_and_update_state_dicts(
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/checkpoint/c2_model_loading.py", line 287, in align_and_update_state_dicts
    raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")
ValueError: Cannot match one checkpoint key to multiple keys in the model.

I'm using a custom dataset for panoptic segmentation.
I wrote the register file and config file, and used oneformer_unified_dataset_mapper and COCOPanopticEvaluator.

Thanks in advance !

no images found in image directory!

Greetings,
I have followed your installation steps and dataset preparation steps as given. The issue is that it says no images are found in the directory, even though the images are there. Any help would be great. Thanks in advance.

[03/20 15:44:27 oneformer.data.dataset_mappers.oneformer_unified_dataset_mapper]: [OneFormerUnifiedDatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=..., max_size=2560, sample_style='choice'), RandomCrop_CategoryAreaConstraint(crop_type='absolute', crop_size=[640, 640], single_category_max_area=1.0, ignored_category=255), <detectron2.projects.point_rend.color_augmentation.ColorAugSSDTransform object at 0x7f05844cbca0>, RandomFlip()]
Traceback (most recent call last):
  File "train_net.py", line 435, in <module>
    launch(
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 423, in main
    trainer = Trainer(cfg)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 378, in __init__
    data_loader = self.build_train_loader(cfg)
  File "train_net.py", line 162, in build_train_loader
    return build_detection_train_loader(cfg, mapper=mapper)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/config/config.py", line 207, in wrapped
    explicit_args = _get_args_from_config(from_config, *args, **kwargs)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/config/config.py", line 245, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/data/build.py", line 337, in _train_loader_from_config
    dataset = get_detection_dataset_dicts(
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/data/build.py", line 240, in get_detection_dataset_dicts
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/data/build.py", line 240, in <listcomp>
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/data/catalog.py", line 58, in get
    return f()
  File "/home/iit29/Desktop/OneFormer/oneformer/data/datasets/register_ade20k_panoptic.py", line 295, in <lambda>
    lambda: load_ade20k_panoptic_json(
  File "/home/iit29/Desktop/OneFormer/oneformer/data/datasets/register_ade20k_panoptic.py", line 267, in load_ade20k_panoptic_json
    assert len(ret), f"No images found in {image_dir}!"
AssertionError: No images found in /home/iit29/Desktop/OneFormer/datasets/ADEChallengeData2016/ADEChallengeData2016/images/training!
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb:

Any help would be great, thanks in advance!

Segmentation label map

Hello
How are you?
Thanks for contributing to this project.
I want to get a single segmentation map containing all the class labels, rather than a binary mask for each instance.
How can I get it?
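In case it helps, a minimal sketch of collapsing a Detectron2-style semantic output into a single label map, assuming the prediction dict exposes a "sem_seg" tensor of shape (num_classes, H, W):

import torch

def to_label_map(outputs):
    """Collapse per-class scores into one segmentation map holding a class id per pixel."""
    sem_seg = outputs["sem_seg"]                       # (num_classes, H, W) class scores
    return torch.argmax(sem_seg, dim=0).cpu().numpy()  # (H, W) label map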

Queries on training custom dataset for panoptic task

I know these are very basic queries, thanks in advance !

  1. After writing a file similar to https://github.com/SHI-Labs/OneFormer/blob/main/oneformer/data/datasets/register_ade20k_panoptic.py to register custom data, should I run that specific Python file separately to register the dataset? Is that right? (See the sketch after this list.)

  2. I'm using OneFormerUnifiedDatasetMapper: https://github.com/SHI-Labs/OneFormer/blob/main/oneformer/data/dataset_mappers/oneformer_unified_dataset_mapper.py

  3. I'm using a config file similar to https://github.com/SHI-Labs/OneFormer/blob/5e04c9aaffd9bc73020d2238757f62346fe778c0/configs/ade20k/Base-ADE20K-UnifiedSegmentation.yaml

  4. I have a doubt about the evaluator: can I use any of the provided evaluators for my dataset? If not, kindly guide me through it.
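On question 1, registration in Detectron2 is just a function call executed at import time, so the register file is normally imported (e.g. from train_net.py) rather than run separately; a hedged sketch with placeholder names:

# Hypothetical register_my_panoptic.py -- registering happens on import,
# so importing this module from train_net.py (or your own launcher) is enough; nothing is "run" separately.
from detectron2.data import DatasetCatalog, MetadataCatalog

def load_my_panoptic_dicts():
    # Return a list of dataset dicts in Detectron2 format (placeholder).
    return []

DatasetCatalog.register("my_panoptic_train", load_my_panoptic_dicts)
MetadataCatalog.get("my_panoptic_train").set(panoptic_root="datasets/my_dataset/panoptic_train")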

Total Loss stops reducing while fine tuning for Instance Segmentation on Custom Dataset. Should I continue to train for more iterations ?

Hello There,

Thanks for sharing the amazing work!

I have been experimenting with the OneFormer repo for the past few days, and I am able to run training (fine-tuning) for instance segmentation using a custom dataset on 1 GPU (Tesla T4) by reducing the image size to 512.

The following are the changes I have made to my configuration:

cfg.INPUT.IMAGE_SIZE = 512
cfg.SOLVER.IMS_PER_BATCH = 1 (Even 16 works)

cfg.MODEL.ROI_HEADS.NUM_CLASSES = <Number Of Classes In My Dataset>
cfg.MODEL.RETINANET.NUM_CLASSES = <Number Of Classes In My Dataset>
cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES = <Number Of Classes In My Dataset>

cfg.SOLVER.MAX_ITER = 40000

with default Base Learning Rate of 0.0001

COCO DINAT Configuration file : oneformer_dinat_large_bs16_100ep.yaml

MODEL WEIGHTS : 150_16_dinat_l_oneformer_coco_100ep.pth

My dataset has approx 10,000 images in the train set.

I found the training settings you used in the Appendix section of the paper. So a batch size of 16 was used for around 90K or more iterations, depending on the dataset.

I have trained the model with varying batch sizes, but I observe that the total loss stops reducing after a few thousand iterations.

For example, at a batch size of 1, the starting total loss was 87, which reduced to around 13 in 8000 iterations. But after that the training loss oscillates between values of 9 and 28.

So, with this observation, what is recommended?

  1. Should I train the model longer ?
  2. Should I increase the batch size and train longer ?
  3. Should I change the Learning Rate ?
  4. Is there any other modification that is required for fine tuning ?
  5. What is a ballpark number of iterations or epochs that one might need to fine-tune this architecture on a train set of 10K images?

Thanks for the help !

Error when training for instance segmentation with a custom dataset

I am using my custom dataset in COCO format for instance segmentation training.
I changed the cfg to

cfg.MODEL.TEST.TASK = "instance"
cfg.INPUT.TASK_PROB.SEMANTIC = 0
cfg.INPUT.TASK_PROB.INSTANCE = 1

I am still getting an error: UnboundLocalError: local variable 'pan_seg_gt' referenced before assignment

From #5 and reading the docs, I understand I have to somehow prepare my dataset for instance segmentation training.

  1. Is it correct to say OneFormer expects a COCO dataset in panoptic format?
  2. If 1) is true, how do I convert my custom instance-segmentation COCO dataset to panoptic format? (See the sketch after this list.)
  3. I found a script from panopticapi to convert from instance to panoptic format, but judging from the description it will merge every instance annotation in an image into a single annotation, which would defeat the purpose of training an instance segmentation model.
    Also getting a KeyError when using that script: cocodataset/panopticapi#58
  4. How do I prepare a detection COCO dataset to train for instance segmentation with OneFormer? Thanks.
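On question 2, one rough, unofficial sketch of building a panoptic PNG per image from decoded instance masks, giving every instance its own segment id (this is not the repository's conversion script, and overlapping masks are resolved naively):

import numpy as np
from PIL import Image
from panopticapi.utils import id2rgb

def instances_to_panoptic(masks, category_ids, out_path):
    """masks: list of (H, W) boolean arrays; category_ids: matching COCO category ids."""
    h, w = masks[0].shape
    pan_ids = np.zeros((h, w), dtype=np.uint32)       # 0 = unlabeled / VOID
    segments_info = []
    for seg_id, (mask, cat_id) in enumerate(zip(masks, category_ids), start=1):
        pan_ids[mask] = seg_id                        # later instances overwrite earlier ones on overlap
        segments_info.append({"id": seg_id, "category_id": cat_id, "iscrowd": 0})
    Image.fromarray(id2rgb(pan_ids)).save(out_path)   # panoptic PNG with one color per segment id
    return segments_info                              # goes into the panoptic JSON entry for this image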


How can we add Validation Dataset for periodic evaluation during training ?

Hello There,

Firstly, thanks for sharing the amazing work!

I am using OneFormer for an instance segmentation task on a custom dataset.

I read #17 and used the InstanceCOCOCustomNewBaselineDatasetMapper from instance_coco_custom_dataset_mapper.py, and I am able to train the model on my dataset.

I was trying to figure out whether I can get inference results on the validation dataset periodically, say every 100 iterations.

I modified the cfg as below:

cfg.DATASETS.TEST = "CustomInstSegVAL"
cfg.TEST.EVAL_PERIOD = 100

I created an overridden version of the build_test_loader function using the InstanceCOCOCustomNewBaselineDatasetMapper, as below:

def build_test_loader(cls, cfg, dataset_name):
    val_mapper = InstanceCOCOCustomNewBaselineDatasetMapper(cfg, is_train=True)
    return build_detection_test_loader(DatasetCatalog.get('CustomInstSegVal'), mapper=val_mapper)

NOTE: if I set is_train=False in val_mapper = InstanceCOCOCustomNewBaselineDatasetMapper(cfg, is_train=False), then it throws an AssertionError from the build_transform_gen function of InstanceCOCOCustomNewBaselineDatasetMapper.

If I set is_train=True in val_mapper = InstanceCOCOCustomNewBaselineDatasetMapper(cfg, is_train=True), then it throws an AssertionError from pycocotools/coco.py stating "AssertionError: Results do not correspond to current coco set".

Can you please guide me on how to use the validation dataset to test the model's performance during training?

Thanks !
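One hedged option is to let Detectron2 build its default test-time loader instead of reusing the training mapper; the sketch below assumes the validation set is already registered and that cfg.DATASETS.TEST is set to a tuple such as ("CustomInstSegVal",):

from detectron2.data import build_detection_test_loader
from detectron2.evaluation import COCOEvaluator

def build_val_loader_and_evaluator(cfg, dataset_name, output_dir=None):
    """Default Detectron2 test loader + COCO evaluator for an already-registered dataset."""
    loader = build_detection_test_loader(cfg, dataset_name)        # uses the default test-time mapper
    evaluator = COCOEvaluator(dataset_name, output_dir=output_dir)
    return loader, evaluator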

`cd oneformer/modeling/pixel_decoder/ops` `sh make.sh`

cd oneformer/modeling/pixel_decoder/ops
sh make.sh

and got:

/OneFormer/oneformer/modeling/pixel_decoder/ops/setup.py", line 52, in get_extensions
    raise NotImplementedError('CUDA_HOME is None. Please set environment variable CUDA_HOME.')
NotImplementedError: CUDA_HOME is None. Please set environment variable CUDA_HOME.

How can I run this on CPU?
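A small diagnostic sketch for checking what the build script would use; the ops need the CUDA toolkit (nvcc), so if both values below are empty, make.sh cannot compile the extension on that machine:

import os
import shutil
from torch.utils.cpp_extension import CUDA_HOME

print("CUDA_HOME:", CUDA_HOME or os.environ.get("CUDA_HOME"))   # typically /usr/local/cuda
print("nvcc on PATH:", shutil.which("nvcc"))
# If both are None, only the driver (or nothing) is installed and the CUDA op cannot be built.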

Bounding box for each instance

Hello
How are you?
Thanks for contributing to this project.
I found that the demo script does NOT output bounding boxes for each instance in panoptic/instance segmentation.
Could you guide me on how to get the bounding boxes?
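One straightforward, unofficial way is to derive a box from each predicted instance mask; the sketch below assumes the instance output exposes a boolean pred_masks tensor of shape (N, H, W), as Detectron2 Instances usually do (recent torchvision versions also ship torchvision.ops.masks_to_boxes for the same purpose):

import torch

def masks_to_boxes(pred_masks):
    """pred_masks: (N, H, W) boolean tensor -> (N, 4) boxes in (x1, y1, x2, y2) format."""
    boxes = []
    for mask in pred_masks:
        ys, xs = torch.where(mask)
        if len(xs) == 0:                              # empty mask -> degenerate box
            boxes.append(torch.zeros(4))
            continue
        boxes.append(torch.stack([xs.min(), ys.min(), xs.max(), ys.max()]).float())
    return torch.stack(boxes)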

Mapillary dataset not working in google colab

Hi, so I tried to clone the repo in your Google Colab sample and modified the code to include Mapillary. It seems to be failing with a metadata dictionary error in get_config, where meta info seems to be missing, i.e. the thing-to-contiguous-id mapping.

Calculate and Compare Panoptic Quality With Paper Results

Hi! I plan to compare OneFormer's panoptic quality results with the results stated in the paper for the COCO dataset. Initially, I need to convert OneFormer's output to the COCO panoptic .json format and then use panopticapi to evaluate panoptic quality. I do not know whether this is correct; I tried but could not succeed. Could you please tell me the accurate steps and proper links to perform this task? I am using OneFormer's Colab. Thank you in anticipation.
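If it helps, a hedged sketch of the panopticapi call once predictions have been exported as COCO-panoptic PNGs plus a matching JSON (all file paths below are placeholders):

from panopticapi.evaluation import pq_compute

# Assumption: predictions were already written as panoptic PNGs plus a JSON in COCO panoptic format.
results = pq_compute(
    gt_json_file="annotations/panoptic_val2017.json",
    pred_json_file="predictions/panoptic_preds.json",
    gt_folder="annotations/panoptic_val2017",
    pred_folder="predictions/panoptic_preds",
)
print(results["All"])   # overall PQ / SQ / RQ (plus "Things", "Stuff", and "per_class" entries)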

Inquiry about ADE20K caption labels

Hi, thanks for your great work. I did not find the caption-related annotations on the ADE20K website; could you point out how to get them, or how to generate them?

Cannot Reproduce Results (Concerning Discrepancies !)

Thanks for this work. I would like to ask if you can please share the logs for the Swin-L backbone on ADE20K (640×640). I tried and got similar numbers to #14, and I wonder what the issue is.

Specifically, you reported the following numbers, hence I would like to see the logs to understand the issue:
49.8, 35.9, 57.0

Also it would be great if you can please share the logs for Swin-L backbone using Cityscapes dataset.

P.S.: Now that this work is accepted to CVPR, it is crucial to maintain reproducibility.

Few Questions regarding training on custom data

Hi, I am trying to train OneFormer on a custom dataset and was able to start the training, but I have a few questions regarding choosing the right settings. Currently I reused the ADE20K config file after editing the number of classes, iterations, and batch size.

  1. What does DETECTIONS_PER_IMAGE do, and how do I choose the right value?
  2. How do I choose the right crop size? Will it impact training or prediction time?
  3. I have 20k labeled images and I am training on 4 NVIDIA A100 40GB GPUs with batch size 4. What is the minimum number of iterations required to get good results?

Does Q only have 3 possible values?

According to the paper, the queries Q are only conditioned on "the task is {task}", but {task} has only 3 possible values. So do the queries only have 3 possible values?

Few queries regarding the config file

Can you please elaborate on the DATASETS section in Base-ADE20K-UnifiedSegmentation.yaml?
DATASETS:
  TRAIN: ("ade20k_panoptic_train",)
  TEST_PANOPTIC: ("ade20k_panoptic_val",)
  TEST_INSTANCE: ("ade20k_instance_val",)
  TEST_SEMANTIC: ("ade20k_sem_seg_val",)

These are the ground-truth splits for train and val, right?
What if instance ground truths are not available?
Kindly clarify, thanks in advance!
@praeclarumjj3

Just semantic segmentation training, how should I do it?

If I just want semantic segmentation training and do not want instance segmentation training, how should I do it?

I have just prepared the dataset like this:
ADEChallengeData2016/
  images/
  annotations/  # semantic segmentation data
I don't want to download annotations_instance.tar, which contains the instance segmentation data.
What should I do next?

Convert Model to TensorRT

@honghuis @SkalskiP thanks for sharing the source code. Just wanted to know: can we convert this model to TensorRT or ONNX format? If so, please share the conversion and inference script.

Thanks in advance

AssertionError: Checkpoint dinat_large_in22k_in1k_384_11x11.pkl not found!

[03/21 22:37:19 d2.data.build]: Using training sampler TrainingSampler
[03/21 22:37:19 d2.data.common]: Serializing 20210 elements to byte tensors and concatenating them all ...
[03/21 22:37:19 d2.data.common]: Serialized dataset takes 18.42 MiB
[03/21 22:37:19 fvcore.common.checkpoint]: [Checkpointer] Loading from dinat_large_in22k_in1k_384_11x11.pkl ...
Traceback (most recent call last):
  File "train_net.py", line 435, in <module>
    launch(
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 424, in main
    trainer.resume_or_load(resume=args.resume)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 412, in resume_or_load
    self.checkpointer.resume_or_load(self.cfg.MODEL.WEIGHTS, resume=resume)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/fvcore/common/checkpoint.py", line 227, in resume_or_load
    return self.load(path, checkpointables=[])
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/detectron2/checkpoint/detection_checkpoint.py", line 52, in load
    ret = super().load(path, *args, **kwargs)
  File "/home/iit29/anaconda3/envs/oneformer/lib/python3.8/site-packages/fvcore/common/checkpoint.py", line 153, in load
    assert os.path.isfile(path), "Checkpoint {} not found!".format(path)
AssertionError: Checkpoint dinat_large_in22k_in1k_384_11x11.pkl not found!

Similar errors pop up when I try other backbones too, and I have no idea why. Any help would be great, thanks in advance!

GPU cost during training

Thanks for sharing the awesome work!

I have a minor question: how many GPU hours does the model training process need on the COCO and ADE20K datasets?

A question about paper and code

Thanks for sharing your great work! I have several questions from reading the paper and the code. Hope to discuss.

  1. About the contrastive loss: according to the paper, T_pad is a list of representations, one for each mask to be detected in the image. How is this relationship maintained during the training process? I found that "i" in the pairs {q^obj_i, q^txt_i} is simply an index in the code, so it seems q^txt always matches the q^obj with the same index. But we aren't supposed to know which object each q^obj represents before the decoder inference. Did I misunderstand the paper?

  2. Table 6 of the ablation study confuses me. It looks like the ablation is about some kind of prompt engineering (of course it's not). I still can't see why adding "a photo with a" raises model performance. Does this paper use a pretrained text encoder? Do you have any new idea or explanation for this ablation?

Correct dataset format to fine-tune with Hugging Face?

Hi, first of all thank you for sharing your awesome work.

I'm trying to fine-tune the model for instance segmentation with a custom dataset that I have locally in COCO format. The issue is that I don't know exactly how to convert the segmentation polygon masks into the pixel_values and task_inputs that the model's forward function expects.

This is my data loader script:

import datasets
import os
from pycocotools.coco import COCO
from pathlib import Path

class COCODataset(datasets.GeneratorBasedBuilder):
    def _info(self):
        return datasets.DatasetInfo(
            description="COCO dataset",
            features=datasets.Features({
                # "pixel_values": ...
                # "task_inputs": ...
                "image": datasets.Image(),
                "annotations": datasets.Sequence({
                    "id": datasets.Value("int32"),
                    "image_id": datasets.Value("int32"),
                    "category_id": datasets.Value("int32"),
                    "area": datasets.Value("int32"),
                    "iscrowd": datasets.Value("int32"),
                    "bbox": datasets.Sequence(datasets.Value("float32")),
                    "attributes": {
                        "occluded": datasets.Value("bool"),
                    },
                    "segmentation": datasets.Sequence(datasets.Sequence(datasets.Value("float32"))),
                })
            }),       
        )
    def _split_generators(self, dl_manager):
        instances_train_path = dl_manager.download(os.path.join(self.config.data_dir, "annotations/instances_train.json"))
        instances_val_path = dl_manager.download(os.path.join(self.config.data_dir, "annotations/instances_val.json"))
        
        return [
            datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"images": instances_train_path}),
            datasets.SplitGenerator(name=datasets.Split.VALIDATION, gen_kwargs={"images": instances_val_path}),
        ]
    def _generate_examples(self, images):
        coco = COCO(images)
    
        for image_id in coco.imgs:
            image = coco.loadImgs(image_id)[0]
            annotations = coco.loadAnns(coco.getAnnIds(image_id))

            # Load the image content as bytes
            image_path = os.path.join(self.config.data_dir, "images", image["file_name"])
            image_content = Path(image_path).read_bytes()

            yield image_id, {
                "image": image_content,
                "annotations": annotations,
                # "pixel_values": ...,
                # "task_inputs": ...
            }

I know that I'm supposed to use OneFormerProcessor, but the examples provided are only for inference and don't specify how to process input masks. What exactly am I supposed to do in the _generate_examples method? Any tips are greatly appreciated!
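For what it's worth — not verified against any particular transformers version — here is a hedged sketch of what the preprocessing might look like, assuming OneFormerProcessor forwards segmentation_maps and instance_id_to_semantic_id to its image processor the way the other MaskFormer-family processors do, and returns pixel_values, task_inputs, mask_labels, and class_labels:

import numpy as np
from PIL import Image
from transformers import OneFormerProcessor

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")

def encode_example(image_path, coco, anns):
    """Turn one image and its COCO annotations into OneFormer training inputs (hedged sketch)."""
    image = Image.open(image_path).convert("RGB")
    inst_map = np.zeros((image.size[1], image.size[0]), dtype=np.int32)   # 0 = background
    inst_to_class = {0: 0}
    for inst_id, ann in enumerate(anns, start=1):                         # one id per instance
        inst_map[coco.annToMask(ann).astype(bool)] = inst_id
        inst_to_class[inst_id] = ann["category_id"]
    return processor(
        images=image,
        task_inputs=["instance"],
        segmentation_maps=inst_map,
        instance_id_to_semantic_id=inst_to_class,
        return_tensors="pt",
    )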

Just for reference, here is my train script as well:

import numpy as np
import evaluate
from transformers import OneFormerForUniversalSegmentation, TrainingArguments, Trainer
import datasets
import os

script_dir = os.path.dirname(os.path.abspath(__file__))
data_dir = os.path.join(script_dir, "..", "data/datasets/archviz-600-v2-coco")

ds = datasets.load_dataset(os.path.join(script_dir, "dataset_loader.py"), data_dir=data_dir)

print("Length of train dataset:", len(ds['train']))
print("Length of validation dataset:", len(ds['validation']))

model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_cityscapes_swin_large")
training_args = TrainingArguments(output_dir=os.path.join(script_dir, 'output'), evaluation_strategy="epoch")
metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=ds['train'],
    eval_dataset=ds['validation'],
    compute_metrics=compute_metrics,
)

trainer.train()

And this is the output:

Length of train dataset: 472
Length of validation dataset: 118

/usr/local/lib/python3.8/dist-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|                                                                                                                                                              

| 0/177 [00:00<?, ?it/s]Traceback (most recent call last):
  File "oneformer-hugging/train.py", line 32, in <module>
    trainer.train()
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1899, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 635, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 679, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 56, in fetch
    data = self.dataset.__getitems__(possibly_batched_index)
  File "/usr/local/lib/python3.8/dist-packages/datasets/arrow_dataset.py", line 2782, in __getitems__
    batch = self.__getitem__(keys)
  File "/usr/local/lib/python3.8/dist-packages/datasets/arrow_dataset.py", line 2778, in __getitem__
    return self._getitem(key)
  File "/usr/local/lib/python3.8/dist-packages/datasets/arrow_dataset.py", line 2762, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
  File "/usr/local/lib/python3.8/dist-packages/datasets/formatting/formatting.py", line 578, in query_table
    _check_valid_index_key(key, size)
  File "/usr/local/lib/python3.8/dist-packages/datasets/formatting/formatting.py", line 531, in _check_valid_index_key
    _check_valid_index_key(int(max(key)), size=size)
  File "/usr/local/lib/python3.8/dist-packages/datasets/formatting/formatting.py", line 521, in _check_valid_index_key
    raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
IndexError: Invalid key: 375 is out of bounds for size 0
  0%|

How to use a pretrained model?

Hi, I want to use a pretrained model for new training; how should I do it? I want to train on dataset B, and I want to use model_0159999.pth, which was produced by training on dataset A, as the pretrained model.
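A minimal sketch of the usual Detectron2 approach (paths are placeholders): point MODEL.WEIGHTS at the dataset-A checkpoint and start a fresh run, so the checkpointer loads it as initialization instead of resuming the old schedule:

# Hedged sketch; cfg is whatever config object train_net.py builds for the dataset-B run.
def configure_finetune(cfg):
    cfg.MODEL.WEIGHTS = "output_dataset_A/model_0159999.pth"   # checkpoint from the dataset-A run
    cfg.OUTPUT_DIR = "output_dataset_B"
    return cfg

# train_net.py then calls trainer.resume_or_load(resume=args.resume); launch it WITHOUT --resume
# so the checkpoint is loaded as initialization rather than continuing the dataset-A schedule.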

Can't setup the environment

I've been trying to build a docker image by following the steps from INSTALL.md, but I'm stuck on this:

# Setup MSDeformAttn
cd oneformer/modeling/pixel_decoder/ops
sh make.sh

I tried installing the CUDA toolkit globally, and I also tried without using conda at all. No luck, I keep getting all kinds of errors. Please help, I've been pulling my hair out with this all day. Here is my Dockerfile so far:

# Use the official Ubuntu 20.04 LTS image as the base image
FROM ubuntu:20.04

# Set environment variables to avoid interaction during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Update the package index and install required packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    wget \
    ca-certificates \
    bzip2 \
    build-essential \
    git

# Set the working directory
WORKDIR /opt

# Download and install Miniconda
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && chmod +x Miniconda3-latest-Linux-x86_64.sh \
    && ./Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda \
    && rm Miniconda3-latest-Linux-x86_64.sh

# Add conda to the system PATH
ENV PATH="/opt/conda/bin:${PATH}"

# Create the "oneformer" virtual environment
RUN conda create -y -n oneformer

# Activate the "oneformer" virtual environment and run any further commands within it
SHELL ["conda", "run", "-n", "oneformer", "/bin/bash", "-c"]

RUN git clone https://github.com/SHI-Labs/OneFormer.git /OneFormer
RUN cd /OneFormer
WORKDIR /OneFormer

# Install Pytorch
RUN conda install -y pytorch==1.10.1 -c pytorch
RUN conda install -y torchvision==0.11.2 -c pytorch
RUN conda install -y cudatoolkit=11.3 -c pytorch

# Install opencv (required for running the demo)
RUN pip3 install -U opencv-python

# Install detectron2
RUN python -m pip install detectron2 -f \
    https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html

# Install other dependencies
RUN pip3 install git+https://github.com/cocodataset/panopticapi.git
RUN pip3 install git+https://github.com/mcordts/cityscapesScripts.git
RUN pip3 install -r requirements.txt

# Setup wand
RUN pip3 install wandb
#ENV WANDB_API_KEY=...
#RUN wandb login

# Setup MSDeformAttn
# THIS IS WHERE IT BREAKS
# ENV CUDA_HOME=/opt/conda/envs/oneformer/lib/python3.9/site-packages/torch/cuda
# ENV FORCE_CUDA=1
RUN cd oneformer/modeling/pixel_decoder/ops && \
    sh ./make.sh

# Set the entrypoint to use the "oneformer" virtual environment by default
ENTRYPOINT ["conda", "run", "--no-capture-output", "-n", "oneformer"]

# Set the default command to run when starting the container
CMD ["/bin/bash"]

And this is the error that I'm getting:

[19/19] RUN cd oneformer/modeling/pixel_decoder/ops &&     sh ./make.sh:
#0 1.605 /opt/conda/envs/oneformer/lib/python3.9/site-packages/torch/utils/cpp_extension.py:381: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
#0 1.605   warnings.warn(msg.format('we could not find ninja.'))
#0 1.605 error: [Errno 2] No such file or directory: '/opt/conda/envs/oneformer/lib/python3.9/site-packages/torch/cuda/bin/nvcc'
#0 1.605
#0 1.605 ERROR conda.cli.main_run:execute(47): `conda run /bin/bash -c cd oneformer/modeling/pixel_decoder/ops &&     sh ./make.sh` failed. (See above for error)
#0 1.605 No CUDA runtime is found, using CUDA_HOME='/opt/conda/envs/oneformer/lib/python3.9/site-packages/torch/cuda'
#0 1.605 running build
#0 1.605 running build_py
#0 1.605 creating build
#0 1.605 creating build/lib.linux-x86_64-3.9
#0 1.605 creating build/lib.linux-x86_64-3.9/functions
#0 1.605 copying functions/__init__.py -> build/lib.linux-x86_64-3.9/functions
#0 1.605 copying functions/ms_deform_attn_func.py -> build/lib.linux-x86_64-3.9/functions
#0 1.605 creating build/lib.linux-x86_64-3.9/modules
#0 1.605 copying modules/__init__.py -> build/lib.linux-x86_64-3.9/modules
#0 1.605 copying modules/ms_deform_attn.py -> build/lib.linux-x86_64-3.9/modules
#0 1.605 running build_ext
#0 1.605
------
failed to solve: executor failed running [conda run -n oneformer /bin/bash -c cd oneformer/modeling/pixel_decoder/ops &&     sh ./make.sh]: exit code: 1

Metrics are zero during evaluation

Thanks for your excellent work. I used the pretrained weights of the Swin backbone, evaluated the model on ADE20K, and got the following results.
[03/25 23:29:06 d2.evaluation.panoptic_evaluation]: Panoptic Evaluation Results:

PQ SQ RQ #categories
All 0.000 0.000 0.000 150
Things 0.000 0.000 0.000 100
Stuff 0.000 0.000 0.000 50

[03/25 23:46:54 d2.evaluation.testing]: copypaste: Task: sem_seg
[03/25 23:46:54 d2.evaluation.testing]: copypaste: mIoU,fwIoU,mACC,pACC
[03/25 23:46:54 d2.evaluation.testing]: copypaste: 0.0029,0.0005,0.2547,0.0320
[03/25 23:46:54 d2.evaluation.testing]: copypaste: Task: panoptic_seg
[03/25 23:46:54 d2.evaluation.testing]: copypaste: PQ,SQ,RQ,PQ_th,SQ_th,RQ_th,PQ_st,SQ_st,RQ_st
[03/25 23:46:54 d2.evaluation.testing]: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
[03/25 23:46:54 d2.evaluation.testing]: copypaste: Task: bbox
[03/25 23:46:54 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/25 23:46:54 d2.evaluation.testing]: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
[03/25 23:46:54 d2.evaluation.testing]: copypaste: Task: segm
[03/25 23:46:54 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/25 23:46:54 d2.evaluation.testing]: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000

Any help would be appreciated. Thanks in advance!

Is it possible to run the code without wandb?

Hi there, thanks very much for publishing this repo, it looks very interesting.

I'm trying to follow the installation instructions, but due to the CPU architecture of the system I'm using, I don't think I'll be able to use wandb (I don't have an account with them, so I thought I'd try running it locally):

$ wandb server start
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/ppc64le) and no specific platform was requested

Is it possible to run the code without wandb?

Thanks for any help! :)
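Not an official answer, but wandb itself can be switched off through its environment variable, which may be enough if the repo only logs through the standard wandb API; a hedged sketch:

import os

# Setting this before wandb.init() turns every wandb call into a no-op (no account or server needed).
os.environ["WANDB_MODE"] = "disabled"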

Poor performance for instance task

Thanks for sharing this work. I tested the model and get similar results for the panoptic and semantic tasks, but poor performance on the instance task.

I tried to test the model with the Swin backbone, but I get this error:

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

I would like to train the model on custom datasets for instance segmentation; can you please provide a demo for training on custom datasets?

Thanks in advance

error in ms_deformable_im2col_cuda: too many resources requested for launch

Hello!
I am trying to run the demo for a single image. I use "oneformer_dinat_large_IN21k_384_bs16_160k.yaml" as the config file and "250_16_dinat_l_oneformer_ade20k_160k.pth" as the model weights. When I run demo.py, I see the following line: "error in ms_deformable_im2col_cuda: too many resources requested for launch", and the code ends up saving {task}.jpg without any significant information.
The error string occurs when executing line 81 in demo/defaults.py: predictions = self.model([inputs])[0]
