
Comments (3)

SouLeo commented on June 12, 2024

I re-ran the original OneFormer model on the ADE20K dataset with the Swin backbone, using the config file oneformer_swin_large_bs16_160k.yaml, and I see similar inconsistencies:

[screenshot: printed shapes of images[0], mask_cls_results, and mask_pred_results]

I printed the image shape using print(images[0].shape) after this line:

images = ImageList.from_tensors(images, self.size_divisibility)

and the mask_cls and mask_pred shapes using print(mask_cls_results.shape) and print(mask_pred_results.shape) after this line:

mask_pred_results = F.interpolate(


My point here is that the pred shapes are not guaranteed to match my input image resolution. So at what point in the model do these shapes match? Where should I begin evaluation?
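To make the shape question concrete, here is a minimal sketch of the usual post-processing step in Mask2Former-style semantic inference (which OneFormer's demo code follows, to my understanding). All shapes below are made up for illustration; the point is that the decoder's low-resolution mask logits only match the input resolution after an explicit F.interpolate in post-processing:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes illustrating the mismatch: the decoder predicts masks at
# a fixed low resolution (here 64x64), regardless of the 256x256 input image.
num_queries, num_classes = 150, 150
mask_cls = torch.randn(num_queries, num_classes + 1)  # per-query class logits (+1 "no object")
mask_pred = torch.randn(num_queries, 64, 64)          # low-resolution mask logits

# Post-processing upsamples the mask logits to the (padded) input resolution...
input_h, input_w = 256, 256
mask_pred = F.interpolate(
    mask_pred[None], size=(input_h, input_w),
    mode="bilinear", align_corners=False,
)[0]

# ...then combines class scores and mask probabilities into a per-pixel
# semantic score map of shape [num_classes, H, W].
cls_scores = mask_cls.softmax(-1)[..., :-1]  # drop the "no object" class
sem_seg = torch.einsum("qc,qhw->chw", cls_scores, mask_pred.sigmoid())
print(tuple(sem_seg.shape))            # (150, 256, 256)
print(tuple(sem_seg.argmax(0).shape))  # (256, 256)
```

So the shapes only match the input image after this post-processing step, not inside the forward pass where the earlier prints were added.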

Also, despite mask_pred_results having different heights and widths than the original input images, the model does seem to produce outputs in the ballpark of the original image resolution (when run with the ADE20K dataset and config). Meanwhile, with my custom dataset, the outputs are 64 x 64 when I need them to be 256 x 256. Would you expect the model to behave this way, or is it more likely that my configuration and custom dataset are wrong?

from oneformer.

SouLeo commented on June 12, 2024

I'm still scratching my head trying to figure this out. In a previous issue, you suggested extracting the correct masks as follows: #82 (comment)

specifically with specific_category_mask = (sem_seg == category_id).float()

but using the ADE20K config file, I noticed the output of sem_seg is:
[screenshot: sem_seg tensor values]

Looking at the unique values in this matrix, how could (sem_seg == category_id) possibly be valid? Not only because category_id values are ints while the values in sem_seg are floats, but also because the range of category IDs for ADE20K is far greater than 3.
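One plausible reading (an assumption on my part, not confirmed by the thread): the sem_seg being inspected is the [num_classes, H, W] float score map produced by semantic inference, not an integer label map, in which case the comparison only makes sense after a per-pixel argmax over the class dimension. A minimal sketch with made-up shapes:

```python
import torch

# Assumption: sem_seg is the [num_classes, H, W] float score map returned by
# the model's semantic inference step, not an integer [H, W] label map.
num_classes, H, W = 150, 8, 8
sem_seg = torch.rand(num_classes, H, W)

category_id = 3
# Comparing raw float scores to an int id is meaningless. Take the per-pixel
# argmax over the class dimension first, yielding integer class labels...
labels = sem_seg.argmax(dim=0)  # [H, W], dtype int64
# ...and only then build the binary mask for one category.
specific_category_mask = (labels == category_id).float()
print(tuple(specific_category_mask.shape))  # (8, 8)
```

If sem_seg were instead already an [H, W] integer label map, the original comparison would work directly, which may explain the conflicting advice.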

To reiterate, I am genuinely confused about this model, its outputs, and what would be its expected behaviors.


SouLeo commented on June 12, 2024

After comparing detectron2's model input format documentation with the provided example of a custom dataset mapper for semantic segmentation, I noticed a mismatch.

In detectron2, the values of sem_seg are expected to be class labels at the ground-truth resolution, with shape [H, W]. However, following your custom mapper class, you use instance-like labels, such that the gt_masks are [N, H, W].

I have not tested this, but if this is the cause of the issue, I would highly recommend adding documentation for this mismatch.
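For clarity, here is a small sketch of the conversion between the two formats described above. The shapes and values are invented for illustration; the expansion from a label map into per-class binary masks is what such mappers typically do:

```python
import torch

# detectron2's standard semantic format: a single [H, W] integer label map.
sem_seg_gt = torch.tensor([[0, 0, 2],
                           [1, 1, 2],
                           [1, 2, 2]])

# The mapper-style format expands this into one binary mask per class that is
# present in the image, giving gt_masks of shape [N, H, W].
classes = torch.unique(sem_seg_gt)  # classes present: 0, 1, 2
gt_masks = torch.stack([(sem_seg_gt == c) for c in classes]).float()
print(tuple(gt_masks.shape))  # (3, 3, 3), i.e. [N, H, W]
```

Both representations carry the same information, but code written for one will silently misinterpret the other, which would be consistent with the mismatch described above.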


