
Comments (3)

SouLeo commented on June 12, 2024

I re-ran the original OneFormer model on the ADE20K dataset with the Swin backbone, using the config file oneformer_swin_large_bs16_160k.yaml, and I see similar inconsistencies:

[screenshot: printed shapes of images[0], mask_cls_results, and mask_pred_results]

I printed the image shape using print(images[0].shape) after this line:

images = ImageList.from_tensors(images, self.size_divisibility)

and the mask_cls and mask_pred shapes using print(mask_cls_results.shape) and print(mask_pred_results.shape) after this line:

mask_pred_results = F.interpolate(


My point here is that the pred shapes are not guaranteed to match my input image resolution. So at what point in the model do these shapes match? Where should I begin evaluation?
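To make the shape question concrete, here is a minimal sketch of the usual post-processing step in Mask2Former-style semantic inference (which OneFormer's demo code follows, to my understanding). All shapes below are made up for illustration; the point is that the decoder's low-resolution mask logits only match the input resolution after an explicit F.interpolate in post-processing:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes illustrating the mismatch: the decoder predicts masks at
# a fixed low resolution (here 64x64), regardless of the 256x256 input image.
num_queries, num_classes = 150, 150
mask_cls = torch.randn(num_queries, num_classes + 1)  # per-query class logits (+1 "no object")
mask_pred = torch.randn(num_queries, 64, 64)          # low-resolution mask logits

# Post-processing upsamples the mask logits to the (padded) input resolution...
input_h, input_w = 256, 256
mask_pred = F.interpolate(
    mask_pred[None], size=(input_h, input_w),
    mode="bilinear", align_corners=False,
)[0]

# ...then combines class scores and mask probabilities into a per-pixel
# semantic score map of shape [num_classes, H, W].
cls_scores = mask_cls.softmax(-1)[..., :-1]  # drop the "no object" class
sem_seg = torch.einsum("qc,qhw->chw", cls_scores, mask_pred.sigmoid())
print(tuple(sem_seg.shape))            # (150, 256, 256)
print(tuple(sem_seg.argmax(0).shape))  # (256, 256)
```

So the shapes only match the input image after this post-processing step, not inside the forward pass where the earlier prints were added.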

Also, despite mask_pred_results having different heights and widths than the original input images, the model does seem to produce outputs in the ballpark of the original image resolution (when run with the ADE20K dataset and config). Meanwhile, with my custom dataset, the outputs are 64 x 64 when I need them to be 256 x 256. Would you expect the model to behave this way, or is it more likely that my configuration and custom dataset are wrong?

from oneformer.

SouLeo commented on June 12, 2024

I'm still scratching my head trying to figure this out. In a previous issue, you suggested extracting the correct masks as follows: #82 (comment)

specifically with specific_category_mask = (sem_seg == category_id).float()

but using the ADE20K config file, I noticed the output of sem_seg is:
[screenshot: sem_seg tensor values]

Looking at the unique values in this matrix, how could (sem_seg == category_id) possibly be valid? Not only because category_id values are ints while the values in sem_seg are floats, but also because the range of category IDs for ADE20K is far greater than 3.
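One plausible reading (an assumption on my part, not confirmed by the thread): the sem_seg being inspected is the [num_classes, H, W] float score map produced by semantic inference, not an integer label map, in which case the comparison only makes sense after a per-pixel argmax over the class dimension. A minimal sketch with made-up shapes:

```python
import torch

# Assumption: sem_seg is the [num_classes, H, W] float score map returned by
# the model's semantic inference step, not an integer [H, W] label map.
num_classes, H, W = 150, 8, 8
sem_seg = torch.rand(num_classes, H, W)

category_id = 3
# Comparing raw float scores to an int id is meaningless. Take the per-pixel
# argmax over the class dimension first, yielding integer class labels...
labels = sem_seg.argmax(dim=0)  # [H, W], dtype int64
# ...and only then build the binary mask for one category.
specific_category_mask = (labels == category_id).float()
print(tuple(specific_category_mask.shape))  # (8, 8)
```

If sem_seg were instead already an [H, W] integer label map, the original comparison would work directly, which may explain the conflicting advice.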

To reiterate, I am genuinely confused about this model, its outputs, and what would be its expected behaviors.


SouLeo commented on June 12, 2024

After comparing detectron2's model input format documentation with the provided example of a custom dataset mapper for semantic segmentation, I noticed a mismatch.

In detectron2, the values of sem_seg are expected to be class labels at the ground-truth resolution, with shape [H, W]. However, following your custom mapper class, you use instance-like labels, such that the gt_masks are [N, H, W].

I have not tested this, but if this is the cause of the issue, I would highly recommend adding documentation for this mismatch.
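For clarity, here is a small sketch of the conversion between the two formats described above. The shapes and values are invented for illustration; the expansion from a label map into per-class binary masks is what such mappers typically do:

```python
import torch

# detectron2's standard semantic format: a single [H, W] integer label map.
sem_seg_gt = torch.tensor([[0, 0, 2],
                           [1, 1, 2],
                           [1, 2, 2]])

# The mapper-style format expands this into one binary mask per class that is
# present in the image, giving gt_masks of shape [N, H, W].
classes = torch.unique(sem_seg_gt)  # classes present: 0, 1, 2
gt_masks = torch.stack([(sem_seg_gt == c) for c in classes]).float()
print(tuple(gt_masks.shape))  # (3, 3, 3), i.e. [N, H, W]
```

Both representations carry the same information, but code written for one will silently misinterpret the other, which would be consistent with the mismatch described above.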


