aim-uofa / matcher

[ICLR'24] Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

Home Page: https://arxiv.org/abs/2305.13310

License: MIT License

Python 94.01% Shell 0.42% C++ 0.55% Cuda 5.02%
dinov2 generalist-model matcher sam in-context-segmentation

matcher's People

Contributors

yangliu96

matcher's Issues

problem about dataset

Hello, could you please tell me where the splits folder is? Should I download it, or will the code create the folder when I run it?

Question about forward matching.

Congratulations on your great work!
I have several questions about bidirectional matching:
1. How do you perform bipartite matching between the points on the reference mask Pr and the patch-level features of the target image Zt? The reference mask has dimensions 3×256×192, while the patch features have dimensions 768×16×12.
2. How do you handle having more points than patch features (192 patches) when using bipartite matching in forward matching? The number of points in Pr is much greater than the number of patch features, so how is the similarity between Pr and Zt evaluated?
3. Could you elaborate on the implementation details of forward matching and bidirectional matching?
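
One possible way to reconcile the two resolutions raised above, offered as a sketch and not as the authors' method: pool the pixel-level reference mask down to the patch grid, so that a patch counts as a reference point when enough of its pixels are foreground. The example assumes a 256×192 mask and 16×16 patches, which gives the 16 × 12 = 192 patches mentioned in question 2. The function name `mask_to_patch_grid` and the 0.5 threshold are hypothetical.

```python
def mask_to_patch_grid(mask, patch_size=16, threshold=0.5):
    """Pool a binary pixel mask (list of rows) to a boolean patch grid.

    Assumption: a patch is foreground when at least `threshold` of its
    pixels are foreground. This is an illustrative choice, not Matcher's.
    """
    height, width = len(mask), len(mask[0])
    rows, cols = height // patch_size, width // patch_size
    grid = []
    for r in range(rows):
        grid_row = []
        for c in range(cols):
            # Count foreground pixels inside patch (r, c).
            total = sum(
                mask[y][x]
                for y in range(r * patch_size, (r + 1) * patch_size)
                for x in range(c * patch_size, (c + 1) * patch_size)
            )
            grid_row.append(total / (patch_size * patch_size) >= threshold)
        grid.append(grid_row)
    return grid


# A 256x192 mask with a 64x64 foreground square.
mask = [[0] * 192 for _ in range(256)]
for y in range(32, 96):
    for x in range(48, 112):
        mask[y][x] = 1

grid = mask_to_patch_grid(mask)
# Patch coordinates that would act as reference points at patch level.
reference_patches = [
    (r, c) for r in range(len(grid)) for c in range(len(grid[0])) if grid[r][c]
]
```

Under this assumption the number of reference points can never exceed the number of patches, because both live on the same 16×12 grid.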

Bidirectional Matching

In forward matching, what is the dimension of the points on the reference mask Pr? And what does L mean?

Possible release date

Hi, thanks for the great article.
May I ask when you plan to release the source code? Will you release the pretrained models too?

Questions about the model task: object non-presence, multi-categories and few-shot

Hello,
Congratulations on your great and interesting work!
I have several questions to see whether this model matches my use case:

1- If I run the model on a target image that does not contain the reference object, will it still predict something, or can it say (with a given confidence) that the image does not contain the queried object?
2- I am interested in running this model with several categories as inputs. Is there a mechanism to run inference on several categories at the same time, or will I have to run a separate prediction for each category?
3- Can the model be extended to few-shot use, with several reference masks for the same object?

Thank you in advance!

detail about sample point prompt

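Hello, I would like to ask a few questions.
1. The paper seems to mention only that different sampling strategies are used for the three levels, not how the sampling is actually done. Could you elaborate?
2. The number of k-means++ cluster centers also does not seem to be specified. Could you describe how it is chosen?
3. In Figure 1, after the point + box prompt is given, SAM appears to produce multiple mask proposals. The default number for mult_mask_outputs in SAM is 3; what value did you use?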

Lvis 92-i

Thank you for the very inspiring work! Our research group especially thinks that creating a new, demanding few-shot dataset is a great idea. As we would like to use this dataset for evaluation, we have a few questions about the LVIS 92-i dataset:

  1. How exactly are the folds chosen, i.e. which classes belong to which fold? Usually a table with all class names is supplied. The methods differ: for Pascal-5i, contiguous segments are taken (classes 1, 2, 3, 4, 5 for fold 0, then 6 ... 10 for fold 1, etc.), while for COCO-20i interleaved numbers are taken (classes 1, 5, 9, ... for fold 0, then 2, 6, 10, ... for fold 1, etc.).
  2. You eliminated classes with fewer than 2 images for the 1-shot task. By the same logic, the 5-shot task would require eliminating classes with fewer than 6 images, which would most probably change the dataset. Since you mentioned in the other issue that you ran evaluations for the 5-shot scenario, what are the exact classes and folds in that scenario?
  3. Would it be possible to release the data loading and evaluation code? Our group has noticed a mistake in how evaluations are done for COCO-20i, and we think it ought to be avoided for a new dataset. Namely, in some prominent implementations the classes are encoded as .png images in which each class mask is stored with its class number as the pixel value (i.e. pixels of class 1 have value 1 in the image). The problem is that if the masks of two classes overlap, one of the masks is cut away.
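
The two fold conventions in point 1 and the overlap problem in point 3 can be sketched as follows. This is an illustration of the issue as described, not code from any of the implementations mentioned. All function names are hypothetical, and the defaults of 4 folds and 80 classes for COCO-20i are assumptions.

```python
def contiguous_fold(fold, classes_per_fold=5):
    """Pascal-5i style: fold 0 -> 1..5, fold 1 -> 6..10, and so on."""
    start = fold * classes_per_fold + 1
    return list(range(start, start + classes_per_fold))


def interleaved_fold(fold, num_folds=4, num_classes=80):
    """COCO-20i style: fold 0 -> 1, 5, 9, ...; fold 1 -> 2, 6, 10, ..."""
    return list(range(fold + 1, num_classes + 1, num_folds))


def encode_index_map(masks_by_class, height, width):
    """Paint binary masks into one class-index image, the way a
    single-channel PNG stores them. Later classes overwrite earlier ones."""
    index_map = [[0] * width for _ in range(height)]
    for class_id, mask in masks_by_class:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    index_map[y][x] = class_id
    return index_map


def decode_class(index_map, class_id):
    """Recover the binary mask of one class from the index image."""
    return [[1 if value == class_id else 0 for value in row] for row in index_map]


# Two 1x4 masks that overlap in the middle two pixels.
mask_a = [[1, 1, 1, 0]]
mask_b = [[0, 1, 1, 1]]
index_map = encode_index_map([(1, mask_a), (2, mask_b)], height=1, width=4)
recovered_a = decode_class(index_map, 1)  # class 1 has lost the overlap
```

Here `recovered_a` no longer equals `mask_a`, because the overlapping pixels were overwritten by class 2. Storing one binary mask per class (or per instance) avoids the loss.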

Thank you very much for your response in advance!

Patch Level

Thank you for your outstanding work!

Can you please describe how patch-level features are generated and how they are sized?
Also, I'd like to ask what the center prompt means and how the model generates it.

Your excellent work will be a great help to my research!

VOS Code Release

Hi, thanks for the great work and code!

I want to try it on VOS tasks, but that part of the code has not been released yet.

Could we expect the release of VOS code in the near future, or some estimated date? Thank you!

Semantic segment

Thank you for your outstanding work!

The paper mentions that you use SAM as the class-agnostic segmentation model. Does this mean that Matcher cannot recognize semantic information while segmenting?

In the meantime, I'm curious as to when the source code will be released.

Your excellent work will be a great help to my research!

COCO-20 Dataset Mask Annotation

Hi, thanks for this great work!

I tried to test Matcher on COCO-20i by following the command here, but I got the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/COCO2014/annotations/val2014/COCO_val2014_000000507081.png'

I believe this is because the code requires the COCO-20i mask annotations to be in image format, but the official annotations are in JSON format.

Could you please check this and also provide the mask annotations converted to image format? Thank you so much!

Question about k-means++

Congratulations on your great work!
About k-means++: how do you set the value of k? Is it a learnable parameter?
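
For context, in the standard k-means++ algorithm k is an input chosen by the caller, not a learned parameter. The seeding step below is a minimal sketch of that standard procedure on 2D points. It says nothing about how Matcher picks k, and the function name `kmeans_pp_seeds` is hypothetical.

```python
import random


def kmeans_pp_seeds(points, k, seed=0):
    """Standard k-means++ seeding: pick the first center uniformly at
    random, then pick each further center with probability proportional
    to its squared distance from the nearest center chosen so far."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from every point to its nearest current center.
        d2 = [
            min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
            for px, py in points
        ]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# k is passed in explicitly; nothing in the procedure updates it.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers = kmeans_pp_seeds(points, k=3)
```

Already-chosen centers have zero weight, so with distinct input points the seeds are distinct as well.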

No output images in Gradio demo

Hi, thanks for sharing this wonderful work.

I followed the instructions to install Matcher.
I launched Gradio with 'python app.py'.
It shows that the model ran successfully but doesn't display the output images, as in the screenshot.
[screenshot]

There is no error output in the terminal.

Windows

Hello! Can Matcher be deployed on Windows with Anaconda? The installation instructions only mention Linux and Mac. Thank you!
