aim-uofa / matcher

359.0 30.0 16.0 26.5 MB

[ICLR'24] Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

Home Page: https://arxiv.org/abs/2305.13310

License: MIT License

Python 94.01% Shell 0.42% C++ 0.55% Cuda 5.02%

dinov2 generalist-model matcher sam in-context-segmentation

matcher's Issues

Problem about dataset of Pascal-Part

I cannot find the file 'all_obj_part_to_image.json'. Could you tell me how to obtain it?

problem about dataset

Hello, could you please tell me where the splits folder is? Should I download it, or will the code create the folder when I run it?

Question about forward matching.

Congratulations on your great work!
I have several questions about bidirectional matching:
1. How do you perform bipartite matching between the points on the reference mask Pr and the patch-level features of the target image Zt, given that the reference mask has dimension 3×256×192 and the patch features have dimension 768×16×12?
2. How does bipartite matching in forward matching handle having more points than patch features (192 patches)? The number of points in Pr is much greater than the number of patch features, so how is the similarity between Pr and Zt evaluated?
3. Could you elaborate on the implementation details of forward matching and of bidirectional matching as a whole?
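For readers following this question, here is a minimal sketch of one way forward and reverse bipartite matching can be set up. It is not the authors' implementation. It assumes the reference mask has already been downsampled to the patch grid, so Pr is a set of at most L patch indices rather than pixel coordinates, which removes the "more points than patches" mismatch. The function names, the cosine similarity, and the use of SciPy's `linear_sum_assignment` are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a (N, C) and b (M, C)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def bidirectional_match(ref_feats, tgt_feats, ref_mask):
    """Hypothetical helper: ref_feats and tgt_feats are (L, C) patch features,
    ref_mask is an (L,) boolean patch-level mask of the reference object.

    Returns the target patch indices that survive the reverse check.
    """
    sim = cosine_similarity(ref_feats, tgt_feats)  # (L, L)
    ref_idx = np.flatnonzero(ref_mask)             # patches inside the mask

    # Forward: one-to-one assignment from masked reference patches to target patches.
    _, fwd_tgt = linear_sum_assignment(sim[ref_idx], maximize=True)

    # Reverse: assign the matched target patches back to reference patches.
    _, rev_ref = linear_sum_assignment(sim.T[fwd_tgt], maximize=True)

    # Keep a target patch only if its reverse match lands inside the reference mask.
    keep = ref_mask[rev_ref]
    return fwd_tgt[keep]
```

Because the similarity matrix is rectangular (K masked patches by L target patches, with K no larger than L), the assignment simply leaves the unmatched target patches out.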

The code for training.

When will you release the code for training?

problem about the dimension of the query_mask

Hello, what is the dimension of the query_mask here: torch.Size([3, 518, 518]) or torch.Size([518, 518])?

Could a similar idea be applied to the object detection task?

I would like to know. Thanks a lot!

Bidirectional Matching

In forward matching, what is the dimension of the points on the reference mask Pr? And what does L mean?

Possible release date

Hi, thanks for the great article.
May I ask when you plan to release the source code? Will you release the pretrained models too?

Questions about the model task: object non-presence, multi-categories and few-shot

Hello,
Congratulations on your great and interesting work!
I have several questions to see whether this model matches my use case:

1- If I run the model on a target image that does not contain the reference object, will it still predict something, or can it report (with a given confidence) that the image does not contain the queried object?
2- I am interested in running this model with several categories as input. Is there a mechanism to run inference on several categories at the same time, or will I have to run a separate prediction for each category?
3- Can the model be extended to few-shot use, with several reference masks for the same object?

Thank you in advance!

detail about sample point prompt

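Hello, I would like to ask a few questions.
1. The paper seems to mention only that different sampling strategies are used for the three levels, but it does not seem to describe how the sampling is actually done. Could you elaborate?
2. The number of cluster centers for k-means++ does not seem to be specified either. Could you describe how it is chosen?
3. In Figure 1, after the point + box prompt is given, SAM appears to produce multiple mask proposals. The default number for mult_mask_outputs in SAM is 3; what value did you set?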

Lvis 92-i

Thank you for this very inspiring work! Our research group especially thinks that creating a new, demanding few-shot dataset is a great idea. As we would like to use this dataset for evaluation, we have a few questions about the LVIS-92i dataset:

1. How exactly are the folds chosen, i.e. which classes belong to which fold? Usually a table with all class names is supplied. The methods differ between Pascal-5i, where contiguous segments are taken (classes 1, 2, 3, 4, 5 for fold 0, then 6 .. 10 for fold 1, etc.), and COCO-20i, where interleaved numbers are taken (classes 1, 5, 9, ... for fold 0, then 2, 6, 10, ... for fold 1, etc.).
2. You eliminated classes with fewer than 2 images for the 1-shot task. By the same logic, the 5-shot task would require eliminating classes with fewer than 6 images, which would most probably change the dataset. Since you mentioned in the other issue that you ran evaluations for the 5-shot scenario, what are the exact classes and folds in that scenario?
3. Would it be possible to release the data loading and evaluation code? Our group has noticed a mistake in how evaluations are done for COCO-20i, and we think it ought to be avoided for a new dataset. Namely, some prominent implementations encode classes as .png images in which each class mask carries its class number (i.e. pixels of class number 1 have value 1 in the image). The problem is that if the masks of two classes overlap, one mask is cut away.
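The overlap problem described in the last point can be reproduced with a small toy example. This is an illustrative sketch only; `encode_as_index_map` is a hypothetical helper and is not taken from any of the implementations mentioned.

```python
import numpy as np


def encode_as_index_map(masks):
    """Paint binary masks {class_id: (H, W) bool} into one (H, W) uint8 map.

    Later classes overwrite earlier ones wherever the masks overlap.
    """
    h, w = next(iter(masks.values())).shape
    index_map = np.zeros((h, w), dtype=np.uint8)
    for class_id, mask in masks.items():
        index_map[mask] = class_id
    return index_map


masks = {
    1: np.zeros((4, 4), dtype=bool),
    2: np.zeros((4, 4), dtype=bool),
}
masks[1][:, :3] = True  # class 1 covers columns 0-2 (12 pixels)
masks[2][:, 2:] = True  # class 2 covers columns 2-3 (8 pixels)

index_map = encode_as_index_map(masks)
recovered_class_1 = index_map == 1

# Pixels of class 1 that were overwritten by class 2 in the shared column.
lost = int(masks[1].sum() - recovered_class_1.sum())
print(lost)  # 4
```

Storing one binary mask per class (for example as a stacked boolean array) avoids the loss, at the cost of more storage.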

Thank you very much for your response in advance!

Patch Level

Thank you for your outstanding work!

Can you please describe how patch-level features are generated and how they are sized?
Also, I'd like to ask what the center prompt means and how the model generates it.
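As a rough guide to the sizing part of this question: a ViT-style encoder splits the image into non-overlapping patches and produces one feature vector per patch. The sketch below assumes a patch size of 14 (as used by DINOv2) and a 518×518 input, which is the resolution mentioned in another issue here. Both helper functions are hypothetical, and the average-pooling rule for bringing a mask to patch level is one possible choice, not necessarily what the repository does.

```python
import numpy as np


def patch_grid(height, width, patch_size=14):
    """Number of patches along each axis for a non-overlapping patch split."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    return height // patch_size, width // patch_size


def mask_to_patches(mask, patch_size=14, threshold=0.5):
    """Average-pool a (H, W) binary mask to a (H/p, W/p) boolean patch mask."""
    gh, gw = patch_grid(*mask.shape, patch_size)
    blocks = mask.astype(float).reshape(gh, patch_size, gw, patch_size)
    # A patch counts as foreground if more than `threshold` of its pixels are set.
    return blocks.mean(axis=(1, 3)) > threshold


grid = patch_grid(518, 518)  # (37, 37), i.e. 1369 patch features per image
```

Each of the 37×37 positions then carries a feature vector whose length depends on the encoder variant.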

Your excellent work will be a great help to my research!

VOS Code Release

Hi, thanks for the great work and code!

I want to try it on VOS tasks, but that part of code is not released for now.

Could we expect the release of VOS code in the near future, or some estimated date? Thank you!

problem of how to use the Gradio Demo

Hello, I followed the instructions and ran app.py, but it failed when I opened the browser and used the demo.

Semantic segment

Thank you for your outstanding work!

The paper mentions that you use SAM as the class-agnostic segmentation model, does this mean that Matcher does not have the ability to recognize semantic information while segmenting?

In the meantime, I'm curious as to when the source code will be released.

Your excellent work will be a great help to my research!

COCO-20 Dataset Mask Annotation

Hi, thanks for this great work!

I tried to test Matcher on COCO-20ⁱ by following the command here, but I got the following error.

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/COCO2014/annotations/val2014/COCO_val2014_000000507081.png'

I believe this is because the code requires the COCO-20ⁱ mask annotation to be in image format, but the official annotation is in json format.

Could you please check it and provide the converted image format of the mask annotation as well, thank you so much!

Question about k-means++

Congratulations on your great work!

Regarding k-means++, how do you set the value of k? Is it a learnable parameter?
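For context, in standard k-means++ the value of k is a hyperparameter chosen before clustering; it is not learned by gradient descent. The value Matcher uses is not stated in this thread. Below is a minimal NumPy sketch of k-means++ seeding applied to 2D point coordinates, written only for illustration: the function name, the seed argument, and the idea of feeding it matched point coordinates are assumptions.

```python
import numpy as np


def kmeans_pp_centers(points, k, seed=0):
    """Pick k initial centers from points (N, D) with k-means++ seeding."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)

    # First center: chosen uniformly at random.
    centers = [points[rng.integers(len(points))]]

    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        diffs = points[:, None, :] - np.asarray(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=-1).min(axis=1)
        total = d2.sum()
        if total == 0:
            raise ValueError("k exceeds the number of distinct points")
        # Sample the next center with probability proportional to d2,
        # so points far from existing centers are preferred.
        centers.append(points[rng.choice(len(points), p=d2 / total)])

    return np.stack(centers)
```

A point that is already a center has zero distance and therefore zero probability of being picked again, so the k centers are always distinct points.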

No output images in Gradio demo

Hi, thanks for sharing this wonderful work.

I followed the instructions to install Matcher.
I launched Gradio with 'python app.py'.
The model appears to run successfully, but the output images are not displayed, as in the screenshot.

No error output in the terminal