casia-iva-lab / fastsam
Fast Segment Anything
License: GNU Affero General Public License v3.0
Hi author,
Will you release the code so we can have a try?
Hi
Thanks for your work. Could you please provide details of how the 2% of the training data was selected? Did you use a specific strategy, or was it random selection?
Thanks
Hello,
I am running the cat example given in the image folder. On a Mac M1, I get the following error:
img_array = np.frombuffer(buf, dtype=np.uint8).reshape(rows, cols, 3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 24883200 into shape (1920,1080,3)
can someone help?
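For what it's worth, 24883200 = (2 × 1920) × (2 × 1080) × 3, which suggests the canvas rendered at 2× on a Retina display while the code reshapes with the logical figure size. A minimal, FastSAM-independent sketch that avoids hard-coding the shape by using `buffer_rgba()`, which carries its own dimensions (the helper name is illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; sidesteps GUI-canvas quirks
import matplotlib.pyplot as plt

def canvas_to_rgb(fig):
    """Convert a drawn figure to an (H, W, 3) uint8 array.

    buffer_rgba() returns a buffer that already knows its shape, so this
    stays correct on HiDPI/Retina displays, where the canvas renders at
    twice the logical figure size.
    """
    fig.canvas.draw()
    rgba = np.asarray(fig.canvas.buffer_rgba())  # (H, W, 4)
    return rgba[..., :3].copy()                  # drop the alpha channel

fig = plt.figure(figsize=(4, 3), dpi=100)
img = canvas_to_rgb(fig)
print(img.shape)  # (300, 400, 3) with the non-HiDPI Agg backend
```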
I want to use this project for deployment.
Thanks for your interesting work.
I have a license question: how can the FastSAM model be released under the Apache license when the SA-1B training dataset is under a research-only license and the Ultralytics YOLO codebase is under AGPL?
Hi! Thanks for the great repo! I really like it.
Could you please provide more details about training costs, such as GPUs and training time? Also, why are only 2% of the images from SA-1B used for training? Is there a reason for this setting?
Thanks for your work. Does the training code mean that I can use my labeled segmentation data containing one class to train a model that segments only that one class?
Hello, when exporting to TensorRT (engine), an error is reported: export running on CPU but must be on GPU. How can I fix it?
Hi
Can you share the code for converting the model to TensorRT?
Can you provide a Baidu Cloud download link for the model?
Are you also planning to combine this model with Grounding-DINO from IDEA?
Hi there,
I passed a 1920×1080 image into the model for segmentation, but the output image doesn't keep the same resolution. The image with masks is scaled down to around 1328×741 (a rough number) in equal proportions.
Is there any way to keep the output image the same size as the original?
Thanks
Since FastSAM uses a YOLO detector, is it possible to get the mask labels?
I tried other text prompts such as "black eyes", "wood", and "sands" with the sample picture on Hugging Face, but wrong results were given. Could it be because sample prompts such as "yellow dog" and "black dog" appeared in the prompts of the training dataset?
I'm getting this error when I'm trying to run pip.
Installing collected packages: seaborn, ultralytics
ERROR: Could not install packages due to an OSError: [WinError 2] The system cannot find the file specified: 'C:\Python311\Scripts\ultralytics.exe' -> 'C:\Python311\Scripts\ultralytics.exe.deleteme'
Hi, thank you very much for your research.
Does box_prompt currently support only one box?
Thanks for the great engineering application research!
I see that the YOLO-seg series models borrow the idea of mask coefficients from YOLACT.
Section 3.2 of your paper says of YOLOv8: "The updated Head module embraces a decoupled structure, separating classification and detection heads, and shifts from Anchor-Based to Anchor-Free."
I'm wondering: does the threshold-crop step from YOLACT make the YOLO-seg architecture anchor-based again?
Hello! I really like this project.
Do you plan to support splitting this model into Encoder and Decoder like the original SAM?
In that way, the Decoder part can be run very fast, and we can apply it to some applications like AnyLabeling.
I'd love to help integrate into AnyLabeling if we can find a way to split the model.
Thank you very much!
Thank you for your efforts! I would like to ask whether segment-everything mode can return the label of each segment, or whether there is a good way to obtain labels.
Reference: https://github.com/ChaoningZhang/MobileSAM
Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.
MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:
Best Wishes,
Qiao
We have just released the MobileSAM project: https://github.com/ChaoningZhang/MobileSAM. We found that FastSAM seems to perform much worse than MobileSAM with points as the prompt, especially when the foreground and background points are set close together. Can you share your thoughts on what might be the reason? Thank you in advance for your help.
Hello,
I am trying the model for the first time. I get the following message on an Apple M1. Can someone help?
File "/Users/Projects/FastSAM/gitsrc/utils/tools.py", line 179, in fast_process
img_array = np.fromstring(buf, dtype=np.uint8).reshape(rows, cols, 3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 7756992 into shape (603,1072,3)
Hi, the output from the ONNX model has shapes [(1, 37, 21504), (1, 32, 256, 256)]. If I post-process them using the method below, with conf = 0.4, iou = 0.9, and agnostic_nms = False, as with the FastSAM .pt model, it doesn't return masks of the same length.
Can someone explain the outputs of the ONNX-format FastSAM model and how to post-process them?
import torch
from ultralytics.yolo.utils import ops  # import path may differ across ultralytics versions

def postprocess(preds, conf, iou, agnostic_nms=False):
    """Post-process raw ONNX outputs: NMS on the detection head, then masks.

    TODO: filter by classes.
    """
    p = ops.non_max_suppression(preds[0],
                                conf,
                                iou,
                                agnostic=agnostic_nms,  # pass as keyword; positionally this lands in `classes`
                                max_det=100,
                                nc=1)
    results = []
    proto = preds[1]  # second output is len 3 if pt, but only 1 if exported
    for i, pred in enumerate(p):
        pred[:, :4] = ops.scale_boxes(torch.Size([1024, 1024]), pred[:, :4], (1024, 1024))
        masks = ops.process_mask_native(proto[i], pred[:, 6:], pred[:, :4], (1024, 1024))  # HWC
        results.append(masks)
    return results  # masks for every image, not just the last one
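For reference, assuming the standard YOLOv8-seg head layout, the 37 channels of the first output are 4 box coordinates + 1 class score (nc=1) + 32 mask coefficients, and 21504 = 128² + 64² + 32² anchor points for a 1024×1024 input; the second output holds 32 mask prototypes at 256×256. A sketch of the decomposition on dummy tensors:

```python
import numpy as np

# Dummy tensors with the shapes reported for the exported ONNX model.
preds = np.zeros((1, 37, 21504), dtype=np.float32)
protos = np.zeros((1, 32, 256, 256), dtype=np.float32)

# 37 channels = 4 box coords + 1 score (nc=1) + 32 mask coefficients.
boxes, scores, coeffs = np.split(preds, [4, 5], axis=1)
print(boxes.shape, scores.shape, coeffs.shape)
# (1, 4, 21504) (1, 1, 21504) (1, 32, 21504)

# 21504 anchor points = 128*128 + 64*64 + 32*32 (strides 8/16/32 at 1024 input).
assert 128**2 + 64**2 + 32**2 == preds.shape[2]

# After NMS keeps N detections, each final mask is a linear combination of the
# prototypes: sigmoid(kept_coeffs @ protos.reshape(32, -1)), reshaped to 256x256.
```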
Hey,
Huge thanks for such a great repo that you have made, just wanted to know if it can be fine tuned for specific purposes, it would be great if you could provide code to fine tune the model on COCO dataset.
Why does the example script used in the Colab notebook save the final image in a downsized version? When I use my own custom photo, the resulting image dimensions are considerably smaller.
As you know, CVAT integrated SAM via serverless mode. @gaoxinge @zxDeepDiver
I have tried to use a trained YOLOv8 model's weights with FastSAM, but it raises an error about mismatched weights. I know you're working on fine-tuning/training code right now, but I was curious whether replacing the weights is possible or not.
Thanks for a great work!
How can I remove the blank borders in the output image? I am not familiar with matplotlib. Is there anything I should change?
Nice work!
I'd like to suggest that you create a compatible interface to SAM itself, i.e. a drop-in replacement for SamAutomaticMaskGenerator, which would make it easier for people to start using this.
Hi,
Has anyone converted FastSAM to ONNX or CoreML format?
Since FastSAM is based on the YOLOv8 model and takes an image path as input, how can one get an image trace for it and convert it to CoreML format?
Also, how can it be converted to CoreML so that the output can be of variable size?
Run the example script.
python Inference.py --model_path FastSAM-x.pt --img_path .\examples\dogs.jpg
File "d:\ProgramData\Anaconda3\envs\fastsam\lib\site-packages\torch\functional.py", line 504, in _meshgrid
return _VF.meshgrid(tensors, **kwargs, indexing ='ij') # type: ignore[attr-defined]
TypeError: torch._VariableFunctionsClass.meshgrid() got multiple values for keyword argument 'indexing'
Anyone know how to solve it?
Does FastSAM support batch inference on segment everything mode?
Thank you so much for your contribution. Will you support bounding box interaction option in the future?
Hello,
python Inference.py --model_path FastSAM-x.pt --img_path images/dogs.jpg
I got the following error when I ran it.
AttributeError: 'FigureCanvasTkAgg' object has no attribute 'renderer'
If anyone encounters this problem, they can modify tools.py (around line 171) to draw the canvas before reading it; note that draw() returns None, so its result must not be assigned to buf:
fig.canvas.draw()
buf = fig.canvas.tostring_rgb()
Hi,
Can someone help me understand how to give variable input image sizes to the FastSAM model exported in ONNX format? Currently it only accepts (1024, 1024) images, which leads to a mismatch with the desired outcome.
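A common workaround is to letterbox any input to the fixed 1024×1024 and keep the scale and padding offsets to map masks back afterwards. A minimal numpy-only sketch (a real pipeline would use cv2.resize with bilinear interpolation; names are illustrative):

```python
import numpy as np

def letterbox(img, size=1024, pad_value=114):
    """Resize the longer side to `size` and pad the rest, YOLO-style,
    so images of any resolution fit the fixed (1024, 1024) ONNX input."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # nearest-neighbour resize via integer index maps (keeps this numpy-only)
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    # grey-pad the remainder, centring the resized image on the canvas
    canvas = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (top, left)  # keep scale/offsets to undo on the masks
```

Undoing the transform on the output means cropping the padding off each mask and dividing box coordinates by `scale`.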
Is it possible for the text prompt to detect multiple objects? (e.g., "people" would find all people in the image)
Hey guys,
great work on this. I'm one of the co-authors of Ultralytics YOLOv8 and was wondering if you'd like to add support for FastSAM to the Ultralytics models HUB here -> https://docs.ultralytics.com/models/
I'd be happy to help. Thanks!
I get the following error when I run the model:
File /root/FastSAM/fastsam/utils.py:17, in adjust_bboxes_to_image_border(boxes, image_shape, threshold)
14 h, w = image_shape
16 # Adjust boxes
---> 17 boxes[:, 0] = torch.where(boxes[:, 0] < threshold, 0, boxes[:, 0]) # x1
18 boxes[:, 1] = torch.where(boxes[:, 1] < threshold, 0, boxes[:, 1]) # y1
19 boxes[:, 2] = torch.where(boxes[:, 2] > w - threshold, w, boxes[:, 2]) # x2
RuntimeError: expected scalar type long int but found float
Respect to the authors! In the predict file there is an import of cog; what is this cog, and is it also installed via pip?
Can I read all the pictures in a folder at once and use just one text prompt? I tried to do this, but an OpenCV error occurred. Is there a mistake in my usage, or is it something else? This mode of operation is more realistic in practice.
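In case it helps while the OpenCV error is sorted out: one simple approach is to collect the image paths yourself and run the model per file with the same text prompt. A sketch, where the FastSAM/FastSAMPrompt calls follow the README API and are assumptions on my part:

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def collect_images(folder):
    """Gather image paths in a folder so each can be run with one shared prompt."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.suffix.lower() in IMAGE_EXTS)

# Hypothetical usage, assuming the FastSAM / FastSAMPrompt API from the README:
# model = FastSAM("FastSAM-x.pt")
# for img_path in collect_images("./images"):
#     results = model(str(img_path), retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)
#     prompt = FastSAMPrompt(str(img_path), results, device="cpu")
#     ann = prompt.text_prompt(text="the yellow dog")
```

Looping per file also isolates failures: a single unreadable image raises on its own iteration instead of aborting a batched call.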