casia-iva-lab / fastsam
Fast Segment Anything
License: GNU Affero General Public License v3.0
Hi author,
Will you release the code so we can have a try?
Hi
Thanks for your work. Could you please provide details of how the 2% of the training data was selected? Did you use a specific strategy, or was it random selection?
Thanks
Hello,
I am running the cat example given in the image folder. On a Mac M1, I get the following error:
img_array = np.frombuffer(buf, dtype=np.uint8).reshape(rows, cols, 3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 24883200 into shape (1920,1080,3)
can someone help?
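For what it's worth, 24883200 = (2 × 1920) × (2 × 1080) × 3, which suggests the canvas rendered at 2× on a Retina display while the code reshapes with the logical figure size. A minimal, FastSAM-independent sketch that avoids hard-coding the shape by using `buffer_rgba()`, which carries its own dimensions (the helper name is illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; sidesteps GUI-canvas quirks
import matplotlib.pyplot as plt

def canvas_to_rgb(fig):
    """Convert a drawn figure to an (H, W, 3) uint8 array.

    buffer_rgba() returns a buffer that already knows its shape, so this
    stays correct on HiDPI/Retina displays, where the canvas renders at
    twice the logical figure size.
    """
    fig.canvas.draw()
    rgba = np.asarray(fig.canvas.buffer_rgba())  # (H, W, 4)
    return rgba[..., :3].copy()                  # drop the alpha channel

fig = plt.figure(figsize=(4, 3), dpi=100)
img = canvas_to_rgb(fig)
print(img.shape)  # (300, 400, 3) with the non-HiDPI Agg backend
```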
I want to use this project for deployment.
Thanks for your interesting work.
I have a license question: how can the FastSAM model be released under the Apache license when the SA-1B training dataset is under a research-only license and the Ultralytics YOLO codebase is under AGPL?
Hi! Thanks for the great repo! I really like it.
Could you please provide more details about training costs, such as GPUs and training time? Also, why are only 2% of the images from SA-1B used for training? Is there a reason for this setting?
Thanks for your work. Does the training code mean that I can use my labeled segmentation data containing one class to train a model that segments only that one class?
Hello, when exporting to TensorRT (engine), an error is reported: export running on CPU but must be on GPU. How can I fix it?
Hi
Can you share the code for converting the model to TensorRT?
Can you provide a Baidu Cloud download link for the model?
Are you also planning to combine this model with Grounding-DINO from IDEA?
Hi there,
I passed a 1920×1080 image into the model for segmentation, but the output image doesn't keep the same resolution. The image with masks is scaled down to around 1328×741 (a rough number) in equal proportions.
Is there any way to keep the output image the same size as the original?
Thanks
Since FastSAM uses a YOLO detector, is it possible to get the mask labels?
I tried other text prompts such as "black eyes", "wood", and "sands" with the sample picture on Hugging Face, but wrong results were given. Could it be because sample prompts such as "yellow dog" and "black dog" appeared in the prompts of the training dataset?
I'm getting this error when I'm trying to run pip.
Installing collected packages: seaborn, ultralytics
ERROR: Could not install packages due to an OSError: [WinError 2] The system cannot find the file specified: 'C:\Python311\Scripts\ultralytics.exe' -> 'C:\Python311\Scripts\ultralytics.exe.deleteme'
Hi, thank you very much for your research.
Does box_prompt currently support only one box?
Thanks for the great engineering application research!
I see that the YOLO-seg series models borrow the idea of mask coefficients from YOLACT.
Section 3.2 of your paper says of YOLOv8: "The updated Head module embraces a decoupled structure, separating classification and detection heads, and shifts from Anchor-Based to Anchor-Free."
I'm wondering: does the threshold-crop step from YOLACT make the YOLO-seg architecture anchor-based again?
Hello! I really like this project.
Do you plan to support splitting this model into Encoder and Decoder like the original SAM?
In that way, the Decoder part can be run very fast, and we can apply it to some applications like AnyLabeling.
I'd love to help integrate into AnyLabeling if we can find a way to split the model.
Thank you very much!
Thank you for your efforts! I would like to ask whether segment-everything mode can return the label of each segment, or whether there is a good way to obtain labels.
Reference: https://github.com/ChaoningZhang/MobileSAM
Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.
MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:
Best Wishes,
Qiao
We have just released the MobileSAM project: https://github.com/ChaoningZhang/MobileSAM. We found that FastSAM seems to perform much worse than MobileSAM with points as the prompt, especially when the foreground and background points are set close together. Can you share your thoughts on what might be the reason? Thank you in advance for your help.
Hello,
I am trying the model for the first time. I get the following message on an Apple M1. Can someone help?
File "/Users/Projects/FastSAM/gitsrc/utils/tools.py", line 179, in fast_process
img_array = np.fromstring(buf, dtype=np.uint8).reshape(rows, cols, 3)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot reshape array of size 7756992 into shape (603,1072,3)
Hi, the output from the ONNX model has shapes [(1, 37, 21504), (1, 32, 256, 256)]. If I post-process them using the method below, with conf = 0.4, iou = 0.9, and agnostic_nms = False, as with the FastSAM .pt model, it doesn't return masks of the same length.
Can someone explain the outputs of the ONNX-format FastSAM model and how to post-process them?
import torch
from ultralytics.yolo.utils import ops  # import path may differ across ultralytics versions

def postprocess(preds, conf, iou, agnostic_nms=False):
    """Post-process raw ONNX outputs: NMS on the detection head, then masks.

    TODO: filter by classes.
    """
    p = ops.non_max_suppression(preds[0],
                                conf,
                                iou,
                                agnostic=agnostic_nms,  # pass as keyword; positionally this lands in `classes`
                                max_det=100,
                                nc=1)
    results = []
    proto = preds[1]  # second output is len 3 if pt, but only 1 if exported
    for i, pred in enumerate(p):
        pred[:, :4] = ops.scale_boxes(torch.Size([1024, 1024]), pred[:, :4], (1024, 1024))
        masks = ops.process_mask_native(proto[i], pred[:, 6:], pred[:, :4], (1024, 1024))  # HWC
        results.append(masks)
    return results  # masks for every image, not just the last one
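For reference, assuming the standard YOLOv8-seg head layout, the 37 channels of the first output are 4 box coordinates + 1 class score (nc=1) + 32 mask coefficients, and 21504 = 128² + 64² + 32² anchor points for a 1024×1024 input; the second output holds 32 mask prototypes at 256×256. A sketch of the decomposition on dummy tensors:

```python
import numpy as np

# Dummy tensors with the shapes reported for the exported ONNX model.
preds = np.zeros((1, 37, 21504), dtype=np.float32)
protos = np.zeros((1, 32, 256, 256), dtype=np.float32)

# 37 channels = 4 box coords + 1 score (nc=1) + 32 mask coefficients.
boxes, scores, coeffs = np.split(preds, [4, 5], axis=1)
print(boxes.shape, scores.shape, coeffs.shape)
# (1, 4, 21504) (1, 1, 21504) (1, 32, 21504)

# 21504 anchor points = 128*128 + 64*64 + 32*32 (strides 8/16/32 at 1024 input).
assert 128**2 + 64**2 + 32**2 == preds.shape[2]

# After NMS keeps N detections, each final mask is a linear combination of the
# prototypes: sigmoid(kept_coeffs @ protos.reshape(32, -1)), reshaped to 256x256.
```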
Hey,
Huge thanks for such a great repo that you have made, just wanted to know if it can be fine tuned for specific purposes, it would be great if you could provide code to fine tune the model on COCO dataset.
Why does the example script used in the Colab notebook save the final image in a downsized version? When I use my own custom photo, the resulting image dimensions are considerably smaller.
As you know, CVAT integrated SAM via serverless mode. @gaoxinge @zxDeepDiver
I have tried to use a trained YOLOv8 model's weights with FastSAM, but it raises an error about mismatched weights. I know you're working on fine-tuning/training code right now, but I was curious whether replacing the weights is possible or not.
Thanks for a great work!
How can I remove the blank borders in the output image? I am not familiar with matplotlib. Is there anything I should change?
Nice work!
I'd like to suggest that you create a compatible interface to SAM itself, i.e. a drop-in replacement for SamAutomaticMaskGenerator, which would make it easier for people to start using this.
Hi,
Has anyone converted FastSAM to ONNX or CoreML format?
Since FastSAM is based on the YOLOv8 model and takes an image path as input, how can one get an image trace for it and convert it to CoreML format?
Also, how can it be converted to CoreML so that the output can be of variable size?
Run the example script.
python Inference.py --model_path FastSAM-x.pt --img_path .\examples\dogs.jpg
File "d:\ProgramData\Anaconda3\envs\fastsam\lib\site-packages\torch\functional.py", line 504, in _meshgrid
return _VF.meshgrid(tensors, **kwargs, indexing ='ij') # type: ignore[attr-defined]
TypeError: torch._VariableFunctionsClass.meshgrid() got multiple values for keyword argument 'indexing'
Anyone know how to solve it?
Does FastSAM support batch inference on segment everything mode?
Thank you so much for your contribution. Will you support bounding box interaction option in the future?
Hello,
python Inference.py --model_path FastSAM-x.pt --img_path images/dogs.jpg
I got the following error when I ran it.
AttributeError: 'FigureCanvasTkAgg' object has no attribute 'renderer'
If anyone encounters this problem, they can modify tools.py (around line 171) to draw the canvas before reading it; note that draw() returns None, so its result must not be assigned to buf:
fig.canvas.draw()
buf = fig.canvas.tostring_rgb()
Hi,
Can someone help me understand how to give variable input image sizes to the FastSAM model exported in ONNX format? Currently it only accepts (1024, 1024) images, which leads to a mismatch with the desired outcome.
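A common workaround is to letterbox any input to the fixed 1024×1024 and keep the scale and padding offsets to map masks back afterwards. A minimal numpy-only sketch (a real pipeline would use cv2.resize with bilinear interpolation; names are illustrative):

```python
import numpy as np

def letterbox(img, size=1024, pad_value=114):
    """Resize the longer side to `size` and pad the rest, YOLO-style,
    so images of any resolution fit the fixed (1024, 1024) ONNX input."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # nearest-neighbour resize via integer index maps (keeps this numpy-only)
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    # grey-pad the remainder, centring the resized image on the canvas
    canvas = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (top, left)  # keep scale/offsets to undo on the masks
```

Undoing the transform on the output means cropping the padding off each mask and dividing box coordinates by `scale`.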
Is it possible for the text prompt to detect multiple objects? (e.g., "people" would find all people in the image)
Hey guys,
great work on this. I'm one of the co-authors of Ultralytics YOLOv8 and was wondering if you'd like to add support for FastSAM to the Ultralytics models HUB here -> https://docs.ultralytics.com/models/
I'd be happy to help. Thanks!
I get the following error when I run the model:
File /root/FastSAM/fastsam/utils.py:17, in adjust_bboxes_to_image_border(boxes, image_shape, threshold)
14 h, w = image_shape
16 # Adjust boxes
---> 17 boxes[:, 0] = torch.where(boxes[:, 0] < threshold, 0, boxes[:, 0]) # x1
18 boxes[:, 1] = torch.where(boxes[:, 1] < threshold, 0, boxes[:, 1]) # y1
19 boxes[:, 2] = torch.where(boxes[:, 2] > w - threshold, w, boxes[:, 2]) # x2
RuntimeError: expected scalar type long int but found float
Respect to the authors! In the predict file there is an import of cog; what is this cog, and is it also installed via pip?
Can I read all the pictures in a folder at once and use just one text prompt? I tried to do this, but an OpenCV error occurred. Is there a mistake in my usage, or is it something else? This mode of operation is more realistic in practice.
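In case it helps while the OpenCV error is sorted out: one simple approach is to collect the image paths yourself and run the model per file with the same text prompt. A sketch, where the FastSAM/FastSAMPrompt calls follow the README API and are assumptions on my part:

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def collect_images(folder):
    """Gather image paths in a folder so each can be run with one shared prompt."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.suffix.lower() in IMAGE_EXTS)

# Hypothetical usage, assuming the FastSAM / FastSAMPrompt API from the README:
# model = FastSAM("FastSAM-x.pt")
# for img_path in collect_images("./images"):
#     results = model(str(img_path), retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)
#     prompt = FastSAMPrompt(str(img_path), results, device="cpu")
#     ann = prompt.text_prompt(text="the yellow dog")
```

Looping per file also isolates failures: a single unreadable image raises on its own iteration instead of aborting a batched call.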