chaoningzhang / mobilesam
This is the official code for MobileSAM project that makes SAM lightweight for mobile applications and beyond!
License: Apache License 2.0
When I convert the model to ONNX, I get the following error.
Exporting onnx model to mobile_sam_onnx_opset11.onnx...
Traceback (most recent call last):
  File "/mnt/e/ncnn_project/MobileSAM/scripts/export_onnx_model.py", line 179, in <module>
    run_export(
  File "/mnt/e/ncnn_project/MobileSAM/scripts/export_onnx_model.py", line 150, in run_export
    torch.onnx.export(
  File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/__init__.py", line 275, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 88, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 689, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 463, in _model_to_graph
    graph = _optimize_graph(graph, operator_export_type,
  File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 200, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/__init__.py", line 313, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 994, in _run_symbolic_function
    return symbolic_fn(g, *inputs, **attrs)
  File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/symbolic_opset11.py", line 922, in repeat_interleave
    return torch.onnx.symbolic_opset9.repeat_interleave(g, self, repeats, final_dim)
  File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py", line 2064, in repeat_interleave
    for idx, r_split in enumerate(r_splits):
TypeError: 'torch._C.Value' object is not iterable
(Occurred when translating repeat_interleave).
Can the model handle multiple images at the same time? If so, how should the code be modified?
So I cloned the repo and worked through the notebook. Do I understand correctly that there's a lightweight image encoder based on TinyViT, but you still need to use the full-size SAM decoder? I had to download the sam_vit_h_4b8939.pth file, which is 2.4GB.
Am I using this wrong? Or are there smaller decoder models?
Can we replace a particular masked segment with content from another image that has the same shape as the masked region?
I had to pip install timm to use mobile_sam, but nowhere is it mentioned that this is required.
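A quick way to catch a missing dependency like this before running the notebook is to check whether the needed modules are importable. The sketch below uses only the standard library; the helper name `missing_packages` and the list of module names are assumptions for illustration, not something the repository ships.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed list: timm is the package reported missing above; torch is imported by the scripts.
    required = ["torch", "timm"]
    absent = missing_packages(required)
    if absent:
        print("Missing packages, install with: pip install " + " ".join(absent))
    else:
        print("All required packages are importable.")
```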
The initiative is fantastic and quite beneficial. Containerizing the project will make it easier to use.
Great work! I wonder whether training MobileSAM with more data (~10% or more) would give a more powerful model.
Hi! Thank you very much for creating such a wonderful segmentation solution. Using it I was able to create a small example, which works in the browser (powered by ONNX runtime web).
Here is the link: https://github.com/akbartus/MobileSAM-in-the-Browser
Feel free to share/add to your readme section, if needed.
Thank you.
We've already implemented Grounded-MobileSAM here: Grounded-MobileSAM Demo. It's really efficient.
I just found that the inference time is even longer than SAM's when SAM uses 'vit_b'.
I added timing around this line of code
Hello,
Thank you for your work. I've tried to use your model in place of SAM in a visualisation pipeline that I've set up. The problem is that I'm unable to obtain results as good as with the original SAM, even compared to the ViT-B version. In my test with AutoMask, your model is unable to separate objects the way SAM can. Is the model provided in weights your trained model, and if so, do you have any idea what could be wrong? (The preprocessing is OK, as my code works with the original SAM.)
Thank you in advance !
Below is my test code. I measured the running time of predictor.set_image(image); on my laptop (RTX 3060) it takes about 50 ms on the GPU.
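import time
import torch
from mobile_sam import sam_model_registry, SamPredictor

sam_checkpoint = "./weights/mobile_sam.pt"
model_type = "vit_t"
device = "cuda" if torch.cuda.is_available() else "cpu"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
sam.eval()
predictor = SamPredictor(sam)
# `image` is an RGB image array loaded beforehand.
for i in range(10):
    torch.cuda.synchronize()  # wait for pending GPU work before starting the timer
    start = time.time()
    predictor.set_image(image)
    torch.cuda.synchronize()  # wait for the encoder to finish before stopping the timer
    print('time cost = ', (time.time() - start) * 1000, "ms")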
Wonderful work! I'm interested in interactive segmentation. Is a C++ interface on the way?
Hello, I added MobileSAM support to Track Anything, which uses SAM + XMem for video object tracking and segmentation.
While it does not speed up inference much during tracking (as XMem is mostly used), MobileSAM makes the Gradio interface snappier, so adding masks and choosing tracked objects is faster.
Thanks for your work
Hi Team
Does MobileSAM support text as an input prompt, masking the region described by the prompt text?
Would you like to add support for RKNN, to run the model on the RK3588?
As I was reading the paper I had some confusion: your paper mentions that FastSAM needs at least two prompt points and on that basis compares the performance of FastSAM and MobileSAM in segment anything mode.
But as you can see from the HuggingFace demo posted by FastSAM, their approach supports a single point and has worked well in my own attempts. If it is indeed a writing error, I very much look forward to your revision and showing new experiments to effectively compare the two methods.
BTW, I am also curious how exactly Table 7 in the paper was produced, and why model performance can be demonstrated by varying the distance between positive and negative prompt points.
Here is my code, just adding the time check on the demo code. It seems the encoding process is far more time-consuming than expected.
from mobile_sam import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
import torch
import cv2
import numpy as np
import time
model_type = "vit_t"
sam_checkpoint = "./weights/mobile_sam.pt"
device = "cuda:1" if torch.cuda.is_available() else "cpu"
print(device)
mobile_sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
mobile_sam.to(device=device)
mobile_sam.eval()
predictor = SamPredictor(mobile_sam)
image = cv2.imread('./0000_color.png')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
print(image.shape)
box = np.array([100,100,400,600])
time_s = time.time()
predictor.set_image(image=image)
time_e1 = time.time()
masks, _, _ = predictor.predict(box=box)
time_e2 = time.time()
print('encoding time:',time_e1-time_s)
print('decoding time:',time_e2-time_e1)
When using a GPU (RTX 3090), the output is:
cuda:1
(480, 640, 3)
encoding time: 2.325988531112671
decoding time: 0.018665313720703125
When using the CPU, the output is:
cpu
(480, 640, 3)
encoding time: 0.8602027893066406
decoding time: 0.08456754684448242
The CPU is even faster than the GPU at encoding.
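One possible factor in numbers like these, offered as an assumption rather than a diagnosis: the first GPU call typically includes one-time CUDA initialization, and GPU kernels run asynchronously, so a single time.time() pair around the first set_image call can be misleading. The sketch below is a generic timing helper with warm-up runs and an optional synchronization hook; `benchmark_ms` and its parameters are hypothetical names, not part of MobileSAM.

```python
import time


def benchmark_ms(fn, warmup=3, runs=10, sync=None):
    """Time fn() over several runs, in milliseconds, after untimed warm-up calls.

    sync is an optional zero-argument callable (for example
    torch.cuda.synchronize) invoked before starting and before stopping the
    timer so that asynchronous GPU work is included in the measurement.
    """
    for _ in range(warmup):
        fn()  # warm-up calls absorb one-time setup costs and are not timed
    timings = []
    for _ in range(runs):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


if __name__ == "__main__":
    # Stand-in workload; with MobileSAM this would be something like
    # lambda: predictor.set_image(image), with sync=torch.cuda.synchronize on GPU.
    timings = benchmark_ms(lambda: sum(range(100_000)), warmup=2, runs=5)
    print("mean: %.3f ms, min: %.3f ms" % (sum(timings) / len(timings), min(timings)))
```

Comparing the mean of the timed runs on CPU and GPU, rather than the first call alone, gives a fairer picture of steady-state encoder latency.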
Hey guys,
great work. I was wondering whether you would like to add support for Core ML to run the model on iOS.
Hello,
Does anyone know how to use Apple "MPS" instead of the cpu/cuda device?
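A minimal sketch of device selection that prefers CUDA, then MPS, then CPU. It relies on torch.cuda.is_available() and torch.backends.mps.is_available(), which exist in recent PyTorch releases; whether every MobileSAM operator runs on MPS is not confirmed here. The helper name `pick_device` is hypothetical, and it takes the torch module as an argument so that it still works on builds without an MPS backend.

```python
def pick_device(torch_module):
    """Return "cuda", "mps", or "cpu" depending on what the given torch build supports."""
    if torch_module.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps = getattr(torch_module.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


# Usage (assumes PyTorch is installed and the model was built as in the repo's examples):
#   import torch
#   device = pick_device(torch)
#   mobile_sam.to(device=device)
```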
I did not convert the model to onnx according to the method provided by SAM
Hi! Can you release the training code? Thank you very much!
SAM Exporter has added MobileSAM to its supported models.
Does MobileSAM support batch inference in segment-everything mode?
Thanks for your work. Does the training code mean that I can use my labeled segmentation data, which contains a single class, to train a model that segments only that class?
Hello authors,
Thanks so much for your great work.
It looks quite promising.
Does it support fine-tuning for downstream tasks such as binary semantic segmentation?
Such as SAM adapter.
https://github.com/tianrun-chen/SAM-Adapter-PyTorch
Kind regards.
Thanks a lot for your work! When you mention that MobileSAM and SAM have comparable performance, do you mean SAM (ViT-H)? I tried to find evidence for this in the paper, but only found a few visualization results. I'm really looking forward to seeing a quantitative comparison between MobileSAM and SAM on some tasks (such as the multiple experiments mentioned in the original SAM paper).
Hi,
I am experiencing an issue with Faster Segment Anything (MobileSAM) where it is taking 20 seconds to run sometimes, and errors out each time. I have also tried placing over 10 points on an image, but it is still taking a long time and not working.
Here are the steps I am taking to reproduce the issue:
I open https://huggingface.co/spaces/dhkim2810/MobileSAM
I open the image in MobileSAM, using my own and example images provided.
I select a point with Add Mask
I place points on the image
I click the "Start segmenting" button
The issue occurs when the "Start segmenting" button is clicked. The process sometimes takes 8, 10, and 20 seconds, and each time it errors out with the following message: Error.
I have tried the following to try to fix the issue:
I have tried using different images, but the issue still occurs.
I am running MobileSAM on a HP Spectre x360
I would appreciate it if you could look into this issue and let me know if there is anything I can do to fix it.
Thanks,
Patrici Bal
I'm considering the possibility of exporting MobileSam as an ONNX model and running it on mobile devices using onnxruntime.
In the "About" section, "official" is misspelled as "offiicial".
Thanks for the excellent work.
Whole-image segmentation is much slower than with FastSAM. Is this because of the different post-processing? Thanks
I have added MobileSAM to Image-Anything demo app (https://github.com/neuromorph/image-anything). It combines MobileSAM with GroundingDINO, Matte-Anything (VitMatte), Stable Diffusion for various image tasks. This smaller footprint model was quite helpful in the multi-model setup and keeping same API as SAM certainly helps.
So, thank you and congratulations!
Thanks for your interest in our work. Note that MobileSAM makes the image encoder lightweight without changing the decoder (roughly 8ms for the encoder and 4ms for the decoder). We mainly target the anything mode (one image-encoder pass and one decoder pass) rather than the everything mode (one image-encoder pass and 32x32 decoder passes); see the paper for the difference in definition (anything mode is the foundation task, while everything mode is just a downstream task, as indicated in the original SAM paper). "generate all mask" seems to suggest that you are using the everything mode. For the everything mode, even though our encoder is much faster than that of the original SAM (roughly 8ms vs 450ms), it cannot save much time for the whole pipeline, since most of the time is spent on the 32x32 decoder passes. One way to mitigate this is to use a smaller grid (like 10x10 or 5x5) so that the decoder consumes less time, since many redundant masks are generated with a 32x32 grid. I hope this addresses your issue; otherwise, please kindly let us know. We are also currently trying to make the decoder more lightweight by distilling it into a smaller one, as we did for the image encoder. Stay tuned for our progress. If you have more issues, please kindly let us know; we might not be able to respond in a timely manner, but we will try our best.
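To make the arithmetic in this reply concrete, here is a rough back-of-the-envelope sketch. It takes the approximate figures quoted above (8 ms per encoder pass, 4 ms per decoder pass) and assumes, as a simplification, that every grid point costs one full decoder pass with no batching and no post-processing, so real timings will differ. The function name `estimate_everything_ms` is hypothetical.

```python
def estimate_everything_ms(points_per_side, encoder_ms=8.0, decoder_ms=4.0):
    """Rough everything-mode cost: one encoder pass plus one decoder pass per grid point.

    Assumes no batching and ignores mask filtering and post-processing, so
    this is only an order-of-magnitude illustration.
    """
    num_prompts = points_per_side * points_per_side
    return encoder_ms + num_prompts * decoder_ms


if __name__ == "__main__":
    for side in (32, 10, 5):
        print(f"{side}x{side} grid: {side * side} prompts, "
              f"~{estimate_everything_ms(side):.0f} ms")
```

In the original segment_anything package, the grid density is controlled by the points_per_side argument of SamAutomaticMaskGenerator (default 32); assuming MobileSAM keeps that interface, passing points_per_side=10 is the kind of change the reply suggests.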
Could you please share more information about the hardware used in the paper, so that I can reproduce the 8ms encoder inference?
asttokens 2.2.1
backcall 0.2.0
certifi 2022.12.7
charset-normalizer 3.1.0
coloredlogs 15.0.1
comm 0.1.3
contourpy 1.0.7
cycler 0.11.0
debugpy 1.6.7
decorator 5.1.1
executing 1.2.0
filelock 3.12.2
flatbuffers 23.5.26
fonttools 4.39.2
fsspec 2023.6.0
huggingface-hub 0.15.1
humanfriendly 10.0
idna 3.4
imgviz 1.2.6
importlib-metadata 6.7.0
importlib-resources 5.12.0
ipykernel 6.24.0
ipython 8.12.2
jedi 0.18.2
jupyter_client 8.3.0
jupyter_core 5.3.1
kiwisolver 1.4.4
mahotas 1.4.13
matplotlib 3.7.1
matplotlib-inline 0.1.6
mobile-sam 1.0 /mnt/sda/code/MobileSAM-master
mpmath 1.3.0
nest-asyncio 1.5.6
numpy 1.24.2
onnx 1.12.0
onnxruntime-gpu 1.13.1
opencv-python 4.7.0.72
opencv-python-headless 4.5.3.56
packaging 23.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 23.0.1
platformdirs 3.8.0
prompt-toolkit 3.0.38
protobuf 3.20.1
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
Pygments 2.15.1
pyparsing 3.0.9
PyQt5 5.15.7
PyQt5-Qt5 5.15.2
PyQt5-sip 12.12.1
python-dateutil 2.8.2
PyYAML 6.0
pyzmq 25.1.0
requests 2.31.0
safetensors 0.3.1
segment-anything 1.0 /mnt/sda/code/segment-anything-main
setuptools 65.6.3
six 1.16.0
stack-data 0.6.2
sympy 1.12
timm 0.9.2
torch 1.10.1+cu111
torchaudio 0.10.1+cu111
torchvision 0.11.2+cu111
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
typing_extensions 4.5.0
urllib3 2.0.3
wcwidth 0.2.6
wheel 0.38.4
zipp 3.15.0
(cam) dongzf@dongzf-LEGION-REN7000P-26AMR:/mnt/sda/code/MobileSAM-master$ python scripts/export_onnx_model.py --checkpoint ./weights/mobile_sam.pt --model-type vit_t --output ./mobile_sam.onnx
Loading model...
Exporting onnx model to ./mobile_sam.onnx...
Traceback (most recent call last):
  File "scripts/export_onnx_model.py", line 176, in <module>
    run_export(
  File "scripts/export_onnx_model.py", line 148, in run_export
    torch.onnx.export(
  File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/__init__.py", line 316, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/utils.py", line 107, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/utils.py", line 707, in _export
    _set_opset_version(opset_version)
  File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 849, in _set_opset_version
    raise ValueError("Unsupported ONNX opset version: " + str(opset_version))
ValueError: Unsupported ONNX opset version: 16
Hello, add these lines to build_sam.py to get the same behavior as SAM, like this:
def build_sam_mobile(checkpoint=None):
    # Build the MobileSAM model and optionally load pretrained weights.
    from mobile_encoder.setup_mobile_sam import setup_model
    mobile_sam = setup_model()
    if checkpoint is not None:
        mobile_sam.load_state_dict(
            torch.load(checkpoint), strict=True)
    return mobile_sam

# Register the new builder alongside the original SAM variants.
sam_model_registry = {
    "default": build_sam_vit_h,
    "vit_h": build_sam_vit_h,
    "vit_l": build_sam_vit_l,
    "vit_b": build_sam_vit_b,
    "mobile": build_sam_mobile,
}
Then you can export the ONNX model and run a quick demo just as with SAM:
python scripts/amg.py --checkpoint <path/to/checkpoint> --model-type mobile --input <image_or_folder> --output <path/to/output>
python scripts/export_onnx_model.py --checkpoint <path/to/checkpoint> --model-type mobile --output <path/to/output>
Thanks for the great repo. (Sorry if this is an obvious question.) Is there support for text prompts for segmentation?
It took 3000ms on my computer. I don't know what is wrong.
Hello, thank you for your great work!
I notice that the segment_anything part of the code has the same structure as the original SAM.
Has this part been modified in any way?
And if I want to integrate MobileSAM into a project that uses SAM, is adding the mobile_encoder to the original project all I need to do?
Just like vit_h, MobileSAM doesn't work well with half precision, whereas vit_b and vit_l work normally. Have you looked into the reason for this? I would be grateful for a reply.
Hi! Your paper and the work you've done are amazing. I was wondering if you tried training any of the other Tiny-ViT models (like one of the larger ones) using your pipeline? If yes, would you mind sharing those weight files?
Hey guys,
great work on this. I'm one of the co-authors of Ultralytics YOLOv8 and was wondering if you'd like to add support for faster SAM to Ultralytics models HUB here -> https://docs.ultralytics.com/models/
I'd be happy to help. Thanks!
Great work. I hope you can release the training code so that we can fine-tune the model for specific tasks.
Hello and thank you for open sourcing your work. It's truly an amazing engineering feat.
I've tried to fine-tune the mask decoder using box prompt with this repository as base. It works with VitB, VitL and VitH for my datasets. In all the procedures below, image encoder and prompt encoder layers are always frozen.
However, when I try to fine-tune the mask decoder of VitT directly, the model seems to fall apart.
Below is the result after 1 epoch vs the original one. Further epochs only shrink the masks further without any visible improvement.
Result of 1 epoch fine-tuning (VitT)
An exhaustive grid search was performed with 2 hyperparameters:
Given that VitL of MobileSAM is distilled from VitH while keeping the mask decoder and prompt decoder the same, I've tried another approach:
This approach worked for me, but the results are less than ideal. Some masks are better and some are worse compared to the original VitL. Also, fine-tuning VitH takes a lot of computing resources, so I would like to avoid that.
Any suggestions to this matter would be much appreciated.
TIA
Looking forward to it.
AnyLabeling now supports MobileSAM for auto-labeling.
Check it out at: https://github.com/vietanhdev/anylabeling/releases/tag/v0.3.0.