
chaoningzhang / mobilesam

4.4K 4.4K 466.0 93.54 MB

This is the official code for MobileSAM project that makes SAM lightweight for mobile applications and beyond!

License: Apache License 2.0

Shell 0.02% Python 6.99% Jupyter Notebook 92.99%

mobilesam's People

Contributors

chaoningzhang, dhkim2810, dongshenhan, killian31, ksugar, qiaoyu1002


mobilesam's Issues

convert the model to onnx

When I convert the model to ONNX, I get the following error:
Exporting onnx model to mobile_sam_onnx_opset11.onnx...
Traceback (most recent call last):
File "/mnt/e/ncnn_project/MobileSAM/scripts/export_onnx_model.py", line 179, in
run_export(
File "/mnt/e/ncnn_project/MobileSAM/scripts/export_onnx_model.py", line 150, in run_export
torch.onnx.export(
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/init.py", line 275, in export
return utils.export(model, args, f, export_params, verbose, training,
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 88, in export
_export(model, args, f, export_params, verbose, training, input_names, output_names,
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 689, in _export
_model_to_graph(model, args, verbose, input_names,
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 463, in _model_to_graph
graph = _optimize_graph(graph, operator_export_type,
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 200, in _optimize_graph
graph = torch._C._jit_pass_onnx(graph, operator_export_type)
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/init.py", line 313, in _run_symbolic_function
return utils._run_symbolic_function(*args, **kwargs)
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 994, in _run_symbolic_function
return symbolic_fn(g, *inputs, **attrs)
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/symbolic_opset11.py", line 922, in repeat_interleave
return torch.onnx.symbolic_opset9.repeat_interleave(g, self, repeats, final_dim)
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py", line 2064, in repeat_interleave
for idx, r_split in enumerate(r_splits):
TypeError: 'torch._C.Value' object is not iterable
(Occurred when translating repeat_interleave).

Is there a Light-Weight Decoder Too?

So I cloned the repo and worked through the notebook. Do I understand correctly that there is a lightweight image encoder based on TinyViT, but you still need to use the full-size SAM decoder? I had to download the sam_vit_h_4b8939.pth file, which is 2.4 GB.

Am I using this wrong? Or are there smaller decoder models?
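For reference, the maintainer's reply further down states that MobileSAM only replaces the image encoder and keeps the original (already small) SAM mask decoder, so the mobile_sam.pt checkpoint should be all that is needed for inference. A minimal sketch for checking this yourself, assuming the packaged registry and the standard SAM sub-module names (image_encoder, prompt_encoder, mask_decoder):

    # Sketch only: count parameters per sub-module of the loaded vit_t model.
    from mobile_sam import sam_model_registry

    sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")

    def count_params(module):
        return sum(p.numel() for p in module.parameters())

    print("image encoder :", count_params(sam.image_encoder))   # TinyViT encoder
    print("prompt encoder:", count_params(sam.prompt_encoder))
    print("mask decoder  :", count_params(sam.mask_decoder))    # reused SAM mask decoder (small)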

Changing segment

Can we replace a particular masked segment with another image of the same shape as the masked region?
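This is not a MobileSAM feature as such, but if the replacement image has the same spatial size, a plain NumPy composite over the predicted mask does the job. A minimal sketch with hypothetical variable names (mask as returned by predictor.predict, image and replacement as HxWx3 arrays):

    import numpy as np

    def replace_segment(image, replacement, mask):
        # mask: (H, W) boolean array for the segment; image/replacement: (H, W, 3) arrays
        out = image.copy()
        out[mask] = replacement[mask]  # overwrite only the masked pixels
        return out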

timm required

I had to pip install timm to use mobile_sam, but it is not mentioned anywhere that this is required.

More training data?

Great work! I wonder: if we used more training data (around 10% or more) to train MobileSAM, would we get a more powerful model?

Unable to replicate SAM results

Hello,

Thank you for your work. I've tried to use your model in place of SAM in a visualisation pipeline that I've set up. The problem is that I'm unable to obtain results as good as with the original SAM, even compared to the ViT-B version. In my tests with AutoMask, your model is unable to separate objects the way SAM can. Is the model provided in weights your trained model, and if so, do you have any clue about what could be wrong? (The preprocessing is fine, as my code works with the original SAM.)

Thank you in advance!

The image embedding inference time is long

Below is my test code. I measure the running time of predictor.set_image(image); on my laptop (RTX 3060) it takes about 50 ms on the GPU.

import time

import cv2
import torch
from mobile_sam import sam_model_registry, SamPredictor

sam_checkpoint = "./weights/mobile_sam.pt"
model_type = "vit_t"
device = "cuda" if torch.cuda.is_available() else "cpu"

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
sam.eval()
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)  # placeholder path; any RGB test image

for i in range(10):
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()

    predictor.set_image(image)  # runs the image encoder

    if device == "cuda":
        torch.cuda.synchronize()
    print("time cost = ", (time.time() - start) * 1000, "ms")

Added MobileSAM support to Track-Anything

Hello, I added MobileSAM support to Track Anything, which allows using SAM + XMem for video object tracking and segmentation.

While it does not speed up inference much during tracking (as XMem does most of the work), MobileSAM allows for a snappier Gradio interface, making the process of adding masks and selecting tracked objects faster.

Thanks for your work

RKNN support

Would you like to add support for RKNN, so the model can run on the RK3588?

Some confusion about analysis and experiment in the paper

As I was reading the paper I had some confusion: your paper mentions that FastSAM needs at least two prompt points and on that basis compares the performance of FastSAM and MobileSAM in segment anything mode.

But as you can see from the Hugging Face demo posted by FastSAM, their approach supports a single point, and it has worked well in my own attempts. If this is indeed a writing error, I very much look forward to your revision and to new experiments that effectively compare the two methods.

BTW, I am also curious how Table 7 in the paper was produced exactly, and why model performance can be demonstrated by varying the distance between positive and negative prompt points.

Long inference time for single prompt (2s for encoding)

Here is my code, which just adds timing to the demo code. The encoding step seems far more time-consuming than expected.

from mobile_sam import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
import torch
import cv2
import numpy as np
import time

model_type = "vit_t"
sam_checkpoint = "./weights/mobile_sam.pt"

device = "cuda:1" if torch.cuda.is_available() else "cpu"
print(device)

mobile_sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
mobile_sam.to(device=device)
mobile_sam.eval()

predictor = SamPredictor(mobile_sam)

image = cv2.imread('./0000_color.png')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
print(image.shape)
box = np.array([100,100,400,600])
time_s = time.time()
predictor.set_image(image=image)
time_e1 = time.time()
masks, _, _ = predictor.predict(box=box)
time_e2 = time.time()
print('encoding time:',time_e1-time_s)
print('decoding time:',time_e2-time_e1)

When using a GPU (RTX 3090), the output is:
cuda:1
(480, 640, 3)
encoding time: 2.325988531112671
decoding time: 0.018665313720703125

When using cpu, the output is:
cpu
(480, 640, 3)
encoding time: 0.8602027893066406
decoding time: 0.08456754684448242

The CPU is even faster than the GPU on encoding.
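A likely explanation is that the single timed GPU call also pays for one-time CUDA initialization and kernel selection, since there is no warm-up run and no torch.cuda.synchronize() around the timed call. A minimal timing sketch under that assumption, reusing the predictor, image and device variables from the snippet above:

    import time
    import torch

    predictor.set_image(image)  # warm-up: CUDA context / kernel setup is not timed

    for _ in range(5):
        if device.startswith("cuda"):
            torch.cuda.synchronize()
        start = time.time()
        predictor.set_image(image)  # image encoder forward pass
        if device.startswith("cuda"):
            torch.cuda.synchronize()
        print("encoding time:", (time.time() - start) * 1000, "ms")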

Core ML support

Hey guys, great work. Would you consider adding support for Core ML so the model can run on iOS?

Apple MPS support?

Hello,

Does anyone know how to use Apple's "MPS" backend instead of the CPU/CUDA device?
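Not an official answer, but on PyTorch builds with Metal support (1.12+) you can usually select the mps device the same way as cuda; a minimal sketch (full operator coverage for MobileSAM on MPS is an assumption here, not something the repo guarantees):

    import torch
    from mobile_sam import sam_model_registry, SamPredictor

    # Prefer Apple's Metal backend when available, otherwise fall back to CPU.
    device = "mps" if torch.backends.mps.is_available() else "cpu"

    sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
    sam.to(device=device)
    sam.eval()
    predictor = SamPredictor(sam)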

training code

Hi! Can you release the training code? Many thanks!

Training code

Thanks for your work. Does the training code mean that I could use my labeled segmentation data containing one class to train a model that segments only that one class?

How to prove that MobileSAM's performance is comparable to the original SAM?

Thanks a lot for your work! When you mention that MobileSAM and SAM have comparable performance, do you mean SAM (ViT-H)? I tried to find evidence for this claim in the paper, but only found a few visualization results. I would really like to see a quantitative comparison between MobileSAM and SAM on some tasks (such as the multiple experiments in the original SAM paper).

Faster Segment Anything (MobileSAM) taking 20 seconds and erroring out

Hi,

I am experiencing an issue with Faster Segment Anything (MobileSAM): it sometimes takes 20 seconds to run and errors out each time. I have also tried placing over 10 points on an image, but it still takes a long time and does not work.

Here are the steps I am taking to reproduce the issue:

  1. I open https://huggingface.co/spaces/dhkim2810/MobileSAM
  2. I open an image in MobileSAM, using both my own images and the provided examples.
  3. I select a point with Add Mask.
  4. I place points on the image.
  5. I click the "Start segmenting" button.

The issue occurs when the "Start segmenting" button is clicked. The process sometimes takes 8, 10, or 20 seconds, and each time it errors out with the following message: Error.

I have tried the following to fix the issue:

  • Using different images, but the issue still occurs.

I am running MobileSAM on an HP Spectre x360.

I would appreciate it if you could look into this issue and let me know if there is anything I can do to fix it.

Thanks,
Patrici Bal

segmentation for whole image is slow

Thanks for the excellent work.

Whole-image segmentation is much slower than FastSAM; is this because of different postprocessing? Thanks.

Generating all masks takes 20 s, which is too long

Thanks for your interest in our work. Note that MobileSAM makes the image encoder lightweight without changing the decoder (roughly 8 ms for the encoder and 4 ms for the decoder). We mainly target anything mode (one image-encoder pass and one decoder pass) rather than everything mode (one image-encoder pass and 32x32 decoder passes); see the paper for the difference in definitions (anything mode is the foundation task, while everything mode is just a downstream task, as indicated in the original SAM paper).

"Generating all masks" suggests that you are using everything mode. For everything mode, even though our encoder is much faster than that of the original SAM (roughly 8 ms vs 450 ms), it cannot save much time for the whole pipeline, since most of the time is spent on the 32x32 decoder passes. One way to mitigate this is to use a smaller number of grid points (such as 10x10 or 5x5) so that the decoder consumes less time, since many redundant masks are generated with a 32x32 grid.

I hope this addresses your issue; otherwise, please let us know. We are also currently trying to make the mask decoder more lightweight by distilling it into a smaller one, as we did for the image encoder. Stay tuned for our progress. If you have more issues, please let us know; we might not be able to respond in a timely manner, but we will try our best.
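A minimal sketch of the grid-size suggestion above, assuming the automatic mask generator follows the original SAM interface and accepts a points_per_side argument:

    from mobile_sam import sam_model_registry, SamAutomaticMaskGenerator

    sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
    sam.to(device="cuda")
    sam.eval()

    # Fewer grid points -> fewer decoder passes (10x10 = 100 instead of 32x32 = 1024),
    # which is where most of the everything-mode time goes.
    mask_generator = SamAutomaticMaskGenerator(sam, points_per_side=10)
    masks = mask_generator.generate(image)  # image: HxWx3 uint8 RGB array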

import onnx error

asttokens 2.2.1
backcall 0.2.0
certifi 2022.12.7
charset-normalizer 3.1.0
coloredlogs 15.0.1
comm 0.1.3
contourpy 1.0.7
cycler 0.11.0
debugpy 1.6.7
decorator 5.1.1
executing 1.2.0
filelock 3.12.2
flatbuffers 23.5.26
fonttools 4.39.2
fsspec 2023.6.0
huggingface-hub 0.15.1
humanfriendly 10.0
idna 3.4
imgviz 1.2.6
importlib-metadata 6.7.0
importlib-resources 5.12.0
ipykernel 6.24.0
ipython 8.12.2
jedi 0.18.2
jupyter_client 8.3.0
jupyter_core 5.3.1
kiwisolver 1.4.4
mahotas 1.4.13
matplotlib 3.7.1
matplotlib-inline 0.1.6
mobile-sam 1.0 /mnt/sda/code/MobileSAM-master
mpmath 1.3.0
nest-asyncio 1.5.6
numpy 1.24.2
onnx 1.12.0
onnxruntime-gpu 1.13.1
opencv-python 4.7.0.72
opencv-python-headless 4.5.3.56
packaging 23.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 23.0.1
platformdirs 3.8.0
prompt-toolkit 3.0.38
protobuf 3.20.1
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
Pygments 2.15.1
pyparsing 3.0.9
PyQt5 5.15.7
PyQt5-Qt5 5.15.2
PyQt5-sip 12.12.1
python-dateutil 2.8.2
PyYAML 6.0
pyzmq 25.1.0
requests 2.31.0
safetensors 0.3.1
segment-anything 1.0 /mnt/sda/code/segment-anything-main
setuptools 65.6.3
six 1.16.0
stack-data 0.6.2
sympy 1.12
timm 0.9.2
torch 1.10.1+cu111
torchaudio 0.10.1+cu111
torchvision 0.11.2+cu111
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
typing_extensions 4.5.0
urllib3 2.0.3
wcwidth 0.2.6
wheel 0.38.4
zipp 3.15.0

(cam) dongzf@dongzf-LEGION-REN7000P-26AMR:/mnt/sda/code/MobileSAM-master$ python scripts/export_onnx_model.py --checkpoint ./weights/mobile_sam.pt --model-type vit_t --output ./mobile_sam.onnx
Loading model...
Exporting onnx model to ./mobile_sam.onnx...
Traceback (most recent call last):
File "scripts/export_onnx_model.py", line 176, in
run_export(
File "scripts/export_onnx_model.py", line 148, in run_export
torch.onnx.export(
File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/init.py", line 316, in export
return utils.export(model, args, f, export_params, verbose, training,
File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/utils.py", line 107, in export
_export(model, args, f, export_params, verbose, training, input_names, output_names,
File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/utils.py", line 707, in _export
_set_opset_version(opset_version)
File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 849, in _set_opset_version
raise ValueError("Unsupported ONNX opset version: " + str(opset_version))
ValueError: Unsupported ONNX opset version: 16

add little code for onnx export and quick demo

Hello, add these lines of code to build_sam.py to get the same behavior as SAM, like this:

def build_sam_mobile(checkpoint=None):
    from mobile_encoder.setup_mobile_sam import setup_model
    mobile_sam = setup_model()
    if checkpoint is not None:
        mobile_sam.load_state_dict(
            torch.load(checkpoint), strict=True)
    return mobile_sam


sam_model_registry = {
    "default": build_sam_vit_h,
    "vit_h": build_sam_vit_h,
    "vit_l": build_sam_vit_l,
    "vit_b": build_sam_vit_b,
    'mobile': build_sam_mobile
}

Then you can export the ONNX model and run the quick demo just like SAM:

python scripts/amg.py --checkpoint <path/to/checkpoint> --model-type mobile --input <image_or_folder> --output <path/to/output>

python scripts/export_onnx_model.py --checkpoint <path/to/checkpoint> --model-type mobile --output <path/to/output>

Text prompt example

Thanks for the great repo. (Sorry if this is an obvious question.) Is there support for text prompts for segmentation?

Question about the segment_anything part in this code

Hello, thank you for your great work!
I notice that the segment_anything part of the code has the same structure as the original SAM.
Is there any modification to this part?
And if I want to integrate MobileSAM into a project that already uses SAM, do I just need to add the mobile_encoder to the original project?
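For reference, a hedged sketch of what a drop-in swap could look like, assuming the packaged code really does mirror the segment_anything interface as described above:

    # from segment_anything import sam_model_registry, SamPredictor   # original SAM
    from mobile_sam import sam_model_registry, SamPredictor           # MobileSAM drop-in

    sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
    predictor = SamPredictor(sam.to("cuda").eval())
    # predictor.set_image(...) and predictor.predict(...) are then used exactly as before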

About a problem with half precision

Just like vit_h, MobileSAM doesn't work well when using half precision, but vit_b and vit_l work normally. Have you ever thought about the reason for this phenomenon? I would be grateful if you could reply.

Other TinyViT models

Hi! Your paper and the work you've done are amazing. I was wondering whether you tried training any of the other TinyViT models (such as one of the larger ones) using your pipeline? If so, would you mind sharing those weight files?

Mask decoder finetuning

Hello and thank you for open sourcing your work. It's truly an amazing engineering feat.
I've tried to fine-tune the mask decoder using box prompts, with this repository as a base. It works with ViT-B, ViT-L and ViT-H on my datasets. In all the procedures below, the image encoder and prompt encoder layers are always frozen.

However, when I try to fine-tune the mask decoder of ViT-T directly, the model seems to fall apart.

Below is the result after 1 epoch vs. the original. Further epochs only shrink the masks further, without any visible improvement.

Result after 1 epoch of fine-tuning (ViT-T): [image]

Original result (ViT-T): [image]

An exhaustive grid search was performed over 2 hyperparameters:

  • Learning rate: 1e-5, 1e-4, 8e-4, 1e-3, 1e-2
  • Batch size: 2, 4, 8, 16, 32, 64, 128, 256 (batch sizes larger than 32 were trained using gradient accumulation)

In all configurations, the model fell apart after just 1 epoch, just like the example above.

Given that the ViT-L of MobileSAM is distilled from ViT-H while keeping the mask decoder and prompt encoder the same, I've tried another approach:

  1. Fine-tune the ViT-H mask decoder.
  2. Replace the ViT-L mask decoder with the fine-tuned one from ViT-H.

This approach worked for me, but the results are less than ideal: some masks are better, some are worse compared to the original ViT-L. Also, fine-tuning ViT-H takes a lot of computing resources, so I would like to avoid that.

Any suggestions on this matter would be much appreciated.
TIA
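For what it's worth, a minimal sketch of the frozen-encoder setup described above (assuming sam is a loaded vit_t model as in the earlier snippets; this is not the authors' training code):

    import torch

    # Freeze the image encoder and prompt encoder; train only the mask decoder.
    for p in sam.image_encoder.parameters():
        p.requires_grad = False
    for p in sam.prompt_encoder.parameters():
        p.requires_grad = False

    optimizer = torch.optim.AdamW(
        (p for p in sam.mask_decoder.parameters() if p.requires_grad), lr=1e-5
    )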
