
chaoningzhang / mobilesam

4.4K 4.4K 466.0 93.54 MB

This is the official code for MobileSAM project that makes SAM lightweight for mobile applications and beyond!

License: Apache License 2.0

Shell 0.02% Python 6.99% Jupyter Notebook 92.99%

mobilesam's People

Contributors

chaoningzhang, dhkim2810, dongshenhan, killian31, ksugar, qiaoyu1002


mobilesam's Issues

convert the model to onnx

When I convert the model to ONNX, I get the following error:
Exporting onnx model to mobile_sam_onnx_opset11.onnx...
Traceback (most recent call last):
File "/mnt/e/ncnn_project/MobileSAM/scripts/export_onnx_model.py", line 179, in
run_export(
File "/mnt/e/ncnn_project/MobileSAM/scripts/export_onnx_model.py", line 150, in run_export
torch.onnx.export(
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/init.py", line 275, in export
return utils.export(model, args, f, export_params, verbose, training,
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 88, in export
_export(model, args, f, export_params, verbose, training, input_names, output_names,
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 689, in _export
_model_to_graph(model, args, verbose, input_names,
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 463, in _model_to_graph
graph = _optimize_graph(graph, operator_export_type,
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 200, in _optimize_graph
graph = torch._C._jit_pass_onnx(graph, operator_export_type)
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/init.py", line 313, in _run_symbolic_function
return utils._run_symbolic_function(*args, **kwargs)
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/utils.py", line 994, in _run_symbolic_function
return symbolic_fn(g, *inputs, **attrs)
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/symbolic_opset11.py", line 922, in repeat_interleave
return torch.onnx.symbolic_opset9.repeat_interleave(g, self, repeats, final_dim)
File "/home/gtf/miniconda3/envs/yolov8/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py", line 2064, in repeat_interleave
for idx, r_split in enumerate(r_splits):
TypeError: 'torch._C.Value' object is not iterable
(Occurred when translating repeat_interleave).

Is there a Light-Weight Decoder Too?

So I cloned the repo and worked through the notebook. Do I understand correctly that there is a lightweight image encoder based on TinyViT, but you still need to use the full-size SAM decoder? I had to download the sam_vit_h_4b8939.pth file, which is 2.4 GB.

Am I using this wrong? Or are there smaller decoder models?
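For reference, the maintainer's reply further down states that MobileSAM only replaces the image encoder and keeps the original (already small) SAM mask decoder, so the mobile_sam.pt checkpoint should be all that is needed for inference. A minimal sketch for checking this yourself, assuming the packaged registry and the standard SAM sub-module names (image_encoder, prompt_encoder, mask_decoder):

    # Sketch only: count parameters per sub-module of the loaded vit_t model.
    from mobile_sam import sam_model_registry

    sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")

    def count_params(module):
        return sum(p.numel() for p in module.parameters())

    print("image encoder :", count_params(sam.image_encoder))   # TinyViT encoder
    print("prompt encoder:", count_params(sam.prompt_encoder))
    print("mask decoder  :", count_params(sam.mask_decoder))    # reused SAM mask decoder (small)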

Changing segment

Can we replace a particular masked segment with another image of the same shape as the masked region?
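This is not a MobileSAM feature as such, but if the replacement image has the same spatial size, a plain NumPy composite over the predicted mask does the job. A minimal sketch with hypothetical variable names (mask as returned by predictor.predict, image and replacement as HxWx3 arrays):

    import numpy as np

    def replace_segment(image, replacement, mask):
        # mask: (H, W) boolean array for the segment; image/replacement: (H, W, 3) arrays
        out = image.copy()
        out[mask] = replacement[mask]  # overwrite only the masked pixels
        return out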

timm required

I had to pip install timm to use mobile_sam, but it is not mentioned anywhere that this is required.

More training data?

Great work! I wonder: if we used more training data (around 10% or more) to train MobileSAM, would we get a more powerful model?

Unable to replicate SAM results

Hello,

Thank you for your work. I've tried to use your model in place of SAM in a visualisation pipeline that I've set up. The problem is that I'm unable to obtain results as good as with the original SAM, even compared to the ViT-B version. In my tests with AutoMask, your model is unable to separate objects the way SAM can. Is the model provided in weights your trained model, and if so, do you have any clue about what could be wrong? (The preprocessing is fine, as my code works with the original SAM.)

Thank you in advance!

The image embedding inference time is long

Below is my test code. I measure the running time of predictor.set_image(image); on my laptop (RTX 3060) it takes about 50 ms on the GPU.

import time

import cv2
import torch
from mobile_sam import sam_model_registry, SamPredictor

sam_checkpoint = "./weights/mobile_sam.pt"
model_type = "vit_t"
device = "cuda" if torch.cuda.is_available() else "cpu"

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
sam.eval()
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)  # placeholder path; any RGB test image

for i in range(10):
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()

    predictor.set_image(image)  # runs the image encoder

    if device == "cuda":
        torch.cuda.synchronize()
    print("time cost = ", (time.time() - start) * 1000, "ms")

Added MobileSAM support to Track-Anything

Hello, I added MobileSAM support to Track Anything, which allows using SAM + XMem for video object tracking and segmentation.

While it does not speed up inference much during tracking (as XMem does most of the work), MobileSAM allows for a snappier Gradio interface, making the process of adding masks and selecting tracked objects faster.

Thanks for your work

RKNN support

Would you like to add support for RKNN, so the model can run on the RK3588?

Some confusion about analysis and experiment in the paper

As I was reading the paper I had some confusion: your paper mentions that FastSAM needs at least two prompt points and on that basis compares the performance of FastSAM and MobileSAM in segment anything mode.

But as you can see from the Hugging Face demo posted by FastSAM, their approach supports a single point, and it has worked well in my own attempts. If this is indeed a writing error, I very much look forward to your revision and to new experiments that effectively compare the two methods.

BTW, I am also curious how Table 7 in the paper was produced exactly, and why model performance can be demonstrated by varying the distance between positive and negative prompt points.

Long inference time for single prompt (2s for encoding)

Here is my code, which just adds timing to the demo code. The encoding step seems far more time-consuming than expected.

from mobile_sam import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
import torch
import cv2
import numpy as np
import time

model_type = "vit_t"
sam_checkpoint = "./weights/mobile_sam.pt"

device = "cuda:1" if torch.cuda.is_available() else "cpu"
print(device)

mobile_sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
mobile_sam.to(device=device)
mobile_sam.eval()

predictor = SamPredictor(mobile_sam)

image = cv2.imread('./0000_color.png')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
print(image.shape)
box = np.array([100,100,400,600])
time_s = time.time()
predictor.set_image(image=image)
time_e1 = time.time()
masks, _, _ = predictor.predict(box=box)
time_e2 = time.time()
print('encoding time:',time_e1-time_s)
print('decoding time:',time_e2-time_e1)

When using a GPU (RTX 3090), the output is:
cuda:1
(480, 640, 3)
encoding time: 2.325988531112671
decoding time: 0.018665313720703125

When using cpu, the output is:
cpu
(480, 640, 3)
encoding time: 0.8602027893066406
decoding time: 0.08456754684448242

The CPU is even faster than the GPU on encoding.
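A likely explanation is that the single timed GPU call also pays for one-time CUDA initialization and kernel selection, since there is no warm-up run and no torch.cuda.synchronize() around the timed call. A minimal timing sketch under that assumption, reusing the predictor, image and device variables from the snippet above:

    import time
    import torch

    predictor.set_image(image)  # warm-up: CUDA context / kernel setup is not timed

    for _ in range(5):
        if device.startswith("cuda"):
            torch.cuda.synchronize()
        start = time.time()
        predictor.set_image(image)  # image encoder forward pass
        if device.startswith("cuda"):
            torch.cuda.synchronize()
        print("encoding time:", (time.time() - start) * 1000, "ms")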

Core ML support

Hey guys, great work. Would you consider adding support for Core ML so the model can run on iOS?

Apple MPS support?

Hello,

Does anyone know how to use Apple's "MPS" backend instead of the CPU/CUDA device?
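Not an official answer, but on PyTorch builds with Metal support (1.12+) you can usually select the mps device the same way as cuda; a minimal sketch (full operator coverage for MobileSAM on MPS is an assumption here, not something the repo guarantees):

    import torch
    from mobile_sam import sam_model_registry, SamPredictor

    # Prefer Apple's Metal backend when available, otherwise fall back to CPU.
    device = "mps" if torch.backends.mps.is_available() else "cpu"

    sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
    sam.to(device=device)
    sam.eval()
    predictor = SamPredictor(sam)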

training code

Hi! Can you release the training code? Many thanks!

Training code

Thanks for your work. Does the training code mean that I could use my labeled segmentation data containing one class to train a model that segments only that one class?

How to prove that MobileSAM's performance is comparable to the original SAM?

Thanks a lot for your work! When you mention that MobileSAM and SAM have comparable performance, do you mean SAM (ViT-H)? I tried to find evidence for this claim in the paper, but only found a few visualization results. I would really like to see a quantitative comparison between MobileSAM and SAM on some tasks (such as the multiple experiments in the original SAM paper).

Faster Segment Anything (MobileSAM) taking 20 seconds and erroring out

Hi,

I am experiencing an issue with Faster Segment Anything (MobileSAM): it sometimes takes 20 seconds to run and errors out each time. I have also tried placing over 10 points on an image, but it still takes a long time and does not work.

Here are the steps I am taking to reproduce the issue:

  1. I open https://huggingface.co/spaces/dhkim2810/MobileSAM
  2. I open an image in MobileSAM, using both my own images and the provided examples.
  3. I select a point with Add Mask.
  4. I place points on the image.
  5. I click the "Start segmenting" button.

The issue occurs when the "Start segmenting" button is clicked. The process sometimes takes 8, 10, or 20 seconds, and each time it errors out with the following message: Error.

I have tried the following to fix the issue:

  • Using different images, but the issue still occurs.

I am running MobileSAM on an HP Spectre x360.

I would appreciate it if you could look into this issue and let me know if there is anything I can do to fix it.

Thanks,
Patrici Bal

segmentation for whole image is slow

Thanks for the excellent work.

Whole-image segmentation is much slower than FastSAM; is this because of different postprocessing? Thanks.

Generating all masks takes 20 s, which is too long

Thanks for your interest in our work. Note that MobileSAM makes the image encoder lightweight without changing the decoder (roughly 8 ms for the encoder and 4 ms for the decoder). We mainly target anything mode (one image-encoder pass and one decoder pass) rather than everything mode (one image-encoder pass and 32x32 decoder passes); see the paper for the difference in definitions (anything mode is the foundation task, while everything mode is just a downstream task, as indicated in the original SAM paper).

"Generating all masks" suggests that you are using everything mode. For everything mode, even though our encoder is much faster than that of the original SAM (roughly 8 ms vs 450 ms), it cannot save much time for the whole pipeline, since most of the time is spent on the 32x32 decoder passes. One way to mitigate this is to use a smaller number of grid points (such as 10x10 or 5x5) so that the decoder consumes less time, since many redundant masks are generated with a 32x32 grid.

I hope this addresses your issue; otherwise, please let us know. We are also currently trying to make the mask decoder more lightweight by distilling it into a smaller one, as we did for the image encoder. Stay tuned for our progress. If you have more issues, please let us know; we might not be able to respond in a timely manner, but we will try our best.
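A minimal sketch of the grid-size suggestion above, assuming the automatic mask generator follows the original SAM interface and accepts a points_per_side argument:

    from mobile_sam import sam_model_registry, SamAutomaticMaskGenerator

    sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
    sam.to(device="cuda")
    sam.eval()

    # Fewer grid points -> fewer decoder passes (10x10 = 100 instead of 32x32 = 1024),
    # which is where most of the everything-mode time goes.
    mask_generator = SamAutomaticMaskGenerator(sam, points_per_side=10)
    masks = mask_generator.generate(image)  # image: HxWx3 uint8 RGB array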

import onnx error

asttokens 2.2.1
backcall 0.2.0
certifi 2022.12.7
charset-normalizer 3.1.0
coloredlogs 15.0.1
comm 0.1.3
contourpy 1.0.7
cycler 0.11.0
debugpy 1.6.7
decorator 5.1.1
executing 1.2.0
filelock 3.12.2
flatbuffers 23.5.26
fonttools 4.39.2
fsspec 2023.6.0
huggingface-hub 0.15.1
humanfriendly 10.0
idna 3.4
imgviz 1.2.6
importlib-metadata 6.7.0
importlib-resources 5.12.0
ipykernel 6.24.0
ipython 8.12.2
jedi 0.18.2
jupyter_client 8.3.0
jupyter_core 5.3.1
kiwisolver 1.4.4
mahotas 1.4.13
matplotlib 3.7.1
matplotlib-inline 0.1.6
mobile-sam 1.0 /mnt/sda/code/MobileSAM-master
mpmath 1.3.0
nest-asyncio 1.5.6
numpy 1.24.2
onnx 1.12.0
onnxruntime-gpu 1.13.1
opencv-python 4.7.0.72
opencv-python-headless 4.5.3.56
packaging 23.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 23.0.1
platformdirs 3.8.0
prompt-toolkit 3.0.38
protobuf 3.20.1
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
Pygments 2.15.1
pyparsing 3.0.9
PyQt5 5.15.7
PyQt5-Qt5 5.15.2
PyQt5-sip 12.12.1
python-dateutil 2.8.2
PyYAML 6.0
pyzmq 25.1.0
requests 2.31.0
safetensors 0.3.1
segment-anything 1.0 /mnt/sda/code/segment-anything-main
setuptools 65.6.3
six 1.16.0
stack-data 0.6.2
sympy 1.12
timm 0.9.2
torch 1.10.1+cu111
torchaudio 0.10.1+cu111
torchvision 0.11.2+cu111
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
typing_extensions 4.5.0
urllib3 2.0.3
wcwidth 0.2.6
wheel 0.38.4
zipp 3.15.0

(cam) dongzf@dongzf-LEGION-REN7000P-26AMR:/mnt/sda/code/MobileSAM-master$ python scripts/export_onnx_model.py --checkpoint ./weights/mobile_sam.pt --model-type vit_t --output ./mobile_sam.onnx
Loading model...
Exporting onnx model to ./mobile_sam.onnx...
Traceback (most recent call last):
File "scripts/export_onnx_model.py", line 176, in
run_export(
File "scripts/export_onnx_model.py", line 148, in run_export
torch.onnx.export(
File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/init.py", line 316, in export
return utils.export(model, args, f, export_params, verbose, training,
File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/utils.py", line 107, in export
_export(model, args, f, export_params, verbose, training, input_names, output_names,
File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/utils.py", line 707, in _export
_set_opset_version(opset_version)
File "/home/dongzf/miniconda3/envs/cam/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 849, in _set_opset_version
raise ValueError("Unsupported ONNX opset version: " + str(opset_version))
ValueError: Unsupported ONNX opset version: 16

add little code for onnx export and quick demo

Hello, add these lines of code to build_sam.py to get the same behavior as SAM, like this:

def build_sam_mobile(checkpoint=None):
    from mobile_encoder.setup_mobile_sam import setup_model
    mobile_sam = setup_model()
    if checkpoint is not None:
        mobile_sam.load_state_dict(
            torch.load(checkpoint), strict=True)
    return mobile_sam


sam_model_registry = {
    "default": build_sam_vit_h,
    "vit_h": build_sam_vit_h,
    "vit_l": build_sam_vit_l,
    "vit_b": build_sam_vit_b,
    'mobile': build_sam_mobile
}

Then you can export the ONNX model and run the quick demo just like SAM:

python scripts/amg.py --checkpoint <path/to/checkpoint> --model-type mobile --input <image_or_folder> --output <path/to/output>

python scripts/export_onnx_model.py --checkpoint <path/to/checkpoint> --model-type mobile --output <path/to/output>

Text prompt example

Thanks for the great repo. (Sorry if this is an obvious question.) Is there support for text prompts for segmentation?

Question about the segment_anything part in this code

Hello, thank you for your great work!
I notice that the segment_anything part of the code has the same structure as the original SAM.
Is there any modification to this part?
And if I want to integrate MobileSAM into a project that already uses SAM, do I just need to add the mobile_encoder to the original project?
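For reference, a hedged sketch of what a drop-in swap could look like, assuming the packaged code really does mirror the segment_anything interface as described above:

    # from segment_anything import sam_model_registry, SamPredictor   # original SAM
    from mobile_sam import sam_model_registry, SamPredictor           # MobileSAM drop-in

    sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
    predictor = SamPredictor(sam.to("cuda").eval())
    # predictor.set_image(...) and predictor.predict(...) are then used exactly as before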

About a problem with half precision

Just like vit_h, MobileSAM doesn't work well when using half precision, but vit_b and vit_l work normally. Have you ever thought about the reason for this phenomenon? I would be grateful if you could reply.

Other TinyViT models

Hi! Your paper and the work you've done are amazing. I was wondering whether you tried training any of the other TinyViT models (such as one of the larger ones) using your pipeline? If so, would you mind sharing those weight files?

Mask decoder finetuning

Hello and thank you for open sourcing your work. It's truly an amazing engineering feat.
I've tried to fine-tune the mask decoder using box prompts, with this repository as a base. It works with ViT-B, ViT-L and ViT-H on my datasets. In all the procedures below, the image encoder and prompt encoder layers are always frozen.

However, when I try to fine-tune the mask decoder of ViT-T directly, the model seems to fall apart.

Below is the result after 1 epoch vs. the original. Further epochs only shrink the masks further, without any visible improvement.

Result after 1 epoch of fine-tuning (ViT-T): [image]

Original result (ViT-T): [image]

An exhaustive grid search was performed over 2 hyperparameters:

  • Learning rate: 1e-5, 1e-4, 8e-4, 1e-3, 1e-2
  • Batch size: 2, 4, 8, 16, 32, 64, 128, 256 (batch sizes larger than 32 were trained using gradient accumulation)

In all configurations, the model fell apart after just 1 epoch, just like the example above.

Given that the ViT-L of MobileSAM is distilled from ViT-H while keeping the mask decoder and prompt encoder the same, I've tried another approach:

  1. Fine-tune the ViT-H mask decoder.
  2. Replace the ViT-L mask decoder with the fine-tuned one from ViT-H.

This approach worked for me, but the results are less than ideal: some masks are better, some are worse compared to the original ViT-L. Also, fine-tuning ViT-H takes a lot of computing resources, so I would like to avoid that.

Any suggestions on this matter would be much appreciated.
TIA
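For what it's worth, a minimal sketch of the frozen-encoder setup described above (assuming sam is a loaded vit_t model as in the earlier snippets; this is not the authors' training code):

    import torch

    # Freeze the image encoder and prompt encoder; train only the mask decoder.
    for p in sam.image_encoder.parameters():
        p.requires_grad = False
    for p in sam.prompt_encoder.parameters():
        p.requires_grad = False

    optimizer = torch.optim.AdamW(
        (p for p in sam.mask_decoder.parameters() if p.requires_grad), lr=1e-5
    )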
