kadirnar / segment-anything-video

MetaSeg: Packaged version of the Segment Anything repository

License: Apache License 2.0

Python 100.00%

object-detection segmentation segment-anything object-segmentation yolov5 yolov6 yolov7 yolov8

segment-anything-video's Introduction

MetaSeg: Packaged version of the Segment Anything repository

This repo is a packaged version of the segment-anything model.

Installation

pip install metaseg

Usage

from metaseg import SegAutoMaskPredictor, SegManualMaskPredictor

# If GPU memory is insufficient, reduce points_per_side and points_per_batch.

# For image
results = SegAutoMaskPredictor().image_predict(
    source="image.jpg",
    model_type="vit_l", # vit_l, vit_h, vit_b
    points_per_side=16,
    points_per_batch=64,
    min_area=0,
    output_path="output.jpg",
    show=True,
    save=False,
)

# For video
results = SegAutoMaskPredictor().video_predict(
    source="video.mp4",
    model_type="vit_l", # vit_l, vit_h, vit_b
    points_per_side=16,
    points_per_batch=64,
    min_area=1000,
    output_path="output.mp4",
)

# For manual box and point selection

# For image
results = SegManualMaskPredictor().image_predict(
    source="image.jpg",
    model_type="vit_l", # vit_l, vit_h, vit_b
    input_point=[[100, 100], [200, 200]],
    input_label=[0, 1],
    input_box=[100, 100, 200, 200], # or [[100, 100, 200, 200], [100, 100, 200, 200]]
    multimask_output=False,
    random_color=False,
    show=True,
    save=False,
)

# For video

results = SegManualMaskPredictor().video_predict(
    source="video.mp4",
    model_type="vit_l", # vit_l, vit_h, vit_b
    input_point=[0, 0, 100, 100],
    input_label=[0, 1],
    input_box=None,
    multimask_output=False,
    random_color=False,
    output_path="output.mp4",
)
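
The memory hint in the usage example can be made concrete with a little arithmetic. In the upstream Segment Anything automatic mask generator, points_per_side=n samples an n x n grid of point prompts per image, and points_per_batch is the number of those prompts run through the model at once. The sketch below only does that arithmetic; the helper name prompt_grid_cost is hypothetical and is not part of metaseg.

```python
import math


def prompt_grid_cost(points_per_side: int, points_per_batch: int) -> dict:
    """Estimate the prompt workload for one image (hypothetical helper, not a metaseg API)."""
    total_points = points_per_side ** 2                     # n x n grid of point prompts
    batches = math.ceil(total_points / points_per_batch)    # number of batched passes
    return {"total_points": total_points, "batches": batches}


for side, batch in [(32, 64), (16, 64), (16, 32)]:
    print(side, batch, prompt_grid_cost(side, batch))
```

Lowering points_per_batch should reduce the work done per pass (and with it peak memory) without changing how many prompts are evaluated, while lowering points_per_side reduces the number of prompts and may cause small objects to be missed.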

SAHI + Segment Anything

pip install sahi metaseg

from metaseg.sahi_predict import SahiAutoSegmentation, sahi_sliced_predict

image_path = "image.jpg"
boxes = sahi_sliced_predict(
    image_path=image_path,
    detection_model_type="yolov5",  # yolov8, detectron2, mmdetection, torchvision
    detection_model_path="yolov5l6.pt",
    conf_th=0.25,
    image_size=1280,
    slice_height=256,
    slice_width=256,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

SahiAutoSegmentation().image_predict(
    source=image_path,
    model_type="vit_b",
    input_box=boxes,
    multimask_output=False,
    random_color=False,
    show=True,
    save=False,
)

FalAI(Cloud GPU) + Segment Anything

pip install metaseg fal_serverless
fal-serverless auth login

# For Auto Mask
from metaseg import falai_automask_image

image = falai_automask_image(
    image_path="image.jpg",
    model_type="vit_b",
    points_per_side=16,
    points_per_batch=32,
    min_area=0,
)
image.show() # Show image
image.save("output.jpg") # Save image

# For Manual Mask
from metaseg import falai_manuelmask_image

image = falai_manuelmask_image(
    image_path="image.jpg",
    model_type="vit_b",
    input_point=[[100, 100], [200, 200]],
    input_label=[0, 1],
    input_box=[100, 100, 200, 200], # or [[100, 100, 200, 200], [100, 100, 200, 200]],
    multimask_output=False,
    random_color=False,
)
image.show() # Show image
image.save("output.jpg") # Save image

Extra Features

Support for YOLOv5/YOLOv8, Detectron2, MMDetection, and Torchvision models
Support for video and a web application (Hugging Face Spaces)
Support for manual single and multiple box and point selection
Support for pip installation
Support for SAHI library
Support for FalAI

segment-anything-video's Issues

AttributeError: 'list' object has no attribute 'astype'

results = SegManualMaskPredictor().image_predict(
    source="C:\software\sam_checkpoint\3515.jpg",
    model_type="vit_h", # vit_l, vit_h, vit_b
    input_point=[[548, 1031], [1121, 769]],
    input_label=[0, 1],
    input_box=[229, 684, 800, 800], # or [[100, 100, 200, 200], [100, 100, 200, 200]]
    multimask_output=False,
    random_color=False,
    # output_path="C:\software\sam_checkpoint\output2.jpg",
    show=False,
    save=True,
)

======================
AttributeError Traceback (most recent call last)
Cell In[43], line 1
----> 1 results = SegManualMaskPredictor().image_predict(
2 source="C:\software\sam_checkpoint\3515.jpg",
3 model_type="vit_h", # vit_l, vit_h, vit_b
4 input_point=[[548, 1031], [1121, 769]],
5 input_label=[0, 1],
6 input_box=[229, 684, 800, 800], # or [[100, 100, 200, 200], [100, 100, 200, 200]]
7 multimask_output=False,
8 random_color=False,
9 #output_path="C:\software\sam_checkpoint\output2.jpg",
10 show=False,
11 save=True,
12 )

File ~\anaconda3\lib\site-packages\metaseg\mask_predictor.py:175, in SegManualMaskPredictor.image_predict(self, source, model_type, input_box, input_point, input_label, multimask_output, output_path, random_color, show, save)
172 elif type(input_box[0]) == int:
173 input_boxes = np.array(input_box)[None, :]
--> 175 masks, _, _ = predictor.predict(
176 point_coords=input_point,
177 point_labels=input_label,
178 box=input_boxes,
179 multimask_output=multimask_output,
180 )
181 mask_image = load_mask(masks, random_color)
182 image = load_box(input_box, image)

File ~\anaconda3\lib\site-packages\metaseg\generator\predictor.py:139, in SamPredictor.predict(self, point_coords, point_labels, box, mask_input, multimask_output, return_logits)
137 if point_coords is not None:
138 assert point_labels is not None, "point_labels must be supplied if point_coords is supplied."
--> 139 point_coords = self.transform.apply_coords(point_coords, self.original_size)
140 coords_torch = torch.as_tensor(point_coords, dtype=torch.float, device=self.device)
141 labels_torch = torch.as_tensor(point_labels, dtype=torch.int, device=self.device)

File ~\anaconda3\lib\site-packages\metaseg\utils\transforms.py:40, in ResizeLongestSide.apply_coords(self, coords, original_size)
38 old_h, old_w = original_size
39 new_h, new_w = self.get_preprocess_shape(original_size[0], original_size[1], self.target_length)
---> 40 coords = deepcopy(coords).astype(float)
41 coords[..., 0] = coords[..., 0] * (new_w / old_w)
42 coords[..., 1] = coords[..., 1] * (new_h / old_h)

AttributeError: 'list' object has no attribute 'astype'
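
The traceback shows the failing call: apply_coords runs deepcopy(coords).astype(float), which only works when coords is a NumPy array, while the example passes plain Python lists. A possible workaround, assuming the installed version does not convert the inputs itself, is to convert the prompts before calling image_predict. The helper name to_prompt_arrays is hypothetical and not part of metaseg.

```python
import numpy as np


def to_prompt_arrays(points, labels):
    """Convert list prompts to NumPy arrays (hypothetical helper, not a metaseg API)."""
    coords = np.asarray(points, dtype=float)   # expected shape (N, 2), one (x, y) per point
    labs = np.asarray(labels, dtype=int)       # expected shape (N,), 1 = foreground, 0 = background
    if coords.ndim != 2 or coords.shape[1] != 2:
        raise ValueError("points must be a list of [x, y] pairs")
    if labs.shape != (coords.shape[0],):
        raise ValueError("labels must contain one entry per point")
    return coords, labs


coords, labs = to_prompt_arrays([[548, 1031], [1121, 769]], [0, 1])
print(coords.shape, labs.shape)
```

The resulting arrays would then be passed as input_point=coords and input_label=labs. Whether other metaseg releases perform this conversion internally is not confirmed by the report.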

segmented result json

Great job on the project. Is there a way to get each segmented region as JSON with its x, y position, together with its own PNG file?
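
One way to approach this, sketched under assumptions: the upstream Segment Anything automatic mask generator returns one record per mask with a binary segmentation array and an XYWH bbox, but whether metaseg's image_predict exposes the same records is not confirmed here. The example below uses a small nested-list mask in place of real model output and a hypothetical helper, mask_to_record, to derive the position and serialize it as JSON.

```python
import json


def mask_to_record(mask, mask_id):
    """Summarize a binary mask (rows of 0/1) as a JSON-friendly dict.

    Hypothetical helper; `mask` stands in for one segmentation mask.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return {"id": mask_id, "bbox": None, "area": 0}
    x0, y0 = min(xs), min(ys)
    return {
        "id": mask_id,
        "bbox": [x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1],  # x, y, width, height
        "area": sum(1 for row in mask for v in row if v),      # number of mask pixels
    }


toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
]
record = mask_to_record(toy_mask, 0)
print(json.dumps(record))
```

With real masks, each region could then be cropped to its bbox and written to its own PNG, for example with cv2.imwrite.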

How about using FastSAM as Instance Segmenter Model?

How about using FastSAM as Instance Segmenter Model?
https://github.com/CASIA-IVA-Lab/FastSAM

Thanks in advance.

Originally posted by @DiamondGlassDrill in #68

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder, so it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

Best Wishes,

Qiao

Great work, Does it download the model checkpoints as well?

I've not tried it yet, just wanted to check if it also downloads the model checkpoints.

Input: OpenCV images

Hello,

It seems the auto-detection example takes a filename. Can it accept an in-memory image (such as an OpenCV array) as well?

Thanks

I have this problem: IndexError: list index out of range.

[]
vit_b model already exists as 'vit_b.pth'. Skipping download.
Traceback (most recent call last):
  File "e:/AIGC/segment-anything-video/test", line 17, in <module>
    SahiAutoSegmentation().predict(
  File "e:\AIGC\segment-anything-video\metaseg\sahi_predict.py", line 87, in predict
    if type(input_box[0]) == list:
IndexError: list index out of range

How should I deal with it?

def predict(
    self,
    source,
    model_type,
    input_box=None,
    input_point=None,
    input_label=None,
    multimask_output=False,
    random_color=False,
    show=False,
    save=False,
):

    read_image = load_image(source)
    model = self.load_model(model_type)
    predictor = SamPredictor(model)
    predictor.set_image(read_image)

    # This is the line that raises the error:
    if type(input_box[0]) == list:
        input_boxes, new_boxes = multi_boxes(input_box, predictor, read_image)

        masks, _, _ = predictor.predict_torch(
            point_coords=None,
            point_labels=None,
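
The log starts with [], which suggests the detector returned no boxes, so input_box[0] fails on an empty list. A defensive check before indexing avoids the crash. The sketch below uses a hypothetical helper, box_mode, and is not the library's actual fix.

```python
def box_mode(input_box):
    """Classify a box prompt without indexing into an empty list.

    Hypothetical helper: returns "none", "single" or "multi".
    """
    if not input_box:                               # None or [] -> nothing was detected
        return "none"
    if isinstance(input_box[0], (list, tuple)):
        return "multi"                              # [[x1, y1, x2, y2], ...]
    return "single"                                 # [x1, y1, x2, y2]


for boxes in ([], [100, 100, 200, 200], [[100, 100, 200, 200], [10, 10, 50, 50]]):
    print(boxes, "->", box_mode(boxes))
```

A caller could skip the SAM step (or return the unmodified image) when the mode is "none" instead of raising.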

installed metaseg but it doesn't download models

I tried to use metaseg in an image and video project but couldn't get anywhere, because the constructor does not accept a model_type argument: self.segmentor = SegAutoMaskPredictor(model_type="vit_l", points_per_side=16, points_per_batch=64)
TypeError: __init__() got an unexpected keyword argument 'model_type'

Code:
import cv2
from PyQt6 import QtCore, QtGui, QtWidgets
from led_grid import LedGrid
from metaseg import SegAutoMaskPredictor

class HandTracker(QtCore.QObject):
    def __init__(self, grid):
        super().__init__()
        self.grid = grid
        self.cap = cv2.VideoCapture(0)
        self.segmentor = SegAutoMaskPredictor(model_type="vit_l", points_per_side=16, points_per_batch=64)

    def load_video(self, filepath):
        self.cap = cv2.VideoCapture(filepath)

    def step(self):
        success, image = self.cap.read()
        if not success:
            print("Ignoring empty camera frame.")
            return

        image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
        results = self.segmentor.image_predict(
            source=image,
            min_area=1000,
            show=False,
            save=False,
        )
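
Judging by the README usage, SegAutoMaskPredictor is constructed without arguments, and model_type, points_per_side and points_per_batch are passed to image_predict instead, which would explain the TypeError. A minimal sketch under that assumption follows. The wrapper name segment_file is hypothetical, and passing a file path as source follows the README; whether an in-memory frame is accepted is not confirmed.

```python
def segment_file(image_path, model_type="vit_l"):
    """Run automatic segmentation on an image file (hypothetical wrapper)."""
    # Imported lazily so this module can be loaded without metaseg installed.
    from metaseg import SegAutoMaskPredictor

    predictor = SegAutoMaskPredictor()      # no keyword arguments here
    return predictor.image_predict(
        source=image_path,
        model_type=model_type,              # vit_l, vit_h, vit_b
        points_per_side=16,
        points_per_batch=64,
        min_area=1000,
        output_path="output.jpg",
        show=False,
        save=False,
    )
```

In the class above, self.segmentor would then be created as SegAutoMaskPredictor() and the model settings moved into the image_predict call in step().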

Add gradio library.

Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory

(base) root@185:~/Track-Anything# python app.py --device cuda:0 --sam_model_type vit_h --port 80
/root/anaconda3/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
warn(f"Failed to load image Python extension: {e}")
Initializing BaseSegmenter to cuda:0

sahi yolov8 segmentation

Hello all, I need help running the code; I don't know which code I should run.

Issue with download_model() Function Not Completing

Hello,

I've encountered an issue with the download_model() function in your software. It seems to be failing and not progressing as expected. Below are the details of the problem:

Function in Question: download_model()
Issue: The function does not successfully complete its operation.
Behavior Observed: The process halts for an extended period without any progress. There is no change in the tqdm progress bar either.
Error Messages and Logs:

Segmenting input.mp4
OpenCV: FFMPEG: tag 0x44495658/'XVID' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
  0%|                                                                                        | 0/742 [00:00<?, ?it/s]
Downloading vit_b.pth model

The process then seems to halt at the initial stage of downloading the 'vit_b.pth' model with no progress indicated in the tqdm bar.

I would appreciate any guidance or fix for this issue. If there are any further details or logs that can assist in resolving this problem, please let me know, and I'll provide them.

Thank you for your time and assistance.

I found the solution, but a new problem has emerged.

I found the solution, but a new problem has emerged.

What I want to do is to segment a video and label each class. My first idea is to assign different class labels to different mask_image colors (you can see what I did for this below). However, I noticed that the output mask video changes the colors between different frames, making it difficult for me to track the labels (such as cookie/person and so on). I checked your code and found that you did the same thing to the video as the images. So, it is not surprising to get such a result.

Therefore, I wonder if you could share some of your ideas regarding this. Thanks!

What I did (In sam_predictor.py line 139):
'''
combined_mask = mask_image # combined_mask = cv2.add(frame, mask_image)
out.write(combined_mask)
'''

Originally posted by @CRH400AF-A in #91 (comment)

Add foreground, background points

It would be highly useful to fully utilize the model's capabilities

Does this repo support tracking in video?

ONNX support / Segmentation output

Thank you for this great wrapper!

I was wondering if there was support for ONNX models for faster inference.
Also, is it possible to export each layer individually, as in the SAM demo?

Finally, I tried camera streaming (setting source=0), but with no success so far!

Does it support specific object segmentation?

Hello! Thanks for sharing your repo. I would like to know whether it could be used for video object segmentation in the semi-supervised setting.

Can metaseg input a video and output the class label?

Thanks for your great work!

I have a specific requirement for my project and I'm wondering if metaseg can cater to it. I need to input an image with dimensions H*W*3 (height * width * 3 channels) and obtain an "image" output with class labels in the form of H*W*1 (height * width * 1 channel). The "1" here means that each pixel stores which class it belongs to, so that pixels of different classes can be told apart; it does not need to be an exact semantic label.

Before I proceed, I'd like to confirm if metaseg has the capability to handle such a task. Your response would be highly valuable to me. Thank you for your time, and I'm looking forward to hearing from you.
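
A sketch of how per-object masks could be merged into a single-channel label map of the kind described above. It assumes a list of binary masks of equal size is available from the segmentation step. The helper masks_to_label_map is hypothetical, uses nested lists to stay dependency-free, and produces arbitrary region indices rather than semantic classes.

```python
def masks_to_label_map(masks, height, width):
    """Merge binary masks into an H x W map of region ids (0 = unassigned).

    Hypothetical helper. Later masks overwrite earlier ones where they overlap.
    """
    label_map = [[0] * width for _ in range(height)]
    for region_id, mask in enumerate(masks, start=1):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    label_map[y][x] = region_id
    return label_map


mask_a = [[1, 1, 0], [0, 0, 0]]
mask_b = [[0, 1, 1], [0, 0, 1]]
label_map = masks_to_label_map([mask_a, mask_b], height=2, width=3)
for row in label_map:
    print(row)
```

With NumPy boolean masks the inner loops are usually replaced by an indexed assignment such as label_map[mask] = region_id.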

Add the video support

Add manual mask feature

Add the edit-anything feature support

ImportError: cannot import name 'SamAutomaticMaskGenerator' from partially initialized module 'metaseg' (on Google Colab)

Hello,

I seem to be getting the following error when running the following import on Google Colab:

from metaseg import SegAutoMaskGenerator

Error Output:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-6-5cdada3bd05d> in <cell line: 1>()
----> 1 from metaseg import SegAutoMaskGenerator

1 frames
/usr/local/lib/python3.9/dist-packages/metaseg/auto_mask_demo.py in <module>
      3 import torch
      4 
----> 5 from metaseg import SamAutomaticMaskGenerator, sam_model_registry
      6 from metaseg.utils import download_model, load_image, load_video
      7 

ImportError: cannot import name 'SamAutomaticMaskGenerator' from partially initialized module 'metaseg' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/metaseg/__init__.py)

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

I have restarted the runtime and tried again but I get the same issue.

Example how to use with live video stream

An example showing how it can be used on a live capture, for instance from a USB camera, would be nice.
BG, Thomas

AttributeError: module 'cv2' has no attribute 'write'

I was trying out this package - amazing work so quickly after the release! - and I'm getting:

/usr/local/lib/python3.9/dist-packages/metaseg/mask_predictor.py in save_image(self, source, model_type, input_box, input_point, input_label, multimask_output, output_path)
    193 
    194         combined_mask = cv2.add(image, mask_image)
--> 195         cv2.write(output_path, combined_mask)
    196 
    197         return output_path

AttributeError: module 'cv2' has no attribute 'write'

Should that be: cv2.imwrite ?

Add the multi box feature

sahi_predict is used to predict what the SahiAutoSegmentation is. input_box in predict is []

When the boxes from sahi_sliced_predict are passed to SahiAutoSegmentation, input_box in predict is []. Why?

metaseg-0.7.3 and metaseg-0.5.8 issues

metaseg-0.5.8 : AttributeError: 'list' object has no attribute 'astype'
metaseg-0.7.3: ImportError: Please install FalAI library using 'pip install fal_serverless'.

Supporting Apple M1 ?

Hello,

Does anyone know how we can use device=mps on an Apple M1 chip in MetaSeg apps?

Thanks

how to get more objects like segment anything online demo?

The image has a high resolution: 5472x3648.
How can we achieve detection performance similar to the online demo, which finds numerous small targets, when SAM + YOLOv8 currently detects only a few?

Result from https://segment-anything.com/demo: a lot of objects are detected.

SAM + yolov8-seg:

from metaseg import SahiAutoSegmentation, sahi_sliced_predict

image_path = "pests.jpg"
boxes = sahi_sliced_predict(
    image_path=image_path,
    detection_model_type="yolov8", # yolov8, detectron2, mmdetection, torchvision
    detection_model_path="yolov8x-seg.pt",
    conf_th=0.25,
    image_size=1024,
    slice_height=256,
    slice_width=256,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

SahiAutoSegmentation().image_predict(
    source=image_path,
    model_type="vit_b",
    input_box=boxes,
    multimask_output=False,
    random_color=False,
    show=True,
    save=False,
)
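
For a sense of scale: with these settings the 5472x3648 image is cut into several hundred overlapping tiles, and an object has to be found by the detector inside a tile before SAM receives a box for it. The sketch below counts tiles for a plain sliding-window grid, assuming a stride of slice size minus the overlap in pixels. SAHI's own slicing may handle edges differently, and count_slices is a hypothetical helper.

```python
import math


def count_slices(length: int, slice_size: int, overlap_ratio: float) -> int:
    """Number of windows needed to cover `length` pixels along one axis."""
    if length <= slice_size:
        return 1
    stride = slice_size - int(slice_size * overlap_ratio)   # pixels between window starts
    return math.ceil((length - slice_size) / stride) + 1


cols = count_slices(5472, 256, 0.2)
rows = count_slices(3648, 256, 0.2)
print(cols, rows, cols * rows)
```

In this pipeline SAM only segments the boxes the detector returns, so the number of objects is bounded by the detector's recall rather than by SAM. The grid-point prompting used by SegAutoMaskPredictor appears to be the closer match to the demo's behavior.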

Add draw feature

SegAutoMaskPredictor producing random color

Hello, first of all, thank you for this awesome work!

I am following the instruction in the README for SegAutoMaskPredictor, specifically this one:

# For video
results = SegAutoMaskPredictor().video_predict(
    source="video.mp4",
    model_type="vit_l", # vit_l, vit_h, vit_b
    points_per_side=16,
    points_per_batch=64,
    min_area=1000,
    output_path="output.mp4",
)

on my private mp4 data. However, I note that although the segments seem perfect, they often change color between frames. For example, a chair was red in one frame but green in the next. Is there any way to enforce color consistency between frames? Any pointers would be appreciated!
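
If each frame is segmented independently, as the report suggests, mask order (and therefore color) is not tied to object identity. One common workaround, not something the package is reported to do, is to match each mask in the current frame to the previous frame by overlap and reuse the matched id when picking a color. The sketch below uses XYXY boxes and greedy IoU matching; all names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def assign_ids(prev, boxes, next_id, threshold=0.5):
    """Give each box the id of its best unused match in `prev`, else a new id.

    `prev` maps id -> box from the previous frame. Returns (id -> box, next_id).
    """
    current, used = {}, set()
    for box in boxes:
        candidates = [(iou(box, pbox), pid) for pid, pbox in prev.items() if pid not in used]
        best_iou, best_id = max(candidates, default=(0.0, None))
        if best_id is None or best_iou < threshold:
            best_id, next_id = next_id, next_id + 1   # unmatched: start a new track
        used.add(best_id)
        current[best_id] = box
    return current, next_id


frame1, nid = assign_ids({}, [(0, 0, 10, 10), (20, 20, 30, 30)], next_id=0)
frame2, nid = assign_ids(frame1, [(21, 20, 31, 30), (1, 0, 11, 10)], nid)
print(frame2)
```

A stable color then follows from the id, for example palette[id % len(palette)], regardless of the order in which masks are returned.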

ROADMAP of MetaSeg

Add the FalAI library

Add the colab notebook file

#52 (reply in thread)

How is this algorithm different from MIVOS/STCN?

How is this algorithm different from MIVOS/STCN?

Add video support for manual mask class

Add the SAHI Algorithm

ImportError with SegAutoMaskGenerator from metaseg package

Description:

I'm getting an ImportError when trying to import SegAutoMaskGenerator from the metaseg package. Here's the error message:

ImportError: cannot import name 'SamAutomaticMaskGenerator' from partially initialized module 'metaseg' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/metaseg/__init__.py)

Screenshot:

Steps to reproduce:

Install the metaseg package (pip install metaseg) on Google Colab.
Run the following code:
from metaseg import SegAutoMaskGenerator

Expected behavior:

The SegAutoMaskGenerator class should be successfully imported without any errors.

Actual behavior:

The ImportError is raised when trying to import SegAutoMaskGenerator.

Environment:

Python version: 3.9.16
Runtime: Google colab (with standard gpu)

SegAutoMaskPredictor().save_image returning error

Hello,

It seems something has changed in the code and .save_image isn't working anymore.

I have tried the following but there is no output image generated.

autoseg_image = SegAutoMaskPredictor().image_predict(
    source="smudge.png",
    model_type="vit_l",
    points_per_side=16, 
    points_per_batch=64,
    min_area=0,
    output_path='output.jpg'
)

I have also tried setting save=True and I'm getting the following error:

Could you help, please?

Huggingface Space Link doesn’t lead to the Demo

Hi @kadirnar,
I just came across your work and wanted to give metaseg a serious try :)
Unfortunately, the HF Spaces link provided on GitHub led me to a seemingly unrelated Space: https://huggingface.co/spaces/ArtGAN/Audio-WebUI
I would appreciate it if you could correct that.
Thanks and best,
Mike

use sahi to segment video

I have a problem when I run this code.
I don't know why I get ModuleNotFoundError: No module named 'yolov5'.
I installed the yolov5 module with the following command.

pip install ultralytics

Can someone give me some advice? Thank you very much!

from metaseg.sahi_predict import SahiAutoSegmentation, sahi_sliced_predict
import cv2

cap = cv2.VideoCapture('./test_data/red_girl.mp4')
fourcc = cv2.VideoWriter_fourcc(*'MP4V')  # video codec
fps = cap.get(cv2.CAP_PROP_FPS)  # frame rate
width, height = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))  # frame width and height
out = cv2.VideoWriter('./output/read_girl_shi.mp4', fourcc, fps, (width, height))  # output video writer
# Read the first frame
ret, frame = cap.read()
while ret:
    # Write the current frame to disk, since the predictors take a file path
    cv2.imwrite("./test_data/temp.jpg", frame)

    image_path = "./test_data/temp.jpg"
    boxes = sahi_sliced_predict(
        image_path=image_path,
        detection_model_type="yolov5", #yolov8, detectron2, mmdetection, torchvision
        detection_model_path="yolov5l6.pt",
        conf_th=0.25,
        image_size=1280,
        slice_height=256,
        slice_width=256,
        overlap_height_ratio=0.2,
        overlap_width_ratio=0.2,
    )
    SahiAutoSegmentation().predict(
        source=image_path,
        model_type="vit_b",
        input_box=boxes,
        multimask_output=False,
        random_color=False,
        show=True,
        save=True,
        output_path="./output/temp.jpg"
    )
    image = cv2.imread("./output/temp.jpg")
    # image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    out.write(image)

    # Read the next frame; without this the loop never advances or ends
    ret, frame = cap.read()

cap.release()
out.release()  # finalize the output video file

document little error

In your README.md, in the Usage section, there is a code error that shows up when I copy the demo and run it. You should add ',' after this line: "input_point=[0, 0, 100, 100]".

# For video

results = SegManualMaskPredictor().video_predict(
    source="test.mp4",
    model_type="vit_l", # vit_l, vit_h, vit_b
    input_point=[0, 0, 100, 100]
    input_label=[0, 1],
    input_box=None,
    multimask_output=False,
    random_color=False,
    output_path="output.mp4",
)

how to get semantic label

I want to know how to get the class label information corresponding to the segmentation mask area

I have this issue after updating the code.

Traceback (most recent call last):
  File "e:/AIGC/segment-anything-video/test", line 1, in <module>
    from metaseg import sahi_sliced_predict, SahiAutoSegmentation
  File "e:\AIGC\segment-anything-video\metaseg\__init__.py", line 7, in <module>
    from metaseg.falai_demo import falai_automask_image, falai_manuelmask_image
  File "e:\AIGC\segment-anything-video\metaseg\falai_demo.py", line 5, in <module>
    from metaseg import SegAutoMaskPredictor, SegManualMaskPredictor
ImportError: cannot import name 'SegAutoMaskPredictor' from partially initialized module 'metaseg' (most likely due to a circular import) (e:\AIGC\segment-anything-video\metaseg\__init__.py)
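
The traceback spells out the cycle: the package's __init__.py imports falai_demo, and falai_demo imports SegAutoMaskPredictor back from the metaseg package before __init__.py has finished defining it. The usual remedy is for the inner module to import from the submodule that defines the class rather than from the package root. The self-contained sketch below reproduces the pattern with throwaway packages; the package and module names (cyc_pkg, ok_pkg, core, helper) are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root: Path, name: str, helper_import: str) -> None:
    """Write a tiny package whose __init__ imports helper before core."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(
        f"from {name}.helper import run\n"
        f"from {name}.core import Predictor\n"
    )
    (pkg / "core.py").write_text("class Predictor:\n    pass\n")
    (pkg / "helper.py").write_text(
        f"{helper_import}\n\ndef run():\n    return Predictor()\n"
    )


def try_import(name: str) -> str:
    """Import a package by name and report success or the ImportError text."""
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError as exc:
        return f"ImportError: {exc}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sys.path.insert(0, tmp)
    # helper imports from the package root -> circular import
    make_package(root, "cyc_pkg", "from cyc_pkg import Predictor")
    # helper imports from the defining submodule -> no cycle
    make_package(root, "ok_pkg", "from ok_pkg.core import Predictor")
    importlib.invalidate_caches()
    cyc_result = try_import("cyc_pkg")
    ok_result = try_import("ok_pkg")
    sys.path.remove(tmp)

print(cyc_result)
print(ok_result)
```

Which metaseg submodule actually defines SegAutoMaskPredictor is not shown in the traceback, so the corresponding change in falai_demo.py is left unspecified here.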

Can you save sections of the image that are masked after using SegAutoMaskPredictor()

I can't find this in the documentation. After running

results = SegAutoMaskPredictor().image_predict(
    source="firststeve.png",
    model_type="vit_h", # vit_l, vit_h, vit_b
    points_per_side=4,
    points_per_batch=16,
    min_area=0,
    output_path="output.png",
    show=False,
    save=True,
)

I can save the segmented image, but I can't pull out the individual segmented pieces. Does this package support doing something like this or do I have to use the original repository? I just want to know if something like this exists:
