
zhang-tao-whu / dvis


DVIS: Decoupled Video Instance Segmentation Framework

License: MIT License

Python 92.82% Shell 0.07% C++ 0.71% Cuda 6.40%
offline online ovis segmentation video-instance-segmentation video-panoptic-segmentation

dvis's People

Contributors

zhang-tao-whu


dvis's Issues

Model weights for the VSPW dataset

Very good idea and project!
When I test the model on the VSPW dataset, I can't find the corresponding weights in the Model Zoo. Where can I find them? Thank you!

Is inference not supported on a single GPU?

As the title says. I ran the command:
python train_net_video.py --num-gpus 1 --config-file configs/ovis/DVIS_Offline_R50.yaml --eval-only MODEL.WEIGHTS checkpoints/DVIS_offline_ovis_r50.pth
and it returned:

[09/04 15:40:30 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from checkpoints/DVIS_offline_ovis_r50.pth ...
[09/04 15:40:30 fvcore.common.checkpoint]: [Checkpointer] Loading from checkpoints/DVIS_offline_ovis_r50.pth ...
[09/04 15:40:32 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[09/04 15:40:32 d2.data.common]: Serializing 140 elements to byte tensors and concatenating them all ...
[09/04 15:40:32 d2.data.common]: Serialized dataset takes 0.42 MiB
COCO Evaluator instantiated using config, this is deprecated behavior. Please pass in explicit arguments instead.
[09/04 15:40:32 d2.evaluation.evaluator]: Start inference on 140 batches
/home/hs/AIGC/DVIS_ENV/lib/python3.10/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Killed
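
(A hedged note and mitigation sketch: "Killed" with no Python traceback usually means the Linux out-of-memory killer terminated the process, so single-GPU inference is likely running out of memory rather than being unsupported. The keys below are detectron2-standard; whether DVIS's video evaluation path honors them is an assumption.)

from detectron2.config import get_cfg

# Sketch only: shrink the test-time frame size to reduce memory use.
# The concrete values are placeholders, not the released DVIS settings.
cfg = get_cfg()
cfg.INPUT.MIN_SIZE_TEST = 360   # e.g. down from 480
cfg.INPUT.MAX_SIZE_TEST = 640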

VSS bug

Hi, authors,

Thanks for your great work!
I ran into a few problems when running the VSS code; could you please give some suggestions?

  1. How do I train DVIS on VSS? (Which config file should I follow?)
  2. What is the difference between the 480p and 720p datasets?

I ran: python train_net_video.py --num-gpus 4 --config-file configs/VSPW/MinVIS_R50_480p.yaml

The failing line in the traceback is: sem_seg_gt[sem_seg_gt == 0] = 255
ValueError: assignment destination is read-only

Thanks!
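
(A hedged fix sketch for the ValueError above: NumPy refuses in-place writes to arrays backed by read-only buffers, which is likely how sem_seg_gt is produced here, though the thread does not confirm it. Copying before the assignment makes the write legal.)

import numpy as np

# Reproduce and fix the error: an array over an immutable buffer is read-only.
buf = bytes(16)                                   # immutable backing buffer
sem_seg_gt = np.frombuffer(buf, dtype=np.uint8)   # read-only array, as in the traceback
sem_seg_gt = sem_seg_gt.copy()                    # writable copy: the fix
sem_seg_gt[sem_seg_gt == 0] = 255                 # the failing line now succeeds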

dataset

Hello, the dataset I use is YouTube-VIS 2021, and my directory structure is as follows:
ytvis_2021/
  train.json
  valid.json
  train/
    Annotations/
    JPEGImages/
  valid/
    Annotations/
    JPEGImages/
I put them under the datasets directory.

But when I train with this command:
python train_net_video.py --num-gpus 2 --config-file ./configs/youtubevis_2021/swin/DVIS_Offline_SwinL.yaml --resume MODEL.WEIGHTS ./pretrain/DVIS_offline_ytvis21_swinl.pth
The following error occurred:
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/coco/annotations/coco2ytvis2021_train.json'
Do I need a JSON file in COCO format for training?
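
(A hedged workaround sketch: the missing coco2ytvis2021_train.json suggests the config lists a COCO pseudo-video split for joint training. If that file cannot be generated, restricting training to the YTVIS split alone sidesteps the dependency, likely at some cost in accuracy. DATASETS.TRAIN is detectron2's standard key; the dataset name below is an assumption about how the repo registers YTVIS 2021.)

from detectron2.config import get_cfg

# Sketch only: drop the COCO joint-training split from the training datasets.
cfg = get_cfg()
cfg.DATASETS.TRAIN = ("ytvis_2021_train",)   # name assumed; check the repo's dataset registrations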

No detections shown

I tried the demo on a few videos, but even though some instances are detected, they aren't drawn on the output frames. Lowering the score threshold doesn't help.

Training parameters

How can I replicate similar results on only one GPU? Which parameters need to be modified?
Also, can I display only the segmentation masks (without categories and scores) when running demo.py for visualization?
Thank you!
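
(A hedged sketch for the single-GPU question, not an official recipe: detectron2 recipes usually transfer across GPU counts via the linear scaling rule, shrinking batch size and learning rate together and stretching the schedule. The key names are detectron2-standard; the baseline values are placeholders, not the released DVIS settings.)

from detectron2.config import get_cfg

# Sketch: adapt an 8-GPU, batch-8 recipe to a single GPU via linear scaling.
cfg = get_cfg()
cfg.SOLVER.IMS_PER_BATCH = 1          # down from a hypothetical 8
cfg.SOLVER.BASE_LR = 1e-4 / 8         # scale the learning rate with the batch size
cfg.SOLVER.MAX_ITER = 160_000 * 8     # train longer so the total images seen match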

Problem when evaluating DVIS (online) on the OVIS dataset

Hi! Thank you for your great work!
When I try to evaluate DVIS (online) on the OVIS dataset, I only get "nan" results.

I can't find the reason. The only change I made was setting IMS_PER_BATCH in DVIS_Online_R50.yaml from 8 to 4.
Here is my command: python train_net_video.py --num-gpus 4 --config-file configs/ovis/DVIS_Online_R50.yaml --eval-only --resume MODEL.WEIGHTS checkpoints/DVIS_online_ovis_r50.pth
My evaluation log is attached as log.txt.

I hope you can help.

Cannot produce demos

Hi authors,

Thanks for your great work! I tried to produce some demo videos for the VIPSeg dataset, but the segmentation results are poor. Do I need to change any settings to reproduce the results shown in the paper?

Thanks

About the JSON file size

Hi,

Thanks for your excellent work. When I try to evaluate the R-50 DVIS offline model on YTVIS 2019, I find that the generated JSON file is more than 500 MB, which exceeds the CodaLab server limit. Is that a normal result?

Thanks
Kai

A problem training the R50 model on the VIPSeg dataset

Hi! Your work is excellent!
I am trying to train the DVIS_R50 model on the VIPSeg dataset.
Do I need to finetune the segmenter by following "Training on a new dataset"? Or do I just need to follow "Training" and use a MinVIS pretrained weights file like "minvis_ovis_R50.pth"?
I'm looking forward to your answer. Thanks!

How to make a dataset for video instance segmentation model?

Hi! The DVIS model is a great model for video tasks. I've finished labeling my video data. Each object in each frame of each video has segmentation, classification, and persistent ID information, and I built a JSON file like:

{
  "info": {},
  "licenses": [],
  "videos": [],        # video information
  "categories": [],
  "annotations": []    # one entry per instance; an instance is the collection of objects sharing a unique ID across the video's frames
}


I have a question: what format and information does the 'image_instance' of an image contain? In my case, what do I need to do?

Thanks!
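
(A hedged reference sketch, since DVIS's loaders follow the public YouTube-VIS annotation layout: the field names below match the YTVIS release, while every value is made up for illustration.)

# One video plus one instance track, YTVIS-style (all values are placeholders).
ytvis_like = {
    "videos": [{
        "id": 1, "width": 1280, "height": 720, "length": 2,
        "file_names": ["video1/00000.jpg", "video1/00005.jpg"],
    }],
    "categories": [{"id": 1, "name": "person"}],
    "annotations": [{
        "id": 1,            # track id, unique across the dataset
        "video_id": 1,
        "category_id": 1,
        # one entry per frame; None marks frames where the instance is absent
        "segmentations": [[[100.0, 50.0, 140.0, 50.0, 140.0, 130.0, 100.0, 130.0]], None],
        "bboxes": [[100.0, 50.0, 40.0, 80.0], None],
        "areas": [3200.0, None],
        "iscrowd": 0,
    }],
}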

Cannot use the demo file


When I tried to use the demo file, the program stopped with a "keyboard interrupt" error even though I didn't press any key.

Exploring Real-time Video Instance Segmentation with DVIS Model

I am currently using the DVIS model for inference, and it appears to take a directory of video frames in image format as input. I would like to inquire whether it is possible to directly input a video for real-time video instance segmentation.

Is it feasible to configure the DVIS model to work with video input, allowing for real-time video instance segmentation, or is it limited to processing individual frames in image format?

Thank you for your guidance and support!
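
(A hedged bridge sketch while native video input is unconfirmed: decode the video into the directory of image frames the demo reportedly consumes. The OpenCV calls are standard; the output layout and file naming are assumptions, not the repo's documented interface.)

import os
import cv2  # opencv-python

# Sketch: dump a video file into a directory of numbered JPEG frames.
def video_to_frames(video_path: str, out_dir: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:                 # end of stream
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:05d}.jpg"), frame)
        idx += 1
    cap.release()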

Some questions about your motivation for instance association

Dear Zhang,

I have read your paper closely, but I am confused about the two principles you proposed; I can't quite see the problem you want to solve.

(1) encourage sufficient interaction between instance representations of adjacent frames to fully exploit their similarity for better association.

You argue that previous works use heuristic methods that lack interaction between frames. But what does "interaction" really mean here? A lot of prior work has inter-frame interaction in different forms, and the previous works you mention use post-processing methods to track. Do you want to emphasize that your method does not need post-processing, or something else? And how do you define whether the interaction is sufficient?

(2) avoid mixing their information during the interaction process to prevent introducing indistinguishable noise that may interfere with association results.

This part also confuses me, especially the word "mixing". What does the pre-frame instance representation mean? Is it the query? The instance feature? In your paper you cite two prior works, references 33 and 9. In [33], the query is simply passed to the next frame. Is that what you call the instance representation? But your work also passes the query from the decoder to the next TD block.
I am just confused about this part of the motivation: the problem you want to solve and your own specific solution. If you could kindly give me a more detailed explanation, I would greatly appreciate it.

Have a nice day.
Rohan

How to Train on New Data

Hello, I am interested in training on a custom dataset using your DVIS model. After reading GETTING_STARTED.md, my understanding is that I should follow these steps:

  1. Finetune the segmenter.
  2. Use the weights from the trained segmenter to train DVIS_Online.
  3. Use the weights from the trained DVIS_Online to train DVIS_Offline.

Could you please confirm if my understanding is correct?

Why doesn't the temporal refiner module work well on my dataset?

Thank you so much for your excellent work!!!
I want to build a temporal refinement mechanism like the one in your work. First, I use a trained model to extract time-series features that fuse temporal information, then feed them into the temporal refinement module, but I found that it doesn't work well.
I think one reason is that my dataset is relatively small, and my feature extractor is CNN-based.
Could you give me some advice? Thank you very much!!!

No detection results from demo.py

Hi,

I am trying to generate some visual results on the VSPW dataset. I used VIPSeg's config and checkpoint, but it detects 0 instances. Could you give me any suggestions?

Thanks!

Checkpoint reproduction

Hi, Author!

When I evaluated the trained models you provided, DVIS_online_r50 and DVIS_offline_r50, the performance was similar to the paper.

To reproduce the checkpoints by retraining, I followed your configuration.

# train the DVIS_Online
python train_net_video.py \
  --num-gpus 8 \
  --config-file ./configs/youtubevis_2019/DVIS_Online_R50.yaml \
  MODEL.WEIGHTS ./ckpt/minvis_ytvis19_R50.pth

The benchmark score of the reproduced online method was similar to the paper.

# train the DVIS_Offline
python train_net_video.py \
  --num-gpus 8 \
  --config-file ./configs/youtubevis_2019/DVIS_Offline_R50.yaml \
  MODEL.WEIGHTS ./ckpt/DVIS_online_ytvis19_r50.pth

However, when I trained the offline method with the command above, the reproduced performance was significantly lower (YTVIS 2019: 1 AP). Could you please check the training code or configuration?
(I did not change anything in your code.)

Can I deploy the DVIS model with ONNX?

Thanks for the super awesome work! I'm really impressed by it :) And I have a quick question:
Can I deploy the DVIS model with ONNX? If not yet, do you have any plans for that?

Train on custom dataset

I can't manage to load an annotated custom image dataset and train on it. How can I do that?
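
(A hedged sketch for the image-level finetuning stage only, assuming the custom data is in COCO format: detectron2's stock registration exposes the dataset under a name the configs can reference. The name and paths below are placeholders; video-level, YTVIS-style data needs the repo's own registration helpers instead.)

from detectron2.data.datasets import register_coco_instances

# Sketch: register a COCO-format image dataset for training.
register_coco_instances(
    "my_train",                        # then set DATASETS.TRAIN = ("my_train",)
    {},                                # extra metadata; none needed here
    "datasets/my_data/train.json",     # COCO-format annotations (placeholder path)
    "datasets/my_data/images",         # image root (placeholder path)
)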

About the transformer denoising blocks (TD)

Thanks for your great work!

As I understand it, from frame 2 onward, the Q, K, and V are the same for every TD block of the same frame, right? The only thing that changes is the ID, which is updated through the L blocks.

So have you ever run an ablation on the effect of the number of blocks L?

A problem submitting results

Hi, I tried to get inference results by submitting result_submission.zip to the server for video panoptic segmentation (Swin-L), but it failed. The errors are shown below; could you please check?

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
File "/tmp/codalab/tmp4vzAlN/run/program/score.py", line 815, in
pred_js = pred_j[video_id]
KeyError: '1001_5z_ijQjUf_0'
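
(The KeyError means the scoring script found no prediction entry for video '1001_5z_ijQjUf_0'. A hedged pre-submission check, assuming the prediction JSON is a dict keyed by video id, as the script's pred_j[video_id] lookup suggests; both file names are placeholders.)

import json

# Sketch: list ground-truth video ids that have no prediction entry.
with open("pred.json") as f:
    pred_j = json.load(f)                    # dict keyed by video id (assumed)
with open("val.json") as f:
    gt = json.load(f)
gt_ids = {v["id"] for v in gt["videos"]}     # ground-truth video ids (assumed layout)
print("missing:", sorted(gt_ids - set(pred_j)))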

Will the LSVOS challenge technical report be released?

Congratulations on "DVIS achieved 1st place in the VIS Track of the 5th LSVOS challenge at ICCV 2023".

I want to ask whether you will release the technical report on winning first place.

Or is the released paper code alone enough to achieve it?
