Coder Social home page Coder Social logo

context-and-structure-mining-network-for-video-object-detection's Introduction

Context and Structure Mining Network for Video Object Detection (Under Construction)

By Liang Han, Pichao Wang, Zhaozheng Yin, Fan Wang and Hao Li.

This repo is an implementation of ["Context and Structure Mining Network for Video Object Detection"], accepted by IJCV 2021. This repository contains a PyTorch implementation of our approach CSMN based on maskrcnn_benchmark.

Citing our work

Please cite our paper in your publications if it helps your research:

Installation

Please follow INSTALL.md for installation instructions.

Data preparation

Please download ILSVRC2015 DET and ILSVRC2015 VID dataset from here. After that, we recommend to symlink the path to the datasets to datasets/. And the path structure should be as follows:

./datasets/ILSVRC2015/
./datasets/ILSVRC2015/Annotations/DET
./datasets/ILSVRC2015/Annotations/VID
./datasets/ILSVRC2015/Data/DET
./datasets/ILSVRC2015/Data/VID
./datasets/ILSVRC2015/ImageSets

Note: We have already provided a list of all images we use to train and test our model as txt files under directory datasets/ILSVRC2015/ImageSets. You do not need to change them.

Usage

Note: Cache files will be created at the first time you run this project, this may take some time! Don't worry!

Note: Currently, one GPU could only hold 1 image. Do not put 2 or more images on 1 GPU!

Note We provide template files named BASE_RCNN_{}gpus.yaml which would automatically change the batch size and other relevant settings. This behavior is similar to detectron2. If you want to train model with different number of gpus, please change it by yourself :) But assure 1 GPU only holds 1 image! That is to say, you should always keep SOLVER.IMS_PER_BATCH and TEST.IMS_PER_BATCH equal to the number of GPUs you use.

Demo Usage

Please follow demo/README.md to see how to visualize your own images or video.

Customize

If you want to use these methods on your own dataset or implement your new method. Please follow CUSTOMIZE.md.

Acknowledgement

The implementation of this work is built on the official implementation of "Memory Enhanced Global-Local Aggregation for Video Object Detection" (https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Memory_Enhanced_Global-Local_Aggregation_for_Video_Object_Detection_CVPR_2020_paper.pdf), which is publicly available at https://github.com/Scalsol/mega.pytorch

Contributing to the project

Any pull requests or issues are welcomed.

context-and-structure-mining-network-for-video-object-detection's People

Contributors

lianghann avatar

Stargazers

Qianyu Chow avatar  avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.