Cross Modal Background Suppression for Audio-Visual Event Localization

This is a PyTorch implementation of the CVPR 2022 paper "Cross Modal Background Suppression for Audio-Visual Event Localization".

Introduction

We address audio-visual event localization (AVE), a task that requires the model to recognize the event category and localize the event boundaries wherever the event is both audible and visible at the same time.

Unlike previous methods, we approach audio-visual event localization from the viewpoint of cross-modal background suppression. We define the "background" category in two ways: 1) If the audio and visual information in a short video segment do not represent the same event, the segment is labeled as background. 2) If an event occurs in only one modality and has a low probability in the other, that event category is labeled as background for the video (e.g., an off-screen voice).
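The two background definitions above can be illustrated with a small sketch. This is not the code from this repository: the function names, the use of None for "no event", and the 0.5 threshold are assumptions made only for illustration.

```python
from typing import Dict, List, Optional, Set

BACKGROUND = "background"


def segment_labels(audio_events: List[Optional[str]],
                   visual_events: List[Optional[str]]) -> List[str]:
    """Definition 1: a segment keeps its event label only when the audio
    and visual streams represent the same event; otherwise it is background.
    None stands for "no event detected" (an assumption of this sketch)."""
    if len(audio_events) != len(visual_events):
        raise ValueError("both modalities must cover the same segments")
    labels = []
    for audio, visual in zip(audio_events, visual_events):
        if audio is not None and audio == visual:
            labels.append(audio)
        else:
            labels.append(BACKGROUND)
    return labels


def audible_and_visible(audio_prob: Dict[str, float],
                        visual_prob: Dict[str, float],
                        threshold: float = 0.5) -> Set[str]:
    """Definition 2: an event category counts for the video only if it is
    likely in both modalities; the threshold value is hypothetical."""
    return {
        event
        for event, prob in audio_prob.items()
        if prob >= threshold and visual_prob.get(event, 0.0) >= threshold
    }


if __name__ == "__main__":
    print(segment_labels(["dog", "dog", None], ["dog", "cat", None]))
    # An off-screen voice: "speech" is audible but unlikely in the frames.
    print(audible_and_visible({"speech": 0.9, "dog": 0.8},
                              {"speech": 0.1, "dog": 0.7}))
```

In the second call, "speech" is dropped because it is likely only in the audio stream, which matches the off-screen voice case.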

Hence, this paper proposes a novel cross-modal background suppression method that operates at two levels, time-level and event-level, allowing the audio and visual modalities to serve as complementary supervisory signals for each other in the AVE task.

Prerequisites

This package has the following requirements:

  • Python 3.7.6
  • PyTorch 1.10.2
  • CUDA 11.4
  • h5py 2.10.0
  • numpy 1.21.5

Data preparation

The VGG visual features can be downloaded from Visual_feature.

The VGG-like audio features can be downloaded from Audio_feature.

The noisy visual features used for the weakly-supervised setting can be downloaded from Noisy_visual_feature.

After downloading the features, please place them into the data folder.
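Before training, it can help to confirm that the downloaded files are actually in the data folder. The sketch below is only an illustration: the helper name and the file names in the usage example are hypothetical placeholders, since the actual feature file names depend on the downloads.

```python
from pathlib import Path
from typing import Iterable, List


def missing_features(data_dir: str, expected_names: Iterable[str]) -> List[str]:
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected_names if not (root / name).is_file()]


if __name__ == "__main__":
    # Placeholder names; replace them with the names of the files you downloaded.
    expected = ["visual_feature.h5", "audio_feature.h5"]
    missing = missing_features("data", expected)
    if missing:
        print("Missing from the data folder:", ", ".join(missing))
    else:
        print("All expected feature files were found.")
```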

If you are interested in the AVE raw videos, please refer to this repo and download the AVE dataset.

Training and Evaluating CMBS

Fully-Supervised Setting

The file configs/main.json contains the main hyper-parameters used for fully-supervised training.
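To inspect or tweak the hyper-parameters without editing the file by hand, the JSON config can be read with the standard library. This is a minimal sketch and not how the training scripts necessarily load it; the load_config helper and the "lr" key in the usage example are hypothetical, so check the file for the real key names.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_config(path: str,
                overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Read a JSON config file and apply optional top-level overrides."""
    with Path(path).open("r", encoding="utf-8") as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object at the top level")
    if overrides:
        # Shallow merge: nested sections are replaced, not merged.
        config.update(overrides)
    return config


if __name__ == "__main__":
    # "lr" is a hypothetical key used only to show the override mechanism.
    config = load_config("configs/main.json", overrides={"lr": 1e-4})
    for key, value in sorted(config.items()):
        print(f"{key}: {value}")
```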

Training

bash supv_train.sh

Evaluating

bash supv_test.sh

Weakly-Supervised Setting

The file configs/weak.json contains the main hyper-parameters used for weakly-supervised training.

Training

bash weak_train.sh

Evaluating

bash weak_test.sh

Pretrained model

The pretrained models can be downloaded from Supervised model and WeaklySupervised model.

After downloading the pretrained models, please place them into the Exps folder.

If you want to retrain the model, you can try different parameters or random seeds; the results may be better.
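When comparing random seeds, it helps to seed every random number generator in one place so that each run is repeatable. How the training scripts in this repository set their seeds is not shown here, so the set_seed helper below is an assumption and a generic sketch rather than the repository's own code. NumPy and PyTorch are seeded only if they are installed.

```python
import random


def set_seed(seed: int) -> None:
    """Seed Python's RNG and, when available, NumPy and PyTorch."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)           # CPU generator
        torch.cuda.manual_seed_all(seed)  # safely ignored when CUDA is unavailable
    except ImportError:
        pass


if __name__ == "__main__":
    set_seed(123)
    first = random.random()
    set_seed(123)
    second = random.random()
    print(first == second)  # the same seed reproduces the same draw
```

Seeding alone does not guarantee bit-identical GPU results, because some CUDA operations are non-deterministic.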

Acknowledgement

Part of our code is borrowed from the following repositories.

We thank the authors for releasing their code. Please also consider citing their work.
