This project forked from sheldontsui/sepstereo_eccv2020

Codebase for the paper "Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation" (ECCV2020)

Home Page: https://github.com/SheldonTsui/SepStereo_ECCV2020

License: Creative Commons Attribution 4.0 International

Python 93.94% Shell 6.06%

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation (ECCV 2020)

Hang Zhou*, Xudong Xu*, Dahua Lin, Xiaogang Wang, and Ziwei Liu.

We propose to integrate the tasks of stereophonic audio generation and audio source separation into a unified framework named Sep-Stereo, which leverages widely available mono audio to facilitate the training of stereophonic audio generation. Moreover, we design the Associative Pyramid Network (APNet), which better associates the visual features and the audio features with a learned Associative-Conv operation, improving performance on both tasks.

[Project] [Paper] [Demo]

Requirements

  • Python 3.6 is used. Basic requirements are listed in 'requirements.txt'.
pip install -r requirements.txt 

Dataset

Stereo dataset

FAIR-Play can be accessed here. YT-Music can be accessed here.

Separation dataset

MUSIC21 can be accessed here. As illustrated in our supplementary material, we recommend choosing the instrument categories that also appear in the stereo dataset, such as cello, trumpet, piano, etc.
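The category selection described above could be sketched roughly as follows. This is only an illustration: the `(video_id, category)` sample layout, the helper name `filter_by_category`, and the short `STEREO_CATEGORIES` set are assumptions made for the example, not part of this codebase, and the set lists only the three categories named above rather than the full overlap with the stereo dataset.

```python
# Hypothetical sketch: keep only MUSIC21 samples whose instrument category
# also appears in the stereo dataset. The category set below is incomplete
# on purpose; it contains only the examples named in this README.
STEREO_CATEGORIES = {"cello", "trumpet", "piano"}


def filter_by_category(samples, allowed=STEREO_CATEGORIES):
    """Return the (video_id, category) pairs whose category is allowed.

    Matching is case-insensitive; the input order is preserved.
    """
    allowed_lower = {name.lower() for name in allowed}
    return [(vid, cat) for vid, cat in samples if cat.lower() in allowed_lower]


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    demo = [("vid_a", "cello"), ("vid_b", "xylophone"), ("vid_c", "Piano")]
    print(filter_by_category(demo))
```

Extend the category set to match whatever overlap the supplementary material lists before using something like this on the real data.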

Training and Testing

All training and testing bash scripts can be found in './scripts'. Before training, please prepare the stereo data following the instructions in FAIR-Play. For the MUSIC21 dataset, please cut the videos into 10s clips and format the data split like './data/dummy_MUSIC_split'.
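One possible way to cut a MUSIC21 video into 10s clips is sketched below. It is not the authors' preprocessing script: the helper names (`clip_segments`, `ffmpeg_command`, `cut_video`), the output naming scheme, and the choice to drop a trailing clip shorter than 10s are all assumptions. It also assumes `ffmpeg` is installed and that you already know each video's duration in seconds.

```python
import subprocess
from pathlib import Path


def clip_segments(duration_s, clip_len_s=10.0, drop_short_tail=True):
    """Return (start, length) pairs, in seconds, covering the video."""
    segments = []
    start = 0.0
    while start < duration_s:
        length = min(clip_len_s, duration_s - start)
        if length < clip_len_s and drop_short_tail:
            break  # assumption: discard a final clip shorter than clip_len_s
        segments.append((start, length))
        start += clip_len_s
    return segments


def ffmpeg_command(src, dst, start, length):
    """Build an ffmpeg call that re-encodes one clip (-ss seek, -t duration)."""
    return ["ffmpeg", "-y", "-ss", str(start), "-i", str(src),
            "-t", str(length), str(dst)]


def cut_video(src, out_dir, duration_s, clip_len_s=10.0, run=False):
    """Build (and optionally run) the ffmpeg commands for one video.

    Output files are named <stem>_<index><suffix>, a hypothetical scheme.
    """
    src, out_dir = Path(src), Path(out_dir)
    commands = []
    for idx, (start, length) in enumerate(clip_segments(duration_s, clip_len_s)):
        dst = out_dir / f"{src.stem}_{idx:04d}{src.suffix}"
        cmd = ffmpeg_command(src, dst, start, length)
        commands.append(cmd)
        if run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands for a hypothetical 35-second video.
    for cmd in cut_video("videos/example.mp4", "clips", duration_s=35):
        print(" ".join(cmd))
```

With `run=False` the function only returns the commands, which makes it easy to inspect them before processing the whole dataset. The data split itself should still follow the layout of './data/dummy_MUSIC_split'.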

License and Citation

This software is licensed under CC-BY-4.0.

@inproceedings{zhou2020sep,
  title={Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation},
  author={Zhou, Hang and Xu, Xudong and Lin, Dahua and Wang, Xiaogang and Liu, Ziwei},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Acknowledgement

The structure of this codebase is borrowed from 2.5D Visual Sound.
