The 2.5d-visual-sound from github30

2.5D Visual Sound

[Project Page] [arXiv] [Video] [Dataset]

2.5D Visual Sound
Ruohan Gao¹ and Kristen Grauman²
¹UT Austin, ²Facebook AI Research
In Conference on Computer Vision and Pattern Recognition (CVPR), 2019

If you find our code or project useful in your research, please cite:

    @inproceedings{gao2019visualsound,
      title={2.5D Visual Sound},
      author={Gao, Ruohan and Grauman, Kristen},
      booktitle={CVPR},
      year={2019}
    }

FAIR-Play Dataset

The FAIR-Play repository contains the dataset we collected and used in our paper. It contains 1,871 video clips and their corresponding binaural audio clips recorded in a music room. The code provided can be used to train mon2binaural models on this dataset.

Training and Testing

(The code has beed tested under the following system environment: Ubuntu 16.04.6 LTS, CUDA 9.0, Python 2.7.15, PyTorch 1.0.0)

Download the FAIR-Play dataset and prepare the hdf5 splits files accordingly by adding the correct root prefix.
[OPTIONAL] Preprocess the audio files using reEncodeAudio.py to accelerate the training process.
Use the following command to train the mono2binaural model:

python train.py --hdf5FolderPath /YOUR_CODE_PATH/2.5d_visual_sound/hdf5/ --name mono2binaural --model audioVisual --checkpoints_dir /YOUR_CHECKPOINT_PATH/ --save_epoch_freq 50 --display_freq 10 --save_latest_freq 100 --batchSize 256 --learning_rate_decrease_itr 10 --niter 1000 --lr_visual 0.0001 --lr_audio 0.001 --nThreads 32 --gpu_ids 0,1,2,3,4,5,6,7 --validation_on --validation_freq 100 --validation_batches 50 --tensorboard True |& tee -a mono2binaural.log

Use the following command to test your trained mono2binaural model:

python demo.py --input_audio_path /BINAURAL_AUDIO_PATH --video_frame_path /VIDEO_FRAME_PATH --weights_visual /VISUAL_MODEL_PATH --weights_audio /AUDIO_MODEL_PATH --output_dir_root /YOUT_OUTPUT_DIR/ --hop_size 0.05

Acknowlegements

Portions of the code are adapted from the CycleGAN implementation (https://github.com/junyanz/CycleGAN) and the Sound-of-Pixels implementation (https://github.com/hangzhaomit/Sound-of-Pixels). Please also refer to the original License of these projects.

Licence

The code for 2.5D Visual Sound is CC BY 4.0 licensed, as found in the LICENSE file.

github30 / 2.5d-visual-sound Goto Github PK

2.5d-visual-sound's Introduction

2.5D Visual Sound

FAIR-Play Dataset

Training and Testing

Acknowlegements

Licence

2.5d-visual-sound's People

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent