leimao / singing-voice-separation-rnn

Singing-Voice Separation From Monaural Recordings Using Deep Recurrent Neural Networks

License: MIT License

Python 100.00%
source-separation recurrent-neural-networks

singing-voice-separation-rnn's Introduction

Singing Voice Separation RNN

Lei Mao

University of Chicago

Introduction

This is a singing voice separation tool built on a recurrent neural network (RNN). It separates the singer's voice and the background music from the original song. It is still under development, since the separation is not yet perfect. Please check the demo to judge the performance.

Dependencies

  • Python 3.5
  • Numpy 1.14
  • TensorFlow 1.8
  • RarFile 3.0
  • ProgressBar2 3.37.1
  • LibROSA 0.6
  • FFmpeg 4.0
  • Matplotlib 2.1.1
  • MIR_Eval 0.4

Files

.
├── demo
├── download.py
├── evaluate.py
├── figures
├── LICENSE.md
├── main.py
├── model
├── model.py
├── preprocess.py
├── README.md
├── songs
├── statistics
├── train.py
└── utils.py

Dataset

MIR-1K Dataset

MIR-1K (Multimedia Information Retrieval, 1000 song clips) is a dataset for singing voice separation.

To download the whole dataset and split it into training, validation, and test sets, run in the terminal:

$ python download.py 
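As an illustration of the splitting step, here is a minimal sketch of a reproducible train/validation/test split over a list of clip names. The function name `split_clips`, the 80/10/10 ratios, the seed, and the stand-in clip names are all assumptions made for this example; the proportions that download.py actually uses are not stated in this README.

```python
import random

def split_clips(clip_names, train_frac=0.8, valid_frac=0.1, seed=0):
    """Shuffle clip names and split them into train/validation/test lists.

    The fractions and the seed are illustrative assumptions, not values
    taken from download.py.
    """
    names = sorted(clip_names)          # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)  # deterministic shuffle for a fixed seed
    n_train = int(len(names) * train_frac)
    n_valid = int(len(names) * valid_frac)
    train = names[:n_train]
    valid = names[n_train:n_train + n_valid]
    test = names[n_train + n_valid:]    # whatever remains goes to the test set
    return train, valid, test

# Stand-in clip names; the real dataset uses its own file names.
clips = ["clip_{:04d}.wav".format(i) for i in range(1000)]
train, valid, test = split_clips(clips)
print(len(train), len(valid), len(test))
```

Fixing the seed keeps the three sets identical across runs, so evaluation numbers stay comparable.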

Usage

Train Model

To train the model, run in the terminal:

$ python train.py

Training took roughly 45 minutes for 50,000 iterations on the MIR-1K training set using an NVIDIA GTX TITAN X graphics card.

The program loads the whole MIR-1K dataset into memory and also keeps all the processed MIR-1K data in memory to speed up data sampling during training. However, this may consume more than 10 GB of memory.

The trained model is saved to the model directory.
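The idea of keeping processed data in memory and sampling from it can be sketched as follows. This is only an illustration: `sample_batch`, the window length, the batch size, and the 513 frequency bins are hypothetical choices for the example, and train.py may organise its sampling differently.

```python
import numpy as np

def sample_batch(spectrograms, batch_size, num_frames, rng):
    """Draw random fixed-length frame windows from preloaded spectrograms.

    spectrograms: list of arrays shaped (frames, freq_bins), already held
    in memory. Every clip must contain at least num_frames frames.
    """
    batch = []
    for _ in range(batch_size):
        spec = spectrograms[rng.randint(len(spectrograms))]   # pick a random clip
        start = rng.randint(spec.shape[0] - num_frames + 1)   # pick a random window start
        batch.append(spec[start:start + num_frames])
    return np.stack(batch)  # shape: (batch_size, num_frames, freq_bins)

rng = np.random.RandomState(0)
# Stand-in data: three "clips" of different lengths, 513 frequency bins each.
cache = [rng.rand(n, 513).astype(np.float32) for n in (120, 200, 90)]
batch = sample_batch(cache, batch_size=4, num_frames=10, rng=rng)
print(batch.shape)
```

Because every clip is already an array in `cache`, drawing a batch costs only a few slices, with no disk access or feature extraction inside the training loop. The price is the memory footprint mentioned above.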

Evaluate Model

To evaluate the model, run in the terminal:

$ python evaluate.py

Evaluation took roughly 1 minute on the MIR-1K test set using an NVIDIA GTX TITAN X graphics card. The separated sources, together with the monaural source, are saved to the demo directory.

         GNSDR   GSIR    GSAR
Vocal    7.40    12.75   9.34
BGM      7.45    13.17   9.25
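For readers unfamiliar with these metrics, the sketch below follows the definitions in Huang et al. (2014), listed in the References: NSDR is the SDR of the separated source minus the SDR of the unprocessed mixture, both measured against the clean source, and the "global" values (GNSDR, GSIR, GSAR) are averages over clips weighted by clip length. The helper `global_average` and all the numbers are hypothetical and for illustration only. Whether evaluate.py computes the table in exactly this way is an assumption.

```python
import numpy as np

def global_average(values, lengths):
    """Length-weighted mean of per-clip metric values."""
    values = np.asarray(values, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    return float(np.sum(values * lengths) / np.sum(lengths))

# Hypothetical per-clip numbers in dB, NOT results from this model.
sdr_estimate = [8.0, 6.0]   # SDR of the separated vocal against the clean vocal
sdr_mixture = [2.0, 1.0]    # SDR of the unprocessed mixture against the clean vocal
lengths = [100, 300]        # clip lengths (samples or frames)

nsdr = np.subtract(sdr_estimate, sdr_mixture)  # per-clip NSDR
gnsdr = global_average(nsdr, lengths)          # longer clips count more
print(gnsdr)
```

GSIR and GSAR can be obtained by passing per-clip SIR and SAR values to the same helper.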

To do: save the evaluation statistics.

Separate Sources for Customized Songs

To separate sources for your own songs, put the MP3 files in the songs directory, then run in the terminal:

$ python main.py

The separated sources, together with the monaural source, are saved to the demo directory.

Demo

The MP3 of "Backstreet Boys - I want it that way", backstreet_boys-i_want_it_that_way.mp3, was put in the songs directory. Using the pre-trained model in the model directory, in the terminal:

$ python main.py

The separated sources, backstreet_boys-i_want_it_that_way_src1.mp3 and backstreet_boys-i_want_it_that_way_src2.mp3, together with the monaural source, backstreet_boys-i_want_it_that_way_mono.mp3, were saved to the demo directory.
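The naming pattern of the outputs can be sketched with a small helper. `output_paths` is a hypothetical function written for this example, not part of main.py; it only reproduces the `_mono`, `_src1`, and `_src2` suffixes seen in the file names above. Which of `src1` and `src2` holds the vocal is not stated here, so the sketch makes no claim about it.

```python
import os

def output_paths(song_path, output_dir="demo"):
    """Build the three output paths for one input song (illustrative helper)."""
    stem = os.path.splitext(os.path.basename(song_path))[0]  # file name without ".mp3"
    return {
        suffix: os.path.join(output_dir, "{}_{}.mp3".format(stem, suffix))
        for suffix in ("mono", "src1", "src2")
    }

paths = output_paths("songs/backstreet_boys-i_want_it_that_way.mp3")
for suffix in ("mono", "src1", "src2"):
    print(paths[suffix])
```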

References

  • Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis, Singing-Voice Separation From Monaural Recordings Using Deep Recurrent Neural Networks. 2014.
  • Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis, Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation. 2015.
  • Dabi Ahn's Music Source Separation Repository

To-Do List

  • Evaluation metrics
  • Hyperparameter tuning
  • Argparse

singing-voice-separation-rnn's Issues

The link in download file('http://mirlab.org/dataset/public/MIR-1K.rar') is not available

I can't download the dataset from this link anymore because the link is unavailable.

So I got the dataset from another link, but I still couldn't run the code.

The error says: FileNotFoundError: [Errno 2] No such file or directory: 'data/MIR1K/train.txt'.

I have downloaded the dataset with many different links, but train.txt is not in any folder.

What is the content of train.txt? Please let me know...

speech mixture

Dear friend.
I have one question. When I use this code on a mixture of singing voice and music, it works well. But when I use it on a speech mixture, it seems to have no effect. Can you explain why?

Regards.

for a mono audio

Dear friend:

If I input a mono MIR-1K audio file, can it be tested, and how do I evaluate the result?

Please guide.

validation loss

Dear friend.

I have one more question.
No matter how long the model is trained, the loss does not go below 1.2, or at best 1.0.

Can you explain why? Or is the loss multiplied by some factor?

As far as I understand, with good training the loss should decrease to around 0.1.

I would appreciate your guidance, because I am not an expert and am trying to learn.

Greetings.

assert len

Dear Friend.

I want to ask you a question.
In the model, why is this necessary?

assert len(num_hidden_units) == num_rnn_layer

And why can't we change the number of layers without triggering this exception?

Please guide.

Greetings.
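One plausible reading of the assertion, offered as an assumption because model.py is not shown on this page: `num_hidden_units` is a list with one hidden size per RNN layer, so its length has to equal `num_rnn_layer`. Under that reading the number of layers can be changed, as long as the list is changed with it. The function `check_rnn_config` below is hypothetical and only mimics the check.

```python
def check_rnn_config(num_rnn_layer, num_hidden_units):
    """Validate that one hidden size is given for each RNN layer.

    Returns (layer_index, hidden_size) pairs. Illustrative only; this is
    not the code from model.py.
    """
    assert len(num_hidden_units) == num_rnn_layer, (
        "expected {} hidden sizes, got {}".format(
            num_rnn_layer, len(num_hidden_units)))
    return list(zip(range(num_rnn_layer), num_hidden_units))

# Consistent: three layers, three sizes.
print(check_rnn_config(3, [256, 256, 256]))
# Adding a layer means adding a size as well.
print(check_rnn_config(4, [256, 256, 256, 256]))

try:
    # Layer count changed but the list was not updated: the assertion fires.
    check_rnn_config(4, [256, 256, 256])
except AssertionError as err:
    print("rejected:", err)
```

Failing early on a mismatched configuration is usually easier to debug than a shape error raised later while the network is being built.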
