ltphat / vietnamese-traditional-music-classification

Music genre recognition with Convolutional Neural Networks (CNN) using Mel Spectrograms.

License: Apache License 2.0

Vietnamese Traditional Music Classification

  • Digital Signal Processing Course Project (EE2015).
  • Audio classification using mel spectrograms and Convolutional Neural Networks.
  • Finished: 09/07/2023

Dataset

  • The dataset contains audio files from 5 classes: cailuong, catru, chauvan, cheo, hatxam.

  • Each class contains 500 WAV files, each about 30 s long.

  • Vietnam Traditional Music (5 genres): https://www.kaggle.com/datasets/homata123/vntm-for-building-model-5-genres.

  • Download the dataset, create a folder named rawdata in the project folder, and arrange the files as shown below.

    ...
    ├── model_images
    ├── notebook
    ├── rawdata
    │   ├── cailuong
    │   │   ├── Cailuong000.wav
    │   │   ├── Cailuong001.wav
    │   │   ├── Cailuong002.wav
    │   │   └── ...
    │   ├── catru
    │   │   ├── Catru000.wav
    │   │   ├── Catru001.wav
    │   │   ├── Catru002.wav
    │   │   └── ...
    │   ├── chauvan
    │   │   ├── Chauvan000.wav
    │   │   ├── Chauvan001.wav
    │   │   ├── Chauvan002.wav
    │   │   └── ...
    │   ├── cheo
    │   │   ├── Cheo000.wav
    │   │   ├── Cheo001.wav
    │   │   ├── Cheo002.wav
    │   │   └── ...
    │   └── hatxam
    │       ├── Hatxam000.wav
    │       ├── Hatxam001.wav
    │       ├── Hatxam002.wav
    │       └── ...
    ├── test_audio
    ...
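Before running the pipeline, it can help to check that rawdata matches this layout. The following is a minimal sketch, assuming the folder names above and a lowercase `.wav` extension; `index_dataset` is a hypothetical helper for illustration, not part of the repository:

```python
from pathlib import Path

# Class folder names as listed in the layout above (order is our assumption).
CLASSES = ["cailuong", "catru", "chauvan", "cheo", "hatxam"]


def index_dataset(root):
    """Return (path, label_index) pairs for every .wav file under root/<class>/."""
    items = []
    for label, name in enumerate(CLASSES):
        class_dir = Path(root) / name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        # sorted() keeps the order deterministic (Cailuong000.wav, Cailuong001.wav, ...)
        for wav in sorted(class_dir.glob("*.wav")):
            items.append((wav, label))
    return items
```

For the full dataset, `index_dataset("rawdata")` should return 2,500 pairs (5 classes × 500 files).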
    

Workflow

The project's workflow is illustrated in the figure below:

[Figure: project workflow]

a) Audio feature extraction

Audio feature extraction is a key step in audio signal processing, a subfield of signal processing. Different features capture different aspects of sound, and they are commonly grouped by the signal domain in which they are computed:

  • Time domain: extracted directly from the raw audio waveform, e.g. zero-crossing rate, amplitude envelope, RMS energy.

  • Frequency domain: computed after converting the signal from the time domain to the frequency domain with the Fourier transform, e.g. band energy ratio, spectral centroid, spectral flux.

  • Time-frequency representation: shows how the frequency content changes over time. The spectrogram and mel spectrogram are obtained by applying the Short-Time Fourier Transform (STFT) to the time-domain waveform; the constant-Q transform is a related representation computed with frequency-dependent window lengths.
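The time-domain and frequency-domain features listed above can be computed in a few lines of plain Python. This is a minimal sketch for illustration only: the function names are our own, and it uses a naive O(N²) DFT instead of an FFT. In practice, librosa provides optimized versions of these features.

```python
import math


def zero_crossing_rate(frame):
    """Time domain: fraction of consecutive sample pairs whose sign changes."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Time domain: root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def magnitude_spectrum(frame):
    """Magnitudes of a naive DFT for bins 0..N/2 (O(N^2), illustration only)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags


def spectral_centroid(frame, sr):
    """Frequency domain: magnitude-weighted mean frequency in Hz."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sr / len(frame) for k in range(len(mags))]
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0
```

For example, a 64-sample frame of a 1 kHz sine sampled at 8 kHz has an RMS energy of about 0.707 and a spectral centroid of about 1000 Hz.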

In this repo, we extract mel-spectrogram images from the audio files in the dataset and feed them to CNN models, treating the problem as image classification.
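A mel spectrogram is the STFT power spectrum passed through a bank of triangular filters spaced evenly on the mel scale, then usually converted to decibels. The sketch below builds such a filter bank in plain Python and applies it to one frame. It assumes the HTK mel formula. librosa's defaults (Slaney mel scale, area-normalized filters) differ, so the values will not match `librosa.feature.melspectrogram` exactly. The `n_mels`, `n_fft` and `sr` values this project uses are not stated here.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins to n_mels mel bands."""
    fmax = sr / 2 if fmax is None else fmax
    n_bins = n_fft // 2 + 1
    bin_freqs = [k * sr / n_fft for k in range(n_bins)]
    # n_mels + 2 points equally spaced on the mel scale define the triangle edges.
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))  # rising edge
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_frame_db(power_spectrum, bank, eps=1e-10):
    """Apply the filter bank to one power-spectrum frame and convert to dB."""
    energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    return [10.0 * math.log10(max(e, eps)) for e in energies]
```

Stacking `mel_frame_db` outputs for successive STFT frames gives the 2-D array that is saved as a mel-spectrogram image.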

b) CNN models

We propose three models that take the extracted mel-spectrogram images as input. For each image, a model outputs a vector of probabilities over the 5 classes.
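This sketch shows how such an output vector can be produced and read. It assumes each model ends in a 5-unit softmax layer, which is the usual choice for single-label classification. The exact architectures are shown only in the figures. The class order below follows the rawdata folders and is also an assumption:

```python
import math

# Assumed class order (matches the rawdata folder listing).
CLASSES = ["cailuong", "catru", "chauvan", "cheo", "hatxam"]


def softmax(logits):
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(probs):
    """Return (class name, probability) of the most likely class."""
    i = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[i], probs[i]
```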

Model 1

[Figure: Model 1 architecture]

Model 2

[Figure: Model 2 architecture]

Model 3

[Figure: Model 3 architecture]

Ensemble

In the inference phase, we propose late fusion of probabilities, referred to as PROD fusion. Let $\boldsymbol{P_s} = [p_{s1}, p_{s2}, ..., p_{sC}]$ be the predicted probabilities of the $s^{th}$ of the $S$ networks evaluated, where $C$ is the number of classes. The predicted probabilities after PROD fusion are obtained by:

$$\boldsymbol{P_{prod}}=[p_1, p_2, ..., p_C], p_i=\frac{1}{S}{\displaystyle \prod_{s=1}^{S} p_{si}}, 1 \le i \le C$$

Finally, the predicted label $\hat{y}$ is determined by: $\hat{y}=\arg \max{\boldsymbol{P_{prod}}}$
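The two formulas above translate directly into code. This is a minimal plain-Python sketch of PROD fusion as defined here. The function names are our own, and the app may implement the fusion differently (for example with NumPy arrays):

```python
import math


def prod_fusion(prob_vectors):
    """PROD fusion: p_i = (1/S) * prod_s p_si for every class i.

    prob_vectors: list of S probability vectors, each of length C.
    """
    S = len(prob_vectors)
    C = len(prob_vectors[0])
    if any(len(p) != C for p in prob_vectors):
        raise ValueError("every model must output C probabilities")
    return [math.prod(p[i] for p in prob_vectors) / S for i in range(C)]


def predict_label(prob_vectors):
    """Predicted label: index of the largest fused probability (arg max)."""
    fused = prod_fusion(prob_vectors)
    return max(range(len(fused)), key=fused.__getitem__)
```

Because the factor $1/S$ is the same for every class, it does not change the arg max. It only rescales the fused scores, which in general no longer sum to 1. A product rule also means that a class given near-zero probability by any single model is effectively ruled out.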

Tutorial

To run this project, follow these steps:

  • Install the required libraries and dependencies:
    numpy
    librosa
    tensorflow
    matplotlib
    pydub
    scikit-learn (imported as sklearn)
    seaborn
    streamlit (for the web app)

  • Note: to avoid errors when using pydub.AudioSegment locally, download ffmpeg and add it to the PATH environment variable. Tutorial (Windows): https://phoenixnap.com/kb/ffmpeg-windows

  • Configure your own parameters in config.py. The directory settings already match the project's folder structure, so changing them is not recommended.

  • Run processing.py. Afterwards, the mel-images folder contains all mel-spectrogram images extracted from the 5 classes, and the dataset folder contains train/val/test folders of images for the 5 classes. This completes dataset construction.

  • In build/train_model.py, set model_index on the last line to 1, 2, or 3 to train model1, model2, or model3, then run the file. The best model is saved as an .h5 file in the model folder. This completes training.

  • Run the Streamlit app in app/app.py, upload new audio files, and get predictions. Audio files uploaded through the app are saved in the audio_from_user folder. Start the app with:

streamlit run app/app.py
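The train/val/test split produced by processing.py can be sketched as a stratified per-class shuffle. This is a minimal sketch, not the code in processing.py. The 80/10/10 ratios, the fixed seed and the `split_per_class` helper are assumptions for illustration:

```python
import random


def split_per_class(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Stratified train/val/test split of (path, label) pairs.

    The ratios and seed are illustrative assumptions, not the project's settings.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)  # fixed seed -> reproducible split
    by_label = {}
    for path, label in items:
        by_label.setdefault(label, []).append((path, label))
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        splits["train"] += group[:n_train]
        splits["val"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]  # remainder goes to test
    return splits
```

With 500 images per class, this gives 400/50/50 images per class. If several images are cut from the same recording, splitting by recording rather than by image keeps near-duplicate images from leaking across splits.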

Some images

[Image]

[Image]

References

[1] Vietnam Traditional Music (5 genres), https://www.kaggle.com/datasets/homata123/vntm-for-building-model-5-genres.

[2] Librosa Library, https://librosa.org/doc/latest/index.html

[3] TensorFlow, https://www.tensorflow.org/

[4] Chu Ba Thanh, Trinh Van Loan, Dao Thi Le Thuy, Automatic Identification of Some Vietnamese Folk Songs Cheo and Quanho Using Deep Neural Networks, https://vjs.ac.vn/index.php/jcc/article/view/15961

[5] Valerio Velardo, The Sound of AI (YouTube channel), https://www.youtube.com/@ValerioVelardoTheSoundofAI

[6] Dipti Joshi, Jyoti Pareek, Pushkar Ambatkar, Comparative Study of MFCC and Mel Spectrogram for Raga Classification Using CNN, https://indjst.org/articles/comparative-study-of-mfcc-and-mel-spectrogram-for-raga-classification-using-cnn

[7] Loris Nanni et al., Ensemble of Convolutional Neural Networks to Improve Animal Audio Classification, https://asmp-eurasipjournals.springeropen.com/articles/10.1186/s13636-020-00175-3
