jayg996 / btc-ismir19

"A Bi-Directional Transformer for Musical Chord Recognition", accepted at ISMIR 2019

License: MIT License

Python 100.00%

btc-ismir19's Introduction

A Bi-Directional Transformer for Musical Chord Recognition

This repository contains the source code for the paper "A Bi-Directional Transformer for Musical Chord Recognition" (ISMIR 2019).

Requirements

pytorch >= 1.0.0
numpy >= 1.16.2
pandas >= 0.24.1
pyrubberband >= 0.3.0
librosa >= 0.6.3
pyyaml >= 3.13
mir_eval >= 0.5
pretty_midi >= 0.2.8

File descriptions

audio_dataset.py : loads the data, converting label files to chord labels and mp3 files to constant-Q transform (CQT) features.
btc_model.py : contains the PyTorch implementation of BTC.
train.py : for training.
crf_model.py : contains the PyTorch implementation of Conditional Random Fields (CRFs).
baseline_models.py : contains the code for the baseline models.
train_crf.py : for training CRFs.
run_config.yaml : includes the hyperparameters and paths that are needed.
test.py : for recognizing chords from audio files.

Using BTC : Recognizing chords from files in an audio directory

Using BTC from the command line

$ python test.py --audio_dir audio_folder --save_dir save_folder --voca False

audio_dir : a folder of audio files for chord recognition (default: './test')
save_dir : a folder for saving recognition results (default: './test')
voca : False selects the major/minor label type, and True selects the large-vocabulary label type (default: False)
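
The three options above could be parsed along the following lines. This is a minimal sketch, not the repository's actual test.py: the helper names str2bool and build_parser are hypothetical, and only the option names and defaults are taken from the list above. The explicit string-to-bool conversion matters because argparse's type=bool would treat the string "False" as True.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert 'True'/'False'-style strings to a bool (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the documented options and defaults."""
    parser = argparse.ArgumentParser(description="Chord recognition (illustrative sketch)")
    parser.add_argument("--audio_dir", type=str, default="./test")
    parser.add_argument("--save_dir", type=str, default="./test")
    parser.add_argument("--voca", type=str2bool, default=False)
    return parser


# Same arguments as the command line shown above.
args = build_parser().parse_args(
    ["--audio_dir", "audio_folder", "--save_dir", "save_folder", "--voca", "False"]
)
print(args.audio_dir, args.save_dir, args.voca)
```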

The resulting files are .lab chord annotation files and MIDI files.
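
A .lab result can be read back with a few lines of Python. The sketch below assumes the common chord-annotation layout of one segment per line, "start_time end_time chord_label", separated by whitespace; the exact layout written by test.py should be checked against a real output file. The ChordSegment type, the parse_lab function and the example content are all made up for illustration.

```python
from typing import List, NamedTuple


class ChordSegment(NamedTuple):
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    label: str    # chord label, e.g. "C" or "A:min"


def parse_lab(text: str) -> List[ChordSegment]:
    """Parse whitespace-separated 'start end label' lines (assumed layout)."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        segments.append(ChordSegment(float(parts[0]), float(parts[1]), parts[2]))
    return segments


# Invented example content, not real model output.
example = """0.000 1.500 N
1.500 3.200 C
3.200 5.000 A:min
"""

segments = parse_lab(example)
for seg in segments:
    print(f"{seg.start:.3f}-{seg.end:.3f}: {seg.label}")
```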

Attention Map

The figures show the attention probabilities of self-attention layers 1, 3, 5 and 8, respectively. These layers were chosen because they best illustrate the different characteristics of the layers. The input audio is the song "Just A Girl" (0m30s ~ 0m40s) by No Doubt from UsPop2002, which was part of the evaluation data.

Data

We used the Isophonics[1], Robbie Williams[2] and UsPop2002[3] datasets, which consist of chord label files. Due to copyright issues, these datasets do not include audio files. The audio files used in this work were collected from online music service providers.

[1] http://isophonics.net/datasets

[2] B. Di Giorgi, M. Zanoni, A. Sarti, and S. Tubaro. Automatic chord recognition based on the probabilistic modeling of diatonic modal harmony. In Proc. of the 8th International Workshop on Multidimensional Systems, Erlangen, Germany, 2013.

[3] https://github.com/tmc323/Chord-Annotations

Reference

PyTorch implementation of Transformer and CRF: https://github.com/kolloldas/torchnlp

Comments

Comments on the code are always welcome.

btc-ismir19's Issues

Structure of sound source and annotation data

This is not a bug report.
I'm trying to retrain BTC-ISMIR2019 (train.py), but I'm not sure about the directory structure under /data/music/chord_recognition/.
Could you tell me where to put the mp3 and lab files?
P.S.
I already have the audio data (.wav or .mp3) and annotation data (.lab) for Isophonics, UsPop2002, and Robbie Williams.

How to run this code on the BPS-FH dataset?

Hi, thanks for your work! It seems that this code is for audio-to-chord experiments. How can I run it on the BPS-FH dataset (i.e. symbolic music chord recognition)?

Extracting embeddings

Hi, thanks for making this! I tested it, and it's amazing. I wonder if it's possible to extract embedding vectors for the wave files, and if so, is there example code for it?

Requires 1 positional argument 'Loader'

May I know which version of Python you used? I'm getting the following error:
Traceback (most recent call last):
  File "test.py", line 22, in <module>
    config = HParams.load("run_config.yaml")
  File "C:\Users\LENOVO\AppData\Local\Programs\Python\Python37\BTC-ISMIR19\utils\hparams.py", line 29, in load
    return cls(**yaml.load(f))
TypeError: load() missing 1 required positional argument: 'Loader'
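
This TypeError is what PyYAML 6.0 and later raise when yaml.load is called without a Loader, so the Python version is probably not the cause. One likely fix is to pass a Loader explicitly (or use yaml.safe_load) in the hparams.py line shown in the traceback; another is to install an older PyYAML. The sketch below shows both call forms on an in-memory string. The config keys are invented for illustration, and it assumes the real run_config.yaml contains only plain YAML types, which the safe loader requires.

```python
import yaml

# Stand-in for the contents of a config file; the keys are invented for illustration.
config_text = """
model:
  num_layers: 8
  dropout: 0.2
"""

# yaml.safe_load needs no Loader argument and only builds plain Python types.
config = yaml.safe_load(config_text)

# Equivalent explicit form: pass the Loader that newer PyYAML versions require.
config_explicit = yaml.load(config_text, Loader=yaml.SafeLoader)

print(config["model"]["num_layers"])
```

Applied to the traceback above, the call would become yaml.load(f, Loader=yaml.SafeLoader) or yaml.safe_load(f).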

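I fixed the checkpoint loading problem by replacing line 29 in test.py:

checkpoint = torch.load(model_file)

with

checkpoint = torch.load(model_file, map_location=lambda storage, loc: storage)

The map_location function keeps the loaded tensors on the CPU, so the checkpoint can be loaded on a machine without CUDA.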