btc-ismir19's Introduction

A Bi-Directional Transformer for Musical Chord Recognition

This repository contains the source code for the paper "A Bi-Directional Transformer for Musical Chord Recognition" (ISMIR19).

Requirements

  • pytorch >= 1.0.0
  • numpy >= 1.16.2
  • pandas >= 0.24.1
  • pyrubberband >= 0.3.0
  • librosa >= 0.6.3
  • pyyaml >= 3.13
  • mir_eval >= 0.5
  • pretty_midi >= 0.2.8

File descriptions

  • audio_dataset.py : loads the data, converting label files to chord labels and mp3 files to constant-Q transform features.
  • btc_model.py : contains the PyTorch implementation of BTC.
  • train.py : training script.
  • crf_model.py : contains the PyTorch implementation of Conditional Random Fields (CRFs).
  • baseline_models.py : contains the code for the baseline models.
  • train_crf.py : training script for the CRFs.
  • run_config.yaml : contains the hyperparameters and paths that are needed.
  • test.py : recognizes chords from audio files.

Using BTC: Recognizing chords from files in an audio directory

Using BTC from command line

$ python test.py --audio_dir audio_folder --save_dir save_folder --voca False
  • audio_dir : a folder of audio files for chord recognition (default: './test')
  • save_dir : a folder for saving recognition results (default: './test')
  • voca : False selects the major/minor label type, and True selects the large-vocabulary label type (default: False)
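
Passing a literal False on the command line is a common pitfall in Python, because `bool("False")` evaluates to `True`. The sketch below shows one way the three options above could be parsed so that `--voca False` really yields `False`. It is a hypothetical stand-in, not the parser used in test.py; the helper names `str_to_bool` and `build_parser` are assumptions made for illustration.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret 'True'/'False' (any case) as a boolean; reject anything else."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented options and defaults.
    parser = argparse.ArgumentParser(description="Stand-in for the test.py CLI")
    parser.add_argument("--audio_dir", default="./test")
    parser.add_argument("--save_dir", default="./test")
    parser.add_argument("--voca", type=str_to_bool, default=False)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["--audio_dir", "audio_folder", "--voca", "False"])
print(args.audio_dir, args.save_dir, args.voca)
```

With `type=bool` instead of `str_to_bool`, the same command would silently enable the large vocabulary, since any non-empty string is truthy.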

The results are saved as .lab files and MIDI files.
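
Chord .lab files are commonly plain text with one segment per line: a start time, an end time, and a chord label, separated by whitespace. Assuming that layout, a minimal parser could look like the sketch below. The `ChordSegment` type, the `parse_lab` function, and the example text are illustrative assumptions, not actual output of test.py.

```python
from typing import List, NamedTuple


class ChordSegment(NamedTuple):
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    label: str    # chord label, e.g. "C" or "A:min"; "N" usually means no chord


def parse_lab(text: str) -> List[ChordSegment]:
    """Parse 'start end label' lines, skipping blank lines."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        start, end, label = line.split(maxsplit=2)
        segments.append(ChordSegment(float(start), float(end), label))
    return segments


# Made-up example content, used only to exercise the parser.
example = "0.000 2.500 N\n2.500 5.000 C\n5.000 7.500 A:min\n"
segments = parse_lab(example)
for seg in segments:
    print(f"{seg.start:.3f} -> {seg.end:.3f}: {seg.label}")
```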

Attention Map

The figures show the attention probabilities of self-attention layers 1, 3, 5, and 8, respectively. These layers were chosen because they best illustrate the different characteristics of the layers. The input audio is the song "Just A Girl" (0m30s ~ 0m40s) by No Doubt from UsPop2002, which was part of the evaluation data.

Data

We used the Isophonics[1], Robbie Williams[2], and UsPop2002[3] datasets, which consist of chord label files. Due to copyright issues, these datasets do not include audio files. The audio files used in this work were collected from online music service providers.

[1] http://isophonics.net/datasets

[2] B. Di Giorgi, M. Zanoni, A. Sarti, and S. Tubaro. Automatic chord recognition based on the probabilistic modeling of diatonic modal harmony. In Proc. of the 8th International Workshop on Multidimensional Systems, Erlangen, Germany, 2013.

[3] https://github.com/tmc323/Chord-Annotations

Reference

Comments

  • Comments on the code are always welcome.

btc-ismir19's People

Contributors

ckycky3, jayg996

btc-ismir19's Issues

Structure of sound source and annotation data

This is not a bug report.
I'm trying to retrain BTC-ISMIR2019 with train.py, but I'm not sure about the directory structure under /data/music/chord_recognition/.
Could you tell me where to put the mp3 and lab files?
P.S.
I already have the audio data (.wav or .mp3) and annotation data (.lab) for Isophonics, UsPop2002, and Robbie Williams.

Extracting embeddings

Hi, thanks for making this! I tested it, it's amazing. I wonder if it's possible to extract vectors for the wave files and if so is there example code for it?

Requires 1 positional argument 'Loader'

May I know which version of Python you used? I'm getting the following error:
Traceback (most recent call last):
  File "test.py", line 22, in <module>
    config = HParams.load("run_config.yaml")
  File "C:\Users\LENOVO\AppData\Local\Programs\Python\Python37\BTC-ISMIR19\utils\hparams.py", line 29, in load
    return cls(**yaml.load(f))
TypeError: load() missing 1 required positional argument: 'Loader'
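
This error is typical of PyYAML 6.0 and later, where `yaml.load` requires an explicit `Loader` argument, so it depends on the PyYAML version rather than the Python version. A minimal sketch of a loader that avoids it is shown below, assuming the config contains only plain YAML types. The `HParams` class here is a hypothetical stand-in for the one in utils/hparams.py, it reads from a stream rather than a file path, and the config text is made up.

```python
import io

import yaml


class HParams:
    """Hypothetical stand-in for the HParams class in utils/hparams.py."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    @classmethod
    def load(cls, stream):
        # safe_load builds only plain Python types and needs no Loader argument.
        # Equivalent explicit form: yaml.load(stream, Loader=yaml.SafeLoader)
        return cls(**yaml.safe_load(stream))


# Made-up config text, used only to exercise the loader.
config = HParams.load(io.StringIO("model:\n  num_layers: 8\nlearning_rate: 0.0001\n"))
print(config.model["num_layers"], config.learning_rate)
```

In the repository itself, the corresponding change would be to the `yaml.load(f)` call on line 29 of utils/hparams.py, as reported in the traceback.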

Hardware requirement

Hello, really interesting work! I am curious which GPUs, and how many of them, are required for training BTC. And how long does training take to converge? Thank you.

dataset format

Could you please explain how to arrange the dataset in order to run your code?

Running on a non-CUDA device.

In case it helps anyone else, I got this error while running test.py on a machine w/o CUDA support.

Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I fixed the problem by replacing line 29 in test.py:

checkpoint = torch.load(model_file)

with

checkpoint = torch.load(model_file, map_location=lambda storage, loc: storage)
