
allthingsllm / mert


This project forked from yizhilll/mert


Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".

License: Apache License 2.0



MERT

This is the official implementation of the paper "MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training".

Training

MERT training is implemented with fairseq. We clone the fairseq repository inside this repository, implement MERT as a fairseq example, and provide a symbolic link to the code at ./mert_fairseq.

Environment Setup

The training of MERT requires:

  • fairseq & pytorch for training (required)
  • nnAudio for on-the-fly CQT inference (required)
  • apex for half-precision training (optional)
  • nccl for multi-device training (optional)
  • fairscale for FSDP and CPU offloading (optional)
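As a quick sanity check before or after running the setup script, the sketch below reports which of these dependencies are importable in the current Python environment. It assumes the import names torch, nnAudio, apex and fairscale for the packages listed above, and uses PyTorch's torch.distributed.is_nccl_available() to detect NCCL support. check_environment and is_importable are hypothetical helpers, not part of this repository.

```python
import importlib.util

# Assumed import names for the packages listed above (pip names may differ).
REQUIRED = {"fairseq": "fairseq", "pytorch": "torch", "nnAudio": "nnAudio"}
OPTIONAL = {"apex": "apex", "fairscale": "fairscale"}


def is_importable(module_name):
    """True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


def check_environment():
    """Return {package: available} for the required and optional dependencies."""
    report = {label: is_importable(module)
              for label, module in {**REQUIRED, **OPTIONAL}.items()}
    # NCCL comes with CUDA builds of PyTorch, so only query it when torch exists.
    if report["pytorch"]:
        import torch.distributed as dist
        report["nccl"] = bool(dist.is_available() and dist.is_nccl_available())
    else:
        report["nccl"] = False
    return report
```

For example, printing check_environment() shows at a glance whether a missing entry is one of the required packages (fairseq, pytorch, nnAudio) or only an optional one.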

You can use the script ./scripts/environment_setup.sh to set up the Python environment from scratch. All the relevant folders will be placed under the user-specified $MAP_PROJ_DIR folder.

Data Preparation

For more details on data preparation, refer to HuBERT.

In general, you need to prepare two things:

  • DATA_DIR=${MAP_PROJ_DIR}/data/audio_tsv: a folder containing a train.tsv and a valid.tsv file. In each file, the first line specifies the root path of the audio files and each subsequent line gives the relative path of one audio file.
  • LABEL_ROOT_DIR=${MAP_PROJ_DIR}/data/labels: a folder containing all the discrete tokens (the training targets), which must be prepared before training. These can be K-means or RVQ-VAE tokens.
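The manifest format is only described briefly above. The sketch below assumes the fairseq/HuBERT convention (the one produced by fairseq's examples/wav2vec/wav2vec_manifest.py), where each line after the root path is a relative path, a tab, and the clip length in samples. It only handles PCM WAV files, read with Python's standard wave module. write_manifest and wav_num_samples are hypothetical helpers; in practice you may prefer the fairseq script.

```python
import os
import wave


def wav_num_samples(path):
    """Number of samples (frames) per channel in a PCM WAV file."""
    with wave.open(path, "rb") as f:
        return f.getnframes()


def write_manifest(audio_root, out_path, extensions=(".wav",)):
    """Write root on line 1, then one `relpath<TAB>num_samples` line per file.

    Returns the number of audio files written.
    """
    audio_root = os.path.abspath(audio_root)
    rel_paths = []
    for dirpath, _, filenames in os.walk(audio_root):
        for name in filenames:
            if name.lower().endswith(extensions):
                full = os.path.join(dirpath, name)
                rel_paths.append(os.path.relpath(full, audio_root))
    # Sort for a deterministic order, since label files are aligned line by line.
    rel_paths.sort()
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(audio_root + "\n")
        for rel in rel_paths:
            n = wav_num_samples(os.path.join(audio_root, rel))
            out.write(f"{rel}\t{n}\n")
    return len(rel_paths)
```

Calling it once per split, with separate training and validation audio folders, produces train.tsv and valid.tsv for DATA_DIR; how you split the audio is up to you.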

Start Training

After the environment is set up, you can start training with the following scripts:

# for MERT-95M
bash scripts/run_training.sh 0 dummy MERT_RVQ-VAE_CQT_95M

# for MERT-330M
bash scripts/run_training.sh 0 dummy MERT_RVQ-VAE_CQT_330M
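Because training expects the folders described under Data Preparation, a pre-flight check before launching the scripts can save a failed job. The sketch below only checks that the two manifests exist and that the label folder is non-empty; it does not validate file contents or label naming. check_data_layout is a hypothetical helper, and the paths are the ones given above.

```python
import os
from pathlib import Path


def check_data_layout(map_proj_dir=None):
    """Return a list of problems with the expected data folders (empty if OK)."""
    root = map_proj_dir or os.environ.get("MAP_PROJ_DIR")
    if not root:
        raise ValueError("set MAP_PROJ_DIR or pass map_proj_dir")
    root = Path(root)
    data_dir = root / "data" / "audio_tsv"   # DATA_DIR
    label_dir = root / "data" / "labels"     # LABEL_ROOT_DIR
    problems = []
    for split in ("train", "valid"):
        manifest = data_dir / f"{split}.tsv"
        if not manifest.is_file():
            problems.append(f"missing manifest: {manifest}")
    if not label_dir.is_dir() or not any(label_dir.iterdir()):
        problems.append(f"no label files found in {label_dir}")
    return problems
```

An empty list means the basic layout is in place; otherwise each string names the missing path.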

Inference

We use the Huggingface models for inference and evaluation. Taking the RVQ-VAE 95M MERT model as an example, the following script shows how to load MERT and extract representations with it:

python MERT/scripts/MERT_demo_inference.py
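The same idea in a minimal sketch, for readers who do not have the demo script at hand. It is not the contents of MERT_demo_inference.py. It assumes the Hugging Face repository id m-a-p/MERT-v1-95M and loads it with transformers' AutoModel and Wav2Vec2FeatureExtractor using trust_remote_code=True. Audio is resampled with torchaudio when its rate differs from the processor's. extract_mert_layer_features is a hypothetical helper name, and the heavy imports are deferred so the module loads without them.

```python
MODEL_ID = "m-a-p/MERT-v1-95M"  # assumed Hugging Face id of the MERT-v1-95M checkpoint


def extract_mert_layer_features(waveform, sampling_rate, model_id=MODEL_ID):
    """Return a (num_hidden_states, time_steps, hidden_dim) tensor for mono audio.

    `waveform` is a 1-D float array or tensor sampled at `sampling_rate` Hz.
    """
    # Deferred imports: torch, torchaudio and transformers are only needed here.
    import torch
    import torchaudio.functional as AF
    from transformers import AutoModel, Wav2Vec2FeatureExtractor

    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    processor = Wav2Vec2FeatureExtractor.from_pretrained(model_id, trust_remote_code=True)
    model.eval()

    audio = torch.as_tensor(waveform, dtype=torch.float32)
    target_sr = processor.sampling_rate
    if sampling_rate != target_sr:
        # Resample to the rate the checkpoint was trained with.
        audio = AF.resample(audio, orig_freq=sampling_rate, new_freq=target_sr)

    inputs = processor(audio.numpy(), sampling_rate=target_sr, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # hidden_states holds one (batch=1, time, dim) tensor per hidden state,
    # including the embedding output; stack them and drop the batch axis.
    return torch.stack(outputs.hidden_states).squeeze(1)
```

Averaging the result over time with features.mean(dim=1) gives one clip-level vector per layer. For a 12-layer base model, there are 13 such vectors (the embedding output plus one per transformer layer), which can then be probed or combined for downstream tasks.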

Checkpoints

Huggingface Checkpoint

Our Huggingface Transformers checkpoints for convenient inference are uploaded to the m-a-p project page.

  • MERT-v0: The base (95M) model trained with K-means acoustic teacher and musical teacher.
  • MERT-v0-public: The base (95M) model trained with K-means acoustic teacher and musical teacher using the public music4all training data.
  • MERT-v1-95M: The base (95M) model trained with RVQ-VAE acoustic teacher and musical teacher.
  • MERT-v1-330M: The large (330M) model trained with RVQ-VAE acoustic teacher and musical teacher.
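These checkpoints differ in model size, acoustic teacher and, for MERT-v0-public, training data. A small lookup table like the one below can help pick one programmatically. It assumes each checkpoint is published under the Hugging Face id m-a-p/&lt;name&gt;, and find_checkpoints is a hypothetical helper.

```python
# Checkpoints listed above, keyed by their assumed Hugging Face repository id.
# training_data is None where this README does not state the data source.
CHECKPOINTS = {
    "m-a-p/MERT-v0": {"params": "95M", "acoustic_teacher": "K-means", "training_data": None},
    "m-a-p/MERT-v0-public": {"params": "95M", "acoustic_teacher": "K-means",
                             "training_data": "music4all (public)"},
    "m-a-p/MERT-v1-95M": {"params": "95M", "acoustic_teacher": "RVQ-VAE", "training_data": None},
    "m-a-p/MERT-v1-330M": {"params": "330M", "acoustic_teacher": "RVQ-VAE", "training_data": None},
}


def find_checkpoints(params=None, acoustic_teacher=None):
    """Return repo ids matching the requested size and/or acoustic teacher."""
    return [
        repo for repo, info in CHECKPOINTS.items()
        if (params is None or info["params"] == params)
        and (acoustic_teacher is None or info["acoustic_teacher"] == acoustic_teacher)
    ]
```

A returned id can be passed directly to AutoModel.from_pretrained(repo_id, trust_remote_code=True).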

Fairseq Checkpoint

We will also provide the corresponding fairseq checkpoints for continual training or further modification (coming soon).

Citation

@misc{li2023mert,
      title={MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training}, 
      author={Yizhi Li and Ruibin Yuan and Ge Zhang and Yinghao Ma and Xingran Chen and Hanzhi Yin and Chenghua Lin and Anton Ragni and Emmanouil Benetos and Norbert Gyenge and Roger Dannenberg and Ruibo Liu and Wenhu Chen and Gus Xia and Yemin Shi and Wenhao Huang and Yike Guo and Jie Fu},
      year={2023},
      eprint={2306.00107},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
