Audio-MLP

MLP-based models for learning audio representations. Submission for HEAR-2021@NeurIPS'21.

Setup

pip install -e 'git+https://github.com/ID56/HEAR-2021-Audio-MLP.git#egg=hearaudiomlp'

Usage

import torch

from hearaudiomlp.kwmlp import load_model, get_timestamp_embeddings, get_scene_embeddings  # or from hearaudiomlp.audiomlp

model = load_model("checkpoints/kwmlp.pth")

# Dummy batch: b clips of ms milliseconds each, sampled at sr Hz
b, ms, sr = 2, 1000, 16000
dummy_input = torch.randn(b, int(sr * ms / 1000))

embeddings, timestamps = get_timestamp_embeddings(dummy_input, model)
scene_embeddings = get_scene_embeddings(dummy_input, model)
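
For orientation, the shapes these calls should return can be worked out from the HEAR 2021 common API and the embedding sizes listed under Models. The sketch below is plain Python with no dependency on the package. `EMBEDDING_SIZES` and `expected_shapes` are hypothetical names introduced here, and the number of timestamps is left symbolic because it depends on how the model frames the audio.

```python
# Embedding sizes copied from the Models table: (timestamp, scene).
EMBEDDING_SIZES = {
    "kwmlp": (64, 1024),
    "audiomae": (8, 1584),
}


def expected_shapes(model_name, batch_size, n_timestamps="n_timestamps"):
    """Return the output shapes expected under the HEAR 2021 API.

    n_timestamps stays symbolic by default, since it depends on the
    model's framing of the input audio.
    """
    timestamp_dim, scene_dim = EMBEDDING_SIZES[model_name]
    return {
        "embeddings": (batch_size, n_timestamps, timestamp_dim),
        "timestamps": (batch_size, n_timestamps),
        "scene_embeddings": (batch_size, scene_dim),
    }


for name, shape in expected_shapes("kwmlp", batch_size=2).items():
    print(name, shape)
```

Comparing these against `embeddings.shape`, `timestamps.shape` and `scene_embeddings.shape` is a quick way to confirm that a checkpoint loaded as intended.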

Models

Model Name  # Params†  GFLOPS*†  Sampling Rate (Hz)  Hop Length  Timestamp Embedding  Scene Embedding  Location
kwmlp       424K       0.034     16000               10 ms       64                   1024             kwmlp (1.7 MB)
audiomae    213K       0.023     16000               10 ms       8                    1584             audiomae (0.9 MB)

† Only the encoder, which is used for generating embeddings, is counted.
* FLOPS cannot be read off a model as directly as parameter counts can, but they can be estimated with facebookresearch/fvcore. The FLOPS reported here are for a single input spectrogram (a tensor of shape (1, 40, 98)).
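
The 98 frames in the input shape (1, 40, 98) are consistent with a 1 s clip at 16 kHz and a 10 ms hop, if one assumes a 30 ms analysis window and no padding. The window length and the no-padding framing are assumptions made for this sketch and are not stated in this README. `num_frames` is a hypothetical helper.

```python
def num_frames(n_samples, sample_rate, win_ms, hop_ms):
    """Count the full analysis windows that fit in a signal without padding."""
    win = sample_rate * win_ms // 1000  # window length in samples
    hop = sample_rate * hop_ms // 1000  # hop length in samples
    if n_samples < win:
        return 0
    return (n_samples - win) // hop + 1


# 1 s of 16 kHz audio, 10 ms hop (from the Models table), 30 ms window (assumed).
print(num_frames(16000, 16000, win_ms=30, hop_ms=10))  # 98
```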

Creative Commons License
The trained checkpoints are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, as per HEAR-2021 requirements. You may also download them from drive: [ kwmlp | audiomae ].
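
The checkpoint sizes listed under Models can be roughly cross-checked against the parameter counts, assuming the weights are stored as 32-bit floats (4 bytes each) and ignoring file overhead. That storage format is an assumption, not something stated here, and the parameter counts cover the encoder only. `approx_size_mb` is a hypothetical helper.

```python
def approx_size_mb(n_params, bytes_per_param=4):
    """Approximate checkpoint size in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


# Parameter counts taken from the Models table.
for name, n_params in [("kwmlp", 424_000), ("audiomae", 213_000)]:
    print(f"{name}: ~{approx_size_mb(n_params):.1f} MB")
```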

Notes

  • All models were trained in a standard Kaggle environment: a single 16 GiB NVIDIA Tesla P100, CUDA 11.0, cuDNN 8.0.5, Python 3.7.10.
  • KW-MLP was trained on Google Speech Commands V2-35, and its weights are ported directly from the paper that introduced it [1].
  • AudioMAE is an adaptation of KW-MLP, trained on the training splits of the HEAR 2021 open tasks.
  • Both models are built primarily from gated MLPs [2].
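
For readers unfamiliar with gated MLPs, the core operation in [2] is the spatial gating unit: the input channels are split into two halves, one half is mixed linearly across the time (token) axis, and the result gates the other half elementwise. The sketch below is a dependency-free illustration using plain Python lists. It omits the normalization that [2] applies before the projection, and it is not the code used by KW-MLP or AudioMAE. All names are hypothetical.

```python
def spatial_gating_unit(x, weight, bias):
    """Apply s(Z) = Z1 * (W @ Z2 + b) to x of shape (n_tokens, channels).

    weight has shape (n_tokens, n_tokens) and mixes information across
    tokens; bias has shape (n_tokens,). Normalization of Z2 is omitted.
    """
    n_tokens, channels = len(x), len(x[0])
    half = channels // 2
    z1 = [row[:half] for row in x]  # passed through unchanged
    z2 = [row[half:] for row in x]  # projected along the token axis

    out = []
    for i in range(n_tokens):
        gate = [
            sum(weight[i][j] * z2[j][c] for j in range(n_tokens)) + bias[i]
            for c in range(half)
        ]
        out.append([u * g for u, g in zip(z1[i], gate)])
    return out


# With W = 0 and b = 1 (the initialization suggested in [2]) the gate is
# all ones, so the unit returns the first half of the channels unchanged.
x = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
ones_b = [1.0, 1.0]
print(spatial_gating_unit(x, zero_w, ones_b))  # [[1.0, 2.0], [5.0, 6.0]]
```

Because the projection acts along the token axis, each output frame can draw on every other frame in the clip, which is how this block mixes temporal context without attention.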

References

@misc{morshed2021attentionfree,
      title   = {Attention-Free Keyword Spotting}, 
      author  = {Mashrur M. Morshed and Ahmad Omar Ahsan},
      year    = {2021},
      eprint  = {2110.07749},
      archivePrefix = {arXiv},
      primaryClass  = {cs.LG}
}
@misc{liu2021pay,
      title  = {Pay Attention to MLPs}, 
      author = {Hanxiao Liu and Zihang Dai and David R. So and Quoc V. Le},
      year   = {2021},
      eprint = {2105.08050},
      archivePrefix = {arXiv},
      primaryClass  = {cs.LG}
}
