bhigy / zr-2021vg_baseline

Baselines for the Zero-Resource Speech Challenge using Visually-Grounded Models of Spoken Language, 2021 edition

Home Page: https://zerospeech.com/

License: Apache License 2.0

Python 100.00%
weakly-supervised-learning speech-processing deep-neural-networks pytorch multimodal-learning representation-learning visually-grounded-speech spokencoco librispeech challenge

ZeroSpeech2021-VG — Baselines

This repository contains the code to run the baselines for the Zero-Resource Speech Challenge using Visually-Grounded Models of Spoken Language, 2021 edition.

Overview of the baselines

Our baselines are directly inspired by the audio-only baselines used in the ZeroSpeech 2021 challenge. The main difference is that we use a visually grounded (VG) model to learn our speech representations. Those representations are then quantized with K-means clustering, and the resulting discrete units are fed to the language model. The low-budget baseline replaces the contrastive predictive coding (CPC) model entirely with the VG model. The high-budget baseline instead adds the VG model on top of the CPC model.

Step           | Low-budget baseline | High-budget baseline
Input          | MFCCs               | CPC-small
Acoustic model | VG model            | VG model
Quantization   | K-means             | K-means
Language model | BERT small          | BERT large
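
To make the quantization step more concrete, here is a minimal sketch of how frame-level representations could be turned into discrete units with K-means before being passed to a language model. It is an illustration only and not the code of this repository: the function names (`fit_kmeans`, `quantize`), the toy two-dimensional frames, and the pure-Python implementation are all assumptions made for the example. A real pipeline would operate on the VG model's high-dimensional activations and would most likely use an optimized clustering library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(frame, centroids):
    """Index of the centroid closest to the given frame."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(frame, centroids[k]))


def fit_kmeans(frames, n_clusters, n_iters=20, seed=0):
    """Plain Lloyd's algorithm (hypothetical helper, for illustration)."""
    rng = random.Random(seed)
    # Initialize centroids from randomly chosen frames.
    centroids = [list(f) for f in rng.sample(frames, n_clusters)]
    for _ in range(n_iters):
        # Assignment step: group frames by their nearest centroid.
        buckets = [[] for _ in range(n_clusters)]
        for frame in frames:
            buckets[nearest_centroid(frame, centroids)].append(frame)
        # Update step: move each centroid to the mean of its frames.
        for k, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def quantize(frames, centroids):
    """Map each frame to the ID of its nearest centroid (a discrete unit)."""
    return [nearest_centroid(f, centroids) for f in frames]


# Toy "representations": three frames near the origin, two near (5, 5).
frames = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [0.05, 0.05]]
centroids = fit_kmeans(frames, n_clusters=2)
units = quantize(frames, centroids)
print(units)  # one unit ID per frame; nearby frames share an ID
```

The resulting sequence of unit IDs plays the role of a pseudo-text: the language model is trained on such sequences instead of on words or characters.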

How to use

  1. Installation
  2. Datasets
  3. Low-budget baseline: MFCCs + VG + K-means + LM (1 GPU)
  4. High-budget baseline: CPC + VG + K-means + LM (1 to 32 GPUs)
  5. Evaluation
  6. Baselines' results

Some useful reads

If you want to learn more about the approach adopted in the ZeroSpeech 2021 challenge, we highly recommend reading:

[1] Description of the challenge: Nguyen, T. A., de Seyssel, M., Rozé, P., Rivière, M., Kharitonov, E., Baevski, A., Dunbar, E., & Dupoux, E. (2020). The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling. http://arxiv.org/abs/2011.11588

[2] Website of the challenge: https://zerospeech.com/2021/news.html

[3] First description of the visually grounded models: Chrupała, G. (2019). Symbolic Inductive Bias for Visually Grounded Learning of Spoken Language. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 6452–6462. https://doi.org/10.18653/v1/P19-1647

[4] Second description of the visually grounded models: Higy, B., Elliott, D., & Chrupała, G. (2020). Textual Supervision for Visually Grounded Spoken Language Understanding. Findings of the Association for Computational Linguistics: EMNLP 2020, 2698–2709. https://doi.org/10.18653/v1/2020.findings-emnlp.244

Contributors

bhigy, marvinlvn
