speechtechlab / multilingual-asr-syllables

This project forked from sarapicc/multilingual-asr-syllables

The aim of this project is to build a multilingual ASR in which phonological syllables are considered as subwords.

License: GNU General Public License v2.0

Python 37.70% Jupyter Notebook 62.30%

Multilingual end-to-end ASR system with phonological syllables as subwords

The aim of this project is to build a multilingual ASR system trained with phonological syllables as subwords.

Broadly, syllables are linguistic units that reflect how sounds are organized in human speech.
They are relevant to both speech production and perception and are considered a linguistic universal: they are found in every documented language, and languages with similar phonetic inventories share most of their syllable inventories.
Under the phonological definition, each syllable contains at least a nucleus, an element with a high degree of sonority (in most cases a vowel). The nucleus can be surrounded by less sonorant elements that form the syllable onset and coda. Sonority rises through the onset, peaks at the nucleus, and falls through the coda.
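To make the rise-and-fall pattern concrete, here is a minimal Python sketch that checks whether an already segmented syllable respects it. The sonority scale and the symbol-to-class table are simplified assumptions for illustration, not the scale used in this project.

```python
# Simplified sonority scale (higher = more sonorous); an assumption for illustration.
SONORITY_BY_CLASS = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4, "glide": 5, "vowel": 6}

# Toy mapping from a few IPA symbols to classes (not the project's own table).
PHONE_CLASS = {
    **dict.fromkeys("ptkbdg", "stop"),
    **dict.fromkeys("fsvz", "fricative"),
    **dict.fromkeys("mn", "nasal"),
    **dict.fromkeys("lr", "liquid"),
    **dict.fromkeys("jw", "glide"),
    **dict.fromkeys("aeiou", "vowel"),
}


def sonority(phone: str) -> int:
    """Sonority rank of a single phone under the toy scale."""
    return SONORITY_BY_CLASS[PHONE_CLASS[phone]]


def is_well_formed(syllable: list[str]) -> bool:
    """True if sonority rises strictly to a single peak and then falls strictly."""
    values = [sonority(p) for p in syllable]
    peak = values.index(max(values))  # the nucleus is the sonority peak
    rising = all(a < b for a, b in zip(values[:peak], values[1 : peak + 1]))
    falling = all(a > b for a, b in zip(values[peak:], values[peak + 1 :]))
    return rising and falling
```

With this toy scale, /tra/ and /ant/ pass and /rta/ fails. /stra/ also fails, because /s/ is more sonorous than /t/; /s/+stop clusters are a well-known exception to sonority sequencing, so practical syllabifiers usually special-case them.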


[Figure: sonority scale]


[Figure: syllable structure]


Syllables carry acoustic information, because the arrangement of segments within them mirrors the variation of energy in the signal.
Including syllables in the vocabulary the model is trained on should therefore strengthen the association between each audio frame and its textual label, and so benefit recognition.


To obtain syllables as subwords, we build a custom tokenizer based on the Wav2Vec2PhonemeCTCTokenizer class (Hugging Face Transformers) that segments the phonemic transcriptions according to the two main syllabification rules: the Sonority Sequencing Principle and the Maximal Onset Principle.
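The two principles can be combined into a simple syllabifier: vowels are taken as nuclei, and each consonant cluster between two nuclei is split so that the following syllable receives the longest onset whose sonority rises toward its nucleus (Maximal Onset Principle constrained by the Sonority Sequencing Principle). The sketch below is a hypothetical, simplified version of that idea in plain Python; the sonority table and the names `valid_onset` and `syllabify` are assumptions, not the code in `syllabifier.ipynb` or in the custom tokenizer.

```python
# Toy sonority ranks for a few IPA symbols (assumption: not the project's actual table).
SONORITY = {
    **dict.fromkeys("ptkbdg", 1),  # stops
    **dict.fromkeys("fsvz", 2),    # fricatives
    **dict.fromkeys("mn", 3),      # nasals
    **dict.fromkeys("lr", 4),      # liquids
    **dict.fromkeys("jw", 5),      # glides
    **dict.fromkeys("aeiou", 6),   # vowels, treated as nuclei
}
VOWELS = set("aeiou")


def valid_onset(cluster: list[str]) -> bool:
    """SSP: sonority must rise strictly through the onset toward the nucleus."""
    return all(SONORITY[a] < SONORITY[b] for a, b in zip(cluster, cluster[1:]))


def syllabify(phones: list[str]) -> list[list[str]]:
    """Split a phone sequence into syllables using MOP constrained by SSP."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nuclei:
        return [phones]  # no nucleus: leave the sequence unsplit
    starts = [0]  # the first syllable also absorbs any leading consonants
    for prev, nxt in zip(nuclei, nuclei[1:]):
        boundary = nxt
        # MOP: try the longest candidate onset first and shrink it until the SSP
        # accepts it; the empty onset (b == nxt) is always accepted.
        for b in range(prev + 1, nxt + 1):
            if valid_onset(phones[b:nxt]):
                boundary = b
                break
        starts.append(boundary)
    starts.append(len(phones))  # trailing consonants end up in the last coda
    return [phones[s:e] for s, e in zip(starts, starts[1:])]
```

For example, `".".join("".join(s) for s in syllabify(list("kantare")))` gives `kan.ta.re`, and `altro` is split as `al.tro`, because /tr/ is a valid rising onset while /ltr/ and /lt/ are not.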

The dataset is automatically transcribed into phonemes so that the work can be carried out at the phonological level. This is done with the tool WebMAUS Basic, provided by the Bavarian Archive for Speech Signals (BAS) of the Institute of Phonetics and Speech Processing at Ludwig-Maximilians-Universität (Munich, Germany).


To build the ASR system, we fine-tune the pre-trained WavLM-large model (Chen et al., 2021) on multilingual speech data extracted from the Mozilla Common Voice dataset.
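Since the tokenizer is CTC-based, the fine-tuned model outputs one score vector over the syllable vocabulary for each audio frame. A common way to turn these into a token sequence is greedy CTC decoding: take the best-scoring token per frame, merge consecutive repeats, then drop the blank token. The sketch below assumes the blank has id 0 (in Hugging Face CTC fine-tuning the padding token plays this role, but its id depends on the vocabulary), and the toy vocabulary `ID2TOKEN` is hypothetical; it illustrates the technique rather than the decoding used in `evaluation.py`.

```python
def argmax_per_frame(logits: list[list[float]]) -> list[int]:
    """Index of the highest score in each frame (the greedy choice)."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def ctc_greedy_decode(frame_ids: list[int], blank_id: int = 0) -> list[int]:
    """Collapse repeated ids, then remove blanks (CTC best-path rule)."""
    decoded: list[int] = []
    prev = None
    for idx in frame_ids:
        # A token is emitted only when it differs from the previous frame,
        # so a blank between two identical tokens keeps both of them.
        if idx != prev and idx != blank_id:
            decoded.append(idx)
        prev = idx
    return decoded


# Hypothetical toy vocabulary: id 0 is the blank/padding token.
ID2TOKEN = {0: "<pad>", 1: "kan", 2: "ta", 3: "re"}

if __name__ == "__main__":
    frames = [1, 1, 0, 2, 2, 0, 0, 3]
    print([ID2TOKEN[i] for i in ctc_greedy_decode(frames)])  # ['kan', 'ta', 're']
```

Greedy decoding is the simplest option; beam search with a language model is a common alternative when more accuracy is needed.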

The languages considered within this project are Italian, Spanish and French.

The performance of the model is evaluated with two metrics: Token Error Rate (TER) and Phoneme Error Rate (PER).
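Both metrics are edit-distance rates: the number of substitutions, deletions and insertions needed to turn the predicted sequence into the reference, divided by the reference length. TER is computed over the model's output tokens and PER over individual phonemes. A minimal pure-Python sketch follows; the function names are hypothetical, and the corpus-level pooling (summing errors and reference lengths over all utterances) is a common convention assumed here rather than taken from `evaluation.py`.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def corpus_error_rate(refs: list[list[str]], hyps: list[list[str]]) -> float:
    """Total edit operations divided by total reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("references are empty")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total_ref


if __name__ == "__main__":
    # Same toy mistake measured on syllable tokens (TER) and on phonemes (PER).
    print(corpus_error_rate([["kan", "ta", "re"]], [["kan", "da", "re"]]))  # 1/3
    print(corpus_error_rate([list("kantare")], [list("kandare")]))  # 1/7
```

On this toy pair a single wrong phoneme costs one of three syllable tokens but only one of seven phonemes, illustrating that the two metrics weight the same mistake differently.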


This repo contains:

  • multilingual_corpus.ipynb
    -> multilingual corpus preparation (Common Voice data, WebMAUS Basic transcriptions)

  • transcriptions
    -> folder of pickle (.pkl) files with phonetic and phonological transcriptions of the Italian, French and Spanish data

  • expMLhyb20.py
    -> fine-tuning of the WavLM-large model on multilingual data with a custom tokenizer and a syllable-based vocabulary

  • syllabifier.ipynb
    -> syllabification algorithm used to generate the syllabic vocabulary

  • HybridML_ITESFRPhoCTCTokenizer
    -> custom tokenizer that segments transcriptions according to syllabification rules

  • tokenizerMLT_ITESFR_hybPhoSyl246
    -> folder containing the multilingual syllabic vocabulary

  • evaluation.py
    -> evaluates the trained model on the test set and generates a CSV file with a sample of predictions

  • back2words.py
    -> auxiliary script that adjusts the format of the predicted sentences so that PER, TER and WER scores can be calculated
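Wav2Vec2-style tokenizers such as `Wav2Vec2PhonemeCTCTokenizer` load their vocabulary from a JSON file mapping each token to an integer id. A hypothetical sketch of deriving such a file from syllabified transcriptions is shown below. The function names, the `min_count` filter and the alphabetical id order are assumptions; the special tokens are the defaults of `Wav2Vec2PhonemeCTCTokenizer` in Hugging Face Transformers.

```python
import json
from collections import Counter
from pathlib import Path

# Default special tokens of Wav2Vec2PhonemeCTCTokenizer
# (in Hugging Face CTC models the pad token also serves as the CTC blank).
SPECIAL_TOKENS = ["<pad>", "<unk>", "<s>", "</s>"]


def build_vocab(syllabified: list[list[str]], min_count: int = 1) -> dict[str, int]:
    """Map special tokens, then every syllable seen at least min_count times, to ids."""
    counts = Counter(tok for sentence in syllabified for tok in sentence)
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for tok in sorted(t for t, c in counts.items() if c >= min_count):
        vocab.setdefault(tok, len(vocab))  # skip tokens that clash with specials
    return vocab


def save_vocab(vocab: dict[str, int], out_dir: str) -> Path:
    """Write vocab.json (UTF-8, IPA symbols kept unescaped) into out_dir."""
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    vocab_file = path / "vocab.json"
    vocab_file.write_text(json.dumps(vocab, ensure_ascii=False, indent=2), encoding="utf-8")
    return vocab_file
```

The written file could then be passed as `vocab_file` to `Wav2Vec2PhonemeCTCTokenizer`, with `do_phonemize=False` because the transcriptions are already phonemic. This is only one plausible setup.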

Contributors

sarapicc