jcvasquezc / phonet

Keras-based python framework to compute phonological posterior probabilities from audio files

License: MIT License

Python 100.00%
speech-processing deep-learning deep-neural-networks phonetics linguistics linguistic-analysis

phonet's Introduction

Phonet

This toolkit computes posterior probabilities of phonological classes from audio files for several groups of phonemes according to the mode and manner of articulation.

If you are not sure what phonological classes are, have a look at the Phonological classes tutorial.

Project Documentation

Paper

The following table lists the available phonological classes and the phonemes that activate each class.

Phonological class Phonemes
vocalic /a/, /e/, /i/, /o/, /u/
consonantal /b/, /tS/, /d/, /f/, /g/, /x/, /k/, /l/, /ʎ/, /m/, /n/, /p/, /ɾ/, /r/, /s/, /t/
back /a/, /o/, /u/
anterior /e/, /i/
open /a/, /e/, /o/
close /i/, /u/
nasal /m/, /n/
stop /p/, /b/, /t/, /k/, /g/, /tS/, /d/
continuant /f/, /b/, /tS/, /d/, /s/, /g/, /ʎ/, /x/
lateral /l/
flap /ɾ/
trill /r/
voiced /a/, /e/, /i/, /o/, /u/, /b/, /d/, /l/, /m/, /n/, /r/, /g/, /ʎ/
strident /f/, /s/, /tS/
labial /m/, /p/, /b/, /f/
dental /t/, /d/
velar /k/, /g/, /x/
pause /sil/
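
As a quick illustration of how to read the table, the sketch below transcribes it into a plain Python dictionary and looks up which classes a given phoneme activates. The names `PHONOLOGICAL_CLASSES` and `classes_for` are hypothetical helpers written for this example; they are not part of the Phonet API.

```python
# Transcription of the table above: class name -> phonemes that activate it.
# This dictionary is an illustration only, not a structure exposed by Phonet.
PHONOLOGICAL_CLASSES = {
    "vocalic": ["a", "e", "i", "o", "u"],
    "consonantal": ["b", "tS", "d", "f", "g", "x", "k", "l", "ʎ",
                    "m", "n", "p", "ɾ", "r", "s", "t"],
    "back": ["a", "o", "u"],
    "anterior": ["e", "i"],
    "open": ["a", "e", "o"],
    "close": ["i", "u"],
    "nasal": ["m", "n"],
    "stop": ["p", "b", "t", "k", "g", "tS", "d"],
    "continuant": ["f", "b", "tS", "d", "s", "g", "ʎ", "x"],
    "lateral": ["l"],
    "flap": ["ɾ"],
    "trill": ["r"],
    "voiced": ["a", "e", "i", "o", "u", "b", "d", "l", "m", "n", "r", "g", "ʎ"],
    "strident": ["f", "s", "tS"],
    "labial": ["m", "p", "b", "f"],
    "dental": ["t", "d"],
    "velar": ["k", "g", "x"],
    "pause": ["sil"],
}


def classes_for(phoneme):
    """Return the classes (in table order) whose phoneme list contains phoneme."""
    return [name for name, phonemes in PHONOLOGICAL_CLASSES.items()
            if phoneme in phonemes]


print(classes_for("b"))
# -> ['consonantal', 'stop', 'continuant', 'voiced', 'labial']
```

A phoneme usually belongs to several classes at once, which is why the toolkit reports one posterior per class rather than a single label per frame.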

Installation

From this repository:

git clone https://github.com/jcvasquezc/phonet
cd phonet
python setup.py install

Usage

Supported features:

  • Estimate probabilities of phonological classes for an audio file

Example use

Estimation of phonological classes

Estimate the phonological classes using the BGRU models for an audio file or for a folder containing audio files:

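from phonet import Phonet  # import path assumed; adjust to your installation
phon = Phonet(phonclass)  # phonclass: list of class names, e.g. ["stop", "nasal"] or ["all"]
phon.get_phon_wav(audio_file, feat_file, plot_flag=True)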
Parameters Description
audio_file audio file (.wav) sampled at 16 kHz
feat_file file (.csv) in which to save the posteriors for the phonological classes
phonclass list of phonological classes to be evaluated. Available classes: "consonantal", "back", "anterior", "open", "close", "nasal", "stop", "continuant", "lateral", "flap", "trill", "voice", "strident", "labial", "dental", "velar", "pause", "vocalic", or "all"
plot_flag True or False, whether to plot the phonological classes
returns Creates feat_file with the estimated phonological classes for each time frame of the audio file.
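
Because class names are passed as plain strings, a typo may only surface once the model is loaded. The sketch below checks a requested list against the names given in the parameter description before calling Phonet. `SUPPORTED_CLASSES` and `expand_phonclasses` are hypothetical names for this example, and how Phonet itself interprets "all" is an assumption here (expanded to every listed class).

```python
# Class names exactly as spelled in the parameter description above.
SUPPORTED_CLASSES = [
    "consonantal", "back", "anterior", "open", "close", "nasal", "stop",
    "continuant", "lateral", "flap", "trill", "voice", "strident",
    "labial", "dental", "velar", "pause", "vocalic",
]


def expand_phonclasses(requested):
    """Validate a list of class names; expand "all" to every supported class.

    Hypothetical helper, not part of Phonet. Raises ValueError on unknown names.
    """
    if "all" in requested:
        return list(SUPPORTED_CLASSES)
    unknown = [name for name in requested if name not in SUPPORTED_CLASSES]
    if unknown:
        raise ValueError(f"unknown phonological classes: {unknown}")
    return list(requested)


print(expand_phonclasses(["stop", "nasal"]))
```

Note that the parameter description spells the class "voice", whereas the table of phonological classes labels it "voiced"; this helper follows the parameter description.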

Training

If you want to train Phonet for your own language, or for specific phonological classes that are not defined here, please refer to the folder train and follow the instructions there.

If you experience problems with the training process, please send me an email <[email protected]>

Reference

Phonet is available for research purposes.

If you use Phonet, please cite the following paper.

@inproceedings{Vasquez-Correa2019,
  author={J. C. Vásquez-Correa and P. Klumpp and J. R. Orozco-Arroyave and E. N{\"o}th},
  title={{Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={549--553},
  doi={10.21437/Interspeech.2019-1405},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1405}
}

phonet's Issues

Error?

I find your padding strategy in get_feat function of file phonet.py quite weird. First of all, you compute

fill=len(signal)%int(fs*self.size_frame*self.len_seq)

so with the default parameters, if a signal is strictly shorter than 16000=int(fs*self.size_frame*self.len_seq) samples, you simply double its length, whatever that length is (normally you would pad only enough to fill the last frame).

Moreover, in the next line you do:

fillv=0.05*np.random.randn(fill)-self.size_frame

so you subtract self.size_frame from the random signal used for padding. I don't know why you'd do that.
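
To make the arithmetic in this report concrete, the sketch below compares the reported expression with the more common "pad up to the next full block" computation. It uses the block size of 16000 samples quoted above and plain integers only. The function names are hypothetical, and this is not the code in phonet.py.

```python
BLOCK = 16000  # int(fs * size_frame * len_seq) with the defaults quoted above


def fill_as_reported(n_samples, block=BLOCK):
    """Padding length as computed by the expression quoted in the issue."""
    return n_samples % block


def fill_to_next_block(n_samples, block=BLOCK):
    """Padding length that brings n_samples up to the next multiple of block."""
    return (-n_samples) % block


for n in (4000, 12000, 20000):
    print(n, fill_as_reported(n), fill_to_next_block(n))
```

For a 4000-sample signal the reported expression yields 4000 padding samples (8000 in total, still not a full block), while padding to the next block would add 12000. For 20000 samples the two give 4000 and 12000 respectively, and only the latter total (32000) is a multiple of the block size.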

Import error for Adam optimizer

Hi @jcvasquezc

While using the disvoice library I encountered this AttributeError when importing the Adam optimizer from keras:
AttributeError: module 'keras.optimizers' has no attribute 'Adam'

This occurs in the phonet.py. What worked for me was replacing all the keras imports, for example from keras import optimizers to from tensorflow.keras import optimizers.

I don't know if it's only me getting this problem; if not, could you kindly make a release with these few changes?
Best regards.
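
One way to sketch the workaround described in this report, without editing every import by hand, is to pick whichever candidate module actually exposes the needed attribute. The helper name `first_module_with` is hypothetical, and the candidate module paths in the commented usage line are assumptions about what is installed; none of this is taken from phonet.py.

```python
import importlib


def first_module_with(attr, module_names):
    """Return the first importable module in module_names that defines attr."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module not installed; try the next candidate
        if hasattr(module, attr):
            return module
    raise ImportError(f"no module in {module_names} provides {attr!r}")


# Intended use (assumes TensorFlow or standalone Keras is installed):
# optimizers = first_module_with(
#     "Adam", ["tensorflow.keras.optimizers", "keras.optimizers"])
```

Checking for the attribute, rather than only catching a failed import, matters here because the reported failure is an AttributeError: the module imports fine but does not provide Adam.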

Accompanying paper?

Hi

Have you published a paper describing this tool?

Also, as I understand, the toolkit will compute probabilities for each phoneme for a given audio file. Does it work for free speech too, or just for specific utterances?

Thanks
