warisqr007 / vq-bnf

Vector Quantizing speech representations

License: Apache License 2.0

vq-bnf's Introduction

Vector Quantize PPGs/Bottleneck features

Code for vector-quantizing speech representations, including mel-spectrograms and phonetic posteriorgrams (PPGs)/bottleneck features (BNFs). This repo trains an independent module to vector-quantize BNFs.

For usage in voice conversion, see here
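
As a rough illustration of what vector-quantizing a feature sequence means, the sketch below maps each frame to its nearest entry in a codebook. It is not the model trained by this repo: the function name `quantize`, the toy codebook, and the Euclidean nearest-neighbour rule are assumptions made for illustration, and it uses plain Python lists so that it runs without extra dependencies.

```python
def quantize(frames, codebook):
    """Map each frame to its nearest codebook vector (squared Euclidean distance).

    frames:   list of feature vectors, e.g. the BNF frames of one utterance.
    codebook: list of code vectors with the same dimensionality.
    Returns (indices, quantized), one entry per input frame.
    """
    indices = []
    for frame in frames:
        # Squared distance from this frame to every code vector.
        dists = [sum((x - c) ** 2 for x, c in zip(frame, code)) for code in codebook]
        # Index of the closest code vector.
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    quantized = [codebook[i] for i in indices]
    return indices, quantized


# Toy data (hypothetical): four 2-dimensional frames and a 3-entry codebook.
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
toy_frames = [[0.1, -0.2], [0.9, 1.2], [3.5, 4.1], [0.6, 0.6]]
toy_indices, toy_quantized = quantize(toy_frames, toy_codebook)
print(toy_indices)  # -> [0, 1, 2, 1]
```

In a trained VQ module the codebook is learned from data and the lookup is usually vectorized; the sketch only shows the lookup step.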

Installation

  • Install ffmpeg.
  • Install Kaldi.
  • Install PyKaldi.
  • Install the Python packages using the environment.yml file.
  • Download pretrained TDNN-F model, extract it, and set PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh to the pretrained model directory.
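
Before moving on, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of the repo: the function name `missing_prerequisites` and its defaults are assumptions. It checks that `ffmpeg` is on the PATH, that PyKaldi is importable (its import name is `kaldi`), and that the two files named above exist under the repo root. It does not verify the Kaldi installation itself or the PRETRAIN_ROOT setting.

```python
import importlib.util
import shutil
from pathlib import Path


def missing_prerequisites(
    repo_root=".",
    tools=("ffmpeg",),
    modules=("kaldi",),  # PyKaldi is imported as `kaldi`
    files=("environment.yml", "kaldi_scripts/extract_features_kaldi.sh"),
):
    """Return a list of prerequisites that could not be found."""
    # Command-line tools that are not on the PATH.
    missing = [t for t in tools if shutil.which(t) is None]
    # Python modules that cannot be located by the import system.
    missing += [m for m in modules if importlib.util.find_spec(m) is None]
    # Expected files that are absent from the repo checkout.
    missing += [f for f in files if not (Path(repo_root) / f).is_file()]
    return missing
```

Calling `missing_prerequisites()` from the repo root returns an empty list when everything it checks is present.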

Dataset

  • Acoustic Model: LibriSpeech. Download the pretrained TDNN-F acoustic model here.
    • You also need to set KALDI_ROOT and PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh accordingly.
  • Vector Quantization: ARCTIC and L2-ARCTIC; see here for the detailed training process.

All pretrained models are available here (to be updated).

Directory layout (format your dataset to match the layout below)

dataset_root
├── speaker 1
├── speaker 2
│   ├── wav          # contains all the wav files from speaker 2
│   └── kaldi        # Kaldi files (auto-generated after running kaldi_scripts)
.
.
└── speaker N
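
A quick way to confirm that a dataset follows this layout is to walk the root directory and count the wav files per speaker. The sketch below is an assumption-laden helper, not a script from the repo: `check_layout` is a hypothetical name, and it assumes the audio files use the `.wav` extension.

```python
from pathlib import Path


def check_layout(dataset_root):
    """Return {speaker_name: number_of_wav_files} for each speaker directory.

    Raises FileNotFoundError if a speaker directory has no `wav` subdirectory.
    """
    counts = {}
    for speaker_dir in sorted(Path(dataset_root).iterdir()):
        if not speaker_dir.is_dir():
            continue  # ignore stray files at the top level
        wav_dir = speaker_dir / "wav"
        if not wav_dir.is_dir():
            raise FileNotFoundError(f"missing wav directory: {wav_dir}")
        counts[speaker_dir.name] = len(list(wav_dir.glob("*.wav")))
    return counts
```

For example, `check_layout("/path/to/dataset")` returns a dictionary mapping each speaker directory to its wav-file count. The `kaldi` subdirectory is not checked because it is generated later.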

Quick Start

See the inference script

Training

  • Use Kaldi to extract BNFs for each speaker (repeat for every speaker):
./kaldi_scripts/extract_features_kaldi.sh /path/to/speaker
  • Preprocessing:
python preprocess_bnfs.py path/to/dataset
python make_data_all.py  # Edit the file to specify the dataset path
  • Set the training parameters. See conf/
  • Train the VQ model:
./train.sh
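
Since the extraction script takes one speaker directory at a time, the first step above can be looped over all speakers. The following is a sketch under stated assumptions rather than a script shipped with the repo: `build_commands` and `extract_all` are hypothetical names, every subdirectory of the dataset root is treated as a speaker, and the script path is the one given above, relative to the repo root.

```python
import subprocess
from pathlib import Path

# Script path as given in the training steps, relative to the repo root.
EXTRACT_SCRIPT = "./kaldi_scripts/extract_features_kaldi.sh"


def build_commands(dataset_root):
    """Return one extraction command per speaker directory."""
    speakers = sorted(p for p in Path(dataset_root).iterdir() if p.is_dir())
    return [[EXTRACT_SCRIPT, str(speaker)] for speaker in speakers]


def extract_all(dataset_root, dry_run=True):
    """Print each command; run it as well when dry_run is False."""
    for cmd in build_commands(dataset_root):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first speaker whose extraction fails.
            subprocess.run(cmd, check=True)
```

Running `extract_all("/path/to/dataset")` only prints the commands; pass `dry_run=False` from the repo root to execute them.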
