forced-decoding

This repository contains:
A) the files needed to perform forced decoding, given a piece of audio and the "ground truth" words that the decoding is expected to produce.
B) the files needed to expose this as an API.

Disclaimer: All of the API work here is very primitive and should only be used as a proof of concept.

Requirements

I tested the backend / decoding part on an Ubuntu 20.04 VM on Azure.

Setup

Backend / Forced Decoding

To set up the backend / decoding, follow these steps on the Ubuntu machine (as root):

1. Install Docker.
2. Pull the Kaldi image: docker pull kaldiasr/kaldi
3. Clone this repository.
4. Start a container with the server directory mounted: docker run -it -v <PATH_TO_REPO>/server:/opt/kaldi/egs/wsj/s5/forced_vit kaldiasr/kaldi /bin/bash
5. In the container, run:

cd egs/wsj/s5/
cp -r forced_vit/* .
./stage.sh
./make_forced.sh
./setup_speech.sh

6. Detach from the container without stopping it (for example with Ctrl-P followed by Ctrl-Q), so that it stays running.
7. Run the server script app.py.
Usage: python3 app.py <docker-container-id>
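
Because app.py is given a container ID, it can help to confirm that the container is still running after detaching. The following is a minimal sketch, assuming the Docker CLI is available on the PATH. The helper names (inspect_running_cmd, parse_running, is_container_running) are hypothetical and are not part of this repository.

```python
import subprocess
from typing import List


def inspect_running_cmd(container_id: str) -> List[str]:
    """Build the `docker inspect` command that prints the container's running state."""
    return ["docker", "inspect", "-f", "{{.State.Running}}", container_id]


def parse_running(output: str) -> bool:
    """With the format string above, `docker inspect` prints `true` or `false`."""
    return output.strip() == "true"


def is_container_running(container_id: str) -> bool:
    """Return True if Docker reports the container as running (hypothetical helper)."""
    result = subprocess.run(
        inspect_running_cmd(container_id),
        capture_output=True,
        text=True,
        check=False,  # an unknown container ID gives a non-zero exit code, handled below
    )
    return result.returncode == 0 and parse_running(result.stdout)
```

Calling is_container_running("<docker-container-id>") before starting app.py would catch the case where the container was stopped rather than detached.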

Client Side

No setup is needed on the client side; just run the client script detect_ans.py with the IP address of the server.
Usage: python3 detect_ans.py <api-ip>

Testing

To check that forced decoding works after setting up the backend, place the audio file inside the Docker container at: /opt/kaldi/egs/wsj/s5/client_sound.wav

Then run: ./forced_single.sh "<WORDS-TO-DECODE>"

Running this for the first time will take a while.
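
The same test can be driven from the host without attaching to the container. This is only a sketch: it assumes the Docker CLI is on the PATH and that forced_single.sh sits in /opt/kaldi/egs/wsj/s5 after the setup steps. The function names are hypothetical and this is not how app.py is necessarily implemented.

```python
import subprocess
from typing import List

# Paths taken from the Testing section; WORKDIR as the script location is an assumption.
WORKDIR = "/opt/kaldi/egs/wsj/s5"
AUDIO_DEST = WORKDIR + "/client_sound.wav"


def copy_audio_cmd(container_id: str, wav_path: str) -> List[str]:
    """Build the `docker cp` command that places the audio file in the container."""
    return ["docker", "cp", wav_path, f"{container_id}:{AUDIO_DEST}"]


def decode_cmd(container_id: str, words: str) -> List[str]:
    """Build the `docker exec` command that runs forced_single.sh inside WORKDIR."""
    # The words are passed as a single argument, so no shell quoting is needed.
    return ["docker", "exec", "-w", WORKDIR, container_id, "./forced_single.sh", words]


def run_forced_decoding(container_id: str, wav_path: str, words: str) -> str:
    """Copy the audio in, run the decoder, and return whatever the script prints."""
    subprocess.run(copy_audio_cmd(container_id, wav_path), check=True)
    result = subprocess.run(
        decode_cmd(container_id, words),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Building the commands as argument lists keeps the words to decode intact even when they contain spaces or quotes.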

client/split_test_data.py is also included; it shows roughly how I split audio using the YouTube transcript when testing the decoder for resilience.
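
For illustration, splitting a WAV file at transcript timestamps can be sketched with Python's standard wave module. This is not the code in client/split_test_data.py. The function name split_wav and the assumption that the transcript provides (start, end) times in seconds are both hypothetical.

```python
import wave
from typing import List, Tuple


def split_wav(src_path: str, segments: List[Tuple[float, float]], out_prefix: str) -> List[str]:
    """Write one WAV file per (start_sec, end_sec) pair and return the output paths."""
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for i, (start, end) in enumerate(segments):
            # Convert seconds to frame indices, clamped to the length of the file.
            first = min(int(start * rate), total)
            last = min(int(end * rate), total)
            src.setpos(first)
            frames = src.readframes(max(last - first, 0))
            out_path = f"{out_prefix}_{i:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate; the frame count is fixed up on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
    return out_paths
```

Each resulting segment could then be copied to client_sound.wav and decoded against the matching transcript line.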
