baidu-deepspeech2's Introduction

Baidu's Deep Speech 2 (TensorFlow)

(This is a work in progress)

This is a Python implementation of Baidu's Deep Speech 2 paper (https://arxiv.org/pdf/1512.02595.pdf) using TensorFlow.

TODO:

  • Fix GPU memory
  • Add batch normalization to RNN
  • Implement row convolution layer
  • Add other dataset support
  • Create pretrained models
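For context on the row convolution item: in the Deep Speech 2 paper this is a lookahead layer that mixes each feature with a small number of its own future time steps, so a unidirectional RNN can use a little future context. The block below is a minimal pure-Python sketch of that computation, not code from this repository. The function name, the list-of-lists layout, and the zero padding past the last frame are assumptions made for illustration.

```python
def row_convolution(h, weights):
    """Lookahead (row) convolution over a sequence of activations.

    h:       list of T frames, each a list of d activations.
    weights: list of d rows; row i holds tau + 1 coefficients that weight
             feature i at the current frame and at the next tau frames.
    """
    num_frames = len(h)
    num_features = len(weights)
    output = []
    for t in range(num_frames):
        frame = []
        for i in range(num_features):
            total = 0.0
            for j, w in enumerate(weights[i]):
                # Assumption: frames past the end of the sequence count as zeros.
                if t + j < num_frames:
                    total += w * h[t + j][i]
            frame.append(total)
        output.append(frame)
    return output


# One feature, one step of lookahead (tau = 1).
activations = [[1.0], [2.0], [3.0]]
print(row_convolution(activations, [[1.0, 0.5]]))  # [[2.0], [3.5], [3.0]]
```

A TensorFlow version would express the same sum as a per-feature convolution along the time axis; the loops here are only meant to make the indexing explicit.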

Preprocessing

To preprocess your data, first download one of the supported datasets and extract it to a folder. Then run the following script (this might take a while, depending on how much data you have):

python preprocess.py --data-dir=<your data directory> --dataset=<dataset name>

Training

Once your data is preprocessed, you can train a model. Optionally edit the settings in config.py first, then run the following command:

python train.py

Testing your model

Once you have trained a model, you can start using it. Two scripts are provided for this: infer.py and streaming_infer.py. The infer.py script transcribes an audio file that you give it:

python infer.py -f <your audio file name>
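Deep Speech 2 models are trained with CTC, so turning the network's per-frame outputs into text needs a decoding step. The block below is a minimal sketch of greedy (best-path) CTC decoding, one common way to do this. It is not taken from infer.py: the alphabet, the blank index, and the function name are hypothetical, and the script may use a different decoder.

```python
from itertools import groupby

# Hypothetical label set for illustration; index 0 is assumed to be the CTC blank.
ALPHABET = ["<blank>", " ", "a", "b", "c"]
BLANK = 0


def ctc_greedy_decode(probs, alphabet=ALPHABET, blank=BLANK):
    """Decode a (frames x labels) list of probabilities into a string."""
    # 1. Pick the most likely label index for each frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    # 2. Collapse runs of the same label into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks, which only separate genuine repeats.
    return "".join(alphabet[i] for i in collapsed if i != blank)


def one_hot(index, size=len(ALPHABET)):
    return [1.0 if j == index else 0.0 for j in range(size)]


# Frame-wise argmax is a, a, blank, a, b, blank, c.
frames = [one_hot(i) for i in [2, 2, 0, 2, 3, 0, 4]]
print(ctc_greedy_decode(frames))  # aabc
```

The blank between the second and third frames is what lets the doubled "a" survive the collapse step.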

The streaming_infer.py script uses PyAudio to record audio from your computer's microphone and transcribes it in real time. To run it:

python streaming_infer.py
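To illustrate the general pattern behind microphone streaming, here is a minimal sketch that reads raw audio with PyAudio and regroups it into fixed-size windows a model could consume. It is not the contents of streaming_infer.py. The sample rate, chunk size, window size, and both function names are assumptions, and `transcribe` in the usage comment is a hypothetical stand-in for the model call.

```python
def microphone_chunks(rate=16000, chunk_frames=1024):
    """Yield raw 16-bit mono PCM chunks from the default microphone."""
    import pyaudio  # imported lazily so windows() below works without PyAudio

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, frames_per_buffer=chunk_frames)
    try:
        while True:
            yield stream.read(chunk_frames)
    finally:
        # Release the audio device when the consumer stops iterating.
        stream.stop_stream()
        stream.close()
        audio.terminate()


def windows(chunks, window_bytes):
    """Regroup incoming byte chunks into windows of exactly window_bytes."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while len(buffer) >= window_bytes:
            yield buffer[:window_bytes]
            buffer = buffer[window_bytes:]


# Usage sketch (needs a microphone): one second of 16 kHz 16-bit audio is 32000 bytes.
# for window in windows(microphone_chunks(), window_bytes=32000):
#     print(transcribe(window))  # transcribe() is hypothetical
```

Keeping the buffering separate from the capture code means the windowing logic can be exercised with any iterable of bytes, such as chunks read from a file.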

Contributors

nchalifo, noahchalifour
