
This project forked from amaas/stanford-ctc


Neural net code for lexicon-free speech recognition with connectionist temporal classification

License: Apache License 2.0



stanford-ctc


This repository contains code for training a bi-directional RNN (BRNN) with the CTC loss function. We assume you have separately prepared a dataset of speech utterances with audio features and text transcriptions.
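As a rough illustration of what the CTC loss computes, here is a minimal pure-Python sketch of the standard CTC forward recursion. It is not the repository's GPU implementation; the function names, the list-of-lists input layout, and the choice of index 0 for the blank symbol are all assumptions made for this example.

```python
import math

NEG_INF = float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under CTC.

    log_probs: T lists of per-frame log-probabilities (one entry per symbol).
    labels:    target symbol indices, without blanks.
    blank:     index of the blank symbol (assumed to be 0 here).
    """
    T = len(log_probs)
    # Extended target: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # alpha[s] = log-probability of all alignments ending in ext[s] at frame t
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new_alpha = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]                 # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])     # advance by one position
            # Skip over a blank only between two different non-blank labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new_alpha[s] = log_sum_exp(cands) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank
    tail = [alpha[S - 1]]
    if S > 1:
        tail.append(alpha[S - 2])
    return -log_sum_exp(tail)


# Two frames, three symbols, uniform posteriors; target is the single label 1.
uniform = [[math.log(1.0 / 3.0)] * 3 for _ in range(2)]
loss = ctc_neg_log_likelihood(uniform, [1])
```

In this toy case three alignments collapse to the target (label-label, blank-label, label-blank), each with probability 1/9, so `loss` equals log 3.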

For more information, please see the project page and the character language modeling repository.

Our neural net code runs on the GPU using Cudamat. We use a forked version of Cudamat that adds an extra function, which you can find here. If you need a more recent version of Cudamat, you can likely take just the extra function and apply it as a patch to the most recent version.

The latest code is in the directory ctc_fast; please set your PYTHONPATH accordingly. The script runNNet.py should be the starting point for training the BRNN model -- you'll have to modify run_cfg.py and decoder_config.py. Unfortunately the run*.sh scripts in {timit/wsj/swbd}-utils are outdated but you can refer to them for reasonable parameter settings.
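If you prefer not to export PYTHONPATH in your shell, a similar effect can be had from inside Python for the current interpreter only. The sketch below is an assumption-laden example: the helper name `add_ctc_fast_to_path` and the checkout location `/path/to/stanford-ctc` are placeholders, not part of this repository.

```python
import os
import sys


def add_ctc_fast_to_path(repo_root):
    """Prepend <repo_root>/ctc_fast to sys.path if it is not already there."""
    ctc_fast_dir = os.path.abspath(os.path.join(repo_root, "ctc_fast"))
    if ctc_fast_dir not in sys.path:
        sys.path.insert(0, ctc_fast_dir)
    return ctc_fast_dir


# Placeholder path: replace with the location of your own checkout.
ctc_fast_dir = add_ctc_fast_to_path("/path/to/stanford-ctc")
```

This only affects the running process; scripts launched separately still need PYTHONPATH set in their environment.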

Example feat#.bin, keys#.txt, and alis#.txt files for a small subset of the TIMIT training data can be found here.

For details about the algorithms used please see our NAACL paper. Also please cite that paper when using this code:

@inproceedings{lexfree2015,
    title={Lexicon-Free Conversational Speech Recognition with Neural Networks},
    author={Maas, Andrew L. and Xie, Ziang and Jurafsky, Dan and Ng, Andrew Y.},
    booktitle={Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL)},
    year={2015}
}

