NeuralSP: Neural network based Speech Processing

How to install

Data preparation

Features

Connectionist Temporal Classification (CTC)

  • beam search
  • Shallow fusion [link]
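CTC beam search can be illustrated with a minimal prefix beam search in plain Python. This is a sketch of the textbook prefix-search formulation, not necessarily how neural_sp implements it. The function names, the `blank=0` convention, and the T x V list-of-lists input of per-frame log-probabilities are all assumptions made for illustration. Each prefix keeps two scores: one for paths ending in blank and one for paths ending in a label. This split is what lets "a a" (no blank) collapse to one token while "a blank a" stays as two.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_add(*xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_width=4, blank=0):
    """Hypothetical helper: prefix beam search over T x V log-probabilities.

    Each prefix tracks (pb, pnb): log-scores of paths ending in blank
    and of paths ending in a non-blank label.
    """
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (pb, pnb) in beams.items():
            for v, lp in enumerate(frame):
                if v == blank:
                    # A blank leaves the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (log_add(b, pb + lp, pnb + lp), nb)
                    continue
                extended = prefix + (v,)
                b, nb = nxt[extended]
                if prefix and v == prefix[-1]:
                    # A repeated label starts a new token only after a blank...
                    nxt[extended] = (b, log_add(nb, pb + lp))
                    # ...otherwise it collapses into the existing prefix.
                    b0, nb0 = nxt[prefix]
                    nxt[prefix] = (b0, log_add(nb0, pnb + lp))
                else:
                    nxt[extended] = (b, log_add(nb, pb + lp, pnb + lp))
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: log_add(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best, scores = max(beams.items(), key=lambda kv: log_add(*kv[1]))
    return list(best), log_add(*scores)
```

With two frames where label 1 has probability 0.6 in each, the paths "1 1", "1 blank", and "blank 1" all collapse to `[1]`, so the search returns `[1]` with total probability 0.84.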

Attention-based sequence-to-sequence

Encoder

  • CNN encoder
  • (bidirectional/unidirectional) LSTM encoder
  • CNN+(bidirectional/unidirectional) LSTM encoder
  • self-attention (Transformer) encoder [link]
  • Time-Depth Separable (TDS) convolutional encoder [link] (NEW!)

Decoder

  • RNN decoder
    • Beam search
    • Shallow fusion [link]
    • Cold fusion [link]
    • Deep fusion [link]
    • Forward-backward attention decoding [link]
  • Transformer decoder
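Shallow fusion, listed for both CTC and the RNN decoder, is commonly formulated as a log-linear combination of the ASR score and an external LM score, log p_asr(y) + λ · log p_lm(y), applied when ranking candidate extensions during beam search. The sketch below shows only that scoring step. The function names and the default weight of 0.3 are hypothetical, not neural_sp's settings. Cold fusion and deep fusion instead combine LM hidden states inside the network and are not shown.

```python
import math


def fuse_scores(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Hypothetical helper: per-token shallow fusion score.

    Both arguments map candidate next tokens to log-probabilities;
    the fused score is log p_asr(y) + lm_weight * log p_lm(y).
    """
    return {tok: lp + lm_weight * lm_log_probs[tok] for tok, lp in asr_log_probs.items()}


def best_token(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Pick the candidate with the highest fused score."""
    fused = fuse_scores(asr_log_probs, lm_log_probs, lm_weight)
    return max(fused, key=fused.get)


# Acoustically ambiguous candidates: the LM can flip the decision.
asr = {"write": math.log(0.55), "right": math.log(0.45)}
lm = {"write": math.log(0.1), "right": math.log(0.6)}
print(best_token(asr, lm, lm_weight=0.0), best_token(asr, lm, lm_weight=0.5))
```

With `lm_weight=0.0` the acoustic model alone picks "write". With `lm_weight=0.5` the LM preference moves the choice to "right". In practice the weight is tuned on a development set.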

Attention

  • RNN decoder
    • location
    • additive
    • dot-product
    • Luong's dot/general/concat [link]
    • Multi-headed dot-product [link]
  • Transformer decoder
    • Multi-headed dot-product [link]
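Several of the attention scoring functions above can be sketched in a few lines of NumPy for a single decoder query. This is an illustration under stated assumptions, not neural_sp's code. The shapes, the weight names `w_q`, `w_k`, `v`, and all function names are hypothetical. The multi-head variant omits the learned per-head projections to stay short. Location-based attention additionally convolves the previous attention weights and is not shown.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def dot_product_attention(query, keys, values, scaled=True):
    """query: (d,), keys: (T, d), values: (T, d_v) -> context (d_v,), weights (T,)."""
    scores = keys @ query
    if scaled:
        # Transformer-style scaling by sqrt(d).
        scores = scores / np.sqrt(query.shape[-1])
    weights = softmax(scores)
    return weights @ values, weights


def additive_attention(query, keys, values, w_q, w_k, v):
    """Additive (Bahdanau-style) scores e_t = v^T tanh(W_q q + W_k k_t).

    w_q: (d_a, d), w_k: (d_a, d_k), v: (d_a,).
    """
    scores = np.tanh(keys @ w_k.T + w_q @ query) @ v
    weights = softmax(scores)
    return weights @ values, weights


def multi_head_attention(query, keys, values, n_heads):
    """Split the feature axis into n_heads slices and attend per slice.

    Learned per-head projections are omitted for brevity.
    """
    d = query.shape[-1]
    assert d % n_heads == 0, "feature size must be divisible by n_heads"
    size = d // n_heads
    outputs = []
    for h in range(n_heads):
        s = slice(h * size, (h + 1) * size)
        context, _ = dot_product_attention(query[s], keys[:, s], values[:, s])
        outputs.append(context)
    return np.concatenate(outputs)
```

Each function returns a context vector that is a convex combination of the encoder `values`. Only the scoring rule differs between variants.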

Language model (LM)

  • RNNLM (recurrent neural network language model)
  • Gated convolutional LM [link]
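The core of a gated convolutional LM is a causal convolution followed by a gated linear unit (GLU): h(X) = (X * W + b) ⊗ σ(X * V + c). The NumPy sketch below implements one such layer. The function names and weight shapes are hypothetical. Stacking, residual connections, and the output softmax are omitted. Left-padding with k - 1 zero frames keeps the layer causal, so the output at step t never sees future tokens. That property is what allows the layer to be used as a language model.

```python
import numpy as np


def causal_conv1d(x, w, b):
    """x: (T, d_in), w: (k, d_in, d_out), b: (d_out,) -> (T, d_out).

    Left-pads with k - 1 zero frames so output t depends only on inputs <= t.
    """
    k = w.shape[0]
    x_pad = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)
    out = np.stack([np.einsum("kd,kde->e", x_pad[t:t + k], w) for t in range(x.shape[0])])
    return out + b


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_conv_layer(x, w, b, v, c):
    """Hypothetical GLU layer: (X * W + b) elementwise-times sigmoid(X * V + c)."""
    return causal_conv1d(x, w, b) * sigmoid(causal_conv1d(x, v, c))
```

The sigmoid branch acts as a learned gate on the linear branch, giving the layer a multiplicative control path similar in spirit to LSTM gates, without recurrence.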

Output units

  • phoneme
  • grapheme
  • wordpiece (BPE, sentencepiece)
  • word
  • word-char mix
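Wordpiece units of the BPE kind are learned by repeatedly merging the most frequent adjacent symbol pair in a word-frequency table. The sketch below follows the classic byte-pair-encoding merge procedure in plain Python. It is not the sentencepiece implementation, which uses its own training algorithms. The function name, the `</w>` end-of-word marker, and the input format are assumptions. Ties between equally frequent pairs are broken by first occurrence, which real tools may handle differently.

```python
from collections import Counter


def learn_bpe(word_counts, num_merges):
    """Hypothetical helper: return up to num_merges BPE merge operations.

    word_counts maps words to corpus frequencies.
    """
    # Start from characters, with an end-of-word marker on each word.
    vocab = {tuple(w) + ("</w>",): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges
```

For example, `learn_bpe({"abab": 3, "ab": 2}, 2)` first merges `("a", "b")` (8 occurrences) and then `("ab", "</w>")` (5 occurrences). The number of merges controls the vocabulary size, as in the "BPE1k" and "BPE30k" models reported below.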

Multi-task learning (MTL)

Multi-task learning (MTL) with different output units is supported to alleviate data sparseness.

  • Hybrid CTC/attention [link]
  • Hierarchical Attention (e.g., word attention + character attention) [link]
  • Hierarchical CTC (e.g., word CTC + character CTC) [link]
  • Hierarchical CTC+Attention (e.g., word attention + character CTC) [link]
  • Forward-backward attention [link]
  • RNNLM objective [link]
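At the objective level, the MTL setups above amount to a weighted sum of per-task losses. For hybrid CTC/attention, the usual formulation is λ · L_ctc + (1 - λ) · L_att. The sketch below shows only this combination step on precomputed scalar losses. The task names, the default λ = 0.3, and the function names are hypothetical. Where each auxiliary loss is attached in the network (e.g., which encoder layer feeds a character CTC branch) is not modeled here.

```python
def multitask_loss(losses, weights):
    """Hypothetical helper: weighted sum of per-task losses.

    losses and weights map task names (e.g. "att", "ctc") to scalars.
    """
    if set(losses) != set(weights):
        raise ValueError("losses and weights must cover the same tasks")
    return sum(weights[name] * losses[name] for name in losses)


def hybrid_ctc_attention_loss(ctc_loss, att_loss, ctc_weight=0.3):
    """Hybrid CTC/attention objective: lambda * L_ctc + (1 - lambda) * L_att."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return multitask_loss({"ctc": ctc_loss, "att": att_loss},
                          {"ctc": ctc_weight, "att": 1.0 - ctc_weight})


# Hierarchical example: a word-level attention task plus a character-level CTC task.
total = multitask_loss({"word_att": 1.0, "char_ctc": 3.0}, {"word_att": 0.8, "char_ctc": 0.2})
```

With `ctc_loss=2.0`, `att_loss=1.0`, and λ = 0.3, the hybrid objective is 0.3 · 2.0 + 0.7 · 1.0 = 1.3.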

Performance

WSJ (WER, %)

model        test_dev93  test_eval92
Char attn    16.7        13.6
+ RNNLM      14.0        10.7
BPE1k attn   15.1        12.4
+ RNNLM      11.6         9.3
+ char CTC   N/A         N/A

CSJ (WER, %)

model              eval1  eval2  eval3
Char attn          N/A    N/A    N/A
+ RNNLM            N/A    N/A    N/A
BPE30k attn        8.8    6.3    6.9
+ RNNLM            8.2    6.0    6.7
Word30k attn       9.3    7.0    7.9
+ RNNLM            8.9    6.9    7.6
+ Char attn        8.8    6.8    7.6
+ OOV resolution   8.3    6.1    6.7

Switchboard (WER, %)

model                  SWB    CH
Char attn              N/A    N/A
BPE10k attn            11.8   23.5
+ RNNLM                11.0   23.3
+ speed perturbation   10.2   21.5
Word10k attn           N/A    N/A

Librispeech (WER, %)

model          dev-clean  dev-other  test-clean  test-other
Char attn      N/A        N/A        N/A         N/A
BPE30k attn    N/A        N/A        N/A         N/A
Word30k attn   N/A        N/A        N/A         N/A
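The WER figures in the tables above count substitutions, deletions, and insertions against the number of reference words: WER = 100 · (S + D + I) / N. The sketch below computes this for a single utterance with a word-level Levenshtein distance. The function name is hypothetical. Corpus-level numbers like those reported sum the edit counts and reference lengths over all utterances before dividing, and official scoring pipelines may apply text normalization that is not shown here.

```python
def word_error_rate(reference, hypothesis):
    """Hypothetical helper: WER in percent for one utterance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)
```

For example, the reference "the cat sat on the mat" against the hypothesis "the cat sit on mat" has one substitution and one deletion, giving 2 / 6 ≈ 33.3% WER.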

Reference