Comments (9)

reuben avatar reuben commented on April 29, 2024

LibriVox doesn't have properly aligned transcriptions. Is figuring out a solution for that within the scope of this issue?

from deepspeech.

reuben avatar reuben commented on April 29, 2024

Another alternative would be using existing corpuses (corpi?) extracted from LibriVox like LibriSpeech: http://www.openslr.org/12/

kdavis-mozilla avatar kdavis-mozilla commented on April 29, 2024

Have you looked at the TED code in issue 2?

reuben avatar reuben commented on April 29, 2024

Yep. I started writing a bunch of code for downloading and formatting the LibriVox data directly from the Internet Archive, but after reading the LibriSpeech paper I learned that proper alignment and segmentation are a very large effort, so we should probably just use that corpus directly. I'm gonna do that.

kdavis-mozilla avatar kdavis-mozilla commented on April 29, 2024

Before you go off on a wild goose chase, please define what you mean by "proper alignment".

kdavis-mozilla avatar kdavis-mozilla commented on April 29, 2024

Also did you read and understand the Deep Speech paper?

The Deep Speech paper and our code under master use the CTC algorithm, which does not require "alignment" in the sense used for HMM STT engines.

kdavis-mozilla avatar kdavis-mozilla commented on April 29, 2024

Using LibriSpeech directly is fine; it's actually what I expected from the start. But do not spend time trying to "align" the corpus in the sense used for HMM STT engines. CTC does not require such "alignment".
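To illustrate why CTC gets by without frame-level alignment, here is a minimal sketch of the CTC collapsing rule: consecutive repeated symbols are merged, then blanks are dropped. Many different frame-level paths collapse to the same transcript, and CTC sums over all of them during training, so the corpus only needs to supply an (audio, transcript) pair per utterance. The blank symbol `_` and the helper names `ctc_collapse` and `paths_for` are assumptions made for this example; they are not taken from the Deep Speech code.

```python
from itertools import groupby, product

BLANK = "_"  # hypothetical blank symbol chosen for this sketch


def ctc_collapse(path):
    """Apply the CTC collapsing rule: merge consecutive repeats, then drop blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(s for s in merged if s != BLANK)


def paths_for(target, num_frames, alphabet):
    """Brute-force every frame-level path of length num_frames that collapses to target.

    Only practical for tiny inputs; real CTC implementations use dynamic
    programming instead of enumeration.
    """
    symbols = sorted(set(alphabet) | {BLANK})
    return [
        "".join(p)
        for p in product(symbols, repeat=num_frames)
        if ctc_collapse(p) == target
    ]


if __name__ == "__main__":
    # One possible frame-level path for the transcript "hello".
    print(ctc_collapse("hh_e_ll_lo"))
    # All 4-frame paths over {h, i, blank} that read as "hi".
    for path in paths_for("hi", 4, "hi"):
        print(path)
```

Note the blank between the two `l` runs in `hh_e_ll_lo`: without it the repeated letter would be merged into a single `l`.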

reuben avatar reuben commented on April 29, 2024

> Also did you read and understand the Deep Speech paper?

Not as well as I thought I had, evidently! Either that or I'm just abusing the jargon.

I was under the impression that the transcriptions need to have a minimal resemblance to the audio, which the raw LibriVox data, by default, doesn't have. That's as far as my definition of "alignment" went: skipping the initial audio disclaimers, skipping the license header on the Project Gutenberg files, etc.

In any case, we've ended up on the same page, albeit in my case that included a few bumps along the way :P
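For illustration only, the kind of text cleanup described above (skipping the license header on Project Gutenberg files) could look roughly like the sketch below. It assumes the text carries `*** START OF ... PROJECT GUTENBERG EBOOK` and `*** END OF ... PROJECT GUTENBERG EBOOK` marker lines; the exact wording varies between files, so the patterns are a guess that would need checking against real data. The function name `strip_gutenberg_boilerplate` is hypothetical. Since LibriSpeech already ships cleaned, segmented transcripts, none of this is needed when using that corpus.

```python
import re

# Assumed marker patterns; real Project Gutenberg files vary in exact wording.
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(text):
    """Return the text between the start and end markers, if they are present.

    If a marker is missing, fall back to the beginning or end of the text.
    """
    start = START_RE.search(text)
    begin = start.end() if start else 0
    end = END_RE.search(text, begin)
    stop = end.start() if end else len(text)
    return text[begin:stop].strip()


if __name__ == "__main__":
    sample = (
        "License header and legal notes\n"
        "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
        "Chapter 1\nIt was a dark and stormy night.\n"
        "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
        "More license text\n"
    )
    print(strip_gutenberg_boilerplate(sample))
```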

lock avatar lock commented on April 29, 2024

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
