Speakers id. (open_stt issue, CLOSED)

snakers4 commented on May 13, 2024
Speakers id.

from open_stt.

Comments (9)

snakers4 commented on May 13, 2024

We are planning to share a much larger dataset based on audiobooks.
Please PM me on Telegram and I will share a private metadata file from which you can extract the data you need.
We are not planning to share this data publicly yet.

snakers4 commented on May 13, 2024

@i7p9h9
We shared this dataset update here

stefan-falk commented on May 13, 2024

It would be great if the data came with dedicated directories for each speaker e.g.

<dataset-id>/<speaker-id>/<sample-id>.wav
                          <sample-id>.txt

because it makes sense to separate speakers during training and testing. Not just for speaker recognition but also for STT tasks.

However, open_stt is an awesome dataset nevertheless. Are you planning on adding more languages?
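As a rough illustration of why the per-speaker layout above helps, here is a minimal Python sketch of a speaker-disjoint train/test split. It assumes paths of the form `<dataset-id>/<speaker-id>/<sample-id>.wav`, which is the layout proposed in this comment and not how open_stt is actually distributed. The function names and example paths are hypothetical.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath


def speaker_of(path):
    """Return the speaker id, assuming <dataset-id>/<speaker-id>/<sample-id>.wav."""
    return PurePosixPath(path).parts[-2]


def speaker_disjoint_split(paths, test_fraction=0.2, seed=0):
    """Split paths so that no speaker appears in both train and test."""
    by_speaker = defaultdict(list)
    for p in paths:
        by_speaker[speaker_of(p)].append(p)

    speakers = sorted(by_speaker)
    if len(speakers) < 2:
        raise ValueError("need at least two speakers for a disjoint split")

    # Shuffle speakers (not samples) with a fixed seed for reproducibility.
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    n_test = min(n_test, len(speakers) - 1)  # keep at least one train speaker
    test_speakers = set(speakers[:n_test])

    train = [p for s in speakers if s not in test_speakers for p in by_speaker[s]]
    test = [p for s in speakers if s in test_speakers for p in by_speaker[s]]
    return train, test


if __name__ == "__main__":
    # Hypothetical paths: 5 speakers with 3 samples each.
    paths = [f"books/spk{s}/{i:04d}.wav" for s in range(5) for i in range(3)]
    train, test = speaker_disjoint_split(paths)
    print(len(train), "train samples,", len(test), "test samples")
```

The split is done on speakers rather than on individual samples, so the test set only measures performance on voices the model has never heard.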

snakers4 commented on May 13, 2024

Hi!

Unfortunately, doing exactly this is not feasible due to the nature of the dataset (zero money invested in annotation).

But we could privately share speaker labels as metadata for a very limited subset of the data, mostly books, if this helps.

stefan-falk commented on May 13, 2024

I see. Well, my workaround here is throwing everything uncertain into the train set and test on data which has speaker separation. E.g. the Common Voice dataset might be reliable enough.

If I may ask, what kind of word error rate (WER) did you get on the entire open_stt dataset? I am currently not too far below 40% (using ~3000h of the data) which is actually not as good as I expected it to be for so many hours of speech. :)
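For readers comparing numbers like the 40% above, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The character-level variant (CER) is included for comparison. This is a generic textbook definition, not the evaluation script used by anyone in this thread; real toolkits usually also normalize case and punctuation first, which is omitted here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "мама мыла раму"
    hyp = "мама мыла рама"  # one wrong word ending
    print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
```

In this example a single wrong ending counts as one of three words under WER but as one of fourteen characters under CER, which is why the two metrics can look very different on the same output.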

snakers4 commented on May 13, 2024

> Well, my workaround here is throwing everything uncertain into the train set and test on data which has speaker separation.

We have a small subset of the data (15 hours) manually annotated; we will post it soon.

> what kind of word error rate (WER) did you get on the entire open_stt dataset

Sorry for the late reply, but please refer to tickets #5 and #7.
These are obviously not the best or latest models, but you can see some patterns in the distributions.
You will see that annotation quality is not consistent across the whole dataset, so it has been / will be distilled.

There have been reports that if you use ESPnet without the badly annotated data, you will get a much better result.

Weeding out the bad data will be the foremost focus of our future work.
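The thread does not describe how the bad data is actually identified. One plausible approach, shown here purely as an assumption, is to compare each annotation with the transcript produced by an existing model and drop samples where the two disagree strongly. The sketch below uses `difflib.SequenceMatcher.ratio()` from the Python standard library as a simple similarity score; the sample tuples, function names, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def agreement(annotation, transcript):
    """Similarity in [0, 1] between an annotation and a model transcript."""
    return SequenceMatcher(None, annotation, transcript, autojunk=False).ratio()


def filter_samples(samples, min_agreement=0.8):
    """Keep ids of samples whose annotation roughly matches the model output.

    `samples` is an iterable of (sample_id, annotation, model_transcript).
    """
    kept = []
    for sample_id, annotation, transcript in samples:
        if agreement(annotation, transcript) >= min_agreement:
            kept.append(sample_id)
    return kept


if __name__ == "__main__":
    # Hypothetical data: (id, annotation from the dataset, model transcript).
    samples = [
        ("a.wav", "добрый день", "добрый день"),          # exact match
        ("b.wav", "да", "сегодня хорошая погода"),        # annotation clearly off
        ("c.wav", "спасибо большое", "спасибо большой"),  # minor disagreement
    ]
    print(filter_samples(samples))
```

A threshold like this trades dataset size for annotation quality, and it also risks discarding correctly annotated samples that the model simply finds hard, so the cut-off would need tuning on manually checked data.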

stefan-falk commented on May 13, 2024

> Sorry for the late reply, but please refer to tickets #5 and #7.

@snakers4 no worries :)

Thanks for sharing that information. I will take a look at those issues.

Thanks for doing all this great work and providing such an easy-to-use dataset!

jsdtwry commented on May 13, 2024

> I see. Well, my workaround here is throwing everything uncertain into the train set and test on data which has speaker separation. E.g. the Common Voice dataset might be reliable enough.

> If I may ask, what kind of word error rate (WER) did you get on the entire open_stt dataset? I am currently not too far below 40% (using ~3000h of the data) which is actually not as good as I expected it to be for so many hours of speech. :)

Hi Stefan, you mentioned that you have trained an ASR system on Common Voice Russian. Could you share your latest WER on it? I do not know much about the Russian language. I trained a Russian ASR system on just 60 h of Russian Common Voice data, and the WER is now about 40% with a chain model in the Kaldi toolkit, even with an LM built from the test set text. Do you think that is normal? I haven't found any benchmark on Common Voice Russian. I also see that you often evaluate Russian ASR with CER; is that more common for Russian ASR? Thanks a lot!
