jakobovski / free-spoken-digit-dataset

A free audio dataset of spoken digits. An audio version of MNIST.

Python 100.00%
audio dataset machine-learning mnist speech-recognition spoken-digits spoken-language

free-spoken-digit-dataset's People

Contributors

adhishthite, antgeorge, cesarsouza, eonu, epochdv, experimenti, farizrahman4u, felixdollack, jakobovski, madtracki, mikayelh, pssf23, speechwrecko, verbose-void, yuxinpan, yweweler

free-spoken-digit-dataset's Issues

DOI

It would be nice to have a DOI for the dataset instead of relying only on git tags.
I would suggest using Zenodo, as it is integrated with GitHub and supports multiple versions of the same dataset.

Problem with 8-bit and 16-bit files

Some files are 8-bit, such as 1_nicolas_36.wav.
Some files are 16-bit, such as 9_theo_22.wav.
It is hard to work with two different bit depths.
How can I unify them?
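
One possible way to unify the bit depth is to rewrite every 8-bit file as 16-bit. The sketch below uses only Python's standard `wave` and `struct` modules and assumes uncompressed PCM WAV files, where 8-bit samples are unsigned (centered on 128) and 16-bit samples are signed little-endian. The function names `pcm8_to_pcm16` and `convert_wav_to_16bit` are hypothetical and not part of the dataset's tooling.

```python
import struct
import wave


def pcm8_to_pcm16(data: bytes) -> bytes:
    """Convert unsigned 8-bit PCM samples to signed 16-bit little-endian PCM."""
    # Remove the 128 offset of unsigned 8-bit audio, then scale by 256.
    samples = [(b - 128) << 8 for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)


def convert_wav_to_16bit(src_path: str, dst_path: str) -> None:
    """Copy a PCM WAV file, widening 8-bit samples to 16-bit if needed."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    if params.sampwidth == 1:
        frames = pcm8_to_pcm16(frames)
    elif params.sampwidth != 2:
        raise ValueError(f"unsupported sample width: {params.sampwidth} bytes")

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(2)  # 2 bytes = 16 bits per sample
        dst.setframerate(params.framerate)
        dst.writeframes(frames)
```

Widening 8-bit files keeps all of their information, whereas narrowing 16-bit files to 8-bit would discard precision, which is why the sketch converts in this direction.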

Consider spreading the data into multiple directories

Right now the entire dataset is contained in a single directory (https://github.com/Jakobovski/free-spoken-digit-dataset/tree/master/recordings). This will not scale once the dataset becomes larger. Depending on the file system, even listing the directory contents with ls can become burdensome after around 10,000 files.

But another reason to do so is that the current layout may prevent the files from being queried using GitHub's developer API in the future. I am building an interface to the dataset that can automatically download, query and organize the dataset into training and testing sets without having to first clone the dataset using git. However, there is a limit on the number of files that can be retrieved using this API, and after this limit, the only method would be to clone the repository and retrieve the files manually.

Regards,
Cesar
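
A minimal sketch of how a nested layout could be derived from the existing file names, assuming they follow the `{digit}_{speaker}_{index}.wav` pattern seen in names such as 1_nicolas_36.wav. The `recordings/<speaker>/<digit>/` layout and the function names are illustrative assumptions, not a layout the project has adopted.

```python
from pathlib import PurePosixPath


def parse_recording_name(filename: str) -> tuple[int, str, int]:
    """Split '{digit}_{speaker}_{index}.wav' into its three parts."""
    stem = PurePosixPath(filename).stem
    digit, rest = stem.split("_", 1)
    # rsplit keeps any underscores inside the speaker name intact.
    speaker, index = rest.rsplit("_", 1)
    return int(digit), speaker, int(index)


def sharded_path(filename: str, root: str = "recordings") -> str:
    """Return a hypothetical nested location: root/speaker/digit/filename."""
    digit, speaker, _ = parse_recording_name(filename)
    name = PurePosixPath(filename).name
    return str(PurePosixPath(root) / speaker / str(digit) / name)


print(sharded_path("9_theo_22.wav"))
```

Grouping by speaker and then by digit keeps each directory small and would also make a per-speaker train/test split a matter of selecting directories.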

Add other languages

Hi! Would you consider adding the Serbian language to the dataset? I am interested in contributing my voice and as many others as I can gather. I suppose this would also be simpler to accomplish if we could gather audio online using an automated website.

License

Have you considered choosing a license (Creative Commons for instance) to make explicit the conditions for copying and reuse?

adding more numbers

I think it would be better to add the numbers from 11 to 20 as well as 30, 40, 50, 60, 70, 80, 90 and 100, so that all number combinations can be formed. For example, an application that recognizes numbers such as 3.45, 578, or 54 would benefit a lot from the extended dataset.

Normalize the recordings to have the same number of channels

It seems that the recordings by Jackson have only 1 audio channel (mono), but the recordings by Nicolas have 2 audio channels (stereo). I've noticed that the main README.md on the front page of this project gives no guidelines on how many channels the recordings should have. I would therefore like to know whether recordings in this dataset may have different numbers of audio channels, or whether there are plans to normalize them so that they all have just one channel. In either case, I suppose the contribution guidelines could be extended with this information.

Regards,
Cesar
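
If the recordings were normalized to one channel, a stereo file could be downmixed by averaging the left and right samples. The sketch below assumes interleaved signed 16-bit little-endian PCM frames, as returned by `wave.Wave_read.readframes` for a 2-channel, 16-bit file. The function name `stereo16_to_mono16` is hypothetical.

```python
import struct


def stereo16_to_mono16(frames: bytes) -> bytes:
    """Average interleaved left/right 16-bit samples into a single channel."""
    if len(frames) % 4 != 0:
        raise ValueError("expected whole stereo frames of 4 bytes each")
    count = len(frames) // 2
    samples = struct.unpack(f"<{count}h", frames)
    # Samples alternate left, right, left, right; floor-average each pair.
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, count, 2)]
    return struct.pack(f"<{len(mono)}h", *mono)
```

The result can be written back with `wave` after calling `setnchannels(1)`. Averaging rather than keeping one channel avoids losing a voice that was recorded louder on one side.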

About reference

Thank you for your work. If I want to use this dataset in my paper, what citation format should I use?

RFI

Hello,
Can I ask a simple (read: stupid) question: can I use this dataset for digit recognition, for speaker recognition/identification, or for both?
Thanks in advance,
Mirko

How can we record an 8 kHz mono WAV file for digit classification?

I used this code to train a model that classifies the free-spoken-digit-dataset (https://github.com/mikesmales/Udacity-ML-Capstone). The accuracy of the trained model is 96%.

But prediction with the saved model fails when I test it on my own recorded voice. I recorded some digits using the Windows 10 Voice Recorder and converted the files to 8 kHz mono WAV format.

Can you provide any help? The model predicts accurately on the recordings provided within the dataset.

My Recorded Digit 3:

Original sample rate: 22050
Librosa sample rate: 22050
Original audio file minmax range: 20 to 239
Librosa audio file minmax range: -0.84375 to 0.8671875
(40, 18)

Dataset : Jackson 3:

Original sample rate: 8000
Librosa sample rate: 22050
Original audio file minmax range: -10989 to 9277
Librosa audio file minmax range: -0.35349792 to 0.28417692
(40, 21)
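
The two logs above differ in format as well as content: the recorded file reports 22050 Hz with samples between 20 and 239, which looks like unsigned 8-bit audio, while the dataset file reports 8000 Hz with a range that fits signed 16-bit audio. A possible first check, sketched here with the standard `wave` module, is to read the header of a recording and list where it differs from a target format. The target of 8000 Hz, mono, 16-bit is an assumption inferred from the Jackson log and the mono remark in an earlier issue, and the names `describe_wav`, `TARGET_FORMAT` and `format_mismatches` are hypothetical.

```python
import wave

# Assumed target format, inferred from the dataset log above (not confirmed).
TARGET_FORMAT = {"sample_rate": 8000, "channels": 1, "bits_per_sample": 16}


def describe_wav(path: str) -> dict:
    """Read the sample rate, channel count and bit depth from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def format_mismatches(path: str, target: dict = TARGET_FORMAT) -> dict:
    """Map each differing property to a (found, expected) pair."""
    info = describe_wav(path)
    return {
        key: (info[key], expected)
        for key, expected in target.items()
        if info[key] != expected
    }
```

An empty result means the header matches the assumed target. A non-empty result, such as a sample rate of 22050, would suggest that the conversion to 8 kHz did not take effect for that file.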
