jakobovski / free-spoken-digit-dataset
A free audio dataset of spoken digits. An audio version of MNIST.
It would be nice to have a DOI for the dataset instead of relying on just git tags.
I would suggest using Zenodo, as it is integrated with GitHub and supports multiple versions of the same dataset.
How can I get the whole dataset, or part of it, for my project?
Hi. I was listening to the new samples from Jason, a lot of them are badly truncated at the beginning.
How to perform 1D wavelet scattering transform on this dataset?
Some files are 8-bit, like 1_nicolas_36.wav, while others are 16-bit, like 9_theo_22.wav.
It is hard to deal with two different sample formats. How can I unify them?
Currently, the dataset contains 1501 recordings instead of the 1500 described in the README. If you check the list of files in the recordings directory, you will see a file named "6_jackson_50.wav". However, since indices run from 0 to 49, the maximum possible index should have been 49, not 50.
FYI, the file encoding for speaker nicolas is 8-bit unsigned integer, whereas all other speakers use 16-bit signed integers.
sox -b 16 -e signed-int old_nicolas.wav new_nicolas.wav
does the trick.
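If you would rather do the conversion in Python than shell out to sox, here is a minimal sketch of the sample-format math (assuming the raw 8-bit samples have already been read into a NumPy array, e.g. via the standard-library wave module or scipy.io.wavfile; the function name u8_to_s16 is my own):

```python
import numpy as np

def u8_to_s16(samples_u8):
    """Convert 8-bit unsigned PCM samples to 16-bit signed PCM.

    8-bit WAV stores unsigned bytes centered at 128; 16-bit WAV stores
    signed ints centered at 0, so we re-center and then rescale.
    """
    centered = samples_u8.astype(np.int16) - 128  # now in [-128, 127]
    return centered * 256                         # scale to the 16-bit range

# Silence (128) maps to 0; the extremes map to -32768 and 32512.
print(u8_to_s16(np.array([0, 128, 255], dtype=np.uint8)))
```

The cast to int16 before subtracting avoids wrap-around in unsigned arithmetic.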
Right now the entire dataset is contained in a single directory (https://github.com/Jakobovski/free-spoken-digit-dataset/tree/master/recordings). This will not scale once the dataset becomes larger. Depending on the file system, even listing the directory contents with ls can become burdensome after around 10,000 files.
But another reason to do so is that the current layout may prevent the files from being queried using GitHub's developer API in the future. I am building an interface to the dataset that can automatically download, query and organize the dataset into training and testing sets without having to first clone the dataset using git. However, there is a limit on the number of files that can be retrieved using this API, and after this limit, the only method would be to clone the repository and retrieve the files manually.
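One possible layout change, sketched here as a hypothetical helper (shard_by_speaker and the per-speaker directory scheme are my own suggestion, not something the repo provides), would be to split recordings/ into one subdirectory per speaker, relying on the existing `<digit>_<speaker>_<index>.wav` naming convention:

```python
import tempfile
from pathlib import Path

def shard_by_speaker(recordings_dir):
    """Move each '<digit>_<speaker>_<index>.wav' file into a
    per-speaker subdirectory (a hypothetical layout, not the repo's)."""
    root = Path(recordings_dir)
    for wav in root.glob("*.wav"):
        speaker = wav.stem.split("_")[1]  # filename format: digit_speaker_index
        dest = root / speaker
        dest.mkdir(exist_ok=True)
        wav.rename(dest / wav.name)

# Demo on a throwaway directory with an empty placeholder file.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "0_jackson_0.wav").touch()
    shard_by_speaker(d)
    print((Path(d) / "jackson" / "0_jackson_0.wav").exists())  # True
```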
Regards,
Cesar
Hi! Would you consider adding the Serbian language to the dataset? I am interested in contributing my voice, and as many others as I can gather. I suppose this would also be simpler to accomplish if we could collect audio online using an automated website.
Have you considered choosing a license (Creative Commons for instance) to make explicit the conditions for copying and reuse?
I think it would be better if you added numbers like 20, 30, 40, 50, 60, 70, 80, 90, and 100, as well as 11 through 20,
so we could have all the number combinations. For example, if we used this dataset in an application to recognize numbers like 3.45, 578, or 54, the improved dataset would help a lot.
It seems that the recordings done by Jackson have only 1 audio channel (mono), but the recordings by Nicolas have 2 audio channels (stereo). I've noticed that there are no guidelines regarding how many channels the recordings should have in the main README.md at the front page of this project. As such, I would like to know whether the samples from this dataset can have samples with different audio channels, or whether there are plans to normalize the samples such that they all have just one channel. In either case, I suppose the contribution guidelines could be extended with this information.
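Until the samples are normalized upstream, consumers can downmix on their side. A minimal sketch, assuming stereo audio has been loaded as a NumPy array shaped (num_samples, num_channels) and mono as a 1-D array (to_mono is a hypothetical helper name):

```python
import numpy as np

def to_mono(audio):
    """Downmix a (num_samples, num_channels) PCM array to mono by
    averaging the channels; mono input is returned unchanged."""
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=1)

stereo = np.array([[1.0, 3.0], [0.0, 2.0]])
print(to_mono(stereo))  # [2. 1.]
```

Averaging rather than dropping a channel keeps information from both channels and avoids a level jump.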
Regards,
Cesar
Thank you for your work. If I want to use this dataset in my paper, what citation format should I use?
I believe the Zenodo and TensorFlow versions of FSDD are out of sync. Links:
https://zenodo.org/record/1342401#.YWygx9nMK3I
https://www.tensorflow.org/datasets/catalog/spoken_digit
Hello,
Can I ask a simple (read: stupid) question? Can I use this model for digit recognition, for speaker recognition/identification, or for both?
Thanks in advance,
Mirko
I used this code to train a model to classify the free-spoken-digit-dataset (https://github.com/mikesmales/Udacity-ML-Capstone). The trained model's accuracy is 96%.
But prediction with the saved model fails when I test it on my own recorded voice. I recorded some digits using the Windows 10 Voice Recorder and converted the files to 8 kHz mono WAV format.
Can you provide any help? The model predicts accurately on the recordings provided within the dataset.
My Recorded Digit 3:
Original sample rate: 22050
Librosa sample rate: 22050
Original audio file minmax range: 20 to 239
Librosa audio file minmax range: -0.84375 to 0.8671875
(40, 18)
Dataset : Jackson 3:
Original sample rate: 8000
Librosa sample rate: 22050
Original audio file minmax range: -10989 to 9277
Librosa audio file minmax range: -0.35349792 to 0.28417692
(40, 21)
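Two things stand out in the dump above. First, librosa.load resamples to 22050 Hz by default; passing sr=None keeps the file's native rate, and sr=8000 would match FSDD. Second, the two MFCC matrices have different widths, (40, 18) vs (40, 21), so they must be padded or truncated to a common frame count before reaching the model. A sketch of the latter step (fix_width and target_frames=21 are hypothetical names chosen to match the shapes above):

```python
import numpy as np

def fix_width(mfcc, target_frames=21):
    """Zero-pad or truncate an MFCC matrix along the time axis so
    every example has the same shape before it reaches the model."""
    n_mfcc, n_frames = mfcc.shape
    if n_frames < target_frames:
        pad = np.zeros((n_mfcc, target_frames - n_frames))
        return np.hstack([mfcc, pad])
    return mfcc[:, :target_frames]

print(fix_width(np.ones((40, 18))).shape)  # (40, 21)
```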
I would suggest making all of the recordings a standard 1 second long, so we would have access to all 8000 samples; currently, the recordings vary greatly in length.
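Consumers can already enforce a fixed length on their side. A minimal sketch, assuming FSDD's 8 kHz sample rate and zero-padding shorter clips (pad_to_one_second is a hypothetical helper name):

```python
import numpy as np

SR = 8000  # FSDD sample rate

def pad_to_one_second(samples):
    """Zero-pad (or truncate) a recording to exactly one second,
    i.e. 8000 samples, so every example has the same length."""
    out = np.zeros(SR, dtype=samples.dtype)
    n = min(len(samples), SR)
    out[:n] = samples[:n]
    return out

print(pad_to_one_second(np.ones(3000)).shape)  # (8000,)
```

Padding at the end is the simplest choice; centering the utterance in the window is a common alternative.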