qqueing / deepspeaker-pytorch
Speaker embedding (verification and recognition) using PyTorch
License: MIT License
Hello! In the test section of triplet_train.py, the progress printout is batch_idx * len(data_a) / len(test_loader.dataset). batch_idx grows with each iteration, while len(data_a) stays constant (512 by default, it seems). However, len(test_loader.dataset) is 37720, as given by voxceleb1_test.txt. The numerator and denominator no longer match once the numerator grows very large. What is wrong here? Any suggestions?
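The arithmetic behind that printout can be checked with a small sketch. It assumes a 0-based batch_idx, a single pass over the loader, and the numbers quoted in the issue (batch size 512, dataset length 37720); the helper name progress_percent is hypothetical and not part of triplet_train.py.

```python
import math


def progress_percent(batch_idx, batch_size, dataset_len):
    """Percentage of samples consumed before batch `batch_idx` (0-based) starts.

    The count is clamped to dataset_len so the result never exceeds 100.
    """
    seen = min(batch_idx * batch_size, dataset_len)
    return 100.0 * seen / dataset_len


dataset_len = 37720  # length quoted in the issue
batch_size = 512     # default batch size quoted in the issue

# Number of batches in one pass, counting the final partial batch.
num_batches = math.ceil(dataset_len / batch_size)

# Progress reported for the last batch index of a single pass.
last = progress_percent(num_batches - 1, batch_size, dataset_len)
print(num_batches, round(last, 2))
```

Under these assumptions, a single pass should never push batch_idx * batch_size past the dataset length. If the printed numerator does exceed it, one possible explanation is that the loader yields more batches than a single pass over 37720 items would, but the thread does not confirm the cause.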
Could you tell me why you use L2 distance rather than the cosine similarity proposed in the paper?
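For context on this question: when embeddings are L2-normalized (an assumption here; the thread does not confirm it for this repository), squared L2 distance and cosine similarity are tied by the identity ||a - b||^2 = 2 - 2 cos(a, b), so ranking pairs by one gives the same order as ranking by the other. A minimal sketch in plain Python, with made-up vectors:

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Illustrative vectors only; not embeddings from the model.
a = l2_normalize([1.0, 2.0, 3.0])
b = l2_normalize([-2.0, 0.5, 4.0])

lhs = squared_l2(a, b)
rhs = 2.0 - 2.0 * cosine_similarity(a, b)
print(math.isclose(lhs, rhs, abs_tol=1e-12))
```

The identity only holds for unit-length vectors; without normalization the two measures can rank pairs differently.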
Will using delta features improve the performance?
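For readers unfamiliar with the term, delta features are usually computed as a regression over neighbouring frames and appended to the static features. The sketch below shows that standard formula in plain Python; the function name, the window size N=2, and the edge handling (repeating the first and last frame) are assumptions for illustration, and it says nothing about whether accuracy improves for this model.

```python
def delta(frames, N=2):
    """Regression-based delta for a list of per-frame feature vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with frames beyond the edges replaced by the first or last frame.
    """
    if N < 1:
        raise ValueError("N must be a positive integer")
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, num_frames - 1)][d]
                prv = frames[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


# A one-dimensional linear ramp: the interior delta equals the slope.
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta(static, N=2)

# Stack static and delta features frame by frame.
stacked = [s + d for s, d in zip(static, deltas)]
print(stacked[2])
```

Stacking doubles the per-frame feature dimension, so the input layer of any model consuming these features would need to be sized accordingly.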
Hi,
Thanks for the great implementation! I am wondering how I should download the VoxCeleb dataset. What is the expected directory structure? Also, could you give us an example of a test_pairs file?
Best,
Xin
Hello! Thanks for your code. Could you provide the pretrained model so that I can evaluate it on my own dataset?
I could be wrong, since I normally don't do speech verification.
But in my case, running the training gives the following error when the affine transform is applied to match the embedding dimension after the average pooling:
"x = self.model.fc(x)" gives
RuntimeError: size mismatch, m1: [512 x 1024], m2: [2048 x 512] at /opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THC/generic/THCTensorMathBlas.cu:266
I think this is because
avgpool is supposed to operate on the temporal dimension by design, but in the committed version of the code the average pooling is done over the frequency dimension.
avg pool2d is supposed to give [F, T] = [4, 2] => [4, 1], but instead it gives [4, 2] => [1, 2].
Thus the dimension after torch.view is half of what the model.fc layer expects.
for myResNet.init()
Again, I'm no expert in speech verification.
If anybody has another idea on how to fix this bug, please let me know.
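The shape arithmetic described in this issue can be sketched without PyTorch. The channel count of 512 is an assumption chosen because it is consistent with the error message (2048 = 512 * 4 and 1024 = 512 * 2); the helper avg_pool_out is hypothetical and only models a non-overlapping average pool.

```python
def avg_pool_out(freq, time, kernel):
    """Output (freq, time) size of a non-overlapping average pool.

    kernel is (kernel_freq, kernel_time); the stride equals the kernel.
    """
    kernel_freq, kernel_time = kernel
    return freq // kernel_freq, time // kernel_time


channels = 512      # assumption, consistent with the sizes in the error
freq, time = 4, 2   # [F, T] as given in the issue

# Pooling over time collapses T and keeps all frequency bins.
f_t, t_t = avg_pool_out(freq, time, kernel=(1, time))
flat_time_pooled = channels * f_t * t_t

# Pooling over frequency collapses F and keeps both time steps.
f_f, t_f = avg_pool_out(freq, time, kernel=(freq, 1))
flat_freq_pooled = channels * f_f * t_f

print((f_t, t_t), flat_time_pooled)
print((f_f, t_f), flat_freq_pooled)
```

Under these assumptions, pooling over time yields the 2048 features that the fc weight in the error message expects, while pooling over frequency yields the 1024 features that were actually passed in.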
Hello, I can download the dataset from this URL: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/download.sh with this command: ./download.sh list.txt, but all the downloaded audio is in one file. How can I separate it by speaker? I hope this is understandable; my English is poor. Many thanks.
Can I use this with CUDA 9.0?
Hi! Why do you use such a low number of frames by default (32, as far as I can see)? The VoxCeleb dataset was not preprocessed to drop silent segments, so many parts of the training data are only silence. Accuracy increases when I use a greater number of frames (of course, that is not only because of the silent segments). Did you perhaps run some experiments with the number of frames?
Great job on implementing the paper!
Question: why did you use python_speech_features.fbank instead of librosa.feature.melspectrogram?
Both transformations are the same, right?
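Do you have any Chinese-language reference materials? I have just settled on this as my research direction. I know a little about machine learning, but I am completely in the dark when it comes to speech. Could you share some of the materials you have on hand? I would be very grateful. [email protected]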
Hi! @qqueing
How do I run this code?
It would be great if there were a pre-trained model that could be fine-tuned on your own small dataset.
Could you please show the folder structure of voxceleb/ ?