
deepspeaker-pytorch's People

Contributors

qqueing

deepspeaker-pytorch's Issues

data_a & testloader.dataset()

Hello! In the test section of triplet_train.py, the progress output is batch_idx * len(data_a) / len(test_loader.dataset). batch_idx grows with each iteration, while len(data_a) stays constant (apparently 512 by default). However, len(test_loader.dataset) is 37720, as given by voxceleb1_test.txt. The numerator and denominator stop matching once the numerator grows large. What is wrong here? Any suggestions?
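The cause of the mismatch inside triplet_train.py is not established in this thread. As a minimal sketch of one way to keep the two numbers consistent, the counter below accumulates the number of samples actually seen instead of multiplying batch_idx by the current batch length. The function name `progress_lines` and the list of batch sizes are hypothetical and stand in for iterating over a real test loader.

```python
def progress_lines(batch_sizes, dataset_len):
    """Return one progress string per batch, counting samples actually seen.

    batch_sizes is a hypothetical stand-in for len(data_a) on each iteration.
    """
    seen = 0
    lines = []
    for batch_idx, n in enumerate(batch_sizes):
        seen += n  # accumulate real batch lengths; the last batch may be shorter
        pct = 100.0 * seen / dataset_len
        lines.append(f"batch {batch_idx}: [{seen}/{dataset_len} ({pct:.0f}%)]")
    return lines


# Assumed setup from the question: 37720 test pairs, batch size 512.
dataset_len = 37720
batch_size = 512
full, rest = divmod(dataset_len, batch_size)
batch_sizes = [batch_size] * full + ([rest] if rest else [])

lines = progress_lines(batch_sizes, dataset_len)
print(lines[0])
print(lines[-1])
```

With this counter the numerator can only exceed the denominator if the loader really yields more samples than len(test_loader.dataset), which would point at the loader or the dataset rather than at the print statement.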

How to prepare the dataset

Hi,

Thanks for the great implementation! How should I download the VoxCeleb dataset, and what is the expected directory structure? Could you also give us an example of a test_pairs file?

Best,
Xin

The pretrained model

Hello! Thanks for your code. Could you provide the pretrained model so that I can evaluate it on my own dataset?

Minor issue with avg pooling

I could be wrong, since I normally don't do speech verification.
But in my case, training fails with the following error at the affine transform that matches the embedding dimension after the average pooling:
"x = self.model.fc(x)" gives
RuntimeError: size mismatch, m1: [512 x 1024], m2: [2048 x 512] at /opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THC/generic/THCTensorMathBlas.cu:266

I think this is because the average pooling is supposed to be over the temporal dimension by design, whereas in the committed version of the code it is done over the frequency dimension.
avg_pool2d is supposed to reduce [F, T] = [4, 2] to [4, 1], but instead it reduces [4, 2] to [1, 2].
Thus the dimension after torch.view is half of what the model.fc layer expects.

[image]

So I suggest

[image]

for myResNet.__init__()

Again, I'm no expert in speech verification.
If anybody has another idea for fixing this bug, please let me know.
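The shape arithmetic in this report can be checked with a small sketch. It assumes 512 channels before pooling (inferred from 1024 / 2 and 2048 / 4 in the error message, not confirmed by the repository), no padding, and a stride equal to the kernel size. The helper names are hypothetical.

```python
def avg_pool_out(size, kernel, stride=None):
    """Output length of an unpadded average pool along one axis."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


def flattened_features(channels, freq, time, kernel_f, kernel_t):
    """Feature count after pooling a [channels, freq, time] map and flattening."""
    f = avg_pool_out(freq, kernel_f)
    t = avg_pool_out(time, kernel_t)
    return channels * f * t


# Assumed values: 512 channels, [F, T] = [4, 2] as described in the report.
CHANNELS, F, T = 512, 4, 2

# Pool over time only: [4, 2] -> [4, 1]
pool_over_time = flattened_features(CHANNELS, F, T, kernel_f=1, kernel_t=T)

# Pool over frequency only: [4, 2] -> [1, 2]
pool_over_freq = flattened_features(CHANNELS, F, T, kernel_f=F, kernel_t=1)

print(pool_over_time, pool_over_freq)
```

Under these assumptions, pooling over time gives the 2048 input features that the fc weight in the error message expects, while pooling over frequency gives the 1024 features that were actually produced.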

Numbers of frames

Hi! Why is the default number of frames so low (32, as far as I can see)? The VoxCeleb dataset was not preprocessed to drop silence segments, so many training segments contain only silence. Accuracy improves when I use a larger number of frames (and of course not only because of the silence segments). Have you run any experiments with the number of frames?
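To put the number in perspective, here is a rough sketch of how much audio 32 frames cover. It assumes a 25 ms window and a 10 ms hop, which are the python_speech_features defaults (winlen=0.025, winstep=0.01); the repository may configure these differently, and the function name is hypothetical.

```python
def clip_duration_seconds(num_frames, winlen=0.025, winstep=0.01):
    """Audio span covered by num_frames overlapping analysis windows.

    winlen and winstep are assumed defaults, not values read from the repo.
    """
    if num_frames <= 0:
        return 0.0
    return winlen + (num_frames - 1) * winstep


for n in (32, 64, 128):
    print(n, "frames ~", round(clip_duration_seconds(n), 3), "s")
```

Under these assumptions 32 frames span about a third of a second, short enough that a single pause between words could fill the whole segment.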

Question regarding filter bank

Great job on implementing the paper!

Question: why did you use python_speech_features.fbank instead of librosa.feature.melspectrogram?
Both transformations are the same, right?
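One place where the two can differ is the mel scale itself. As a sketch under the assumption that both libraries are used with default settings, python_speech_features converts Hz to mel with the HTK-style formula, while librosa's mel filter bank defaults to the Slaney-style scale (htk=False). The two functions below re-implement those formulas in plain Python for comparison; they are illustrative and are not the libraries' own code.

```python
import math


def hz_to_mel_htk(hz):
    """HTK-style mel scale: 2595 * log10(1 + hz / 700)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def hz_to_mel_slaney(hz):
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0               # Hz per mel in the linear region
    min_log_hz = 1000.0              # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp  # 15 mel at 1 kHz
    logstep = math.log(6.4) / 27.0
    if hz < min_log_hz:
        return hz / f_sp
    return min_log_mel + math.log(hz / min_log_hz) / logstep


for hz in (500.0, 1000.0, 4000.0):
    print(hz, round(hz_to_mel_htk(hz), 2), round(hz_to_mel_slaney(hz), 2))
```

The absolute mel values are on different scales, so the filter centre frequencies they produce are not identical. Other defaults (pre-emphasis, filter normalization, framing and padding, output orientation) may also differ between the two functions, so the outputs should not be assumed interchangeable without checking.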

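Request for help

Do you have any Chinese-language reference materials? I have just settled on this as my research direction. I know a little about machine learning, but I am completely in the dark when it comes to speech. Could you share some of the materials you have on hand? I would be very grateful. [email protected]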
