mycrazycracy / tf-kaldi-speaker

Neural speaker recognition/verification system based on Kaldi and Tensorflow

License: Apache License 2.0

Python 68.78% Shell 27.68% MATLAB 3.48% sed 0.06%
kaldi kaldi-asr machine-learning neural-network speaker-identification speaker-recognition speaker-verification speech-processing tensorflow

tf-kaldi-speaker's People

Contributors

mycrazycracy

tf-kaldi-speaker's Issues

Correct number of steps

Hello,

thanks for the great work, it is really useful!

I have a question about how to set the number of steps per epoch for KaldiDataRandomQueue.

As far as I know, an epoch means training the neural network with all the training data for one cycle. In an epoch, we use all of the data at least once. There are many steps in one epoch, and in one step, batch_size examples are processed.

But I don't see in the code for KaldiDataRandomQueue how you make sure that all training data is used at least once per epoch, so I'm having trouble setting the number of steps.

Could you explain how I can make sure that the whole training set is seen, and how to set the number of steps?

Thank you in advance.
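
A rough way to pick the step count can be sketched as follows. It assumes that the queue draws examples uniformly at random with replacement, which is only a guess from the class name and is not confirmed here. Under that assumption an "epoch" is nominal: the step count makes the number of draws equal to the dataset size, and full coverage is expected only on average over several epochs. The function names are illustrative and are not part of tf-kaldi-speaker.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Steps needed so that steps * batch_size covers num_examples once."""
    if num_examples <= 0 or batch_size <= 0:
        raise ValueError("num_examples and batch_size must be positive")
    return int(math.ceil(num_examples / float(batch_size)))


def expected_coverage(num_examples, num_draws):
    """Expected fraction of distinct examples seen after num_draws
    uniform draws with replacement (an assumption about the sampler)."""
    return 1.0 - (1.0 - 1.0 / num_examples) ** num_draws


# Hypothetical numbers: one nominal epoch over 1,000,000 examples.
steps = steps_per_epoch(num_examples=1000000, batch_size=64)
coverage = expected_coverage(1000000, steps * 64)
```

With purely random draws, one nominal epoch is expected to touch roughly 63% of the examples (1 - 1/e), so a few nominal epochs are needed before nearly every example has been seen.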

jumpahead is removed in Python 3

The jumpahead function of the random module was removed in Python 3. I removed the call from the code and found that the EER got worse. Is this function useful for model training? I would have thought that os.urandom already gives good randomness.
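
One possible Python 3 substitute, assuming the call was there to give each data-loading worker its own random stream (the thread does not confirm this), is to build a separate generator per worker from a base seed and the worker index. The helper name make_worker_rng and the seed values are hypothetical.

```python
import random


def make_worker_rng(base_seed, worker_id):
    """Return an independent, reproducible generator for one worker.

    Seeding with a string that combines both values gives each worker
    its own stream. This stands in for Python 2's jumpahead() and is
    not the repository's own code.
    """
    return random.Random("{}-{}".format(base_seed, worker_id))


rng_a = make_worker_rng(base_seed=1234, worker_id=0)
rng_b = make_worker_rng(base_seed=1234, worker_id=1)
sample_a = [rng_a.random() for _ in range(3)]
sample_b = [rng_b.random() for _ in range(3)]
```

Each worker then calls methods on its own generator instead of the module-level functions, so the streams do not depend on how the workers are scheduled.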

Text-dependent?

Thank you for sharing this great project.
What kind of additional features need to be added to support text-dependent speaker verification?
Thanks.

Inference script

Hi Dr. Liu,
Thank you very much for sharing this. I have seen that your EER result (EER = 0.02) is state of the art, but I have a few questions: (1) I don't see the prediction code, and I would like to try inference. (2) How many days did training on the VoxCeleb dataset take?
Looking forward to your reply. Thank you!

Are some files missing?

Hi,

thanks for the great work.

When I run the sre/v1 egs, I get errors in stages 2-8 such as:

python: can't open file 'python steps/data/augment_data_dir_new.py': [Errno 2] No such file or directory

python: can't open file 'utils/sample_validset_spk2utt.py': [Errno 2] No such file or directory

nnet/run_train_nnet.sh: line 63: /tf_gpu/bin/activate: No such file or directory

I have checked the original path kaldi/egs/wsj/s5/, but none of these files exist there.

Does that mean we need to write the missing files ourselves to get this functionality?

Thanks

Cheers

Error when commenting out "--rir-set-parameters"

It's strange that in run.sh the following lines are commented out:

#  # Make a version with reverberated speech
#  rvb_opts=()
#  rvb_opts+=(--rir-set-parameters "0.5, RIRS_NOISES/simulated_rirs/smallroom/rir_list")                                         
#  rvb_opts+=(--rir-set-parameters "0.5, RIRS_NOISES/simulated_rirs/mediumroom/rir_list") 

However, --rir-set-parameters is a required parameter for steps/data/reverberate_data_dir.py, so commenting out these lines causes an error.
May I ask why they are commented out, and whether your experiments included the reverberation-augmented training data? I am having problems reproducing your results, so I want to make sure our training data is the same. Thanks!

About the GhostVLAD pooling

Thanks for releasing this useful implementation; it greatly helps my work.
However, I notice that there are GhostVLAD pooling experiments in the RESULTS.md file, but I did not find the relevant function in pooling.py. I currently need to run some tests on this popular pooling method.
Would you be willing to provide the GhostVLAD pooling function? I truly appreciate your help.

Error while running run.sh in the egs/voxceleb/v1

I am running run.sh. It worked well up to stage 7, but in stage 7 I got the error below:

File "nnet/lib/train.py", line 8, in <module>
from misc.utils import ValidLoss, load_lr, load_valid_loss, save_codes_and_config, compute_cos_pairwise_eer
ModuleNotFoundError: No module named 'misc'
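
The traceback means that Python could not find a package called misc on its import path when running nnet/lib/train.py. A possible workaround, assuming misc is a directory at the top level of the tf-kaldi-speaker checkout (not confirmed in this thread), is to put that top-level directory on sys.path before the import runs. The helper name add_repo_root is hypothetical.

```python
import os
import sys


def add_repo_root(repo_root, package="misc"):
    """Put repo_root at the front of sys.path if it contains `package`."""
    repo_root = os.path.abspath(repo_root)
    if not os.path.isdir(os.path.join(repo_root, package)):
        raise IOError("no '%s' directory under %s" % (package, repo_root))
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root
```

Setting the PYTHONPATH environment variable to the same directory before launching run.sh has the same effect without editing any script.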

About chunk size

When I extracted embeddings (stage=8), I encountered a problem: when an utterance is longer than the chunk size, extraction stalls. To continue extracting, I have to set the chunk size larger to avoid segmentation. Is this a bug, and how can I deal with it?
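
For reference, long utterances are commonly handled by cutting them into chunks, embedding each chunk, and averaging the results. The sketch below shows that general idea only; it is not the logic in nnet/lib/extract.py, the function names are hypothetical, and the unweighted average is a simplification. The two size arguments mirror the --chunk-size and --min-chunk-size options that appear in the extraction command logged further down this page.

```python
def split_into_chunks(num_frames, chunk_size, min_chunk_size):
    """Return (start, end) frame ranges covering an utterance.

    A trailing piece shorter than min_chunk_size is dropped, unless it
    is the only piece.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        if end - start >= min_chunk_size or not ranges:
            ranges.append((start, end))
    return ranges


def average_embeddings(embeddings):
    """Element-wise mean of equally sized embedding vectors."""
    count = float(len(embeddings))
    return [sum(values) / count for values in zip(*embeddings)]
```

An utterance shorter than chunk_size yields a single range, so no segmentation happens in that case, which matches the workaround of raising the chunk size described above.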

How to use angular softmax loss

Hi,

Can you please give a brief idea about how to configure angular softmax as a loss function in the script?
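
As background for this question, the target-class term of angular softmax (A-Softmax, as introduced for SphereFace) can be written in a few lines. This is a plain-Python sketch of the formula only; it does not show how the loss is selected in this repository's scripts, whose configuration keys are not given in this thread. The function names and the lambda_ annealing argument are illustrative.

```python
import math


def asoftmax_psi(cos_theta, m=4):
    """psi(theta) = (-1)^k * cos(m * theta) - 2k for theta in
    [k*pi/m, (k+1)*pi/m]; a monotonically decreasing replacement for
    cos(theta). m is the integer angular margin."""
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    k = min(int(math.floor(theta * m / math.pi)), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k


def target_logit(feature_norm, cos_theta, m=4, lambda_=0.0):
    """Target-class logit; lambda_ blends in plain softmax for annealing."""
    blended = (lambda_ * cos_theta + asoftmax_psi(cos_theta, m)) / (1.0 + lambda_)
    return feature_norm * blended
```

With m = 1 the term reduces to cos(theta), which is the ordinary softmax with normalized weights. Larger m penalizes the target class more strongly at the same angle.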

Training steps I am not able to understand

About Enrollment and Testing

I couldn't find the enrollment and testing steps. How can I distinguish between these two parts? I want to separate the enrollment utterances from the test utterances. What should I do?
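
In a generic verification setup, the two parts differ only in how the extracted embeddings are used: enrollment embeddings are combined into one model per speaker, and each test embedding is scored against a claimed speaker's model. The sketch below illustrates that split with cosine scoring. It is not this repository's pipeline, and all names and data structures are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average several embeddings into one enrollment model."""
    count = float(len(vectors))
    return [sum(values) / count for values in zip(*vectors)]


def cosine_score(a, b):
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def score_trials(enroll_embeddings, test_embeddings, trials):
    """Score (speaker, test_utt) pairs.

    enroll_embeddings: speaker id -> list of enrollment embeddings
    test_embeddings:   utterance id -> embedding
    """
    models = {spk: mean_vector(vecs) for spk, vecs in enroll_embeddings.items()}
    return [(spk, utt, cosine_score(models[spk], test_embeddings[utt]))
            for spk, utt in trials]
```

Keeping the enrollment and test utterance lists disjoint is what separates the two sets; the embedding extraction itself is the same for both.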

About "GE2E loss"

I saw code for GE2E loss in the repository that has been commented out. Did you try GE2E loss in your experiments? If so, how did it perform?

Extracting embeddings error: ValueError: Cannot feed value of shape (1, 859, 24) for Tensor u'pred_features:0'

Hello Yi Liu,

Thank you very much for your solution!

I trained with the VoxCeleb1 and VoxCeleb2 datasets and xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_tdnn4_att. Everything works as expected: training, extracting embeddings, and evaluation all work well.

But when I tried to use your pre-trained models on the same dataset to extract embeddings (stage=8), I got this error:

ValueError: Cannot feed value of shape (1, 859, 24) for Tensor u'pred_features:0', which has shape '(?, ?, 30)'

Environment:
tensorflow-gpu==1.12
cuda==9.0.0
net = xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2

How can I fix this error? Thanks in advance!
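
The message says that each frame fed to the network has 24 values while the input placeholder of the restored model expects 30, so the features on disk were not produced with the same feature configuration as the pre-trained model. Which setting differs is not stated in this thread; a different number of cepstral coefficients is one possibility, and that is only an assumption. A small check like the hypothetical helper below, run before feeding features, reports the mismatch without the long interleaved log.

```python
def check_feature_dim(feature_shape, expected_dim):
    """Raise a readable error if the last axis differs from expected_dim."""
    actual_dim = feature_shape[-1]
    if actual_dim != expected_dim:
        raise ValueError(
            "features have %d dimensions per frame but the model expects %d; "
            "re-extract features with the configuration the model was "
            "trained on" % (actual_dim, expected_dim))
    return actual_dim
```

For the case above, check_feature_dim((1, 859, 24), 30) raises the error, while features re-extracted with 30 dimensions per frame would pass.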

Full log:

# nnet/wrap/extract_wrapper.sh --gpuid -1 --env tf_cpu --min-chunk-size 25 --chunk-size 10000 --normalize false --node tdnn6_dense /home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2 "ark:apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 scp:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/feats.scp ark:- | select-voiced-frames ark:- scp,s,cs:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/vad.scp ark:- |" "ark:| copy-vector ark:- ark,scp:/home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2/xvectors_voxceleb_train/xvector.1.ark,/home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2/xvectors_voxceleb_train/xvector.1.scp"
# Started at Tue Aug  4 13:11:43 MSK 2020
#
INFO:tensorflow:Extract embedding from tdnn6_dense
2020-08-04 13:11:46.819647: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-04 13:11:48.224681: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-08-04 13:11:48.224811: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: softs-server-07
2020-08-04 13:11:48.224871: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: softs-server-07
2020-08-04 13:11:48.225023: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 440.64.0
2020-08-04 13:11:48.225144: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 440.64.0
2020-08-04 13:11:48.225172: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:305] kernel version seems to match DSO: 440.64.0
INFO:tensorflow:Extract embedding from node tdnn6_dense
WARNING:tensorflow:From /home/psadmin/projects/kaldi-tf/tf-kaldi-speaker/model/pooling.py:23: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
copy-vector ark:- ark,scp:/home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2/xvectors_voxceleb_train/xvector.1.ark,/home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2/xvectors_voxceleb_train/xvector.1.scp
apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 scp:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/feats.scp ark:-
select-voiced-frames ark:- scp,s,cs:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/vad.scp ark:-
INFO:tensorflow:[INFO] Key id00012-21Uxsk56VDQ-00001 length 859.
INFO:tensorflow:Reading checkpoints...
INFO:tensorflow:Restoring parameters from /home/psadmin/projects/voxceleb/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_mhe0.01_2/nnet/model-2610000
INFO:tensorflow:Succeed to load checkpoint model-2610000
Traceback (most recent call last):
  File "nnet/lib/extract.py", line 90, in <module>
    embedding = trainer.predict(feature)
  File "/home/psadmin/projects/kaldi-tf/tf-kaldi-speaker/model/trainer.py", line 724, in predict
ERROR (select-voiced-frames[5.5.762~1-0062]:Write():kaldi-matrix.cc:1404) Failed to write matrix to stream

[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7fed417d3d1d]
select-voiced-frames(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40e76d]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-matrix.so(kaldi::MatrixBase<float>::Write(std::ostream&, bool) const+0x1a7) [0x7fed41a174ad]
select-voiced-frames(kaldi::TableWriterArchiveImpl<kaldi::KaldiObjectHolder<kaldi::MatrixBase<float> > >::Write(std::string const&, kaldi::MatrixBase<float> const&)+0x1d6) [0x40ef40]
select-voiced-frames(main+0x580) [0x40cf50]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fed39fe8555]
select-voiced-frames() [0x40c909]

    embeddings = self.sess.run(self.embeddings, feed_dict={self.pred_features: features})
  File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
WARNING (select-voiced-frames[5.5.762~1-0062]:Write():util/kaldi-holder-inl.h:57) Exception caught writing Table object. kaldi::KaldiFatalError
WARNING (select-voiced-frames[5.5.762~1-0062]:Write():util/kaldi-table-inl.h:1057) Write failure to standard output
ERROR (select-voiced-frames[5.5.762~1-0062]:Write():util/kaldi-table-inl.h:1515) Error in TableWriter::Write

[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7fed417d3d1d]
select-voiced-frames(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40e76d]
select-voiced-frames(main+0x5d3) [0x40cfa3]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fed39fe8555]
select-voiced-frames() [0x40c909]

    run_metadata_ptr)
WARNING (select-voiced-frames[5.5.762~1-0062]:Close():util/kaldi-table-inl.h:1089) Error closing stream: wspecifier is ark:-
  File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1128, in _run
ERROR (select-voiced-frames[5.5.762~1-0062]:~TableWriter():util/kaldi-table-inl.h:1539) Error closing TableWriter [in destructor].

[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7fed417d3d1d]
select-voiced-frames(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40e76d]
select-voiced-frames(kaldi::TableWriter<kaldi::KaldiObjectHolder<kaldi::MatrixBase<float> > >::~TableWriter()+0x59) [0x412893]
select-voiced-frames(main+0x82b) [0x40d1fb]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fed39fe8555]
select-voiced-frames() [0x40c909]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 859, 24) for Tensor u'pred_features:0', which has shape '(?, ?, 30)'
ERROR (apply-cmvn-sliding[5.5.762~1-0062]:Write():kaldi-matrix.cc:1404) Failed to write matrix to stream

[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7f618cf15d1d]
apply-cmvn-sliding(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40a4a9]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-matrix.so(kaldi::MatrixBase<float>::Write(std::ostream&, bool) const+0x1a7) [0x7f618d1594ad]
apply-cmvn-sliding(kaldi::TableWriterArchiveImpl<kaldi::KaldiObjectHolder<kaldi::MatrixBase<float> > >::Write(std::string const&, kaldi::MatrixBase<float> const&)+0x29e) [0x40ad70]
apply-cmvn-sliding(main+0x335) [0x4091c2]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f618572a555]
apply-cmvn-sliding() [0x408dc9]

WARNING (apply-cmvn-sliding[5.5.762~1-0062]:Write():util/kaldi-holder-inl.h:57) Exception caught writing Table object. kaldi::KaldiFatalError
WARNING (apply-cmvn-sliding[5.5.762~1-0062]:Write():util/kaldi-table-inl.h:1057) Write failure to standard output
ERROR (apply-cmvn-sliding[5.5.762~1-0062]:Write():util/kaldi-table-inl.h:1515) Error in TableWriter::Write

[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7f618cf15d1d]
apply-cmvn-sliding(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40a4a9]
apply-cmvn-sliding(main+0x388) [0x409215]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f618572a555]
apply-cmvn-sliding() [0x408dc9]

WARNING (apply-cmvn-sliding[5.5.762~1-0062]:Close():util/kaldi-table-inl.h:1089) Error closing stream: wspecifier is ark:-
ERROR (apply-cmvn-sliding[5.5.762~1-0062]:~TableWriter():util/kaldi-table-inl.h:1539) Error closing TableWriter [in destructor].

[ Stack-Trace: ]
/home/psadmin/projects/kaldi-tf/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x8b7) [0x7f618cf15d1d]
apply-cmvn-sliding(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x40a4a9]
apply-cmvn-sliding(kaldi::TableWriter<kaldi::KaldiObjectHolder<kaldi::MatrixBase<float> > >::~TableWriter()+0x59) [0x412971]
apply-cmvn-sliding(main+0x5e0) [0x40946d]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f618572a555]
apply-cmvn-sliding() [0x408dc9]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
/bin/sh: line 1:  6753 Aborted                 apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 scp:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/feats.scp ark:-
      6754                       | select-voiced-frames ark:- scp,s,cs:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/vad.scp ark:-
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 765, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/psadmin/projects/kaldi-tf/tf-kaldi-speaker/dataset/kaldi_io.py", line 387, in cleanup
    raise SubprocessFailed('cmd %s returned %d !' % (cmd,ret))
SubprocessFailed: cmd apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 scp:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/feats.scp ark:- | select-voiced-frames ark:- scp,s,cs:/home/psadmin/projects/voxceleb/data/voxceleb_train/split40/1/vad.scp ark:-  returned 134 !

Exception KeyboardInterrupt in <module 'threading' from '/usr/lib64/python2.7/threading.pyc'> ignored
# Accounting: time=873 threads=1
# Ended (code 1) at Tue Aug  4 13:26:16 MSK 2020, elapsed time 873 seconds
