
pyannote / pyannote-audio

5.5K stars · 70 watchers · 729 forks · 256.32 MB

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Home Page: http://pyannote.github.io

License: MIT License

Languages: Python 36.09%, Jupyter Notebook 63.91%
pytorch speech-processing speaker-diarization speech-activity-detection speaker-change-detection speaker-embedding voice-activity-detection pretrained-models overlapped-speech-detection speaker-recognition

pyannote-audio's People

Contributors

aashish-19, bghorvath, clbarras, clement-pages, dependabot[bot], entn-at, flyingleafe, frenchkrab, greggovit, hadware, hbredin, j-petiot, juanmc2005, julien-c, kan-cloud, martinjbaker, marvinlvn, mogwai, mymoza, paulgb, paullerner, philschmid, pkorshunov, prashanthellina, purfview, rhenanbartels, simonottenhauskenbun, wesbz, wq2012, yinruiqing


pyannote-audio's Issues

Clustering approaches

Data augmentation

Tools

  • pyroomacoustics: a package for audio signal processing for indoor applications, developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
  • AudioTSM: a Python library for real-time audio time-scale modification procedures, i.e. algorithms that change the speed of an audio signal without changing its pitch.
  • CLEESE: a sound-manipulation software tool designed to generate an infinite number of natural-sounding, expressive variations around an original speech recording.
  • pydub: manipulate audio with a simple, easy-to-use high-level interface (see the sketch after this list).
  • pysox: a Python wrapper around SoX.
  • Data Augmentation For Wearable Sensor Data
  • muda: a library for augmenting annotated audio data.
  • maracas: a library for corrupting audio files with additive and convolutive noise.
  • pySpeechRev: efficient speech reverberation starting from a dataset of close-talking speech signals and a collection of acoustic impulse responses.
  • Reverb class in PASE
  • rir_simulator_python
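
As a quick taste of the tools above, a minimal pydub sketch (assuming ffmpeg is installed and a local input.wav exists; the file names are placeholders):

from pydub import AudioSegment

audio = AudioSegment.from_wav('input.wav')    # load a WAV file
louder = audio + 6                            # apply +6 dB of gain
resampled = louder.set_frame_rate(16000)      # resample to 16 kHz
resampled.export('output.wav', format='wav')  # write the result back to disk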

Databases

Error in apply mode

When running the apply command I get the following error message. This comes after I have run the tune command with the result that I describe in issue #39.

/Users/niko/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/keras/models.py:245: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
Test set: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/niko/anaconda3/envs/py35-pyannote-audio/bin/pyannote-speech-detection", line 11, in <module>
    sys.exit(main())
  File "/Users/niko/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/pyannote/audio/applications/speech_detection.py", line 615, in main
    application.apply(protocol_name, subset=subset)
  File "/Users/niko/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/pyannote/audio/applications/speech_detection.py", line 565, in apply
    precomputed = Precomputed(root_dir=apply_dir)
  File "/Users/niko/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/pyannote/audio/features/utils.py", line 110, in __init__
    start = f.attrs['start']
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/Users/travis/miniconda3/conda-bld/h5py_1496412421941/work/h5py-2.7.0/h5py/_objects.c:2846)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/Users/travis/miniconda3/conda-bld/h5py_1496412421941/work/h5py-2.7.0/h5py/_objects.c:2804)
  File "/Users/niko/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/h5py/_hl/attrs.py", line 58, in __getitem__
    attr = h5a.open(self._id, self._e(name))
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/Users/travis/miniconda3/conda-bld/h5py_1496412421941/work/h5py-2.7.0/h5py/_objects.c:2846)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/Users/travis/miniconda3/conda-bld/h5py_1496412421941/work/h5py-2.7.0/h5py/_objects.c:2804)
  File "h5py/h5a.pyx", line 77, in h5py.h5a.open (/Users/travis/miniconda3/conda-bld/h5py_1496412421941/work/h5py-2.7.0/h5py/h5a.c:2343)
KeyError: "Can't open attribute (Can't locate attribute: 'start')"

feature-extraction tutorial 'db.yml' file not found

Hi Bredin!
I was trying to replicate the feature-extraction tutorial. I followed the installation instructions given in the master branch.
But when I tried to execute the command cat ~/.pyannote/db.yml, I got the following error:

cat: /home/abhishek/.pyannote/db.yml: No such file or directory

So, after skipping this, when I tried to extract features from the audio files with pyannote-speech-feature ${EXPERIMENT_DIR} GameOfThrones, I got the following error:

/home/abhishek/anaconda3/envs/pyannote-audio/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/home/abhishek/anaconda3/envs/pyannote-audio/bin/pyannote-speech-feature", line 11, in <module>
    sys.exit(main())
  File "/home/abhishek/anaconda3/envs/pyannote-audio/lib/python3.5/site-packages/pyannote/audio/applications/feature_extraction.py", line 188, in main
    preprocessors = {'wav': FileFinder(db_yml)}
  File "/home/abhishek/anaconda3/envs/pyannote-audio/lib/python3.5/site-packages/pyannote/database/util.py", line 85, in __init__
    with open(config_yml, 'r') as fp:
FileNotFoundError: [Errno 2] No such file or directory: '/home/abhishek/.pyannote/db.yml'

So, I have the following doubts:

  1. What is the db.yml file? (See the sketch below.)
  2. How is this file created? I created the pyannote.database plugin for GameOfThrones, but this file was not created in the process.
  3. Do we have to create a symbolic link to the database (audio .wav files) in ~/.pyannote/db.yml which redirects to /path/to/GameOfThrones/corpus/{uri}.wav?
    • Does this {uri} variable need to be defined externally in the bash terminal, or is it incorporated directly from the files?
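
For what it's worth, a hedged sketch of a db.yml: it is a small YAML configuration that maps each database name to a path template, where {uri} is a placeholder that pyannote.database fills in with each file's URI automatically, so nothing needs to be defined in the shell (the path below is a placeholder):

Databases:
  GameOfThrones: /path/to/GameOfThrones/corpus/{uri}.wav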

Error with change detection

I went through the Speech Activity Detection tutorial successfully, and the resulting test mdtm files look good as far as basic segmentation goes, but there are some problems with the Change Detection tutorial.

I now get the same error when I try the evaluate and apply commands:

Using Theano backend.
Traceback (most recent call last):
  File "/Users/niko/anaconda3/envs/pyannote/bin/pyannote-change-detection", line 11, in <module>
    load_entry_point('pyannote.audio', 'console_scripts', 'pyannote-change-detection')()
  File "/Users/niko/pyannote_test/pyannote-audio/pyannote/audio/applications/change_detection.py", line 392, in main
    epoch=epoch, min_duration=min_duration)
  File "/Users/niko/pyannote_test/pyannote-audio/pyannote/audio/applications/change_detection.py", line 268, in evaluate
    predictions[uri] = aggregation.apply(dev_file)
  File "/Users/niko/pyannote_test/pyannote-audio/pyannote/audio/labeling/aggregation.py", line 120, in apply
    predictions = next(self.from_file(current_file))
  File "/Users/niko/anaconda3/envs/pyannote/lib/python3.5/site-packages/pyannote/generators/batch.py", line 365, in from_file
    incomplete=incomplete):
  File "/Users/niko/anaconda3/envs/pyannote/lib/python3.5/site-packages/pyannote/generators/batch.py", line 418, in __call__
    for fragment in self.generator.from_file(preprocessed_file):
  File "/Users/niko/anaconda3/envs/pyannote/lib/python3.5/site-packages/pyannote/generators/fragment.py", line 148, in from_file
    wav = current_file['wav']
KeyError: 'wav'

Thanks for the help!

Theory background for pyannote-speech-detection

Hi, Bredin,

I am new to LSTMs. After reading your paper (in the citation) about TristouNet, I think I have got the basic idea of how it works for speaker change detection. However, I am still confused about the theoretical background of the command pyannote-speech-detection.

Can I say that speech detection is similar to speaker turn detection if non-speech segments are considered to come from one special 'speaker' while speech parts come from another 'speaker'? In that case, speech boundary detection would be the same as speaker change detection.

However, I still wonder why the parameter setting in config.yml for speech activity detection (n_classes: 2) differs from the setting for speaker change detection (n_classes: 1). To be honest, I don't know the meaning of this parameter (n_classes). Is there another introduction or tutorial about the theoretical background of the pyannote-speech-detection command?

Thank you for your time and patience.

Liyong Guo
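
Regarding the n_classes question above, a hedged reading (not confirmed in the thread): n_classes most likely sets the size of the network's per-frame output. Speech activity detection scores two classes per frame (non-speech vs. speech), hence n_classes: 2, while speaker change detection emits a single per-frame "change" score, hence n_classes: 1. A toy illustration with placeholder shapes:

import numpy as np

num_frames = 100
sad_scores = np.random.rand(num_frames, 2)  # n_classes: 2 -> {non-speech, speech}
speech_prob = sad_scores[:, 1]              # per-frame probability of "speech"
scd_scores = np.random.rand(num_frames, 1)  # n_classes: 1 -> single "change" score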

Alternative triplet sampling strategy

We should implement (and evaluate) the following strategies; a sketch follows the list.
Given anchor and positive samples,

  • hard negative - choose negative at random such that
    d(anchor, negative) < d(anchor, positive) + margin
  • hardest negative - same as hard negative +
    negative = argmin(d(anchor, negative))
  • semi-hard negative - same as hard negative +
    d(anchor, negative) > d(anchor, positive)
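
A minimal NumPy sketch of the three strategies, assuming a precomputed pairwise distance matrix and leaving out the label bookkeeping (in practice, candidates must be restricted to samples whose label differs from the anchor's):

import numpy as np

def sample_negative(distances, anchor, positive, margin=0.2, strategy='semi-hard'):
    # hypothetical helper, not pyannote code
    d_ap = distances[anchor, positive]
    candidates = np.arange(len(distances))
    candidates = candidates[(candidates != anchor) & (candidates != positive)]
    d_an = distances[anchor, candidates]

    if strategy == 'hard':
        # closer to the anchor than positive distance + margin
        mask = d_an < d_ap + margin
    elif strategy == 'hardest':
        # among hard negatives, take the one minimizing d(anchor, negative)
        hard = candidates[d_an < d_ap + margin]
        return hard[np.argmin(distances[anchor, hard])] if len(hard) else None
    elif strategy == 'semi-hard':
        # farther than the positive, but still within the margin
        mask = (d_an > d_ap) & (d_an < d_ap + margin)

    valid = candidates[mask]
    return np.random.choice(valid) if len(valid) else None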

Speaker Change Detection

Hi,

We are trying to use pyannote for Speaker Change Detection in a supervised way.

We have created our own database and are trying to follow the tutorial instructions, but we get the following error. We haven't modified the code.

We have attached a screenshot of the error and our newly created database.
screenshot from 2017-12-21 13-24-13

Could you please guide us in this?

Regards
Ankur
pyannote-db-template-master.tar.gz

What's the "architecture.yml" file of the pretrained ETAPE model?

Hi Hervé BREDIN,
What's the "architecture.yml" file of the pretrained ETAPE model?
"A TristouNet model (trained and validated on ETAPE database) is available for you to test directly in tutorials/speaker-embedding/2+0.5/TristouNet."

In the TristouNet folder, I only find "0986.h5"; maybe that's the weights file (weight_h5), but where is the architecture.yml?

List of embedding losses

Incorrect metric transfer

Hello. If I understand correctly, "metric" is not used in the module triplet_loss.py and is not passed on to SequenceEmbeddingAutograd through kwargs, so the default "cosine" is always used.
Model training is very slow due to the distance computation. I have a dataset with 1000+ speakers, around 10 minutes each, and a GPU; the first epoch's metrics were not computed within 3 hours. Maybe you have some idea how I can speed it up?
My approach settings are:

     metric: cosine
     margin: 0.1
     clamp: positive
     per_batch: 1
     per_label: 40
     per_fold: 40
     gradient_factor: 10000
     batch_size: 512
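
In case it helps with the slowness: if distances are currently computed pair by pair in Python, a single vectorized scipy call over the whole batch is usually far faster. A minimal sketch with placeholder shapes:

import numpy as np
from scipy.spatial.distance import cdist

embeddings = np.random.randn(512, 16)                       # (batch_size, dimension)
distances = cdist(embeddings, embeddings, metric='cosine')  # all pairwise distances at once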

Unsupervised, weakly supervised, and semi-supervised learning

Unsupervised segmentation

I have one conversation with several people, and it is unlabeled. I would only like to segment all the speakers in the conversation. In your speaker change detection code, I gave the URL (local file path) of the dataset, but after one iteration I got an error, which I believe is due to the length of the dataset. Which part of the code from this project can I use?

My final goal is only to segment unique speakers in the conversation. Any idea would be appreciated, as would any basic algorithmic steps I could borrow from your code.

Any way to *just apply*?

I just want to experiment with "Speaker Change Detection" as an out-of-the-box solution. As a first step, I need to test it on my own unlabeled data using a pretrained model and see if it generalizes well, and then decide whether to train my own model.

Something that looks like a pretrained model can be found here:
https://github.com/yinruiqing/change_detection/tree/master/model

But I do not have access to the ETAPE database, nor do I understand its complicated structure or concepts (e.g. terms like "protocol" in "<database.task.protocol>").
So, how can I replicate your amazing work on my own data?

Thanks !

Citation of "Binarize predictions using onset/offset thresholding"

Hi Bredin,

I want to cite your idea of "binarize predictions using onset/offset thresholding" for speech activity detection. I think it is quite interesting compared to a single threshold.

However, I failed to find an introduction to it in two of your papers, "TristouNet: Triplet Loss for Speaker Turn Embedding" and "Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks".

Could you tell me which of your papers introduces this idea so that I can cite it in my work?

Many thanks.
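
For readers after the mechanics rather than the citation, a minimal sketch of onset/offset (hysteresis) binarization over a 1-D array of per-frame speech scores; an illustration, not the actual pyannote implementation:

import numpy as np

def binarize(scores, onset=0.7, offset=0.3):
    # mark speech as active when the score rises above `onset`,
    # and inactive only once it falls below `offset`
    labels = np.zeros(len(scores), dtype=bool)
    active = False
    for i, score in enumerate(scores):
        if not active and score > onset:
            active = True
        elif active and score < offset:
            active = False
        labels[i] = active
    return labels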

ValueError: all input arrays must have the same shape

Hello,
I tried to run pyannote on my dataset, and I get an error in the pack_ndarray function of pyannote/generators/batch.py:

def pack_ndarray(self, ndarrays):
    return np.stack(ndarrays)

I printed the shapes of my ndarrays:
(201, 35)
(201, 35)
(201, 35)
(201, 35)
(108, 35)
(201, 35)
(175, 35)
(157, 35)
(34, 35)
(51, 35)
...
The first dimension is not the same, so what's my mistake?
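
The immediate cause is that np.stack requires identical shapes, so batches mixing full-length and shorter sequences (e.g. files shorter than the fixed duration) cannot be stacked. A hedged illustration of one workaround, padding or truncating to a common number of frames (hypothetical helper, not a pyannote fix):

import numpy as np

def pad_and_stack(ndarrays, n_frames=201):
    fixed = []
    for x in ndarrays:
        if len(x) < n_frames:
            # zero-pad shorter sequences along the time axis
            pad = np.zeros((n_frames - len(x), x.shape[1]), dtype=x.dtype)
            x = np.concatenate([x, pad])
        fixed.append(x[:n_frames])  # truncate anything longer
    return np.stack(fixed)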

Alternative architectures

Training failure with several Keras versions

Hi,
I have the latest version of Keras (2.1.3), and when I tried to train a model for speaker embedding I received the following error:

TypeError: get_updates() missing 1 required positional argument: 'constraints'

I looked at Keras' sources, and this error is due to an API change in keras.optimizers.Optimizer.get_updates() that was not reflected in pyannote/audio/optimizers.py. When I tried to downgrade to Keras 2.0.0, I got another error related to an inconsistency in the keras.models.save_model signature.

Please fix this inconsistency or specify a suitable version of Keras.
Thanks!

add option to provide validation subset

This could be used like this:

embedding.py train --validation=<subset> ...

This would result in a few files being created (such as the evolution of EER as a function of epoch number, the evolution of score distributions, etc.).

cc @GregGovit

"model wasn't compiled" during fine-tuning

Hello! I have my own dataset with ~1500 speakers (3-10 minutes for each). I tried to train models with pyannote and have some results now.

  1. Models works well on audio from the same devises (train and test stages), but the system quality falls very low on audio from different devices. I understand that it's may be actual only for my database. Do you test your model the same way or do you observe the same effects?
  2. The quality grew when I have used 20 MFCC instead 11.
  3. Pre-training model works on Russian language, EER around 20% (60 speakers).
  4. I tried to fine-tune pre-training model. I used "--start" option for that, but got an error "model wasn't compiled". It fixed by adding:
    embedding.compile(optimizer=optimizer, loss=precomputed_gradient_loss)
    in 306 line base_autograd.py file, but I'm not sure that it's correct.

Thanks!

Nan loss when training a speaker-embedding with the TristouNet architecture

Hello Hervé,
I am currently trying to train the speaker embedding module using the TristouNet architecture, but I end up with a loss of nan from the second epoch onwards. Here is the command I am running:

 $ pyannote-speaker-embedding-keras train --database=db.yml --subset=train tutorials/speaker-embedding/2+0.5/TristouNet Etape.SpeakerDiarization.TV

And here are the warnings/log messages:

/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/pyannote/generators/indices.py:84: UserWarning: 5 labels (out of 179) have less than 3 training samples.
  per_label=per_label))
Epoch 1/1000
2018-01-24 17:20:21.787683: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/autograd/core.py:81: RuntimeWarning: divide by zero encountered in power
  result_value = self.fun(*argvals, **kwargs)
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/autograd/numpy/numpy_grads.py:84: RuntimeWarning: invalid value encountered in multiply
  anp.sqrt.defvjp(   lambda g, ans, vs, gvs, x : g * 0.5 * x**-0.5)
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/autograd/numpy/numpy_grads.py:46: RuntimeWarning: invalid value encountered in multiply
  unbroadcast(vs, gvs, g * y * x ** anp.where(y, y - 1, 1.)))
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce
  return umr_minimum(a, axis, None, out, keepdims)
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/numpy/core/_methods.py:26: RuntimeWarning: invalid value encountered in reduce
  return umr_maximum(a, axis, None, out, keepdims)
1/1 [==============================] - 36s - loss: 0.0535
Epoch 2/1000
1/1 [==============================] - 30s - loss: nan
Epoch 3/1000
1/1 [==============================] - 31s - loss: nan

Some minor details: as you may have guessed, I slightly changed the options of the train method of pyannote-speaker-embedding-keras so that, like the data command, a path other than ~/.pyannote/db.yml can be specified for the db.yml file.
The various config.yml files (tutorial/speaker-embedding/config.yml and tutorial/speaker-embedding/2+0.5/TristouNet/config.yml) have the same content as given in the corresponding tutorial.

That said, another odd thing is that two progress indicators are printed when running

$ pyannote-speaker-embedding-keras data --database=db.yml --duration=2 --step=0.5 tutorials/speaker-embedding/ Etape.SpeakerDiarization.TV

as shown below

Training set: 0it [00:00, ?it/s]
Training set: 28it [02:57,  6.32s/it]
100%|████████████████████████████████████| 81433/81433 [00:37<00:00, 2148.18it/s]
Development set: 0it [00:00, ?it/s]
Development set: 9it [00:47,  5.32s/it]
100%|████████████████████████████████████| 23298/23298 [00:11<00:00, 2082.87it/s]
Test set: 0it [00:00, ?it/s]
Test set: 9it [00:50,  5.66s/it]
100%|████████████████████████████████████| 22815/22815 [00:10<00:00, 2132.08it/s]

So I don't really know whether the problem I am facing comes from the training phase or from the data used to train the NN. Looking around a bit, the warnings given by autograd may be related to the bug reported here.
Have you encountered this problem before? And if so, how did you manage to circumvent it?

Cheers,
Mathieu

Edit:
After adding some print statements, it appears that logs = self.loss_and_grad(batch, embedding) in pyannote/audio/embedding/approaches_keras/base.py, l. 333, yields a gradient with NaN values on the first epoch. However, I haven't been able to find the definition of loss_and_grad to narrow down the problem yet.
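
A hedged observation: the autograd warnings above (divide by zero in power, invalid value in multiply around sqrt) are consistent with the classic sqrt-at-zero problem in Euclidean distance, where the gradient g * 0.5 * x ** -0.5 blows up when the squared distance is exactly zero (e.g. a sample paired with itself). A common workaround is to clamp the squared distance away from zero before the square root; a sketch, not the pyannote code:

import autograd.numpy as np

def euclidean_distance(x, y, eps=1e-12):
    squared = np.sum((x - y) ** 2, axis=-1)
    # np.maximum keeps the argument of sqrt strictly positive,
    # avoiding NaN gradients at zero distance
    return np.sqrt(np.maximum(squared, eps))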

build error

Hi, when I run pip install "pyannote.audio==0.3", I get the following error message:

In file included from _pysndfile.cpp:471:0:
pysndfile.hh:55:21: fatal error: sndfile.h: No such file or directory
#include <sndfile.h>
^
compilation terminated.
error: command 'gcc' failed with exit status 1


Failed building wheel for pysndfile
Running setup.py clean for pysndfile
Failed to build pysndfile
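
For anyone hitting the same failure: it typically means the system libsndfile development headers are missing at build time, so pysndfile cannot find sndfile.h. On Debian/Ubuntu they are usually provided by the libsndfile1-dev package (assuming apt):

sudo apt-get install libsndfile1-dev
pip install "pyannote.audio==0.3"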

Pre-trained model

Is there a TristouNet trained on ETAPE that I can use for domain adaptation on my speaker turn detection task?

How to use this package

Can you please supply some documentation on speaker embedding and detection? It's not clear how to use this package. Moreover, does it include diarization?

Error in tune mode

I get the following error messages when trying to run the tune command on my adapted template. Training went as expected, I think. Tuning creates tune.png and tune.yml files, but I assume this doesn't look as it should:

(plot attached)

This is the error message:

Iteration No: 1 ended. Evaluation done at provided point.
Time taken: 3.3038
Function value obtained: 0.0000
Current minimum: 0.0000
Iteration No: 2 started. Evaluating function at random point.
/Users/niko/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/keras/models.py:245: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
/Users/niko/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/skopt/optimizer/optimizer.py:195: UserWarning: The objective has been evaluated at this point before.
  warnings.warn("The objective has been evaluated "
  ... (the same message repeated)
Iteration No: 20 ended. Search finished for the next optimal point.
Time taken: 8.3426
Function value obtained: 0.0000
Current minimum: 0.0000
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x12d522a58>>
Traceback (most recent call last):
  File "/Users/niko/anaconda3/envs/py35-pyannote-audio/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 582, in __del__
AttributeError: 'NoneType' object has no attribute 'TF_DeleteStatus'

add option to restart training after crash

This could be used in the following way:

embedding.py train --restart=<epoch> ...

This also needs the following changes (see the sketch after this list):

  • update SequenceEmbedding.to_disk method to also save the state of the optimizer
  • update SequenceEmbedding.from_disk (or add a new method) to load a saved optimizer
  • update SequenceEmbedding.fit to support preloaded optimizer
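
A rough sketch of the first two bullets, assuming a Keras-style model and optimizer (the helper names are hypothetical, not the actual SequenceEmbedding API):

import pickle

def to_disk(model, optimizer, weights_path, optimizer_path):
    model.save_weights(weights_path)
    # Keras optimizers expose their state (e.g. moment estimates) as weights
    with open(optimizer_path, 'wb') as f:
        pickle.dump(optimizer.get_weights(), f)

def from_disk(model, optimizer, weights_path, optimizer_path):
    model.load_weights(weights_path)
    with open(optimizer_path, 'rb') as f:
        optimizer.set_weights(pickle.load(f))  # restore optimizer state

Note that in Keras the optimizer's weights only exist once training has started, so restoring would require building the model (and typically running one step) before calling set_weights.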

cc @GregGovit

Error in "pyannote-speaker-embedding apply"

I'm finishing going through this tutorial, and everything has gone very well otherwise, but in the last extraction part I get the following error.

It is very possible that I need to make some bigger changes to my setup or redo some of the earlier tutorials, as there seem to have been quite a few changes. Anyway, maybe I'm missing something obvious, so I'm opening an issue first. I updated pyannote-audio and the other packages to the newest versions.

pyannote-speaker-embedding apply --step=2.0 $VALIDATION_DIR/development.eer.txt ikdp.SpeakerDiarization.MyFirstProtocol $OUTPUT_DIR

Using Theano backend.
/Users/niko/anaconda3/envs/pyannote/lib/python3.5/site-packages/keras/models.py:251: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
Development set: 0it [00:00, ?it/s]
Development set: 0it [00:00, ?it/s]Traceback (most recent call last):
  File "/Users/niko/anaconda3/envs/pyannote/bin/pyannote-speaker-embedding", line 11, in <module>
    load_entry_point('pyannote.audio', 'console_scripts', 'pyannote-speaker-embedding')()
  File "/Users/niko/pyannote_test/pyannote-audio/pyannote/audio/applications/speaker_embedding.py", line 791, in main
    internal=internal)
  File "/Users/niko/pyannote_test/pyannote-audio/pyannote/audio/applications/speaker_embedding.py", line 708, in apply
    fX = extraction.apply(current_file)
  File "/Users/niko/pyannote_test/pyannote-audio/pyannote/audio/embedding/extraction.py", line 142, in apply
    incomplete=True)])
  File "/Users/niko/pyannote_test/pyannote-audio/pyannote/audio/embedding/extraction.py", line 141, in <listcomp>
    [batch for batch in self.from_file(current_file,
  File "/Users/niko/anaconda3/envs/pyannote/lib/python3.5/site-packages/pyannote/generators/batch.py", line 379, in from_file
    incomplete=incomplete):
  File "/Users/niko/anaconda3/envs/pyannote/lib/python3.5/site-packages/pyannote/generators/batch.py", line 432, in __call__
    for fragment in self.generator.from_file(preprocessed_file):
  File "/Users/niko/anaconda3/envs/pyannote/lib/python3.5/site-packages/pyannote/generators/fragment.py", line 154, in from_file
    raise ValueError('source must be one of "annotated", "annotated_extent", "annotation", "support" or "audio"')
ValueError: source must be one of "annotated", "annotated_extent", "annotation", "support" or "audio"

Thanks for the help!

Speaker Change Detection

Hi,

I have a few doubts:

  1. Is 'speech activity detection' a prerequisite for 'speaker change detection'?

  2. Is the current approach in pyannote speaker- and content-invariant?

  3. In striking a balance between purity and coverage, what value of coverage should be considered good enough in practice?

Regards
Ankur

Example with OS database?

This looks really nice. I was wondering whether you have, or plan to make available, a quickstart example using an open-source database?

Breaking change in Keras optimizer

There is a breaking change in the Keras optimizer API in version 2.0.7. So when I use SMORMS3 (an optimizer defined in pyannote-audio and used in pyannote-speech-detection and pyannote-change-detection), it raises an error:
get_updates() missing 1 required positional argument: 'constraints'

Keep track of best model so far

ValidationCheckpoint should save the best model so far in something like best.accuracy.h5 and best.fscore.h5.

It should also remember which epoch that model corresponds to so that it can be plotted in status.png.
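
A minimal Keras callback sketch of the idea (hypothetical names, not the actual ValidationCheckpoint implementation):

from keras.callbacks import Callback

class BestModelCheckpoint(Callback):
    def __init__(self, metric='accuracy', path='best.accuracy.h5'):
        super(BestModelCheckpoint, self).__init__()
        self.metric, self.path = metric, path
        self.best_value, self.best_epoch = -float('inf'), None

    def on_epoch_end(self, epoch, logs=None):
        value = (logs or {}).get(self.metric)
        if value is not None and value > self.best_value:
            # remember the epoch so it can be plotted in status.png
            self.best_value, self.best_epoch = value, epoch
            self.model.save(self.path)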
