juanmc2005 / similaritylearning

Similarity Learning applied to Speaker Verification and Semantic Textual Similarity

Python 100.00%

metric-learning similarity-learning arcface contrastive-loss triplet-loss center-loss speaker-verification sts semantic-textual-similarity mnist


similaritylearning's Issues

Integrate STS Dataset

The dataset should be added with batch generation and loaders, so that it is accessed through the same interface as MNIST.
Make sure to include the possibility of generating pairs, triplets, or clusters.

Document Loss Functions

Not all loss functions are well documented. Explain them in detail, as they are the core of the project.

Speaker Verification Training Problem

Cross entropy training is not stable. The EER is going up at each epoch instead of going down.

Train Contrastive STS model

Determine a threshold to build positive and negative STS pairs and train a model with contrastive loss.
Validate with the Spearman correlation (we should discuss which validation method is best).

EER Parallel Validation

Implement EER validation trials as a separate validation script to run in parallel every time a new model is saved.

Train First Speaker Verification Model

Use SpeakerModel and VoxCeleb1 to train a first speaker verification model.
Cross entropy takes priority over the rest of the losses as discussed in previous meetings.
Make sure to use EER as the validation metric.
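For reference, a rough sketch of how the EER could be computed from verification trials. It assumes each trial has a similarity score (higher means more likely the same speaker) and a boolean target label; `equal_error_rate` is a hypothetical helper, not an existing function in the project. It scans every observed score as a threshold, which is quadratic in the number of trials and only meant to illustrate the metric.

```python
def equal_error_rate(scores, labels):
    """Approximate EER over a list of trials.

    scores: similarity per trial, higher = more likely same speaker.
    labels: True for target (same speaker) trials, False otherwise.
    Assumes at least one target and one non-target trial.
    """
    n_pos = sum(1 for is_target in labels if is_target)
    n_neg = len(labels) - n_pos
    best_gap, eer = float("inf"), None
    for t in sorted(set(scores)):
        # A trial is accepted when its score is >= the threshold.
        false_accepts = sum(1 for s, l in zip(scores, labels) if not l and s >= t)
        false_rejects = sum(1 for s, l in zip(scores, labels) if l and s < t)
        far = false_accepts / n_neg
        frr = false_rejects / n_pos
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

A lower EER is better, so a value that rises from epoch to epoch indicates that target and non-target scores are becoming harder to separate.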

Trainer Callbacks to Separate Files

base.py is starting to grow in size:

Base training classes should be moved to a single file in a "training" directory.
Subclasses of TrainingListener should be put in a training callbacks file.
Subclasses of TestListener should be put in a test callbacks file.
It could be cooler to name listeners "plugins" or something like that, since they feel like more than just callbacks.

Further Refactor Trainers

Currently the trainers have more code in common than I'm comfortable with. Try to make them as small as possible without losing the needed flexibility.

Significance Test: KL-Divergence and Contrastive loss

Perform a significance test with the results obtained using these two methods. Still waiting on confirmation for the specific method to use for this.

how to run

I want to know how to run the training code for the MNIST dataset.

Triplet STS experiment

Code and run a triplet loss experiment for STS.
We can focus on Euclidean distance for now, so that we can test as many functions as possible. Later we'll run experiments with cosine distance too.
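As an illustration, the triplet loss with a pluggable distance could look like the sketch below. The function names and the default margin are assumptions made for this example; embeddings are plain lists of floats to keep it self-contained.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=1.0, distance=euclidean):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    d_pos = distance(anchor, positive)
    d_neg = distance(anchor, negative)
    # Zero once the negative is farther than the positive by at least the margin.
    return max(0.0, d_pos - d_neg + margin)
```

Switching to cosine distance later would then only require passing `distance=cosine_distance`.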

Integrate VoxCeleb

Get familiar with the VoxCeleb dataset (doc provided) and integrate it into the project. Make it available through an interface similar to the already available MNIST dataset.

Simulate Generation of +/- STS Pairs

Calculate the number of positive and negative pairs that can be generated for STS (total and per sentence)
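One possible way to run this count is sketched below. It assumes the dataset is a list of `(sentence_a, sentence_b, gold_score)` tuples, that only annotated pairs are counted, and that a pair is positive when its score is at or above a threshold; `count_pairs` is a hypothetical helper name.

```python
from collections import Counter


def count_pairs(scored_pairs, threshold):
    """Count positive and negative pairs, in total and per sentence.

    Assumption: a pair is positive when gold_score >= threshold.
    """
    pos_per_sentence, neg_per_sentence = Counter(), Counter()
    total_pos = total_neg = 0
    for a, b, score in scored_pairs:
        if score >= threshold:
            total_pos += 1
            pos_per_sentence[a] += 1
            pos_per_sentence[b] += 1
        else:
            total_neg += 1
            neg_per_sentence[a] += 1
            neg_per_sentence[b] += 1
    return total_pos, total_neg, pos_per_sentence, neg_per_sentence
```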

ArcFace Speaker Experiment

Use ArcFace loss to train a speaker verification model. The SincNet softmax baseline should be used in order to reduce training time.
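For context, ArcFace adds an angular margin to the target class angle before the scaled softmax. The sketch below shows that computation for a single sample, given the cosines between its normalized embedding and each normalized class weight. The function names and the scale and margin defaults are illustrative assumptions, and the sketch omits the handling that full implementations apply when the angle plus the margin exceeds pi.

```python
import math


def arcface_logits(cosines, target, scale=30.0, margin=0.5):
    """Scaled logits with an additive angular margin on the target class.

    cosines: cos(theta_j) between the embedding and each class weight.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding errors
        if j == target:
            # Penalize the target class: cos(theta + m) instead of cos(theta).
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross entropy for one sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]
```

With `margin=0.0` this reduces to a scaled cosine softmax, so the margin is the only change relative to that baseline.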

Model Saving

Automatically save the model when it achieves best validation performance.
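A sketch of what such a listener might look like. `BestModelSaver`, `on_validation`, and `save_fn` are hypothetical names, and the actual saving is delegated to a callable so the example does not assume any particular framework.

```python
class BestModelSaver:
    """Calls save_fn whenever the validation metric improves."""

    def __init__(self, save_fn, higher_is_better=True):
        self.save_fn = save_fn  # e.g. a function that writes a checkpoint
        self.higher_is_better = higher_is_better
        self.best = None

    def on_validation(self, epoch, metric):
        """Save if this is the best metric so far; return whether it saved."""
        if self.best is None:
            improved = True
        elif self.higher_is_better:
            improved = metric > self.best
        else:
            improved = metric < self.best
        if improved:
            self.best = metric
            self.save_fn(epoch, metric)
        return improved
```

For a metric such as EER, where lower is better, the saver would be created with `higher_is_better=False`.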

Integrate STS Model

Add chosen STS model (provided) to the project. Make it available through an interface similar to the MNIST model.

STS Golden Rating plot

Plot the golden ratings in the dataset to understand their distribution and choose an adequate threshold
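Before plotting, the distribution can be summarized by binning the ratings. The sketch below assumes the ratings are non-negative floats and uses a hypothetical `rating_histogram` helper; the bin width of 0.5 is an arbitrary choice for illustration.

```python
from collections import Counter


def rating_histogram(scores, bin_width=0.5):
    """Map each bin's lower edge to the number of scores falling in it.

    A score s goes to the bin [k * bin_width, (k + 1) * bin_width).
    """
    counts = Counter(int(s // bin_width) for s in scores)
    return {k * bin_width: counts[k] for k in sorted(counts)}
```

The resulting counts can be passed to any plotting library as bar heights.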

Test and Compare Speaker Verification Losses

Run experiments for all losses using SincNet, compare and analyze results

Integrate SV Model

Integrate the speaker verification model (provided) into the project. Make it available through an interface similar to the MNIST model.

STS Cluster Simulation

The code needed to cluster sentences based on positive pairs and a threshold already exists.

What's needed here is to simulate the generation of clusters for different threshold values (the score from which a pair is considered positive).

To start, consider threshold values from 0 to 5 with a step of 0.5, then plot the number of clusters generated, the total number of sentences kept, and the mean cluster size.
A zone of interest will come out of this experiment. We then have to zoom in on that interval and repeat the experiment there with a step of 0.1.
After that, we have to choose a good threshold to generate clusters.

Note: Sentences which don't have positive relationships will be lost from the original dataset, as we cannot infer clusters for them, unless we allow clusters of 1 sentence.
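The simulation could be sketched as follows. Since the project's existing clustering code is not shown here, this example substitutes a simple union-find over positive pairs as a stand-in; the function names and the `>=` comparison are assumptions. Sentences without any positive pair never enter a cluster, which matches the note above.

```python
from collections import Counter


def cluster_stats(scored_pairs, threshold):
    """Return (number of clusters, sentences kept, mean cluster size).

    Stand-in clustering: sentences linked by a pair with
    gold_score >= threshold end up in the same cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            parent[find(a)] = find(b)

    sizes = Counter(find(x) for x in list(parent))
    n_clusters = len(sizes)
    kept = sum(sizes.values())
    mean_size = kept / n_clusters if n_clusters else 0.0
    return n_clusters, kept, mean_size


def sweep(scored_pairs, start=0.0, stop=5.0, step=0.5):
    """Compute cluster statistics for each threshold in [start, stop]."""
    n_steps = int(round((stop - start) / step))
    results = {}
    for i in range(n_steps + 1):
        t = round(start + i * step, 10)  # avoid float drift in the keys
        results[t] = cluster_stats(scored_pairs, t)
    return results
```

Once a zone of interest is visible, the same function can be called again on that interval, for example `sweep(pairs, start=3.0, stop=4.0, step=0.1)`, where the bounds are placeholders.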