ravisoji / plda Goto Github PK

Probabilistic Linear Discriminant Analysis & classification, written in Python.

Home Page: https://ravisoji.com

License: Apache License 2.0

Python 51.42% Jupyter Notebook 48.58%

plda's Issues

Did you test it on face recognition?

Expected Behavior

Actual Behavior

Steps to Reproduce the Behavior

Some minor mistake in "classifier.model.calc_same_diff_likelihood_ratio(U_model_p, U_model_g)"

Expected Behavior

In classifier.model.calc_same_diff_likelihood_ratio(U_model_p, U_model_g), the snippet should be "U_model_same = np.concatenate([U_model_p], [U_model_g])".

Actual Behavior

U_model_same = np.concatenate([ U_model_p, U_model_g,])

Steps to Reproduce the Behavior

Only two brackets are missing.
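For reference, here is a minimal check of how np.concatenate treats the two call forms discussed in this issue. The array shapes are made up for illustration. The point is only that the first argument is a single sequence of arrays, and the second positional argument is the axis.

```python
import numpy as np

# Illustrative stand-ins for the two sets of latent vectors (shapes are assumptions).
U_model_p = np.zeros((3, 2))
U_model_g = np.ones((4, 2))

# One sequence holding both arrays: rows are stacked along axis 0.
U_model_same = np.concatenate([U_model_p, U_model_g])
print(U_model_same.shape)

# Two separate lists: the second list is taken as the `axis` argument,
# which NumPy rejects because it is not an integer.
try:
    np.concatenate([U_model_p], [U_model_g])
    second_form_ok = True
except (TypeError, ValueError):
    second_form_ok = False
print(second_form_ok)
```

Under these assumptions, the form currently in the code (one list containing both arrays) is the one that stacks the rows.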

How to get scores from PLDA

@RaviSoji How could I get the score which the PLDA assigns to a test input vector?

Right now I am using calc_logp_pp_categories for getting scores, but they are log probabilities and not raw scores.

ValueError: array must not contain infs or NaNs

I applied plda to my speaker verification task.

import os
import plda
import numpy as np

# Load the x-vector embeddings and their metadata.
x_vector_embeddings = np.load("../../spk_xvector.npz")
train_values = x_vector_embeddings['features']
train_labels = x_vector_embeddings['data_path']

# Collect one label per embedding from the metadata entries.
labels = []
for label in train_labels:
    labels.append(label.get('pic_path'))
myarray = np.asarray(labels)

training_data = train_values
training_labels = myarray

testing_data = train_values[:2]
testing_labels = myarray[:2]

classifier = plda.Classifier()
classifier.fit_model(training_data, training_labels, n_principal_components=5)
predictions, log_p_predictions = classifier.predict(testing_data)

It raises this error:
ValueError: array must not contain infs or NaNs

/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/optimizer.py:165: RuntimeWarning: Degrees of freedom <= 0 for slice
cov_ks.append(np.cov(X_k.T))
/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/env/lib/python3.5/site-packages/numpy/lib/function_base.py:3109: RuntimeWarning: divide by zero encountered in double_scalars
c *= 1. / np.float64(fact)
/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/env/lib/python3.5/site-packages/numpy/lib/function_base.py:3109: RuntimeWarning: invalid value encountered in multiply
c *= 1. / np.float64(fact)
Traceback (most recent call last):
File "x_vector_training.py", line 29, in <module>
classifier.fit_model(training_data, training_labels, n_principal_components=5)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/classifier.py", line 25, in fit_model
self.model = Model(X, Y, n_principal_components)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/model.py", line 94, in __init__
self.fit(row_wise_data, labels, n_principal_components)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/model.py", line 161, in fit
optimize_maximum_likelihood(X, labels)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/optimizer.py", line 68, in optimize_maximum_likelihood
W = calc_W(S_b, S_w)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/optimizer.py", line 181, in calc_W
eigenvalues, eigenvectors = eigh(S_b, S_w)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/env/lib/python3.5/site-packages/scipy/linalg/decomp.py", line 338, in eigh
b1 = _asarray_validated(b, check_finite=check_finite)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/env/lib/python3.5/site-packages/scipy/_lib/_util.py", line 238, in _asarray_validated
a = toarray(a)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/env/lib/python3.5/site-packages/numpy/lib/function_base.py", line 1233, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

@RaviSoji, how can I resolve this issue?

predict_proba (calculate probabilities of each class given a sample)

Hello,
I need the probability of each class given a test sample, as provided by the .predict_proba() method in scikit-learn.
I'd appreciate your help with this.
Does something like .predict_proba() exist in your repository, or do I need to calculate the probabilities from the scores returned in the tuple from your .predict() method?
Thanks
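One way to turn per-class log scores into probabilities is a softmax over the class axis, computed with the log-sum-exp trick for numerical stability. The sketch below assumes you already have an array of log scores of shape (n_samples, n_classes), for example from calc_logp_pp_categories, and that the classes have equal priors. The function name log_scores_to_proba is hypothetical and not part of plda.

```python
import numpy as np

def log_scores_to_proba(logps):
    """Normalize per-class log scores into probabilities (rows sum to 1).

    Assumes `logps` has shape (n_samples, n_classes) and equal class priors.
    """
    logps = np.asarray(logps, dtype=float)
    # Subtract the row maximum before exponentiating to avoid underflow/overflow.
    shifted = logps - logps.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up log scores for 2 samples and 3 classes.
example = np.array([[-1000.0, -1001.0, -1005.0],
                    [-3.0, -3.0, -3.0]])
proba = log_scores_to_proba(example)
print(proba.sum(axis=1))
```

Subtracting the row maximum matters here because log scores such as -1000 would underflow to zero if exponentiated directly.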

Error while training model

Hi @RaviSoji, I am using this code as shown in the mnist_demo notebook, but with my own data.

If I use classifier.fit_model(training_data, training_labels, n_principal_components=None), I get the following error:

error: failed in converting 2nd argument `b' of _flapack.dsygvd to C/Fortran array in scipy/linalg/decomp.py in eigh

And if I pass a value for n_principal_components, such as classifier.fit_model(training_data, training_labels, n_principal_components=5), I get the following error:

ValueError: array must not contain infs or NaNs in numpy/lib/function_base.py in asarray_chkfinite

I tried a fresh install of the environment, but the errors are the same. Any suggestions as to what is causing them?

Thanks.

Return best n categories when predicting

Is your feature request related to a problem? Please describe.
In my case (using PLDA for information retrieval), it would be better to predict, say, the best n options for a given query instead of only the best one.
As far as I can tell, the predict method does not support this, but it can be done using the calc_logp_pp_categories method.

Describe the solution you'd like
My quick solution was to use the code below:

def predict_doc_at(query, k=1):
    """
    Predict which documents best match the given query.

    :param query: input query
    :type query: str (or list of strs)
    :param k: number of documents to return
    :type k: int
    :return: the names of the k best-matching documents
    """
    # get_embeddings and PLDA_classifier are defined elsewhere in my code.
    query_embedding = get_embeddings(query)
    data = PLDA_classifier.model.transform(query_embedding,
                                           from_space='D',
                                           to_space='U_model')
    logpps_k, K = PLDA_classifier.calc_logp_pp_categories(data,
                                                          False)
    # Sort categories by descending log probability and keep the top k.
    best_k_idx = logpps_k.argsort()[::-1][:k]
    predictions = K[best_k_idx]
    return predictions
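If the log-probability array is two-dimensional (one row per query), the ranking has to be done along the category axis. Below is a minimal NumPy sketch under that assumption. The function top_k_categories and the sample arrays are hypothetical and only stand in for the values returned by calc_logp_pp_categories.

```python
import numpy as np

def top_k_categories(logpps, categories, k=1):
    """Return the k highest-scoring categories for each row of `logpps`."""
    logpps = np.atleast_2d(logpps)
    # Sort each row in descending order of score and keep the first k indices.
    best_k_idx = np.argsort(-logpps, axis=1)[:, :k]
    return np.asarray(categories)[best_k_idx]

# Made-up scores: 2 queries, 4 candidate documents.
scores = np.array([[-5.0, -1.0, -3.0, -9.0],
                   [-2.0, -8.0, -0.5, -4.0]])
docs = np.array(["doc_a", "doc_b", "doc_c", "doc_d"])
top2 = top_k_categories(scores, docs, k=2)
print(top2)
```

Each output row lists the categories for one query, best match first.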

Is it possible to save a model to be able to use it without having to retrain it every time?

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.
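A generic way to avoid retraining is to serialize the fitted object with Python's pickle module. The sketch below uses a simple stand-in object, because whether plda.Classifier pickles cleanly is an assumption that has not been checked here. The attribute names and the file name are hypothetical.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

# Hypothetical stand-in for a fitted classifier and its learned parameters.
fitted = SimpleNamespace(n_principal_components=5, mean=[0.0, 1.0])

# Write the fitted object to disk once, after training.
path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
with open(path, "wb") as f:
    pickle.dump(fitted, f)

# Later, load it back instead of fitting the model again.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.n_principal_components, restored.mean)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.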

cannot install through pip install plda

Describe the bug
After cloning the repository, I ran "pip install plda".
The error is:
ERROR: Could not find a version that satisfies the requirement plda
ERROR: No matching distribution found for plda

enrollment of model

PLDA is often used for biometric tasks, where we need to enroll a set of models. Based on those enrollments, we can generate scores. In your setup, how would enrollment be done?

error:"array must not contain infs or NaNs"

When I use this framework for PLDA experiments, this error often appears. All my input data come out of preprocessing, all values are positive and finite, and the dataset is not large.
I don't know why these errors occur. Do you have any additional requirements for the input data?

Training converging problems

Describe the bug
I want to summarize problems and lessons learned while using this PLDA implementation.

I observed two kinds of errors when training so far:

ValueError: array must not contain infs or NaNs in optimizer.py:181 running calc_W(S_b, S_w)
LinAlgError: SVD did not converge in model.py:166 running matrix_rank = np.linalg.matrix_rank(S_w)

Both boil down to the covariance matrix computation.

To Reproduce
Steps to reproduce the behavior:
Use dummy data with a single example per class:

ipdb> labels.shape
(5,)
ipdb> labels
array([1, 4, 3, 0, 2])
ipdb> embeddings.shape
(5, 896)

ipdb> plda_classifier = plda.Classifier()
ipdb> plda_classifier.fit_model(embeddings, labels)
/lnet/work/people/oplatek/plda/plda/optimizer.py:165: RuntimeWarning: Degrees of freedom <= 0 for slice
  cov_ks.append(np.cov(X_k.T))
/lnet/work/people/oplatek/moosenet03/env/lib/python3.8/site-packages/numpy/lib/function_base.py:2680: RuntimeWarning: divide by zero encountered in true_divide
  c *= np.true_divide(1, fact)
/lnet/work/people/oplatek/moosenet03/env/lib/python3.8/site-packages/numpy/lib/function_base.py:2680: RuntimeWarning: invalid value encountered in multiply
  c *= np.true_divide(1, fact)
*** numpy.linalg.LinAlgError: SVD did not converge

What helped?

So far, I created dummy data using Gaussian noise to ensure that there are two examples per label.

embeddings = np.concatenate((embeddings, embeddings + np.random.randn(*embeddings.shape)), axis=0)
labels = np.concatenate((labels, labels), axis=0)
plda_classifier.fit_model(embeddings, labels)  # runs without errors

I am not 100% sure what the root cause is, but I suspect that computing the covariance matrix from a single vector, when computing the within-class scatter, is the problematic part.
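This suspicion can be checked in isolation. With a single observation, np.cov divides by N - 1 = 0 and returns NaNs, which matches the "Degrees of freedom <= 0 for slice" warning in the log. Below is a minimal reproduction with made-up data. The variable X_k only mimics the per-class matrix passed to np.cov in optimizer.py.

```python
import warnings
import numpy as np

# One class with a single 3-dimensional example.
X_k = np.array([[0.2, -1.0, 0.7]])

with warnings.catch_warnings(), np.errstate(all="ignore"):
    warnings.simplefilter("ignore")  # silence the expected RuntimeWarnings
    cov_single = np.cov(X_k.T)

print(np.isnan(cov_single).all())

# With two examples in the class, the estimate is finite.
X_two = np.array([[0.2, -1.0, 0.7],
                  [0.4, -0.8, 0.1]])
cov_two = np.cov(X_two.T)
print(np.isfinite(cov_two).all())
```

A NaN-filled within-class scatter matrix would then trip the finiteness check in scipy's eigh, which is where the ValueError is raised.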

I will follow up and probably create a PR with asserts for the input data in order to avoid the error.
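Such an input check might look like the following. This is only a sketch of the idea, not code from the repository. The helper name check_training_inputs and the minimum of two examples per class are assumptions based on the observations in this issue.

```python
import numpy as np

def check_training_inputs(X, labels, min_per_class=2):
    """Raise ValueError if the data cannot give finite within-class covariances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    if X.ndim != 2 or X.shape[0] != labels.shape[0]:
        raise ValueError("X must be 2-D with one row per label")
    if not np.isfinite(X).all():
        raise ValueError("X contains infs or NaNs")
    # Count examples per class and report classes that are too small.
    classes, counts = np.unique(labels, return_counts=True)
    too_small = classes[counts < min_per_class]
    if too_small.size > 0:
        raise ValueError(
            f"classes with fewer than {min_per_class} examples: {too_small.tolist()}"
        )

# Five classes with one example each, as in the reproduction steps.
embeddings = np.random.randn(5, 896)
labels = np.array([1, 4, 3, 0, 2])
try:
    check_training_inputs(embeddings, labels)
    passed = True
except ValueError as err:
    passed = False
    print(err)
```

Calling a check like this before fit_model would turn the opaque NaN and SVD errors into a message that names the offending classes.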

Note: There are several closed issues (#56, #57), but I am opening this one for a new discussion and to keep my (next) findings in a single place. Feel free to close it if you do not like it.

Is the implementation of Sec-3.1 in the paper added now?

I want to use this repo for a speaker recognition experiment.
In my case, I have a number of i-vectors from different speakers in the training set.
After training the model, I want to use it to classify vectors from unknown speakers that belong to other classes not in the training set; each of those classes already has some labeled vectors. This is the situation described in Sec. 3.1 of the paper.
But I can only find a way to classify these vectors into the classes of the training set, using the .predict() method. How can I handle these new classes?
