ravisoji / plda
Probabilistic Linear Discriminant Analysis & classification, written in Python.
Home Page: https://ravisoji.com
License: Apache License 2.0
classifier.model.calc_same_diff_likelihood_ratio(U_model_p, U_model_g)
In this function, the snippet should be "U_model_same = np.concatenate([U_model_p], [U_model_g])"
U_model_same = np.concatenate([ U_model_p, U_model_g,])
Only two brackets are missing.
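For reference, NumPy's documented signature is `np.concatenate((a1, a2, ...), axis=0)`: all arrays go together in one sequence as the first argument, and a second positional argument is read as `axis`. A minimal sketch, assuming `U_model_p` and `U_model_g` are 2-D arrays of shape `(n_samples, n_dims)` in the same space (the shapes and values below are made up for illustration):

```python
import numpy as np

# Hypothetical inputs: 2 and 3 vectors of dimension 4 in the same latent space.
U_model_p = np.zeros((2, 4))
U_model_g = np.ones((3, 4))

# One list holding both arrays; rows are stacked along axis 0.
U_model_same = np.concatenate([U_model_p, U_model_g])
print(U_model_same.shape)  # (5, 4)

# By contrast, np.concatenate([U_model_p], [U_model_g]) would pass [U_model_g]
# as `axis`, which NumPy rejects because axis must be an integer.
```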
@RaviSoji, how can I get the score that PLDA assigns to a test input vector?
Right now I am using calc_logp_pp_categories to get scores, but these are log probabilities, not raw scores.
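In speaker verification, the PLDA "score" is usually the log-likelihood ratio between the same-speaker and different-speaker hypotheses for a pair of vectors. A minimal NumPy sketch under a zero-mean two-covariance model, assuming you already have a between-class covariance `B` and a within-class covariance `W`; `B`, `W`, and both helper names are hypothetical and not the plda package's API:

```python
import numpy as np

def gaussian_logpdf(x, cov):
    """Log density of N(x; 0, cov) for a 1-D vector x."""
    d = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    maha = x @ np.linalg.solve(cov, x)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def two_cov_llr(x1, x2, B, W):
    """log p(x1, x2 | same class) - log p(x1) - log p(x2).

    Model (assumed): x = y + e with y ~ N(0, B) shared within a class
    and e ~ N(0, W) drawn independently for every vector.
    """
    T = B + W                           # covariance of a single vector
    joint = np.block([[T, B], [B, T]])  # same class => cross-covariance B
    same = gaussian_logpdf(np.concatenate([x1, x2]), joint)
    diff = gaussian_logpdf(x1, T) + gaussian_logpdf(x2, T)
    return same - diff

B = 10.0 * np.eye(2)
W = 0.1 * np.eye(2)
llr_match = two_cov_llr(np.array([1.0, 1.0]), np.array([1.0, 1.0]), B, W)
llr_mismatch = two_cov_llr(np.array([3.0, 0.0]), np.array([-3.0, 0.0]), B, W)
print(llr_match > 0, llr_mismatch < 0)  # True True
```

Positive values favour "same speaker"; a decision threshold would still need to be tuned on held-out trials.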
I applied PLDA to my speaker verification task:
import os
import plda
import numpy as np
# 'data_path' appears to store dicts (hence .get), i.e. an object array;
# NumPy >= 1.16.3 needs allow_pickle=True to load those
x_vector_embeddings = np.load("../../spk_xvector.npz", allow_pickle=True)
train_values = x_vector_embeddings['features']
train_labels = x_vector_embeddings['data_path']
labels = []  # one label per x-vector
for label in train_labels:
    labels.append(label.get('pic_path'))
myarray = np.asarray(labels)
training_data = train_values
training_labels = myarray
testing_data = train_values[:2]
testing_labels = myarray[:2]
classifier = plda.Classifier()
classifier.fit_model(training_data, training_labels, n_principal_components=5)
predictions, log_p_predictions = classifier.predict(testing_data)
It raises this error:
ValueError: array must not contain infs or NaNs
/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/optimizer.py:165: RuntimeWarning: Degrees of freedom <= 0 for slice
cov_ks.append(np.cov(X_k.T))
/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/env/lib/python3.5/site-packages/numpy/lib/function_base.py:3109: RuntimeWarning: divide by zero encountered in double_scalars
c *= 1. / np.float64(fact)
/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/env/lib/python3.5/site-packages/numpy/lib/function_base.py:3109: RuntimeWarning: invalid value encountered in multiply
c *= 1. / np.float64(fact)
Traceback (most recent call last):
File "x_vector_training.py", line 29, in <module>
classifier.fit_model(training_data, training_labels, n_principal_components=5)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/classifier.py", line 25, in fit_model
self.model = Model(X, Y, n_principal_components)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/model.py", line 94, in __init__
self.fit(row_wise_data, labels, n_principal_components)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/model.py", line 161, in fit
optimize_maximum_likelihood(X, labels)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/optimizer.py", line 68, in optimize_maximum_likelihood
W = calc_W(S_b, S_w)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/plda/optimizer.py", line 181, in calc_W
eigenvalues, eigenvectors = eigh(S_b, S_w)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/env/lib/python3.5/site-packages/scipy/linalg/decomp.py", line 338, in eigh
b1 = _asarray_validated(b, check_finite=check_finite)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/env/lib/python3.5/site-packages/scipy/_lib/_util.py", line 238, in _asarray_validated
a = toarray(a)
File "/home/dell/Murugan_R/13-12-2018/kaldi/egs/sre16/v2/data/feat/xvectors_enroll_mfcc/plda/env/lib/python3.5/site-packages/numpy/lib/function_base.py", line 1233, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
@RaviSoji, sir, how can I resolve this issue?
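The `RuntimeWarning: Degrees of freedom <= 0 for slice` raised at `cov_ks.append(np.cov(X_k.T))` is what NumPy emits when `np.cov` receives a class with only one sample. It is therefore worth checking, before `fit_model`, that every label occurs at least twice and every value is finite. If `pic_path` is unique per file, every "class" would hold exactly one x-vector; that is an assumption about the data, not something the log confirms. A minimal sketch with a hypothetical helper name (not part of the plda package):

```python
from collections import Counter

import numpy as np

def check_plda_inputs(X, y, min_per_class=2):
    """Return a list of likely problems in (X, y) to fix before fitting PLDA.

    Hypothetical helper for illustration; it only inspects the inputs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    problems = []
    if X.ndim != 2 or len(X) != len(y):
        problems.append(
            f"expected X of shape (n_samples, n_features) with one label per row; "
            f"got X {X.shape}, y {y.shape}"
        )
        return problems
    if not np.isfinite(X).all():
        problems.append("X contains NaN or inf values")
    counts = Counter(y.tolist())
    too_small = [label for label, n in counts.items() if n < min_per_class]
    if too_small:
        problems.append(
            f"{len(too_small)} of {len(counts)} classes have fewer than "
            f"{min_per_class} samples: {too_small[:5]}"
        )
    return problems

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
print(check_plda_inputs(X, ["a", "a", "b", "c", "c"]))
```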
Hello,
I need probabilities for each class given a test sample, like scikit-learn provides with its .predict_proba() method.
I'd appreciate your help with this.
Does something like .predict_proba() exist in your repository, or do I need to calculate probabilities from the scores returned in the tuple by your .predict() method?
Thanks
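If the second element returned by `.predict()` holds unnormalized per-class log posterior predictive values with classes along the last axis (an assumption based on the name `log_p_predictions` in the code earlier on this page), then predict_proba-style output is a softmax over that axis. A minimal sketch with a hypothetical helper name; `scipy.special.softmax` performs the same normalization:

```python
import numpy as np

def log_scores_to_proba(log_scores):
    """Normalize per-class log scores into probabilities (each row sums to 1).

    Subtracting the row maximum before exponentiating (the log-sum-exp trick)
    avoids underflow when the log scores are large negative numbers.
    """
    log_scores = np.atleast_2d(np.asarray(log_scores, dtype=float))
    shifted = log_scores - log_scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

print(log_scores_to_proba(np.log([[1.0, 3.0]])))  # [[0.25 0.75]]
print(log_scores_to_proba([[-1000.0, -1000.0]]))  # [[0.5 0.5]]
```

Note that these probabilities are relative to the training classes only; they do not express how likely it is that the sample belongs to none of them.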
Hi @RaviSoji, I am using the code shown in the mnist_demo notebook with my own data.
If I use classifier.fit_model(training_data, training_labels, n_principal_components=None), I get the following error:
error: failed in converting 2nd argument `b' of _flapack.dsygvd to C/Fortran array in scipy/linalg/decomp.py in eigh
And if I set n_principal_components to a value, for example classifier.fit_model(training_data, training_labels, n_principal_components=5), I get the following error:
ValueError: array must not contain infs or NaNs in numpy/lib/function_base.py in asarray_chkfinite
I tried a fresh installation of the environment, but the errors are the same. Any suggestions about what is causing them?
Thanks.
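One possible explanation, offered as an assumption rather than a confirmed diagnosis: the within-class scatter `S_w` built from N samples in K classes has rank at most N - K (and at most the feature dimension). With few samples per class and many features it is singular, and `eigh(S_b, S_w)` requires its second matrix to be positive definite. A later report on this page also points to `np.linalg.matrix_rank(S_w)` in `model.py`. A minimal sketch to inspect that rank on your own data (the helper name and toy data are hypothetical); keeping `n_principal_components` at or below this rank seems a reasonable first thing to try:

```python
import numpy as np

def within_class_scatter(X, y):
    """Sum over classes of the outer products of class-centered samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    S_w = np.zeros((X.shape[1], X.shape[1]))
    for label in np.unique(y):
        X_k = X[y == label]
        centered = X_k - X_k.mean(axis=0)  # a single sample centers to zero
        S_w += centered.T @ centered
    return S_w

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))    # 5 samples, 10 features
y = np.array([0, 0, 1, 1, 2])   # 3 classes, one of them a singleton
S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)
print(rank, "<=", len(X) - len(np.unique(y)), "of", S_w.shape[0])
```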
Is your feature request related to a problem? Please describe.
In my case (using PLDA for information retrieval), it would be better to predict the best n options (say) for a given query instead of only the best one.
I found that the predict method does not support this feature, but it can be done using the calc_logp_pp_categories method.
Describe the solution you'd like
My quick solution was the code below:
def predict_doc_at(query, k=1):
    """
    Predict which documents match the given query.
    :param query: input query
    :type query: str (or list of strs)
    :param k: number of documents to return
    :type k: int
    :return: the names of the k best-matching documents
    """
    query_embedding = get_embeddings(query)
    data = PLDA_classifier.model.transform(query_embedding,
                                           from_space='D',
                                           to_space='U_model')
    logpps_k, K = PLDA_classifier.calc_logp_pp_categories(data,
                                                          False)
    # Sort along the category (last) axis, highest log probability first;
    # this also works when logpps_k has one row per query.
    best_k_idx = logpps_k.argsort(axis=-1)[..., ::-1][..., :k]
    predictions = K[best_k_idx]
    # The original returned `accuracy, predictions`, but `accuracy` is never defined.
    return predictions
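The same ranking step as a standalone helper that does not depend on the plda package, assuming the scores have categories along the last axis, as the `logpps_k, K` pair above suggests. The helper name is hypothetical; for very many categories, `np.argpartition` could avoid a full sort:

```python
import numpy as np

def top_k_categories(logps, categories, k):
    """Return the k highest-scoring categories for each row of logps.

    logps: array of shape (n_queries, n_categories) or (n_categories,)
    categories: labels aligned with the last axis of logps
    """
    logps = np.atleast_2d(np.asarray(logps, dtype=float))
    categories = np.asarray(categories)
    k = min(k, logps.shape[-1])
    # argsort is ascending, so reverse along the category axis before slicing.
    order = np.argsort(logps, axis=-1)[:, ::-1][:, :k]
    return categories[order]

logps = np.array([[0.1, 2.0, -1.0, 1.5]])
print(top_k_categories(logps, ["a", "b", "c", "d"], k=2))  # [['b' 'd']]
```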
Describe the bug
After git clone, running "pip install plda" fails with the following error:
ERROR: Could not find a version that satisfies the requirement plda
ERROR: No matching distribution found for plda
PLDA is often used for biometric tasks, where we need to enroll a set of models. Based on those enrollments, we can generate scores. In your setup, how does enrollment occur?
Error: "array must not contain infs or NaNs"
This error often appears when I use this framework for PLDA experiments. All my input data come from preprocessing; the values are all positive and finite, and the data scale is not large.
I don't know why such errors occur. Do you have any additional requirements for the input data?
Describe the bug
I want to summarize problems and lessons learned while using this PLDA implementation.
So far, I have observed two kinds of errors during training:
ValueError: array must not contain infs or NaNs
in optimizer.py:181 running calc_W(S_b, S_w)
LinAlgError: SVD did not converge
in model.py:166 running matrix_rank = np.linalg.matrix_rank(S_w)
Both boil down to the covariance matrix computation.
To Reproduce
Steps to reproduce the behavior:
Use dummy data with a single example per class:
ipdb> labels.shape
(5,)
ipdb> labels
array([1, 4, 3, 0, 2])
ipdb> embeddings.shape
(5, 896)
ipdb> plda_classifier = plda.Classifier()
ipdb> plda_classifier.fit_model(embeddings, labels)
/lnet/work/people/oplatek/plda/plda/optimizer.py:165: RuntimeWarning: Degrees of freedom <= 0 for slice
cov_ks.append(np.cov(X_k.T))
/lnet/work/people/oplatek/moosenet03/env/lib/python3.8/site-packages/numpy/lib/function_base.py:2680: RuntimeWarning: divide by zero encountered in true_divide
c *= np.true_divide(1, fact)
/lnet/work/people/oplatek/moosenet03/env/lib/python3.8/site-packages/numpy/lib/function_base.py:2680: RuntimeWarning: invalid value encountered in multiply
c *= np.true_divide(1, fact)
*** numpy.linalg.LinAlgError: SVD did not converge
What helped?
So far, I created dummy data using Gaussian noise and ensured that there are two examples per label.
embeddings = np.concatenate((embeddings, embeddings + np.random.randn(*embeddings.shape)), axis=0)
labels = np.concatenate((labels, labels), axis=0)
plda_classifier.fit_model(embeddings, labels) # runs without errors
I am not 100% sure what the root cause is, but I suspect that computing a covariance matrix from a single vector, when computing the within-class scatter, is the problematic part.
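The warnings in the log can be reproduced with plain NumPy, which supports this suspicion: `np.cov` on a single observation divides by N - 1 = 0 and returns a matrix of NaNs, while two observations give a finite result. A minimal sketch with made-up arrays:

```python
import warnings

import numpy as np

X_k = np.array([[1.0, 2.0, 3.0]])  # one sample of a 3-D feature vector

with warnings.catch_warnings():
    # NumPy warns "Degrees of freedom <= 0 for slice" and about divide by zero here.
    warnings.simplefilter("ignore", RuntimeWarning)
    cov_single = np.cov(X_k.T)  # rows are variables, columns are observations

X_two = np.array([[1.0, 2.0, 3.0], [2.0, 0.0, 1.0]])
cov_two = np.cov(X_two.T)

print(np.isnan(cov_single).all())  # True
print(np.isfinite(cov_two).all())  # True
```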
I will follow up and probably create a PR with asserts on the input data to avoid the error.
Note: there are several related closed issues (#56, #57), but I am opening this one for a new discussion and to keep my (next) findings in a single place. Feel free to close it if you prefer.
I want to use this repo for a speaker recognition experiment.
In my case, I have a number of i-vectors from different speakers in the training set.
After training the model, I want to use it to classify vectors from unknown speakers that belong to classes not in the training set; each of these classes already has some labeled vectors. This is the situation described in Sec. 3.1 of the paper.
However, I can only find a way to classify these vectors into the training-set classes, using the .predict() method. How can I handle these new classes?
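New classes can be handled by conditioning on their labeled vectors and comparing posterior predictive densities. A minimal sketch of that idea, using the standard Gaussian posterior-predictive derivation and assuming the vectors are already in a latent space where class centers follow v ~ N(0, diag(psi)) and samples follow u ~ N(v, I). `psi`, the enrollment dictionary, and both helper names are hypothetical inputs; how to obtain them from the plda package is not shown here:

```python
import numpy as np

def log_posterior_predictive(u, u_enroll, psi):
    """log p(u | enrollment vectors of one class), dimensions independent.

    Assumed model: v ~ N(0, diag(psi)), each u ~ N(v, I).
    """
    u = np.asarray(u, dtype=float)
    u_enroll = np.atleast_2d(np.asarray(u_enroll, dtype=float))
    n = u_enroll.shape[0]
    shrink = n * psi / (n * psi + 1.0)     # pulls the class mean toward 0
    mean = shrink * u_enroll.mean(axis=0)  # posterior mean of v
    var = 1.0 + psi / (n * psi + 1.0)      # posterior variance of v plus unit noise
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (u - mean) ** 2 / var)

def classify_new_classes(u, enrollments, psi):
    """Pick the enrolled class with the highest posterior predictive density."""
    labels = list(enrollments)
    scores = np.array([log_posterior_predictive(u, enrollments[c], psi) for c in labels])
    return labels[int(np.argmax(scores))], scores

psi = np.array([10.0, 10.0])
enrollments = {
    "spk_a": [[5.0, 5.0], [5.5, 4.5]],
    "spk_b": [[-5.0, -5.0], [-4.5, -5.5]],
}
label, scores = classify_new_classes([4.5, 5.2], enrollments, psi)
print(label)  # spk_a
```

Because each class is scored only from its own enrollment vectors, adding a new speaker means adding an entry to `enrollments`, with no retraining of the model.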