andland / logisticPCA
Dimensionality reduction for binary data
License: Other
Based on this benchmark, the package `rARPACK` is about twice as fast for non-sparse matrices on my PC. Would need to change `generalizedPCA` too.
Add `type = 'biplot'` option to `plot` commands.
Dear Prof. Andrew J.,
I've been reading your article in the Journal of Multivariate Analysis, "Dimensionality reduction for binary data through the projection of natural parameters." As I understand it, you obtain the principal component scores by establishing a relationship between the natural parameters of the saturated model and the natural parameters of the Bernoulli distribution. I generated high-dimensional correlated binary data by cutting at quantiles of a mixture of multivariate normal distributions, obtained the principal components of the binary data with the R package you provide, and predicted the principal component scores of new data. Surprisingly, although the new data were generated by the same method, with only the random seed differing, the predicted principal component scores varied widely. I would like to ask how to understand this phenomenon.
I am looking forward to your early reply.
Yours sincerely
It might not be a good idea to include in this package if it significantly slows performance.
Not really an "issue" but a question: does the `m` parameter calculated by the `cv.lpca` function correspond to anything meaningful in the data or the output? The reason I ask is that it seems to correlate with the number of "clusters" in the PCA plot. Maybe this is just a coincidence?
Hi,
I'm trying to evaluate the three different methods shipped with this package on my data. The data is a 76x4623 matrix. Estimating `m` with `cv.lpca()` is extremely slow for a matrix of this size; just the first iteration of the function at `m = 1` took >24 hours. Is there any way to speed this up? For now, I am just using `logisticSVD()`, which takes <1 minute, but I am interested in comparing the different approaches.
Best,
Ollie
I have written different method functions for `lpca`, `lsvd`, and `clpca`. I can probably combine many of them. The methods to combine are:
The package `irlba` was updated to version 2.0.0 recently. This update includes the ability to get the first few eigenvectors of a symmetric matrix using `partial_eigen`. In the past, I used the `irlba` function to do this (without assuming symmetry) and it was very inefficient. (Hence `use_irlba = FALSE` for logisticPCA.) If it improves things, I should also update generalizedPCA.
There is also the ability to center and scale, but that probably won't matter since our matrices are not sparse.
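If `partial_eigen` pans out, the comparison might look like this rough sketch (it assumes the irlba >= 2.0.0 interface, `partial_eigen(x, n)`, returning `$values` and `$vectors`; this is not the package's actual code):

```r
# Compare dense eigen() with irlba::partial_eigen() for the top-k
# eigenvectors of a symmetric matrix. Assumes irlba >= 2.0.0.
library(irlba)

set.seed(1)
A <- crossprod(matrix(rnorm(500 * 50), 500, 50))  # 50 x 50 symmetric PSD

k <- 2
full <- eigen(A, symmetric = TRUE)  # dense decomposition, all 50 pairs
part <- partial_eigen(A, n = k)     # iterative, only the top k

# The leading eigenvectors should agree up to sign
max(abs(abs(part$vectors) - abs(full$vectors[, 1:k])))
```

Any payoff would show up only on much larger matrices than this toy example; benchmarking both routes at realistic sizes would settle whether to flip `use_irlba` back on.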
Tipping, M. E. (1998). Probabilistic visualisation of high-dimensional binary data. NIPS 11, pp. 592-598.
Hi,
I was wondering if the `logisticPCA::logisticSVD` function might incorporate a way to retrieve (or calculate) the proportion of deviance explained by each principal component, in addition to the overall proportion explained by the model, which is already implemented.
I think this might be a nice new feature for the package!
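In the meantime, something like the following might approximate it from a fitted object. This is only a sketch: it assumes the fit exposes `mu`, `A`, and `B` as documented, and it treats rank-`j` truncations of the joint fit as a stand-in for separately fitted rank-`j` models, which is not exact.

```r
library(logisticPCA)

set.seed(1)
x <- matrix(sample(0:1, 100 * 10, replace = TRUE), 100, 10)
fit <- logisticSVD(x, k = 3)

# Bernoulli deviance of data x under natural parameters theta
dev_fun <- function(x, theta) -2 * sum(x * theta - log(1 + exp(theta)))

mu <- outer(rep(1, nrow(x)), fit$mu)
dev0 <- dev_fun(x, mu)  # null (column-mean) model
dev_j <- sapply(1:3, function(j) {
  theta <- mu + fit$A[, 1:j, drop = FALSE] %*% t(fit$B[, 1:j, drop = FALSE])
  dev_fun(x, theta)
})
# marginal proportion of deviance explained by each component
(c(dev0, dev_j[-3]) - dev_j) / dev0
```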
Best regards,
Martín
`reshape2::melt()` is called just once in `plot.cv.lpca()` here. Seems silly to import it.
Due to its structure, the convex formulation may prefer higher values of `M` no matter what `k` is, with the current setup of `cv.clpca`. It may be better to see how well it reconstructs missing values of the matrix. Also, it is common for users to just have a holdout set of validation data, instead of looping over all folds.
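The missing-value idea could be prototyped along these lines. This is a sketch under two assumptions that may not hold as-is: that `convexLogisticPCA` tolerates `NA`s in the input, and that `fitted(..., type = "response")` returns predicted probabilities.

```r
library(logisticPCA)

set.seed(1)
x <- matrix(sample(0:1, 100 * 10, replace = TRUE), 100, 10)
holdout <- matrix(runif(length(x)) < 0.1, nrow(x))  # hold out ~10% of entries

x_train <- x
x_train[holdout] <- NA
fit <- convexLogisticPCA(x_train, k = 2, m = 4)

# score reconstruction of the held-out cells only
p_hat <- fitted(fit, type = "response")
mean((p_hat[holdout] - x[holdout])^2)  # Brier-style reconstruction error
```

Repeating this over a grid of `m` (and `k`) would give a holdout-based alternative to the current fold loop.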
Depends on #4
I have a similar question to this person: https://stats.stackexchange.com/questions/319818/how-to-analyse-the-strength-of-the-variables-in-a-logistic-pca-using-r
Your code essentially ends at creating a plot, and I am unsure how to interpret the results. My data table has 4 voting parties that voted on 11 policy changes, and I want to determine where the commonalities are.
Should citation refer to the paper?
Add `@references` to the `logisticPCA` and `convexLogisticPCA` functions.
Hi,
I couldn't install your package. It gives me an error:
** installing vignettes
** testing if installed package can be loaded
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
object 'checkCompilerOptions' not found
Calls: ::: -> get
Execution halted
ERROR: loading failed
Do you have any idea how to fix this?
Thanks
Is it okay to use the same documentation as the UCI repo?
Hi,
I am having the following issues. First, logisticSVD fails:
> logisticPCA::logisticSVD(bdata,k=2)
45 rows and 395 columns
Rank 2 solution
20.3% of deviance explained
11 iterations to converge
Warning message:
In logisticPCA::logisticSVD(bdata, k = 2) :
Algorithm stopped because deviance increased.
This should not happen!
Try rerunning with partial_decomp = FALSE
Second, logisticPCA also fails:
> logisticPCA(bdata,k=2)
Error in eigen(mat_temp, symmetric = TRUE) :
error code 1 from Lapack routine 'dsyevr'
No idea what's wrong, everything else works just fine. Unfortunately, I can't share the data.
While trying to implement LPCA in an analysis, I found it hard to know how to construct the method calls. I think it would be helpful to include examples of how to call each method so this confusion can be mitigated.
Possibilities include:
n = 100
d = 10
x = matrix(sample(c(0, 1), d * n, TRUE), nrow = n)
# log-likelihood under column-mean natural parameters, full data
log_like_Bernoulli(x = x, theta = outer(rep(1, n), gtools::logit(colMeans(x, na.rm = TRUE))))
# knock out ~25% of entries (note nrow = n so the mask matches x's shape)
which_missing = matrix(runif(n * d) < 0.25, nrow = n)
x[which_missing] <- NA
log_like_Bernoulli(x = x, theta = outer(rep(1, n), gtools::logit(colMeans(x, na.rm = TRUE))))
# loglike goes up with less data?
Hi,
I'm trying to run logisticPCA on a 105x91802 binary matrix. I've been getting the following error:
Error: vector memory exhausted (limit reached?)
The line of code that's causing the issue is `qTq = crossprod(q)`, since this requires computing a 91802x91802 matrix, which my laptop can't handle.
Is there a workaround for this? Regular PCA still works for the matrix.
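For scale (my own arithmetic, not from the package): a dense 91802 x 91802 double matrix alone needs about 63 GiB. And since `q` here has only 105 rows, products with `crossprod(q)` can in principle be formed without ever materializing it, as this sketch shows with stand-in dimensions:

```r
# Memory needed for the dense d x d cross-product matrix
d <- 91802
d^2 * 8 / 2^30  # GiB; roughly 63

# Low-rank workaround sketch: apply crossprod(q) to a vector without
# forming it. Small stand-in sizes; the real q is 105 x 91802.
q <- matrix(rnorm(105 * 200), 105, 200)
v <- rnorm(ncol(q))
y1 <- crossprod(q) %*% v  # explicit d x d route
y2 <- t(q) %*% (q %*% v)  # same result, never forms the d x d matrix
max(abs(y1 - y2))
```

Whether the package could use the second form internally depends on what the surrounding algorithm does with `qTq`.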
Thanks,
Ayan
To be consistent with the papers. In vignette/tests too.
The current handling of missing data isn't optimal. Fix it.
Create `print` and `plot` methods for the `lpca` class. Do the same for the `lsvd` class, in addition to `predict`. Possibly add a `cv` method as well.
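A minimal shape for one such method (field names like `k` and `prop_deviance_expl` are illustrative assumptions, not necessarily the actual slots of an `lpca` object):

```r
# Sketch of an S3 print method for the lpca class
print.lpca <- function(x, ...) {
  cat("Logistic PCA with k =", x$k, "components\n")
  cat(round(100 * x$prop_deviance_expl, 1), "% of deviance explained\n",
      sep = "")
  invisible(x)
}

# Stand-in object just to exercise the method
fit <- structure(list(k = 2, prop_deviance_expl = 0.413), class = "lpca")
print(fit)
```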