
logisticpca's People

Contributors

andland, wrathematics


logisticpca's Issues

Biplot

Add type = 'biplot' option to plot commands.
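A minimal sketch of what such an option might produce, assuming the fitted lpca object stores scores in $PCs and loadings in $U (treat both field names as assumptions to verify against the package source):

# sketch only: overlay loadings (arrows) on the score plot of a fitted lpca object
lpca_biplot = function(fit, scale_loadings = 2) {
  plot(fit$PCs[, 1], fit$PCs[, 2], pch = 19, col = "grey40",
       xlab = "PC1", ylab = "PC2", main = "Logistic PCA biplot")
  arrows(0, 0, scale_loadings * fit$U[, 1], scale_loadings * fit$U[, 2],
         length = 0.08, col = "red")
  text(scale_loadings * fit$U[, 1], scale_loadings * fit$U[, 2],
       labels = rownames(fit$U), pos = 3, col = "red", cex = 0.7)
}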

Predictions for new data

Dear Prof. Andrew J.,
I've been reading your article in the Journal of Multivariate Analysis, "Dimensionality reduction for binary data through the projection of natural parameters." As I understand it, you obtain the principal component scores by relating the natural parameters of the saturated model to the natural parameters of the Bernoulli distribution. I generate high-dimensional correlated binary data by thresholding a mixture of multivariate normals at quantile cutoffs, fit the principal components of the binary data with your R package, and then predict the principal component scores of new data. Surprisingly, when the new data are generated by the same mechanism and only the random seed differs, the predicted principal component scores vary widely. How should I understand this phenomenon?
I look forward to your reply.
Yours sincerely
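For reference, a minimal sketch of the workflow described in this issue; logisticPCA and its predict method are from the package, while the data-generation details (MASS::mvrnorm with an AR(1)-style correlation, thresholded at zero) are assumptions standing in for the mixture construction:

library(logisticPCA)
library(MASS)

set.seed(1)
n = 200; d = 20
Sigma = 0.5^abs(outer(1:d, 1:d, "-"))             # AR(1)-style correlation
x_train = (mvrnorm(n, mu = rep(0, d), Sigma = Sigma) > 0) * 1

set.seed(2)                                       # same mechanism, different seed
x_new = (mvrnorm(n, mu = rep(0, d), Sigma = Sigma) > 0) * 1

fit = logisticPCA(x_train, k = 2, m = 4)
scores_new = predict(fit, x_new)                  # predicted PC scores for the new data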

Add weights

It might not be a good idea to include weights in this package if they significantly slow performance.

m parameter calculated by cv.lpca function

Not really an "issue," but a question: does the m parameter calculated by the cv.lpca function correspond to anything meaningful in the data or the output? The reason I ask is that it seems to correlate with the number of "clusters" in the PCA plot. Maybe this is just a coincidence?

Cross-validation speed

Hi,

I'm trying to evaluate the three different methods shipped with this package on my data. The data is a 76x4623 matrix.

Estimating m with cv.lpca() is extremely slow for a matrix of this size; just the first iteration of the function at m = 1 took more than 24 hours. Is there any way to speed this up? For now I am just using logisticSVD(), which takes under a minute, but I'm interested in comparing the different approaches.

Best,
Ollie
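One way to cut the runtime, sketched under the assumption that cv.lpca accepts the ks, ms, and folds arguments shown in the package README: fix k, search a coarse grid of m values with fewer folds, and refine only around the winner.

library(logisticPCA)
ms_grid = c(2, 6, 10)                            # coarse grid first
cv_coarse = cv.lpca(dat, ks = 2, ms = ms_grid, folds = 3)
best_m = ms_grid[which.min(cv_coarse)]           # refine near this value if needed
fit = logisticPCA(dat, k = 2, m = best_m)

Here dat stands in for the 76x4623 matrix.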

Refactor methods

I have written different method functions for lpca, lsvd, and clpca. Many of them can probably be combined (see the sketch after this list). The methods to combine are:

  • print
  • plot
  • fitted (not for clpca)
  • predict
  • cv
  • cv.plot
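A minimal sketch of one way to combine them: a shared helper that thin class-specific S3 methods dispatch to (the helper name and field names below are hypothetical):

# hypothetical shared printer; each class keeps only a thin S3 wrapper
print_bpca = function(x, ...) {
  cat("Rank", x$k, "solution\n")
  cat(round(100 * x$prop_deviance_expl, 1), "% of deviance explained\n", sep = "")
  invisible(x)
}
print.lpca  = print_bpca
print.lsvd  = print_bpca
print.clpca = print_bpca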

Update irlba

The irlba package was recently updated to version 2.0.0. This update adds the ability to get the first few eigenvectors of a symmetric matrix using partial_eigen. In the past, I used the irlba function for this (without assuming symmetry), and it was very inefficient (hence use_irlba = FALSE for logisticPCA). If partial_eigen improves on this, I should also update generalizedPCA.

There is also the ability to center and scale, but that probably won't matter since our matrices are not sparse.
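A sketch of the intended usage, assuming the partial_eigen interface documented for irlba >= 2.0.0 (matrix as the first argument, n for the number of eigenpairs, and a list return with values and vectors):

library(irlba)
set.seed(1)
A = crossprod(matrix(rnorm(500 * 50), 500, 50))  # 50x50 symmetric PSD matrix
pe = partial_eigen(A, n = 3)                     # leading 3 eigenpairs
ev = eigen(A, symmetric = TRUE)                  # full decomposition, for comparison
max(abs(pe$values - ev$values[1:3]))             # should be near zero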

Add Tipping's formulation

Tipping, M. E. (1998). Probabilistic visualisation of high-dimensional binary data. NIPS 11, pp. 592-598.

prop_deviance_expl for each Principal Component

Hi,

I was wondering if logisticPCA::logisticSVD function might incorporate a way to retrieve (or calculate) the proportion of deviance explained by each principal component, in addition to the overall proportion explained by the model, which is already implemented.

I think this might be a nice new feature for the package!

Best regards,
Martín
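In the meantime, a rough way to approximate this (a sketch, not the package's method): fit models of increasing rank and difference the overall prop_deviance_expl values, keeping in mind that the fits are not exactly nested across k.

library(logisticPCA)
data("house_votes84")
pde = sapply(1:4, function(k) logisticSVD(house_votes84, k = k)$prop_deviance_expl)
per_component = diff(c(0, pde))   # approximate share for each successive component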

Different options for cross validation

Due to its structure, the convex formulation may prefer higher values of M regardless of k under the current setup of cv.clpca. It may be better to measure how well it reconstructs missing values of the matrix. Also, it is common for users to have a single holdout set of validation data instead of looping over all folds. A sketch of the matrix-completion idea follows the list below.

  • Matrix completion
  • Holdout validation
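A sketch of the matrix-completion option; it assumes the fitter tolerates NAs (missing-data handling is discussed elsewhere in this tracker) and that fitted() can return probabilities via type = "response":

# hide a random fraction of entries, fit, and score predictions on the held-out cells
holdout_deviance = function(x, k, m, prop = 0.1) {
  mask = matrix(runif(length(x)) < prop, nrow = nrow(x))
  x_train = x
  x_train[mask] = NA
  fit = logisticPCA(x_train, k = k, m = m)
  p_hat = fitted(fit, type = "response")
  # Bernoulli deviance on the held-out entries only
  -2 * sum(x[mask] * log(p_hat[mask]) + (1 - x[mask]) * log(1 - p_hat[mask]))
}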

Update citation

Should the citation refer to the paper?

Add @references tags to the logisticPCA and convexLogisticPCA functions.

Failed installing package logisticPCA

Hi,
I couldn't install your package. I get this error:

** installing vignettes
** testing if installed package can be loaded
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
object 'checkCompilerOptions' not found
Calls: ::: -> get
Execution halted
ERROR: loading failed

  • removing ‘/mnt/gpfs/pt2/lib/R_3.2.3/logisticPCA’
    Installation failed: Command failed (1)

Do you have any ideas that could help?

Thanks

Functions crashing

Hi,

I am having the following issues. First, logisticSVD fails:

> logisticPCA::logisticSVD(bdata,k=2)
45 rows and 395 columns
Rank 2 solution

20.3% of deviance explained
11 iterations to converge
Warning message:
In logisticPCA::logisticSVD(bdata, k = 2) :
  Algorithm stopped because deviance increased.
This should not happen!
            Try rerunning with partial_decomp = FALSE

Second, logisticPCA also fails:

> logisticPCA(bdata,k=2)
Error in eigen(mat_temp, symmetric = TRUE) : 
  error code 1 from Lapack routine 'dsyevr'

No idea what's wrong, everything else works just fine. Unfortunately, I can't share the data.

Enhancement: Add examples to the README.

While trying to use LPCA in an analysis, I found it hard to work out how to call the methods. I think it would be helpful to include examples of how to call each method so this confusion can be mitigated.
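Agreed. Something along these lines, using the house_votes84 data that ships with the package (a sketch based on the documented interfaces; the m values and the plot type are assumptions):

library(logisticPCA)
data("house_votes84")

# the three formulations, each reducing to k = 2 dimensions
logsvd_model  = logisticSVD(house_votes84, k = 2)
logpca_model  = logisticPCA(house_votes84, k = 2, m = 4)
clogpca_model = convexLogisticPCA(house_votes84, k = 2, m = 4)

# cross-validate m for logisticPCA, then plot the scores
logpca_cv = cv.lpca(house_votes84, ks = 2, ms = 1:10)
plot(logisticPCA(house_votes84, k = 2, m = which.min(logpca_cv)), type = "scores")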

Add dataset

Possibilities include:

  • congressional voting
  • hall of fame voting

log_like_Bernoulli incorrect when missing data?

n = 100
d = 10
# fully observed binary matrix
x = matrix(sample(c(0, 1), d * n, TRUE), nrow = n)
log_like_Bernoulli(x = x, theta = outer(rep(1, n), gtools::logit(colMeans(x, na.rm = TRUE))))

# mask about 25% of entries; nrow = n so the mask's dimensions match x
which_missing = matrix(runif(n * d) < 0.25, nrow = n)
x[which_missing] = NA
log_like_Bernoulli(x = x, theta = outer(rep(1, n), gtools::logit(colMeans(x, na.rm = TRUE))))

The log-likelihood goes up with less data? (Each observed Bernoulli entry contributes a non-positive term, so summing over fewer observed entries can only raise the total; the two values are not directly comparable.)

running logisticPCA on large matrix

Hi,

I'm trying to run logisticPCA on a 105x91802 binary matrix. I've been getting the following error:

Error: vector memory exhausted (limit reached?)

The line of code causing the issue is qTq = crossprod(q), since this requires computing a 91802x91802 matrix, which my laptop can't handle.

Is there a workaround for this? Regular PCA still works for the matrix.
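For scale, a dense 91802x91802 double-precision matrix by itself needs roughly 63 GiB, which explains the failure:

91802^2 * 8 / 2^30   # ~62.8 GiB for one dense double matrix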

Thanks,
Ayan

Change M to m

To be consistent with the papers. In vignette/tests too.

Missing data

The current handling of missing data isn't optimal. Fix it.

Create methods

Create print and plot methods for the lpca class. Do the same for the lsvd class, in addition to a predict method. Possibly add a cv method as well.
