Coder Social home page Coder Social logo

Comments (10)

Ali-Tehrani avatar Ali-Tehrani commented on August 22, 2024 3

The generic function uses Morse Penrose Inverse from numpy and it uses the divide and conquer method from Lapack (gesdd) to calculate the SVD, whereas SciPy's Morse Penrose inverse uses the general rectangular approach ('gesvd') [1]. So I think that we just need to change from numpy to scipy.

[1] - https://stackoverflow.com/questions/13265299/the-difference-of-pseudo-inverse-between-scipy-and-numpy

from procrustes.

FarnazH avatar FarnazH commented on August 22, 2024 1

I think scipy.linalg.pinv fixed the problem... While we are at this, should we use scipy.linalg.svd instead of numpy.linalg.svd (so we can set the lapack_driver='gesvd')?

from procrustes.

PaulWAyers avatar PaulWAyers commented on August 22, 2024

It may still have a problem, I guess, because matrices with rows/columns of zeros are evil..... It's a strange error (that we encountered once before in GOpt) because SVD is a matrix factorization and therefore is very robust. But the divide-and-conquer algorithm is not as robust (but it is significantly faster). I guess that using smaller matrices is a good compromise, but perhaps we should at least document this error message, which I think is not unlikely to appear when zero-padding is substantial because the size-mismatch of the matrices is large. The obvious solution is to directly access the SVD algorithm that is a factorizaton (instead of the default divide-and-conquer) algorithm. Derrick may remember how to do that. He also had a hack-ish way to fix the error.

from procrustes.

FarnazH avatar FarnazH commented on August 22, 2024

These tests passed on Ubunto with python 3.8 before. Is GitHub Action using two different hardware for these runs?
We can decrease the matrix size and see whether it happens again.

The new runs failed again: https://github.com/theochem/procrustes/runs/2069045083#step:5:354

from procrustes.

PaulWAyers avatar PaulWAyers commented on August 22, 2024

The solution here, which may also be useful to @tczorro , is to use LAPACK directly. Construct the factorized SVD (more robust than the divide-and-conquer iterative algorithm), using (I think) scipy.linalg.lapack.gesvd though you may need to prefix the gesvd routine with dgesvd (float) or zgesvd (complex). Once you have the SVD (and this one should always work), you can then use

  1. Replace all elements of the sigma (singular value) diagonal that are greater than a threshhold with their inverse; replace values smaller than the threshold with zero. Call this inverse matrix Sigma-1, and its inverse .T (.H for Hermitian transpose).
  2. Then construct V {dot} Sigma.T {dot} U.H = pseudoinverse.

The traditional threshhold value for step 1 is, for an mxn input matrix A, max(m,n)*epsilon(elements of A). I.e., it is the machine-precision for the elements of A times the number of elements in the larger dimension. This could be approximated easily by merely using max(m,n)*1e-15, which is good enough.

This can be made (potentially much) more efficient by truncating the singular-value-inverse, U, and V to include only the elements corresponding to the nonzero rows/columns. That way the matrix multiplication is faster (because you are not computing a lot of "multiply something by zero" terms).

from procrustes.

FarnazH avatar FarnazH commented on August 22, 2024

We can use the lapack_driver argument of https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.svd.html

from procrustes.

PaulWAyers avatar PaulWAyers commented on August 22, 2024

That's a good trick. @tczorro should look at it too. You have to fall down the rabbit-hole of documentation a bit to find these things sometimes :-( .

If this works, it may be worth remembering that if someone tried a VERY huge matrix, the speed/memory benefits of the pinv2 algorithm in scipy/numpy might help. But Numpy had a stupid tolerance. And if you have a huge matrix, the robustness of a "real SVD" instead of an "iterative SVD" is liekly to be important.....

from procrustes.

PaulWAyers avatar PaulWAyers commented on August 22, 2024

I would use scipy.linalg.svd with lapack_drive='gesvd' everywhere. Where that is not appropriate (where speed is critical) probably there are better choices than SVD (which is a bit of a brute-force approach inherently).

from procrustes.

FanwangM avatar FanwangM commented on August 22, 2024

Using scipy.linalg.pinv instead of numpy.linalg.pinv can also fix the problem of failing generic Procrustes, #61. I have tested that this idea can fix the issuse both locally and on GitHub as well. This can lead to a more robust implementation of generic Procrustes testing without setting a random seed. So we should do the resting without adding a random see.

Thanks for proposing to use scipy.linalg.pinv! @FarnazH @Ali-Tehrani

from procrustes.

FarnazH avatar FarnazH commented on August 22, 2024

@Ali-Tehrani can you please take care of using scipy.linalg instead of numpy.linalg throughout the code? Thanks a lot.

from procrustes.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.