gsvd's Issues
consider adding in checks for PSD
Should the various matrices that should be at least positive semi-definite be checked?
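A minimal sketch of what such a check could look like, using only base R. The helper name is_psd() and the tolerance rule are assumptions for illustration, not part of GSVD.

```r
# Hypothetical helper (not part of GSVD): TRUE if a square matrix is
# symmetric positive semi-definite up to a numerical tolerance.
is_psd <- function(M, tol = sqrt(.Machine$double.eps)) {
  if (!is.matrix(M) || nrow(M) != ncol(M)) return(FALSE)
  if (!isSymmetric(M, tol = tol)) return(FALSE)
  # only.values = TRUE skips computing the eigenvectors
  ev <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
  # allow tiny negative eigenvalues caused by floating-point error
  all(ev >= -tol * max(1, abs(ev)))
}

is_psd(crossprod(matrix(c(1, 2, 3, 4, 5, 6), 3, 2)))  # TRUE
is_psd(diag(c(1, -1)))                                # FALSE
```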
better illustrative data
I need better illustrative data for MCA & PLSCA, as well as just generally making these data examples better
update all examples
they are now out of date compared to the changes in the functions
Check & force stop when LW/RW are vectors and not correct size.
sweep() still runs and throws warnings. that's not appropriate!
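A sketch of failing fast before sweep() gets a chance to warn. It assumes LW has one weight per row and RW one weight per column of X; check_weight_vector() is a hypothetical name.

```r
# Hypothetical validator (not part of GSVD): stop, instead of letting
# sweep() warn, when a vector of weights does not match the data dimension.
check_weight_vector <- function(w, n, arg = "weights") {
  if (!is.numeric(w)) stop(arg, " must be numeric.", call. = FALSE)
  # only plain vectors are length-checked here; matrices need their own test
  if (is.null(dim(w)) && length(w) != n) {
    stop(sprintf("%s has length %d but %d is required.", arg, length(w), n),
         call. = FALSE)
  }
  invisible(TRUE)
}

X <- matrix(rnorm(12), 4, 3)
check_weight_vector(rep(1, nrow(X)), nrow(X), "LW")       # passes silently
try(check_weight_vector(rep(1, 3), nrow(X), "LW"))        # stops with an error
```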
fi & fj names
these are not coherent with the notation or the premise established
they should be renamed to lcs and rcs, for "left component scores" and "right component scores"
also see Issue #20 -- as these two are now related (but distinct)
Make SVD slightly faster
Check for N < or > P, and transpose the matrix before SVD
svd() is faster when there are more rows than columns
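A minimal sketch of the transpose trick with base R's svd(); svd_tall() is a hypothetical name. Because X = U D V' implies t(X) = V D U', the left and right vectors simply swap roles. The speed difference is the observation from this issue and may depend on the LAPACK build in use.

```r
# Sketch: compute the SVD of the transpose when X is wide, then swap u and v.
svd_tall <- function(X) {
  if (ncol(X) > nrow(X)) {
    res <- svd(t(X))                      # t(X) = V D U'
    list(d = res$d, u = res$v, v = res$u) # swap back so that X = U D V'
  } else {
    svd(X)
  }
}

set.seed(1)
X <- matrix(rnorm(3 * 50), 3, 50)
s <- svd_tall(X)
# reconstruction error; nrow = length(s$d) keeps diag() safe for one value
max(abs(X - s$u %*% diag(s$d, nrow = length(s$d)) %*% t(s$v)))
```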
error in tolerance_svd when nu = 0 or nv = 0
Hey Derek,
When calling tolerance_svd() with nu = 0 or nv = 0, there are errors since svd_res$u and svd_res$v don't exist.
I'm not immediately sure the cleanest way to check for this and adjust the function. I'm happy to help if you want me to take a stab at it, but maybe you have a good idea of how to fix it.
Thanks,
Luke
small example
library(GSVD)
set.seed(42)
X <- matrix(sample.int(20, 20*3, replace = TRUE), 20, 3)
tolerance_svd(X, nu = 0, nv = 0)
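One possible guard, sketched without knowledge of tolerance_svd()'s actual internals: base svd() omits $u when nu = 0 and $v when nv = 0, so the function only touches them when they are present. The name tolerance_svd_sketch() and the rule "keep d > tol" are assumptions for illustration.

```r
# Hypothetical sketch (not GSVD's code). Assumes nu and nv do not exceed
# min(dim(x)), so every retained column of u and v has a matching value of d.
tolerance_svd_sketch <- function(x, nu = min(dim(x)), nv = min(dim(x)),
                                 tol = sqrt(.Machine$double.eps)) {
  res <- svd(x, nu = nu, nv = nv)
  keep <- res$d > tol                  # components retained
  res$d <- res$d[keep]
  # base svd() leaves out $u when nu = 0 and $v when nv = 0
  if (!is.null(res$u)) {
    res$u <- res$u[, keep[seq_len(ncol(res$u))], drop = FALSE]
  }
  if (!is.null(res$v)) {
    res$v <- res$v[, keep[seq_len(ncol(res$v))], drop = FALSE]
  }
  res
}

set.seed(42)
X <- matrix(sample.int(20, 20 * 3, replace = TRUE), 20, 3)
names(tolerance_svd_sketch(X, nu = 0, nv = 0))  # only "d"
```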
"tidy" up the place
I think that much of the internals of gsvd(), geigen(), and gplssvd() should conform to a more "tidyverse"-like way of checking conditions, testing input, and handling failures/exits
matrix.* functions
I only use the exponent (by way of %^%) in this package... While the others are nice, I'm thinking I should remove them for now and bring them back later as "useful utility features".
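For reference, a matrix power for symmetric matrices can be computed from the eigendecomposition. This is a hypothetical re-implementation for illustration, not GSVD's own %^%; dropping eigenvalues at or below tol is an assumption that keeps negative powers finite.

```r
# Hypothetical %^% (not GSVD's code): x^power = V diag(lambda^power) V'
# for a symmetric x, keeping only eigenvalues above tol.
`%^%` <- function(x, power) {
  tol <- sqrt(.Machine$double.eps)
  e <- eigen(x, symmetric = TRUE)
  keep <- e$values > tol
  vec <- e$vectors[, keep, drop = FALSE]
  # t(vec) * lambda^power scales row i of t(vec), i.e. diag(lambda^p) %*% t(vec)
  vec %*% (t(vec) * e$values[keep]^power)
}

S <- crossprod(matrix(c(2, 1, 0, 1, 3, 1), 3, 2))  # 2 x 2, full rank
S_half <- S %^% 0.5
max(abs(S_half %*% S_half - S))      # close to 0
max(abs((S %^% -1) %*% S - diag(2)))  # close to 0
```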
fi/fj & wfi/wfj
this idea still needs a lot of consideration, but it would greatly benefit CCA and correlation PCA by way of geigen when using a covariance matrix...
I should consider introducing the idea of fi/fj and a "weighted" one. The "weighted" one should be perhaps an "unweighted" one, so that fi/fj are in the correct metric.
So either fi/fj = W[P/Q]D (as it is) or just [U/V]D
But perhaps an unweighted or "standard scores" approach should be [U/V]D and then W[P/Q]D remains as the "correct metric" scores
if I introduce them as weighted then fi/fj become [U/V]D where wfi/wfj become W[P/Q]D
but if I flip that, then fi/fj remain W[P/Q]D and then maybe "ufi/ufj" or perhaps something as simple as "ud/vd" for [U/V]D
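To make the two candidate definitions concrete, here is a sketch with diagonal (vector) weights only. It assumes the GSVD is computed as the SVD of sqrt(LW) X sqrt(RW) = U D V', with P = LW^(-1/2) U and Q = RW^(-1/2) V; the weights and data are random placeholders, and "ud/vd" follows the naming floated above.

```r
set.seed(7)
X  <- matrix(rnorm(5 * 3), 5, 3)
lw <- runif(5, 0.5, 2)   # placeholder row (left) weights
rw <- runif(3, 0.5, 2)   # placeholder column (right) weights

# SVD of the weighted matrix sqrt(LW) X sqrt(RW), with diagonal weights
Xw  <- sqrt(lw) * t(sqrt(rw) * t(X))
res <- svd(Xw)
U <- res$u; V <- res$v; D <- res$d

# generalized singular vectors: P = LW^(-1/2) U, Q = RW^(-1/2) V
P <- U / sqrt(lw)
Q <- V / sqrt(rw)

# "correct metric" scores W[P/Q]D (the current fi/fj) ...
fi <- lw * t(t(P) * D)
fj <- rw * t(t(Q) * D)
# ... versus the "standard"/unweighted scores [U/V]D
ud <- t(t(U) * D)
vd <- t(t(V) * D)

# with diagonal weights the two only differ by a square-root rescaling
max(abs(fi - sqrt(lw) * ud))   # close to 0
```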
Optimize decompositions
Consider adding in the switches to alternate decompositions, or introducing an alternating least squares/power method for when data are very large and we only need a few components.
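As a starting point for the "only a few components" case, a sketch of the classic power method for the first singular triplet. power_svd1() is a hypothetical name, and the fixed all-ones start vector is an assumption (it fails if that vector happens to be orthogonal to the leading right singular vector).

```r
# Sketch: power iteration for the leading singular triplet (d, u, v) of X.
# It alternates u <- Xv and v <- X'u with normalization; convergence speed
# depends on the gap between the first and second singular values.
power_svd1 <- function(X, max_iter = 1000, tol = 1e-10) {
  v <- rep(1, ncol(X)) / sqrt(ncol(X))  # deterministic start vector
  d_old <- 0
  for (i in seq_len(max_iter)) {
    u <- X %*% v
    u <- u / sqrt(sum(u^2))
    v <- crossprod(X, u)                # X'u
    d <- sqrt(sum(v^2))                 # current estimate of the singular value
    v <- v / d
    if (abs(d - d_old) < tol * d) break # stop once d has stabilized
    d_old <- d
  }
  list(d = d, u = drop(u), v = drop(v))
}

set.seed(3)
X <- matrix(rnorm(40 * 6), 40, 6)
p <- power_svd1(X)
c(power = p$d, svd = svd(X)$d[1])
```

Further components could be obtained by deflation (X - d u v') and repeating, which is where an alternating least squares scheme would also slot in.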
functionalize checks
The same checks are performed in geigen(), gsvd(), and gplssvd() (sometimes also multiple times in each). These should be turned into functions and put into utils.R
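A sketch of one shared helper that could live in utils.R; the name check_data_matrix() and the exact conditions are assumptions. It also covers halting when a matrix is not numeric.

```r
# Hypothetical utils.R helper (not in GSVD) that geigen(), gsvd(), and
# gplssvd() could all call instead of repeating the same checks.
check_data_matrix <- function(X, arg = deparse(substitute(X))) {
  if (!is.matrix(X)) stop("'", arg, "' must be a matrix.", call. = FALSE)
  if (!is.numeric(X)) stop("'", arg, "' must be numeric.", call. = FALSE)
  # is.finite() is FALSE for NA, NaN, Inf, and -Inf
  if (any(!is.finite(X))) {
    stop("'", arg, "' must not contain NA, NaN, or Inf values.", call. = FALSE)
  }
  invisible(X)
}

check_data_matrix(diag(2))                   # passes, returns X invisibly
try(check_data_matrix(matrix(letters[1:4], 2)))  # "... must be numeric."
```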
Consider the use of Matrix package
that may help make memory use more efficient
redo beer data
it's silly that I've created all the different dummy coding matrices.
undo them this way: https://r.789695.n4.nabble.com/R-how-to-convert-multiple-dummy-variables-to-1-factor-variable-td810654.html
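One base-R way to collapse a complete disjunctive (dummy) coding back into a single factor, not necessarily the method from the linked thread. The column names below are placeholders, not the actual beer data.

```r
# Placeholder dummy coding: one row per observation, one column per level
dummies <- matrix(c(1, 0, 0,
                    0, 1, 0,
                    0, 0, 1,
                    0, 1, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(NULL, c("ale", "lager", "stout")))

# max.col() returns, for each row, the column holding the largest value (the 1)
beer_style <- factor(colnames(dummies)[max.col(dummies, ties.method = "first")],
                     levels = colnames(dummies))
beer_style
```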
replace sweep() calls with * and t()
all sweeps should be removed. these are actually slower than multiplying a matrix by a vector with two transposes
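The replacements rely on R recycling a vector down the columns of a matrix. A small check that they give the same result as sweep() (the data and weights are random placeholders):

```r
set.seed(1)
X <- matrix(rnorm(20), 5, 4)
r <- runif(5)   # one weight per row
w <- runif(4)   # one weight per column

# scaling rows: a length-nrow vector recycles down each column
all.equal(sweep(X, 1, r, "*"), X * r)        # TRUE
# scaling columns: transpose, recycle, transpose back
all.equal(sweep(X, 2, w, "*"), t(t(X) * w))  # TRUE
```

Unlike sweep(), these forms do not warn on a length mismatch (recycling may even succeed silently), so the size checks still need to happen before them.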
Formal tests
I probably want to include formal tests for various conditions by the time we hit a major release.
check that matrices are numeric
kind of important to halt everything if they aren't
class() order incorrect?
I think I may want to reverse the order of the items in the class() vector
*sqrt_psd_matrix() should have tol test
these two functions should require a real tol value and shouldn't be allowed to go to the non-numeric pass throughs
effectively, these methods must ensure that the eigenvalues retained are tested to make sure they are positive
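A sketch of a square root that insists on a real tol and keeps only eigenvalues above it. This is a hypothetical variant for illustration, not GSVD's actual sqrt_psd_matrix(); warning on (rather than silently dropping) clearly negative eigenvalues is an assumption.

```r
# Hypothetical variant (not GSVD's code): tol has no default and must be a
# single positive, finite number; only eigenvalues above tol are retained.
sqrt_psd_matrix_sketch <- function(x, tol) {
  if (missing(tol) || !is.numeric(tol) || length(tol) != 1 ||
      !is.finite(tol) || tol <= 0) {
    stop("'tol' must be a single positive, finite number.", call. = FALSE)
  }
  e <- eigen(x, symmetric = TRUE)
  if (any(e$values < -tol)) {
    warning("negative eigenvalues found; they were dropped.", call. = FALSE)
  }
  keep <- e$values > tol
  vec <- e$vectors[, keep, drop = FALSE]
  vec %*% (t(vec) * sqrt(e$values[keep]))   # V diag(sqrt(lambda)) V'
}

S <- crossprod(matrix(c(2, 1, 0, 1, 3, 1), 3, 2))
R <- sqrt_psd_matrix_sketch(S, tol = 1e-8)
max(abs(R %*% R - S))                 # close to 0
try(sqrt_psd_matrix_sketch(S))        # error: tol is required
```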
geigen: need symmetric test on weights!
as of right now I'm assuming the weights are symmetric but they don't have to be...
which also means that the particular matrix line of
X <- sqrt_W %*% X %*% sqrt_W
should be
X <- sqrt_W %*% X %*% sqrt_psd_matrix(t(W))
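If non-symmetric weights are to be refused rather than handled, base R's isSymmetric() gives the test directly. check_symmetric_weights() is a hypothetical name; vector weights pass through because they are implicitly diagonal.

```r
# Hypothetical check (not in GSVD): refuse non-symmetric weight matrices
# before any square root is taken; vectors are implicitly diagonal.
check_symmetric_weights <- function(W, arg = "weights") {
  if (is.matrix(W) && !isSymmetric(W)) {
    stop("'", arg, "' must be a symmetric matrix.", call. = FALSE)
  }
  invisible(TRUE)
}

check_symmetric_weights(diag(3))               # passes
try(check_symmetric_weights(matrix(1:4, 2)))   # not symmetric: error
```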
make a real to do list
stop using issues as a to do list you dummy!
Vignettes
The package needs vignettes before I push it to CRAN.
print, summary, plot, other classes?
definitely implement print, plot, and summary
should I use or make other classes?
I do not want predict.*() at this time. That should be for actual analyses, not the decompositions.
speed up/memory efficiency
for future dev, resurrect @cfhammill's GSVD/eigen stuff:
Check diagonal & vector constraints for positive semi-definiteness
weights can't be negative, else we end up with something not positive semi-definite
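For vector or diagonal constraints the PSD test reduces to a sign check, since the eigenvalues of a diagonal matrix are its diagonal entries. A sketch with a hypothetical helper name; dense matrices are deliberately sent to an eigenvalue-based check instead.

```r
# Hypothetical check (not in GSVD): a vector of weights, or a diagonal
# matrix of them, is PSD exactly when no weight is negative.
check_nonnegative_weights <- function(w, arg = "weights") {
  is_diag <- is.matrix(w) && nrow(w) == ncol(w) &&
    all(w[row(w) != col(w)] == 0)
  if (is.matrix(w) && !is_diag) {
    stop("'", arg, "' is not diagonal; use an eigenvalue-based check.",
         call. = FALSE)
  }
  vals <- if (is.matrix(w)) diag(w) else w
  if (any(vals < 0)) {
    stop("'", arg, "' contains negative weights.", call. = FALSE)
  }
  invisible(TRUE)
}

check_nonnegative_weights(c(1, 0, 2))      # passes
try(check_nonnegative_weights(c(1, -1)))   # error: negative weight
```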
geigen rewrite: small consideration
it might be worthwhile rewriting geigen (and tolerance_eigen) to compute eigen-things via the SVD. it's a bit safer/faster and I can guarantee no negative eigenvalues
but then that must be a big design decision: am I willing to enforce that level of strictness for analyses? at this time no, but, I'm considering it down the line
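A sketch of what "eigen via the SVD" means for a symmetric PSD matrix, and why it amounts to the strictness decision above: the SVD never returns a negative value, because for an indefinite matrix it reports the absolute values of the eigenvalues. eigen_via_svd() is a hypothetical name.

```r
# Sketch: for a symmetric PSD matrix, singular values equal eigenvalues and
# u equals the eigenvectors up to column signs.
eigen_via_svd <- function(S) {
  s <- svd(S)
  list(values = s$d, vectors = s$u)
}

set.seed(5)
A <- matrix(rnorm(30), 10, 3)
S <- crossprod(A)                                # 3 x 3 symmetric PSD
e1 <- eigen(S, symmetric = TRUE)
e2 <- eigen_via_svd(S)
all.equal(e1$values, e2$values)                  # TRUE
all.equal(abs(e1$vectors), abs(e2$vectors))      # TRUE (signs may differ)

# indefinite case: eigenvalues are 3 and -1, but the SVD reports 3 and 1
svd(matrix(c(1, 2, 2, 1), 2))$d
```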
check k?
Should I check k before it's entered for various silly values, and then stop/silently work if they are silly values?
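One possible policy, sketched as a hypothetical helper: stop on values that cannot be a number of components, and silently cap k at the number of components available (max_k). Both choices are assumptions, not current GSVD behavior.

```r
# Hypothetical validator for k (not GSVD's behavior).
check_k <- function(k, max_k) {
  if (!is.numeric(k) || length(k) != 1 || !is.finite(k) ||
      k < 1 || k != round(k)) {
    stop("'k' must be a single positive whole number.", call. = FALSE)
  }
  min(as.integer(k), as.integer(max_k))   # silently cap at what is available
}

check_k(2, 3)    # 2
check_k(10, 3)   # 3
try(check_k(1.5, 3))
```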
compact returns
the g*() functions should have a "verbose" and "compact" return
the "verbose" is what I have now, the "compact" should focus strictly on what is needed for decomposition/rebuilding the matrices
expand class checks
there should be some assurance that it is in fact a GSVD object of a particular type by checking the names of the list, at least.
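A sketch of a stricter class check that also looks at the list's names. The class "GSVD" and the element names below are assumptions for illustration; the real ones would come from what gsvd(), geigen(), and gplssvd() actually return.

```r
# Hypothetical check: the object must inherit from the (assumed) class
# "GSVD" and carry every required element name.
is_gsvd_like <- function(x, required = c("d", "u", "v", "p", "q")) {
  is.list(x) && inherits(x, "GSVD") && all(required %in% names(x))
}

ok  <- structure(list(d = 1, u = 1, v = 1, p = 1, q = 1),
                 class = c("gsvd", "GSVD", "list"))
bad <- structure(list(d = 1), class = c("gsvd", "GSVD", "list"))
is_gsvd_like(ok)    # TRUE
is_gsvd_like(bad)   # FALSE
```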
update README
tolerance_eigen error with rank 1 matrices
Hey Derek,
I got an error when using tolerance_eigen on a rank-1 matrix. The error comes from the call to colSums and says
Error in colSums(eigen_res$vectors) :
'x' must be an array of at least two dimensions
In line 64 below, eigen_res$vectors gets converted to a vector when there is only one eigenvalue to keep.
Lines 64 to 68 in 0f41cf2
I converted back to matrix and that seems to have fixed the issue for me.
https://github.com/LukeMoraglia/GSVD/blob/a47ee067abe3a468137b34a4d5fd0984abca40ad/R/tolerance_eigen.R#L64
Code to reproduce error
library(GSVD)
set.seed(42)
X <- matrix(sample.int(20, 20*3, replace = TRUE), 20, 3)
R <- cor(X)
# Create a rank-1 matrix from R
eig_R <- tolerance_eigen(R)
R_rank1 <- as.matrix(eig_R$vectors[,1]) %*% eig_R$values[1] %*% t(as.matrix(eig_R$vectors[,1]))
tolerance_eigen(R_rank1)
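The underlying cause is that subsetting a single column drops a matrix to a plain vector. Besides converting back with as.matrix(), subsetting with drop = FALSE avoids the conversion altogether. A minimal illustration; that tolerance_eigen() subsets its vectors with a logical index like this is an assumption.

```r
vecs <- diag(3)                 # stand-in for eigen_res$vectors
keep <- c(TRUE, FALSE, FALSE)   # only one eigenvalue retained

v_bad  <- vecs[, keep]                # dropped to a numeric vector
v_good <- vecs[, keep, drop = FALSE]  # stays a 3 x 1 matrix

try(colSums(v_bad))   # 'x' must be an array of at least two dimensions
colSums(v_good)       # 1
```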
dropping tau & related updates
tau doesn't necessarily make sense because of the eigen.
so tau goes away and will become part of ExPo2
geigen: sqrt(abs(eigenvalues)) * sign(eigenvalues)?
right now, the singular values come back as NaN when there are negative eigenvalues. maybe here I can try to catch that and send back "negative" singular values
but unsure...
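The idea in the title, shown on a plain vector of eigenvalues (the values are placeholders): sqrt() gives NaN for negative input, while sqrt(abs(x)) * sign(x) keeps the sign as a "negative" singular value.

```r
eigenvalues <- c(4, 1, -0.25)
suppressWarnings(sqrt(eigenvalues))   # 2 1 NaN (normally with a warning)
signed_sv <- sqrt(abs(eigenvalues)) * sign(eigenvalues)
signed_sv                             # 2.0  1.0 -0.5
```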