graspologic-org / graspologic
Python package for graph statistics
Home Page: https://graspologic-org.github.io/graspologic/
License: MIT License
For Friday (9/21):
Based on the input data formats we agree on, make sure these functions work properly and no longer call R scripts.
DoD:
write code, tests, and documentation for the following:
DoD:
using modified atlases and dMRI pipeline
Example (R console transcript):
> require(igraph)
> gg <- rg.sample.SBM.correlated(n = 100, B = matrix(c(0.5, 0.5, 0.2, 0.5), nrow = 2), rho = c(0.4, 0.6), sigma = 0.2)
> summary(gg$adjacency$A)
IGRAPH c77bf6c U--- 100 2424 --
> summary(gg$adjacency$B)
IGRAPH 3ccfb0c U--- 100 2039 --
> cor(as.vector(gg$adjacency$A[]), as.vector(gg$adjacency$B[]))
[1] 0.1494246
rg.sample.correlated.gnp <- function(P, sigma) {
  require(igraph)
  n <- nrow(P)
  # Sample A ~ Bernoulli(P) via a symmetric matrix of uniforms
  U <- matrix(0, nrow = n, ncol = n)
  U[col(U) > row(U)] <- runif(n * (n - 1) / 2)
  U <- U + t(U)
  diag(U) <- runif(n)
  A <- (U < P) + 0
  diag(A) <- 0
  # Resample each edge of B conditionally on A so that the pair is correlated:
  #   P(B_ij = 1 | A_ij = 1) = sigma + (1 - sigma) * P_ij
  #   P(B_ij = 1 | A_ij = 0) = (1 - sigma) * P_ij
  avec <- A[col(A) > row(A)]
  pvec <- P[col(P) > row(P)]
  bvec <- numeric(n * (n - 1) / 2)
  uvec <- runif(n * (n - 1) / 2)
  idx1 <- which(avec == 1)
  idx0 <- which(avec == 0)
  bvec[idx1] <- (uvec[idx1] < (sigma + (1 - sigma) * pvec[idx1])) + 0
  bvec[idx0] <- (uvec[idx0] < (1 - sigma) * pvec[idx0]) + 0
  B <- matrix(0, nrow = n, ncol = n)
  B[col(B) > row(B)] <- bvec
  B <- B + t(B)
  diag(B) <- 0
  return(list(A = graph.adjacency(A, "undirected"), B = graph.adjacency(B, "undirected")))
}
#gg <- rg.sample.SBM.correlated(n = 100, B = matrix(c(0.5,0.5,0.2,0.5), nrow = 2), rho = c(0.4,0.6), sigma = 0.2)
#cor(as.vector(gg$adjacency$A[]), as.vector(gg$adjacency$B[]))
rg.sample.SBM.correlated <- function(n, B, rho, sigma, conditional = FALSE) {
  if (!conditional) {
    # Draw block labels i.i.d. from rho
    tau <- sample(seq_along(rho), n, replace = TRUE, prob = rho)
  } else {
    # Deterministic block sizes: rho[k] * n vertices in block k
    # (the original hard-coded 1:2 here, which silently assumed two blocks)
    tau <- unlist(lapply(seq_along(rho), function(k) rep(k, rho[k] * n)))
  }
  P <- B[tau, tau]
  return(list(adjacency = rg.sample.correlated.gnp(P, sigma), tau = tau))
}
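Since the goal above is to stop calling R scripts, here is a possible NumPy port of rg.sample.correlated.gnp, a sketch under the same edge-resampling scheme. The name `sample_correlated_gnp` is hypothetical, not an existing graspologic function.

```python
import numpy as np

def sample_correlated_gnp(P, sigma, rng=None):
    """Sample a pair of correlated undirected graphs.

    A ~ Bernoulli(P); each edge of B is then resampled conditionally
    on A so that sigma controls the edgewise correlation, mirroring
    the R function rg.sample.correlated.gnp.
    """
    rng = np.random.default_rng(rng)
    n = P.shape[0]
    iu = np.triu_indices(n, k=1)          # strict upper-triangle indices
    pvec = np.asarray(P, dtype=float)[iu]
    avec = (rng.random(pvec.shape) < pvec).astype(int)
    uvec = rng.random(pvec.shape)
    bvec = np.where(
        avec == 1,
        uvec < sigma + (1 - sigma) * pvec,  # B keeps A's edge with boosted prob.
        uvec < (1 - sigma) * pvec,          # B gains an edge with damped prob.
    ).astype(int)

    def to_sym(vec):
        # Fill the upper triangle, then symmetrize (diagonal stays 0)
        M = np.zeros((n, n), dtype=int)
        M[iu] = vec
        return M + M.T

    return to_sym(avec), to_sym(bvec)
```

Usage would look like `A, B = sample_correlated_gnp(np.full((100, 100), 0.3), sigma=0.2, rng=0)`, returning two hollow symmetric 0/1 adjacency matrices.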
write code, tests, and documentation for the following:
DoD:
Many-to-one algorithm
DoD:
As the title states: add models for SBM, ER, and ZI, inheriting from a base model class.
It's slightly more intuitive to have single-tone heatmaps (i.e., color for large values, white for small values, and somewhere in between otherwise). It makes visualizing things like the below easier:
The absence of color generally indicates that something is small, whereas the presence of color usually indicates more of something, so anything else is fairly unintuitive. If you were to use three colors, your readers would really have to check the axes, limits, etc., which is why we typically do one-tone with white for small and color for large.
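The convention above comes for free with any sequential colormap; a minimal matplotlib sketch (the data and colormap choice here are illustrative, not a graspologic default):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
A = rng.random((20, 20))  # stand-in adjacency/weight matrix

# Single-tone (sequential) colormap: white for small values, saturated
# blue for large ones, so the presence of color always means "more".
fig, ax = plt.subplots()
im = ax.imshow(A, cmap="Blues", vmin=0, vmax=1)
fig.colorbar(im, ax=ax)
```

Fixing `vmin`/`vmax` keeps white anchored at the true minimum rather than the sample minimum, which matters when comparing several heatmaps side by side.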
Should be easy: most functions should already work on sparse matrices, but we will need to update our type checking in several places and write tests to make sure.
Also, J1c says that one of the SVDs does not work on sparse matrices.
We could possibly also support rank-1 + sparse matrices, where rather than many 0s they have many copies of some constant.
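A sketch of what the updated type checking might look like (this is a hypothetical version of the `import_graph` helper mentioned elsewhere in this thread, not its actual implementation):

```python
import numpy as np
from scipy import sparse

def import_graph(graph):
    """Accept dense arrays and scipy sparse matrices alike,
    returning a float-typed square adjacency matrix."""
    if sparse.issparse(graph):
        out = graph.tocsr().astype(float)   # canonicalize sparse format
    elif isinstance(graph, np.ndarray):
        out = graph.astype(float)           # cast int arrays to float
    else:
        raise TypeError(f"unsupported graph type: {type(graph)}")
    if out.shape[0] != out.shape[1]:
        raise ValueError("adjacency matrix must be square")
    return out
```

Downstream functions would then only ever see `np.ndarray` or `csr_matrix`, which narrows the surface the new tests need to cover.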
Write concrete contributing guidelines.
DoD:
A CONTRIBUTING.md that specifies the following:
First, I think seaborn is a good choice. ggplot for Python hasn't been developed in over 2 years, so while ggplot is nice, I don't think I'm going to use it. Not the biggest fan of plotly. Thoughts appreciated.
So for actual plots:
As the title states:
Write a function for omnibus embedding with the following features:
DoD:
Based on what is discovered in sprint 2, see if any significant findings can be repeated in another data set. In particular, if disease/environmental phenotype data can be related to graph-statistic properties, try to find another data set for that specific phenotype.
DoD:
Hi developers (cc @jovo),
I am running OmnibusEmbed on several correlation matrices derived from functional magnetic resonance imaging data. Currently, I have 133 subjects, each with 249 brain-region-of-interest time series. For each subject, I compute the Pearson correlation matrix, so in the end I have a [133 x 249 x 249] array (if you prefer, 133 graphs with 249 vertices each).
However, when I run:
embeddings = OmnibusEmbed(k=20).fit_transform(correlations)
embeddings becomes a 2-item tuple containing two [33117 x 20] matrices, for which np.allclose(embeddings[0], embeddings[1]) is True. Why is it returning two of them?
Also, is it safe to reshape the [33117 x 20] matrix into [133 x 249 x 20], in a way that embeddings[0] contains the embeddings of subject 0's regions?
Thank you!
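Assuming the omnibus matrix stacks each graph's vertex rows in order (subject 0's 249 vertices first, then subject 1's, and so on), the reshape asked about above is safe. A small NumPy illustration of that assumption, with a stand-in for the real output:

```python
import numpy as np

m, n, k = 133, 249, 20  # subjects, vertices, embedding dimension

# Stand-in for the (33117 x 20) stacked omnibus output.
stacked = np.arange(m * n * k, dtype=float).reshape(m * n, k)

# If row i*n + j of the stacked embedding is vertex j of subject i,
# a plain reshape recovers contiguous per-subject blocks.
per_subject = stacked.reshape(m, n, k)
```

After the reshape, `per_subject[i]` is exactly rows `i*n` through `(i+1)*n - 1` of the stacked matrix, i.e. subject i's vertex embeddings.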
"concatenated vectors through unsupervised random forest, the features that were most informative would be the ones that are used. then, rather than MDS, we simply do an eigendecomposition"
Shared IO functions in utils, so there is less inconsistency and fewer chances for breakage. E.g., I'd imagine this breaks https://github.com/neurodata/pygraphstats/blob/master/graphstats/ase/ase.py with networkx.
Todo:
DoD:
See if any of these could be useful for graphs
ASE(A) operates on
A + diag(degree_vec / (n - 1))
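For reference, that diagonal augmentation can be written directly in NumPy; `augment_diagonal` is a hypothetical helper name used here for illustration:

```python
import numpy as np

def augment_diagonal(A):
    """Return A + diag(degree_vec / (n - 1)), the augmented
    adjacency that ASE is said to operate on above."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)                # degree of each vertex
    return A + np.diag(deg / (n - 1))  # hollow A -> diagonal = deg / (n-1)
```

For a hollow adjacency matrix this simply places each vertex's average connectivity on the diagonal, which is a common fix for the bias that an all-zero diagonal introduces into the spectral decomposition.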
Issues with setting Jupyter notebook kernels prevent them from running on Netlify.
Write code, tests, and documentation for the following:
DoD:
Primitives
dimselect
ASE
LSE
OMNI
Super Primitives
Nonpar
Semipar
GClust
OOCASE
Must be float or decimal. I will update the import_graph function to cast to float if given an int array.
Thoughts on adding a function that returns the actual latent positions, so that I don't have to keep typing np.dot(lpm.X, np.diag(lpm.d) ** 0.5)?
I think it makes sense to add to BaseEmbed, but I could see it being added to LatentPosition. @ebridge2 ?
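One way this could look (a sketch, not the actual graspologic API): expose the scaled positions as a property on the object that already stores X and d:

```python
import numpy as np

class LatentPosition:
    """Minimal sketch: stores the factors returned by an embedding and
    exposes the latent positions X @ diag(d) ** 0.5 as a property."""

    def __init__(self, X, d):
        self.X = np.asarray(X, dtype=float)
        self.d = np.asarray(d, dtype=float)

    @property
    def latent_positions(self):
        # Equivalent to np.dot(self.X, np.diag(self.d) ** 0.5),
        # but scales columns without forming the diagonal matrix.
        return self.X * np.sqrt(self.d)
```

A property keeps the call site down to `lpm.latent_positions`, and the broadcast form avoids materializing an n x n diagonal matrix for large embeddings.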
DoD:
add graphs from here https://github.com/neurodata/graphstats/tree/master/data
ER, SBM, ZI-Poisson ER, ZI-Poisson SBM, weighted ER, and weighted SBM simulations.
DoD:
Make a basic class containing a structured representation of a latent position model. This consists of an X, a Y, and an optional vtx_names attribute, where X \in \mathbb{R}^{N \times k}, Y \in \{\mathrm{NULL}\} \cup \mathbb{R}^{N \times k}, and vtx_names \in \mathcal{S}^{N}. Correspondingly, make the base embedding class contain an instance of a latent position model to return to users.
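One possible sketch of that class, under the assumption that plain NumPy arrays and a name sequence suffice (field names follow the description above; the validation logic is illustrative):

```python
from dataclasses import dataclass
from typing import Optional, Sequence
import numpy as np

@dataclass
class LatentPositionModel:
    """Structured latent position model: X in R^{N x k},
    Y either None or in R^{N x k}, and optional per-vertex names."""
    X: np.ndarray
    Y: Optional[np.ndarray] = None
    vtx_names: Optional[Sequence[str]] = None

    def __post_init__(self):
        self.X = np.asarray(self.X, dtype=float)
        if self.Y is not None:
            self.Y = np.asarray(self.Y, dtype=float)
            if self.Y.shape != self.X.shape:
                raise ValueError("Y must match the shape of X")
        if self.vtx_names is not None and len(self.vtx_names) != self.X.shape[0]:
            raise ValueError("need one vertex name per row of X")
```

The base embedding class could then build one of these in `fit` and hand it back, so undirected embeddings simply leave Y as None.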
DoD:
Roll several preprocessing steps (various types of PTR, other transforms, etc.), embeddings, and clustering steps into an sklearn Pipeline that will work with randomized parameter search.
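A minimal sketch of that wiring, with stand-in steps (StandardScaler, PCA, and KMeans are placeholders for the real PTR transform, spectral embedding, and GClust steps; only the Pipeline + randomized-search plumbing is the point):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.model_selection import RandomizedSearchCV

# Placeholder steps; swap in the real preprocessing/embedding/clustering.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("embed", PCA()),
    ("cluster", KMeans(n_init=10)),
])

# step-name__parameter keys let the search reach inside the pipeline.
param_dist = {
    "embed__n_components": [2, 3, 5],
    "cluster__n_clusters": [2, 3, 4],
}

X = np.random.default_rng(0).normal(size=(60, 10))
search = RandomizedSearchCV(pipe, param_dist, n_iter=4, cv=3, random_state=0)
search.fit(X)  # unsupervised: scored via the final estimator's score()
```

Because the final step exposes a `score` method, the search works without labels; the same skeleton accepts any graspologic transformer that follows the sklearn fit/transform convention.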
Select some phenotype features of interest and explore how the spectrally embedded data look with regard to these features. If it seems reasonable based on this output, try clustering/classification.
DoD:
form kwarg gets passed to LSE