xgi-org / xgi
CompleX Group Interactions (XGI) is a Python package for higher-order networks.
Home Page: https://xgi.readthedocs.io
License: Other
In the requirements README file, the links to example.txt and doc.txt are broken.
Currently, edges may contain singletons (edges with a single node):
list(H._edge.values())
# [[0], [3, 4], [4], [4]]
It's often useful to have a list of edges that does not contain singletons. Add a function to get that.
Should the function output a new external list, or remove singletons from the edges stored internally in the Hypergraph?
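As a possible answer to that question, a new external list seems safest, since the internal storage stays untouched. A minimal sketch, where the function name and the plain-dict stand-in for the internal edge storage are assumptions, not XGI API:

```python
def edges_without_singletons(edge_dict):
    """Return a new list of edges with more than one member.

    `edge_dict` models the internal storage as a plain dict mapping
    edge id -> member list; it is left unmodified.
    """
    return [members for members in edge_dict.values() if len(members) > 1]

edges = {0: [0], 1: [3, 4], 2: [4], 3: [4]}
edges_without_singletons(edges)  # [[3, 4]]
```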
The proposal is to have a subclass of Hypergraph where every time the user does something fishy, it raises a warning.
H = some_hypergraph()
H.add_edge([1, 1, 2]) # an edge with a repeated node is added
easy = xgi.HypergraphWithWarnings(H)
easy.add_edge([1, 1, 2])
# -> Warning: An edge contains a repeated node
Implementing such a class would require executing many sanity checks each time an edge is added. This makes the class slow and inefficient. However, it is very safe and it "protects" the user at each step of the way. There would also be a way to easily convert one class to the other. The main caveat would be to make sure the user remembers to convert from one class to the other when they need to. We could perhaps highlight this in the documentation, and also include it in the warning text:
easy.add_edge([1, 1, 2])
# -> Warning: An edge contains a repeated node (call XXX method to disable these warnings)
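A minimal sketch of what such a subclass could look like. The class here is a toy stand-in for xgi.Hypergraph, and the check shown is just one example of the many sanity checks the proposal would require; all names are assumptions:

```python
import warnings

class Hypergraph:
    """Toy stand-in for xgi.Hypergraph, for illustration only."""

    def __init__(self):
        self._edge = {}

    def add_edge(self, members):
        self._edge[len(self._edge)] = list(members)

class HypergraphWithWarnings(Hypergraph):
    """Hypothetical subclass that runs sanity checks on every mutation."""

    def add_edge(self, members):
        if len(set(members)) < len(members):
            warnings.warn(
                "An edge contains a repeated node "
                "(convert back to Hypergraph to disable these warnings)"
            )
        super().add_edge(members)
```

Note that the checks only warn; the fishy edge is still added, so converting back to the plain class never changes the data.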
All these functions do very similar things. In keeping with the idea of doing things only once, I vote we simplify them somehow. I personally lean toward the nuclear option of keeping only shape. All the others are not necessary...
Currently,
H.edges
# EdgeView([3, 2, 4, 9])
outputs edge ids, not edges as list of member nodes.
It's often useful to access edges as list of member nodes.
One current way is list(H._edge.values())
but it's long and hidden.
Should we have two views?
H.edges_id
# EdgeIdView([3, 2, 4, 9])
H.edges
# EdgeView([[0], [3, 4], [4], [4]])
Make sure the documentation highlights the fact that the user should remember to call remove_isolates and other related functions.
Implement a cleanup method that performs various common clean-up tasks; see the comment down-thread.
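A rough sketch of what cleanup could cover, modeling the internal storage as a plain dict. The exact set of tasks is an assumption drawn from the other issues here (singletons, repeated nodes, duplicate edges), not a spec:

```python
def cleanup(edge_dict):
    """Hypothetical cleanup: drop repeated nodes within edges,
    then drop singleton and duplicate edges."""
    seen = set()
    cleaned = {}
    for eid, members in edge_dict.items():
        members = sorted(set(members))       # remove repeated nodes
        key = tuple(members)
        if len(members) < 2 or key in seen:  # skip singletons and duplicates
            continue
        seen.add(key)
        cleaned[eid] = members
    return cleaned

cleanup({0: [1, 1, 2], 1: [2, 1], 2: [3]})  # {0: [1, 2]}
```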
Sort imports using isort.
The adjacency matrix is computed indirectly through the incidence matrix via the formula A = I I^T, with the diagonal then set to zero.
The current functionality only specifies the number of hyperedges to which a node belongs. In principle, one should be able to specify a hyperedge size or list of sizes and it returns the number of hyperedges of that size of which that node is a part.
For reproducibility, it would be nice to have a "random_seed" argument that can be set in the generators.
As discussed today, trying the random hypergraph generator:
import xgi
n = 10
m = 100
p = 0.01
H = xgi.erdos_renyi_hypergraph(n, m, p)
H.nodes
# NodeView((1, 3, 6, 7, 9))
the hypergraph has 5 nodes, not 10, even though the doc says n is the number of nodes.
This discrepancy between n and the actual number of nodes is apparently expected for small n and p (correct me if I'm wrong).
If so, add a warning to the docstring. Otherwise, fix the bug.
The incidence matrix should always have dimension (num_nodes, num_edges) or (num_nodes, num_edges_of_order). But when sparse=True, dimension 0 is wrong when some nodes are disconnected (overall, or at a given order when one is specified). This makes all the other functions that use the incidence matrix wrong (adjacency, laplacian, etc.).
MWE:
nodes = range(5)
edges = [[0,1], [0,2], [1,2,3]]
H = xgi.empty_hypergraph()
H.add_nodes_from(nodes)
H.add_edges_from(edges)
xgi.incidence_matrix(H, sparse=False).shape
## --> (5, 3) # good
xgi.incidence_matrix(H, sparse=True).shape
## --> (4, 3) # not good
This is because, when sparse=True, the sparse matrix is built by iterating over the members of the edges (of order d), so nodes that appear in no edge never get a row.
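One possible fix is to pass the full shape explicitly to the sparse constructor, so that disconnected nodes still get a (zero) row. A sketch of the idea, not the actual XGI implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix

def incidence_matrix_sketch(nodes, edges):
    """Build the incidence matrix with an explicit shape so that
    dimension 0 equals num_nodes even when some nodes are isolated."""
    node_index = {n: i for i, n in enumerate(nodes)}
    rows, cols = [], []
    for j, members in enumerate(edges):
        for n in members:
            rows.append(node_index[n])
            cols.append(j)
    data = np.ones(len(rows))
    # the explicit shape is the key: without it, trailing isolated
    # nodes are silently dropped from the matrix
    return csr_matrix((data, (rows, cols)), shape=(len(nodes), len(edges)))

nodes = list(range(5))
edges = [[0, 1], [0, 2], [1, 2, 3]]
incidence_matrix_sketch(nodes, edges).shape  # (5, 3), node 4 included
```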
The words "order" and "size" are used throughout the codebase to refer to the number of nodes in an edge (where order = size - 1). I'd suggest we choose one and remove all instances of the other. For example, we have H.edge_size but also H.edges_of_order.
The .github/workflows/test.yml file needs to be edited so that the tests pass. I removed doctests for now. Having trouble getting it to work so would appreciate any help.
Currently, the way we have implemented subhypergraphs is essentially as read-only (but separate) hypergraphs. They are not really views, though there is some language in the codebase that would suggest they are. What I mean is the following:
H = xgi.Hypergraph([(0,1,2), (0,3), (1,2)])
sub = H.subhypergraph([0,1,2])
# the subgraph contains the correct edges
sub.edges
# -> EdgeView((0, 2))
# now modify the original graph
H.add_edge([0, 1])
H.edges
# -> EdgeView((0, 1, 2, 3))
# the subgraph DOES NOT update
sub.edges
# -> EdgeView((0, 2))
# we want EdgeView((0, 2, 3))
My problem is this: since the subgraphs do not update automatically (i.e. they are not views), there is no point in making them read-only. So for now we can just return a new Hypergraph object whenever the user needs a subgraph.
A weak removal of a node from a hypergraph means removing the node from each edge it belongs to and then removing the node itself from the node set. A strong removal means weak removal AND deleting all edges that contain that node. Currently, Hypergraph.remove_node implements strong removal only.
We should think about whether or not this makes sense in the case of simplicial complexes.
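For reference, the two removal semantics can be sketched on a plain dict of edges (function names and the dict stand-in are assumptions, not the XGI API):

```python
def remove_node_weak(edge_dict, node):
    """Weak removal: strip `node` from every edge it belongs to,
    keeping the (possibly shrunken) edges; empty edges are dropped."""
    out = {}
    for eid, members in edge_dict.items():
        kept = [n for n in members if n != node]
        if kept:
            out[eid] = kept
    return out

def remove_node_strong(edge_dict, node):
    """Strong removal: delete every edge that contains `node`."""
    return {eid: m for eid, m in edge_dict.items() if node not in m}

edges = {0: [1, 2], 1: [2, 3], 2: [3]}
remove_node_weak(edges, 3)    # {0: [1, 2], 1: [2]}
remove_node_strong(edges, 3)  # {0: [1, 2]}
```

For simplicial complexes the question is trickier, since shrinking a face under weak removal may break the closure property.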
Update the docstrings of the functions to the numpydoc format for the following files:
I'm just wondering if we still want to implement this solution #28 (comment) that we seemed to agree on but then never implemented.
Namely, something more like
H = xgi.Hypergraph([[1, 2, 3], [3, 4], [4, 5, 6]])
H.edge_ids
# -> EdgeIDView((0, 1, 2))
H.edges
# -> EdgeView([[1, 2, 3], [3, 4], [4, 5, 6]])
H.edges(return_dict=True)
# -> {0: [1, 2, 3], 1: [3, 4], 2: [4, 5, 6]}
I think the reasoning was that it makes it shorter and more straightforward to see the members, as well as being consistent with networkx's edges.
Originally posted by @maximelucas in #70 (comment)
Calling
import xgi
H = xgi.random_hypergraph(10, [0.1, 0.01])
A = xgi.adjacency_matrix(H, weighted=True)
yields two warnings:
/usr/local/lib/python3.9/site-packages/scipy/sparse/compressed.py:291: SparseEfficiencyWarning:
Comparing a sparse matrix with a scalar greater than zero using < is inefficient, try using >= instead.
warn(bad_scalar_msg, SparseEfficiencyWarning)
/usr/local/lib/python3.9/site-packages/scipy/sparse/_index.py:125: SparseEfficiencyWarning:
Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
self._set_arrayXarray(i, j, x)
When weighted=False
, I cannot reproduce a warning (but I might have had one in the past, unsure).
It seems to come from the last line in
if not weighted:
A = (A >= s) * 1
else:
A[A < s] = 0
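If the culprit is the in-place masked assignment, one hedged alternative (following the warning's own suggestion to compare with >= instead of <) is to build the thresholded matrix with multiply, which avoids mutating the sparsity structure in place:

```python
import numpy as np
from scipy import sparse

s = 2
A = sparse.random(50, 50, density=0.1, format="csr", random_state=0) * 5

# instead of:  A[A < s] = 0    # masked assignment: triggers both warnings
A_thr = A.multiply(A >= s)     # compare with >=, keep entries >= s
```

This is only a sketch; whether it is a drop-in replacement inside adjacency_matrix would need checking (for instance, the result of multiply comes back in COO format).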
NodeView
supports accessing the edge ids that contain a node in three different ways:
H = some_hypergraph()
H.nodes[1] == H.nodes(1) == H.members(1)
# -> True
I find this unpythonic. I'd much rather there was a single way of doing things if at all possible. If there was exactly one way of accessing the edges that contain a node, which would you prefer? If anyone can argue that we should support two (or more) ways of doing the same thing, I'm all ears.
Related to #20.
We have discussed implementing a frozen object that would allow every statistic to be cached:
H = some_hyper_graph()
H.adj_tensor() # compute the tensor
# ...more code here...
H.adj_tensor() # has to recompute
H.freeze()
H.adj_tensor() # since H is frozen, it computes the tensor once and *caches it*
# ...more code here...
H.adj_tensor() # no need to recompute!
The point is that after H
is frozen, we can essentially cache everything!
H.freeze()
H.max_degree() # compute and cache
H.degree_histogram() # compute and cache
# etc
If at any point the user wants to modify H
again, the cache is flushed,
H.thaw() # everything that was cached is erased
H.add_edge([1,2,3])
H.max_degree() # need to recompute
H.freeze()
H.max_degree() # compute and cache
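A bare-bones sketch of the freeze/thaw mechanics; the class and attribute names are made up for illustration, and a degree dict stands in for the hypergraph state:

```python
class CachedStats:
    """Hypothetical freeze/thaw caching scheme (not XGI API)."""

    def __init__(self, degrees):
        self._degrees = dict(degrees)  # stand-in for hypergraph state
        self._frozen = False
        self._cache = {}
        self.n_computations = 0        # only here to demonstrate caching

    def freeze(self):
        self._frozen = True

    def thaw(self):
        self._frozen = False
        self._cache.clear()            # flush everything that was cached

    def max_degree(self):
        if self._frozen and "max_degree" in self._cache:
            return self._cache["max_degree"]
        self.n_computations += 1
        value = max(self._degrees.values())
        if self._frozen:
            self._cache["max_degree"] = value
        return value

h = CachedStats({0: 1, 1: 3, 2: 2})
h.max_degree(); h.max_degree()  # recomputed both times
h.freeze()
h.max_degree(); h.max_degree()  # computed once, then served from cache
h.n_computations                # 3
```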
Graph generators may sometimes return hypergraphs with isolated nodes (e.g. ER hypergraph with small p
). In these cases, and others, it is usual to obtain the largest component. Ideally, we would like to do something like this:
G = xgi.erdos_renyi_hypergraph(n, m, p).largest_component()
G = xgi.erdos_renyi_hypergraph(n, m, p).remove_isolates()
Improve the efficiency of the has_edge method by implementing with generators.
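For concreteness, a generator-based membership test might look like this sketch (edge storage modeled as a dict; not the actual XGI method):

```python
def has_edge(edge_dict, edge):
    """Return True if `edge` (any iterable of nodes) matches an existing
    hyperedge. any() stops at the first match, so no intermediate list
    of all edges is materialized."""
    target = set(edge)
    return any(set(members) == target for members in edge_dict.values())

edges = {0: [1, 2], 1: [2, 3, 4]}
has_edge(edges, [2, 1])  # True (node order does not matter)
has_edge(edges, [1, 3])  # False
```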
import xgi
xgi.__version__
yields module 'xgi' has no attribute '__version__'
. It would be nice to support it.
Running
import xgi
H = xgi.Hypergraph()
H.add_edges_from([[0, 1, 2], [1, 2, 3], [2, 3, 4]]) #edgelist6
A, node_dict = xgi.adjacency_matrix(H, order=1, index=True)
returns
File ~/Dropbox (ISI Foundation)/WORK/SCIENCE/xgi/xgi/linalg/matrix.py:136, in adjacency_matrix(H, order, s, weighted, index)
133 I = incidence_matrix(H, index=False, order=order)
135 A = I.dot(I.T)
--> 136 A.setdiag(0)
138 if not weighted:
139 A = (A >= s) * 1
AttributeError: 'numpy.float64' object has no attribute 'setdiag'
That happens because there is no edge of order 1 in the hypergraph. In that case, right now, we have
xgi.incidence_matrix(H, order=1, index=True)
# array([0])
so that I.dot(I.T) == 0, which is a scalar and therefore has no diagonal to set.
To fix this, two options:
1) add an if/else in adjacency_matrix to return an NxN array of zeros when I == [0] (we would probably then need to add a sparse argument there too, to deal with both cases coming from incidence_matrix);
2) change what incidence_matrix returns in that degenerate case.
What do you think? Do we want to return [0] for the incidence matrix when there are no edges/nodes, or not?
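The first option could look roughly like this sketch of the guard alone (function name is hypothetical, not a patch to XGI):

```python
import numpy as np

def adjacency_from_incidence(I, num_nodes):
    """Compute A = I @ I.T with a zeroed diagonal, guarding against the
    degenerate [0] placeholder returned when there are no edges of the
    requested order."""
    I = np.asarray(I)
    if I.ndim < 2 or I.size == 0:
        # no edges of this order: return an all-zeros N x N adjacency
        return np.zeros((num_nodes, num_nodes))
    A = I.dot(I.T)
    np.fill_diagonal(A, 0)
    return A

adjacency_from_incidence(np.array([0]), 4).shape  # (4, 4), no AttributeError
```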
For the methods in the convert.py
file, the create_using
keyword is used to indicate the constructor to be used to add edges from. This may be unnecessary and potentially should be implemented differently.
Write initial tests for XGI using pytest for the following files:
The method in question is defined in classes/hypergraph.py as:
def get_edge_data(self, id, default=None):
"""Returns the attribute dictionary associated with edge id.
This is identical to `H._edge_attr[id]` except the default is returned
instead of an exception if the edge doesn't exist.
"""
try:
# this may fail because the ID may not exist
# or the property doesn't exist.
return self.edges[id]
except KeyError:
return default
I suggest we remove it from Hypergraph and make it part of IDView, where it can just be called get. This will match more nicely with the dictionary interface:
# instead of this
H.get_edge_data(-1, default)
# how about this
H.edges.get(-1, default)
pip on readthedocs seems to be trying to install pre-releases, so we had to upper bound numpy.
We have the case that EdgeView([...])
contains a list while NodeView((...))
contains a tuple.
H.edge[id], xgi.get_edge_attributes, and H.get_edge_data: all of these should be handled via the same interface, perhaps an AttrView object like the one we used to have.
Because one can infer all the sub-interactions given a simplex, perhaps it is beneficial to store a simplicial complex using its maximal edges, to save space and perhaps to enable efficient algorithms.
I was thinking that, when doing simulations (particularly for games), it is very useful to have the neighbors of a node divided by hyperedge. To explain better, here's what the current neighbors
function returns:
hyperedge_list = [[1, 2], [2, 3, 4]]
H.neighbors(2)
# {1, 3, 4}
which is basically the set of neighbors in the graph projection of H. What I was imagining is actually a higher-order neighbors function that returns something like {[1], [3, 4]}. This is extremely useful in loops when you want to iterate over the hyperedges attached to a node and then over the neighbors within each of them.
Do you think something like this could be integrated into the current neighbors
function? Or maybe two different ones? In my old code I had H.graph_neighbors()
and H.neighbors()
.
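A sketch of the higher-order variant, which could live alongside the current projection-based neighbors (the function name and the dict stand-in for edge storage are hypothetical):

```python
def edge_neighbors(edge_dict, node):
    """For each hyperedge containing `node`, the set of its co-members."""
    return [
        set(members) - {node}
        for members in edge_dict.values()
        if node in members
    ]

edges = {0: [1, 2], 1: [2, 3, 4]}
edge_neighbors(edges, 2)  # [{1}, {3, 4}]
```

Returning a list of sets (rather than a set of lists) keeps one entry per hyperedge, which is exactly what the game-simulation loop described above needs.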
Change API such that
The tutorial 1 notebook with basic functionality is ready.
Currently the way of accessing nodal data of node 1 is H.nodes.data()[1] or H.nodes(1).data(), and a more streamlined call would be preferable. Currently .nodes(id) and .nodes[id] access a node's bipartite neighbors, so reusing them for attributes is probably undesirable. The problem is that, unlike NetworkX, we are using two dictionaries each for nodes and edges: one for the memberships and one for the attributes. So I'm not sure what the function would return. I see three options:
1) a separate property of Hypergraph called H.node_data or something;
2) a data flag where, if data=True, it returns the node attributes and, if False, the nodes with their bipartite connections;
3) return a tuple of the bipartite connections and data, respectively.
Would appreciate your thoughts!
It would be good to add a class that allows users to specify a directed hypergraph. I've seen several ways to do this, but not sure which is best.
After #41, the code H['name']
will access hypergraph attributes and therefore should (hopefully) always return the same as H.name
. However, if the name is not set, we will get, after that PR is merged, the following:
H.name
# -> ""
H['name']
# -> KeyError: 'name'
There may be unintended consequences to storing the hyperedge members as a set. When there are self-loops, the edge will shrink in cardinality, and when constructing m-uniform hypergraphs with self-loops, the hypergraph will no longer be uniform.
Running
import xgi
edges = [[0,1], [0,2], [1,2,3]]
H = xgi.Hypergraph(edges)
yields
xgi.degree(H, order=3).shape
## --> (0,)
but it should be (4,)
, the number of nodes.
This then yields an error when running xgi.laplacian(H, order=3)
, because it conflicts with the (correct) dimension of the adjacency matrix:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 xgi.laplacian(H, order=3).shape
File ~/Dropbox (ISI Foundation)/WORK/SCIENCE/xgi/xgi/linalg/matrix.py:274, in laplacian(H, order, rescale_per_node, index)
270 return (np.array([]), {}) if index else np.array([])
272 K = degree(H, order=order, index=False)
--> 274 L = order * np.diag(np.ravel(K)) - A # ravel needed to convert sparse matrix
275 L = np.asarray(L)
277 if rescale_per_node:
ValueError: operands could not be broadcast together with shapes (0,0) (4,4)
This issue consists of 4 components:
The default behavior is that the first column specifies the nodes and the second column specifies the edges. Instead of calling this method and then H.dual(), it would be nice to have a reverse=True argument to switch the labels. This would affect the following functions:
In networkx, the (only?) way to get the largest degree is the following:
max(list(dict(G.degree()).values()))
I find this unacceptable. In xgi
, I propose we implement something like the following:
max(H.degree().values())
That is, we should try to give the View objects a bit of an interface that makes them behave like the object they point to. Specifically, since H.degree()
returns a DegreeView, which looks like a dictionary,
H.degree()
# -> DegreeView({0: 10, 1: 20, ...})
we should make the DegreeView object support some of the methods of a dictionary,
H.degree().values()
# -> [10, 20]
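A sketch of the dict-forwarding idea, using a toy class rather than the actual DegreeView:

```python
class DegreeView:
    """Toy view that forwards a few dict methods to the data it wraps."""

    def __init__(self, degrees):
        self._degrees = dict(degrees)

    def values(self):
        return self._degrees.values()

    def items(self):
        return self._degrees.items()

    def __getitem__(self, node):
        return self._degrees[node]

dv = DegreeView({0: 10, 1: 20})
max(dv.values())  # 20, with no dict()/list() gymnastics
```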
It may be possible to speed up the methods in algorithms/connected.py by removing the dependence on networkx and doing a breadth-first search directly.
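As a plausibility check, here is a self-contained breadth-first search over the bipartite node-edge structure with no networkx dependency (a sketch, not the XGI implementation):

```python
from collections import deque

def connected_components(edge_dict):
    """Connected components of the hypergraph, found by BFS that
    alternates between nodes and the hyperedges containing them."""
    membership = {}  # node -> list of edge ids containing it
    for eid, members in edge_dict.items():
        for n in members:
            membership.setdefault(n, []).append(eid)
    seen = set()
    components = []
    for start in membership:
        if start in seen:
            continue
        comp = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            n = queue.popleft()
            comp.add(n)
            for eid in membership[n]:
                for m in edge_dict[eid]:
                    if m not in seen:
                        seen.add(m)
                        queue.append(m)
        components.append(comp)
    return components

connected_components({0: [0, 1], 1: [1, 2], 2: [4, 5]})
# [{0, 1, 2}, {4, 5}]
```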
Working on #29 to remove singletons, I came across this:
Running
import xgi
n = 10
m = 50
p = 0.1
H = xgi.erdos_renyi_hypergraph(n, m, p)
then in H._edge, a given edge sometimes appears more than once with different IDs, e.g. {38: [9], 39: [9], 48: [9], ...}.
In addition, not all nodes are present in edges as singletons. For example, there is a node 8 but no singleton edge [8].
I haven't checked whether #33 fixes this.