
topos's Introduction


Topos

This library implements natural topological and combinatorial operators on statistical networks, described in [1] and [2]. They yield, for instance, message-passing algorithms on graph neural networks, the belief propagation (BP) or sum-product algorithm for efficient marginal estimation (generalised to hypergraphs in [3]), and the belief diffusions that regularise BP [2].

The main data structure consists of fields of tensors, each possibly of a different shape, stored internally as 1D Pytorch vectors. They can for instance be indexed by the hyperedges of a hypergraph $K \subseteq \mathcal{P}(\Omega)$, i.e. a collection of regions $\mathrm{a} \subseteq \Omega$ describing which variables are allowed to be measured simultaneously. In the case of a graph of binary variables, a field $(f_{\mathrm{a}})$ maps each vertex $i$ to a tensor $f_i$ of shape [2] (a function on $\lbrace -1, +1 \rbrace$) and each edge $ij$ to a tensor $f_{ij}$ of shape [2, 2] (a function on $\lbrace -1, +1 \rbrace^2$).
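As a concrete illustration, here is a minimal sketch of such a field of tensors on a small graph of binary variables, assuming the Graph and FreeFunctor constructors described in the Overview below:

import torch
from topos import Graph, FreeFunctor

#--- Binary variables: FreeFunctor(2) puts 2 microstates on every vertex,
#--- so vertex fibers have shape [2] and edge fibers have shape [2, 2].
G = Graph([[0, 1, 2], [[0, 1], [1, 2]]], FreeFunctor(2))

f0 = G.randn(0)   # one tensor of shape [2] per vertex, flattened to a 1D vector
f1 = G.randn(1)   # one tensor of shape [2, 2] per edge, flattened to a 1D vector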

This kind of structure may be called a spin glass in physics, a Hopfield network in neuroscience, or a Boltzmann machine or energy-based model in artificial intelligence, all of which are essentially equivalent to the famous Ising model.

Degree-0 fields are therefore collections of local functions $(f_{\tt a})$ for ${\tt a} \in K$, whose sum over $K$ typically parameterises a global energy function (such a global observable may be evaluated quickly, while computing its integrals or expectations is intractable in high dimension). Higher-degree fields describe, for instance, the degree-1 messages $m_{\mathrm{a \to b}}$ which are iterated upon in the belief propagation algorithm (for marginal estimation) or in message-passing neural networks (MPNNs). They are indexed by ordered region pairs $\mathrm{a \supset b}$, which generate the nerve of $K$ [2].
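For intuition, here is a minimal pure-Python sketch (not the topos API; the region set and potentials are hypothetical) of a global energy obtained by summing local functions over $K$:

import torch

#--- Hypothetical example: K = {(0, 1), (1, 2)} on three binary variables ---
f = {(0, 1): torch.randn(2, 2), (1, 2): torch.randn(2, 2)}

def H(x):
    """Global energy H(x) = sum_a f_a(x_a) for a configuration x in {0, 1}^3."""
    return sum(f_a[tuple(x[i] for i in a)] for a, f_a in f.items())

H((0, 1, 0))   # cheap to evaluate; summing exp(-H) over all states is what becomes intractable in high dimension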

References

[:book:] : The topos wiki

[1] : Peltre, 2020, Message-Passing Algorithms and Homology, PhD thesis. arXiv:2009.11631

[2] : Peltre, 2021, Belief Propagation as Diffusion. GSI'21 proceedings. arXiv:2107.12230

[3] : Yedidia, Freeman and Weiss, 2000, Generalized Belief Propagation. NeurIPS 2000.

Usage

Installation

Run pip install git+https://github.com/opeltre/topos

Or clone the repository locally before installing:

$ git clone https://github.com/opeltre/topos
$ cd topos
$ pip install .

You should be able to run tests with:

$ cd test && python -m unittest

Interfacing with Pytorch

The main purpose of this library is to construct various (wrapped) Pytorch tensors, related to the topology of an underlying data structure (graph, hypergraph, simplicial complex...). These tensors either represent fields of values over a domain (dense vectors, wrapped in the Field class) or linear operators between field types (sparse matrices, wrapped in the Linear class).

Working with these two classes has little more overhead than storing a reference to the underlying domains, while providing a few convenient methods (e.g. f @ g for calling torch.sparse.matmul and f @ x for calling torch.sparse.matvec). Either way, the underlying Pytorch tensor may be accessed via the .data attribute, for seamless interfacing with other Pytorch or Pytorch-geometric code.

import torch
import torch.nn as nn
import topos

class GCN(nn.Module):
    """ Glass Convolutional Network """

    def __init__(self, G, degree=3):
        super().__init__()
        self.graph   = topos.Complex(G)
        #--- One learnable coefficient per power of the laplacian ---
        self.weights = nn.Parameter(torch.randn(degree + 1))
        #--- Powers of the graph laplacian ---
        L = self.graph.laplacian(0)
        self.L_powers = [self.graph.eye(0)]
        for i in range(degree):
            self.L_powers.append(L @ self.L_powers[-1])

    def forward(self, x):
        X = self.graph[0].field(x)
        #--- Convolutional filters are polynomials of the laplacian ---
        P_L = sum(wk * Lk for wk, Lk in zip(self.weights, self.L_powers))
        return (P_L @ X).data

>>> V = torch.arange(12)
>>> E = torch.randint(12, (10, 2))
>>> gcn = GCN([V, E])
>>> x = torch.randn([100, 12])
>>> y  = gcn(x)

The mechanism for wrapping and typing Pytorch tensors has been moved to another repository, opeltre/fp. Its purpose is to provide generic functorial constructs to emulate type polymorphism in Python. It exposes an unsafe Tensor class, a typed Tens : Shape -> Type functor, a Linear bifunctor... all holding a torch.Tensor instance wrapped inside an fp.Tensor instance (algebraic methods are lifted from one type to the other by a Wrap monad).

Contributing

Contributions are welcome. I am already grateful to @aklipfel for helping with the sparse module and dramatically speeding up some of the index computations. If you enjoy this library and would like to help in any way, please get in touch.

Overview

Fields and Domains

Index ranges are represented by instances of the Domain base class and its subclasses. The Field functor then maps any domain D to a tensor type Field(D) wrapping 1D torch vectors of length D.size:

import torch
from topos import Domain, Field

n = 10
#--- Domain with keys [0, ..., n-1] ---
D = Domain(list(range(n)))
#--- Tensor type of shape [n] ---
FD = Field(D)
#--- Fields wrap torch tensors ---
x = FD(torch.zeros([n]))
assert isinstance(x.data, torch.Tensor)

A field instance x = D.field(data) can be created from any numerical data of the appropriate size. Note that D also provides common field constructors such as D.zeros(), D.ones(), D.randn(), etc.
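For instance (a minimal sketch using the constructors just named, continuing the previous snippet):

#--- Common field constructors on D ---
x = D.zeros()
y = D.randn()
z = D.field(torch.linspace(0, 1, n))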

Sheaves

The Sheaf class is a generic base class representing finite domains defined as a dependent sum type:

$$ F(K) = \bigsqcup_{a \in K} F_a $$

i.e. points of $F(K)$ are key-value pairs $(a, x_a)$ for $a$ in a set of keys $K$ and $x_a$ in the finite set $F_a$. A trivial sheaf, i.e. one with a single point above each key, is constructed by default. No categorical structure is assumed on $K$, so a non-trivial Sheaf instance may be created by supplying the shapes above each key as a list, as a callable, or by aggregating keys and shapes in a dictionary.

from topos import Sheaf

#--- Sheaf(keys, functor=None) : equivalent forms ---
F = Sheaf({'ij': [3, 3], 'i': [3], 'j': [3]})
F = Sheaf(['ij', 'i', 'j'], lambda k: [3] * len(k))
F = Sheaf(['ij', 'i', 'j'], [[3, 3], [3], [3]])

The cardinality F.size of a sheaf instance F can be computed as:

F.size = sum(F(a).size for a in F.keys)
       = sum(Fa.size for Fa in F.fibers)

The corresponding index map for Field(F) instances can be conveniently visualized by calling F.range():

>>> F.range()
Field Ω :  ij :        [[0, 1, 2],
                        [3, 4, 5],
                        [6, 7, 8]]
           i :        [ 9, 10, 11]
           j :        [12, 13, 14]

(Hyper)graphs

The Graph class (which should arguably be called Hypergraph) is a base class for sheaves G whose keys can be represented by (positive) torch.LongTensor instances of shape (len(G[k].keys), k + 1). In particular, G is a graded sheaf instance with fibers G[0], ..., G[G.dim], each containing len(G[k].keys) regions of cardinality k + 1, also called hyperedges of dimension k. Instantiating large graph instances is much faster than instantiating large sheaf instances, as graphs leverage Pytorch's sparse matrix library.

A 1-graph G can for instance be created by:

from topos import Graph
G0 = [0, 1, 2, 3]
G1 = [[0, 1], [0, 2], [0, 3]]
G = Graph([G0, G1])

The resulting Sheaf instance has two keys 0 and 1, pointing to graded components of size G.sizes[k] and beginning at G.begin[k]:

>>> G.range()
Field G :  0 :  [0] :        0
                [1] :        1
                [2] :        2
                [3] :        3
               
           1 :  [0, 1] :        4
                [0, 2] :        5
                [0, 3] :        6

A general n-graph G defines n+1 sparse adjacency tensors G.adj[k] of dimension k + 1 for k = 0, ..., n. Sparse tensors G.idx[k] of identical shapes allow for fast index access (sometimes by custom routines defined in core/sparse.py, thanks to Astrid Klipfel). This is particularly useful during the computation of topological operators when G is equipped with additional structure.

A functor-valued graph GF = Graph(G, F) can be defined given a functor F that maps (1) every region a to a shape F(a) and (2) every strict inclusion relation a > b to an index map F.fmap([a, b]). See functors for more details on functor creation. A canonical example is given by free functors, i.e. mapping every region to a cartesian product of atomic microstates.

from topos import FreeFunctor
#--- Each vertex has 3 degrees of freedom ---
F = FreeFunctor(3)
#--- Shape [3, 3] on edges ---
GF = Graph(G, F)

Simplicial complexes

The Complex class inherits from Graph and is used to represent graphs K such that for every region a in K, any subregion b of a also belongs to K. The simplest way to define a simplicial complex is via the simplicial class method:

from topos import Complex
#--- Simplicial closure ---
K = Complex.simplicial([[0, 1, 2], [1, 2, 3]])
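For intuition, here is a tiny pure-Python sketch of what a simplicial closure contains (an illustration of the notion, not the library's implementation):

from itertools import combinations

def closure(faces):
    """All nonempty subsets of the given faces, i.e. the simplicial closure."""
    K = set()
    for f in faces:
        for k in range(1, len(f) + 1):
            K |= set(combinations(sorted(f), k))
    return sorted(K, key=lambda s: (len(s), s))

closure([[0, 1, 2], [1, 2, 3]])   # vertices, edges, and the two triangles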

The functor-valued simplicial complex KF associated to a functor F can then be created by:

KF = Complex(K, F)

Because the Complex constructor does not compute simplicial closures by default, be sure to provide a simplicially closed set of keys K when using the latter form.

Simplicial complexes contain all the structure required for differential calculus: every complex defines a degree-1 linear map $d$ mapping k-fields to (k+1)-fields for all k and satisfying the fundamental equation $d^2 = d \circ d = 0$.

d0 = K.diff(0)
d1 = K.diff(1)
# d1 @ d0 : Field K[0] -> Field K[2]
K.zeros(2) == d1 @ d0 @ K.randn(0) 

When K has scalar or constant coefficients, the first differential K.diff(0) from K[0] to K[1] simply maps a function on vertices to a function on directed edges by computing differences between end points. Edge features may also be mapped to vertex features by a given functor; its pullback then maps vertex observables to edge observables, allowing a differential to be defined. The fundamental example consists of the FreeFunctor mapping every region $\mathrm{a} \subseteq \Omega$ to a local cartesian product $E_{\mathrm{a}} = \prod_{i \in \mathrm{a}} E_i$.
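In the scalar case this is just a finite-difference operator. A minimal plain-torch sketch, with a hypothetical edge list (not the topos API):

import torch

#--- Directed edges of a small graph and a function x on its vertices ---
E = torch.tensor([[0, 1], [1, 2], [2, 0]])
x = torch.randn(3)
#--- (dx)[i -> j] = x[j] - x[i], one value per directed edge ---
dx = x[E[:, 1]] - x[E[:, 0]]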

The transposed operator K.codiff(k+1) = K.diff(k).t() decreases degree by 1 and involves the linear adjoints of functorial maps when K is equipped with functorial coefficients (this means Field(K) is identified with its linear dual by the canonical metric of $\mathbb{R}^{\tt K.size}$). In the case of a 1-complex K, the codifferential K.codiff(1) is the discrete divergence operator, aggregating directed edge values onto source and target vertices with opposite signs.
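Continuing the scalar sketch above, the transpose aggregates the edge values dx back onto vertices with opposite signs on sources and targets (a discrete divergence):

#--- div(dx)[i]: incoming edge values minus outgoing ones ---
div = torch.zeros(3)
div.index_add_(0, E[:, 1], dx)    # add dx onto target vertices
div.index_add_(0, E[:, 0], -dx)   # subtract dx from source vertices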

Note that differential operators are not computed upon complex creation: they are computed and cached for reuse when first called. Calling K.diff(x) on a degree-k field instance will look up the operator K.diff(k) before applying it to the input field.

#--- K.diff(0) is added to K[0]._cache ---
>>> K.diff(0)
Linear K[0] -> K[1] : d
#--- Degree 0 random field ---
>>> x = K.randn(0)
#--- Lookup for K[x.degree]._cache["d"] ---
>>> K.diff(x)
Field K[1] : ...

Nerves


Nerve instances leverage both the simplicial structure of Complex instances and the partial order structure of the underlying Hypergraph instance.

They are best constructed by calling Graph.classify().
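For example (a minimal sketch, assuming the Graph constructor shown above and that classify returns a Nerve):

from topos import Graph

G = Graph([[0, 1, 2, 3], [[0, 1], [0, 2], [0, 3]]])
N = G.classify()   # nerve of the hypergraph G, ordered by inclusion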


topos's Issues

Arithmetic operations on functionals + graded objects

Add support for +-*/ on functionals 🎰

  • cast numbers to constant valued fields - as multiples of D.eye() only work for [D, D]
  • arithmetic on the values
  • compose names

Inherit support for +-*/ from Graded, dispatching calls to each component
-> GradedLinear, GradedFunctional

(This would bring us close to supporting a GradedField class for inhomogeneous fields, acted on by GradedLinear.)

Domain / Field name attribute

So that we can represent maps as Linear f : A -> B, useful to keep track of the src/tgt of restrictions, degrees, ... and for educational purposes.

Continuous domains

Support continuous domains, e.g. $E_i = \mathbb{R}^{n_i}$, by using either polynomial observables on $E_i$ (degree-2 energies yielding Gaussian kernels) or smooth parameterizations from a vector space $\Theta_i$ to $C(E_i)$ (an nn.Module instance).

1. Gaussian mixtures

To handle $k$-modal Gaussian mixtures on $V$, the configuration space can be expanded to $kV = V^{\sqcup k} = V \sqcup \dots \sqcup V$.

Defining a quadratic observable $H$ on $kV$ as a collection $(H_i)_{1 \leq i \leq k}$ of quadratic polynomials on $V$,
the associated Gibbs distribution on $kV$ is projected to a Gaussian mixture on $V$:

$$ p(x|i) = \frac {\mathrm{e}^{-H_i(x)}}{Z_i} $$

$$ p(x, i) = \pi_i \: p(x | i) $$

$$ p(x) = \sum_i \pi_i \: \frac {\mathrm{e}^{-H_i(x)}}{Z_i} $$

where each $H_i(x)$ may be expressed in the canonical form $H_i(x_i) + \frac 1 2 \langle x - x_i | Q_i | x - x_i \rangle$.
In particular, the mixture log-likelihood is contained in the mean-value term $H_i(x_i)$.

$$ -\ln p(x, i) = - \ln \pi_i + \frac 1 2 \langle x - x_i | Q_i | x - x_i \rangle + \ln Z_i $$

Gaussian mixtures have the huge advantage that sampling from them is explicit and easy.
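As a minimal illustration of this explicitness (plain torch, not part of the library), one may draw a mode $i \sim \pi$ and then a Gaussian sample centred at $x_i$ with precision $Q_i$:

import torch

def sample_mixture(pi, means, precisions, n=1000):
    """Draw n samples from p(x) = sum_i pi_i N(x | x_i, Q_i^{-1})."""
    k, d = means.shape
    idx  = torch.multinomial(pi, n, replacement=True)            # modes i ~ pi
    L    = torch.linalg.cholesky(torch.linalg.inv(precisions))   # (k, d, d) covariance factors
    eps  = torch.randn(n, d)
    return means[idx] + torch.einsum('nij,nj->ni', L[idx], eps)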

2. Neural networks

There is no real problem in defining observables on continuous domains as nn.Module instances.
However, the Gibbs state correspondence fails to be explicit when the Hamiltonian $H_i : E_i \to \mathbb{R}$ is a generic nn.Module instance.

One may rely on different strategies to sample from $\frac {1}{Z_i} \mathrm{e}^{-H_i}$ without resorting to naive Monte-Carlo methods. Diffusion models, inspired by annealed importance sampling, have proved very successful recently.

Strategy

Keep using core classes

The state sheaf for $\mathbb{R}^n$ could be taken to be Field(Domain(n)) without implementation changes.
There remain subtleties in handling products of discrete and continuous domains => view discrete microstates as
Fields with torch.long values.

Arrow interface

Harmonise the concept of a real observable $f : \mathbb{R}^n \to \mathbb{R}$ with the concept of a real-valued field $t : \mathbb{Z} / n \mathbb{Z} \to \mathbb{R}$ by inheriting from a common topos.Observable mixin class. This class should do little more than assume a (batchable) __call__ method is implemented and that src and tgt attributes are defined, while ensuring compatibility with fp.Arrow(A, B) and functorial operations (Field.cofmap, etc.).

Microstates as fields

Describe a (batched) state $x_{\tt a} \in E_{\tt a}$ as a Field instance with dtype=torch.long, and Field.__call__ as index selection.

A section $(x_{\tt a})$ for ${\tt a} \in K$ can then be viewed as a length-$|K|$ batch of microstates in the disjoint union $\bigsqcup_{{\tt a} \in K} E_{\tt a}$. Evaluating the total energy then amounts to evaluating local potentials individually before summing batch values.

Global sections (in coordinate format) are projected onto a batch of indices by calling fp.Torus.index on each of the appropriate slices. In order to work with batches, we only need to divide the loop according to hyperedge dimensions.

Conditioning

Implement a map Cx : K -> K|x which evaluates a field along specified values of x:

  • this is the pullback of a map E|x -> E
  • the pushforward on measures maps the conditional probability of E|x to a measure on E restricted to the fiber of x.

Running BP on K|x seems the best way to estimate conditional probabilities p(y|x).

Sort graph hyperedges

The effect of the sort=True kwarg has been removed by some commit.

A lexicographic sorting functionality on hyperedges should still be optional, either by kwarg or by a classmethod (probably cleaner).

This may right now lead to inconsistent behaviour (e.g. IsingNetwork.lift_node_beliefs) as calling Nerve.classify will sort hyperedges lexicographically. Maybe working with the base Graph.Index methods still makes it possible to keep indices in order...

Sort this out!

Field multiplication on CPU

Computing the product of two fields, e.g.

pV = p * V
# or
V -= K.sums(p * V)

is right now very slow on CPU. Much slower than matrix-vector product, Gibbs state computation, etc. This is what's holding tangent diffusion far behind BP in performance.

This is quite strange, as in the second example p * V should be understood as an rvalue. Even if we allocate a tensor pV for this intermediate result in the loop, as in the first example, it stays slow, although there shouldn't be any new memory allocation.

Harmonise domain and system interface

Complex now inherits from Domain so that K.field, K.zeros, K.randn... are transferred, but the two interfaces are not fully consistent.

For starters:

  • Replace calls to K[key] by K.get(key) to retrieve cells, as we only want K[degree] to extract graded components.
  • See what K.__iter__() should do for system/domain instances: yield degrees K[d] / self=K[0] ? remove it?

Cartesian structure on domains and sheaves

Transfer the cartesian structure already present on Set to domains and sheaves:

A, B  = {i: A[i]}, {j: B[j]}
A * B = {i.j: A[i] * B[j]}          # cartesian product of shapes

That way the tensor product will be simply defined on fields as:

(u | v)[i.j] = u[i] * v[j]     # if scalar values
               u[i] | v[j]     # if tensor values (implement a torch.otimes)

Boundary Conditions

For Dirichlet boundary conditions,
restrict the image of K.delta[1] to a subdomain of K[0].

For Neumann conditions, restrict the image of K.nabla(p)[0] to a subdomain of K[1].

Functional / Linear fields

Instead of storing the energy function as a tensor, make it possible to compute it via a function, e.g. a linear or affine functional.

Necessary for true convolutional layers (translation invariant)

Necessary for quotients (observable on a class of vertices lies in a low dimensional subspace)

Example usecase: images and MNIST dataset

  • each image may be viewed as a global system with observables restricted to sums of local magnetic fields (bias vector)
  • each layer may be viewed as a pair of systems with observables restricted to pairwise magnetic interactions (weight matrix)

This approach would make it much closer to torch or keras interfaces when stacking layers.

Functorial restrictions

Sheaf instances should implement restrictions on subsets of keys.

Fiber(key, shape) instances should also implement these restriction maps.

Field access: local getters and setters

Implement local getters and setters so that Field instances really behave like dictionaries.

G = Graph([[0,1,2], [[0, 1], [1, 2]]], FreeFunctor(2))
x = G.zeros(1)
# setter
x.set([0, 1], torch.ones([2, 2]))
# getter
x.get([1, 2]) == torch.zeros([2, 2])

N.B. Probably best to avoid __getitem__ and __setitem__ and not override fp.Tensor behaviour: x remains a 1D-vector internally.

Domains, operators and degrees

A couple improvements to be done:

  1. Define the empty domain on which every field is zero: this would be useful e.g. for operators like K.delta[0] or K.d[n], and avoid key errors in K[d]

  2. There should be a class for operators on the full complex, i.e. K.zeta @ field should call K.zeta[field.degree] @ field, etc.

Most operators are full: K[d].gibbs, K.zeta[d], K.delta[d], etc., but only the ±1-graded ones leave the complex at some point (to zero).

Harmonise Fiber and Sheaf interfaces

Fibers and sheaves both represent index ranges - a fiber is the same as a sheaf over a point.

Harmonising both interfaces will allow sheaves whose fibers are themselves sheaves.

This is particularly useful to define restrictions, quotients, fibrations...

(continues #12)

Interaction decomposition

Implement projection to homology classes via local iFFT

$$ H(x) = \sum_k h(k) \: x(k), \qquad x(k) = \prod_i e_i^{\, k_i x_i} $$

This extends the representation of local observables as polynomials in the binary case, $\pm 1$ being replaced by $N$-th roots of unity.

Sheaf.scalars

Fix access to Sheaf.scalars (ideally as a property removing cumbersome brackets).

Most importantly the sheaf instance, as well as all its graded components, should be created only once to enforce equality at all levels.

# should be True
N.scalars()[0] == N[0].scalars()

Cartesian structure on Fields

Implement direct sum and tensor product:

(u & v).domain = Sum(u.domain, v.domain)
(u | v).domain = Product(u.domain, v.domain)

Made possible by #14, on fields but also linear functionals

Functor vs. Scalar values

It would be useful to have a scalar-valued field class with maps to/from the tensor valued fields, i.e.

C[d](K, R)  <-------> C[d](K, R^E)       # where E : K -> Set is the shape functor. 

Right now, one of the most costly operations upon system creation seems to be the following list comprehension: (core/operators.py)

def local_masses(domain):
    indices = [[i, j] for a in domain\
              for i in range(a.begin, a.end)\
              for j in range(a.begin, a.end)]
    ...

Basically we create block diagonal matrices of 1s (sum then extend) when we could reduce this to a scalar valued field (useful for visualisation too). Later composing with an extension matrix would essentially yield the same blocks of 1s but __matmul__ is probably more efficient than this nested list comprehension... 🤔

N.B: local_masses = S* . S if S denotes the sum from R^E to R.

Useful for operators as well: implement K.zeta, K.delta, ... with scalar values then extend them (allows for concise +-1 representation).

Gibbs Sampling

Add support for Gibbs sampling:

Given an initial condition x0, sweep on vertices i <- I or a <- K to update local states xi or xa. This is done by local samplers of conditional likelihoods p(xi | x~i) or p(xa | x~a) #23

For large N, the temporal distribution of x converges exponentially quickly to the Gibbs distribution of x.

The initial state could be sampled across local minima of energy, whenever values conflict at intersections.
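A minimal sketch of one such sweep on a pairwise Ising model (plain torch, hypothetical and not the library's API):

import torch

def gibbs_sweep(x, J, beta=1.0):
    """One Gibbs sweep over spins x in {-1, +1}^n for the energy H(x) = -x^T J x / 2."""
    for i in torch.randperm(x.shape[0]):
        h = J[i] @ x - J[i, i] * x[i]          # local field from the other spins
        p_up = torch.sigmoid(2 * beta * h)     # p(x_i = +1 | x_~i)
        x[i] = 1.0 if torch.rand(()) < p_up else -1.0
    return x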
