Coder Social home page Coder Social logo

neo4j-contrib / neo4j-graph-algorithms Goto Github PK

View Code? Open in Web Editor NEW
763.0 53.0 240.0 42.34 MB

Efficient Graph Algorithms for Neo4j

Home Page: https://github.com/neo4j/graph-data-science/

License: GNU General Public License v3.0

Java 100.00%
graph-algorithms neo4j cypher graph-database graph-analytics

neo4j-graph-algorithms's Introduction

Efficient Graph Algorithms for Neo4j

This libray has been deprecated by the Graph Data Science (GDS) library, now available on our download center or via our github repo at https://github.com/neo4j/graph-data-science/.

This is an update to the graph algorithms library, featuring all of your favorite graph algorithms - and some new ones - plus a new, unified and simplified surface, improvements to the graph loaders, better error messaging, and additional features and workflows to support production scale deployments.

Documentation for the Graph Data Science Library is available here: https://neo4j.com/docs/graph-data-science/current/

please enter all github issues on the new repo

Graph Algorithms Book

Amy Hodler and Mark Needham recently finished writing the O’Reilly Graph Algorithms Book. For a limited time only, you can download your free copy from neo4j.com/graph-algorithms-book

OReilly Graph Algorithms v2 ol1 (1)

neo4j-graph-algorithms's People

Contributors

aliciaframe avatar darthmax avatar davidoliversp2 avatar fbiville avatar frant-hartm avatar jexp avatar jjaderberg avatar knutwalker avatar mats-sx avatar mknblch avatar mneedham avatar moxious avatar pstutz avatar ryguyrg avatar s1ck avatar tomasonjo avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

neo4j-graph-algorithms's Issues

Fix weight mapping in heavy graph

org.neo4j.graphalgo.core.heavyweight.HeavyGraphTest is currently ignored due to failing tests (weighted iterator / weighted forEach). Fix weight mapping and unignore test.

Custom default weight

The WeightMapping assumes a default weight of 0.0. The number should be configurable per algorithm. Since the defaults aren't stored, this would allow algorithms to reduce memory usage by not storing default weights if they require a different default from 0.0

Turn WeightMapping into interface with two cases

Currently we have conditional checks on any get the see if we have a mapping at all or if we just have to return the default value.
We should turn WeightMapping into an interface with at most two implementations.
One is the current implementation and the second one always returns a default value and never stores.

Floyd Warshall algorithm

I suggest to add Floyd Warshall algorithm to compute the shortest path from many sources to many destinations and to call it directly using APOC Procedures.

Thank you

Allow concurrent access in GraphView

GraphView loads the ReadOperations in the constructor, which bind the graph the the same thread.
We should make it such that it can be used from multiple threads, esp. if we start to implement parallel graph algorithms.

Document graph encoding

We should explain how exactly the LightGraph and HeavyGraph encode the graph and how their internals work.

Load input for Graphs via Cypher

Instead of allocating a List<Node> from Cypher while calling the procedure, we can accept a Cypher statement and run it ourselves, using the far more efficient PLongIterators.

int[] specialized form of LightGraph

Follows #7

For smaller graphs, the LightGraph could use int[] instead of IntArray for the adjacency list and int[] instead of long[] for the offsets, which does reduce memory consumption and one indirection.

GraphView ID handling relies on int values

The GraphView currently relies on node Ids smaller then int.max. We should add some kind of lazy IdMapping if it is considerable to support graphs (or subsets of a graph) with Ids which exceed the int size.

Allow Graphs to skip loading of certain relationships

Not every algorithm needs every dimension of the graph. For example, PageRank only requires incoming relationships on for outgoing ones, it just needs the degree.
We should find an API that allows us to express those requirements, so that we don't have to load and store all outgoing relationships, just their degree.

Change Consumer into Functions to allow premature termination

Since we might decide to get rid of the Iterator-methods (in #29) we should add the possibility to terminate the iteration within a forEach(..) method before all values have been emitted. This could be implemented by changing the Consumer into Functions which return a Boolean that either stops or continues the current iteration.

turn list of graph algorithms into issues with progress tracking

create card for each algorithm track progress in

also note findings, links, ideas from research / reading in each card
and also limitations of the current implementation / alternatives

  • started work / paper reading
  • implement baseline
  • implement procedure
  • test baseline
  • performance test baseline
  • edge case tests
  • parallelization
  • performance test parallelization
  • documentation
  • different APIs / memory structures
  • Implementation of the selected algorithms and exposed them as a Java API and Java Stored Procedure
    • PageRank
    • Label Propagation
    • Louvain
    • Betweenness Centrality
    • Closeness Centrality
    • Degree Centrality
    • Single Shortest Path
    • Strongly, Weakly connected components
    • Parallel BFS / DFS
  • Evaluation
    • Performance Tests (different dataset sizes, regressions, level of concurrency)
    • Performance comparison with other implementations (Spark/Flink/Graphlytic) on same dataset and same level of concurrency
    • Review existing implementations for its functional correctness and performance:
      • A*
      • Dijkstra
      • Shortest Path

issues created

  • PageRank
  • Label Propagation
  • Betweenness Centrality
  • Closeness Centrality
  • Single Shortest Path
  • Strongly connected components
  • Weakly connected components
  • Minimum Spanning Tree
  • Connected Components
  • Parallel BFS / DFS

  • Louvain
  • k-means (for top-n)
  • edge betweenness
  • Bellman-Ford algorithm
  • Floyd-Warshall algorithm
  • Degree Centrality
  • Eigenvector Centrality
  • approximation of the Chromatic Number of a graph
  • approximation of Hamiltonian cycle
  • min cut
  • community detection: Walk trap, infomap
  •  graph metrics

Simplify Matrix representation

We have a large amount of objects in indirections in AdjacencyMatrix. We should reduce the overhead by finding a better adjacency encoding.

Faster Array.fill for large arrays

Some arrays are allocated and then pre-filled with a default value. Array.fill does a naïve linear iteration over all array indices and sets each element, which can become quite inefficient for large arrays. As this happens mostly for arrays that store something pre-node, we have a linear dependency on the number of nodes that we can strive to eliminate.

  • fill in batches by arraycopying from pre-filled arrays
  • check if we can maybe rewrite some algorithms to make use of the system default value, instead of having to use a custom default value

Move Fileloaders back to core module

Loading and writing graph into a file serialization might be useful for others besides our tests

  • Move every loader to the core module
  • Replace reflective access with package-private access
  • document accessors and constructors, why they are package private

Consider IdMapping starting at 1 (0 exclusive)

In our current approach the Id-mapping returns intergers starting at 0. Yet there is often the case where nodeId-arrays have to be initialized with some kind of start value. An 1-based mapping could save us some initialization loops.

Smaller and fixed batch sizes for parallel imports

The current batchSize is nodeCount / nrOfThreads. It would be better to use a fixed batch size like 10k oder 100k.

  • Better work stealing possibilities, if one batch contains mostly deleted nodes
  • More predictable resource usage, temporary arrays could be reused for multiple batches

JMH recording

run JMH benchmarks with csv output or something similar machine-parsable. Dump results over time to someplace.

Autogrowing array in RelationContainer

Currently we have to initialize the RelationContainer.Builder with the degree. Add a logic which grows the relationship array on demand. Also check if growing is an option for the parent (data) array too.

First clustering graph algorithm

Something that's useful / useable, like label-prop or union-find, I leave the choice up to you.

Reasons:

  • we have a practical use-case that would benefit from it
  • we want to exercise the graph-API also from different kinds of algorithms

One relevant feature would be to consider the "weight" property in a relationship for the "strength" of the connection to a cluster.

As a simple solution to start with, could be to filter relationships to consider "weight" as a filter, e.g. only consider relationship-input that exceeds a certain weight at all.

General-purpose, power procedure

This general purpose procedure allows loading the graph once and then allows multiple, differently configured algorithms on top of it, e.g. also page-rank with different configs or page-rank and clustering

procedure pipelines

We can also consider one algorithm feeding into the next. E.g. the first-page-rank is not (just) persisted into the graph but immediately (with the in-memory computed values) taken into account for the next algorithm (centrality or clustering)

Thread safe WeightMap

The current WeightMap is not threadsafe for write access. Evaluate int->double / int->int backed mapping logic. To implement this we first need some kind of mapping between the long-relationship ids and their inner representation

Revise primitive collections

We should try to use Neo4j's primitive collections where possible and document and explain, when we use a different collection.
Where we settle on a third-party collection, we can think about changing to algorithm to be able to use a Neo4j collections instead, or PR a change to the Neo4j collections.

Considerations on weights

So far we assumed edge weights to be double values in [0, 1) and we have a basic mapper logic for turning arbritary property objects into doubles. We also have to duplicate the relationship-iterator ifaces for the weighted versions. This make the handling of weights very inflexible. With the api2 approach we could consider to switch over to a weight-datasource instead of haven them bound to the iterator.

Since we consider one property per relation and one relation between a pair of nodes we could use the mapped-nodeIds as key-pair.

I'd suggest something like this

wheightOf(sourceNodeId:int, targetNodeId:int):double

This would reduce the amount of different ifaces and implementations. We could also have different impl. for the wheight-source with their own characteristics.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.