
Comments (6)

timlrx commented on June 8, 2024

Nope, it was my bad experience using Spark for network algorithms that motivated this benchmark to find a better alternative. My hunch is that it would be at NetworkX's level of performance, and even worse if distributed.

If you are mainly working with the graph structure itself, I find it's often beneficial to put the graph in memory using one of these libraries and have an external (possibly distributed) storage for the meta information.
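As a rough illustration of that split, here is a minimal sketch that keeps only the adjacency structure in memory and looks up node metadata in a separate store. Everything in it is an assumption made for the example: the toy graph, the `node_meta` table, and the helper names are hypothetical, and an in-memory SQLite database stands in for whatever external (possibly distributed) storage you would actually use.

```python
import sqlite3
from collections import deque

# Graph structure only: integer node ids, kept in memory (toy data).
adjacency = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}


def make_metadata_store():
    """Create a stand-in metadata store; a real one would live outside the process."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node_meta (id INTEGER PRIMARY KEY, label TEXT)")
    conn.executemany(
        "INSERT INTO node_meta VALUES (?, ?)",
        [(0, "a"), (1, "b"), (2, "c"), (3, "d")],
    )
    return conn


def bfs_path(adj, source, target):
    """Unweighted shortest path by breadth-first search; returns [] if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return []


def labels_for(conn, node_ids):
    """Fetch metadata only for the few nodes the graph query returned."""
    if not node_ids:
        return []
    placeholders = ",".join("?" * len(node_ids))
    rows = conn.execute(
        f"SELECT id, label FROM node_meta WHERE id IN ({placeholders})",
        list(node_ids),
    ).fetchall()
    by_id = dict(rows)
    return [by_id[n] for n in node_ids]


store = make_metadata_store()
path = bfs_path(adjacency, 2, 3)
path_labels = labels_for(store, path)
```

The graph query touches only integers, and the metadata store is hit once, for the handful of nodes on the resulting path.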

from graph-benchmarks.

timlrx commented on June 8, 2024

It's definitely worth trying graph-tool. It uses the C++ Boost library internally and should be on par with NetworKit in terms of memory usage. It's also pretty easy to install the package and try it out!

Distributed graph computation is a hard task, and I try to avoid Spark as much as possible.


timlrx commented on June 8, 2024

I typically use graph-tool (Python) or LightGraphs (Julia), and both are fine memory-wise. LightGraphs even has a squash method that returns the graph using the smallest integer type required to represent it. NetworkX is less efficient, especially if you are using a multigraph, as it uses a dict structure instead of an adjacency list (if I recall correctly).
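To make the memory point concrete, here is a small pure-Python sketch of the two ideas: picking the narrowest integer type that can hold every vertex id, and storing neighbors in flat typed arrays rather than nested dicts. It is not how LightGraphs, graph-tool, or NetworkX are implemented; the function names are hypothetical and the layout (a compressed offsets/targets pair) is just one common compact representation.

```python
from array import array


def smallest_uint_typecode(num_vertices):
    """Return the narrowest unsigned `array` typecode that can hold every vertex id."""
    max_id = max(num_vertices - 1, 0)
    for code in ("B", "H", "I", "L", "Q"):
        if max_id < 2 ** (8 * array(code).itemsize):
            return code
    raise OverflowError("too many vertices for a 64-bit id")


def to_compact_adjacency(num_vertices, edges):
    """Build (offsets, targets) arrays for an undirected graph given as (u, v) pairs."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # offsets[u]..offsets[u + 1] is the slice of `targets` holding u's neighbors.
    offsets = [0] * (num_vertices + 1)
    for i, d in enumerate(degree):
        offsets[i + 1] = offsets[i] + d
    targets = array(smallest_uint_typecode(num_vertices), [0]) * offsets[-1]
    cursor = offsets[:-1]  # next free slot per vertex (a copy, not a view)
    for u, v in edges:
        targets[cursor[u]] = v
        cursor[u] += 1
        targets[cursor[v]] = u
        cursor[v] += 1
    return offsets, targets


def neighbors(offsets, targets, u):
    """Return the neighbors of u as a plain list."""
    return list(targets[offsets[u]:offsets[u + 1]])


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
offsets, targets = to_compact_adjacency(4, edges)
```

With four vertices each neighbor entry costs one byte here, whereas a dict-based structure pays for a hash-table slot and boxed Python objects per edge.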


yunshiuan commented on June 8, 2024

Thank you! This is very helpful :) I'll check them out. Btw, have you considered including GraphX in the scope? Even on a single machine, the Spark-based package might do a good job of parallelizing the computation.


yunshiuan commented on June 8, 2024

Thank you for sharing your experiences!


carlosg-m commented on June 8, 2024

Amazing work.

Between NetworKit and graph-tool, which one do you consider to be more efficient in terms of memory usage?

I have an undirected weighted graph with 50M nodes and 100M edges. I tried several Python libraries, and the only one that supported this workload was NetworKit. The graph takes about 6 GB of RAM (7 GB during creation). My main use case is shortest path queries.

I haven't tried graph-tool, but if its memory footprint is worse it won't solve my problem, even if the shortest path queries are faster, as your benchmark showed.

I'm using Databricks and a Spark cluster. I also tried GraphFrames (distributed) on a 5-node cluster, but for shortest paths and most types of queries this lib is trash. All the other libs I've tested run on the cluster's driver machine; since they support multi-threading, they use all cores (8 cores, 28 GB RAM).

Considering a graph this size, in your opinion is graph-tool worth a try?

