
sparkgcn's Introduction

Graph Convolutional Networks in Apache Spark

Implementation of GCN (https://arxiv.org/abs/1609.02907) on top of Spark using BigDL 2.0. It is inspired by the initial Keras-based implementation.

It is a pure Scala project that relies on the BigDL DLlib and Breeze libraries. It can be used in Spark-based graph processing pipelines, such as those built on GraphX, that run on big data clusters.

Cora Example

To show how it works, the project implements the Cora example, a semi-supervised classification problem in which only 140 of the 2708 samples are used in the training process. From these labeled nodes, the optimization process learns a set of weights that act as the filter parameters of convolutional layers; they are shared across the graph and encode both node features and information from connections. For more details, see "Semi-Supervised Classification with Graph Convolutional Networks" (Thomas N. Kipf, Max Welling).
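As a rough illustration of how shared weights combine node features with connection information, the sketch below applies one propagation step in the style of the Kipf and Welling paper, ReLU(D^{-1/2} (A + I) D^{-1/2} H W), to a three-node toy graph. It uses plain Scala arrays rather than BigDL or Breeze, and every name in it (GcnLayerSketch, normalize, gcnLayer) is hypothetical and not taken from this project.

```scala
// Minimal sketch of one GCN propagation step, assuming dense matrices and a toy graph.
// Not the project's code: names and data here are illustrative only.
object GcnLayerSketch {
  type Matrix = Array[Array[Double]]

  // Dense matrix product; fails fast when the inner dimensions disagree.
  def matmul(a: Matrix, b: Matrix): Matrix = {
    require(a.head.length == b.length, "inner dimensions must match")
    a.map(row => b.head.indices.map(j => row.indices.map(k => row(k) * b(k)(j)).sum).toArray)
  }

  // Renormalized adjacency: D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I.
  def normalize(adj: Matrix): Matrix = {
    val n = adj.length
    val withSelfLoops = Array.tabulate(n, n)((i, j) => adj(i)(j) + (if (i == j) 1.0 else 0.0))
    val dInvSqrt = withSelfLoops.map(row => 1.0 / math.sqrt(row.sum))
    Array.tabulate(n, n)((i, j) => dInvSqrt(i) * withSelfLoops(i)(j) * dInvSqrt(j))
  }

  def relu(m: Matrix): Matrix = m.map(_.map(v => math.max(0.0, v)))

  // One layer: the same weight matrix is applied to every node after neighbor aggregation.
  def gcnLayer(adj: Matrix, features: Matrix, weights: Matrix): Matrix =
    relu(matmul(matmul(normalize(adj), features), weights))

  def main(args: Array[String]): Unit = {
    // Toy path graph 0 - 1 - 2 with two features per node and one output unit.
    val adj: Matrix = Array(Array(0.0, 1.0, 0.0), Array(1.0, 0.0, 1.0), Array(0.0, 1.0, 0.0))
    val features: Matrix = Array(Array(1.0, 0.0), Array(0.0, 1.0), Array(1.0, 1.0))
    val weights: Matrix = Array(Array(0.5), Array(-0.25))
    gcnLayer(adj, features, weights).foreach(row => println(row.mkString(" ")))
  }
}
```

Dropping the normalized adjacency from the product leaves a plain per-node dense layer, which is one way to read the "no propagation" baseline reported in the results.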

Results

Model                    Accuracy
No propagation model     0.53
GCN propagation model    0.78

Execution

You can use SBT version >= 1.0 (https://www.scala-sbt.org/download.html) to launch the training process, indicating the propagation function model as the first argument.

You must also provide the number of epochs for which you want to train the neural network, and the paths of the node and edge data from the Cora dataset. Both files are included in the resources folder.

Example: sbt "run 1 200 cora.content cora.cites"

This sbt command starts the optimization process and then runs inference on the whole graph. As the final result, the accuracy metric is calculated.
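For readers unfamiliar with the two input files, the sketch below shows one way the node and edge data could be read into feature vectors and an adjacency matrix. It assumes the standard Cora layout: each cora.content line holds a paper id, the binary word features, and a class label, and each cora.cites line holds a cited paper id followed by a citing paper id. The object and helper names are hypothetical, the sample lines are shortened toy data, and the project's own loader may work differently.

```scala
// Minimal sketch of parsing Cora-style node and edge lines, assuming whitespace-separated fields.
// Not the project's loader: names and sample lines here are illustrative only.
object CoraParsingSketch {
  final case class Node(id: String, features: Array[Double], label: String)

  // Node line: <paper_id> <feature_1> ... <feature_n> <class_label>
  def parseNode(line: String): Node = {
    val fields = line.trim.split("\\s+")
    require(fields.length >= 3, s"malformed node line: $line")
    Node(fields.head, fields.slice(1, fields.length - 1).map(_.toDouble), fields.last)
  }

  // Edge line: <cited_paper_id> <citing_paper_id>
  def parseEdge(line: String): (String, String) = {
    val fields = line.trim.split("\\s+")
    require(fields.length == 2, s"malformed edge line: $line")
    (fields(0), fields(1))
  }

  // Symmetric 0/1 adjacency matrix, indexed by each node's position in `nodes`.
  // Edges that mention an unknown paper id are skipped.
  def adjacency(nodes: Seq[Node], edges: Seq[(String, String)]): Array[Array[Double]] = {
    val index = nodes.map(_.id).zipWithIndex.toMap
    val adj = Array.fill(nodes.length, nodes.length)(0.0)
    for ((a, b) <- edges; i <- index.get(a); j <- index.get(b)) {
      adj(i)(j) = 1.0
      adj(j)(i) = 1.0
    }
    adj
  }

  def main(args: Array[String]): Unit = {
    // Toy lines with three features each; real Cora rows are much wider.
    val nodes = Seq("10\t0\t1\t0\tNeural_Networks", "20\t1\t0\t0\tTheory").map(parseNode)
    val edges = Seq("10\t20").map(parseEdge)
    adjacency(nodes, edges).foreach(row => println(row.mkString(" ")))
  }
}
```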

You can also train the neural network in your Spark cluster using spark-submit, indicating the main class and its parameters, ems.gcn.CoraExample [mode] [epochs] [cora.content] [cora.cites], and including the BigDL 2.0 dependency along with the artifact generated with sbt package.

IMPORTANT NOTE:

This project applies the convolution in a single Spark partition, so when submitting the application to a Spark cluster you must set the number of cores to 1. In each iteration the convolution is applied to the whole graph; if you use more cores, the data will be split across threads and the process will fail.

More work on processing much larger graphs with graph neural networks on Spark clusters will be added to this repository soon.
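The restriction can be illustrated without Spark. The sketch below is a toy demonstration, not the project's code: it assumes the convolution is a product between an n x n graph operator and an n-row feature matrix, and shows that the product is well defined for the whole graph but not for a slice holding only some of the nodes. The names (WholeGraphSketch, matmul) and the four-node graph are hypothetical.

```scala
import scala.util.Try

// Toy demonstration: a whole-graph operator cannot be applied to a slice of the nodes.
// Assumes the convolution is a dense (n x n) by (n x f) product; illustrative only.
object WholeGraphSketch {
  type Matrix = Array[Array[Double]]

  // Dense matrix product; throws IllegalArgumentException on a shape mismatch.
  def matmul(a: Matrix, b: Matrix): Matrix = {
    require(a.head.length == b.length, s"shape mismatch: ${a.head.length} columns vs ${b.length} rows")
    a.map(row => b.head.indices.map(j => row.indices.map(k => row(k) * b(k)(j)).sum).toArray)
  }

  // Adjacency with self-loops for the path graph 0 - 1 - 2 - 3 (normalization omitted for brevity).
  val operator: Matrix = Array(
    Array(1.0, 1.0, 0.0, 0.0),
    Array(1.0, 1.0, 1.0, 0.0),
    Array(0.0, 1.0, 1.0, 1.0),
    Array(0.0, 0.0, 1.0, 1.0))

  // One feature per node.
  val features: Matrix = Array(Array(1.0), Array(2.0), Array(3.0), Array(4.0))

  def main(args: Array[String]): Unit = {
    val whole = Try(matmul(operator, features))         // all four nodes in one place
    val slice = Try(matmul(operator, features.take(2))) // only the first two nodes
    println(s"whole graph succeeds: ${whole.isSuccess}")
    println(s"slice of the nodes succeeds: ${slice.isSuccess}")
  }
}
```

Each output row sums over all neighbors of a node, so every row of the feature matrix has to be visible to the same computation, which is consistent with keeping the data in one partition.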

sparkgcn's People

Contributors

emartinezs44, dding3
