Coder Social home page Coder Social logo

jshengyi / psp Goto Github PK

View Code? Open in Web Editor NEW
1.0 1.0 0.0 128 KB

Parallel Subgraph query Processing (source code:"Locality Sensitive Hashing for Optimizing Subgraph Query Processing in Parallel Computing Systems")

License: Apache License 2.0

C++ 91.40% Makefile 0.06% C 8.53%

psp's Introduction

PSP: Parallel Subgraph query Processing

We study how to exploit the inherent parallelism of parallel computing systems for both intraquery and interquery parallelism during subgraph query processing over large graphs. Different from widely used hash-based parallelism techniques, we utilize and extend locality sensitive hashing based on the structures of subgraph queries to merge multiple tasks during a single subgraph query processing or multiple subgraph queries.

Prerequisities

  • Linux
  • GCC 4.8.5 or higher version
  • MPICH3
  • JDK 7 or higher version
  • Hadoop 2.7.7 or higher version

Data Format

The data input into the program needs to be in adjacency list format, as follows:

vertex-ID label \t  neighbor1-ID  neighbor1-label  neighbor2-ID  neighbor2-label  neighbor3-ID   neighbor3-label ...

In the sample_data directory of the project, there are sample data graph and query graph.

If you need to input multiple query graphs, please put multiple query graphs in one file. The id of the vertex of each query graph must be unique, and the query graphs are separated by blank lines. In the query graph file, please use CRLF for line breaks.

Sample:

0	7	3 8 5 1
1	9	2 20 3 8
2	20	1 9
3	8	0 7 1 9
4	95	5 1
5	1	0 7 4 95

6	27	8 5
7	12	8 5
8	5	6 27 7 12

9	27	12 5
10	1	12 5
11	12	12 5
12	5	9 27 10 1 11 12

Configuration

In the ydhdfs2.h file in the $PSP_PATH/system/util ($PSP_PATH is the root directory of the project), you need to specify the hostname and port of the master machine (aka NameNode) of the Hadoop cluster.

// $PSP_PATH/system/util/ydhdfs2.h
hdfsBuilderSetNameNode(bld, "master"); // master machine's hostname, default is master
hdfsBuilderSetNameNodePort(bld, 9000); // port, default is 9000

Compile

In the root directory of the project, execute the following command, you can get an executable file run.

cd app_gmatch
make

Run

Before running the code, please check the following:

  • All prerequisites have been met
  • The ydhdfs2.h file has been updated with your master machine’s hostname, port
  • Data graph and query graph have been converted into adjacency list format
  • The data graph and query graph have been uploaded to HDFS
  • All executable files have been sent to each machine in the cluster (if you want to run on multiple machines )

Running on multiple machines

You can use the mpiexec command to run the program on multiple machines, the command is as follows:

mpiexec -n <N> -f <hosts> ./run <data_graph_path> <query_graph_path> <t>

where N is the number of machines/processes, hosts is the hosts file, data_graph_path is the HDFS path of the data graph, query_graph_path is the HDFS path of the query graph, and t is the number of threads per machine/process.

Running on a single machine

If you only want to test on a single machine, you can execute the following command:

./run <data_graph_path> <query_graph_path> <t>

Results of the run

When you test on sample data, you can get the following results.

Load Time : xxxxx seconds
running
Task Time : xxxxx seconds
the number of matching results for query1 is 7

where Load Time is the time to load data, Task Time is the total time the program runs. For the sample query graph, the id of the root vertex of the query plan is 1 (i.e. 1 in query1), and the number of matching results on the sample data graph is 7. If you input multiple query graphs, please distinguish the number of matching results from different query graphs by the id of the root vertex of the query plan.

psp's People

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.