General dataset and library setup

How to set up the C++ Project

The C++ project contains the code for GCEA and for querying Virtuoso, plus some scripts that prepare the input data. It requires installing the Boost library and setting up Virtuoso. CMake compiles all of the project's binaries, which are the following:

graph_sampler, which samples a larger input graph into smaller operand graphs.
(re)indexing.cpp, which reindexes the CSV file after the sampling phase in Java (this only prepares the data representation and is not part of the join algorithm).
print_graph.cpp, which prints the serialized graph.
serializer, which serializes a graph operand in secondary memory (first and second phases of the graph join algorithm).
graph_join (main.cpp), which performs the graph join phase itself.
virtuoso_loader.cpp, which loads the operands into Virtuoso from CSV files.

How to set up Virtuoso

The Virtuoso setup is handled by the script virtuoso_dependencies.sh, which installs all the dependencies and libraries required to connect the C++ code to the Virtuoso driver. After running it, set up the ODBC connections used by the C++ libraries.

If your system uses the same default directory paths as Ubuntu (GNU/Linux), run the cpfiles.sh script in the odbc_setup folder to copy the ODBC connection drivers to the expected locations. After editing the file virtuoso_setup/virtuoso.ini according to your hardware specification, run the script virt_start.sh in the same folder to start the Virtuoso server. The server should now accept binary/ODBC connections on port 1111, while the web interface should be available at http://localhost:8890/. The default username and password for the Virtuoso Conductor are both dba (DataBase Administrator).
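Before running the loader or the benchmarks, it can help to confirm that both Virtuoso endpoints are reachable. The following is a minimal sketch and not part of the repository: the class name VirtuosoPortCheck and the one-second timeout are assumptions, while the host and ports are the ones stated in this section. It only tests that a TCP connection can be opened; it does not authenticate or issue any query.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not shipped with the project.
class VirtuosoPortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unresolved host: treat as not listening.
            return false;
        }
    }

    public static void main(String[] args) {
        // 1111 is the binary/ODBC port, 8890 serves the web interface.
        System.out.println("ODBC port 1111 reachable: " + isListening("localhost", 1111, 1000));
        System.out.println("HTTP port 8890 reachable: " + isListening("localhost", 8890, 1000));
    }
}
```

If either check reports false after virt_start.sh has been run, the port settings in virtuoso_setup/virtuoso.ini are a reasonable first place to look.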

How to set up the Java Project

The Java project in the usergenerator folder requires no additional tooling apart from Maven 3, which automatically downloads all the dependencies and libraries required by the project. Some of the Java scripts require the C++ project to be set up first.

How to recreate the graph operands (Synthetic Networks)

Run the Java class GCEA.DatasetSampleGenerator from the Java project in the usergenerator folder. This class automates the generation of the different operands and subgraphs from a single adjacency-list dataset. In the project folder, we provide the .properties files that we used in our experiments for sampling both the Friendster and the Kronecker datasets, as well as for updating the CSV files to make them compatible with PostgreSQL loading.
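For readers unfamiliar with the .properties format, the sketch below shows how such a file is typically read with the standard java.util.Properties API. It is an illustration only: the class name SamplerConfigSketch and the keys input.path and sample.sizes are hypothetical, and the keys actually expected by GCEA.DatasetSampleGenerator are the ones found in the provided files.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical example, not the project's own configuration code.
class SamplerConfigSketch {
    /** Parses .properties text from any Reader (a FileReader in real use). */
    static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical keys and values, for illustration only.
        String example = "input.path=data/adjacency_list.txt\nsample.sizes=10,100,1000\n";
        Properties props = load(new StringReader(example));
        for (String size : props.getProperty("sample.sizes").split(",")) {
            System.out.println("sample of " + size.trim() + " vertices from "
                    + props.getProperty("input.path"));
        }
    }
}
```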

The datasets used for the "On Efficiently Equi-Joining Graphs" paper @IDEAS'21 and generated from this pipeline are available on both FigShare and OSF.
