
graphjoin's Introduction

General dataset and library setup

How to set up the C++ Project

The C++ project contains the actual code for GCEA and for querying Virtuoso, plus some scripts required to prepare the input data. The project requires installing the Boost library and setting up Virtuoso. CMake is used to compile all of the project's binaries, which are the following:

  • graph_sampler, which samples a larger input graph into smaller operand graphs.
  • (re)indexing.cpp, which reindexes the CSV file produced by the sampling phase in Java (this only prepares the data representation and is not part of the join algorithm).
  • print_graph.cpp, which prints the serialized graph.
  • serializer, which serializes a graph operand to secondary memory (first and second phases of the graph join algorithm).
  • graph_join (main.cpp), which actually performs the graph join phase.
  • virtuoso_loader.cpp, which loads the operands into Virtuoso from CSV files.
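
As an illustration of what the reindexing step listed above might involve, here is a minimal sketch that remaps arbitrary vertex ids in an edge list to dense ids 0..n-1, in order of first appearance. This is an assumption about the general idea, not the code in (re)indexing.cpp: the name reindex_edges, the Edge alias, and the in-memory edge list (instead of a CSV file) are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical edge type: (source vertex id, target vertex id).
using Edge = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical helper: rewrites every vertex id to a dense id in 0..n-1,
// assigned in order of first appearance. id_map is filled with the
// original-id -> dense-id mapping so it can be reused for other files.
std::vector<Edge> reindex_edges(
    const std::vector<Edge>& edges,
    std::unordered_map<std::uint64_t, std::uint64_t>& id_map) {
    std::vector<Edge> out;
    out.reserve(edges.size());

    // Returns the dense id of v, creating a new one if v is unseen.
    auto dense = [&id_map](std::uint64_t v) {
        auto it = id_map.find(v);
        if (it != id_map.end()) return it->second;
        std::uint64_t next = id_map.size();
        id_map.emplace(v, next);
        return next;
    };

    for (const auto& e : edges) {
        // Separate statements keep the id assignment order deterministic:
        // the source vertex is always numbered before the target vertex.
        std::uint64_t s = dense(e.first);
        std::uint64_t t = dense(e.second);
        out.emplace_back(s, t);
    }

    assert(out.size() == edges.size());
    return out;
}
```

Passing the same id_map to successive calls keeps the numbering consistent across several edge lists, which is one way sampled operands could share a vertex numbering.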

How to set up Virtuoso

The Virtuoso setup is handled by the script virtuoso_dependencies.sh, which installs all the dependencies and libraries required to connect the C++ code to the Virtuoso driver. After running it, set up the ODBC connections used by the C++ libraries.

If your system uses the same default directory paths as Ubuntu (GNU/Linux), you can run the cpfiles.sh script in the odbc_setup folder to copy the ODBC connection drivers to the expected locations. After editing the file virtuoso_setup/virtuoso.ini according to your hardware specification, run the script virt_start.sh in the same folder to start the Virtuoso server. The server should then accept binary/ODBC connections on port 1111, and the web interface should be available at http://localhost:8890/. The default username and password for the Virtuoso Conductor are both dba (DataBase Administrator).
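
To show how the ODBC setup above might be consumed from C++, the sketch below builds a DSN-based ODBC connection string using the default dba credentials. It assumes the server address (localhost:1111) is configured in the DSN entry of your odbc.ini. The function name virtuoso_connection_string and the DSN name in the usage note are hypothetical, and the project's own code may connect differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a standard ODBC connection string of the form
// "DSN=<name>;UID=<user>;PWD=<password>". The defaults mirror the Virtuoso
// default credentials mentioned above (dba / dba).
std::string virtuoso_connection_string(const std::string& dsn,
                                       const std::string& user = "dba",
                                       const std::string& password = "dba") {
    // A DSN-based string is meaningless without a data source name.
    assert(!dsn.empty());
    return "DSN=" + dsn + ";UID=" + user + ";PWD=" + password;
}
```

For example, virtuoso_connection_string("VirtuosoLocal") yields "DSN=VirtuosoLocal;UID=dba;PWD=dba", where VirtuosoLocal is a placeholder for whatever DSN name your odbc.ini defines. A string of this form is what the standard ODBC call SQLDriverConnect accepts.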

How to set up the Java Project

The Java project in the usergenerator folder requires no additional tooling apart from Maven 3, which automatically downloads all the dependencies and libraries the project needs. Some of the Java classes require the C++ project to be set up first.

How to recreate the graph operands (Synthetic Networks)

Run the Java class GCEA.DatasetSampleGenerator from the Java project in the usergenerator folder. This class automates the generation of the different operands and subgraphs from a single adjacency-list dataset. The project folder contains the .properties files that we used in our experiments to sample both the Friendster and the Kronecker datasets, and to update the CSV files so that they can be loaded into PostgreSQL.

The datasets used for the paper "On Efficiently Equi-Joining Graphs" (IDEAS'21), generated with this pipeline, are available on both FigShare and OSF.
