This project is forked from su-hpc/jaccard-ml.

Jaccard weight calculation with an ML-based work distribution model.
Jaccard-ML

Degree-aware kernels for calculating Jaccard weights on the GPU, along with a script demonstrating how machine-learning-based tuning of the GPU work distribution improves calculation efficiency.

In addition to the degree-aware kernels, the repository contains a collection of Jaccard weight calculation kernels for both the CPU and the GPU.
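For reference, one common definition of the Jaccard weight of an edge (u, v) is |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, where N(x) is the neighbor set of x. A minimal CPU sketch in Python (the repository's kernels are in C++/CUDA; this function and the sample graph are illustrative only):

```python
def jaccard_weight(adj, u, v):
    """Jaccard weight of edge (u, v): |N(u) & N(v)| / |N(u) | N(v)|."""
    nu, nv = adj[u], adj[v]
    inter = len(nu & nv)
    union = len(nu) + len(nv) - inter
    return inter / union if union else 0.0

# Neighbor sets for a small graph with edges 0-1, 0-2, 1-2, 1-3.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}

# N(0) = {1,2}, N(1) = {0,2,3}: intersection {2}, union size 4.
print(jaccard_weight(adj, 0, 1))  # → 0.25
```

The degree-aware kernels in this repository distribute exactly this per-edge computation across GPU threads according to vertex degrees.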

Jaccard Weights Calculation Executable

Compilation

The CUDA code is compiled using CMake:

mkdir build
cd build
cmake ..
make

By default, this compiles only the binning-based (degree-aware) kernels and assumes that input graphs are undirected. You can modify this behavior with CMake options. For instance, to also compile the CPU kernel and to treat graphs as directed:

mkdir build
cd build
cmake .. -D_CPU=ON -D_DIRECTED=ON
make

The following are the available options with their default values:

//Run binning experiment
_BINNING=ON

//Use CPU kernels
_CPU=ON

//Use cuGraph
_CUGRAPH=OFF

//Treat graphs as directed
_DIRECTED=OFF

//Use the Dongarra kernel
_DONGARRA=OFF

//Use GPU kernels
_GPU=ON

//Use in-house cuGraph
_INHOUSE_CUGRAPH=OFF

//Use the simple GPU kernel
_SIMPLE_GPU=OFF

//Use the edge-based GPU kernel
_SIMPLE_GPU_EDGE=OFF

Usage

After compilation, the executable jaccard is located in the build directory. It runs a number of experiments and reports their timings.

The executable takes two required parameters; the remaining parameters are optional.

-i: input graph edge list file (required)

Path to the input graph's edge list file. Each line of the file represents one edge of the graph and must contain two vertex IDs separated by whitespace. Example:

0 1
2 1
0 3
4 1
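The edge-list format above can be written and parsed with a short script (a hedged sketch; the filename is a placeholder, and the format is simply whitespace-separated integer vertex IDs, one edge per line):

```python
edges = [(0, 1), (2, 1), (0, 3), (4, 1)]

# Write one whitespace-separated edge per line.
with open("graph.txt", "w") as f:
    for u, v in edges:
        f.write(f"{u} {v}\n")

# Parse it back: two vertex IDs per line, separated by whitespace.
with open("graph.txt") as f:
    parsed = [tuple(map(int, line.split())) for line in f if line.strip()]

print(parsed)  # → [(0, 1), (2, 1), (0, 3), (4, 1)]
```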

-e: JSON file containing binning experiment parameters (required)

Path to the JSON file containing the experimentation parameters used with the binning kernels; a sample file is given in parameters/experiment.json. Each file must contain a list called ranges with the edge boundaries at which bins are separated, plus an object for each binning kernel to be executed. Each such object must define two lists: g, the number of threads calculating a single Jaccard weight value (the search group size), and a, the number of thread teams calculating Jaccard weights for a single vertex (the search assembly size, i.e., the number of search groups per vertex).
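Based on the description above, a parameters file might be constructed as follows. This is a hypothetical example: the kernel object name and all numeric values are placeholders, and parameters/experiment.json in the repository is the authoritative reference for the format.

```python
import json

# Hypothetical binning-experiment parameters mirroring the structure
# described above: "ranges" gives the edge boundaries separating bins,
# and each kernel object lists search group sizes "g" and search
# assembly sizes "a" (all values here are placeholders).
params = {
    "ranges": [2, 32, 256, 1024],
    "binning_kernel": {          # kernel object name is a placeholder
        "g": [1, 2, 4, 8],       # threads per Jaccard weight value
        "a": [1, 1, 2, 4],       # search groups per vertex
    },
}

with open("experiment.json", "w") as f:
    json.dump(params, f, indent=2)
```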

-a: number of averages

The number of times each experiment is run; the reported timing of each experiment is the average over all of its runs.

--o: output JSON file containing the experiment timing results

Path to a JSON file to which the timing results of all executed experiments are written.

Important note: the actual Jaccard weights are stored in the file {INPUT_GRAPH}.corr.bin as a binary array of values. The i-th element of the array is the Jaccard weight of the i-th edge in the CSR representation of the graph.
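The weights file can be loaded as a flat binary array. The element type is not stated here, so the default format below (64-bit doubles) is an assumption; adjust the struct format character if the executable writes 32-bit floats:

```python
import struct

def read_corr_bin(path, fmt="d"):
    """Read a {INPUT_GRAPH}.corr.bin file as a flat list of values.

    fmt="d" assumes 64-bit doubles (an assumption; use "f" for 32-bit
    floats). The i-th element is the Jaccard weight of the i-th edge
    in the CSR representation of the graph.
    """
    with open(path, "rb") as f:
        data = f.read()
    return [x[0] for x in struct.iter_unpack(fmt, data)]
```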

Machine Learning model

We supply a Python script and an iPython notebook that train and evaluate a random forest model to predict execution parameters. These scripts are only used for evaluating the quality of the ML model and do not interface with the CUDA executable described in the previous section.

Usage

Both the Python script and the iPython notebook are used in exactly the same way:

  1. Place a set of JSON outputs generated by the CUDA executable in a folder; these experiments will act as the training set for the model.
  2. Place a second set of JSON outputs generated by the CUDA executable in a different folder; these experiments will be the test set.
  3. Change the values of the variables train_folder and test_folder in the script/notebook to reflect the locations of these folders.
  4. Execute the script/notebook and observe the results. It trains the model on the training set and predicts execution parameters for each graph in the test set, reporting the best achievable time for that graph, the time using the predicted parameters, and the resulting slowdown in percent.
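The quality metric reported in step 4 can be sketched as follows. This shows only the bookkeeping, not the random forest itself (which the supplied script trains); the timing data and parameter tuples below are illustrative:

```python
# For each test graph: compare the best achievable time against the time
# under the ML-predicted parameters, and report the slowdown in percent.
# timings: {graph: {parameter_choice: measured_time_in_seconds}}.
timings = {
    "graph_a": {("g=2", "a=1"): 1.00, ("g=4", "a=2"): 1.25},
    "graph_b": {("g=2", "a=1"): 3.00, ("g=4", "a=2"): 2.40},
}
# Hypothetical per-graph parameter choices produced by the trained model.
predicted = {"graph_a": ("g=4", "a=2"), "graph_b": ("g=4", "a=2")}

losses = {}
for graph, runs in timings.items():
    best = min(runs.values())            # best time over all parameters
    pred = runs[predicted[graph]]        # time under predicted parameters
    losses[graph] = 100.0 * (pred - best) / best
    print(f"{graph}: best={best:.2f}s predicted={pred:.2f}s "
          f"loss={losses[graph]:.1f}%")
```

Here graph_a incurs a 25% slowdown because the model picked a suboptimal configuration, while graph_b's prediction is optimal (0% loss).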

Acknowledgements

We use the argument parsing code written by Jesse Laning, used under the Apache-2.0 license.

Contributors

amroaljundi, moghun