This project is forked from su-hpc/jaccard-ml.

Jaccard weight calculation with an ML-based work distribution model.
Jaccard-ML

Degree-aware kernels for calculating Jaccard weights on the GPU, along with a script demonstrating how machine-learning-based tuning of the GPU work distribution improves calculation efficiency.

In addition to the degree-aware kernels, the repository contains a collection of Jaccard weight calculation kernels for both the CPU and the GPU.
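For reference, one common definition of the Jaccard weight of an edge (u, v) is |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, where N(x) is the neighbor set of x. A minimal CPU sketch in Python (the repository's kernels are in C++/CUDA; this function and the sample graph are illustrative only):

```python
def jaccard_weight(adj, u, v):
    """Jaccard weight of edge (u, v): |N(u) & N(v)| / |N(u) | N(v)|."""
    nu, nv = adj[u], adj[v]
    inter = len(nu & nv)
    union = len(nu) + len(nv) - inter
    return inter / union if union else 0.0

# Neighbor sets for a small graph with edges 0-1, 0-2, 1-2, 1-3.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}

# N(0) = {1,2}, N(1) = {0,2,3}: intersection {2}, union size 4.
print(jaccard_weight(adj, 0, 1))  # → 0.25
```

The degree-aware kernels in this repository distribute exactly this per-edge computation across GPU threads according to vertex degrees.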

Jaccard Weights Calculation Executable

Compilation

The CUDA code is compiled using CMake:

mkdir build
cd build
cmake ..
make

By default, this compiles only the binning-based (degree-aware) kernels and assumes that input graphs are undirected. You can modify this behavior with CMake options. For instance, to also compile the CPU kernel and to treat graphs as directed:

mkdir build
cd build
cmake .. -D_CPU=ON -D_DIRECTED=ON
make

The following are the available options with their default values:

//Run binning experiment
_BINNING=ON

//Use CPU kernels
_CPU=ON

//Use cuGraph
_CUGRAPH=OFF

//Treat graphs as directed
_DIRECTED=OFF

//Use the Dongarra kernel
_DONGARRA=OFF

//Use GPU kernels
_GPU=ON

//Use in-house cuGraph
_INHOUSE_CUGRAPH=OFF

//Use the simple GPU kernel
_SIMPLE_GPU=OFF

//Use the edge-based GPU kernel
_SIMPLE_GPU_EDGE=OFF

Usage

After compilation, the executable jaccard is located in the build directory. It runs a number of experiments and reports their timings.

The executable takes two required parameters; the remaining parameters are optional.

-i: input graph edge list file (required)

Path to the input graph's edge list file. Each line of the file represents one edge of the graph and must contain two vertex IDs separated by whitespace. Example:

0 1
2 1
0 3
4 1
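The edge-list format above can be written and parsed with a short script (a hedged sketch; the filename is a placeholder, and the format is simply whitespace-separated integer vertex IDs, one edge per line):

```python
edges = [(0, 1), (2, 1), (0, 3), (4, 1)]

# Write one whitespace-separated edge per line.
with open("graph.txt", "w") as f:
    for u, v in edges:
        f.write(f"{u} {v}\n")

# Parse it back: two vertex IDs per line, separated by whitespace.
with open("graph.txt") as f:
    parsed = [tuple(map(int, line.split())) for line in f if line.strip()]

print(parsed)  # → [(0, 1), (2, 1), (0, 3), (4, 1)]
```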

-e: JSON file containing binning experiment parameters (required)

Path to the JSON file containing the experimentation parameters used with the binning kernels; a sample file is given in parameters/experiment.json. Each file must contain a list called ranges with the edge boundaries at which bins are separated, plus an object for each binning kernel to be executed. Each such object must define two lists: g, the number of threads calculating a single Jaccard weight value (the search group size), and a, the number of thread teams calculating Jaccard weights for a single vertex (the search assembly size, i.e., the number of search groups per vertex).
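Based on the description above, a parameters file might be constructed as follows. This is a hypothetical example: the kernel object name and all numeric values are placeholders, and parameters/experiment.json in the repository is the authoritative reference for the format.

```python
import json

# Hypothetical binning-experiment parameters mirroring the structure
# described above: "ranges" gives the edge boundaries separating bins,
# and each kernel object lists search group sizes "g" and search
# assembly sizes "a" (all values here are placeholders).
params = {
    "ranges": [2, 32, 256, 1024],
    "binning_kernel": {          # kernel object name is a placeholder
        "g": [1, 2, 4, 8],       # threads per Jaccard weight value
        "a": [1, 1, 2, 4],       # search groups per vertex
    },
}

with open("experiment.json", "w") as f:
    json.dump(params, f, indent=2)
```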

-a: number of averages

The number of times each experiment is run; the reported timing of each experiment is the average over all of its runs.

--o: output JSON file containing the experiment timing results

Path to a JSON file to which the timing results of all executed experiments are written.

Important note: the actual Jaccard weights are stored in the file {INPUT_GRAPH}.corr.bin as a binary array of values. The i-th element of the array is the Jaccard weight of the i-th edge in the CSR representation of the graph.
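The weights file can be loaded as a flat binary array. The element type is not stated here, so the default format below (64-bit doubles) is an assumption; adjust the struct format character if the executable writes 32-bit floats:

```python
import struct

def read_corr_bin(path, fmt="d"):
    """Read a {INPUT_GRAPH}.corr.bin file as a flat list of values.

    fmt="d" assumes 64-bit doubles (an assumption; use "f" for 32-bit
    floats). The i-th element is the Jaccard weight of the i-th edge
    in the CSR representation of the graph.
    """
    with open(path, "rb") as f:
        data = f.read()
    return [x[0] for x in struct.iter_unpack(fmt, data)]
```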

Machine Learning model

We supply a Python script and an iPython notebook that train and evaluate a random forest model to predict execution parameters. These scripts are only used for evaluating the quality of the ML model and do not interface with the CUDA executable described in the previous section.

Usage

Both the Python script and the iPython notebook are used in exactly the same way:

  1. Place a set of JSON outputs generated by the CUDA executable in a folder; these experiments will act as the training set for the model.
  2. Place a second set of JSON outputs generated by the CUDA executable in a different folder; these experiments will be the test set.
  3. Change the values of the variables train_folder and test_folder in the script/notebook to reflect the locations of these folders.
  4. Execute the script/notebook and observe the results. It trains the model on the training set and predicts execution parameters for each graph in the test set, reporting the best achievable time for that graph, the time using the predicted parameters, and the resulting slowdown in percent.
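The quality metric reported in step 4 can be sketched as follows. This shows only the bookkeeping, not the random forest itself (which the supplied script trains); the timing data and parameter tuples below are illustrative:

```python
# For each test graph: compare the best achievable time against the time
# under the ML-predicted parameters, and report the slowdown in percent.
# timings: {graph: {parameter_choice: measured_time_in_seconds}}.
timings = {
    "graph_a": {("g=2", "a=1"): 1.00, ("g=4", "a=2"): 1.25},
    "graph_b": {("g=2", "a=1"): 3.00, ("g=4", "a=2"): 2.40},
}
# Hypothetical per-graph parameter choices produced by the trained model.
predicted = {"graph_a": ("g=4", "a=2"), "graph_b": ("g=4", "a=2")}

losses = {}
for graph, runs in timings.items():
    best = min(runs.values())            # best time over all parameters
    pred = runs[predicted[graph]]        # time under predicted parameters
    losses[graph] = 100.0 * (pred - best) / best
    print(f"{graph}: best={best:.2f}s predicted={pred:.2f}s "
          f"loss={losses[graph]:.1f}%")
```

Here graph_a incurs a 25% slowdown because the model picked a suboptimal configuration, while graph_b's prediction is optimal (0% loss).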

Acknowledgements

We use the argument parsing code written by Jesse Laning, used under the Apache-2.0 license.

Contributors

amroaljundi, moghun