Coder Social home page Coder Social logo

dbscan's Introduction

DBSCAN

My multi-threaded, GPU-enabled implementation of Density-Based Spatial Clustering of Applications with Noise.

The GPU implementation has achieved >20x speedup against de facto SKLearn; multi-threaded CPU implementation is 40% faster than SKLearn with one thread, and scalable with multiple threads.

image info

image info

image info

Requirements

C++17; GCC 7.5; CUDA 10.2; Thrust; pthread.

How to build

  • For the first time: git submodule update --init --recursive

Build main

  • cmake -DCMAKE_BUILD_TYPE=None -Bbuild -H.
    • For cpu-main
      • set environment variable AVX=1 to enable AVX;
      • set environment variable BIT_ADJ=1 if on average, each vertex has more than |V|/64 number of neighbours.
    • For gpu-main
      • Modify gpu/CMakeLists.txt, change the architecture code to fit your hardware.
  • cmake --build build --target cpu-main/gpu-main

Build tests

  • cmake -DCMAKE_BUILD_TYPE=Debug -Bbuild -H.
    • For cpu-test set AVX=1 and BIT_ADJ=1 correspondingly.
    • For gpu-test modify gpu/CMakeLists.txt correspondingly.
  • cmake --build build --target cpu-test/gpu-test

How to run

Existing test data under test_inputs/.

  • [optional] Generate an example using the Python script: python3 generate_dateset.py --n-samples=20000 --cluster-std=3.0

    • 20,000 points with 3.0 stddiv;
  • [optional] Cluster the input and visualize it using Sklearn: python3 dbscan.py --input-name=test_input.txt --eps=0.1 --min-pts=12

    • 0.1 radius and 12 neighbour points for clustering.

CPU algorithm

  • ./build/bin/cpu-main --input=<path_to_input> --eps=<eps> --min-pts=<P>.
    • Append --print to see the cluster ids.
    • Append --num-threads=K to speed up the processing.

GPU algorithm

  • ./build/bin/gpu-main --input=<path_to_input> --eps=<eps> --min-pts=<P>.
    • Append --print to see the cluster ids.

Test data

The query parameters are selected to emulate 10% Noises.

test_input_20k.txt is generated using --cluster-std=2.2 --n-samples=20000, with 4 clusters. Should use --eps=0.15 --min-pts=180 to query.

test_input_50k.txt is generated using --cluster-std=2.2 --n-samples=50000, with 20 clusters. Should use --eps=0.09 --min-pts=110 to query.

test_input_100k.txt is generated using --cluster-std=2.2 --n-samples=100000, with 20 clusters. Should use --eps=0.07 --min-pts=100 to query.

test_input_200k.txt is generated using --cluster-std=2.2 --n-samples=200000, with 20 clusters. Should use --eps=0.05 --min-pts=100 to query.

test_input_300k.txt is generated using --cluster-std=2.2 --n-samples=300000, with 20 clusters. Should use --eps=0.044 --min-pts=100 to query.

test_input_400k.txt is generated using --cluster-std=2.2 --n-samples=400000, with 20 clusters. Should use --eps=0.04 --min-pts=105 to query.

test_input_500k.txt is generated using --cluster-std=2.2 --n-samples=500000, with 20 clusters. Should use --eps=0.035 --min-pts=115 to query.

test_input_600k.txt is generated using --cluster-std=2.2 --n-samples=600000, with 20 clusters. Should use --eps=0.035 --min-pts=120 to query.

test_input_700k.txt is generated using --cluster-std=2.2 --n-samples=600000, with 20 clusters. Should use --eps=0.033 --min-pts=122 to query.

test_input_800k.txt is generated using --cluster-std=2.2 --n-samples=800000, with 20 clusters. Should use --eps=0.03 --min-pts=120 to query.

dbscan's People

Contributors

superliuxz avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.