pyculib_sorting's Introduction

Pyculib_sorting

Pyculib_sorting provides simplified interfaces to CUDA sorting libraries. At present it contains a wrapper around:

  • A radix sort implementation from CUB.
  • A segmented sort implementation from ModernGPU.

Pyculib_sorting is predominantly used by Pyculib to provide sorting routines.

Requirements

Pyculib_sorting requires the following programs to build and test:

  • Python
  • NVIDIA's nvcc compiler

and the following Python packages:

  • pytest
  • Numba

Obtaining the source code

Pyculib_sorting relies on git submodules to access the CUB and ModernGPU source code. To obtain a code base suitable for building the libraries, run:

#> git clone https://github.com/numba/pyculib_sorting.git

#> cd pyculib_sorting

#> git submodule update --init

The URL above may be replaced with the SSH form git@github.com:numba/pyculib_sorting.git if desired.

Building the libraries

To build the libraries run:

#> python build_sorting_libs.py

Testing

Testing uses pytest and is simply invoked with:

#> pytest

Conda build

To create a conda package of Pyculib_sorting, assuming conda-build is installed, run:

#> conda build condarecipe

from the root directory of Pyculib_sorting.

pyculib_sorting's People

Contributors

seibert, stuartarchibald

pyculib_sorting's Issues

Compute_20 nvcc error

Downloading and trying to install gives me the following error. I think this is because I'm using CUDA 9, which may have removed support for Fermi architectures.

nvcc fatal   : Unsupported gpu architecture 'compute_20'

This was simple enough to fix by making the following change:

diff --git a/build_sorting_libs.py b/build_sorting_libs.py
index 1589b7e..d1f44f6 100644
--- a/build_sorting_libs.py
+++ b/build_sorting_libs.py
@@ -53,7 +53,7 @@ def gencode_flags():
 
     # Concatenate flags
     SM = []
-    SM.append(GENCODE_SM20)
+    #  SM.append(GENCODE_SM20)
     SM.append(GENCODE_SM30)
     SM.append(GENCODE_SM35)
     SM.append(GENCODE_SM37)
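
Instead of commenting the line out, the flag list could be made to depend on the CUDA toolkit version. The following is only a sketch of that idea, not the code in build_sorting_libs.py: the function signature, the cuda_major parameter, and the exact flag strings are assumptions made for illustration. It relies on the fact that CUDA 9 dropped Fermi (compute_20) support.

```python
def gencode_flags(cuda_major):
    """Build a string of nvcc -gencode flags for a given CUDA major version.

    Hypothetical helper: the real build script defines GENCODE_SM20 etc.
    as constants; the flag format below is an assumption.
    """
    # (compute capability, CUDA major version that removed it, or None)
    archs = [
        ("20", 9),     # Fermi: no longer accepted by nvcc from CUDA 9
        ("30", None),
        ("35", None),
        ("37", None),
    ]
    flags = []
    for cc, removed_in in archs:
        if removed_in is not None and cuda_major >= removed_in:
            continue  # skip architectures this toolkit rejects
        flags.append(f"-gencode arch=compute_{cc},code=sm_{cc}")
    return " ".join(flags)


# With CUDA 9 the compute_20 entry is left out; with CUDA 8 it is kept.
print(gencode_flags(9))
print(gencode_flags(8))
```

The major version would have to come from somewhere, for example by parsing the output of nvcc --version, which is left out of the sketch.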

More examples ?

Hi,
Could you add more examples showing how to use the sorting API?
Although I read the documentation, I still cannot understand how to use it.
Thanks for reading

Mysterious number: 9362

Hi, I implemented a caffe (https://github.com/BVLC/caffe) module which uses segsortpairs_float64() to sort a long numpy array (6422528 elements) consisting of 49-element segments, so there are 131072 segments in total. If I sort the whole array, it always throws CUDA exceptions. Below is the diagnostic information that cuda-memcheck outputs:

========= Invalid __global__ read of size 8
=========     at 0x00000788 in /home/xxxxxx/pyculib_sorting/thirdparty/moderngpu/include/kernels/../device/loadstore.cuh:77:void mgpu::KernelSegBlocksortIndices<mgpu::LaunchBoxVT<int=128, int=11, int=0, int=128, int=7, int=0, int=128, int=7, int=0>, bool=1, bool=1, double*, unsigned int*, double*, unsigned int*, mgpu::less<double>>(int=128, int=7, int, int const *, int const , int=0, int=128, int*, int=7)
=========     by thread (96,0,0) in block (256,0,0)
=========     Address 0x104bfdc0300 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x23cd8d]
=========     Host Frame:/usr/lib/pyculib_segsort.so [0x31f61]
=========     Host Frame:/usr/lib/pyculib_segsort.so [0x4fb03]

So I tried to reduce the sorted array size, and eventually found that, if I sort only 9362 segments (9362*49 elements), it runs smoothly. But if I sort 9363 segments, the exception appears.

A simple calculation: 9363 segments are 458787 elements. Considering the keys are double (8-bytes) and the values are uint32 (4-bytes), these elements take up 5505444 bytes, or 5376KB. This is not a huge array, though.
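
The arithmetic above can be checked with a few lines of Python. The constant and function names are only illustrative; the sizes are the ones stated in this report (49 elements per segment, 8-byte keys, 4-byte values).

```python
SEGMENT_LEN = 49   # elements per segment
KEY_BYTES = 8      # float64 keys
VAL_BYTES = 4      # uint32 values


def footprint(num_segments):
    """Return (element count, total bytes of keys plus values)."""
    elements = num_segments * SEGMENT_LEN
    return elements, elements * (KEY_BYTES + VAL_BYTES)


elements, nbytes = footprint(9363)
print(elements)        # 458787
print(nbytes)          # 5505444
print(nbytes // 1024)  # 5376 (KB, rounded down)

# The full array holds 6422528 elements, i.e. 131072 segments.
print(6422528 // SEGMENT_LEN)  # 131072
```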

Moreover, if I sort the above whole array in a standalone python script, everything is fine. I guess maybe caffe takes up some GPU RAM? But when the caffe script is running, the used RAM is 5785MiB, less than half of the total GPU RAM (12GB).

Could anyone give hints on what may cause this exception? Then I could dig deeper to find the culprit. Thank you very much.

Until this problem is fixed, I can work around the error by splitting the whole array into 16 larger chunks and sorting each chunk with segsortpairs_float64(). But this is not so convenient...
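
A rough sketch of that workaround, assuming each chunk is a run of whole 49-element segments. With 131072 segments and 16 chunks, every chunk holds 8192 segments, which is below the 9362 that works. The names sort_in_chunks, sort_chunk, and cpu_sort_chunk are hypothetical; cpu_sort_chunk is a NumPy stand-in used only so the sketch runs without a GPU, and the real call to segsortpairs_float64() would go in its place.

```python
import numpy as np


def sort_in_chunks(keys, vals, segment_len, num_chunks, sort_chunk):
    """Sort fixed-length segments in place, one chunk of segments at a time.

    sort_chunk(keys_view, vals_view, segment_len) is a caller-supplied
    function (hypothetical) that sorts every segment of its views in place.
    """
    num_segments = keys.size // segment_len
    # Chunk boundaries, always aligned to whole segments.
    bounds = [(i * num_segments // num_chunks) * segment_len
              for i in range(num_chunks + 1)]
    for start, stop in zip(bounds[:-1], bounds[1:]):
        sort_chunk(keys[start:stop], vals[start:stop], segment_len)


def cpu_sort_chunk(keys, vals, segment_len):
    """CPU stand-in for the GPU call: sort each segment by key, in place."""
    k = keys.reshape(-1, segment_len)   # views into the caller's arrays
    v = vals.reshape(-1, segment_len)
    order = np.argsort(k, axis=1, kind="stable")
    k[:] = np.take_along_axis(k, order, axis=1)
    v[:] = np.take_along_axis(v, order, axis=1)


# Small demonstration: 4 segments of 3 elements, sorted in 2 chunks.
keys = np.array([3., 1., 2., 9., 7., 8., 5., 4., 6., 0., -1., -2.])
vals = np.arange(keys.size, dtype=np.uint32)
sort_in_chunks(keys, vals, segment_len=3, num_chunks=2,
               sort_chunk=cpu_sort_chunk)
print(keys)
print(vals)
```

The sketch assumes contiguous one-dimensional arrays, so that each slice and reshape is a view and the sort really happens in place.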
