rbffd_gpu's Introduction

git clone ssh://[email protected]/~efb06/repos/rbffd_gpu.git


Needs:

cd rbffd_gpu
mkdir build
cd build
cmake -DUSE_CUDA=off ..

Casiornis (Cascade.msi.umn.edu): 

Modules: 
  1) local                    5) suacct                   9) vtune/2013/update7      13) impi/4.1.0.030/intel    17) vtk/5.4.2
  2) vars                     6) base                    10) zlib/1.2.8              14) fftw/3.3.3-impi-double  18) metis/5.0.2
  3) user                     7) cuda/5.0                11) bzip2/1.0.6-gnu4.8.0    15) intel/2013/update4
  4) moab                     8) cmake/2.8.11            12) boost/1.53.0            16) mkl/11.0.4.183

source $HOME/../shared/cascade_env.sh
 
cmake -DARMADILLO_ROOT=/home/bollige/shared/intel-soft/ -DFFTW_ROOT=/soft/fftw/intel-impi/3.3.3-double -DUSE_ICC=on -DUSE_CUDA=off ..

rbffd_gpu's People

Contributors

bollig, erlebach

rbffd_gpu's Issues

overlap communication and computation

OpenCL will allow overlap. Need to complete this for:

  • hand-coded kernels (easy)
  • ViennaCL (hard)
  • CPU (use libnbc to get the MPI_Ialltoallv support introduced in MPI-3)
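One common way to get this overlap is to separate the rows that only touch locally owned nodes from the rows that need ghost values, so the first group can be processed while the exchange is still in flight. The sketch below shows only that split. The data layout (stencils as index lists, owned nodes numbered before ghost nodes) and the names `RowSplit` and `split_rows` are assumptions for illustration, not taken from the rbffd_gpu code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: stencils[i] lists the node indices used by row i.
// Indices < n_local are owned by this process; indices >= n_local are
// ghost nodes whose values arrive through MPI.
struct RowSplit {
    std::vector<std::size_t> interior;  // rows that need no ghost values
    std::vector<std::size_t> boundary;  // rows that must wait for the exchange
};

RowSplit split_rows(const std::vector<std::vector<std::size_t>>& stencils,
                    std::size_t n_local) {
    RowSplit split;
    for (std::size_t row = 0; row < stencils.size(); ++row) {
        bool needs_ghost = false;
        for (std::size_t node : stencils[row]) {
            if (node >= n_local) { needs_ghost = true; break; }
        }
        (needs_ghost ? split.boundary : split.interior).push_back(row);
    }
    return split;
}
```

With such a split, the intended order would be: start the nonblocking exchange, apply the weights for `interior` rows, wait for the exchange to finish, then apply the weights for `boundary` rows.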

libnbc

Try overlapping CPU computation with MPI_Ialltoallv.

Stokes

  • The equations I started with for Stokes are wrong.
  • Refer to http://www.dealii.org/developer/doxygen/deal.II/step_22.html for a test case we might implement and compare against.
  • The deal.II package was developed over 10+ years. We cannot hope to match its functionality, but we can definitely borrow its logic for solving PDEs and substitute RBF-FD as the numerical method.

METIS

Introduce METIS into the workflow to partition node sets, rather than using my own domain class.

  • Should be able to call it from the code, but how is the node set distributed initially (all N nodes on all P processors?)
  • Might need to preprocess the grid and load the appropriate files at runtime. (change the way we load the grid to read <grid_name>.)
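METIS partitions a graph given in compressed (CSR) form as two arrays, `xadj` and `adjncy`, and it expects an undirected graph without self loops. RBF-FD stencils are not symmetric in general, so they need to be symmetrized first. A minimal sketch of that conversion follows. It assumes stencils are stored as lists of node indices in the range 0..N-1 and that METIS is built with 32-bit indices; `CsrGraph` and `stencils_to_graph` are hypothetical names.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Graph in the compressed layout METIS reads: the neighbors of vertex v
// are adjncy[xadj[v]] .. adjncy[xadj[v + 1] - 1].
struct CsrGraph {
    std::vector<int> xadj;
    std::vector<int> adjncy;
};

CsrGraph stencils_to_graph(const std::vector<std::vector<int>>& stencils) {
    const std::size_t n = stencils.size();
    std::vector<std::set<int>> nbrs(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (int j : stencils[i]) {
            if (j == static_cast<int>(i)) continue;  // drop self loops
            nbrs[i].insert(j);
            // Store the reverse edge too, so the graph is undirected.
            nbrs[static_cast<std::size_t>(j)].insert(static_cast<int>(i));
        }
    }
    CsrGraph g;
    g.xadj.push_back(0);
    for (const auto& s : nbrs) {
        g.adjncy.insert(g.adjncy.end(), s.begin(), s.end());
        g.xadj.push_back(static_cast<int>(g.adjncy.size()));
    }
    return g;
}
```

This version builds the whole graph on one process, which corresponds to the "all N nodes on all P processors" option or to a preprocessing step that writes the partition to disk.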

Node Ordering Tests

  • Test convergence of GMRES vs. node orderings
    • Reorder grid and compute stencils with Matlab (write to disk)
    • Load grid and stencils from file
    • Assemble matrix
    • Solve with GMRES

ILU0 on GPU

Right now ILU0 is applied on the CPU. We want to apply it on the GPU.

  • ViennaCL 1.3.1 introduced a GPU kernel to apply ILU factors. Test to make sure that ILU0 works with it.
  • Having this on the GPU has not been essential so far, since no one has tested this or other preconditioners on RBF-FD matrices.

Support for CVTs

Need to improve the support parameter for CVT meshes. The variation in node locations causes very high error.

  • Using the same parameters as for the MD node sets fails to converge for 100K, 500K and 1M nodes.
  • Parameters are based on conditioning. Perhaps we can compute the condition number of each stencil and adjust epsilon in a loop when computing weights.
  • Parameters are also based on sqrt(N). We could translate the sqrt(N) into the radius of the minimum enclosing circle for the stencil. The size of the circle could give us a function for maintaining the same conditioning. This assumes the circle has nodes regularly distributed in some way (CVTs would qualify if the density function doesn't vary too much).
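The radius idea in the last bullet can be sketched as follows. For an RBF of the form phi(epsilon * r), scaling all distances in a stencil by h and epsilon by 1/h leaves the interpolation matrix unchanged, so holding epsilon * radius fixed keeps the conditioning the same across geometrically similar stencils. The code below uses the largest center-to-node distance as a cheap stand-in for the minimum enclosing circle radius, works in 2D, and treats the first stencil node as the center. All of these, along with the names `Point`, `stencil_radius` and `scaled_epsilon`, are assumptions for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Point { double x, y; };

// Largest distance from the stencil center (first node, by assumption) to
// any other stencil node. This is a cheap proxy for the radius of the
// minimum enclosing circle, not the exact value.
double stencil_radius(const std::vector<Point>& stencil) {
    double r = 0.0;
    for (const Point& p : stencil) {
        r = std::max(r, std::hypot(p.x - stencil[0].x, p.y - stencil[0].y));
    }
    return r;
}

// Choose epsilon so that epsilon * radius equals a fixed target value.
// The caller must pass a stencil with at least two distinct nodes,
// otherwise the radius is zero and the result is not finite.
double scaled_epsilon(const std::vector<Point>& stencil, double target) {
    return target / stencil_radius(stencil);
}
```

The `target` value would still have to be tuned once, for example on an MD node set where the current parameters are known to work.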

Spear benchmarks

Repeat the benchmarks from Keeneland on Spear to see the impact of different hardware.

  • MPI communication is much slower. Spear seems to have hardware issues. Will need to contact help.

GFLOPS

Measure GFLOPS and present those numbers in plots in addition to speedup. Throughput is more meaningful for others who want to know how the gains are influenced by hardware and how they will appear on their own systems.

Weak Scaling

Set up a weak scaling test case on the sphere. METIS will be necessary.
