Coder Social home page Coder Social logo

cchan / latticeboltzmann Goto Github PK

View Code? Open in Web Editor NEW
6.0 3.0 2.0 7.4 MB

extremely fast LBM fluid simulation based on tiling and warp shuffles

Home Page: https://link.clive.io/gpufluidsim

Python 33.11% Julia 15.37% Cuda 46.87% Shell 0.60% Cython 4.06%

latticeboltzmann's Introduction

Lattice Boltzmann Method

2D lattice boltzmann fluid sim. Achieves 5.7 GLUPS on an RTX 2070, approx. 92% of maximum achievable memory bandwidth. (Another test yields 4k630Hz)

Achieves 14.3 GLUPS / 1.03 TBps on a single A100 80GB GPU, with INNER_TIMESTEPS=6 and BLOCKS_THREADS_TUNE_CONSTANT=12.

Nice things to look at

YouTube videos

yeet

multi object stable vortex street

vortex street

lid-driven tall box vortex

cylinder wake at high speed

incorrect lattice boltzmann cfd that looks really cool

Older version screenshot (top is density, bottom is direction field):

screenshot

Requirements

My setup:

  • AMD Ryzen 3700x
  • Nvidia GeForce RTX 2070 with drivers 460.20 and CUDA 11.2
  • Ubuntu 20.04 on WSL
  • Python 3.7.4 using conda I suspect but cannot test that this will work with much earlier versions / lower specs. (Was previously on 18.04 pure linux, 440.59, 10.2)

To install, just:

Benchmarking

Achieved memory bandwidth is 406GBps, compared to 441GBps achieved in the bandwidthTest CUDA sample, and 506GBps bandwidth SOL (this was overclocked to 7899MHz; stock is 7000MHz/448GBps).

Python profiling:

Run python -m cProfile -s cumtime latticeboltzmann.py | less for perhaps a minute.

Also kcachegrind with it:

  • python -m cProfile -s tottime -o profile_data.pyprof latticeboltzmann.py
  • pyprof2calltree -i profile_data.pyprof -k

Nvidia profiling:

First allow NV usage: echo 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"' | sudo tee /etc/modprobe.d/nsight-privilege.conf and reboot.

Then run nv-nsight-cu-cli --target-processes all python latticeboltzmann.py. A few seconds of samples will do.

nvvp is also nice, but you need to sudo apt install openjdk-8-jdk then nvvp -vm /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java.

Notes on overclocking:

Using MSI Afterburner, I can get a +200MHz core overclock on my 2070, yielding almost no performance boost. Overclocking the memory has much larger gains; +1100MHz overclock yields a nearly 20% performance boost.

Run the UnifiedMemoryPerf CUDA sample to get a sense of when we start encountering errors.

Notes on nvcc:

Useful command to dump all intermediate products

nvcc -keep -cubin --use_fast_math -O3 -Xptxas -O3,-v -arch sm_75 --extra-device-vectorization --restrict lb_cuda_kernel.cu && cuobjdump -sass lb_cuda_kernel.cubin | grep '\/\*0' > lb_cuda_kernel.sass

Future directions

  • Implemented in python
  • Very vectorized in numpy
  • Javascript in-browser implementation using compute APIs
  • Cython implementation
  • CUDA implementation using pycuda
  • Julia implementation
    • With CUDANative.jl
    • With distributability
  • PyTorch implementation (cf https://github.com/kobejean/tf-cfd?)
    • Failed, pytorch has a 3-4x slowdown :/
  • CUDA
    • Explore better simulation-time memory layouts (morton, tiling, SoA, etc. - unlikely that the display layout is the optimal computational layout)
    • D2Q21 or similar - kinda equivalent to doing two D2Q9 timesteps in one go.
    • D3Q19
    • Write newcurr directly and get rid of the double buffer... or maybe this won't help because memory again? Might get some caching benefits though.
    • Try mixed-precision - implemented; 10% gain at the cost of extreme unphysical viscosity

Resources

latticeboltzmann's People

Contributors

cchan avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.