
torchswe's Introduction

TorchSWE: GPU shallow-water equation solver

TorchSWE is a simple parallel (MPI & GPU) SWE solver supporting several different backends: CuPy, PyTorch, and Legate NumPy†.

MPI support is provided through mpi4py and a simple domain decomposition algorithm. For multi-GPU settings (either multiple GPUs on a single node or across a cluster), only MPI + CuPy has been tested. For regular CPU clusters, use MPI + NumPy. For a single GPU, both CuPy and PyTorch work fine. PyTorch also provides shared-memory parallelization on a single CPU compute node.
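The README does not describe the decomposition itself, so the following is only a generic sketch of a one-dimensional block decomposition, not TorchSWE's actual algorithm. The function name `split_cells` and its parameters are hypothetical; it shows how a number of grid cells might be divided as evenly as possible among MPI ranks.

```python
from typing import Tuple


def split_cells(n_cells: int, n_procs: int, rank: int) -> Tuple[int, int]:
    """Return the half-open index range [start, stop) owned by `rank`.

    Hypothetical helper: a generic 1D block decomposition, not the
    scheme TorchSWE implements.
    """
    if n_procs < 1 or not 0 <= rank < n_procs:
        raise ValueError("need n_procs >= 1 and 0 <= rank < n_procs")
    base, extra = divmod(n_cells, n_procs)
    # The first `extra` ranks each take one additional cell, so the
    # block sizes differ by at most one.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

With mpi4py, `rank` and `n_procs` would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`.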

Note
† Legate NumPy backend has been removed from the master branch due to incompatibility with MPI. Also, Legate NumPy lacks some required features at its current stage, making it non-trivial to maintain Legate-compatible code. Therefore, the last version supporting Legate NumPy has been archived to release v0.1.

Installation


Dependencies can be installed using Anaconda. For example, to create a new Anaconda environment called torchswe with all backends (assuming the current directory is the top-level directory of this repository):

$ conda env create -n torchswe -f conda/torchswe.yml

Next, activate the environment:

$ conda activate torchswe

or

$ source ${CONDA_PREFIX}/bin/activate torchswe

Then install TorchSWE with pip:

$ pip install .

It installs an executable, TorchSWE.py, to the bin directory of this Anaconda environment.

Following the above workflow, the MPI backend will be OpenMPI, which is CUDA-aware. Users who want MPICH with multiple GPUs may have to build MPICH from scratch. (The MPICH package from Anaconda's conda-forge channel does not support CUDA.) Using MPICH also requires an MPICH-compatible build of netcdf4. (For example, if using Anaconda, do $ conda install -c conda-forge "netcdf4=*=mpi_mpich*".)

The Anaconda environment created using torchswe.yml does not have dependencies for post-processing/visualizing the results of example cases. These dependencies include matplotlib and pyvista. Users can install them separately or, alternatively, create the Anaconda environment with development.yml.

Example cases


Example cases are under the folder cases.

Usage


To see the help message:

$ TorchSWE.py --help

To run a case:

  • using MPI + NumPy (assuming already in a case folder)

    $ mpiexec -n <number of processes> --mca fcoll "^vulcan" TorchSWE.py ./
    

    Note that, as of OpenMPI 4.1.1, we haven't been able to use the vulcan component of the fcoll framework: it breaks the parallel HDF5 I/O with compression in our code. The flag --mca fcoll "^vulcan" is therefore required to disable this component. With vulcan disabled, the OpenMPI runtime picks another available fcoll component.

  • using MPI + CuPy (assuming already in a case folder)

    $ USE_CUPY=1 mpiexec \
          -n <number of processes> \
          --mca fcoll "^vulcan" \
          --mca opal_cuda_support 1 \
          TorchSWE.py ./
    

    Note that --mca opal_cuda_support 1 is required if OpenMPI is installed through Anaconda. The OpenMPI from Anaconda is built with CUDA support but does not enable it by default.

    When multiple GPUs are available on a compute node, the code assigns GPUs based on local ranks (local to the compute node), not global ranks. The number of processes (i.e., ranks) does not have to match the number of available GPUs. If there are more processes than GPUs, multiple ranks will share GPUs, which may incur a performance penalty.

  • using PyTorch (assuming already in a case folder)

    $ USE_TORCH=1 TorchSWE.py ./
    

    MPI support with the PyTorch backend has not been tested at all, so for now it's better to use only one GPU with the PyTorch backend.

  • using PyTorch's shared-memory CPU backend

    $ USE_TORCH=1 TORCH_USE_CPU=1 TorchSWE.py ./
    

    This runs the solver with shared-memory parallelization from PyTorch and is hence only available when using one computing node.
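The MPI + CuPy note above says that GPUs are assigned by node-local rank and that ranks share GPUs when there are more processes than devices. A minimal sketch of one way to do this follows. The round-robin mapping and the helper names `pick_device` and `get_local_rank` are assumptions for illustration, not TorchSWE's actual code.

```python
def pick_device(local_rank: int, n_gpus: int) -> int:
    """Map a node-local rank to a GPU index.

    Assumption: round-robin mapping, so ranks wrap around and share
    GPUs when there are more ranks than devices on the node.
    """
    if n_gpus < 1:
        raise ValueError("at least one GPU is required")
    return local_rank % n_gpus


def get_local_rank() -> int:
    """Return this process's rank within its compute node (needs mpi4py)."""
    from mpi4py import MPI  # imported lazily so pick_device works without MPI

    # Split the world communicator into one communicator per
    # shared-memory node; the rank in it is the node-local rank.
    local_comm = MPI.COMM_WORLD.Split_type(MPI.COMM_TYPE_SHARED)
    return local_comm.Get_rank()
```

With CuPy, the chosen index could then be activated through `cupy.cuda.Device(index).use()`, with the device count taken from `cupy.cuda.runtime.getDeviceCount()`.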

torchswe's People

Contributors

piyueh


torchswe's Issues

Some dependencies' newer versions break the code

Haven't had time to check which dependencies break the code. For those who just need a working environment, use the following YAML to create the environment:

channels:
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - alsa-lib=1.2.3.2=h166bdaf_0
  - aom=3.3.0=h27087fc_1
  - appdirs=1.4.4=pyh9f0ad1d_0
  - argon2-cffi=21.3.0=pyhd8ed1ab_0
  - argon2-cffi-bindings=21.2.0=py39hb9d737c_2
  - astroid=2.11.4=py39hf3d152e_0
  - asttokens=2.0.5=pyhd8ed1ab_0
  - attrs=21.4.0=pyhd8ed1ab_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - beautifulsoup4=4.11.1=pyha770c72_0
  - bleach=5.0.0=pyhd8ed1ab_0
  - brotli=1.0.9=h166bdaf_7
  - brotli-bin=1.0.9=h166bdaf_7
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2022.6.15=ha878542_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - certifi=2022.6.15=py39hf3d152e_0
  - cffi=1.15.0=py39h4bc2ebd_0
  - cftime=1.6.0=py39hd257fcd_1
  - colorama=0.4.4=pyh9f0ad1d_0
  - cudatoolkit=11.5.1=hcf5317a_10
  - cudnn=8.2.1.32=h86fa8c9_0
  - cupy=10.4.0=py39hc3c280e_0
  - curl=7.83.0=h2283fc2_0
  - cutensor=1.5.0.3=h12f7317_2
  - cycler=0.11.0=pyhd8ed1ab_0
  - dbus=1.13.6=h5008d03_3
  - debugpy=1.6.0=py39h5a03fae_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - dill=0.3.4=pyhd8ed1ab_0
  - double-conversion=3.2.0=h9c3ff4c_0
  - eigen=3.4.0=h4bd325d_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - executing=0.8.3=pyhd8ed1ab_0
  - expat=2.4.8=h27087fc_0
  - fastrlock=0.8=py39h5a03fae_2
  - ffmpeg=4.4.1=hd7ab26d_2
  - flake8=4.0.1=pyhd8ed1ab_2
  - flit-core=3.7.1=pyhd8ed1ab_0
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=hab24e00_0
  - fontconfig=2.14.0=h8e229c2_0
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - fonttools=4.33.3=py39hb9d737c_0
  - freetype=2.10.4=h0708190_1
  - gettext=0.19.8.1=h73d1719_1008
  - giflib=5.2.1=h36c2ea0_2
  - gl2ps=1.4.2=h0708190_0
  - glew=2.1.0=h9c3ff4c_2
  - gmp=6.2.1=h58526e2_0
  - gnutls=3.6.13=h85f3911_1
  - gst-plugins-base=1.20.2=hcf0ee16_0
  - gstreamer=1.20.2=hd4edc92_0
  - h5py=3.6.0=mpi_openmpi_py39hb889842_0
  - hdf4=4.2.15=h10796ff_3
  - hdf5=1.12.1=mpi_openmpi_h41b9b70_4
  - icu=69.1=h9c3ff4c_0
  - imageio=2.19.0=pyhcf75d05_1
  - importlib-metadata=4.11.3=py39hf3d152e_1
  - importlib_resources=5.7.1=pyhd8ed1ab_0
  - iniconfig=1.1.1=pyh9f0ad1d_0
  - ipycanvas=0.12.0=pyhd8ed1ab_0
  - ipyevents=2.0.1=pyhd8ed1ab_0
  - ipykernel=6.13.0=py39hef51801_0
  - ipython=8.3.0=py39hf3d152e_0
  - ipython_genutils=0.2.0=py_1
  - ipyvtklink=0.2.2=pyhd8ed1ab_0
  - ipywidgets=7.7.0=pyhd8ed1ab_0
  - isort=5.10.1=pyhd8ed1ab_0
  - jbig=2.1=h7f98852_2003
  - jedi=0.18.1=py39hf3d152e_1
  - jinja2=3.1.2=pyhd8ed1ab_0
  - jpeg=9e=h166bdaf_1
  - jsoncpp=1.9.5=h4bd325d_1
  - jsonschema=4.5.1=pyhd8ed1ab_0
  - jupyter=1.0.0=py39hf3d152e_7
  - jupyter_client=7.3.1=pyhd8ed1ab_0
  - jupyter_console=6.4.3=pyhd8ed1ab_0
  - jupyter_core=4.9.2=py39hf3d152e_0
  - jupyterlab_pygments=0.2.2=pyhd8ed1ab_0
  - jupyterlab_widgets=1.1.0=pyhd8ed1ab_0
  - keyutils=1.6.1=h166bdaf_0
  - kiwisolver=1.4.2=py39hf939315_1
  - krb5=1.19.3=h08a2579_0
  - lame=3.100=h7f98852_1001
  - lazy-object-proxy=1.7.1=py39hb9d737c_1
  - lcms2=2.12=hddcbb42_0
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - lerc=3.0=h9c3ff4c_0
  - libblas=3.9.0=14_linux64_openblas
  - libbrotlicommon=1.0.9=h166bdaf_7
  - libbrotlidec=1.0.9=h166bdaf_7
  - libbrotlienc=1.0.9=h166bdaf_7
  - libcblas=3.9.0=14_linux64_openblas
  - libcbor=0.9.0=h9c3ff4c_0
  - libclang=13.0.1=default_hc23dcda_0
  - libcurl=7.83.0=h2283fc2_0
  - libdeflate=1.10=h7f98852_0
  - libdrm=2.4.109=h7f98852_0
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libevent=2.1.10=h28343ad_4
  - libffi=3.4.2=h7f98852_5
  - libfido2=1.11.0=he49606f_0
  - libgcc-ng=12.1.0=h8d9b700_16
  - libgfortran-ng=11.2.0=h69a702a_16
  - libgfortran5=11.2.0=h5c6108e_16
  - libglib=2.70.2=h174f98d_4
  - libglu=9.0.0=he1b5a44_1001
  - libgomp=12.1.0=h8d9b700_16
  - libiconv=1.16=h516909a_0
  - liblapack=3.9.0=14_linux64_openblas
  - libllvm13=13.0.1=hf817b99_2
  - libnetcdf=4.8.1=mpi_openmpi_he7012b2_2
  - libnghttp2=1.47.0=he49606f_0
  - libnsl=2.0.0=h7f98852_0
  - libogg=1.3.4=h7f98852_1
  - libopenblas=0.3.20=pthreads_h78a6416_0
  - libopus=1.3.1=h7f98852_1
  - libpciaccess=0.16=h516909a_0
  - libpng=1.6.37=h21135ba_2
  - libpq=14.2=h676c864_0
  - libsodium=1.0.18=h36c2ea0_1
  - libssh2=1.10.0=ha35d2d1_2
  - libstdcxx-ng=12.1.0=ha89aaad_16
  - libtheora=1.1.1=h7f98852_1005
  - libtiff=4.3.0=h542a066_3
  - libudev1=249=h7f98852_1
  - libuuid=2.32.1=h7f98852_1000
  - libva=2.14.0=h7f98852_0
  - libvorbis=1.3.7=h9c3ff4c_0
  - libvpx=1.11.0=h9c3ff4c_3
  - libwebp=1.2.2=h3452ae3_0
  - libwebp-base=1.2.2=h7f98852_1
  - libxcb=1.13=h7f98852_1004
  - libxkbcommon=1.0.3=he3ba5ed_0
  - libxml2=2.9.12=h885dcf4_1
  - libzip=1.8.0=h1c5bbd1_1
  - libzlib=1.2.11=h166bdaf_1014
  - loguru=0.6.0=py39hf3d152e_1
  - lz4-c=1.9.3=h9c3ff4c_1
  - markupsafe=2.1.1=py39hb9d737c_1
  - matplotlib=3.5.2=py39hf3d152e_0
  - matplotlib-base=3.5.2=py39h700656a_0
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - mccabe=0.6.1=py_1
  - mistune=0.8.4=py39h3811e60_1005
  - mpi=1.0=openmpi
  - mpi4py=3.1.3=py39h5418507_1
  - munkres=1.1.4=pyh9f0ad1d_0
  - mysql-common=8.0.29=h26416b9_0
  - mysql-libs=8.0.29=hbc51c84_0
  - nbclient=0.6.2=pyhd8ed1ab_0
  - nbconvert=6.5.0=pyhd8ed1ab_0
  - nbconvert-core=6.5.0=pyhd8ed1ab_0
  - nbconvert-pandoc=6.5.0=pyhd8ed1ab_0
  - nbformat=5.4.0=pyhd8ed1ab_0
  - nccl=2.12.10.1=h0800d71_0
  - ncurses=6.3=h27087fc_1
  - nest-asyncio=1.5.5=pyhd8ed1ab_0
  - netcdf4=1.5.8=mpi_openmpi_py39h957639f_1
  - nettle=3.6=he412f7d_0
  - notebook=6.4.11=pyha770c72_0
  - nspr=4.32=h9c3ff4c_1
  - nss=3.77=h2350873_0
  - numpy=1.23.0=py39hba7629e_0
  - openh264=2.1.1=h780b84a_0
  - openjpeg=2.4.0=hb52868f_1
  - openmpi=4.1.3=h846660c_103
  - openssh=9.0p1=h67c24c5_0
  - openssl=3.0.5=h166bdaf_0
  - orjson=3.6.8=py39hb9d737c_0
  - packaging=21.3=pyhd8ed1ab_0
  - pandoc=2.18=ha770c72_0
  - pandocfilters=1.5.0=pyhd8ed1ab_0
  - parso=0.8.3=pyhd8ed1ab_0
  - pcre=8.45=h9c3ff4c_0
  - pexpect=4.8.0=pyh9f0ad1d_2
  - pickleshare=0.7.5=py_1003
  - pillow=9.1.0=py39hae2aec6_2
  - pip=22.0.4=pyhd8ed1ab_0
  - platformdirs=2.5.1=pyhd8ed1ab_0
  - pluggy=1.0.0=py39hf3d152e_3
  - proj=9.0.0=h93bde94_1
  - prometheus_client=0.14.1=pyhd8ed1ab_0
  - prompt-toolkit=3.0.29=pyha770c72_0
  - prompt_toolkit=3.0.29=hd8ed1ab_0
  - psutil=5.9.0=py39hb9d737c_1
  - pthread-stubs=0.4=h36c2ea0_1001
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pugixml=1.11.4=h9c3ff4c_0
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - py=1.11.0=pyh6c4a22f_0
  - pycodestyle=2.8.0=pyhd8ed1ab_0
  - pycparser=2.21=pyhd8ed1ab_0
  - pydantic=1.9.0=py39hb9d737c_1
  - pyflakes=2.4.0=pyhd8ed1ab_0
  - pygments=2.12.0=pyhd8ed1ab_0
  - pylint=2.13.8=pyhd8ed1ab_0
  - pyparsing=3.0.8=pyhd8ed1ab_0
  - pyqt=5.12.3=py39hf3d152e_8
  - pyqt-impl=5.12.3=py39hde8b62d_8
  - pyqt5-sip=4.19.18=py39he80948d_8
  - pyqtchart=5.12=py39h0fcd23e_8
  - pyqtwebengine=5.12.1=py39h0fcd23e_8
  - pyrsistent=0.18.1=py39hb9d737c_1
  - pytest=7.1.2=py39hf3d152e_0
  - python=3.9.12=h2660328_1_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python-fastjsonschema=2.15.3=pyhd8ed1ab_0
  - python_abi=3.9=2_cp39
  - pyvista=0.34.0=pyhd8ed1ab_0
  - pyyaml=6.0=py39hb9d737c_4
  - pyzmq=22.3.0=py39headdf64_2
  - qt=5.12.9=h1304e3e_6
  - qtconsole=5.3.0=pyhd8ed1ab_0
  - qtconsole-base=5.3.0=pyhd8ed1ab_0
  - qtpy=2.1.0=pyhd8ed1ab_0
  - readline=8.1=h46c0cb4_0
  - scipy=1.8.0=py39hee8e79c_1
  - scooby=0.5.12=pyhd8ed1ab_0
  - send2trash=1.8.0=pyhd8ed1ab_0
  - setuptools=62.1.0=py39hf3d152e_0
  - six=1.16.0=pyh6c4a22f_0
  - soupsieve=2.3.1=pyhd8ed1ab_0
  - sqlite=3.38.5=h4ff8645_0
  - stack_data=0.2.0=pyhd8ed1ab_0
  - svt-av1=0.9.1=h27087fc_0
  - tbb=2021.5.0=h924138e_1
  - tbb-devel=2021.5.0=h924138e_1
  - terminado=0.13.3=py39hf3d152e_1
  - tinycss2=1.1.1=pyhd8ed1ab_0
  - tk=8.6.12=h27826a3_0
  - tomli=2.0.1=pyhd8ed1ab_0
  - tornado=6.1=py39hb9d737c_3
  - traitlets=5.1.1=pyhd8ed1ab_0
  - typing-extensions=4.2.0=hd8ed1ab_1
  - typing_extensions=4.2.0=pyha770c72_1
  - tzdata=2022a=h191b570_0
  - ucx=1.12.1=h7a399c7_1
  - unicodedata2=14.0.0=py39hb9d737c_1
  - utfcpp=3.2.1=ha770c72_0
  - vtk=9.1.0=qt_py39hd359688_207
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - webencodings=0.5.1=py_1
  - wheel=0.37.1=pyhd8ed1ab_0
  - widgetsnbextension=3.6.0=py39hf3d152e_0
  - wrapt=1.14.1=py39hb9d737c_0
  - x264=1!161.3030=h7f98852_1
  - x265=3.5=h924138e_3
  - xorg-fixesproto=5.0=h7f98852_1002
  - xorg-kbproto=1.0.7=h7f98852_1002
  - xorg-libice=1.0.10=h7f98852_0
  - xorg-libsm=1.2.3=hd9c2040_1000
  - xorg-libx11=1.7.2=h7f98852_0
  - xorg-libxau=1.0.9=h7f98852_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xorg-libxext=1.3.4=h7f98852_1
  - xorg-libxfixes=5.0.3=h7f98852_1004
  - xorg-libxt=1.2.1=h7f98852_2
  - xorg-xextproto=7.3.0=h7f98852_1002
  - xorg-xproto=7.0.31=h7f98852_1007
  - xz=5.2.5=h516909a_1
  - yaml=0.2.5=h7f98852_2
  - zeromq=4.3.4=h9c3ff4c_1
  - zipp=3.8.0=pyhd8ed1ab_0
  - zlib=1.2.11=h166bdaf_1014
  - zstd=1.5.2=ha95c52a_0
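To narrow down which dependency breaks the code, it may help to compare the pins above against a newer environment. The sketch below assumes each pin has the form `name=version=build`, as in the list above. The helper names `parse_pins` and `changed_versions` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_pins(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Parse `- name=version=build` entries into {name: (version, build)}."""
    pins = {}
    for line in lines:
        item = line.strip()
        if not item.startswith("- "):
            continue  # skip keys such as `channels:` and `dependencies:`
        parts = item[2:].strip().split("=")
        if len(parts) != 3:
            continue  # skip entries that are not full pins, e.g. channel names
        name, version, build = parts
        pins[name] = (version, build)
    return pins


def changed_versions(old: Dict[str, Tuple[str, str]],
                     new: Dict[str, Tuple[str, str]]) -> List[str]:
    """List packages present in both environments whose versions differ."""
    return sorted(n for n in old if n in new and old[n][0] != new[n][0])
```

The second environment's pins could be obtained by parsing the output of `conda env export` for that environment in the same way.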
