
heffte's People

Contributors

ahojukka5, ax3l, dalg24, dannys4, g-ragghianti, junghans, mabraham, maetveis, memmett, mkstoyanov, philipfackler, sfogerty, stomov, streeve, vmontanaro

heffte's Issues

C++20 support

Check for conflicts between C++20 and the stock backend's aligned vector (and some of the type aliases).
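The issue does not say what the conflict is. One likely candidate, stated here only as an assumption: C++20 removed several std::allocator members that C++17 had deprecated (pointer, reference, rebind, construct, destroy, address, and others), so an aligned allocator that inherits from std::allocator or relies on those members stops compiling. The following is a minimal sketch of an aligned allocator that provides only what std::allocator_traits requires. The names aligned_allocator, aligned_vector, and is_aligned are hypothetical, not heFFTe's, and the aligned operator new it uses needs C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical aligned allocator (not heFFTe's): it uses no std::allocator
// member that was removed in C++20; allocator_traits supplies the rest.
template<typename T, std::size_t Alignment = 64>
struct aligned_allocator {
    using value_type = T;

    // allocator_traits cannot rebind automatically because of the non-type
    // template parameter, so rebind is provided explicitly
    template<typename U> struct rebind { using other = aligned_allocator<U, Alignment>; };

    aligned_allocator() noexcept = default;
    template<typename U>
    aligned_allocator(aligned_allocator<U, Alignment> const&) noexcept {}

    T* allocate(std::size_t n){
        // aligned operator new (C++17) throws std::bad_alloc on failure
        return static_cast<T*>(::operator new(n * sizeof(T), static_cast<std::align_val_t>(Alignment)));
    }
    void deallocate(T* p, std::size_t) noexcept{
        ::operator delete(p, static_cast<std::align_val_t>(Alignment));
    }
};

// all instances are interchangeable: memory from one can be freed by another
template<typename T, typename U, std::size_t A>
bool operator==(aligned_allocator<T, A> const&, aligned_allocator<U, A> const&) noexcept{ return true; }
template<typename T, typename U, std::size_t A>
bool operator!=(aligned_allocator<T, A> const&, aligned_allocator<U, A> const&) noexcept{ return false; }

template<typename T>
using aligned_vector = std::vector<T, aligned_allocator<T>>;

// checks whether a pointer is a multiple of the given alignment
inline bool is_aligned(void const* p, std::size_t alignment){
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Because std::allocator_traits fills in the pointer types, construct, and destroy, the same class compiles under both C++17 and C++20. A C++11 build would need posix_memalign or manual over-allocation in place of aligned operator new.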

Possible issue in the use of oneAPI DFT

I'm working at Intel, using heFFTe for the distributed 3D FFT in GROMACS (see https://gitlab.com/gromacs/gromacs/-/blob/main/src/gromacs/fft/gpu_3dfft_heffte.cpp). heFFTe's unit tests fail on PVC platforms, even with only a single rank and device. The failures are intermittent, which suggests that device memory accesses are not properly synchronized.

Since I'm interested in single-precision R2C 3D FFTs, I focused on the failures in test_fft3d_r2c. Sometimes the result is nearly correct, but the first 20 reals in the output buffer are wrong. Since the box is {3,4,5} and 4*5 == 20, I suspect an indexing error somewhere. If I refactor heffte::onemkl_executor_r2c::forward() and ...backward() so that the q.wait() calls move inside the for loop over blocks (see https://github.com/icl-utk-edu/heffte/blob/master/include/heffte_backend_oneapi.h#L502), I get correct results. The code then looks like this:

    //! \brief Forward transform, single precision.
    void forward(float const indata[], std::complex<float> outdata[], std::complex<float>*) const override{
        if (not init_splan) make_plan(splan);
        for(int i=0; i<blocks; i++)
        {
            oneapi::mkl::dft::compute_forward(splan, const_cast<float*>(indata + i * rblock_stride), reinterpret_cast<float*>(outdata + i * cblock_stride));
            q.wait(); // moved inside the loop: wait for each block before launching the next
        }
    }

If the per-block calls to compute_forward etc. had distinct memory footprints, the extra synchronization should not help. So I think the footprints of the blocks do overlap, perhaps for the same reason that the first 20 reals of the output buffer are wrong. Could the blocks be running concurrently, with the previous block overrunning its bounds and corrupting the computation of the current block?

I can't see any obvious indexing errors, but I'm not a DFT expert. Do you see anything in the code that could be improved? Is there anything you'd like me to test?
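The overlap hypothesis can be checked on the host before touching the device code: compute the element range that each block reads or writes from the block stride and the per-block extent, then test the ranges for intersection. This is a minimal sketch, assuming each block occupies one contiguous, non-empty range. strided_footprints and any_overlap are hypothetical helpers, and the strides and extents to pass in (for example, derived from rblock_stride, cblock_stride, and the strides and distances set on the oneMKL descriptor) would have to be read off the actual plan, which is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open range [first, second) of buffer elements touched by one block.
using footprint = std::pair<long long, long long>;

// Hypothetical helper: footprints of `blocks` batched transforms whose starts
// are `stride` elements apart and which each touch `extent` elements.
std::vector<footprint> strided_footprints(int blocks, long long stride, long long extent){
    std::vector<footprint> result;
    for(int i = 0; i < blocks; i++)
        result.emplace_back(i * stride, i * stride + extent);
    return result;
}

// Returns true if any two (non-empty) footprints share an element. After
// sorting by start offset, an overlap exists exactly when some range starts
// before its predecessor ends, so a single linear pass is enough.
bool any_overlap(std::vector<footprint> fps){
    std::sort(fps.begin(), fps.end());
    for(std::size_t i = 1; i < fps.size(); i++)
        if (fps[i].first < fps[i - 1].second) return true;
    return false;
}
```

If the output footprints computed this way do overlap, the per-block q.wait() would be masking a race between blocks rather than fixing its cause.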

FFTW link error on M1

Hi, I'm trying to build heFFTe 2.2.0 on my Mac (macOS 13.1) with FFTW 3.3.10, using Apple clang 14.0.3.
The FFTW installation is fine, and my build scripts work on Linux.

Here is my build:

bash-3.2$ CXX=clang CC=clang cmake -D CMAKE_BUILD_TYPE=Release -D BUILD_SHARED_LIBS=ON -D CMAKE_INSTALL_PREFIX=$HOME/opal -D Heffte_ENABLE_AVX=OFF -D Heffte_ENABLE_FFTW=ON -D FFTW_ROOT=$HOME/opal -D Heffte_ENABLE_CUDA=OFF ..
fatal: not a git repository (or any of the parent directories): .git

-- heFFTe 2.2.0
-- -D CMAKE_INSTALL_PREFIX=/Users/adelmann/opal
-- -D BUILD_SHARED_LIBS=ON
-- -D CMAKE_BUILD_TYPE=Release
-- -D CMAKE_CXX_FLAGS_RELEASE=-O3 -DNDEBUG
-- -D CMAKE_CXX_FLAGS=
-- -D MPI_CXX_COMPILER=/Users/adelmann/OPAL/bin/mpicxx
-- -D MPI_CXX_COMPILE_OPTIONS=
-- -D Heffte_ENABLE_FFTW=ON
-- -D Heffte_ENABLE_MKL=OFF
-- -D Heffte_ENABLE_CUDA=OFF
-- -D Heffte_ENABLE_ROCM=OFF
-- -D Heffte_ENABLE_ONEAPI=OFF
-- -D Heffte_ENABLE_AVX=OFF
-- -D Heffte_ENABLE_AVX512=OFF
-- -D Heffte_ENABLE_PYTHON=OFF
-- -D Heffte_ENABLE_FORTRAN=OFF
-- -D Heffte_ENABLE_TRACING=OFF

-- Setting Heffte INSTALL_RPATH = /Users/adelmann/OPAL/lib;/Users/adelmann/opal/lib
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/adelmann/opal/tmp/src/heffte
bash-3.2$ make VERBOSE=1
/opt/local/bin/cmake -S/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0 -B/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build --check-build-system CMakeFiles/Makefile.cmake 0
/opt/local/bin/cmake -E cmake_progress_start /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/CMakeFiles /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build//CMakeFiles/progress.marks
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f CMakeFiles/Makefile2 all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f CMakeFiles/Heffte.dir/build.make CMakeFiles/Heffte.dir/depend
cd /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build && /opt/local/bin/cmake -E cmake_depends "Unix Makefiles" /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0 /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0 /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/CMakeFiles/Heffte.dir/DependInfo.cmake --color=
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f CMakeFiles/Heffte.dir/build.make CMakeFiles/Heffte.dir/build
[ 2%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_c.cpp.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/include -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/include -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.1 -std=c++11 -MD -MT CMakeFiles/Heffte.dir/src/heffte_c.cpp.o -MF CMakeFiles/Heffte.dir/src/heffte_c.cpp.o.d -o CMakeFiles/Heffte.dir/src/heffte_c.cpp.o -c /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/src/heffte_c.cpp
[ 4%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_plan_logic.cpp.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/include -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/include -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.1 -std=c++11 -MD -MT CMakeFiles/Heffte.dir/src/heffte_plan_logic.cpp.o -MF CMakeFiles/Heffte.dir/src/heffte_plan_logic.cpp.o.d -o CMakeFiles/Heffte.dir/src/heffte_plan_logic.cpp.o -c /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/src/heffte_plan_logic.cpp
[ 6%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_magma_helpers.cpp.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/include -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/include -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.1 -std=c++11 -MD -MT CMakeFiles/Heffte.dir/src/heffte_magma_helpers.cpp.o -MF CMakeFiles/Heffte.dir/src/heffte_magma_helpers.cpp.o.d -o CMakeFiles/Heffte.dir/src/heffte_magma_helpers.cpp.o -c /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/src/heffte_magma_helpers.cpp
[ 8%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_reshape3d.cpp.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/include -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/include -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.1 -std=c++11 -MD -MT CMakeFiles/Heffte.dir/src/heffte_reshape3d.cpp.o -MF CMakeFiles/Heffte.dir/src/heffte_reshape3d.cpp.o.d -o CMakeFiles/Heffte.dir/src/heffte_reshape3d.cpp.o -c /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/src/heffte_reshape3d.cpp
[ 11%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_fft3d.cpp.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/include -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/include -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.1 -std=c++11 -MD -MT CMakeFiles/Heffte.dir/src/heffte_fft3d.cpp.o -MF CMakeFiles/Heffte.dir/src/heffte_fft3d.cpp.o.d -o CMakeFiles/Heffte.dir/src/heffte_fft3d.cpp.o -c /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/src/heffte_fft3d.cpp
[ 13%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_fft3d_r2c.cpp.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/include -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/include -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.1 -std=c++11 -MD -MT CMakeFiles/Heffte.dir/src/heffte_fft3d_r2c.cpp.o -MF CMakeFiles/Heffte.dir/src/heffte_fft3d_r2c.cpp.o.d -o CMakeFiles/Heffte.dir/src/heffte_fft3d_r2c.cpp.o -c /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/src/heffte_fft3d_r2c.cpp
[ 15%] Linking CXX static library libheffte.a
/opt/local/bin/cmake -P CMakeFiles/Heffte.dir/cmake_clean_target.cmake
/opt/local/bin/cmake -E cmake_link_script CMakeFiles/Heffte.dir/link.txt --verbose=1
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ar qc libheffte.a CMakeFiles/Heffte.dir/src/heffte_c.cpp.o CMakeFiles/Heffte.dir/src/heffte_plan_logic.cpp.o CMakeFiles/Heffte.dir/src/heffte_magma_helpers.cpp.o CMakeFiles/Heffte.dir/src/heffte_reshape3d.cpp.o CMakeFiles/Heffte.dir/src/heffte_fft3d.cpp.o CMakeFiles/Heffte.dir/src/heffte_fft3d_r2c.cpp.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib libheffte.a
[ 15%] Built target Heffte
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f benchmarks/CMakeFiles/speed3d_c2c.dir/build.make benchmarks/CMakeFiles/speed3d_c2c.dir/depend
cd /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build && /opt/local/bin/cmake -E cmake_depends "Unix Makefiles" /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0 /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/benchmarks /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/benchmarks /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/benchmarks/CMakeFiles/speed3d_c2c.dir/DependInfo.cmake --color=
/Applications/Xcode.app/Contents/Developer/usr/bin/make -f benchmarks/CMakeFiles/speed3d_c2c.dir/build.make benchmarks/CMakeFiles/speed3d_c2c.dir/build
[ 17%] Building CXX object benchmarks/CMakeFiles/speed3d_c2c.dir/speed3d_c2c.cpp.o
cd /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/benchmarks && /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/benchmarks/../test -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/include -I/Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/include -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.1 -std=gnu++11 -MD -MT benchmarks/CMakeFiles/speed3d_c2c.dir/speed3d_c2c.cpp.o -MF CMakeFiles/speed3d_c2c.dir/speed3d_c2c.cpp.o.d -o CMakeFiles/speed3d_c2c.dir/speed3d_c2c.cpp.o -c /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0/benchmarks/speed3d_c2c.cpp
[ 20%] Linking CXX executable speed3d_c2c
cd /Users/adelmann/OPAL/tmp/src/heffte/heffte-2.2.0_build/benchmarks && /opt/local/bin/cmake -E cmake_link_script CMakeFiles/speed3d_c2c.dir/link.txt --verbose=1
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=13.1 -Wl,-search_paths_first -Wl,-headerpad_max_install_names -Wl,-flat_namespace CMakeFiles/speed3d_c2c.dir/speed3d_c2c.cpp.o -o speed3d_c2c ../libheffte.a /Users/adelmann/OPAL/lib/libmpicxx.dylib /Users/adelmann/OPAL/lib/libmpi.dylib /Users/adelmann/OPAL/lib/libpmpi.dylib /Users/adelmann/OPAL/lib/libfftw3.dylib /Users/adelmann/OPAL/lib/libfftw3_threads.dylib
Undefined symbols for architecture arm64:
"_fftwf_destroy_plan", referenced from:
heffte::fftw_executor::~fftw_executor() in speed3d_c2c.cpp.o
void heffte::fftw_executor::make_plan<std::__1::complex, (heffte::direction)0>(std::__1::unique_ptr<heffte::plan_fftw<std::__1::complex, (heffte::direction)0>, std::__1::default_delete<heffte::plan_fftw<std::__1::complex, (heffte::direction)0> > >&) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::fftw_executor::make_plan<std::__1::complex, (heffte::direction)1>(std::__1::unique_ptr<heffte::plan_fftw<std::__1::complex, (heffte::direction)1>, std::__1::default_delete<heffte::plan_fftw<std::__1::complex, (heffte::direction)1> > >&) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::fftw_executor_r2c::make_plan<float, (heffte::direction)0>(std::__1::unique_ptr<heffte::plan_fftw<float, (heffte::direction)0>, std::__1::default_delete<heffte::plan_fftw<float, (heffte::direction)0> > >&) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::fftw_executor_r2c::make_plan<float, (heffte::direction)1>(std::__1::unique_ptr<heffte::plan_fftw<float, (heffte::direction)1>, std::__1::default_delete<heffte::plan_fftw<float, (heffte::direction)1> > >&) const in libheffte.a(heffte_fft3d.cpp.o)
heffte::fftw_executor::~fftw_executor() in libheffte.a(heffte_fft3d.cpp.o)
heffte::fftw_executor_r2c::~fftw_executor_r2c() in libheffte.a(heffte_fft3d.cpp.o)
...
"_fftwf_execute_dft", referenced from:
void heffte::fft3d<heffte::backend::fftw, int>::standard_transform(float const*, std::__1::complex, std::__1::complex, std::__1::array<std::__1::unique_ptr<heffte::reshape3d_base, std::__1::default_delete<heffte::reshape3d_base > >, 4ul> const&, std::__1::array<heffte::fftw_executor*, 3ul>, heffte::direction, heffte::scale) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::fft3d<heffte::backend::fftw, int>::standard_transform(std::__1::complex const*, float*, std::__1::complex, std::__1::array<std::__1::unique_ptr<heffte::reshape3d_base, std::__1::default_delete<heffte::reshape3d_base > >, 4ul> const&, std::__1::array<heffte::fftw_executor, 3ul>, heffte::direction, heffte::scale) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::fft3d<heffte::backend::fftw, int>::standard_transform<std::__1::complex >(std::__1::complex const*, std::__1::complex, std::__1::complex, std::__1::array<std::__1::unique_ptr<heffte::reshape3d_base, std::__1::default_delete<heffte::reshape3d_base > >, 4ul> const&, std::__1::array<heffte::fftw_executor*, 3ul>, heffte::direction, heffte::scale) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::fft3d<heffte::backend::fftw, long long>::standard_transform(float const*, std::__1::complex, std::__1::complex, std::__1::array<std::__1::unique_ptr<heffte::reshape3d_base, std::__1::default_delete<heffte::reshape3d_base > >, 4ul> const&, std::__1::array<heffte::fftw_executor*, 3ul>, heffte::direction, heffte::scale) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::fft3d<heffte::backend::fftw, long long>::standard_transform(std::__1::complex const*, float*, std::__1::complex, std::__1::array<std::__1::unique_ptr<heffte::reshape3d_base, std::__1::default_delete<heffte::reshape3d_base > >, 4ul> const&, std::__1::array<heffte::fftw_executor, 3ul>, heffte::direction, heffte::scale) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::fft3d<heffte::backend::fftw, long long>::standard_transform<std::__1::complex >(std::__1::complex const*, std::__1::complex, std::__1::complex, std::__1::array<std::__1::unique_ptr<heffte::reshape3d_base, std::__1::default_delete<heffte::reshape3d_base > >, 4ul> const&, std::__1::array<heffte::fftw_executor*, 3ul>, heffte::direction, heffte::scale) const in libheffte.a(heffte_fft3d.cpp.o)
"_fftwf_execute_dft_c2r", referenced from:
void heffte::real2real_executor<heffte::backend::fftw, heffte::cpu_cos_pre_pos_processor>::backward(float*, float*) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::real2real_executor<heffte::backend::fftw, heffte::cpu_sin_pre_pos_processor>::backward(float*, float*) const in libheffte.a(heffte_fft3d.cpp.o)
"_fftwf_execute_dft_r2c", referenced from:
void heffte::real2real_executor<heffte::backend::fftw, heffte::cpu_cos_pre_pos_processor>::forward(float*, float*) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::real2real_executor<heffte::backend::fftw, heffte::cpu_sin_pre_pos_processor>::forward(float*, float*) const in libheffte.a(heffte_fft3d.cpp.o)
"_fftwf_plan_many_dft", referenced from:
void heffte::fftw_executor::make_plan<std::__1::complex, (heffte::direction)0>(std::__1::unique_ptr<heffte::plan_fftw<std::__1::complex, (heffte::direction)0>, std::__1::default_delete<heffte::plan_fftw<std::__1::complex, (heffte::direction)0> > >&) const in libheffte.a(heffte_fft3d.cpp.o)
void heffte::fftw_executor::make_plan<std::__1::complex, (heffte::direction)1>(std::__1::unique_ptr<heffte::plan_fftw<std::__1::complex, (heffte::direction)1>, std::__1::default_delete<heffte::plan_fftw<std::__1::complex, (heffte::direction)1> > >&) const in libheffte.a(heffte_fft3d.cpp.o)
"_fftwf_plan_many_dft_c2r", referenced from:
void heffte::fftw_executor_r2c::make_plan<float, (heffte::direction)1>(std::__1::unique_ptr<heffte::plan_fftw<float, (heffte::direction)1>, std::__1::default_delete<heffte::plan_fftw<float, (heffte::direction)1> > >&) const in libheffte.a(heffte_fft3d.cpp.o)
"_fftwf_plan_many_dft_r2c", referenced from:
void heffte::fftw_executor_r2c::make_plan<float, (heffte::direction)0>(std::__1::unique_ptr<heffte::plan_fftw<float, (heffte::direction)0>, std::__1::default_delete<heffte::plan_fftw<float, (heffte::direction)0> > >&) const in libheffte.a(heffte_fft3d.cpp.o)
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [benchmarks/speed3d_c2c] Error 1
make[1]: *** [benchmarks/CMakeFiles/speed3d_c2c.dir/all] Error 2
make: *** [all] Error 2

./600-build-heffte: error in setting everything up!

CUDA backend shouldn't require MPI header file

The CUDA backend source file src/heffte_backend_cuda.cu requires the MPI header because of the heFFTe header it includes:

#include "heffte_backend_cuda.h"

This leads to the following chain of headers being included: heffte_backend_cuda.h -> heffte_r2r_executor.h -> heffte_pack3d.h -> heffte_common.h -> heffte_geometry.h -> heffte_utils.h -> mpi.h

This requires the CUDA compiler to have access to the MPI headers, even though heFFTe's CUDA backend code makes no MPI calls.

CUDA_NVCC_FLAGS variable not honored by CMakeLists.txt

After configuring heFFTe, CUDA_NVCC_FLAGS is listed in the configuration output, but it is not passed to the nvcc compiler.

For example, the following commands finish successfully even though CUDA_NVCC_FLAGS is set to an invalid value:

    cmake -B /path/to/build -D Heffte_ENABLE_CUDA=On -D CUDA_NVCC_FLAGS=--no-such-flag /path/to/heffte/source/code
    cd /path/to/build
    make

The only way I found to pass flags to the CUDA compiler was through CMake's standard CMAKE_CUDA_FLAGS variable.

proc_setup_min_surface is missing some edge cases?

It looks like proc_setup_min_surface is missing some edge cases.

    // set initial guess, probably the worst grid but a valid one
    std::valarray<index> best_grid = {1, 1, num_procs};

Is this valid in all cases? For example, for a 1D or 2D FFT the last dimension may have size 1, so a 1 x 1 x num_procs grid would not be valid.

    for(int i=1; i<=num_procs; i++){
        if (num_procs % i == 0){
            int const remainder = num_procs / i;
            for(int j=1; j<=remainder; j++){
                if (remainder % j == 0){
                    std::valarray<index> candidate_grid = {i, j, remainder / j};
                    index const candidate_surface = surface(candidate_grid);
                    if (candidate_surface < best_surface){
                        best_surface = candidate_surface;
                        best_grid    = candidate_grid;
                    }
                }
            }
        }
    }

Here we iterate up to num_procs or num_procs/i. Should we somehow take the size of the simulation domain into account? For example, with the following changes to heffte_example_r2c:

--- a/examples/heffte_example_r2c.cpp
+++ b/examples/heffte_example_r2c.cpp
@@ -34,8 +34,8 @@ void compute_dft(MPI_Comm comm){
 
     // using problem with size 20x20x20 problem, the computed indexes are 11x20x20
     // direction 0 is chosen to reduce the number of indexes
-    heffte::box3d<> real_indexes({0, 0, 0}, {19, 19, 19});
-    heffte::box3d<> complex_indexes({0, 0, 0}, {10, 19, 19});
+    heffte::box3d<> real_indexes({0, 0, 0}, {19, 0, 0});
+    heffte::box3d<> complex_indexes({0, 0, 0}, {10, 0, 0});
 
     // check if the complex indexes have correct dimension
     assert(real_indexes.r2c(r2c_direction) == complex_indexes);
@@ -50,6 +50,11 @@ void compute_dft(MPI_Comm comm){
     // the proc_grid is chosen to minimize the real data, but use for both real and complex cases
     std::array<int, 3> proc_grid = heffte::proc_setup_min_surface(real_indexes, num_ranks);
 
+    // print proc_grid
+    if (me == 0){
+        std::cout << "The processor grid is: " << proc_grid[0] << " x " << proc_grid[1] << " x " << proc_grid[2] << std::endl;
+    }
+
     std::vector<heffte::box3d<>> real_boxes    = heffte::split_world(real_indexes,    proc_grid);
     std::vector<heffte::box3d<>> complex_boxes = heffte::split_world(complex_indexes, proc_grid);

We get the grid 1 x 2 x 4:

using backend: stock
The global input contains 20 real indexes.
The global output contains 11 complex indexes.
The processor grid is: 1 x 2 x 4
rank 0 computed error: 1.144409e-05
rank 1 computed error: 0.000000e+00
rank 2 computed error: 0.000000e+00
rank 3 computed error: 0.000000e+00
rank 4 computed error: 0.000000e+00
rank 5 computed error: 0.000000e+00
rank 6 computed error: 0.000000e+00
rank 7 computed error: 0.000000e+00

Whereas doing it the other way around (r2c_direction = 2, with all indexes along the last dimension):

--- a/examples/heffte_example_r2c.cpp
+++ b/examples/heffte_example_r2c.cpp
@@ -30,12 +30,12 @@ void compute_dft(MPI_Comm comm){
     if (me == 0) std::cout << "using backend: " << heffte::backend::name<backend_tag>() << "\n";
 
     // the dimension where the data will shrink
-    int r2c_direction = 0;
+    int r2c_direction = 2;
 
     // using problem with size 20x20x20 problem, the computed indexes are 11x20x20
     // direction 0 is chosen to reduce the number of indexes
-    heffte::box3d<> real_indexes({0, 0, 0}, {19, 19, 19});
-    heffte::box3d<> complex_indexes({0, 0, 0}, {10, 19, 19});
+    heffte::box3d<> real_indexes({0, 0, 0}, {0, 0, 19});
+    heffte::box3d<> complex_indexes({0, 0, 0}, {0, 0, 10});
 
     // check if the complex indexes have correct dimension
     assert(real_indexes.r2c(r2c_direction) == complex_indexes);
@@ -50,6 +50,11 @@ void compute_dft(MPI_Comm comm){
     // the proc_grid is chosen to minimize the real data, but use for both real and complex cases
     std::array<int, 3> proc_grid = heffte::proc_setup_min_surface(real_indexes, num_ranks);
 
+    // print proc_grid
+    if (me == 0){
+        std::cout << "The processor grid is: " << proc_grid[0] << " x " << proc_grid[1] << " x " << proc_grid[2] << std::endl;
+    }
+
     std::vector<heffte::box3d<>> real_boxes    = heffte::split_world(real_indexes,    proc_grid);
     std::vector<heffte::box3d<>> complex_boxes = heffte::split_world(complex_indexes, proc_grid);

The program crashes:

using backend: stock
The global input contains 20 real indexes.
The global output contains 11 complex indexes.
The processor grid is: 2 x 2 x 2
terminate called after throwing an instance of 'std::runtime_error'
terminate called after throwing an instance of 'std::runtime_error'
terminate called after throwing an instance of 'std::runtime_error'
terminate called after throwing an instance of 'std::runtime_error'
terminate called after throwing an instance of 'std::runtime_error'
  what():  terminate called after throwing an instance of 'std::runtime_error'
  what():  Cannot split the given number of indexes into the given set of mpi-ranks. Most liklely, the number of indexes is too small compared to the number of mpi-ranks.terminate called after throwing an instance of 'std::runtime_error'

I tried to fix this with the following change:

--- a/include/heffte_geometry.h
+++ b/include/heffte_geometry.h
@@ -646,8 +646,8 @@ inline std::array<int, 3> proc_setup_min_surface(box3d<index> const world, int n
     // using valarrays that work much like vectors, but can perform basic
     // point-wise operations such as addition, multiply, and division
     std::valarray<index> all_indexes = {world.size[0], world.size[1], world.size[2]};
-    // set initial guess, probably the worst grid but a valid one
-    std::valarray<index> best_grid = {1, 1, num_procs};
+    // set initial guess
+    std::valarray<index> best_grid = {1, 1, 1};
 
     // internal helper method to compute the surface
     auto surface = [&](std::valarray<index> const &proc_grid)->
@@ -656,18 +656,22 @@ inline std::array<int, 3> proc_setup_min_surface(box3d<index> const world, int n
             return ( box_size * box_size.cshift(1) ).sum();
         };
 
-    index best_surface = surface({1, 1, num_procs});
+    index best_surface = std::numeric_limits<index>::max();
 
-    for(int i=1; i<=num_procs; i++){
+    int const i_max = std::min((index)num_procs, all_indexes[0]);
+    for(int i=1; i<=i_max; i++){
         if (num_procs % i == 0){
-            int const remainder = num_procs / i;
-            for(int j=1; j<=remainder; j++){
-                if (remainder % j == 0){
-                    std::valarray<index> candidate_grid = {i, j, remainder / j};
-                    index const candidate_surface = surface(candidate_grid);
-                    if (candidate_surface < best_surface){
-                        best_surface = candidate_surface;
-                        best_grid    = candidate_grid;
+            int const j_max = std::min((index)(num_procs / i), all_indexes[1]);
+            for(int j=1; j<=j_max; j++){
+                if (j_max % j == 0){
+                    int const k = j_max / j;
+                    if (k <= all_indexes[2]){
+                        std::valarray<index> candidate_grid = {i, j, k};
+                        index const candidate_surface = surface(candidate_grid);
+                        if (candidate_surface < best_surface){
+                            best_surface = candidate_surface;
+                            best_grid    = candidate_grid;
+                        }
                     }
                 }
             }

However, although this modification seems to produce a good processor grid (at least for the 1D corner case), splitting the world still fails.
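One detail in the diff: `k = j_max / j` uses the clamped `j_max` rather than `num_procs / i`. When `all_indexes[1] < num_procs / i`, the candidate grid no longer satisfies `i * j * k == num_procs`. The following is a minimal standalone sketch, not heFFTe code, of a search that keeps the product fixed and rejects any grid that is larger than the world in some dimension. The name `min_surface_grid` is hypothetical. The code assumes that the per-rank box is `world / grid`, which is what the `surface` lambda in the diff appears to compute.

```cpp
#include <array>
#include <cassert>
#include <limits>
#include <stdexcept>

// Hypothetical helper, not part of heFFTe: choose a processor grid {i, j, k}
// with i * j * k == num_procs, no dimension larger than the world extent,
// and the smallest per-rank surface (same metric as the diff above).
std::array<int, 3> min_surface_grid(std::array<long long, 3> const& world, int num_procs) {
    std::array<int, 3> best = {0, 0, 0};
    long long best_surface = std::numeric_limits<long long>::max();
    for (int i = 1; i <= num_procs; i++) {
        if (num_procs % i != 0 || i > world[0]) continue;
        int const remainder = num_procs / i;
        for (int j = 1; j <= remainder; j++) {
            if (remainder % j != 0 || j > world[1]) continue;
            int const k = remainder / j;  // i * j * k == num_procs by construction
            if (k > world[2]) continue;
            // per-rank box size (integer division, like the valarray version)
            long long const bx = world[0] / i, by = world[1] / j, bz = world[2] / k;
            long long const surface = bx * by + by * bz + bz * bx;
            if (surface < best_surface) {
                best_surface = surface;
                best = {i, j, k};
            }
        }
    }
    if (best[0] == 0)
        throw std::runtime_error("no processor grid fits inside the world extents");
    assert(best[0] * best[1] * best[2] == num_procs);
    return best;
}
```

For a 20 x 1 x 1 world on 8 ranks, the only admissible grid is 8 x 1 x 1. A grid that fits the real input may still need to be checked against the smaller complex output box before `split_world` is called.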

Linking error with fftw3_omp

Building heFFTe 2.4.0 on LUMI fails at link time with the following configuration:

module load LUMI/23.09 partition/C
module load cray-python cray-fftw
cmake -S heffte-2.4.0-src -B heffte-2.4.0-build -DCMAKE_BUILD_TYPE=Release -D Heffte_ENABLE_AVX=ON -D Heffte_ENABLE_FFTW=ON -D Heffte_ENABLE_PYTHON=ON -D CMAKE_CXX_COMPILER=CC -D MPI_CXX_COMPILER=CC
[  1%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_c.cpp.o
[  3%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_plan_logic.cpp.o
[  5%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_magma_helpers.cpp.o
[  6%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_reshape3d.cpp.o
[  8%] Building CXX object CMakeFiles/Heffte.dir/src/heffte_compute_transform.cpp.o
[ 10%] Linking CXX shared library libheffte.so
[ 10%] Built target Heffte
[ 12%] Building CXX object benchmarks/CMakeFiles/convolution.dir/convolution.cpp.o
[ 13%] Linking CXX executable convolution
ld.lld: error: undefined reference due to --no-allow-shlib-undefined: omp_get_thread_num
>>> referenced by /opt/cray/pe/fftw/3.3.10.5/x86_milan/lib/libfftw3_omp.so

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: omp_get_num_threads
>>> referenced by /opt/cray/pe/fftw/3.3.10.5/x86_milan/lib/libfftw3_omp.so

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: GOMP_parallel
>>> referenced by /opt/cray/pe/fftw/3.3.10.5/x86_milan/lib/libfftw3_omp.so

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: omp_get_thread_num
>>> referenced by /opt/cray/pe/fftw/3.3.10.5/x86_milan/lib/libfftw3f_omp.so

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: omp_get_num_threads
>>> referenced by /opt/cray/pe/fftw/3.3.10.5/x86_milan/lib/libfftw3f_omp.so

With a small modification to FindHeffteFFTW.cmake, the build succeeds:

# respect user provided FFTW_LIBRARIES
if (NOT FFTW_LIBRARIES)
    heffte_find_fftw_libraries(
        PREFIX ${FFTW_ROOT}
        VAR FFTW_LIBRARIES
        REQUIRED "fftw3" "fftw3f"
        OPTIONAL "fftw3_omp" "fftw3f_omp" "fftw3_threads" "fftw3f_threads"
    )
#   if ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
#       find_package(OpenMP REQUIRED)
#       list(APPEND FFTW_LIBRARIES ${OpenMP_CXX_LIBRARIES})
#   else()
#       if ("fftw3_omp" IN_LIST FFTW_LIBRARIES)
#           list(APPEND FFTW_LIBRARIES "-lgomp")
#       endif()
#   endif()

    find_package(OpenMP REQUIRED)
    if (OpenMP_FOUND)
        list(APPEND FFTW_LIBRARIES OpenMP::OpenMP_CXX)
    endif()

endif()

Before submitting a PR, I'd like to understand the rationale behind the if-else block and whether the problem should be fixed in some other way.

Method to calculate modes?

Is there a method to calculate the modes (wavenumbers) in Fourier space?
I'd like to compute power spectra, which requires binning the data in Fourier space by wavenumber.
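The following is a minimal sketch that does not use the heFFTe API. It computes the signed wavenumber of each output index with the usual FFT ordering (0, 1, ..., then the negative frequencies) and bins |X(k)|^2 radially. The function names are hypothetical. The code assumes the following:

* The local block is described by inclusive global `low`/`high` indexes.
* The local block is stored with the first index fastest.
* The transform is a full complex-to-complex transform.

For an r2c transform, the halved dimension holds only the non-negative wavenumbers 0..n/2. Modes strictly between 0 and n/2 in that dimension stand for two conjugate modes, so they would need a weight of 2; the sketch ignores this. With MPI, each rank would bin its own block, and the partial bins would then be summed, for example with `MPI_Allreduce` and `MPI_SUM`.

```cpp
#include <array>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Signed integer wavenumber of FFT index i along a dimension of n points:
// 0, 1, ..., (n-1)/2, then the negative frequencies -n/2, ..., -1.
inline long long wavenumber(long long i, long long n) {
    return (i < (n + 1) / 2) ? i : i - n;
}

// Hypothetical helper (not a heFFTe API): accumulate |X(k)|^2 into radial bins
// of width 1 for a local block of a complex-to-complex transform.
// low/high: inclusive global indexes of the block; n: global grid size;
// data: block values, assumed stored with the first index fastest.
std::vector<double> radial_power_spectrum(std::vector<std::complex<double>> const& data,
                                          std::array<long long, 3> const& low,
                                          std::array<long long, 3> const& high,
                                          std::array<long long, 3> const& n) {
    // |k_d| <= n_d / 2 in each dimension, which bounds the largest radius
    double rmax2 = 0.0;
    for (int d = 0; d < 3; d++) rmax2 += double(n[d] / 2) * double(n[d] / 2);
    std::vector<double> bins(static_cast<std::size_t>(std::sqrt(rmax2)) + 2, 0.0);

    std::size_t idx = 0;
    for (long long k = low[2]; k <= high[2]; k++) {
        for (long long j = low[1]; j <= high[1]; j++) {
            for (long long i = low[0]; i <= high[0]; i++, idx++) {
                double const kx = wavenumber(i, n[0]);
                double const ky = wavenumber(j, n[1]);
                double const kz = wavenumber(k, n[2]);
                // nearest-integer radial bin
                auto const b = static_cast<std::size_t>(std::lround(std::sqrt(kx * kx + ky * ky + kz * kz)));
                bins[b] += std::norm(data[idx]);  // |X|^2
            }
        }
    }
    return bins;
}
```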

Fuse packing/unpacking kernels for reshape3d_alltoall

Currently, reshape3d_alltoall for N ranks runs N packing kernels before the MPI_Alltoall and N unpacking kernels after it. As the rank count grows, the overhead of launching and waiting on those kernels grows linearly with N. In sufficiently regular cases, the loop over ranks in heffte::reshape3d_alltoall::apply_base() can be moved into the device kernel. I have working SYCL code that does this and shows a clear performance improvement even for small N. Would you consider incorporating this optimization if I contribute it?
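The following is a host-side sketch of what "lowering the loop over ranks into the kernel" can mean in one regular case. It does not reproduce heFFTe's `reshape3d_alltoall` internals or the SYCL code mentioned above. The sketch makes the following assumptions:

* The local block is nx x ny x nz, with x fastest.
* Rank r receives the x-slab `[r*w, (r+1)*w)`, where `w = nx / N`.
* The send buffer is rank-contiguous.

Each flat work item `t` decodes its own destination rank, so a device could pack every rank's block in one launch instead of N launches. The name `fused_pack` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: pack a local nx*ny*nz block (x fastest) into a
// rank-contiguous send buffer, where rank r gets the x-slab [r*w, (r+1)*w).
// The loop body is what a single fused device kernel would run per work item.
template <typename T>
void fused_pack(std::vector<T> const& in, std::vector<T>& out,
                std::size_t nx, std::size_t ny, std::size_t nz, std::size_t nranks) {
    assert(nx % nranks == 0);               // the "sufficiently regular" case
    std::size_t const w = nx / nranks;      // slab width per destination rank
    std::size_t const chunk = w * ny * nz;  // elements sent to each rank
    out.resize(nx * ny * nz);
    for (std::size_t t = 0; t < nx * ny * nz; t++) {
        std::size_t const r = t / chunk;    // destination rank of this item
        std::size_t const e = t % chunk;    // offset inside rank r's block
        std::size_t const xi = e % w;
        std::size_t const y = (e / w) % ny;
        std::size_t const z = e / (w * ny);
        out[t] = in[(z * ny + y) * nx + r * w + xi];
    }
}
```

Unpacking is the same index map with source and destination swapped. Irregular splits would need per-rank offsets, for example a prefix sum over the send counts, instead of the fixed `chunk`.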

Configure error on Fedora with MKL

CMake Error at cmake/FindHeffteMKL.cmake:59 (message):
  Could not find required mkl component: mkl_intel_ilp64
Call Stack (most recent call first):
  cmake/FindHeffteMKL.cmake:86 (heffte_find_mkl_libraries)
  CMakeLists.txt:159 (find_package)

The build is configured with:

-DMKL_ROOT=/opt/intel/oneapi/mkl/latest
-DHeffte_MKL_IOMP5=/opt/intel/oneapi/compiler/latest/linux/compiler/lib/intel64_lin/libiomp5.so

See https://github.com/ECP-copa/Cabana/actions/runs/7062115458/job/19225227534?pr=720

Multi stream calculation error

Hello, I built heFFTe 2.3 with the ROCm backend to run the unit tests, and the heffte_streams test reports the following error:

Start testing: Aug 16 15:16 CST

12/22 Testing: heffte_streams_np6
12/22 Test: heffte_streams_np6
Command: "/public/home/knight_wp/openmpi-5.0.0rc12/install/bin/mpiexec" "-n" "6" "/public/home/knight_wp/heffte-2.3.0/build/test/test_streams"
Directory: /public/home/knight_wp/heffte-2.3.0/build/test
"heffte_streams_np6" start time: Aug 16 15:16 CST
Output:


                          heffte::fft streams

-------------------------------------------------------------------------------
ccomplex -np 6 test heffte::fft3d (stream) pass
zcomplex -np 6 test heffte::fft3d (stream) pass
float -np 6 test heffte::fft3d_r2c (stream) pass
double -np 6 test heffte::fft3d_r2c (stream) pass
error magnitude: 0.294887
error magnitude: 0.29489
error magnitude: 0.294889
error magnitude: 0.294887
terminate called after throwing an instance of 'std::runtime_error'
what(): mpi rank = 0 test -np 6 test heffte::fft3d (stream) in file: /public/home/knight_wp/heffte-2.3.0/test/test_fft3d.h line: 283
[a06r3n04:15025] *** Process received signal ***
[a06r3n04:15025] Signal: Aborted (6)
[a06r3n04:15025] Signal code: (-6)
[a06r3n04:15025] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2b84f305b5d0]
[a06r3n04:15025] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b84fcd57207]
[a06r3n04:15025] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b84fcd588f8]
[a06r3n04:15025] [ 3] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0x99203)[0x2b84fc9e0203]
[a06r3n04:15025] [ 4] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4c76)[0x2b84fc9ebc76]
[a06r3n04:15025] [ 5] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4ce1)[0x2b84fc9ebce1]
[a06r3n04:15025] [ 6] terminate called after throwing an instance of 'std::runtime_error'
/public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4f35)[0x2b84fc9ebf35]
[a06r3n04:15025] [ 7] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x42c167]
[a06r3n04:15025] [ 8] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bee0]
[a06r3n04:15025] [ 9] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bb30]
[a06r3n04:15025] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b84fcd433d5]
[a06r3n04:15025] [11] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40b0f8]
[a06r3n04:15025] *** End of error message ***
what(): mpi rank = 4 test -np 6 test heffte::fft3d (stream) in file: /public/home/knight_wp/heffte-2.3.0/test/test_fft3d.h line: 283
terminate called after throwing an instance of 'std::runtime_error'
[a06r3n04:15029] *** Process received signal ***
[a06r3n04:15029] Signal: Aborted (6)
[a06r3n04:15029] Signal code: (-6)
what(): mpi rank = 3 test -np 6 test heffte::fft3d (stream) in file: /public/home/knight_wp/heffte-2.3.0/test/test_fft3d.h line: 283
[a06r3n04:15028] *** Process received signal ***
[a06r3n04:15028] Signal: Aborted (6)
[a06r3n04:15028] Signal code: (-6)
[a06r3n04:15029] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2ab27ffc95d0]
[a06r3n04:15029] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2ab289cc5207]
[a06r3n04:15029] [ 2] [a06r3n04:15028] [ 0] /lib64/libc.so.6(abort+0x148)[0x2ab289cc68f8]
[a06r3n04:15029] [ 3] /lib64/libpthread.so.0(+0xf5d0)[0x2ae3196a55d0]
[a06r3n04:15028] [ 1] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0x99203)[0x2ab28994e203]
[a06r3n04:15029] [ 4] /lib64/libc.so.6(gsignal+0x37)[0x2ae3233a1207]
[a06r3n04:15028] [ 2] /lib64/libc.so.6(abort+0x148)[0x2ae3233a28f8]
[a06r3n04:15028] [ 3] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4c76)[0x2ab289959c76]
[a06r3n04:15029] [ 5] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4ce1)[0x2ab289959ce1]
[a06r3n04:15029] [ 6] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0x99203)[0x2ae32302a203]
[a06r3n04:15028] [ 4] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4f35)[0x2ab289959f35]
[a06r3n04:15029] [ 7] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x42c167]
[a06r3n04:15029] [ 8] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bee0]
/public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4c76)[0x2ae323035c76]
[a06r3n04:15028] [ 5] terminate called after throwing an instance of 'std::runtime_error'
[a06r3n04:15029] [ 9] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bb30]
[a06r3n04:15029] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab289cb13d5]
[a06r3n04:15029] [11] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40b0f8]
[a06r3n04:15029] *** End of error message ***
/public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4ce1)[0x2ae323035ce1]
[a06r3n04:15028] [ 6] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4f35)[0x2ae323035f35]
[a06r3n04:15028] [ 7] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x42c167]
[a06r3n04:15028] [ 8] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bee0]
[a06r3n04:15028] [ 9] what(): mpi rank = 1 test -np 6 test heffte::fft3d (stream) in file: /public/home/knight_wp/heffte-2.3.0/test/test_fft3d.h line: 283
/public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bb30]
[a06r3n04:15028] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ae32338d3d5]
[a06r3n04:15028] [11] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40b0f8]
[a06r3n04:15028] *** End of error message ***
[a06r3n04:15026] *** Process received signal ***
[a06r3n04:15026] Signal: Aborted (6)
[a06r3n04:15026] Signal code: (-6)
[a06r3n04:15026] [ 0] terminate called after throwing an instance of 'std::runtime_error'
terminate called after throwing an instance of 'std::runtime_error'
/lib64/libpthread.so.0(+0xf5d0)[0x2ad73a2775d0]
[a06r3n04:15026] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2ad743f73207]
[a06r3n04:15026] [ 2] what(): mpi rank = 2 test -np 6 test heffte::fft3d (stream) in file: /public/home/knight_wp/heffte-2.3.0/test/test_fft3d.h line: 283
/lib64/libc.so.6(abort+0x148)[0x2ad743f748f8]
[a06r3n04:15026] [ 3] what(): mpi rank = 5 test -np 6 test heffte::fft3d (stream) in file: /public/home/knight_wp/heffte-2.3.0/test/test_fft3d.h line: 283
/public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0x99203)[0x2ad743bfc203]
[a06r3n04:15026] [ 4] [a06r3n04:15027] *** Process received signal ***
[a06r3n04:15027] Signal: Aborted (6)
[a06r3n04:15027] Signal code: (-6)
/public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4c76)[0x2ad743c07c76]
[a06r3n04:15026] [ 5] [a06r3n04:15030] *** Process received signal ***
[a06r3n04:15030] Signal: Aborted (6)
[a06r3n04:15030] Signal code: (-6)
/public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4ce1)[0x2ad743c07ce1]
[a06r3n04:15026] [ 6] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4f35)[0x2ad743c07f35]
[a06r3n04:15026] [ 7] [a06r3n04:15027] [ 0] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x42c167]
[a06r3n04:15026] [ 8] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bee0]
[a06r3n04:15026] [ 9] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bb30]
[a06r3n04:15026] /lib64/libpthread.so.0(+0xf5d0)[0x2b9587dc05d0]
[a06r3n04:15027] [ 1] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ad743f5f3d5]
[a06r3n04:15026] [11] /lib64/libc.so.6(gsignal+0x37)[0x2b9591abc207]
[a06r3n04:15027] [ 2] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40b0f8]
[a06r3n04:15026] *** End of error message ***
[a06r3n04:15030] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2b61769615d0]
[a06r3n04:15030] [ 1] /lib64/libc.so.6(abort+0x148)[0x2b9591abd8f8]
[a06r3n04:15027] [ 3] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0x99203)[0x2b9591745203]
[a06r3n04:15027] [ 4] /lib64/libc.so.6(gsignal+0x37)[0x2b618065d207]
[a06r3n04:15030] [ 2] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4c76)[0x2b9591750c76]
[a06r3n04:15027] [ 5] /lib64/libc.so.6(abort+0x148)[0x2b618065e8f8]
[a06r3n04:15030] [ 3] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0x99203)[0x2b61802e6203]
[a06r3n04:15030] [ 4] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4ce1)[0x2b9591750ce1]
[a06r3n04:15027] [ 6] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4c76)[0x2b61802f1c76]
[a06r3n04:15030] [ 5] /public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4f35)[0x2b9591750f35]
[a06r3n04:15027] [ 7] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x42c167]
[a06r3n04:15027] [ 8] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bee0]
/public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4ce1)[0x2b61802f1ce1]
[a06r3n04:15030] [ 6] [a06r3n04:15027] [ 9] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bb30]
[a06r3n04:15027] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b9591aa83d5]
[a06r3n04:15027] [11] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40b0f8]
[a06r3n04:15027] *** End of error message ***
/public/software/compiler/gnu/gcc-9.3.0/lib64/libstdc++.so.6(+0xa4f35)[0x2b61802f1f35]
[a06r3n04:15030] [ 7] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x42c167]
[a06r3n04:15030] [ 8] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bee0]
[a06r3n04:15030] [ 9] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40bb30]
[a06r3n04:15030] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b61806493d5]
[a06r3n04:15030] [11] /public/home/knight_wp/heffte-2.3.0/build/test/test_streams[0x40b0f8]
[a06r3n04:15030] *** End of error message ***

prterun noticed that process rank 0 with PID 0 on node a06r3n04 exited on signal 6 (Aborted).

Test time = 7.80 sec
----------------------------------------------------------
Test Failed.
"heffte_streams_np6" end time: Aug 16 15:16 CST
"heffte_streams_np6" time elapsed: 00:00:07
----------------------------------------------------------

End testing: Aug 16 15:16 CST

When I moved the computation from multiple streams onto the default stream, the unit test passed. Is there a dependency between the streams in the program that causes operations to execute in the wrong order?
Also, my CMake command for building heFFTe is:
cmake .. -DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=ON \
-DCMAKE_INSTALL_PREFIX=/public/home/knight_wp/heffte-2.3.0/install/ \
-DHeffte_ENABLE_AVX=ON \
-DHeffte_ENABLE_ROCM=ON \
-DCMAKE_CXX_COMPILER=hipcc \
-DHeffte_ROCM_ROOT=/public/software/compiler/rocm/dtk-23.04

MPICH + CUDA

Some tests (e.g., the long long tests) fail when using MPICH with CUDA-aware GPU communication.

Conda Package for heFFTe

Hi,

We are in the process of integrating heFFTe into the mainline of WarpX. As part of this, we want to make sure we can compile our desktop distributions (Linux/macOS/Windows) with all features.

We distribute for desktop via conda using the conda-forge packaging index.

Since heFFTe does not yet have a package in conda-forge, I will add one. Please let me know if you would like to be listed as co-maintainers: conda-forge/staged-recipes#26633

(We also ship via other package managers, such as Spack for desktop and HPC, which already has a heFFTe package.)
