kokkos-fft's Introduction

kokkos-fft

Warning

UNOFFICIAL FFT interfaces for Kokkos C++ Performance Portability Programming EcoSystem

Kokkos-fft implements local interfaces between Kokkos and de facto standard FFT libraries, including fftw, cufft, hipfft (rocfft), and oneMKL. "Local" means that MPI is not used, or that the library runs within a single MPI process without any knowledge of MPI. We aim to provide numpy.fft-like interfaces adapted for Kokkos. The key concept is "as easy as numpy, as fast as vendor libraries". Accordingly, our API follows the numpy.fft API with minor differences. The FFT library dedicated to the Kokkos device backend (e.g. cufft for the CUDA backend) is used automatically. If runtime values (say, View extents) are invalid, runtime errors are raised (C++ exceptions or assertions). See the documentation for more information.

Here is an example of a 1D real-to-complex transform with rfft in Kokkos-fft.

#include <Kokkos_Core.hpp>
#include <Kokkos_Complex.hpp>
#include <Kokkos_Random.hpp>
#include <KokkosFFT.hpp>
using execution_space = Kokkos::DefaultExecutionSpace;
template <typename T> using View1D = Kokkos::View<T*, execution_space>;
constexpr int n = 4;

View1D<double> x("x", n);
View1D<Kokkos::complex<double> > x_hat("x_hat", n/2+1);

Kokkos::Random_XorShift64_Pool<> random_pool(12345);
Kokkos::fill_random(x, random_pool, 1);
Kokkos::fence();

KokkosFFT::rfft(execution_space(), x, x_hat);

This is equivalent to the following Python code.

import numpy as np
x = np.random.rand(4)
x_hat = np.fft.rfft(x)

There are two major differences: an execution_space argument is required, and the output (x_hat) is passed as an argument to the API rather than returned from it. Kokkos-fft accepts only Kokkos Views as input data. Whether the Views are accessible from execution_space is checked statically (a compilation error is raised if they are not).
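To make the n/2+1 output extent in the example concrete, here is a minimal reference sketch of a real-to-complex DFT in plain C++. It does not use Kokkos or Kokkos-fft: the function name naive_rfft is hypothetical, the algorithm is the naive O(n^2) sum rather than a fast transform, and the unscaled forward transform is assumed to correspond to the numpy "backward" normalization.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(n^2) reference for a 1D real-to-complex DFT (hypothetical helper,
// not part of Kokkos-fft). Only the first n/2+1 coefficients are computed,
// because the remaining ones follow from conjugate symmetry for real input.
inline std::vector<std::complex<double>> naive_rfft(const std::vector<double>& x) {
  const std::size_t n = x.size();
  assert(n > 0);
  const double pi = std::acos(-1.0);
  std::vector<std::complex<double>> x_hat(n / 2 + 1);
  for (std::size_t k = 0; k < x_hat.size(); ++k) {
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
      const double angle =
          -2.0 * pi * static_cast<double>(k * j) / static_cast<double>(n);
      sum += x[j] * std::complex<double>(std::cos(angle), std::sin(angle));
    }
    x_hat[k] = sum;  // no scaling on the forward transform
  }
  return x_hat;
}
```

For an input of length 4, the result has 4/2+1 = 3 complex entries, which matches the extent of x_hat in the example above. Such a reference is only useful for checking small cases by hand.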

Depending on the View dimension, batched plans are used automatically, as in the following example.

#include <Kokkos_Core.hpp>
#include <Kokkos_Complex.hpp>
#include <Kokkos_Random.hpp>
#include <KokkosFFT.hpp>
using execution_space = Kokkos::DefaultExecutionSpace;
template <typename T> using View2D = Kokkos::View<T**, execution_space>;
constexpr int n0 = 4, n1 = 8;

View2D<double> x("x", n0, n1);
View2D<Kokkos::complex<double> > x_hat("x_hat", n0, n1/2+1);

Kokkos::Random_XorShift64_Pool<> random_pool(12345);
Kokkos::fill_random(x, random_pool, 1);
Kokkos::fence();

int axis = -1;
KokkosFFT::rfft(execution_space(), x, x_hat, KokkosFFT::Normalization::backward, axis); // FFT along -1 axis and batched along 0th axis

This is equivalent to

import numpy as np
x = np.random.rand(4, 8)
x_hat = np.fft.rfft(x, axis=-1)

In this example, a batched 1D rfft is executed over a 2D View along axis -1. Some basic examples can be found in examples.
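The handling of a negative axis and the resulting output extents can be sketched in plain C++ as follows. Both helper names (normalize_axis, rfft_output_extents) are hypothetical and only mirror the numpy convention; they are not taken from the Kokkos-fft sources.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Map a possibly negative axis onto [0, rank), following the numpy convention
// (hypothetical helper, not the Kokkos-fft implementation).
inline std::size_t normalize_axis(int axis, std::size_t rank) {
  const int r = static_cast<int>(rank);
  if (axis < -r || axis >= r) throw std::out_of_range("axis is out of range");
  return static_cast<std::size_t>(axis < 0 ? axis + r : axis);
}

// Extents of the complex output of a real-to-complex transform along `axis`:
// the transformed extent shrinks to n/2+1, all other (batch) extents are kept.
inline std::vector<std::size_t> rfft_output_extents(
    std::vector<std::size_t> extents, int axis) {
  const std::size_t a = normalize_axis(axis, extents.size());
  extents[a] = extents[a] / 2 + 1;
  return extents;
}
```

With extents {4, 8} and axis -1, the axis resolves to 1 and the output extents are {4, 5}, which agrees with the x_hat("x_hat", n0, n1/2+1) allocation in the example.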

Disclaimer

KokkosFFT is under development and subject to change without warning. The authors do not guarantee that this code runs correctly in all environments.

Using KokkosFFT

For the moment, there are two ways to use Kokkos-fft: including it as a subdirectory in a CMake project, or installing it as a library. First of all, you need to clone this repository.

git clone --recursive https://github.com/kokkos/kokkos-fft.git

Prerequisites

To use Kokkos-fft, we need the following:

  • CMake 3.22+
  • Kokkos 4.2+
  • gcc 8.3.0+ (CPUs)
  • IntelLLVM 2023.0.0+ (CPUs, Intel GPUs)
  • nvcc 11.0.0+ (NVIDIA GPUs)
  • rocm 5.3.0+ (AMD GPUs)

CMake

Since Kokkos-fft is a header-only library, it is enough to simply add it as a subdirectory. It is assumed that kokkos and Kokkos-fft are placed under <project_directory>/tpls.

Here is an example of using Kokkos-fft in a CMake project with the following layout.

---/
 |
 └──<project_directory>/
    |--tpls
    |    |--kokkos/
    |    └──kokkos-fft/
    |--CMakeLists.txt
    └──hello.cpp

The CMakeLists.txt would be

cmake_minimum_required(VERSION 3.23)
project(kokkos-fft-as-subdirectory LANGUAGES CXX)

add_subdirectory(tpls/kokkos)
add_subdirectory(tpls/kokkos-fft)

add_executable(hello-kokkos-fft hello.cpp)
target_link_libraries(hello-kokkos-fft PUBLIC Kokkos::kokkos KokkosFFT::fft)

For compilation, we basically rely on the CMake options for Kokkos. For example, the compile options for an A100 GPU are as follows.

cmake -B build \
      -DCMAKE_CXX_COMPILER=g++ \
      -DCMAKE_BUILD_TYPE=Release \
      -DKokkos_ENABLE_CUDA=ON \
      -DKokkos_ARCH_AMPERE80=ON
cmake --build build -j 8

This way, all the functionality is executed on A100 GPUs. For installation, details are provided in the documentation.

LICENCE

Kokkos-FFT is distributed under either the MIT license, or at your option, the Apache-2.0 licence with LLVM exception.

kokkos-fft's People

Contributors

helloworld922, jbigot, pzehner, rb214678, yasahi-hpc

kokkos-fft's Issues

Pre-build Kokkos on CI

Build and install Kokkos on Release mode in the images.

Kokkos would then be discovered as an installed library; backend parameters should be deduced rather than passed again.

The test example build at the end of the workflow should properly use this installed version of Kokkos.

Optionally, KokkosFFT_INTERNAL_Kokkos should be removed.

Cannot compile if CMAKE_BUILD_TYPE is not set or set to Debug

When building with CMAKE_BUILD_TYPE unset or set to Debug, I got the following error:

/local/home/user/Projets/kokkos-fft/fft/src/KokkosFFT_Helpers.hpp(27): error: no instance of function template "KokkosFFT::Impl::is_out_of_range_value_included" matches the argument list
            argument types are: (std::vector<int, std::allocator<int>>, const int)
   !KokkosFFT::Impl::is_out_of_range_value_included(axes, rank)

Pertinence of Kokkos_Version_Check

I think the feature in cmake/Kokkos_Version_Check.cmake could easily be replaced by the standard version option of CMake's find_package:

find_package(Kokkos 4.2.0 REQUIRED)

Maybe we don't need to specify the version up to the patch level (i.e. 4.2 should be enough).

Otherwise, cmake/Kokkos_Version_Check.cmake is invalid as written and should be changed as follows:

diff --git a/cmake/Kokkos_Version_Check.cmake b/cmake/Kokkos_Version_Check.cmake
index 3f3f5cb..98a38f5 100644
--- a/cmake/Kokkos_Version_Check.cmake
+++ b/cmake/Kokkos_Version_Check.cmake
@@ -1,11 +1,11 @@
 function(check_minimum_required_kokkos kokkos_required_version)
-    if(${Kokkos_VERSION} STREQUAL "")
+    if(Kokkos_VERSION STREQUAL "")
         message(FATAL_ERROR "Kokkos_VERSION not set. Cannot check Kokkos satisfies the minimum required version.")
     else()
-        if(${Kokkos_VERSION} VERSION_GREATER_EQUAL ${kokkos_required_version})
+        if(Kokkos_VERSION VERSION_GREATER_EQUAL ${kokkos_required_version})
             message(STATUS "Found Kokkos version ${Kokkos_VERSION} at ${Kokkos_DIR}")
         else()
             message(FATAL_ERROR "Kokkos FFT ${KOKKOSFFT_VERSION} requires ${kokkos_required_version} or later.")
         endif()
     endif()

Kokkos version check

Hi all, another issue appears to be the lack of a check on the Kokkos version being used when building against a pre-installed Kokkos library.
I suspect that, for the time being, it works with 4.2.0 and maybe as far back as 4.0.0, but likely not against anything older than that.
It would be good to add something in CMakeLists.txt to warn users if they attempt to install against an unsupported version.
Also updating the documentation to reflect what is actually supported would be great!

Explore existing DDC implementation

Summary

Before starting development, we should start looking at what's already been done in DDC

End-User Goal

Business Goal

Acceptance Criteria

Measurement of Success

We understand the existing implementation & its choices.

Tasks

Epic FFT Goal

Description

The goal is to design a performance portable FFT library with a Kokkos-compatible interface.

At first we target shared memory and exclude distributed memory.

Related User Stories

API design

Summary

As a code developer, I want to know the list of features supported by all back-ends, on:

  • CPU,
  • AMD GPU,
  • Nvidia GPU,
  • Intel GPU.

End-User Goal

  • I'd like a list of function calls to be available that only differ in terms of template parameters describing the execution space and memory space, and that I can use on all four back-ends.
  • I'd accept calls to not be available when the memory space where my data resides is not accessible from the execution space I select.
  • I'd expect the configuration options to be similar to those of Kokkos, so that by default I don't have much to do to use the FFT when already using Kokkos.
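The second goal above, that a call is simply unavailable when the memory space is not accessible from the chosen execution space, can be sketched with a compile-time check. Everything in the sketch below is a hypothetical stand-in written in plain C++ (HostExec, DeviceExec, HostMem, DeviceMem, is_accessible, MockView, checked_transform_size); none of these are Kokkos types, and the two-entry accessibility table is a deliberate simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for execution and memory spaces (not Kokkos types).
struct HostExec {};
struct DeviceExec {};
struct HostMem {};
struct DeviceMem {};

// Simplified accessibility table: nothing is accessible unless listed.
template <typename Exec, typename Mem>
struct is_accessible : std::false_type {};
template <>
struct is_accessible<HostExec, HostMem> : std::true_type {};
template <>
struct is_accessible<DeviceExec, DeviceMem> : std::true_type {};

// Minimal view-like type that carries its memory space in its type.
template <typename T, typename Mem>
struct MockView {
  using memory_space = Mem;
  T* data;
  std::size_t size;
};

// The call only compiles when both views are accessible from Exec.
template <typename Exec, typename InView, typename OutView>
std::size_t checked_transform_size(const Exec&, const InView& in,
                                   const OutView& out) {
  static_assert(is_accessible<Exec, typename InView::memory_space>::value,
                "input view must be accessible from the execution space");
  static_assert(is_accessible<Exec, typename OutView::memory_space>::value,
                "output view must be accessible from the execution space");
  (void)out;       // a real transform would write into out
  return in.size;  // placeholder for the actual work
}
```

Calling checked_transform_size(DeviceExec{}, in, out) with host-side MockViews would fail at compile time with the static_assert message, so the mismatch is reported before the program runs.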

Business Goal

Long term goal for the project

Acceptance Criteria

Measurement of Success

Tasks

Preparation for release

Here are the current lists from #8 (comment).

  • Add AUTHORS. Who should be listed? Just myself for the moment (#23)
  • Stop using double-underscore identifiers like __KOKKOSFFT_NORMALIZATION_HPP__ (#8)
  • Rename FFT_Normalization to Normalization because it is already under the KokkosFFT namespace (#8)
  • Add installation part to CMakeLists.txt. Also add installation test in GitHub actions (#22). Thanks a lot for your help! @cedricchevalier19
  • Support transforms over an odd number of points. The number of points along the FFT direction is currently limited to even numbers (so that the size after the transform is n/2+1) (#21)
  • More static assertions. In particular, we assume that the In and Out Views have the same Layout and rank, which should be checked. (#26)
  • Introduce ExecSpace as a template argument, as DDC does. This allows the coexistence of fft helpers for both host and device (#12):
// As a template parameter
KokkosFFT::fft<Kokkos::OpenMP>(a, out); // a and out are on Host (or the Device is a CPU)
KokkosFFT::fft<Kokkos::Cuda>(a, out); // a and out are on Device

// As an argument
KokkosFFT::fft(Kokkos::OpenMP(), a, out); // a and out are on Host (or the Device is a CPU)
KokkosFFT::fft(Kokkos::Cuda(), a, out); // a and out are on Device

We can also make a check for memory and exec space consistency by

static_assert(
  Kokkos::SpaceAccessibility<ExecSpace, MemorySpace>::accessible,
  "MemorySpace has to be accessible for ExecutionSpace."
);
  • Add capability to reuse plans. This is particularly important for NVIDIA GPUs, where creating plans has some overhead (#19)
  • Management of default arguments. As in numpy, I would like to allow function calls with minimal overloading (#19)
KokkosFFT::fft(execution_space(), a, out);
KokkosFFT::fft(execution_space(), a, out, /*axis=*/-1);
KokkosFFT::fft(execution_space(), a, out, /*n=*/n0);
KokkosFFT::fft(execution_space(), a, out, KokkosFFT::FFT_Normalization::BACKWARD);
KokkosFFT::fft(execution_space(), a, out, KokkosFFT::FFT_Normalization::BACKWARD, /*axis=*/-1);
KokkosFFT::fft2(execution_space(), a, out, KokkosFFT::FFT_Normalization::BACKWARD, /*axes=*/axes_type{-2, -1});
KokkosFFT::fft(execution_space(), a, out, plan, KokkosFFT::FFT_Normalization::BACKWARD, /*axis=*/-1);

In particular, it is difficult to distinguish n from axis, since n is of type size_t and axis is of type int.

  • Introduce the Impl namespace to prevent users from accessing implementation details (#13)
  • Add missing functions such as hfft (#16)
  • Add missing helper functions such as fftfreq (#17)
  • Add capability to allow the optional argument n. (See fft) (#16)
  • Googletest should be parameterized over types using ::testing::Types (#14)
  • Add docs to explain what our interfaces are and how to use them (#42)
  • Introduce a clang-format file, ideally from the Kokkos repo (#20)
  • Adding perf_test with googlebenchmark (#41)
  • Add capability to call KokkosFFT APIs from both host and device. This is not quite useful so I will make it optional. (#18)
using execution_space = Kokkos::DefaultExecutionSpace;
using host_execution_space = Kokkos::DefaultHostExecutionSpace;
KokkosFFT::fft(execution_space(), a, out); // Calling FFTs on Device
KokkosFFT::fft(host_execution_space(), h_a, h_out); // Calling FFTs on Host
  • Improve docker files and add CI on GPUs (#29)
    Thanks a lot for your help! @pzehner
  • Kokkos::Threads support (#31, #46)
  • Adding 4D to 8D capabilities (#51)
  • #63
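The difficulty noted in the list above, telling n (a size_t) apart from axis (an int) in an overload set, can be illustrated with one possible workaround: wrapping each argument in its own small type. This is only a sketch of the idea in plain C++; Axis, Length, FftArgs and make_args are hypothetical names, and nothing here is claimed to be the design Kokkos-fft adopted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag types (not part of Kokkos-fft): wrapping each argument in
// its own type keeps an int axis and a size_t length from being confused.
struct Axis { int value; };
struct Length { std::size_t value; };

// Resolved arguments of a 1D transform call.
struct FftArgs {
  bool has_n;     // true when the caller requested an explicit length
  std::size_t n;  // requested length, meaningful only if has_n is true
  int axis;       // axis to transform along, -1 by default as in numpy
};

// Overloads are selected by argument type, never by integer conversions.
inline FftArgs make_args() { return FftArgs{false, 0, -1}; }
inline FftArgs make_args(Axis axis) { return FftArgs{false, 0, axis.value}; }
inline FftArgs make_args(Length n) { return FftArgs{true, n.value, -1}; }
inline FftArgs make_args(Length n, Axis axis) {
  return FftArgs{true, n.value, axis.value};
}
```

A call such as make_args(Length{8}, Axis{0}) is unambiguous at the call site, whereas a bare make_args(8) does not compile, so the caller has to state which of the two is meant. The cost is a more verbose call syntax than numpy's keyword arguments.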

The remaining issues which may be important in the future
