Comments (8)

mtazzari commented on August 28, 2024

This is a very important enhancement: the input image is always real, so we should definitely take advantage of the faster R2C transforms at some point.
This would reduce not only the computing time, but also the memory usage.

Implementing the usage of R2C transforms will change the coordinate mapping in the Fourier space, but probably it is not too difficult to recompute the algorithm to account for the change in symmetry.
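To illustrate the change in coordinate mapping: for a real nx-by-ny image stored row-major, an R2C transform in the FFTW/cuFFT convention keeps only the first ny/2+1 columns of each row, and the remaining coefficients follow from Hermitian symmetry, F[i][j] = conj(F[(nx-i) % nx][(ny-j) % ny]). The following is a minimal sketch of that lookup, not galario's actual implementation; the names `half_index_t`, `half_cols` and `full_to_half` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Location of a full-spectrum coefficient inside the half-spectrum
 * (R2C) array, plus whether it must be complex-conjugated on read.
 * Hypothetical helper type, not part of galario, FFTW or cuFFT. */
typedef struct {
    size_t row;     /* row index in the half-spectrum array           */
    size_t col;     /* column index, always in [0, ny/2]              */
    int conjugate;  /* 1 if the stored value must be conjugated       */
} half_index_t;

/* Number of complex columns stored per row by an R2C transform. */
static size_t half_cols(size_t ny)
{
    return ny / 2 + 1;
}

/* Map coefficient (i, j) of the full nx-by-ny spectrum onto the
 * nx-by-(ny/2+1) half spectrum using F[i][j] = conj(F[-i][-j]),
 * with negative indices taken modulo the array size. */
static half_index_t full_to_half(size_t i, size_t j, size_t nx, size_t ny)
{
    half_index_t h;
    assert(i < nx && j < ny);
    if (j < half_cols(ny)) {
        /* Coefficient is stored directly. */
        h.row = i;
        h.col = j;
        h.conjugate = 0;
    } else {
        /* Coefficient is the conjugate of its mirrored partner. */
        h.row = (nx - i) % nx;
        h.col = ny - j;
        h.conjugate = 1;
    }
    return h;
}
```

For an 8-by-8 image, for example, `full_to_half(1, 7, 8, 8)` points at row 7, column 1 of the half spectrum with the conjugate flag set.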

I am undecided whether this should be done for version 1.0, given the possibly large amount of time needed to check it properly.

from galario.

fredRos commented on August 28, 2024

I agree it will be quite some work. But if the next round of CPU profiling tomorrow shows that the O(n^2) operations like shift still dominate, we have to seriously consider doing it asap. Once we submit the paper, our enthusiasm to take on such changes will fade away.

fredRos commented on August 28, 2024

On the FFTW format for r2c transforms
http://fftw.org/fftw3_doc/Real_002ddata-DFT-Array-Format.html#Real_002ddata-DFT-Array-Format
http://docs.nvidia.com/cuda/cufft/#data-layout
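As a rough summary of what the two pages describe: a 2D R2C transform of an n0-by-n1 real array produces n0*(n1/2+1) complex values, and the in-place variant needs each real row padded to 2*(n1/2+1) elements. The sketch below only does the index arithmetic and does not call FFTW or cuFFT; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Complex elements written by a 2D R2C transform of an n0-by-n1
 * real array (last dimension is halved, plus one). */
static size_t r2c_complex_count(size_t n0, size_t n1)
{
    return n0 * (n1 / 2 + 1);
}

/* Real elements per row in the in-place layout: each row is padded
 * so that it can hold n1/2+1 complex numbers after the transform. */
static size_t r2c_inplace_row_reals(size_t n1)
{
    return 2 * (n1 / 2 + 1);
}

/* Linear offset of real sample (i, j) in the padded in-place buffer. */
static size_t r2c_inplace_offset(size_t i, size_t j, size_t n1)
{
    assert(j < n1);
    return i * r2c_inplace_row_reals(n1) + j;
}
```

For n1 = 6, each in-place row holds 8 reals (6 samples plus 2 padding), so sample (1, 0) sits at offset 8 rather than 6.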

fredRos commented on August 28, 2024

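Reserve the memory for the Fourier transform on the CPU with the FFTW allocation functions. From the MPI example,

ptrdiff_t alloc_local, local_n0, local_0_start;
fftw_complex *data;

/* number of complex elements this process has to allocate */
alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
                                     &local_n0, &local_0_start);
data = fftw_alloc_complex(alloc_local);
...
/* release with fftw_free, not free */
fftw_free(data);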

fredRos commented on August 28, 2024

Here are some additional performance hints for the GPU, taken from http://docs.nvidia.com/cuda/cufft/index.html#accuracy-and-performance. I'm surprised that a plan needs as much temporary space as the image itself. This may explain why the in-place transform is not faster: it has to copy all the elements back from the temporary space.

For real to complex

  • Ensure problem size of x dimension is a multiple of 4.
  • Use out-of-place mode.

Memory usage

Execution of a transform of a particular size and type may take several stages of processing. When a plan for the transform is generated, cuFFT derives the internal steps that need to be taken. These steps may include multiple kernel launches, memory copies, and so on. In addition, all the intermediate buffer allocations (on CPU/GPU memory) take place during planning. These buffers are released when the plan is destroyed. In the worst case, the cuFFT Library allocates space for 8*batch*n[0]*..*n[rank-1] cufftComplex or cufftDoubleComplex elements (where batch denotes the number of transforms that will be executed in parallel, rank is the number of dimensions of the input data (see Multidimensional Transforms) and n[] is the array of transform dimensions) for single and double-precision transforms respectively. Depending on the configuration of the plan, less memory may be used. In some specific cases, the temporary space allocations can be as low as 1*batch*n[0]*..*n[rank-1] cufftComplex or cufftDoubleComplex elements. This temporary space is allocated separately for each individual plan when it is created (i.e., temporary space is not shared between the plans).
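To put numbers on the quoted bounds, here is a small sketch that evaluates the worst-case formula 8*batch*n[0]*..*n[rank-1] and checks the "multiple of 4" hint. It is plain arithmetic and does not query cuFFT; the function names are hypothetical, and the element sizes assume 4-byte float and 8-byte double, matching cufftComplex and cufftDoubleComplex.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case number of temporary complex elements according to the
 * quoted documentation: 8 * batch * n[0] * ... * n[rank-1]. */
static size_t worst_case_workspace_elements(size_t batch,
                                            const size_t *n, int rank)
{
    size_t count = 8 * batch;
    assert(rank > 0);
    for (int d = 0; d < rank; ++d)
        count *= n[d];
    return count;
}

/* Same bound in bytes. A complex element is two floats (single
 * precision) or two doubles (double precision). */
static size_t worst_case_workspace_bytes(size_t batch, const size_t *n,
                                         int rank, int double_precision)
{
    size_t elem = double_precision ? 2 * sizeof(double)
                                   : 2 * sizeof(float);
    return worst_case_workspace_elements(batch, n, rank) * elem;
}

/* Performance hint for R2C: x dimension should be a multiple of 4. */
static int x_dim_is_multiple_of_4(size_t nx)
{
    return nx % 4 == 0;
}
```

For a single 1024-by-1024 double-precision transform this upper bound is 128 MiB, eight times the 16 MiB of a complex image of that size; the actual allocation may be smaller, as the quoted text notes.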

fredRos commented on August 28, 2024

Fixed by #61

fredRos commented on August 28, 2024

To make this quantitative, we compare the c2c (6fe6eaa) and r2c (43d34bd) transforms. As anticipated, the FFT computation time drops by a factor of ~2. From the attached nvprof output files, the most important timings in ms are: memory transfer (~32 in either case), first shift (2.6 vs 2), FFT (4 vs 2), second shift (2.6 vs 1.4).
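The overall effect of these timings can be worked out with a few lines of arithmetic. The sketch below only re-uses the rounded numbers quoted above (the ~32 ms transfer is approximate); the type and function names are hypothetical.

```c
#include <assert.h>

/* Timings in ms for one run, as quoted from the nvprof output. */
typedef struct {
    double transfer_ms;
    double shift1_ms;
    double fft_ms;
    double shift2_ms;
} timing_t;

static const timing_t C2C = {32.0, 2.6, 4.0, 2.6};
static const timing_t R2C = {32.0, 2.0, 2.0, 1.4};

/* Time spent in the kernels only (both shifts plus the FFT). */
static double compute_ms(timing_t t)
{
    return t.shift1_ms + t.fft_ms + t.shift2_ms;
}

/* Kernel time plus the host/device memory transfer. */
static double total_ms(timing_t t)
{
    return t.transfer_ms + compute_ms(t);
}

/* Speedup of b relative to a for a given pair of durations. */
static double speedup(double a_ms, double b_ms)
{
    assert(b_ms > 0.0);
    return a_ms / b_ms;
}
```

With these numbers, `speedup(compute_ms(C2C), compute_ms(R2C))` is 9.2/5.4, roughly 1.7, while `speedup(total_ms(C2C), total_ms(R2C))` is 41.2/37.4, roughly 1.1, because the memory transfer dominates in both cases.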

fredRos commented on August 28, 2024

speed_benchmark_c2c_7cdbfac.txt
speed_benchmark_r2c_43d34bd3.txt
