Comments (11)
Hm....
```julia
using CuArrays, FourierFlows, Statistics, FFTW, Random, BenchmarkTools

A, T = CuArray, Float64
Fqh = A{Complex{T}}(undef, (129, 256))

function testscalar(Fqh)
    phase = 2π * rand!(A{T}(undef, size(Fqh)))
    eta = cos.(phase) + im * sin.(phase)
    eta[1, 1] = 0
    @. Fqh = eta
end

function testnoscalar(Fqh)
    phase = 2π * rand!(A{T}(undef, size(Fqh)))
    eta = cos.(phase) + im * sin.(phase)
    @. Fqh = eta
end

function testalternativetoscalar(Fqh)
    phase = 2π * rand!(A{T}(undef, size(Fqh)))
    eta = cos.(phase) + im * sin.(phase)
    @. Fqh = eta
    Fq = irfft(Fqh, 256)
    Fq = Fq .- mean(Fq)
    Fqh = rfft(Fq)
end
```
```julia
julia> @btime testscalar(Fqh);
  110.507 μs (310 allocations: 13.83 KiB)

julia> @btime testnoscalar(Fqh);
  87.748 μs (307 allocations: 13.67 KiB)

julia> @btime testalternativetoscalar(Fqh);
  1.899 ms (493 allocations: 21.70 KiB)
```
So the scalar operation causes a ~25% slowdown. However, the alternative method I used is an unfair comparison, since I didn't use FFT plans. But what alternative do we have other than `eta[1, 1] = 0`?
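For a fairer comparison, the transforms could be precomputed with `plan_rfft`/`plan_irfft` from AbstractFFTs, as FourierFlows does internally. A sketch (untested; the function name `testplannedalternative` is made up here, and this assumes the CUFFT plans support `mul!`, as FourierFlows relies on):

```julia
using CuArrays, FFTW, Statistics, Random, LinearAlgebra

A, T = CuArray, Float64
Fqh = A{Complex{T}}(undef, (129, 256))

# Precompute the plans once, outside the hot loop. plan_irfft takes the
# length of the transformed (first) dimension of the real array.
irfftplan = plan_irfft(Fqh, 256)
rfftplan  = plan_rfft(A{T}(undef, (256, 256)))

function testplannedalternative(Fqh, rfftplan, irfftplan)
    phase = 2π * rand!(A{T}(undef, size(Fqh)))
    eta = cos.(phase) + im * sin.(phase)
    @. Fqh = eta
    Fq = irfftplan * Fqh       # inverse transform with the cached plan
    Fq .-= mean(Fq)            # remove the mean in physical space
    mul!(Fqh, rfftplan, Fq)    # forward transform back into Fqh
    return Fqh
end
```

This avoids re-planning the FFT on every call, which is where most of the 1.899 ms above likely goes.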
from geophysicalflows.jl.
The alternative is to launch a kernel, I suppose. We would just need one thread, so it'd be an extremely simple kernel. Perhaps we can write something into `utils.jl` that does it, like `zero_zeroth_mode!` or something?
I don't understand what "launch a kernel" means...
@navidcy a "kernel" is a function that, when "launched" on the GPU, executes in parallel on hundreds to thousands of GPU threads. All GPU computations are done with kernels. For example, a broadcast operation over `CuArray`s launches one kernel. Calling a CuFFT also launches a kernel.
For the most part we are able to use powerful abstractions for launching kernels in FourierFlows, which have the benefit of being easy to program while also permitting code that runs on both CPUs and GPUs.
However, there seems to be a small number of tasks that will require us to actually write the kernel functions ourselves (rather than using broadcasting or FFTs).
See the CUDAnative documentation: the macro `@cuda` is used to launch a kernel: https://juliagpu.github.io/CUDAnative.jl/stable/man/usage.html
We can also use GPUifyLoops to specify kernel functions that work on both CPU and GPU, though I don't think we will need to do that. Instead, we will define a high-level function like `zero_zeroth_mode!` which has methods for both CPU arrays (the easy case) and `CuArray`s (the case that requires writing a simple GPU kernel).
So I think we need something ultra-simple like

```julia
zero_zeroth_mode!(a) = a[1] = 0

@hascuda function zero_zeroth_mode!(a::CuArray)
    @cuda threads=1 _zero_zeroth_mode!(a)
    return nothing
end

function _zero_zeroth_mode!(a)
    a[1] = 0
    return nothing
end
```
Will have to see if that works (I'm not sure...); we might actually need

```julia
function _zero_zeroth_mode!(a)
    i = threadIdx().x
    a[i] = 0
    return nothing
end
```

Let's test it out and see.
On this issue: it seems that for very small scalar operations such as `eta[1, 1] = 0`, a scalar operation may actually be the fastest method. There is another quite low-level CUDA function that could be used as an alternative, but I doubt it's worth the effort. The cost of this single scalar operation should be fairly minuscule.
Sure, for this case it may be minuscule. But if such an operation occurs every time-step, then it might be an issue?
Perhaps we should close the issue then...
By minuscule, I mean actually minuscule compared to something like an FFT, which also occurs every time-step; thus the net effect would be in the noise.
The way to test is just to benchmark with and without this operation (even though omitting it is not physically correct, it still tests performance). I'd be curious to see if there's any impact.
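To isolate just the cost in question, one could also benchmark the scalar write directly against the one-thread kernel sketched above (a sketch, requiring a GPU; uses the CuArrays/CUDAnative stack from the first comment):

```julia
using CuArrays, CUDAnative, BenchmarkTools

a = CuArrays.zeros(Complex{Float64}, 129, 256)

# Direct scalar write: a single host-initiated device write
scalarwrite!(a) = (a[1, 1] = 0; nothing)

# The same write performed by launching a one-thread kernel
function _zero_zeroth_mode!(a)
    a[1] = 0
    return nothing
end

kernelwrite!(a) = (@cuda threads=1 _zero_zeroth_mode!(a); nothing)

@btime scalarwrite!($a)
@btime kernelwrite!($a)
```

Either way, comparing both against a full time-step (which includes FFTs) would show whether the difference is in the noise.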
Here is a good solution for scalar operations: CliMA/Oceananigans.jl#851
This will prevent unintended invocation of scalar operations, since using a scalar operation then requires the prefix `CUDA.@allowscalar`.
The above discussion holds, however: it is not always desirable to eliminate scalar operations.
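For example (a sketch, requiring a GPU with CUDA.jl):

```julia
using CUDA

Fqh = CUDA.zeros(ComplexF64, 129, 256)

# Make any accidental scalar indexing throw an error
CUDA.allowscalar(false)

# A bare `Fqh[1, 1] = 0` would now error; an intentional scalar
# operation must be marked explicitly:
CUDA.@allowscalar Fqh[1, 1] = 0
```

That way the occasional deliberate scalar write stays cheap and explicit, while unintended ones are caught at development time.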
Modules now include `CUDA.@allowscalar`; I'm closing this.