mikaelslevinsky / fasttransforms Goto Github PK

View Code? Open in Web Editor NEW

53.0 6.0 8.0 1.26 MB

:bullettrain_front: Fast orthogonal polynomial transforms :surfer:

License: MIT License

Makefile 0.52% C 98.94% BitBake 0.54%

legendre chebyshev jacobi laguerre ultraspherical spherical zernike proriol harmonics spin

fasttransforms's People

Stargazers

Watchers

Forkers

dawese dlfivefifty sciencewiki ikaroruan brockklippenstein rethd eschnett mikaem

fasttransforms's Issues

Unify normalizations of 1D transforms by inputting non-unit diagonal scalings and discretizing banded matrices with normalizations

Add methods & tests for banded LU factorization solves

Submit to J. Open Source Software

This would make it easy to cite the package.

Unify ft_harmonic_plan and ft_tetrahedral_harmonic_plan

Support cone orthogonal polynomials

Sometimes columns are not stored contiguously.

I think assuming stride(A,1) == 1 is fine (that is, ColumnMajor())

Add tests for blockbanded_source.c

Remove |m| in disk2cfs documentation

m == |m| as m is non-negative. This could be confusing on this page:

https://mikaelslevinsky.github.io/FastTransforms/transforms.html

Add FFTW-based synthesis and analysis

Hi Michael,

It took a little while to get build on linux going. Here's suggestions for tweaks to library (not worth a pull request). I only tested on ubuntu 16.04.

Append to Make.inc:

ifeq ($(UNAME), Linux)
LDLIBS += -lm
CFLAGS += -I/usr/lib/openblas-base
LDFLAGS += -L/usr/lib/openblas-base
endif

You'll also need to add in the ifeq windows block:
UNAME := Windows

Also, you'll want to add instructions to type in the shell:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.

Otherwise the executable can't find the .so

My tests now all run, although I don't understand the output.

PS It's not apparent from your docs how to do a simple spherical harmonic eval or projection onto a tensor product grid. Am I supposed to do sph to Fourier then do FFT2 on grid? I would have to guess the normalization, etc. An example of that would be very helpful.

PPS Good to see you at ICOSAHOM!

Use C99 Complex types if already defined

Add link to github page in docs

Do not declare backups if Intel instruction sets are not available

There should be a macro that determines the user's API for the "most vectorized" driver routines.

Fast trivial `cheb2jac`

julia> x = randn(100); @time cheb2jac(x, -1/2,-1/2);
  0.000522 seconds (18 allocations: 1.234 KiB)

julia> x = randn(1000); @time cheb2jac(x, -1/2,-1/2);
  0.025063 seconds (18 allocations: 8.297 KiB)

Spherical and Disk harmonic transforms segfault with an even number of columns

Support vector fields on the sphere

Combine DCTs (DSTs) and connection coefficients for faster (alternative) synthesis and analysis

Using an O(n² log n) instead of O(n²) precomputation, the colatitudinal DCTs and DSTs can be applied to the Legendre to Chebyshev connection coefficients to reduce the overall spherical harmonic synthesis and analysis to execute_sph_hi2lo followed by a sequence of cblas_dgemm and one longitudinal FFTW_HC2R.

Contrast this with the current four-step approach of execute_sph_hi2lo followed by a sequence of cblas_dtrmm and a sequence of FFTW_REDFT01 and FFTW_RODFT01, and finally the longitudinal FFTW_HC2R.

Unify execution model of triangular and disk harmonics

Right now, one has conditionals based on parameters, the other always executes the 1D column transforms.

Inconsistency with FastTransforms.jl

n = 600
    S = KoornwinderTriangle(0.0,-0.5,-0.5)
    ff = (xy) -> P(18,8,0.,-0.5,-0.5,xy...)
    p = points(S,n)
    v = ff.(p)
    PP = plan_transform(S,v)
    v̂ = PP.duffyplan*v
    F = tridevec_trans(v̂)
    F̌ = FastTransforms.cheb2tri(F,0.,-0.5,-0.5)
    F̌₂ = PP.tri2cheb \ F
    norm(F̌ - F̌₂)
size(F)
F̃ = copy(F)
c_cheb2tri(c_plan_tri2cheb(size(F,1),0.0,-0.5,-0.5),F̃)

Fast jac2jac with integer steps

julia> n = 1000; x = randn(n); @time FastTransforms.lib_jac2jac(x, 0, 0, 1, 1);
  0.021386 seconds (6 allocations: 8.109 KiB)

julia> n = 10_000; x = randn(n); @time FastTransforms.lib_jac2jac(x, 0, 0, 1, 1);
  0.676580 seconds (7 allocations: 78.344 KiB)

Add single precision methods

Two parameter rectangularized disk polynomials

Dunkl-Xu polynomials with the weight (1-x^2-y^2)^β are written in terms of equal-parameter Jacobi polynomials. These can be generalized in two ways: with inequal-parameter Jacobi polynomials for a weight like (1-sqrt(x^2+y^2))^α(1+sqrt(x^2+y^2))^β; or with equal-parameter Jacobi polynomials with the quadratic transformations for a weight like (x^2+y^2)^α(1-x^2-y^2)^β. The latter, which I prefer due to their correspondence with generalized Zernike, are like 2D Konoplev OPs with a symmetric weight on [-1,1] with interior algebraic factor. @dlfivefifty

Add documentation

There is almost no documentation or even code comments to explain what the code does. For example, I'm trying to use c_execute_tri_lo2hi but there is no explanation what basis the lo should be in.

Make not working on Mojave

[solver-mbook:~/Projects/FastTransforms] solver% export CC=gcc-8 && make
make lib
gcc-8 -Ofast  -march=native -mtune=native -I./src -I/usr/local/opt/openblas/include -I/usr/local/opt/fftw/include -shared -lm -fPIC -fopenmp src/transforms.c src/rotations.c src/permute.c src/drivers.c src/fftw.c -L/usr/local/opt/openblas/lib -L/usr/local/opt/fftw/lib -lm -lblas -lfftw3 -lfftw3_threads -o libfasttransforms.dylib
In file included from src/transforms.c:3:
src/ftinternal.h:4:10: fatal error: stdlib.h: No such file or directory
 #include <stdlib.h>
          ^~~~~~~~~~
compilation terminated.
In file included from /usr/local/opt/openblas/include/openblas_config.h:104,
                 from /usr/local/opt/openblas/include/cblas.h:5,
                 from src/fasttransforms.h:4,
                 from src/rotations.c:3:
/usr/local/Cellar/gcc/8.2.0/lib/gcc/8/gcc/x86_64-apple-darwin17.7.0/8.2.0/include-fixed/stdio.h:78:10: fatal error: _stdio.h: No such file or directory
 #include <_stdio.h>
          ^~~~~~~~~~
compilation terminated.
In file included from src/permute.c:3:
src/ftinternal.h:4:10: fatal error: stdlib.h: No such file or directory
 #include <stdlib.h>
          ^~~~~~~~~~
compilation terminated.
In file included from /usr/local/opt/openblas/include/openblas_config.h:104,
                 from /usr/local/opt/openblas/include/cblas.h:5,
                 from src/fasttransforms.h:4,
                 from src/drivers.c:3:
/usr/local/Cellar/gcc/8.2.0/lib/gcc/8/gcc/x86_64-apple-darwin17.7.0/8.2.0/include-fixed/stdio.h:78:10: fatal error: _stdio.h: No such file or directory
 #include <_stdio.h>
          ^~~~~~~~~~
compilation terminated.
In file included from /usr/local/opt/openblas/include/openblas_config.h:104,
                 from /usr/local/opt/openblas/include/cblas.h:5,
                 from src/fasttransforms.h:4,
                 from src/fftw.c:3:
/usr/local/Cellar/gcc/8.2.0/lib/gcc/8/gcc/x86_64-apple-darwin17.7.0/8.2.0/include-fixed/stdio.h:78:10: fatal error: _stdio.h: No such file or directory
 #include <_stdio.h>
          ^~~~~~~~~~
compilation terminated.
make[1]: *** [lib] Error 1
make: *** [all] Error 2

Use FT_NUM_THREADS

Directly using OpenMP's functions may lead to issues when linking with other libraries.

Generalised Zernike polynomials?

I remember there used to be docs on what the transforms actually do, but they seem to have disappeared from the docs page! Where's the Disk transform definition, for example??

Disk harmonic plan segfaults when `n=1`

I think I narrowed down the bug:

julia> using MultivariateOrthogonalPolynomials

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(2)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(5)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(5)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(5)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(1)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(1)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(1)
              end

signal (11): Segmentation fault: 11
in expression starting at no file:0
_ZN4llvm20AAResultsWrapperPass13runOnFunctionERNS_8FunctionE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)

Add Travis Continuous Integration

CC @dawese.

Vectorized transforms with non-power-of-two degrees

This has been done for

spherical harmonics
triangular harmonics
disk harmonics
spin-weighted spherical harmonics

tests freeze on julia 1.4

tests freeze here

...
  [44cfe95a] Pkg 
  [de0858da] Printf 
  [3fa0cd96] REPL 
  [9a3f8284] Random 
  [ea8e919c] SHA 
  [9e88b42a] Serialization 
  [6462fe0b] Sockets 
  [2f01184e] SparseArrays 
  [10745b16] Statistics 
  [8dfed614] Test 
  [cf7118a7] UUIDs 
  [4ec0a83e] Unicode 
Test Summary:     | Pass  Total
Special functions |   19     19
Test Summary:       | Pass  Total
Chebyshev transform |  250    250
Test Summary:                         | Pass  Total
Fejér and Clenshaw--Curtis quadrature |    9      9

Add installation instructions

brew install gcc
brew install openblas
brew install fftw
export CC=gcc-8 && make

export OMP_NUM_THREADS=4
export VECLIB_MAXIMUM_THREADS=4

Use runtime checks to dispatch on SIMD

A cross-compiler might be more advanced than the user's personal computer. Thus, cross-compiling should be done at the highest level of SIMD provided that runtime checks on cpuid can be implemented and have the least overhead.

Triangle transform broken for a,b,c = 0,0.5,-0.5

P = MultivariateOrthogonalPolynomials.c_plan_tri2cheb(4, 0.0, 0.5, -0.5)
F = zeros(4,4); F[1]=1;
MultivariateOrthogonalPolynomials.c_cheb2tri(P, F) # Fills F with NaN

Note that a,b,c = (0,-0.5,-0.5) works fine.

Support for Gegenbauer polynomials / hyperspherical harmonics?

Hi! I'm working on problems with data defined on the n-sphere (n >= 128 or so). From looking around the documentation it seems like this library doesn't support that--do I have the right? Are you familiar with any other libraries that support it?

Ball and Tet transforms?

2D cheb2leg (or more generally, support "region")

FFTW allows specifying "regions", for multidimensional FFTs:

julia> X = randn(5,5,5);

julia> F = fft(X,(1,3));

julia> F[:,1,:] ≈ fft(X[:,1,:]) # 2D FFT acting on 1st and 3rd dimension
true

It turns out I need this feature for Legendre transforms.... at the moment it just does 1D transforms:

julia> X = randn(5,5);

julia> cheb2leg(X)[:,1] == cheb2leg(X[:,1])
true

I can add it to FastTransforms.jl, e.g., if I need a 2D Legendre I can do

cheb2leg(cheb2leg(X')')

but I'm curious if this could be SIMD-optimised (or multithreaded) in C to make it faster?

Incorporate permutations for SIMD into the shared memory paradigm

Reimplement code coverage

This used to work on Travis, but now that the free trial has been exhausted and the CI has mostly been migrated to GitHub Actions, the coverage flags make the tests extremely slow. If this can somehow be fixed, it may be reimplemented by the two environment variables COV=gcov and FT_COVERAGE=1, and post-success make coverage && bash <(curl -s https://codecov.io/bash).

Change the vectorization paradigm

Single step decrements in Proriol/Zernike/spherical harmonic polynomial orders are currently done with a sweep of Givens rotations. Nominally, they come from the QR factorization of W = I±X or I-X^2. Multi-step decrements can be done with a Householder QR factorization of W^k for some integer k (c.f. https://arxiv.org/pdf/2302.08448.pdf with @dlfivefifty and @TSGut). It's reasonable then that k would be chosen such that the Householder vectors fit exactly into SIMD registers or small multiples of these to maximize throughput.

All functions and structs should be prefixed by a namespace

Probably ft_ is reasonable.

Is there a performance improvement to vectorized kernels when aligning memory?

Disk transforms on higher order tensors?

In preparation for Cylinder solves, in FastTransforms.jl I'm planning to add support for disk2cxf(A::Array{T,3}, α, β, 1) in analogy to fft(A::Matrix, 1), etc. I can do this all in Julia via views (I assume strided array views are compatible with FastTransforms) but perhaps doing this in C there's potential for better parallelisation?

Note the interface needs some thought. Way back, when I wrote the previous paragraph, I was thinking disk2cxf(A::Array{T,3}, α, β, 1) would act on the first two dimensions of the array but what if we want to act on the first and third? Or if we have tensor products of two disks do we want to write the transform that transforms both as disk2cxf(A::Array{T,4}, α, β, 1:2)? Perhaps a better interface is

disk2cxf(A::Array{T,3}, α, β, (1,2)) # single disk transform on first two dimensions
disk2cxf(A::Array{T,3}, α, β, (1,3)) # single disk transform on first and last dimension
disk2cxf(A::Array{T,4}, α, β, ((1,2), (3,4)) # tensor disk transform on first and second followed by 3rd and 4th dimension
disk2cxf(A::Array{T,4}, α, β) # same as above
disk2cxf(A::Array{T,3}, α, β, ((1,2), (2,3)) # Get lost!

You might wonder why not leave it to the "user" of FastTransforms.jl. There won't likely be any speedup and multithreading is a pipe-dream. I guess the argument is (1) consistent interface (2) consistent allocation-free behaviour in higher dimensions (3) preparation for MPIFastTransforms.jl or similar for distributed memory parallelisation (which we pretty much need for Cylinders or Cubes to do anything fun)

@ioannisPApapadopoulos

mikaelslevinsky / fasttransforms Goto Github PK

fasttransforms's People

Stargazers

Watchers

Forkers

fasttransforms's Issues

Recommend Projects

Recommend Topics

Recommend Org