Coder Social home page Coder Social logo

mikaelslevinsky / fasttransforms Goto Github PK

View Code? Open in Web Editor NEW
53.0 6.0 8.0 1.26 MB

:bullettrain_front: Fast orthogonal polynomial transforms :surfer:

License: MIT License

Makefile 0.52% C 98.94% BitBake 0.54%
legendre chebyshev jacobi laguerre ultraspherical spherical zernike proriol harmonics spin

fasttransforms's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

fasttransforms's Issues

Add FFTW-based synthesis and analysis

Hi Michael,

It took a little while to get build on linux going. Here's suggestions for tweaks to library (not worth a pull request). I only tested on ubuntu 16.04.

Append to Make.inc:

ifeq ($(UNAME), Linux)
LDLIBS += -lm
CFLAGS += -I/usr/lib/openblas-base
LDFLAGS += -L/usr/lib/openblas-base
endif

You'll also need to add in the ifeq windows block:
UNAME := Windows

Also, you'll want to add instructions to type in the shell:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.

Otherwise the executable can't find the .so

My tests now all run, although I don't understand the output.

PS It's not apparent from your docs how to do a simple spherical harmonic eval or projection onto a tensor product grid. Am I supposed to do sph to Fourier then do FFT2 on grid? I would have to guess the normalization, etc. An example of that would be very helpful.

PPS Good to see you at ICOSAHOM!

Fast trivial `cheb2jac`

julia> x = randn(100); @time cheb2jac(x, -1/2,-1/2);
  0.000522 seconds (18 allocations: 1.234 KiB)

julia> x = randn(1000); @time cheb2jac(x, -1/2,-1/2);
  0.025063 seconds (18 allocations: 8.297 KiB)

Combine DCTs (DSTs) and connection coefficients for faster (alternative) synthesis and analysis

Using an O(n2 log n) instead of O(n2) precomputation, the colatitudinal DCTs and DSTs can be applied to the Legendre to Chebyshev connection coefficients to reduce the overall spherical harmonic synthesis and analysis to execute_sph_hi2lo followed by a sequence of cblas_dgemm and one longitudinal FFTW_HC2R.

Contrast this with the current four-step approach of execute_sph_hi2lo followed by a sequence of cblas_dtrmm and a sequence of FFTW_REDFT01 and FFTW_RODFT01, and finally the longitudinal FFTW_HC2R.

Inconsistency with FastTransforms.jl

n = 600
    S = KoornwinderTriangle(0.0,-0.5,-0.5)
    ff = (xy) -> P(18,8,0.,-0.5,-0.5,xy...)
    p = points(S,n)
    v = ff.(p)
    PP = plan_transform(S,v)
    v̂ = PP.duffyplan*v
    F = tridevec_trans(v̂)
    F̌ = FastTransforms.cheb2tri(F,0.,-0.5,-0.5)
    F̌₂ = PP.tri2cheb \ F
    norm(F̌ - F̌₂)
size(F)
F̃ = copy(F)
c_cheb2tri(c_plan_tri2cheb(size(F,1),0.0,-0.5,-0.5),F̃)

Fast jac2jac with integer steps

julia> n = 1000; x = randn(n); @time FastTransforms.lib_jac2jac(x, 0, 0, 1, 1);
  0.021386 seconds (6 allocations: 8.109 KiB)

julia> n = 10_000; x = randn(n); @time FastTransforms.lib_jac2jac(x, 0, 0, 1, 1);
  0.676580 seconds (7 allocations: 78.344 KiB)

Two parameter rectangularized disk polynomials

Dunkl-Xu polynomials with the weight (1-x^2-y^2)^β are written in terms of equal-parameter Jacobi polynomials. These can be generalized in two ways: with inequal-parameter Jacobi polynomials for a weight like (1-sqrt(x^2+y^2))^α(1+sqrt(x^2+y^2))^β; or with equal-parameter Jacobi polynomials with the quadratic transformations for a weight like (x^2+y^2)^α(1-x^2-y^2)^β. The latter, which I prefer due to their correspondence with generalized Zernike, are like 2D Konoplev OPs with a symmetric weight on [-1,1] with interior algebraic factor. @dlfivefifty

Add documentation

There is almost no documentation or even code comments to explain what the code does. For example, I'm trying to use c_execute_tri_lo2hi but there is no explanation what basis the lo should be in.

Make not working on Mojave

[solver-mbook:~/Projects/FastTransforms] solver% export CC=gcc-8 && make
make lib
gcc-8 -Ofast  -march=native -mtune=native -I./src -I/usr/local/opt/openblas/include -I/usr/local/opt/fftw/include -shared -lm -fPIC -fopenmp src/transforms.c src/rotations.c src/permute.c src/drivers.c src/fftw.c -L/usr/local/opt/openblas/lib -L/usr/local/opt/fftw/lib -lm -lblas -lfftw3 -lfftw3_threads -o libfasttransforms.dylib
In file included from src/transforms.c:3:
src/ftinternal.h:4:10: fatal error: stdlib.h: No such file or directory
 #include <stdlib.h>
          ^~~~~~~~~~
compilation terminated.
In file included from /usr/local/opt/openblas/include/openblas_config.h:104,
                 from /usr/local/opt/openblas/include/cblas.h:5,
                 from src/fasttransforms.h:4,
                 from src/rotations.c:3:
/usr/local/Cellar/gcc/8.2.0/lib/gcc/8/gcc/x86_64-apple-darwin17.7.0/8.2.0/include-fixed/stdio.h:78:10: fatal error: _stdio.h: No such file or directory
 #include <_stdio.h>
          ^~~~~~~~~~
compilation terminated.
In file included from src/permute.c:3:
src/ftinternal.h:4:10: fatal error: stdlib.h: No such file or directory
 #include <stdlib.h>
          ^~~~~~~~~~
compilation terminated.
In file included from /usr/local/opt/openblas/include/openblas_config.h:104,
                 from /usr/local/opt/openblas/include/cblas.h:5,
                 from src/fasttransforms.h:4,
                 from src/drivers.c:3:
/usr/local/Cellar/gcc/8.2.0/lib/gcc/8/gcc/x86_64-apple-darwin17.7.0/8.2.0/include-fixed/stdio.h:78:10: fatal error: _stdio.h: No such file or directory
 #include <_stdio.h>
          ^~~~~~~~~~
compilation terminated.
In file included from /usr/local/opt/openblas/include/openblas_config.h:104,
                 from /usr/local/opt/openblas/include/cblas.h:5,
                 from src/fasttransforms.h:4,
                 from src/fftw.c:3:
/usr/local/Cellar/gcc/8.2.0/lib/gcc/8/gcc/x86_64-apple-darwin17.7.0/8.2.0/include-fixed/stdio.h:78:10: fatal error: _stdio.h: No such file or directory
 #include <_stdio.h>
          ^~~~~~~~~~
compilation terminated.
make[1]: *** [lib] Error 1
make: *** [all] Error 2

Use FT_NUM_THREADS

Directly using OpenMP's functions may lead to issues when linking with other libraries.

Generalised Zernike polynomials?

I remember there used to be docs on what the transforms actually do, but they seem to have disappeared from the docs page! Where's the Disk transform definition, for example??

Disk harmonic plan segfaults when `n=1`

I think I narrowed down the bug:

julia> using MultivariateOrthogonalPolynomials

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(2)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(5)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(5)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(5)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(1)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(1)
              end

julia> for _=1:10000
                  MultivariateOrthogonalPolynomials.CDisk2CxfPlan(1)
              end

signal (11): Segmentation fault: 11
in expression starting at no file:0
_ZN4llvm20AAResultsWrapperPass13runOnFunctionERNS_8FunctionE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)

tests freeze on julia 1.4

tests freeze here

...
  [44cfe95a] Pkg 
  [de0858da] Printf 
  [3fa0cd96] REPL 
  [9a3f8284] Random 
  [ea8e919c] SHA 
  [9e88b42a] Serialization 
  [6462fe0b] Sockets 
  [2f01184e] SparseArrays 
  [10745b16] Statistics 
  [8dfed614] Test 
  [cf7118a7] UUIDs 
  [4ec0a83e] Unicode 
Test Summary:     | Pass  Total
Special functions |   19     19
Test Summary:       | Pass  Total
Chebyshev transform |  250    250
Test Summary:                         | Pass  Total
Fejér and Clenshaw--Curtis quadrature |    9      9

Add installation instructions

brew install gcc
brew install openblas
brew install fftw
export CC=gcc-8 && make

export OMP_NUM_THREADS=4
export VECLIB_MAXIMUM_THREADS=4

Use runtime checks to dispatch on SIMD

A cross-compiler might be more advanced than the user's personal computer. Thus, cross-compiling should be done at the highest level of SIMD provided that runtime checks on cpuid can be implemented and have the least overhead.

Triangle transform broken for a,b,c = 0,0.5,-0.5

P = MultivariateOrthogonalPolynomials.c_plan_tri2cheb(4, 0.0, 0.5, -0.5)
F = zeros(4,4); F[1]=1;
MultivariateOrthogonalPolynomials.c_cheb2tri(P, F) # Fills F with NaN

Note that a,b,c = (0,-0.5,-0.5) works fine.

Support for Gegenbauer polynomials / hyperspherical harmonics?

Hi! I'm working on problems with data defined on the n-sphere (n >= 128 or so). From looking around the documentation it seems like this library doesn't support that--do I have the right? Are you familiar with any other libraries that support it?

2D cheb2leg (or more generally, support "region")

FFTW allows specifying "regions", for multidimensional FFTs:

julia> X = randn(5,5,5);

julia> F = fft(X,(1,3));

julia> F[:,1,:]  fft(X[:,1,:]) # 2D FFT acting on 1st and 3rd dimension
true

It turns out I need this feature for Legendre transforms.... at the moment it just does 1D transforms:

julia> X = randn(5,5);

julia> cheb2leg(X)[:,1] == cheb2leg(X[:,1])
true

I can add it to FastTransforms.jl, e.g., if I need a 2D Legendre I can do

cheb2leg(cheb2leg(X')')

but I'm curious if this could be SIMD-optimised (or multithreaded) in C to make it faster?

Reimplement code coverage

This used to work on Travis, but now that the free trial has been exhausted and the CI has mostly been migrated to GitHub Actions, the coverage flags make the tests extremely slow. If this can somehow be fixed, it may be reimplemented by the two environment variables COV=gcov and FT_COVERAGE=1, and post-success make coverage && bash <(curl -s https://codecov.io/bash).

Change the vectorization paradigm

Single step decrements in Proriol/Zernike/spherical harmonic polynomial orders are currently done with a sweep of Givens rotations. Nominally, they come from the QR factorization of W = I±X or I-X^2. Multi-step decrements can be done with a Householder QR factorization of W^k for some integer k (c.f. https://arxiv.org/pdf/2302.08448.pdf with @dlfivefifty and @TSGut). It's reasonable then that k would be chosen such that the Householder vectors fit exactly into SIMD registers or small multiples of these to maximize throughput.

Disk transforms on higher order tensors?

In preparation for Cylinder solves, in FastTransforms.jl I'm planning to add support for disk2cxf(A::Array{T,3}, α, β, 1) in analogy to fft(A::Matrix, 1), etc. I can do this all in Julia via views (I assume strided array views are compatible with FastTransforms) but perhaps doing this in C there's potential for better parallelisation?

Note the interface needs some thought. Way back, when I wrote the previous paragraph, I was thinking disk2cxf(A::Array{T,3}, α, β, 1) would act on the first two dimensions of the array but what if we want to act on the first and third? Or if we have tensor products of two disks do we want to write the transform that transforms both as disk2cxf(A::Array{T,4}, α, β, 1:2)? Perhaps a better interface is

disk2cxf(A::Array{T,3}, α, β, (1,2)) # single disk transform on first two dimensions
disk2cxf(A::Array{T,3}, α, β, (1,3)) # single disk transform on first and last dimension
disk2cxf(A::Array{T,4}, α, β, ((1,2), (3,4)) # tensor disk transform on first and second followed by 3rd and 4th dimension
disk2cxf(A::Array{T,4}, α, β) # same as above
disk2cxf(A::Array{T,3}, α, β, ((1,2), (2,3)) # Get lost!

You might wonder why not leave it to the "user" of FastTransforms.jl. There won't likely be any speedup and multithreading is a pipe-dream. I guess the argument is (1) consistent interface (2) consistent allocation-free behaviour in higher dimensions (3) preparation for MPIFastTransforms.jl or similar for distributed memory parallelisation (which we pretty much need for Cylinders or Cubes to do anything fun)

@ioannisPApapadopoulos

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.