
Comments (4)

ElSeidy commented on July 16, 2024

I am also very interested in this functionality. Is there any progress there? I can do this if you like; I just need more info. FLINT is amazing, keep up the good job, guys 👍


wbhart commented on July 16, 2024

Nobody is currently working on this. There is a --with-blas option in configure which should at least allow FLINT to be linked against BLAS, though of course at present that does nothing.

The nmod_poly module is obviously the logical place for this to be used, particularly in the case where dim*p^2 fits into 53 bits (for doubles) or 24 bits (for single-precision floats).

Bill.

fredrik-johansson commented on July 16, 2024

Correction to Bill's comment: nmod_mat, not nmod_poly.

The first step after making linking to BLAS work should probably be to write an nmod_mat_mul_blas that multiplies two nmod_mats by converting them to floating-point arrays, doing a BLAS multiplication, and converting the result back to an nmod_mat.
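
A minimal sketch of that round trip, assuming the inner dimension k and the modulus p satisfy k*(p-1)^2 < 2^53 so that every dot product is computed exactly in doubles. The function name nmod_mat_mul_blas_sketch is hypothetical; cblas_dgemm and the nmod_mat accessors are real:

```c
#include <stdlib.h>
#include <cblas.h>
#include <flint/nmod_mat.h>
#include <flint/ulong_extras.h>

/* Hypothetical sketch: assumes k*(p-1)^2 < 2^53, so no rounding occurs. */
void nmod_mat_mul_blas_sketch(nmod_mat_t C, const nmod_mat_t A, const nmod_mat_t B)
{
    slong m = A->r, k = A->c, n = B->c, i, j;
    double *a = malloc(m * k * sizeof(double));
    double *b = malloc(k * n * sizeof(double));
    double *c = malloc(m * n * sizeof(double));

    /* convert the operands to floating-point arrays */
    for (i = 0; i < m; i++)
        for (j = 0; j < k; j++)
            a[i * k + j] = (double) nmod_mat_entry(A, i, j);
    for (i = 0; i < k; i++)
        for (j = 0; j < n; j++)
            b[i * n + j] = (double) nmod_mat_entry(B, i, j);

    /* c = a*b over the reals; exact because all sums stay below 2^53 */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, a, k, b, n, 0.0, c, n);

    /* convert the result back, reducing each entry mod p */
    for (i = 0; i < m; i++)
        for (j = 0; j < n; j++)
            nmod_mat_entry(C, i, j) =
                n_mod2_preinv((ulong) c[i * n + j], C->mod.n, C->mod.ninv);

    free(a); free(b); free(c);
}
```

With row-major storage the leading dimensions are just the row lengths, and beta = 0.0 makes dgemm overwrite c rather than accumulate into it.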

Then you might want to tune it for different sizes. You probably want to use single precision floats instead of doubles when the modulus is small enough. When the modulus is really small, you could pack several entries into each double (nmod_mat_mul already does such packing into 64-bit words). Also, when dim*p^2 does not fit in 53 bits, you can still use BLAS by computing the matrix product modulo a couple of smaller primes and then patching the results together using the Chinese remainder theorem.
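
The CRT step is just a per-entry combine. A sketch under the assumption of two BLAS-friendly primes p1, p2 with p1*p2 > dim*(p-1)^2, so the exact integer entry is recovered before reducing mod the target modulus p; the function name is illustrative, while the n_* helpers are real FLINT calls:

```c
#include <flint/ulong_extras.h>

/* Given r1 = x mod p1 and r2 = x mod p2 for an exact product entry
   0 <= x < p1*p2, recover x by Garner's formula and reduce it mod p. */
ulong crt_entry(ulong r1, ulong p1, ulong r2, ulong p2, ulong p)
{
    ulong t = n_submod(r2, r1 % p2, p2);             /* (r2 - r1) mod p2 */
    t = n_mulmod2_preinv(t, n_invmod(p1 % p2, p2),   /* t /= p1 mod p2   */
                         p2, n_preinvert_limb(p2));
    return (r1 + p1 * t) % p;  /* no overflow: p1*p2 < 2^52 for ~25-bit primes */
}
```

In a real implementation the inverse and the precomputed inverse of p2 would of course be hoisted out of the per-entry loop; they are inline here only to keep the sketch self-contained.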

Finally, there's making nmod_mat_mul use BLAS multiplication. You might want to do Strassen multiplication on top of the BLAS multiplication, as this should be faster when sizes get large enough, so presumably the BLAS multiplication should just replace nmod_mat_mul_basecase. Some careful profiling is needed to determine good cutoffs.
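
The resulting dispatch might look roughly like this. The cutoff values below are placeholders for whatever the profiling suggests, and the check that the modulus is small enough for the BLAS path is omitted for brevity; nmod_mat_mul_classical and nmod_mat_mul_strassen are existing FLINT functions, while nmod_mat_mul_blas_sketch is the hypothetical routine from above:

```c
#include <flint/nmod_mat.h>

void nmod_mat_mul_blas_sketch(nmod_mat_t, const nmod_mat_t, const nmod_mat_t);

#define BLAS_CUTOFF      64   /* placeholder: below this the packed word code wins */
#define STRASSEN_CUTOFF 256   /* placeholder: above this Strassen should pay off   */

void nmod_mat_mul_tuned(nmod_mat_t C, const nmod_mat_t A, const nmod_mat_t B)
{
    slong d = FLINT_MIN(FLINT_MIN(A->r, B->c), A->c);

    if (d < BLAS_CUTOFF)
        nmod_mat_mul_classical(C, A, B);    /* existing basecase */
    else if (d < STRASSEN_CUTOFF)
        nmod_mat_mul_blas_sketch(C, A, B);  /* plain BLAS product */
    else
        nmod_mat_mul_strassen(C, A, B);     /* Strassen, whose basecase
                                               would then be the BLAS path */
}
```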

Optionally, you could also improve fmpz_mat_mul_multi_mod so that it does BLAS multiplications directly instead of converting to nmod_mats first. This should eliminate a lot of overhead, and should be faster if you also choose primes here that are optimal for BLAS (i.e. much smaller than the default ~64 bits).
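
To see why smaller primes help, here is a stand-alone back-of-the-envelope computation (plain C, no FLINT API) of the largest usable prime for an exact dgemm at various inner dimensions k, using the same k*(p-1)^2 < 2^53 bound as above:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Each prime p must satisfy k*(p-1)^2 < 2^53 for dgemm to be exact. */
    for (long k = 64; k <= 4096; k *= 4)
    {
        double pmax = sqrt(ldexp(1.0, 53) / (double) k);
        printf("k = %4ld: primes up to ~%.0f (~%.1f bits)\n",
               k, pmax, log2(pmax));
    }
    return 0;
}
```

Even at k = 64 the primes are only about 23 bits, so roughly three of them replace each ~64-bit prime, but each modular multiplication then runs at full dgemm speed.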


wbhart commented on July 16, 2024

On 19 January 2014 23:04, Fredrik Johansson [email protected] wrote:

> Correction to Bill's comment: nmod_mat, not nmod_poly.

Yes, that was a stupid typo. Thanks for spotting it, Fredrik.

> Also, when dim*p^2 does not fit in 53 bits, you can still use BLAS by computing the matrix product modulo a couple of smaller primes and then patching the results together using the Chinese remainder theorem.

I am sceptical that this last one will actually be faster than what we currently have. But I'm definitely interested in seeing a comparison and being proved wrong.

> You might want to do Strassen multiplication on top of the BLAS multiplication, as this should be faster when sizes get large enough, so presumably the BLAS multiplication should just replace nmod_mat_mul_basecase.

Definitely. It's unlikely that BLAS alone will be faster beyond about dimension 256.
