Comments (4)
I am also very interested in this functionality. Is there any progress there? I can do this if you like; I just need more info. FLINT is amazing, keep up the good job, guys 👍
from flint.
Nobody is currently working on this. There is a --with-blas option in configure which should at least allow flint to be linked against BLAS, though of course at present that does nothing.

The nmod_poly module is obviously the logical place for this to be used, particularly in the case where dim*p^2 fits into 53 bits or 23 bits.
Bill.
Correction to Bill's comment: nmod_mat, not nmod_poly.
The first step after making linking to BLAS work should probably be to write an nmod_mat_mul_blas that multiplies two nmod_mat's by converting them to floating-point arrays, doing a BLAS multiplication, and converting the result back to an nmod_mat.
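A minimal sketch of that round-trip, using NumPy (whose matrix product dispatches to a BLAS dgemm) as a stand-in for the eventual C code; the function name is the one proposed here, and the 53-bit guard is an assumption about when the double round-trip is exact, not existing FLINT API:

```python
import numpy as np

def nmod_mat_mul_blas(A, B, n):
    """Multiply two matrices over Z/nZ by lifting to doubles, doing a
    BLAS product, and reducing back. Exact only while every entry of
    the integer product fits in a double's 53-bit significand, i.e.
    dim * (n - 1)**2 < 2**53."""
    dim = A.shape[1]  # inner dimension of the product
    assert dim * (n - 1) ** 2 < 2 ** 53, "entries would not survive the double round-trip"
    C = A.astype(np.float64) @ B.astype(np.float64)  # BLAS multiply, exact under the bound
    return C.astype(np.int64) % n                    # convert back and reduce
```

In FLINT itself this would copy nmod_mat entries into a contiguous double array and call cblas_dgemm, then reduce each entry of the result.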
Then you might want to tune it for different sizes. You probably want to use single precision floats instead of doubles when the modulus is small enough. When the modulus is really small, you could pack several entries into each double (nmod_mat_mul already does such packing into 64-bit words). Also, when dim*p^2 does not fit in 53 bits, you can still use BLAS by computing the matrix product modulo a couple of smaller primes and then patching the results together using the Chinese remainder theorem.
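The CRT variant could be sketched like this (the prime list, helper names, and Garner-style combination are illustrative choices, not FLINT code):

```python
import numpy as np

# Illustrative primes, each ~2^20, so dim * (p-1)**2 stays far below 2^53
PRIMES = (1000003, 1000033, 1000037, 1000039)

def mul_mod_small(A, B, p):
    # exact double-precision (BLAS) product modulo one small prime
    Ap = (A % p).astype(np.float64)
    Bp = (B % p).astype(np.float64)
    return (Ap @ Bp).astype(np.int64) % p

def mul_mod_crt(A, B, n):
    """Multiply over Z/nZ when dim * (n-1)**2 exceeds 53 bits: take the
    product modulo several BLAS-friendly primes, recover the true integer
    entries by CRT (Garner's algorithm), then reduce mod n."""
    dim = A.shape[1]
    bound = dim * (n - 1) ** 2            # largest possible entry of the integer product
    M = 1
    for p in PRIMES:
        assert dim * (p - 1) ** 2 < 2 ** 53
        M *= p
    assert M > bound, "need more primes to cover the entry bound"
    residues = [mul_mod_small(A, B, p) for p in PRIMES]
    C = np.zeros_like(residues[0])
    rows, cols = C.shape
    for i in range(rows):
        for j in range(cols):
            x, m = 0, 1                   # Garner: rebuild the integer entry
            for r, p in zip(residues, PRIMES):
                t = (int(r[i, j]) - x) * pow(m, -1, p) % p
                x += m * t
                m *= p
            C[i, j] = x % n
    return C
```

The entrywise Python-integer CRT loop is only for clarity; a real implementation would combine residues with word arithmetic.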
Finally, there's making nmod_mat_mul use BLAS multiplication. You might want to do Strassen multiplication on top of the BLAS multiplication, as this should be faster when sizes get large enough, so presumably the BLAS multiplication should just replace nmod_mat_mul_basecase. Some careful profiling is needed to determine good cutoffs.
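One Strassen layer over a BLAS-style base case could be sketched as follows (the cutoff value and all names are hypothetical; the real crossover needs the profiling mentioned above, and for brevity the sketch assumes square power-of-two dimensions):

```python
import numpy as np

def base_mul(A, B, n):
    # stand-in for nmod_mat_mul_blas: exact while dim * (n-1)**2 < 2**53
    return (A.astype(np.float64) @ B.astype(np.float64)).astype(np.int64) % n

def strassen_mod(A, B, n, cutoff=128):
    """Strassen recursion over Z/nZ on top of a BLAS base case. All
    intermediate sums are reduced mod n so the 53-bit bound of the base
    case keeps holding at every level."""
    dim = A.shape[0]
    if dim <= cutoff:
        return base_mul(A, B, n)
    h = dim // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # the seven Strassen products
    M1 = strassen_mod((A11 + A22) % n, (B11 + B22) % n, n, cutoff)
    M2 = strassen_mod((A21 + A22) % n, B11, n, cutoff)
    M3 = strassen_mod(A11, (B12 - B22) % n, n, cutoff)
    M4 = strassen_mod(A22, (B21 - B11) % n, n, cutoff)
    M5 = strassen_mod((A11 + A12) % n, B22, n, cutoff)
    M6 = strassen_mod((A21 - A11) % n, (B11 + B12) % n, n, cutoff)
    M7 = strassen_mod((A12 - A22) % n, (B21 + B22) % n, n, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = (M1 + M4 - M5 + M7) % n
    C[:h, h:] = (M3 + M5) % n
    C[h:, :h] = (M2 + M4) % n
    C[h:, h:] = (M1 - M2 + M3 + M6) % n
    return C
```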
Optionally, you could also improve fmpz_mat_mul_multi_mod so that it does BLAS multiplications directly instead of converting to nmod_mats first. This should eliminate a lot of overhead, and should be faster if you also choose primes here that are optimal for BLAS (i.e. much smaller than the default ~64 bits).
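Back-of-envelope arithmetic for why the BLAS-friendly primes must be much smaller, sketched with two hypothetical helpers:

```python
import math

def blas_prime_bits(dim):
    # Largest prime size b such that dim * (2**b)**2 < 2**53, i.e. a
    # length-dim dot product of residues stays exact in double precision.
    return (53 - math.ceil(math.log2(dim))) // 2

def primes_needed(dim, entry_bits, prime_bits):
    # An entry of the product of two dim x dim matrices with entry_bits-bit
    # entries needs at most 2*entry_bits + log2(dim) bits, and the product
    # of the chosen primes must exceed that.
    result_bits = 2 * entry_bits + math.ceil(math.log2(dim))
    return math.ceil(result_bits / prime_bits)
```

For example, 1000 x 1000 matrices with 64-bit entries allow at most 21-bit primes for exact double-precision BLAS, and then need seven such primes; whether seven small BLAS products plus CRT beat fewer ~64-bit nmod products is exactly the kind of comparison being asked for here.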
On 19 January 2014 23:04, Fredrik Johansson wrote:

> Correction to Bill's comment: nmod_mat, not nmod_poly.

Yes, that was a stupid typo. Thanks for spotting it, Fredrik.

> The first step after making linking to BLAS work should probably be to write an nmod_mat_mul_blas that multiplies two nmod_mat's by converting them to floating-point arrays, doing a BLAS multiplication, and converting the result back to an nmod_mat. Then you might want to tune it for different sizes. You probably want to use single precision floats instead of doubles when the modulus is small enough. When the modulus is really small, you could pack several entries into each double (nmod_mat_mul already does such packing into 64-bit words). Also, when dim*p^2 does not fit in 53 bits, you can still use BLAS by computing the matrix product modulo a couple of smaller primes and then patching the results together using the Chinese remainder theorem.

I am sceptical that this last one will actually be faster than what we currently have. But I'm definitely interested in seeing a comparison and being proved wrong.

> Finally, there's making nmod_mat_mul use BLAS multiplication. You might want to do Strassen multiplication on top of the BLAS multiplication, as this should be faster when sizes get large enough, so presumably the BLAS multiplication should just replace nmod_mat_mul_basecase.

Definitely. It's unlikely that BLAS will be faster beyond about dimension 256.

> Some careful profiling is needed to determine good cutoffs.
>
> Optionally, you could also improve fmpz_mat_mul_multi_mod so that it does BLAS multiplications directly instead of converting to nmod_mats first. This should eliminate a lot of overhead, and should be faster if you also choose primes here that are optimal for BLAS (i.e. much smaller than the default ~64 bits).