Coder Social home page Coder Social logo

malb / m4ri Goto Github PK

View Code? Open in Web Editor NEW
50.0 8.0 24.0 1.37 MB

M4RI is a library for fast arithmetic with dense matrices over GF(2)

License: GNU General Public License v2.0

C 84.42% CSS 3.78% Shell 0.27% HTML 1.97% Makefile 0.74% M4 8.82%
linear-algebra matrix-multiplication matrix-factorization c

m4ri's Introduction

M4RI is a library for fast arithmetic with dense matrices over F2. The name M4RI comes from the first implemented algorithm: The “Method of the Four Russians” inversion algorithm published by Gregory Bard. This algorithm in turn is named after the “Method of the Four Russians” multiplication algorithm which is probably better referred to as Kronrod's method. M4RI is available under the General Public License Version 2 or later (GPLv2+).

Main Features

  • basic arithmetic with dense matrices over F2 (addition, equality testing, stacking, augmenting, sub-matrices, randomisation);

  • asymptotically fast $O(n^{log_2 7})$ matrix multiplication via the Method of the Four Russians (M4RM) & Strassen-Winograd algorithm;

  • asymptotically fast $O(n^{log_2 7})$ PLE factorisation (Gaussian elimination, system solving, …);

  • fast row echelon form computation and matrix inversion via the Method of the Four Russians (M4RI, $O(n^{3/log n})$);

  • asymptotically fast Triangular System solving with Matrices (upper left, lower left, upper right, lower right),

  • support for the x86/x86_64 SSE2 instruction set where available;

  • preliminary support for parallelisation on shared memory systems via OpenMP;

  • and support for Linux, Solaris, and OS X (GCC).

See Further Reading for implemented algorithms.

Performance

See Performance.

OpenMP Support

OpenMP support for parallel multiplication and elimination is enabled with the

--enable-openmp

configure switch.

Install

If you downloaded M4RI by cloning the mainline tree at

https://bitbucket.org/malb/m4ri

you need to first run the following command:

autoreconf --install

Then do the usual

./configure
make
make check

For details see the instructions in the file INSTALL.

Documentation

To build the reference manual, ensure that you have Doxygen installed. The HTML version of the reference manual can be built as follows:

cd src/
doxygen

The built documentation is contained under the doc subdirectory of m4ri/. Once the HTML version is built, you can build the PDF version as follows:

cd doc/latex/
make

The documentation is also available here.

Contributors

At least the following people have contributed to the M4RI library.

  • Tim Abbott: Debian-isation & advice on correct libtool versioning;
  • Martin Albrecht: maintainer, release manager, peformance tuning (M4RM, M4RI, Strassen, PLE), initial M4RM implementation, parallelisation, PLE factorisation (MMPF algorithm);
  • Gregory Bard: initial author, M4RI algorithm and initial implementation;
  • Marco Bodrato: new Strassen-like sequence for matrix multiplication and squaring which improves performance for squaring;
  • Michael Brickenstein: PolyBoRi author, standard conformity contributions for ANSIC, test data, discussion/suggestion of performance improvements, fast vector-matrix products;
  • Alexander Dreyer: PolyBoRi author, standard conformity contributions for ANSIC;
  • Jean-Guillaume Dumas: linear system resolution;
  • William Hart: many performance improvements for matrix multiplication and in general;
  • David Harvey: parallel parity function used in classical multiplication;
  • Jerry James: bug fixes, dealing with compiler warnings, Fedora Linux packaging;
  • David Kirkby: portability issues (Solaris, HP Unix);
  • Clément Pernet: PLE factorisation, triangular system solving (TRSM);
  • Wael Said: test cases, feedback;
  • Carlo Wood: bit-level optimisation (transpose, column swaps), refactoring, benchmark(et)ing framework, test code, build system clean-up;

We are grateful to William Stein for providing our hosting and general infrastructure in the past.

Citing M4RI

If you use our libraries in a non-trivial part of your research please consider citing them as follows:

@manual{M4RI,
    key          = "M4RI",
    author       = "Martin Albrecht and Gregory Bard",
    organization = "The M4RI~Team",
    title        = "{The M4RI Library -- Version **version**}",
    year         = **year**,
    url          = "\url{https://bitbucket.org/malb/m4ri}",
}

and cite the appropriate publications mentioned in Further Reading.

Contact

Please contact our mailinglist if there are bugs, questions, comments.

History

  • 2024/07/29 A new version of M4RI is available. It is available at https://github.com/malb/m4ri/tags. Charles Bouillaguet made a series of improvements, see: https://github.com/malb/m4ri/compare/release-20200125...release-20240729

  • 2020/01/25 A new version of M4RI is available with a few bugfixes. It is available at https://bitbucket.org/malb/m4ri/downloads.

  • 2020/01/15 A new version of M4RI is available with a few small build system tweaks. It is available at https://bitbucket.org/malb/m4ri/downloads.

  • 2015/04/17 Our hosting for http://m4ri.sagemath.org at University of Washington. is discontinued and we’re moving everything over to https://bitbucket.org/malb/m4ri. A copy of the old website (except for large files) is available at http://malb.bitbucket.io/m4ri-e-website-2008-2015/.

  • 2014/09/14 A new version of M4RI and M4RIE is available for download. The biggest change is that A->offset was dropped. Also, various small (multicore) performance improvements were implemented. The update for M4RIE is to maintain compatibility with M4RI. A few improvements were implemented for the mzd_poly module as well.

  • 2013/04/16 A new version of M4RI is available for download. A detailed changlog is available here for M4RI.

  • 2012/12/21 A new version of M4RI is available for download. A detailed changlog is available here for M4RI. See also this blog post for details.

  • 2012/06/13 New versions of both M4RI and M4RIE are available for download. A detailed changlog are available here for M4RI.

  • 2012/04/13 New versions of both M4RI and M4RIE are available for download. Detailed changlogs are available here for M4RI and here for M4RIE.

  • 2011/12/04 New versions of both M4RI and M4RIE are available for download. The highlight of this version for M4RI is support for reading and writing 1-bit PNG images. The highlight of this release of M4RIE is much improved performance for $4 < e \leq 8$. Detailed changlogs are available here for M4RI and here for M4RIE.

  • 2011/11/30 A technical report by Martin R. Albrecht is available describing the M4RIE library. In particular, Newton-John tables are introduced and our implementation of Karatsuba based matrix-matrix multiplication is described:

    The M4RIE library for dense linear algebra over small fields with even characteristic

    Abstract: In this work, we present the M4RIE library which implements efficient algorithms for linear algebra with dense matrices over GF(2^e) for 2 ≤ e ≤ 10. As the name of the library indicates, it makes heavy use of the M4RI library both directly (i.e., by calling it) and indirectly (i.e., by using its concepts). We provide an open-source GPLv2+ C library for efficient linear algebra over GF(2^e) for e small. In this library we implemented an idea due to Bradshaw and Boothby which reduces matrix multiplication over GF(p^k) to a series of matrix multiplications over GF(p). Furthermore, we propose a caching technique - Newton-John tables - to avoid finite field multiplications which is inspired by Kronrod's method ("M4RM") for matrix multiplication over GF(2). Using these two techniques we provide asymptotically fast triangular solving with matrices (TRSM) and PLE-based Gaussian elimination. As a result, we are able to significantly improve upon the state of the art in dense linear algebra over $F(2^e) with 2 ≤ e ≤ 10.

  • 2011/11/29 A technical report by Martin R. Albrecht, Gregory Bard and Clément Pernet is available describing the Gaussian elimination machinery (PLE decomposition) in the M4RI library:

    Efficient Dense Gaussian Elimination over the Finite Field with Two Elements.

    Abstract: In this work we describe an efficient implementation of a hierarchy of algorithms for Gaussian elimination upon dense matrices over the field with two elements. We discuss both well-known and new algorithms as well as our implementations in the M4RI library, which has been adopted into Sage. The focus of our discussion is a block iterative algorithm for PLE decomposition which is inspired by the M4RI algorithm. The implementation presented in this work provides considerable performance gains in practice when compared to the previously fastest implementation. We provide performance figures on x86_64 CPUs to demonstrate the alacrity of our approach.

  • 2011/10/10 A new release of M4RI is available for download. See the release notes for the list of changes. Also, a new release of M4RIE is also available for download. See the release notes for the list of changes.

  • 2011/07/14 A new release of M4RI is available for download. See the release notes for the list of changes. Also, a new release of M4RIE is also available for download. M4RIE now relies on M4RI for cache size and other hardware feature detection.

  • 2011/06/10 A new release of M4RI is available for download. This version fixes various issues when M4RI is built with OpenMP enabled.

  • 2011/06/01 A new release of M4RI is available for download. See the release notes for the list of changes. Also, a new release of M4RIE is also available for download. The only changes to M4RIE are to ensure compatibility with M4RI version 20110601 and up.

  • 2011/04/13 We now have a mailinglist.

  • 2010/08/14 A new release of M4RI is available for download. The main changes are improved automatic cache size detection and some clean ups necessary for M4RIE. A first official release of M4RIE is also available for download.

  • 2010/07/13 A new release is available for download. See the release notes for details.

  • 2009/11/04 A new release is available for download. See the release notes for details.

  • 2009/04/09 A new release is available for download. It heavily breaks backward compatibility but supports much bigger matrices than before. See the release notes for details.

  • 2009/01/05 A new release is available for download. It contains new features, performance enhancements and bug fixes. Release notes are available in the wiki.

  • 2008/11/12 A paper describing our matrix multiplication implementation is available as pre-print on the ArXiv. Also, M4RI is being packaged for Fedora Core. Finally, we updated the peformance data for GAP and Magma on the Core 2 Duo with improved timings.

  • 2008/10/28 A new release is available for download. It contains mainly bugfixes but starting with this release triangular solving with matrices (TRSM) is fully supported. Also LUP factorisation (i.e. on full rank matrices) seems to be working now but it is not optimised at all.

  • 2008/10/22 The slides for the Sage Days 10 talk about matrix multiplication in the M4RI library are available online.

  • 2008/09/22 A new release is available. It is identical to the version of M4RI shipped with Sage 3.1.2 and contains many build fixes for a wide range of platforms. Sage (and thus M4RI) supportes x86 Linux, x86_64 Linux, ia64 Linux, x86 OSX and ppc OSX. M4RI also supports Windows and Solaris 10.

  • 2008/08/26 This release is a pure bugfix release. Before this bugfix, if the input matrices were very non-square either wrong results or SIGSEGVs could be observed.

  • 2008/08/21 A new release is available. This release contains Clément Pernet's latest LQUP and TRSM development code. LQUP still lacks a basecase but TRSM should be fairly complete. No attempts were made so far to optimise things. Furthermore, this release contains an improved strategy for choosing $k$ in M4RM which improves performance on the Core2Duo.

  • 2008/08/17 A new release is available. This release adds a simple memory manager for systems with slow malloc/free syscalls. Also, the initialisation (m4ri_init) and finalisation (m4ri_fini) routines are now called automatically when the library is loaded/unloaded. This is tested with GCC and SunCC but not with MSVC. Matrix elimination got slightly faster across plattforms, multi-core support was extended to elimination and improved for multiplication. The README contains instruction how to enable multi-core support. This release does not contain Clément Pernet's latest LQUP patch.

  • 2008/06/24 A new release is available. This release uses the libtool -release mechanism to ensure binary (in)compatibility between releases since - again - the API changed: since the project is quite young do not expect the API to be stable anytime soon. Also the new version attempts to detect the L1 and L2 cache sizes and uses a Strassen-Winograd cutoff by default such that both source matrices fit in L2 (this is not optimal but a good compromise). This new version has some scratch/experimental code which is the beginning of asymptotically fast LQUP factorisation. Finally, elimination got slightly faster.

  • 2008/06/20 Thanks to Tim Abbott libM4RI is now in Debian/unstable.

  • 2008/06/13 It turns out our comparison with Magma on the Core2Duo was strongly biased, since we compared with a version of Magma that was optimised for AMD64 rather than Intel64. The correct times are given now and we apologise for this mix-up.

  • 2008/06/03 This release is a small bugfix release. Matrices are now printed correctly and a bug in mzd_gauss_delayed was reported and fixed by Wael Said and Mohammed Saied.

  • 2008/06/01 This release greatly improves the performance of M4RI: the reduction of a given matrix to (reduced) row echelon form. The speed-up over the last release can be as much as ten, we will provide performance data for this in the near future. However, the new implementation still isn't asymptotically fast. Also mzd_transpose is much faster now due to improved data locality.

  • 2008/05/21 Today's release fixes a severe bug found by Bill Hart, disables SSE2 on all CPUs except those manufactured by Intel (for performance reasons), improves performance on the Core2Duo and introduces a configure switch to enable OpenMP support.

  • 2008/05/20 A new release is available with massive speed improvements for matrix multiplication. These improvements were discussed and tested in this thread on the sage-devel mailing list. This release has also experimental and preliminary support for OpenMP. To activate it compile with GCC 4.2 and CFLAGS="-fopenmp -DHAVE_OPENMP“ Note however, that this release is still a developer preview since some automatic tuning is still not implemented, the performance on the Opteron isn't acceptable yet, and the parallel implementation is naive.

  • 2008/05/16 Release early, release often. This release fixes the unconditional use of _mm_free even when it is not available.

  • 2008/05/15 A new minor release is available which improves performance on Opterons. Also, the website has moved to http://m4ri.sagemath.org.

m4ri's People

Contributors

alexanderdreyer avatar carlowood avatar cbouilla avatar clementpernet avatar isuruf avatar jdemeyer avatar jgdumas avatar jpflori avatar kiwifb avatar malb avatar msoos avatar sebastinas avatar thomwiggers avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

m4ri's Issues

L3 cache detection issue

On sage-on-gentoo on a Debian prefix my friend Steve Trogdon noticed some random failures that seemed related to the use of openmp in m4ri. Disabling openmp solved made the issue go away.

After he investigated further it seems that the L3 cache of the machine is misdetected. Entering some safe values again made it go away.
See cschwan/sage-on-gentoo#475 for the full discussion. I am wondering if the L3 cache detection routine needs updating for newer cpus. The macro hasn't received an update in years (not only in m4ri but upstream as well).

move to saner SONAME

Currently, each release has its own SONAME so no two releases are binary compatible. However, the ABI doesn't change much and we can move to a more stable SONAME.

A user said: This is really import for Linux distributions. We are currently updating all the Sage dependencies in Debian [¹] and for M4RI/M4RIE it is especially important to have a sane SONAME, because other libraries (e.g. linbox) depend on it. So right now M4RI claims not only to break its own ABI with every update, but also the ABIs of the libraries depending on it (and they need a source upload with every M4RI update).

high CPU usage

How to solve the issue of high CPU usage during concurrent operations?

`solve_left` in M4RI is `solve_right` in sage

Indeed, what the mzd_solve_left() function does is exactly what is called M.solve_right() in sage.

It could be somewhat confusing, but I am not sure what is the best solution:

  • Rename the function to mzd_solve_right() ?
  • Add an alias for backward compatibility and stop advertising the old name ?
  • Add run-time error if anyone calls the old function ?

CFLAGS in m4ri.pc

Reported by @kiwifb François Bissey:

I have been giving some thought about the presence of SIMD_CFLAGS and OPENMP_CFLAGS in m4ri.pc. The simd flags are not necessary for a customer of m4ri. By including them you are potentially imposing an optimization (useful or not) on a customer of m4ri. The customer doesn’t need these flags to use m4ri compiled with them. It should be up to consumer whether they want to use it in their own code.

openmp is more subtle. You may want to pass -fopenmp (or equivalent) to the linker to make sure the openmp runtime is linked. But when you put it in CFLAGS you may trigger unwanted use of openmp. Let’s take the example of m4rie. m4rie has its own --enable-openmp switch in configure. If someone has an openmp enabled m4ri but doesn’t want to compile m4rie with openmp (for benchmarking for example), the fact that m4ri puts -fopenmp means openmp code path may still be compiled unless they are guarded by HAVE_OPENMP rather than just #pragmas.

I would be tempted to put OPENMP_CFLAGS in Libs only and not in Cflags.

Beefing up m4ri.pc

I am preparing a new release of BRiAl and it strikes me that the way BRiAl and m4rie access the flags from m4ri is suboptimal.
Instead of of using AX_M4RI_CFLAGS we should just use the output of of pkg-config. To make it even more interesting I think it should include the library flags for openmp and libpng. Also the optimization flags used don't really belong there. The consumer of m4ri should be able to set its own without worrying about what is in m4ri.pc.
If you think that is a worthwhile idea I'll send a pull request.

Improve small-ish matrices

@malb said: Our performance for * 64x64 times 64x1, * 64x64 times 64x64 and * Nx64 times 64x64 matrix multiplication is embarrassingly bad. We must improve performance here, probably by using code by Emmanuel Thomé who worked on these dimensions quite a bit.

Clarify PLUQ specification

The documentation says that the PLUQ function returns matrices satisfying A = PLUQ.

But the test code actually checks that PAQ^t = LU, and the test pass… So the specification is actually A == P^TLUQ.

Proposed solution: fix the doc.

Assertion failed in test_misc -> test_submatrix

This was reported on Gentoo in https://bugs.gentoo.org/908449 where the full build/test logs are available. Note tha gcc-13.x was used.

=========================================
   m4ri 20200125: tests/test-suite.log
=========================================

# TOTAL: 14
# PASS:  13
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: test_misc
===============

submatrix: m:    2, n:  127, (   1,    1,    2,  127) ... passed
Assertion failed: highr-lowr > 0 (test_misc.c: test_submatrix: 81)
FAIL test_misc (exit status: 134)

Matrix inversion

we should add mzd_inv(A) as a shorthand for inverting an invertible matrix.

Multithreading concurrency issues

I'm calling this shared object (so) in my project, and I'm encountering a 'double free' error during concurrent calls. I tried to enable OpenMP with --enable-openmp, but it keeps giving me linking errors.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.