
The EPR Factoring Library


EPR is a high-performance, easy-to-use factoring and primality checking C++ library (header-only) for any integer up to 128 bits in size. At the time of this writing, EPR provides the fastest known factoring functions for 64-bit integers (i.e. the types int64_t and uint64_t). Note that for good performance you must ensure that the standard macro NDEBUG is defined when compiling - see How to use the library.

The name EPR is an abbreviation of Ecm and Pollard-Rho, since those are the two main algorithms this library uses for factoring. It's also a play on Einstein-Podolsky-Rosen for fun (for now).

Thanks to Ben Buhrow for his great original microecm code for ECM, which this library further optimizes (exploiting Clockwork) and extends to 128 bits.

Design goals

The goal for EPR was to create a correct and easy-to-use library with extremely fast factoring (and primality checking) for native C++ integer types (i.e. 16/32/64-bit signed and unsigned ints). Though it was optimized for native C++ integer types, it also efficiently supports 128-bit types - including the compiler extensions __int128_t and __uint128_t. The EPR library uses variants of the ECM, Pollard-Rho, and deterministic Miller-Rabin algorithms described below, and has proofs for the new variants (with the exception of ECM) as well as extensive unit tests. A portion of the optimizations that EPR uses (such as Montgomery arithmetic) are provided by the Clockwork modular arithmetic library.

Requirements

The EPR library requires compiler support for C++17 (if you are not using CMake, you may need to specify the option -std=c++17 when compiling). Compilers that are confirmed to build the library without warnings or errors on x86 include clang 6, clang 10, gcc 7, gcc 10, Intel compiler 19, and Microsoft Visual C++ 2017 and 2019. The library is intended for use on all architectures (e.g. x86/x64, ARM, RISC-V, Power), but has been tested only on x86/x64.

For good performance you absolutely must ensure that the standard macro NDEBUG (see <cassert>) is defined when compiling.

Status

Released. All planned functionality and unit tests are finished and working correctly.

Authors

  • Jeffrey Hurchalla
  • (microecm.h by Ben Buhrow and Jeff Hurchalla)

License

This project is licensed under the MPL 2.0 License - see the LICENSE.TXT file for details.


How to use the library

With CMake

If you're using CMake for your project and you wish to add this library to it, then clone this git repository onto your system. In your project's CMakeLists.txt file, add the following two lines, changing the placeholder portions to match your own project and paths (an easy replacement for your_binary_dir is ${CMAKE_CURRENT_BINARY_DIR}):
add_subdirectory(path_of_the_cloned_factoring_repository   your_binary_dir/factoring)
target_link_libraries(your_project_target_name   hurchalla_factoring)

For good performance you absolutely must ensure that the standard macro NDEBUG (see <cassert>) is defined when compiling. You can do this by calling CMake with -DCMAKE_BUILD_TYPE=Release.

It may help to see a simple example project with CMake.

Without CMake

If you're not using CMake for your project, you'll need to install/copy the EPR library's headers and dependencies to some directory in order to use them. To do this, first clone this git repository onto your system. You'll need CMake on your system (at least temporarily), so install CMake if you don't have it. Then from your shell run the following commands:

cd path_of_the_cloned_factoring_repository
mkdir tmp
cd tmp
cmake -S.. -B.
cmake --install . --prefix the_folder_you_want_to_install_to

If you prefer, for the last command you could instead use CMake's default install location (on Linux this is /usr/local) by omitting --prefix and the folder that follows it.

This will copy all the header files needed for the factoring library to an "include" subfolder in the installation folder of your choosing. When compiling your project, you'll of course need to ensure that you have that include subfolder as part of your include path.

For good performance you absolutely must ensure that the standard macro NDEBUG (see <cassert>) is defined when compiling. You can generally do this by adding the option flag -DNDEBUG to your compile command.

It may help to see a simple example.

The API

The API consists of five header files in total (all the headers that are not under the detail folder): the three general-purpose files factorize.h, is_prime.h, and greatest_common_divisor.h, and, in the resource_intensive_api folder, the two special-purpose files factorize_intensive_uint32.h and IsPrimeIntensive.h. Please view these files for their documentation. A quick summary of the functions is provided below; in all cases T is a template parameter of integral type.

  • hurchalla::factorize(T x, int& num_factors). Returns a std::array containing the factors of x.
  • hurchalla::factorize(T x, std::vector<T>& factors). Clears the std::vector and fills it with the factors of x.
  • hurchalla::greatest_common_divisor(T a, T b). Returns the greatest common divisor of a and b.
  • hurchalla::is_prime(T x). Returns true if x is prime. Otherwise returns false.

(from the resource_intensive_api folder)
  • hurchalla::IsPrimeIntensive(T x). This is a functor that returns true if x is prime, and otherwise returns false. Depending on the type T, this functor can use a very large amount of memory and can take many seconds to construct. See IsPrimeIntensive.h for details.
  • hurchalla::factorize_intensive_uint32(uint32_t x, int& num_factors, const IsPrimeIntensive<uint32_t,true>& ipi). Returns a std::array containing the factors of x. Note that the IsPrimeIntensive argument will usually take many seconds to construct and will use a large amount of memory. See IsPrimeIntensive.h for details.
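
As a quick illustration, below is a minimal usage sketch based on the function summaries above; it is not taken from the library's documentation. The include paths shown are an assumption about the installed header layout - adjust them to wherever the headers live on your system - and the program should be compiled with C++17 and NDEBUG defined (e.g. -std=c++17 -O2 -DNDEBUG).

// Minimal usage sketch; the include paths below are assumptions about the
// installed header layout, so adjust them as needed.
#include "hurchalla/factoring/factorize.h"
#include "hurchalla/factoring/greatest_common_divisor.h"
#include "hurchalla/factoring/is_prime.h"
#include <cstdint>
#include <iostream>

int main()
{
    std::uint64_t x = 12345678901234567ull;

    // factorize() overload that returns a std::array; num_factors receives
    // the number of factors that were written into the array.
    int num_factors = 0;
    auto factors = hurchalla::factorize(x, num_factors);
    std::cout << x << " = ";
    for (int i = 0; i < num_factors; ++i)
        std::cout << factors[i] << ((i + 1 < num_factors) ? " * " : "\n");

    // is_prime() and greatest_common_divisor()
    std::cout << "is_prime: " << hurchalla::is_prime(x) << "\n";
    std::cout << "gcd with 1000: "
              << hurchalla::greatest_common_divisor(x, std::uint64_t(1000)) << "\n";
    return 0;
}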

Algorithms

For factoring: ECM ("Factoring Integers with Elliptic Curves" by H.W. Lenstra Jr. (see also https://en.wikipedia.org/wiki/Lenstra_elliptic-curve_factorization)). For smaller integers: Pollard-Rho Brent ("An Improved Monte Carlo Factorization Algorithm" by Richard Brent (see also https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm#Variants)).
Both ECM and Pollard-Rho are preceded by a stage of trial division using special algorithms for divisibility (Section 10-17 from Hacker's Delight 2nd edition by Henry Warren, and "ALGORITHM A: IS_DIV_A" from "Efficient long division via Montgomery multiply" by Ernst W. Mayer).
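
As a side note, the divisibility trick those references describe can be sketched in a few lines. The following is a minimal stand-alone illustration (not the library's actual code), assuming a 64-bit unsigned dividend and an odd divisor: n is divisible by d exactly when n multiplied by the modular inverse of d (mod 2^64) is at most UINT64_MAX / d.

#include <cstdint>

// Multiplicative inverse of an odd number d modulo 2^64, via Newton's method.
// The initial guess d is correct to 3 low bits, and each iteration doubles the
// number of correct low bits, so 5 iterations cover all 64 bits.
constexpr std::uint64_t inverse_mod_2pow64(std::uint64_t d)
{
    std::uint64_t x = d;
    for (int i = 0; i < 5; ++i)
        x *= 2 - d * x;
    return x;   // d * x == 1 (mod 2^64)
}

// Divisibility test for an odd divisor d, in the style of Hacker's Delight
// section 10-17 ("Test for Zero Remainder after Division by a Constant").
constexpr bool is_divisible_by_odd(std::uint64_t n, std::uint64_t d)
{
    return n * inverse_mod_2pow64(d) <= UINT64_MAX / d;
}

static_assert(is_divisible_by_odd(21, 7), "21 is divisible by 7");
static_assert(!is_divisible_by_odd(22, 7), "22 is not divisible by 7");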

For primality testing: Deterministic Miller-Rabin (https://miller-rabin.appspot.com/ and https://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test#Deterministic_variants). If using the resource_intensive_api, also Sieve of Eratosthenes for 32 bit and smaller types (https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes).
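
For reference, a single Miller-Rabin trial on a 64-bit number can be sketched as below. This is only an illustration of the underlying test - it is not the library's implementation, which instead uses Montgomery arithmetic, hashed base tables, and multiple simultaneous exponentiations, as described below.

#include <cstdint>

// x * y (mod n) for 64-bit operands, using a 128-bit intermediate product.
static std::uint64_t mulmod(std::uint64_t x, std::uint64_t y, std::uint64_t n)
{
    return static_cast<std::uint64_t>((static_cast<__uint128_t>(x) * y) % n);
}

// a^e (mod n) by square-and-multiply.
static std::uint64_t powmod(std::uint64_t a, std::uint64_t e, std::uint64_t n)
{
    std::uint64_t result = 1;
    a %= n;
    while (e > 0) {
        if (e & 1) result = mulmod(result, a, n);
        a = mulmod(a, a, n);
        e >>= 1;
    }
    return result;
}

// One Miller-Rabin trial: returns true if odd n > 2 is a probable prime to
// the given base (assumed to satisfy 2 <= base <= n - 2).  A deterministic
// test runs enough trials with known-good bases that no composite below the
// target bound can pass them all.
static bool miller_rabin_trial(std::uint64_t n, std::uint64_t base)
{
    std::uint64_t d = n - 1;
    int r = 0;
    while ((d & 1) == 0) { d >>= 1; ++r; }   // n - 1 == d * 2^r with d odd

    std::uint64_t x = powmod(base, d, n);
    if (x == 1 || x == n - 1) return true;
    for (int i = 1; i < r; ++i) {
        x = mulmod(x, x, n);
        if (x == n - 1) return true;
    }
    return false;
}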

Special attention was paid to instruction level parallelism, to take advantage of typical pipelined/superscalar CPUs. This resulted in modifications to some algorithms. For example, Miller-Rabin is changed (along with its associated modular exponentiation) to perform more than one trial/exponentiation at a time. And there is a changed Pollard-Rho Brent algorithm that simultaneously advances two independent sequences (from which it extracts factors), rather than just one.

Near-optimal hash tables are used for fast deterministic Miller-Rabin primality testing. For information on the tables and how they were generated, you can view their README, and see the header files with the tables. These hash tables are the densest known at this time. The general purpose functions hurchalla::is_prime and hurchalla::factorize use some of the smallest of these hash tables (8 to 320 bytes), to minimize memory footprint and cache impact while still receiving a performance boost. The resource_intensive_api functions use the largest of the hash tables for best possible performance.

The particular ECM variant used by this library originated from Ben Buhrow's microecm in yafu.

The Pollard-Rho-Brent algorithm uses an easy extra step that seems to be unmentioned in the literature. The step is a "pre-loop" that advances as quickly as possible through a portion of the initial pseudo-random sequence before beginning the otherwise normal Pollard-Rho Brent algorithm. The rationale for this is that every Pollard-Rho pseudo-random sequence begins with a non-periodic segment, and trying to extract factors from that segment is mostly wasted work since the algorithm logic relies on a periodic sequence. Using a "pre-loop" that does nothing except iterate for a set number of times through the sequence thus improves performance on average, since it quickly gets past some of that unwanted non-periodic segment. In particular, the "pre-loop" avoids calling the greatest common divisor, which would rarely find a factor during the non-periodic segment. This optimization would likely help any form/variant of the basic Pollard-Rho algorithm.
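
To make the idea concrete, here is a minimal sketch of a pre-loop added in front of a plain (Floyd-cycle) Pollard-Rho. It is not the library's code, and the pre-loop length used is an arbitrary illustrative value rather than a tuned one.

#include <cstdint>
#include <numeric>   // std::gcd

// One step of the pseudo-random sequence: x^2 + c (mod n), with a 128-bit
// intermediate product.
static std::uint64_t advance(std::uint64_t x, std::uint64_t c, std::uint64_t n)
{
    return static_cast<std::uint64_t>((static_cast<__uint128_t>(x) * x + c) % n);
}

// Returns a non-trivial factor of the odd composite n, or n itself on failure
// (in which case a caller would retry with a different starting value or c).
static std::uint64_t pollard_rho_with_preloop(std::uint64_t n,
                                              std::uint64_t x0 = 2,
                                              std::uint64_t c = 1)
{
    // Pre-loop: advance through part of the initial non-periodic segment as
    // fast as possible, with no gcd calls.  150 is an illustrative length.
    std::uint64_t x = x0;
    for (int i = 0; i < 150; ++i)
        x = advance(x, c, n);

    // Ordinary Floyd-cycle Pollard-Rho from this point on.
    std::uint64_t slow = x, fast = x, d = 1;
    do {
        slow = advance(slow, c, n);
        fast = advance(advance(fast, c, n), c, n);
        std::uint64_t diff = (slow > fast) ? slow - fast : fast - slow;
        d = std::gcd(diff, n);
    } while (d == 1);
    return d;   // a proper factor, or n if the sequence cycled without one
}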

Performance Notes

If you're interested in experimenting, predefining certain macros when compiling can improve performance - see macros_for_performance.md.


factoring's Issues

Slower performance for 32-bit integers

Thank you for creating this wonderful library!

I compared the performance to other similar libraries/programs and it is significantly faster for larger 128-bit integers. For example, compared to GNU factor, it is over 330× faster:

$ ./time.sh -i -r 5 "seq $(bc <<<'(2^127) - (10^2) - 1') $(bc <<<'(2^127) - 1') | "{factor,./gnu/factor,./factoring/gcc_example,./factoring/clang_example}
Benchmark #1: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | factor
  Time (mean ± σ std dev):     143.1562s ±  0.5082s          [User: 143.1556s, System: 0.0000s]
  Range (min … median … max):  142.371s … 143.422s … 143.732s   CPU: 100.0%, 5 runs

Benchmark #2: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./gnu/factor
  Time (mean ± σ std dev):     142.3252s ±  0.7053s          [User: 141.9832s, System: 0.0060s]
  Range (min … median … max):  141.651s … 141.996s … 143.588s   CPU:  99.8%, 5 runs

Benchmark #3: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/gcc_example
  Time (mean ± σ std dev):      0.9602s ±  0.7591s          [User: 0.5456s, System: 0.0080s]
  Range (min … median … max):   0.507s …  0.564s …  2.473s   CPU:  57.7%, 5 runs

Benchmark #4: seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/clang_example
  Time (mean ± σ std dev):      0.4254s ±  0.1519s          [User: 0.3314s, System: 0.0080s]
  Range (min … median … max):   0.312s …  0.356s …  0.719s   CPU:  79.8%, 5 runs

Summary
  #4 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/clang_example’ ran
  336.521 ± 120.198 times (33,552.1%) faster than #1 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | factor’
  334.568 ± 119.506 times (33,356.8%) faster than #2 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./gnu/factor’
    2.257 ± 1.958 times (125.7%) faster than #3 ‘seq 170141183460469231731687303715884105627 170141183460469231731687303715884105727 | ./factoring/gcc_example’

For a couple of 128-bit integers with large 64-bit factors (from the GNU test suite), it is almost 2,000× faster:

$ ./time.sh -i -m 2 "echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | "{factor,./gnu/factor,./factoring/gcc_example,./factoring/clang_example}
Benchmark #1: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | factor
  Time (mean ± σ std dev):     319.4225s ±  0.3625s          [User: 319.4225s, System: 0.0000s]
  Range (min … median … max):  319.060s … 319.422s … 319.785s   CPU: 100.0%, 2 runs

Benchmark #2: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./gnu/factor
  Time (mean ± σ std dev):     318.5655s ±  0.4765s          [User: 317.7500s, System: 0.0005s]
  Range (min … median … max):  318.089s … 318.566s … 319.042s   CPU:  99.7%, 2 runs

Benchmark #3: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/gcc_example
  Time (mean ± σ std dev):      1.1580s ±  0.8340s          [User: 0.2590s, System: 0.0340s]
  Range (min … median … max):   0.324s …  1.158s …  1.992s   CPU:  25.3%, 2 runs

Benchmark #4: echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/clang_example
  Time (mean ± σ std dev):      0.1634s ±  0.0169s          [User: 0.1365s, System: 0.0059s]
  Range (min … median … max):   0.152s …  0.158s …  0.219s   CPU:  87.2%, 13 runs

Summary
  #4 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/clang_example’ ran
1,955.034 ± 202.348 times (195,403.4%) faster than #1 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | factor’
1,949.789 ± 201.814 times (194,878.9%) faster than #2 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./gnu/factor’
    7.088 ± 5.157 times (608.8%) faster than #3 ‘echo 170141183460469225450570946617781744489 170141183460469229545748130981302223887 | ./factoring/gcc_example’

However, for smaller integers under 32 bits, it seems to be up to 11× slower than GNU factor:

$ ./time.sh -i -r 5 'seq 2 10000000 | '{factor,./gnu/factor,./uu/factor,./factoring/gcc_example,./factoring/clang_example,'./numbers -p'}
Benchmark #1: seq 2 10000000 | factor
  Time (mean ± σ std dev):      2.2944s ±  0.0112s          [User: 2.0142s, System: 0.4204s]
  Range (min … median … max):   2.281s …  2.291s …  2.313s   CPU: 106.1%, 5 runs

Benchmark #2: seq 2 10000000 | ./gnu/factor
  Time (mean ± σ std dev):      2.2020s ±  0.0087s          [User: 1.8820s, System: 0.4534s]
  Range (min … median … max):   2.188s …  2.201s …  2.213s   CPU: 106.1%, 5 runs

Benchmark #3: seq 2 10000000 | ./uu/factor
  Time (mean ± σ std dev):     12.6390s ±  0.0296s          [User: 10.2822s, System: 2.4632s]
  Range (min … median … max):  12.591s … 12.637s … 12.684s   CPU: 100.8%, 5 runs

Benchmark #4: seq 2 10000000 | ./factoring/gcc_example
  Time (mean ± σ std dev):     15.2668s ±  0.1457s          [User: 12.6782s, System: 2.7610s]
  Range (min … median … max):  15.093s … 15.291s … 15.512s   CPU: 101.1%, 5 runs

Benchmark #5: seq 2 10000000 | ./factoring/clang_example
  Time (mean ± σ std dev):     15.2658s ±  0.0899s          [User: 12.7338s, System: 2.7064s]
  Range (min … median … max):  15.172s … 15.212s … 15.419s   CPU: 101.1%, 5 runs

Benchmark #6: seq 2 10000000 | ./numbers -p
  Time (mean ± σ std dev):     19.5844s ±  0.2708s          [User: 17.0100s, System: 2.7530s]
  Range (min … median … max):  19.241s … 19.561s … 20.072s   CPU: 100.9%, 5 runs

Summary
  #2 ‘seq 2 10000000 | ./gnu/factor’ ran
    1.042 ± 0.007 times (4.2%) faster than #1 ‘seq 2 10000000 | factor’
    5.740 ± 0.026 times (474.0%) faster than #3 ‘seq 2 10000000 | ./uu/factor’
    6.933 ± 0.072 times (593.3%) faster than #4 ‘seq 2 10000000 | ./factoring/gcc_example’
    6.933 ± 0.049 times (593.3%) faster than #5 ‘seq 2 10000000 | ./factoring/clang_example’
    8.894 ± 0.128 times (789.4%) faster than #6 ‘seq 2 10000000 | ./numbers -p’

It seems to get progressively faster as the integers get larger. For integers less than 10^4, it is about 10.9× slower, for 10^5 about 8.1× slower, for 10^6 about 7.2× slower, for 10^7 about 6.9× slower and for 10^8 about 4.6× slower.

The above benchmarks were created with my Benchmarking Tool and include the system GNU factor (factor), a locally compiled version of GNU factor with -O3 and LTO enabled (./gnu/factor), uutils' Rust port of GNU factor (./uu/factor), a simple wrapper program for this library - adapted from your "example without CMake" and compiled with both GCC (./factoring/gcc_example) and Clang (./factoring/clang_example), with inline ASM enabled - and lastly my simple Numbers Tool program, which includes a partial C++ port of GNU factor (./numbers -p).

Hopefully I am using your library correctly to get maximum performance. For reference, the wrapper program I wrote for your library is available here: https://gist.github.com/tdulcet/6393865769644aaf9244561f3eb4e9c2. The commands I used to compile and run it are on top of the file. I used GCC 11.4 and Clang 14.0. Note that Clang did generate 42 compiler warnings, but in my benchmarking it was always faster than GCC with your library.
