plummerssoftwarellc / primes Goto Github PK

Prime Number Projects in C#/C++/Python

Home Page: https://plummerssoftwarellc.github.io/PrimeView/

C++ 7.07% C# 21.89% Python 3.77% Shell 2.74% Batchfile 1.44% Go 3.76% Pascal 1.80% Julia 2.01% Common Lisp 14.26% Swift 1.57% Java 2.82% PHP 0.27% Zig 9.37% Dockerfile 4.01% Crystal 0.89% Ada 0.31% Makefile 0.70% Assembly 15.07% JavaScript 3.44% TypeScript 2.82%

programming-languages drag-race primes primesieve benchmarks docker benchmark

primes's Introduction

Primes | A Software Drag Race

Note: You're currently looking at the community contribution branch of this repo, which is now the default. The original branch can be found here.

Source code to Dave's Garage video benchmarking the same prime number sieve in Python, C#, and C++.

Build status

Community contributions

Community contributions, fixes and improvements are accepted on this branch. If you want to add your own solution, please read CONTRIBUTING.md.

Running the benchmarks

The vast majority of benchmarks (solutions) in this repository can be built and run with one Makefile. This is described in BENCHMARK.md.

Viewing results

The PrimeView web application can be used to view, filter and sort results generated by Dave's benchmark machine. It is our intention to add results of other systems in the future. Note that any feedback, suggestions, PRs concerning PrimeView need to be made on the application's own repository.

primes's People

Contributors

Stargazers

Watchers

Forkers

lekonna crgimenes egorbo tannergooding thech0s3n0n3 mckoss rhbvkleef maw136 panjkov p-i-t-s mike-barber realxcv arnonsen jakimball pwae awesomegithubrepos ch0001 zettzet parapapan adamsabelg1 trondhe antonyshen alfest athenaazuraea ibledy bterwijn jl102 profl chinnagalla sewer56 david-sobeski callowaysutton vavanade aberent pasanmanula omnifarious spongman keremkoksal fmielicki dennisdebest jsrobson10 venomtek jameswrw hugosantiagobecerraadan seanbouk j-f1 brucepeterson mikehw mitchellwaite jacktrapper dustman31 jem-experience apisarenco nitepone jaybinks vedarepowered satrac abhiyanp marghidanu robert-fhm bundgaard ruchita3009 alexte yajiynma bluetsys mansenc mihapiha fspigel ijohnnis zyhjuan trevorstubbs cpayne1303 bluseking asdfjkluiop 2782579159qq dhirajpa tdaz-9 mkfahim arvin69 shakedtest jicgiebcden tecwano24145 ape0bakass fatho smiliea gkpotter rich1051414 justusiv dmannock dcbi ajayemlyon gitshreevatsa lvfrazao xvolks lzpnplon138 alansingfield jmatthewpeters minecraftchest1 tcode94 zoopyfsc

primes's Issues

Not an issue - Python with Numba, faster than C#

Hello,

This is not really an issue. Wanted to share a version of the Python code using Numba. But because this is the first time i used Numba and i don't really know how to use jitclass, i had to pull the code apart so it's pretty ugly. As such, i didn't want to submit it as a pull request. Hopefully someone can make a prettier version.

On my machine this ran faster than the C# version, with 4855 iterations vs 4150.

from sys import stdout
from math import sqrt                       # Used by the sieve
import timeit                               # For timing the durations
import numpy as np
from numba import njit

@njit
def primeSieve(limit=1000000):
    rawbits = np.full(int((limit+1)/2), np.bool(True))

    factor = 3
    q = sqrt(limit)
    num = 0

    while factor < q:
        for num in range (factor, limit):
            if num % 2 == 0:
                continue

            if rawbits[int(num/2)]:
                factor = num
                break

        # If marking factor 3, you wouldn't mark 6 (it's a mult of 2) so start with the 3rd instance of this factor's multiple.
        # We can then step by factor * 2 because every second one is going to be even by definition

        for num in range (factor * 3, limit, factor * 2):
            if num % 2 == 0:
                continue
            rawbits[int(num/2)] = False

        factor += 2 # No need to check evens, so skip to next odd (factor = 3, 5, 7, 9...)
    return rawbits

def printResults(showResults, rawbits, duration, passes, sieveSize):
    if showResults: # Since we auto-filter evens, we have to special case the number 2 which is prime
        stdout.write("2, ")

    count = 1
    for num in range (3, sieveSize): # Count (and optionally dump) the primes that were found below the limit
        if num % 2 == 0:
            continue
        if rawbits[int(num/2)]:
            if showResults:
                stdout.write(str(num) +", ")
            count+=1

    countPrimes = sum(1 for b in rawbits if b)
    assert count == countPrimes
    stdout.write("\n")

    primeCounts = { 10 : 1,                 # Historical data for validating our results - the number of primes
                    100 : 25,               # to be found under some limit, such as 168 primes under 1000
                    1000 : 168,
                    10000 : 1229,
                    100000 : 9592,
                    1000000 : 78498,
                    10000000 : 664579,
                    100000000 : 5761455
                  }

    if sieveSize in primeCounts:      # the data, and (b) our count matches. Since it will return
        validateResults = (primeCounts[sieveSize] == countPrimes)
    else:
        validateResults = False

    print("Passes: " + str(passes) + ", Time: " + str(duration) + ", Avg: " + str(duration/passes) +
          ", Limit: " + str(sieveSize) + ", Count: " + str(count) + ", Valid: " + str(validateResults))

tStart = timeit.default_timer()                         # Record our starting time
totPass = 0
numSieve = 1000000
while timeit.default_timer() - tStart < 10:           # Run until more than 10 seconds have elapsed
    res = primeSieve(numSieve)                        #  Calc the primes up to a million
    totPass = totPass + 1                                 #  Count this pass

tD = timeit.default_timer() - tStart

printResults(False, res, tD, totPass, numSieve)

Discuss how to report on alternate algorithms

I think we need to have a discussion to figure out how to deal with reporting for different algorithms, as this adds an extra dimension of fun.

Currently, all of the implementations (except for the C ones) are very similar to the "reference" C++ version, where we consider all odd numbers as potential primes.

The C implementations from @mckoss and @danielspaangberg are really interesting additions, and make reference to this wikipedia section. The same applies to the C# implementation by @Kinematics. Thanks for these! The important thing to note here is that these are algorithmically quite different from all other implementations as they avoid needing to check higher known primes factors besides just 2: variants include 3,5,7, and upwards.

However, this does raise some issues:

we should not just exclude these from reporting - I think it's great work, and it would be a disservice to not include them. We definitely should continue showcasing these.
we also can't compare these results directly to every other implementation, as it would be a disservice to all the work done in other languages
one of the C solutions, 1of2, is the equivalent of the "reference" C++ implementation, so this one is comparable
considering that 1of2 is part of a more general set of algorithms, I don't think it would make sense to move it to a separate project.
there's a potential arms race as other languages implement this, resulting in a lot of code :)

In essence, I think we just need to work out what do to on the reporting side. My initial thoughts are along the lines of this:

have a category for "normal" solutions, which is where most stuff lives, including C 1of2
have a special "different algorithm" category where we can put the more interesting stuff like 8of30 and up

I'm not sure if this is the right approach, or how to deal with tagging specific lines in the output as "different algorithm". It'll be great to get everyone's input! I know you're working on reporting currently @marghidanu and it's looking pretty cool so far.

Much faster C# version

https://pastebin.com/usbUgMjT

Comparability between implementations

The problem

I just took a look at the implementations for C, C++ & Rust as those are commonly known for getting highly optimized versions in benchmarks. And indeed there are quite highly optimized versions to be found.
But after looking at the C implementations, I was wondering at what point the implementations are no longer comparable.

In my opinion it is not really possible to compare the languages in any meaningful way, when some versions optimize the algorithm itsef in the way that most of the C implementations do. Of course the meaningful comparison is debatable in any case, but I would argue that it may still produce interesting results when all implementations use the same base algorithm.

The original implementation was optimized to skip even numbers which is quite reasonable and results in no cpu time overhead while calculating or counting since it just increments by 2 instead of 1.

Almost all C versions on the other hand further optimized the skipping part, with not only skipping even numbers but instead also skipping higher multiples of other primes (2, 3, 5) or even skipping multiples of all primes up to 13 in one example. This of course provides a significant boost to performance (at least the first few) but is now no longer comparable to the original version (and in turn any language that doesn't use the same optimizations). This optimization is of course quite clever and would be valid in any other case, but here it is basically just using existing knowledge to highly decrease the workingset to a much smaller size than the other implementations have to deal with. (Just skipping the multiples of 3 decreases the unchecked values from about 500_000 to only about 166_667)

Besides the counting part now no longer being trivial (and not being timed), this of course makes C/C++ and in turn all languages with implementations that use this optimization seem much faster in relation to other languages than they are.

A language that is inherntly slower than another language could implement this change an in turn look faster on paper, while the other just doesn't have this optimization.

Considerations for a solution

Ok I don't really have a solution, but here are some of my considerations:

Constructs and optimizations that a language provides are fine and wanted (for example using iterators / filters in rust instead of the normal for loops, or C++ vector find) since those represent the languages features
The base algorithm should have to be the same if any cross language comparison is wanted (outer loop, find next factor, set values for this factor, increment factor)
The factor checking HAS to start with 3 and not 5, 7, or even higher to stay comparable
The way how the smaller tasks in the algorithm, like setting / clearing or storing the values can be changed as needed

bit allocation should be dynamic

Primes/PrimeGo/solution_1/main.go

Line 100 in 2c7bbf9

initBitArray := make([]bool, 1e6)

initBitArray := make([]bool, 1e6)

the array size is hard coded to 1,000,000 elements. Should this by dynamic and allow for 10, 100, 1000 ...?
Also, in Go, a bool type requires 1 byte for memory allocation, so each element in initBitArray is using 8 bits, and your output tag would need to specify '8' for the storage flag.

Python list is not an array

Python's list buitin doesn't do random access, it's not an array. bytearray builtin is a real array.
Yet, somehow the bytearray gains not too much.
On my setup, the original version does 20 passes in 10 seconds, an with the bytearray it only does 42 passes.
Also, every step of the sieve can actually start from factor*factor, because every N*factor for N<factor has already been eliminated. This doesn't give much gain, though.

No Node support

I understand that Node isn't technically a language but neither is PHP.

Zig solution 3 is broken

When I run make then the build stops at Zig solution 3. No report is created.

This can also be reproduced by running:
make one SOLUTION=PrimeZig/solution_3

I get the following error:
Step 12/13 : RUN zig build -Drelease-fast
---> Running in 2af2d356b4a8
PrimeZig...The following command terminated unexpectedly:
/deps/local/zig build-exe /opt/app/src/main.zig -OReleaseFast --cache-dir /opt/app/zig-cache --global-cache-dir /root/.cache/zig --name PrimeZig --enable-cache
error: the following build command failed with exit code 9:
/opt/app/zig-cache/o/c4cde88a97309b9cc1655cc0cdae2bbb/build /deps/local/zig /opt/app /opt/app/zig-cache /root/.cache/zig -Drelease-fast
The command '/bin/sh -c zig build -Drelease-fast' returned a non-zero code: 1
"docker run" requires at least 1 argument.
See 'docker run --help'.

Usage: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Run a command in a new container
/bin/bash: line 6: npm: command not found
make: *** [Makefile:30: one] Error 127

Question about rules

Hi, I've devised an algorithm that is quite fast for the language it is and the base algorithm.
I believe that it is the base algorithm and it follows the rules, but I'm not certain.
It uses 8-bits for each stored value, but instead of storing every value, it only stores for the odd numbers starting at 3. It doesn't have any predefined primes (except 2 like the standard c++ one), so it's not a wheel algorithm.
Is this allowed? Is it 8-bits or 4-bits or other?
I don't see why it wouldn't be, but I wanted to check before I got it ready for PR.

Github Discussions

I think it would be a good idea to enable discussions so that we can collaborate and talk without flooding the issues section with non-issues.

Cleaning up minor C++ bugs, warnings and bad teaching style

There are a few minor C++ warnings that set a bad precedent for people watching the videos who are new to the language.

myDict has a key of type const long -- it should be const long long. The last entry is 10,000,000,000L which overflows a signed 32-bit (2,147,483,647) and wraps around to 1,410,065,408 as the small snippet below demonstrates:

    for (auto i = table.begin(); i != table.end(); i++)
        printf( "%ld\n", i->first );

The literal integer keys in the table should be using annotated with LL:

                {          10LL, 4         },                // Historical data for validating our results - the number of primes
                {         100LL, 25        },               // to be found under some limit, such as 168 primes under 1000
                {        1000LL, 168       },
                {       10000LL, 1229      },
                {      100000LL, 9592      },
                {     1000000LL, 78498     },
                {    10000000LL, 664579    },
                {   100000000LL, 5761455   },
                {  1000000000LL, 50847534  },
                { 10000000000LL, 455052511 }

There is a missing int cast in the sqrt() result in Line #53. It should be:

    int q = (int) sqrt(sieveSize);

There is a missing literal double type annotation in Line 129 for the microseconds case. The integer 1000000 should be annotated as a literal double type with either a . or .0 suffix:

            sieve.printResults(false, duration_cast<microseconds>(steady_clock::now() - tStart).count() / 1000000., passes);

Other misc. comments.

validateResults() is doing an extra find(). It should be only doing this once:

      bool validateResults()
      {
          auto result = myDict.find(sieveSize);
          if (myDict.end() == result)
              return false;
          return result->second == countPrimes();
      }

The table myDict is only used by validateResults yet it pollutes the class. It be scoped within that function. Unfortunately, I'm seeing a huge performance hit on my Threadripper 3960X when it is in the function. Namely 8600 passes (class scope) vs 8000 passes (function scope). Looks like MSVC 2019 isn't properly inline the static const map initializer list when it is in the function scope? Might want to make a mention of the scoping performance issue.
There are numerous edge-case bugs that assume sieveSize > 2 in countPrimes() and printResults(). They should be:

      void printResults(bool showResults, double duration, int passes)
      {
          if ((showResults && (sieveSize >= 2))
              printf("2, ");

      int countPrimes()
      {
          int count = (sieveSize >= 2);

There is no comment on the magic number of 1 for count in printResults() on Line #83:

          int count = 1;

It should be written as:

          int count =  (sieveSize >= 2); // include first prime 2

Lastly, you might want to mention in a follow up video that is worth compiling at Warning Level 4 on MSVC /W4 to catch type errors. While the program will usually run if these are ignored these they are potentially problems lurking in the code. While it can be tedious to "shut the compiler up" over trivial warnings professional programmers try to compile code with zero warnings at the highest level possible.

Would anyone like to Admin this project? Someone knowledgeable in the ways of Github?

If so, please chime in and volunteer! I wasn't really anticipating the amount of interest and I've got 30+ pull requests and I'm swamped just doing the benchmarking and making the videos, but if someone would like to take over this project and maintain the source to the Drag Racing series, I'd love that!

My goal is to be able to keep the original branch with only a very few changes. I have a few checking queued up of my own, which I will check in, and then leave the main branch alone. I want as few changes in the main branch as possible to keep the tests similar from one episode to the next, but we can party alongside it for sure!

But if folks want to add side-projects for ports to other languages (I know folks have already submitted D, Rust, Java, and others!) the can go in folders alongside, like PrimeJava.

If you're interested, please let me know! Ideally with a one or two snippet of your experience with GitHub!

Thanks!
Dave

c++ can count primes at compile time if the sieve size is known at compile time

c++ constexpr example "https://gist.github.com/werto87/e59883c731ece4926a6a212d920961e9"

I get in 5 seconds:
passes: 98510" with debug build
passes: 307252277" with release build (I think the compiler optimizes the calculation in the loop away)

Setup CI for this project

Hello,

It would be nice to set up some CI to run these implementations in a controlled environment periodically. We're getting benchmarks from different machines, and it is hard to keep track of all the numbers for all the different implementations. We need one source of truth.

What is the guideline or procedure for improvements of one of the existing implementations?

What is the guideline or procedure for (small) performance improvements of one of the existing implementations?

Different cases:

There is a small performance improvement possible in the Python solution 2, it was made by someone else.
There is a small performance improvement possible in the Tcl solution 1, that was made by my self.
A small improvement in the algorithm can be applied to multiple implementations.

So what should be done in these cases?

Considerations:

Raise an issue for each solution that can be improved
Update the existing code and submit a new pull request?
Implement a different script in the same solution directory, with a different name and make sure both are run by the Dockerfile?
Implement a new solution directory, with only minor changes to the original?
Does it matter if it is an improvement of one of my "own" or from somebody else?

Please advice.

WASM build/runs for the Rust implementation

Per some discussion on #29 it would be very cool to see a WASM build of the Rust implementation with tests in a couple of different runtimes, Wasmer, Node and Deno likely being the most common.

`make` fails at result presentation step

For the third time now, I've run make and have had all the benchmarks run just fine only for the entire operation to fail at the point where the presentation of the results happens.

I see this error from node:

> ts-node ./src/index.ts "report" "-d" "/tmp/tmp.3PxYmVnGws" "-f" "table"

(node:30699) UnhandledPromiseRejectionWarning: TypeError: glob_1.default.sync(...).flatMap is not a function
    at Command.exports.command.commander_1.Command.requiredOption.option.action (/home/jjohnson/Primes/tools/src/commands/report.ts:26:62)
    at Command.listener [as _actionHandler] (/home/jjohnson/Primes/tools/node_modules/commander/index.js:922:31)
    at Command._parseCommand (/home/jjohnson/Primes/tools/node_modules/commander/index.js:1503:14)
    at Command._dispatchSubcommand (/home/jjohnson/Primes/tools/node_modules/commander/index.js:1443:18)
    at Command._parseCommand (/home/jjohnson/Primes/tools/node_modules/commander/index.js:1460:12)
    at Command.parse (/home/jjohnson/Primes/tools/node_modules/commander/index.js:1292:10)
    at Object.<anonymous> (/home/jjohnson/Primes/tools/src/index.ts:9:9)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at Module.m._compile (/home/jjohnson/Primes/tools/node_modules/ts-node/src/index.ts:1295:23)
    at Module._extensions..js (internal/modules/cjs/loader.js:789:10)
(node:30699) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:30699) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I don't know enough about TypeScript to fix this, or this would be a pull request instead of a bug report.

The very first time I ran this, it worked perfectly. A later run the same day produced this error. Today I ran make again and this error arose again, then I deleted the Primes directory, re-cloned, and re-ran yet again and that is the error that I have pasted here.

If this is something that I am somehow doing wrong, please let me know.

Incorrect results with square numbers.

If the sieveSize is the square of a prime number like 25 or 49, that last number won't be flagged as "not a prime" since the runSieve function will stop before evaluating the factors of its square root.

Might also apply if the size is 50.

#17 includes a fix for this issue for the CPP version.

[Comment] Intel(R) Xeon(R) CPU E5-2630 v4

I have access to a Intel(R) Xeon(R) CPU E5-2630 v4 processor which has 10 cores and 20 threads I would be willing to use to run anyone's C or C++ code. It is a little old as far as processors go, but it may be faster than what we have access to. Reply to this if you are interested.

Where can I view recent results?

Would it be possible for some sort of GitHub CI automation to produce the results somewhere?

Typeo in validation check

Hello David,

I was looking at your source code provided on GitHub Prims in the PrimeSieveCS directory and notice a typo error.
The error is in the PrimeCS.cs file at line 19. It currently reads "{ 10 , 1 }," but there are 4 prime number and should be "{ 10 , 4 }".
I had also taken a quick peek in the C++ and Java source code and notice it has propagated into the Java language domain.
The propagation might have gone over to other language domains as well in your collection.

Best,
Fred Ekstrand

C++ Implementation using C++ Builder 10.3 (Formally Borland C++ Builder)

I have compiled the CPP code using the new version of C++ Builder. I have not changed anything in the code - I just thought it would be interesting to see how the MS and C++ Builder compilers compare. I didn't know if you want to add this into the main tree under the PrimeCPP folder? I have compiled it for Win32 and Win64.

To edit this code, you can download a community version of C++ Builder at: https://www.embarcadero.com/products/cbuilder/starter/free-download

I am not affiliated with Embarcadero at all, I just have been using their tools since they were Borland back in the late 90's,

The code is under my Git account at:
https://github.com/smiliea/PrimeSieveCppBuilder

Thanks!

Python version assert fails on odd numbers

Try:
sieve = prime_sieve(999999) # Calc the primes up to a million-1

Result:
assert(count == this.countPrimes()) AssertionError

The reason is that keyword int in Python doesn't do what a C++ programmer thinks:
this.rawbits = [True] * (int((this.sieveSize+1)/2))

Fix by:
from math import sqrt,floor # Used by the sieve
and
this.rawbits = [True] * (floor((this.sieveSize)/2))

Clarify Drag Race Rules

Thanks for this fun project.

Comparing languages and algorythms gives interesting insights.

For better comparabilty we should define a clear drag race finish line.

Currently the runSieve functions returns when the bitfield is built.
This means algorithms that don't use this field are not comparable.
Or functional languages like haskel would not even start because they don't produce data that is not used.

My proposal for a defined drag race finish line is that the runSieve function should return the number of primes every time and the benchmark loop should check the value every time. This would be a well defined task that can easily be checked on every run.

[Comment] Regarding the NUMA locality of your code on TR

As I alluded too in the video a bit ago, I believe the speed "boost hump" you saw was because of NUMA locality in Linux.
See this doc for more info and tell me if you're thinking the same: https://www.kernel.org/doc/html/latest/admin-guide/mm/numaperf.html

And don't forget to check out Valgrind, it's meant to help you find and fix memory leaks in code, but it's very useful in profiling threads as well! http://cs.ecs.baylor.edu/~donahoo/tools/valgrind/

As always, I enjoyed your video! Thanks for making them! You're a pleasure to watch and talk too!

(You can go ahead and close this ticket, it's not a bug.)

Should parallel version supercede standard version?

In the parallel multicore version (in PrimeCPP_PAR) I've added a number of command line switches that allow you to control how many threads, upper limit, how long to run for, and so on.

Since -t is threadcount running "primes.exe -t 1" will run a single-core pass, obviating the need for both a single and multicore versions.

I'm proposing that we orphan the original single core version and that future improvements go into the PAR version as well.

PS: I didn't want to open a separate issue just to gush, but wanted to say I was blown away and thrilled at the response to this project in general and how so many folks have contributed! And a HUGE thanks to Rolf for stepping up and taking charge!

Dockerfile each implementation

Hello,

Been sifting through some of the issues where we have discussions, and I'm not sure if it has been suggested before, but I would suggest we update all our implementations to be run from a Dockerfile, that way we have less to learn in terms of how each pipeline looks like and also we will be able to run each others implementations in an automated fashion.

Will update my own implementation to have a Dockerfile and then make a new pull-request.

Anyone who has other feedback to this suggestion?

Go implementation does not reset bitArray values between passes

Primes/PrimeGo/solution_1/main.go

Line 105 in ef7e095

sieve := Sieve{bitArray: initBitArray, sieveSize: 1e6}

I believe that since go makes a shallow copy of the slice bitArray that runs after the first will be artificially fast due to having the answer in place already.

https://github.com/emeraldivine335

Readme.md

`sed: illegal option -- r` when running on macOS.

Running make on a macOS MBP 13" 2020 gives error sed: illegal option -- r.

Catalina 10.15.7, Intel Core i7 (Quad).

It runs the benchmarks but flags this error for each one. Unsure if it affects result.

Single benchmarks no longer working

Has the make one option been deprecated?

Faster python implementation.

Hello Dave.

I saw your YouTube video and I instantly became a fan ;). Keep up the good work!

Made a slightly faster implementation with lists. Runs 255 passes compared to the original 40 (6 times faster):

import math
import timeit
from textwrap import dedent
from numba import jit

@jit  # speeds up the funtion 3 times
def run_sieve(limit):
    prime_list = [0, 0, 1]
    prime_list += [i % 2 for i in range(3, limit + 1)]
    upper_limit = int(math.sqrt(limit)) + 1
    for prime_candidate in range(3, upper_limit, 2):
        if prime_list[prime_candidate] == 1:
            for non_prime in range(prime_candidate ** 2, limit, prime_candidate * 2):
                prime_list[non_prime] = 0
    return prime_list[: limit + 1]

tStart = timeit.default_timer()  # Record our starting time
passes = 0
limit = 1000000

while timeit.default_timer() - tStart < 10:
    # Run until more than 10 seconds have elapsed
    prime_validators = run_sieve(limit)  #  Find the results
    passes = passes + 1
time_delta = timeit.default_timer() - tStart
primes = [
    int(prime) for (prime, validator) in enumerate(prime_validators) if validator == 1
]
print(
    dedent(
        f"""\
    Passes: {passes}
    Time: {time_delta}
    Avg: {time_delta/passes}
    Limit: {limit}
    Count: {len(primes)}
    Valid: {primeCounts[limit]==len(primes)}"""
    )
)

Numba optimized numpy solution. Runs 10 000 passes (250 times faster):

import timeit
from textwrap import dedent
from numba import jit
import numpy as np

@jit(nopython=True, fastmath=True, parallel=False)
def run_sieve(limit, prime_list):
    prime_list[:2] = False
    prime_list[4::2] = False
    upper_limit = int(limit ** (1/2)) + 1
    for prime_candidate in range(3, upper_limit, 2):
        if prime_list[prime_candidate]:
            prime_list[prime_candidate ** 2: limit: prime_candidate * 2] = False
    return prime_list[: limit + 1]

def print_results(time_delta, passes, primes, limit):
    prime_counts = {
        10: 4,  # Historical data for validating our results - the number of primes
        100: 25,  # to be found under some limit, such as 168 primes under 1000
        1000: 168,
        10000: 1229,
        100000: 9592,
        1000000: 78498,
        10000000: 664579,
        100000000: 5761455,
    }
    print(f"""Passes: {passes}\nTime: {time_delta}\nAvg: {time_delta/passes}\
        \nLimit: {limit}\nCount: {len(primes)}\nValid: {prime_counts[limit]==len(primes)}""")

if __name__=="__main__":
    t_start = timeit.default_timer()  # Record our starting time
    passes = 0
    limit = 1000000
    while timeit.default_timer() - t_start < 10:
        # Run until more than 10 seconds have elapsed
        raw_array = np.ones(limit, bool)
        prime_validators = run_sieve(limit, raw_array)  #  Find the results
        passes = passes + 1
    time_delta = timeit.default_timer() - t_start
    primes = [prime for (prime, validator) in enumerate(prime_validators) if validator]
    print_results(time_delta, passes, primes, limit)

drag-race branch paths

Honestly, I find the manual classification of the implementations overengineered. Do we need to reflect this in the structure of the project, a markdown table to mark them as faithful/unfaithful should be enough IMHO.

Improvement to Go as new solution?

I made the following changes to Go solution 2 which yielded drastically improved performance. However, since this solution parallelizes the computation I am not sure whether this is fitting as just an improvement to solution 2 which explicitly states that it is not parallelized.

I'd also like to ask all Go developers whether I maybe made a mistake and just overlooked it, because the results on my Ryzen 7 2700X are as follows:

# before patch
ssovest-go;4411;5.001076872;1;algorithm=base,faithful=yes,bits=1
# after patch
ssovest-go;56936;5.000778586;1;algorithm=base,faithful=yes,bits=1

This improvement is enough to make it suspicious in my eyes, but I can not find an error like just increasing passes without completing the sieve beforehand.

diff --git a/PrimeGo/solution_2/main.go b/PrimeGo/solution_2/main.go
index e2bf12b..c767b1d 100644
--- a/PrimeGo/solution_2/main.go
+++ b/PrimeGo/solution_2/main.go
@@ -1,11 +1,15 @@
 package main
 
 import (
+	"context"
 	"flag"
 	"fmt"
 	"math"
 	"math/bits"
+	"runtime"
 	"time"
+
+	"golang.org/x/sync/semaphore"
 )
 
 var primeCounts = map[uint64]uint64{
@@ -60,7 +64,6 @@ func (s Sieve) ValidateResults() bool {
 func main() {
 	var limit uint64
 	var duration time.Duration
-	var sieve Sieve
 
 	flag.Uint64Var(&limit, "limit", 1000000, "limit")
 	flag.DurationVar(&duration, "time", 5*time.Second, "duration")
@@ -71,15 +74,19 @@ func main() {
 	start := time.Now()
 	time.AfterFunc(duration, func() { stop <- struct{}{} })
 
+	sem := semaphore.NewWeighted(int64(runtime.NumCPU()))
 loop:
 	for {
 		select {
 		case <-stop:
 			break loop
 		default:
-			sieve = Sieve{make([]uint8, (limit+15)/16), limit}
-			sieve.RunSieve()
-			passes++
+			sem.Acquire(context.Background(), 1)
+			go func(sieve Sieve) {
+				defer sem.Release(1)
+				sieve.RunSieve()
+				passes++
+			}(Sieve{make([]uint8, (limit+15)/16), limit})
 		}
 	}
 	fmt.Printf("ssovest-go;%v;%v;1;algorithm=base,faithful=yes,bits=1\n", passes, time.Since(start).Seconds())

There are no instructions for running this suite

Please add instructions to the README for running the entire suite. Specifically Windows, although any amount of instructions for any platform would be an improvement!

Is it possible to add a LICENSE file?

In CONTRIBUTING.md, it is mentioned in the rules that:

You own copyright to all code and are willing to license that code under BSD-new/BSD-3 or a more permissive license, or the code is available under BSD-new/BSD-3 or a more permissive license.

Does this currently apply to all code in the repository, or only for new code submitted after a certain commit was added?
If this does apply to all code currently in the repository, it might be best to add a LICENSE file in the root of the repository so it's clear that all solutions are licensed under BSD-3.

If this does not currently apply to all code, it might still be worth placing a LICENSE file with the text of the BSD-3 license, but simply add a footnote that some solutions might not be licensed under that license. Then, for any solutions that need it, place a LICENSE file in the solution directory to note that that specific solution is under a different license.

Benchmark job vs CI

I was discussing this today with @rhbvkleef that we can separate the current CI implementation into two parts: one to do the actual CI process using the GitHub runners and this job could do a sanity check on the Docker files and try to build the images thus ensuring everything works. Another would be a scheduled job that runs the actual benchmark every night on dedicated runners for a more controlled environment. This job will also generate an execution report of the implementations.

Possibly incorrect bits reported by zig solution 2

I noticed that zig solution 2 (ManDeJan&ityonemo-zig-byte-sieve-type-bool) is by far the fastest solution reported as base, faithful=yes, bits=1. So I went looking where the magic is, and all I found was an array of bool. This could be a packed bit array, but it often isn't. I see this came up in #241, where @ityonemo reported (#241 (comment)) that they're just using @bitSizeOf. Sounds fair, but I couldn't find any evidence (in docs, or wherever) that zig uses packed bit arrays by default, so I thought I'd see if I can figure it out by writing a simple test program.

To cut a long story short, this program:

const std = @import("std");
pub fn main() !void {
    std.debug.print("@sizeOf(u8): {d}\n", .{@sizeOf(u8)});
    std.debug.print("@sizeOf([16]u8): {d}\n", .{@sizeOf([16]u8)});
    std.debug.print("@sizeOf(bool): {d}\n", .{@sizeOf(bool)});
    std.debug.print("@sizeOf([16]bool): {d}\n", .{@sizeOf([16]bool)});
    std.debug.print("@bitSizeOf(u8): {d}\n", .{@bitSizeOf(u8)});
    std.debug.print("@bitSizeOf([16]u8): {d}\n", .{@bitSizeOf([16]u8)});
    std.debug.print("@bitSizeOf(bool): {d}\n", .{@bitSizeOf(bool)});
    std.debug.print("@bitSizeOf([16]bool): {d}\n", .{@bitSizeOf([16]bool)});
}

Produces this output:

@sizeOf(u8): 1
@sizeOf([16]u8): 16
@sizeOf(bool): 1
@sizeOf([16]bool): 16
@bitSizeOf(u8): 8
@bitSizeOf([16]u8): 128
@bitSizeOf(bool): 1
@bitSizeOf([16]bool): 16

Which makes no sense at all (either [16]bool is 16 bits, or it's 16 bytes, not both) This leads me to suspect that

@bitSizeOf in zig is either broken or not intended to return the number of bits actually used in memory, and
the zig solution(s) should be classified as bits=unknown

This probably applies to a number of other zig solutions as well. They're obviously still very impressive!

Also tagging @ManDeJan as your name is also on the solution :-)

number field sieves

i'm willing to write quadratic sieves and gnfs in spare time, will you consider that sorta pull request?

parallel version question

what exactly do we mean by a parallel version? Is that

run the sieve on as many threads as possible and add up the results of each thread to get the # of executions? (Gustafson scaling)

run one sieve across many threads and add up how many executions you get, total? (Amdahl scaling)

Improvements to the Swift implementation

Hello! I am looking at the current Swift implementation of the sieve and have some changes in mind that in my machine lead to ~10x speed improvement. The thing is that I consider these changes to be relatively small, and since they don't change the algorithm in any way, I consider them unworthy of a separate solution.

The changes I'm considering are:

Changing the Dockerfile to use a more recent version of the language & Ubuntu 20.04 container instead of the current Ubuntu 18.04
Change the Bit count flag in the README to 8 bits since the implementation is using an array of booleans that each occupy 8 bits in memory, even if only one of those bits is used to represent the value (currently it's set to unknown)
Changes to PrimeSwift/solution_1/Sources/PrimeSieveSwift/main.swift:
1. Delete the BitArray protocol definition (lines 3-7) that is unnecessarily being used as an existential and replace its use in line 33 by the concrete struct BoolBitArray that is already implemented
2. Change PrimeSieveSwift from a class to a struct (at line 31) and make the runSieve method mutating (line 52) to indicate that the method mutates the struct
3. Add the bits=8 tag to the program output at line 117
4. Change the benchmark duration from 10 seconds to 5 at line 125, so the benchmark doesn't take any longer than needed

Then there are more "stylistic" changes that make the code more idiomatic:

Replace the use of an implicitly unwrapped optional (IUO) at line 123 by initialising the sieve variable to an empty sieve instead (var sieve: PrimeSieveSwift! -> var sieve = PrimeSieveSwift(limit: 0))
Replace the use of idx % 2 != 0 by the more descriptive and idiomatic !idx.isMultiple(of: 2)
Rename the PrimeSieveSwift type to PrimeSieve
Rename the bits property to the more descriptive name, isPrime
Change lines 15-28 in the BoolBitArray to be a pair of subscript getter and setter, keeping the indexing implementation the same, and changing every use of the bits property to use the subscript operator like bits[num] and bits[num] = false (at lines 45, 58, 64 and 103), instead of the current getBit and clearBit

My first question is if I should contribute this as a separate new solution or to the existing one.

The contribution guidelines say this under faithfulness:

It uses a class to encapsulate the sieve, or an equivalent feature in your language.

So my next question is if my change from a class to a struct falls under "equivalent feature". A C++ class is closer to a Swift struct than a Swift class, so I guess there is no issue here.

Thank you for your attention.

Require README for each benchmark

Since we have a lot of benchmarks now, I think it would be beneficial to have each folder have a README with relevant information. Most notably, they should probably contain instructions on how to correctly compile and/or run the benchmarks, including recommended compiler flags if relevant. Also, the creators of said benchmarks may wish to write a little description in the readme explaining what it does and how it differs from the original three.

Memory saving for generating larger prime numbers

For primes > 210, there are only 48 answers to: x MOD 210, that a prime can exist at.
I believe it is also true for any prime > 7

1,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,121,127,131,137,139,143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209

If 2,3,5,7 were automatically assumed to be prime, and then a mapping using the above was made for the bitarray, you would only need to use 22.86% of the memory. Allowing you to create primes 437.5% bigger, without going into virtual memory.

A simpler version of this would be: x MOD 6 in [1,5] for any prime > 3, but this would give 66.66% memory saving, compared to 78.14% using MOD 210.

Recommendation: May you have to follow the C# standard naming rules.

Here is a good resource to learn C# standard naming rules.

https://github.com/ktaranov/naming-convention/blob/master/C%23%20Coding%20Standards%20and%20Naming%20Conventions.md

Licenses and Build support for Delphi

I see that Delphi was disqualified for concerns about having a license. I'm happy to provide a few Delphi licenses for the purposes of testing. I'm a Developer Advocate and we are happy to provide licenses for projects like this. Comment on here if you are interested and we can figure out how many licenses you need.

Add a license

Add a license to the code to prevent copyright related issues with viewer submitted code.

error from make

Whenever I execute the make command, I get the following error:

This is since the merge commit from pull request #348 (SHA-1: e866254).
The commit before (SHA-1: 2c7bbf9) does work perfectly.

What am I doing wrong?

Changes for C# Code

Dave,
Great Video and Videos!
thanks, Scott...

Changes for better results in C#:

In validateResults()
if (myDict.ContainsKey(this.sieveSize))
return this.myDict[this.sieveSize] == this.countPrimes();
return false;

To: (closer match to C++ code, and sb faster)
return myDict.TryGetValue(this.sieveSize, out int test) && test == this.countPrimes();

In countPrimes()
for (int i = 0; i < this.bitArray.Count; i++)

To: (know this shouldn't matter, but it seems to help most of the time)
int hold = this.bitArray.Count;
for (int i = 0; i < hold; i++)

Consider changing:
project properties->Build->advance-> set output debug to None (helps little bit, not sure why)
&
C#'s DateTime class is slow (this change help a lot)
A better way is using System.Diagnostics.StopWatch (This class is available in c++ too)

Code would be:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();

            var passes = 0;
            prime_sieve sieve = null;

            while (stopWatch.Elapsed.TotalSeconds < 10)
            {
                sieve = new prime_sieve(1000000);
                sieve.runSieve();
                passes++;
            }

            stopWatch.Stop();
            
            if (sieve != null)
                sieve.printResults(false, stopWatch.Elapsed.TotalSeconds, passes);

Merge `drag-race` and `master`

It causes several problems to not have the majority of contributions be merged into master. As such, the proposal is to find a way to merge these two branches.

Dave requested this:

My goal is to be able to keep the original branch with only a very few changes. I have a few checking queued up of my own, which I will check in, and then leave the main branch alone. I want as few changes in the main branch as possible to keep the tests similar from one episode to the next, but we can party alongside it for sure!

I think achieving the main goal Dave set out, and merging drag-race into master is not impossible.

@mike-barber proposed this:

Would it make sense to put Dave's original solutions in an "original" folder maybe?

Could be all in a single original folder, or each language could have an original folder with Dave's code in it, with community ones then labelled solution_x as usual.

One thing that could make this merge easier is doing it in two phases -

Merge main into drag-race

Then fix the paths in drag-race

Then maybe one day get rid of drag race entirely and just have main :)

I think this is mostly a good idea. The only problematic part is that Dave has implemented multiple solutions for C++, and so there would be multiple original folders. As such, numbering original would make sense.

Before we do this, however, I would like to get a comment from @davepl because we are getting close to encroaching on his contributions.

Request for a replication of observations

My first though in trying to make a parallel sieve algorithm was to exchange time for memory. To that end, I changed the bit field that the program was operating on into a byte field: as the processor can address bytes, you don't have a read-modify-write when flagging a number as composite. It becomes just a write, which has no hazards.
Which is to say, I changed vector bool to vector char .

What I noticed, though, and what I would like verified, is that the negative performance impact from this change is so large that no performance increase from multi-threading could ever recover from it. Does everyone else see that? Or, is my code just absolute garbage?

Thanks

plummerssoftwarellc / primes Goto Github PK

primes's Introduction

Primes | A Software Drag Race

Build status

Community contributions

Running the benchmarks

Viewing results

primes's People

Contributors

Stargazers

Watchers

Forkers

primes's Issues

The problem

Considerations for a solution

Recommend Projects

Recommend Topics

Recommend Org