plummerssoftwarellc / primes Goto Github PK
View Code? Open in Web Editor NEWPrime Number Projects in C#/C++/Python
Home Page: https://plummerssoftwarellc.github.io/PrimeView/
Prime Number Projects in C#/C++/Python
Home Page: https://plummerssoftwarellc.github.io/PrimeView/
What is the guideline or procedure for (small) performance improvements of one of the existing implementations?
Different cases:
So what should be done in these cases?
Considerations:
Please advice.
I see that Delphi was disqualified for concerns about having a license. I'm happy to provide a few Delphi licenses for the purposes of testing. I'm a Developer Advocate and we are happy to provide licenses for projects like this. Comment on here if you are interested and we can figure out how many licenses you need.
Per some discussion on #29 it would be very cool to see a WASM build of the Rust implementation with tests in a couple of different runtimes, Wasmer, Node and Deno likely being the most common.
Would it be possible for some sort of GitHub CI automation to produce the results somewhere?
Since we have a lot of benchmarks now, I think it would be beneficial to have each folder have a README with relevant information. Most notably, they should probably contain instructions on how to correctly compile and/or run the benchmarks, including recommended compiler flags if relevant. Also, the creators of said benchmarks may wish to write a little description in the readme explaining what it does and how it differs from the original three.
Readme.md
Try:
sieve = prime_sieve(999999) # Calc the primes up to a million-1
Result:
assert(count == this.countPrimes()) AssertionError
The reason is that keyword int in Python doesn't do what a C++ programmer thinks:
this.rawbits = [True] * (int((this.sieveSize+1)/2))
Fix by:
from math import sqrt,floor # Used by the sieve
and
this.rawbits = [True] * (floor((this.sieveSize)/2))
I have compiled the CPP code using the new version of C++ Builder. I have not changed anything in the code - I just thought it would be interesting to see how the MS and C++ Builder compilers compare. I didn't know if you want to add this into the main tree under the PrimeCPP folder? I have compiled it for Win32 and Win64.
To edit this code, you can download a community version of C++ Builder at: https://www.embarcadero.com/products/cbuilder/starter/free-download
I am not affiliated with Embarcadero at all, I just have been using their tools since they were Borland back in the late 90's,
The code is under my Git account at:
https://github.com/smiliea/PrimeSieveCppBuilder
Thanks!
As I alluded too in the video a bit ago, I believe the speed "boost hump" you saw was because of NUMA locality in Linux.
See this doc for more info and tell me if you're thinking the same: https://www.kernel.org/doc/html/latest/admin-guide/mm/numaperf.html
And don't forget to check out Valgrind, it's meant to help you find and fix memory leaks in code, but it's very useful in profiling threads as well! http://cs.ecs.baylor.edu/~donahoo/tools/valgrind/
As always, I enjoyed your video! Thanks for making them! You're a pleasure to watch and talk too!
(You can go ahead and close this ticket, it's not a bug.)
Hello Dave.
I saw your YouTube video and I instantly became a fan ;). Keep up the good work!
Made a slightly faster implementation with lists. Runs 255 passes compared to the original 40 (6 times faster):
import math
import timeit
from textwrap import dedent
from numba import jit
@jit # speeds up the funtion 3 times
def run_sieve(limit):
prime_list = [0, 0, 1]
prime_list += [i % 2 for i in range(3, limit + 1)]
upper_limit = int(math.sqrt(limit)) + 1
for prime_candidate in range(3, upper_limit, 2):
if prime_list[prime_candidate] == 1:
for non_prime in range(prime_candidate ** 2, limit, prime_candidate * 2):
prime_list[non_prime] = 0
return prime_list[: limit + 1]
tStart = timeit.default_timer() # Record our starting time
passes = 0
limit = 1000000
while timeit.default_timer() - tStart < 10:
# Run until more than 10 seconds have elapsed
prime_validators = run_sieve(limit) # Find the results
passes = passes + 1
time_delta = timeit.default_timer() - tStart
primes = [
int(prime) for (prime, validator) in enumerate(prime_validators) if validator == 1
]
print(
dedent(
f"""\
Passes: {passes}
Time: {time_delta}
Avg: {time_delta/passes}
Limit: {limit}
Count: {len(primes)}
Valid: {primeCounts[limit]==len(primes)}"""
)
)
Numba optimized numpy solution. Runs 10 000 passes (250 times faster):
import timeit
from textwrap import dedent
from numba import jit
import numpy as np
@jit(nopython=True, fastmath=True, parallel=False)
def run_sieve(limit, prime_list):
prime_list[:2] = False
prime_list[4::2] = False
upper_limit = int(limit ** (1/2)) + 1
for prime_candidate in range(3, upper_limit, 2):
if prime_list[prime_candidate]:
prime_list[prime_candidate ** 2: limit: prime_candidate * 2] = False
return prime_list[: limit + 1]
def print_results(time_delta, passes, primes, limit):
prime_counts = {
10: 4, # Historical data for validating our results - the number of primes
100: 25, # to be found under some limit, such as 168 primes under 1000
1000: 168,
10000: 1229,
100000: 9592,
1000000: 78498,
10000000: 664579,
100000000: 5761455,
}
print(f"""Passes: {passes}\nTime: {time_delta}\nAvg: {time_delta/passes}\
\nLimit: {limit}\nCount: {len(primes)}\nValid: {prime_counts[limit]==len(primes)}""")
if __name__=="__main__":
t_start = timeit.default_timer() # Record our starting time
passes = 0
limit = 1000000
while timeit.default_timer() - t_start < 10:
# Run until more than 10 seconds have elapsed
raw_array = np.ones(limit, bool)
prime_validators = run_sieve(limit, raw_array) # Find the results
passes = passes + 1
time_delta = timeit.default_timer() - t_start
primes = [prime for (prime, validator) in enumerate(prime_validators) if validator]
print_results(time_delta, passes, primes, limit)
It causes several problems to not have the majority of contributions be merged into master
. As such, the proposal is to find a way to merge these two branches.
Dave requested this:
My goal is to be able to keep the original branch with only a very few changes. I have a few checking queued up of my own, which I will check in, and then leave the main branch alone. I want as few changes in the main branch as possible to keep the tests similar from one episode to the next, but we can party alongside it for sure!
I think achieving the main goal Dave set out, and merging drag-race
into master is not impossible.
@mike-barber proposed this:
Would it make sense to put Dave's original solutions in an "original" folder maybe?
Could be all in a single original folder, or each language could have an original folder with Dave's code in it, with community ones then labelled solution_x as usual.
One thing that could make this merge easier is doing it in two phases -
- Merge main into drag-race
- Then fix the paths in drag-race
Then maybe one day get rid of drag race entirely and just have main :)
I think this is mostly a good idea. The only problematic part is that Dave has implemented multiple solutions for C++, and so there would be multiple original
folders. As such, numbering original
would make sense.
Before we do this, however, I would like to get a comment from @davepl because we are getting close to encroaching on his contributions.
what exactly do we mean by a parallel version? Is that
or
If the sieveSize is the square of a prime number like 25 or 49, that last number won't be flagged as "not a prime" since the runSieve function will stop before evaluating the factors of its square root.
Might also apply if the size is 50.
#17 includes a fix for this issue for the CPP version.
For primes > 210, there are only 48 answers to: x MOD 210, that a prime can exist at.
I believe it is also true for any prime > 7
1,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,121,127,131,137,139,143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209
If 2,3,5,7 were automatically assumed to be prime, and then a mapping using the above was made for the bitarray, you would only need to use 22.86% of the memory. Allowing you to create primes 437.5% bigger, without going into virtual memory.
A simpler version of this would be: x MOD 6 in [1,5] for any prime > 3, but this would give 66.66% memory saving, compared to 78.14% using MOD 210.
My first though in trying to make a parallel sieve algorithm was to exchange time for memory. To that end, I changed the bit field that the program was operating on into a byte field: as the processor can address bytes, you don't have a read-modify-write when flagging a number as composite. It becomes just a write, which has no hazards.
Which is to say, I changed vector bool to vector char .
What I noticed, though, and what I would like verified, is that the negative performance impact from this change is so large that no performance increase from multi-threading could ever recover from it. Does everyone else see that? Or, is my code just absolute garbage?
Thanks
c++ constexpr example "https://gist.github.com/werto87/e59883c731ece4926a6a212d920961e9"
I get in 5 seconds:
passes: 98510" with debug build
passes: 307252277" with release build (I think the compiler optimizes the calculation in the loop away)
In the parallel multicore version (in PrimeCPP_PAR) I've added a number of command line switches that allow you to control how many threads, upper limit, how long to run for, and so on.
Since -t is threadcount running "primes.exe -t 1" will run a single-core pass, obviating the need for both a single and multicore versions.
I'm proposing that we orphan the original single core version and that future improvements go into the PAR version as well.
PS: I didn't want to open a separate issue just to gush, but wanted to say I was blown away and thrilled at the response to this project in general and how so many folks have contributed! And a HUGE thanks to Rolf for stepping up and taking charge!
i'm willing to write quadratic sieves and gnfs in spare time, will you consider that sorta pull request?
I was discussing this today with @rhbvkleef that we can separate the current CI implementation into two parts: one to do the actual CI process using the GitHub runners and this job could do a sanity check on the Docker files and try to build the images thus ensuring everything works. Another would be a scheduled job that runs the actual benchmark every night on dedicated runners for a more controlled environment. This job will also generate an execution report of the implementations.
Python's list buitin doesn't do random access, it's not an array. bytearray builtin is a real array.
Yet, somehow the bytearray gains not too much.
On my setup, the original version does 20 passes in 10 seconds, an with the bytearray it only does 42 passes.
Also, every step of the sieve can actually start from factor*factor
, because every N*factor for N<factor has already been eliminated. This doesn't give much gain, though.
Honestly, I find the manual classification of the implementations overengineered. Do we need to reflect this in the structure of the project, a markdown table to mark them as faithful/unfaithful should be enough IMHO.
I understand that Node isn't technically a language but neither is PHP.
Thanks for this fun project.
Comparing languages and algorythms gives interesting insights.
For better comparabilty we should define a clear drag race finish line.
Currently the runSieve functions returns when the bitfield is built.
This means algorithms that don't use this field are not comparable.
Or functional languages like haskel would not even start because they don't produce data that is not used.
My proposal for a defined drag race finish line is that the runSieve function should return the number of primes every time and the benchmark loop should check the value every time. This would be a well defined task that can easily be checked on every run.
I just took a look at the implementations for C, C++ & Rust as those are commonly known for getting highly optimized versions in benchmarks. And indeed there are quite highly optimized versions to be found.
But after looking at the C implementations, I was wondering at what point the implementations are no longer comparable.
In my opinion it is not really possible to compare the languages in any meaningful way, when some versions optimize the algorithm itsef in the way that most of the C implementations do. Of course the meaningful comparison is debatable in any case, but I would argue that it may still produce interesting results when all implementations use the same base algorithm.
The original implementation was optimized to skip even numbers which is quite reasonable and results in no cpu time overhead while calculating or counting since it just increments by 2 instead of 1.
Almost all C versions on the other hand further optimized the skipping part, with not only skipping even numbers but instead also skipping higher multiples of other primes (2, 3, 5) or even skipping multiples of all primes up to 13 in one example. This of course provides a significant boost to performance (at least the first few) but is now no longer comparable to the original version (and in turn any language that doesn't use the same optimizations). This optimization is of course quite clever and would be valid in any other case, but here it is basically just using existing knowledge to highly decrease the workingset to a much smaller size than the other implementations have to deal with. (Just skipping the multiples of 3 decreases the unchecked values from about 500_000
to only about 166_667
)
Besides the counting part now no longer being trivial (and not being timed), this of course makes C/C++ and in turn all languages with implementations that use this optimization seem much faster in relation to other languages than they are.
A language that is inherntly slower than another language could implement this change an in turn look faster on paper, while the other just doesn't have this optimization.
Ok I don't really have a solution, but here are some of my considerations:
Has the make one option been deprecated?
Hello David,
I was looking at your source code provided on GitHub Prims in the PrimeSieveCS directory and notice a typo error.
The error is in the PrimeCS.cs file at line 19. It currently reads "{ 10 , 1 }," but there are 4 prime number and should be "{ 10 , 4 }".
I had also taken a quick peek in the C++ and Java source code and notice it has propagated into the Java language domain.
The propagation might have gone over to other language domains as well in your collection.
Best,
Fred Ekstrand
Primes/PrimeGo/solution_1/main.go
Line 105 in ef7e095
I believe that since go makes a shallow copy of the slice bitArray that runs after the first will be artificially fast due to having the answer in place already.
Please add instructions to the README for running the entire suite. Specifically Windows, although any amount of instructions for any platform would be an improvement!
Dave,
Great Video and Videos!
thanks, Scott...
Changes for better results in C#:
In validateResults()
if (myDict.ContainsKey(this.sieveSize))
return this.myDict[this.sieveSize] == this.countPrimes();
return false;
To: (closer match to C++ code, and sb faster)
return myDict.TryGetValue(this.sieveSize, out int test) && test == this.countPrimes();
In countPrimes()
for (int i = 0; i < this.bitArray.Count; i++)
To: (know this shouldn't matter, but it seems to help most of the time)
int hold = this.bitArray.Count;
for (int i = 0; i < hold; i++)
Consider changing:
project properties->Build->advance-> set output debug to None (helps little bit, not sure why)
&
C#'s DateTime class is slow (this change help a lot)
A better way is using System.Diagnostics.StopWatch (This class is available in c++ too)
Code would be:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
var passes = 0;
prime_sieve sieve = null;
while (stopWatch.Elapsed.TotalSeconds < 10)
{
sieve = new prime_sieve(1000000);
sieve.runSieve();
passes++;
}
stopWatch.Stop();
if (sieve != null)
sieve.printResults(false, stopWatch.Elapsed.TotalSeconds, passes);
Add a license to the code to prevent copyright related issues with viewer submitted code.
Hello,
It would be nice to set up some CI to run these implementations in a controlled environment periodically. We're getting benchmarks from different machines, and it is hard to keep track of all the numbers for all the different implementations. We need one source of truth.
Primes/PrimeGo/solution_1/main.go
Line 100 in 2c7bbf9
initBitArray := make([]bool, 1e6)
the array size is hard coded to 1,000,000 elements. Should this by dynamic and allow for 10, 100, 1000 ...?
Also, in Go, a bool type requires 1 byte for memory allocation, so each element in initBitArray is using 8 bits, and your output tag would need to specify '8' for the storage flag.
I think we need to have a discussion to figure out how to deal with reporting for different algorithms, as this adds an extra dimension of fun.
Currently, all of the implementations (except for the C ones) are very similar to the "reference" C++ version, where we consider all odd numbers as potential primes.
The C implementations from @mckoss and @danielspaangberg are really interesting additions, and make reference to this wikipedia section. The same applies to the C# implementation by @Kinematics. Thanks for these! The important thing to note here is that these are algorithmically quite different from all other implementations as they avoid needing to check higher known primes factors besides just 2: variants include 3,5,7, and upwards.
However, this does raise some issues:
1of2
, is the equivalent of the "reference" C++ implementation, so this one is comparable1of2
is part of a more general set of algorithms, I don't think it would make sense to move it to a separate project.In essence, I think we just need to work out what do to on the reporting side. My initial thoughts are along the lines of this:
1of2
8of30
and upI'm not sure if this is the right approach, or how to deal with tagging specific lines in the output as "different algorithm". It'll be great to get everyone's input! I know you're working on reporting currently @marghidanu and it's looking pretty cool so far.
Here is a good resource to learn C# standard naming rules.
For the third time now, I've run make
and have had all the benchmarks run just fine only for the entire operation to fail at the point where the presentation of the results happens.
I see this error from node:
> ts-node ./src/index.ts "report" "-d" "/tmp/tmp.3PxYmVnGws" "-f" "table"
(node:30699) UnhandledPromiseRejectionWarning: TypeError: glob_1.default.sync(...).flatMap is not a function
at Command.exports.command.commander_1.Command.requiredOption.option.action (/home/jjohnson/Primes/tools/src/commands/report.ts:26:62)
at Command.listener [as _actionHandler] (/home/jjohnson/Primes/tools/node_modules/commander/index.js:922:31)
at Command._parseCommand (/home/jjohnson/Primes/tools/node_modules/commander/index.js:1503:14)
at Command._dispatchSubcommand (/home/jjohnson/Primes/tools/node_modules/commander/index.js:1443:18)
at Command._parseCommand (/home/jjohnson/Primes/tools/node_modules/commander/index.js:1460:12)
at Command.parse (/home/jjohnson/Primes/tools/node_modules/commander/index.js:1292:10)
at Object.<anonymous> (/home/jjohnson/Primes/tools/src/index.ts:9:9)
at Module._compile (internal/modules/cjs/loader.js:778:30)
at Module.m._compile (/home/jjohnson/Primes/tools/node_modules/ts-node/src/index.ts:1295:23)
at Module._extensions..js (internal/modules/cjs/loader.js:789:10)
(node:30699) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:30699) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
I don't know enough about TypeScript to fix this, or this would be a pull request instead of a bug report.
The very first time I ran this, it worked perfectly. A later run the same day produced this error. Today I ran make
again and this error arose again, then I deleted the Primes
directory, re-cloned, and re-ran yet again and that is the error that I have pasted here.
If this is something that I am somehow doing wrong, please let me know.
I think it would be a good idea to enable discussions so that we can collaborate and talk without flooding the issues section with non-issues.
Hello,
Been sifting through some of the issues where we have discussions, and I'm not sure if it has been suggested before, but I would suggest we update all our implementations to be run from a Dockerfile, that way we have less to learn in terms of how each pipeline looks like and also we will be able to run each others implementations in an automated fashion.
Will update my own implementation to have a Dockerfile and then make a new pull-request.
Anyone who has other feedback to this suggestion?
I made the following changes to Go solution 2 which yielded drastically improved performance. However, since this solution parallelizes the computation I am not sure whether this is fitting as just an improvement to solution 2 which explicitly states that it is not parallelized.
I'd also like to ask all Go developers whether I maybe made a mistake and just overlooked it, because the results on my Ryzen 7 2700X are as follows:
# before patch
ssovest-go;4411;5.001076872;1;algorithm=base,faithful=yes,bits=1
# after patch
ssovest-go;56936;5.000778586;1;algorithm=base,faithful=yes,bits=1
This improvement is enough to make it suspicious in my eyes, but I can not find an error like just increasing passes
without completing the sieve beforehand.
diff --git a/PrimeGo/solution_2/main.go b/PrimeGo/solution_2/main.go
index e2bf12b..c767b1d 100644
--- a/PrimeGo/solution_2/main.go
+++ b/PrimeGo/solution_2/main.go
@@ -1,11 +1,15 @@
package main
import (
+ "context"
"flag"
"fmt"
"math"
"math/bits"
+ "runtime"
"time"
+
+ "golang.org/x/sync/semaphore"
)
var primeCounts = map[uint64]uint64{
@@ -60,7 +64,6 @@ func (s Sieve) ValidateResults() bool {
func main() {
var limit uint64
var duration time.Duration
- var sieve Sieve
flag.Uint64Var(&limit, "limit", 1000000, "limit")
flag.DurationVar(&duration, "time", 5*time.Second, "duration")
@@ -71,15 +74,19 @@ func main() {
start := time.Now()
time.AfterFunc(duration, func() { stop <- struct{}{} })
+ sem := semaphore.NewWeighted(int64(runtime.NumCPU()))
loop:
for {
select {
case <-stop:
break loop
default:
- sieve = Sieve{make([]uint8, (limit+15)/16), limit}
- sieve.RunSieve()
- passes++
+ sem.Acquire(context.Background(), 1)
+ go func(sieve Sieve) {
+ defer sem.Release(1)
+ sieve.RunSieve()
+ passes++
+ }(Sieve{make([]uint8, (limit+15)/16), limit})
}
}
fmt.Printf("ssovest-go;%v;%v;1;algorithm=base,faithful=yes,bits=1\n", passes, time.Since(start).Seconds())
I have access to a Intel(R) Xeon(R) CPU E5-2630 v4 processor which has 10 cores and 20 threads I would be willing to use to run anyone's C or C++ code. It is a little old as far as processors go, but it may be faster than what we have access to. Reply to this if you are interested.
When I run make then the build stops at Zig solution 3. No report is created.
This can also be reproduced by running:
make one SOLUTION=PrimeZig/solution_3
I get the following error:
Step 12/13 : RUN zig build -Drelease-fast
---> Running in 2af2d356b4a8
PrimeZig...The following command terminated unexpectedly:
/deps/local/zig build-exe /opt/app/src/main.zig -OReleaseFast --cache-dir /opt/app/zig-cache --global-cache-dir /root/.cache/zig --name PrimeZig --enable-cache
error: the following build command failed with exit code 9:
/opt/app/zig-cache/o/c4cde88a97309b9cc1655cc0cdae2bbb/build /deps/local/zig /opt/app /opt/app/zig-cache /root/.cache/zig -Drelease-fast
The command '/bin/sh -c zig build -Drelease-fast' returned a non-zero code: 1
"docker run" requires at least 1 argument.
See 'docker run --help'.
Usage: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
Run a command in a new container
/bin/bash: line 6: npm: command not found
make: *** [Makefile:30: one] Error 127
Hello,
This is not really an issue. Wanted to share a version of the Python code using Numba. But because this is the first time i used Numba and i don't really know how to use jitclass, i had to pull the code apart so it's pretty ugly. As such, i didn't want to submit it as a pull request. Hopefully someone can make a prettier version.
On my machine this ran faster than the C# version, with 4855 iterations vs 4150.
from sys import stdout
from math import sqrt # Used by the sieve
import timeit # For timing the durations
import numpy as np
from numba import njit
@njit
def primeSieve(limit=1000000):
rawbits = np.full(int((limit+1)/2), np.bool(True))
factor = 3
q = sqrt(limit)
num = 0
while factor < q:
for num in range (factor, limit):
if num % 2 == 0:
continue
if rawbits[int(num/2)]:
factor = num
break
# If marking factor 3, you wouldn't mark 6 (it's a mult of 2) so start with the 3rd instance of this factor's multiple.
# We can then step by factor * 2 because every second one is going to be even by definition
for num in range (factor * 3, limit, factor * 2):
if num % 2 == 0:
continue
rawbits[int(num/2)] = False
factor += 2 # No need to check evens, so skip to next odd (factor = 3, 5, 7, 9...)
return rawbits
def printResults(showResults, rawbits, duration, passes, sieveSize):
if showResults: # Since we auto-filter evens, we have to special case the number 2 which is prime
stdout.write("2, ")
count = 1
for num in range (3, sieveSize): # Count (and optionally dump) the primes that were found below the limit
if num % 2 == 0:
continue
if rawbits[int(num/2)]:
if showResults:
stdout.write(str(num) +", ")
count+=1
countPrimes = sum(1 for b in rawbits if b)
assert count == countPrimes
stdout.write("\n")
primeCounts = { 10 : 1, # Historical data for validating our results - the number of primes
100 : 25, # to be found under some limit, such as 168 primes under 1000
1000 : 168,
10000 : 1229,
100000 : 9592,
1000000 : 78498,
10000000 : 664579,
100000000 : 5761455
}
if sieveSize in primeCounts: # the data, and (b) our count matches. Since it will return
validateResults = (primeCounts[sieveSize] == countPrimes)
else:
validateResults = False
print("Passes: " + str(passes) + ", Time: " + str(duration) + ", Avg: " + str(duration/passes) +
", Limit: " + str(sieveSize) + ", Count: " + str(count) + ", Valid: " + str(validateResults))
tStart = timeit.default_timer() # Record our starting time
totPass = 0
numSieve = 1000000
while timeit.default_timer() - tStart < 10: # Run until more than 10 seconds have elapsed
res = primeSieve(numSieve) # Calc the primes up to a million
totPass = totPass + 1 # Count this pass
tD = timeit.default_timer() - tStart
printResults(False, res, tD, totPass, numSieve)
In CONTRIBUTING.md, it is mentioned in the rules that:
You own copyright to all code and are willing to license that code under BSD-new/BSD-3 or a more permissive license, or the code is available under BSD-new/BSD-3 or a more permissive license.
Does this currently apply to all code in the repository, or only for new code submitted after a certain commit was added?
If this does apply to all code currently in the repository, it might be best to add a LICENSE
file in the root of the repository so it's clear that all solutions are licensed under BSD-3.
If this does not currently apply to all code, it might still be worth placing a LICENSE
file with the text of the BSD-3 license, but simply add a footnote that some solutions might not be licensed under that license. Then, for any solutions that need it, place a LICENSE file in the solution directory to note that that specific solution is under a different license.
Hi, I've devised an algorithm that is quite fast for the language it is and the base algorithm.
I believe that it is the base algorithm and it follows the rules, but I'm not certain.
It uses 8-bits for each stored value, but instead of storing every value, it only stores for the odd numbers starting at 3. It doesn't have any predefined primes (except 2 like the standard c++ one), so it's not a wheel algorithm.
Is this allowed? Is it 8-bits or 4-bits or other?
I don't see why it wouldn't be, but I wanted to check before I got it ready for PR.
There are a few minor C++ warnings that set a bad precedent for people watching the videos who are new to the language.
myDict
has a key of type const long
-- it should be const long long
. The last entry is 10,000,000,000L which overflows a signed 32-bit (2,147,483,647) and wraps around to 1,410,065,408 as the small snippet below demonstrates: for (auto i = table.begin(); i != table.end(); i++)
printf( "%ld\n", i->first );
The literal integer keys in the table should be using annotated with LL
:
{ 10LL, 4 }, // Historical data for validating our results - the number of primes
{ 100LL, 25 }, // to be found under some limit, such as 168 primes under 1000
{ 1000LL, 168 },
{ 10000LL, 1229 },
{ 100000LL, 9592 },
{ 1000000LL, 78498 },
{ 10000000LL, 664579 },
{ 100000000LL, 5761455 },
{ 1000000000LL, 50847534 },
{ 10000000000LL, 455052511 }
int
cast in the sqrt() result in Line #53. It should be: int q = (int) sqrt(sieveSize);
1000000
should be annotated as a literal double type with either a .
or .0
suffix: sieve.printResults(false, duration_cast<microseconds>(steady_clock::now() - tStart).count() / 1000000., passes);
Other misc. comments.
validateResults()
is doing an extra find()
. It should be only doing this once: bool validateResults()
{
auto result = myDict.find(sieveSize);
if (myDict.end() == result)
return false;
return result->second == countPrimes();
}
myDict
is only used by validateResults
yet it pollutes the class. It be scoped within that function. Unfortunately, I'm seeing a huge performance hit on my Threadripper 3960X when it is in the function. Namely 8600 passes (class scope) vs 8000 passes (function scope). Looks like MSVC 2019 isn't properly inline the static const map
initializer list when it is in the function scope? Might want to make a mention of the scoping performance issue.sieveSize
> 2 in countPrimes()
and printResults()
. They should be: void printResults(bool showResults, double duration, int passes)
{
if ((showResults && (sieveSize >= 2))
printf("2, ");
int countPrimes()
{
int count = (sieveSize >= 2);
count
in printResults()
on Line #83: int count = 1;
It should be written as:
int count = (sieveSize >= 2); // include first prime 2
Lastly, you might want to mention in a follow up video that is worth compiling at Warning Level 4 on MSVC /W4
to catch type errors. While the program will usually run if these are ignored these they are potentially problems lurking in the code. While it can be tedious to "shut the compiler up" over trivial warnings professional programmers try to compile code with zero warnings at the highest level possible.
Hello! I am looking at the current Swift implementation of the sieve and have some changes in mind that in my machine lead to ~10x speed improvement. The thing is that I consider these changes to be relatively small, and since they don't change the algorithm in any way, I consider them unworthy of a separate solution.
The changes I'm considering are:
Bit count
flag in the README to 8 bits since the implementation is using an array of booleans that each occupy 8 bits in memory, even if only one of those bits is used to represent the value (currently it's set to unknown
)PrimeSwift/solution_1/Sources/PrimeSieveSwift/main.swift
:
BitArray
protocol definition (lines 3-7) that is unnecessarily being used as an existential and replace its use in line 33 by the concrete struct BoolBitArray
that is already implementedPrimeSieveSwift
from a class to a struct (at line 31) and make the runSieve
method mutating
(line 52) to indicate that the method mutates the structbits=8
tag to the program output at line 117Then there are more "stylistic" changes that make the code more idiomatic:
sieve
variable to an empty sieve instead (var sieve: PrimeSieveSwift!
-> var sieve = PrimeSieveSwift(limit: 0)
)idx % 2 != 0
by the more descriptive and idiomatic !idx.isMultiple(of: 2)
PrimeSieveSwift
type to PrimeSieve
bits
property to the more descriptive name, isPrime
BoolBitArray
to be a pair of subscript getter and setter, keeping the indexing implementation the same, and changing every use of the bits
property to use the subscript operator like bits[num]
and bits[num] = false
(at lines 45, 58, 64 and 103), instead of the current getBit
and clearBit
My first question is if I should contribute this as a separate new solution or to the existing one.
The contribution guidelines say this under faithfulness:
It uses a class to encapsulate the sieve, or an equivalent feature in your language.
So my next question is if my change from a class to a struct falls under "equivalent feature". A C++ class is closer to a Swift struct than a Swift class, so I guess there is no issue here.
Thank you for your attention.
I noticed that zig solution 2 (ManDeJan&ityonemo-zig-byte-sieve-type-bool
) is by far the fastest solution reported as base, faithful=yes, bits=1. So I went looking where the magic is, and all I found was an array of bool
. This could be a packed bit array, but it often isn't. I see this came up in #241, where @ityonemo reported (#241 (comment)) that they're just using @bitSizeOf
. Sounds fair, but I couldn't find any evidence (in docs, or wherever) that zig uses packed bit arrays by default, so I thought I'd see if I can figure it out by writing a simple test program.
To cut a long story short, this program:
const std = @import("std");
pub fn main() !void {
std.debug.print("@sizeOf(u8): {d}\n", .{@sizeOf(u8)});
std.debug.print("@sizeOf([16]u8): {d}\n", .{@sizeOf([16]u8)});
std.debug.print("@sizeOf(bool): {d}\n", .{@sizeOf(bool)});
std.debug.print("@sizeOf([16]bool): {d}\n", .{@sizeOf([16]bool)});
std.debug.print("@bitSizeOf(u8): {d}\n", .{@bitSizeOf(u8)});
std.debug.print("@bitSizeOf([16]u8): {d}\n", .{@bitSizeOf([16]u8)});
std.debug.print("@bitSizeOf(bool): {d}\n", .{@bitSizeOf(bool)});
std.debug.print("@bitSizeOf([16]bool): {d}\n", .{@bitSizeOf([16]bool)});
}
Produces this output:
@sizeOf(u8): 1
@sizeOf([16]u8): 16
@sizeOf(bool): 1
@sizeOf([16]bool): 16
@bitSizeOf(u8): 8
@bitSizeOf([16]u8): 128
@bitSizeOf(bool): 1
@bitSizeOf([16]bool): 16
Which makes no sense at all (either [16]bool
is 16 bits, or it's 16 bytes, not both) This leads me to suspect that
@bitSizeOf
in zig is either broken or not intended to return the number of bits actually used in memory, andThis probably applies to a number of other zig solutions as well. They're obviously still very impressive!
Also tagging @ManDeJan as your name is also on the solution :-)
Running make
on a macOS MBP 13" 2020 gives error sed: illegal option -- r
.
Catalina 10.15.7, Intel Core i7 (Quad).
It runs the benchmarks but flags this error for each one. Unsure if it affects result.
If so, please chime in and volunteer! I wasn't really anticipating the amount of interest and I've got 30+ pull requests and I'm swamped just doing the benchmarking and making the videos, but if someone would like to take over this project and maintain the source to the Drag Racing series, I'd love that!
My goal is to be able to keep the original branch with only a very few changes. I have a few checking queued up of my own, which I will check in, and then leave the main branch alone. I want as few changes in the main branch as possible to keep the tests similar from one episode to the next, but we can party alongside it for sure!
But if folks want to add side-projects for ports to other languages (I know folks have already submitted D, Rust, Java, and others!) the can go in folders alongside, like PrimeJava.
If you're interested, please let me know! Ideally with a one or two snippet of your experience with GitHub!
Thanks!
Dave
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.