Here are some time comparisons I ran today, run on a quiet
system after bootup.
System:
System76 laptop (2016); i7 6700HQ, 2.6-3.5GHz, 4C|8T, 16GB memory
Linux 5.15.14, gcc 11.2.0, llvm 12.0.1
I use htop
to monitor threads and memory use while the programs run.
I just show Rust
and C++
because they were the 2 languages people seemed most interested in.
As a reference, I compare my results with primesieve
https://github.com/kimwalisch/primesieve
a highly optimized C++ implementation of the the Sieve of Eratosthenes (SoE). I used its current
version (ATOW) primesieve-7.7
I consider my code reference implementations
. They perform all the separate pieces of the
algorithm but are not necessarily optimum for performance or memory use. However, they still
demonstrate the efficiency and inherent performance of the algorithm, which is the fastest
way to generate twin (cousin) primes, based on Prime Generator Theory (PGT) math.
primesieve
actually has 3 different routines, for small, medium, and large input number ranges. For inputs from about 16-20 digits (20 digits is max for 64-bits) it’s faster because it uses look-up tables of compressed sieving primes. My implementations generate all its parameters from scratch, which slows it down when generating the sieving primes for large input values.
My code also doesn’t try to optimize cache hits, etc, like primesieve
, but all the code
to select which Prime Generator (PG) and segments size to use is confined to set_sieve_parameters
, so you only have one place to optimize memory use and generator selection. People should appreciate you can do the whole algorithm in less than 300 loc.
The SSoZ (Segmented Sieve of Zakiya) is designed to be run on multicore systems. The more cores the faster. I’ve only had an x86_64 system to test on, and am looking for people with
different hardware to run it on (ARM|M1, AMD, RISC, PowerPC, etc).
The table shows the times (secs) for Rust
, C++
, and primesieve
for the single inputs shown.
I also show the count of twinprimes (all agree). My implementation also gives the largest
twinprime in the input range. This allows you to search for specific twinprimes using small ranges.
The major difference between the language implementations is how they do parallel processing.
For Rust
I use the Rayon
crate – https://github.com/rayon-rs/rayon – while C++
uses OpenMP
.
This data can be used as references for a Julia
implementation to see how it compares when
running each version on the same system.
If there are any questions please feel free to ask here and email me.
N | Rust | C++ | PrmSev | Twinprimes | Largest in Range |
-------------------|--------|--------|--------|----------------|-----------------------|
10_000_000_000 | 0.35 | 0.45 | 0.51 | 27,412,679 | 9,999,999,702+/-1 |
50_000_000_000 | 1.67 | 2.14 | 2.81 | 118,903,682 | 49,999,999,590+/-1 |
100_000_000_000 | 3.41 | 4.24 | 5.91 | 224,376,048 | 99,999,999,762+/-1 |
500_000_000_000 | 18.15 | 21.51 | 32.76 | 986,222,314 | 499,999,999,062+/-1 |
1_000_000_000_000 | 37.67 | 44.48 | 69.25 | 1,870,585,220 | 999,999,999,960+/-1 |
5_000_000_000_000 | 219.67 | 253.62 | 395.16 | 8,312,493,003 | 4,999,999,999,878+/-1 |
10_000_000_000_000 | 482.51 | 543.74 | 825.71 | 15,834,664,872 | 9,999,999,998,490+/-1 |