FYI
After watching this presentation (and some referenced links) on using CPU resources more efficiently, I was able to increase performance just by changing the sequence of a few lines of code.
The goal is to keep as many of the CPU's execution units (EUs) busy simultaneously as possible. To do that, you order sequential lines of code so that adjacent lines can be executed independently of one another.
Here’s one snippet from the Crystal code.
Original
def nextp_init(rhi, kmin, modpg, primes, resinvrs)
  nextp = Slice(UInt64).new(primes.size*2)
  r_hi, r_lo = rhi, rhi - 2
  primes.each_with_index do |prime, j|
    k = (prime - 2) // modpg
    r = (prime - 2) % modpg + 2
    r_inv = resinvrs[r].to_u64
    ri = (r_inv * r_lo - 2) % modpg + 2
    ki = k * (prime + ri) + (r * ri - 2) // modpg
    ki < kmin ? (ki = (kmin - ki) % prime; ki = prime - ki if ki > 0) : (ki -= kmin)
    nextp[j << 1] = ki.to_u64
    ri = (r_inv * r_hi - 2) % modpg + 2
    ki = k * (prime + ri) + (r * ri - 2) // modpg
    ki < kmin ? (ki = (kmin - ki) % prime; ki = prime - ki if ki > 0) : (ki -= kmin)
    nextp[j << 1 | 1] = ki.to_u64
  end
  nextp
end
More Efficient/Faster
def nextp_init(rhi, kmin, modpg, primes, resinvrs)
  nextp = Slice(UInt64).new(primes.size*2)
  r_hi, r_lo = rhi, rhi - 2
  primes.each_with_index do |prime, j|
    k = (prime - 2) // modpg
    r = (prime - 2) % modpg + 2
    r_inv = resinvrs[r].to_u64
    rl = (r_inv * r_lo - 2) % modpg + 2
    rh = (r_inv * r_hi - 2) % modpg + 2
    kl = k * (prime + rl) + (r * rl - 2) // modpg
    kh = k * (prime + rh) + (r * rh - 2) // modpg
    kl < kmin ? (kl = (kmin - kl) % prime; kl = prime - kl if kl > 0) : (kl -= kmin)
    kh < kmin ? (kh = (kmin - kh) % prime; kh = prime - kh if kh > 0) : (kh -= kmin)
    nextp[j << 1] = kl.to_u64
    nextp[j << 1 | 1] = kh.to_u64
  end
  nextp
end
Changing the code sequence so that the next line of code isn’t dependent on a result from the previous one allows two EUs to run those instructions simultaneously. This one change in this one routine appreciably increased total runtime speed.
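To separate the idea from the sieve details, here’s a minimal hypothetical sketch (not from the program above) of the same reordering. In the first version each pair of lines forms a dependency chain and one accumulator ties every line to the one before it; in the second, the independent first steps of the two chains sit next to each other and two accumulators keep the chains separate, so different execution units can work on them at the same time. All names and constants here are made up.

```crystal
# Hypothetical illustration only -- these functions are not part of the sieve program.

# Dependent ordering: b waits on a, d waits on c, and a single
# accumulator chains every line to the one before it.
def combine_serial(xs : Slice(UInt64), ys : Slice(UInt64)) : UInt64
  acc = 0_u64
  xs.size.times do |i|
    a = xs[i] * 3_u64
    b = a + 7_u64         # waits on a
    acc += b
    c = ys[i] * 5_u64
    d = c + 11_u64        # waits on c
    acc += d
  end
  acc
end

# Reordered: the independent first steps of the two chains are adjacent,
# and two accumulators keep the chains independent from start to finish.
def combine_interleaved(xs : Slice(UInt64), ys : Slice(UInt64)) : UInt64
  acc0 = 0_u64
  acc1 = 0_u64
  xs.size.times do |i|
    a = xs[i] * 3_u64
    c = ys[i] * 5_u64     # independent of a
    b = a + 7_u64
    d = c + 11_u64        # independent of b
    acc0 += b
    acc1 += d
  end
  acc0 + acc1
end
```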
Here’s another example snippet from the same program.
Original
primes.each_with_index do |prime, j|
  k = nextp.to_unsafe[j << 1]
  while k < kn
    seg[k >> s] |= 1_u64 << (k & bmask)
    k += prime end
  nextp.to_unsafe[j << 1] = k - kn
  k = nextp.to_unsafe[j << 1 | 1]
  while k < kn
    seg[k >> s] |= 1_u64 << (k & bmask)
    k += prime end
  nextp.to_unsafe[j << 1 | 1] = k - kn
end
More Efficient/Faster
primes.each_with_index do |prime, j|
  kl = nextp.to_unsafe[j << 1]
  kh = nextp.to_unsafe[j << 1 | 1]
  while kl < kn
    seg[kl >> s] |= 1_u64 << (kl & bmask)
    kl += prime end
  while kh < kn
    seg[kh >> s] |= 1_u64 << (kh & bmask)
    kh += prime end
  nextp.to_unsafe[j << 1] = kl - kn
  nextp.to_unsafe[j << 1 | 1] = kh - kn
end
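Here the reordered version loads both starting offsets first and then runs the two while loops over independent counters (kl and kh), so neither loop has to wait on the other’s results. If you want to check the effect on your own machine, Crystal’s built-in Benchmark module can compare the two orderings; the sketch below uses a made-up stand-in workload (one accumulator versus two independent ones) rather than the actual sieve routines, so the labels, data, and sizes are purely illustrative.

```crystal
require "benchmark"

# Illustrative measurement sketch -- the workload is a stand-in, not the sieve code.
data = Slice(UInt64).new(1 << 20) { |i| i.to_u64 }

Benchmark.ips do |x|
  x.report("one dependent chain") do
    total = 0_u64
    data.each { |v| total += v }   # every add waits on the previous one
    total
  end

  x.report("two independent chains") do
    t0 = 0_u64
    t1 = 0_u64
    i = 0
    while i < data.size            # size is even here, so the pairs cover everything
      t0 += data[i]
      t1 += data[i + 1]            # independent of the t0 chain
      i += 2
    end
    t0 + t1
  end
end
```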
I’ve also seen the gcc-compiled versions for C++ and Nim show a bigger speed improvement (execution-time reduction) than the llvm-compiled ones, except for Rust, which also sees a significant improvement.
I don’t know how applicable this is to Julia, but I’m in the process of updating all the language versions of my code to get these FREE speedups by just writing the code better.