Iterating over primes

In Project Euler, prime numbers come up quite a lot, but sometimes it is not clear how many prime numbers are needed. To avoid duplicate calculations, and as a challenge to get a bit more familiar with iterators in julia, I’ve written a struct that computes the next prime number as it is needed and then internally stores the vector of all primes so far, such that these calculations do not need to happen more than once. I’m aware that growing vectors with push! is not very nice, is there another more efficient way (in base julia!) to do what I’m doing? (Purely to help me become more aware of iterator practices, I’m aware of Primes.jl and the likes).

using BenchmarkTools

struct Primes
    primes_vec::Vector{Int}
end
Primes() = Primes([2,3])
function Base.iterate(P::Primes, state=1) 
    if state > length(P.primes_vec) 
        next_prime!(P)
    end
    return (P.primes_vec[state], state+1)
end
function isprime(n, primes::Primes)
    for p in primes
        p > floor(Int, sqrt(n)) && return true
        n % p == 0 && return false
    end
end

""" Assumes that primes_vec contains the first primes (at least 2 and 3) and adds the next prime by a simple sieve. """
function next_prime!(primes::Primes)
    n = primes.primes_vec[end] + 2
    while true
        if isprime(n, primes)
            push!(primes.primes_vec, n)
            return
        end
        n += 2
    end
end

function test_primes()
    P = Primes()
    @time sum(Iterators.take(P, 100_000)) # Here, we need to compute the first 100_000 primes
    @time sum(Iterators.take(P, 100_000)) # Here, we can re-use them from memory. 
end
test_primes()

Gives

  0.062785 seconds (10 allocations: 1.828 MiB)
  0.000066 seconds

The nextprime function works more or less like an iterator: Functions · Primes.jl I think it avoids storing anything.

Even if you don’t want to depend on Primes.jl, you could study it.

1 Like

I see, thank you! The way I understood that one is that it would recompute the primes already computed so far (as it doesn’t store anything). My point was to try and avoid unnecessary recomputations. (I’m aware that the computations are cheap here, but let’s imagine I’ll do the same with a different kind of sequence that is computationally expensive to compute)

1 Like

I don’t know that much about primes, but nextprime calls the isprime function, which does not need to compute all the previous primes.

Prime numbers are probably the most heavily studied subject in all of mathematics, so I expect there are advanced methods for determining primality.

…but if your main goal is to investigate iterators and accumulating stored intermediate results, then, firstly, I would point out that push! is not so inefficient. It preallocates more space than you ask for to avoid resizing on each step. You can also use sizehint! if you have a good hunch how much space you will eventually need (for example if you want to calculate 100_000 primes, you know how much space to set aside.)

Concerning primes again:

This one is dangerous, and can give wrong answers, due to floating point issues. Use isqrt instead. Prime calculations inherently deal with integers, so you should avoid going through float operations.

3 Likes

This is an interesting kind of iterator, in that it’s a stateful iterator, but only as a performance optimization!

I don’t think you’re doing anything wrong, but here are two slightly different approaches you could have taken:

  • Make it a purely stateless iterator, by storing the already computed primes in the iteration state (the second return value).
  • Lock when updating the iterator, to be concurrency-safe.
1 Like

AFAIK, once one’s in bigint land, using a sieve (so storing all primes) is much faster than using nextprime repeatedly.

1 Like

I see the approach to working purely stateless, but my point is that I wish to re-use the same iterator at a later point and benefit from the work already done

e.g. see how

@time sum(Iterators.take(P, 100_000)) # Here, we need to compute the first 100_000 primes
@time sum(Iterators.take(P, 100_000)) # Here, we can re-use them from memory. 

produces the following, where the second run is re-using the work done in the first run

  0.062785 seconds (10 allocations: 1.828 MiB)
  0.000066 seconds
1 Like

Thanks, I didn’t think about that!

If you work with primes a lot and also want save time between Julia sessions, you could also push the computation of primes to the compilation step and keep a “global” list of primes around. The list gets generated once when your package is compiled and could then be used (and expanded) during a session.

I believe this is what base Julia is doing for combinatorial factors up to a certain size:

1 Like

Or faster, p^2 > n, unless overflow is an issue.

You can make a stateless base iterator, and make it stateful via Iterators.Stateful.

2 Likes

As one of the main developers of Primes.jl the basic things you are missing for performance are as follows:

  1. Fast isprime checking (look up Baillie–PSW or Miller-Rabin). For an n bit number, these methods are ~O(n^3) as opposed to O(2^n) of trial division
  2. Sieve methods rather than spending time testing whether each number is prime individually, we can compute all the primes up to n in roughly O(n) operations.
4 Likes