In Project Euler, prime numbers come up quite a lot, but sometimes it is not clear how many prime numbers are needed. To avoid duplicate calculations, and as a challenge to get a bit more familiar with iterators in julia, I’ve written a struct that computes the next prime number as it is needed and then internally stores the vector of all primes so far, such that these calculations do not need to happen more than once. I’m aware that growing vectors with push! is not very nice, is there another more efficient way (in base julia!) to do what I’m doing? (Purely to help me become more aware of iterator practices, I’m aware of Primes.jl and the likes).
using BenchmarkTools
struct Primes
primes_vec::Vector{Int}
end
Primes() = Primes([2,3])
function Base.iterate(P::Primes, state=1)
if state > length(P.primes_vec)
next_prime!(P)
end
return (P.primes_vec[state], state+1)
end
function isprime(n, primes::Primes)
for p in primes
p > floor(Int, sqrt(n)) && return true
n % p == 0 && return false
end
end
""" Assumes that primes_vec contains the first primes (at least 2 and 3) and adds the next prime by a simple sieve. """
function next_prime!(primes::Primes)
n = primes.primes_vec[end] + 2
while true
if isprime(n, primes)
push!(primes.primes_vec, n)
return
end
n += 2
end
end
function test_primes()
P = Primes()
@time sum(Iterators.take(P, 100_000)) # Here, we need to compute the first 100_000 primes
@time sum(Iterators.take(P, 100_000)) # Here, we can re-use them from memory.
end
test_primes()
I see, thank you! The way I understood that one is that it would recompute the primes already computed so far (as it doesn’t store anything). My point was to try and avoid unnecessary recomputations. (I’m aware that the computations are cheap here, but let’s imagine I’ll do the same with a different kind of sequence that is computationally expensive to compute)
…but if your main goal is to investigate iterators and accumulating stored intermediate results, then, firstly, I would point out that push! is not so inefficient. It preallocates more space than you ask for to avoid resizing on each step. You can also use sizehint! if you have a good hunch how much space you will eventually need (for example if you want to calculate 100_000 primes, you know how much space to set aside.)
Concerning primes again:
This one is dangerous, and can give wrong answers, due to floating point issues. Use isqrt instead. Prime calculations inherently deal with integers, so you should avoid going through float operations.
I see the approach to working purely stateless, but my point is that I wish to re-use the same iterator at a later point and benefit from the work already done
e.g. see how
@time sum(Iterators.take(P, 100_000)) # Here, we need to compute the first 100_000 primes
@time sum(Iterators.take(P, 100_000)) # Here, we can re-use them from memory.
produces the following, where the second run is re-using the work done in the first run
If you work with primes a lot and also want save time between Julia sessions, you could also push the computation of primes to the compilation step and keep a “global” list of primes around. The list gets generated once when your package is compiled and could then be used (and expanded) during a session.
I believe this is what base Julia is doing for combinatorial factors up to a certain size:
As one of the main developers of Primes.jl the basic things you are missing for performance are as follows:
Fast isprime checking (look up Baillie–PSW or Miller-Rabin). For an n bit number, these methods are ~O(n^3) as opposed to O(2^n) of trial division
Sieve methods rather than spending time testing whether each number is prime individually, we can compute all the primes up to n in roughly O(n) operations.