Hi, I wrote the following code in python with help of numba JIT:
from numba import jit
@jit(nopython=True)
def is_prime(number):
upper_limit = int(number**0.5) + 1
for i in range(2, upper_limit):
if number % i:
return False
return True
number = 23002999
is_prime(number)
By timing this code in jupyter with timeit magic the following result I get:
394 ns ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
For my project I need more performance so I decide to use Julia; The following is my julia code:
using BenchmarkTools
function isprime(number)
upper_limit = trunc(Int, sqrt(number)) + 1
for i in 2:upper_limit
if number % i == 0
return false
end
end
return true
end
number = 23002999
isprime(number)
by using @benchmark macro I get the following results:
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 40.100 μs (0.00% GC)
median time: 40.200 μs (0.00% GC)
mean time: 43.025 μs (0.00% GC)
maximum time: 1.796 ms (0.00% GC)
--------------
samples: 10000
evals/sample: 1
Definitely something is wrong with my code; Do you have any suggestions for better performance? Thank you in advance.
You cannot use solutions provided by packages for this project? It seems strange to be worried about performance of isprime and not using a polished library to do the work.
I need an unconventional parameter to analyze the distribution of prime numbers. To write the code for calculating that parameter, I need to get a better understanding of how to increase performance. I am currently testing different ways to implement it.
It’s in Primes.jl (which you can install using the package manager). Also, it’s worth noting that finding a large number of of primes using a sieve may be better for what you’re describing.
Since you picked a prime number in your example that was big enough to profit from multithreading, here’s a multithreaded version of your isprime:
julia> using Transducers
julia> function isprime_xt(number)
upper_limit = trunc(Int, sqrt(number)) + 1
init=false
basesize = 2*upper_limit ÷ Threads.nthreads()
foldxt(right, ReduceIf(x -> x === false), 2:upper_limit |> Map(i -> number % i != 0); init, basesize)
end
isprime_xt (generic function with 1 method)
julia> function isprime(number)
upper_limit = trunc(Int, sqrt(number)) + 1
for i in 2:upper_limit
if number % i == 0
return false
end
end
return true
end
isprime (generic function with 1 method)
julia> let n = Ref(23002999)
@btime isprime($n[])
@btime isprime_xt($n[])
end
25.299 μs (0 allocations: 0 bytes)
10.009 μs (32 allocations: 2.30 KiB)
true
I suspect that a much faster version could be made with a bit of effort by chunking the search range into blocks of some multiple of 8 or 16 and then taking advantage of LoopVectorization.jl in each chunk, rather than multi-threading. Then the outer part could be profitably multithreaded too.
Oh, I should also mention that though this multi-threaded approach does get early termination, it still won’t terminate as fast as the sequential one, so there’s a real overhead for numbers that are like a multiple of two.
julia> let n = Ref(23002999 + 1)
@btime isprime($n[])
@btime isprime_xt($n[])
end
6.419 ns (0 allocations: 0 bytes)
4.911 μs (24 allocations: 1.77 KiB)
false
A smarter approach might be to do the first chunk of integers sequentially.
For problems like factoring primes, it’s probably worth pursuing algorithmic improvements before micro-optimizations. The best system will probably be some combination of trial division by small factors in tandem with a probabilistic prime test
Thank you very much for your suggestion. To understand your code, I need to study multithreading; Your code performance makes me very eager to learn it as soon as possible.