Improving performance in checking prime numbers

Hi, I wrote the following code in python with help of numba JIT:

from numba import jit

@jit(nopython=True)
def is_prime(number):
    upper_limit = int(number**0.5) + 1
    for i in range(2, upper_limit):
        if number % i:
            return False
    return True

number = 23002999

is_prime(number)

By timing this code in jupyter with timeit magic the following result I get:

394 ns ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For my project I need more performance so I decide to use Julia; The following is my julia code:

using BenchmarkTools

function isprime(number)
    upper_limit = trunc(Int, sqrt(number)) + 1
    for i in 2:upper_limit
        if number % i == 0
            return false
        end
    end
    return true
end

number = 23002999
isprime(number)

by using @benchmark macro I get the following results:

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     40.100 μs (0.00% GC)
  median time:      40.200 μs (0.00% GC)
  mean time:        43.025 μs (0.00% GC)
  maximum time:     1.796 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Definitely something is wrong with my code; Do you have any suggestions for better performance? Thank you in advance.

I think the bug is in the python code, I think it should be if number % i == 0: to let the algorithm actually do something.

9 Likes

You are absolutely right, after fixing this the result is:

51.6 µs ± 3.31 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
5 Likes

You cannot use solutions provided by packages for this project? It seems strange to be worried about performance of isprime and not using a polished library to do the work.

2 Likes

yes

agree …
the JuliaMath / Primes package solutions is:

I need an unconventional parameter to analyze the distribution of prime numbers. To write the code for calculating that parameter, I need to get a better understanding of how to increase performance. I am currently testing different ways to implement it.

Is this in standard library?

It’s in Primes.jl (which you can install using the package manager). Also, it’s worth noting that finding a large number of of primes using a sieve may be better for what you’re describing.

1 Like

Since you picked a prime number in your example that was big enough to profit from multithreading, here’s a multithreaded version of your isprime:

julia> using Transducers

julia> function isprime_xt(number)
           upper_limit = trunc(Int, sqrt(number)) + 1
           init=false
           basesize = 2*upper_limit ÷ Threads.nthreads()
           foldxt(right, ReduceIf(x -> x === false), 2:upper_limit |> Map(i -> number % i != 0); init, basesize)
       end
isprime_xt (generic function with 1 method)

julia> function isprime(number)
           upper_limit = trunc(Int, sqrt(number)) + 1
           for i in 2:upper_limit
               if number % i == 0
                   return false
               end
           end
           return true
       end
isprime (generic function with 1 method)

julia> let n = Ref(23002999)
           @btime isprime($n[]) 
           @btime isprime_xt($n[])
       end
  25.299 μs (0 allocations: 0 bytes)
  10.009 μs (32 allocations: 2.30 KiB)
true

I suspect that a much faster version could be made with a bit of effort by chunking the search range into blocks of some multiple of 8 or 16 and then taking advantage of LoopVectorization.jl in each chunk, rather than multi-threading. Then the outer part could be profitably multithreaded too.

2 Likes

Oh, I should also mention that though this multi-threaded approach does get early termination, it still won’t terminate as fast as the sequential one, so there’s a real overhead for numbers that are like a multiple of two.

julia> let n = Ref(23002999 + 1)
           @btime isprime($n[]) 
           @btime isprime_xt($n[])
       end
  6.419 ns (0 allocations: 0 bytes)
  4.911 μs (24 allocations: 1.77 KiB)
false

A smarter approach might be to do the first chunk of integers sequentially.

1 Like

For problems like factoring primes, it’s probably worth pursuing algorithmic improvements before micro-optimizations. The best system will probably be some combination of trial division by small factors in tandem with a probabilistic prime test

2 Likes

Sure, but that’s not what the OP asked for, and I don’t know anything about algorithmic improvements to the OP’s code anyways. :slight_smile:

Thank you; I am planning to use Sieve of Eratosthenes; but may be with some search I can find a better algorithm for large primes.

Thank you very much for your suggestion. To understand your code, I need to study multithreading; Your code performance makes me very eager to learn it as soon as possible.

Depending on how large large is, there are asymtotically more efficient sieves such as the Quadratic sieve or the General Number Field sieve.

4 Likes