Julia vs Python almost the same performance but Python with taichi is 100x faster. Why?And can be improved?

Dudes,recently I reviewed Python and the Taichi package. As lots of users mentioned, the performance enchanced almost 100x by Taichi. And, when I compare with the counting primes code between Julia and Python, it seems they are almost the same, maybe there is something wrong in my code.

Here is the code in Python:

"""Count the number of primes in range [1, n].
"""

def is_prime(n: int):
result = True
for k in range(2, int(n ** 0.5) + 1):
if n % k == 0:
result = False
break
return result

def count_primes(n: int) -> int:
count = 0
for k in range(2, n):
if is_prime(k):
count += 1

return count

print(count_primes(1000000))

time python count_primes.py

real 0m2.235s
user 0m2.235s
sys 0m0.000s

and here is the code in Julia:

function is_prime(n::Int)
result=true
for k in (2:(trunc(n^0.5)+1))
if n % k ==0 
    result=false
    break
end
end
return result
end

function count_primes(n::Int)
    count = 0
    for k in (2:n)
        if is_prime(k)
            count+=1
        end
    end
    return count
end

@timev println(count_primes(1000000))

2.659729 seconds (10 allocations: 272 bytes)
elapsed time (ns): 2659728961
bytes allocated: 272
pool allocs: 10

and here is the code in Python with Taichi:

"""Count the number of primes below a given bound.
"""
import taichi as ti
ti.init()

@ti.func
def is_prime(n: int):
result = True
for k in range(2, int(n ** 0.5) + 1):
if n % k == 0:
result = False
break
return result

@ti.kernel
def count_primes(n: int) -> int:
count = 0
for k in range(2, n):
if is_prime(k):
count += 1

return count

print(count_primes(1000000))

time python count_primes.py

real 0m0.363s
user 0m0.546s
sys 0m0.179s

I know Julia is fast, but in this condition by far in this condition, it seems Julia is the same with Python and the Taichi is a booster. If someone hotshot can do me a favor on this question please?

and there is a framework of Taichi I find:
add

I am all gratitude!

1 Like

It appears that Taichi is using multithreadding. If you change count_primes to

using .Threads
function count_primes(n::Int)
    count = 0
    @threads for k in (2:n)
        if is_prime(k)
            count+=1
        end
    end
    return count
end

You will see a speedup proprotional to the number of threads you have.

# orignial
julia> @time println(count_primes(1000000))
78497
  0.594386 seconds (8 allocations: 232 bytes)
# threaded 
julia> @time println(count_primes(1000000))
73976
  0.156051 seconds (78.02 k allocations: 1.193 MiB)

This is also an incredibly bad algorithm for this task.

julia> using Primes

julia> @time length(primes(1000000))
  0.001682 seconds (5 allocations: 876.125 KiB)

Note that this still isn’t very close to optimal (with an hour or so of effort, there’s another factor of 10 or so in primes, and there’s significant further asymptotic improvement if you only want to count the primes less than n).

2 Likes

I might be overlooking something, but from 2.235 seconds for Python to 0.363 seconds with Taichi, is that not more like a factor of 6 instead of a factor of 100? Probably in line with the number of threads as @Oscar_Smith pointed out.

4 Likes

Side note: if you want to count primes efficiently, you can consider using the primecount program by Kim Walisch. There is a Julia wrapper for it. Here is an example to demonstrate how fast it is

> primecount 1e16 --time
279238341033925
Seconds: 0.931
4 Likes

Thank you @Oscar_Smith ! Such a wonderful answer.
I have the impression that Julia is much more faster than Python even on a +() function.
The performance of the original code in Julia and Python represents the performance of functions and control-flow. In this view, any problem in my code please ? It seems they have almost the same performance.

1 Like

Is there? There is primecount_jll but that “only” provides the binary/lib.

julia> using primecount_jll

julia> run(`$(primecount()) 1e16 --time`);
279238341033925
Seconds: 5.542

I’m not able to reproduce your results. For me the python takes 2.7 seconds, while the Julia takes .6 seconds. That said, I had to fix your indentation to make the python code run. Can you edit your original post to have proper indentation to ensure we are running the same code?

I wrote one a long time ago. It would have to be pruned and modernized for recent versions of Julia.

1 Like

You need to be aware of the types here. trunc(n^0.5) returns a Float64, which means that you will iterate over a range of floats, which means that you will do n % k with a float k, which is slow.

Just changing this to

for k in (2:(trunc(Int, n^0.5)+1))

yielded a 7x speedup for me.

Furthermore (and with less impact):

  • Never use x^0.5 instead of sqrt(x). This is why sqrt exists
  • You should really use isqrt here to avoid rounding errors.

So change the line to

for k in (2:isqrt(n)+1)

Now I get

julia> @btime count_primes(10^6)
  158.684 ms (0 allocations: 0 bytes)

No multithreading or anything. That’s a 12x speedup.

20 Likes

BTW:

julia> is_prime(1)
true

julia> is_prime(2)
false

:face_with_raised_eyebrow:

9 Likes

I was thinking of Python with Taichi is doing Taichi(the Chinese traditional sports) while coding Python will make you a better Python coder.

@Freya_the_Goddess Python coding while doing Tai-chi could damage your computer (and your feet).

Have you heard about Taichi.jl?

2 Likes

Just heard it now, it is created by you?

For someone who is still in beginner level of Julia and Python like me.

What could Taichi bring to the users ? Why we need to use Taichi in Julia?

I see the Julia Sets can be animated differently than just using Plots. Great one.