Why is python faster than Julia

I wrote two absolutely identical pieces of code on python and Julia:

from time import time

start_time = time()
def euclidian_algorithm_division_count(a, b):
    division_count = 1
    if b > a:
        a, b = b, a
    while (c := a % b) != 0:
        a, b = b, c
        division_count += 1
    return division_count


N = 10**100
M = 10**4
from random import randint
division_count_array = []
while M > 0:
    a = randint(1, N)
    b = randint(1, N)
    division_count_array.append(euclidian_algorithm_division_count(a, b))
    M -= 1
print(time() - start_time, "seconds")
@time begin

function euclidean_algorithm_division_count(a, b)
    division_count = 1
    if b > a
        a, b = b, a
    end
    while (c = a % b) != 0
        a, b = b, c
        division_count += 1
    end
    return division_count
end

function main()
    N = big(10)^100
    M = 10^4
    division_count_array = []
    while M > 0
        a, b = rand(1:N, 2)
        push!(division_count_array, euclidean_algorithm_division_count(a, b))
        M -= 1
    end
end

main()

end

And that’s the output i got:
0.6859960556030273 seconds for python;
1.760202 seconds (9.66 M allocations: 191.444 MiB, 20.62% gc time) for Julia;
I believe compiled Julia can’t be slower than python for real, thus the problem is in my code. Where did i mistake?

Apart from any issues with the code itself, you are measuring compilation time in addition to compute time. Try using @btime from the BenchmarkTools package to get a more reliable measurement.

1 Like

got 1.5 seconds with btime, but it’s still slower than python

using BenchmarkTools

function euclidean_algorithm_division_count(a, b)
    division_count = 1
    if b > a
        a, b = b, a
    end
    while (c = a % b) != 0
        a, b = b, c
        division_count += 1
    end
    return division_count
end

function main()
    N = big(10)^100
    M = 10^4
    division_count_array = []
    while M > 0
        a, b = rand(1:N, 2)
        push!(division_count_array, euclidean_algorithm_division_count(a, b))
        M -= 1
    end
end

@btime main()

To be more explicit, you want to define your functions as above and then run

using BenchmarkTools
@btime main()

An analogy here would be that timing as you’ve done above is sort of like writing a C program and then timing both compilation and execution.

isn’t type inferred

Specifically here you’ll want to do Int[] rather than just [], so julia knows what types the array will hold.

I usually avoid creating a container first and then push!ing into it for this reason (I don’t want to have to specify the element types). So I often do something more like:

function main()
    N = big(10)^100
    M = 10^4
    division_count_array = map(1:M) do _
        a, b = rand(1:N, 2)
        euclidean_algorithm_division_count(a, b)
    end
end
2 Likes

I’m not sure (and your code doesn’t seem to run?) but profiler says the largest part of code is spend in the (c = a % b) line, and chaging the container to Int[] doesn’t do anything to my timings.

Profiler points me to

($fJ)(x::BigInt, y::BigInt) = MPZ.$fC(x, y)

in gmp.jl, so maybe there’s an issue here with BigInts just not being very fast?

Specifically, rewriting things as:

function main2()
    N = big(10)^100
    M = 10^4
    division_count_array = Int[]
    while M > 0
        a, b = rand(1:N, 2)
        push!(division_count_array, euclidean_algorithm_division_count(a, b))
        M -= 1
    end
end

@btime main2()

function main3()
    N = big(10)^100
    division_count_array = Array{Int}(undef, 10^4)
    for i ∈ 1:length(division_count_array)
        division_count_array[i] = euclidean_algorithm_division_count(rand(1:N), rand(1:N))
    end
end

@btime main3()

I get

 514.021 ms (9549046 allocations: 185.52 MiB)
  539.400 ms (9552491 allocations: 185.58 MiB)
  485.769 ms (9697452 allocations: 188.44 MiB)

so no real changes - maybe there are things to look out for with BigInts I’m unaware of, as I never need to work with them.

And just to add reducing N to avoid BigInt (which of course isn’t directly comparable as we’re also changing the range for rand to consider) the time reduces by a factor of 100 and almost all allocations go away:


function main4()
    N = 10^30
    division_count_array = Array{Int}(undef, 10^4)
    for i ∈ 1:length(division_count_array)
        division_count_array[i] = euclidean_algorithm_division_count(rand(1:N), rand(1:N))
    end
end

@btime main4()
3.869 ms (2 allocations: 78.20 KiB)
4 Likes

BigInt and BigFloat are handled by the GNU Multiple Precision Arithmetic Library. My assumption would be that the library is as optimized as it can be. I don’t know what python uses for it’s library, or if they rolled their own. I tried 2 quick things, one was allocating the division_count_array in one pass rather than growing it. The second was to turn off garbage collection.

On my machine I get:

  • 384 ms to 360 ms - Python code.
  • 473 ms - Your (mostly) original Julia code.
  • 333 ms - Julia /w GC disabled.

Which has me curious if the python garbage collector ever ran…

My version:

using BenchmarkTools

function euclidean_algorithm_division_count(a, b)
    division_count = 1
    if b > a
        a, b = b, a
    end
    while (c = a % b) != 0
        a, b = b, c
        division_count += 1
    end
    return division_count
end

function main()
    N = big(10)^100
    R = 1:N
    M = 10^4
    division_count_array = Vector{Int64}(undef, M)
    while M > 0
        a, b = rand(R, 2)
        division_count_array[M] = euclidean_algorithm_division_count(a, b)
        M -= 1
    end
end

GC.enable(false)
@btime main()
GC.enable(true)
GC.gc()
3 Likes

Also using @inbounds gives me a little more speed up:

function main()
    N = big(10)^100
    R = 1:N
    M = 10^4
    division_count_array = Vector{Int64}(undef, M)
    while M > 0
        a, b = rand(R, 2)
        @inbounds division_count_array[M] = euclidean_algorithm_division_count(a, b)
        M -= 1
    end
end

really rand(1:N,2) and not rand(N,2)?

julia> rand(big(10), 2)
10×2 Array{Float64,2}:
 0.85374    0.102817
 0.686707   0.545013
 0.905144   0.897086
 0.405992   0.878026
 0.113256   0.640781
 0.0366894  0.302942
 0.593746   0.203189
 0.647248   0.00892756
 0.826612   0.910477
 0.133924   0.318374
rand(big(10)^100, 2)
ERROR: InexactError: Int64(10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
Stacktrace:
 [1] rand(::BigInt, ::Int64) at .\gmp.jl:356
 [2] top-level scope at REPL[10]:1

rand(1:N, 2) appears to produce different results than rand(N, 2)…

1 Like

So, c = a % b is slow for BigInts because a new c is allocated each time. But there’s an in-place version, available, but a bit hidden: Base.GMP.MPZ.tdiv_r! (https://gmplib.org/manual/Integer-Division.html#Integer-Division)

function divcount!(a::BigInt, b::BigInt)
    count = 1
    if b > a
        a, b = b, a
    end
    c = a % b
    while !iszero(c)
        a, b, c = b, c, a  # we need to shuffle all three variables to avoid aliasing
        count += 1
        Base.GMP.MPZ.tdiv_r!(c, a, b)  # this works in-place!
    end
    return count
end

divcount(a::BigInt, b::BigInt) = divcount!(deepcopy(a), deepcopy(b))  # copy is not enough

The divcount! function actually mutates the inputs, a and b, so I made a separate divcounts which doesn’t. Let’s compare performance:

function main_(f)
    N = big(10)^100
    M = 10^4
    counts = Vector{Int}(undef, M)
    nrange = 1:N  # moving this outside loop since it allocates
    @inbounds for i in eachindex(counts)
        a, b = rand(nrange, 2)
        counts[i] = f(a, b)
    end
    return counts
end

julia> @btime main_(euclidean_algorithm_division_count);
  694.734 ms (5549040 allocations: 108.78 MiB)

julia> @btime main_(divcount);
  134.639 ms (250005 allocations: 13.28 MiB)

It’s definitely faster. There might be some gains to be had with the rand function too, but that is probably much less.

3 Likes

I also find that after disabling the garbage collector the python and Julia versions have the same run time.

EDIT: The statement below is wrong. I mistook the timings. The code that uses the reference is slower. I think what @yuyichao is getting at is that an allocation is made for the result of a % b in any case.

[quote=“pixel27, post:8, topic:35890”]
GNU Multiple Precision Arithmetic Library. My assumption would be that the library is as optimized as it can be.
[/quote]

The libraries are fast. But, Julia doesn’t do much, if anything, to optimize code for BigInts.

For example this
function euclidean_algorithm_division_count(a, b)
division_count = 1
c = Ref(BigInt())
if b > a
a, b = b, a
end
while (c[] = a % b) != 0
a, b = b, c[]
division_count += 1
end
return division_count
end

is a bit faster than the version above becuase it doesn’t allocate c every time it is assigned. But, it doesn’t have a big effect on the total run time.

No this code does not do any (useful) preallocation.

1 Like

Which one is ‘the version above’? It seems slower than the one I suggested:

julia> @btime divcount($a, $b)
  10.524 μs (13 allocations: 976 bytes)

julia> @btime euclidean_algorithm_division_count($a, $b) # yours
  36.689 μs (498 allocations: 9.88 KiB)

And it actually allocates more than the original version

julia> @btime euclidean_algorithm_division_count($a, $b)
  36.610 μs (496 allocations: 9.84 KiB)
1 Like