I wrote two absolutely identical pieces of code in Python and Julia:

from time import time
start_time = time()

def euclidian_algorithm_division_count(a, b):
    division_count = 1
    if b > a:
        a, b = b, a
    while (c := a % b) != 0:
        a, b = b, c
        division_count += 1
    return division_count

N = 10**100
M = 10**4
from random import randint
division_count_array = []
while M > 0:
    a = randint(1, N)
    b = randint(1, N)
    division_count_array.append(euclidian_algorithm_division_count(a, b))
    M -= 1
print(time() - start_time, "seconds")

@time begin
    function euclidean_algorithm_division_count(a, b)
        division_count = 1
        if b > a
            a, b = b, a
        end
        while (c = a % b) != 0
            a, b = b, c
            division_count += 1
        end
        return division_count
    end
    function main()
        N = big(10)^100
        M = 10^4
        division_count_array = []
        while M > 0
            a, b = rand(1:N, 2)
            push!(division_count_array, euclidean_algorithm_division_count(a, b))
            M -= 1
        end
    end
    main()
end

And that's the output I got:

0.6859960556030273 seconds for Python;
1.760202 seconds (9.66 M allocations: 191.444 MiB, 20.62% gc time) for Julia.

I believe compiled Julia can't really be slower than Python, so the problem must be in my code. Where is my mistake?

Apart from any issues with the code itself, you are measuring compilation time in addition to compute time. Try using @btime from the BenchmarkTools package to get a more reliable measurement.
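On the Python side, the rough equivalent is to use the `timeit` module, so that setup and interpreter startup are excluded and only the computation is measured. A minimal sketch, with a stand-in loop body rather than the exact benchmark above:

```python
import timeit

def gcd_division_count(a, b):
    # stand-in for the benchmark body: count the divisions one run
    # of the Euclidean algorithm performs on a pair of big integers
    count = 0
    while b:
        a, b = b, a % b
        count += 1
    return count

# time only the computation, repeat a few times, and keep the best run
best = min(timeit.repeat(lambda: gcd_division_count(10**100 + 7, 10**50 + 3),
                         number=1000, repeat=3))
print(best, "seconds per 1000 calls")
```

Taking the minimum over several repeats is the usual way to suppress noise from other processes, which is also roughly what `@btime` reports.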

I got 1.5 seconds with @btime, but it's still slower than Python:

using BenchmarkTools

function euclidean_algorithm_division_count(a, b)
    division_count = 1
    if b > a
        a, b = b, a
    end
    while (c = a % b) != 0
        a, b = b, c
        division_count += 1
    end
    return division_count
end

function main()
    N = big(10)^100
    M = 10^4
    division_count_array = []
    while M > 0
        a, b = rand(1:N, 2)
        push!(division_count_array, euclidean_algorithm_division_count(a, b))
        M -= 1
    end
end

@btime main()

Specifically, here you'll want to do Int[] rather than just [], so Julia knows what type the array will hold.

I usually avoid creating a container first and then push!ing into it for this reason (I don't want to have to specify the element types). So I often do something more like:

function main()
    N = big(10)^100
    M = 10^4
    division_count_array = map(1:M) do _
        a, b = rand(1:N, 2)
        euclidean_algorithm_division_count(a, b)
    end
end

I'm not sure (and your code doesn't seem to run?), but the profiler says the largest part of the time is spent in the (c = a % b) line, and changing the container to Int[] doesn't do anything to my timings.

The profiler points me to

($fJ)(x::BigInt, y::BigInt) = MPZ.$fC(x, y)

in gmp.jl, so maybe there's an issue here with BigInts just not being very fast?
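One quick sanity check is to time the big-integer modulo on its own, outside the algorithm; if that one operation dominates, no change to the container will help. A Python sketch, since that is the comparison baseline here:

```python
import random
import time

random.seed(0)
a = random.randint(1, 10**100)
b = random.randint(1, 10**100)

start = time.time()
for _ in range(100_000):
    r = a % b  # the operation the profiler points at
print(time.time() - start, "seconds for 1e5 modulo ops")
```

If this loop accounts for most of the total runtime, the benchmark is essentially measuring each language's big-integer remainder, not the surrounding loop or array code.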

Specifically, rewriting things as:

function main2()
    N = big(10)^100
    M = 10^4
    division_count_array = Int[]
    while M > 0
        a, b = rand(1:N, 2)
        push!(division_count_array, euclidean_algorithm_division_count(a, b))
        M -= 1
    end
end
@btime main2()

function main3()
    N = big(10)^100
    division_count_array = Array{Int}(undef, 10^4)
    for i ∈ 1:length(division_count_array)
        division_count_array[i] = euclidean_algorithm_division_count(rand(1:N), rand(1:N))
    end
end
@btime main3()

I get

514.021 ms (9549046 allocations: 185.52 MiB)
539.400 ms (9552491 allocations: 185.58 MiB)
485.769 ms (9697452 allocations: 188.44 MiB)

so no real change. Maybe there are things to look out for with BigInts that I'm unaware of, as I never need to work with them.

And just to add: reducing N to avoid BigInt (which of course isn't directly comparable, as we're also changing the range for rand to consider), the time reduces by a factor of 100 and almost all allocations go away:

function main4()
    N = 10^30
    division_count_array = Array{Int}(undef, 10^4)
    for i ∈ 1:length(division_count_array)
        division_count_array[i] = euclidean_algorithm_division_count(rand(1:N), rand(1:N))
    end
end
@btime main4()

3.869 ms (2 allocations: 78.20 KiB)

BigInt and BigFloat are handled by the GNU Multiple Precision Arithmetic Library. My assumption would be that the library is as optimized as it can be. I don't know what Python uses for its library, or if they rolled their own. I tried two quick things: one was allocating the division_count_array in one pass rather than growing it; the second was to turn off garbage collection.
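For reference, Python's int type is arbitrary-precision out of the box: CPython implements its own big-integer arithmetic rather than wrapping GMP, so the 100-digit operands in this benchmark need no special wrapper type at all:

```python
# Python ints grow as needed; no BigInt wrapper type is involved
a = 10**100 + 7      # a 101-digit integer, still a plain Python int
b = 10**50 + 3
r = a % b            # ordinary operator; arbitrary precision under the hood
print(0 <= r < b)    # prints True
```

That also means the Python timing reflects CPython's own long-integer routines, not GMP, so the two benchmarks exercise different big-integer implementations.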

On my machine I get:

384 ms to 360 ms - Python code.

473 ms - Your (mostly) original Julia code.

333 ms - Julia /w GC disabled.

Which has me curious if the Python garbage collector ever ran…
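A point worth noting here: CPython frees most objects immediately via reference counting, and the `gc` module's collector only handles reference cycles, so disabling it often changes little for a workload like this. A quick check of what `gc.disable()` actually toggles:

```python
import gc

print(gc.isenabled())   # the cycle collector is on by default
gc.disable()            # turns off only cycle collection;
                        # reference counting still frees objects promptly
print(gc.isenabled())
gc.enable()             # restore the default
```

So the Python timing above likely wasn't paying much of a GC cost either way, whereas Julia's GC is responsible for reclaiming every temporary BigInt.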

My version:

using BenchmarkTools

function euclidean_algorithm_division_count(a, b)
    division_count = 1
    if b > a
        a, b = b, a
    end
    while (c = a % b) != 0
        a, b = b, c
        division_count += 1
    end
    return division_count
end

function main()
    N = big(10)^100
    R = 1:N
    M = 10^4
    division_count_array = Vector{Int64}(undef, M)
    while M > 0
        a, b = rand(R, 2)
        division_count_array[M] = euclidean_algorithm_division_count(a, b)
        M -= 1
    end
end

GC.enable(false)
@btime main()
GC.enable(true)
GC.gc()

Also using @inbounds gives me a little more speed up:

function main()
    N = big(10)^100
    R = 1:N
    M = 10^4
    division_count_array = Vector{Int64}(undef, M)
    while M > 0
        a, b = rand(R, 2)
        @inbounds division_count_array[M] = euclidean_algorithm_division_count(a, b)
        M -= 1
    end
end

function divcount!(a::BigInt, b::BigInt)
    count = 1
    if b > a
        a, b = b, a
    end
    c = a % b
    while !iszero(c)
        a, b, c = b, c, a # we need to shuffle all three variables to avoid aliasing
        count += 1
        Base.GMP.MPZ.tdiv_r!(c, a, b) # this works in-place!
    end
    return count
end

divcount(a::BigInt, b::BigInt) = divcount!(deepcopy(a), deepcopy(b)) # copy is not enough

The divcount! function actually mutates the inputs, a and b, so I made a separate divcount which doesn't. Let's compare performance:

function main_(f)
    N = big(10)^100
    M = 10^4
    counts = Vector{Int}(undef, M)
    nrange = 1:N # moving this outside the loop since it allocates
    @inbounds for i in eachindex(counts)
        a, b = rand(nrange, 2)
        counts[i] = f(a, b)
    end
    return counts
end

julia> @btime main_(euclidean_algorithm_division_count);
  694.734 ms (5549040 allocations: 108.78 MiB)

julia> @btime main_(divcount);
  134.639 ms (250005 allocations: 13.28 MiB)

It's definitely faster. There might be some gains to be had with the rand function too, but those are probably much smaller.

I also find that after disabling the garbage collector the Python and Julia versions have the same run time.

EDIT: The statement below is wrong. I mistook the timings. The code that uses the reference is slower. I think what @yuyichao is getting at is that an allocation is made for the result of a % b in any case.

[quote="pixel27, post:8, topic:35890"]
GNU Multiple Precision Arithmetic Library. My assumption would be that the library is as optimized as it can be.
[/quote]

The libraries are fast. But Julia doesn't do much, if anything, to optimize code for BigInts.

For example, this

function euclidean_algorithm_division_count(a, b)
    division_count = 1
    c = Ref(BigInt())
    if b > a
        a, b = b, a
    end
    while (c[] = a % b) != 0
        a, b = b, c[]
        division_count += 1
    end
    return division_count
end

is a bit faster than the version above because it doesn't allocate c every time it is assigned. But it doesn't have a big effect on the total run time.