Why is python faster than Julia

I wrote two absolutely identical pieces of code on python and Julia:

``````from time import time

start_time = time()
def euclidian_algorithm_division_count(a, b):
division_count = 1
if b > a:
a, b = b, a
while (c := a % b) != 0:
a, b = b, c
division_count += 1
return division_count

N = 10**100
M = 10**4
from random import randint
division_count_array = []
while M > 0:
a = randint(1, N)
b = randint(1, N)
division_count_array.append(euclidian_algorithm_division_count(a, b))
M -= 1
print(time() - start_time, "seconds")
``````
``````@time begin

function euclidean_algorithm_division_count(a, b)
division_count = 1
if b > a
a, b = b, a
end
while (c = a % b) != 0
a, b = b, c
division_count += 1
end
return division_count
end

function main()
N = big(10)^100
M = 10^4
division_count_array = []
while M > 0
a, b = rand(1:N, 2)
push!(division_count_array, euclidean_algorithm_division_count(a, b))
M -= 1
end
end

main()

end
``````

And thatâs the output i got:
0.6859960556030273 seconds for python;
1.760202 seconds (9.66 M allocations: 191.444 MiB, 20.62% gc time) for Julia;
I believe compiled Julia canât be slower than python for real, thus the problem is in my code. Where did i mistake?

Apart from any issues with the code itself, you are measuring compilation time in addition to compute time. Try using `@btime` from the `BenchmarkTools` package to get a more reliable measurement.

1 Like

got 1.5 seconds with btime, but itâs still slower than python

``````using BenchmarkTools

function euclidean_algorithm_division_count(a, b)
division_count = 1
if b > a
a, b = b, a
end
while (c = a % b) != 0
a, b = b, c
division_count += 1
end
return division_count
end

function main()
N = big(10)^100
M = 10^4
division_count_array = []
while M > 0
a, b = rand(1:N, 2)
push!(division_count_array, euclidean_algorithm_division_count(a, b))
M -= 1
end
end

@btime main()
``````

To be more explicit, you want to define your functions as above and then run

``````using BenchmarkTools
@btime main()
``````

An analogy here would be that timing as youâve done above is sort of like writing a C program and then timing both compilation and execution.

isnât type inferred

Specifically here youâll want to do `Int[]` rather than just `[]`, so julia knows what types the array will hold.

I usually avoid creating a container first and then `push!`ing into it for this reason (I donât want to have to specify the element types). So I often do something more like:

``````function main()
N = big(10)^100
M = 10^4
division_count_array = map(1:M) do _
a, b = rand(1:N, 2)
euclidean_algorithm_division_count(a, b)
end
end
``````
2 Likes

Iâm not sure (and your code doesnât seem to run?) but profiler says the largest part of code is spend in the `(c = a % b)` line, and chaging the container to `Int[]` doesnât do anything to my timings.

Profiler points me to

``````(\$fJ)(x::BigInt, y::BigInt) = MPZ.\$fC(x, y)
``````

in `gmp.jl`, so maybe thereâs an issue here with `BigInt`s just not being very fast?

Specifically, rewriting things as:

``````function main2()
N = big(10)^100
M = 10^4
division_count_array = Int[]
while M > 0
a, b = rand(1:N, 2)
push!(division_count_array, euclidean_algorithm_division_count(a, b))
M -= 1
end
end

@btime main2()

function main3()
N = big(10)^100
division_count_array = Array{Int}(undef, 10^4)
for i â 1:length(division_count_array)
division_count_array[i] = euclidean_algorithm_division_count(rand(1:N), rand(1:N))
end
end

@btime main3()
``````

I get

`````` 514.021 ms (9549046 allocations: 185.52 MiB)
539.400 ms (9552491 allocations: 185.58 MiB)
485.769 ms (9697452 allocations: 188.44 MiB)
``````

so no real changes - maybe there are things to look out for with `BigInt`s Iâm unaware of, as I never need to work with them.

And just to add reducing `N` to avoid `BigInt` (which of course isnât directly comparable as weâre also changing the range for `rand` to consider) the time reduces by a factor of 100 and almost all allocations go away:

``````
function main4()
N = 10^30
division_count_array = Array{Int}(undef, 10^4)
for i â 1:length(division_count_array)
division_count_array[i] = euclidean_algorithm_division_count(rand(1:N), rand(1:N))
end
end

@btime main4()
3.869 ms (2 allocations: 78.20 KiB)
``````
4 Likes

BigInt and BigFloat are handled by the GNU Multiple Precision Arithmetic Library. My assumption would be that the library is as optimized as it can be. I donât know what python uses for itâs library, or if they rolled their own. I tried 2 quick things, one was allocating the division_count_array in one pass rather than growing it. The second was to turn off garbage collection.

On my machine I get:

• 384 ms to 360 ms - Python code.
• 473 ms - Your (mostly) original Julia code.
• 333 ms - Julia /w GC disabled.

Which has me curious if the python garbage collector ever ranâŚ

My version:

``````using BenchmarkTools

function euclidean_algorithm_division_count(a, b)
division_count = 1
if b > a
a, b = b, a
end
while (c = a % b) != 0
a, b = b, c
division_count += 1
end
return division_count
end

function main()
N = big(10)^100
R = 1:N
M = 10^4
division_count_array = Vector{Int64}(undef, M)
while M > 0
a, b = rand(R, 2)
division_count_array[M] = euclidean_algorithm_division_count(a, b)
M -= 1
end
end

GC.enable(false)
@btime main()
GC.enable(true)
GC.gc()
``````
3 Likes

Also using `@inbounds` gives me a little more speed up:

``````function main()
N = big(10)^100
R = 1:N
M = 10^4
division_count_array = Vector{Int64}(undef, M)
while M > 0
a, b = rand(R, 2)
@inbounds division_count_array[M] = euclidean_algorithm_division_count(a, b)
M -= 1
end
end
``````

really `rand(1:N,2)` and not `rand(N,2)`?

``````julia> rand(big(10), 2)
10Ă2 Array{Float64,2}:
0.85374    0.102817
0.686707   0.545013
0.905144   0.897086
0.405992   0.878026
0.113256   0.640781
0.0366894  0.302942
0.593746   0.203189
0.647248   0.00892756
0.826612   0.910477
0.133924   0.318374
``````
``````rand(big(10)^100, 2)
ERROR: InexactError: Int64(10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
Stacktrace:
[1] rand(::BigInt, ::Int64) at .\gmp.jl:356
[2] top-level scope at REPL[10]:1
``````

rand(1:N, 2) appears to produce different results than rand(N, 2)âŚ

1 Like

So, `c = a % b` is slow for `BigInt`s because a new `c` is allocated each time. But thereâs an in-place version, available, but a bit hidden: `Base.GMP.MPZ.tdiv_r!` (https://gmplib.org/manual/Integer-Division.html#Integer-Division)

``````function divcount!(a::BigInt, b::BigInt)
count = 1
if b > a
a, b = b, a
end
c = a % b
while !iszero(c)
a, b, c = b, c, a  # we need to shuffle all three variables to avoid aliasing
count += 1
Base.GMP.MPZ.tdiv_r!(c, a, b)  # this works in-place!
end
return count
end

divcount(a::BigInt, b::BigInt) = divcount!(deepcopy(a), deepcopy(b))  # copy is not enough
``````

The `divcount!` function actually mutates the inputs, `a` and `b`, so I made a separate `divcounts` which doesnât. Letâs compare performance:

``````function main_(f)
N = big(10)^100
M = 10^4
counts = Vector{Int}(undef, M)
nrange = 1:N  # moving this outside loop since it allocates
@inbounds for i in eachindex(counts)
a, b = rand(nrange, 2)
counts[i] = f(a, b)
end
return counts
end

julia> @btime main_(euclidean_algorithm_division_count);
694.734 ms (5549040 allocations: 108.78 MiB)

julia> @btime main_(divcount);
134.639 ms (250005 allocations: 13.28 MiB)
``````

Itâs definitely faster. There might be some gains to be had with the `rand` function too, but that is probably much less.

3 Likes

I also find that after disabling the garbage collector the python and Julia versions have the same run time.

EDIT: The statement below is wrong. I mistook the timings. The code that uses the reference is slower. I think what @yuyichao is getting at is that an allocation is made for the result of `a % b` in any case.

[quote=âpixel27, post:8, topic:35890â]
GNU Multiple Precision Arithmetic Library. My assumption would be that the library is as optimized as it can be.
[/quote]

The libraries are fast. But, Julia doesnât do much, if anything, to optimize code for `BigInt`s.

For example this
function euclidean_algorithm_division_count(a, b)
division_count = 1
c = Ref(BigInt())
if b > a
a, b = b, a
end
while (c[] = a % b) != 0
a, b = b, c[]
division_count += 1
end
return division_count
end

is a bit faster than the version above becuase it doesnât allocate `c` every time it is assigned. But, it doesnât have a big effect on the total run time.

No this code does not do any (useful) preallocation.

1 Like

Which one is âthe version aboveâ? It seems slower than the one I suggested:

``````julia> @btime divcount(\$a, \$b)
10.524 Îźs (13 allocations: 976 bytes)

julia> @btime euclidean_algorithm_division_count(\$a, \$b) # yours
36.689 Îźs (498 allocations: 9.88 KiB)
``````

And it actually allocates more than the original version

``````julia> @btime euclidean_algorithm_division_count(\$a, \$b)
36.610 Îźs (496 allocations: 9.84 KiB)
``````
1 Like