Why is python faster than Julia

adangreen · March 12, 2020, 4:36pm

I wrote two absolutely identical pieces of code on python and Julia:

from time import time

start_time = time()
def euclidian_algorithm_division_count(a, b):
    division_count = 1
    if b > a:
        a, b = b, a
    while (c := a % b) != 0:
        a, b = b, c
        division_count += 1
    return division_count


N = 10**100
M = 10**4
from random import randint
division_count_array = []
while M > 0:
    a = randint(1, N)
    b = randint(1, N)
    division_count_array.append(euclidian_algorithm_division_count(a, b))
    M -= 1
print(time() - start_time, "seconds")

@time begin

function euclidean_algorithm_division_count(a, b)
    division_count = 1
    if b > a
        a, b = b, a
    end
    while (c = a % b) != 0
        a, b = b, c
        division_count += 1
    end
    return division_count
end

function main()
    N = big(10)^100
    M = 10^4
    division_count_array = []
    while M > 0
        a, b = rand(1:N, 2)
        push!(division_count_array, euclidean_algorithm_division_count(a, b))
        M -= 1
    end
end

main()

end

And that’s the output i got:
0.6859960556030273 seconds for python;
1.760202 seconds (9.66 M allocations: 191.444 MiB, 20.62% gc time) for Julia;
I believe compiled Julia can’t be slower than python for real, thus the problem is in my code. Where did i mistake?

nilshg · March 12, 2020, 4:53pm

Apart from any issues with the code itself, you are measuring compilation time in addition to compute time. Try using @btime from the BenchmarkTools package to get a more reliable measurement.

adangreen · March 12, 2020, 5:02pm

got 1.5 seconds with btime, but it’s still slower than python

using BenchmarkTools

function euclidean_algorithm_division_count(a, b)
    division_count = 1
    if b > a
        a, b = b, a
    end
    while (c = a % b) != 0
        a, b = b, c
        division_count += 1
    end
    return division_count
end

function main()
    N = big(10)^100
    M = 10^4
    division_count_array = []
    while M > 0
        a, b = rand(1:N, 2)
        push!(division_count_array, euclidean_algorithm_division_count(a, b))
        M -= 1
    end
end

@btime main()

ssfrr · March 12, 2020, 5:03pm

To be more explicit, you want to define your functions as above and then run

using BenchmarkTools
@btime main()

An analogy here would be that timing as you’ve done above is sort of like writing a C program and then timing both compilation and execution.

ChrisRackauckas · March 12, 2020, 5:05pm

isn’t type inferred

ssfrr · March 12, 2020, 5:09pm

Specifically here you’ll want to do Int[] rather than just [], so julia knows what types the array will hold.

I usually avoid creating a container first and then push!ing into it for this reason (I don’t want to have to specify the element types). So I often do something more like:

function main()
    N = big(10)^100
    M = 10^4
    division_count_array = map(1:M) do _
        a, b = rand(1:N, 2)
        euclidean_algorithm_division_count(a, b)
    end
end

nilshg · March 12, 2020, 5:14pm

I’m not sure (and your code doesn’t seem to run?) but profiler says the largest part of code is spend in the (c = a % b) line, and chaging the container to Int[] doesn’t do anything to my timings.

Profiler points me to

($fJ)(x::BigInt, y::BigInt) = MPZ.$fC(x, y)

in gmp.jl, so maybe there’s an issue here with BigInts just not being very fast?

Specifically, rewriting things as:

function main2()
    N = big(10)^100
    M = 10^4
    division_count_array = Int[]
    while M > 0
        a, b = rand(1:N, 2)
        push!(division_count_array, euclidean_algorithm_division_count(a, b))
        M -= 1
    end
end

@btime main2()

function main3()
    N = big(10)^100
    division_count_array = Array{Int}(undef, 10^4)
    for i ∈ 1:length(division_count_array)
        division_count_array[i] = euclidean_algorithm_division_count(rand(1:N), rand(1:N))
    end
end

@btime main3()

I get

 514.021 ms (9549046 allocations: 185.52 MiB)
  539.400 ms (9552491 allocations: 185.58 MiB)
  485.769 ms (9697452 allocations: 188.44 MiB)

so no real changes - maybe there are things to look out for with BigInts I’m unaware of, as I never need to work with them.

And just to add reducing N to avoid BigInt (which of course isn’t directly comparable as we’re also changing the range for rand to consider) the time reduces by a factor of 100 and almost all allocations go away:


function main4()
    N = 10^30
    division_count_array = Array{Int}(undef, 10^4)
    for i ∈ 1:length(division_count_array)
        division_count_array[i] = euclidean_algorithm_division_count(rand(1:N), rand(1:N))
    end
end

@btime main4()
3.869 ms (2 allocations: 78.20 KiB)

pixel27 · March 12, 2020, 6:09pm

BigInt and BigFloat are handled by the GNU Multiple Precision Arithmetic Library. My assumption would be that the library is as optimized as it can be. I don’t know what python uses for it’s library, or if they rolled their own. I tried 2 quick things, one was allocating the division_count_array in one pass rather than growing it. The second was to turn off garbage collection.

On my machine I get:

384 ms to 360 ms - Python code.
473 ms - Your (mostly) original Julia code.
333 ms - Julia /w GC disabled.

Which has me curious if the python garbage collector ever ran…

My version:

using BenchmarkTools

function euclidean_algorithm_division_count(a, b)
    division_count = 1
    if b > a
        a, b = b, a
    end
    while (c = a % b) != 0
        a, b = b, c
        division_count += 1
    end
    return division_count
end

function main()
    N = big(10)^100
    R = 1:N
    M = 10^4
    division_count_array = Vector{Int64}(undef, M)
    while M > 0
        a, b = rand(R, 2)
        division_count_array[M] = euclidean_algorithm_division_count(a, b)
        M -= 1
    end
end

GC.enable(false)
@btime main()
GC.enable(true)
GC.gc()

Amin_Yahyaabadi · March 12, 2020, 7:00pm

Also using @inbounds gives me a little more speed up:

function main()
    N = big(10)^100
    R = 1:N
    M = 10^4
    division_count_array = Vector{Int64}(undef, M)
    while M > 0
        a, b = rand(R, 2)
        @inbounds division_count_array[M] = euclidean_algorithm_division_count(a, b)
        M -= 1
    end
end

rveltz · March 12, 2020, 8:55pm

really rand(1:N,2) and not rand(N,2)?

pixel27 · March 12, 2020, 9:02pm

julia> rand(big(10), 2)
10×2 Array{Float64,2}:
 0.85374    0.102817
 0.686707   0.545013
 0.905144   0.897086
 0.405992   0.878026
 0.113256   0.640781
 0.0366894  0.302942
 0.593746   0.203189
 0.647248   0.00892756
 0.826612   0.910477
 0.133924   0.318374

rand(big(10)^100, 2)
ERROR: InexactError: Int64(10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
Stacktrace:
 [1] rand(::BigInt, ::Int64) at .\gmp.jl:356
 [2] top-level scope at REPL[10]:1

rand(1:N, 2) appears to produce different results than rand(N, 2)…

DNF · March 12, 2020, 11:15pm

So, c = a % b is slow for BigInts because a new c is allocated each time. But there’s an in-place version, available, but a bit hidden: Base.GMP.MPZ.tdiv_r! (Integer Division (GNU MP 6.2.1))

function divcount!(a::BigInt, b::BigInt)
    count = 1
    if b > a
        a, b = b, a
    end
    c = a % b
    while !iszero(c)
        a, b, c = b, c, a  # we need to shuffle all three variables to avoid aliasing
        count += 1
        Base.GMP.MPZ.tdiv_r!(c, a, b)  # this works in-place!
    end
    return count
end

divcount(a::BigInt, b::BigInt) = divcount!(deepcopy(a), deepcopy(b))  # copy is not enough

The divcount! function actually mutates the inputs, a and b, so I made a separate divcounts which doesn’t. Let’s compare performance:

function main_(f)
    N = big(10)^100
    M = 10^4
    counts = Vector{Int}(undef, M)
    nrange = 1:N  # moving this outside loop since it allocates
    @inbounds for i in eachindex(counts)
        a, b = rand(nrange, 2)
        counts[i] = f(a, b)
    end
    return counts
end

julia> @btime main_(euclidean_algorithm_division_count);
  694.734 ms (5549040 allocations: 108.78 MiB)

julia> @btime main_(divcount);
  134.639 ms (250005 allocations: 13.28 MiB)

It’s definitely faster. There might be some gains to be had with the rand function too, but that is probably much less.

jlapeyre · March 12, 2020, 11:23pm

I also find that after disabling the garbage collector the python and Julia versions have the same run time.

EDIT: The statement below is wrong. I mistook the timings. The code that uses the reference is slower. I think what @yuyichao is getting at is that an allocation is made for the result of a % b in any case.

[quote=“pixel27, post:8, topic:35890”]
GNU Multiple Precision Arithmetic Library. My assumption would be that the library is as optimized as it can be.
[/quote]

~~The libraries are fast. But, Julia doesn’t do much, if anything, to optimize code for BigInts.~~

For example this
function euclidean_algorithm_division_count(a, b)
division_count = 1
c = Ref(BigInt())
if b > a
a, b = b, a
end
while (c[] = a % b) != 0
a, b = b, c[]
division_count += 1
end
return division_count
end
~~is a bit faster than the version above becuase it doesn’t allocate c every time it is assigned. But, it doesn’t have a big effect on the total run time.~~

yuyichao · March 12, 2020, 11:32pm

No this code does not do any (useful) preallocation.

DNF · March 12, 2020, 11:34pm

Which one is ‘the version above’? It seems slower than the one I suggested:

julia> @btime divcount($a, $b)
  10.524 μs (13 allocations: 976 bytes)

julia> @btime euclidean_algorithm_division_count($a, $b) # yours
  36.689 μs (498 allocations: 9.88 KiB)

And it actually allocates more than the original version

julia> @btime euclidean_algorithm_division_count($a, $b)
  36.610 μs (496 allocations: 9.84 KiB)

Topic		Replies	Views
Poor BigInt performance Performance	32	1275	May 14, 2024
Python big integer vs. Julia big integer arithmetic Performance python	90	1536	June 20, 2024
Optimized Python is as good as Julia Performance question	28	25185	June 2, 2025
Question on simple performance comparison between Python and Julia General Usage question	23	1040	June 13, 2023
Quite bad performance of Julia 0.6.4 vs Python+Numpy General Usage	26	5202	November 13, 2018

Why is python faster than Julia

Related topics