Huge performance difference between inbuilt gcdx() and my function

Hi All,

I had written a Bezout’s identity function (before I knew Julia already had one in the form of gcdx), and although my implementation is very close to the one that comes with Julia, I see that the inbuilt function is some ~50x faster than mine. Here are the 2 functions, and the results. All inputs are BigInts of ~1000 bits to both functions.

My function:

function bezout(a::BigInt, b::BigInt)
  x1::BigInt, y1::BigInt, z1::BigInt = a, 1, 0
  x2::BigInt, y2::BigInt, z2::BigInt = b, 0, 1
  while x2!= 0
    q = x1÷x2
    x1, x2 = x2, x1%x2
    y1, y2 = y2, y1-q*y2
    z1, z2 = z2, z1-q*z2
  return x1, y1, z1

Inbuilt function:

function gcdx(a::Integer, b::Integer)
    T = promote_type(typeof(a), typeof(b))
    # a0, b0 = a, b
    s0, s1 = oneunit(T), zero(T)
    t0, t1 = s1, s0
    # The loop invariant is: s0*a0 + t0*b0 == a
    x = a % T
    y = b % T
    while y != 0
        q = div(x, y)
        x, y = y, rem(x, y)
        s0, s1 = s1, s0 - q*s1
        t0, t1 = t1, t0 - q*t1
    x < 0 ? (-x, -s0, -t0) : (x, s0, t0)



@time map((x,y)->bezout(x,y),a,b)
  8.285886 seconds (87.98 M allocations: 3.098 GiB, 20.51% gc time, 1.17% compilation time)

@time map((x,y)->gcdx(x,y),a,b)
  0.178137 seconds (155.75 k allocations: 8.008 MiB, 45.94% compilation time)

Both functions were run once with BigInt inputs for compilation before running the perf test. The inbuilt function takes just 0.18s vs 8.29s for mine! Any pointers in understanding this huge performance difference? The memory allocations also seem to be way off.

Thank you!
PS: I’m using the latest version of Julia v1.6.1

Julia has a dedicated gcdx function for BigInt which calls into the MPZ library:

That’s not the gcdx that’s called for BigInt, check with @edit gcdx(big"1", big"1"):

julia> @which gcdx(big"1", big"1")                  
gcdx(a::BigInt, b::BigInt) in Base.GMP at gmp.jl:616
julia> @edit gcdx(big"1", big"1")

#<new editor opens up>

function gcdx(a::BigInt, b::BigInt)                                          
    if iszero(b) # shortcut this to ensure consistent results with gcdx(a,b) 
        return a < 0 ? (-a,-ONE,b) : (a,one(BigInt),b)                       
        # we don't return the globals ONE and ZERO in case the user wants to 
        # mutate the result                                                  
    g, s, t = MPZ.gcdext(a, b)                                               
    if t == 0                                                                
        # work around a difference in some versions of GMP                   
        if a == b                                                            
            return g, t, s                                                   
        elseif abs(a)==abs(b)                                                
            return g, t, -s                                                  
    g, s, t                                                                  

The called GMP method avoids all those intermediate allocations from repeated div, rem etc.

1 Like

Ah, I see! Thank you, @Sukera and @kristoffer.carlsson That explains it.