Why does this Python code perform three times faster than Julia?

I tried to follow the basic Performance Tips, but surprisingly, my Python code runs faster than my Julia code.

# Julia code

ʋ₀::Float64 = 1.0
r₀::Float64 = 1.0
n::Int64 = 5000000

const σ::Float64 = 1.0
const ϵ::Float64 = 1.0
const m::Float64 = 1.0 
const Δt::Float64 = 1E-2

F = (r::Float64) -> -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7);

function md_model(ʋᵢ::Float64, rᵢ::Float64, n::Int64)
    rʋ = Array{Float64, 2}(undef, n, 2)
    for i = 1:n
        ʋᵢ = ʋᵢ + Δt/2*(F(rᵢ)/m)
        rᵢ₊₁ = rᵢ + Δt*ʋᵢ
        ʋᵢ₊₁ = ʋᵢ + Δt/2*(F(rᵢ₊₁)/m)
        rʋ[i, 1] = rᵢ = rᵢ₊₁
        rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁
    end
    return rʋ
end

I am using @btime to check the performance:

using BenchmarkTools
@btime md_model(ʋ₀, r₀, n);

This is the Python version:

import numpy as np

ʋ0 = 1.0
r0 = 1.0
n = 500000

σ = 1.0
ϵ = 1.0
m = 1.0
Δt = 1E-2

F = lambda r : -4*ϵ*(-12*σ**12/r**13 + 6*σ**6/r**7)

def md_model(ʋi, ri, n):
    rʋ = np.empty((n, 2))
    for i in range(n):
        ʋi = ʋi + Δt/2*(F(ri)/m)
        ri_new = ri + Δt*ʋi
        ʋi_new = ʋi + Δt/2*(F(ri_new)/m)
        ri = ri_new
        ʋi = ʋi_new
        rʋ[i, 0] = ri_new
        rʋ[i, 1] = ʋi_new
    return rʋ

# I am using %timeit to check the performance
%timeit md_model(ʋ0, r0, n)

The outputs in each case are:
Julia: 2.480990 seconds (50.01 M allocations: 839.939 MiB, 1.18% gc time, 0.59% compilation time)

Python: 824 ms ± 24.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I also tried using @turbo in my Julia code, but it did not work (how could I implement it?).

Thank you for all the help,
Santiago

Did you mean to use @btime? Your output looks like it came from @time, the regular timing macro, which includes compilation time and only runs the function once.

While the reported compilation time is low, the number of allocations seems very high and scales with n (roughly 10 allocations per iteration of n::Int64 = 5000000), which suggests a type instability to me. I’d suggest making F const as well, or making F a regular function like so:

function F(r::Float64)
   -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7)
end
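Alternatively, if you prefer keeping the anonymous-function style, declaring the binding const from the start should have the same effect. A minimal sketch (in a fresh session, since an existing non-const F cannot be redeclared const):

# const tells the compiler this binding never changes, so calls to F
# are no longer type unstable.
const F = (r::Float64) -> -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7)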
4 Likes

Sorry, @time instead of @btime was a typo in my code. Nonetheless, Python is still more than two times faster. I will check the function suggestion.

1 Like

FYI, making F const or a regular function improves performance by 6x for me, going from ~620ms to ~107ms. It’s likely going to be more for you, judging by how slow the type unstable version is on your end.

2 Likes

In your posted code, the values of n are also different:

n::Int64 = 5000000
n =        500000

i.e., per iteration, Julia is already roughly 3 times faster?

22 Likes

lol That was my biggest blunder.

Thank you

14 Likes

On top of what others said, I believe that accessing variables in global scope may not be a good idea.

Defining F as:

function F(r::Float64)
  σ::Float64 = 1.0
  ϵ::Float64 = 1.0
  m::Float64 = 1.0 
  Δt::Float64 = 1E-2

  -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7)
end

speeds it up 10x for me and reduces the allocations to 2 (rather than 5,000,000).

2 Likes

That makes sense. Conceptually, though, I think it is cleaner to separate the functions from the constants they share (e.g., m). With that in mind, I compared passing these values as function arguments versus declaring them as global constants, and the code ran a bit faster with the global constants.

In that case it is better to pass those values to the function as arguments.
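For instance, a minimal sketch of that approach (the names force and md_model_args, and the keyword defaults, are just placeholders here):

# All physical parameters come in as arguments, so nothing is read
# from global scope inside the hot loop.
force(r, σ, ϵ) = -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7)

function md_model_args(ʋᵢ, rᵢ, n; σ=1.0, ϵ=1.0, m=1.0, Δt=1e-2)
    rʋ = Array{Float64, 2}(undef, n, 2)
    for i = 1:n
        ʋᵢ = ʋᵢ + Δt/2*(force(rᵢ, σ, ϵ)/m)
        rᵢ₊₁ = rᵢ + Δt*ʋᵢ
        ʋᵢ₊₁ = ʋᵢ + Δt/2*(force(rᵢ₊₁, σ, ϵ)/m)
        rʋ[i, 1] = rᵢ = rᵢ₊₁
        rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁
    end
    return rʋ
end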

1 Like

I will check it again, thanks :+1:

1 Like

Your finding is very surprising to me. Santiago defines the global variables as constants, in which case there should be no difference. For example:

const a1 = 1.1
const a2 = 1.2
const a3 = 1.3
const a4 = 1.4

function f(x)
    return -4*a1*(-12*a2^12/x^13 + 6*a3^6/x^7*a4)
end

function g(x)
    a1 = 1.1
    a2 = 1.2
    a3 = 1.3
    a4 = 1.4
    return -4*a1*(-12*a2^12/x^13 + 6*a3^6/x^7*a4)
end

julia> f(3.0) == g(3.0) 
true

julia> @btime f(x) setup=(x=rand());
  23.219 ns (0 allocations: 0 bytes)

julia> @btime g(x) setup=(x=rand());
  23.249 ns (0 allocations: 0 bytes)

And the LLVM code looks the same too:

julia> @code_llvm f(3.0)
;  @ REPL[5]:1 within `f`
define double @julia_f_518(double %0) #0 {
top:
;  @ REPL[5]:2 within `f`
; ┌ @ intfuncs.jl:325 within `literal_pow`
; │┌ @ math.jl:1165 within `^`
    %1 = call double @j_pow_body_520(double %0, i64 signext 13) #0
    %2 = call double @j_pow_body_520(double %0, i64 signext 7) #0
; └└
; ┌ @ float.jl:411 within `/`
   %3 = insertelement <2 x double> poison, double %2, i64 0
   %4 = insertelement <2 x double> %3, double %1, i64 1
   %5 = fdiv <2 x double> <double 0x403CF5FA871A3B18, double 0x405ABF90AD4B54D2>, %4
; └
; ┌ @ float.jl:410 within `*`
   %6 = extractelement <2 x double> %5, i64 0
   %7 = fmul double %6, 1.400000e+00
; └
; ┌ @ float.jl:408 within `+`
   %8 = extractelement <2 x double> %5, i64 1
   %9 = fsub double %7, %8
; └
; ┌ @ operators.jl:578 within `*` @ float.jl:410
   %10 = fmul double %9, -4.400000e+00
; └
  ret double %10
}
julia> @code_llvm g(3.0)
;  @ REPL[7]:1 within `g`
define double @julia_g_530(double %0) #0 {
top:
;  @ REPL[7]:6 within `g`
; ┌ @ intfuncs.jl:325 within `literal_pow`
; │┌ @ math.jl:1165 within `^`
    %1 = call double @j_pow_body_532(double %0, i64 signext 13) #0
    %2 = call double @j_pow_body_532(double %0, i64 signext 7) #0
; └└
; ┌ @ float.jl:411 within `/`
   %3 = insertelement <2 x double> poison, double %2, i64 0
   %4 = insertelement <2 x double> %3, double %1, i64 1
   %5 = fdiv <2 x double> <double 0x403CF5FA871A3B18, double 0x405ABF90AD4B54D2>, %4
; └
; ┌ @ float.jl:410 within `*`
   %6 = extractelement <2 x double> %5, i64 0
   %7 = fmul double %6, 1.400000e+00
; └
; ┌ @ float.jl:408 within `+`
   %8 = extractelement <2 x double> %5, i64 1
   %9 = fsub double %7, %8
; └
; ┌ @ operators.jl:578 within `*` @ float.jl:410
   %10 = fmul double %9, -4.400000e+00
; └
  ret double %10
}

Silly me. I directly compared the original with my version, rather than with the explicit

function F
...
end

version. Sorry to both of you!

As @Sukera mentioned, it is enough to mark F as const (const F = ...). You can also define it as a closure inside md_model:


function md_model(ʋᵢ::Float64, rᵢ::Float64, n::Int64)

    @inline F(r::Float64) = -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7)
    
    rʋ = Array{Float64, 2}(undef, n, 2)

    for i = 1:n
        ʋᵢ = ʋᵢ + Δt/2*(F(rᵢ)/m)
        rᵢ₊₁ = rᵢ + Δt*ʋᵢ
        ʋᵢ₊₁ = ʋᵢ + Δt/2*(F(rᵢ₊₁)/m)
        rʋ[i, 1] = rᵢ = rᵢ₊₁
        rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁
    end
    return rʋ
end
julia> @btime md_model($(ʋ₀, r₀, n)...);
  143.676 ms (2 allocations: 76.29 MiB)

If you want extra speed, you could use FastPow.jl, fused multiply-adds (muladd), and precompute some constant terms:

using FastPow
function md_model2(ʋᵢ::Float64, rᵢ::Float64, n::Int64)

    c1 = -4*ϵ
    c2 = @fastpow -12*σ^12 * c1
    c3 = @fastpow 6*σ^6 * c1
    
    @inline F(r::Float64) = @fastpow c2/r^13 + c3/r^7
     
    rʋ = Array{Float64, 2}(undef, n, 2)
    
    c = Δt * 0.5 * inv(m)
    for i = 1:n
        ʋᵢ = muladd(c, F(rᵢ), ʋᵢ)
        rᵢ₊₁ = muladd(Δt, ʋᵢ, rᵢ)
        ʋᵢ₊₁ = muladd(c, (F(rᵢ₊₁)), ʋᵢ)
        rʋ[i, 1] = rᵢ = rᵢ₊₁
        rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁
    end
    return rʋ
end
julia> @btime md_model2($(ʋ₀, r₀, n)...);
  67.954 ms (2 allocations: 76.29 MiB)
4 Likes

Just a couple of questions/suggestions. Is there any reason why you’re type-annotating each variable? There’s no need in terms of performance.

In fact, you may even be introducing bugs by doing that. The following code is type stable, and all I did was remove the type annotations:

# Julia code

ʋ₀ = 1.0
r₀ = 1.0
n  = 5_000_000

const σ  = 1.0
const ϵ  = 1.0
const m  = 1.0
const Δt = 1E-2

F(r) = -4 * ϵ * (-12 * σ^12 / r^13 + 6 * σ^6 / r^7)

function md_model(ʋᵢ, rᵢ, n)
    rʋ = Array{Float64, 2}(undef, n, 2)
    
    for i in 1:n
        ʋᵢ       = ʋᵢ + Δt/2*(F(rᵢ)/m)
        rᵢ₊₁     = rᵢ + Δt*ʋᵢ
        ʋᵢ₊₁     = ʋᵢ + Δt/2*(F(rᵢ₊₁)/m)
        rʋ[i, 1] = rᵢ = rᵢ₊₁
        rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁
    end
    return rʋ
end

using BenchmarkTools
@btime md_model(ʋ₀, r₀, n);

You can check that the function is now type stable by executing

@code_warntype md_model(ʋ₀, r₀, n);

and the type-stable version should also be faster.

Second, reassigning variables or function arguments within a function is asking for trouble. I’m talking about these two particular lines:

rʋ[i, 1] = rᵢ = rᵢ₊₁
rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁

Even though it doesn’t seem to affect this particular case, when you reassign a variable inside a function, Julia may not be able to figure out whether its type has changed in some cases (in particular, when you work with functions defined inside functions).

Third, I’d use r[i] instead of rᵢ, since in the end you are representing a vector, and it’s better to make that explicit.
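For instance, a minimal sketch of that approach (md_model_vec is a placeholder name; it assumes the const globals Δt and m from above and an F that has been made const or a regular function). Note that it returns two vectors, including the initial state, rather than the n×2 matrix:

function md_model_vec(ʋ₀, r₀, n)
    # Same integration steps, but the trajectory is stored in explicit
    # vectors and the function arguments are never reassigned.
    r = Vector{Float64}(undef, n + 1)
    ʋ = Vector{Float64}(undef, n + 1)
    r[1], ʋ[1] = r₀, ʋ₀
    for i in 1:n
        ʋ_half   = ʋ[i] + Δt/2*(F(r[i])/m)        # half-step velocity
        r[i + 1] = r[i] + Δt*ʋ_half               # full-step position
        ʋ[i + 1] = ʋ_half + Δt/2*(F(r[i + 1])/m)  # second half-step velocity
    end
    return r, ʋ
end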

4 Likes

What does @inline do here?

Press ? for help mode (the help text is a lot longer, but it starts with the following; inlining is the computer-science term for not having to “call” a subroutine):

help?> @inline
  @inline

  Give a hint to the compiler that this function is worth inlining.

  Small functions typically do not need the @inline annotation, as the compiler does it automatically. By using @inline on bigger functions, an extra nudge can be given to the compiler to inline it.

Maybe @inline wasn’t needed here. The function seems small, but “small” is basically counted in assembly/machine-code instructions. Inlining isn’t always better; there’s even @noinline.

1 Like

Thank you for your suggestions, @alfaromartino !

I thought that type annotation for variables could improve the performance. That is not the case in Julia, I guess? What is usually the main reason one would annotate the type of a local variable, then?

I am not sure I understood your second point. The values of the local variables ʋᵢ and rᵢ have to change in each iteration (as dictated by the numerical method I am using). How could I rearrange these lines inside the function to comply with your second suggestion?

Julia automatically specializes functions on their argument types, so annotations are not actually necessary for performance. The main reasons to annotate argument types are multiple dispatch and documentation.

For example, note that Julia generates specialized versions of the following simple function f for the LLVM types double, i64, and i8:

julia> f(x) = x
f (generic function with 1 method)

julia> @code_llvm f(5.0)
;  @ REPL[1]:1 within `f`
define double @julia_f_94(double %0) #0 {
top:
  ret double %0
}

julia> @code_llvm f(5)
;  @ REPL[1]:1 within `f`
define i64 @julia_f_123(i64 signext %0) #0 {
top:
  ret i64 %0
}

julia> @code_llvm f(0x5)
;  @ REPL[1]:1 within `f`
define i8 @julia_f_125(i8 zeroext %0) #0 {
top:
  ret i8 %0
}

If you use type annotations, you can have different functions with the same name that do different things depending on the input (multiple dispatch). Below, providing a Float64 to f cubes the argument, while providing anything else uses the earlier unannotated definition.

julia> f(x::Float64) = x^3
f (generic function with 2 methods)

julia> @code_llvm f(5.0)
;  @ REPL[6]:1 within `f`
define double @julia_f_134(double %0) #0 {
top:
; ┌ @ intfuncs.jl:320 within `literal_pow`
; │┌ @ operators.jl:578 within `*` @ float.jl:410
    %1 = fmul double %0, %0
    %2 = fmul double %1, %0
; └└
  ret double %2
}

julia> @code_llvm f(5)
;  @ REPL[1]:1 within `f`
define i64 @julia_f_138(i64 signext %0) #0 {
top:
  ret i64 %0
}
4 Likes

A little-used and under-appreciated feature of Julia is that you may use single underscores to visibly “punctuate” long digit sequences, making them more legible and less likely to introduce mistakes.

For example, these

n::Int64 = 5000000
n =        500000

become

n::Int64 = 5_000_000
n =        500_000

where they are clearly different numbers!

13 Likes

You can also do that in Python, which would make it easier to ensure that both the Julia and Python code use the same integer. :slight_smile:

>>> 5_000_000
5000000
6 Likes