Why does this Python code perform three times faster than Julia?

I tried to follow the basic Performance Tips, but surprisingly, my Python code runs faster than my Julia code.

# Julia code

ʋ₀::Float64 = 1.0
r₀::Float64 = 1.0
n::Int64 = 5000000

const σ::Float64 = 1.0
const ϵ::Float64 = 1.0
const m::Float64 = 1.0 
const Δt::Float64 = 1E-2

F = (r::Float64) -> -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7);

function md_model(ʋᵢ::Float64, rᵢ::Float64, n::Int64)
    rʋ = Array{Float64, 2}(undef, n, 2)
    for i = 1:n
        ʋᵢ = ʋᵢ + Δt/2*(F(rᵢ)/m)
        rᵢ₊₁ = rᵢ + Δt*ʋᵢ
        ʋᵢ₊₁ = ʋᵢ + Δt/2*(F(rᵢ₊₁)/m)
        rʋ[i, 1] = rᵢ = rᵢ₊₁
        rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁
    end
    return rʋ
end

I am using @btime to check the performance:

using BenchmarkTools
@btime md_model(ʋ₀, r₀, n);

This is the Python version:

import numpy as np

ʋ0 = 1.0
r0 = 1.0
n = 500000

σ = 1.0
ϵ = 1.0
m = 1.0
Δt = 1E-2

F = lambda r : -4*ϵ*(-12*σ**12/r**13 + 6*σ**6/r**7)

def md_model(ʋi, ri, n):
    rʋ = np.empty((n, 2))
    for i in range(n):
        ʋi = ʋi + Δt/2*(F(ri)/m)
        ri_new = ri + Δt*ʋi
        ʋi_new = ʋi + Δt/2*(F(ri_new)/m)
        ri = ri_new
        ʋi = ʋi_new
        rʋ[i, 0] = ri_new
        rʋ[i, 1] = ʋi_new
    return rʋ

# I am using %timeit to check the performance
%timeit md_model(ʋ0, r0, n)

The outputs in each case are:
Julia: 2.480990 seconds (50.01 M allocations: 839.939 MiB, 1.18% gc time, 0.59% compilation time)

Python: 824 ms ± 24.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I also tried using @turbo in my Julia code, but it did not work (how could I implement it?).

Thank you for all the help,
Santiago

Did you mean to use @btime? Your output looks like it came from @time, the regular timing macro, which includes compilation time and only runs the function once.

While the reported compilation time is low, the number of allocations seems very high and scales with n (roughly 10 allocations per iteration of n::Int64 = 5000000), which suggests a type instability to me. I’d suggest making F const as well, or making F a regular function like so:

function F(r::Float64)
   -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7)
end
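Alternatively, if you prefer keeping the anonymous-function style, declaring the binding const from the start should have the same effect. A minimal sketch (in a fresh session, since an existing non-const F cannot be redeclared const):

# const tells the compiler this binding never changes, so calls to F
# are no longer type unstable.
const F = (r::Float64) -> -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7)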
4 Likes

Sorry, @time instead of @btime was a typo in my code. Nonetheless, Python is still more than two times faster. I will check the function suggestion.

1 Like

FYI, making F const or a regular function improves performance by 6x for me, going from ~620ms to ~107ms. It’s likely going to be more for you, judging by how slow the type unstable version is on your end.

2 Likes

In your posted code, the values of n are also different:

n::Int64 = 5000000
n =        500000

i.e., per iteration, Julia is already roughly 3 times faster?

22 Likes

lol That was my biggest blunder.

Thank you

14 Likes

On top of what others said, I believe that accessing variables in global scope may not be a good idea.

Defining F as:

function F(r::Float64)
  σ::Float64 = 1.0
  ϵ::Float64 = 1.0
  m::Float64 = 1.0 
  Δt::Float64 = 1E-2

  -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7)
end

speeds it up 10x for me and reduces the allocations to 2 (rather than 5,000,000).

2 Likes

That makes sense. Conceptually, though, I think it is cleaner to separate the functions from the constants they share (e.g., m). With that in mind, I compared passing these values as function arguments versus declaring them as global constants, and the code ran a bit faster with the global constants.

In that case it is better to pass those values to the function as arguments.
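For instance, a minimal sketch of that approach (the names force and md_model_args, and the keyword defaults, are just placeholders here):

# All physical parameters come in as arguments, so nothing is read
# from global scope inside the hot loop.
force(r, σ, ϵ) = -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7)

function md_model_args(ʋᵢ, rᵢ, n; σ=1.0, ϵ=1.0, m=1.0, Δt=1e-2)
    rʋ = Array{Float64, 2}(undef, n, 2)
    for i = 1:n
        ʋᵢ = ʋᵢ + Δt/2*(force(rᵢ, σ, ϵ)/m)
        rᵢ₊₁ = rᵢ + Δt*ʋᵢ
        ʋᵢ₊₁ = ʋᵢ + Δt/2*(force(rᵢ₊₁, σ, ϵ)/m)
        rʋ[i, 1] = rᵢ = rᵢ₊₁
        rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁
    end
    return rʋ
end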

1 Like

I will check it again, thanks :+1:

1 Like

Your finding is very surprising to me. Santiago defines the global variables as constants, in which case there should be no difference. For example:

const a1 = 1.1
const a2 = 1.2
const a3 = 1.3
const a4 = 1.4

function f(x)
    return -4*a1*(-12*a2^12/x^13 + 6*a3^6/x^7*a4)
end

function g(x)
    a1 = 1.1
    a2 = 1.2
    a3 = 1.3
    a4 = 1.4
    return -4*a1*(-12*a2^12/x^13 + 6*a3^6/x^7*a4)
end

julia> f(3.0) == g(3.0) 
true

julia> @btime f(x) setup=(x=rand());
  23.219 ns (0 allocations: 0 bytes)

julia> @btime g(x) setup=(x=rand());
  23.249 ns (0 allocations: 0 bytes)

And the LLVM code looks the same too:

julia> @code_llvm f(3.0)
;  @ REPL[5]:1 within `f`
define double @julia_f_518(double %0) #0 {
top:
;  @ REPL[5]:2 within `f`
; ┌ @ intfuncs.jl:325 within `literal_pow`
; │┌ @ math.jl:1165 within `^`
    %1 = call double @j_pow_body_520(double %0, i64 signext 13) #0
    %2 = call double @j_pow_body_520(double %0, i64 signext 7) #0
; └└
; ┌ @ float.jl:411 within `/`
   %3 = insertelement <2 x double> poison, double %2, i64 0
   %4 = insertelement <2 x double> %3, double %1, i64 1
   %5 = fdiv <2 x double> <double 0x403CF5FA871A3B18, double 0x405ABF90AD4B54D2>, %4
; └
; ┌ @ float.jl:410 within `*`
   %6 = extractelement <2 x double> %5, i64 0
   %7 = fmul double %6, 1.400000e+00
; └
; ┌ @ float.jl:408 within `+`
   %8 = extractelement <2 x double> %5, i64 1
   %9 = fsub double %7, %8
; └
; ┌ @ operators.jl:578 within `*` @ float.jl:410
   %10 = fmul double %9, -4.400000e+00
; └
  ret double %10
}
julia> @code_llvm g(3.0)
;  @ REPL[7]:1 within `g`
define double @julia_g_530(double %0) #0 {
top:
;  @ REPL[7]:6 within `g`
; ┌ @ intfuncs.jl:325 within `literal_pow`
; │┌ @ math.jl:1165 within `^`
    %1 = call double @j_pow_body_532(double %0, i64 signext 13) #0
    %2 = call double @j_pow_body_532(double %0, i64 signext 7) #0
; └└
; ┌ @ float.jl:411 within `/`
   %3 = insertelement <2 x double> poison, double %2, i64 0
   %4 = insertelement <2 x double> %3, double %1, i64 1
   %5 = fdiv <2 x double> <double 0x403CF5FA871A3B18, double 0x405ABF90AD4B54D2>, %4
; └
; ┌ @ float.jl:410 within `*`
   %6 = extractelement <2 x double> %5, i64 0
   %7 = fmul double %6, 1.400000e+00
; └
; ┌ @ float.jl:408 within `+`
   %8 = extractelement <2 x double> %5, i64 1
   %9 = fsub double %7, %8
; └
; ┌ @ operators.jl:578 within `*` @ float.jl:410
   %10 = fmul double %9, -4.400000e+00
; └
  ret double %10
}

Silly me. I directly compared the original with my version, rather than with the explicit

function F
...
end

version. Sorry to both of you!

As @Sukera mentioned, it is enough to mark F as const (const F = ...). You can also define it as a closure inside md_model:


function md_model(ʋᵢ::Float64, rᵢ::Float64, n::Int64)

    @inline F(r::Float64) = -4*ϵ*(-12*σ^12/r^13 + 6*σ^6/r^7)
    
    rʋ = Array{Float64, 2}(undef, n, 2)

    for i = 1:n
        ʋᵢ = ʋᵢ + Δt/2*(F(rᵢ)/m)
        rᵢ₊₁ = rᵢ + Δt*ʋᵢ
        ʋᵢ₊₁ = ʋᵢ + Δt/2*(F(rᵢ₊₁)/m)
        rʋ[i, 1] = rᵢ = rᵢ₊₁
        rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁
    end
    return rʋ
end
julia> @btime md_model($(ʋ₀, r₀, n)...);
  143.676 ms (2 allocations: 76.29 MiB)

If you want extra speed, you could use FastPow.jl, fused multiply-adds (muladd), and precompute some constant terms:

using FastPow
function md_model2(ʋᵢ::Float64, rᵢ::Float64, n::Int64)

    c1 = -4*ϵ
    c2 = @fastpow -12*σ^12 * c1
    c3 = @fastpow 6*σ^6 * c1
    
    @inline F(r::Float64) = @fastpow c2/r^13 + c3/r^7
     
    rʋ = Array{Float64, 2}(undef, n, 2)
    
    c = Δt * 0.5 * inv(m)
    for i = 1:n
        ʋᵢ = muladd(c, F(rᵢ), ʋᵢ)
        rᵢ₊₁ = muladd(Δt, ʋᵢ, rᵢ)
        ʋᵢ₊₁ = muladd(c, (F(rᵢ₊₁)), ʋᵢ)
        rʋ[i, 1] = rᵢ = rᵢ₊₁
        rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁
    end
    return rʋ
end
julia> @btime md_model2($(ʋ₀, r₀, n)...);
  67.954 ms (2 allocations: 76.29 MiB)
4 Likes

Just a couple of questions/suggestions. Is there any reason why you’re type-annotating each variable? There’s no need in terms of performance.

In fact, you may even be introducing bugs by doing that. The following code is type stable, and all I did was remove the type annotations:

# Julia code

ʋ₀ = 1.0
r₀ = 1.0
n  = 5_000_000

const σ  = 1.0
const ϵ  = 1.0
const m  = 1.0
const Δt = 1E-2

F(r) = -4 * ϵ * (-12 * σ^12 / r^13 + 6 * σ^6 / r^7)

function md_model(ʋᵢ, rᵢ, n)
    rʋ = Array{Float64, 2}(undef, n, 2)
    
    for i in 1:n
        ʋᵢ       = ʋᵢ + Δt/2*(F(rᵢ)/m)
        rᵢ₊₁     = rᵢ + Δt*ʋᵢ
        ʋᵢ₊₁     = ʋᵢ + Δt/2*(F(rᵢ₊₁)/m)
        rʋ[i, 1] = rᵢ = rᵢ₊₁
        rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁
    end
    return rʋ
end

using BenchmarkTools
@btime md_model(ʋ₀, r₀, n);

You can check that the function is now type stable by executing

@code_warntype md_model(ʋ₀, r₀, n);

and the type-stable version should also be faster.

Second, reassigning variables or function arguments within a function is asking for trouble. I’m talking about these two particular lines:

rʋ[i, 1] = rᵢ = rᵢ₊₁
rʋ[i, 2] = ʋᵢ = ʋᵢ₊₁

Even though it doesn’t seem to affect this particular case, when you reassign a variable inside a function, Julia may not be able to figure out whether its type has changed in some cases (in particular, when you work with functions defined inside functions).

Third, I’d use r[i] instead of rᵢ, since in the end you are representing a vector, and it’s better to make that explicit.
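For instance, a minimal sketch of that approach (md_model_vec is a placeholder name; it assumes the const globals Δt and m from above and an F that has been made const or a regular function). Note that it returns two vectors, including the initial state, rather than the n×2 matrix:

function md_model_vec(ʋ₀, r₀, n)
    # Same integration steps, but the trajectory is stored in explicit
    # vectors and the function arguments are never reassigned.
    r = Vector{Float64}(undef, n + 1)
    ʋ = Vector{Float64}(undef, n + 1)
    r[1], ʋ[1] = r₀, ʋ₀
    for i in 1:n
        ʋ_half   = ʋ[i] + Δt/2*(F(r[i])/m)        # half-step velocity
        r[i + 1] = r[i] + Δt*ʋ_half               # full-step position
        ʋ[i + 1] = ʋ_half + Δt/2*(F(r[i + 1])/m)  # second half-step velocity
    end
    return r, ʋ
end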

4 Likes

What does @inline do here?

Press ? for help mode (the help text is a lot longer, but it starts with the following; inlining is the computer-science term for not having to “call” a subroutine):

help?> @inline
  @inline

  Give a hint to the compiler that this function is worth inlining.

  Small functions typically do not need the @inline annotation, as the compiler does it automatically. By using @inline on bigger functions, an extra nudge can be given to the compiler to inline it.

Maybe @inline wasn’t needed here. The function seems small, but “small” is basically counted in assembly/machine-code instructions. Inlining isn’t always better; there’s even @noinline.

1 Like

Thank you for your suggestions, @alfaromartino !

I thought that type annotation for variables could improve the performance. That is not the case in Julia, I guess? What is usually the main reason one would annotate the type of a local variable, then?

I am not sure I understood your second point. The values of the local variables ʋᵢ and rᵢ have to change in each iteration (as dictated by the numerical method I am using). How could I rearrange these lines inside the function to comply with your second suggestion?

Julia automatically specializes functions on their argument types, so annotations are not actually necessary for performance. The main reasons to annotate argument types are multiple dispatch and documentation.

For example, note that Julia generates specialized versions of the following simple function f for the LLVM types double, i64, and i8:

julia> f(x) = x
f (generic function with 1 method)

julia> @code_llvm f(5.0)
;  @ REPL[1]:1 within `f`
define double @julia_f_94(double %0) #0 {
top:
  ret double %0
}

julia> @code_llvm f(5)
;  @ REPL[1]:1 within `f`
define i64 @julia_f_123(i64 signext %0) #0 {
top:
  ret i64 %0
}

julia> @code_llvm f(0x5)
;  @ REPL[1]:1 within `f`
define i8 @julia_f_125(i8 zeroext %0) #0 {
top:
  ret i8 %0
}

If you use type annotations, you can have different functions with the same name that do different things depending on the input (multiple dispatch). Below, providing a Float64 to f cubes the argument, while providing anything else uses the earlier unannotated definition.

julia> f(x::Float64) = x^3
f (generic function with 2 methods)

julia> @code_llvm f(5.0)
;  @ REPL[6]:1 within `f`
define double @julia_f_134(double %0) #0 {
top:
; ┌ @ intfuncs.jl:320 within `literal_pow`
; │┌ @ operators.jl:578 within `*` @ float.jl:410
    %1 = fmul double %0, %0
    %2 = fmul double %1, %0
; └└
  ret double %2
}

julia> @code_llvm f(5)
;  @ REPL[1]:1 within `f`
define i64 @julia_f_138(i64 signext %0) #0 {
top:
  ret i64 %0
}
4 Likes

A little-used and under-appreciated feature of Julia is that you may use single underscores to visibly “punctuate” long digit sequences, making them more legible and less likely to introduce mistakes.

For example, these

n::Int64 = 5000000
n =        500000

become

n::Int64 = 5_000_000
n =        500_000

where they are clearly different numbers!

13 Likes

You can also do that in Python, which would make it easier to ensure that both the Julia and Python code use the same integer. :slight_smile:

>>> 5_000_000
5000000
6 Likes