"Very optimised" Julia code 50x slower than Python with compiled functions

StarecZosima · March 28, 2023, 12:57pm

Hi All,
I am doing some Monte Carlo computation using Python (compiling some functions to C++ using the Numba package).
I thought I would see how Julia compares.
As is good programming practice, I scraped some code from the Internet Ising 2D, simulating the same 2D square Ising model I am into. The code is supposedly “very optimised” already.
The fact is, the Python code is almost 50 times faster, two seconds vs approx one hundred. The Python code it is not optimised at all, and it does perform additional operations such as appending a values to string for every iteration. They both perform a Metropolis-Hasting iteration for one million iterations, over a 100x100 lattice.
I am totally at a loss. I fear I am missing something silly, but I have been checking all morning.
This is the Julia code

using BenchmarkTools, StaticArrays
using Random: default_rng, seed!

function ising2d_ifelse!(s, β, niters, rng=default_rng())
    m, n = size(s)
    min_h = -4
    max_h = 4
    prob = [1/(1+exp(-2*β*h)) for h in min_h:max_h]
    for iter in 1:niters
        for j in 1:n 
            for i in 1:m
                NN = s[ifelse(i == 1, m, i-1), j]
                SS = s[ifelse(i == m, 1, i+1), j]
                WW = s[i, ifelse(j == 1, n, j-1)]
                EE = s[i, ifelse(j == n, 1, j+1)]
                h = NN + SS + WW + EE
                s[i,j] = ifelse(rand(rng) < prob[h-min_h+1], +1, -1)
            end
        end
    end
end

const β_crit = log(1+sqrt(2))/2
#rand_ising2d(m, n=m) = rand(Int8[-1, 1], m, n)
rand_ising2d(m, n=m) = fill(1,(m,m))

seed!(4649)
s₀ = rand_ising2d(100);

s = copy(s₀)
rng = default_rng()
ising2d_ifelse!(s, β_crit, 10^3, rng)

s = copy(s₀)
steps = 1000000
seed!(4649)
rng = default_rng()
#@benchmark ising2d_ifelse!(s, β_crit, 10^6, rng)
#@time begin
#  ising2d_ifelse!(s, β_crit, 1000000, rng)
#end

time_ising(s, β_crit, steps, rng) = @time  ising2d_ifelse!(s, β_crit, steps, rng)
time_ising(s, β_crit, steps, rng)

and its (compiled) Python counterpart

import numpy as np
from math import exp
from random import randrange, random, choice
from numba import jit

from datetime import datetime
startTime = datetime.now()


@jit(nopython= True)
def init_lowT(N): 
        return  np.ones((N,N))
@jit(nopython= True)        
def init_highT(N):
        l = np.zeros((self.N,self.N),dtype=int)
        for y in range(self.N):
            for x in range(self.N):
                l[x,y] = choice([1,-1])
        return l
    

@jit(nopython= True)
def E_elem(array,x,y, B): 
    N = array.shape[0]
    if B==0: #no magnetic field
        return (-1.0 * array[x,y]*(array[(x + 1) % N, y] +
                                            array[(x - 1 + N) % N, y] +
                                            array[x, (y + 1) % N] +
                                            array[x, (y - 1 + N) % N]))
    else: #there is a magnetic field
        return (-1.0 * array[x,y]*(array[(x + 1) % N, y] +
                                            array[(x - 1 + N) % N, y] +
                                            array[x, (y + 1) % N] +
                                            array[x, (y - 1 + N) % N]) -
                                            array[x,y]*B)


# sums up potential of lattice
@jit(nopython= True)
def E_tot(array, B):
   N = array.shape[0]
   en = 0
   for y in range(N):
       for x in range(N):
           en += E_elem(array, x,y,B)
   return en

# sums up magnetization of lattice
@jit(nopython= True)
def M_tot(array):
   N = array.shape[0]
   mag = 0
   for y in range(N):
       for x in range(N):
           mag += array[x,y]
   return mag


@jit(nopython= True)
def metropolis(array, T, steps, B):
    N = array.shape[0]
    Elist = []
    Mlist = []
    M = M_tot(array)
    E = E_tot(array, B)
    for i in range(steps):          
    # choose random atom
        x = randrange(N)
        y = randrange(N)

        dE = -2* E_elem(array,x,y,B)

        if (dE <= 0):
            array [x,y] *= -1
            M += 2 * array[x,y]
            E += dE
            Mlist.append(M)
            Elist.append(E)

        elif random() < exp(-1.0 * dE/T):
            array[x,y] *= -1
            M += 2 * array[x,y]
            E += dE
            Mlist.append(M)
            Elist.append(E)
#        Mlist = Mlist[::100]
#       Elist = Elist[::100]
    return array, Mlist, Elist


@jit(nopython= True)
def run_sim(N=100,T=1.3, steps = 500000, B=-1):
    start_conf = init_lowT(N)
    res, Mlist, Elist = metropolis(start_conf, T, steps, B)
#    M.metropolis(steps = steps)
# plt.imshow(res,cmap="bwr")
#    plt.colorbar()
    return  res, Mlist, Elist

T_hat = 1.0*2.26

res, Mlist, Elist = run_sim(N=128,T=T_hat, steps = 1000000, B=0)

print(datetime.now() - startTime)

The timing might be quite rudimental in both cases, but the difference is so astonishing (approx. 2 sec vs 100 sec) I do not think it matters. Any hint or ideas please?

Henrique_Becker · March 28, 2023, 1:09pm

Are not the two codes completely different?

In Julia you have a for j in 1:n and for i in 1:m inside for iter in 1:niters. In Python you select a single index in the range 1:N to do something over it. The Julia algorithm is cubic, the Python algorithm is linear.

JonasIsensee · March 28, 2023, 1:16pm

The python code picks a random lattice site (metropolis algorithm) at every step.

 x = randrange(N)
 y = randrange(N)

whereas the julia code uses all lattice sites at every step

for iter in 1:niters
        for j in 1:n 
            for i in 1:m

This is not correct and at the at the resolution that you tested it is also 10000x more work.

function ising2d_ifelse!(s, β, niters, rng=default_rng())
    m, n = size(s)
    min_h = -4
    max_h = 4
    prob = [1/(1+exp(-2*β*h)) for h in min_h:max_h]

    for iter in 1:niters
        j = rand(rng, 1:n)
        i = rand(rng, 1:m)
        NN = s[ifelse(i == 1, m, i-1), j]
        SS = s[ifelse(i == m, 1, i+1), j]
        WW = s[i, ifelse(j == 1, n, j-1)]
        EE = s[i, ifelse(j == n, 1, j+1)]
        h = NN + SS + WW + EE
        s[i,j] = ifelse(rand(rng) < prob[h-min_h+1], +1, -1)
    end
end

this takes 0.027s on my computer

StarecZosima · March 28, 2023, 2:02pm

Goodness gracious, I totally missed that part. I gave it for granted it would do the standard Metropolis flip, indeed it goes through all sequentially, at every iterations.
My most sincere apologies. At least, it gets confirmed once more Julia crashed Python + Numba. Thanks again

Topic		Replies	Views
Optimized Python is as good as Julia Performance question	28	25012	June 2, 2025
Question on simple performance comparison between Python and Julia General Usage question	23	1033	June 13, 2023
Python3+@njit faster than julia (rs) General Usage performance	10	658	May 13, 2025
Different speed when estimating pi General Usage question	4	858	December 29, 2017
General questions from Python user Performance	59	4314	March 8, 2021

"Very optimised" Julia code 50x slower than Python with compiled functions

Related topics