Numerical difference between Julia and (C, Fortran, Python) for a chaotic damped driven pendulum

I wrote a simple program implementing the midpoint (2nd-order) Runge–Kutta method for a damped driven pendulum in different languages to compare run times. While the results were exactly the same for C, Fortran, and Python (Fortran and C tested up to t = 10⁷), the results for Julia drifted away from these after a rather short time.

Since the problem is chaotic with the given parameters, any slight difference in the initial conditions will grow with time; however, all the languages tested except Julia apparently performed exactly the same calculations.

Can someone explain what is happening? (The comparison plot was attached as an image; equations and the Julia and C code are below.)

\begin{cases}\frac{d\theta}{dt} = v \\ \frac{dv}{dt} = -\gamma\, v - \sin(\theta) + F \cos(\omega\, t) \end{cases}
\theta_0 = 1.0,\; v_0 = 0.0
\omega = 0.7, \; \gamma = 0.05, \; F = 0.6

Julia code:

using Printf

X0         = 1.0		
VX0        = 0.0		

OMEGA      = 0.70
GAMMA      = 0.05
FF         = 0.6 

TMAX       = 500 
DT         = 0.01  
DT_DIV_2   = 0.5*DT

function f(t, x, v)
    return -GAMMA*v - sin(x) + FF*cos(OMEGA*t)
end

out1 = @sprintf("dados_rk2_F%.2f_julia.txt", FF)
fout1 = open(out1, "w")

i = 0
	
x = X0 
v = VX0 
t = 0.0

while (t<=TMAX)
	k1x = v                    
	k1v = f(t,x,v)
		
	tt  = t + DT_DIV_2
	xx  = x + k1x * DT_DIV_2
	vv  = v + k1v * DT_DIV_2	          
		
	k2x = vv
	k2v = f(tt, xx, vv)                  
		  
	global x += k2x * DT	
	global v += k2v * DT
		
	global i += 1
	global t  = i*DT
		
	@printf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v)
	
end
	
close(fout1)

C language code:

#include <stdio.h>
#include <math.h>

const double X0       = 1.0;		
const double VX0      = 0.0;	

const double OMEGA    = 0.70;
const double GAMMA    = 0.05;
const double FF		  = 0.60; 

const double TMAX     = 500;
const double DT       = 0.01;

double f(double t, double x, double v)
{ 
	return -GAMMA*v - sin(x) + FF*cos(OMEGA*t);
}

int main(int argc, char *argv[])
{	
	double x, v, t;
	double xx, vv, tt, k1x, k2x, k1v, k2v;
	double DT_DIV_2 = 0.5*DT;

	char   out1[200];
	FILE   *fout1;
	
	int i = 0;
	
	sprintf(out1,"dados_rk2_F%.2f.txt",FF);
	fout1=fopen(out1,"w");
	
	x = X0; v = VX0; t = 0.0;

	while (t <= TMAX)
	{
		k1x = v;                      
		k1v = f(t,x,v); 
		
		tt  = t + DT_DIV_2;
		xx  = x + k1x * DT_DIV_2;
		vv  = v + k1v * DT_DIV_2;	          
		
		k2x = vv;
		k2v = f(tt, xx, vv);                  
		  
		x += k2x * DT;	
		v += k2v * DT;
	
		i++;
		t = i*DT;
		
		fprintf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v);
	}
	
	fclose(fout1);

	return 0;
}
2 Likes

It’s chaotic, so the simplest things can cause this. My guess is that the first three use the same standard math-library implementations of sin and cos, while Julia’s differs by 1 ulp or so.

10 Likes

Thanks for the heads-up. I naively expected that all implementations of the transcendental functions would produce exactly the same double-precision results. If anyone is curious:

“Reputable numeric libraries compute the basic transcendental functions to between 0.5 and about 1 ULP. Only a few libraries compute them within 0.5 ULP, this problem being complex due to the Table-maker’s dilemma.”
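A minimal way to check this yourself from the REPL (a sketch; whether and where the two implementations differ depends on your platform’s libm):

libm_sin(x::Float64) = ccall(:sin, Float64, (Float64,), x)  # the system C library's sin

x = 1.0
sin(x) - libm_sin(x)   # usually 0.0; for some arguments it is ±1 ulp
eps(sin(x))            # the size of 1 ulp at this value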

3 Likes

You can make a rough estimate of the divergence due to the chaotic dynamics based on the Lyapunov exponent. Here is a simple script that calculates the maximum Lyapunov exponent:

using DynamicalSystems

function pendulum_rule(u, p, t)
    ω, γ, F = p
    θ, v = u
    dθ = v
    dv =  -γ*v - sin(θ) + F*cos(ω*t)
    return SVector(dθ, dv)
end

p0 = [0.7, 0.05, 0.6]
u0 = [1.0 + 1e-3rand(), 0.0]

ds = ContinuousDynamicalSystem(pendulum_rule, u0, p0)

λ = lyapunov(ds, 100000.0)

e = 1e-12 # error expected 

Δt = log(1/e)/λ   # ≈ 195.5

Using this I get a prediction horizon of about 200 time units, which seems to agree well with your plot. Of course, this is not exact, since the roundoff error from the differing sine/cosine implementations acts more like dynamic noise constantly added to the system, but it gives you a good estimate of how fast you should expect divergence.
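The reasoning behind that estimate: an initial error \delta_0 grows roughly as \delta(t) \approx \delta_0\, e^{\lambda t}, so it reaches order one after

\Delta t \approx \frac{1}{\lambda}\,\ln\frac{1}{\delta_0}

which is what the last two lines of the script compute with \delta_0 = 10^{-12}.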

8 Likes

You can try the system sin and cos functions instead of Julia’s:

csin(x::Float64) =  ccall(:sin, Float64, (Float64,), x)
csin(x::Float32) =  ccall(:sinf, Float32, (Float32,), x)
ccos(x::Float64) =  ccall(:cos, Float64, (Float64,), x)
ccos(x::Float32) =  ccall(:cosf, Float32, (Float32,), x)
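For instance, the right-hand side of the original program would then become (a sketch; only the trig calls change):

f(t, x, v) = -GAMMA*v - csin(x) + FF*ccos(OMEGA*t)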
11 Likes

Before you get disappointed: your code as-is will be very slow in Julia. Put everything inside a function and benchmark it with BenchmarkTools to get its actual performance. Something like this (be aware that, as they are, all these benchmarks will be dominated by the printing time):

using Printf, BenchmarkTools


function pendulum(;print=false)

  X0         = 1.0		
  VX0        = 0.0		
  
  OMEGA      = 0.70
  GAMMA      = 0.05
  FF         = 0.6 
  
  TMAX       = 500 
  DT         = 0.01  
  DT_DIV_2   = 0.5*DT

  f(t, x, v) = -GAMMA*v - sin(x) + FF*cos(OMEGA*t)

  if print
    out1 = @sprintf("dados_rk2_F%.2f_julia.txt", FF)
    fout1 = open(out1, "w")
  end

  i = 0
  x = X0 
  t = 0.0
  v = VX0
  
  while t <= TMAX

  	k1x = v
  	k1v = f(t,x,v)
  		
  	tt  = t + DT_DIV_2
  	xx  = x + k1x * DT_DIV_2
  	vv  = v + k1v * DT_DIV_2	          
  		
  	k2x = vv
  	k2v = f(tt, xx, vv)                  
  		  
  	x += k2x * DT	
  	v += k2v * DT
  		
  	i += 1
  	t  = i*DT
  		
     if print 
       @printf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v)
     end
  end
  if print
    close(fout1)
  end
	
end

pendulum(print=true)

@benchmark pendulum(print=false)

You will get:

julia> include("./pend.jl")
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.795 ms (0.00% GC)
  median time:      1.868 ms (0.00% GC)
  mean time:        1.875 ms (0.00% GC)
  maximum time:     2.643 ms (0.00% GC)
  --------------
  samples:          2662
  evals/sample:     1


8 Likes

I tried using the Float64 versions in my program but I still got the same result as the original Julia program.

Thanks for the reply. Even if I switch off printing, the Julia code is still quite a bit slower than C. Is there anything that can be done to improve performance, like Numba for Python?

There’s no need for anything like Numba; most likely you just need to follow @lmiq’s suggestion to put your code into a function (see Performance Tips · The Julia Language). If that’s still not enough, definitely post your updated code and its performance here, and I’m sure we can find some more concrete suggestions.

6 Likes

As I posted above, you need to put your code inside a function (see the code I posted there). Also, Julia compiles the code the first time it is run, so for very quick tests like this you may be measuring compilation time as a significant part of the total. That is why using BenchmarkTools is important here.

With both codes printing (which takes almost all of the run time), the Julia code I posted takes 49 ms here and the C code takes 85 ms (measured with the system's time command, so probably not a good measurement either). With proper performance measurement I would expect both to be similar here.
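If you prefer timing from within a Julia session while excluding compilation, a simple sketch (using the version of pendulum below, which takes a TMAX keyword) is to call the function once to compile it and time the second call:

pendulum(print=false)                        # first call compiles the function
@time pendulum(print=false, TMAX=1_000_000)  # second call measures only the run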

PS: If you want a simpler but reasonably fair comparison, take the code I posted there, remove the last line (@benchmark...) and change the number of steps to something much bigger (1 million). Removing the printing is probably a good idea as well:

%gcc -o pend pend.c -lm -O3
% time ./pend
real    0m6,144s
user    0m6,139s
sys     0m0,004s


% time julia ./pend.jl
real    0m4,854s
user    0m5,167s
sys     0m0,628s

With these codes:

Julia
using Printf, BenchmarkTools


function pendulum(;print=false,TMAX=500)

  X0         = 1.0		
  VX0        = 0.0		
  
  OMEGA      = 0.70
  GAMMA      = 0.05
  FF         = 0.6 
  
  DT         = 0.01  
  DT_DIV_2   = 0.5*DT

  f(t, x, v) = -GAMMA*v - sin(x) + FF*cos(OMEGA*t)

  if print
    out1 = @sprintf("dados_rk2_F%.2f_julia.txt", FF)
    fout1 = open(out1, "w")
  end
  i = 0
  x = X0 
  t = 0.0
  v = VX0
  
  while t <= TMAX

    k1x = v
    k1v = f(t,x,v)
    	
    tt  = t + DT_DIV_2
    xx  = x + k1x * DT_DIV_2
    vv  = v + k1v * DT_DIV_2	          
    	
    k2x = vv
    k2v = f(tt, xx, vv)                  
    	  
    x += k2x * DT	
    v += k2v * DT
    	
    i += 1
    t  = i*DT
     	
    if print 
      @printf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v)
    end
  end
  if print
    close(fout1)
  end 
  return x

end

pendulum(print=false,TMAX=1_000_000)

C
#include <stdio.h>
#include <math.h>

const double X0       = 1.0;		
const double VX0      = 0.0;	

const double OMEGA    = 0.70;
const double GAMMA    = 0.05;
const double FF		  = 0.60; 

const double TMAX     = 1000000;
const double DT       = 0.01;

double f(double t, double x, double v)
{ 
	return -GAMMA*v - sin(x) + FF*cos(OMEGA*t);
}

int main(int argc, char *argv[])
{	
	double x, v, t;
	double xx, vv, tt, k1x, k2x, k1v, k2v;
	double DT_DIV_2 = 0.5*DT;

	char   out1[200];
	FILE   *fout1;
	
	int i = 0;

/*	
	sprintf(out1,"dados_rk2_F%.2f.txt",FF);
	fout1=fopen(out1,"w");
*/
	
	x = X0; v = VX0; t = 0.0;

	while (t <= TMAX)
	{
		k1x = v;                      
		k1v = f(t,x,v); 
		
		tt  = t + DT_DIV_2;
		xx  = x + k1x * DT_DIV_2;
		vv  = v + k1v * DT_DIV_2;	          
		
		k2x = vv;
		k2v = f(tt, xx, vv);                  
		  
		x += k2x * DT;	
		v += k2v * DT;
	
		i++;
		t = i*DT;
		
/*
		fprintf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v);
*/
	}
	
/*
	fclose(fout1);
*/

	return x;
}

(Note that I had to return x from both functions; otherwise, with -O3, the compiler figures out that it does not need to do anything when the trajectory is not printed.)

2 Likes

Thank you @rdeits and @lmiq! This was my first attempt at using Julia.

That was very helpful! Thank you very much.

1 Like

I’m impressed! Even printing every 100 steps, it runs faster than C.
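For reference, the every-100-steps variant just guards the print with the step counter inside the loop, something like:

if print && i % 100 == 0
    @printf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v)
end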

2 Likes

You shouldn’t expect either code to be much faster than the other. If that is happening, there is probably something to be improved on the slower side. I am saying this so that expectations are reasonable: to get performance in Julia you need to get used to a few coding strategies that are inherent to it, and with that you will get code that is roughly as fast as C.

From that point on, either code can be made faster with more specialized optimizations, compiler-flag tuning, etc.

Julia is nice because the coding experience is much smoother in general (once one gets used to it, as with anything).

8 Likes

At least on my machine, gcc doesn’t want to use AVX, whereas Julia seemingly does.

I suspect this vectorization is what makes the difference here.

C asm
0000000000001420 <f>:                                                                   
    1420:       f3 0f 1e fa             endbr64                                         
    1424:       48 83 ec 28             sub    $0x28,%rsp                               
    1428:       f2 0f 11 44 24 18       movsd  %xmm0,0x18(%rsp)                         
    142e:       66 0f 28 c1             movapd %xmm1,%xmm0                              
    1432:       f2 0f 11 54 24 10       movsd  %xmm2,0x10(%rsp)                         
    1438:       e8 b3 fc ff ff          callq  10f0 <sin@plt>                           
    143d:       f2 0f 10 5c 24 18       movsd  0x18(%rsp),%xmm3                         
    1443:       f2 0f 59 1d 25 0c 00    mulsd  0xc25(%rip),%xmm3        # 2070 <X0+0x8> 
    144a:       00                                                                      
    144b:       f2 0f 11 44 24 08       movsd  %xmm0,0x8(%rsp)                          
    1451:       66 0f 28 c3             movapd %xmm3,%xmm0                              
    1455:       e8 76 fc ff ff          callq  10d0 <cos@plt>                           
    145a:       f2 0f 59 05 1e 0c 00    mulsd  0xc1e(%rip),%xmm0        # 2080 <X0+0x18>
    1461:       00                                                                      
    1462:       f2 0f 10 54 24 10       movsd  0x10(%rsp),%xmm2                         
    1468:       f2 0f 59 15 08 0c 00    mulsd  0xc08(%rip),%xmm2        # 2078 <X0+0x10>
    146f:       00                                                                      
    1470:       f2 0f 5c 54 24 08       subsd  0x8(%rsp),%xmm2                          
    1476:       48 83 c4 28             add    $0x28,%rsp                               
    147a:       f2 0f 58 c2             addsd  %xmm2,%xmm0                              
    147e:       c3                      retq                                            
    147f:       90                      nop                                             

vs.

Julia asm
julia> @code_native acceleration(0.0, 1.0, 0.0, 0.05, 0.6, 0.7)     
        .text                                                       
; ┌ @ REPL[42]:1 within `acceleration'                              
        pushq   %rbx                                                
        subq    $32, %rsp                                           
        vmovsd  %xmm5, 16(%rsp)                                     
        vmovsd  %xmm4, 24(%rsp)                                     
        vmovsd  %xmm0, 8(%rsp)                                      
        movabsq $.rodata.cst16, %rax                                
; │┌ @ float.jl:383 within `-'                                      
        vxorpd  (%rax), %xmm3, %xmm0                                
; │└                                                                
; │┌ @ float.jl:395 within `*'                                      
        vmulsd  %xmm2, %xmm0, %xmm0                                 
        vmovsd  %xmm0, (%rsp)                                       
        movabsq $cos, %rbx                                          
; │└                                                                
; │┌ @ REPL[38]:1 within `csin'                                     
        leaq    17120(%rbx), %rax                                   
        vmovapd %xmm1, %xmm0                                        
        callq   *%rax                                               
        vmovsd  (%rsp), %xmm1                   # xmm1 = mem[0],zero
; │└                                                                
; │┌ @ float.jl:392 within `-'                                      
        vsubsd  %xmm0, %xmm1, %xmm0                                 
        vmovsd  %xmm0, (%rsp)                                       
        vmovsd  8(%rsp), %xmm0                  # xmm0 = mem[0],zero
; │└                                                                
; │┌ @ float.jl:395 within `*'                                      
        vmulsd  16(%rsp), %xmm0, %xmm0                              
; │└                                                                
; │┌ @ REPL[40]:1 within `ccos'                                     
        callq   *%rbx                                               
; │└                                                                
; │┌ @ float.jl:395 within `*'                                      
        vmulsd  24(%rsp), %xmm0, %xmm0                              
; │└                                                                
; │┌ @ float.jl:389 within `+'                                      
        vaddsd  (%rsp), %xmm0, %xmm0                                
; │└                                                                
        addq    $32, %rsp                                           
        popq    %rbx                                                
        retq                                                        
        nopw    %cs:(%rax,%rax)                                     
; └                                                                 

No, the calls are not inlined in Julia (neither with ccos nor with the native sin), though I did rename things for my own understanding; acceleration is your f:

The callqs are in here:
L1340:                                                              
        vmovsd  %xmm0, 32(%rsp)                                     
        movabsq $sin, %rbx                                          
; │ @ chaotic.jl:18 within `pendulum'                               
; │┌ @ chaotic.jl:13 within `acceleration'                          
        callq   *%rbx                                               
        vmovsd  %xmm0, 120(%rsp)                                    
        vmovsd  24(%rsp), %xmm0                 # xmm0 = mem[0],zero
; ││┌ @ float.jl:395 within `*'                                     
        vmulsd  80(%rsp), %xmm0, %xmm0                              
        movabsq $cos, %r15                                          
; ││└                                                               
        callq   *%r15                                               
        vmovsd  %xmm0, 112(%rsp)                                    
        vmovsd  72(%rsp), %xmm0                 # xmm0 = mem[0],zero
; │└
Julia Code
using Printf

function pendulum(print=false, x=1.0, v=0.0, dt=0.01, tmax=500.0, Ω=0.70, Γ=0.05, FF=0.6)
    if print
        out1 = @sprintf("2_dados_rk2_F%.2f_julia.txt", FF)
        fout1 = open(out1, "w")
    end
    acceleration(t, x, v) = -Γ * v - sin(x) + FF * cos(Ω * t)

    dt_half = dt / 2.0

    for t in range(0.0, tmax; step=dt)
        acc_start = acceleration(t, x, v)
        
        t_middle  = t + dt_half
        x_middle  = x + v * dt_half
        v_middle  = v + acc_start * dt_half	          
        acc_middle = acceleration(t_middle, x_middle, v_middle)                  
        
        x += v_middle * dt	
        v += acc_middle * dt
  
        if print
            @printf(fout1,"%12.6f %26.20f %26.20f\n", t+dt, x, v)
        end
    end
    if print
        close(fout1)
    end
end

C code was the one by @lmiq, compiled with gcc -o pendulum pendulum.c -lm -O3.

3 Likes

Now the really cool thing is this: with very minor modifications to your function, you can simulate pendulums of any dimension, determined by the type of the initial position and velocity inputs. Now do that with C :slight_smile:

There are a few annotations showing where I had to change the code (besides the plotting, of course):

using Printf, StaticArrays

function pendulum(X0,VX0;printout=false,TMAX=500) # X0 and VX0 are input parameters

  OMEGA      = 0.70
  GAMMA      = 0.05
  FF         = 0.6 
  
  DT         = 0.01  
  DT_DIV_2   = 0.5*DT

  f(t, x, v) = -GAMMA*v - sin(x) + FF*cos(OMEGA*t)

  if printout
    out1 = "dados_julia.txt"
    fout1 = open(out1, "w")
  end
  i = 0
  x = X0 
  t = 0.0
  v = VX0
  
  while t <= TMAX

    k1x = v
    k1v = f.(t,x,v) # Added a dot here
    	
    tt  = t + DT_DIV_2
    xx  = x + k1x * DT_DIV_2
    vv  = v + k1v * DT_DIV_2	          
    	
    k2x = vv
    k2v = f.(tt, xx, vv) # added a dot here                  
    	  
    x += k2x * DT	
    v += k2v * DT
    	
    i += 1
    t  = i*DT
     	
    if printout && i%5 == 0 # print generic sizes 
      println(fout1,"$t ",[ "$xi " for xi in x ]...)
    end
  end
  if printout
    close(fout1)
  end
  return x

end

using DelimitedFiles, Plots, Plots.Measures

plot(layout=(1,3))
default(label="")

# 1D pendulum
x0 = 1.0
vx0 = rand()
pendulum(x0,vx0,printout=true,TMAX=1_000)
x1 = readdlm("dados_julia.txt")
plot!(x1[:,1],x1[:,2],subplot=1,title="1D pendulum")
plot!(xlabel="time",ylabel="x",subplot=1)

# 2D pendulum
n = 2
x0 = ones(SVector{n,Float64})
vx0 = rand(SVector{n,Float64})
pendulum(x0,vx0,printout=true,TMAX=1_000)
x2 = readdlm("dados_julia.txt")
plot!(x2[:,1],x2[:,2],x2[:,3],subplot=2,title="2D pendulum")
plot!(xlabel="time",ylabel="x1",zlabel="x2",subplot=2)

# 3D pendulum
n = 3
x0 = ones(SVector{n,Float64})
vx0 = rand(SVector{n,Float64})
pendulum(x0,vx0,printout=true,TMAX=1_000)
x3 = readdlm("dados_julia.txt")
plot!(x3[:,2],x3[:,3],x3[:,4],subplot=3,title="3D pendulum")
plot!(xlabel="x1",ylabel="x2",zlabel="x3",subplot=3)

plot!(size=(1200,500),margin=5mm)
savefig("./pend.png")





14 Likes

That’s really cool! Thanks for the nice examples. I guess I’ll be learning Julia.

7 Likes

For what it’s worth, you may also wish to compare the results with those from Mathematica, which presumably uses state-of-the-art numerical methods.

(The Mathematica code and the resulting plot were posted as images.)

3 Likes