# Numerical difference between Julia and (C, Fortran, Python) for a chaotic damped driven pendulum

I wrote a simple code implementing a midpoint 2nd order Runge-Kutta method for a damped driven pendulum in different languages to compare run times. While the results were exactly the same for C, Fortran and Python (Fortran and C tested up to t = 10⁷) the results for Julia drifted away from these after a rather short time.

Since the problem is chaotic with the given parameters, any slight difference the initial conditions will grow with time; however, all languages tested except for Julia apparently were able to perform exactly the same calculations.

Can someone explain what is happening? (Equations, Julia and C codes below image)

\begin{cases}\frac{d\theta}{dt} = v \\ \frac{dv}{dt} = -\gamma\, v - \sin(\theta) + F \cos(\omega\, t) \end{cases}
x_0 = 1.0,\; v_0 = 0.0
\omega = 0.7, \; \gamma = 0.05, \; F = 0.6

Julia code:

using Printf

X0         = 1.0
VX0        = 0.0

OMEGA      = 0.70
GAMMA      = 0.05
FF         = 0.6

TMAX       = 500
DT         = 0.01
DT_DIV_2   = 0.5*DT

function f(t, x, v)
return -GAMMA*v - sin(x) + FF*cos(OMEGA*t)
end

fout1 = open(out1, "w")

i = 0

x = X0
v = VX0
t = 0.0

while (t<=TMAX)
k1x = v
k1v = f(t,x,v)

tt  = t + DT_DIV_2
xx  = x + k1x * DT_DIV_2
vv  = v + k1v * DT_DIV_2

k2x = vv
k2v = f(tt, xx, vv)

global x += k2x * DT
global v += k2v * DT

global i += 1
global t  = i*DT

@printf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v)

end

close(fout1)


C language code:

#include <stdio.h>
#include <math.h>

const double X0       = 1.0;
const double VX0      = 0.0;

const double OMEGA    = 0.70;
const double GAMMA    = 0.05;
const double FF		  = 0.60;

const double TMAX     = 500;
const double DT       = 0.01;

double f(double t, double x, double v)
{
return -GAMMA*v - sin(x) + FF*cos(OMEGA*t);
}

int main(int argc, char *argv[])
{
double x, v, t;
double xx, vv, tt, k1x, k2x, k1v, k2v;
double DT_DIV_2 = 0.5*DT;

char   out1[200];
FILE   *fout1;

int i = 0;

fout1=fopen(out1,"w");

x = X0; v = VX0; t = 0.0;

while (t <= TMAX)
{
k1x = v;
k1v = f(t,x,v);

tt  = t + DT_DIV_2;
xx  = x + k1x * DT_DIV_2;
vv  = v + k1v * DT_DIV_2;

k2x = vv;
k2v = f(tt, xx, vv);

x += k2x * DT;
v += k2v * DT;

i++;
t = i*DT;

fprintf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v);
}

fclose(fout1);

return 0;
}

2 Likes

It’s chaotic, so the simplest things can cause this. My guess is that the first 3 use the same standard math library definition of sin and cos, while Julia’s is different in 1ulp or so.

10 Likes

Thanks for the heads up. I naively expected that all implementations for the transcendental functions should work exactly the same with double precision. If anyone is curious:

“Reputable numeric libraries compute the basic transcendental functions to between 0.5 and about 1 ULP. Only a few libraries compute them within 0.5 ULP, this problem being complex due to the Table-maker’s dilemma.”

3 Likes

You can make some rough estimate of the divergence due to the chaotic dynamics based on the Lyapunov exponent. Here is a simple script that calculates the maximum Lyapunov exponent:

using DynamicalSystems

function pendulum_rule(u, p, t)
ω, γ, F = p
θ, v = u
dθ = v
dv =  -γ*v - sin(θ) + F*cos(ω*t)
return SVector(dθ, dv)
end

p0 = [0.7, 0.05, 0.6]
u0 = [1.0 + 1e-3rand(), 0.0]

ds = ContinuousDynamicalSystem(pendulum_rule, u0, p0)

λ = lyapunov(ds, 100000.0)

e = 1e-12 # error expected

Δt = log(1/e)/λ

195.5...


Using this I get a prediction horizon of about 200 time units, which seems to go well with your plot. Of course, this is not precisely accurate, as the roundoff error due to cosine differences is more similar to dynamic noise constantly added to the system, but can give you a good estimate of how fast you would expect divergence.

8 Likes

You can try the system sin and cos functions instead of Julia’s:

csin(x::Float64) =  ccall(:sin, Float64, (Float64,), x)
csin(x::Float32) =  ccall(:sinf, Float32, (Float32,), x)
ccos(x::Float64) =  ccall(:cos, Float64, (Float64,), x)
ccos(x::Float32) =  ccall(:cosf, Float32, (Float32,), x)

11 Likes

Before you get disapointed, your code as is will be very slow in Julia. Put all inside a function and benchmark it with BenchmarkTools to actually get its peformance. Something like this (be aware that in all these benchmarks the time will be dominated by printing as they are):

using Printf, BenchmarkTools

function pendulum(;print=false)

X0         = 1.0
VX0        = 0.0

OMEGA      = 0.70
GAMMA      = 0.05
FF         = 0.6

TMAX       = 500
DT         = 0.01
DT_DIV_2   = 0.5*DT

f(t, x, v) = -GAMMA*v - sin(x) + FF*cos(OMEGA*t)

if print
fout1 = open(out1, "w")
end

i = 0
x = X0
t = 0.0
v = VX0

while t <= TMAX

k1x = v
k1v = f(t,x,v)

tt  = t + DT_DIV_2
xx  = x + k1x * DT_DIV_2
vv  = v + k1v * DT_DIV_2

k2x = vv
k2v = f(tt, xx, vv)

x += k2x * DT
v += k2v * DT

i += 1
t  = i*DT

if print
@printf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v)
end
end
if print
close(fout1)
end

end

pendulum(print=true)

@benchmark pendulum(print=false)



You will get:

julia> include("./pend.jl")
BenchmarkTools.Trial:
memory estimate:  0 bytes
allocs estimate:  0
--------------
minimum time:     1.795 ms (0.00% GC)
median time:      1.868 ms (0.00% GC)
mean time:        1.875 ms (0.00% GC)
maximum time:     2.643 ms (0.00% GC)
--------------
samples:          2662
evals/sample:     1


8 Likes

I tried using the Float64 versions in my program but I still got the same result as the original Julia program.

Thanks for the reply. Even if I switch off printing, the Julia code is still quite slower than C. Is there anything that can be done to improve performance, like Numba for Python?

There’s no need for anything like Numba–most likely you just need to follow @leandromartinez98’s suggestion to put your code into a function (see Performance Tips · The Julia Language ). If that’s still not enough, definitely post your updated code and its performance here, and I’m sure we can find some more concrete suggestions.

6 Likes

As I posted above, you need to put your code inside a function (see the code I posted there). Also, Julia compiles the code the first time it is run, so for very quick tests like this you might be measuring the compilation time as an important part of the total time. That is why using the benchmark tools is important there.

With both codes printing (which is almost all the time), here the Julia code (I posted) takes 49ms and the C code (measured with system’s time, thus probably not a good measure either), takes 85ms. But with good performance measures I would expect both to be similar here.

Ps: If you want a simpler but reasonably fair comparison, take the code I posted there, remove the last line (@benchmark...) and change the number of steps to something much bigger (1 million). Probably removing printing is a good idea also:

%gcc -o pend pend.c -lm -O3
% time ./pend
real    0m6,144s
user    0m6,139s
sys     0m0,004s

% time julia ./pend.jl
real    0m4,854s
user    0m5,167s
sys     0m0,628s



With these codes:

Julia
using Printf, BenchmarkTools

function pendulum(;print=false,TMAX=500)

X0         = 1.0
VX0        = 0.0

OMEGA      = 0.70
GAMMA      = 0.05
FF         = 0.6

DT         = 0.01
DT_DIV_2   = 0.5*DT

f(t, x, v) = -GAMMA*v - sin(x) + FF*cos(OMEGA*t)

if print
fout1 = open(out1, "w")
end
i = 0
x = X0
t = 0.0
v = VX0

while t <= TMAX

k1x = v
k1v = f(t,x,v)

tt  = t + DT_DIV_2
xx  = x + k1x * DT_DIV_2
vv  = v + k1v * DT_DIV_2

k2x = vv
k2v = f(tt, xx, vv)

x += k2x * DT
v += k2v * DT

i += 1
t  = i*DT

if print
@printf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v)
end
end
if print
close(fout1)
end
return x

end

pendulum(print=false,TMAX=1_000_000)


C
#include <stdio.h>
#include <math.h>

const double X0       = 1.0;
const double VX0      = 0.0;

const double OMEGA    = 0.70;
const double GAMMA    = 0.05;
const double FF		  = 0.60;

const double TMAX     = 1000000;
const double DT       = 0.01;

double f(double t, double x, double v)
{
return -GAMMA*v - sin(x) + FF*cos(OMEGA*t);
}

int main(int argc, char *argv[])
{
double x, v, t;
double xx, vv, tt, k1x, k2x, k1v, k2v;
double DT_DIV_2 = 0.5*DT;

char   out1[200];
FILE   *fout1;

int i = 0;

/*
fout1=fopen(out1,"w");
*/

x = X0; v = VX0; t = 0.0;

while (t <= TMAX)
{
k1x = v;
k1v = f(t,x,v);

tt  = t + DT_DIV_2;
xx  = x + k1x * DT_DIV_2;
vv  = v + k1v * DT_DIV_2;

k2x = vv;
k2v = f(tt, xx, vv);

x += k2x * DT;
v += k2v * DT;

i++;
t = i*DT;

/*
fprintf(fout1,"%12.6f %26.20f %26.20f\n", t, x, v);
*/
}

/*
fclose(fout1);
*/

return x;
}


(note that I had to return x from both functions, otherwise the compiler with -O3 just figures out that it does not need to do anything if not printing the trajectory)

2 Likes

Thank you @rdeits and @leandromartinez98! This was my first attempt at using Julia.

That was very helpful! Thank you very much.

1 Like

I’m impressed! Even printing every 100 steps it is running faster than C.

2 Likes

On neither side you should expect one code to be much faster than the other. If one or other thing is happening, probably there is something to be improved on the other side. I am saying this so that expectations are reasonable: to get performance in Julia you need to get used to some few coding strategies that are inherent to it. And with that you will get code that is of the order of C-fast.

From that point on one or other code can get faster with more specialized optimizations, tuning compiler flags, etc.

Juila is nice because the experience of coding that is much smoother in general (once one gets used to it, as in all cases).

8 Likes

At least on my machine, gcc doesn’t want to use AVX, whereas Julia seemingly does.

I suspect this vectorization is what makes the difference here.

C asm
0000000000001420 <f>:
1420:       f3 0f 1e fa             endbr64
1424:       48 83 ec 28             sub    $0x28,%rsp 1428: f2 0f 11 44 24 18 movsd %xmm0,0x18(%rsp) 142e: 66 0f 28 c1 movapd %xmm1,%xmm0 1432: f2 0f 11 54 24 10 movsd %xmm2,0x10(%rsp) 1438: e8 b3 fc ff ff callq 10f0 <sin@plt> 143d: f2 0f 10 5c 24 18 movsd 0x18(%rsp),%xmm3 1443: f2 0f 59 1d 25 0c 00 mulsd 0xc25(%rip),%xmm3 # 2070 <X0+0x8> 144a: 00 144b: f2 0f 11 44 24 08 movsd %xmm0,0x8(%rsp) 1451: 66 0f 28 c3 movapd %xmm3,%xmm0 1455: e8 76 fc ff ff callq 10d0 <cos@plt> 145a: f2 0f 59 05 1e 0c 00 mulsd 0xc1e(%rip),%xmm0 # 2080 <X0+0x18> 1461: 00 1462: f2 0f 10 54 24 10 movsd 0x10(%rsp),%xmm2 1468: f2 0f 59 15 08 0c 00 mulsd 0xc08(%rip),%xmm2 # 2078 <X0+0x10> 146f: 00 1470: f2 0f 5c 54 24 08 subsd 0x8(%rsp),%xmm2 1476: 48 83 c4 28 add$0x28,%rsp
147a:       f2 0f 58 c2             addsd  %xmm2,%xmm0
147e:       c3                      retq
147f:       90                      nop


vs.

Julia asm
julia> @code_native acceleration(0.0, 1.0, 0.0, 0.05, 0.6, 0.7)
.text
; ┌ @ REPL[42]:1 within acceleration'
pushq   %rbx
subq    $32, %rsp vmovsd %xmm5, 16(%rsp) vmovsd %xmm4, 24(%rsp) vmovsd %xmm0, 8(%rsp) movabsq$.rodata.cst16, %rax
; │┌ @ float.jl:383 within -'
vxorpd  (%rax), %xmm3, %xmm0
; │└
; │┌ @ float.jl:395 within *'
vmulsd  %xmm2, %xmm0, %xmm0
vmovsd  %xmm0, (%rsp)
movabsq $cos, %rbx ; │└ ; │┌ @ REPL[38]:1 within csin' leaq 17120(%rbx), %rax vmovapd %xmm1, %xmm0 callq *%rax vmovsd (%rsp), %xmm1 # xmm1 = mem[0],zero ; │└ ; │┌ @ float.jl:392 within -' vsubsd %xmm0, %xmm1, %xmm0 vmovsd %xmm0, (%rsp) vmovsd 8(%rsp), %xmm0 # xmm0 = mem[0],zero ; │└ ; │┌ @ float.jl:395 within *' vmulsd 16(%rsp), %xmm0, %xmm0 ; │└ ; │┌ @ REPL[40]:1 within ccos' callq *%rbx ; │└ ; │┌ @ float.jl:395 within *' vmulsd 24(%rsp), %xmm0, %xmm0 ; │└ ; │┌ @ float.jl:389 within +' vaddsd (%rsp), %xmm0, %xmm0 ; │└ addq$32, %rsp
popq    %rbx
retq
nopw    %cs:(%rax,%rax)
; └


no, the calls are not inlined in julia (not with ccos and not with native sin), though I did rename them for my understanding - acceleration is your f:

callq are in here
L1340:
vmovsd  %xmm0, 32(%rsp)
movabsq $sin, %rbx ; │ @ chaotic.jl:18 within pendulum' ; │┌ @ chaotic.jl:13 within acceleration' callq *%rbx vmovsd %xmm0, 120(%rsp) vmovsd 24(%rsp), %xmm0 # xmm0 = mem[0],zero ; ││┌ @ float.jl:395 within *' vmulsd 80(%rsp), %xmm0, %xmm0 movabsq$cos, %r15
; ││└
callq   *%r15
vmovsd  %xmm0, 112(%rsp)
vmovsd  72(%rsp), %xmm0                 # xmm0 = mem[0],zero
; │└                                                                

Julia Code
function pendulum(print=false, x=1.0, v=0.0, dt=0.01, tmax=500.0, Ω=0.70, Γ=0.05, FF=0.6)
if print
fout1 = open(out1, "w")
end
acceleration(t, x, v) = -Γ * v - sin(x) + FF * cos(Ω * t)

dt_half = dt / 2.0

for t in range(0.0, tmax; step=dt)
acc_start = acceleration(t, x, v)

t_middle  = t + dt_half
x_middle  = x + v * dt_half
v_middle  = v + acc_start * dt_half
acc_middle = acceleration(t_middle, x_middle, v_middle)

x += v_middle * dt
v += acc_middle * dt

if print
@printf(fout1,"%12.6f %26.20f %26.20f\n", t+dt, x, v)
end
end
if print
close(fout1)
end
end


C code was the one by @leandromartinez98, compiled with gcc -o pendulum pendulum.c -lm -O3.

3 Likes

Now the really cool thing is this. You can with very minor modifications to your function make simulations of the pendulums of any dimension, defined by the input type of the initial point and velocities. Now do that with C

There are a few annotations on where I had to change the code (besides the plotting, of course):

using Printf, StaticArrays

function pendulum(X0,VX0;printout=false,TMAX=500) # X0 and VX0 are input parameters

OMEGA      = 0.70
GAMMA      = 0.05
FF         = 0.6

DT         = 0.01
DT_DIV_2   = 0.5*DT

f(t, x, v) = -GAMMA*v - sin(x) + FF*cos(OMEGA*t)

if printout
fout1 = open(out1, "w")
end
i = 0
x = X0
t = 0.0
v = VX0

while t <= TMAX

k1x = v
k1v = f.(t,x,v) # Added a dot here

tt  = t + DT_DIV_2
xx  = x + k1x * DT_DIV_2
vv  = v + k1v * DT_DIV_2

k2x = vv
k2v = f.(tt, xx, vv) # added a dot here

x += k2x * DT
v += k2v * DT

i += 1
t  = i*DT

if printout && i%5 == 0 # print generic sizes
println(fout1,"$t ",[ "$xi " for xi in x ]...)
end
end
if printout
close(fout1)
end
return x

end

using DelimitedFiles, Plots, Plots.Measures

plot(layout=(1,3))
default(label="")

# 1D pendulum
x0 = 1.0
vx0 = rand()
pendulum(x0,vx0,printout=true,TMAX=1_000)
plot!(x1[:,1],x1[:,2],subplot=1,title="1D pendulum")
plot!(xlabel="time",ylabel="x",subplot=1)

# 2D pendulum
n = 2
x0 = ones(SVector{n,Float64})
vx0 = rand(SVector{n,Float64})
pendulum(x0,vx0,printout=true,TMAX=1_000)
plot!(x2[:,1],x2[:,2],x2[:,3],subplot=2,title="2D pendulum")
plot!(xlabel="time",ylabel="x1",zlabel="x2",subplot=2)

# 3D pendulum
n = 3
x0 = ones(SVector{n,Float64})
vx0 = rand(SVector{n,Float64})
pendulum(x0,vx0,printout=true,TMAX=1_000)
plot!(x3[:,2],x3[:,3],x3[:,4],subplot=3,title="3D pendulum")
plot!(xlabel="x1",ylabel="x2",zlabel="x3",subplot=3)

plot!(size=(1200,500),margin=5mm)
savefig("./pend.png")

`
14 Likes

That’s really cool! Thank for the nice examples. I guess I’ll be learning Julia.

7 Likes

For what it’s worth, you may wish to compare the results also with those from Mathematica, which is presumably using state-of-the-art numerical methods.

Code:

Plot:

3 Likes