I am a newbie to Julia, and I came up with a script to produce a bifurcation diagram for the logistic map. The logistic map is well known and I will be brief here: x(t+1) = r*x(t)*(1-x(t))
, 0 ≤ r ≤ 4
. For accuracy, I choose maxiter = 1000
and r = LinRange(0.0,4.0, 6001)
.
I have been able to run the code and produce such diagram. Performance is quite crucial in this kind of problem, and to check it in my code I tried without any macro and with the following macros: @inbounds
, @simd
and @avx
. There seems to be a kind of interdependency between @avx
and @btime
in my code, which I do not understand. Below, I explain what I do using three scenarios: (i) applying @inbounds @simd
, (ii) applying @avx
, (iii) applying @avx
and @btime
. The results look very different indeed.
1. Applying the macros @inbounds @simd
using Plots, BenchmarkTools, LoopVectorization, SIMD
function bifur!(rs, x, xs, maxiter)
for k = 1 : length(rs)
x[k,1] = xs # initial condition
@inbounds @simd for j = 1 : maxiter-1
x[k, j+1] = rs[k] * x[k, j] * (1 - x[k,j])
end
end
return x
end
rs = LinRange(0.0, 4.0, 6001) #rs = collect(rs)
xs = 0.5
maxiter = 1000 # maximum iterations
x = zeros(length(rs), maxiter)
#@btime begin
bifur!(rs, x, xs, maxiter)
#end
#@benchmark bifur!($rs, x, $xs, $maxiter)
@benchmark bifur!($rs, x, $xs, $maxiter) setup=(x = zeros(length($rs), $maxiter)) evals=1
@btime begin
plot(rs, x[:, end-50:end], #avoid transients
#plot($rs, @view($x[:, end-50:end]),
seriestype = :scatter,
markercolor=:blue,
markerstrokecolor=:match,
markersize = 1.1,
markerstrokewidth = 0,
legend = false,
markeralpha = 0.3)
#xticks! = 0:1:4
xlims!(2.75,4)
#savefig("zee_bifurcation.png")
end
The output that the function bifur!
delivers is:
6001×1000 Array{Float64,2}:
0.5 0.0 0.0 … 0.0 0.0 0.0
0.5 0.000166667 1.11093e-7 0.0 0.0 0.0
0.5 0.000333333 4.44296e-7 0.0 0.0 0.0
0.5 0.0005 9.995e-7 0.0 0.0 0.0
0.5 0.000666667 1.77659e-6 0.0 0.0 0.0
0.5 0.000833333 2.77546e-6 … 0.0 0.0 0.0
0.5 0.001 3.996e-6 0.0 0.0 0.0
0.5 0.00116667 5.43809e-6 0.0 0.0 0.0
0.5 0.00133333 7.10163e-6 0.0 0.0 0.0
0.5 0.0015 8.9865e-6 0.0 0.0 0.0
⋮ ⋱
0.5 0.998667 0.00531912 0.226126 0.699039 0.840412
0.5 0.998833 0.00465578 0.7908 0.660969 0.895311
0.5 0.999 0.003992 0.915731 0.308363 0.852247
0.5 0.999167 0.00332778 … 0.23146 0.710952 0.821312
0.5 0.999333 0.00266311 0.895762 0.37324 0.935104
0.5 0.9995 0.001998 0.9993 0.00279591 0.0111468
0.5 0.999667 0.00133244 0.0453006 0.172936 0.571927
0.5 0.999833 0.000666444 0.272303 0.792485 0.657701
0.5 1.0 0.0 … 0.0 0.0 0.0
The execution time given by @benchmark bifur!($rs, x, $xs, $maxiter) setup=(x = zeros(length($rs), $maxiter)) evals=1
is as follows (actually, the function without those two macros runs with very similar times):
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 19.345 ms (0.00% GC)
median time: 19.805 ms (0.00% GC)
mean time: 20.000 ms (0.00% GC)
maximum time: 28.163 ms (0.00% GC)
--------------
samples: 172
evals/sample: 1
2. Here, I apply the macro @avx
.
I keep the code exactly as it was in point (1), but I have to remove the @inbounds @simd
to avoid LoadError (LoadError: Don't know how to handle expression
), and call the bifur!
function with exactly the same arguments:
function bifur!(rs, x, xs, maxiter)
@avx for k = 1 : length(rs) # macro to vectorize the loop
x[k,1] = xs # initial condition
for j = 1 : maxiter-1
x[k, j+1] = rs[k] * x[k, j] * (1 - x[k,j])
end
end
return x
end
rs = LinRange(0.0, 4.0, 6001) #rs = collect(rs)
xs = 0.5
maxiter = 1000 # maximum iterations
x = zeros(length(rs), maxiter)
#@btime begin
bifur!(rs, x, xs, maxiter)
#end
The output is now completely different:
6001×1000 Array{Float64,2}:
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
⋮ ⋮ ⋱ ⋮
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
The execution time is:
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 2.840 ms (0.00% GC)
median time: 2.949 ms (0.00% GC)
mean time: 3.408 ms (0.00% GC)
maximum time: 12.612 ms (0.00% GC)
--------------
samples: 391
evals/sample: 1
3. Applying the macros @avx
and @btime
The problem with the approach in point (2) is that it renders the entire exercise useless: the loops do not produce the intended output, which is the 6001×1000 Array{Float64,2} as found in point (1). However, if I use the code of point (2), but call the bifur!
as follows:
@btime begin
bifur!(rs, x, xs, maxiter)
end
instead of
#@btime begin
bifur!(rs, x, xs, maxiter)
#end
the output turns out to be the correct one (the same as in point 1). And the execution time is
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 2.852 ms (0.00% GC)
median time: 2.962 ms (0.00% GC)
mean time: 3.410 ms (0.00% GC)
maximum time: 12.561 ms (0.00% GC)
--------------
samples: 389
evals/sample: 1
My fundamental question is: what is happening between the @avx
and the @btime
that leads the code to produce the correct output? Is this just a lousy coding style of myself? Or is there something that I overlooked when reading the docs?
It should be stressed that the @avx
macro does indeed speed up the bifur!
function (from 20ms to 3.4ms), which is 6 times faster. But it looks awkward that, to obtain the correct output, I have to call that function by using the @btime
macro. Suppose I just want to call the function, without executing `@btime: what should I have to do to get the correct output?
Help would be appreciated.