Hello,
I’m trying to write a Lattice Boltzmann Solver in Julia. My initial version strongly relies on global constants (mostly numeric values and Arrays). I’ve tried to use immutable structures that hold the state of my simulator, but found a significant difference in performance.
My minimal working example essentially computes a sum over one dimension of an array.
When the function operates on a constant global array, it is significantly faster than when the same data are accessed through a struct field: about 215 ns vs. 424 ns with nested loops in the benchmarks below, and the gap is even larger with Einsum.
I am comparing four approaches: moments1() and moments2() use nested loops but differ in the array they access (struct field vs. constant global array); moments3() and moments4() mirror the former two, but use the Einsum package instead of nested loops.
const H = 20
const W = 20
const f = zeros(H, W, 9)
const rho = ones(H, W)

struct Lattice
    f::Array{Float64, 3}
end

const lattice = Lattice(f)

# Nested loops, array reached through the struct field
function moments1()
    lf = lattice.f
    let n, m, i
        @inbounds for m = 1:W, n = 1:H
            local a = 0.0
            for i = 1:9
                a += lf[n, m, i]
            end
            rho[n, m] = a
        end
    end
    return nothing
end

# Nested loops, array accessed as a constant global
function moments2()
    let n, m, i
        @inbounds for m = 1:W, n = 1:H
            local a = 0.0
            for i = 1:9
                a += f[n, m, i]
            end
            rho[n, m] = a
        end
    end
    return nothing
end

using Einsum

# Einsum, array reached through the struct field
function moments3()
    lf = lattice.f
    @einsum rho[n, m] = lf[n, m, i]
    return nothing
end

# Einsum, array accessed as a constant global
function moments4()
    @einsum rho[n, m] = f[n, m, i]
    return nothing
end

using BenchmarkTools
@btime moments1() # 423.995 ns (0 allocations: 0 bytes)
@btime moments2() # 214.657 ns (0 allocations: 0 bytes)
@btime moments3() # 1471. ns (0 allocations: 0 bytes)
@btime moments4() # 214.623 ns (0 allocations: 0 bytes)
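
As a sanity check on the setup, the struct field and the constant global should refer to the same array, and the loop should agree with a plain sum over the third dimension, so the timings above should differ only in how the array is reached. A minimal sketch of that check follows; demo_f, DemoLattice, demo_lattice and demo_moments are stand-in names used for illustration only, not part of my solver:

```julia
# Stand-in names (demo_*) so this runs on its own, independent of the code above.
const demo_f = rand(4, 5, 9)

struct DemoLattice
    f::Array{Float64, 3}
end

const demo_lattice = DemoLattice(demo_f)

# The constructor stores a reference to the array, not a copy.
same_array = demo_lattice.f === demo_f

# Reference result: sum over the third dimension, reduced to a 2-D array.
expected = dropdims(sum(demo_f; dims=3); dims=3)

# Loop version in the style of moments1/moments2, taking the array as an argument.
function demo_moments(f)
    rho = zeros(size(f, 1), size(f, 2))
    for m in axes(f, 2), n in axes(f, 1)
        a = 0.0
        for i in axes(f, 3)
            a += f[n, m, i]
        end
        rho[n, m] = a
    end
    return rho
end

# Compare with ≈ because the summation order may differ between the two versions.
matches = demo_moments(demo_lattice.f) ≈ expected
```

If same_array and matches are both true, the two access paths operate on identical data and produce the same result.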
I inspected the LLVM code of all four functions. They differ, of course, but I can't tell what causes the difference.
Is accessing the field really that big a deal, or am I missing something else?
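
In case it helps anyone reproduce the LLVM comparison, this is roughly how the IR for the two access paths can be captured as strings and compared. It is only a sketch: ir_f, IRLattice, ir_lattice and the two first_via_* functions are stand-in names for illustration, reduced to a single array read each.

```julia
using InteractiveUtils  # standard library; provides code_llvm and @code_llvm

# Stand-in names (ir_*, first_via_*) so this runs on its own.
const ir_f = zeros(4, 4, 9)

struct IRLattice
    f::Array{Float64, 3}
end

const ir_lattice = IRLattice(ir_f)

# Two tiny functions that differ only in how they reach the array.
first_via_field() = ir_lattice.f[1, 1, 1]
first_via_global() = ir_f[1, 1, 1]

# sprint passes an IO buffer as the first argument, i.e. code_llvm(io, f, types).
ir_field = sprint(code_llvm, first_via_field, Tuple{})
ir_global = sprint(code_llvm, first_via_global, Tuple{})

println("field access:  ", count(==('\n'), ir_field), " lines of IR")
println("global access: ", count(==('\n'), ir_global), " lines of IR")
```

At the REPL, `@code_llvm moments1()` prints the same kind of output directly for the real functions.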