I have a compute-heavy kernel function that I am trying to call from inside another function. The kernel takes several arrays and other data as arguments, and I tried to make the code cleaner by passing a single object instead, but performance drops markedly when I do so. Here is a quick demonstration.
using BenchmarkTools

# Passing all data as explicit arguments:
function compute1(nx, ny, nz, arr1, arr2)
    # Loop in column-major order (i varies fastest)
    for k in 1:nz, j in 1:ny, i in 1:nx
        tmp = arr1[i, j, k] * arr2[i, j, k]
        tmp2 = tmp + arr1[i, j, k]
    end
end

function start_compute1()
    nx = 10
    ny = 10
    nz = 200
    arr1 = zeros(Float64, (nx, ny, nz))
    arr2 = ones(Float64, (nx, ny, nz))
    compute1(nx, ny, nz, arr1, arr2)
end
@benchmark start_compute1()
BenchmarkTools.Trial:
memory estimate: 312.66 KiB
allocs estimate: 4
--------------
minimum time: 39.613 μs (0.00% GC)
median time: 157.446 μs (0.00% GC)
mean time: 192.640 μs (20.39% GC)
maximum time: 6.807 ms (97.69% GC)
--------------
samples: 10000
evals/sample: 1
This is the version that performs well. My first attempt at passing an object holding the data used a dictionary:
function compute2(data)
    # Unpack everything from the dictionary before the loop
    nx, ny, nz = data[:nx], data[:ny], data[:nz]
    arr1 = data[:arr1]
    arr2 = data[:arr2]
    for k in 1:nz, j in 1:ny, i in 1:nx
        tmp = arr1[i, j, k] * arr2[i, j, k]
        tmp2 = tmp + arr1[i, j, k]
    end
end

function start_compute2()
    nx = 10
    ny = 10
    nz = 200
    arr1 = zeros(Float64, (nx, ny, nz))
    arr2 = ones(Float64, (nx, ny, nz))
    data = Dict()  # no type parameters given, so this is a Dict{Any, Any}
    data[:nx], data[:ny], data[:nz] = nx, ny, nz
    data[:arr1] = arr1
    data[:arr2] = arr2
    compute2(data)
end
@benchmark start_compute2()
BenchmarkTools.Trial:
memory estimate: 2.58 MiB
allocs estimate: 124409
--------------
minimum time: 2.178 ms (0.00% GC)
median time: 2.327 ms (0.00% GC)
mean time: 2.593 ms (10.94% GC)
maximum time: 8.116 ms (70.25% GC)
--------------
samples: 1928
evals/sample: 1
It seems the problem is that the compiler is not optimising for what is stored inside the `Dict`. I have also tried using a `struct` with well-defined field types instead of a `Dict`, and adding type annotations for `arr1` and `arr2` (e.g. `arr1 = data[:arr1]::Array{Float64, 3}`), but the problem persists.
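For concreteness, a struct-based variant with concretely typed fields might look like the sketch below. The names `ComputeData`, `compute3`, and `start_compute3` are hypothetical and chosen only for illustration. Unlike `compute1`, the loop here accumulates into `acc` and returns it, purely so that the work has an observable result; that is an assumption of this sketch, not part of the original kernel.

```julia
# Hypothetical container (name and layout are assumptions for illustration).
# Every field has a concrete type, so the field types are known at compile time.
struct ComputeData
    nx::Int
    ny::Int
    nz::Int
    arr1::Array{Float64,3}
    arr2::Array{Float64,3}
end

function compute3(data::ComputeData)
    arr1, arr2 = data.arr1, data.arr2
    acc = 0.0  # accumulator so the loop produces a value that can be checked
    for k in 1:data.nz, j in 1:data.ny, i in 1:data.nx
        tmp = arr1[i, j, k] * arr2[i, j, k]
        acc += tmp + arr1[i, j, k]
    end
    return acc
end

function start_compute3()
    nx, ny, nz = 10, 10, 200
    arr1 = zeros(Float64, (nx, ny, nz))
    arr2 = ones(Float64, (nx, ny, nz))
    return compute3(ComputeData(nx, ny, nz, arr1, arr2))
end
```

By contrast, `typeof(Dict())` is `Dict{Any, Any}`, so values read from it carry no concrete type information. Running `@code_warntype` on `compute2(data)` and on `compute3(data)` in the REPL is one way to compare what the compiler infers in each case.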
Is there a way to recover the performance without having to spell out every individual argument to the `compute2` function? Perhaps there is an obvious solution, but I’m not finding it.
Any suggestions would be appreciated.