Hi,
I found some strange performance issues in some simple code that uses a mutable struct with two fields: an array and an index into that array. In the simplified example below I compare four different functions that fill all array elements with a constant.
const CELLS = 1024*1024
mutable struct Box
    v :: Vector{Int}
    ind :: Int
end
function fill1!(box,n)
    # code without helper function runs rather fast
    box.ind = 0
    for i=1:CELLS
        box.ind += 1
        box.v[box.ind] = n
    end
end
function fill2!(box,n)
    # calling a small function to fill one element
    # -> this takes much longer
    box.ind = 0
    for i=1:CELLS
        fill_elem!(box,n)
    end
end
function fill_elem!(box, n)
    box.ind += 1
    box.v[box.ind] = n
end
function fill3!(box,n)
    # subtle change to fill1! but about double runtime
    box.ind = 1
    for i=1:CELLS
        box.v[box.ind] = n
        box.ind += 1
    end
end
function fill4!(v,n)
    # avoiding the mutable struct is the fastest
    ind = 0
    for i=1:CELLS
        ind += 1
        v[ind] = n
    end
end
function main(method) # benchmark
    v = zeros(Int,CELLS)
    box = Box(v,1)
    if method == 1
        for n = 1:1000
            fill1!(box,n)
        end
    elseif method == 2
        for n = 1:1000
            fill2!(box,n)
        end
    elseif method == 3
        for n = 1:1000
            fill3!(box,n)
        end
    else
        for n = 1:1000
            fill4!(v,n)
        end
    end
end
In this example I observe very different run-times depending on which fill function is used. I am probably missing something simple, but I cannot understand the reason for the performance differences, in particular why using the small helper function makes such a difference.
In a straightforward translation to C, all four variants have comparable run-times.
Of course I can always store the array and the index in separate variables (avoiding these issues), but what is wrong with putting them into a single struct?
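A minimal sketch of that workaround, keeping the struct but copying both fields into local variables for the duration of the loop. `LocalBox` and `fill_local!` are hypothetical names used only for illustration (they are not part of the benchmark above), and whether this variant matches the speed of `fill4!` is not measured here.

```julia
# Hypothetical stand-in for Box, redefined so the snippet runs on its own.
mutable struct LocalBox
    v   :: Vector{Int}
    ind :: Int
end

# Hypothetical variant of fill1!: work on locals, write the index back once.
function fill_local!(box::LocalBox, n)
    v = box.v              # read the array field once, before the loop
    ind = 0                # keep the running index in a local variable
    for i in 1:length(v)
        ind += 1
        v[ind] = n
    end
    box.ind = ind          # store the final index back into the struct
    return box
end

box = LocalBox(zeros(Int, 8), 0)
fill_local!(box, 7)
```

After the call, `box.ind` equals the array length and every element of `box.v` holds the fill value, so the struct ends up in the same state as it would after `fill1!`.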
@time main(1) # 0.634483 seconds (10.90 k allocations: 8.557 MiB, 0.26% gc time)
@time main(2) # 3.529522 seconds (7 allocations: 8.000 MiB)
@time main(3) # 1.214788 seconds (7 allocations: 8.000 MiB)
@time main(4) # 0.509509 seconds (7 allocations: 8.000 MiB, 0.23% gc time)
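The first call to a Julia function includes JIT compilation, so a first `@time` measurement can contain compile time and extra allocations (this may be what the 10.90 k allocations for `main(1)` reflect, though that is an assumption). A minimal sketch of timing only the second, already compiled call; `timed_after_warmup` and `work` are hypothetical names, not part of the code above.

```julia
# Hypothetical helper: run f(args...) once untimed, then time a second call.
function timed_after_warmup(f, args...)
    f(args...)                   # warm-up call, triggers compilation
    return @elapsed f(args...)   # elapsed seconds for the compiled call
end

# Small stand-in workload so the snippet runs quickly on its own.
work(n) = sum(abs2, 1:n)

t = timed_after_warmup(work, 10_000)
println("elapsed: ", t, " s")
```

With the benchmark above loaded, `timed_after_warmup(main, 2)` would time `fill2!` the same way, at the cost of running each method twice.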
Note that this is not about the most efficient way to initialize an array. The code above is a contrived, simplified example of some more involved code in which I stumbled across these unexpected performance issues.