Hi everyone,
I’m in the process of optimizing a portion of my code that has a general problem of too much memory allocation, and consequently poor threading efficiency and overall performance. This is a series of pretty simple functions that are called a gazillion times, and each of which needs to use various small temporary arrays.
I’m able to rely on StaticArrays (which, as I understand it, get allocated on the stack), but only when the arrays in question can be initialized in a single line of code. In other situations, when I need to perform a small computation to build my array, the only solution I found was to define “work” arrays and pass them around. This gets annoying, because I use several small arrays.
Consider this MWE:
function barw_array( xnodes )
    n = length(xnodes)
    # temporary array, allocated on every call
    barw = Array{Float64}(undef,n)
    for j ∈ 1:n
        barw[j] = 1.0
        for k ∈ 1:n
            if k != j
                barw[j] = barw[j] / (xnodes[j] - xnodes[k])
            end
        end
    end
    return sum(barw)
end
function run_barw_array()
    xnodes = LinRange(0,1,4)
    for i ∈ 1:100000
        u = barw_array( xnodes )
    end
end
@time run_barw_array()
This results in too much memory allocation (that grows with the size of the loop).
0.292006 seconds (2.63 M allocations: 109.113 MiB, 7.85% gc time, 15.11% compilation time)
The work-array version doesn’t have this problem:
function barw_workvector( barw, xnodes )
    n = length(xnodes)
    # barw is a preallocated work vector supplied by the caller
    for j ∈ 1:n
        barw[j] = 1.0
        for k ∈ 1:n
            if k != j
                barw[j] = barw[j] / (xnodes[j] - xnodes[k])
            end
        end
    end
    return sum(barw)
end
function run_barw_workvector()
    xnodes = LinRange(0,1,4)
    # allocated once, reused in every iteration
    barw = Array{Float64}(undef,length(xnodes))
    for i ∈ 1:100000
        u = barw_workvector( barw, xnodes )
    end
end
@time run_barw_workvector()
0.005938 seconds (2 allocations: 192 bytes)
Now, I would love to be able to obtain this functionality with StaticArrays.
I’ve been reading some threads in the forum, like this one:
Can Julia optimize mutable static arrays to be allocated on the stack?
but the solution proposed there uses an MArray in conjunction with a work array. If I try to allocate an MArray in my original function,
barw = MVector{n,Float64}(undef)
it seems that these vectors are allocated on the heap, and therefore I have the same poor performance as in the original code.
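A likely cause, stated as an assumption rather than something verified here: `n = length(xnodes)` is only known at run time, so `MVector{n,Float64}` is not a concrete type the compiler can see. The sketch below shows a variant in which the length is part of the argument type. The name `barw_mvector` and the choice to pass the nodes as an `SVector` are illustrative, not taken from the linked threads. Whether the `MVector` actually stays off the heap in this form would still need to be checked with `@allocated` or `@btime`.

```julia
using StaticArrays

# Hypothetical variant: N comes from the type of `xnodes`,
# so MVector{N,Float64} is a concrete type at compile time.
function barw_mvector(xnodes::SVector{N,T}) where {N,T}
    barw = MVector{N,Float64}(undef)
    for j in 1:N
        w = 1.0
        for k in 1:N
            if k != j
                w /= (xnodes[j] - xnodes[k])
            end
        end
        barw[j] = w
    end
    # barw is only used locally and does not escape the function
    return sum(barw)
end

# Nodes written out explicitly as a static vector (same values as LinRange(0,1,4))
xnodes = SVector(0.0, 1/3, 2/3, 1.0)
barw_mvector(xnodes)
```

The design choice here is to move the length into the type domain once, at the call site, instead of reading it from `length(xnodes)` inside the hot function.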
For a related discussion, this was also an interesting thread from 2018: Fortran vs Julia stack-allocated arrays.
I also tried using StaticArrays with generator constructors that call a function, just to see if that worked
using StaticArrays
# weight for node i: 1 / product over k != i of (xnodes[i] - xnodes[k])
function xdiff(xnodes,i)
    xd = 1.0
    for k ∈ eachindex(xnodes)
        if k != i
            xd = xd / (xnodes[i] - xnodes[k])
        end
    end
    return xd
end
function barw_sarray_constructor( xnodes )
    n = length(xnodes)
    # build the SVector by splatting a generator over the nodes
    barw = SVector{n,Float64}( ( xdiff(xnodes,i) for i in 1:n )... )
    s = sum( barw )
    return s
end
function run_barw_sarray_constructor()
    xnodes = LinRange(0,1,4)
    for i ∈ 1:100000
        u = barw_sarray_constructor( xnodes )
    end
end
@time run_barw_sarray_constructor()
This indeed doesn’t allocate. But apart from how messy this would get in real code, there is also a performance penalty. Benchmarking with @btime, I got (SVector constructor version first, work-array version second):
7.868 ms (0 allocations: 0 bytes)
4.956 ms (2 allocations: 192 bytes)
Any ideas? Maybe there is a package that enables something along the lines of stack-allocated mutable arrays, or that makes this kind of StaticArrays constructor work properly?
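One direction that might be worth benchmarking, sketched here with Base only: build the weights with `ntuple(f, Val(N))`, where `N` is a type parameter instead of a runtime `length`. The helper name `barw_ntuple` and the choice to pass the nodes as an `NTuple` are assumptions made for illustration, and no timing is claimed for it. `xdiff` is repeated so the block runs on its own.

```julia
# Same product as `xdiff` above, repeated so this block is self-contained.
function xdiff(xnodes, i)
    xd = 1.0
    for k in eachindex(xnodes)
        if k != i
            xd /= (xnodes[i] - xnodes[k])
        end
    end
    return xd
end

# Hypothetical helper: N is known from the argument type,
# so ntuple(f, Val(N)) has a compile-time length.
function barw_ntuple(xnodes::NTuple{N,Float64}) where {N}
    barw = ntuple(i -> xdiff(xnodes, i), Val(N))
    return sum(barw)
end

xnodes = (0.0, 1/3, 2/3, 1.0)   # same values as LinRange(0,1,4)
barw_ntuple(xnodes)
```

With StaticArrays loaded, the resulting tuple can be wrapped as `SVector(barw)` if the rest of the code expects a static vector.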
Thanks!