I’ve been running into a bizarre problem with huge allocations, and I’ve just discovered that the problem can be avoided if I don’t use a keyword argument to a function inside my main computation function. The allocations aren’t happening in a line that actually uses the kwarg; the allocations come later. @code_warntype shows no problems, and specifically everything directly involved in the worst lines is just Int64 and Float64. Even for my MWE below, the worst line, which I wouldn’t expect to allocate at all, averages ~128 B of allocation per iteration. I don’t even see that many bytes in all the variables involved in the line! Even the ranges of the for loops are allocating lots of memory.
But all these allocations go away if I just don’t use the keyword argument in a function call inside my big and ugly function. The function with the kwarg is type stable either way. But looking more closely at the @code_warntype output, I see that that function is represented as a Core.kwfunc. I guess this screws things up??? Should I have known this somehow? Is this a bug?
My use case is a pretty big and ugly recurrence computation, but I’ve managed to simplify it as much as possible. Here, index is the function with the kwarg that I’d like to use, inplace! is the core computation (drastically simplified here), and compute_a just sets things up and measures the allocations.
using Profile

function index(n, mp, m; n_max=n)
    n + mp + m + n_max
end

function inplace!(a, n_max, index_func=index)
    i1 = index_func(1, 2, 3; n_max=n_max)  # This version allocates
    # i1 = index_func(1, 2, 3)  # This version doesn't allocate
    i2 = size(a, 1) - 2i1
    for i in 1:i2  # Allocates 3182688 B if using kwarg above
        a[i + i1] = a[i + i1 - 1]  # Allocates 9573120 B if using kwarg above
    end
    for i in 3:i2-4  # Allocates 3182576 B if using kwarg above
        a[i + i1] -= a[i + i1 - 2]  # Allocates 12771408 B if using kwarg above
    end
end

function compute_a(n_max::Int64)
    a = randn(Float64, 100_000)
    inplace!(a, n_max, index)  # First call, so compilation is not measured
    Profile.clear_malloc_data()
    inplace!(a, n_max, index)
end

compute_a(10)
[Curiously, I need both for loops, or the allocations disappear.]
If I just remove the kwarg from the call to index_func, the allocations all completely disappear; there are no allocations inside inplace! in that case.
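A minimal sketch for checking this without a `--track-allocation` run, using `@allocated` from Base. It also tries one possible explanation, which is an assumption and not something established in this post: Julia may skip specializing a method on a function-typed argument that is only passed along rather than called directly, and a keyword call passes `index_func` through `Core.kwfunc`. `@code_warntype` always shows fully specialized code, so it would not reveal that. The names `inplace_kwarg!`, `inplace_specialized!`, and `measure` are hypothetical; the only intended difference between the two variants is the `::F ... where {F}` annotation that forces specialization.

```julia
# Same `index` as in the MWE.
function index(n, mp, m; n_max=n)
    n + mp + m + n_max
end

# Hypothetical name: the allocating version from the MWE, unchanged.
function inplace_kwarg!(a, n_max, index_func=index)
    i1 = index_func(1, 2, 3; n_max=n_max)
    i2 = size(a, 1) - 2i1
    for i in 1:i2
        a[i + i1] = a[i + i1 - 1]
    end
    for i in 3:i2-4
        a[i + i1] -= a[i + i1 - 2]
    end
end

# Hypothetical name: identical body, but the type parameter F asks Julia
# to specialize on the concrete type of `index_func` (assumed workaround).
function inplace_specialized!(a, n_max, index_func::F=index) where {F}
    i1 = index_func(1, 2, 3; n_max=n_max)
    i2 = size(a, 1) - 2i1
    for i in 1:i2
        a[i + i1] = a[i + i1 - 1]
    end
    for i in 3:i2-4
        a[i + i1] -= a[i + i1 - 2]
    end
end

# Hypothetical helper: call once to compile, then report the bytes
# allocated by a second call.
function measure(f!, n_max)
    a = randn(Float64, 100_000)
    f!(a, n_max, index)  # warm-up call so compilation is not counted
    return @allocated f!(a, n_max, index)
end

for f! in (inplace_kwarg!, inplace_specialized!)
    println(nameof(f!), ": ", measure(f!, 10), " bytes")
end
```

If the specialized variant reports far fewer bytes than the kwarg variant, that would point to missing specialization rather than to the loops themselves. The numbers are likely to depend on the Julia version, so none are quoted here.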