I’ve been running into a bizarre problem with huge allocations, and I’ve just discovered that the problem can be avoided if I don’t use a keyword argument to a function inside my main computation function. The allocations aren’t happening in a line that actually uses the kwarg; the allocations come later.
`@code_warntype` shows no problems, and specifically everything directly involved in the worst lines is just `Float64`. Even for my MWE below, the worst line (which I wouldn’t expect to allocate at all) averages ~128 B of allocation per iteration. I don’t even see that many bytes in all the variables involved in the line! Even the ranges of the `for` loops are allocating lots of memory.
But all these allocations go away if I just don’t use the keyword argument in a function call inside my big and ugly function. The function with the kwarg is type stable either way. But looking more closely at the `@code_warntype` output, I see that that function is represented as a `Core.kwfunc`. I guess this screws things up??? Should I have known this somehow? Is this a bug?
My use case is a pretty big and ugly recurrence computation, but I’ve managed to simplify it as much as possible. Here, `index` is the function with the kwarg that I’d like to use, `inplace!` is the core computation (drastically simplified here), and `compute_a` just sets things up and measures the allocations.
```julia
using Profile

function index(n, mp, m; n_max=n)
    n + mp + m + n_max
end

function inplace!(a, n_max, index_func=index)
    i1 = index_func(1, 2, 3; n_max=n_max)  # This version allocates
    # i1 = index_func(1, 2, 3)  # This version doesn't allocate
    i2 = size(a, 1) - 2i1
    for i in 1:i2  # Allocates 3182688 B if using kwarg above
        a[i + i1] = a[i + i1 - 1]  # Allocates 9573120 B if using kwarg above
    end
    for i in 3:i2-4  # Allocates 3182576 B if using kwarg above
        a[i + i1] -= a[i + i1 - 2]  # Allocates 12771408 B if using kwarg above
    end
end

function compute_a(n_max::Int64)
    a = randn(Float64, 100_000)
    inplace!(a, n_max, index)
    Profile.clear_malloc_data()
    inplace!(a, n_max, index)
end

compute_a(10)
```
[Curiously, I need both `for` loops, or the allocations disappear.]
If I just remove the kwarg from the call to `index_func`, the allocations all completely disappear; there are no allocations inside `inplace!` in that case.
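For anyone who wants to compare the two cases without running under allocation tracking, here is a minimal sketch that puts both variants side by side and measures them with `@allocated`. The names `inplace_kw!`, `inplace_nokw!`, and `measure` are made up for this illustration and are not part of my real code. I'm assuming `@allocated` on a warmed-up call gives a fair picture, and the numbers may well differ between Julia versions.

```julia
# Same `index` as in the MWE.
function index(n, mp, m; n_max=n)
    n + mp + m + n_max
end

# Hypothetical variant 1: calls `index_func` WITH the keyword argument.
function inplace_kw!(a, n_max, index_func=index)
    i1 = index_func(1, 2, 3; n_max=n_max)
    i2 = size(a, 1) - 2i1
    for i in 1:i2
        a[i + i1] = a[i + i1 - 1]
    end
    for i in 3:i2-4
        a[i + i1] -= a[i + i1 - 2]
    end
    return a
end

# Hypothetical variant 2: identical, but calls `index_func` positionally only.
function inplace_nokw!(a, n_max, index_func=index)
    i1 = index_func(1, 2, 3)
    i2 = size(a, 1) - 2i1
    for i in 1:i2
        a[i + i1] = a[i + i1 - 1]
    end
    for i in 3:i2-4
        a[i + i1] -= a[i + i1 - 2]
    end
    return a
end

# Hypothetical helper: warm up once so compilation is not counted,
# then return the bytes allocated by a second call.
function measure(f!, n_max)
    a = randn(Float64, 100_000)
    f!(a, n_max, index)
    return @allocated f!(a, n_max, index)
end

println("with kwarg:    ", measure(inplace_kw!, 10), " bytes")
println("without kwarg: ", measure(inplace_nokw!, 10), " bytes")
```

The only difference between the two variants is the call to `index_func`, so any gap between the two printed numbers should come from the kwarg call.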