Motivation
When optimizing Julia code, my first step is usually to make sure the compiler infers everything. There are great tools like @code_warntype
and Cthulhu that allow you to diagnose inference problems in a systematic and efficient way.
Often this step is enough to get decent performance. Sometimes, however, there are still lots of allocations with no obvious reason. See for instance here. Lately this consumes a significant amount of my dev time, so I would like to better understand what is going on and be more systematic about solving it.
Simple allocation model
Here is my simple allocation model. I use it to reason about my code.
- Constructing a bitstype does not allocate
- Constructing a non-bitstype does allocate
- Running code that @code_warntype complains about does allocate
Any allocation that cannot be explained by this model I will call strange.
Examples of allocations explained by the model
obvious0(x) = x
obvious0(1) # expected compilation allocation
obvious1(x) = [x]
obvious1(42) # warmup
obvious1(42) # expected allocation due to non-bitstype construction
struct MyBox; content; end
obvious2(x::MyBox) = x.content
x = MyBox(5)
obvious2(x) # warmup
obvious2(x) # expected allocation due to type instability
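For completeness, this is how I actually measure these: @allocated reports the bytes allocated by an expression, and wrapping the call in a small helper avoids extra allocations coming from the non-const globals. This is just a sketch, and the measureN helper names are made up:

measure0()    = @allocated obvious0(1)   # bitstype construction: the model predicts 0 bytes
measure1()    = @allocated obvious1(42)  # Vector construction: the model predicts > 0 bytes
measure2(box) = @allocated obvious2(box) # type-unstable field access: the model predicts > 0 bytes
measure0(); measure1(); measure2(x)      # warmup, compile the measurement helpers
measure0(), measure1(), measure2(x)      # bytes allocated by each call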
Example of strange allocation
struct TypeStableBox{T}; inner::T; end
unwrap(b::TypeStableBox) = b.inner
@noinline kw(;a,b) = unwrap(a) + unwrap(b)
function strange_kw(a,b)
    ret = unwrap(a)
    for _ in 1:1000
        ret += kw(a=a,b=b)
    end
    ret
end
a = TypeStableBox(1)
b = TypeStableBox(2)
strange_kw(a,b) # expected allocation due to compilation
@time strange_kw(a,b) # strange allocation
# 0.000019 seconds (2.00 k allocations: 31.422 KiB)
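That works out to 2 allocations and about 32 bytes per loop iteration, so the allocation apparently happens on every kw call even though nothing here is type unstable. To confirm, I measure a single call, again wrapped in a helper so the non-const globals don't interfere (single_kw is just a name I made up for this sketch):

single_kw(a, b) = @allocated kw(a = a, b = b) # bytes allocated by one keyword-argument call
single_kw(a, b) # warmup
single_kw(a, b)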
Another example of strange allocation
struct TypeStableBox{T}; inner::T; end
unwrap(b::TypeStableBox) = b.inner
@noinline argsplat(ab...) = _argsplat(ab...)
@noinline _argsplat(a,b) = unwrap(a) + unwrap(b)
function strange_argsplat(a,b)
    ret = unwrap(a)
    for _ in 1:1000
        ret += argsplat(a,b)
    end
    ret
end
a = TypeStableBox(1)
b = TypeStableBox(2)
strange_argsplat(a,b) # expected allocation due to compilation
@time strange_argsplat(a,b) # strange allocation
# 0.000044 seconds (2.00 k allocations: 31.422 KiB)
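Again this works out to 2 allocations (~32 bytes) per loop iteration. One tool I have started using to attribute such allocations is the allocation profiler available in Julia ≥ 1.8; a sketch of how I use it (sample_rate=1 records every allocation):

using Profile
strange_argsplat(a, b)                     # warmup, so compilation allocations are not recorded
Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=1 strange_argsplat(a, b)
prof = Profile.Allocs.fetch()
length(prof.allocs)                        # number of recorded allocations
prof.allocs[1].type, prof.allocs[1].size   # type and size of the first one
# prof.allocs[i].stacktrace shows where each allocation came from; PProf.jl can visualize the result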
So here are a couple of questions:
- What would be a better allocation model?
- Given code that has strange allocations, how can I diagnose where they occur?
- Can I see symptoms/warning signs of strange allocations in @code_warntype?
- How do other people debug these?
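For reference, the other thing I have tried so far is line-by-line allocation tracking with --track-allocation, roughly as below (a sketch; track.jl is a made-up script name containing the code above). I am not sure this is the most systematic approach, hence the questions.

# $ julia --track-allocation=user track.jl
using Profile
strange_kw(a, b); strange_argsplat(a, b)   # warmup, so compilation is not attributed to source lines
Profile.clear_malloc_data()                # reset the allocation counters after warmup
strange_kw(a, b); strange_argsplat(a, b)   # the runs we actually want attributed
# after Julia exits, per-line byte counts appear in track.jl.<pid>.mem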