Debugging strange allocations

Motivation

When optimizing Julia code, my first step is usually to make sure the compiler infers everything. There are great tools like @code_warntype and Cthulhu that let you diagnose inference problems in a systematic and efficient way.

Often this step is enough to get decent performance. Sometimes, however, there are still lots of allocations with no obvious reason; see for instance here. Lately this has been consuming a significant amount of my dev time, so I would like to better understand what's going on and be more systematic about solving it.

Simple allocation model

Here is the simple allocation model I use to reason about my code:

  • Constructing a bitstype does not allocate.
  • Constructing a non-bitstype does allocate.
  • Running code that @code_warntype complains about does allocate.

Any allocation that this model cannot explain, I will call strange.

Examples of allocations explained by the model

obvious0(x) = x
obvious0(1)  # expected compilation allocation
obvious1(x) = [x]
obvious1(42) # warmup
obvious1(42) # expected allocation due to non-bitstype construction
struct MyBox; content; end
obvious2(x::MyBox) = x.content
x = MyBox(5)
obvious2(x) # warmup
obvious2(x) # expected allocation due to type instability
Example of strange allocation: keyword arguments
struct TypeStableBox{T}; inner::T; end
unwrap(b::TypeStableBox) = b.inner
@noinline kw(;a,b) = unwrap(a) + unwrap(b)
function strange_kw(a,b)
    ret = unwrap(a)
    for _ in 1:1000
        ret += kw(a=a,b=b)
    end
    ret
end
a = TypeStableBox(1)
b = TypeStableBox(2)
strange_kw(a,b)  # expected allocation due to compilation
@time strange_kw(a,b) # strange allocation
# 0.000019 seconds (2.00 k allocations: 31.422 KiB)
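For contrast, here is a minimal sketch of the same loop with positional arguments (kwfree and not_strange_kw are my names, not part of the original example); if the keyword call is what allocates, this variant should report zero allocations after warmup:

@noinline kwfree(a,b) = unwrap(a) + unwrap(b)  # hypothetical positional twin of kw
function not_strange_kw(a,b)
    ret = unwrap(a)
    for _ in 1:1000
        ret += kwfree(a,b)  # positional call, no keyword sorter involved
    end
    ret
end
not_strange_kw(a,b)        # warmup
@time not_strange_kw(a,b)  # expect no allocations if the keyword call was the cause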
Example of strange allocation: argument splatting
struct TypeStableBox{T}; inner::T; end
unwrap(b::TypeStableBox) = b.inner
@noinline argsplat(ab...) = _argsplat(ab...)
@noinline _argsplat(a,b) = unwrap(a) + unwrap(b)
function strange_argsplat(a,b)
    ret = unwrap(a)
    for _ in 1:1000
        ret += argsplat(a,b)
    end
    ret
end
a = TypeStableBox(1)
b = TypeStableBox(2)
strange_argsplat(a,b)  # expected allocation due to compilation
@time strange_argsplat(a,b) # strange allocation
#   0.000044 seconds (2.00 k allocations: 31.422 KiB)
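Again for contrast, a minimal sketch (not_strange_argsplat is my name) that calls _argsplat directly, skipping the vararg intermediary; if splatting is the cause, this should not allocate after warmup:

function not_strange_argsplat(a,b)
    ret = unwrap(a)
    for _ in 1:1000
        ret += _argsplat(a,b)  # direct call, no splatting
    end
    ret
end
not_strange_argsplat(a,b)        # warmup
@time not_strange_argsplat(a,b)  # expect no allocations if the splat was the cause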

So here are a couple of questions:

  • What would be a better allocation model?
  • Given code that has strange allocations, how can I diagnose where they occur? (one approach is sketched after this list)
  • Can I see symptoms/warning signs of strange allocations in @code_warntype?
  • How do other people debug these?
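For the diagnosis question, one approach I know of is Julia's built-in allocation tracking; a sketch of the workflow (the per-line numbers in the .mem output will of course vary):

# Start Julia with allocation tracking enabled:
#   julia --track-allocation=user
using Profile
strange_kw(a,b)              # run once so compilation allocations happen now
Profile.clear_malloc_data()  # discard everything tracked so far
strange_kw(a,b)              # the run whose allocations we want to attribute
# After exiting Julia, each source file gains a companion .mem file that
# annotates every line with the number of bytes allocated there.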

I cannot reproduce.

julia> struct TypeStableBox{T}; inner::T; end

julia> unwrap(b::TypeStableBox) = b.inner
unwrap (generic function with 1 method)

julia> @noinline kw(;a,b) = unwrap(a) + unwrap(b)
kw (generic function with 1 method)

julia> function strange_kw(a,b)
           ret = unwrap(a)
           for _ in 1:1000
               ret += kw(a=a,b=b)
           end
           ret
       end
strange_kw (generic function with 1 method)

julia> a = TypeStableBox(1)
TypeStableBox{Int64}(1)

julia> b = TypeStableBox(2)
TypeStableBox{Int64}(2)

julia> strange_kw(a,b)  # expected allocation due to compilation
3001

julia> @time strange_kw(a,b)
  0.000004 seconds (5 allocations: 176 bytes)
3001

julia> using BenchmarkTools

julia> @benchmark strange_kw($a,$b)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.119 μs (0.00% GC)
  median time:      1.123 μs (0.00% GC)
  mean time:        1.177 μs (0.00% GC)
  maximum time:     2.970 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

In general, you will see allocations when dynamic dispatch happens.


Dynamic dispatch is covered by the "@code_warntype complains" rule. Interesting to know that you can’t reproduce. I am currently on Julia 1.1.0:

versioninfo()

Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, ivybridge)
Environment:
  JULIA_NUM_THREADS = 8

I was referring to:

@time strange_kw(a,b)

in the global scope as being the dynamic dispatch.
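To illustrate (a sketch; ca and cb are my names): a and b are non-const globals, so their types are unknown at the top-level call site and the call dispatches dynamically, whereas const globals (or $-interpolation in BenchmarkTools) give the call site concrete types:

@time strange_kw(a,b)    # non-const globals: dynamic dispatch at top level
const ca = TypeStableBox(1)
const cb = TypeStableBox(2)
@time strange_kw(ca,cb)  # types are now known at the call site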
I’m on:

julia> versioninfo()
Julia Version 1.3.0-DEV.428
Commit 863e03fbb1* (2019-06-19 15:07 UTC)
Platform Info:
  OS: Linux (x86_64-generic-linux)
  CPU: Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.0 (ORCJIT, skylake)

So there’s a chance things are different in 1.3.
Can you post the result when you use BenchmarkTools like I did?


It is weird: I get no allocations when running @benchmark the second time. I was pretty sure that @benchmark does a warmup call, so that no compilation overhead gets reported?

julia> struct TypeStableBox{T}; inner::T; end

julia> unwrap(b::TypeStableBox) = b.inner
unwrap (generic function with 1 method)

julia> @noinline kw(;a,b) = unwrap(a) + unwrap(b)
kw (generic function with 1 method)

julia> function strange_kw(a,b)
           ret = unwrap(a)
           for _ in 1:1000
               ret += kw(a=a,b=b)
           end
           ret
       end
strange_kw (generic function with 1 method)

julia> a = TypeStableBox(1)
TypeStableBox{Int64}(1)

julia> b = TypeStableBox(2)
TypeStableBox{Int64}(2)

julia> using BenchmarkTools

julia> @benchmark strange_kw($a,$b)
BenchmarkTools.Trial: 
  memory estimate:  31.25 KiB
  allocs estimate:  2000
  --------------
  minimum time:     15.132 μs (0.00% GC)
  median time:      16.334 μs (0.00% GC)
  mean time:        23.670 μs (24.53% GC)
  maximum time:     41.522 ms (99.88% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark strange_kw($a,$b)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.777 μs (0.00% GC)
  median time:      1.777 μs (0.00% GC)
  mean time:        1.824 μs (0.00% GC)
  maximum time:     3.853 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

That is weird. I don’t know why it would still allocate after warmup.
I’m also fairly sure that BenchmarkTools does a warmup call. Your first @benchmark was 10_000 samples anyway, of which the minimum was 15 microseconds.
Your second benchmark was also 10_000 samples, this time with 10 evals/sample, and the max was < 3.9 microseconds, with a min and median of 1.77 microseconds.
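If you want to separate warmup and tuning from the measurement explicitly, BenchmarkTools supports that; a sketch:

using BenchmarkTools
bench = @benchmarkable strange_kw($a,$b)
tune!(bench)  # picks evals/sample and warms the benchmark up
run(bench)    # only the measured runs, after tuning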