Using a runtime-length `SVector` within a closure allocates memory

Consider the Test struct below.

julia> using StaticArrays

julia> struct Test{T,V}
           x::Int
           v::V
           Test(x::Int, v::V) where {T<:Real,V<:AbstractVector{T}} = new{T,V}(x, v)
       end

Let’s write a helper method called _make_test that does some work and then returns a function for constructing a Test:

julia> function _make_test(::Type{T}=Float64) where {T<:Real}
           x = rand(1:10)
           v = SVector{x,T}([i for i in 1:x])
           () -> Test(x, v)
       end
_make_test (generic function with 2 methods)

julia> let
           make_test_Float16 = _make_test(Float16)
           @allocated make_test_Float16()
       end
88088

To my surprise, this approach allocates memory. However, if I replace the SVector with a regular vector, then the construction of Test no longer allocates any memory:

julia> function _make_test(::Type{T}=Float64) where {T<:Real}
           x = rand(1:10)
           v = T[i for i in 1:x]
           () -> Test(x, v)
       end
_make_test (generic function with 2 methods)

julia> let
           make_test_Float16 = _make_test(Float16)
           @allocated make_test_Float16()
       end
0

I have two questions:

  1. Why does the first approach allocate memory but the second one does not?
  2. Can we make the first approach non-allocating?

Thanks in advance!

What’s happening is that the compiler can’t know the length of v when it creates the closure (since it’s dynamic), so it also can’t compile a method ahead of time which specializes on the type of v. Hence any code using v inside make_test_Float16 has to allocate and dispatch dynamically.
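The key difference is that an SVector’s length is a type parameter, while a Vector’s length is just a runtime property. Here is a minimal sketch of that distinction using plain tuples (which, like SVector, encode their length in the type) so it runs without StaticArrays:

```julia
# Tuples, like SVector, carry their length in the type, so values of
# different runtime lengths have *different* concrete types:
t3 = (1, 2, 3)
t5 = (1, 2, 3, 4, 5)
println(typeof(t3) == typeof(t5))  # false: NTuple{3,Int} vs NTuple{5,Int}

# A Vector's length is not a type parameter, so the closure's captured
# field has a single concrete type regardless of length:
v3 = [1, 2, 3]
v5 = [1, 2, 3, 4, 5]
println(typeof(v3) == typeof(v5))  # true: both are Vector{Int}
```

Since x = rand(1:10) is only known at runtime, SVector{x,T} is a different concrete type on every call, and the closure’s captured v cannot be inferred concretely within the same compilation unit.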

Within _make_test, the length of the vector v is determined at runtime with x = rand(1:10). I understand we can only know x at runtime after calling _make_test, but I cannot understand why the closure wouldn’t be aware of the length of the vector (after calling _make_test(Float16) but before evaluating the closure with make_test_Float16()).

It can do that, but since the call to make_test_Float16() is in the same compilation unit as the dynamic part, it doesn’t. You can make it do that by creating a function barrier though:

julia> function call_barrier(f::F, args...) where {F}
           f(args...)
       end;

julia> let
           make_test_Float16 = _make_test(Float16)
           call_barrier(make_test_Float16) do f
               @allocated f() 
           end
       end
0

Since call_barrier needs to specialize on the type of make_test_Float16, it forces its type to be resolved before it is called.

See also: this section of the performance tips
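Regarding question 2, a common pattern is to lift the length into the type domain by passing it as a Val, so the closure captures a concretely-typed value. The sketch below uses a hypothetical make_summer and a plain tuple standing in for an SVector, to keep it dependency-free; note that if the Val is built from a runtime rand, you still need a function barrier at the call site for the same reason as above:

```julia
# Hypothetical sketch: lift the length N into the type domain via Val{N}.
# Inside this function N is a compile-time constant, so the captured
# tuple (playing the role of an SVector) has a concrete type.
function make_summer(::Val{N}) where {N}
    t = ntuple(identity, Val(N))  # like SVector{N}: length is in the type
    () -> sum(t)
end

f = make_summer(Val(3))
f()  # 6
```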


Thanks for your help! I don’t know what you meant by “compilation unit”, but I’m guessing it has to do with the way Julia code gets converted into optimised machine code: in my original snippet the closure is part of the same unit as the assignment to x, so they are effectively “tied” together and the compiler cannot specialise on the length of the vector.

Introducing the function barrier is a way of explicitly splitting the x = rand(1:10); v = T[i for i in 1:x] and () -> Test(x, v) into two distinct units that the compiler can optimise separately.

I benefitted from trying to rewrite your solution without the do-block syntax, so I’m sharing the rewrites below for future reference.

Using the call_barrier and a named function:

julia> let
           make_test_Float16 = _make_test(Float16)
           execute_function(f) = @allocated f()
           call_barrier(execute_function, make_test_Float16)
       end

Using the call_barrier and an anonymous function:

julia> let
           make_test_Float16 = _make_test(Float16)
           call_barrier((f -> @allocated f()), make_test_Float16)
       end

Using an (inner) anonymous function wrapped in an (outer) anonymous function (which in the previous examples would be call_barrier):

julia> let
           make_test_Float16 = _make_test(Float16)
           (() -> @allocated make_test_Float16())()
       end

Yeah, that’s right. A compilation unit is basically whatever chunk of code we send to LLVM to make a compiled binary out of. A toplevel let block is often a compilation unit, and functions are often compilation units, but functions can also be inlined so that they’re effectively just one function from LLVM’s point of view.

Because of stuff like let blocks and inlining, it’s typically better to use the term ‘compilation unit’ if you want to talk about “one thing” that the compiler is looking at.

Here’s a little example:

julia> f(x) = x + 1;

julia> outer1(x) = f(x) - 1; 

julia> outer2(x) = @noinline(f(x)) - 1;

julia> code_llvm(outer1, Tuple{Int}; debuginfo=:none)
; Function Signature: outer1(Int64)
define i64 @julia_outer1_4124(i64 signext %"x::Int64") #0 {
top:
  ret i64 %"x::Int64"
}

julia> code_llvm(outer2, Tuple{Int}; debuginfo=:none)
; Function Signature: outer2(Int64)
define i64 @julia_outer2_4137(i64 signext %"x::Int64") #0 {
top:
  %0 = call i64 @j_f_4140(i64 signext %"x::Int64")
  %1 = add i64 %0, -1
  ret i64 %1
}

(here’s what that looks like before being sent to LLVM):

julia> code_typed(outer1, Tuple{Int})
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.add_int(x, 1)::Int64
│   %2 = Base.sub_int(%1, 1)::Int64
└──      return %2
) => Int64

julia> code_typed(outer2, Tuple{Int})
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = invoke Main.f(x::Int64)::Int64
│   %2 = Base.sub_int(%1, 1)::Int64
└──      return %2
) => Int64