julia> using StaticArrays
julia> struct Test{T,V}
           x::Int
           v::V
           Test(x::Int, v::V) where {T<:Real,V<:AbstractVector{T}} = new{T,V}(x, v)
       end
Let’s write a helper method called _make_test that does some work and then returns a function for constructing a Test:
julia> function _make_test(::Type{T}=Float64) where {T<:Real}
           x = rand(1:10)
           v = SVector{x,T}([i for i in 1:x])
           () -> Test(x, v)
       end
_make_test (generic function with 2 methods)
julia> let
           make_test_Float16 = _make_test(Float16)
           @allocated make_test_Float16()
       end
88088
To my surprise, this approach allocates memory. However, if I replace the SVector with a regular vector, then the construction of Test no longer allocates any memory:
julia> function _make_test(::Type{T}=Float64) where {T<:Real}
           x = rand(1:10)
           v = T[i for i in 1:x]
           () -> Test(x, v)
       end
_make_test (generic function with 2 methods)
julia> let
           make_test_Float16 = _make_test(Float16)
           @allocated make_test_Float16()
       end
0
I have two questions:
Why does the first approach allocate memory but the second one does not?
What’s happening is that the length of v is dynamic, so the compiler can’t know it when it creates the closure. With an SVector the length is part of the type (SVector{x,T}), whereas a Vector{T} has the same type whatever its length. The compiler therefore can’t compile a method ahead of time that specializes on the type of v, and any code using v inside make_test_Float16 has to allocate and dynamically dispatch.
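To make the difference concrete, here is a minimal sketch that needs no packages. It is not the code from the question: make_tuple_closure and make_vector_closure are hypothetical stand-ins for the two _make_test variants, the length is passed in as an argument so the comparison is reproducible, and an NTuple is used in place of an SVector on the assumption that it illustrates the same point, since both carry their length in their type.

```julia
# Hypothetical stand-in for the SVector variant: NTuple{N,T} stores its length N
# in the type, so the closure's type depends on the runtime value of x.
function make_tuple_closure(x::Int)
    v = ntuple(i -> Float16(i), x)
    return () -> (x, v)
end

# Hypothetical stand-in for the Vector variant: Vector{Float16} is the same type
# for every length, so the closure's type does not depend on x.
function make_vector_closure(x::Int)
    v = Float16[i for i in 1:x]
    return () -> (x, v)
end

# Compare the closure types produced for two different lengths.
same_type_tuple = typeof(make_tuple_closure(2)) == typeof(make_tuple_closure(3))
same_type_vector = typeof(make_vector_closure(2)) == typeof(make_vector_closure(3))

println("tuple closures share a type:  ", same_type_tuple)
println("vector closures share a type: ", same_type_vector)
```

In the tuple case the closure type changes with the length, so code that creates and calls the closure in one place cannot be compiled for a single concrete type. In the vector case there is only one closure type to compile for.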
Within _make_test, the length of the vector v is determined at runtime by x = rand(1:10). I understand that we can only know x at runtime, after calling _make_test, but I cannot understand why the closure wouldn’t be aware of the length of the vector once _make_test(Float16) has returned and before the closure is evaluated with make_test_Float16().
It can do that, but since the call to make_test_Float16() is in the same compilation unit as the dynamic part, it doesn’t. You can make it do that by creating a function barrier though:
julia> function call_barrier(f::F, args...) where {F}
           f(args...)
       end;
julia> let
           make_test_Float16 = _make_test(Float16)
           call_barrier(make_test_Float16) do f
               @allocated f()
           end
       end
0
Since call_barrier needs to specialize on the type of make_test_Float16, that type is forced to be resolved before the code behind the barrier runs.
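One detail of the barrier call that is easy to misread: a do block passes its body as the first argument, so inside call_barrier the do-block body is f and the closure is the only element of args. The sketch below shows the equivalence with an explicit anonymous function. make_pair is a hypothetical stand-in for make_test_Float16, used so that the example runs without StaticArrays.

```julia
function call_barrier(f::F, args...) where {F}
    f(args...)
end

# Hypothetical stand-in for make_test_Float16: a closure over two captured values.
make_pair = let x = 3, v = (1.0, 2.0, 3.0)
    () -> (x, v)
end

# do-block form, as used in the barrier call with @allocated.
with_do = call_barrier(make_pair) do f
    f()
end

# The same call with the do-block body written as an explicit first argument.
explicit = call_barrier(f -> f(), make_pair)

println(with_do == explicit)
```

Either spelling puts the call f() inside a separate function that receives the closure as an argument, which is what makes it a barrier.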
Thanks for your help! I don’t know what you meant by “compilation unit”, but I’m guessing that this has to do with the way Julia code gets converted into optimised machine code. In my original snippet the closure is part of the same unit as the assignment operation for x, so they are effectively “tied” together and the compiler optimisation is not able to specialise on the length of the vector.
Introducing the function barrier is a way of explicitly splitting the x = rand(1:10); v = T[i for i in 1:x] and () -> Test(x, v) into two distinct units that the compiler can optimise separately.
Yeah, that’s right. A compilation unit is basically whatever chunk of code we send to LLVM to make a compiled binary out of. A top-level let block is often a compilation unit, and functions are often compilation units, but functions can also be inlined so that they’re actually just ‘one’ function from LLVM’s point of view.
Because of stuff like let blocks and inlining, it’s typically better to use the term ‘compilation unit’ if you want to talk about “one thing” that the compiler is looking at.
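As a rough illustration of the inlining point, Julia lets you annotate a function either way. This is only a sketch with hypothetical helper names (apply_separate, apply_merged, make_sum), none of which come from the thread. @noinline asks the compiler to keep the function as a real call, and @inline is a hint that its body may be merged into the caller.

```julia
# Hypothetical helpers, not from the thread.
@noinline apply_separate(f::F) where {F} = f()  # kept as a separate call
@inline apply_merged(f::F) where {F} = f()      # may be merged into the caller

# A small closure factory to have something to call.
make_sum(v) = () -> sum(v)

separate_result = apply_separate(make_sum([1, 2, 3]))
merged_result = apply_merged(make_sum([1, 2, 3]))

# The results are identical; only how the compiled code is organised can differ.
println(separate_result == merged_result)
```

To see what actually happened for a given call, you can inspect it with @code_typed or @code_llvm in the REPL.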