I have spent several hours trying to eliminate a very elusive allocation in a package that I intend to run on a GPU, where any allocation is fatal. I have tried all the standard methods to track it down (@code_warntype, --track-allocation=user) without success. I have managed to produce a minimal reproducer, in case somebody could have a look and suggest an idea. The central object is MyState, a mutable struct that behaves as a stack. Here is the code:
Reproducer code
using StaticArrays
using Rotations
using BenchmarkTools

const Point3 = SVector{3}

mutable struct MyState{T<:AbstractFloat}
    currentDepth::Int64
    volstack::SVector{32,UInt32}
    function MyState{T}() where T<:AbstractFloat
        state = new{T}()
        reset!(state)
        return state
    end
end

function reset!(state::MyState{T}) where T<:AbstractFloat
    state.currentDepth = 0
    state.volstack = zero(SVector{32,UInt32})
    nothing
end

function pushIn!(state::MyState{T}, vol::Int64) where T<:AbstractFloat
    state.currentDepth += 1
    state.volstack = setindex(state.volstack, vol, state.currentDepth)
    nothing
end

function popOut!(state::MyState{T}) where T<:AbstractFloat
    if state.currentDepth > 0
        state.currentDepth -= 1
    end
    nothing
end

function MylocateGlobalPoint!(state::MyState{T}, point::Point3{T}) where T<:AbstractFloat
    reset!(state)
    if point[1] > 0.
        #return 2
        pushIn!(state,999)
    end
    return state.currentDepth
end

function test(point::Point3{T}) where T<:AbstractFloat
    state = MyState{T}()
    MylocateGlobalPoint!(state, point)
end

point = Point3{Float64}(1,1,1)
@btime test(point)
Running it I obtain 1 allocation of 144 bytes (probably the size of MyState). The strange thing is that tracing the allocation tells me that it happens in the object constructor, in the line state = MyState{T}(). However, if I comment out the line pushIn!(state,999) and uncomment return 2 in the function MylocateGlobalPoint!(...), there is no allocation.
I could perhaps understand that the type MyState is badly implemented, using a static vector to emulate a stack, but what I do not understand is why the behaviour changes depending on which function is called after the instantiation.
Isn't that the definition of "allocation"... I mean, allocation on the stack is still allocation, and once the immutable object doesn't fit on the stack, it will be allocated on the heap; it's not a guarantee Julia makes about immutable objects.
I would guess there is a difference between creating the object in the global space (interpreter) or in a function.
julia> function f()
           state = MyState{Float64}()
           state.currentDepth
       end
f (generic function with 1 method)
julia> @btime f()
  1.559 ns (0 allocations: 0 bytes)
0
I don't think that has anything to do with it. In that case your code that returns the Int should also allocate.
I think state is allocated on the heap, but when you just pick out and return the one bitstype field, without using anything else, the rest of the object is optimized away, because it's not needed.
This can happen with MArrays too: if you convert them into an SArray, they can also be "optimized away".
I do not think it can optimise everything away. Have a look at
julia> function f()
           state = MyState{Float64}()
           pushIn!(state,9)
           pushIn!(state,99)
           return sum(state.volstack)
       end
f (generic function with 1 method)
julia> @btime f()
  2.624 ns (0 allocations: 0 bytes)
0x0000006c
julia> function f()
           state = MyState{Float64}()
           pushIn!(state,9)
           pushIn!(state,99)
           return sum(state.volstack)
       end
f (generic function with 1 method)
julia> @code_llvm f()
; @ REPL[4]:1 within `f`
; Function Attrs: uwtable
define i32 @julia_f_573() #0 {
L188:
; @ REPL[4]:5 within `f`
ret i32 108
}
Thank you very much, but I do not need to return the State, mutable or immutable. The State is created in a function, used in subsequent functions and loops, and can be destroyed on return. It really is a temporary of fixed size, and this is exactly why allocating it on the stack is ideal. I really do not understand why Julia does not do it. It is ridiculous that in order to avoid allocations I need to pass a State instance (i.e. a temporary working object) in from the caller.
If State is mutable and it is passed to other functions (within the scope where it was created; I am not talking about returning it), then Julia probably loses track of it and cannot rule out that, for example, one of these functions has saved it in a global variable. To store something mutable on the stack, Julia needs to be sure that the reference cannot escape by any means. That includes both global variables and the object becoming the value of a field in another structure that is returned.
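One way to sidestep the escape question entirely is to make the state immutable and have each operation return a new value instead of mutating in place. The following is only a sketch under that assumption: ImmState, pushIn, popOut and locate are hypothetical names that are not part of the original reproducer, and I have not checked how it behaves on a GPU.

```julia
using StaticArrays

# Hypothetical immutable counterpart of MyState (name and layout are assumptions).
struct ImmState
    currentDepth::Int
    volstack::SVector{32,UInt32}
end

# Empty stack: depth 0 and a zeroed volume stack.
ImmState() = ImmState(0, zero(SVector{32,UInt32}))

# Return a new state with `vol` pushed, instead of mutating the old one.
function pushIn(s::ImmState, vol::Integer)
    depth = s.currentDepth + 1
    return ImmState(depth, setindex(s.volstack, UInt32(vol), depth))
end

# Return a new state with the top entry dropped (no-op on an empty stack).
popOut(s::ImmState) = s.currentDepth > 0 ? ImmState(s.currentDepth - 1, s.volstack) : s

# Same logic as MylocateGlobalPoint!, but the state is rebound rather than mutated.
function locate(point::SVector{3,T}) where {T<:AbstractFloat}
    s = ImmState()
    if point[1] > 0
        s = pushIn(s, 999)
    end
    return s.currentDepth
end
```

Since ImmState is an isbits type, there is no reference that could escape, so the compiler should be free to keep it in registers or on the stack. Calling @allocated locate(SVector(1.0, 1.0, 1.0)) after a warm-up call is one way to check whether that actually happens in your setup.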
Thanks very much. I understand that Julia needs to rule out any leak, but in the original reproducer I do not see what makes Julia think that a reference is escaping and that the State object therefore needs to be heap-allocated. Probably what happens is that, to be absolutely sure, it always allocates on the heap. Is this correct? So, is there no way to have a mutable object on the stack? Is there a pragma or something similar to tell the compiler how to allocate the object?