Hello,
I’m new to Julia, and I have been looking into reducing some memory allocations due to heterogeneous storage and dynamic dispatch. I’ve encountered a result that I find puzzling. I’ve built an almost-MWE to illustrate.
using Profile

# Abstract supertype. All subtypes are expected to provide a get_value function
abstract type AbstractTestWrapper end

@noinline function get_value(w::AbstractTestWrapper)
    error("Implement get_value")
end

@noinline function process_wrappers(ws::Vector{AbstractTestWrapper})::Int
    # Were this not a MWE, something important would be done with 'v'
    v::Int = 0
    for i in 1:length(ws)
        v += get_value(ws[i])
    end
    return v
end

# Simple subtype: Stores the value to be returned by get_value
mutable struct TestWrapper{T} <: AbstractTestWrapper
    value::Int
    state::T
end

@noinline function get_value(w::TestWrapper{T})::Int where {T<:Any}
    return w.value
end

function test_wrappers()
    w1::TestWrapper = TestWrapper{Int}(256,0)
    w2::TestWrapper = TestWrapper{Int}(512,0)
    # Vector storage of (w1,w2) as supertype
    # This obfuscation seems to be necessary to illustrate the inconsistency
    warr1::Vector{AbstractTestWrapper} = Vector{AbstractTestWrapper}(undef,0)
    warr2::Vector{AbstractTestWrapper} = Vector{AbstractTestWrapper}(undef,0)
    push!(warr1,w1)
    push!(warr2,w2)
    # Run once to compile
    process_wrappers(warr1)
    # Measure allocations multiple times
    for i = 1:2
        for j = 1:4
            @time process_wrappers((i==1) ? warr1 : warr2)
        end
    end
end
I’ve defined an abstract supertype, with a parametric subtype.
The function process_wrappers receives a vector of the supertype, and calls the get_value function on each element in turn.
The puzzling part is the memory allocations that result from the code in test_wrappers.
I would expect, due to dynamic dispatch, that every call to get_value in the process_wrappers function would require a heap allocation, as the return type is not known in advance. Yet whether an allocation occurs actually appears to depend on the value being returned.
My output from the test_wrappers function is:
0.000002 seconds
0.000000 seconds
0.000000 seconds
0.000002 seconds
0.000000 seconds (1 allocation: 16 bytes)
0.000000 seconds (1 allocation: 16 bytes)
0.000000 seconds (1 allocation: 16 bytes)
0.000000 seconds (1 allocation: 16 bytes)
Playing around with the initialisation of w1 and w2, it seems that if the value field is 512 or above, a heap allocation occurs. Smaller values get away with 0 allocations.
If I switch the types from Int to UInt, the cutoff becomes 1024 (0x400).
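For anyone who wants to reproduce the cutoff, here is a minimal sketch of how the sweep could be done. It is self-contained and deliberately uses hypothetical names (AbstractProbe, Probe, get_probe_value, sum_values, bytes_for) so it does not clash with the definitions above. It assumes that @allocated on a warmed-up call reports the same allocations that @time shows. The exact byte counts may differ between Julia versions, so no expected output is given.

```julia
# Hypothetical stand-ins for AbstractTestWrapper / TestWrapper from the post.
abstract type AbstractProbe end

mutable struct Probe{T} <: AbstractProbe
    value::Int
    state::T
end

# Fallback plus concrete method, mirroring the structure of the original example.
@noinline get_probe_value(w::AbstractProbe) = error("Implement get_probe_value")
@noinline get_probe_value(w::Probe)::Int = w.value

# Sums the values through the abstractly typed vector, so each call to
# get_probe_value is resolved by dynamic dispatch.
@noinline function sum_values(ws::Vector{AbstractProbe})::Int
    v::Int = 0
    for i in 1:length(ws)
        v += get_probe_value(ws[i])
    end
    return v
end

# Returns the number of bytes allocated by one warmed-up call to sum_values
# on a single-element vector whose element stores `value`.
function bytes_for(value::Int)
    ws = AbstractProbe[Probe{Int}(value, 0)]
    sum_values(ws)                 # run once so compilation is not measured
    return @allocated sum_values(ws)
end

# Probe values on both sides of the cutoffs reported in the post.
for value in (255, 256, 511, 512, 1023, 1024)
    println(value, " => ", bytes_for(value), " bytes")
end
```

Changing the value field and the argument of bytes_for to UInt should allow the same sweep for the unsigned case.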
My current working theory (entirely speculative) is that small enough values can’t possibly be valid memory addresses, which rules out all but primitive types, eliminating the need to consider return types that would mandate a heap allocation.
Are there any other explanations? Something less low-level, perhaps?
Thanks in advance!