Memory allocations with dynamic dispatch may vary with return value

Hello,
I’m new to Julia, and I have been looking into reducing some memory allocations due to heterogeneous storage and dynamic dispatch. I’ve encountered a result that I find puzzling. I’ve built an almost-MWE to illustrate.

using Profile

# Abstract supertype. All subtypes are expected to provide a get_value function
abstract type AbstractTestWrapper end

@noinline function get_value(w::AbstractTestWrapper)
    error("Implement get_value")
end

@noinline function process_wrappers(ws::Vector{AbstractTestWrapper})::Int
    # Were this not a MWE, something important would be done with 'v'
    v::Int = 0
    for i in 1:length(ws)
        v += get_value(ws[i])
    end

    return v
end

# Simple subtype: Stores the value to be returned by get_value
mutable struct TestWrapper{T} <: AbstractTestWrapper
    value::Int
    state::T
end

@noinline function get_value(w::TestWrapper{T})::Int where {T<:Any}
    return w.value
end



function test_wrappers()
    w1::TestWrapper = TestWrapper{Int}(256,0)
    w2::TestWrapper = TestWrapper{Int}(512,0)

    # Vector storage of (w1,w2) as supertype
    # This obfuscation seems to be necessary to illustrate the inconsistency
    warr1::Vector{AbstractTestWrapper} = Vector{AbstractTestWrapper}(undef,0)
    warr2::Vector{AbstractTestWrapper} = Vector{AbstractTestWrapper}(undef,0)

    push!(warr1,w1)
    push!(warr2,w2)

    # Run once to compile
    process_wrappers(warr1)

    # Measure allocations multiple times
    for i = 1:2
        for j = 1:4
            @time process_wrappers((i==1) ? warr1 : warr2)
        end
    end

end

I’ve defined an abstract supertype with a parametric subtype.
The function process_wrappers receives a vector of the supertype and calls get_value on each element in turn.

The puzzling part is the resulting memory allocations from the code in test_wrappers.

I would expect, due to dynamic dispatch, that every call to get_value in process_wrappers should require a heap allocation, since the return type is not known in advance. Yet whether an allocation happens actually appears to be contingent on the value being returned.

My output from the test_wrappers function is:

  0.000002 seconds
  0.000000 seconds
  0.000000 seconds
  0.000002 seconds
  0.000000 seconds (1 allocation: 16 bytes)
  0.000000 seconds (1 allocation: 16 bytes)
  0.000000 seconds (1 allocation: 16 bytes)
  0.000000 seconds (1 allocation: 16 bytes)

Playing around with how w1 and w2 are initialised, it seems that a heap allocation occurs if the value field is 512 or above; smaller values get away with 0 allocations.
If I switch the types from Int to UInt, the cutoff becomes 1024 (0x400).
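
To make the cutoff easier to poke at, here is a minimal probe built on the definitions above (probe is just an illustrative helper; the byte counts follow from the @time output shown above):

# Returns the bytes allocated by one dynamically dispatched call,
# for a wrapper storing the given value.
function probe(value::Int)
    warr = AbstractTestWrapper[TestWrapper{Int}(value, 0)]
    process_wrappers(warr)                    # warm up / compile first
    return @allocated process_wrappers(warr)
end

probe(511)   # 0 bytes, per the cutoff described above
probe(512)   # 16 bytes (1 allocation), per the cutoff described above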

My current working theory (entirely speculative) is that small enough values can’t possibly be valid memory addresses, which rules out all but primitive types, eliminating the need to consider return types that would mandate a heap allocation.

Are there any other explanations? Something less low-level, perhaps?

Thanks in advance!

The allocation you’re seeing is a benchmarking artifact, caused by Julia boxing the return value. You aren’t seeing it for small numbers because Julia keeps a cache of pre-boxed small Ints (covering roughly -512:511).
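
A self-contained sketch of the same effect, independent of the wrapper types (assuming v1.10 behaviour; Base.inferencebarrier just hides the type from the compiler, so the returned Int has to be boxed at that boundary):

# Return type is hidden from inference, then asserted back to Int.
unstable(x) = Base.inferencebarrier(x)::Int

measure(x) = @allocated unstable(x)

measure(0)         # warm up
measure(511)       # typically 0 bytes: the box comes from the pre-boxed small-Int cache
measure(100_000)   # typically 16 bytes: a fresh box has to be allocated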

Bear in mind these are compiler implementation details, not language features, so while most compiler optimizations are good enough to stick around for a long time, they’re not guaranteed to persist across versions. I’m using v1.10.1 specifically.

  1. Note that in general you don’t need to run the call once before @time just to skip compilation time; compiling test_wrappers() already compiles the inner calls it can infer and statically dispatch. It’s only because you can’t be sure this call is inferable that the warm-up run is worth keeping here.

  2. The annotations on variables and method return types semantically add convert and typeassert steps: the right-hand value is converted to the declared type and then asserted to have resulted in that type (an error is thrown if either step fails). That doesn’t mean instability and runtime dispatch don’t happen; the inferred type just gets stabilized very quickly, which is not great for profiling type inference.

  3. The get_value calls in process_wrappers are indeed dynamically dispatched, but the return type is inferred as Int, despite your thorough prevention of inlining optimizations. Half the reason is that the get_value(w::TestWrapper{T}) method alone can infer that w.value isa Int from w isa TestWrapper. Note that method-wise type inference like this is unusual; inference normally starts from the types at a call’s arguments, not from the method’s type annotations. The other half is that Julia has another optimization where an unstable call checks whether there are fewer than 4 applicable methods sharing a method-wise inferred return type; in your case, the only other get_value method throws an error, so the call is inferred as Int, and that somehow shaves off 1 of the 2 allocations of the dynamic dispatch. If you define get_value methods for 2 more subtypes of AbstractTestWrapper, even if you never call them, this optimization goes away and the return type is inferred as Any (see the sketch after this list). The optimization also wouldn’t happen if the field access occurred directly in process_wrappers; ws[i] isa AbstractTestWrapper only allows inferring ws[i].value isa Any, for example.

  4. The dynamic dispatch’s remaining allocation does seem specific to warr2, no matter how I tweak the if-statement or reorder the warr1 and warr2 assignments. If you remove the annotations mentioned in (2) and change v and value to Float64, that 1 allocation happens for both warr1 and warr2. As you point out, the allocation vanishes for 0:1023 with unsigned integers, and I verified the cutoff is -512:511 with signed integers. That sounds like the aforementioned cache, though I’m not sure where to check.
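
Here is a rough sketch of point (3), assuming the definitions from the original post are loaded. DummyWrapperA, DummyWrapperB and their get_value methods are hypothetical and never called; they exist only to push the number of applicable methods past the limit:

using InteractiveUtils  # for @code_warntype

struct DummyWrapperA <: AbstractTestWrapper end
struct DummyWrapperB <: AbstractTestWrapper end
get_value(::DummyWrapperA) = 1
get_value(::DummyWrapperB) = 2

# Before these definitions, the get_value call inside process_wrappers is
# inferred as Int; with the extra applicable methods it widens to Any.
@code_warntype process_wrappers(AbstractTestWrapper[TestWrapper{Int}(1, 0)])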
