Unexpected performance outcome. Does accessing struct members cause allocation?

Hello,

I am trying to understand what affects the performance of code in Julia.

Please see below for three functions: combo_with_max_sum, combo_with_max_sum_2, and combo_with_max_sum_3.

  1. I would have expected version 2 to have fewer allocations than version 1, because version 1 creates a new dictionary whereas version 2 does not. However, version 2 has far more allocations and is slower than version 1. Why is this? Is it because the function is accessing struct fields?

  2. Version 3 is like version 2, except that I pass the struct fields instead of the struct itself. This version has zero allocations and is the fastest.

My question is: is it bad to pass structs to functions and then work with the fields in the function?

Thank you.

The code with benchmarks:

import OrderedCollections as OC
import BenchmarkTools as BT

struct MyStruct
    alternatives
    
    avmap
end


function init_mystruct()
    alternatives = [Symbol("A$(i)") for i = 1:10]
    @show alternatives
    avmap = OC.OrderedDict{Tuple{Symbol, Symbol}, Tuple{Float64, Float64}}()
    for a1 in alternatives
        for a2 in alternatives
            avmap[(a1, a2)] = (rand(), rand())
        end
    end
    MyStruct(alternatives, avmap)
end


function combo_with_max_sum(avmap)
    agg_avmap = OC.OrderedDict(combo => sum(vals) for (combo, vals) in avmap)
    findmax(agg_avmap)[2]
end


function combo_with_max_sum_2(mys)
    retv = (mys.alternatives[1], mys.alternatives[1])
    maxval = sum(mys.avmap[retv])
    for (combo, val) in mys.avmap
        s = sum(val)
        if s >= maxval
            maxval = s
            retv = combo
        end
    end
    return retv
end


function combo_with_max_sum_3(alternatives, avmap)
    retv = (alternatives[1], alternatives[1])
    maxval = sum(avmap[retv])
    for (combo, val) in avmap
        s = sum(val)
        if s >= maxval
            maxval = s
            retv = combo
        end
    end
    return retv
end


function pqmain()
    mystruct = init_mystruct()
    alts = mystruct.alternatives
    avmap = mystruct.avmap
    return (mystruct, alts, avmap)
end


julia> include("src/Temp/PerfQues.jl")
pqmain (generic function with 1 method)

julia> (mys, alts, avmap) = pqmain();
alternatives = [:A1, :A2, :A3, :A4, :A5, :A6, :A7, :A8, :A9, :A10]

julia> using BenchmarkTools

julia> @btime combo_with_max_sum($avmap)
  2.607 μs (14 allocations: 7.39 KiB)
(:A2, :A5)

julia> @btime combo_with_max_sum_2($mys)
  17.500 μs (1103 allocations: 39.14 KiB)
(:A2, :A5)

julia> @btime combo_with_max_sum_3($alts, $avmap)
  63.988 ns (0 allocations: 0 bytes)
(:A2, :A5)


The fields of your struct are of type Any, which slows things down.
This is explained in the manual: Avoid containers with abstract type parameters

1 Like

I have not run your code, but this looks like the issue and explains why passing the fields instead of the struct causes no allocations.

From the perspective of the compiler/type system, your definition means that alternatives and avmap are <: Any, and nothing else is communicated. So at runtime, Julia has to inspect the types of these fields and dispatch to the proper method.

A quick fix to this would involve parameterizing MyStruct like so:

struct MyStruct{TA,TM}
    alternatives::TA
    avmap::TM
end

such that Julia can know the types of these fields at compile time.

The reason your last implementation works so well is that when you pass in the explicit objects, the compiler knows the exact types and can compile efficient code without runtime dispatch.
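A quick way to see the difference in the REPL (a sketch using Base's Dict rather than the OrderedDict from the original code, with made-up names Plain and Param):

```julia
struct Plain
    alternatives
    avmap
end

struct Param{TA,TM}
    alternatives::TA
    avmap::TM
end

alts = [:A1, :A2]
d = Dict((:A1, :A1) => (0.5, 0.5))

typeof(Plain(alts, d))  # Plain: the field types are erased to Any
typeof(Param(alts, d))  # Param{Vector{Symbol}, Dict{Tuple{Symbol, Symbol}, Tuple{Float64, Float64}}}
```

The parameterized type's name carries the field types, so any function compiled for it sees them.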

3 Likes

They are not of type Any. Here is the info from the repl.

julia> typeof(mys.avmap)
OrderedCollections.OrderedDict{Tuple{Symbol, Symbol}, Tuple{Float64, Float64}}

julia> typeof(mys.alternatives)
Vector{Symbol} (alias for Array{Symbol, 1})
julia> struct MyStruct
           alternatives
           
           avmap
       end

julia> fieldtype(MyStruct, 1)
Any

julia> fieldtype(MyStruct, 2)
Any

help?> fieldtype
search: fieldtype fieldtypes

  fieldtype(T, name::Symbol | index::Int)


  Determine the declared type of a field (specified by name or index) in a composite DataType T.

Thanks @sbuercklin. Your solution worked. Parameterizing the struct made version 2 faster than version 1, with zero allocations.

However, I don’t understand why. I read the document linked by @fatteneder, but in those examples the parameterized structs are at least given some abstract type information, such as AbstractFloat. In the parameterization you suggested, which worked, it is not clear what information we are providing to the compiler. Is it that once the compiler sees the types of alternatives and avmap, it knows the type of the MyStruct because of that parameterization? If so, it is not clear why the compiler cannot figure that out itself. But at least this helps me know how to write the code.

Thank you to both of you.

If I could make one breaking change to Julia (that I’m thinking about this moment), it would be to automatically make structs concretely typed unless explicitly chosen to be abstract

It’s such a footgun and adds annoying boilerplate.

I suspect the issue is that Julia compiles code based on the function name and the input argument types. So when calling combo_with_max_sum_2(mys), where the provided mys is simply of type ::MyStruct, Julia must compile code that works for any input argument of type MyStruct.

By contrast, if you parameterize the fields as suggested, then the input argument mys will be of type MyStruct{TA, TM} where the additional TA and TM type information can be taken advantage of at compile time.

1 Like

Thank you. I should check this, but before doing so: this would mean that we should parameterize even when the fields have concrete types. For example, consider the following. It would suggest that passing struct S2 results in slower code. Of course, we can check whether this is the case.

julia> struct S1{Int64}
           f1::Int64
       end

julia> struct S2
           f1::Int64
       end

julia> v1 = S1(1)
S1{Int64}(1)

julia> v2 = S2(2)
S2(2)

julia> typeof(v1)
S1{Int64}

julia> typeof(v2)
S2

This shouldn’t affect compilation because S2’s f1 field is required to be of type Int64.

The difference is that, by not specifying or parameterizing the type of a field, Julia has to treat it as simply <:Any and can’t give the compiler any guarantees about what it actually is.

So, you either need to concretely specify what type each field is, parameterize the field types, or lose out on some potential performance for the sake of simpler code.
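One way to check this without benchmarking (a sketch, with a hypothetical S3 standing in for an untyped-field version) is to ask Julia about the field types and whether instances can be stored inline:

```julia
struct S2
    f1::Int64
end

struct S3
    f1
end

fieldtype(S2, :f1)  # Int64: the layout is fully known
fieldtype(S3, :f1)  # Any: the field must be a boxed value behind a pointer

isbitstype(S2)  # true: stored inline, no pointers
isbitstype(S3)  # false
```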

2 Likes

Think of it this way: the types of the fields in a struct determine its layout in memory. If a field’s type is not concrete, like Any, AbstractFloat etc., the field is of unknown size, and the compiler just makes room for a pointer to … something … which is determined at run time.

If the field is of a concrete type, like Int or Dict{Int, Float64}, the compiler knows the size, and can make room in the struct for this type (which of course may contain pointers etc).

In your non-parametric MyStruct, there are just two pointers to something the compiler knows nothing about. In the parametric MyStruct, the parameters, when not explicitly specified, are taken from the arguments to the constructor, i.e. from a and b in MyStruct(a, b). They determine the layout of the struct, and the compiler can generate code for these particular types.

There are cases where you want abstract fields in structs, but not when those structs are used directly in code that should be performant.
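For instance (a small sketch, with a made-up Pair2 type): when the parameters are not written explicitly, Julia infers them from the constructor arguments, and the layout follows directly:

```julia
struct Pair2{TA,TB}
    a::TA
    b::TB
end

p = Pair2(1, 2.0)  # parameters inferred from the arguments
typeof(p)          # Pair2{Int64, Float64}
sizeof(p)          # 16: an Int64 and a Float64 stored inline, no pointers
```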

2 Likes

Thank you, this helps understand it better.

Still, I am not sure why the compiler cannot figure out the type of mys in the non-parametric struct when the value mys = MyStruct(a, b) is created. It knows the types of a and b at the time of construction. Then it should be able to compile the function for that type of mys. Isn’t that what JIT compilation does for functions whose argument types are not specified explicitly? In other words, is this an unexploited compiler optimization that is just yet to be implemented, or is it impossible for some reason?

It’s not that the compiler can’t do it; it won’t do it, by design.

struct Foo
    a
end

is the same as

struct Foo
    a::Any
end

This is exactly one type, known at compile time. A struct with a field of type Any (or another abstract type) will always store that field as a reference on the heap, and the code will always have to check what concrete type it actually holds, because at compile time, it literally could be any type.

If you specify a type parameter, like so:

struct Foo{T}
    a::T
end

The compiler will now create a new Foo type every time it sees a new concrete type. So effectively it’s equivalent to automatically doing this:

struct FooInt
    a::Int
end

struct FooFloat64
    a::Float64
end

struct FooString
    a::String
end

...etc...

“I want one type that can contain anything”
vs.
“I want to generate lots of types that can contain only one type of thing.”

There are uses for the former - sometimes you don’t want the compiler to bother generating 1000s of variants of a type, which can additionally bog down compilation. So you don’t necessarily want the compiler automatically doing the latter.
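In the REPL the "lots of types" side is easy to see (a sketch): each concrete parameter gives a distinct concrete type, and the parameter is inferred per construction:

```julia
struct Foo{T}
    a::T
end

Foo(1) isa Foo{Int}        # true: T inferred as Int64
Foo(1.0) isa Foo{Float64}  # true
Foo{Int} == Foo{Float64}   # false: two separate concrete types, compiled separately
```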

6 Likes

Thank you for explaining this. This makes things much clearer.

1 Like

For what it’s worth, one of my favourite packages is ConcreteStructs.jl which makes it basically automatic to do this. it’s an absolute lifesaver for large complicated structs

3 Likes

I just want to give a, hopefully, illustrative example:

Pretend that you are the compiler, and you have access to this piece of code:

struct A
    x
end
struct B
    x::Int
end
struct C{T}
    x::T
end

function foo(a, b)
    return a.x + b.x
end

Now, your task is to compile the following:

foo(::A)
foo(::B)
foo(::C{Float64})

This is all the information you get. The source code of the function foo and the type definitions for A, B and C.

Ask yourself: “What sort of machine code should I generate for foo(::A) ?” Knowing the type of the inputs that were used to create the values a and b is not helpful, because you do not have access to a and b, only to the definition of foo and the definition of A. If you are going to inspect the types of the values contained in a and b, it must happen at runtime, giving dynamic dispatch, which is a performance problem. The static compilation step only knows foo(::A).

Moving on: What sort of machine code should be generated for foo(::B)? This is easier, because you know that the type of the field x is Int, just by looking at the type definition.

And the same goes for foo(::C{Float64}). You know that x is of type Float64 simply by looking at the definition of C{T}, since for C{Float64}, T == Float64 and x::T == x::Float64 (not real code).

It’s easy to forget sometimes that the compiler does not know the values, only the types. So when compiling foo(::A), you really have very limited information.
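The effect on inference can be checked directly (a sketch with fresh one-field types, Abs and Conc, so it runs standalone):

```julia
struct Abs
    x
end

struct Conc
    x::Int
end

bar(v) = v.x + v.x

# What the compiler can infer about the return type from the argument type alone:
Base.return_types(bar, (Abs,))   # Any[Any]: nothing is known about x
Base.return_types(bar, (Conc,))  # Any[Int64]: fully inferred, no runtime dispatch
```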

7 Likes

Channeling my inner compiler, I would hit you right away with some kind of error that I can’t find any one-argument definition for foo :sweat_smile:

But seriously, the explanation (and all the previous ones) is pretty nice!

1 Like

It can be a bit confusing: for function arguments there is no performance benefit in specifying types, so annotating them is typically discouraged to enable code reuse. For struct fields there is usually a huge performance benefit, and they should be concrete.
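A small sketch of that asymmetry (the names here are made up for illustration): the function argument needs no annotation, because Julia specializes on the concrete runtime type anyway, but the field does:

```julia
struct Tight
    x::Int  # concrete field type: layout known, access is fast
end

# No annotation on `t`: Julia still compiles a method specialized for Tight.
double(t) = 2 * t.x

double(Tight(21))  # 42
```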

3 Likes

Aargh! I changed from a one-arg to a two-arg example to simplify, but that just messed it up.

2 Likes