Significant decrease in performance after seemingly irrelevant changes

Hi!

I recently revised some aspects of a large project I’m working on, just moving around some data/functions without altering the behavior of the code. However, this resulted in a significant decrease in performance in seemingly unrelated parts of the code.

The performance decrease stems from the calcInput! function of a struct I have (see the bottom of this post). Although there are some changes in how this function works (e.g. how the \gamma parameter is handled), I am seeing performance issues in places I never expected.

As an example, consider near the last line, u ./= asum. Profiling in the old code with dt = 0.001 gives:

8   ...BaseController.jl:37; calcInput!(::SpecificationM...
              7 ./broadcast.jl:751; materialize!
               1 ./abstractarray.jl:75; axes
                1 ./array.jl:155; size
               6 ./broadcast.jl:792; copyto!
                6 ./broadcast.jl:837; copyto!
                 6 ./simdloop.jl:73; macro expansion
                  5 ./broadcast.jl:838; macro expansion
                   2 ./array.jl:769; setindex!
                   3 ./broadcast.jl:507; getindex
                    2 ./broadcast.jl:546; _broadcast_getindex
                     2 ./broadcast.jl:570; _getindex
                      2 ./broadcast.jl:540; _broadcast_getindex
                       2 ./array.jl:731; getindex
                    1 ./broadcast.jl:547; _broadcast_getindex
                     1 ./broadcast.jl:574; _broadcast_getindex_evalf
                      1 ./float.jl:401; /
                  1 ./int.jl:53; macro expansion

While the new code produces:

1117 ...aseController.jl:37; calcInput!(::Specification...
              16  ./broadcast.jl:1163; broadcasted(::Function, :...
              382 ./broadcast.jl:1166; broadcasted(::Function, :...
               330 ./broadcast.jl:1168; broadcasted
                31 ./broadcast.jl:176; Base.Broadcast.Broadcaste...
                 20 ./broadcast.jl:176; Type
                  20 ./broadcast.jl:167; Type
              21  ./broadcast.jl:751; materialize!(::Array{Floa...
               11 ./broadcast.jl:792; copyto!
                4 ./broadcast.jl:836; copyto!
                 4 ./broadcast.jl:819; preprocess
                  4 ./broadcast.jl:822; preprocess_args
                   4 ./broadcast.jl:823; preprocess_args
                    4 ./broadcast.jl:820; preprocess
                     4 ./broadcast.jl:813; broadcast_unalias
                6 ./broadcast.jl:837; copyto!
                 6 ./simdloop.jl:73; macro expansion
                  6 ./broadcast.jl:838; macro expansion
                   5 ./broadcast.jl:507; getindex
                    2 ./broadcast.jl:546; _broadcast_getindex
                     2 ./broadcast.jl:570; _getindex
                      2 ./broadcast.jl:540; _broadcast_getindex
                       2 ./array.jl:731; getindex
                    3 ./float.jl:401; _broadcast_getindex
                   1 ./int.jl:53; +
                1 ./simdloop.jl:0; copyto!

Just that line of division now takes an entire second when running my code, even though I am operating on the same data types! Note that the way parameters are allocated and passed to this function (e.g. the input u), as well as the number of times the function is called, has not changed. Even the line before, checking asum > 0, now takes 57 ticks instead of the old 1, and profiling gives no indication of where that time is spent:

57   ...aseController.jl:36; calcInput!(::Specification...
              6 ./operators.jl:286; >(::Float64, ::Int64)
               5 ./float.jl:448; <
               1 ./float.jl:488; <
                1 ./float.jl:452; <

Any ideas on what I should look into that can produce such a behavior?

Thanks,
Tusike

Here is the code for the struct with the calcInput! function.
Before changes:

mutable struct SimpleBaseController <: AbstractBaseController
    BTs::Vector{AbstractBarrierTransform}
    Δ::Vector{Float64}

    # pre-allocate variables used in calculations
    dρdx::Vector{Float64}
    vi::Vector{Float64}

    function SimpleBaseController(BTs::Vector{AbstractBarrierTransform}, Δ::Vector{Float64})
        new(BTs, Δ)
    end
end

function init(specManager::SpecificationManager, bc::SimpleBaseController, agent::Agent)
    # pre-allocate variables used in calculations
    bc.dρdx = Vector{Float64}(undef, agent.n)
    bc.vi = Vector{Float64}(undef, agent.m)
end

function calcInput!(specManager::SpecificationManager, bc::SimpleBaseController, agent::Agent, γ::Vector{Float64}, ρ::Vector{Float64}, t::Float64, u::Vector{Float64})
    u .= 0.0
    asum = 0.0
    for i = 1:specManager.M
        specManager.APs[i].dρdx!(specManager.APs[i], agent.x, bc.dρdx)
        κ, Γ = calcκ(bc.BTs[i], γ[i], ρ[i], t)
        agent.dynamics.applyG!(agent.x, bc.dρdx, bc.vi)
        if (ρ[i] < Γ)
            ai = (Γ - ρ[i])/(Γ - γ[i])
            u .+= (ai*κ/(bc.Δ[i] + dot(bc.vi,bc.vi)))*bc.vi
        else
            ai = 0.0
        end
        asum += ai
    end

    if (asum > 0)
        u ./= asum
    end
end

After changes:

mutable struct SimpleBaseController <: AbstractBaseController
    BTs::Vector{AbstractBarrierTransform}
    Δ::Vector{Float64}

    # pre-allocate variables used in calculations
    dρdx::Vector{Float64}
    vi::Vector{Float64}

    function SimpleBaseController(BTs::Vector{AbstractBarrierTransform}, Δ::Vector{Float64})
        new(BTs, Δ)
    end
end

function init(specManager::SpecificationManager, bc::SimpleBaseController, agent::Agent)
    # pre-allocate variables used in calculations
    bc.dρdx = Vector{Float64}(undef, agent.dynamics.n)
    bc.vi = Vector{Float64}(undef, agent.dynamics.m)
end

function calcInput!(specManager::SpecificationManager, bc::SimpleBaseController, agent::Agent, ρ::Vector{Float64}, tIndex::Int, t::Float64, u::Vector{Float64})
    u .= 0.0
    asum = 0.0
    for i = 1:specManager.M
        specManager.TSs[i].AP.dρdx!(specManager.TSs[i].AP, agent.x, bc.dρdx)
        κ, Γ = calcκ(bc.BTs[i], specManager.TSs[i].γ[tIndex], ρ[i], t)
        agent.dynamics.applyG!(agent.x, bc.dρdx, bc.vi)
        if (ρ[i] < Γ)
            ai = (Γ - ρ[i])/(Γ - specManager.TSs[i].γ[tIndex])
            u .+= (ai*κ/(bc.Δ[i] + dot(bc.vi,bc.vi)))*bc.vi
        else
            ai = 0.0
        end
        asum += ai
    end

    if (asum > 0)
        u ./= asum
    end
end

Just as a heads up: I wanted to look into what happens, so I copy-pasted the provided code into the REPL and got a bunch of errors about things not being defined. I am now moving on with my day without looking into it further.

I would estimate that the chance of getting help is about 10x larger if you give a minimal example that works when copy-pasted into the REPL.

I perfectly understand that, however, as mentioned this is part of a large project which I cannot post as a whole. And since I have no idea where the decrease in performance stems from, I could not create a minimal working example. That being said, of course I am only looking for any ideas on what I could or should investigate and what could lead to the described strange behavior, instead of a complete solution to the problem. (For example, from people who are able to conclude something based on the profile reports).

Have you run code_warntype on it?

No, I did not know about @code_warntype, but I have printed out the type of the variables involved, and e.g. in the u ./= asum line u is always a vector of Float64’s, and asum is a Float64. But I will run that macro and let you know what I get.

Yeah, those are the runtime types, but @code_warntype tells you about the inferred types, which are computed before the code is run and are what the compiler uses to optimize the code.
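To make the distinction concrete, here is a minimal sketch. The names AbstractSpec, Spec, and firstγ are hypothetical stand-ins, not the types from the project above. Both calls return a Float64 at runtime, but only the concretely typed vector should let the compiler infer that ahead of time.

```julia
# Hypothetical stand-ins for illustration; not the project's actual types.
abstract type AbstractSpec end

struct Spec <: AbstractSpec
    γ::Vector{Float64}
end

# Read one γ value from the first element of a vector of specs.
firstγ(specs, k) = specs[1].γ[k]

abstract_specs = AbstractSpec[Spec([1.0, 2.0])]  # element type is abstract
concrete_specs = Spec[Spec([1.0, 2.0])]          # element type is concrete

# The runtime types are the same in both cases.
runtime_abstract = typeof(firstγ(abstract_specs, 1))
runtime_concrete = typeof(firstγ(concrete_specs, 1))

# The inferred return types differ: the compiler cannot see through the
# abstract element type, so it cannot prove the result is a Float64.
inferred_abstract = first(Base.return_types(firstγ, (Vector{AbstractSpec}, Int)))
inferred_concrete = first(Base.return_types(firstγ, (Vector{Spec}, Int)))
```

Running `@code_warntype firstγ(abstract_specs, 1)` in the REPL should flag the same non-concrete types that `Base.return_types` reports here.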

I recommend reading the performance section in the manual: Performance Tips · The Julia Language.

Great, thanks, it’s becoming much clearer now! The @code_warntype does indeed suggest I have issues which I did not have before due to now accessing e.g. the \gamma values (which must be Float64) through an abstract type at %35. I am not sure how to resolve this yet, but at least I definitely have a direction to follow.

Thanks again!!

Body::Any
│╻╷╷       materialize!21 1 ── %1   = (Base.arraysize)(u, 1)::Int64
││╻╷╷╷      axes   │    %2   = (Base.slt_int)(%1, 0)::Bool
│││┃│││      map   │           (Base.ifelse)(%2, 0, %1)
││╻         copyto!   │           invoke Base.Broadcast.fill!(_8::Array{Float64,1}, 0.0::Float64)
│╻         getproperty23 │    %5   = (Base.getfield)(specManager, :M)::Int64
││╻╷╷╷      Type   │    %6   = (Base.sle_int)(1, %5)::Bool
│││╻         unitrange_last   │           (Base.sub_int)(%5, 1)
││││         │    %8   = (Base.ifelse)(%6, %5, 0)::Int64
││╻╷╷       isempty   │    %9   = (Base.slt_int)(%8, 1)::Bool
││           └───        goto #3 if not %9
││           2 ──        goto #4
││           3 ──        goto #4
│            4 ┄─ %13  = φ (#2 => true, #3 => false)::Bool
│            │    %14  = φ (#3 => 1)::Int64
│            │    %15  = φ (#3 => 1)::Int64
│            │    %16  = (Base.not_int)(%13)::Bool
│            └───        goto #18 if not %16
│            5 ┄─ %18  = φ (#4 => 0.0, #17 => %95)::Any
│            │    %19  = φ (#4 => %14, #17 => %101)::Int64
│            │    %20  = φ (#4 => %15, #17 => %102)::Int64
│╻         getproperty24 │    %21  = (Base.getfield)(specManager, :TSs)::Array{AbstractTemporalSpecification,1}
│╻         getindex   │    %22  = (Base.arrayref)(true, %21, %19)::AbstractTemporalSpecification
│╻         getproperty   │    %23  = (Base.getfield)(%22, :AP)::Any
│            │    %24  = (Base.getproperty)(%23, :dρdx!)::Any
│╻         getproperty   │    %25  = (Base.getfield)(specManager, :TSs)::Array{AbstractTemporalSpecification,1}
│╻         getindex   │    %26  = (Base.arrayref)(true, %25, %19)::AbstractTemporalSpecification
│╻         getproperty   │    %27  = (Base.getfield)(%26, :AP)::Any
││           │    %28  = (Base.getfield)(agent, :x)::Array{Float64,1}
││           │    %29  = (Base.getfield)(bc, :dρdx)::Array{Float64,1}
│            │           (%24)(%27, %28, %29)
│╻         getproperty25 │    %31  = (Base.getfield)(bc, :BTs)::Array{Utilities.AbstractBarrierTransform,1}
│╻         getindex   │    %32  = (Base.arrayref)(true, %31, %19)::Utilities.AbstractBarrierTransform
│╻         getproperty   │    %33  = (Base.getfield)(specManager, :TSs)::Array{AbstractTemporalSpecification,1}
│╻         getindex   │    %34  = (Base.arrayref)(true, %33, %19)::AbstractTemporalSpecification
│╻         getproperty   │    %35  = (Base.getfield)(%34, :γ)::Any
│            │    %36  = (Base.getindex)(%35, tIndex)::Any
│╻         getindex   │    %37  = (Base.arrayref)(true, ρ, %19)::Float64
│            │    %38  = AgentManager.calcκ::Core.Compiler.Const(Utilities.calcκ, false)
│            │    %39  = (isa)(%32, Utilities.LinSigmoidBarrierTransform)::Bool
│            │    %40  = (isa)(%36, Float64)::Bool
│            │    %41  = (and_int)(%39, %40)::Bool
│            └───        goto #7 if not %41
│            6 ── %43  = π (%32, Utilities.LinSigmoidBarrierTransform)
│            │    %44  = π (%36, Float64)
│            │    %45  = invoke %38(%43::Utilities.LinSigmoidBarrierTransform, %44::Float64, %37::Float64, _7::Float64)::Tuple{Float64,Float64}
│            └───        goto #10
│            7 ── %47  = (isa)(%32, Utilities.LinExpBarrierTransform)::Bool
│            │    %48  = (isa)(%36, Float64)::Bool
│            │    %49  = (and_int)(%47, %48)::Bool
│            └───        goto #9 if not %49
│            8 ── %51  = π (%32, Utilities.LinExpBarrierTransform)
│            │    %52  = π (%36, Float64)
│            │    %53  = invoke %38(%51::Utilities.LinExpBarrierTransform, %52::Float64, %37::Float64, _7::Float64)::Tuple{Float64,Float64}
│            └───        goto #10
│            9 ── %55  = (AgentManager.calcκ)(%32, %36, %37, t)::Tuple{Float64,Float64}
│            └───        goto #10
│            10 ┄ %57  = φ (#6 => %45, #8 => %53, #9 => %55)::Tuple{Float64,Float64}
││╻         indexed_iterate   │    %58  = (Base.getfield)(%57, 1)::Float64
│╻         indexed_iterate   │    %59  = (Base.getfield)(%57, 2)::Float64
│╻         getproperty26 │    %60  = (Base.getfield)(agent, :dynamics)::AbstractSystemDynamics
││           │    %61  = (Base.getfield)(%60, :applyG!)::Any
││           │    %62  = (Base.getfield)(agent, :x)::Array{Float64,1}
││           │    %63  = (Base.getfield)(bc, :dρdx)::Array{Float64,1}
││           │    %64  = (Base.getfield)(bc, :vi)::Array{Float64,1}
│            │           (%61)(%62, %63, %64)
│╻         getindex27 │    %66  = (Base.arrayref)(true, ρ, %19)::Float64
│╻         <   │    %67  = (Base.lt_float)(%66, %59)::Bool
│            └───        goto #12 if not %67
│╻         getindex28 11 ─ %69  = (Base.arrayref)(true, ρ, %19)::Float64
│╻         -   │    %70  = (Base.sub_float)(%59, %69)::Float64
│╻         getproperty   │    %71  = (Base.getfield)(specManager, :TSs)::Array{AbstractTemporalSpecification,1}
│╻         getindex   │    %72  = (Base.arrayref)(true, %71, %19)::AbstractTemporalSpecification
│╻         getproperty   │    %73  = (Base.getfield)(%72, :γ)::Any
│            │    %74  = (Base.getindex)(%73, tIndex)::Any
│            │    %75  = (%59 - %74)::Any
│            │    %76  = (%70 / %75)::Any
│         29 │    %77  = Base.Broadcast.materialize!::Core.Compiler.Const(Base.Broadcast.materialize!, false)
│            │    %78  = Base.Broadcast.broadcasted::Core.Compiler.Const(Base.Broadcast.broadcasted, false)
│            │    %79  = (%76 * %58)::Any
│╻         getproperty   │    %80  = (Base.getfield)(bc, :Δ)::Array{Float64,1}
│╻         getindex   │    %81  = (Base.arrayref)(true, %80, %19)::Float64
│╻         getproperty   │    %82  = (Base.getfield)(bc, :vi)::Array{Float64,1}
││           │    %83  = (Base.getfield)(bc, :vi)::Array{Float64,1}
│╻         dot   │    %84  = LinearAlgebra.BLAS.dot::typeof(LinearAlgebra.BLAS.dot)
││           │    %85  = invoke %84(%82::Array{Float64,1}, %83::Array{Float64,1})::Float64
│╻         +   │    %86  = (Base.add_float)(%81, %85)::Float64
│            │    %87  = (%79 / %86)::Any
│╻         getproperty   │    %88  = (Base.getfield)(bc, :vi)::Array{Float64,1}
│            │    %89  = (%87 * %88)::Any
│            │    %90  = (%78)(AgentManager.:+, u, %89)::Any
│            │           (%77)(u, %90)
│            └───        goto #13
│            12 ─        nothing
│         33 13 ┄ %94  = φ (#11 => %76, #12 => 0.0)::Any
│            │    %95  = (%18 + %94)::Any
││╻         ==   │    %96  = (%20 === %8)::Bool
││           └───        goto #15 if not %96
││           14 ─        goto #16
││╻         +   15 ─ %99  = (Base.add_int)(%20, 1)::Int64
│╻         iterate   └───        goto #16
│            16 ┄ %101 = φ (#15 => %99)::Int64
│            │    %102 = φ (#15 => %99)::Int64
│            │    %103 = φ (#14 => true, #15 => false)::Bool
│            │    %104 = (Base.not_int)(%103)::Bool
│            └───        goto #18 if not %104
│            17 ─        goto #5
│         36 18 ─ %107 = φ (#16 => %95, #4 => 0.0)::Any
│            │    %108 = (%107 > 0.0)::Any
│            └───        goto #46 if not %108
│╻         macro expansion37 19 ─ %110 = (AgentManager.typeof)(%107)::DataType
││╻╷╷╷╷╷╷   #repr#326   │    %111 = (Base.sle_int)(1, 1)::Bool
│││┃││││     #sprint   └───        goto #21 if not %111
││││┃││││     isempty   20 ─ %113 = (Base.sle_int)(1, 0)::Bool
│││││┃││       iterate   └───        goto #22
│            21 ─        nothing
││││││┃│        iterate   22 ┄ %116 = φ (#20 => %113, #21 => false)::Bool
│││││││┃         iterate   └───        goto #24 if not %116
││││││││     23 ─        invoke Base.getindex(()::Tuple{}, 1::Int64)
││││││││     └───        $(Expr(:unreachable))
││││││││     24 ─        goto #26
││││││││     25 ─        $(Expr(:unreachable))
│││││││      26 ┄        goto #27
│││││╻         iterate   27 ─        goto #28
│││││        28 ─        goto #29
││││         29 ─ %125 = invoke Base.:(#sprint#325)(nothing::Nothing, 0::Int64, sprint::Function, show::Function, %110::DataType)::String
││││         └───        goto #30
│││          30 ─        goto #31
││           31 ─        goto #32
│            32 ─        invoke Base.println("typeof(asum) = "::String, %125::String)
││╻╷╷╷╷╷╷╷  repr   │    %130 = (Base.sle_int)(1, 1)::Bool
│││┃│││││    #repr#326   └───        goto #34 if not %130
││││┃│││││    #sprint   33 ─ %132 = (Base.sle_int)(1, 0)::Bool
│││││┃│││      isempty   └───        goto #35
│            34 ─        nothing
││││││┃││       iterate   35 ┄ %135 = φ (#33 => %132, #34 => false)::Bool
│││││││┃│        iterate   └───        goto #37 if not %135
││││││││┃         iterate   36 ─        invoke Base.getindex(()::Tuple{}, 1::Int64)
│││││││││    └───        $(Expr(:unreachable))
│││││││││    37 ─        goto #39
│││││││││    38 ─        $(Expr(:unreachable))
││││││││     39 ┄        goto #40
││││││╻         iterate   40 ─        goto #41
││││││       41 ─        goto #42
│││││        42 ─ %144 = invoke Base.:(#sprint#325)(nothing::Nothing, 0::Int64, sprint::Function, show::Function, Array{Float64,1}::Type)::String
│││││        └───        goto #43
││││         43 ─        goto #44
│││          44 ─        goto #45
││           45 ─        invoke Base.println("typeof(u) = "::String, %144::String)
│         38 │    %149 = Base.Broadcast.materialize!::Core.Compiler.Const(Base.Broadcast.materialize!, false)
│            │    %150 = Base.Broadcast.broadcasted::Core.Compiler.Const(Base.Broadcast.broadcasted, false)
│            │    %151 = (%150)(AgentManager.:/, u, %107)::Any
│            │    %152 = (%149)(u, %151)::Any
│            └───        return %152
│            46 ─        return

Indeed, explicitly annotating the types of the values I am accessing through abstract types in my code gets me back to the previous computation speed, though I assume there is a more elegant solution to this. I have not yet figured out how to avoid using arrays of abstract types, which I guess is where the main problem stems from.
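A minimal sketch of that workaround, assuming hypothetical types AbstractSpec and Spec and a hypothetical function sumγ_asserted (none of them are from the project): a type assertion on the value read through the abstract type is enough for everything downstream of it to be inferred.

```julia
# Hypothetical stand-ins for illustration; not the project's actual types.
abstract type AbstractSpec end

struct Spec <: AbstractSpec
    γ::Vector{Float64}
end

function sumγ_asserted(specs::Vector{AbstractSpec}, k::Int)
    total = 0.0
    for s in specs
        # s.γ is inferred as Any because s is only known to be an AbstractSpec.
        # The ::Float64 assertion is checked at runtime and tells the compiler
        # what type to assume for the rest of the loop body.
        γk = s.γ[k]::Float64
        total += γk
    end
    return total
end

specs = AbstractSpec[Spec([1.0, 2.0]), Spec([3.0, 4.0])]
result = sumγ_asserted(specs, 2)
inferred = first(Base.return_types(sumγ_asserted, (Vector{AbstractSpec}, Int)))
```

The field access itself is still resolved dynamically, so this only limits the damage; it does not remove the underlying abstract container.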

You can either make sure your structs are concretely typed (maybe by adding type parameters) or, if that is not possible, introduce a function barrier so that the “kernel” of the computation is its own function and can be specialized on the concrete type of its input.
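A function barrier could look roughly like the sketch below. The types ScaleTransform and ShiftTransform and the functions kernel and total are made up for illustration. The outer loop pays for one dynamic dispatch per element, while the inner loop runs in a method compiled for the concrete type.

```julia
# Hypothetical types for illustration only.
abstract type AbstractTransform end

struct ScaleTransform <: AbstractTransform
    a::Float64
end

struct ShiftTransform <: AbstractTransform
    b::Float64
end

# The "kernel": each method is compiled for one concrete transform type,
# so the hot inner loop is fully type-stable.
function kernel(t::ScaleTransform, x::Vector{Float64})
    acc = 0.0
    for xi in x
        acc += t.a * xi
    end
    return acc
end

function kernel(t::ShiftTransform, x::Vector{Float64})
    acc = 0.0
    for xi in x
        acc += t.b + xi
    end
    return acc
end

# The outer function only sees the abstract element type. Calling kernel
# is the barrier: one dynamic dispatch, then specialized code.
function total(ts::Vector{AbstractTransform}, x::Vector{Float64})
    acc = 0.0
    for t in ts
        acc += kernel(t, x)::Float64
    end
    return acc
end

ts = AbstractTransform[ScaleTransform(2.0), ShiftTransform(1.0)]
barrier_result = total(ts, [1.0, 2.0, 3.0])
```

This pays off when the work inside the kernel is large compared with the cost of a single dispatch.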

For example, instead of

mutable struct SimpleBaseController <: AbstractBaseController
    BTs::Vector{AbstractBarrierTransform}

perhaps

mutable struct SimpleBaseController{B <: AbstractBarrierTransform} <: AbstractBaseController
    BTs::Vector{B}

Now, if the vectors really do have to contain a mix of different AbstractBarrierTransforms, how many different types must the vector contain?
If it’s only two, you could make the vector’s element type be a union of both concrete types. That should still be fast.
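As a rough sketch of the union approach (the types and the apply function are hypothetical), the compiler can split a small union into one branch per concrete type, so the loop should stay type-stable:

```julia
# Hypothetical types for illustration only.
abstract type AbstractTransform end

struct ScaleTransform <: AbstractTransform
    a::Float64
end

struct ShiftTransform <: AbstractTransform
    b::Float64
end

# A small union of concrete types instead of the abstract supertype.
const EitherTransform = Union{ScaleTransform, ShiftTransform}

apply(t::ScaleTransform, x::Float64) = t.a * x
apply(t::ShiftTransform, x::Float64) = t.b + x

function applyall(ts::Vector{EitherTransform}, x::Float64)
    acc = 0.0
    for t in ts
        # With only two possible types, the compiler can emit a branch
        # for each one instead of a dynamic dispatch.
        acc += apply(t, x)
    end
    return acc
end

union_ts = EitherTransform[ScaleTransform(2.0), ShiftTransform(1.0)]
union_result = applyall(union_ts, 3.0)
union_inferred = first(Base.return_types(applyall, (Vector{EitherTransform}, Float64)))
```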

Alternatively, do these AbstractBarrierTransforms really have to be different concrete types?
How different are they in terms of data layout and behavior?
Instead of having BarrierTransform1 and BarrierTransform2, could you make one of the fields an @enum, indicating which BarrierTransform it is, and handle it this way?
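A sketch of the @enum idea follows. The names BarrierKind, SigmoidKind, ExpKind, BarrierTransform, and evaluate are hypothetical, and the two formulas are placeholders rather than the real barrier functions. It assumes all variants can share one data layout.

```julia
# Hypothetical names; the formulas below are placeholders, not the
# actual barrier functions from the project.
@enum BarrierKind SigmoidKind ExpKind

# One concrete struct for all variants; the kind field selects behavior.
struct BarrierTransform
    kind::BarrierKind
    a::Float64
end

function evaluate(bt::BarrierTransform, x::Float64)
    if bt.kind == SigmoidKind
        return bt.a / (1 + exp(-x))
    else
        return bt.a * exp(x)
    end
end

# The vector now has a single concrete element type.
bts = [BarrierTransform(SigmoidKind, 2.0), BarrierTransform(ExpKind, 3.0)]
enum_result = sum(bt -> evaluate(bt, 0.0), bts)
```

The trade-off is that the branching is written by hand instead of being handled by multiple dispatch.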

Yep, that’s the plan now, I think a combination of the two might work the best.

The AbstractBarrierTransforms essentially each describe a different function (to evaluate a barrier term), and the type contains the parameters of this function as its data, and different constructors to evaluate these parameters based on given hyperparameters. I am using different concrete types to conveniently call the same function from my code, e.g. evaluateBarrier(myBarrierTransform), and allow multiple dispatch to call the function I need.

The vectors really do have to contain a mix of abstract elements. However, there won’t be many elements, and once initialized their number remains fixed! I was thinking it should thus be possible to use Tuples, such as:

mutable struct SimpleBaseController{B <: Tuple{Vararg{AbstractBarrierTransform}}} <: AbstractBaseController
    BTs::B

I will test this and hope for the best.
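A self-contained sketch of that idea, with hypothetical concrete types TransformA and TransformB and a hypothetical TupleController (an immutable struct here, for brevity): the tuple's full type becomes part of the controller's type, so the field is concretely typed.

```julia
# Hypothetical types for illustration only.
abstract type AbstractBarrierTransform end
abstract type AbstractBaseController end

struct TransformA <: AbstractBarrierTransform
    p::Float64
end

struct TransformB <: AbstractBarrierTransform
    q::Float64
end

# B records the exact tuple type, e.g. Tuple{TransformA, TransformB}.
struct TupleController{B <: Tuple{Vararg{AbstractBarrierTransform}}} <: AbstractBaseController
    BTs::B
end

controller = TupleController((TransformA(1.0), TransformB(2.0)))
controller_type = typeof(controller)
bts_type = fieldtype(controller_type, :BTs)
```

Each distinct combination of transforms produces a distinct controller type, so methods taking the controller are compiled once per combination.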

I managed to get better performance than before, with a combination of parameterizing my structs with Tuples of abstract types, as well as introducing the function barriers to create specializations.

What I still don’t understand is why I needed to do the latter. A (fully working, standalone) example at the bottom illustrates the core of my problem. I define a function that accepts a tuple of abstract types, in order to compile specialized versions of this function. In the function, I access an element x in the abstract types. Now if I pass a given tuple to the function, a specialized version is created, and the compiler should figure out that x is always a Float, no? However, @code_warntype shows that this is not the case, and accessing the element x actually results in a union of all possible datatypes in the struct that composes the tuple… Why doesn’t the compiler know in this scenario that x is a float?

abstract type ABCs end

struct A <: ABCs
    x::Float64
    y::Vector{Float64}
    z::Int
end
struct B <: ABCs
    x::Float64
    y::Vector{Float64}
    z::String
end

function getSumX(letters::Tuple{Vararg{ABCs}})
    sum = 0.0
    for i = 1:length(letters)
        sum += letters[i].x
    end
    return sum
end

a1 = A(2.0, [1.0], 1)
a2 = A(3.0, [1.0], 1)
b = B(4.0, [1.0], "1")

@code_warntype getSumX((a1, a2, b))

Result of @code_warntype:

Body::Any
17 1 ──       (Base.ifelse)(true, 3, 0)    │╻╷╷  Colon
   │    %2  = (Base.slt_int)(3, 1)::Bool    ││╻╷╷  isempty
   └───       goto #3 if not %2    ││
   2 ──       goto #4    ││
   3 ──       goto #4    ││
   4 ┄─ %6  = φ (#2 => true, #3 => false)::Bool    │
   │    %7  = φ (#3 => 1)::Int64    │
   │    %8  = φ (#3 => 1)::Int64    │
   │    %9  = (Base.not_int)(%6)::Bool    │
   └───       goto #15 if not %9    │
   5 ┄─ %11 = φ (#4 => 0.0, #14 => %28)::Any    │
   │    %12 = φ (#4 => %7, #14 => %34)::Int64    │
   │    %13 = φ (#4 => %8, #14 => %35)::Int64    │
18 │    %14 = (Base.getfield)(letters, %12, true)::Union{A, B}    │╻    getindex
   │    %15 = (isa)(%14, A)::Bool    │
   └───       goto #7 if not %15    │
   6 ── %17 = π (%14, A)    │
   │    %18 = (Base.getfield)(%17, :x)::Union{Float64, Int64, Array{Float64,1}}    │╻    getproperty
   └───       goto #10    │
   7 ── %20 = (isa)(%14, B)::Bool    │
   └───       goto #9 if not %20    │
   8 ── %22 = π (%14, B)    │
   │    %23 = (Base.getfield)(%22, :x)::Union{Float64, Array{Float64,1}, String}    │╻    getproperty
   └───       goto #10    │
   9 ──       (Core.throw)(ErrorException("fatal error in type inference (type bound)"))    │
   └───       $(Expr(:unreachable))    │
   10 ┄ %27 = φ (#6 => %18, #8 => %23)::Union{Float64, Int64, Array{Float64,1}, String}    │
   │    %28 = (%11 + %27)::Any    │
   │    %29 = (%13 === 3)::Bool    ││╻    ==
   └───       goto #12 if not %29    ││
   11 ─       goto #13    ││
   12 ─ %32 = (Base.add_int)(%13, 1)::Int64    ││╻    +
   └───       goto #13    │╻    iterate
   13 ┄ %34 = φ (#12 => %32)::Int64    │
   │    %35 = φ (#12 => %32)::Int64    │
   │    %36 = φ (#11 => true, #12 => false)::Bool    │
   │    %37 = (Base.not_int)(%36)::Bool    │
   └───       goto #15 if not %37    │
   14 ─       goto #5    │
20 15 ─ %40 = φ (#13 => %28, #4 => 0.0)::Any    │
   └───       return %40

The answer is similar to Non-allocating loop over a set of structs - #6 by kristoffer.carlsson.

A loop is a chunk of code that is repeated. Here, letters[i] will have a different type in each iteration, so the compiled code for the loop body needs to be general enough to handle this.
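One way to see this, sketched with the A and B structs from the example above and two hypothetical helper functions: indexing the tuple with a runtime index can only be inferred as some element of the tuple, while indexing with a literal pins down the exact element type.

```julia
abstract type ABCs end

struct A <: ABCs
    x::Float64
    y::Vector{Float64}
    z::Int
end
struct B <: ABCs
    x::Float64
    y::Vector{Float64}
    z::String
end

# Hypothetical helpers for illustration.
pick_runtime(letters, i) = letters[i]  # index only known at runtime
pick_third(letters) = letters[3]       # index is a compile-time literal

T = Tuple{A, A, B}

# With a runtime index, the result could be any element of the tuple,
# so the inferred type is not concrete (Union{A, B} in the output above).
runtime_type = first(Base.return_types(pick_runtime, (T, Int)))

# With a literal index, the element type is known exactly.
literal_type = first(Base.return_types(pick_third, (T,)))
```

A loop body is compiled once and reused for every i, so it is in the first situation; unrolling turns each iteration into the second.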

And like in the other answer, you can use Unrolled.jl to unroll the loop and thereby avoid the constraint of a loop.

julia> using Unrolled

julia> @unroll function getSumX(letters::Tuple{Vararg{ABCs}})
           sum = 0.0
           @unroll for i = 1:length(letters)
               sum += letters[i].x
           end
           return sum
       end;

julia> @code_warntype getSumX((a1, a2, b))
Body::Float64
1 ─ %1  = π (0.0, Core.Compiler.Const(0.0, false))
│   %2  = π (1, Core.Compiler.Const(1, false))
│   %3  = (Base.getfield)(letters, %2, true)::A
│   %4  = (Base.getfield)(%3, :x)::Float64
│   %5  = (Base.add_float)(%1, %4)::Float64
│   %6  = π (2, Core.Compiler.Const(2, false))
│   %7  = (Base.getfield)(letters, %6, true)::A
│   %8  = (Base.getfield)(%7, :x)::Float64
│   %9  = (Base.add_float)(%5, %8)::Float64
│   %10 = π (3, Core.Compiler.Const(3, false))
│   %11 = (Base.getfield)(letters, %10, true)::B
│   %12 = (Base.getfield)(%11, :x)::Float64
│   %13 = (Base.add_float)(%9, %12)::Float64
└──       return %13
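If adding a package is not an option, a different technique that achieves a similar effect is recursion over the tuple, peeling off one element per call. This is a sketch of that alternative, not what Unrolled.jl does internally; the function name sumX is made up, and the structs repeat the example above.

```julia
abstract type ABCs end

struct A <: ABCs
    x::Float64
    y::Vector{Float64}
    z::Int
end
struct B <: ABCs
    x::Float64
    y::Vector{Float64}
    z::String
end

# Base case: an empty tuple contributes nothing.
sumX(::Tuple{}) = 0.0

# Recursive case: each call sees a tuple with a known first element type,
# so first(letters).x can be inferred as Float64 at every step.
sumX(letters::Tuple) = first(letters).x + sumX(Base.tail(letters))

a1 = A(2.0, [1.0], 1)
a2 = A(3.0, [1.0], 1)
b = B(4.0, [1.0], "1")

recursive_result = sumX((a1, a2, b))
recursive_inferred = first(Base.return_types(sumX, (Tuple{A, A, B},)))
```

This tends to work well for short tuples; for very long ones the compiler may stop specializing the recursion.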

I understand that the code has to be general enough to handle the different types in each iteration, but it seems that it already does this by checking the type of the concrete variable in the current iteration, as seen in the following snippet of the @code_warntype result:

18 │    %14 = (Base.getfield)(letters, %12, true)::Union{A, B}    │╻    getindex
   │    %15 = (isa)(%14, A)::Bool    │
   └───       goto #7 if not %15    │
   6 ── %17 = π (%14, A)    │
   │    %18 = (Base.getfield)(%17, :x)::Union{Float64, Int64, Array{Float64,1}}    │╻    getproperty
   └───       goto #10    │
   7 ── %20 = (isa)(%14, B)::Bool    │
   └───       goto #9 if not %20    │
   8 ── %22 = π (%14, B)    │
   │    %23 = (Base.getfield)(%22, :x)::Union{Float64, Array{Float64,1}, String}    │╻    getproperty
   └───       goto #10    │
   9 ──       (Core.throw)(ErrorException("fatal error in type inference (type bound)"))    │
   └───       $(Expr(:unreachable))

At first this seems perfectly fine to me. %14 gets the current iterate of the tuple, which can be a Union{A, B}, as expected. The code then branches on whether it is an A or a B, as checked at %15 and %20. To me this seems like a general enough way to handle things… Within each branch, e.g. at %17 and %18, the lines only run if the iterate has a specific (known!) type, so surely the compiler should know at %18 that %17.x is a Float64, since %17 is of type A. I don’t see where it gets the idea that x could also have the types of the other fields of A.

You are right, the compiler is smarter than I thought here and does do the union splitting.
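
To make the term concrete: the isa checks in the snippet above correspond roughly to writing the branches by hand, as in the sketch below. The struct definitions and the name getx_split are hypothetical and only serve the illustration; the compiler generates the equivalent branching on its own for small unions.

```julia
# Hypothetical two-field stand-ins; only the `x` field matters here.
struct A
    x::Float64
    n::Int64
end

struct B
    x::Float64
    s::String
end

# Hand-written equivalent of a union split: test the concrete type,
# then run code in which the type of `item` is fully known.
function getx_split(item::Union{A, B})
    if item isa A
        return item.x   # here `item` is known to be an A
    elseif item isa B
        return item.x   # here `item` is known to be a B
    else
        error("unreachable: item is neither A nor B")
    end
end

letters = (A(1.0, 1), A(3.0, 2), B(5.0, "b"))
i = 3                          # runtime index, so letters[i] is Union{A, B}
println(getx_split(letters[i]))
```

Inside each branch the field access is a plain getfield on a concrete type, which is why one would expect x to be inferred as Float64 there.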

Relatedly, I think there is a regression in this particular case in the version of Julia you are using compared with Julia 1.1.

On 1.1 I get

julia> @code_warntype getSumX((a1, a2, b))
Body::Float64
1 ──       goto #17 if not true
2 ┄─ %2  = φ (#1 => 0.0, #16 => %32)::Float64
│    %3  = φ (#1 => 1, #16 => %38)::Int64
│    %4  = φ (#1 => 1, #16 => %39)::Int64
│    %5  = (Base.getfield)(letters, %3, true)::Union{A, B}
│    %6  = (isa)(%5, A)::Bool
└───       goto #4 if not %6
3 ── %8  = π (%5, A)
│    %9  = (Base.getfield)(%8, :x)::Float64
└───       goto #7
4 ── %11 = (isa)(%5, B)::Bool
└───       goto #6 if not %11
5 ── %13 = π (%5, B)
│    %14 = (Base.getfield)(%13, :x)::Union{Float64, Array{Float64,1}, String}
└───       goto #7
6 ──       (Core.throw)(ErrorException("fatal error in type inference (type bound)"))
└───       $(Expr(:unreachable))
7 ┄─ %18 = φ (#3 => %9, #5 => %14)::Union{Float64, Int64, Array{Float64,1}, String}
│    %19 = (isa)(%18, Float64)::Bool
└───       goto #9 if not %19
8 ── %21 = π (%18, Float64)
│    %22 = (Base.add_float)(%2, %21)::Float64
└───       goto #12
9 ── %24 = (isa)(%18, Int64)::Bool
└───       goto #11 if not %24
10 ─ %26 = π (%18, Int64)
│    %27 = (Base.sitofp)(Float64, %26)::Float64
│    %28 = (Base.add_float)(%2, %27)::Float64
└───       goto #12
11 ─ %30 = (%2 + %18)::Float64
└───       goto #12
12 ┄ %32 = φ (#8 => %22, #10 => %28, #11 => %30)::Float64
│    %33 = (%4 === 3)::Bool
└───       goto #14 if not %33
13 ─       goto #15
14 ─ %36 = (Base.add_int)(%4, 1)::Int64
└───       goto #15
15 ┄ %38 = φ (#14 => %36)::Int64
│    %39 = φ (#14 => %36)::Int64
│    %40 = φ (#13 => true, #14 => false)::Bool
│    %41 = (Base.not_int)(%40)::Bool
└───       goto #17 if not %41
16 ─       goto #2
17 ┄ %44 = φ (#15 => %32, #1 => 0.0)::Float64
└───       return %44

so the output value is correctly inferred and performance is good:

julia> @btime getSumX($(a1, a2, b))
  4.578 ns (0 allocations: 0 bytes)
9.0

On the master branch I have, I get

julia> @btime getSumX($(a1, a2, b))
  80.251 ns (6 allocations: 96 bytes)
9.0

with Any inferred.
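
Since the inferred type differs between Julia versions here, it can be handy to query it programmatically rather than reading the full @code_warntype output. The sketch below uses Base.return_types and isconcretetype from Base; the structs and getSumX_loop are hypothetical reconstructions of a plain-loop version, not the exact code from this thread, and what gets printed depends on the Julia version.

```julia
# Hypothetical stand-ins for the thread's types (field names other than
# `x` are assumptions).
abstract type ABCs end

struct A <: ABCs
    x::Float64
    n::Int64
end

struct B <: ABCs
    x::Float64
    s::String
end

# Assumed plain-loop version: the element type changes per iteration.
function getSumX_loop(letters::Tuple{Vararg{ABCs}})
    total = 0.0
    for i in 1:length(letters)
        total += letters[i].x
    end
    return total
end

letters = (A(1.0, 1), A(3.0, 2), B(5.0, "b"))

# Ask the compiler what it infers for this argument type.
rts = Base.return_types(getSumX_loop, (typeof(letters),))
println("inferred return type: ", first(rts))
println("concrete? ", isconcretetype(first(rts)))
```

If the second line prints false (for example because Any or a wide Union was inferred), allocations like those in the slower benchmark are a likely consequence.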

I also don’t really understand the

│    %14 = (Base.getfield)(%13, :x)::Union{Float64, Array{Float64,1}, String}

leading to

7 ┄─ %18 = φ (#3 => %9, #5 => %14)::Union{Float64, Int64, Array{Float64,1}, String}

Edit: I posted https://github.com/JuliaLang/julia/issues/32452.

Exactly, I was just about to comment on the %18 part! Still, it’s good to know that I at least seem to have grasped the theory of all this and it just might be the compiler that is not fully optimal.

Edit: I should have mentioned this long ago, but I’m on version 1.0.1 right now.