Allocations when using getfield with a tuple/vector of symbols

Tetrakai · November 18, 2024, 4:26am

MWE:

using Parameters, BenchmarkTools
@with_kw struct Test
    A :: String = "A"
    B :: Int64  = 0
end

test = Test()
tup  = (:A, :B)

@btime getfield($test, :A)
@btime getfield($test, $tup[1])

Is there a way to loop over struct fields without these allocations?

julia> @btime getfield($test, :A)
  2.400 ns (0 allocations: 0 bytes)
"A"

julia> @btime getfield($test, $tup[1])
  23.002 ns (1 allocation: 32 bytes)
"A"

DatName · November 18, 2024, 7:20am

You could use generated function:

@generated function getfield_unrolled(t::T, f::Symbol) where {T}
    names = fieldnames(T)
    exprs = [:($(QuoteNode(name)) == f && return getfield(t, $(QuoteNode(name)))) for name in names]

    push!(exprs, :(throw(ErrorException("type $T has no field $f"))))
    return quote
        $(exprs...)
    end
end

and get

julia> @btime getfield_unrolled($test, $(tup[1]))
  1.855 ns (0 allocations: 0 bytes)
"A"

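For the Test struct from the MWE, the generated body should be roughly equivalent to the hand-written chain of comparisons below. This is only a sketch of the expansion: getfield_manual is an illustrative name, and Test is redefined here as a plain struct (no Parameters.jl) so the snippet runs on its own.

```julia
# Plain-struct stand-in for the MWE's `Test` (assumption: same two fields).
struct Test
    A::String
    B::Int64
end

# Hand-written equivalent of what the generated function expands to for `Test`.
# Each branch calls `getfield` with a literal symbol, so every call site has a
# field name known at compile time even though `f` is a runtime value.
function getfield_manual(t::Test, f::Symbol)
    :A == f && return getfield(t, :A)
    :B == f && return getfield(t, :B)
    throw(ErrorException("type Test has no field $f"))
end

test = Test("A", 0)
tup  = (:A, :B)
getfield_manual(test, tup[1])   # "A"
getfield_manual(test, tup[2])   # 0
```
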
Benny · November 18, 2024, 10:14am

And finally I know a good example showing that the allocation can be for the ~~runtime dispatch~~ alone, not the uncertain return type. Still no idea what that allocation is doing though. EDIT: actually getfield is a builtin function so I’m not even sure if it runtime dispatches the way generic functions do.

Tetrakai · November 18, 2024, 11:53am

Thanks, I’ve been trying to avoid metaprogramming for this project and manually unrolling such loops, but it can be repetitive/tedious. Maybe the warnings against metaprogramming are overblown?

It seems like a lot of trouble for a seemingly simple task.

foobar_lv2 · November 18, 2024, 12:06pm

No. Metaprogramming is appropriate for this task.

There is fundamentally no way of making field access fast if the field name is not known at compile time. In other words, you must make sure that the field name is known at compile time. This is fundamentally metaprogramming.

There are three approaches:

1. Constant propagation: getfield(x, :A) and similar.
2. Lifting to type domain: myGetfield(x, ::Val{sym}) where sym = getfield(x, sym)
3. Explicit metaprogramming.

Lifting to type domain is a way of explicitly encouraging constant propagation at various points.
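
A minimal sketch of the type-domain approach, assuming a plain Test struct like the one in the MWE. The names my_getfield, FIELDS and collect_fields are illustrative only. Each field name is wrapped in a Val, so it travels in the type rather than as a runtime value:

```julia
# Plain-struct stand-in for the MWE's `Test` (assumption: same two fields).
struct Test
    A::String
    B::Int64
end

# The symbol is a type parameter of `Val`, so each method instance
# is compiled for one specific field name.
my_getfield(x, ::Val{sym}) where {sym} = getfield(x, sym)

# A tuple of `Val`s stands in for the tuple of symbols from the MWE.
const FIELDS = (Val(:A), Val(:B))

# `map` over a tuple calls `my_getfield` once per element type.
collect_fields(x) = map(v -> my_getfield(x, v), FIELDS)

test = Test("A", 0)
collect_fields(test)   # ("A", 0)
```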

The issue with naively relying on const-prop is that you are at the mercy of unstable compiler heuristics. It forces all readers of your code to understand how const-prop and inlining work in all Julia versions you support and figure out in their head whether that applies to your construction. This is terrible for maintainability. Better go for metaprogramming if it is not exceedingly obvious that the field names are known at compile time.

For that, you must benchmark differently: Don’t benchmark “the small function you want to measure”. For tiny functions, it’s all about interaction with context (surrounding code), so you must write a realistic outer loop function that calls your tiny function, and benchmark that. Benchmarking / performance is not composable for small timings.
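
A rough sketch of what measuring in context could look like, using only Base. The names get_B and total_B are hypothetical, and Test is a plain-struct stand-in for the MWE. The point is that the measurement covers the whole loop that calls the tiny accessor, not the accessor alone:

```julia
# Plain-struct stand-in for the MWE's `Test` (assumption: same two fields).
struct Test
    A::String
    B::Int64
end

# Tiny accessor under test; the field name is a literal symbol.
get_B(t) = getfield(t, :B)

# Outer function that uses the accessor the way real code would.
function total_B(ts)
    s = 0
    for t in ts
        s += get_B(t)
    end
    return s
end

ts = [Test("A", i) for i in 1:1000]
total_B(ts)                      # warm-up call so compilation is not measured
bytes = @allocated total_B(ts)   # bytes allocated by the whole loop
println("allocated bytes: ", bytes)
```

With BenchmarkTools loaded, the equivalent measurement would be @btime total_B($ts).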

Tetrakai · November 18, 2024, 2:28pm

For this case the field names are known ahead of time. I know exactly which elements will be extracted.

Ie, this works fine:

a = test.A
b = test.B

But doing this for, eg, 10 fields becomes repetitive.

foobar_lv2 · November 18, 2024, 2:39pm

So why not implement tuple unpacking such that you can write a, b = test? For that you need to extend Base.indexed_iterate for your type, either by hand or by metaprogramming a la @with_kw.
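
A hand-written sketch of that idea for a plain two-field Test struct. Note that Base.indexed_iterate is an internal, undocumented function, so treat this as an illustration that may need adjusting between Julia versions rather than a stable recipe:

```julia
# Plain-struct stand-in for the MWE's `Test` (assumption: same two fields).
struct Test
    A::String
    B::Int64
end

# Destructuring `a, b = x` lowers to calls of `Base.indexed_iterate(x, i, state)`
# with a literal index `i`, so the field position is visible to the compiler.
Base.indexed_iterate(t::Test, i::Int, state=1) = (getfield(t, i), i + 1)

test = Test("A", 0)
a, b = test   # a is the first field, b the second
```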

nilshg · November 19, 2024, 7:28am

Couldn’t you just do

(;a, b) = my_object

without implementing any additional methods?
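
For reference, a small sketch of this property destructuring applied to a plain stand-in for the MWE's Test struct. It requires Julia 1.7 or later, and the names on the left must match the field names exactly, so for the MWE they are A and B rather than a and b:

```julia
# Plain-struct stand-in for the MWE's `Test` (assumption: same two fields).
struct Test
    A::String
    B::Int64
end

test = Test("A", 0)

# Equivalent to `A = test.A; B = test.B`.
(; A, B) = test
```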

Tetrakai · November 19, 2024, 8:16am

If I understand correctly, those suggestions help with the MWE but the real code would be just as verbose as if I manually unroll it.

For example, I need to add something (from another struct that I loop over the same way) to a and store it, then do the same for b, etc. My purpose is to avoid repetitive code without sacrificing performance.

aplavin · November 19, 2024, 1:21pm

Generally, for looping over fields/properties, one would extract them as a namedtuple – then, stuff like map works and works performantly.
For extraction, use ConstructionBase.jl: it has getfields(x)::NamedTuple and getproperties(x)::NamedTuple.
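
A Base-only sketch of the same idea, for readers who want to see the shape of it without installing anything. The helper fields_nt is a hypothetical stand-in for ConstructionBase's getfields, and Stats is a made-up three-field struct:

```julia
# Made-up struct for illustration.
struct Stats
    Gam::Int
    Gls::Int
    Ass::Int
end

# Hypothetical stand-in for `ConstructionBase.getfields`:
# pack all fields of `x` into a NamedTuple keyed by field name.
fields_nt(x::T) where {T} =
    NamedTuple{fieldnames(T)}(ntuple(i -> getfield(x, i), Val(fieldcount(T))))

a = Stats(10, 3, 2)
b = Stats(8, 5, 2)

# `map` over two NamedTuples with the same names works field by field.
d = map(-, fields_nt(a), fields_nt(b))   # (Gam = 2, Gls = -2, Ass = 0)
sumsq = sum(abs2, values(d))             # 8
```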

It may also help if you show some specific examples of what you are trying to achieve.

Tetrakai · November 19, 2024, 8:58pm

Sure, here is one of the structs in question:

@with_kw struct Roster{T1 <: SVector{MAXPLAYERS, String15}, 
                       T2 <: SVector{MAXPLAYERS, Int16}}
    # Player Info
    Name :: T1 = @SVector fill(String15(""), MAXPLAYERS)
    Age  :: T1 = @SVector fill(String15(""), MAXPLAYERS)
    Nat  :: T1 = @SVector fill(String15(""), MAXPLAYERS)
    Prs  :: T1 = @SVector fill(String15(""), MAXPLAYERS)

    # Ratings
    St   :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Tk   :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Ps   :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Sh   :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Sm   :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Ag   :: T2 = @SVector fill(Int16(0), MAXPLAYERS)

    # Abilities
    KAb  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    TAb  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    PAb  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    SAb  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)

    # Stats
    Gam  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Sav  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Ktk  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Kps  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Sht  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Gls  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Ass  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    DP   :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Inj  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Sus  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
    Fit  :: T2 = @SVector fill(Int16(0), MAXPLAYERS)
end

And one of the functions (using the solution recommended above):

function calc_metric(baseline, sims)
    nteams    = length(baseline.lg)
    nreps     = length(sims)
    pl_fields = (:Gam, :Sav, :Ktk, :Kps, :Sht, :Gls, :Ass, :DP)
    tm_fields = (:Pl, :W, :D, :L, :GF, :GA, :GD, :Pts)

    sumSq = 0
    for sim in sims
        # Player Stats
        for i in eachindex(pl_fields)
            for j in eachindex(baseline.lg)
                x = getfield_unroll(baseline.lg[j].roster, pl_fields[i])
                y = getfield_unroll(sim.lg[j].roster,      pl_fields[i])
                sumSq += sum((Int64.(x - y)).^2)
            end
        end

        # Team Stats
        for i in eachindex(tm_fields)
            for j in eachindex(baseline.lg_table)
                x = getfield_unroll(baseline.lg_table[j], tm_fields[i])
                y = getfield_unroll(sim.lg_table[j],      tm_fields[i])
                sumSq += sum((Int64.(x - y))^2)
            end
        end

    end

    # Refers to RMSE per team (not total number of variables)
    RMSE = sqrt(sumSq/(nteams*nreps))

    return RMSE
end

Dan · November 20, 2024, 2:12am

Tetrakai:

# Player Stats
for i in eachindex(pl_fields)
    for j in eachindex(baseline.lg)
        x = getfield_unroll(baseline.lg[j].roster, pl_fields[i])
        y = getfield_unroll(sim.lg[j].roster,      pl_fields[i])
        sumSq += sum((Int64.(x - y)).^2)
    end
end

Perhaps old-fashioned broadcasting will get similar results:
(couldn’t actually test this, there may be tweaking needed because values are Arrays)

using LinearAlgebra  # to use `dot`

# Player stats
# broadcasting over pl_fields, so no second `for`
for j in eachindex(baseline.lg)
    x = Int64.(getfield.(Ref(baseline.lg[j].roster), pl_fields))
    y = Int64.(getfield.(Ref(sim.lg[j].roster), pl_fields))
    sumSq += sum(t -> dot(t,t), x .- y)
end

(the other loop can be modified in a similar way)
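
Since the field values are arrays, one possible tweak is to widen and square inside the reduction instead of calling Int64 on the whole tuple. The sketch below uses a made-up MiniRoster with plain Vector{Int16} fields in place of the real Roster, and it only demonstrates that the result is correct; it makes no claim about allocations:

```julia
# Made-up two-field stand-in for `Roster` (assumption: plain Vectors, not SVectors).
struct MiniRoster
    Gam::Vector{Int16}
    Gls::Vector{Int16}
end

fields = (:Gam, :Gls)
r1 = MiniRoster(Int16[3, 1], Int16[2, 0])
r2 = MiniRoster(Int16[1, 1], Int16[0, 1])

# Broadcasting `getfield` over the tuple of symbols yields a tuple of vectors.
x = getfield.(Ref(r1), fields)
y = getfield.(Ref(r2), fields)

# Widen each pair to Int64 before subtracting, then sum the squared differences.
sq = sum(((a, b),) -> sum(abs2, Int64.(a) .- Int64.(b)), zip(x, y))   # 9
```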

aplavin · November 20, 2024, 2:45am

I feel something like

pl_type = NamedTuple{(:Gam, :Sav, :Ktk, :Kps, :Sht, :Gls, :Ass, :DP)}
...
p1 = getproperties(baseline.lg[j].roster)
p2 = getproperties(sim.lg[j].roster)
sumSq += map(pl_type(p1), pl_type(p2)) do x, y
    sum(abs2, Int64.(x .- y))
end |> sum
...

should be quite performant, and more readable.
Would be easier to check with an MWE of course
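
A small Base-only sketch of the field-selection step, with made-up NamedTuples standing in for the output of getproperties. Picked is an illustrative name. Calling NamedTuple{names}(nt) keeps only the listed fields, so the map runs over just the fields of interest:

```julia
# Made-up data standing in for `getproperties(roster)` of two rosters.
p1 = (Gam = [1, 2], Gls = [0, 1], Fit = [90, 80])
p2 = (Gam = [1, 1], Gls = [2, 1], Fit = [70, 80])

# Selecting a subset of fields by name; `Fit` is dropped.
Picked = NamedTuple{(:Gam, :Gls)}

# Per-field sum of squared differences, then the total over the picked fields.
per_field = map((x, y) -> sum(abs2, x .- y), Picked(p1), Picked(p2))   # (Gam = 1, Gls = 4)
total = sum(values(per_field))                                         # 5
```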

Also, you may want to consider using a StructArray instead of your Roster struct. Or, if you do want to dispatch on ::Roster, to have this struct only with a data::StructArray field.
