Advice to make getfield(::NamedTuple, ::Symbol) type-stable

I have a complicated nested NamedTuple, where profiling shows that getindex on the NamedTuple is a significant source of slowness. @code_warntype shows a type instability from getindex on that NamedTuple in the slow call. Here is a tiny MWE that I think shows the same issue:

julia> nt = (a=[1,2],b=["a","b"])
(a = [1, 2], b = ["a", "b"])

julia> @code_warntype nt.b
MethodInstance for getproperty(::NamedTuple{(:a, :b), Tuple{Vector{Int64}, Vector{String}}}, ::Symbol)
  from getproperty(x, f::Symbol) in Base at Base.jl:42
Arguments
  #self#::Core.Const(getproperty)
  x::NamedTuple{(:a, :b), Tuple{Vector{Int64}, Vector{String}}}
  f::Symbol
Body::Union{Vector{Int64}, Vector{String}}
1 ─      nothing
│   %2 = Base.getfield(x, f)::Union{Vector{Int64}, Vector{String}}
└──      return %2

Are there any common patterns to make getting fields from named tuples type stable?

I think that specifically is happening because nt is global, for example:

julia> function f(nt) 
           s = nt.b
           push!(s,"c")
       end
f (generic function with 1 method)

julia> nt = (a=[1,2],b=["a","b"])
(a = [1, 2], b = ["a", "b"])

julia> @code_warntype f(nt)
MethodInstance for f(::NamedTuple{(:a, :b), Tuple{Vector{Int64}, Vector{String}}})
  from f(nt) in Main at REPL[21]:1
Arguments
  #self#::Core.Const(f)
  nt::NamedTuple{(:a, :b), Tuple{Vector{Int64}, Vector{String}}}
Locals
  s::Vector{String}
Body::Vector{String}
1 ─      (s = Base.getproperty(nt, :b))
│   %2 = Main.push!(s, "c")::Vector{String}
└──      return %2


julia> f(nt)
3-element Vector{String}:
 "a"
 "b"
 "c"

If the compiler knows the concrete type of a variable when it compiles the function for a specific combination of argument types (i.e., at the function call), then getfield with a literal field name will always be type-stable, whether the object is a NamedTuple or a custom struct. Otherwise, if the compiler cannot infer the type, the variable is type-unstable and getfield becomes a source of slowdown, again regardless of whether the concrete object is a NamedTuple.


Thank you for the replies. I see that putting the getfield in a function fixes my tiny MWE. However, in my actual working code with a more complex structure, it’s already in a function and still unstable despite both arguments to the function being of concrete types.

I’ll have to work harder to make a good MWE. :wink:


There is something else at play then, and only an MWE will give us the insight needed to find the problem. It would be very strange for a NamedTuple passed to a function to be type-unstable inside it (functions specialize on the concrete NamedTuple type), so it seems that whatever creates the NamedTuple, inside the function body that accesses its fields, is not being inferred correctly.


Here’s a better MWE:

nt = (a=[1,2],b=["a","b"], c=[:x, :y], d=nothing, e=[1.0, 2.0], f=1.0)

function g(nt, obj) 
    s = nt[obj.id]
    return s
end

Then

julia> @code_warntype g(nt, (id=:e,))
Variables
  #self#::Core.Const(g)
  nt::NamedTuple{(:a, :b, :c, :d, :e, :f), Tuple{Vector{Int64}, Vector{String}, Vector{Symbol}, Nothing, Vector{Float64}, Float64}}
  obj::NamedTuple{(:id,), Tuple{Symbol}}
  s::Any

Body::Any
1 ─ %1 = Base.getproperty(obj, :id)::Symbol
│        (s = Base.getindex(nt, %1))
└──      return s

I understand that since the object holding the id is not known until runtime, the function can’t be compiled to know the return type. I’m looking for advice on how to work around this – are there any common patterns?

Thanks!

You could use the value type to specialize the call, but if this is spread everywhere and you need this for all fields, this is not a very composable solution. For the optimization of some specific point it may be a good alternative.

julia> my_getfield(nt, ::Val{:e}) = getfield(nt, :e)
       my_getfield(nt, ::Val{:a}) = getfield(nt, :a)
my_getfield (generic function with 2 methods)

julia> function g(nt, id) 
           s = my_getfield(nt, id)
           return s
       end
g (generic function with 1 method)

julia> nt = (a=[1,2],b=["a","b"], c=[:x, :y], d=nothing, e=[1.0, 2.0], f=1.0)
(a = [1, 2], b = ["a", "b"], c = [:x, :y], d = nothing, e = [1.0, 2.0], f = 1.0)

julia> @code_warntype g(nt, Val(:e))
MethodInstance for g(::NamedTuple{(:a, :b, :c, :d, :e, :f), Tuple{Vector{Int64}, Vector{String}, Vector{Symbol}, Nothing, Vector{Float64}, Float64}}, ::Val{:e})
  from g(nt, id) in Main at REPL[2]:1
Arguments
  #self#::Core.Const(g)
  nt::NamedTuple{(:a, :b, :c, :d, :e, :f), Tuple{Vector{Int64}, Vector{String}, Vector{Symbol}, Nothing, Vector{Float64}, Float64}}
  id::Core.Const(Val{:e}())
Locals
  s::Vector{Float64}
Body::Vector{Float64}
1 ─     (s = Main.my_getfield(nt, id))
└──     return s


@code_warntype is misleading you here. This is exactly the same issue as trying to do @code_warntype pair.first here: Accessing the field of `Pair` seems to be type-unstable - #4 by rdeits

Edit: Although if you really are passing a Symbol down from your outermost function, then the type instability is real (even if your first example doesn’t necessarily show it). If that’s the case, then the Val approach seems reasonable to me.
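To make the point about the macro concrete, here is a minimal sketch (the names `get_b`, `macro_view`, and `wrapped_view` are illustrative, not from the thread). `@code_warntype nt.b` inspects a call to getproperty where the field name is just "some Symbol", whereas inside a function the literal `:b` is a constant the compiler can use. `Base.return_types` is used only to ask inference what it concludes from the argument types.

```julia
nt = (a=[1, 2], b=["a", "b"])

# What `@code_warntype nt.b` effectively analyzes: getproperty with an
# arbitrary runtime Symbol, so the result could be either field's type.
macro_view = first(Base.return_types(getproperty, (typeof(nt), Symbol)))

# Wrapping the access in a function keeps `:b` as a literal constant.
get_b(x) = x.b
wrapped_view = first(Base.return_types(get_b, (typeof(nt),)))

println(isconcretetype(macro_view))    # the macro's view is not concrete
println(wrapped_view)                  # the wrapped access is fully inferred
```

If the wrapped version infers a concrete type, the instability reported at the REPL is most likely an artifact of how the macro was invoked and not a property of the real code.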


Yes, what you are doing is using a value (a Symbol value) to select a field from a heterogeneously typed NamedTuple. This is inherently type-unstable because the output type cannot be inferred from the input types (i.e., you always pass a Symbol but you can get back any type in the tuple). If the symbol value were a constant known at compile time (as it is when you write obj.id, where id is an implicit Symbol literal), then the code would not be type-unstable. The way to achieve this from outside the function is to lift the value into a type by means of Val(symbol_value). This way, your function will be compiled/specialized for the specific field you are trying to access, and the compiler can infer the output type from the input types.


Thank you all for the advice! I have a solution that generates the Val-based getfield functions automatically for all of the symbols I need using some metaprogramming. I also needed to add a stored property in my structs so that they are parameterized by the Val{id} of their id field. Now, when I pass my structs into a function where I need to look up in a namedtuple for the struct’s id, I can use the type-stable version of getfield and the function has been specialized for the struct with that id. Looks good and my code is much faster!

The only problem is that now I’m suffering from some world-age problems due to the @eval’d new getfield-like methods. I have a hack currently that works for testing (manually re-evaluating calling functions) within vscode after the @eval’d functions have been created. I’ll keep hacking on this and open up another post here on this part if needed. Thanks again!


It sounds like a different overall approach may be needed, but you could get around that by passing a function instead of the symbol:

julia> my_getfield(nt, ::Val{:e}) = getfield(nt, :e)
       my_getfield(nt, ::Val{:a}) = getfield(nt, :a)
my_getfield (generic function with 2 methods)

julia> function g(nt, gf::F) where F<:Function 
           s = gf(nt)
           return s
       end
g (generic function with 2 methods)

julia> nt = (a=[1,2],b=["a","b"], c=[:x, :y], d=nothing, e=[1.0, 2.0], f=1.0);

julia> @code_warntype g(nt, x -> my_getfield(x, Val(:e)))
MethodInstance for g(::NamedTuple{(:a, :b, :c, :d, :e, :f), Tuple{Vector{Int64}, Vector{String}, Vector{Symbol}, Nothing, Vector{Float64}, Float64}}, ::var"#9#10")
  from g(nt, gf::F) where F<:Function in Main at REPL[9]:1
Static Parameters
  F = var"#9#10"
Arguments
  #self#::Core.Const(g)
  nt::NamedTuple{(:a, :b, :c, :d, :e, :f), Tuple{Vector{Int64}, Vector{String}, Vector{Symbol}, Nothing, Vector{Float64}, Float64}}
  gf::Core.Const(var"#9#10"())
Locals
  s::Vector{Float64}
Body::Vector{Float64}
1 ─     (s = (gf)(nt))
└──     return s



Hmm, I believe this is not really necessary: you can take the Val(symbol_value) argument as f(::Val{symbol_value}) where {symbol_value} = ... and use symbol_value normally inside the function. No eval is needed.
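A minimal sketch of that suggestion, assuming hypothetical helper names `field_by_val` and `field_by_symbol` (neither appears in the thread). The symbol is bound as a static parameter of the method, so one definition covers every field and no `@eval` is involved.

```julia
# Hypothetical helper: a single method for all fields. Inside the body,
# `s` is a compile-time constant taken from the Val type parameter.
field_by_val(nt::NamedTuple, ::Val{s}) where {s} = getfield(nt, s)

# For comparison: here the symbol is an ordinary runtime value.
field_by_symbol(nt::NamedTuple, s::Symbol) = getfield(nt, s)

nt = (a=[1, 2], b=["a", "b"], c=[:x, :y], d=nothing, e=[1.0, 2.0], f=1.0)

# Ask inference what it concludes from the argument types alone.
val_type = first(Base.return_types(field_by_val, (typeof(nt), Val{:e})))
sym_type = first(Base.return_types(field_by_symbol, (typeof(nt), Symbol)))

println(val_type)                  # concrete element type of field `e`
println(isconcretetype(sym_type))  # false: the Symbol version is not inferable
```

The caller still has to supply the `Val` from somewhere the compiler can see it, for example a literal or a type parameter. Otherwise the instability just moves to the call site.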

I also believe you do not need this. You can just:

struct MyStruct{MySymbol}
    x :: Int
end

f(my_struct :: MyStruct{my_symbol}) where {my_symbol} = ...

That is, you can add a type parameter to a struct without having a field of that type, and the function will specialize on MyStruct with that specific type parameter. The type parameter may be a Symbol directly (it does not need to be wrapped in Val). Ints can also be used directly as type parameters (this is how Array{Float64, 2} works: the 2 is an Int value being used as a type parameter).
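As a sketch of how this could connect back to the NamedTuple lookup, assuming a hypothetical struct `Tagged` and function `lookup` (illustrative names only): the id lives in the type, and the method pulls it out as a static parameter.

```julia
# Hypothetical struct: `ID` is only a type parameter; no field stores it.
struct Tagged{ID}
    x::Int
end

# The symbol is read from the type, so each `Tagged{ID}` gets its own
# specialization and `getfield` sees a constant field name.
lookup(nt::NamedTuple, ::Tagged{ID}) where {ID} = getfield(nt, ID)

nt = (a=[1, 2], b=["a", "b"], e=[1.0, 2.0])
obj = Tagged{:e}(1)

result = lookup(nt, obj)   # same array as nt.e
inferred = first(Base.return_types(lookup, (typeof(nt), Tagged{:e})))
println(inferred)
```

One trade-off of this design is that a collection holding `Tagged` objects with different ids has an abstract element type, so the benefit shows up only where the concrete `Tagged{ID}` is known to the compiler.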

I really feel like you are over-complicating here.


Thank you for this suggestion – I do need a different approach.

However, this approach here has the issue of calling Val for each execution and I’ve seen some performance issues with this in other experiments. Hence my attempt to do Val once for each struct and store it. I do like the idea of accomplishing this with higher order functions instead of metaprogramming though, so will think about it some more.
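Whether calling Val costs anything likely depends on where the symbol comes from. A small sketch, with hypothetical function names: when the symbol is a literal, the compiler can resolve `Val(:e)` at compile time, whereas `Val(s)` on a runtime Symbol yields a type inference cannot pin down, which typically leads to a dynamic dispatch in whatever consumes it.

```julia
# Literal symbol: the resulting type is known at compile time.
literal_val() = Val(:e)

# Runtime symbol: the resulting type depends on a value unknown to the compiler.
runtime_val(s::Symbol) = Val(s)

lit = first(Base.return_types(literal_val, Tuple{}))
dyn = first(Base.return_types(runtime_val, (Symbol,)))

println(lit)                  # Val{:e}
println(isconcretetype(dyn))  # false
```

This is consistent with the closure in the example above, where `Val(:e)` is written with a literal and should not incur a per-call cost.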

Yes, I am most certainly overcomplicating things :smile: . Thank you for the suggestion.

I just tested this suggestion. It solved my world-age issue and almost halved my code’s runtime!

Thank you – this is another great idea that looks to be both efficient and avoid a lot of messy code. Giving this one a try now.


I’ve made an attempt at this approach, but have run into the following problem. If my struct has another type parameter, then constructing the struct gets messy. For example,

abstract type AbstractStruct{ID} end

struct MyStruct{ID, T} <: AbstractStruct{ID}
    x::T
end

and this predictably yields an error when I try to construct an object without enough type parameters

julia> MyStruct{:name}(1.0)
ERROR: MethodError: no method matching (MyStruct{:name, T} where T)(::Float64)

Of course I can do MyStruct{:name, Float64}(1.0), but this becomes unwieldy for my real types with complex nested NamedTuple fields that are parametric types. I am aware that I can manually create additional constructors that do not require the additional type parameters, like

(::Type{MyStruct{T}})(x) where {T} = MyStruct{T, Float64}(x)

though this is a lot of additional code unless it can be automatically generated for all types that are <: AbstractStruct{ID}.

Does this work for your case?

julia> (::Type{MyStruct{T}})(x) where {T} = MyStruct{T, typeof(x)}(x)

julia> MyStruct{:a}(2)
MyStruct{:a,Int64}(2)

It does, thanks! And more generically, I can define something like

(::Type{StructX{T}})(args...) where {T} = StructX{T, typeof.(args)...}(args...)

for each new struct that is <: AbstractStruct{ID}.

I wanted to do something like

function (::Type{S})(args...) where {S <: AbstractStruct{T}} where {T}
    non_id_var_types = typeof.(args)
    type_parameters = (T, non_id_var_types...)
    S{type_parameters...}(args...)
end

to define this constructor generically for all such types, but it runs into an error

julia> function (::Type{S})(args...) where {S <: AbstractStruct{T}} where {T}
           non_id_var_types = typeof.(args)
           type_parameters = (T, non_id_var_types...)
           S{type_parameters...}(args...)
       end

julia> struct E{ID, T, G} <: AbstractStruct{ID}
           y::T
           z::G
       end

julia> E{:name}(1,2)
ERROR: too many parameters for type

OK guys, I’m trying to follow your discussion. Here is what I have

abstract type AbstractStruct{ID} end

struct MyStruct{ID, T} <: AbstractStruct{ID}
    x::T
end

(::Type{MyStruct{T}})(x) where {T} = MyStruct{T, typeof(x)}(x)

(::Type{MyStruct{T}})(args...) where {T} = MyStruct{T, typeof.(args)...}(args...)

function (::Type{S})(args...) where {S <: AbstractStruct{T}} where {T}
    non_id_var_types = typeof.(args)
    type_parameters = (T, non_id_var_types...)
    @show S, type_parameters
    S{type_parameters[2:end]...}(args...)
end

struct E{ID, T, G} <: AbstractStruct{ID}
    y::T
    z::G
end

E{:name}(1,2)

yielding

(S, type_parameters) = (E{:name}, (:name, Int64, Int64))
E{:name, Int64, Int64}(1, 2)

Is that where you’re at?


Well, yes it does. I much appreciate you chiming in and taking care of my bug there!

Putting all of these pieces together, I think we may have a very user-friendly and efficient solution (where "users" means people adding more <: AbstractStruct{T} types and associated methods).

And for the record, we can clean that constructor up a bit:

(::Type{S})(args...) where {S <: AbstractStruct{T}} where {T} = S{typeof.(args)...}(args...)

Though moving this to a struct with a mix of type parameter fields and normal fields is going to have to be an exercise for another day.

Yes, I did not broadcast and splat in my example because it could lead someone to think that it would work in general. It only works if every argument corresponds to a type parameter of the struct. You can have a generic solution for a specific type, i.e., a single constructor for a parameterized type that works for all concrete versions of it, but you need to pick out the value arguments that determine the type parameters, call typeof on just those, and put the resulting types in the correct slots of the parameterized struct.
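A minimal sketch of that last point, assuming a hypothetical struct `Mixed` with one plain field and one parameterized field (not a type from the thread). The constructor calls typeof only on the argument that determines a type parameter and places it in the matching slot.

```julia
abstract type AbstractStruct{ID} end

# Hypothetical struct mixing a plain field with a parameterized one.
struct Mixed{ID, T} <: AbstractStruct{ID}
    n::Int   # plain field: contributes no type parameter
    x::T     # parameterized field: its type fills the `T` slot
end

# Only `x` determines a type parameter, so `typeof` is applied to it alone.
(::Type{Mixed{ID}})(n, x) where {ID} = Mixed{ID, typeof(x)}(n, x)

m = Mixed{:name}(3, 2.5)
println(typeof(m))   # Mixed{:name, Float64}
```

Broadcasting `typeof.(args)` here would also pass `Int` for `n` and supply one parameter too many, which is why a per-type constructor like this one is needed when fields are mixed.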
