Function that constructs a NamedTuple with field names taken from a function argument, with a correctly inferred return type

metaprogramming
inference

#1

The problem I’m trying to solve is to have a function f that:

  • takes as one of its arguments a Tuple of Symbols (of variable length), called fields
  • for each field in fields, looks up the type of that field in a constant, global mapping called field_to_type::Dict{Symbol, DataType}
  • does some processing / data loading to get the data corresponding to each field
  • returns a NamedTuple whose field names are equal to fields, and whose field types and data correspond to the above.

Importantly, the return type of the function must be inferable (e.g. NamedTuple{(:foo, :bar), Tuple{Int64, Int64}}). The fields argument can always be specified as a constant directly at the call site, e.g. f((:foo, :bar)). If it would help, it could also be specified as a Val type.

In the rest of my program, f will only be called at a small number of sites, with different values of fields specified at the call site.

So far, I’ve managed to come up with a way of doing this using a @generated function, but it is kind of hacky and hard to read:

const field_to_type = Dict(
    :foo => String,
    :bar => Bool,
    :doo => Int64,
    :dah => String,
    :asd => Float64,
    :dsa => String
)

@generated function f(::Val{fields}) where {fields}
    values_expr = :(())
    for f in fields
        push!(values_expr.args, :(Array{$(field_to_type[f])}(undef, len)))
    end

    expr = quote
        # Do the 'dynamic' computation in the function at call-time, here we pick a random integer
        len = rand(1:10)
        values = $(values_expr)
        NamedTuple{$(fields)}(values)
    end

    return expr
end

f(Val((:asd, :dsa)))  # correctly inferred as having type NamedTuple{(:asd, :dsa),Tuple{Array{Float64,1},Array{String,1}}}

The len and the construction of Arrays are simply representative of the ‘call-time’ computation.
Currently the main thing I don’t like about this is the fact that the Tuple expression has to be constructed at the top, then spliced in while referring to variables in the quoted scope. I would much prefer it if the flow of the code could at least be linear.
Would really appreciate some feedback / help! This is my first time really messing around with this side of Julia, so I might be missing something very obvious or going about this the completely wrong way. Perhaps there is even a way to achieve something similar without any metaprogramming and just via the type system (though that seems unlikely).
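
One way the "linear flow" wish above might be met is to build the per-field expressions inside the quote itself, via a splatted interpolation, rather than assembling the tuple expression beforehand. This is only a sketch: FIELD_TYPES and f_linear are hypothetical names (FIELD_TYPES stands in for field_to_type), and it is still a @generated function, not a metaprogramming-free solution.

```julia
# Hypothetical stand-in for the thread's `field_to_type` mapping (renamed so the
# sketch can be pasted into a session that already defines the original).
const FIELD_TYPES = Dict(:foo => String, :bar => Bool, :asd => Float64, :dsa => String)

@generated function f_linear(::Val{fields}) where {fields}
    quote
        # Call-time ("dynamic") computation, as in the original post.
        len = rand(1:10)
        # The inner generator runs while the expression is being built, so the
        # returned code contains one plain `Vector{T}(undef, len)` call per field.
        values = ($((:(Vector{$(FIELD_TYPES[s])}(undef, len)) for s in fields)...),)
        NamedTuple{$fields}(values)
    end
end

result = f_linear(Val((:asd, :dsa)))
```

The body now reads top to bottom (len, then values, then the NamedTuple), at the cost of a denser interpolation on the values line. As in the original, the type lookup happens once per distinct fields value, when the method body is generated.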


#2

Perhaps a bit more context would help:
I’m writing a function that wraps a web service. The web service can return several dozen fields. For performance reasons, one doesn’t always want to request all fields when calling the function; usually only a small subset of them is needed. The actual fields will always be statically defined at the call site of the function, i.e. for a given call, the fields are never dynamic.
My initial implementation of this function just returned a Dict{Symbol, Any}. This is obviously terrible for the later parts of the program that use the output of this function and access specific keys. Given that we know what the type of each field’s value should be, it would be nice if the type system knew about this. One thing I tried was declaring some types that include all of the web service’s fields, each of them optional, e.g.:

struct MyType
    foo::Union{Nothing,String}
    bar::Union{Nothing,Bool}
    # ... etc. for all of the numerous fields
end

Then the function returns a MyType struct with most of the fields set to nothing.
This solution is annoying for a few reasons though:

  • the struct declaration code is very verbose
  • the struct construction code becomes exceedingly verbose
  • if fields are added to the web service, the structs and constructor calls need to be updated.

It seemed like it would be nicer to return a NamedTuple instead: much more dynamic and flexible, with less boilerplate, but only if the type of the NamedTuple could be correctly inferred. I tried getting this running without doing any metaprogramming (by dispatching on the fields as a Val type), but couldn’t. Hence the prototype I was hacking together above.
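
To make the motivation concrete, here is a small sketch of what downstream code sees with each return type. The accessor names and sample values are made up for illustration; only Base.return_types from Base is used to query inference.

```julia
# Illustrative values only; the real function would fill these from the web service.
as_dict = Dict{Symbol,Any}(:foo => "a", :bar => true)
as_nt   = (foo = "a", bar = true)

# Hypothetical downstream accessors.
foo_from_dict(d) = d[:foo]
foo_from_nt(nt) = nt.foo

# What inference concludes about each accessor's return type.
dict_ret = Base.return_types(foo_from_dict, (typeof(as_dict),))
nt_ret   = Base.return_types(foo_from_nt, (typeof(as_nt),))
```

With the Dict, the value type is Any, so every later use of the result is dynamically dispatched. With the NamedTuple, the field type is part of the container's type, so the accessor is inferred as returning a String.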


#3

FWIW, this is exactly what GitHub.jl does.


#4

Feel like I got slightly close with the following:

function g3(::Val{fields}, ::Val{field_types}=Val(Tuple{(Array{field_to_type[f], 1} for f in fields)...})) where {fields, field_types}
    NamedTuple{fields, field_types}([Array{field_to_type[f]}(undef, 3) for f in fields])
end

But it obviously doesn’t work, because the default value of field_types is evaluated at run-time on every call.
I haven’t quite worked out why this doesn’t work even when the field types are explicitly passed, e.g.:

@code_warntype g3(Val((:foo, :bar, :dsa)), Val(Tuple{(Array{field_to_type[f], 1} for f in (:foo, :bar, :dsa))...}))
Body::Any
2 1 ─ %1 = $(Expr(:static_parameter, 1))::Core.Compiler.Const((:foo, :bar, :dsa), false)
  │   %2 = %new(Base.Generator{Tuple{Symbol,Symbol,Symbol},getfield(Main, Symbol("##98#100"))}, getfield(Main, Symbol("##98#100"))(), %1)::Base.Generator{Tuple{Symbol,Symbol,Symbol},getfield(Main, Symbol("##98#100"))}
  │   %3 = invoke Base.collect(%2::Base.Generator{Tuple{Symbol,Symbol,Symbol},getfield(Main, Symbol("##98#100"))})::Array{_1,1} where _1
  │   %4 = (NamedTuple{(:foo, :bar, :dsa),Tuple{Array{String,1},Array{Bool,1},Array{String,1}}})(%3)::Any
  └──      return %4
  └──      return %4

Looks like the NamedTuple type is picked up correctly, but it doesn’t like the indeterminate object that is passed into the NamedTuple constructor.
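
One reading of the output above is that the comprehension collects the values into an Array, which has a single element type and therefore cannot record a different type for each position, whereas a Tuple can. The sketch below only illustrates that difference with literal types; it does not use field_to_type and is not a fix for g3.

```julia
# A comprehension over mixed element types yields an Array with an abstract eltype.
vals_vec = [Vector{T}(undef, 3) for T in (String, Bool)]

# A Tuple keeps the type of every position.
vals_tup = (Vector{String}(undef, 3), Vector{Bool}(undef, 3))

# Building the NamedTuple from the Tuple carries those types through.
from_tuple = NamedTuple{(:foo, :bar)}(vals_tup)
```

Here eltype(vals_vec) is not a concrete type, while typeof(vals_tup) is Tuple{Vector{String}, Vector{Bool}}, which is exactly the second type parameter the NamedTuple needs.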


#5

Is the reason this is failing because it is hitting this method:

which could raise an exception if length(itr) != length(names), so the type is somehow un-inferrable?


#6

If you post a minimal desired example, that may help clarify the solution. Start with a 2-tuple of single-letter symbols and, rather than coding the steps, just write down the desired results, one step per line; where it “does some processing”, just put in mocked results. Maybe add a comment between steps if explaining what happens helps.