Defining structs inside functions during runtime (versioned structs design? value types?)

I need help (again) with macros. This drove me crazy today: although I found quite a few “related” posts, nothing achieved what I want, so I am starting to think that my design is horribly bad. I decided to ask here in the hope of getting some insights.

I am dealing with tons of dynamically generated structs and parser functions, and one of the main difficulties at the moment is keeping the different versions and their corresponding parser functions “in sync”. There are of course hard-coded solutions, but I am sure Julia can do much better.

What I came up with is something like this:

abstract type StreamedObject end

import Base: getindex

function Base.getindex(f::T, s::Symbol) where T <: StreamedObject
    haskey(f.data, s) && return f.data[s]
    error("type $(typeof(f)) has no key named $(s)")
end

Then I need to create types with different versions at runtime. For this, I first create basic “container” types with the desired names:

struct Foo{V} <: StreamedObject
    data::Dict{Symbol, Any}
end

struct Bar{V} <: StreamedObject
    data::Dict{Symbol, Any}
end

Then it’s quite straightforward to write the streamer implementation for each of the different versions:

function stream(::Type{T}) where T<:Foo{1}
    fields = Dict{Symbol, Any}()
    fields[:a] = 1
    fields[:b] = 2
    fields[:c] = 3
    T(fields)
end

function stream(::Type{T}) where T<:Foo{2}
    fields = Dict{Symbol, Any}()
    fields[:x] = 10
    fields[:y] = 20
    T(fields)
end

function stream(::Type{T}) where T<:Bar{23}
    fields = Dict{Symbol, Any}()
    fields[:q] = 10000
    T(fields)
end

This is how it works (it’s heavily oversimplified, but it shows the concept):

julia> stream(Foo{1})
Foo{1}(Dict{Symbol,Any}(:a => 1,:b => 2,:c => 3))

julia> stream(Bar{23})
Bar{23}(Dict{Symbol,Any}(:q => 10000))

julia> stream(Bar{23})[:q]
10000

julia> stream(Bar{5})
ERROR: MethodError: no method matching stream(::Type{Bar{5}})

So the procedure to define a new type is currently:

  1. Create the base type definition (this is the same for every type)

    struct TypeName{V} <: StreamedObject
        data::Dict{Symbol, Any}
    end
    
  2. Define a stream(::Type{TypeName{X}}) method for each version X

So far so good. I am aware that the performance is quite rubbish, but for now I don’t think this will be a big problem, since it only affects the high-level part.

My current problems

I totally failed to create structs from variables inside functions. In fact, the only way I managed to achieve anything was using eval, and I am still not sure whether that is the way to go.

Basically, I have a function that runs and collects information about struct definitions, and inside it I need to create those struct definitions and their parse methods so that they become available in the module.

Here is a dummy session which I am trying to get working:

function create_parsers()
    for parser_name in ["ParserA", "ParserB"]
        @initialise parser_name
    end
end

And here is the non-working macro @initialise (apologies, I have tried thousands of versions; I am pasting just one of them here):

macro initialise(streamer)
    streamer_name = Symbol(eval(streamer))
    quote
        struct $(esc(streamer_name)) <: StreamedObject
            data::Dict{Symbol, Any}
        end
    end
end

This works when I pass in the name of a global variable:

julia> name = "ParserZ"
"ParserZ"

julia> @initialise name

julia> fieldtypes(ParserZ)
(Dict{Symbol,Any},)

julia> fieldnames(ParserZ)
(:data,)

But of course it fails inside a function, because the macro is expanded (and its eval runs) when the function is defined, before parser_name exists:

julia> function create_parsers()
           for parser_name in ["ParserA", "ParserB"]
               @initialise parser_name
           end
       end
ERROR: LoadError: UndefVarError: parser_name not defined

Any ideas? I am really questioning my approach but I am sure Julia can provide some nice ways to solve this design problem easily.

Maybe we should take a step back, because from your question I can’t really figure out what you’re actually trying to accomplish. To start with, can you explain why you’re using a data::Dict field instead of actually adding fields to your types? That is, why not do:

struct Foo
  a::Int 
  b::Int 
  c::Int 
end

Your getindex method achieves essentially the same result, but it will be slower. And your stream method looks a lot like a constructor, so why not have an actual constructor that sets those values? Or use @kwdef to generate that constructor for you?

julia> Base.@kwdef struct foo
         x::Int = 1
         y::Int = 2
       end
foo

julia> foo()
foo(1, 2)

Again, using an actual struct with fields instead of a Dict will perform better and make your API more obvious.

As for your dynamic creation of structs, I’m also having trouble figuring out what the overall goal is. But as a general rule, using eval inside a macro body is never the right thing to do. The eval call happens when the macro is expanded, which is basically never what you want (and is why it doesn’t work for you inside a loop). If you actually need to eval something, just call eval (or a function that calls it); don’t use a macro. For example, there’s no reason you can’t do:

julia> for name in [:A, :B, :C]
         @eval struct $name
           a::Int
           b::Int
         end
       end

julia> A(1, 2)
A(1, 2)

julia> B(3, 4)
B(3, 4)

with no need to write a macro.
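In the original question the names arrive as strings and the types should be parametric containers, so the same approach can be wrapped in an ordinary function. A minimal sketch, assuming the StreamedObject / Foo{V} design from the question; define_containers is a hypothetical name:

```julia
abstract type StreamedObject end

# Define a container type `Name{V} <: StreamedObject` for every type name
# discovered at runtime (for example, read from a file). No macro is needed:
# @eval runs when the function is called, so local variables are available.
function define_containers(names)
    for name in names
        sym = Symbol(name)
        # Skip names that are already defined (e.g. seen in an earlier file).
        isdefined(@__MODULE__, sym) && continue
        @eval struct $(sym){V} <: StreamedObject
            data::Dict{Symbol, Any}
        end
    end
    return nothing
end

define_containers(["ParserA", "ParserB"])
```

Because @eval always evaluates in the global scope of the enclosing module, the new types become module-level globals rather than locals of define_containers.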

First, many thanks for your reply; it looks like @eval is what I was looking for.

Yes, so I tried to give a bit of context but may have oversimplified a bit :wink:

I am working on an I/O library (for the CERN ROOT format) where the data layout is defined inside the files themselves and the parser logic follows some specific rules. I have reached the point where I can read data with hardcoded types and parsers, and now I am trying to simplify the code and generate the logic at runtime. What I was struggling with, and what @eval solves, is defining structs in global scope from variables that are local to the function which defines them (the values are actually read from the file).

For example, this now works fine:

julia> function foo(names)
           for name in names
               @eval struct $name
                   a::Int
                   b::Int
               end
           end
       end
foo (generic function with 1 method)

julia> foo([:A, :B])

julia> A(1, 2)
A(1, 2)

The problem with Base.@kwdef is that I need to create different versions of a given struct. In my original example, you can see that there are different versions of Foo, defined in the file, with different field names and types. What I desperately need is some kind of system that gives me meaningful errors when a type (version) is not present or not implemented.

Since the parser logic is quite complex, for now I can, for example, provide hardcoded structs and parsers for Foo{1} (the first version of Foo), but I may read files where Foo{2} is needed. There I thought it would be nice to shift the logic to the struct-definition level, so that I get an error like no method matching parse!(::Type{Foo{2}}).
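To get an error that names the missing version (rather than the generic MethodError), one option is a catch-all method on the abstract type: dispatch prefers the more specific Foo{1} method when it exists and falls back otherwise. A sketch under that assumption; UnsupportedVersion is a hypothetical exception type, and stream stands in for the parse! function mentioned above:

```julia
abstract type StreamedObject end

struct Foo{V} <: StreamedObject
    data::Dict{Symbol, Any}
end

# Hypothetical exception type that records which type/version was requested.
struct UnsupportedVersion <: Exception
    requested::Type
end

Base.showerror(io::IO, e::UnsupportedVersion) =
    print(io, "UnsupportedVersion: no streamer implemented for ", e.requested)

# Implemented version: more specific than the fallback below, so it wins.
stream(::Type{Foo{1}}) = Foo{1}(Dict{Symbol, Any}(:a => 1, :b => 2, :c => 3))

# Fallback for any StreamedObject type/version without its own method.
stream(::Type{T}) where {T<:StreamedObject} = throw(UnsupportedVersion(T))
```

Calling stream(Foo{2}) then throws an UnsupportedVersion whose message mentions Foo{2}.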

That’s why I had the idea to create a container type for every type the first time I encounter it at runtime, and then use the version as a value-type parameter to dispatch to the actual parser method.

This means that when I open a new file, I read the data definitions inside it and create:

struct Foo{V} <: StreamedObject
    data::Dict{Symbol, Any}
end

Once that exists, I can go ahead and provide stream!(::Foo{23}) etc., which create instances of Foo with different fields and types, sometimes using really complicated logic. Field lengths and types depend on the values of previous fields, etc.

I hope it’s clear :confused:

But as I said, the design might not be the right choice; that’s why I am asking for expert help, and you have already provided nice hints! I was too fixated on a macro-based solution for defining the types…

Since I think @eval will work for me, I now have to figure out how to create the parser logic at runtime.
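One way to create the parser logic at runtime is to generate each version's method with @eval from the field descriptions read from the file. The sketch below assumes a heavily simplified layout description (a list of name => type pairs of fixed-size numbers, stored big-endian); define_streamer! is a hypothetical helper, and real ROOT streamers need much more (strings, arrays, nested objects, byte counts):

```julia
abstract type StreamedObject end

struct Foo{V} <: StreamedObject
    data::Dict{Symbol, Any}
end

# Hypothetical helper: generate `stream(io, ::Type{T{version}})` from a field
# layout given as `name => type` pairs (e.g. taken from a file's metadata).
# Only fixed-size bits types such as Int32 or Float64 are handled here.
function define_streamer!(T::UnionAll, version::Integer, fields)
    VT = T{Int(version)}   # concrete versioned type, e.g. Foo{3}
    # One read statement per field, in the order the fields appear in the stream.
    reads = [:(data[$(QuoteNode(name))] = ntoh(read(io, $ftype)))
             for (name, ftype) in fields]
    @eval function stream(io::IO, ::Type{$VT})
        data = Dict{Symbol, Any}()
        $(reads...)
        return $(VT)(data)
    end
end

define_streamer!(Foo, 3, [:a => Int32, :b => Float64])
```

Methods defined this way are only visible to code running in a newer world age: calling stream after define_streamer! at top level works, but calling it from inside the same function that defined it needs Base.invokelatest.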

Let me know if it’s now more clear what I am up to :wink:

I’m not sure I fully understand the problem, but rather than evaling new structs into existence, I think you could just make “anonymous structs”:

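struct AnonymousStruct{S, T}
    nt::T
    function AnonymousStruct(; kwargs...)
        nt = values(kwargs)  # the keyword arguments as a NamedTuple
        # gensym() gives every instance a fresh tag S, so each one is its own type
        new{gensym(), typeof(nt)}(nt)
    end
end

# forward field access (as.a) to the wrapped NamedTuple
function Base.getproperty(as::AnonymousStruct, s::Symbol)
    getproperty(getfield(as, :nt), s)
end

function Base.show(io::IO, as::AnonymousStruct{S}) where {S}
    print(io, S, getfield(as, :nt))
end
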
julia> t = AnonymousStruct(a = 1, b = 2)
##253(a = 1, b = 2)

julia> u = AnonymousStruct(a = "hi", b = "bye")
##254(a = "hi", b = "bye")

julia> t.a
1

julia> t.b
2

julia> u.a
"hi"

julia> u.b
"bye"

julia> f(::typeof(t)) = "got t!"
f (generic function with 1 method)

julia> f(::typeof(u)) = "got u!"
f (generic function with 2 methods)

julia> f(t)
"got t!"

julia> f(u)
"got u!"

These will work well within the local or global scope.

The main idea is to have structs with the same name but different “versions”. They differ in fields and types, and have to carry some kind of meta information holding their version number. That’s why I chose value types.

If they differ in fields, I think it’s probably best that they’re not the same type, but I guess you could just wrap a NamedTuple. Wrapping a Dict seems inefficient unless you have many, many fields.
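Wrapping a NamedTuple instead of a Dict could look roughly like this: the version stays a type parameter for dispatch, while a second type parameter holds the concrete NamedTuple type, so each version gets concretely typed fields. A sketch under those assumptions; the helper name version is chosen for illustration:

```julia
abstract type StreamedObject end

# V is the version; NT is the concrete NamedTuple type holding the fields.
struct Foo{V, NT<:NamedTuple} <: StreamedObject
    data::NT
end

# Allow writing Foo{1}(nt) and let NT be inferred from the argument.
Foo{V}(data::NT) where {V, NT<:NamedTuple} = Foo{V, NT}(data)

# Forward field access (f.a) to the wrapped NamedTuple.
Base.getproperty(f::Foo, s::Symbol) = getproperty(getfield(f, :data), s)
Base.propertynames(f::Foo) = propertynames(getfield(f, :data))

# Recover the version number from the type.
version(::Foo{V}) where {V} = V

stream(::Type{Foo{1}}) = Foo{1}((a = 1, b = 2, c = 3))
stream(::Type{Foo{2}}) = Foo{2}((x = 10, y = 20))
```

With this layout, f.a is inferred from the NamedTuple type instead of being Any, at the cost of one concrete type per field layout.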

They are technically not the same type, but they represent the same thing. The actual differences are usually quite subtle, e.g. one more field added, or one field being set differently from the I/O data.

For example, at runtime I find that the data container holds data of type Foo with version 5. Now I need to find the appropriate parser, which implements how the fields are read, so I thought it was a good idea to dispatch on the value type Foo{5}.
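Dispatching on a version number that is only known at runtime works by building the type from the value, e.g. Foo{v}. One pitfall worth checking: type parameters compare by value and type, so a version read as a UInt16 gives Foo{0x0005}, which is a different type from Foo{5}. A sketch that normalises the version to Int first; stream_by_version is a hypothetical helper:

```julia
abstract type StreamedObject end

struct Foo{V} <: StreamedObject
    data::Dict{Symbol, Any}
end

stream(::Type{Foo{5}}) = Foo{5}(Dict{Symbol, Any}(:x => 1))

# Hypothetical helper: `T` is the container type (e.g. Foo) and `version`
# is the raw version number read from the file at runtime.
function stream_by_version(T::UnionAll, version::Integer)
    VT = T{Int(version)}   # normalise, so 0x0005 and 5 both give Foo{5}
    hasmethod(stream, Tuple{Type{VT}}) ||
        error("no streamer implemented for $VT")
    return stream(VT)      # dynamic dispatch on the concrete version type
end
```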

Off-topic: I can feel the pain just reading this… backwards compatibility with 10+ years of ROOT file standards is a nightmare…

I know nothing about the ROOT format, or how complex the type/parser descriptor language is, but I wonder whether the possible combinations could be encoded in a type parameter, with the code then implemented using @generated functions. Technically the code would be generated at compile time, but it would work as if generated at runtime.
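A rough sketch of that suggestion, under heavy assumptions: the field layout (names and fixed-size bits types) is encoded in the type parameters of a hypothetical Layout type, and a @generated function emits straight-line reading code for each distinct layout. Values are assumed to be stored big-endian; Layout and readfields are illustrative names:

```julia
# Hypothetical marker type: field names and types live in the type parameters.
struct Layout{Names, Types<:Tuple} end

# The generator body runs once per concrete Layout. It only sees types and
# returns an expression that reads each field in order from `io`.
@generated function readfields(io::IO, ::Layout{Names, Types}) where {Names, Types}
    reads = [:(ntoh(read(io, $(fieldtype(Types, i))))) for i in 1:length(Names)]
    return :(NamedTuple{$Names}(($(reads...),)))
end
```

The generator body only sees the argument types, not their values, which is one of the restrictions on @generated functions; each distinct Layout is compiled separately the first time it is used.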

Yes, I thought about using @generated functions, but there is quite a lot going on during parsing, so I am not sure whether I would hit a wall at some point, since generated functions only allow a restricted subset of features.

Now I am really stuck with defining the actual parser functions at runtime. If you want a rough idea of how the dynamic parser generation is done in Python, see uproot3/rootio.py (scikit-hep/uproot3 on GitHub, commit eb2ae1ffe6fb2c2ce8cb7cbdc0919d5b51c0ff0f), but I warn you :see_no_evil:

In case you are interested, here is a more specific example: this is what I extract, using bootstrapped structs (some basic definitions which “never” change in ROOT), for a type called TNamed with version 1:

julia> ROOTIO.streamerfor(f, "TNamed")
ROOTIO.StreamerInfo(ROOTIO.TStreamerInfo("TNamed", "", 0xdfb74a3c, 1, ROOTIO.TObjArray("", 0, Any[ROOTIO.TStreamerBase
  version: Int64 4
  fOffset: Int64 0
  fName: String "TObject"
  fTitle: String "Basic ROOT object"
  fType: Int32 66
  fSize: Int32 0
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, -1877229523, 0, 0, 0]
  fTypeName: String "BASE"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fBaseVersion: Int32 1
, ROOTIO.TStreamerString
  version: Int64 4
  fOffset: Int64 0
  fName: String "fName"
  fTitle: String "object identifier"
  fType: Int32 65
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "TString"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
, ROOTIO.TStreamerString
  version: Int64 4
  fOffset: Int64 0
  fName: String "fTitle"
  fTitle: String "object title"
  fType: Int32 65
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "TString"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
])), Set(Any["TObject"]))

It consists of three substructures, TObject, fName and fTitle, where TObject has two fields called fBits and fUniqueID (this information is in a similar structure, which I also read using bootstrapped types):

julia> ROOTIO.streamerfor(f, "TObject")
ROOTIO.StreamerInfo(ROOTIO.TStreamerInfo("TObject", "", 0x901bc02d, 1, ROOTIO.TObjArray("", 0, ROOTIO.TStreamerBasicType[ROOTIO.TStreamerBasicType(4, 0, "fUniqueID", "object unique identifier", 13, 4, 0, 0, Int32[0, 0, 0, 0, 0], "UInt_t", 0.0, 0.0, 0.0), ROOTIO.TStreamerBasicType(4, 0, "fBits", "bit field status word", 15, 4, 0, 0, Int32[0, 0, 0, 0, 0], "UInt_t", 0.0, 0.0, 0.0)])), Set(Any[]))

The last missing piece is TString, which is another bootstrapped type.

So the actual TNamed{1} is:

struct TNamed{1}   # invalid syntax but just to show it's version 1
    fUniqueID::UInt_t  # inferred from TObject{1} (streamed first)
    fBits::UInt_t  # inferred from TObject{1}
    fName::TString{4}
    fTitle::TString{4}
end

It’s a lot of fun…

Edit: fixed the versions; I had accidentally messed them up. Again, it’s a lot of fun :wink:
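Since the TNamed{1} pseudo-struct above is not valid Julia, one possible alternative to the Foo{V} container is to generate a concrete struct per class name and version with @eval, and keep a registry keyed by (name, version). A sketch only: the naming scheme TNamed_v1, the registry, and mapping UInt_t to UInt32 and TString to String are assumptions, and the field order follows the TObject streamer dump above (fUniqueID before fBits):

```julia
# Registry from (class name, version) to the generated concrete type.
const STREAMED_TYPES = Dict{Tuple{String, Int}, DataType}()

# Hypothetical helper: `fields` is a list of `name => type` pairs in stream order.
function define_versioned_struct(name::AbstractString, version::Integer, fields)
    typename = Symbol(name, "_v", version)              # e.g. :TNamed_v1
    fielddefs = [:($fname::$ftype) for (fname, ftype) in fields]
    @eval struct $typename
        $(fielddefs...)
    end
    T = @eval $typename   # a separate eval runs in the latest world and sees the new type
    STREAMED_TYPES[(String(name), Int(version))] = T
    return T
end

define_versioned_struct("TNamed", 1, [
    :fUniqueID => UInt32,   # UInt_t, from TObject version 1
    :fBits     => UInt32,   # UInt_t, from TObject version 1
    :fName     => String,   # TString
    :fTitle    => String,   # TString
])
```

Further versions (TNamed_v2, …) would be separate entries in the registry, each with its own fields.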
