Julia code and struct generation

I would like to generate a bunch of types for each possible
file data format. There is a header with a nibble, X, that
determines which fields are present.

    struct A_X <: B          # X == 0x0 .. 0xF
        vall::Int64          # in every struct
        v1::Int16            # if bit 0 == 1
        v2::Int16            # if bit 1 == 1
        v3::Int16            # if bit 2 == 1
        v4a::Int16           # if bit 3 == 1
        v4b::Vector{Int16}   # length(v4b) == v4a
    end

This should generate structs named A_0x0, A_0x1, …, A_0xF, with
the fields present taken from the above template. For example:

    struct A_0xD <: B
        vall::Int64
        v1::Int16            # if bit 0 == 1
        # v2::Int16          # if bit 1 == 1
        v3::Int16            # if bit 2 == 1
        v4a::Int16           # if bit 3 == 1
        v4b::Vector{Int16}   # length(v4b) == v4a
    end

The 15 other struct declarations follow similarly.

My first attempt was to make a function to produce a string
that is the desired julia code. I could not figure out how
to use it without something ugly like writing a temporary
file with the expanded declarations and include()-ing it.

It seems that I need to be generating the code at some lower
level which seems to amount to macro programming.

Am I understanding things correctly? In no particular order,
here are some problems I have:

  • Meta.parse() only seems to work for complete expressions.
    How am I supposed to build up the result and from which pieces?

  • Is there a way to convert my desired julia struct definition
    into the exact AST (or whatever) that I would need to use to
    produce the needed struct declaration?

  • Is there some sort of comprehensive documentation/codex that
    covers this in more detail? I’ve read the manual under
    code generation and things seem pretty low on specifics.
    I used dump() to look at the struct and the output did
    not obviously map onto the 7-arg form that was suggested
    elsewhere in the manual.

Thanks!

You could use parametric types and give the fields that are not present the type Nothing. That way they take up no memory, and you can always check without overhead whether a field isnothing. It also avoids the trouble of generating code.

You use Expr to build up expressions piece by piece, or the higher-level equivalent of quote expressions. You don’t need macros either (those are for transforming user syntax into different expressions), only eval on the generated expression.

Never use strings for metaprogramming. Strings are brittle.

That being said, you should rarely need to use metaprogramming; it’s certainly not the first tool you should reach for. Other alternatives in your case include parameterized types with Nothing for unused fields, as @SteffenPL suggests above, or named tuples which are essentially anonymous struct types that can be generated as needed.

This option is totally fine if all your struct definitions are set in stone. It’s simple to set up and easy to debug.

I’m leaning towards this approach with a build.jl file to generate
the included file defining all the struct types and their read
routines.

Maybe I’ll implement this with the AST version at some point.

Thanks, but I would like to avoid cruft in the data types, such
as fields that are not present. I need high-performance
read routines, and this would seem to involve some sort of
run-time processing.

Thanks for the ideas.

There is no runtime processing involved here, so the code might even be easier to write.

For example, let’s say we simplify the situation to two fields.

struct AB
    a::Int 
    b::Int 
end

# manual or generated types with only one field
struct A
    a::Int 
end

# parametric type
struct MayAB{TA, TB}
    a::TA 
    b::TB 
end

Now, let’s say you have a method that needs to figure out whether it is dealing with A or AB. You can use any of the following:

# manual types  (notice: code needs to be generated somehow!!!)
f(x::AB) = x.a * x.b 
f(x::A) = x.a
# ... generate all other options needed

# parametric types
f(x::MayAB) = isnothing(x.b) ? x.a : x.a*x.b

# named tuples
x1 = (a = 1.0,)
x2 = (a = 1.0, b = 2.0)

f(x) = hasproperty(x, :b) ? x.a*x.b : x.a

Now, all these variants do the same thing once you call them in a type-stable manner: Julia can decide what to do from the data types alone, and it removes every branch that can be resolved from the types. That is, isnothing or hasproperty is evaluated at compile time, which is probably just as good as having N separate methods that need to be compiled in the manual case…

Suppose you are worried that Nothing takes up space. Try About.jl and see for yourself that there is no extra space:

using About 

julia> about(MayAB(1.0,2))
MayAB{Float64, Int64} (<: Any), occupies 16B.
 a::Float64 8B 00111111111100000000 … 0000000000000000000 1.0
 b::Int64   8B 00000000000000000000 … 0000000000000000010 2

 ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
                 8B                                 8B                 

julia> about(MayAB(1.0,nothing))
MayAB{Float64, Nothing} (<: Any), occupies 8B.
 a::Float64 8B 00111111111100000000 … 0000000000000000000 1.0

 ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
                                   8B                                  

 * = Pointer (8B)
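
If you would rather not install a package, the same check can be sketched with Base alone. This assumes the MayAB definition from above, repeated here so the snippet stands on its own:

```julia
# Same parametric struct as above, repeated so this snippet is self-contained.
struct MayAB{TA, TB}
    a::TA
    b::TB
end

# Nothing is a singleton type with no data, so a field of that type needs no storage.
sizeof(Nothing)                    # 0
sizeof(MayAB{Float64, Int64})      # 16, matching the first about() call
sizeof(MayAB{Float64, Nothing})    # 8, matching the second about() call
```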

It’s nice that you have a fixed 2^4=16 types, but that’s still many types to define, dispatch methods over, and compile. That’s not an ideal situation, and it’s worth considering refactoring.

Asking for metaprogramming tips in part to name your types like A_0x0, A_0x1, etc, is usually an indication that the types shouldn’t be so different, in which case the previous advice to use parametric types with the same fields is warranted. You’d still have to compile for each concrete parametric type, but you can maintain just 1 definition. You don’t have to store nothing, you could dispatch methods over the concrete types as you would for separately defined types and just ignore the fields that don’t semantically mean anything; a method for the abstract A type could also skip the fields depending on the X-nibble. Removing select fields in separate struct definitions doesn’t necessarily save memory because of architectural structure alignment.
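
A minimal sketch of dispatching over the concrete parametric types, reusing the two-field MayAB from the earlier reply as a stand-in for the real format (the function f is only illustrative):

```julia
# One parametric definition covers every combination of present/absent fields.
struct MayAB{TA, TB}
    a::TA
    b::TB
end

# Generic method: both fields carry data.
f(x::MayAB) = x.a * x.b

# More specific method, chosen by dispatch when the second field is absent.
f(x::MayAB{TA, Nothing}) where {TA} = x.a

f(MayAB(2.0, 3))        # uses the generic method
f(MayAB(2.0, nothing))  # uses the Nothing-specific method
```

The choice between the two methods is made from the type alone, so there is still only one struct definition to maintain.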

Depending on what you need to do with the instances, you might not even need separate types at all, like how a complex hierarchy of user concepts in an application can actually be implemented as one type with one underlying behavior. For example, if you do something as simple as iterating over the existing v_ fields and their particular operations, you could just store the X-nibble per instance and use it to skip the dummy fields to the existing fields.

That said, if maintaining an eval-generated set of types is the intuitive and working solution, then hypothetical refactoring and learning to deal with new extraneous details is a tough sell, especially if the types have completely unrelated methods. I don’t do much metaprogramming, and it is not trivial to mutate expressions or deal with macro hygiene, but fundamentally you either mutate the Arrays inside Exprs or interpolate Exprs into each other in the quoted form that looks more like source code. I quickly checked that splatting interpolation works for annotated struct fields:

julia> fields = [:(v::Int), :(v1::String)];

julia> :(struct X
           $(fields...)
       end)
:(struct X
      #= REPL[2]:2 =#
      v::Int
      v1::String
  end)

julia> push!(fields, :(v2::Bool));

julia> :(struct X
           $(fields...)
       end)
:(struct X
      #= REPL[4]:2 =#
      v::Int
      v1::String
      v2::Bool
  end)

julia> :(struct X
           $(fields...)
       end) |> eval

julia> fieldnames(X)
(:v, :v1, :v2)

julia> X(1, "a", true)
X(1, "a", true)

Process the type name’s symbol and the fields over the 16 X-nibble values in an eval/@eval loop, and you have your 16 types. If you find yourself defining the methods in such a loop with very little change, however, do consider type refactoring.
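
As a rough sketch of such a loop, assuming the field layout from the original post: the abstract type B is declared here only as a stand-in for the real supertype, and optional_fields is a name made up for illustration.

```julia
abstract type B end   # stand-in for the supertype B in the original post

# Optional fields indexed by bit position + 1 (bit 3 contributes two fields).
optional_fields = [
    [:(v1::Int16)],
    [:(v2::Int16)],
    [:(v3::Int16)],
    [:(v4a::Int16), :(v4b::Vector{Int16})],
]

for x in 0x0:0xF
    # Build the type name, e.g. :A_0xD for x == 0x0d.
    name = Symbol("A_0x", uppercase(string(x, base = 16)))

    # vall is always present; the rest depend on the bits of x.
    fields = [:(vall::Int64)]
    for bit in 0:3
        if (x >> bit) & 0x1 == 0x1
            append!(fields, optional_fields[bit + 1])
        end
    end

    # Interpolate the name and splat the fields into a struct definition.
    @eval struct $name <: B
        $(fields...)
    end
end

fieldnames(A_0xD)   # (:vall, :v1, :v3, :v4a, :v4b)
```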

But to be fair, the code above creates kind of the same type as one could get without any metaprogramming:

fields = (:v, :v1, :v2)
types = (Int, String, Bool)
NamedTuple{fields, Tuple{types...}}

(or just created on the fly without any gymnastics.)
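
To make that concrete, here is a small sketch showing that the explicitly constructed type and the literal syntax end up as the same type (the names T, x and y are only for illustration):

```julia
fields = (:v, :v1, :v2)
types = (Int, String, Bool)

# Build the NamedTuple type explicitly, then an instance from a plain tuple.
T = NamedTuple{fields, Tuple{types...}}
x = T((1, "a", true))

# The same thing created on the fly with literal syntax.
y = (v = 1, v1 = "a", v2 = true)

typeof(y) === T   # true
x.v1              # "a"
```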

I don’t think the task at hand justifies any code generation. (Anyway, it’s for sure fun and a good opportunity to practice, I don’t want to stop that.)

Thanks to all the responders! I’ve marked a solution because it
specifically addresses what I wanted to do.

What I learned are some other ways to do things differently that
may be better as a next iteration or for other projects. The “Ah-ha!”
was that I could use a vector/tuple/splat to produce the needed
fields without resorting to direct AST/Expr manipulation:

julia> fields = [:(v::Int), :(v1::String)];

julia> :(struct X
           $(fields...)
       end)
:(struct X
      #= REPL[2]:2 =#
      v::Int
      v1::String
  end)

I know what you’re getting at here, but instantiating a quoted form is technically the same thing as instantiating an unquoted Expr plus extra work for the parser:

julia> dump(:(struct X end)) # printing the Expr structure
Expr
  head: Symbol struct
  args: Array{Any}((3,))
    1: Bool false
    2: Symbol X
    3: Expr
      head: Symbol block
      args: Array{Any}((1,))
        1: LineNumberNode
          line: Int64 1
          file: Symbol REPL[11]

julia> Expr(:struct, false, :X, Expr(:block)) # same result, can ignore LineNumberNode
:(struct X
  end)

julia> Expr(:struct, false, :X, Expr(:block, fields...)) # splatting the fields
:(struct X
      v::Int
      v1::String
      v2::Bool
  end)

Instantiating a quoted form is just preferred because resembling source code is more readable, hence the printing, and $-interpolation is written the same way as input code for some macros like @eval. However, a lot of Expr processing is more easily done as mutation instead of instantiation, so knowing Expr structure is still important for metaprogramming.
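
A small sketch of the mutation style, relying on the Expr layout shown by dump above (args[2] is the name, args[3] is the :block of fields). The names Placeholder and Header are hypothetical:

```julia
# Start from an empty struct definition and grow it by mutation.
ex = :(struct Placeholder end)

ex.args[2] = :Header               # args[2] holds the type name
body = ex.args[3]                  # args[3] is the :block holding the fields
push!(body.args, :(vall::Int64))
push!(body.args, :(v1::Int16))

eval(ex)                           # defines Header at top level

fieldnames(Header)                 # (:vall, :v1)
```

Note that with a supertype, as in struct X <: B, args[2] is itself an Expr with head :<: rather than a plain Symbol, so renaming would need to reach one level deeper.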

Thanks! I edited my summary to make it clear what I
was talking about—exactly what you mention here.
Sorry about the confusion.