Efficient reflection on structs

Does that mean I’m not allowed to call implementations of functions that accept T with a Wrapper{T} as an argument?

It’s not actually a question of unrolling, but rather a question of the fieldnames being known at compile time. For some reason, the compiler devs have decided that fieldnames shouldn’t be treated as a pure function by the compiler, even though it’s information that cannot be changed.

So unfortunately, you will need to use a generated function for this if you want efficient behaviour. Just obtaining the fieldnames at run time is slow:

julia> struct Foo
           f1::Int
           f2::Int
           f4::Int
       end

julia> @btime fieldnames($Foo)
  389.599 ns (1 allocation: 32 bytes)
(:f1, :f2, :f4)

A change to this was proposed in the pull request “Pure fieldnames” by bramtayl (JuliaLang/julia #30152), but one point that was raised against it was:

We discourage usage of this API, and generally state that the fields of a type are an internal/private property in most cases and that users should define a function-based API (incl. accessors, getindex/keys, getproperty/propertynames, etc.)

Personally, I think this should be revisited, but in the meantime you’ll either need a generated function to make this work, or to change your approach.
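To make the generated-function route concrete, here is a minimal sketch. The name sum_fields is made up for illustration and is not from the original question. The body runs once per concrete type, so the call to fieldnames happens at compile time and the emitted code is just a chain of getfield calls with literal field names.

```julia
struct Foo
    f1::Int
    f2::Int
    f4::Int
end

# Hypothetical helper: adds up all fields of a struct.
# Inside the generator, `x` stands for the argument's *type*, and `T` is that type,
# so fieldnames(T) is evaluated while the method body is being generated.
@generated function sum_fields(x::T) where {T}
    # One `getfield(x, :name)` expression per field, with the name as a literal.
    terms = [:(getfield(x, $(QuoteNode(name)))) for name in fieldnames(T)]
    # Splice them into a single call: +(getfield(x, :f1), getfield(x, :f2), ...)
    return :(+($(terms...)))
end

sum_fields(Foo(1, 2, 4))  # 7
```

Note that this sketch assumes the struct has at least one field, since +() with no arguments is not defined.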


No, this is not possible. However, there are a few classes of circumstances where people desire this behaviour: Is Julia's way of OOP superior to C++/Python? Why Julia doesn't use class-based OOP? - #137 by Mason

One can kinda hack their way into this sort of behaviour using IRTools.jl, but I wouldn’t recommend it.

But why is it slow when I hardcode the fields too then?

I.e:

myCopyInner(target, source, :a, :b)

It’s not even using fieldtypes. It doesn’t really matter, since code generation worked very well for me. I’m just trying to understand the compilation process.

I am probably missing the point here again (don’t be afraid to just ignore me 🙂 I am aware I am not up to the level of the discussion here). But maybe something along the lines of the FieldVector implementation of StaticArrays is what you are searching for?

It wraps a user-defined immutable struct to a specific type for which all functions are defined, and in an efficient manner, because the memory layouts are the same.

That could be helpful. The problem is that my structs also contain (sometimes very large) static arrays, and the compilation time for StaticArrays is prohibitive 😞. I know that they are doing code generation, but I don’t need all these features.

At the end of the day I also need to store an n-dimensional array of these while being able to control the memory layout, to ensure I have proper alignment and coalescing and get maximum memory bandwidth on CUDA. So I assumed there would not be something already implemented for this particular use case.


isit should be just as capable as isa for your purposes,
and [at least very nearly] equally fast.

But it wouldn’t allow dispatching functions on Wrapper{T}, would it?

You’re right, I spoke too soon. I found a couple of small issues in your implementation above, but I was unable to get anywhere near the performance of a generated function here using tuple recursion:

struct Wrapper{T, F}
    data::T
end

function wrap(data::T) where T
    fnames = fieldnames(T)
    sizes = map(x -> sizeof(fieldtype(T, x)), fnames)
    F = NamedTuple{fnames}(sizes)
    Wrapper{T, F}(data)
end

function easyCreate(T)
    v = T(map(x -> zero(fieldtype(T, x)), fieldnames(T))...)
    wrap(v)
end

function Base.getproperty(wrapper::Wrapper, s::Symbol)
    getproperty(getfield(wrapper, :data), s)
end

function Base.setproperty!(wrapper::Wrapper, s::Symbol, v)
    setproperty!(getfield(wrapper, :data), s, v)
end

#---------------------------------------------

function copyto_loop!(target::Wrapper{T, F}, source::Wrapper{T, F}) where {T, F}
    for property in keys(F)
        setproperty!(target, property, getproperty(source, property))
    end
    target
end 

#---------------------------------------------

function copyto_recursive!(target::Wrapper{T, F}, source::Wrapper{T, F}) where {T, F}
    copyto_recursive!(target, source, keys(F))
    target
end

@inline function copyto_recursive!(target::Wrapper{T, F}, source::Wrapper{T, F}, 
                                   properties::NTuple{N, Symbol}) where {T, F, N}
    N == 0 && return nothing
    property = first(properties)
    setproperty!(target, property, getproperty(source, property))
    copyto_recursive!(target, source, Base.tail(properties))
end

#---------------------------------------------

@generated function copyto_generated!(target::Wrapper{T, F}, source::Wrapper{T, F}) where {T, F}
    qproperties = QuoteNode.(keys(F))
    Expr(:block, 
         (:(setproperty!(target, $property, getproperty(source, $property))) for property in qproperties)..., 
         :target)
end
using BenchmarkTools

mutable struct Data2
    a::Int
    b::Float32
end

data1 = easyCreate(Data)  # assumes `Data` is already defined; its definition is not shown in this post
data2 = easyCreate(Data2)

@btime ($data2.a = $data2.a; $data2.b = $data2.b; )
@btime copyto_loop!($data2, $data2)
@btime copyto_recursive!($data2, $data2)
@btime copyto_generated!($data2, $data2);

#+RESULTS:
   2.069 ns (0 allocations: 0 bytes)
   111.080 ns (2 allocations: 32 bytes)
   75.495 ns (1 allocation: 16 bytes)
   1.869 ns (0 allocations: 0 bytes)

It seems that the optimizer is just not willing to do the amount of inlining or analysis required to make the loop or recursive versions fast, and we just have to reach in with a generated function and manually tell it what code to generate.

This is a little disappointing because fundamentally, the required information is present in the type signature of Wrapper as you pointed out.
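As a small illustration of what “present in the type signature” means here, the sketch below uses a made-up Tagged type (not the Wrapper from above) that carries a NamedTuple value as a type parameter. This works because a NamedTuple of integers is an isbits value, which Julia allows as a type parameter.

```julia
# Hypothetical type for illustration: F is a *value* (a NamedTuple) stored in the type.
struct Tagged{F} end

# Recover the NamedTuple from the type parameter via dispatch.
field_sizes(::Tagged{F}) where {F} = F

t = Tagged{(a = 8, b = 4)}()

keys(field_sizes(t))   # (:a, :b)
field_sizes(t).b       # 4
```

So the names are available from the type alone; the open question in the thread is only whether the compiler propagates them as constants through the loop or the recursion.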

@tim.holy is a big advocate for using Tuple recursion instead of generated functions, so I’ll ping him on the off chance he has the time to look and sees what’s wrong with copyto_recursive!.


Whoa, thanks for spending all this time on this!

At least I’m glad to know I wasn’t missing something completely obvious. It’s a bit sad, though. With fieldnames that could be reasonable, but the fact that it’s not unrolling a loop over a 2-tuple is definitely disappointing. On the other hand, I love how macros are implemented and how easy they are to use in Julia, so I guess it makes up for it 🙂

Happy to help. I like delving into these corner cases.

I believe the lack of unrolling is a symptom of the problem, not the cause of the problem. Something is blocking the constant prop or another optimization pass and it’s not clear what.
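One way to see why constant propagation matters here is to compare what inference concludes for a field access with a literal name versus a name that is only known at run time. This is a sketch with made-up names (Sample, get_a, get_by_name); Base.return_types is an unexported reflection helper, so treat the exact output as version-dependent.

```julia
mutable struct Sample
    a::Int
    b::Float32
end

# Field name is a literal constant: inference knows exactly which field is read.
get_a(p::Sample) = getfield(p, :a)

# Field name is a run-time value: inference can only say "one of the field types".
get_by_name(p::Sample, name::Symbol) = getfield(p, name)

Base.return_types(get_a, (Sample,))                # a single concrete type, Int
Base.return_types(get_by_name, (Sample, Symbol))   # wider than Int, typically a Union of the field types
```

If the symbol does not reach getproperty as a constant inside the loop or the recursion, each access is inferred as the wider type, which would be consistent with the allocations seen in the benchmarks above.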


Thank you for bringing this to our attention.

I didn’t take a close look, but we may not const-prop the values.

You could try doing the recursion in the type domain, i.e. over a tuple type such as Tuple{:a, :b, :c}, using Base.tuple_type_head and Base.tuple_type_tail.


Yep, that did the trick! This is a neat tool to know about, thanks.

Keys(::NamedTuple{K}) where {K} = Tuple{K...}

function copyto_type_recursive!(target::Wrapper{T, F}, source::Wrapper{T, F}) where {T, F}
    copyto_type_recursive!(target, source, Keys(F))
    target
end

@inline function copyto_type_recursive!(target::Wrapper{T, F}, source::Wrapper{T, F}, 
                                   ::Type{Tup}) where {T, F, Tup <: Tuple}
    Tup === Tuple{} && return nothing
    property = Base.tuple_type_head(Tup)
    setproperty!(target, property, getproperty(source, property))
    copyto_type_recursive!(target, source, Base.tuple_type_tail(Tup))
end
@btime copyto_loop!($data2, $data2)
@btime copyto_recursive!($data2, $data2)
@btime copyto_type_recursive!($data2, $data2)
@btime copyto_generated!($data2, $data2);

#+RESULTS:
   124.987 ns (2 allocations: 32 bytes)
   74.324 ns (1 allocation: 16 bytes)
   1.860 ns (0 allocations: 0 bytes)
   1.820 ns (0 allocations: 0 bytes)

Must just have been too much trampolining between types and values I guess?


That looks cool! I’m not 100% sure this is more readable than the macro, though. But definitely good to know.
