Julep: Efficient Hierarchical Mutable Data

The key question about this proposal to me (which I don’t see addressed, but may have missed), is what happens when a reference to p outlives any references to w? In other words, the GC is doing its thing and there are no references to w. How do you ensure that w doesn’t get reclaimed? There’s a bunch of talk about lifetimes and their enforcement, but presumably it’s clear that Julia doesn’t have any lifetime enforcement mechanism (nor even a compile-time correctness checking phase, when such a mechanism would run)—objects live as long as they live and you have to handle it.

4 Likes

For just the field, yes, but that’s what the larger mutable struct can accomplish. If we can’t share the seats or wheels, we can at least share the car. If there is no car and we really are trying to share a wheel and its changes, then Ref can help.

Which means that there are 2 references that can’t share one of their objects. With other instances or types with inline fields, that’s more references that can’t share a mutable object. That goes against mutable types’ intention and breaks the guarantee that we can assign anything to a reference. For example, say we instantiate w1 = Whole(Part(), Part()), a whole with 2 different parts, we’re not trying to share things there. Then we try Whole(w1.part2, w1.part1); this will fail for an unprecedented reason. Granted, we have something similar with const disallowing reassignments; however, they do allow any assignment the first time.

My proposal above would look like

struct FieldRef{T}
owner::Any #for GC purposes only
ptr::Ptr{T}
end

and would rely on the hypothetical new Core.Intrinsics.interior_pointerset(owner, ptr, value) thing that writes value to ptr, while applying the GC write-barrier as if value had been written to a field of owner.

That would be relevant for

struct Inner
  x::Vector{Int}
end

mutable struct Owner
  part::Inner
end

reset!(ref::FieldRef{Inner}) = ref[] = Inner(Int[])

owner = Owner(Inner(Int[]))
reset!(getfieldref(owner, :inner)) #needs a GC barrier if owner is old-gen!
1 Like

That goes against mutable types’ intention and breaks the guarantee that we can assign anything to a reference.

Indeed, that is why inline would be a stronger const. The new rules would be:

  • You can assign anything to a reference …
  • … except if it’s const then you can only assign once during construction
  • … except if it’s inline then you can only assign once during construction and this assignment is a newly created object

I had some time this morning to try my hand at a macro-based solution. I believe that it’s 90% there (with the remaining 90% conceptually straight-forward), and mostly in line with the original proposal.

julia> abstract type AbstractElephant end

julia> struct Elephant <: AbstractElephant
           hunger::Float64
       end

julia> @with_inline_types mutable struct Forest
           elephant::@inline(Elephant)   # @inline elephant::Elephant also possible, but less
                                         # MacroTools-friendly
           normal_field::String
       end
Forest

julia> @with_inline_types mutable struct Biome
           forest::@inline(Forest)
       end
Biome

julia> f = Forest(Elephant(2), "Zoulando")   # TODO: overload Base.show
Forest(2.0, "Zoulando")

julia> dump(f)
Forest
  elephant__hunger: Float64 2.0
  normal_field: String "Zoulando"

julia> f.elephant
ElephantViewFromForest(Forest(30.0, "Zoulando"))

julia> f.elephant.hunger = 30
30

julia> f.elephant.hunger
30.0

julia> b = Biome(Forest(Elephant(50), "Tasmania"))
Biome(50.0, "Tasmania")

julia> fieldnames(Biome)
(:forest__elephant__hunger, :forest__normal_field)

Nested inline structs work out for construction, but field access is broken ATM. AFAICT views don’t compose, and we’d have to define ElephantViewFromForestFromBiome. Not very difficult, but a bit of a drag.

@btime looks good (no allocation) for access. Biome(Forest(Elephant(50), "Tasmania")) does create a needless Forest object. It’d need some rethinking to avoid that.

Parametric structs would be significant work.

using MacroTools
using MacroTools: @qq
using Base.Meta: quot

""" Defined for types, not objects. """
static_propertynames(::Type{T}) where T = fieldnames(T)

macro with_inline_types(def)
    # Example with respect to this
    #     @with_inline_types struct Forest
    #         elephant::@inline(Elephant)
    #         normal_field::String
    #     end

    di = MacroTools.splitstructdef(def)
    di2 = copy(di)
    name = di[:name]  # Forest
    inline_code = Expr[]  # code to define the type views
    di2[:fields] = []        # [(normal_field, Int)]
    inlined_type_fields = Pair[]  # [:elephant=>ElephatViewFromForest]
    prop_names = Symbol[]   # [:elephant, :normal_field]
    constr_accessors = []

    for (f, type) in di[:fields]
        push!(prop_names, f)
        if @capture(type, @inline(Texpr_))
            T = eval(Texpr)  # having to `eval` is unfortunate. there are other ways worth
                             # investigating, such as `getfield(@__MODULE__, Texpr)`, that would
                             # have different trade-offs
            s_fields = [Symbol(f, "__", sub_f) for sub_f in fieldnames(T)] # [:elephant__hunger]
            view_name = Symbol(Texpr, "ViewFrom", name)  # technically should also append __f
            type_view_code = quote
                struct $view_name <: $(supertype(T))  # ElephantViewFromForest <: AbstractElephant
                    obj::$name
                end
                function Base.getproperty(view::$view_name, p::Symbol)
                    # if p == :hunger
                    #     return getfield(view, :obj).elephant__hunger
                    # end
                    # error("Field not found")
                    $([:(if p === $(quot(fieldname))
                         return getfield(view, :obj).$s_fieldname
                         end)
                       for (fieldname, s_fieldname) in zip(static_propertynames(T), s_fields)]...)
                    error("type ", $Texpr, " has no property ", p)
                end
                Base.propertynames(view::$view_name) = $(fieldnames(T))
                function Base.setproperty!(view::$view_name, p::Symbol, x)
                    $([:(if p === $(quot(fieldname))
                         return getfield(view, :obj).$s_fieldname = x
                         end)
                       for (fieldname, s_fieldname) in zip(static_propertynames(T), s_fields)]...)
                    error("type ", $Texpr, " has no property ", p)
                end
            end
            for s_field in fieldnames(T)
                push!(constr_accessors, :(getfield($f, $(quot(s_field)))))
            end
            push!(inline_code, type_view_code)
            push!(inlined_type_fields, f => view_name)
            append!(di2[:fields], map(tuple, s_fields, fieldtypes(T)))
        else
            push!(constr_accessors, f)
            push!(di2[:fields], (f, type))
        end
    end
    @qq(begin  # @qq to get good line numbers
        $(MacroTools.combinestructdef(di2))
        $(inline_code...)
        function Base.getproperty(obj::$name, p::Symbol)
            # if p == :elephant
            #      return ElephantViewFromForest(obj)
            # end
            # return getfield(obj, p)
            $([:(if p === $(quot(fieldname))
                 return $sub_type(obj)
                 end)
               for (fieldname, sub_type) in inlined_type_fields]...)
            return Base.getfield(obj, p)
        end
        Base.propertynames(obj::$name) = $prop_names
        static_propertynames(::Type{$name}) = $prop_names
        function $name($(prop_names...))
             $name($(constr_accessors...))
        end
        $name
    end) |> esc
end
3 Likes

Your code can’t deal with

julia> @with_inline_types mutable struct BigForest
           elephant1::@inline(Elephant)   # @inline elephant::Elephant also possible, but less
                                         # MacroTools-friendly
           elephant2::@inline(Elephant)
           normal_field::String
           end
julia> f=BigForest(Elephant(1.0), Elephant(2.0), ""); f.elephant1.hunger #should be 1.0
2.0

Right, that’s what this comment was about:

            view_name = Symbol(Texpr, "ViewFrom", name)  # technically should also append __f

so

            view_name = Symbol(Texpr, "ViewFrom", name, "__", f)

then it would work out I believe, but maybe it’d need some more tweaking.

“Newly created object” is not a simple condition. In Whole’s inner constructor, we can at least make sure we call its new at some point. However, Whole has no idea if Part was instantiated within Whole’s constructor call, even if Part’s constructor is directly called; the Part() call is not guaranteed to call its own new, let alone return a Part instance. Even if we can infer the instantiation of Part within Whole, it’s not a sufficient condition. We have to also confirm that it won’t be initially stored elsewhere, like another conflicting inline field, before any one of Whole’s fields. As mentioned before, x = Part(); return new(x, x) would fail for 2 inline fields, but it could also fail if we instantiate something else that stores the Part in an inline field. Avoiding failure requires narrow, brittle, and opaque conditions possibly through layers of callers, and dynamic dispatch can prevent the compiler from resolving this. The compiler also can’t fall back to non-inline storage without overheads.

The “easy” solution to this is for new to force a newly created object by copying the instance before inline-ing the copy, much like immutable instances. We don’t have to care about whether the input was stored elsewhere or if the compiler can tell. It could be possible for the compiler to optimize away the instantiation of the original instance if it can be inferred that it occurs within the outer mutable struct’s constructor. But if we’re copying things around like this, immutables make more sense. However, updating fields by instantiation and reassignment does have its own issues to work out regarding object identity of nested immutable and mutable structs, it’s not as straightfoward as a.b.c.d = e currently meaning mutating the d field of a.b.c with no (direct) changes to a or b. For example, if any layer is mutable, we’d likely want to mutate it there rather than instantiate the whole thing and reassign, e.g. not x = x0 = Ref(1); @reset x[] = 2; (x !== x0, x0, x).

1 Like

I completed the (POC of the) macro-based version this morning. Now it works for nested structs:

abstract type AbstractElephant end

struct Elephant <: AbstractElephant
    hunger::Float64
end

@with_inline_types mutable struct Forest
    elephant::@inline(Elephant)   # @inline elephant::Elephant also possible, but less
                                  # MacroTools-friendly
    normal_field::String
end

@with_inline_types mutable struct Biome
    forest::@inline(Forest)
end

julia> b = Biome(Forest(Elephant(50), "Tasmania"))
Biome(50.0, "Tasmania")

julia> b.forest.elephant.hunger = 33.0
33.0

julia> dump(b)
Biome
  forest__elephant__hunger: Float64 33.0
  forest__normal_field: String "Tasmania"

julia> @btime $b.forest.elephant.hunger
  2.768 ns (0 allocations: 0 bytes)
33.0

I used generated functions to limit the complexity of the macro expansion. The way it’s written, it might also not be that difficult to support parametric types…

PatrickHaecker Would that meet your needs?

I get the appeal of getting a new keyword in base to solve such a problem, but as you probably are well aware, that does not happen very often. Pushing this macro to the finishing line is more likely to work out… Happy to help if there is interest.

Full code:

using MacroTools
using MacroTools: @qq
using Base.Meta: quot

""" Defined for types, not objects. """
static_propertynames(::Type{T}) where T = fieldnames(T)

get_inlined_type_fields(x) = Val((;))

@generated function get_view_property(::Val{inl_typ_fields}, view::T, ::Val{p2}) where {T, p2, inl_typ_fields}
    # When evaluating `b.forest.elephant.hunger`, this is called twice. First time O is
    # ForestView{:forest, Biome}, and we return
    #     ElephantView{:forest__elephant, Biome}(view.obj)
    # (simplified) from the first branch below while second time we return
    #     view.obj.forest__elephant__hunger
    # from the second branch. The idea being that each nested View object has, as
    # parameter, the corresponding field prefix.
    @assert length(T.parameters) == 2
    p1 = T.parameters[1]
    obj_expr = Expr(:call, :getfield, :view, quot(:obj))
    if haskey(inl_typ_fields, p2)
        sub_view = inl_typ_fields[p2]
        p12 = quot(Symbol(p1, "__", p2))
        return :($sub_view{$p12, typeof($obj_expr)}($obj_expr))
    else
        return Expr(:call, :getfield, obj_expr, quot(Symbol(p1, "__", p2)))
    end
end

@generated function set_view_property!(::Val{inl_typ_fields}, view::T, ::Val{p2}, new_val) where {T, p2, inl_typ_fields}
    @assert length(T.parameters) == 2
    p1 = T.parameters[1]
    obj_expr = Expr(:call, :getfield, :view, quot(:obj))
    if haskey(inl_typ_fields, p2)
        # Eg. this happens for
        #     b.forest.elephant = Elephant(...)
        error("TODO: must copy all fields")
    else
        # TODO: put a `convert` call to the right type
        return Expr(:call, :setfield!, obj_expr, quot(Symbol(p1, "__", p2)), :new_val)
    end
end

macro with_inline_types(def)
    # Example with respect to this
    #     @with_inline_types struct Forest
    #         elephant::@inline(Elephant)
    #         normal_field::String
    #     end

    di = MacroTools.splitstructdef(def)
    di2 = copy(di)
    name = di[:name]  # Forest
    inline_code = Expr[]  # code to define the type views
    di2[:fields] = []     # [(normal_field, Int)]
    inlined_type_fields = Dict()  # [:elephant=>ElephantView]
    prop_names = Symbol[]   # [:elephant, :normal_field]
    constr_accessors = []

    for (f, type) in di[:fields]
        push!(prop_names, f)
        if @capture(type, @inline(Texpr_))
            T = eval(Texpr)  # having to `eval` is unfortunate. there are other ways worth
                             # investigating, such as `getfield(@__MODULE__, Texpr)`, that would
                             # have different trade-offs
            s_fields = [Symbol(f, "__", sub_f) for sub_f in fieldnames(T)] # [:elephant__hunger]
            view_name = Symbol(Texpr, "View")
            type_view_code = quote
                struct $view_name{F, U} <: $(supertype(T)) # ElephantView <: AbstractElephant
                    obj::U
                end
                Base.propertynames(view::$view_name) = $(fieldnames(T))
                Base.getproperty(view::$view_name, p::Symbol) =
                    get_view_property(get_inlined_type_fields($Texpr), view, Val{p}())
                Base.setproperty!(view::$view_name, p::Symbol, x) =
                    set_view_property!(get_inlined_type_fields($Texpr), view, Val{p}(), x)
            end
            for s_field in fieldnames(T)
                push!(constr_accessors, :(getfield($f, $(quot(s_field)))))
            end
            push!(inline_code, type_view_code)
            inlined_type_fields[f] = :($view_name)
            append!(di2[:fields], map(tuple, s_fields, fieldtypes(T)))
        else
            push!(constr_accessors, f)
            push!(di2[:fields], (f, type))
        end
    end
    @qq(begin  # @qq to get good line numbers
        $(MacroTools.combinestructdef(di2))
        $(inline_code...)
        function Base.getproperty(obj::$name, p::Symbol)
            # if p == :elephant
            #      return ElephantViewFromForest(obj)
            # end
            # return getfield(obj, p)
            $([:(if p === $(quot(fieldname))
                 return $view_type{$(quot(fieldname)), $name}(obj)
                 end)
               for (fieldname, view_type) in inlined_type_fields]...)
            return Base.getfield(obj, p)
        end
        Base.propertynames(obj::$name) = $prop_names
        static_propertynames(::Type{$name}) = $prop_names
        function $name($(prop_names...))
             $name($(constr_accessors...))
        end
        get_inlined_type_fields(::Type{$name}) = $(Val(NamedTuple(inlined_type_fields)))
        $name
    end) |> esc
end
2 Likes

Avoiding failure requires narrow, brittle, and opaque conditions possibly through layers of callers, and dynamic dispatch can prevent the compiler from resolving this.

The general case is hard. Although it would be nice to support all cases, I think this is not necessary, as inline creates some kind of static context rendering Julia’s more dynamic parts incompatible. In the end the semantics only make sense to me if exactly one new call of Part happens, if the reference of this call is returned and if an inline Part does not have a (direct or indirect) Part field (a non-inlined one might work, but I am not yet convinced that it should be supported).

If this can be inferred (could the effect system help here?), the call is valid, otherwise it is invalid. And as the new call needs to act differently and we need to hand the memory address to it, all involved functions on the stack between the inner constructor of Whole and the inner constructor of Part need another method with an implicit parameter transporting this information.

But if we’re copying things around like this, immutables make more sense.

I agree. Copying sounds compelling in the beginning, but it needs a strong story about object identity to really make sense with mutables. This might mean that nothing can observe the originally created object which is the source object of the copy and that probably brings us back to the new approach.

Nice. However,

julia> f=BigForest(Elephant(1.0), Elephant(2.0), ""); f.elephant2.hunger
2.0

julia> typeof(f.elephant2)
ElephantView{:elephant2, BigForest}

julia> typeof(f.elephant2) == typeof(f.elephant1)
false

which brings up what I noted upthread:

I think this would really require new (albeit simple) core intrinsics relating to GC write barriers and a very_unsafe_pointer_to_objref(typ::DataType, p::Ptr) = unsafe_pointer_to_objref(p)::typ that assumes the type instead of checking (so UB instead of exception if it’s wrong, unless check-bounds=yes, or some compiler flag for the julia runtime is set).

3 Likes

I’m not sure I get why this is a desideratum. Both are AbstractElephants, just like array views are AbstractArrays. It’d be problematic for performance if you wanted to, say, put elephant1 and elephant2 inside a Vector, but given the context of the post (wants to share lifetime of parent object with child) that seems unlikely.

If you wanted a C-style inlined array, as in

struct Forest
    @inlined trees::Tree[4]
end

you could implement a TreeArrayView where the index is not part of the type.

That said, I can see some of the “Why does this have to be so complicated in Julia” point…

I had some time this morning to try my hand at a macro-based solution.

This is indeed impressive macro work!

I used

mutable struct Part
    a::Float32
end

@with_inline_types mutable struct Whole 
    b::Int32
    part::@inline(Part) 
end

w = Whole(Int32(42), Part(42.0f0))

to stay compatible with the rest of the discussion (but great that the solution is tested with nested structs)

Note that I have only a limited understanding of the macro implementation, so please correct points I haven’t evaluated correctly, so I can correct the list.

These are the pros and cons I see :

pros:

  • clearly defined constructor semantics
  • seems to have ideal memory behavior (only one alloc, clear GC story)
  • field access works with nested structs
  • preserves user defined getproperty for Whole:
    Base.getproperty(w::Whole, s::Symbol) = s === :c ? true : getfield(w, s)
    julia> w.c
    true
  • Preserves object identity (although I do not understand why)
    w1 = Whole(Int32(1), Part(1.0))
    w2 = Whole(Int32(1), Part(1.0))
    julia> w1.part === w1.part !== w2.part === w2.part
    true
  • methods involving Part can be called (using a Union or subtyping):

cons:

  • constructor no longer supports type conversion (Whole(42, Part(42.0) does not work)
  • definition syntax more complicated as macro needed both before struct definition and inside field definition
  • returns “wrong” fieldnames:
    julia> Whole |> fieldnames
    (:b, :part__a)
  • returns “wrong” type for substructs: w.part |> typeof
    julia> w.part
    PartView{:part, Whole}(Whole(42, 42.0f0))
  • Part’s methods needs to accept a Union or an abstract type (then Part needs to use that, too) to work
  • Does not support getting the memory address
    julia> w1.part |> pointer_from_objref
    ERROR: pointer_from_objref cannot be used on immutable objects

So while it does work for the simplest of cases, it seems to be too limited (but I might be using it incorrectly).

Edit: corrections due to additional input (see below)

This seems very similar to how views work on vectors:

f(v::Vector) = sum(v)

v = collect(1:10)
vv = view(v, 2:3)

f(v) # 55
f(vv) # ERROR: MethodError: no method matching f(::SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true})

For views, the solution is typically to use AbstractVector rather than Vector for dispatch:

f(v::AbstractVector) = sum(v)

Perhaps Part and PartView could similarly both be made to subtype AbstractPart. I know nothing about macros though, so no idea how to accomplish this.

1 Like

For views, the solution is typically to use AbstractVector rather than Vector for dispatch

Ahh, thanks a lot. @cstjean used an AbstractElephant and with that calling methods works:

abstract type AbstractPart end
mutable struct Part <: AbstractPart
    a::Float32
end
inc(p::AbstractPart) = p.a += 1

julia> w.part |> inc
43.0f0

So

therefore, no methods involving Part can be called

becomes a “pro” point that

  • methods involving Part can be called

However, cons gets the point

  • Part must be changed in general to support method calls

The original idea was that Part can be used unmodified.

Nevertheless, with the ability to call methods, this gets much more useful and should now cover all the simple cases. It’s still not on the level of having something like this supported by the language itself. But if already fills a niche where in simple cases the performance with modifications should be superior to what is currently possible with immutable structs. I wonder whether a package might make sense until we (hopefully) will have language support for such a feature.

1 Like

In your specific case, it may be cleaner to define Union type directly. It’s a matter of personal preference, I would personally define a macro for each inlined type like:

@Part == Union{Part, PartView}

that can be used in the function signatures so that they only need to be changed slightly. For example,

f(x::@Part) = 5

so one simply would need to add @ in front of the types that support views.

2 Likes

Just a nitpick, but in this case using a macro is not necessary at all, just define a constant, like const UPart = Union{Part, PartView}.

3 Likes