Subtyping vs adding a function to a struct's fields

I have a general strategic question about when to create subclasses versus when to do something else. As a motivating example, let’s say I have three boxes that I’d like to model. Boxes have various properties like material, mass, height, length, and width. Additionally, the boxes that I have are moving in space, with this motion given by a parametric function.

There are three very unsatisfying ways I can think of to solve this problem. I’m requesting help on thinking about whether there is a better way. Please understand that this is a very silly illustrative example!

Option 1: AbstractBox

It would be easy to define something like

abstract type AbstractBox end

function position(b::AbstractBox, t)::Tuple{AbstractFloat, AbstractFloat} # returns the (x, y) position
    error("not yet implemented")
end

To use this, I need to define a concrete subtype

struct Box1 <: AbstractBox
  material::String
  mass::AbstractFloat
  height::AbstractFloat
  length::AbstractFloat
  width::AbstractFloat
end

position(b::Box1, t) = (t, t^2) # returns the (x, y) position

and similarly for Box2, Box3, etc., even though the only thing that should differ is the position method. This is unsatisfying because I have to copy out the material, mass, height, length, width, etc. fields for every type.

Option 2: BaseBox

An alternative approach is to define a BaseBox object that I can then embed in Box1, Box2, Box3, etc.

struct BaseBox
  material::String
  mass::AbstractFloat
  height::AbstractFloat
  length::AbstractFloat
  width::AbstractFloat
end

struct Box1
  b::BaseBox
end

position(b::Box1, t) = (t, t^2) # returns the (x, y) position

This works OK, but it’s pretty clunky and requires accessing fields through the wrapper, e.g. Box1(...).b.height.

Option 3: Box

The third option is just to create a single Box type, and to treat Box1, Box2, and Box3 as separate instances. To make this work, position needs to be a field

struct Box
  material::String
  mass::AbstractFloat
  height::AbstractFloat
  length::AbstractFloat
  width::AbstractFloat
  position_fn::Function
end
position(b::Box, t) = b.position_fn(t)

box1_pos_fn(t) = (t, t^2)
Box1 = Box(..., box1_pos_fn)

This works fine and is perhaps the cleanest, but adding functions as fields doesn’t feel very clean to me.

Any advice / suggestions appreciated! Performance is definitely of interest.

Function is an abstract type (each function has its own concrete type), so the need to check this field at runtime and allocate space for a result of unknown size will cost performance. For the same reason, you should reconsider all those ::AbstractFloat too. The only reason I would do that is if I have wild combinations of concrete float types yet don’t want to compile for a combinatorial explosion of concrete Box types, but if you instead can guarantee that all the fields share a float type, you could annotate with a type parameter ::T and specify it in the header struct Box1{T<:AbstractFloat}.
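A minimal sketch of the point above, assuming all float fields share one type. ParamBox, box_position, and parabola are names made up for illustration (box_position is used instead of position only to stay clear of Base.position). The function's concrete type becomes a type parameter, so every field of the struct has a concrete type:

```julia
# Type parameters make every field concrete: T for the floats,
# F for the concrete type of the stored function.
struct ParamBox{T<:AbstractFloat, F<:Function}
    material::String
    mass::T
    height::T
    length::T
    width::T
    position_fn::F
end

# Hypothetical accessor; calls the stored function.
box_position(b::ParamBox, t) = b.position_fn(t)

parabola(t) = (t, t^2)
box = ParamBox("Wood", 10.0, 1.0, 1.0, 1.0, parabola)

isconcretetype(Function)       # false: a ::Function field is abstract
isconcretetype(typeof(box))    # true: ParamBox{Float64, typeof(parabola)}
box_position(box, 3.0)         # (3.0, 9.0)
```

The trade-off is that each distinct function gives a distinct ParamBox type, so boxes with different position functions no longer share one concrete type.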

If you really need separate box types, option 2 is closer to idiomatic. To avoid accesses like Box1(...).b.height, you would define interface methods like height(::Box1) to make them easier to write. Interface functions are useful for making a consistent way to, well, interface with a variety of concrete types with different structures. For example, a rectangle has different height and width, but a square only needs to store one of those; I could define height and width functions for both but the square has a definitional redundancy width(s::Square) = s.length; height(s::Square) = width(s).
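As a rough sketch of the rectangle/square example, assuming Float64 fields and an illustrative area function that is not from the post: the accessor functions hide the different layouts, so generic code never touches a field directly.

```julia
struct Rectangle
    width::Float64
    height::Float64
end

struct Square
    length::Float64   # one stored side is enough
end

# Interface functions: same names, different underlying structure.
width(r::Rectangle) = r.width
height(r::Rectangle) = r.height
width(s::Square) = s.length
height(s::Square) = width(s)

# Generic code written only against the interface.
area(shape) = width(shape) * height(shape)

area(Rectangle(2.0, 3.0))  # 6.0
area(Square(2.0))          # 4.0
```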

The thing is, you don’t have a variety of concrete types with different structures, and if you never will, I’d just use 1 concrete Box with no abstract supertype at all. You could make different position functions like position1/position2, but if you really want to dispatch one position function, you could do it on a different argument, not Box e.g. position(::Val{1}, b::Box).
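One way to picture the Val-based dispatch, as a sketch only: SimpleBox, box_position, the extra t argument, and the second trajectory are assumptions added for illustration.

```julia
struct SimpleBox{T<:AbstractFloat}
    material::String
    mass::T
end

# The trajectory is selected by a Val tag, not by the box type.
box_position(::Val{1}, b::SimpleBox, t) = (t, t^2)
box_position(::Val{2}, b::SimpleBox, t) = (2t, -t)   # made-up second path

b = SimpleBox("Wood", 10.0)
box_position(Val(1), b, 3)  # (3, 9)
box_position(Val(2), b, 3)  # (6, -3)
```

Dispatch is resolved at compile time only when the tag is a compile-time constant at the call site; a tag computed at runtime falls back to dynamic dispatch.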

It’s not the same, but the extra dispatch argument is also used for traits. Traits attach orthogonal properties to types, like a SymmetryStyle(::Square) = Symmetric() and SymmetryStyle(::Rectangle) = Nonsymmetric(), and the user functions would look like foo(p) = foo(SymmetryStyle(p), p); foo(::Symmetric, p) = ... For the record, the Julia compiler has been smart enough to compute and inline constants like issymmetric(::Square) = true at compile-time, so you could do conditional statements that get optimized away, but traits let you organize things as multimethods.
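The trait pattern described above could look roughly like this. The shape types and the describe function are illustrative stand-ins for foo; only SymmetryStyle, Symmetric, and Nonsymmetric come from the post.

```julia
abstract type SymmetryStyle end
struct Symmetric <: SymmetryStyle end
struct Nonsymmetric <: SymmetryStyle end

struct Square
    length::Float64
end
struct Rectangle
    width::Float64
    height::Float64
end

# Trait functions: map each value to a trait instance.
SymmetryStyle(::Square) = Symmetric()
SymmetryStyle(::Rectangle) = Nonsymmetric()

# The user-facing function forwards to trait-specific methods.
describe(p) = describe(SymmetryStyle(p), p)
describe(::Symmetric, p) = "symmetric"
describe(::Nonsymmetric, p) = "not symmetric"

describe(Square(1.0))          # "symmetric"
describe(Rectangle(1.0, 2.0))  # "not symmetric"
```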


Go with option 2 and then implement getproperty and optionally setproperty! methods. You can see an example here: Split SyntaxNode into TreeNode & SyntaxData by timholy · Pull Request #193 · JuliaLang/JuliaSyntax.jl · GitHub (I linked the PR so you can understand the background, but the most relevant bit is the code which you can see by clicking “Files changed”). These were gratifyingly-small changes to make JuliaSyntax.jl easily composable with TypedSyntax.jl.


The key structural difference between 1 & 2 is that BaseBox methods don’t have access to the parent (and hence its position). Separating components like this can be helpful to restrict the design space for users, depending on your scenario. This is about defining a box API that doesn’t depend on its position: is that a useful chunking for your situation? Perhaps it is.

This question reminds me of The End Of Object Inheritance & The Beginning Of A New Modularity - YouTube.


Thanks very much! For future reference, here’s what I ended up going with

using Base: getproperty, setproperty!

mutable struct BaseBox{T<:AbstractFloat}
    material::String
    mass::T
    height::T
    length::T
    width::T
end

struct Box1{T}
    b::BaseBox{T}
end
function Box1(; material, mass, height, length, width)
    return Box1(BaseBox(material, mass, height, length, width))
end
position(b::Box1, t) = (t, t^2) # returns the (x, y) position

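function Base.getproperty(box::Box1{T}, name::Symbol) where {T}
    # Forward BaseBox field names to the wrapped BaseBox.
    # getfield(box, :b) avoids re-entering this getproperty method.
    base = getfield(box, :b)
    name in fieldnames(BaseBox{T}) && return getfield(base, name)
    return getfield(box, name)
end

function Base.setproperty!(box::Box1{T}, name::Symbol, value) where {T}
    # The default setproperty! on BaseBox converts value to the field's declared type.
    base = getfield(box, :b)
    name in fieldnames(BaseBox{T}) && return setproperty!(base, name, value)
    return setfield!(box, name, value)
end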

b = Box1(; material="Wood", mass=10.0, height=1.0, length=1.0, width=1.0)

getproperty(b, :material) # "Wood"
b.width * b.length * b.height # 1.0

b.mass = 20.0 # 20.0
getproperty(b, :mass) # 20.0

position(b, 4) # (4, 16)

Personally, I much prefer mass(box) to box.mass. It’s more extensible, easier to maintain, doesn’t require messing with getproperty, and has a lot of other benefits that only become clear after you’ve spent a while wrestling with property overrides.

For example, perhaps you’d rather define mass(box) = volume(box) * density(box) with density(box) = materialdensity(material(box)). Accessor functions make these patterns much more obvious and easy to write and maintain. They compose very nicely, which helps as you start assembling complicated objects from simple ones.
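A small sketch of that composition, under loud assumptions: MaterialBox and the MATERIAL_DENSITY table are hypothetical, and the density numbers are placeholders rather than reference data.

```julia
struct MaterialBox
    material::String
    height::Float64
    length::Float64
    width::Float64
end

# Placeholder densities (made-up values, for illustration only).
const MATERIAL_DENSITY = Dict("Wood" => 600.0, "Steel" => 7850.0)

material(b::MaterialBox) = b.material
volume(b::MaterialBox) = b.height * b.length * b.width
materialdensity(name::AbstractString) = MATERIAL_DENSITY[name]

# Derived quantities compose from the accessors; no mass field is stored.
density(b) = materialdensity(material(b))
mass(b) = volume(b) * density(b)

mass(MaterialBox("Wood", 1.0, 2.0, 0.5))  # 600.0
```

Callers keep writing mass(box) whether mass is a stored field or computed, which is the maintainability argument being made.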

I would also tend to prefer a non-mutable struct to a mutable struct. They tend to be much better unless you require the option to mutate an object in one location and have those changes observed elsewhere. Accessors.jl can make them more ergonomic if you like mutation-like interactions.


If performance matters to you, you’ll need to parametrize Box1{T} and use that for the b field. You’ll also want to use fieldnames(BaseBox{T}) so that inference can const-prop it. (You might consider filing a Julia issue, since the fieldnames shouldn’t depend on the parameters.)


Also worth mentioning that MacroTools.@forward is pretty convenient for Option 2. E.g., instead of move_box(b1::Box1) = move_box(b1.b), you can write @forward Box1.b move_box, rotate_box, etc.


But MacroTools causes tons of invalidations. Until someone fixes that, personally I would recommend against using it.


Thanks, I definitely appreciate your points about accessor functions and API design more generally, and that’s something to incorporate where possible in the future.

I’m not sure what the best way to wrap that in for this contrived example would be. I could certainly define mass(b::Box1) instead of (or in addition to) changing the get/setproperty! functions. But not sure if that’s too clunky.

Last, totally agree RE mutability. However, for the more serious example that I have in mind, the “Box” represents the state of a system that changes at every time step, so it probably needs to be mutable – if you have examples of better ways to handle that, I’m very interested, but Accessors isn’t quite what I’m looking for in this particular example.

Do you mean that using MacroTools macros cause invalidations, or that MacroTools itself has a lot of invalidations?

MacroTools makes your code vulnerable to invalidation, because MacroTools’ internal code is vulnerable (it has poor inferrability) and macros in the body of your functions insert calls to those internal functions, making them dependencies of your functions. So when they get invalidated, your code gets invalidated.

To be clear, the macros themselves don’t leave “backedges,” it’s when the macros insert calls to MacroTools internal functions that you get into trouble.


Mutability in languages like Julia and Python is more about multiple variables or elements referencing the same mutable object so you don’t keep multiple copies and reassign each reference one at a time. If you don’t need multiple references, reassignment of immutable versions could allow more efficient code e.g. the classic x += 1. Mutable objects aren’t necessarily a performance issue though, if you’re not creating and discarding them rapidly.

Personally, I find mass(b::Box1) = b.mass to be less clunky than b.mass littered everywhere. The annoying part about get/setproperty! is that they’re single functions, so you have to put all that functionality in one function that turns into an ever-growing list of elseif prop === :propertyX; return thevalue statements. Further, another person can’t add things without changing your source code or overwriting the function wholesale, which can eventually become a maintainability nightmare. But ultimately, either method works. I’ve just had better experience building complex systems with the accessor format.

As one of the above posters commented, there’s no inherent cost in “throwing out” a whole object and making a new version that’s a “copy” with one field changed. For immutables, the compiler is usually smart enough to just change the one element and pretend that it copied the rest. Meanwhile, immutability saves you the mental load of keeping track of what you’ve changed inside a structure because you never “change” anything, you just create a new version and forget about the old one. As mentioned, Accessors.jl is great for writing code that looks like replacing a single field to update an object (rather than manually merging the fields in tedious error-prone hand-written code).


Also worth noting that mass.(vector_of_b) works (or map(mass, vector_of_b)), whereas there’s no equally elegant way with property access like b.mass.
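To make the comparison concrete, here is a short sketch. Crate is a hypothetical stand-in for a box type.

```julia
struct Crate
    mass::Float64
end
mass(c::Crate) = c.mass

crates = [Crate(1.0), Crate(2.5), Crate(4.0)]

mass.(crates)                 # broadcast the accessor: [1.0, 2.5, 4.0]
map(mass, crates)             # same result
sum(mass, crates)             # accessors also slot into reductions: 7.5

# The property-based equivalent has to spell out getproperty and the symbol.
getproperty.(crates, :mass)   # [1.0, 2.5, 4.0]
```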


This is great and I think really helpful for designing useful APIs! It seems like this would work most naturally with an AbstractBox kind of formulation, though, with the API defined (perhaps throwing some kind of not implemented error) on the abstract type, i.e. something like

mass(b::AbstractBox) = error("not implemented")

The downside of this approach is that I would then need to define mass(b::Box1), mass(b::Box2), etc – which seems like a lot of work and redundance when they’re all boxes.

Thanks for the helpful points. I followed up on the mass point below. RE mutability, I did a few simple experiments and it seems like there’s not a meaningful difference

using BenchmarkTools
using Accessors

mutable struct XYMutable{T<:AbstractFloat}
    x::T
    y::T
end

function random_walk_mutable(N)
    x = XYMutable(1.0, 2.0)
    for i in 1:N
        x.x += rand()
        x.y += rand()
    end
end

struct XY_IMMUTABLE{T<:AbstractFloat}
    x::T
    y::T
end

function random_walk_overwrite(N)
    xy = XY_IMMUTABLE(1.0, 2.0)
    for i in 1:N
        xy = XY_IMMUTABLE(xy.x + rand(), xy.y + rand())
    end
end

function random_walk_accessor(N)
    xy = XY_IMMUTABLE(1.0, 2.0)
    for i in 1:N
        @reset xy.x = xy.x + rand()
        @reset xy.y = xy.y + rand()
    end
end

N = 25_000
@info "Using a Mutable Struct"
@btime random_walk_mutable(N)
# 52.458 μs (0 allocations: 0 bytes)

@info "Overwriting an Immutable Struct"
@btime random_walk_overwrite(N)
# 52.708 μs (0 allocations: 0 bytes)

@info "Using Accessors.jl"
@btime random_walk_accessor(N)
# 52.875 μs (0 allocations: 0 bytes)

The lowered code looks pretty similar across the three versions as well, albeit with a few differences. Maybe there’s a better way to do one / all of these, but I personally find the mutable version more intuitive.

Here is a version of @forward which does not need any package. Does it provoke invalidations?
"""
@forward T.f f1,f2,...

is a macro which delegates definitions. The above generates

f1(a::T,args...)=f1(a.f,args...)
f2(a::T,args...)=f2(a.f,args...)
...

"""

macro forward(ex, fs)
  T, field = esc(ex.args[1]), ex.args[2].value
  fdefs=map(fs.args)do ff
      f= esc(ff)
      quote
        ($f)(a::($T),args...)=($f)(a.$field,args...)
      end
  end
  Expr(:block, fdefs...)
end

The downside of this approach is that I would then need to define mass(b::Box1), mass(b::Box2), etc – which seems like a lot of work and redundance when they’re all boxes

You could use the informal interface that all AbstractBox should have a field b for base box and define mass(b::AbstractBox) = b.b.mass.
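A sketch of that informal interface, assuming a trimmed-down BaseBox with only two fields. The basebox helper, box_position, and the Box2 trajectory are illustrative additions.

```julia
abstract type AbstractBox end

struct BaseBox{T<:AbstractFloat}
    material::String
    mass::T
end

struct Box1{T} <: AbstractBox
    b::BaseBox{T}
end
struct Box2{T} <: AbstractBox
    b::BaseBox{T}
end

# Informal contract: every AbstractBox subtype keeps its BaseBox in field `b`.
basebox(x::AbstractBox) = x.b

# Accessors are written once, on the abstract type.
mass(x::AbstractBox) = basebox(x).mass
material(x::AbstractBox) = basebox(x).material

# Only the trajectory differs per concrete type.
box_position(::Box1, t) = (t, t^2)
box_position(::Box2, t) = (2t, t)   # made-up second path

mass(Box1(BaseBox("Wood", 10.0)))       # 10.0
material(Box2(BaseBox("Steel", 50.0)))  # "Steel"
```

A subtype that stores its data differently can still participate by overriding basebox or the individual accessors.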


This “not implemented” method is not needed, because without it you’ll get a MethodError, which you should really think of as “This method is not implemented for this type”…
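For instance, in this small sketch (the types and box_position are illustrative), calling the function on a type without a method already raises a MethodError, with no fallback definition required:

```julia
abstract type AbstractBox end
struct Box1 <: AbstractBox end
struct Box3 <: AbstractBox end

# Only Box1 implements the function; there is no AbstractBox fallback.
box_position(::Box1, t) = (t, t^2)

err = try
    box_position(Box3(), 1.0)
catch e
    e
end

err isa MethodError   # true
```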

Here is a version of @forward which does not need any package. Does it provoke invalidations?

If I’m reading Tim Holy’s comment correctly, using MacroTools and @forward are probably fine as far as invalidations are concerned. It’s @capture and other fancier macros that invalidate a lot.

Maybe I’m getting outrun by Julia’s community, but I’ve yet to really get into the whole TTFX / invalidation hunt for my own code. Are invalidations bad enough that whole packages should be proscribed, even for end-users? @capture is easily worth whatever marginal increase in TTFX it causes for my own end-user packages…

That said, as the de facto MacroTools maintainer, thank you for the comment, I’ll put fixing that on my todo list.
