Composition and inheritance: the Julian way

This is a rule I’m going to adopt from now on: methods in shared packages should always dispatch on abstract types, never on concrete types.

It really wasn’t clear to me before this; it should be in the docs in bold all caps somewhere. Dispatching on concrete types locks in implementation details and forces a lot of useless duplication on anyone trying to extend those types.

Wouldn’t a quick (but manual) search and replace to swap out DataFrame for an abstract type resolve this in DataFrames?

I think this isn’t a hard and fast rule. Both inheritance and composition are available in OO languages; the trick is to apply the right solution to the domain model.

There are some good insights from https://www.thoughtworks.com/insights/blog/composition-vs-inheritance-how-choose

Yup, or even just let anything go (no type annotation at all).

Maybe. But I am not sure whether it could have consequences or hide subtle errors. I (want to) believe there is a reason they defined getindex(df::DataFrame, col_ind::ColumnIndex) instead of getindex(df::AbstractDataFrame, col_ind::ColumnIndex).

Moreover, SubDataFrame is defined as follows:

struct SubDataFrame{...} <: AbstractDataFrame
    parent::DataFrame
    rows::T 
end

i.e. it doesn’t follow what we called rule 3.

Maybe there is a specific design motivation that I don’t understand…

Certainly. But the great thing about this is that you could become a contributor just by making a PR, helping develop an API you care about. Pull requests that clean up interfaces are usually well received, and you get lots of feedback.

I found a way to avoid the copying involved in the super method defined in the first post: simply use invoke.

The updated call(p::AbstractCitizen) is now:

function call(p::AbstractCitizen)
    if p.nationality == "Italian"
        printstyled(uppercase(p.name), " dove sei ?"; color = :red)
    elseif p.nationality == "UK"
        printstyled(uppercase(p.name), " where are you ?"; color = :red)
    else
        # run the call(::AbstractPerson) method on p itself, without copying it
        invoke(call, Tuple{AbstractPerson}, p)
    end
end
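
Since the first post is not reproduced here, the following is a minimal self-contained sketch of the same invoke pattern. The Person/Citizen definitions and the message of the fallback call(::AbstractPerson) are assumptions made for illustration, the UK branch is dropped for brevity, and the methods return strings instead of printing so the result is easy to check.

```julia
abstract type AbstractPerson end
abstract type AbstractCitizen <: AbstractPerson end

struct Person <: AbstractPerson
    name::String
    age::Int
end

struct Citizen <: AbstractCitizen
    name::String
    age::Int
    nationality::String
end

# Generic method for any AbstractPerson (hypothetical message).
call(p::AbstractPerson) = string(p.name, ", where are you?")

# More specific method for citizens.
function call(p::AbstractCitizen)
    if p.nationality == "Italian"
        return string(uppercase(p.name), " dove sei ?")
    else
        # Skip this method and run the AbstractPerson one on the same object,
        # without first copying p into a Person.
        return invoke(call, Tuple{AbstractPerson}, p)
    end
end
```

With these definitions, call(Citizen("Hans", 40, "German")) falls through to the generic method, while an Italian citizen gets the specialized message.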

That article argues that inheritance is fundamental to OO, in which case you have to accept that OO (in that sense) is impossible in Julia. The language made some design choices that require idioms different from those of other languages. I think it is best to accept this from the very beginning: you can do everything, and get fast and elegant code, but you may not be able to do it in the specific way you are used to from other languages.

Yeah, that should have been “always dispatch on abstract types, if you absolutely have to dispatch on anything at all”.

Object-oriented programming is oriented around objects. Duh. But what that means is that it’s oriented around data and its representations. You need inheritance to make more generic code work because all of your functions are written against some data layout. Your idea of “an algorithm” is abstract, but your concrete implementations intermingle it with the internal data and its representation.

Typed-dispatch programming, or multiple dispatch programming, or action-oriented programming flips this around. The pseudocode algorithm is exactly what becomes your code. It’s a generic function, and any instantiation of that algorithm specializes on the input types to add in the actual underlying handling of data. In this case, the function or the algorithm is the core idea, and the data’s representation is what is held abstract throughout most of the programming. This comes naturally to Julia through multiple dispatch, where you can write an algorithm that does A*x and not care what kind of matrix A is, since in the actual mathematics you don’t assume that A is a sparse matrix represented in CSC form; that’s just a computer detail!
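
As a small illustration of the A*x point, the sketch below writes one generic function and calls it with a dense matrix, a SparseMatrixCSC and a Diagonal. The function name apply_twice is hypothetical; sparse and Diagonal come from the SparseArrays and LinearAlgebra standard libraries.

```julia
using SparseArrays, LinearAlgebra

# The algorithm only says "multiply by A twice"; it never looks at how A is stored.
apply_twice(A, x) = A * (A * x)

x = [1.0, 2.0, 3.0]
A_dense  = [2.0 0.0 0.0; 0.0 3.0 0.0; 0.0 0.0 4.0]
A_sparse = sparse(A_dense)              # compressed sparse column (CSC) storage
A_diag   = Diagonal([2.0, 3.0, 4.0])    # stores only the diagonal

# Each call dispatches to the `*` method appropriate for the matrix type.
y_dense  = apply_twice(A_dense, x)      # [4.0, 18.0, 48.0]
y_sparse = apply_twice(A_sparse, x)
y_diag   = apply_twice(A_diag, x)
```

The same source line runs three different multiplication kernels; the storage format only shows up when the matrices are constructed.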

The only time in this kind of programming that you have to deal with the data layout is when you’re defining a new type and its primitive functions. Even then, most of the time you’re building a composite type, i.e. building a new type from pieces, and so the primitive functions are really just actions on the underlying data. So A*x on MyArray should really be A.A*x, so I forward it along. For this reason, composition with parametric types is a very natural way to do extension: once again, I don’t have to care about the true internal data layout of A.A. If A.A does all of the actions that I want in my extension, it’s fine! The fact that a SparseMatrixCSC has a colptr field is not something I should have to worry about, because what I care about is extending its actions, not building a data layout. Only at the very bottom, when defining bitstypes, memory buffers for arrays, or fancy things like implementations of sparse matrices, do you actually have to care about the internals of the data layout. In most mathematics you don’t have to, so there’s no reason to inherit all of that cruft (and the rigidity that comes from being tied to a data layout); you might as well keep A.A as its own box that can change at any time as long as it acts the same. This is why composition makes sense in Julia programming, especially when writing generic code.
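
A minimal sketch of this forwarding idea, assuming a hypothetical wrapper MyMatrix (restricted to matrices to keep it short) whose field A holds any AbstractMatrix. Only the primitive array functions and * are forwarded, so the wrapper never touches internals such as colptr.

```julia
using SparseArrays

# Hypothetical wrapper: it *contains* a matrix (composition) rather than
# copying the wrapped type's internal fields.
struct MyMatrix{T,AT<:AbstractMatrix{T}} <: AbstractMatrix{T}
    A::AT
    # Inner constructor: infer T and AT from the wrapped matrix.
    MyMatrix(A::AbstractMatrix{T}) where {T} = new{T,typeof(A)}(A)
end

# Primitive functions of the AbstractArray interface, forwarded to the field.
Base.size(M::MyMatrix) = size(M.A)
Base.getindex(M::MyMatrix, i::Int, j::Int) = M.A[i, j]

# A*x on MyMatrix is just A.A*x, so the wrapped type's own `*` method is used.
Base.:*(M::MyMatrix, x::AbstractVector) = M.A * x

M_dense  = MyMatrix([1.0 2.0; 3.0 4.0])
M_sparse = MyMatrix(sparse([1.0 2.0; 3.0 4.0]))
```

Neither method body mentions how the wrapped matrix stores its data, so the same wrapper accepts dense, sparse or any other AbstractMatrix.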

I can think of two options:

  1. You could define another abstract type between AbstractDataFrame and DataFrame. Dispatching on it instead of DataFrame shouldn’t break anything.

  2. Replacing DataFrame with AbstractDataFrame. It can break something if another subtype of AbstractDataFrame relies on an AbstractDataFrame method of a function that also has a separate DataFrame method: after widening, both methods have the same signature, so one replaces the other.
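
The risk in option 2 can be sketched with hypothetical stand-in types (not the real DataFrames package): once the DataFrame method is widened to the abstract type, it has the same signature as the existing fallback, and whichever definition comes last wins for every subtype.

```julia
# Hypothetical stand-ins for the DataFrames types (not the real package).
abstract type MyAbstractDF end
struct MyDF <: MyAbstractDF end
struct MySubDF <: MyAbstractDF end

# Before: a generic fallback plus a DataFrame-specific method.
describe_df(df::MyAbstractDF) = "generic fallback"
describe_df(df::MyDF) = "DataFrame-specific"

before = describe_df(MySubDF())

# Option 2: widen the DataFrame method to the abstract type. Its signature is
# now identical to the fallback's, so this definition replaces the fallback
# and silently changes what MySubDF gets.
describe_df(df::MyAbstractDF) = "DataFrame-specific"

after = describe_df(MySubDF())
```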

There could also be issues if the code contains checks like typeof(x) == DataFrame and similar things…

Maybe just try them and run the tests?

Edit: didn’t make sense.

I’m not sure why you’re writing off having DataFrame as a field in SubDataFrame. I think that’s what some others (e.g. @ChrisRackauckas) mean by composition, unless you mean that parent is an independent object, not just an internal structure inside SubDataFrame? But I can see how that could be useful too…

Ok, I see the point. Thanks for clarifying.

In summary, if we want to extend/customize a structure such as:

struct Person <: AbstractPerson
    name::String
    age::Int
end

and use the new structure (call it Citizen) in exactly the same way as we use a Person we have two options:

  • composition:
struct Citizen
    person::Person
    nationality::String
end

In this case we are not tied to the Person layout, but we have to redefine all the methods accepting a Person object. This process can be automated using macros, but we still get lots of additional entries in the dispatch table.
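
One way to automate the forwarding, sketched here as an assumption rather than the thread's own solution, is a loop over accessor names with @eval. The accessor functions name, age and nationality and the greeting function are hypothetical, and this Citizen is made a subtype of AbstractPerson (unlike the struct above) so that generic code accepts it.

```julia
abstract type AbstractPerson end

struct Person <: AbstractPerson
    name::String
    age::Int
end

# Composition: Citizen holds a Person instead of repeating its fields.
# (Subtyping AbstractPerson is an assumption, so generic code accepts it.)
struct Citizen <: AbstractPerson
    person::Person
    nationality::String
end

# Hypothetical accessor functions forming the "Person interface".
name(p::Person) = p.name
age(p::Person) = p.age
nationality(c::Citizen) = c.nationality

# Generate one forwarding method per accessor instead of writing them by hand.
for f in (:name, :age)
    @eval $f(c::Citizen) = $f(c.person)
end

# Generic code written against the accessors works for both types.
greeting(p::AbstractPerson) = "Hello, $(name(p)) ($(age(p)))"
```

Each forwarded accessor still adds one method to the dispatch table, as noted above; the loop only removes the typing.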

  • inheritance:
abstract type AbstractCitizen <: AbstractPerson end
mutable struct Citizen <: AbstractCitizen
    name::String
    age::Int
    nationality::String
end

In this case we are tied to the Person layout, but we can use one of the many available macros (e.g. the @def in @ChrisRackauckas’s examples) to completely lift this dependency.
Moreover all the methods accepting a Person will work seamlessly without flooding the dispatch table.

To solve the problem presented in this post, is there any valid reason or real use case to choose composition over inheritance? If so, please post a short example.

Not at all. There is no inheritance per se in Julia, so you can also do

mutable struct Citizen <: AbstractCitizen
    age::Int     # note order
    name::String
    nationality::String
end

and it should not matter as long as you access fields by name (accessing them by position would be silly). That said, while inheritance can be emulated, it is usually not the idiomatic approach in Julia, so it is misleading to treat it as an alternative to composition, just as using a Dict should not be considered a serious “alternative” either.
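
A quick check of the field-order point, with hypothetical types and a hypothetical summary_line function: a method written against AbstractPerson that accesses fields by name works for a Citizen whose fields are declared in a different order. It still relies on every subtype having name and age fields, which is exactly the implicit layout contract that composition avoids.

```julia
abstract type AbstractPerson end
abstract type AbstractCitizen <: AbstractPerson end

struct Person <: AbstractPerson
    name::String
    age::Int
end

# Same field names as Person, but declared in a different order.
struct Citizen <: AbstractCitizen
    age::Int
    name::String
    nationality::String
end

# Generic method: fields are accessed by name, so their order is irrelevant.
summary_line(p::AbstractPerson) = "$(p.name) is $(p.age)"
```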

The point here is to distinguish the two approaches. I keep making the mistake of attaching names to them, and I am immediately corrected. That is OK, but could we please focus on the two approaches, regardless of their names?

To solve the problem presented in this post, is there any valid reason or real use case to choose approach 1 (mistakenly called composition) over approach 2 (mistakenly called inheritance)?

It is not mistakenly called composition, it is composition. The reason to use it is that it is the idiomatic solution for Julia, as explained by @ChrisRackauckas above.

I will definitely do it after I am convinced that it is worth the time. I am asking here for help because I may not be aware of all the possible consequences.

If we all agree that there is nothing wrong with repeating struct fields in structures like Citizen, and that the rules outlined above are the best way to implement an interface, I will surely do it.

If the community approves an approach, I believe it is easier for the PR to be accepted…

No, you just chose too simple a problem. As soon as you go one step further, it becomes clear what happens. Let’s say you want to extend an array type to have metadata, like DEDataArray does. There are two ways to do it. One way is composition.

mutable struct MyDataArray{T,N,A} <: DEDataArray{T,N}
    x::A
    a::T
    b::Symbol
end

(homework: fix my triangular dispatch). Now just forward the array interface onto x and you’re good.
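
A minimal sketch of that forwarding, using a hypothetical MetaArray type in the spirit of DEDataArray (not its actual definition). The bound A<:AbstractArray{T,N} is one way to do the homework, and the same unchanged type is then used with a dense vector, a sparse matrix and a Diagonal.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical array-with-metadata type in the spirit of DEDataArray.
# The bound A<:AbstractArray{T,N} ties the wrapped array's element type and
# dimension to the wrapper's.
mutable struct MetaArray{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    x::A        # the wrapped array: dense, sparse, structured, ...
    a::T        # metadata
    b::Symbol   # metadata
    MetaArray(x::AbstractArray{T,N}, a, b::Symbol) where {T,N} =
        new{T,N,typeof(x)}(x, a, b)
end

# Forward the AbstractArray interface onto x.
Base.size(M::MetaArray) = size(M.x)
Base.getindex(M::MetaArray{T,N}, I::Vararg{Int,N}) where {T,N} = M.x[I...]
function Base.setindex!(M::MetaArray{T,N}, v, I::Vararg{Int,N}) where {T,N}
    M.x[I...] = v
    return M
end

v = MetaArray([1.0, 2.0, 3.0], 0.5, :dense)
s = MetaArray(sparse([1.0 0.0; 0.0 2.0]), 0.5, :sparse)
d = MetaArray(Diagonal([1.0, 2.0]), 0.5, :diagonal)
```

Swapping in another array type needs no change to MetaArray itself; only the wrapped object’s own methods decide how indexing is actually done.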

Inheritance…?

julia> fieldnames([1,2,3])
0-element Array{Symbol,1}

Arrays are primitives in Julia, so you can’t access their data… so, haha, this didn’t work out too well.

Now let’s say we want to do this with a sparse matrix. Composition already works with a sparse matrix. For inheritance, you’d have to add in these fields:

julia> fieldnames(sprand(10,10,0.1))
5-element Array{Symbol,1}:
 :m
 :n
 :colptr
 :rowval
 :nzval

and do a few overrides to make it act just like a SparseMatrixCSC (and make it an AbstractSparseMatrix, let’s assume that has enough generic methods to work easily) but with metadata. Okay, so extra work but still doable.

But what if you wanted a BandedMatrix? Well, the composition-based DEDataArray interface already works, because it still doesn’t care about the underlying data representation of the x field. For the inheritance way, you’d have to make a new type, add in the fields of a banded matrix, and add some overrides.

So let’s see the tally.

Composition: 1 type, 1 set of overrides (inherited from a package, so users don’t have to do it).

Inheritance: 1 new type each time you want to use a new matrix type (since it needs the structure of your new matrix), this doesn’t work with arrays (so it kind of defeats the purpose because the “simplest” case doesn’t work), and the user has to do the dirty details of forwarding array implementations into the type definitions.

The problem with inheritance is that “array with metadata” is an abstract idea that doesn’t care that a sparse matrix is implemented with rowval, nzval and colptr, where colptr marks where each column’s entries start in rowval and nzval. Those are completely unnecessary details that inheritance formulations have to pull in when doing an extension. However, DEDataArray essentially says “put the array that you want here, then put the metadata below it”. That works with any array type for obvious reasons, and if there’s a performance concern you can specialize some of the package functions as needed on certain classes of array types which you know have faster/slower access (again, not on exact implementation details, but on classes or abstractions of implementation… based on how they act!). DEDataArray doesn’t actually need an array in there. If you put in a type like Strang from SpecialMatrices.jl, the forwarded actions still make it act like a matrix, but with metadata. It really doesn’t care what you put there, as long as it acts correctly, and neither does any code that uses it.
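
Specializing on a class of implementations rather than on one concrete type can be sketched as follows. The function stored_entries is hypothetical; the specialized method targets any AbstractSparseArray from SparseArrays and assumes the sparse type implements nnz (SparseMatrixCSC does).

```julia
using SparseArrays

# Generic fallback: for a dense-style array, every entry counts as stored.
stored_entries(A::AbstractArray) = length(A)

# Specialization on a *class* of implementations (sparse arrays), not on one
# concrete type; assumes the sparse type implements `nnz`.
stored_entries(A::AbstractSparseArray) = nnz(A)
```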

So yes, there can be reasons for extensions if something really requires that the extender have exactly the same data representation. However, I find that is the exception rather than the rule, at least in numerical mathematics. You can always fight against this oncoming train, but the reason people warn against overuse of inheritance is that if the two objects aren’t metaphysically required to have the same layout, then somewhere down the line engineer A will find a nicer/better/faster representation for the simpler form and break the extension.

struct decorated{basetype, decType}
    parent::basetype
    decoration::decType
end

I don’t need to know the basetype at coding-time.

Quite a while ago, I tried making SubDataFrame encapsulate an AbstractDataFrame. It worked fine, but I got an avalanche of method ambiguities. That’s handled better now, so it’d be great if someone gave this another shot. It’s the right approach.

I just want to clarify my suggested use of Mixers.jl, and when I would use it instead of aggregated composition.

I’m often working with multiple formulations of physiological processes that share a subset of parameters. The formulation is a method dispatching on a type that holds the necessary parameters. But they don’t actually inherit any behaviours; they are just dispatched to run a particular version of a formulation, using some custom parameters and some common parameters that represent the same physical properties, and have the same Parameters.jl defaults that I don’t want to duplicate.

They could be aggregated types, but this would add an interdependence between them that doesn’t really exist: they would all need to access the same composed field in the methods that dispatch on them. It would also deepen the nesting, and the formulation methods would be harder to read. So I use mixins for those fields. It’s mostly for cleaning up inconsequential duplication, not for organising an inheritance hierarchy.
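
Mixers.jl provides macros for this, and its API is not reproduced here. As a rough plain-Julia stand-in for the same idea (shared fields written once and spliced into several unrelated structs, with no nesting), here is a sketch using @eval; the formulation types and parameter names are hypothetical.

```julia
# Hypothetical shared parameter fields, written once as expressions.
const COMMON_FIELDS = (:(temperature::Float64), :(pressure::Float64))

# Each formulation gets the shared fields spliced in, plus its own field.
# The structs stay flat: no shared composed field and no extra nesting.
for (T, extra) in ((:FormulationA, :(rate::Float64)),
                   (:FormulationB, :(decay::Float64)))
    @eval struct $T
        $(COMMON_FIELDS...)
        $extra
    end
end
```

Unlike a mixin library, this sketch does not carry over field defaults (such as Parameters.jl defaults); it only removes the duplicated field declarations.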

You could build concrete type inheritance with it, since mixins can operate on mixins, and use Holy traits for the dispatch hierarchy, which could even be automated. But I haven’t tried that, and it might be insane. Still, it would be more flexible than OOP concrete type inheritance.
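
For readers unfamiliar with the term, a minimal sketch of Holy traits (named after Tim Holy): dispatch is routed through a trait function instead of through the types’ supertypes. All type and function names here are hypothetical.

```julia
# Hypothetical formulation types; they share no supertype besides Any.
struct FormulationA; rate::Float64; end
struct FormulationB; decay::Float64; end
struct FormulationC; rate::Float64; end

# Trait types describe a behaviour, not a data layout.
abstract type Kinetics end
struct FirstOrder <: Kinetics end
struct Saturating <: Kinetics end

# Trait function: the dispatch "hierarchy" lives here and can be extended for
# new types without changing any type definitions.
kinetics(::Type{FormulationA}) = FirstOrder()
kinetics(::Type{FormulationB}) = Saturating()
kinetics(::Type{FormulationC}) = FirstOrder()

# Entry point dispatches on the trait value, then on the behaviour.
run_model(f, x) = run_model(kinetics(typeof(f)), f, x)
run_model(::FirstOrder, f, x) = f.rate * x
run_model(::Saturating, f, x) = x / (f.decay + x)
```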