Composition and inheritance: the Julian way

I understand what you are trying to do; I am just arguing that it is not an approach that meshes well with Julia or leads to good interface design.

First, I think that if you want an interface, you should define an abstract type to go with it. This is costless in terms of performance, and allows you to document the interface in the docstring of the abstract type.
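
A minimal sketch of an abstract type whose docstring doubles as the interface specification. The names `AbstractPerson`, `name`, `birthyear` and `age` are hypothetical. The required functions are declared with no methods (`function name end`), so implementers only have to add methods to them.

```julia
"""
    AbstractPerson

Hypothetical interface for person-like types.

Required methods:
- `name(p)`: the person's name, as a `String`.
- `birthyear(p)`: the year of birth, as an `Int`.

Provided (may be overridden for performance or special behaviour):
- `age(p, year)`: defaults to `year - birthyear(p)`.
"""
abstract type AbstractPerson end

"Return the name of person `p`. Required by `AbstractPerson`."
function name end

"Return the birth year of person `p`. Required by `AbstractPerson`."
function birthyear end

# Generic fallback built only on the required core; subtypes may override it.
age(p::AbstractPerson, year::Integer) = year - birthyear(p)
```

At the REPL, `?AbstractPerson` then shows the interface contract right next to the type.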

Second, you should minimize the number of functions that need to be implemented for this interface, ideally by choosing a core and then some extra methods that use this core but can be overridden for performance or implementation reasons. This is what AbstractArray does, and Julia’s compilation model makes this costless (in terms of runtime) in most cases, because of specialization and inlining (and lately, constant propagation).
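
As a concrete illustration of the AbstractArray case: implementing just `size` and `getindex` (plus the optional `IndexStyle`) is enough for iteration, slicing, `collect`, `maximum` and friends to work, and any generic method can still be overridden when a faster specialization exists. `Squares` is a hypothetical example type; `size`, `getindex`, `IndexStyle` and `sum` are the real Base functions being extended.

```julia
# Hypothetical example type: the first n square numbers, computed on demand.
struct Squares <: AbstractVector{Int}
    n::Int
end

# Core of the AbstractArray interface: size and scalar indexing.
Base.size(s::Squares) = (s.n,)
function Base.getindex(s::Squares, i::Int)
    @boundscheck checkbounds(s, i)
    return i^2
end

# Tell generic code that linear indexing is the natural access pattern.
Base.IndexStyle(::Type{<:Squares}) = IndexLinear()

# Optional override: a closed-form sum instead of the generic loop.
Base.sum(s::Squares) = s.n * (s.n + 1) * (2s.n + 1) ÷ 6
```

Everything else (`collect(Squares(4))`, `Squares(4)[2:3]`, `maximum`, iteration) comes from the generic AbstractArray methods built on that core.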

IMO an interface that violates these is not good design. Interfaces should not be associated primarily with concrete types, nor should they be very rich. Either will cause problems independently of forwarding methods. That said, designing good interfaces is a difficult, iterative process; frequently they emerge from functions of concrete types and, similarly, get streamlined by refactoring.

This link seems broken now; this one works for me.

I’ve been using abstract types for broad classification (e.g. AbstractString and AbstractChar), but beyond that I don’t build hierarchies of abstract types: I’ve found traits and parameterized concrete types much more useful for writing generic code and keeping a consistent API. Most properties of strings and characters are orthogonal to one another (mutable or not, validated or not, single- or multi-codeunit encoding, ASCII-compatible or not, ISO-compatible or not, Unicode subset, full Unicode, Unicode plus invalid sequences (i.e. Char), or not Unicode-compatible, etc.).
Trying to force that into a hierarchy explodes the number of types to deal with.
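
A minimal sketch of the traits approach (the so-called Holy-traits pattern) for two of those orthogonal properties. All type and function names here (`CodeUnitWidth`, `Mutability`, `ASCIIBuf`, `UTF8Buf`, `nchars`, `ismutablebuf`) are hypothetical and not part of any real string package.

```julia
# Two orthogonal trait families (hypothetical names, for illustration only).
abstract type CodeUnitWidth end
struct SingleUnit <: CodeUnitWidth end
struct MultiUnit <: CodeUnitWidth end

abstract type Mutability end
struct IsMutable <: Mutability end
struct IsImmutable <: Mutability end

# Two hypothetical string-like types with no abstract hierarchy between them.
struct ASCIIBuf
    data::Vector{UInt8}
end
struct UTF8Buf
    data::Vector{UInt8}
end

# Trait assignment: one short method per (type, trait family) pair.
codeunitwidth(::Type{ASCIIBuf}) = SingleUnit()
codeunitwidth(::Type{UTF8Buf}) = MultiUnit()
mutability(::Type{ASCIIBuf}) = IsMutable()
mutability(::Type{UTF8Buf}) = IsImmutable()

# Generic code dispatches on the trait value rather than on the type.
nchars(s) = nchars(codeunitwidth(typeof(s)), s)
nchars(::SingleUnit, s) = length(s.data)                         # one byte per character
nchars(::MultiUnit, s) = count(b -> (b & 0xc0) != 0x80, s.data)  # skip UTF-8 continuation bytes

ismutablebuf(s) = mutability(typeof(s)) isa IsMutable
```

Adding a third family later (say, validated vs. unvalidated) means adding one more trait function, without touching any existing type definition or reshaping a hierarchy.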

Also, instead of forwarding everything, you can use a method to access the Person from whatever type you have that includes a Person, and then call the Person-specific methods directly on it, which to me is clearer.

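struct Person
    name::String
    gender::String
    birthday::String
end
person(p::Person) = p            # a Person is trivially its own person

struct Citizen
    person::Person               # composition: a Citizen holds a Person
    nationality::String
end
person(c::Citizen) = c.person    # the single accessor each wrapping type needs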

Then you don’t have to add methods for every field, just one: instead of name(c) you’d write person(c).name, or name(person(c)) if a name method for Person exists.
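
A runnable sketch of the accessor pattern above, with the `Person`/`Citizen` definitions repeated so it stands alone. `name`, `greet` and `greet_any` are hypothetical helpers added for illustration; only `person` needs a method per wrapping type.

```julia
struct Person
    name::String
    gender::String
    birthday::String
end
person(p::Person) = p

struct Citizen
    person::Person
    nationality::String
end
person(c::Citizen) = c.person

# Person-specific functions are written once, against Person only
# (`name` and `greet` are hypothetical helpers).
name(p::Person) = p.name
greet(p::Person) = "Hello, " * name(p)

# Generic code that accepts anything with a `person` accessor.
greet_any(x) = greet(person(x))

c = Citizen(Person("Ada", "female", "1815-12-10"), "British")
```

Both `name(person(c))` and `greet_any(c)` work without any per-field forwarding methods on `Citizen`.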

Sorry for the absence… I will comment on individual posts below:

Sorry, but I don’t understand why it should be related to the composition example. Lazy.@forward requires encapsulation (i.e. approaches 1 and 2 in the first post), not composition (approach 3). Forgive me if encapsulation and composition are not the right words (again, I’m not a CS person…); I use them just to refer to the approaches discussed in the first post.

Completely agreed. Besides, the boilerplate code may add lots of entries to the dispatch table without actually adding any new functionality. Moreover, all the boilerplate code, even if automatically generated, must be compiled, resulting in further wasted time.
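
For reference, this is roughly the kind of boilerplate in question, written here with a plain `@eval` loop rather than a forwarding macro. It is a sketch: `Person`, `Citizen`, `name` and `age` are hypothetical, with `Person` assumed to have `name` and `age` fields. A type like DataFrame would need such a loop over hundreds of functions.

```julia
struct Person
    name::String
    age::Int
end
name(p::Person) = p.name
age(p::Person) = p.age

struct Citizen
    person::Person
    nationality::String
end

# Boilerplate forwarding: one generated method per forwarded function.
# Each adds a dispatch-table entry with no new behaviour, and each still
# has to be compiled the first time it is called.
for f in (:name, :age)
    @eval $f(c::Citizen) = $f(c.person)
end
```

After the loop, `methods(name)` lists two methods, one of which exists only to pass the call through.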

Thank you @Raf for pointing out Mixers.jl: it makes it quick to add further fields to structures without listing the previous ones again. This goes exactly in the direction of approach 3 in the first post.

You can’t. If you want to add a twist to (e.g.) Vector, you simply can’t, because you didn’t write the interface.

Thanks @marius311 for providing a list of rules. Following your example, here is my proposal, which is actually very similar to yours (except for the AbstractCitizen type):

  1. define an abstract type as supertype for each struct, e.g. Person <: AbstractPerson, Citizen <: AbstractCitizen, etc. In OOP terminology, the structures are the data members of a class, and the abstract types are the interfaces;
  2. define the hierarchy among structures using abstract types (not concrete types), e.g. AbstractCitizen <: AbstractPerson. This lets us define a hierarchy in the same way as we do in OOP;
  3. define sub-structures by repeating the fields of the parent structure, in the same order and with the same types. This step can be automated with Mixers.jl by @Raf;
  4. all methods working on a structure must accept the corresponding abstract type, not the concrete one. In OOP terminology, these are the methods associated with a class;
  5. the only exceptions to rule 4 are the constructors and those methods returning a specific structure (either Person or Citizen in the example above). These methods must accept concrete types, not abstract ones;
  6. define a super method like super(p::Citizen) = Person(p.name, p.age) (the name is not mandatory…), to be used in all methods of the derived structures to access the corresponding method acting on the parent structure.
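
A minimal sketch of rules 1 to 6 applied to the Person/Citizen example. It assumes `Person` has the fields `name` and `age` (as in the `super` example of rule 6) and that the hierarchy is oriented so that a citizen is a person (`AbstractCitizen <: AbstractPerson`). Field repetition (rule 3) is written out by hand instead of being generated with Mixers.jl; rule 5 is covered by the default constructors, which take concrete field values. `describe` is a hypothetical method added for illustration.

```julia
# Rule 1: one abstract type per struct. Rule 2: hierarchy among abstract types.
abstract type AbstractPerson end
abstract type AbstractCitizen <: AbstractPerson end

# Rule 3: the derived struct repeats the parent's fields, same order and types.
struct Person <: AbstractPerson
    name::String
    age::Int
end

struct Citizen <: AbstractCitizen
    name::String
    age::Int
    nationality::String
end

# Rule 4: methods accept the abstract type, so Citizen gets them for free.
name(p::AbstractPerson) = p.name
age(p::AbstractPerson) = p.age
describe(p::AbstractPerson) = "$(name(p)), $(age(p))"

# Rule 6: `super` builds the parent struct from a derived one.
super(c::Citizen) = Person(c.name, c.age)

# A Citizen-specific override that reuses the parent's behaviour via `super`.
describe(c::Citizen) = describe(super(c)) * " ($(c.nationality))"
```

Only `describe` is redefined for `Citizen`; `name` and `age` work unchanged because they dispatch on `AbstractPerson`.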

I would like to emphasize that the goal here is not to reproduce an OOP practice in Julia; I know that is not possible and would likely lead to trouble. Here we only want to simply and efficiently extend/customize/specialize (use the word you prefer…) the behaviour of a Julia object whose methods have been written by someone else.

In my opinion this simple list of rules makes it easy to extend the functionality of any object and use it (quoting @DNF) exactly like the concrete object, with a twist. Moreover, even if the object is not meant to be extended, following the rules will not harm its development or performance, and I can’t think of any practical reason not to follow them. Any counterexample here is more than welcome.

Well, if we agree that the above rules should be followed, the only feasible way is to ask the object’s developer to adapt their implementation according to the rules… :wink:

Two final comments:

  • the very fact that we are discussing the best way to wrap an object in another to slightly customize its behaviour, without adding lots of boilerplate code (either hand-written or generated with macros), implies there is not yet a standardized, commonly accepted way to do it. Or, at least, I am not aware of one (again, any advice here is very welcome);

  • many of us, certainly including myself, are trying to work out whether we can use Julia as the main tool for our daily work. Hence, we need to know whether Julia has some intrinsic limitation. The impossibility of customizing the behaviour of an object is (in my opinion) a strong restriction.

I’m struggling to understand why it would somehow be impossible to customize the behavior of an object.
I’ve been using only Julia (even for my low-level work) for over 3 years now, and haven’t had any problems customizing the behavior of anything. I’m partial to using traits these days, because they end up very efficient, let me deal with many orthogonal traits, and let me add new traits later on (and new types using those traits) instead of having to change some hierarchy of abstract types.

Well, you know where this whole discussion comes from :wink:. Extending the behavior of, say, a DataFrame object is not trivial at all…

It would be nice if you could provide an example that solves the Person/Citizen problem from the first post using traits. Please :pray: avoid encapsulation, since in a real example (e.g. a DataFrame) this means re-defining hundreds of methods…

To be honest, if you are trying to solve the problem linked above (define a DataFrame with a twist), it seems that one issue is the lack of a “tabular data interface”. It’s hard to believe that the hundreds of methods defined for DataFrames are all necessary: I imagine it’d be possible to write most of them as a function of a much reduced “table interface” (see what e.g. Query does, getting everything to work for every iterable of named tuples).

A very good solution in my view would be finalizing this tabular interface and rewriting things in terms of it (see for example here for an attempt at porting StatsModels to this design).
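
A minimal sketch of what writing generic code in terms of a small tabular interface can look like, using only Base Julia. The interface assumed here (any iterable of `NamedTuple` rows) and the names `column`, `filter_rows` and `ColumnTable` are illustrative; this is not the API of Query.jl or of any tables package.

```julia
# Assumption for this sketch: a "table" is any iterable whose rows are
# NamedTuples, loosely in the spirit of what Query.jl accepts.

# Generic operations written only against iteration and property access.
column(tbl, col::Symbol) = [getproperty(r, col) for r in tbl]
filter_rows(pred, tbl) = [r for r in tbl if pred(r)]

# A plain vector of NamedTuples is already a table under this definition.
rows = [(name = "Ada", age = 36), (name = "Alan", age = 41)]

# A hypothetical column-oriented type joins in by implementing iterate and length.
struct ColumnTable
    names::Vector{String}
    ages::Vector{Int}
end
Base.length(t::ColumnTable) = length(t.names)
Base.iterate(t::ColumnTable, i::Int = 1) =
    i > length(t) ? nothing : ((name = t.names[i], age = t.ages[i]), i + 1)

ct = ColumnTable(["Ada", "Alan"], [36, 41])
```

`column` and `filter_rows` never mention a concrete table type, so both `rows` and `ct` work with them unchanged.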

This should be fairly easy, as Julia does not support encapsulation as it is usually understood (= restricting access to some slots) :wink:

OK, I have to apologize for my ignorance of CS terminology; I am an astronomer struggling with a CS problem…

Could you please provide a name for this:

struct Person
    name::String
    gender::String
    birthday::String
end
person(p::Person) = p

struct Citizen
    person::Person
    nationality::String
end

Whatever its name is, I kindly asked you to avoid exactly that. Thanks!

I don’t know what you mean by a “twist”. But if you want composition and automatic forwarding of all functions that would emulate inheritance, that is indeed not possible. Many people in OO argue that composition is preferable to inheritance, but if you are unwilling to accept that, then you may find Julia difficult.

As @ScottPJones said, traits buy you almost all the benefits of inheritance, without the usually recognized problems. But you still have to implement some functions for the interface, and extend/change your implementation if the interface changes. There is still no automatic forwarding.
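
A sketch of what a trait-based version of the Person/Citizen example might look like (hypothetical names throughout; `Person` is assumed to have `name` and `age` fields). The trait lets unrelated types opt into person-like behaviour without a common supertype, but each type still has to implement the accessor by hand: there is no automatic forwarding.

```julia
struct Person
    name::String
    age::Int
end

struct Citizen
    person::Person
    nationality::String
end

# Trait family: does a type behave like a person?
abstract type PersonTrait end
struct IsPerson <: PersonTrait end
struct NotPerson <: PersonTrait end

personlike(::Type) = NotPerson()            # default for every type
personlike(::Type{Person}) = IsPerson()
personlike(::Type{Citizen}) = IsPerson()

# The per-type part that still has to be written by hand: one accessor each.
asperson(p::Person) = p
asperson(c::Citizen) = c.person

# Generic functions dispatch on the trait, then go through the accessor.
name(x) = name(personlike(typeof(x)), x)
name(::IsPerson, x) = asperson(x).name
name(::NotPerson, x) = throw(ArgumentError("$(typeof(x)) is not person-like"))
```

A new type joins by adding one `personlike` method and one `asperson` method; every function written against the trait then works for it.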

So, to be clear, the goal would be to have few methods act on the concrete type Person.

Rather, the writers of the “Person” package should have Person be an abstract type and define a set of methods that only operate on that abstract type. This list of methods should be both short and well-documented. That way, when you define a concrete type, it should be very easy to get your new concrete type “up to speed” so that it fully implements the abstract type.

DataFrames does this with the AbstractDataFrame abstract type, but as you noted, many methods in DataFrames operate on the concrete type DataFrame. To the maintainers of DataFrames: would a good goal for PRs be to track down those methods and see whether they could be better implemented on AbstractDataFrame instead of DataFrame?
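
A sketch of the design described above, with hypothetical names: the package side defines an abstract type with a two-method core (`name`, `birthyear`) plus generic methods built on it, and a user-defined type becomes fully functional by implementing only that core.

```julia
# --- Package side (hypothetical): abstract type with a short, documented core. ---
abstract type AbstractPerson end
function name end        # required: name(p)::String
function birthyear end   # required: birthyear(p)::Int

# Everything else is written against the abstract type and the core only.
age(p::AbstractPerson, year::Integer) = year - birthyear(p)
describe(p::AbstractPerson, year::Integer) = "$(name(p)), aged $(age(p, year))"

# --- User side: a new concrete type implements just the two core methods. ---
struct Employee <: AbstractPerson
    name::String
    birthyear::Int
    employer::String
end
name(e::Employee) = e.name
birthyear(e::Employee) = e.birthyear
```

`Employee` gets `age` and `describe` without any forwarding, because neither is tied to a concrete type.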

Well, as that article notes later on, languages like Julia, with lexical closures, are generally more concerned with the other part to encapsulation (i.e. data abstraction) which is really orthogonal to the restricted access issue.

I would prefer Julia to have the capability of declaring some things (functions, types, or fields) as public (part of an API). I’ve found that when dealing with code over long periods of time (decades!), with lots of people accessing it, modifying it, and basing things on top of it (hundreds of thousands), this is a very important way of helping to produce robust code.

Yes, it is a possible solution, but it would require some effort from the DataFrames maintainers.
A simpler and quicker approach (in my very humble opinion…) would be to ask the maintainers to follow some rules like the ones we outlined above…

Still, the simplified tabular data interface appears very interesting. Of course the best would be to follow the rules for easy inheritance, and to implement the simplified interface. :wink:

I do not have enough knowledge to disagree, and as you can see I’m trusting Julia (and its developers) a lot.

I simply would like to:

  • extend DataFrame to obtain a new object called Foo;
  • use Foo in exactly the same way I would use DataFrame object;
  • redefine ONLY those methods for which the Foo behaviour differs from the DataFrame one, which in the actual case are 3 or 4 (while the DataFrame object is accepted by hundreds of methods);
  • do it by myself without asking the DataFrames maintainers to change something in their package.

I actually don’t care what this operation is called (inheritance, composition, encapsulation, specialization…); I would only like to know if/how it is possible.

That would really be a good start. Is there any DataFrames maintainer here?

This is a rule I’m going to take on from now on: methods in shared packages should always dispatch on abstract types. Never on concrete types.

It really wasn’t entirely clear to me before this. It should be in the docs in bold all caps somewhere. Dispatching on concrete types locks in implementation details and forces lots of useless replication on anyone trying to extend the types.

Wouldn’t a quick (but manual) search and replace to swap out DataFrame for an abstract type resolve this in DataFrames?
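
A sketch of why dispatching on the abstract type matters for the "DataFrame with a twist" case, using a toy table type rather than DataFrames itself (`AbstractTable`, `Table`, `columns`, `ncols`, `colnames`, `summary_line` and `TaggedTable` are all hypothetical). Because the package methods only require `AbstractTable` plus the `columns` core, the wrapper implements that one method and overrides only the behaviour it wants to change.

```julia
abstract type AbstractTable end

# Hypothetical package type and its small core interface.
struct Table <: AbstractTable
    columns::Dict{Symbol,Vector{Int}}
end
columns(t::Table) = t.columns

# Package methods written against the abstract type and the core only.
ncols(t::AbstractTable) = length(columns(t))
colnames(t::AbstractTable) = sort!(collect(keys(columns(t))))
summary_line(t::AbstractTable) = "table with $(ncols(t)) columns"

# User type "with a twist": wraps a Table, implements the core, overrides one method.
struct TaggedTable <: AbstractTable
    table::Table
    tag::String
end
columns(t::TaggedTable) = columns(t.table)
summary_line(t::TaggedTable) = "[$(t.tag)] " * summary_line(t.table)
```

Had `ncols` been defined as `ncols(t::Table)`, calling it on a `TaggedTable` would throw a `MethodError`, and the wrapper would have to forward it by hand.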

I think this isn’t a hard and fast rule. Both inheritance and composition are available in OO languages; the trick is to apply the right solution to the domain model.

There are some good insights in “Composition vs. Inheritance: How to Choose?” (Thoughtworks).

Yup, or even just let anything go.

Maybe. But I am not sure whether it could have consequences, or hide subtle errors. I (want to) believe there is a reason to define getindex(df::DataFrame, col_ind::ColumnIndex) (as they did) instead of getindex(df::AbstractDataFrame, col_ind::ColumnIndex).

Moreover, SubDataFrame is defined as follows:

struct SubDataFrame{...} <: AbstractDataFrame
    parent::DataFrame
    rows::T 
end

i.e. it doesn’t follow what we called rule 3.

Maybe there is a specific design motivation that I don’t understand…

Certainly. But the great thing about this is that you could become a contributor just by making a PR, helping develop an API you care about. Pull requests that clean up interfaces are usually well-received, and you get lots of feedback.