Accessing type internal fields in package interfaces

Hi all,
we just came from a small course where several people taught several Julia packages to students over the last week. One thing we noticed was that some packages naturally assume a workflow in which you constantly access the fields of a type with getproperty, using dump to see what is inside objects and often reaching several levels deep (obj.subobj.subobj2.field), whereas other packages clearly treat the contents of types as private and expect users to get information from types via getter functions.

Of course, this discrepancy makes it hard to communicate to students and new users what the preferred workflow is (if there is any consistency on this across the package ecosystem at all).

I’d assume that programmer background matters: people coming from Python would naturally think that everything you get after typing mytype. and hitting tab is the interface for the type, whereas people with a C++ background would tend to think of fields as private to the type.

Are there any general thoughts and recommendations here? What do people think? Is it preferable to try to have a common culture here in the package ecosystem (I’d tend to think “yes”)?


I would say that it works the same as (unexported) functions. Basically, if it is documented to be public it is public, otherwise, it is private.


One problem is when code is undocumented (which in my experience is 90% of it)…


So, as part of said course where the confusion happened: I think packages should generally try to have their fields private, such that the user does not interface with them by doing obj.subobj.subobj2.field.

The reason public fields should be avoided is that they make the memory layout of an object part of the API, so the layout cannot change without breaking users’ code. This is problematic because what a package is able to do with a struct, and how efficiently, is tightly coupled to its memory layout. By constraining the memory layout, even the package’s internal functionality becomes severely constrained.

Furthermore, it’s not actually that useful to have public fields. Most fields of most structs are an implementation detail. Where they are not, the package can provide a function to access or set the field and have those functions be public.
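To make the getter/setter alternative concrete, here is a minimal sketch. The type `Temperature` and the functions `kelvin`, `celsius` and `setcelsius!` are hypothetical names chosen for illustration. The stored field is treated as an implementation detail, so the package could later store Celsius (or anything else) and only these functions would need to change.

```julia
# Hypothetical example type: the field layout is an implementation detail.
mutable struct Temperature
    kelvin::Float64   # internal representation, not part of the public API
end

# Public getters: these are what the documentation would promise.
kelvin(t::Temperature) = t.kelvin
celsius(t::Temperature) = t.kelvin - 273.15

# Public setter: it can enforce invariants that direct field mutation would bypass.
function setcelsius!(t::Temperature, c::Real)
    k = c + 273.15
    k >= 0 || throw(DomainError(c, "temperature below absolute zero"))
    t.kelvin = k
    return t
end

t = Temperature(300.0)
setcelsius!(t, 25)
celsius(t)   # ≈ 25.0, up to floating-point rounding
```

A setter like this is also the natural place to reject invalid values, which plain `t.kelvin = ...` would silently accept.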

That being said, there are exceptional cases where you might want fields of a struct to be part of the API. For example, a function that returns a mutable struct simply as a “bag of values”. Also, the Pythonic principle of “we’re all consenting adults here”, meaning that users should be free to use a package’s internals if they understand the implication, also applies.

@mkborregaard I’ll make issues on the packages in question and hear what the package authors think. Either a bunch of getter functions can be implemented, or it can be clearly stated up front in the docs that fields of objects are expected to be manipulated directly.


Why does getproperty overloading not fix that?

It will fix that, but overloading getproperty feels like a hack, or at the very least an ugly, surprising plaster to paper over an API break, whereas simply not exposing internal behaviour as API in the first place is much cleaner and easier to understand.


I see how getproperty can fix minor changes, like name changes, and perhaps removing a redundant field. But I would think it’s harder to use it to paper over a major re-design of the internals of a data structure, particularly if it also involves setproperty.
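As a sketch of the kind of minor change `getproperty` can cover: suppose a hypothetical `Config` type once had a public field `timeout` that was later renamed to `timeout_seconds`. Overloading `Base.getproperty` keeps `c.timeout` working, and overloading `Base.propertynames` (which, as far as I know, is what the REPL consults for tab completion) keeps the old name advertised.

```julia
# Hypothetical type whose public field `timeout` was renamed to `timeout_seconds`.
struct Config
    timeout_seconds::Float64
end

# Keep the old property name working; every other name falls through to the real fields.
function Base.getproperty(c::Config, s::Symbol)
    s === :timeout && return getfield(c, :timeout_seconds)
    return getfield(c, s)
end

# Advertise both names as properties.
Base.propertynames(::Config, private::Bool = false) = (:timeout_seconds, :timeout)

c = Config(2.5)
c.timeout   # 2.5, served by the overload
```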

Good considerations I think!

FWIW, this is considered good practice by many in the python world. I.e. start by allowing access to a field (property in python) and then use the equivalent of overloading getproperty if you need more complex behavior, eg, computation when the field is accessed. But, for this to work you have to consciously make accessing the field part of the API from the outset.
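In Julia the Python pattern could look roughly like the following sketch, where the `Rect` type is hypothetical: `width` and `height` start out as ordinary, deliberately public fields, and a computed `area` property is added later by overloading `Base.getproperty` without changing how the existing fields behave.

```julia
# Hypothetical type whose fields `width` and `height` are documented as public.
struct Rect
    width::Float64
    height::Float64
end

# Added later: `r.area` is computed on access; real fields are returned unchanged.
function Base.getproperty(r::Rect, s::Symbol)
    s === :area && return getfield(r, :width) * getfield(r, :height)
    return getfield(r, s)
end

Base.propertynames(::Rect, private::Bool = false) = (:width, :height, :area)

r = Rect(2.0, 3.0)
r.width   # 2.0
r.area    # 6.0, computed rather than stored
```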

Python classes and Julia structs occupy different ecological niches as well. In Julia it’s not so common to load a bunch of functionality into fields of a big struct. Smaller structs are more common, and functions dispatching on a type can often replace field access. I do think the convention of prefixing the name of a field that should be considered private with an underscore might be useful in Julia. It’s easier than documentation. You could make underscore-prefixed fields the default; then a field without an underscore signals that you intend users to access it directly. It’s not 100% clear to me how useful this is. The prevalence of immutable structs in Julia makes this convention somewhat less useful.
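One way the underscore convention could be made visible in tooling is sketched below; the `Model` type is hypothetical. `Base.propertynames` takes a `private` flag, so a type can list underscore-prefixed fields only when `private = true`. Assuming the REPL’s tab completion after `x.` uses `propertynames` with the default flag, such fields would stop showing up there, while `m._scratch` and `getfield` still work.

```julia
# Hypothetical type: fields without a leading underscore are meant for users.
struct Model
    weights::Vector{Float64}
    _scratch::Vector{Float64}   # private by convention
end

# List underscore-prefixed fields only when private names are requested.
function Base.propertynames(m::Model, private::Bool = false)
    names = fieldnames(Model)
    private && return names
    return Tuple(n for n in names if !startswith(String(n), "_"))
end

m = Model([1.0, 2.0], zeros(2))
propertynames(m)         # (:weights,)
propertynames(m, true)   # (:weights, :_scratch)
```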


When defining a struct that has some fields of general use, I accompany the struct with functions with names that match the field names and are specialized to the struct to serve as getters. Generally, I do not want clients to be setting anything, as that may break internal assumptions.
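A minimal sketch of that style, using a hypothetical `Interval` type: the getters share the field names and are specialized to the type, the struct is immutable so clients cannot set fields, and the constructor enforces an invariant that an unchecked setter could otherwise break.

```julia
# Hypothetical immutable type with an invariant enforced at construction.
struct Interval
    lo::Float64
    hi::Float64
    function Interval(lo::Real, hi::Real)
        lo <= hi || throw(ArgumentError("lo must not exceed hi"))
        return new(lo, hi)
    end
end

# Getters named after the fields, specialized to Interval.
lo(i::Interval) = i.lo
hi(i::Interval) = i.hi

# Other functions are written against the getters, not the fields.
width(i::Interval) = hi(i) - lo(i)

width(Interval(1, 3.5))   # 2.5
```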

What does that mean in terms of a design recommendation though?

to paper over an API break

The whole point is that it doesn’t break the API.

Also

exposing internal behaviour as API

then it is by definition not internal.

The worry was that you somehow lock yourself into a certain struct layout by making getproperty part of the public API but with getproperty overloading that is not the case. It is at the same level as a normal function.


As I read it here (and in similar discussions) experienced Julia developers would keep fields private (they’re implementation details) and provide methods for setting and getting if (and only if) necessary. I think it’s fair to call that consensus.
That said, if someone well versed in Python can make a package that’s well designed along Pythonic principles, I’d prefer that over a mixed design which is half of each. So having too strict a community sense of ‘good design’ could backfire if contributors are not able to follow it.

Cost of public properties

  1. In my view, the main advantage of Julia’s multiple dispatch is that it allows everybody equal ability to define public functions of x::X, rather than privileging the owner of X. As a function author, I can write new functions that look and feel like they belong on x just as much as X’s author can.

    Properties are similar to OOP’s X().f() methods. They are in a namespace controlled by the author of X. The symmetric extensibility permitted by multiple dispatch doesn’t work nearly as well if properties are part of a public interface because they privilege a single argument and its getproperty method.

    A function author cannot define properties that look and feel native to X without piracy.

  2. It is useful when writing a package to think about the public interface separately from the implementation. Public properties reduce the delineation between interface and implementation, and may cause implementation details to leak into interfaces unnecessarily.
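To illustrate Cost 1 with a sketch (the module and function names are hypothetical): a third-party module can add a function on another package’s type that is called exactly like the owner’s own functions. Adding a new *property* instead would need a method of `Base.getproperty` for a type that module does not own, which is type piracy.

```julia
# Hypothetical package that owns the type.
module People
export Person, name

struct Person
    name::String
    age::Int
end

name(p::Person) = p.name
end

# Hypothetical third-party package that does not own Person.
module Greetings
using ..People
export greeting

# A new function on Person, owned by Greetings: no piracy involved,
# and callers use it exactly like People's own `name(p)`.
greeting(p::Person) = "Hello, " * name(p) * "!"

# By contrast, making `p.greeting` work would require defining
# `Base.getproperty(p::Person, s::Symbol)` here, i.e. a method on a
# function and a type that Greetings owns neither of (type piracy).
end

using .People, .Greetings
greeting(Person("Alice", 99))   # "Hello, Alice!"
```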

So public properties come at a significant cost.

Benefits of public properties

  1. At the definition site, using properties means authors don’t have to write out the definition of the getters:
```julia
struct Person
    name
    age
end
```

is shorter than

```julia
struct Person
    name
    age
end
name(p::Person) = p.name
age(p::Person) = p.age
```
  2. At the call site, person.name needs one fewer character than name(person).

  3. At the call site, person.name |> length puts the property on the right rather than the left of name(person) |> length, so chains are read linearly in left-to-right order.

  4. At the call site, if multiple packages in scope provide functions name for their objects, they need to be used in package-namespaced form Persons.name(person) or imported under a different name like using Persons: name as pname, while person.name does not.

  5. At the call site, dot syntax can hint at certain information about a property access. Namely, person.name indicates it does not require expensive computation, does not raise an error, and does not change its value unless otherwise mutated.

Analysis

Benefits 1,2,3 are issues of surface syntax that can be solved with a macro or operator, without paying Cost 1 of public properties.

For Benefit 1,

```julia
@getters struct Person
    name
    @nogetter age
end
```

could define name(p::Person) = p.name without the user having to write it out.
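A minimal sketch of such a macro, under stated assumptions: `@getters` and `@nogetter` are hypothetical (not an existing package), only plain `name` or `name::Type` field lines get getters, and other lines such as inner constructors pass through untouched. Because macros expand outside-in, `@getters` sees the unexpanded `@nogetter` marker and strips it itself, so `@nogetter` never needs a definition of its own.

```julia
module Getters

export @getters

# Field name from `name` or `name::Type`; `nothing` for any other line.
_fieldname(ex::Symbol) = ex
_fieldname(ex::Expr) = (ex.head === :(::) && ex.args[1] isa Symbol) ? ex.args[1] : nothing
_fieldname(ex) = nothing

# Bare type name from `T`, `T{S}`, `T <: Super` or `T{S} <: Super`.
function _typename(ex)
    while ex isa Expr
        ex = ex.args[1]
    end
    return ex
end

_isnogetter(ex) = ex isa Expr && ex.head === :macrocall && ex.args[1] === Symbol("@nogetter")

macro getters(structdef)
    (structdef isa Expr && structdef.head === :struct) ||
        error("@getters must be applied to a struct definition")
    T = _typename(structdef.args[2])
    body = structdef.args[3]
    defs = Expr[]
    newbody = Any[]
    for line in body.args
        if _isnogetter(line)
            push!(newbody, line.args[end])   # keep the field, skip its getter
            continue
        end
        push!(newbody, line)
        f = _fieldname(line)
        f === nothing && continue
        # Define `f(x::T) = getfield(x, :f)` in the caller's module.
        push!(defs, :($(esc(f))(x::$(esc(T))) = getfield(x, $(QuoteNode(f)))))
    end
    body.args = newbody
    return Expr(:block, esc(structdef), defs..., nothing)
end

end # module

using .Getters

@getters struct Person
    name::String
    @nogetter age::Int
end

name(Person("Alice", 99))   # "Alice"; no `age` function is defined
```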

For Benefit 2 and Benefit 3, various packages such as Chain.jl have been exploring chaining interfaces.

For example,

```julia
using Chain

struct Person
    name
    age
end
name(p::Person) = p.name

person = Person("Alice", 99);
@chain person name length   # 5
```

This chain is read linearly in left-to-right order (Benefit 3).
The chain doesn’t require parentheses (Benefit 2) though it does require @chain. The function-on-the-right invocation can also be written person|>name in the same number of characters as name(person) and one more than person.name.
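For comparison, a sketch using only Base Julia, with a `Person` type and `name` getter mirroring the example above. Note that `∘` composes right to left, so `length ∘ name` means “apply `name`, then `length`”.

```julia
struct Person
    name::String
    age::Int
end
name(p::Person) = p.name

person = Person("Alice", 99)

person |> name |> length   # 5, read left to right
(length ∘ name)(person)    # 5, the same pipeline as a composed function
```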

A bigger change would be making a property p only accessible via function call p(x), and x.p just a shortcut equivalent to @chain x p. If desired, x.p might automatically disambiguate to parentmodule(x).p(x) if there is another p in the calling scope. This would require experimentation to see where it causes breakage.
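Something close to this can be approximated per type today. The sketch below is hypothetical and narrower than the proposal: instead of resolving `parentmodule(x).p` automatically, it forwards dot syntax through an explicit table, so the functions are the real interface and `p.name` is only sugar for `name(p)`. Fields not listed, such as `age`, are not reachable by dot syntax.

```julia
module Persons
export Person, name, initial

struct Person
    name::String
    age::Int
end

# The functions are the interface. They use getfield, so they never
# recurse through the getproperty overload below.
name(p::Person) = getfield(p, :name)
initial(p::Person) = first(name(p))

# Hypothetical table of functions that dot syntax forwards to.
const ACCESSORS = (name = name, initial = initial)

# `p.f` is sugar for `f(p)`; any other name (e.g. `p.age`) throws.
Base.getproperty(p::Person, s::Symbol) = getfield(ACCESSORS, s)(p)
Base.propertynames(::Person, private::Bool = false) = keys(ACCESSORS)
end

using .Persons
p = Person("Alice", 99)
p.initial   # 'A', same as initial(p)
```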

Benefit 4 (name collisions) remains but I believe it is small because the function import can be handled once at the top of the caller’s file.

Benefit 5 (property hinting) seems hardest to attain without dot-access properties, because hinting at those properties is essentially a form of documentation rather than a fact about the code. This could be done with prose documentation, or a form of code specification adjacent to each function definition.

Conclusions

Most of the advantages of public properties can be obtained in a functional API without the downsides. Some changes will make that easier:

  • Make it easier to define getters and setters. Promote the use of @getters where appropriate and consider including it in Base.

  • Make it easier to chain calls. Promote |> and Chain.jl.

  • For radicals duly cautioned, consider making a property p only accessible via function call, and x.p equivalent to @chain x p.

  • Examine structured ways of specifying function properties such as “constant value”, “low-cost access”, “non-erroring” and more.


I agree with these conclusions. Overriding getproperty during a refactor never feels clean to me; it’s neither extensible nor generic. The biggest improvement in my code came when I switched to always using getter functions, even internally. Refactors are so much cleaner and easier to think about.

That said, a big issue is the loss of discoverability when you use getter methods. Better tooling for function hinting would help: we could have a key command for wrapping an object in a function call after we have typed it in the REPL. If @getters were used to define getter functions, they could even be placed at the top of the list during auto-completion.
