Making `setindex` public, how to document it, rethinking the Collections docs

@aplavin is trying to make Base.setindex a public API of Julia, which seems like a good idea, given that many packages use it already and it seems like an OK interface. The question is, how to document it. The function is a collection interface, many (immutable) collections in the ecosystem implement it already.

The signature is basically setindex(collection, new_value, index_of_the_value_to_be_replaced). Given that it takes an index into collection, it seems like it should go into the Indexable Collections section of the “Collections and Data Structures” page. However it also doesn’t fit there because the collections listed as fully implementing the “indexable collection” interface don’t implement setindex.

Thus I believe we may need to restructure the docs page somewhat: Collections and Data Structures ¡ The Julia Language

This is @aplavin’s PR:

This is a related issue on Julia’s Github:

Pinging people from that issue: @CameronBieganek @Raf @ToucheSir @Sukera @jakobjpeters, hoping you can provide input on how to restructure the docs page.

3 Likes

Perhaps a good stopgap would be making a new section, e.g., “Immutable indexable collections”, and including setindex as the only function there.

In the longer term, I’m thinking maybe we should make each function an independent interface, so there would be no such thing as an “indexable collection interface”?

2 Likes

That makes sense to me, given that it corresponds well to the current organization.

The “Collections and Data Structures” page doesn’t immediately indicate that it contains several interfaces, and is also not exclusively for those interfaces. It feels weird and more difficult to find to me that the “Interfaces” page that doesn’t mention some of these, e.g. the set and deque interfaces. What about moving the interface-related content to the “Interfaces” page and referencing it that page immediately under the “Collections and Data Structures” header (instead of under the “Iteration” sub-header)? Then, setindex would fit into “Indexible Collections” without being conflated with the interface.

1 Like

I think this is the only sustainable path. Almost any function (except for probably size, axes, and getindex) in the array “interface” should be optional. For the example of setindex in the “immutable collection interface”, it would not be easily applied to UnitRange (it would require a significant change to the data type).

For the interface, it’s good to have a list of common functions so that it’s easy to locate operations that may be sensibly/usefully implemented for a type. Each of those functions (or small collection of functions, for example IndexStyle and functions whose interface depend on it) would indeed have its own specific interface. But this doesn’t seem very different from the status quo.

I disagree with this. Making every function it’s own interface is ugly, unergonomic, and it doesn’t work well with the current system of abstract inheritance and multiple dispatch. Consider the following example:

function foo!(x)
    x[2] = length(x) + x[1]
    x
end

In order to properly dispatch this function, we would need the following type annotation for the argument x. (For the sake of argument, I’m assuming every function has its own interface, including size, axes, and getindex.)

foo!(x::Intersection{HasLength, HasGetindex, HasSetindex!})

This is a bit ugly and it doesn’t scale well. But worst of all, it’s not even possible in Julia (yet), because we don’t have an Intersection type.

In my opinion, the best long-term solution for interfaces in Julia is to add abstract multiple inheritance (issue #5), along with an Intersection type. Multiple inheritance of interfaces allows a lot of expressivity and flexibility. With this arrangement, each interface has a short list of required methods, and there are no optional methods. (There may be default implementations of some methods, but that is different from how “optional” methods work in Julia. Optional methods in Julia result in errors for types that do not implement those methods.)

Rust, Java, and Haskell all have multiple inheritance of interfaces (called traits, interfaces, and type classes, respectively). Consider the following DAG of type classes in Haskell:

I think being able to avoid “optional” methods in interfaces is one of the nicest aspects of having multiple inheritance for interfaces.

4 Likes

I guess you’re not aware of typeintersect? Is that what you mean?

A flaw in your reasoning, though, is that it’s not in general possible to compute a precise type intersection in a reasonable amount of time. AFAIK.

No, what I mean is that Intersection{A, B} is a declared type, just like Union{A, B}. There’s no need to calculate the intersection—the user directly declares what intersection they are interested in.

Intersection could also be called Meet. (See this page from @Sukera’s docs for RequiredInterfaces.jl for related discussion), but I chose Intersection to be consistent with Union; see the following table:

Lattice Theory Julia
Join Union
Meet Intersection
4 Likes

Well, there’s probably some calculation necessary. What matters is that we can dispatch on Intersection{A, B}, and that might require less calculation than typeintersect. I’m currently working on a package for dispatching on interfaces, with multiple inheritance of the interfaces, but I haven’t implemented Intersection yet. In my package, Intersection would actually be spelled like this:

@imethod foo(x: A + B) = 1

which is similar to Rust syntax. (In this example, A and B are interfaces, not types.)

It might get tricky though, since the following need to be equivalent:

@imethod foo(x: A + B) = 1
@imethod foo(x: B + A) = 2  # Overwrites the previous definition.

I’ve thought about implementation approaches in the past, and I think I might have had an approach in mind, but I did not take notes. Oops. :man_facepalming:

Unfortunately I haven’t had much time lately to work on that package.

This would be really nice, but, TBH:

  1. the type system is as core of a language feature as they come
  2. there are still known bugs in the type system, e.g., Subtyping non-transitivity involving varargs ¡ Issue #39099 ¡ JuliaLang/julia ¡ GitHub
  3. adding Intersection as a new subtype of Type would increase implementation complexity, so it’d surely come with new bugs

So I don’t think getting Intersection <: Type any time soon is realistic.

1 Like

I agree it’s unlikely to happen anytime soon, but it would be nice if it happened someday. At any rate, we could potentially have abstract multiple inheritance first, and then add Intersection later. Without Intersection but with abstract multiple inheritance we could finally dispatch functions like this:

foo(x::Iterator, y::Table, z::AbstractArray) = ...
1 Like

Abstract multiple inheritance will not make the type system far more confusing? With it, even a method for which all functions take a single argument may become ambiguous. An example: if the passed object is of a struct C <: A, B ..., and foo(x::A) = ... and foo(x::B) = ... exist, but no foo(x::C) = ..., then the method dispatch becomes ambiguous.

2 Likes

Yes, your example would throw a method ambiguity error. But I think that when interfaces and generic functions are well designed, those sorts of ambiguities shouldn’t be too common. Let’s look at a more extended example.

Suppose PkgA defines the following abstract types:

module PkgA

abstract type A end
abstract type B <: A end
abstract type C <: A end

export A, B, C

end

Now suppose PkgB defines the function foo. One design principle that I promote for generic functions is that there should only be one docstring for each arity of a generic function. In other words, each arity of a function has exactly one defined behavior. So PkgB might define foo as follows:

module PkgB

"""
    foo(x::A)

Take an `A` and do something with it.
"""
foo(x::A) = # ...
foo(x::B) = # ...
foo(x::C) = # ...

export foo

end

The foo(x::B) and foo(x::C) methods are specializations of foo that still have the same generic behavior as foo(x::A).

Now suppose PkgC defines a concrete type D. First of all, in the presence of multiple inheritance, I think concrete types are more likely to inherit from abstract types that are not directly related to each other, similar to this made up example:

struct Foo <: Graph, Eq, Serializable end

So, the type D in PkgC is more likely to look like this,

module PkgC

using PkgA

struct D <: B, Z end

end

than to look like this,

module PkgC

using PkgA

struct D <: B, C end

end

And thus, for the type struct D <: B, Z end, there won’t be any method ambiguity when you call foo(D()).

If the type D is defined as struct D <: B, C end, then yes, foo(D()) would result in an ambiguity error. This could indicate that objects of type Intersection{B, C} are common, and it would make sense for the developer of PkgB to add a method foo(x::Intersection{B, C}). Alternatively, it might make sense for the developer of PkgC to add a method foo(x::D).

This is similar to the current situation with method ambiguities. If a package author defines the following two methods for bar, then they should really also define a third method to resolve the method ambiguity:

bar(x::Integer, y::Int) = 1
bar(x::Int, y::Integer) = 2

At any rate, this is all speculation at this point. We won’t really know how severe of a problem method ambiguities will be until we have a working prototype to experiment with in the ecosystem.

2 Likes

I never thought that the two parents of a type A would be declared in the same package.

I am thinking exactly about the case of two distinct packages in the ecosystem that both define an abstract type that others are expected to inherit, and both packages also define an interface that is enough for a third package to work over. For example, two hypothetical packages AlwaysOrderedDatastructures and Queuable that define overlapping but not identical interfaces, and a third package Consumables that define a methods consume_one_by_one for each of the interfaces, but both interfaces are implemented by your package MessageList. Before, you could not know about the Consumables package, but your users could apply consume_one_by_one over your objects “for free”. Now, I have the feeling many will unknowingly solve the problem by type piracy (defining more specific method of a function you do now own for a type you also do not own) than doing the right thing (creating their own wrapper object, inheriting the same interfaces, and creating for it a more specialized method of consume_one_by_one).

More than that, in fact. Now I am in doubt about how this will be implemented. There will be a way to say, for example, “now that I created this new concrete type that inherits those two interfaces, I will just create a more specialized method to avoid ambiguity, and my specialized method just calls an already existing method of the same function as my object was just of the first supertype (or just of the second)”, or it will be impossible to force a function (that you own or not) to treat a concrete type object as an specific abstract type between its supertypes? If this is the case, you would need to re-implement the whole logic in your specialized method, or create an auxiliary type that only inherits one of the interfaces, to be able to provide a specialized method.

@nsajko Sorry, we are rather off topic now, but I think it’s an interesting discussion, if you don’t mind.

@Henrique_Becker Thanks for the example. Let me try to flesh it out a bit more.

Since the interfaces overlap but are not identical, there must be at least one generic function in common between the interfaces. Another abstract-type/interface principle that has not been mentioned in this thread (I believe it is mentioned in the docs page from Sukera that I linked to) is that every abstract type should correspond to an interface. (And an interface is essentially just a list of generic functions, each of which has a well defined behavior.) So, the AlwaysOrderedDataStructure and Queuable abstract types should inherit from a common abstract type. The overlapping parts of the AlwaysOrderedDataStructure and Queuable interfaces is the interface of the abstract type that they inherit from.

Actually, let’s suppose that they inherit from two abstract types in Base. Perhaps they are defined like this (with some hypothetical syntax for defining interfaces):

abstract type ImmutableCollection end

@interface ImmutableCollection begin
    length
    isempty
end

abstract type Iterator end

@interface Iterator begin
    iterate
end

So, we can imagine that the two packages define their abstract types (interfaces) like this:

module AlwaysOrderedDataStructures

abstract type AlwaysOrderedDataStructure <: ImmutableCollection, Iterator end

@interface AlwaysOrderedDataStructure begin
    f1
    f2
end

export # ...

end
module Queuables

abstract type Queuable <: ImmutableCollection, Iterator end

@interface Queuable begin
    g1
    g2
end

export # ...

end

Now suppose the Consumables package defines consume_one_by_one like this:

module Consumables

using AlwaysOrderedDataStructures, Queuables

"""
    consume_one_by_one(x::Intersection{ImmutableCollection, Iterator})

Consume `x` one element at a time.
"""
consume_one_by_one(x::Intersection{ImmutableCollection, Iterator}) = # ...
consume_one_by_one(x::AlwaysOrderedDataStructure) = # ...
consume_one_by_one(x::Queuable) = # ...

end

(Note that the AlwaysOrderedDataStructure and Queuable are specializations—they should have the same behavior as the Intersection{ImmutableCollection, Iterator} method.)

Now if a user tries consume_one_by_one(MessageList()), they will get an ambiguity error, as you said. However, since the developer of the Consumables package has provided methods for both AlwaysOrderedDataStructure and Queuable, it would be reasonable for them (the Consumables package developer) to consider implementing an additional method like this:

consume_one_by_one(x::Intersection{AlwaysOrderedDataStructure, Queuable}) = # ...

Of course this could get thorny if there are lots of abstract types that consume_one_by_one dispatches on (for now assume those methods are defined in the Consumables package), but maybe there aren’t that many intersection types that actually make sense. Or maybe there aren’t that many abstract types that consume_one_by_one needs to specialize on.

Of course things might get bad if there is yet another third-party package called Streams that defines consume_one_by_one(x::Stream), and the MessageList type also implements the Stream interface. I’ll have to think more about that…

I haven’t yet thought through the example where the dependency orders are reversed, i.e. where the AlwaysOrderedDataStructures and Queuables packages depend on the Consumables package.

1 Like