Making `setindex` public, how to document it, rethinking the Collections docs

nsajko · July 24, 2024, 2:15pm

@aplavin is trying to make Base.setindex a public API of Julia, which seems like a good idea, given that many packages use it already and it seems like an OK interface. The question is, how to document it. The function is a collection interface, many (immutable) collections in the ecosystem implement it already.

The signature is basically setindex(collection, new_value, index_of_the_value_to_be_replaced). Given that it takes an index into collection, it seems like it should go into the Indexable Collections section of the “Collections and Data Structures” page. However it also doesn’t fit there because the collections listed as fully implementing the “indexable collection” interface don’t implement setindex.

Thus I believe we may need to restructure the docs page somewhat: Collections and Data Structures · The Julia Language

This is @aplavin’s PR:

This is a related issue on Julia’s Github:

github.com/JuliaLang/julia

Taking collection interfaces seriously

opened 10:50PM - 02 Jan 24 UTC

CameronBieganek

domain:docs domain:collections

This will probably get closed as a duplicate, but I think it is high time that w…e provide actual interface definitions for the various collection categories listed on the ["Collections and Data Structures"](https://docs.julialang.org/en/v1/base/collections/) page of the manual. That page hints at interfaces for the various categories, but does not provide a clear guide to which methods are required and which methods are optional in order to implement the various interfaces. (Plus, that page includes docstrings for a bunch of functions that most likely do *not* need to be implemented by new types.) Here is a list of collection interfaces from that page that need better documentation on the interface requirements: - [ ] General Collections - [ ] Indexable Collections - [ ] Dictionaries (`AbstractDict`) - [ ] Set-Like Collections (`AbstractSet`) - [ ] Dequeues See also #34677 and #25941. And also #28784 for documentation of the `AbstractString` interface.

Pinging people from that issue: @CameronBieganek @Raf @ToucheSir @Sukera @jakobjpeters, hoping you can provide input on how to restructure the docs page.

nsajko · July 24, 2024, 2:22pm

Perhaps a good stopgap would be making a new section, e.g., “Immutable indexable collections”, and including setindex as the only function there.

In the longer term, I’m thinking maybe we should make each function an independent interface, so there would be no such thing as an “indexable collection interface”?

jakobjpeters · August 3, 2024, 10:18pm

That makes sense to me, given that it corresponds well to the current organization.

The “Collections and Data Structures” page doesn’t immediately indicate that it contains several interfaces, and is also not exclusively for those interfaces. It feels weird and more difficult to find to me that the “Interfaces” page that doesn’t mention some of these, e.g. the set and deque interfaces. What about moving the interface-related content to the “Interfaces” page and referencing it that page immediately under the “Collections and Data Structures” header (instead of under the “Iteration” sub-header)? Then, setindex would fit into “Indexible Collections” without being conflated with the interface.

mikmoore · August 5, 2024, 3:26pm

I think this is the only sustainable path. Almost any function (except for probably size, axes, and getindex) in the array “interface” should be optional. For the example of setindex in the “immutable collection interface”, it would not be easily applied to UnitRange (it would require a significant change to the data type).

For the interface, it’s good to have a list of common functions so that it’s easy to locate operations that may be sensibly/usefully implemented for a type. Each of those functions (or small collection of functions, for example IndexStyle and functions whose interface depend on it) would indeed have its own specific interface. But this doesn’t seem very different from the status quo.

CameronBieganek · August 12, 2024, 4:03pm

I disagree with this. Making every function it’s own interface is ugly, unergonomic, and it doesn’t work well with the current system of abstract inheritance and multiple dispatch. Consider the following example:

function foo!(x)
    x[2] = length(x) + x[1]
    x
end

In order to properly dispatch this function, we would need the following type annotation for the argument x. (For the sake of argument, I’m assuming every function has its own interface, including size, axes, and getindex.)

foo!(x::Intersection{HasLength, HasGetindex, HasSetindex!})

This is a bit ugly and it doesn’t scale well. But worst of all, it’s not even possible in Julia (yet), because we don’t have an Intersection type.

In my opinion, the best long-term solution for interfaces in Julia is to add abstract multiple inheritance (issue #5), along with an Intersection type. Multiple inheritance of interfaces allows a lot of expressivity and flexibility. With this arrangement, each interface has a short list of required methods, and there are no optional methods. (There may be default implementations of some methods, but that is different from how “optional” methods work in Julia. Optional methods in Julia result in errors for types that do not implement those methods.)

Rust, Java, and Haskell all have multiple inheritance of interfaces (called traits, interfaces, and type classes, respectively). Consider the following DAG of type classes in Haskell:

I think being able to avoid “optional” methods in interfaces is one of the nicest aspects of having multiple inheritance for interfaces.

nsajko · August 12, 2024, 4:30pm

I guess you’re not aware of typeintersect? Is that what you mean?

A flaw in your reasoning, though, is that it’s not in general possible to compute a precise type intersection in a reasonable amount of time. AFAIK.

CameronBieganek · August 12, 2024, 4:49pm

No, what I mean is that Intersection{A, B} is a declared type, just like Union{A, B}. There’s no need to calculate the intersection—the user directly declares what intersection they are interested in.

Intersection could also be called Meet. (See this page from @Sukera’s docs for RequiredInterfaces.jl for related discussion), but I chose Intersection to be consistent with Union; see the following table:

Lattice Theory	Julia
Join	Union
Meet	Intersection

CameronBieganek · August 12, 2024, 5:09pm

Well, there’s probably some calculation necessary. What matters is that we can dispatch on Intersection{A, B}, and that might require less calculation than typeintersect. I’m currently working on a package for dispatching on interfaces, with multiple inheritance of the interfaces, but I haven’t implemented Intersection yet. In my package, Intersection would actually be spelled like this:

@imethod foo(x: A + B) = 1

which is similar to Rust syntax. (In this example, A and B are interfaces, not types.)

It might get tricky though, since the following need to be equivalent:

@imethod foo(x: A + B) = 1
@imethod foo(x: B + A) = 2  # Overwrites the previous definition.

I’ve thought about implementation approaches in the past, and I think I might have had an approach in mind, but I did not take notes. Oops.

Unfortunately I haven’t had much time lately to work on that package.

nsajko · August 12, 2024, 5:25pm

This would be really nice, but, TBH:

the type system is as core of a language feature as they come
there are still known bugs in the type system, e.g., Subtyping non-transitivity involving varargs · Issue #39099 · JuliaLang/julia · GitHub
adding Intersection as a new subtype of Type would increase implementation complexity, so it’d surely come with new bugs

So I don’t think getting Intersection <: Type any time soon is realistic.

CameronBieganek · August 12, 2024, 5:32pm

I agree it’s unlikely to happen anytime soon, but it would be nice if it happened someday. At any rate, we could potentially have abstract multiple inheritance first, and then add Intersection later. Without Intersection but with abstract multiple inheritance we could finally dispatch functions like this:

foo(x::Iterator, y::Table, z::AbstractArray) = ...

Henrique_Becker · August 12, 2024, 7:34pm

Abstract multiple inheritance will not make the type system far more confusing? With it, even a method for which all functions take a single argument may become ambiguous. An example: if the passed object is of a struct C <: A, B ..., and foo(x::A) = ... and foo(x::B) = ... exist, but no foo(x::C) = ..., then the method dispatch becomes ambiguous.

CameronBieganek · August 13, 2024, 2:46pm

Yes, your example would throw a method ambiguity error. But I think that when interfaces and generic functions are well designed, those sorts of ambiguities shouldn’t be too common. Let’s look at a more extended example.

Suppose PkgA defines the following abstract types:

module PkgA

abstract type A end
abstract type B <: A end
abstract type C <: A end

export A, B, C

end

Now suppose PkgB defines the function foo. One design principle that I promote for generic functions is that there should only be one docstring for each arity of a generic function. In other words, each arity of a function has exactly one defined behavior. So PkgB might define foo as follows:

module PkgB

"""
    foo(x::A)

Take an `A` and do something with it.
"""
foo(x::A) = # ...
foo(x::B) = # ...
foo(x::C) = # ...

export foo

end

The foo(x::B) and foo(x::C) methods are specializations of foo that still have the same generic behavior as foo(x::A).

Now suppose PkgC defines a concrete type D. First of all, in the presence of multiple inheritance, I think concrete types are more likely to inherit from abstract types that are not directly related to each other, similar to this made up example:

struct Foo <: Graph, Eq, Serializable end

So, the type D in PkgC is more likely to look like this,

module PkgC

using PkgA

struct D <: B, Z end

end

than to look like this,

module PkgC

using PkgA

struct D <: B, C end

end

And thus, for the type struct D <: B, Z end, there won’t be any method ambiguity when you call foo(D()).

If the type D is defined as struct D <: B, C end, then yes, foo(D()) would result in an ambiguity error. This could indicate that objects of type Intersection{B, C} are common, and it would make sense for the developer of PkgB to add a method foo(x::Intersection{B, C}). Alternatively, it might make sense for the developer of PkgC to add a method foo(x::D).

This is similar to the current situation with method ambiguities. If a package author defines the following two methods for bar, then they should really also define a third method to resolve the method ambiguity:

bar(x::Integer, y::Int) = 1
bar(x::Int, y::Integer) = 2

At any rate, this is all speculation at this point. We won’t really know how severe of a problem method ambiguities will be until we have a working prototype to experiment with in the ecosystem.

Henrique_Becker · August 14, 2024, 6:30pm

I never thought that the two parents of a type A would be declared in the same package.

I am thinking exactly about the case of two distinct packages in the ecosystem that both define an abstract type that others are expected to inherit, and both packages also define an interface that is enough for a third package to work over. For example, two hypothetical packages AlwaysOrderedDatastructures and Queuable that define overlapping but not identical interfaces, and a third package Consumables that define a methods consume_one_by_one for each of the interfaces, but both interfaces are implemented by your package MessageList. Before, you could not know about the Consumables package, but your users could apply consume_one_by_one over your objects “for free”. Now, I have the feeling many will unknowingly solve the problem by type piracy (defining more specific method of a function you do now own for a type you also do not own) than doing the right thing (creating their own wrapper object, inheriting the same interfaces, and creating for it a more specialized method of consume_one_by_one).

More than that, in fact. Now I am in doubt about how this will be implemented. There will be a way to say, for example, “now that I created this new concrete type that inherits those two interfaces, I will just create a more specialized method to avoid ambiguity, and my specialized method just calls an already existing method of the same function as my object was just of the first supertype (or just of the second)”, or it will be impossible to force a function (that you own or not) to treat a concrete type object as an specific abstract type between its supertypes? If this is the case, you would need to re-implement the whole logic in your specialized method, or create an auxiliary type that only inherits one of the interfaces, to be able to provide a specialized method.

CameronBieganek · August 15, 2024, 12:03am

@nsajko Sorry, we are rather off topic now, but I think it’s an interesting discussion, if you don’t mind.

@Henrique_Becker Thanks for the example. Let me try to flesh it out a bit more.

Since the interfaces overlap but are not identical, there must be at least one generic function in common between the interfaces. Another abstract-type/interface principle that has not been mentioned in this thread (I believe it is mentioned in the docs page from Sukera that I linked to) is that every abstract type should correspond to an interface. (And an interface is essentially just a list of generic functions, each of which has a well defined behavior.) So, the AlwaysOrderedDataStructure and Queuable abstract types should inherit from a common abstract type. The overlapping parts of the AlwaysOrderedDataStructure and Queuable interfaces is the interface of the abstract type that they inherit from.

Actually, let’s suppose that they inherit from two abstract types in Base. Perhaps they are defined like this (with some hypothetical syntax for defining interfaces):

abstract type ImmutableCollection end

@interface ImmutableCollection begin
    length
    isempty
end

abstract type Iterator end

@interface Iterator begin
    iterate
end

So, we can imagine that the two packages define their abstract types (interfaces) like this:

module AlwaysOrderedDataStructures

abstract type AlwaysOrderedDataStructure <: ImmutableCollection, Iterator end

@interface AlwaysOrderedDataStructure begin
    f1
    f2
end

export # ...

end

module Queuables

abstract type Queuable <: ImmutableCollection, Iterator end

@interface Queuable begin
    g1
    g2
end

export # ...

end

Now suppose the Consumables package defines consume_one_by_one like this:

module Consumables

using AlwaysOrderedDataStructures, Queuables

"""
    consume_one_by_one(x::Intersection{ImmutableCollection, Iterator})

Consume `x` one element at a time.
"""
consume_one_by_one(x::Intersection{ImmutableCollection, Iterator}) = # ...
consume_one_by_one(x::AlwaysOrderedDataStructure) = # ...
consume_one_by_one(x::Queuable) = # ...

end

(Note that the AlwaysOrderedDataStructure and Queuable are specializations—they should have the same behavior as the Intersection{ImmutableCollection, Iterator} method.)

Now if a user tries consume_one_by_one(MessageList()), they will get an ambiguity error, as you said. However, since the developer of the Consumables package has provided methods for both AlwaysOrderedDataStructure and Queuable, it would be reasonable for them (the Consumables package developer) to consider implementing an additional method like this:

consume_one_by_one(x::Intersection{AlwaysOrderedDataStructure, Queuable}) = # ...

Of course this could get thorny if there are lots of abstract types that consume_one_by_one dispatches on (for now assume those methods are defined in the Consumables package), but maybe there aren’t that many intersection types that actually make sense. Or maybe there aren’t that many abstract types that consume_one_by_one needs to specialize on.

Of course things might get bad if there is yet another third-party package called Streams that defines consume_one_by_one(x::Stream), and the MessageList type also implements the Stream interface. I’ll have to think more about that…

I haven’t yet thought through the example where the dependency orders are reversed, i.e. where the AlwaysOrderedDataStructures and Queuables packages depend on the Consumables package.

Topic		Replies	Views
The challenges of documenting generic functions General Usage documentation	24	1326	August 29, 2023
What interface bugs have you ran into in the ecosystem? General Usage question	35	1546	January 31, 2024
[ANN] MultipleInterfaces.jl: Interfaces with multiple inheritance and multiple dispatch Package Announcements dispatch , interface	28	2074	August 8, 2025
Could introducing formal interfaces be nonbreaking? Internals & Design speculative	36	789	June 4, 2025
Industrial standards for Julia packages Internals & Design	20	2518	October 16, 2018

Making `setindex` public, how to document it, rethinking the Collections docs

Related topics