Julia adoption: Method confusion

I have been in the ‘Programming Languages Development’ Discord (fun place) for a couple of weeks now, and many times when I bring up multiple dispatch, someone says:

“That’s messy, you quickly drown in methods, and lose sight of what signature calls what body.”

I feel that, to help adoption, it would be nice to have some kind of reference (in the documentation) that collects all the tools and techniques that help with this.

Currently, this still seems to be the number one concern among people who at least know that multiple dispatch is different from function overloading.

I would like to put together a few ideas on how you all deal with that, and collect them on a doc page.

What do you think about it?

2 Likes

For me, multiple dispatch is a great thing: it allows a unique way to do OOP, and it makes abstraction and interfaces easier while avoiding inheritance.
For example:


abstract type CustomArray{T,N} <: AbstractArray{T,N} end

struct CArray{T,N} <: CustomArray{T,N}
    data::Array{T,N}
end

# Then you define methods for size, IndexStyle, length, getindex, and setindex!, and you are pretty much done: you can use generic array functionality such as broadcasting and iteration.
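
As a minimal sketch of what filling in those methods can look like (the name WrappedArray is made up for illustration, and it assumes linear indexing is a good fit for the wrapped data):

```julia
# Hypothetical wrapper type, used only for illustration.
struct WrappedArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end

# The core of the AbstractArray interface: size plus scalar indexing.
Base.size(a::WrappedArray) = size(a.data)
Base.IndexStyle(::Type{<:WrappedArray}) = IndexLinear()
Base.getindex(a::WrappedArray, i::Int) = a.data[i]
Base.setindex!(a::WrappedArray, v, i::Int) = (a.data[i] = v; a)

w = WrappedArray([1 2; 3 4])

total = sum(w)        # iteration comes from the generic fallbacks
doubled = w .* 2      # broadcasting works too, producing a plain Array here
w[1, 2] = 10          # Cartesian indexing is derived from the linear method
```

None of sum, broadcasting, or two-index assignment was defined for WrappedArray; they all dispatch to generic AbstractArray methods that only rely on the few methods above.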
1 Like

In some ways, they are correct. It is messy and does create challenges. While Julia’s implementation is quite efficient, multiple dispatch makes us prone to inadvertent invalidations.

In practice, it is not as messy as some would imagine if we obey certain social practices such as avoiding type piracy or punning.

In many ways, I find Julia’s dispatch much more intuitive than dispatch in object oriented schemes in Java.

14 Likes

With an active runtime, we can figure out which method a call signature (which/@which) or a statically dispatched call (Cthulhu.jl; it would be nice if there were a @code_XX macro for a non-recursive version of that) resolves to. Runtime dispatch goes to multiple methods depending on the inputs, which is the point and isn’t too different from other languages. But if we’re just looking right at source code, it’s true that a call often won’t have its input types specified (dynamic typing after all), and even if it does, method signatures often won’t obviously match on sight. Static typing with little inference or avoiding multiple methods for a function (including function overloading) does have its perks, and we do have to deal with the tradeoffs. You can help, but sometimes it’s not enough to talk people out of their valid preferences.
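
A small sketch of the which/@which part, using a made-up function describe with three methods (in a script, @which needs the InteractiveUtils standard library; the REPL loads it already):

```julia
using InteractiveUtils  # provides @which outside the REPL

# Hypothetical function with three methods, for illustration only.
describe(x) = "something"
describe(x::Integer) = "an integer"
describe(x::AbstractFloat) = "a float"

# Ask which method a tuple of argument types would dispatch to...
m_int = which(describe, (Int,))

# ...or let the macro work it out from an actual call expression.
m_float = @which describe(1.5)

# Both return Method objects that also record the file and line of the definition.
all_methods = methods(describe)
```

Printing m_int or m_float at the REPL shows the matched signature together with its source location, which is usually enough to jump to the right body.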

3 Likes

This video shows many tools that could help with that.

Julia is a language that could benefit a lot from IDE features like these.

(You could watch it at 1.05x speed) https://youtu.be/baxtyeFVn3w

Any features in particular, without having to first go through a 43 minute video?

1 Like

The point of generic programming is precisely that: you don’t have to keep track of which particular method is being called as they should all conform to the same interface. (Debugging is of course an exception).

For a + b, all I need to know is that the concept of “addition” makes sense: either a and b are both numbers, or the analogy can be made meaningfully (cf. 0 + true).
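
To make that concrete, here is a tiny sketch with a made-up helper add_all that only assumes + is meaningful for its elements, without caring which + method runs:

```julia
# Hypothetical helper: relies only on the generic meaning of +.
add_all(xs) = foldl(+, xs)

ints  = add_all([1, 2, 3])            # integer addition
bools = add_all([true, false, true])  # Bool values add as 0 and 1
rats  = add_all([1//2, 1//3])         # rational addition
mixed = 0 + true                      # the case mentioned above
```

Each call ends up in a different + method, but the caller never has to know which one.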

This becomes problematic when people try to pun on symbols. There have been discussions about Base exporting some common verbs, so that people don’t need to import a common WhateverBase.jl package, but these are misunderstandings of how Julia should be used.

So my advice would be: “Don’t do this and you will be fine.”

4 Likes

How can I be safe from these misuses, and detect them? Is that documented somewhere?

You can’t be safe, at the end of the day. This is why multiple dispatch is a two-edged sword. The most safety you can get is by testing the assumptions your code is making.

Personally, I’ve had a good experience with defining formal interfaces via check_… functions that get called automatically on the arguments of high-level routines unless check=false is passed.
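
A rough sketch of that pattern, where every name (area, check_shape, total_area, Square, Blob) is invented for illustration and the "interface" is just "implements area":

```julia
# Hypothetical interface function: a shape must implement area(x).
function area end

struct Square
    side::Float64
end
area(s::Square) = s.side^2

struct Blob end  # deliberately does NOT implement area

# The check function encodes the assumptions the high-level routine makes.
function check_shape(x)
    hasmethod(area, Tuple{typeof(x)}) ||
        throw(ArgumentError("$(typeof(x)) does not implement area"))
    return nothing
end

# High-level routine: validates its arguments unless check=false is passed.
function total_area(shapes; check::Bool=true)
    check && foreach(check_shape, shapes)
    return sum(area, shapes)
end

ok = total_area([Square(1.0), Square(2.0)])

# A non-conforming argument fails early, with a message about the interface.
err = try
    total_area([Square(1.0), Blob()])
catch e
    e
end
```

With check=false the same bad input would still fail, but later and with a MethodError from inside the computation rather than an error that names the violated assumption.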

Presumably, some implementation of “traits” could do something similar with more efficiency but less flexibility than tests.

1 Like

Relevant open doc PR of mine:

3 Likes

There is no language-level feature protecting you from misuse in general. How could there be? Programmers are ingenious when it comes to misuse. Many of us have seen C++ code in the wild that looks like

int calculate_tree_depth() {
  return 1; // FIXME code up calculation
}

that ticks all the boxes and makes the compiler happy. In ideal cases, it is caught in code review, but frequently not.

That said, Julia has tooling for catching mistakes, including unit tests, analyzers like JET.jl (it catches a lot of errors about missing interfaces), etc.

But the key thing about this tooling is that it is opt-in: you invoke it at the point you think it makes sense for your code. This is because the language is designed for interactive development and quick prototyping: some of your code will run even if there are pieces missing.

Consider sending

struct MyFancyVector{T,S} <: AbstractVector{T}
    contents::S
end

to the REPL. At this point it does not support the formal interface of AbstractVector, as it has no methods. I can fill them in later as I code, or redesign the whole thing (in 1.12 this is seamless, as you can redefine structs) and then adapt the existing methods.

Users of pre-compiled languages come to Julia with the wrong kind of expectations: they are used to an environment where you are supposed to make the compiler happy first before you get to do anything. They have to learn a different coding style and a different set of QA tools. Frankly, not everyone will like doing this, and that is fine, no one is forced to use Julia.

2 Likes

Just to preface, this could warrant splitting the topic because it’s getting farther from the original topic of not intuitively knowing what method a call dispatches to. Or it may not because this is precisely about failing to realize the ambiguous methods a possible call will fail to dispatch to.

That PR doesn’t actually capture the cause of the ambiguity. It strictly blames uneven argument type annotations, that is foo(::A, ::B) where (A <: B) && !(B <: A), pointing to real code in MultivariatePolynomials.jl:

Base.:(==)(p::RationalPoly, q::RationalPoly) = p.num * q.den == q.num * p.den
...
Base.:(==)(α, q::RationalPoly) = α * q.den == q.num
Base.:(==)(q::RationalPoly, α) = α == q

The consequence is that while MultivariatePolynomials.jl itself would dispatch fine, another package that tries to extend == in the same way will create sources of ambiguity for code that mixes the packages, very possibly including itself. To adapt the general example to this case:

module MyModule
struct MyType end
Base.:(==)(x::MyType, y::MyType) = ...
Base.:(==)(x, y::MyType) = ...
Base.:(==)(x::MyType, y) = ...
end

A user tries:

using MultivariatePolynomials, MyModule
MyType(...) == RationalPoly(...) # MethodError!
#=
Candidates:
Base.:(==)(x::MyType, y)
Base.:(==)(α, q::RationalPoly) 
=#

The thing is, the same consequence can happen with even argument type annotations. Let the first package have:

module A
foo(x::Real, y::Real) = ...
end

And another package does:

module B
using A: A, foo  # bring the module name A into scope so that A.foo can be extended
struct MyNum <: Real end
A.foo(a::Number, b::MyNum) = ...
A.foo(a::MyNum, b::Number) = ...
A.foo(a::MyNum, b::MyNum) = ...
end

so a user tries:

foo(3.14, MyNum()) # MethodError!
#=
Candidates:
foo(x::Real, y::Real)
foo(a::Number, b::MyNum)
=#
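
One way to catch this kind of ambiguity before a user does is detect_ambiguities from the Test standard library. A minimal sketch, with both "packages" collapsed into a single made-up module AmbigDemo so that it runs as one script (the method bodies are placeholders):

```julia
using Test  # standard library; provides detect_ambiguities

# Hypothetical module combining the methods from both packages above.
module AmbigDemo
foo(x::Real, y::Real) = "Real, Real"
struct MyNum <: Real end
foo(a::Number, b::MyNum) = "Number, MyNum"
foo(a::MyNum, b::Number) = "MyNum, Number"
foo(a::MyNum, b::MyNum) = "MyNum, MyNum"
end

# Returns a vector of (Method, Method) pairs that are mutually ambiguous.
ambiguities = detect_ambiguities(AmbigDemo)

# The call from the example really does fail to dispatch...
err = try
    AmbigDemo.foo(3.14, AmbigDemo.MyNum())
catch e
    e
end

# ...while the fully specific signature is still fine.
both = AmbigDemo.foo(AmbigDemo.MyNum(), AmbigDemo.MyNum())
```

Running such a check in a package's test suite flags the problem for the package that extends foo, which is the one in a position to fix it.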

To visualize it, drawing a line through columns of parts of the type hierarchy for each ambiguous method signature will show intersections, whether it’s within the same branch (MyNum <: Real <: Number) or across different type branches (RationalPoly vs MyType) with a shared parent node (Any). Swapping uneven argument type annotations would produce more of an X, but the intersection can also have a fully horizontal line. The only method signatures whose lines CANNOT intersect any other’s are:

  1. all leaf type annotations: this includes ::Type{T} for type input T and concrete types for everything else
  2. all ::Any annotations for arguments, and the callable type annotation that only strictly subtypes Function or Any. That’s because Function, Any, and direct type parameters are disallowed in the callable’s type annotation.


While it does help to discourage shared supertypes (Any is the easy one) in order to separate type hierarchies per position, sharing them is sometimes important for functionality. The annotation patterns in MultivariatePolynomials.jl and module B are also important for functionality and are widely used (and documented) to resolve method ambiguities. Currently, I think there are two takeaways:

  1. An acknowledgement that extended functions e.g. Base.== do not need to support arguments mixing types among unrelated dependents e.g. MultivariatePolynomials vs MyModule. Putting aside the lack of need and likely impossibility, one of the packages had to be aware of the other to implement ==, and it’s not reasonable to expect everyone to know what everyone else is doing with ==. A dependency implementing its interfaces well enough to work well in a dependent’s new contexts is already difficult enough, we shouldn’t expect the infeasible from composability.

  2. If one package is in fact aware of the other (B is clearly aware of A through the extended function foo), there is a responsibility to make the types work together without ambiguity in the extended method table or resort to a new function with a fresh method table. If you can get away with it, only use type annotations in your own type hierarchies (iffy on exactly what is necessary, I think only one position is needed, a stronger version of what’s suggested to prevent type piracy from breaking preexisting code). In short, know the method table or leave it alone!

2 Likes

Just to emphasize what @Tamas_Papp mentioned, generic programming is the solution to the problem quoted by OP (“That’s messy, you quickly drown in methods, and lose sight of what signature calls what body”).

Generally speaking, a Julia function should have a single generic meaning or semantics that is independent of which particular types the function is called with, so we don’t actually need to know which method of the function gets called.

Consider the following example:

"""
    mylast(x)

Iterate through every element of the iterator `x` and
return the last value iterated. The default definition
of `mylast` has O(n) time complexity, but some types
might define more efficient implementations.

Throw an `ArgumentError` if `x` is empty.
"""
function mylast(x)
    tup = iterate(x)
    isnothing(tup) && throw(ArgumentError("Iterator is empty."))
           
    local val
    while !isnothing(tup)
        val, state = tup
        tup = iterate(x, state)
    end
        
    val
end

function mylast(x::UnitRange{<:Integer})
    x.stop >= x.start || throw(ArgumentError("Iterator is empty"))
    x.stop
end

The mylast function has two methods, but it has a single generic definition. The calls mylast([1, 2, 3]) and mylast(1:3) have the same behavior, so we don’t need to worry about which of the two mylast method definitions actually gets called under the hood.
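
If you want to convince yourself that a specialized method really keeps the generic meaning, one option is to call the fallback explicitly with invoke and compare. A sketch that repeats the two definitions so it runs on its own:

```julia
# Generic O(n) fallback: walk the iterator to its end.
function mylast(x)
    tup = iterate(x)
    isnothing(tup) && throw(ArgumentError("Iterator is empty."))
    local val
    while !isnothing(tup)
        val, state = tup
        tup = iterate(x, state)
    end
    val
end

# Specialized O(1) method for integer unit ranges.
function mylast(x::UnitRange{<:Integer})
    x.stop >= x.start || throw(ArgumentError("Iterator is empty"))
    x.stop
end

r = 1:3
fast = mylast(r)                      # dispatches to the UnitRange method
slow = invoke(mylast, Tuple{Any}, r)  # forces the generic fallback on the same input

# Both methods should also agree on how they treat empty input.
empty_err = try
    mylast(3:1)
catch e
    e
end
```

A comparison like fast == slow is a cheap unit test for "same semantics, different implementation".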

(I guess that example doesn’t use multiple dispatch, but the principle is exactly the same for functions with multiple arguments.)

3 Likes