I’ve heard many people speak with great enthusiasm about the revolutionary nature of multiple dispatch, so after a few months of playing around with Julia, I thought that I would have “gotten it” by now. However, it’s still unclear to me why multiple dispatch should be worth getting excited about.
My understanding is that multiple dispatch is just like a switch in MATLAB, except that it’s limited to switching based only on data types. Can anyone explain how multiple dispatch is different from the switch in this MATLAB function?
function b = foo(a)
    switch class(a)
        case 'logical'   % class() reports binary (true/false) arrays as 'logical'
            b = ~a;
        case 'double'
            b = a + pi;
        otherwise
            error('Input a must be either logical or double.')
    end
end
You can’t add more branches without modifying the source code of that function. With multiple dispatch, anyone can add new methods to a function, with different argument types or a different number of arguments. Also, in Julia there may be no branch at runtime at all: if the compiler can infer the types of all your variables, it can select the method at compile time and perhaps even inline it.
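For comparison, here is a rough Julia sketch of the same `foo` written as separate methods, assuming `Bool` stands in for MATLAB’s logical type. The `String` method is a hypothetical later addition, the kind of thing another file or package could add without touching the original definitions.

```julia
# Original methods: one per argument type, no switch statement.
foo(a::Bool) = !a
foo(a::Float64) = a + pi

# Added later, possibly in a different file or package, without editing the
# definitions above (this String method is purely illustrative).
foo(a::String) = uppercase(a)

foo(true)    # → false (Bool method)
foo(1.0)     # Float64 method: 1.0 + pi
foo("abc")   # → "ABC" (String method)
# foo(1)     # would throw a MethodError: there is no method for Int
```

The `otherwise` branch of the MATLAB version has no counterpart here: calling `foo` with an unsupported type already raises a `MethodError`.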
Multiple dispatch is what allows one to perform a simulation using, for example, numbers with units, numbers with errors (uncertainties), complex numbers, or dual numbers for automatic differentiation, once the arithmetic on those types of numbers is defined (by adding methods to the arithmetic functions). This works even if the simulation package itself was written before any of those number types were defined. And since specialization can occur at the top-level function, all of that can work without a single branch.
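A minimal sketch of that idea, assuming a toy forward-mode dual number. The `Dual` type and `euler_decay` function are hypothetical and written only for illustration; real packages such as ForwardDiff.jl are far more complete. The integrator is defined first, with no knowledge of `Dual`; adding a few arithmetic methods afterwards is enough for it to propagate a derivative.

```julia
# Generic explicit-Euler integrator for dx/dt = -k*x.
# It never mentions any particular number type.
function euler_decay(x0, k, dt, nsteps)
    x = x0
    for _ in 1:nsteps
        x = x - dt * k * x
    end
    return x
end

# Toy dual number defined *after* euler_decay: `val` is the value and
# `der` the derivative with respect to the input seeded with der = 1.
struct Dual
    val::Float64
    der::Float64
end

# Only the operations euler_decay needs when x0 is a Dual:
# Real * Dual (for dt*k*x) and Dual - Dual (for x - ...).
Base.:*(a::Real, b::Dual) = Dual(a * b.val, a * b.der)
Base.:-(a::Dual, b::Dual) = Dual(a.val - b.val, a.der - b.der)

# Seed x0: result.der approximates d(x_final)/d(x0) = (1 - dt*k)^nsteps.
result = euler_decay(Dual(2.0, 1.0), 0.5, 0.1, 10)
```

Nothing in `euler_decay` had to change; the new type only had to supply the arithmetic the existing code already uses.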
If there’s only one argument, then that’s not going to be anything more than single dispatch in any language, including Julia. Multiple dispatch is only “special” when there’s more than one argument. But really, the multiple part of multiple dispatch isn’t all that crucial to what makes Julia special… It can often be functionally emulated, and it can sometimes lead to combinatorial challenges.
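A small sketch of dispatch on more than one argument, using hypothetical `Circle` and `Square` types: the method is chosen from the runtime types of both arguments, which a single-dispatch language would typically emulate with a second, nested dispatch. It also hints at the combinatorial side, since n shape types can need up to n² method signatures.

```julia
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Square <: Shape
    side::Float64
end

# The method is selected from the types of BOTH arguments.
collide(::Circle, ::Circle) = "circle-circle"
collide(::Circle, ::Square) = "circle-square"
collide(s::Square, c::Circle) = collide(c, s)   # reuse the mixed case by symmetry
collide(::Square, ::Square) = "square-square"

# Element types are only known at runtime (Vector{Shape}).
shapes = Shape[Circle(1.0), Square(2.0)]
results = [collide(a, b) for a in shapes for b in shapes]
# → ["circle-circle", "circle-square", "circle-square", "square-square"]
```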
The “special” thing in my view is the consistent definition of what a function means — its generic definition — and the ability to define these implementations of the function separately from your data (or the class definition). This only works if folks agree on what a function means, but it is why you can pass a number with uncertainty through a differential equation, for example.
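A sketch of that separation, with hypothetical module and function names: one module owns the data, another owns the generic function and its documented meaning, and a third piece of code connects them without modifying either.

```julia
module Geometry          # owns the data type; knows nothing about distances
struct Point
    x::Float64
    y::Float64
end
end

module Metrics           # owns the generic function and what it means
"""
    distance(a, b)

Non-negative distance between `a` and `b`.
"""
function distance end

distance(a::Real, b::Real) = abs(a - b)
end

# Glue code written elsewhere: a new method of Metrics.distance for
# Geometry.Point, added without editing either module.
Metrics.distance(p::Geometry.Point, q::Geometry.Point) = hypot(p.x - q.x, p.y - q.y)

Metrics.distance(2, 5)                                        # → 3
Metrics.distance(Geometry.Point(0, 0), Geometry.Point(3, 4))  # Euclidean distance
```

The glue method only makes sense because everyone agrees on what `distance` means, which is the point made above.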
As a casual user of Julia, I don’t really use the high-powered things that multiple dispatch allows for most of the time. Most of my work makes use of floats, integers, arrays of floats… things that are defined in any language, and where multiple dispatch presents no obvious advantages to the end user. Whether addition is defined like +(a::Int, b::Int) or a.__add__(b), addition works as I expect.
But since multiple dispatch is built into the language, it’ll always be there. And the few times I did “need” it, I found the code to be much cleaner. One example is a ray-tracing code that dispatches a ray-surface intersection routine in a hot loop based on the type of surface, where the specific geometry setup isn’t known until runtime. That’s a case where I think multiple dispatch does pretty well at producing clean, readable code. It may be possible with Python+Numba, but I’d be fighting the language more.
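A rough sketch of that pattern under simplifying assumptions: the `Plane`, `Sphere`, and `Ray` types and the `hit_distance` function are hypothetical, planes are horizontal, and ray directions are unit length. The scene is a `Vector{Surface}` whose contents would be read at runtime, so each call to `hit_distance` in the loop dispatches on the concrete surface type.

```julia
abstract type Surface end

struct Plane <: Surface
    z::Float64                  # horizontal plane z = const
end

struct Sphere <: Surface
    center::NTuple{3,Float64}
    radius::Float64
end

struct Ray
    origin::NTuple{3,Float64}
    dir::NTuple{3,Float64}      # assumed to be unit length
end

# Distance along the ray to the hit, or Inf if there is none.
function hit_distance(r::Ray, p::Plane)
    dz = r.dir[3]
    dz == 0 && return Inf
    t = (p.z - r.origin[3]) / dz
    return t > 0 ? t : Inf
end

function hit_distance(r::Ray, s::Sphere)
    oc = r.origin .- s.center
    b = sum(oc .* r.dir)
    c = sum(oc .* oc) - s.radius^2
    disc = b^2 - c
    disc < 0 && return Inf
    t = -b - sqrt(disc)         # nearer root only; a ray starting inside reports no hit
    return t > 0 ? t : Inf
end

# The hot loop: each element's surface type is only known at runtime.
closest_hit(r::Ray, scene::Vector{Surface}) = minimum(s -> hit_distance(r, s), scene)

scene = Surface[Sphere((0.0, 0.0, 0.0), 1.0), Plane(10.0)]
closest_hit(Ray((0.0, 0.0, -5.0), (0.0, 0.0, 1.0)), scene)   # → 4.0
```

Adding a new surface type later means adding one struct and one `hit_distance` method; `closest_hit` stays untouched.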
It’s the second part of that sentence that you ignored. Multiple dispatch is what determines how these implementations get sorted given that no particular argument “owns” the function implementation.
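A small sketch of how Julia sorts out which implementation applies when no argument owns the function (the `combine` function is hypothetical): the most specific matching signature wins, and if two signatures are equally specific for a call, Julia reports an ambiguity rather than guessing, which is why the last method below is needed.

```julia
combine(a::Number,  b::Number)  = "generic"
combine(a::Integer, b::Number)  = "integer first"
combine(a::Number,  b::Integer) = "integer second"
# Without this method, combine(1, 2) would be ambiguous: the two methods
# above are equally specific for (Int, Int), so Julia would throw a MethodError.
combine(a::Integer, b::Integer) = "both integers"

combine(1.0, 2.0)   # → "generic"
combine(1, 2.0)     # → "integer first"
combine(1.0, 2)     # → "integer second"
combine(1, 2)       # → "both integers"
```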
One small example: my Matlab code is very often a horrific mess of input parsing, trying to figure out if the first input is a graphics handle (and what type!), then, if the second is a particular object, then if the next is an array longer than one, etc. etc. Often half the function is input parsing.
inputParser and arguments blocks help, but they are verbose and awkward, and do not work well with leading optional arguments.
And if I subsequently decide to add a different argument or set of input arguments:
If it’s not my own code in the first place, then just forget about it.
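For comparison, a Julia sketch of a plotting-style function with an optional leading axis argument. The `Axis` type, `CURRENT_AXIS`, and `draw` are made up for illustration. Each calling convention is its own method, so there is no input-parsing block, and a new signature can be added later as one more method.

```julia
struct Axis
    name::String
end

const CURRENT_AXIS = Axis("current")    # stand-in for a "current axes" default

# Full form: explicit axis, x and y data.
draw(ax::Axis, x::AbstractVector, y::AbstractVector) =
    "drawing $(length(y)) points on $(ax.name)"

# Shorter calling conventions simply forward to the full form.
draw(ax::Axis, y::AbstractVector) = draw(ax, 1:length(y), y)
draw(x::AbstractVector, y::AbstractVector) = draw(CURRENT_AXIS, x, y)
draw(y::AbstractVector) = draw(CURRENT_AXIS, y)

draw([1.0, 2.0, 3.0])               # → "drawing 3 points on current"
draw(Axis("left"), [1, 2], [3, 4])  # → "drawing 2 points on left"
```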
When first learning Julia I didn’t immediately see the huge value of multiple dispatch other than the performance that resulted from it. Sometimes it even seemed cumbersome (and still does at times), as I’ve seen it overused in ways that result in lots of duplicated code that would be hard to maintain… Now that I’m settling into my Julia comfort zone, I find multiple dispatch incredibly liberating. Adding functions with intuitive names is trivial and self-contained. If you want the same function name to do something entirely different when it encounters different inputs, you don’t need to go and modify an existing function that keeps growing in complexity. You can use function names like find, contains, and download to create intuitive code that feels natural to write and is easy to retrieve from your mental catalogue… and chances are the package you just imported also has intuitive function names. You can easily add self-contained documentation for the specific functionality (dispatch) that is transparent to other users. And on top of that, the compiler immediately knows the specific code it needs to compile. You can do some of this in Matlab, but it quickly starts to feel hacky after falling in love with multiple dispatch.
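A short sketch of per-method documentation, using a hypothetical `area` function and two made-up types: each method carries its own docstring, and `?area` in the REPL shows the docstrings of all documented methods together.

```julia
struct Disk
    r::Float64
end

struct Box
    w::Float64
    h::Float64
end

"""
    area(d::Disk)

Area of a disk of radius `d.r`.
"""
area(d::Disk) = π * d.r^2

"""
    area(b::Box)

Area of an axis-aligned box of width `b.w` and height `b.h`.
"""
area(b::Box) = b.w * b.h

area(Box(2.0, 3.0))   # → 6.0
```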
Exactly this. Importantly, one can overload not only a similar API but the same API. With MATLAB, I found the tendency was to vendor everything, since you often needed to modify other people’s source code to get it to work for your application. With Python, we see the NumPy API reimplemented a dozen times, but with subtle differences. Perhaps the closest thing I have seen to this is the Java ecosystem and its public interfaces. Even then, overly restrictive access control modifiers usually get in the way of composition. Perhaps this only really manifests when you start to publish libraries and packages for others to use.
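As a small illustration of implementing the same API rather than a look-alike: the hypothetical `Squares` type below defines only `size` and `getindex`, the methods the `AbstractArray` interface requires for a read-only vector, plus the optional `IndexStyle`. After that, generic functions such as `sum`, `collect`, and indexing with `end` work unchanged.

```julia
# A lazily computed vector 1^2, 2^2, ..., n^2 that stores only n.
struct Squares <: AbstractVector{Int}
    n::Int
end

Base.size(s::Squares) = (s.n,)
Base.IndexStyle(::Type{<:Squares}) = IndexLinear()
function Base.getindex(s::Squares, i::Int)
    @boundscheck checkbounds(s, i)
    return i^2
end

s = Squares(4)
sum(s)        # → 30
s[end]        # → 16
collect(s)    # → [1, 4, 9, 16]
```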
The most important thing is that a function foo can be extended to types that did not exist when foo was initially written, and code that uses foo will work with those types. That’s amazing.
It also has the potential to cause problems. Assumptions that were valid when the code was written may no longer hold as the interface is extended. That said, I think these issues are surmountable with time, experience, documentation, and testing.
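A tiny illustration of that risk, with hypothetical function names: code written with numbers in mind may quietly assume that `*` commutes, which no longer holds once the same generic function is called with strings (where `*` concatenates) or matrices.

```julia
# Intended meaning: (a*b)^2. The first version silently assumes a*b == b*a.
square_of_product(a, b) = a^2 * b^2
square_of_product_safe(a, b) = (a * b)^2

square_of_product(2, 3)           # → 36
square_of_product_safe(2, 3)      # → 36

# For strings, * is concatenation and ^ is repetition, so the assumption breaks:
square_of_product("x", "y")       # → "xxyy"
square_of_product_safe("x", "y")  # → "xyxy"
```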
A good illustration of the value of multiple dispatch is the LinearAlgebra package in the standard library. A simple expression like C = A * B, where A and B are matrices, may be implemented in many different ways depending on the structure of the matrices A and B. Because Julia allows for modification of arguments within functions, the most general approach for this in the LinearAlgebra package is mul!(C, A, B), which overwrites C with the contents of A * B. (In addition, there are generic functions lmul! and rmul! that overwrite one of their arguments with its in-place product with another matrix of a special type, such as triangular or diagonal, that allows for in-place operations.)
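A brief usage sketch of the in-place functions just described (the small matrices are arbitrary): `mul!` writes the product into a preallocated `C`, and `lmul!` with a `Diagonal` first argument overwrites its second argument with the product.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
B = [0.5 0.0; 0.0 2.0]

C = zeros(2, 2)            # preallocated output
mul!(C, A, B)              # C is overwritten with A * B
C == A * B                 # → true

D = Diagonal([2.0, 3.0])
M = copy(A)
lmul!(D, M)                # M is overwritten with D * M (scales each row)
M == [2.0 4.0; 9.0 12.0]   # → true
```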
Try
using LinearAlgebra
methods(mul!)
and you will discover that there are over 250 different methods for that operation within that package alone. Other packages define even more methods for that generic function.
The point is that you can describe in very detailed ways how an operation should be carried out efficiently, in terms of both time and storage, for many different types of any of the arguments, simply by defining methods for the generic function.
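A minimal sketch of adding such a method for a new type, assuming a hypothetical `ScaledIdentity` type that represents `s * I` without storing a matrix (LinearAlgebra’s own `UniformScaling`, `I`, already covers this case; the type here only illustrates the mechanism). It extends the five-argument form `mul!(C, A, B, α, β)`, which computes `C = A*B*α + C*β`; this assumes Julia 1.3 or later, where the generic three-argument `mul!(C, A, B)` forwards to that form with `α = true, β = false`.

```julia
using LinearAlgebra

# Hypothetical type for s * I (illustration only).
struct ScaledIdentity
    s::Float64
end

# New method of the existing generic: C = A*B*α + C*β with A = s*I.
function LinearAlgebra.mul!(C::AbstractMatrix, A::ScaledIdentity, B::AbstractMatrix,
                            α::Number, β::Number)
    size(C) == size(B) || throw(DimensionMismatch("C and B must have the same size"))
    if iszero(β)
        C .= (A.s * α) .* B           # old contents of C are ignored when β = 0
    else
        C .= (A.s * α) .* B .+ β .* C
    end
    return C
end

B = [1.0 2.0; 3.0 4.0]
C = fill(NaN, 2, 2)
mul!(C, ScaledIdentity(2.0), B)             # three-argument call: α = true, β = false
C == 2 .* B                                  # → true
mul!(C, ScaledIdentity(1.0), B, 1.0, 1.0)    # accumulate: C = B + C
C == 3 .* B                                  # → true
```

Treating `β = 0` as “ignore the old contents of `C`” means a `C` filled with `NaN` does not leak into the result.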