Is there a strong argument for why Julia inherently provides a more composable ecosystem relative to Python and other major science ML languages? I'd like to know for a grant application to support development (by others).
Multiple dispatch:
Stefan should really write that up as a manuscript - while the talk is incredibly clear, I fear a YouTube link is not likely to have the sort of credibility we’d want to cite in grant applications.
There are a bunch of papers already:
None that make this point about multiple dispatch being the source of interoperability between packages. The array operators one comes closest, but it’s focused more on performance.
I think a slightly more formal write-up of the talk, maybe with some more modern examples (it could even use the same name), would be useful.
Another big reason is that almost everything in Julia is built around the native Array type. In Python, all the big ecosystems use their own, incompatible versions of arrays, with duplicate versions of methods like .std()
For example, you might think you could write a function like
def sharpe(s):
    return s.mean()/s.std()
in Python and have it work the same on pandas series, numpy ndarrays, and pytorch tensors. But you can't, because numpy uses ddof=0 (delta degrees of freedom) by default, while the others use 1, so you'll get different results.
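The size of that discrepancy is easy to see without installing any of the three libraries. Here is a minimal sketch using only the standard library, assuming statistics.pstdev stands in for the ddof=0 convention and statistics.stdev for the ddof=1 convention. The names sharpe_ddof0 and sharpe_ddof1 are made up for illustration.

```python
import math
import statistics

returns = [1.0, 2.0, 3.0, 4.0]

def sharpe_ddof0(xs):
    # Population standard deviation: divides by N (the ddof=0 convention).
    return statistics.mean(xs) / statistics.pstdev(xs)

def sharpe_ddof1(xs):
    # Sample standard deviation: divides by N - 1 (the ddof=1 convention).
    return statistics.mean(xs) / statistics.stdev(xs)

# The two conventions differ by a factor of sqrt(N / (N - 1)).
ratio = sharpe_ddof0(returns) / sharpe_ddof1(returns)
print(sharpe_ddof0(returns), sharpe_ddof1(returns), ratio)
```

For short series the gap is far from negligible: with four observations the two "same" functions disagree by roughly 15%.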
What if you try to explicitly pass degrees of freedom, like
def sharpe(s):
    return s.mean()/s.std(ddof=1)
Now this works for pandas and numpy, but fails for pytorch, because it expects an unbiased argument instead of ddof. In order to actually get a version that does the same thing with each type of array, you need to write something like
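import torch

def standardized_std(data, ddof):
    if isinstance(data, torch.Tensor):
        # torch takes a boolean flag: unbiased=True matches ddof=1, False matches ddof=0
        return data.std(unbiased=(ddof == 1))
    else:
        return data.std(ddof=ddof)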
Which is pretty clunky!
Meanwhile in Julia, almost every implementation of std you'll see just looks at an underlying array and calls std from Statistics, giving you the same functionality and interface by default.
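The same forwarding pattern can be imitated in Python when package authors agree to delegate to one shared implementation. This is a rough sketch, not Julia code: shared_std, LabeledSeries, and MaskedSeries are hypothetical names standing in for a common statistics function and two unrelated container types.

```python
import math

def shared_std(values, ddof=1):
    # Single source of truth for the convention (here: sample std, ddof=1).
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - ddof))

class LabeledSeries:
    """Hypothetical container: values plus a label."""
    def __init__(self, label, values):
        self.label = label
        self.values = list(values)

    def std(self, ddof=1):
        # Forward to the shared function instead of reimplementing it.
        return shared_std(self.values, ddof)

class MaskedSeries:
    """Hypothetical container: values filtered by a keep/drop mask."""
    def __init__(self, values, mask):
        self.values = [v for v, keep in zip(values, mask) if keep]

    def std(self, ddof=1):
        return shared_std(self.values, ddof)

a = LabeledSeries("returns", [1.0, 2.0, 3.0, 4.0])
b = MaskedSeries([1.0, 2.0, 99.0, 3.0, 4.0], [True, True, False, True, True])
print(a.std() == b.std())
```

The catch is that this only works if every package opts in, which is a social convention in Python rather than something the default ecosystem gives you.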
The point of the namesake, "The Unreasonable Effectiveness of Mathematics in the Natural Sciences", is that there isn't a reason why math is so useful; it just is:
"fundamentally, we do not know why our theories work so well"
But the point of the Julia talk is explaining precisely the reason why multiple dispatch is effective. So the title could be improved imo.
I like this example demonstrating python’s built-in complex type being only over float, leading to an inability to express complex rational numbers in the stdlib: Is Julia's way of OOP superior to C++/Python? Why Julia doesn't use class-based OOP? - #92 by goretkin
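The limitation is easy to reproduce. The sketch below first shows that mixing fractions.Fraction with the built-in complex falls back to floats, then defines a hypothetical GenericComplex class (not part of the standard library) whose components can be any numeric type. That is roughly what a parametric type like Julia's Complex{Rational{Int}} provides without extra work.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Any

# Built-in complex: components are always floats, so exactness is lost.
z = Fraction(1, 3) + Fraction(1, 2) * 1j
lost_exactness = Fraction(z.real) != Fraction(1, 3)

@dataclass(frozen=True)
class GenericComplex:
    """Hypothetical complex number over any component type."""
    re: Any
    im: Any

    def __add__(self, other):
        return GenericComplex(self.re + other.re, self.im + other.im)

    def __mul__(self, other):
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return GenericComplex(
            self.re * other.re - self.im * other.im,
            self.re * other.im + self.im * other.re,
        )

w = GenericComplex(Fraction(1, 3), Fraction(1, 2))
print(type(z.real).__name__, lost_exactness, w * w)
```

The hand-rolled class keeps exact rational components, but nothing else in the standard library (cmath, for instance) knows about it, which is the composability gap the linked post is pointing at.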
To be more specific, I'm looking for citable, peer-reviewed material (maybe grey literature if that's all that exists).
For a compelling proposal it isn't wise to spend pages laying out an argument for why Julia is more composable without strong citable support… an argument that could easily be viewed as personal opinion by a reviewer. Nearly everyone that uses Julia for more than a year is an evangelical… including me.
Does anyone know of more neutral authors that assess strengths of different languages and conclude that Julia's unique design provides the benefits claimed by its authors?
I can't find any, and it's not surprising because Julia's composability isn't spectacular. Different packages working together through APIs is not a novel concept, and how a language's particular features make that easier is readily apparent.
Python alone is also very composable via duck-typing. The obstacles in such glue languages come with the 2-language problem:
- If 2 core packages implement their own versions of a data structure, especially in 2 different languages, then their dependents can easily be separated into 2 incompatible groups. That's not necessarily the case because people can agree on a Python-level API for Python code to work with either package; the Julia ecosystem has this pattern in abstract interfaces.
- The composability is entirely in the glue language, and you can’t compile the underlying code in separate packages together at runtime, especially if they’re in separate languages. You can build another package that mixes them how you need, rewrap it in Python, and import that, but it’s obvious that working in one compiled language is smoother, especially JIT-compiled ones in interactive shells.
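The "agree on a Python-level API" option from the first bullet can be sketched with typing.Protocol. Everything here is illustrative: SupportsMeanStd, ListSeries, and RunningStats are hypothetical names, and the agreed convention (an explicit ddof argument) is an assumption.

```python
import math
from typing import Protocol

class SupportsMeanStd(Protocol):
    """Hypothetical shared interface that independent packages agree on."""
    def mean(self) -> float: ...
    def std(self, ddof: int) -> float: ...

class ListSeries:
    """Package A: stores every observation."""
    def __init__(self, values):
        self.values = list(values)

    def mean(self) -> float:
        return sum(self.values) / len(self.values)

    def std(self, ddof: int) -> float:
        m = self.mean()
        ss = sum((v - m) ** 2 for v in self.values)
        return math.sqrt(ss / (len(self.values) - ddof))

class RunningStats:
    """Package B: keeps only running sums, no stored data."""
    def __init__(self, values):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0
        for v in values:
            self.n += 1
            self.total += v
            self.total_sq += v * v

    def mean(self) -> float:
        return self.total / self.n

    def std(self, ddof: int) -> float:
        ss = self.total_sq - self.n * self.mean() ** 2
        return math.sqrt(ss / (self.n - ddof))

def sharpe(s: SupportsMeanStd) -> float:
    # Written once against the interface, not against either package.
    return s.mean() / s.std(ddof=1)

data = [1.0, 2.0, 3.0, 4.0]
print(sharpe(ListSeries(data)), sharpe(RunningStats(data)))
```

Neither class inherits from the protocol; structural typing is enough, which is the duck-typing composability described above. What it does not address is the second bullet: the two implementations still cannot be compiled together.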
Since you can accomplish composability in different languages with their own perks and drawbacks, there's not much incentive to academically nitpick. Julia didn't invent something unlike any other language; it just collected and eased many convenient features for interactive workflows. For example, you can do limited argument dispatch in Python in various contexts if someone tries hard enough: NumPy uses NEP 18 (the __array_function__ protocol) to allow its API to use non-NumPy arrays, and some of Python's infix operators implement double dispatch in underlying dunder methods. Other reasons for language choice can easily outweigh how composability works.
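Both flavours of "limited dispatch" mentioned here exist in plain Python. The sketch below uses functools.singledispatch, which dispatches on the type of the first argument only, and the __add__/__radd__ fallback that gives infix operators a form of double dispatch. Meters, Feet, and describe are hypothetical names for illustration.

```python
from functools import singledispatch

@singledispatch
def describe(x):
    # Fallback when no more specific implementation is registered.
    return "something else"

@describe.register
def _(x: int):
    return "an integer"

@describe.register
def _(x: list):
    return "a list"

class Meters:
    """Hypothetical unit type used to show operator double dispatch."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        # Returning NotImplemented lets Python try other.__radd__(self).
        return NotImplemented

class Feet:
    """Hypothetical second unit type that knows how to combine with Meters."""
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        if isinstance(other, Meters):
            return Meters(other.value + self.value * 0.3048)
        return NotImplemented

total = Meters(1.0) + Feet(10.0)
print(describe(3), describe([1]), describe("s"), total.value)
```

Note that Feet could be added by a third party without touching Meters, but only for operators, and only because Meters was polite enough to return NotImplemented. That is the "if someone tries hard enough" part.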
This is, itself, a reason for Julia's composability. Every Turing-complete language can implement any feature from any other language… except performance.
I think that having user-defined structs and functions as capable and performant as language "builtins" is a big part of the composability story. Multiple dispatch helps but I wouldn't dismiss performance. As Jeff says, "performance is actually special".
Given that the premise of the OP is to show that Julia is more composable, I wouldn’t linger too much on the “invention” part.
Instead, quantitatively speaking it's definitely more composable (e.g. matrix operations can be readily done with any custom <:Number type, which is not the case for numpy, if I recall correctly).
I’m afraid that what one needs to do to address the OP would be to roll up their sleeves, and curate all the provided links. Quite time-consuming, though…
Kind of. It started from wrapping an optimized BLAS library, which can't be JIT compiled with custom types, but it is possible to implement more generic methods, see the example in the NumPy 1.17.0 Release Notes, "Support of object arrays in matmul". The same thing happens when Julia wraps a statically compiled library (BigFloat FFT in Julia - Stack Overflow).
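The "more generic methods" idea amounts to a fallback that only assumes the elements support + and *. Here is a minimal pure-Python sketch, shown with fractions.Fraction elements. The generic_matmul name is hypothetical, and this is not a claim about how NumPy implements matmul for object arrays.

```python
from fractions import Fraction

def generic_matmul(a, b):
    """Multiply two matrices given as lists of rows; elements only need + and *."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    result = []
    for row in a:
        out_row = []
        for j in range(cols):
            # Start from the first product so no type-specific zero is needed.
            acc = row[0] * b[0][j]
            for k in range(1, inner):
                acc = acc + row[k] * b[k][j]
            out_row.append(acc)
        result.append(out_row)
    return result

half, third = Fraction(1, 2), Fraction(1, 3)
m = [[half, third], [third, half]]
print(generic_matmul(m, m))
```

The trade-off is the one described above: the generic path works for any element type but runs at interpreter speed, while the fast path is tied to the element types the compiled library was built for.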
That’s kind of the spirit of inherent advantages. Composability exists in many languages with different perks and drawbacks, so if we throw out opinion, we must talk about something novel. You are right that it’s nitpicking because opinions are what really matter. Composability is good? Performance of user-defined structs is good? Those are opinions based on our needs. If we’re programming a microcontroller with 64 bytes of RAM, a garbage collector and a JIT compiler being necessary to support language features is suddenly a needless drawback to constantly work around.
The nice part about this actually being a matter of opinion is we’re only dealing with a context that Julia is basically designed for: scientific computing and machine learning, Python being an explicitly named competitor. That bolsters the point that it’s not realistic to expect a neutral academic outsider to assess languages with such a context in mind. A Julia enthusiast with strong experience in ML and Python is actually the perfect person for demonstrating how composability is more practical in Julia and in a way that matters for ML development, not just broad strokes and toy examples.
I have a totally different impression: I think that whenever arrays are involved, idiomatic Julia code tries very hard to stick to the AbstractArray interface, and provides special casing (eg for Array) as optimizations when applicable.
In other words, most generic Julia methods that take Arrays will be perfectly happy with UnitRanges or whatever.
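A loose Python analogue of coding to AbstractArray is to target collections.abc.Sequence rather than list. In this sketch mean_abs_diff is a hypothetical function, and Python's range plays roughly the role UnitRange plays in Julia: a lazy sequence that never materializes its elements.

```python
from collections.abc import Sequence

def mean_abs_diff(xs: Sequence) -> float:
    """Mean absolute difference between neighbours; needs only len() and indexing."""
    if len(xs) < 2:
        raise ValueError("need at least two elements")
    total = sum(abs(xs[i + 1] - xs[i]) for i in range(len(xs) - 1))
    return total / (len(xs) - 1)

print(
    mean_abs_diff([1, 4, 9]),
    mean_abs_diff((1, 4, 9)),
    mean_abs_diff(range(0, 10, 2)),
)
```

The analogy is only partial: the Sequence annotation is not enforced at runtime and there is no way to add a specialized fast method for one concrete type without editing the function, which is the part Julia's dispatch handles.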
That said, I would shy away from statements like
Not because I don’t think it is true, but because it is hard to quantify or measure. Most people who like Julia have just tried it and it worked for them. (Yes, I understand that you want something for the application, but if writing it in Julia vs Python is the only selling point, I am not sure that is a strong argument).