This is exactly the purpose Julia's built-in array type serves. Instead of building their own somewhat numpy-compatible APIs (that's what JAX is; it is not a drop-in replacement), Julia libraries can rely on Array/AbstractArray with zero additional overhead.
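To make "rely on Array/AbstractArray" concrete, here is a toy sketch of my own (`colmeans` is a hypothetical helper, not a Base function): one generic function that works unchanged on a dense Array, a view, and a lazy reshape, because all three are AbstractArrays.

```julia
# A hypothetical generic helper: no per-library array API needed.
colmeans(A::AbstractArray) = sum(A; dims=1) ./ size(A, 1)

A = rand(3, 4)            # a plain dense Array
V = @view A[1:2, :]       # a SubArray (view), also an AbstractArray
R = reshape(1:12, 3, 4)   # a lazily reshaped range, also an AbstractArray

# The same method works on each; Julia compiles a specialized
# version for each concrete type behind the scenes.
colmeans(A)
colmeans(V)
colmeans(R)
```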
Going a bit more abstract, multiple dispatch is a means to an end, not an end in itself. Yes, part of why Julia can generate such fast code is that multiple dispatch allows for specialization on concrete types. However, that does not mean that a) Julia can't generate fast code without multiple dispatch, or b) that compile-time latency would be eliminated if multiple dispatch were removed. I find it helpful to think about things this way:
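For a concrete picture of the specialization point, here is a toy example of my own (not from any library): a single generic function for which Julia compiles separate native code per concrete argument type.

```julia
using InteractiveUtils  # for @code_native in scripts (preloaded in the REPL)

double(x) = x + x

double(1)      # triggers compilation of a specialization for Int64
double(1.0)    # triggers a separate specialization for Float64

@code_native double(1)    # integer add instructions
@code_native double(1.0)  # floating-point add instructions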
Imagine every Python function you wrote was automatically run through Numba. As you can imagine, the compilation latency would be painful even though there is no multiple dispatch going on. Why do I bring this up? Because that is (conceptually) Julia's compilation model. In this light, you can see how Julia is actually much "faster" than one would expect from naively JITing all code all the time.
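You can watch this model in action from any REPL: the first call to a function pays the compilation cost, and subsequent calls do not (the exact numbers are illustrative and vary by machine and Julia version).

```julia
f(x) = sum(sin, x)

@time f(rand(1000))   # first call: includes JIT compilation time
@time f(rand(1000))   # second call: runs the already-compiled code
```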
Now, what if you don’t need this aggressive auto-compilation (e.g. in your cron job)? Julia exposes mechanisms for saying “I don’t care about optimization, just run the code”. See this post for a quick overview.
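Two such mechanisms, sketched below (the flags and macro are real; `cronjob.jl` is just a hypothetical script name): the `--compile=min`/`-O0` command-line flags, and `Base.Experimental.@optlevel` for opting a whole module out of heavy optimization.

```julia
# From the command line, lower compilation effort for a one-off script:
#
#   julia --compile=min -O0 cronjob.jl
#
# Or mark a module as "just run it, don't optimize hard" in code:
module CronHelpers
    Base.Experimental.@optlevel 0
    tidy(path) = foreach(println, readdir(path))
end
```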
My intent here is not to write a "you're holding it wrong" post about pre-runtime latency in Julia. I use Python almost exclusively for my own work, because Julia's current ML stack doesn't offer enough ROI to offset its tradeoffs for my particular use cases.
That said, I think it is good to clear up misconceptions about how Julia works, and why the statement above is categorically false outside of the narrow domains that some of us work in. I'll just close by saying that you may find Julia's approach far closer to the Unix philosophy than the "numpy-shaped island/silo per library" model in Python land.