Julia motivation: why weren't NumPy, SciPy, Numba good enough?

A key point around the question of “will julia get various advanced optimizations?” is our philosophy and priorities. If a language+compiler can optimize more programs more effectively, we consider it better, and we want it. Somewhat surprisingly, not everybody agrees with this. The other camp is the “scripting language” camp, which says “programs should be written in two languages” (this is a John Ousterhout quote). You have a performance language, and a scripting language. Any thought to optimizing the “scripting” language is almost by definition a waste of time, since that’s what the performance language is for. This is where Python’s occasional “fast enough for me” attitude comes from.

Unfortunately, with the hardware changes happening now, I see a potential new “two-language problem” around accelerators like Halide, TensorFlow, ArrayFire, ParallelAccelerator, etc. We are starting to need optimizations that don’t naturally fall out of what we know about functional and object-oriented programming. Maybe we don’t know how good we had it — when all you needed for performance was C, well at least that’s a powerful, widely-adopted, general-purpose language. Now, we are willing to cram whatever hacks are needed into julia to get the best performance, but I have not yet seen a satisfying, disciplined approach to this problem.


Halide is also first compiled to a lowered form and then JIT-compiled. Maybe it could be incorporated into Julia, perhaps replacing BLAS, which would make Julia completely independent (written entirely in Julia).

The way I (possibly incorrectly) understand it, C++ actually also goes a long way towards having a kind of multiple dispatch. For example, the Julia code:

foo(x) = 1
foo(x::Int) = 2

Would be in C++:

template<typename T>
int foo(T&& x) { return 1; }

int foo(int x) { return 2; }

The fact that this is so much easier to do in Julia is what convinced me, coming from using C++ for high-performance scientific computing, that Julia is “C++ done right”. It allows multiple dispatch and metaprogramming with a sane syntax (every other word doesn’t have to be template or typename) and understandable error reporting.


Multiple dispatch refers to the ability to dispatch on more than one argument: https://en.wikipedia.org/wiki/Multiple_dispatch

I believe dispatch on more than one argument is also possible in C++. However, since metaprogramming in C++ is Turing complete, it has many more ambiguities, because it is in general undecidable which overload is more specific.

Yes, to complete my example for that case:

template<typename T1, typename T2>
int foo(T1&& x, T2&& y) { return 1; }

template<typename T>
int foo(int x, T&& y) { return 2; }

int foo(int x, double y) { return 3; }

The ambiguities are indeed a problem and you may wind up doing things like using templated structs wrapping methods to work around it. For real-world code, this quickly becomes extremely difficult to read, and compiler errors can run into the thousands of lines that seep through to the user of the library.
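
To make the overload ranking concrete, here is a minimal sketch that reuses the three `foo` overloads from the post above and adds a hypothetical pair, `bar`, which is not from the thread and exists only to show an ambiguity. Everything is resolved at compile time from the static argument types.

```cpp
// The three overloads from the example above.
template <typename T1, typename T2>
int foo(T1&& x, T2&& y) { return 1; }   // fully generic fallback

template <typename T>
int foo(int x, T&& y) { return 2; }     // first argument pinned to int

int foo(int x, double y) { return 3; }  // both arguments pinned

// Hypothetical pair (an assumption for illustration, not from the thread):
// neither template is more specialized than the other.
template <typename T>
int bar(int x, T y) { return 1; }

template <typename T>
int bar(T x, int y) { return 2; }

// bar(1, 2.0) picks the first overload and bar(2.0, 1) picks the second,
// but bar(1, 1) does not compile: both overloads match equally well and
// the compiler reports the call as ambiguous.
```

With these definitions, `foo(1.5, 2)` selects the generic template, `foo(1, 'c')` selects the overload with the `int` first argument, and `foo(1, 2.0)` selects the non-template overload. Julia reports the analogous situation to `bar(1, 1)` as a method ambiguity, which is typically resolved by defining a method for the intersection of the two signatures.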

Yes. C++ overloading is multi-argument but is also all static — it only works on types known to the compiler. If you want dynamic dispatch, you need to use virtual methods, which dispatch on a single argument and use different syntax. Dynamic dispatch appears to be important. People want to just throw whatever they have at some functions, and have the right thing happen, regardless of whether the code can be statically typed. But in C++ static type errors are not even the worst of it: the compiler will pick an overload to call based on whatever type it can determine, which can be different from the run-time type. That doesn’t seem to be a huge problem in practice, but it’s very unsatisfying. Only the ground truth of what an object actually is should matter, not compiler limitations.
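
A minimal sketch of the static-versus-run-time mismatch described above. The names `Shape`, `Circle`, and `describe` are made up for illustration. The overloaded free function is chosen from the static type of the reference, while the virtual method is chosen from the object's actual type.

```cpp
#include <string>

struct Shape {
    virtual ~Shape() = default;
    // Virtual method: resolved at run time from the object's actual type.
    virtual std::string name() const { return "Shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "Circle"; }
};

// Overloaded free functions: resolved at compile time from the static type
// of the argument expression, not from what the object really is.
std::string describe(const Shape&)  { return "Shape"; }
std::string describe(const Circle&) { return "Circle"; }
```

Given `Circle c; const Shape& s = c;`, the call `describe(c)` yields "Circle" but `describe(s)` yields "Shape", even though `s` refers to the same object; `s.name()` still yields "Circle". In Julia, a method call on the same value would select the method for the value's concrete type in both cases.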


For a little personal historical perspective:
I first learned about multiple dispatch about 20 years ago in college when taking an AI course that required Common Lisp. Although Common Lisp itself doesn’t have OO, the extension (via macros) CLOS (Common Lisp Object System) is a form of OO with multiple dispatch (BTW: I think Julia got its inspiration from CLOS). I remember when I first learned the CLOS OO way I thought “wow this is so much more elegant and powerful than C++'s way”. But like most cool things you learn in college this remained a one-time experience, and in professional life OO was the C++/Java/C#/… way.

About 10 years ago I learned Clojure and as a lisp descendant it also has multiple dispatch (called multimethods). So when learning about multimethods (which is totally alien to Java programmers) I remembered my Common lisp experience and recognized it as the system from CLOS. When I learned R some years later I recognized the lisp influence and the multiple dispatch (in S4 system).

So when I first learned Julia the multiple dispatch feature wasn’t unfamiliar and didn’t come as a shock, and I was fine with the fact that Julia’s OO doesn’t use the obj.method syntax (unlike some people who come from Python and regularly demand this).

What is different in Julia compared with those older systems is that multiple dispatch is a much more central feature of the language and the libraries, whereas in the other systems it’s more “bolted on”.


I’ve noticed this too, and I think it’s a concern and an important problem. The most optimistic scenario I’ve come up with is that these kinds of optimizations are perhaps best incorporated into user-level julia code via a set of sophisticated iterators. For example, in ImageFiltering I’ve been able to achieve some of the same kinds of “fusion” as Halide with surprisingly few lines of code. The core element of this approach is itself a small package, TiledIteration, which I think does constitute a reusable nugget of ideas for moving forward in this problem space.

From the standpoint of implementing this kind of thing more broadly and for “micro” scale computations, a crucial optimization will be resolving the whole stack vs heap question for structs that contain “pointers” to heap-allocated memory: these iterators need to create a crazy number of wrappers for operations working on small chunks of arrays.


2 posts were split to a new topic: Stack allocation for structs with heap references

The way I see it, there are 2 approaches today:

  • Compiled Language
    The user must give the compiler as much information as possible using the language syntax. Any gray area left is up to the compiler to decide. Given the explicit information and worst-case assumptions, the compiler tries to generate the best code possible.
    These languages are usually improved by adding syntax that lets the user tell the compiler more and more (for instance, #pragma in the case of OpenMP). In this world performance is usually a function of how much information the compiler has (this is, for instance, why FORTRAN was fast in the past: the compiler had a lot of information and could optimize aggressively).
    Yet the compiler will never have as much information as is available at run time. Hence there will always be some “gray” area.

  • Interpreted Language
    Very friendly to develop yet slow.
    These use JIT compilation to make things faster, yet usually still don’t try to get information from the user to make better assumptions. Performance hot spots are programmed in compiled languages (MATLAB, Python).
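
As a minimal sketch of what “telling the compiler more” looks like in a compiled language, here is an OpenMP pragma on a reduction loop. The function name `sum_of_squares` is made up for illustration, and the sketch assumes an OpenMP-enabled build (for example `-fopenmp` with GCC or Clang); without it the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Sums the squares of the input values.
long long sum_of_squares(const std::vector<int>& v) {
    long long total = 0;
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(v.size());

    // The pragma carries information the compiler cannot safely assume on
    // its own: iterations are independent, and `total` is a sum reduction
    // that may be accumulated per thread and combined at the end.
    #pragma omp parallel for reduction(+ : total)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        total += static_cast<long long>(v[i]) * v[i];
    }
    return total;
}
```

For example, `sum_of_squares({1, 2, 3, 4})` gives 30 whether or not the pragma is honored; the annotation changes only how the loop may be executed, not what it computes.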

Julia, as a solution to the two-language problem, takes things one step further (merge and improve).
It uses the JIT framework, along with user-supplied information (decorations, macros, etc.), to give an efficient compiler more data so that it can optimize code better.
Since compilation happens just in time, more information can be inferred, which can make things even faster than static compilation.
So behind the scenes we have a “compiled” language with a very sophisticated compiler that has the extra information that can only be inferred at run time.
This is brilliant…

Now, the challenge is to keep the language intuitive and user friendly while extracting as much data as possible from the user and the JIT engine.

This solves the two-language problem.
That is only the “vision”, though; doing it in a coherent and disciplined manner is the big challenge here.

What about recording iterations like in reverse diff?

This is wildly off-topic. Please respect that a thread has a subject and stick somewhat to it.


Different for me. I saw a post saying look at our great new numerical language “Julia”. I thought, ah another one, best of luck to you. A couple of years later, Andy Ferris, who worked down the hall from me, said he had abandoned other languages for Julia. I know his background so I took this very seriously; that was the key factor. I then spent a couple of hours coding and running a small test problem in C++ and Julia (including downloading, building, learning). I did not believe the results. After I convinced myself that I hadn’t made a mistake and what I saw was real, I immediately dropped everything else and never looked back.


Thanks John - I’m glad it worked out for you :)

Thanks for all the helpful replies! I had a pretty good idea of the general principles of the answers, but I very much appreciate the comprehensive response from @StefanKarpinski, the historical background, the Measurements.jl examples, and the external links.

@gasagna: a pity we didn’t meet at KITP! I am definitely thinking of reimplementing channelflow in Julia (my C++ toolset for simulation and analysis of Navier-Stokes in channel geometries), or perhaps, as a simpler test case, isotropic turbulence in a triply periodic cube. But it’s not clear to me how to do the parallelism in Julia. If I understand correctly, MPI.jl is limited to julia-0.4, and there’s not yet a Julia interface to parallel FFTW. Do I even need MPI.jl, or would distributed arrays suffice?

Now I’m getting off-topic. Will post these questions over at Numerics.


This conversation may be a little dead, but I was also stumped by the “why not Numba?” question when I gave a talk on Julia recently. So I took the benchmarks from test/perf/micro and added @jit decorators to the functions in all of the Python benchmarks. This worked fine except for parse_int, which gave me an error that I didn’t understand. After running the benchmarks, this is what I get:

I am on a 2013 MacBook Pro with a quad-core Intel i7 (2.6 GHz) and 16 GB of RAM. The Julia benchmarks were run on 0.6.0-rc2.0. For Python/Numba, I used Python 3.5.2 and Numba 0.33.0.

This shows you that Python is definitely outperformed by Julia, but when the Numba compiler works, you can write Python code that is within a factor of three of Julia performance.

There have been many interesting points made above about Julia vs Python/Numba as languages, but in terms of performance it seems that Numba can be quite competitive in tasks that are important in the field of numerical computing.
