Julia motivation: why weren't Numpy, Scipy, Numba, good enough?

Royi, here’s how I understand it: “yes, it [will be] achievable” in Julia, but there won’t be default magic solutions. So: if you are willing to think about and add (possibly technical) decorations, you will be able to achieve high optimisation; if you want automatic magic, you will need extra packages. (Right now, threading seems to be technical/magical, but, as the design matures, some of that magic may become dependable and reliable in the future.)

My summary of this thread would be: in Python you can work hard to make some vanilla parts fast; with Julia you can make anything fast.

But Julia will make it easy, not do it for you.

4 Likes

Yes, I already answered that. Write code in a vectorized form in Julia, put using ParallelAccelerator at the top of your code, add the @acc decoration, and you will get highly optimized SIMD + Threading C code. You don’t need to wait for the future: you can do that today!

People coming from MATLAB, Mathematica, and somewhat Python and R (due to their expansive Base/SciPy libraries) seem to have a very expanded sense of “what is the core of a scientific computing language”. Julia early on did as well, but there has been a large recognition that there is no reason for everything to be in Julia’s Base. Instead, scientific computing revolves around the many different algorithms which specialize to the problems people are trying to solve, and this massive number of algorithms is better handled by a robust package system. Since Julia doesn’t privilege Base (types defined in packages have the same low-to-no overhead as those in Base), these things might as well be packages, separated from the Core of the language. Since using a package only takes 1 LoC (Pkg.add("ThisPackage"); using ThisPackage), it’s not like it really adds user burden. The question that should be asked is no longer “what could be added to Base” as much as it is “what should be or needs to be part of Base?”.

There doesn’t seem to be a solid line drawn for “what should Base be”, but, for example, in v0.6 numerical quadrature is no longer part of Base; it is now the well-functioning, well-maintained package QuadGK.jl, essentially no different from when it was in Base. Again, some people coming to Julia might go “but I expected Julia to just have a fast, parallel, …, method for numerical quadrature, eww why is it a package?”. I think the answer is along the lines of: why not make it a package? There was nothing about Base that required it to be there, and most codes don’t use it. And going further down the line, even things like linear algebra may be moving out to packages, except as “default packages”. So any idea that “the answer is in a package, so it’s not really answered by Julia” is very wrongheaded in some sense, since these packages are written in Julia themselves and are no harder to use than Base code.

But further discussion on this topic should be a different thread. There has also been a lot already written on this topic, so you may want to search around Google for “default packages” and “standard library” in Julia.

1 Like

@ChrisRackauckas, I really appreciate your effort, but you’re totally missing my point.

I’m talking about the core language, and the core language, from my point of view, is arrays and the basic math operations (in multidimensional space, like linear algebra).

Namely, C isn’t a base for me, just because operations on arrays aren’t built in.

Now, this is the context of my question and it was a theoretical one.
Namely, how much data from the programmer (explicit and implicit) does Julia need in order to create perfectly optimized machine code?

Is that amount of data something reasonable to have in a few lines of code?

This is the question.
It has nothing to do with packages and please don’t refer me to those.
I’m asking about the concept of Julia.

The way I see it, the developers’ concept was to see how far we can simplify (make user friendly) the data a compiler needs to create the most efficient code.
If in the end Julia teaches the world that this amount of data is only a few decorations and language assumptions, it will be huge.

If not, then probably the standard way of doing specific things fast will stay with us until the next time someone tries it.

2 Likes

Isn’t it getting a bit off-topic now? I appreciated the discussion related to Python. Maybe the posts starting with "Going forward, do you think it is achievable, in the future, to write code in its Vectorized Form + Decorations in Julia and have performance of highly optimized (SIMD + Threading) C Code?" could be split into their own thread?

3 Likes

@swissr, I think it is related to Python.

Python (and maybe MATLAB as well) does things a little differently.
They optimize hot spots in the language.

Julia is trying to be born fast.
The question is: does trying to be fast overall mean that it will also be as fast as the others in the hot spots?

MATLAB and Python just write things in C (whatever needs to be fast).
Julia is more like an intelligent framework for telling a compiler how to create efficient code from simple, intuitive syntax.

If you address the OP’s question, it seems there is logic in the way Julia chose to do things only if it ends up being faster.

3 Likes

The very simple answer is, if you use the -O3 flag, then the compiler already automatically will use SIMD when appropriate (try a simple loop. Make sure you’ve re-built the system image or made Julia from the source). As for adding threading, the answer looks like it will be some macro (and you know about this since you’ve commented in the issue):

But…

What I am trying to get across is that those things will actually be packages. There are already many discussions about how to move the linear algebra parts out to a package, and it looks like it will be done before 1.0 (it’s waiting on the implementation of a system for default packages). Even the basic math functions like sqrt: Julia currently uses a version of OpenLibm, but there are some very advanced experiments with Julia implementations like Libm.jl and Amal.jl for replacing all of this basic math with packages (some packages may be loaded by default, but they will still just be packages).

So what I am really trying to hammer home is that the idea that “no it’s not Julia if it’s a package, let’s only talk about ‘Julia’” as though there is some privileged Base Julia which rules above all others: that idea is very wrongheaded. I think it’s safe to say that very soon, all of what you think of as “basic math operations” will be “default packages”: just standard Julia packages which by default are loaded into the system image.

So yes, even your examples of things that do not involve packages actually refer to what will be packages. That means I think it’s safe to say you cannot talk about Julia as though it’s isolated from its package ecosystem. I think at this point, the concept of Julia includes packages, as seen by how things which were considered essential enough to be in early Julia have now moved out into packages. In that sense, you can already do what you’re asking in Julia using ParallelAccelerator, and the equivalent to Python+Numba in Julia is actually Julia+ParallelAccelerator.

Yes, that question hijacks the thread and all responses after it should be a separate thread about the future of auto-optimizing (SIMD, multi-threading, etc.) Julia code. Auto-optimizing Julia code a la Numba is something related to “making Julia fast”, but in some sense that is more about making Julia match hardware, not the design of Julia itself and why it makes sense as a way to specify scientific programs.

5 Likes

A key point around the question of “will julia get various advanced optimizations?” is our philosophy and priorities. If a language+compiler can optimize more programs more effectively, we consider it better, and we want it. Somewhat surprisingly, not everybody agrees with this. The other camp is the “scripting language” camp, which says “programs should be written in two languages” (this is a John Ousterhout quote). You have a performance language, and a scripting language. Any thought to optimizing the “scripting” language is almost by definition a waste of time, since that’s what the performance language is for. This is where Python’s occasional “fast enough for me” attitude comes from.

Unfortunately, with the hardware changes happening now, I see a potential new “two-language problem” around accelerators like Halide, TensorFlow, ArrayFire, ParallelAccelerator, etc. We are starting to need optimizations that don’t naturally fall out of what we know about functional and object-oriented programming. Maybe we don’t know how good we had it — when all you needed for performance was C, well at least that’s a powerful, widely-adopted, general-purpose language. Now, we are willing to cram whatever hacks are needed into julia to get the best performance, but I have not yet seen a satisfying, disciplined approach to this problem.

23 Likes

Halide is also first compiled to a lowered form and then jitted. Maybe it can be incorporated into Julia, perhaps replacing BLAS, making Julia completely independent (written completely in Julia).

1 Like

The way I (possibly incorrectly) understand it, C++ actually also goes a long way towards having a kind of multiple dispatch. For example, the Julia code:

foo(x) = 1
foo(x::Int) = 2

Would be in C++:

template<typename T>
int foo(T&& x) { return 1; }

int foo(int x) { return 2; }

The fact that this is so much easier to do in Julia is what convinced me that, coming from using C++ for high-performance scientific computing, Julia is “C++ done right”. It allows multiple dispatch and metaprogramming with a sane syntax (every other word doesn’t have to be template or typename) and understandable error reporting.

6 Likes

Multiple dispatch refers to the ability to dispatch on more than one argument: https://en.wikipedia.org/wiki/Multiple_dispatch

I believe dispatch on more than one argument is also possible in C++. However, since metaprogramming in C++ is Turing complete, it has many more ambiguities, since it’s in general undecidable which overload is more specific.

Yes, to complete my example for that case:

template<typename T1, typename T2>
int foo(T1&& x, T2&& y) { return 1; }

template<typename T>
int foo(int x, T&& y) { return 2; }

int foo(int x, double y) { return 3; }

The ambiguities are indeed a problem and you may wind up doing things like using templated structs wrapping methods to work around it. For real-world code, this quickly becomes extremely difficult to read, and compiler errors can run into the thousands of lines that seep through to the user of the library.

Yes. C++ overloading is multi-argument but is also all static — it only works on types known to the compiler. If you want dynamic dispatch, you need to use virtual methods, which dispatch on a single argument and use different syntax. Dynamic dispatch appears to be important. People want to just throw whatever they have at some functions, and have the right thing happen, regardless of whether the code can be statically typed. But in C++ static type errors are not even the worst of it: the compiler will pick an overload to call based on whatever type it can determine, which can be different from the run-time type. That doesn’t seem to be a huge problem in practice, but it’s very unsatisfying. Only the ground truth of what an object actually is should matter, not compiler limitations.

21 Likes

For a little personal historical perspective:
I first learned about multiple dispatch about 20 years ago in college when taking an AI course that required Common Lisp. Although Common Lisp itself doesn’t have OO, the extension (via macros) CLOS (Common Lisp Object System) is a form of OO with multiple dispatch (BTW: I think Julia got its inspiration from CLOS). I remember when I first learned the CLOS OO way I thought “wow this is so much more elegant and powerful than C++'s way”. But like most cool things you learn in college this remained a one-time experience, and in professional life OO was the C++/Java/C#/… way.

About 10 years ago I learned Clojure, and as a Lisp descendant it also has multiple dispatch (called multimethods). So when learning about multimethods (which are totally alien to Java programmers) I remembered my Common Lisp experience and recognized it as the system from CLOS. When I learned R some years later I recognized the Lisp influence and the multiple dispatch (in the S4 system).

So when I first learned Julia the multiple dispatch feature wasn’t unfamiliar and didn’t come as a shock, and I was fine with the fact that Julia’s OO isn’t the obj.method syntax (unlike some people who come from Python and regularly demand this).

What is different with Julia from those older systems is that multiple dispatch is a much more central feature of the language and the libraries, whereas in the other systems it’s more “bolted on”.

8 Likes

I’ve noticed this too, and I think it’s a concern and an important problem. The most optimistic scenario I’ve come up with is that these kinds of optimizations are perhaps best incorporated into user-level julia code via a set of sophisticated iterators. For example, in ImageFiltering I’ve been able to achieve some of the same kinds of “fusion” as Halide with surprisingly few lines of code. The core element of this approach is itself a small package, TiledIteration, which I think does constitute a reusable nugget of ideas for moving forward in this problem space.

From the standpoint of implementing this kind of thing more broadly and for “micro” scale computations, a crucial optimization will be the whole stack vs heap question for structs that contain “pointers” to heap-allocated memory: these iterators need to create a crazy number of wrappers for operations working on small chunks of arrays.

15 Likes

2 posts were split to a new topic: Stack allocation for structs with heap references

The way I see it, there are 2 approaches today:

  • Compiled Language
    The user must give the compiler as much data as possible using the language syntax. Any gray area left is up to the compiler to decide. Given the explicit information and worst-case assumptions, the compiler tries to generate the best code possible.
    These languages are usually improved by adding syntax that lets the user give the compiler more and more data (for instance, #pragma in the case of OpenMP). In this world performance is usually a function of how much information the compiler has (this is, for instance, why FORTRAN was fast in the past: the compiler had a lot of information and could optimize aggressively).
    Yet the compiler will never have the same amount of data as one could have at run time. Hence there will always be some “gray” area.

  • Interpreted Language
    Very friendly to develop yet slow.
    These use the JIT concept to make things faster yet still, usually, don’t try to get information from the user to make better assumptions. Performance hot spots are programmed using Compiled Languages (MATLAB, Python).

Julia, as the solution to the two-language problem, takes things one step forward (merge and improve).
It uses the JIT framework, along with data supplied by the user (decorations, macros, etc.), to give an efficient compiler more data so it can optimize code better.
Since it is done JIT, more information can be inferred, which can make things even faster than static compilation.
So behind the scenes we have a “compiled” language with a very sophisticated compiler, plus the extra information which can only be inferred at run time.
This is brilliant…

Now, the challenge is to keep the language intuitive and user friendly while extracting as much data as possible from the user and the JIT engine.

This solves the two-language problem.
Though it is only a “vision”, doing it in a coherent and disciplined manner is the big challenge here.

1 Like

What about recording interations like in reverse diff?

This is wildly off-topic. Please respect that a thread has a subject and stick somewhat to it.

4 Likes

Different for me. I saw a post saying look at our great new numerical language “Julia”. I thought, ah another one, best of luck to you. A couple of years later, Andy Ferris, who worked down the hall from me, said he had abandoned other languages for Julia. I know his background so I took this very seriously; that was the key factor. I then spent a couple of hours coding and running a small test problem in C++ and Julia (including downloading, building, learning). I did not believe the results. After I convinced myself that I hadn’t made a mistake and what I saw was real, I immediately dropped everything else and never looked back.

28 Likes