Julia motivation: why weren't Numpy, Scipy, Numba, good enough?

True, at various conferences (like at the ODSC - Open Data Science ones), I’ve found Python programmers very open to learning something that may make their lives easier.
(I had a brief flirtation with Python before I discovered Julia, and there are still many things in Python [in v3.5+] that I like very much and want to implement for Julia.)

1 Like

Absolutely!

I recently gave a talk about Python internals and how to speed Python up at an advanced (Python) programming school for PhD students and postdocs (mainly physicists with no CS background). I brought up some speed comparisons between bare Python, Cython, Numba and Numpy and always showed a Julia version as a final touch. After the talk and during the hands-on sessions I had prepared, there was great interest in Julia. I’d say the majority of the participants were extremely curious, and I know a few of them who told me afterwards that they had started using Julia.
Actually the very first question after the talk was: “Would you recommend to use Julia instead?” :smiley:

I think Python coders who try to learn how and when to use all those additional libraries are quite impressed when they see a language which is as readable as Python and has no such problems. After all, you have to understand quite a lot of internal logic to get a feeling for how Numba/Numpy/etc. perform.

Btw., most of the people I talked to about Julia were totally scared by the fact that “there are no classes in Julia”. Well… :wink:

13 Likes

I am a Python user and I have to admit that I don’t see why I would add Julia, another language, to my workflow.

I feel comfortable with Python and its expressiveness.

For speed, I used Cython but now (2017) I tend to use more Pythran (https://github.com/serge-sans-paille/pythran), which is very impressive.

Pythran compiles simple Python functions that use numpy and scipy into optimized C++. A very nice thing is that you don’t need to decompose vectorized code into loops: Pythran is fully aware of numpy syntax.

It would be nice if the Julia folks added Pythran to their benchmarks.

Of course there are limitations to Pythran, partly because it is a very young project. For example, you cannot call h5py or mpi4py functions in the functions that you want to pythranize. But for a Python/C/C++/Fortran programmer, it is nice to write a little bit of Cython sometimes.

Remark: regarding the project of writing a Julia code for isotropic turbulence in a triply-periodic box, you can have a look at this code on Bitbucket, which is a Python code doing the same thing (with mpi4py, Cython and Pythran). It is pretty efficient compared to the equivalent Fortran or C++ code.

1 Like

That’s fine, of course – if you are happy with your current tools and workflows, there is obviously no strong motivation to change! Since you just joined, for what it’s worth, you may be interested to know that the author of ChannelFlow is a regular commenter here, and seems quite taken with Julia.

And, given your interests, projects like the ones below may tempt you to consider Julia again in the future :wink:

14 Likes

I think a lot of people came to Julia because of the performance, but stayed for the metaprogramming, the type system and the conciseness of Julia.

My favorite article illustrating this is “Defining Custom Units in Julia and Python” by Erik Engheim (Medium).

19 Likes

This is unlikely to be done by Julia developers because the additional complexity of benchmarking niche language implementations is hard to justify when Julia already benchmarks within a reasonable margin of C and Fortran. Updating the benchmarks is already a bit tricky because they need to be run on the same machine with each implementation installed and updated to the latest version (otherwise people complain!).

But surely some people would be interested if you published a blog post about Pythran implementation/translation and comparison benchmarks you run yourself.

2 Likes

As was commented by several people earlier in this thread, the performance advantage of Julia (unlike Cython, Numba, or Pythran) is that good performance is not limited to a single “built-in” container type (NumPy arrays) of a small set of built-in scalar types, and “built-in” vectorized functions recognized by the compiler. In Julia, you can get high performance in code that fully uses polymorphism, user-defined types, user-defined containers, and user-defined vectorized functions (or without using a vectorized style at all).

22 Likes

Hey Pierre,

I am a Python enthusiast too and I write a lot of Python code (also a core analysis framework) for a neutrino telescope experiment, using a mix of numpy, cython, numba and direct calls to other C/Fortran code written in our collaboration.

The point is: you can of course always make Python almost as fast as C (mostly the algorithmic part, and you also have to sacrifice a lot in favour of contiguous, typed arrays), but something I had to realise is that Julia is simply another approach to software design. It’s like trying functional programming, which provides completely different kinds of solutions compared to OOP: Julia’s multiple dispatch will give you an ultimately distinct view on how to model your data and logic.
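To make the dispatch point a bit more concrete from the Python side, here is a minimal sketch. The `Meters` and `Feet` classes and the `add`/`add_md` functions are hypothetical names invented for this illustration. The standard library's `functools.singledispatch` selects an implementation from the type of the first argument only, so the second argument has to be inspected by hand. The small registry below it is just one hand-rolled way to emulate dispatch on both argument types, and it is not how Julia implements it.

```python
from functools import singledispatch


class Meters:
    def __init__(self, value):
        self.value = value


class Feet:
    def __init__(self, value):
        self.value = value


# singledispatch chooses the implementation from type(a) only.
@singledispatch
def add(a, b):
    raise TypeError(f"no add method for {type(a).__name__}")


@add.register(Meters)
def _add_meters(a, b):
    # The type of the second argument must be checked manually.
    if isinstance(b, Meters):
        return Meters(a.value + b.value)
    if isinstance(b, Feet):
        return Meters(a.value + b.value * 0.3048)
    raise TypeError(f"cannot add Meters and {type(b).__name__}")


# A hand-rolled registry keyed on the types of BOTH arguments (illustrative only).
_methods = {}


def method(*types):
    def decorator(fn):
        _methods[types] = fn
        return fn
    return decorator


def add_md(a, b):
    fn = _methods.get((type(a), type(b)))
    if fn is None:
        raise TypeError("no matching method")
    return fn(a, b)


@method(Meters, Meters)
def _mm(a, b):
    return Meters(a.value + b.value)


@method(Meters, Feet)
def _mf(a, b):
    return Meters(a.value + b.value * 0.3048)


total = add_md(Meters(1.0), Feet(10.0))
```

Note that the registry does an exact type lookup at run time on every call, with no subtype handling and no compile-time specialization, which is a large part of what the Julia version gets for free.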

I have to echo @sdanisch’s suggestion to take a look at https://medium.com/@Jernfrost/defining-custom-units-in-julia-and-python-513c34a4c971 which showcases two totally different – I might say orthogonal – implementations (one in Python, one in Julia) of a simple problem.

The first thing an experienced Python user would think of while reading the Python part is obviously to replace the class implementations with Cython code or try Pythran (Python 3 support is still beta), or to ignore all of that, look at the heavy calculation part of the code and try to squeeze things into numpy recarrays to vectorise operations, utilising numba’s @jit decorator etc.

In contrast, with Julia you solve the problem in a way which is simply “not doable” in Python and you automatically have performant code, without thinking much about it.

In my opinion, Python users who are still convinced that Julia “just” solves the “two-language problem” that they work around by “patching” Python code with numpy/cython/numba/pythran/whatever should try to understand the vast range of new implementation possibilities provided by the multiple dispatch and metaprogramming features of Julia.

There is still quite a lot to explore.

22 Likes

No you can’t, unless you count mixing Python with custom code in a lower-level language (e.g. Cython or C) as “Python”. There are plenty of real-world problems that just can’t be vectorized or otherwise shoehorned into existing libraries.

5 Likes

I think, to be more specific, of course you can make some operations on arrays of 32/64-bit floating point numbers efficient by imposing that restriction and optimizing directly on that. Julia was one of the first dynamic languages to do it, but LuaJIT, JavaScript, etc. all do it as well. Numba, Pythran, etc. are doing that now too, great! It’s old hat though. I can tell you an easy strategy for it: just transcribe to a compiled language, put type specifiers saying stuff is of the types that you know, and compile it. If you then restrict what you’re allowed to compile, yay you’re done and now you have Pythran!

Being able to optimize that automatically is the smallest show of what Julia is all about.

In Julia, there’s nothing special about operations on arrays of 32/64-bit floating point numbers. Compilers like Numba and Pythran specifically handle these, but in Julia a float is just a user-defined number type that happens to be in Base. The same specialization happens for arrays of ArbFloats from the ArbFloats.jl package, or quaternions, or Unitful.jl numbers, or SymPy/SymEngine expressions, etc. The same magic for optimizing dense arrays happens for sparse arrays, GPU arrays, and my crazy multiscale arrays for biological models. While some people are still touting that they got simple loops to do pretty well in a dynamic language, in Julia we’re writing generic codes that are automatically becoming GPU-compatible with adaptive time and adaptive space built in via the array types… this is so not the same thing as a simple loop of floating point numbers!

And it comes down to the fact that you can’t always make Python as fast as C. There are some things that Python does that are antithetical to optimization. For example, you can add fields to Python classes at any time, so what a Python object holds can change at any point in your code. How do you optimally pack Python classes if you don’t even know what’s in there? Of course, you can cut all of that stuff out of the Python language (it’s not like that kind of stuff makes legible code anyways), and then, to help with type-inferrability, add a robust type system and multiple dispatch along with auto-specialization of generic functions, so that everything is always concrete without much change to the syntax… but then hey, you get Julia.
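The point about fields being added at any time can be seen in a few lines of plain Python. This is a minimal sketch with hypothetical class names (`Particle`, `SlottedParticle`). It also shows `__slots__`, a standard Python feature that fixes the set of instance attributes. That illustrates the "cut that stuff out" direction, though on its own it does not make CPython code compile to tight machine code.

```python
class Particle:
    def __init__(self, x, v):
        self.x = x
        self.v = v


p = Particle(0.0, 1.0)
# A new field can be attached long after construction, so the set of
# fields an instance holds is only known at run time.
p.charge = -1.0
fields = sorted(vars(p))  # instance state lives in a per-object dict


class SlottedParticle:
    # __slots__ declares the complete set of instance attributes up front.
    __slots__ = ("x", "v")

    def __init__(self, x, v):
        self.x = x
        self.v = v


q = SlottedParticle(0.0, 1.0)
try:
    q.charge = -1.0  # not declared in __slots__
    rejected = False
except AttributeError:
    rejected = True
```

With the ordinary class, a compiler that wanted a fixed memory layout would have to prove that no code anywhere adds or removes attributes, which is the difficulty described above.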

11 Likes

I was using Python as a synonym for the whole Python optimisations world.
I just wanted to say that with enough patching, hacking and mixing you can achieve almost C speed, but in my experience that’s really only true if you are building on numpy. Custom classes in Cython never worked in performance-critical code (at least for me).

I totally agree with @ChrisRackauckas; he summarised it perfectly.

So yes, of course you are right, I should have been a bit more specific :wink:

1 Like

“Using the numpy base” is not nearly sufficient if you have a problem that doesn’t vectorize well. And there are lots of problems that don’t vectorize well. Almost all large-scale scientific packages in Python seem to end up relying on custom code in a lower-level language (even if it is “c/fortran with python-ish syntax” à la Cython or Numba).
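As a small, made-up illustration of what "doesn't vectorize well" can mean (this example is not from the thread), consider a loop in which every step depends on the previous one and the number of iterations depends on the data. The function name `collatz_steps` is just a name chosen for this sketch.

```python
def collatz_steps(n):
    """Count the steps the Collatz iteration takes to reach 1 from n."""
    steps = 0
    while n != 1:
        # Each update depends on the value produced by the previous
        # iteration, and the branch taken depends on that value too.
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps


counts = [collatz_steps(n) for n in range(1, 8)]
```

Within a single call there is no whole-array expression to hand to a vectorized library: the work is an inherently sequential scalar loop with a data-dependent exit. This is the kind of code that tends to be pushed into Cython, Numba or C in Python projects.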

2 Likes

I wonder if it’s provable that it will never work, rather than just that it doesn’t work now. There are also things that don’t work well/performantly in Julia at the moment, e.g. sorting.

That’s mixing up language discussion with package discussions. Whether or not there is a readily available fast radix sort for strings is orthogonal to whether someone can write one directly using the language. The point is that Julia’s design allows the compiler to optimize functions which act on arbitrary types including whatever string implementation. Julia’s design allows for in place memory operations and bit twiddling, along with direct LLVM calls. Someone can use that to write an efficient algorithm that’s Julia all the way down in a way that you cannot in something like Pythran which is limited to a small set of operations on a small set of types.

That’s the whole point: it’s a language which is a tool to build algorithms, not necessarily just a packaged set of algorithms for common scientific problems. R and Python make up for language-level issues with packages which hopefully do everything you need, which is great until they don’t. Julia of course doesn’t have everything filled out as much as R or Python in terms of data libraries because it’s much younger, but that’s not the question. The question is: what language is it easier to build such libraries in? And that’s where I am confident the answer is Julia.

Whether the wider scientific community can break out of the local minimum of good packages in non-performant languages backed by loads of C++ code is something to be determined. Even if there isn’t as much adoption in Julia, that’s not going to make someone like me spend the gigantic amount of extra time it would take to build a good R or Python library. It’s that power, the fact that it’s orders of magnitude easier to do package development in Julia, that will drive Julia’s long-term success, not the current state of DataFrames vs Pandas. So package discussions are interesting in their own right for package development, but they’re not directly related to why a developer should choose Julia.

22 Likes

As PyPy.js is faster than CPython, I think Guido prefers clarity over optimizations in the default Python implementation.

Just stumbled upon this Stackoverflow question (python - Why is numba faster than numpy here? - Stack Overflow). The top answer illustrates perfectly the reason why Julia is superior for many of these numerical tasks.

7 Likes

Really nice discussion between Nir Friedman and @DNF.

The example he shows, in which LLVM can in some cases avoid temporaries automatically, is really nice (if it is LLVM and not something built into the type; I’m not an expert on that).

I think the correct policy is that the language should make an effort to avoid temporaries in every case, while also (which is the Julia style) letting the user inject prior knowledge (as with the `.` syntax) to make things even simpler for the compiler.

Yes, but in the future the Arrow compute engine (something similar to TensorFlow, but for dataframes) will solve this in a cross-language way (for Python, R, C/C++ and any other language with a binding).

Oof, that certainly doesn’t look like a nice discussion to me (“I’ve given you way more than enough of my time” at the end). It’s mostly just confusion about what the terms ‘temporary’ and ‘allocation’ mean. FWIW, Nir Friedman is right that stack allocation is still allocation and a stack-allocated scalar temporary variable is still a temporary variable, but @DNF is of course right that heap allocation of temporary arrays is a main source of performance degradation, whereas stack allocation of scalar temporary variables is (at least typically) not.
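To make the distinction between the two kinds of temporary concrete, here is a rough sketch in plain Python, using lists as a stand-in for arrays. The function names are made up for this illustration. The first version mimics how an expression like `a * b + c` is typically evaluated on NumPy arrays, one whole-array operation at a time with an intermediate array in between. The second is the fused loop, where the only temporary is a scalar.

```python
def vectorized_style(a, b, c):
    # First pass materializes a whole temporary container for a * b ...
    tmp = [x * y for x, y in zip(a, b)]
    # ... and a second pass reads it back to add c.
    return [t + z for t, z in zip(tmp, c)]


def fused_style(a, b, c):
    # One pass; x * y is only a scalar temporary inside the loop body.
    return [x * y + z for x, y, z in zip(a, b, c)]


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [7.0, 8.0, 9.0]
same = vectorized_style(a, b, c) == fused_style(a, b, c)
```

Both versions return the same values. The difference is the extra full-size `tmp` allocation and the second traversal of memory, which is the cost that loop fusion (by a compiler, or by Julia's dot syntax) is meant to remove.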

3 Likes

Yes, it went on way too long. Mainly, I wanted to address the last paragraph of the answer (about the source of slowdown/speedup), which was also the topic of the question. But it devolved into a back-and-forth on what a ‘temporary’ is.

1 Like