Ensure Julia is used to its full power

(oh dear, coming back from the bottom of this post, I really wanted to stay silent but could not resist getting this off my chest)

I always wonder about statements like this, and others like “Python has solved it with NumPy, Numba and JAX”. Another one is “modern Python in the worst case could be slow, but not by more than 2-3 times”. What is “modern Python”? I think you are referring to “modern Python” as the full stack of all those libraries and technologies which try to fix the root of the problem: Python is not a scientific language designed for high-performance computing.

I think that the myth of “modern Python” in that sense is a lie, and I am one of those who still try to believe it’s true, but I am starting to give up on that hope after about a decade of scientific HPC Python experience. I maintain a couple of low-level libraries which are used in astroparticle (neutrino) physics, where we deal with Petabytes of data (MC simulation and detector data), and I can tell you that in the past 8 years or so, we went through so many iterations, combinations and transitions of our tools, from Cython to PyPy to Numba, C++ interfaces, wrappers and whatnot, mixed things here and there. Fact is: everything is changing constantly, all the time, and we are chasing and trying to keep up with API changes and new kids on the block, and nothing is working at its full potential. Code which worked with Numba vX.Y breaks without any good reason in vX.Y+1. Then we face compiler issues because some library maintainer decided to drop C++14 support but we need to link against another library which is C++14; incompatible Boost Python bindings; the Python 2 vs 3 nightmare with str/unicode/bytes; Windows users who struggle to compile Cython; and I don’t even want to start talking about our ML stack with PyTorch and TensorFlow, oh dear…

No: NumPy is not a solution for complex, real-world problems, it’s part of a “solution” which has to be combined with many other things, like Numba or C++ interfaces to other low-level libraries, at least in the context of complex libraries. And no, Numba is not a solution either, and it is not at all like Julia; yes, both use LLVM, but I guess that’s where the similarity ends. The type inference is different and the whole feature set is extremely limited. It works nicely with toy examples, but it gets extremely complicated and unusable/unmaintainable with complex structures. It supports recursion only in a very limited way. Dictionary support is buggy and unpredictable, and the error reporting and debugging in general are horrible(!).

I am not impressed at all that Numba can speed up something as simple as this (taken from the examples in the Numba documentation):

from numba import jit

@jit(nopython=True)
def mandel(x, y, max_iters):
    """
    Given the real and imaginary parts of a complex number,
    determine if it is a candidate for membership in the Mandelbrot
    set given a fixed number of iterations.
    """
    i = 0
    c = complex(x,y)
    z = 0.0j
    for i in range(max_iters):
        z = z * z + c
        if (z.real * z.real + z.imag * z.imag) >= 4:
            return i

    return 255

Even though it manages to handle complex numbers. We need to process millions of hits from different types of PMTs and apply timing and position/rotation calibration to each of them, while they are housed in different optical modules which are in turn connected to different detection units and synchronised to each other via GPS clocks. Event triggering needs to be adapted to the constantly changing PMT rates, and the trigger algorithms are based on complex clustering and ML (deep learning, NN) algorithms. Numba’s documentation is awful, and the library is really not designed to be extensible. This has changed a bit over the past months, but it is still almost impossible to make it work with a well-structured class hierarchy.

So you end up with a combination of all the above-mentioned technologies, which is what supposedly makes Python so good in scientific computing. Of course you have to maintain low-level API compatibility with dozens of pieces of software written in different languages with very dynamic development cycles (this is definitely true for many scientific Python packages), and whenever someone new jumps in, they face a multitude of technologies and the project feels like a house of cards, even with >90% code coverage and as restrictive a dependency-change policy as you can imagine (which, btw., slows down the development process immensely).

We have a somewhat working solution with a fairly complex piece of software combining technologies/libraries like Numba, pybind11 with C++ libs, numexpr, Cython, awkward arrays (jagged arrays, which are mandatory in particle physics, based on NumPy) and a lot of tweaks in each of them, many of them calls to unstable or private functions which break here and there.
I can tell you that it’s a nightmare to maintain such packages, and the worst thing is that whenever a user needs more than the in-house implementations/solutions and wants to try something new, they need to understand all those connections, tweaks and hacks, and also grasp what Numba, JAX or whatever fancy low-level library is capable of, without sending the runtime through the roof by orders of magnitude.
…and still, if someone needs to write high-performance code, they need to switch to our C++ libraries, because we need more than arrays of int32 and float64 etc. Our DAQ framework, written in C++, has several hundred class definitions for detector components and signal types. So don’t tell me that Numba and NumPy will help to implement algorithms which deal with those… That would be a nightmare.

I am trying to build a parallel world in our astroparticle community with Julia software and to show how nice and transparent software can be: written in a single language, without the feeling that it’s a patchwork of multiple languages and approaches, and with performance and usability that are in general always better than our Python-wrapped solutions (which are in fact only high-level interfaces to a multitude of non-Python code), for many reasons. The above-mentioned hit-processing example can be nicely modelled with hierarchical types for PMTs, hits and calibrations, and the code is self-explanatory - see the sketch below. It runs on macOS (our DAQ software does not), Linux and even Windows (no DAQ there either, as you have guessed). Of course it only covers parts of the actual functionality, but you can do both low-level analysis and high-level analysis. I know what I am talking about, since I have written Julia software on both ends: real-time neutrino event reconstructions and also several high-level analyses on computing farms and grids with multi-terabyte datasets based on the ROOT data format.
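To give an idea of what I mean, here is a minimal sketch of such a type hierarchy (all type and field names are made up for illustration, this is not our actual code):

abstract type AbstractHit end

struct Position
    x::Float64
    y::Float64
    z::Float64
end

# raw hit, as it comes from the DAQ stream
struct Hit <: AbstractHit
    pmt_id::Int32
    t::Float64    # hit time [ns]
    tot::UInt8    # time over threshold
end

# per-PMT calibration: time offset from the GPS-synchronised clocks
# plus the PMT position after the optical module rotation is applied
struct PMTCalibration
    t0::Float64
    pos::Position
end

# hit after applying the PMT calibration
struct CalibratedHit <: AbstractHit
    pmt_id::Int32
    t::Float64
    tot::UInt8
    pos::Position
end

calibrate(h::Hit, c::PMTCalibration) =
    CalibratedHit(h.pmt_id, h.t + c.t0, h.tot, c.pos)

# a naive comprehension over millions of hits compiles to tight
# machine code, no vectorisation tricks or external libraries needed
calibrate(hits::Vector{Hit}, cals::Dict{Int32,PMTCalibration}) =
    [calibrate(h, cals[h.pmt_id]) for h in hits]

# usage
hits = [Hit(1, 102.3, 0x1a), Hit(2, 103.7, 0x0f)]
cals = Dict(Int32(1) => PMTCalibration(-4.5, Position(0.0, 0.0, 100.0)),
            Int32(2) => PMTCalibration(2.1, Position(0.0, 0.0, 117.0)))
calibrate(hits, cals)

Multiple dispatch takes the role of the class hierarchy here: supporting a new PMT or hit type is just another method for calibrate, without touching the existing code.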

One of the main problems, in my opinion, is that it’s hard to convince people to switch from a popular language where many things are kind of “OK” (even if, in fact, they are definitely not “solved”, but patched to death with whatever works) to a language which solves those problems whose existence they kind of refuse to acknowledge. Most of the time people don’t even know how far they are from the actual performance of their hardware, and they simply accept that some things are barely possible. Python got momentum and we have solutions to make code run fast, but the reality is that only experts who understand those supplementary libraries are able to write performant code for complex operations. I constantly teach people how to use our software, even though it’s as user-friendly as we could design it: Python is a dynamic language, and when you are not familiar with the underlying concepts, you can easily leave the performance zone without noticing it. So many times students come to me and tell me: it worked with the small set, but now 5000 jobs crashed on our computing grid, either with memory errors (hello, pandas.DataFrame concat) on all nodes, or due to a naive Python implementation written because they did not know better - it worked with 100000 events, so it will surely work with 10^9 events, it will just take longer…
In Julia, such surprises are less likely. Even though “the global scope” is one of the main pitfalls newcomers face, and it definitely has an impact on performance, naive code usually runs as fast as the algorithm dictates. That being said, I am not talking about “bad” algorithms; those of course fail (to scale) in any language.
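For completeness, here is the global-scope pitfall in a nutshell (a toy example, nothing more):

# slow: `acc` is an untyped global, so the compiler cannot infer
# its type inside the loop
acc = 0.0
for x in rand(10^7)
    global acc += x
end

# fast: the same naive code inside a function is fully type-inferable
# and compiles down to a tight loop
function total(xs)
    acc = 0.0
    for x in xs
        acc += x
    end
    return acc
end

total(rand(10^7))

And crucially, the fix is “put it in a function”, not “learn the restrictions of a second language or a JIT decorator”.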

I still don’t know how to deal with this (Python, the great attractor). In my opinion it feels like a train that you cannot stop, with more and more people jumping on while it gets faster and faster. Whenever I talk to people who are really involved in heavy optimisation of complex libraries which are mainly used in a Python context, the general consensus is that it’s neither fun nor easy, and on top of that, the users struggle constantly and are often offered only a subset of the full feature set.
But we (including myself) are still doing it, because there is demand. We keep adding high-level functions for implementations that the end users could not manage to hack together with reasonable performance.
