How hard would it be to implement Numpy.jl, i.e. Numpy in Julia?

@stevengj, two small remarks:

Pythran, Numba, etcetera can only do a good job on a very small subset of the language — only one container type (numpy arrays), only numpy scalar types (or specially annotated struct-like types using language extensions), and very limited polymorphism.

Hmm, the subset is actually not so small, and it is really what is used in practice in most Python/Numpy numerical kernels. In terms of containers, Pythran supports list, tuple, dict, set and np.ndarray (so not only numpy arrays). Of course, user-defined types are not supported (for that, Julia is great!), but for numerical kernels one can already do many practical things with standard types.

In your article there is a section on “other languages”; I was surprised that you didn’t mention that C++ does these kinds of things very well. By the way, C++ can also use SIMD instructions out of the box with broadcast operations (for example with xsimd), which does not currently work in Julia (see pythran - How to speedup multiple broadcasts in Julia - Stack Overflow). Fortunately, it seems this could be fixed in future versions (Broadcasting is much slower than a for loop · Issue #28126 · JuliaLang/julia · GitHub).
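For reference, the kind of comparison in that Stack Overflow question looks roughly like this (a sketch of the benchmark, not the exact code from the question; BenchmarkTools assumed):

using BenchmarkTools

# Fused broadcast: a single pass over the data, but (per the linked issue)
# historically slower than the hand-written loop.
function bcast!(y, x)
    @. y = 2 * x^2 + 6 * x^3 - sqrt(x)
    return y
end

# Explicit loop over the same kernel.
function loop!(y, x)
    @inbounds for i in eachindex(x, y)
        y[i] = 2 * x[i]^2 + 6 * x[i]^3 - sqrt(x[i])
    end
    return y
end

x = rand(10^5); y = similar(x)
@btime bcast!($y, $x)
@btime loop!($y, $x)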

Just a reminder that languages other than Julia are also not so bad for numerics :slight_smile:

1 Like

I am not sure anyone implied this, so please don’t try to create this impression.

Many languages are great for numerical algorithms per se, including C and Fortran. Julia, on the other hand, is about making programs (including numerical algorithms) reasonably abstract, generic, and fast. Many languages can do 2 of these, but achieving all 3 at the same time is a challenge.

2 Likes

Many things have been said. I think it is interesting :slight_smile:

  • @stevengj it’s remarkable that you write that for arrays of simple types, Julia is not going to be faster than Pythran and Numba. However, I have already used Julia and Numba, and in my experience (which is consistent with other benchmarks), even with not-so-clever Julia code, Numba is most of the time still a little slower than Pythran and Julia.

  • I understand that one of the strong features of Julia is being efficient with user-defined types and arrays of user-defined types. C/C++/Cython are also strong at that, but Julia is a dynamic language, so yes, it’s very impressive!

However, I also see that many, many things in computing can be done with only standard numerical types. In particular, the numerical kernels that we have in Python/Numpy are used with arrays of simple types. This is what we would like to accelerate in Python codes, and this is what would have to be transpiled to be used efficiently from Julia. I think that being able to accelerate (or transpile for use in Julia) these kinds of numerical kernels (even only for simple types) is already something very interesting and useful.

  • The argument “we won’t be better than (the C++ backend of) Pythran (or Numba)” does not seem right to me. Having alternative backends/accelerators for such tasks is of course useful.

I agree that there won’t be huge performance differences, but depending on the case and the hardware, performance won’t be identical, and Julia could be very good at this task in some cases. Moreover, Numba needs competitors on the JIT side :slight_smile:

Moreover, (1) it would allow people to run numerical Python/Numpy kernels efficiently from Julia with a clean and clever transpilation (the Pythran frontend is not that bad), and (2) people who know Python/Numpy (again, a lot of people) could start to use Julia through a pure-Julia Numpy.jl (and not the slow PyCall-based one), which would be much easier for them, efficient, and adequate for many tasks. The transition towards proper Julia could then happen progressively, for example when they feel the need for user-defined types.

I guess we all know that Python is still widely used in science and data analysis. Many students learn it. Significant energy, money, and work are being spent on improving the language, the interpreters, and the ecosystem. It’s highly unlikely that its use will strongly decline in the coming years. I think it’s smarter for Julia and its community to couple well with Python than to pursue other strategies. A good Numpy.jl would help with that.

  • From your article, I understand (tell me if I’m wrong) that the syntax that Python/Numpy is missing compared to Julia (for high-level computations on simple types) is (mostly? only?) macros like @. or @simd at the beginning of some lines. It reminds me of OpenMP pragmas written as comments (which work well in Pythran-C++, by the way). It would not be difficult at all for a transpiler to support things like

    # julia @.
    X[:] = f(2*X**2 + 6*X**3 - sqrt(X))
    

    It is still just plain Python, it does not change the behavior in pure Numpy or Pythran-C++, but it would specialize the function for pythran-julia. Note that I understand you’re going to find this ugly, but it would be fine for what I’d like to do (see the Julia sketch just after this list).

  • To summarize, I still don’t understand the strong negativity towards this project. It seems to me that the technical arguments that were given do not disqualify it. At the same time, I suppose I must be wrong; otherwise you would not have told me that it was such a bad idea. But I’d like to be convinced.
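For concreteness, here is the Julia construct that such a # julia @. directive would presumably map to (a minimal sketch; f and X are placeholders):

# Julia's @. macro turns every call and operator in the expression into its
# elementwise (broadcast) form and fuses everything into one in-place loop,
# with no temporary arrays.
f(x) = x + 1                       # placeholder elementwise function
X = rand(1000)
@. X = f(2X^2 + 6X^3 - sqrt(X))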

I think the next step for me would be to organize a small Pythran challenge on a representative but limited set of simple numerical kernels used in real life (not micro-benchmarks). It’s going to be much less ambitious than GitHub - SimonDanisch/julia-challenge: Many language implementations for the Julia Challenge: https://nextjournal.com/sdanisch/the-julia-language-challenge but it could still be instructive.

In particular, the result of such a challenge could at least help the Julia community explain why transpiling numerical kernels from Python/Numpy into efficient Julia is not possible :slight_smile:

I am not sure anyone implied this, so please don’t try to create this impression.

More Dots: Syntactic Loop Fusion in Julia “Should other languages implement syntactic loop fusion?”

Many languages are great for numerical algorithms per se, including C and Fortran. Julia, on the other hand, is about making programs (including numerical algorithms) reasonably abstract, generic, and fast. Many languages can do 2 of these, but achieving all 3 at the same time is a challenge.

Great! Agreed!

Note that nice things can be done by combining different languages that couple well together, for example a powerful glue language and a powerful static language :slight_smile:

You implicitly assume that working with only one language is much better than the alternatives. I understand the advantages. But one can also argue that between the simple Julia that beginners learn and the difficult, very powerful Julia that experts write, the differences are so big that we also get a division of the community between users and experts. It seems to me that this is difficult to avoid, because HPC is genuinely complicated, while it still has to be usable by ordinary users.

This is partly correct, but the Julia language community is continuously striving to make things simpler. Unfortunately, designing simple, intuitive, composable, and performance-friendly interfaces is hard, so this is a work in progress.

As an example, consider CartesianIndices. Around 2016 it was quite esoteric for most users; now it is much easier to use. Another example is Nullable, which was difficult to use (because of the need to compute counterfactual types), so eventually the compiler was improved to work well with small Union types, which replaced this construct.
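For instance, here is a minimal illustration (my sketch, not from the discussion) of the kind of dimension-generic code CartesianIndices enables:

# One loop body that works for arrays of any dimensionality.
function clamp_negatives!(A)
    for I in CartesianIndices(A)
        A[I] < 0 && (A[I] = 0)
    end
    return A
end

clamp_negatives!(randn(4, 5))      # matrix
clamp_negatives!(randn(2, 3, 4))   # 3-D array, same code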

That said, most languages have arcane corners. E.g., R is usually considered rather approachable, but digging deeper can reveal a lot of difficult parts.

Yes, other things being equal (they seldom are) I think that working with a single language has a lot of advantages.

My impression is that this difference is a lot smaller than in many other languages. If you read code in Base, you will often find code that is super simple, obvious, short, and generic. I’ve been surprised many times to find that some operation I thought would be insanely optimized is just a single-line/expression mapreduce, for example.
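For example, a definition in the spirit of Base (a paraphrase, not the exact source):

# Many "optimized" reductions bottom out in a generic one-liner like this;
# the compiler then specializes it for each concrete element/iterator type.
mysum(f, itr) = mapreduce(f, +, itr)

mysum(identity, 1:10)    # 55
mysum(abs2, randn(100))  # same code for any iterable, any numeric type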

Doing simple things is often very simple, and then of course, things get harder as the underlying problem does.

2 Likes

However, I have already used Julia and Numba, and in my experience (which is consistent with other benchmarks), even with not-so-clever Julia code, Numba is most of the time still a little slower than Pythran and Julia.

That comparison is not really relevant for the project you propose. Julia is not magic. What makes Julia fast are some very careful choices in how it behaves. These behaviors are different from how Python and even Numpy behave. The key is that as soon as Julia begins emulating Python’s semantics (even small subsets), it needs to perform the exact same computations that Python or those subsets need to perform. It’s verging on disrespectful to the many, many smart folks who work on those Python projects to suggest that Julia could magically swoop in and do their job better than they’ve been doing for the past decade+.

You give the example of a Julia transpiler that supports directive comments, and suggest that adding the comment below would allow Julia to fuse these loops while behaving just as the code does in pure Numpy:

# julia @.
X[:] = f(2*X**2 + 6*X**3 - sqrt(X))

Even if we assume that X is a bog-standard numpy ndarray and no funny business has occurred, there are two major problems with f:

  • We don’t know if it’s elementwise or array-wise. Is it lambda x: x * x or is it lambda x: x @ x? Now, your comment is implying that it’s elementwise, but do you really want a comment changing the semantics of your code?
  • The f is probably something you want a user to define and pass to the kernel. But if they define it as a plain-old Python function then we have no idea what it’ll do, even if we assume it’s element-wise. We need to emulate all of Python just to execute f — and do it for every single element.

This is just scratching the surface of all the technical problems here. But I hope you’ll trust that all the folks behind those Python projects are extraordinarily bright and already doing all the things Julia can do where they can.

10 Likes

I think the deeper problem here is that there’s a disconnect between how the Python and Julia languages view numerical problems. Perhaps it also has something to do with different goals between the two communities.

I think you stated above that the subset of statically-compilable Python, which includes numpy arrays and some common Python types, is “not so small.”

This statement is fundamentally at odds with the thinking of a lot of the Julia community. A lot of people are here, myself included, precisely because numpy arrays and a few Python types are not sufficient for certain kinds of problems we deal with. A good example of this is shown at the end of this blog post, where the author states that a user-defined Julia type holding a number and its uncertainty was able to propagate uncertainties through the differential equation solvers in DifferentialEquations.jl basically for free: they just wrote the type, passed it to the DifferentialEquations solvers, Julia compiled a specialized version of the solvers for that particular type, and the solver spat out the solution to the differential equation at every timestep with value and uncertainty. The author compares that to the amount of work required to do something similar in Python: it would require, at minimum, rewriting and recompiling your numerical kernels to accept the uncertainties type! Not something you really want to be doing.
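In code, the example from that post looks roughly like this (a sketch assuming the Measurements.jl and DifferentialEquations.jl APIs, not the exact code from the post):

using Measurements, DifferentialEquations

# A number type that carries its uncertainty; the generic solver
# specializes on it and propagates the uncertainty at every step.
c  = 0.1 ± 0.01          # uncertain decay rate
u0 = 1.0 ± 0.05          # uncertain initial condition

decay(u, p, t) = -p * u
prob = ODEProblem(decay, u0, (0.0, 10.0), c)
sol  = solve(prob, Tsit5())

sol.u[end]               # value ± propagated uncertainty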

That’s the kind of thing we value most about Julia: It makes abstractions easier to implement than C++ (no more 40-line compiler template errors because you forgot a semicolon, or worse, because you didn’t know you had to write a template disambiguator). It makes them easier to implement with good performance than in Python, because user-defined types play well together with good performance, as in the example above. On the other hand, there is little interest in trying to do things in Julia that can already be done well in C or Fortran, and the consensus here seems to be that the numpy kernels are something C and Fortran already do really well. Think of it as a sort of cost-benefit judgment, not a personal attack on your idea.

12 Likes

Indeed, it’s important to emphasize that the sorts of optimizations that make Julia fast are not necessarily the optimizations that make Numba and Pythran fast. Numba and Pythran have had many people spend a long time thinking about how to optimize small snippets of python code. Julia’s optimization efforts are much more holistic and it’s not hard to find micro-benchmarks where Numba outperforms Julia handily.

The issue is that Numba doesn’t really scale to large projects the way Julia does, because that’s not where Numba’s design effort was focused. I wouldn’t be very surprised to find that even a very careful and well-written Numpy.jl was strictly slower than Numba.

4 Likes

The Python list semantics are equivalent to Julia’s Vector{Any}: any type of Python object can be stored in any element. While it is possible to compile list code (Julia does!), it is not possible to compile most list operations to code that is particularly fast without changing the language semantics. Indeed, even though CPython’s built-in sum function is implemented in heavily optimized C, it is an order of magnitude slower than sum of a corresponding NumPy array.
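A minimal illustration of that semantic difference on the Julia side (my sketch; BenchmarkTools assumed):

using BenchmarkTools

xs     = rand(10^6)
xs_any = Vector{Any}(xs)   # Python-list-like: every element is a boxed object

@btime sum($xs)       # tight, SIMD-friendly machine code
@btime sum($xs_any)   # dynamic dispatch per element: an order of magnitude slower or worse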

Numba “supports” lists etcetera to some extent too, but just because something JITs and runs doesn’t mean it runs fast. The way Numba handles this in nopython mode is to change the list semantics so that they are effectively more like numpy arrays, and it imposes a large “reflection” cost when the list is passed to/from pure-Python code.

C++ was specifically mentioned in the “halfway solutions” section. People have implemented broadcast fusion for C++, but only for an extremely limited set of types and operations. (Similarly for Fortran 90+ and several other languages mentioned there.)

Sure, arrays of simple scalar types are useful! If that’s all you want, there is a language called “Fortran 77” that you might be interested in for your future numerical needs.

The Julia community (including me) has devoted and continues to devote a huge amount of effort to interoperating with and leveraging the Python ecosystem. People knowledgeable about compilers telling you that a Python-to-Julia transpiler would not add much does not mean we don’t value Python interoperability.

Basically, the transpiler idea is attractive to people who don’t understand why compiling Python effectively is hard (which you have admitted repeatedly — you think a Python program provides the “same amount of information” as a Julia program and you “don’t understand” the idea that there are differences in language semantics that affect compilers), so they don’t see why there is a benefit to switching front-end languages. I’m not sure what more can be said to make you understand. But to the extent that a compiler front-end can analyze Python code effectively, you don’t need a Julia backend in the first place.

22 Likes

I think a cultural difference is causing tension here. Julia takes a “pure CS” approach to a number of things (optimizations should never change program behavior, it’s better to indicate syntactically what you mean, etc.), while Pythran is strictly best-effort (from the docs: “Pythran tries hard to produce code that has the same observable behavior as the original Python code. Unfortunately it’s not always possible”, and I imagine there are a number of tricks it plays, things with slightly different behaviors, etc.). I think it’s useful to think of Pythran as a Python-based DSL for scientific computing, and it’s great at that. Julia wants to be a general language and a robust foundation for numerical computing libraries and tools, and does not have that luxury (as a result, Julia may have some annoyances that Pythran doesn’t).

If the Pythran people want to use Julia as a backend, then godspeed, but there understandably won’t be a lot of interest on the Julia side. Using Julia to help people write fast Python code certainly isn’t a priority: the focus is on having it all, and on improving Julia so that a DSL in Julia won’t be necessary. In fact, there used to be a Julia-based DSL in the form of ParallelAccelerator. As far as I understand, the serial functionality has been largely superseded by dot fusion, and the parallel functionality is waiting on improvements to threading.

5 Likes

I’m not a specialist in languages and programming, but I never wrote that “I don’t understand the idea that there are differences in language semantics that affect compilers”. Please do not distort my words. It’s really not nice.

I don’t understand what is so problematic, in terms of performance, about the semantics of the Python/Numpy subset used in efficient numerical kernels. And of course, one should not use extensible arrays of references (Python lists) for the critical parts of numerical kernels; that is not a problem of one particular language. It is also possible to write very inefficient things in Julia.
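(For instance, here is a toy sketch of Julia code that is slow for exactly the same reason a Python list is slow: an untyped container plus an untyped global leaves the compiler with nothing to specialize on.)

data = Any[rand() for _ in 1:10^5]   # untyped global, boxed elements

function slow_total()
    s = 0.0
    for x in data    # the type of data is unknown to the compiler
        s += x
    end
    return s
end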

I would be very interested to understand why, for these simple computations (no user-defined types), the Julia language (not the interpreter) is so much better. I think I need to study examples of numerical kernels used in real life and compare the semantics. A Pythran challenge would definitely be interesting.

I’m going to stop this discussion here. Thank you for your answers. Bye.

This was nearly a direct quotation. You wrote: “About the limitation ‘to Python/Pythran/etc semantics’, I have to say that I don’t understand. Algorithms written in Python / Numpy are fully described. There is the same amount of information as for many other languages, including Julia for many cases.”

It’s fine to not understand why compiling Python is difficult, but it is frustrating that you are simultaneously so insistent that everyone else is wrong.

Julia is probably not much better (if at all) if you have a simple loop-based/vector computation on numpy-like arrays and a limited set of “built-in” data types (mainly numpy scalars), no outside library code, and limited polymorphism. These narrowly defined benchmarks are where Numba etcetera can do fine.

It’s only when you start composing libraries, fully exploiting polymorphism, trying to utilize more complicated data structures, and employing user-defined types for automatic differentiation or interval arithmetic or units or anything interesting, that the Numba approach hits the wall of Python semantics. But doing these things is why people use high-level languages in the first place; otherwise you might as well be writing in Fortran 77.

And in cases where Numba etcetera fall short, the limitation is the front-end analysis; a Julia backend/transpiler won’t (and fundamentally cannot) help.

6 Likes

This is starting to be disrespectful. Please!

In all this discussion I have always talked about the “simple Python/Numpy used in numerical kernels”, where of course we don’t use all of Python’s semantics / power / introspection.

I perfectly understand that compiling / transpiling full Python programs is very hard and that there are many performance problems.

I perfectly understand that Julia is a great piece of technology, and offers very interesting possibilities.

When I say “I don’t understand”, I am clearly not saying “everyone else is wrong”; I am saying that I’d like to be convinced and that I’ll study to understand better.

Julia is probably not much better (if at all) if you have a simple loop-based/vector computation on numpy-like arrays and a limited set of “built-in” data types (mainly numpy scalars), no outside library code, and limited polymorphism. These narrowly defined benchmarks are where Numba etcetera can do fine.

I’m sorry, but that is exactly what is useful in my work for the numerical kernels. The scientific complexity + nice user experience + use of outside libraries is built outside the numerical kernels, and I don’t need very high numerical efficiency for that (and Fortran 77 wouldn’t be fine for that).

I don’t think anyone here is saying it is impossible; most of the responses so far have simply been a cost-benefit analysis with a conclusion in the negative. At the risk of prolonging the thread, I tried to generalize your initial question in two ways, and make a few points about each.

  1. how suitable is Julia as a library implementation language; in other words, using Julia to implement a library which is consumed by other languages via, for the sake of simplicity, a C API.
  • The largest issue is that the tooling around specifying, precompiling, and verifying a C API is still somewhat immature.

  • The in-Julia debugging story is rapidly (!) improving; however, inter-language debugging is yet to be explored in the latest iteration (whereas it is feasible, if not always painless, for e.g. Python+Cython/C).

  • The Julia packaging system is great for use within Julia, but for a library you would probably want to use Conda, Homebrew, Apt, etc., and none of those systems has really been used to distribute Julia libraries yet, so doing this would be breaking new ground. One exception is pip, which has at least one existence proof in the form of diffeqpy, linked earlier.

  2. how suitable is Julia as a compilation target, as compared to, say, C++ (which both Pythran and Cython effectively target).
  • Julia is about as suitable as any other garbage-collected, high-level language (Java, JavaScript, etc.). However, Julia is effectively an LLVM frontend: there is no “Julia bytecode” to target, as there is in the JVM. So while there is some simplicity in targeting a higher-level language, you lose a significant amount of control and flexibility.

    People target the JVM (bytecode) and JavaScript/WebAssembly at minimum for logistical reasons which don’t really apply to Julia: the JVM is extremely mature and widespread; JavaScript and WebAssembly are mature, and are (or will soon be) available on nearly every user-facing computing device on the planet.

  • At least one strength: support for @label and @goto :slight_smile: which JavaScript and WebAssembly do not support (though there are powerful workarounds, of course).
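A toy sketch of what that looks like (these are ordinary Julia macros from Base):

# @label/@goto give a transpiler that emits unstructured control flow a
# direct target, with no loop-reconstruction pass needed.
function first_positive(xs)
    i = 1
    @label top
    i > length(xs) && return nothing
    xs[i] > 0 && return i
    i += 1
    @goto top
end

first_positive([-2, -1, 3, 5])   # returns 3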

4 Likes

This may cut to the heart of the matter: you seem to think that people here are being defensive about Julia. Quite the opposite. If anything, your suggestion that compiling Numpy to Julia would provide performance advantages is giving too much credit to Julia and not enough to Numpy. The counterargument (which has been given a number of times) is that for the API that Numpy exposes the Numpy implementation is already pretty much optimal. The Numpy folks are really, really good at what they do. The notion that compiling to Julia will somehow make things faster is kind of insulting to the Numpy team—who aren’t here to defend themselves, so the Julia community is making that case for them. Julia doesn’t magically generate faster code that other systems can’t match (we use LLVM, they use LLVM—how would we be faster?). Julia’s secret sauce is in designing a language that feels just as expressive and flexible but which can be compiled to fast code in a general and composable way. Which doesn’t help at all when the code you’re trying to run is written in Python rather than Julia.

The case you’ve made for why it would be interesting to compile Numpy to Julia is frankly a bit weak. Here are some of the arguments I’ve been able to glean from the discussion:

  1. It would be interesting (ok, but subjective).
  2. It could be faster (it won’t).
  3. It could lead to better GPU support (possible, but unlikely).

Improved GPU support by compiling to Julia is unlikely, because if you could compile Numpy to GPU code via Julia, then you could even more easily compile Numpy to GPU code directly. This would be simpler and faster, both in compile time and in the speed of the resulting code; and since Julia and Numpy both use LLVM for code generation, the end result would be very similar. The biggest problem for Numpy on the GPU is likely that it’s much harder to avoid dynamic allocation in Python than in Julia (even the subset that Numpy can compile needs it), and the GPU doesn’t play well with dynamic memory allocation. Compiling to Julia wouldn’t help with that at all, since you’d still be using Pythonic APIs, not Julian ones.

All that said, the great thing about open source is that people can do what they want to! If someone—perhaps you—wants to try to reimplement Numpy in Julia, they can absolutely do so. The thing people in this thread are trying to tell you is that you are likely to be disappointed with the results.

19 Likes

Here are a few pointers to get started:

That last one is by the author of LuaJIT, and the points he makes are very good: again, it’s not impossible to make Python fast, but the level of engineering effort required is very high (possibly even higher than for JavaScript for some features). Obviously various projects have succeeded to a degree, like PyPy (at the cost of compatibility with the CPython ecosystem until recently, which is part of why it is not more widespread) or Numba (by restricting scope, etc.).

So in that respect, targeting Julia doesn’t add too much to the intrinsic work required on the frontend.

Personally, if I were going to work on a “make Python fast” project, I would probably choose GitHub - oracle/graalpython: A Python 3 implementation built on GraalVM. This has two advantages:

  • The implementation is “simple” (relatively speaking) in the sense that a language target is written as an AST interpreter, and then you get a JIT “for free”. The idea is that by simplifying the implementation, the amount of engineering effort required to get a competitive JIT, with all the fancy tricks needed to make Python fast, is reduced (see e.g. Laurence Tratt: Domain Specific Language Implementation via Compile-Time Meta-Programming).
  • They’ve demonstrated in the Ruby version that they can compile Ruby C extensions (now via LLVM->Sulong) together with Ruby code and get very impressive speedups (for pure Ruby, and potentially with inter-language inlining, etc.). This would be a huge boon, because it would maintain compatibility with the existing scientific Python ecosystem.

6 Likes

If the restricted kind of code that Numba/Pythran can optimize is fine for your application, then lucky for you! We use Julia because we want more flexibility and composability than that, not because Julia is magically faster at simple a[i] = 2*a[i] + 1 style loops over arrays of Float64 scalars (it isn’t). There was a long thread about this that we don’t want to repeat: Julia motivation: why weren't Numpy, Scipy, Numba, good enough?

5 Likes

2 posts were split to a new topic: PyTorch and Julia

I’ll post some once I have time to make a POC package.

1 Like