How hard would it be to implement Numpy.jl, i.e. Numpy in Julia?

I recall the xkcd comic, but wasn’t aware of this usage. Thanks!

5 Likes

Let’s say that one has numerical Python code with numerical kernels. In Python (as in Julia), these numerical kernels need to be compiled to get good performance. There are different ways to do this in Python. With Numba or Transonic (which uses Pythran), we just need to add a decorator.

Transonic / Pythran use C++ under the hood, but another Pythran backend could use Julia. For that, however, we would need a pure Julia library that mimics the behavior of Python / Numpy.
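As a sketch of the decorator workflow (a no-op stand-in decorator is used here in place of the real `numba.njit` / `transonic.jit`, so the snippet runs without either library installed):

```python
import math

def jit(fn):
    # Stand-in for numba.njit / transonic.jit: the real decorators compile
    # the function to machine code; this one just returns it unchanged.
    return fn

@jit
def rotate(xs, ys, angle):
    # A typical small numerical kernel: rotate 2-D points by `angle`.
    c, s = math.cos(angle), math.sin(angle)
    new_xs = [c * x - s * y for x, y in zip(xs, ys)]
    new_ys = [s * x + c * y for x, y in zip(xs, ys)]
    return new_xs, new_ys
```

With the real libraries, only the import and decorator name change (e.g. `from numba import njit`); the call sites stay untouched, which is the whole appeal of this approach.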

2 Likes

I’m a bit slow today (a couple of all-nighters with sick kids), so let me see if I get it.

  1. Pythran is currently a Python library that provides decorators that pass Python code to C++ for compilation.
  2. You’re saying it would be cool if that same user-facing API could also pass Python code to Julia for compilation, with a focus on numerical computing.

If this is the case then I think the easiest thing would be to work with the aforementioned libraries to establish better Python transpiling and then tightly couple that with decorators provided in Pythran.

So, you’re trying to pitch a project that won’t improve the life of any Julia user. This is a tough sell! Even if it may improve Julia adoption, almost no one here has the luxury to work on a project that won’t benefit them directly.

Thank you for telling me.

That said, I think increasing the Julia user base and working on GPU support would help improve the life of Julia users.

If you decided this project is worthwhile, you’ll likely be the one to implement it :wink:

I wasn’t asking you to implement this Numpy.jl. I was asking for technical points of view, and I wanted to know if people on the Julia Discourse could be interested in the project.

I have most of my answers (thank you), and I also see how friendly this forum is :wink:

Bye bye! Sorry for the noise and the bad idea.

1 Like

Please keep in mind that we respond here because we want to help. We may not seem as polite as you might like, but I can only add so many reassuring words about my intentions when I’m writing this at a bus stop on my phone. If you’re that passionate about this, please don’t let discouragement stop you. You just might have to reassess how you approach this.

7 Likes

Hi @Pierre_Augier
There is in fact already work in progress on such a transpilation tool that could be leveraged: GitHub - JuliaCN/Py2Jl.jl: Python to Julia transpiler (see Python to Julia transpiler for some context).

6 Likes

This is probably relevant:

http://juliadiffeq.org/2018/04/30/Jupyter.html

It’s a Python package that uses a Julia library as a backend. I think that’s a good way of showing that Julia can be used as a replacement for C++ for accelerating Python.

7 Likes

Yes, an approach to Python packaging along the lines of diffeqpy is really exciting and definitely worth the effort. Look at how simple that diffeqpy package is! Instead of writing a transpiler and emulating numpy’s semantics, we can just make it easier to build Python packages that have been written in Julia. It should be possible to take this even further and use PackageCompiler.jl to eliminate the need to install Julia at all.

I envision emulating all of numpy as having a very long tail of bugs and inconsistencies that would eventually lead to an effective rewrite of the Julia functions you were hoping to seamlessly exploit. If instead you just do the first part of such a project, ensuring that you can easily compile Julia down to a very fast interop from Python, then you can start building up Julia packages incrementally. And now you’re actually getting more Julia code out there in the wild!

17 Likes

I really don’t see how any of the answers in this thread has warranted this sort of tone.

You have gotten several helpful answers, and some answers explaining why this may not have a very wide appeal.

No one has been impolite, except you. And now, me.

7 Likes

There are a couple of independent questions here.

Should we implement NumPy functionality in Julia? Definitely yes, and in fact this is mostly done — nearly all of NumPy (and much of SciPy) functionality, including zero-based arrays, is already available in the Julia standard library or packages.

Should we implement a Julia backend/transpiler for Pythran/Cython/Numba? There seems to be essentially no benefit to this because:

  • It wouldn’t be any faster — the semantics of Python/Pythran/etc would limit such a transpiler to generating code pretty much equivalent to what Pythran etc. generate now, and for NumPy-like operations on NumPy data types the C implementations in NumPy are basically as good as anything we can do.

  • It wouldn’t gain you any advantages of Julia, and in particular you wouldn’t be able to write efficient fully type-generic code, because again you are limited to Python/Pythran/etc semantics.

  • It wouldn’t make it any easier to call innovative Julia libraries (i.e. ones that don’t simply duplicate NumPy operations) compared to work on further developing pyjulia.

In cases where Julia programmers develop innovative new functionality or greater performance that is not available in Python, it is definitely nice to expose this to programmers in Python and other languages where possible, and in general I’m a big fan of inter-language bridges! But transpilers are an overcomplicated way to do this compared to simple glue code like pyjulia (e.g. as used by diffeqpy) to let you call Julia libraries from Python.

I feel like there should be a Julia FAQ about this somewhere, because it’s pretty common for people’s first reaction to Julia to be “Lots of people are used to language X, why can’t you just compile this to Julia?” (Update: there is now a FAQ.)

35 Likes

In regard to creating an FAQ for this sort of thing:

In addition to existing resources for interoperability, it would be nice to have some references to non-Julia authors who have pointed out similar problems with Python transpiling. I think people get the impression that Julia users are being intentionally difficult when responding to these things, when in reality Julia is just one of the many answers that have come along in response to this Python compiler problem.

For example, I think Google recently started moving to a Swift public API for TensorFlow because of this issue with realistically compiling Python-syntax code.

2 Likes

I’m sorry stevengj, but I disagree.

First, I never asked Julia people to implement a backend for Pythran / Cython / Numba. For one thing, that doesn’t mean anything (I’m thinking about a backend for Pythran), and anyway the backend part would of course be our task. I was asking about the feasibility of Numpy.jl.

About the “no benefit”:

  • It wouldn’t be any faster — the semantics of Python/Pythran/etc would limit such a transpiler to generating code pretty much equivalent to what Pythran etc. generate now, and for NumPy-like operations on NumPy data types the C implementations in NumPy are basically as good as anything we can do.

Do you know the subject? I can assure you that numerical kernels transpiled with Numpy.jl could be much faster than pure Python / Numpy. It’s pretty easy to beat Numpy code, with all its temporary objects (and of course as soon as there are loops). I don’t think it would be faster than Pythran on CPU (but the JIT warmup would be faster). I strongly suspect it could be faster than Numba. Moreover, Numpy.jl could also be interesting for GPU support.
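A pure-Python sketch of the temporaries argument (illustrative only, using lists in place of Numpy arrays): evaluating an array expression operation by operation allocates a full intermediate per step, while a fused loop makes a single pass with a single allocation.

```python
def unfused(x):
    # Mimics how `2*x + 3*x**2` evaluates on arrays without fusion:
    t1 = [2.0 * v for v in x]                # temporary for 2*x
    t2 = [v * v for v in x]                  # temporary for x**2
    t3 = [3.0 * v for v in t2]               # temporary for 3*x**2
    return [a + b for a, b in zip(t1, t3)]   # final result

def fused(x):
    # One pass, one allocation: what a compiler or transpiler can emit.
    return [2.0 * v + 3.0 * v * v for v in x]
```

Both functions compute the same values; the difference is the three extra full-size temporaries (and memory-traffic passes) in the unfused version.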

It wouldn’t gain you any advantages of Julia, and in particular you wouldn’t be able to write efficient fully type-generic code, because again you are limited to Python/Pythran/etc semantics.

But why? Many numerical kernels written in Numpy are “fully type-generic code”, so they can be transpiled into fully type-generic Julia code. There are very few cases where types are encoded in the code of Numpy numerical kernels, and that can be considered bad practice and easily avoided (for example with np.empty_like).
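To illustrate (a hypothetical minimal kernel, not taken from any real codebase): no concrete element type appears in the source, so the same code works for floats, complex numbers or exact rationals, and a transpiler could map it to equally generic Julia code. A Numpy version would use np.empty_like(u) to stay dtype-generic.

```python
def diff1(u):
    # First-order forward difference; generic over the element type,
    # since only `-` and indexing are used.
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]
```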

About the limitation “to Python/Pythran/etc semantics”, I have to say that I don’t understand. Algorithms written in Python / Numpy are fully described. There is the same amount of information as in many other languages, including, in many cases, Julia.

I’ve read a lot of numerical Python code and some Julia code, and in the high-level numerical kernels I didn’t notice any huge fundamental differences. In both languages, I see allocations, broadcast operations, indexing, function calls, loops, conditions, etc. What more? A few macros in Julia, like @., @simd or @inbounds? Other things?

It wouldn’t make it any easier to call innovative Julia libraries (i.e. ones that don’t simply duplicate NumPy operations) compared to work on further developing pyjulia.

This is indeed not the goal of the tool. But using a transpiler to Julia for numerical kernels written in Python / Numpy does not prevent you from using pyjulia for other tasks in other parts of your programs.

Lots of people are used to language X, why can’t you just compile this to Julia?

It’s indeed a very good question. A good text about this issue would be very interesting, for me especially for the case of numerical kernels written in simple Python + Numpy. Good examples would be very welcome!

You can find good cases supported by Pythran here: https://github.com/serge-sans-paille/pythran/tree/master/pythran/tests/cases It would really be interesting if you found code in these examples that is incompatible with fast Julia (or where there is not enough information to write fast Julia from it).

1 Like

I think there has been a misunderstanding here :wink: I’m pretty sure that @stevengj was referring to the fact that a Julia backend wouldn’t be faster than your current Pythran backend (not considering GPU support)!

2 Likes

I think so…

24 Likes

As @sdanisch said, I was comparing to what Pythran/Numba etcetera do now. The limiting factor here is Python semantics — you can’t fuse loops as an optimization unless the compiler (or transpiler) can prove that it won’t change the results. In practice, this can only be done for a small set of types and functions recognized by the compiler, which is why things like Pythran and Numba are effectively limited to numpy arrays and a particular set of library functions on those arrays, and get stymied if they see a call to arbitrary Python code. A Julia transpiler or backend would have zero advantage here.
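A small illustration of why fusion is unsafe in general under Python semantics (a contrived class, shown only to make the point): `+` can dispatch to arbitrary user code with observable side effects, so a compiler cannot rewrite a chain of additions into one fused loop unless it can prove nothing like this is happening.

```python
class Logged:
    # Addition with a visible side effect: every `+` appends to a log.
    log = []

    def __init__(self, v):
        self.v = v

    def __add__(self, other):
        Logged.log.append((self.v, other.v))
        return Logged(self.v + other.v)

# Each `+` below is a separate observable event; fusing or reordering
# them would change the program's behavior.
r = Logged(1) + Logged(2) + Logged(3)
```

This is why fusing compilers for Python restrict themselves to a whitelist of known types and functions whose operations are side-effect free.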

In fact, I wrote a long blog post explaining Julia’s approach to loop fusion for vectorized operations, the challenges of this problem, and how we are able to do it for generic user-defined functions and types. But the advantage we have here is lost if the front-end code is Pythonic.

You can write type-generic Python code, but you can’t compile it to fast code in general, because the semantics of the language do not allow it. This is why Pythran, Numba, etcetera can only do a good job on a very small subset of the language — only one container type (numpy arrays), only numpy scalar types (or specially annotated struct-like types using language extensions), and very limited polymorphism.

The question is how much information is easily accessible at compile time. Even though every program in every computer language is “fully described” in the sense of prescribing a deterministic sequence of actions (modulo stochastic algorithms and undefined behaviors), there is often lots of information that is nearly impossible to get in advance without actually running the code.

Basically, Julia is designed to provide more information at compile time than traditional dynamic languages. This occurs because of lots of properties of the language and standard library that Python doesn’t have — type-stable libraries, final concrete types, parameterized immutable types, and so on.

Realize that people have tried for years now to develop an effective general-purpose compiler for Python. It’s a hard problem! Why do you think that projects like Numba and Pythran have targeted such a small subset of the language?

The backend of a Python compiler is the easy part, and something that a Julia transpiler would add nothing to — the challenging part is the front-end analysis.

33 Likes

Isn’t that actually an advantage of using Julia as a backend? The fact that Julia has a strong broadcasting system could be a good reason to use Julia, even if you would be using only “plain old data” (POD) types. Also, it probably is possible to support GPU arrays in a “type-generic” manner if the original Python code is written in NumPy/CuPy-generic style. Functions that are generic “only” in the universe of CPU/GPU-arrays-of-POD-types sound like a large enough set. I can imagine that some people would be interested in writing a transpiler for such a class of functions.

Having said that, I agree that

Yea, interfacing at high-level sounds like a good approach to me. We can leverage, rather than fight against, language features of Julia and Python this way.

No, because we can only use the Julia broadcasting fusion if the front-end analyzes the Python code and shows that this is allowed. But in the limited cases where Numba etc. can do this, they are already generating fused code, so a Julia backend would have no advantage.

The advantage of Julia for broadcasting is that the caller indicates that broadcasting is desired at the syntactic level, and fusion is (essentially) a syntactic guarantee rather than an optimization (slightly more complicated by the Broadcasted AST machinery), making analysis trivial for the compiler. You lose this if the front-end syntax is Python, however.

If Julia were the only way to program for GPUs, then compiling to the Julia GPU backend would be advantageous. However, that is obviously not the case: Numba already supports GPUs.

Furthermore, the Numba manual states that numpy semantics prevented them from compiling most numpy code to GPUs. If they haven’t figured out how to work around this at the front-end level, then a Julia backend adds nothing. And if they do figure it out, they can tie it into their existing CUDA support and Julia adds nothing.

10 Likes

Since the OP was specifically about writing a new Pythran backend, I implicitly ignored “feature X is already implemented” type of arguments. Otherwise, it’s hard to motivate this since there are already Numba and Pythran. I was guessing maybe it would make the library more maintainable and extensible by leveraging Julia’s features. But without such restriction, your arguments make sense.

@stevengj, two small remarks:

Pythran, Numba, etcetera can only do a good job on a very small subset of the language — only one container type (numpy arrays), only numpy scalar types (or specially annotated struct-like types using language extensions), and very limited polymorphism.

Hmm, the subset is actually not so small, and it is really what is used in practice in most Python Numpy numerical kernels. In terms of containers, Pythran supports list, tuple, dict, set and np.ndarray (so not only arrays). Of course, user-defined types are not supported (for that, Julia is great!), but for numerical kernels one can already do many practical things with standard types.

In your article there is a section on “other languages”; I was surprised that you didn’t mention that C++ does these kinds of things very well, and by the way C++ can also use SIMD instructions out of the box with broadcast operations (for example with xsimd), which does not work right now in Julia (see pythran - How to speedup multiple broadcasts in Julia - Stack Overflow). Fortunately, it seems that this could be fixed in future versions: Broadcasting is much slower than a for loop · Issue #28126 · JuliaLang/julia · GitHub.

Just a reminder that languages other than Julia are also not so bad for numerics :slight_smile:

1 Like