How hard would it be to implement Numpy.jl, i.e. Numpy in Julia?

Hi,

I’m working with Pythran (http://github.com/serge-sans-paille/pythran), a Python/Numpy to C++ transpiler, which of course supports only a subset of Python (in particular, it leaves out the very dynamic Python features that are nice to have but bad for performance). With Pythran, you can prototype in Python/Numpy and get very efficient C++ that does not use the Python interpreter.

We were thinking about the possibility of implementing a Julia backend for Pythran, i.e. being able to transpile the subset of Python/Numpy supported by Pythran to Julia.

The advantages of having such a backend would be:

  • people coding in Python could start using Julia with nearly no code modification (which would then help them move on to really using Julia).
  • most numerical kernels in Python code could be translated to Julia.

Under the hood, Pythran developers wrote a clone of Numpy in C++. Doing the same in Julia would be the most difficult part of implementing this Julia backend for Pythran.

So my question is: do you think it would be possible and doable to implement a clone of Numpy in Julia (at least the core of the API, since the Numpy API is large)? To be clearer, the Numpy clone has to be written only in Julia and has to be able to return what the real Numpy function would return (it’s slightly more complicated than that in terms of types, as we can see for example with Cupy, another Numpy clone, but anyway).

I guess it’s a huge amount of work, but I also guess there are equivalents of basically all Numpy functions in Julia, so I would be interested in the points of view of people who know Julia well.

6 Likes

Use PyCall? With the new PyCall syntax, I believe Python code can be copied and pasted in many cases.

If someone is willing to put that much work into something I’d think it would be more beneficial to do a comprehensive set of benchmarks that show users how to get the same method (svd, transpose, etc) in each language.

There are some mappings here: https://cheatsheets.quantecon.org/

2 Likes

If the idea is to compile Python into Julia for speed, using PyCall sounds like it defeats the purpose.

Many NumPy builds ship with MKL, whereas Julia uses OpenBLAS unless you built it from source with MKL. MKL is faster on many Intel architectures, especially those with AVX-512 (though OpenBLAS support for AVX-512 is improving).
Therefore, these benchmarks are not necessarily going to be favorable to Julia, even compared to regular Python + NumPy.

The higher the ratio of actual code to BLAS calls, the better Julia will do: loops, computational kernels, optimizing or running MCMC on a function you wrote yourself, and so on.

For performance, the transpiler also ought to be smart about using StaticArrays instead of base arrays, and applying @simd and @inbounds.
A pure Julia workflow is probably easier. Test your code. If it gets the correct answer and doesn’t have bounds errors, add @inbounds and friends. I guess the Python workflow would be similar: test in Python, and then transpile to Julia with those optimizations.

2 Likes

Is it possible and doable? Absolutely. Will it be hard? Yup. Is there enough of a benefit to make the major engineering effort worth the time and money? Probably not.

Folks keep subsetting and/or adapting Python in various ways to improve its performance. There’s a reason for this — the full Python semantics are precisely what keep it from going fast. Any such transpiler into Julia will also need to be a subset of Python if it is to improve on performance. So there’s suddenly a third “language” one must learn in order to use this transpiler. Perhaps it could use the same subset as Pythran, but then why not just use Pythran?

If you want to start using Julia with an existing Python codebase, just use Julia! You can use PyCall (or even ccall into Pythran-compiled libraries) to interface with your existing codebase.

29 Likes

Hmm, it seems that some Julia people are allergic to anything about the Python language :slight_smile: Do you know that it also has nice features and that it is widely used, including for numerical computing? This discussion could also be a test of how open-minded you are :slight_smile:

I know about the nature of the Python language (too dynamic and with too much introspection for very efficient numerical computing), about some problems of the Python interpreters and about the issue of the border between the native and Python worlds. It is not necessary to tell me about these issues. It is really not the subject here. I won’t tell you about Julia’s issues either.

The subset of Python / Numpy supported by Pythran is not a third language. It is what people use in practice in numerical kernels written with Numpy / Scipy. There is nothing crazy in terms of performance about this subset and it is actually very close to a widely used subset of Julia.

The C++ Numpy clone in Pythran does not use the Python interpreter. It is pure C++ and it is very efficient (sometimes faster than Julia for high-level code, even with all the @simd and @inbounds needed in Julia, and even when just linked with OpenBLAS).

But it could be interesting to have a Julia backend for Pythran, (1) to increase the use of Julia by Python developers / users (a lot of people), (2) for (maybe easier) GPU support, (3) for translation of Python numerical kernels into efficient Julia. Of course, using PyCall is not an option! It would be less efficient than what we get without Julia, even when just using Numpy!

Building an efficient Numpy.jl (in pure Julia) would be a great challenge and it would be useful (also for Julia). But of course, you can think that it is a waste of time to increase the interoperability between the two languages / frameworks and to attract Python users :slight_smile:

If Julia could be (as modern C++) a good tool to accelerate numerical kernels written in Python / Numpy, great! (for Julia and for Python) Otherwise, there are other solutions to keep numerical Python competitive in terms of performance.

1 Like

> If you want to start using Julia with an existing Python codebase, just use Julia! You can use PyCall (or even ccall into Pythran-compiled libraries) to interface with your existing codebase.

Sorry, it is not an option for me :slight_smile: If you like Julia, you may understand that I like Python, and that I would rather call Julia from Python than the other way around.

I am really not against coding in Julia when needed, for particular tasks. But for many other tasks, I think Python is really good, efficient and well suited.

I don’t think I am the only person thinking like this…

You seem to be saying that this kind of use of Julia in place of C++/Fortran/etc in the guts of some Python library would be good advertising, or a good way to convert users, but I don’t really see why. They would be unaware. If Julia’s GPU support offers an easier way to engineer the library you want, as in your point (2), then obviously that’s great, and if you contribute stress tests and bug fixes then there are benefits all round.

Are you picturing a pure-Julia wrapper for base-Julia commands? This could be an exceptionally thin wrapper.

Unless you are picturing it providing Numpy-like syntax, is that the point? A bit like how PyCall overloads dots to look like Python. This could be done, and may have benefits for letting people copy-paste a function body from old Python code you trust into new Julia code, and get a version which runs.

But this seems a separate concern from using Julia to write the guts of a Python library.

1 Like

I think the proper way to put it is “people coding in Pythran”, not Python programmers. I have never used Pythran, yet I use Python every day. I have, though, used Cython, loopy, and other projects to compile Python code. The difficulty of working in such an ecosystem frustrated me enough to make me want to start learning Julia. I actually find that the multidimensional Arrays already in the language look pretty Numpy-like.

Sometimes I miss certain functions (or functionality) from Numpy. Maybe a list of the functions we miss from Numpy could help developers give us some of the high-level methods we love and miss. One of those methods in my case is np.unique(X, return_counts=True).
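For anyone unfamiliar with that call, here is a rough pure-Python sketch of what it returns for a 1-D input: the sorted distinct values together with how often each one occurs. The helper name `unique_with_counts` is made up for illustration, and the sketch uses only the standard library rather than Numpy itself (it returns lists, not arrays).

```python
from collections import Counter


def unique_with_counts(values):
    """Hypothetical stand-in for np.unique(X, return_counts=True) on a 1-D sequence.

    Returns two lists: the sorted distinct values and their counts.
    """
    tally = Counter(values)               # value -> number of occurrences
    uniques = sorted(tally)               # np.unique returns values in sorted order
    counts = [tally[v] for v in uniques]  # counts aligned with `uniques`
    return uniques, counts


uniques, counts = unique_with_counts([3, 1, 3, 2, 1, 3])
print(uniques, counts)  # [1, 2, 3] [2, 1, 3]
```

A Julia equivalent would presumably need to return the same two aligned collections for a transpiled call site to keep working unchanged.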

1 Like

> You seem to be saying that this kind of use of Julia in place of C++/Fortran/etc in the guts of some Python library would be good advertising, or a good way to convert users, but I don’t really see why. They would be unaware.

They would not be completely unaware, because they would have to install Julia, which is already a good first step, isn’t it?

Despite the profusion of smileys, I find this kind of tone quite arrogant.

You asked about the viability of a project, and got reasonable answers from knowledgeable people. The fact that they don’t consider this project viable based on a cost/benefit analysis does not mean that they are not “open-minded”, merely that they would not invest time in it because they can use it for better things.

This is not because they don’t know Python: many Julia programmers formerly used Python, including its numerical packages. A lot of experience from Python and similar languages has been incorporated into the design of Julia. Preferring Julia does not mean that one is “allergic to Python”, simply that one prefers Julia. In most cases this is because Julia was designed to address issues that cannot be dealt with in Python without a fundamental redesign of the language.

22 Likes

You know, methods in numerical computing change. In numerical Python, Cython, Numba and Pythran are improving. Now, it’s very easy to use Pythran inside Cython (and Cython is widely used). You are right that Pythran is not yet very popular, but it is growing.

1 Like

> Despite the profusion of smileys, I find this kind of tone quite arrogant.

I’m very sorry for that. It was of course not my intention. I’m really sorry that it was interpreted like this. Note that very harsh opinions on Python from Julia users/developers could also come across as arrogant sometimes.

My point was that I am not talking about the problematic parts of the Python language here. Again, I don’t think the Numpy API is crazy in terms of performance. You define arrays and use them. It’s simple. Not too much dynamism or introspection.

Note that the fact that it can be translated to pure efficient C++ proves that it is not so horrible for performance.

> You asked about the viability of a project, and got reasonable answers from knowledgeable people.

It seemed to me that my question wasn’t correctly understood or considered seriously and that I had to reformulate it.

Again, it seems to me that answers based on “Python is bad for perf” are not “reasonable” (for this particular question).

1 Like

I think this would be equivalent to:

using DataStructures
counter(X)

1 Like

I’m a little confused, so that may be part of the problem. If you want to write something that calls existing Python-interfaced code from Julia, PyCall shouldn’t be a problem. If you want to call Julia from Python, use pyjulia. So I don’t think the problem here is with Python, but why have another step in between the two?

1 Like

I find it best not to form an emotional attachment to a computer language, just treat it as a tool.

Julia can be considered an experiment to address various shortcomings of other languages, including Python, just like Python was created to improve on previous languages, and almost surely one day someone will design a language that supersedes Julia as we know it now (it may also be called Julia but be quite different; that is immaterial).

Discussing shortcomings and advantages of languages should not be considered arrogant, if kept at a technical level. You may find this discussion interesting:

4 Likes

“exceptionally thin wrapper”: I think it would be a little bit more work than that… But yes, it would be a Julia wrapper around Julia functions mimicking the behavior of Numpy (which is most of the time quite reasonable).

The dot notation is really not the issue for a transpiler. It is not at all a problem if we have to transpile my_arr.sum() to np_array_method_sum(my_arr).
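To make that concrete, here is a minimal sketch of such a rewrite using Python’s standard `ast` module (it needs Python 3.9+ for `ast.unparse`). This is not how Pythran does it; the `ARRAY_METHODS` whitelist and the `rewrite` helper are invented for illustration, and a real transpiler would rely on type inference to know that the receiver is actually an array.

```python
import ast


class MethodToFunction(ast.NodeTransformer):
    """Rewrite `obj.method(args)` as `np_array_method_<method>(obj, args)`."""

    # Hypothetical whitelist of array methods; not taken from Pythran.
    ARRAY_METHODS = {"sum", "mean", "min", "max"}

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        func = node.func
        is_method = isinstance(func, ast.Attribute) and func.attr in self.ARRAY_METHODS
        # Leave module-level calls such as np.sum(x) alone (assumes the alias `np`).
        on_module = is_method and isinstance(func.value, ast.Name) and func.value.id == "np"
        if is_method and not on_module:
            new_func = ast.Name(id=f"np_array_method_{func.attr}", ctx=ast.Load())
            new_call = ast.Call(
                func=new_func,
                args=[func.value, *node.args],  # the receiver becomes the first argument
                keywords=node.keywords,
            )
            return ast.copy_location(new_call, node)
        return node


def rewrite(source):
    """Parse `source`, apply the rewrite, and return the new source text."""
    tree = MethodToFunction().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(rewrite("total = my_arr.sum()"))  # total = np_array_method_sum(my_arr)
```

The syntactic part really is mechanical; the hard part is what `np_array_method_sum` has to do on the target side to match Numpy’s behavior.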

Something much more important (and maybe harder?) is that the arrays have to behave like Numpy arrays: zero-based indexing and, by default, memory that is contiguous along the last index (row-major order).
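A small sketch of what the layout difference means in practice: the same zero-based multi-index maps to a different position in the flat buffer depending on whether the last index varies fastest (Numpy’s default, "C" order) or the first index does ("F" order, which is how Julia’s built-in Array is laid out, on top of being one-based). The function name `flat_offset` is made up for this example.

```python
def flat_offset(index, shape, order="C"):
    """Offset of a zero-based multi-index into a flat buffer.

    order="C": the last index varies fastest (row-major, Numpy's default).
    order="F": the first index varies fastest (column-major, as in Julia's Array).
    """
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same number of dimensions")
    pairs = list(zip(index, shape))
    if order == "F":
        pairs.reverse()  # treat the first axis as the fastest-varying one
    elif order != "C":
        raise ValueError("order must be 'C' or 'F'")
    offset = 0
    for i, n in pairs:
        if not 0 <= i < n:
            raise IndexError("index out of bounds")
        offset = offset * n + i  # Horner-style accumulation over the axes
    return offset


# Element [1, 0] of a 2x3 array:
print(flat_offset((1, 0), (2, 3), order="C"))  # 3
print(flat_offset((1, 0), (2, 3), order="F"))  # 1
```

So a Numpy clone either has to store data row-major itself or translate every index and stride, which matters as soon as code depends on memory layout (reshapes, views, calls into C).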

Actually, zero-based indexing would be pretty easy: Julia allows custom indexing behavior. For memory layout, look at strides in the docs.

The problem of the thread is: how hard is it to implement Numpy in Julia - which @mbauman answered. Since he wrote large parts of Julia’s array code, I’d trust his judgement :wink:

If you further want to convince the Julia community of your project and maybe hope to nerdsnipe someone to start writing Numpy.jl, I’m predicting that you will have a hard time.

Not because we are allergic to Python, but because the Julia users here found their solution with Julia and are rarely looking back to numpy syntax.

So, you’re trying to pitch a project that won’t improve the life of any Julia user. This is a tough sell! Even if it may improve Julia adoption, there is almost no one here who has the luxury of working on a project that won’t benefit them directly. Well, maybe there is someone here who has to port lots of Python code to Julia, but if you want to find those people, you should probably reformulate the question :wink:

If you decided this project is worthwhile, you’ll likely be the one to implement it :wink:
So the next step should be to start Numpy.jl, and maybe nerd snipe some Julia users to help with that with some more concrete issues :wink:

19 Likes