How hard would it be to implement Numpy.jl, i.e. Numpy in Julia?

This is starting to feel disrespectful. Please!

In all this discussion, I have always talked about “simple Python / Numpy used in the numerical kernels”, where of course we don’t use all of Python’s semantics / power / introspection.

I perfectly understand that compiling / transpiling full Python programs is very hard and that there are many performance problems.

I perfectly understand that Julia is a great piece of technology, and offers very interesting possibilities.

When I say “I don’t understand”, I am clearly not saying “everyone else is wrong”, but rather that I’d like to be convinced, and that I’ll study to understand better.

Julia is probably not much better (if at all) if you have a simple loop-based/vector computation on numpy-like arrays and a limited set of “built-in” data types (mainly numpy scalars), no outside library code, and limited polymorphism. These narrowly defined benchmarks are where Numba etcetera can do fine.

I’m sorry, but it is what is useful in my work for the numerical kernels. The scientific complexity + nice user experience + usage of outside libraries are built outside the numerical kernels, and I don’t need very high numerical efficiency for that (and Fortran 77 wouldn’t be fine for that).

I don’t think anyone here is saying it is impossible; most of the responses so far have simply been a cost-benefit analysis with a conclusion in the negative. At the risk of prolonging the thread, I tried to generalize your initial question in two ways, and make a few points about each.

  1. how suitable is Julia as a library implementation language; in other words, using Julia to implement a library which is consumed by other languages via, for the sake of simplicity, a C API.
  • The largest issue is that the tooling around specifying, precompiling, and verifying a C API is still somewhat immature.

  • The in-Julia debugging story is rapidly (!) improving; however, inter-language debugging has yet to be explored in the latest iteration (whereas it is feasible, if not always painless, for e.g. Python+Cython/C).

  • The Julia packaging system is great for use within Julia, but for a library you would probably want to use Conda, Homebrew, Apt, etc. — and none of those systems have really been used to distribute Julia libraries yet, so doing this would be breaking new ground. One exception is pip, which has at least one existence proof in the form of diffeqpy, linked earlier.

  2. how suitable is Julia as a compilation target, as compared to, say, C++ (which both Pythran and Cython effectively target).
  • Julia is about as suitable as any other garbage collected, high-level language (Java, Javascript, etc.). Julia is effectively an LLVM frontend, however — there is no “Julia byte code” to target, as in the JVM. So while there is some simplicity in targeting a higher-level language, you lose a significant amount of control and flexibility.

    People target the JVM (bytecode) and Javascript/WebAssembly at very minimum for logistical reasons which don’t really apply to Julia: the JVM is extremely mature and widespread; Javascript and WebAssembly are mature, and are (or will soon be) available on nearly every user-facing computing device on the planet.

  • At least one strength: support for LABEL and GOTO :slight_smile:, which JavaScript and WebAssembly do not have (though there are powerful workarounds, of course).
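For reference, a minimal sketch of what that looks like in Julia (the @label and @goto macros are in Base, so nothing extra is needed):

function find_first_negative(a)
    i = 1
    @label top
    if i <= length(a)
        a[i] < 0 && return i   # found a negative entry
        i += 1
        @goto top              # jump back to the label
    end
    return nothing
end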

4 Likes

This may cut to the heart of the matter: you seem to think that people here are being defensive about Julia. Quite the opposite. If anything, your suggestion that compiling Numpy to Julia would provide performance advantages is giving too much credit to Julia and not enough to Numpy. The counterargument (which has been given a number of times) is that for the API that Numpy exposes, the Numpy implementation is already pretty much optimal. The Numpy folks are really, really good at what they do. The notion that compiling to Julia will somehow make things faster is kind of insulting to the Numpy team—who aren’t here to defend themselves, so the Julia community is making that case for them.

Julia doesn’t magically generate faster code that other systems can’t match (we use LLVM, they use LLVM—how would we be faster?). Julia’s secret sauce is in designing a language that feels just as expressive and flexible but which can be compiled to fast code in a general and composable way. Which doesn’t help at all when the code you’re trying to run is written in Python rather than Julia.

The case you’ve made for why it would be interesting to compile Numpy to Julia is frankly a bit weak. Here are some of the arguments I’ve been able to glean from the discussion:

  1. It would be interesting (ok, but subjective).
  2. It could be faster (it won’t).
  3. It could lead to better GPU support (possible, but unlikely).

Improved GPU support by compiling to Julia is unlikely because if you can compile Numpy to GPU via Julia, then you could even more easily compile Numpy to GPU directly. This would be simpler and faster, both in terms of compile time and the resulting code speed, and since Julia and Numpy both use LLVM for code generation, the end result would be very similar. The biggest problem for Numpy on GPU is likely that it’s much harder to avoid dynamic allocation in Python than in Julia—even the subset that Numpy can compile needs it—and the GPU doesn’t play well with dynamic memory allocation. Compiling to Julia wouldn’t help with that at all, since you’d still be using Pythonic APIs, not Julian ones.
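To make the allocation point concrete (a minimal Julia sketch, not actual GPU code): the allocating and non-allocating styles are syntactically distinct in Julia, which is part of what makes allocation easier to avoid.

a = rand(1000)
b = similar(a)

b = 2 .* a .+ 1    # allocates a fresh result array on every evaluation
b .= 2 .* a .+ 1   # fused broadcast that writes into b in place, no allocation

The first form is the natural way to write Numpy-style code, which is why even the compilable subset tends to need dynamic allocation.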

All that said, the great thing about open source is that people can do what they want to! If someone—perhaps you—wants to try to reimplement Numpy in Julia, they can absolutely do so. The thing people in this thread are trying to tell you is that you are likely to be disappointed with the results.

19 Likes

Here are a few pointers to get started:

That last one is by the author of LuaJIT, and the points he makes are very good: again, it’s not impossible to make Python fast, but the level of engineering effort required is very high (possibly even more than for JavaScript, for some features). Obviously various projects have succeeded to a degree, like PyPy (at the cost of compatibility with the CPython ecosystem until recently, which is part of why it is not more widespread), or Numba (by restricting scope, etc.).

So in that respect, targeting Julia doesn’t add too much to the intrinsic work required on the frontend.

Personally, if I were going to work on a “make Python fast” project, I would probably choose GitHub - oracle/graalpython: A Python 3 implementation built on GraalVM. This has two advantages:

  • the implementation is “simple” (relatively speaking) in the sense that a language target is written as an AST interpreter, and then you get a JIT “for free”. So, the idea here is that by simplifying the implementation, the amount of engineering effort required to get a competitive JIT with all the fancy tricks needed to make Python fast is reduced (see e.g. Laurence Tratt: Domain Specific Language Implementation via Compile-Time Meta-Programming).
  • They’ve demonstrated in the Ruby version that they can compile Ruby C extensions (now via LLVM->Sulong) together with Ruby code, and get very impressive speedups (both for pure Ruby and, potentially, with inter-language inlining, etc.). This would be a huge boon, because it would maintain compatibility with the existing scientific Python ecosystem.
6 Likes

If the restricted kind of code that Numba/Pythran can optimize is fine for your application, then lucky for you! We use Julia because we want more flexibility and composability than that, not because Julia is magically faster at simple a[i] = 2*a[i] + 1 style loops over arrays of Float64 scalars (it isn’t). There was a long thread about this that we don’t want to repeat: Julia motivation: why weren't Numpy, Scipy, Numba, good enough?
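For concreteness, the kind of loop being discussed looks like this in Julia (a minimal sketch; Numba and Pythran compile the equivalent Python loop just as well):

function update!(a::Vector{Float64})
    @inbounds @simd for i in eachindex(a)
        a[i] = 2 * a[i] + 1   # simple arithmetic on unboxed Float64 values
    end
    return a
end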

5 Likes

I’ll post some once I have time to make a POC package.

1 Like

Hey @Pierre_Augier nice to see you here. This answer is more of a Pythran answer than a Julia answer.

Performance

On the performance side, I am not sure that there would be a benefit to using Julia’s array library over the current Pythran numpy backend, mostly because you guys have done an incredible job at making pythonic++ fast. As an xtensor/xsimd developer, I can say that Pythran is already hard to beat.

For comparisons to Julia on simple broadcasting operations, Serge made a Pythran submission to the Julia challenge (which was not accepted because it used Pythran) that gave results 12 times faster than Julia. (The same performance was obtained with the pure C++ submission and the C++/xtensor submission.)

For a more “real world” example involving more operations, there was recently another set of benchmarks published with a ray-tracing problem: How Fast Is R with FastR? - Nextjournal. Wolf wrote the equivalent code with NumPy, and the Pythran-compiled code performed much faster than Julia.

More generally, in the benchmarks that we have been doing, Pythran has been producing code that is amongst the fastest, compared with Julia, Numba, and naive C++ implementations.

I think that this relative slowness of Julia on these array operations is going to be resolved eventually, but with the current state of things, Pythran is better off with pythonic++ and xsimd than it would be with a Julia backend.

Maintenance

Since a lot of Pythran’s code involves re-implementing core Python features in pure C++ (list comprehensions, etc.), redoing all of this would look almost like restarting Pythran from scratch to target Julia.

Then there is the question of NumPy API coverage, which is more tricky. Maintaining the NumPy backend of Pythran certainly means looking after a lot of code. We have been discussing with Serge about converging on xtensor as a numpy backend in Pythran. Since we already funded the xsimd transition, this may be the way forward. Other projects than Pythran depend on xtensor, and there are more eyes on the code base.

A practical challenge is that xtensor has very different semantics from Pythran in terms of lifetime of expressions, but on the API coverage side, covering the largest possible portion of NumPy is one of the main goals for xtensor (see the NumPy to xtensor cheat sheet). Getting the huge Pythran test suite and benchmark suite to run against xtensor would be like the ultimate unit test for xtensor, and a good way to track down performance bugs. On the other hand, with a Julia backend to Pythran, it may be difficult to get upstream to make changes to support your use case.

In any case, Pythran is just fantastic and it allowed us to produce amazingly fast code from a few lines of NumPy operations.

Producing compiled Python extensions

Another common use case of Pythran is to produce binary extensions to Python. To my knowledge, producing compiled binaries that don’t depend on Julia’s runtime is not 100% supported by Julia (but I may be wrong here, as things may have changed since the last time I looked). In any case, just like Pythran does, we would need the resulting Julia-implemented extensions to operate on NumPy arrays in place (by calling the numpy C API from Julia, or by using the Python buffer protocol).

7 Likes

Well, the exact issue is known, and for example DifferentialEquations.jl’s internals have a specific broadcast override that works around the issue:

https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/diffeqfastbc.jl#L30-L45

The problem is really that LLVM is getting tripped up over what it can and can’t assume, since the Julia versions are quite a bit more generic than the ones that Wolf implemented. If the Julia one is narrowed down to match Wolf’s in terms of features, then you get the speedup, which is exactly what we did with our fastbc. But of course this cannot be done in general, since Julia’s use of broadcast generally isn’t able to satisfy all of the assumptions that we required here (so we require using @.. or a specific array wrapper to opt into this different broadcast with its increased assumptions).
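For illustration, usage looks something like this (a sketch, assuming the @.. macro from the linked diffeqfastbc.jl behaves like Base’s @. fused in-place broadcast, just with the extra assumptions baked in):

using DiffEqBase

u  = rand(100)
du = similar(u)
@.. du = 2u + 1   # fused, in-place, and opted in to the stricter assumptions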

4 Likes

I think this is still correct. But I’d like to add that the most recent version of PyJulia added custom Julia system image support: Custom Julia system image — PyJulia 0.6.1 documentation. This lets you compile the Julia packages you want to use into a shared library sys.{so,dll,dylib} together with the Julia runtime. So it still includes Julia itself, but you already have the “compiled binary” part. But there is no reasonable way to distribute this compiled binary, as it’s not relocatable at the moment.

This is what PyCall (hence PyJulia) has been doing.
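For concreteness, the no-copy route with PyCall looks roughly like this (a sketch; PyArray wraps the underlying numpy buffer instead of copying it):

using PyCall
np = pyimport("numpy")

a = pycall(np.zeros, PyArray, 5)  # request a no-copy PyArray wrapper as the return type
a[1] = 42.0                       # mutates the numpy array in place, visible from Python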

(Mentioning PyCall/PyJulia in the context of the OP didn’t make sense, but I thought it was relevant to your comment.)

Well, in fact, all non-Julia submissions were reaching the 12x factor, including Pythran’s. Pythran and xtensor arrays are more general than Julia’s native ones with respect to layouts (they support row-major, column-major, and strided).

Wolf’s feat of making a non-xtensor C++ submission with a minimal expression system was merely constrained by it having to be self-contained in a couple of hundred lines of C++.

1 Like

Well yes, that’s what happens when you are testing against something with a known performance bug. I’m not sure why that’s surprising.

2 Likes

I didn’t see a Pythran version using something like MultiScaleArrays with SymEngine symbolic number types.

Julia’s broadcast doesn’t require row-major or column-major layout (those are traits of the array, and they can change how the iteration is generated, but they aren’t traits of the operation). Right there is a case where such an idea isn’t even defined, so if that’s the idea you’re thinking of, then it’s not general.

2 Likes

The reason for the 12x speed difference has been explained multiple times and is very simple: if you don’t use the sin compiler intrinsic, none of the compilers used was able to infer that sin is pure, and therefore didn’t hoist it out of the loop… Btw, Wolf also ran into that when he tried to use xsimd::sin :wink: Julia’s sin is implemented in pure Julia, which makes it have that problem by default, but it is also easily solved by calling into the LLVM sin intrinsic.
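For reference, calling into the intrinsic from Julia is a one-liner (a sketch; the llvm_sin name is made up for illustration):

llvm_sin(x::Float64) = ccall("llvm.sin.f64", llvmcall, Float64, (Float64,), x)

llvm_sin(1.0)  # ≈ 0.8414709848078965, same value as the pure-Julia Base sin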

6 Likes

I don’t see at all how it is related to the subject of Pythran… On the C++ xtensor side, I must say that I had never tried combining it with SymEngine, but it seems to just work:

Now, my point in the previous post was that from Pythran’s standpoint, there isn’t much interest in Julia at the moment.

No, you simplified the problem to be one that is SymEngine on arrays, not SymEngine on graphs reinterpreted as vectors. That’s a substantially simpler problem, but it also highlights the Pythran point. Does this simplified version of the problem work in the Pythran version? Does the original problem I proposed work with the Pythran or C++ broadcast implementations? Obviously the answer is no (the C++ one is tied to an xtensor array type, and the Pythran one cannot handle the number types), yet this is an example that I pulled out of my standard morning Julia routine. Usually a counterexample like that would have people concede the falsity of a statement like:

And the other point, of course, is that Julia has row-major, column-major, and strided arrays that are compatible with broadcast. They aren’t even in packages; they are built into Base. Transpose{T,Matrix{T}} is a row-major matrix and a very commonly used type (and if you have column-major matrices, implementing the row-major matrix via Transpose in the type system, and having that handle the reformatting of the loops, is quite nice). Julia’s Base comes with many different strided arrays depending on what you need:

julia> StridedArray
Union{DenseArray{T,N}, ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray}, ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T, SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray}, SubArray{T,N,A,I,L} where L where I<:Tuple{Vararg{Union{Int64, AbstractRange{Int64}, AbstractCartesianIndex},N} where N} where A<:Union{ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T, ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T, SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T, DenseArray}} where N where T

and each of those is generic within itself as well.
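A small example of the Transpose point, using only Base:

A  = [1.0 2.0; 3.0 4.0]   # an ordinary column-major Matrix{Float64}
At = transpose(A)         # Transpose{Float64,Matrix{Float64}}: a row-major view, no copy
At[1, 2] == A[2, 1]       # true: same memory, different layout convention
At .+ 1                   # broadcasts like any other AbstractArray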

Yeah, I don’t think anyone would disagree with that. Julia and Pythran are doing very different things.

9 Likes

Chris, IMHO you didn’t fully qualify your initial question (just posting links to packages that are not widely known is not a good way to specialize a question).

Obviously graph structures can be implemented on top of an xtensor array type as well, or the graph structure can implement the xexpression interface (just as you seem to do in your MultiScaleArrays package). Implementing the xexpression interface gives you lazy evaluation and broadcasting for “free”!
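(For comparison, the Julia-side analogue is the AbstractArray interface. A minimal sketch with a made-up wrapper type: implementing size and getindex is enough to get generic broadcasting.)

struct Scaled{T} <: AbstractVector{T}   # hypothetical example type
    scale::T
    data::Vector{T}
end
Base.size(s::Scaled) = size(s.data)
Base.getindex(s::Scaled, i::Int) = s.scale * s.data[i]

s = Scaled(2.0, [1.0, 2.0, 3.0])
s .+ 1.0   # == [3.0, 5.0, 7.0], via the generic broadcast machinery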

It’s okay to just say: yes, that’s cool that you can do that in C++ as well – there is no need to be angry about it :wink:

Even Pythran in its current state can probably do a lot of the stuff you’re talking about; it’s just a matter of implementing the Python->C++ mapping for symbolic expressions. Pythran’s internal C++ array implementation uses a templated array type (like xtensor, and just like what you showed for Julia).

C++ and Julia are really not that different in my opinion.

I would appreciate it if you were not so dismissive. I saw some glimmers of hope on the julia slack where people were constructively criticizing the behavior of “julia fanboys” (quoting verbatim), especially in this thread.

3 Likes

Yeah, I’ve been saying that for years. Julia is essentially a cleaned up C++ with a bunch of automated type computations, JIT compilation, and a nice metaprogramming AST. Generic Julia functions and C++ templates are pretty much the same, with the former being automatically generic and the latter choosing the opposite default. And the default choice is then tied to JIT compilation of course, because AOT compilation means you need to compile a bunch of code if you allow too many possibilities, while JIT compilation only compiles the specializations it needs on demand so it’s okay to define a function with nearly infinitely many specializations as long as you don’t try and compile every possibility.
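A trivial illustration of that default:

f(x) = 2x + 1   # generic over any x supporting * and +, like an implicit template

f(1)       # compiles a specialization for Int on first call
f(1.0)     # compiles a specialization for Float64
f(big"1")  # compiles a specialization for BigInt, with no explicit instantiation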

So yes, that part is clear and we all agree that they are equally generic, with different pros and cons for interactivity and the ability to easily create a binary. And there’s no real performance magic either: when the two are generating their optimal code they will have the same speed (if the C++ is compiled with Clang).

But one thing that is different is the standardization. C++'s problem isn’t that standards don’t exist, it’s that far too many exist. In Julia, you can expect that any array package you find will work with Julia’s broadcast system, and this broadcast system will let different array types work together. In C++, I can’t pick up two random packages and expect a .+ b or equivalent between array types in the two packages to just work. That’s a pretty big difference usage-wise. Of course, C++ could have a central broadcast system, but it doesn’t.

What I did not see before is that the xtensor xexpression interface basically allows for the definition of broadcasts in a standard way, like the Julia Base broadcast and AbstractArray interfaces. That is pretty neat and looks well done! But until you see almost all C++ packages implementing this interface, you won’t be able to just pick up a SUNDIALS NVector and broadcast it against a matrix type from Boost. On the flip side, you’d expect this kind of thing to work between any Julia packages. Of course, this isn’t a technical issue so much as a community issue, but that’s the reason why I said “the C++ one is tied to an xtensor array type” (referring to the C++ broadcast solution): it isn’t a true language-wide broadcast solution like the Julia one is.
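A small sketch of what that buys you in practice, using two standard-library array types from different packages:

using SparseArrays, LinearAlgebra

S = sprand(3, 3, 0.5)   # sparse matrix type from SparseArrays
D = Diagonal(ones(3))   # structured matrix type from LinearAlgebra
S .+ D                  # just works, through the one shared broadcast system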

The other thing I objected to was the idea that because C++ is as generic, Pythran must also be completely generic.

Having to implement “the Python->C++ mapping for symbolic expressions” by hand is a pretty big difference. Would you agree that having to write such a mapping is pretty different from having it automatically exist through a robust number system, like in C++ and Julia? Since Pythran hasn’t even finished supporting NumPy yet, it sounds like it’s a ways off before all of the Python symbolic libraries are supported. I would definitely agree that Pythran is a great project if what you need fits within the confines of what the transpiler is able to do, but some projects need more generic tooling, and that is where something like C++ or Julia has an edge, since having to change the compiler every time you want to do something new is not sustainable.

To link back to the OP, this generality and centralization are the reason why Julia can be advantageous to develop in. However, Steven said it best in the earlier post (How hard would it be to implement Numpy.jl, i.e. Numpy in Julia? - #32 by stevengj): these pieces don’t translate over when you’re just looking to make a NumPy implementation, so there’s really no advantage to implementing NumPy in Julia over something like C++ or Pythran.

14 Likes

I don’t know if that performance bug is also what caused Julia to be slower in the other benchmarks, but it is significant enough to be noted from a Pythran perspective.

Yeah, I was not sure how xtensor “simplified the problem”, but I did not want to divert the conversation further from Pythran. This thread was about: “should Pythran use a Julia backend”. I mentioned a feature of Pythran that is currently missing from native Julia arrays, and the response was to show some unrelated features of Julia that don’t have any relationship with what Pythran was intending to use Julia for.

Maybe the point got muddled, but you made a statement that “Pythran and xtensor arrays are more general than julia native ones with respect to layouts (support row-major, column-major, strided)”. In a very strict interpretation this could be true, but in reality it’s false, because the whole point of Julia is to not use Array, but instead have code use any AbstractArray type, so row-major, column-major, strided, graphical, sparse, permuted, blocked, etc. are all just type information over a slab of memory presented as an array, and Julia code doesn’t care which one you give it. For example, Transpose{T,Matrix{T}} is a perfectly fine row-major array that’ll work wherever it’s needed, so there’s no reason to implement a row-major array as a primitive type; instead you let dispatch apply to this.

So, since the true sense of an array in Julia is any array type made by any package in Julia which subtypes AbstractArray, I can pull out a few thousand examples to show how “what’s built in” (to the Julia ecosystem) is already more generic than Pythran/xtensor. (And of course xtensor is probably generic enough that you can spend some time writing your own C++ code to implement some array type, but that’s quite different from having the choices already exist, in the language that you’re scripting with. We can argue that forever though: C++ is fine if you understand it.)
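To make that concrete, here is a minimal sketch (the total function is made up for illustration): one generic function applied unchanged across layouts.

using SparseArrays

total(a::AbstractArray) = sum(2x + 1 for x in a)

total(rand(3, 3))                    # column-major Array
total(transpose(rand(3, 3)))         # row-major view (Transpose)
total(view(rand(10, 10), 1:2:9, :))  # strided SubArray
total(sprand(5, 5, 0.3))             # sparse matrix, same generic code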

However, to use this to one’s advantage, you have to be using Julia’s type system. So if you then want to create a Numpy.jl, you’d hardcode to specific types and implement specific functions. But if you’re stuck at that point, you lose all of the genericness, so why use Julia? You’re essentially just writing C/Fortran code at that point, where maybe dispatch can help structure the code a bit more nicely but you wouldn’t expect performance gains and the user experience would be largely the same in the best case scenario.

9 Likes