How hard would it be to implement Numpy.jl, i.e. Numpy in Julia?

Hey @Pierre_Augier nice to see you here. This answer is more of a Pythran answer than a Julia answer.

Performance

On the performance side, I am not sure that there will be a benefit to using julia’s array library over the current Pythran numpy backend, mostly because you guys have done an incredible job at making pythonic++ fast. As a xtensor/xsimd developer I can say that pythran is already hard to beat.

For comparison on simple broadcasting operations, Serge made a Pythran submission to the Julia challenge (which was not accepted because it used Pythran) that gave results 12 times faster than the Julia entry. (The same performance was obtained with the pure C++ submission and the C++/xtensor submission.)

For a more “real world” example involving more operations, another set of benchmarks was recently published around a ray-tracing problem: How Fast Is R with FastR? - Nextjournal. Wolf wrote the equivalent code with NumPy, and the Pythran-compiled code performed much faster than Julia.

More generally, in the benchmarks we have been running, Pythran has consistently produced amongst the fastest code when compared with Julia, Numba, and naive C++ implementations.

I think that this relative slowness of Julia on these array operations will be resolved eventually, but in the current state of things, Pythran is better off with pythonic++ and xsimd than it would be with a Julia backend.

Maintenance

Since a lot of Pythran’s code involves re-implementing core Python features in pure C++ (list comprehensions etc.), redoing all of this would almost amount to restarting Pythran from scratch to target Julia.

Then there is the question of NumPy API coverage, which is trickier. Maintaining the NumPy backend of Pythran certainly involves watching over a lot of code. We have been discussing with Serge about converging on xtensor as the NumPy backend in Pythran. Since we already funded the xsimd transition, this may be the way forward. Projects other than Pythran depend on xtensor, so there are more eyes on the code base. A practical challenge is that xtensor has very different semantics from Pythran in terms of lifetime of expressions, but on the API coverage side, covering the largest possible portion of NumPy is one of the main goals of xtensor (see the NumPy to xtensor cheat sheet). Getting the huge Pythran test suite and benchmark suite to run against xtensor would be the ultimate unit test for xtensor, and a good way to track down performance bugs. With a Julia backend to Pythran, on the other hand, it may be difficult to get upstream to make the changes needed to support your use case.

In any case, Pythran is just fantastic and it allowed us to produce amazingly fast code from a few lines of NumPy operations.

Producing compiled Python extensions

Another common use case of Pythran is producing binary extensions to Python. To my knowledge, Julia does not fully support producing compiled binaries that don’t depend on Julia’s runtime - but I may be wrong here, as things may have changed since I last looked. In any case, just like Pythran does, the resulting Julia-implemented extensions would need to operate on NumPy arrays in place (by calling the NumPy C API from Julia, or by using the Python buffer protocol).
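To make the “operate on NumPy arrays in place” requirement concrete, here is a minimal stdlib-only Python sketch of what the buffer protocol buys you: a function that mutates the caller’s buffer without copying, which is what a compiled extension must do to a NumPy array. (This is purely illustrative; it is not Pythran or Julia output, and `scale_inplace` is a made-up name.)

```python
from array import array

def scale_inplace(buf, factor):
    """Mutate the caller's buffer through the buffer protocol,
    the way a compiled extension would modify a NumPy array
    without copying. Illustrative sketch only."""
    mv = memoryview(buf)            # zero-copy view over the caller's memory
    for i in range(len(mv)):
        mv[i] = mv[i] * factor      # writes go straight to the original buffer

data = array('d', [1.0, 2.0, 3.0])  # array.array exposes the buffer protocol
scale_inplace(data, 2.0)
print(list(data))                   # the original array was modified in place
```

A NumPy array exposes the same protocol, so the identical mechanism works whether the consumer is Python, C, or (via the C API) Julia.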

7 Likes

Well, the exact issue is known, and for example DifferentialEquations.jl’s internals has a specific broadcast override that works around the issues:

https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/diffeqfastbc.jl#L30-L45

The problem is really that LLVM gets tripped up over what it can and cannot assume, since the Julia versions are quite a bit more generic than the ones Wolf implemented. If the Julia code is narrowed down to Wolf’s feature set, you get the speed-up, which is exactly what we did with our fastbc. But of course this cannot be done in general, since Julia’s use of broadcast usually cannot satisfy all of the assumptions we required here (which is why this stricter broadcast has to be opted into with @.. or a specific array wrapper).

4 Likes

I think this is still correct. But I’d like to add that the most recent version of PyJulia added custom Julia system image support: Custom Julia system image — PyJulia 0.6.1 documentation. This lets you compile the Julia packages you want to use into a shared library sys.{so,dll,dylib} together with the Julia runtime. So it still includes Julia itself, but you already have the “compiled binary” part. However, there is no reasonable way to distribute this compiled binary, as it’s not relocatable at the moment.

This is what PyCall (hence PyJulia) has been doing.

(Mentioning PyCall/PyJulia in the context of the OP didn’t make sense, but I thought it was relevant to your comment.)

Well, in fact, all non-Julia submissions reached the 12x factor, including Pythran’s. Pythran and xtensor arrays are more general than Julia’s native ones with respect to layouts (they support row-major, column-major, and strided layouts).
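For readers unfamiliar with these layouts, the difference is just index arithmetic over one flat buffer, and a “strided” container generalizes both. A small stdlib Python sketch (the `make_view` helper is illustrative, not an API from Pythran or xtensor):

```python
# A rows x cols matrix stored in one flat buffer:
#   row-major (C order):       element (i, j) lives at i*cols + j
#   column-major (Fortran):    element (i, j) lives at i + j*rows
# A strided view stores (stride_i, stride_j) and computes
# i*stride_i + j*stride_j, which covers both layouts (and slices).

def make_view(buf, shape, strides):
    rows, cols = shape
    si, sj = strides
    return [[buf[i * si + j * sj] for j in range(cols)] for i in range(rows)]

flat = [1, 2, 3, 4, 5, 6]                     # six elements viewed as 2x3
row_major = make_view(flat, (2, 3), (3, 1))   # strides (cols, 1)
col_major = make_view(flat, (2, 3), (1, 2))   # strides (1, rows)
print(row_major)   # [[1, 2, 3], [4, 5, 6]]
print(col_major)   # [[1, 3, 5], [2, 4, 6]]
```

Supporting all three in one array type means carrying the strides at runtime (or in the type), which is the generality being claimed here.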

Wolf’s feat of making a non-xtensor C++ submission with a minimal expression system was merely constrained by it having to be self-contained in a couple of hundred lines of C++.

1 Like

Well yes, that’s what happens when you are testing against something with a known performance bug. I’m not sure why that’s surprising.

2 Likes

I didn’t see a Pythran version using something like MultiScaleArrays with SymEngine symbolic number types:

Julia’s broadcast doesn’t require row- or column-major (those are traits of the array and can change the generated iteration, but they aren’t traits of the operation). Right there is a case where such an idea isn’t even defined, so if that’s the idea you’re thinking of, then it’s not general.

2 Likes

The reason for the 12x speed difference has been explained multiple times and is very simple: if you don’t use the sin compiler intrinsic, none of the compilers used was able to infer that sin is pure, and therefore doesn’t hoist it out of the loop… Btw, Wolf also ran into this when he tried to use xsimd::sin :wink: Julia’s sin is implemented in pure Julia, which gives it this problem by default, but it is also easily solved by calling into the LLVM sin intrinsic.
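The hoisting issue can be sketched in a few lines of stdlib Python: if the compiler cannot prove `sin` has no side effects, a loop-invariant call stays inside the loop; hoisting it by hand is exactly the transformation (loop-invariant code motion) that the purity annotation enables. Function names here are illustrative.

```python
import math

def kernel_naive(xs, t):
    # If sin is not known to be pure, this loop-invariant call is
    # re-evaluated on every iteration - the behavior being benchmarked.
    return [x * math.sin(t) for x in xs]

def kernel_hoisted(xs, t):
    # What loop-invariant code motion produces once the compiler
    # knows sin is side-effect free: one call, result reused.
    s = math.sin(t)
    return [x * s for x in xs]

xs = [1.0, 2.0, 3.0]
# Same result either way; only the amount of work per iteration differs.
assert kernel_naive(xs, 0.5) == kernel_hoisted(xs, 0.5)
```

In Python both versions genuinely call `sin` as written; in a compiled language the point is that the optimizer performs the second form automatically, but only when purity is provable.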

6 Likes

I don’t see at all how this is related to the subject of Pythran… On the C++/xtensor side, I must say that I had never tried combining it with SymEngine, but it seems to just work:

Now, my point in the previous post was that from Pythran’s standpoint, there isn’t much interest in Julia at the moment.

No, you simplified the problem to one that is SymEngine on arrays, not SymEngine on graphs reinterpreted as vectors. That’s a substantially simpler problem, but it also highlights the point about Pythran. Does this simplified version of the problem work in the Pythran version? Does the original problem I proposed work with the Pythran or C++ broadcast implementations? Obviously the answer is no (the C++ one is tied to an xtensor array type, and the Pythran one cannot handle the number types), yet this is an example I pulled out of my standard morning Julia routine. Usually a counterexample like that would have people concede the falsity of a statement like:

And the other point, of course, is that Julia does have row-major, column-major, and strided arrays that are compatible with broadcast. They aren’t even in packages; they are built into Base. Transpose{T,Matrix{T}} is a row-major matrix and a very commonly used type (and if you have column-major matrices, implementing the row-major matrix via Transpose in the type system and having that handle the reformatting of the loops is quite nice). Julia’s Base comes with many different strided arrays depending on what you need:

julia> StridedArray
Union{DenseArray{T,N}, ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray}, ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where
A<:DenseArray where N where T, DenseArray} where N where T, SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray}, SubArray{T,N,A,I,L} where L where I<:Tuple{Vararg{Union{Int64, AbstractRange{Int64},
AbstractCartesianIndex},N} where N} where A<:Union{ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T, ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{ReinterpretArray{T,N,S,A} where S where A<:Union{SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T,
SubArray{T,N,A,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where A<:DenseArray where N where T, DenseArray} where N where T, DenseArray}} where N where T

each of those generic within themselves as well.

Yeah, I don’t think anyone would disagree with that. Julia and Pythran are doing very different things.

9 Likes

Chris, IMHO you didn’t fully qualify your initial question (just posting links to packages that are not widely known is not specifying a question well).

Obviously graph structures can be implemented on top of an xtensor array type as well, or the graph structure can implement the xexpression interface (just as you seem to do in your MultiScaleArrays package). Implementing the xexpression interface gives you lazy evaluation and broadcasting for “free”!

It’s okay to just say: yes, that’s cool that you can do that in C++ as well – there is no need to be angry about it :wink:

Even Pythran in its current state can probably do a lot of what you’re talking about; it’s just a matter of implementing the Python-to-C++ mapping for symbolic expressions. Pythran’s internal C++ array implementation uses a templated array type (like xtensor, and just like what you showed for Julia).

C++ and Julia are really not that different in my opinion.

I would appreciate it if you were not so dismissive. I saw some glimmers of hope on the julia slack where people were constructively criticizing the behavior of “julia fanboys” (quoting verbatim), especially in this thread.

3 Likes

Yeah, I’ve been saying that for years. Julia is essentially a cleaned up C++ with a bunch of automated type computations, JIT compilation, and a nice metaprogramming AST. Generic Julia functions and C++ templates are pretty much the same, with the former being automatically generic and the latter choosing the opposite default. And the default choice is then tied to JIT compilation of course, because AOT compilation means you need to compile a bunch of code if you allow too many possibilities, while JIT compilation only compiles the specializations it needs on demand so it’s okay to define a function with nearly infinitely many specializations as long as you don’t try and compile every possibility.

So yes, that part is clear and we all agree that they are equally generic, with different pros and cons for interactivity and the ability to easily create a binary. And there’s no real performance magic either: when the two are generating their optimal code they will have the same speed (if the C++ is compiled with Clang).

But one thing that is different is the standardization. C++'s problem isn’t that standards don’t exist, it’s that far too many exist. In Julia, you can expect that any array package you find will work with Julia’s broadcast system, and this broadcast system will let different array types work together. In C++, I can’t pick up two random packages and expect a .+ b or equivalent between array types in the two packages to just work. That’s a pretty big difference usage-wise. Of course, C++ could have a central broadcast system, but it doesn’t. What I did not see before is that the xtensor xexpression interface basically allows for the definition of broadcasts in a standard way, like the Julia Base broadcast and AbstractArray interfaces. That is pretty neat and looks well done! But until you see almost all C++ packages implementing this interface, you won’t be able to just pick up a SUNDIALS NVector and broadcast it against a matrix type from Boost. On the flip side, you’d expect to have this kind of thing work between any Julia packages. Of course, this isn’t a technical issue so much as a community issue, but that’s the reason why I said “the C++ one is tied to an xtensor array type” (referring to the C++ broadcast solution) isn’t a true language-wide broadcast solution like the Julia one is.
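The “central broadcast protocol” idea can be sketched in a few lines: two unrelated array-like types interoperate because both implement one shared elementwise interface, analogous to Julia’s Base broadcast machinery or xtensor’s xexpression interface. All class and method names below are invented for illustration; this is a toy, not either library’s actual design.

```python
# Toy "shared broadcast protocol": any type subscribing to the
# protocol (here, a common mixin providing elementwise ops over a
# .data attribute) composes with any other subscriber, the way
# AbstractArray types compose under Julia's broadcast.

class ElementwiseMixin:
    def __add__(self, other):
        # Result takes the left operand's type, loosely mirroring how
        # broadcast systems pick a result style from the operands.
        return type(self)(a + b for a, b in zip(self.data, other.data))

class PackageAVector(ElementwiseMixin):
    """Imagine this living in one independent package."""
    def __init__(self, data):
        self.data = list(data)

class PackageBVector(ElementwiseMixin):
    """...and this in a completely different one."""
    def __init__(self, data):
        self.data = list(data)

a = PackageAVector([1, 2, 3])
b = PackageBVector([10, 20, 30])
result = a + b                      # cross-package elementwise op "just works"
print(type(result).__name__, result.data)
```

The point being made about C++ is not that such a protocol is impossible (xexpression is exactly one), but that interoperability only materializes once nearly every package in the ecosystem opts into the same one.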

The other thing I objected to was the idea that, because C++ is as generic, Pythran must also be completely generic.

That is a pretty big difference. Would you agree that having to write such a mapping is quite different from having it automatically exist through a robust number system like in C++ and Julia? Since Pythran hasn’t even finished supporting NumPy yet, it sounds like it’s a long way off before all of the Python symbolic libraries are supported. I would definitely agree that Pythran is a great project if what you need fits within the confines of what the transpiler can do, but some projects need more generic tooling, and that is where something like C++ or Julia has an edge, since having to change the compiler every time you want to do something new is not sustainable.

To link back to the OP, this generality and centralization are the reason why Julia can be advantageous to develop in. However, Steven said it best in the earlier post (How hard would it be to implement Numpy.jl, i.e. Numpy in Julia? - #32 by stevengj): these pieces don’t translate over when you’re just looking to make a NumPy implementation, so there’s really no advantage to implementing NumPy in Julia over something like C++ or Pythran.

14 Likes

I don’t know if that performance bug is also what caused Julia to be slower in the other benchmarks, but it is significant enough to be noted from a Pythran perspective.

Yeah, I was not sure how xtensor “simplified the problem”, but I did not want to divert the conversation further from Pythran. This thread was about “should Pythran use a Julia backend”. I mentioned a feature of Pythran that is not currently in native Julia arrays, and the response was to show some unrelated features of Julia that have no relationship with what Pythran intended to use Julia for.

Maybe the point got muddled, but you made the statement that “Pythran and xtensor arrays are more general than julia native ones with respect to layouts (support row-major, column-major, strided)”. Under a very strict interpretation this could be true, but in reality it’s false, because the whole point of Julia is not to use Array but instead to have code work on any AbstractArray type, so row-major, column-major, strided, graphical, sparse, permuted, blocked, etc. are all just type information over a slab of memory presented as an array, and Julia code doesn’t care which one you give it. For example, Transpose{T,Matrix{T}} is a perfectly fine row-major array that will work wherever it’s needed, so there’s no reason to implement a row-major array as a primitive type; instead you let dispatch apply to it. Since the true sense of an array in Julia is any array type made by any package which subtypes AbstractArray, I can pull out a few thousand examples to show how “what’s built in” (to the Julia ecosystem) is already more generic than Pythran/xtensor. (Of course, xtensor is probably generic enough that you can spend some time writing your own C++ code to implement some array type, but that’s quite different from having the choices already existing in the language you’re scripting with. We can argue that forever, though: C++ is fine if you understand it.)

However, to use this to one’s advantage, you have to be using Julia’s type system. So if you want to create a Numpy.jl, you’d hardcode specific types and implement specific functions. But at that point you lose all of the genericness, so why use Julia? You’re essentially just writing C/Fortran code, where dispatch can maybe help structure the code a bit more nicely, but you wouldn’t expect performance gains, and the user experience would be largely the same in the best-case scenario.

9 Likes