Using Python packages in Julia

I’ve not used either Python or Julia (my background is C++/Bash/Perl/Mathematica), but was thinking about learning one or the other, and have a naive question about Julia:

Many Python users say they’d “love to use Julia, but need to stay with Python because of the packages.” This confuses me, because I thought Julia had the capability to use Python packages. Given this, what’s the issue? Is it that packages built for Python can be buggy in Julia? Or is it that using a Python package in Julia is somehow more cumbersome than using it natively in Python? [Relatedly, when using a Python package in Julia, does one use Python syntax or Julia syntax? I ask because I’ve never used a package written for one language in another language.] Or is there some other set of issues?

[The big ones, for me, would be NumPy and SciPy, but it would of course be nice to have access to all of them.]

PyCall.jl is solid, and usage is pretty straightforward. That said, using it fluidly requires fluency in both Julia and Python, since the syntax ends up midway between both. If the core of your work revolves around a large library with a Python frontend (e.g. TensorFlow, OpenCV), it's better to just write Python, but if you’re doing à la carte numerical work, give Julia a try. You can call Python as needed, but you’ll probably find that almost everything in NumPy and SciPy can be done natively with Julia packages.

Definitely this. NumPy and SciPy are necessary in Python to make numerical computing fast, but Julia arrays are far nicer to work with (and I assume just as fast).

I actually wasn’t thinking of NumPy for its computational speed-up. Rather I thought I might need it for the list-manipulation functions, like numpy.array_split. But perhaps these are already present in Julia.

For instance, suppose I had a ragged 2D array (called array1), where I needed to split the data in each row into 11-element chunks (discarding any remainders) and extract the 3rd and 7th elements from each chunk. In Mathematica, that’s straightforward: I Map the Partition function onto the array to break it up into chunks, then I Map the Part function ( [[ ]] ) onto the chunks to select the desired elements. And since I want to do this at the element level, I specify depth = {2}:

Map[{#[[3]], #[[7]]} &, Map[Partition[#, 11] &, array1], {2}]

So I figured to gain access to basic list-manipulation functions, like the above, I needed something like NumPy. But maybe not—how would you do that in Julia? [Or should I delete this post and ask that as a separate question?]
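For comparison, here is a minimal sketch of the same operation in plain Python, without NumPy. The function name `pick_from_chunks`, its parameters, and the sample data are made up for illustration; the positions in `picks` are 1-based to match the Mathematica expression above.

```python
def pick_from_chunks(rows, chunk_len=11, picks=(3, 7)):
    """Split each row into full chunks of chunk_len (dropping any remainder)
    and keep the 1-based positions listed in picks from every chunk."""
    result = []
    for row in rows:
        n_full = len(row) // chunk_len  # number of complete chunks in this row
        chunks = (row[k * chunk_len:(k + 1) * chunk_len] for k in range(n_full))
        # Convert the 1-based positions to Python's 0-based indices.
        result.append([[chunk[p - 1] for p in picks] for chunk in chunks])
    return result


# Hypothetical ragged input: rows of 23, 12 and 3 elements.
array1 = [list(range(1, 24)), list(range(100, 112)), [1, 2, 3]]
selected = pick_from_chunks(array1)
```

A row shorter than one chunk simply yields an empty list, which mirrors how Partition drops incomplete chunks.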

I’d say in general it’s best to ask specific questions like that in a new thread, primarily so that the answers are easier to read and understand for future users.

But the overall answer to your question is that the key advantage of Julia is that loops are fast. There might be a fancy way to express what you want with map in Julia, but if there isn’t, then you can just write a loop to do exactly what you described. If you write your code well, then the loop will be just as fast as the numpy.array_split method, except that it will also work for arbitrary scalar types in a way that numpy probably won’t handle well.

Another way of putting this is that if numpy didn’t give you array_split, there might be no way to express that operation efficiently (without paying the performance cost of Python). In Julia, you might find an array_split, but if there isn’t one then you can write it yourself (in Julia) and the result will be just as fast as a built-in.

And just to add on, having used Python extensively and having used Julia for several years, I have never once felt tempted to try to use numpy from Julia. There are lots of great Python libraries that don’t have good Julia equivalents, and PyCall.jl is fantastic for those, but for the kinds of things that numpy supports, I find Julia’s approach to be as good or better across the board.

For your example, try

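# Split `row` into full chunks of length `chunklen`, discarding any remainder.
function partition(row, chunklen)
    nchunks = length(row) ÷ chunklen
    # Julia arrays are column-major, so each column of the reshaped view is one contiguous chunk.
    return eachcol(reshape(view(row, 1:nchunks*chunklen), chunklen, nchunks))
end

# For every row of `data`, keep the 3rd and 7th elements of each 11-element chunk.
map(row -> [chunk[[3, 7]] for chunk in partition(row, 11)], data)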

I think people are just afraid to try. I’ve heard at least one user (a Julia beginner, I think) say they don’t want “a language soup”, which I don’t consider a real worry, any more than Python relying on C is.†

I first tried PyCall years ago, and it was simple. It has gotten better recently, since you no longer need syntax modified from what you would expect in Python. If a package is buggy in Python, it will stay buggy in Julia. :) If not, I would expect it to just work.

That said, there are two issues I can think of. The first is that Julia is 1-based (by default, not always) and Python is 0-based. This is an issue when writing or maintaining code in both languages, but not a big issue when calling Python directly through PyCall, and it should be a non-issue for wrapped packages, which use PyCall indirectly.

The other issue only really applies to NumPy (which is redundant in Julia anyway). It is not really a Python compatibility issue, since Python itself only has 1D arrays, but NumPy isn’t limited to 1D arrays, so you have to keep the following in mind when porting code (or possibly when calling it):

https://docs.julialang.org/en/v1/manual/noteworthy-differences/index.html

Julia arrays are column major (Fortran ordered) whereas NumPy arrays are row major (C-ordered) by default. To get optimal performance when looping over arrays, the order of the loops should be reversed in Julia relative to NumPy (see relevant section of Performance Tips).

If you’re calling regular (non-NumPy) code, you can ignore this; if not, PyCall has support for it. See the text in the README file: “To deal with this, you can use PyReverseDims(a) to pass a Julia array”. If you forget this difference between the languages, the code will not strictly be wrong; it will only be slow or scale poorly.
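To make the ordering difference concrete, here is a small pure-Python sketch of how a 2D index maps to a position in memory under each layout. The helper names are hypothetical and indices are 0-based throughout; this illustrates the general idea only, not how NumPy, Julia, or PyCall implement it.

```python
def row_major_offset(i, j, nrows, ncols):
    # C order (NumPy's default): the last index changes fastest in memory.
    return i * ncols + j


def col_major_offset(i, j, nrows, ncols):
    # Fortran order (Julia's layout): the first index changes fastest in memory.
    return i + j * nrows


nrows, ncols = 2, 3
indices = [(i, j) for i in range(nrows) for j in range(ncols)]

# Order in which the elements of a 2x3 array sit in memory under each layout.
row_major_order = sorted(
    indices, key=lambda ij: row_major_offset(ij[0], ij[1], nrows, ncols)
)
col_major_order = sorted(
    indices, key=lambda ij: col_major_offset(ij[0], ij[1], nrows, ncols)
)
```

Walking memory contiguously means putting the column index in the inner loop for the row-major layout and the row index in the inner loop for the column-major layout, which is why the loop order is reversed between the two languages.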

If you’re only calling Python, not writing or maintaining that code, then I think you can ignore the rest of the section. There’s e.g. one more difference: “Julia does not support negative indices.” I doubt you use -1 when calling a Python function/“API”.

I think merely calling libraries requires minimal knowledge of Python, and certainly no fluency in its syntax. I’m also confused about what’s meant by “midway” (I assume it refers to the modified syntax that was needed previously and is now outdated).

Nor do I think TensorFlow, at least, is a reason to stick with Google’s official API:

Why use TensorFlow.jl?

See a list of advantages over the Python API.

However, the OpenCV wrapper I found currently seems to support Julia 0.6 only (it just needs updating; the most important packages have been updated quickly).

I’m not promoting Pandas here (I believe Julia has alternatives, even better).

It sticks closely to the Pandas API. One exception is that integer-based indexing is automatically converted from Python’s 0-based indexing to Julia’s 1-based indexing.

This is one reason to use a wrapper rather than PyCall directly: it does the index conversion for you. The price of getting code that works in a more Julia-like way is that you need to keep this in mind when reading Python tutorials (which should otherwise translate). You can’t have it both ways, using the tutorial as is and being Julia-like.
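As a rough illustration of the kind of conversion involved, the sketch below maps a 1-based position to Python's 0-based indexing. The helper `to_zero_based` is hypothetical and is not taken from any wrapper's source; it only shows what a tutorial reader has to do mentally when translating indices.

```python
def to_zero_based(one_based_index):
    """Convert a 1-based (Julia-style) position to a 0-based Python index."""
    if one_based_index < 1:
        # There is no 1-based counterpart of Python's negative indices here.
        raise IndexError("1-based indices start at 1")
    return one_based_index - 1


values = ["a", "b", "c"]
first = values[to_zero_based(1)]   # position 1 in 1-based terms
third = values[to_zero_based(3)]   # position 3 in 1-based terms
last_in_python = values[-1]        # Python-only idiom for the last element
```

A Python tutorial that indexes with `0` or `-1` therefore needs its indices shifted, or replaced with the Julia idiom, when followed through a wrapper that uses 1-based indexing.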

† You can call all Python libraries (modulo bugs; I doubt there are many, given that people have been using PyCall for years). Not all libraries are frameworks, but those should work too. I have in mind a person who wants to use Julia for web development. I suggested you could even try Django; such a framework should work in theory, but I don’t know of anyone who has tried it, at least not with Django. The Julia-only alternatives are interesting anyway.

For a web server at least, and for a lot of other software, I see no issue with combining the two. If you’re going to distribute software, PyCall can even download Python for you. I’m not up to speed on distributing “compiled” Python code, or on how that would work with Julia. Julia itself has some options, and I believe PyCall may need to support that kind of combination, or AOT compilation.

I belatedly remembered the existence of IterTools.jl, which includes a partition function that further simplifies the implementation of your example.

using IterTools
[getindex.(row, [[3,7]]) for row in partition.(data, 11)]