On Julia interoperability with C and PyCall

Hi. I have wanted to try Julia for some time, and now that I have more of a reason to, some questions have popped up.

  1. Why native Julia instead of C libraries?
    The problem with Python was that it was slow. The solution came with C libraries that are super fast. If Julia users always write native Julia code, as is encouraged, then in my view it will never be as fast as the Python packages that are actually implemented in C at their core. Am I right?

  2. PyCall for all missing packages?
    I understand that PyCall comes with IPC overhead, but for some packages (e.g. TensorFlow, PyTorch), which are implemented in C and can even be trained on the GPU, I think a constant IPC overhead shouldn’t be an issue. Especially for ML, training is the most time-consuming step, and it shouldn’t require any IPC. In this sense, are there any disadvantages to building your program in Julia and using your favorite C-optimised Python packages through PyCall?

No. Optimized Julia code can match optimized C code for performance — indeed, in some cases it is much easier to optimize Julia code than C because of the high-level constructs available in Julia — and optimized Julia code has the advantage of being potentially much more flexible than corresponding C code (because Julia code can be more type-generic).

There’s no magic in C that can’t be achieved by other languages — indeed, Julia’s compiler backend, LLVM, is commonly used for C as well — as long as the language semantics expose enough type information to the compiler. (See also this FAQ about why Julia can do this and Python can’t.)
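To make the "type-generic" point concrete, here is a minimal sketch. The function name `sumsq` is hypothetical and chosen only for illustration. One definition without type annotations is compiled separately for each element type it is called with, so the same source serves integers, floats, and rationals.

```julia
# Hypothetical helper: a single generic definition, no type annotations.
function sumsq(xs)
    s = zero(eltype(xs))   # accumulator of the same type as the elements
    for x in xs
        s += x * x
    end
    return s
end

println(sumsq([1, 2, 3]))        # Int elements
println(sumsq([1.5, 2.5]))       # Float64 elements
println(sumsq([1//2, 1//3]))     # Rational elements
println(typeof(sumsq(Float32[1, 2])))   # the result keeps the element type
```

A comparable C routine would typically need one copy per element type, or macros.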

This has been discussed in many different (blog) posts.

Julia can compete with C, but neither Julia, nor C can compete with fine-tuned machine code “easily” (@stevengj’s post below as a counter example for Julia :wink:), tailored to your CPU. I am talking e.g. about BLAS (just have a look at some kernels and hand-tuned assembler code for a bunch of CPU types https://github.com/xianyi/OpenBLAS/tree/develop/kernel which exploit all kinds of features of the architecture itself, including cache types/sizes, instruction sets etc.).
Such things are often referred to as “C libraries” but they’re not the same as “plain C libraries written in pure C”, which are basically as fast as native Julia, sometimes faster, sometimes slower. It depends on the compiler and the type stability.
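As a rough illustration of what "type stability" means here, the sketch below asks the compiler which return type it infers for two hypothetical functions (`stable` and `unstable` are made-up names). `Base.return_types` is an unexported reflection helper, so treat this as an exploratory snippet rather than an API to build on.

```julia
# Hypothetical functions for illustration only.
stable(x)   = x > 0 ? 1.0 : 0.0   # always returns a Float64
unstable(x) = x > 0 ? 1 : 0.0     # returns an Int or a Float64, depending on the value

# Ask the compiler what it infers for an Int argument.
stable_type   = only(Base.return_types(stable, (Int,)))
unstable_type = only(Base.return_types(unstable, (Int,)))

println(isconcretetype(stable_type))     # true: a single concrete return type
println(isconcretetype(unstable_type))   # false: a Union of two types
```

When the inferred type is concrete, the compiler can emit tight machine code; when it is not, it usually has to insert extra branching or boxing.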

This also means that it’s more important to compare Python+C vs. Julia+C, where Julia is a clear winner. I don’t think I have to discuss that further :wink:

There are unfortunately disadvantages, since you have yet another layer of dependency, which by nature adds more problems to the stack. I find myself fighting with PyCall every other week just because I work with virtual environments in Python and sometimes even Conda (Forge whatsoever) and things get very tricky with PyCall. As long as you are using a single Python environment, most things are OK and predictable. But this is not what I experience in my daily work. I am btw. an experienced Python dev.

The advantage of using Julia and frameworks which are written in Julia is simply that you can tune every single piece of your application, deep down the rabbit hole, without the need to write C or deal with glue code and bindings.

…but as said, there are plenty of discussions, so I’ll cut it off now, hope this gives you some insights.

Julia code actually can compete with OpenBLAS and other optimized BLAS libraries (see Octavian.jl, JuliaLinearAlgebra/Octavian.jl on GitHub, a multi-threaded BLAS-like library that provides pure Julia matrix multiplication), and Julia code can directly emit low-level LLVM intrinsics with `llvmcall` in the extremely rare cases where this might be desired.

It’s actually quite rare that it’s worthwhile to drop down to assembly for optimization, even for highly tuned libraries (e.g. FFTW). Mostly I’ve seen this in the past for accessing special CPU instructions, but for the most part there are nowadays higher-level intrinsics to expose this functionality.

Indeed, I was a bit inaccurate there. :slight_smile:

The reason I am asking is that if I decide to go full Julia, I might find myself locked in, without access to state-of-the-art algorithms implemented in Python packages. As I understand it, the only disadvantage you see is that PyCall can be tricky to get working in all cases. If that is done successfully, then there should be no problems (besides the IPC delay mentioned)?

Well, “state-of-the-art algorithms” are likely not implemented in Python packages but often in C/C++ libraries, which you can interface to Julia as well, so you might even be a pioneer if you miss something :wink:
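For a sense of how little glue this takes, here is a minimal sketch that calls `strlen` from the C standard library with `ccall`. It assumes a typical Linux or macOS system, where the C library is already loaded into the Julia process; a third-party library would be named explicitly in a `(function, library)` tuple instead.

```julia
# Call the C function `size_t strlen(const char *)` directly from Julia.
# Arguments to ccall: function name, return type, tuple of argument types, then the values.
len = ccall(:strlen, Csize_t, (Cstring,), "hello")

println(Int(len))   # length of "hello" as counted by C
```

No wrapper code or separate compilation step is involved; the call is compiled into the surrounding Julia function.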

Most of the problems I face with PyCall come from the constant switching between environments. I spend a significant amount of time developing and maintaining a lot of Python packages, so I rely on many different (isolated) Python environments. This is already quite cumbersome in Python itself – as written above, virtualenv vs. Conda vs. a mix of different solutions to get packages installed. Things got even more complicated once I got an M1 MacBook, where some compiled Python packages for aarch64 can only be obtained through different Conda forges. Adding an additional PyCall layer makes things even worse.

…yes, if you primarily work in Julia and don’t mess around with Python behind the scenes or switch between environments, you will definitely be fine.

Just to mention, PyCall links against libpython directly, so there shouldn’t be any more overhead than calling Python functions from within Python itself. In other words, I don’t think there is any “IPC delay”, because libpython is loaded into the Julia process itself (ref Sharing python memory - #4 by stevengj).

Edit: I see that some of the following was also covered in a different thread. Using cppyy for julia/C++ interoperability - #5 by lungben
Sorry for the cross-post.

I’ve just gotten a message about current and upcoming features of cppyy.
I think it’s appropriate for me to inline the message here. In particular, please see at the end for a code example.

I was notified by a user that cppyy
works cleanly from Julia, see below, which instantiates a template that has
been JITted from Julia. By extension, that means most of HEP's Python and
C++ codes should work and where not can be simplified in either Cling (C++)
or Python, inline, before calling from Julia.

There's an issue with Clang9 (clashing symbol), but that's not fundamental.
Contrary to an attempt of ROOT.jl (as I understood it), libCling.so is made
to be closed, so that clash is just a bug that's fixable. That is, other
than an increased memory usage, there's no technical problem with having two
Clang run-times in the same process.

(It needs closing regardless, as there are many, many uses of libLLVM out
there. In fact, I'm sure Julia is going to have to close up at some point,
as I've even seen device drivers, of all things, use libLLVM!)

Anyway, maybe this info will help stave off questions about how to handle
the intermediate stage, legacy code, and collaborators who will continue to
develop in these other languages.

More interesting for us, would be if there's folks who want to build on top
of Cling's planned libinterop. Basically that'd be the front of PyCall with
the backend of cppyy (in the form of libinterop), w/o the Python layer. We
need another language to test libinterop (there's Python and D, which both
work on top of cppyy's clingwrapper C-API, and some interest from Lisp, but
a solid 3rd language would be needed to prove generality). Given that it
works through cppyy's Python layer, there's no technical reason why it could
not work w/o and it shouldn't be too hard to implement, given the existing
example codes.

The maintainer of Cxx.jl believes that its approach is still the best b/c
it'll have better performance (memory-wise, if nothing else, but also it
can reach deeper into Julia's internals). This is true, but even there LLVM
will move on as Cling's features enter into the new Clang-Repl. And I don't
believe that the better performance is worth the extra maintenance cost.

So, if there's interest within the Julia-HEP tribe, it'd be great if they
could contact us and get the ball rolling.

$ python3 -m venv TEST0
$ source TEST0/bin/activate
(TEST0) $ python -m pip install cppyy==1.9.6
...
(TEST0) $ julia
julia> import Pkg; Pkg.add("PyCall")
julia> using PyCall; cppyy = pyimport("cppyy")
PyObject <module 'cppyy' from '/home/TEST0/lib64/python3.8/site-packages/cppyy/__init__.py'>

julia> cppyy.cppdef(
                """#include <iostream>
                   template<typename T>
                   void blah ( const T& arg )
                     { std::cout << "hello " << arg << std::endl ; }""")
true

julia> cppyy.gbl.blah(1.234)
hello 1.234