[ANN] PythonCall and JuliaCall

PythonCall & JuliaCall

I’m very pleased to finally announce these packages on here. They have existed for quite some time, but I’m now happy to encourage more users to try them out.

PythonCall is a Julia package to interoperate with Python, so for example you can do:

using PythonCall
plt = pyimport("matplotlib.pyplot")
plt.plot(randn(100))

It can be installed with pkg> add PythonCall.

JuliaCall is a corresponding Python package to interoperate with Julia, so for example:

from juliacall import Main as jl
import numpy as np
jl.seval("using Plots")
jl.plot(np.random.randn(100))

It can be installed with pip install juliacall.

In what follows, I’ll introduce some of the cool features of these packages. Most of you will be aware of the similar packages PyCall and PyJulia, which have existed much longer, so there will inevitably be some comparison with these.

Extensible multimedia support

PythonCall knows how to display anything that IPython knows how to display (via _repr_mimebundle_ and friends), and PyCall can too. Unlike PyCall, it can also display Matplotlib figures, and what’s more you can add more display rules.

Hence in Pluto you can do

plt.plot(rand(100))
plt.gcf()

and the plot (returned by gcf) is displayed.

On the other side, JuliaCall knows how to display Julia objects in IPython.
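
For anyone who has not met it, the hook mentioned above is just a method on the object being displayed. Here is a minimal pure-Python sketch of an object that an IPython-style front end could render in more than one format. The Swatch class is made up for illustration, and nothing in it is specific to PythonCall or JuliaCall:

```python
class Swatch:
    """Hypothetical object offering several renderings of itself."""

    def __init__(self, colour):
        self.colour = colour

    def _repr_mimebundle_(self, include=None, exclude=None):
        # Map MIME types to renderings; a front end picks the richest
        # one it can show and falls back to text/plain otherwise.
        return {
            "text/plain": f"Swatch({self.colour!r})",
            "text/html": f'<span style="color:{self.colour}">&#9632;</span>',
        }


bundle = Swatch("red")._repr_mimebundle_()
print(sorted(bundle))  # ['text/html', 'text/plain']
```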

Flexible and extensible conversions

PythonCall has a very flexible function pyconvert(T, x) which converts the Python object x to a Julia object of type T. Unlike PyCall (which similarly has convert(T, x)) this function can take into account both the Python type(x) and the Julia type T, which means a richer set of conversions is possible.

For example, in PyCall if you call convert(Vector{UInt8}, x) where x is a Python list of int, it will fail because it is expecting a bytes. In PythonCall pyconvert(Vector{UInt8}, x) will succeed either way: it has rules for list and bytes and selects the most specific conversion rule applicable to the inputs.

You can even do something like pyconvert(Vector{<:Real}, x) and automatically get back a Vector{Int} or Vector{Float64} or whatever is appropriate for the items in x.

This system is extensible, so packages can add more rules for different (T, type(x)) pairs.
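
To make "most specific rule" concrete, here is a toy model in plain Python. It is only a sketch of the idea: RULES, add_rule, convert and the "uint8_vector" target name are all hypothetical, and this is not how PythonCall is actually implemented. Rules are keyed on the source type and the target, and the rule whose source type sits earliest in the value's method resolution order wins:

```python
# Toy model of conversion rules keyed on (Python type, target).
# Every name here is hypothetical; this is not PythonCall's implementation.
RULES = []  # entries are (source_type, target, function)


def add_rule(source_type, target, func):
    """Register a conversion from instances of source_type to target."""
    RULES.append((source_type, target, func))


def convert(target, x):
    """Convert x using the most specific applicable rule for target."""
    mro = type(x).__mro__
    candidates = [(mro.index(src), func)
                  for src, tgt, func in RULES
                  if tgt == target and src in mro]
    if not candidates:
        raise TypeError(f"no rule converts {type(x).__name__} to {target}")
    # A smaller MRO index means a more specific source type.
    _, func = min(candidates, key=lambda c: c[0])
    return func(x)


def checked_byte(v):
    """Return v unchanged if it is an int in 0..255, else raise."""
    if not (isinstance(v, int) and 0 <= v <= 255):
        raise ValueError(f"{v!r} does not fit in a byte")
    return v


add_rule(object, "uint8_vector", lambda o: list(bytes(o)))  # generic fallback
add_rule(bytes, "uint8_vector", list)                       # bytes -> ints
add_rule(list, "uint8_vector", lambda xs: [checked_byte(v) for v in xs])

from_bytes = convert("uint8_vector", b"\x01\x02")
from_list = convert("uint8_vector", [1, 2, 3])
```

Both calls succeed because each input type has its own rule, and a bytearray would fall through to the generic object rule.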

Predictable syntax

Some of PyCall’s behaviours are hard to predict.

For example x[0] does not do what you think! Firstly, it gives a deprecation warning because get(x, 0) is the proper PyCall syntax for indexing. Secondly, it actually gets the item at index -1, which is supposed to be a convenience to compensate for Python indexing being 0-based and Julia indexing being 1-based. But if for example x is a dict then this is not what you want (I guess that’s why it is deprecated).

In PythonCall, x[0] just gets the item at index 0 no matter what x is.
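
The trouble with shifting the index can be seen without either package. In this plain-Python sketch, shifted_getitem is a hypothetical stand-in for "subtract one before indexing". It is convenient for sequences but silently looks up the wrong key in a dict:

```python
def shifted_getitem(x, i):
    """Hypothetical model of 1-based convenience indexing: look up i - 1."""
    return x[i - 1]


letters = ["a", "b", "c"]
lookup = {0: "zero", -1: "minus one"}

first = shifted_getitem(letters, 1)     # letters[0]
wrapped = shifted_getitem(letters, 0)   # letters[-1], the last item
wrong_key = shifted_getitem(lookup, 0)  # lookup[-1], not lookup[0]
```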

For another example, PyCall eagerly converts results to Julia objects. This means that sys.path.append("/some/path") will not work because sys.path is immediately converted to a Vector. You might try push!(sys.path, "/some/path") but since the Vector is a copy of sys.path it does not actually mutate the original sys.path. To overcome this, PyCall has the syntax sys."path"."append"("/some/path") to prevent this eager conversion.

In PythonCall, sys.path.append("/some/path") does exactly what you intend. This is because most operations on Python objects return Python objects instead of converting them. If you actually need to convert anything to Julia you can use pyconvert.
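
The difference between converting eagerly and keeping the original object comes down to copy versus reference. A plain-Python sketch, where search_path is just a stand-in for sys.path and the "conversion" is modelled as a list copy:

```python
search_path = ["/usr/lib/example"]  # stand-in for sys.path

# Eager conversion modelled as a copy: the append never reaches the original.
converted = list(search_path)
converted.append("/some/path")
after_copy = list(search_path)

# Keeping a reference to the original object: the append is visible.
wrapped = search_path
wrapped.append("/some/path")
after_wrap = list(search_path)
```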

As a side-effect, operations in PythonCall are type-inferrable (they mostly return Py) whereas operations in PyCall are not (they can return anything) so PyCall code can be type-unstable if you are not careful.

Non-copying conversions

By default, any mutable objects passed between Python and Julia are converted without copying any data - that is they lazily wrap the original object. This makes conversion super fast for large containers, and means that if the converted container is mutated, then the changes also appear on the original object.

A small number of immutable types (such as booleans, numbers, strings and tuples) are converted to the native types, e.g. a Julia Int64 becomes a Python int.

In the Julia-to-Python direction, Julia objects are wrapped as a juliacall.AnyValue. Some objects are wrapped to a subtype of this. For example any AbstractVector is wrapped as a juliacall.VectorValue which satisfies the sequence interface and behaves pretty much like a list:

x = [1,2,3]  # a Julia vector
y = Py(x)    # wrap as a juliacall.VectorValue
y.append(4)  # mutating y also mutates x
println(x)   # [1, 2, 3, 4]

If you actually want a list you can do pylist([1,2,3]).

In the Python-to-Julia direction, mutable Python objects are typically left as Python objects. Again, some objects are wrapped differently:

x = pylist([1,2,3])             # a Python list
y = pyconvert(AbstractArray, x) # wrap as a PyList{Int}
push!(y, 4)                     # mutating y also mutates x
println(x)                      # [1, 2, 3, 4]

Array conversion

Particularly of note is that if x is a strided Julia array then it will be wrapped in Python to a juliacall.ArrayValue which satisfies the buffer protocol and Numpy array interface. This means that a Vector{UInt8} can be passed to any function expecting a bytes-like object, and a Vector{Float64} can be passed to any function expecting a Numpy-array-like object. In particular numpy.array(x) will convert it to an actual Numpy array.

In the other direction, if x is a bytes or numpy.ndarray (or anything satisfying the buffer protocol or array interface) then PyArray(x) gives an AbstractArray view of the data.
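
If the buffer protocol is unfamiliar, the standard library alone shows what it buys you. This sketch uses only array and memoryview (no juliacall or Numpy), so it illustrates the mechanism rather than the wrappers themselves. A view shares memory with the original, so writes go both ways and nothing is copied until you ask for a copy:

```python
from array import array

data = array("d", [1.0, 2.0, 3.0])  # exposes the buffer protocol

view = memoryview(data)   # no copy: same underlying memory
view[0] = 10.0            # writing through the view...
first = data[0]           # ...changes the original array

snapshot = view.tolist()  # an explicit copy into a new list
view[1] = 20.0            # later writes do not affect the snapshot
```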

Tabular data

If x is a Julia table (in the Tables.jl sense) then pytable(x) will convert it to a Pandas dataframe. You can ask for other output formats, such as dict of list.

If x is a Python table (for now only Pandas dataframes are supported) then PyTable(x) wraps it as a Julia table.
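
The "dict of list" format is the column-oriented one: each column name maps to a list of that column's values. A plain-Python sketch of the shape, using a made-up helper rows_to_columns that is not part of PythonCall and assumes every row has the same keys:

```python
def rows_to_columns(rows):
    """Turn a list of row dicts into a dict of column lists (hypothetical helper)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]
columns = rows_to_columns(rows)
```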

Isolated dependencies

All the Python dependencies for PythonCall are (by default) managed by CondaPkg, which I have announced separately.

If your project needs Numpy you can simply do

pkg> conda add numpy

before loading PythonCall. Then a Conda environment is created containing Numpy. This environment is specific to your Julia project, so dependencies are totally isolated between projects.

This also creates a CondaPkg.toml file recording the dependencies (analogous to Project.toml) so if you save it to your package, then any users of the package also get these dependencies installed.

JuliaCall similarly uses a new package JuliaPkg to manage its dependencies. If you are using a Python virtual environment or Conda environment, then a Julia project specific to that is used, again keeping dependencies totally isolated.

In all cases, Python or Julia are automatically installed if needed, meaning that packages depending on PythonCall or JuliaCall can be used with zero set-up. They are installed to an environment-specific location, so that removing the environment also removes any dependencies.

Use different Pythons without rebuilding

PyCall currently hard-codes the path to libpython in its build step. This means that if you need to use different versions of Python, then you need to rebuild PyCall each time you switch.

PythonCall has no build step. You can start multiple Julia sessions in multiple projects each requiring a different version of Python, and PythonCall will work fine in all of them.

JuliaCall is teeny

It pretty much consists of this one file just 137 lines long. This is because most of the implementation is in PythonCall, and all JuliaCall needs to do is find Julia and get it to import PythonCall.

The relevance of this is that you get a very consistent experience between PythonCall and JuliaCall. All the conversions work the same in both directions from either package. Since JuliaCall is bundled into PythonCall, any Python package can do import juliacall and it will work properly regardless of whether it is running in Python itself or from PythonCall in Julia.


Excuse me, what’s the difference from the package PyCall?

The existing package PyCall is another similar interface to Python. Here we note some key differences, but a more detailed comparison is in the documentation.

  • PythonCall supports a wider range of conversions between Julia and Python, and the conversion mechanism is extensible.
  • PythonCall by default never copies mutable objects when converting, but instead directly wraps the mutable object. This means that modifying the converted object modifies the original, and conversion is faster.
  • PythonCall does not usually automatically convert results to Julia values, but leaves them as Python objects. This makes it easier to do Pythonic things with these objects (e.g. accessing methods) and is type-stable.
  • PythonCall installs dependencies into a separate conda environment for each Julia project. This means each Julia project can have an isolated set of Python dependencies.
  • PythonCall supports Julia 1.4+ and Python 3.5+ whereas PyCall supports Julia 0.7+ and Python 2.7+.

This post reads like PythonCall has more (or better) functionality than PyCall. That’s great! :+1:

Is there anything that PyCall can do today that PythonCall cannot (yet)? What would you suggest to anyone who might be itching to switch over to PythonCall?

  • Calling Julia from Python, juliacall seems to have a smaller overhead (in time) than julia. With julia:

In [5]: %timeit julia.Main.identity(1)
272 µs ± 4.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: jid = julia.Main.identity

In [7]: %timeit jid(1)
3.16 µs ± 249 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

And with juliacall:

In [6]: %timeit juliacall.Main.identity(1)
3.01 µs ± 197 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: jid = juliacall.Main.identity

In [8]: %timeit jid(1)
1.38 µs ± 21.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

  • In the Python module julia, Julia modules are wrapped in a subclass of the Python Module type. They are imported using the Python import syntax: from julia import Example. On the other hand, juliacall has a Python type hierarchy of wrappers for Julia objects. So isinstance(julia.Main, type(sys)) is True. And isinstance(juliacall.Main, juliacall.ModuleValue) and isinstance(juliacall.Main, juliacall.AnyValue) are both True. I don’t have enough experience to evaluate which might be better. It would be interesting to hear the arguments.

  • The julia Python module (from pyjulia-PyCall) allows you to specify a system image for Julia. But, the author of PythonCall has plans to add support for this.

  • After importing julia, you must explicitly initialize the Julia process. (Kind of; importing a julia module also initializes the Julia process.) So import julia just loads the code. Then from julia import Main finds libjulia, initializes the Julia runtime, etc. There is also an API to control how initialization is done. Also, julia doesn’t do much work to find the julia executable and libjulia. On the other hand, juliacall does everything upon import juliacall: it looks for Julia, downloads it if needed, calls the init function in libjulia, etc. Recently, the author added a facility for a package author to stop this initialization with an environment variable. I have not yet had a chance to try this. But if it works as I understand, then both packages allow you to customize the initialization.

  • Related to the above: juliacall is more of a one-stop shop for writing Python packages that use Julia than julia is (and I think the same is true for the other direction, i.e. using PythonCall). This is probably in part why you are given a lot of freedom in the startup of julia: you have more need for it. In addition to the examples above, julia does not manage Julia packages (and PyCall does not manage Python packages). Some of the capabilities of juliacall and PythonCall have been devolved to other packages, such as those mentioned in the OP. But the ecosystem still intends to offer much more for specifying your environment and packages. This is done mostly through configuration files.
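
On the timings in the first bullet: in each pair, the first measurement includes looking up Main.identity on every call, while the second only times the call itself. The same measurement pattern can be reproduced in plain Python. This is only a sketch with a made-up SlowNamespace class standing in for a module wrapper; it counts lookups rather than trying to reproduce the numbers above:

```python
import timeit


class SlowNamespace:
    """Hypothetical stand-in for a wrapper whose attribute lookup does work."""

    def __init__(self):
        self.lookups = 0

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "identity":
            self.lookups += 1
            return lambda x: x
        raise AttributeError(name)


ns = SlowNamespace()

# Lookup plus call on every iteration.
t_uncached = timeit.timeit(lambda: ns.identity(1), number=1000)
lookups_uncached = ns.lookups

# Look the function up once, then time only the calls.
ident = ns.identity
t_cached = timeit.timeit(lambda: ident(1), number=1000)
lookups_cached = ns.lookups - lookups_uncached
```

Here lookups_uncached is 1000 and lookups_cached is 1, which is why hoisting the lookup out of a hot loop helps with either package.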


If you have an encapsulated project that has its own python dependencies (for example a specific PyTorch version corresponding to a specific python version) then PythonCall is a nicer solution with less mysterious stuff, and a neat place to place all dependencies in a human-readable TOML file.

It also makes deploying your work to the cloud or another remote computer easier. All the Python dependencies you need are now managed by the project itself.

Wow, that’s seriously great!

As a heavy user of PyCall/PyPlot, I wonder:

PythonCall already displays matplotlib plots, so it doesn’t need a PyPlot.jl equivalent at all? Do you know of any remaining advantages of PyCall + PyPlot vs PythonCall?

Currently, I have PyCall’s ~/.julia/conda taking about 2.5 GB of space. Would PythonCall duplicate all common Python dependencies (numpy, scipy, matplotlib, mkl, …) for each Julia env? Or does something more clever happen?


Good question! I’m not actually sure as it’s been a while since I used PyCall much - I have implemented all the things I personally find useful. I think PyCall has some nice ways to create new Python classes. PyJulia provides some cool IPython magics.

As for switching, I’d say just try it out. You can use both packages in the same session - and can even share Python objects between them if they are using the same libpython.


I’d say that if you are already familiar with Python and Matplotlib then using them via PythonCall works fine. PyPlot and other wrapper libraries still have value by providing a more Julian interface, and perhaps by integrating better into the Julia ecosystem.


I’m not sure about this actually. I suspect a lot of that is the central cache of old downloaded packages. AFAIK Conda never clears that cache automatically - see the conda clean command.

That cache doesn’t get copied for each project, but probably the packages you actually use do (I’m not sure if Conda uses symlinks back to the cache?) so if you have many projects using MKL for example they might use a lot of space.

PythonCall is not doing anything clever, just creates a Conda environment like you would at the command line.


Thanks for the detailed responses!
Indeed, my main concern is the duplication of conda packages. With Pluto, every notebook is its own Julia environment, so duplicating the whole numpy (incl MKL) gets infeasible pretty fast.
I’m going to try PythonCall and see how much space environments actually take. The fundamental approach definitely looks cleaner in many aspects compared to PyCall.

I’m currently using PyPlot, and like that it provides basically the same interface and syntax as matplotlib itself. Sounds like PythonCall should be even closer.

This is really awesome, congratulations! I played around with PythonCall back near your first announcement, and it’s vastly improved. A couple of questions / comments:

  • I tried the CondaPkg / JuliaPkg thing which worked flawlessly. I also discovered I can set JULIA_PYTHONCALL_EXE=$(which python) and JULIA_PYTHONCALL_PROJECT="." to just use an existing Python/Julia environment (I’m really liking Poetry on the Python side), which is great for more control. Might be worth documenting this more directly.

  • Is there any plan to have anything like PyCall’s string interpolation py"1+$x" thing? Personally I find it really handy to be able to just paste some unedited Python code into my Julia scripts and have it work. From what I can tell, this might be the only thing holding me back from switching from PyCall/pyjulia entirely.

  • It seems you got around the dreaded pyjulia statically linked libpython issue, which is great. I admit I don’t actually know the details of why that’s an issue for pyjulia, but I’m still curious if you could say what you did differently to get around that?

  • Even though it’s not “needed”, it might still be nice if someone created a “PyPlotCall.jl” package or something, which automatically imported all the standard pyplot stuff and set up Jupyter so you don’t need to return the figure each time.

Anyway, thank you for the great tool!


Is it possible to override the default location of the Julia environment (~/.julia/environments/pyjuliapkg resp. conda_path/anaconda3/julia_env) when using juliacall from Python? Would be handy if JULIA_PROJECT would be used if set, for example.


Indeed it’s perfectly valid to do that. My next focus is going to be on getting the documentation and testing better. BTW the JULIA_PYTHONCALL_PROJECT variable might go, and we’ll just respect JULIA_PROJECT instead if it is set.

There is @pyeval and @pyexec.

Your example would become @pyeval x => "1+x".

You can also specify the output type like @pyeval x => "1+x" => Float64.

I’ve encountered that issue before in pyjulia but don’t actually know its cause.

I imagine the difference is in how the packages load libpython. In JuliaCall, we pass ctypes.pythonapi._handle to PythonCall, which is a pointer to an already-open libpython. I assume PyJulia/PyCall opens libpython itself.
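
For the curious, the handle in question can be inspected from any CPython session. A small sketch using only ctypes: note that _handle is a private attribute, so relying on it is an assumption about CPython's ctypes rather than a documented interface. The point is that calls made through this handle land in the interpreter that is already running, whether or not libpython exists as a separate shared library:

```python
import ctypes
import sys

# ctypes.pythonapi refers to the Python C API of the running interpreter.
# _handle is private; its use here is an assumption about CPython's ctypes.
handle = ctypes.pythonapi._handle

# Calling through the same handle reaches the interpreter we are already in.
ctypes.pythonapi.Py_GetVersion.restype = ctypes.c_char_p
version = ctypes.pythonapi.Py_GetVersion().decode()

expected_prefix = f"{sys.version_info.major}.{sys.version_info.minor}"
same_interpreter = version.startswith(expected_prefix)
```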

An older version of PythonCall hooked into IJulia so that matplotlib plots were automatically shown, just like they do in IPython. I could certainly add this back.


Yes indeed, I’ve been thinking that CondaPkg should respect the standard Julia environment variables better. I’ll change it to use JULIA_PROJECT if set.


Awesome, thanks!

A technical question, if I may: A current issue with pyjulia is that there’s trouble if Python is statically linked. That doesn’t seem to be the case with juliacall, so how do you get around that issue?

See my earlier reply to marius311.

Awesome work. However, I encountered the following error in Julia 1.8.0-DEV when trying the example above and Julia exits immediately:

julia> using PythonCall

julia> plt = pyimport("matplotlib.pyplot")
Python module: <module 'matplotlib.pyplot' from 'D:\\DeepBook\\.julia\\environments\\v1.8\\.CondaPkg\\env\\lib\\site-packages\\matplotlib\\pyplot.py'>

julia> plt.plot(randn(100))
qt.qpa.plugin: Could not find the Qt platform plugin "windows" in ""
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Yes, a huge pain point in PyPlot.jl was also trying to select/install a working Matplotlib backend by default.

Thanks for your efforts on PythonCall and JuliaCall, BTW. When I created PyCall eight years ago, it was an urgent need to compensate for the relative paucity of Julia packages, but some types of interoperability were still difficult due to Julia limitations. For example, I spent years bugging @jeff.bezanson about dot overloading every time I ran into him, which we finally got 4 years later. Zero-based arrays were not well supported in Julia, and in general it was tricky to decide which Python types should be converted to native Julia types (for convenience) vs wrapped (for efficiency & flexibility) — in hindsight, with the Julia features we have now, I would have left more Python types as-is except for types with lossless round-trip conversions. The very first version of pyjulia was written by Fernando Perez as an “IPython magic”, but it was years before anyone would want to think about calling Julia from Python as more than a cool demo. And, while PyCall could originally link to any libpython at runtime, it had to be switched to build-time configuration in order to improve load times; fortunately, Julia’s loading speed has improved since then.

Nowadays, the need to call Python code from Julia is less urgent, though I still am in the habit of using Matplotlib, while the desire to call Julia code from Python in “real” code is growing, and it seems reasonable to re-think many of the design choices that originally went into PyCall. Meanwhile, I haven’t had as much time for PyCall development effort, and while we’ve toyed with the idea of a “PyCall 2.0”, it would be disruptive enough that it’s about as easy to switch to a completely new package. So I’m quite happy to see people actively working on an alternative re-design, and hope that the existing PyCall and PyPlot are helpful in this process.
