Julia / Python 2 / Python 3 interoperability

marius311 · August 14, 2019, 3:25pm

I’m in the scenario that I need to call Python 2 legacy code from Julia, but I also do all my plotting from PyPlot/matplotlib and hence want to use a modern Python 3 matplotlib, and I want to do this all from a single Julia session so I can do exploratory work in a notebook.

Are there any smart ways I can go about doing this? It seems like PyCall can only be linked with one Python version at a time in a given session (which seems totally reasonable). I have been using execnet to call Python 2 from Python 3, but it’s still pretty clunky. Any other suggestions? Thanks.

pixel27 · August 14, 2019, 5:44pm

I wonder if you can use Distributed to create a second instance and configure its PyCall differently from the master instance. Barring that, I would probably create two Julia instances and have them communicate over TCP, i.e. a client/server model, where the client uses one version of Python and the server uses the other.
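A minimal sketch of the client/server idea, using only the Sockets and Serialization standard libraries. For brevity both ends run in one process here; in practice the server would live in the Julia instance whose PyCall is built for the other Python. The names `serve_once`, `ask`, and `handler` are made up for illustration, and the handler is a plain Julia function standing in for the call into Python.

```julia
using Sockets, Serialization

# Server side: accept one connection, read one request, send back one reply.
# `handler` is a hypothetical stand-in for whatever calls the legacy Python code.
function serve_once(server, handler)
    sock = accept(server)
    try
        request = deserialize(sock)        # any serializable Julia value
        serialize(sock, handler(request))
    finally
        close(sock)
    end
end

# Client side: send one request to localhost and wait for the reply.
function ask(port, request)
    sock = connect(port)                   # connect(port) targets localhost
    try
        serialize(sock, request)
        return deserialize(sock)
    finally
        close(sock)
    end
end

# Single-process demo: the server runs as a background task.
port, server = listenany(8000)             # picks a free port at or above 8000
task = @async serve_once(server, x -> sum(x))
result = ask(port, [1, 2, 3])
wait(task)
close(server)
```

Serialization assumes both ends run the same Julia version, and it should only be used between processes you trust.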

marius311 · August 14, 2019, 8:02pm

Thanks, that’s a great idea! Moving objects around in Julia is much nicer than the clunkier execnet, so it’s an overall win.

A downside is that PyCall has to be built/precompiled each time we launch, which adds ~10 seconds to startup. It would certainly be great if two built versions could be stored separately, but for now it’s a fine trade-off for me. Here’s a first attempt that basically works:

using Distributed
using PyCall
using Pkg

id_py2worker, = addprocs(1, restrict=true)

# launch our Python 2 worker and build PyCall with Python 2
@everywhere id_py2worker begin
    ENV["PYTHON"] = "python2"
    using Pkg
    Pkg.build("PyCall")
    using PyCall
end

# in background, rebuild PyCall back to the original version (the py2worker has already
# loaded Python 2, so that will stick)
remotecall((orig_python)->begin
    ENV["PYTHON"] = orig_python
    Pkg.build("PyCall")
end, id_py2worker, PyCall.python)

Then we can check it’s all working:

julia> @fetchfrom id_py2worker PyCall.pyversion
v"2.7.16"

julia> PyCall.pyversion
v"3.7.3"

julia> @fetchfrom id_py2worker py"""
       import sys
       """

julia> @fetchfrom id_py2worker py"sys.version"
"2.7.16 (default, Apr  6 2019, 01:42:57) \n[GCC 8.3.0]"

One problem is that if Revise is already loaded on the main process, building back the original Python will cause the Python 2 workers to update and in fact segfault. I can’t figure out how to stop that from happening. (This issue could be one solution)

stevengj · August 14, 2019, 8:09pm

It should be possible to clone a copy of PyCall, install it as a new package with a different name (e.g. PyCall3), and configure it with a different version of Python. Then you can import both PyCall and PyCall3 in the same Julia process.
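A rough sketch of what the mechanical part of such a rename could look like, assuming Julia 1.6 or later (for the TOML standard library) and a writable clone of the package. The helper `rename_package_copy` is hypothetical, and it only touches Project.toml and the entry file in src; a real PyCall clone may refer to its own name in other places (tests, build scripts) that would need the same treatment by hand. The demo runs on a throwaway stand-in package, not on PyCall itself.

```julia
using TOML, UUIDs

# Hypothetical helper: copy a package directory and give the copy a new
# name, a fresh UUID, and a renamed top-level module.
function rename_package_copy(src_dir, dest_dir, old_name, new_name)
    cp(src_dir, dest_dir)

    # Give the copy its own identity so Pkg treats it as a distinct package.
    project_file = joinpath(dest_dir, "Project.toml")
    project = TOML.parsefile(project_file)
    project["name"] = new_name
    project["uuid"] = string(uuid4())
    open(io -> TOML.print(io, project), project_file, "w")

    # The entry file and its module must match the new package name.
    old_entry = joinpath(dest_dir, "src", old_name * ".jl")
    new_entry = joinpath(dest_dir, "src", new_name * ".jl")
    source = read(old_entry, String)
    write(new_entry, replace(source, "module $old_name" => "module $new_name"; count=1))
    rm(old_entry)
    return dest_dir
end

# Smoke test on a minimal stand-in package.
demo_src = joinpath(mktempdir(), "Demo")
mkpath(joinpath(demo_src, "src"))
write(joinpath(demo_src, "Project.toml"), "name = \"Demo\"\nuuid = \"$(uuid4())\"\n")
write(joinpath(demo_src, "src", "Demo.jl"), "module Demo\nend\n")
demo_copy = rename_package_copy(demo_src, joinpath(mktempdir(), "Demo3"), "Demo", "Demo3")
```

The renamed copy could then be added with Pkg.develop and built with ENV["PYTHON"] pointing at the other interpreter.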

marius311 · August 14, 2019, 8:11pm

Thanks. This would be even better, but is there any programmatic way to set something like this up? Or would anyone else using my code have to also do it by hand (which sounds not entirely trivial)?

stevengj · August 14, 2019, 8:31pm

The easiest thing is probably for you to post a fork of PyCall as “PyCall2” or whatever and tell your users to add it. The hardest thing to automate, of course, is the process of setting up Python itself. (The Conda package only lets you install either Python 2 or Python 3 at one time. Of course, you could create a Conda2 fork that defaults to Python 2, and make your PyCall2 fork depend on Conda2.)

tkf · August 14, 2019, 8:33pm

I think you can also create a sysimage with Python 2 and pass it to a Julia subprocess via the --sysimage flag. This would handle the case where you need to use packages depending on PyCall configured with Python 2.

marius311 · August 14, 2019, 11:08pm

Can you describe more exactly how this solution would work? I’m not too familiar with custom sysimages.

tkf · August 14, 2019, 11:31pm

See https://github.com/JuliaPy/PyCall.jl/tree/master/aot

I guess it would be something like

PYTHON=python2 aot/compile.jl --color=yes
cp aot/sys.so sys-python2.so  # copy it to somewhere
julia -J sys-python2.so

You might also want to add

Base.eval(Base, quote
    function package_slug(uuid::UUID, p::Int=5)
        crc = _crc32c(uuid)
        crc = _crc32c(unsafe_string(JLOptions().image_file), crc)
        return slug(crc, p)
    end
end)

in aot/precompile.jl so that the precompilation cache for sys-python2.so is isolated from your normal precompilation cache. (I’ve been using this trick in jlm and PyJulia; ref ANN: a solution to the precompilation problem: JuliaManager.jl / jlm CLI, a system image manager for Julia)
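To drive such an image from the main session instead of a separate terminal, one option might be to start a Distributed worker on it through the exeflags keyword of addprocs. This is only a sketch: the helper name `add_sysimage_worker` is made up, and the demo uses the image the current session is already running on as a stand-in, since a sys-python2.so has to be built first.

```julia
using Distributed

# Hypothetical helper: start one worker on the given system image.
function add_sysimage_worker(sysimage_path)
    isfile(sysimage_path) || error("system image not found: $sysimage_path")
    pid, = addprocs(1; exeflags = `--sysimage=$sysimage_path`)
    return pid
end

# Stand-in for sys-python2.so: the image this session itself was started with.
current_image = unsafe_string(Base.JLOptions().image_file)
pid = add_sysimage_worker(current_image)

# Ask the worker which image it is running on.
worker_image = @fetchfrom pid unsafe_string(Base.JLOptions().image_file)
rmprocs(pid)
```

With a real Python 2 image, PyCall.pyversion fetched from the worker should then report the Python 2 version while the main process stays on Python 3.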

marius311 · August 28, 2019, 1:11am

So inspired somewhat by several of the responses here, here’s the solution I’ve landed on that seems to be working well:

https://github.com/marius311/Py2Call

My requirements for a solution were:

I don’t have to fork anything or edit PyCall’s source to rename anything.
Users don’t have to do anything beyond standard Julia package installation.
No unnecessary recompiles get triggered.

I think basically this achieves that. What it sets up for you is that you have the latest version of PyCall in your main environment built for Python 3, and in a separate environment it installs an older version of PyCall and builds it for Python 2. Then it spawns a Julia subprocess running in this other environment and communicates using remote calls. Thanks to pull/32651 (so you do need to be on master, for now), both versions can be precompiled so no recompilation is triggered as you run the two environments.
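For anyone curious about the general mechanism, here is a minimal sketch of a worker running in a different project environment. It is not the actual Py2Call code: the environment below is just an empty temporary directory named `py2_env` for illustration, whereas a real setup would have an older PyCall installed and built there.

```julia
using Distributed

# Hypothetical secondary environment (empty here, purely for illustration).
py2_env = mktempdir()

# Start one worker whose active project is the secondary environment.
pid, = addprocs(1; exeflags = `--project=$py2_env`)

# The two processes now resolve packages from different environments.
main_project = Base.active_project()
worker_project = @fetchfrom pid Base.active_project()

# With PyCall installed in py2_env, the next step would be something like
#   @everywhere pid using PyCall
# followed by @fetchfrom / remotecall to run Python 2 code on the worker.
rmprocs(pid)
```

Because each environment has its own manifest, the worker can load a different PyCall version than the main process.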

Not planning to register this for now, but happy if anyone uses / contributes / critiques this solution.