Can we import `dill` in PyCall instead of `pickle`?

With the current version of PyCall under Python 3, pmap fails with the following error:

julia> using Distributed

julia> addprocs(2);

julia> @everywhere using PyCall
julia> @everywhere math=pyimport("math")

julia> a = [1,2];
julia> pmap(x->math.sin(x), a)
ERROR: PyError ($(Expr(:escape, :(ccall(#= /home/lizz/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:44 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'TypeError'>
TypeError("can't pickle module objects")

This is, of course, because of the use of the pickle package in this line. Can we replace it with dill? I have verified that dill resolves the issue.

The only concern is licensing. Can we import 3-clause-BSD-licensed Python packages such as dill, e.g.:

pickle() = ispynull(_pickle) ? copy!(_pickle, pyimport(PyCall.pyversion.major ≥ 3 ? "dill" : "cPickle")) : _pickle

Related PR: https://github.com/JuliaPy/PyCall.jl/pull/731

Why not call pyimport inside the anonymous function, e.g. pmap(x -> pyimport("math").sin(x), a)? There are other ways to work around this, but I don’t think customizing serialization is the right way to do it.
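A minimal sketch of this suggestion, assuming two local worker processes and a Python installation that PyCall can use. The closure refers to no global PyObject, so there is nothing for Distributed to pickle; each worker imports math on its own.

```julia
using Distributed
addprocs(2)               # two local worker processes (assumed setup)
@everywhere using PyCall  # PyCall must be loaded on every process

a = [1, 2]

# The closure captures no PyObject, so nothing has to be pickled.
# Each worker calls pyimport itself; after the first call Python returns
# the already-imported module from sys.modules, so repeated calls are cheap.
results = pmap(x -> pyimport("math").sin(x), a)
```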

Importing the package on every iteration doesn’t feel right to me.

Python modules are cached, so it’s a very cheap operation. I’d just use it until benchmarking and profiling showed it to be a bottleneck. That said, there are other solutions:

  • Define @everywhere pysin(x) = math.sin(x), then call pmap(pysin, a).
  • copy! the module math (or the function math.sin) into a constant, as described in the PyCall README. This works as per-process memoization or inside a module’s __init__.

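A minimal sketch of the first option, assuming two local workers. Because pysin is a named function defined on every process, pmap sends it by name, and each worker resolves its own math binding instead of receiving a pickled copy of the module.

```julia
using Distributed
addprocs(2)                           # two local worker processes (assumed setup)
@everywhere using PyCall
@everywhere math = pyimport("math")   # each process imports math itself

# Named function defined on every process: only its name crosses the wire,
# and the worker looks up `math` in its own Main module.
@everywhere pysin(x) = math.sin(x)

results = pmap(pysin, [1, 2])
```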
julia> using Distributed

julia> addprocs(2);

julia> @everywhere using PyCall

julia> @everywhere const math = PyNULL()

julia> @everywhere copy!(math, pyimport("math"))

julia> pmap(x->math.sin(x), [1,2])
ERROR: PyError ($(Expr(:escape, :(ccall(#= /home/lizz/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:44 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'TypeError'>
TypeError("can't pickle module objects")


It seems the reliable solution is to wrap each module in a function:

julia> using Distributed

julia> addprocs(2);

julia> @everywhere using PyCall

julia> @everywhere Image() = pyimport("PIL.Image")

julia> pmap(["y0.jpg", "y1.jpg"]) do b
       img = Image().open(b)
       println(img.size)
       img.close()
       end;
      From worker 2:    (112, 112)
      From worker 3:    (112, 112)

If you are OK with Image().open(b), you should be OK with pyimport("PIL.Image").open(b) performance-wise, since Image() just calls pyimport on every invocation.

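Your code with copy! didn’t work because it doesn’t follow the memoization/caching pattern: the closure x->math.sin(x) still refers to the global math, so Distributed serializes (and therefore pickles) its value when sending the closure to the workers. You need to call copy! inside the function you pass to pmap, guarded by ispynull.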
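One way to follow this advice, sketched under the same two-worker assumption. The placeholder name _math is hypothetical; pysin is reused from the earlier suggestion. The PyNULL placeholder is filled lazily inside the function that pmap calls, and because that function is named, the placeholder itself is never serialized.

```julia
using Distributed
addprocs(2)               # two local worker processes (assumed setup)
@everywhere using PyCall

@everywhere begin
    # Hypothetical per-process placeholder, filled on first use.
    const _math = PyNULL()

    # Named function defined on every process: pmap sends it by name,
    # so the _math constant is never pickled.
    function pysin(x)
        # Memoization guard: import and cache the module once per process.
        ispynull(_math) && copy!(_math, pyimport("math"))
        return _math.sin(x)
    end
end

results = pmap(pysin, [1, 2])
```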

Performance-wise I’m okay with it, because in my case image processing is very slow anyway. If this is the right way to do it, I’ll go with it, since it requires minimal changes.

Thanks for the tips!