Questions about pyjulia usage

I am trying to see how well a code runs called from python, and I’m facing some problems and have some doubts:

  1. When trying to import any package I am getting this error:
Your Python interpreter "/usr/bin/python3"
is statically linked to libpython.  Currently, PyJulia does not fully
support such Python interpreter.
details
leandro@avell:~% ipython3 
Python 3.8.10 (default, Sep 28 2021, 16:10:42) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from julia import CellListMap                                                    
---------------------------------------------------------------------------
UnsupportedPythonError                    Traceback (most recent call last)
<ipython-input-1-7997273b03be> in <module>
----> 1 from julia import CellListMap

/usr/lib/python3.8/importlib/_bootstrap.py in _find_and_load(name, import_)

/usr/lib/python3.8/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

/usr/lib/python3.8/importlib/_bootstrap.py in _load_unlocked(spec)

/usr/lib/python3.8/importlib/_bootstrap.py in _load_backward_compatible(spec)

~/.local/lib/python3.8/site-packages/julia/core.py in load_module(self, fullname)
    246             return sys.modules.setdefault(fullname,
    247                                           JuliaMainModule(self, fullname))
--> 248         elif self.julia.isafunction(juliapath):
    249             return self.julia.eval(juliapath)
    250 

~/.local/lib/python3.8/site-packages/julia/core.py in julia(self)
    237     @property
    238     def julia(self):
--> 239         self.__class__.julia = julia = Julia()
    240         return julia
    241 

~/.local/lib/python3.8/site-packages/julia/core.py in __init__(self, init_julia, jl_init_path, runtime, jl_runtime_path, debug, **julia_options)
    481             logger.debug("compiled_modules = %r", options.compiled_modules)
    482             if not (options.compiled_modules == "no" or is_compatible_python):
--> 483                 raise UnsupportedPythonError(jlinfo)
    484 
    485             self.api.init_julia(options)

UnsupportedPythonError: It seems your Julia and PyJulia setup are not supported.

Julia executable:
    julia
Python interpreter and libpython used by PyCall.jl:
    /usr/bin/python3
    /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
Python interpreter used to import PyJulia and its libpython.
    /usr/bin/python3
    /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0

Your Python interpreter "/usr/bin/python3"
is statically linked to libpython.  Currently, PyJulia does not fully
support such Python interpreter.

The easiest workaround is to pass `compiled_modules=False` to `Julia`
constructor.  To do so, first *reboot* your Python REPL (if this happened
inside an interactive session) and then evaluate:

    >>> from julia.api import Julia
    >>> jl = Julia(compiled_modules=False)

Another workaround is to run your Python script with `python-jl`
command bundled in PyJulia.  You can simply do:

    $ python-jl PATH/TO/YOUR/SCRIPT.py

See `python-jl --help` for more information.

For more information, see:

    https://pyjulia.readthedocs.io/en/latest/troubleshooting.html


The workaround proposed there (i. e. jl = Julia(compiled_modules=False)) makes the code work. However, since the performance I’m getting is not very good, this might not be a solution.

  1. My package works on (3,N) matrices. In Julia I create them with rand(3,1000), for example. Since Python is row-major, I thought that with numpy that would correspond to a np.random.random((1000,3)) array. However the julia package works with the np.random.random((3,1000)) array, which confuses me. Is a copy being made that transposes the array in the interface of pyjulia? edit: this is explicitly mentioned here, but, as shown bellow, does not seem to be reason for the difference in performance.

  2. How to allow Julia to run in parallel from within python? I didn’t find how to do it. Found the solution at the unnatural no threading support section of the manual (yet, although the nthreads() command indicates that more threads are available to Julia, the julia package itself appears not to be using them).

  3. Finally, even when running in serial, the execution of a calculation is taking almost 10 times longer when the package is called from python than from within Julia. The time required even to copy the input arrays and transpose them (if that is the case) does not seem to justify the difference. Any hint of what else may be going?

A MWE example in Julia is:

julia> using CellListMap

julia> x = rand(3,50000);

julia> @btime CellListMap.neighbourlist(x,0.05,parallel=false);
  59.951 ms (17598 allocations: 32.85 MiB)

and what I am able to run in IPython is:

In [4]: import numpy as np                                                      

In [5]: from julia import Julia                                                 

In [6]: jl = Julia(compiled_modules=False)                                      

In [7]: from julia import CellListMap                                           

In [8]: x = np.random.random((3,50000)); 

In [9]: %timeit CellListMap.neighbourlist(x,0.05);                             
619 ms ± 84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [10]: %timeit CellListMap.neighbourlist(x,0.05);                             
608 ms ± 113 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The alternative to use the python-jl -m IPython solves the first problem (though I am not sure how python users will find this acceptable, any experience with that?).

The performance is not good, anyway (only slightly better, but far from the performance of the call directly from Julia).

The memory layout does not seem to be the cause of the slowdown, because I can create both layouts using a julia or a numpy call, and the peformance is the same (and still ~8-10 slower than running the same function in Julia):

In [76]: x = %julia rand(3,50000)                                                                                        

In [77]: x.flags                                                                                                         
Out[77]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [78]: %timeit CellListMap.neighbourlist(x,0.05)                                                                       
416 ms ± 66.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [79]: x = np.random.random((3,50000))                                                                                 

In [80]: x.flags                                                                                                         
Out[80]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [81]: %timeit CellListMap.neighbourlist(x,0.05)                                                                       
419 ms ± 66.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [82]:

In Julia:

julia> x = rand(3,50000);

julia> @btime CellListMap.neighbourlist($x,0.05,parallel=false);
  52.811 ms (17640 allocations: 32.86 MiB)

The first-time call of the function takes ~4s in Julia, thus that slowdown cannot be caused by repeated compilation of the code by each julia call (or the %timeit macro makes it compile only once, but every time).

I’ve had very bad experiences with pyjulia. Most prominently unexplained and irregular crashes. (Segfaults? I don’t remember) - my memory is that there just isn’t enough interest in calling Julia from Python. We are therefore rewriting all interfaces via embedding Julia in C and then linking to other libraries including ASE and OpenMM

Since we are interested in the same application domain I think there would be a lot of scope to collaborate if you are interested.

1 Like

I had already in these tests some out of memory errors. Not fun.

I was playing with the possibility of calling the package from python, because for some applications it is working faster than the scipy alternative (thus it could be interesting to use it). But I don’t have any application for that myself, for the moment.

For example, if I run that test with a matrix of (3,500_000) twice the python section gets killed. In Julia I can run problems with as much as 10 million columns with no problem.

That’s odd, any conversion overheads should be negligible compared to calling that function.

You may like to try JuliaCall instead.

3 Likes

I found my way around it (great package, thanks… the documentation needs some work :slight_smile: ).

Now I have:

In [2]: from juliacall import Main as jl

In [3]: jl.Pkg.add("CellListMap")                                                                    
    Updating registry at `~/.julia/registries/General`
   Resolving package versions...
  No Changes to `~/.julia/environments/PythonCall/Project.toml`
  No Changes to `~/.julia/environments/PythonCall/Manifest.toml`

In [5]: jl.seval("using CellListMap")                                                                

In [7]: import numpy as np                                                                           

In [9]: x = np.random.random((3,50000))                                                              

In [10]: %timeit jl.CellListMap.neighbourlist(x,0.05)                                                
60.5 ms ± 5.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


which has no perceptible overhead.

Compared to scipy.KDtree:

In [8]: from scipy.spatial import KDTree

In [11]: y = np.transpose(x)                                                                         

In [12]: def pp(points,r) :  
    ...:     kd_tree = KDTree(points)  
    ...:     pairs = kd_tree.query_pairs(r)  
    ...:     return pairs 
    ...:                                                                                             

In [13]: %timeit pp(y,0.05)                                                                          
213 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [20]: len(jl.CellListMap.neighbourlist(x,0.05))                                                   
Out[20]: 618076

In [21]: len(pp(y,0.05))                                                                             
Out[21]: 618076

I have only to find out how to run julia multi-threaded.