Construct and intermittently call a single Julia instance with JuliaCall

kydonian · March 24, 2023, 4:01am

I’m trying to incorporate JuliaCall in my
application. Larger context:

Julia is the backend to a python service endpoint (i.e. somebody sends an optimization problem to a server written in python via gRPC, server calls Julia, returns the solution). Python ends up executing these request responses in different threads if special care is not taken.
Current solution is to call Julia via subprocess, so Julia shuts down after every call.
Would like to just instantiate Julia once to get runtime optimization benefits.
Would also be nice to set variables in Julia directly from python, by copy or by reference,
whatever works.

I got the following solution working with PyJulia/PyCall. After several executions, Julia
optimizes successfully and quickly as expected (i.e., the PyJulia call converges to close to the
steady-state solver time).

For this solution, I used multiprocessing’s Pool to work around Julia not being safe to call
from multiple threads (without this special treatment, segfaults).

from julia import Julia

import ctypes
from multiprocessing import Pool, Value
from typing import List

# A module-wide reference to Julia. We intentionally do _not_ initialize the actual Julia engine
# here, which must be done in its special thread/process.
_julia_reference = Value(ctypes.py_object, None)

def _teardown():
  """Trigger the Julia destructor by implicitly setting its reference count to 0.
  
  This function must be called within the special JuliaInterface thread.
  """
  _julia_reference.value = None

def _setup():
  """Construct the Julia session instance.
  
  This function must be called within the special JuliaInterface thread.
  """
  _julia_reference.value = Julia()
  # Install prerequisites for solve_problem.jl
  _julia_reference.value.include("install_script.jl")

  # A few warmup runs to precompile & optimize execution.
  warmups = 4  # This was sufficient in testing to fully optimize.
  for _ in range(warmups):
    cmd = ['--arg1', 'my-arg1']  # Currently using CLI args to pass inputs to solve_problem.jl.
    _call_solve_problem(cmd)

def _call_solve_problem(cmd):
    """Call solve_problem.jl with cli arguments cmd.

    Args:
      cmd: See JuliaInterface.call_solve_problem().
  
    This function must be called within the special JuliaInterface thread."""

    # Set the special Julia global variable ARGS to imitate running solve_problem.jl
    # with Julia CLI. Not a long-term solution, but works for now.
    eval_str = 'ARGS=["--arg1", "my-arg"]'   # In practice, constructed from cmd, but omitting that code in this example.
    _julia_reference.value.eval(eval_str)
    _julia_reference.value.include("call_solve_problem.jl")

class JuliaInterface:
  """Helper class to call Julia from Python.
  
  Why are we calling Julia through a process pool? Why is the core functionality of this class
  in free functions instead of just normal member functions?
  
  Good questions. It turns out that Julia is, to a certain degree, not threadsafe. Note that this
  only applies to calling the Julia engine from the same thread; Julia of course does
  multithreaded stuff under the hood, and this it does quite safely.

  However, since we're calling Julia from Python, we have to ensure that the Julia instance we use
  is always called from the same thread.

  The process pool allows us to accomplish this. Instead of instantiating the Julia session and
  calling it from within this class, then, we instead:
  * Create an (ugly) global reference that everything in this module has access to: `_julia_reference`.
  * Run all operations with the Julia engine _inside_ the special single thread. This includes:
    * Instantiating _and_ destroying Julia.
    * Actual Julia operations, like calling solve_problem.jl

  For reference, violating this "must call julia from the same thread" policy usually ends with a
  fault inside the underlying `julialib`, like this one:

    * thread #26, stop reason = EXC_BAD_ACCESS (code=1, address=0x18)
        frame #0: 0x0000000130f7e680 libjulia-internal.1.8.dylib`ijl_excstack_state + 12
  """
  def __init__(self):
    """Create the single special thread for all Julia operations, and instantiate the Julia engine."""
    self._process_pool = Pool(1)
    self._process_pool.apply(_setup)
  
  def __del__(self):
    """Destroy Julia inside the single special thread."""
    self._process_pool.apply(_teardown)

  def call_solve_problem(self, cmd: List[str]):
    """Execute solve_problem.jl via the single special thread via the pyjulia interface.

    Args:
      cmd: The CLI args you would have given to `call_solve_problem.jl` from the command line, as a list.
    
      For example, if at the command line you would issue
        julia --project solve_problem.jl --arg1 my-arg
      
      `cmd` should be:
        ["--arg1", "my-arg"]
    """
    self._process_pool.apply(_call_solve_problem, (cmd,))
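
The `eval_str` in `_call_solve_problem` above is hard-coded, with the construction from `cmd` omitted. A minimal sketch of one way that construction could look is below. The helper name `julia_string_array` is hypothetical, and the escaping rules are an assumption about what Julia string literals need (quotes, backslashes, and `$`), not code from the original application.

```python
import json
from typing import List


def julia_string_array(cmd: List[str]) -> str:
    """Render a list of Python strings as a Julia array literal (hypothetical helper)."""
    parts = []
    for arg in cmd:
        # json.dumps adds the surrounding double quotes and escapes
        # backslashes and embedded double quotes.
        literal = json.dumps(arg, ensure_ascii=False)
        # Julia interpolates "$" inside string literals, so escape it as well.
        parts.append(literal.replace("$", "\\$"))
    return "[" + ", ".join(parts) + "]"


eval_str = "ARGS=" + julia_string_array(["--arg1", "my-arg"])
print(eval_str)  # ARGS=["--arg1", "my-arg"]
```

Passing the arguments as a string to evaluate keeps the CLI-style entry point intact, at the cost of having to get the quoting right.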

For this discussion, let’s imagine the server-side code looks something like this (I’ve reproduced
the issues I’m facing with a “server” this simple).

if __name__ == "__main__":
    interface = JuliaInterface()  # Defined above
    for idx in range(10):
        cmd = ['--arg1', 'my-arg1']
        interface.call_solve_problem(cmd)

This works well enough, but

I’ve found that PyJulia is pretty temperamental.
Some early experiments in trying to set variables in Julia directly didn’t pan out that well with more complicated types.
From what I can gather from other posts on julialang.org, PyJulia is not really maintained anymore.

So I thought I’d give JuliaCall a try and see if it addressed these issues.

I coded up the same framework for calling Julia in JuliaCall, but one issue I ran into was I
couldn’t figure out a way to construct Julia exactly once inside the single special process/thread,
get a handle to the underlying Julia instance, and then only call it in the special thread, as done
above in the PyCall code.

In the spirit of throwing everything at the wall and seeing what stuck, I tried simply importing
JuliaCall right in the runtime path every time solve_problem.jl is run:

    from multiprocessing import Pool
    from typing import List
    
    def _setup():
      from juliacall import Main as jl
      jl.include("install_script.jl")
    
    def _call_script_internal(cmd):
        from juliacall import Main as jl
        jl.include("solve_problem.jl")
    
        args = ["--arg1", "value1", "--arg2", "--value2"]
        jl.seval(args)
        jl.include("exec_script.jl")
    
    class JuliaInterface:
      def __init__(self):
        """Create the single special thread for all Julia operations, and instantiate the Julia engine."""
        self._process_pool = Pool(1)
        self._process_pool.apply(_setup)
    
      def call_script(self, cmd: List[str]):
        self._process_pool.apply(_call_script_internal, (cmd,))

I didn’t have much success here.

When I could get it working, Julia never seemed to optimize execution across runs. That is, it was
just as slow as calling Julia from the command line every time, i.e. julia --project solve_problem.jl ..., so it felt like Julia was perhaps getting reinstantiated on every import.
It was still just as temperamental as PyJulia. Even when I could get it working, I often ran into
mysterious segfaults like

[1] 68353 segmentation fault python server/run_server.py
/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

I also, of course, tried the simpler

    from typing import List
    from juliacall import Main as jl
    
    class JuliaInterface:
      def __init__(self):
        """Create the single special thread for all Julia operations, and instantiate the Julia engine."""
        jl.include("install_script.jl")
    
      def call_script(self, cmd: List[str]):
        # again, pretend these are being properly constructed from `cmd`
        args = ["--arg1", "value1", "--arg2", "--value2"]
        jl.seval(args)
        jl.include("solve_problem.jl")

but ended up, as with PyJulia, suffering from the segfaults caused by Julia’s non-threadsafe-ness.

Do you have any recommendations or tips for this application?

Thank you!

kydonian · March 24, 2023, 4:08am

@cjdoris

cjdoris · March 24, 2023, 8:44am

Could you please post a complete MWE? e.g. a full python script plus all those .jl scripts.

cjdoris · March 24, 2023, 8:51am

One thing, you keep mentioning threads but actually you make a process pool, so Julia is actually running in a separate process. This shouldn’t be a problem - in fact it should be simpler because there shouldn’t be any threading issues in that process.

Have you tried putting locks around the Julia code? If call_script is called concurrently from different threads, they may be calling into Julia concurrently, which is probably bad.

kydonian · April 14, 2023, 9:00pm

Thanks for the response! Yeah, I should have just posted the MWE first. I was worried that, since it was so temperamental, I wouldn’t be able to make one, and was looking for general tips / obvious code examples other than what I could already find.

In the process of building my MWE, I discovered some other explanations for the slow runtime. Moreover, I was able to get back to “quick” runtime with JuliaCall. So JuliaCall was not the cause there.
The import-in-the-loop approach ended up working and being pretty quick. I was able to build a functional prototype with the import-in-process-pool approach I’ve outlined below in process-call.py.

To close the loop here and to help other users, I decided to build some MWEs and post them. However, I ran into some interesting issues again. Figured I would post both the functional MWE and the non-functional one in case you have any follow-up ideas or suggestions.

MWEs

Common files

install_script.jl

include("n_queens.jl")

n_queens.jl

# Solve N-Queens, source largely from
# https://github.com/jump-dev/JuMP.jl/blob/master/docs/src/tutorials/linear/n-queens.jl

module NumQueensModule

using JuMP
import HiGHS
import LinearAlgebra

function solve_n_queens(N::Int64)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:N, 1:N], Bin);
    for i in 1:N
        @constraint(model, sum(x[i, :]) == 1)
        @constraint(model, sum(x[:, i]) == 1)
    end
    for i in -(N-1):(N-1)
        @constraint(model, sum(LinearAlgebra.diag(x, i)) <= 1)
        @constraint(model, sum(LinearAlgebra.diag(reverse(x; dims = 1), i)) <= 1)
    end
    optimize!(model)
    return round.(Int, value.(x))
end

end

Functional MWE

direct-call.py

if __name__ == "__main__":
  from juliacall import Main as jl
  jl.include("n_queens.jl")
  jl.seval("using .NumQueensModule")
  for i in range(3, 20):
    solution = jl.NumQueensModule.solve_n_queens(i+1).to_numpy()
    print(i, solution)

direct-call.py succeeds as expected:

$ python direct-call.py
3 [[0 0 1 0]
 [1 0 0 0]
 [0 0 0 1]
 [0 1 0 0]]
4 [[0 1 0 0 0]
 [0 0 0 1 0]
 [1 0 0 0 0]
 [0 0 1 0 0]
 [0 0 0 0 1]]
...

Non-functional MWE

process-call.py

from multiprocessing import Pool
    
def _setup():
  from juliacall import Main as jl
  jl.include("n_queens.jl")
  jl.seval("using .NumQueensModule")

def _call_script_internal(num_queens):
  from juliacall import Main as jl
  solution = jl.NumQueensModule.solve_n_queens(num_queens)
  print(num_queens, solution)

class JuliaInterface:
  def __init__(self):
    """Create the single special thread for all Julia operations, and instantiate the Julia engine."""
    self._process_pool = Pool(1)
    self._process_pool.apply(_setup)

  def call_script(self, num_queens: int):
    self._process_pool.apply(_call_script_internal, (num_queens,))


if __name__ == "__main__":
  interface = JuliaInterface()

  if True:
    # Failure version A
    interface.call_script(8)
  else:
    # Failure version B
    for i in range(3,10):
      interface.call_script(i)
      print(i)

Mentioning again here part of my deployment constraints:

Julia is the backend to a python service endpoint (i.e. somebody sends an optimization problem to a server written in python via gRPC, server calls Julia, returns the solution). Python ends up executing these request responses in different threads if special care is not taken.

Consequently, for my purposes I have to embed the JuliaCall code inside that single separate extra process, as you can see I’m doing inside process-call.py, which was supposed to be the functional MWE I was going to present here for future JuliaCall users.

However, I was unable to get this particular version working, running into different issues depending on the execution method.

Failure version A

Running python process-call.py with the branching code set to execute simply

interface.call_script(8)

Yields:

$ python process-call.py

# Actually succeeds in executing Julia!
8 [0 0 0 0 1 0 0 0; 0 0 1 0 0 0 0 0; 0 0 0 0 0 0 0 1; 0 0 0 1 0 0 0 0; 0 0 0 0 0 0 1 0; 1 0 0 0 0 0 0 0; 0 0 0 0 0 1 0 0; 0 1 0 0 0 0 0 0]

# Subsequently crashes. Not much I can interpret here, but it's worth noting that
# this seems to occur on program *exit*. Is it possible that Julia is not getting
# properly garbage collected?
signal (15): Terminated: 15
in expression starting at none:0
_io_TextIOWrapper_flush at /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/Python (unknown line)
_PyEval_EvalFrameDefault at /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/Python (unknown line)
PyEval_EvalCode at /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/Python (unknown line)
run_eval_code_obj at /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/Python (unknown line)
run_mod at /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/Python (unknown line)
PyRun_StringFlags at /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/Python (unknown line)
PyRun_SimpleStringFlags at /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/Python (unknown line)
Py_RunMain at /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/Python (unknown line)
Py_BytesMain at /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/Python (unknown line)
unknown function (ip: 0x0)
Allocations: 25344733 (Pool: 25333830; Big: 10903); GC: 19
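
One plausible reading of the `signal (15)` output, not confirmed in this thread, is that the pool is never closed: when a `multiprocessing.Pool` is garbage collected or left through its own `with` block, `terminate()` is called, which on POSIX sends SIGTERM to workers that are still alive. A minimal sketch of shutting the pool down with `close()` and `join()` instead, so the worker exits on its own, is below. The helper name `graceful_pool` is hypothetical.

```python
from contextlib import contextmanager
from multiprocessing import Pool


@contextmanager
def graceful_pool(pool):
    """Yield the pool, then let its workers finish and exit on their own."""
    try:
        yield pool
    finally:
        pool.close()  # accept no new tasks; workers exit once the queue drains
        pool.join()   # wait for the worker processes to finish


if __name__ == "__main__":
    # pow is a builtin, so it pickles by name under any start method.
    with graceful_pool(Pool(1)) as pool:
        print(pool.apply(pow, (2, 5)))  # 32
```

In process-call.py the equivalent would be calling `close()` and then `join()` on `self._process_pool` before the program exits.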

Failure version B

Running python process-call.py with the branching code set to execute

for i in range(3,10):
  interface.call_script(i)
  print(i)

Yields:

$ python process-call.py

# Proximate cause seems related to the process pool. I can't figure out what's
# failing though, or why it would fail inside the loop. Shouldn't be getting parallel
# calls to JuliaCall, for example, because multiprocessing.Pool.apply() is
# synchronous.
Exception in thread Thread-3 (_handle_results):
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 579, in _handle_results
    task = get()
           ^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/connection.py", line 250, in recv
    return _ForkingPickler.loads(buf.getbuffer())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: error deserializing this value

# Note: Hangs indefinitely
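
The traceback shows the parent process failing while unpickling something the worker sent back, which can be a return value or a raised exception. A way to narrow this down, not verified against JuliaCall, is to make sure only built-in Python objects cross the process boundary. The sketch below uses hypothetical names (`run_and_flatten`, `stand_in_solver`) and a plain nested list in place of a Julia matrix. For a real JuliaCall result the conversion step would differ, for example going through `.to_numpy()` as in direct-call.py and then `.tolist()`.

```python
import pickle
import traceback


def run_and_flatten(func, *args):
    """Call func and return only built-in Python objects.

    Returns ("ok", value) or ("error", traceback_text), so neither the result
    nor a raised exception carries library-specific objects to the parent.
    """
    try:
        value = func(*args)
        # Nested lists of ints unpickle without importing anything extra.
        return ("ok", [[int(cell) for cell in row] for row in value])
    except Exception:
        return ("error", traceback.format_exc())


def stand_in_solver(n):
    """Hypothetical placeholder for solve_n_queens; returns an n-by-n grid."""
    if n < 1:
        raise ValueError("n must be positive")
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


# Round-trip through pickle, as the pool would do between processes.
status, payload = pickle.loads(pickle.dumps(run_and_flatten(stand_in_solver, 2)))
print(status, payload)  # ok [[1, 0], [0, 1]]
```

If the error disappears once the worker returns only such tuples, the original failure was likely in serializing a Julia-backed object or exception rather than in the solver itself.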

Versions

For completeness:

# M2 macbook pro
> uname --all
Darwin <hostname>.local 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:58 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6020 arm64 arm Darwin

> julia --version
julia version 1.8.5

> python
Python 3.11.2 (main, Feb 16 2023, 02:55:59) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
>>> import juliacall
>>> juliacall.__version__
'0.9.12'
