I’m trying to incorporate JuliaCall in my application. Larger context:
- Julia is the backend to a Python service endpoint (i.e. somebody sends an optimization problem to a server written in Python via gRPC, the server calls Julia and returns the solution). Python ends up handling these requests in different threads if special care is not taken.
- Current solution is to call Julia via subprocess, so Julia shuts down after every call.
- Would like to just instantiate Julia once to get runtime optimization benefits.
- Would also be nice to set variables in Julia directly from Python, by copy or by reference, whatever works.
I got the following solution working with PyJulia/PyCall. After several executions, Julia optimizes successfully and quickly as expected (i.e., the PyJulia call converges to close to the steady-state solver time).
For this solution, I used `multiprocessing`’s `Pool` to work around Julia not being safe to call from multiple threads (without this special treatment, it segfaults).

    from julia import Julia
    import ctypes
    from multiprocessing import Pool, Value
    from typing import List

    # A module-wide reference to Julia. We intentionally do _not_ initialize the actual Julia engine
    # here, which must be done in its special thread/process.
    _julia_reference = Value(ctypes.py_object, None)


    def _teardown():
        """Trigger the Julia destructor by implicitly setting its reference count to 0.

        This function must be called within the special JuliaInterface thread.
        """
        _julia_reference.value = None


    def _setup():
        """Construct the Julia session instance.

        This function must be called within the special JuliaInterface thread.
        """
        _julia_reference.value = Julia()
        # Install prerequisites for solve_problem.jl
        _julia_reference.value.include("install_script.jl")
        # A few warmup runs to precompile & optimize execution.
        warmups = 4  # This was sufficient in testing to fully optimize.
        for _ in range(warmups):
            cmd = ['--arg1', 'my-arg1']  # Currently using CLI args to pass inputs to solve_problem.jl.
            _call_solve_problem(cmd)


    def _call_solve_problem(cmd):
        """Call solve_problem.jl with CLI arguments cmd.

        This function must be called within the special JuliaInterface thread.

        Args:
            cmd: See JuliaInterface.call_solve_problem().
        """
        # Set the special Julia global variable ARGS to imitate running solve_problem.jl
        # with the Julia CLI. Not a long-term solution, but works for now.
        eval_str = 'ARGS=["--arg1", "my-arg"]'  # In practice, constructed from cmd, but omitting that code in this example.
        _julia_reference.value.eval(eval_str)
        _julia_reference.value.include("call_solve_problem.jl")


    class JuliaInterface:
        """Helper class to call Julia from Python.

        Why are we calling Julia through a process pool? Why is the core functionality of this class
        in free functions instead of just normal member functions?

        Good questions. It turns out that Julia is, to a certain degree, not threadsafe. Note that this
        only concerns which thread calls into the Julia engine; Julia of course does
        multithreaded stuff under the hood, and this it does quite safely.

        However, since we're calling Julia from Python, we have to ensure that the Julia instance we use
        is always called from the same thread.

        The process pool allows us to accomplish this. Instead of instantiating the Julia session and
        calling it from within this class, then, we instead:
        * Create an (ugly) global reference that everything in this module has access to: `_julia_reference`.
        * Run all operations with the Julia engine _inside_ the special single thread. This includes:
            * Instantiating _and_ destroying Julia.
            * Actual Julia operations, like calling solve_problem.jl

        For reference, violating this "must call Julia from the same thread" policy usually ends with a
        fault inside the underlying `libjulia`, like this one:

            * thread #26, stop reason = EXC_BAD_ACCESS (code=1, address=0x18)
              frame #0: 0x0000000130f7e680 libjulia-internal.1.8.dylib`ijl_excstack_state + 12
        """

        def __init__(self):
            """Create the single special thread for all Julia operations, and instantiate the Julia engine."""
            self._process_pool = Pool(1)
            self._process_pool.apply(_setup)

        def __del__(self):
            """Destroy Julia inside the single special thread."""
            self._process_pool.apply(_teardown)

        def call_solve_problem(self, cmd: List[str]):
            """Execute solve_problem.jl in the single special thread via the PyJulia interface.

            Args:
                cmd: The CLI args you would have given to `call_solve_problem.jl` from the command line, as a list.
                    For example, if at the command line you would issue
                        julia --project solve_problem.jl --arg1 my-arg
                    `cmd` should be:
                        ["--arg1", "my-arg"]
            """
            self._process_pool.apply(_call_solve_problem, (cmd,))

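The construction of the `ARGS` string from `cmd` is omitted above. A minimal sketch of what it could look like follows; `build_args_assignment` is a hypothetical name, not the code I actually run, and it assumes that JSON-style escaping plus an escaped `$` (which Julia would otherwise treat as interpolation) is enough for the arguments in question.

```python
import json
from typing import List


def build_args_assignment(cmd: List[str]) -> str:
    """Return Julia source that assigns `cmd` to ARGS (hypothetical helper)."""
    literals = []
    for arg in cmd:
        # json.dumps yields a double-quoted literal with quotes and backslashes escaped.
        literal = json.dumps(arg, ensure_ascii=False)
        # Julia interpolates `$` inside string literals, so escape it as well.
        literals.append(literal.replace("$", "\\$"))
    return "ARGS = [" + ", ".join(literals) + "]"


print(build_args_assignment(["--arg1", "my-arg"]))
```

The resulting string would be passed to `eval` in place of the hard-coded `eval_str`.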
For this discussion, let’s imagine the server-side code looks something like this (I’ve reproduced
the issues I’m facing with a “server” this simple).

    if __name__ == "__main__":
        interface = JuliaInterface()  # Defined above
        for idx in range(10):
            cmd = ['--arg1', 'my-arg1']
            interface.call_solve_problem(cmd)

This works well enough, but
- I’ve found that PyJulia is pretty temperamental.
- Some early experiments in trying to set variables in Julia directly didn’t pan out that well with more complicated types.
- From what I can gather from other posts on julialang.org, PyJulia is not really maintained anymore.
So I thought I’d give JuliaCall a try and see if it addressed these issues.
I coded up the same framework for calling Julia in JuliaCall, but one issue I ran into was that I couldn’t figure out a way to construct Julia exactly once inside the single special process/thread, get a handle to the underlying Julia instance, and then only call it in the special thread, as done above in the PyJulia code.
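To make the pattern I’m after concrete, here is a minimal sketch using a single long-lived worker thread instead of a process pool. `FakeEngine` and `SingleThreadInterface` are hypothetical names, and `FakeEngine` is only a stand-in that records which thread touches it; whether JuliaCall can be imported and driven this way from a worker thread is exactly what I have not been able to confirm.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeEngine:
    """Stand-in for the Julia session; records which threads touch it."""

    def __init__(self):
        self.owner_thread = threading.get_ident()
        self.calling_threads = set()

    def run(self, cmd):
        self.calling_threads.add(threading.get_ident())
        return " ".join(cmd)


class SingleThreadInterface:
    """Funnel every engine operation through one long-lived worker thread."""

    def __init__(self):
        self._engine = None
        # max_workers=1: a single worker thread serves all submitted work.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._executor.submit(self._setup).result()

    def _setup(self):
        # Runs on the worker thread, so the engine is created there.
        self._engine = FakeEngine()

    def call(self, cmd):
        # Callable from any thread; the engine itself only runs on the worker.
        return self._executor.submit(self._engine.run, cmd).result()

    def close(self):
        self._executor.shutdown(wait=True)


if __name__ == "__main__":
    interface = SingleThreadInterface()
    # Simulate request handlers arriving on different threads.
    handlers = [
        threading.Thread(target=interface.call, args=(["--arg1", "my-arg"],))
        for _ in range(4)
    ]
    for handler in handlers:
        handler.start()
    for handler in handlers:
        handler.join()
    engine = interface._engine
    print(engine.calling_threads == {engine.owner_thread})
    interface.close()
```

The appeal of this shape is that the handle to the engine lives in the same process as the server, so no pickling of arguments or results is needed, while construction and every call still happen on one thread.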
In the spirit of throwing everything at the wall and seeing what stuck, I tried simply importing JuliaCall right in the runtime path every time `solve_problem.jl` is run:

    from multiprocessing import Pool
    from typing import List


    def _setup():
        from juliacall import Main as jl
        jl.include("install_script.jl")


    def _call_script_internal(cmd):
        from juliacall import Main as jl
        jl.include("solve_problem.jl")
        # As before, pretend this string is being properly constructed from `cmd`.
        args = 'ARGS=["--arg1", "value1", "--arg2", "--value2"]'
        jl.seval(args)
        jl.include("exec_script.jl")


    class JuliaInterface:
        def __init__(self):
            """Create the single special thread for all Julia operations, and instantiate the Julia engine."""
            self._process_pool = Pool(1)
            self._process_pool.apply(_setup)

        def call_script(self, cmd: List[str]):
            self._process_pool.apply(_call_script_internal, (cmd,))

I didn’t have much success here.
- When I could get it working, Julia never seemed to optimize execution across runs. That is, it was just as slow as calling Julia from the command line every time, i.e. `julia --project solve_problem.jl ...`, so it felt like Julia was getting reinstantiated on every import, perhaps.
- It was still just as temperamental as PyJulia. Even when I could get it working, I often ran into mysterious segfaults like

      [1] 68353 segmentation fault python server/run_server.py
      /opt/homebrew/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
      warnings.warn('resource_tracker: There appear to be %d '

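One way to check the "reinstantiated on every call" suspicion would be to record, for each call, which process ran it and how long it took. The sketch below is a generic probe, not something I have run against Julia; `probe` and `summarize` are hypothetical names, and the lambda is a placeholder workload standing in for the real solve.

```python
import os
import time
from typing import Callable, Dict, List, Tuple

Sample = Tuple[int, float]  # (process id, elapsed seconds)


def probe(work: Callable[[], object]) -> Sample:
    """Run `work` once; report which process ran it and how long it took."""
    start = time.perf_counter()
    work()
    return os.getpid(), time.perf_counter() - start


def summarize(samples: List[Sample]) -> Dict[str, float]:
    """Condense the samples into the signals of interest."""
    return {
        "distinct_pids": len({pid for pid, _ in samples}),
        "first_seconds": samples[0][1],
        "last_seconds": samples[-1][1],
    }


# Placeholder workload; in the real setup this would be the Julia call.
samples = [probe(lambda: sum(range(100_000))) for _ in range(5)]
print(summarize(samples))
```

With the process pool, `probe` would have to be dispatched through `self._process_pool.apply` with a picklable top-level function as `work`. More than one distinct PID, or a last call no faster than the first, would point at the session not being reused.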
I also, of course, tried the simpler

    from typing import List

    from juliacall import Main as jl


    class JuliaInterface:
        def __init__(self):
            """Install prerequisites in the Julia session living in this Python process."""
            jl.include("install_script.jl")

        def call_script(self, cmd: List[str]):
            # again, pretend this string is being properly constructed from `cmd`
            args = 'ARGS=["--arg1", "value1", "--arg2", "--value2"]'
            jl.seval(args)
            jl.include("solve_problem.jl")

but ended up, as with PyJulia, suffering from the segfaults caused by Julia’s non-threadsafe-ness.
Do you have any recommendations or tips for this application?
Thank you!