Can JuliaCall easily handle multi-threading?

Hello! I’m developing a Julia package that I would like to port to Python. I’ve heard that JuliaCall is the way to go, since well-known packages such as diffeqpy and PySR now use it.

The package that I’m developing is mainly composed of array operations that can easily be parallelized, so it relies heavily on Tullio.jl. My question is whether I can easily keep this sort of multi-threading in a Python version of the package if I use JuliaCall. I wonder if the Global Interpreter Lock (GIL) would cause any trouble in this situation.

Thanks!

Related docs:

https://juliapy.github.io/PythonCall.jl/stable/faq/#Is-PythonCall/JuliaCall-thread-safe?

Hmmm… that’s about calling Python functions, which should be done only from the first thread. Here, though, the OP wants to know whether it’s safe for Python code to call a Julia function that then uses several threads…

I think the latter part of this docs paragraph applies to the other direction, no?

Julia intentionally causes segmentation faults as part of the GC safepoint mechanism. If unhandled, these segfaults will result in termination of the process. To enable signal handling, set PYTHON_JULIACALL_HANDLE_SIGNALS=yes before any calls to import juliacall. This is equivalent to starting julia with julia --handle-signals=yes, the default behavior in Julia. See discussion here for more information.

And more generally, the following warning seems to cut both ways:

Is PythonCall/JuliaCall thread safe?
No.

To be more specific in the use case, take as reference the Julia function

using Tullio

function f(x, y)
    @tullio result[i,j] := exp(-x[i]^2 - y[j]^2)  # multi-threaded when Julia has threads
end

I would like to call f(x, y) from Python, where x and y might be np.linspace arrays, and I would like the computation to use all available threads, as it would in Julia.

Actually, if you turn off PythonCall’s GC with PythonCall.GC.disable() and set PYTHON_JULIACALL_HANDLE_SIGNALS=yes, then there’s nothing to worry about. The “Is PythonCall/JuliaCall thread safe?” answer is about garbage collection.
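
For concreteness, something like this (a sketch; I’m assuming PythonCall has to be imported into Main explicitly, and the env var has to be set before the first import of juliacall, as the docs say):

import os
os.environ["PYTHON_JULIACALL_HANDLE_SIGNALS"] = "yes"  # must be set before importing juliacall

from juliacall import Main as jl

jl.seval("import PythonCall")
jl.seval("PythonCall.GC.disable()")  # pause PythonCall's GC while threaded Julia code runs
# ... call into multi-threaded Julia code here ...
jl.seval("PythonCall.GC.enable()")   # re-enable it afterwards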

PySR just switched last week to juliacall and has been super stable (even more than with PyJulia!) so far despite making heavy use of multi-threading :smile:

Here’s my main call to heavily multi-threaded code:

A somewhat related and also unsolved issue is whether we can tell Python to release the GIL when calling Julia code:

I just tried writing this function with juliacall and it seems fine, by the way. There wasn’t even any need to disable the GC.

import numpy as np
from juliacall import Main as jl

jl.seval("using Pkg")
jl.Pkg.add("Tullio")
jl.seval("using Tullio")

# Anonymous Julia function; @tullio parallelizes the loop when Julia has multiple threads
f = jl.seval("(x, y) -> (@tullio _[i,j] := exp(-x[i]^2-y[j]^2))")

x = np.linspace(0, 1)
y = np.linspace(0, 1)
f(x, y)  # Works!

I wouldn’t expect this to work generally for multi-threading, but I guess in some cases it will work out of the box.
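
One thing to double-check for the multi-threading part is that the embedded Julia actually starts with more than one thread. If I remember the option name right, juliacall reads this from an environment variable before the first import, e.g.:

import os
# must be set before the first `import juliacall`
os.environ["PYTHON_JULIACALL_THREADS"] = "auto"  # or an explicit count, e.g. "6"

from juliacall import Main as jl
print(jl.seval("Threads.nthreads()"))  # verify how many threads Julia actually got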

That shouldn’t be needed if you allow Julia to perform its signal handling.

I just tried commenting those lines out and running example.py from the PySR repo. Once it executes that equation_search statement, Python and Julia both freeze and I need to quit. So I’m assuming it is somehow still needed…

Did you use Julia 1.11? I ask because I see in Tullio.jl:

It uses LoopVectorization.@avx to speed many things up

And LoopVectorization is deprecated and basically disabled on 1.11+, so if something seems to work there, it might not on 1.10 and earlier. I believe LV uses threads, i.e. if enabled it could be a problem [but probably only if it and/or Tullio allocates, as I explain below, and I actually think LV at least doesn’t].

I’m trying to think about why threads could be a problem if you have more than one on the Python side and/or on the Julia side (it might also matter whether you’re calling from or to Python; note also that Python is dropping the GIL in a future version). I wouldn’t trust anything, since this isn’t documented to support multi-threading, but it can only fail in certain ways, e.g. if you allocate, and GC is the problem. Note that the Julia GC has recently become multi-threaded, i.e. since after the juliacall/PythonCall.jl docs were written. So even if you think you’re not using multi-threaded code, you might be. You can still opt into single-threaded GC with --gcthreads, but I forget what the default is with or without -t.

Previously, if you didn’t allocate, you were guaranteed GC would not be triggered, since GC/freeing is triggered by allocations. I’m not sure that still holds for the multi-threaded GC, but I would like to know. And even if it is still guaranteed, it might not be in the future with a changed GC(?).
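
As a quick check, on recent Julia versions (1.10+) you should be able to query the GC thread count from Python too (a sketch; Threads.ngcthreads() is what I believe the accessor is called):

from juliacall import Main as jl

# GC threads are separate from compute threads and may be > 1 by default
print(jl.seval("Threads.ngcthreads()"))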

[ins] In [3]: jl.Threads.nthreads()
Out[3]: 6

[ins] In [4]: jl.versioninfo()
Julia Version 1.10.1

Not sure what’s going on! :person_shrugging:

We really do need a low-signal mode for good interop. It gets really messy when Julia is using signals and some other software in the process is also trying to use signals.

One method involves signal chaining. Basically, when sending the signal, set some boolean flag indicating that Julia itself is signaling. If a signal is received in the signal handler and the flag is not set, then pass it on to the next signal handler in the chain.
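
Roughly like this, as a hypothetical Python illustration of the chaining pattern (using SIGUSR1, since SIGSEGV can’t be caught from pure Python; the flag and handler names are made up for the sketch):

import signal

we_are_signaling = False  # the runtime would set this right before raising its own signal
prev_handler = None       # whatever handler was installed before ours

def chained_handler(signum, frame):
    if we_are_signaling:
        pass  # it's our signal: e.g. park the thread at a safepoint
    elif callable(prev_handler):
        prev_handler(signum, frame)  # not ours: forward to the next handler in the chain

prev_handler = signal.signal(signal.SIGUSR1, chained_handler)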

We use signals primarily for one thing. During multi-threaded execution we need a way to signal to the other threads that a thread has requested garbage collection to be run. This is called a safepoint.

A common way to implement this is to perform a load from a particular page at regular intervals. When GC needs to run, the permissions on that page are set to inaccessible, and the OS signals the thread that it has performed an illegal memory access. We detect this in the signal handler and suspend the thread until GC has executed.

The design trade-off is that a load is pretty much the cheapest thing you can do, so frequent safepoints don’t hurt performance too much.

Could you implement safepoints differently? Yes, but most other methods add constant overhead to the program when GC is not running.

I think the agreement is that signal chaining is the way to go, but no one has spent time on that.

Hmm, that’s probably worth investigating. If you turn off GC, your memory usage will just slowly grow until you run out of memory. What is likely the case is that a thread is blocked in Python when GC is triggered (it should have been marked as safe to execute GC concurrently). So if you can get a backtrace from all threads, we might see what causes things to be blocked.

An example of this could be that one thread holds the GIL, another thread is blocked on the GIL, and the thread holding the GIL calls back into Julia, triggering GC, and then deadlocks because it is waiting for the other thread to reach a safepoint (which it never will until the GIL is released). That is pure speculation, but we had a similar story in CUDA just recently.

It’s only PythonCall’s GC that gets disabled. Since the equation_search function isn’t creating any new PythonCall objects, this won’t be an issue. I verified that the memory doesn’t grow with some pretty heavy testing before I moved PySR over from PyJulia.

Now if I did not explicitly convert arrays to Array, but left them as PyList{Any}, this might not be true, as PythonCall controls the finalizers for those. But I didn’t check.
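
For illustration, the kind of explicit conversion I mean looks roughly like this from the Python side (a sketch; jl.collect is just one way to materialize a plain Julia Array, not necessarily what PySR does internally):

import numpy as np
from juliacall import Main as jl

x = np.linspace(0, 1, 1000)
x_jl = jl.collect(x)  # copy into a Julia Vector{Float64}, so the Julia code
                      # no longer holds a PythonCall-wrapped Python object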

Your speculation may be correct, because it seems to at least start the search (?), but once it starts waiting on workers to return, nothing happens.

Some folks and I were working on thread-safe GC in PyCall.jl: Thread-safe garbage collection v3 (ReentrantLock version) by MilesCranmer · Pull Request #1074 · JuliaPy/PyCall.jl · GitHub. Maybe we need to try a similar thing for PythonCall!


Will try to check after I’m done teaching this term (ending mid-March)… Ping me if I forget!