Hello! I’m developing a Julia package that I would like to make available from Python. I heard that JuliaCall is the way to go, as many well-known packages such as diffeqpy and PySR now use it.
The package that I’m developing is mainly composed of array operations that can easily be parallelized, and so it relies heavily on Tullio.jl. My question is whether I can easily keep this sort of multi-threading in a Python version of the package if I use JuliaCall. I wonder if the Global Interpreter Lock (GIL) would cause any trouble in this situation.
Hmmm… that’s about calling Python functions, which should be done only from the first thread; but here the OP wants to know whether it is safe for Python code to call a Julia function that then uses several threads…
I think the latter part of this docs paragraph applies to the other direction, no?
Julia intentionally causes segmentation faults as part of the GC safepoint mechanism. If unhandled, these segfaults will result in termination of the process. To enable signal handling, set PYTHON_JULIACALL_HANDLE_SIGNALS=yes before any calls to import juliacall. This is equivalent to starting julia with julia --handle-signals=yes, the default behavior in Julia. See discussion here for more information.
And more generally, the following warning seems to cut both ways:
To be more specific in the use case, take as reference the Julia function
using Tullio  # provides the @tullio macro

function f(x, y)
    @tullio result[i, j] := exp(-x[i]^2 - y[j]^2)
    return result
end
I would like to call f(x, y) from Python, where x and y might be np.linspace arrays, and I would like the computation to use all available threads, as it would in Julia.
Actually, if you turn it off with PythonCall.GC.disable() and have PYTHON_JULIACALL_HANDLE_SIGNALS=yes, then there’s nothing to worry about. The “Is PythonCall/JuliaCall thread safe?” section is about garbage collection.
PySR just switched last week to juliacall and has been super stable so far (even more so than with PyJulia!), despite making heavy use of multi-threading.
Here’s my main call to heavily multi-threaded code:
I just tried commenting those lines out and running example.py from the PySR repo. Once it executes that equation_search statement, Python and Julia both freeze and I need to quit. So I’m assuming it is somehow still needed…
Did you use Julia 1.11? I ask because I see in Tullio.jl:
It uses LoopVectorization.@avx to speed many things up
And LoopVectorization is deprecated, basically disabled on 1.11+, so if something seems to work there, it might not on 1.10 and earlier. I believe LV uses threads, i.e. if enabled it could be a problem [but probably only if it and/or Tullio allocates, as I explain below; I actually think LV at least doesn’t].
I’m trying to think about why threads could be a problem, if you have more than one on the Python side and/or on the Julia side (it might also matter whether you’re calling from Python or to Python; note too that Python is dropping the GIL in a future version). I wouldn’t trust anything, since it’s not documented to support multi-threading, but it can only fail in certain ways, e.g. if you allocate, and GC is the problem. Note that the Julia GC has recently become multithreaded, i.e. since the juliacall/PythonCall.jl docs were written. So even if you think you’re not using multi-threaded code, you might be. You can still opt into single-threaded GC with --gcthreads, but I forget what the default is with or without -t.
Before, if you didn’t allocate, you were guaranteed GC would not be triggered, since freeing is triggered by allocations. I’m not sure that still holds for the multithreaded GC, but I would like to know. And even if it is still guaranteed, it might not be in the future if the GC changes(?).
We really do need a low-signal mode for good interop. It gets really messy when Julia is using signals and some other software in the process is also trying to use signals.
One method involves signal chaining. Basically, when sending a signal, set a boolean flag indicating that Julia itself is signaling. If a signal is received in the signal handler and the flag is not set, then pass it on to the next signal handler in the chain.
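The chaining idea can be sketched with Python’s signal module. This is a toy illustration of the pattern only: SIGUSR1 stands in for the real fault signal, and the flag/handler names are mine; in Julia’s case the chaining would have to happen in C-level SIGSEGV handlers, which plain Python cannot intercept.

```python
# Toy sketch of signal chaining: install our handler, remember the
# previous one, and forward any signal we did not raise ourselves.
import os
import signal

expecting_own = False          # flag: "the next signal is ours"
handled = []

prev = signal.getsignal(signal.SIGUSR1)   # next handler in the chain

def chained(signum, frame):
    global expecting_own
    if expecting_own:
        expecting_own = False
        handled.append("ours")            # runtime-internal handling goes here
    elif callable(prev):
        prev(signum, frame)               # not ours: pass it down the chain

signal.signal(signal.SIGUSR1, chained)

expecting_own = True
os.kill(os.getpid(), signal.SIGUSR1)      # our own signal: consumed by us
print(handled)
```

If the previous handler is SIG_DFL or SIG_IGN rather than a callable, forwarding would need to temporarily reinstall it and re-raise, which this sketch omits.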
We use signals primarily for one thing: during multi-threaded execution we need a way to signal to other threads that a thread has requested garbage collection to be run. This is called a safepoint.
A common way to implement this is to periodically perform a load from a page. When GC needs to run, the permissions on that page are set to inaccessible, and the OS will signal the thread that it has performed an illegal memory access. We detect this in the signal handler and suspend the thread until GC has executed.
The design trade-off is that a load is pretty much the cheapest thing you can do, so frequent safepoints don’t hurt performance too much.
Could you implement safepoints differently? Yes, but most other methods add constant overhead to the program even when GC is not running.
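As a toy illustration of that “constant overhead” alternative: every worker polls a flag between units of work (a check that costs something on every iteration, even when no collection ever happens), and a barrier ensures the collection runs only while all threads are parked. The names and structure here are mine, not Julia’s implementation.

```python
# Toy flag-polling safepoint: workers check a flag each iteration and
# park at a barrier; the barrier action runs "GC" while everyone is stopped.
import threading

NUM_WORKERS = 2
gc_requested = threading.Event()
stop = threading.Event()
log = []

def run_gc():
    # Executed by exactly one thread while every other party is still
    # parked at the barrier -- the "world is stopped" moment.
    log.append("gc ran")
    gc_requested.clear()
    stop.set()                 # end the demo after one collection

barrier = threading.Barrier(NUM_WORKERS + 1, action=run_gc)

def safepoint():
    # The poll each worker performs between units of work: this branch
    # is the constant overhead paid even when GC never runs.
    if gc_requested.is_set():
        barrier.wait()         # park here until the collection is done

def worker():
    while not stop.is_set():
        safepoint()            # real work would go between polls

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

gc_requested.set()             # some thread requests a collection
barrier.wait()                 # the requester parks too; run_gc fires last
for t in threads:
    t.join()
print(log)
```

The guard-page trick replaces that per-iteration branch with a plain memory load, moving all the cost onto the rare case where the page has been made inaccessible.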
I think the agreement is that signal chaining is the way to go, but no one has spent time on that.
Hmm, that’s probably worthwhile to investigate. If you turn off GC, your memory usage will just slowly grow until you run out of memory. What is likely happening is that a thread is blocked in Python when GC is triggered (it should have been marked safe to execute GC concurrently). So if you can get a backtrace from all threads, we might see what causes things to be blocked.
An example of this could be that one thread holds the GIL, another thread is blocked on the GIL, and the thread holding the GIL calls back into Julia, triggering GC, which then deadlocks because it is waiting for the other thread to reach a safepoint (which it never will until the GIL is released). That is pure speculation, but we had a similar story in CUDA just recently.
It’s only PythonCall’s GC. Since the equation_search function isn’t creating any new PythonCall objects, this won’t be an issue. I verified that memory doesn’t grow with some pretty heavy testing before I moved PySR over from PyJulia.
Now, if I had not explicitly converted arrays to Array but left them as PyList{Any}, this might not hold, as PythonCall controls the finalizers for those. But I didn’t check.
Your speculation may be correct, because it seems to at least start the search (?), but once it starts waiting on workers to return, nothing happens.