PythonCall Segmentation Fault

Hi there, I don’t have a reproducible minimal example yet, but maybe someone has stumbled into the same thing.

I get a Segmentation fault almost right at the beginning when using PythonCall inside Pluto. The key trigger is _PyInterpreterState_GET.

[2023088] signal (11.1): Segmentation fault
in expression starting at none:1
_PyInterpreterState_GET at /usr/local/src/conda/python-3.12.4/Include/internal/pycore_pystate.h:133 [inlined]
get_state at /usr/local/src/conda/python-3.12.4/Objects/obmalloc.c:866 [inlined]
_PyObject_Free at /usr/local/src/conda/python-3.12.4/Objects/obmalloc.c:1850 [inlined]
PyObject_Free at /usr/local/src/conda/python-3.12.4/Objects/obmalloc.c:830 [inlined]
unicode_dealloc at /usr/local/src/conda/python-3.12.4/Objects/unicodeobject.c:1612
Py_DecRef at /home/ssahm/.julia/packages/PythonCall/S5MOg/src/C/pointers.jl:297 [inlined]
#3 at /home/ssahm/.julia/packages/PythonCall/S5MOg/src/GC/GC.jl:59 [inlined]
with_gil at /home/ssahm/.julia/packages/PythonCall/S5MOg/src/C/gil.jl:10 [inlined]
enqueue at /home/ssahm/.julia/packages/PythonCall/S5MOg/src/GC/GC.jl:58
py_finalizer at /home/ssahm/.julia/packages/PythonCall/S5MOg/src/Core/Py.jl:46
unknown function (ip: 0x7f4b0f4b6525)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
run_finalizer at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:318
jl_gc_run_finalizers_in_list at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:408
run_finalizers at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:454
enable_finalizers at ./gcutils.jl:157 [inlined]
unlock at ./locks-mt.jl:68 [inlined]
multiq_deletemin at ./partr.jl:168
trypoptask at ./task.jl:977
jfptr_trypoptask_75363.1 at /home/ssahm/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
get_next_task at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/partr.c:337 [inlined]
ijl_task_get_next at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/partr.c:390
poptask at ./task.jl:985
wait at ./task.jl:994
task_done_hook at ./task.jl:675
jfptr_task_done_hook_75286.1 at /home/ssahm/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_finish_task at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/task.c:320
start_task at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/task.c:1249
Allocations: 16700556 (Pool: 16686881; Big: 13675); GC: 26

Since the stacktrace includes py_finalizer, I think this might be the issue “Segfaults with frequent python calls from thread 1 on multithreaded Julia” (Issue #201, JuliaPy/PythonCall.jl on GitHub). IIUC, not only must Python objects be accessed only from thread 1, they must also be GC’d from thread 1, and this Python GC occurs as part of Julia’s GC, which can happen at (almost) any time. Therefore having any multithreading in a program that includes Python usage is a bit fraught. You can turn off Python GC around threaded regions, which queues the objects to be GC’d once it’s re-enabled. IMO a better solution is needed, though.

Would the solution here be for the Julia finalizer to refuse to propagate the finalization unless it is on the first thread?

A finalizer can reject finalization by reinstalling itself.

An alternative is that the finalizer adds the object to be finalized to a channel. The channel is then read by a sticky task worker running on thread 1, which periodically checks the channel and runs Python garbage collection.
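Here is a minimal sketch of how those two ideas could combine, in plain Julia with no Python involved. All names (`ForeignHandle`, `PENDING`, `handle_finalizer`, `drain_pending!`) are hypothetical, and this is not how PythonCall actually implements it. One deliberate swap: instead of a `Channel`, the sketch uses a vector guarded by a spin lock with `trylock`, because `put!` on a channel can block, and a finalizer is not allowed to yield to the scheduler. If the lock is busy, the finalizer rejects finalization by reinstalling itself.

```julia
# Hypothetical stand-in for a wrapper around a foreign (e.g. Python) object.
mutable struct ForeignHandle
    id::Int
end

# Ids waiting to be released, guarded by a spin lock.
const PENDING = Int[]
const PENDING_LOCK = Threads.SpinLock()
# Demo only: records what the owning thread has "released".
const RELEASED = Int[]

# Finalizer: never blocks. It either queues the id or postpones itself.
function handle_finalizer(h::ForeignHandle)
    if trylock(PENDING_LOCK)
        try
            push!(PENDING, h.id)
        finally
            unlock(PENDING_LOCK)
        end
    else
        # Lock is busy: reject finalization for now by reinstalling the finalizer.
        finalizer(handle_finalizer, h)
    end
    return nothing
end

# `finalizer` returns the object it was attached to.
make_handle(id::Int) = finalizer(handle_finalizer, ForeignHandle(id))

# Meant to be called only from the thread that owns the foreign runtime.
# Returns the number of ids released in this pass.
function drain_pending!()
    ids = lock(PENDING_LOCK) do
        out = copy(PENDING)
        empty!(PENDING)
        out
    end
    append!(RELEASED, ids)  # a real version would decref the foreign objects here
    return length(ids)
end
```

In a real setup, `drain_pending!()` would be called periodically from a task that stays on the owning thread, for example one started with `@async` from that thread.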

A few years ago I created a package to address similar issues with JavaCall.jl.

Perhaps an updated version with some thread awareness could be useful.

That sounds like a good idea to me.

Something I don’t understand is why PythonCall needs to be interacted with exclusively from thread-1 (according to the docs). What is special about thread-1? Why is it not enough to just put a lock around all libpython accesses so we never access the library concurrently, but potentially do so from various threads/tasks?

My mental model is that concurrent access is problematic because Python has some global state, and one access needs to finish before another starts in order to use that state safely. But I feel like this must be incorrect or incomplete, since a lock would be sufficient to serialize access.
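To make the question concrete, here is a small plain-Julia sketch (no Python involved; `LIB_LOCK`, `CALL_LOG`, and `guarded_call` are hypothetical names). It shows that a lock does serialize the calls, but the calls can still run on different OS threads, which matters if the library also keeps per-thread state.

```julia
const LIB_LOCK = ReentrantLock()
# (call index, id of the thread the call ran on)
const CALL_LOG = Tuple{Int,Int}[]

# Hypothetical stand-in for a call into a library with global state.
function guarded_call(i::Int)
    lock(LIB_LOCK) do
        # Only one task at a time gets here, so the push! is safe.
        push!(CALL_LOG, (i, Threads.threadid()))
    end
    return nothing
end

tasks = [Threads.@spawn guarded_call(i) for i in 1:100]
foreach(wait, tasks)

# With more than one Julia thread this usually lists several thread ids.
threads_used = unique(last.(CALL_LOG))
println("calls: ", length(CALL_LOG), ", threads used: ", threads_used)
```

Started with several threads (e.g. `julia -t 4`), the log is complete and consistent, yet the entries typically come from more than one thread.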

I found “Initialization, Finalization, and Threads” in the Python 3.12.4 documentation, which I think explains some of this: Python has some thread-specific state, and foreign threads need to initialize that state before interacting with Python (and then tear it down afterwards).

It seems like we could do that setup/teardown from multiple Julia threads, though I’m not sure whether it would be too slow to be worth it compared with sending everything over a channel to a task on thread 1, which then passes it to Python.
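A rough sketch of the channel option, again in plain Julia with no Python involved. `REQUESTS`, `WORKER`, and `on_worker` are hypothetical names, and the closures stand in for calls into Python. It relies on tasks created with `@async` being sticky, i.e. staying on the thread that started them.

```julia
# Each request is a zero-argument function plus a channel for the reply.
const REQUESTS = Channel{Tuple{Function,Channel{Any}}}(Inf)

# Started from the main task. The loop ends when REQUESTS is closed.
const WORKER = @async for (f, reply) in REQUESTS
    # Reply with the result and the id of the thread that ran it.
    put!(reply, (f(), Threads.threadid()))
end

# Callable from any task on any thread; blocks until the worker has replied.
function on_worker(f)
    reply = Channel{Any}(1)
    put!(REQUESTS, (f, reply))
    return take!(reply)
end

# Requests are submitted from whichever threads the spawned tasks land on.
results = fetch.([Threads.@spawn on_worker(() -> i^2) for i in 1:20])
println(unique(last.(results)))  # the single thread id that did all the work
```

The sketch leaves out error handling: if `f` throws, the worker task dies and later callers would wait forever, so a real version would catch the exception and send it back through `reply`.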