Concurrency violation during garbage collections?

Really, I guess my question is: should I open a bug against Julia for this? It doesn’t appear to be causing issues, and I expect it will only happen during development. First, the stack trace:

error in running finalizer: ErrorException("concurrency violation detected")
error at ./error.jl:33
concurrency_violation at ./condition.jl:8
assert_havelock at ./condition.jl:25 [inlined]
assert_havelock at ./condition.jl:48 [inlined]
assert_havelock at ./condition.jl:72 [inlined]
notify at ./condition.jl:126
#notify#515 at ./condition.jl:124 [inlined]
notify at ./condition.jl:124 [inlined]
notify at ./condition.jl:124 [inlined]
send_del_client at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:265
finalize_ref at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:97
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
run_finalizer at /buildworker/worker/package_linux64/build/src/gc.c:278
jl_gc_run_finalizers_in_list at /buildworker/worker/package_linux64/build/src/gc.c:365
run_finalizers at /buildworker/worker/package_linux64/build/src/gc.c:394
jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:3260
maybe_collect at /buildworker/worker/package_linux64/build/src/gc.c:880 [inlined]
jl_gc_pool_alloc at /buildworker/worker/package_linux64/build/src/gc.c:1204
poptask at ./task.jl:755
wait at ./task.jl:763 [inlined]
uv_write at ./stream.jl:992
unsafe_write at ./stream.jl:1064
unsafe_write at /home/josh/.julia/packages/HTTP/cxgat/src/ConnectionPool.jl:172 [inlined]
unsafe_write at /home/josh/.julia/packages/HTTP/cxgat/src/Streams.jl:98
unsafe_write at ./io.jl:646 [inlined]
write at ./io.jl:687 [inlined]
#39 at /home/josh/me/julia/ImageManagement/src/server/images/http_response.jl:93
#open#317 at ./io.jl:330
open##kw at ./io.jl:328 [inlined]
http_send_file at /home/josh/me/julia/ImageManagement/src/server/images/http_response.jl:58 [inlined]
get_image at /home/josh/me/julia/ImageManagement/src/server/images/get_image.jl:113
handle_req at /home/josh/me/julia/ImageManagement/src/server/images/server.jl:47
#35 at /home/josh/me/julia/ImageManagement/src/server/images/server.jl:64
macro expansion at /home/josh/me/julia/ImageManagement/src/tasks.jl:61 [inlined]
#3 at ./threadingconstructs.jl:169
unknown function (ip: 0x7f7e48ef0a3c)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:839
unknown function (ip: (nil))

I’ve tried to create an MWE but so far have failed, so I’m not 100% sure what is causing this issue.

My situation is that I have 2 HTTP servers running in a single Julia process. The first serves static resources and uses threads and Threads.@spawn to process the requests. Processing a request involves querying the database, then reading a file from disk and sending the contents back to the browser.
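To make the shape concrete, here is a minimal sketch of that per-request pattern. It is not the actual server code: it assumes Threads.@spawn (which is what the threadingconstructs.jl frame in the trace suggests), the names handle_request and serve_all are hypothetical, and the database query and HTTP layer are left out.

```julia
using Base.Threads

# Hypothetical stand-in for "read a file from disk and send it back".
# Here it only reads the file and reports how many bytes it would send.
function handle_request(path::AbstractString)
    bytes = read(path)      # file I/O; the task may yield while waiting
    return length(bytes)
end

# Spawn one task per request so they can run on any available thread,
# then wait for all of them and collect the results in order.
function serve_all(paths)
    tasks = [Threads.@spawn handle_request(p) for p in paths]
    return fetch.(tasks)
end
```

Each spawned task can allocate, so any one of them can trigger a garbage collection and end up running finalizers that were registered by unrelated code.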

The second HTTP server just handles WebSockets and doesn’t use threads. It does use @async for each message that comes over the WebSocket, so that multiple operations can run asynchronously. This server pushes long-running operations out to a distributed process using remotecall.
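Roughly, the per-message pattern looks like the sketch below. Again this is an illustration under assumptions, not the real code: slow_square and on_message are hypothetical names, and with no extra workers added the call simply runs on process 1.

```julia
using Distributed

# Hypothetical long-running operation. With real workers it would need to be
# defined on them as well (for example via @everywhere).
slow_square(x) = (sleep(0.1); x^2)

# Handle one incoming message without blocking the WebSocket loop:
# start a task, hand the work to a worker, and wait for the result there.
function on_message(x; worker = first(workers()))
    return @async begin
        fut = remotecall(slow_square, worker, x)  # returns a Future
        fetch(fut)                                # blocks only this task
    end
end
```

The Future returned by remotecall is a remote reference, and remote references are the objects whose finalizer (finalize_ref) shows up in the trace.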

So the threaded server doesn’t use Distributed, while the non-threaded server does. During development I have both in the same process, but normally they will run in their own processes.

While this appears similar to:

and the associated:

Those seem to be accessing the distributed servers from various threads, while in my situation only thread 1 should be accessing them. In fact, I usually get this crash fairly early, when the threaded HTTP server is serving a bunch of static files and the distributed server is actually idle.

What happens is that Distributed internally buffers the sending of certain kinds of messages. That implementation uses finalizers to clean up various structures, and regular Condition variables to notify listeners about state updates. Unfortunately, as implemented, neither of these is thread-safe, and they can end up deadlocking (or throwing the assertion you’re getting) because they don’t use a construct that properly manages the lock while in a finalizer. Normally this would only show up in threaded code, but I suspect it can also break under heavily-async single-threaded code.
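For illustration, here is a small sketch of both halves of the problem. It is not the Distributed code. The first part shows that notify on a lock-backed condition is rejected when the caller does not hold the lock (on 1.6 this is the "concurrency violation detected" error; newer versions may raise a different exception type). The second part shows one way a finalizer can cope with a lock, following the pattern described in the Julia manual: never block, and re-register the finalizer if the lock is busy. Handle, CLEANUP_LOCK, RELEASED and release! are hypothetical names.

```julia
# Part 1: notifying without holding the lock is rejected.
cond = Threads.Condition()
violation = try
    notify(cond)        # lock(cond) was never taken
    nothing
catch err
    err                 # keep the exception so it can be inspected
end

lock(cond)
try
    notify(cond)        # fine: the lock is held by this task
finally
    unlock(cond)
end

# Part 2: a finalizer that never blocks on a lock.
mutable struct Handle   # hypothetical resource; mutable so it can have a finalizer
    id::Int
end

const CLEANUP_LOCK = ReentrantLock()
const RELEASED = Int[]  # stands in for shared state guarded by the lock

function release!(h::Handle)
    if islocked(CLEANUP_LOCK) || !trylock(CLEANUP_LOCK)
        # Lock is busy: do not wait inside a finalizer, try again at a later GC.
        finalizer(release!, h)
        return nothing
    end
    try
        push!(RELEASED, h.id)
    finally
        unlock(CLEANUP_LOCK)
    end
    return nothing
end

# Attach the finalizer when the resource is created.
make_handle(id) = finalizer(release!, Handle(id))
```

The point of the second part is that a finalizer can run on whatever task and thread happened to trigger the collection, so it cannot assume it owns any lock and must not wait for one.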

Thankfully there is a PR open that would fix most of these, and it’s been working out decently well for me. Give it a try and let me know if that helps!
