Segmentation fault in garbage collector

This doesn’t seem like the right place to post this error, but none of the other topic areas seemed quite right either. If this is the wrong place, please let me know where this should go.

My ray tracer hits a segmentation fault at random points during execution. The fault appears to be in the Julia garbage collector: it always happens in a GC function called gc_try_setmark.

Unfortunately, it’s very difficult to include a minimal working example: the ray tracer is several thousand lines of code. Tracking down the cause is hard because the error occurs randomly, typically after many billions of rays have been traced.

Sometimes the program runs to completion and renders the entire image, and sometimes it dies after rendering just a few hundred scanlines. This is strange because my code has no random component: it follows exactly the same execution path each time.

I’m not doing any pointer arithmetic, there are no @inbounds annotations on loops, and there are no calls to C code or anything else low-level. It’s just straight Julia code.

All the packages the code uses have been updated in the package manager.

My machine has 128 GB of RAM, and memory usage never rises much above 5 GB while the program is running, so this is unlikely to be an out-of-memory error.

This is the output of versioninfo():

Julia Version 1.4.0
Commit b8e9a9ecc6 (2020-03-21 16:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD EPYC 7702P 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, znver1)
Environment:
  JULIA_NUM_THREADS = 64
  JULIA_EDITOR = "/usr/share/code/code"

This is the stacktrace:

signal (11): Segmentation fault
in expression starting at /home/brian/repos/gitrepo/Julia/Optics/src/Optics.jl:30
gc_try_setmark at /buildworker/worker/package_linux64/build/src/gc.c:1642 [inlined]
gc_mark_scan_obj8 at /buildworker/worker/package_linux64/build/src/gc.c:1836 [inlined]
gc_mark_loop at /buildworker/worker/package_linux64/build/src/gc.c:2117
_jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:2899
jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:3105
maybe_collect at /buildworker/worker/package_linux64/build/src/gc.c:827 [inlined]
jl_gc_pool_alloc at /buildworker/worker/package_linux64/build/src/gc.c:1142
jl_gc_alloc_ at /buildworker/worker/package_linux64/build/src/julia_internal.h:246 [inlined]
jl_gc_alloc at /buildworker/worker/package_linux64/build/src/gc.c:3147
jl_alloc_svec_uninit at /buildworker/worker/package_linux64/build/src/simplevector.c:60
jl_alloc_svec at /buildworker/worker/package_linux64/build/src/simplevector.c:69
save_env at /buildworker/worker/package_linux64/build/src/subtype.c:149
forall_exists_subtype at /buildworker/worker/package_linux64/build/src/subtype.c:1447
forall_exists_equal at /buildworker/worker/package_linux64/build/src/subtype.c:1392
subtype at /buildworker/worker/package_linux64/build/src/subtype.c:1336
with_tvar at /buildworker/worker/package_linux64/build/src/subtype.c:702
subtype_unionall at /buildworker/worker/package_linux64/build/src/subtype.c:841 [inlined]
subtype at /buildworker/worker/package_linux64/build/src/subtype.c:1281
with_tvar at /buildworker/worker/package_linux64/build/src/subtype.c:702
subtype_unionall at /buildworker/worker/package_linux64/build/src/subtype.c:841 [inlined]
subtype at /buildworker/worker/package_linux64/build/src/subtype.c:1281
exists_subtype at /buildworker/worker/package_linux64/build/src/subtype.c:1425 [inlined]
forall_exists_subtype at /buildworker/worker/package_linux64/build/src/subtype.c:1453
jl_subtype_env at /buildworker/worker/package_linux64/build/src/subtype.c:1818
jl_isa at /buildworker/worker/package_linux64/build/src/subtype.c:2056
jl_new_structv at /buildworker/worker/package_linux64/build/src/datatype.c:928
Intersection at /home/brian/repos/gitrepo/Julia/Optics/src/Interval.jl:21
surfaceintersections at /home/brian/repos/gitrepo/Julia/Optics/src/BezierIntersection.jl:237
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:96
unknown function (ip: 0x7faa90095362)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:83
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:87
evalcsg at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:87
surfaceintersection at /home/brian/repos/gitrepo/Julia/Optics/src/CSG.jl:110
unknown function (ip: 0x7faa9008a762)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
render at /home/brian/repos/gitrepo/Julia/Optics/src/Visualization.jl:299
testrender at /home/brian/repos/gitrepo/Julia/Optics/src/Test.jl:981
unknown function (ip: 0x7faa9006e08b)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2158 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1692 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:369
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:458
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:409 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:817
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:911
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:814
jl_eval_module_expr at /buildworker/worker/package_linux64/build/src/toplevel.c:181
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:640
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:872
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:872
include at ./Base.jl:377
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1692 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:369
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:458
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:409 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:817
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:744
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:911
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:814
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:843
eval at ./boot.jl:331 [inlined]
eval at ./client.jl:449
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
top-level scope at ./none:3
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:808
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:764
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:843
eval at ./boot.jl:331
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
exec_options at ./client.jl:264
_start at ./client.jl:484
jfptr__start_2076.clone_1 at /home/brian/Applications/julia-1.4.0/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2144 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2322
jl_apply at /buildworker/worker/package_linux64/build/ui/../src/julia.h:1692 [inlined]
true_main at /buildworker/worker/package_linux64/build/ui/repl.c:96
main at /buildworker/worker/package_linux64/build/ui/repl.c:217
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/brian/Applications/julia-1.4.0/bin/julia (unknown line)
Allocations: 4054878845 (Pool: 4054859939; Big: 18906); GC: 2626
ERROR: Failed to precompile Optics [24114763-4efb-45e7-af0e-cde916beb153] to /home/brian/.julia/compiled/v1.4/Optics/60XLU_5Fisl.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
 [3] _require(::Base.PkgId) at ./loading.jl:1029
 [4] require(::Base.PkgId) at ./loading.jl:927
 [5] require(::Module, ::Symbol) at ./loading.jl:922
 [6] eval(::Module, ::Any) at ./boot.jl:331
 [7] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
 [8] run_backend(::REPL.REPLBackend) at /home/brian/.julia/packages/Revise/Pcs5V/src/Revise.jl:1073
 [9] top-level scope at none:0

You may not be doing it, but are you sure none of your dependencies are?

From your description of the symptom, it really feels like a missing GC root/barrier kind of issue (in the compiler or user code, but unlikely in the GC itself). Without the code (or even with the code, FWIW) and without a way to reproduce it, it’ll be very hard to debug.
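
To make the missing-GC-root theory concrete: the original poster’s own code uses no raw pointers, but a dependency that does can create exactly this kind of bug by taking a pointer to an array without keeping the array rooted. A minimal sketch of the safe pattern in plain Base Julia (poke_first! is a hypothetical name, not from any package): GC.@preserve keeps the array alive while its raw pointer is in use.

```julia
# Hypothetical helper, for illustration only: writes `val` into the first
# element of `buf` through a raw pointer.
function poke_first!(buf::Vector{Float64}, val::Float64)
    # pointer(buf) by itself does not keep `buf` alive. GC.@preserve roots
    # `buf` for the duration of the block, so the GC cannot free or reuse
    # its memory while the raw pointer `p` is still in use.
    GC.@preserve buf begin
        p = pointer(buf)
        unsafe_store!(p, val)   # same effect as buf[1] = val
    end
    return buf
end

buf = poke_first!(zeros(3), 1.5)
println(buf)   # [1.5, 0.0, 0.0]
```

Dropping the GC.@preserve usually still "works", which is what makes this class of bug intermittent: it only fails when a collection happens at the wrong moment.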

Thanks for the quick response. I can’t be sure that none of the dependencies are doing low-level operations that might cause the segmentation fault. However, the call to Intersection is in my own code, which uses nothing but standard Julia. It seems to be the last call in the stack trace before execution enters Julia’s internal code (datatype.c, subtype.c, simplevector.c, and finally gc.c).

I will attempt to create a smaller MWE, but it will take some surgery.

The thing about memory bugs is that they don’t blow up where the mistake was made; they blow up later, when some other, unrelated code accesses memory that has been corrupted. For example, say some code has an incorrect @inbounds annotation and writes to memory that’s actually out of bounds. That doesn’t cause a problem at the point where the incorrect write happens, but it may corrupt the memory of something else. Only later, when you’re doing something innocuous that reads the corrupted memory, do you get the problem. If the corrupted memory is data, then you “just” get wrong results. If the corrupted memory is a pointer, then you may dereference that pointer and, boom, segfault.
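
A minimal sketch of the @inbounds scenario, using hypothetical function names and Base Julia only: an off-by-one loop whose extra write is silent under @inbounds, but becomes a BoundsError at the faulty line once the check is present (or once Julia is started with --check-bounds=yes, which makes @inbounds a no-op).

```julia
# Hypothetical kernel with an off-by-one bug: it writes length(src) elements
# into dst, which is one too many when src is longer than dst. With default
# settings @inbounds removes the check, so the bad write is undefined behavior
# and may silently corrupt neighbouring memory. Do not call it with
# mismatched lengths outside a --check-bounds=yes session.
function double_into_unsafe!(dst::Vector{Float64}, src::Vector{Float64})
    for i in eachindex(src)
        @inbounds dst[i] = 2 * src[i]
    end
    return dst
end

# The same loop without @inbounds: the out-of-bounds write throws a
# BoundsError right where the mistake is, instead of corrupting memory.
function double_into_checked!(dst::Vector{Float64}, src::Vector{Float64})
    for i in eachindex(src)
        dst[i] = 2 * src[i]
    end
    return dst
end

caught = try
    double_into_checked!(zeros(3), ones(4))
    nothing
catch err
    err
end
println(typeof(caught))   # BoundsError
```

Only the checked variant is called with mismatched lengths here; doing the same with double_into_unsafe! in a normal session is exactly the silent-corruption case described above.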

You could try starting Julia with --check-bounds=yes and see if that catches a bounds error.

Julia 1.4.0 has a GC rooting bug, and I had a program with a similar symptom. This symptom is probably too generic to say whether the two are related. But since the bug is fixed in 1.4.1 (https://github.com/JuliaLang/julia/pull/35387) and updating Julia is easy, I think trying 1.4.1 is a good first step.
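
Since the fix mentioned above first shipped in 1.4.1, a long-running render script can check the running version before it starts. A minimal sketch using only VERSION and version-number literals from Base; the constant and function names are hypothetical, and it assumes every later release also contains the fix.

```julia
# Per this thread, the GC rooting fix (JuliaLang/julia PR #35387) first
# shipped in Julia 1.4.1. Assumption: all later releases include it too.
const MIN_JULIA_WITH_FIX = v"1.4.1"

# Hypothetical guard: warns and returns false if `v` predates the fix.
function warn_if_predates_fix(v::VersionNumber = VERSION)
    if v < MIN_JULIA_WITH_FIX
        @warn "Julia $v predates $MIN_JULIA_WITH_FIX; the GC rooting fix is missing"
        return false
    end
    return true
end

warn_if_predates_fix()
```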

Thank you all for the suggestions. I will try --check-bounds=yes and also try 1.4.1.

Where is 1.4.1 available for download? The official download page on julialang.org only has 1.4.0, and the nightly builds sound like they’re not stable enough for real use. Is there an official location to get 1.4.1 for Linux?

You can just manually change 1.4.0 to 1.4.1 in the links on the Download Julia page; the packaged binaries should already be there. Alternatively, it’s really easy (at least on Linux) to build Julia from source yourself: it’s basically just git clone https://github.com/JuliaLang/julia, checking out the v1.4.1 tag, and then calling make.
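
The manual link edit amounts to a string substitution. A minimal sketch, assuming the official Linux x86_64 tarballs follow the URL layout shown in the comment (check it against the links on the download page before relying on it); linux_x64_tarball_url is a hypothetical helper.

```julia
# Assumed layout of the official Linux x86_64 release tarball URLs:
#   https://julialang-s3.julialang.org/bin/linux/x64/<major>.<minor>/julia-<version>-linux-x86_64.tar.gz
function linux_x64_tarball_url(v::VersionNumber)
    return "https://julialang-s3.julialang.org/bin/linux/x64/" *
           "$(v.major).$(v.minor)/julia-$(v)-linux-x86_64.tar.gz"
end

# The manual edit: swap the patch version in a 1.4.0 link.
old_url = linux_x64_tarball_url(v"1.4.0")
new_url = replace(old_url, "1.4.0" => "1.4.1")
println(new_url)
```

Under this layout the minor-version directory (1.4) is shared by every 1.4.x patch release, so only the file name changes.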

You can also try my suggestion just by starting Julia as julia --check-bounds=yes. It works on any version you might have and will catch a bounds violation when it happens. If the problem is a missing GC root, this won’t trigger, but it’s easy to try.
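
If --check-bounds=yes turns up nothing, it is worth confirming that the setting actually reached the running process, for example when Julia is launched from an editor or a wrapper script. A minimal sketch that reads it back; it relies on Base.JLOptions(), an internal and undocumented structure, so the field name and its 0/1/2 encoding are assumptions that may change between Julia versions.

```julia
# Reports how bounds checking was configured when this Julia process started.
# Assumed encoding of Base.JLOptions().check_bounds (internal, may change):
#   0 = default (@inbounds honored), 1 = --check-bounds=yes, 2 = --check-bounds=no
function check_bounds_mode()
    mode = Base.JLOptions().check_bounds
    if mode == 1
        return :yes
    elseif mode == 2
        return :no
    else
        return :default
    end
end

println("check-bounds: ", check_bounds_mode())
```

Started as julia --check-bounds=yes, this should report yes; default is what an ordinary session reports.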

I upgraded to 1.4.1 and that appears to have fixed the problem. No more segmentation faults in the garbage collector so far. Thanks everybody!

Excellent news! Sorry about the bug.

Whoops! Spoke too soon. That’s the problem with random errors: sometimes they don’t occur. I’ve run more tests and the bug still occurs. Back to the drawing board for me. I’ll see if the bounds check turns anything up.

I tried starting Julia with --check-bounds=yes and got the same segmentation fault in the garbage collector, with no bounds exceptions. Looks like this could be hard to track down.

I think debugging and reporting bugs like this will be much easier in Julia 1.5, thanks to Keno’s pull request “Add a command line flag to create an rr recording” (JuliaLang/julia PR #35494).

In 1.4.x I think you can still manually install https://github.com/JuliaLang/BugReporting.jl and do

] add https://github.com/JuliaLang/BugReporting.jl.git
using BugReporting
BugReporting.make_interactive_report("rr")

to start the bug reporting session. (Disclaimer: I’ve never tried it myself yet.)

Also, I think building Julia with ASAN (AddressSanitizer) helps a lot with reproducing memory bugs like this, but it’s a pain to set up at the moment (see https://github.com/JuliaLang/julia/issues/35341).
