Segfault on struct creation; gc involved

Hello and thank you for your time.
I have a problem with random segfaults on struct initialization (example attached below). Every stack trace leads to GC code, and every stack trace goes through struct object initialization.
I tried to run the code on Julia versions 1.0.5, 1.1.1, 1.2, 1.3, 1.3.1, and 1.4. Segfaults are present in each version. The same happens in Atom, Jupyter, VS Code, and the terminal.
Since segfaults occur in every version/IDE, I suspect that it is an error in my code, but I can’t find it because I have little familiarity with C/C++.

I am attaching the output of versioninfo(), an example stack trace, and the struct definition. Please let me know if you want to see any other information.
Example stack trace:

signal (11): Segmentation fault: 11
in expression starting at /Users/bohdant/PycharmProjects/LandingJSONParser/run.jl:27
gc_try_setmark at /Users/julia/buildbot/worker/package_macos64/build/src/gc.c:1554 [inlined]
gc_mark_scan_obj8 at /Users/julia/buildbot/worker/package_macos64/build/src/gc.c:1704 [inlined]
gc_mark_loop at /Users/julia/buildbot/worker/package_macos64/build/src/gc.c:1971
_jl_gc_collect at /Users/julia/buildbot/worker/package_macos64/build/src/gc.c:2703
jl_gc_collect at /Users/julia/buildbot/worker/package_macos64/build/src/gc.c:2903
maybe_collect at /Users/julia/buildbot/worker/package_macos64/build/src/gc.c:781 [inlined]
jl_gc_pool_alloc at /Users/julia/buildbot/worker/package_macos64/build/src/gc.c:1096
jl_gc_alloc at /Users/julia/buildbot/worker/package_macos64/build/src/./julia_internal.h:233
jl_alloc_svec_uninit at /Users/julia/buildbot/worker/package_macos64/build/src/simplevector.c:60
jl_alloc_svec at /Users/julia/buildbot/worker/package_macos64/build/src/simplevector.c:69
save_env at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:148
forall_exists_subtype at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1434
forall_exists_equal at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1379
subtype at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1323
with_tvar at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:701
subtype_unionall at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:836
exists_subtype at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1412 [inlined]
forall_exists_subtype at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1440
jl_subtype_env at /Users/julia/buildbot/worker/package_macos64/build/src/subtype.c:1805
jl_new_structv at /Users/julia/buildbot/worker/package_macos64/build/src/datatype.c:863
GenericWidget at /Users/bohdant/PycharmProjects/LandingJSONParser/src/widget.jl:52
unknown function (ip: 0x132367b28)
GenericWidget at /Users/bohdant/PycharmProjects/LandingJSONParser/src/widget.jl:68
parse_children! at /Users/bohdant/PycharmProjects/LandingJSONParser/src/parser.jl:42
parse_children! at /Users/bohdant/PycharmProjects/LandingJSONParser/src/parser.jl:45
parse_children! at /Users/bohdant/PycharmProjects/LandingJSONParser/src/parser.jl:45
parse_document at /Users/bohdant/PycharmProjects/LandingJSONParser/src/parser.jl:35
parse_hashes at /Users/bohdant/PycharmProjects/LandingJSONParser/src/parser.jl:17
jl_apply at /Users/julia/buildbot/worker/package_macos64/build/src/./julia.h:1631 [inlined]
do_call at /Users/julia/buildbot/worker/package_macos64/build/src/interpreter.c:328
eval_body at /Users/julia/buildbot/worker/package_macos64/build/src/interpreter.c:0
jl_interpret_toplevel_thunk_callback at /Users/julia/buildbot/worker/package_macos64/build/src/interpreter.c:888
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x12ca0650f)
unknown function (ip: 0x6)
jl_interpret_toplevel_thunk at /Users/julia/buildbot/worker/package_macos64/build/src/interpreter.c:897
jl_toplevel_eval_flex at /Users/julia/buildbot/worker/package_macos64/build/src/toplevel.c:814
jl_parse_eval_all at /Users/julia/buildbot/worker/package_macos64/build/src/ast.c:873
include_string at ./loading.jl:1075
include_string at /Users/bohdant/.julia/packages/CodeTools/kosGY/src/eval.jl:30
unknown function (ip: 0x11a55e5b2)
#188 at /Users/bohdant/.julia/packages/Atom/Kxhul/src/eval.jl:106
withpath at /Users/bohdant/.julia/packages/CodeTools/kosGY/src/utils.jl:30
withpath at /Users/bohdant/.julia/packages/Atom/Kxhul/src/eval.jl:9
#187 at /Users/bohdant/.julia/packages/Atom/Kxhul/src/eval.jl:105 [inlined]
with_logstate at ./logging.jl:395
with_logger at ./logging.jl:491 [inlined]
#186 at /Users/bohdant/.julia/packages/Atom/Kxhul/src/eval.jl:104 [inlined]
hideprompt at /Users/bohdant/.julia/packages/Atom/Kxhul/src/repl.jl:140
macro expansion at /Users/bohdant/.julia/packages/Atom/Kxhul/src/eval.jl:103 [inlined]
macro expansion at /Users/bohdant/.julia/packages/Media/ItEPc/src/dynamic.jl:24 [inlined]
eval at /Users/bohdant/.julia/packages/Atom/Kxhul/src/eval.jl:99
unknown function (ip: 0x11a55412a)
jl_apply at /Users/julia/buildbot/worker/package_macos64/build/src/./julia.h:1631 [inlined]
jl_f__apply at /Users/julia/buildbot/worker/package_macos64/build/src/builtins.c:627
macro expansion at /Users/bohdant/.julia/packages/Atom/Kxhul/src/eval.jl:31 [inlined]
#172 at ./task.jl:333
unknown function (ip: 0x11a4a64dc)
jl_apply at /Users/julia/buildbot/worker/package_macos64/build/src/./julia.h:1631 [inlined]
start_task at /Users/julia/buildbot/worker/package_macos64/build/src/task.c:659
Allocations: 185042629 (Pool: 185014644; Big: 27985); GC: 304

Result of versioninfo():

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4

Struct definition (with related code):

abstract type AbstractWidget end
mutable struct GenericWidget{_widget_type} <: AbstractWidget
    source::Dict{}
    widget_type::AbstractString
    uid::AbstractString
    parent::Ref # => Ref{AbstractWidget}
    children::Array{Ref, 1} # => Vector{Ref{AbstractWidget}}
    styles::Dict{}
    _value::AbstractString
    position::NamedTuple{(:top, :left), <: Union{Tuple{Int, Int}, Tuple{Nothing, Nothing}}}
    size::NamedTuple{(:height, :width), <: Union{Tuple{Int, Int}, Tuple{Nothing, Nothing}}}
    z_index::Union{Int, Nothing}
    visible_percent::Float64
    callback::Union{Nothing, Function}

    function GenericWidget(child_source, widget_type, uid, parent_ref, children, styles, _value, position, size, z_index, visibility, callback)
        new{Symbol(widget_type)}(child_source, widget_type, uid, parent_ref, children, styles, _value, position, size, z_index, visibility, callback)
    end
end
function GenericWidget(child_source::Dict, parent::AbstractWidget)
    widget_type = child_source["type"]
    uid = child_source["_attributes"]["_uid"]
    parent_ref = Ref(parent)
    children = [] # fills later
    styles = @mock parse_styles(child_source, parent) # check Mocking.jl for @mock
    _value = get(child_source, "_value", nothing) === nothing ? "" : child_source["_value"]
    position = (top=nothing, left=nothing)
    size = (height=nothing, width=nothing)
    z_index = nothing
    visibility = 0.
    callback = nothing

    GenericWidget(child_source, widget_type, uid, parent_ref, children, styles, _value, position, size, z_index, visibility, callback)
end

‘widget_type’ is a string, used for dispatch.
Also, I tried disabling inlining on struct initialization, and separately removing @mock; neither had any effect.

As noted above, the segfaults occur randomly and show no correlation with the data.
Thank you all for your time and have a good day.

Unfortunately, this part means nothing. It most likely means that you have some other incorrect use of unsafe code and the GC is simply the first one that caught it.

If you have a debugger/rr, there are a few ways to debug it, but none of them is simple enough to be taught through a few posts on a forum… It is much like most debugging: breaking on the crash and inspecting local variables is easy, but going from there is hard…

Sorry, I did not write that every stack trace also goes through struct object initialization, just like in the example stack trace.
Thank you. I added it to the description.

there are a few ways to debug it but none of them is simple enough to be taught through a few posts on a forum…

Is there something I can start from?

(breaking on the crash and inspect local variables

Julia itself crashes on the segfault, at least on that one. As I understand it, that would mean debugging in C++.

Still the same. Neither the fact that you always crash in the GC nor the fact that you always crash in this constructor says much, if anything, about what causes the crash. At most they tell you something about your code (i.e. this is likely the first place where you have a lot of allocations after the invalid code).

Sorry, I’m not really sure I understand what you want to say here. Are you just saying that Julia segfaults in C++ (it’s actually C, by the way)? That is right, although it still gives little information about where the issue is.

I don’t really recommend going that way. Instead, I would recommend going through all the unsafe code you write or use (from packages), if you can. You can reduce your code, add a few explicit GC calls near the crash site, etc., to shrink the Julia code that reproduces the issue. The way I was talking about, and the way I would go about debugging it, is basically to debug the Julia internals. The place to start is the full devdocs (in particular the parts on memory allocation and the GC). I can guarantee you there will be a lot of useful things to learn in that process, but I don’t think it will be the most efficient way to debug this…
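As a rough illustration of the "explicit GC calls near the crash site" idea, here is a minimal sketch. The helper name `with_gc_check` and the stand-in workload are hypothetical and not taken from the code in this thread; the only assumption about Julia itself is that `GC.gc()` forces a collection.

```julia
# Hypothetical helper: run `f`, then force a garbage collection right away.
# If an earlier step has already corrupted the heap, the crash may surface
# here, closer to the offending code, instead of at some later allocation.
function with_gc_check(f, label)
    result = f()
    GC.gc()
    println("GC ok after: ", label)
    return result
end

# Stand-in workload; in real code this would wrap one suspect step at a time.
nodes = with_gc_check("allocate nodes") do
    [Dict("id" => i) for i in 1:1000]
end
```

Wrapping successively smaller steps this way is one option for narrowing down which call leaves the heap in a bad state, at the cost of a much slower run.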

Sorry, I don’t have a “silver bullet” to fix this issue. Just my experiences.

I was getting random crashes when calling an external library (GTK), and the only way I figured things out was to read the documentation for the library very carefully to make sure I was calling methods in the correct order with the correct data.

If it appeared I was calling everything correctly, then I removed things until the problem went away and added them back one at a time until the crash happened again. That would give me an idea of what I was doing wrong.

It’s a slow and inexact way of tracking down the issue, but it might be your only solution…

Thank you for your responses @yuyichao, @pixel27.

I have now inserted a few explicit GC calls and am waiting for testing to finish. I will post the results in a day or two (with explicit GC calls, the data that took 4 h to process now requires a few days on my local machine =)).

When running the code with explicit GC calls, I had no segfaults. I have no idea why they disappeared.
Maybe I will find something using sanitizers.