Bindings to external lib work with 0.5, but fail with 0.6-rc1


#1

I’m trying to update my LCIO.jl package (which uses CxxWrap.jl) to julia-0.6, but I can’t get it to work.
A simple

Pkg.add("LCIO")
Pkg.test("LCIO")

passes in julia-0.5, but fails in julia-0.6-rc1.
I can’t figure out what’s going wrong. The error message is

julia(39259,0x7fff974a63c0) malloc: *** error for object 0x7fa145011450: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

signal (6): Abort trap: 6
while loading no file, in expression starting on line 0
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 1634105 (Pool: 1632785; Big: 1320); GC: 1
==================================================[ ERROR: LCIO ]==================================================

failed process: Process(`/Applications/Julia-0.6.app/Contents/Resources/julia/bin/julia -Ccore2 -J/Applications/Julia-0.6.app/Contents/Resources/julia/lib/julia/sys.dylib --compile=yes --depwarn=yes --check-bounds=yes --code-coverage=none --color=yes --compilecache=yes /Users/stru821/.julia/v0.6/LCIO/test/runtests.jl`, ProcessSignaled(6)) [0]

===================================================================================================================
ERROR: LCIO had test errors

I’ve narrowed this down to https://github.com/jstrube/LCIO.jl/blob/master/test/runtests.jl#L81 and https://github.com/jstrube/LCIO.jl/blob/master/test/runtests.jl#L93. Commenting out those two lines seems to work, but I can’t find a fault in them (and neither can julia-0.5, apparently). If I reshuffle a couple of lines to change the order of initialization on the C++ side, the malloc error message disappears, but the test still fails with

==================================================[ ERROR: LCIO ]==================================================

failed process: Process(`/Applications/Julia-0.6.app/Contents/Resources/julia/bin/julia -Ccore2 -J/Applications/Julia-0.6.app/Contents/Resources/julia/lib/julia/sys.dylib --compile=yes --depwarn=yes --check-bounds=yes --code-coverage=none --color=yes --compilecache=yes /Users/stru821/.julia/v0.6/LCIO/test/runtests.jl`, ProcessSignaled(11)) [0]

===================================================================================================================

I’m a bit at a loss how to debug this. Can anybody think of something obvious that changed in julia-0.6 that could cause this? Any suggestions for debugging this further are greatly appreciated.


#2

Running in gdb shows the crash happens when the finalizers are run:

Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0x00007fffc7479243 in IMPL::LCCollectionVec::~LCCollectionVec (this=0x28bc310, __in_chrg=<optimized out>)
    at /home/username/.julia/v0.6/LCIO/deps/builds/lciowrap/LCIO-02-07-04/src/cpp/src/IMPL/LCCollectionVec.cc:64
64            delete *iter++ ;
(gdb) bt
#0  0x00007fffc7479243 in IMPL::LCCollectionVec::~LCCollectionVec (this=0x28bc310, __in_chrg=<optimized out>)
    at /home/username/.julia/v0.6/LCIO/deps/builds/lciowrap/LCIO-02-07-04/src/cpp/src/IMPL/LCCollectionVec.cc:64
#1  0x00007fffc74792f9 in IMPL::LCCollectionVec::~LCCollectionVec (this=0x28bc310, __in_chrg=<optimized out>)
    at /home/username/.julia/v0.6/LCIO/deps/builds/lciowrap/LCIO-02-07-04/src/cpp/src/IMPL/LCCollectionVec.cc:67
#2  0x00007fffc7479d04 in IMPL::LCEventImpl::~LCEventImpl (this=0x27f4840, __in_chrg=<optimized out>)
    at /home/username/.julia/v0.6/LCIO/deps/builds/lciowrap/LCIO-02-07-04/src/cpp/src/IMPL/LCEventImpl.cc:53
#3  0x00007fffc7479d49 in IMPL::LCEventImpl::~LCEventImpl (this=0x27f4840, __in_chrg=<optimized out>)
    at /home/username/.julia/v0.6/LCIO/deps/builds/lciowrap/LCIO-02-07-04/src/cpp/src/IMPL/LCEventImpl.cc:56
#4  0x00007fffc78135a2 in void cxx_wrap::detail::finalizer<IMPL::LCEventImpl>(_jl_value_t*) () from /home/username/.julia/v0.6/LCIO/deps/usr/lib/liblciowrap.so
#5  0x00007ffff77751c6 in schedule_all_finalizers (flist=0x7ffff7fb3b68, flist=0x7ffff7fb3b68) at /home/username/src/julia/julia/src/gc.c:263
#6  jl_gc_run_all_finalizers (ptls=ptls@entry=0x7ffff7fb31f8) at /home/username/src/julia/julia/src/gc.c:273
#7  0x00007ffff774678b in jl_atexit_hook (exitcode=exitcode@entry=0) at /home/username/src/julia/julia/src/init.c:265
#8  0x00000000004014fc in main (argc=<optimized out>, argv=<optimized out>) at /home/username/src/julia/julia/ui/repl.c:265

This shows up on 0.6 because 0.5 had a bug where C finalizers were not run on exit. That bug is fixed now, so the destructor in the backtrace above actually gets called when julia exits.


#3

(I’m sure I sent other replies, but they seem to have gotten lost???)

How did you manage to get the full stack trace? I’ve compiled LCIO with debug symbols, but I had no luck getting a full stack trace. I do see the same thing, though: the segfault happens in the destructor. Thank you for the pointer.


#4

I just started julia under gdb (gdb julia) and then ran runtests.jl from the test directory. I’m a bit surprised I got this much info, since I didn’t even make any effort to do a debug build.


#5

OK, thanks. That’s what I did as well… will keep trying to pin this down. Anyway, you’re giving me a good starting point.
Thank you.


#6

The best I can say right now is that this seems to be a pointer ownership problem. If I add the objects to a collection, the crash occurs. If I just construct them in isolation, there is no crash. This might be more appropriately discussed in the CxxWrap github tracker, so I’ll try to play with this a bit more and then open an issue there when I have more information.
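
To make the suspected pattern concrete, here is a minimal C++ sketch of how I picture it. It is only an illustration under my own assumptions: Hit, OwningCollection and Wrapped are made-up stand-ins, not the real LCIO or CxxWrap types. The collection deletes its elements in its destructor (like the delete *iter++ loop in the backtrace), and the wrapper deletes its object in what would be the finalizer. If both believe they own the same pointer, it gets deleted twice. The sketch avoids that by handing ownership over explicitly when the object is added.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the real LCIO or CxxWrap types.
struct Hit {
    static int live;     // number of Hit objects currently alive
    static int deleted;  // total number of Hit destructor calls so far
    Hit() { ++live; }
    ~Hit() { --live; ++deleted; }
};
int Hit::live = 0;
int Hit::deleted = 0;

// A collection that owns its elements and deletes them in its destructor.
class OwningCollection {
public:
    OwningCollection() = default;
    OwningCollection(const OwningCollection&) = delete;
    OwningCollection& operator=(const OwningCollection&) = delete;
    ~OwningCollection() {
        for (Hit* h : elements_) delete h;
    }
    void add(Hit* h) { elements_.push_back(h); }
private:
    std::vector<Hit*> elements_;
};

// Mimics a wrapped object whose finalizer deletes the C++ object.
// release() gives up ownership, so the "finalizer" becomes a no-op.
class Wrapped {
public:
    explicit Wrapped(Hit* h) : ptr_(h), owns_(true) {}
    Wrapped(const Wrapped&) = delete;
    Wrapped& operator=(const Wrapped&) = delete;
    ~Wrapped() { if (owns_) delete ptr_; }  // plays the role of the finalizer
    Hit* get() const { return ptr_; }
    Hit* release() { owns_ = false; return ptr_; }
private:
    Hit* ptr_;
    bool owns_;
};

// Returns how many Hit destructor calls one scenario causes.
int destructor_calls(bool add_to_collection) {
    const int before = Hit::deleted;
    {
        Wrapped w(new Hit);
        OwningCollection col;
        if (add_to_collection) {
            // Hand ownership to the collection. Passing w.get() here instead
            // would leave two owners and a double delete at scope exit
            // (undefined behavior, typically a malloc error or segfault).
            col.add(w.release());
        }
    }  // col is destroyed first, then w
    return Hit::deleted - before;
}
```

With the explicit release(), both destructor_calls(false) and destructor_calls(true) report exactly one destructor call per object. Whether the real fix belongs in the wrapper or in the C++ library depends on which side is supposed to own the elements.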


#7

Looks like the default build type is RelWithDebInfo.
This is definitely a pointer ownership problem, but I would attribute this to the C++ library, not julia or CxxWrap.
Thanks for your help. Annoying, but I’ll hack around it.