Debugging memory corruption in Julia code

I have a complicated piece of Julia code, and for some inputs it crashes with every memory corruption error you can think of:

  • segmentation fault
  • munmap error
  • corrupted double linked list
  • malloc consolidate invalid chunk size
  • unaligned fastbin chunk detected

How do I even start debugging this? If this were C/C++, I could use Valgrind to track down the error.

An obvious first step on a segfault is to run the same code with julia started using --check-bounds=yes. This enables all bounds checks, even inside @inbounds blocks, so an errant indexing operation throws a BoundsError instead of silently going on to corrupt memory and segfault.


I agree that --check-bounds=yes is the first step: it helps identify invalid @inbounds annotations, which are a typical source of crashes.
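To make that concrete, here is a minimal sketch of the kind of bug this flag catches. The function name sum_first and the file name script.jl are made up for illustration; the point is only that an @inbounds loop with a wrong upper limit reads past the end of the array in a normal run, but throws a BoundsError when julia is started with --check-bounds=yes.

```julia
# Hypothetical example; run it as:  julia --check-bounds=yes script.jl

# Sums the first n elements of v. The @inbounds annotation is only valid
# when n <= length(v); nothing in the function enforces that.
function sum_first(v::Vector{Float64}, n::Int)
    s = 0.0
    @inbounds for i in 1:n
        s += v[i]   # out-of-bounds read when n > length(v)
    end
    return s
end

# Valid call: behaves the same with or without the flag.
println(sum_first([1.0, 2.0, 3.0], 3))

# Invalid call, left commented out on purpose. In a default run it reads
# past the end of the array (undefined behavior); with --check-bounds=yes
# it should throw a BoundsError pointing at the offending line.
# sum_first([1.0, 2.0, 3.0], 4)
```

If the crash turns into a BoundsError under the flag, the stacktrace tells you which @inbounds annotation was wrong.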

If your application is multi-threaded, I would also try launching julia with the -t 1 flag to force running everything on a single thread: if you don’t see a crash with -t 1 but do when using multiple threads, you may have a race condition in your code.
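As a rough illustration of what a race looks like, here is a sketch with two made-up functions, racy_count and atomic_count. The racy version only produces a wrong count rather than a crash, but the same pattern applied to something like push! on a shared Vector can, as far as I know, corrupt memory. Start julia with several threads (for example -t 4) to see a difference; with -t 1 both versions agree.

```julia
using Base.Threads

# Unsynchronized read-modify-write on shared state: with more than one
# thread, some increments can be lost.
function racy_count(n::Int)
    total = Ref(0)
    @threads for i in 1:n
        total[] += 1
    end
    return total[]
end

# Same loop with an atomic counter: every increment is applied exactly once.
function atomic_count(n::Int)
    total = Atomic{Int}(0)
    @threads for i in 1:n
        atomic_add!(total, 1)
    end
    return total[]
end

n = 100_000
println("threads: ", nthreads())
println("racy:    ", racy_count(n))    # may be less than n with several threads
println("atomic:  ", atomic_count(n))  # always n
```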

Apart from these two things, I would audit every unsafe_* call in your code (unsafe_load, unsafe_store!, unsafe_wrap, and so on) and replace it with a safe variant where you can, to see whether you get an error instead of a crash. Also check uses of pointer and pointer_from_objref, which are easy to misuse. If you do any ccall, check that those calls are safe as well (an out-of-bounds access in C code loaded into julia could plausibly cause this kind of corruption). And if none of that leads anywhere, look for the same functions in the code of the packages you are using.
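Here is a small sketch of what swapping an unsafe call for a safe one can look like. The helper names second_element_both_ways and safe_read_or_error are hypothetical; unsafe_load, pointer and GC.@preserve are the real Base functions. The unsafe read does no bounds check at all, while the plain indexing version raises a BoundsError you can actually debug.

```julia
# Reads the second element of v twice: once through a raw pointer and once
# through ordinary indexing. Assumes length(v) >= 2; the unsafe read does
# NOT verify that.
function second_element_both_ways(v::Vector{Int})
    # GC.@preserve keeps v alive while the raw pointer is in use.
    unsafe_value = GC.@preserve v unsafe_load(pointer(v), 2)
    safe_value = v[2]   # bounds-checked
    return unsafe_value, safe_value
end

# With the safe variant, a bad index becomes a catchable error instead of
# a read from arbitrary memory.
function safe_read_or_error(v::Vector{Int}, i::Int)
    try
        return v[i]
    catch err
        err isa BoundsError || rethrow()
        println("caught: ", err)
        return nothing
    end
end

v = [10, 20, 30]
println(second_element_both_ways(v))
println(safe_read_or_error(v, 4))
```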

If you do not have a stacktrace because the crash is too abrupt, a debug build of julia can sometimes help. You can also use Valgrind or gdb; the julia developer documentation covers both (see also Reporting and analyzing crashes (segfaults) · The Julia Language). As a last resort, you could build julia with a sanitizer enabled… But hopefully you won’t need to go that far.
