How can I find out why julia crashed?

Hi! I’m working with Julia in VS Code. A couple of times a day it crashes. By crashing I mean the Julia REPL closes and I’m dropped into the terminal. Sometimes I get a short glimpse of some error printout from the Linux kernel before it vanishes, and I don’t know how to restore it. Most often I crash Julia through: bugs in Makie that call some non-existent library function, pressing Ctrl-C too many times in a row while some big data is loading, accidentally filling RAM, using Distributed in ways no developer ever anticipated, pressing Ctrl-C while threaded code executes, … My feeling is that Julia should throw errors, but not crash. So how can I find out the reasons why? Do I have to use a debug build? Or how can I make the kernel error messages stay visible a little longer, so that I at least know whether it was an OOM or a segfault? Your advice is highly welcome! In return I can offer advice on how to crash Julia :wink:

If it’s an OOM kill, I think it should be logged in dmesg.

Your question is not focused enough for a good answer. Try opening a new thread with a specific example.

As @jar1 implies, run dmesg. You may need to run it as root.

I like the -x and -w options for dmesg: -x decodes the facility and level of each message, while -w makes dmesg keep printing new messages as they arrive, instead of exiting after printing the existing ones.
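A minimal sketch of using that to tell an OOM kill from a segfault, assuming bash and the usual Linux kernel wording ("Out of memory: Killed process PID (julia)" for the OOM killer, "julia[PID]: segfault at ..." for a segfault). The helper name classify_julia_crash is hypothetical, and the exact message text can differ between kernel versions, so treat the patterns as a starting point.

```shell
# Hypothetical helper: read kernel log lines on stdin and report whether the
# most recent julia-related message was an OOM kill or a segfault.
# The patterns assume the usual Linux kernel wording; adjust for your kernel.
classify_julia_crash() {
  local line verdict="unknown"
  while IFS= read -r line; do
    case "$line" in
      *"Out of memory"*julia*|*"out of memory"*julia*|*oom-kill*task=julia*)
        verdict="oom" ;;
      *julia*"segfault at"*)
        verdict="segfault" ;;
    esac
  done
  echo "$verdict"   # the last matching line wins
}

# Usage (dmesg may need root):
#   sudo dmesg -T | classify_julia_crash
# Or watch decoded messages live while reproducing the crash:
#   sudo dmesg -xw
```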

Yeah, when that happens, there’s a bug, either a Julia bug, a Julia VS Code extension bug, or a package bug, etc.

I’m not necessarily interested in exact solutions to specific problems, but more in a strategy/workflow that enables me to pinpoint the root cause within Julia. The dmesg hint was already good: I found a Julia segfault right away. But how do I know where it came from?

A segfault may, depending on your system setup, result in a core dump. These are sometimes available via coredumpctl. That said, it’s possibly not the best way to debug a Julia crash.
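If systemd-coredump is collecting dumps on your machine, something like the following sketch lists julia’s core dumps and shows the newest one. It assumes bash and systemd’s coredumpctl; the helper name show_last_julia_core is hypothetical, and whether a core is stored at all depends on your distribution’s settings.

```shell
# Hypothetical helper: list recorded julia core dumps and show details of the
# newest one (signal, timestamp, and a backtrace when one could be generated).
show_last_julia_core() {
  if ! command -v coredumpctl >/dev/null 2>&1; then
    echo "coredumpctl not found: core dumps may be disabled or handled by another tool" >&2
    return 1
  fi
  coredumpctl --no-pager list julia   # match on the process name
  coredumpctl --no-pager info julia   # details of the newest matching dump
}

# To open the newest julia core dump in gdb for a native backtrace:
#   coredumpctl debug julia
```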

There are various things to suggest when debugging Julia:

This is really a bag-of-tricks kind of situation. So it’s best to learn by solving specific issues.

This is attempting to get to a specific example — when Julia crashes in VS Code the window closes so quickly you don’t see the error message. If you can’t see the error message, you can’t tell why it crashed. There are a few cases (like OOM) where Julia doesn’t print out much detail, but most crashes do show an exception and stacktrace.

I don’t know if it’s possible to preserve the terminal window in VS Code after the process closes, but that’s what I see as the core ask here. It’s something I’d like myself, in fact!

If so (not sure), this should be moved to the Tooling → VS Code category. But I think that was just an example in the OP, so it may be best to open a new post about that issue.

The very first thing I would do is try to figure out the minimal amount of code required to produce the crash and be specific.

If loading package X.jl and running function foo causes the crash, is it specific to a particular version of package X or can I narrow it down to a smaller subfunction?

Once we have a reproducer, we can try to reproduce the problem under a debugger such as gdb, or obtain an rr trace.

I’m now using a standalone REPL, because there I get a minimal error printout after the crash. So this topic could indeed be moved to Tooling → VS Code: keeping the terminal window open after a crash might be a change that could be implemented there.

Yes, and having the stacktrace after the crash (if it prints) helps with that. However, Julia often crashes seemingly at random half an hour into a session, for example when re-executing a cell a little later, or when typing ?Legend for the Makie docs: things that usually work, but once in a while don’t. Getting a reproducer for those is hard.

Install tmux and use persistent session mode as described in Remote Development · Julia in VS Code (julia-vscode.org). You’ll then be able to see the crash messages.

You are likely also hitting these known issues:

ctrl+c presses crash repl · Issue #3676 · julia-vscode/julia-vscode (github.com)

Long running tasks crash Julia session · Issue #3674 · julia-vscode/julia-vscode (github.com)

OK, it just crashed again while evaluating a cell that had worked flawlessly 10+ times. Thanks to the terminal REPL, I got this printout:

[1910653] signal (11.1): Segmentation fault
in expression starting at /home/max/dr/extract_sentinel_pixels/plot_recipies/plot_averaged_timeseries.jl:12
gc_mark_outrefs at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2517 [inlined]
gc_mark_and_steal at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2746
gc_mark_loop_parallel at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2885
jl_gc_mark_threadfun at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/partr.c:142
start_thread at /lib64/libpthread.so.0 (unknown line)
clone at /lib64/libc.so.6 (unknown line)
Allocations: 506203251 (Pool: 506080440; Big: 122811); GC: 194
Segmentation fault (core dumped)

I assume this alone is not enough information to locate the exact error source. Would an assert build help here? And @nsajko, can you explain what that is? How much does it impact performance?

My impression is that the number of segfaults increased when going from 1.10 to 1.10.4, but since I also updated lots of packages at the same time, that’s probably not a clear indication.

This seems to be a bug in Julia’s garbage collector (GC). Try reporting a bug.

Don’t know. Try it if time allows.

It’s a build configuration of Julia, enabling some assertions. When the Julia implementation detects certain unexpected states, it errors with a debug message:

https://docs.julialang.org/en/v1.12-dev/devdocs/build/build/

I sometimes build an assert build locally using make FORCE_ASSERTIONS=1 LLVM_ASSERTIONS=1.
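For reference, a local assert build along those lines might be scripted roughly as below. This is a sketch assuming bash, git, and Julia’s usual source-build prerequisites; the helper names run and build_assert_julia are hypothetical, the optional second argument checks out a tag such as v1.10.4 so you test the same version that crashed, and DRY_RUN=1 only prints the commands. Expect the build itself to take a while.

```shell
# Print commands instead of running them when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: build an assertion-enabled Julia from source.
#   $1: target directory (default: julia-assert)
#   $2: optional git tag or branch, e.g. v1.10.4, to match the crashing version
build_assert_julia() {
  local dir="${1:-julia-assert}" ref="${2:-}"
  run git clone https://github.com/JuliaLang/julia.git "$dir" || return 1
  if [ -n "$ref" ]; then
    run git -C "$dir" checkout "$ref" || return 1
  fi
  run make -C "$dir" -j"$(nproc)" FORCE_ASSERTIONS=1 LLVM_ASSERTIONS=1
}

# Afterwards, run the crashing code with the new binary:
#   ./julia-assert/julia my_script.jl
```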

Assert builds for recent Julia commits are also available from GitHub:

  1. go to the Git history listing all commits
  2. select the checks for a commit (beneath the commit title)
  3. select “details” for “build”
  4. select “build YOUR_PLATFORM”
  5. select “artifacts” and download the archive
  6. unpack it and run julia from where you unpacked it

NB: Julia developers seem to really like having an rr trace for a crash: Reporting and analyzing crashes (segfaults) · The Julia Language. Not sure if that’s a practical option for you, but the procedure is:

  1. start julia with julia --bug-report=rr, while connected to the network, on Linux
  2. cause a crash
  3. wait for the trace to upload
  4. post a bug report with the link
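A sketch of that procedure as a bash helper with a pre-flight check. It assumes julia is on your PATH (or that JULIA points at another binary, e.g. an assert build) and, as rr’s own documentation states, that recording needs kernel.perf_event_paranoid to be at most 1. The helper names and the PERF_PARANOID_FILE override (only there so the check can be tested) are hypothetical.

```shell
JULIA="${JULIA:-julia}"   # or point it at an assert build

# Hypothetical pre-flight check: rr records only on Linux and needs
# kernel.perf_event_paranoid <= 1.
rr_preflight() {
  local paranoid_file="${PERF_PARANOID_FILE:-/proc/sys/kernel/perf_event_paranoid}"
  local level
  if [ "$(uname -s)" != "Linux" ]; then
    echo "rr only works on Linux" >&2
    return 1
  fi
  level="$(cat "$paranoid_file")" || return 1
  if [ "$level" -gt 1 ]; then
    echo "perf_event_paranoid is $level; rr needs <= 1 (try: sudo sysctl -w kernel.perf_event_paranoid=1)" >&2
    return 1
  fi
}

# Start Julia under rr; after the session ends, Julia uploads the trace and
# prints a link to include in the bug report.
julia_rr() {
  rr_preflight && "$JULIA" --bug-report=rr "$@"
}

# Usage: julia_rr my_script.jl   (or just julia_rr for an interactive REPL)
```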

Not necessarily — pretty much any time you inadvertently corrupt Julia’s internals, it’ll end up tripping up the GC.

The two most common ways to corrupt internals are bad @inbounds annotations or bad @ccalls (or perhaps using pointer without GC.@preserve). I wouldn’t expect assertions to catch this, but starting Julia with --check-bounds=yes is a great start.
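As a sketch of that advice (bash; the helper name julia_checked is hypothetical, and JULIA can point at any julia binary): run the crashing code with bounds checking forced on, so an out-of-range access inside an @inbounds block throws a BoundsError with a stack trace instead of silently corrupting memory. The inline Julia snippet in the comments is a deliberately buggy function made up for illustration.

```shell
JULIA="${JULIA:-julia}"

# Hypothetical helper: emit bounds checks everywhere, ignoring @inbounds.
julia_checked() {
  "$JULIA" --check-bounds=yes "$@"
}

# Usage: julia_checked my_script.jl
# Deliberately buggy illustration: v has 3 elements, but index 4 is read
# under @inbounds. With --check-bounds=yes this throws a BoundsError;
# without it, the read is undefined behavior.
#   julia_checked -e 'f(v) = @inbounds v[4]; f([1, 2, 3])'
```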

I must have overlooked the --check-bounds=yes and --bug-report=rr suggestions; I will try them out. I’ve also now had time to look into Keno’s blog post. Thanks again for the replies!
