Debugging segfault when calling versioninfo()

I’m having an issue where Julia (1.10.x, both from the official tarball and from juliaup, probably the same binary) segfaults when I run a simple versioninfo(). This is on a compute node of an HPC system, so there might be weird library and/or filesystem things going on, or even something related to supported CPU instruction sets and/or features.

snellius paulm@gcn29 08:47 ~$ which julia
/sw/arch/RHEL8/EB_production/2023/software/juliaup/1.14.5-GCCcore-12.3.0/bin/julia
snellius paulm@gcn29 08:47 ~$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.1 (2024-02-13)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo()
Segmentation fault

On a different node type of the same HPC system (AMD instead of Intel CPU, etc.), things don’t crash:

julia> versioninfo()
Julia Version 1.10.1
Commit 7790d6f0641 (2024-02-13 20:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 96 × AMD EPYC 7F72 24-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
Threads: 1 default, 0 interactive, 1 GC (on 96 virtual cores)
Environment:
  LD_LIBRARY_PATH = /sw/arch/RHEL8/EB_production/2023/software/GCCcore/12.3.0/lib64

I had seen this before and thought it was fixed by updating to the latest Julia version, but alas. What would be a good strategy to figure out what is going wrong here? Are there any relevant debug or trace flags I can set?

Are the libraries loaded through LD_LIBRARY_PATH compatible with what julia expects? What happens if you invoke julia like LD_LIBRARY_PATH="" julia?
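If you want to repeat that comparison on several node types, a small helper can run the same command once with the current environment and once with LD_LIBRARY_PATH unset, and print both exit statuses. This is a minimal sketch assuming a POSIX shell whose `env` supports `-u` (GNU coreutils and BusyBox do); the name `compare_ld_path` is made up for illustration. A process killed by SIGSEGV normally shows up in the shell as status 139 (128 + signal 11).

```shell
#!/bin/sh
# Hypothetical helper: run a command twice, once with the current
# environment and once with LD_LIBRARY_PATH removed, then print the
# exit status of each run. The command's own output is discarded.
compare_ld_path() {
    "$@" >/dev/null 2>&1
    with_status=$?
    env -u LD_LIBRARY_PATH "$@" >/dev/null 2>&1
    without_status=$?
    echo "with LD_LIBRARY_PATH: $with_status"
    echo "without LD_LIBRARY_PATH: $without_status"
}

# Usage (assumes julia is on PATH):
#   compare_ld_path julia -e 'versioninfo()'
```

With juliaup the `julia` on PATH is a launcher, so this only reports the crash if the launcher passes the child's exit status through, which I haven't checked.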

Well, you’re onto something…

snellius paulm@gcn29 09:33 ~$ LD_LIBRARY_PATH= julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.1 (2024-02-13)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo()
Julia Version 1.10.1
Commit 7790d6f0641 (2024-02-13 20:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 72 × Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, icelake-server)
Threads: 1 default, 0 interactive, 1 GC (on 72 virtual cores)
Environment:
  LD_LIBRARY_PATH = 

Edit: initially tested on the wrong node type, updated now

Still interesting that there seems to be a node-dependent effect, though. I’ll do some more testing.

Found the apparent culprit: a libunwind module that gets pulled in as a dependency of some other software:

snellius paulm@gcn29 09:43 /gpfs/work4/1/viz/paulm/stevens-rb/Ra1e11$ m list

Currently Loaded Modules:
  1) 2023   2) GCCcore/12.3.0   3) juliaup/1.14.5-GCCcore-12.3.0   4) libunwind/1.6.2-GCCcore-12.3.0


snellius paulm@gcn29 09:43 /gpfs/work4/1/viz/paulm/stevens-rb/Ra1e11$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.1 (2024-02-13)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo()
Segmentation fault

snellius paulm@gcn29 09:43 /gpfs/work4/1/viz/paulm/stevens-rb/Ra1e11$ m unload libunwind/1.6.2-GCCcore-12.3.0
snellius paulm@gcn29 09:43 /gpfs/work4/1/viz/paulm/stevens-rb/Ra1e11$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.1 (2024-02-13)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo()
Julia Version 1.10.1
Commit 7790d6f0641 (2024-02-13 20:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 72 × Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, icelake-server)
Threads: 1 default, 0 interactive, 1 GC (on 72 virtual cores)
Environment:
  LD_LIBRARY_PATH = /sw/arch/RHEL8/EB_production/2023/software/GCCcore/12.3.0/lib64
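To check which entries of LD_LIBRARY_PATH actually ship a libunwind (for example after loading a module like the one above), you can scan each directory for a matching file. A minimal sketch, assuming a POSIX shell; `dirs_providing` is a hypothetical name, and the prefix `libunwind.so` is an assumption about how the library files are named.

```shell
#!/bin/sh
# Hypothetical helper: print every directory in a colon-separated search
# path (default: $LD_LIBRARY_PATH) that contains at least one file whose
# name starts with the given prefix, e.g. "libunwind.so".
dirs_providing() {
    prefix=$1
    search_path=${2-$LD_LIBRARY_PATH}
    # Split on ':' into one entry per line, so entries are never glob-expanded.
    printf '%s\n' "$search_path" | tr ':' '\n' | while IFS= read -r dir; do
        [ -n "$dir" ] || continue          # skip empty entries
        for f in "$dir/$prefix"*; do
            # An unmatched glob stays literal, hence the existence check.
            if [ -e "$f" ]; then
                printf '%s\n' "$dir"
                break
            fi
        done
    done
}

# Usage:
#   dirs_providing libunwind.so
```

With the libunwind module loaded I would expect this to print that module's library directory, and nothing once it is unloaded, though I haven't verified the exact paths on this system.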

I don’t know why the issue shows up in versioninfo() in particular, but it makes sense that a wrong libunwind causes this kind of crash: libunwind is very fundamental to how Julia handles errors (e.g. collecting backtraces), so presumably anything that makes use of it could break.
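Since the REPL itself starts fine and only crashes later, one way to confirm which libunwind the running process actually picked up is to look at its memory mappings in /proc/<pid>/maps (Linux only). A minimal sketch, assuming libunwind is already loaded at startup; `mapped_files` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct file paths in a Linux
# /proc/<pid>/maps file whose path contains the given substring.
# Field 6 of each line is the mapped path; anonymous mappings have no
# field 6. Paths containing spaces would be cut off at the first space.
mapped_files() {
    maps_file=$1
    pattern=$2
    awk -v pat="$pattern" '$6 != "" && index($6, pat) > 0 { print $6 }' "$maps_file" | sort -u
}

# Usage, with <pid> taken from getpid() in the Julia REPL:
#   mapped_files /proc/<pid>/maps libunwind
```

Run from a second shell on the same node, this should show whether the module's libunwind or the copy shipped with Julia ended up in the process.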

It’s not just in versioninfo(), btw:

snellius paulm@gcn29 11:14 /projects/1/viz/paulm/stevens-rb/Ra1e11$ julia --project=. -e 'using HDF5'
Segmentation fault

Yes, that’s not too surprising. libunwind is very fundamental, so if the version is wrong, pretty much anything can go wrong anywhere.

My suggestion would be to use a clean LD_LIBRARY_PATH.
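If a fully empty LD_LIBRARY_PATH is too blunt (other software on the node may need some of its entries), a narrower variant is to drop only the directories that provide libunwind. That is an assumption on my part rather than something tested in this thread: the GCCcore entry looked harmless here, but other entries might matter too. A minimal sketch; `strip_lib_dirs` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated path with every entry
# removed that contains a file whose name starts with the given prefix.
# Empty entries are dropped too.
strip_lib_dirs() {
    prefix=$1
    printf '%s\n' "$2" | tr ':' '\n' | while IFS= read -r dir; do
        [ -n "$dir" ] || continue
        keep=yes
        for f in "$dir/$prefix"*; do
            # An unmatched glob stays literal, hence the existence check.
            if [ -e "$f" ]; then
                keep=no
                break
            fi
        done
        if [ "$keep" = yes ]; then
            printf '%s\n' "$dir"
        fi
    done | paste -s -d : -
}

# Usage:
#   LD_LIBRARY_PATH="$(strip_lib_dirs libunwind.so "$LD_LIBRARY_PATH")" julia
```

Empty entries are removed because glibc's loader treats them as the current directory, which is rarely what you want on a shared system.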