Segmentation fault over repeated sparse Cholesky factorizations

This is related to this post, but I have managed to isolate the problem quite a bit.

I have a sparse symmetric matrix Kee. I found that if I repeatedly factorize this matrix in a for loop (never running out of memory), at some point (which differs on every execution) I trigger a segfault. The code is:

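using LinearAlgebra, SparseArrays

function test(Kee)
    Keeᵏ = Kee  # alias of the same matrix, not a copy
    cholesky(Keeᵏ)  # the factorization is discarded
    return nothing
end

N = 144
for k in 0:(N - 1)
    test(Kee)
end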

The matrix Kee is not very large (~10^5 DOFs):

julia> Kee
238683×238683 SparseMatrixCSC{Float64, Int64} with 17708367 stored entries:
⎡⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠿⠿⠿⠿⠿⠿⠿⠿⠿⣧⣤⣤⣤⣤⣤⣤⣤⣤⣤⣤⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⣿⣿⡿⢿⣿⣿⣿⣿⣿⣿⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⣿⣏⣿⣿⣹⣿⣿⣿⣿⣿⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⣿⣿⣷⣾⣿⣿⣿⣿⣿⣿⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⎦

The segmentation fault error is:

[25316] signal 11 (1): Segmentation fault
in expression starting at /home/javier/.julia/dev/NLVibrationAnalysis/examples/arias_each_k/save_cb_for_k.jl:46
cholmod_l_rowcolcounts at /home/javier/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/bin/../lib/julia/libcholmod.so.5 (unknown line)
cholmod_l_analyze_ordering at /home/javier/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/bin/../lib/julia/libcholmod.so.5 (unknown line)
cholmod_l_analyze_p2 at /home/javier/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/bin/../lib/julia/libcholmod.so.5 (unknown line)
cholmod_l_analyze at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/SparseArrays/src/solvers/wrappers.jl:1364
jfptr_cholmod_l_analyze_10921 at /home/javier/.julia/compiled/v1.11/NLVibrationAnalysis/oYBpf_JyrlB.so (unknown line)
analyze at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/SparseArrays/src/solvers/cholmod.jl:717
#symbolic#11 at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/SparseArrays/src/solvers/cholmod.jl:1440
symbolic at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/SparseArrays/src/solvers/cholmod.jl:1425 [inlined]
#cholesky#14 at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/SparseArrays/src/solvers/cholmod.jl:1494 [inlined]
cholesky at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/SparseArrays/src/solvers/cholmod.jl:1490 [inlined]
#cholesky#15 at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/SparseArrays/src/solvers/cholmod.jl:1609 [inlined]
cholesky at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/SparseArrays/src/solvers/cholmod.jl:1609 [inlined]
test at /home/javier/.julia/dev/NLVibrationAnalysis/examples/arias_each_k/save_cb_for_k.jl:41
unknown function (ip: 0x79cd84152352)
top-level scope at /home/javier/.julia/dev/NLVibrationAnalysis/examples/arias_each_k/save_cb_for_k.jl:47
jl_toplevel_eval_flex at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
include_string at ./loading.jl:2734
_include at ./loading.jl:2794
include at ./sysimg.jl:38
unknown function (ip: 0x79cd8410a542)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
do_call at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/interpreter.c:663
jl_interpret_toplevel_thunk at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/interpreter.c:821
jl_toplevel_eval_flex at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:245
repl_backend_loop at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:342
#start_repl_backend#59 at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:327
start_repl_backend at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:324
#run_repl#72 at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:483
run_repl at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:469
jfptr_run_repl_10097.1 at /home/javier/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_XvZAg.so (unknown line)
#1150 at ./client.jl:446
jfptr_YY.1150_14693.1 at /home/javier/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_XvZAg.so (unknown line)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:430
repl_main at ./client.jl:567 [inlined]
_start at ./client.jl:541
jfptr__start_73609.1 at /home/javier/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x79ce3542a1c9)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 11435005 (Pool: 11434645; Big: 360); GC: 470
[1]    25316 segmentation fault (core dumped)  julia --project --threads=auto

I am running the code on Ubuntu 24.04 LTS, and the version info is:

julia> versioninfo()
Julia Version 1.11.3
Commit d63adeda50d (2025-01-21 19:42 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13900KF
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 32 default, 0 interactive, 16 GC (on 32 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_VSCODE_REPL = 1

I assume this behavior is not intended. I would like to know why it happens and whether there is a quick workaround, since I imagine the underlying error will need a proper fix.


Please provide a reproducible example when raising issues. We do not have your Kee matrix, so we cannot run this ourselves.

I was able to reproduce this (or something similar) anyway:

julia> versioninfo()
Julia Version 1.11.6
Commit 9615af0f269 (2025-07-09 12:58 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × Intel(R) Core(TM) i9-14900
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)

julia> using LinearAlgebra, SparseArrays

julia> X = sprandn(10000, 10000, 0.01); A = X'*X;

julia> for i in 1:100; @show i; cholesky(A); end
i = 1
i = 2
# ...
i = 14 # usually doesn't make it to 30

[process exited with code 1 (0x00000001)]
You can now close this terminal with Ctrl+D, or press Enter to restart.

Note that the Cholesky factorization of this A is fully dense (or extremely close to it, in any case). Presumably the original poster has something with better structure.
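One way to check how dense a factor actually is: a minimal sketch that compares the stored entries of L with those of the lower triangle of A. The smaller size n = 1000 and the added identity (which guarantees positive definiteness) are assumptions made to keep the example cheap and safe to run; they are not part of the reproducer above.

```julia
using LinearAlgebra, SparseArrays

# Assumed test problem: smaller than the 10000×10000 case above,
# shifted by I so that A is guaranteed positive definite.
n = 1000
X = sprandn(n, n, 0.01)
A = X' * X + I

F = cholesky(A)   # CHOLMOD factorization (with fill-reducing permutation)
L = sparse(F.L)   # extract the lower-triangular factor as a sparse matrix

# Fill-in relative to the lower triangle of A, and density of L relative
# to a completely full lower triangle.
fill_ratio = nnz(L) / nnz(tril(A))
density = nnz(L) / (n * (n + 1) / 2)

println("fill ratio = ", fill_ratio, ", density of L = ", density)
```

A density close to 1 means the factor is essentially dense, so its memory cost grows like n^2 / 2 stored entries regardless of how sparse A was.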

In different runs, the process was either Killed, dropping me out of Julia and back to my WSL terminal, or the whole terminal process died as shown above. In one case it wrecked my WSL so badly that I had to reboot it.

I also tried with a much smaller X = sprandn(1000, 1000, 0.01) and that managed to go tens of thousands of iterations without failure, so this might be limited to larger inputs.

Sorry for not providing the matrix Kee. I was hoping that the issue could be resolved from the error message.

Actually, after some additional exploration in which I also tried the Pardiso library, I found that a similar problem was present there too. This, and some advice from the SparseArrays developers, made me think that maybe there was something wrong with my computer. I tried the same code on my laptop and it worked fine.

In the end, it turned out that the RAM in the original computer is faulty, and that is what was causing the segmentation faults. I think the error you are reproducing could be the process running out of memory. Try adding GC.gc() after each factorization to check whether that is the case, because I am pretty sure this is not the same error I was having.
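The GC.gc() check suggested above could look roughly like the following sketch. The helper name peak_rss_per_iteration, the small matrix size, and the identity shift are all assumptions for illustration; Sys.maxrss() reports the peak resident set size of the process, so it shows whether memory keeps climbing across iterations.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: factorize A repeatedly and record the peak resident
# set size (in bytes) after each iteration.
function peak_rss_per_iteration(A, iterations; collect_garbage = true)
    rss = Vector{Int}(undef, iterations)
    for i in 1:iterations
        cholesky(A)                  # factorization is discarded immediately
        collect_garbage && GC.gc()   # force a full collection so finalizers can free CHOLMOD memory
        rss[i] = Int(Sys.maxrss())
    end
    return rss
end

# Assumed small test problem; the thread used sprandn(10000, 10000, 0.01).
# Adding I guarantees positive definiteness.
n = 1000
X = sprandn(n, n, 0.01)
A = X' * X + I

rss = peak_rss_per_iteration(A, 20)
println("peak RSS grew by ", (rss[end] - rss[1]) / 2^20, " MiB over 20 factorizations")
```

If the peak stays flat with collect_garbage = true but keeps growing with collect_garbage = false, that would point to unreclaimed factorizations exhausting memory rather than to a bug in the factorization itself.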