Non-deterministic segfault (only triggered on GitHub Actions). Tips for debugging?

I’m encountering a strange segfault in my package tests that appears to be non-deterministic and is only triggered during CI on GitHub Actions. I’m not able to reproduce it locally.

This issue occurs on macOS and Ubuntu with Julia 1.5.3, and seems to be triggered only during multithreaded tests. At first it passed on macOS and failed on Ubuntu; later (with no changes to the source code, simply re-running the GitHub Actions jobs) it passed on Ubuntu and failed on macOS.
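
Since the failure flips between operating systems and only shows up with threads, it can help if every CI log records the exact runtime configuration, so a passing and a failing run can be compared side by side. A minimal sketch, assuming it is placed near the top of test/runtests.jl (that placement is a suggestion, not something Omniscape currently does):

```julia
using InteractiveUtils  # provides versioninfo() outside the REPL

# Record the runtime configuration so passing and failing CI logs can be compared.
versioninfo()                                         # Julia version, platform and LLVM details
println("Threads.nthreads() = ", Threads.nthreads())  # how many threads the tests actually got
```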

I’m not sure whether the information I have now is enough to open an actionable issue on JuliaLang/julia. Would enabling debug logging on GitHub Actions and posting that be sufficient? Or does anyone have ideas about what the underlying issue might be?

Here’s the full stacktrace; the link above leads to the full GitHub Actions logs.

signal (11): Segmentation fault: 11
in expression starting at /Users/runner/work/Omniscape.jl/Omniscape.jl/test/runtests.jl:91
_ZNSt3__127__tree_balance_after_insertIPNS_16__tree_node_baseIPvEEEEvT_S5_ at /Users/runner/hostedtoolcache/julia/1.5.3/x64/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm9DWARFUnit19updateAddressDieMapENS_8DWARFDieE at /Users/runner/hostedtoolcache/julia/1.5.3/x64/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm9DWARFUnit19updateAddressDieMapENS_8DWARFDieE at /Users/runner/hostedtoolcache/julia/1.5.3/x64/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm9DWARFUnit19updateAddressDieMapENS_8DWARFDieE at /Users/runner/hostedtoolcache/julia/1.5.3/x64/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm9DWARFUnit19updateAddressDieMapENS_8DWARFDieE at /Users/runner/hostedtoolcache/julia/1.5.3/x64/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm9DWARFUnit19updateAddressDieMapENS_8DWARFDieE at /Users/runner/hostedtoolcache/julia/1.5.3/x64/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm9DWARFUnit19updateAddressDieMapENS_8DWARFDieE at /Users/runner/hostedtoolcache/julia/1.5.3/x64/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm9DWARFUnit23getSubroutineForAddressEy at /Users/runner/hostedtoolcache/julia/1.5.3/x64/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm9DWARFUnit25getInlinedChainForAddressEyRNS_15SmallVectorImplINS_8DWARFDieEEE at /Users/runner/hostedtoolcache/julia/1.5.3/x64/lib/julia/libLLVM.dylib (unknown line)
_ZN4llvm12DWARFContext25getInliningInfoForAddressENS_6object16SectionedAddressENS_19DILineInfoSpecifierE at /Users/runner/hostedtoolcache/julia/1.5.3/x64/lib/julia/libLLVM.dylib (unknown line)
lookup_pointer at /Users/julia/buildbot/worker/package_macos64/build/src/debuginfo.cpp:547
jl_getFunctionInfo at /Users/julia/buildbot/worker/package_macos64/build/src/debuginfo.cpp:0
jl_lookup_code_address at /Users/julia/buildbot/worker/package_macos64/build/src/stackwalk.c:572
lookup at ./stacktraces.jl:107
firstcaller at ./deprecated.jl:110
firstcaller at ./deprecated.jl:105 [inlined]
macro expansion at ./deprecated.jl:90 [inlined]
macro expansion at ./logging.jl:321 [inlined]
#depwarn#797 at ./deprecated.jl:85
depwarn at ./deprecated.jl:80 [inlined]
#cg!#23 at /Users/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:230
cg!##kw at /Users/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:223
#cg#22 at /Users/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:169
cg##kw at /Users/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:169 [inlined]
solve_linear_system at /Users/runner/.julia/packages/Circuitscape/9x9VD/src/core.jl:577
macro expansion at ./timing.jl:233 [inlined]
multiple_solver at /Users/runner/.julia/packages/Circuitscape/9x9VD/src/raster/advanced.jl:284
calculate_current at /Users/runner/work/Omniscape.jl/Omniscape.jl/src/utils.jl:410
unknown function (ip: 0x14cea7a78)
solve_target! at /Users/runner/work/Omniscape.jl/Omniscape.jl/src/utils.jl:496
unknown function (ip: 0x14cf09d02)
macro expansion at /Users/runner/work/Omniscape.jl/Omniscape.jl/src/main.jl:273 [inlined]
#71#threadsfor_fun at ./threadingconstructs.jl:81
#71#threadsfor_fun at ./threadingconstructs.jl:48
unknown function (ip: 0x144b7ff2c)
jl_apply at /Users/julia/buildbot/worker/package_macos64/build/src/./julia.h:1690 [inlined]
start_task at /Users/julia/buildbot/worker/package_macos64/build/src/task.c:705
Allocations: 136222984 (Pool: 136126583; Big: 96401); GC: 104
ERROR: Package Omniscape errored during testing (received signal: 11)
Stacktrace:
 [1] pkgerror(::String) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/Types.jl:52
 [2] test(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/Operations.jl:1578
 [3] test(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Cmd, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/API.jl:327
 [4] #test#61 at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/API.jl:67 [inlined]
 [5] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:coverage,),Tuple{Bool}}}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Pkg/src/API.jl:80
 [6] top-level scope at none:1
Error: Process completed with exit code 1.

Tip for debugging something on GitHub Actions: mxschmitt/action-tmate (on GitHub), which lets you debug your GitHub Actions via SSH by using tmate to get access to the runner system itself.


That’s perfect! I will check that out. Thanks!

Do we have rr tracing and upload available on GitHub runners?

-viral

It isn’t preinstalled, but once you SSH into a Linux runner, you can use BugReporting.jl as usual or start Julia with --bug-report=rr.

Of course… with tmate the error is not being triggered. As far as I know, I’m using the exact command that the julia-runtest action uses: julia --inline=yes --depwarn=yes --project=@. --color=yes -e 'using Pkg; Pkg.test(coverage=true)' This is so strange…
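
Because the crash only shows up on some runs, one way to chase it over SSH is to rerun the exact CI command in a loop until it fails. A minimal sketch: rerun_until_failure is a hypothetical helper, the command is the one quoted above plus --check-bounds=yes, and the thread count of 2 is an assumed value (use whatever the workflow sets JULIA_NUM_THREADS to):

```julia
# Hypothetical helper: run `cmd` up to `max_runs` times and stop at the first failure.
# Returns the number of the failing run, or `nothing` if every run succeeded.
function rerun_until_failure(cmd::Cmd, max_runs::Integer)
    for i in 1:max_runs
        proc = run(ignorestatus(cmd))   # ignorestatus: don't throw on a non-zero exit
        if !success(proc)
            println("run $i failed: exit code $(proc.exitcode), signal $(proc.termsignal)")
            return i
        end
    end
    return nothing
end

# The julia-runtest command quoted above, plus --check-bounds=yes.
ci_cmd = `julia --inline=yes --depwarn=yes --check-bounds=yes --project=@. --color=yes -e 'using Pkg; Pkg.test(coverage=true)'`

# Uncomment to run from the package root; the child process inherits ENV.
# withenv("JULIA_NUM_THREADS" => "2") do   # assumed thread count
#     rerun_until_failure(ci_cmd, 20)
# end
```

Reporting the termination signal separately from the exit code makes it easy to tell a segfault (signal 11) apart from an ordinary test failure.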

EDIT: passing --check-bounds=yes (which I initially forgot) may have triggered it? The error message isn’t the same, though.
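
For context, --check-bounds=yes forces bounds checks even inside @inbounds blocks, so an out-of-bounds index that would otherwise be undefined behaviour becomes a BoundsError instead. A hedged sketch of the usual defensive pattern of validating the index range once before an @inbounds loop; sum_range is a hypothetical function, not code from Omniscape or its dependencies:

```julia
# Hypothetical function: validate the whole index range once, then skip per-element checks.
function sum_range(v::AbstractVector{<:Real}, r::AbstractUnitRange{<:Integer})
    checkbounds(v, r)          # throws BoundsError if any index in r is invalid
    s = zero(eltype(v))
    @inbounds for i in r       # safe: every i was validated above
        s += v[i]
    end
    return s
end
```

With this pattern, a bad range fails the same way with or without --check-bounds=yes.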

Okay, here’s some slightly different output this time around…

Got exception outside of a @test
  TaskFailedException:
  ReadOnlyMemoryError()
  Stacktrace:
   [1] lookup(::Ptr{Nothing}) at ./stacktraces.jl:107
   [2] firstcaller(::Array{Union{Ptr{Nothing}, Base.InterpreterIP},1}, ::Tuple{Symbol}) at ./deprecated.jl:110
   [3] firstcaller at ./deprecated.jl:105 [inlined]
   [4] macro expansion at ./deprecated.jl:90 [inlined]
   [5] macro expansion at ./logging.jl:321 [inlined]
   [6] depwarn(::String, ::Symbol; force::Bool) at ./deprecated.jl:85
   [7] depwarn at ./deprecated.jl:80 [inlined]
   [8] cg!(::Array{Float64,1}, ::SparseArrays.SparseMatrixCSC{Float64,Int64}, ::Array{Float64,1}; abstol::Float64, reltol::Float64, tol::Float64, maxiter::Int64, log::Bool, statevars::IterativeSolvers.CGStateVariables{Float64,Array{Float64,1}}, verbose::Bool, Pl::AlgebraicMultigrid.Preconditioner{AlgebraicMultigrid.MultiLevel{AlgebraicMultigrid.Pinv{Float64},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},SparseArrays.SparseMatrixCSC{Float64,Int64},SparseArrays.SparseMatrixCSC{Float64,Int64},LinearAlgebra.Adjoint{Float64,SparseArrays.SparseMatrixCSC{Float64,Int64}},AlgebraicMultigrid.MultiLevelWorkspace{Array{Float64,1},1}}}, kwargs::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:initially_zero,),Tuple{Bool}}}) at /home/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:230
   [9] cg(::SparseArrays.SparseMatrixCSC{Float64,Int64}, ::Array{Float64,1}; kwargs::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:Pl, :tol, :maxiter),Tuple{AlgebraicMultigrid.Preconditioner{AlgebraicMultigrid.MultiLevel{AlgebraicMultigrid.Pinv{Float64},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},SparseArrays.SparseMatrixCSC{Float64,Int64},SparseArrays.SparseMatrixCSC{Float64,Int64},LinearAlgebra.Adjoint{Float64,SparseArrays.SparseMatrixCSC{Float64,Int64}},AlgebraicMultigrid.MultiLevelWorkspace{Array{Float64,1},1}}},Float64,Int64}}}) at /home/runner/.julia/packages/IterativeSolvers/upIVv/src/cg.jl:169
   [10] solve_linear_system(::Dict{String,String}, ::SparseArrays.SparseMatrixCSC{Float64,Int64}, ::Array{Float64,1}, ::AlgebraicMultigrid.Preconditioner{AlgebraicMultigrid.MultiLevel{AlgebraicMultigrid.Pinv{Float64},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},AlgebraicMultigrid.GaussSeidel{AlgebraicMultigrid.SymmetricSweep},SparseArrays.SparseMatrixCSC{Float64,Int64},SparseArrays.SparseMatrixCSC{Float64,Int64},LinearAlgebra.Adjoint{Float64,SparseArrays.SparseMatrixCSC{Float64,Int64}},AlgebraicMultigrid.MultiLevelWorkspace{Array{Float64,1},1}}}) at /home/runner/.julia/packages/Circuitscape/9x9VD/src/core.jl:577
   [11] macro expansion at ./timing.jl:233 [inlined]
   [12] multiple_solver(::Dict{String,String}, ::SparseArrays.SparseMatrixCSC{Float64,Int64}, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}) at /home/runner/.julia/packages/Circuitscape/9x9VD/src/raster/advanced.jl:284
   [13] calculate_current(::Array{Union{Missing, Float64},2}, ::Array{Union{Missing, Float64},2}, ::Array{Float64,2}, ::Circuitscape.RasterFlags, ::Dict{String,String}, ::DataType) at /home/runner/work/Omniscape.jl/Omniscape.jl/src/utils.jl:410
   [14] solve_target!(::Int64, ::Int64, ::Dict{String,Int64}, ::Array{Float64,2}, ::Array{Union{Missing, Float64},2}, ::Array{Union{Missing, Float64},2}, ::Omniscape.OmniscapeFlags, ::Dict{String,String}, ::Circuitscape.RasterFlags, ::Circuitscape.OutputFlags, ::Array{Union{Missing, Float64},2}, ::Array{Union{Missing, Float64},2}, ::Array{Union{Missing, Float64},2}, ::Array{Union{Missing, Float64},2}, ::String, ::String, ::Float64, ::Float64, ::Float64, ::Float64, ::Array{Float64,2}, ::Array{Float64,3}, ::Array{Float64,3}, ::DataType) at /home/runner/work/Omniscape.jl/Omniscape.jl/src/utils.jl:496
   [15] macro expansion at /home/runner/work/Omniscape.jl/Omniscape.jl/src/main.jl:273 [inlined]
   [16] (::Omniscape.var"#71#threadsfor_fun#15"{Dict{String,Int64},DataType,Omniscape.OmniscapeFlags,String,String,Float64,Float64,Float64,Float64,Dict{String,String},Int64,Circuitscape.OutputFlags,Circuitscape.RasterFlags,ProgressMeter.Progress,Int64,UnitRange{Int64}})(::Bool) at ./threadingconstructs.jl:81
   [17] (::Omniscape.var"#71#threadsfor_fun#15"{Dict{String,Int64},DataType,Omniscape.OmniscapeFlags,String,String,Float64,Float64,Float64,Float64,Dict{String,String},Int64,Circuitscape.OutputFlags,Circuitscape.RasterFlags,ProgressMeter.Progress,Int64,UnitRange{Int64}})() at ./threadingconstructs.jl:48
  Stacktrace:
   [1] wait at ./task.jl:267 [inlined]
   [2] threading_run(::Function) at ./threadingconstructs.jl:34
   [3] macro expansion at ./threadingconstructs.jl:93 [inlined]
   [4] run_omniscape(::Dict{String,String}, ::Array{Union{Missing, Float64},2}; reclass_table::Array{Union{Missing, Float64},2}, source_strength::Array{Union{Missing, Float64},2}, condition1::Array{Union{Missing, Float64},2}, condition2::Array{Union{Missing, Float64},2}, condition1_future::Array{Union{Missing, Float64},2}, condition2_future::Array{Union{Missing, Float64},2}, wkt::String, geotransform::Array{Float64,1}, write_outputs::Bool) at /home/runner/work/Omniscape.jl/Omniscape.jl/src/main.jl:268
   [5] run_omniscape(::String) at /home/runner/work/Omniscape.jl/Omniscape.jl/src/main.jl:561
   [6] top-level scope at /home/runner/work/Omniscape.jl/Omniscape.jl/test/runtests.jl:101
   [7] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1115
   [8] top-level scope at /home/runner/work/Omniscape.jl/Omniscape.jl/test/runtests.jl:93
   [9] include(::String) at ./client.jl:457
   [10] top-level scope at none:6
   [11] eval(::Module, ::Any) at ./boot.jl:331
   [12] exec_options(::Base.JLOptions) at ./client.jl:272
   [13] _start() at ./client.jl:506

I’m going to post a Julia issue with this, the above errors, and a bug report from rr. Thanks @giordano for the advice.

Issue posted: JuliaLang/julia#39278, “Sudden non-deterministic segfault error when using multithreading”

Could it be due to some random number used somewhere in your tests?
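
One way to rule randomness in or out is to give each work item its own explicitly seeded RNG, so the draws don’t depend on which thread happens to run which iteration. A minimal sketch with a hypothetical seeded_draws function; the base seed of 42 is arbitrary:

```julia
using Random

# Hypothetical example: each iteration gets its own MersenneTwister seeded from its index,
# so the numbers drawn do not depend on which thread ends up running which iteration.
function seeded_draws(n_items::Integer; base_seed::Integer = 42)
    results = Vector{Float64}(undef, n_items)
    Threads.@threads for i in 1:n_items
        rng = MersenneTwister(base_seed + i)   # per-iteration RNG, never shared across threads
        results[i] = sum(rand(rng, 10))        # each iteration writes only its own slot
    end
    return results
end
```

If the segfault still appears with every random draw pinned like this, randomness in the tests is unlikely to be the trigger.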

Not sure. It’s triggered during an iterative linear solver, but that’s in a dependency of a dependency, so I’m not really familiar with its internals.

I would consider switching the segfaulting runners to a debug build of Julia with LLVM debug+asserts enabled, if possible. Segfaults in LLVM usually happen because an invariant is violated somewhere in Julia or LLVM, and the assertion that would have caught it is compiled out of release builds. With a debug+asserts build, if you do hit an assertion, you get an informative message and a line number.
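
Once a runner is switched over, it is worth confirming from inside the test process that the debug binary is really the one running. A hedged sketch: it assumes the C function jl_is_debugbuild exported by libjulia (worth double-checking against your Julia version), and it only reports whether Julia itself is a debug build; it cannot tell whether LLVM assertions are enabled:

```julia
# Assumption: libjulia exports jl_is_debugbuild, returning nonzero for debug builds.
is_debug_build() = ccall(:jl_is_debugbuild, Cint, ()) != 0

println("Julia debug build: ", is_debug_build())
println("LLVM version: ", Base.libllvm_version)   # the LLVM that Julia is linked against
```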
