Dealing with external libraries that don't restore old signal handlers

TLDR:

If your code segfaults outside of ccalls, but only AFTER a ccall has run and while the threaded GC is active, the C library may have installed its own signal handler that now competes with Julia’s handler.

The solution is something like

"""
    sigsegv_handler(new=C_NULL, old=zeros(UInt8, 256); signal) -> old

Call `sigaction` for `signal`. `new` is the `struct sigaction` to install and `old` is the buffer that receives a backup of the current one. Passing `C_NULL` for either one skips that part, so by default the call only backs up the current handler. Both arguments must be large enough to hold a `struct sigaction`.
"""
function sigsegv_handler(new=C_NULL, old=zeros(UInt8, 256); signal)
    ret = @ccall sigaction(
        signal::Cint,
        new::Ptr{Cvoid},
        old::Ptr{Cvoid})::Cint
    # sigaction returns 0 on success and -1 on failure
    ret == 0 || error("sigaction failed for signal $signal")
    return old
end

handler = sigsegv_handler(; signal=11)       # back up Julia's SIGSEGV handler (signal 11)
@ccall C code ...
sigsegv_handler(handler, C_NULL; signal=11)  # restore it after the C call
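
For comparison, here is a minimal C sketch of the same backup-and-restore pattern, with a typed `struct sigaction` instead of a raw byte buffer. Everything named here (`host_handler`, `library_handler`, `misbehaving_library_call`, `call_with_handler_restored`) is a hypothetical stand-in and not part of NOMAD or Julia; only `sigaction` and `sigemptyset` are real POSIX calls. It is meant to illustrate the idea, not to be a drop-in fix.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict -std= modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

typedef void (*handler_fn)(int);

/* Hypothetical handler owned by the host runtime (Julia's role in this thread). */
void host_handler(int signum) { (void)signum; }

/* Hypothetical handler that the library installs and never removes. */
void library_handler(int signum) { (void)signum; }

/* Install `handler` for `signum`; returns 0 on success, -1 on failure. */
int install_handler(int signum, handler_fn handler) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* Return the sa_handler currently installed for `signum`, or SIG_ERR on failure. */
handler_fn current_handler(int signum) {
    struct sigaction sa;
    if (sigaction(signum, NULL, &sa) != 0) return SIG_ERR;
    return sa.sa_handler;
}

/* Hypothetical library entry point: overwrites the handler and does not restore it. */
int misbehaving_library_call(int signum) {
    return install_handler(signum, library_handler);
}

/* Back up the current disposition, call the library, then restore the backup. */
int call_with_handler_restored(int signum) {
    struct sigaction backup;
    if (sigaction(signum, NULL, &backup) != 0) return -1;
    int rc = misbehaving_library_call(signum);
    if (sigaction(signum, &backup, NULL) != 0) return -1;
    return rc;
}
```

Calling `misbehaving_library_call` directly leaves `library_handler` installed, while going through `call_with_handler_restored` puts the previous handler back. Trying this with `SIGUSR1` rather than `SIGSEGV` keeps the experiment harmless.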

Long version

One of my go-to options for blackbox optimization is NOMAD. The biggest problem with the Julia interface to NOMAD is this issue. Long story short, the interface works fine in a single thread, but using any sort of threads after or while using NOMAD makes the program segfault. After some discussion, mostly with @mkitti, @Sukera, and @jpsamaroo, we found the cause and, building on mkitti’s idea, the solution.

The first guess was that the callback was being run from another thread. That cannot be the case, as NOMAD_jll is compiled without OpenMP support, and it would not explain why the segfault happens AFTER using NOMAD. After some debugging with gdb I came to the conclusion that something must have messed with the signal handlers. Lo and behold, that was indeed true: NOMAD installs signal handlers but never releases them.

That’s when mkitti had the brilliant idea to just back the handler up on Julia’s side with sigaction. The solution was indeed to use it to back up the state, and to restore that state before calling the user function and when leaving NOMAD.jl. The only remaining problem right now is getting a robust estimate of the struct size; 256 bytes seems like a safe estimate, as the struct is 152 bytes on x86_64.

#include <signal.h>
#include <stdio.h>

int main() {
    printf("%zu\n", sizeof(struct sigaction));  /* %zu is the format for size_t */
    return 0;
}

or the more questionable version

function get_sigsegv_handler()
    v = zeros(UInt8, 1024)
    @ccall sigaction(
        11::Cint,
        C_NULL::Ptr{Cvoid},
        v::Ptr{Cvoid}
    )::Cint
    return v
end
findlast(x -> x != 0, get_sigsegv_handler())

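Either can be used to estimate the size needed to store the struct. The second version only gives a lower bound, since trailing bytes of the struct that happen to be zero are not counted.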
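
To see how far the byte-scanning estimate can fall short of the real size, here is a rough C sketch of the same scan. The names `sigaction_probe` and `last_nonzero_byte` are made up for this illustration; the union is just one way to get an oversized buffer that is correctly aligned for a `struct sigaction`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict -std= modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Oversized, correctly aligned scratch space for one struct sigaction. */
union sigaction_probe {
    struct sigaction sa;
    unsigned char bytes[1024];
};

/* Returns the 1-based index of the last nonzero byte that sigaction() wrote
   for `signum`, 0 if every byte is zero, or -1 if sigaction() fails. */
long last_nonzero_byte(int signum) {
    union sigaction_probe probe;
    memset(&probe, 0, sizeof probe);
    if (sigaction(signum, NULL, &probe.sa) != 0) return -1;
    for (size_t i = sizeof probe.bytes; i > 0; --i) {
        if (probe.bytes[i - 1] != 0) return (long)i;
    }
    return 0;
}
```

The result can never exceed `sizeof(struct sigaction)`, but it can be much smaller: for a signal that is still at its default disposition, most or all of the struct may be zero. That is why it makes sense to treat the scan as a sanity check and to keep some slack in the buffer.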

Whether @ccall/@cfunction should automagically restore the handler is another question, and I hope someone with more understanding can chip in.

PS.

I think it’s bad design on the part of bbopt/nomad not to restore the handlers. But even if they did restore them, it wouldn’t have solved half of the problems.

MWE

using Base.Threads, NOMAD
n = 5
@assert nthreads() > 1
function main()
    function f(x)
        @threads for i in 1:nthreads() * 3
            for j in 1:100
                g = rand(j, j)
            end
            GC.gc()    
        end
        (true, true, x[1])
    end
    pb = NomadProblem(n,
            1,
            ["OBJ"],
            f;
            upper_bound=[100.0 for _ in 1:n],
            lower_bound=[0.0 for _ in 1:n])

    pb.options.max_bb_eval = 1000
    pb.options.max_time = 20
    result = @time NOMAD.solve(pb, rand(n))
end
main()

@threads for i in 1:100
    for j in 1:100
        g = rand(j, j)
    end
    GC.gc()    
end

By running some C code we can compute the sizeof(struct sigaction) and currently that is 152 bytes on Linux.

I suspect that we can obtain this directly by parsing the C header signal.h, or more specifically bits/sigaction.h.

Nonetheless, the above approach will probably work. One might want to leave a buffer in case sigaction has additional fields that are just zero valued.

Yes, you can. But the size depends on the system headers you use.

But which version will Julia call? It should be constant for a given target OS, right?

The version that was used when compiling Julia.

If the C library is compiled using BB (BinaryBuilder), then those system headers should match.

It depends on the target triplet (e.g. Sys.MACHINE).

I will have to sit down and think about how to pull that specific struct out of the Clang parse.

If I just have Clang.jl parse the following

#include <signal.h>

const long sizeof_struct_sigaction = sizeof(struct sigaction);

I’m starting to think the easiest path would be to compile a small shared library for now and perhaps integrate a facility into Julia later.
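
A minimal sketch of what such a small shared library could contain is below. The function names `sizeof_struct_sigaction` and `buffer_fits_sigaction` are hypothetical; the point is only that the size is computed by the C compiler against the same system headers the library is built with, instead of being guessed on the Julia side.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose struct sigaction under strict -std= modes */
#endif
#include <signal.h>
#include <stddef.h>

/* Hypothetical exported helper: the size of struct sigaction as seen by the
   headers this library was compiled against. */
size_t sizeof_struct_sigaction(void) {
    return sizeof(struct sigaction);
}

/* Hypothetical check: returns 1 if a caller-side buffer of `buffer_size`
   bytes is large enough to hold a struct sigaction, 0 otherwise. */
int buffer_fits_sigaction(size_t buffer_size) {
    return buffer_size >= sizeof(struct sigaction);
}
```

Assuming the file is built with something like `cc -shared -fPIC` into a library on the load path, the Julia side could query it once with `@ccall` (return type `Csize_t`) and allocate the backup buffer from the result rather than hard-coding 256 bytes.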

Thanks @Gnimuc, I figured it out!

julia> temp_header = tempname()*".h"
"/tmp/jl_Cvhf3N.h"

julia> open(temp_header, "w") do f
           write(f,
           """
           #include <stddef.h>
           #include <signal.h>
           
           const size_t sizeof_struct_sigaction = sizeof(struct sigaction);
           """
           )
       end
106

julia> cursor = Clang.getTranslationUnitCursor(trans)
CLCursor (CLTranslationUnit) /tmp/jl_6DBaOh.h

julia> sigaction = Clang.search(children(cursor), c->Clang.name(c) == "sigaction")[1]
CLCursor (CLStructDecl) sigaction

julia> Clang.getCursorType(sigaction) |> Clang.getSizeOf
152