Timeout function

Hello, is this a good way to create a timeout function?

struct TimeoutError <: Exception end

function timeout(f::Function, nsec)
    task = current_task()
    timer = Timer(nsec) do t
        Base.throwto(task, TimeoutError())
    end
    try
        f()
    catch 
        rethrow()
    finally
        close(timer)
    end
end

No. Unfortunately, a “good timeout function” is not supported in Julia. E.g.:

julia> @time timeout(5) do 
       rand(Int, 1<<12, 1<<12)*rand(Int, 1<<12, 1<<12); end
 28.297039 seconds (16.13 k allocations: 384.802 MiB, 0.09% gc time, 0.05% compilation time)

The issue is that you would need to preempt your f() and throw the exception. However, if a program has to assume that it can be preempted at any time and have an exception inserted, it becomes very hard to reason about its behavior, both for humans and for the optimizing compiler.
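To make the delay above concrete, here is a small sketch that is not from the original posts (`busy_wait` and `fired` are illustrative names). It assumes Julia runs with a single thread; with more threads the result may differ. A Timer callback is itself a task, so it can only run once the current task reaches a yield point, and a tight loop has none.

```julia
# Spin for `seconds` without ever yielding to the scheduler.
function busy_wait(seconds)
    t0 = time()
    while time() - t0 < seconds
    end
end

fired = Ref(false)
timer = Timer(_ -> (fired[] = true), 0.1)   # due after 0.1 s

busy_wait(0.5)                               # no yield points in here
@info "Timer fired during the busy loop?" fired[]

sleep(0.2)                                   # sleep yields, so the callback can run
@info "Timer fired after yielding?" fired[]
close(timer)
```

The same thing happens to the Timer in timeout(): its callback, and therefore the throwto, has to wait until f() yields.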

Instead, ask the operating system for support.

For example, run a separate Julia process and just kill the process. Since you want to recursively kill all child processes as well, consider cgroups. Or start a Docker container / spin up a VM, and nuke that from orbit (“only way to be sure”).
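A minimal sketch of the separate-process idea, using only Base functions (`run_with_timeout` is a hypothetical helper name, not an existing API). It kills only the direct child; grandchildren would still need cgroups or a container as described above.

```julia
# Run `cmd` as a child process and kill it if it is still alive after `nsec` seconds.
function run_with_timeout(cmd::Cmd, nsec::Real)
    proc = run(ignorestatus(cmd); wait=false)     # start without blocking
    status = timedwait(() -> process_exited(proc), Float64(nsec); pollint=0.05)
    if status === :timed_out
        kill(proc, Base.SIGKILL)                  # the OS tears the process down
        wait(proc)                                # reap the child
        return :timed_out
    end
    return :ok
end

# Example: a child Julia process that would otherwise sleep for a minute.
child = `$(Base.julia_cmd()) -e 'sleep(60)'`
result = run_with_timeout(child, 2)
```

The price is process startup time and having to pass inputs and results between processes, for example through files or pipes.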


One way to craft a “good timeout function” in Julia is by using an asynchronous task and strategically placing yield points between intensive calculations.

While this approach can complicate the design of your function, it provides finer-grained control over partial computations compared to simply terminating a process using OS-level support.

Here’s an example to illustrate this idea. I’ve used the Visor.jl package because it offers a convenient way to create a minimal example. (Note: The primary goal of Visor is to supervise tasks and restart them in case of failures.)

using Visor

dim = 1 << 10

function good_timeout_function(pd, a, b)
    @info "Starting mytask"
    c = similar(a)
    for i in 1:dim
        for j in 1:dim
            c[i, j] = a[i, :]' * b[:, j]

            # Check if a shutdown was requested
            if isshutdown(pd)
                @info "Premature termination: row=$i, col=$j"
                return
            end

            # Yield control to allow other tasks to run
            yield()
        end
    end
    @info "Mytask: computation complete"
end

a = rand(Int, dim, dim)
b = rand(Int, dim, dim)

# Request a shutdown after 3 seconds
Timer((tmr) -> shutdown(), 3)

# Start the supervised good_timeout_function
supervise(process(good_timeout_function, args=(a, b)))

This example demonstrates how to handle long-running computations with a supervised task. By checking for a shutdown condition (isshutdown) and adding yield() points, the function gracefully manages interruptions without leaving the system in an inconsistent state.

This approach is particularly useful for applications where precise control over task execution and termination is critical, especially when compared to abrupt OS-level process termination.
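For comparison, the same cooperative pattern can be sketched without any package. This is an illustration only, not how Visor is implemented; `cancellable_matmul`, `with_cooperative_timeout` and the `stop` flag are hypothetical names.

```julia
# Matrix product that checks a stop flag once per output column.
function cancellable_matmul(a, b, stop::Threads.Atomic{Bool})
    c = zeros(eltype(a), size(a, 1), size(b, 2))
    for j in axes(b, 2)
        stop[] && return nothing          # cooperative cancellation point
        for i in axes(a, 1)
            c[i, j] = sum(a[i, k] * b[k, j] for k in axes(a, 2); init=zero(eltype(a)))
        end
        yield()                           # give the timer task a chance to run
    end
    return c
end

# Call `f(stop)`; a Timer sets `stop` to true after `nsec` seconds.
function with_cooperative_timeout(f, nsec)
    stop = Threads.Atomic{Bool}(false)
    timer = Timer(_ -> (stop[] = true), nsec)
    try
        return f(stop)
    finally
        close(timer)
    end
end

a = [1 2; 3 4]
b = [5 6; 7 8]
c = with_cooperative_timeout(stop -> cancellable_matmul(a, b, stop), 5)
```

Checking once per column instead of once per element keeps the overhead of the flag check and the yield() small; how often to check is a trade-off between responsiveness and speed.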


A.
Good to know and I suppose Visor.jl does that:

Visor is influenced by Erlang Supervisor design principles

since Erlang is, if I recall, based on a process architecture. So is PostgreSQL (except on Windows, where I believe threads are used).

@andrey2185, I suppose you CAN and rather should use Malt.jl, since it also works on Windows, by making a timeout function similar to @attdona’s just calling the friendly Malt.stop instead of using Visor (Visor is basically for restarts, which is overkill):

I was expecting Visor to use Distributed.jl or some alternative like Malt, but I don’t see that nor how it works except:

B.
I’m just curious, since you CAN kill processes, and it’s very clean, it will close files, and not lose data, IF correctly implemented… why can’t you make threads and kill those threads in your program?

[In short, I think you CAN kill threads, but to do it they will need to cooperate, and they don’t, and nothing in Julia forces them to. I would like to know how you can kill threads. I should add that you can add threads with some C hack, but not yet from Julia, nor, I suppose, kill them from Julia?]

To answer my question, I think I recall reading that you can’t always kill threads and it’s a libc problem (or kernel problem?), as opposed to a Julia problem. I’m just wondering why exactly. I suppose you can kill the thread, but if it has locked something it will never unlock, so it might work in some cases, as with “lock free programming”. Another problem is that files might not be flushed and closed and finalizers not run, or would they be, for relevant parts?


Hi Pàll, I will try to clarify. No, Visor.jl does not terminate operating system (OS) processes in the traditional sense. Instead, it provides mechanisms to terminate @async tasks and @Threads.spawn threads cleanly within Julia.

The code snippet you linked above is not a direct dependency of Visor.jl. Instead, it is a utility—a C wrapper—that facilitates handling both SIGINT and SIGTERM signals. Specifically, it enables delivering a SIGINT signal to a forked Julia child process, which can then handle it gracefully in Julia.

This utility is useful in scenarios where an external scheduler manages OS processes via SIGTERM, which, as far as I know, cannot be directly captured by Julia. The shims/sigvisor.c code implements this functionality, ensuring signal handling in a way that allows the Julia process to respond appropriately.

Below is the code for shims/sigvisor.c:

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

int pid = -1;

void handle_signal(int sig)
{
    kill(pid, SIGINT);
}

int main (int argc, char **argv)
{
    int status, ret;
    char **myargs;
    char *myenv [] = { NULL };

    if (argc < 2) {
        printf("usage: sv <program> <args...>\n");
        exit(2);
    }

    if ((pid = fork()) < 0) {  
        perror("fork");  
        exit(1);  
    }

    if (pid == 0) {
        // The child process
        myargs = argv + 1;
        if (execvp(argv[1], myargs) == -1) {
            perror("exec error");
            exit(127);  // exec failed: do not fall through and report success
        }
    } else {
        // The parent process
        signal(SIGTERM, handle_signal);
        signal(SIGINT,  handle_signal);
        
        if ((ret = waitpid(pid, &status, 0)) == -1)
            perror("waitpid error");

        if (ret == pid)  {
            if (WIFEXITED(status))
                return WEXITSTATUS(status);
        }
    }
}

This utility ensures that signals like SIGTERM can be translated into SIGINT for the Julia process, allowing for clean and predictable behavior.
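On the Julia side, the forwarded SIGINT surfaces as an InterruptException. A minimal sketch of handling it gracefully (`run_until_interrupted` and its `cleanup` argument are hypothetical names, not part of Visor):

```julia
# In scripts, Julia exits on SIGINT by default; ask for an InterruptException instead.
Base.exit_on_sigint(false)

# Run `work()`; if it is interrupted, run `cleanup()` and report that.
function run_until_interrupted(work; cleanup = () -> nothing)
    try
        work()
        return :finished
    catch e
        e isa InterruptException || rethrow()
        cleanup()                     # e.g. flush files, release resources
        return :interrupted
    end
end

# Simulate the signal by throwing the exception directly.
outcome = run_until_interrupted(() -> throw(InterruptException()))
```

In a real service, `work` would be the main loop and `cleanup` the place to flush and close whatever must not be left half-written.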


I mean, you sure can do that.

So something was running on a thread. You cannot suffer it to continue running, so you preempt it and now have the instruction pointer and all register states.

You can maybe walk the stack to see what function is running – exactly like a profiler does.

But a priori, you can’t say whether you are in this situation:

lock(something)
someRef[] += 1   #<---- here!
unlock(something)

If you just kill the thread, then the lock doesn’t get released, and the runtime will break.
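As a contrast, here is a sketch of what happens when the interruption is an ordinary exception thrown at a known point (`guarded_increment!` is an illustrative name). The do-block form of lock releases the lock while the exception unwinds, which is exactly the step that a killed thread never gets to run.

```julia
lk = ReentrantLock()
counter = Ref(0)

# Increment the counter under the lock; optionally fail inside the critical section.
function guarded_increment!(fail::Bool)
    lock(lk) do
        fail && error("interrupted at a known point")
        counter[] += 1
    end
end

try
    guarded_increment!(true)          # throws while the lock is held
catch
end
released = !islocked(lk)              # the lock was released during unwinding
guarded_increment!(false)             # so later callers are not blocked
```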

Garbage collection has the same issue (needs to stop-the-world).

There are 3 approaches to that:

  1. Cooperative multithreading – don’t preempt! Instead, sprinkle safepoint()-calls through the code, which are treated by compiler and programmer as “this can throw an exception!”. One particularly nice implementation of safepoints is an otherwise unused memory read; to interrupt at the next safepoint, you unmap that magic memory address, the CPU/MMU traps / segfaults, you grab the fault, do your stuff, and continue. Oh, but now buggy code that doesn’t safepoint often enough won’t react to attempts to interrupt it! This is what happened in my example, the matmul loop doesn’t contain safepoints.
  2. Isolation – ok, each interruptible thing gets its own tiny world to play in. Post-interruption, that tiny world has corrupted state, so it is entirely torn down. All the tiny worlds can interact only via well-controlled ways that can be safely unwound. Hey, that’s what a process is, with an operating system as the manager! It has significant overhead, both in performance/resources and in convenience.
  3. Deal with it – everything can be interrupted at any point. This kind of machine is super-duper annoying to program for! You only do this for tiny tiny sections and only if you must (you’re writing very specific parts of kernel/firmware code).
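For the first approach, Julia exposes GC.safepoint(), which marks a point where garbage collection may run. A rough sketch of adding one to a tight loop (`spin_sum` is an illustrative name, and the interval of one million iterations is an arbitrary assumption). Note that this safepoint is about letting the GC stop the world; it does not by itself turn the loop into a place where a timeout exception gets delivered.

```julia
# A loop that neither allocates nor yields, with an explicit safepoint now and then.
function spin_sum(n)
    s = 0
    for i in 1:n
        s += i % 7
        # Let a pending stop-the-world GC proceed instead of holding up other threads.
        i % 1_000_000 == 0 && GC.safepoint()
    end
    return s
end

total = spin_sum(10_000_000)
```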

I mean, you can always preempt and continue, no issues. But preempt-and-modify-the-world is very very hard.
