Simple Timeout of Function

I often find myself in a situation where I’m calling a Julia function many times in a for loop, and I would like to run the function for, say, 10 seconds, and if it hasn’t finished executing in that time, it should return something trivial (e.g. NaN) and skip to the next iteration.

For example, when solving large systems of non-linear equations or ODEs over a broad range of parameters, you sometimes hit a specific set of parameters that causes the solvers to diverge or take a small eternity to converge. When trying to get a quick-and-dirty visualization of how the system behaves, it is extremely annoying when one call (out of maybe the 20 required to generate a plot) takes 15 minutes to execute while the rest take 1 second.

Is there a way in Julia to force a black-box function to time out if it hasn’t finished executing within a fixed period of time?

After digging around in various forums and trying various approaches, I’m very happy to share what appears to be a relatively general-purpose solution. The solution was provided by hhaensel here:

They define a macro as follows:

macro timeout(seconds, expr, fail)
    quote
        tsk = @task $expr
        schedule(tsk)
        Timer($seconds) do timer
            istaskdone(tsk) || Base.throwto(tsk, InterruptException())
        end
        try
            fetch(tsk)
        catch _
            $fail
        end
    end
end

They give a trivial application, which assigns x = 1 if the expression within begin/end takes less than 1 second, and x = "failed" otherwise. Here the block sleeps for 1.1 seconds, so x ends up as "failed".

x = @timeout 1 begin
    sleep(1.1)
    println("done")
    1
end "failed"

Playing around with this macro, it appears to work quite well when calling more complex libraries. For example, here are applications to NLsolve.jl and DifferentialEquations.jl:

using NLsolve

function f!(F, x)
    sleep(0.1)
    F[1] = (x[1]+3)*(x[2]^3-7)+18
    F[2] = sin(x[2]*exp(x[1])-1)
end

function j!(J, x)
    J[1, 1] = x[2]^3-7
    J[1, 2] = 3*x[2]^2*(x[1]+3)
    u = exp(x[1])*cos(x[2]*exp(x[1])-1)
    J[2, 1] = x[2]*u
    J[2, 2] = u
end

MaxTime = 10
res1 = @timeout MaxTime begin
    nlsolve(f!, j!, [ 0.1; 1.2]).zero
end NaN
println("Given MaxTime = 10 seconds, we get res1 = ", res1)

MaxTime = 0.1
res2 = @timeout MaxTime begin
    nlsolve(f!, j!, [ 0.1; 1.2]).zero
end NaN
println("Given MaxTime = 0.1 seconds, we get res2 = ", res2)

On my computer, this returns the correct solution when MaxTime = 10s, but returns NaN when MaxTime = 0.1s. Similarly for DifferentialEquations.jl we have,

using DifferentialEquations
function f(u, p, t) 
    sleep(0.001)
    return 1.01 * u
end
u0 = 1 / 2
tspan = (0.0, 1.0)
prob = ODEProblem(f, u0, tspan)
@time sol = solve(prob, Tsit5(), reltol = 1e-8, abstol = 1e-8)

MaxTime = 100
sol = @timeout MaxTime begin
    sol = solve(prob, Tsit5(), reltol = 1e-8, abstol = 1e-8)
end NaN
println("Given 100 seconds, sol converges, so sol(0.1) = ", sol(0.1))

MaxTime = 0.01
sol = @timeout MaxTime begin
    sol = solve(prob, Tsit5(), reltol = 1e-8, abstol = 1e-8)
end NaN

println("Given 0.01 seconds, sol doesn't converge, and sol = ", sol)

There are a few challenges. For example, BlackBoxOptim automatically recovers and returns a partial solution in case of an InterruptException, which is what the macro throws. You need to turn this off explicitly to make the macro work as intended:

using BlackBoxOptim

function rosenbrock2d(x)
    sleep(0.0001)
    return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end

MaxTime = 1
x = @timeout MaxTime begin
    sleep(0.9)
    res = bboptimize(rosenbrock2d; SearchRange = (-5.0, 5.0), NumDimensions = 2, 
        TraceMode = :silent, RecoverResults=false)
    res
end NaN

println("Given MaxTime = 1 second, we get x = ", x)

MaxTime = 30
x = @timeout MaxTime begin
    sleep(0.9)
    res = bboptimize(rosenbrock2d; SearchRange = (-5.0, 5.0), NumDimensions = 2, 
        TraceMode = :silent, RecoverResults=false)
    res
end NaN

println("Given MaxTime = 30 seconds, we converge, with best_fitness(x) = ", best_fitness(x))
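
To illustrate why a solver that catches InterruptException itself defeats the timeout, here is a minimal sketch that needs no packages. The names `run_with_timeout`, `recovering_solver` and `plain_solver` are hypothetical and only serve this illustration; `run_with_timeout` is a function-style variant of the macro that interrupts via `schedule(...; error=true)` and closes its timer afterwards.

```julia
# Function-style variant of the timeout macro (hypothetical helper name).
function run_with_timeout(f, seconds, fail)
    tsk = @task f()
    schedule(tsk)
    timer = Timer(seconds) do _
        # Interrupt the task if it is still running when the timer fires.
        istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
    end
    try
        fetch(tsk)
    catch
        fail
    finally
        close(timer)
    end
end

# Hypothetical solver that catches the interrupt itself and hands back
# a partial result, so the wrapper never sees a failure.
function recovering_solver()
    try
        sleep(5)
        return :converged
    catch err
        err isa InterruptException || rethrow()
        return :partial_result
    end
end

# Hypothetical solver that lets the interrupt propagate.
plain_solver() = (sleep(5); :converged)

r_recovering = run_with_timeout(recovering_solver, 0.2, :timed_out)
r_plain = run_with_timeout(plain_solver, 0.2, :timed_out)
println("recovering solver: ", r_recovering)
println("plain solver:      ", r_plain)
```

In this sketch the recovering solver should hand back its partial result instead of the fallback, which mirrors what happens with BlackBoxOptim unless RecoverResults=false is set.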

I don’t know enough about the @task macro to determine if/when this might fail to work as intended, but for now it certainly seems a useful macro.


I also want to mention another solution from later in the same thread. It defines a timeout function that wraps an existing function and returns a fallback value upon failure:

function timeout(f, arg, seconds, fail)
    tsk = @task f(arg)
    schedule(tsk)
    Timer(seconds) do timer
        istaskdone(tsk) || Base.throwto(tsk, InterruptException())
    end
    try
        fetch(tsk)
    catch _;
        fail
    end
end

This is more flexible when you want a function to have a time limit. For example, we can define a ‘time capped’ version of a function which returns a different value if the function takes too long:

function rosenbrock2d(x)
    sleep(rand()*0.1)
    return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end

function rosenbrock2dtimelimit(x)
    return timeout(rosenbrock2d, x, 0.05, 1e99)
end

#We get roughly a 50/50 split between real values and timed out values.
for i = 1:10
    println(rosenbrock2dtimelimit([1,2]))
end

timeout() is a useful wrapper for objective functions in BlackBoxOptim or NLsolve.
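
Coming back to the original use case (a parameter sweep where a stalled call should give NaN and the loop should move on), the wrapper can be generalized to functions with any number of arguments. The following is a minimal sketch, not code from the thread: `time_capped` and `toy_solver` are hypothetical names, and `toy_solver` merely stands in for a solver that stalls for some parameters. It assumes the wrapped function yields to the scheduler (here via `sleep`).

```julia
# Return a time-capped version of `f` that yields `fail` on timeout or error.
# `time_capped` is a hypothetical helper, not part of any package.
function time_capped(f, seconds, fail)
    return function (args...)
        tsk = @task f(args...)
        schedule(tsk)
        timer = Timer(seconds) do _
            istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
        end
        try
            fetch(tsk)
        catch
            fail
        finally
            close(timer)  # do not leave the timer pending
        end
    end
end

# Hypothetical stand-in for a solver that stalls when p > 3.
function toy_solver(p)
    sleep(p > 3 ? 5.0 : 0.01)
    return float(p)^2
end

capped_solver = time_capped(toy_solver, 0.5, NaN)

# Parameter sweep: stalled calls are replaced by NaN and the loop continues.
results = [capped_solver(p) for p in 1:5]
println(results)
```

Returning a closure keeps the call site unchanged, so the capped function can be passed wherever the original objective was expected.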


I tried the macro and function versions presented here, but I did not succeed in using them with my code. For example, consider the computation

julia> @time inv(rand(8000,8000));
  3.400652 seconds (8 allocations: 980.530 MiB)

I have not been able to interrupt it with any timeout. Is it because the above expression calls an external library that refuses to yield? I need to be able to interrupt this kind of computation. What can I do?

Minor tweak: I think the expressions and seconds should be escaped so that the macro also works within modules. Updated code below:

"""
    @timeout(seconds, expr_to_run, expr_when_fails)

Simple macro to run an expression with a timeout of `seconds`. If the `expr_to_run` fails to finish in `seconds` seconds, `expr_when_fails` is returned.

# Example
```julia
x = @timeout 1 begin
    sleep(1.1)
    println("done")
    1
end "failed"

```
"""
macro timeout(seconds, expr_to_run, expr_when_fails)
    quote
        tsk = @task $(esc(expr_to_run))
        schedule(tsk)
        Timer($(esc(seconds))) do timer
            istaskdone(tsk) || Base.throwto(tsk, InterruptException())
        end
        try
            fetch(tsk)
        catch _
            $(esc(expr_when_fails))
        end
    end
end

Meanwhile I learned that the correct way to abort a task is

schedule(task, InterruptException(), error = true)

instead of Base.throwto()
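
As a small sketch of what this call does to a waiting task (assuming the task is blocked at a yield point such as `sleep`), the exception is delivered inside the task, the task fails, and `fetch` reports the failure as a TaskFailedException:

```julia
# A task that would normally take 5 seconds.
tsk = @task begin
    sleep(5)
    :finished
end
schedule(tsk)
sleep(0.1)  # give the task time to start and block inside sleep

# Deliver an InterruptException to the waiting task.
schedule(tsk, InterruptException(); error=true)

# fetch rethrows the failure wrapped in a TaskFailedException.
outcome = try
    fetch(tsk)
catch err
    err
end

println(typeof(outcome))
println(istaskfailed(tsk))
```

This is why the timeout macro wraps `fetch` in a try/catch: the fallback value is returned from the catch branch.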


I am reviving this old thread to clarify that such a macro works when the evaluated expression yields from time to time. In my experience, tight Julia loops and calls to external libraries do not yield.
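
The difference can be checked with a small sketch. `run_with_timeout` and `spin` are hypothetical names used only here; the sketch assumes a default session in which the task and the timer callback run on the same thread as the caller, so the timer can only fire when the running task yields.

```julia
# Function-style variant of the timeout macro (hypothetical helper name).
function run_with_timeout(f, seconds, fail)
    tsk = @task f()
    schedule(tsk)
    timer = Timer(seconds) do _
        istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
    end
    try
        fetch(tsk)
    catch
        fail
    finally
        close(timer)
    end
end

# Busy loop that yields every `yield_every` iterations (0 means never).
function spin(n; yield_every = 0)
    s = 0.0
    for i in 1:n
        s += sin(i)^2 + cos(i)^2
        yield_every > 0 && i % yield_every == 0 && yield()
    end
    return s
end

# Same workload and same 0.1 second limit, with and without yield points.
with_yield = run_with_timeout(() -> spin(5 * 10^7; yield_every = 10_000), 0.1, -1.0)
without_yield = run_with_timeout(() -> spin(5 * 10^7), 0.1, -1.0)

println("with yield:    ", with_yield)
println("without yield: ", without_yield)
```

Under that assumption, I would expect the yielding call to return the fallback shortly after the limit, while the non-yielding call runs to completion and returns the full sum, because the timer callback never gets a chance to run.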

Here is the macro from above with @hhaensel’s correction, along with an example that works.

"""
    @timeout(seconds, expr_to_run, expr_when_fails)

A macro to run an expression with a timeout of `seconds`. If the `expr_to_run` fails to finish in `seconds` seconds, `expr_when_fails` is returned. Note that the timeout will fail when the expression to be evaluated does not yield control back to the scheduler; in general, calls to libraries and tight loops will not be interrupted.

# Example
```julia
julia> function cpu_burn(n)
           s = 0.0
           for i in 1:n
               s += sin(i)^2 + cos(i)^2
               i % 10_000 == 0 && yield()
           end
           return s
       end
cpu_burn (generic function with 1 method)

julia> @time X = @timeout 1 cpu_burn(10^8) (-1)
  1.021633 seconds (24.90 k allocations: 1.206 MiB, 2.66% compilation time)
-1

julia> @time cpu_burn(10^8)
  2.970053 seconds
1.0e8
```
"""
macro timeout(seconds, expr_to_run, expr_when_fails)
  quote
    tsk = @task $(esc(expr_to_run))
    schedule(tsk)
    Timer($(esc(seconds))) do timer
      istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
    end
    try
      fetch(tsk)
    catch _
      $(esc(expr_when_fails))
    end
  end
end

I want to propose another refinement to the timeout macro.
Currently, the timer that checks whether the task has finished stays pending for the full timeout period, even when the task completes early. However, precompilation doesn’t like pending tasks. I have therefore modified the solution so that the timer is also closed once the task has completed successfully.

macro timeout(seconds, expr_to_run, expr_when_fails=nothing)
    quote
        timer = Channel{Timer}(1)
        tsk = @task begin
            x = $(esc(expr_to_run))
            close(take!(timer))
            x
        end
        schedule(tsk)

        put!(timer, Timer($(esc(seconds))) do timer
            istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
        end)

        try
            fetch(tsk)
        catch _
            $(esc(expr_when_fails))
        end
    end
end

I have been thinking more about this seemingly simple functionality, and I believe it deserves a package.
While the last example will certainly fulfill most users’ needs, there is room for improvement, particularly for cases where the task to be aborted does not yield.
I’ve seen that there’s a package draft by @ararslan that has never been released. However, it addresses tasks that are spawned via Distributed.jl, and it currently does not precompile without adaptations.
I’ve looked into his code and found the communication via channels very appealing; I had started to use them myself in my last proposal. So I’ve now come up with a new solution that is lengthier than the previous one, but it also provides killtask(), which is something people are missing in Julia; many are still using @async Base.throwto(task, InterruptException()).
The solution below will not block, even if tasks cannot be stopped, at least if multiple tasks are available.
I’ve set up a draft repository, TimeOut.jl, so that the community can help make this a viable package. The code could also be included in a visible package, e.g. ThreadPools. Alternatively, it could live on its own and enhance methods from ThreadPools with a timeout parameter via extensions.
So much for my rather lengthy thoughts …
I’d be happy to receive feedback here or directly via issues/PRs on GitHub.

module TimeOut

import Base.Threads.@spawn

export timeout, @timeout, killtask

function killtask(task::Task; retry::Integer = 0, sleep_interval::Real = 0.01, wait_for_started_duration::Real = 0.1)
    # never-throwing and safe against race conditions
    sleep(wait_for_started_duration) # give the task some time to start
    if !istaskstarted(task)
        @warn "Task not started!\n consider increasing wait_for_started_duration"
        return task
    end
    while !(istaskdone(task) || istaskfailed(task)) && retry ≥ 0
        try
            schedule(task, InterruptException(), error=true)
        catch e
        end
        retry -= 1
        retry ≥ 0 && sleep(sleep_interval)
    end
    return task
end

function _killtimer(@nospecialize(interval), abort_channel::Channel{Bool}, result_channel::Channel{Any}, @nospecialize(failed))
    Timer(interval) do _
        if failed isa Function
            put!(result_channel, try
                failed() # check if it throws
            catch e
                e # if it does, return the error to be rethrown in the main task
            end)
        else
            put!(result_channel, failed)
        end
        put!(abort_channel, true)
    end
end

function timeout(@nospecialize(f::Base.Callable), @nospecialize(interval), ::Type{T} = Any;
    @nospecialize(failed), @nospecialize(abort_msg::AbstractString = ""), @nospecialize(retry::Integer = 10)
) where T
    result_channel = Channel(1)
    abort_channel = Channel{Bool}(1)
    task_channel = Channel{Task}(1)

    task = @spawn try
        inner_task = @async put!(result_channel, f())
        put!(task_channel, inner_task)
        # place the timeout watcher in the same thread as the processing task (inter-thread interrupts potentially crash julia)
        @async take!(abort_channel) && killtask(inner_task; retry)
    catch e
        # in case that killtimer has not yet kicked in, place the error in the result channel
        isempty(result_channel) && put!(result_channel, e)
    end

    timer = _killtimer(interval, abort_channel, result_channel, failed)
    result = take!(result_channel)

    close(timer)
    isempty(abort_channel) && put!(abort_channel, false) # make sure the abort watcher task ends
    isempty(task_channel) || istaskdone(take!(task_channel)) || @warn("Could not kill task after $retry retries.")

    result isa Exception && throw(result) # rethrow() is only valid inside a catch block
    
    convert(T, result)
end

macro timeout(interval, expr_to_run, expr_when_fails = nothing, T = Any)
    :(timeout(() -> $(esc(expr_to_run)), $(esc(interval)), $(esc(T)), failed = () -> $(esc(expr_when_fails))))
end

end # module TimeOut