Help writing a timeout macro

I have a function that can either execute quickly or take a very long time. I want to kill the function after a certain amount of time so I can fall back on a more reliable method. I should be able to do this with a macro, but I am a bit out of my depth. I took info from here and here, and this is what I have so far:

macro timeout(time,f)
    quote
        t = Task($f)
        schedule(t)
        Timer(x -> (istaskdone(t) || Base.throwto(t,InterruptException())),$time)
        t
    end
end

The idea is that I can just write:

a = @timeout 5 short_or_long(args)

but my implementation is woefully wrong. Could anyone help?


I don’t believe there is a way to interrupt a Julia task that does not explicitly yield control.
https://github.com/JuliaLang/julia/issues/6283

In HTTP.jl we have a timeout task that closes the network connection after a timeout. This causes the main task to abort with an EOF error next time it tries to use the connection. https://github.com/JuliaWeb/HTTP.jl/blob/master/src/TimeoutRequest.jl#L20-L27
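The same idea (close the resource the task is blocked on, rather than interrupting the task itself) can be sketched with a plain Channel. This is not HTTP.jl's code; `conn`, `worker` and `watchdog` are hypothetical names, and the channel is only a stand-in for a network connection.

```julia
# Hypothetical stand-in for a network connection: an unbuffered channel.
conn = Channel{Int}(0)

# The worker blocks in take! until data arrives or the channel is closed.
worker = @async try
    take!(conn)
catch e
    # take! on a closed channel throws InvalidStateException
    e isa InvalidStateException ? :timed_out : rethrow()
end

# Watchdog: close the resource after 0.2 seconds instead of interrupting the task.
watchdog = Timer(_ -> close(conn), 0.2)

result = fetch(worker)   # :timed_out, because nothing was ever put! into conn
```

The worker aborts at a well-defined point (its next use of the resource), so it can clean up in its own catch block.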

Forgive my ignorance, but what does it mean to not explicitly yield control? Does this mean that a task that does explicitly yield control can be interrupted?

The way I understand it, a Julia task is its own master. If it calls yield (or sleep, or wait, or some other API that ends up calling one of those), then it explicitly hands control to the Julia runtime. At all other times, a Julia task just executes compiled machine code like a compiled C program, i.e. there is no supervisor, interpreter, or master thread, and no opportunity for preempting a Julia task.
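To illustrate what "explicitly yielding" can look like in a compute-bound function, here is a minimal sketch of cooperative cancellation. The function name `cancellable_sum` and the flag `cancelled` are made up for this example; the assumption is that you are able to edit the long-running function to add the `yield()` call.

```julia
# Cooperative cancellation: the worker yields periodically and polls a flag.
function cancellable_sum(n, cancelled::Threads.Atomic{Bool})
    total = 0
    for i in 1:n
        total += i
        if i % 1_000 == 0
            yield()                        # hand control back to the scheduler
            cancelled[] && return nothing  # stop early if asked to
        end
    end
    return total
end

cancelled = Threads.Atomic{Bool}(false)
t = @async cancellable_sum(10^12, cancelled)  # far too long to finish on its own

sleep(0.1)          # the main task can only wake up because the worker yields
cancelled[] = true  # ask the worker to stop
result = fetch(t)   # nothing: the worker saw the flag and returned early
```

Without the `yield()` call the loop would never give the timer behind `sleep` a chance to run on the same thread, which is the limitation described above.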

Does this mean that a task that does explicitly yield control can be interrupted?

Yes

julia> t = @async try
           while true
               sleep(1)
               println("tick")
           end
       catch e
           println("stopped on $e")
       end
tick
tick
tick
tick

julia> @async Base.throwto(t, EOFError())
stopped on EOFError()
Task (runnable) @0x000000011555b610

julia> t
Task (done) @0x000000011555b3d0

I’ll just leave my approach here…

macro timeout(expr, seconds=-1, cb=(tsk) -> Base.throwto(tsk, InterruptException()))
    quote
        tsk = @task $expr
        schedule(tsk)

        if $seconds > -1
            Timer((timer) -> $cb(tsk), $seconds)
        end

        return fetch(tsk)
    end
end

julia> @timeout (sleep(3); println("done")) 3.1
done

julia> @timeout (sleep(3); println("done")) 3
ERROR: TaskFailedException:
InterruptException:
Stacktrace:
 [1] try_yieldto(::typeof(Base.ensure_rescheduled)) at ./task.jl:656
 [2] wait at ./task.jl:713 [inlined]
 [3] wait(::Base.GenericCondition{Base.Threads.SpinLock}) at ./condition.jl:106
 [4] _trywait(::Timer) at ./asyncevent.jl:110
 [5] wait at ./asyncevent.jl:128 [inlined]
 [6] sleep at ./asyncevent.jl:213 [inlined]
 [7] (::var"#29#31")() at ./task.jl:112
Stacktrace:
 [1] wait at ./task.jl:267 [inlined]
 [2] fetch(::Task) at ./task.jl:282
 [3] top-level scope at REPL[3]:10

Thanks, this was super helpful! I found two problems with this approach though. First, the return confusingly just terminates the expression; it doesn’t allow you to assign the output of the task to a variable (compare to x = return 4). Second, when you queue up multiple tasks in a row, the Timer from the old task is still active and will end up throwing the interrupt exception to the main program (not sure why that happens). Here’s my version, which fixes these issues (and also switches the argument order):

macro timeout(seconds, expr)
    quote
        tsk = @task $expr
        schedule(tsk)
        Timer($seconds) do timer
            istaskdone(tsk) || Base.throwto(tsk, InterruptException())
        end
        fetch(tsk)
    end
end

x = @timeout 1 begin
    sleep(0.5)
    println("done")
    1
end
@assert x == 1

Thanks for this useful snippet.

I wanted something very similar but with the possibility of a default value in case of failure. So I came up with this:

macro timeout(seconds, expr, fail)
    quote
        tsk = @task $expr
        schedule(tsk)
        Timer($seconds) do timer
            istaskdone(tsk) || Base.throwto(tsk, InterruptException())
        end
        try
            fetch(tsk)
        catch _
            $fail
        end
    end
end

x = @timeout 1 begin
    sleep(1.1)
    println("done")
    1
end "failed"

this is amazing guys. thanks!

I have spent all day trying to adapt @hhaensel’s snippet above so that arguments can be passed to the function. Since, as I understand it, Julia’s @task doesn’t allow a function to be called with arguments, I tried to work around this by calling the function with the variable directly. However, nothing happens.
My main hypothesis is that i is interpreted by the macro as a symbol and not as a value. Is there a trick to overcome this problem in non-interactive mode?
Here is my minimal (non)-working example:

function test()
    f(x) = begin 2*x ; println(2*x) ; sleep(1.) end
    #println("Testing f: f(3)")
    for i in 1:10
        @timeout 10 f(i) "fail"
        #println(i)
    end
end

I apologize for my poor English ^^’

I dug deeper into the @timeout macro by removing the try ... catch _ ... end statement and just leaving fetch(tsk) instead. Now, I get the following error:

julia> test()
ERROR: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:345 [inlined]
 [2] fetch
   @ ./task.jl:360 [inlined]
 [3] test()
   @ Main ~/Documents/GIT/GravityMachine/src/solveSPAexactly/solveSPA.jl:28
 [4] top-level scope
   @ REPL[7]:1

    nested task error: UndefVarError: f not defined
    Stacktrace:
     [1] (::var"#170#173")()
       @ Main ./task.jl:134

When I bring f out of test (so that f becomes global), Julia yields the same error but with i instead of f:

nested task error: UndefVarError: i not defined
    Stacktrace:
     [1] (::var"#187#189")()
       @ Main ./task.jl:134

I feel that my problem is related to a misunderstanding of meta-programming. In my opinion I am trying to evaluate the code before parsing it (which is impossible?!).
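For anyone hitting the same UndefVarError: my understanding is that this comes from macro hygiene rather than from evaluation order. A symbol interpolated into a quote without esc is looked up as a global in the module where the macro is defined, so locals such as f or i are not visible. A minimal sketch with made-up macro names (`@double_unescaped`, `@double_escaped`) that reproduces the effect:

```julia
# Without esc: `$ex` is treated as a global of the macro's module.
macro double_unescaped(ex)
    quote
        2 * $ex
    end
end

# With esc: `$ex` is resolved in the scope where the macro is called.
macro double_escaped(ex)
    quote
        2 * $(esc(ex))
    end
end

function uses_unescaped()
    only_local = 21
    @double_unescaped only_local   # looks for a global `only_local` at run time
end

function uses_escaped()
    only_local = 21
    @double_escaped only_local     # sees the local variable
end

ok = uses_escaped()                # 42

err = try
    uses_unescaped()
catch e
    e                              # UndefVarError, as long as no global `only_local` exists
end
```

This is why the error only shows up inside a function: at the REPL top level every variable is a global, so the unescaped version appears to work.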

It seems what I was looking for is the following function (and not a macro):

function timeout(f, arg, seconds, fail)
    tsk = @task f(arg...)
    schedule(tsk)
    Timer(seconds) do timer
        istaskdone(tsk) || Base.throwto(tsk, InterruptException())
    end
    try
        fetch(tsk)
    catch _;
        fail
    end
end
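If killing the task is not essential, a different technique is to stop waiting for it instead of interrupting it. The sketch below uses Base's timedwait for that; the name `fetch_or_fallback` is hypothetical. Note the important assumption: on timeout the task is abandoned, not stopped, and keeps running in the background.

```julia
# Wait at most `seconds` for f() to finish; otherwise return `fallback`.
# The task is NOT interrupted on timeout, it is simply no longer waited for.
function fetch_or_fallback(f, seconds, fallback)
    tsk = @async f()
    # timedwait polls the predicate and returns :ok or :timed_out
    status = timedwait(() -> istaskdone(tsk), Float64(seconds); pollint=0.01)
    return status === :ok ? fetch(tsk) : fallback
end

fast = fetch_or_fallback(() -> 2 * 21, 1.0, "fail")              # 42
slow = fetch_or_fallback(() -> (sleep(2); 42), 0.1, "fail")      # "fail"
```

Like the other snippets in this thread, this only works if the task yields (here through sleep); a loop that never yields would block timedwait as well.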

I’ve run into issues with this one when the task usually completes within seconds but sometimes legitimately takes minutes, which calls for a long timeout.
It would essentially flood the scheduler with many timers that were still waiting to expire.

I changed it a bit to this, and it seems to not run into the same issues anymore:

macro timeout(seconds, expr, err_expr=:(nothing))
    esc(quote
        tsk__ = @task $expr
        schedule(tsk__)
        start_time__ = time()
        curt__ = time()
        Base.Timer(0.001, interval=0.001) do timer__
            if tsk__ === nothing || istaskdone(tsk__)
                close(timer__)
            else
                curt__ = time()
                if curt__ - start_time__ > $seconds
                    Base.throwto(tsk__, InterruptException())
                end
            end
        end
        try
            fetch(tsk__)
        catch err__
            if err__.task.exception isa InterruptException
                RemoteHPC.log_error(RemoteHPC.StallException(err__))
                $err_expr
            else
                rethrow(err__.task.exception)
            end
        end
    end)
end
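A possible alternative to polling every millisecond, sketched here as a plain function with the hypothetical name `with_timeout`: keep the single one-shot Timer from the earlier versions, but close it as soon as the task is finished so that no stale timers pile up. It still relies on Base.throwto, which is not a documented public API, so treat it as an assumption-laden sketch rather than a robust solution.

```julia
# Run the zero-argument function f with a timeout, cleaning up the timer afterwards.
function with_timeout(f, seconds)
    tsk = @task f()
    schedule(tsk)
    # One-shot watchdog: interrupt the task if it is still running after `seconds`.
    timer = Timer(seconds) do _
        istaskdone(tsk) || Base.throwto(tsk, InterruptException())
    end
    try
        return fetch(tsk)   # throws TaskFailedException if the task was interrupted
    finally
        close(timer)        # cancel the watchdog so it does not linger until expiry
    end
end

value = with_timeout(5) do
    sleep(0.05)
    42
end
```

With a long timeout and many short tasks, each timer then lives only as long as its task instead of for the full timeout.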

I finally reached the solution I was looking for by using the esc(.) function. This gives the following snippet:

macro timeout(seconds, expr, fail)
    quote
        tsk = @task $(esc(expr))
        schedule(tsk)
        Timer($(esc(seconds))) do timer
            istaskdone(tsk) || Base.throwto(tsk, InterruptException())
        end
        try
            fetch(tsk)
        catch _
            $(esc(fail))
        end
    end
end