I often find myself in a situation where I’m calling a Julia function many times in a for loop, and I would like to run the function for, say, 10 seconds, and if it hasn’t finished executing in that time, it should return something trivial (e.g. NaN) and skip to the next iteration.
For example, when solving large systems of non-linear equations or ODEs over a broad range of parameters, you sometimes hit a specific set of parameters that causes the solvers to diverge or take a small eternity to converge. When trying to get a quick-and-dirty visualization of how the system behaves, it is extremely annoying when one call (out of maybe the 20 required to generate a plot) takes 15 minutes to execute while the rest take 1 second.
Is there a way in Julia of forcing a blackbox function to timeout if it hasn’t executed within a fixed period of time?
After digging around in various forums and trying various approaches, I’m very happy to share what appears to be a relatively general-purpose solution. The solution was provided by hhaensel here:
They define a macro as follows,
macro timeout(seconds, expr, fail)
quote
tsk = @task $expr
schedule(tsk)
Timer($seconds) do timer
istaskdone(tsk) || Base.throwto(tsk, InterruptException())
end
try
fetch(tsk)
catch _
$fail
end
end
end
They give the trivial application, which assigns x = 1 if the expression within begin/end takes less than 1 second, else it assigns x = “failed”.
x = @timeout 1 begin
sleep(1.1)
println("done")
1
end "failed"
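To connect this back to the for-loop use case from the question, here is a minimal sketch of the same task-plus-timer mechanism written as a plain function and applied to a small parameter sweep. The names run_with_timeout and slow_square are hypothetical and do not come from the thread, the sleep merely stands in for an expensive solver call, and the interrupt is delivered with schedule(...; error=true) instead of Base.throwto, which is a substitution on my part.

```julia
# Hypothetical helper: the same task-plus-timer idea as the macro, as a function.
function run_with_timeout(f, seconds, fallback)
    tsk = @task f()
    schedule(tsk)
    # When the timer fires, raise an InterruptException inside the task
    # unless it has already finished.
    timer = Timer(seconds) do _
        istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
    end
    try
        fetch(tsk)      # throws if the task was interrupted or failed
    catch
        fallback
    finally
        close(timer)    # do not leave the timer pending once we are done
    end
end

# Hypothetical stand-in for an expensive solver call: the "work" is a sleep.
slow_square(p) = (sleep(p); p^2)

params = [0.05, 1.0, 0.1]
# Each call gets at most 0.5 seconds; the slow one is replaced by NaN.
results = [run_with_timeout(() -> slow_square(p), 0.5, NaN) for p in params]
println(results)
```

With these values the second entry should come back as NaN while the other two hold the computed squares, so the loop keeps going instead of stalling on the slow parameter.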
Playing around with this macro, it appears to work quite well when calling more complex libraries. For example, here are applications to NLsolve.jl and DifferentialEquations.jl:
using NLsolve
function f!(F, x)
sleep(0.1)
F[1] = (x[1]+3)*(x[2]^3-7)+18
F[2] = sin(x[2]*exp(x[1])-1)
end
function j!(J, x)
J[1, 1] = x[2]^3-7
J[1, 2] = 3*x[2]^2*(x[1]+3)
u = exp(x[1])*cos(x[2]*exp(x[1])-1)
J[2, 1] = x[2]*u
J[2, 2] = u
end
MaxTime = 10
res1 = @timeout MaxTime begin
nlsolve(f!, j!, [ 0.1; 1.2]).zero
end NaN
println("Given MaxTime = 10 seconds, we get res1 = ", res1)
MaxTime = 0.1
res2 = @timeout MaxTime begin
nlsolve(f!, j!, [ 0.1; 1.2]).zero
end NaN
println("Given MaxTime = 0.1 seconds, we get res2 = ", res2)
On my computer, this returns the correct solution when MaxTime = 10s, but returns NaN when MaxTime = 0.1s. Similarly for DifferentialEquations.jl we have,
using DifferentialEquations
function f(u, p, t)
sleep(0.001)
return 1.01 * u
end
u0 = 1 / 2
tspan = (0.0, 1.0)
prob = ODEProblem(f, u0, tspan)
@time sol = solve(prob, Tsit5(), reltol = 1e-8, abstol = 1e-8)
MaxTime = 100
sol = @timeout MaxTime begin
sol = solve(prob, Tsit5(), reltol = 1e-8, abstol = 1e-8)
end NaN
println("Given 100 seconds, sol converges, so sol(0.1) = ", sol(0.1))
MaxTime = 0.01
sol = @timeout MaxTime begin
sol = solve(prob, Tsit5(), reltol = 1e-8, abstol = 1e-8)
end NaN
println("Given 0.01 seconds, sol doesn't converge, and sol = ", sol)
There are a few challenges. For example, BlackBoxOptim automatically recovers and returns a partial solution in case of an InterruptException, which is what the macro throws. You need to turn this off explicitly to make the macro work as intended:
using BlackBoxOptim
function rosenbrock2d(x)
sleep(0.0001)
return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end
MaxTime = 1
x = @timeout MaxTime begin
sleep(0.9)
res = bboptimize(rosenbrock2d; SearchRange = (-5.0, 5.0), NumDimensions = 2,
TraceMode = :silent, RecoverResults=false)
res
end NaN
println("Given MaxTime = 1 second, we get x = ", x)
MaxTime = 30
x = @timeout MaxTime begin
sleep(0.9)
res = bboptimize(rosenbrock2d; SearchRange = (-5.0, 5.0), NumDimensions = 2,
TraceMode = :silent, RecoverResults=false)
res
end NaN
println("Given MaxTime = 30 seconds, we converge, with best_fitness(x) = ", best_fitness(x))
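The effect of a library that recovers from InterruptException can be reproduced without BlackBoxOptim. The sketch below is only an illustration under my own assumptions: stubborn_work and plain_work are hypothetical stand-ins, and run_with_timeout is a hypothetical function version of the macro. The function that swallows the interrupt hands back a partial result, so the fallback value is never used.

```julia
# Hypothetical function version of the timeout macro.
function run_with_timeout(f, seconds, fallback)
    tsk = @task f()
    schedule(tsk)
    timer = Timer(seconds) do _
        istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
    end
    try
        fetch(tsk)
    catch
        fallback
    finally
        close(timer)
    end
end

# Stand-in for a library that catches the interrupt and returns partial work.
function stubborn_work()
    partial = 0
    try
        for i in 1:100
            sleep(0.05)
            partial = i
        end
    catch err
        err isa InterruptException || rethrow()
        # The interrupt is swallowed here, so the caller never sees it.
    end
    return partial
end

# Same loop without the try/catch: the interrupt propagates out of the task.
function plain_work()
    for i in 1:100
        sleep(0.05)
    end
    return 100
end

recovered = run_with_timeout(stubborn_work, 0.3, NaN)  # a partial count, not NaN
aborted = run_with_timeout(plain_work, 0.3, NaN)       # NaN
println((recovered, aborted))
```

This is why an option such as RecoverResults=false matters: the timeout wrapper can only report a failure if the exception actually escapes the wrapped call.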
I don’t know enough about the @task macro to determine if/when this might fail to work as intended, but for now it certainly seems a useful macro.
I also want to mention another solution from later in the same thread. This defines a timeout function, which wraps an existing function and gives a different return value upon failure,
function timeout(f, arg, seconds, fail)
tsk = @task f(arg)
schedule(tsk)
Timer(seconds) do timer
istaskdone(tsk) || Base.throwto(tsk, InterruptException())
end
try
fetch(tsk)
catch _;
fail
end
end
This is more flexible when you want a function to have a time limit. For example, we can define a ‘time capped’ version of a function which returns a different value if the function takes too long:
function rosenbrock2d(x)
sleep(rand()*0.1)
return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end
function rosenbrock2dtimelimit(x)
return timeout(rosenbrock2d, x, 0.05, 1e99)
end
#We get roughly a 50/50 split between real values and timed out values.
for i = 1:10
println(rosenbrock2dtimelimit([1,2]))
end
timeout() is a useful wrapper for objective functions in BlackBoxOptim or NLsolve.
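Since timeout() as written takes exactly one argument, a variant that returns a time-capped closure may be handy for functions with several arguments or keyword arguments. This is only a sketch: timecapped and slow_add are hypothetical names, and the interrupt is sent with schedule(...; error=true) rather than Base.throwto, which is my own substitution.

```julia
# Hypothetical variant: wrap f once and get back a time-capped function.
function timecapped(f, seconds, fail)
    return function (args...; kwargs...)
        tsk = @task f(args...; kwargs...)
        schedule(tsk)
        timer = Timer(seconds) do _
            istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
        end
        try
            fetch(tsk)
        catch
            fail
        finally
            close(timer)
        end
    end
end

# Hypothetical example function with positional and keyword arguments.
slow_add(a, b; delay=0.0) = (sleep(delay); a + b)

capped_add = timecapped(slow_add, 0.3, missing)
fast = capped_add(1, 2)              # finishes well within the limit
slow = capped_add(1, 2; delay=1.0)   # exceeds the limit, so `missing` is returned
println((fast, slow))
```

The closure can then be handed to a solver in place of the original objective function.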
I have not been able to interrupt it with any timeout. Is it because the above expression calls an external library that refuses to yield? I need to be able to interrupt this kind of computation. What can I do?
Minor tweak: I think the expressions and seconds should be escaped so that the macro also works within modules. Updated code below:
"""
@timeout(seconds, expr_to_run, expr_when_fails)
Simple macro to run an expression with a timeout of `seconds`. If the `expr_to_run` fails to finish in `seconds` seconds, `expr_when_fails` is returned.
# Example
```julia
x = @timeout 1 begin
sleep(1.1)
println("done")
1
end "failed"
```
"""
macro timeout(seconds, expr_to_run, expr_when_fails)
quote
tsk = @task $(esc(expr_to_run))
schedule(tsk)
Timer($(esc(seconds))) do timer
istaskdone(tsk) || Base.throwto(tsk, InterruptException())
end
try
fetch(tsk)
catch _
$(esc(expr_when_fails))
end
end
end
I am reviving this old thread to clarify that such a macro works only when the evaluated expression yields from time to time. In my experience, tight Julia loops and calls to external libraries do not yield.
Here is the above code along with @hhaensel’s correction and an example that works.
"""
@timeout(seconds, expr_to_run, expr_when_fails)
A macro to run an expression with a timeout of `seconds`. If the `expr_to_run` fails to finish in `seconds` seconds, `expr_when_fails` is returned. Note that the timeout will fail when the expression to be evaluated does not yield control back to the scheduler; in general, calls to libraries and tight loops will not be interrupted.
# Example
```julia
julia> function cpu_burn(n)
s = 0.0
for i in 1:n
s += sin(i)^2 + cos(i)^2
i % 10_000 == 0 && yield()
end
return s
end
cpu_burn (generic function with 1 method)
julia> @time X = @timeout 1 cpu_burn(10^8) (-1)
1.021633 seconds (24.90 k allocations: 1.206 MiB, 2.66% compilation time)
-1
julia> @time cpu_burn(10^8)
2.970053 seconds
1.0e8
```
"""
macro timeout(seconds, expr_to_run, expr_when_fails)
quote
tsk = @task $(esc(expr_to_run))
schedule(tsk)
Timer($(esc(seconds))) do timer
istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
end
try
fetch(tsk)
catch _
$(esc(expr_when_fails))
end
end
end
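The role of yield() can be checked side by side. The sketch below assumes a session where the task and the timer callback share one thread (the usual situation for code started from the main task); spin and run_with_timeout are hypothetical names. The same busy loop runs twice with a 0.1 second limit: without yield points it runs to completion, with yield points it is interrupted.

```julia
# Hypothetical function version of the timeout macro.
function run_with_timeout(f, seconds, fallback)
    tsk = @task f()
    schedule(tsk)
    timer = Timer(seconds) do _
        istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
    end
    try
        fetch(tsk)
    catch
        fallback
    finally
        close(timer)
    end
end

# Busy-wait for `seconds`; optionally hand control back to the scheduler.
function spin(seconds; cooperative::Bool)
    t0 = time()
    n = 0
    while time() - t0 < seconds
        n += 1
        cooperative && n % 1_000 == 0 && yield()
    end
    return n
end

t0 = time()
blocking = run_with_timeout(() -> spin(0.5; cooperative=false), 0.1, -1)
t_blocking = time() - t0

t0 = time()
coop = run_with_timeout(() -> spin(0.5; cooperative=true), 0.1, -1)
t_coop = time() - t0

println("no yield:   result = ", blocking, " after ", round(t_blocking; digits=2), " s")
println("with yield: result = ", coop, " after ", round(t_coop; digits=2), " s")
```

Under that assumption the first call should return the loop counter after roughly the full 0.5 seconds, because the timer callback never gets a chance to run, while the second call should return -1 shortly after the 0.1 second limit.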
I want to propose another refinement to the timeout macro.
Currently, the timer that checks whether the task has finished stays alive for the full timeout period, even if the task completes early. However, precompilation doesn’t like pending tasks. Therefore I have modified the solution so that the timer is also closed once the task has completed successfully.
macro timeout(seconds, expr_to_run, expr_when_fails=nothing)
quote
timer = Channel{Timer}(1)
tsk = @task begin
x = $(esc(expr_to_run))
close(take!(timer))
x
end
schedule(tsk)
put!(timer, Timer($(esc(seconds))) do timer
istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
end)
try
fetch(tsk)
catch _
$(esc(expr_when_fails))
end
end
end
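A different way to get the same cleanup, sketched here as an alternative and not as the poster’s approach, is to close the timer in a finally block, which also covers the failure path. The name run_and_report is hypothetical; it returns the result together with isopen(timer) so the cleanup can be observed.

```julia
# Hypothetical helper: close the timer in `finally` and report its state.
function run_and_report(f, seconds, fallback)
    tsk = @task f()
    schedule(tsk)
    timer = Timer(seconds) do _
        istaskdone(tsk) || schedule(tsk, InterruptException(); error=true)
    end
    result = try
        fetch(tsk)
    catch
        fallback
    finally
        close(timer)   # runs whether the task succeeded, failed, or timed out
    end
    return result, isopen(timer)
end

# The task finishes immediately, so the 5 second timer is closed right away.
result, timer_still_open = run_and_report(() -> 42, 5, nothing)
println((result, timer_still_open))   # prints (42, false)
```

The call returns without waiting for the 5 seconds to elapse, and no timer is left pending afterwards.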
I have kept thinking about this seemingly simple functionality, and I believe it deserves a package.
While the last example will certainly fulfill most users’ needs, there is room for improvement, particularly for cases where the task to be aborted does not yield.
I’ve seen that there’s a package draft by @ararslan that has never been released. However, it addresses tasks that are spawned via Distributed.jl. Moreover, it currently does not precompile without adaptations.
I’ve looked into his code and found the communication via channels very appealing; I had already started to use channels myself in my last proposal. So I’ve now come up with a new solution that is lengthier than the previous one, but it also provides killtask(), which is something people are missing in Julia; many are still using @async Base.throwto(task, InterruptException()).
The solution below will not block, even if tasks cannot be stopped, at least if multiple tasks are available.
I’ve set up a draft repository, TimeOut.jl, so that the community can help make this a viable package. The code could just as well be included in an established package, e.g. ThreadPools. Alternatively, it could live on its own and enhance methods from ThreadPools with a timeout parameter via extensions.
So much for my rather lengthy thoughts.
I’d be happy to receive feedback here or directly via issues/PRs on GitHub.
module TimeOut
import Base.Threads.@spawn
export timeout, @timeout, killtask
function killtask(task::Task; retry::Integer = 0, sleep_interval::Real = 0.01, wait_for_started_duration::Real = 0.1)
# never-throwing and safe against race conditions
sleep(wait_for_started_duration) # give the task some time to start
if !istaskstarted(task)
@warn "Task not started!\n consider increasing wait_for_started_duration"
return task
end
while !(istaskdone(task) || istaskfailed(task)) && retry ≥ 0
try
schedule(task, InterruptException(), error=true)
catch e
end
retry -= 1
retry ≥ 0 && sleep(sleep_interval)
end
return task
end
function _killtimer(@nospecialize(interval), abort_channel::Channel{Bool}, result_channel::Channel{Any}, @nospecialize(failed))
Timer(interval) do _
if failed isa Function
put!(result_channel, try
failed() # check if it throws
catch e
e # if it does, return the error to be rethrown in the main task
end)
else
put!(result_channel, failed)
end
put!(abort_channel, true)
end
end
function timeout(@nospecialize(f::Base.Callable), @nospecialize(interval), ::Type{T} = Any;
@nospecialize(failed), @nospecialize(abort_msg::AbstractString = ""), @nospecialize(retry::Integer = 10)
) where T
result_channel = Channel(1)
abort_channel = Channel{Bool}(1)
task_channel = Channel{Task}(1)
task = @spawn try
inner_task = @async put!(result_channel, f())
put!(task_channel, inner_task)
# place the timeout watcher in the same thread as the processing task (inter-thread interrupts potentially crash julia)
@async take!(abort_channel) && killtask(inner_task; retry)
catch e
# in case that killtimer has not yet kicked in, place the error in the result channel
isempty(result_channel) && put!(result_channel, e)
end
timer = _killtimer(interval, abort_channel, result_channel, failed)
result = take!(result_channel)
close(timer)
isempty(abort_channel) && put!(abort_channel, false) # make sure the abort watcher task ends
isempty(task_channel) || istaskdone(take!(task_channel)) || @warn("Could not kill task after $retry retries.")
result isa Exception && throw(result) # rethrow(exc) is only allowed inside a catch block
convert(T, result)
end
macro timeout(interval, expr_to_run, expr_when_fails = nothing, T = Any)
:(timeout(() -> $(esc(expr_to_run)), $(esc(interval)), $(esc(T)), failed = () -> $(esc(expr_when_fails))))
end
end # module TimeOut
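The core of the channel idea can be shown in a few lines. The sketch below is a stripped-down illustration under my own assumptions and not the TimeOut.jl code: first_of is a hypothetical name, and unlike timeout() above it makes no attempt to kill the worker; it simply returns whichever of the result or the fallback arrives in the channel first.

```julia
# Hypothetical helper: race the worker against a timer through one channel.
function first_of(f, seconds, fallback)
    # Capacity 2 so that neither the worker nor the timer can block on put!.
    results = Channel{Any}(2)
    Threads.@spawn put!(results, f())
    timer = Timer(_ -> put!(results, fallback), seconds)
    value = take!(results)   # whichever producer is first wins
    close(timer)             # cancel the timer if the worker won
    return value
end

quick = first_of(() -> 21 * 2, 5, :timed_out)            # worker wins
late = first_of(() -> (sleep(1); 1), 0.2, :timed_out)    # timer wins
println((quick, late))
```

The caller is released after the time limit even though the abandoned worker keeps running in the background, which is the reason the full solution adds killtask() on top. A worker that throws never puts a value, so in this sketch an error also ends in the fallback.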